CDC Unit - 1

UNIT – 1

COMPILER
▪A program that takes as input a program written in one
language (source language) and translates it into a
functionally equivalent program in another language (target
language)
▪Source – Usually high level languages like C, C++, Java
▪Target – Low level languages like assembly or machine code
▪During translation – Also reports errors and warnings to help
the programmer
Prepared by Sherin Joshi
APPLICATION AREAS
▪The techniques used in compiler design are applicable to
many problems in computer science
▪Techniques used in a lexical analyzer can be used in text
editors, information retrieval systems, and pattern recognition
programs
▪Techniques used in a parser can be used in a query processing
system, such as one that processes SQL

▪Many software systems with a complex front end may need
techniques used in compiler design
▪For example, a symbolic equation solver that takes an equation as
input must parse the given equation
▪Most of the techniques used in compiler design can be used in
Natural Language Processing (NLP) systems



INTERPRETER
▪An interpreter is another common kind of language processor
▪Instead of producing a target program as a translation, an
interpreter appears to directly execute the operations
specified in the source program on inputs supplied by the user
▪Both compilers and interpreters do the same job: they let a
program written in a higher level programming language be
executed by the machine
▪However, a compiler converts the whole program into machine code
(for example, an executable file) before the program runs
▪A compiler transforms code written in a high level language (HLL) into
machine code all at once, before the program runs
▪An interpreter translates and executes each HLL program statement, one by
one, while the program runs
▪Compiled code usually runs faster than interpreted code
▪A compiler reports all errors after compiling the whole program; an
interpreter reports errors one at a time, as it reaches each statement

A LANGUAGE PROCESSING SYSTEM



▪Preprocessor
▪Programs which translate source code into simpler or slightly
lower level source code, for compilation by another compiler
▪Performs a pre-compilation pass over the source program to expand
any macro definitions
▪A preprocessor may allow a user to define macros that are
shorthands for longer constructs
▪Example of a macro in C: #define square(x) ((x)*(x))
▪In the code body it can be used as: int num = square(5);
▪Which is the same as writing: int num = ((5)*(5));



▪Assembler
▪Programs written to automate the translation of assembly
language into machine language
▪An assembly language is one where mnemonics are used
▪Mnemonics are symbols used for each machine instruction
which make it easier to write/read programs compared to
those written in machine language
▪Mnemonics are subsequently translated into machine
language

▪Linker/Loader
▪If the target program is machine code, loaders are used to
load the target code into memory for execution
▪Linkers are used to link target program with the libraries
PHASES OF COMPILING
▪Two main phases of compiling process
▪Analysis
▪Synthesis
▪Analysis
▪Breaks up the source program into pieces and creates a
language-independent intermediate representation of the
program
▪The analysis part also collects information about the source
program and stores it in a data structure called a ‘symbol
table’, which is passed along with the intermediate
representation to the synthesis part
▪Synthesis
▪Constructs the desired target program from the intermediate
representation and the information in the symbol table
▪The analysis part is often called the front end of the compiler
▪The synthesis part is the back end

PHASES OF COMPILER
▪Each phase transforms the source program from one
representation into another representation
▪They communicate with error handlers
▪They communicate with the symbol table

▪Analysis
▪Lexical Analysis
▪Syntax Analysis
▪Semantic Analysis
▪Synthesis
▪Intermediate Code Generation
▪Machine-Independent (Intermediate) Code Optimization
▪(Object) Code Generation
▪Machine-Dependent (Object) Code Optimization
LEXICAL ANALYSIS (SCANNING)
▪The stream of characters making up the source program is read
from left to right and grouped into tokens
▪A token is a sequence of characters that has a collective
meaning
▪Examples of tokens are identifiers, reserved words, operators,
special symbols, etc.
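
The grouping of characters into tokens can be sketched with regular expressions. This is a minimal illustration, not a production scanner: the token names (NUMBER, ID, KEYWORD, ASSIGN, OP), the keyword set, and the function name tokenize are assumptions made for this example.

```python
import re

# Hypothetical token classes for a tiny language. Order matters: the
# first alternative that matches at the current position wins.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("ID",     r"[A-Za-z_]\w*"),
    ("ASSIGN", r":="),
    ("OP",     r"[+\-*/]"),
    ("SKIP",   r"\s+"),
]
KEYWORDS = {"if", "then", "else", "while"}  # assumed reserved words
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Scan text from left to right and return (kind, lexeme) pairs."""
    tokens, pos = [], 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        kind, lexeme = match.lastgroup, match.group()
        pos = match.end()
        if kind == "SKIP":
            continue  # whitespace separates tokens but is not a token
        if kind == "ID" and lexeme in KEYWORDS:
            kind = "KEYWORD"  # reserved words look like identifiers
        tokens.append((kind, lexeme))
    return tokens


print(tokenize("newval := oldval + 12"))
```

Reserved words are recognized by first matching them as identifiers and then checking a keyword set, which keeps the pattern list short.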

▪The other important task of the lexical analyzer is to build a
symbol table
▪This is a table of all the identifiers (variable names,
procedures, and constants) used in the program
▪When an identifier is first recognized by the analyzer, it is
inserted into the symbol table, along with information about its
type, where it is to be stored, and so forth
▪This information is used in subsequent passes of the compiler
SYNTAX ANALYSIS (PARSING)
▪Tokens found during scanning are grouped together according to a
context-free grammar (CFG)
▪The grammar is a set of rules that define the valid structures of
the programming language
▪Each token is associated with a specific rule and grouped
accordingly
▪This process is called parsing
▪The output of this phase is a parse (syntax) tree or a derivation
▪If the program follows the rules of the language, then it is
syntactically correct
▪When the parser encounters a mistake, it issues a warning or
error message and tries to continue
▪When the parser reaches the end of the token stream, it will
tell the compiler that either the program is grammatically
correct and compiling can continue or the program contains too
many errors and compiling must be aborted
▪If a parse tree can be built whose leaves are all tokens, the
corresponding statement is valid
▪Given a CFG in Backus Naur Form (BNF) [grammar figure not recovered]
▪Parse tree for the statement below [figure not recovered]:
newval := oldval + 12
▪Derivation:
assign-stmt -> identifier := expression
            -> newval := expression
            -> newval := expression + expression
            -> newval := identifier + expression
            -> newval := oldval + expression
            -> newval := oldval + number
            -> newval := oldval + 12
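
The derivation above can be mirrored by a small hand-written parser. This is only a sketch: the productions (assign-stmt -> identifier := expression; expression -> expression + expression | identifier | number) are inferred from the derivation steps, the function names are hypothetical, and tokens are obtained by splitting on spaces. Because the inferred expression rule is left-recursive, the sketch parses a list of operands joined by + with a loop instead of following the rule literally.

```python
def parse_assign(tokens):
    """assign-stmt -> identifier := expression"""
    if len(tokens) < 3 or not tokens[0].isidentifier() or tokens[1] != ":=":
        raise SyntaxError("expected: identifier := expression")
    tree, pos = parse_expression(tokens, 2)
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r}")
    return ("assign-stmt", ("identifier", tokens[0]), ":=", tree)


def parse_expression(tokens, pos):
    """Parse operand (+ operand)*, building a left-leaning tree."""
    tree, pos = parse_operand(tokens, pos)
    while pos < len(tokens) and tokens[pos] == "+":
        right, pos = parse_operand(tokens, pos + 1)
        tree = ("expression", tree, "+", right)
    return tree, pos


def parse_operand(tokens, pos):
    """expression -> identifier | number"""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    tok = tokens[pos]
    if tok.isdigit():
        return ("expression", ("number", tok)), pos + 1
    if tok.isidentifier():
        return ("expression", ("identifier", tok)), pos + 1
    raise SyntaxError(f"unexpected token {tok!r}")


tree = parse_assign("newval := oldval + 12".split())
print(tree)
```

Every leaf of the returned tree is a token, which is the condition for the statement being syntactically valid. An input such as `newval := + 12` raises SyntaxError instead.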
SEMANTIC ANALYSIS
▪The parse tree or derivation is next checked for semantic errors
▪A semantic error occurs when a statement is syntactically correct but
disobeys the semantic rules of the source language
▪This phase detects things such as undeclared variables, functions called
with improper arguments, access violations, incompatible operands, etc.
▪Type checking is an important part of the semantic analyzer
▪Eg: int a[9], b, c;
c = a * b; //Syntactically correct but semantically incorrect: an array cannot be multiplied by an int
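
A type check of this kind can be sketched as a lookup of declared types followed by a compatibility test. The type names ("int", "int[9]"), the dictionary layout, and the function check_multiply are assumptions made for illustration; a real semantic analyzer works on the parse tree and the symbol table.

```python
# Declared types, mirroring the declarations: int a[9], b, c;
declared = {"a": "int[9]", "b": "int", "c": "int"}


def check_multiply(target, left, right, symbols):
    """Return semantic error messages for the statement target = left * right."""
    errors = []
    # Every name must be declared before it is used.
    for name in (target, left, right):
        if name not in symbols:
            errors.append(f"undeclared variable '{name}'")
    if errors:
        return errors
    # In this sketch, '*' is only defined for plain int operands.
    for name in (left, right):
        if symbols[name] != "int":
            errors.append(
                f"operand '{name}' has type {symbols[name]}, expected int"
            )
    if symbols[target] != "int":
        errors.append(f"cannot assign int result to '{target}'")
    return errors


print(check_multiply("c", "a", "b", declared))
print(check_multiply("c", "b", "b", declared))
```

The first call reports the array operand; the second call returns an empty list because both operands are plain integers.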
INTERMEDIATE CODE GENERATION
▪Many compilers use an intermediate language for
analyzing and optimizing the source program
▪An intermediate language should have two important properties:
▪It should be simple and easy to produce
▪It should be easy to translate into the target program
▪Intermediate codes are generally machine (architecture)
independent
▪But the level of intermediate codes is close to the level of
machine codes
▪A common form used for intermediate codes is Three Address
Code (TAC)
▪TAC looks like assembly language but does not represent a
particular architecture
▪TAC is a sequence of simple instructions, each of which can
have at most 3 addresses (typically two operands and one result)
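
Generating TAC from an expression can be sketched as a post-order walk that stores each intermediate result in a fresh temporary. The tuple encoding of the expression tree, the function name gen_tac, and the assumed source statement a = (b * c + 0) + b * c are illustrative choices, not part of the slides.

```python
import itertools


def gen_tac(target, expr):
    """Translate target = expr into a list of three-address instructions.

    expr is a leaf (variable name or constant) or a tuple (op, left, right).
    """
    code = []
    temps = (f"t{i}" for i in itertools.count(1))  # t1, t2, t3, ...

    def walk(node):
        if not isinstance(node, tuple):
            return str(node)  # leaf: nothing to compute
        op, left, right = node
        left_name = walk(left)    # operands first (post-order)
        right_name = walk(right)
        temp = next(temps)
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    result = walk(expr)
    code.append(f"{target} = {result}")
    return code


# Assumed source statement: a = (b * c + 0) + b * c
expr = ("+", ("+", ("*", "b", "c"), 0), ("*", "b", "c"))
for line in gen_tac("a", expr):
    print(line)
```

Each emitted instruction has one result and at most two operands. The walk makes no attempt to reuse temporaries or to simplify the code; that is left to the optimization phase.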

INTERMEDIATE CODE OPTIMIZATION
▪The optimizer accepts input in the intermediate representation
(TAC) and outputs a streamlined version still in intermediate
representation
▪In this phase, the compiler attempts to produce the smallest,
fastest and most efficient running result by applying various
techniques such as:
▪Removing unused variables
▪Eliminating multiplication by 1 and addition of 0
▪Loop optimization
▪Suppressing code generation for unreachable code segments, etc.
▪The optimization phase slows down the compiler, so most compilers
allow it to be suppressed or leave it turned off by default
▪Example:
t1 = b * c
t2 = t1 + 0
t3 = b * c
t4 = t2 + t3
a = t4
▪Optimization:
t1 = b * c
a = t1 + t1
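
The rewrite in this example can be reproduced by a few local passes over straight-line TAC. This is a sketch under stated assumptions: instructions are encoded as (dst, left, op, right) tuples, temporaries are named t1, t2, ..., every name is assigned at most once in the block, and temporaries are not used after the block. The names optimize and is_temp are hypothetical, and only three techniques are shown (removing + 0 and * 1, reusing a repeated subexpression, and dropping copies through temporaries).

```python
def is_temp(name):
    """Compiler-generated temporaries are assumed to be named t1, t2, ..."""
    return name[0] == "t" and name[1:].isdigit()


def optimize(code):
    """Simplify a list of (dst, left, op, right); op None means dst = left."""
    alias = {}  # temporary -> name that already holds the same value
    seen = {}   # (left, op, right) -> temporary that already computed it
    out = []
    for dst, left, op, right in code:
        left = alias.get(left, left)
        right = alias.get(right, right)
        # Algebraic simplification: x + 0 and x * 1 are just x.
        if (op == "+" and right == "0") or (op == "*" and right == "1"):
            op, right = None, None
        if op is None:
            if is_temp(dst):
                alias[dst] = left  # copy through a temporary: drop it
            elif out and is_temp(left) and out[-1][0] == left:
                # The temporary was computed only to be copied into dst,
                # so store the result in dst directly.
                out[-1] = (dst,) + out[-1][1:]
                seen = {k: v for k, v in seen.items() if v != left}
            else:
                out.append((dst, left, None, None))
            continue
        key = (left, op, right)
        if is_temp(dst) and key in seen:
            alias[dst] = seen[key]  # common subexpression: reuse it
            continue
        if is_temp(dst):
            seen[key] = dst
        out.append((dst, left, op, right))
    return out


code = [
    ("t1", "b", "*", "c"),
    ("t2", "t1", "+", "0"),
    ("t3", "b", "*", "c"),
    ("t4", "t2", "+", "t3"),
    ("a", "t4", None, None),
]
for dst, left, op, right in optimize(code):
    print(f"{dst} = {left}" if op is None else f"{dst} = {left} {op} {right}")
```

On the five-instruction example this prints `t1 = b * c` followed by `a = t1 + t1`. The single-assignment assumption matters: if b or c could be reassigned inside the block, the table of seen expressions would have to be invalidated.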

(OBJECT) CODE GENERATION
▪This phase takes the intermediate code produced by the
optimizer and generates the final code in the target language
▪It is this part of compilation that is machine
dependent
▪The target code is normally a relocatable object file
containing machine or assembly code
▪The TAC is translated into a sequence of assembly or machine
language instructions that perform the same task
▪Example (TAC):
t1 = b * c
a = t1 + t1
▪Corresponding assembly code (target program):
LDA R1, b
LDA R2, c
MUL R1, R2
STA t1, R1
MOV R3, t1
ADD R3, t1
MOV a, R3
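
A naive code generator can be sketched as a fixed template applied to each TAC instruction: load the first operand into a register, apply the operation with the second operand, and store the result. The mnemonics LDA, STA, ADD and MUL are taken from the listing above and treated as a hypothetical instruction set; SUB and DIV, the tuple encoding, the function name gen_code, and the use of a fresh register per instruction are assumptions. The output is therefore equivalent in effect to the listing above but does not match it instruction for instruction.

```python
# Operator-to-mnemonic table; SUB and DIV are assumed for completeness.
OPCODES = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def gen_code(tac):
    """Translate (dst, left, op, right) tuples into assembly-like lines."""
    asm = []
    for n, (dst, left, op, right) in enumerate(tac, start=1):
        reg = f"R{n}"  # naive: one fresh register per instruction
        asm.append(f"LDA {reg}, {left}")              # load first operand
        if op is not None:
            asm.append(f"{OPCODES[op]} {reg}, {right}")  # apply the operator
        asm.append(f"STA {dst}, {reg}")               # store the result
    return asm


tac = [("t1", "b", "*", "c"), ("a", "t1", "+", "t1")]
for line in gen_code(tac):
    print(line)
```

A real code generator must also decide which values stay in registers, since the number of registers is limited; this sketch ignores that problem entirely.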
OBJECT CODE OPTIMIZATION
▪In this phase, the object code is transformed into more efficient
code by making better use of the processor and its registers
▪The compiler can take advantage of machine specific idioms,
specialized instructions, pipelining, branch prediction and other
optimization techniques
▪As with intermediate code optimization, this phase of the compiler
is usually configurable and can be skipped entirely

▪Optimized version of above example:
LDA R1, b
MUL R1, c
STA t1, R1
ADD R1, t1
MOV a, R1



SYMBOL TABLE
▪A symbol table stores information about keywords and tokens
found during lexical analysis
▪The symbol table is consulted in almost all phases of the
compiler
▪Examples:
Insert("dist", id) //Inserts a symbol table entry associating the
//string "dist" with token type "id"
Lookup("dist") //Looks up the string "dist" in the symbol table.
//If found, a reference to its "id" entry is returned;
//otherwise Lookup returns 0.
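
The two operations can be sketched with a dictionary that maps each string to an entry number. The class name SymbolTable, the entry layout, and the helper method entry are assumptions; the convention that a failed lookup returns 0 follows the description above, so entry numbers start at 1.

```python
class SymbolTable:
    """Maps lexemes to numbered entries; entry 0 means 'not found'."""

    def __init__(self):
        self._index = {}        # lexeme -> entry number
        self._entries = [None]  # slot 0 is reserved for 'not found'

    def insert(self, lexeme, token_type):
        """Add lexeme if it is absent and return its entry number."""
        if lexeme not in self._index:
            self._index[lexeme] = len(self._entries)
            self._entries.append({"lexeme": lexeme, "token": token_type})
        return self._index[lexeme]

    def lookup(self, lexeme):
        """Return the entry number for lexeme, or 0 if it is absent."""
        return self._index.get(lexeme, 0)

    def entry(self, ref):
        """Return the stored record for a non-zero entry number."""
        return self._entries[ref]


table = SymbolTable()
ref = table.insert("dist", "id")
print(ref, table.lookup("dist"), table.lookup("rate"))
```

Later phases can attach further attributes, such as type or storage location, to the record returned by entry.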



ERROR HANDLING
▪Errors may be encountered in different phases of compiler
▪Objective of error handling is to go as far as possible in
compilation whenever an error is encountered
▪Examples:
▪Handling a missing symbol during lexical analysis by inserting
the missing symbol
▪Automatic type conversion during semantic analysis



EXPLORE
● A simple one-pass compiler

● One-pass/Multi-pass compiler

https://pages.cs.wisc.edu/~fischer/cs536.s08/course.hold/html/NOTES/1.OVERVIEW.html
