UNIT I-Introduction To Compiler
P Institute of Technology
Coimbatore- 48
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
Dr.K.Moorthi
ASP/CSE
CS8602- COMPILER DESIGN
PREREQUISITES FOR LEARNING COMPILER DESIGN
• Knowledge of Automata Theory
• Computer Architecture
• Data structures
• Logic or algebra
Types
• Compiler
• Assembler
• Interpreter
TYPES OF TRANSLATORS
1. Interpreter
2. Compiler
3. Assembler
[Figure: phases of a compiler]
Lexical Analysis → Stream of Tokens
Syntax Analysis → Parse Tree
Semantic Analysis → Parse Tree (semantically verified)
Intermediate code generation → Three Address Code
Code optimization
Code generation → Target Program
Symbol table management and Error handling are connected to all phases.
Phases : LEXICAL ANALYZER
• The source program is scanned as a stream of characters, and the characters are
grouped into sequences called lexemes. For each lexeme, a token is produced as output.
• Token: A token names a lexical unit, such as a keyword, operator, or identifier,
whose lexemes match a given pattern.
• Pattern: A pattern is the rule describing the form that the lexemes of a token may
take. It is the structure that strings must match.
• When a token for an identifier is generated, the corresponding entry is made in the
symbol table.
• Output: Stream of tokens
Lexemes    Tokens
c          identifier
=          assignment symbol
a          identifier
+          addition symbol
b          identifier
*          multiplication symbol
5          number
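The grouping of characters into lexemes and tokens described above can be sketched as a small hand-written scanner. This is a minimal illustration, not a reference implementation: the function name `scan`, the token names, and the choice to number identifiers by their order of first appearance are assumptions made for this example.

```python
def scan(source):
    """Group characters into lexemes and emit (token, attribute) pairs."""
    symbol_table = []   # identifier names; list position + 1 is the id index
    tokens = []
    i = 0
    while i < len(source):
        ch = source[i]
        if ch.isspace():
            i += 1                                   # whitespace separates lexemes
        elif ch.isalpha() or ch == "_":
            start = i
            while i < len(source) and (source[i].isalnum() or source[i] == "_"):
                i += 1
            lexeme = source[start:i]
            if lexeme not in symbol_table:           # first sighting: make an entry
                symbol_table.append(lexeme)
            tokens.append(("id", symbol_table.index(lexeme) + 1))
        elif ch.isdigit():
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", int(source[start:i])))
        elif ch in "=+*;":
            tokens.append((ch, None))                # operators carry no attribute
            i += 1
        else:
            raise ValueError(f"lexical error: unexpected character {ch!r} at {i}")
    return tokens, symbol_table


tokens, table = scan("c=a+b*5;")
print(tokens)
print(table)
```

For the input c=a+b*5; the identifiers c, a, and b receive the indices 1, 2, and 3, which corresponds to the token stream <id, 1> <=> <id, 2> <+> <id, 3> <*> <5>.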
[Figure: parse tree for c = a + b * 5; the root assign-expression expands to expression = expression, with a subscript-expression on the left and an additive-expression on the right]
c = a + b * 5;
<id, 1> <=> <id, 2> <+> <id, 3> <*> <5>
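How a syntax analyzer groups this token stream into a tree can be sketched with a small recursive-descent parser. The grammar used here (assignment, +, and * only), the function name `parse_assignment`, and the nested-tuple tree format are assumptions chosen for illustration. For brevity the sketch takes the lexemes themselves as input rather than <id, n> tokens.

```python
def parse_assignment(tokens):
    """Parse  operand = expr ;  and return a nested-tuple tree.

    Assumed grammar:
        assign -> operand '=' expr ';'
        expr   -> term ('+' term)*
        term   -> factor ('*' factor)*
        factor -> identifier or number
    """
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(value):
        nonlocal pos
        if peek() != value:
            raise SyntaxError(f"expected {value!r}, found {peek()!r}")
        pos += 1

    def factor():
        nonlocal pos
        tok = peek()
        if tok is None or not tok.isalnum():
            raise SyntaxError(f"expected operand, found {tok!r}")
        pos += 1
        return tok

    def term():
        node = factor()
        while peek() == "*":          # '*' binds tighter than '+'
            expect("*")
            node = ("*", node, factor())
        return node

    def expr():
        node = term()
        while peek() == "+":
            expect("+")
            node = ("+", node, term())
        return node

    target = factor()
    expect("=")
    tree = ("=", target, expr())
    expect(";")
    return tree


tree = parse_assignment(["c", "=", "a", "+", "b", "*", "5", ";"])
print(tree)  # ('=', 'c', ('+', 'a', ('*', 'b', '5')))
```

Because term is called from inside expr, b * 5 is grouped before the addition, which is the same precedence the parse tree expresses. A missing semicolon raises SyntaxError.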
SEMANTIC ANALYZER
c = a + b * 5;
<id, 1> <=> <id, 2> <+> <id, 3> <*> <5>
CODE GENERATOR
• Gets input from code optimization phase and produces the target code
or object code as result.
• Intermediate instructions are translated into a sequence of machine
instructions that perform the same task.
Intermediate code:
t1 = inttofloat(5)
t2 = id3 * t1
t3 = id2 + t2
id1 = t3
Optimized code:
t1 = id3 * 5.0
id1 = id2 + t1
Target code:
LDF R2, id3
MULF R2, #5.0
LDF R1, id2
ADDF R1, R2
STF id1, R1
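The unoptimized three-address code in this example (t1 = inttofloat(5) through id1 = t3) can be reproduced with a short tree walk. This is a sketch under stated assumptions: the tuple tree format and the function name `gen_three_address` are hypothetical, and every identifier is assumed to be of floating-point type, so each integer literal is converted with inttofloat.

```python
def gen_three_address(tree):
    """Emit three-address code for ('=', target, expr) given as nested tuples."""
    code = []
    counter = 0

    def new_temp():
        nonlocal counter
        counter += 1
        return f"t{counter}"

    def walk(node):
        if isinstance(node, int):
            # Assumption: operands are floats, so integer literals are converted.
            temp = new_temp()
            code.append(f"{temp} = inttofloat({node})")
            return temp
        if isinstance(node, str):
            return node                       # identifiers are used directly
        op, left, right = node
        left_name = walk(left)
        right_name = walk(right)
        temp = new_temp()
        code.append(f"{temp} = {left_name} {op} {right_name}")
        return temp

    _, target, expr = tree
    result = walk(expr)
    code.append(f"{target} = {result}")
    return code


tac = gen_three_address(("=", "id1", ("+", "id2", ("*", "id3", 5))))
print("\n".join(tac))
```

Each interior node of the tree yields one instruction with at most one operator on the right-hand side, which is what makes the result three-address code.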
SYMBOL TABLE
• Used to store all the information about identifiers used in the program.
• It is a data structure containing a record for each identifier, with fields for
the attributes of the identifier.
• It allows the record for each identifier to be found quickly, and data to be
stored in or retrieved from that record.
• Whenever an identifier is detected in any of the phases, it is stored in the
symbol table.
Example:
Symbol name    Type     Address
b              int      1002
c              float    1004
z              char     1008
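A symbol table of this kind can be sketched as a dictionary keyed by identifier name, which gives fast lookup of each record. The class and method names (`SymbolRecord`, `SymbolTable`, `insert`, `lookup`) are assumptions for illustration; the rows are the ones from the example table.

```python
from dataclasses import dataclass


@dataclass
class SymbolRecord:
    name: str
    type: str
    address: int


class SymbolTable:
    """One record per identifier, with fields for its attributes."""

    def __init__(self):
        self._records = {}            # name -> SymbolRecord

    def insert(self, name, type_, address):
        # Keep the existing record if the identifier was already entered.
        if name not in self._records:
            self._records[name] = SymbolRecord(name, type_, address)
        return self._records[name]

    def lookup(self, name):
        # Returns None for identifiers that were never entered.
        return self._records.get(name)


table = SymbolTable()
table.insert("b", "int", 1002)
table.insert("c", "float", 1004)
table.insert("z", "char", 1008)
print(table.lookup("c"))
```

A hash-based dictionary is only one possible choice. It is used here because average-case lookup and insertion do not slow down as the number of identifiers grows.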
Lexical analysis:
• Faulty sequence of characters that does not form a token, e.g. Ö, 5EL, %K, "string
Syntax analysis:
• Syntax error, e.g. a missing semicolon or unbalanced parentheses: (4 * (y + 5) - 12))
Semantic analysis:
• Type conflict, e.g. "HEJ" + 5
Code optimization:
• Uninitialized variables, anomaly detection.
The front end/back end model of the compiler is advantageous for the
following reasons:
1. By keeping the same front end and attaching different back ends, one can
produce compilers for the same source language on different machines.
2. By keeping different front ends and the same back end, one can compile
several different languages on the same machine.
• In the first pass, the source program is scanned completely, and the
output is an easy-to-use form that is equivalent to the source program
and carries additional information about storage allocation.
• It is possible to leave blank slots for missing information and fill
each slot when the information becomes available. Hence the compilation
process may require more than one pass.
• A typical arrangement for an optimizing compiler is one pass for
scanning and parsing, one pass for semantic analysis, and a third pass
for code generation and target code optimization. C and Pascal
permit one-pass compilation; Modula-2 requires two passes.
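The idea of leaving a blank slot and filling it later can be sketched with forward jumps: a jump to a label that has not been seen yet is emitted with an empty target, and the slot is filled once the label is defined. The `JMP` mnemonic, the `name:` label syntax, and the function name `assemble` are assumptions made for this example only.

```python
def assemble(lines):
    """Resolve jump targets in one pass by filling blank slots later."""
    code = []       # emitted instructions
    labels = {}     # label name -> index of the instruction it marks
    pending = {}    # label name -> indices of jumps still waiting for it

    for line in lines:
        if line.endswith(":"):
            name = line[:-1]
            labels[name] = len(code)
            # The information is now available: fill every waiting slot.
            for index in pending.pop(name, []):
                code[index] = f"JMP {labels[name]}"
        elif line.startswith("JMP "):
            name = line[4:]
            if name in labels:
                code.append(f"JMP {labels[name]}")
            else:
                pending.setdefault(name, []).append(len(code))
                code.append(None)             # blank slot for the missing target
        else:
            code.append(line)

    if pending:
        raise ValueError(f"undefined labels: {sorted(pending)}")
    return code


program = ["JMP end", "LDF R1, id2", "end:", "STF id1, R1"]
print(assemble(program))  # ['JMP 2', 'LDF R1, id2', 'STF id1, R1']
```

The alternative is a second pass over the program after all labels are known, which is the trade-off between slot filling and multiple passes.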
Grouping of Phases
1. Parser generators.
2. Scanner generators.
3. Syntax-directed translation engines.
4. Automatic code generators.
5. Data-flow analysis engines.
PARSER GENERATORS
• Input: Grammatical description of a programming language
• Output: Syntax analyzers.
• Parser generator takes the grammatical description of a programming
language and produces a syntax analyzer.
SCANNER GENERATORS
• Input: Regular expression description of the tokens of a language
• Output: Lexical analyzers.
• Scanner generator generates lexical analyzers from a regular expression
description of the tokens of a language.
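The input/output relationship of a scanner generator can be imitated in a few lines: a function takes regular-expression descriptions of the tokens and returns a lexical analyzer. This is only a sketch built on Python's `re` module, not how tools such as Lex are implemented. The names `make_scanner` and `lex`, the token names, and the `SKIP` convention for whitespace are assumptions for this example.

```python
import re


def make_scanner(token_spec):
    """Build a lexical analyzer from (token_name, regex) pairs."""
    # Combine all patterns into one alternation with a named group per token.
    pattern = re.compile(
        "|".join(f"(?P<{name}>{regex})" for name, regex in token_spec)
    )

    def scanner(text):
        pos = 0
        result = []
        while pos < len(text):
            match = pattern.match(text, pos)
            if match is None or match.end() == pos:
                raise ValueError(f"lexical error at position {pos}")
            if match.lastgroup != "SKIP":     # SKIP lexemes are discarded
                result.append((match.lastgroup, match.group()))
            pos = match.end()
        return result

    return scanner


spec = [
    ("ID", r"[A-Za-z_]\w*"),
    ("NUMBER", r"\d+"),
    ("OP", r"[=+*;]"),
    ("SKIP", r"\s+"),
]
lex = make_scanner(spec)
print(lex("c = a + b * 5;"))
```

Changing the language's tokens only requires editing `spec`; the scanning loop stays the same, which is the main benefit of generating a scanner from a description.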
• Binary Translation
• Hardware Synthesis
• Database Query Interpreters
• Compiled Simulation