VMKV Engineering College Department of Computer Science & Engineering Principles of Compiler Design Unit I Part-A
PART-A
1. What is a compiler?
A program that converts another program from some source
language (or programming language) to machine language
(object code).
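8. Define translator
A translator is a program that translates a program written in one
language (for example, a high-level language) into an equivalent
program in another language (for example, a low-level language).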
Tokens: position, initial, rate (identifiers); +, * (operators); 60 (number)
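Token specification
Alphabet : a finite set of symbols.
String : a finite sequence of symbols drawn from an alphabet.
Language: a set of strings over a fixed alphabet.
Parts of string: prefix, suffix, substring and subsequence.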
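Operations on Languages
•Concatenation:
o L1L2 = { s1s2 | s1 ∈ L1 and s2 ∈ L2 }
•Union:
o L1 ∪ L2 = { s | s ∈ L1 or s ∈ L2 }
•Exponentiation:
o L^0 = {ε} L^1 = L L^2 = LL
•Kleene Closure:
o L* = zero or more occurrences of L (the union of L^i for all i ≥ 0)
•Positive Closure:
o L+ = one or more occurrences of L (the union of L^i for all i ≥ 1)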
Example
L1 = {a,b,c,d} L2 = {1,2}
L1L2 = {a1,a2,b1,b2,c1,c2,d1,d2}
L1 ∪ L2 = {a,b,c,d,1,2}
L1^3 = all strings of length three (using a,b,c,d)
L1* = all strings using the letters a,b,c,d, including the empty string
L1+ = the same set without the empty string
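The operations above can be tried out on small finite languages. The following is a minimal Python sketch, assuming languages are represented as sets of strings; the helper names concat, power and kleene_upto are made up for this illustration, and kleene_upto only approximates L* up to a chosen power because L* itself is infinite.

```python
from itertools import product

def concat(l1, l2):
    """L1L2 = { s1s2 | s1 in L1 and s2 in L2 }"""
    return {s1 + s2 for s1, s2 in product(l1, l2)}

def power(lang, n):
    """L^0 = {""} (the empty string), L^n = L^(n-1) L"""
    result = {""}
    for _ in range(n):
        result = concat(result, lang)
    return result

def kleene_upto(lang, max_power):
    """Finite approximation of L*: the union of L^0 .. L^max_power."""
    result = set()
    for i in range(max_power + 1):
        result |= power(lang, i)
    return result

L1 = {"a", "b", "c", "d"}
L2 = {"1", "2"}

print(sorted(concat(L1, L2)))   # concatenation L1L2
print(sorted(L1 | L2))          # union
print(len(power(L1, 3)))        # number of strings of length three
```

Positive closure is the same union without L^0, so for this L1 it is kleene_upto(L1, n) minus the empty string.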
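Regular Expressions
Regular expression      Language it denotes
ε                       {ε}
a ∈ Σ                   {a}
(r1) | (r2)             L(r1) ∪ L(r2)
(r1) (r2)               L(r1) L(r2)
(r)*                    (L(r))*
(r)                     L(r)
Shorthand:
(r)+ = (r)(r)*
(r)? = (r) | ε
Precedence of operators:
o * highest
o concatenation next
o | lowest
ab*|c means (a(b)*)|(c)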
Examples:
Σ = {0,1}
0|1 => {0,1}
(0|1)(0|1) => {00,01,10,11}
0* => {ε,0,00,000,0000,....}
(0|1)* => all strings with 0 and 1, including the empty string
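Regular Definitions
A regular definition is a sequence of definitions of the form
d1 → r1
d2 → r2
...
dn → rn
where each di is a distinct name and each ri is a regular expression
over Σ ∪ {d1,d2,...,di-1}, i.e. over the basic symbols and the
previously defined names.
Ex: Unsigned numbers in Pascal
digit → 0 | 1 | ... | 9
digits → digit +
opt-fraction → ( . digits ) ?
opt-exponent → ( E (+|-)? digits ) ?
unsigned-num → digits opt-fraction opt-exponent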
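The regular definition above translates almost name for name into a pattern for Python's re module. This is only a sketch: the constant names and the helper is_unsigned_num are chosen for the illustration, and re.fullmatch is used so that the whole string must match.

```python
import re

# Each name mirrors one line of the regular definition.
DIGIT = r"[0-9]"
DIGITS = rf"{DIGIT}+"
OPT_FRACTION = rf"(\.{DIGITS})?"
OPT_EXPONENT = rf"(E[+-]?{DIGITS})?"
UNSIGNED_NUM = re.compile(rf"{DIGITS}{OPT_FRACTION}{OPT_EXPONENT}")

def is_unsigned_num(text):
    """True if the whole string is an unsigned number."""
    return UNSIGNED_NUM.fullmatch(text) is not None

for sample in ["12", "6.02E+23", "1E5", ".5", "1."]:
    print(sample, is_unsigned_num(sample))
```

Strings such as ".5" and "1." are rejected because digits must appear on both sides of the point.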
E → E+T | T
T → T*F | F
F → G^F | G
G → id | (E)
Left Recursion
A grammar is left recursive if it has a non-terminal A such that
there is a derivation
o A ⇒+ Aα for some string α
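Immediate Left-Recursion
A → Aα | β where β does not start with A
is replaced by
A → βA’
A’ → αA’ | ε an equivalent grammar
In general,
A → Aα1 | ... | Aαm | β1 | ... | βn where β1 ... βn do not start with A
is replaced by
A → β1A’ | ... | βnA’
A’ → α1A’ | ... | αmA’ | ε
Example:
E → E+T | T
T → T*F | F
F → id | (E)
After eliminating the immediate left recursion:
E → T E’
E’ → +T E’ | ε
T → F T’
T’ → *F T’ | ε
F → id | (E)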
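The rewriting rule for immediate left recursion is mechanical, so it can be sketched in a few lines of Python. The representation is an assumption made for this example: each alternative is a list of symbols, the empty list stands for ε, and the new non-terminal is named by appending an apostrophe. The function name eliminate_immediate is hypothetical.

```python
def eliminate_immediate(nonterminal, alternatives):
    """Rewrite A -> A a1 | ... | A am | b1 | ... | bn without left recursion.

    alternatives is a list of symbol lists; [] stands for the empty string.
    Returns a dict mapping each non-terminal to its new alternatives.
    """
    # The alpha parts: what follows A in the left-recursive alternatives.
    recursive = [alt[1:] for alt in alternatives if alt and alt[0] == nonterminal]
    # The beta parts: alternatives that do not start with A.
    others = [alt for alt in alternatives if not alt or alt[0] != nonterminal]
    if not recursive:
        return {nonterminal: alternatives}
    new_nt = nonterminal + "'"
    return {
        nonterminal: [beta + [new_nt] for beta in others],
        new_nt: [alpha + [new_nt] for alpha in recursive] + [[]],
    }

print(eliminate_immediate("E", [["E", "+", "T"], ["T"]]))
print(eliminate_immediate("T", [["T", "*", "F"], ["F"]]))
```

Applied to E → E+T | T this gives E → T E' and E' → + T E' | ε, matching the example grammar.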
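Left-Recursion – Problem
A grammar may have no immediate left recursion and still be
left-recursive.
S → Aa | b
A → Sc | d
S ⇒ Aa ⇒ Sca or A ⇒ Sc ⇒ Aac causes a left recursion.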
- Arrange the non-terminals in some order: A1 ... An
- for i from 1 to n do {
-   for j from 1 to i-1 do {
      replace each production
        Ai → Aj γ
      by
        Ai → α1 γ | ... | αk γ
      where Aj → α1 | ... | αk
    }
-   eliminate immediate left-recursions among Ai productions
  }
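Example:
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: S, A
for S:
- we do not enter the inner loop.
- there is no immediate left recursion in S.
for A:
- Replace A → Sd with A → Aad | bd, so we have A → Ac | Aad | bd | f
- Eliminate the immediate left-recursion in A
A → bdA’ | fA’
A’ → cA’ | adA’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → Aa | b
A → bdA’ | fA’
A’ → cA’ | adA’ | ε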
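Example (same grammar, different order):
S → Aa | b
A → Ac | Sd | f
- Order of non-terminals: A, S
for A:
- we do not enter the inner loop.
- Eliminate the immediate left-recursion in A
A → SdA’ | fA’
A’ → cA’ | ε
for S:
- Replace S → Aa with S → SdA’a | fA’a, so we have S → SdA’a | fA’a | b
- Eliminate the immediate left-recursion in S
S → fA’aS’ | bS’
S’ → dA’aS’ | ε
So, the resulting equivalent grammar which is not left-recursive is:
S → fA’aS’ | bS’
S’ → dA’aS’ | ε
A → SdA’ | fA’
A’ → cA’ | ε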
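Both worked examples can be reproduced with a short Python sketch of the ordering algorithm. The representation is an assumption of the sketch (a dict from non-terminal to a list of alternatives, each a list of symbols, with [] for ε), and the function name eliminate_left_recursion is hypothetical. It also assumes the grammar has no cycles and no ε-productions before the transformation.

```python
def eliminate_left_recursion(grammar, order):
    """Remove left recursion, processing non-terminals in the given order."""
    g = {nt: [list(alt) for alt in alts] for nt, alts in grammar.items()}
    for i, ai in enumerate(order):
        for aj in order[:i]:
            expanded = []
            for alt in g[ai]:
                if alt and alt[0] == aj:
                    # Ai -> Aj y  becomes  Ai -> a1 y | ... | ak y
                    expanded.extend(alpha + alt[1:] for alpha in g[aj])
                else:
                    expanded.append(alt)
            g[ai] = expanded
        # Eliminate immediate left recursion among the Ai productions.
        recursive = [alt[1:] for alt in g[ai] if alt and alt[0] == ai]
        if recursive:
            rest = [alt for alt in g[ai] if not alt or alt[0] != ai]
            new_nt = ai + "'"
            g[ai] = [beta + [new_nt] for beta in rest]
            g[new_nt] = [alpha + [new_nt] for alpha in recursive] + [[]]
    return g

GRAMMAR = {
    "S": [["A", "a"], ["b"]],
    "A": [["A", "c"], ["S", "d"], ["f"]],
}

for order in (["S", "A"], ["A", "S"]):
    print(order, eliminate_left_recursion(GRAMMAR, order))
```

The two orders give different but equivalent grammars, which is why the notes show the same grammar twice.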
Predictive Parser
When expanding a non-terminal, the parser chooses the production by
looking only at the current token.
stmt → if ...... |
       while ...... |
       begin ...... |
       for .....
A → aBb | bAB
proc A {
  case of the current token {
    ‘a’: - match the current token with a, and move to the next token;
         - call ‘B’;
         - match the current token with b, and move to the next token;
    ‘b’: - match the current token with b, and move to the next token;
         - call ‘A’;
         - call ‘B’;
  }
}
A → aBe | cBd | C
B → bB | ε
C → f
proc C
{
  match the current token with f, and move to the next token;
}
proc A
{
  case of the current token
  {
    a: - match the current token with a, and move to the next token;
       - call B;
       - match the current token with e, and move to the next token;
    c: - match the current token with c, and move to the next token;
       - call B;
       - match the current token with d, and move to the next token;
    f: - call C
  }
}
proc B
{
  case of the current token
  {
    b: - match the current token with b, and move to the next token;
       - call B;
    e,d: do nothing
  }
}
f: first set of C
e,d: follow set of B
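The three procedures can be written as a small recursive predictive parser in Python. This is a sketch under stated assumptions: every token is a single character, the end of input is marked with $, and the class name Parser, the exception ParseError and the helper accepts are invented for the example.

```python
class ParseError(Exception):
    pass

class Parser:
    """Recursive predictive parser for A -> aBe | cBd | C, B -> bB | ε, C -> f."""

    def __init__(self, text):
        self.tokens = list(text) + ["$"]   # $ marks the end of the input
        self.pos = 0

    @property
    def current(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.current != expected:
            raise ParseError(f"expected {expected!r}, found {self.current!r}")
        self.pos += 1

    def A(self):
        if self.current == "a":
            self.match("a"); self.B(); self.match("e")
        elif self.current == "c":
            self.match("c"); self.B(); self.match("d")
        elif self.current == "f":          # f is in the first set of C
            self.C()
        else:
            raise ParseError(f"unexpected token {self.current!r} in A")

    def B(self):
        if self.current == "b":
            self.match("b"); self.B()
        elif self.current in ("e", "d"):   # follow set of B: choose B -> ε
            pass
        else:
            raise ParseError(f"unexpected token {self.current!r} in B")

    def C(self):
        self.match("f")

def accepts(text):
    parser = Parser(text)
    try:
        parser.A()
        parser.match("$")
        return True
    except ParseError:
        return False

for s in ["abbe", "cd", "f", "ab", "abd"]:
    print(s, accepts(s))
```

No procedure ever backtracks: each one decides what to do from the current token alone, which is the defining property of a predictive parser.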
LL(1) Parser
[Figure: LL(1) parser with its input buffer, stack, parsing table and output]
input buffer
o our string to be parsed. We will assume that its end is marked with
a special symbol $.
output
The symbol at the top of the stack (say X) and the current symbol in the
input string (say a) determine the parser action.
Example:
S → aBa
B → bB | ε
stack     input     output
$S        abba$     S → aBa
$aBa      abba$
$aB       bba$      B → bB
$aBb      bba$
$aB       ba$       B → bB
$aBb      ba$
$aB       a$        B → ε
$a        a$
$         $         accept, successful completion
LL(1) Parsing Table
      a           b           $
S     S → aBa
B     B → ε       B → bB
Derivation (left-most): S ⇒ aBa ⇒ abBa ⇒ abbBa ⇒ abba
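The trace above can be reproduced with a small table-driven parser. The sketch below hard-codes the parsing table for S → aBa, B → bB | ε as a Python dict keyed by (non-terminal, input symbol); the names TABLE, NONTERMINALS and ll1_parse are assumptions of the example, and tokens are single characters.

```python
# Parsing table: (non-terminal, current input symbol) -> right-hand side.
TABLE = {
    ("S", "a"): ["a", "B", "a"],
    ("B", "a"): [],              # B -> ε
    ("B", "b"): ["b", "B"],
}
NONTERMINALS = {"S", "B"}

def ll1_parse(text, start="S"):
    """Return the list of productions used, or raise ValueError."""
    tokens = list(text) + ["$"]
    stack = ["$", start]         # the top of the stack is the end of the list
    pos = 0
    output = []
    while True:
        top = stack.pop()
        current = tokens[pos]
        if top == "$":
            if current == "$":
                return output    # accept
            raise ValueError(f"unexpected input {current!r} after the sentence")
        if top in NONTERMINALS:
            rhs = TABLE.get((top, current))
            if rhs is None:
                raise ValueError(f"no table entry for ({top}, {current})")
            output.append(f"{top} -> {' '.join(rhs) or 'ε'}")
            stack.extend(reversed(rhs))   # push so the leftmost symbol is on top
        elif top == current:
            pos += 1             # terminal on the stack matches the input
        else:
            raise ValueError(f"expected {top!r}, found {current!r}")

print(ll1_parse("abba"))
```

The productions are emitted in the order of the left-most derivation, exactly as in the output column of the trace.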
Unit II
PART-A
1. Define CFG
Context-Free Grammars
A context-free grammar consists of:
A finite set of terminals (in our case, this will be the set of tokens)
A finite set of non-terminals (syntactic variables)
A finite set of production rules of the form
A → α where A is a non-terminal and α is a string of
terminals and non-terminals (including the empty string)
A start symbol (one of the non-terminal symbols)
Parser
Parser works on a stream of tokens.
Ambiguity
A grammar that produces more than one parse tree for some sentence
is called an ambiguous grammar.
6. What is the role of the error handler in a parser?
3. Predictive Parsing
no backtracking
efficient
needs a special form of grammars (LL(1) grammars).
Recursive Predictive Parsing is a special form of Recursive
Descent parsing without backtracking.
Non-Recursive (Table Driven) Predictive Parser is also
known as LL(1) parser.
1. panic mode
2. phrase level
3. error productions
4. global correction.
5. Draw the block diagram for syntax analysis
[Figure: source program -> Lexical Analyzer -> token -> Parser -> parse tree; the Parser asks the Lexical Analyzer to "get next token"]
15. What are the terminals, non-terminals and start symbol for
the grammar
S → (L) | a
L → L,S | S
Terminals: ( ) , a
Non-terminals: S, L
Start symbol: S
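Eliminating the immediate left recursion from T → T,S | S (A → Aα | β
becomes A → βA’, A’ → αA’ | ε) gives an equivalent grammar:
T → ST’
T’ → ,ST’ | ε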
9. What is the need of left factoring
A predictive parser (a top-down parser without backtracking)
insists that the grammar must be left-factored.
Unit III
PART-A
Syntax trees
Postfix notation
Three address codes: [the semantic rules for generating three
address code from common programming languages]
1. initial,
2. position, and
3. rate.
(0) + a b T1
(4) = t4 x
Intermediate languages
Postfix notation
Three address codes: [the semantic rules for generating three
address code from common programming languages]
Graphical representation
Unit IV
PART-A
A basic block:
t1 := a * a
t2 := a * b
t3 := t1 - t2
Intermediate code
Parser -> Static checker -> Intermediate code generator -> Code generator
Structure-Preserving Transformations
Flow-of-Control Optimizations
Before                      After
goto L1                     goto L2
.                           .
L1: goto L2                 L1: goto L2
----------------------------------------------
if a<b goto L1              if a<b goto L2
.                           .
L1: goto L2                 L1: goto L2
----------------------------------------------
goto L1                     if a<b goto L2
.                           goto L3
L1: if a<b goto L2          .
L3:                         L3:
Address Modes
The source and destination fields are not long enough to hold
memory addresses. Certain bit-patterns in these fields specify that
words following the instruction (the instruction is also one word)
contain operand addresses (or constants).
Of course, there will be cost for having memory addresses and
constants in instructions.
We will use different addressing modes to get addresses of source
and destination.
18. What are the forms of the output of the code generator?
Target Programs (Output of Code Generation)
PART-B
Code Generation
Memory Management
Instruction Selection
Register Allocation
Target Machine
To design a code generator, we should be familiar with the
structure of the target machine and its instruction set.
Instead of a specific architecture, we will design our own simple
target machine for the code generation.
– We will decide the instruction set, but it will be close to
actual machine instructions.
– We will decide the sizes and speeds of the instructions, and we
will use them in the creation of good code generators.
– Although we do not use an actual target machine, our
discussions are also applicable to actual target machines.
ADD add source to destination
SUB subtract source from destination
MOV move source to destination
Run-Time Addresses
Stack Variables
o Stack variables are accessed using offsets from the beginning
of the activation records.
non-local variable
o access links
o displays
Basic Blocks
A basic block is a sequence of consecutive statements (of intermediate
code, i.e. quadruples) in which flow of control enters at the beginning and
leaves at the end without halt or possibility of branch (except at the end).
A basic block:
t1 := a * a
t2 := a * b
t3 := t1 - t2
Corresponding Quadruples
1: prod := 0
2: i := 1
3: t1 := 4*i
4: t2 := a[t1]
5: t3 := 4*i
6: t4 := b[t3]
7: t5 := t2*t4
8: t6 := prod+t5
9: prod := t6
10: t7 := i+1
11: i := t7
12: if i<=20 goto 3
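To split these twelve quadruples into basic blocks, one common approach is to find the "leaders" first. The rules used in the sketch below are the usual textbook ones and are an assumption here, since the notes only give the definition of a basic block: the first statement is a leader, the target of a goto is a leader, and the statement that follows a goto is a leader. The names CODE, find_leaders and basic_blocks are made up for the example.

```python
import re

CODE = {
    1: "prod := 0",
    2: "i := 1",
    3: "t1 := 4*i",
    4: "t2 := a[t1]",
    5: "t3 := 4*i",
    6: "t4 := b[t3]",
    7: "t5 := t2*t4",
    8: "t6 := prod+t5",
    9: "prod := t6",
    10: "t7 := i+1",
    11: "i := t7",
    12: "if i<=20 goto 3",
}

def find_leaders(code):
    """Return the sorted statement numbers that start a basic block."""
    numbers = sorted(code)
    leaders = {numbers[0]}                      # rule 1: the first statement
    for n in numbers:
        jump = re.search(r"goto (\d+)", code[n])
        if jump:
            leaders.add(int(jump.group(1)))     # rule 2: the jump target
            if n + 1 in code:
                leaders.add(n + 1)              # rule 3: statement after a jump
    return sorted(leaders)

def basic_blocks(code):
    """Each block runs from one leader up to, but not including, the next."""
    leaders = find_leaders(code)
    numbers = sorted(code)
    blocks = []
    for idx, start in enumerate(leaders):
        end = leaders[idx + 1] if idx + 1 < len(leaders) else numbers[-1] + 1
        blocks.append([n for n in numbers if start <= n < end])
    return blocks

print(find_leaders(CODE))
print(basic_blocks(CODE))
```

For this code the leaders are statements 1 and 3, so the initialisation forms one block and the loop body (statements 3 to 12) forms the other.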
2. Explain briefly about DAG representation of basic blocks.
1: t1 := 4*i
2: t2 := a[t1]
3: t3 := 4*i
4: t4 := b[t3]
5: t5 := t2*t4
6: t6 := prod+t5
7: prod := t6
8: t7 := i+1
9: i := t7
10: if i<=20 goto 1
Corresponding DAG
Construction of DAGs
•We can systematically create a corresponding dag for a given
basic block.
•Each name is associated with a node of the dag. Initially, all
names are undefined (i.e. they are not associated with nodes of the
dag).
•For each three-address code x := y op z
o Find node(y). If node(y) is undefined, create a leaf node
labeled y and let node(y) be this node.
o Find node(z). If node(z) is undefined, create a leaf node
labeled z and let node(z) be this node.
o If there is a node with op, node(y) as its left child, and
node(z) as its right child, this node is also treated as
node(x).
o Otherwise, create node(x) with op, node(y) as its left child,
and node(z) as its right child.
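The construction steps can be sketched in Python as follows. This is only an illustration under assumptions: it handles statements of the form x := y op z only (no copies, array accesses or jumps), nodes are identified by their index in a list, and the class name DagBuilder and its fields are invented for the example.

```python
class DagBuilder:
    """Builds a dag for statements of the form x := y op z."""

    def __init__(self):
        self.nodes = []       # node id -> (label, left child id, right child id)
        self.node_of = {}     # name -> id of the node currently holding its value
        self.interior = {}    # (op, left id, right id) -> node id

    def _find(self, name):
        # If node(name) is undefined, create a leaf labeled with the name.
        if name not in self.node_of:
            self.nodes.append((name, None, None))
            self.node_of[name] = len(self.nodes) - 1
        return self.node_of[name]

    def assign(self, x, y, op, z):
        left, right = self._find(y), self._find(z)
        key = (op, left, right)
        if key not in self.interior:
            # No such node yet: create it.
            self.nodes.append(key)
            self.interior[key] = len(self.nodes) - 1
        # Either way, this node is now node(x).
        self.node_of[x] = self.interior[key]
        return self.node_of[x]

dag = DagBuilder()
n1 = dag.assign("t1", "4", "*", "i")
n3 = dag.assign("t3", "4", "*", "i")
print(n1 == n3)          # the common sub-expression 4*i shares one node
print(len(dag.nodes))    # two leaves and one interior node
```

Because node(x) is updated on every assignment, a later use of a redefined name refers to the new node rather than the old leaf, so an expression computed after i changes does not wrongly reuse the node for the old 4*i.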
Applications of DAGs
We automatically detect common sub-expressions.
We can determine which identifiers have their values used in the
block (the identifiers at the leaves).
We can create simplified quadruples for a block using its dag,
o taking advantage of common sub-expressions
o without performing unnecessary move instructions.
In general, the interior nodes of the dag can be evaluated in any
order that is a topological sort of the dag.
o In a topological sort, a node is not evaluated until all its
children are evaluated.
o So, a different evaluation order may correspond to a better
code sequence.
[Figure: activation record with the fields Return Value, Actual Parameters, Other Stuff, Local Variables and Temporaries; SP points into the record]
ADD #caller.recordsize,SP    // SP now points to the callee's activation record
MOV PARAM1,*8(SP)            // save parameters
MOV PARAM2,*12(SP)
.
MOV PARAMn,*4+4n(SP)
.                            // saving other stuff
MOV #here+16,*SP             // save return address
GOTO callee.codearea         // jump to procedure
SUB #caller.recordsize,SP    // return address: execution resumes here and restores SP
Run-Time Addresses
Static Variables:
o static[12] → staticaddressblock+12
So, the static variables are absolute addresses and these absolute
addresses are evaluated at compile time (or load time).
Run-Time Addresses
Stack Variables
o Stack variables are accessed using offsets from the beginning
of the activation records.
i)Register Allocation
iv)Instruction Selection
The structure of the instruction set of the target machine
determines the difficulty of the instruction selection.
–The uniformity and completeness of the instruction set are
important factors.
Instruction speeds are also important.
–If we do not care about speed, code generation is a straightforward
job. We can map each quadruple into a set of machine instructions.
Naive code generation:
Quadruple        Generated code
ADD y,z,x        MOV y,R0
                 ADD z,R0
                 MOV R0,x
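The naive mapping can be sketched as a Python function that expands every quadruple into the same three-instruction template. The tuple layout (op, y, z, x) for x := y op z, the OPCODES table and the function name naive_codegen are assumptions of this sketch; only the ADD, SUB and MOV instructions of the target machine described in these notes are used.

```python
OPCODES = {"+": "ADD", "-": "SUB"}

def naive_codegen(quadruples):
    """Translate each quadruple (op, y, z, x), meaning x := y op z,
    into MOV / OP / MOV, always going through register R0."""
    code = []
    for op, y, z, x in quadruples:
        code.append(f"MOV {y},R0")              # load the first operand
        code.append(f"{OPCODES[op]} {z},R0")    # combine with the second operand
        code.append(f"MOV R0,{x}")              # store the result
    return code

program = [("+", "b", "c", "a"), ("+", "a", "e", "d")]
for line in naive_codegen(program):
    print(line)
```

For a := b + c followed by d := a + e, the output contains MOV R0,a immediately followed by MOV a,R0. The second instruction reloads a value that is already in R0, which is the kind of redundancy a better code generator or a peephole pass removes.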
Peephole Optimization
Unreachable Code
We may remove unreachable code.
#define debug 0
.
.
if (debug==1) { print debugging info }
Flow-of-Control Optimizations
Before                      After
goto L1                     goto L2
.                           .
L1: goto L2                 L1: goto L2
----------------------------------------------
if a<b goto L1              if a<b goto L2
.                           .
L1: goto L2                 L1: goto L2
----------------------------------------------
goto L1                     if a<b goto L2
.                           goto L3
L1: if a<b goto L2          .
L3:                         L3:
PART-A
1. Structure-Preserving Transformations
2. Algebraic Transformations
i)We say that x is dead at a certain point, if it is not used after that
point in the block (or in the following blocks).
ii)If x is dead at the point of the statement x := y op z, this
statement can be safely eliminated without changing the meaning
of the block
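Within a single basic block, dead statements can be found by scanning the block backwards while tracking which names are still needed. The sketch below assumes statements are tuples (x, y, op, z) for x := y op z, that evaluating a statement has no side effects, and that the caller supplies live_out, the set of names used after the block. The function name eliminate_dead_code is hypothetical.

```python
def eliminate_dead_code(block, live_out):
    """Drop statements x := y op z where x is dead at that point.

    block is a list of (x, y, op, z) tuples; live_out is the set of
    names whose values are used after the block.
    """
    live = set(live_out)
    kept = []
    for x, y, op, z in reversed(block):
        if x in live:
            kept.append((x, y, op, z))
            live.discard(x)          # this statement defines x ...
            live.update({y, z})      # ... and uses y and z
        # otherwise x is dead here and the statement is dropped
    kept.reverse()
    return kept

block = [
    ("t1", "a", "*", "a"),
    ("t2", "a", "*", "b"),
    ("t3", "t1", "-", "t2"),
    ("t4", "a", "+", "b"),   # t4 is never used afterwards
]
print(eliminate_dead_code(block, live_out={"t3"}))
```

With only t3 live on exit, the statement defining t4 is removed and the other three are kept, because t1 and t2 are used by the statement that defines t3.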
Deep:
Shallow:
Register Descriptors
A register descriptor keeps track of what is currently in each
register.
It will be consulted when a new register is needed by code-
generation algorithm.
We assume that all registers are initially empty before we enter
into a basic block. This is not true if the registers are assigned
across blocks.
At a certain time, each register descriptor will hold zero or more
names.
R1 is empty
MOV a,R1
R1 holds a
MOV R1,b
R1 holds both a and b
Address Descriptors
An address descriptor keeps track of the locations where the
current value of a name can be found at run-time.
The location can be a register, a stack location or a memory
location (in static area). The location can be a set of these.
This information can be stored in the symbol table.
a is in the memory
MOV a,R1
a is in R1 and in the memory
MOV R1,b
b is in R1 and in the memory
Dead-Code Elimination
Example
Assume that we have the following machine codes, and the cost of
each of them is one unit.
o mov M,Ri
o mov Ri,M
o mov Ri,Rj
o OP M,Ri
o OP Rj,Ri
Assume that we have only two registers R0 and R1.
First, we have to evaluate cost arrays for the tree.