Chapter No. 5.: Compilers: Analysis Phase
COMPILERS :
Analysis Phase.
LL(1) :
The first L stands for scanning the input from left to right.
The second L stands for using the leftmost derivation to derive the parse tree.
The (1) is the number of look-ahead symbols, i.e. how many input symbols are checked to make a decision.
In LL(1) we check only one input symbol at a time, which is why the look-ahead is 1.
The bottom of the stack is always $.
The end of the input string is also represented by $.
$ tells the parser when to stop: parsing ends when $ on the stack meets $ in the input.
The parsing program is the parsing algorithm; it decides which action is taken at each step.
The parsing table is a data structure constructed from the grammar.
The stack is the data structure used during parsing.
The input buffer holds the string to be parsed.
To construct the LL(1) parsing table we need two functions :
1. First() and
2. Follow() .
First() : Eg. If the grammar is
S → aABCD
A → b
B → c
C → d
D → e
Start with S :
        S
a   A   B   C   D
Every string derived from S starts with ‘a’, so First of S is { a }.
For the symbol A :
A
b
every string derived from A starts with ‘b’, so First of A is { b }.
Similarly, First of B is { c }, First of C is { d } and First of D is { e }.
Eg. If the grammar is
S → ABCD
A → b
        S
A   B   C   D
b
S always derives a string starting with ‘b’, because its leftmost symbol A can only start with ‘b’. So First of S is First of A, i.e. { b }.
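Eg. If the grammar is
S → ABCD
A → b / ε
        S
A   B   C   D
b/ε
Here A can vanish ( A → ε ), so a string derived from S may also start with whatever B starts with; First of S then includes First of B as well.
B → c
B → aB / ε
C → cC / ε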
Example 2. First Follow
S → Bb / Cd       { a, b, c, d }   { $ }
Production        First            Follow
S → aBDh          { a }            { $ }
B → cC            { c }            { g, f, h }     ( h is included because D can vanish )
C → bC / ε        { b, ε }         { g, f, h }     ( same as the Follow of B )
D → EF            { g, f, ε }      { h }
E → g / ε         { g, ε }         { f, h }
F → f / ε         { f, ε }         { h }
Example 4 :
Production         First          Follow
E  → T E1          { id, ( }      { $, ) }
E1 → + T E1 / ε    { +, ε }       { $, ) }
T  → F T1          { id, ( }      { +, $, ) }
T1 → * F T1 / ε    { *, ε }       { +, $, ) }
F  → id / ( E )    { id, ( }      { *, +, ), $ }
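The First and Follow sets of Example 4 can be checked mechanically. What follows is a minimal Python sketch, not part of the original notes: the names `compute_first`, `compute_follow` and `first_of_sequence` are chosen here for illustration, and the grammar encoding (an empty list for ε) is an assumption of the sketch. Both functions repeat the rules until no set changes any more.

```python
EPS = "ε"

# Grammar of Example 4; each right-hand side is a list of symbols, [] is ε.
GRAMMAR = {
    "E":  [["T", "E1"]],
    "E1": [["+", "T", "E1"], []],
    "T":  [["F", "T1"]],
    "T1": [["*", "F", "T1"], []],
    "F":  [["id"], ["(", "E", ")"]],
}
START = "E"


def first_of_sequence(symbols, first):
    """First of a sequence of symbols; contains EPS if the whole sequence can vanish."""
    result = set()
    for sym in symbols:
        sym_first = first[sym] if sym in first else {sym}  # a terminal starts with itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    result.add(EPS)
    return result


def compute_first(grammar):
    first = {nt: set() for nt in grammar}
    changed = True
    while changed:                      # repeat until nothing new is added
        changed = False
        for nt, alternatives in grammar.items():
            for rhs in alternatives:
                new = first_of_sequence(rhs, first)
                if not new <= first[nt]:
                    first[nt] |= new
                    changed = True
    return first


def compute_follow(grammar, first, start):
    follow = {nt: set() for nt in grammar}
    follow[start].add("$")              # the start symbol is followed by $
    changed = True
    while changed:
        changed = False
        for lhs, alternatives in grammar.items():
            for rhs in alternatives:
                for i, sym in enumerate(rhs):
                    if sym not in grammar:      # only variables have a Follow
                        continue
                    rest = first_of_sequence(rhs[i + 1:], first)
                    new = rest - {EPS}
                    if EPS in rest:             # everything after sym can vanish
                        new |= follow[lhs]
                    if not new <= follow[sym]:
                        follow[sym] |= new
                        changed = True
    return follow


FIRST = compute_first(GRAMMAR)
FOLLOW = compute_follow(GRAMMAR, FIRST, START)
```

Printing `FIRST` and `FOLLOW` should reproduce the two columns of the table for Example 4.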
To create the LL(1) parsing table.
It is a top-down parser.
In a top-down parser, whenever a variable has two alternatives, we have to decide which production should be used.
Which one is used depends on the First of each alternative.
Consider F → id / (E). The alternatives are id and (E): whenever the next symbol is id we use the first alternative, and whenever it is ( (open round bracket) we use the second.
So in a top-down parser it is important to find the First generated by a symbol.
To create LL(1) parsing table :
        id           +            *            (            )            $
E       E → TE1                                E → TE1
E1                   E1 → +TE1                              E1 → ε       E1 → ε
T       T → FT1                                T → FT1
T1                   T1 → ε       T1 → *FT1                 T1 → ε       T1 → ε
F       F → id                                 F → (E)
Look at row E: if, during the derivation, the next symbol we want to generate is “id”, which production should be written there?
The production of E whose RHS has “id” in its First is written in the “id” column.
The table has a $ column because the input string ends with $, and some action must be taken when $ is the look-ahead.
First of E is the First of its RHS TE1, which is the First of T.
First of T is { id, ( }, so write the production E → TE1 in the id and ( columns.
Meaning: whenever we are looking at E and have to derive id, use E → TE1, as the First of TE1 contains id ( First of the RHS ).
For E1 → +TE1 / ε :
Select the E1 row and the + column, and write the production E1 → +TE1.
For the second alternative, E1 → ε, there is no ε column, because ε never appears in the input; the input always ends with $.
So whenever we get an ε production, or the First of the RHS contains ε, place the production in the columns given by the Follow of the LHS. Follow of E1 is { $, ) }, so E1 → ε goes in the ) and $ columns.
How to construct the LL(1) parsing table :
Given a grammar, first find the First and then the Follow
of…
Take every production and place it in the row of its LHS.
The column is given by the First of the RHS.
If the First of the RHS contains terminals, place the production directly in those columns.
If it contains ε, place the production also in the columns of the Follow of the LHS.
This table is used to construct the parse tree.
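The placement rule can be sketched in a few lines of Python. This is an illustrative sketch, not the notes' own program: the First and Follow sets are copied by hand from the table of Example 4, and the names `build_table` and `parse` are assumptions made here. `parse` is the usual stack-driven loop, with $ at the bottom of the stack and at the end of the input.

```python
EPS = "ε"

# Productions of Example 4 (an empty list stands for ε).
PRODUCTIONS = [
    ("E",  ["T", "E1"]),
    ("E1", ["+", "T", "E1"]),
    ("E1", []),
    ("T",  ["F", "T1"]),
    ("T1", ["*", "F", "T1"]),
    ("T1", []),
    ("F",  ["id"]),
    ("F",  ["(", "E", ")"]),
]

# First and Follow sets copied from the table of Example 4.
FIRST = {
    "E": {"id", "("}, "E1": {"+", EPS}, "T": {"id", "("},
    "T1": {"*", EPS}, "F": {"id", "("},
}
FOLLOW = {
    "E": {"$", ")"}, "E1": {"$", ")"}, "T": {"+", "$", ")"},
    "T1": {"+", "$", ")"}, "F": {"*", "+", ")", "$"},
}


def first_of_rhs(rhs):
    """First of a right-hand side; contains EPS if the whole RHS can vanish."""
    result = set()
    for sym in rhs:
        sym_first = FIRST.get(sym, {sym})   # a terminal starts with itself
        result |= sym_first - {EPS}
        if EPS not in sym_first:
            return result
    return result | {EPS}


def build_table():
    table = {}
    for lhs, rhs in PRODUCTIONS:
        first = first_of_rhs(rhs)
        columns = first - {EPS}             # terminals: place directly
        if EPS in first:                    # ε: also place in Follow of the LHS
            columns |= FOLLOW[lhs]
        for terminal in columns:
            if (lhs, terminal) in table:
                raise ValueError(f"not LL(1): two entries at ({lhs}, {terminal})")
            table[(lhs, terminal)] = rhs
    return table


def parse(tokens, table, start="E"):
    """Return True if the token list can be derived from the start symbol."""
    tokens = tokens + ["$"]
    stack = ["$", start]                    # bottom of the stack is $
    pos = 0
    while stack:
        top = stack.pop()
        look = tokens[pos]
        if top in FIRST:                    # variable: expand using the table
            rhs = table.get((top, look))
            if rhs is None:
                return False
            stack.extend(reversed(rhs))
        elif top == look:                   # terminal or $: match and advance
            pos += 1
        else:
            return False
    return pos == len(tokens)


TABLE = build_table()
```

For instance, `parse(["id", "+", "id", "*", "id"], TABLE)` accepts the string, while `parse(["id", "+"], TABLE)` rejects it because row T has no entry in the $ column.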
OPERATOR PRECEDENCE PARSER :
( bottom-up parser ) : it can parse some ambiguous grammars, because the precedence of the operators is supplied by a table rather than by the grammar.
It is used for grammars that define mathematical operators.
First, what is an operator grammar ?
A grammar which is generally used to define mathematical operations ( with some restrictions on the grammar ).
E → E + E / E * E / id … ( an expression can be the sum of two expressions, the product of two expressions, or id. This grammar is an operator grammar. )
The restrictions: no two variables are adjacent on any RHS, and there is no epsilon (ε) production.
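Consider another grammar :
E → E A E / id
A → + / *
Both grammars generate the same strings, but this one is not an operator grammar, because in E A E two variables are adjacent to each other.
In the C language, ab ≠ a*b : writing two operands next to each other does not mean multiplication; an operator must appear between them.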
Another example :
S → S A S / a     …. two variables are adjacent
A → bSb / b
Expand it by substituting A → bSb / b into S :
S → SbSbS / SbS / a
A → bSb / b
Now no two variables are adjacent to each other, so it is an operator grammar.
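The two restrictions are easy to test mechanically. The following Python sketch is an illustration only; the function name `is_operator_grammar` and the dictionary encoding of a grammar are assumptions made here, and the three grammars are the ones from the text.

```python
def is_operator_grammar(grammar):
    """grammar maps each variable to a list of right-hand sides (lists of symbols)."""
    variables = set(grammar)
    for alternatives in grammar.values():
        for rhs in alternatives:
            if not rhs:                                  # ε production
                return False
            for left, right in zip(rhs, rhs[1:]):
                if left in variables and right in variables:
                    return False                         # two adjacent variables
    return True


# E -> E + E / E * E / id
G_EXPR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}

# E -> E A E / id,  A -> + / *
G_ADJACENT = {"E": [["E", "A", "E"], ["id"]], "A": [["+"], ["*"]]}

# S -> SbSbS / SbS / a,  A -> bSb / b   (the expanded grammar)
G_EXPANDED = {
    "S": [["S", "b", "S", "b", "S"], ["S", "b", "S"], ["a"]],
    "A": [["b", "S", "b"], ["b"]],
}
```

Only `G_ADJACENT` fails the check, because of the adjacent variables in E A E.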
E → E + E / E * E / id … is an operator grammar.
It is an ambiguous grammar, yet it can be parsed by an operator precedence parser.
The parser constructs an operator relation table ( operator precedence table ).
The grammar itself says nothing about precedence; the table states the precedence rules clearly. Because the precedence rules are defined in this table, the parser is called an operator precedence parser.
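How to create the operator relation table for the grammar
E → E + E / E * E / id
Rows are the terminal on top of the stack; columns are the look-ahead terminal.
id has the highest precedence, so id .> every operator and $, and every operator <. id.
For ( + , + ) : with two + in an expression, which + is executed first ? As + is left associative, the left one is processed first, so + .> +.
For ( * , * ) : as * is left associative, the left one is processed first, so * .> *.
* has higher precedence than +, so + <. * and * .> +.
$ has the lowest precedence, so $ <. every other terminal and every other terminal .> $.
( $ , $ ) means the parse has completed successfully.
This is the operator relation table :
        id      +       *       $
id      -       .>      .>      .>
+       <.      .>      <.      .>
*       <.      .>      .>      .>
$       <.      <.      <.      -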
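Consider an example :
we want to parse :- id + id * id $
The input ends with ‘$’. This parser is bottom-up, i.e. we start with the terminals and go up to the root.
First of all, define a stack with $ at the bottom. The look-ahead pointer starts at the first input symbol.
At each step compare the topmost terminal on the stack with the look-ahead: if the relation is <. push the look-ahead and advance the pointer; if it is .> pop and reduce.
Stack             Input               Relation      Action
$                 id + id * id $      $ <. id       push id
$ id              + id * id $         id .> +       pop id, reduce to E
$ E               + id * id $         $ <. +        push +
$ E +             id * id $           + <. id       push id
$ E + id          * id $              id .> *       pop id, reduce to E
$ E + E           * id $              + <. *        push *
$ E + E *         id $                * <. id       push id
$ E + E * id      $                   id .> $       pop id, reduce to E
$ E + E * E       $                   * .> $        pop *, reduce E * E to E
$ E + E           $                   + .> $        pop +, reduce E + E to E
$ E               $                   $ and $       accept
It is the id whose precedence is higher than all the others, so each id is reduced to E as soon as it is on top of the stack.
When + is the top terminal and the look-ahead is *, + <. *, so * is pushed and the look-ahead pointer is incremented.
When * is the top terminal and the look-ahead is $, * .> $, so * is popped.
The first operator popped is *, whose precedence is higher, so it is evaluated first.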
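The push and pop decisions of this example can be replayed with a short Python sketch. It is only an illustration under stated assumptions: the relation table is the one given in these notes, every reduced operand is kept on the stack as a placeholder "E", and the name `parse` and the text of the recorded reductions are choices made here.

```python
# Relation table from the notes: REL[top terminal on stack][look-ahead].
REL = {
    "id": {"+": ">", "*": ">", "$": ">"},
    "+":  {"id": "<", "+": ">", "*": "<", "$": ">"},
    "*":  {"id": "<", "+": ">", "*": ">", "$": ">"},
    "$":  {"id": "<", "+": "<", "*": "<"},
}


def parse(tokens):
    """Return the reductions in the order they happen, or raise ValueError."""
    tokens = tokens + ["$"]
    stack = ["$"]                      # holds terminals and the placeholder "E"
    reductions = []
    pos = 0
    while True:
        # The relation is looked up for the topmost terminal, skipping "E".
        top = next(s for s in reversed(stack) if s != "E")
        look = tokens[pos]
        if top == "$" and look == "$":
            if stack != ["$", "E"]:
                raise ValueError("incomplete expression")
            return reductions
        relation = REL[top].get(look)
        if relation == "<":            # push the look-ahead, advance the pointer
            stack.append(look)
            pos += 1
        elif relation == ">":          # pop and reduce
            if top == "id":
                if stack[-1] != "id":
                    raise ValueError("misplaced operand")
                stack[-1] = "E"
                reductions.append("E -> id")
            else:
                if stack[-3:] != ["E", top, "E"]:
                    raise ValueError(f"operator {top} is missing an operand")
                del stack[-2:]         # E op E becomes E
                reductions.append(f"E -> E {top} E")
        else:
            raise ValueError(f"no relation between {top} and {look}")
```

For `parse(["id", "+", "id", "*", "id"])` the reduction by `E -> E * E` is recorded before `E -> E + E`, which is the precedence the table was built to express.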
Example : 1.
The given grammar is :
E → E * B
E → E + B
E → B
B → 0
B → 1
The given grammar is augmented with an extra rule,
(0) S → E
where S is the new start symbol and E is the old start symbol.
The parser uses this rule for reduction exactly when it has accepted the input string.
(0) S → E         …. is added at the start.
(1) E → E * B
(2) E → E + B
(3) E → B
(4) B → 0
(5) B → 1
Item sets : An item set is used to show the state of the parser.
The state cannot be described by a single item, so each state of the parser is characterized by a set of items.
Item set 0 :
S → .E            ( the dot . marks the position; what follows it is not yet processed )
E → .E * B
E → .E + B
E → .B
B → .0
B → .1
Transitions :     E : G3     B : G4     0 : S1     1 : S2
Item set 1 : For terminal ‘0’
B → 0.            R4
Item set 2 : For terminal ‘1’
B → 1.            R5
Item set 3 : For non-terminal ‘E’
S → E.
E → E. * B
E → E. + B
Transitions :     $ : Acc     * : S5     + : S6
Item set 4 : For non-terminal ‘B’
E → B.            R3
For item sets 1, 2 and 4 there are no transitions, since the dot (.) is not in front of any symbol. Check the remaining ones.
Item set 5 : For terminal ‘*’
E → E * .B
B → .0
B → .1
Transitions :     B : G7     0 : S1     1 : S2
Item set 6 : For terminal ‘+’
E → E + .B
B → .0
B → .1
Transitions :     B : G8     0 : S1     1 : S2
Item set 7 : For non-terminal ‘B’
E → E * B.        R1
Item set 8 : For non-terminal ‘B’
E → E + B.        R2
Transition Table :
Take the item sets as rows, the terminals as the Action columns, and the non-terminals as the Goto columns.
Item Set     Action                              Goto
             *      +      0      1      $       E      B
0                          1      2              3      4
1
2
3            5      6
4
5                          1      2                     7
6                          1      2                     8
7
8
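The item sets and the transition table of Example 1 can be generated by the usual closure and goto steps. The Python sketch below is illustrative and not the notes' own procedure: an item is encoded as a pair (rule number, dot position), and the order in `SYMBOLS` is an assumption chosen only so that the item sets come out with the same numbers as above.

```python
# Augmented grammar of Example 1; the list index is the rule number.
RULES = [
    ("S", ("E",)),             # (0)
    ("E", ("E", "*", "B")),    # (1)
    ("E", ("E", "+", "B")),    # (2)
    ("E", ("B",)),             # (3)
    ("B", ("0",)),             # (4)
    ("B", ("1",)),             # (5)
]
VARIABLES = {lhs for lhs, _ in RULES}
# Order in which symbols are tried (assumed, to reproduce the numbering above).
SYMBOLS = ["0", "1", "E", "B", "*", "+"]


def closure(items):
    """Add X -> .rhs for every variable X that appears right after a dot."""
    items = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = RULES[rule][1]
        if dot < len(rhs) and rhs[dot] in VARIABLES:
            for number, (lhs, _) in enumerate(RULES):
                if lhs == rhs[dot] and (number, 0) not in items:
                    items.add((number, 0))
                    work.append((number, 0))
    return frozenset(items)


def goto(items, symbol):
    """Move the dot over symbol in every item that allows it."""
    moved = {(rule, dot + 1) for rule, dot in items
             if dot < len(RULES[rule][1]) and RULES[rule][1][dot] == symbol}
    return closure(moved) if moved else None


def build_item_sets():
    states = [closure({(0, 0)})]       # item set 0 starts from S -> .E
    transitions = {}
    index = 0
    while index < len(states):         # new item sets are appended as found
        for symbol in SYMBOLS:
            target = goto(states[index], symbol)
            if target is None:
                continue
            if target not in states:
                states.append(target)
            transitions[(index, symbol)] = states.index(target)
        index += 1
    return states, transitions


STATES, TRANSITIONS = build_item_sets()
```

`TRANSITIONS[(3, "*")]` is 5 and `TRANSITIONS[(6, "B")]` is 8, matching rows 3 and 6 of the transition table; a different symbol order would give the same item sets under different numbers.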
Construction of the parse table :
The Goto columns are for the non-terminals.
The Action table contains the terminal columns, where the transitions are entered as shift actions.
The Action table has an extra column ‘$’, which contains Acc ( accept ) for every item set that contains the item S → E.
S → .
Item Set 7 :
S → bSa.S
S → .aSbS
S → .bSaS
S → .
Item Set 8 :
S → aSbS.
Item Set 9 :
S → bSaS.
Item Set     Action                        Goto
             a         b         $         S
0            S2/R3     S3/R3     R3        G1
1                                Acc
2            S2/R3     S3/R3     R3        G4
3            S2/R3     S3/R3     R3        G5
4                      S6
5            S7
6            S2/R3     S3/R3     R3        G8
7            S2/R3     S3/R3     R3        G9
8            R1        R1        R1
9            R2        R2        R2
In the LR(0) parsing table there are shift-reduce conflicts, so the given grammar is not LR(0).
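Example : 2.
S → AA
A → aA / b
Add one more production :
(0) S1 → S
(1) S → AA
(2) A → aA
(3) A → b
Item set 0
S1 → .S
S → .AA
A → .aA
A → .b
Transitions :     S : G1     A : G2     a : S3     b : S4
Item set 1
S1 → S.           Acc
Item set 2
S → A.A
A → .aA
A → .b
Transitions :     A : G5     a : S3     b : S4
Item set 3
A → a.A
A → .aA
A → .b
Transitions :     A : G6     a : S3     b : S4
Item set 4
A → b.            R3
Item set 5
S → AA.           R1
Item set 6
A → aA.           R2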
Item Set     Action                  Goto
             a       b       $       S       A
0            S3      S4              G1      G2
1                            Acc
2            S3      S4                      G5
3            S3      S4                      G6
4            R3      R3      R3
5            R1      R1      R1
6            R2      R2      R2
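To see how this table is used, here is a minimal Python sketch of the table-driven loop. It is an illustration only: the entries are copied from the LR(0) table of Example 2, the stack holds item-set numbers, and the names `parse`, `ACTION`, `GOTO` and `RULES` are choices made for this sketch.

```python
# Rule number -> (LHS, length of RHS) for S -> AA, A -> aA, A -> b.
RULES = {1: ("S", 2), 2: ("A", 2), 3: ("A", 1)}

# Entries copied from the LR(0) table of Example 2.
ACTION = {
    0: {"a": "S3", "b": "S4"},
    1: {"$": "Acc"},
    2: {"a": "S3", "b": "S4"},
    3: {"a": "S3", "b": "S4"},
    4: {"a": "R3", "b": "R3", "$": "R3"},
    5: {"a": "R1", "b": "R1", "$": "R1"},
    6: {"a": "R2", "b": "R2", "$": "R2"},
}
GOTO = {0: {"S": 1, "A": 2}, 2: {"A": 5}, 3: {"A": 6}}


def parse(text):
    """Return the rule numbers used for reductions, or None if text is rejected."""
    tokens = list(text) + ["$"]
    stack = [0]                        # stack of item-set numbers
    used = []
    pos = 0
    while True:
        entry = ACTION[stack[-1]].get(tokens[pos])
        if entry is None:              # blank cell: error
            return None
        if entry == "Acc":
            return used
        number = int(entry[1:])
        if entry[0] == "S":            # shift: push the item set, advance
            stack.append(number)
            pos += 1
        else:                          # reduce: pop |RHS| item sets, then Goto
            lhs, length = RULES[number]
            del stack[-length:]
            stack.append(GOTO[stack[-1]][lhs])
            used.append(number)
```

For the input `abb` the sketch reduces by rules 3, 2, 3 and 1 in that order, which is the rightmost derivation S → AA → Ab → aAb → abb read in reverse.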
Create the LR(0) parsing table for :
(1) E → 1E
(2) E → 1
Item Set     Action              Goto
             1         $         E
0            S2                  G1
1                      Acc
2            S2/R2     R2        G3
3            R1        R1
As there is an S/R conflict ( item set 2, column 1 ), it is not LR(0).
Checking for SLR(1) :
For R1, the production is E → 1E. What is the Follow of the LHS, i.e. E ? It is { $ }. So put R1 only in the “$” column.
For R2, the production is E → 1. What is the Follow of the LHS, i.e. E ? It is { $ }. So put R2 only in the “$” column.
SLR(1) Parsing Table :
Item Set     Action              Goto
             1         $         E
0            S2                  G1
1                      Acc
2            S2        R2        G3
3                      R1
The S/R conflict in item set 2 is gone, so the grammar is SLR(1).
Example : 3. CONSTRUCT LR(0) PARSING TABLE :
C → AB
A → a
B → b
Add one more production :
(0) C1 → C
(1) C → AB
(2) A → a
(3) B → b
Transition state :
LR(0) Parsing Table :
Item Set     Action              Goto
             a         $         C       A       B
0            S3                  G1      G2
1                      Acc
2            S3                                  G4
3            R2/R3     R2/R3
4            R1        R1
As there is an R/R conflict, it is not LR(0) ….
Check SLR(1) :
Item Set     Action              Goto
             a         $         C       A       B
0            S3                  G1      G2
1                      Acc
2            S3                                  G4
3            R2/R3     R2/R3
4            R1        R1
Now no conflict ….
SLR PARSER :-
Simple LR parsers: left-to-right scan, rightmost derivation ( in reverse ), with 1 symbol of look-ahead.
SLR(1) parsers are basically LR(0) parsers with a very simple look-ahead scheme.
An SLR parser avoids certain shift-reduce and reduce-reduce conflicts that occur in LR(0).
In LR(0), whenever a state has a final item, the reduce move is put in the entire row.
Difference :
In SLR(1), whenever a state has a final item, the reduce move is not put in the entire row.
The difference is only in where the reduce moves are placed.
In SLR(1), don’t blindly reduce whenever possible.
For row 5.
Final item S → AA. ( i.e. R1 ).
What is the Follow of S ( the LHS ) ?
The result is { $ }.
So put R1 in only one column, $.
For row 6.
Final item A → aA. ( i.e. R2 ).
What is the Follow of A ( the LHS ) ?
The result is { a, b, $ }.
So put R2 in all the columns ( a, b, $ ).
Item Set     Action                  Goto
             a       b       $       S       A
0            S3      S4              G1      G2
1                            Acc
2            S3      S4                      G5
3            S3      S4                      G6
4            R3      R3      R3
5                            R1
6            R2      R2      R2
For LR(0) and SLR(1), the shift and Goto entries are the same; only the placement of the reduce moves changes.
For SLR, don’t place the reduce move in the entire row;
check whether the next input symbol is in the Follow of the LHS of the production being reduced.
Place the reduce move only in the columns given by the Follow of the LHS.
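The difference in reduce placement can be written as one small function. The Python sketch below is an illustration under stated assumptions: the final items, shift entries and Follow sets are copied by hand from the two grammars discussed above ( S → AA, A → aA / b and E → 1E / 1 ), each item set is assumed to hold at most one final item, and the names `reduce_entries` and `conflicts` are made up for this sketch.

```python
def reduce_entries(final_items, terminals, follow=None):
    """Reduce moves per item set: LR(0) if follow is None, SLR(1) otherwise."""
    rows = {}
    for state, (rule, lhs) in final_items.items():
        if follow is None:
            columns = list(terminals)            # LR(0): the entire row
        else:
            columns = [t for t in terminals if t in follow[lhs]]  # Follow of LHS only
        rows[state] = {t: f"R{rule}" for t in columns}
    return rows


def conflicts(shifts, reduces):
    """Cells that hold both a shift and a reduce move (shift-reduce conflicts)."""
    return sorted((state, t) for state, row in reduces.items()
                  for t in row if t in shifts.get(state, {}))


# S -> AA, A -> aA / b : item set -> (rule number, LHS) of its final item.
EX2_TERMINALS = ["a", "b", "$"]
EX2_FOLLOW = {"S": {"$"}, "A": {"a", "b", "$"}}
EX2_FINAL = {4: (3, "A"), 5: (1, "S"), 6: (2, "A")}

# E -> 1E / 1
E_TERMINALS = ["1", "$"]
E_FOLLOW = {"E": {"$"}}
E_FINAL = {2: (2, "E"), 3: (1, "E")}
E_SHIFTS = {0: {"1": "S2"}, 2: {"1": "S2"}}

slr_ex2 = reduce_entries(EX2_FINAL, EX2_TERMINALS, EX2_FOLLOW)
lr0_e = reduce_entries(E_FINAL, E_TERMINALS)
slr_e = reduce_entries(E_FINAL, E_TERMINALS, E_FOLLOW)
```

`slr_ex2` gives R1 only in the $ column of row 5 and R2 in all columns of row 6, as in the SLR table above; `conflicts(E_SHIFTS, lr0_e)` reports the cell ( 2, 1 ), while `conflicts(E_SHIFTS, slr_e)` is empty.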