Unit 3.2 Bottom-Up Parsers
Parsing
Bottom-Up Parsing
o Bottom-Up Parser : Constructs a parse tree for an input string beginning at the leaves (the
bottom) and working up towards the root (the top).
o We can think of this process as one of “reducing” a string w to the start symbol of a
grammar.
o Bottom-up parsing is also known as shift-reduce parsing because its two main actions are
shift and reduce.
❑ At each shift action, the current symbol in the input string is pushed onto a stack.
❑ At each reduction step, the symbols at the top of the stack (this symbol sequence
is the right side of a production) are replaced by the non-terminal on the left side
of that production.
Shift–Reduce Parsing-Example
o Consider the grammar
S → aABe
A → Abc | b
B → d
Input string: abbcde
Reductions (each line is obtained from the one above by one reduction):
abbcde
aAbcde
aAde
aABe
S
We can scan abbcde looking for a substring that matches the right side of some production. The substrings b and
d qualify. Let us choose the leftmost b and replace it by A, the left side of the production A → b; we thus obtain the
string aAbcde. Now the substrings Abc, b and d match the right side of some production. Although b is the
leftmost substring that matches the right side of some production, we choose to replace the substring Abc
by A, the left side of the production A → Abc. We obtain aAde. We then replace d by B to obtain aABe, and finally
replace the entire string by S. Thus, by a sequence of four reductions we are able to reduce abbcde to S.
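The four reductions can be replayed mechanically. The following is a minimal sketch, not a parser: the handle positions in `STEPS` are copied from the walk-through above rather than computed, and the names `STEPS` and `replay` are made up for this illustration.

```python
# Grammar from the example: S -> aABe, A -> Abc | b, B -> d
# Each step is (left side, right side, index where the chosen substring starts).
# The positions come from the walk-through in the text; they are not computed.
STEPS = [
    ("A", "b", 1),     # abbcde -> aAbcde
    ("A", "Abc", 1),   # aAbcde -> aAde
    ("B", "d", 2),     # aAde   -> aABe
    ("S", "aABe", 0),  # aABe   -> S
]


def replay(string, steps):
    """Apply each reduction in order and return every intermediate string."""
    forms = [string]
    for lhs, rhs, pos in steps:
        # The chosen substring must really match the production's right side.
        assert string[pos:pos + len(rhs)] == rhs
        string = string[:pos] + lhs + string[pos + len(rhs):]
        forms.append(string)
    return forms


forms = replay("abbcde", STEPS)
print(" <- ".join(reversed(forms)))
```

Read from right to left, the printed line lists the strings in the order the reductions produce them; read from left to right, it is the corresponding derivation starting at S.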
Shift–Reduce Parsing-Example
o These reductions in fact trace out the following rightmost derivation in reverse:
S ⇒ aABe ⇒ aAde ⇒ aAbcde ⇒ abbcde
Handle
o Informally, a “handle” of a string is a substring that matches the right side of a production, and whose reduction to
the nonterminal on the left side of that production represents one step along the reverse of a rightmost derivation.
• Not every substring that matches the right side of a production rule is a handle.
o Formally, a “handle” of a right-sentential form γ (≡ αβω) is a production rule A → β and a position in γ where the
string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.
If S ⇒* αAω ⇒ αβω, where every step is a rightmost derivation step (rm),
then A → β in the position following α is a handle of αβω.
o The string ω to the right of the handle contains only terminal symbols.
Handle: Example
o Consider the grammar
S → aABe
A → Abc | b
B → d
Input string: abbcde
Reductions: abbcde, aAbcde, aAde, aABe, S
o In the example discussed at the beginning, abbcde is a right-sentential form whose handle is A → b at
position 2. Likewise, aAbcde is a right-sentential form whose handle is A → Abc at position 2.
o Sometimes we say “the substring β is a handle of αβω” if the position of β and the production A → β
we have in mind are clear.
A Shift-Reduce Parser
A Stack Implementation of a Shift-Reduce Parser
o A shift-reduce parser has four possible actions:
1. Shift: The next input symbol is shifted onto the top of the stack.
2. Reduce: Replace the handle on the top of the stack by the non-terminal.
3. Accept: The parser announces successful completion of parsing.
4. Error: The parser discovers a syntax error and calls an error recovery routine.
A Stack Implementation of A Shift-Reduce Parser
Grammar:
E → E+T | T
T → T*F | F
F → (E) | id
Input: id+id*id

Stack       Input        Action
$           id+id*id$    Shift
$id         +id*id$      Reduce by F→id
$F          +id*id$      Reduce by T→F
$T          +id*id$      Reduce by E→T
$E          +id*id$      Shift
$E+         id*id$       Shift
$E+id       *id$         Reduce by F→id
$E+F        *id$         Reduce by T→F
$E+T        *id$         Shift
$E+T*       id$          Shift
$E+T*id     $            Reduce by F→id
$E+T*F      $            Reduce by T→T*F
$E+T        $            Reduce by E→E+T
$E          $            Accept

[Figure: parse tree for id+id*id; the node subscripts 1 to 8 (F1, T2, E3, F4, T5, F6, T7, E8) give the order in which the reductions create the nodes]
Classification of Parsing
LR Parsing
❖ To have an operational shift-reduce parser, we must determine:
✴ Whether a handle appears on top of the stack.
✴ The reducing production to be used.
✴ The choice of actions to be made at each parsing step.
LR Parsing
❖ LR parsing is attractive for a number of reasons …
✴ Is the most general deterministic parsing method known
✴ Can recognize virtually all programming language constructs
✴ Can be implemented very efficiently
✴ The class of LR grammars is a proper superset of the LL grammars
✴ Can detect a syntax error as soon as an erroneous token is encountered
✴ An LR parser can be generated by a parser generating tool
LR Parsers
❖ An LR parser consists of …
✴ Driver program
✧ Same driver is used for all LR parsers
✴ Parsing stack
✧ Contains state information, where s_i is state i
✧ States are obtained from grammar analysis
✴ Parsing table, which has two parts
✧ Action section: specifies the parser actions
✧ Goto section: specifies the successor states
❖ The parser driver receives tokens from the scanner one at a time.
❖ Parser uses top state and current token to lookup parsing table.
❖ Different LR analysis techniques produce different tables.
LR Parsing Table Example
LR Parsing Example
Grammar:
1: E → E + T
2: E → T
3: T → id
4: T → (E)

Parsing table:
State |  +    ID   (    )    $  |  E    T
0     |       S1   S2           |  G4   G3
1     |  R3             R3   R3 |
2     |       S1   S2           |  G6   G3
3     |  R2             R2   R2 |
4     |  S5                  A  |
5     |       S1   S2           |       G7
6     |  S5             S8      |
7     |  R1             R1   R1 |
8     |  R4             R4   R4 |

Parse of id + ( id + id ):
Stack            Symbols            Input                 Action
0                $                  id + ( id + id ) $    S1
0 1              $ id               + ( id + id ) $       R3, G3
0 3              $ T                + ( id + id ) $       R2, G4
0 4              $ E                + ( id + id ) $       S5
0 4 5            $ E +              ( id + id ) $         S2
0 4 5 2          $ E + (            id + id ) $           S1
0 4 5 2 1        $ E + ( id         + id ) $              R3, G3
0 4 5 2 3        $ E + ( T          + id ) $              R2, G6
0 4 5 2 6        $ E + ( E          + id ) $              S5
0 4 5 2 6 5      $ E + ( E +        id ) $                S1
0 4 5 2 6 5 1    $ E + ( E + id     ) $                   R3, G7
0 4 5 2 6 5 7    $ E + ( E + T      ) $                   R1, G6
0 4 5 2 6        $ E + ( E          ) $                   S8
0 4 5 2 6 8      $ E + ( E )        $                     R4, G7
0 4 5 7          $ E + T            $                     R1, G4
0 4              $ E                $                     A

Note: grammar symbols do not appear on the parsing stack. They are shown here for clarity.
LR Parser Driver
❖ Let s be the parser stack top state and t be the current input token
❖ If action[s,t] = shift n then
✴ Push state n on the stack
✴ Call scanner to obtain next token
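A table-driven driver can be sketched in a few lines. The sketch below uses the parsing table of the "LR Parsing Example" slide (rules 1: E → E + T, 2: E → T, 3: T → id, 4: T → (E)). Only the shift case is spelled out in the text above; the reduce, accept and error branches follow the usual LR driver scheme and are an assumption here, as are the names `ACTION`, `GOTO`, `RULES` and `lr_parse`.

```python
# rule number -> (left-hand side, length of right-hand side)
RULES = {1: ("E", 3), 2: ("E", 1), 3: ("T", 1), 4: ("T", 3)}

# Action section, copied from the example table. "Sn" = shift and go to
# state n, "Rn" = reduce by rule n, "A" = accept. Missing entries are errors.
ACTION = {
    (0, "id"): "S1", (0, "("): "S2",
    (1, "+"): "R3", (1, ")"): "R3", (1, "$"): "R3",
    (2, "id"): "S1", (2, "("): "S2",
    (3, "+"): "R2", (3, ")"): "R2", (3, "$"): "R2",
    (4, "+"): "S5", (4, "$"): "A",
    (5, "id"): "S1", (5, "("): "S2",
    (6, "+"): "S5", (6, ")"): "S8",
    (7, "+"): "R1", (7, ")"): "R1", (7, "$"): "R1",
    (8, "+"): "R4", (8, ")"): "R4", (8, "$"): "R4",
}
# Goto section: (state, nonterminal) -> successor state.
GOTO = {(0, "E"): 4, (0, "T"): 3, (2, "E"): 6, (2, "T"): 3, (5, "T"): 7}


def lr_parse(tokens):
    """Parse a token list; return the rule numbers in the order they are reduced."""
    tokens = list(tokens) + ["$"]
    stack = [0]          # the stack holds states only
    pos = 0              # index of the current input token
    reductions = []
    while True:
        s, t = stack[-1], tokens[pos]
        act = ACTION.get((s, t))
        if act is None:
            raise SyntaxError(f"unexpected {t!r} in state {s}")
        if act == "A":
            return reductions
        n = int(act[1:])
        if act[0] == "S":
            stack.append(n)   # push state n
            pos += 1          # obtain the next token
        else:
            lhs, length = RULES[n]
            del stack[-length:]                   # pop one state per right-side symbol
            stack.append(GOTO[(stack[-1], lhs)])  # goto on the left-hand side
            reductions.append(n)


print(lr_parse("id + ( id + id )".split()))  # [3, 2, 3, 2, 3, 1, 4, 1]
```

The printed rule numbers match the R entries in the Action column of the example trace, read top to bottom.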
LR(0) Parser Generation – Items and States
❖ LR(0) grammars can be parsed looking only at the stack
❖ Making shift/reduce decisions without any lookahead token
❖ Based on the idea of an item or a configuration
❖ An LR(0) item consists of a production and a dot
A → X1 . . . Xi ∙ Xi+1 . . . Xn
❖ The dot symbol ∙ may appear anywhere on the right-hand side
✴ Marks how much of a production has already been seen
✴ X1 . . . Xi appear on top of the stack
✴ Xi+1 . . . Xn are still expected to appear
LR(0) Parser Generation – Initial State
❖ Consider the following grammar G1:
1: E → E + T 3: T → ID
2: E → T 4: T → ( E )
❖ For LR parsing, grammars are augmented with a . . .
✴ New start symbol S, and a
✴ New start production 0: S → E $
❖ The input should be reduced to E followed by $
✴ We indicate this by the item: S → ∙ E $
❖ The initial state (numbered 0) will have the item: S → ∙ E $
❖ An LR parser will start in state 0
❖ State 0 is initially pushed on top of parser stack
Identifying the Initial State
❖ Since the dot appears before E, an E is expected
✴ There are two productions of E: E → E + T and E → T
✴ Either E+T or T is expected
✴ The items: E → ∙ E + T and E → ∙ T are added to the initial state
❖ Since T can be expected and there are two productions for T
✴ Either ID or ( E ) can be expected
✴ The items: T → ∙ ID and T → ∙ ( E ) are added to the initial state
❖ The initial state (0) is identified by the following set of items:
S → ∙ E $
E → ∙ E + T
E → ∙ T
T → ∙ ID
T → ∙ ( E )
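The way the initial state is built is a closure computation: whenever the dot stands before a nonterminal, the productions of that nonterminal are added with the dot at the far left. A minimal sketch for grammar G1 follows; the representation of an item as a pair (rule index, dot position) and the names `closure` and `show` are choices made for this illustration.

```python
# Grammar G1, augmented with the start production 0: S -> E $
GRAMMAR = [
    ("S", ("E", "$")),
    ("E", ("E", "+", "T")),
    ("E", ("T",)),
    ("T", ("ID",)),
    ("T", ("(", "E", ")")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """Close a set of LR(0) items; an item is (rule index, dot position)."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot = work.pop()
        rhs = GRAMMAR[rule][1]
        # Dot before a nonterminal: every production of it is expected next.
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in result:
                    result.add((i, 0))
                    work.append((i, 0))
    return result


def show(item):
    """Render an item in the notation used on the slides."""
    rule, dot = item
    lhs, rhs = GRAMMAR[rule]
    return f"{lhs} → {' '.join(rhs[:dot] + ('∙',) + rhs[dot:])}"


state0 = closure({(0, 0)})
for item in sorted(state0):
    print(show(item))
```

Starting from the single item S → ∙ E $, the closure contains exactly the five items of state 0.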
Shift Actions
❖ In state 0, we can shift either an ID or a left parenthesis
✴ If we shift an ID, we shift the dot past the ID
✴ We obtain a new item T → ID ∙ and a new state (state 1)
✴ If we shift a left parenthesis, we obtain T → ( ∙ E )
✴ Since the dot appears before E, an E is expected
✴ We add the items E → ∙ E + T and E → ∙ T
✴ Since the dot appears before T, we add T → ∙ ID and T → ∙ ( E )
✴ The new set of items forms a new state (state 2)
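[Diagram: states 0, 1 and 2]
State 0: S → ∙ E $ ; E → ∙ E + T ; E → ∙ T ; T → ∙ ID ; T → ∙ ( E )
State 1 (from state 0 on ID): T → ID ∙
State 2 (from state 0 on ( ): T → ( ∙ E ) ; E → ∙ E + T ; E → ∙ T ; T → ∙ ID ; T → ∙ ( E )
From state 2, ID leads to state 1 and ( leads back to state 2.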
Reduce and Goto Actions
❖ In state 1, the dot appears at the end of item T → ID ∙
✴ This means that ID appears on top of stack and can be reduced to T
✴ When ∙ appears at end of an item, the parser can perform a reduce action
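[Diagram: states 0 to 3]
State 0: S → ∙ E $ ; E → ∙ E + T ; E → ∙ T ; T → ∙ ID ; T → ∙ ( E )
State 1 (from states 0 and 2 on ID): T → ID ∙
State 2 (from states 0 and 2 on ( ): T → ( ∙ E ) ; E → ∙ E + T ; E → ∙ T ; T → ∙ ID ; T → ∙ ( E )
State 3 (from states 0 and 2 on T): E → T ∙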
DFA of LR(0) States
❖ We complete the state diagram to obtain the DFA of LR(0) states
❖ In state 4, if next token is $, the parser accepts (successful parse)
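[Diagram: DFA of LR(0) states for G1]
State 0: S → ∙ E $ ; E → ∙ E + T ; E → ∙ T ; T → ∙ ID ; T → ∙ ( E )
         Transitions: ID → 1, ( → 2, T → 3, E → 4
State 1: T → ID ∙
State 2: T → ( ∙ E ) ; E → ∙ E + T ; E → ∙ T ; T → ∙ ID ; T → ∙ ( E )
         Transitions: ID → 1, ( → 2, T → 3, E → 6
State 3: E → T ∙
State 4: S → E ∙ $ ; E → E ∙ + T
         Transitions: + → 5, $ → Accept
State 5: E → E + ∙ T ; T → ∙ ID ; T → ∙ ( E )
         Transitions: ID → 1, ( → 2, T → 7
State 6: T → ( E ∙ ) ; E → E ∙ + T
         Transitions: + → 5, ) → 8
State 7: E → E + T ∙
State 8: T → ( E ) ∙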
LR(0) Parsing Table
❖ The LR(0) parsing table is obtained from the LR(0) state diagram
❖ The rows of the parsing table correspond to the LR(0) states
❖ The columns correspond to tokens and non-terminals
❖ For each state transition i → j caused by a token x …
✴ Put Shift j at position [i, x] of the table
❖ For each transition i → j caused by a nonterminal A …
✴ Put Goto j at position [i, A] of the table
❖ For each state i containing an item A → α ∙ of rule n …
✴ Put Reduce n at position [i, y] for every token y
❖ For each transition i → Accept …
✴ Put Accept at position [i, $] of the table
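The four rules can be applied mechanically once the DFA is known. The sketch below takes the transitions and reduce states of grammar G1 as hand-entered data (read off the LR(0) state diagram, not computed) and fills the table. The names `TRANSITIONS`, `REDUCE_STATES` and `build_lr0_table` are hypothetical.

```python
TOKENS = ["+", "ID", "(", ")", "$"]
NONTERMINALS = {"E", "T"}

# (state, symbol) -> next state, read off the LR(0) DFA of G1.
TRANSITIONS = {
    (0, "ID"): 1, (0, "("): 2, (0, "T"): 3, (0, "E"): 4,
    (2, "ID"): 1, (2, "("): 2, (2, "T"): 3, (2, "E"): 6,
    (4, "+"): 5, (4, "$"): "Accept",
    (5, "ID"): 1, (5, "("): 2, (5, "T"): 7,
    (6, "+"): 5, (6, ")"): 8,
}
# States containing an item with the dot at the end: state -> rule number.
REDUCE_STATES = {1: 3, 3: 2, 7: 1, 8: 4}


def build_lr0_table():
    """Fill the LR(0) table from the DFA using the four rules above."""
    table = {}
    for (i, x), j in TRANSITIONS.items():
        if j == "Accept":
            table[(i, x)] = "A"        # transition i -> Accept
        elif x in NONTERMINALS:
            table[(i, x)] = f"G{j}"    # transition on a nonterminal
        else:
            table[(i, x)] = f"S{j}"    # transition on a token
    for i, n in REDUCE_STATES.items():
        for y in TOKENS:
            # A cell that is already filled would be an LR(0) conflict.
            assert (i, y) not in table, f"conflict in state {i} on {y}"
            table[(i, y)] = f"R{n}"    # reduce on every token
    return table


table = build_lr0_table()
for state in range(9):
    row = [table.get((state, sym), "") for sym in TOKENS + ["E", "T"]]
    print(state, *(f"{cell:>3}" for cell in row))
```

The assertion inside the reduce loop never fires for G1, which is another way of saying that G1 has no LR(0) conflicts.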
LR(0) Parsing Table – cont'd
Grammar:
1: E → E + T
2: E → T
3: T → id
4: T → (E)

LR(0) parsing table
State |  +    ID   (    )    $  |  E    T
0     |       S1   S2           |  G4   G3
1     |  R3   R3   R3   R3   R3 |
2     |       S1   S2           |  G6   G3
3     |  R2   R2   R2   R2   R2 |
4     |  S5                  A  |
5     |       S1   S2           |       G7
6     |  S5             S8      |
7     |  R1   R1   R1   R1   R1 |
8     |  R4   R4   R4   R4   R4 |

SLR(1) parsing table
State |  +    ID   (    )    $  |  E    T
0     |       S1   S2           |  G4   G3
1     |  R3             R3   R3 |
2     |       S1   S2           |  G6   G3
3     |  R2             R2   R2 |
4     |  S5                  A  |
5     |       S1   S2           |       G7
6     |  S5             S8      |
7     |  R1             R1   R1 |
8     |  R4             R4   R4 |
Limitations of the LR(0) Parsing Method
❖ Consider grammar G2 for matched parentheses
❖ 0: S' → S $ 1: S → ( S ) S 2: S → ε
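[Diagram: DFA of LR(0) states for G2]
State 0: S' → ∙ S $ ; S → ∙ ( S ) S ; S → ∙
         Transitions: S → 1, ( → 2
State 1: S' → S ∙ $
         Transitions: $ → Accept
State 2: S → ( ∙ S ) S ; S → ∙ ( S ) S ; S → ∙
         Transitions: ( → 2, S → 3
State 3: S → ( S ∙ ) S
         Transitions: ) → 4
State 4: S → ( S ) ∙ S ; S → ∙ ( S ) S ; S → ∙
         Transitions: ( → 2, S → 5
State 5: S → ( S ) S ∙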
Conflicts
❖ In state 0 the parser encounters a conflict ...
✴ It can shift and go to state 2 when the next token is (
✴ It can reduce by production 2: S → ε
✴ This is called a shift-reduce conflict
✴ This conflict also appears in states 2 and 4
[Diagram: state 0 with items S' → ∙ S $ ; S → ∙ ( S ) S ; S → ∙ and a transition on ( to state 2]
LR(0) Grammars
❖ The shift-reduce conflict in state 0 indicates that G2 is not LR(0)
❖ A grammar is LR(0) if and only if each state is either …
✴ A shift state, containing only shift items
✴ A reduce state, containing only a single reduce item
❖ If a state contains A → α ● x γ then it cannot contain B → β ●
✴ Otherwise, parser can shift x and reduce B → β ● (shift-reduce conflict)
❖ If a state contains A → α ● then it cannot contain B → β ●
✴ Otherwise, parser can reduce A → α ● and B → β ● (reduce-reduce conflict)
❖ LR(0) lacks the power to parse most programming language grammars
✴ Because it does not use the lookahead token in making parsing decisions
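The two conditions can be checked state by state. The following sketch classifies a few hand-entered states of grammar G2. An item is written as (left side, right side, dot position), with the empty tuple standing for ε. The function name `conflicts` and the item encoding are assumptions made for this example.

```python
TERMINALS = {"(", ")", "$"}


def conflicts(state):
    """Return the kinds of LR(0) conflict present in one state (a set of items)."""
    # Tokens that can be shifted: symbols right after a dot that are terminals.
    shifts = {rhs[dot] for _, rhs, dot in state
              if dot < len(rhs) and rhs[dot] in TERMINALS}
    # Reduce items: the dot is at the end of the right side.
    reduces = [(lhs, rhs) for lhs, rhs, dot in state if dot == len(rhs)]
    found = []
    if shifts and reduces:
        found.append("shift-reduce")
    if len(reduces) > 1:
        found.append("reduce-reduce")
    return found


RULE1 = ("(", "S", ")", "S")  # right side of 1: S -> ( S ) S
STATE_0 = {("S'", ("S", "$"), 0), ("S", RULE1, 0), ("S", (), 0)}
STATE_3 = {("S", RULE1, 2)}
STATE_5 = {("S", RULE1, 4)}

for name, state in [("0", STATE_0), ("3", STATE_3), ("5", STATE_5)]:
    print("state", name, conflicts(state) or "no conflict")
```

State 0 mixes the shift item S → ∙ ( S ) S with the reduce item S → ∙, so it is reported as a shift-reduce conflict; states 3 and 5 are a pure shift state and a pure reduce state.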
SLR(1) Parsing
❖ SLR(1), or simple LR(1), improves LR(0) by …
✴ Making use of the lookahead token to eliminate conflicts
❖ SLR(1) works as follows …
✴ It uses the same DFA obtained by the LR(0) parsing method
✴ It puts reduce actions only where indicated by the FOLLOW set
❖ To reduce α to A in A → α ● we must ensure that …
✴ Next token may follow A (belongs to FOLLOW(A))
❖ We should not reduce A → α ● when next token ∉ FOLLOW(A)
❖ In grammar G2 …
✴ 0: S' → S $ 1: S → ( S ) S 2: S → ε
✴ FOLLOW(S) = {$, )}
✴ Productions 1 and 2 are reduced when next token is $ or ) only
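FOLLOW(S) can be computed by the usual fixed-point iteration over FIRST and FOLLOW sets. The sketch below does this for grammar G2, treating $ as an ordinary terminal because it appears in production 0. The helper names `first_of` and `compute_follow` are made up for this illustration.

```python
EPS = "ε"
GRAMMAR = [
    ("S'", ["S", "$"]),          # 0: S' -> S $
    ("S", ["(", "S", ")", "S"]), # 1: S  -> ( S ) S
    ("S", []),                   # 2: S  -> ε
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first_of(symbols, first):
    """FIRST of a symbol sequence, given the FIRST sets of the nonterminals."""
    out = set()
    for sym in symbols:
        if sym not in NONTERMINALS:
            out.add(sym)
            return out
        out |= first[sym] - {EPS}
        if EPS not in first[sym]:
            return out
    out.add(EPS)  # every symbol can vanish, or the sequence is empty
    return out


def compute_follow():
    """Iterate until neither FIRST nor FOLLOW changes any more."""
    first = {a: set() for a in NONTERMINALS}
    follow = {a: set() for a in NONTERMINALS}
    changed = True
    while changed:
        changed = False
        for lhs, rhs in GRAMMAR:
            new_first = first_of(rhs, first)
            if not new_first <= first[lhs]:
                first[lhs] |= new_first
                changed = True
            for i, sym in enumerate(rhs):
                if sym not in NONTERMINALS:
                    continue
                tail = first_of(rhs[i + 1:], first)
                new = tail - {EPS}
                if EPS in tail:           # sym can end the right side
                    new |= follow[lhs]
                if not new <= follow[sym]:
                    follow[sym] |= new
                    changed = True
    return follow


follow = compute_follow()
print(sorted(follow["S"]))
```

For G2 the result contains $ (from production 0) and ) (from the first S in production 1), and it does not contain (, which is why the reduce actions under ( disappear in the SLR(1) table.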
SLR(1) Parsing Table
❖ The SLR(1) parsing table of grammar G2 is shown below
❖ The shift-reduce conflicts are now eliminated
✴ The R2 action is removed from [0, ( ], [2, ( ], and [4, ( ]
✴ Because ( does not follow S
✴ S2 remains under [0, ( ], [2, ( ], and [4, ( ]
✴ R1 action is also removed from [5, ( ]
❖ Grammar G2 is SLR(1)
✴ No conflicts in parsing table
✴ R1 and R2 for ) and $ only
✴ Follow set indicates when to reduce

SLR(1) parsing table of G2
State |  (    )    $  |  S
0     |  S2   R2   R2 |  G1
1     |            A  |
2     |  S2   R2   R2 |  G3
3     |       S4      |
4     |  S2   R2   R2 |  G5
5     |       R1   R1 |
SLR(1) Grammars
❖ SLR(1) parsing increases the power of LR(0) significantly
✴ Lookahead token is used to make parsing decisions
✴ Reduce action is applied more selectively according to FOLLOW set
Limits of the SLR(1) Parsing Method
❖ Consider the following grammar G3 …
0: S' → S $   1: S → id   2: S → V := E   3: V → id   4: E → V   5: E → n
❖ The initial state consists of 4 items as shown below
✴ When id is shifted in state 0, we obtain 2 items: S → id ∙ and V → id ∙
State 0: S' → ∙ S $ ; S → ∙ id ; S → ∙ V := E ; V → ∙ id
State 1 (from state 0 on id): S → id ∙ ; V → id ∙
Consider the grammar
E -> T+E | T
T ->id
CLR(1) Parsing – Items and States
❖ Even more powerful than SLR(1) is the CLR(1) parsing method
❖ CLR(1) generalizes LR(0) by including a lookahead token in items
❖ A CLR(1) item consists of …
✴ Grammar production rule
✴ Right-hand position represented by the dot, and
✴ Lookahead token
CLR(1) Parser Generation – Initial State
❖ Consider again grammar G3 …
0: S' → S $   1: S → id   2: S → V := E   3: V → id   4: E → V   5: E → n
❖ The initial state contains the CLR(1) item: S' → ∙ S , $
✴ S' → ∙ S , $ means that S is expected and to be followed by $
❖ The closure of (S' → ∙ S , $) produces the initial state items
✴ Since the dot appears before S, an S is expected
✴ There are two productions of S: S → id and S → V := E
✴ The CLR(1) items (S → ∙ id , $) and (S → ∙ V := E , $) are obtained
✧ The lookahead token is $ (end-of-file token)
✴ Since the ∙ appears before V in (S → ∙ V := E , $), a V is expected
✴ The CLR(1) item ( V → ∙ id , := ) is obtained
✧ The lookahead token is := because it appears after V in (S → ∙ V := E , $)
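The closure with lookaheads can be sketched as follows for grammar G3. An item is (rule index, dot position, lookahead). The sketch relies on G3 having no ε-productions, so the lookahead of a new item is either FIRST of the single symbol after the expected nonterminal or, if nothing follows it, the lookahead of the item being expanded. The names `first` and `closure` are choices for this illustration.

```python
GRAMMAR = [
    ("S'", ("S",)),           # 0 (the end marker $ is carried as the lookahead)
    ("S", ("id",)),           # 1
    ("S", ("V", ":=", "E")),  # 2
    ("V", ("id",)),           # 3
    ("E", ("V",)),            # 4
    ("E", ("n",)),            # 5
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def first(symbol, seen=frozenset()):
    """FIRST of one symbol; sufficient only because G3 has no ε-productions."""
    if symbol not in NONTERMINALS:
        return {symbol}
    out = set()
    for lhs, rhs in GRAMMAR:
        if lhs == symbol and rhs[0] not in seen:
            out |= first(rhs[0], seen | {symbol})
    return out


def closure(items):
    """Close a set of LR(1) items (rule index, dot position, lookahead)."""
    result = set(items)
    work = list(items)
    while work:
        rule, dot, la = work.pop()
        rhs = GRAMMAR[rule][1]
        if dot == len(rhs) or rhs[dot] not in NONTERMINALS:
            continue
        # Lookaheads of the new items: FIRST of what follows the nonterminal,
        # or the current lookahead when nothing follows it.
        lookaheads = first(rhs[dot + 1]) if dot + 1 < len(rhs) else {la}
        for i, (lhs, _) in enumerate(GRAMMAR):
            if lhs == rhs[dot]:
                for b in lookaheads:
                    if (i, 0, b) not in result:
                        result.add((i, 0, b))
                        work.append((i, 0, b))
    return result


state0 = closure({(0, 0, "$")})
for rule, dot, la in sorted(state0):
    print(rule, dot, la)
```

The result has four items: rules 0, 1 and 2 with lookahead $, and rule 3 (V → ∙ id) with lookahead :=.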
Shift Action
❖ The initial state (state 0) consists of 4 items
❖ In state 0, we can shift an id
✴ The token id can be shifted in two items
✴ When shifting id, we shift the dot past the id
✴ We obtain (S → id ∙ , $ ) and ( V → id ∙ , := )
✴ The two CLR(1) items form a new state (state 1)
✴ The two items are reduce items
✴ No additional item can be added to state 1
State 0: S' → ∙ S , $ ; S → ∙ id , $ ; S → ∙ V := E , $ ; V → ∙ id , :=
State 1 (from state 0 on id): S → id ∙ , $ ; V → id ∙ , :=
Reduce and Goto Actions
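[Diagram: states 0 to 3]
State 0: S' → ∙ S , $ ; S → ∙ id , $ ; S → ∙ V := E , $ ; V → ∙ id , :=
         Transitions: id → 1, S → 2, V → 3
State 1: S → id ∙ , $ ; V → id ∙ , :=
State 2: S' → S ∙ , $
         Transitions: $ → Accept
State 3: S → V ∙ := E , $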
CLR(1) State Diagram
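[Diagram: CLR(1) DFA for G3]
State 0: S' → ∙ S , $ ; S → ∙ id , $ ; S → ∙ V := E , $ ; V → ∙ id , :=
         Transitions: id → 1, S → 2, V → 3
State 1: S → id ∙ , $ ; V → id ∙ , :=
State 2: S' → S ∙ , $
         Transitions: $ → Accept
State 3: S → V ∙ := E , $
         Transitions: := → 4
State 4: S → V := ∙ E , $ ; E → ∙ V , $ ; E → ∙ n , $ ; V → ∙ id , $
         Transitions: E → 5, V → 6, n → 7, id → 8
State 5: S → V := E ∙ , $
State 6: E → V ∙ , $
State 7: E → n ∙ , $
State 8: V → id ∙ , $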
CLR(1) State Diagram
S → CC
C → cC
C → d
LALR(1) State Diagram
S → CC
C → cC
C → d
CLR(1) State Diagram
CLR(1) &LALR(1) State Diagram
CLR(1) Grammars
Drawback of CLR(1)
❖ CLR(1) can generate very large parsing tables
❖ For a typical programming language grammar …
✴ The number of states is around several hundred for LR(0) and SLR(1)
✴ The number of states can be several thousand for CLR(1)
❖ This is why parser generators do not adopt the general CLR(1) method
❖ Consider again grammar G2 for matched parentheses
0: S' → S $ 1: S → ( S ) S 2: S → ε
❖ The CLR(1) DFA has 10 states, while the LR(0) DFA has 6
CLR(1) DFA of Grammar G2
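Grammar: 0: S' → S $   1: S → ( S ) S   2: S → ε
[Diagram: CLR(1) DFA for G2, 10 states]
State 0: S' → ∙ S $ ; S → ∙ ( S ) S , $ ; S → ∙ , $
         Transitions: S → 1, ( → 2
State 1: S' → S ∙ $
         Transitions: $ → Accept
State 2: S → ( ∙ S ) S , $ ; S → ∙ ( S ) S , ) ; S → ∙ , )
         Transitions: S → 3, ( → 6
State 3: S → ( S ∙ ) S , $
         Transitions: ) → 4
State 4: S → ( S ) ∙ S , $ ; S → ∙ ( S ) S , $ ; S → ∙ , $
         Transitions: S → 5, ( → 2
State 5: S → ( S ) S ∙ , $
State 6: S → ( ∙ S ) S , ) ; S → ∙ ( S ) S , ) ; S → ∙ , )
         Transitions: S → 7, ( → 6
State 7: S → ( S ∙ ) S , )
         Transitions: ) → 8
State 8: S → ( S ) ∙ S , ) ; S → ∙ ( S ) S , ) ; S → ∙ , )
         Transitions: S → 9, ( → 6
State 9: S → ( S ) S ∙ , )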
LALR(1) : Look-Ahead LR(1)
❖ Preferred parsing technique in many parser generators
❖ Close in power to LR(1), but with fewer states
❖ Increased number of states in LR(1) is because
✴ Different lookahead tokens are associated with same LR(0) items
❖ Number of states in LALR(1) = states in LR(0)
❖ LALR(1) is based on the observation that
✴ Some LR(1) states have same LR(0) items
✴ Differ only in lookahead tokens
❖ LALR(1) can be obtained from LR(1) by
✴ Merging LR(1) states that have same LR(0) items
✴ Obtaining the union of the LR(1) lookahead tokens
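The merging step can be illustrated on a few CLR(1) states of grammar G2, entered by hand. States that share the same set of LR(0) items (the same core) are grouped, and their items are united. This sketch covers only the merging of item sets; a full LALR(1) construction would also redirect the DFA transitions to the merged states. The names `core` and `merge_by_core` are hypothetical.

```python
from collections import defaultdict

# An LR(1) item is (production, dot position, lookahead).
P1 = "S → ( S ) S"
P2 = "S → ε"

# Four states of the CLR(1) DFA of G2 (numbering as in that DFA).
STATE_2 = frozenset({(P1, 1, "$"), (P1, 0, ")"), (P2, 0, ")")})
STATE_6 = frozenset({(P1, 1, ")"), (P1, 0, ")"), (P2, 0, ")")})
STATE_3 = frozenset({(P1, 2, "$")})
STATE_7 = frozenset({(P1, 2, ")")})


def core(state):
    """The LR(0) items of a state: its LR(1) items with lookaheads dropped."""
    return frozenset((prod, dot) for prod, dot, _ in state)


def merge_by_core(states):
    """Unite all states that have the same core."""
    groups = defaultdict(set)
    for state in states:
        groups[core(state)] |= state
    return [frozenset(items) for items in groups.values()]


merged = merge_by_core([STATE_2, STATE_6, STATE_3, STATE_7])
for state in merged:
    print(sorted(state))
```

States 2 and 6 collapse into one state, as do states 3 and 7, and the item S → ( ∙ S ) S now carries both lookaheads $ and ).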
LALR(1) DFA of Grammar G2
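Grammar: 0: S' → S $   1: S → ( S ) S   2: S → ε
[Diagram: LALR(1) DFA for G2, 6 states; lookaheads written as a set after the comma]
State 0: S' → ∙ S $ ; S → ∙ ( S ) S , {$} ; S → ∙ , {$}
         Transitions: S → 1, ( → 2
State 1: S' → S ∙ $
         Transitions: $ → Accept
State 2: S → ( ∙ S ) S , {$, )} ; S → ∙ ( S ) S , {)} ; S → ∙ , {)}
         Transitions: S → 3, ( → 2
State 3: S → ( S ∙ ) S , {$, )}
         Transitions: ) → 4
State 4: S → ( S ) ∙ S , {$, )} ; S → ∙ ( S ) S , {$, )} ; S → ∙ , {$, )}
         Transitions: S → 5, ( → 2
State 5: S → ( S ) S ∙ , {$, )}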
YACC Parser
Grammar Hierarchy
Syntax Error Handling
● Next few lectures will focus on the nature of syntactic errors and
general strategies for error-recovery.
● If a compiler had to process only correct programs, its design would
be extremely simple. However, it is expected to assist the programmer
in locating and tracking errors.
● One should note that a programming language does not specify how
a compiler should respond to errors. It is actually left entirely to the
compiler designer.
Review: Common Programming Errors
Challenges of Error Handling
Viable-prefix property: detecting an error as soon as a prefix of the input
cannot be completed to form a string of the language.
Goals of an error handler:
● Reporting presence of errors, clearly and accurately.
● Recovering from errors quickly enough to detect subsequent errors.
● Add minimal overhead to the processing of correct programs.
Error recovery
● There is no universally acceptable method.
● The simplest method is for the parser to quit with the appropriate
error message, at the first instance of an error.
● Problem? Subsequent errors will not be detected.
● Solution? If the parser can restore itself to a state where processing
can continue, future errors can be detected.
● In some cases the compiler stops if errors pile up.
Error Recovery Strategies
● Panic Mode Recovery: The parser discovers an error. It then discards
input symbols until one of a designated set of synchronizing tokens is found.
● Synchronizing tokens are chosen so that their role in the program
is unambiguous, such as the delimiters ; and }.
● Advantage: simple, and never goes into an infinite loop.
● Drawback: skips a considerable amount of input without checking it for
additional errors.
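The skipping step of panic-mode recovery can be sketched in a few lines. This is an illustration under stated assumptions: the token list, the set `SYNC_TOKENS` and the function `panic_mode_skip` are hypothetical, and resuming just after the synchronizing token (rather than at it) is one possible design choice.

```python
SYNC_TOKENS = {";", "}"}


def panic_mode_skip(tokens, pos, sync=SYNC_TOKENS):
    """Discard tokens from pos on; return the index just past the next
    synchronizing token, or len(tokens) if none is left."""
    while pos < len(tokens) and tokens[pos] not in sync:
        pos += 1  # discard one input symbol
    return min(pos + 1, len(tokens))


# The error is detected at index 2 (the stray "+"); parsing resumes at "y".
tokens = ["x", "=", "+", "*", "3", ";", "y", "=", "1", ";"]
resume = panic_mode_skip(tokens, 2)
print(resume, tokens[resume])  # 6 y
```

Every iteration either consumes a token or stops, which is why the method cannot loop forever. The tokens between the error and the semicolon are dropped unchecked, which is the drawback noted above.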
Phrase-level Recovery
Error Productions
Global Corrections
Reference Materials