
Bottom-up Parsers

Dr. Banee Bandana Das


Department of CSE
1
Classification of Parsing

2
Parsing

• The process of determining whether a string can be generated by a
grammar.
• Parsing falls in two categories:
– Top-down parsing:
Construction of the parse tree starts at the root (the start symbol) and
proceeds towards the leaves (tokens or terminals)
– Bottom-up parsing:
Construction of the parse tree starts from the leaf nodes (tokens or
terminals of the grammar) and proceeds towards the root (start symbol)

3
Bottom-Up Parsing

o Bottom-Up Parser : Constructs a parse tree for an input string beginning at the leaves (the
bottom) and working up towards the root (the top).
o We can think of this process as one of “reducing” a string w to the start symbol of a
grammar.
o Bottom-up parsing is also known as shift-reduce parsing because its two main actions are
shift and reduce.

❑ At each shift action, the current symbol in the input string is pushed to a stack.

❑ At each reduction step, the symbols at the top of the stack (this symbol sequence
is the right side of a production) will be replaced by the non-terminal on the left side
of that production.
4
Shift–Reduce Parsing-Example
o Consider the grammar
    S → aABe
    A → Abc | b
    B → d

  Input string : abbcde
                 aAbcde
                 aAde      ⇓ reduction
                 aABe
                 S

We can scan abbcde looking for a substring that matches the right side of some production. The substrings b and
d qualify. Let us choose the leftmost b and replace it by A, the left side of the production A → b; we thus obtain the
string aAbcde. Now the substrings Abc, b and d match the right side of some production. Although b is the
leftmost substring that matches the right side of some production, we choose to replace the substring Abc
by A, the left side of the production A → Abc. We obtain aAde. Then we replace d by B, and then the entire
string by S. Thus, by a sequence of four reductions we are able to reduce abbcde to S.

5
Shift–Reduce Parsing-Example

o These reductions in fact trace out the following rightmost derivation in reverse:

  S ⇒rm aABe ⇒rm aAde ⇒rm aAbcde ⇒rm abbcde

  Every string in this derivation is a right-sentential form.

o How do we know which substring is to be replaced at each reduction step?

6
Handle

o Informally, a “handle” of a string is a substring that matches the right side of a production, and whose reduction to
the nonterminal on the left side of the production represents one step along the reverse of a rightmost derivation.

• But not every substring that matches the right side of a production rule is a handle.

o Formally, a “handle” of a right-sentential form γ (≡ αβω) is a production rule A → β and a position of γ where the
string β may be found and replaced by A to produce the previous right-sentential form in a rightmost derivation of γ.

  If  S ⇒*rm αAω ⇒rm αβω,  then A → β in the position following α is a handle of αβω.

o The string ω to the right of the handle contains only terminal symbols.

7
Handle: Example
o Consider the grammar
    S → aABe
    A → Abc | b
    B → d

  Input string : abbcde
                 aAbcde
                 aAde      ⇓ reduction
                 aABe
                 S

o Consider the example discussed in the beginning: abbcde is a right-sentential form whose handle is A → b at
position 2. Likewise, aAbcde is a right-sentential form whose handle is A → Abc at position 2.

o Sometimes we say “the substring β is a handle of αβω” if the position of β and the production A → β
we have in mind are clear.

8
A Shift-Reduce Parser

E → E+T | T        Rightmost derivation of id+id*id:
T → T*F | F        E ⇒ E+T ⇒ E+T*F ⇒ E+T*id ⇒ E+F*id
F → (E) | id         ⇒ E+id*id ⇒ T+id*id ⇒ F+id*id ⇒ id+id*id

Right-Sentential Form    Handle    Reducing Production
id+id*id                 id        F → id
F+id*id                  F         T → F
T+id*id                  T         E → T
E+id*id                  id        F → id
E+F*id                   F         T → F
E+T*id                   id        F → id
E+T*F                    T*F       T → T*F
E+T                      E+T       E → E+T
E

9
A Stack Implementation of a Shift-Reduce Parser
o There are four possible actions of a shift-reduce parser:

1.Shift : The next input symbol is shifted onto the top of the stack.

2.Reduce: Replace the handle on the top of the stack by the non-terminal.

3.Accept: Successful completion of parsing.

4.Error: Parser discovers a syntax error, and calls an error recovery routine.

o Initially the stack contains only the end-marker $.


o The end of the input string is marked by the end-marker $.

10
A Stack Implementation of A Shift-Reduce Parser
E → E+T | T        Input: id+id*id
T → T*F | F
F → (E) | id

Stack        Input         Action
$            id+id*id$     Shift
$id          +id*id$       Reduce by F → id
$F           +id*id$       Reduce by T → F
$T           +id*id$       Reduce by E → T
$E           +id*id$       Shift
$E+          id*id$        Shift
$E+id        *id$          Reduce by F → id
$E+F         *id$          Reduce by T → F
$E+T         *id$          Shift
$E+T*        id$           Shift
$E+T*id      $             Reduce by F → id
$E+T*F       $             Reduce by T → T*F
$E+T         $             Reduce by E → E+T
$E           $             Accept

(The slide also shows the parse tree of id+id*id that is built bottom-up by these reductions.)
11
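The shift and reduce mechanics traced above can be sketched in a few lines of Python. This is an illustrative sketch, not code from the slides: the sequence of actions is supplied by hand, because deciding automatically when to shift and when to reduce is precisely what the LR tables introduced on the following slides provide.

# Replays the shift-reduce trace above for E -> E+T | T, T -> T*F | F, F -> (E) | id.
def shift_reduce(tokens, actions):
    stack = ["$"]                       # parser stack; $ is the bottom marker
    tokens = tokens + ["$"]             # the input ends with the end-marker $
    pos = 0
    for act in actions:
        if act == "shift":
            stack.append(tokens[pos])   # push the current input symbol
            pos += 1
        elif act == "accept":
            return stack == ["$", "E"] and tokens[pos] == "$"
        else:                           # act = (lhs, rhs): reduce by lhs -> rhs
            lhs, rhs = act
            assert stack[-len(rhs):] == rhs, "handle is not on top of the stack"
            del stack[-len(rhs):]       # pop the handle ...
            stack.append(lhs)           # ... and push the non-terminal on the left side
    return False

actions = ["shift", ("F", ["id"]), ("T", ["F"]), ("E", ["T"]),
           "shift", "shift", ("F", ["id"]), ("T", ["F"]),
           "shift", "shift", ("F", ["id"]), ("T", ["T", "*", "F"]),
           ("E", ["E", "+", "T"]), "accept"]
print(shift_reduce(["id", "+", "id", "*", "id"], actions))   # True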
Classification of Parsing

12
LR Parsing
❖ To have an operational shift-reduce parser, we must determine:
✴ Whether a handle appears on top of the stack.
✴ The reducing production to be used.
✴ The choice of actions to be made at each parsing step.

❖ LR parsing provides a solution to the above problems


✴ Is a general and efficient method of shift-reduce parsing
✴ Is used in a number of automatic parser generators

❖ The LR(k) parsing technique was introduced by Knuth in 1965


✴ L is for Left-to-right scanning of input
✴ R corresponds to a Rightmost derivation done in reverse
✴ k is the number of lookahead symbols used to make parsing decisions

13
LR Parsing
❖ LR parsing is attractive for a number of reasons …
✴ Is the most general deterministic parsing method known
✴ Can recognize virtually all programming language constructs
✴ Can be implemented very efficiently
✴ The class of LR grammars is a proper superset of the LL grammars
✴ Can detect a syntax error as soon as an erroneous token is encountered
✴ An LR parser can be generated by a parser generating tool

❖ Four LR parsing techniques will be considered

✴ LR(0) : LR parsing with no lookahead token to make parsing decisions


✴ SLR(1) : Simple LR, with one token of lookahead
✴ CLR(1) : Canonical LR, with one token of lookahead
✴ LALR(1) : Lookahead LR, with one token of lookahead

❖ LALR(1) is the preferable technique used by parser generators

14
LR Parsers

❖ An LR parser consists of …
✴ Driver program
✧ Same driver is used for all LR parsers
✴ Parsing stack
✧ Contains state information, where sᵢ denotes state i
✧ States are obtained from grammar analysis
✴ Parsing table, which has two parts
✧ Action section: specifies the parser actions
✧ Goto section: specifies the successor states

❖ The parser driver receives tokens from the scanner one at a time.

❖ Parser uses top state and current token to lookup parsing table.

❖ Different LR analysis techniques produce different tables.

15
LR Parsing Table Example

❖ Consider the following grammar G1 …


1: E → E + T    3: T → ID
2: E → T        4: T → ( E )

❖ The following parsing table is obtained after grammar analysis

            Action                          Goto
State    +      ID     (      )      $      E      T
  0             S1     S2                   G4     G3
  1      R3                   R3     R3
  2             S1     S2                   G6     G3
  3      R2                   R2     R2
  4      S5                          A
  5             S1     S2                   G7
  6      S5                   S8
  7      R1                   R1     R1
  8      R4                   R4     R4

Entries are labeled with …
  Sn: Shift token and goto state n (call scanner for next token)
  Rn: Reduce using production n
  Gn: Goto state n (after reduce)
  A:  Accept parse (terminate successfully)
  blank: Syntax error

16
LR Parsing Example
Grammar G1:   1: E → E + T   2: E → T   3: T → id   4: T → ( E )
(The parsing table of grammar G1 from the previous slide is used.)

Stack            Symbols          Input                 Action
0                $                id + ( id + id ) $    S1
0 1              $ id             + ( id + id ) $       R3, G3
0 3              $ T              + ( id + id ) $       R2, G4
0 4              $ E              + ( id + id ) $       S5
0 4 5            $ E +            ( id + id ) $         S2
0 4 5 2          $ E + (          id + id ) $           S1
0 4 5 2 1        $ E + ( id       + id ) $              R3, G3
0 4 5 2 3        $ E + ( T        + id ) $              R2, G6
0 4 5 2 6        $ E + ( E        + id ) $              S5
0 4 5 2 6 5      $ E + ( E +      id ) $                S1
0 4 5 2 6 5 1    $ E + ( E + id   ) $                   R3, G7
0 4 5 2 6 5 7    $ E + ( E + T    ) $                   R1, G6
0 4 5 2 6        $ E + ( E        ) $                   S8
0 4 5 2 6 8      $ E + ( E )      $                     R4, G7
0 4 5 7          $ E + T          $                     R1, G4
0 4              $ E              $                     A

Note: grammar symbols do not appear on the parsing stack; they are shown
in the Symbols column for clarity.
17
LR Parser Driver

❖ Let s be the parser stack top state and t be the current input token
❖ If action[s,t] = shift n then
✴ Push state n on the stack
✴ Call scanner to obtain next token

❖ If action[s,t] = reduce A → X1 X2 ... Xm then


✴ Pop the top m states off the stack
✴ Let s' be the state now on top of the stack
✴ Push goto[s', A] on the stack (using the goto section of the parsing table)

❖ If action[s,t] = accept then return


❖ If action[s,t] = error then call error handling routine
❖ All LR parsers behave the same way
✴ The difference depends on how the parsing table is computed from a CFG

18
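The driver loop described on this slide is the same for every LR parser; only the table changes. Below is a small Python sketch of that loop using the parsing table of grammar G1 from slide 16. The tuple encoding of table entries ("s", "r", "g", "a") and the function names are illustrative choices, not notation from the slides.

GRAMMAR = {1: ("E", ["E", "+", "T"]), 2: ("E", ["T"]),
           3: ("T", ["ID"]),          4: ("T", ["(", "E", ")"])}

# TABLE[state][symbol] = ("s", n) shift, ("r", n) reduce, ("g", n) goto, ("a",) accept
TABLE = {
    0: {"ID": ("s", 1), "(": ("s", 2), "E": ("g", 4), "T": ("g", 3)},
    1: {"+": ("r", 3), ")": ("r", 3), "$": ("r", 3)},
    2: {"ID": ("s", 1), "(": ("s", 2), "E": ("g", 6), "T": ("g", 3)},
    3: {"+": ("r", 2), ")": ("r", 2), "$": ("r", 2)},
    4: {"+": ("s", 5), "$": ("a",)},
    5: {"ID": ("s", 1), "(": ("s", 2), "T": ("g", 7)},
    6: {"+": ("s", 5), ")": ("s", 8)},
    7: {"+": ("r", 1), ")": ("r", 1), "$": ("r", 1)},
    8: {"+": ("r", 4), ")": ("r", 4), "$": ("r", 4)},
}

def lr_parse(tokens):
    stack = [0]                                  # stack of states; state 0 is pushed first
    tokens = tokens + ["$"]
    pos = 0
    while True:
        entry = TABLE[stack[-1]].get(tokens[pos])
        if entry is None:                        # blank entry: syntax error
            raise SyntaxError(f"unexpected token {tokens[pos]!r}")
        if entry[0] == "s":                      # shift: push state n, consume the token
            stack.append(entry[1]); pos += 1
        elif entry[0] == "r":                    # reduce by production n
            lhs, rhs = GRAMMAR[entry[1]]
            del stack[-len(rhs):]                # pop m = |rhs| states
            stack.append(TABLE[stack[-1]][lhs][1])   # push goto[s', A]
        else:                                    # accept
            return True

print(lr_parse(["ID", "+", "(", "ID", "+", "ID", ")"]))   # True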
LR(0) Parser Generation – Items and States
❖ LR(0) grammars can be parsed looking only at the stack
❖ Making shift/reduce decisions without any lookahead token
❖ Based on the idea of an item or a configuration
❖ An LR(0) item consists of a production and a dot

A → X1 . . . Xi ∙ Xi+1 . . . Xn
❖ The dot symbol ∙ may appear anywhere on the right-hand side
✴ Marks how much of a production has already been seen
✴ X1 . . . Xi appear on top of the stack
✴ Xi+1 . . . Xn are still expected to appear

❖ An LR(0) state is a set of LR(0) items


✴ It is the set of all items that apply at a given point in parse

23
LR(0) Parser Generation – Initial State
❖ Consider the following grammar G1:
1: E → E + T 3: T → ID
2: E → T 4: T → ( E )
❖ For LR parsing, grammars are augmented with a . . .
✴ New start symbol S, and a
✴ New start production 0: S → E $
❖ The input should be reduced to E followed by $
✴ We indicate this by the item: S → ∙ E $
❖ The initial state (numbered 0) will have the item: S → ∙ E $
❖ An LR parser will start in state 0
❖ State 0 is initially pushed on top of parser stack

24
Identifying the Initial State
❖ Since the dot appears before E, an E is expected
✴ There are two productions of E: E → E + T and E → T
✴ Either E+T or T is expected
✴ The items: E → ∙ E + T and E → ∙ T are added to the initial state
❖ Since T can be expected and there are two productions for T
✴ Either ID or ( E ) can be expected
✴ The items: T → ∙ ID and T → ∙ ( E ) are added to the initial state
❖ The initial state (0) is identified by the following set of items:

State 0:  S → ∙ E $
          E → ∙ E + T
          E → ∙ T
          T → ∙ ID
          T → ∙ ( E )
25
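The closure step carried out informally above can be written down directly. The sketch below is an illustrative Python rendering (the tuple encoding of grammar and items is my own choice): an item is a (lhs, rhs, dot) triple, and closure keeps adding A → ∙ γ for every non-terminal A that appears immediately after a dot.

GRAMMAR = [("S", ("E", "$")),        # 0: augmented start production S -> E $
           ("E", ("E", "+", "T")),   # 1
           ("E", ("T",)),            # 2
           ("T", ("ID",)),           # 3
           ("T", ("(", "E", ")"))]   # 4
NONTERMINALS = {"S", "E", "T"}

def closure(items):
    """Add A -> . gamma for every non-terminal A appearing right after a dot."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot) in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMINALS:
                for (a, gamma) in GRAMMAR:
                    if a == rhs[dot] and (a, gamma, 0) not in items:
                        items.add((a, gamma, 0))
                        changed = True
    return frozenset(items)

state0 = closure({("S", ("E", "$"), 0)})
for (lhs, rhs, dot) in sorted(state0):       # prints the five items of state 0
    print(lhs, "->", " ".join(rhs[:dot]), ".", " ".join(rhs[dot:]))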
Shift Actions
❖ In state 0, we can shift either an ID or a left parenthesis
✴ If we shift an ID, we shift the dot past the ID
✴ We obtain a new item T → ID ∙ and a new state (state 1)
✴ If we shift a left parenthesis, we obtain T → ( ∙ E )
✴ Since the dot appears before E, an E is expected
✴ We add the items E → ∙ E + T and E → ∙ T
✴ Since the dot appears before T, we add T → ∙ ID and T → ∙ ( E )
✴ The new set of items forms a new state (state 2)

❖ In State 2, we can also shift an ID or a left parenthesis as shown

State 0:  S → ∙ E $ ,  E → ∙ E + T ,  E → ∙ T ,  T → ∙ ID ,  T → ∙ ( E )
  ── on ID ─▶  State 1:  T → ID ∙
  ── on (  ─▶  State 2:  T → ( ∙ E ) ,  E → ∙ E + T ,  E → ∙ T ,  T → ∙ ID ,  T → ∙ ( E )
State 2:
  ── on ID ─▶  State 1        ── on ( ─▶  State 2
26
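Continuing the sketch from the previous slide, the shift transitions described here are the goto operation on item sets: move the dot past a symbol X in every item that expects X, then take the closure of the result. The code below reuses closure() and GRAMMAR from that sketch.

def goto(state, symbol):
    """Move the dot past `symbol` in every item of `state`, then close."""
    moved = {(lhs, rhs, dot + 1)
             for (lhs, rhs, dot) in state
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved) if moved else None

state1 = goto(state0, "ID")   # {T -> ID .}
state2 = goto(state0, "(")    # {T -> ( . E ), E -> . E + T, E -> . T, T -> . ID, T -> . ( E )}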
Reduce and Goto Actions
❖ In state 1, the dot appears at the end of item T → ID ∙
✴ This means that ID appears on top of stack and can be reduced to T
✴ When ∙ appears at end of an item, the parser can perform a reduce action

❖ If ID is reduced to T, what is the next state of the parser?


✴ ID is popped from the stack; Previous state appears on top of stack
✴ T is pushed on the stack
✴ A new item E → T ∙ and a new state (state 3) are obtained
✴ If top of stack is state 0 and we push a T, we go to state 3
✴ Similarly, if top of stack is state 2 and we push a T, we go also to state 3

State 0:  S → ∙ E $ ,  E → ∙ E + T ,  E → ∙ T ,  T → ∙ ID ,  T → ∙ ( E )
  ── on ID ─▶  State 1:  T → ID ∙
  ── on (  ─▶  State 2:  T → ( ∙ E ) ,  E → ∙ E + T ,  E → ∙ T ,  T → ∙ ID ,  T → ∙ ( E )
  ── on T  ─▶  State 3:  E → T ∙
State 2:
  ── on T  ─▶  State 3

27
DFA of LR(0) States
❖ We complete the state diagram to obtain the DFA of LR(0) states
❖ In state 4, if next token is $, the parser accepts (successful parse)

State 0:  S → ∙ E $ ,  E → ∙ E + T ,  E → ∙ T ,  T → ∙ ID ,  T → ∙ ( E )
  ── on ID ─▶ 1    ── on ( ─▶ 2    ── on T ─▶ 3    ── on E ─▶ 4
State 1:  T → ID ∙
State 2:  T → ( ∙ E ) ,  E → ∙ E + T ,  E → ∙ T ,  T → ∙ ID ,  T → ∙ ( E )
  ── on ID ─▶ 1    ── on ( ─▶ 2    ── on T ─▶ 3    ── on E ─▶ 6
State 3:  E → T ∙
State 4:  S → E ∙ $ ,  E → E ∙ + T
  ── on $ ─▶ Accept    ── on + ─▶ 5
State 5:  E → E + ∙ T ,  T → ∙ ID ,  T → ∙ ( E )
  ── on ID ─▶ 1    ── on ( ─▶ 2    ── on T ─▶ 7
State 6:  T → ( E ∙ ) ,  E → E ∙ + T
  ── on ) ─▶ 8    ── on + ─▶ 5
State 7:  E → E + T ∙
State 8:  T → ( E ) ∙
28
LR(0) Parsing Table
❖ The LR(0) parsing table is obtained from the LR(0) state diagram
❖ The rows of the parsing table correspond to the LR(0) states
❖ The columns correspond to tokens and non-terminals
❖ For each state transition i → j caused by a token x …
✴ Put Shift j at position [i, x] of the table
❖ For each transition i → j caused by a nonterminal A …
✴ Put Goto j at position [i, A] of the table
❖ For each state i containing an item A → α ∙ of rule n …
✴ Put Reduce n at position [i, y] for every token y
❖ For each transition i → Accept …
✴ Put Accept at position [i, $] of the table

29
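These four rules translate mechanically into code. The sketch below is illustrative: the states and the transition map are assumed to have been computed already (for example with the closure/goto sketches above), and the only $-transition in the augmented grammar is assumed to be the accepting one, which is why a transition on $ is recorded as Accept.

def build_lr0_table(states, transitions, grammar, tokens):
    """states: list of item sets; transitions: {(state index, symbol): state index}."""
    table = {i: {} for i in range(len(states))}
    for (i, x), j in transitions.items():
        if x == "$":
            table[i]["$"] = ("a",)               # transition on $ => Accept at [i, $]
        elif x in tokens:
            table[i][x] = ("s", j)               # Shift j at [i, x]
        else:
            table[i][x] = ("g", j)               # Goto j at [i, A]
    for i, state in enumerate(states):
        for (lhs, rhs, dot) in state:
            if dot == len(rhs):                  # reduce item A -> alpha .
                n = grammar.index((lhs, rhs))
                for y in tokens:                 # LR(0): Reduce n under every token y
                    if y in table[i]:
                        print(f"conflict in state {i} on {y!r}")
                    table[i][y] = ("r", n)
    return table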
LR(0) Parsing Table – cont'd
Grammar G1:   1: E → E + T   2: E → T   3: T → id   4: T → ( E )
(The DFA of LR(0) states from the previous slide is used.)

LR(0) parsing table (a reduce action is placed under every token):

            Action                          Goto
State    +      ID     (      )      $      E      T
  0             S1     S2                   G4     G3
  1      R3     R3     R3     R3     R3
  2             S1     S2                   G6     G3
  3      R2     R2     R2     R2     R2
  4      S5                          A
  5             S1     S2                   G7
  6      S5                   S8
  7      R1     R1     R1     R1     R1
  8      R4     R4     R4     R4     R4

SLR(1) parsing table (reduce actions are placed only under FOLLOW tokens):

            Action                          Goto
State    +      ID     (      )      $      E      T
  0             S1     S2                   G4     G3
  1      R3                   R3     R3
  2             S1     S2                   G6     G3
  3      R2                   R2     R2
  4      S5                          A
  5             S1     S2                   G7
  6      S5                   S8
  7      R1                   R1     R1
  8      R4                   R4     R4

30
Limitations of the LR(0) Parsing Method
❖ Consider grammar G2 for matched parentheses

❖ 0: S' → S $ 1: S → ( S ) S 2: S → ε

❖ The LR(0) DFA of grammar G2 is shown below


❖ In states 0, 2, and 4, the parser can shift ( and can also reduce ε to S

State 0:  S' → ∙ S $ ,  S → ∙ ( S ) S ,  S → ∙
  ── on ( ─▶ 2    ── on S ─▶ 1
State 1:  S' → S ∙ $                     ── on $ ─▶ Accept
State 2:  S → ( ∙ S ) S ,  S → ∙ ( S ) S ,  S → ∙
  ── on ( ─▶ 2    ── on S ─▶ 3
State 3:  S → ( S ∙ ) S                  ── on ) ─▶ 4
State 4:  S → ( S ) ∙ S ,  S → ∙ ( S ) S ,  S → ∙
  ── on ( ─▶ 2    ── on S ─▶ 5
State 5:  S → ( S ) S ∙
31
Conflicts
❖ In state 0 the parser encounters a conflict ...
✴ It can shift state 2 on the stack when the next token is (
✴ It can reduce production 2: S → ε
✴ This is called a shift-reduce conflict
✴ This conflict also appears in states 2 and 4

   State 0:  S' → ∙ S $ ,  S → ∙ ( S ) S ,  S → ∙        ── on ( ─▶ 2

❖ Two kinds of conflicts may arise ...
✴ Shift-reduce conflict: the parser can shift and can also reduce
✴ Reduce-reduce conflict: two (or more) productions can be reduced

LR(0) parsing table of grammar G2 (with conflicts):

            Action                   Goto
State    (         )       $         S
  0      S2,R2     R2      R2        G1
  1                        A
  2      S2,R2     R2      R2        G3
  3      S4
  4      S2,R2     R2      R2        G5
  5      R1        R1      R1

32
LR(0) Grammars
❖ The shift-reduce conflict in state 0 indicates that G2 is not LR(0)
❖ A grammar is LR(0) if and only if each state is either …
✴ A shift state, containing only shift items
✴ A reduce state, containing only a single reduce item
❖ If a state contains A → α ● x γ then it cannot contain B → β ●
✴ Otherwise, parser can shift x and reduce B → β ● (shift-reduce conflict)
❖ If a state contains A → α ● then it cannot contain B → β ●
✴ Otherwise, parser can reduce A → α ● and B → β ● (reduce-reduce conflict)
❖ LR(0) lacks the power to parse typical programming language grammars
✴ Because it does not use the lookahead token in making parsing decisions

33
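The two conditions above can be checked state by state; a grammar is LR(0) exactly when no state fails the check. A small illustrative sketch (items are the (lhs, rhs, dot) triples used in the earlier sketches; `terminals` is the token set of the grammar):

def is_lr0_state(state, terminals):
    reduce_items = [it for it in state if it[2] == len(it[1])]             # A -> alpha .
    shift_items  = [it for it in state
                    if it[2] < len(it[1]) and it[1][it[2]] in terminals]   # dot before a token
    if reduce_items and shift_items:
        return False                      # shift-reduce conflict
    if len(reduce_items) > 1:
        return False                      # reduce-reduce conflict
    return True

def is_lr0_grammar(states, terminals):
    return all(is_lr0_state(s, terminals) for s in states)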
SLR(1) Parsing
❖ SLR(1), or simple LR(1), improves LR(0) by …
✴ Making use of the lookahead token to eliminate conflicts
❖ SLR(1) works as follows …
✴ It uses the same DFA obtained by the LR(0) parsing method
✴ It puts reduce actions only where indicated by the FOLLOW set
❖ To reduce α to A in A → α ● we must ensure that …
✴ Next token may follow A (belongs to FOLLOW(A))
❖ We should not reduce A → α ● when next token ∉ FOLLOW(A)
❖ In grammar G2 …
✴ 0: S' → S $ 1: S → ( S ) S 2: S → ε
✴ FOLLOW(S) = {$, )}
✴ Productions 1 and 2 are reduced when next token is $ or ) only

34
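The refinement is easy to express in code: reduce entries are added only under the tokens in FOLLOW of the left-hand side. In the sketch below the FOLLOW sets of grammar G2 are written out by hand, as stated above; computing FOLLOW automatically is a separate, standard algorithm, and the table/states arguments are assumed to come from the earlier sketches.

GRAMMAR_G2 = [("S'", ("S", "$")), ("S", ("(", "S", ")", "S")), ("S", ())]
FOLLOW = {"S": {")", "$"}, "S'": {"$"}}   # FOLLOW(S) = { $, ) }

def place_reduces_slr1(table, states):
    """Add reduce entries to a table that already holds the shift/goto entries."""
    for i, state in enumerate(states):
        for (lhs, rhs, dot) in state:
            if dot == len(rhs) and lhs in FOLLOW:     # reduce item A -> alpha .
                n = GRAMMAR_G2.index((lhs, rhs))
                for y in FOLLOW[lhs]:                 # only tokens that may follow A
                    if y in table[i]:
                        print(f"SLR(1) conflict in state {i} on {y!r}")
                    table[i][y] = ("r", n)
    return table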
SLR(1) Parsing Table
❖ The SLR(1) parsing table of grammar G2 is shown below
❖ The shift-reduce conflicts are now eliminated
✴ The R2 action is removed from [0, ( ], [2, ( ], and [4, ( ]
✴ Because ( does not follow S
✴ S2 remains under [0, ( ], [2, ( ], and [4, ( ]
✴ The R1 action is also removed from [5, ( ]
❖ Grammar G2 is SLR(1)
✴ No conflicts in the parsing table
✴ R1 and R2 appear under ) and $ only
✴ The FOLLOW set indicates when to reduce

            Action                   Goto
State    (         )       $         S
  0      S2        R2      R2        G1
  1                        A
  2      S2        R2      R2        G3
  3      S4
  4      S2        R2      R2        G5
  5                R1      R1

35
SLR(1) Grammars
❖ SLR(1) parsing increases the power of LR(0) significantly
✴ Lookahead token is used to make parsing decisions
✴ Reduce action is applied more selectively according to FOLLOW set

❖ A grammar is SLR(1) if two conditions are met in every state …


✴ If A → α ● x γ and B → β ● then token x ∉ FOLLOW(B)
✴ If A → α ● and B → β ● then FOLLOW(A) ∩ FOLLOW(B) = ∅

❖ Violation of first condition results in shift-reduce conflict


✴ A → α ● x γ and B → β ● and x ∈ FOLLOW(B) then …
✴ Parser can shift x and reduce B → β

❖ Violation of second condition results in reduce-reduce conflict


✴ A → α ● and B → β ● and x ∈ FOLLOW(A) ∩ FOLLOW(B)
✴ Parser can reduce A → α and B → β

❖ SLR(1) grammars are a superset of LR(0) grammars

36
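Both conditions can be checked mechanically once the FOLLOW sets are known. A sketch in the same style as the LR(0) check above (items are (lhs, rhs, dot) triples; `follow` maps each non-terminal to its FOLLOW set):

def is_slr1_state(state, terminals, follow):
    ok = True
    for (a, alpha, d1) in state:
        if d1 < len(alpha) and alpha[d1] in terminals:          # shift item A -> alpha . x gamma
            for (b, beta, d2) in state:
                if d2 == len(beta) and alpha[d1] in follow[b]:  # x in FOLLOW(B): shift-reduce conflict
                    ok = False
    reduce_items = [(a, alpha) for (a, alpha, d) in state if d == len(alpha)]
    for i in range(len(reduce_items)):
        for j in range(i + 1, len(reduce_items)):
            if follow[reduce_items[i][0]] & follow[reduce_items[j][0]]:
                ok = False                                      # reduce-reduce conflict
    return ok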
Limits of the SLR(1) Parsing Method
❖ Consider the following grammar G3 …
0: S' → S $   1: S → id   2: S → V := E   3: V → id   4: E → V   5: E → n
❖ The initial state consists of 4 items as shown below
✴ When id is shifted in state 0, we obtain 2 items: S → id ∙ and V → id ∙

❖ FOLLOW(S) = {$} and FOLLOW(V) = {:= , $}


❖ Reduce-reduce conflict in state 1 when lookahead token is $
✴ Therefore, grammar G3 is not SLR(1)
✴ The reduce-reduce conflict is caused by the weakness of SLR(1) method
✴ V → id should be reduced only when lookahead token is := (but not $)

State 0:  S' → ∙ S $ ,  S → ∙ id ,  S → ∙ V := E ,  V → ∙ id
  ── on id ─▶  State 1:  S → id ∙ ,  V → id ∙
37
Consider the grammar
  E → T + E | T
  T → id

Construct the LR(0) and SLR(1) parsing tables.
39


Classification of Parsing

43
CLR(1) Parsing – Items and States
❖ Even more powerful than SLR(1) is the CLR(1) parsing method
❖ CLR(1) generalizes LR(0) by including a lookahead token in items
❖ A CLR(1) item consists of …
✴ Grammar production rule
✴ Right-hand position represented by the dot, and
✴ Lookahead token

A → X1 . . . Xi ∙ Xi+1 . . . Xn , l where l is a lookahead token


❖ The ∙ represents how much of the right-hand side has been seen
✴ X1 . . . Xi appear on top of the stack
✴ Xi+1 . . . Xn are expected to appear

❖ The lookahead token l is expected after X1 . . . Xn appear on stack


❖ A CLR(1) state is a set of LR(1) items

44
CLR(1) Parser Generation – Initial State
❖ Consider again grammar G3 …
  0: S' → S $   1: S → id   2: S → V := E   3: V → id   4: E → V   5: E → n

❖ The initial state contains the CLR(1) item: ( S' → ∙ S , $ )
✴ ( S' → ∙ S , $ ) means that S is expected, to be followed by $
❖ The closure of ( S' → ∙ S , $ ) produces the initial state items
✴ Since the dot appears before S, an S is expected
✴ There are two productions of S: S → id and S → V := E
✴ The CLR(1) items ( S → ∙ id , $ ) and ( S → ∙ V := E , $ ) are obtained
  ✧ The lookahead token is $ (end-of-file token)
✴ Since the ∙ appears before V in ( S → ∙ V := E , $ ), a V is expected
✴ The CLR(1) item ( V → ∙ id , := ) is obtained
  ✧ The lookahead token is := because it appears after V in ( S → ∙ V := E , $ )
45
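The lookahead computation sketched above is the only new ingredient compared with the LR(0) closure: for an item (A → α ∙ B β , a) we add (B → ∙ γ , b) for every b in FIRST(βa). The Python sketch below is illustrative and keeps FIRST deliberately simple, handling only the cases that occur in grammar G3 (β empty, or β beginning with a token).

GRAMMAR_G3 = [("S'", ("S",)), ("S", ("id",)), ("S", ("V", ":=", "E")),
              ("V", ("id",)), ("E", ("V",)), ("E", ("n",))]
NONTERMS = {"S'", "S", "V", "E"}

def first_of(beta, lookahead):
    if not beta:
        return {lookahead}              # beta is empty: the item's lookahead survives
    if beta[0] not in NONTERMS:
        return {beta[0]}                # beta starts with a token
    raise NotImplementedError("general FIRST is not needed for G3")

def closure1(items):                    # items are (lhs, rhs, dot, lookahead) tuples
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (lhs, rhs, dot, la) in list(items):
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for b in first_of(rhs[dot + 1:], la):
                    for (a, gamma) in GRAMMAR_G3:
                        if a == rhs[dot] and (a, gamma, 0, b) not in items:
                            items.add((a, gamma, 0, b))
                            changed = True
    return frozenset(items)

state0 = closure1({("S'", ("S",), 0, "$")})
# yields: (S' -> . S, $), (S -> . id, $), (S -> . V := E, $), (V -> . id, :=)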
Shift Action
❖ The initial state (state 0) consists of 4 items
❖ In state 0, we can shift an id
✴ The token id can be shifted in two items
✴ When shifting id, we shift the dot past the id
✴ We obtain ( S → id ∙ , $ ) and ( V → id ∙ , := )
✴ The two CLR(1) items form a new state (state 1)
✴ The two items are reduce items
✴ No additional item can be added to state 1

State 0:  S' → ∙ S , $      S → ∙ id , $      S → ∙ V := E , $      V → ∙ id , :=
  ── on id ─▶  State 1:  S → id ∙ , $      V → id ∙ , :=
46
Reduce and Goto Actions

❖ In state 1, ∙ appears at end of ( S → id ∙ , $ ) and ( V → id ∙ , := )


✴ This means that id appears on top of stack and can be reduced
✴ Two productions can be reduced: S → id and V → id

❖ The lookahead token eliminates the conflict of the reduce items


✴ If lookahead token is $ then id is reduced to S
✴ If lookahead token is := then id is reduced to V

❖ When in state 0 after a reduce action …


✴ If S is pushed, we obtain item (S' → S ∙ , $) and go to state 2
✴ If V is pushed, we obtain item (S → V ∙ := E , $) and go to state 3

State 0:  S' → ∙ S , $      S → ∙ id , $      S → ∙ V := E , $      V → ∙ id , :=
  ── on id ─▶  State 1:  S → id ∙ , $      V → id ∙ , :=
  ── on S  ─▶  State 2:  S' → S ∙ , $          ── on $ ─▶ Accept
  ── on V  ─▶  State 3:  S → V ∙ := E , $
47
CLR(1) State Diagram

❖ The CLR(1) state diagram of grammar G3 is shown below


❖ Grammar G3, which was not SLR(1), is now CLR(1)
❖ The reduce-reduce conflict that existed in state 1 is now removed
❖ The lookahead token in CLR(1) items eliminated the conflict

State 0:  S' → ∙ S , $      S → ∙ id , $      S → ∙ V := E , $      V → ∙ id , :=
  ── on id ─▶ 1    ── on S ─▶ 2    ── on V ─▶ 3
State 1:  S → id ∙ , $      V → id ∙ , :=
State 2:  S' → S ∙ , $                           ── on $ ─▶ Accept
State 3:  S → V ∙ := E , $                       ── on := ─▶ 4
State 4:  S → V := ∙ E , $    E → ∙ V , $    E → ∙ n , $    V → ∙ id , $
  ── on E ─▶ 5    ── on V ─▶ 6    ── on n ─▶ 7    ── on id ─▶ 8
State 5:  S → V := E ∙ , $
State 6:  E → V ∙ , $
State 7:  E → n ∙ , $
State 8:  V → id ∙ , $
48
CLR(1) State Diagram

S → C C
C → c C
C → d

49
LALR(1) State Diagram
S → C C
C → c C
C → d

50
CLR(1) State Diagram

51
CLR(1) & LALR(1) State Diagram

52
CLR(1) Grammars

❖ A grammar is CLR(1) if the following two conditions are met …


✴ If a state contains (A → α ● x γ, a) and (B → β ●, b) then b ≠ x
✴ If a state contains (A → α ●, a) and (B → β ●, b) then a ≠ b
❖ Violation of first condition results in a shift-reduce conflict
❖ If a state contains (A → α ● x γ, a) and (B → β ●, x) then …
✴ It can shift x and can reduce B → β when lookahead token is x
❖ Violation of second condition results in reduce-reduce conflict
❖ If a state contains (A → α ●, a) and (B → β ●, a) then …
✴ It can reduce A → α and B → β when lookahead token is a
❖ CLR(1) grammars are a superset of SLR(1) grammars

53
Drawback of CLR(1)
❖ CLR(1) can generate very large parsing tables
❖ For a typical programming language grammar …
✴ The number of states is around several hundred for LR(0) and SLR(1)
✴ The number of states can be several thousand for CLR(1)
❖ This is why parser generators do not adopt the general CLR(1) method
❖ Consider again grammar G2 for matched parentheses

0: S' → S $ 1: S → ( S ) S 2: S → ε

❖ The CLR(1) DFA has 10 states, while the LR(0) DFA has 6

54
CLR(1) DFA of Grammar G2
Grammar G2:   0: S' → S $   1: S → ( S ) S   2: S → ε

State 0:  S' → ∙ S $      S → ∙ ( S ) S , $      S → ∙ , $
  ── on S ─▶ 1    ── on ( ─▶ 2
State 1:  S' → S ∙ $                              ── on $ ─▶ Accept
State 2:  S → ( ∙ S ) S , $    S → ∙ ( S ) S , )    S → ∙ , )
  ── on S ─▶ 3    ── on ( ─▶ 6
State 3:  S → ( S ∙ ) S , $                       ── on ) ─▶ 4
State 4:  S → ( S ) ∙ S , $    S → ∙ ( S ) S , $    S → ∙ , $
  ── on S ─▶ 5    ── on ( ─▶ 2
State 5:  S → ( S ) S ∙ , $
State 6:  S → ( ∙ S ) S , )    S → ∙ ( S ) S , )    S → ∙ , )
  ── on S ─▶ 7    ── on ( ─▶ 6
State 7:  S → ( S ∙ ) S , )                       ── on ) ─▶ 8
State 8:  S → ( S ) ∙ S , )    S → ∙ ( S ) S , )    S → ∙ , )
  ── on S ─▶ 9    ── on ( ─▶ 6
State 9:  S → ( S ) S ∙ , )
55
LALR(1) : Look-Ahead LR(1)
❖ Preferred parsing technique in many parser generators
❖ Close in power to LR(1), but with fewer states
❖ The increased number of states in LR(1) arises because
✴ Different lookahead tokens are associated with the same LR(0) items
❖ Number of states in LALR(1) = states in LR(0)
❖ LALR(1) is based on the observation that
✴ Some LR(1) states have same LR(0) items
✴ Differ only in lookahead tokens
❖ LALR(1) can be obtained from LR(1) by
✴ Merging LR(1) states that have same LR(0) items
✴ Obtaining the union of the LR(1) lookahead tokens

56
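The merge itself is a simple grouping step. Below is a sketch, assuming the LR(1) states are sets of the (lhs, rhs, dot, lookahead) tuples used in the CLR(1) closure sketch: states with the same LR(0) core are grouped, and the lookaheads of matching items are unioned.

from collections import defaultdict

def lalr_merge(lr1_states):
    groups = defaultdict(list)
    for idx, state in enumerate(lr1_states):
        core = frozenset((lhs, rhs, dot) for (lhs, rhs, dot, _) in state)
        groups[core].append(idx)                     # states sharing an LR(0) core

    merged = []
    for core, members in groups.items():
        lookaheads = defaultdict(set)
        for idx in members:
            for (lhs, rhs, dot, la) in lr1_states[idx]:
                lookaheads[(lhs, rhs, dot)].add(la)  # union of the lookahead tokens
        merged.append({(lhs, rhs, dot, frozenset(las))
                       for (lhs, rhs, dot), las in lookaheads.items()})
    return merged      # number of merged states = number of LR(0) states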
LALR(1) DFA of Grammar G2
Grammar G2:   0: S' → S $   1: S → ( S ) S   2: S → ε

State 0:  S' → ∙ S $      S → ∙ ( S ) S , $      S → ∙ , $
  ── on S ─▶ 1    ── on ( ─▶ 2
State 1:  S' → S ∙ $                              ── on $ ─▶ Accept
State 2:  S → ( ∙ S ) S , $/)    S → ∙ ( S ) S , )    S → ∙ , )
  ── on S ─▶ 3    ── on ( ─▶ 2
State 3:  S → ( S ∙ ) S , $/)                     ── on ) ─▶ 4
State 4:  S → ( S ) ∙ S , $/)    S → ∙ ( S ) S , $/)    S → ∙ , $/)
  ── on S ─▶ 5    ── on ( ─▶ 2
State 5:  S → ( S ) S ∙ , $/)
57
YACC Parser

(Slides 58–61 present the YACC parser generator through figures that are not reproduced in this text version.)
Grammar Hierarchy

62
Syntax Error Handling

● Next few lectures will focus on the nature of syntactic errors and
general strategies for error-recovery.
● If a compiler had to process only correct programs, its design would
be extremely simple. However, it is expected to assist the programmer
in locating and tracking down errors.
● One should note that a programming language does not specify how
a compiler should respond to errors; this is left entirely to the
compiler designer.

63
Review: Common Programming Errors

■ Semantic Errors: Type mismatches between operators and operands, e.g., a
return statement that returns a value in a function whose return type is void.
■ Syntactic Errors: Misplaced semicolons, extra or missing braces.
■ Lexical errors: Misspellings of keywords, identifiers.
■ Logical Errors: Incorrect reasoning or plain carelessness might result in
errors such as using the = and == operators interchangeably.

64
Challenges of Error Handling
Viable-prefix property: detecting an error as soon as a prefix of the input
cannot be completed to form a string of the language.

Goals of an Error Handler:
● Reporting presence of errors, clearly and accurately.
● Recovering from errors quickly enough to detect subsequent errors.
● Adding minimal overhead to the processing of correct programs.

65
Error recovery
● There is no universally acceptable method.
● The simplest method is for the parser to quit with the appropriate
error message, at the first instance of an error.
● Problem? Subsequent errors will not be detected.
● Solution? If the parser can restore itself to a state where processing
can continue, future errors can be detected.
● In some cases the compiler stops if errors pile up.

66
Error Recovery Strategies
● Panic Mode Recovery: The parser discovers an error. It then discards
input symbols until one of a designated set of synchronizing tokens is found.
● The synchronizing tokens are selected such that their role in the program
is unambiguous, e.g., delimiters such as ; and }. Advantage? Simple and never
goes into an infinite loop.
● Drawback: Skips a considerable amount of input without checking it for
additional errors.

67
Phrase-level Recovery

● Local correction performed by the parser on the remaining input: a prefix of it is
replaced by some string that allows the parser to continue.
● Examples: replacing a comma by a semicolon, inserting a missing semicolon, etc.
● Drawbacks: An improper replacement might lead to an infinite loop. More
importantly, it copes poorly when the actual error has occurred before the point of detection.
● Advantage: It can correct any input string.

68
Error Productions

● A method of anticipating common errors that might be encountered.


● Augmenting the grammar for the language at hand, with productions
that generate erroneous constructs.
● Such a parser will detect anticipated errors when an error production
is used.
● Advantage: Error diagnostics will be readily available for such
anticipated errors.

69
Global Corrections

● Ideally minimum changes should be made to the input string.


● Given an incorrect input string x, the algorithm finds a parse tree for a related string
y such that the number of insertions, deletions, and token changes
required to convert x into y is minimal.
● Drawback: Too costly to implement in terms of both time and space,
and only theoretical.
● It is however used as a yardstick for evaluating error-recovery
techniques.

70
Reference Materials

Reference video materials


1. https://www.youtube.com/watch?v=g9Pb5L8aLeI
2. https://www.youtube.com/watch?v=APJ_Eh60Qwo
3. https://www.youtube.com/watch?v=0kiTNN2kHyY
4. https://www.youtube.com/watch?v=MIg2ymmMn4k
5. https://www.youtube.com/watch?v=5s4CWn6GiwY
6. https://www.youtube.com/watch?v=VSkfnRfNuwI
7. https://www.youtube.com/watch?v=1XD2wk52-Cs
8. https://www.cs.princeton.edu/courses/archive/spring20/cos320/LR0/

71
