CD Unit-3 (1) (R20)
UNIT III
Bottom Up Parsing
Introduction
Bottom-up parsing is applied in the syntax analysis phase of the compiler. A bottom-up parser parses the stream of tokens produced by the lexical analyzer and, if there is no error in the input string, constructs a parse tree as output.
The bottom-up parser builds the parse tree starting from the leaf nodes, i.e., from the bottom of the tree, and gradually proceeds upwards to the root node. In this section, we will discuss bottom-up parsing along with its types.
Bottom-up parsers can be built for the large class of LR grammars, since bottom-up parsing corresponds to the process of reducing the input string to the start symbol of the grammar.
Steps of Reducing a String:
The reduction process is just the reverse of the derivation we have seen in top-down parsing. Thus, bottom-up parsing derives the input string in reverse.
Top-down vs Bottom-up:
1. Top-down constructs the tree from the root to the leaves; bottom-up constructs the tree from the leaves to the root.
2. Top-down "guesses" which RHS to substitute for a nonterminal; bottom-up "guesses" which rule to "reduce" by.
3. Top-down produces a leftmost derivation; bottom-up produces a reverse rightmost derivation.
4. Top-down: recursive descent, LL parsers; bottom-up: shift-reduce, LR, LALR, etc.
5. Top-down is easy for humans; bottom-up is "harder" for humans.
Bottom-up parsing is classified into the following types:
1. Shift-Reduce Parsing
2. Operator Precedence Parsing
3. Table Driven LR Parsing
LL vs LR:
1. An LL parser starts with the root nonterminal on the stack; an LR parser ends with the root nonterminal on the stack.
2. LL uses the stack for designating what is still to be expected; LR uses the stack for designating what has already been seen.
3. LL builds the parse tree top-down; LR builds the parse tree bottom-up.
4. LL continuously pops a nonterminal off the stack and pushes the corresponding right-hand side; LR tries to recognize a right-hand side on the stack, pops it, and pushes the corresponding nonterminal.
5. LL reads the terminals when it pops one off the stack; LR reads the terminals while it pushes them on the stack.
6. LL performs a pre-order traversal of the parse tree; LR performs a post-order traversal.
Types of LR Parsers
LR parsing is also a category of Bottom-up parsing. It is generally used to parse the class of
grammars whose size is large. In the LR parsing, "L" stands for scanning of the input left-to-right,
and "R" stands for constructing a rightmost derivation in a reverse way.
"K" stands for the count of input symbols of the look-ahead that are used to make many parsing
decisions.
Similar to predictive parsing the end of the input buffer and end of stack has $.
The input buffer has the input string that has to be parsed.
The stack maintains the sequence of grammar symbols while parsing the input string.
The parsing table is a two-dimensional array that has two entries ‘Go To’ and ‘Action’.
LR parsers have several advantages: they can be built for virtually all programming-language constructs, they detect a syntax error as soon as it is possible to do so on a left-to-right scan, and the class of grammars they handle is a proper superset of the class handled by predictive (LL) parsers.
The LR algorithm requires input, output, a stack, and a parsing table. In all types of LR parsing, the input, output, and stack are the same, but the parsing table differs. The input buffer holds the string to be parsed, followed by the end marker $. The stack holds the sequence of grammar symbols, with the symbol $ at the bottom of the stack.
A parsing table is a two-dimensional array. It contains two parts: the action part and the goto part.
LR (1) Parsing
The various steps involved in the LR (1) Parsing are as follows:
Augment Grammar
An augmented grammar is generated by adding one more production, S' → S, to the given grammar G. It helps the parser identify when to stop parsing and announce acceptance of the input.
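As a small sketch (the list-of-pairs grammar representation here is an assumption of the sketch), augmenting simply prepends the one production S' → S:

```python
def augment(grammar, start):
    # Prepend S' -> S; the parser accepts exactly when it reduces by
    # this production with the input exhausted.
    new_start = start + "'"
    return [(new_start, (start,))] + grammar

g = [("S", ("A", "A")), ("A", ("a", "A")), ("A", ("b",))]
print(augment(g, "S")[0])   # ("S'", ("S",))
```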
Shift reduce parsing is the most general form of bottom-up parsing. Here we have an input
buffer that holds the input string that is scanned by the parser from left to right. There is also a stack that is
used to hold the grammar symbols during parsing.
The bottom of the stack and the right end of the input buffer are marked with $. Initially, before the parsing starts:
The input buffer holds the input string provided by the lexical analyzer.
The stack is empty.
As the parser parses the string from left to right then it shifts zero or more input symbols onto the
stack.
The parser continues to shift input symbols onto the stack until the top of the stack holds a substring that matches the body of a production in the grammar. The substring is then replaced, or reduced, by the corresponding nonterminal.
The parser continues shift-reducing until one of the following conditions occurs:
It identifies an error, or
the stack contains the start symbol of the grammar and the input buffer becomes empty.
1. Shift: This action shifts the next input symbol present on the input buffer onto the top of the
stack.
2. Reduce: This action is performed when the top of the stack holds the right end of a handle and the left end of the handle lies within the stack. The reduce action replaces the entire handle with the nonterminal whose production body matches the replaced substring.
3. Accept: When, at the end of parsing, the input buffer is empty and the stack is left with only the start symbol of the grammar, the parser announces the successful completion of parsing.
4. Error: This action identifies the error and performs an error recovery routine.
Let us take an example of the shift-reduce parser. Consider the string id * id + id, where the grammar for the input string is:
E ->E + T | T
T -> T * F | F
F -> ( E ) | id
Note: In shift-reduce parsing a handle always appears on top of the stack. The handle is a substring that matches the body of a production in the grammar. The handle never appears inside the stack.
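The moves on id * id + id can be checked mechanically. The sketch below replays a hand-chosen sequence of shift/reduce actions (the sequence itself is supplied as an oracle; a real parser would choose the actions from a table) and verifies that every handle is on top of the stack and that the stack ends holding only E:

```python
# Replay shift/reduce moves for id * id + id with the grammar
# E -> E+T | T,   T -> T*F | F,   F -> (E) | id
tokens = ["id", "*", "id", "+", "id"]

# Hand-chosen action sequence; "r LHS RHS..." reduces the top symbols.
moves = [
    "shift", "r F id", "r T F", "shift", "shift", "r F id",
    "r T T * F", "r E T", "shift", "shift", "r F id", "r T F",
    "r E E + T",
]

stack, i = [], 0
for m in moves:
    if m == "shift":
        stack.append(tokens[i])
        i += 1
    else:
        lhs, *rhs = m.split()[1:]
        # the handle must sit on top of the stack, never inside it
        assert stack[-len(rhs):] == rhs, f"no handle {rhs} on {stack}"
        del stack[-len(rhs):]
        stack.append(lhs)

print(stack)   # ['E']  -- reduced to the start symbol: accept
```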
Shift-reduce parsing is a process of reducing a string to the start symbol of its grammar. It uses a stack to hold the grammar symbols and an input tape to hold the string. Shift-reduce parsing performs the two primary actions: shift and reduce.
Example
Grammar:
1. S → S+S
2. S → S-S
3. S → (S)
4. S → a
Input string: a1-(a2+a3)
Parsing table
Stack        Input          Action
$            a1-(a2+a3)$    shift a1
$a1          -(a2+a3)$      reduce by S → a
$S           -(a2+a3)$      shift -
$S-          (a2+a3)$       shift (
$S-(         a2+a3)$        shift a2
$S-(a2       +a3)$          reduce by S → a
$S-(S        +a3)$          shift +
$S-(S+       a3)$           shift a3
$S-(S+a3     )$             reduce by S → a
$S-(S+S      )$             reduce by S → S+S
$S-(S        )$             shift )
$S-(S)       $              reduce by S → (S)
$S-S         $              reduce by S → S-S
$S           $              accept
Parsing Action
The sequence in which parsing actions are performed in operator precedence parsing is:
At first, the $ symbol is added to both ends of the string.
Now we scan the input string from left to right until the first ⋗ is encountered.
Then we scan leftwards over all the equal-precedence (≐) relations until the leftmost ⋖ is encountered.
Everything between the leftmost ⋖ and the rightmost ⋗ is a handle.
If the relation is $ on $, parsing is successful.
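These steps can be sketched with a terminals-only stack and a table of ⋖ / ≐ / ⋗ relations. The grammar E → E+E | E*E | (E) | id and the relation table below (with * binding tighter than +) are illustrative assumptions of this sketch:

```python
# Operator-precedence sketch: '<' is the <. relation, '=' is =., '>' is .>
REL = {
    ("id","+"):">", ("id","*"):">", ("id",")"):">", ("id","$"):">",
    ("+","id"):"<", ("+","+"):">", ("+","*"):"<", ("+","("):"<",
    ("+",")"):">", ("+","$"):">",
    ("*","id"):"<", ("*","+"):">", ("*","*"):">", ("*","("):"<",
    ("*",")"):">", ("*","$"):">",
    ("(","id"):"<", ("(","+"):"<", ("(","*"):"<", ("(","("):"<",
    ("(",")"):"=",
    (")","+"):">", (")","*"):">", (")",")"):">", (")","$"):">",
    ("$","id"):"<", ("$","+"):"<", ("$","*"):"<", ("$","("):"<",
}

def op_precedence_parse(tokens):
    stack, inp, i = ["$"], tokens + ["$"], 0   # terminals only on the stack
    reductions = 0
    while True:
        a, b = stack[-1], inp[i]
        if a == "$" and b == "$":
            return reductions                  # $ on $: success
        rel = REL.get((a, b))
        if rel in ("<", "="):                  # shift the input terminal
            stack.append(b)
            i += 1
        elif rel == ">":                       # a handle ends here: pop it
            last = stack.pop()
            while REL.get((stack[-1], last)) != "<":
                last = stack.pop()             # scan leftwards to the <.
            reductions += 1
        else:
            raise SyntaxError(f"no relation between {a!r} and {b!r}")

print(op_precedence_parse("id + id * id".split()))   # 5 reductions
```

The five reductions for id + id * id are the three E → id reductions, then E → E*E, then E → E+E, matching the precedence of the operators.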
Table-driven LR Parsing
Types of LR Parsers:
1. LR( 1 )
2. SLR( 1 )
3. CLR ( 1 )
4. LALR( 1 )
SLR Parsers
SLR means simple LR. It handles the smallest class of LR grammars and produces tables with few states. An SLR parser is very easy to construct and is similar to LR parsing. The difference between the SLR parser and the LR(0) parser is that the LR(0) parsing table can contain shift-reduce conflicts, because a 'reduce' entry is placed under all terminals in every final state. We solve this problem by entering 'reduce' only under the terminals in FOLLOW of the LHS of the production in the final state. This is called the SLR(1) collection of items.
Construction of SLR Parsing Tables
Steps for constructing the SLR parsing table :
1. Write the augmented grammar
2. Find the LR(0) collection of items
3. Find FOLLOW of the LHS of each production
4. Define two functions, action[list of terminals] and goto[list of non-terminals], in the parsing table
EXAMPLE – Construct the SLR parsing table for the given context-free grammar
S -> AA
A -> aA | b
Solution:
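The collection is normally built by hand; as a cross-check, the LR(0) canonical collection for the augmented grammar S' -> S, S -> AA, A -> aA | b can also be computed with a short Python sketch (the (production index, dot position) item encoding is this sketch's assumption):

```python
GRAMMAR = [
    ("S'", ("S",)),        # 0: augmented production
    ("S",  ("A", "A")),    # 1
    ("A",  ("a", "A")),    # 2
    ("A",  ("b",)),        # 3
]
NONTERMS = {"S'", "S", "A"}

def closure(items):
    # LR(0) closure: while the dot precedes a nonterminal, add its productions
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (pi, dot) in list(items):
            rhs = GRAMMAR[pi][1]
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                for qi, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot] and (qi, 0) not in items:
                        items.add((qi, 0))
                        changed = True
    return frozenset(items)

def goto(items, X):
    # move the dot over X in every item where X follows the dot
    moved = {(pi, dot + 1) for (pi, dot) in items
             if dot < len(GRAMMAR[pi][1]) and GRAMMAR[pi][1][dot] == X}
    return closure(moved) if moved else None

def canonical_collection():
    start = closure({(0, 0)})          # closure of S' -> .S
    states, work = [start], [start]
    while work:
        I = work.pop()
        for X in {GRAMMAR[pi][1][dot] for (pi, dot) in I
                  if dot < len(GRAMMAR[pi][1])}:
            J = goto(I, X)
            if J is not None and J not in states:
                states.append(J)
                work.append(J)
    return states

print(len(canonical_collection()))   # 7 states: I0 .. I6
```

For this grammar the collection has seven states I0–I6, which is what the hand construction yields as well.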
The SLR parser discussed above has certain flaws:
1. On a single input, a state may include both a final item and a non-final item. This may result in a shift-reduce conflict.
2. A state may include two different final items. This might result in a reduce-reduce conflict.
3. An SLR(1) parser reduces only when the next token is in FOLLOW of the left-hand side of the production.
4. SLR(1) can resolve some shift-reduce conflicts but not reduce-reduce conflicts.
CLR refers to canonical LR with lookahead. CLR parsing uses the canonical collection of LR(1) items to build the CLR(1) parsing table. The CLR(1) parsing table produces more states as compared to SLR(1) parsing.
In CLR(1), we place the reduce entries only under the lookahead symbols.
LR (1) item
The lookahead of the augmented production is always $.
Example
CLR ( 1 ) Grammar
1. S → AA
2. A → aA
3. A → b
Add the augmented production, insert the '•' symbol at the first position of every production in G, and also add the lookahead.
1) S` → •S, $
2) S → •AA, $
3) A → •aA, a/b
4) A → •b, a/b
I0 State:
Add all productions starting with S into the I0 state, because the "•" is followed by the non-terminal S. So, the I0 state becomes
I0 = S` → •S, $
S → •AA, $
Add all productions starting with A into the modified I0 state, because the "•" is followed by the non-terminal A. So, the I0 state becomes
I0= S` → •S, $
S → •AA, $
A → •aA, a/b
A → •b, a/b
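The closure computation that produces this I0 can be sketched in Python. Items carry a lookahead, and the lookaheads of the added items come from FIRST of what follows the nonterminal, i.e., FIRST(βa) for an item [A → α•Bβ, a]. The (production index, dot, lookahead) encoding is this sketch's assumption:

```python
GRAMMAR = [
    ("S'", ("S",)),        # 0: augmented production
    ("S",  ("A", "A")),    # 1
    ("A",  ("a", "A")),    # 2
    ("A",  ("b",)),        # 3
]
NONTERMS = {"S'", "S", "A"}

# FIRST sets by fixpoint (this grammar has no epsilon productions)
FIRST = {n: set() for n in NONTERMS}
changed = True
while changed:
    changed = False
    for lhs, rhs in GRAMMAR:
        f = FIRST[rhs[0]] if rhs[0] in NONTERMS else {rhs[0]}
        if not f <= FIRST[lhs]:
            FIRST[lhs] |= f
            changed = True

def first_of(seq):
    # FIRST of a non-empty symbol sequence
    return FIRST[seq[0]] if seq[0] in NONTERMS else {seq[0]}

def closure(items):
    # items are (production index, dot position, lookahead) triples
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (pi, dot, la) in list(items):
            rhs = GRAMMAR[pi][1]
            if dot < len(rhs) and rhs[dot] in NONTERMS:
                lookaheads = first_of(rhs[dot + 1:] + (la,))  # FIRST(beta a)
                for qi, (lhs, _) in enumerate(GRAMMAR):
                    if lhs == rhs[dot]:
                        for b in lookaheads:
                            if (qi, 0, b) not in items:
                                items.add((qi, 0, b))
                                changed = True
    return items

I0 = closure({(0, 0, "$")})
print(len(I0))   # 6 items, matching the I0 listed above
```

The six items are exactly [S' → •S, $], [S → •AA, $], and [A → •aA, •b] with lookaheads a and b.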
Add all productions starting with A into the I2 state, because the "•" is followed by the non-terminal A. So, the I2 state becomes
I2 = S → A•A, $
A → •aA, $
A → •b, $
Add all productions starting with A into the I3 state, because the "•" is followed by the non-terminal A. So, the I3 state becomes
I3 = A → a•A, a/b
A → •aA, a/b
A → •b, a/b
Add all productions starting with A into the I6 state, because the "•" is followed by the non-terminal A. So, the I6 state becomes
I6 = A → a•A, $
A → •aA, $
A → •b, $
Drawing DFA:
1. S → AA ... (1)
2. A → aA ....(2)
3. A → b ... (3)
The placement of the shift entries in the CLR(1) parsing table is the same as in the SLR(1) parsing table. The only difference is in the placement of the reduce entries.
I4 contains the final item (A → b•, a/b), so action[I4, a] = R3 and action[I4, b] = R3.
I5 contains the final item (S → AA•, $), so action[I5, $] = R1.
I7 contains the final item (A → b•, $), so action[I7, $] = R3.
I8 contains the final item (A → aA•, a/b), so action[I8, a] = R2 and action[I8, b] = R2.
I9 contains the final item (A → aA•, $), so action[I9, $] = R2.
The method for building the collection of sets of valid LR(1) items is essentially the same as the one
for building the canonical collection of sets of LR(0) items. We need only to modify the two
procedures CLOSURE and GOTO.
We now introduce our last parser construction method, the LALR (lookahead-LR) technique. This method is often used in practice, because the tables obtained by it are considerably smaller than the canonical LR tables, yet most common syntactic constructs of programming languages can be expressed conveniently by an LALR grammar. The same is almost true for SLR grammars, but there are a few constructs that cannot be conveniently handled by SLR techniques.
For a comparison of parser sizes, the SLR and LALR tables for a grammar always have the same number of states, and this number is typically several hundred for a language like C. The canonical LR table would typically have several thousand states for the same-size language. Thus, it is much easier and more economical to construct SLR and LALR tables than the canonical LR tables.
By way of introduction, let us again consider grammar (4.55), whose sets of LR(1) items were shown in Fig. 4.41. Take a pair of similar-looking states, such as I4 and I7; each has the same core of items and differs only in the lookaheads. Merging states with common cores can, however, create conflicts. Consider the grammar S → aAd | bBd | aBe | bAe, A → c, B → c, which generates the four strings acd, ace, bcd, and bce. The reader can check that the grammar is LR(1) by constructing the sets of items. Upon doing so, we find that merging the states containing A → c• and B → c•
generates a reduce/reduce conflict, since reductions by both A → c and B → c are called for on inputs d and e.
We are now prepared to give the first of two LALR table-construction algorithms. The general idea is to construct the sets of LR(1) items, and if no conflicts arise, merge sets with common cores. We then construct the parsing table from the collection of merged sets of items. The method we are about to describe serves primarily as a definition of LALR(1) grammars. Constructing the entire collection of LR(1) sets of items requires too much space and time to be useful in practice.
INPUT : An augmented grammar G'.
OUTPUT : The LALR parsing-table functions ACTION and GOTO for G'.
METHOD :
The table produced by Algorithm 4.59 is called the LALR parsing table for G. If there are no parsing-action conflicts, then the given grammar is said to be an LALR(1) grammar. The collection of sets of items constructed in step (3) is called the LALR(1) collection.
Example
To see how the GOTOs are computed, consider GOTO(I36, C). In the original sets of LR(1) items, GOTO(I3, C) = I8, and I8 is now part of I89, so we make GOTO(I36, C) be I89. We could have arrived at the same conclusion if we considered I6, the other part of I36. That is, GOTO(I6, C) = I9, and I9 is now part of I89. For another example, consider GOTO(I2, c), an entry that is exercised after the shift action of I2 on input c. In the original sets of LR(1) items, GOTO(I2, c) = I6. Since I6 is now part of I36, GOTO(I2, c) becomes I36. Thus, the entry in Fig. 4.43 for state 2 and input c is made s36, meaning shift and push state 36 onto the stack.
When presented with a string from the language c*dc*d, both the LR parser of Fig. 4.42 and the
LALR parser of Fig. 4.43 make exactly the same sequence of shifts and reductions, although the
names of the states on the stack may differ. For instance, if the LR parser puts I3 or I6 on the stack,
the LALR parser will put I36 on the stack. This relationship holds in general for an LALR grammar.
The LR and LALR parsers will mimic one another on correct inputs.
If we have the LALR(1) kernels, we can generate the LALR(1) parsing table by closing each kernel, using the function CLOSURE of Fig. 4.40, and then computing table entries by Algorithm 4.56, as if the LALR(1) sets of items were canonical LR(1) sets of items.
Example 4.61: We shall use as an example of the efficient LALR(1) table-construction method the non-SLR grammar from Example 4.48, which we reproduce below in its augmented form:
Example
Let us construct the kernels of the LALR(1) items for the grammar of Example 4.61. The kernels of the LR(0) items were shown in Fig. 4.44. When we apply Algorithm 4.62 to the kernel of the set of items I0, we first compute CLOSURE({[S' → •S, #]}), which is
In Fig. 4.47, we show steps (3) and (4) of Algorithm 4.63. The column labeled INIT shows the spontaneously generated lookaheads for each kernel item. These are only the two occurrences of = discussed earlier, and the spontaneous lookahead $ for the initial item S' → •S.
On the first pass, the lookahead $ propagates from S' → •S in I0 to the six items listed in Fig. 4.46. The lookahead = propagates from L → *•R in I4 to the items L → *R• in I7 and R → L• in I8. It also propagates to itself and to L → id• in I5, but these lookaheads are already present.
In the second and third passes, the only new lookahead propagated is $, discovered for the successors of I2 and I4 on pass 2 and for the successor of I6 on pass 3. No new lookaheads are propagated on pass 4, so the final set of lookaheads is shown in the rightmost column of Fig. 4.47.
Note that the shift/reduce conflict found in Example 4.48 using the SLR method has disappeared with the LALR technique. The reason is that only lookahead $ is associated with R → L• in I2, so there is no conflict with the parsing action of shift on = generated by the item S → L•=R in I2.
Compilers and interpreters use grammar to build the data structure in order to process the programs.
So ideally one program should have one derivation tree. A parse tree or a derivation tree is a
graphical representation that shows how strings of the grammar are derived using production rules.
But there exist some strings which are ambiguous.
A grammar is said to be ambiguous if there exists more than one leftmost derivation, more than one rightmost derivation, or more than one parse tree for some input string. An ambiguous grammar or string can have multiple meanings. Ambiguity is often treated as a grammar bug; in programming languages it is mostly unintended.
Dangling-else ambiguity
The dangling-else problem is a syntactic ambiguity. It occurs when we use nested if statements. When there are multiple “if” statements, it is not clear with which “if” the “else” part should combine.
For example:
if (condition)
    if (condition 1)
        statement 1;
else
    statement 2;
Here it is ambiguous whether the “else” belongs to the inner “if (condition 1)” or to the outer “if (condition)”.
To solve the issue, programming languages like C, C++, and Java combine the “else” part with the innermost “if” statement. But sometimes we want the outermost “if” statement to be combined with the “else” part.
Secondly, we can resolve the dangling-else problems in programming languages by using braces
and indentation.
For example:
if (condition) {
if (condition 1) {
if (condition 2) {}
}
}
else {
}
In the above example, we are using braces and indentation so as to avoid confusion.
Third, we can also use the “if – else if – else” format so as to specifically indicate which “else”
belongs to which “if”.
if(condition) {
}
else if(condition-1) {
}
else if(condition-2){
}
else{
}
LR(0) parser
SLR parser
LALR parser
CLR parser
LR parsers report an error when there is no valid continuation for the input scanned thus far. A canonical (CLR) parser never performs even a single reduction before announcing an error; an SLR or LALR parser may perform several reductions, but it will never shift an erroneous input symbol onto the stack.
In LR parsing, an error is detected when the parser consults the table and finds the relevant action entry empty. GOTO entries are never used to detect errors.
Which programmer mistakes invoke which error procedures in the parser table is determined based on the language, by creating error procedures that can alter the top of the stack and/or certain symbols on the input in a way that is acceptable for the table's error entries.
Errors in structure
Missing operator
Misspelled keywords
Unbalanced parenthesis
This approach involves discarding input symbols one by one until one of a designated set of synchronizing tokens is found. Delimiters such as semicolons or closing braces are typical synchronizing tokens. The benefit is that it is simple to implement and guarantees not to fall into an infinite loop. The drawback is that a significant quantity of input is skipped without being checked for additional errors.
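A minimal sketch of the skipping step, with semicolon and closing brace assumed as the synchronizing set for illustration:

```python
SYNC = {";", "}"}   # assumed synchronizing tokens for this sketch

def panic_skip(tokens, i):
    # Discard input symbols one by one until a synchronizing token
    # (or end of input) is reached; parsing resumes from there.
    while i < len(tokens) and tokens[i] not in SYNC:
        i += 1
    return i

print(panic_skip(["x", "+", "*", ";", "y"], 1))   # 3
```

Starting at index 1, the bad tokens "+" and "*" are discarded and the parser resumes at the ";" in position 3.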
Scan the stack until a state ‘a’ with a goto on a certain non-terminal ‘B’ is found (by removing states from the stack).
Zero or more input symbols are then discarded until a symbol ‘b’ that can follow ‘B’ is identified.
E→E+E
E→E*E
E→(E)
E → id
Step 1: First, construct the parsing table for the given grammar:
Parsing Table
String: id+)$
STACK            INPUT    ACTION/REMARK
0                id+)$    shift id
0 id 3           +)$      reduce by E → id
0 E 1            +)$      shift +
0 E 1 + 4        )$       error: the erroneous “)” is discarded from the input
0 E 1 + 4        $        error: missing operand; an imaginary id is inserted
0 E 1 + 4 id 3   $        reduce by E → id
0 E 1 + 4 E 7    $        reduce by E → E + E
0 E 1            $        accept
Example:
Let's take the following ambiguous grammar:
E -> E+E
E -> E*E
E -> id
Let's assume the precedence and associativity of the operators (+ and *) of the grammar are as follows: both + and * are left associative, and * has higher precedence than +.
From the LR(1) item DFA we can see that there are shift/reduce conflicts in the state I5 and I6. So
the parsing table is as follows:
There are both shift and reduce moves in I5 and I6 on “+” and “*”. To resolve this conflict, that is, to determine which move to keep and which to discard from the table, we shall use the precedence and associativity of the operators.
Consider the input string:
id + id + id
Let's look at the parser's moves up to the conflict state according to the above parsing table.
If we take the reduce move of I5 state on symbol “+” as in parser 1, then the left “+” of the input
string is reduced before the right “+”, which makes “+” left associative.
If we take the shift move of I5 state on symbol “+” as in parser 2, then the right “+” of the input
string is reduced before the left “+”, which makes “+” right associative.
Similarly, Taking shift move of I5 state on symbol “*” will give “*” higher precedence over “+”, as
“*” will be reduced before “+”. Taking reduce move of I5 state on symbol “*” will give “+” higher
precedence over “*”, as “+” will be reduced before “*”. Similar to I5, conflicts from I6 can also be
resolved.
According to the precedence and associativity of our example, the conflict is resolved as follows,
The shift/reduce conflict at I5 on “+” is resolved by keeping the reduce move and discarding the
shift move, which makes “+” left associative.
The shift/reduce conflict at I5 on “*” is resolved by keeping the shift move and discarding the
reduce move, which will give “*” higher precedence over “+”.
The shift/reduce conflict at I6 on “+” is resolved by keeping the reduce move and discarding the
shift move, which will give “*” higher precedence over “+”.
The shift/reduce conflict at I6 on “*” is resolved by keeping the reduce move and discarding the
shift move, which makes “*” left associative.
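The resolution rule used above can be written down directly: compare the precedence of the operator already on the stack with the lookahead operator, and use associativity to break ties. A small Python sketch (the precedence and associativity tables are this example's assumptions):

```python
PREC  = {"+": 1, "*": 2}            # "*" binds tighter than "+"
ASSOC = {"+": "left", "*": "left"}

def resolve(stack_op, lookahead):
    """Decide shift vs reduce for a configuration  E stack_op E . lookahead"""
    if PREC[stack_op] > PREC[lookahead]:
        return "reduce"              # stack operator binds tighter
    if PREC[stack_op] < PREC[lookahead]:
        return "shift"               # lookahead binds tighter
    # equal precedence: left associativity reduces, right associativity shifts
    return "reduce" if ASSOC[stack_op] == "left" else "shift"

print(resolve("+", "*"))  # shift  ("*" has higher precedence than "+")
print(resolve("+", "+"))  # reduce ("+" is left associative)
print(resolve("*", "+"))  # reduce ("*" has higher precedence than "+")
```

These three answers reproduce the resolutions chosen for I5 and I6 above.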
Generally, the parser generator tool YACC resolves conflicts due to ambiguous grammars as follows:
A shift/reduce conflict in the parsing table is resolved by giving priority to the shift move: the reduce action is discarded from the conflicting entry.
A reduce/reduce conflict in the parsing table is resolved by giving priority to the reduction by the production that appears earlier in the grammar: the later reduce action is discarded.