Chapter 6
• The parser obtains a string of tokens from the lexical analyzer and verifies that the string can be generated by the grammar for the language.
– Collect information about various tokens into the symbol table.
– Perform type checking and other semantic analysis.
– Generate intermediate code.
• In practice, parsers are expected to
– Report syntax errors and
– Recover from commonly occurring errors.
• No error-recovery strategy is proven universally acceptable, and the simplest approach is for the parser to quit with an error message when it detects the first error.
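[Figure: position of the parser in the compiler. Source program → Lexical Analyzer → (tokens) → Parser → (parse tree) → Rest of Front End → intermediate representation. The parser asks the lexical analyzer for each token ("get next token"); the Symbol Table sits alongside these components.]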
Types of Parsers
E → E + T | E – T | T
T → T * F | T / F | F
F → ( E ) | id
Derivations
• If S ⇒* α, where S is the start symbol of G, then α is a sentential form of G.
Leftmost Derivation and Rightmost Derivation
[Figure: six parse trees, one per step of the derivation E ⇒ –E ⇒ –(E) ⇒ –(E+E) ⇒ –(id+E) ⇒ –(id+id). Yield: read the leaves from left to right.]
Sequence of parse trees for leftmost derivation (4.8)
Relationship between Derivations and Parse Trees
• Consider any derivation α1 ⇒ α2 ⇒ … ⇒ αn, where α1 is a single nonterminal A.
– For each sentential form αi, we can construct a parse tree whose yield is αi.
• Induction on i
– BASIS: the tree for α1 = A is a single node labeled A.
– INDUCTION:
- Suppose we have constructed a parse tree with yield αi-1 = X1X2…Xk (where each Xi is either a nonterminal or a terminal).
- Suppose αi is derived from αi-1 by replacing Xj with β, where Xj → β and β = Y1Y2…Ym
→ αi = X1X2…Xj-1 β Xj+1…Xk
- To model this step:
· Find the jth leaf from the left in the current parse tree.
· This leaf is labeled Xj.
· Give this leaf m children labeled Y1, Y2, …, Ym.
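The induction step above can be sketched in a few lines of Python. This is only an illustration: the names Node, leaves, apply_step, and tree_yield are hypothetical, and the steps replay the derivation E ⇒ –E ⇒ –(E) ⇒ –(E+E) ⇒ –(id+E) ⇒ –(id+id) from the earlier figure. The sketch ignores ε-productions, which would need a dedicated ε leaf.

```python
class Node:
    """A parse-tree node; a node with no children is a leaf."""
    def __init__(self, label):
        self.label = label
        self.children = []


def leaves(node):
    """Return the leaves under `node`, from left to right."""
    if not node.children:
        return [node]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def apply_step(root, j, body):
    """Model one derivation step: give the j-th leaf (1-based) children Y1..Ym."""
    leaf = leaves(root)[j - 1]
    leaf.children = [Node(symbol) for symbol in body]


def tree_yield(root):
    """The yield of the tree: its leaf labels read left to right."""
    return " ".join(leaf.label for leaf in leaves(root))


# BASIS: the tree for alpha_1 = E is a single node labeled E.
root = Node("E")

# INDUCTION: each entry is (j, body) for a step that replaces leaf Xj by body.
steps = [
    (1, ["-", "E"]),        # E -> - E
    (2, ["(", "E", ")"]),   # E -> ( E )
    (3, ["E", "+", "E"]),   # E -> E + E
    (3, ["id"]),            # leftmost E -> id
    (5, ["id"]),            # remaining E -> id
]

yields = [tree_yield(root)]
for j, body in steps:
    apply_step(root, j, body)
    yields.append(tree_yield(root))
```

After the loop, yields holds one sentential form per tree, so the tree built at step i has yield αi, as the induction claims.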
Ambiguity
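[Figure: two parse trees. Left: the root uses E → E + E, with the left operand deriving id and the right operand expanded by E → E * E. Right: the root uses E → E * E, with the left operand expanded by E → E + E and the right operand deriving id.]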
Two parse trees for id+id*id
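One way to see the ambiguity concretely is to count parse trees. The sketch below assumes the grammar E → E + E | E * E | id, which is inferred from the two trees in the figure, and the function name count_parse_trees is hypothetical. It tries every operator as the root of the tree and multiplies the counts for the two sides.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees for `tokens` under E -> E + E | E * E | id (assumed grammar)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(lo, hi):
        # Number of distinct parse trees deriving tokens[lo:hi] from E.
        if hi - lo == 1:
            return 1 if tokens[lo] == "id" else 0
        total = 0
        # Pick each operator in turn as the one applied at the root.
        for k in range(lo + 1, hi - 1):
            if tokens[k] in ("+", "*"):
                total += count(lo, k) * count(k + 1, hi)
        return total

    return count(0, len(tokens))


trees = count_parse_trees(["id", "+", "id", "*", "id"])
```

For id + id * id the count is 2, matching the two trees in the figure; any count above 1 for some sentence means the grammar is ambiguous.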
Context-Free Grammars vs. Regular Expression
Recursive-Descent Parsing
• A recursive-descent parser consists of a set of procedures, one for each nonterminal.
• Backtracking might be needed, which requires repeated scans over the input.
– NOTE: backtracking is not very efficient, and tabular methods such as the dynamic programming algorithm are preferred.
• A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop (i.e., a production might be expanded repeatedly without consuming any input).
void A() {
1)   Choose an A-production, A → X1X2 … Xk;
2)   for ( i = 1 to k ) {
3)      if ( Xi is a nonterminal )
4)         call procedure Xi();
5)      else if ( Xi equals the current input symbol a )
6)         advance the input to the next symbol;
7)      else /* an error has occurred */;
     }
}
– Line (1): to allow backtracking, this should try each production in some order.
– Line (7): to allow backtracking, this should return to line (1) and try another A-production until there are no more A-productions to try.
A typical procedure for a nonterminal in a top-down parser
Recursive-Descent Parsing (Cont.)
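• Grammar:
S → cAd
A → ab | a
• Input string w = cad.
[Figure: steps of the top-down parse of cad. S is expanded to c A d and c is matched. A is first expanded with A → ab: a matches, but b does not match d, so the parser backtracks. A is then expanded with A → a, and the remaining input matches.]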
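The backtracking behaviour in this example can be sketched in Python. This is a minimal illustration, not the procedure from the slides: it replaces the one-procedure-per-nonterminal layout with a single generator driven by a grammar table, and the names GRAMMAR, parse, match_body, and accepts are hypothetical. Each generator yields every input position that can be reached, so a failed alternative simply falls through to the next one. As noted earlier, a left-recursive grammar would make this loop forever.

```python
# Grammar from the example: S -> c A d, A -> a b | a
GRAMMAR = {
    "S": [["c", "A", "d"]],
    "A": [["a", "b"], ["a"]],
}


def parse(symbol, text, pos):
    """Yield every position reachable after matching `symbol` starting at `pos`."""
    if symbol not in GRAMMAR:
        # Terminal: it either matches the current input symbol or fails.
        if text.startswith(symbol, pos):
            yield pos + len(symbol)
        return
    # Nonterminal: try each production in order (this is the backtracking).
    for body in GRAMMAR[symbol]:
        yield from match_body(body, 0, text, pos)


def match_body(body, i, text, pos):
    """Match body[i:] against the input starting at `pos`."""
    if i == len(body):
        yield pos
        return
    for next_pos in parse(body[i], text, pos):
        yield from match_body(body, i + 1, text, next_pos)


def accepts(text, start="S"):
    """True if some sequence of choices consumes the whole input."""
    return any(end == len(text) for end in parse(start, text, 0))
```

For w = cad, the alternative A → ab fails at b, and the generator then tries A → a, which lets d match, mirroring the figure.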
• If A ⇒* cγ, then c is in FIRST(A).
• If S ⇒* αAaβ, then a is in FOLLOW(A).
FIRST
• Compute FIRST(X) for all grammar symbols X:
– If X is a terminal, then FIRST(X) = {X}.
– If X is a nonterminal and X → Y1Y2 … Yk is a production for some k ≥ 1,
- Everything in FIRST(Y1) is surely in FIRST(X).
- If Y1 does not derive ε, then nothing more is added to FIRST(X).
- If Y1 ⇒* ε, then FIRST(Y2) is added to FIRST(X), and so on.
– If X → ε is a production, then add ε to FIRST(X).
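The rules above can be sketched as a fixed-point computation in Python, applied here to the expression grammar labeled (4.2) in these slides. The names EPS, GRAMMAR, first_of_string, and compute_first are hypothetical, and representing an ε-production as an empty list is a choice made for this sketch only.

```python
EPS = "ε"

# Grammar (4.2); an empty body [] stands for an ε-production.
GRAMMAR = {
    "E":  [["T", "E'"]],
    "E'": [["+", "T", "E'"], []],
    "T":  [["F", "T'"]],
    "T'": [["*", "F", "T'"], []],
    "F":  [["(", "E", ")"], ["id"]],
}


def first_of_string(symbols, first):
    """FIRST of a string Y1 Y2 ... Yk, given the FIRST sets computed so far."""
    result = set()
    for symbol in symbols:
        # A terminal's FIRST set is just itself.
        symbol_first = first[symbol] if symbol in first else {symbol}
        result |= symbol_first - {EPS}
        if EPS not in symbol_first:
            return result          # Yi cannot vanish, so nothing more is added
    result.add(EPS)                # every Yi can derive ε (or the body is empty)
    return result


def compute_first(grammar):
    """Repeat the rules until no FIRST set changes."""
    first = {nonterminal: set() for nonterminal in grammar}
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            for body in bodies:
                new = first_of_string(body, first)
                if not new <= first[head]:
                    first[head] |= new
                    changed = True
    return first


first = compute_first(GRAMMAR)
```

The loop is needed because FIRST(E) depends on FIRST(T), which depends on FIRST(F); each pass propagates one more level until nothing changes.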
Grammar (4.2):
E → TE’
E’ → + TE’ | ε
T → FT’
T’ → * FT’ | ε
F → (E) | id
• FIRST(F) = { (, id }
• FIRST(T’) = { *, ε }
– The two productions for T’ begin with * and ε.
• FIRST(T) = FIRST(F) = { (, id }
– T has one production, beginning with F.
• FIRST(E’) = { +, ε }
– The two productions for E’ begin with + and ε.
• FIRST(E) = FIRST(T) = { (, id }
– E has one production, beginning with T.
Copyright © All Rights Reserved by Yuan-Hao Chang
FOLLOW
• LL(1) grammar:
– First L: scan the input from left to right.
– Second L: produce a leftmost derivation.
– The “1”: use one input symbol of lookahead at each step to make parsing action decisions.
• No left-recursive or ambiguous grammar can be LL(1).
• For two distinct productions A → α | β of an LL(1) grammar, FIRST(α) and FIRST(β) are disjoint.
• Predictive parsers
– Are recursive-descent parsers that need no backtracking.
– Look only at the current input symbol to choose the proper production for a nonterminal.
– Can be constructed for a class of grammars called LL(1).
• E.g., we have the following productions:
stmt → if (expr) stmt else stmt
| while (expr) stmt
| { stmt_list }
The keywords if, while and the symbol
{ tell us which alternative is the only
one that could possibly succeed if we
are to find a statement.
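A predictive parser for these productions can be sketched as follows. The slides do not define expr or stmt_list, so this sketch assumes that expr is a single id token and that stmt_list is a possibly empty sequence of statements; the class and function names are hypothetical. Each alternative is chosen by looking only at the current token, so no backtracking is needed.

```python
class ParseError(Exception):
    """Raised when the current token fits no alternative."""


class PredictiveParser:
    def __init__(self, tokens):
        self.tokens = list(tokens) + ["$"]   # "$" marks the end of input
        self.pos = 0

    @property
    def lookahead(self):
        return self.tokens[self.pos]

    def match(self, expected):
        if self.lookahead != expected:
            raise ParseError(f"expected {expected!r}, found {self.lookahead!r}")
        self.pos += 1

    def stmt(self):
        # The lookahead alone selects the only alternative that can succeed.
        if self.lookahead == "if":
            self.match("if"); self.match("("); self.expr(); self.match(")")
            self.stmt(); self.match("else"); self.stmt()
        elif self.lookahead == "while":
            self.match("while"); self.match("("); self.expr(); self.match(")")
            self.stmt()
        elif self.lookahead == "{":
            self.match("{"); self.stmt_list(); self.match("}")
        else:
            raise ParseError(f"no statement starts with {self.lookahead!r}")

    def stmt_list(self):
        # Assumption: zero or more statements, each starting with if, while, or {.
        while self.lookahead in ("if", "while", "{"):
            self.stmt()

    def expr(self):
        # Assumption: an expression is a single id token.
        self.match("id")


def parses(tokens):
    """True if the token sequence is exactly one statement."""
    parser = PredictiveParser(tokens)
    try:
        parser.stmt()
        parser.match("$")
        return True
    except ParseError:
        return False
```

For example, parses("while ( id ) { }".split()) succeeds, while an if statement without its else branch is rejected as soon as the missing else is reached.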
Transition Diagrams for Predictive Parsers
E → E + T | T
T → T * F | F     (4.1)
F → ( E ) | id
[Figure: a handle A → β in the parse tree for αβw, where S ⇒*rm αAw ⇒rm αβw is a rightmost derivation.]
Shift-Reduce Parsing
[Figure: parse-tree fragments for the two cases traced below.]
Case 1:
STACK     INPUT    ACTION
$αβγ      yz$      reduce B → γ
$αβB      yz$      shift
$αβBy     z$       reduce A → βBy
$αA       z$       shift
Case 2:
STACK     INPUT    ACTION
$αγ       xyz$     reduce B → γ
$αB       xyz$     shift xy
$αBxy     z$       reduce A → y
$αBxA     z$       shift z
Conflicts During Shift-Reduce Parsing
• E.g., a grammar for function calls and array references, for the input p(i,j)
– A function is called with parameters surrounded by parentheses.
– Indices of arrays are surrounded by parentheses.
(1) stmt → id (parameter_list)
(2) stmt → expr := expr
(3) parameter_list → parameter_list, parameter
(4) parameter_list → parameter
(5) parameter → id
(6) expr → id (expr_list)
(7) expr → id
(8) expr_list → expr_list, expr
(9) expr_list → expr
• Input: p(i,j) is converted to the token string id(id, id)
STACK              INPUT
… id ( id          , id ) … $
– The correct choice is production (5) if p is a procedure call.
– The correct choice is production (7) if p is an array.
• One solution to resolve this problem is to change production (1) into
stmt → procid (parameter_list)
where procid is the token name for procedures.
STACK              INPUT
… procid ( id      , id ) … $     A procedure call is encountered
… id ( id          , id ) … $     An array is encountered
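The procid solution moves the decision into the lexical analyzer, which has to know which names denote procedures. A minimal sketch of that idea follows; the function tokenize and its procedures argument are hypothetical, with the set of procedure names standing in for a symbol-table lookup.

```python
import re

PUNCTUATION = {"(", ")", ","}


def tokenize(source, procedures):
    """Turn source text into token names, using procid for known procedures.

    `procedures` is an assumed stand-in for a symbol-table lookup.
    """
    tokens = []
    for lexeme in re.findall(r"[A-Za-z_]\w*|[(),]", source):
        if lexeme in PUNCTUATION:
            tokens.append(lexeme)
        elif lexeme in procedures:
            tokens.append("procid")     # the name is declared as a procedure
        else:
            tokens.append("id")         # any other name, e.g. an array
    return tokens


call_tokens = tokenize("p(i,j)", procedures={"p"})
array_tokens = tokenize("p(i,j)", procedures=set())
```

With p declared as a procedure the token string begins with procid, so the parser sees a procedure call; otherwise it begins with id and the parser treats p as an array.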
Thanks