Chapter 6

Introduction to the Parser

Copyright © All Rights Reserved by Yuan-Hao Chang


Benefits of Grammars for Programming
Languages
• Give a precise syntactic specification of a
programming language.
• Allow a parser that determines the syntactic structure
of a source program to be constructed automatically.
– The parser-construction process can reveal syntactic
ambiguities and trouble spots.
• Make source-program translation and error
detection easier.
• Make it easier to add new constructs to a language.
The Role of the Parser

• The parser obtains a string of tokens from the lexical analyzer, and
verifies that it conforms to the grammar for the language.
– Collect information about various tokens into the symbol table.
– Perform type checking and other semantic analysis.
– Generate intermediate code.
• In practice, parsers are also expected to
– Report syntax errors and
– Recover from commonly occurring errors.
- No error-recovery strategy is proven universally acceptable; the
simplest approach is for the parser to quit with an error message
when it detects the first error.

[Figure: the source program enters the lexical analyzer, which passes
tokens to the parser on "get next token" requests; the parser produces
a parse tree for the rest of the front end, which emits the intermediate
representation. Both the lexical analyzer and the parser consult the
symbol table.]

Types of Parsers

• There are three general types of parsers:
– Universal parsing
- These general methods can parse any grammar, but are too
inefficient to use in production compilers.
- E.g., the Cocke-Younger-Kasami algorithm and Earley's algorithm.
– Top-down parsing
- Builds parse trees from the root to the leaves.
– Bottom-up parsing
- Builds parse trees from the leaves to the root.
Context-Free Grammar

• Context-free grammar is also called grammar for short. It consists of


– Terminals
- The basic symbols from which strings are formed.
- The token name is a synonym for “terminal”.
– Nonterminals
- Nonterminals are syntactic variables that denote sets of strings; these sets help define the
language generated by the grammar.
- Nonterminals impose a hierarchical structure that is key to syntax analysis and translation.
– Start symbol
- The start symbol is the first nonterminal of the grammar; it generates the language.
– Productions
- Productions of a grammar specify the manner in which the terminals and nonterminals are combined to
form strings. Each production consists of:
· A nonterminal called the head or left side.
· The symbol →
· A body or right side that consists of zero or more terminals and nonterminals.
A Grammar to Define Arithmetic Expressions

• A grammar to define arithmetic expressions


– 7 terminals or terminal symbols: id + - * / ( )
– 3 nonterminals: expression, term, factor

expression → expression + term
expression → expression - term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id
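As an aside (not part of the slides), a grammar like this one is easy to encode as a data structure, which the later algorithms (FIRST, FOLLOW, table construction) can then operate on. The encoding below, including the name `GRAMMAR`, is my own illustration:

```python
# Hypothetical encoding of the arithmetic-expression grammar.
# Each key is a nonterminal; each value lists the production bodies,
# where a body is a tuple of grammar symbols (terminals or nonterminals).
GRAMMAR = {
    "expression": [("expression", "+", "term"),
                   ("expression", "-", "term"),
                   ("term",)],
    "term":       [("term", "*", "factor"),
                   ("term", "/", "factor"),
                   ("factor",)],
    "factor":     [("(", "expression", ")"),
                   ("id",)],
}

NONTERMINALS = set(GRAMMAR)
# Every symbol that appears in a body but heads no production is a terminal.
TERMINALS = {sym for bodies in GRAMMAR.values()
             for body in bodies for sym in body} - NONTERMINALS

print(sorted(TERMINALS))  # the 7 terminals: ( ) * + - / id
```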
Notational Conventions

• These symbols are terminals:


– Lowercase letters early in the alphabet, e.g., a, b, c
– Operator symbols, e.g., +, -, *, /, and so on
– Punctuation symbols, e.g., parentheses, comma, and so on
– The digits 0, 1, …, 9.
– Boldface strings each of which represents a single terminal symbol,
e.g., id.
• These symbols are nonterminals:
– Uppercase letters early in the alphabet, e.g., A, B, C
– The letter S, which represents the start symbol.
– Lowercase, italic names, e.g., expr or stmt.
– When discussing programming constructs, uppercase letters may be
used to represent nonterminals for the constructs, e.g., E, T, and F.
(E: expression, T: term, F: factor)
Notational Conventions (Cont.)

• Grammar symbols (either nonterminals or terminals)


– Uppercase letters late in the alphabet, e.g., X, Y, Z
• Strings of terminals
– Lowercase letters late in the alphabet, e.g., u, v, …, z
• Strings of grammar symbols
– Lowercase Greek letters, e.g., α, β, γ
– E.g., A → α
• A-productions
– A set of productions A → α1, A → α2, …, A → αk with a common head A.
– It can be written A → α1 | α2 | … | αk.
– α1, α2, …, αk are the alternatives for A.
• The head of the first production is the start symbol.
Grammar with Notational Convention

expression → expression + term


expression → expression – term
expression → term
term → term * factor
term → term / factor
term → factor
factor → ( expression )
factor → id

E → E + T | E - T | T
T → T * F | T / F | F
F → ( E ) | id
Derivations

• The construction of a parse tree can be made


precise by taking a derivational view.
– Productions are treated as rewriting rules.
– Beginning with the start symbol, each rewriting step
replaces a nonterminal by the body of one of its
productions.
- The leftmost derivation corresponds to top-down parsing.
- The rightmost derivation corresponds to bottom-up parsing.
– E.g., E → E + E | E * E | -E | (E) | id (4.7)
The replacement of a single E by -E is described by E ⇒ -E,
read as "E derives -E".
E.g., E ⇒ -E ⇒ -(E) ⇒ -(id) is a derivation of -(id) from E.
Derivations (Cont.)

• Consider a nonterminal A in the middle of a sequence of
grammar symbols, as in αAβ.
– α and β are arbitrary strings of grammar symbols (either
nonterminals or terminals).
– If A → γ, then αAβ ⇒ αγβ
(αAβ derives αγβ in one step).
• If α1 ⇒ α2 ⇒ … ⇒ αn, we say α1 derives αn.
– ⇒* means "derives in zero or more steps":
1. α ⇒* α, for any string α.
2. If α ⇒* β and β ⇒* γ, then α ⇒* γ.
– ⇒+ means "derives in one or more steps".
Derivations (Cont.)

• A language that can be generated by a (context-free)
grammar is a context-free language.
• If two grammars generate the same language, the
grammars are equivalent.
• The language generated by a grammar is the set of
sentences of the grammar.
– A sentential form may contain both terminals and nonterminals, and
may be empty.
– A sentence of grammar G is a sentential form with no nonterminals.
- A string of terminals w is in L(G) if and only if w is a sentence of G.
– If S ⇒* α, where S is the start symbol of grammar G,
then α is a sentential form of G.
Leftmost Derivation and Rightmost Derivation

• The string -(id + id) is a sentence of grammar (4.7):
E → E + E | E * E | -E | (E) | id (4.7)
• Leftmost derivation:
– The leftmost nonterminal in each sentential form is
always chosen. We write α ⇒lm β.
– E.g.,
E ⇒lm -E ⇒lm -(E) ⇒lm -(E + E) ⇒lm -(id + E) ⇒lm -(id + id) (4.8)
• Rightmost (or canonical) derivation:
– The rightmost nonterminal in each sentential form is
always chosen. We write α ⇒rm β.
– E.g.,
E ⇒rm -E ⇒rm -(E) ⇒rm -(E + E) ⇒rm -(E + id) ⇒rm -(id + id) (4.9)
Left-Sentential and Right-Sentential Form
• Every leftmost step can be written as wAγ ⇒lm wδγ, where
– w consists of terminals only,
– A → δ is the production applied, and
– γ is a string of grammar symbols.
• If S ⇒*lm α, then α is a left-sentential form of the
grammar.
• If S ⇒*rm α, then α is a right-sentential form of the
grammar.
Parse Trees and Derivations

• A parse tree is a graphical representation of a derivation,


and filters out the order in which productions are applied to
replace nonterminals.
– Each interior node is labeled with the nonterminal in the head of a
production, representing the application of that production.
– The children of an interior node are labeled, from left to right, by the
symbols in the body of the corresponding production.
– During a derivation, the head of a production is replaced by the body
of that production.
• Yield or frontier of the tree:
– Read the leaves of a parse tree from left to right to constitute a
sentential form.
Parse Trees and Derivations (Cont.)

Parse the string -(id+id) with the grammar E → E + E | E * E | -E | (E) | id (4.7):

Leftmost derivation (4.8):  E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(id + E) ⇒ -(id + id)
Rightmost derivation (4.9): E ⇒ -E ⇒ -(E) ⇒ -(E + E) ⇒ -(E + id) ⇒ -(id + id)

[Figure: the sequence of parse trees built during leftmost derivation (4.8),
one tree per step; the yield (or frontier), read from the leaves left to
right, is -(id + id).]
Relationship between Derivations and Parse
Trees
• Consider any derivation α1 ⇒ α2 ⇒ … ⇒ αn, where α1 is a
single nonterminal A.
– For each sentential form αi, we can construct a parse tree whose
yield is αi.
• Induction on i:
– BASIS: the tree for α1 = A is a single node labeled A.
– INDUCTION:
- Suppose we have constructed a parse tree with yield
αi-1 = X1X2…Xk (where each Xj is either a nonterminal or a terminal).
- Suppose αi is derived from αi-1 by replacing Xj with β,
where Xj → β and β = Y1Y2…Ym.
→ αi = X1X2…Xj-1 β Xj+1…Xk
- To model this step:
· Find the jth leaf from the left in the current parse tree;
this leaf is labeled Xj.
· Give this leaf m children labeled Y1, Y2, …, Ym.
Ambiguity

• A grammar that produces more than one parse tree for
some sentence is said to be ambiguous.
– For an unambiguous grammar, there is a one-to-one relationship
between parse trees and leftmost (or rightmost) derivations.
– In other words, every parse tree is associated with a unique
leftmost and a unique rightmost derivation.
• E.g., produce the sentence id+id*id with the grammar:
E → E + E | E * E | (E) | id (4.3)

One leftmost derivation:
E ⇒ E + E ⇒ id + E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Another leftmost derivation:
E ⇒ E * E ⇒ E + E * E ⇒ id + E * E ⇒ id + id * E ⇒ id + id * id

Both derivations produce the same sentence id+id*id.
Ambiguity (Cont.)

The two leftmost derivations above yield two different parse trees for id+id*id:

[Figure: two parse trees for id+id*id. In one, + is at the root (E → E + E,
where the left E derives id and the right E derives E * E). In the other,
* is at the root (E → E * E, where the left E derives E + E and the right
E derives id).]
Context-Free Grammars vs. Regular Expressions

• Grammars are more powerful than regular expressions.
– Every construct that can be described by a regular expression can be
described by a grammar, but not vice versa.
– Every regular language is a context-free language, but not vice versa.
• E.g., the language L = {a^n b^n | n ≥ 1} of strings with an equal
number of a's and b's.
– Grammar: A0 → aA1b, A1 → aA1b | ε
– No regular expression (finite automaton) accepts L:
- Suppose a DFA D with a finite number k of states accepts L.
- For an input beginning with more than k a's, D must enter some state si
twice, so there is a path from si back to itself labeled a^(j-i).
- Since a^i b^i is in the language, D accepts along a path labeled a^i
followed by b^i; but following the loop gives an accepting path labeled
a^j b^i, which is not in the language.
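As a sketch of this point (not from the slides), the grammar above translates directly into a recursive recognizer for a^n b^n, something no finite automaton can implement; the function names are my own:

```python
def match_A1(s: str, i: int) -> int:
    """A1 → a A1 b | ε. Greedily match at position i; return the index
    just past the matched span (falling back to the ε-production)."""
    if i < len(s) and s[i] == "a":
        j = match_A1(s, i + 1)          # try A1 → a A1 b
        if j < len(s) and s[j] == "b":
            return j + 1
    return i                             # A1 → ε

def in_language(s: str) -> bool:
    """A0 → a A1 b: accept exactly the strings a^n b^n with n >= 1."""
    if not s.startswith("a"):
        return False
    j = match_A1(s, 1)
    return j < len(s) and s[j] == "b" and j + 1 == len(s)

print(in_language("aabb"), in_language("aab"))  # True False
```

The recursion depth tracks the nesting of a…b pairs, which is exactly the unbounded "memory" a DFA lacks.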
Construct a Grammar from an NFA

• Construct a grammar that recognizes the same language as
an NFA as follows:
– 1. For each state i, create a nonterminal Ai.
– 2. If state i has a transition to state j on input a, add the
production Ai → aAj. If state i goes to state j on input ε,
add the production Ai → Aj.
– 3. If i is an accepting state, add Ai → ε.
– 4. If i is the start state, make Ai the start symbol.

• E.g., the NFA for (a|b)*abb has states 0-3 (start state 0,
accepting state 3), with transitions 0→0 on a and b, 0→1 on a,
1→2 on b, and 2→3 on b. The resulting grammar:

A0 → aA0 | bA0 | aA1
A1 → bA2
A2 → bA3
A3 → ε
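The four construction steps above can be sketched in Python; the NFA encoding as a dictionary and the name `nfa_to_grammar` are my own assumptions, not the slides' code:

```python
def nfa_to_grammar(transitions, accepting, start):
    """Build productions A_i -> a A_j (plus A_i -> ε for accepting states)
    from an NFA given as {(state, symbol): {next_states}}."""
    states = {start} | set(accepting)
    for (i, _), targets in transitions.items():
        states |= {i} | targets
    prods = {f"A{i}": [] for i in sorted(states)}   # step 1: one nonterminal per state
    for (i, a), targets in transitions.items():
        for j in sorted(targets):
            # step 2: symbol "" stands for an ε-transition, giving A_i -> A_j
            prods[f"A{i}"].append(f"{a}A{j}" if a else f"A{j}")
    for i in accepting:
        prods[f"A{i}"].append("ε")                  # step 3
    return prods                                     # step 4: A{start} is the start symbol

# The NFA for (a|b)*abb from the slides.
nfa = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}, (2, "b"): {3}}
grammar = nfa_to_grammar(nfa, accepting={3}, start=0)
for head, bodies in grammar.items():
    print(head, "->", " | ".join(bodies))
```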
Top-Down Parsing

• Top-down parsing can be viewed as


– Constructing a parse tree for the input string from the
root
– Creating the nodes of the parse tree in preorder
– Finding a leftmost derivation for an input string.
• The key problem is to determine the production to
be applied for a nonterminal.
– Once a production is chosen, the rest of the parsing
process is to match the terminal symbols in the
production body with the input string.
Top-Down Parsing (Cont.)

• Top-down parse for id+id*id with grammar (4.2):

E → TE'
E' → +TE' | ε
T → FT'        (4.2)
T' → *FT' | ε
F → (E) | id

[Figure: the sequence of parse trees built by the leftmost derivation of
id+id*id, one tree per ⇒lm step: E is expanded by E → TE', then T → FT',
F → id, T' → ε, E' → +TE', and so on, until the yield of the tree is
id+id*id.]
Recursive-Descent Parsing

• A recursive-descent parser consists of a set of procedures, one for each nonterminal.
• Backtracking might be needed, repeating scans over the input.
– NOTE: backtracking parsers are not very efficient; tabular methods such as
dynamic-programming algorithms are preferred.
• A left-recursive grammar can cause a recursive-descent parser to go into an infinite loop
(i.e., a production might be expanded repeatedly without consuming any input).

A typical procedure for a nonterminal in a top-down parser:

void A() {
1)  Choose an A-production, A → X1X2…Xk;
2)  for ( i = 1 to k ) {
3)    if ( Xi is a nonterminal )
4)      call procedure Xi();
5)    else if ( Xi equals the current input symbol a )
6)      advance the input to the next symbol;
7)    else /* an error has occurred */ ;
    }
}

To allow backtracking, line (1) should try each A-production in some order,
and line (7) should return to line (1) and try another A-production until no
more A-productions remain.
Recursive-Descent Parsing (Cont.)

• Input string w = cad, with the grammar:

S → cAd
A → ab | a

[Figure: four snapshots of the parse tree. S expands to c A d and c matches
the input. A first tries the alternative ab: a matches, but b fails against
d, so the parser backtracks. A then tries the alternative a, which matches,
and finally d matches the last input symbol.]
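The backtracking in this example can be sketched in a few lines of Python (a minimal illustration, not the slides' code; `parse_S` is my own name):

```python
def parse_S(s: str) -> bool:
    """S → cAd with A → ab | a: match c, then try each A-alternative in
    order (backtracking on failure), then match d, consuming all input."""
    if not s.startswith("c"):
        return False
    for body in ("ab", "a"):             # try A → ab first, then A → a
        if s.startswith(body, 1):
            j = 1 + len(body)
            if s.startswith("d", j) and j + 1 == len(s):
                return True              # otherwise backtrack to the next alternative
    return False

print(parse_S("cad"), parse_S("cabd"), parse_S("cd"))  # True True False
```

On input cad the first alternative ab fails (b does not match d), so the loop falls through to the second alternative a, mirroring the backtrack in the figure.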


FIRST and FOLLOW

• FIRST and FOLLOW allow us to choose which production
to apply, based on the next input symbol.
– FIRST(α) is the set of terminals that begin strings derived from α,
where α is any string of grammar symbols. If α ⇒* ε, then ε is also in
FIRST(α).
– FOLLOW(A) is, for a nonterminal A, the set of terminals a that can
appear immediately to the right of A in some sentential form.
- If A can be the rightmost symbol in some sentential form, then $ is in
FOLLOW(A), where $ is a special "endmarker" symbol.

• E.g., if A ⇒* cγ, then c is in FIRST(A);
if S ⇒* αAaβ, then a is in FOLLOW(A).
FIRST
• Compute FIRST(X) for all grammar symbols X:
– If X is a terminal, then FIRST(X) = {X}.
– If X is a nonterminal and X → Y1Y2…Yk is a production for some k ≥ 1:
- Everything in FIRST(Y1) is surely in FIRST(X).
- If Y1 does not derive ε, then nothing more is added to FIRST(X).
- If Y1 ⇒* ε, then FIRST(Y2) is added to FIRST(X), and so on.
– If X → ε is a production, then add ε to FIRST(X).

• For grammar (4.2):

E → TE'
E' → +TE' | ε
T → FT'        (4.2)
T' → *FT' | ε
F → (E) | id

• FIRST(F) = { (, id }
• FIRST(T') = { *, ε }
– The two productions for T' begin with * and ε.
• FIRST(T) = FIRST(F) = { (, id }
– T has one production, and it begins with F.
• FIRST(E') = { +, ε }
– The two productions for E' begin with + and ε.
• FIRST(E) = FIRST(T) = { (, id }
– E has one production, and it begins with T.
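The FIRST computation can be sketched as a fixed-point iteration over grammar (4.2); this is my own illustration, with ε encoded as the string `"ε"` and ε-bodies as empty tuples:

```python
EPS = "ε"
# Grammar (4.2); each body is a tuple of symbols, () encodes an ε-body.
G = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}

def first_sets(g):
    # Terminals are their own FIRST sets.
    first = {a: {a} for bodies in g.values() for body in bodies
             for a in body if a not in g}
    for A in g:
        first[A] = set()
    changed = True
    while changed:                       # iterate until no set grows
        changed = False
        for A, bodies in g.items():
            for body in bodies:
                new = set()
                for X in body:
                    new |= first[X] - {EPS}
                    if EPS not in first[X]:
                        break            # X cannot vanish; stop here
                else:                    # every symbol derives ε (or body is ε)
                    new.add(EPS)
                if not new <= first[A]:
                    first[A] |= new
                    changed = True
    return first

FIRST = first_sets(G)
print(sorted(FIRST["E"]))   # ['(', 'id']
print(sorted(FIRST["T'"]))  # ['*', 'ε']
```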
FOLLOW

• Compute FOLLOW(A) for all nonterminals A:
– Place $ in FOLLOW(S), where S is the start symbol and $ is the input right endmarker.
– If there is a production A → αBβ, then everything in FIRST(β) except ε is in FOLLOW(B).
– If there is a production A → αB (or A → αBβ where FIRST(β) contains ε), then everything
in FOLLOW(A) is in FOLLOW(B).

• For grammar (4.2), using FIRST(E) = FIRST(T) = FIRST(F) = { (, id },
FIRST(E') = { +, ε }, and FIRST(T') = { *, ε }:
• FOLLOW(E) = { ), $ }
– E is the start symbol, and E appears in the production body (E).
• FOLLOW(E') = FOLLOW(E) = { ), $ }
– E' appears only at the ends of bodies of E-productions.
• FOLLOW(T) = { +, ), $ }
– T appears only in bodies where it is followed by E'. Everything in
FIRST(E') except ε, i.e., +, is in FOLLOW(T).
– In E → TE', E' ⇒* ε, so everything in FOLLOW(E) is also in FOLLOW(T).
• FOLLOW(T') = FOLLOW(T) = { +, ), $ }
– In T → FT', everything in FOLLOW(T) is in FOLLOW(T').
• FOLLOW(F) = { +, *, ), $ }
– In T → FT', everything in FIRST(T') except ε, i.e., *, is in FOLLOW(F).
– In T → FT', T' ⇒* ε, so everything in FOLLOW(T) is in FOLLOW(F),
adding +, ), and $.
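FOLLOW can be computed the same way; in this sketch (my own encoding, not the slides' code) the FIRST sets derived above are supplied as literals:

```python
EPS = "ε"
G = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}

def first_of(beta):
    """FIRST of a string of grammar symbols (a terminal is its own FIRST)."""
    out = set()
    for X in beta:
        fx = FIRST.get(X, {X})
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}              # every symbol in beta can vanish

def follow_sets(g, start):
    follow = {A: set() for A in g}
    follow[start].add("$")                       # rule 1: $ follows the start symbol
    changed = True
    while changed:
        changed = False
        for A, bodies in g.items():
            for body in bodies:
                for i, B in enumerate(body):
                    if B not in g:
                        continue                 # only nonterminals get FOLLOW sets
                    fb = first_of(body[i + 1:])
                    # rule 2: FIRST of the tail (minus ε); rule 3: FOLLOW(A) if tail can vanish
                    add = (fb - {EPS}) | (follow[A] if EPS in fb else set())
                    if not add <= follow[B]:
                        follow[B] |= add
                        changed = True
    return follow

FOLLOW = follow_sets(G, "E")
print(sorted(FOLLOW["F"]))  # ['$', ')', '*', '+']
```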
LL(1) Grammars

• LL(1) grammar:
– First L: scan the input from left to right.
– Second L: produce a leftmost derivation.
– The "1": use one input symbol of lookahead at each step to make
parsing-action decisions.
• No left-recursive or ambiguous grammar can be LL(1).
• A grammar G is LL(1) if, whenever A → α | β are two distinct productions
of G, the following conditions hold (preventing multiply defined entries
in the parsing table):
– 1. For no terminal a do both α and β derive strings beginning with a.
(FIRST(α) and FIRST(β) are disjoint.)
– 2. At most one of α and β can derive the empty string.
– 3. If β ⇒* ε, then α does not derive any string beginning with a terminal in
FOLLOW(A), and likewise if α ⇒* ε.
(If ε is in FIRST(β), then FIRST(α) and FOLLOW(A) are disjoint.)
Predictive Parsers for LL(1) Grammars

• Predictive parsers
– Are recursive-descent parsers that need no backtracking.
– Look only at the current input symbol when choosing the production
to apply for a nonterminal.
– Can be constructed for a class of grammars called LL(1).
• E.g., we have the following productions:

stmt → if (expr) stmt else stmt
     | while (expr) stmt
     | { stmt_list }

The keywords if and while and the symbol { tell us which alternative
is the only one that could possibly succeed if we are to find a
statement.
Transition Diagrams for Predictive Parsers

• To construct a transition diagram from a grammar:
– First eliminate left recursion, and left-factor the grammar.
– Then, for each nonterminal A:
- 1. Create an initial and a final (return) state.
- 2. For each production A → X1X2…Xk, create a path from the initial to the final state,
with edges labeled X1, X2, …, Xk.

• Parsers have one diagram for each nonterminal.
– The labels of edges can be tokens (terminals) or nonterminals.
- A transition on a token means that the token is the next input symbol.
- A transition on a nonterminal A is a call of the procedure for A.

[Figure: transition diagrams for E → TE' (E: 0 →T→ 1 →E'→ 2) and
E' → +TE' | ε (E': 3 →+→ 4 →T→ 5 →E'→ 6, with an ε-edge from 3 to 6).
Substituting the E'-diagram into the E-diagram and removing the tail
recursion yields a simplified diagram for E with a loop on + T.]
Predictive Parsing Table

• A predictive parsing table M[A, a] is a two-


dimensional array, where A is a nonterminal, and a
is a terminal or the symbol $ (the input endmarker).
– The production A → α is chosen if the next input symbol a
is in FIRST(α).
– When α = ε or α ⇒* ε, we should choose A → α if
- the current input symbol is in FOLLOW(A), or
- the $ on the input has been reached and $ is in FOLLOW(A).
Predictive Parsing Table (Cont.)

• Algorithm: Construction of a predictive parsing table


• INPUT: Grammar G.
• OUTPUT: Parsing table M.
• METHOD: For each production A → α of the grammar, do
the following:
– For each terminal a in FIRST(α), add A → α to M[A, a].
– If ε is in FIRST(α), then for each terminal b in FOLLOW(A), add
A → α to M[A, b].
– If ε is in FIRST(α) and $ is in FOLLOW(A), add A → α to M[A, $].
– If, after performing the above, there is no production at all in M[A, a],
then set M[A, a] to error (represented by an empty entry).
Predictive Parsing Table (Cont.)

• For grammar (4.2):

E → TE'
E' → +TE' | ε
T → FT'        (4.2)
T' → *FT' | ε
F → (E) | id

• E → TE': FIRST(TE') = FIRST(T) = { (, id }
• E' → +TE': FIRST(+TE') = { + }
• E' → ε: FOLLOW(E') = { ), $ }
• T → FT': FIRST(FT') = FIRST(F) = { (, id }
• T' → *FT': FIRST(*FT') = { * }
• T' → ε: FOLLOW(T') = { +, ), $ }
• F → (E): FIRST((E)) = { ( }
• F → id: FIRST(id) = { id }

The resulting parsing table M:

NON-       INPUT SYMBOL
TERMINAL   id       +          *          (        )       $
E          E→TE'                          E→TE'
E'                  E'→+TE'                        E'→ε    E'→ε
T          T→FT'                          T→FT'
T'                  T'→ε       T'→*FT'             T'→ε    T'→ε
F          F→id                           F→(E)
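The construction algorithm can be sketched by looping over the productions of grammar (4.2), with the FIRST and FOLLOW sets computed earlier supplied as literals (the encoding is my own):

```python
EPS = "ε"
G = {
    "E":  [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T":  [("F", "T'")],
    "T'": [("*", "F", "T'"), ()],
    "F":  [("(", "E", ")"), ("id",)],
}
FIRST = {"E": {"(", "id"}, "E'": {"+", EPS}, "T": {"(", "id"},
         "T'": {"*", EPS}, "F": {"(", "id"}}
FOLLOW = {"E": {")", "$"}, "E'": {")", "$"}, "T": {"+", ")", "$"},
          "T'": {"+", ")", "$"}, "F": {"+", "*", ")", "$"}}

def first_of(alpha):
    out = set()
    for X in alpha:
        fx = FIRST.get(X, {X})          # a terminal is its own FIRST
        out |= fx - {EPS}
        if EPS not in fx:
            return out
    return out | {EPS}

def build_table(g):
    M = {}
    for A, bodies in g.items():
        for alpha in bodies:
            fa = first_of(alpha)
            # For each terminal a in FIRST(alpha), add A -> alpha to M[A, a].
            for a in fa - {EPS}:
                M[A, a] = alpha
            # If ε is in FIRST(alpha), fill the FOLLOW(A) columns (including $).
            if EPS in fa:
                for b in FOLLOW[A]:
                    M[A, b] = alpha
    return M

M = build_table(G)
print(M["E", "id"])   # ('T', "E'")
print(M["T'", "+"])   # () — i.e., T' → ε
```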
Predictive Parsing Table (Cont.)
• For every LL(1) grammar, each parsing-table entry uniquely identifies a
production or signals an error.
– If G is left-recursive or ambiguous, then M will have at least one multiply
defined entry.
– Although left-recursion elimination and left factoring are easy to do, some
grammars have no equivalent LL(1) grammar.
• E.g., the (ambiguous) dangling-else grammar:

S → iEtSS' | a
S' → eS | ε
E → b

• S → iEtSS': FIRST(iEtSS') = { i }
• S → a: FIRST(a) = { a }
• S' → eS: FIRST(eS) = { e }
• S' → ε: FOLLOW(S') = { e, $ }
– FOLLOW(S) = { $ } (S is the start symbol) ∪ FIRST(S') \ {ε} = { e, $ },
since S is followed by S' in S → iEtSS'; S' appears at the end of that
body, so FOLLOW(S') = FOLLOW(S).
• E → b: FIRST(b) = { b }

NON-       INPUT SYMBOL
TERMINAL   a      b      e              i           t      $
S          S→a                          S→iEtSS'
S'                       S'→eS, S'→ε                       S'→ε
E                 E→b

The entry M[S', e] is multiply defined (both S' → eS and S' → ε) because
of the grammar's ambiguity.
Nonrecursive Predictive Parsing

• A nonrecursive predictive parser is a table-driven parser
that maintains a stack explicitly instead of using recursive calls.
• If w is the input matched so far, then the stack holds a
sequence of grammar symbols α such that S ⇒*lm wα.

[Figure: the predictive parsing program reads the current input symbol a
from the input buffer (e.g., a + b $), consults the symbol X on top of
the stack (X Y Z … $) and the parsing table M, and produces output.]
Table-Driven Predictive Parsing
• Algorithm: Table-driven predictive parsing
• INPUT: A string w and a parsing table M for grammar G.
• OUTPUT: If w is in L(G), a leftmost derivation of w; otherwise, an error indication.
• METHOD: Initially, the parser is in a configuration with w$ in the input buffer, and
the start symbol S of G on top of the stack, above $.
set ip to the first symbol of w;
set X to the top stack symbol; /* a is the current input symbol */
while (X≠$) { /* stack is not empty */
if( X is a ) pop the stack and advance ip; /* pop X */
else if (X is a terminal) error();
else if (M[X, a] is an error entry) error();
else if (M[X,a]=X→Y1Y2…Yk) {
output the production X→ Y1Y2…Yk
pop the stack; /* pop X */
push YkYk-1…Y1 onto the stack with Y1 on top;
}
set X to the top stack symbol;
} Copyright © All Rights R
Table-Driven Predictive Parsing (Cont.)

• Input: id+id*id, with grammar (4.2):
E → TE', E' → +TE' | ε, T → FT', T' → *FT' | ε, F → (E) | id

MATCHED     STACK      INPUT        ACTION
            E$         id+id*id$
            TE'$       id+id*id$    output E → TE'
            FT'E'$     id+id*id$    output T → FT'
            idT'E'$    id+id*id$    output F → id
id          T'E'$      +id*id$      match id
id          E'$        +id*id$      output T' → ε
id          +TE'$      +id*id$      output E' → +TE'
id+         TE'$       id*id$       match +
id+         FT'E'$     id*id$       output T → FT'
id+         idT'E'$    id*id$       output F → id
id+id       T'E'$      *id$         match id
id+id       *FT'E'$    *id$         output T' → *FT'
id+id*      FT'E'$     id$          match *
id+id*      idT'E'$    id$          output F → id
id+id*id    T'E'$      $            match id
id+id*id    E'$        $            output T' → ε
id+id*id    $          $            output E' → ε
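The table-driven algorithm above can be sketched as a short Python driver for grammar (4.2); this is my own illustration, with ε-bodies encoded as empty tuples:

```python
EPS_BODY = ()   # an ε-production pushes nothing onto the stack
# Predictive parsing table M for grammar (4.2): (nonterminal, lookahead) -> body.
M = {
    ("E", "id"): ("T", "E'"),      ("E", "("): ("T", "E'"),
    ("E'", "+"): ("+", "T", "E'"), ("E'", ")"): EPS_BODY, ("E'", "$"): EPS_BODY,
    ("T", "id"): ("F", "T'"),      ("T", "("): ("F", "T'"),
    ("T'", "*"): ("*", "F", "T'"),
    ("T'", "+"): EPS_BODY, ("T'", ")"): EPS_BODY, ("T'", "$"): EPS_BODY,
    ("F", "id"): ("id",),          ("F", "("): ("(", "E", ")"),
}
NONTERMINALS = {"E", "E'", "T", "T'", "F"}

def parse(tokens):
    """Table-driven predictive parse; return the leftmost derivation as a
    list of productions, or raise SyntaxError."""
    stack = ["$", "E"]                   # start symbol above the endmarker
    buf = tokens + ["$"]
    pos, output = 0, []
    while stack[-1] != "$":
        X, a = stack[-1], buf[pos]
        if X == a:                       # top of stack matches the input symbol
            stack.pop(); pos += 1
        elif X not in NONTERMINALS or (X, a) not in M:
            raise SyntaxError(f"unexpected {a!r} with {X!r} on top of the stack")
        else:                            # expand: pop X, push the body reversed
            body = M[X, a]
            output.append(f"{X} -> {' '.join(body) or 'ε'}")
            stack.pop()
            stack.extend(reversed(body))
    if buf[pos] != "$":
        raise SyntaxError("input not fully consumed")
    return output

for prod in parse(["id", "+", "id", "*", "id"]):
    print(prod)
```

Running it on id+id*id prints the same eleven productions, in the same order, as the trace above.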
Error Recovery in Predictive Parsing

• An error is detected during predictive parsing


– When the terminal on top of the stack does not match the next input symbol.
Or
– When nonterminal A is on top of the stack, a is the next input symbol, and
M[A, a] is error.
• Error recovery methods:
– Panic mode
- Skip symbols on the input until a token in a selected set of synchronizing tokens
appears.
- The effectiveness depends on the choice of synchronizing set.
– Phrase-level recovery
- Fill in the blank entries in the predictive parsing table with pointers to error routines.
- Error routines may change, insert, or delete symbols on the input and issue
appropriate error messages.
- An infinite loop must be prevented, e.g., by checking that any recovery
action eventually consumes an input symbol.
Panic-Mode Error Recovery

• Some heuristics for selecting the synchronizing set:
– All symbols in FOLLOW(A) as the synchronizing set for nonterminal A.
- Skip tokens until an element of FOLLOW(A) is seen, then pop A.
– The symbols that begin higher-level constructs as the synchronizing
set of a lower-level construct.
- E.g., add keywords that begin statements to the synchronizing sets for
the nonterminals generating expressions.
– The symbols in FIRST(A) as the synchronizing set for nonterminal A.
– If a nonterminal can generate the empty string, the production
deriving ε can be used as a default.
- This may postpone some error detection, but cannot cause an error to
be missed.
– If a terminal on top of the stack cannot be matched, pop the terminal,
issue a message saying that the terminal was inserted, and continue
parsing.
- This approach takes the synchronizing set of a token to consist of all
other tokens.
Panic-Mode Error Recovery (Cont.)

• Obtain synchronizing tokens from the FOLLOW set of each
nonterminal (grammar (4.2), FOLLOW sets as computed earlier):
– If M[A, a] is blank, skip the input symbol a.
– If the entry is synch, pop the nonterminal on top of the stack.
– If a token on top of the stack does not match the input symbol,
pop the token from the stack.

NON-       INPUT SYMBOL
TERMINAL   id      +         *          (        )       $
E          E→TE'                        E→TE'    synch   synch
E'                 E'→+TE'                       E'→ε    E'→ε
T          T→FT'   synch                T→FT'    synch   synch
T'                 T'→ε      T'→*FT'             T'→ε    T'→ε
F          F→id    synch     synch      F→(E)    synch   synch
Panic-Mode Error Recovery (Cont.)

• Input: +id*+id, with grammar (4.2) and the synchronizing table:

MATCHED   STACK      INPUT      REMARK
          E$         +id*+id$   error, skip + (id is in FIRST(E))
          E$         id*+id$
          TE'$       id*+id$
          FT'E'$     id*+id$
          idT'E'$    id*+id$
id        T'E'$      *+id$      match id
id        *FT'E'$    *+id$
id*       FT'E'$     +id$       match *
id*       FT'E'$     +id$       error, M[F, +] = synch
id*       T'E'$      +id$       F has been popped
id*       E'$        +id$
id*       +TE'$      +id$
id*+      TE'$       id$        match +
id*+      FT'E'$     id$
id*+      idT'E'$    id$
id*+id    T'E'$      $          match id
id*+id    E'$        $
id*+id    $          $
Bottom-Up Parse

• A bottom-up parse constructs a parse tree for an
input string beginning at the leaves and working towards the root.
– It describes parsing as the process of building parse trees.

[Figure: a bottom-up parse for id*id with grammar (4.1): the snapshots
id*id, F*id, T*id, T*F, T, E show the tree growing from the leaves to
the root.]

The derivation corresponding to the parse (a rightmost derivation,
read in reverse):
E ⇒ T ⇒ T*F ⇒ T*id ⇒ F*id ⇒ id*id

E → E + T | T
T → T * F | F    (4.1)
F → (E) | id
Reductions

• Bottom-up parsing is the process of “reducing” a string w to


the start symbol of the grammar.
– The goal is to construct a derivation in reverse.
– At each reduction step, a specific substring matching the body of a
production is replaced by the nonterminal at the head of the production.
– Key decisions: When to reduce and what production to apply

E → E + T | T
T → T * F | F    (4.1)
F → (E) | id

Reduction sequence: id*id, F*id, T*id, T*F, T, E
A reduction is the reverse of a step in a derivation.
Handle Pruning

• Bottom-up parsing during a left-to-right scan of the input
constructs a rightmost derivation in reverse.
– Handle: a handle is a substring that matches the body of a
production, and
– Reduction: the reduction of a handle represents one
step along the reverse of a rightmost derivation.

• E.g., with grammar (4.1): E → E + T | T, T → T * F | F, F → (E) | id

Right-sentential form   Handle    Reducing production
id1 * id2               id1       F → id
F * id2                 F         T → F
T * id2                 id2       F → id
T * F                   T * F     T → T * F
T                       T         E → T
Handle Pruning (Cont.)

Notational conventions: a, b, c are terminals; w, x, y, z are strings of
terminals; A, B, C are nonterminals; X, Y, Z are grammar symbols
(terminal or nonterminal); α, β, γ are strings of grammar symbols.

• If S ⇒*rm αAw ⇒rm αβw, then the production A → β in the
position following α is a handle of αβw.
• Given a right-sentential form γ, a handle β of γ, a production
A → β, and the position of β within γ, replace β at that position
by A to produce the previous right-sentential form in a rightmost
derivation of γ.
• Every right-sentential form of the grammar has exactly one
handle, except in ambiguous grammars.
– A rightmost derivation in reverse can be obtained by "handle pruning".

[Figure: the rightmost derivation S ⇒*rm αAw ⇒rm αβw, drawn as a parse
tree in which the handle β, the body of the last production A → β applied,
sits above the remaining terminal string w.]
Shift-Reduce Parsing

• Shift-reduce parsing is a form of bottom-up parsing in which
– a stack holds grammar symbols and
– an input buffer holds the rest of the string to be parsed.
• The handle always appears at the top of the stack just before it is
identified as the handle.

Initial state:        Final state:
STACK  INPUT          STACK  INPUT
$      w$             $S     $

($ marks the bottom of the stack; w is the input string and S is the
start symbol.)

June 15, 2011

Shift-Reduce Parsing (Cont.)

• Operations of shift-reduce parsing, with grammar (4.1):
E → E + T | T, T → T * F | F, F → (E) | id
– Shift: shift the next input symbol onto the top of the stack.
– Reduce: locate the left end of the handle within the stack, and
decide with what nonterminal to replace it.
– Accept: announce successful completion of parsing.
– Error: discover a syntax error and call an error-recovery routine.

• E.g., parse id1 * id2:

STACK       INPUT          ACTION
$           id1 * id2 $    shift
$ id1       * id2 $        reduce by F → id
$ F         * id2 $        reduce by T → F
$ T         * id2 $        shift
$ T *       id2 $          shift
$ T * id2   $              reduce by F → id
$ T * F     $              reduce by T → T * F
$ T         $              reduce by E → T
$ E         $              accept
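A minimal shift-reduce loop for grammar (4.1) can be sketched as follows. This is my own illustration: the reduce decisions are hard-coded one-symbol-lookahead rules that happen to work for this grammar, not a general handle-finding strategy such as LR:

```python
def shift_reduce(tokens):
    """Shift-reduce parse for grammar (4.1): E -> E+T | T, T -> T*F | F,
    F -> (E) | id. A toy handle finder with hard-coded lookahead rules."""
    stack, buf, actions = ["$"], tokens + ["$"], []
    while True:
        a = buf[0]                                   # one symbol of lookahead
        if stack[-1] == "id":
            stack[-1] = "F"; actions.append("reduce by F -> id")
        elif stack[-3:] == ["(", "E", ")"]:
            stack[-3:] = ["F"]; actions.append("reduce by F -> (E)")
        elif stack[-3:] == ["T", "*", "F"]:
            stack[-3:] = ["T"]; actions.append("reduce by T -> T*F")
        elif stack[-1] == "F":
            stack[-1] = "T"; actions.append("reduce by T -> F")
        elif stack[-1] == "T" and a != "*":          # don't reduce past a pending *
            if stack[-3:] == ["E", "+", "T"]:
                stack[-3:] = ["E"]; actions.append("reduce by E -> E+T")
            else:
                stack[-1] = "E"; actions.append("reduce by E -> T")
        elif stack == ["$", "E"] and a == "$":
            actions.append("accept")
            return actions
        elif a != "$":
            stack.append(buf.pop(0)); actions.append("shift " + a)
        else:
            raise SyntaxError("parse error near end of input")

for act in shift_reduce(["id", "*", "id"]):
    print(act)
```

On id * id this reproduces the action column of the trace above; the `a != "*"` check is the ad-hoc stand-in for the "when to reduce" decision that LR parsers make from a table.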

Shift-Reduce Parsing (Cont.)

The handle always eventually appears on top of the stack. Two cases
for a rightmost derivation in reverse:

Case 1: S ⇒*rm αAz ⇒rm αβByz ⇒rm αβγyz

STACK    INPUT    ACTION
$ αβγ    yz$      reduce by B → γ
$ αβB    yz$      shift y
$ αβBy   z$       reduce by A → βBy
$ αA     z$       shift z

Case 2: S ⇒*rm αBxAz ⇒rm αBxyz ⇒rm αγxyz

STACK    INPUT    ACTION
$ αγ     xyz$     reduce by B → γ
$ αB     xyz$     shift x, then shift y
$ αBxy   z$       reduce by A → y
$ αBxA   z$       shift z
Conflicts During Shift-Reduce Parsing

• Some context-free grammars lead shift-reduce parsing
into conflicts when deciding the next action.
– Shift/reduce conflict
- Cannot decide whether to shift or to reduce.
- E.g., the dangling-else grammar:

stmt → if expr then stmt
     | if expr then stmt else stmt
     | other

STACK                     INPUT
… if expr then stmt       else … $

Cannot determine whether to shift the else or to reduce
if expr then stmt to stmt.
– Reduce/reduce conflict
- Cannot decide which production should be used for the reduction.
Conflicts During Shift-Reduce Parsing (Cont.)

• E.g., a grammar for function calls and array references, with input p(i,j):
– A function is called with parameters surrounded by parentheses.
– Indices of arrays are also surrounded by parentheses.

(1) stmt → id ( parameter_list )
(2) stmt → expr := expr
(3) parameter_list → parameter_list , parameter
(4) parameter_list → parameter
(5) parameter → id
(6) expr → id ( expr_list )
(7) expr → id
(8) expr_list → expr_list , expr
(9) expr_list → expr

• The input p(i,j) is converted to the token string id(id, id):

STACK          INPUT
… id ( id      , id ) … $

– The correct choice is production (5) if p is a procedure call, but
production (7) if p is an array: a reduce/reduce conflict.
• One solution is to change production (1) into
stmt → procid ( parameter_list ), using a distinct token name procid
for procedures; the stack then holds … procid ( id, and the parser
can tell that a procedure call has been encountered.
Thanks
