0% found this document useful (0 votes)
3 views

slides08-lr-parsing

The document discusses Bottom-Up LR Parsing, contrasting it with Top-Down parsing methods. It explains the mechanics of LR parsing, including the use of LR items and the handling of shift-reduce conflicts, particularly in the context of if-then-else statements. Additionally, it touches on error reporting and recovery strategies, as well as other parsing tools like GLR and PEG parsers.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
3 views

slides08-lr-parsing

The document discusses Bottom-Up LR Parsing, contrasting it with Top-Down parsing methods. It explains the mechanics of LR parsing, including the use of LR items and the handling of shift-reduce conflicts, particularly in the context of if-then-else statements. Additionally, it touches on error reporting and recovery strategies, as well as other parsing tools like GLR and PEG parsers.
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 25

Bottom-Up LR Parsing

17-363/17-663: Programming Language Pragmatics

Reading: PLP section 2.3

Copyright © 2016 Elsevier


Prof. Jonathan Aldrich
Top-Down vs. Bottom-Up Parsing

• Top-Down/LL Parsing Intuition


program Start trying to parse a program

stmt_list $$$ Based


Start trying
on lookahead,
to parse arefine
program
to stmt_list
then to stmt stmt_list
stmt stmt_list $$$
Stack tracks predicted future parsing
...
• Bottom-Up/LR Parsing Intuition
read A Start by shifting a few tokens

stmt Reduce tokens to a stmt, thentotoa stmt_list


stmt, then a stmt_list

stmt_list Continue
Continue to
tokens
to shift
shift and
and reduce
reduce tokens
tokens
tokens to
to recognize
recognize another
another stmt
stmt
stmt_list read B Stack shows what constructs
stmt_list stmt have been recognized so far
Example Program and SLR(1) Grammar

read A
read B
sum := A + B
write sum
write sum / 2
Modeling a Parse with LR Items

• Initial parse state captured by an item

– includes start symbol, production, and current location

• What we see next might be inside stmt_list


– So we expand stmt_list and get a set of items:
Modeling a Parse with LR Items

• We can likewise expand stmt to get the item set:

• This is an SLR parser state


– We’ll call it state 0
Modeling a Parse with LR Items

• Our starting stack has state 0 on it:


0
• Input: read A read B …

• From state 0, we shift read onto the stack and


move to state 1:
0 read 1

• State 1 represents the following item:


Modeling a Parse with LR Items

• stack / item: 0 read 1


• input: A read B …

• From state 1, we shift id onto the stack


• stack / item: 0 read 1 id 1’
• input: read B …

• Now we reduce to stmt, and put stmt into the input


• stack / item: 0
• input: stmt read B …
Modeling a Parse with LR Items

• stack / item: 0
• input: stmt read B …

• We now shift stmt


• stack / item: 0 stmt 0’
• input: read B …

• Next we reduce to stmt_list


• stack / item: 0
• input: stmt_list read B …
Modeling a Parse with LR Items

• stack / item: 0
• input: stmt_list read B …

• Now we shift stmt_list


• stack / item: 0 stmt_list 2
• input: read B …
The Characteristic Finite State
Machine (CFSM)

There are also shift-reduce actions. So our states 0’, 1’ aren’t shown
here: they are “in between” states within a shift-reduce action
The CFSM as a Table
A Detailed Explanation of the CFSM
A Detailed Explanation of the CFSM
A Detailed Explanation of the CFSM
Exercise: LR Parsing

• Assume you are in parsing state 0


and the token stream is write sum / 2
• Show how the parse stack changes as the token
stream is consumed
• We’ll do the first action together
Parsing if-then-else Statements

• A famous parsing challenge (from Algol) involves if-


then-else, where else is optional:

stmt ::= if exp then stmt


| if exp then stmt else stmt

• Consider the phrase:

if exp then if exp then stmt else stmt

• Which then does the else belong to?


Shift/Reduce Conflicts

• This is a shift-reduce conflict

if exp then if exp then stmt . else stmt

• When the else appears


• we can shift, treating it as part of the inner if statement, or
• we can reduce the inner if statement,
treating the else as part of the outer if statement
• How to solve?
– Many existing tools prioritize shift over reduce
• This corresponds to the traditional solution to the if problem
Shift/Reduce Conflicts

• This is a shift-reduce conflict

if exp then if exp then stmt . else stmt

• When the else appears


• we can shift, treating it as part of the inner if statement, or
• we can reduce the inner if statement,
treating the else as part of the outer if statement
• How to solve?
– Many existing tools prioritize shift over reduce
– You can declare productions with precedence
• E.g. giving the if-then-else production higher precedence
than the if-then production
Shift/Reduce Conflicts

• This is a shift-reduce conflict

if exp then if exp then stmt . else stmt

• When the else appears


• we can shift, treating it as part of the inner if statement, or
• we can reduce the inner if statement,
treating the else as part of the outer if statement
• How to solve?
– Many existing tools prioritize shift over reduce
– You can declare productions with precedence
– Rewrite the grammar to make it LR(1)
An LR(0) If-Then-Else Grammar
stmt → balanced_stmt | unbalanced_stmt
balanced_stmt → if cond then balanced_stmt
else balanced_stmt
| other_stuff
unbalanced_stmt → if cond then stmt
| if cond then balanced_stmt
else unbalanced_stmt

Invariant: balanced_stmts may be inside unbalanced_stmts


– but not vice versa
Unfortunately this grammar is LR(0) but not LL(0)
– Have to use precedence in LL parsers
or custom code in a recursive-descent parser
Connections to Theory
• A scanner is a Deterministic Finite Automaton (DFA)
– it can be specified with a state diagram

• An LL or LR parser is a Pushdown Automaton (PDA)


– a PDA can be specified with a state diagram and a stack
• the state diagram looks just like a DFA state diagram, except the arcs
are labeled with <input symbol, top-of-stack symbol> pairs, and in
addition to moving to a new state the PDA has the option of pushing
or popping a finite number of symbols onto/off the stack
• For LL(1) parsers the state machine has only two states:
processing and accepted
• All the action is in the input symbol and top of stack
• LR(1) parsers are richer (and more expressive)
Error Reporting
• Error reporting is relatively simple
• If you get a token for which there’s no entry in the
current parsing state / top of stack element, signal an
error
• Can tell the user what tokens would be OK here
Error Recovery
• Nice to report more than one error to the user
• Rather than stopping after the first one
• Simple idea: Panic mode
• In C-like languages, semicolons are good recovery spots
• So on an error:
• read tokens until you get to a semicolon
• discard the parser’s stack (predictions in an LL parser, states in an LR
parser) until you come to a production that has a semicolon
• assume you’ve parsed the semicolon-containing construct,
and continue parsing
• There are ways to do substantially better – see the online
supplement to the textbook
Other Parsing Tools
• Generalized LR (GLR) parser generators
• Accept any grammar – even ambiguous ones!
• This can be good if you have grammars written by nonexperts, as in
SASyLF
• But for a compiler-writer it is dangerous—you may not even know
your grammar is ambiguous, and then your poor users get ambiguity
errors when the parser runs
• Works like an LR parser, but on ambiguity considers all
possible parses in parallel
• Still O(n) if the grammar is LR (or “close”)
Other Parsing Tools
• Parsing Expression Grammar (PEG) parser generators
• Sidestep ambiguity by always favoring the first production
• Same danger as GLR parsers – you may not know your
grammar is ambiguous
• Still used some in practice (e.g. in Python)
• About as efficient as LL or LR in practice
• Like LR, PEG grammars can be cleaner than LL grammars
• Requires extreme care to get right – must think algorithmically
instead of declaratively
• Guido van Rossum, the developer of Python, saw this as an advantage

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy