0% found this document useful (0 votes)
33 views28 pages

Syntax Analysis: EECS 483 - Lecture 4 University of Michigan Monday, September 17, 2006

This document provides an overview of syntax analysis and context-free grammars. It discusses where syntax analysis occurs, parsing analogies to natural language, and how context-free grammars are used to describe syntax. Key points include: - Syntax analysis determines if an input token stream satisfies the syntax of a programming language. It occurs after lexical analysis. - Context-free grammars are used to describe syntax. They consist of terminals, non-terminals, a start symbol, and productions. - Parsing involves applying the productions of a grammar to see if a given input can be derived. This determines if the input is valid or generates an error.

Uploaded by

adviful
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
33 views28 pages

Syntax Analysis: EECS 483 - Lecture 4 University of Michigan Monday, September 17, 2006

This document provides an overview of syntax analysis and context-free grammars. It discusses where syntax analysis occurs, parsing analogies to natural language, and how context-free grammars are used to describe syntax. Key points include: - Syntax analysis determines if an input token stream satisfies the syntax of a programming language. It occurs after lexical analysis. - Context-free grammars are used to describe syntax. They consist of terminals, non-terminals, a start symbol, and productions. - Parsing involves applying the productions of a grammar to see if a given input can be derived. This determines if the input is valid or generates an error.

Uploaded by

adviful
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 28

Syntax Analysis Part I

EECS 483 Lecture 4


University of Michigan
Monday, September 17, 2006

Announcements/Reading

Reading - Ch 4
4.1 4.3 for today

Simon does not have office hours on Fri


Just Tue/Thu, 1:30-3:30
My mistake on the website

-2-

Where is Syntax Analysis Performed?


if (b == 0) a = b;

Lexical Analysis or Scanner

if

==

Syntax Analysis or Parsing

if
==
b

abstract syntax tree


or parse tree

=
0

a
-3-

Parsing Analogy
Syntax analysis for natural languages
Recognize whether a sentence is grammatically correct
Identify the function of each word
sentence
subject

verb

indirect object

gave

him

I gave him the book


-4-

object
noun phrase
article

noun

the

book

Syntax Analysis Overview

Goal Determine if the input token stream


satisfies the syntax of the program
What do we need to do this?
An expressive way to describe the syntax
A mechanism that determines if the input
token stream satisfies the syntax description

For lexical analysis


Regular expressions describe tokens
Finite automata = mechanisms to generate
tokens from input stream
-5-

Just Use Regular Expressions?

REs can expressively describe tokens


Easy to implement via DFAs

So just use them to describe the syntax of a


programming language??
NO! They dont have enough power to express any
non-trivial syntax
Example Nested constructs (blocks, expressions,
statements) Detect balanced braces:
{ { { { {
{{} {} {{} { }}}

- We need unbounded counting!


- FSAs cannot count except in a strictly
modulo fashion
-6-

...

Context-Free Grammars

Consist of 4 components:

Terminal symbols = token or


Non-terminal symbols = syntactic variables
Start symbol S = special non-terminal
Productions of the form LHSRHS

SaSa
ST
TbTb
T

LHS = single non-terminal


RHS = string of terminals and non-terminals
Specify how non-terminals may be expanded

Language generated by a grammar is the set of


strings of terminals derived from the start symbol
by repeatedly applying the productions
L(G) = language generated by grammar G
-7-

CFG - Example

Grammar for balanced-parentheses language


S(S)S
S

? Why is the final S required?

1 non-terminal: S
2 terminals: (, )
Start symbol: S
2 productions

If grammar accepts a string, there is a derivation of


that string using the productions
(())
S = (S) = ((S) S) = (() ) = (())
-8-

More on CFGs

Shorthand notation vertical bar for


multiple productions
SaSa|T
TbTb|
CFGs powerful enough to expression the syntax
in most programming languages
Derivation = successive application of
productions starting from S
Acceptance? = Determine if there is a derivation
for an input token stream
-9-

A Parser
Context free
grammar, G

Parser

Token stream, s
(from lexer)

Yes, if s in L(G)
No, otherwise
Error messages

Syntax analyzers (parsers) = CFG acceptors which also


output the corresponding derivation when the token stream
is accepted
Various kinds: LL(k), LR(k), SLR, LALR

- 10 -

RE is a Subset of CFG
Can inductively build a grammar for each RE

a
R1 R2
R1 | R2
R1*

S
Sa
S S1 S2
S S1 | S2
S S1 S |

Where
G1 = grammar for R1, with start symbol S1
G2 = grammar for R2, with start symbol S2

- 11 -

Grammar for Sum Expression

Grammar
SE+S|E
E number | (S)

Expanded

SE+S
SE
E number
E (S)

4 productions
2 non-terminals (S,E)
4 terminals: (, ), +, number
start symbol: S

- 12 -

Constructing a Derivation

Start from S (the start symbol)


Use productions to derive a sequence of
tokens
For arbitrary strings , , and for a
production: A
A single step of the derivation is
A
(substitute for A)

Example
SE+S
(S + E) + E (E + S + E) + E
- 13 -

Class Problem
SE+S|E
E number | (S)

Derive: (1 + 2 + (3 + 4)) + 5

- 14 -

Parse Tree
S
E

( S )

E + S

1 E + S
2

Parse tree = tree representation of the


derivation
Leaves of the tree are terminals
Internal nodes are non-terminals
No information about the order of
the derivation steps

( S )
E + S
3
E

4
- 15 -

Parse Tree vs Abstract Syntax Tree


S
E

Parse tree also called concrete syntax


S

( S )

E + S

+
+

1 E + S
2

+
2

+
3

( S )
E + S
3
E

AST discards (abstracts) unneeded


information more compact format
- 16 -

Derivation Order

Can choose to apply productions in any order,


select non-terminal and substitute RHS of
production
Two standard orders: left and right-most
Leftmost derivation
In the string, find the leftmost non-terminal and apply
a production to it
E+S1+S

Rightmost derivation
Same, but find rightmost non-terminal
E+SE+E+S
- 17 -

Leftmost/Rightmost Derivation Examples


SE+S|E
E number | (S)
Leftmost

derive: (1 + 2 + (3 + 4)) + 5
S E + S (S)+S (E+S) + S (1+S)+S (1+E+S)+S
(1+2+S)+S (1+2+E)+S (1+2+(S))+S (1+2+(E+S))+S
(1+2+(3+S))+S (1+2+(3+E))+S (1+2+(3+4))+S
(1+2+(3+4))+E (1+2+(3+4))+5
Now, rightmost derive the same input string
S E+S E+E E+5 (S)+5 (E+S)+5 (E+E+S)+5
(E+E+E)+5 (E+E+(S))+5 (E+E+(E+S))+5
(E+E+(E+E))+5 (E+E+(E+4))+5 (E+E+(3+4))+5
(E+2+(3+4))+5 (1+2+(3+4))+5
Result: Same parse tree: same productions chosen, but in diff order
- 18 -

Class Problem
SE+S|E
E number | (S) | -S

Do the rightmost derivation of : 1 + (2 + -(3 + 4)) + 5

- 19 -

Ambiguous Grammars

In the sum expression grammar, leftmost


and rightmost derivations produced
identical parse trees
+ operator associates to the right in parse
tree regardless of derivation order
+
+

(1+2+(3+4))+5

+
2

+
3

- 20 -

An Ambiguous Grammar

+ associates to the right because of the


right-recursive production: S E + S
Consider another grammar
S S + S | S * S | number

Ambiguous grammar = different


derivations produce different parse trees
More specifically, G is ambiguous if there are
2 distinct leftmost (rightmost) derivations for
some sentence
- 21 -

Ambiguous Grammar - Example


S S + S | S * S | number
Consider the expression: 1 + 2 * 3
Derivation 2: S S*S
S+S*S 1+S*S 1+2*S
1+2*3

Derivation 1: S S+S
1+S 1+S*S 1+2*S
1+2*3

2 leftmost derivations
*

+
1

*
2

1
But, obviously not equal!
- 22 -

3
2

Impact of Ambiguity

Different parse trees correspond to


different evaluations!
Thus, program meaning is not defined!!
*

+
1

*
2

3
2

=9

=7
- 23 -

Can We Get Rid of Ambiguity?

Ambiguity is a function of the grammar, not the


language!
A context-free language L is inherently ambiguous
if all grammars for L are ambiguous
Every deterministic CFL has an unambiguous
grammar
So, no deterministic CFL is inherently ambiguous
No inherently ambiguous programming languages have
been invented

To construct a useful parser, must devise an


unambiguous grammar
- 24 -

Eliminating Ambiguity

Often can eliminate ambiguity by adding


nonterminals and allowing recursion only
S
on right or left
SS+T|T
T T * num | num

T non-terminal enforces precedence


Left-recursion; left associativity
- 25 -

S + T
T

T * 3

A Closer Look at Eliminating Ambiguity

Precedence enforced by
Introduce distinct non-terminals for each
precedence level
Operators for a given precedence level are
specified as RHS for the production
Higher precedence operators are accessed by
referencing the next-higher precedence nonterminal

- 26 -

Associativity

An operator is either left, right or non


associative
Left:
Right:
Non:

a + b + c = (a + b) + c
a ^ b ^ c = a ^ (b ^ c)
a < b < c is illegal (thus undefined)

Position of the recursion relative to the


operator dictates the associativity
Left (right) recursion left (right) associativity
Non: Dont be recursive, simply reference next
higher precedence non-terminal on both sides of
operator
- 27 -

Class Problem
S S + S | S S | S * S | S / S | (S) | -S | S ^ S | num
Enforce the standard arithmetic precedence rules and remove
all ambiguity from the above grammar
Precedence (high to low)
(), unary
^
*, /
+, Associativity
^ = right
rest are left

- 28 -

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy