CC-Lec 5 Week 5 Cfgs
COMPILER CONSTRUCTION
WEEK 5
ANUM ALEEM
Syntax Analysis (Parsing)
Phases of a Compiler
Source Code
Lexical Analyzer
Syntax Analyzer
Code Optimizer
Code Generator
Object Code
Parsing Overview
What is syntax?
The way in which words are put together to
form phrases, clauses, or sentences.
Parsing
Consider the following code segment that contains a number of
syntax errors:
Clearly, a scanner based on regular expressions cannot detect such
syntax errors.
Example
Consider a program statement
if x == y
    z = 1
else
    z = 2
Parser input:
IF ID == ID
    ID = INT
ELSE
    ID = INT
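The step from source text to parser input can be sketched with a small regular-expression scanner. This is only an illustration: the token table `TOKEN_SPEC` and the function `scan` are hypothetical names chosen for this example, and only the handful of token classes used in the statement above are covered.

```python
import re

# Hypothetical token classes for this example only; a real language has many more.
# Order matters: keywords before ID, "==" before "=".
TOKEN_SPEC = [
    ("IF", r"\bif\b"),
    ("ELSE", r"\belse\b"),
    ("INT", r"\d+"),
    ("ID", r"[A-Za-z_]\w*"),
    ("==", r"=="),
    ("=", r"="),
    ("SKIP", r"\s+"),
]
# Group names must be identifiers, so each alternative is named T0, T1, ...
PATTERN = re.compile("|".join(f"(?P<T{i}>{rx})" for i, (_, rx) in enumerate(TOKEN_SPEC)))


def scan(source):
    """Return the list of token names for source, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = PATTERN.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at position {pos}")
        name = TOKEN_SPEC[int(match.lastgroup[1:])][0]
        if name != "SKIP":
            tokens.append(name)
        pos = match.end()
    return tokens


print(" ".join(scan("if x == y\n    z = 1\nelse\n    z = 2")))
# prints: IF ID == ID ID = INT ELSE ID = INT
```

The scanner only classifies lexemes. Whether the tokens appear in a legal order is left to the parser.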
Parser output (tree, children shown by indentation):
IF-THEN-ELSE
    ==
        ID
        ID
    =
        ID
        INT
    =
        ID
        INT
Example
Java expression
x == y ? 1 : 2
Parser input
ID == ID ? INT : INT
Parser output
?:
    ==
        ID
        ID
    INT
    INT
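One way to see how a parser turns this token sequence into a tree is a small recursive-descent sketch. The grammar it implements is an assumption made for the example (a conditional is an equality test optionally followed by `? ... : ...`, and operands are single `ID` or `INT` tokens), and the function names are hypothetical. Trees are represented as nested tuples.

```python
def parse_conditional(tokens):
    """Parse a token list such as ['ID', '==', 'ID', '?', 'INT', ':', 'INT']."""
    tree, rest = _cond(list(tokens))
    if rest:
        raise SyntaxError(f"unexpected tokens: {rest}")
    return tree


def _cond(tokens):
    # cond -> equality [ '?' cond ':' cond ]
    test, rest = _equality(tokens)
    if rest and rest[0] == "?":
        then_part, rest = _cond(rest[1:])
        if not rest or rest[0] != ":":
            raise SyntaxError("expected ':'")
        else_part, rest = _cond(rest[1:])
        return ("?:", test, then_part, else_part), rest
    return test, rest


def _equality(tokens):
    # equality -> primary [ '==' primary ]
    left, rest = _primary(tokens)
    if rest and rest[0] == "==":
        right, rest = _primary(rest[1:])
        return ("==", left, right), rest
    return left, rest


def _primary(tokens):
    # primary -> ID | INT
    if tokens and tokens[0] in ("ID", "INT"):
        return tokens[0], tokens[1:]
    raise SyntaxError(f"expected ID or INT, got {tokens[:1]}")


print(parse_conditional("ID == ID ? INT : INT".split()))
# prints: ('?:', ('==', 'ID', 'ID'), 'INT', 'INT')
```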
Comparison with Lexical Analysis
Scanners
Task: recognize language tokens
Implementation: DFA
Parsers
Task: recognize language syntax (organization of tokens)
Implementation:
Top-down parsing
Bottom-up parsing
Role of the Parser
Not all sequences of tokens are programs
Parser must distinguish between valid and invalid
sequences of tokens
We need
A language for describing valid sequences of tokens
A method for distinguishing valid from invalid
sequences of tokens
An acceptor mechanism that determines if input
token stream satisfies the syntax of the
programming language.
Context-Free Grammars (CFG)
Example: Given S → aS | bS | a | b, derive abbab.
S ⇒ aS
⇒ abS
⇒ abbS
⇒ abbaS
⇒ abbab
Example:
S → aA | bB
A → aS | a
B → bS | b
Derive bbaaaa. [for practice]
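Derivations like these can be checked mechanically. The sketch below searches for a left-most derivation by breadth-first search over sentential forms. It assumes single-character symbols and no ε-productions (so forms never shrink), and `leftmost_derivation` and `terminal_prefix` are hypothetical helper names, not part of the lecture.

```python
from collections import deque


def terminal_prefix(form, grammar):
    """Return the terminals that precede the first non-terminal in form."""
    for i, symbol in enumerate(form):
        if symbol in grammar:
            return form[:i]
    return form


def leftmost_derivation(grammar, start, target):
    """Return the sentential forms from start to target, or None if underivable.

    Assumes one character per symbol and no epsilon productions.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        index = next((i for i, s in enumerate(form) if s in grammar), None)
        if index is None:
            continue  # all terminals, but not the target
        for rhs in grammar[form[index]]:
            new = form[:index] + rhs + form[index + 1:]
            # Prune forms that are too long or whose terminal prefix already mismatches.
            if (len(new) <= len(target)
                    and target.startswith(terminal_prefix(new, grammar))
                    and new not in seen):
                seen.add(new)
                queue.append(path + [new])
    return None


g1 = {"S": ["aS", "bS", "a", "b"]}
print(" => ".join(leftmost_derivation(g1, "S", "abbab")))
# prints: S => aS => abS => abbS => abbaS => abbab

g2 = {"S": ["aA", "bB"], "A": ["aS", "a"], "B": ["bS", "b"]}
print(" => ".join(leftmost_derivation(g2, "S", "bbaaaa")))
# prints: S => bB => bbS => bbaA => bbaaS => bbaaaA => bbaaaa
```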
Key Idea
What is meant by context-free?
A rule that is “free of context”.
The non-terminals appear by themselves to the left of the arrow
in context-free rules:
A → α
The rule A → α says that A may be replaced by α anywhere, regardless
of where A occurs.
On the other hand, we could define a context as a pair of strings β, γ,
such that a rule would apply only if β occurs before A and γ occurs
after A.
We would write this as
βAγ → βαγ
Such a rule is called a context-sensitive grammar rule.
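The difference between the two kinds of rule can be illustrated on plain strings. In this minimal sketch, sentential forms are Python strings and both function names are hypothetical. The first function applies A → α wherever A occurs. The second applies βAγ → βαγ only when A sits between β and γ.

```python
def apply_context_free(form, lhs, rhs):
    """Apply lhs -> rhs to the first occurrence of lhs, whatever surrounds it."""
    i = form.find(lhs)
    if i < 0:
        return None
    return form[:i] + rhs + form[i + len(lhs):]


def apply_context_sensitive(form, before, lhs, after, rhs):
    """Apply before+lhs+after -> before+rhs+after; the context itself is unchanged."""
    i = form.find(before + lhs + after)
    if i < 0:
        return None
    start = i + len(before)
    return form[:start] + rhs + form[start + len(lhs):]


print(apply_context_free("cAd", "A", "ab"))                  # prints: cabd
print(apply_context_sensitive("xAy", "x", "A", "y", "ab"))   # prints: xaby
print(apply_context_sensitive("cAd", "x", "A", "y", "ab"))   # prints: None
```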
Types of derivations:
Left-most derivation: replace left-most non-terminal at each step.
Right-most derivation: replace right-most non-terminal at each step.
Example: Consider E → E + E | E * E | (E) | id
Derive the string id * id + id

Left-most derivation        Right-most derivation
E                           E
⇒ E + E                     ⇒ E + E
⇒ E * E + E                 ⇒ E + id
⇒ id * E + E                ⇒ E * E + id
⇒ id * id + E               ⇒ E * id + id
⇒ id * id + id              ⇒ id * id + id
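The two derivation orders differ only in which non-terminal is rewritten at each step. The sketch below replays both derivations for the expression grammar. It assumes E is the only non-terminal, so each step is given just by the right-hand side to apply. The function name `derive` is hypothetical.

```python
NONTERMINALS = {"E"}  # the expression grammar has a single non-terminal


def derive(right_hand_sides, leftmost=True):
    """Apply each right-hand side to the left-most (or right-most) E in turn."""
    form = ["E"]
    steps = [" ".join(form)]
    for rhs in right_hand_sides:
        positions = [i for i, symbol in enumerate(form) if symbol in NONTERMINALS]
        if not positions:
            raise ValueError("no non-terminal left to replace")
        i = positions[0] if leftmost else positions[-1]
        form[i:i + 1] = rhs.split()
        steps.append(" ".join(form))
    return steps


left = derive(["E + E", "E * E", "id", "id", "id"], leftmost=True)
right = derive(["E + E", "id", "E * E", "id", "id"], leftmost=False)
print(" => ".join(left))
# prints: E => E + E => E * E + E => id * E + E => id * id + E => id * id + id
print(" => ".join(right))
# prints: E => E + E => E + id => E * E + id => E * id + id => id * id + id
```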
Parse Tree/Syntax Tree:
The derivations can be represented in a tree-like fashion called parse
tree.
It represents the syntactic structure of a string according to some
formal grammar.
It is made up of nodes and branches.
The start symbol is the root and the derived symbols are nodes.
The interior nodes contain the non-terminals used during the
derivation.
The leaf nodes are the terminals.
Note that a right-most and a left-most derivation can have the same parse tree.
The difference is only the order in which branches are added.
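A parse tree can be represented directly as nodes with children. The following sketch uses a hypothetical `Node` class and builds, by hand, a parse tree for id * id + id under the grammar E → E + E | E * E | (E) | id. Reading the leaves from left to right gives back the derived string, and the interior nodes are all non-terminals.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    symbol: str
    children: list = field(default_factory=list)


def leaves(node):
    """Return the terminals at the leaves, left to right."""
    if not node.children:
        return [node.symbol]
    result = []
    for child in node.children:
        result.extend(leaves(child))
    return result


def interior_symbols(node):
    """Return the symbols of all interior nodes in preorder."""
    if not node.children:
        return []
    result = [node.symbol]
    for child in node.children:
        result.extend(interior_symbols(child))
    return result


def e_id():
    """Subtree for the production E -> id."""
    return Node("E", [Node("id")])


# Root E -> E + E, where the left E -> E * E.
tree = Node("E", [Node("E", [e_id(), Node("*"), e_id()]), Node("+"), e_id()])

print(" ".join(leaves(tree)))   # prints: id * id + id
print(interior_symbols(tree))   # prints: ['E', 'E', 'E', 'E', 'E']
```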
Derivation- Learn by Example
Example: Given a CFG E → E + E | E * E | (E) | id
Derive the string id * id + id
Left-most derivation:
E
⇒ E + E
⇒ E * E + E
⇒ id * E + E
⇒ id * id + E
⇒ id * id + id
Parse tree:
        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id
Derivation- Learn by Example
Right-most derivation of id * id + id:
E
⇒ E + E
⇒ E + id
⇒ E * E + id
⇒ E * id + id
⇒ id * id + id
Parse tree:
        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id
Abstract Syntax Tree
The parse tree contains a lot of unneeded information. Compilers often
use an abstract syntax tree (AST).
AST is much more concise; it summarizes grammatical structure
without the details of derivation. ASTs are one kind of intermediate
representation (IR).
For example, the parse tree constructed for id * id + id and its AST:

Parse Tree:
        E
      / | \
     E  +  E
   / | \   |
  E  *  E  id
  |     |
  id    id

Abstract Syntax Tree:
      +
     / \
    *   id
   / \
  id  id
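The reduction from parse tree to AST can be sketched as a small tree rewrite. The representation is an assumption for this example: a parse-tree node is a tuple whose first element is the non-terminal and whose remaining elements are its children, and terminals are plain strings. The function name `to_ast` is hypothetical.

```python
OPERATORS = {"+", "*"}


def to_ast(tree):
    """Collapse a parse tree for E -> E + E | E * E | ( E ) | id into an AST."""
    if isinstance(tree, str):          # a terminal such as "id"
        return tree
    _, *children = tree
    if len(children) == 1:             # E -> id : drop the E wrapper
        return to_ast(children[0])
    if len(children) == 3 and children[0] == "(" and children[2] == ")":
        return to_ast(children[1])     # E -> ( E ) : parentheses are not kept
    if len(children) == 3 and children[1] in OPERATORS:
        left, op, right = children     # E -> E op E : the operator becomes the node
        return (op, to_ast(left), to_ast(right))
    raise ValueError(f"unexpected parse tree node: {tree!r}")


parse_tree = ("E",
              ("E", ("E", "id"), "*", ("E", "id")),
              "+",
              ("E", "id"))
print(to_ast(parse_tree))
# prints: ('+', ('*', 'id', 'id'), 'id')
```

The AST keeps only operators and operands. The chain of E nodes and any parentheses carry no extra information once the tree shape is fixed.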
CFG Ambiguity
A grammar is ambiguous if it generates more than one parse tree
for the same string.
Equivalently, there is more than one right-most or more than one
left-most derivation for some string.
Ambiguity is bad
Leaves meaning of some programs ill-defined
Ambiguity is common in programming languages
Arithmetic expressions
IF-THEN-ELSE
CFG Ambiguity
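Consider E → E + E | E * E | ( E ) | int
We can generate the string int * int + int with two different parse trees.

Tree 1, grouping (int * int) + int:
        E
      / | \
     E  +  E
   / | \   |
  E  *  E  int
  |     |
 int   int

Tree 2, grouping int * (int + int):
     E
   / | \
  E  *  E
  |   / | \
 int E  +  E
     |     |
    int   int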
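Ambiguity of this grammar can be confirmed by counting parse trees. The sketch below is a brute-force counter written only for the grammar E → E + E | E * E | ( E ) | int. It tries every production on every token span and memoizes the results. The function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count the parse trees of tokens under E -> E + E | E * E | ( E ) | int."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways tokens[i:j] can be derived from E.
        total = 0
        if j - i == 1 and tokens[i] == "int":                      # E -> int
            total += 1
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                           # E -> ( E )
        for k in range(i + 1, j - 1):                              # E -> E op E
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens)) if tokens else 0


print(count_parse_trees("int * int + int".split()))      # prints: 2
print(count_parse_trees("( int * int ) + int".split()))  # prints: 1
```

A count above 1 for any string shows that the grammar is ambiguous. Explicit parentheses leave only one tree for that particular string.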
CFG Ambiguity
Examples of non-ambiguous CFG:
Consider a CFG of the language PALINDROME.
S → aSa | bSb | a | b | ε
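A recognizer for this grammar shows why it is not ambiguous: for any input, at most one production can apply at each step. This is a minimal sketch and the function name `derives` is hypothetical.

```python
def derives(s):
    """Return True if s can be derived from S -> aSa | bSb | a | b | epsilon."""
    if s == "":                       # S -> epsilon
        return True
    if s in ("a", "b"):               # S -> a | b
        return True
    if len(s) >= 2 and s[0] == s[-1] and s[0] in "ab":
        return derives(s[1:-1])       # S -> aSa or S -> bSb, fixed by the outer letters
    return False


print(derives("abba"))   # prints: True
print(derives("ababa"))  # prints: True
print(derives("ab"))     # prints: False
```

The length and outer letters of the string decide which production is used, so every palindrome over {a, b} has exactly one parse tree.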
Ambiguity: The Dangling Else
Consider the grammar
S → if E then S | if E then S else S | OTHER
This grammar is also ambiguous. How?
The statement
if E1 then if E2 then S1 else S2
has two different parse trees.

First form, else attached to the outer if:
if E1 then
    if E2 then S1
else S2

      if
    / |  \
  E1  if  S2
     /  \
    E2   S1

Second form, else attached to the inner if:
if E1 then
    if E2 then S1
    else S2

      if
     /  \
   E1    if
       / | \
     E2  S1 S2
Typically we want the second form because ELSE matches the closest
previously unmatched THEN.
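Many recursive-descent parsers get the closest-match behaviour by consuming an else greedily as soon as one follows a then-branch. The sketch below illustrates that idea under simplifying assumptions: conditions and OTHER statements are single tokens, and the function name `parse_statement` is hypothetical.

```python
KEYWORDS = {"if", "then", "else"}


def parse_statement(tokens, pos=0):
    """Parse S -> if E then S | if E then S else S | OTHER.

    Returns (tree, next_position). An else is attached to the nearest
    unmatched if because the innermost call sees it first.
    """
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected: if E then ...")
        condition = tokens[pos + 1]
        then_branch, pos = parse_statement(tokens, pos + 3)
        if pos < len(tokens) and tokens[pos] == "else":   # greedy: take the else now
            else_branch, pos = parse_statement(tokens, pos + 1)
            return ("if", condition, then_branch, else_branch), pos
        return ("if", condition, then_branch), pos
    if tokens[pos] in KEYWORDS:
        raise SyntaxError(f"unexpected keyword {tokens[pos]!r}")
    return tokens[pos], pos + 1                           # OTHER statement


tree, _ = parse_statement("if E1 then if E2 then S1 else S2".split())
print(tree)
# prints: ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```

The result is the second form: the else branch S2 belongs to the inner if.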