
CC-Lec 5 Week 5 Cfgs

Uploaded by

Ch Salman
COMPILER CONSTRUCTION
WEEK 5
ANUM ALEEM
Syntax Analysis (Parsing)
Phases of a Compiler
Source Code
    ↓
Lexical Analyzer
    ↓
Syntax Analyzer
    ↓
Semantic Analyzer
    ↓
Intermediate Code Generator
    ↓
Code Optimizer
    ↓
Code Generator
    ↓
Object Code

Symbol Table Manager and Error Handler: connected to all of the phases above.
Parsing Overview
 What is syntax?
 The way in which words are put together to form phrases, clauses, or sentences.

 The function of a parser:
 Input: sequence of tokens from the lexical analyzer
 Output: parse tree of the program
Parsing

 The parser checks the stream of words (tokens) and their parts of speech for grammatical correctness.
 It determines whether the input is syntactically well formed.
 It guides context-sensitive (“semantic”) analysis, such as type checking.
 Finally, it builds an intermediate representation (IR) of the source program.
 The parser ensures that the sentences that make up a program abide by the syntax of the programming language.
 If there are errors, the parser detects them and reports them accordingly.
Parsing
 Consider the following code segment, which contains a number of syntax errors:

int* foo(int i, int j))
{
    for(k=0; i j; )
        if( i > j )
            return j;
}

 A scanner based on regular expressions cannot detect these syntax errors: every individual token is valid, and only the arrangement of the tokens is wrong.
Parsing

Errors in the previous code:
 Line 1 has an extra closing parenthesis at the end.
 The boolean expression in the for loop on line 3 is incorrect: i j has no operator between the two identifiers.
 All such errors arise because the function does not abide by the grammar of the C++ language.
Example
 Consider a program statement
if x == y
    z = 1
else
    z = 2
 Parser input:
IF ID == ID
    ID = INT
ELSE
    ID = INT
Example

 Parser output:

              IF-THEN-ELSE
            /      |       \
          ==       =        =
         /  \     /  \     /  \
       ID    ID  ID  INT  ID  INT
Example
 Java expression
x == y ? 1 : 2
 Parser input
ID == ID ? INT : INT
 Parser output

          ?:
        /  |  \
      ==  INT  INT
     /  \
   ID    ID
Comparison with Lexical Analysis

Phase              Input                     Output
Lexical Analyzer   Sequence of characters    Sequence of tokens
Parser             Sequence of tokens        Parse tree
Comparison with Lexical Analysis

 Scanners
 Task: recognize language tokens

 Implementation: DFA

 Transition based on the next character

 Parsers
 Task: recognize language syntax (organization of tokens)

 Implementation:

 Top-down parsing

 Bottom-up parsing

Role of the Parser
 Not all sequences of tokens are programs
 The parser must distinguish between valid and invalid sequences of tokens

 We need
 A language for describing valid sequences of tokens
 A method for distinguishing valid from invalid sequences of tokens
 An acceptor mechanism that determines whether the input token stream satisfies the syntax of the programming language.
Context-Free Grammars (CFG)


 The syntax of most programming languages is specified using Context-Free Grammars (CFG).
 Context-free syntax is specified with a four-tuple grammar G = (S, N, T, P), where
S is the start symbol (a non-terminal)
N is a set of non-terminal symbols, each of which can be replaced using the productions
T is a set of terminal symbols (words), which cannot be replaced
P is a set of productions, or rewrite rules
 Parsing is the process of discovering a derivation for some sentence of a language. The mathematical model of syntax is represented by a grammar G. The language generated by the grammar is denoted L(G).
 For example, the Context-Free Grammar for arithmetic expressions is
1. goal → expr
2. expr → expr op term | term
3. term → number | id
4. op → + | -

For this CFG,
S = goal
T = { number, id, +, - }
N = { goal, expr, term, op }
P = { 1, 2, 3, 4 }, i.e., all four of the productions above
 Example: Given the above CFG, we can derive the sentence x + 2 - y by repeated substitution.

Production                 Result
                           goal
goal → expr                expr
expr → expr op term        expr op term
term → id                  expr op y
op → -                     expr - y
expr → expr op term        expr op term - y
term → number              expr op 2 - y
op → +                     expr + 2 - y
expr → term                term + 2 - y
term → id                  x + 2 - y
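The substitution steps in the table can be replayed mechanically. The following is only a minimal sketch, not part of the lecture: the names GRAMMAR, STEPS and rewrite_rightmost are hypothetical, and the sketch works on token classes (id, number) rather than on the lexemes x, 2 and y. It assumes, as the table does, that each step rewrites the right-most non-terminal.

```python
# Grammar from the slide: each non-terminal maps to its alternative right-hand sides.
GRAMMAR = {
    "goal": [["expr"]],
    "expr": [["expr", "op", "term"], ["term"]],
    "term": [["number"], ["id"]],
    "op":   [["+"], ["-"]],
}

def rewrite_rightmost(form, lhs, rhs):
    """Replace the right-most non-terminal in `form`, which must be `lhs`, by `rhs`."""
    assert rhs in GRAMMAR[lhs], "not a production of the grammar"
    for i in range(len(form) - 1, -1, -1):
        if form[i] in GRAMMAR:  # found the right-most non-terminal
            assert form[i] == lhs, f"right-most non-terminal is {form[i]}, not {lhs}"
            return form[:i] + rhs + form[i + 1:]
    raise ValueError("no non-terminal left to rewrite")

# The nine productions applied in the table, in order.
STEPS = [
    ("goal", ["expr"]),
    ("expr", ["expr", "op", "term"]),
    ("term", ["id"]),        # y
    ("op",   ["-"]),
    ("expr", ["expr", "op", "term"]),
    ("term", ["number"]),    # 2
    ("op",   ["+"]),
    ("expr", ["term"]),
    ("term", ["id"]),        # x
]

form = ["goal"]
history = [form]
for lhs, rhs in STEPS:
    form = rewrite_rightmost(form, lhs, rhs)
    history.append(form)
    print(" ".join(form))
# The last line printed is: id + number - id
```

Each printed line corresponds to one row of the Result column, with id and number in place of the concrete names and values.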
Example: Given S → aS | bS | a | b, derive abbab.
S ⇒ aS
  ⇒ abS
  ⇒ abbS
  ⇒ abbaS
  ⇒ abbab

Example [for practice]:
S → aA | bB
A → aS | a
B → bS | b
Derive bbaaaa.
Key Idea

1. Begin with a string consisting of the start symbol “S”
2. Replace any non-terminal X in the string by the right-hand side of some production
X → Y1 … Yn
3. Repeat step (2) until there are only terminals in the string
4. The successive strings created in this way are called sentential forms.
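The four steps above can be sketched as a small search for a derivation. This is an illustration under stated assumptions, not a parsing algorithm from the lecture: PRODUCTIONS and derive are hypothetical names, the grammar is S → aS | bS | a | b from the earlier example, symbols are single characters, and the breadth-first search with length pruning is a simple choice made for this sketch.

```python
from collections import deque

# S -> aS | bS | a | b ; keys of PRODUCTIONS are the non-terminals.
PRODUCTIONS = {"S": ["aS", "bS", "a", "b"]}

def derive(target, start="S"):
    """Breadth-first search for a derivation of `target`.
    Returns the list of sentential forms, or None if there is no derivation."""
    queue = deque([[start]])          # step 1: begin with the start symbol
    seen = {start}
    while queue:
        path = queue.popleft()
        form = path[-1]
        if form == target:
            return path
        # Step 2: pick the left-most non-terminal and try each right-hand side.
        pos = next((i for i, ch in enumerate(form) if ch in PRODUCTIONS), None)
        if pos is None:
            continue                  # only terminals remain, but not the target
        for rhs in PRODUCTIONS[form[pos]]:
            new_form = form[:pos] + rhs + form[pos + 1:]
            # No production here shrinks a form, so longer forms can be pruned.
            if len(new_form) <= len(target) and new_form not in seen:
                seen.add(new_form)
                queue.append(path + [new_form])   # step 3: repeat
    return None

print(" => ".join(derive("abbab")))
# prints: S => aS => abS => abbS => abbaS => abbab
```

The returned list is exactly the sequence of sentential forms from step 4. Replacing PRODUCTIONS with the practice grammar (S, A, B) lets the same function be tried on bbaaaa.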
What is meant by context-free?
 A rule that is “free of context”.
 In context-free rules, a non-terminal appears by itself to the left of the arrow:
A → α
 The rule A → α says that A may be replaced by α anywhere, regardless of where A occurs.
 On the other hand, we could define a context as a pair of strings β, γ, such that a rule applies only if β occurs before A and γ occurs after A.
We would write this as
βAγ → βαγ
Such a rule is called a context-sensitive grammar rule.
Types of derivations:
 Left-most derivation: replace the left-most non-terminal at each step.
 Right-most derivation: replace the right-most non-terminal at each step.

Example: Consider E → E + E | E * E | ( E ) | id
Derive the string id * id + id

Left-most derivation        Right-most derivation
E                           E
⇒ E + E                     ⇒ E + E
⇒ E * E + E                 ⇒ E + id
⇒ id * E + E                ⇒ E * E + id
⇒ id * id + E               ⇒ E * id + id
⇒ id * id + id              ⇒ id * id + id
Parse Tree/Syntax Tree:
 A derivation can be represented in a tree-like fashion called a parse tree.
 It represents the syntactic structure of a string according to some formal grammar.
 It is made up of nodes and branches.
 The start symbol is the root and the derived symbols are nodes.
 The interior nodes contain the non-terminals used during the derivation.
 The leaf nodes are the terminals.
 Note that a right-most and a left-most derivation can describe the same parse tree.
 The difference is the order in which branches are added.
Derivation- Learn by Example
Example: Given the CFG E → E + E | E * E | ( E ) | id
 Derive the string id * id + id
 Left-most derivation:
E
⇒ E + E
⇒ E * E + E
⇒ id * E + E
⇒ id * id + E
⇒ id * id + id

Parse tree:
            E
         /  |  \
        E   +   E
      / | \     |
     E  *  E    id
     |     |
     id    id
Derivation- Learn by Example
 Right-most derivation of id * id + id:
E
⇒ E + E
⇒ E + id
⇒ E * E + id
⇒ E * id + id
⇒ id * id + id

Parse tree:
            E
         /  |  \
        E   +   E
      / | \     |
     E  *  E    id
     |     |
     id    id
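Both derivations of id * id + id end in the same parse tree; only the order in which the nodes are expanded differs. The sketch below illustrates this under a few assumptions: the nested-tuple tree encoding and the names TREE, label and derivation are hypothetical choices made for this example.

```python
# A parse-tree node is (non-terminal, [children]); a leaf is a terminal string.
TREE = ("E", [
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
    "+",
    ("E", ["id"]),
])

def label(node):
    """Grammar symbol shown for a node in a sentential form."""
    return node if isinstance(node, str) else node[0]

def derivation(tree, leftmost=True):
    """List the sentential forms obtained by expanding the nodes of `tree`,
    always choosing the left-most (or right-most) unexpanded non-terminal."""
    form = [tree]  # mix of unexpanded nodes and terminal strings
    steps = [" ".join(label(n) for n in form)]
    while True:
        idxs = [i for i, n in enumerate(form) if not isinstance(n, str)]
        if not idxs:
            return steps
        i = idxs[0] if leftmost else idxs[-1]
        form = form[:i] + list(form[i][1]) + form[i + 1:]
        steps.append(" ".join(label(n) for n in form))

print("\n".join(derivation(TREE, leftmost=True)))
print("\n".join(derivation(TREE, leftmost=False)))
```

The first call reproduces the left-most derivation and the second the right-most derivation, both read off the single tree stored in TREE.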
Abstract Syntax Tree
 The parse tree contains a lot of unneeded information. Compilers often use an abstract syntax tree (AST).
 An AST is much more concise; it summarizes grammatical structure without the details of derivation. ASTs are one kind of intermediate representation (IR).
 For example, the parse tree constructed for id * id + id and the corresponding AST:

Parse Tree                    Abstract Syntax Tree

        E                             +
     /  |  \                        /   \
    E   +   E                      *     id
  / | \     |                    /   \
 E  *  E    id                  id    id
 |     |
 id    id
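The step from parse tree to AST can be sketched as a short recursive function. This is a minimal illustration, assuming the grammar E → E + E | E * E | ( E ) | id and a nested-tuple encoding of trees; the names PARSE_TREE and to_ast are hypothetical, and real compilers usually build the AST directly during parsing.

```python
# A parse-tree node is (non-terminal, [children]); a leaf is a terminal string.
PARSE_TREE = ("E", [
    ("E", [("E", ["id"]), "*", ("E", ["id"])]),
    "+",
    ("E", ["id"]),
])

def to_ast(node):
    """Collapse a parse tree for E -> E + E | E * E | ( E ) | id into an AST.
    An AST node is either a leaf string or a tuple (operator, left, right)."""
    if isinstance(node, str):
        return node
    _, kids = node
    if len(kids) == 1:        # E -> id : drop the chain node
        return to_ast(kids[0])
    if kids[0] == "(":        # E -> ( E ) : parentheses only group, drop them
        return to_ast(kids[1])
    left, op, right = kids    # E -> E op E : the operator becomes the node
    return (op, to_ast(left), to_ast(right))

AST = to_ast(PARSE_TREE)
print(AST)
# prints: ('+', ('*', 'id', 'id'), 'id')
```

The result has five nodes where the parse tree has eleven, which is the saving the slide refers to.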
CFG Ambiguity
 A grammar is ambiguous if it generates more than one parse tree for the same string.
 Equivalently, there is more than one right-most or left-most derivation
for some string.
 Ambiguity is bad
 Leaves meaning of some programs ill-defined
 Ambiguity is common in programming languages
 Arithmetic expressions
 IF-THEN-ELSE

CFG Ambiguity
 Consider E → E + E | E * E | ( E ) | int
We can generate the string int * int + int with two different parse trees.

          E                          E
       /  |  \                    /  |  \
      E   +   E                  E   *   E
    / | \     |                  |     / | \
   E  *  E   int                int   E  +  E
   |     |                            |     |
  int   int                          int   int
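The ambiguity can also be checked by counting parse trees. The sketch below is an illustration only: count_trees is a hypothetical name, the function is hard-wired to the grammar E → E + E | E * E | ( E ) | int, and the span-by-span counting with memoization is one simple way to do it, chosen for this example.

```python
from functools import lru_cache

def count_trees(tokens):
    """Number of distinct parse trees for `tokens` under
    E -> E + E | E * E | ( E ) | int."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of trees that derive tokens[i:j] from E.
        if j - i == 1:
            return 1 if tokens[i] == "int" else 0        # E -> int
        total = 0
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            total += count(i + 1, j - 1)                  # E -> ( E )
        for k in range(i + 1, j - 1):                     # top-level operator at k
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)    # E -> E op E
        return total

    return count(0, len(tokens)) if tokens else 0

print(count_trees("int * int + int".split()))   # prints: 2
print(count_trees("int + int".split()))         # prints: 1
```

A count greater than 1 for any string is enough to show that the grammar is ambiguous.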
CFG Ambiguity
 Example of a non-ambiguous CFG:
 Consider a CFG for the language PALINDROME.

S → aSa | bSb | a | b | ε

A palindrome is a word that reads the same from left to right and from right to left,
e.g., abba, babab, abbabba.

Try the above CFG and derive parse trees.
Can more than one tree be generated for a single word?
Ambiguity: The Dangling Else
 Consider the grammar
S → if E then S | if E then S else S | OTHER
This grammar is also ambiguous. HOW?
 The statement
if E1 then if E2 then S1 else S2
has two different parse trees:

First form (else attached to the outer if):     Second form (else attached to the inner if):
if E1 then                                      if E1 then
    if E2 then S1                                   if E2 then S1
else S2                                             else S2

        if                                              if
     /   |   \                                        /    \
   E1    if    S2                                   E1      if
        /  \                                              /  |  \
      E2    S1                                          E2   S1   S2

Typically we want the second form because ELSE matches the closest previously unmatched THEN.
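The "closest unmatched THEN" rule falls out naturally when a parser consumes an else as soon as it sees one. The following is a minimal recursive-descent sketch, not the lecture's parser: parse_stmt is a hypothetical name, a single token stands in for the condition E, and any other single token is treated as OTHER.

```python
def parse_stmt(tokens, pos=0):
    """Parse one statement starting at tokens[pos]; return (tree, next_pos).
    An `else` is consumed greedily, so it attaches to the closest unmatched `then`."""
    if pos >= len(tokens):
        raise SyntaxError("unexpected end of input")
    if tokens[pos] == "if":
        if pos + 2 >= len(tokens) or tokens[pos + 2] != "then":
            raise SyntaxError("expected 'if E then'")
        cond = tokens[pos + 1]                      # one token stands in for E
        then_part, pos = parse_stmt(tokens, pos + 3)
        if pos < len(tokens) and tokens[pos] == "else":
            else_part, pos = parse_stmt(tokens, pos + 1)
            return ("if", cond, then_part, else_part), pos
        return ("if", cond, then_part), pos
    return tokens[pos], pos + 1                     # OTHER: any other single token

tokens = "if E1 then if E2 then S1 else S2".split()
tree, end = parse_stmt(tokens)
print(tree)
# prints: ('if', 'E1', ('if', 'E2', 'S1', 'S2'))
```

The printed tree is the second form: S2 belongs to the inner if, and the outer if has no else branch.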
