COSC3054 Lec 03 I Grammars
Compiler Design
COSC3054
Lecture 03
Syntactic Analysis - Grammars
Syntax Analysis (Parsing)
★ An overview of parsing
○ Functions & Responsibilities
★ Context Free Grammars
○ Concepts & Terminology
★ Writing and Designing Grammars
★ Resolving Grammar Problems / Difficulties
★ Top-Down Parsing
○ Recursive Descent & Predictive LL
★ Bottom-Up Parsing
○ LR(0), SLR, CLR & LALR
★ Concluding Remarks/Looking Ahead
The Role of the Parser
★ The Parser or syntactic analyzer obtains a string of tokens from
the lexical analyzer
○ and verifies that the string can be generated by the grammar
for the source language
★ It reports any syntax errors in the program
○ It also recovers from commonly occurring errors
■ so that it can continue processing its input
The Role of the Parser
★ Parser builds the parse tree
★ Parser verifies the structure generated by the tokens based on the
grammar
○ performs context free syntax analysis
★ Parser helps to construct intermediate code
★ Parser produces appropriate error messages
★ Parser performs error recovery
★ Issues: the parser cannot detect semantic errors such as:
○ Variable re-declaration
○ Use of a variable before initialization
○ Data type mismatch for an operation
The Role of the Parser
★ The parser's task is
○ to identify the syntactic structure of the sequence of symbols,
that is, how the syntactic units are composed from other units
★ Syntactic units in imperative languages are, for example,
variables, expressions, declarations, statements, and sequences of
statements
★ One possible representation of the syntactic structure of the input
program is the syntax tree or parse tree
★ The syntactic structure of the programs written in some
programming language can be described by a context-free
grammar
Syntax Analysis Foundations
★ Lexical analysis
○ specified by regular expressions and
○ implemented by finite automata
★ Syntax analysis
○ specified by context-free grammars (CFG) and
○ implemented by pushdown automata (PDA)
★ Regular expressions alone are not sufficient to describe the
syntax of programming languages
○ they cannot express embedded recursion as occurs in the
nesting of expressions, statements, and blocks
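To make the nesting point concrete, here is a minimal Python sketch (the function name `balanced` is hypothetical) that recognizes correctly nested brackets by keeping an explicit stack, which is exactly the extra memory a pushdown automaton has and a finite automaton lacks. It only checks bracket structure, not full statements.

```python
def balanced(text: str) -> bool:
    """Return True if (), [] and {} in `text` are correctly nested.

    The explicit stack plays the role of a pushdown automaton's stack; a
    finite automaton has only finitely many states, so it cannot remember
    an unbounded number of open brackets.
    """
    closing_to_opening = {")": "(", "]": "[", "}": "{"}
    stack = []
    for ch in text:
        if ch in "([{":
            stack.append(ch)              # push on every opening bracket
        elif ch in closing_to_opening:
            # the closer must match the most recent unmatched opener
            if not stack or stack.pop() != closing_to_opening[ch]:
                return False
    return not stack                      # every opening bracket was closed


print(balanced("{ if (a) { b[i] = (c); } }"))  # True
print(balanced("((a)"))                       # False
```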
RE vs. CFG
★ Regular Expressions
○ Basis of lexical analysis
○ Represent regular languages
★ Context Free Grammars
○ Basis of parsing
○ Represent language constructs
○ Characterize context free languages
Context-Free Grammars
★ The non-terminal symbol <stat> generates statements
★ The meta-character | is used to combine several alternatives
★ According to these productions,
○ a statement is either an if-statement, a while-statement, a
do-while-statement, an expression followed by a semicolon,
an empty statement, or a sequence of statements in parentheses
★ The non-terminal symbol <if_stat>
○ generates if-statements in which the else-part may be present
or absent
Context-Free Grammars
★ Example of a context-free grammar:
○ The following grammar defines simple arithmetic expressions:
Context-Free Grammars
★ Inherently recursive structures of a programming language are
defined by a context-free grammar
★ In a context-free grammar, we have:
○ A finite set of terminals, T
■ this will be the set of tokens
○ A finite set of non-terminals (syntactic-variables), N
○ A finite set of production rules of the form
■ A → α, where A is a non-terminal and α is a string of terminals
and non-terminals (possibly the empty string)
○ A start symbol (one of the non-terminal symbols)
Context-Free Grammars
★ A Context Free Grammar (CFG),
○ Defined as a 4-tuple (T, NT, S, PR), where:
■ T: Terminals / tokens of the language
■ NT: Non-terminals; S: Start symbol, S ∈ NT
■ PR: Production rules to indicate how T and NT are
combined to generate valid strings of the language
● PR: NT → (T | NT)*
● E.g.: E → E + E
E → num
★ Like a Regular Expression / DFA / NFA,
○ a Context Free Grammar is a mathematical model
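The 4-tuple definition maps directly onto a small data structure. The sketch below is an illustration only: the class name `CFG`, the method `validate` and the choice of tuples of strings for right-hand sides are assumptions, not part of the lecture. It encodes the slide's example E → E + E, E → num and checks that every production has the shape NT → (T | NT)*.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CFG:
    """A context-free grammar as the 4-tuple (T, NT, S, PR)."""
    terminals: frozenset      # T
    nonterminals: frozenset   # NT
    start: str                # S, must belong to NT
    productions: tuple        # PR: (lhs, rhs) pairs, rhs a tuple of symbols; () is ε

    def validate(self) -> bool:
        """Check the shape NT -> (T | NT)* for every production."""
        if self.terminals & self.nonterminals:
            raise ValueError("T and NT must be disjoint")
        if self.start not in self.nonterminals:
            raise ValueError("the start symbol must be a non-terminal")
        for lhs, rhs in self.productions:
            if lhs not in self.nonterminals:
                raise ValueError(f"left-hand side {lhs!r} is not a non-terminal")
            for sym in rhs:
                if sym not in self.terminals and sym not in self.nonterminals:
                    raise ValueError(f"unknown symbol {sym!r} in {lhs} -> {rhs}")
        return True


# The example from the slide: E -> E + E, E -> num
expr_grammar = CFG(
    terminals=frozenset({"+", "num"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(("E", ("E", "+", "E")), ("E", ("num",))),
)
print(expr_grammar.validate())  # True
```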
Context-Free Grammars
★ A language, L(G), that can be generated by a context-free
grammar is said to be a context-free language
★ Two grammars are equivalent if they produce the same language
★ If S is the start symbol of G and S ⇒* 𝛼, then 𝛼 is a sentential form of G
○ If 𝛼 contains non-terminals,
■ it is a sentential form that is not (yet) a sentence
○ If 𝛼 does not contain non-terminals,
■ it is called a sentence of G
Context-Free Grammars
★ There are two types of derivation:
○ Left-most derivation
○ Right-most derivation
★ In leftmost derivations,
○ the leftmost non-terminal in each sentential form is always chosen
first for replacement
★ In rightmost derivations,
○ the rightmost non-terminal in each sentential form is always chosen
first for replacement
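The only difference between the two kinds of derivation is which non-terminal gets rewritten, which a few lines of Python can show. In this sketch (the names `derive` and `NONTERMINALS` are hypothetical) a sentential form is a list of symbols, and the same sequence of right-hand sides for E → E + E | E * E | id is applied once leftmost and once rightmost.

```python
# Grammar from the slides: E -> E + E | E * E | id
NONTERMINALS = {"E"}


def derive(replacements, leftmost=True, start="E"):
    """Apply each right-hand side in `replacements`, in order, to the
    leftmost (or rightmost) non-terminal of the current sentential form.
    Returns every sentential form of the derivation as a string."""
    form = [start]
    steps = [" ".join(form)]
    for rhs in replacements:
        positions = [i for i, sym in enumerate(form) if sym in NONTERMINALS]
        if not positions:
            raise ValueError("the sentential form is already a sentence")
        i = positions[0] if leftmost else positions[-1]
        form = form[:i] + list(rhs) + form[i + 1:]
        steps.append(" ".join(form))
    return steps


choices = [("E", "+", "E"), ("id",), ("id",)]
print(" => ".join(derive(choices, leftmost=True)))
# E => E + E => id + E => id + id
print(" => ".join(derive(choices, leftmost=False)))
# E => E + E => E + id => id + id
```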
Ambiguity in CFG
★ A grammar that produces more than one parse tree for some sentence
is said to be an ambiguous grammar
○ Example: given grammar G,
■ E → E+E | E*E | id, which can also be written as:
● E → E+E
● E → E*E
● E → id
○ The sentence id+id*id has two distinct leftmost derivations (and two
distinct rightmost derivations); the leftmost ones are:
■ E ⇒ E+E ⇒ id+E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
■ E ⇒ E*E ⇒ E+E*E ⇒ id+E*E ⇒ id+id*E ⇒ id+id*id
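Because each parse tree corresponds to exactly one leftmost derivation, ambiguity can be checked by counting parse trees. The following is a minimal memoized sketch for this particular grammar (the name `count_expr_parse_trees` is hypothetical): for every span of tokens it tries each operator as the root of the tree.

```python
from functools import lru_cache


def count_expr_parse_trees(tokens):
    """Count parse trees of `tokens` under the ambiguous grammar
    E -> E + E | E * E | id (one tree per leftmost derivation)."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        """Number of parse trees in which E derives tokens[i:j]."""
        total = 0
        if j - i == 1 and tokens[i] == "id":
            total += 1                        # E -> id
        for k in range(i + 1, j - 1):         # k: the operator at the root
            if tokens[k] in ("+", "*"):       # E -> E + E or E -> E * E
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_expr_parse_trees("id + id * id".split()))  # 2
```

A result greater than 1 for some sentence is exactly what makes the grammar ambiguous; with three operators the count already grows to 5.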
Ambiguity in CFG
★ Example 1: Given grammar, G,
○ S → aSbS | bSaS | 𝜆, where 𝜆 denotes the empty string. Is G an
ambiguous grammar for the input string abab?
Ambiguity in CFG
★ Example 2: Given grammar, G,
○ S → aAB, A → bBb, B → A | 𝜆, where 𝜆 denotes the empty string.
Is G an ambiguous grammar for the input string abbbb?
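Both examples can be checked mechanically by counting parse trees. The sketch below is a general counter for small grammars with single-character terminals (the name `count_parse_trees` and the string encoding of right-hand sides are assumptions; "" stands for 𝜆). It assumes no non-terminal can derive itself without consuming input, which holds for both grammars here but not, for instance, for E → E + E.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, word):
    """Count the parse trees for `word` in a grammar whose terminals are single
    characters and whose non-terminals are the keys of `grammar`.

    Right-hand sides are strings of symbols; "" stands for the empty string.
    Assumes no non-terminal can derive itself without consuming input;
    otherwise the recursion would not terminate.
    """
    @lru_cache(maxsize=None)
    def count_nt(nt, i, j):
        # parse trees in which non-terminal nt derives word[i:j]
        return sum(count_seq(rhs, 0, i, j) for rhs in grammar[nt])

    @lru_cache(maxsize=None)
    def count_seq(rhs, k, i, j):
        # ways in which the symbols rhs[k:] derive word[i:j]
        if k == len(rhs):
            return 1 if i == j else 0
        sym = rhs[k]
        if sym not in grammar:                     # terminal: must match next char
            if i < j and word[i] == sym:
                return count_seq(rhs, k + 1, i + 1, j)
            return 0
        # non-terminal: try every split point m
        return sum(count_nt(sym, i, m) * count_seq(rhs, k + 1, m, j)
                   for m in range(i, j + 1))

    return count_nt(start, 0, len(word))


example1 = {"S": ["aSbS", "bSaS", ""]}
example2 = {"S": ["aAB"], "A": ["bBb"], "B": ["A", ""]}
print(count_parse_trees(example1, "S", "abab"))   # 2
print(count_parse_trees(example2, "S", "abbbb"))  # 2
```

Both counts are 2, so both grammars are ambiguous. In Example 2, for instance, A can derive all of bbbb (with the final B → 𝜆), or A can derive bb while B → A derives the remaining bb.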
Ambiguity in CFG
★ Two basic problems due to ambiguity
○ Associativity property violation
■ When two operators have the same precedence, their associativity
determines how they are grouped
■ E.g., plus and minus have the same precedence, so in such cases
we look at their associativity
○ Precedence property violation
■ E.g., plus and multiplication have different precedence. If a parse
tree groups operators against their precedence, we call it a
precedence property violation
Ambiguity in CFG
★ Associativity property violation
○ Most arithmetic operators (+, -, *, /) and the binary boolean
operators (⋀, ⋁) are left associative
■ When operators of the same precedence appear together,
● the operator on the left is applied first; this is called the
left associative property
● E.g., in 5+8-6+9 we first add 5 and 8, giving 13-6+9; then we
subtract 6 from 13, giving 7+9; finally we add 7 and 9, giving 16
■ If a parse tree violates this property, we call it an
associativity property violation
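The worked example can be checked by evaluating the same operands and operators under both groupings. This is a small illustrative sketch (the function names are hypothetical): the left-associative fold gives the expected 16, while grouping from the right, which is what a parse tree violating associativity would compute, gives a different answer.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub}


def eval_left_assoc(nums, ops):
    """((n0 op n1) op n2) ...: group from the left."""
    acc = nums[0]
    for op, n in zip(ops, nums[1:]):
        acc = OPS[op](acc, n)
    return acc


def eval_right_assoc(nums, ops):
    """n0 op (n1 op (n2 ...)): group from the right (the wrong tree for + and -)."""
    acc = nums[-1]
    for op, n in zip(reversed(ops), reversed(nums[:-1])):
        acc = OPS[op](n, acc)
    return acc


nums, ops = [5, 8, 6, 9], ["+", "-", "+"]
print(eval_left_assoc(nums, ops))   # 16, i.e. ((5+8)-6)+9
print(eval_right_assoc(nums, ops))  # -2, i.e. 5+(8-(6+9))
```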
Ambiguity in CFG
★ Associativity property violation and the solution
○ Example 1: Given grammar, G,
■ E → E + E, E → id, and the input string id+id+id
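A common fix, assumed here rather than taken from the slides, is to allow recursion on the left only: E → E + id | id. The sketch below (the name `parse_sum` is hypothetical) builds the parse tree for that grammar as nested tuples. The left-recursive rule is implemented as a loop so the parser does not recurse forever, and the tree comes out left-nested, as left associativity requires.

```python
def parse_sum(tokens):
    """Build the parse tree for the left-recursive grammar E -> E + id | id.

    The left-recursive rule is realised as a loop that keeps wrapping the
    tree built so far, so id+id+id groups as ((id + id) + id).
    """
    pos = 0

    def expect_id():
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != "id":
            raise SyntaxError(f"expected 'id' at position {pos}")
        pos += 1
        return "id"

    tree = expect_id()                      # E -> id
    while pos < len(tokens) and tokens[pos] == "+":
        pos += 1                            # consume '+'
        tree = (tree, "+", expect_id())     # E -> E + id
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at position {pos}")
    return tree


print(parse_sum("id + id + id".split()))
# (('id', '+', 'id'), '+', 'id')
```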
Ambiguity in CFG
★ Precedence property violation
○ BODMAS can tell us the precedence of arithmetic operators
○ Among boolean operators, ¬ has higher precedence than ⋀ and ⋁
○ If a parse tree violates such a precedence property, we call it a
precedence property violation
● E.g., in 5+8*6/4 we first multiply 8 and 6, giving 5+48/4; then we
divide 48 by 4, giving 5+12; finally we add 5 and 12, giving 17
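The arithmetic above can be reproduced with a small evaluator that respects precedence. This sketch uses a two-stack, shunting-yard style evaluation, a technique swapped in for illustration and not described in the lecture; the function names are hypothetical. It is compared with a strict left-to-right evaluation, which corresponds to a parse tree that violates precedence.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}
PRECEDENCE = {"+": 1, "-": 1, "*": 2, "/": 2}


def eval_with_precedence(nums, ops):
    """Evaluate nums[0] ops[0] nums[1] ... honouring precedence; operators of
    equal precedence are grouped from the left. Uses two stacks."""
    values, pending = [nums[0]], []

    def reduce_top():
        op = pending.pop()
        right, left = values.pop(), values.pop()
        values.append(OPS[op](left, right))

    for op, n in zip(ops, nums[1:]):
        # finish every pending operator that binds at least as tightly
        while pending and PRECEDENCE[pending[-1]] >= PRECEDENCE[op]:
            reduce_top()
        pending.append(op)
        values.append(n)
    while pending:
        reduce_top()
    return values[0]


def eval_ignoring_precedence(nums, ops):
    """Strict left-to-right evaluation, i.e. the tree that violates precedence."""
    acc = nums[0]
    for op, n in zip(ops, nums[1:]):
        acc = OPS[op](acc, n)
    return acc


nums, ops = [5, 8, 6, 4], ["+", "*", "/"]
print(eval_with_precedence(nums, ops))      # 17.0
print(eval_ignoring_precedence(nums, ops))  # 19.5, i.e. ((5+8)*6)/4
```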
Ambiguity in CFG
★ Precedence property violation and the solution
○ Example 1: Given grammar, G,
■ E → E + E, E → E x E, E → id, and the input string id + id x id
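One widely used unambiguous rewrite (assumed here; the slide's own solution may differ in detail) introduces a new level per precedence level: E → E + T | T and T → T x id | id. The sketch below parses with that grammar (the name `parse_expr` is hypothetical). Because x is handled one level lower, in T, it binds tighter than +, and both operators stay left associative because their rules are left recursive, realised here as loops.

```python
def parse_expr(tokens):
    """Parse with the unambiguous grammar
         E -> E + T | T
         T -> T x id | id
    returning the parse tree as nested tuples."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect_id():
        nonlocal pos
        if peek() != "id":
            raise SyntaxError(f"expected 'id' at position {pos}")
        pos += 1
        return "id"

    def parse_t():
        nonlocal pos
        tree = expect_id()                  # T -> id
        while peek() == "x":                # T -> T x id
            pos += 1
            tree = (tree, "x", expect_id())
        return tree

    tree = parse_t()                        # E -> T
    while peek() == "+":                    # E -> E + T
        pos += 1
        tree = (tree, "+", parse_t())
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse_expr("id + id x id".split()))
# ('id', '+', ('id', 'x', 'id'))
```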
Ambiguity in CFG
★ Example I: Given the following boolean expression
○ (a > 9) and (b<5) or ¬(a>b)
■ The precedence of the boolean operators from high to low
is: ¬, AND, OR
● AND and OR are both left associative
■ a) Try to determine the possible production rules of the
grammar
■ b) Make the grammar unambiguous
■ c) Construct the parse tree for the expression
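As an illustration of parts b) and c), here is one possible unambiguous grammar, an assumption rather than the slides' answer, with one non-terminal per precedence level: B → B or T | T, T → T and F | F, F → ¬ F | ( B ) | operand relop operand. The sketch below (the names `tokenize` and `parse_bool` are hypothetical) parses the example expression with it and returns the parse tree as nested tuples; AND ends up below OR, and ¬ applies only to its immediate operand.

```python
import re

# Tokens: ¬, parentheses, relational operators, identifiers/keywords, numbers
TOKEN = re.compile(r"\s*(¬|[()]|<=|>=|[<>]|[A-Za-z_]\w*|\d+)")
RELOPS = {"<", ">", "<=", ">="}


def tokenize(text):
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def is_operand(tok):
    return (tok.isdigit() or tok.isidentifier()) and tok not in ("and", "or")


def parse_bool(tokens):
    """Recursive descent for the assumed unambiguous grammar
         B -> B or T | T          (lowest precedence, left associative)
         T -> T and F | F         (left associative)
         F -> ¬ F | ( B ) | operand relop operand
    Left recursion is realised as loops; the result is a nested tuple."""
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take(expected=None):
        nonlocal pos
        tok = peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r} at position {pos}, got {tok!r}")
        pos += 1
        return tok

    def parse_b():
        tree = parse_t()
        while peek() == "or":
            take()
            tree = (tree, "or", parse_t())
        return tree

    def parse_t():
        tree = parse_f()
        while peek() == "and":
            take()
            tree = (tree, "and", parse_f())
        return tree

    def parse_f():
        if peek() == "¬":
            take()
            return ("¬", parse_f())
        if peek() == "(":
            take("(")
            inner = parse_b()
            take(")")
            return inner
        left, op, right = take(), take(), take()
        if not (is_operand(left) and op in RELOPS and is_operand(right)):
            raise SyntaxError(f"malformed relation: {left} {op} {right}")
        return (left, op, right)

    tree = parse_b()
    if pos != len(tokens):
        raise SyntaxError(f"unexpected token {peek()!r} at position {pos}")
    return tree


print(parse_bool(tokenize("(a > 9) and (b<5) or ¬(a>b)")))
# ((('a', '>', '9'), 'and', ('b', '<', '5')), 'or', ('¬', ('a', '>', 'b')))
```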
Ambiguity in CFG
★ Example I: Given the following boolean expression
○ a) Try to determine the possible production rules of the
grammar
Ambiguity in CFG
★ Example I: Given the following boolean expression
○ b) Make the grammar unambiguous
■ Then according to precedence and associative property,
● we can rewrite the production rules as:
Ambiguity in CFG
★ Example II: Given the language R={ab, cb, bb} and the following
expression over languages
○ R∪R*.R
■ The precedence of the operators from high to low is: star
closure, union operator, concatenation operator
● union and concatenation are both left associative
■ a) Try to determine the possible production rules of the
grammar
■ b) Make the grammar unambiguous
■ c) Construct the parse tree for the expression
Ambiguity in CFG
★ Example II: Language R={ab, cb, bb}, and expression R∪R*.R
○ a) Try to determine the possible production rules of the
grammar
■ Based on the precedence of the operators, you can reorder
the production rules
Ambiguity in CFG
★ Example II: Language R={ab, cb, bb}, and expression R∪R*.R
○ b) Make the grammar unambiguous
■ Then according to precedence and associative property,
● we can rewrite the production rules as:
Recursion in CFG
Left Recursion:
■ A grammar is said to be left recursive if it has a non-terminal
A such that there is a derivation A ⇒+ Aα for some string α
■ Top-down parsing methods cannot handle left-recursive
grammars
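The reason is easy to see in code. A naive top-down procedure for E → E + id | id (the helper name `parse_e_naive` is hypothetical) must expand the leading E first, which means calling itself at the same input position before consuming anything. This minimal sketch shows that it never makes progress; in Python the runaway recursion surfaces as a `RecursionError`.

```python
def parse_e_naive(tokens, pos=0):
    """Naive top-down procedure for the left-recursive rule E -> E + id | id.

    To expand E by E -> E + id it first calls itself for the leading E at
    the same position, so no token is ever consumed.
    """
    end = parse_e_naive(tokens, pos)          # leading E: same pos, again and again
    if end < len(tokens) - 1 and tokens[end] == "+" and tokens[end + 1] == "id":
        return end + 2                        # never reached
    return end


try:
    parse_e_naive(["id", "+", "id"])
except RecursionError:
    print("no progress: the recursion limit was exceeded")
```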
Recursion in CFG (Cont.)
Right Recursion:
Recursion in CFG (Cont.)
Left Recursion and Right Recursion
Recursion in CFG (Cont.)
Left Recursion and solution
■ Left recursion can be eliminated as follows:
● If there is a production A → Aα | β, it can be replaced with a
pair of productions:
● A → βA’
● A’ → αA’ | ε
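The rule generalizes to several left-recursive alternatives Aα₁ | … | Aαₘ and several other alternatives β₁ | … | βₙ. A minimal sketch of that transformation follows; the function name, the tuple encoding (with () for ε) and the primed naming of the new non-terminal are assumptions. It handles only immediate left recursion and assumes every αᵢ is non-empty (no production A → A).

```python
def eliminate_immediate_left_recursion(nt, alternatives):
    """A -> Aα1 | ... | Aαm | β1 | ... | βn  becomes
         A  -> β1 A' | ... | βn A'
         A' -> α1 A' | ... | αm A' | ε
    Alternatives are tuples of symbols; () stands for ε."""
    alphas = [alt[1:] for alt in alternatives if alt[:1] == (nt,)]
    betas = [alt for alt in alternatives if alt[:1] != (nt,)]
    if not alphas:
        return {nt: list(alternatives)}       # not left recursive: unchanged
    new_nt = nt + "'"                         # hypothetical naming convention
    return {
        nt: [beta + (new_nt,) for beta in betas],
        new_nt: [alpha + (new_nt,) for alpha in alphas] + [()],
    }


# E -> E + T | T   becomes   E -> T E',  E' -> + T E' | ε
print(eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)]))
```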
Recursion in CFG (Cont.)
Eliminating Left Recursion: Example (a)
Recursion in CFG (Cont.)
Eliminating Left Recursion: Example (b)
Recursion in CFG (Cont.)
Eliminating Left Recursion: Example (c)
Recursion in CFG (Cont.)
Eliminating Left Recursion: Example (d)
Non-deterministic CFGs
Non-determinism can be eliminated using the left factoring
procedure
■ If there is a sequence of productions as follows:
● A → αβ₁
● A → αβ₂
● A → αβ₃
■ This can be replaced with a sequence of two productions
● A → αA’
● A’ → β₁ | β₂ | β₃
● This is called left factoring
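One round of this procedure can be sketched in Python: alternatives are grouped by their first symbol, and each group with two or more members is replaced by its longest common prefix α followed by a new primed non-terminal. The function names, the tuple encoding (with () for ε) and the naming scheme for new non-terminals are assumptions; the remainders may still share prefixes, in which case the procedure is repeated.

```python
from collections import defaultdict


def common_prefix(seqs):
    """Longest common prefix of a list of symbol tuples."""
    prefix = []
    for column in zip(*seqs):
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    return tuple(prefix)


def left_factor(nt, alternatives):
    """One round of left factoring for non-terminal `nt`.

    alternatives: list of tuples of grammar symbols; () stands for ε.
    Returns {non_terminal: alternatives} for nt and any new primed non-terminals.
    """
    groups = defaultdict(list)
    for alt in alternatives:
        groups[alt[:1]].append(alt)          # group by first symbol (() for ε)
    result = {nt: []}
    new_name = nt
    for first, group in groups.items():
        if len(group) < 2 or not first:
            result[nt].extend(group)         # nothing to factor in this group
            continue
        alpha = common_prefix(group)
        new_name += "'"                      # hypothetical naming: A', A'', ...
        result[nt].append(alpha + (new_name,))
        result[new_name] = [alt[len(alpha):] for alt in group]
    return result


# The classic dangling-else grammar S -> iEtS | iEtSeS | a
print(left_factor("S", [("i", "E", "t", "S"), ("i", "E", "t", "S", "e", "S"), ("a",)]))
```

For the dangling-else grammar this yields S → iEtSS’ | a and S’ → ε | eS; left factoring removes the choice the parser could not make, but it does not by itself make that grammar unambiguous.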
Non-deterministic CFGs (Cont.)
Eliminating non-determinism: Example (a)
■ Let's rearrange it
Non-deterministic CFGs (Cont.)
Eliminating non-determinism: Example (b)
Non-deterministic CFGs (Cont.)
Determinism Vs Ambiguity
■ Let's examine the grammar above on the given input string
Non-deterministic CFGs (Cont.)
Eliminating non-determinism: Example (c)
Non-deterministic CFGs (Cont.)
Eliminating non-determinism: Example (d)
Non-deterministic CFGs (Cont.)
Eliminating non-determinism: Example (e)
Productivity and Reachability of non-terminals
Productivity
■ A non-terminal symbol in a CFG is said to be productive if
there exists a derivation (a sequence of production rule
applications) from it that generates at least one string consisting
only of terminals
■ In other words, a non-terminal is productive if it can eventually
lead to the generation of valid sentences or program constructs
in the language defined by the grammar
■ If a non-terminal cannot derive any string of terminals through
any sequence of production rules, it is considered unproductive;
it can be safely removed from the grammar, together with every
production that uses it, without affecting the language generated
by the CFG
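The definition suggests a simple fixed-point computation: start with no productive non-terminals, and repeatedly mark a non-terminal as productive once one of its alternatives consists only of terminals and already-productive non-terminals. A minimal sketch follows; the function name and the dictionary encoding (keys are non-terminals, every other symbol is a terminal, () is ε) are assumptions.

```python
def productive_nonterminals(grammar):
    """Return the set of productive non-terminals of `grammar`.

    grammar: dict mapping each non-terminal to a list of alternatives,
    each a tuple of symbols; any symbol that is not a key is a terminal,
    and () stands for ε.
    """
    productive = set()
    changed = True
    while changed:                            # iterate until nothing new is found
        changed = False
        for nt, alternatives in grammar.items():
            if nt in productive:
                continue
            # productive if some alternative uses only terminals and
            # non-terminals already known to be productive
            if any(all(sym not in grammar or sym in productive for sym in alt)
                   for alt in alternatives):
                productive.add(nt)
                changed = True
    return productive


# S -> aX | b, X -> cX : X never reaches a terminal string
print(productive_nonterminals({"S": [("a", "X"), ("b",)], "X": [("c", "X")]}))  # {'S'}
```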
Productivity and Reachability of non-terminals (Cont.)
Productivity
■ Example (a):
S → A | B
A → a
B → C
C → d
■ In this grammar, A and C are directly productive because they can
be replaced by the terminal symbols a and d, respectively. B is
productive because it can be replaced by C, which in turn derives d.
S is productive because it can be replaced by A (or by B), both of
which are productive. Therefore, all the non-terminals (S, A, B, C)
in this grammar are productive
Productivity and Reachability of non-terminals (Cont.)
Productivity
■ Example (b):
Productivity and Reachability of non-terminals (Cont.)
Productivity
■ Example (c):
S → A
A → aA | ε
B → bB | ε