COSC3054 Lec 03 I Grammars

Uploaded by

daniel
Copyright
© © All Rights Reserved
Principles of Compiler Design
COSC3054
Lecture 03
Syntactic Analysis - Grammars
Syntax Analysis (Parsing)
★ An overview of parsing
○ Functions & Responsibilities
★ Context Free Grammars
○ Concepts & Terminology
★ Writing and Designing Grammars
★ Resolving Grammar Problems / Difficulties
★ Top-Down Parsing
○ Recursive Descent & Predictive LL
★ Bottom-Up Parsing
○ LR(0), SLR, CLR & LALR
★ Concluding Remarks/Looking Ahead
The Role of the Parser
★ The Parser or syntactic analyzer obtains a string of tokens from
the lexical analyzer
○ and verifies that the string can be generated by the grammar
for the source language
★ It reports any syntax errors in the program
○ It also recovers from commonly occurring errors
■ so that it can continue processing its input

★ Parser builds the parse tree
★ Parser verifies the structure generated by the tokens based on the
grammar
○ performs context free syntax analysis
★ Parser helps to construct intermediate code
★ Parser produces appropriate error messages
★ Parser performs error recovery
★ Limitation: the parser cannot detect semantic errors such as:
○ Variable re-declaration
○ Use of a variable before initialization
○ Data type mismatch for an operation
The Role of the Parser
★ The parser identifies the syntactic structure in the sequence of symbols,
○ that is, how the syntactic units are composed from other units
★ Syntactic units in imperative languages are, for example,
variables, expressions, declarations, statements, and sequences of
statements
★ One possible representation of the syntactic structure of the input
program is the syntax tree or parse tree
★ The syntactic structure of the programs written in some
programming language can be described by a context-free
grammar
Syntax Analysis Foundations
★ Lexical analysis
○ specified by regular expressions and
○ implemented by finite automata
★ Syntax analysis
○ specified by context-free grammars (CFG) and
○ implemented by pushdown automata (PDA)
★ Regular expressions alone are not sufficient to describe the
syntax of programming languages
○ they cannot express embedded recursion as occurs in the
nesting of expressions, statements, and blocks
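To illustrate why nesting needs more than a finite automaton, here is a minimal sketch. The function name `is_balanced` is an assumption made for illustration and does not come from the lecture. The check relies on a counter that can grow without bound, which is the kind of memory a finite automaton lacks and a pushdown automaton's stack provides.

```python
def is_balanced(text: str) -> bool:
    """Return True if every '(' in text is closed by a matching ')'."""
    depth = 0  # plays the role of the PDA stack height
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a ')' with no open '(' to match
                return False
    return depth == 0


print(is_balanced("((a+b)*(c))"))  # True
print(is_balanced("((a+b)"))       # False
```

Because the nesting depth is unbounded, no fixed number of automaton states can stand in for `depth`.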
RE vs. CFG
★ Regular Expressions
○ Basis of lexical analysis
○ Represent regular languages
★ Context Free Grammars
○ Basis of parsing
○ Represent language constructs
○ Characterize context free languages

Context-Free Grammars
★ The non-terminal symbol <stat> generates statements
★ The meta-character | is used to combine several alternatives
★ According to these productions,
○ a statement is either an if-statement, a while-statement, a
do-while-statement, an expression followed by a semicolon,
an empty statement, or a sequence of statements in parentheses
★ The non-terminal symbol <if_stat>
○ generates if-statements in which the else-part may be present
or absent

Context-Free Grammars
★ Example of context-free grammar:
○ The following grammar defines simple arithmetic expressions:

Context-Free Grammars
★ Inherently recursive structures of a programming language are
defined by a context-free grammar
★ In a context-free grammar, we have:
○ A finite set of terminals, T
■ this will be the set of tokens
○ A finite set of non-terminals (syntactic-variables), N
○ A finite set of production rules of the form
■ A → α, where A is a non-terminal and α is a string of terminals
and non-terminals (including the empty string)
○ A start symbol (one of the non-terminal symbols)
Context-Free Grammars
★ A Context Free Grammar (CFG)
○ is a 4-tuple (T, NT, S, PR), where:
■ T: Terminals / tokens of the language
■ NT: Non-terminals
■ S: Start symbol, S ∈ NT
■ PR: Production rules that indicate how T and NT are
combined to generate valid strings of the language
● PR: NT → (T | NT)*
● E.g.: E → E + E
E → num
★ Like a Regular Expression / DFA / NFA,
○ a Context Free Grammar is a mathematical model
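The 4-tuple can be written down directly as a data structure. The sketch below is one possible encoding, not the lecture's own: the class name `Grammar`, its field names, and the `is_well_formed` check are all assumptions made for illustration. It encodes the example E → E + E, E → num.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grammar:
    terminals: frozenset      # T
    nonterminals: frozenset   # NT
    start: str                # S
    productions: tuple        # PR: pairs (head, body), body is a tuple of symbols

    def is_well_formed(self) -> bool:
        """Check S is in NT, T and NT are disjoint, and PR maps NT -> (T | NT)*."""
        symbols = self.terminals | self.nonterminals
        return (
            self.start in self.nonterminals
            and not (self.terminals & self.nonterminals)
            and all(
                head in self.nonterminals and all(s in symbols for s in body)
                for head, body in self.productions
            )
        )


expr = Grammar(
    terminals=frozenset({"+", "num"}),
    nonterminals=frozenset({"E"}),
    start="E",
    productions=(("E", ("E", "+", "E")), ("E", ("num",))),
)
print(expr.is_well_formed())  # True
```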
Context-Free Grammars
★ A language, L(G), that can be generated by a context-free
grammar is said to be a context-free language
★ Two grammars are equivalent if they produce the same language
★ If S is the start symbol of G and S ⇒* 𝛼, then:
○ If 𝛼 contains non-terminals,
■ it is called a sentential form of G
○ If 𝛼 does not contain non-terminals,
■ it is called a sentence of G
Context-Free Grammars
★ There are two types of derivation:
○ Left-most derivation
○ Right-most derivation
★ In leftmost derivations,
○ the leftmost non-terminal in each sentential form is always
chosen first for replacement
★ In rightmost derivations,
○ the rightmost non-terminal in each sentential form is always
chosen first for replacement
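The two derivation orders can be traced mechanically. The following is a small sketch under assumptions: the grammar E → E+E | E*E | id is borrowed from the ambiguity example used elsewhere in these slides, and the helper `derive` with its `choices` list of production indices is a hypothetical name introduced for illustration.

```python
GRAMMAR = {"E": [["E", "+", "E"], ["E", "*", "E"], ["id"]]}


def derive(start, choices, leftmost=True):
    """Apply the production indices in `choices`, always expanding the
    leftmost (or rightmost) non-terminal; return every sentential form."""
    form = [start]
    steps = [" ".join(form)]
    for choice in choices:
        positions = [i for i, sym in enumerate(form) if sym in GRAMMAR]
        pos = positions[0] if leftmost else positions[-1]
        form = form[:pos] + GRAMMAR[form[pos]][choice] + form[pos + 1:]
        steps.append(" ".join(form))
    return steps


left = derive("E", [0, 2, 1, 2, 2], leftmost=True)
right = derive("E", [0, 1, 2, 2, 2], leftmost=False)
print(" => ".join(left))
print(" => ".join(right))
```

Both runs end in the sentence `id + id * id`; they differ only in which non-terminal is replaced at each step.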

Ambiguity in CFG
★ A grammar that produces more than one parse tree for some sentence
is said to be an ambiguous grammar
○ Example: Given grammar, G
■ E → E+E | E*E | id, which can also be written as:
● E → E+E
● E → E*E
● E → id
○ The sentence id+id*id
■ has two distinct leftmost derivations and two distinct rightmost
derivations:
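The claim can be checked by counting parse trees. This is a sketch specialised to the grammar E → E+E | E*E | id; the function name `count_parse_trees` is an assumption made for illustration. Each parse tree corresponds to exactly one leftmost derivation, so a count above 1 means the grammar is ambiguous for that sentence.

```python
from functools import lru_cache


def count_parse_trees(tokens):
    """Count parse trees of `tokens` under E -> E+E | E*E | id."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways E derives tokens[i:j].
        if j - i == 1:
            return 1 if tokens[i] == "id" else 0
        total = 0
        # Try every operator position k as the root of the tree.
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                total += count(i, k) * count(k + 1, j)
        return total

    return count(0, len(tokens))


print(count_parse_trees(["id", "+", "id", "*", "id"]))  # 2
```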
Ambiguity in CFG
★ Example 1: Given grammar, G,
○ S → aSbS | bSaS | 𝜆, where 𝜆 denotes the empty string. Is G
ambiguous for the input string abab?

Ambiguity in CFG
★ Example 2: Given grammar, G,
○ S → aAB, A → bBb, B → A | 𝜆, where 𝜆 denotes the empty string.
Is G ambiguous for the input string abbbb?
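Both questions (Example 1 and Example 2) can be answered by counting parse trees. The sketch below is a generic counter under stated assumptions: grammars are dictionaries from a non-terminal to a list of bodies, each body is a tuple of symbols, 𝜆 is written as the empty tuple, and the grammar has no left recursion (true of both examples). The name `count_parses` is hypothetical.

```python
from functools import lru_cache


def count_parses(grammar, start, text):
    """Count parse trees of `text`; assumes the grammar is not left recursive."""

    @lru_cache(maxsize=None)
    def derive(symbols, i, j):
        # Number of ways the symbol sequence derives text[i:j].
        if not symbols:
            return 1 if i == j else 0
        head, rest = symbols[0], symbols[1:]
        if head not in grammar:  # terminal: must match the next character
            if i < j and text[i] == head:
                return derive(rest, i + 1, j)
            return 0
        total = 0
        for k in range(i, j + 1):  # head covers text[i:k], rest covers text[k:j]
            ways = sum(derive(body, i, k) for body in grammar[head])
            if ways:
                total += ways * derive(rest, k, j)
        return total

    return derive((start,), 0, len(text))


example1 = {"S": [("a", "S", "b", "S"), ("b", "S", "a", "S"), ()]}
example2 = {"S": [("a", "A", "B")], "A": [("b", "B", "b")], "B": [("A",), ()]}
print(count_parses(example1, "S", "abab"))   # 2
print(count_parses(example2, "S", "abbbb"))  # 2
```

A count of 2 in each case means each string has two distinct parse trees, so both grammars are ambiguous for the given input.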

Ambiguity in CFG
★ Two basic problems due to ambiguity
○ Associativity property violation
■ When two operators have the same precedence, their
associativity decides how they are grouped.
■ E.g., plus and minus have the same precedence, so
associativity decides the order in which they are applied
○ Precedence property violation
■ E.g., plus and multiplication have different precedence.
If a parse tree violates this precedence, we
call it a precedence property violation
Ambiguity in CFG
★ Associativity property violation
○ Most arithmetic operators (+, -, *, /) and the binary boolean
operators (⋀, ⋁) are left associative
■ when operators with the same precedence appear together,
● the operator on the left is applied first; this is called the
left associative property
● E.g., in 5+8-6+9 we first add 5 and 8,
giving 13-6+9. Then we subtract 6 from
13, giving 7+9. Finally we add 7 and 9, giving 16
■ If a parse tree violates this property, we call it an
associativity property violation
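The effect of the grouping can be seen by evaluating 5+8-6+9 both ways. This is a minimal sketch; the function names and the flat token-list representation are assumptions made for illustration.

```python
OPS = {"+": lambda a, b: a + b, "-": lambda a, b: a - b}


def eval_left_assoc(tokens):
    """Group as ((5 + 8) - 6) + 9: fold from the left."""
    result = tokens[0]
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, operand)
    return result


def eval_right_assoc(tokens):
    """Group as 5 + (8 - (6 + 9)): fold from the right."""
    if len(tokens) == 1:
        return tokens[0]
    return OPS[tokens[1]](tokens[0], eval_right_assoc(tokens[2:]))


expr = [5, "+", 8, "-", 6, "+", 9]
print(eval_left_assoc(expr))   # 16
print(eval_right_assoc(expr))  # -2
```

A parse tree that groups to the right yields -2 rather than 16, which is why the grammar has to enforce left associativity.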
Ambiguity in CFG
★ Associativity property violation and the solution
○ Example 1: Given grammar, G,
■ E → E + E, E → id, and the input string id+id+id

○ A production whose right-hand side begins with its own
left-hand-side non-terminal is called a left recursive production
■ To satisfy the left associative property,
● we use a left recursive production
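As a sketch of how a left recursive production forces left-to-right grouping, assume the conventional rewrite E → E + id | id (the rewritten grammar on the original slide was an image and is not recoverable, so this grammar is an assumption). The hypothetical function `parse_left_assoc` below builds the only tree this grammar allows, using a loop as the iterative equivalent of the left recursive production.

```python
def parse_left_assoc(tokens):
    """Parse tokens under E -> E + id | id and return a nested-tuple tree."""
    if not tokens or tokens[0] != "id":
        raise SyntaxError("expected id")
    tree = "id"
    pos = 1
    while pos < len(tokens):
        if tokens[pos] != "+" or pos + 1 >= len(tokens) or tokens[pos + 1] != "id":
            raise SyntaxError(f"unexpected input at position {pos}")
        # The tree built so far becomes the LEFT operand of the next '+'.
        tree = (tree, "+", "id")
        pos += 2
    return tree


print(parse_left_assoc(["id", "+", "id", "+", "id"]))
# (('id', '+', 'id'), '+', 'id')
```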
Ambiguity in CFG
★ Precedence property violation
○ BODMAS tells us the precedence of arithmetic operators
○ Among boolean operators, ¬ has higher precedence than ⋀ and ⋁
○ If a parse tree violates this precedence, we call it a
precedence property violation.
● E.g., in 5+8*6/4 we first multiply 8 and
6, giving 5+48/4. Then we divide 48 by 4, giving
5+12. Finally we add 5 and 12, giving 17
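The arithmetic above can be reproduced with a short sketch that evaluates 5+8*6/4 once with precedence and once strictly left to right. The function names and the two-pass approach are assumptions made for illustration, not a method taken from the lecture.

```python
import operator

OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_ignoring_precedence(tokens):
    """Apply every operator strictly left to right."""
    result = tokens[0]
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        result = OPS[op](result, operand)
    return result


def eval_with_precedence(tokens):
    """First pass applies * and /, second pass applies + and -."""
    reduced = [tokens[0]]
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        if op in ("*", "/"):
            reduced[-1] = OPS[op](reduced[-1], operand)
        else:
            reduced += [op, operand]
    return eval_ignoring_precedence(reduced)


expr = [5, "+", 8, "*", 6, "/", 4]
print(eval_with_precedence(expr))      # 17.0
print(eval_ignoring_precedence(expr))  # 19.5
```

The second result corresponds to a parse tree that violates precedence by grouping the expression as ((5+8)*6)/4.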

Ambiguity in CFG
★ Precedence property violation and the solution
○ Example 1: Given grammar, G,
■ E → E + E | E x E | id, and the input string id+idxid

○ In the correct syntax tree,
■ operators with higher precedence appear lower in the tree,
● closer to the leaf nodes
Ambiguity in CFG
★ Precedence property violation and the solution
○ We can remove precedence violations by defining levels: one
non-terminal per precedence level
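One conventional way to define such levels is the stratified grammar E → E + T | T, T → T * F | F, F → id. The grammar on the original slide was an image and is not recoverable, so this grammar and the function names below are assumptions. The sketch writes `*` for the multiplication operator and uses loops in place of the left recursive productions.

```python
def parse_expr(tokens):
    """E -> E + T | T  (lowest precedence level)."""
    tree, rest = parse_term(tokens)
    while rest and rest[0] == "+":
        right, rest = parse_term(rest[1:])
        tree = (tree, "+", right)
    return tree, rest


def parse_term(tokens):
    """T -> T * F | F  (higher precedence level)."""
    tree, rest = parse_factor(tokens)
    while rest and rest[0] == "*":
        right, rest = parse_factor(rest[1:])
        tree = (tree, "*", right)
    return tree, rest


def parse_factor(tokens):
    """F -> id"""
    if not tokens or tokens[0] != "id":
        raise SyntaxError("expected id")
    return "id", tokens[1:]


tree, rest = parse_expr(["id", "+", "id", "*", "id"])
print(tree)  # ('id', '+', ('id', '*', 'id'))
```

The multiplication ends up nested inside the addition, so it sits lower in the tree and is evaluated first, whatever order the operators appear in.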

Ambiguity in CFG
★ Example I: Given the following boolean expression
○ (a > 9) and (b<5) or ¬(a>b)
■ The precedence of the boolean operators from high to low
is: ¬, AND, OR
● AND and OR are both left associative
■ a) Try to determine the possible production rules of the
grammar
■ b) Make the grammar unambiguous
■ c) Construct the parse tree for the expression
Ambiguity in CFG
★ Example I: Given the following boolean expression
○ a) Try to determine the possible production rules of the
grammar
■ Based on the precedence of the operators, you can reorder
the production rules
Ambiguity in CFG
★ Example I: Given the following boolean expression

○ b) Make the grammar unambiguous
■ Then according to precedence and associative property,
● we can rewrite the production rules as:

Ambiguity in CFG
★ Example I: Given the following boolean expression

○ c) construct the parse tree for (a > 9) and (b<5) or ¬(a>b)

Ambiguity in CFG
★ Example II: Given the language R={ab, cb, bb} and the following
regular expression
○ R∪R*.R
■ The precedence of the operators from high to low is: star
closure, union operator, concatenation operator
● the union and concatenation operators are left associative
■ a) Try to determine the possible production rules of the
grammar
■ b) Make the grammar unambiguous
■ c) Construct the parse tree for the expression
Ambiguity in CFG
★ Example II: Language R={ab, cb, bb}, and expression R∪R*.R
○ a) Try to determine the possible production rules of the
grammar
■ Based on the precedence of the operators, you can reorder
the production rules

Ambiguity in CFG
★ Example II: Language R={ab, cb, bb}, and expression R∪R*.R

○ b) Make the grammar unambiguous
■ Then according to precedence and associative property,
● we can rewrite the production rules as:

Recursion in CFG
Left Recursion:
■ A grammar is said to be left recursive if it has a non-terminal
A such that there is a derivation A ⇒+ Aα (one or more steps) for some string α
■ Top-down parsing methods cannot handle left-recursive
grammars

Recursion in CFG (Cont.)
Left Recursion and solution
■ Left recursion can be eliminated as follows:
● If there is a production A → Aα | β, it can be replaced with
two productions
● A → βA’
● A’ → αA’ | ε
● without changing the set of strings derivable from A
● the resulting grammar is right recursive
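The rewrite rule can be sketched as a small function. The name `eliminate_immediate_left_recursion`, the tuple-based encoding of bodies, and the empty tuple standing for ε are assumptions made for illustration. The sketch handles only immediate left recursion (A → Aα) and assumes every α is non-empty.

```python
def eliminate_immediate_left_recursion(head, bodies):
    """Rewrite A -> A a1 | ... | b1 | ...  as  A -> b1 A' | ...,  A' -> a1 A' | ... | ε.

    `bodies` is a list of tuples of symbols; the empty tuple stands for ε.
    """
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    if not alphas:  # nothing to do: no left recursive alternative
        return {head: list(bodies)}
    new_head = head + "'"
    return {
        head: [beta + (new_head,) for beta in betas],
        new_head: [alpha + (new_head,) for alpha in alphas] + [()],
    }


result = eliminate_immediate_left_recursion("E", [("E", "+", "T"), ("T",)])
print(result)
# {'E': [('T', "E'")], "E'": [('+', 'T', "E'"), ()]}
```

For E → E + T | T this gives E → T E’ and E’ → + T E’ | ε, matching the pattern with α = + T and β = T.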

Non-deterministic CFGs

Non-determinism can be eliminated using the left factoring
procedure
■ If there is a sequence of productions as follows:
● A → αβ₁
● A → αβ₂
● A → αβ₃
■ This can be replaced with two productions
● A → αA’
● A’ → β₁ | β₂ | β₃
● This is called the left factoring procedure
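A single left factoring step can be sketched as follows. The function name `left_factor` and the tuple encoding are assumptions made for illustration. The sketch only factors a prefix shared by all alternatives of A, whereas a full procedure would also group alternatives that share a prefix among only some of them and would repeat until no common prefix remains.

```python
def left_factor(head, bodies):
    """Factor out the longest prefix common to ALL bodies of `head` (one step)."""
    if len(bodies) < 2:
        return {head: list(bodies)}
    prefix = []
    for column in zip(*bodies):  # walk the bodies symbol by symbol
        if len(set(column)) != 1:
            break
        prefix.append(column[0])
    if not prefix:  # no common prefix: grammar unchanged
        return {head: list(bodies)}
    new_head = head + "'"
    return {
        head: [tuple(prefix) + (new_head,)],
        new_head: [body[len(prefix):] for body in bodies],
    }


factored = left_factor("A", [("alpha", "beta1"), ("alpha", "beta2"), ("alpha", "beta3")])
print(factored)
# {'A': [('alpha', "A'")], "A'": [('beta1',), ('beta2',), ('beta3',)]}
```

After factoring, a predictive parser can consume α first and postpone the choice between β₁, β₂ and β₃ until it reaches A’.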
Productivity and Reachability of non-terminals
Productivity
■ A non-terminal symbol in a CFG is said to be productive if
there exists a derivation (a sequence of production rule
applications) from it that generates at least one string of terminals
■ In other words, a non-terminal is productive if it can eventually
lead to the generation of valid sentences or program constructs
in the language defined by the grammar
■ If a non-terminal cannot derive any string of terminals through
any sequence of production rules, it is considered
unproductive and can be safely removed from the grammar
without affecting the language generated by the CFG
Productivity and Reachability of non-terminals (Cont.)
Productivity
■ Example (a):
S → A | B
A → a
B → C
C → d
■ In this grammar, the non-terminal A is directly productive
because it can be replaced by the terminal symbol a, and C is
directly productive because it can be replaced by the terminal
symbol d. The non-terminal B is productive because it can be
replaced by the productive non-terminal C. The non-terminal S is
productive because it can be replaced by either A or B, both of
which are productive. Therefore, all the
non-terminals (S, A, B, C) in this grammar are productive
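Productivity can be computed as a fixed point: mark a non-terminal productive as soon as one of its bodies consists only of terminals and already-productive non-terminals, and repeat until nothing changes. The sketch below assumes a dictionary encoding of the grammar, and the function name is hypothetical. The second grammar, `looping`, is an invented illustration and not one of the lecture's examples.

```python
def productive_nonterminals(grammar):
    """grammar: dict non-terminal -> list of bodies (tuples of symbols).
    Symbols that are not keys of the dict are treated as terminals."""
    productive = set()
    changed = True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head in productive:
                continue
            for body in bodies:
                if all(sym not in grammar or sym in productive for sym in body):
                    productive.add(head)
                    changed = True
                    break
    return productive


example_a = {"S": [("A",), ("B",)], "A": [("a",)], "B": [("C",)], "C": [("d",)]}
print(sorted(productive_nonterminals(example_a)))  # ['A', 'B', 'C', 'S']

# Hypothetical grammar: Z's only production mentions Z itself.
looping = {"S": [("Z",), ("a",)], "Z": [("Z", "b")]}
print(sorted(productive_nonterminals(looping)))  # ['S']
```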
Productivity and Reachability of non-terminals (Cont.)
Productivity
■ Example (b):
■ Y is productive and therefore also X, S and S’. The non-terminal Z,
on the other hand, is not productive since the only production for Z
contains an occurrence of Z on its right side
Productivity and Reachability of non-terminals (Cont.)
Productivity
■ Example (c):
S → A
A → aA | ε
B → bB | ε
■ Non-terminal A is productive because it can derive strings of
terminals: starting with A, we can derive ε, 'a', 'aa', 'aaa',
and so on. Non-terminal B is also productive because it can
derive ε, 'b', 'bb', and so on. Non-terminal S is productive
because it can be replaced by the productive non-terminal A
Productivity and Reachability of non-terminals (Cont.)
Reachability
■ Reachability refers to whether it is possible to reach a
non-terminal symbol starting from the start symbol of the CFG
■ A non-terminal is reachable if there exists at least one
derivation sequence starting from the start symbol that
eventually reaches that non-terminal.
■ If a non-terminal is unreachable, it means there is no way to
generate any string containing that non-terminal from the start
symbol, and thus it is not contributing to the language described
by the CFG. Unreachable non-terminals can be safely removed
from the grammar without affecting the language it generates
Productivity and Reachability of non-terminals (Cont.)
Reachability
■ Example (a):
S → A
A → BCD
B → b
C → c
D → E
E → f
■ In this grammar, the non-terminal S is reachable because it is the
start symbol. The non-terminal A is reachable because it appears on
the right-hand side of the production rule S → A. The non-terminals
B, C, and D are reachable because they appear on the right-hand
side of the production rule A → BCD. Finally, the non-terminal E is
reachable because it appears on the right-hand side of the production
rule D → E
Productivity and Reachability of non-terminals (Cont.)
Reachability
■ Example (b):
S → A
A → aA | ε
B → bB | ε
■ Non-terminal S is the start symbol, so it is reachable by
definition. Non-terminal A is reachable because it appears on the
right-hand side of S → A. However, non-terminal B does not appear
in any production reachable from S. Therefore, it is unreachable.
■ Since B is not reachable from the start
symbol S, it doesn't matter whether B is productive or not. It
simply does not contribute to the language described by the CFG
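Reachability is a graph search from the start symbol. The sketch below is one possible implementation under the assumption that the grammar is encoded as a dictionary of tuple bodies, with ε as the empty tuple. The function name `reachable_nonterminals` is hypothetical.

```python
from collections import deque


def reachable_nonterminals(grammar, start):
    """Breadth-first search over the right-hand sides, starting at `start`."""
    reachable = {start}
    queue = deque([start])
    while queue:
        head = queue.popleft()
        for body in grammar.get(head, []):
            for sym in body:
                if sym in grammar and sym not in reachable:
                    reachable.add(sym)
                    queue.append(sym)
    return reachable


example_a = {"S": [("A",)], "A": [("B", "C", "D")], "B": [("b",)],
             "C": [("c",)], "D": [("E",)], "E": [("f",)]}
example_b = {"S": [("A",)], "A": [("a", "A"), ()], "B": [("b", "B"), ()]}
print(sorted(reachable_nonterminals(example_a, "S")))  # ['A', 'B', 'C', 'D', 'E', 'S']
print(sorted(reachable_nonterminals(example_b, "S")))  # ['A', 'S']
```

In Example (b), B is missing from the result, so its productions can be deleted without changing the language.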
Productivity and Reachability of non-terminals (Cont.)
Ensuring that all non-terminals in a CFG are both productive and
reachable is important for simplifying and optimizing the
grammar
Removing unproductive or unreachable non-terminals can lead
to a more concise and efficient CFG without changing the
language it generates
This process is often part of grammar analysis and optimization
during compiler construction or natural language processing tasks
