CH 4 - Context-Free Languages and Grammars
Outline
CFGs and CFLs
Simplification of CFGs
Context-Free Grammar: G = (V, T, P, S)
V: set of variables
T: set of terminal symbols
P: set of productions
S: start variable
Example: G = (V, T, P, S) with
V = {S} (variables)
T = {a, b} (terminals)
S (start variable)
Language of a Grammar:
L(G) = {w : S ⇒* w}, where w is a string of terminals or λ
Context-Free Language:
A language L is context-free
if there is a context-free grammar G
with L(G) = L
Example:
L = {a^n b^n : n ≥ 0}
is a context-free language,
since the context-free grammar G:
S → aSb | λ
generates L(G) = L
Another Example
Context-free grammar :
S → aSa | bSb | λ
Example derivations:
S ⇒ aSa ⇒ abSba ⇒ abba
S ⇒ aSa ⇒ abSba ⇒ abaSaba ⇒ abaaba
L(G) = {ww^R : w ∈ {a, b}*}
Palindromes of even length
Another Example
Context-free grammar :
S → aSb | SS | λ
Example derivations:
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ ab
S ⇒ SS ⇒ aSbS ⇒ abS ⇒ abaSb ⇒ abab
Another Example
Construct a grammar that generates the
language:
L = {a^n b^n c^i | n ≥ 0, i ≥ 0}
Answer:
S → S1 S2
S1 → a S1 b | λ
S2 → c S2 | λ
Check your answer by deriving
abc, aabbc, aaabbbcccc
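The check suggested above can also be done mechanically. The Python sketch below searches leftmost derivations breadth-first. The function name derives, the dictionary encoding of the productions, and the max_len cutoff on sentential forms are assumptions made for this illustration; the cutoff is a heuristic that is enough for this grammar, not a general bound.

```python
from collections import deque

# Productions of the answer grammar; () stands for λ.
RULES = {
    "S": [("S1", "S2")],
    "S1": [("a", "S1", "b"), ()],
    "S2": [("c", "S2"), ()],
}

def derives(rules, start, target, max_len=None):
    """Return True if a leftmost derivation of target from start is found."""
    if max_len is None:
        max_len = 2 * len(target) + 2   # assumed cutoff, keeps the search finite
    seen = {(start,)}
    queue = deque([(start,)])
    while queue:
        form = queue.popleft()
        # Position of the leftmost variable, or None if the form is all terminals.
        idx = next((i for i, s in enumerate(form) if s in rules), None)
        if idx is None:
            if "".join(form) == target:
                return True
            continue
        # The terminals before the leftmost variable can no longer change.
        if not target.startswith("".join(form[:idx])):
            continue
        for rhs in rules[form[idx]]:
            new = form[:idx] + tuple(rhs) + form[idx + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return False

results = {w: derives(RULES, "S", w) for w in ["abc", "aabbc", "aaabbbcccc", "aabc"]}
print(results)
```

The first three strings are the ones listed on the slide; aabc is added only as a string that should be rejected.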
Another Example
Write the productions of CFG that accept the
language:
L = {ab (bbaa)^n bba | n ≥ 0}
Derivation Order and Derivation Trees
Derivation Order
Consider the following example grammar
with 5 productions:
1. S → AB
2. A → aaA
3. A → λ
4. B → Bb
5. B → λ
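Leftmost derivation (production numbers in parentheses):
S ⇒(1) AB ⇒(2) aaAB ⇒(3) aaB ⇒(4) aaBb ⇒(5) aab
At each step, we substitute the leftmost variable
Rightmost derivation (production numbers in parentheses):
S ⇒(1) AB ⇒(4) ABb ⇒(5) Ab ⇒(2) aaAb ⇒(3) aab
At each step, we substitute the rightmost variable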
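The two derivation orders can be replayed step by step. The sketch below is a minimal illustration: the names RULES, VARIABLES and derive are hypothetical, variables are assumed to be single upper-case characters, and the empty string stands for λ.

```python
# The five numbered productions of the example grammar.
RULES = {1: ("S", "AB"), 2: ("A", "aaA"), 3: ("A", ""), 4: ("B", "Bb"), 5: ("B", "")}
VARIABLES = {"S", "A", "B"}

def derive(order, leftmost=True):
    """Apply the productions in the given order to the leftmost (or rightmost) variable."""
    form = "S"
    steps = [form]
    for number in order:
        lhs, rhs = RULES[number]
        positions = [i for i, c in enumerate(form) if c in VARIABLES]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"production {number} does not apply to {form}")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps

left_steps = derive([1, 2, 3, 4, 5], leftmost=True)
right_steps = derive([1, 4, 5, 2, 3], leftmost=False)
print(" => ".join(left_steps))
print(" => ".join(right_steps))
```

Both calls end in the same terminal string even though the intermediate sentential forms differ.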
S → AB    A → aaA | λ    B → Bb | λ
S ⇒ AB ⇒ aaAB
[Figure: derivation tree built step by step; after these two steps its yield is aaAB]
S → aAS | a    A → SbA | SS | ba
Consider the CFG with the above production rules:
Find the Left most & Right most derivations and
construct a parse tree (Generation tree) for
generating the strings:
i) aabaaa
ii) abaabaa
iii) aabaabbaaaa
Sometimes, derivation order doesn’t matter
Leftmost derivation:
S ⇒ AB ⇒ aaAB ⇒ aaB ⇒ aaBb ⇒ aab
Rightmost derivation:
S ⇒ AB ⇒ ABb ⇒ Ab ⇒ aaAb ⇒ aab
Both give the same derivation tree
Ambiguity
Grammar for mathematical expressions
E → E + E | E * E | (E) | a
Example strings:
(a + a) * a + (a + a * (a + a))
A leftmost derivation for a + a * a:
E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a
Another leftmost derivation for a + a * a:
E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a
Two different derivation trees
may cause problems in applications which
use the derivation trees:
• Evaluating expressions
• In general, in compilers
for programming languages
Ambiguous Grammar:
A context-free grammar G is ambiguous
if there is a string w ∈ L(G) which has
two different derivation trees (equivalently, two different leftmost derivations)
E → E + E | E * E | (E) | a
This grammar is ambiguous because the
string a + a * a has two leftmost derivations:
E ⇒ E + E ⇒ a + E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a
E ⇒ E * E ⇒ E + E * E ⇒ a + E * E ⇒ a + a * E ⇒ a + a * a
Disambiguation
Can we always disambiguate a grammar?
E → E + T | T
T → T * F | F
F → (E) | a
Unique derivation tree for a + a * a
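One way to see the difference between the two expression grammars is to count derivation trees for a + a * a. The sketch below does this by memoized recursion over substrings. The function count_trees and the tuple encoding of the productions are assumptions for illustration, and the method only works for grammars without λ-rules and without cyclic unit productions, which holds for both grammars here.

```python
from functools import lru_cache

AMBIGUOUS = {"E": (("E", "+", "E"), ("E", "*", "E"), ("(", "E", ")"), ("a",))}
UNAMBIGUOUS = {
    "E": (("E", "+", "T"), ("T",)),
    "T": (("T", "*", "F"), ("F",)),
    "F": (("(", "E", ")"), ("a",)),
}

def count_trees(rules, start, s):
    """Count derivation trees with root start and yield s."""

    @lru_cache(maxsize=None)
    def sym(x, i, j):
        # Number of trees rooted at symbol x whose yield is s[i:j].
        if x not in rules:  # terminal: must match exactly one character
            return 1 if j - i == 1 and s[i] == x else 0
        return sum(seq(rhs, i, j) for rhs in rules[x])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence rhs derives s[i:j].
        if len(rhs) == 1:
            return sym(rhs[0], i, j)
        total = 0
        # The first symbol takes s[i:k]; every later symbol needs at least one character.
        for k in range(i + 1, j - len(rhs) + 2):
            left = sym(rhs[0], i, k)
            if left:
                total += left * seq(rhs[1:], k, j)
        return total

    return sym(start, 0, len(s))

print(count_trees(AMBIGUOUS, "E", "a+a*a"), count_trees(UNAMBIGUOUS, "E", "a+a*a"))
```

A count above 1 for some string shows that a grammar is ambiguous; a count of 1 for one string does not by itself prove that a grammar is unambiguous.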
[Figure: Context-Free Languages, linked to Context-Free Grammars and Pushdown Automata]
Simplification of CFG and Chomsky Normal Form
Simplifying CFGs
There are several ways in which context-free grammars
can be simplified.
One natural way is to eliminate useless symbols:
those that cannot be part of a derivation (or parse tree) of a string of terminals from S
Example of a useless symbol
Consider the CFG G with rules
S → aBC, B → b|Cb, C → c|cC, D → d
Reachable symbols
In a CFG, a symbol is reachable iff it is S or
it appears in α, where A → α is a rule of the grammar and A is
reachable.
So in the grammar above
we first find that S is reachable,
then a, B, and C are,
and finally b and c are.
D and d are never found, so they are unreachable.
A symbol that is unreachable cannot be part of a
derivation from S. It may be eliminated along with all of its
rules.
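The top-down characterization above translates directly into a worklist search. The sketch below is one possible encoding, not the slides' own algorithm: the function name reachable and the representation of each RHS as a string of single-character symbols are assumptions.

```python
def reachable(rules, start):
    """Return every symbol (variable or terminal) reachable from start."""
    found = {start}
    frontier = [start]
    while frontier:
        symbol = frontier.pop()
        # Terminals have no rules, so rules.get returns an empty list for them.
        for rhs in rules.get(symbol, []):
            for x in rhs:
                if x not in found:
                    found.add(x)
                    frontier.append(x)
    return found

# The grammar S → aBC, B → b|Cb, C → c|cC, D → d
GRAMMAR = {"S": ["aBC"], "B": ["b", "Cb"], "C": ["c", "cC"], "D": ["d"]}
REACHED = reachable(GRAMMAR, "S")
print(sorted(REACHED))
```

Any variable of the grammar missing from the result, here D, can be dropped together with its rules.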
Another reachability example
Suppose the grammar G instead had rules
S → aB, B → b|Cb, C → c|cC, D → d,
Then we would first see that S is reachable
then that a and B are
then that b and C are
and finally that c is
We might say in this case that
S is reachable at level 0,
a and B at level 1,
b and C at level 2,
and c at level 3.
A second kind of useless symbol
Two simple inductions show that X is reachable iff S ⇒*
αXβ for some strings α and β of symbols.
Another simplification example
In the grammar with rules
S → aB, B → b|BD|cC, C → cC, D → d
the symbol C cannot derive a string of terminals.
So it and all rules that contain it may be eliminated to
get just
S → aB, B → b|BD, D → d
Generating strings of terminals
A simple induction shows that the only symbols that
can generate strings of terminals are
terminal symbols
variables A for which A → α is a rule of the grammar and every
symbol of α generates a string of terminals
Our example revisited
S → aB, B → b|BD|cC, C → cC, D → d
Removing the two kinds of useless symbols
The characterizations of the two kinds of useless
symbols are similar, except that
To find reachable symbols, we work top down
To find generating symbols, we work bottom up.
When removing useless symbols, it’s important to
remove unreachable symbols last
since only this order will leave only useful symbols at the end
of the process.
Example of removing useless symbols
Using the algorithms implicit in the above
characterizations, suppose a CFG has rules
S → aB, B → b|bB|CD, C → cC, D → d
We first observe that a, b, c, and d generate strings of
terminals (at level 0)
then that B and D do (at level 1)
and finally that S does (at level 2).
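C does not generate a string of terminals, so the rule B → CD is removed.
But removing the rule B → CD from this grammar
makes the symbol D unreachable, so D → d is removed next,
leaving S → aB, B → b|bB.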
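The example can be replayed in code to see why the order matters. The sketch below is an illustration under assumptions: the helper names generating, reachable, restrict and remove_useless are hypothetical, and each RHS is a string of single-character symbols.

```python
def generating(rules, variables):
    """Bottom up: variables that derive some string of terminals."""
    gen = set()
    changed = True
    while changed:
        changed = False
        for lhs, alts in rules.items():
            if lhs in gen:
                continue
            # A symbol is fine if it is a terminal or already known to be generating.
            if any(all(x in gen or x not in variables for x in rhs) for rhs in alts):
                gen.add(lhs)
                changed = True
    return gen

def reachable(rules, start):
    """Top down: symbols that occur in some sentential form derived from start."""
    found, frontier = {start}, [start]
    while frontier:
        for rhs in rules.get(frontier.pop(), []):
            for x in rhs:
                if x not in found:
                    found.add(x)
                    frontier.append(x)
    return found

def restrict(rules, keep, variables):
    """Drop every rule that mentions a variable outside keep."""
    out = {}
    for lhs, alts in rules.items():
        kept = [r for r in alts if all(x in keep or x not in variables for x in r)]
        if lhs in keep and kept:
            out[lhs] = kept
    return out

def remove_useless(rules, start):
    variables = set(rules)
    step1 = restrict(rules, generating(rules, variables), variables)    # nongenerating first
    return restrict(step1, reachable(step1, start) & variables, variables)  # unreachable last

G = {"S": ["aB"], "B": ["b", "bB", "CD"], "C": ["cC"], "D": ["d"]}
V = set(G)
RIGHT_ORDER = remove_useless(G, "S")
# Reversed order, for comparison: unreachable symbols first, nongenerating second.
first = restrict(G, reachable(G, "S") & V, V)
WRONG_ORDER = restrict(first, generating(first, V), V)
print(RIGHT_ORDER, WRONG_ORDER)
```

With the reversed order, D survives: it is still reachable through B → CD when reachability is checked, and it is generating, so neither pass removes it.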
Exercise on removing useless symbols
Eliminate the useless variables from the grammar given
by the production rules:
1) S → aS|A|C
A → a
B → aa
C → aCb
Answer:
S → aS|A
A → a
2) S → aB|bC
A → Bac|bSC|a
B → aSB|bBC
C → SBc|aBC|ac
Answer:
S → bC
C → ac
Eliminating λ-rules
Sometimes it is desirable to eliminate λ-rules from a
grammar G.
This cannot be done if λ is in L(G),
but it's always possible to eliminate λ-rules from a CFG
and get a grammar that generates L(G) - {λ}.
Nullable symbols
Eliminating λ-rules is like eliminating useless symbols.
We first define a nullable symbol A to be one such that
A ⇒* λ.
Then for every rule that contains nullable symbols on
the RHS, we add each version of the rule obtained by
omitting one or more of these symbols.
Finally we remove all λ-productions from the resulting
grammar.
Nullability
Note that λ is in L(G) iff S is nullable.
In this case a CFG with S → λ as its only λ-rule can be
obtained by removing all other λ-rules and then adding
this rule.
Otherwise, removing λ-rules gives a CFG that
generates L(G) - {λ} = L(G).
By a simple induction, A is nullable iff
G has a rule A → λ, or
G has a rule A → α, where every symbol in α is nullable
Example: removing nullable symbols
Suppose G has the 9 rules
S → ABC | CDB, A → λ|aA, B → b|bB,
C → λ|cC, D → AC
Then A and C are nullable, as is D.
Optionally deleting nullable symbols adds:
S → BC | AB | B, S → DB | CB | B,
A → a, C → c, D → A | C
Removing A → λ and C → λ (and not adding D → λ)
gives a CFG with 16 distinct rules
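The rule count in this example can be checked with a short program. The sketch below is one possible implementation under assumptions: the names nullable and eliminate_lambda are hypothetical, each RHS is a string of single-character symbols, and the empty string stands for λ.

```python
from itertools import product

def nullable(rules):
    """Variables A with A ⇒* λ, found by iterating to a fixed point."""
    null = set()
    changed = True
    while changed:
        changed = False
        for lhs, alts in rules.items():
            # all() over an empty RHS is True, which covers the rule A → λ.
            if lhs not in null and any(all(x in null for x in rhs) for rhs in alts):
                null.add(lhs)
                changed = True
    return null

def eliminate_lambda(rules):
    """Add every version of each rule with nullable symbols omitted, then drop λ-rules."""
    null = nullable(rules)
    out = {}
    for lhs, alts in rules.items():
        new = set()
        for rhs in alts:
            # Each nullable symbol may be kept or dropped; other symbols are always kept.
            choices = [(x, "") if x in null else (x,) for x in rhs]
            for combo in product(*choices):
                candidate = "".join(combo)
                if candidate:          # never add a λ-rule
                    new.add(candidate)
        out[lhs] = new
    return out

G = {"S": ["ABC", "CDB"], "A": ["", "aA"], "B": ["b", "bB"], "C": ["", "cC"], "D": ["AC"]}
NULLABLE = nullable(G)
RESULT = eliminate_lambda(G)
RULE_COUNT = sum(len(alts) for alts in RESULT.values())
print(sorted(NULLABLE), RULE_COUNT)
```

Because the rules for each variable are collected in a set, the duplicate S → B that arises from both ABC and CDB is counted once.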
Exercise on Eliminating λ-Productions
Remove the empty productions from the following
production rules of a CFG:
S → AB
A → aAA | λ
B → bBB | λ
Answer:
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b
Observations on the previous example
S → ABC | CDB, A → λ|aA, B → b|bB,
C → λ|cC, D → AC
Eliminating unit productions -- example
Consider the familiar grammar with rules
E → E+T | T, T → T*F | F, F → x | (E)
Here we have that
E ⇒* T and T ⇒* F (at level 0), and
E ⇒* F (at level 1)
Eliminating unit productions gives new rules
E → E+T | T*F | x | (E)
T → T*F | x | (E)
F → x | (E)
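The level-by-level computation above can be sketched as a closure over unit productions. In the sketch below, the function name eliminate_units is hypothetical, variables are assumed to be single characters, and each RHS is written as a string without spaces.

```python
def eliminate_units(rules):
    """Replace unit productions A → B by the non-unit rules of every B with A ⇒* B."""
    variables = set(rules)
    # unit[A] holds every variable reachable from A using unit productions only.
    unit = {a: {a} for a in variables}
    changed = True
    while changed:
        changed = False
        for a in variables:
            for b in list(unit[a]):
                for rhs in rules[b]:
                    # A RHS that is exactly one variable is a unit production.
                    if rhs in variables and rhs not in unit[a]:
                        unit[a].add(rhs)
                        changed = True
    return {
        a: {rhs for b in unit[a] for rhs in rules[b] if rhs not in variables}
        for a in variables
    }

G = {"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["x", "(E)"]}
NEW_RULES = eliminate_units(G)
for lhs in "ETF":
    print(lhs, "→", " | ".join(sorted(NEW_RULES[lhs])))
```

The same function applied to S → Aa|A, A → a|bc|B, B → A|bb also handles the cycle A → B → A, since the closure stops as soon as no new variable is added.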
Exercise on Eliminating unit Productions
Remove the unit productions from the following
production rules of a CFG:
S → Aa|A
A → a|bc|B
B → A|bb
Answer:
S → Aa|a|bc|bb
A → a|bc|bb
B → a|bc|bb
Order of steps when simplifying
To eliminate useless symbols, λ-productions, and unit
productions safely from a CFG, we need to apply the
simplification algorithms in an appropriate order.
A safe order is:
λ-productions
unit productions
useless symbols
▪ nongenerating symbols
▪ unreachable symbols
Chomsky normal form
Noam Chomsky
– The Grammar Guy
– 1928 –
– b. Philadelphia, PA
– PhD – UPenn (1955)
• Linguistics
– Prof at MIT (Linguistics)
(1955 - present)
Chomsky normal form
One additional way to simplify a CFG is to simplify each
RHS
A CFG is in Chomsky normal form (CNF) iff each
production has a RHS that consists of
a single terminal symbol, or
two variable symbols
For any CFG G with λ not in L(G), there is an equivalent
grammar G1 in CNF.
The transformation must begin after simplifying the
grammar (removing λ-productions, all unit productions, and useless
variables)
Converting to Chomsky normal form
A CFG that doesn't generate λ may be converted to
CNF by first eliminating all λ-rules and unit
productions.
This will give a grammar where each RHS of length less
than 2 consists of a lone terminal.
Any RHS of length k > 2 may be broken up by
introducing k-2 new variables.
For any terminal a that remains on a RHS of length 2, we add a
new variable Ca and a new rule Ca → a, and replace a by Ca on that RHS.
Converting to CNF: an example
For example, the rule S → AbCD in a CFG G can be
replaced by
S → AX, X → bY, Y → CD
Here we don’t change L(G)
After the remaining steps, the new rules would be
S → AX, X → CbY, Y → CD, Cb → b
Again we don’t change L(G)
A more complete example
Consider the grammar with rules
E → E + T | T * F | x, T → T * F | x, F → x
The last rule for each symbol is legal in CNF.
We may replace
E → E + T by E → EX, X → C+T, C+ → +
E → T * F by E → TY, Y → C*F, C* → *
T → T * F by T → TZ, Z → C*F, C* → *
The resulting grammar
The resulting CFG is in CNF, with rules
E → EX | TY | x
T → TZ | x
F → x
X → C+T
Y → C*F
Z → C*F (or Z could be replaced by Y)
C+ → +
C* → *
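The two conversion steps used in this example (break up long right-hand sides, then replace the remaining terminals) can be sketched as follows. The function name to_cnf and the generated names X1, X2, ... are assumptions; the slides use X, Y, Z instead. The input is assumed to have no λ-rules or unit productions, every variable is assumed to have at least one rule, and the generated names are assumed not to clash with existing symbols.

```python
def to_cnf(rules):
    """rules maps each variable to a list of RHS tuples; returns rules in CNF."""
    # Step 1: break every RHS of length k > 2 into a chain using k-2 new variables.
    chained = {}
    fresh = 0
    for lhs, alts in rules.items():
        for rhs in alts:
            head = lhs
            while len(rhs) > 2:
                fresh += 1
                helper = f"X{fresh}"          # hypothetical name for a new variable
                chained.setdefault(head, []).append((rhs[0], helper))
                head, rhs = helper, rhs[1:]
            chained.setdefault(head, []).append(rhs)

    # Step 2: on every RHS of length 2, replace terminal a by Ca and add Ca → a.
    variables = set(chained)
    cnf = {}
    for lhs, alts in chained.items():
        for rhs in alts:
            if len(rhs) == 2:
                fixed = []
                for x in rhs:
                    if x in variables:
                        fixed.append(x)
                    else:
                        fixed.append(f"C{x}")
                        cnf.setdefault(f"C{x}", [(x,)])
                rhs = tuple(fixed)
            cnf.setdefault(lhs, []).append(rhs)
    return cnf

G = {
    "E": [("E", "+", "T"), ("T", "*", "F"), ("x",)],
    "T": [("T", "*", "F"), ("x",)],
    "F": [("x",)],
}
CNF = to_cnf(G)
for lhs, alts in CNF.items():
    print(lhs, "→", " | ".join(" ".join(rhs) for rhs in alts))
```

Up to the names of the new variables (X1, X2, X3 in place of X, Y, Z), the output has the same shape as the rules listed above.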
Exercise
Convert the following CFGs into Chomsky Normal Form:
1) S → | ADDA
A → a
C → c
D → bCb
2) S → bA | aB
A → bAA | aS | a
B → aBB | bS | b
3) S → abAB
A → bAB | λ
B → BAa | A | λ
4) S → aS | bS | B
B → bb | C | λ
C → cC | λ
Greibach Normal Form
A CFG G is in Greibach Normal Form if
every production is of the form A → aα, where α ∈ V* and
a ∈ T (α may be λ), and
S → λ is in G if λ ∈ L(G).
Conversion to GNF
Example: Cont.…
Step 3:
Convert An-productions of the form An → Anγ to the form
An → aγ, by applying Lemma 2.
We have to apply Lemma 2 to the A2-productions, as we have
A2 → A2A2A1, and let Z2 be the new variable.
The resulting productions are:
A2 → aA1    A2 → b
A2 → aA1Z2    A2 → bZ2
Z2 → A2A1    Z2 → A2A1Z2
Conversion to GNF
Example: Cont.…
Step 4:
Modify the Ai-productions to the form Ai → aγ for i = 1, 2, ..., n-1
by applying Lemma 1.
(i) The A2-productions are A2 → aA1 | b | aA1Z2 | bZ2.
(ii) Among the A1-productions we retain A1 → a and eliminate
A1 → A2A2 using Lemma 1.
▪ The resulting productions are A1 → aA1A2 | bA2, A1 → aA1Z2A2 | bZ2A2
▪ The set of all (modified) A1-productions is
A1 → a | aA1A2 | bA2 | aA1Z2A2 | bZ2A2
Conversion to GNF
Example: Cont.…
Step 5:
Modify the Zi-productions, which are of the form Zi → aγ or
Zi → Akγ for some k.
The Z2-productions to be modified are:
Z2 → A2A1, Z2 → A2A1Z2
We apply Lemma 1 and get
Z2 → aA1A1 | bA1 | aA1Z2A1 | bZ2A1
Z2 → aA1A1Z2 | bA1Z2 | aA1Z2A1Z2 | bZ2A1Z2
Conversion to GNF
Example: Cont.…
Hence the equivalent grammar is
G' = ({A1, A2, Z2}, {a, b}, P1, A1)
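where P1 consists of
A1 → a | aA1A2 | bA2 | aA1Z2A2 | bZ2A2
A2 → aA1 | b | aA1Z2 | bZ2
Z2 → aA1A1 | bA1 | aA1Z2A1 | bZ2A1
Z2 → aA1A1Z2 | bA1Z2 | aA1Z2A1Z2 | bZ2A1Z2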