$P_F \to P_N$: Idea
Start in $p_0$ with $X_0$ on the stack
Move to $q_0$ with $Z_0X_0$ on the stack
Simulate $P_F$
From any final state of $P_F$, add a transition to $p$, which will just empty the stack
$X_0$: it prevents accidental acceptance by $P_N$ when $P_F$ empties its stack and rejects
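Below is a minimal Python sketch of this construction. It assumes a PDA's transition function is stored as a dict from (state, input symbol or "" for $\epsilon$, stack top) to a set of (next state, tuple pushed in place of the top, with element 0 on top). The fresh names `p0`, `p` and `X0`, the helper `final_configs`, and the small example $P_F$ are hypothetical. The example $P_F$ empties its stack on input b while in a non-final state, which shows why $X_0$ is needed.

```python
EPS = ""  # stands for epsilon in the input position of a transition key

def final_to_empty(delta_f, q0, Z0, finals, stack_symbols):
    """Return (delta_N, start state, start stack symbol) with N(P_N) = L(P_F).

    delta maps (state, input symbol or EPS, stack top) to a set of
    (next state, tuple pushed in place of the top; element 0 ends up on top).
    The fresh names p0, p and X0 are assumed not to occur in P_F.
    """
    p0, p, X0 = "p0", "p", "X0"
    gamma = set(stack_symbols) | {X0}
    delta_n = {key: set(moves) for key, moves in delta_f.items()}
    # Start in p0 with X0 on the stack, then move to q0 with Z0 X0 on the stack.
    delta_n[(p0, EPS, X0)] = {(q0, (Z0, X0))}
    # From every final state of P_F: an epsilon-move to p, whatever is on top.
    for f in finals:
        for Y in gamma:
            delta_n.setdefault((f, EPS, Y), set()).add((p, ()))
    # In p, just empty the stack (X0 included).
    for Y in gamma:
        delta_n.setdefault((p, EPS, Y), set()).add((p, ()))
    return delta_n, p0, X0

def final_configs(delta, state, stack, w, max_stack=20):
    """All (state, stack) pairs reachable after reading the whole of w.

    max_stack is a safety bound on stack height, for this illustration only.
    """
    start = (state, 0, tuple(stack))
    seen, todo, ends = {start}, [start], set()
    while todo:
        q, i, st = todo.pop()
        if i == len(w):
            ends.add((q, st))
        if not st:
            continue  # no move is possible on an empty stack
        top, rest = st[0], st[1:]
        options = [(EPS, i)] + ([(w[i], i + 1)] if i < len(w) else [])
        for a, j in options:
            for q2, push in delta.get((q, a, top), ()):
                cfg = (q2, j, tuple(push) + rest)
                if len(cfg[2]) <= max_stack and cfg not in seen:
                    seen.add(cfg)
                    todo.append(cfg)
    return ends

# Hypothetical P_F: accepts "a" in final state q1; on "b" it pops Z0 in the
# non-final state q0, which must not count as acceptance.
delta_f = {("q0", "a", "Z0"): {("q1", ("Z0",))},
           ("q0", "b", "Z0"): {("q0", ())}}
delta_n, start, bottom = final_to_empty(delta_f, "q0", "Z0", {"q1"}, {"Z0"})

def accepts_by_empty_stack(w):
    return any(st == () for _, st in final_configs(delta_n, start, (bottom,), w))

print(accepts_by_empty_stack("a"), accepts_by_empty_stack("b"))  # True False
```

Without the bottom marker $X_0$, the run on b would end with an empty stack, and $P_N$ would wrongly accept. In this sketch the run ends with $X_0$ still on the stack.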
Push Down Automata(PDA)
The reverse: Given $P_N = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, \emptyset)$, construct $P_F$ such that $L(P_F) = N(P_N)$
Idea: $P_F = (Q_1, \Sigma, \Gamma_1, \delta_1, p_0, X_0, F_1)$
where $Q_1 = Q \cup \{p_0, p_f\}$, $\Gamma_1 = \Gamma \cup \{X_0\}$, $F_1 = \{p_f\}$
Idea:
Start in $p_0$ with $X_0$ on the stack
Move to $q_0$ with $Z_0X_0$ on the stack
Simulate $P_N$
When $P_N$ empties its stack, it exposes $X_0$
From all states, add an $\epsilon$-move to $p_f$ whenever $X_0$ is on top of the stack
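The reverse construction can be sketched in Python with the same hypothetical encoding: a dict from (state, input or "" for $\epsilon$, stack top) to a set of (next state, pushed tuple). The names `p0`, `pf` and `X0`, the simulator `final_configs`, and the example $P_N$ for $\{a^n b^n : n \ge 1\}$ are assumptions made for illustration. The final check compares $N(P_N)$ and $L(P_F)$ on all strings over $\{a, b\}$ of length at most 4.

```python
from itertools import product

EPS = ""  # stands for epsilon in the input position of a transition key

def empty_to_final(delta_n, q0, Z0, states):
    """Return (delta_F, start, start stack symbol, finals) with L(P_F) = N(P_N)."""
    p0, pf, X0 = "p0", "pf", "X0"  # fresh names, assumed unused in P_N
    delta_f = {key: set(moves) for key, moves in delta_n.items()}
    # Start in p0 with X0 on the stack, then move to q0 with Z0 X0 on the stack.
    delta_f[(p0, EPS, X0)] = {(q0, (Z0, X0))}
    # From every state: once X0 is exposed (P_N emptied its stack), go to pf.
    for q in states:
        delta_f.setdefault((q, EPS, X0), set()).add((pf, ()))
    return delta_f, p0, X0, {pf}

def final_configs(delta, state, stack, w, max_stack=20):
    """All (state, stack) pairs reachable after reading the whole of w."""
    start = (state, 0, tuple(stack))
    seen, todo, ends = {start}, [start], set()
    while todo:
        q, i, st = todo.pop()
        if i == len(w):
            ends.add((q, st))
        if not st:
            continue  # no move is possible on an empty stack
        top, rest = st[0], st[1:]
        options = [(EPS, i)] + ([(w[i], i + 1)] if i < len(w) else [])
        for a, j in options:
            for q2, push in delta.get((q, a, top), ()):
                cfg = (q2, j, tuple(push) + rest)
                if len(cfg[2]) <= max_stack and cfg not in seen:
                    seen.add(cfg)
                    todo.append(cfg)
    return ends

# Hypothetical P_N accepting {a^n b^n : n >= 1} by empty stack.
delta_n = {
    ("q0", "a", "Z0"): {("q0", ("A",))},
    ("q0", "a", "A"): {("q0", ("A", "A"))},
    ("q0", "b", "A"): {("q1", ())},
    ("q1", "b", "A"): {("q1", ())},
}
delta_f, start, bottom, finals = empty_to_final(delta_n, "q0", "Z0", {"q0", "q1"})

def in_N(w):  # acceptance of P_N by empty stack
    return any(st == () for _, st in final_configs(delta_n, "q0", ("Z0",), w))

def in_L(w):  # acceptance of P_F by final state
    return any(q in finals for q, _ in final_configs(delta_f, start, (bottom,), w))

words = ["".join(t) for n in range(5) for t in product("ab", repeat=n)]
print(all(in_N(w) == in_L(w) for w in words))  # True
print([w for w in words if in_L(w)])           # ['ab', 'aabb']
```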
Equivalence of CFGs and PDAs
Claim: Every CFL is accepted by some PDA, and every PDA accepts some CFL
Theorem 1: If L is a CFL, then L = N(M) for some PDA M
Proof: Suppose G is a CFG for L
Our goal: Construct a PDA M for G such that N(M) = L(G)
Idea: PDA M simulates leftmost (LM) derivations in G for input w, such that at any step the sentential form is represented by
a) the sequence of symbols consumed from input w by M,
b) followed by the contents of M's stack
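Formally, given CFG $G = (V, T, P, S)$,
construct PDA $M = (Q, \Sigma, \Gamma, \delta, q_0, Z_0, \emptyset)$
with $Q = \{q\}$, $q_0 = q$, $\Sigma = T$, $\Gamma = V \cup T$, $Z_0 = S$
Defining $\delta$: two types
1. If terminal a is on the stack top, then expect to see an a in the input and consume both; note there is no change in the sentential form
2. If variable A is on the stack top, then replace it by the RHS of any of its production rules in P; note there is no change in the input consumed.
Thus:
$\delta(q, \epsilon, A) = \{(q, \alpha_1), (q, \alpha_2), \ldots, (q, \alpha_k)\}$ for all $A \in V$,
where $A \to \alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_k$ are in P
$\delta(q, a, a) = \{(q, \epsilon)\}$ for all $a \in T$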
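A Python sketch of Theorem 1's construction follows. It assumes a grammar is a dict from each variable to a list of bodies (tuples of symbols, `()` for $\epsilon$), and that every symbol that is not a key is a terminal. Since M has the single state q, the sketch drops the state from the transition keys. The breadth-first acceptance test and its pruning by minimum derivable length are additions for termination, not part of the construction. The pruning assumes the stack cannot grow forever through nullable variables alone, which holds for the example grammar used here.

```python
from collections import deque

def cfg_to_pda(grammar):
    """delta of the single-state PDA M with N(M) = L(G).

    Keys are (input symbol or "" for epsilon, stack top); values are the
    tuples that may replace the top (element 0 ends up on top).
    """
    delta = {}
    for A, bodies in grammar.items():
        delta[("", A)] = [tuple(b) for b in bodies]   # type 2: expand A
    terminals = {X for bodies in grammar.values() for b in bodies for X in b
                 if X not in grammar}
    for a in terminals:
        delta[(a, a)] = [()]                          # type 1: match a, pop it
    return delta

def min_lengths(grammar):
    """Length of the shortest terminal string each variable derives."""
    m = {A: float("inf") for A in grammar}
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            for b in bodies:
                n = sum(m.get(X, 1) for X in b)      # a terminal counts as 1
                if n < m[A]:
                    m[A], changed = n, True
    return m

def accepts(grammar, start, w):
    """Breadth-first search over configurations (input position, stack)."""
    delta, weight = cfg_to_pda(grammar), min_lengths(grammar)
    first = (0, (start,))
    seen, queue = {first}, deque([first])
    while queue:
        i, stack = queue.popleft()
        if not stack:
            if i == len(w):
                return True           # input consumed and stack empty: accept
            continue
        top, rest = stack[0], stack[1:]
        moves = [(i, push) for push in delta.get(("", top), [])]
        if i < len(w):
            moves += [(i + 1, push) for push in delta.get((w[i], top), [])]
        for j, push in moves:
            new_stack = push + rest
            # Prune: the stack must still derive at least this many symbols.
            if sum(weight.get(X, 1) for X in new_stack) > len(w) - j:
                continue
            if (j, new_stack) not in seen:
                seen.add((j, new_stack))
                queue.append((j, new_stack))
    return False

# The example grammar G: S -> AS | eps, A -> 0A1 | A1 | 01
G = {"S": [("A", "S"), ()], "A": [("0", "A", "1"), ("A", "1"), ("0", "1")]}
print([w for w in ["", "01", "011", "0011", "001", "10", "01011"] if accepts(G, "S", w)])
# ['', '01', '011', '0011', '01011']
```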
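Example: Consider G:
$S \to AS \mid \epsilon$
$A \to 0A1 \mid A1 \mid 01$
PDA: $M = (\{q\}, \{0,1\}, \{0,1,A,S\}, \delta, q, S, \emptyset)$
$\delta$: $\delta(q, \epsilon, S) = \{(q, AS), (q, \epsilon)\}$
$\delta(q, \epsilon, A) = \{(q, 0A1), (q, A1), (q, 01)\}$
$\delta(q, 0, 0) = \{(q, \epsilon)\}$
$\delta(q, 1, 1) = \{(q, \epsilon)\}$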
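Execution: Consider w = 011
In G: $S \Rightarrow AS \Rightarrow A1S \Rightarrow 011S \Rightarrow 011$
In M: $\langle q, 011, S\rangle \vdash \langle q, 011, AS\rangle \vdash \langle q, 011, A1S\rangle \vdash \langle q, 011, 011S\rangle$
$\vdash \langle q, 11, 11S\rangle \vdash \langle q, 1, 1S\rangle \vdash \langle q, \epsilon, S\rangle$
$\vdash \langle q, \epsilon, \epsilon\rangle$
Observe: there is a one-to-one correspondence between LM derivations and execution traces.
Of course, many execution traces are possible, each corresponding to a distinct derivation.
Besides that, observe that if two distinct executions accept w, then there exist two distinct LM derivations of w, and thus the grammar is ambiguous.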
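To see the correspondence concretely, the following sketch enumerates every accepting run of the example PDA M, with acceptance by empty stack. The dict `DELTA` transcribes $\delta$ from the example and leaves the single state q implicit. `MIN_LEN` gives the length of the shortest terminal string each stack symbol derives; it is an added assumption, used only to prune runs that cannot finish. For w = 011 the sketch should find exactly the one run above. For w = 00111 it should find two runs, matching the two LM derivations $S \Rightarrow AS \Rightarrow 0A1S \Rightarrow 0A11S \Rightarrow 00111S \Rightarrow 00111$ and $S \Rightarrow AS \Rightarrow A1S \Rightarrow 0A11S \Rightarrow 00111S \Rightarrow 00111$. So this G is ambiguous.

```python
# delta of the example PDA M (single state q), keyed by
# (input symbol or "" for epsilon, stack top) -> strings pushed in place of the top
DELTA = {
    ("", "S"): ["AS", ""],
    ("", "A"): ["0A1", "A1", "01"],
    ("0", "0"): [""],
    ("1", "1"): [""],
}
# Shortest terminal string each stack symbol can derive; used only for pruning.
MIN_LEN = {"S": 0, "A": 2, "0": 1, "1": 1}

def config(w, stack):
    return f"<q, {w or 'ε'}, {stack or 'ε'}>"

def accepting_runs(w, stack="S", run=()):
    """Yield each accepting run of M on w as a tuple of configurations."""
    run = run + (config(w, stack),)
    if not stack:
        if not w:
            yield run          # input consumed and stack empty: accept
        return
    top, rest = stack[0], stack[1:]
    options = [(w, push) for push in DELTA.get(("", top), [])]
    if w:
        options += [(w[1:], push) for push in DELTA.get((w[0], top), [])]
    for remaining, push in options:
        new_stack = push + rest
        # Skip runs whose stack already needs more input than is left.
        if sum(MIN_LEN[X] for X in new_stack) <= len(remaining):
            yield from accepting_runs(remaining, new_stack, run)

for run in accepting_runs("011"):
    print(" |- ".join(run))
print(len(list(accepting_runs("00111"))))  # 2
```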
In Theorem 1, our construction relied heavily on the power of nondeterminism to let the machine guess the correct derivation
But in real life (or in parsers such as YACC), we don't have nondeterministic power
So we need to convert PDAs to some form of deterministic PDA
Definition: A DPDA is a PDA with 2 restrictions:
a) $\delta(q, a, Z)$ has at most one possibility, for every $a \in \Sigma \cup \{\epsilon\}$
b) If $\delta(q, \epsilon, Z)$ is defined, then $\delta(q, a, Z)$ is empty for all $a \in \Sigma$
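Here is a small Python check of the two restrictions. It assumes the same hypothetical encoding of $\delta$: a dict from (state, input symbol or "" for $\epsilon$, stack top) to a set of moves. The example DPDA for $\{0^n1^n : n \ge 1\}$ and the one-entry fragment of the CFG-derived PDA are made up for illustration.

```python
def is_deterministic(delta, sigma):
    """Check restrictions (a) and (b) of a DPDA on a transition table.

    delta maps (state, input symbol or "" for epsilon, stack top) to a set of
    (next state, pushed tuple) moves; sigma is the input alphabet.
    """
    for (q, a, Z), moves in delta.items():
        # (a) at most one possibility for each (q, a, Z), a in Sigma or epsilon
        if len(moves) > 1:
            return False
        # (b) an epsilon-move on (q, Z) rules out every input move on (q, Z)
        if a == "" and moves and any(delta.get((q, b, Z)) for b in sigma):
            return False
    return True

# Hypothetical DPDA for {0^n 1^n : n >= 1}, accepting in q2 by final state
dpda = {
    ("q0", "0", "Z"): {("q0", ("X", "Z"))},
    ("q0", "0", "X"): {("q0", ("X", "X"))},
    ("q0", "1", "X"): {("q1", ())},
    ("q1", "1", "X"): {("q1", ())},
    ("q1", "", "Z"): {("q2", ("Z",))},
}
# The PDA built from a CFG guesses a production, so it is not deterministic.
cfg_pda = {("q", "", "S"): {("q", ("A", "S")), ("q", ())}}

print(is_deterministic(dpda, {"0", "1"}), is_deterministic(cfg_pda, {"0", "1"}))
# True False
```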
Context Free Grammars
Simplification of Context Free Grammars: In a CFG, we try to eliminate those symbols and productions which are not useful for the derivation of sentences.
Useful symbols: Any $X \in V \cup T$ such that
$S \Rightarrow^* \alpha X \beta \Rightarrow^* w$
with $w \in T^*$ and $\alpha, \beta \in (V \cup T)^*$
Thus useless symbols are those symbols which do not take part in any derivation of a sentence and can safely be eliminated without changing the language
Defn: $X \in V \cup T$ is generating if $X \Rightarrow^* w$ for some $w \in T^*$
Defn: $X \in V \cup T$ is reachable if $S \Rightarrow^* \alpha X \beta$ for some $\alpha, \beta \in (V \cup T)^*$
Clearly, a useful X is both generating and reachable
Idea: Identify useless symbols by removing:
Step 1: non-generating Xs
Step 2: non-reachable Xs,
and all their productions
Observe: we must do it in this order.
Example: $S \to AB \mid a$
$A \to b$
Suppose we do Step 2 first: all symbols are reachable, so when we do Step 1 next we only eliminate B as non-generating (with $S \to AB$), and A survives although it is now useless.
But if we do it in the right order, we first eliminate B in Step 1 and also eliminate the production $S \to AB$.
Now in Step 2 we find that A is non-reachable, so we eliminate A as well.
In general we perform both steps recursively.
Step 1: Eliminating non-generating symbols
Basis: Label all terminals in T as generating
Induction: For every production $X \to X_1X_2\cdots X_k$, if each $X_i$ is generating, then X is generating.
Terminate when no new generating symbol can be found
Step 2: Eliminating non-reachable symbols
Basis: S is reachable
Induction: For every production $X \to X_1X_2\cdots X_k$, if X is reachable, then label each of $X_1, X_2, \ldots, X_k$ as reachable.
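A Python sketch of both steps and of their composition follows. It assumes a grammar is a dict from each variable to a list of bodies (tuples of symbols). Every variable must appear as a key, even one with no productions: B in the example above is listed with an empty list. Any other symbol is treated as a terminal. The helper names are hypothetical.

```python
def generating(grammar):
    """Step 1: all X in V u T with X =>* w for some terminal string w."""
    terminals = {X for bodies in grammar.values() for b in bodies for X in b
                 if X not in grammar}
    gen = set(terminals)                 # basis: terminals are generating
    changed = True
    while changed:                       # induction, until nothing new is found
        changed = False
        for A, bodies in grammar.items():
            if A not in gen and any(all(X in gen for X in b) for b in bodies):
                gen.add(A)
                changed = True
    return gen

def reachable(grammar, start):
    """Step 2: all X with S =>* alpha X beta (graph search from S)."""
    reach, todo = {start}, [start]
    while todo:
        for b in grammar.get(todo.pop(), []):
            for X in b:
                if X not in reach:
                    reach.add(X)
                    todo.append(X)
    return reach

def restrict(grammar, keep):
    """Drop variables outside keep, plus every production using a dropped symbol."""
    return {A: [b for b in bodies if all(X in keep for X in b)]
            for A, bodies in grammar.items() if A in keep}

def remove_useless(grammar, start):
    g = restrict(grammar, generating(grammar))   # Step 1 first ...
    return restrict(g, reachable(g, start))      # ... then Step 2

# Example from the text; B has no productions, so it is listed with [].
G = {"S": [("A", "B"), ("a",)], "A": [("b",)], "B": []}
print(remove_useless(G, "S"))        # {'S': [('a',)]}

# Wrong order (Step 2, then Step 1) leaves the useless A behind.
g = restrict(G, reachable(G, "S"))
print(restrict(g, generating(g)))    # {'S': [('a',)], 'A': [('b',)]}
```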
Example: $S \to AB \mid AC \mid CD$
$A \to BB$
$B \to AC \mid ab$
$C \to Ca \mid CC$
$D \to BC \mid b \mid d$
Step 1: Basis: {a, b, d} is generating
{a, b, d, A, B, D} is generating
{a, b, d, A, B, D, S} is generating
As C is not found to be generating, remove C and all the productions that contain C on either the LHS or the RHS.
New grammar:
G2: $S \to AB$
$A \to BB$
$B \to ab$
$D \to b \mid d$
Step 2: Reachable?
Basis: {S} is reachable
{S, A, B} is reachable
{S, A, B, a, b} is reachable
Remove D and all productions that contain D on either the LHS or the RHS
Finally: G3: $S \to AB$
$A \to BB$
$B \to ab$
Removing $\epsilon$-productions: $\epsilon$-productions slow down the parser
Definition: $X \in V$ is nullable if $X \Rightarrow^* \epsilon$
Idea: Find nullable symbols recursively
Basis: If P contains $A \to \epsilon$, then label A as nullable
Induction: For every production $X \to X_1X_2X_3\cdots X_k$, if every $X_i$ is nullable, label X as nullable
Terminate when no new nullable symbol can be found
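The nullable computation can be sketched in Python as follows. It assumes a grammar is a dict from variables to lists of bodies, with `()` encoding $\epsilon$. Under that encoding the basis case falls out of the induction, because an empty body is trivially "all nullable". The example is G1 from the text.

```python
def nullable(grammar):
    """All variables X with X =>* epsilon."""
    null = set()
    changed = True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            # A -> epsilon (empty body) or A -> X1...Xk with every Xi nullable
            if A not in null and any(all(X in null for X in b) for b in bodies):
                null.add(A)
                changed = True
    return null

G1 = {"S": [("A", "B", "C"), ("B", "C", "B")],
      "A": [("a", "B"), ("a",)],
      "B": [("C", "C"), ("b",)],
      "C": [("S",), ()]}
print(sorted(nullable(G1)))  # ['B', 'C', 'S']
```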
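Example: G1: $S \to ABC \mid BCB$
$A \to aB \mid a$
$B \to CC \mid b$
$C \to S \mid \epsilon$
Finding nullable symbols: Basis: {C} is nullable
{B, C} is nullable
{S, B, C} is nullable
Suppose we simply eliminate $C \to \epsilon$.
Originally we could derive
$S \Rightarrow^* BCB$, $S \Rightarrow^* CB$, $S \Rightarrow^* BC$, $S \Rightarrow^* BB$, $S \Rightarrow^* C$, $S \Rightarrow^* B$
But now we can't. So we must add all these effects back via direct productions.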
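Overall algorithm:
a) Identify all nullable symbols
b) Replace any production $X \to X_1X_2X_3\cdots X_k$ by the set of productions of the form $X \to \alpha_1\alpha_2\alpha_3\cdots\alpha_k$, where:
   i) $\alpha_i = X_i$ if $X_i$ is non-nullable
   ii) $\alpha_i = X_i$ or $\epsilon$ if $X_i$ is nullable
c) Remove all $\epsilon$-productions
So in the previous example, the new G2 becomes
$S \to ABC \mid AB \mid AC \mid A \mid BCB \mid BC \mid CB \mid BB \mid B \mid C \mid \epsilon$
$A \to aB \mid a$
$B \to CC \mid C \mid b \mid \epsilon$
$C \to S \mid \epsilon$
Now remove all $\epsilon$-productions.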
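Here is a Python sketch of steps a) to c), using the same hypothetical dict-of-tuples encoding with `()` for $\epsilon$. For step b), `itertools.product` enumerates every keep-or-drop choice for the nullable positions, and step c) discards any body that comes out empty. Applied to G1, it should produce the productions listed above without the $\epsilon$ alternatives.

```python
from itertools import product

def nullable(grammar):
    null, changed = set(), True
    while changed:
        changed = False
        for A, bodies in grammar.items():
            if A not in null and any(all(X in null for X in b) for b in bodies):
                null.add(A)
                changed = True
    return null

def remove_epsilon(grammar):
    """Expand every body over the nullable choices, then drop epsilon bodies."""
    null = nullable(grammar)                                    # step a)
    new = {}
    for A, bodies in grammar.items():
        out = set()
        for b in bodies:
            # step b): a nullable X_i may be kept or dropped; others are kept
            choices = [((X,), ()) if X in null else ((X,),) for X in b]
            for pick in product(*choices):
                body = tuple(s for part in pick for s in part)
                if body:                                        # step c)
                    out.add(body)
        new[A] = out
    return new

G1 = {"S": [("A", "B", "C"), ("B", "C", "B")], "A": [("a", "B"), ("a",)],
      "B": [("C", "C"), ("b",)], "C": [("S",), ()]}
for A, bodies in remove_epsilon(G1).items():
    print(A, "->", " | ".join(sorted("".join(b) for b in bodies)))
```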
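Glitch: Originally $S \Rightarrow^* \epsilon$ was possible, but after the final step we lose $\epsilon$ from L(G). This is unavoidable.
Removing unit productions (e.g. $A \to B$)
Algorithm: Step 1: Remove $\epsilon$-productions
Step 2: For all $X, Y \in V$,
if $X \Rightarrow^* Y$ and $Y \to \alpha$ is not a unit production,
then add $X \to \alpha$
Step 3: Eliminate all unit productions
Finding $X \Rightarrow^* Y$:
Since there are no $\epsilon$-productions, $X \Rightarrow^* Y$ only if
$X \Rightarrow Y_1 \Rightarrow Y_2 \Rightarrow Y_3 \Rightarrow \cdots \Rightarrow Y_k = Y$ using unit productions only,
with all $Y_i$ distinct (a shortest such chain never repeats a variable). Thus $k \le |V|$. We can use reachability in directed graphs.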
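This can be sketched in Python by treating unit productions as edges of a directed graph and computing reachability from each variable, which gives the pairs with $X \Rightarrow^* Y$. The sketch assumes Step 1 has already been done, and uses the same hypothetical dict-of-tuples grammar encoding. It is checked against the example grammar that follows.

```python
def is_unit(body, grammar):
    """A body is a unit production if it is a single variable."""
    return len(body) == 1 and body[0] in grammar

def unit_pairs(grammar):
    """All (X, Y) with X =>* Y using unit productions (graph reachability)."""
    pairs = set()
    for X in grammar:
        seen, todo = {X}, [X]
        while todo:
            for b in grammar[todo.pop()]:
                if is_unit(b, grammar) and b[0] not in seen:
                    seen.add(b[0])
                    todo.append(b[0])
        pairs |= {(X, Y) for Y in seen}
    return pairs

def remove_units(grammar):
    """Step 2 (add X -> alpha for each non-unit Y -> alpha) and Step 3."""
    new = {X: set() for X in grammar}
    for X, Y in unit_pairs(grammar):
        new[X] |= {b for b in grammar[Y] if not is_unit(b, grammar)}
    return new

G = {"S": [("A",), ("B",)], "A": [("S", "a"), ("a",)], "B": [("S",), ("b",)]}
for X, bodies in remove_units(G).items():
    print(X, "->", " | ".join(sorted("".join(b) for b in bodies)))
```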
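Example: G: $S \to A \mid B$
$A \to Sa \mid a$
$B \to S \mid b$
Algorithm: $S \Rightarrow^* A$, $S \Rightarrow^* B$
$B \Rightarrow^* S$, $B \Rightarrow^* A$
Get $S \to Sa \mid a \mid b \mid S \mid A \mid B$
$A \to Sa \mid a$
$B \to Sa \mid a \mid b \mid S \mid A \mid B$
Removing unit productions:
$S \to Sa \mid a \mid b$
$A \to Sa \mid a$
$B \to Sa \mid a \mid b$
Observe: A and B are now useless, as they are not reachable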
Question: to remove useless symbols, $\epsilon$-productions and unit productions all together, does the order matter?
Observe:
a) Removing useless symbols cannot add $\epsilon$-productions or unit productions
b) Removing $\epsilon$-productions could add unit productions
c) Removing unit productions:
   i) needs $\epsilon$-productions to be removed first
   ii) could create useless symbols, but not $\epsilon$-productions.
Thus use the following order:
1. $\epsilon$-productions
2. Unit productions (no $\epsilon$-productions added)
3. Useless symbols (no productions added)
Chomsky Normal Form:
A CFG G is in Chomsky Normal Form (CNF) if all its productions are of the form
$A \to a$
$A \to XY$,
where $a \in T$ and $X, Y \in V$
Theorem: Given any CFG $G_1$ with $\epsilon \notin L(G_1)$,
we can find a CNF grammar $G_2$ such that $L(G_1) = L(G_2)$
Construction: Three-step process:
Step 1: Eliminate unit productions and $\epsilon$-productions
Now all productions are of the form
$A \to a$
$A \to X_1X_2\cdots X_k$, with $k \ge 2$ and $X_1, X_2, \ldots, X_k \in V \cup T$
Step 2: Remove mixed bodies
For each $a \in T$, add a new variable $V_a$ and the production $V_a \to a$
In each production $A \to X_1\cdots X_k$ with $k \ge 2$, replace every terminal a by $V_a$
Now all productions are of the form
$A \to a$
$A \to A_1\cdots A_k$, with $k \ge 2$ and $A_i \in V$
Step 3: Factor long productions
For $A \to A_1A_2\cdots A_k$ with $k \ge 3$,
add new variables $B_1, B_2, \ldots, B_{k-2}$
Replace $A \to A_1\cdots A_k$
by $A \to A_1B_1$, $B_1 \to A_2B_2$, $B_2 \to A_3B_3$, ..., $B_{k-2} \to A_{k-1}A_k$
Verify: we get a CNF grammar, and the language is preserved
Example: G1: $S \to ABB \mid ab$
$A \to Ba \mid ba$
$B \to aAbB$
Step 2: $V_a \to a$, $V_b \to b$
$S \to ABB \mid V_aV_b$
$A \to BV_a \mid V_bV_a$
$B \to V_aAV_bB$
Step 3: $V_a \to a$, $V_b \to b$
$S \to AX_1 \mid V_aV_b$
$X_1 \to BB$
$A \to BV_a \mid V_bV_a$
$B \to V_aY_1$
$Y_1 \to AY_2$
$Y_2 \to V_bB$
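Steps 2 and 3 can be sketched in Python as one pass over the productions. The sketch assumes Step 1 is done, so every body of length 1 is a single terminal, and it uses the same hypothetical dict-of-tuples grammar encoding. The fresh variables are named `V_a`, `V_b` and `X1`, `X2`, ... (numbered globally) rather than the $X_1, Y_1, Y_2$ used above, and are assumed not to clash with existing names. On G1 it should reproduce the result above up to that renaming.

```python
def to_cnf(grammar):
    """Steps 2 and 3 of the CNF construction.

    Assumes no epsilon- or unit productions, so every body of length 1 is a
    single terminal. Keys of grammar are the variables.
    """
    variables = set(grammar)
    result = {A: [] for A in grammar}
    term_var = {}        # terminal a -> its new variable V_a
    counter = 0          # numbers the fresh variables X1, X2, ...

    def var_for(a):
        if a not in term_var:
            term_var[a] = "V_" + a
            result[term_var[a]] = [(a,)]          # V_a -> a
        return term_var[a]

    for A, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:                    # A -> a is already fine
                result[A].append(tuple(body))
                continue
            # Step 2: replace each terminal a in a long body by V_a
            body = tuple(X if X in variables else var_for(X) for X in body)
            # Step 3: A -> A1 X1, X1 -> A2 X2, ..., last -> A_{k-1} A_k
            head = A
            while len(body) > 2:
                counter += 1
                fresh = f"X{counter}"
                result[head].append((body[0], fresh))
                result[fresh] = []
                head, body = fresh, body[1:]
            result[head].append(body)
    return result

G1 = {"S": [("A", "B", "B"), ("a", "b")], "A": [("B", "a"), ("b", "a")],
      "B": [("a", "A", "b", "B")]}
for A, bodies in to_cnf(G1).items():
    print(A, "->", " | ".join(" ".join(b) for b in bodies))
```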
Greibach Normal Form (GNF)
Definition: A CFG G is in Greibach Normal Form if every production is of the form
$A \to a\alpha$, where $\alpha \in V^*$ and $a \in \Sigma$.
Note: $\alpha = \epsilon$ is allowed.
GNF is a natural generalization of regular grammars. In a regular grammar the productions are of the form $A \to a\alpha$,
where $\alpha \in V \cup \{\epsilon\}$ and $a \in \Sigma$
Modifying productions:
Modification 1: Productions of the form $A \to B\gamma$:
For any production of the form $A \to B\gamma$, where the B-productions are $B \to \beta_1 \mid \beta_2 \mid \cdots \mid \beta_k$, replace this particular A-production with
$A \to \beta_1\gamma \mid \beta_2\gamma \mid \cdots \mid \beta_k\gamma$
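Modification 1 is a plain substitution. The Python sketch below assumes the hypothetical dict-of-tuples grammar encoding, with B taken as the first symbol of the A-body. The check applies it to $A_2 \to A_1A_1$ from the example worked later in this section.

```python
def substitute(grammar, A, B):
    """Modification 1: replace each A -> B gamma by A -> beta_1 gamma | ... | beta_k gamma."""
    bodies = []
    for body in grammar[A]:
        if body and body[0] == B:
            bodies += [beta + body[1:] for beta in grammar[B]]
        else:
            bodies.append(body)
    return {**grammar, A: bodies}

# A1 -> A2 A2 | a,  A2 -> A1 A1 | b
G = {"A1": [("A2", "A2"), ("a",)], "A2": [("A1", "A1"), ("b",)]}
print(substitute(G, "A2", "A1")["A2"])
# [('A2', 'A2', 'A1'), ('a', 'A1'), ('b',)]
```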
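Modification 2: Productions of the form $A \to A\alpha$:
For productions of the form $A \to A\alpha_1 \mid A\alpha_2 \mid \cdots \mid A\alpha_k \mid \beta_1 \mid \beta_2 \mid \cdots \mid \beta_m$, where no $\beta_i$ starts with A, let Z be a new variable.
Define new productions as follows:
a) $A \to \beta_1 \mid \beta_2 \mid \cdots \mid \beta_m$, $A \to \beta_1Z \mid \beta_2Z \mid \cdots \mid \beta_mZ$
b) $Z \to \alpha_1 \mid \alpha_2 \mid \cdots \mid \alpha_k$, $Z \to \alpha_1Z \mid \alpha_2Z \mid \cdots \mid \alpha_kZ$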
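Modification 2 (removing direct left recursion) can be sketched the same way. Again the grammar encoding and the parameter `Z`, the caller-chosen fresh variable, are assumptions for illustration. The check uses the $A_2$-productions from the worked example that follows.

```python
def remove_left_recursion(grammar, A, Z):
    """Modification 2 for A -> A alpha_1 | ... | A alpha_k | beta_1 | ... | beta_m."""
    alphas = [b[1:] for b in grammar[A] if b and b[0] == A]
    betas = [b for b in grammar[A] if not (b and b[0] == A)]
    new = dict(grammar)
    new[A] = betas + [beta + (Z,) for beta in betas]        # rule a)
    new[Z] = alphas + [alpha + (Z,) for alpha in alphas]    # rule b)
    return new

G = {"A1": [("A2", "A2"), ("a",)], "A2": [("A2", "A2", "A1"), ("a", "A1"), ("b",)]}
g = remove_left_recursion(G, "A2", "Z")
print(g["A2"])  # [('a', 'A1'), ('b',), ('a', 'A1', 'Z'), ('b', 'Z')]
print(g["Z"])   # [('A2', 'A1'), ('A2', 'A1', 'Z')]
```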
Step 1: Eliminate all $\epsilon$-productions and construct a grammar $G_1$ in Chomsky Normal Form
Rename all variables as $A_1, A_2, A_3, \ldots, A_n$, where $S = A_1$
Step 2: Apply Modification 1 on productions of the form $A_i \to A_j\gamma$, where $j < i$
Step 3: Apply Modification 2 on productions of the form $A_i \to A_i\gamma$
Step 4: Apply Modification 1 on productions of the form $A_i \to A_j\gamma$, where $j > i$
Step 5: Modify the Z-productions to convert them to the form $Z \to a\alpha$
Example: Convert $G_1 = (V, T, P, S)$ defined by $S \to AA \mid a$, $A \to SS \mid b$ to a grammar $G_2$ in GNF.
Step 1: $G_1$ is already in CNF, so rename the variables as $A_1 = S$, $A_2 = A$,
such that $A_1 \to A_2A_2 \mid a$, $A_2 \to A_1A_1 \mid b$
Step 2: The $A_1$-productions are in the required form. $A_2 \to b$ is in the required form.
Modify $A_2 \to A_1A_1$. The resulting productions are
$A_2 \to A_2A_2A_1 \mid aA_1$
The productions P now become:
$A_1 \to A_2A_2 \mid a$
$A_2 \to A_2A_2A_1 \mid aA_1 \mid b$
Step 3: Apply Modification 2 on the $A_2$-productions. Let Z be a new variable.
$A_2 \to aA_1$, $A_2 \to b$
$A_2 \to aA_1Z$, $A_2 \to bZ$
$Z \to A_2A_1$, $Z \to A_2A_1Z$
Step 4: Modify $A_1 \to A_2A_2$ using Modification 1. As the $A_2$-productions are $A_2 \to aA_1 \mid b \mid aA_1Z \mid bZ$,
the set of modified $A_1$-productions is:
$A_1 \to a \mid aA_1A_2 \mid bA_2 \mid aA_1ZA_2 \mid bZA_2$
Step 5: Modify the Z-productions. The Z-productions are
$Z \to A_2A_1 \mid A_2A_1Z$
Applying Modification 1, they become
$Z \to aA_1A_1 \mid bA_1 \mid aA_1ZA_1 \mid bZA_1$
$Z \to aA_1A_1Z \mid bA_1Z \mid aA_1ZA_1Z \mid bZA_1Z$
The resulting grammar thus has the following production rules:
$A_1 \to a \mid aA_1A_2 \mid bA_2 \mid aA_1ZA_2 \mid bZA_2$
$A_2 \to aA_1 \mid b \mid aA_1Z \mid bZ$
$Z \to aA_1A_1 \mid bA_1 \mid aA_1ZA_1 \mid bZA_1$
$Z \to aA_1A_1Z \mid bA_1Z \mid aA_1ZA_1Z \mid bZA_1Z$
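As a sanity check, the sketch below compares the renamed original grammar with the GNF result on all strings of length at most 7. This is a bounded test, not a proof. It assumes neither grammar has $\epsilon$-productions, so any sentential form longer than the bound can be discarded. The dict-of-tuples encoding is the same hypothetical one used for illustration.

```python
def strings_up_to(grammar, start, n):
    """Terminal strings of length <= n derivable from start (leftmost expansion).

    Assumes no epsilon-productions, so sentential forms never shrink.
    """
    results, seen, todo = set(), {(start,)}, [(start,)]
    while todo:
        form = todo.pop()
        # position of the leftmost variable, if any
        i = next((k for k, X in enumerate(form) if X in grammar), None)
        if i is None:
            results.add("".join(form))
            continue
        for body in grammar[form[i]]:
            new = form[:i] + body + form[i + 1:]
            if len(new) <= n and new not in seen:
                seen.add(new)
                todo.append(new)
    return results

G1 = {"A1": [("A2", "A2"), ("a",)], "A2": [("A1", "A1"), ("b",)]}
G2 = {
    "A1": [("a",), ("a", "A1", "A2"), ("b", "A2"), ("a", "A1", "Z", "A2"), ("b", "Z", "A2")],
    "A2": [("a", "A1"), ("b",), ("a", "A1", "Z"), ("b", "Z")],
    "Z": [("a", "A1", "A1"), ("b", "A1"), ("a", "A1", "Z", "A1"), ("b", "Z", "A1"),
          ("a", "A1", "A1", "Z"), ("b", "A1", "Z"), ("a", "A1", "Z", "A1", "Z"),
          ("b", "Z", "A1", "Z")],
}
# Every G2 body is a terminal followed only by variables (GNF).
print(all(b[0] not in G2 and all(X in G2 for X in b[1:])
          for bodies in G2.values() for b in bodies))
print(strings_up_to(G1, "A1", 7) == strings_up_to(G2, "A1", 7))
```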