
Lectures examples and solutions of CFG&RE

1- Write a CFG that generates all even-length strings over Σ={a}.

L={λ, aa, aaaa, …}
S-> aaS|λ
2- Write a CFG that generates all odd-length strings over Σ={a}.
L={a, aaa, aaaaa, …}
S-> aaS|a
Hint: λ has even length, so it is not included in L.
3- Write a CFG that generates signed integers as in C++
S-> x N
x-> +|-|λ
N-> A N |A
A-> 0|1|…|9
4- Write a CFG that generates all palindromes over Σ={a,b}.
L={λ,a,b,aa,bb,aba,bab,…}
S -> aSa|bSb|a|b|λ
5- Write a CFG that generates all sentences that begin with a, end with b, and contain:
- exactly one c
- one or more c's

Contain exactly one c:
L={acb, aabcb, …}
S-> axcxb
x-> ax|bx|λ
Another solution:
S-> aA
A-> aA|bA|cB
B-> aB|bB|b

Contain one or more c's:
S-> axcxb
x-> ax|bx|cx|λ
Another solution:
S-> aA
A-> aA|bA|cB
B-> aB|bB|cB|b

6- Write a CFG for L(G)= { a^n b^n c^m | n,m>=0 } over Σ={a,b,c}
L= {λ,abc,aabbc,abccc,…}
S-> AB
A->aAb|λ
B->cB|λ
7- Describe in English the following grammar
S-> aS|Sb|b
This grammar generates strings consisting of zero or more a's followed by one
or more b's, i.e. a^n b^m with n>=0 and m>=1.
8- Describe in English :
S-> aS|bB
B->aB|λ

This grammar generates strings consisting of zero or more a's, followed by a single b, followed by zero or more a's (i.e. a^n b a^m with n,m>=0).


9- Write a CFG for an integer declaration whose identifiers are over Σ={a,b,c}
Example: int ab,ba,aa ;
S->TDx
x->;
T->int
D->D,V|V
V-> aV|bV|cV|a|b|c
10- Write CFG for nested parentheses ( ( ( ) ) ) .
S-> (S)|()
11- Write a CFG for repeated side-by-side parentheses () () () .
S -> ()S | ()
12- Write a CFG for well-formed parentheses
Like : ()() (()) ()
S-> SS|(S)|()
13- Write CFG for language that has
# a’s = # b’s
L={λ,ab,aabb,baab,bbaaab,…}
S-> SS|bSa|aSb|λ
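A quick way to sanity-check a grammar like this is to compute, bottom-up, every string of bounded length that it derives and test the intended property. Below is a minimal Python sketch; the encoding of the rules and the length bound are illustrative choices, not part of the original notes.

# Enumerate all strings of length <= 6 derivable from S -> SS | aSb | bSa | lambda
# and check that each has equally many a's and b's.
MAX_LEN = 6
RHSS = ["SS", "aSb", "bSa", ""]        # right-hand sides of S; "" stands for lambda

def derivable(max_len=MAX_LEN):
    """Least fixpoint: all terminal strings of length <= max_len derivable from S."""
    lang, changed = set(), True
    while changed:
        changed = False
        for rhs in RHSS:
            candidates = [""]
            for ch in rhs:
                pool = lang if ch == "S" else [ch]
                candidates = [c + p for c in candidates for p in pool
                              if len(c) + len(p) <= max_len]
            for w in candidates:
                if w not in lang:
                    lang.add(w)
                    changed = True
    return lang

words = derivable()
assert all(w.count("a") == w.count("b") for w in words)
print(sorted(words, key=len))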
14- Write a CFG for : L(G)= { b^n a^m b^2n | m,n>=0}
L={λ,babb,a,bbb,…}
S-> bSbb|A
A-> aA|λ
15- Write a CFG for L(G)={a^m b^n a^n b^m | m,n>=0}
S-> aSb|x
x->bxa|λ
16- Write a CFG for L(G)={a^m b^n a^(m+2) | m,n>=0}
To simplify, L(G)= { a^m b^n a^m aa | m,n>=0}
S-> Maa
M-> aMa|x
x->bx|λ
17- Write a CFG for real numbers in pascal
Ex. 345.678E+569 (the exponent sign may be +, -, or absent)
<real> -> <digit> <digits> <decimal Part> <exp>
<digit > -> 0|1|2|...|9
<digits> -> <digit><digits> | λ
<decimal Part> -> <Dot> <digit> <digits> |λ
<Dot> -> .
<exp> -> <E> <sign> <digit> <digits> |λ
<E> -> E
<sign> -> +|-|λ
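The same language is regular, so it can also be written as a single pattern; the following Python regular expression is my transcription of the grammar above (not taken from the notes) and can be used to test it quickly.

import re

# <digit><digits> ( . <digit><digits> )? ( E <sign> <digit><digits> )?
REAL = re.compile(r"^[0-9][0-9]*(\.[0-9][0-9]*)?(E[+-]?[0-9][0-9]*)?$")

for s in ["345.678E+569", "345.678E569", "7", "3.14", ".5", "3."]:
    print(s, bool(REAL.match(s)))
# The first four match; ".5" and "3." do not, since <decimal Part> requires a digit
# after the dot and <real> requires a leading digit.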
18- Write a CFG for language that doesn’t have bb , over ∑={a,b}
S-> aS|baS|xb|λ
x-> bax|ax|λ
19- Write the last example as a regular expression
(a ∪ ba)* ∪ (a ∪ ba)* b
20- L(G) = { a^n b^n c^n | n>=1}
L= {abc, aabbcc, …}
This language cannot be generated by any CFG; it requires a context-sensitive
grammar, for example:
S-> aSBc|abc
cB-> Bc
bB-> bb
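For example, the string aabbcc can be derived in this grammar as

S ⇒ aSBc ⇒ aabcBc ⇒ aabBcc ⇒ aabbcc

using the rules S → aSBc, S → abc, cB → Bc and bB → bb in turn.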

21- Write an RE for all sentences over Σ={a,b,c} that contain exactly one c.

RE = (a ∪ b)* c (a∪ b)*

In CFG
S-> NcN
N->aN|bN|λ

In regular grammar
S-> aS|bS|cA
A-> aA|bA|λ

22- Write the same kind of sentences as in the last example, but containing at least two adjacent c's.

L = {accb,bcaccb,acbbccccbac,…}
RE = ( a ∪ b ∪ c )* c c ( a ∪ b ∪ c )*

In CFG
S-> NccN
N-> aN|bN|cN|λ

23- Write a CFG over Σ={a,b} such that each b is immediately followed by an a, and the number of a's is twice the number of b's.

L={ baa,aba,aababa,…}
S-> aSba| baSa|SS|λ

24- L(G)= {a^n b^m c^(n+m) | n,m>=0}

To simplify, L(G)= a^n b^m c^m c^n


CFG
S-> aSc|x
x->bxc|λ

25- L(G)= { a^m b^(n+2) c^(m+2) | m,n>=0 }

To simplify,
L(G)= a^m b^n bb c^m cc
S-> aSc|A
A->B bb cc
B->bB|λ
Solution2
S-> aSc|bbx
x-> bx|cc

26- L(G) = a* a+ b* b+
Simplified: L(G)= a+ b+
CFG
S->AB
A-> aA|a
B-> bB|b
Converting CFGs to CNF (Chomsky Normal Form)
Richard Cole
October 17, 2007

A CNF grammar is a CFG with rules restricted as follows.

The right hand side of a rule consists of:

i. Either a single terminal, e.g. A → a.


ii. Or two variables, e.g. A → BC,
iii. Or the rule S → ε, if ε is in the language.

iv. The start symbol S may appear only on the left hand side of rules.

Given a CFG G, we show how to convert it to a CNF grammar G′ generating the same
language.
We use a grammar G with the following rules as a running example.

S → ASA | aB; A → B | S; B → b | ε
We proceed in a series of steps which gradually enforce the above CNF criteria; each step
leaves the generated language unchanged.

Step 1 For each terminal a, we introduce a new variable, Ua say, add a rule Ua → a, and
for each occurrence of a in a string of length 2 or more on the right hand side of a rule,
replace a by Ua . Clearly, the generated language is unchanged.
Example: If we have the rule A → Ba, this is replaced by Ua → a, A → BUa .
This ensures that terminals on the right hand sides of rules obey criteria (i) above.
This step changes our example grammar G to have the rules:

S → ASA | Ua B; A → B | S; B → b | ε; Ua → a

Step 2 For each rule with 3 or more variables on the righthand side, we replace it with a
new collection of rules obeying criteria (ii) above. Suppose there is a rule U → W1 W2 · · · Wk ,
for some k ≥ 3. Then we create new variables X2 , X3 , · · · , Xk−1 , and replace the prior rule
with the rules:

U → W1 X2 ; X2 → W2 X3 ; · · · ; Xk−2 → Wk−2 Xk−1 ; Xk−1 → Wk−1 Wk


Clearly, the use of the new rules one after another, which is the only way they can be used,
has the same effect as using the old rule U → W1 W2 · · · Wk . Thus the generated language is
unchanged.
This ensures, for criterion (ii) above, that no right hand side has more than 2 variables.
We have yet to eliminate right hand sides consisting of one variable or of the form ε.
This step changes our example grammar G to have the rules:

S → AX | Ua B; X → SA; A → B | S; B → b | ε; Ua → a

Step 3 We replace each occurrence of the start symbol S with the variable S′ and add the
rule S → S′. This ensures criterion (iv) above.
This step changes our example grammar G to have the rules:

S → S′; S′ → AX | Ua B; X → S′A; A → B | S′; B → b | ε; Ua → a

Step 4 This step removes rules of the form A → ε, as follows. First, we determine
all variables that can generate ε in one or more moves. We explain how to do this two
paragraphs down. Then for each such variable A, for each occurrence of A in a 2-variable
right hand side, we create a new rule with the A omitted; i.e. if there is a rule C → AB we
create the new rule C → B, and if there is a rule C → DA we create the new rule C → D
(if there is a rule C → AA, we create the new rule C → A). Then we remove all rules of the
form A → ε, apart from S → ε, if present (i.e. we keep the rule S → ε, if present).
The new rules serve to shortcut previously generatable instances of ε; i.e. if previously
we had used a rule A → BC, and then in a series of steps had generated ε from B, which
has the net effect of generating C from A, we could instead do this directly by applying the
new rule A → C. Consequently, the generated language is unchanged.
To find the variables that can generate ε, we use an iterative rule reduction procedure.
First, we make a copy of all the rules. We then modify the rules by removing from the right
hand sides all instances of variables A for which there is a rule A → ε. We keep iterating
this procedure so long as it creates new reduced rules with ε on the right hand side. (An
efficient implementation keeps track of the lengths of each right hand side, and a list of the
locations of each variable; the new rules with ε on the right hand side are those which have
newly obtained length 0. It is not hard to have this procedure run in time linear in the sum
of the lengths of the rules.)
This step changes our example grammar G to have the rules:

S → S′; S′ → AX | X | Ua B; X → S′A | S′; A → B | S′; B → b; Ua → a
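The iterative reduction described above is equivalent to a small fixpoint computation; here is a minimal Python sketch (the rule encoding is mine) that finds the variables generating ε for the running example.

# Rules after Step 3, written as (head, tuple of RHS symbols); () stands for an ε rule.
rules = [("S", ("S'",)), ("S'", ("A", "X")), ("S'", ("Ua", "B")),
         ("X", ("S'", "A")), ("A", ("B",)), ("A", ("S'",)),
         ("B", ("b",)), ("B", ()), ("Ua", ("a",))]

nullable = set()
changed = True
while changed:
    changed = False
    for head, rhs in rules:
        # A variable derives epsilon once some rule for it has every RHS symbol nullable.
        if head not in nullable and all(sym in nullable for sym in rhs):
            nullable.add(head)
            changed = True

print(nullable)   # prints {'A', 'B'} (in some order) for the running example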

Step 5 This step removes rules of the form A → B, which we call unit rules. We form
the directed graph defined by these rules, i.e. for each rule A → B, we create a directed
edge (A, B). For each strong component in this graph, we replace the variables it contains
with a single one of these variables in all the rules in which these variables occur. So if
U1 , U2 , · · · , Uk form a strong component (and so any one of these variables can be replaced,
in a sequence of applications of unit rules, by any other of these variables) then we replace
every occurrence of Ui , 2 ≤ i ≤ k with U1 in every rule in which they occur.
In the example grammar, the one non-trivial strong component contains the variables
{S′, X}. We replace S′ with X, yielding the rules:

S → X; X → AX | X | Ua B; X → XA | X; A → B | X; B → b; Ua → a
We can remove the unnecessary rule X → X also.
Next, we traverse the resulting acyclic graph, in reverse topological order (i.e. starting at
nodes with no outedges and working back from these nodes); for each traversed edge (E, F ),
which corresponds to a rule E → F, for each non-unit rule F → γ (with γ of the form CD or a), we add the rule E → γ,
and then remove the rule E → F. Any derivation which had used the rules E → F and
F → γ in turn can now use the rule E → γ instead. So the same strings are derived
with the new set of rules. (This can be implemented via a depth first search on the acyclic
graph.)
This step changes our example grammar G to have the rules:

S → AX | Ua B | XA; X → AX | Ua B | XA; A → b | AX | Ua B | XA; B → b; Ua → a

Steps 4 and 5 complete the enforcement of the CNF criteria, and thereby create a CNF grammar
generating the same language as the original grammar.
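Step 5 can also be phrased through the "unit closure" of each variable: give A a copy of every non-unit rule of every variable reachable from A by unit rules alone. The following Python sketch (rule encoding and helper names are mine) reproduces the final grammar listed above when applied to the rules obtained after removing X → X.

rules = {"S": [("X",)],
         "X": [("A", "X"), ("Ua", "B"), ("X", "A")],
         "A": [("B",), ("X",)],
         "B": [("b",)],
         "Ua": [("a",)]}
variables = set(rules)

def unit_closure(a):
    """All variables reachable from a by following unit rules (including a itself)."""
    reach, stack = {a}, [a]
    while stack:
        for rhs in rules[stack.pop()]:
            if len(rhs) == 1 and rhs[0] in variables and rhs[0] not in reach:
                reach.add(rhs[0])
                stack.append(rhs[0])
    return reach

new_rules = {a: [rhs for b in unit_closure(a) for rhs in rules[b]
                 if not (len(rhs) == 1 and rhs[0] in variables)]
             for a in rules}
print(new_rules)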
CS 360
Naomi Nishimura

State elimination
Note: this information is meant to cover some material used in class but absent from the
textbook. This is not intended to be comprehensive or a replacement for attending lecture.
This handout is based on material developed by Jeff Shallit for CS 360, in turn based on
material developed by Eric Bach of the University of Wisconsin.
In Section 3.2.2 of the textbook, an algorithm is given for constructing a regular expression
from a DFA. The algorithm presented here (and in class) is simpler to understand, and applies
to NFA’s and ε-NFA’s as well.
As in the textbook, we will remove states from the automaton, replacing labels of arcs, so
that in the end a single regular expression is formed. The single regular expression will be the
label on an arc that goes from the start state to the accepting state, and this will be the only
arc in the automaton.
The algorithm forms the simpler automaton as follows. In step 1, we modify the automaton
to have a start state that is not an accepting state and has no transitions in (either self-loops or
from other states). In step 2, we create an equivalent automaton that has a single accepting state
with no transitions out. These will be the two states that remain at the end of the algorithm.
In step 3, the other states are eliminated, in any order. Details of the algorithm follow, along
with a running example, illustrated below.

[Figure: the running example, an automaton over {a, b} with states 1, 2, 3, 4 (state 1 is the start state and state 4 is accepting).]

Step 1
If the start state is an accepting state or has transitions in, add a new non-accepting start
state and add an ε-transition between the new start state and the former start state.

[Figure: the automaton after Step 1, with a new non-accepting start state 0 and an ε-transition from 0 to the former start state 1.]

Step 2
If there is more than one accepting state or if the single accepting state has transitions out,
add a new accepting state, make all other states non-accepting, and add an ε-transition from
each former accepting state to the new accepting state.

[Figure: the automaton after Step 2, with a new accepting state 5 and ε-transitions from the former accepting states to 5.]

Step 3
For each non-start non-accepting state in turn, eliminate the state and update transitions
according to the procedure given on page 99 of the textbook, Figures 3.7 and 3.8. The following
illustrations depict the removal of states 1, 2, 3, and 4 in that order.

[Figure: the automaton after eliminating state 1; the surviving arcs carry labels such as aa and ab + ε.]

[Figure: the automaton after eliminating state 2; the remaining arcs carry labels including b + a(aa)∗(ab + ε) and a + a(aa)∗(ab + ε).]

[Figure: the automaton after eliminating state 3; the remaining arcs carry the labels (a + a(aa)∗(ab + ε))b∗ b, (b + a(aa)∗(ab + ε))b∗ b, ε + (a + a(aa)∗(ab + ε))b∗, and (b + a(aa)∗(ab + ε))b∗.]

After state 4 is eliminated, a single arc remains, from the start state 0 to the accepting state 5; its label is the resulting regular expression:

(b + a(aa)∗(ab + ε))b∗ + ((b + a(aa)∗(ab + ε))b∗ b)((a + a(aa)∗(ab + ε))b∗ b)∗(ε + (a + a(aa)∗(ab + ε))b∗)
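The three steps above can also be carried out symbolically on the arc labels: when a state k is eliminated, every pair of remaining states i, j gains the alternative R(i,k) (R(k,k))* R(k,j) on its arc. A minimal Python sketch follows; the string representation of labels and the toy automaton at the end are my own illustrations, not taken from the handout.

def eliminate(states, start, accept, R):
    """R[(i, j)] holds the arc label from i to j as a regex string; a missing key means no arc."""
    for k in [q for q in states if q not in (start, accept)]:
        loop = R.pop((k, k), None)
        star = f"({loop})*" if loop else ""
        rest = [q for q in states if q != k]
        for i in rest:
            if (i, k) not in R:
                continue
            for j in rest:
                if (k, j) not in R:
                    continue
                via = R[(i, k)] + star + R[(k, j)]
                R[(i, j)] = f"({R[(i, j)]} + {via})" if (i, j) in R else via
        for key in [key for key in R if k in key]:   # drop all arcs touching k
            del R[key]
        states = rest
    return R.get((start, accept), "∅")

# Toy example (mine): start --a--> 1, self-loop b on 1, 1 --a--> accept.
print(eliminate(["s", "1", "f"], "s", "f",
                {("s", "1"): "a", ("1", "1"): "b", ("1", "f"): "a"}))   # a(b)*a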
CS 208: Automata Theory and Logic
Lecture 6: Context-Free Grammar

Ashutosh Trivedi

[Figure: a two-state automaton over {a, b} alongside the formula ∀x(La(x) → ∃y.(x < y) ∧ Lb(y)).]

Department of Computer Science and Engineering,
Indian Institute of Technology Bombay.
Context-Free Grammars

Pushdown Automata

Properties of CFLs

Context-Free Grammars

Noam Chomsky
(linguist, philosopher, logician, and activist)
“ A grammar can be regarded as a device that enumerates the sentences of a language. We
study a sequence of restrictions that limit grammars first to Turing machines, then to two
types of systems from which a phrase structure description of a generated language can be
drawn, and finally to finite state Markov sources (finite automata). ”
Grammars
A (formal) grammar consists of
1. A finite set of rewriting rules of the form

φ→ψ

where φ and ψ are strings of symbols.


2. A special “initial” symbol S (S standing for sentence);
3. A finite set of symbols that stand for “words” of the language, called the
terminal vocabulary;
4. Other symbols that stand for “phrases”, called the non-terminal
vocabulary.
Given such a grammar, a valid sentence can be generated by
1. starting from the initial symbol S,
2. applying a rewriting rule S → φ1 to obtain a new string φ1,
3. applying another rule to obtain a new string φ2, and so on,
4. until we reach a string φn that consists only of terminal symbols.
Examples

Consider the grammar

S → AB (1)
A → C (2)
CB → Cb (3)
C → a (4)

where {a, b} are terminals, and {S, A, B, C} are non-terminals.


We can derive the phrase “ab” from this grammar in the following way:

S → AB, from (1)


→ CB, from (2)
→ Cb, from (3)
→ ab, from (4)

Examples

Consider the grammar

S → NounPhrase VerbPhrase (5)


NounPhrase → SingularNoun (6)
SingularNoun VerbPhrase → SingularNoun comes (7)
SingularNoun → John (8)

We can derive the phrase “John comes” from this grammar in the
following way:

S → NounPhrase VerbPhrase, from (5)

→ SingularNoun VerbPhrase, from (6)
→ SingularNoun comes, from (7)
→ John comes, from (8)
Types of Grammars
Depending on the rewriting rules we can characterize grammars into the
following four types:
1. Type 0 (unrestricted) grammars, with no restriction on rewriting rules;
2. Type 1 (context-sensitive) grammars, whose rules are of the form

αAβ → αγβ

where A is a nonterminal, α, β, γ are strings of terminals and
nonterminals, and γ is non-empty.
3. Type 2 (context-free) grammars, whose rules are of the form

A → γ

where A is a nonterminal, and γ is a (potentially empty) string of
terminals and nonterminals.
4. Type 3 (regular) grammars, whose rules are of the form

A → aB or A → a

where A, B are nonterminals, and a is a (potentially empty) string of
terminals. (The symmetric left-linear rules A → Ba and A → a also give regular grammars.)
Do regular grammars capture regular languages?

– Regular grammars to finite automata


– Finite automata to regular grammars
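One direction is mechanical: each rule A → aB becomes a transition from state A to state B on a, and each rule A → a becomes a transition from A to a fresh accepting state. A Python sketch follows, under the simplifying assumption that every right-hand side is a single terminal or a terminal followed by a variable; the example grammar at the end is mine, not from the slides.

def grammar_to_nfa(rules, start="S"):
    accept = "ACC"                 # fresh accepting state
    delta = {}                     # (state, symbol) -> set of successor states
    for head, rhs in rules:
        target = rhs[1] if len(rhs) == 2 else accept
        delta.setdefault((head, rhs[0]), set()).add(target)
    return start, accept, delta

def accepts(word, start, accept, delta):
    current = {start}
    for ch in word:
        current = set().union(*(delta.get((q, ch), set()) for q in current))
    return accept in current

# Example: S -> aS | bS | b generates the strings over {a, b} that end in b.
start, accept, delta = grammar_to_nfa([("S", "aS"), ("S", "bS"), ("S", "b")])
print(accepts("aab", start, accept, delta), accepts("aba", start, accept, delta))  # True False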

Context-Free Languages: Syntax
Definition (Context-Free Grammar)
A context-free grammar is a tuple G = (V, T, P, S) where
– V is a finite set of variables (nonterminals, nonterminals vocabulary);
– T is a finite set of terminals (letters);
– P ⊆ V × (V ∪ T)∗ is a finite set of rewriting rules called productions,
– We write A → β if (A, β) ∈ P;
– S ∈ V is a distinguished start or “sentence” symbol.

Example: G_{0^n 1^n} = (V, T, P, S) where


– V = {S};
– T = {0, 1};
– P is defined as

S → ε
S → 0S1

– S = S.
Context-Free Languages: Semantics
Derivation:
– Let G = (V, T, P, S) be a context-free grammar.
– Let αAβ be a string in (V ∪ T)∗ V(V ∪ T)∗
– We say that αAβ yields the string αγβ, and we write αAβ⇒αγβ if

A → γ is a production rule in G.

– For strings α, β ∈ (V ∪ T)∗, we say that α derives β, and we write
α ⇒∗ β, if there is a sequence α1, α2, . . . , αn ∈ (V ∪ T)∗ such that

α ⇒ α1 ⇒ α2 ⇒ · · · ⇒ αn ⇒ β.

Definition (Context-Free Grammar: Semantics)

The language L(G) accepted by a context-free grammar G = (V, T, P, S) is
the set

L(G) = {w ∈ T∗ : S ⇒∗ w}.
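The definition L(G) = {w ∈ T∗ : S ⇒∗ w} can be checked directly for small strings by searching over sentential forms and recording a witness derivation. A Python sketch for the grammar S → 0S1 | ε follows; the length bound used for pruning relies on the fact that, for this particular grammar, no sentential form in a derivation of w needs to be longer than |w| + 1.

from collections import deque

def derivation(target):
    """Return the sequence of sentential forms S => ... => target, or None."""
    parent = {"S": None}
    queue = deque(["S"])
    while queue:
        form = queue.popleft()
        if form == target:
            chain = []
            while form is not None:
                chain.append(form if form else "ε")
                form = parent[form]
            return list(reversed(chain))
        i = form.find("S")
        if i < 0:
            continue
        for rhs in ("0S1", ""):                 # the two productions of S
            new = form[:i] + rhs + form[i + 1:]
            if len(new) <= len(target) + 1 and new not in parent:
                parent[new] = form
                queue.append(new)
    return None

print(" ⇒ ".join(derivation("000111")))   # S ⇒ 0S1 ⇒ 00S11 ⇒ 000S111 ⇒ 000111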
CFG: Example

Recall G_{0^n 1^n} = (V, T, P, S) where

– V = {S};
– T = {0, 1};
– P is defined as

S → ε
S → 0S1

– S = S.

The string 000111 ∈ L(G_{0^n 1^n}), i.e. S ⇒∗ 000111, as

S ⇒ 0S1 ⇒ 00S11 ⇒ 000S111 ⇒ 000111.
Prove that {0^n 1^n : n ≥ 0} is accepted by the grammar G_{0^n 1^n}.

The proof is in two parts.


– First show that every string w of the form 0n 1n can be derived from S
using induction over w.
– Then, show that for every string w ∈ {0, 1}∗ derived from S, we have
that w is of the form 0n 1n .

CFG: Example
Consider the following grammar G = (V, T, P, S) where
– V = {E, I}; T = {a, b, 0, 1, +, ∗, (, )}; S = E; and
– P is defined as

E → I | E + E | E ∗ E | (E)
I → a | b | Ia | Ib | I0 | I1

The string (a1 + b0 ∗ a1) ∈ L(G), i.e. E ⇒∗ (a1 + b0 ∗ a1), as

E ⇒ (E) ⇒ (E + E) ⇒ (I + E) ⇒ (I1 + E) ⇒ (a1 + E) ⇒∗ (a1 + b0 ∗ a1).

E ⇒ (E) ⇒ (E + E) ⇒ (E + E ∗ E) ⇒ (E + E ∗ I) ⇒∗ (a1 + b0 ∗ a1).

Leftmost and rightmost derivations:


1. Derivations are not unique
2. Leftmost and rightmost derivations
3. Define ⇒lm and ⇒rm in straightforward manner.
4. Find leftmost and rightmost derivations of (a1 + b0 ∗ a1).
Exercise

Consider the following grammar:

S → AS | ε
A → aa | ab | ba | bb

Give leftmost and rightmost derivations of the string aabbba.

Parse Trees

– A CFG provides a structure to a string
– Such a structure assigns meaning to the string, and hence a unique
structure is really important in several applications, e.g. compilers
– Parse trees are a successful data structure to represent and store such
structures
– Let’s review the tree terminology:
– A tree is a directed acyclic graph (DAG) where every node has at most
one incoming edge.
– The edge relationship is read as a parent-child relationship
– Every node has at most one parent, and zero or more children
– We assume an implicit order on children (“from left to right”)
– There is a distinguished root node with no parent, while all other nodes
have a unique parent
– There are some nodes with no children, called leaves—other nodes are
called interior nodes
– Ancestor and descendant relationships are the transitive closures of the
parent and child relationships, resp.
Parse Tree

Given a grammar G = (V, T, P, S), the parse trees associated with G have
the following properties:
1. Each interior node is labeled by a variable in V.
2. Each leaf is either a variable, a terminal, or ε. However, if a leaf is ε it is
the only child of its parent.
3. If an interior node is labeled A and has children labeled X1, X2, . . . , Xk
from left to right, then

A → X1 X2 . . . Xk

is a production in P. The only time Xi can be ε is when it is the only child
of its parent, i.e. corresponding to the production A → ε.
Reading exercise

– Give a parse tree representation of the previous derivation exercises.
– Are the leftmost-derivation and rightmost-derivation parse trees always
different?
– Are parse trees unique?
– The answer is no. A grammar is called ambiguous if there is at least one
string with two different leftmost (or rightmost) derivations.
– There are some inherently ambiguous languages, e.g.

L = {a^n b^n c^m d^m : n, m ≥ 1} ∪ {a^n b^m c^n d^m : n, m ≥ 1}.

Write a grammar accepting this language. Show that the string
a^2 b^2 c^2 d^2 has two leftmost derivations.
– There is no algorithm to decide whether a grammar is ambiguous.
– What does that mean from the application side?
In-class Quiz

Write CFGs for the following languages:


1. Strings ending with a 0
2. Strings containing an even number of 1’s
3. Palindromes over {0, 1}
4. L = {a^i b^j : i ≤ 2j} or L = {a^i b^j : i < 2j} or L = {a^i b^j : i ≠ 2j}
5. L = {a^i b^j c^k : i = k}
6. L = {a^i b^j c^k : i = j}
7. L = {a^i b^j c^k : i = j + k}
8. L = {w ∈ {0, 1}∗ : |w|0 = |w|1 }
9. Closure under union, concatenation, and Kleene star
10. Closure under substitution, homomorphism, and reversal
Syntactic Ambiguity in English

[Figure: an example of syntactic ambiguity in English, attributed to Anthony G. Oettinger.]
Context-Free Grammars

Pushdown Automata

Properties of CFLs

Pushdown Automata

[Figure: a PDA with states q0, q1, q2 (q0 the start state, q2 accepting); q0 loops on 0, X ↦ 0X and 1, X ↦ 1X, an ε, X ↦ X move leads to q1, q1 loops on 0, 0 ↦ ε and 1, 1 ↦ ε, and an ε, ⊥ ↦ ⊥ move leads to q2.]

– Introduced independently by Anthony G. Oettinger
in 1961 and by Marcel-Paul Schützenberger in 1963
– Generalization of ε-NFA with a “stack-like” storage
mechanism
– Precisely capture context-free languages
– The deterministic version is not as expressive as the
non-deterministic one
– Applications in program verification and syntax
analysis
Example 1: L = {w w^R : w ∈ {0, 1}∗ }

input tape: 1 1 1 0 0 1 1 1

[Figure: the PDA above run on this input, shown one configuration at a time; in state q0 the first half 1 1 1 0 is pushed onto the pushdown stack, the ε, X ↦ X move takes the automaton to q1, where the second half 0 1 1 1 is matched against the stack and popped, and the ε, ⊥ ↦ ⊥ move takes it to the accepting state q2.]
Pushdown Automata

[Figure: the same PDA diagram as above.]

A pushdown automaton is a tuple (Q, Σ, Γ, δ, q0, ⊥, F) where:

– Q is a finite set called the states;
– Σ is a finite set called the alphabet;
– Γ is a finite set called the stack alphabet;
– δ : Q × (Σ ∪ {ε}) × Γ → 2^(Q×Γ∗) is the transition function;
– q0 ∈ Q is the start state;
– ⊥ ∈ Γ is the start stack symbol;
– F ⊆ Q is the set of accepting states.
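The definition can be exercised with a small simulator. The Python sketch below encodes the ww^R automaton shown above, treating X in a transition label as "any top-of-stack symbol" exactly as in the diagram; the dictionary encoding and helper names are mine.

EPS = ""   # stands for an ε move

# (state, input symbol or EPS, top of stack) -> set of (next state, string pushed in place of the top)
delta = {
    ("q0", "0", "X"): {("q0", "0X")}, ("q0", "1", "X"): {("q0", "1X")},
    ("q0", EPS, "X"): {("q1", "X")},
    ("q1", "0", "0"): {("q1", "")},   ("q1", "1", "1"): {("q1", "")},
    ("q1", EPS, "⊥"): {("q2", "⊥")},
}

def moves(state, a, stack):
    """Successor (state, stack) pairs for one move on a (a may be EPS)."""
    if not stack:
        return
    top, rest = stack[0], stack[1:]
    for key_top in (top, "X"):                       # exact stack symbol, or the wildcard X
        for nstate, push in delta.get((state, a, key_top), set()):
            yield nstate, push.replace("X", top) + rest

def accepts(word, start="q0", finals=("q2",), bottom="⊥"):
    frontier, seen = {(start, word, bottom)}, set()
    while frontier:
        state, w, stack = frontier.pop()
        if (state, w, stack) in seen:
            continue
        seen.add((state, w, stack))
        if not w and state in finals:
            return True
        if w:
            frontier.update((s, w[1:], st) for s, st in moves(state, w[0], stack))
        frontier.update((s, w, st) for s, st in moves(state, EPS, stack))
    return False

print(accepts("11100111"), accepts("0110"), accepts("0100"))   # True True False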
Semantics of a PDA
– Let P = (Q, Σ, Γ, δ, q0, ⊥, F) be a PDA.
– A configuration (or instantaneous description) of a PDA is a triple
(q, w, γ) where
– q is the current state,
– w is the remaining input, and
– γ ∈ Γ∗ is the stack contents, written as a concatenation of symbols
from top to bottom.
– We define the step operator ⊢ such that if (p, α) ∈ δ(q, a, X) then

(q, aw, Xβ) ⊢ (p, w, αβ),

for all w ∈ Σ∗ and β ∈ Γ∗. The operator ⊢∗ is defined as the reflexive
transitive closure of ⊢ in the straightforward manner.
– A run of a PDA P = (Q, Σ, Γ, δ, q0, ⊥, F) over an input word w ∈ Σ∗ is
a sequence of configurations

(q0, w0, β0), (q1, w1, β1), . . . , (qn, wn, βn)

such that for every 0 ≤ i < n we have that
(qi, wi, βi) ⊢ (qi+1, wi+1, βi+1), and (q0, w0, β0) = (q0, w, ⊥).
Semantics: acceptance via final states

1. We say that a run

(q0 , w0 , β0 ), (q1 , w1 , β1 ), . . . , (qn , wn , βn )

is accepted via final state if qn ∈ F and wn = ε.


2. We say that a word w is accepted via final states if there exists a run of
P over w that is accepted via final state.
3. We write L(P) for the set of words accepted via final states.
4. In other words,

L(P) = {w : (q0 , w, ⊥) `∗ (qn , ε, β) and qn ∈ F}.

5. Example: L = {w w^R : w ∈ {0, 1}∗ }, revisited with the notions of configuration,
computation, run, and acceptance.
Semantics: acceptance via empty stack

1. We say that a run

(q0 , w0 , β0 ), (q1 , w1 , β1 ), . . . , (qn , wn , βn )

is accepted via empty stack if βn = ε and wn = ε.


2. We say that a word w is accepted via empty stack if there exists a run
of P over w that is accepted via empty stack.
3. We write N(P) for the set of words accepted via empty stack.
4. In other words

N(P) = {w : (q0 , w, ⊥) `∗ (qn , ε, ε)}.

Is L(P) = N(P)?

Equivalence of both notions

Theorem
For every language defined by a PDA with empty-stack semantics, there exists a
PDA that accepts the same language with final-state semantics, and vice-versa.

Proof.
– Final state to empty stack
– Add a new stack symbol, say ⊥′, as the start stack symbol, and in the
first transition replace it with ⊥⊥′ before reading any symbol.
(How? and Why?)
– From every final state make a transition to a sink state that does not read
the input but empties the stack, including ⊥′.
– Empty stack to final state
– Again add a new start stack symbol ⊥′ and replace it with ⊥⊥′ before
reading any symbol. (Why?)
– From every state, on top-of-stack ⊥′, make a transition to a new unique
final state that does not read the input.
Formal Construction: Empty stack to Final State

Let P = (Q, Σ, Γ, δ, q0, ⊥) be a PDA. We claim that the PDA
P′ = (Q′, Σ, Γ′, δ′, q0′, ⊥′, F′) is such that N(P) = L(P′), where
1. Q′ = Q ∪ {q0′} ∪ {qF}
2. Γ′ = Γ ∪ {⊥′}
3. F′ = {qF}.
4. δ′ is such that
– δ′(q, a, X) = δ(q, a, X) for all q ∈ Q and X ∈ Γ,
– δ′(q0′, ε, ⊥′) = {(q0, ⊥⊥′)}, and
– δ′(q, ε, ⊥′) = {(qF, ⊥′)} for all q ∈ Q.
Formal Construction: Final State to Empty Stack

Let P = (Q, Σ, Γ, δ, q0, ⊥, F) be a PDA. We claim that the PDA
P′ = (Q′, Σ, Γ′, δ′, q0′, ⊥′) is such that L(P) = N(P′), where
1. Q′ = Q ∪ {q0′} ∪ {qF}
2. Γ′ = Γ ∪ {⊥′}
3. δ′ is such that
– δ′(q, a, X) = δ(q, a, X) for all q ∈ Q and X ∈ Γ,
– δ′(q0′, ε, ⊥′) = {(q0, ⊥⊥′)},
– δ′(q, ε, X) = {(qF, ε)} for all q ∈ F and X ∈ Γ ∪ {⊥′}, and
– δ′(qF, ε, X) = {(qF, ε)} for all X ∈ Γ ∪ {⊥′}.
Expressive power of CFG and PDA
Theorem
A language is context-free if and only if some pushdown automaton accepts it.

Proof.
1. For an arbitrary CFG G give a PDA PG such that L(G) = L(PG ).
– Leftmost derivation of a string using the stack
– One state PDA accepting by empty stack
– Proof via a simple induction over size of an accepting run of PDA
2. For an arbitrary PDA P give a CFG GP such that L(P) = L(GP ).
– Modify the PDA so that each step either “pushes” or “pops” a single
symbol, it has a single accepting state, and the stack
is emptied before accepting.
– For every state pair (p, q) of P define a variable Apq in GP generating the
strings that move the PDA from state p to state q, starting and ending with
an empty stack.
– Three kinds of production rules:

Apq → a Ars b and Apq → Apr Arq and App → ε.
From CFGs to PDAs

Given a CFG G = (V, T, P, S) consider PDA PG = ({q}, T, V ∪ T, δ, q, S) s.t.:


– for every a ∈ T we have

δ(q, a, a) = {(q, ε)}, and

– for every variable A ∈ V we have that

δ(q, ε, A) = {(q, β) : A → β is a production of P}.

Then L(G) = N(PG ).

Example. Give the PDA equivalent to the following grammar

I → a | b | Ia | Ib | I0 | I1
E → I | E ∗ E | E + E | (E).

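The construction on this slide is only a few lines of code. The Python sketch below builds the transition map of the one-state PDA from a grammar whose symbols are single characters; the encoding is mine.

def cfg_to_pda(variables, terminals, productions, start):
    EPS = ""
    delta = {}
    for a in terminals:
        delta[("q", a, a)] = {("q", EPS)}          # pop a matching terminal
    for head, body in productions:
        delta.setdefault(("q", EPS, head), set()).add(("q", body))   # expand a variable
    return {"state": "q", "delta": delta, "bottom": start,
            "stack_alphabet": set(variables) | set(terminals)}

# The expression grammar of the example above.
prods = [("E", "I"), ("E", "E+E"), ("E", "E*E"), ("E", "(E)"),
         ("I", "a"), ("I", "b"), ("I", "Ia"), ("I", "Ib"), ("I", "I0"), ("I", "I1")]
pda = cfg_to_pda({"E", "I"}, set("ab01+*()"), prods, "E")
print(len(pda["delta"]))   # one entry per terminal plus one per variable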
From CFGs to PDAs

Theorem
We have that w ∈ N(P) if and only if w ∈ L(G).

Proof.
– (If part). Suppose w ∈ L(G). Then w has a leftmost derivation

S = γ1 ⇒lm γ2 ⇒lm · · · ⇒lm γn = w.

It is straightforward to show by induction on i that

(q, w, S) ⊢∗ (q, yi, αi) where w = xi yi and xi αi = γi.
From CFGs to PDAs

Theorem
We have that w ∈ N(P) if and only if w ∈ L(G).

Proof.
– (Only If part). Suppose w ∈ N(P), i.e. (q, w, S) ⊢∗ (q, ε, ε).
We show that if (q, x, A) ⊢∗ (q, ε, ε) then A ⇒∗ x, by induction over the
number of moves taken by P.
– Base case. x = ε and (q, ε) ∈ δ(q, ε, A). It follows that A → ε is a
production in P.
– Inductive step. Let the first step be A → Y1 Y2 . . . Yk. Let x1 x2 . . . xk be the
parts of the input consumed by the time Y1 . . . Yk is popped off the
stack.
It follows that (q, xi, Yi) ⊢∗ (q, ε, ε), and from the inductive hypothesis we
get that Yi ⇒∗ xi if Yi is a variable, and Yi = xi if Yi is a terminal. Hence,
we conclude that A ⇒∗ x.
From PDAs to CFGs

Given a PDA P = (Q, Σ, Γ, δ, q0, ⊥, {qF}) with the restriction that every
transition either pushes a symbol or pops a symbol from the stack, i.e.
δ(q, a, X) contains either (q′, YX) or (q′, ε).
Consider the grammar GP = (V, T, P, S) such that
– V = {Ap,q : p, q ∈ Q}
– T = Σ
– S = Aq0,qF
– and P has productions of the following form:
– Aq,q → ε for all q ∈ Q;
– Ap,q → Ap,r Ar,q for all p, q, r ∈ Q,
– Ap,q → a Ar,s b if δ(p, a, ε) contains (r, X) and δ(s, b, X) contains (q, ε).
We have that L(GP) = L(P).
From PDAs to CFGs
Theorem
If Ap,q ⇒∗ x then x can bring the PDA P from state p on empty stack to state q on
empty stack.

Proof.
We prove this theorem by induction on the number of steps in the
derivation of x from Ap,q .
– Base case. If Ap,q ⇒∗ x in one step, then the only rule that can
generate a variable free string in one step is Ap,p → ε.
– Inductive step. If Ap,q ⇒∗ x in n + 1 steps. The first step in the
derivation must be Ap,q → Ap,r Ar,q or Ap,q → a Ar,s b.
– If it is Ap,q → Ap,r Ar,q , then the string x can be broken into two parts x1 x2
such that Ap,r ⇒∗ x1 and Ar,q ⇒∗ x2 in at most n steps. The theorem
easily follows in this case.
– If it is Ap,q → aAr,s b, then the string x can be broken as ayb such that
Ar,s ⇒∗ y in n steps. Notice that from p, on reading a, the PDA pushes a
symbol X onto the stack, while it pops X in state s and goes to q.
From PDAs to CFGs
Theorem
If x can bring the PDA P from state p on empty stack to state q on empty stack,
then Ap,q ⇒∗ x.

Proof.
We prove this theorem by induction on the number of steps the PDA takes
on x to go from p on empty stack to q on empty stack.
– Base case. If the computation has 0 steps, then it begins and ends with
the same state and reads ε from the tape. Note that Ap,p ⇒∗ ε since
Ap,p → ε is a rule in P.
– Inductive step. Suppose the computation takes n + 1 steps. To keep the stack
empty, the first step must be a “push” move, while the last step must
be a “pop” move. There are two cases to consider:
– The symbol pushed in the first step is the symbol popped in the last step.
– The symbol pushed in the first step has been popped somewhere in the
middle.
Context-Free Grammars

Pushdown Automata

Properties of CFLs

Deterministic Pushdown Automata

A PDA P = (Q, Σ, Γ, δ, q0 , ⊥, F) is deterministic if


– δ(q, a, X) has at most one member for every q ∈ Q, a ∈ Σ or a = ε, and
X ∈ Γ.
– If δ(q, a, X) is nonempty for some a ∈ Σ then δ(q, ε, X) must be empty.
Example. L = {0^n 1^n : n ≥ 1}.

Theorem
Every regular language can be accepted by a deterministic pushdown automaton
that accepts by final states.

Theorem (DPDA ≠ PDA)

There are some CFLs, for instance {w w^R : w ∈ {0, 1}∗}, that cannot be accepted by a DPDA.
Chomsky Normal Form

A context-free grammar (V, T, P, S) is in Chomsky Normal Form if every
rule is of the form

A → BC
A → a.

where A, B, C are variables, and a is a terminal. Also, the start variable
S must not appear on the right-hand side of any rule, and we also permit the
rule S → ε.
Theorem
Every context-free language is generated by a CFG in Chomsky normal form.

Reading Assignment: How to convert an arbitrary CFG to Chomsky
Normal Form.
Pumping Lemma for CFLs

Theorem
For every context-free language L there exists a constant p (that depends on L)
such that
for every string z ∈ L of length greater or equal to p,
there is an infinite family of strings belonging to L.

Why? Think parse trees!

Let L be a CFL. Then there exists a constant n such that if z is a string in L of
length at least n, then we can write z = uvwxy such that
– |vwx| ≤ n,
– vx ≠ ε,
– for all i ≥ 0 the string u v^i w x^i y ∈ L.
Pumping Lemma for CFLs

Theorem
Let L be a CFL. Then there exists a constant n such that if z is a string in L of
length at least n, then we can write z = uvwxy such that i) |vwx| ≤ n, ii) vx ≠ ε,
and iii) for all i ≥ 0 the string u v^i w x^i y ∈ L.

– Let G be a CFG accepting L. Let b be an upper bound on the size of
the RHS of any production rule of G.
– What is the upper bound on the length of strings in L with a parse tree of
height ℓ + 1? Answer: b^ℓ.
– Let N = |V| be the number of variables in G.
– What can we say about the strings z in L of size greater than b^N?
– Answer: in every parse tree of z there must be a path on which a variable
repeats.
– Consider a minimum-size parse tree generating z, consider a path on
which at least one variable repeats, and consider the last such variable.
– Justify the conditions of the pumping lemma.
Applying Pumping Lemma
Theorem (Pumping Lemma for Context-free Languages)
L ⊆ Σ∗ is a context-free language
=⇒
there exists p ≥ 1 such that
for all strings z ∈ L with |z| ≥ p we have that
there exist u, v, w, x, y ∈ Σ∗ with z = uvwxy, |vx| > 0, |vwx| ≤ p such that
for all i ≥ 0 we have that
u v^i w x^i y ∈ L.

Pumping Lemma (Contrapositive)

For all p ≥ 1 we have that
there exists a string z ∈ L with |z| ≥ p such that
for all u, v, w, x, y ∈ Σ∗ with z = uvwxy, |vx| > 0, |vwx| ≤ p we have that
there exists i ≥ 0 such that
u v^i w x^i y ∉ L.
=⇒
L ⊆ Σ∗ is not a context-free language.
Example

Prove that the following languages are not context-free:

1. L = {0^n 1^n 2^n : n ≥ 0}
2. L = {0^i 1^j 2^k : 0 ≤ i ≤ j ≤ k}
3. L = {ww : w ∈ {0, 1}∗ }.
4. L = {0^n : n is a prime number}.
5. L = {0^n : n is a perfect square}.
6. L = {0^n : n is a perfect cube}.
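A sketch for the first language, as a sample application of the lemma: suppose L = {0^n 1^n 2^n : n ≥ 0} were context-free with pumping constant p, and take z = 0^p 1^p 2^p. In any factorization z = uvwxy with |vwx| ≤ p and vx ≠ ε, the window vwx can touch at most two of the three blocks, so pumping to u v^2 w x^2 y increases the number of at most two of the symbols 0, 1, 2 and leaves the third count unchanged. The counts are then no longer equal, so the pumped string is not in L, contradicting the lemma.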
Closure Properties

Theorem
Context-free languages are closed under the following operations:
1. Union
2. Concatenation
3. Kleene closure
4. Homomorphism
5. Substitution
6. Inverse-homomorphism
7. Reverse
Reading Assignment: Proof of closure under these operations.

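For example, closure under union is immediate at the level of grammars: given CFGs with start symbols S1 and S2 (variables renamed apart), add a fresh start symbol S and the rule S → S1 | S2. Concatenation and Kleene star work the same way with S → S1 S2 and S → S S1 | ε, respectively.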
Intersection and Complementation
Theorem
Context-free languages are not closed under intersection and complementation.

Proof.
– Consider the languages

L1 = {0^n 1^n 2^m : n, m ≥ 0}, and

L2 = {0^m 1^n 2^n : n, m ≥ 0}.

– Both languages are CFLs.
– What is L1 ∩ L2?
– L1 ∩ L2 = {0^n 1^n 2^n : n ≥ 0}, and it is not a CFL.
– Hence CFLs are not closed under intersection.
– Use De Morgan's laws to prove non-closure under complementation.
CSE 322 - Introduction to Formal Methods in Computer Science
Chomsky Normal Form
Dave Bacon
Department of Computer Science & Engineering, University of Washington

A useful form for dealing with context-free grammars is the Chomsky normal form. This is a particular form of
writing a CFG which is useful for understanding CFGs and for proving things about them. It also makes the parse
tree for derivations using this form of the CFG a binary tree. And as a CS major, I know you really love binary trees!
So what is Chomsky normal form? A CFG is in Chomsky normal form when every rule is of the form A → BC
or A → a, where a is a terminal, and A, B, and C are variables. Further, B and C are not the start variable.
Additionally we permit the rule S → ε, where S is the start variable, for technical reasons. Note that this means that
we allow S → ε as one of many possible rules.
Okay, so if this is the Chomsky normal form, what is it good for? Well, as a first fact, note that parse trees for a
derivation using such a grammar will be binary trees. That's nice. It will help us down the road. Okay, so if it might
be good for something, we can ask the natural question: is it possible to convert an arbitrary CFG into an equivalent
grammar which is in Chomsky normal form? The answer, it turns out, is yes. Let's see how such a conversion
would proceed.

A. A new start variable

The first step is simple! We just add a new start variable S0 and the rule S0 → S where S is the original start
variable. By doing this we guarantee that the start variable doesn’t occur on the right hand side of a rule.

B. Eliminate the ε rules

Next we remove the ε rules. We do this as follows. Suppose we are removing the ε rule A → ε. We remove this rule.
But now we have to “fix” the rules which have an A on their right-hand side. We do this by, for each occurrence of A
on the right-hand side, adding a rule (from the same starting variable) which has that A removed. Further, if A is the
only thing occurring on the right-hand side, we replace this A with ε. Of course this latter step will have created a
new ε rule. So we do this unless we have previously removed A → ε. But onward we press: simply repeat the above
process over and over again until all ε rules have been removed.
For example, suppose our rules contain the rule A → ε and the rule B → uAv where u and v are not both the
empty string. First we remove A → ε. Then we add the rule B → uv. (Make sure that you don't delete
the original rule B → uAv.) If, on the other hand, we had the rule A → ε and B → A, then we would remove the
A → ε and replace the rule B → A with the rule B → ε. Of course we now have to eliminate this rule via the same
procedure.

C. Remove the unit rules

Next we need to remove the unit rules. If we have the rule A → B, then whenever the rule B → u appears, we will
add the rule A → u (unless this rule was already replaced.) Again we do this repeatedly until we eliminate all unit
rules.

D. Take care of rules with more than two symbols on the right-hand side

At this point we have converted our CFG to one which has no ε rules, and where all rules are either of the
form variable goes to terminal, or of the form variable goes to a string of variables and terminals with two or more
symbols. The rules of the first kind are already of the appropriate Chomsky normal form. To convert the remaining rules to proper
form, we introduce extra variables. In particular, suppose A → u1 u2 . . . un where n > 2. Then we convert this to a
set of rules, A → u1 A1 , A1 → u2 A2 , . . ., An−2 → un−1 un . Now we need to take care of the rules with two elements
on the right-hand side. If both of the elements are variables, then we are fine. But if any of them are terminals, we
add a new variable and a new rule to take care of these. For example, if we have A → u1 B where u1 is a terminal,
then we replace this by A → U1 B and U1 → u1 .
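The rule-splitting just described (often called the BIN step) is mechanical enough to write down. Here is a minimal Python sketch; the grammar representation (a dict from variables to lists of right-hand-side tuples) and the helper fresh_var are assumptions made only for this illustration.

# Minimal sketch of the BIN step: break every rule with more than two
# symbols on the right-hand side into a chain of two-symbol rules.
def bin_step(grammar):
    """grammar: dict variable -> list of RHS tuples. Returns a new grammar."""
    counter = [0]
    def fresh_var(base):
        counter[0] += 1
        return f"{base}_{counter[0]}"       # assumed not to clash with existing names

    new_grammar = {}
    for var, rhss in grammar.items():
        new_grammar.setdefault(var, [])
        for rhs in rhss:
            head = var
            while len(rhs) > 2:
                nxt = fresh_var(var)
                new_grammar.setdefault(nxt, [])
                # head -> rhs[0] nxt, and nxt continues with the rest of rhs
                new_grammar[head].append((rhs[0], nxt))
                head, rhs = nxt, rhs[1:]
            new_grammar[head].append(rhs)
    return new_grammar

# Example: A -> u1 u2 u3 u4 becomes A -> u1 A_1, A_1 -> u2 A_2, A_2 -> u3 u4
print(bin_step({"A": [("u1", "u2", "u3", "u4")]}))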

I. EXAMPLE CONVERSION TO CHOMSKY NORMAL FORM

Lets work out an example. Consider the grammar

S → ASB
A → aAS|a|ε
B → SbS|A|bb

First we add a new start variable:

S0 → S
S → ASB
A → aAS|a|ε
B → SbS|A|bb

Next we need to eliminate the ε rules. Eliminating A → ε yields

S0 → S
S → ASB|SB
A → aAS|a|aS
B → SbS|A|bb|ε

Now we have a new ε rule, B → ε. Let's remove it:

S0 → S
S → ASB|SB|S|AS
A → aAS|a|aS
B → SbS|A|bb

Next we need to remove all unit rules. Let's begin by removing B → A:

S0 → S
S → ASB|SB|S|AS
A → aAS|a|aS
B → SbS|bb|aAS|a|aS

Next let's remove S → S:

S0 → S
S → ASB|SB|AS
A → aAS|a|aS
B → SbS|bb|aAS|a|aS

Further we can eliminate S0 → S:

S0 → ASB|SB|AS
S → ASB|SB|AS
A → aAS|a|aS
B → SbS|bb|aAS|a|aS

Now we need to take care of the rules with more than two symbols on the right-hand side. First replace S0 → ASB by S0 → AU1 and
U1 → SB:
S0 → AU1 |SB|AS
S → ASB|SB|AS
A → aAS|a|aS
B → SbS|bb|aAS|a|aS
U1 → SB
Next eliminate S → ASB in a similar form (technically we could reuse U1 , but lets not):
S0 → AU1 |SB|AS
S → AU2 |SB|AS
A → aAS|a|aS
B → SbS|bb|aAS|a|aS
U1 → SB
U2 → SB
Onward and upward, now fix A → aAS by introducing A → aU3 and U3 → AS.
S0 → AU1 |SB|AS
S → AU2 |SB|AS
A → aU3 |a|aS
B → SbS|bb|aAS|a|aS
U1 → SB
U2 → SB
U3 → AS
Finally, fix the two remaining B rules:
S0 → AU1 |SB|AS
S → AU2 |SB|AS
A → aU3 |a|aS
B → SU4 |bb|aU5 |a|aS
U1 → SB
U2 → SB
U3 → AS
U4 → bS
U5 → AS
Finally we need to work with the rules which have terminals and variables or two terminals. We need to introduce
new variables for these. Let these be V1 → a and V2 → b:
S0 → AU1 |SB|AS
S → AU2 |SB|AS
A → V1 U3 |a|V1 S
B → SU4 |V2 V2 |V1 U5 |a|V1 S
U1 → SB
U2 → SB
U3 → AS
U4 → V2 S
U5 → AS
V1 → a
V2 → b

A quick examination shows us that we have ended up with a grammar in Chomsky normal form. (This can, of course,
be simplified.)
CS 301
Lecture 10 – Chomsky Normal Form

1 / 23
More CFLs
• A = {a^i b^j c^k ∣ i ≤ j or i = k}
• B = {w ∣ w ∈ {a, b, c}^∗ and w contains the same number of a's as b's and c's combined}
• C = {1^m + 1^n = 1^(m+n) ∣ m, n ≥ 1}; Σ = {1, +, =}
• D = (abb ∣ bbaa)^∗
• E = {w ∣ w ∈ {0, 1}^∗ and w^R is a binary number not divisible by 5}

2 / 23
Another proof that regular languages are context-free
We can encode the computation of a DFA on a string using a CFG

Given a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG


G = (V, Σ, R, S) where
• states of M are variables in G
• q0 is the start variable, and
• transitions δ(q, t) = r become rules q → tr

If on input w = w1 w2 ⋯wn , M goes through states r0 , r1 , . . . , rn , then

r0 ⇒ w1 r1 ⇒ w1 w2 r2 ⇒ ⋯ ⇒ w1 w2 ⋯wn rn

So G has derived the string wrn but this still has a variable

What additional rules should we add to end up with a string of terminals?


For each state q ∈ F , add a rule q → ε

3 / 23
Formally
Proof.
Given a DFA M = (Q, Σ, δ, q0 , F ), we can construct an equivalent CFG
G = (V, Σ, R, S) where

V =Q
S = q0
R = {q → tr ∶ δ(q, t) = r} ∪ {q → ε ∶ q ∈ F }

If r0 , r1 , . . . , rn is the computation of M on input w = w1 w2 ⋯wn , then r0 = q0 and


δ(ri−1 , wi ) = ri for 1 ≤ i ≤ n


By construction r0 ⇒ w1 r1 ⇒ w1 w2 r2 ⇒ w1 w2 ⋯wn rn

Therefore, w ∈ L(M ) iff rn ∈ F iff rn ⇒ ε iff q0 ⇒ w iff w ∈ L(G)

4 / 23
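A quick Python sketch of this DFA-to-CFG construction (the DFA encoding below, a transition dict keyed by (state, symbol), is an assumption made for the illustration):

# Minimal sketch: build a right-linear CFG from a DFA.
# DFA: states, alphabet, delta (dict (state, symbol) -> state), start state, accepting set.
def dfa_to_cfg(states, alphabet, delta, q0, accepting):
    """Return (rules, start_variable); rules maps a state/variable to a list of RHS tuples."""
    rules = {q: [] for q in states}
    for (q, t), r in delta.items():
        rules[q].append((t, r))          # q -> t r   for delta(q, t) = r
    for q in accepting:
        rules[q].append(())              # q -> ε     for accepting states
    return rules, q0

# Tiny example: binary strings with an even number of 1s
delta = {("e", "0"): "e", ("e", "1"): "o", ("o", "0"): "o", ("o", "1"): "e"}
rules, start = dfa_to_cfg({"e", "o"}, {"0", "1"}, delta, "e", {"e"})
print(start, rules)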
Returning to our language

E = {w ∣ w ∈ {0,1}^∗ and w^R is a binary number not divisible by 5}

[DFA transition diagram over states Q0–Q4 omitted; the corresponding grammar:]
Q0 → 0Q0 ∣ 1Q2
Q1 → 0Q3 ∣ 1Q0 ∣ ε
Q2 → 0Q1 ∣ 1Q3 ∣ ε
Q3 → 0Q4 ∣ 1Q1 ∣ ε
Q4 → 0Q2 ∣ 1Q4 ∣ ε

5 / 23
Chomsky Normal Form (CNF)
A CFG G = (V, Σ, R, S) is in Chomsky Normal Form if all rules have one of these
forms
• S→ε where S is the start variable
• A → BC where A ∈ V and B, C ∈ V ∖ {S}
• A→t where A ∈ V and t ∈ Σ

Note
• The only rule with ε on the right has the start variable on the left
• The start variable doesn’t appear on the right hand side of any rule

6 / 23
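To make the definition concrete, here is a small Python predicate that checks the three CNF rule shapes. The grammar representation (a dict from variables to lists of right-hand-side tuples) is an assumption of this sketch.

# Minimal sketch: check the three CNF rule shapes.
def is_cnf(grammar, start, variables, terminals):
    for var, rhss in grammar.items():
        for rhs in rhss:
            if rhs == ():                                   # A -> ε: only allowed for the start variable
                if var != start:
                    return False
            elif len(rhs) == 1:                             # A -> t
                if rhs[0] not in terminals:
                    return False
            elif len(rhs) == 2:                             # A -> BC, B and C not the start variable
                if not all(s in variables and s != start for s in rhs):
                    return False
            else:
                return False
    return True

# The palindrome grammar from the next slide:
g = {"S": [("A","U"), ("B","V"), ("a",), ("b",), ()],
     "T": [("A","U"), ("B","V"), ("a",), ("b",)],
     "U": [("T","A")], "V": [("T","B")], "A": [("a",)], "B": [("b",)]}
print(is_cnf(g, "S", {"S","T","U","V","A","B"}, {"a","b"}))   # True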
CNF example
Let A = {w ∣ w ∈ {a, b}^∗ and w = w^R}.

CFG in CNF Derivation of baaab

S → AU ∣ BV ∣ a ∣ b ∣ ε S ⇒ BV
T → AU ∣ BV ∣ a ∣ b ⇒ bV
U → TA ⇒ bT B
V → TB ⇒ bAU B
A→a ⇒ baU B
B→b ⇒ baT AB
⇒ baaAB
⇒ baaaB
⇒ baaab

7 / 23
Converting to CNF
Theorem
Every context-free language A is generated by some CFG in CNF.

Proof.
Given a CFG G = (V, Σ, R, S) generating A, we construct a new CFG
G′ = (V′, Σ, R′, S′) in CNF generating A.

There are five steps.


START Add a new start variable
BIN Replace rules with RHS longer than two with multiple rules each of
which has a RHS of length two
DEL-ε Remove all ε-rules (A → ε)
UNIT Remove all unit-rules (A → B)
TERM Add a variable and rule for each terminal (T → t) and replace terminals
on the RHS of rules

8 / 23
Proof continued
In the following x ∈ V ∪ Σ and u ∈ (Σ ∪ V)^+

START Add a new start variable S′ and a rule S′ → S
BIN Replace each rule A → xu with the rules A → xA1 and A1 → u and
repeat until the RHS of every rule has length at most two
DEL-ε For each rule of the form A → ε other than S → ε remove A → ε and
update all rules with A in the RHS
• B → A. Add rule B → ε unless B → ε has already been removed
• B → AA. Add rule B → A and, if B → ε has not already been removed, add it
• B → xA or B → Ax. Add rule B → x
UNIT For each rule A → B, remove it and add rules A → u for each B → u,
unless A → u is a unit rule already removed
TERM For each t ∈ Σ, add a new variable T and a rule T → t; replace each t in
the RHS of nonunit rules with T
Each of the five steps preserves the language generated by the grammar, so
L(G′) = A.

9 / 23
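The trickiest of these steps in practice is DEL-ε, because removing one ε-rule can create another. A common way to implement it is to first compute the set of nullable variables by a fixed point. A minimal Python sketch, using the same assumed grammar representation as the earlier snippets:

# Minimal sketch: compute the nullable variables of a grammar,
# i.e. the variables that can derive ε. Used when removing ε-rules.
def nullable_variables(grammar):
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, rhss in grammar.items():
            if var in nullable:
                continue
            for rhs in rhss:
                # rhs derives ε if every symbol in it is already known nullable
                if all(sym in nullable for sym in rhs):   # true for rhs == ()
                    nullable.add(var)
                    changed = True
                    break
    return nullable

# The example from the next slides: A -> BAB | B | ε, B -> 00 | ε
g = {"A": [("B","A","B"), ("B",), ()], "B": [("0","0"), ()]}
print(nullable_variables(g))   # {'A', 'B'}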
Example
Convert to CNF:
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

START:
S→A
A → BAB ∣ B ∣ ε
B → 00 ∣ ε

BIN: Replace A → BAB:
S→A
A → BA1 ∣ B ∣ ε
B → 00 ∣ ε
A1 → AB

DEL-ε: Remove A → ε:
S→A∣ε
A → BA1 ∣ B
B → 00 ∣ ε
A1 → AB ∣ B

Remove B → ε:
S→A∣ε
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A ∣ ε
Don't add A → ε because we already removed it

Remove A1 → ε:
S→A∣ε
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A
Don't add A → ε because we already removed it

UNIT: Remove S → A:
S → BA1 ∣ B ∣ A1 ∣ ε
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

10 / 23
Example continued
From previous slide:
S → BA1 ∣ B ∣ A1 ∣ ε
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

Remove S → B:
S → BA1 ∣ A1 ∣ ε ∣ 00
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A

Remove S → A1:
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ B ∣ A1
B → 00
A1 → AB ∣ B ∣ A
Don't add S → B or S → A because we removed them

Remove A → B:
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ A1 ∣ 00
B → 00
A1 → AB ∣ B ∣ A

Remove A → A1:
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ B ∣ A
Don't add A → B because we removed it
Don't add A → A because it's useless

Remove A1 → B:
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ A ∣ 00
11 / 23
Example continued
Copied from the previous slide:
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ A ∣ 00

Remove A1 → A:
S → BA1 ∣ ε ∣ 00 ∣ AB
A → BA1 ∣ 00 ∣ AB
B → 00
A1 → AB ∣ 00 ∣ BA1

TERM: Add Z → 0:
S → BA1 ∣ ε ∣ ZZ ∣ AB
A → BA1 ∣ ZZ ∣ AB
B → ZZ
A1 → AB ∣ ZZ ∣ BA1
Z→0

12 / 23
Caution
Sipser gives a different procedure
1 START
2 DEL-ε
3 UNIT
4 BIN
5 TERM
This procedure works but can lead to an exponential blow up in the number of rules!

In general, if DEL-ε comes before BIN, then ∣G′∣ is O(2^∣G∣);
if BIN comes before DEL-ε, then ∣G′∣ is O(∣G∣^2)

UNIT is responsible for the quadratic blow up

So use whichever procedure you'd like, but Sipser's can be very bad
(Sipser's is bad if you have long rules with lots of variables with ε-rules)

13 / 23
Example blow up

A → BCDEEDCB ∣ CBEDDEBC
B→0∣ε
C→1∣ε
D→2∣ε
E→3∣ε

has five variables and 10 rules

Converting using START, BIN, DEL-ε, UNIT, TERM gives a CFG with 18 variables
and 125 rules

Converting using START, DEL-ε, UNIT, BIN, TERM gives a CFG with 1394 variables
and 1953 rules

14 / 23
Prefix
Recall Prefix(L) = {w ∣ for some x ∈ Σ^∗, wx ∈ L}

Theorem
The class of context-free languages is closed under Prefix.

Proof idea
Consider the language {w#w^R ∣ w ∈ {a, b}^∗} generated by

T → aT a ∣ bT b ∣ #
Let’s convert to CNF
S → AU ∣ BV ∣ #
T → AU ∣ BV ∣ #
U → TA
V → TB
A→a
B→b

15 / 23
Derivation of ab#ba

S ⇒ AU
⇒ aU
⇒ aT A
⇒ aBV A
⇒ abV A
⇒ abT BA
⇒ ab#BA
⇒ ab#bA
⇒ ab#ba

[Parse tree for ab#ba omitted.] The prefix ab# includes
– all terminals from subtrees with a blue root;
– some terminals from subtrees with a violet root;
– no terminals from subtrees with a red root
16 / 23
Desired derivation for the prefix
We would like a derivation like this:
S ⇒ AU
⇒ aU
⇒ aT A
⇒ aBV A
⇒ abV A
⇒ abT BA
⇒ ab#BA
⇒ ab#εA
⇒ ab#εε

[Parse tree omitted.] Everything left of the violet path is produced.
Everything right of the violet path becomes ε.
The leaf connected to the violet path is produced.
17 / 23
The proof idea
The violet path corresponds to the point where we “split” the prefix from the
remainder of the string

We want to construct a CFG that keeps track of whether a given variable in the
derivation is
L left of the split,
S part of the split, or
R right of the split
We can construct a new CFG whose variables are ⟨A, L⟩, ⟨A, S⟩, or ⟨A, R⟩ where A is
a variable in the original CFG

We have to deal with the three types of rules


• S→ε
• A → BC
• A→t
and produce new rules corresponding to the variable on the LHS being left of, right of,
or on the split
18 / 23
Proof
If L = ∅, then Prefix(L) = ∅ which is CF.

Otherwise, let L be CF and generated by the CFG G = (V, Σ, R, S) in CNF.

Construct a new CFG (not in CNF) G′ = (V′, Σ, R′, S′) where

V′ = {⟨A, D⟩ ∣ A ∈ V and D ∈ {L, S, R}}

S′ = ⟨S, S⟩

Now we just need to specify R′. We'll start with R′ = ∅ and add rules to it

19 / 23
Proof continued
Since L is nonempty, ε ∈ Prefix(L), so add the rule ⟨S, S⟩ → ε to R′

For each rule of the form A → BC in R, add the following rules to R′
⟨A, L⟩ → ⟨B, L⟩⟨C, L⟩                          left of the split
⟨A, S⟩ → ⟨B, L⟩⟨C, S⟩ ∣ ⟨B, S⟩⟨C, R⟩           one of B or C is on the split
⟨A, R⟩ → ⟨B, R⟩⟨C, R⟩                          right of the split

For each rule of the form A → t in R, add the following rules to R′
⟨A, L⟩ → t
⟨A, S⟩ → t
⟨A, R⟩ → ε

20 / 23
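A small Python sketch of this construction on a CNF grammar. The tuple-based grammar encoding and the variable pairs ("A","L"), ("A","S"), ("A","R") are choices made for this sketch, not notation from the slides.

# Minimal sketch: build a grammar for Prefix(L) from a CNF grammar for L.
# CNF grammar: dict variable -> list of RHS tuples, each () / (t,) / (B, C).
def prefix_grammar(grammar, start, variables):
    rules = {(start, "S"): [()]}                # <S,S> -> ε, since L is nonempty
    for A, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) == 2 and all(s in variables for s in rhs):
                B, C = rhs                      # rule A -> B C
                rules.setdefault((A, "L"), []).append(((B, "L"), (C, "L")))
                rules.setdefault((A, "S"), []).append(((B, "L"), (C, "S")))
                rules.setdefault((A, "S"), []).append(((B, "S"), (C, "R")))
                rules.setdefault((A, "R"), []).append(((B, "R"), (C, "R")))
            elif len(rhs) == 1:                 # rule A -> t
                t = rhs[0]
                rules.setdefault((A, "L"), []).append((t,))
                rules.setdefault((A, "S"), []).append((t,))
                rules.setdefault((A, "R"), []).append(())
    return rules, (start, "S")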
Proof continued

For each w = w1 w2 ⋯wn ∈ L, S ⇒* A1 A2 ⋯An where Ai ⇒* wi
By construction,

⟨S, S⟩ ⇒* ⟨A1 , L⟩⋯⟨Ai−1 , L⟩⟨Ai , S⟩⟨Ai+1 , R⟩⋯⟨An , R⟩
        ⇒* w1 w2 ⋯wi

for each 1 ≤ i ≤ n

I.e., G′ derives the prefix of every string in L

A similar argument works to show that if G′ derives a string then it's a prefix of some string in L

21 / 23
Applying the construction
Deriving ab#:
⟨S, S⟩ ⇒ ⟨A, L⟩⟨U, S⟩
⇒ a⟨U, S⟩
⇒ a⟨T, S⟩⟨A, R⟩
⇒ a⟨B, L⟩⟨V, S⟩⟨A, R⟩
⇒ ab⟨V, S⟩⟨A, R⟩
⇒ ab⟨T, S⟩⟨B, R⟩⟨A, R⟩
⇒ ab#⟨B, R⟩⟨A, R⟩
⇒ ab#⟨A, R⟩
⇒ ab#
[Parse tree omitted.]
22 / 23
Similarities with regular expression
Proving things about
• Regular languages. Assume there exists a regular expression that generates the
language and consider the six cases
• Context-free languages. Assume there exists a CFG that generates the language
and consider the three types of rules

23 / 23
Pushdown Automata (PDA)
Reading: Chapter 6

1
PDA - the automata for CFLs
 What is a PDA? As FA is to regular languages, PDA is to CFLs
 PDA == [ ε-NFA + “a stack” ]
 Why a stack?

[Diagram: an input string fed to an ε-NFA that accepts/rejects, augmented with a stack filled with “stack symbols”]

2
Pushdown Automata - Definition
 A PDA P := ( Q, ∑, Γ, δ, q0, Z0, F ):
 Q: states of the ε-NFA
 ∑: input alphabet
 Γ: stack symbols
 δ: transition function
 q0: start state
 Z0: initial stack top symbol
 F: final/accepting states

3
δ : The Transition Function
δ : Q x ∑ x Γ => Q x Γ*
(old state, input symbol, stack top) => (new state(s), new stack top(s))

δ(q,a,X) = {(p,Y), …}
1. state transition from q to p
2. a is the next input symbol
3. X is the current stack top symbol
4. Y is the replacement for X; it is in Γ* (a string of stack symbols)
   i.   If Y = ε: Pop(X)
   ii.  If Y = X: stack top is unchanged (Pop(X), Push(X))
   iii. If Y = Z1Z2…Zk: X is popped and is replaced by Y in reverse order,
        i.e., Pop(X), Push(Zk), Push(Zk-1), …, Push(Z2), Push(Z1), so Z1 will be the new stack top

4
Example
Let Lwwr = {ww^R | w is in (0+1)*}
 CFG for Lwwr : S ==> 0S0 | 1S1 | ε
 PDA for Lwwr :
 P := ( Q, ∑, Γ, δ, q0, Z0, F )
   = ( {q0, q1, q2}, {0,1}, {0,1,Z0}, δ, q0, Z0, {q2} )

5
PDA for Lwwr
Initial state of the PDA: state q0, stack top Z0

1. δ(q0,0,Z0) = {(q0,0Z0)}      First symbol pushed on stack
2. δ(q0,1,Z0) = {(q0,1Z0)}
3. δ(q0,0,0) = {(q0,00)}
4. δ(q0,0,1) = {(q0,01)}
5. δ(q0,1,0) = {(q0,10)}        Grow the stack by pushing new symbols
6. δ(q0,1,1) = {(q0,11)}        on top of old (w-part)
7. δ(q0,ε,0) = {(q1,0)}
8. δ(q0,ε,1) = {(q1,1)}         Switch to popping mode, nondeterministically
9. δ(q0,ε,Z0) = {(q1,Z0)}       (boundary between w and wR)
10. δ(q1,0,0) = {(q1,ε)}        Shrink the stack by popping matching
11. δ(q1,1,1) = {(q1,ε)}        symbols (wR-part)
12. δ(q1,ε,Z0) = {(q2,Z0)}      Enter acceptance state
6
PDA as a state diagram
δ(qi,a,X) = {(qj,Y)} is drawn as an edge from the current state qi to the next state qj labeled “a, X / Y”
(a = next input symbol, X = current stack top, Y = replacement for X, a string of stack symbols)

7
PDA for Lwwr: Transition Diagram
∑ = {0, 1}, Γ = {Z0, 0, 1}, Q = {q0, q1, q2}
 Grow stack (loops on q0): 0,Z0/0Z0   1,Z0/1Z0   0,0/00   0,1/01   1,0/10   1,1/11
 Switch to popping mode (q0 → q1): ε,Z0/Z0   ε,0/0   ε,1/1
 Pop stack for matching symbols (loops on q1): 0,0/ε   1,1/ε
 Go to acceptance (q1 → q2): ε,Z0/Z0
This would be a non-deterministic PDA


Example
p 2: language
g g of
balanced paranthesis
Pop stack for ∑ = { (, ) }
matching symbols = {Z0, ( }
Grow stack
Q = {q0,qq1,qq2}
(, Z0 / ( Z0
(, ( / ( ( ), ( / 

q0 q1 q2
, Z0 / Z0 ), ( /  , Z0 / Z0
, Z0 / Z0 Go to acceptance (by
G (b fi
finall state))
Switch to when you see the stack bottom symbo
(, ( / ( (
popping mode
(, Z0 / ( Z0

To allow adjacent
blocks of nested paranthesis 9
Example 2: language of balanced parenthesis (another design)
∑ = { (, ) }, Γ = { Z0, ( }, Q = { q0, q1 }
 Loops on q0: (,Z0/(Z0   (,(/((   ),(/ε
 ε,Z0/Z0 transitions between q0 and the accepting state q1

10
PDA's Instantaneous Description (ID)
A PDA has a configuration at any given instance: (q,w,γ)
 q - current state
 w - remainder of the input (i.e., unconsumed part)
 γ - current stack contents as a string from top to bottom of stack
If δ(q,a,X) = {(p,A)} is a transition, then the following are also true:
 (q, a, X) |--- (p, ε, A)
 (q, aw, XB) |--- (p, w, AB)
The |--- sign is called a “turnstile notation” and represents one move
The |---* sign represents a sequence of moves
11
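The turnstile relation is easy to mechanize. Below is a minimal Python sketch of one move of a (nondeterministic) PDA; the encoding of configurations as (state, remaining input, stack string) follows the ID notation above, while the dict-based δ is an assumption of the sketch.

# Minimal sketch: all successor configurations of a PDA ID (q, w, stack).
# delta: dict (state, input_symbol_or_"", stack_top) -> set of (state, push_string)
def step(delta, config):
    q, w, stack = config
    succs = []
    if not stack:
        return succs                      # empty stack: no move possible
    X, rest = stack[0], stack[1:]
    # ε-moves: consume no input
    for p, Y in delta.get((q, "", X), ()):
        succs.append((p, w, Y + rest))
    # moves that consume the next input symbol
    if w:
        for p, Y in delta.get((q, w[0], X), ()):
            succs.append((p, w[1:], Y + rest))
    return succs

# One move of the Lwwr PDA on input "11": guess the boundary, or push the first 1.
delta = {("q0", "1", "Z0"): {("q0", "1Z0")}, ("q0", "", "Z0"): {("q1", "Z0")}}
print(step(delta, ("q0", "11", "Z0")))
# [('q1', '11', 'Z0'), ('q0', '1', '1Z0')]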
How does the PDA for Lwwr work on input “1111”?
The non-deterministic PDA explores a tree of IDs; most branches die.
The accepting branch:
(q0,1111,Z0) |--- (q0,111,1Z0) |--- (q0,11,11Z0) |--- (q1,11,11Z0) |--- (q1,1,1Z0) |--- (q1,ε,Z0) |--- (q2,ε,Z0)
Acceptance by final state: empty input AND final state.
12
Principles about IDs
 Theorem 1: If for a PDA, (q, x, A) |---* (p, y, B), then for any string w ∈ ∑* and γ ∈ Γ*, it is also true that:
   (q, xw, Aγ) |---* (p, yw, Bγ)
 Theorem 2: If for a PDA, (q, xw, A) |---* (p, yw, B), then it is also true that:
   (q, x, A) |---* (p, y, B)

13
Acceptance by…
There are two types of PDAs that one can design: those that accept by final state or by empty stack.
 PDAs that accept by final state:
   For a PDA P, the language accepted by P by final state, denoted L(P), is:
   {w | (q0,w,Z0) |---* (q,ε,A) }, s.t. q ∈ F
   Checklist: input exhausted? in a final state?
 PDAs that accept by empty stack:
   For a PDA P, the language accepted by P by empty stack, denoted N(P), is:
   {w | (q0,w,Z0) |---* (q,ε,ε) }, for any q ∈ Q
   Checklist: input exhausted? is the stack empty?
   Q) Does a PDA that accepts by empty stack need any final state specified in the design?
Example: L of balanced parenthesis
PDA that accepts by final state (PF):
 Loops on q0: (,Z0/(Z0   (,(/((   ),(/ε
 q0 → q1: ε,Z0/Z0
An equivalent PDA that accepts by empty stack (PN):
 Loops on q0: (,Z0/(Z0   (,(/((   ),(/ε   ε,Z0/ε
How will these two PDAs work on the input: ( ( ( ) ) ( ) ) ( )


15
PDA for Lwwr: Proof of correctness
 Theorem: The PDA for Lwwr accepts a string x by final state if and only if x is of the form wwR.
 Proof:
 (if-part) If the string is of the form wwR then there exists a sequence of IDs that leads to a final state:
   (q0,wwR,Z0) |---* (q0,wR,wZ0) |---* (q1,wR,wZ0) |---* (q1,ε,Z0) |---* (q2,ε,Z0)
 (only-if part) Proof by induction on |x|

16
PDAs accepting by final state and empty stack are equivalent
 PF <= PDA accepting by final state: PF = (QF, ∑, Γ, δF, q0, Z0, F)
 PN <= PDA accepting by empty stack: PN = (QN, ∑, Γ, δN, q0, Z0)
 Theorem:
   (PN ==> PF) For every PN, there exists a PF s.t. L(PF) = N(PN)
   (PF ==> PN) For every PF, there exists a PN s.t. N(PN) = L(PF)

17
How to convert an empty stack PDA into a final state PDA?
PN ==> PF construction
 Whenever PN's stack becomes empty, make PF go to a final state without consuming any additional symbol
 To detect empty stack in PN: PF pushes a new stack symbol X0 (not in Γ of PN) initially, before simulating PN
 [Diagram: a new start state p0 with the transition ε,X0/Z0X0 into q0 of PN; from every state of PN, a transition ε,X0/X0 into a new final state pf]
PF = (QN U {p0,pf}, ∑, ΓN U {X0}, δF, p0, X0, {pf})
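The construction is simple enough to write out. A minimal Python sketch using the same dict-based δ as the step function above; the names p0, pf, X0 are the fresh objects the construction introduces and are assumed not to clash with PN's states and stack symbols.

# Minimal sketch: convert an empty-stack PDA (states, delta_n, q0, Z0)
# into an equivalent final-state PDA.
def empty_stack_to_final_state(states, delta_n, q0, Z0,
                               p0="p0", pf="pf", X0="X0"):
    delta_f = dict(delta_n)                                   # keep all of PN's moves
    delta_f[(p0, "", X0)] = {(q0, Z0 + X0)}                   # push Z0 on top of X0 and enter PN
    for q in states:
        delta_f.setdefault((q, "", X0), set()).add((pf, X0))  # X0 on top <=> PN's stack is empty
    new_states = set(states) | {p0, pf}
    return new_states, delta_f, p0, X0, {pf}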


Example: Matching parenthesis “(” “)”
PN: ( {q0}, {(,)}, {Z0,Z1}, δN, q0, Z0 )   Accept by empty stack.
 δN(q0,(,Z0) = { (q0,Z1Z0) }
 δN(q0,(,Z1) = { (q0,Z1Z1) }
 δN(q0,),Z1) = { (q0,ε) }
 δN(q0,ε,Z0) = { (q0,ε) }

PF: ( {p0,q0,pf}, {(,)}, {X0,Z0,Z1}, δF, p0, X0, {pf} )   Accept by final state.
 δF(p0,ε,X0) = { (q0,Z0X0) }
 δF(q0,(,Z0) = { (q0,Z1Z0) }
 δF(q0,(,Z1) = { (q0,Z1Z1) }
 δF(q0,),Z1) = { (q0,ε) }
 δF(q0,ε,Z0) = { (q0,ε) }
 δF(q0,ε,X0) = { (pf,X0) }


How to convert a final state PDA into an empty stack PDA?
PF ==> PN construction
 Main idea: Whenever PF reaches a final state, just make an ε-transition into a new end state, clear out the stack and accept
 Danger: What if PF's design is such that it clears the stack midway without entering a final state?
   To address this, add a new start stack symbol X0 (not in Γ of PF)
PN = (Q U {p0,pe}, ∑, Γ U {X0}, δN, p0, X0)
 [Diagram: a new start state p0 with ε,X0/Z0X0 into q0 of PF; from every final state of PF, ε,any/ε into a new state pe, which loops on ε,any/ε to empty the stack]
20
Equivalence of PDAs and CFGs

CFGs == PDAs ==> CFLs
[Diagram: PDA by final state <=> PDA by empty stack; the remaining question is the conversion between CFG and PDA, shown next]
22
Converting CFG to PDA
This is same as: “implementing a CFG using a PDA”
Main idea: The PDA simulates the leftmost derivation on a given w, and upon consuming it fully it either arrives at acceptance (by empty stack) or non-acceptance.
[Diagram: input w fed to a PDA that implements the CFG; output: accept (by empty stack) or reject]

23
Converting a CFG into a PDA
Main idea: The PDA simulates the leftmost derivation on a given w, and upon consuming it fully it either arrives at acceptance (by empty stack) or non-acceptance.
Steps:
1. Push the right hand side of the production onto the stack, with the leftmost symbol at the stack top
2. If the stack top is the leftmost variable, then replace it by all its productions (each possible substitution will represent a distinct path taken by the non-deterministic PDA)
3. If the stack top has a terminal symbol, and if it matches with the next symbol in the input string, then pop it
State is inconsequential (only one state is needed)

24
Formal construction of PDA from CFG
Note: the initial stack symbol (S) is the same as the start variable in the grammar
 Given: G = (V,T,P,S)
 Output: PN = ({q}, T, V U T, δ, q, S)
 δ:
  For all A ∈ V, add the following transition(s) in the PDA:
    δ(q,ε,A) = { (q,α) | “A ==> α” ∈ P }
  For all a ∈ T, add the following transition(s) in the PDA:
    δ(q,a,a) = { (q,ε) }

25
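This construction translates directly into code. A minimal Python sketch producing the dict-based δ used in the earlier step function (grammar representation as before; variables and terminals are single-character strings here for simplicity):

# Minimal sketch: build the single-state empty-stack PDA for a CFG.
# grammar: dict variable -> list of RHS strings (e.g. "0A1"), "" meaning ε.
def cfg_to_pda(grammar, terminals, start, state="q"):
    delta = {}
    for A, rhss in grammar.items():
        # expanding a variable on the stack top, consuming no input
        delta[(state, "", A)] = {(state, rhs) for rhs in rhss}
    for a in terminals:
        # matching a terminal on the stack top against the next input symbol
        delta[(state, a, a)] = {(state, "")}
    return delta, state, start            # the start variable is the initial stack symbol

# The example grammar: S -> AS | ε,  A -> 0A1 | A1 | 01
g = {"S": ["AS", ""], "A": ["0A1", "A1", "01"]}
delta, q, Z0 = cfg_to_pda(g, {"0", "1"}, "S")
print(delta[("q", "", "S")])   # {('q', 'AS'), ('q', '')}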
Example: CFG to PDA
 G = ( {S,A}, {0,1}, P, S )
 P:
   S ==> AS | ε
   A ==> 0A1 | A1 | 01
 PDA = ({q}, {0,1}, {0,1,A,S}, δ, q, S)
 δ:
   δ(q,ε,S) = { (q,AS), (q,ε) }
   δ(q,ε,A) = { (q,0A1), (q,A1), (q,01) }
   δ(q,0,0) = { (q,ε) }
   δ(q,1,1) = { (q,ε) }
How will this new PDA work? Let's simulate string 0011
26
Simulating string 0011 on the new PDA
Leftmost derivation: S => AS => 0A1S => 0011S => 0011
Stack moves (only the successful path is shown; stack written top-first):
(q,0011,S) |--- (q,0011,AS) |--- (q,0011,0A1S) |--- (q,011,A1S) |--- (q,011,011S) |--- (q,11,11S) |--- (q,1,1S) |--- (q,ε,S) |--- (q,ε,ε)
Accept by empty stack


27
Proof of correctness for CFG ==> PDA construction
 Claim: A string is accepted by G iff it is accepted (by empty stack) by the PDA
 Proof:
 (only-if part) Prove by induction on the number of derivation steps
 (if part) If (q, wx, S) |---* (q, x, B) then S =>*lm wB

28
Converting a PDA into a CFG
 Main idea: Reverse engineer the productions from transitions
If δ(q,a,Z) => (p, Y1Y2Y3…Yk):
1. State is changed from q to p;
2. Terminal a is consumed;
3. Stack top symbol Z is popped and replaced with a sequence of k variables.
 Action: Create a grammar variable called “[qZp]” which includes the following production:
   [qZp] => a [pY1q1] [q1Y2q2] [q2Y3q3] … [qk-1Ykqk]
 Proof discussion (in the book)


Example: Bracket matching
 To avoid confusion, we will use b=“(” and e=“)”
PN: ( {q0}, {b,e}, {Z0,Z1}, δ, q0, Z0 )
Transitions:                         Productions:
1. δ(q0,b,Z0) = { (q0,Z1Z0) }        0. S => [q0Z0q0]
2. δ(q0,b,Z1) = { (q0,Z1Z1) }        1. [q0Z0q0] => b [q0Z1q0] [q0Z0q0]
3. δ(q0,e,Z1) = { (q0,ε) }           2. [q0Z1q0] => b [q0Z1q0] [q0Z1q0]
4. δ(q0,ε,Z0) = { (q0,ε) }           3. [q0Z1q0] => e
                                     4. [q0Z0q0] => ε
Let A=[q0Z0q0] and B=[q0Z1q0]. Simplifying:
0. S => A   1. A => b B A   2. B => b B B   3. B => e   4. A => ε
i.e., S => b B S | ε and B => b B B | e
If you were to directly write a CFG: S => b S e S | ε

30
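For a single-state PDA like the one above, the reverse construction is short, since there are no intermediate states q1…qk to enumerate: each transition yields exactly one production. A minimal Python sketch (dict-based δ and string-valued push strings as in the earlier sketches):

# Minimal sketch: PDA -> CFG for a PDA with a single state q.
# delta: dict (q, a, Z) -> set of (q, push_string); a == "" means ε.
def single_state_pda_to_cfg(delta, q, Z0):
    def var(Z):
        return f"[{q}{Z}{q}]"
    rules = {"S": [[var(Z0)]]}                         # S => [q Z0 q]
    for (_, a, Z), moves in delta.items():
        for _, pushed in moves:
            rhs = ([a] if a else []) + [var(Y) for Y in pushed]
            rules.setdefault(var(Z), []).append(rhs)   # [qZq] => a [qY1q]...[qYkq]
    return rules

# The bracket-matching PDA (b = "(", e = ")"):
delta = {("q0", "b", "Z0"): {("q0", "Z1Z0")},
         ("q0", "b", "Z1"): {("q0", "Z1Z1")},
         ("q0", "e", "Z1"): {("q0", "")},
         ("q0", "", "Z0"): {("q0", "")}}
for lhs, rhss in single_state_pda_to_cfg(delta, "q0", "Z0").items():
    print(lhs, "=>", " | ".join(" ".join(r) if r else "ε" for r in rhss))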
Two ways to build a CFG
 (indirect) Build a PDA, then construct a CFG from the PDA
 (direct) Derive the CFG directly
Similarly… Two ways to build a PDA
 (indirect) Derive a CFG, then construct a PDA from the CFG
 (direct) Design a PDA directly

31
Deterministic PDAs

32
This PDA for Lwwr is non-deterministic
Why does it have to be non-deterministic?
[Same transition diagram as before: grow the stack in q0, ε-moves to q1 to switch (by guessing the middle) to popping mode, pop matching symbols in q1, accept by final state in q2.]
To remove the guessing, impose on the user to insert c in the middle.
Example shows that: Nondeterministic PDAs ≠ D-PDAs
D-PDA for Lwcwr = {wcwR | c is some special symbol not in w}
Note: all transitions have become deterministic
 Grow stack (loops on q0): 0,Z0/0Z0   1,Z0/1Z0   0,0/00   0,1/01   1,0/10   1,1/11
 Switch to popping mode (q0 → q1): c,Z0/Z0   c,0/0   c,1/1
 Pop stack for matching symbols (loops on q1): 0,0/ε   1,1/ε
 Accept by final state (q1 → q2): ε,Z0/Z0

34
Deterministic PDA: Definition
 A PDA is deterministic if and only if:
 1. δ(q,a,X) has at most one member for any a ∈ ∑ U {ε}
 2. If δ(q,a,X) is non-empty for some a ∈ ∑, then δ(q,ε,X) must be empty.

35
PDA vs DPDA vs Regular languages
[Containment diagram: Regular languages ⊂ D-PDA languages (e.g. Lwcwr) ⊂ non-deterministic PDA languages (e.g. Lwwr)]

36
Summary
 PDAs for CFLs and CFGs
 Non-deterministic
 Deterministic
 PDA acceptance types
1. By final state
2. By empty stack
 PDA
 IDs, Transition diagram
 Equivalence of CFG and PDA
 CFG => PDA construction
 PDA => CFG construction 37
THE PUMPING LEMMA
Theorem. For any regular language L there exists an integer n, such that for all x ∈ L with |x| ≥ n, there exist u, v, w ∈ Σ∗, such that
(1) x = uvw
(2) |uv| ≤ n
(3) |v| ≥ 1
(4) for all i ≥ 0: uv^i w ∈ L.
[Picture: a string x ∈ L with |x| ≥ n, split as x = uvw with |uv| ≤ n and v non-empty; pumping v gives uw, uvvw, uvvvw, …, all in L]
PROOF OF P.L. (SKETCH)
Let M be a DFA for L. Take n to be the number of states of M plus 1.
Take any x ∈ L with |x| ≥ n. Consider the path (from the start state to an accepting state) in M that corresponds to x. The length of this path is |x| ≥ n.
Since M has at most n − 1 states, some state must be visited twice or more in the first n steps of the path.
[Picture: the accepting path for x = uvw, where v labels the loop at the repeated state; then uw ∈ L, uvw ∈ L, uvvw ∈ L, uvvvw ∈ L, ...]
USING PUMPING LEMMA TO PROVE NON-REGULARITY

L regular =⇒ L satisfies P.L.
L non-regular =⇒ ?
L non-regular ⇐= L doesn't satisfy P.L.

P.L.:
∃ n ∈ N ∀ x ∈ L with |x| ≥ n ∃ u, v, w ∈ Σ∗ such that all of these hold:
(1) x = uvw
(2) |uv| ≤ n
(3) |v| ≥ 1
(4) ∀ i ≥ 0: uv^i w ∈ L.

Negation:
∀ n ∈ N ∃ x ∈ L with |x| ≥ n such that ∀ u, v, w ∈ Σ∗ not all of these hold:
(1) x = uvw
(2) |uv| ≤ n
(3) |v| ≥ 1
(4) ∀ i ≥ 0: uv^i w ∈ L.

Equivalently, the negation says: if (1) ∧ (2) ∧ (3) hold, then not (4), where not (4) is: ∃ i : uv^i w ∉ L
EXAMPLE 1
Prove that L = {0^i 1^i : i ≥ 0} is NOT regular.
Proof. Show that P.L. doesn't hold (note: showing P.L. holds doesn't mean regularity).
If L is regular, then by P.L. ∃ n such that . . .
Now let x = 0^n 1^n.
x ∈ L and |x| ≥ n, so by P.L. ∃ u, v, w such that (1)–(4) hold.
We show that ∀ u, v, w (1)–(4) don't all hold.
If (1), (2), (3) hold then x = 0^n 1^n = uvw with |uv| ≤ n and |v| ≥ 1.
So, u = 0^s, v = 0^t, w = 0^p 1^n with s + t ≤ n, t ≥ 1, p ≥ 0, s + t + p = n.
But then (4) fails for i = 0:
uv^0 w = uw = 0^s 0^p 1^n = 0^(s+p) 1^n ∉ L, since s + p ≠ n
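The case analysis above can be sanity-checked by brute force for a concrete n: for every split of x = 0^n 1^n satisfying (1)–(3), pumping with i = 0 leaves the language. A small Python sketch of that check (the membership test in_L encodes this particular language and is written only for this example):

# Brute-force check of the Example 1 argument for a concrete n:
# for every split x = uvw with |uv| <= n and |v| >= 1,
# pumping down (i = 0) gives a string outside L = {0^i 1^i}.
def in_L(s):
    k = s.count("0")
    return s == "0" * k + "1" * (len(s) - k) and s.count("1") == k

def pumping_down_always_escapes(n):
    x = "0" * n + "1" * n
    for end_uv in range(1, n + 1):              # |uv| <= n
        for start_v in range(end_uv):           # |v| >= 1
            u, v, w = x[:start_v], x[start_v:end_uv], x[end_uv:]
            if in_L(u + w):                     # i = 0: drop v
                return False
    return True

print(pumping_down_always_escapes(7))   # True: no valid split survives pumping down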
IN PICTURE
∃ u, v, w such that
(1) x = uvw
(2) |uv| ≤ n
(3) |v| ≥ 1
(4) ∀ i ∈ N : uv^i w ∈ L.
x = 00000 0...0 1111...1, split as u (zeros), v (non-empty, still zeros, since |uv| ≤ n), and w (the remaining zeros followed by the ones).
If (1), (2), (3) hold then (4) fails: it is not the case that for all i, uv^i w is in L.
In particular, let i = 0. uw ∉ L.
EXAMPLE 2
Prove that L = {0^i : i is a prime} is NOT regular.
Proof. We show that P.L. doesn't hold.
If L is regular, then by P.L. ∃ n such that . . .
Now let x = 0^m where m ≥ n + 2 is prime.
x ∈ L and |x| ≥ n, so by P.L. ∃ u, v, w such that (1)–(4) hold.
We show that ∀ u, v, w (1)–(4) don't all hold.
If 0^m is written as 0^m = uvw, then 0^m = 0^|u| 0^|v| 0^|w|.
If |uv| ≤ n and |v| ≥ 1, then consider i = |u| + |w|:
uv^i w = 0^|u| 0^(|v|(|u|+|w|)) 0^|w| = 0^((|v|+1)(|u|+|w|)) ∉ L
Both factors are ≥ 2: |v| + 1 ≥ 2 since |v| ≥ 1, and |u| + |w| = m − |v| ≥ (n + 2) − n = 2.
EXAMPLE 3
Prove that L = {yy : y ∈ {0, 1}∗ } is NOT regular.
Again we try to show that P.L. doesn't hold.
If L is regular, then by P.L. ∃ n such that . . .
Let us consider x = 0^n 0^n ∈ L. Obviously |x| ≥ n.
Can 0^n 0^n be written as 0^n 0^n = uvw such that |uv| ≤ n, |v| ≥ 1, and for all i: uv^i w ∈ L?
YES! Let u = ε, v = 00, and w = 0^(2n−2).
Then ∀ i, uv^i w is of the form 0^(2k) = 0^k 0^k.
Does this mean that L is regular?
NO. We have chosen a bad string x. To show that L fails the P.L., we only need to exhibit some x that cannot be “pumped” (and |x| ≥ n).
EXAMPLE 3, 2ND ATTEMPT
Prove that L = {yy : y ∈ {0, 1}∗ } is NOT regular.
Given n from the P.L., let x = (01)^n (01)^n. Obviously x ∈ L and |x| ≥ n.
Q: Can x be “pumped” for some choice of u, v, w with |uv| ≤ n and |v| ≥ 1?
A: Yes! Take u = ε, v = 0101, w = (01)^(2n−2).
Another bad choice of x!

EXAMPLE 3, 3RD ATTEMPT
Prove that L = {yy : y ∈ {0, 1}*} is NOT regular.
Given n from the P.L., let x = 0^n 1 0^n 1. Again x ∈ L and |x| ≥ n.
∀ u, v, w such that 0^n 1 0^n 1 = uvw and |uv| ≤ n and |v| ≥ 1:
uv must be contained in the first group of 0^n. Thus consider
u v^0 w = 0^(n−|v|) 1 0^n 1.
Since |v| is at least 1, this is clearly not of the form yy. Hence L is not regular.
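To see the argument concretely, here is a minimal sketch (not part of the original notes) that brute-forces every allowed split of x = 0^n 1 0^n 1 for a small stand-in value of n and confirms that none of them can be pumped, while x itself is in L:

# membership test for L = { yy : y in {0,1}* }
def in_L(s):
    half = len(s) // 2
    return len(s) % 2 == 0 and s[:half] == s[half:]

# is there a split x = uvw with |uv| <= n and |v| >= 1 such that u v^i w stays in L for i = 0..3?
def pumpable(x, n):
    for j in range(n + 1):              # j = |u|
        for k in range(1, n - j + 1):   # k = |v|
            u, v, w = x[:j], x[j:j + k], x[j + k:]
            if all(in_L(u + v * i + w) for i in range(4)):
                return True
    return False

n = 6                                   # small stand-in for the pumping-lemma constant
x = "0" * n + "1" + "0" * n + "1"
print(in_L(x), pumpable(x, n))          # prints: True False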
Solutions to Practice Problems

Pumping Lemma

1. L = { a^k b^k | k ≥ 0 }
see notes

2. L = { a^k | k is a prime number }

Proof by contradiction:
Let us assume L is regular. Clearly L is infinite (there are infinitely many prime
numbers). From the pumping lemma, there exists a number n such that any string
w of length at least n has a "repeatable" substring generating more strings in
the language L. Let us consider the first prime number p ≥ n. For example, if n was
50 we could use p = 53. From the pumping lemma the string of length p has a
"repeatable" substring. We will assume that this substring is of length k ≥ 1. Hence:

a^p ∈ L and
a^(p + k) ∈ L as well as
a^(p + 2k) ∈ L, etc.

It should be relatively clear that p + k, p + 2k, etc., cannot all be prime, but let us add
k to the exponent p times; then we must have:

a^(p + pk) ∈ L, and of course p + pk = p(k + 1),

so this would imply that p(k + 1) is prime, which it is not since it is divisible by both p
and k + 1.

Hence L is not regular.
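A small numeric illustration of this step (a sketch, not part of the original solution; p = 53 is just the example value used above):

# for p = 53 and any pump length k >= 1, the exponent p + p*k = p*(k + 1) is
# divisible by both p and k + 1, hence never prime
def is_prime(m):
    return m > 1 and all(m % d != 0 for d in range(2, int(m ** 0.5) + 1))

p = 53
for k in range(1, 8):
    m = p + p * k
    print(k, m, is_prime(m))   # is_prime(m) is False for every k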

3. L = { a^n b^(n+1) }
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L
with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let us
choose a^p b^(p+1). Its length is 2p + 1 ≥ p. Since the length of xy cannot exceed p, y
must be of the form a^k for some k > 0. From the pumping lemma a^(p−k) b^(p+1) must also
be in L, but it is not of the right form. Hence the language is not regular.

Note that the repeatable string needs to appear in the first p symbols to avoid the
following situation:

assume, for the sake of argument, that p = 20 and you choose the string a^10 b^11,
which is of length larger than 20; but then |xy| ≤ 20 allows xy to extend past the a's, which
means that y could contain some b's. In such a case, removing y (or adding more copies of y)
could lead to strings which still belong to L (a small check of this is sketched below).
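The following check (a sketch, not from the original solution; the value p = 4 is arbitrary) illustrates the note: a y taken from the first p symbols consists of a's only and pumping down leaves the language, whereas a straddling y such as "ab" (only possible when the condition |xy| ≤ p is not enforced) can be removed and the string stays in L:

# membership in { a^n b^(n+1) }
def in_L(s):
    n = s.count("a")
    return s == "a" * n + "b" * (n + 1)

p = 4
print(in_L("a" * (p - 1) + "b" * (p + 1)))   # y = "a" removed from a^p b^(p+1): False, as the proof needs
u, v, w = "a" * (p - 1), "ab", "b" * p       # here uvw = a^p b^(p+1) but |uv| = p + 1 > p
print(in_L(u + w))                           # y = "ab" removed: True, so this split gives no contradiction
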
4. L = { a^n b^(2n) }
Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L
with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let us
choose a^p b^(2p). Its length is 3p ≥ p. Since the length of xy cannot exceed p, y must
be of the form a^k for some k > 0. From the pumping lemma a^(p−k) b^(2p) must also be in L,
but it is not of the right form. Hence the language is not regular.

5. TRAILING-COUNT: any string s followed by a number of a's equal to the length of s.

Assume L is regular. From the pumping lemma there exists a p such that every w ∈ L
with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let us
choose b^p a^p. Its length is 2p ≥ p. Since the length of xy cannot exceed p, y must be
of the form b^k for some k > 0. From the pumping lemma b^(p−k) a^p must also be in L, but
it is not of the right form. Hence the language is not regular.

6. EVENPALINDROME = { all words in PALINDROME that have even length }

Same as #2 above; choose a^n b b a^n.

7. ODDPALINDROME = { all words in PALINDROME that have odd length }

Same as #2 above; choose a^n b a^n.

8. DOUBLESQUARE = { a^n b^n | n is a square }

Assume DOUBLESQUARE is regular. From the pumping lemma there exists a p
such that every w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and
|xy| ≤ p. Let us choose a^(p·p) b^(p·p). Its length is 2p² ≥ p. Since the length of xy cannot
exceed p, y must be of the form a^k for some k > 0. Let us add y p times. From the
pumping lemma a^(p·p + pk) b^(p·p) = a^(p(p + k)) b^(p·p) must also be in L, but it is not of the right
form (the number of a's no longer equals the number of b's). Hence the language is not regular.

9. L = { w | w ∈ {a, b}*, w = w^R }

Proof by contradiction:
Assume L is regular. Then the pumping lemma applies.

From the pumping lemma there exists an n such that every w ∈ L longer than n can
be represented as x y z with |y| ≠ 0 and |x y| ≤ n.

Let us choose the palindrome a^n b a^n.

Again notice that we were clever enough to choose a string which:

a. has a center mark which is not a (otherwise when we remove or add y we
would be left with an acceptable string)
b. has a first portion of length n which is all a's (so that when we remove or add
y it will create an imbalance).

Its length is 2n + 1 ≥ n. Since the length of xy cannot exceed n, y must be of the
form a^k for some k > 0. From the pumping lemma a^(n−k) b a^n must also be in L but it is
not a palindrome.

Hence L is not regular.

10. L = { w ∈ {a, b}* | w has an equal number of a's and b's }

Let us show this by contradiction: assume L is regular. We know that the
language generated by a*b* is regular. We also know that the intersection of two
regular languages is regular. Let M = { a^n b^n | n ≥ 0 } = L(a*b*) ∩ L. Therefore if L
is regular, M would also be regular. But we know that M is not regular. Hence, L
is not regular.

11. L = { w w^R | w ∈ {a, b}* }

see #7 (here choose a^n b b a^n, so that the string has even length)

12. L = { 0^n | n is a power of 2 }
Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose the string 0^(2^p). Since the length of xy cannot exceed p, y must be of the
form 0^k for some 0 < k ≤ p. From the pumping lemma 0^m, where m = 2^p + k, must also
be in L. We have

2^p < 2^p + k ≤ 2^p + p < 2^(p+1)

so m lies strictly between two consecutive powers of 2. Hence this string is not of the
right form, and the language is not regular.
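As a quick sanity check (illustrative only, not part of the original proof), the inequality above can be verified numerically for small values of p:

# for p >= 1 and 1 <= k <= p, the pumped length 2^p + k falls strictly between
# 2^p and 2^(p+1), so it cannot be a power of 2
for p in range(1, 10):
    for k in range(1, p + 1):
        assert 2 ** p < 2 ** p + k <= 2 ** p + p < 2 ** (p + 1)
print("2^p + k is never a power of 2 for 1 <= k <= p")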

13. L = { a^(2k) w | w ∈ {a, b}*, |w| = k }

Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^(2p) b^p. Its length is 3p ≥ p. Since the length of xy cannot exceed p, y
must be of the form a^k for some k > 0. From the pumping lemma a^(2p−k) b^p must
also be in L, but it is not of the right form: every string in L has at least twice as many
a's as b's, while here the number of a's is less than twice the number of b's. (Note that
you must subtract, not add; otherwise some a's could be shifted into w.) Hence the
language is not regular.
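A short check of the counting argument (a sketch, not part of the original solution; p = 6 is arbitrary):

# every string in L = { a^(2k) w : |w| = k } has length 3k and starts with 2k a's
def in_L(s):
    m = len(s) // 3
    return len(s) % 3 == 0 and s[:2 * m] == "a" * (2 * m)

p = 6
for k in range(1, p + 1):
    print(k, in_L("a" * (2 * p - k) + "b" * p))   # a^(2p-k) b^p: False for every k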

14. L = { a^k w | w ∈ {a, b}*, |w| = k }

Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^p b^p. Its length is 2p ≥ p. Since the length of xy cannot exceed p, y
must be of the form a^k for some k > 0. From the pumping lemma a^(p−k) b^p must also
be in L, but it is not of the right form: every string in L has at least as many a's as b's,
while here the number of a's is less than the number of b's. (Note that you must
subtract, not add; otherwise some a's could be shifted into w.) Hence the language is
not regular.

15. L = { a^n b^l | n ≤ l }
Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^p b^p. Its length is 2p ≥ p. Since the length of xy cannot exceed p, y
must be of the form a^k for some k > 0. From the pumping lemma a^(p+k) b^p must
also be in L, but it is not of the right form since the number of a's exceeds the
number of b's. (Note that you must add, not subtract; otherwise the string would
be OK.) Hence the language is not regular.

16. L = { a^n b^l a^k | k = n + l }
Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^p b a^(p+1). Its length is 2p + 2 ≥ p. Since the length of xy cannot exceed
p, y must be of the form a^m for some m > 0. From the pumping lemma a^(p−m) b a^(p+1)
must also be in L, but it is not of the right form. Hence the language is not
regular.

17. L = { v a^(k+1) | v ∈ {a, b}*, |v| = k }

Assume L is regular. From the pumping lemma there exists an n such that every
w ∈ L with |w| ≥ n can be represented as x y z with |y| ≠ 0 and |xy| ≤ n. Let
us choose b^n a^(n+1). Its length is 2n + 1 ≥ n. Since the length of xy cannot exceed n,
y must be of the form b^k for some k > 0. From the pumping lemma, if we add two
copies of y to the original string, b^(n+2k) a^(n+1) must also be in L. But that string is of
length 2n + 2k + 1, so to fit the pattern v would have to be b^(n+k), and the rest of the
string would then be b^k a^(n+1), which is not of the form a^(n+k+1). Hence the language
is not regular.
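The re-parsing step can be checked directly (a small sketch, not from the original solution; n = 5 and k = 2 are arbitrary):

# a string is in L = { v a^(m+1) : |v| = m } iff its length is odd
# and its last (|s| + 1) / 2 symbols are all a's
def in_L(s):
    m = (len(s) - 1) // 2
    return len(s) % 2 == 1 and all(c == "a" for c in s[m:])

n, k = 5, 2
original = "b" * n + "a" * (n + 1)             # b^n a^(n+1), in L
pumped = "b" * (n + 2 * k) + "a" * (n + 1)     # after adding y = b^k twice
print(in_L(original), in_L(pumped))            # prints: True False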

18. L = { v a^(2k) | v ∈ {a, b}*, |v| = k }

Assume L is regular. From the pumping lemma there exists an n such that every
w ∈ L with |w| ≥ n can be represented as x y z with |y| ≠ 0 and |xy| ≤ n. Let
us choose b^n a^(2n). Its length is 3n ≥ n. Since the length of xy cannot exceed n, y
must be of the form b^k for some k > 0. From the pumping lemma b^(n+k) a^(2n) must
also be in L, but it is not of the right form: every string in L has at least twice as many
a's as b's, and here there are 2n a's against n + k b's, with no way to move b's onto
the a side. (Note that you must add, not subtract; otherwise the resulting string could
still be in L, with some a's absorbed into v.) Hence the language is not regular.

19. L = { ww | w ∈ {a, b}* }

Assume L is regular. From the pumping lemma there exists an n such that every
w ∈ L with |w| ≥ n can be represented as x y z with |y| ≠ 0 and |xy| ≤ n. Let
us choose a^n b^n a^n b^n. Its length is 4n ≥ n. Since the length of xy cannot exceed n,
y must be of the form a^k for some k > 0. From the pumping lemma a^(n+k) b^n a^n b^n
must also be in L, but it is not of the right form: either its length is odd, or its midpoint
falls inside the first block of b's, so the second half starts with b while the first half
starts with a, and the two halves cannot match. Hence the language is not regular.
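The midpoint argument can be checked for a small case (illustrative sketch, not part of the original solution; n = 5 is arbitrary):

# membership in { ww : w in {a, b}* }
def in_ww(s):
    h = len(s) // 2
    return len(s) % 2 == 0 and s[:h] == s[h:]

n = 5
for k in range(1, n + 1):
    s = "a" * (n + k) + "b" * n + "a" * n + "b" * n
    print(k, in_ww(s))    # False for every k: the two halves never match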

20. L = { a^(n!) | n ≥ 0 }
Proof by contradiction:
Let us assume L is regular. From the pumping lemma, there exists a number p
such that any string w ∈ L of length at least p has a "repeatable" substring
generating more strings in the language L. Let us consider a^(p!) (unless p < 3, in
which case we choose a^(3!)). From the pumping lemma the string w has a
"repeatable" substring. We will assume that this substring is of length k ≥ 1.
From the pumping lemma a^(p! − k) must also be in L. For this to be true there must
be a j such that j! = p! − k. But this is not possible, since for p > 2 and k ≤ p we
have
(p − 1)! < p! − k < p!
so p! − k lies strictly between two consecutive factorials.
Hence L is not regular.
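A numeric illustration of the gap between consecutive factorials (a sketch, not part of the original proof):

from math import factorial

# for p >= 3 and 1 <= k <= p, p! - k lies strictly between (p-1)! and p!,
# so it cannot be a factorial
for p in range(3, 10):
    for k in range(1, p + 1):
        assert factorial(p - 1) < factorial(p) - k < factorial(p)
print("p! - k is never a factorial for 1 <= k <= p, p >= 3")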

21. L = { a^n b^l | n ≠ l }
Proof by contradiction:
Let us assume L is regular. From the pumping lemma, there exists a number p
such that any string w ∈ L of length at least p has a "repeatable" substring
generating more strings in the language L. Let us consider n = p! and l = (p + 1)!.
From the pumping lemma the resulting string a^(p!) b^((p+1)!) is of length larger than p
and has a "repeatable" substring y, which must consist of a's. We will assume that
it is of length k ≥ 1. From the pumping lemma we can add y another i − 1 times, for
a total of i copies of y. If we can find an i such that the resulting number of a's is the
same as the number of b's, we have won. This means we must find i such that:
p! + (i − 1)k = (p + 1)!, or
(i − 1)k = (p + 1)! − p! = p · p!, or
i = (p · p!) / k + 1

but since k ≤ p we know that k must divide p!, and therefore (p · p!) / k is an
integer. This proves that we can choose i to obtain the above equality.
Hence L is not regular.
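The choice of i can be verified directly (illustrative sketch, not from the original solution; p = 5 is arbitrary):

from math import factorial

# with p! a's, (p+1)! b's and a pump substring of k <= p a's,
# taking i = p*p!/k + 1 copies of y equalizes the two counts
p = 5
for k in range(1, p + 1):
    i = (p * factorial(p)) // k + 1        # k <= p divides p!, so the division is exact
    assert factorial(p) + (i - 1) * k == factorial(p + 1)
print("a suitable i exists for every possible |y| = k")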

22. L = { a^n b^l a^k | k > n + l }

Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^p b a^(p+2). Its length is 2p + 3 ≥ p. Since the length of xy cannot exceed
p, y must be of the form a^m for some m > 0. From the pumping lemma a^(p+2m) b a^(p+2)
must also be in L, but it is not of the right form, since the last block has p + 2 a's while
n + l = p + 2m + 1 ≥ p + 3, so the condition k > n + l fails. Hence the
language is not regular.

23. L = { a^n b^l c^k | k ≠ n + l }
Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^(p!) b^(p!) c^((p+1)!). Its length is 2·p! + (p + 1)! ≥ p, and it is in L since
(p + 1)! ≠ 2·p! when p ≥ 2 (if p < 2 we may use 2 in place of p, since the pumping-lemma
constant can always be increased). Since the length of xy cannot
exceed p, y must be of the form a^m for some m > 0. From the pumping lemma
any string of the form x y^i z must also be in L. If we can show that it is always
possible to choose i in such a way that we will have k = n + l for one such string,
we will have shown a contradiction. Indeed we can have
p! + (i − 1)m + p! = (p + 1)!
if we take i = 1 + ((p + 1)! − 2·p!) / m. Is that possible? Only if m divides
(p + 1)! − 2·p!. But (p + 1)! − 2·p! = (p + 1 − 2)·p! = (p − 1)·p!, and since m ≤ p, m is
guaranteed to divide p!.

Hence i exists and the language is not regular.
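The divisibility step can be checked numerically (illustrative sketch, not part of the original proof; p = 5 is arbitrary):

from math import factorial

# (p+1)! - 2*p! = (p-1)*p!, and any pump length m <= p divides p!,
# so the required number of extra copies of y is an integer
p = 5
for m in range(1, p + 1):
    extra = (factorial(p + 1) - 2 * factorial(p)) // m
    assert extra * m == (p - 1) * factorial(p)
    i = 1 + extra
    assert factorial(p) + (i - 1) * m + factorial(p) == factorial(p + 1)
print("i exists, so some pumped string satisfies k = n + l")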

24. L = { a^n b^l a^k | n = l or l ≠ k }
Proof by contradiction:
Let us assume L is regular. From the pumping lemma, there exists a number p
such that any string w ∈ L of length at least p has a "repeatable" substring
generating more strings in the language L. Let us consider w = a^p b^p a^p. From
the pumping lemma the string w, of length larger than p, has a "repeatable"
substring appearing in its first p symbols. We will assume that this substring is of
length m ≥ 1. From the pumping lemma we can remove y and the resulting string
should be in L. However, if we remove y we get a^(p−m) b^p a^p. But this string is not
in L, since p − m ≠ p (so the first condition fails) and p = p (so the second
condition fails as well).
Hence L is not regular.
25. L = { a^n b a^(3n) | n ≥ 0 }
Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^p b a^(3p). Its length is 4p + 1 ≥ p. Since the length of xy cannot exceed
p, y must be of the form a^k for some k > 0. From the pumping lemma a^(p−k) b a^(3p)
must also be in L, but it is not of the right form. Hence the language is not
regular.

26. L = { a^n b^n c^n | n ≥ 0 }
Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^p b^p c^p. Its length is 3p ≥ p. Since the length of xy cannot exceed p,
y must be of the form a^k for some k > 0. From the pumping lemma a^(p−k) b^p c^p must
also be in L, but it is not of the right form. Hence the language is not regular.

27. L = { a^i b^n | i, n ≥ 0, i = n or i = 2n }

Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose a^p b^p. Its length is 2p ≥ p. Since the length of xy cannot exceed p,
y must be of the form a^k for some k > 0. From the pumping lemma a^(p−k) b^p must
also be in L, but it is not of the right form, since p − k is neither p nor 2p. Hence
the language is not regular.

28. L = { 0^k 1 0^k | k ≥ 0 }
Assume L is regular. From the pumping lemma there exists an n such that every
w ∈ L with |w| ≥ n can be represented as x y z with |y| ≠ 0 and |xy| ≤ n. Let
us choose 0^n 1 0^n. Its length is 2n + 1 ≥ n. Since the length of xy cannot exceed n,
y must be of the form 0^p for some p > 0. From the pumping lemma 0^(n−p) 1 0^n must
also be in L, but it is not of the right form. Hence the language is not regular.

29. L = { 0^n 1^m 2^n | n, m ≥ 0 }
Assume L is regular. From the pumping lemma there exists a p such that every
w ∈ L with |w| ≥ p can be represented as x y z with |y| ≠ 0 and |xy| ≤ p. Let
us choose 0^p 1 2^p. Its length is 2p + 1 ≥ p. Since the length of xy cannot exceed p,
y must be of the form 0^k for some k > 0. From the pumping lemma 0^(p−k) 1 2^p must
also be in L, but it is not of the right form. Hence the language is not regular.
