Unit 3 First Half and Unit 4 First Half
UNIT – II
GRAMMARS
Part -A
1. What are the applications of Context free languages?
Context free languages are used in:
(i) Defining programming languages.
(ii) Formalizing the notion of parsing.
(iii) Translation of programming languages.
(iv) String processing applications.
2. What are the uses of Context free grammars?
Construction of compilers.
Simplifying the definition of programming languages.
Describing arithmetic expressions with arbitrary nesting of balanced
parentheses {(, )}.
Describing block structure in programming languages.
Modelling neural nets.
3. Define a context free grammar (Apr/May 13) (Nov/Dec 15)
A context free grammar (CFG) is denoted as G = (V, T, P, S), where V and T are
finite sets of variables and terminals respectively, and V and T are disjoint. P is a finite set of
productions, each of the form A->α, where A is a variable and α is a string of symbols
from (V ∪ T)*. S is a special variable called the start symbol.
4. What is the language generated by a CFG G?
The language generated by G is L(G) = {w | w is in T* and S =>* w}. That is, a
string is in L(G) if:
(1) The string consists solely of terminals.
(2) The string can be derived from S.
5. What is: (a) CFL (b) Sentential form
(a) L is a context free language (CFL) if L = L(G) for some CFG G.
(b) A string α of terminals and variables is called a sentential form if
S =>* α, where S is the start symbol of the grammar.
6. What is the language generated by the grammar G = (V, T, P, S) where
P = {S->aSb, S->ab}?
S => aSb => aaSbb => ... => a^n b^n
Thus the language L(G) = {a^n b^n | n >= 1}. Each string consists of n a's followed by an
equal number of b's.
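The derivation above can be replayed mechanically. This is a minimal Python sketch. The helper `derive(n)` is hypothetical, not part of the notes. It applies S->aSb (n - 1) times, finishes with S->ab, and records each sentential form.

```python
def derive(n):
    """Sentential forms of S =>* a^n b^n in the grammar S -> aSb | ab."""
    if n < 1:
        raise ValueError("the grammar only generates a^n b^n for n >= 1")
    forms = ["S"]
    current = "S"
    for _ in range(n - 1):
        current = current.replace("S", "aSb", 1)   # apply S -> aSb
        forms.append(current)
    current = current.replace("S", "ab", 1)        # finish with S -> ab
    forms.append(current)
    return forms


print(" => ".join(derive(3)))
# S => aSb => aaSbb => aaabbb
```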
7. What is: (a) derivation (b) derivation/parse tree (c) subtree
(a) Let G = (V, T, P, S) be a context free grammar. If A->β is a production of P and
α and γ are any strings in (V ∪ T)*, then αAγ => αβγ, that is, αAγ directly derives αβγ.
A derivation is a sequence of such steps; =>* denotes zero or more steps.
(b) A tree is a parse (derivation) tree for G if:
(i) Every vertex has a label, which is a symbol of V ∪ T ∪ {ε}.
(ii) The label of the root is S.
(iii) If a vertex is interior and has label A, then A must be in V.
(iv) If vertex n has label A and vertices n1, n2, …, nk are the sons of n, in
order from the left, with labels X1, X2, …, Xk respectively, then A->X1X2…Xk
must be in P.
(v) If vertex n has label ε, then n is a leaf and is the only son of its father.
CS6503 THEORY OF COMPUTATION
(c) A subtree of a derivation tree is a particular vertex of the tree together with all its
descendants, the edges connecting them and their labels. The label of the root of a subtree
need not be the start symbol of the grammar.
8. If S->aSb | aAb, A->bAa, A->ba, find the CFL.
Soln: S => aAb => abab
S => aSb => aaAbb => aababb (substituting S->aAb, then A->ba)
S => aSb => aaSbb => aaaAbbb => aaababbb
Using A->bAa as well gives A =>* b^m a^m for m >= 1, so
Thus L = {a^n b^m a^m b^n | n, m >= 1}
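A membership test makes the shape of this language concrete. This is a minimal Python sketch with a hypothetical helper `in_language`. It splits a string into its four maximal letter blocks. It then checks that the outer blocks balance (from S->aSb) and that the inner blocks balance (from A->bAa).

```python
import re


def in_language(w):
    """True iff w = a^n b^m a^m b^n with n, m >= 1."""
    match = re.fullmatch(r"(a+)(b+)(a+)(b+)", w)
    if match is None:
        return False
    a1, b1, a2, b2 = (len(block) for block in match.groups())
    # Outer a's and b's come from S -> aSb; inner b's and a's from A -> bAa.
    return a1 == b2 and b1 == a2


for w in ["abab", "aababb", "aabbaabb", "abba"]:
    print(w, in_language(w))
```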
9. What is an ambiguous grammar? ( Nov /Dec 12)
A grammar is said to be ambiguous if it has more than one derivation tree for some
sentence, or in other words, if some sentence has more than one leftmost derivation or more than one
rightmost derivation.
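10. Show that the grammar {S->aS | aSbS | ε} is ambiguous, using the string aab, by constructing: ( Apr/May13,14)
(a) two parse trees (b) two leftmost derivations (c) two rightmost derivations
(a) [Figure: two parse trees for aab]
(b) S => aS => aaSbS => aabS => aab
S => aSbS => aaSbS => aabS => aab
(c) S => aS => aaSbS => aaSb => aab
S => aSbS => aSb => aaSb => aab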
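The ambiguity of aab can also be confirmed by counting its parse trees. This is a minimal Python sketch. The function `count_parse_trees` is a hypothetical helper. It tries every production of S on every substring and memoises the counts.

```python
from functools import lru_cache


def count_parse_trees(w):
    """Number of parse trees of w in the grammar S -> aS | aSbS | ε."""

    @lru_cache(maxsize=None)
    def count(i, j):
        # Number of ways S derives the substring w[i:j].
        total = 1 if i == j else 0              # S -> ε
        if i < j and w[i] == "a":
            total += count(i + 1, j)            # S -> aS
            for k in range(i + 1, j):           # S -> aSbS, with the b at w[k]
                if w[k] == "b":
                    total += count(i + 1, k) * count(k + 1, j)
        return total

    return count(0, len(w))


print(count_parse_trees("aab"))   # 2, so the grammar is ambiguous
print(count_parse_trees("ab"))    # 1
```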
12. Construct a CFG without ε-productions from: S →a | Ab | aBa, A →b | ε, B →b | A.
The given set of productions is:
S->a
S->Ab
S->aBa
A->b
A->ε
B->b
B->A
A->ε is the only ε-production, so A is nullable. Since B->A and A->ε, B =>* ε, so B is
also nullable.
Deleting the nullable A from S->Ab gives S->b.
Deleting the nullable B from S->aBa gives S->aa.
B->A is a unit production, not an ε-production, so it is kept (deleting A from it would only give
B->ε, which is not added).
Finally the productions are:
S -> a | Ab | b | aBa | aa
A -> b
B -> b | A
13. What are the three ways to simplify a context free grammar?
(i) Removing the useless symbols from the set of productions.
(ii) Eliminating the empty productions.
(iii) Eliminating the unit productions.
14. What are the properties of the CFL generated by a CFG?
Every CFL without ε can be generated by a CFG G with the following properties:
Each variable and each terminal of G appears in the derivation of some word in L.
There are no productions of the form A->B where A and B are variables.
15. Find the grammar for the language L = {a^(2n) bc | n > 1}
Let G = ({S, A}, {a, b, c}, P, S), where P:
S->Abc
A->aaA | aaaa
(A derives a^(2n) for n >= 2, so S derives a^(2n) bc for n > 1.)
16. Find the language generated by
S->0S1 | 0A | 0 |1B | 1
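A->0A | 0
B->1B | 1
The shortest strings are 0 and 1 (S->0 | 1).
S=>0S1=>001
S=>0S1=>011
S=>0S1=>00S11=>000S111=>0000A111=>00000111
In general S =>* 0^k S 1^k, and the last S produces either a nonempty block of extra 0's
(through S->0A | 0) or a nonempty block of extra 1's (through S->1B | 1).
Thus L = {0^n 1^m | n ≠ m, and n, m >= 0}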
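The answer can be checked for short strings by brute force. This is a minimal Python sketch that assumes a few conventions not stated in the notes. Symbols are single characters, variables are upper case, and the grammar has no ε-productions, so sentential forms never shrink. The helper `generate` is hypothetical. It expands the leftmost variable breadth-first and discards forms longer than the bound.

```python
from collections import deque

GRAMMAR = {
    "S": ["0S1", "0A", "0", "1B", "1"],
    "A": ["0A", "0"],
    "B": ["1B", "1"],
}


def generate(grammar, start, max_len):
    """All terminal strings of length <= max_len derivable from start."""
    seen, words = {start}, set()
    queue = deque([start])
    while queue:
        form = queue.popleft()
        pos = next((i for i, c in enumerate(form) if c.isupper()), None)
        if pos is None:                   # no variables left: a sentence
            words.add(form)
            continue
        for body in grammar[form[pos]]:   # expand the leftmost variable
            new = form[:pos] + body + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


expected = {"0" * n + "1" * m
            for n in range(7) for m in range(7)
            if n != m and n + m <= 6}
print(generate(GRAMMAR, "S", 6) == expected)   # True
```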
17. Construct the grammar for the language L= { an b an | n>=1}. (Nov/Dec13)
The grammar has the production P as:
S->aAa
A->aAa | b
The grammar is thus: G= ({S, A}, {a, b}, P, S)
18. Construct a grammar for the language L of all
palindromes over Σ = {a, b}.
G = ({S}, {a, b}, P, S)
P: {S->aSa, S->bSb, S->a, S->b, S->ε}
This grammar generates exactly the palindromes over {a, b}.
19. Differentiate sentences vs sentential forms
A sentence is a string of terminal symbols derivable from the start symbol S, i.e. a string in L(G).
A sentential form is any string of variables and/or terminals derivable from S. It is usually an
intermediate form in a derivation. A sentence is a sentential form that contains no variables.
20. What is a parser?
A parser for grammar G is a program that takes as input a string w and produces as output
either a parse tree for w, if w is a sentence of G or an error message indicating that w is not
a sentence of G.
21. What are the two major normal forms for context-free grammar?
The two Normal forms are
i. Chomsky Normal Form (CNF)
ii. Greibach Normal Form (GNF)
22. What is a useless symbol?
A symbol X is useful if there is a derivation S =>* αXβ =>* w for some α, β and some w in T*;
otherwise it is useless.
PART - B
1. Discuss about Types of Grammars[Revised topic]
Grammars
Grammars are language generators. They consist of an alphabet of terminal symbols, an alphabet of
non-terminal symbols, a start symbol and rules. Each language generated by some grammar
can be recognized by some automaton. Languages (and the corresponding grammars) can be
classified according to the minimal automaton sufficient to recognize them. This classification,
known as the Chomsky Hierarchy, was defined by Noam Chomsky, a distinguished linguist
with major contributions to linguistics. The Chomsky Hierarchy comprises four types of
languages and their associated grammars and machines.
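Type   | Languages                        | Grammars                                     | Minimal automaton       | Example
Type 3 | Regular languages                | Regular grammars (right-linear, left-linear) | Finite-state automata   | a*
Type 2 | Context-free languages           | Context-free grammars                        | Pushdown automata       | a^n b^n
Type 1 | Context-sensitive languages      | Context-sensitive grammars                   | Linear-bounded automata | a^n b^n c^n
Type 0 | Recursively enumerable languages | Unrestricted grammars                        | Turing machines         | any computable function
The types of languages form a hierarchy: regular ⊂ context-free ⊂ context-sensitive ⊂ recursively
enumerable. The recursive languages lie between the context-sensitive and the recursively
enumerable languages.
The distinction between languages can be seen by examining the structure of the rules
of their grammar, or the nature of the automata which can be used to identify them.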
Type 3 - Regular Languages
A regular language is one which can be represented by a regular grammar, described using a
regular expression, or accepted by an FSA.
There are two kinds of regular grammar:
Right-linear (right-regular), with rules of the form
A → α B or A → α,
where A and B are single non-terminal symbols and α is a terminal symbol.
Parse trees with these grammars are right-branching.
Left-linear (left-regular), with rules of the form
A → B α or A → α
Parse trees with these grammars are left-branching.
Examples of regular languages are pattern matching languages (regular expressions).
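A right-linear grammar maps directly onto an FSA, which is why Type 3 grammars and finite automata describe the same languages. This is a minimal Python sketch of that correspondence. The function `accepts` is hypothetical, and so is the example grammar, which is not from the notes. Each variable is a state. A rule A → aB is a transition from A to B on a, and a rule A → a moves to an accepting state.

```python
FINAL = None   # stands for the accepting state reached by a rule A -> a


def accepts(grammar, start, w):
    """Simulate a right-linear grammar as a nondeterministic FSA.
    Rules are written as (terminal, next_variable), with next_variable
    set to FINAL for a rule of the form A -> a."""
    states = {start}
    for symbol in w:
        next_states = set()
        for state in states:
            if state is FINAL:
                continue          # nothing can follow a completed derivation
            for terminal, target in grammar[state]:
                if terminal == symbol:
                    next_states.add(target)
        states = next_states
    return FINAL in states


# Hypothetical example: S -> aS | bA | b,  A -> bA | b   (the language a*b+)
GRAMMAR = {
    "S": [("a", "S"), ("b", "A"), ("b", FINAL)],
    "A": [("b", "A"), ("b", FINAL)],
}
print(accepts(GRAMMAR, "S", "aabb"), accepts(GRAMMAR, "S", "aba"))   # True False
```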
Type 2 - Context-Free Languages
A Context-Free Grammar (CFG) is one whose production rules are of the form:
A→α
where A is any single non-terminal and α is any string of terminals and non-terminals.
The minimal automaton that recognizes context-free languages is a pushdown automaton. It
uses a stack when expanding the non-terminal symbols with the right-hand side of the
corresponding grammar rule.
Examples of CFLs are some simple programming languages.
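The role of the stack can be illustrated with balanced parentheses. These are generated, for example, by the grammar S → (S)S | ε, which is an assumed example not taken from the notes. The following is a minimal Python sketch, not a full pushdown automaton. It pushes on "(", pops on ")", and accepts only if the stack ends empty. With a single bracket type the stack acts as a counter, but the same pattern extends to several bracket types.

```python
def balanced(w):
    """Accept balanced parentheses the way a pushdown automaton would."""
    stack = []
    for symbol in w:
        if symbol == "(":
            stack.append(symbol)      # remember an unmatched '('
        elif symbol == ")":
            if not stack:
                return False          # a ')' with no matching '('
            stack.pop()
        else:
            return False              # symbol outside the alphabet {(, )}
    return not stack                  # accept iff every '(' was matched


print(balanced("(()())"), balanced("())("))   # True False
```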
Type 1 - Context-Sensitive Languages
Context-Sensitive grammars may have more than one symbol on the left-hand-side of their
grammar rules, provided that at least one of them is a non-terminal and the number of symbols
on the left-hand-side does not exceed the number of symbols on the right-hand-side. Their rules
have the form:
α A γ→ α β γ
where A is a single non-terminal symbol, and α, β, γ are strings of terminals and
non-terminals, with β non-empty. Since we allow more than one symbol on the left-hand side, we refer to
the symbols other than the one we are replacing as the context of the replacement.
The automaton which recognizes a context-sensitive language is called a linear-bounded
automaton: an FSA with a memory to store symbols in a list.
Since the number of symbols on the left-hand side is always smaller than or equal to the number
of symbols on the right-hand side, the length of the sentential form never decreases when
a grammar rule is applied. Hence, while deriving an input string, every sentential form is at most as long
as that input, so a linear-bounded automaton needs only a finite list, bounded by the length of
the input, as its store.
Examples of context-sensitive languages are most programming languages.
Type 0 - Unrestricted (Free) Languages
Unrestricted grammars have no restrictions on their grammar rules, except that there must be at
least one non-terminal on the left-hand-side. The rules have the form
α→β
where α and β are arbitrary strings of terminal and non-terminal symbols (β may be ε, the empty
string). The type of automaton which can recognize such a language is a Turing machine, with an
infinitely long memory. Examples of unrestricted languages are almost all natural languages.
2. Explain About Context Free Grammar (CFG)
Definition:
A context free grammar is a finite set of variables, each of which represents a language.
The languages represented by the variables are described recursively in terms of each other and
of primitive symbols called terminals. The rules relating the variables are called productions. A
Context Free Grammar (CFG) is denoted by,
G = (V, T, P, S)
Where, V – Variables
T – Terminals
P – Finite set of Productions of the form A → α, where A is a variable and α is
a string of symbols in (V ∪ T)*.
S – Start symbol.
A CFG can be written in Backus-Naur Form (BNF). For example, consider the grammar,
<expression> → <expression> + <expression>
<expression> → <expression> * <expression>
<expression> → (<expression>)
<expression> → id
Here <expression> is the variable and the terminals are +, *, (, ), id. The first two
productions say that an expression can be composed of two expressions connected by an addition
or multiplication sign. The third production says that an expression may be another expression
surrounded by parentheses. The last says that a single operand is an expression.
Ex.1: (i) A CFG, G = (V, T, P, S), whose productions are given by,
A → Ba
B → b
Soln: A ⇒ Ba
⇒ ba, which produces the string ba.
(ii) For the productions S → aAb, A → bAa | ba:
S ⇒ aAb
⇒ abAab [A → bAa]
⇒ abbaab [A → ba]
Leftmost and Rightmost Derivations:
Leftmost Derivation: If at each step in a derivation a production is applied to the leftmost
variable, then it is called a leftmost derivation (written ⇒lm).
Ex.1: Let G = (V, T, P, S), V = {E}, T = {+, *, id}, S = E, P is given by,
E → E + E | E * E | id. Construct a leftmost derivation for id+id*id.
Soln: E ⇒lm E+E
⇒lm id+E [E → id]
⇒lm id+E*E [E → E*E]
⇒lm id+id*E [E → id]
⇒lm id+id*id [E → id]
Rightmost Derivation:
If at each step in a derivation a production is applied to the rightmost variable, then it is
called a rightmost derivation (written ⇒rm).
Ex.1: Let G = (V, T, P, S), V = {E}, T = {+, *, id}, S = E, P is given by, E → E + E | E * E | id.
Construct a rightmost derivation for id+id*id.
Soln: E ⇒rm E*E
⇒rm E*id [E → id]
⇒rm E+E*id [E → E+E]
⇒rm E+id*id [E → id]
⇒rm id+id*id [E → id]
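Both derivations above can be replayed by rewriting either the leftmost or the rightmost variable at each step. This is a minimal Python sketch. The function `replay` and its arguments are hypothetical. Symbols are kept as list items so that the multi-character terminal id stays intact. The function raises an error if a step does not rewrite the variable that the chosen order requires.

```python
def replay(start, steps, variables, order="leftmost"):
    """Apply productions (head, body) in sequence, each to the leftmost or
    rightmost variable, and return every sentential form as a string."""
    form = [start]
    forms = [start]
    for head, body in steps:
        positions = [i for i, sym in enumerate(form) if sym in variables]
        if not positions:
            raise ValueError("no variable left to rewrite")
        pos = positions[0] if order == "leftmost" else positions[-1]
        if form[pos] != head:
            raise ValueError(f"the {order} variable is {form[pos]}, not {head}")
        form = form[:pos] + body + form[pos + 1:]
        forms.append("".join(form))
    return forms


PLUS, TIMES, ID = ("E", ["E", "+", "E"]), ("E", ["E", "*", "E"]), ("E", ["id"])
print(" => ".join(replay("E", [PLUS, ID, TIMES, ID, ID], {"E"})))
print(" => ".join(replay("E", [TIMES, ID, PLUS, ID, ID], {"E"}, "rightmost")))
```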
Parse Trees (or Derivation Trees)
Derivations can be represented by trees called “parse trees”.
Constructing Parse Trees:
Let G = (V, T, P, S) be a grammar. The parse trees for G are trees with the following
conditions:
1. Each interior node is labeled by a variable in V.
2. Each leaf is labeled by either a variable, a terminal or ε. A leaf labeled ε must be the only child of its parent.
3. If an interior node is labeled A, and its children are labeled X1, X2, …, Xk respectively from the
left, then A → X1X2…Xk is a production in P.
4. If an interior node labeled A has a single child labeled ε, then A → ε is a production in P.
Ex.1: Construct a parse tree for the grammar, E → E + E | E * E | (E) | id, for the string
id*id+id
Soln: [Figure: derivation and parse tree for id*id+id]
Ex.2: For the grammar P → ε | 0 | 1 | 0P0 | 1P1, construct a parse tree showing the
derivation of the string 0110.
Soln: [Figure: derivation and parse tree for 0110]
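The parse tree of Ex.2 can be built and its yield read off from left to right. This is a minimal Python sketch with hypothetical helpers `palindrome_tree`, `tree_yield` and `show`. Trees are represented as (label, children) pairs with plain strings as leaves. The tree corresponds to the derivation P ⇒ 0P0 ⇒ 01P10 ⇒ 0110.

```python
EPS = "ε"


def palindrome_tree(w):
    """Parse tree of w in P -> ε | 0 | 1 | 0P0 | 1P1, or None if there is none."""
    if w == "":
        return ("P", [EPS])                      # P -> ε
    if w in ("0", "1"):
        return ("P", [w])                        # P -> 0 | 1
    if w[0] == w[-1] and w[0] in "01":
        inner = palindrome_tree(w[1:-1])
        if inner is not None:
            return ("P", [w[0], inner, w[-1]])   # P -> 0P0 | 1P1
    return None


def tree_yield(node):
    """Concatenate the leaf labels from left to right (ε contributes nothing)."""
    if isinstance(node, str):
        return "" if node == EPS else node
    _, children = node
    return "".join(tree_yield(child) for child in children)


def show(node, depth=0):
    """Print one node per line, indented by its depth in the tree."""
    label = node if isinstance(node, str) else node[0]
    print("  " * depth + label)
    if not isinstance(node, str):
        for child in node[1]:
            show(child, depth + 1)


tree = palindrome_tree("0110")
show(tree)
print(tree_yield(tree))   # 0110
```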
Relationship between Derivation Trees and Derivations (16 Marks Important Question)
Theorem: Let G = (V, T, P, S) be a CFG. Then S ⇒* α if and only if there is a derivation tree in
grammar G with yield α.
Proof: Suppose there is a parse tree with root S and yield α. We show that there is a derivation
S ⇒* α in G, which is leftmost (S ⇒*lm α) when α is a terminal string. The proof is by induction
on the height of the tree.
Basis: If the height of the parse tree is 1, the tree consists of the root S whose children are all
leaves, and the yield is α. [Figure: root S with leaves only, yield α]
This means that S has only leaves and no subtrees. This is possible only if S → α is a
production in G, so S ⇒lm α is a one-step leftmost derivation.
Induction: Assume there is a leftmost derivation X ⇒*lm β for every parse tree of height less than n
with root X and yield β. Consider a parse tree of height n (n > 1). Its root S has children labeled
X1, X2, …, Xk, and α = α1α2…αk, where αi is the yield of the subtree rooted at Xi. The first step of
the derivation is S ⇒lm X1X2…Xk. The Xi's may be either terminals or variables.
(i) If Xi is a terminal, then Xi = αi.
(ii) If Xi is a variable, then it is the root of a subtree with yield αi and height less
than n. By applying the inductive hypothesis, there is a leftmost derivation
Xi ⇒*lm αi
Now replace X1, X2, …, Xk in turn from the left. If Xi is a terminal there is no change;
otherwise apply Xi ⇒*lm αi. After the first i symbols have been handled,
S ⇒*lm α1α2…αi Xi+1…Xk
Each step is leftmost because, when α is a terminal string, α1α2…αi-1 contains no variables at the
moment Xi is expanded. By repeating the process for i = 1, 2, …, k we get,
S ⇒*lm α1α2…αk [ α = α1α2…αk ]
Thus proved. (The converse, from a derivation to a tree, is proved similarly by induction on the
number of steps in the derivation.)
AMBIGUITY IN GRAMMARS AND LANGUAGES [May/June 16]
Natural languages sometimes contain ambiguous sentences. Similarly, a CFG may have
two different derivations (parse trees) for the same string.
Ambiguous Grammars:
A CFG G = (V, T, P, S) is said to be ambiguous if there is at least one string w that has two
different parse trees.
Ex.1: Show that the grammar E → E + E | E * E | (E) | id is ambiguous by generating
the string id+id*id in two ways.
Soln: (both derivations are leftmost)
Derivation1:
E ⇒ E+E
⇒ id+E [E → id]
⇒ id+E*E [E → E*E]
⇒ id+id*E [E → id]
⇒ id+id*id [E → id]
Derivation2:
E ⇒ E*E
⇒ E+E*E [E → E+E]
⇒ id+E*E [E → id]
⇒ id+id*E [E → id]
⇒ id+id*id [E → id]
[Figure: Parse Tree1 and Parse Tree2 for id+id*id]
Unambiguous:
If each string has at most one parse tree in the grammar, then the grammar is
unambiguous.
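The two parse trees of Ex.1 can be enumerated mechanically. This is a minimal Python sketch with a hypothetical helper `parse_trees`. It tries every operator position as the root of an E → E op E step. Each tree is rendered as a fully parenthesised string. (id+(id*id)) corresponds to Derivation1, and ((id+id)*id) corresponds to Derivation2.

```python
from functools import lru_cache


def parse_trees(tokens):
    """Every parse tree of tokens under E -> E+E | E*E | (E) | id,
    each rendered as a fully parenthesised string."""
    tokens = tuple(tokens)

    @lru_cache(maxsize=None)
    def trees(i, j):
        result = []
        # E -> id : a single operand token
        if j - i == 1 and tokens[i] not in ("+", "*", "(", ")"):
            result.append(tokens[i])
        # E -> (E)
        if j - i >= 3 and tokens[i] == "(" and tokens[j - 1] == ")":
            result += ["(" + t + ")" for t in trees(i + 1, j - 1)]
        # E -> E+E | E*E, with the operator at position k as the root
        for k in range(i + 1, j - 1):
            if tokens[k] in ("+", "*"):
                for left in trees(i, k):
                    for right in trees(k + 1, j):
                        result.append("(" + left + tokens[k] + right + ")")
        return tuple(result)

    return list(trees(0, len(tokens)))


print(parse_trees(["id", "+", "id", "*", "id"]))
# ['(id+(id*id))', '((id+id)*id)']
```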
Leftmost Derivations as a way to Express Ambiguity: [May/June 16]
A grammar is ambiguous if and only if some string has more than one leftmost derivation.
Ex: Show that the grammar
E → I | E+E | E*E | (E)
I → a | b | Ia | Ib | I0 | I1
is ambiguous by generating two leftmost derivations for the string a+a*a.
Soln:
Derivation1 (⇒lm):
E ⇒ E+E
⇒ I+E [E → I]
⇒ a+E [I → a]
⇒ a+E*E [E → E*E]
⇒ a+I*E [E → I]
⇒ a+a*E [I → a]
⇒ a+a*I [E → I]
⇒ a+a*a [I → a]
Derivation2 (⇒lm):
E ⇒ E*E
⇒ E+E*E [E → E+E]
⇒ I+E*E [E → I]
⇒ a+E*E [I → a]
⇒ a+I*E [E → I]
⇒ a+a*E [I → a]
⇒ a+a*I [E → I]
⇒ a+a*a [I → a]
[Figure: Parse Tree1 and Parse Tree2 for a+a*a]
Inherent Ambiguity:
A CFL L is said to be inherently ambiguous if every grammar for the language is
ambiguous.
Ex: Show that the language L = {a^n b^n c^m d^m | n≥1, m≥1} ∪ {a^n b^m c^m d^n | n≥1, m≥1}
is inherently ambiguous, given the grammar with productions P:
S → AB | C
A → aAb | ab
B → cBd | cd
C → aCd | aDd
D → bDc | bc
Soln: L is a context free language. The grammar uses separate sets of productions to generate the
two kinds of strings in L (S → AB for the first set and S → C for the second). This grammar is
ambiguous: for example, the string aabbccdd has two leftmost derivations.
Derivation1 (⇒lm):
S ⇒ AB
⇒ aAbB
⇒ aabbB
⇒ aabbcBd
⇒ aabbccdd
Derivation2 (⇒lm):
S ⇒ C
⇒ aCd
⇒ aaDdd
⇒ aabDcdd
⇒ aabbccdd
This shows that this particular grammar is ambiguous; a complete proof that every grammar for L
is ambiguous is considerably longer.
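The source of the ambiguity can be seen by testing membership in the two halves of L. Strings that belong to both sets, such as aabbccdd, can be derived both through S → AB and through S → C. They therefore have at least two parse trees. This is a minimal Python sketch, and `runs`, `in_first` and `in_second` are hypothetical helpers.

```python
import re


def runs(w):
    """Lengths of the a-, b-, c- and d-blocks if w = a+b+c+d+, else None."""
    m = re.fullmatch(r"(a+)(b+)(c+)(d+)", w)
    return None if m is None else tuple(len(g) for g in m.groups())


def in_first(w):
    """a^n b^n c^m d^m, the strings generated through S -> AB."""
    r = runs(w)
    return r is not None and r[0] == r[1] and r[2] == r[3]


def in_second(w):
    """a^n b^m c^m d^n, the strings generated through S -> C."""
    r = runs(w)
    return r is not None and r[0] == r[3] and r[1] == r[2]


for w in ["aabbccdd", "aabbcd", "abbccd"]:
    print(w, in_first(w), in_second(w))
# aabbccdd True True
# aabbcd True False
# abbccd False True
```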
NORMAL FORMS FOR CFG [Apr/May 11, 12, 13, 14, May/June 16]
Simplification of CFG:
In a CFG, it may not be necessary to use all the symbols in V ∪ T, or all the productions in P, for
deriving sentences. So we try to eliminate the symbols and productions in G which are not useful for
the derivation of sentences. To simplify a CFG we perform the following steps:
(a) Eliminating Useless Symbols.
(b) Eliminating ε – Productions.
(c) Eliminating Unit Productions.
(a) Eliminating Useless Symbols:
Symbols that can never take part in the derivation of any terminal string are called useless symbols;
the productions that use them are removed with them.
Definition: Let G = (V, T, P, S) be a grammar. A symbol X is useful if there is a derivation S ⇒* αXβ
⇒* w, where w is in T*. If a symbol X is not useful, we say it is useless. Useless symbols are
eliminated in two steps, in this order:
(1) First, eliminate non-generating symbols (those from which no terminal string can be derived).
(2) Second, eliminate all symbols that are not reachable from S in the grammar G.
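The two steps can be sketched in Python under assumed conventions. Every symbol is one character, variables are upper-case letters, and '' stands for ε. The function `remove_useless` is hypothetical. Step (1) grows the set of generating variables until nothing changes. Step (2) performs a reachability search from S. On Ex.1 below it returns only S → a, because A becomes unreachable once S → AB is removed.

```python
def remove_useless(grammar, start):
    """Remove useless symbols from a grammar given as {variable: [bodies]}."""
    is_var = str.isupper

    # Step (1): find generating variables (those deriving some terminal string).
    generating, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in generating and any(
                    all(not is_var(s) or s in generating for s in body)
                    for body in bodies):
                generating.add(head)
                changed = True
    # Keep only generating variables and bodies made of generating symbols.
    step1 = {head: [body for body in bodies
                    if all(not is_var(s) or s in generating for s in body)]
             for head, bodies in grammar.items() if head in generating}

    # Step (2): keep only variables reachable from the start symbol.
    reachable, stack = {start}, [start]
    while stack:
        for body in step1.get(stack.pop(), []):
            for s in body:
                if is_var(s) and s not in reachable:
                    reachable.add(s)
                    stack.append(s)
    return {head: bodies for head, bodies in step1.items() if head in reachable}


print(remove_useless({"S": ["AB", "a"], "A": ["a"]}, "S"))            # {'S': ['a']}
print(remove_useless({"S": ["aSbS", "C", ""], "C": ["aC"]}, "S"))     # {'S': ['aSbS', '']}
```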
Ex.1: Eliminate useless symbols from the given grammar,
S → AB | a
A→a
Soln: No terminal string is derivable from B, so B is non-generating. We eliminate the symbol B and the
production S→AB. After that, A is no longer reachable from S, so A→a is also eliminated. After
eliminating the useless symbols, the productions are,
S→a
Ex.2: Eliminate useless symbols from the given grammar,
S → aSbS | C | ε
C → aC
Soln: The production S → C is a useless production, since C cannot derive any terminal string. We
eliminate the variable C and the productions S → C and C → aC.
After eliminating all useless symbols the productions are,
S → aSbS | ε
(b) Eliminating ε – Productions: (May/June 16)
Any production of a CFG of the form A → ε is called an ε – production. Any variable A for which
the derivation A ⇒* ε is possible is called nullable.
Steps:
(i) Find the set V nullable of nullable variables of G: for every production of the form A→ε, put A in
V nullable. Then, if B→A1A2………An, where A1, A2 ….An are all in V nullable, put B in V
nullable as well. Repeat until no new variable is added.
(ii) Construct a new set of productions P′. For each production in P, put that production and all
productions obtained by deleting nullable variables from its body, in all possible combinations, into P′,
but do not add any production of the form A → ε.
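Steps (i) and (ii) can be sketched in Python with the same assumed conventions: one character per symbol, upper-case variables, and '' for an ε-body. `nullable_set` and `remove_epsilon` are hypothetical names. Step (ii) uses `itertools.product` to keep or drop each nullable symbol in every combination, and it never adds an ε-body. It can also be run on the grammar of Part A, question 12. Note that it keeps the unit production B → A there, which only the unit-production step removes.

```python
from itertools import product


def nullable_set(grammar):
    """Step (i): variables that can derive ε."""
    nullable, changed = set(), True
    while changed:
        changed = False
        for head, bodies in grammar.items():
            if head not in nullable and any(
                    all(s in nullable for s in body) for body in bodies):
                nullable.add(head)
                changed = True
    return nullable


def remove_epsilon(grammar):
    """Step (ii): add every variant of each body with nullable symbols
    kept or deleted, but never an ε-body."""
    nullable = nullable_set(grammar)
    new = {}
    for head, bodies in grammar.items():
        result = []
        for body in bodies:
            choices = [(s, "") if s in nullable else (s,) for s in body]
            for pick in product(*choices):
                candidate = "".join(pick)
                if candidate and candidate not in result:
                    result.append(candidate)
        new[head] = result
    return new


print(remove_epsilon({"S": ["abB"], "B": ["Bb", ""]}))
# {'S': ['abB', 'ab'], 'B': ['Bb', 'b']}
```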
Ex.1: Eliminate ε – productions from the grammar,
S → abB
B → Bb | ε
Soln:
(i) B → ε is a null production. V nullable = {B}
(ii) Construct the productions in P′:
S → abB | ab
B → Bb | b
Ex.2: Eliminate ε – productions from the grammar,
S → BabC
B→a|ε
C→b|ε
Soln:
(i) Find all nullable variables, V nullable = {C, B}.
(ii) Construct the productions in P′:
S → BabC | abC | Bab | ab
B→a
C→b
Soln:
(1) Add all the non-unit productions to P′:
S → ABA | BA | AA | AB
A → aA | a
B → bB | b
(2) S → A and S → B are unit productions. Since A and B are derivable from S, replace the unit
productions by the productions of A and B to get,
S → ABA | BA | AA | AB | aA | a | bB | b
A → aA | a
B → bB | b
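Unit-production removal can be sketched in Python under assumed conventions: one character per symbol and upper-case variables. The sketch assumes the grammar before step (2) was S → ABA | BA | AA | AB | A | B, with A and B as above, as step (2) indicates. The function `remove_unit` is hypothetical. For each variable it follows chains of unit productions and collects the non-unit bodies found along the way.

```python
def remove_unit(grammar):
    """Replace unit productions A -> B (B a single variable) by the non-unit
    bodies of every variable reachable from A through unit productions."""
    def is_unit(body):
        return len(body) == 1 and body.isupper()

    new = {}
    for head in grammar:
        # Variables reachable from head through unit productions (incl. head).
        reach, stack = {head}, [head]
        while stack:
            for body in grammar.get(stack.pop(), []):
                if is_unit(body) and body not in reach:
                    reach.add(body)
                    stack.append(body)
        result = []
        for var in [head] + sorted(reach - {head}):
            for body in grammar.get(var, []):
                if not is_unit(body) and body not in result:
                    result.append(body)
        new[head] = result
    return new


g = {"S": ["ABA", "BA", "AA", "AB", "A", "B"], "A": ["aA", "a"], "B": ["bB", "b"]}
for head, bodies in remove_unit(g).items():
    print(head, "→", " | ".join(bodies))
```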
Types of Normal Forms:
There are two normal forms, they are,
(1) Chomsky Normal Form (CNF)
(2) Greibach Normal Form (GNF)
(1) Chomsky Normal Form: [May/June 16]
Every CFL without ε is generated by a CFG in which all productions are of the form A → BC or A → a,
where A, B, C are variables and a is a terminal. This form is called Chomsky Normal Form (CNF).
Rules for Converting a Grammar into CNF:
(i) Simplify the grammar by eliminating ε – productions, unit productions and useless symbols.
(ii) Add all the productions of the form A → BC and A → a to the new production set P′.
(iii) Consider a production A → A1A2 ……. An with n ≥ 2. If some Ai is a terminal, say ai, then add a
new variable Cai to the new set of variables V′ and a new production Cai → ai to the new set of
productions P′. Replace Ai in the A-production by Cai.
(iv) Consider A → A1A2 ……… An, where n ≥ 3 and all the Ai's are variables. Introduce the new
productions A → A1C1, C1 → A2C2, …, Cn-2 → An-1An into the new set of productions P′ and the
new variables C1, C2 … Cn-2 into the new set of variables V′.
Ex.1: Convert the following grammar into CNF,
S → AAC
A → aAb | ε
C → aC | a
Soln:
Step 1: Simplify the Grammar:
Eliminate ε – Productions:
- Find nullable variables V nullable = {A}
- Construct the productions in P′.
P′: S → AAC | AC | C
A → aAb | ab
C → aC | a
Eliminate Unit Productions:
S → C is a Unit Production. Replace C by its productions,
S → AAC | AC | aC | a
A → aAb | ab
C → aC | a
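Rules (iii) and (iv) can be applied mechanically to the simplified grammar above. This is a minimal Python sketch, not a general converter. It assumes the grammar has already been simplified as in Step 1. Bodies are lists of symbols, and variables start with an upper-case letter. The new names Ca, Cb and D1, D2, … follow the notes' naming style but are otherwise arbitrary. `to_cnf` is a hypothetical helper.

```python
def to_cnf(grammar):
    """Rules (iii) and (iv) for an already simplified grammar
    (no ε-, unit or useless productions)."""
    new = {head: [] for head in grammar}
    term_var = {}                          # terminal a -> new variable Ca
    counter = 0

    def var_for(t):
        if t not in term_var:
            term_var[t] = "C" + t
            new["C" + t] = [[t]]           # Ca -> a
        return term_var[t]

    for head, bodies in grammar.items():
        for body in bodies:
            if len(body) == 1:
                new[head].append(list(body))   # A -> a is already in CNF
                continue
            # (iii) replace every terminal in a long body by its Ca variable
            body = [s if s[0].isupper() else var_for(s) for s in body]
            # (iv) break A -> B1 B2 ... Bn (n >= 3) into a chain of pairs
            target = new[head]
            while len(body) > 2:
                counter += 1
                fresh = f"D{counter}"
                target.append([body[0], fresh])   # A -> B1 Dk
                new[fresh] = []
                target, body = new[fresh], body[1:]
            target.append(body)
    return new


simplified = {"S": [["A", "A", "C"], ["A", "C"], ["a", "C"], ["a"]],
              "A": [["a", "A", "b"], ["a", "b"]],
              "C": [["a", "C"], ["a"]]}
for head, bodies in to_cnf(simplified).items():
    print(head, "→", " | ".join("".join(b) for b in bodies))
```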
S → CaACa ; Ca → a
S → CaD1 ; D1 → ACa
S → bBb
S → CbBCb ; Cb → b
S → CbD2 ; D2 → BCb
D → ab
D → CaCb
The Resultant Grammar is,
S → CaD1 | CbD2
Ca → a
Cb → b
D1 → ACa
D2 → BCb
A → a
B → b
D → a | b | CaCb
Ex.3: Convert the following grammar into CNF,
S → abAB
A → bAB | ε
B → BAa | A | ε
Soln:
Step 1: Simplify the Grammar:
Eliminate ε – Productions:
- Find nullable variables Vnullable = {A, B}
- Construct the productions in P′.
P′: S → abAB | abB | abA | ab
A → bAB | bB | bA | b
B → BAa | Ba | Aa | a | A
Eliminate Unit Productions:
B → A is a Unit Production. Replace A by its productions,
S → abAB | abB | abA | ab
A → bAB | bB | bA | b
B → BAa | Ba | Aa | a | bAB | bB | bA | b
Eliminate Useless Symbols:
There are no useless symbols. All the variables generate terminal strings.
Step 2: Reduce the given grammar into CNF:
Add all the productions of the form A → BC or A → a.
S → abAB:  S → CaCbAB ; Ca → a ; Cb → b
           S → CaCbD1 ; D1 → AB
           S → CaD2 ; D2 → CbD1
S → abB:   S → CaCbB
           S → CaD3 ; D3 → CbB
S → abA:   S → CaCbA
           S → CaD4 ; D4 → CbA
S → ab:    S → CaCb
A → bAB:   A → CbAB
           A → D4B
A → bB:    A → CbB
A → bA:    A → CbA
B → BAa:   B → BACa
           B → BD5 ; D5 → ACa
B → Ba:    B → BCa
B → Aa:    B → ACa
B → bAB:   B → CbAB
           B → CbD1
B → bB:    B → CbB
B → bA:    B → CbA
A → β | βB
B → α | αB
Ex.1: Construct a GNF for the following grammar,
S → AA | a
A → SS | b
Soln:
Step 1:
The given grammar is in CNF. Rename the variables S and A as A1 and A2, and modify the
productions,
A1 → A2A2 | a
A2 → A1A1 | b
Step 2:
Check Ai → Aj, where i < j.
- A1 productions are in the required form (i.e., i < j, where i = 1, j = 2).
- A2 productions are in the required form (i.e., i < j, where i = 2, j = 3).
- A3 productions are not in the required form,
i.e., A3 → A1A2 (i > j, where i = 3, j = 1)
A3 → A3A1A3A2 | bA3A2 | a
Here, α = A1A3A2 β = bA3A2 | a
A3 → bA3A2 | a | (bA3A2 | a) B
A3 → bA3A2 | a | bA3A2B | aB
B → A1A3A2 | A1A3A2B
bA3A3A2B
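The replacement applied to A3 above uses the rule A → β | βB, B → α | αB. This rule removes immediate left recursion from A → Aα | β. This is a minimal Python sketch of that rule. `remove_left_recursion` is a hypothetical helper. Bodies are lists of symbols, and the caller chooses the name of the new variable.

```python
def remove_left_recursion(head, bodies, new_var):
    """Replace A -> A α1 | ... | A αr | β1 | ... | βs by
         A -> βi | βi B    and    B -> αi | αi B   (B = new_var)."""
    alphas = [body[1:] for body in bodies if body and body[0] == head]
    betas = [body for body in bodies if not body or body[0] != head]
    a_bodies = betas + [beta + [new_var] for beta in betas]
    b_bodies = alphas + [alpha + [new_var] for alpha in alphas]
    return a_bodies, b_bodies


a3, b = remove_left_recursion(
    "A3", [["A3", "A1", "A3", "A2"], ["b", "A3", "A2"], ["a"]], "B")
print("A3 →", " | ".join("".join(x) for x in a3))   # A3 → bA3A2 | a | bA3A2B | aB
print("B →", " | ".join("".join(x) for x in b))     # B → A1A3A2 | A1A3A2B
```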