CH 4 - Context Free Languages and Grammars

The document discusses context-free grammars and context-free languages. It defines context-free grammars formally and provides examples. It also covers derivation order, derivation trees, ambiguity in grammars, and simplifying context-free grammars including Chomsky normal form.

Uploaded by

halal.army07
Chapter 4 – Context Free Grammar

and
Context Free Languages

1
Outline
CFGs and CFLs

Parsing (or derivation) and Parse trees

Ambiguity of grammar and language

Simplification of CFGs

Chomsky Normal Form (CNF) and Greibach Normal Form (GNF)
2
Context-Free Grammars
and
Context-Free Languages
Formal Definitions
Grammar: G = (V, T, P, S)

V: set of variables
T: set of terminal symbols
P: set of productions
S: start variable

4
Context-Free Grammar: G = (V, T, P, S)

All productions in P are of the form

A → s

where A is a variable and s is a string of variables and terminals.

* The restrictions that regular grammars place on the right side of productions (the number and position of variables) are relaxed.
5
Example of Context-Free Grammar
S → aSb | λ

G = (V, T, P, S)

V = {S} (variables)
T = {a, b} (terminals)
P = {S → aSb, S → λ} (productions)
S is the start variable
6
Language of a Grammar:

For a grammar G with start variable S:

L(G) = {w : S =*> w, w ∈ T*}

Each w is a string of terminals or λ.

7
Context-Free Language:
A language L is context-free
if there is a context-free grammar G
with L = L(G).

8
Example:

L = {a^n b^n : n ≥ 0}

is a context-free language
since the context-free grammar G:
S → aSb | λ

generates L(G) = L

9
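As a quick check of the claim L(G) = {a^n b^n : n ≥ 0}, the short strings of L(G) can be enumerated mechanically. The sketch below is only an illustration: the function name `generate`, the dictionary encoding of the grammar (uppercase letters as variables, lowercase letters as terminals, the empty string for λ), and the pruning rule are assumptions made here, and the search is only suited to grammars like this one, where every recursive rule adds at least one terminal.

```python
from collections import deque

def generate(grammar, start, max_len):
    """Collect all terminal strings of length <= max_len derivable from start.

    Assumed encoding: uppercase letters are variables, lowercase letters
    are terminals, and "" stands for the empty string (lambda).
    """
    results = set()
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        # Find the leftmost variable in the current sentential form.
        idx = next((i for i, s in enumerate(form) if s.isupper()), None)
        if idx is None:
            results.add(form)  # no variables left: a string of the language
            continue
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            # Prune forms that already carry too many terminals.
            terminal_count = sum(1 for s in new if s.islower())
            if terminal_count <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return sorted(results, key=lambda w: (len(w), w))

print(generate({"S": ["aSb", ""]}, "S", 6))
```

For max_len = 6 this lists λ (printed as the empty string), ab, aabb and aaabbb, which agrees with a^n b^n for n = 0, 1, 2, 3.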
Another Example

Context-free grammar G:
S → aSa | bSb | λ
Example derivations:
S => aSa => abSba => abba
S => aSa => abSba => abaSaba => abaaba

L(G) = {w w^R : w ∈ {a, b}*}
Palindromes of even length
10
Another Example
Context-free grammar G:
S → aSb | SS | λ
Example derivations:
S => SS => aSbS => abS => ab
S => SS => aSbS => abS => abaSb => abab

Describes balanced parentheses, with a = ( and b = ):
() ((( ))) (( ))
A core idea of most programming languages.
11
Another Example
S → aB | bA
A → a | aS | bAA
B → b | bS | aBB

Is ab, baba, abbbaa in L?

How about a, bba, abbaba?
Show with the possible derivations.

12
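The membership questions above can be cross-checked with a small search over leftmost derivations. This is a hedged sketch, not a general parser: the name `derives` and the string encoding (uppercase variables, lowercase terminals) are assumptions made here, and the length-based pruning is only valid because this grammar has no λ-rules, so sentential forms never shrink.

```python
def derives(grammar, start, target):
    """Return True if target can be derived from start.

    Assumes the grammar has no lambda-rules, so any sentential form
    longer than the target can be discarded.
    """
    seen = set()
    stack = [start]
    while stack:
        form = stack.pop()
        if form == target:
            return True
        # Leftmost variable of the form, if any.
        idx = next((i for i, s in enumerate(form) if s.isupper()), None)
        if idx is None:
            continue  # a different terminal string
        if form[:idx] != target[:idx]:
            continue  # terminal prefix already disagrees with the target
        for rhs in grammar[form[idx]]:
            new = form[:idx] + rhs + form[idx + 1:]
            if len(new) <= len(target) and new not in seen:
                seen.add(new)
                stack.append(new)
    return False

G = {
    "S": ["aB", "bA"],
    "A": ["a", "aS", "bAA"],
    "B": ["b", "bS", "aBB"],
}
for w in ["ab", "baba", "abbbaa", "a", "bba", "abbaba"]:
    print(w, derives(G, "S", w))
```

The search reports ab, baba, abbbaa and abbaba as derivable and a and bba as not derivable. For instance, S => aB => abS => abbA => abbaS => abbabA => abbaba.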
Another Example
Construct a grammar that generates the
language:
L = {a^n b^n c^i | n ≥ 0, i ≥ 0}
Answer:
S → S1 S2
S1 → aS1b | λ
S2 → cS2 | λ
Check your answer by deriving
abc, aabbc, aaabbbcccc
13
Another Example
Write the productions of a CFG that generates the
language:
L = {ab (bbaa)^n bba | n ≥ 0}

Answer 1:
S → S1S2S3
S1 → ab
S2 → bbaaS2 | λ
S3 → bba

Answer 2:
S → abABa
A → bbaaA | λ
B → bb

14
Derivation Order
and
Derivation Trees
Derivation Order
Consider the following example grammar
with 5 productions:

1. S → AB
2. A → aaA
3. A → λ
4. B → Bb
5. B → λ

16
Leftmost derivation order of string aab (rules applied, in order: 1, 2, 3, 4, 5):

S => AB => aaAB => aaB => aaBb => aab

At each step, we substitute the leftmost variable.
17
Rightmost derivation order of string aab (rules applied, in order: 1, 4, 5, 2, 3):

S => AB => ABb => Ab => aaAb => aab

At each step, we substitute the rightmost variable.
18
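The two derivation orders can be replayed mechanically from the rule numbers. The sketch below assumes the five numbered productions of this example; the names `RULES` and `derive` and the string encoding (uppercase variables, "" for λ) are illustrative choices, not part of the slides.

```python
# The five numbered productions; "" stands for lambda.
RULES = {
    1: ("S", "AB"),
    2: ("A", "aaA"),
    3: ("A", ""),
    4: ("B", "Bb"),
    5: ("B", ""),
}

def derive(start, rule_numbers, leftmost=True):
    """Apply the numbered rules in order to the leftmost (or rightmost) variable."""
    form = start
    steps = [form]
    for n in rule_numbers:
        lhs, rhs = RULES[n]
        positions = [i for i, s in enumerate(form) if s.isupper()]
        i = positions[0] if leftmost else positions[-1]
        if form[i] != lhs:
            raise ValueError(f"rule {n} does not apply to {form!r}")
        form = form[:i] + rhs + form[i + 1:]
        steps.append(form)
    return steps

print(" => ".join(derive("S", [1, 2, 3, 4, 5], leftmost=True)))
print(" => ".join(derive("S", [1, 4, 5, 2, 3], leftmost=False)))
```

Both calls end in aab; only the order in which the variables are replaced differs.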
Derivation Trees
Consider the same example grammar:

S → AB    A → aaA | λ    B → Bb | λ

And a derivation of aab:

S => AB => aaAB => aaABb => aaBb => aab

20
[Figure: the derivation tree for aab, built up one derivation step at a time. The root S has children A and B; A has children a, a, A, and that inner A has the single child λ; B has children B and b, and that inner B has the single child λ. The yield, read left to right, is aab.]

25
S → aAS | a    A → SbA | SS | ba
Consider the CFG with the above production rules.
Find the leftmost and rightmost derivations and
construct a parse tree (derivation tree) for
each of the following strings:
i) aabaaa
ii) abaabaa
iii) aabaabbaaaa

26
Sometimes, derivation order doesn't matter

Leftmost derivation:
S => AB => aaAB => aaB => aaBb => aab
Rightmost derivation:
S => AB => ABb => Ab => aaAb => aab

Both give the same derivation tree.

27
Ambiguity
Grammar for mathematical expressions

E → E + E | E * E | (E) | a

Example string:
(a + a) * a + (a + a * (a + a))

Here a denotes any number.

29
E → E + E | E * E | (E) | a

E => E + E => a + E => a + E * E
  => a + a * E => a + a * a

A leftmost derivation
for a + a * a

30
E → E + E | E * E | (E) | a

E => E * E => E + E * E => a + E * E
  => a + a * E => a + a * a

Another
leftmost derivation
for a + a * a

31
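E → E + E | E * E | (E) | a

[Figure: the two derivation trees for a + a * a, one with + at the root and one with * at the root.]
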
32
Two different derivation trees
may cause problems in applications which
use the derivation trees:

• Evaluating expressions

• In general, in compilers
for programming languages

35
Ambiguous Grammar:
A context-free grammar G is ambiguous
if there is a string w ∈ L(G) which has:

two different derivation trees

or

two different leftmost derivations

(Two different derivation trees give two
different leftmost derivations and vice versa)
36
Example: E → E + E | E * E | (E) | a

This grammar is ambiguous since the
string a + a * a has two derivation trees.

37
E → E + E | E * E | (E) | a
This grammar is also ambiguous because the
string a + a * a has two leftmost derivations:

E => E + E => a + E => a + E * E
  => a + a * E => a + a * a

E => E * E => E + E * E => a + E * E
  => a + a * E => a + a * a
38
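The number of derivation trees of a string under this particular grammar can be counted directly, which makes the ambiguity concrete. The sketch below is written only for E → E + E | E * E | (E) | a; the function name `count_parse_trees` and the one-character-per-token encoding are assumptions made for illustration.

```python
from functools import lru_cache

def count_parse_trees(w):
    """Count derivation trees of w under E -> E + E | E * E | (E) | a."""
    @lru_cache(maxsize=None)
    def trees(i, j):
        # Number of trees with root E whose yield is w[i:j].
        if i >= j:
            return 0
        total = 0
        if j - i == 1 and w[i] == "a":          # E -> a
            total += 1
        if w[i] == "(" and w[j - 1] == ")":     # E -> (E)
            total += trees(i + 1, j - 1)
        for k in range(i + 1, j - 1):           # E -> E + E and E -> E * E
            if w[k] in "+*":
                total += trees(i, k) * trees(k + 1, j)
        return total
    return trees(0, len(w))

print(count_parse_trees("a+a*a"))    # 2
print(count_parse_trees("(a+a)*a"))  # 1
```

The string a + a * a has two trees, one per choice of top-level operator, while the fully parenthesized string has only one.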
Another ambiguous grammar:

STMT → if EXPR then STMT
     | if EXPR then STMT else STMT

Variables: STMT, EXPR
Terminals: if, then, else

Very common piece of grammar
in programming languages
39
In general, ambiguity is bad
and we want to remove it

Sometimes it is possible to find


a non-ambiguous grammar for a language

But, in general we cannot do so

41
Disambiguation
• Can we always disambiguate a grammar?
• No, for two reasons:
  – There exist inherently ambiguous context-free languages L:
    every CFG for such a language is ambiguous.
  – There is no general procedure that can tell whether a grammar is
    ambiguous.
• However, grammars used in programming languages
  can typically be disambiguated.
42
A successful example:

Ambiguous grammar:
E → E + E
E → E * E
E → (E)
E → a

Equivalent non-ambiguous grammar:
E → E + T | T
T → T * F | F
F → (E) | a

The second grammar generates the same language.
43
E → E + T | T
T → T * F | F
F → (E) | a

E => E + T => T + T => F + T => a + T => a + T * F
  => a + F * F => a + a * F => a + a * a

Unique
derivation tree
for a + a * a
44
Context-Free Languages

Context-Free Grammars        Pushdown Automata

46
Simplification of CFG
and
Chomsky Normal Form
Simplifying CFGs
• There are several ways in which context-free grammars
  can be simplified.
• One natural way is to eliminate useless symbols:
  – those that cannot be part of a derivation (or parse tree)
• Symbols may be useless in one of two ways:
  – they may not be reachable from the start symbol,
  – or they may be variables that cannot derive a string of
    terminals.

48
Example of a useless symbol
• Consider the CFG G with rules
  – S → aBC, B → b|Cb, C → c|cC, D → d
• Here the symbols S, B, C, a, b, and c are reachable, but
  D is not.
• D may be removed without changing L(G).

49
Reachable symbols
• In a CFG, a symbol is reachable iff it is S or
  – it appears in α, where A → α is a rule of the grammar and A is
    reachable.
• So in the grammar above
  – we first find that S is reachable,
  – then a, B, and C are,
  – and finally b and c are.
• A symbol that is unreachable cannot be part of a
  derivation. It may be eliminated along with all of its
  rules.
50
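The characterization of reachable symbols translates into a simple top-down search from the start symbol. This is a minimal sketch; the name `reachable` and the dictionary encoding of the grammar (one character per symbol) are assumptions made here.

```python
def reachable(grammar, start):
    """Return the set of symbols reachable from the start symbol."""
    found = {start}
    frontier = [start]
    while frontier:
        symbol = frontier.pop()
        # Terminals have no rules, hence the empty default.
        for rhs in grammar.get(symbol, []):
            for s in rhs:
                if s not in found:
                    found.add(s)
                    frontier.append(s)
    return found

G = {"S": ["aBC"], "B": ["b", "Cb"], "C": ["c", "cC"], "D": ["d"]}
print(sorted(reachable(G, "S")))
```

For the grammar above the result contains S, B, C, a, b and c, but neither D nor d.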
Another reachability example
• Suppose the grammar G instead had rules
  – S → aB, B → b|Cb, C → c|cC, D → d
• Then we would first see that S is reachable,
  – then that a and B are,
  – then that b and C are,
  – and finally that c is.
• We might say in this case that
  – S is reachable at level 0,
  – a and B at level 1,
  – b and C at level 2,
  – and c at level 3.
51
A second kind of useless symbol
• Two simple inductions show that X is reachable iff S =*>
  αXβ for some strings α and β of symbols.
• A symbol X is also useless if it cannot derive a string of
  terminals,
  – that is, if there is no string w of terminals such that X =*> w.

52
Another simplification example
• In the grammar with rules
  – S → aB, B → b|BD|cC, C → cC, D → d
• the symbol C cannot derive a string of terminals.
• So it and all rules that contain it may be eliminated to
  get just
  – S → aB, B → b|BD, D → d

53
Generating strings of terminals
• A simple induction shows that the only symbols that
  can generate strings of terminals are
  – terminal symbols, and
  – variables A for which A → α is a rule of the grammar and every
    symbol of α generates a string of terminals.

54
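The bottom-up characterization of generating symbols can be sketched as a fixed-point loop. The name `generating` and the dictionary encoding (one character per symbol, terminals passed in explicitly) are illustrative assumptions.

```python
def generating(grammar, terminals):
    """Return the set of symbols that can derive a string of terminals."""
    gen = set(terminals)  # every terminal generates itself
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            # var is generating if some body consists only of generating symbols.
            if var not in gen and any(all(s in gen for s in rhs) for rhs in bodies):
                gen.add(var)
                changed = True
    return gen

G = {"S": ["aB"], "B": ["b", "BD", "cC"], "C": ["cC"], "D": ["d"]}
print(sorted(generating(G, "abcd")))
```

On the grammar S → aB, B → b|BD|cC, C → cC, D → d the loop adds B and D first and then S; C is never added.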
Our example revisited
S → aB, B → b|BD|cC, C → cC, D → d

• In the grammar above, we would observe
  – first that a, b, c, and d generate strings of terminals (at level
    0),
  – then that B and D do (at level 1),
  – and finally that S does (at level 2).

55
Removing the two kinds of useless symbols
• The characterizations of the two kinds of useless
  symbols are similar, except that
  – to find reachable symbols, we work top down;
  – to find generating symbols, we work bottom up.
• When removing useless symbols, it's important to
  remove unreachable symbols last,
  – since only this order will leave only useful symbols at the end
    of the process.

56
Example of removing useless symbols
• Using the algorithms implicit in the above
  characterizations, suppose a CFG has rules
  – S → aB, B → b|bB|CD, C → cC, D → d
• We first observe that a, b, c, and d generate strings of
  terminals (at level 0),
  – then that B and D do (at level 1),
  – and finally that S does (at level 2).
• C is not generating, so the rules B → CD and C → cC are removed.
• But removing the rule B → CD from this grammar
  makes the symbol D unreachable, so D → d must then be removed as well.

57
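The order of the two passes can be sketched as follows: remove non-generating symbols first, then unreachable ones. The function names and the dictionary encoding (one character per symbol) are assumptions for illustration.

```python
def generating(grammar, terminals):
    """Symbols that can derive a string of terminals (bottom up)."""
    gen = set(terminals)
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in gen and any(all(s in gen for s in rhs) for rhs in bodies):
                gen.add(var)
                changed = True
    return gen

def reachable(grammar, start):
    """Symbols reachable from the start symbol (top down)."""
    found, frontier = {start}, [start]
    while frontier:
        for rhs in grammar.get(frontier.pop(), []):
            for s in rhs:
                if s not in found:
                    found.add(s)
                    frontier.append(s)
    return found

def remove_useless(grammar, terminals, start):
    """Drop non-generating symbols first, then unreachable ones."""
    gen = generating(grammar, terminals)
    kept = {v: [r for r in bodies if all(s in gen for s in r)]
            for v, bodies in grammar.items() if v in gen}
    reach = reachable(kept, start)
    return {v: bodies for v, bodies in kept.items() if v in reach}

G = {"S": ["aB"], "B": ["b", "bB", "CD"], "C": ["cC"], "D": ["d"]}
print(remove_useless(G, "abcd", "S"))
```

For the example grammar the first pass removes C together with B → CD, and the second pass then removes D, leaving S → aB, B → b|bB. Running the passes in the opposite order would leave D → d behind, because D is still reachable before B → CD is deleted.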
Exercise on removing useless symbols
• Eliminate the useless variables from the grammars given
  by the production rules:

1) S → aS|A|C
   A → a
   B → aa
   C → aCb

   Answer:
   S → aS|A
   A → a

2) S → aB|bC
   A → Bac|bSC|a
   B → aSB|bBC
   C → SBc|aBC|ac

   Answer:
   S → bC
   C → ac
58
Eliminating λ-rules
• Sometimes it is desirable to eliminate λ-rules from a
  grammar G.
• This cannot be done if λ is in L(G).
• But it's always possible to eliminate λ-rules from a CFG
  and get a grammar that generates L(G) - {λ}.

59
Nullable symbols
• Eliminating λ-rules is like eliminating useless symbols.
• We first define a nullable symbol A to be one such that
  A =*> λ.
• Then for every rule that contains a nullable symbol on
  the RHS, we add a version of the rule that doesn't
  contain this symbol.
• Finally we remove all λ-productions from the resulting
  grammar.

60
Nullability
• Note that λ is in L(G) iff S is nullable.
• In this case a CFG with S → λ as its only λ-rule can be
  obtained by removing all other λ-rules and then adding
  this rule.
• Otherwise, removing λ-rules gives a CFG that
  generates L(G) = L(G) - {λ}.
• By a simple induction, A is nullable iff
  – G has a rule A → λ, or
  – G has a rule A → α, where every symbol in α is nullable.

61
Example: removing nullable symbols
• Suppose G has the 9 rules
  S → ABC | CDB, A → λ|aA, B → b|bB,
  C → λ|cC, D → AC
• Then A and C are nullable, as is D.
• Optionally deleting nullable symbols adds:
  – S → BC | AB | B, S → DB | CB | B,
  – A → a, C → c, D → A | C
• Removing A → λ and C → λ (and not adding D → λ)
  gives a CFG with 16 distinct rules.

62
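The two stages, finding nullable variables and then adding every version of a rule with nullable symbols optionally deleted, can be sketched as below. The names `nullable_vars` and `eliminate_lambda` and the encoding (one character per symbol, "" for λ) are assumptions made here.

```python
from itertools import product

def nullable_vars(grammar):
    """Variables A with A =*> lambda; "" encodes a lambda body."""
    nullable = set()
    changed = True
    while changed:
        changed = False
        for var, bodies in grammar.items():
            if var not in nullable and any(all(s in nullable for s in rhs) for rhs in bodies):
                nullable.add(var)
                changed = True
    return nullable

def eliminate_lambda(grammar):
    """Add rule versions with nullable symbols dropped, then omit lambda-rules."""
    nullable = nullable_vars(grammar)
    result = {}
    for var, bodies in grammar.items():
        new_bodies = set()
        for rhs in bodies:
            # Each nullable symbol may be kept or deleted.
            options = [(s, "") if s in nullable else (s,) for s in rhs]
            for choice in product(*options):
                body = "".join(choice)
                if body:  # never emit a lambda-rule
                    new_bodies.add(body)
        result[var] = new_bodies
    return result

G = {"S": ["ABC", "CDB"], "A": ["", "aA"], "B": ["b", "bB"],
     "C": ["", "cC"], "D": ["AC"]}
new_rules = eliminate_lambda(G)
print(sorted(nullable_vars(G)), sum(len(b) for b in new_rules.values()))
```

For the 9-rule grammar above this reports A, C and D as nullable and produces 16 distinct rules. The body B arises from both S → ABC and S → CDB but is counted once.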
Exercise on Eliminating λ-Productions
• Remove the empty productions from the following
  production rules of a CFG:

S → AB
A → aAA | λ
B → bBB | λ

Answer:
S → AB | A | B
A → aAA | aA | a
B → bBB | bB | b

63
Observations on the previous example
S → ABC | CDB, A → λ|aA, B → b|bB,
C → λ|cC, D → AC

• Note that if the rule S → ABC had been replaced by
  S → AC, then λ would be in L(G).
• We'd then have to allow the rule S → λ into the
  simplified grammar to generate all of L(G).
• Our algorithm for eliminating λ-rules has the annoying
  property that it introduces rules with a single variable
  on the RHS.
64
Unit productions
• Productions of the form A → B are called unit
  productions.
• Unit productions can be eliminated from a CFG.
• In all cases where A =*> B => α, a rule must be added of
  the form A → α.

65
Eliminating unit productions -- example
• Consider the familiar grammar with rules
  – E → E+T | T, T → T*F | F, F → x | (E)
• Here we have that
  – E =*> T and T =*> F (at level 0), and
  – E =*> F (at level 1)
• Eliminating unit productions gives new rules
  – E → E+T | T*F | x | (E)
  – T → T*F | x | (E)
  – F → x | (E)

66
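Unit-production elimination can be sketched by computing, for each variable A, the variables B with A =*> B via unit productions only, and then copying the non-unit bodies of each such B to A. The name `eliminate_units` and the one-character-per-symbol encoding are assumptions made here.

```python
def eliminate_units(grammar):
    """Replace unit productions A -> B by the non-unit bodies reachable through them."""
    variables = set(grammar)
    result = {}
    for var in grammar:
        # closure: all B with var =*> B using unit productions only.
        closure = {var}
        frontier = [var]
        while frontier:
            v = frontier.pop()
            for rhs in grammar[v]:
                if rhs in variables and rhs not in closure:
                    closure.add(rhs)
                    frontier.append(rhs)
        # Keep every body that is not a single variable.
        result[var] = {rhs for v in closure for rhs in grammar[v]
                       if rhs not in variables}
    return result

G = {"E": ["E+T", "T"], "T": ["T*F", "F"], "F": ["x", "(E)"]}
for var, bodies in eliminate_units(G).items():
    print(var, "→", " | ".join(sorted(bodies)))
```

On the expression grammar this yields the same three rule sets as listed above (printed in sorted order rather than slide order).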
Exercise on Eliminating Unit Productions
• Remove the unit productions from the following
  production rules of a CFG:

S → Aa | A
A → a | bc | B
B → A | bb

Answer:
S → Aa | a | bc | bb
A → a | bc | bb
B → a | bc | bb

67
Order of steps when simplifying
• To eliminate useless symbols, λ-productions, and unit
  productions safely from a CFG, we need to apply the
  simplification algorithms in an appropriate order.
• A safe order is:
  – λ-productions
  – unit productions
  – useless symbols
    ▪ nongenerating symbols
    ▪ unreachable symbols

68
Chomsky normal form

Noam Chomsky
– The Grammar Guy
– 1928 –
– b. Philadelphia, PA
– PhD – UPenn (1955)
• Linguistics
– Prof at MIT (Linguistics)
(1955 - present)
69
Chomsky normal form
• One additional way to simplify a CFG is to simplify each
  RHS.
• A CFG is in Chomsky normal form (CNF) iff each
  production has a RHS that consists of
  – a single terminal symbol, or
  – two variable symbols.
• For any CFG G, with λ not in L(G), there is an equivalent
  grammar G1 in CNF.
• The transformation must begin after simplifying the
  grammar (removing λ-productions, all unit productions, and useless
  variables).
70
Converting to Chomsky normal form
• A CFG that doesn't generate λ may be converted to
  CNF by first eliminating all λ-rules and unit
  productions.
• This will give a grammar where each RHS of length less
  than 2 consists of a lone terminal.
• Any RHS of length k > 2 may be broken up by
  introducing k-2 new variables.
• For any terminal a that remains on a RHS, we add a
  new variable and new rule Ca → a.

71
Converting to CNF: an example
• For example, the rule S → AbCD in a CFG G can be
  replaced by
  – S → AX, X → bY, Y → CD
• Here we don't change L(G).
• After the remaining steps, the new rules would be
  – S → AX, X → CbY, Y → CD, Cb → b
• Again we don't change L(G).

72
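The two rewriting steps of this example can be sketched in code. The sketch assumes λ-rules and unit productions have already been removed; the name `to_cnf`, the tuple encoding of right-hand sides, and the generated variable names (X1, X2, ... and C_a for terminal a) are illustrative choices and are assumed not to clash with existing variables.

```python
def to_cnf(grammar):
    """Convert a grammar without lambda-rules or unit productions to CNF.

    grammar maps each variable to a list of bodies, each a tuple of symbols.
    Any symbol that is not a key of the grammar is treated as a terminal.
    """
    variables = set(grammar)
    result = {}
    counter = 0
    for var, bodies in grammar.items():
        for rhs in bodies:
            if len(rhs) == 1:
                result.setdefault(var, []).append(rhs)  # lone terminal is legal
                continue
            # Replace each terminal in a long body by a proxy variable C_a -> a.
            body = []
            for s in rhs:
                if s in variables:
                    body.append(s)
                else:
                    proxy = "C_" + s
                    if proxy not in result:
                        result[proxy] = [(s,)]
                    body.append(proxy)
            # Break a body of length k > 2 using k - 2 new variables.
            head = var
            while len(body) > 2:
                counter += 1
                new_var = f"X{counter}"
                result.setdefault(head, []).append((body[0], new_var))
                head = new_var
                body = body[1:]
            result.setdefault(head, []).append(tuple(body))
    return result

G = {"S": [("A", "b", "C", "D")], "A": [("a",)], "C": [("c",)], "D": [("d",)]}
for var, bodies in to_cnf(G).items():
    print(var, "→", " | ".join(" ".join(b) for b in bodies))
```

With the rules A → a, C → c, D → d added here only to make the grammar complete, the rule S → AbCD becomes S → A X1, X1 → C_b X2, X2 → C D, C_b → b, matching the slide with X1, X2 and C_b in the roles of X, Y and Cb.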
A more complete example
• Consider the grammar with rules
  – E → E + T | T * F | x, T → T * F | x, F → x
• The last rule for each symbol is legal in CNF.
• We may replace
  – E → E + T by E → EX, X → C+T, C+ → +
  – E → T * F by E → TY, Y → C*F, C* → *
  – T → T * F by T → TZ, Z → C*F, C* → *

73
The resulting grammar
• The resulting CFG is in CNF, with rules
  – E → EX | TY | x
  – T → TZ | x
  – F → x
  – X → C+T
  – Y → C*F
  – Z → C*F (or Z could be replaced by Y)
  – C+ → +
  – C* → *

74
Exercise
• Convert the following CFGs into Chomsky Normal Form:

1) S → λ | ADDA
   A → a
   C → c
   D → bCb

2) S → bA | aB
   A → bAA | aS | a
   B → aBB | bS | b

3) S → abAB
   A → bAB | λ
   B → BAa | A | λ

4) S → aS | bS | B
   B → bb | C | λ
   C → cC | λ

75
Greibach Normal Form
• A CFG G is in Greibach Normal Form if
  – every production is of the form A → aα, where α ∈ V* and
    a ∈ T (α may be λ), and
  – S → λ is in G if λ ∈ L(G).
• When λ ∈ L(G), we assume that S does not appear on
  the R.H.S. of any production.
• For example, G given by
  S → aAB | λ, A → bC, B → b, C → c
  is in GNF.
76
Greibach Normal Form
• For converting any CFG G into GNF, we use the following two technical lemmas.
Lemma 1:
• Let G = (V, T, P, S) be a CFG, and let A → Bγ be an A-production in P. Let the
  B-productions be B → β1 | β2 | . . . | βs. Define P1 = (P – {A → Bγ}) ∪ {A → βiγ | 1 ≤ i ≤ s}.
  Then G1 = (V, T, P1, S) is a CFG equivalent to G.
Lemma 2:
• Let G = (V, T, P, S) be a CFG, and let the set of A-productions be
  A → Aα1 | . . . | Aαr | β1 | . . . | βs (the βi's do not start with A).
• Let Z be a new variable and G1 = (V ∪ {Z}, T, P1, S), where P1 is defined as follows:
  (i) The set of A-productions in P1 is A → β1 | . . . | βs | β1Z | . . . | βsZ
  (ii) The set of Z-productions in P1 is Z → α1 | . . . | αr | α1Z | . . . | αrZ
  (iii) The productions for the other variables are as in P.
  Then G1 is a CFG equivalent to G.
77
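Both lemmas are mechanical rewrites of a rule set, so they can be sketched as two small functions. The names `substitute` and `remove_left_recursion` and the tuple encoding of right-hand sides are assumptions made here; `substitute` applies Lemma 1 to every A-production that starts with B, which amounts to applying the lemma once per such production. Both functions assume there are no λ-rules.

```python
def substitute(a_bodies, b, b_bodies):
    """Lemma 1: replace each body B gamma by beta_i gamma for every B -> beta_i."""
    out = []
    for rhs in a_bodies:
        if rhs[:1] == (b,):
            out.extend(beta + rhs[1:] for beta in b_bodies)
        else:
            out.append(rhs)
    return out

def remove_left_recursion(var, bodies, new_var):
    """Lemma 2: eliminate bodies of the form A alpha using a new variable Z."""
    alphas = [rhs[1:] for rhs in bodies if rhs[:1] == (var,)]
    betas = [rhs for rhs in bodies if rhs[:1] != (var,)]
    a_rules = betas + [beta + (new_var,) for beta in betas]
    z_rules = alphas + [alpha + (new_var,) for alpha in alphas]
    return a_rules, z_rules

# Lemma 1 on A2 -> A1 A1 | b, with A1 -> A2 A2 | a.
a2 = substitute([("A1", "A1"), ("b",)], "A1", [("A2", "A2"), ("a",)])
print(a2)

# Lemma 2 on the resulting A2-productions, with new variable Z2.
a2_rules, z2_rules = remove_left_recursion("A2", a2, "Z2")
print(a2_rules)
print(z2_rules)
```

The sample grammar S → AA | a, A → SS | b (renamed to A1 and A2) is used here as input. Lemma 1 gives A2 → A2A2A1 | aA1 | b, and Lemma 2 then gives A2 → aA1 | b | aA1Z2 | bZ2 and Z2 → A2A1 | A2A1Z2.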
Conversion to GNF
Step 1:
• Eliminate null productions and then construct a grammar G in Chomsky
  normal form generating L.
• Rename the variables as A1, A2, . . . , An with S = A1.
• We write G as ({A1, A2, . . . , An}, Σ, P, A1).
Step 2:
• To get the productions in the form Ai → aγ or Ai → Ajγ, where j > i, convert the
  Ai-productions (i = 1, 2, . . . , n - 1) to the form Ai → Ajγ such that j > i, using
  Lemma 2.
Step 3:
• Convert the An-productions to the form An → aγ. Here, the productions of the form
  An → Anγ are eliminated using Lemma 2.
• The resulting An-productions are of the form An → aγ.
78
Conversion to GNF
Step 4:
• Modify the Ai-productions to the form Ai → aγ for i = 1, 2, . . . ,
  n – 1.
• At the end of step 3, the An-productions are of the form An →
  aγ.
• The An-1-productions are of the form An-1 → aγ' or An-1 → Anγ.
• By applying Lemma 1, we eliminate productions of the form
  An-1 → Anγ.
• The resulting An-1-productions are in the required form. We
  repeat the construction by considering An-2, An-3, . . . , A1.
79
Conversion to GNF
Step 5:
• Modify the Zi-productions. Every time we apply Lemma 2, we get a new
  variable. (We take it as Zi when we apply Lemma 2 to the Ai-
  productions.)
• The Zi-productions are of the form Zi → αZi or Zi → α (where α is
  obtained from Ai → Aiα), and hence of the form Zi → aγ or Zi → Akγ
  for some k.
• At the end of step 4, the R.H.S. of any Ak-production starts with a
  terminal.
• So we can apply Lemma 1 to eliminate Zi → Akγ.
• Thus at the end of step 5, we get an equivalent grammar G1 in GNF.
80
Conversion to GNF
Example: Construct a grammar in Greibach normal form
equivalent to the grammar
S → AA | a, A → SS | b
Solution:
Step 1:
• Eliminate null productions and convert G to CNF.
• The given grammar is in CNF.
• S and A are renamed as A1 and A2, respectively.
• So the productions are A1 → A2A2 | a, and A2 → A1A1 | b.
• As the given grammar has no null productions and is in CNF we need
  not carry out step 1.
• So we proceed to step 2.
81
Conversion to GNF
Example: Cont.…
Step 2:
• Convert the Ai-productions (i = 1, 2, . . . , n - 1) to the form
  Ai → aγ or Ai → Ajγ such that j > i.
(i) The A1-productions are in the required form.
  ▪ They are A1 → A2A2 | a.
(ii) A2 → b is in the required form.
  ▪ We apply Lemma 1 to A2 → A1A1.
  ▪ The resulting productions are A2 → A2A2A1, A2 → aA1.
  ▪ Thus the A2-productions are A2 → A2A2A1, A2 → aA1, A2 → b

82
Conversion to GNF
Example: Cont.…
Step 3:
• Convert the An-productions of the form An → Anγ to the form
  An → aγ, by applying Lemma 2.
• We have to apply Lemma 2 to the A2-productions as we have
  A2 → A2A2A1, and let Z2 be the new variable.
• The resulting productions are:
  A2 → aA1      A2 → b
  A2 → aA1Z2    A2 → bZ2
  Z2 → A2A1     Z2 → A2A1Z2
83
Conversion to GNF
Example: Cont.…
Step 4:
• Modify the Ai-productions to the form Ai → aγ for i = 1, 2, . . . , n – 1
  by applying Lemma 1.
(i) The A2-productions are A2 → aA1 | b | aA1Z2 | bZ2.
(ii) Among the A1-productions we retain A1 → a and eliminate
  A1 → A2A2 using Lemma 1.
  ▪ The resulting productions are A1 → aA1A2 | bA2, A1 → aA1Z2A2 | bZ2A2
  ▪ The set of all (modified) A1-productions is
    A1 → a | aA1A2 | bA2 | aA1Z2A2 | bZ2A2

84
Conversion to GNF
Example: Cont.…
Step 5:
• Modify the Zi-productions, which are of the form Zi → aγ or
  Zi → Akγ for some k.
• The Z2-productions to be modified are:
  Z2 → A2A1, Z2 → A2A1Z2
• We apply Lemma 1 and get
  Z2 → aA1A1 | bA1 | aA1Z2A1 | bZ2A1
  Z2 → aA1A1Z2 | bA1Z2 | aA1Z2A1Z2 | bZ2A1Z2

85
Conversion to GNF
Example: Cont.…
• Hence the equivalent grammar is
  G' = ({A1, A2, Z2}, {a, b}, P1, A1)
  where P1 consists of

  A1 → a | aA1A2 | bA2 | aA1Z2A2 | bZ2A2
  A2 → aA1 | b | aA1Z2 | bZ2
  Z2 → aA1A1 | bA1 | aA1Z2A1 | bZ2A1
  Z2 → aA1A1Z2 | bA1Z2 | aA1Z2A1Z2 | bZ2A1Z2

• G' is the required equivalent grammar in GNF.

86
