Chapters (5 - 8) TOC BOOK by Adesh K Pandey

Properties of Regular Languages

Inside This Chapter
5.1. Introduction
5.2. Proving Languages Not to Be Regular
5.3. Closure Properties of Regular Languages
5.4. Decision Properties of Regular Languages

5.1. Introduction

This chapter explores the properties of regular languages. Our first tool for this exploration is a way to prove that certain languages are not regular.

One important kind of fact about the regular languages is called a "closure property." These properties let us build recognizers for languages that are constructed from other languages by certain operations. As an example, the intersection of two regular languages is also regular. Thus, given automata that recognize two different regular languages, we can mechanically construct an automaton that recognizes exactly the intersection of these two languages.

Some other important facts about regular languages are called "decision properties." Our study of these properties gives us algorithms for answering important questions about automata.

5.2. Proving Languages Not to Be Regular

We have established that the class of languages known as the regular languages has at least four different descriptions. They are the languages accepted by DFAs, by NFAs, and by ε-NFAs; they are also the languages defined by regular expressions. Not every language is regular. In this section, we introduce a powerful technique, known as the "pumping lemma", for showing certain languages not to be regular.

5.2.1. The Pumping Lemma for Regular Languages

The pumping lemma is a powerful tool for proving that certain languages are not regular. It is also useful in the development of algorithms to answer certain questions concerning finite automata, such as whether the language accepted by a given FA is finite or infinite.

Statement. Let L be a regular language. Then there exists a constant n (which depends on L) such that for every string w in L with $|w| \ge n$, we can break w into three substrings, $w = xyz$, such that:
(i) $y \ne \epsilon$
(ii) $|xy| \le n$
(iii) For all $k \ge 0$, the string $xy^kz$ is also in L.

That is, we can always find a nonempty string y not too far from the beginning of w that can be "pumped"; that is, repeating y any number of times (including zero) keeps the resulting string in the language.

Proof. Suppose L is regular. Then $L = L(M)$ for some DFA M. Suppose M has n states. Now consider any string w of length n or more, say $w = a_1a_2 \cdots a_m$, where $m \ge n$ and each $a_i$ is an input symbol. For $i = 0, 1, 2, \ldots, m$ define state $p_i$ to be $\hat{\delta}(q_0, a_1a_2 \cdots a_i)$, where $\delta$ is the transition function of M, $\hat{\delta}$ is its extension to strings, and $q_0$ is the start state of M. That is, $p_i$ is the state M is in after reading the first i symbols of w; in particular $p_0 = q_0$.

By the pigeonhole principle, it is not possible for the n + 1 states $p_i$ for $i = 0, 1, \ldots, n$ to be distinct, since there are only n different states. Thus we can find two different integers i and j, with $0 \le i < j \le n$, such that $p_i = p_j$. Now break $w = xyz$ as follows:
1. $x = a_1a_2 \cdots a_i$
2. $y = a_{i+1}a_{i+2} \cdots a_j$
3. $z = a_{j+1}a_{j+2} \cdots a_m$
Then y is nonempty because $i < j$, and $|xy| = j \le n$.

Consider what happens when M receives the input $xy^kz$ for any $k \ge 0$. If $k = 0$, then M goes from $q_0$ to $p_i$ on input x and, since $p_i = p_j$, from $p_i$ to the accepting state on input z; thus M accepts xz. If $k > 0$, then M goes from $q_0$ to $p_i$ on input x, circles from $p_i$ to $p_i$ k times on input $y^k$, and then goes to the accepting state on input z. Thus, for any $k \ge 0$, $xy^kz$ is also accepted by M; that is, $xy^kz$ is in L.

We can also understand these transitions as follows.
For $k = 0$:
$\hat{\delta}(q_0, x) = p_i$
$\hat{\delta}(p_i, z) = p_m$
For $k > 0$:
$\hat{\delta}(q_0, x) = p_i$
$\hat{\delta}(p_i, y) = p_i$, and in general $\hat{\delta}(p_i, y^k) = p_i$
$\hat{\delta}(p_i, z) = p_m$

5.2.2. Application of the Pumping Lemma

The pumping lemma is extremely useful in proving that certain sets are non-regular. The general methodology in its application is an "adversary argument" of the following form:
1. Select the language L you wish to prove non-regular.
2. The "adversary" picks n, the constant mentioned in the pumping lemma. Once the adversary has picked n, he may not change it.
3. Select a string z in L. Your choice may depend implicitly on the value of n chosen.
4. The adversary breaks z into u, v and w, subject to the constraints that $|uv| \le n$ and $|v| \ge 1$.
5. You reach a contradiction with the pumping lemma by showing that, for every u, v and w the adversary may choose, there is some i for which $uv^iw$ is not in L.

Example 5.1. Prove that $L = \{a^nb^n \text{ for } n = 0, 1, 2, \ldots\}$ is not regular.
Solution. Let us see how we could apply the pumping lemma directly to this case.
The pumping lemma says that there must be strings x, y and z such that all words of the form $xy^nz$ are in L. How do we break a word of L into three pieces x, y and z?

Case 1. The middle part y is made up entirely of a's:
x aaaa ... z
If we pump it to xyyz, xyyyz, and so on, the number of a's increases while the number of b's does not. In the language $L = \{a^nb^n \text{ for } n = 0, 1, 2, 3, \ldots\}$ the numbers of a's and b's are equal, so this is not allowed.

Case 2. The middle part y is made up entirely of b's:
x bbbbb ... z
For the same reason, this is also not allowed.

Case 3. The y part is made of some positive number of a's and some positive number of b's. This would mean that y contains the substring ab:
x ... aaaaabbbbb ... z
Then xyyz would have two copies of the substring ab. But every nonempty word in L contains the substring ab exactly once. Therefore xyyz cannot be a word in L.

This proves that the pumping lemma cannot apply to L, and therefore L is not regular.

Example 5.2. Prove that $L = \{a^nba^n \text{ for } n = 0, 1, 2, \ldots\}$ is not regular.
Solution. If this language were regular, then there would exist three strings x, y and z such that xyz and xyyz were both words in this language. We show that this is impossible.
Observation 1. If the y string contained the b, then xyyz would contain two b's, which is not allowed according to the language L.
Observation 2. If the y string is all a's, then the b in the middle of the word xyz is in the x-side or the z-side. In either case, xyyz has an increased number of a's either in front of the b or after the b, but not both.
Conclusion. Therefore, xyyz does not have its b in the middle and is not of the form $a^nba^n$. This language cannot be pumped and is therefore not regular.

Example 5.3. Prove that $L = \{a^nb^nab^{n+1} \text{ for } n = 1, 2, 3, \ldots\}$ is not regular.
Solution. We are going to show that this language too is not regular, by showing that if xyz is in this language for any three strings x, y and z, then xyyz is not in this language.
Observation 1. For every word in this language, if we know the number of a's, we can calculate the exact number of b's (twice the total number of a's, minus 1). Conversely, if we know the total number of b's, we can uniquely calculate the number of a's (add 1 and divide by 2). So no two different words have the same number of a's, and no two different words have the same number of b's.
Observation 2. All words in this language have exactly two substrings equal to ab and one equal to ba.
Observation 3. If xyz and xyyz are both in this language, then y cannot contain either the substring ab or the substring ba, because then xyyz would have too many.
Conclusion 1. The string y must be a solid clump of a's or a solid clump of b's.
Conclusion 2. If y is solid a's, then xyz and xyyz are different words with the same total number of b's, violating Observation 1. If y is solid b's, then xyz and xyyz are different words with the same number of a's, violating Observation 1.
Conclusion 3. It is impossible for both xyz and xyyz to be in this language for any strings x, y and z. Therefore, the language is unpumpable and not regular.

Example 5.4. Prove that the language consisting of the strings of balanced parentheses is not regular.
Solution. Let us suppose the language is regular, and let n be the constant given by the pumping lemma. We can divide each string into three parts xyz with $|xy| \le n$ [...]

[...] Let us choose $i = 0$. The resulting string xz is not in L, since it has fewer 0's than 1's. This contradicts the pumping lemma, so our original assumption, that L was regular, must have been incorrect.

Example 5.6. Prove that the language $L = \{0^n1^m \mid n \le m\}$ is not regular.
Solution. Let us analyse the language first: $L = \{0^n1^m \mid n \le m\}$ [...]
Conclusion. So let us choose k to be 2. The resulting string xyyz is not in L, since it has more 0's than 1's. This contradicts the pumping lemma, so our original assumption, that L was regular, must have been incorrect.

Example 5.7. Prove that $L = \{0^n1^{2n} \mid n \ge 1\}$ is not regular.
Solution.
Let us assume that L is regular and let p be the constant provided by the pumping lemma. Let w be the string $w = 0^p1^{2p}$ of L.
Observation. The string w is in L, and its length is at least p. So w can be written as xyz with $|xy| \le p$ [...]
Conclusion. Since $|xy| \le p$ [...]

[...] $> 0$ (since y was not $\epsilon$). For the second inequality, we compute $(p+1)^2 = p^2 + 2p + 1$, and this is greater than $p^2 + q$ since $q \le p$ [...]

[...] $> 0$ (since y was not $\epsilon$). For the second inequality, we compute $(p+1)^3 = p^3 + 3p^2 + 3p + 1$, and this is greater than $p^3 + q$ since $q \le p$ (since $|xy| \le p$) [...]

Conclusion. Let us choose $k = 2$. Then xyyz is not in L, since pumping y twice increases the number of a's, which violates the condition $n_a(w) < n_b(w)$. So we have a contradiction with the pumping lemma, and our assumption that L is regular must have been incorrect.

Example 5.11. Prove that the set of all strings beginning with a nonempty string of the form ww is not regular. It is given that $\Sigma = \{0, 1\}$.
Solution. Let us assume that the given language L is regular. Because the regular languages over any alphabet are closed under intersection (we will discuss this theorem in Section 5.3), the language $L' = L \cap L(10^*10^*)$ is regular. Now let n be the constant given by the pumping lemma for the language $L'$. Let
$w = 10^n10^n \in L'$
By the pumping lemma, we can write $w = xyz$ such that $|xy| \le n$, $y \ne \epsilon$, and for all $k \in \mathbb{N}$, $xy^kz \in L'$. Now let us consider $xy^2z$ (that is, choose $k = 2$).
Conclusion. If y contains a 1, then $xy^2z$ contains more than two 1's and hence is not in $L'$. If y contains no 1's, then $xy^2z = 10^{n+i}10^n$ for some $i \ge 1$ (since $y \ne \epsilon$), and hence is not in $L'$. In either case, we obtain a contradiction. We conclude that L is not regular.

Example 5.12. Prove that the language L, which is the set of all strings beginning with a string w of length at least 3 such that $w = w^R$, is not regular. It is given that the alphabet is $\Sigma = \{0, 1\}$.
Solution. Let us assume L is regular. Because the set of regular languages over any alphabet is closed under intersection, the language $L' = L \cap L(001^*01^*00)$ is regular. Let n be the constant guaranteed by the pumping lemma for $L'$, and let $w = 001^{n+1}01^{n+1}00 \in L'$. By the pumping lemma, we can write $w = xyz$ such that $|xy| \le n$, $y \ne \epsilon$ and for all $k \in \mathbb{N}$, $xy^kz \in L'$.
Conclusion. Because every string in $L'$ has exactly five 0's, y can contain no 0's, for otherwise $xy^0z$ would not be in $L'$. Therefore $xy^2z = 001^{n+1+i}01^{n+1}00$ for some $i > 0$.
Clearly this string contains no prefix of length at least three that is a palindrome, so it is not in $L'$, a contradiction. Therefore, L is not regular.

Example 5.13. Prove that the language $L = \{0^{2^k} \mid k \in \mathbb{N}\}$ is not regular.
Solution. By contradiction. Let us assume that L is regular. Let n be the constant provided by the pumping lemma. Let $w = 0^{2^i}$ for some $i \in \mathbb{N}$ such that $2^i > n$. Clearly $w \in L$. By the pumping lemma we can write $w = xyz$ such that $|xy| \le n$ [...] a contradiction. Therefore, L is not regular.

Example 5.15. Prove that $L = \{x \in \{0, 1\}^* \mid x = x^R\}$ is not regular.
Solution. We apply the same approach of proof by contradiction. Let us assume that L is regular and let n be the constant guaranteed by the pumping lemma. Let $w = 0^n10^n \in L$. By the pumping lemma we can write $w = xyz$ such that $|xy| \le n$, $y \ne \epsilon$ and for all $k \in \mathbb{N}$, $xy^kz \in L$. Because $|xy| \le n$ [...]

Theorem 5.4. If L and M are regular languages, then so is $L - M$.
Proof. Observe that $L - M = L \cap \overline{M}$. Here $\overline{M}$ is also regular since M is regular (according to Theorem 5.2), and $L \cap \overline{M}$ is regular according to Theorem 5.3. Therefore $L - M$ is regular.

5.3.1. Reversal

The reversal of a string $a_1a_2 \cdots a_n$ is the string written backwards, that is $a_na_{n-1} \cdots a_1$. We use $w^R$ for the reversal of string w. Thus, $1101^R$ is 1011, and $\epsilon^R = \epsilon$.

The reversal of a language L, written $L^R$, is the language consisting of the reversals of all its strings. For instance, if $L = \{001, 10, 111\}$, then $L^R = \{100, 01, 111\}$.

Reversal is another operation that preserves regular languages; that is, if L is a regular language, so is $L^R$. There are two simple proofs, one based on automata and one based on regular expressions.

Given a language L that is L(A) for some finite automaton, perhaps with nondeterminism and ε-transitions, we may construct an automaton for $L^R$ by:
1. Reverse all the arcs in the transition diagram for A.
2. Make the start state of A be the only accepting state for the new automaton.
3. Create a new start state $p_0$ with transitions on ε to all the accepting states of A.
The result is an automaton that simulates A "in reverse", and therefore accepts a string w if and only if A accepts $w^R$.

Theorem 5.5. If L is a regular language, so is $L^R$.
Proof. Let us assume that L is defined by the regular expression r. The proof is a structural induction on the size of r. We show that there is another regular expression $r^R$ such that $L(r^R) = (L(r))^R$; that is, the language of $r^R$ is the reversal of the language of r.
Basis. If r is $\epsilon$, $\emptyset$, or a, for some symbol a, then $r^R$ is the same as r. That is, we know $\{\epsilon\}^R = \{\epsilon\}$, $\emptyset^R = \emptyset$ and $\{a\}^R = \{a\}$.
Induction. There are three cases, depending on the form of r.
1. $r = r_1 + r_2$. Then $r^R = r_1^R + r_2^R$. The justification is that the reversal of the union of two languages is obtained by computing the reversals of the two languages and taking the union of those languages.
2. $r = r_1r_2$. Then $r^R = r_2^Rr_1^R$. Note that we reverse the order of the two languages, as well as reversing the languages themselves.
3. $r = r_1^*$. Then $r^R = (r_1^R)^*$. The justification is that any string w in L(r) can be written as $w_1w_2 \cdots w_n$, where each $w_i$ is in $L(r_1)$. But $w^R = w_n^Rw_{n-1}^R \cdots w_1^R$. Each $w_i^R$ is in $L(r_1^R)$, so $w^R$ is in $L((r_1^R)^*)$.

Example 5.17. Let L be defined by the regular expression $(0 + 1)0^*$. Find $L^R$.
Solution.
$L = (0 + 1)0^*$
$L^R = (0^*)^R(0 + 1)^R$, by the rule for concatenation, which simplifies to $0^*(0 + 1)$.

5.3.2. Homomorphism

A string homomorphism is a function on strings that works by substituting a particular string for each symbol.

Definition. Suppose $\Sigma$ and $\Sigma'$ are alphabets. Then a function $h : \Sigma \to \Sigma'^*$ is called a "homomorphism".

In other words, a homomorphism is a substitution in which a single letter is replaced with a string. The domain of the function h is extended to strings in an obvious fashion: if $w = a_1a_2 \cdots a_n$, then
$h(w) = h(a_1)h(a_2) \cdots h(a_n)$
If L is a language on $\Sigma$, then its homomorphic image is defined as
$h(L) = \{h(w) : w \in L\}$

Example 5.18.
Let $\Sigma = \{0, 1\}$ and $\Sigma' = \{0, 1, 2\}$, and define h by
$h(0) = 01$
$h(1) = 112$
Find $h(010)$ and the homomorphic image of $L = \{00, 010\}$.
Solution. Given $\Sigma = \{0, 1\}$ and $\Sigma' = \{0, 1, 2\}$, h is defined as:
$h(0) = 01$
$h(1) = 112$
So $h(010) = 0111201$. The homomorphic image of $L = \{00, 010\}$ is the language $h(L) = \{0101, 0111201\}$.

If we have a regular expression r for a language L, then a regular expression for h(L) can be obtained by simply applying the homomorphism to each $\Sigma$ symbol of r.

Example 5.19. Let $\Sigma = \{0, 1\}$ and $\Sigma' = \{1, 2, 3\}$. Define h by
$h(0) = 3122$
$h(1) = 132$
If L is the regular language denoted by $r = (0 + 1^*)(00)^*$, find the regular expression for the language h(L).
Solution. We are given $\Sigma = \{0, 1\}$, $\Sigma' = \{1, 2, 3\}$, and h defined as
$h(0) = 3122$
$h(1) = 132$
and $r = (0 + 1^*)(00)^*$, where r is the regular expression for the language L. Let the regular expression for the language h(L) be $r'$. Then
$r' = (h(0) + h(1)^*)(h(0)h(0))^*$
Substituting h(0) and h(1) gives
$r' = (3122 + (132)^*)(3122\,3122)^*$
which denotes the regular language h(L).

Theorem 5.6. Let h be a homomorphism. If L is a regular language, then its homomorphic image h(L) is also regular. The family of regular languages is therefore closed under arbitrary homomorphisms.
Proof. Let L be a regular language denoted by some regular expression r. We find h(r) by substituting h(a) for each symbol $a \in \Sigma$ of r. It can be shown directly by an appeal to the definition of a regular expression that the result is a regular expression. It is equally easy to see that the resulting expression denotes h(L). All we need to do is to show that for every $w \in L(r)$, the corresponding h(w) is in L(h(r)), and conversely that for every x in L(h(r)) there is a w in L such that $x = h(w)$. Leaving the details as an exercise, we claim that h(L) is regular.

5.3.3. Inverse Homomorphism

Figure 5.6 suggests the effect of a homomorphism on a language L in part (a), and the effect of an inverse homomorphism in part (b).

Fig. 5.6. (a) A homomorphism applied in the forward direction. (b) A homomorphism applied in the inverse direction.
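To make the two directions of Fig. 5.6 concrete, here is a minimal Python sketch built on the homomorphism of Example 5.18. The function names (`apply_homomorphism`, `image`, `in_inverse_image`) are illustrative and do not come from the text, and the inverse image is taken in the usual sense, $h^{-1}(L) = \{w \mid h(w) \in L\}$, which is an assumption stated here rather than a quotation from the chapter.

```python
def apply_homomorphism(h, w):
    """Extend the symbol-to-string map h to a whole string w."""
    return "".join(h[symbol] for symbol in w)


def image(h, language):
    """Forward direction: h(L) = {h(w) : w in L} for a finite set L."""
    return {apply_homomorphism(h, w) for w in language}


def in_inverse_image(h, w, target_language):
    """Inverse direction (assumed definition): w is in h^-1(L) iff h(w) is in L."""
    return apply_homomorphism(h, w) in target_language


# The homomorphism of Example 5.18: h(0) = 01, h(1) = 112.
h = {"0": "01", "1": "112"}

print(apply_homomorphism(h, "010"))              # 0111201
print(sorted(image(h, {"00", "010"})))           # ['0101', '0111201']
print(in_inverse_image(h, "010", {"0111201"}))   # True
```

The sketch only handles finite languages given as Python sets; for regular languages in general one would work on a regular expression or an automaton, as the surrounding theorems do.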
Theorem 5.7. If h is a homomorphism from alphabet $\Sigma$ to alphabet $\Sigma'$, and L is a regular language over $\Sigma'$, then $h^{-1}(L)$ is also a regular language.
Proof. The proof starts with a DFA A for L. We construct from A and h a DFA for $h^{-1}(L)$ using the plan suggested by Fig. 5.7. This DFA uses the states of A but translates the input symbol according to h before deciding on the next state.

Fig. 5.7. The DFA for $h^{-1}(L)$ applies h to its input, and then simulates the DFA for L. (Diagram: the input a is translated by h into h(a), which is fed to A; A accepts or rejects.)

Formally, let L be L(A), where DFA $A = (Q, \Sigma', \delta, q_0, F)$. Define a DFA $B = (Q, \Sigma, \delta', q_0, F)$, where the transition function $\delta'$ is constructed by the rule $\delta'(q, a) = \hat{\delta}(q, h(a))$. That is, the transition B makes on input a is the result of the sequence of transitions that A makes on the string of symbols h(a). Remember that h(a) could be $\epsilon$, it could be one symbol, or it could be many symbols, but $\hat{\delta}$ is properly defined to take care of all these cases.

It is an easy induction on $|w|$ to show that $\hat{\delta'}(q_0, w) = \hat{\delta}(q_0, h(w))$. Since the accepting states of A and B are the same, B accepts w if and only if A accepts h(w). Put another way, B accepts exactly those strings that are in $h^{-1}(L)$.

5.4. Decision Properties of Regular Languages

In this section we consider how one answers important questions about regular languages. First, we must consider what it means to ask a question about a language. The typical language is infinite, so you cannot present the strings of the language to someone and ask a question that requires them to inspect the infinite set of strings. Rather, we present a language by giving one of the finite representations for it that we have developed: a DFA, an NFA, an ε-NFA or a regular expression.

Of course the language so described will be regular, and in fact there is no way at all to represent completely arbitrary languages. Let us consider some of the fundamental questions about languages:
1. Is the given language empty?
2. Is a particular string w in the described language?
3. Do two descriptions of a language actually describe the same language? This question is often called "equivalence" of languages.

5.4.1. Testing Emptiness of Regular Languages

If our representation is any kind of finite automaton, the emptiness question is whether there is any path whatsoever from the start state to some accepting state. If so, the language is nonempty, while if the accepting states are all separated from the start state, then the language is empty.

The start state is surely reachable from the start state. If state q is reachable from the start state, and there is an arc from q to p with any label (an input symbol, or $\epsilon$ if the automaton is an ε-NFA), then p is reachable. In that manner we can compute the set of reachable states. If any accepting state is among them, we answer "no" (the language of the automaton is not empty), and otherwise we answer "yes". Note that the reachability calculation takes no more time than $O(n^2)$ if the automaton has n states, and in fact it is no worse than proportional to the number of arcs in the automaton's transition diagram, which could be less than $n^2$ and cannot be more than $O(n^2)$.

If we are given a regular expression representing the language L, rather than an automaton, we could convert the expression to an ε-NFA and proceed as above. Since the automaton that results from a regular expression of length n has at most O(n) states and transitions, the algorithm takes O(n) time.

5.4.2. Testing Membership in a Regular Language

The next question of importance is: given a string w and a regular language L, is w in L? While w is represented explicitly, L is represented by an automaton or regular expression.

If L is represented by a DFA, the algorithm is simple. Simulate the DFA processing the string of input symbols w, beginning in the start state. If the DFA ends in an accepting state, the answer is "yes"; otherwise the answer is "no".
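The emptiness test of Section 5.4.1 and the DFA membership test just described can be sketched in Python as follows. This is an illustration under stated assumptions: the DFA is stored as a dictionary from (state, symbol) pairs to states, and the names `is_empty` and `accepts`, as well as the example automaton (strings over {0, 1} that end in 1), are hypothetical and not taken from the text.

```python
from collections import deque


def is_empty(delta, start, accepting):
    """Return True if no accepting state is reachable from the start state."""
    # Collect the arcs of the transition diagram, ignoring their labels.
    arcs = {}
    for (q, _symbol), p in delta.items():
        arcs.setdefault(q, set()).add(p)

    reachable = {start}            # basis: the start state is reachable
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for p in arcs.get(q, ()):  # induction: follow every arc out of q
            if p not in reachable:
                reachable.add(p)
                queue.append(p)
    return reachable.isdisjoint(accepting)


def accepts(delta, start, accepting, w):
    """Simulate the DFA on w, doing one table lookup per input symbol."""
    state = start
    for symbol in w:
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical DFA: strings over {0, 1} that end in 1.
delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q0", ("q1", "1"): "q1",
}

print(accepts(delta, "q0", {"q1"}, "0101"))  # True
print(accepts(delta, "q0", {"q1"}, "10"))    # False
print(is_empty(delta, "q0", {"q1"}))         # False: q1 is reachable
print(is_empty(delta, "q0", {"q2"}))         # True: q2 is never reached
```

Each state enters the queue at most once and each arc is examined once, which matches the bound stated above in terms of the number of arcs. The membership loop performs one dictionary lookup per symbol of w.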
If $|w| = n$, and the DFA is represented by a suitable data structure, such as a two-dimensional array that is the transition table, then each transition requires constant time, and the entire test takes O(n) time.

Theorem 5.8. Whether two regular sets are identical is solvable.
Proof. Let us take two finite automata ($M_1$ and $M_2$) and examine a picture of the sets they accept.

Fig. 5.8.

If the intersection is the same as both sets, then indeed they are identical. Or, on the other hand, if the areas outside the intersection are empty, then both sets ($s(M_1)$ and $s(M_2)$) are identical.

Let us examine these outside areas. The picture in Fig. 5.9 (a) represents the set accepted by $M_1$ and rejected by $M_2$, while that in Fig. 5.9 (b) is the set which $M_2$ accepts and $M_1$ rejects.

Fig. 5.9. (a) $s(M_1) \cap \overline{s(M_2)}$ (b) $\overline{s(M_1)} \cap s(M_2)$

If these two areas (or sets) are empty, then the sets accepted by $M_1$ and $M_2$ are exactly the same. This means that the equivalence problem for $s(M_1)$ and $s(M_2)$ is exactly the same as the emptiness problem for:
$[s(M_1) \cap \overline{s(M_2)}] \cup [s(M_2) \cap \overline{s(M_1)}]$
So, if we can solve the emptiness problem for the above set, then we can solve the equivalence problem for $s(M_1)$ and $s(M_2)$. Since the regular sets are closed under union, complement and intersection, the above set is a regular set. And we know that emptiness for the class of regular sets is solvable. Hence the theorem is proved.

Theorem 5.9. Whether a regular set is finite is solvable.
Proof. We know that if a finite automaton accepts any strings at all, then some will be of length less than the size of the state set. (This does not help directly, but it gives a hint as to what we need for this theorem.)

If we were to find a string accepted by a finite automaton M whose length is greater than or equal to the size of its state set, we could use the inflation aspect of the pumping lemma to show that machine M must accept an infinite number of strings.
This means that a finite automaton accepts only strings of length less than the size of its set of states if and only if it accepts a finite set.

Thus, to solve the finiteness problem for $M = (Q, \Sigma, \delta, q_0, F)$, we need to determine whether or not:
$L(M) - \{\text{strings of length} < |Q|\} = \emptyset$
A question now arises as to how many inputs we must examine in order to tell if M will accept an input whose length is greater than or equal to the size of its state set. The answer is that we only need to consider input strings up to twice the size of the state set.

Theorem 5.10. If a finite automaton accepts any strings, it will accept one of length less than the size of its state set.
Proof. Let M be a finite automaton which accepts the string x, where the length of x is no less than the size of M's state set. Assume further that M accepts no string shorter than x. (This is the opposite of our theorem.) Immediately the pumping lemma asserts that there are strings u, v and w such that $uvw = x$, $v \ne \epsilon$ and $uw \in L(M)$. Since $v \ne \epsilon$, uw is shorter than $uvw = x$. Thus M accepts shorter strings than x, and the theorem follows.

Exercises

1. Use the pumping lemma to show that each of these languages is non-regular.
(i) $\{a^nb^{n+1}\} = \{abb, aabbb, aaabbbb, \ldots\}$
(ii) [...]
(iii) [...]
(iv) $L = \{a^nba^n \mid n > 0\}$
(v) [...]
2. Define the language Square as follows:
Square = $\{a^n \text{ where } n \text{ is a square}, n > 0\} = \{a, aaaa, aaaaaaaaa, \ldots\}$
Use the pumping lemma to prove that Square is non-regular.
3. Prove that $L = \{a^nb^n \text{ where } n \text{ is a square and } n > 0\}$ is non-regular.
4. Define the language Doubleprime as follows:
Doubleprime = $\{a^pb^p \text{ where } p \text{ is any prime}\}$
Prove that Doubleprime is non-regular.
5. Define the language Doublefactorial as follows:
DF = $\{a^{n!}b^{n!} \mid n > 0\}$
Prove that DF is non-regular.
6. Let $L_1, L_2, L_3, \ldots$ be an infinite sequence of regular languages.
(i) Let L be the infinite union of all these languages taken together. Is L necessarily regular?
(ii) Is the infinite intersection of all these languages necessarily regular?
7. Consider the following language:
$P = \{a^n \text{ where } n \text{ is not prime}\} = \{\epsilon, a, aaaa, aaaaaa, aaaaaaaa, \ldots\}$
Prove that P is non-regular.
8. Consider the language L defined as
$L = \{a^n \text{ where } n \text{ is any integer with an even number of digits in base 10}\}$
Prove that L is non-regular.
9. (i) Show that if we add a finite set of words to a regular language, the result is a regular language.
(ii) Show that if we subtract a finite set of words from a regular language, the result is a regular language.
10. Prove that the following languages are not regular. [...]
11. Prove that the following languages are regular. [...]
12. Suppose that we know that $L_1 \cup L_2$ and $L_1$ are regular. Can we conclude from this that $L_2$ is regular?
13. Prove that the following languages over the alphabet {0, 1} are not regular.
(i) The set of strings of 0's and 1's, beginning with a 1, such that when interpreted as an integer, that integer is prime.
(ii) The set of strings of the form $w1^n$, where w is a string of 0's and 1's of length n.
(iii) The set of strings of 0's and 1's of the form $w\overline{w}$, where $\overline{w}$ is formed from w by replacing all 0's by 1's, and vice versa; that is, $\overline{011} = 100$, and 011100 is an example of a string in the language.
14. Give an algorithm to tell whether a regular language L is infinite.
15. Let A be any FA with N states. Then:
(i) If A accepts an input string w such that $N \le \text{length}(w) < 2N$, then A accepts an infinite language.
(ii) If A accepts infinitely many words, then A accepts some word w such that $N \le \text{length}(w) < 2N$.

Context-Free Grammars and Languages

Inside This Chapter
6.1. Grammars
6.2. Context-Free Grammars
6.3. Parse Tree
6.4. Parsing an Example of Context-Free Grammar
6.5. Ambiguity in Grammars and Languages

6.1. Grammars

All of us know that a grammar is nothing but a set of rules to define valid sentences in any language. In this chapter we introduce context-free grammars, which generate context-free languages. Context-free languages have great practical significance in defining programming languages and in simplifying the translation of programming languages.

Initially, linguists were trying to define precisely the valid sentences of a language and to give structural descriptions for these sentences. They tried to define rules for natural languages like Hindi, English, etc. Noam Chomsky gave a mathematical model for grammars in 1956. Although it turned out to be of little use for describing natural languages, it became very useful for computer languages.

The original motivation for grammars was the description of natural languages. We can write rules for the grammar of natural languages as follows:

[...]

According to the above set of rules, "The girl eats" is a valid sentence.

It can be easily observed that some words in the grammar work as terminators for the sentence; these symbols are called terminal symbols. The rest of the symbols of the vocabulary are called non-terminals. So it is clear that terminals and non-terminals together form the vocabulary for any language.

Let us now formalise the idea of a grammar and how it is used. There are four important components in a grammatical description of a language:
1. There is a finite set of symbols that form the strings of the language being defined. We call these symbols terminal symbols, represented by $V_t$.
2. There is a finite set of variables, also sometimes called non-terminals or syntactic categories. Each variable represents a language, i.e., a set of strings. The set of variables is represented by $V_n$.
3. One of the variables represents the language being defined; it is called the start symbol. Generally it is denoted by S.
4. There is a finite set of rules or productions that represent the recursive definition of a language. Each production consists of:
(a) A variable that is being (partially) defined by the production. This variable is often called the head of the production.
(b) The production symbol $\to$.
(c) A string of zero or more terminals and variables. This string, called the body of the production, represents one way to form strings in the language of the variable of the head. In so doing, we leave terminals unchanged and substitute for each variable of the body any string that is known to be in the language of that variable.

After this discussion, we are able to define a context-free grammar.

6.2. Context-Free Grammars

Now let us formalize the concept of a context-free grammar, for which we discussed the intuitive notions in the previous section.
cf Mathmatically context-free grammar is defined as follows = Definition “A grammar G = (Vy Vy P S) is said to be context-free” a where V,: A finite set of non-terminals, generally ns represented by capital letters, A, B, C, D, V, : A finite set of terminals, generally represented by. small letters, like, 1, b, ¢, dy ¢; frm § ; Starting non-terminal, calléd start symbol of the a grammar. S belongs to V,- A P + Set of rules or productions in CFG: G is context-free and all’ productions in P have the enh p eet es where aeV, ad peu Every regular grammar is context-free, so a regular language’is also a context-free one, It is already proved by pumping lemma that language [a" b'/ n> 0) is not regular, but it is possible to design a context-free-grammar for these languages (we will design it in next section). So now it is very clear that “the family of regular language is a proper subset of the family of context-free language”. = E gram Context-free grammers derive their name from the fact that the Sri substitution of the variable on the left of a production can be made any time, such a variable zppears in the sentential form. It does not depend ‘on the symbols in the rest of the sentential form (the context). This feature is the consequence of allowing only a single variable on the left side of the production. E B 6.2.1. Derivations he We now define the notations to represent a derivation. First we define two notations > and 5. If « = B is a production of P in CFG and.a and 6 € are strings in (V, U V,)*, then nab > aBb We say that the production « — is applied to the string a « b to obtain @B bor we say that ao b directly drives a B b. are string in (V, UV,)*, Now suppose 0, (4, ty m > Vand 04 Oy, Oy Oy, 09 Oy oer Ora = On fe Can a Sai trary Me ve Then we say that a Say, 1¢,, we say ay, drives «., in grammar G. If drives B by exactly i steps, we say a8. Grammar 6.2.2. 
Language of Context-Fre If G = (V,, Vp P, §) is a CFG, the language of G, denoted by L(G), is the set of terminal strings that have derivations from the start symbol. That is and L(G) = {winv,/so} (ek an es eae 6.2.3. Sentential Forms Derivations from the start symbol produce strings that have a special’ rule. We call these “Sentential forms”. That is, if G = (V,, V, P, S) is a CFG, then any string « in (V, U V,)* such that S=ya_ isa sentential form. Note that the language L(G) is those sértential forms that are in Vj, i.e. they consists solely of terminals. Now let us discuss some examples, Example 6.1. Consider a grammar G = (V,, Vy P, S) where For V, = ISI, V; = (a, 6) and set of productions P is given by pic P= (S— aSb Sab he Context-Free Grammars and Languages Here § is the only non-terminal which is the starting symbol for the grammar; ‘a’ and ‘b’ are terminals, There are two productions 5 — as and 5 ab, Now we will show how the strings wb" can be derived. ~~ S = aSb = anbb = ab Here we need to apply the first production then second production. By applying first production’ — 1 times, followed by an application of second production, we get $= aSb = aaSbb = (0 =a" tsp"? J = a"b" Hence we can say that language for the above grammar is L(G) = {a b’/ n > =I). Example 6.2. Following is a CFG for the language nl L = (wew"lw € (a, b)*) Solution. Let G be CEG for language L = {wew*/w € (a, 4)*) G G= Vy VPS) gpeba. 9°”, Here V,.= {S} \ a V, = {a, 6, ¢} spe 7 Ue and P is given by C2 vse” , (soy asa of % $a @ vv SSH. & Se 5 Let us check that abbebba can be ‘drived from the given CFG. S = aSa (use the $ — aa) => abSba (use the S > bSb) => abbSbba (use the S$ — bSb) = abbebba (use the $= 0) So string abbebba can be derived from given CPG. 6.2.4. 
Backus-Naur Form (BNF)
While linguists were studying CFGs, computer scientists began to describe programming languages by a notation called Backus-Naur Form or Backus Normal Form, which is the CFG notation with minor changes in format and some shorthand. For example, a grammar can be written in this shorthand as
    S → bA | aB
    A → bAA | aS | a
    B → aBB | bS | b
Hence BNF is a shorthand notation for context-free grammars.

Example 6.3. Write a CFG which generates strings of balanced parentheses.
Solution. This grammar will accept balanced right and left parentheses. For example, ()() is acceptable, and ((())) is also accepted. Let the CFG be
    G = (Vn, Vt, P, S)
where Vn = set of non-terminals = {S}, Vt = set of terminals = {(, )}, and the set of productions P is given by
    S → SS
    S → (S)
    S → ε
Now we see what this grammar generates:
    S ⇒ SS ⇒ (S)S ⇒ (S)SS ⇒ (S)(S)S ⇒ (S)(S)(S) ⇒ ()(S)(S) ⇒ ()()(S) ⇒ ()()()
Thus the above grammar always generates balanced pairs of parentheses.

Example 6.4. Write a CFG which generates palindromes over binary digits.
Solution. The grammar will generate binary palindromes such as 00, 010, 11, 101, 11100111, ... Let the CFG be
    G = (Vn, Vt, P, S)
where Vn = set of non-terminals = {S}, Vt = set of terminals = {0, 1}, and the productions P are defined as
    S → 0S0 | 1S1
    S → 0 | 1 | ε
That this grammar generates binary palindromes can be seen from the following derivation:
    S ⇒ 0S0 ⇒ 01S10 ⇒ 010S010 ⇒ 0101010
which is a palindrome.

Example 6.5. Write a CFG for the regular expression r = 0*1(0 + 1)*.
Solution. Let us analyse the regular expression r = 0*1(0 + 1)*. Clearly it describes the set of strings which start with any number of 0's, followed by a single 1, and end with any combination of 0's and 1's.
Let the CFG be
    G = (Vn, Vt, P, S)
where Vn = {S, A, B}, Vt = {0, 1}, and the productions P are defined as
    S → A1B
    A → 0A | ε
    B → 0B | 1B | ε
Let us see the derivation of the string 00101:
    S ⇒ A1B ⇒ 0A1B ⇒ 00A1B ⇒ 001B ⇒ 0010B ⇒ 00101B ⇒ 00101
So clearly G is a CFG for the regular expression r.

Example 6.6. Write a CFG which generates strings having an equal number of a's and b's.
Solution. Let the CFG be
    G = (Vn, Vt, P, S)
    Vn = {S}
    Vt = {a, b}
where P is defined as
    S → aSbS | bSaS | ε
Let us derive the string w = bbabaabbaa:
    S ⇒ bSaS ⇒ bbSaSaS ⇒ bbaSaS ⇒ bbabSaSaS ⇒ bbabaSaS ⇒ bbabaaS
      ⇒ bbabaabSaS ⇒ bbabaabbSaSaS ⇒ bbabaabbaSaS ⇒ bbabaabbaaS ⇒ bbabaabbaa

Example 6.7. Design a CFG which can generate strings having any combination of a's and b's, except the null string.
Solution. Let the CFG be G = (Vn, Vt, P, S)
    Vn = {S}
    Vt = {a, b}
The productions P are defined as
    S → aS   (say P1)
    S → bS   (say P2)
    S → a    (say P3)
    S → b    (say P4)
We can produce the word baab as follows:
    S ⇒ bS     (by P2)
      ⇒ baS    (by P1)
      ⇒ baaS   (by P1)
      ⇒ baab   (by P4)
The language generated by this CFG is the set of all possible strings of the letters a and b except for the null string, which we cannot generate.
We can generate any word by the following algorithm. At the beginning, the working string is the start symbol S. Select a word to be generated. Read the letters of the desired word from left to right, one at a time. If an a is read that is not the last letter of the word, apply P1 to the working string. If a b is read that is not the last letter of the word, apply P2 to the working string. If the last letter is read and it is an a, apply P3 to the working string. If the last letter is read and it is a b, apply P4 to the working string. Productions P3 and P4 end the derivation, so exactly one of them is used, and it is used exactly once.
For example, to generate babb we apply, in order, productions P2, P1, P2 and P4:
    S ⇒ bS ⇒ baS ⇒ babS ⇒ babb

Example 6.8. Design a CFG for the regular expression r = (a + b)* aa (a + b)*.
Solution. Let the CFG be
    G = (Vn, Vt, P, S)
    Vn = {S, T}
    Vt = {a, b}
The productions P are defined as
    S → TaaT   (say P1)
    T → aT     (say P2)
    T → bT     (say P3)
    T → ε      (say P4)
The last three productions, i.e. P2, P3 and P4, allow us to generate any word we want from the non-terminal T. If the non-terminal T appears in any working string, we can apply productions to turn it into any string we want. Therefore, the words generated from S have the form
    (any substring) aa (any substring)
or (a + b)* aa (a + b)*, which is the language of all words with a double a in them somewhere.
For example, to generate baabaab we can proceed as follows:
    S ⇒ TaaT ⇒ bTaaT ⇒ baTaaT ⇒ baaTaaT ⇒ baabTaaT ⇒ baabaaT ⇒ baabaabT ⇒ baabaab
There are other sequences that also derive the word baabaab.

Example 6.9. Design a CFG for the regular expression r = (a + b)*.
Solution. Let G be the CFG
    G = (Vn, Vt, P, S)
    Vn = {S}
    Vt = {a, b}
The productions P are defined as
    S → aS
    S → bS
    S → a
    S → b
    S → ε
The word ab can be generated by the derivation
    S ⇒ aS ⇒ abS ⇒ ab   (using S → ε in the last step)
or by the derivation
    S ⇒ aS ⇒ ab

Example 6.10. The difference between "even palindrome" and "odd palindrome" (whose definitions are obvious) is that when we are finally ready to get rid of S in the even-palindrome working string, we must replace it with ε. If we were forced to replace it with an a or a b instead, we would create a central letter, and the result would be a grammar for odd palindromes:
    S → aSa
    S → bSb
    S → a
    S → b
Solution. If we allow the option of turning the central S into either ε or a letter, we have a grammar for the entire language of palindromes:
    S → aSa
    S → bSb
    S → a
    S → b
    S → ε
The languages {a^n b^n} and palindrome are amazingly similar in grammatical structure, although the first is nearly a regular expression and the second is far from it.

Example 6.11. Write a CFG for the language L(G) = {w w^R : w ∈ (0, 1)*}.
Solution. Let the grammar be G.
    G = (Vn, Vt, P, S)
    Vn = {S}
    Vt = {0, 1}
The productions are defined as
    S → 0S0   (P1)
    S → 1S1   (P2)
    S → ε     (P3)
Let us derive the string 001100 from the above CFG:
    S ⇒ 0S0       (by P1)
      ⇒ 00S00     (by P1)
      ⇒ 001S100   (by P2)
      ⇒ 001100    (by P3)

Example 6.12. Design a CFG for the language L(G) = {ab (bbaa)^n bba (ba)^n : n ≥ 0}.
Solution. Let the CFG be G
    G = (Vn, Vt, P, S)
    Vn = {S, X, Y}
    Vt = {a, b}
The productions are defined as
    S → abX
    X → bbYa
    Y → aaXb
    Y → ε
or we can also define the productions as
    S → abX
    X → bbaaXba
    X → bba
Let us derive the string ab(bbaa)^2 bba(ba)^2 with the first set of productions:
    S ⇒ abX ⇒ abbbYa ⇒ abbbaaXba ⇒ abbbaabbYaba ⇒ abbbaabbaaXbaba
      ⇒ ab(bbaa)^2 bbYa(ba)^2 ⇒ ab(bbaa)^2 bba(ba)^2
or with the second set:
    S ⇒ abX ⇒ abbbaaXba ⇒ abbbaabbaaXbaba ⇒ ab(bbaa)^2 bba(ba)^2

Example 6.13. Design a CFG for the language L = {a^n b^m : n ≠ m}.
Solution. If n ≠ m then only two cases are possible.
Case 1. n > m
Let L1 be the language L restricted to n > m, i.e. L1 = {a^n b^m : n > m}. Let G1 be the CFG for L1:
    G1 = (Vn1, Vt1, P1, S1)
    Vn1 = {S1, A, T1}
    Vt1 = {a, b}
The productions P1 are defined as
    S1 → A T1
    T1 → a T1 b | ε
    A → aA | a
Case 2. n < m
By symmetry with Case 1, let L2 = {a^n b^m : n < m} and let G2 be the CFG for L2, with start symbol S2 and productions
    S2 → T2 B
    T2 → a T2 b | ε
    B → bB | b
By combining G1 and G2 we can write the CFG for L by adding
    S → S1 | S2
where S is the start symbol of the CFG for L.

Example 6.14. Design a CFG for the language L = {0^n 1^n : n ≥ 0} ∪ {1^n 0^n : n ≥ 0}.
Solution. We can write L as
    L = L1 ∪ L2
    L1 = {0^n 1^n : n ≥ 0}
Let G1 be the CFG for the language L1:
    G1 = (Vn, Vt, P, S)
    Vn = {S1}
    S = S1
    Vt = {0, 1}
and P is defined as
    S1 → 0 S1 1 | ε
and
    L2 = {1^n 0^n : n ≥ 0}
Let G2 be the CFG for the language L2:
    G2 = (Vn, Vt, P, S)
    Vn = {S2}
    S = S2
    Vt = {0, 1}
P is defined as
    S2 → 1 S2 0 | ε
Since L = L1 ∪ L2, let G be the CFG for the language L, with starting non-terminal S. Then the productions in G will be
    S → S1 | S2
    S1 → 0 S1 1 | ε
    S2 → 1 S2 0 | ε

Example 6.15. Design CFGs over Σ = {a, b} that generate the set of
(a) all strings with exactly one a.
(b) all strings with at least one a.
(c) all strings with at least 3 a's.
Solution.
(a) Let the CFG be G1
    G1 = (Vn, Vt, P, S)
    Vn = {S, A}
    Vt = {a, b}
P is defined as
    S → AaA
    A → bA | ε
Let us try to derive the string bbaab, which contains two a's:
    S ⇒ AaA ⇒ bAaA ⇒ bbAaA ⇒ bbaA
The remaining A can only produce b's, so there is no way to derive the second a, and the string bbaab cannot be derived from G1.
(b) Let the CFG be G2
    G2 = (Vn, Vt, P, S)
    Vn = {S, A}
    Vt = {a, b}
P is defined as
    S → AaA
    A → aA | bA | ε
Let us derive the string baab:
    S ⇒ AaA ⇒ bAaA ⇒ baAaA ⇒ baaA ⇒ baabA ⇒ baab
(c) Let the CFG be G3
    G3 = (Vn, Vt, P, S)
    Vn = {S, A}
    Vt = {a, b}
P is defined as
    S → AaAaAaA
    A → aA | bA | ε

Example 6.16. Give a simple description of the language generated by the grammar with productions
    S → aA
    A → bS
    S → ε
Solution. The given CFG is
    S → aA   (say P1)
    A → bS   (say P2)
    S → ε    (say P3)
Using production P3, the CFG can generate the null string. Now suppose we use production P1 (i.e. S → aA); once we use it we are forced to use A → bS. After that we again have to use either production P1 (S → aA) or production P3 (S → ε). So the language of the above CFG is
    L = {(ab)^n : n ≥ 0}
That is, L is the set of strings made of any number of repetitions of ab; the null string (ε) is also included in L.

Example 6.17. Give a simple description of the language generated by the grammar with productions
    S → SS
    S → (S)
    S → ε
Solution. The given CFG is
    S → SS
    S → (S)
    S → ε
The language generated by this CFG is the set of strings of balanced parentheses, that is, for every left parenthesis there is a matching right parenthesis. Strings like ()()(), (()()) and ((())) are part of the language of the CFG.

Example 6.18. Write the CFG for the language L = {a^n b^n c^m d^m : n ≥ 1, m ≥ 1}.
Solution. Let the CFG for the language L be G
    G = (Vn, Vt, P, S)
    Vn = {S, X, Y}
    Vt = {a, b, c, d}
The productions P are defined as follows:
    S → XY         (production P1)
    X → aXb | ab   (production P2)
    Y → cYd | cd   (production P3)
When we use P1 (i.e.
S → XY), X can generate an equal number of a's and b's, with no a following once any b is encountered. Similarly, Y can generate strings with an equal number of c's and d's, with no c following once any d is encountered.
Let us derive the string aaabbbcd, that is, n = 3 and m = 1:
    S ⇒ XY ⇒ aXbY ⇒ aaXbbY ⇒ aaabbbY ⇒ aaabbbcd
which is the required string.

Example 6.19. Design a CFG for the language L = {a^n b^m c^m d^n : n ≥ 1, m ≥ 1}.
Solution. Let us analyse the language L first. L is the set of strings which contain an equal number of a's and d's, and an equal number of b's and c's in between the a's and d's. No a follows a b, c or d; no b follows a c or d; and no c follows a d. Equivalently, there is no b, c or d before any a, no c or d before any b, and no d before any c.
On the basis of the above description let us design a CFG for the language. Let the CFG be G, with P defined as
    S → aSd   (say P1)
    S → aAd   (say P2)
    A → bAc   (say P3)
    A → bc    (say P4)
Let us derive the string aabbbcccdd from the grammar:
    S ⇒ aSd          (by P1)
      ⇒ aaAdd        (by P2)
      ⇒ aabAcdd      (by P3)
      ⇒ aabbAccdd    (by P3)
      ⇒ aabbbcccdd   (by P4)
which is the required string.

Example 6.20. Design a CFG for the language L = {a^n b^2n : n ≥ 0}.
Solution. Let the CFG for the language L be G
    G = (Vn, Vt, P, S)
    Vn = {S}
    Vt = {a, b}
and P is defined as
    S → aSbb | ε
Let us derive the string aaabbbbbb:
    S ⇒ aSbb ⇒ aaSbbbb ⇒ aaaSbbbbbb ⇒ aaabbbbbb
which is the required string. Now let us try to derive another string, abbb:
    S ⇒ aSbb
From this step alone it is clear that a third b cannot be produced against a single a, so abbb is not in the language.

Example 6.21. Write the CFG for the language L = {a^2n b^m : n > 0, m ≥ 0}.
Solution. Let us analyse the language first. L is the set of strings that start with an even, non-zero number of a's, followed by any number of b's. There is no a in any string after the first b is encountered; similarly, there is no b before any a in the string.
Let the CFG for L be G
    G = (Vn, Vt, P, S)
    Vn = {S, A, B}
    Vt = {a, b}
The productions P are defined as
    S → aaAB      (say P1)
    A → aaA | ε   (say P2)
    B → bB | ε    (say P3)
Let us derive the string aaaabb:
    S ⇒ aaAB      (by P1)
      ⇒ aaaaAB    (by P2)
      ⇒ aaaaB     (by P2)
      ⇒ aaaabB    (by P3)
      ⇒ aaaabbB   (by P3)
      ⇒ aaaabb    (by P3)
which is the required string.

Example 6.21(a). Write a CFG for
    L = {a^n b^n c^m d^m : n ≥ 1, m ≥ 1} ∪ {a^n b^m c^m d^n : n ≥ 1, m ≥ 1}
Solution. Let L = L1 ∪ L2, where
    L1 = {a^n b^n c^m d^m : n ≥ 1, m ≥ 1}
Let the CFG for L1 be G1
    G1 = (Vn1, Vt1, P1, S1)
    Vn1 = {S1, A, B}
and P1 is defined as follows
    S1 → AB
    A → aAb | ab
    B → cBd | cd
and
    L2 = {a^n b^m c^m d^n : n ≥ 1, m ≥ 1}
Let the CFG for L2 be G2
    G2 = (Vn2, Vt2, P2, S2)
    Vn2 = {S2, D, E}
    Vt2 = {a, b, c, d}
P2 is defined as follows
    S2 → aDd
    D → aDd | E
    E → bEc | bc
With the help of G1 and G2 we can define the CFG for the language L. Since L = L1 ∪ L2, we take G = G1 ∪ G2. Let
    G = (Vn, Vt, P, S)
    Vn = {S, S1, S2, A, B, D, E}
    Vt = {a, b, c, d}
and P is defined as
    S → S1 | S2    (for G = G1 ∪ G2)
    S1 → AB
    A → aAb | ab
    B → cBd | cd
    S2 → aDd
    D → aDd | E
    E → bEc | bc

Example 6.22. Write a CFG for the language L = {0^i 1^j 2^k : k ≤ i or k ≤ j}.
Solution. Let L = L1 ∪ L2, where L1 = {0^i 1^j 2^k : 1 ≤ k ≤ i} and L2 = {0^i 1^j 2^k : k ≤ j}; every string with k = 0 already lies in L2. Let the CFG for L1 be G1, with start symbol S1 and productions
    S1 → 0 S1 2 | 0Y2
    Y → C | 0Y | ε
    C → 1C | ε
Let the CFG for L2 be G2
    G2 = (Vn2, Vt2, P2, S2)
    Vn2 = {S2, A, B}
    Vt2 = {0, 1, 2}
P2 is defined as follows
    S2 → AB
    A → 0A | ε
    B → 1B2 | 1B | ε
We can now define the CFG for L (since L = L1 ∪ L2). Let the CFG for L be G
    G = (Vn, Vt, P, S)
    Vn = {S, S1, S2, Y, C, A, B}
    Vt = {0, 1, 2}
P is defined as
    S → S1 | S2
    S1 → 0 S1 2 | 0Y2
    Y → C | 0Y | ε
    C → 1C | ε
    S2 → AB
    A → 0A | ε
    B → 1B2 | 1B | ε

Example 6.23. Write the CFG for the language L = {0^i 1^j 2^k : i = j or j = k}.
Solution.
Let L = L1 ∪ L2, where
    L1 = {0^i 1^j 2^k : i = j}
and
    L2 = {0^i 1^j 2^k : j = k}
Let us consider L1 first. Let the CFG for L1 be G1
    G1 = (Vn1, Vt1, P1, S1)
    Vn1 = {S1, A, B}
    Vt1 = {0, 1, 2}
P1 is defined as follows
    S1 → AB
    A → 0A1 | ε
    B → 2B | ε
Now let the CFG for the language L2 be G2
    G2 = (Vn2, Vt2, P2, S2)
    Vn2 = {S2, C, D}
    Vt2 = {0, 1, 2}
The productions are defined as follows:
    S2 → CD
    C → 0C | ε
    D → 1D2 | ε
With the help of G1 and G2 we can define the CFG for the language L. Let it be G.
    G = (Vn, Vt, P, S)
    Vn = {S, S1, S2, A, B, C, D}
    Vt = {0, 1, 2}
The productions are defined as follows
    S → S1 | S2
    S1 → AB
    A → 0A1 | ε
    B → 2B | ε
    S2 → CD
    C → 0C | ε
    D → 1D2 | ε
which is the required CFG.

Example 6.24. Find the CFG for the following language
    L = {a^n b^2n c^m : n, m ≥ 0}
Solution. Let us analyse the language first. Every string of L may start with a or c, but not with b. If a string starts with a, then the a's are followed by b's, and the number of b's is twice the number of a's. If a non-empty string does not start with a, then it consists only of c's. There should be no a after any b or any c, and no b after any c. Similarly, there should be no b or c before an a, and no c before any b.
On the basis of the above discussion, let the CFG for the language be G.
    G = (Vn, Vt, P, S)
    Vn = {S, A, B}
    Vt = {a, b, c}
P is defined as follows
    S → AB
    A → aAbb | ε
    B → cB | ε
which is the required CFG.

Example 6.25. Write the CFG for the language L = {a^n b^m c^2m : n, m ≥ 0}.
Solution. This problem is very similar to Example 6.24. Let the CFG be G
    G = (Vn, Vt, P, S)
    Vn = {S, A, B}
    Vt = {a, b, c}
P is defined as follows
    S → AB
    A → aA | ε
    B → bBcc | ε
which is the required grammar.

6.2.5.
Leftmost and Rightmost Derivations
In order to restrict the number of choices we have in replacing variables, we can require that at each step we replace the leftmost variable by one of its production bodies. Such a derivation is called a "leftmost derivation", and we indicate that a derivation is leftmost by using the relations ⇒(lm) and ⇒*(lm) for one or many steps, respectively.
Similarly, it is possible to require that at each step the rightmost variable is replaced by one of its bodies. If so, we call the derivation "rightmost" and use the symbols ⇒(rm) and ⇒*(rm) to indicate one or many rightmost derivation steps, respectively.

Example 6.26. (i) Write a CFG for simple expressions built with + and *.
(ii) Also write the CFG when the identifiers are described by the regular expression r = (a + b)(a + b + 0 + 1)*.
(iii) Derive the string a * (a + b00) from the grammar of part (ii) by a leftmost derivation and by a rightmost derivation.
Solution. (i) We need two variables in this grammar. One, which we call E, represents expressions; it is the start symbol and represents the language of expressions we are defining. The other variable, I, represents identifiers. Let the CFG be
    G = (Vn, Vt, P, S)
    Vn = {E, I}
    Vt = {+, *, (, )} together with the symbols that make up identifiers
P is defined as
    E → I
    E → E + E
    E → E * E
    E → (E)
Here I can derive any identifier symbol, as I → terminal.
(ii) The given regular expression is
    (a + b)(a + b + 0 + 1)*
Let the CFG be
    G = (Vn, Vt, P, S)
    Vn = {E, I}
    Vt = {+, *, (, ), a, b, 0, 1}
P is defined as follows:
    E → I
    E → E * E
    E → E + E
    E → (E)
    I → a
    I → b
    I → Ia
    I → Ib
    I → I0
    I → I1
(iii) The string is a * (a + b00), and we want to derive it by leftmost and rightmost derivations from the CFG defined in part (ii).
By leftmost derivation:
    E ⇒ E * E ⇒ I * E ⇒ a * E ⇒ a * (E) ⇒ a * (E + E) ⇒ a * (I + E) ⇒ a * (a + E)
      ⇒ a * (a + I) ⇒ a * (a + I0) ⇒ a * (a + I00) ⇒ a * (a + b00)
which is the required string.
By rightmost derivation:
    E ⇒ E * E ⇒ E * (E) ⇒ E * (E + E) ⇒ E * (E + I) ⇒ E * (E + I0) ⇒ E * (E + I00)
      ⇒ E * (E + b00) ⇒ E * (I + b00) ⇒ E * (a + b00) ⇒ I * (a + b00) ⇒ a * (a + b00)
which is the required string.

6.3.
Parse Tree
There is a tree representation for derivations that has proved extremely useful. It is a second way of showing derivations, independent of the order in which productions are used, and is also called a "derivation tree".
"A parse tree is an ordered tree in which nodes are labeled with the left sides of productions and in which the children of a node represent its corresponding right sides".

6.3.1. Definition
Let G = (Vn, Vt, P, S) be a context-free grammar. An ordered tree for this CFG G is a derivation tree if and only if it has the following properties.
(a) The root is labeled by the starting non-terminal of the CFG, that is, S.
(b) Every leaf of the ordered tree has a label from Vt ∪ {ε}.
(c) Every interior node of the ordered tree has a label from Vn.
(d) If a vertex has label X ∈ Vn and its children are labeled (from left to right) Y1, Y2, ..., Yk, then P must contain a production of the form X → Y1 Y2 ... Yk.
(e) A leaf labeled ε has no siblings, that is, a vertex with a child labeled ε can have no other children. Clearly, if a leaf is labeled ε, then it must be the only child of its parent.

6.3.2. The Yield of Parse Tree
If we look at the leaves of any parse tree and concatenate them from the left, we get a string, called the yield of the tree, which is always a string that is derived from the root. Of special importance are those parse trees such that:
(a) The yield is a terminal string. That is, all leaves are labeled either with a terminal or with ε.
(b) The root is labeled by the start symbol.
Now let us see some examples based on parse trees.

Example 6.27. Consider the CFG
    S → XX
    X → XXX | bX | Xb | a
Find the parse tree for the string bbaaaab.
Solution. The given CFG is
    S → XX
    X → XXX | bX | Xb | a
and the string is w = bbaaaab.
We begin with S and apply the production S → XX.
    Fig. 6.1. (S with the two children X and X)
To the left-hand X, let us apply the production X → bX.
To the right-hand X, let us apply X → XXX.
    Fig. 6.2.
The b that we have on the bottom line is a terminal, so it does not descend further. In the terminology of trees, it is called a "terminal node". Let the four X's, left to right, undergo the productions X → bX, X → a, X → a and X → Xb, respectively. We now have:
    Fig. 6.3.
Let us finish off the generation of the word with the productions X → a and X → a:
    Fig. 6.4.
Reading from left to right, we see that the word we have produced is bbaaaab. These tree diagrams are called "syntax trees", "parse trees", "generation trees", "production trees" or "derivation trees".

Example 6.28. Consider the grammar G with productions
    S → aXY
    X → bYb
    Y → X | ε
Solution. Fig. 6.5 shows a partial derivation tree for G; its yield, abYbY, is a sentential form of G.
    Fig. 6.5.
Now extend this partial derivation tree to a derivation tree for the string abbbb:
    Fig. 6.6.
The yield is abεbbεb, that is, abbbb.

Example 6.29. Write the CFG for the language L = {x 0^n y 1^n z : n ≥ 0}, and give the parse tree for the string x00y11z.
Solution. Let the CFG for the language L be
    G = (Vn, Vt, P, S)
where
    Vn = {S, B}
    Vt = {x, y, z, 0, 1}
and the productions P are defined as
    S → xBz
    B → y | 0B1
Now construct a derivation for x00y11z. One possible derivation using the above grammar is
    S ⇒ xBz ⇒ x0B1z ⇒ x00B11z ⇒ x00y11z
Now we are able to draw the parse tree for the string x00y11z from the above CFG. The parse tree is shown in Fig. 6.7.
    Fig. 6.7.

Example 6.30. Consider the CFG G whose productions are
    S → aAS | a
    A → SbA | SS | ba
Show that S ⇒* aabbaa and construct a derivation tree whose yield is aabbaa.
Solution.
    S ⇒ aAS ⇒ aSbAS ⇒ aabAS ⇒ aabbaS ⇒ aabbaa
The derivation tree is given in Fig. 6.8.
    Fig. 6.8. Derivation tree with yield aabbaa.
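
The definitions of Sections 6.3.1 and 6.3.2 can be checked mechanically. The following is a minimal sketch, not part of the original text: it represents the derivation tree of Example 6.30 as nested (label, children) pairs and computes its yield. The representation and the helper names `leaf`, `tree_yield` and `uses_only_productions` are assumptions made for illustration only.

```python
# Productions of Example 6.30: S -> aAS | a,  A -> SbA | SS | ba
PRODUCTIONS = {
    "S": {("a", "A", "S"), ("a",)},
    "A": {("S", "b", "A"), ("S", "S"), ("b", "a")},
}
EPSILON = "ε"  # label for an ε-leaf (this grammar has none; shown for completeness)


def leaf(symbol):
    """Build a node without children (hypothetical helper)."""
    return (symbol, [])


def tree_yield(tree):
    """Concatenate the leaf labels from left to right, skipping ε-leaves."""
    label, children = tree
    if not children:
        return "" if label == EPSILON else label
    return "".join(tree_yield(child) for child in children)


def uses_only_productions(tree, productions):
    """Property (d): an interior node X with children Y1..Yk needs X -> Y1...Yk in P."""
    label, children = tree
    if not children:
        return True
    body = tuple(child[0] for child in children)
    if body not in productions.get(label, set()):
        return False
    return all(uses_only_productions(child, productions) for child in children)


# Tree for S => aAS => aSbAS => aabAS => aabbaS => aabbaa
tree = ("S", [
    leaf("a"),
    ("A", [("S", [leaf("a")]), leaf("b"), ("A", [leaf("b"), leaf("a")])]),
    ("S", [leaf("a")]),
])

print(tree_yield(tree))                          # aabbaa
print(uses_only_productions(tree, PRODUCTIONS))  # True
```

Because the yield is read off the leaves alone, the same tree stands for the leftmost derivation, the rightmost derivation and every other order of applying the same productions.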
Example 6.31. Let G be the CFG
    S → bB | aA
    A → b | bS | aAA
    B → a | aS | bBB
For the string bbaababa find (i) the leftmost derivation, (ii) the rightmost derivation and (iii) the parse tree.
Solution. (i) The leftmost derivation for the string w = bbaababa is
    S ⇒ bB ⇒ bbBB ⇒ bbaB ⇒ bbaaS ⇒ bbaabB ⇒ bbaabaS ⇒ bbaababB ⇒ bbaababa
(ii) The rightmost derivation is
    S ⇒ bB ⇒ bbBB ⇒ bbBaS ⇒ bbBabB ⇒ bbBabaS ⇒ bbBababB ⇒ bbBababa ⇒ bbaababa
(iii) The derivation tree is shown in Fig. 6.9.
    Fig. 6.9. Yield is bbaababa.

6.4. Ambiguity in Grammars and Languages
Grammars can be used to put structure on programs and documents. The assumption so far was that a grammar uniquely determines a structure for each string in its language. However, not every grammar provides unique structures. When a grammar fails to provide unique structures, it is sometimes possible to redesign the grammar to make the structure unique for each string in the language. Unfortunately, sometimes we cannot do so. That is, there are some CFLs that are "inherently ambiguous"; every grammar for such a language puts more than one structure on some strings in the language.

Example 6.32. The language of all non-null strings of a's can be defined by a CFG as follows:
    S → aS | Sa | a
Solution. In this case, the word a^3 can be generated by four different trees:
    Fig. 6.10. (four derivation trees for a^3)
The CFG is therefore ambiguous. However, the same language can also be defined by the CFG
    S → aS | a
for which a^3 has only one derivation tree:
    Fig. 6.11.

Example 6.33. The CFG S → aSb | SS | ε is ambiguous. The sentence aabb has the two derivation trees shown in Fig. 6.12(a) and Fig. 6.12(b).
    Fig. 6.12.

Example 6.34. Consider the grammar
    E → I
    E → E + E
    E → E * E
    E → (E)
    I → a
    I → b
    I → Ia
    I → Ib
    I → I0
    I → I1
For instance, consider the sentential form E + E * E. It has two derivations from E:
    (1) E ⇒ E + E ⇒ E + E * E
    (2) E ⇒ E * E ⇒ E + E * E
Fig. 6.13 shows the two parse trees, which we should note are distinct.
    Fig. 6.13. Two parse trees with the same yield.
The difference between the two derivations is significant. As far as the structure of the expression is concerned, derivation (1) says that the second and third expressions are multiplied and the result is added to the first expression, while derivation (2) adds the first two expressions and multiplies the result by the third. For example, according to the first derivation 2 + 5 * 3 should be grouped as 2 + (5 * 3) = 17, while the second derivation suggests that the same expression should be grouped as (2 + 5) * 3 = 21. Obviously, the first of these, and not the second, matches our notion of the correct grouping of arithmetic expressions.

Example 6.35. Using the same grammar as in Example 6.34, we find that the string a + b * b has two different derivation trees, as follows.
    Fig. 6.14.

Example 6.36. If the CFG G is S → SbS | a, show that G is ambiguous.
Solution. To prove that G is ambiguous, we have to find a string w ∈ L(G) which has more than one derivation tree. Consider w = abababa ∈ L(G). We get two derivation trees for w, so the grammar G is ambiguous.
    Fig. 6.15.

Example 6.37. Consider the productions for the if-then-else statement, which are part of a programming language:
    statement → if expression then statement | if expression then statement else statement
    expression → exp1 | exp2 | ... | expn
    statement → st1 | st2 | ... | stm
Prove that the grammar is ambiguous.
Solution. Consider the statement
    if exp1 then if exp2 then st1 else st2
There are two different parse trees, as shown in Fig. 6.16 and Fig. 6.17.
    Fig. 6.16.
    Fig. 6.17.
Hence the grammar is ambiguous.
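
The tree counts quoted in Examples 6.32 and 6.36 can be reproduced by brute force. The sketch below is an illustration rather than the book's method: it counts the parse trees of a word by trying every way of splitting the word among the symbols of a production body. It assumes the grammar has no ε-productions and no unit productions (A → B), so that every symbol covers at least one letter and the recursion always works on a shorter span; the function name `count_parse_trees` is hypothetical.

```python
from functools import lru_cache


def count_parse_trees(productions, start, word):
    """Count parse trees rooted at `start` whose yield is `word`.

    Assumption: no ε-productions and no unit productions.
    """

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of parse trees rooted at `symbol` with yield word[i:j].
        if symbol not in productions:  # terminal symbol
            return 1 if j - i == 1 and word[i] == symbol else 0
        return sum(ways(body, i, j) for body in productions[symbol])

    @lru_cache(maxsize=None)
    def ways(body, i, j):
        # Number of ways the symbols of `body` jointly derive word[i:j].
        if len(body) == 1:
            return trees(body[0], i, j)
        total = 0
        # The first symbol takes word[i:k]; each later symbol needs >= 1 letter.
        for k in range(i + 1, j - len(body) + 2):
            left = trees(body[0], i, k)
            if left:
                total += left * ways(body[1:], k, j)
        return total

    return trees(start, 0, len(word))


ambiguous_a = {"S": [("a", "S"), ("S", "a"), ("a",)]}  # Example 6.32
unambiguous_a = {"S": [("a", "S"), ("a",)]}            # its unambiguous rewrite
sbs = {"S": [("S", "b", "S"), ("a",)]}                 # Example 6.36

print(count_parse_trees(ambiguous_a, "S", "aaa"))    # 4
print(count_parse_trees(unambiguous_a, "S", "aaa"))  # 1
print(count_parse_trees(sbs, "S", "abababa"))        # 5
```

A count greater than 1 for any single word is enough to show that a grammar is ambiguous. For abababa the count is 5, of which Example 6.36 needs only two.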
6.4.1. Removing Ambiguity from Grammars
If a context-free grammar is ambiguous, it is often possible, and usually desirable, to find an equivalent unambiguous CFG, although some CFLs are "inherently ambiguous" in the sense that they cannot be produced except by ambiguous grammars. Ambiguity is usually a property of the grammar rather than of the language.
Let us consider the grammar for algebraic expressions. For the string a + a * a, two parse trees are possible.
    Fig. 6.18.
The first cause of ambiguity in the grammar is that the precedence of the operators + and * is not respected. Both parse trees in Fig. 6.18 are valid. In an unambiguous grammar we need to force only the structure of Fig. 6.18(a) to be legal, because the * operator has higher precedence than +: a + a * a is equivalent to a + (a * a), and a * a + a is equivalent to (a * a) + a.
Again, the parse trees for a + a + a are
    Fig. 6.19.
The cause of the ambiguity in the string a + a + a is that the associativity of the + operator is not respected. Since we assume that the + operator is left associative, the string a + a + a is equivalent to (a + a) + a. Thus, again, we need to force only the structure of Fig. 6.19(a) to be legal in an unambiguous grammar.
Because we do not want E + E (as we have seen, E → E + E by itself is enough to produce ambiguity), we will not think of expressions involving + as being sums of other expressions. They are obviously sums of something. Let us use the word "terms" for the things that are added, i.e. combined using + to create expressions. The corresponding variable in the grammar will be T. Expressions can also be products; but because the two expressions a + b * c and a * b + c are both sums, it is more appropriate to say that terms can be products.
Let us say that "factors" are the things that are multiplied (combined using *) to produce terms. The corresponding variable in the grammar will be F. Since multiplication has higher precedence than addition, we can say that expressions are sums of one or more terms, and terms are products of one or more factors.
Now we must deal with parentheses. We might say that (E) could be an expression, a term or a factor. However, evaluation of a parenthetical expression takes precedence over any operators outside the parentheses. Factors are evaluated first in the hierarchy, and therefore the appropriate choice is to say that (E) is a factor and that E itself can be an arbitrary expression.
Now, expressions are sums of one or more terms, terms are products of one or more factors, and factors are either parenthetical expressions or single identifiers. "Sum of terms" might suggest something such as E → T + T | T (keep in mind that an expression can consist of a single term). However, to obtain the sum of three terms with this approach, we would be forced to try T → T + T or something comparable, and again we would have ambiguity. What we say instead is that an expression is either a single term or the sum of a term and another expression. The only question is whether we want E → E + T or E → T + E. Since the + operator is left associative, we choose E → E + T as more appropriate. Similarly, we choose the production T → T * F rather than T → F * T. The resulting unambiguous grammar is
    E → E + T | T
    T → T * F | F
    F → (E) | a
In a similar fashion we can find unambiguous grammars for algebraic expressions having other operators.

Example 6.38. Eliminate the ambiguity from the following grammar
    statement → if expression then statement | if expression then statement else statement | S1 | S2 | ... | Sn
    expression → E1 | E2 | ... | En
Solution.
The grammar is ambiguous since the string "if E1 then if E2 then S1 else S2" has two parse trees.
    Fig. 6.20.
    Fig. 6.21.
In all programming languages with conditional statements of this form, the first parse tree is preferred. The general rule is "Match each else with the closest previous unmatched then". The unambiguous grammar is then as follows:
    statement → matched-statement | unmatched-statement
    matched-statement → if expression then matched-statement else matched-statement | S1 | S2 | ... | Sn
    unmatched-statement → if expression then statement | if expression then matched-statement else unmatched-statement
    expression → E1 | E2 | ... | En
The idea in this grammar is that a statement appearing between a then and an else must be "matched", i.e. it must not end with an unmatched then followed by any statement, for the else would then be forced to match this unmatched then. A matched statement is either an if-then-else statement containing no unmatched statements, or any other kind of unconditional statement.
This grammar generates the same set of strings as the previous one, but it allows only one parse of each.

6.4.2. Inherent Ambiguity
"If L is a context-free language for which there exists an unambiguous grammar, then L is said to be unambiguous." Let us consider an example.

Example 6.39. Show that
    L = {a^n b^n c^m} ∪ {a^n b^m c^m}
with n and m non-negative, is an inherently ambiguous context-free language.
Solution. Let L = L1 ∪ L2, where
    L1 = {a^n b^n c^m}
and
    L2 = {a^n b^m c^m}
Let us write the CFG for L1, say G1, as follows:
    S1 → S1 c | A
    A → aAb | ε
Similarly we can write the CFG for L2, say G2, as follows:
    S2 → a S2 | B
    B → bBc | ε
Now, with the help of G1 and G2, we can write the CFG for the language L by adding
    S → S1 | S2
where S is the starting non-terminal of the CFG for the language L.
The grammar is ambiguous, since the string a^n b^n c^n has two distinct derivations, one starting with S ⇒ S1 and another with S ⇒ S2. It does of course not follow that L is inherently ambiguous, as there might exist some other, unambiguous grammar for it.
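
The two derivations of a^n b^n c^n in this grammar can be counted directly. The sketch below is only an illustration under stated assumptions: it counts leftmost derivations (each of which corresponds to exactly one parse tree) by expanding the leftmost non-terminal and pruning any sentential form whose terminals no longer fit the target word. It assumes that every recursive production adds at least one terminal, which holds for the grammar of Example 6.39 but not for every CFG. The function name `count_leftmost_derivations` and the encoding of ε as an empty tuple are choices made here, not notation from the text.

```python
# Grammar of Example 6.39; the empty tuple () encodes an ε-production.
GRAMMAR = {
    "S": [("S1",), ("S2",)],
    "S1": [("S1", "c"), ("A",)],
    "A": [("a", "A", "b"), ()],
    "S2": [("a", "S2"), ("B",)],
    "B": [("b", "B", "c"), ()],
}


def count_leftmost_derivations(productions, start, word):
    """Count leftmost derivations of `word` from `start`.

    Assumption: every recursive production adds at least one terminal,
    so pruning on the number of terminals guarantees termination.
    """

    def count(form):
        # `form` is a sentential form, stored as a tuple of symbols.
        terminal_count = sum(1 for s in form if s not in productions)
        if terminal_count > len(word):
            return 0  # already longer than the target word
        for pos, symbol in enumerate(form):
            if symbol in productions:
                break  # found the leftmost non-terminal
        else:
            return 1 if "".join(form) == word else 0
        if not word.startswith("".join(form[:pos])):
            return 0  # the terminal prefix no longer matches
        return sum(
            count(form[:pos] + body + form[pos + 1:])
            for body in productions[symbol]
        )

    return count((start,))


print(count_leftmost_derivations(GRAMMAR, "S", "aabbcc"))  # 2
print(count_leftmost_derivations(GRAMMAR, "S", "aabbc"))   # 1
print(count_leftmost_derivations(GRAMMAR, "S", "abbcc"))   # 1
```

Only the words with equal numbers of a's, b's and c's are derived twice, once through S1 and once through S2. This checks the ambiguity of this particular grammar; it says nothing about other grammars for L.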
However, L1 and L2 impose conflicting requirements: the first puts a restriction on the number of a's and b's, while the second does the same for b's and c's. A few tries will quickly convince us of the impossibility of combining these requirements in a single set of rules that covers the case n = m uniquely.
Now let us discuss some more examples about these concepts.

Example 6.40. Design a CFG for the language L = {a^n b^m : n ≤ m + 3}.
Solution. Let us first solve this problem for n = m + 3, and then add more b's. Let the CFG for this be G = (Vn, Vt, P, S), where
    Vn = {S, A, B}
    Vt = {a, b}
and the productions are defined as
    S → aaaA
    A → aAb | B
    B → Bb | ε
When we analyse this solution, we find it incomplete, since it always creates at least three a's. To take care of the cases n = 0, 1, 2 we add
    S → A | aA | aaA
(S → A covers n = 0, including the empty string). So finally the solution is
    S → A | aA | aaA
    S → aaaA
    A → aAb | B
    B → Bb | ε

Example 6.41. Find a context-free grammar for the set of all strings described by the regular expression (a + b)* on the alphabet {a, b}.
Solution. The regular expression for any combination of a's and b's is r = (a + b)*. Let the CFG for the regular expression be G = (Vn, Vt, P, S)
    Vn = {S, A, B}
    Vt = {a, b}
and P is defined as
    S → AB | BA   (P1, say)
    A → aA | ε    (P2, say)
    B → bB | ε    (P3, say)
With P1, P2 and P3 alone we can derive only strings of the forms a^i b^j and b^j a^i. To obtain every combination of a's and b's, the start symbol must also be allowed to repeat, for example by adding the production S → SS.

Example 6.42. Let L = {a^n b^n : n ≥ 0}. Show that L^2 is context-free.
Solution.
    L = {a^n b^n : n ≥ 0}
Let the CFG for L be G = (Vn, Vt, P, S)
    Vn = {S}
    Vt = {a, b}
P is defined as
    S → aSb
    S → ε
Clearly L is a context-free language, since there exists a CFG for it. Now consider the CFG with start symbol S1 whose productions are defined as follows.
    S1 → SS
    S → aSb
    S → ε
When we carefully analyse these productions, we find that this is a CFG for the language L^2. So L^2 is a context-free language, given that L is context-free.

Example 6.43. Show that the following grammar is ambiguous.
    S → AB / aaB
    A → a / Aa
    B → b

Solution. The given CFG is

    S → AB / aaB
    A → a / Aa
    B → b

The string w = aab has two distinct leftmost derivations, as follows:

(1) S ⇒ aaB ⇒ aab
(2) S ⇒ AB ⇒ AaB ⇒ aaB ⇒ aab

Now let us see the parse trees for the same string.

[Fig. 6.22. The two parse trees for w = aab.]

Example 6.44. Find a derivation tree of a * b + a * b, given that a * b + a * b is in L(G), where G is given by

    S → S + S / S * S
    S → a / b

Solution. The given grammar is

    S → S + S / S * S
    S → a / b

and the string is w = a * b + a * b.

    S ⇒ S + S ⇒ S * S + S ⇒ a * S + S ⇒ a * b + S ⇒ a * b + S * S ⇒ a * b + a * S ⇒ a * b + a * b

[Fig. 6.23. Derivation tree for Example 6.44.]

Example 6.45. Consider the grammar

    S → aB / bA
    A → aS / bAA / a
    B → bS / aBB / b

For the string aaabbabbba find
(i) the leftmost derivation and the leftmost derivation tree;
(ii) the rightmost derivation and the rightmost derivation tree.

Solution. The given string is aaabbabbba.

(i) Leftmost derivation:

    S ⇒ aB ⇒ aaBB ⇒ aaaBBB ⇒ aaabBB ⇒ aaabbB ⇒ aaabbaBB ⇒ aaabbabB ⇒ aaabbabbS ⇒ aaabbabbbA ⇒ aaabbabbba

[Fig. 6.24. Leftmost derivation tree for the given string.]

(ii) Rightmost derivation:

    S ⇒ aB ⇒ aaBB ⇒ aaBbS ⇒ aaBbbA ⇒ aaBbba ⇒ aaaBBbba ⇒ aaaBbbba ⇒ aaabSbbba ⇒ aaabbAbbba ⇒ aaabbabbba

[Fig. 6.25. Rightmost derivation tree for the given string.]

Example 6.46. Show that the grammar

    S → a / abSb / aAb
    A → bS / aAAb

is ambiguous.

Solution. The given grammar is

    S → a       (P1, say)
    S → abSb    (P2, say)
    S → aAb     (P3, say)
    A → bS      (P4, say)
    A → aAAb    (P5, say)

Let us consider the string w = abab. It has two different derivations, as follows:

(i) S ⇒ abSb ⇒ abab (by using P2 and P1)
(ii) S ⇒ aAb ⇒ abSb ⇒ abab (by using P3, P4 and P1)

Example 6.47. Show that the grammar

    S → aB / ab
    A → aAB / a
    B → ABb / b

is ambiguous.

Solution. Let us consider the string w = ab, which can easily be generated by the given grammar. Now w = ab has two different derivation trees, as follows:

(a) The root S has the children a and b (using S → ab).
(b) The root S has the children a and B, and B has the single child b (using S → aB and B → b).

So clearly the given grammar is ambiguous.

1.
Find CFGs that generate these regular languages over the alphabet Σ = {a, b}.
   (i) The language defined by (aaa + b)*.
   (ii) The language defined by (a + b)* (bbb + aaa) (a + b)*.
   (iii) All the strings without the substring aaa.
   (iv) All strings that end in b and have an even number of b's in total.
   (v) The set of all strings of odd length.
   (vi) All strings with exactly one a or exactly one b.
   (vii) All strings with an odd number of a's or an even number of b's.

2. Write CFGs for the following languages:
   (i) {a^n b^m / m ≥ n}
   (ii) {a^m b^n c^p d^q / m + n = p + q}
   (iii) {u a w b / u, w ∈ {a, b}*, |u| = |w|}

3. Consider the grammar G = (V_N, V_T, P, S) with V_N = {E}, V_T = {a, b, c, +, *, (, )}, S = E, and P given by

       E → E + E / E * E / (E) / a / b / c

   Show that this grammar is ambiguous by deriving two parse trees for the string a + b * c.

4. Consider the grammar with productions

       S → aaB
       A → bBb / ε
       B → Aa

   Show that the string aabbabba is not generated by this grammar.

5. The following grammar generates prefix expressions with operands x and y and binary operators +, -, and *:

       E → +EE / *EE / -EE / x / y

   Find the leftmost and rightmost derivations for the string + * - xyxy.

6. (i) Find the leftmost derivation for the word abba in the grammar

       S → AA
       A → aB
       B → bB / ε

   (ii) Find the leftmost derivation for the word abbabaabbbabbab in the CFG

       S → SSS / aXb
       X → ba / bba / abb

7. Find the regular expression and define the language of the following CFGs.
   (i)   S → aX / bS / a / b
         X → aX / a
   (ii)  S → bS / aX / b
         X → bX / aS / a
   (iii) S → aaS / abS / baS / bbS / ε
   (iv)  S → aB / bA / ε
         A → aS
         B → bS
   (v)   S → aB / bA
         A → aB / a
         B → bA / b

8. Starting with the alphabet Σ = {a, b, (, ), +, *}, find a CFG that generates all regular expressions.

9. Give the derivation tree for (((a + b) * c)) + a + b, using the following grammar:

       E → T
       T → F
       F → I
       E → E + T
       T → T * F
       F → (E)
       I → a / b / c

10. Show that the language L = {w w^R : w ∈ {a, b}*} is not inherently ambiguous.

11. Write CFGs for the following languages (with n ≥ 0, m ≥ 0).
    (i) L = {a^n b^m : n ≠ m}
    (ii) L = {w ∈ {a, b}* : n_a(w) = 2 n_b(w) + 1}

12.
Find CFGs for the following languages (with n ≥ 0, m ≥ 0).
    (i) L = {a^n b^m c^k : k ≠ n + m}
    (ii) L = {a^n b^n c^k : k ≥ 3}
    (iii) L = {w ∈ {a, b, c}* : n_a(w) + n_b(w) ≠ n_c(w)}

13. Define what one might mean by properly nested parenthesis structures involving two kinds of parentheses, say ( ) and [ ]. Intuitively, properly nested strings in this situation are ([ ]), ([[ ]])[( )], but not ([ )] or ((]]. Using your definition, give a CFG for generating all properly nested parentheses.

14. Find a CFG that generates all the production rules for CFGs with T = {a, b} and V = {A, B, C}.
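
The ambiguity arguments used in this chapter (exhibit one string with two parse trees) can also be checked mechanically. The following is only a minimal sketch, not a method from the text: the function name count_parse_trees and the dictionary representation of a grammar are assumptions made for illustration, and the sketch assumes the grammar has no ε-productions and no unit productions (A → B), so that the recursion always works on strictly shorter substrings. It counts the parse trees of a given string; a count of two or more shows that the grammar is ambiguous.

```python
from functools import lru_cache


def count_parse_trees(grammar, start, word):
    """Count the parse trees of `word` from the non-terminal `start`.

    `grammar` maps each non-terminal to a list of right-hand sides, each a
    tuple of symbols. Any symbol that is not a key of `grammar` is treated
    as a terminal. Assumption: no epsilon-productions, no unit productions.
    """

    @lru_cache(maxsize=None)
    def trees(symbol, i, j):
        # Number of parse trees deriving word[i:j] from a single symbol.
        if symbol not in grammar:  # terminal symbol
            return 1 if j - i == 1 and word[i] == symbol else 0
        return sum(seq(rhs, i, j) for rhs in grammar[symbol])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # Number of ways the symbol sequence `rhs` derives word[i:j].
        if len(rhs) == 1:
            return trees(rhs[0], i, j)
        head, rest = rhs[0], rhs[1:]
        # The head takes a non-empty prefix and leaves at least one
        # character for every remaining symbol of the right-hand side.
        return sum(trees(head, i, k) * seq(rest, k, j)
                   for k in range(i + 1, j - len(rest) + 1))

    return trees(start, 0, len(word))


# Grammar of Example 6.43: S -> AB / aaB, A -> a / Aa, B -> b
G_643 = {
    "S": [("A", "B"), ("a", "a", "B")],
    "A": [("a",), ("A", "a")],
    "B": [("b",)],
}

# Grammar of Example 6.46: S -> a / abSb / aAb, A -> bS / aAAb
G_646 = {
    "S": [("a",), ("a", "b", "S", "b"), ("a", "A", "b")],
    "A": [("b", "S"), ("a", "A", "A", "b")],
}

print(count_parse_trees(G_643, "S", "aab"))   # two trees, so ambiguous
print(count_parse_trees(G_646, "S", "abab"))  # two trees, so ambiguous
```

A count of one for a particular string does not show that a grammar is unambiguous; that would require the count to be at most one for every string in the language, which this sketch does not attempt to decide.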
