Unit Ii Regular Expressions and Languages: 2.1.1. Definition
UNIT II
REGULAR EXPRESSIONS AND LANGUAGES
Regular Expression – FA and Regular Expressions – Proving
languages not to be regular – Closure properties of regular languages –
Equivalence and minimization of Automata.
2.1.1. Definition:
Let Σ be an alphabet. The regular expressions over Σ and the regular sets
are defined as follows,
1. ф is a regular expression, and the regular set it denotes is the empty set {}.
2. ε is a regular expression, and the regular set it denotes is {ε}.
3. For each ‘a’ in Σ, ‘a’ is a regular expression, and the regular set it
denotes is {a}.
4. If ‘r’ and ‘s’ are regular expressions denoting the languages R and S, then
r+s, rs and r* are regular expressions that denote the sets R ∪ S, RS and
R* respectively.
The language associated with a regular expression ‘r’ is denoted as
L(r). If ‘r1’ and ‘r2’ are regular expressions, then
L(r1 + r2) = L(r1) ∪ L(r2)
L(r1 . r2) = L(r1) . L(r2)
L(r1*) = (L(r1))*
1. Union(∪):
If L and M are languages then L ∪ M is the set of strings that are
in either L or M or both.
Ex: If L = {001, 10, 111} and M = {ε, 001} then,
L ∪ M = {ε, 10, 001, 111}.
2. Concatenation(.):
If L and M are languages then L.M is the set of strings that can be
formed by taking any string in L and concatenating it with any string in M.
The concatenation operator is frequently called ‘dot’.
Ex: If L = {001, 10, 111} and M = {ε, 001} then,
L.M (or) LM = {001, 10, 111, 001001, 10001, 111001}.
3. Closure(*):
The closure of a language L is denoted L* and represents the set of those strings that
can be formed by taking any number of strings from L, possibly with
repetitions, and concatenating all of them.
Ex: If L = {0, 11} then L* contains ε, 0, 11, 00, 011, 110, 1111 and so on.
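The three operations can be tried out on small finite languages. The following is a minimal sketch, assuming languages are represented as Python sets of strings with "" standing for ε; the helper names union, concat and closure_up_to are illustrative and not part of the notes. Since L* is usually infinite, only the strings built from a bounded number of pieces are listed.

```python
def union(L, M):
    """Strings that are in L, in M, or in both."""
    return L | M

def concat(L, M):
    """Every string of L followed by every string of M."""
    return {x + y for x in L for y in M}

def closure_up_to(L, max_pieces):
    """Finite slice of L*: concatenations of at most max_pieces strings of L."""
    result = {""}          # zero pieces gives the empty string ε
    layer = {""}
    for _ in range(max_pieces):
        layer = concat(layer, L)
        result |= layer
    return result

L = {"001", "10", "111"}
M = {"", "001"}            # "" plays the role of ε

by_length = lambda s: (len(s), s)
print(sorted(union(L, M), key=by_length))
print(sorted(concat(L, M), key=by_length))
print(sorted(closure_up_to({"0", "11"}, 2), key=by_length))
```

The first two lines printed correspond to the union and concatenation examples above.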
PROBLEMS:
Ex.1:
Let Σ = {a, b, c, d}. Check whether (a+b)*(cd) is a regular expression.
Soln:
Let r = (a+b)*cd
Let r1 = a and r2 = b
r3 = r1 + r2
r3 = a + b is a regular expression.
(r3)* = (a + b)* is also a regular expression.
Let r4 = c, r5 = d
r6 = cd is also a regular expression. [ r6 = r4.r5 ]
Hence, r = (r3)*.r6 = (a + b)*(cd) is a regular expression.
Ex.2:
Describe the following sets by regular expressions,
(a) The set of all strings of 0’s and 1’s ending in 00.
(b) Set of all strings of 0’s and 1’s beginning with 11 and ending with ‘0’.
(c) The set of all strings over {a, b} with three consecutive b’s.
(d) The set of all strings with at least one pair of consecutive 0’s and at least
one pair of consecutive 1’s.
(e) All strings that end with ‘1’ and do not contain the substring ‘00’.
Soln:
(a) r = (0+1)*00
(b) r = 11(0+1)*0
(c) r = (a+b)*bbb(a+b)*
(d) r = (0+1)*00(0+1)*11(0+1)* + (0+1)*11(0+1)*00(0+1)*
(e) r = (1+01)*(1+01)
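Answers of this kind can be sanity-checked by brute force. The sketch below is an illustration only: it assumes Python's re module as the matcher, translates the textbook ‘+’ into Python's ‘|’, and compares each expression with a plain-English predicate on every string up to a chosen length. The names to_python and agrees are hypothetical helpers. Agreement on short strings is evidence, not a proof.

```python
import re
from itertools import product

def to_python(regex):
    """Translate the textbook '+' (union) into Python's '|'."""
    return regex.replace("+", "|")

def agrees(regex, alphabet, predicate, max_len=8):
    """True if regex and predicate accept the same strings up to max_len."""
    pattern = re.compile(to_python(regex))
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            w = "".join(letters)
            if bool(pattern.fullmatch(w)) != predicate(w):
                return False
    return True

checks = {
    "(0+1)*00": ("01", lambda w: w.endswith("00")),
    "11(0+1)*0": ("01", lambda w: w.startswith("11") and w.endswith("0")),
    "(a+b)*bbb(a+b)*": ("ab", lambda w: "bbb" in w),
}
for regex, (alphabet, predicate) in checks.items():
    print(regex, agrees(regex, alphabet, predicate))
```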
Case 2:
If there is exactly one symbol ‘a’ that labels an arc from state ‘i’ to state ‘j’
(i ≠ j), then R_ij^(0) = a. Here R_ij^(k) denotes the regular expression for the
strings that take the automaton from state ‘i’ to state ‘j’ without going through
any intermediate state numbered higher than ‘k’.
Case 3:
If there are symbols a1, a2, ..., ak that label arcs from state ‘i’ to state
‘j’, then R_ij^(0) = a1 + a2 + ... + ak.
Case 4:
If i = j then the legal paths are the path of length ‘0’ and all loops
from ‘i’ to itself. The path of length ‘0’ is represented by the regular
expression ‘ε’, since that path has no symbols along it.
If no symbol labels a loop on state ‘i’, then
R_ii^(0) = ε + ф = ε.
Case 5:
If exactly one symbol ‘a’ labels a loop on state ‘i’, then R_ii^(0) = ε + a.
If symbols a1, a2, ..., ak label loops on state ‘i’, then
R_ii^(0) = ε + a1 + a2 + ... + ak.
Induction:
Suppose there is a path from state ‘i’ to state ‘j’ that goes through
no state higher than ‘k’. There are two possible cases to consider,
Case 1:
The path does not go through state ‘k’ at all. In this case, the label
of the path is in the language of R_ij^(k-1).
Case 2:
The path goes through state ‘k’ at least once. Then we can break the path
into several pieces. That is,
(i) The first piece goes from state ‘i’ to state ‘k’ without passing through ‘k’. That
is, R_ik^(k-1).
(ii) The last piece goes from ‘k’ to ‘j’ without passing through ‘k’. That is,
R_kj^(k-1).
(iii) All the pieces in the middle go from ‘k’ to itself without passing through
‘k’. That is, (R_kk^(k-1))*.
R_ij^(k) = case1 + case2
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
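The basis and the induction step can be turned into a short program. The following is a sketch under stated assumptions, not a definitive implementation: states are numbered 1..n, the transition function is a dictionary from (state, symbol) to state, ф is represented by None and ε by the empty string, and the output uses Python's regular expression syntax (‘|’ for union). The example DFA is an assumption chosen for illustration: state 1 loops on ‘1’ and moves to state 2 on ‘0’, and state 2 loops on both symbols. No simplification is attempted, so the result is longer than a hand-derived expression.

```python
import re

def union(r, s):
    if r is None:                      # ф + S = S
        return s
    if s is None:                      # R + ф = R
        return r
    return r if r == s else f"(?:{r}|{s})"

def concat(*parts):
    # ф concatenated with anything is ф
    return None if any(p is None for p in parts) else "".join(parts)

def star(r):
    # ф* = ε and ε* = ε
    return "" if r in (None, "") else f"(?:{r})*"

def dfa_to_regex(n, delta, start, finals):
    # Basis: R[i, j] holds R_ij^(0)
    R = {}
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            r = "" if i == j else None
            for (p, a), q in sorted(delta.items()):
                if p == i and q == j:
                    r = union(r, re.escape(a))
            R[i, j] = r
    # Induction: R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
    for k in range(1, n + 1):
        R = {(i, j): union(R[i, j], concat(R[i, k], star(R[k, k]), R[k, j]))
             for i in range(1, n + 1) for j in range(1, n + 1)}
    result = None
    for f in sorted(finals):
        result = union(result, R[start, f])
    return result

delta = {(1, "1"): 1, (1, "0"): 2, (2, "0"): 2, (2, "1"): 2}
regex = dfa_to_regex(2, delta, start=1, finals={2})
print(regex)
print(bool(re.fullmatch(regex, "1101")), bool(re.fullmatch(regex, "111")))
```

The example DFA accepts exactly the strings that contain at least one ‘0’, so the generated expression can be compared against that description on short strings.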
PROBLEMS:
Ex.1:
Convert the following DFA to regular expression.
Soln:
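Regular Expression is denoted by, R.E. = R_ij^(k)
where i = start state, j = final state
k = total number of states
In this, R.E = R_12^(2) ; i = 1, j = 2, k = 2
Basis: Assume k = 0
R_11^(0) = ε + 1
R_12^(0) = 0
R_21^(0) = ф
R_22^(0) = ε + 0 + 1
Induction:
Formula is,
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
Now, substitute the i, j and k values;
R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1)
Here, find the regular expressions R_12^(1) and R_22^(1).
First find R_12^(1):
R_12^(1) = R_12^(0) + R_11^(0) (R_11^(0))* R_12^(0)
= 0 + (ε + 1)(ε + 1)* 0
= 0 + 1*0 [ (ε + R) (ε + R)* = R* by 20]
= 1*0 [ R + S*R = S*R ]
Now find R_22^(1):
R_22^(1) = R_22^(0) + R_21^(0) (R_11^(0))* R_12^(0)
= (ε + 0 + 1) + ф (ε + 1)* 0
= ε + 0 + 1 [ ф R = ф]
R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1)
= 1*0 + 1*0 (ε + 0 + 1)* (ε + 0 + 1)
= 1*0 + 1*0 (0 + 1)* [ (ε + R) (ε + R)* = R* by 20]
= 1*0 (0 + 1)* [ R + RS* = RS* ]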
Ex.2:
Convert the following DFA to Regular Expression.
Soln:
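Basis: Assume k = 0
R_11^(0) = ε + ф
R_12^(0) = a + b
R_21^(0) = ф
R_22^(0) = ε + a + b
Induction:
Formula is,
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
Now, substitute the i, j and k values;
R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1)
Here, find the regular expressions R_12^(1) and R_22^(1).
First find R_12^(1):
R_12^(1) = R_12^(0) + R_11^(0) (R_11^(0))* R_12^(0)
= (a + b) + (ε + ф)(ε + ф)* (a + b)
= (a + b) + ε(ε)* (a + b) [ ф + ε = ε ]
= (a + b) + (a + b) [ ε* = ε, εR = R ]
= a + b [ R + R = R ]
Now find R_22^(1):
R_22^(1) = R_22^(0) + R_21^(0) (R_11^(0))* R_12^(0)
= (ε + a + b) + ф (ε + ф)* (a + b)
= ε + a + b [ ф R = ф ]
R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1)
= (a + b) + (a + b)(ε + a + b)* (ε + a + b)
= (a + b) + (a + b)(a + b)* [ (ε + R) (ε + R)* = R* by 20]
= (a + b)(a + b)* [ R + RS* = RS* ]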
Ex.3:
Convert the following DFA to Regular expression,
Soln:
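Basis: Assume k = 0
R_11^(0) = ε + c
R_12^(0) = a + b
R_21^(0) = ф
R_22^(0) = ε + c + a + b
Induction:
Formula is,
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
Now, substitute the i, j and k values;
R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1)
Here, find the regular expressions R_12^(1) and R_22^(1).
First find R_12^(1):
R_12^(1) = R_12^(0) + R_11^(0) (R_11^(0))* R_12^(0)
= (a + b) + (ε + c)(ε + c)* (a + b)
= (a + b) + c* (a + b) [ (ε + R) (ε + R)* = R* by 20]
= c*(a + b) [ R + S*R = S*R ]
Now find R_22^(1):
R_22^(1) = R_22^(0) + R_21^(0) (R_11^(0))* R_12^(0)
= (ε + c + a + b) + ф (ε + c)* (a + b)
= ε + c + a + b [ ф R = ф]
R_12^(2) = R_12^(1) + R_12^(1) (R_22^(1))* R_22^(1)
= c*(a + b) + c*(a + b)(ε + c + a + b)* (ε + c + a + b)
= c*(a + b) + c*(a + b)(c + a + b)* [ (ε + R) (ε + R)* = R* by 20]
= c*(a + b)(c + a + b)* [ R + RS* = RS* ]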
Ex.4:
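Find the Regular expression for the set of all strings denoted by R_23^(2)
from the DFA given below,
Soln:
Basis: Assume k = 0
R_11^(0) = ε + ф    R_21^(0) = 0        R_31^(0) = ф
R_12^(0) = 1        R_22^(0) = ε + ф    R_32^(0) = 0 + 1
R_13^(0) = 0        R_23^(0) = 1        R_33^(0) = ε + ф
Induction:
Formula is,
R_ij^(k) = R_ij^(k-1) + R_ik^(k-1) (R_kk^(k-1))* R_kj^(k-1)
Now, substitute the i, j and k values;
R_23^(2) = R_23^(1) + R_22^(1) (R_22^(1))* R_23^(1)
Here, find the regular expressions R_23^(1) and R_22^(1).
First find R_23^(1):
R_23^(1) = R_23^(0) + R_21^(0) (R_11^(0))* R_13^(0)
= 1 + 0(ε + ф)* 0
= 1 + 0(ε)*0 [ ε + ф = ε ]
= 1 + 0.ε.0 [ ε* = ε ]
= 1 + 00 [ εR = R ]
Now find R_22^(1):
R_22^(1) = R_22^(0) + R_21^(0) (R_11^(0))* R_12^(0)
= (ε + ф) + 0(ε + ф)* 1
= ε + 0(ε)* 1 [ ε + ф = ε ]
= ε + 0.ε.1 [ ε* = ε ]
= ε + 01 [ εR = R ]
R_23^(2) = R_23^(1) + R_22^(1) (R_22^(1))* R_23^(1)
= (1 + 00) + (ε + 01)(ε + 01)* (1 + 00)
= (1 + 00) + (01)* (1 + 00) [ (ε + R) (ε + R)* = R* by 20]
= (ε + (01)*)(1 + 00)
= (01)* (1 + 00) [ ε + R* = R* ]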
PROBLEMS:
Ex.1:
Convert the following DFA to regular expression by eliminating states.
Soln:
To eliminate state ‘s’, all arcs involving state ‘s’ are deleted.
1. Find q1→ p1
2. Find qk → pm
3. Find q1 → pm
4. Find qk → p1
Ex.2:
Convert the following NFA to regular expression by eliminating states,
Soln:
1. To convert it to an automaton with regular expression labels;
(Fig.1)
2. Let us first eliminate state B. State B has one predecessor A, and one
successor C. After eliminating state B the result is,
(Fig.2)
3. Now, we must branch, eliminating states C and D in separate reductions.
To eliminate state C, the mechanics are similar to those we performed
above to eliminate state B, and the resulting automaton is,
(Fig.3)
4. Similarly, eliminating state D instead of state C gives,
(Fig.4)
5. Finally, sum the two expressions to get the expression for the entire
automaton. The expression is,
(0 + 1)* 1(0 + 1) + (0 + 1)* 1 (0 + 1) (0 + 1)
Theorem:
Every language defined by a regular expression is also defined by a finite
automaton.
Proof:
Suppose L = L(R) for a regular expression R. We show that L = L(E) for
some ε – NFA E with;
1. Exactly one accepting state.
2. No arcs into the initial state.
3. No arcs out of the accepting state.
Basis:
There are three parts to the basis,
a. The expression ε.
The language of the automaton is easily seen to be {ε}, since the
only path from the start state to an accepting state is labeled ε.
b. The expression ф. Clearly there are no paths from the start
state to the accepting state, so ф is the language of this automaton.
c. The expression ‘a’. The language of this automaton evidently consists of the
one string ‘a’, which is also L(a).
Induction:
There are three parts to the induction,
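1. The expression is R + S for some smaller expressions R and S. Thus the
language of the automaton is L(R) ∪ L(S). In the ε-NFA equivalent to R + S,
a new start state has ε-arcs to the start states of the automata for R and S, and
their accepting states have ε-arcs to a new accepting state.
2. The expression is R.S for some smaller expressions R and S. Thus the
language of the automaton is L(R)L(S). In the automaton for the
concatenation, the accepting state of the automaton for R is joined by an ε-arc
to the start state of the automaton for S.
3. The expression is R* for some smaller expression R. Thus the language of the
automaton is (L(R))*. The automaton for the closure uses ε-arcs so that the
automaton for R can be skipped or repeated any number of times.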
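The basis and induction of this proof describe a construction that can be coded directly. The following is a minimal sketch, assuming an ε-NFA is stored as a start state, one accepting state and a list of arcs (source, label, destination) with the label None standing for ε; the names NFA, symbol, union, concat, star and accepts are illustrative choices. Each fragment has exactly one accepting state, no arcs into its start state and no arcs out of its accepting state. A fresh fragment must be built for every occurrence of a sub-expression, because fragments share no states.

```python
from collections import namedtuple
from itertools import count

NFA = namedtuple("NFA", "start accept edges")
_fresh = count()                     # generator of new state numbers

def symbol(a):
    """Automaton for a single symbol; symbol(None) is the automaton for ε."""
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, [(s, a, f)])

def empty():
    """Automaton for ф: no path from start to accept."""
    return NFA(next(_fresh), next(_fresh), [])

def union(r, s):
    st, f = next(_fresh), next(_fresh)
    extra = [(st, None, r.start), (st, None, s.start),
             (r.accept, None, f), (s.accept, None, f)]
    return NFA(st, f, r.edges + s.edges + extra)

def concat(r, s):
    return NFA(r.start, s.accept, r.edges + s.edges + [(r.accept, None, s.start)])

def star(r):
    st, f = next(_fresh), next(_fresh)
    extra = [(st, None, r.start), (st, None, f),
             (r.accept, None, r.start), (r.accept, None, f)]
    return NFA(st, f, r.edges + extra)

def eps_closure(nfa, states):
    stack, seen = list(states), set(states)
    while stack:
        p = stack.pop()
        for src, label, dst in nfa.edges:
            if src == p and label is None and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def accepts(nfa, w):
    current = eps_closure(nfa, {nfa.start})
    for a in w:
        moved = {dst for src, label, dst in nfa.edges
                 if src in current and label == a}
        current = eps_closure(nfa, moved)
    return nfa.accept in current

def any_bit():
    return union(symbol("0"), symbol("1"))   # fresh copy of (0 + 1)

# (0 + 1)* 1 (0 + 1)
nfa = concat(concat(star(any_bit()), symbol("1")), any_bit())
print(accepts(nfa, "0110"), accepts(nfa, "0101"))
```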
PROBLEMS:
Ex.1:
Convert the following regular expressions to NFA’s with ε – transitions.
(i) 01*
(ii) (0 + 1) 01
(iii) 00 (0 + 1)*
(iv) (0 + 1)* 1 (0 + 1)
Soln:
(i) 01*
(ii) (0 + 1) 01
(iii) 00 (0 + 1)*
(iv) (0 + 1)* 1 (0 + 1)
Principle:
• For a string of length ≥ n accepted by a DFA with n states, the walk
through the DFA must contain a cycle.
• Repeating the cycle an arbitrary number of times yields another
string accepted by the DFA.
2.3.2. Theorem:
Let L be a regular language. Then there exists a constant ‘n’(which
depends on L) such that for every string ‘w’ in L such that |w| ≥ n, we can break
‘w’ into three strings, w = xyz, such that;
(1) | y | ≥ 1
(2) | xy | ≤ n
(3) For all k ≥ 0, the string xy^kz is also in L.
Proof:
Suppose L is regular. Then L = L(A) for some DFA A. Suppose A has
‘n’ states. Now, consider any string ‘w’ of length ‘n’ or more, say w = a1a2 ...
am, where m ≥ n and each ‘ai’ is an input symbol. For i = 0, 1, ..., n define state Pi to
be,
Pi = δ(q0, a1a2 ... ai)
where δ is the (extended) transition function of A, and q0 is the start state of A.
That is, Pi is the state A is in after reading the first ‘i’ symbols of ‘w’. Note that
P0 = q0.
By the Pigeonhole principle, it is not possible for the n+1 different Pi’s
for i = 0, 1, ..., n to be distinct, since there are only ‘n’ different states. Thus we
can find two different integers ‘i’ and ‘j’, with 0 ≤ i < j ≤ n, such that Pi = Pj.
Now, we can break w = xyz as follows,
(1) x = a1a2 ... ai
(2) y = ai+1 ai+2 ... aj
(3) z = aj+1 aj+2 ... am
Since i < j, | y | ≥ 1, and since j ≤ n, | xy | ≤ n. The string ‘y’ takes A from Pi back
to Pi, so on xy^kz the automaton ends in the same accepting state as on ‘w’ for every k ≥ 0.
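The pigeonhole argument is constructive: reading ‘w’ and stopping at the first repeated state gives the split. The sketch below assumes a DFA given as a dictionary from (state, symbol) to state; the function names and the example DFA (three states, accepting the strings that end in 00) are assumptions made for illustration.

```python
def pumping_split(delta, start, w):
    """Split w = xyz at the first repeated state P_i = P_j with i < j."""
    first_seen = {start: 0}          # state -> number of symbols read so far
    state = start
    for j, a in enumerate(w, start=1):
        state = delta[state, a]
        if state in first_seen:
            i = first_seen[state]
            return w[:i], w[i:j], w[j:]
        first_seen[state] = j
    return None                      # no repeat: w has fewer symbols than states

def run(delta, start, finals, w):
    state = start
    for a in w:
        state = delta[state, a]
    return state in finals

# Hypothetical DFA for strings ending in 00: q0 -> q1 -> q2 on 0's, any 1 resets.
delta = {("q0", "0"): "q1", ("q0", "1"): "q0",
         ("q1", "0"): "q2", ("q1", "1"): "q0",
         ("q2", "0"): "q2", ("q2", "1"): "q0"}

x, y, z = pumping_split(delta, "q0", "10100")
print(repr(x), repr(y), repr(z))
print([run(delta, "q0", {"q2"}, x + y * k + z) for k in range(4)])
```

Every pumped string x y^k z is accepted, as the theorem states for a regular language.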
PROBLEMS:
Ex.1:
Show that L = {0^n1^n | n ≥ 1} is not regular.
Soln:
Assume L is regular.
L = {01, 0011, 000111, 00001111, ... }
Let w = xyz.
Take a string in L, w = 000111 [ Take any string in L]
To prove: pumping 000111 gives a string that is not in L.
Case 1:
w = 000111 ; n = 6
Now, divide ‘w’ into xyz.
Let x = 00, y = 0, z = 111 [ |y| ≥ 1, |xy| ≤ n]
Find xy^kz. When k = 2,
xy^2z = 0000111
0000111 ∉ L.
This contradicts the pumping lemma, so L is not regular.
Case 2:
w = 000111 ; n = 6
Now, divide ‘w’ into xyz.
Let x = 0, y = 00, z = 111 [ |y| ≥ 1, |xy| ≤ n]
Find xy^kz. When k = 2,
xy^2z = 00000111
00000111 ∉ L.
So L is not regular.
In general, for the constant ‘n’ of the pumping lemma take w = 0^n1^n. Since |xy| ≤ n,
‘y’ consists only of 0’s, so xy^2z has more 0’s than 1’s and is not in L
for any choice of x, y and z.
Ex.2:
Show that L = {0^i1^j | i > j} is not regular.
Soln:
Assume L is regular.
L = {001, 0001, 00011, 000011, ... }
Let w = xyz.
Take a string in L, w = 00011 [ Take any string in L]
To prove: pumping 00011 gives a string that is not in L.
Case 1:
w = 00011 ; n = 5
Now, divide ‘w’ into xyz.
Let x = 000, y = 1, z = 1 [ |y| ≥ 1, |xy| ≤ n]
Find xy^kz. When k = 2,
xy^2z = 000111
000111 ∉ L.
So L is not regular.
Case 2:
w = 00011 ; n = 5
Now, divide ‘w’ into xyz.
Let x = 000, y = 11, z = ε [ |y| ≥ 1, |xy| ≤ n]
Find xy^kz. When k = 2,
xy^2z = 0001111
0001111 ∉ L.
So L is not regular.
In general, for the constant ‘n’ of the pumping lemma take w = 0^(n+1)1^n. Since |xy| ≤ n,
‘y’ consists only of 0’s, so xy^0z = xz has at most n 0’s followed by n 1’s and is not in L
for any choice of x, y and z.
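A pumping-lemma argument must rule out every split w = xyz with |y| ≥ 1 and |xy| ≤ n, not just one or two. For small values of ‘n’ this can be checked exhaustively. The sketch below is an illustration under assumptions: the membership tests in_L1 and in_L2 and the helper names are hypothetical, only k = 0..3 is tried, and a finite check supports the proof but does not replace it.

```python
def in_L1(w):
    """w is in {0^n 1^n | n >= 1}."""
    n = len(w) // 2
    return n >= 1 and w == "0" * n + "1" * n

def in_L2(w):
    """w is in {0^i 1^j | i > j}."""
    i = len(w) - len(w.lstrip("0"))      # number of leading 0's
    rest = w[i:]
    return rest == "1" * len(rest) and i > len(rest)

def splits(w, n):
    """All w = xyz with |y| >= 1 and |xy| <= n."""
    for i in range(n):
        for j in range(i + 1, min(n, len(w)) + 1):
            yield w[:i], w[i:j], w[j:]

def some_split_survives(w, n, member, max_k=3):
    """True if some split keeps x y^k z in the language for k = 0..max_k."""
    return any(all(member(x + y * k + z) for k in range(max_k + 1))
               for x, y, z in splits(w, n))

for n in range(1, 7):
    print(n,
          some_split_survives("0" * n + "1" * n, n, in_L1),
          some_split_survives("0" * (n + 1) + "1" * n, n, in_L2))
```

For a regular language some split always survives, which is why a result of False for every ‘n’ tried is consistent with L not being regular.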
Theorem:
If L and M are regular languages then L ∪ M is also regular.
Proof:
Since L and M are regular languages, there are regular expressions R and S such that
L = L(R), M = L(S)
Then L ∪ M = L(R + S), which is regular by the definition of regular expressions.
Theorem:
If L is a regular language over the alphabet Σ, then its complement L̄ = Σ* − L is
also regular.
Proof:
Let L = L(A) for DFA, A = (Q, Σ, δ, q0, F) then, L̄ = L(B) where B = (Q,
Σ, δ, q0, Q-F). B is similar to A but accepting states of A have become non
accepting states of B and non accepting states of A have become accepting states of
B. Then ‘w’ is in L(B) iff δ(q0, w) is in Q-F, which occurs iff ‘w’ is not in L(A).
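The construction in the proof only changes the set of accepting states. Here is a minimal sketch, assuming a DFA is stored as the tuple (Q, Σ, δ, q0, F) with δ a dictionary; the three-state DFA used as the example is an assumption chosen to accept the language of (0+1)*01.

```python
def complement(dfa):
    """B = (Q, Σ, δ, q0, Q - F): same automaton, accepting states swapped."""
    states, alphabet, delta, start, finals = dfa
    return states, alphabet, delta, start, states - finals

def accepts(dfa, w):
    _, _, delta, state, finals = dfa
    for a in w:
        state = delta[state, a]
    return state in finals

# Hypothetical DFA for (0+1)*01: q1 = "last symbol was 0", q2 = "just read 01".
A = ({"q0", "q1", "q2"}, {"0", "1"},
     {("q0", "0"): "q1", ("q0", "1"): "q0",
      ("q1", "0"): "q1", ("q1", "1"): "q2",
      ("q2", "0"): "q1", ("q2", "1"): "q0"},
     "q0", {"q2"})
B = complement(A)

for w in ["01", "1101", "010", ""]:
    print(repr(w), accepts(A, w), accepts(B, w))
```

The swap is only valid for a DFA whose transition function is total, which is why an NFA is converted to a DFA first.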
Ex.
Find the complement of (0+1)*01.
Soln:
Step 1: Convert the regular expression to NFA.
Transition Diagram:
Theorem:
If L and M are regular languages, then L − M is regular.
Proof:
Since L and M are regular languages, there are regular expressions R and S such that
L = L(R), M = L(S)
Then L − M = L ∩ M̄
By the previous theorem M̄ is regular, and the intersection of two regular languages is
also regular. Hence L − M is regular.
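The identity L − M = L ∩ M̄ can be realised with a product automaton that runs the DFAs for L and M in lockstep. The following sketch assumes both DFAs share one alphabet and are stored as tuples (Q, Σ, δ, q0, F); the product construction itself is a standard technique that these notes do not spell out, and the two small DFAs are made up for illustration.

```python
def product_dfa(A, B, keep):
    """Pair up the states of A and B; keep(in_FA, in_FB) decides acceptance."""
    (QA, sigma, dA, sA, FA), (QB, _, dB, sB, FB) = A, B
    states = {(p, q) for p in QA for q in QB}
    delta = {((p, q), a): (dA[p, a], dB[q, a])
             for (p, q) in states for a in sigma}
    finals = {(p, q) for (p, q) in states if keep(p in FA, q in FB)}
    return states, sigma, delta, (sA, sB), finals

def difference(A, B):
    # accept when A accepts and B rejects, i.e. L(A) ∩ complement of L(B)
    return product_dfa(A, B, lambda in_a, in_b: in_a and not in_b)

def accepts(dfa, w):
    _, _, delta, state, finals = dfa
    for a in w:
        state = delta[state, a]
    return state in finals

# L: even number of 0's.  M: strings ending in 1.  (Both are assumed examples.)
A = ({"e", "o"}, "01",
     {("e", "0"): "o", ("e", "1"): "e", ("o", "0"): "e", ("o", "1"): "o"},
     "e", {"e"})
B = ({"n", "y"}, "01",
     {("n", "0"): "n", ("n", "1"): "y", ("y", "0"): "n", ("y", "1"): "y"},
     "n", {"y"})
D = difference(A, B)
print(accepts(D, "00"), accepts(D, "001"), accepts(D, "0"))
```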
For each symbol ‘a’ in the alphabet of some regular set R, let Ra be a
particular regular set. Suppose that we replace each word a1a2 ... an in R by the
set of words of the form w1w2 ... wn, where wi is an arbitrary word in Rai.
Then the result is always a regular set.
A substitution ‘f’ is a mapping of an alphabet Σ onto subsets of ∆* for
some alphabet ∆. Regular sets are closed under substitution.
Homomorphism(h):
It is a substitution such that h(a) contains a single string for each ‘a’.
Ex: Let h(0) = ab, h(1) = ε and w = 0011
h(w) = h(0)h(0)h(1)h(1) = abab
Since a homomorphism is also a kind of substitution, regular sets are also closed
under homomorphism.
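Applying a homomorphism is a symbol-by-symbol replacement. A minimal sketch, assuming h is given as a dictionary from symbols to strings with "" standing for ε (the function names are illustrative):

```python
def h_string(h, w):
    """Apply the homomorphism h to the string w, one symbol at a time."""
    return "".join(h[a] for a in w)

def h_language(h, L):
    """Apply h to every string of a finite language L."""
    return {h_string(h, w) for w in L}

h = {"0": "ab", "1": ""}
print(h_string(h, "0011"))
print(sorted(h_language(h, {"01", "10", "0011"})))
```

Different strings may have the same image, so h(L) can contain fewer strings than L.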
The language accepted by a DFA is unique. But, there can exist many
DFA’s that accept the same language. In such cases, the DFA’s are said to be
equivalent.
Two states ‘p’ and ‘q’ of a DFA are equivalent if and only if, for all w ∈ Σ*,
either both δ(p, w) and δ(q, w) are final states or both δ(p, w) and δ(q, w) are
non-final states. That is, the states ‘p’ and ‘q’ are equivalent if, for every ‘w’,
δ(p, w) ∈ F exactly when δ(q, w) ∈ F.
Ex.1:
Find the equivalent states of the DFA given by the following transition table,
State    0    1
→A       B    F
 B       G    C
*C       A    C
 D       C    G
 E       B    F
 F       C    G
 G       G    E
 H       G    C
Soln:
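Construct a table with an entry for each pair of states. An ‘X’ is placed
in the table each time we discover a pair of states that cannot be equivalent.
Initially an ‘X’ is placed in each entry corresponding to one final state and one
non-final state. Then an ‘X’ is placed for every pair (p, q) for which some input
symbol leads to a pair that is already marked, until no new entry can be marked.
B  X
C  X  X
D  X  X  X
E     X  X  X
F  X  X  X     X
G  X  X  X  X  X  X
H  X     X  X  X  X  X
   A  B  C  D  E  F  G
The unmarked entries show that the equivalent pairs of states are (A, E), (B, H)
and (D, F).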
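The marking procedure can be sketched in a few lines. The code below assumes the transition table of this example, stored as a dictionary from (state, input) to state; the function names are illustrative. A pair is marked when the two states disagree on being final, or when some input leads to an already marked pair; the pairs left unmarked are the equivalent ones.

```python
from itertools import combinations

def distinguishable_pairs(states, alphabet, delta, finals):
    # Initial marks: one final state and one non-final state
    marked = {frozenset(p) for p in combinations(states, 2)
              if (p[0] in finals) != (p[1] in finals)}
    changed = True
    while changed:
        changed = False
        for p, q in combinations(states, 2):
            pair = frozenset((p, q))
            if pair in marked:
                continue
            for a in alphabet:
                successor = frozenset((delta[p, a], delta[q, a]))
                if successor in marked:
                    marked.add(pair)
                    changed = True
                    break
    return marked

def equivalent_pairs(states, alphabet, delta, finals):
    all_pairs = {frozenset(p) for p in combinations(states, 2)}
    return all_pairs - distinguishable_pairs(states, alphabet, delta, finals)

rows = {"A": "BF", "B": "GC", "C": "AC", "D": "CG",
        "E": "BF", "F": "CG", "G": "GE", "H": "GC"}
delta = {(s, a): t for s, row in rows.items() for a, t in zip("01", row)}

pairs = equivalent_pairs("ABCDEFGH", "01", delta, {"C"})
print(sorted("".join(sorted(p)) for p in pairs))
```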
PROBLEMS:
Ex.1:
Construct minimized DFA for the following table,
State    a    b
→A       B    C
 B       B    D
 C       B    C
 D       B    E
*E       B    C
Soln:
Groups [ABCD] [E]
Find the equivalent states in each group. Here A ≡ C, since A and C have
identical rows.
Now, choose the representative for states A and C.
If the representative is A then, eliminate state C and replace every C by A
inside the table.
State    a    b
→A       B    A
 B       B    D
 D       B    E
*E       B    A
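The grouping step can be automated by repeatedly splitting the groups until the states inside each group behave alike. The sketch below uses partition refinement, which is a swapped-in technique that reaches the same result as merging identical rows by hand; the function name minimize and the choice of the alphabetically first state as representative are assumptions made for illustration.

```python
def minimize(states, alphabet, delta, start, finals):
    # Start with two groups: non-final states (0) and final states (1)
    block = {s: int(s in finals) for s in states}
    while True:
        # A state's signature: its own group and the groups of its successors
        signature = {s: (block[s],) + tuple(block[delta[s, a]] for a in alphabet)
                     for s in states}
        ids = {sig: n for n, sig in enumerate(sorted(set(signature.values())))}
        if len(ids) == len(set(block.values())):
            break                     # no group was split: partition is stable
        block = {s: ids[signature[s]] for s in states}
    representative = {}
    for s in sorted(states):
        representative.setdefault(block[s], s)
    name = {s: representative[block[s]] for s in states}
    new_delta = {(name[s], a): name[delta[s, a]]
                 for s in states for a in alphabet}
    return (sorted(set(name.values())), new_delta,
            name[start], {name[s] for s in finals})

rows = {"A": "BC", "B": "BD", "C": "BC", "D": "BE", "E": "BC"}
delta = {(s, a): t for s, row in rows.items() for a, t in zip("ab", row)}

states, new_delta, start, finals = minimize("ABCDE", "ab", delta, "A", {"E"})
for s in states:
    print(s, new_delta[s, "a"], new_delta[s, "b"])
```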
Ex.2:
Construct minimized DFA for the following table,
State    a    b
→A       B    C
 B       B    D
 C       B    C
 D       B    E
*E       F    G
*F       F    H
*G       F    G
*H       F    I
*I       F    G
Soln:
Groups [ABCD] [EFGHI]
Find the equivalent states in each group. In group [ABCD], A ≡ C.
Now, choose the representative for states A and C.
If the representative is A then, eliminate state C and replace every C by A
inside the table.
ie)
State    a    b
→A       B    A
 B       B    D
 D       B    E
*E       F    G
*F       F    H
*G       F    G
*H       F    I
*I       F    G
In group [EFGHI], E ≡ G ≡ I. With E as the representative, eliminate states G
and I and replace every G and I by E.
State    a    b
→A       B    A
 B       B    D
 D       B    E
*E       F    E
*F       F    H
*H       F    E
Now E ≡ H. Eliminate state H and replace every H by E.
State    a    b
→A       B    A
 B       B    D
 D       B    E
*E       F    E
*F       F    E
Now E ≡ F. Eliminate state F and replace every F by E.
State    a    b
→A       B    A
 B       B    D
 D       B    E
*E       E    E
Ex.3:
Construct the minimal state DFA for the regular expression (a/b)*abb.
Soln:
State    a    b
→A       B    A
 B       B    D
 D       B    E
*E       B    A
Ex.4:
Construct the minimal state DFA for the regular expression (a/b)*a(a/b).
Soln:
State    a    b
→A       B    C
 B       D    E
 C       B    C
*D       D    E
*E       B    C
Here A ≡ C. With A as the representative, eliminate state C and replace every C
by A.
State    a    b
→A       B    A
 B       D    E
*D       D    E
*E       B    A