4-DFA To RE Conversion-24-01-2025
So far we have seen different ways of specifying a regular language: DFA, NFA, ε-NFA, regular
expressions and regular grammars. We noted that all these representations are equal in power
by showing their equivalence. Regular expressions and grammars are considered generators
of a regular language, while the machines (DFA, NFA, ε-NFA) are considered acceptors of
the language.
Now we will look at the properties of regular languages. The properties can be broadly classified
into two groups: (A) Closure properties and (B) Decision properties
Let DFA(L) denote the DFA for the language L. Modify the DFA as follows to obtain
DFA(L').
This can be shown by an example using a DFA. Let L denote the language of strings over
Σ = {a, b} that begin and end with a. The DFA for L is given below.
L' denotes the language that does not contain strings that begin and end with a. This implies
L' contains strings that
The DFA for L' is obtained by flipping the final states of DFA(L) to non-final states and vice-
versa. The DFA for L' is given below.
• q0 ensures ε is accepted.
• q1 ensures all strings that begin with a and end with b are accepted.
• q3 ensures all strings that begin with b (ending with either a or b) are accepted.
Important Note: While specifying the DFA for L, we also included the dead state q3. It
is important to include the dead state(s) when deriving the complement DFA, since the
dead state(s) also become final under complementation. If the dead state(s) are left out of
the original DFA, the complement will not accept all the strings it is supposed to accept.
In the above example, if we had not included q3 originally, the complement would not accept
strings starting with b. It would accept only ε and the strings that begin with a and end with b,
which is only a subset of the complement.
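The flipping step can be sketched in a few lines of Python. The transition table is an assumed reconstruction of DFA(L): the state names follow the notes, with q2 taken as the accepting state and q3 as the dead state, and may not match the original figure exactly. The helper `run_dfa` is a hypothetical name used only for illustration.

```python
# Assumed reconstruction of DFA(L) for "begins and ends with a" over {a, b}.
# q0: start state
# q2: began with a, last symbol is a (final in DFA(L))
# q1: began with a, last symbol is b
# q3: dead state (began with b)
STATES = {"q0", "q1", "q2", "q3"}
DELTA = {
    ("q0", "a"): "q2", ("q0", "b"): "q3",
    ("q1", "a"): "q2", ("q1", "b"): "q1",
    ("q2", "a"): "q2", ("q2", "b"): "q1",
    ("q3", "a"): "q3", ("q3", "b"): "q3",
}
FINALS_L = {"q2"}
# Complementation: every non-final state becomes final and vice versa.
FINALS_COMPLEMENT = STATES - FINALS_L


def run_dfa(finals, w):
    """Return True if the DFA with the given final states accepts w."""
    state = "q0"
    for symbol in w:
        state = DELTA[(state, symbol)]
    return state in finals
```

If q3 were left out of `STATES`, `FINALS_COMPLEMENT` would lose it and `run_dfa(FINALS_COMPLEMENT, "ba")` would no longer hold, which is the problem the note above warns about.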
2. Union
This is more easily proved using regular expressions. If L1 is regular, there exists a regular
expression R1 that describes it. Similarly, if L2 is regular, there exists a regular expression R2
that describes it. R1 + R2 is the regular expression that describes L1 ∪ L2. Therefore, L1 ∪ L2
is regular.
This again can be shown using an example. If L1 is the language of strings that begin
with a and L2 is the language of strings that end with a, then L1 ∪ L2 is the
language of strings that either begin with a or end with a.
- a(a+b)* is the regular expression that denotes L1.
In terms of DFA, we can say that a DFA(L1 ∪ L2) accepts those strings that are accepted by
either DFA(L1) or DFA(L2) or both.
• DFA(L1 ∪ L2) can be constructed by adding a new start state and new final state.
• The new start state connects to the two start states of DFA(L1) and DFA(L2) by
ε-transitions.
• Similarly, two ε transitions are added from the final states of DFA(L1) and DFA(L2)
to the new final state.
• Convert this resulting NFA to its equivalent DFA.
As an exercise you can try this approach of DFA construction for union for the given
example.
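The regular-expression argument can be tried out with Python's `re` module, as a rough sketch. The expression used for L2 is an assumption (it does not appear in the text above), and Python writes the union operator `+` of these notes as `|`.

```python
import re

R1 = r"a[ab]*"   # begins with a; same language as a(a+b)*
R2 = r"[ab]*a"   # ends with a; assumed expression for L2
# R1 + R2 in the notation of the notes becomes (R1)|(R2) in Python.
UNION = re.compile(rf"(?:{R1})|(?:{R2})")


def in_union(w):
    """Return True if w is in L1 ∪ L2."""
    return UNION.fullmatch(w) is not None
```

Here `in_union("ab")` and `in_union("ba")` hold because one of the two expressions matches, while `in_union("bb")` does not.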
3. Intersection
Since a language is a set of (possibly infinitely many) strings, and we have shown above that
regular languages are closed under union and complementation, De Morgan's law can be
applied to show that regular languages are closed under intersection too.
L1 and L2 are regular ⇒ L1' and L2' are regular (by Complementation property)
L1' ∪ L2' is regular (by Union property)
(L1' ∪ L2')' is regular (by Complementation property)
L1 ∩ L2 = (L1' ∪ L2')' is regular (by De Morgan's law)
In terms of DFA, we can say that a DFA(L1 ∩ L2) accepts those strings that are accepted by
both DFA(L1) and DFA(L2).
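One way to build such a DFA directly is the product construction, which is a different route from the De Morgan argument above: run both DFAs in parallel on pairs of states and accept when both components are final. The sketch below assumes L1 = strings that begin with a and L2 = strings that end with a; the state names and dictionary layout are illustrative.

```python
ALPHABET = "ab"

# DFA(L1): strings that begin with a ("n" is the dead state).
D1 = {"start": "s", "finals": {"y"},
      "delta": {("s", "a"): "y", ("s", "b"): "n",
                ("y", "a"): "y", ("y", "b"): "y",
                ("n", "a"): "n", ("n", "b"): "n"}}
# DFA(L2): strings that end with a.
D2 = {"start": "e0", "finals": {"e1"},
      "delta": {("e0", "a"): "e1", ("e0", "b"): "e0",
                ("e1", "a"): "e1", ("e1", "b"): "e0"}}


def intersect(d1, d2):
    """Product DFA whose states are pairs (p, q)."""
    states1 = {p for p, _ in d1["delta"]}
    states2 = {q for q, _ in d2["delta"]}
    delta = {((p, q), c): (d1["delta"][(p, c)], d2["delta"][(q, c)])
             for p in states1 for q in states2 for c in ALPHABET}
    # A pair is final only when both components are final.
    finals = {(p, q) for p in d1["finals"] for q in d2["finals"]}
    return {"start": (d1["start"], d2["start"]),
            "finals": finals, "delta": delta}


def accepts(dfa, w):
    """Run w on the DFA and report whether it ends in a final state."""
    state = dfa["start"]
    for c in w:
        state = dfa["delta"][(state, c)]
    return state in dfa["finals"]


D_BOTH = intersect(D1, D2)
```

With these definitions, `accepts(D_BOTH, "aba")` holds, while `accepts(D_BOTH, "ab")` and `accepts(D_BOTH, "ba")` do not.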
4. Concatenation
5. Kleene star
This can be easily proved using regular expressions. If L is regular, then there exists a regular
expression R that describes it. We know that if R is a regular expression, R* is a regular
expression too. R* denotes the language L*. Therefore L* is regular.
In terms of DFA, in DFA(L) we add two ε-transitions, one from the start state to the final state
and another from the final state to the start state. The result is an ε-NFA for L*. This is safe
only if no transition enters the start state; otherwise, first add a new start state with an
ε-transition to the old one, so that no extra strings are accepted. You can try showing this
for an example.
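A small ε-NFA simulation makes the star construction concrete. This sketch assumes L = a*b and uses a fresh start state "s" that is also final, a common variant of the construction: in this example the original start state q0 has a self-loop, and a direct ε-transition from q0 to the final state would wrongly accept the string a. All names are illustrative.

```python
def eps_closure(states, eps):
    """All states reachable from `states` using only ε-transitions."""
    seen, stack = set(states), list(states)
    while stack:
        s = stack.pop()
        for t in eps.get(s, ()):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen


def accepts(delta, eps, start, finals, w):
    """Simulate an ε-NFA whose symbol moves are given by a partial table."""
    current = eps_closure({start}, eps)
    for c in w:
        moved = {delta[(s, c)] for s in current if (s, c) in delta}
        current = eps_closure(moved, eps)
    return bool(current & finals)


# DFA(L) for L = a*b with the dead state omitted: q0 is the start, q1 is final.
DELTA = {("q0", "a"): "q0", ("q0", "b"): "q1"}

# Star construction: new start state "s" (also final), s --ε--> q0,
# and q1 --ε--> q0 so that another string of L can follow.
EPS_STAR = {"s": ["q0"], "q1": ["q0"]}
FINALS_STAR = {"s", "q1"}


def in_star(w):
    """Return True if w is in L* for L = a*b."""
    return accepts(DELTA, EPS_STAR, "s", FINALS_STAR, w)
```

For example, `in_star("")` and `in_star("aabb")` hold (the latter as aab followed by b), while `in_star("a")` does not.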
6. Difference
L1 and L2 are regular ⇒ L1 and L2' are regular (by Complementation property)
L1 ∩ L2' is regular (by Intersection property)
L1 - L2 = L1 ∩ L2' is regular (by the definition of set difference)
In terms of DFA, we can say that a DFA(L1 - L2) accepts those strings that are accepted by
DFA(L1) but not by DFA(L2). You can try showing this for an example.
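As a small illustration, the two DFAs can be run in lockstep and the string accepted when the first ends in a final state and the second does not. The example languages (L1 = begins with a, L2 = ends with a) and all names below are assumptions made for this sketch.

```python
# DFA(L1): strings that begin with a ("n" is the dead state).
D1 = {("s", "a"): "y", ("s", "b"): "n",
      ("y", "a"): "y", ("y", "b"): "y",
      ("n", "a"): "n", ("n", "b"): "n"}
F1 = {"y"}
# DFA(L2): strings that end with a.
D2 = {("e0", "a"): "e1", ("e0", "b"): "e0",
      ("e1", "a"): "e1", ("e1", "b"): "e0"}
F2 = {"e1"}


def in_difference(w):
    """Return True if w is in L1 - L2."""
    p, q = "s", "e0"
    for c in w:
        # Advance both DFAs on the same input symbol.
        p, q = D1[(p, c)], D2[(q, c)]
    # Accepted by DFA(L1) and rejected by DFA(L2).
    return p in F1 and q not in F2
```

Here L1 - L2 is the set of strings that begin with a and end with b, so `in_difference("ab")` holds and `in_difference("aba")` does not.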
7. Reverse
Let DFA(L) denote the DFA of L. Make the following modifications to construct DFA(L^R),
where L^R denotes the reverse of L.
In case there is more than one final state in DFA(L), first add a new final state, add
ε-transitions to it from the old final states (which then cease to be final states),
and then perform this step.
3. Reverse the direction of the arrows.
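A rough sketch of the usual reversal idea, under assumptions: every arrow is reversed, the old final states act as start states, and the old start state acts as the final state. Instead of adding a new final state with ε-transitions, the simulation simply starts from the whole set of old final states. The example DFA (strings that begin with a) and all names are illustrative.

```python
# DFA(L) for L = strings over {a, b} that begin with a ("n" is the dead state).
DELTA = {("s", "a"): "y", ("s", "b"): "n",
         ("y", "a"): "y", ("y", "b"): "y",
         ("n", "a"): "n", ("n", "b"): "n"}
START, FINALS = "s", {"y"}

# Reverse every arrow: p --c--> q becomes q --c--> p.
# The result is in general an NFA, so targets are stored as sets.
REVERSED = {}
for (p, c), q in DELTA.items():
    REVERSED.setdefault((q, c), set()).add(p)


def in_reverse(w):
    """Return True if w is in the reverse of L."""
    current = set(FINALS)            # old final states act as start states
    for c in w:
        current = set().union(*(REVERSED.get((s, c), set()) for s in current))
    return START in current          # old start state acts as the final state
```

The reverse of "begins with a" is "ends with a", so `in_reverse("ba")` holds and `in_reverse("ab")` does not.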
• Construct DFA(L).
• Run w on DFA(L).
• If DFA(L) accepts w, then w ∈ L. Else w ∉ L.
2. Emptiness question
Is L = ∅?
• Construct DFA(L).
• If there exists no path from the start state to a final state, L = ∅. Else L ≠ ∅.
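The path check is an ordinary graph reachability search. A minimal sketch, assuming the DFA is given as a dictionary from (state, symbol) to state; the function name `is_empty` is hypothetical.

```python
from collections import deque


def is_empty(delta, start, finals):
    """Return True if no final state is reachable from the start state."""
    seen, queue = {start}, deque([start])
    while queue:
        state = queue.popleft()
        if state in finals:
            return False
        # Follow every outgoing transition of the current state.
        for (src, _symbol), dst in delta.items():
            if src == state and dst not in seen:
                seen.add(dst)
                queue.append(dst)
    return True


# Example: q2 is reachable from q0, q9 is not.
EXAMPLE = {("q0", "a"): "q1", ("q0", "b"): "q0",
           ("q1", "a"): "q2", ("q1", "b"): "q0",
           ("q2", "a"): "q2", ("q2", "b"): "q2",
           ("q9", "a"): "q9", ("q9", "b"): "q9"}
```

With this table, `is_empty(EXAMPLE, "q0", {"q2"})` is false and `is_empty(EXAMPLE, "q0", {"q9"})` is true.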
3. Equivalence question
Is L1 = L2?
4. Subset question
Is L1 ⊂ L2?
5. Infinite question
Is L infinite?
• Construct DFA(L).
• If DFA(L) has at least one loop on a path from the start state to a final state, then L is
infinite. Else L is finite.
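A sketch of this test in Python, under the assumption that only loops lying on some path from the start state to a final state count (a loop on a dead state, for example, adds no strings to the language). The DFA is given as a dictionary from (state, symbol) to state, and all names are illustrative.

```python
def is_infinite(delta, start, finals):
    """True if some loop lies on a path from start to a final state."""
    def reachable(sources, edges):
        seen, stack = set(sources), list(sources)
        while stack:
            s = stack.pop()
            for t in edges.get(s, ()):
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    forward, backward = {}, {}
    for (src, _symbol), dst in delta.items():
        forward.setdefault(src, set()).add(dst)
        backward.setdefault(dst, set()).add(src)

    # Useful states: reachable from start and able to reach a final state.
    useful = reachable({start}, forward) & reachable(set(finals), backward)
    # A useful state is on a loop if it can be reached from its successors.
    return any(s in reachable(forward.get(s, set()), forward) for s in useful)


# Accepts only the string ab; the only loop is on the dead state "d".
ONLY_AB = {("q0", "a"): "q1", ("q0", "b"): "d",
           ("q1", "a"): "d", ("q1", "b"): "q2",
           ("q2", "a"): "d", ("q2", "b"): "d",
           ("d", "a"): "d", ("d", "b"): "d"}
# Accepts strings that end with a; both states lie on loops.
ENDS_WITH_A = {("e0", "a"): "e1", ("e0", "b"): "e0",
               ("e1", "a"): "e1", ("e1", "b"): "e0"}
```

`is_infinite(ONLY_AB, "q0", {"q2"})` is false even though the dead state has a self-loop, while `is_infinite(ENDS_WITH_A, "e0", {"e1"})` is true.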
Question: Can we conclude that a language is not regular if no one has so far come up with a
DFA, NFA, ε-NFA, regular expression or regular grammar for it?
- No, since someone may very well come up with one of these in the future.
We need a property that holds for every regular language, so that we can prove that any
language without that property is not regular. Let's recall some of the properties.
• We have seen that a regular language can be expressed by a finite state automaton. Be
it deterministic or non-deterministic, the automaton consists of a finite set of states.
• Since the states are finite, if the automaton has no loop, the language would be finite.
- Any finite language is indeed a regular language, since we can express the language
using the regular expression S1 + S2 + ... + SN, where N is the total number of strings
accepted by the automaton.
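The expression S1 + S2 + ... + SN can be assembled mechanically. A minimal sketch with Python's `re` module, where the union `+` of these notes is written `|`; the helper name and the sample strings are illustrative.

```python
import re


def finite_language_regex(strings):
    """Build the expression S1|S2|...|SN for a finite list of strings."""
    # re.escape keeps each string literal even if it contains special characters.
    return re.compile("|".join(re.escape(s) for s in strings))


FINITE = finite_language_regex(["ab", "aab", "b"])
```

`FINITE.fullmatch("aab")` succeeds, while `FINITE.fullmatch("abb")` returns `None` because abb is not one of the listed strings.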
Any finite automaton with a loop can be divided into three parts.
For example consider the following DFA. It accepts all strings that start with aba followed by
any number of baa's and finally ending with ba.
Investigating this further, we can say that any string w accepted by this DFA can be written
as w = x y^i z,
where y represents the part that can be pumped again and again to generate more and more
valid strings. This is shown below for the given example.
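The pumping of y can be checked with a short Python sketch. The decomposition x = ab, y = aba, z = aba is an assumption chosen to agree with a loop at q2; the language itself is written as the regular expression aba(baa)*ba taken from the description of the DFA.

```python
import re

# aba, then any number of baa, then ba (the language of the example DFA).
LANG = re.compile(r"aba(?:baa)*ba")

# Assumed decomposition, consistent with the loop being at q2.
X, Y, Z = "ab", "aba", "aba"


def pumped(i):
    """Return the string x y^i z."""
    return X + Y * i + Z


# Every pumped string should still be in the language.
ALL_ACCEPTED = all(LANG.fullmatch(pumped(i)) for i in range(6))
```

Here `pumped(0)` is ababa and `pumped(1)` is ababaaba; each further value of i adds one more trip around the loop.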
2. What if the loop was at the beginning? Say a self-loop at q0 instead of at q2.
Then x = ε, i.e. |x| = 0. In such a special case, w = yz.
3. What if the loop was at the end? Say a self-loop at q6 instead of at q2.
Then z = ε, i.e. |z| = 0. In such a special case, w = xy.
6. What is the shortest string accepted if there are more final states? Say q2 is final.
ab of length 2.
7. What is the longest string accepted by the DFA without going through the loop even once?
ababa (= xz) of length 5. So, any string of length > 5 accepted by the DFA must go through
the loop at least once.
8. What is the longest string accepted by the DFA by going through the loop exactly once?
ababaaba (= xyz) of length 8. We call this length the pumping length.
More precisely, the pumping length is an integer p denoting the length of the string w
obtained by going through the loop exactly once. In other words, |w| = |xyz| = p.
Pumping Lemma: If L is a regular language, then there exists a constant p such that every
string w ∈ L of length p or more can be written as w = xyz, where
1. |y| > 0
2. |xy| ≤ p
3. xy^i z ∈ L for all i ≥ 0
Before proving L is not regular using pumping property, let's see why we can't come up with
a DFA or regular expression for L.
It may be tempting to use the regular expression a*b* to describe L. No doubt, a*b*
generates these strings. However, it is not appropriate, since it also generates strings that
are not in L, such as a, b, aa, aaa, aab, abb, ...
Let's try to come up with a DFA. Since it has to accept ε, the start state has to be final. The
following DFA can accept a^n b^n for n ≤ 3, i.e. {ε, ab, aabb, aaabbb}.
The basic problem is that a DFA does not have any memory. A transition depends only on the
current state and the input symbol, so the DFA cannot keep an unbounded count of how many
a's it has seen and has no way to match the number of a's and b's. The only way to accept all
the strings of L is to keep adding more and more states, which would require infinitely many
states since n is unbounded.
Now, let's prove that L does not have the pumping property.
⇒ w = a^(p/2) b^(p/2)
We know that w can be broken into three parts xyz such that y ≠ ε and xy^i z ∈ L for all i ≥ 0.
Then xy^2 z has more a's than b's and does not belong to L.
Then xy^2 z has a's and b's out of order and does not belong to L.
Since the pumped string is not in L in any of the 3 cases, the pumping property does not hold
for L. Therefore, L is not regular.
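The case analysis can be checked by brute force for one concrete value of p. This is only an illustration for a single hypothetical pumping length, not a proof: the sketch takes L to be the set of strings a^n b^n, picks w = a^(p/2) b^(p/2), tries every split w = xyz with |y| > 0 and |xy| ≤ p, and looks for a split whose pumped strings all stay in L. All names are illustrative.

```python
def in_L(s):
    """Membership in L = { a^n b^n : n >= 0 }."""
    n = len(s) // 2
    return s == "a" * n + "b" * n


def pumpable_splits(w, p):
    """Yield every split w = xyz with |y| > 0 and |xy| <= p whose
    pumped strings x y^i z stay in L for i = 0, 1, 2, 3."""
    for end in range(1, min(p, len(w)) + 1):   # end = |xy|
        for begin in range(end):               # begin = |x|
            x, y, z = w[:begin], w[begin:end], w[end:]
            if all(in_L(x + y * i + z) for i in range(4)):
                yield x, y, z


P = 6                                  # hypothetical pumping length
W = "a" * (P // 2) + "b" * (P // 2)    # w = a^(p/2) b^(p/2), so |w| = p
SURVIVORS = list(pumpable_splits(W, P))
```

`SURVIVORS` comes out empty: whichever way w is split, pumping y breaks either the balance or the order of the a's and b's.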
|w| = 2p + 2 ≥ p
Therefore, the pumping property does not hold for L. Hence, L is not regular.
Let's assume L is regular. Let p be the pumping length. Let q ≥ p be a prime number (since we
cannot assume that the pumping length p itself is prime).
We know that w can be broken into three parts xyz such that y ≠ ε and xy^i z ∈ L
Exercises
Show that the following languages are not regular.
4. L = { a^n b^m : n ≠ m }
5. L = { a^n b^m : n > m }
6. L = { w : n_a(w) = n_b(w) }
7. L = { ww : w ∈ {a,b}* }
8. L = { a^(n^2) : n > 0 }
IMPORTANT NOTE
Never use the pumping lemma to prove that a language is regular. The pumping property is
necessary but not sufficient for regularity.