Kleene's Theorem - Part 1: The Recursive Definition of Regular Language
Kleene's Theorem - Part 1: The Recursive Definition of Regular Language
It states that any regular language is accepted by an FA and conversely that any
language accepted by an FA is regular.
Basis Step: As shown below the languages ,{ } and { a } for any symbol a
in are accepted by an FA.
Inductive Step: We are going to show that for any languages L1 and L2 if they are
accepted by FAs, then L1 L2 , L1L2 and L1* are accepted by FAs. Since any
regular language is obtained from { } and { a } for any symbol a in by using
union, concatenation and Kleene star operations, that together with the Basis Step
would prove the theorem.
Suppose that L1 and L2 are accepted by FAs M1 = < Q1 , , q1,0 , 1 , A1 > and
Qc = Q1 Q2
qc,0 = q1,0
c= 1 2 { (q, , { q2,0 } ) | q A1 }
Ac = A2
End of Proof
Examples of Mu , Mc and Mk:
The converse of the part 1 of Kleene Theorem also holds true. It states that any
language accepted by a finite automaton is regular.
Before proceeding to a proof outline for the converse, let us study a method to
compute the set of strings accepted by a finite automaton.
Given a finite automaton, first relabel its states with the integers 1 through n,
where n is the number of states of the finite automaton. Next denote by L(p, q, k)
the set of strings representing paths from state p to state q that go through only
states numbered no higher than k. Note that paths may go through arcs and vertices
any number of times.
Then the following lemmas hold.
What this lemma says is that the set of strings representing paths from p to q
passing through states labeled with k+1 or lower numbers consists of the following
two sets:
1. L(p, q, k) : The set of strings representing paths from p to q passing through
states labeled wiht k or lower numbers.
2. L(p, k+1, k)L(k+1, k+1, k)*L(k+1, q, k) : The set of strings going first from p to
k+1, then from k+1 to k+1 any number of times, then from k+1 to q, all without
passing through states labeled higher than k.
See the figure below for the illustration.
Lemma 2: L(p, q, 0) is regular.
Proof: L(p, q, 0) is the set of strings representing paths from p to q without passing
any states in between. Hence if p and q are different, then it consists of single
symbols representing arcs from p to q. If p = q, then is in it as well as the
strings representing any loops at p (they are all single symbols). Since the number
of symbols is finite and since any finite language is regular, L(p, q, 0) is regular.
Lemma 3: L(p, q, k) is regular for any states p and q and any natural number k.
Since the language accepted by a finite automaton is the union of L(q0, q, n) over
all accepting states q, where n is the number of states of the finite automaton, we
have the following converse of the part 1 of Kleene Theorem.
Let us denote by r(p, q, k) the regular expression for the set of strings L(p, q, k).
Then the language accepted by this NFA is r(1, 1, 3).
By Lemma 1, r(1, 1, 3) = r(1, 1, 2) + r(1, 3, 2)r(3, 3, 2)*r(3, 1, 2) .
r(1, 1, 2):
r(1, 1, 2) = r(1, 1, 1) + r(1, 2, 1)r(2, 2, 1)*r(2, 1, 1)
r(1, 1, 1) = r(1,1,0) + r(1,1,0)r(1,1,0)*r(1,1,0) = a* , since r(1,1,0) = a + .
r(1, 2, 1) = r(1,2,0) + r(1,1,0)r(1,1,0)*r(1,2,0) = a+ , since r(1,2,0) = a .
r(2, 2, 1) = r(2,2,0) + r(2,1,0)r(1,1,0)*r(1,2,0) = ba+ + , since r(2,2,0) = and
r(2,1,0) = b .
r(2, 1, 1) = r(2,1,0) + r(2,1,0)r(1,1,0)*r(1,1,0) = ba* .
r(1, 3, 2):
r(1, 3, 2) = r(1, 3, 1) + r(1, 2, 1)r(2, 2, 1)*r(2, 3, 1)
r(1, 3, 1) =
r(2, 3, 1) = a
r(3, 1, 2):
r(3, 1, 2) = r(3, 1, 1) + r(3, 2, 1)r(2, 2, 1)*r(2, 1, 1)
r(3, 1, 1) = r(3,1,0) + r(3,1,0)r(1,1,0)<sup*< sup="" style="color: rgb(0, 0, 0); font-
family: "Times New Roman"; font-size: medium; font-style: normal; font-variant-ligatures:
normal; font-variant-caps: normal; font-weight: 400; letter-spacing: normal; orphans: 2; text-
align: start; text-indent: 0px; text-transform: none; white-space: normal; widows: 2; word-
spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration-style: initial; text-decoration-
color: initial;">r(1,1,0) = ba*
Hence r(1, 1, 3) = a* + a+(b a+)*ba* + ( a+( ba+ )*a )( + ( ba+)+a )*( ba+ )*ba*.
This can be further simplified to (a + ab + abb)*, then to (a + ab)*. The detail is left as an
exercise though it would be quite challenging.
In this example there is only one accepting state. If there are more accepting states, then r(p,
q, n) must be found for each accepting state q, where p is the initial state and n is the number
of states in the given finite automaton, and all the r(p, q, n)'s must be added together to get
the regular expression for the language accepted by the automaton.