States, Transitions and Finite-State Transition System
Finite automata
were initially proposed as a simple model for the behavior of neurons. The concept of a finite
automaton appears to have arisen in the 1943 paper "A logical calculus of the ideas immanent in
nervous activity", by Warren McCulloch and Walter Pitts. In 1951 Kleene introduced regular
expressions to describe the behaviour of finite automata. He also proved the important theorem
saying that regular expressions exactly capture the behaviours of finite automata. In 1959, Dana
Scott and Michael Rabin introduced non-deterministic automata and showed the surprising theorem
that they are equivalent to deterministic automata. We will study these fundamental results. Since
those early years, the study of automata has continued to grow, showing that they are indeed a
fundamental idea in computing.
Let us first give some intuitive idea about a state of a system and state transitions before describing
finite automata.
Informally, a state of a system is an instantaneous description of that system which gives all relevant
information necessary to determine how the system can evolve from that point on.
Transitions are changes of states that can occur spontaneously or in response to inputs to the
states. Though transitions usually take time, we assume that state transitions are instantaneous
(which is an abstraction).
Some examples of state transition systems are: digital systems, vending machines, etc.
A system containing only a finite number of states and transitions among them is called a finite-state
transition system.
Finite-state transition systems can be modeled abstractly by a mathematical model called a finite
automaton.
We said that automata are a model of computation. That means that they are a simplified abstraction
of `the real thing'. So what gets abstracted away? One thing that disappears is any notion of
hardware or software. We merely deal with states and transitions between states. The distinction
between program and machine executing it disappears. One could say that an automaton is the
machine and the program. This makes automata relatively easy to implement in either hardware or
software. From the point of view of resource consumption, the essence of a finite automaton is that it
is a strictly finite model of computation. Everything in it is of a fixed, finite size and cannot be
modified in the course of the computation.
Informally, a DFA (Deterministic Finite State Automaton) is a simple machine that reads an input
string -- one symbol at a time -- and then, after the input has been completely read, decides whether
to accept or reject the input. As the symbols are read from the tape, the automaton can change its
state, to reflect how it reacts to what it has seen so far.
An automaton processes a string on the tape by repeating the following actions until the tape head
has traversed the entire string:
1. The tape head reads the current tape cell and sends the symbol s found there to the control.
Then the tape head moves to the next cell.
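2. The control takes s and the current state and consults the state transition function to get the
next state, which becomes the new current state.
Once the entire string has been processed, the state in which the automaton ends up is examined. If
it is an accept state, the input string is accepted; otherwise, the string is rejected. Summarizing all
the above we can formulate the following formal definition:
Deterministic Finite State Automaton : A DFA is a 5-tuple $M = (Q, \Sigma, \delta, q_0, F)$, where $Q$ is a finite set of states, $\Sigma$ is a finite set of input symbols (the alphabet), $\delta : Q \times \Sigma \to Q$ is the transition function, $q_0 \in Q$ is the start state, and $F \subseteq Q$ is the set of accept (final) states.
Acceptance of Strings :
A DFA $M$ accepts a string $w = a_1 a_2 \cdots a_n$ if there is a sequence of states $r_0, r_1, \ldots, r_n$ in $Q$ such that
1. $r_0 = q_0$,
2. $\delta(r_i, a_{i+1}) = r_{i+1}$ for all $i = 0, 1, \ldots, n-1$,
3. $r_n \in F$.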
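Language Accepted or Recognized by a DFA :
The language accepted or recognized by a DFA M is the set of all strings accepted by M, and is denoted by
$L(M)$, i.e. $L(M) = \{ w \in \Sigma^* \mid M \text{ accepts } w \}$.
The notion of acceptance can also be made more precise by extending the transition function $\delta$ to a function $\hat{\delta} : Q \times \Sigma^* \to Q$.
That is, $\hat{\delta}(q, w)$ is the state the automaton reaches when it starts from the state q and finishes processing the string w.
Formally, we can give an inductive definition as follows:
$\hat{\delta}(q, \varepsilon) = q$,
$\hat{\delta}(q, wa) = \delta(\hat{\delta}(q, w), a)$ for all $w \in \Sigma^*$ and $a \in \Sigma$.
The language of the DFA M is the set of strings that can take the start state to one of the accepting states, i.e.
$L(M) = \{ w \in \Sigma^* \mid M \text{ accepts } w \}$
$= \{ w \in \Sigma^* \mid \hat{\delta}(q_0, w) \in F \}$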
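The inductive definition of the extended transition function translates directly into a loop over the input. The following is only a minimal sketch in Python: the dictionary encoding of the transition function, the state names q0 and q1 and the function names are assumptions made for illustration, and the example machine is a hypothetical DFA for the strings over {0, 1} that contain at least one 1.

```python
from typing import Dict, Set, Tuple

State = str
Delta = Dict[Tuple[State, str], State]   # (state, symbol) -> next state


def delta_hat(delta: Delta, q: State, w: str) -> State:
    """Extended transition function: the state reached from q after reading w."""
    for a in w:
        # inductive rule: delta_hat(q, xa) = delta(delta_hat(q, x), a)
        q = delta[(q, a)]
    # basis: for the empty string the loop never runs and q is returned unchanged
    return q


def accepts(delta: Delta, start: State, finals: Set[State], w: str) -> bool:
    """w is in L(M) iff delta_hat(start, w) is an accepting state."""
    return delta_hat(delta, start, w) in finals


# Hypothetical example DFA: strings over {0, 1} with at least one 1.
DELTA: Delta = {
    ("q0", "0"): "q0", ("q0", "1"): "q1",
    ("q1", "0"): "q1", ("q1", "1"): "q1",
}

if __name__ == "__main__":
    for w in ["", "000", "0010", "1"]:
        print(repr(w), accepts(DELTA, "q0", {"q1"}, w))
```

Because the transition function is total, the lookup never fails for strings over the alphabet; a partial table would need an explicit dead state.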
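Example 1 :
$M = (\{q_0, q_1\}, \{0, 1\}, \delta, q_0, \{q_1\})$ with $\delta(q_0, 0) = q_0$, $\delta(q_0, 1) = q_1$, $\delta(q_1, 0) = q_1$ and $\delta(q_1, 1) = q_1$.
It is a formal description of a DFA, but it is hard to comprehend. For example, it is not immediately obvious that the language of the DFA is the set of all strings over
{ 0, 1} having at least one 1.
We can describe the same DFA by a transition table or a state transition diagram as follows:
Transition Table :
State     0      1
-> q0     q0     q1
*  q1     q1     q1
(-> marks the start state and * marks the accepting state.)
Explanation : We cannot reach the final state without a 1 in the input string. There can be any number of 0's at the beginning
(the self-loop at $q_0$ on label 0 indicates it). Similarly, there can be any number of 0's and 1's in any order at the end of the
string.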
Transition table :
It is basically a tabular representation of the transition function that takes two arguments (a state and a symbol) and
returns a value (the "next state").
A state transition diagram or simply a transition diagram is a directed graph which can be constructed as follows:
1. For each state in $Q$ there is a node.
2. There is a directed edge from node q to node p labeled a iff $\delta(q, a) = p$. (If there are several input symbols
that cause a transition from q to p, the edge is labeled by the list of these symbols.)
3. There is an arrow with no source into the start state.
4. Accepting states are indicated by a double circle.
Here is an informal description of how a DFA operates. An input to a DFA can be any string $w \in \Sigma^*$. Put a pointer on the
start state $q_0$. Read the input string w from left to right, one symbol at a time, moving the pointer according to the
transition function $\delta$. If the next symbol of w is a and the pointer is on state p, move the pointer to $\delta(p, a)$. When
the end of the input string w is encountered, the pointer is on some state r. The string is said to be accepted by the
DFA if $r \in F$ and rejected if $r \notin F$. Note that there is no formal mechanism for moving the pointer.
$\varepsilon$-transitions do not increase the power of an NFA. That is, for any $\varepsilon$-NFA (NFA with $\varepsilon$-transitions), we can always
construct an equivalent NFA without $\varepsilon$-transitions. The equivalent NFA must keep track of where the $\varepsilon$-NFA can go at
every step during the computation. This can be done by adding extra transitions for the removal of every $\varepsilon$-transition
from the $\varepsilon$-NFA, as follows.
If we remove the $\varepsilon$-transition from state p to state q from the $\varepsilon$-NFA, then we need to add moves from state p, on an input symbol a, to all the states
which are reachable from state q (in the $\varepsilon$-NFA) on the same input symbol a. This allows the
modified NFA to move from state p to all the states that the $\varepsilon$-NFA could reach on the
same input symbol. This process is stated formally in the following theorem.
Theorem : If L is accepted by an $\varepsilon$-NFA N, then there is some equivalent NFA N' without $\varepsilon$-transitions accepting the
same language L.
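Proof:
Let $N = (Q, \Sigma, \delta, q_0, F)$ be the given $\varepsilon$-NFA, and let $\hat{\delta}$ denote its extended transition function, which takes $\varepsilon$-transitions into account.
We construct $N' = (Q, \Sigma, \delta', q_0, F')$, where
$\delta'(q, a) = \hat{\delta}(q, a)$ for all $q \in Q$ and $a \in \Sigma$, and
$F' = F \cup \{q_0\}$ if $\varepsilon\text{-closure}(q_0)$ contains a state of $F$, and $F' = F$ otherwise,
i.e. a move of N' on a simulates a move of N on a together with the $\varepsilon$-transitions taken before and after it.
We show by induction on $|w|$ that $\hat{\delta'}(q_0, w) = \hat{\delta}(q_0, w)$ for every string w with $|w| \ge 1$.
Basis : $|w| = 1$, then $w = a$ for some $a \in \Sigma$, and $\hat{\delta'}(q_0, a) = \delta'(q_0, a)$.
But $\delta'(q_0, a) = \hat{\delta}(q_0, a)$ by definition of $\delta'$.
Induction : Let $w = xa$ with $|x| \ge 1$ and $a \in \Sigma$. Then
$\hat{\delta'}(q_0, xa) = \delta'(\hat{\delta'}(q_0, x), a)$    (by definition of the extension of $\delta'$)
$= \delta'(\hat{\delta}(q_0, x), a)$    (by the induction hypothesis)
$= \bigcup_{p \in P} \delta'(p, a)$    (assuming that $\hat{\delta}(q_0, x) = P$)
$= \bigcup_{p \in P} \hat{\delta}(p, a)$    (by definition of $\delta'$)
$= \hat{\delta}(q_0, xa)$    (since $P = \hat{\delta}(q_0, x)$).
If $\varepsilon\text{-closure}(q_0) \cap F = \emptyset$ (and thus $q_0$ is not in F'), then $F' = F$, and any w with $|w| \ge 1$ leads to an accepting state in N' iff it leads to an
accepting state in N (by the construction of N' and N).
Also, if $w = \varepsilon$, then w is accepted by N' iff $q_0 \in F'$; thus w is accepted by N' iff w is accepted by N (iff $\varepsilon\text{-closure}(q_0) \cap F \ne \emptyset$).
Let $\varepsilon\text{-closure}(q_0) \cap F \ne \emptyset$, so that $q_0 \in F'$. If w leads to $q_0$ in N, then $w \in L(N)$ (since N can add $\varepsilon$-transitions to get to an accept state). So there
is no harm in making $q_0$ an accept state in N'.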
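[Example: transition table $\delta$ of an $\varepsilon$-NFA and transition table $\delta'$ for the equivalent NFA without $\varepsilon$-moves, both with input columns 0 and 1; the table entries are missing.]
Since a final state can be reached from $q_0$ on $\varepsilon$-transitions alone, the start state $q_0$ must be a final state in the equivalent NFA.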
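$\varepsilon$-closures:
The concept used in the above construction can be made more formal by defining the $\varepsilon$-closure of a state (or a set
of states). The idea of $\varepsilon$-closure is that, when moving from a state p to a state q (or from a set of states $S_i$ to a set of
states $S_j$) on an input $a \in \Sigma$, we need to take account of all $\varepsilon$-moves that could be made after the transition. Formally,
for a given state q,
$\varepsilon\text{-closure}(q) = \{ p \in Q \mid p \text{ can be reached from } q \text{ by zero or more } \varepsilon\text{-transitions} \}$,
and for a set of states S, $\varepsilon\text{-closure}(S) = \bigcup_{q \in S} \varepsilon\text{-closure}(q)$.
So, in the construction of an equivalent NFA N' without $\varepsilon$-transitions from any NFA with $\varepsilon$-moves, the first rule can now be
written as
$\delta'(q, a) = \varepsilon\text{-closure}(\delta(\varepsilon\text{-closure}(q), a))$,
where $\delta(S, a)$ for a set of states S denotes $\bigcup_{p \in S} \delta(p, a)$.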
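The removal of $\varepsilon$-transitions can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a definitive implementation: the empty string is assumed to label an $\varepsilon$-move, the transition function is assumed to be a dictionary from (state, symbol) pairs to sets of states, and the example machine (a hypothetical $\varepsilon$-NFA for 0*1*2*) and all function names are made up for the example.

```python
EPS = ""  # assumption: the empty string labels an epsilon-move


def eps_closure(delta, states):
    """All states reachable from `states` by zero or more epsilon-moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure


def move(delta, states, a):
    """Union of delta(p, a) over all p in `states`."""
    result = set()
    for p in states:
        result |= delta.get((p, a), set())
    return result


def remove_eps(states, alphabet, delta, start, finals):
    """Build delta' and F' of an equivalent NFA without epsilon-moves."""
    new_delta = {}
    for q in states:
        for a in alphabet:
            # delta'(q, a) = closure(delta(closure(q), a))
            before = eps_closure(delta, {q})
            new_delta[(q, a)] = eps_closure(delta, move(delta, before, a))
    new_finals = set(finals)
    if eps_closure(delta, {start}) & finals:
        new_finals.add(start)  # the start state becomes accepting
    return new_delta, new_finals


def accepts(delta, start, finals, w):
    """Acceptance test for an NFA that has no epsilon-moves."""
    current = {start}
    for a in w:
        current = move(delta, current, a)
    return bool(current & finals)


# Hypothetical epsilon-NFA for 0*1*2*: q0 -eps-> q1 -eps-> q2, q2 accepting.
DELTA = {
    ("q0", "0"): {"q0"}, ("q0", EPS): {"q1"},
    ("q1", "1"): {"q1"}, ("q1", EPS): {"q2"},
    ("q2", "2"): {"q2"},
}
NEW_DELTA, NEW_FINALS = remove_eps({"q0", "q1", "q2"}, "012", DELTA, "q0", {"q2"})
```

In this example the start state q0 ends up in the new set of accepting states, because q2 is reachable from q0 on $\varepsilon$-moves alone.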
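It is worth noting that a DFA is a special type of NFA and hence the class of languages accepted by DFAs is a subset
of the class of languages accepted by NFAs. Surprisingly, these two classes are in fact equal. NFAs appear to
have more power than DFAs because of the generality they enjoy in terms of $\varepsilon$-transitions and multiple next states. But they
are no more powerful than DFAs in terms of the languages they accept.
Proof: A DFA is just a special type of an NFA. In a DFA, the transition function is defined from $Q \times \Sigma$ to $Q$, whereas in an NFA it is defined from $Q \times \Sigma$ to $2^Q$. Hence, given a DFA $D = (Q, \Sigma, \delta_D, q_0, F)$, we can construct an equivalent NFA $N = (Q, \Sigma, \delta_N, q_0, F)$
as follows:
$\delta_N(q, a) = \{ \delta_D(q, a) \}$ for all $q \in Q$ and $a \in \Sigma$,
i.e. every move of N leads to exactly one state, the one D would move to.
If $w = a_1 a_2 \cdots a_n \in L(D)$ and $q_0, q_1, \ldots, q_n$ with $q_n \in F$ is the sequence of states D passes through on w,
then it is clear from the above construction of N that there is a sequence of states (in N) $q_0, q_1, \ldots, q_n$ such that $q_{i+1} \in \delta_N(q_i, a_{i+1})$ and $q_n \in F$, so $w \in L(N)$. The converse holds in the same way, hence $L(D) = L(N)$.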
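There are $2^n$ possible subsets of states for any NFA with n states. Every subset corresponds to one of the possibilities
that the equivalent DFA must keep track of. Thus, the equivalent DFA will have $2^n$ states.
The formal construction of an equivalent DFA for any NFA is given below. We first consider an NFA without
$\varepsilon$-transitions and then we incorporate the effects of $\varepsilon$-transitions later.
Let $N = (Q_N, \Sigma, \delta_N, q_0, F_N)$ be an NFA without $\varepsilon$-transitions. We construct a DFA $D = (Q_D, \Sigma, \delta_D, \{q_0\}, F_D)$
as follows:
$Q_D = 2^{Q_N}$, i.e. the states of D are the subsets of $Q_N$;
$F_D = \{ S \subseteq Q_N \mid S \cap F_N \ne \emptyset \}$, i.e. S is accepting in D iff it contains an accepting state of N;
$\delta_D(S, a) = \bigcup_{p \in S} \delta_N(p, a)$, where $S \subseteq Q_N$ and $a \in \Sigma$.
That is, $\delta_D(S, a)$ is the set of all states N can move to on input a from some state of S.
To show that this construction works we need to show that L(D) = L(N), i.e.
$\hat{\delta}_D(\{q_0\}, w) \in F_D$ iff $\hat{\delta}_N(q_0, w) \cap F_N \ne \emptyset$.
Or, equivalently, $\hat{\delta}_D(\{q_0\}, w) \cap F_N \ne \emptyset$ iff $\hat{\delta}_N(q_0, w) \cap F_N \ne \emptyset$.
We will prove the following, which is a stronger statement than required:
$\hat{\delta}_D(\{q_0\}, w) = \hat{\delta}_N(q_0, w)$ for every $w \in \Sigma^*$.
The proof is by induction on $|w|$.
Basis : $|w| = 0$, i.e. $w = \varepsilon$. So, $\hat{\delta}_D(\{q_0\}, \varepsilon) = \{q_0\} = \hat{\delta}_N(q_0, \varepsilon)$ by definition.
Induction hypothesis : Assume inductively that the statement holds for all strings of length less than or equal to n.
Inductive step : Let $w = xa$ with $|x| = n$ and $a \in \Sigma$, so that $|w| = n + 1$.
Now,
$\hat{\delta}_D(\{q_0\}, xa) = \delta_D(\hat{\delta}_D(\{q_0\}, x), a)$    (by definition of $\hat{\delta}_D$)
$= \delta_D(\hat{\delta}_N(q_0, x), a)$    (by the induction hypothesis)
$= \bigcup_{p \in \hat{\delta}_N(q_0, x)} \delta_N(p, a)$    (by definition of $\delta_D$)
$= \hat{\delta}_N(q_0, xa)$    (by definition of $\hat{\delta}_N$).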
Now, given any NFA with $\varepsilon$-transitions, we can first construct an equivalent NFA without $\varepsilon$-transitions and then use the
above construction process to construct an equivalent DFA, thus proving the equivalence of NFAs and DFAs.
It is also possible to construct an equivalent DFA directly from any given NFA with $\varepsilon$-transitions by integrating the
concept of $\varepsilon$-closure in the above construction.
In the equivalent DFA, at every step, we need to modify the transition function $\delta_D$ to keep track of all the states
where the NFA can go on $\varepsilon$-transitions. This is done by replacing $\delta_N(p, a)$ by $\varepsilon\text{-closure}(\delta_N(p, a))$, i.e. we now compute
$\delta_D(S, a) = \bigcup_{p \in S} \varepsilon\text{-closure}(\delta_N(p, a))$.
Besides this, the initial state of the DFA D has to be modified to keep track of all the states that can be reached from
the initial state of the NFA on zero or more $\varepsilon$-transitions. This can be done by changing the initial state $\{q_0\}$ to $\varepsilon\text{-closure}(q_0)$.
It is clear that, at every step in the processing of an input string by the DFA D, it enters a state that corresponds to the
subset of states that the NFA N could be in at that particular point. This has been proved in the construction of an
equivalent NFA for any $\varepsilon$-NFA.
If the number of states in the NFA is n, then there are $2^n$ states in the DFA. That is, each state in the DFA is a subset
of the states of the NFA.
But it is important to note that most of these states are usually inaccessible from the start state and hence can be
removed from the DFA without changing the accepted language. Thus, in fact, the number of states in the equivalent
DFA is usually much less than $2^n$.
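[Transition table of the example DFA, with input columns 0 and 1; the table entries are missing.]
Note that some of these states are not accessible from the start state and hence can be removed. This gives us
the following simplified DFA with only 3 states.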
It is interesting to note that we can avoid encountering all those inaccessible or unnecessary states in the equivalent
DFA by performing the following two steps inductively.
1. If $q_0$ is the start state of the NFA, then make $\varepsilon\text{-closure}(q_0)$ the start state of the equivalent DFA. This is
definitely an accessible state, and initially the only one.
2. If we have already computed a set S of states which is accessible, then for every $a \in \Sigma$ compute $\delta_D(S, a)$, because
these sets of states will also be accessible.
Following these steps in the above example, we get a transition table containing only the accessible states.
[Transition table with input columns 0 and 1; the table entries are missing.]
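The two steps above amount to a worklist algorithm that only ever creates accessible subsets. The sketch below illustrates this in Python under assumptions of our own: the empty string labels an $\varepsilon$-move, transitions are stored in a dictionary from (state, symbol) to sets of states, and the example is a hypothetical three-state NFA for the strings over {0, 1} ending in 01, not the example of the notes.

```python
from collections import deque

EPS = ""  # assumption: the empty string labels an epsilon-move


def eps_closure(delta, states):
    """All states reachable from `states` by zero or more epsilon-moves."""
    closure, stack = set(states), list(states)
    while stack:
        q = stack.pop()
        for p in delta.get((q, EPS), set()):
            if p not in closure:
                closure.add(p)
                stack.append(p)
    return closure


def subset_construction(alphabet, delta, start, finals):
    """Build only the accessible part of the equivalent DFA."""
    start_set = frozenset(eps_closure(delta, {start}))   # step 1
    dfa_delta, seen, queue = {}, {start_set}, deque([start_set])
    while queue:                                          # step 2, repeated
        S = queue.popleft()
        for a in alphabet:
            moved = set()
            for p in S:
                moved |= delta.get((p, a), set())
            T = frozenset(eps_closure(delta, moved))
            dfa_delta[(S, a)] = T
            if T not in seen:          # a newly discovered accessible subset
                seen.add(T)
                queue.append(T)
    dfa_finals = {S for S in seen if S & finals}
    return start_set, dfa_delta, dfa_finals, seen


def dfa_accepts(start_set, dfa_delta, dfa_finals, w):
    S = start_set
    for a in w:
        S = dfa_delta[(S, a)]
    return S in dfa_finals


# Hypothetical NFA for strings over {0, 1} that end in 01 (q2 accepting).
NFA_DELTA = {
    ("q0", "0"): {"q0", "q1"}, ("q0", "1"): {"q0"},
    ("q1", "1"): {"q2"},
}
START, DFA_DELTA, DFA_FINALS, DFA_STATES = subset_construction(
    "01", NFA_DELTA, "q0", {"q2"})
```

For this three-state NFA there are 8 subsets in total, but the worklist only ever visits {q0}, {q0, q1} and {q0, q2}.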
It is easier to construct and comprehend an NFA than a DFA for a given regular language. The concept
of NFA can also be used in proving many theorems and results. Hence, it plays an important role in
this subject.
In the context of FA, nondeterminism can be incorporated naturally. That is, an NFA is defined in the
same way as the DFA but with the following two exceptions:
$\varepsilon$-transitions :
In an $\varepsilon$-transition, the tape head doesn't do anything: it does not read and it does not move. However,
the state of the automaton can be changed, that is, it can go to zero, one or more states. This is written
formally as $\delta : Q \times (\Sigma \cup \{\varepsilon\}) \to 2^Q$.
Acceptance :
Informally, an NFA is said to accept its input w if it is possible to start in some start state and
process w, moving according to the transition rules and making choices along the way whenever
the next state is not uniquely defined, such that when w is completely processed (i.e. the end of w is
reached), the automaton is in an accept state. There may be several possible paths through the
automaton in response to an input w, since the start state is not determined and there are choices
along the way because of multiple next states. Some of these paths may lead to accept states while
others may not. The automaton is said to accept w if at least one computation path on input w
starting from at least one start state leads to an accept state; otherwise, the automaton rejects input
w. Alternatively, we can say that w is accepted iff there exists a path with label w from some start
state to some accept state. Since there is no mechanism for determining which state to start in or
which of the possible next moves to take (including the $\varepsilon$-transitions) in response to an input
symbol, we can think of the automaton as having some "guessing" power to choose the correct one
in case the input is accepted.
Example 1 : Consider the language L = { $w \in \{0, 1\}^*$ | the 3rd symbol of w from the right is 1 }. The
following four-state automaton accepts L: it has states $q_0, q_1, q_2, q_3$, start state $q_0$ and accept state $q_3$, with transitions $\delta(q_0, 0) = \{q_0\}$, $\delta(q_0, 1) = \{q_0, q_1\}$, $\delta(q_1, 0) = \delta(q_1, 1) = \{q_2\}$ and $\delta(q_2, 0) = \delta(q_2, 1) = \{q_3\}$.
The machine is not deterministic, since there are two transitions from state $q_0$ on input 1 and no transition
(zero transitions) from $q_3$ on either 0 or 1.
For any string w whose 3rd symbol from the right is a 1, there exists a sequence of legal transitions
leading from the start state $q_0$ to the accept state $q_3$. But for any string w whose 3rd symbol from the
right is 0, there is no possible sequence of legal transitions leading from $q_0$ to $q_3$. Hence the machine
accepts L. How does it accept a string $w \in L$?
The machine starts at $q_0$ and remains in the state $q_0$ on any input until the 3rd symbol from the right is
encountered. (Of course, w must satisfy $|w| \ge 3$.) At this point, if the symbol is 1, it goes to the
state $q_1$ and then enters $q_2$ and $q_3$ in the next two steps on any input 0 or 1. But if the 3rd symbol from
the right is 0, then it cannot make this move at that point, because no transition from $q_0$ to $q_1$ is defined on 0.
To enter the state $q_1$ from $q_0$, the machine needs the input 1. If it uses a 1 that occurs at position 4
or more from the right (instead of the 3rd), then it can enter $q_1$ from $q_0$ on that input and finally will
enter the accept state $q_3$, but at that point some of the input symbols are left, i.e. the input is not
exhausted, and hence the string is not accepted along that path.
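The Extended Transition function, $\hat{\delta}$ :
The extended transition function $\hat{\delta} : Q \times \Sigma^* \to 2^Q$ of an NFA is defined inductively as follows.
1. $\hat{\delta}(q, \varepsilon) = \{q\}$, that is, without reading any input symbol, an NFA does not change state.
2. Let $w = xa$ for some $x \in \Sigma^*$ and $a \in \Sigma$. Also assume that
$\hat{\delta}(q, x) = \{p_1, p_2, \ldots, p_k\}$. Then $\hat{\delta}(q, w) = \bigcup_{i=1}^{k} \delta(p_i, a)$.
That is, $\hat{\delta}(q, w)$ can be computed by first computing $\hat{\delta}(q, x)$, and by then following any transition
from any of these states that is labelled a.
From the discussion of the acceptance by an NFA, we can give the formal definition of a language
accepted by an NFA as follows :
If $N = (Q, \Sigma, \delta, q_0, F)$ is an NFA, then the language accepted by N is written as L(N) and is
given by $L(N) = \{ w \in \Sigma^* \mid \hat{\delta}(q_0, w) \cap F \ne \emptyset \}$.
That is, L(N) is the set of all strings w in $\Sigma^*$ such that $\hat{\delta}(q_0, w)$ contains at least one accepting
state.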
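The set-valued extended transition function can be computed by carrying the whole set of currently possible states through the input. The following Python sketch is illustrative only: the dictionary encoding and the function names are assumptions, and the machine encoded here is the four-state NFA for "the 3rd symbol from the right is 1", with states written q0 to q3.

```python
def delta_hat(delta, q, w):
    """Set of states an NFA (without epsilon-moves) can be in after reading w from q."""
    current = {q}                      # basis: delta_hat(q, empty string) = {q}
    for a in w:
        following = set()
        for p in current:              # follow every a-labelled move from every state
            following |= delta.get((p, a), set())
        current = following
    return current


def accepts(delta, start, finals, w):
    """w is in L(N) iff delta_hat(start, w) contains an accepting state."""
    return bool(delta_hat(delta, start, w) & finals)


# NFA for { w in {0,1}* : the 3rd symbol from the right is 1 }.
DELTA = {
    ("q0", "0"): {"q0"}, ("q0", "1"): {"q0", "q1"},
    ("q1", "0"): {"q2"}, ("q1", "1"): {"q2"},
    ("q2", "0"): {"q3"}, ("q2", "1"): {"q3"},
}

if __name__ == "__main__":
    for w in ["100", "0100", "0010", "11"]:
        print(w, sorted(delta_hat(DELTA, "q0", w)), accepts(DELTA, "q0", {"q3"}, w))
```

Missing entries of the dictionary stand for the empty set of next states, which is how a computation path that gets stuck simply disappears from the current set.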
Regular Expressions (RE)
REs: Formal Definition
We construct REs from primitive constituents (basic elements) by repeatedly applying certain recursive rules as given
below.
Definition : Let $\Sigma$ be an alphabet. The regular expressions over $\Sigma$ are defined recursively as follows.
1. Basis :
i) $\emptyset$ is a RE
ii) $\varepsilon$ is a RE
iii) for each $a \in \Sigma$, a is a RE.
2. Recursive Step : If $r_1$ and $r_2$ are REs over $\Sigma$, then so are
i) $r_1 + r_2$
ii) $r_1 r_2$
iii) $r_1^*$
iv) $(r_1)$
3. Closure : r is a RE over $\Sigma$ only if it can be obtained from the basis elements (primitive REs) by a finite number of
applications of the recursive step (given in 2).
Example : Let $\Sigma$ = { 0, 1, 2 }. Then (0+21)*(1+$\emptyset$) is a RE, because we can construct this expression
by applying the above rules as given in the following steps.
Step   RE Constructed              Rule Used
1      1                           Rule 1(iii)
2      $\emptyset$                 Rule 1(i)
3      1+$\emptyset$               Rule 2(i) & results of steps 1, 2
4      (1+$\emptyset$)             Rule 2(iv) & step 3
5      2                           1(iii)
6      1                           1(iii)
7      21                          2(ii), 5, 6
8      0                           1(iii)
9      0+21                        2(i), 7, 8
10     (0+21)                      2(iv), 9
11     (0+21)*                     2(iii), 10
12     (0+21)*(1+$\emptyset$)      2(ii), 4, 11
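Language described by REs : Each RE describes a language (or a language is associated with every
RE). We will see later that REs are used to describe exactly the regular languages.
Notation : If r is a RE over some alphabet $\Sigma$, then L(r) is the language associated with r. We can
define the language L(r) associated with (or described by) a RE r as follows.
1. $L(\emptyset) = \emptyset$, $L(\varepsilon) = \{\varepsilon\}$, and $L(a) = \{a\}$ for each $a \in \Sigma$.
2. $L(r_1 + r_2) = L(r_1) \cup L(r_2)$, $L(r_1 r_2) = L(r_1) L(r_2)$, $L(r_1^*) = (L(r_1))^*$, and $L((r_1)) = L(r_1)$.
Example : For the RE 0*(0+1),
L(0*(0+1)) = L(0*) L(0+1)
= L(0)* (L(0) $\cup$ L(1))
= { $\varepsilon$, 0, 00, 000, ... } {0, 1}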
Consider the RE ab + c. The language described by the RE can be thought of as either L(a)L(b+c) or
L(ab) $\cup$ L(c), as provided by the rules (of languages described by REs) given already. But these two
represent two different languages, leading to ambiguity. To remove this ambiguity we can either
1) Use fully parenthesized expressions, or
2) Use a set of precedence rules to evaluate the operators of REs in some order, as in other algebras
used in mathematics.
i) The star operator precedes concatenation, and concatenation precedes the union (+) operator.
ii) It is also important to note that the concatenation and union (+) operators are associative, and the union
operation is commutative.
Using these precedence rules, we find that the RE ab+c represents the language L(ab) $\cup$ L(c), i.e. it
should be grouped as ((ab)+c).
We can, of course, change the order of precedence by using parentheses. For example, the
language represented by the RE a(b+c) is L(a)L(b+c).
Example : The RE ab*+b is grouped as ((a(b*))+b), which describes the language L(a)(L(b))* $\cup$
L(b).
Example : It is easy to see that the RE (0+1)*(0+11) represents the language of all strings over {0,1}
which end with either 0 or 11.
Example : The regular expression r = (00)*(11)*1 denotes the set of all strings with an even number
of 0's followed by an odd number of 1's, i.e. $L(r) = \{ 0^{2n} 1^{2m+1} \mid n \ge 0, m \ge 0 \}$.
Note : The notation $r^+$ is used to represent the RE rr*. Similarly, $r^2$ represents the RE rr, $r^3$
denotes $r^2 r$, and so on.
Exercise : Give a RE r over {0,1} s.t. L(r) = { $w \in \{0,1\}^*$ | w has at least one pair of consecutive 1's }.
Solution : Every string in L(r) must contain 11 somewhere, but what comes before and what comes
after is completely arbitrary. Considering these observations we can write the RE as
(0+1)*11(0+1)*.
Example : Consider the RE 0*10*10*. It is not difficult to see that this RE describes the set of strings
over {0,1} that contain exactly two 1's. The presence of two 1's in the RE and any number of 0's before,
between and after the 1's ensures it.
Example : Consider the language of strings over {0,1} containing two or more 1's.
Solution : There must be at least two 1's in the RE somewhere, and what comes before, between,
and after is completely arbitrary. Hence we can write the RE as (0+1)*1(0+1)*1(0+1)*. But the following
two REs also represent the same language, each ensuring the presence of at least two 1's somewhere in
the string:
i) 0*10*1(0+1)*
ii) (0+1)*10*10*
Example : Consider the language of strings over {0,1} containing no pair of consecutive 1's.
Solution : Though it looks similar to an earlier example, it is harder to construct. We observe that,
whenever a 1 occurs, it must be immediately followed by a 0. This substring may be preceded and
followed by any number of 0's. So the final RE must be a repetition of strings of the form 00...0100...00,
i.e. 0*100*. So it looks like the RE is (0*100*)*. But in this case the strings ending in 1 or consisting
of all 0's are not accounted for. Taking these observations into consideration, the final RE is r =
(0*100*)*(1+$\varepsilon$)+0*(1+$\varepsilon$).
Alternative Solution :
The language can be viewed as repetitions of the strings 0 and 10, optionally followed by a single 1. Hence we get the RE as r =
(0+10)*(1+$\varepsilon$). This is a shorter expression but represents the same language.
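Claims that two REs describe the same language can be spot-checked by brute force over all short strings. The sketch below does this with Python's re module for the two REs for "no pair of consecutive 1's". It is a check on strings up to a chosen length, not a proof, and the translation is an assumption of ours: the union operator + of the notes is written | in Python, and (1+$\varepsilon$) is written as the optional 1?.

```python
import re
from itertools import product

# (0*100*)*(1+eps) + 0*(1+eps), written in Python's regex syntax
R1 = re.compile(r"(0*100*)*1?|0*1?")
# (0+10)*(1+eps), written in Python's regex syntax
R2 = re.compile(r"(0|10)*1?")


def all_strings(max_len, alphabet="01"):
    """Every string over the alphabet of length 0 .. max_len."""
    for n in range(max_len + 1):
        for symbols in product(alphabet, repeat=n):
            yield "".join(symbols)


def agree(max_len=10):
    """True if both REs match exactly the strings without the substring 11."""
    for w in all_strings(max_len):
        expected = "11" not in w
        if bool(R1.fullmatch(w)) != expected or bool(R2.fullmatch(w)) != expected:
            return False
    return True


if __name__ == "__main__":
    print(agree())
```

fullmatch is used rather than match or search because a RE in the sense of the notes must describe the whole string, not just a prefix or a substring of it.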
Recall that a language that is accepted by some FA is known as a Regular language. The two
concepts, REs and Regular languages, are essentially the same, i.e. for every regular language there is
a RE that describes it, and for every RE there is a Regular Language described by it. This fact is rather
surprising, because the RE approach to describing a language is fundamentally different from the FA
approach. But REs and FAs are equivalent in their descriptive power. We can put this fact in the form
of the following Theorem.
Theorem : A language is regular iff some RE describes it.
This Theorem has two directions, which are stated and proved below as separate lemmas.
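RE to FA :
Lemma : If a language is described by a RE, then it is regular, i.e. if L = L(r) for some RE r, then there is an FA M such that L = L(M).
Proof : To prove the lemma, we apply structural induction on the expression r. First, we show
how to construct FAs for the basis elements: $\emptyset$, $\varepsilon$ and a for any $a \in \Sigma$. Then we show how to
combine these finite automata into more complex automata that accept the union,
concatenation and Kleene closure of the languages accepted by the original smaller automata.
Use of NFAs is helpful in this case, i.e. we construct NFAs for all REs, and these are
represented by transition diagrams only.
Basis :
Case (i) : $r = \emptyset$. Then $L(r) = \emptyset$, and an NFA N with a start state, a separate accept state and no transitions recognizes L(r).
Case (ii) : $r = \varepsilon$. Then $L(r) = \{\varepsilon\}$, and the following NFA N recognizes L(r): $N = (\{q\}, \Sigma, \delta, q, \{q\})$
where $\delta(q, a) = \emptyset$ for all $a \in \Sigma$.
Since the start state is also the accept state, and there is no transition defined, it will accept the
only string $\varepsilon$ and nothing else.
Case (iii) : r = a for some $a \in \Sigma$. Then L(r) = {a}, and the following NFA N accepts L(r): it has a start state, a separate accept state, and a single transition from the start state to the accept state on input a.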
Induction :
Assume that the statement of the lemma is true for REs $r_1$ and $r_2$. Hence we can assume that we have
automata $M_1$ and $M_2$ that accept the languages denoted by the REs $r_1$ and $r_2$, respectively, i.e.
$L(M_1) = L(r_1)$ and $L(M_2) = L(r_2)$.
Each has an initial state and a final state. There are four cases to consider.
Case (i) : Let $r = r_1 + r_2$. We construct an automaton M from $M_1$ and $M_2$ as follows.
Create a new initial state $q_0$ and give $\varepsilon$-transitions from it to the initial states of $M_1$ and $M_2$.
Create a final state $q_f$ and give $\varepsilon$-transitions to it from the two final states of $M_1$ and $M_2$. $q_f$ is the
only final state of M, and the final states of $M_1$ and $M_2$ will be ordinary states in M.
All the states of $M_1$ and $M_2$ are also states of M.
All the moves of $M_1$ and $M_2$ are also moves of M. [ Formal Construction]
We show that $L(M) = L(M_1) \cup L(M_2)$ by following the transitions of M.
Let $w \in L(M)$. M starts at its initial state $q_0$ and enters the start state of either $M_1$ or $M_2$ following an $\varepsilon$-transition, i.e.
without consuming any input. WLOG, assume that it enters the start state of $M_1$. From this point
onward it has to follow only the transitions of $M_1$ to enter the final state of $M_1$, because this is the
only way to enter the final state $q_f$ of M by following the $\varepsilon$-transition (which is the last transition, and no
input is taken at that transition). Hence the whole input w is consumed while traversing from the start
state of $M_1$ to the final state of $M_1$. Therefore $M_1$ must accept w.
Conversely, say $w \in L(M_1)$ or $w \in L(M_2)$.
WLOG, say $w \in L(M_1)$.
Therefore when $M_1$ processes the string w, it starts at its initial state and enters its final state when
w is consumed totally, by following its transitions. Then M also accepts w: starting at state $q_0$ and
taking an $\varepsilon$-transition, it enters the start state of $M_1$, follows the moves of $M_1$ to enter the final state of $M_1$
consuming the input w, and then takes an $\varepsilon$-transition to $q_f$. Hence proved.
Case (iii) : Let $r = r_1^*$. In the automaton M constructed from $M_1$:
2. All the states of $M_1$ are also the states of M. M has 2 more states than $M_1$, namely
$q_0$ and $q_f$.
3. All the moves of $M_1$ are also included in M.
Case (iv) : Let $r = (r_1)$. Then the FA $M_1$ is also the FA for $(r_1)$, since the use of parentheses
does not change the language denoted by the expression.
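The inductive construction can be tried out with a small set of combinators, one per case. The Python sketch below is an illustration under our own assumptions, not the notes' formal construction: states are fresh integers, the empty string labels an $\varepsilon$-move, every automaton has exactly one start and one final state, and concatenation is assumed to link the final state of the first automaton to the start state of the second by an $\varepsilon$-move. Each sub-automaton must be built freshly and used only once.

```python
import itertools

EPS = ""                       # assumption: "" labels an epsilon-move
_fresh = itertools.count()     # source of fresh state numbers


class NFA:
    def __init__(self, start, final, moves):
        self.start, self.final, self.moves = start, final, moves


def _combine(*parts):
    """Copy the moves of the smaller automata into one new table."""
    moves = {}
    for part in parts:
        for key, targets in part.items():
            moves.setdefault(key, set()).update(targets)
    return moves


def _link(moves, p, q):
    moves.setdefault((p, EPS), set()).add(q)


def symbol(a):                 # basis case r = a
    s, f = next(_fresh), next(_fresh)
    return NFA(s, f, {(s, a): {f}})


def union(m1, m2):             # r = r1 + r2: new start and new final state
    s, f = next(_fresh), next(_fresh)
    moves = _combine(m1.moves, m2.moves)
    for p, q in ((s, m1.start), (s, m2.start), (m1.final, f), (m2.final, f)):
        _link(moves, p, q)
    return NFA(s, f, moves)


def concat(m1, m2):            # r = r1 r2 (assumed epsilon-link construction)
    moves = _combine(m1.moves, m2.moves)
    _link(moves, m1.final, m2.start)
    return NFA(m1.start, m2.final, moves)


def star(m1):                  # r = r1*: two more states than m1
    s, f = next(_fresh), next(_fresh)
    moves = _combine(m1.moves)
    for p, q in ((s, m1.start), (s, f), (m1.final, m1.start), (m1.final, f)):
        _link(moves, p, q)
    return NFA(s, f, moves)


def _closure(moves, states):
    closure, stack = set(states), list(states)
    while stack:
        for q in moves.get((stack.pop(), EPS), ()):
            if q not in closure:
                closure.add(q)
                stack.append(q)
    return closure


def accepts(m, w):
    current = _closure(m.moves, {m.start})
    for a in w:
        moved = set()
        for p in current:
            moved |= m.moves.get((p, a), set())
        current = _closure(m.moves, moved)
    return m.final in current


# NFA for the RE (0+1)*(0+11): strings ending with 0 or with 11.
EXAMPLE = concat(star(union(symbol("0"), symbol("1"))),
                 union(symbol("0"), concat(symbol("1"), symbol("1"))))
```

The resulting automaton is far from minimal, but its size grows only linearly with the length of the RE, which is the point of using NFAs with $\varepsilon$-moves here.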
FA to RE (REs for Regular Languages) :
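Lemma : If a language is regular, then there is a RE to describe it, i.e. if L = L(M) for some DFA M,
then there is a RE r such that L = L(r).
Notations : Let the states of M be numbered 1, 2, ..., n. $R_{ij}^{(k)}$ is a RE denoting the language which is the set of all strings w such that w is the
label of a path from state i to state j in M, and that path has no intermediate state
whose number is greater than k. (i and j, the beginning and end points, are not considered to be
"intermediate", so i and/or j can be greater than k.)
Basis : k = 0, i.e. the paths must not have any intermediate state (since all states are
numbered 1 or above). There are only two possible kinds of paths meeting the above condition :
1. A direct transition from state i to state j. If $a_1, a_2, \ldots, a_m$ are all the input symbols with $\delta(i, a_l) = j$, then for $i \ne j$ we have $R_{ij}^{(0)} = a_1 + a_2 + \cdots + a_m$, and $R_{ij}^{(0)} = \emptyset$ if there are no such
symbols.
2. A path of length 0 from a state i to itself, with label $\varepsilon$. Hence, for $i = j$, $R_{ii}^{(0)} = \varepsilon + a_1 + a_2 + \cdots + a_m$, where $a_1, \ldots, a_m$ are the symbols with $\delta(i, a_l) = i$.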
Induction :
Assume that there exists a path from state i to state j such that there is no intermediate state whose
number is greater than k. The corresponding RE for the label of the path is $R_{ij}^{(k)}$.
There are only two possible cases :
1. The path does not go through the state k at all, i.e. the numbers of all the intermediate states are
less than k. So, the label of the path from state i to state j is in the language described by the RE $R_{ij}^{(k-1)}$.
2. The path goes through the state k at least once. The path goes from i to j, and k may
appear more than once. We can break the path into pieces as shown in Figure 7.
Figure 7
1. The first part, from the state i to the first occurrence of the state k. In this path, all
intermediate states are numbered less than k, and it starts at i and ends at k. So the RE $R_{ik}^{(k-1)}$ denotes
the language of the label of this path.
2. The last part, from the last occurrence of the state k in the path to the state j. In this path also, no
intermediate state is numbered k or greater. Hence the RE $R_{kj}^{(k-1)}$ denotes the language of
the label of this path.
3. The middle part, from the first occurrence of k to the last occurrence of k, represents a loop which
may be taken zero times, once or any number of times. And all states between two consecutive
k's are numbered less than k.
Hence the label of the path of this part is denoted by the RE $(R_{kk}^{(k-1)})^*$. The label of the path from state
i to state j is the concatenation of these 3 parts, which is $R_{ik}^{(k-1)} (R_{kk}^{(k-1)})^* R_{kj}^{(k-1)}$.
Since either case 1 or case 2 may happen, the labels of all paths from state i to j are denoted by the
following RE:
$R_{ij}^{(k)} = R_{ij}^{(k-1)} + R_{ik}^{(k-1)} (R_{kk}^{(k-1)})^* R_{kj}^{(k-1)}$
We can construct $R_{ij}^{(k)}$ for all i, j $\in$ {1, 2, ..., n} in increasing order of k, starting with the basis k = 0 up to
k = n, since $R_{ij}^{(k)}$ depends only on expressions with a smaller superscript (which hence will already be available).
WLOG, assume that state 1 is the start state and $j_1, j_2, \ldots, j_m$ are the m final states, where $j_i \in$ {1, 2,
..., n}, $1 \le i \le m$ and $m \le n$. According to the convention used, the language of the automaton can
be denoted by the RE
$R_{1 j_1}^{(n)} + R_{1 j_2}^{(n)} + \cdots + R_{1 j_m}^{(n)}$
since $R_{1 j_i}^{(n)}$ denotes the set of all strings that start at the start state 1 and finish at the final state $j_i$ following the
transitions of the FA with any of the states 1, 2, ..., n as intermediate states, and hence are accepted by the
automaton.
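The recurrence for the expressions with superscript k can be evaluated mechanically, one value of k at a time. The Python sketch below is a rough illustration under assumptions of our own: expressions are built as strings in Python's regex syntax (| for +, a non-capturing group for parentheses, the empty string for $\varepsilon$), the empty language is represented by None, the states are numbered 1 to n with state 1 as the start state, and the example is a hypothetical two-state DFA for "at least one 1". No simplification is attempted, so the resulting expressions grow quickly.

```python
import re


def union(r, s):
    """r + s, where None stands for the empty language."""
    if r is None:
        return s
    if s is None:
        return r
    return r if r == s else f"(?:{r}|{s})"


def concat(r, s):
    """rs; concatenating with the empty language gives the empty language."""
    if r is None or s is None:
        return None
    return r + s


def star(r):
    """r*; the star of the empty language (or of epsilon) is epsilon."""
    if r is None or r == "":
        return ""
    return f"(?:{r})*"


def dfa_to_regex(n, delta, finals):
    """Regex for a DFA with states 1..n, start state 1 and delta[(i, a)] = j."""
    # Basis k = 0: direct transitions, plus epsilon on the diagonal.
    R = [[None] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        R[i][i] = ""
    for (i, a), j in delta.items():
        R[i][j] = union(R[i][j], re.escape(a))
    # Induction: allow state k as an intermediate state, for k = 1 .. n.
    for k in range(1, n + 1):
        new = [[None] * (n + 1) for _ in range(n + 1)]
        for i in range(1, n + 1):
            for j in range(1, n + 1):
                through_k = concat(concat(R[i][k], star(R[k][k])), R[k][j])
                new[i][j] = union(R[i][j], through_k)
        R = new
    # Union over all final states of the expressions from the start state 1.
    result = None
    for f in sorted(finals):
        result = union(result, R[1][f])
    return result


# Hypothetical DFA: state 1 = "no 1 seen yet", state 2 = "some 1 seen" (final).
EXAMPLE_DELTA = {(1, "0"): 1, (1, "1"): 2, (2, "0"): 2, (2, "1"): 2}
EXAMPLE_RE = dfa_to_regex(2, EXAMPLE_DELTA, {2})
```

A new table is filled for each k, so every entry is computed only from expressions with the smaller superscript, exactly as the ordering argument above requires.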
Limitations of Finite Automata and Non regular Languages :
The class of languages recognized by FAs is exactly the class of regular languages. There are certain languages
which are non regular, i.e. cannot be recognized by any FA.
Consider the language $L = \{ a^n b^n \mid n \ge 0 \}$.
In order to accept this language, we find that an automaton seems to need to remember, when
passing the center point between the a's and b's, how many a's it has seen so far, because it would
have to compare that with the number of b's to either accept (when the two numbers are the same) or
reject (when they are not the same) the input string.
But the number of a's is not limited and may be much larger than the number of states, since the
string may be arbitrarily long. So, the amount of information the automaton needs to remember is
unbounded.
A finite automaton cannot remember this with only finite memory (i.e. a finite number of states).
The fact that FAs have finite memory imposes some limitations on the structure of the languages
recognized. Intuitively, we can say that a language is regular only if, in processing any string in
this language, the information that has to be remembered at any point is strictly limited. The
argument given above to show that $\{ a^n b^n \mid n \ge 0 \}$ is non regular is informal. We now present a formal
method for showing that certain languages such as $\{ a^n b^n \mid n \ge 0 \}$ are non regular.
We can prove that a certain language is non regular by using a theorem called the "Pumping
Lemma". According to this theorem every regular language must have a special property. If a
language does not have this property, then it is guaranteed to be not regular. The idea behind this
theorem is that whenever an FA processes a long string (longer than the number of states) and
accepts, there must be at least one state that is repeated, and the substring of the
input string between the two occurrences of that repeated state can be repeated any number of
times with the resulting string remaining in the language.
Pumping Lemma :
Let L be a regular language. Then the following property holds for L.
There exists a number $k \ge 1$ (called the pumping length), where, if w is any string in L of length
at least k, i.e. $|w| \ge k$, then w may be divided into three substrings w = xyz, satisfying the
following conditions:
1. $y \ne \varepsilon$, i.e. $|y| \ge 1$
2. $|xy| \le k$
3. $x y^i z \in L$ for all $i \ge 0$
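Proof : Since L is regular, there exists a DFA $M = (Q, \Sigma, \delta, q_0, F)$ that recognizes it, i.e. L = L(M).
Let the number of states in M be n.
Say, the pumping length is $k = n$.
Consider a string $w = a_1 a_2 \cdots a_m \in L$ such that $m = |w| \ge n$ (we consider the language L to be infinite and hence
such a string can always be found). If no string of such length is found to be in L, then the
lemma becomes vacuously true.
Let $q_0, q_1, \ldots, q_m$ denote the sequence of states M passes through while processing w, i.e. $q_j = \hat{\delta}(q_0, a_1 \cdots a_j)$ for $1 \le j \le m$, and $q_m \in F$.
Since $|w| \ge n$, the number of states in the above sequence is at least n + 1. But the number
of states in M is only n. Hence, by the pigeonhole principle, at least one state must be repeated.
Let $q_i$ and $q_l$ with $i < l$ be the same state, and let $q_l$ be the first state to repeat in the sequence (there may be
some more, that come later in the sequence). The sequence now looks like
$q_0, \ldots, q_i, \ldots, q_l, \ldots, q_m$ with $q_i = q_l$.
Let $x = a_1 \cdots a_i$, $y = a_{i+1} \cdots a_l$ and $z = a_{l+1} \cdots a_m$, so that w = xyz.
Since $q_l$ is the first repeated state, we have $l \le n$, i.e. $|xy| \le n$, and at the same time y cannot be empty,
i.e. $|y| \ge 1$. From the above, it immediately follows that $\hat{\delta}(q_0, x) = q_i$, $\hat{\delta}(q_i, y) = q_l = q_i$ and $\hat{\delta}(q_i, z) = q_m$. Hence $\hat{\delta}(q_0, xz) = \hat{\delta}(q_i, z) = q_m \in F$, implying $xz \in L$.
Similarly,
$\hat{\delta}(q_0, xyz) = q_m \in F$, implying $xyz \in L$,
$\hat{\delta}(q_0, xy^2z) = q_m \in F$, implying $xy^2z \in L$,
and so on.
That is, starting at $q_i$, the loop on state $q_i$ can be omitted, taken once, twice, or many more times (by
the DFA M), eventually arriving at the final state $q_m$.
Thus, M accepts the strings xz, xyz, $xy^2z$, ..., i.e. $xy^iz$ for all $i \ge 0$.
Hence $xy^iz \in L$ for all $i \ge 0$.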
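The decomposition used in the proof can be found by running the DFA and stopping at the first state that is visited a second time. The Python sketch below illustrates this; the dictionary encoding, the function names and the example machine (a hypothetical two-state DFA accepting the strings over {0, 1} with an even number of 0's) are assumptions made for the example.

```python
def run(delta, q, w):
    """State reached from q after reading w (extended transition function)."""
    for a in w:
        q = delta[(q, a)]
    return q


def pump_split(delta, start, w):
    """Split w into (x, y, z) so that y labels the loop on the first repeated state.

    Returns None if no state repeats, which can only happen when w is shorter
    than the number of states.
    """
    first_visit = {start: 0}       # state -> number of symbols read on first visit
    q = start
    for pos, a in enumerate(w, start=1):
        q = delta[(q, a)]
        if q in first_visit:       # q was already visited after i symbols
            i = first_visit[q]
            return w[:i], w[i:pos], w[pos:]
        first_visit[q] = pos
    return None


# Hypothetical DFA: even number of 0's; "even" is both start and accept state.
DELTA = {
    ("even", "0"): "odd", ("even", "1"): "even",
    ("odd", "0"): "even", ("odd", "1"): "odd",
}

x, y, z = pump_split(DELTA, "even", "0110")   # -> ("0", "1", "10")

if __name__ == "__main__":
    for i in range(4):
        pumped = x + y * i + z
        print(pumped, run(DELTA, "even", pumped) == "even")
```

Here the machine has n = 2 states, the split satisfies |xy| = 2 and |y| = 1, and every pumped string still contains exactly two 0's, so it is still accepted.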
We can use the pumping lemma to show that some languages are non regular.
Please note, carefully, that the theorem guarantees the existence of a number k as well as the
decomposition of the string w to xyz. But it is not known what they are. So, if the theorem is
violated for particular values of