Synthesis of High-Speed Finite State Machines in FPGAs by State Splitting
Valery Salauyou
Key words — synthesis, finite state machine, high-speed, high performance, state splitting, field programmable gate array, FPGA,
SoC, look up table, LUT.
Abstract. A method for synthesizing high-speed finite state machines (FSMs) in LUT-based field programmable gate arrays
(FPGAs) by splitting internal states is proposed. The method does not change the FSM type (Mealy or Moore), does not
require additional blocks or clock signals, and can be easily included in the design flow of digital systems on FPGAs.
Estimates of the number of LUT levels needed to implement FSM transition functions are presented for sequential and
parallel decomposition. Algorithms for splitting FSM internal states to synthesize high-speed FSMs are described.
Experimental results show the efficiency of the proposed method: in some cases FSM performance increases by a factor of
1.52. The conclusion discusses the experimental results, notes that the method can also be used to build high-speed FSMs
on ASICs, and outlines promising directions for the design of high-speed FSMs.
1. Introduction
Large functional blocks and nodes of a digital system, as well as the digital system itself, as a rule include a
control device or controller. The speed of a digital system and of its functional blocks depends directly on the speed of
their control devices. The mathematical model of most control devices and controllers is a finite state machine
(FSM). For this reason, methods for synthesizing high-speed FSMs are necessary for designing high-performance digital
systems. When synthesizing high-speed FSMs, the implementation cost can be ignored, since control devices occupy a small
part of the chip area compared with other system components (for example, memory or transceivers).
At present, programmable logic devices (PLDs) are widely used for designing digital systems. Two types of PLD
architecture dominate: those based on two programmable matrices (AND and OR), and those based on functional generators,
or look-up tables (LUTs). PLDs of the first type are called Complex Programmable Logic Devices (CPLDs), and those of the
second type are called Field Programmable Gate Arrays (FPGAs). An FPGA can be viewed as a large number of LUTs joined by
interconnections. Each LUT can implement any Boolean function of a small number of arguments (as a rule, from 4 to 6).
Methods of FSM synthesis in CPLDs were considered in [1]. This paper considers a method for synthesizing high-speed FSMs
in FPGAs.
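A LUT can be pictured as a small truth table addressed by its input bits. The Python sketch below is only an illustrative model of that idea, not vendor or synthesizer code; the names `make_lut` and `lut_eval` and the sample 4-input function `f` are hypothetical.

```python
from itertools import product


def make_lut(func, n):
    """Precompute the 2**n-entry truth table of an n-input Boolean function."""
    # product() enumerates inputs in lexicographic order, so the position of
    # each entry equals the input bits read as a binary number (MSB first).
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=n)]


def lut_eval(table, bits):
    """Evaluate the LUT: the input bits form the address into the table."""
    address = 0
    for b in bits:
        address = (address << 1) | b
    return table[address]


def f(a, b, c, d):
    """Hypothetical 4-input function: (a AND b) XOR (c OR d)."""
    return (a & b) ^ (c | d)


table = make_lut(f, 4)
```

Any other function of at most n arguments fits the same table size, which is why the number of arguments, rather than the complexity of the expression, decides whether a function needs one LUT or several levels of LUTs.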
Many papers address the design of high-speed FSMs in PLDs, and they take a wide variety of approaches. In [2], a
technique is presented for improving the performance of a synchronous circuit implemented in a LUT-based FPGA without
changing the initial circuit configuration. Only the register locations are altered. This improves clock speed and data
throughput at the expense of latency. In [3], methods and tools for state encoding and combinational synthesis of
sequential circuits based on new criteria of information flow optimization are considered. In [4], a timing optimization
technique is proposed for a complex FSM that consists not only of random logic but also of data operators. The technique,
based on the concept of a catalyst, adds a functionally redundant block (which includes a piece of combinational logic
and several other registers) to the circuit under consideration so that the timing-critical paths are divided into
stages. In [5,6], styles of FSM description in VHDL and known state assignment methods for FSM implementation are
studied. In [7], evolutionary methods are applied to FSM synthesis. First, the state assignment problem is solved by
means of genetic algorithms. Then evolutionary algorithms are applied to minimize the chip area and the time delay of the
FSM output signals. In [8], state assignment and optimization of the combinational circuit for high-speed FSMs in CPLDs
are considered. In [9], a novel architecture specifically optimized for implementing reconfigurable FSMs, the
Transition-based Reconfigurable FSM (TR-FSM), is presented. The architecture shows a considerable reduction in area,
delay, and power consumption compared with FPGA architectures. In [10], a new FSM model, the Finite Virtual State Machine
(FVSM), is proposed, together with a memory-based architecture for implementing FVSMs and a technique for generating an
FVSM from a traditional FSM. FVSMs implemented on the new architecture are faster than traditional RAM-based
implementations of FSMs. In [11], the implementation of FSMs in FPGAs using embedded ROM blocks is considered. Two FSM
architectures with multiplexers on the inputs of the ROM blocks are proposed, which reduce the area and increase the
speed of the FSM. In [12], the reduction of the number of arguments of transition functions by state splitting is
considered, which reduces the chip area and the time delay of FSMs implemented in FPGAs.
This paper also uses splitting of FSM states, but here the purpose of splitting is to increase the performance of FSMs
in LUT-based FPGAs. State splitting is an equivalent transformation of an FSM and does not change the algorithm of its
functioning. Splitting preserves the machine type (Mealy or Moore), does not change the general structure of the FSM, and
does not use the embedded memory blocks of the FPGA. It also preserves the hierarchy of state names, which simplifies
analysis and debugging of the design. For these reasons the proposed method is aimed at practical use and can be easily
included in the general design flow of digital systems.
This paper is organized as follows. Section 2 gives estimates of the number of LUT levels needed to implement FSM
transition functions under sequential and parallel decomposition. Section 3 presents the synthesis method for high-speed
FSMs, which includes two algorithms: the general algorithm and the algorithm for decomposition of an individual state. A
detailed example illustrates the method. Experimental results are reported in Section 4. Section 5 concludes.
Figure 2. The parallel decomposition of a Boolean function
The values of the function arguments arrive at the inputs of the first-level LUTs, and the values of the intermediate
functions arrive at the inputs of the LUTs on all subsequent levels. Hence, in the case of parallel decomposition, the
number of LUT levels for the transition function $d_i$ with rank $r_i$ is defined by the following expression:
It is difficult to predict which decomposition (sequential or parallel) a specific synthesizer uses. Preliminary
studies showed that, for example, the design tool Quartus II from Altera uses both sequential and parallel decomposition
simultaneously. The number $l_i$ of LUT levels needed to implement the transition function $d_i$ with rank $r_i$ in an
FPGA can therefore lie between the values $l_i^s$ and $l_i^p$, $i = \overline{1, M}$.
Let us introduce an integer coefficient $k$, $k \in [0, 10]$, which adapts the proposed algorithm to a specific
synthesizer when determining the number of LUT levels. The number $l_i$ of LUT levels for implementing the transition
function $d_i$ with rank $r_i$ is then defined by the following expression:

$$l_i = \mathrm{int}\left(\frac{10 - k}{10}\, l_i^p + \frac{k}{10}\, l_i^s\right) \qquad (5)$$

The specific value of the coefficient $k$ depends on the architecture of the FPGA and on the synthesizer used.
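The blend of the two estimates can be sketched in a few lines of Python. This is a minimal illustration, assuming that int(.) rounds up to the nearest integer (the worked example later in the paper computes int(8/6) = 2); the function names `ceil_div` and `lut_levels` are hypothetical.

```python
def ceil_div(a, b):
    """Integer division rounded up, without floating-point error."""
    return -(-a // b)


def lut_levels(l_p, l_s, k):
    """Estimate LUT levels as a weighted blend of the parallel (l_p) and
    sequential (l_s) estimates, with weight k/10 on the sequential one.

    Assumption: int(.) in the paper means rounding up.
    """
    if not 0 <= k <= 10:
        raise ValueError("k must be an integer in [0, 10]")
    return ceil_div((10 - k) * l_p + k * l_s, 10)


# k = 10 trusts the sequential estimate, k = 0 the parallel one.
print(lut_levels(2, 3, 10), lut_levels(2, 3, 0))
```

With k = 10 the estimate reduces to the sequential value and with k = 0 to the parallel value, so intermediate values of k let the estimate be tuned to the behaviour of a given tool.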
The next question is when to stop splitting the FSM states. When some state $a_i$ ($i = \overline{1, M}$) is split, not
only does the number $M$ of FSM states increase, but so does the number of transitions into the states of the set
$A(a_i)$, where $A(a_i)$ is the set of states in which the transitions from state $a_i$ terminate. Splitting state $a_i$
increases the cardinalities of the sets $B(a_m)$, $a_m \in A(a_i)$. Therefore, according to (1), the ranks of the
transition functions for the states of the set $A(a_i)$ grow, which can increase the values $l_i^s$, $l_i^p$, and $l_i$.
In this algorithm the process of state splitting stops when the following condition is met:

$$l_{max} \le \mathrm{int}(l_{mid}) \qquad (9)$$

where $l_{max}$ is the number of LUT levels necessary to implement the worst function, i.e. the one with the maximum
rank; $l_{mid}$ is the arithmetic mean of the number of LUT levels over all transition functions. Note that during
splitting of internal states the value $l_{mid}$ increases and the value $l_{max}$ decreases, so the algorithm always
terminates.
[Figure 3: state diagram of the initial FSM]
The given FSM is a Moore machine with 6 states a1,…,a6, 10 input variables x1,…,x10, and one output variable y, whose
value is shown near each state in fig. 3. The transitions from states a3, a4, and a5 are unconditional, so the logical
value 1 is written as the transition condition on these transitions. The sets $B(a_i)$ and $X(a_i)$ and the ranks $r_i$
of the transition functions for the given FSM are presented in Table 1. Since max(|X(am,as)|) = 5 for the given example,
according to (2) the value r* = 6. Suppose the FSM must be built in an FPGA whose LUTs have at most 6 inputs, i.e. n = 6.
Table 1. Values of $B(a_i)$, $X(a_i)$, $r_i$, $l_i^s$, and $l_i^p$ for the initial FSM
State  B(a_i)    X(a_i)                              r_i  l_i^s  l_i^p
a1     {a6}      {x6,x7,x8,x9,x10}                   6    1      1
a2     {a1,a6}   {x1,x2,x3,x4,x5,x6,x7,x8,x9,x10}    12   3      2
a3     {a1}      {x1,x2,x3,x4,x5}                    6    1      1
a4     {a2,a3}   {x1}                                3    1      1
a5     {a2,a4}   {x1}                                3    1      1
a6     {a5}      Ø¹                                  1    1      1
Note 1. Ø is an empty set.
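The ranks listed in Table 1 coincide with |B(ai)| + |X(ai)| for every state. Expression (1) itself is not reproduced in this text, so the relation used in the following Python sketch is an assumption read off the table rather than the paper's definition; the dictionary `fsm` simply transcribes the B and X columns.

```python
# B(ai) and X(ai) transcribed from Table 1: state -> (B, X).
fsm = {
    "a1": ({"a6"}, {"x6", "x7", "x8", "x9", "x10"}),
    "a2": ({"a1", "a6"}, {f"x{j}" for j in range(1, 11)}),
    "a3": ({"a1"}, {"x1", "x2", "x3", "x4", "x5"}),
    "a4": ({"a2", "a3"}, {"x1"}),
    "a5": ({"a2", "a4"}, {"x1"}),
    "a6": ({"a5"}, set()),
}

# Assumed rank: number of source states plus number of input variables
# on which the transitions into the state depend.
ranks = {state: len(b) + len(x) for state, (b, x) in fsm.items()}
print(ranks)
```

Under this reading, state a2 stands out with rank 12, twice the number of LUT inputs n = 6, which is why its transition function cannot fit in a single LUT level.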
According to (3) and (4), the values $l_i^s$ and $l_i^p$ are determined for each state (they are given in the
corresponding columns of Table 1). Suppose we do not know how the compiler decomposes Boolean functions, so we assume
sequential decomposition (the worst case). Hence the coefficient k in expression (5) is set to 10, i.e. k = 10. As a
result, the number of LUT levels necessary to implement each transition function is $l_i = l_i^s$. Thus, for our example
we have int($l_{mid}$) = int(8/6) = 2 (the value is rounded up). In other words, splitting of FSM internal states stops
as soon as each transition function can be implemented in two levels of LUTs.
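The arithmetic of this step can be checked with a short Python sketch. It takes the sequential estimates from Table 1 (k = 10), assumes that int(.) rounds up as the value int(8/6) = 2 implies, and lists the states whose estimate exceeds the rounded mean; the function name `split_candidates` is hypothetical.

```python
# Sequential LUT-level estimates from Table 1 (k = 10, so l_i = l_i^s).
levels = {"a1": 1, "a2": 3, "a3": 1, "a4": 1, "a5": 1, "a6": 1}


def split_candidates(levels):
    """Return the rounded-up mean level and the states that exceed it."""
    total, count = sum(levels.values()), len(levels)
    target = -(-total // count)  # integer ceiling of total / count
    worst = [state for state, l in levels.items() if l > target]
    return target, worst


target, candidates = split_candidates(levels)
print(target, candidates)
```

An empty candidate list would mean that every transition function already fits within the target number of levels and no further splitting is needed.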
For the considered example we have $l_{max} = l_2^s = 3$, i.e. condition (9) is violated for state a2, since
$l_{max} = l_2^s = 3 > \mathrm{int}(l_{mid}) = 2$. For this reason state a2 is split by means of algorithm 2. The matrix
W is constructed for splitting state a2 (fig. 4).
     a1  a6  x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
w1   1   0   1   1   1   1   1   0   0   0   0   0
w2   0   1   0   0   0   0   0   1   1   1   1   1
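The matrix W can be inspected with a short Python sketch. Algorithm 2 is not reproduced in this text, so reading each row as the set of arguments (source states and input variables) of one incoming transition of state a2 is an interpretation, not the paper's definition; the helper name `row_arguments` is hypothetical.

```python
columns = ["a1", "a6"] + [f"x{j}" for j in range(1, 11)]

# Rows of the matrix W as given for state a2.
W = {
    "w1": [1, 0, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0],
    "w2": [0, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1],
}


def row_arguments(row):
    """Names of the columns marked with 1 in a row of W."""
    return [name for name, bit in zip(columns, row) if bit]


# Number of arguments per row; each fits into one 6-input LUT.
row_rank = {name: sum(row) for name, row in W.items()}

# In this example every column is covered by exactly one row.
disjoint = all(sum(col) == 1 for col in zip(*W.values()))
print(row_rank, disjoint)
```

Each row has 6 arguments, against 12 for the original state a2, and the two rows share no arguments.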
[Figure: state diagram of the FSM after splitting of state a2]
Thus, by splitting state a2 we reduced the number of LUT levels for the given FSM from 3 to 1 in the case of sequential
decomposition, and from 2 to 1 in the case of parallel decomposition.
4. Experimental results
The efficiency of the proposed synthesis method was checked by implementing the initial FSM (fig. 1) and the FSM after
splitting of state a2 (fig. 2) in FPGAs from Altera by means of the design tool Quartus II version 15.0. The parameter
«speed» was selected as the main optimization criterion. The state assignment method «one-hot» was selected for the
initial FSM, and «user» for the FSM after synthesis (the state codes are defined by the FSM description).
Table 3 presents the experimental results of the considered synthesis method for various FPGA families, where nLUT1 and
nLUT2 are the numbers of LUTs used to implement the initial FSM and the synthesized FSM, respectively; F1 and F2 are the
operating frequencies (in MHz) of the initial FSM and the synthesized FSM, respectively; F1/F2 is the ratio of the
corresponding parameters.
Table 3. Experimental results.
5. Conclusion
The experimental results show the following. Although in the considered example the rank of the transition functions
was reduced from 12 to 6, which reduced the number of LUT levels from 3 to 1 for sequential decomposition and from 2 to 1
for parallel decomposition, the performance of the FSM did not increase for all FPGA families. This reflects the
complexity of synthesizing high-speed FSMs compared with, for example, reducing the implementation cost. The performance
of an FSM depends not only on the results of logic synthesis but also on the results of placement and routing. The
reduction of the implementation cost for some FPGA families as a result of applying the method is explained simply: as
the number of LUT levels decreases, the number of LUTs also decreases.
Note that the proposed method can also be applied to the implementation of high-speed FSMs in ASICs. For this purpose
it is enough to define the estimates (3) and (4) of the number of circuit levels for the specific ASIC architecture.
Further development of synthesis methods for high-speed FSMs may use special structural models of FSMs, architectural
properties of FPGAs, special control of the clock signal, embedded memory blocks, etc.
This research was partially supported by Bialystok University of Technology, Poland, grant no. S/WI/1/2013.
References
[1] Salauyou V.V., Klimowicz A. Logic design of digital systems on programmable logic devices. Moscow: Hot Line – Telecom. 2008.
376 p. (in Russian)
[2] Miyazaki N., Nakada H., Tsutsui A., Yamada K., Ohta N. Performance Improvement Technique for Synchronous Circuits Realized
as LUT-Based FPGA’s // IEEE Transactions on Very Large Scale Integration (VLSI) systems, V. 3. № 3. 1995. P. 455-459.
[3] Jozwiak L, Slusarczyk A, Chojnacki A. Fast and compact sequential circuits through the information-driven circuit synthesis // Proc.
of the Euromicro Symposium on Digital Systems Design. Warsaw. Poland. 4-6 September 2001. P. 46-53.
[4] Huang S.-Y. On speeding up extended finite state machines using catalyst circuitry // Proc. of the Asia and South Pacific Design
Automation Conf. (ASP-DAC). Yokohama. Jan.-Feb. 2001. P. 583-588.
[5] Kuusilinna K., Lahtinen V., Hamalainen T., Saarinen J. Finite state machine encoding for VHDL synthesis // Computers and Digital
Techniques. IEE Proceedings. 2001. V. 148. № 1. P. 23-30.
[6] Rafla N. I., Davis B. A Study of finite state machine coding styles for implementation in FPGAs // Proc. of the 49th IEEE
International Midwest Symposium on Circuits and Systems. San Juan. USA. 6-9 Aug. 2006. V. 1. P. 337-341.
[7] Nedjah N., Mourelle L. Evolutionary synthesis of synchronous finite state machines // Proc. of the Int. Conf. on Computer
Engineering and Systems. Cairo. Egypt. 5-7 Nov. 2006. P.19-24.
[8] Czerwiński R., Kania D. Synthesis method of high speed finite state machines // Bulletin of the Polish Academy of Sciences:
Technical Sciences. 2010. V. 58, № 4, P. 635–644.
[9] Glaser J., Damm M., Haase J., and Grimm C. TR-FSM: Transition-based Reconfigurable Finite State Machine // ACM Transactions
on Reconfigurable Technology and Systems (TRETS). Aug. 2011. V. 4, № 3, P. 23:1-23:14.
[10] Senhadji-Navarro R., Garcia-Vargas I. Finite virtual state machines // IEICE Transactions on information and systems. 2012. V.
E95D. № 10. P. 2544-2547.
[11] Garcia-Vargas I., Senhadji-Navarro R. Finite state machines with input multiplexing: a performance study // IEEE Transactions on
computer-aided design of integrated circuits and systems. 2015. V. 34. № 5. P. 867-871.
[12] Solov'ev V.V. Splitting the internal states in order to reduce the number of arguments in functions of finite automata // Journal
of Computer and Systems Sciences International. 2005. V. 44. № 5. P. 777-783.