A Survey of Pulse Triggered Flip Flop
Abstract:- A cell library includes a number of cells with different functionalities, where each cell may be available in several sizes and with different driving capabilities. Two central categories of cells included in cell libraries are flip-flops and latches. Latches and flip-flops have a direct impact on the power consumption and speed of VLSI systems. Therefore, the study of low-power, high-performance latches and flip-flops is essential.

Keywords - Flip Flop (FF), Master slave FF, pulse-triggered FF.

1. INTRODUCTION
For high-performance VLSI chip design, the choice of the back-end methodology has a significant impact on the design time and the design cost. Designing every single gate from scratch is not necessarily the best method. Instead, a sufficient set of predesigned standard cells can be used as building blocks to design most of the functional blocks. Semiconductor manufacturers offer standard cell libraries, which are also supported by CAD tools in automated design flows, including the final physical auto-placement and routing. Despite their performance limitations, standard cell libraries can be useful even in the design of high-performance VLSI chips. Often only a small portion of a chip consists of performance-critical units, and the rest of the design can be maximally automated to reduce the design time without degrading the targeted performance. In addition, the concept of a cell library can be extended to support the full-custom part of the chip: custom cell libraries can be made and shared by the designers of the performance-critical units.

Flip-flops and latches are extremely important circuit elements in any synchronous VLSI chip. They are not only responsible for the correct timing, functionality, and performance of the chip, but their clocked devices also consume a significant portion of the total active power. Based on comparisons of the power breakdown for different elements in VLSI chips, latches and flip-flops are the major source of power consumption in synchronous systems.

1.1 Factors Desirable for Flip Flops
The factors desirable in latches and flip-flops are high speed, low power consumption, robustness and noise stability, small area and a low transistor count, supply voltage scalability, and low internal activity when data activity is low. A fundamentally different approach for constructing a FF uses pulse signals.

During the low phase of the clock, the master stage is transparent, and the D input is passed to the master stage output, QM. During this period, the slave stage is in hold mode, keeping its previous value using feedback. On the rising edge of the clock, the master stage stops sampling the input, and the slave stage starts sampling. During the high phase of the clock, the slave stage samples the output of the master stage (QM), while the master stage remains in hold mode. Since QM is constant during the high phase of the clock, the output Q makes only one transition per cycle. The value of Q is the value of D right before the rising edge of the clock, achieving the positive edge-triggered effect. A negative edge-triggered register can be constructed using the same principle by simply switching the order of the positive and negative latches (that is, placing the positive latch first).

In pulse-triggered flip-flops, by contrast, a short pulse around the rising (or falling) edge of the clock is created by a pulse generator circuit. This pulse acts as the clock input to a latch, and the latch samples its input only in the short window created by the pulse generator. Race conditions are thus avoided by keeping the opening time (the transparent period) of the latch very short. The combination of the glitch-generation circuitry and the latch results in a positive edge-triggered register. A pulse-triggered flip-flop uses only one latch, whereas a conventional edge-triggered flip-flop uses two. Pulse-triggered flip-flops are the only type of flip-flop that offers time-borrowing capability with a negative setup time.

2.1 Dynamic and Static Flip Flops
Static flip-flops can preserve their stored value even if the clock is stopped. In contrast, in dynamic flip-flops the stored value is lost if it is not refreshed for a while. In general, dynamic flip-flops can achieve higher speed and lower power consumption. However, this family of flip-flops suffers from serious potential failures: storage loss due to leakage currents, power supply noise, and similar effects is possible in dynamic flip-flops and must be considered by designers.

A storage retention time on the order of milliseconds is usually not a problem when the chip is operating normally; however, it becomes a serious problem when the chip is in testing mode. In many modern designs, testing modes are inevitable.
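The difference between master-slave and pulse-triggered sampling can be illustrated with a small behavioral model. The following is a minimal sketch, not a circuit-level description: it assumes discrete time steps, ideal latches with zero delay, and a pulse width given in time steps. The function names `master_slave_ff` and `pulse_triggered_ff` and the `width` parameter are hypothetical and chosen only for illustration.

```python
def master_slave_ff(clk, d):
    """Behavioral positive edge-triggered master-slave register.

    clk and d are equal-length lists of 0/1 samples (idealized, zero delay).
    """
    qm = 0  # master latch output (QM)
    q = 0   # slave latch output (Q)
    out = []
    for c, x in zip(clk, d):
        if c == 0:
            qm = x   # low phase: master transparent, slave holds
        else:
            q = qm   # high phase: slave transparent, master holds
        out.append(q)
    return out


def pulse_triggered_ff(clk, d, width=1):
    """Behavioral pulse-triggered FF: one latch clocked by a short pulse.

    width is the assumed pulse length in time steps (hypothetical parameter).
    """
    q = 0
    prev_clk = 0
    remaining = 0  # time steps left in the current transparency window
    out = []
    for c, x in zip(clk, d):
        if prev_clk == 0 and c == 1:
            remaining = width  # pulse generator fires on the rising edge
        if remaining > 0:
            q = x              # latch is transparent only during the pulse
            remaining -= 1
        out.append(q)
        prev_clk = c
    return out


clk = [0, 0, 1, 1, 0, 0, 1, 1]
d = [1, 1, 0, 0, 0, 1, 1, 0]
print(master_slave_ff(clk, d))     # [0, 0, 1, 1, 1, 1, 1, 1]
print(pulse_triggered_ff(clk, d))  # [0, 0, 0, 0, 0, 0, 1, 1]
```

In this example D changes exactly at the first rising edge. The master-slave model captures the value of D from just before the edge, while the pulsed latch captures the value present during the pulse, which is the behavior that allows a zero or negative setup time.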
By lowering the clock swing, the power of the clock distribution network is decreased in proportion to either Vclk^2 or Vclk. When the clock is to be stopped, it should be stopped at VSS, so that there is no leakage current. The transistor count of the RCSFF is 20, including an inverter for generating D’, while that of the conventional F/F is 24. The area of the RCSFF is 16% smaller than that of the conventional F/F, as seen from Fig. 4, even when the well for the precharge PMOS is separated. SPICE analysis is carried out assuming typical parameters of a generic 0.5 μm double-metal CMOS process. The delay depends on Wclk. Since the delay improvement saturates at Wclk = 10 μm, this value of Wclk is used in the area and power estimation. The power consumption is reduced to about 1/2 to 1/3 of that of the conventional F/F, depending on the type of clock driver and on Vwell. In the best case studied, a 63% power reduction is observed. For the RCSFF, the D and D’ inputs can also be small-voltage-swing signals. Using these characteristics, the RCSFF can be used to reduce the RC delay of long buses. By placing the RCSFF at the end of a long bus and sense-amplifying the slowly changing D input, the RC delay can be reduced to 1/3 of that obtained with the conventional F/F.

• Massimo Alioto, Elio Consoli, and Gaetano Palumbo in “Flip-Flop Energy/Performance versus Clock Slope and Impact on the Clock Network Design” analyzed the influence of the clock slope on the speed of various classes of flip-flops (FFs) and on the overall energy dissipation of both FFs and clock domain buffers. The analysis shows that an optimum clock slope exists, which minimizes the energy spent in a clock domain. Results show that the clock slope requirement can be relaxed with respect to traditional assumptions, leading to energy savings of up to 30-40% at a very small speed performance penalty. The effectiveness of the clock slope optimization is discussed in detail for the existing classes of FFs. The impact of such an optimization in terms of additive skew and jitter contributions is discussed, together with an analysis of the impact of technology scaling. Extensive post-layout simulations in a 65-nm CMOS technology are performed to check the validity of the underlying assumptions and approximations. The analysis revealed that, over a wide range of clock slopes, the speed performance of FFs is approximately unaffected by the clock slope, whereas the FF energy dissipation is more heavily influenced. Analysis of the energy contributions in a clock domain has shown that a smoother clock slope leads to an increase in the FF energy and a decrease in the energy dissipated by the local clock buffer. Detailed analysis of this tradeoff has shown that the energy dissipation of the clock network within a clock domain can be significantly reduced by properly choosing the clock slope. This optimal value has been analytically derived and validated by simulations, and has been shown to be quite different from the usual assumption of a steep slope. Typical values of the optimal clock slope range from FO4 to FO5. This optimization allows saving up to 40% energy consumption compared with a typical approach, depending on the FF topology, the number of FFs in the clock domain, and the transistor sizing. Besides revealing interesting properties of FFs in nanometer technologies, the investigation has also yielded several considerations that help the designer in designing the clock network. A detailed analysis of the FF delay variability and buffer-related skew/jitter sources has also been carried out. By employing a clock slope up to at the clock domain level, the impact of capacitive crosstalk and process variations is very slightly increased, whereas the impact of the supply voltage noise on the jitter is even reduced. Delay variability is substantially unaffected by the clock slope. Finally, a study on the effect of technology scaling has revealed that optimization of the clock slope will be more important in the future and that the optimal clock slope will move towards smoother values.

• Soheil Ziabakhsh and Meysam Zoghi in “Design of a Low-Power High-Speed T-Flip Flop Using the Gate-Diffusion Input Technique” proposed an implementation of a new T flip-flop (TFF) using the gate-diffusion input (GDI) technique for low power and high speed, aiming at a low power-delay product (PDP) while keeping complexity low. Simulation results using ADS 2008 show that the proposed flip-flop has the lowest propagation delay, 169.7 ps, and a power consumption of 188.9 μW at a supply voltage of 1.8 V. The results also show a decrease of more than 45% in the PDP of the proposed circuit. The design is based on the master-slave connection of two GDI latches and some gates. Each latch consists of four basic GDI cells, resulting in a simple eight-transistor structure, and the gates associated with the latches consist of six transistors. The components of the latch circuit can be divided into two main categories: GDI gates and inverters. A GDI gate uses two transistors and is controlled by the Clk signal. The Clk signal is fed to the gates of the transistors and creates two alternating states: when Clk is low, the signals propagate through the PMOS transistors and create a transient state; when Clk is high, the prior values are maintained due to conduction of the outputs. In this state, the GDI gates hold the state of the latch. A novel methodology for asynchronous circuits, based on two-transistor GDI cells, was presented. The proposed circuit has a simple structure, based on two master-slave principles and some gates that implement the T flip-flop. It contains 24 transistors. An optimization procedure was developed for the GDI TFF, based on iterative transistor sizing, targeting a minimal power-delay product. A performance comparison with other TFF design techniques was shown with respect to gate area, delay, and power dissipation.
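The power-delay product (PDP) used as a figure of merit for the GDI T flip-flop can be worked out from the delay and power quoted in the text. The sketch below is only an illustration under stated assumptions: the helper name `power_delay_product` is hypothetical, and the last step assumes that the reported "more than 45%" reduction is measured relative to a single reference design, which the text does not specify.

```python
def power_delay_product(delay_s, power_w):
    """Return the power-delay product in joules (delay in s, power in W)."""
    return delay_s * power_w


# Values quoted in the text for the GDI T flip-flop at a 1.8 V supply.
delay = 169.7e-12   # propagation delay: 169.7 ps
power = 188.9e-6    # power consumption: 188.9 uW

pdp = power_delay_product(delay, power)
print(f"PDP = {pdp * 1e15:.2f} fJ")  # PDP = 32.06 fJ

# Assumption: a 45% reduction relative to one reference design implies
# that the reference PDP is at least pdp / (1 - 0.45).
implied_reference = pdp / (1 - 0.45)
print(f"Implied reference PDP >= {implied_reference * 1e15:.2f} fJ")
```

A lower PDP means less energy is spent per switching event at a given speed, which is why the transistor sizing in that work targets a minimal PDP rather than delay or power alone.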
papers, figures of merit that designers are familiar with have been considered to gain an insight into the considered tradeoffs in a wide range of applications. Analysis showed that the results differ from those of previous papers because, here, layout parasitics have been explicitly included from the beginning and a much wider range of topologies has been considered. According to the presented results, the fastest topology is the STFF, the best low-energy FFs are the DETTGLM and TGFF, whereas the most energy-efficient throughout a wide region of the energy-delay design space is the TGPL. For the first time, the layout efficiency of FFs has been analyzed. In particular, HLFF, MSAFF, and TGFF exhibit a very efficient area-delay tradeoff. Moreover, it has been shown that area is almost proportional to leakage regardless of the FF topology and the transistor sizing. The differences between the leakage-delay and the more general energy-delay tradeoff have been pointed out. It has also been shown that leakage has a significant impact on the optimum transistor sizing, especially for MS FFs. The clock load seen from the clock terminal of a FF, and the related dissipation of the clock distribution network, have also been analyzed. It is also shown that, by including the impact of local clock distribution buffers, whose dissipation is directly related to the FFs' clock load, the rankings of FFs in the E-D space do not change significantly, except for the MS class, which is somewhat penalized. As a general remark, simpler basic structures are rewarded in nanometer technologies because of the strong impact of layout parasitics. In particular, explicit pulsed topologies, and specifically the TGPL, have been recognized as the most efficient FF topologies in a very wide range of applications from many points of view.

• Majid Rahimi Nezhad and Mohsen Saneei in “Low-Power Pulsed Triggered Flip-Flop with New Explicit Pulse in 65-nm CMOS Technology” proposed a low-power pulse-triggered flip-flop with a new explicit pulse generator. The novelty lies in the explicit pulse generator, while the main latch structure is taken from the clock branch sharing flip-flop. Because the short-circuit current is decreased and a dual-edge triggering technique is used, the design improves both power consumption and power-delay product (PDP). Across different switching activities, the proposed circuit has the minimum PDP with an FO4 load. Circuits were optimized for PDP. Simulation results show that, for 50% data activity, power consumption is 7% to 32% lower than that of other flip-flops. Both the power consumption and the minimum delay of the proposed flip-flop are better than those of the flip-flops it was compared with. The supply voltage and operating clock frequency are 1.1 V and 1 GHz, respectively, and the proposed flip-flop is implemented in 65 nm CMOS technology. Across different data activities, the proposed flip-flop has the smallest power consumption. The proposed flip-flop has a number of advantages. Two notable ones are (1) the size of the transistors in the critical path is decreased, allowing significantly smaller leakage power consumption; and (2) across the four process corners, the proposed flip-flop has significantly lower average power consumption than the other flip-flops discussed. Moreover, the proposed flip-flop has good setup and hold times and is the most area-efficient of the flip-flops compared.

• Borivoje Nikolic, Vojin G. Oklobdzija, Vladimir Stojanovic, Wenyan Jia, James Kar-Shing Chiu and Michael Ming-Tak Leung in “Improved Sense-Amplifier-Based Flip-Flop: Design and Measurements” presented the design and experimental evaluation of a new sense-amplifier-based flip-flop (SAFF). They found that the main speed bottleneck of existing SAFFs is the cross-coupled set-reset (SR) latch in the output stage. The proposed SAFF has all the advantages of earlier published SAFFs. It allows integration of logic into the flip-flop, as well as reduced clock-swing operation. A single-ended input version with multiplexed data scan and asynchronous reset is possible. The new flip-flop uses a new output-stage latch topology that significantly reduces delay and improves driving capability. The performance of this flip-flop is verified by measurements on a test chip implemented in CMOS with 0.18 μm effective channel length. The demonstrated speed places it among the fastest flip-flops used in state-of-the-art processors. The measurement techniques and the measurement set-up are also discussed. Interest in high-speed flip-flop design re-emerged as operating frequencies passed 1 GHz. A good flip-flop design affects the power consumed by the clock as well as the time available in an ever-shrinking pipeline. The strong driving capability of this flip-flop makes it suitable for GHz designs characterized by a short pipeline and high fan-out. The differential nature of its input signals makes it compatible with logic using reduced signal swing. Further, the authors developed a method for accurate measurement of flip-flop parameters from the test chip, obtaining a very good measurement accuracy of 10 ps under difficult conditions characterized by a high test frequency. The measurement results place it at the top in terms of speed compared with other flip-flops used in high-performance processors.

• Fabian Klass, Chaim Amir, Ashutosh Das, Kathirgamar Aingaran, Cindy Truong, Richard Wang, Anup Mehta, Ray Heald, and Gin Yee in “A New Family of Semi dynamic and Dynamic Flip-Flops with Embedded Logic for High-Performance Processors” presented a new family of edge-triggered flip-flops, developed in an attempt to reduce the pipeline overhead. The flip-flops belong to a class of semidynamic and dynamic circuits that can interface to both static and dynamic circuits. The main features of the basic design are short latency, small clock load, small area, and a single-phase clock scheme. Furthermore, the flip-flop family has the capability of easily incorporating logic
functions with a small delay penalty. This feature greatly reduces the pipeline overhead, since each flip-flop can be viewed as a special logic gate that serves as a synchronization element as well. The flip-flop family has played an integral role in meeting the cycle-time goal of the microprocessor. The term semidynamic is used here to denote circuits that internally have a precharge and evaluation phase, similar to dynamic gates. Taken together, these features make the flip-flop family well suited for high-performance microprocessor design. The flip-flops provide short latency and a good interface to static and dynamic logic, and can easily incorporate complex logic functions with a small delay penalty. These features contribute to reducing the pipeline overhead of the processor by allowing the elimination of one or more gate delays from a path leading to the flip-flop.

• Albert Ma and Krste Asanović in “A Double-Pulsed Set-Conditional-Reset Flip-Flop” proposed a new flip-flop design using a double-pulsed static latch, the double-pulsed set-conditional-reset flip-flop (DPSCRFF). The flip-flop has only a single stage of logic in the critical path and as a result is up to three times faster than the fastest previously known flip-flops, while consuming approximately the same energy as the lowest-power flip-flops. It has asymmetric timing properties, which make it a good match for skewed logic styles. A novel dual-pulse generator further reduces power requirements. The DPSCRFF is a single-ended static flip-flop design with a single logic stage, which can include arbitrary logic functionality. It is compatible with static or dynamic logic, and in particular can directly drive following dynamic logic. The two pulses are generated by a local pulse generator to avoid pulse distortion from additional pulse buffers and wiring. The pulse generator can be shared by a few neighboring flip-flops to reduce its area and energy overheads. The DPSCRFF does not allow arbitrary time borrowing across the transparency window as other pulsed latches do. Time borrowing is only possible for late-arriving high inputs, e.g., from a preceding domino logic stage or a preceding skewed static logic stage. The rising and falling delays for the DPSCRFF have been separated out since they differ significantly. The rising delays are negative since the output precharges before the input is required to arrive. The flip-flops were optimized for the worst-case positive delay, which in some cases increases the negative delays. The negative delay can be used to improve performance, or to lower power if skewed logic circuits are used. The fastest DPSCRFF, at 54 ps, is significantly faster than the next fastest flops (HLFF and SSASPL) at roughly 150 ps. The lowest-power DPSCRFF, at 141 fJ, is comparable to the lowest-power flop (PPCFF) at 130 fJ; however, it has a propagation delay of only 167 ps compared to 342 ps. When the data is held low while the clock continues to run, the energy dissipation of the DPSCRFF is reduced. However, if the clock is running and the data is held high, the DPSCRFF actually dissipates more power than for the full-activity waveforms because of its output glitches. When the clock is held stable, no internal nodes change state and only the single data input gate toggles. The DPSCRFF therefore has low energy when the local clock is gated. When the clock is gated, the DPSCRFF has the lowest possible data input loading (a single transistor gate). The asymmetric propagation delay enables the use of highly skewed logic to reduce cycle time and energy. The glitching present at the output may cause additional energy dissipation in downstream logic, depending on signal statistics.

5. CONCLUSION
Various pulse-triggered flip-flops were reviewed. A universal flip-flop with the best performance, lowest power consumption, and highest robustness against noise would be an ideal component for cell libraries. The combination of pulse-generation circuitry and a latch results in a positive edge-triggered register. Pulse-triggered FFs reduce the number of latch stages to a single stage. The reduced logic complexity and number of stages lead to shorter D-to-Q delays. The main advantage of pulse-triggered FFs is that they allow time borrowing across clock cycle boundaries and feature a zero or even negative setup time. Owing to these advantages, P-FFs are considered a popular alternative to the traditional master-slave FF.

ACKNOWLEDGEMENT
I believe the success of any work depends on the encouragement and guidance of others. I am therefore taking this opportunity to express my sincere gratitude to the people who have been there in all my success. I owe a sincere prayer to the LORD ALMIGHTY for his kind blessings and full support, without which this would not have been possible. I also wish to express my gratitude to all who helped me, directly or indirectly, to complete this paper.

REFERENCES
[1] H. Kawaguchi and T. Sakurai, “A reduced clock-swing flip-flop
(RCSFF) for 63% power reduction,” IEEE J. Solid-State Circuits,
vol. 33, no. 5, pp. 807–811, May 1998.
[2] K. Chen, “A 77% energy saving 22-transistor single phase clocking
D-flip-flop with adaptive-coupling configuration in 40 nm
CMOS,” in Proc. IEEE Int. Solid-State Circuits Conf., Nov. 2011,
pp. 338–339.
[3] E. Consoli, M. Alioto, G. Palumbo, and J. Rabaey, “Conditional
push-pull pulsed latch with 726 fJ·ps energy-delay product in 65
nm CMOS,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb.
2012, pp. 482–483.
[4] H. Partovi, R. Burd, U. Salim, F. Weber, L. DiGregorio, and D.
Draper, “Flow-through latch and edge-triggered flip-flop hybrid
elements,” in Proc. IEEE Int. Solid-State Circuits Conf., Feb. 1996,
pp. 138–139.
[5] F. Klass, C. Amir, A. Das, K. Aingaran, C. Truong, R. Wang, A.
Mehta, R. Heald, and G. Yee, “A new family of semi-dynamic and
dynamic flip-flops with embedded logic for high-performance
processors,” IEEE J. Solid-State Circuits, vol. 34, no. 5, pp. 712–
716, May 1999.
[6] V. Stojanovic and V. Oklobdzija, “Comparative analysis of
master-slave latches and flip-flops for high-performance and low-
power systems,” IEEE J. Solid-State Circuits, vol. 34, no. 4, pp.
536–548, Apr. 1999.
[7] J. Tschanz, S. Narendra, Z. Chen, S. Borkar, M. Sachdev, and V.
De, “Comparative delay and energy of single edge-triggered and
dual edge triggered pulsed flip-flops for high-performance
microprocessors,” in Proc. ISLPED, 2001, pp. 207–212.
[8] S. D. Naffziger, G. Colon-Bonet, T. Fischer, R. Riedlinger, T. J.
Sullivan, and T. Grutkowski, “The implementation of the Itanium 2
microprocessor,” IEEE J. Solid-State Circuits, vol. 37, no. 11, pp.
1448–1460, Nov. 2002.
[9] S. Sadrossadat, H. Mostafa, and M. Anis, “Statistical design
framework of sub-micron flip-flop circuits considering die-to-die
and within-die variations,” IEEE Trans. Semicond. Manuf., vol. 24,
no. 2, pp. 69–79, Feb. 2011.
[10] M. Alioto, E. Consoli, and G. Palumbo, “General strategies to
design nanometer flip-flops in the energy-delay space,” IEEE
Trans. Circuits Syst., vol. 57, no. 7, pp. 1583–1596, Jul. 2010.
[11] M. Alioto, E. Consoli, and G. Palumbo, “Flip-flop
energy/performance versus Clock Slope and impact on the clock
network design,” IEEE Trans. Circuits Syst., vol. 57, no. 6, pp.
1273–1286, Jun. 2010.
[12] M. Alioto, E. Consoli, and G. Palumbo, “Analysis and comparison
in the energy-delay-area domain of nanometer CMOS flip-flops:
Part I methodology and design strategies,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 725–736, May 2011.
[13] M. Alioto, E. Consoli and G. Palumbo, “Analysis and comparison
in the energy-delay-area domain of nanometer CMOS flip-flops:
Part II - results and figures of merit,” IEEE Trans. Very Large
Scale Integr. (VLSI) Syst., vol. 19, no. 5, pp. 737–750, May 2011.
[14] B. Kong, S. Kim, and Y. Jun, “Conditional-capture flip-flop for
statistical power reduction,” IEEE J. Solid-State Circuits, vol. 36,
no. 8, pp. 1263–1271, Aug. 2001.