Sync Errors
Ran Ginosar
VLSI Systems Research Center, Technion—Israel Institute of Technology
Haifa 32000, Israel
[ran@ee.technion.ac.il]
product. If latency is not an issue, T is simply set to be a whole clock cycle, and for most SOCs it implies MTBF of many eons.

The two synchronizers connect two simple finite state machines that implement the required protocol. A four-phase protocol is specified by means of a generalized STG in Figure 2, where “DD” means that the data is available (at the sender), “UU” means that it may be removed, and “LL” means data latched by the receiver. (A two-phase protocol may also be employed; the circuits are a bit more complex [13, 14], and this is typically used in order to minimize latency on long lines.) The complete logic and FSM are shown in Figure 3. A send request (V, true for a single cycle) latches data into REGS and starts the sender’s FSM. The synchronized request (R2) latches the data into REGR and triggers the receiver’s FSM. The receiver is given a single-cycle “data received” (D) signal. The protocol is sometimes modified so that A is set as soon as the received data are latched, but removed only after the receiver has had an opportunity to use the data.

[Figure 2 diagram; recoverable labels: DD, R+, LL, A+, UU, R-, A-]
Figure 2: Four-phase handshake push synchronization protocol STG

To consider the synchronizer’s behavior in cases of conflicts, assume that T equals a whole clock cycle. Upon a potential clock-data conflict on R, one of three possible outcomes may happen (Figure 4):
a. The rising edge of R is sampled high. R2 goes high on cycle 2, and data is latched into REGR by the beginning of cycle 3.
b. The rising edge of R is sampled low. Since the protocol assures that R stays high as long as A is low, it will be sampled high on cycle 2, when it is surely stable high. R2 will go high on cycle 3, and data is latched into REGR by cycle 4.
c. The first flop goes metastable. With a probability of $1 - e^{-T/\tau}$ (which is infinitesimally close to 1), the flop has exited metastability by the next clock, and has arbitrarily settled to either high or low (the thick traces of R1 in the figure). If high, then R2 goes high on cycle 2. If low, it will surely go high on the next cycle, when the input R is already stable high, and R2 goes high on cycle 3.

A word of caution is due here: Although outcome c above implies that metastability typically disappears within a single clock cycle, the second flop is still required. An exception is discussed in Section 3.2 below.

[Figure 3 diagram; recoverable labels: TX FSM, RX FSM, REGS, REGR, V, F, D, L, R, R1, R2, A, A1, A2; FSM states IDLE, REQ/R=1, WAIT, ACK/A=1]
Figure 3: Push synchronizer logic and protocol FSM

[Figure 4 diagram; recoverable labels: CLOCK, CYCLE 1, CYCLE 2, CYCLE 3, waveforms R, R1, R2 for scenarios a, b, c]
Figure 4: Three synchronization scenarios

A VHDL specification of the synchronizer is shown in Figure 5. This code is highly sensitive: minor modifications may render the synchronizer useless. Some such innovative but often fatal modifications are reviewed in the rest of this paper.

Logic validation tools are typically incapable of detecting any errors in such synchronizers. When reasonable logic assumptions are made, many erroneous synchronizers appear to operate perfectly well. Synchronizer-specific verification algorithms are required for this analysis.
    -- TRANSMITTER (inputs V, A, output R)
    if rising_edge(tx_clock) then
      A2 <= A1; A1 <= A;                -- 2 flop
      A3 <= A2; F <= not A3 and A2;     -- 1 shot
      case (tx_fsm_state) is
        when idle =>
          if (V = '1') then
            tx_fsm_state <= req;
            R <= '1';
          end if;
        when req =>
          if (A2 = '1') then
            tx_fsm_state <= waiting;
            R <= '0';
          end if;
        when waiting =>
          if (A2 = '0') then
            tx_fsm_state <= idle;
          end if;
        when others =>
          tx_fsm_state <= idle;
          R <= '0';
      end case;
    end if;

    -- RECEIVER (input R, output A)
    if rising_edge(rx_clock) then
      R2 <= R1; R1 <= R;                -- 2 flop
      R3 <= R2; D <= not R3 and R2;     -- 1 shot
      case (rx_fsm_state) is
        when idle =>
          if (R2 = '1') then
            rx_fsm_state <= ack;
            A <= '1';
          end if;
        when ack =>
          if (R2 = '0') then
            rx_fsm_state <= idle;
            A <= '0';
          end if;
        when others =>
          rx_fsm_state <= idle;
          A <= '0';
      end case;
    end if;

Figure 5: Push 2-way 4-phase synchronizer VHDL specification

One tool has been developed specifically for validating synchronization. The Avant! Clock Domain Checker [15] is a decent first attempt at addressing this issue. However, it has a number of drawbacks: First, the control and data signals that cross domain boundaries must be named in a manner that facilitates these checks. Second, it validates only one-sided transfers and does not examine complete two-sided protocols and the protocol state machines. Third, it only validates a limited set of pre-defined rules, mostly covering a simple two-flop synchronizer and data lines protected by it; for instance, it does not check the synchronization of asynchronous reset. Fourth, it only handles “push” (and control-only) synchronizers, but neither “pull” nor “push-pull” ones. Another such tool is @Verifier from @HDL [16].

3. The Interesting Synchronizers

3.1 Avoiding the Synchronizer

The most common synchronization error is the transfer of a signal from one clock domain into another without any synchronization. In some cases the designer felt that failure probability was too low to worry about (he has learned about MTBF in the range of $10^{100}$ years, so why bother?). In other cases, the receiver operated at a much higher clock frequency than the sender, and the designer felt that the receiver would always be fast enough to catch the signal.

The incoming data is used as a combinational input to a combinational circuit, which eventually feeds into a flip-flop. Since the timing of the input is unknown, there is no way to guarantee the timing of the output of the combinational circuit. In particular, it may change simultaneously with the sampling edge of the clock, and the receiving flip-flop may enter metastability or take an excessively long time to respond, hampering correct operation of the next stage of logic [2].

How often does the receiving flop enter metastability? The rate of entering metastability is $T_W \times f_D \times f_C$. For a 0.18µm SOC (where $T_W \approx 50$ ps) with a clock domain operating at 200MHz and receiving data every 1000 cycles, that rate is 2000/sec, namely two metastability events every millisecond. Ignoring such a high rate does take some courage!

This error can sometimes evade detection by normal logic validation tools. Simulations may assume such timing relations among the different clocks that all timing constraints are met. Static timing analysis would generate setup and hold violation warnings for every signal that crosses domain boundaries, but due to the typically huge number of such warnings most designers treat them as chaff and ignore them, assuming that the synchronizers will handle all those issues anyway. Consequently, legitimate warnings can easily be overlooked.

The error can be detected by the following clock domain crossing analysis, which can be performed using standard path analysis, e.g. as offered by logic synthesizers and by static timing analyzers. All possible pairs of clocks must be identified. For each pair, the CAD tool is made to report all logic paths that begin in a flop driven by the first clock and end in a flop driven by the second clock. The resulting list should be studied, either manually or with an automated script, and every reported path must be approved. Typically, the crossing lists are carefully maintained and are used as ‘false-path’ specifications, instructing the analysis tool to ignore cross-domain paths that are already verified.
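
As a quick check of the arithmetic quoted in Section 3.1, the metastability entry rate can be worked out from the three stated parameters. The only assumption made here is that “receiving data every 1000 cycles” means a data rate of $f_D = f_C / 1000$:

```latex
\begin{align*}
f_C &= 200\,\text{MHz} = 2\times10^{8}\ \text{s}^{-1}, \qquad
f_D = \frac{f_C}{1000} = 2\times10^{5}\ \text{s}^{-1}, \qquad
T_W \approx 50\,\text{ps} = 5\times10^{-11}\ \text{s} \\
\text{rate} &= T_W \times f_D \times f_C
  = (5\times10^{-11})\,(2\times10^{5})\,(2\times10^{8})\ \text{s}^{-1}
  = 2\times10^{3}\ \text{s}^{-1}
\end{align*}
```

That is 2000 events per second, or two per millisecond, in agreement with the figure given in Section 3.1.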
3.2 One Flop Synchronizer

A deceptively effective means of cutting down on the two-flop synchronizer’s latency is to remove one of the flops (Figure 6).

[Figure diagram; recoverable labels: SENDER, RECEIVER, R, A, R1, R2, D]

... with additional combinational logic, and the timing of that combinational path is typically designed to fit within a single clock cycle. But in cases of clock-data conflict on R, R1 may take longer than the normal flop $t_{PD}$ to stabilize, and consequently the entire combinational path from R1 through D and to the last flop fails to converge during a single cycle. The right solution, obviously, is to add a flop and set $D = R2 \times \overline{R3}$ (as in Figure 3).
domains. The leading edge of the reset signal is harmless, as it forces all circuits to a known starting state. The trailing edge, on the other hand, is the culprit in some chips. During global reset all the various clocks are started and all PLLs settle into their respective different frequencies. When the reset is removed, it can happen simultaneously with the sampling edge of one of the clocks. The global reset is typically connected into the asynchronous clear (or preset) input of many flip-flops, and its trailing edge must respect a setup constraint, or else the flops may enter metastability.

A safe interface is shown in Figure 8. It belongs with each of the several clock generators of the SOC. While the leading edge is transferred without delay (when the clocks may be inoperative), the trailing edge is synchronized.

[Figure 8 diagram; recoverable labels: RESET, CLOCK, RESET WITH SYNCHRONIZED TRAILING EDGE]
Figure 8: Global reset synchronizer

3.7 Async Clear Synchronizer

Occasionally (and contrary to the wisdom of typical synchronous design methodologies) asynchronous clear or preset of a flop may be employed as part of the logic (rather than for global reset, as discussed in Section 3.6). Some designers feel that, since this is an asynchronous clear, it need not be synchronized even when it crosses clock domain boundaries (Figure 9).

The problem is very similar to that described in Section 3.6: Removal of the asynchronous clear signal may coincide with the rising edge of the receiver’s clock, potentially leading to metastability. The solution is either to synchronize the reset signal with two flops, or (when the leading edge must not be delayed) design an ...

3.8 DFT Leakage

Simple production testers may have only a single clock. To test a GALS SOC on such testers, all clocks are shorted together. Static faults (such as stuck-at) and some dynamic faults (speed testing of the individual clock domains) are properly tested that way. The clock shorts of course must be ignored during path analysis (by means of manually assembled ‘false-path’ lists or by instructing the analysis to ignore all paths that are conditioned upon a test-enable signal). But certain changes of the design may result in an erroneous (sneak) path masked by the list.

The solution is to recheck the entire false-path list as a final check, after all design changes are completed.

3.9 Pulse Synchronizer

The pulse synchronizer (Figure 11) is designed to pass a single “pulse” (a logic signal that is set to ‘1’ for only a single clock cycle) from one clock domain to another. A pulse on P causes the sender’s flop to toggle. Eventually, D is set high for a single cycle of the receiver’s clock as a result.

The designer was lucky to discover the problem when the circuit was tried on an FPGA, prior to tapeout. Sometimes the P input was set to ‘1’ for two consecutive cycles. At other times two pulses came in succession, with only one cycle in between. In both cases the synchronizer generated undesirable results. The astute reader can easily figure out what they were. The situation was mended by replacing this with a standard control-only synchronizer, operating with a standard two-phase protocol.

[Figure diagram; recoverable labels: P, EN, D]
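
To make the pulse synchronizer of Section 3.9 concrete, here is a minimal VHDL sketch of a toggle-based pulse synchronizer of the general kind described there. It is not a transcription of Figure 11: the entity name, the internal signal names (T, R1, R2, R3) and the receiver-side two-flop chain with an XOR edge detector are all assumptions made for illustration, and the sketch deliberately keeps the weakness discussed in the text.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical entity; names are illustrative, not taken from the paper.
entity pulse_sync_sketch is
  port (
    tx_clock : in  std_logic;
    rx_clock : in  std_logic;
    P        : in  std_logic;  -- single-cycle pulse in the sender's domain
    D        : out std_logic   -- single-cycle pulse in the receiver's domain
  );
end entity pulse_sync_sketch;

architecture rtl of pulse_sync_sketch is
  signal T          : std_logic := '0';  -- sender's toggle flop (assumed name)
  signal R1, R2, R3 : std_logic := '0';  -- receiver's flops (assumed names)
begin

  -- Sender: every tx_clock cycle in which P is high toggles T.
  sender : process (tx_clock)
  begin
    if rising_edge(tx_clock) then
      if P = '1' then
        T <= not T;
      end if;
    end if;
  end process sender;

  -- Receiver: two-flop synchronizer on T, then a registered change detector.
  receiver : process (rx_clock)
  begin
    if rising_edge(rx_clock) then
      R1 <= T;
      R2 <= R1;
      R3 <= R2;
      D  <= R2 xor R3;  -- high for one rx_clock cycle per change seen on R2
    end if;
  end process receiver;

end architecture rtl;
```

In this sketch, holding P high for two consecutive tx_clock cycles toggles T twice and returns it to its earlier value, so a receiver that samples T less often than that may see no change at all. This is one plausible reading of the trouble reported above, not a statement of what the original designer observed.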
[Figure diagram; recoverable labels: SLOW SENDER, FAST RECEIVER, R, EN]

A designer has suggested blocking metastability by the circuit of Figure 13. RESET clears the SR latch and the synchronizing flop. When the clock is high, if INPUT rises, the latch is set. When the clock goes low, the asynchronous input is blocked and only the SR latch output is connected to the flop. When the clock rises, it samples the synchronous output of the latch, rather than the asynchronous input.

[Figure 13 diagram; recoverable labels: INPUT, RESET, CLOCK, SYNCHRONIZED INPUT, SR latch (S, R), MUX (inputs 0 and 1), flop (D, Q, CLR)]
Figure 13: Metastability “blocker”

The designer has missed two problem scenarios, though. If INPUT rises exactly when the clock goes low, the SR latch can become metastable. It will most likely settle by the next rising edge of the clock. In other words, the metastability risk has simply been transferred from the flop to the latch, and one-half clock cycle is allowed for settling. If the proper protocol is employed (e.g., INPUT stays high until acknowledged), the synchronization will function correctly.

The second scenario is more dangerous. If INPUT rises exactly when the clock rises, the SR latch will probably miss it but the flop may become metastable. While the first scenario seems to be handled properly by the circuit (in spite of the designer’s ignorance), the latter case may cause damage in the circuit that follows the flop.

Various “metastability blockers” or circuits that “eliminate” metastability are repeatedly reinvented and occasionally get published. Fortunately, most practitioners have learned to take them with a grain of salt.

Figure 14: Parallel “synchronizer”

This scheme is yet another prescription for sure disaster. On clock-data conflict, each of the several data synchronizers may end up doing something different: Some may sample the new data, others may miss it and retain the old data, while yet others may enter metastability. Of the metastable ones, some may settle to ‘1’ while others may settle to ‘0’. There is no way of telling which is which, as all four options are equally legitimate and possible outcomes.

To emphasize the severity of failure, recall that a typical single synchronizer may enter metastability twice every millisecond, as computed in Section 3.1. Thus, a 32-bit parallel synchronizer faces a risk of failure every 16 microseconds!

Another incarnation of this problem employs three parallel synchronizers and takes a vote of their outputs. Is this any safer than the non-voting parallel synchronizer?

3.13 Shared Flop Synchronizer

The synchronization handshake protocol is sometimes implemented with a signaling latch, set by the sender and cleared by the receiver. A somewhat misleading example based on two signaling flops has been published by a leading FPGA vendor (Figure 15). The problem is that the RECEIVE signal, which is driven by the sender’s clock, is
never synchronized by the receiver’s clock (at least not in the schematics shown in the publication).

[Figure 15 diagram; recoverable labels: SENDER, RECEIVER, DATA, TRANSMIT, RECEIVE, READY, ACK, two flops (D, Q) with inputs tied to 1]
Figure 15: Shared flop “synchronizer”

A better scheme for a shared latch synchronizer (Figure 16) has been shown by Dike [19] and has been employed successfully in a low-voltage product (low supply voltage increases the risk of metastability). The control signals generated by the shared latch are both carefully synchronized with their respective clocks.

[Figure 16 diagram; recoverable labels: ACK, REQ, WRITE CLK, READ CLK, WRITE VALID, READ VALID, S, R, D, Q]
Figure 16: A correct shared latch synchronizer

3.14 Conservative Synchronizer

The careful designer occasionally wishes to be on the safe side and, when synchronization latency is not an issue, adds “just a few more stages” to the synchronizer (Figure 17). While this is not an error, it is interesting to learn what additional level of safety is thus obtained. Considering an SOC with two clock domains where the receiver operates at 200 MHz (a reasonable frequency for the 0.18µm technology), and where data is exchanged every ten clock cycles (as a worst case), and assuming $T_W = 50$ ps, $\tau = 10$ ps (all ‘conservative’ numbers), the normal two-flop MTBF is $e^{500} / (2 \times 10^{5})$ sec $\approx 10^{204}$ years. This is rather safe, when we recall that the age of the universe is $10^{10}$ years. The added cycle time provides an extra safety factor of $e^{500}$, achieving a more comforting level of $10^{420}$ years. Imagine how much better MTBF could have been if you used four flops, rather than three!

4. Conclusions

A few examples of synchronization design errors have been presented and analyzed. As long as there are no fool-proof algorithms and tools to validate synchronizers, the rules of safe design should be closely followed. A strict design methodology and discipline should be enforced, especially prohibiting arbitrary “improvements” of synchronizers and shortcuts in their design and implementation. Optimizations that may impede future design reuse should be avoided. Knowledgeable, rigorous validation should be carried out to verify that all crossings of clock domains are understood and legitimate. Global signals that span multiple domains, such as reset and clocks, should be examined carefully. Such validation should be repeated after every design change and before final design closure.

Present efforts to design synchronizer cell libraries and to develop rigorous tools for synchronization validation may help alleviate these issues and assure safe GALS SOCs.

Synchronization issues may be more difficult to examine and validate with third-party IP cores, and especially “hard” cores whose internal logic design is unknown to the SOC designer. The architect should insist on at least a complete specification of their synchronizing circuits.

A certain type of synchronizer has not been dealt with in this paper, namely fast synchronizers for multi-sync [20] or mesochronous [4, 5] clock domains. Their design and validation are more complex and deserve another paper.

Acknowledgement

The author is grateful to the many imaginative designers whose innovations ended up in this paper. Their names are kept in confidence. The anonymous referees added some interesting examples to this catalog and helped weed out some of the bugs; the author alone should be blamed for any remaining mistakes.
References

[1] J. Jex and C. Dike, "A fast resolving BiNMOS synchronizer for parallel processor interconnect," IEEE Journal of Solid-State Circuits, vol. 30, pp. 133-139, 1995.

[2] C. Dike and E. Burton, "Miller and Noise Effects in a Synchronizing Flip-Flop," IEEE Journal of Solid-State Circuits, vol. 34, pp. 849-855, 1999.

[3] D. J. Kinniment, A. Bystrov, and A. Yakovlev, "Synchronization Circuit Performance," IEEE Journal of Solid-State Circuits, vol. 37, pp. 202-209, 2002.

[4] W. J. Dally and J. W. Poulton, Digital System Engineering: Cambridge University Press, 1998.

[5] T. H.-Y. Meng, Synchronization Design for Digital Systems: Kluwer Academic Publishers, 1991.

[6] D. J. Kinniment and J. V. Woods, "Synchronization and Arbitration Circuits in Digital Systems," Proceedings of the IEE, vol. 123, pp. 961-966, 1976.

[7] T. J. Chaney and C. E. Molnar, "Anomalous Behavior of Synchronizer and Arbiter Circuits," IEEE Transactions on Computers, vol. C-22, pp. 421-422, 1973.

[8] M. Pechoucek, "Anomalous Response Times of Input Synchronizers," IEEE Transactions on Computers, vol. 25, pp. 133-139, 1976.

[9] W. Fleischhammer and O. Dortok, "The anomalous behavior of flip-flops in synchronizer circuits," IEEE Transactions on Computers, vol. 28, pp. 273-276, 1979.

[10] H. J. M. Veendrick, "The Behavior of Flip-Flops Used as Synchronizers and Prediction of Their Failure Rate," IEEE Journal of Solid-State Circuits, vol. 15, pp. 169-176, 1980.

[11] L. Kleeman and A. Cantoni, "Can redundancy and masking improve the performance of synchronizers," IEEE Transactions on Computers, vol. 35, pp. 643-646, 1986.

[12] Y. Semiat and R. Ginosar, "Timing Measurements of Synchronization Circuits," under http://www.ee.technion.ac.il/~ran --> publications.

[13] P. Day and J. V. Woods, "Investigation into Micropipeline Latch Design Styles," IEEE Transactions on VLSI Systems, vol. 3, pp. 264-272, 1995.

[14] A. Peeters and K. v. Berkel, "Single-Rail Handshake Circuits," in Asynchronous Design Methodologies: IEEE Computer Society Press, 1995, pp. 53-62.

[15] "Clock Domain Checker User Manual," Avant! Corporation v2001.3, 2001.

[16] atHDL, "Multiple Clock Domain Analysis," www.athdl.com.

[17] A. V. Yakovlev, "On Limitations and Extensions of STG model for Designing Asynchronous Control Circuits," in Proc. International Conf. Computer Design (ICCD): IEEE Computer Society Press, 1992, pp. 396-400.

[18] Principles of Asynchronous Circuit Design: A Systems Perspective, S. Furber (Eds.): Kluwer Academic Publishers, 2001.

[19] C. Dike, "Synchronization Tutorial," presented at Sixth International Symposium on Advanced Research in Asynchronous Circuits and Systems (ASYNC2000), 2000.

[20] R. Ginosar and R. Kol, "Adaptive Synchronization," in Proc. International Conf. Computer Design (ICCD), 1998, pp. 188-189.