Globally Asynchronous, Locally Synchronous Circuits: Overview and Outlook
Miloš Krstić and Eckhard Grass, IHP Microelectronics
Frank K. Gürkaynak, Swiss Federal Institute of Technology Lausanne
Pascal Vivet, CEA-LETI
0740-7475/07/$25.00 © 2007 IEEE. Copublished by the IEEE CS and the IEEE CASS. IEEE Design & Test of Computers
Figure 1. Globally asynchronous, locally synchronous (GALS) system with pausible clocking.
and power consumption, but for general implementations the overhead of adding asynchronous wrappers resulted in performance penalties. In any case, reported improvements were not large enough to warrant a change in design methodology. More recent GALS solutions have focused more on facilitating system integration, reducing EMI, and providing side-channel security. Although the GALS approach has not yet profoundly affected IC design, its system integration aspects will provide another opportunity for its wider adoption.

In this article, based on our broad practical experience, we present different GALS techniques and architectures, and we analyze the challenges and possibilities for wider adoption of these methods.

which avoids metastability by ensuring no clock pulses are generated when data is transferred.
• FIFO buffers: using asynchronous FIFO buffers between locally synchronous blocks to hide the synchronization problem. A SoC architecture that uses distinct clock domains connected through bisynchronous FIFO buffers is commonly called a GALS system. In our case, however, we refer only to pure GALS systems, in which the blocks are connected asynchronously.
• Boundary synchronization: performing boundary synchronization on the signals crossing the borders of the locally synchronous island without stopping the complete locally synchronous block during data transfer.
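
The FIFO-buffer approach described above can be illustrated with a small behavioral sketch. It assumes a power-of-two depth and read/write pointers that are exchanged in Gray code, a common choice in bisynchronous FIFO designs because only one pointer bit changes per increment. The names `bin_to_gray`, `gray_to_bin`, and `GrayPointerFifo` are hypothetical and not taken from the article, and the model runs in a single thread: it shows the pointer encoding and the empty/full flags only, not the clock domains or the synchronizer stages themselves.

```python
def bin_to_gray(n: int) -> int:
    """Convert a binary count to reflected Gray code."""
    return n ^ (n >> 1)


def gray_to_bin(g: int) -> int:
    """Convert reflected Gray code back to a binary count."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n


class GrayPointerFifo:
    """Behavioral FIFO whose pointers are exposed in Gray code.

    Pointers carry one extra wrap bit so that 'full' and 'empty'
    can be told apart when both index the same memory slot.
    """

    def __init__(self, depth_log2: int) -> None:
        self.depth = 1 << depth_log2
        self.mask = (1 << (depth_log2 + 1)) - 1  # index bits plus wrap bit
        self.mem = [None] * self.depth
        self.wr = 0  # binary write pointer (writer side)
        self.rd = 0  # binary read pointer (reader side)

    def wr_gray(self) -> int:
        """Write pointer as it would be passed to the reader side."""
        return bin_to_gray(self.wr)

    def rd_gray(self) -> int:
        """Read pointer as it would be passed to the writer side."""
        return bin_to_gray(self.rd)

    def empty(self) -> bool:
        # Equal Gray pointers mean nothing is stored.
        return self.wr_gray() == self.rd_gray()

    def full(self) -> bool:
        # Decode both pointers and compare their distance with the depth.
        diff = (gray_to_bin(self.wr_gray()) - gray_to_bin(self.rd_gray())) & self.mask
        return diff == self.depth

    def push(self, item) -> bool:
        """Store an item; return False (flow control) when the FIFO is full."""
        if self.full():
            return False
        self.mem[self.wr % self.depth] = item
        self.wr = (self.wr + 1) & self.mask
        return True

    def pop(self):
        """Remove and return the oldest item; raise IndexError when empty."""
        if self.empty():
            raise IndexError("pop from empty FIFO")
        item = self.mem[self.rd % self.depth]
        self.rd = (self.rd + 1) & self.mask
        return item
```

In this sketch the empty and full flags play the flow-control role: a writer that sees `push` return `False` must wait, and a reader checks `empty()` before calling `pop`. A hardware implementation would additionally pass each Gray-coded pointer through a synchronizer in the receiving clock domain, which this model leaves out.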
September–October 2007
Globally Asynchronous, Locally Synchronous Design and Test
both clocks. The basic GALS method focuses on point-to-point communication between blocks.

FIFO solutions
Another approach to interfacing locally synchronous blocks is using specially designed asynchronous FIFO buffers [8–10] and hiding the system synchronization problem within the FIFO buffers. Such a system can tolerate very large interconnect delays and is also robust with regard to metastability. Designers can use this method to interconnect asynchronous and synchronous systems and also to construct synchronous-synchronous and asynchronous-asynchronous interfaces. Figure 2 diagrams a typical FIFO interface, which achieves an acceptable data throughput [8]. In addition to the data cells, the FIFO structure includes an empty/full detector and a special deadlock detector.

Figure 2. Typical FIFO-based GALS system.

The advantage of FIFO synchronizers is that they don't affect the locally synchronous module's operation. However, with very wide interconnect data buses, FIFO structures can be costly in silicon area. Also, they require specialized complex cells to generate the empty/full flags used for flow control. The introduced latency might be significant and unacceptable for high-speed applications.

As an alternative, Beigne and Vivet designed a synchronous-asynchronous FIFO based on the bisynchronous classical FIFO design using Gray code, for the specific case of an asynchronous network-on-chip (NoC) interface [10]. Their aim was to maintain compatibility with existing design solutions and to use standard CAD tools. Thus, even with some performance degradation or suboptimal architecture, designers can achieve the main goal of designing GALS systems in the standard design environment.

Boundary synchronization
A third solution is to perform data synchronization at the borders of the locally synchronous island, without affecting the inner operation of locally synchronous blocks and without relying on FIFO buffers. For this purpose, designers can use standard two-flop, one-flop, predictive, or adaptive synchronizers for mesochronous systems, or locally delayed latching [1, 11]. This method can achieve very reliable data transfer between locally synchronous blocks. On the other hand, such solutions generally increase latency and reduce data throughput, resulting in limited applicability for high-speed systems. Table 1 summarizes the properties of GALS systems' synchronization methods.

Table 1. Properties of GALS systems' synchronization methods.
Property | Pausible clocking [3, 4, 6] | FIFO-based [8–10] | Boundary synchronization [11, 12]
Area overhead | Low | Medium to high | Low
Latency | Low | High | Medium
Throughput | Lowered according to clock pause rate | High | Medium
Power consumption | Low | High | Medium
Additional cells | Mutex, delay-line, Muller-C | Empty/full flag | Muller-C, mutex
Advantages | No metastability | Simple solution, throughput | Low overhead
Disadvantages | Local clock generators, throughput | Area overhead, latency | Requires verification, throughput

Advantages and limitations of GALS solutions
The scientific community has shown great interest in GALS solutions and architectures in the past two decades. However, this interest hasn't culminated in many commercial applications, despite all reported advantages. There are several reasons why standard design practice has not adopted GALS techniques.

Design and system integration issues
Many proposed solutions require programmable ring oscillators. This is an inexpensive solution that allows full control of the local clock. However, it has significant drawbacks. Ring oscillators are impractical for industrial use. They need careful calibration because they are very sensitive to process, voltage, and temperature variations. Moreover, embedded ring oscillators consume additional power through continuous switching of the chained inverters.

On the other hand, careful design of the delay line can reduce its power consumption to a level below that of a corresponding clock tree. In addition,
programmable delay lines offer a great opportunity to easily build dynamic frequency-scaling systems, which enable reduced power dissipation at the system level.

Contrary to earlier expectations, GALS-based solutions don't automatically offer performance gains. Interblock communication incurs some penalty in all GALS systems. In pausible-clock systems, the clock can be stretched when transferring data on slow communication links, reducing the locally synchronous modules' operating frequency. FIFO-based systems, depending on the communication link, suffer from additional latency. If designed carefully, performance degradation in a GALS system will be insignificant; however, in some examples (for various reasons), the reported performance degradation of the GALS system was as high as 23% [4].

The GALS approach is a vehicle for block interconnects. The crucial parameters for such an application are data throughput and latency. For many GALS solutions, the problem of data throughput is critical. Some pausible-clocking schemes can theoretically reach a maximum data throughput of one data item per clock cycle [4]. However, more often, data transfers are limited to every second clock cycle or even every fourth or fifth clock cycle of the locally synchronous block. In addition, in a multiport environment, intensive data transfers significantly degrade performance. For FIFO-based solutions, the throughput problem is less severe, but latency increases.

IC testability is a crucial issue in industrial applications. For a chip to operate outside the lab environment, it must be extensively tested. Functional testing of asynchronous circuits is very difficult because most ATE is cycle based and cannot provide event-based handshake signals. For GALS circuits, the process of arbitration and stretching leads to nondeterministic timing behavior. Therefore, the test result can differ from chip to chip and from test run to test run.

For synchronous circuits, the usual test strategy is to apply a scan chain. If we can design independent scan chains for different locally synchronous blocks and provide a method for the ATE to access these scan chains, we can also use this method for GALS systems. However, the question of how to test the asynchronous part of the chip remains. Fortunately, the problem is not as severe as in fully asynchronous circuits, because the number of asynchronous gates in a typical GALS system is comparatively small. To test the asynchronous part of a GALS system, we can use scan-based methods for asynchronous circuits [13], or we can devise specialized functional tests that cover most faults in these gates.

The main requirement for widespread acceptance of GALS-based design techniques is a stable and reliable design flow. Currently, support for a mixed asynchronous-synchronous design flow from commercial CAD suppliers is limited. Consequently, the design flow is not automated and is usually based on a mix of commercial synchronous CAD tools, supplemented by asynchronous tools from academic institutions and many customized, manual steps. Additionally, to achieve reliable operation of asynchronous components, it is necessary to generate additional standardized cells for mutual exclusion, programmable delay elements, C-elements, and even complete handshaking circuits [14]. Some companies, such as Silistix (http://www.silistix.com), offer ways to cope with GALS design flow issues. However, their approaches are not very general, because they focus on supporting their own interconnect.

Power reduction possibilities and limitations
Talpes and Marculescu performed an evaluation of the power-saving potential of GALS systems [15]. Their
investigation, based on the application of GALS to high-speed processor implementations, showed some general trends. In such environments, the clock signal is the dominant source of power consumption. Initially, the researchers assumed that splitting the clock network into several smaller subnetworks would lead to lower overall power consumption. The basic idea was that the locally synchronous block would be clocked only when data was to be transferred or processed, reducing the number of unproductive clock cycles to a minimum. After modeling a GALS superscalar processor, the researchers demonstrated that a GALS approach can actually lead to a performance drop. The achieved power reduction was not very impressive either. They estimated the drop in performance at about 5% to 25%, and observed an increase of around 1% in energy consumption in some cases.

However, GALS techniques allow each locally synchronous module's frequency and voltage to be set independently, making scaling far more convenient than with the standard synchronous approach. It's possible to set the optimal frequency for a GALS module, because all interblock communication is performed asynchronously. The block boundaries are clearly defined, and the GALS partitioning naturally leads to a hierarchical layout process, which eases the introduction of various power rings in the layout and the insertion of DC-DC converters. Talpes and Marculescu also investigated possible power savings from GALS in conjunction with dynamic voltage and clock frequency scaling (DVS). They estimated that using DVS in a GALS system can achieve an average energy reduction of up to 33%, with a slight performance drop of 10% [15]. However, even in purely synchronous systems, the performance reduction together with voltage scaling can lead to significant energy savings.

The GALS approach on its own has more or less the same limitations for power reduction as clock gating in synchronous circuits. Both approaches rest on the same paradigm: discarding unnecessary clock cycles. One advantage of GALS-based systems is the additional power savings resulting from simpler clock trees. However, this improvement is very limited. The only possibility for significant improvement over synchronous low-power methods is combining a GALS approach with voltage and frequency scaling. This method may bring power savings in SoCs that address multiple applications and modes. In that case, we will be able to dynamically tune the constraints, the performance, and thus the power of the SoC's local units.

EMI reduction
For many applications, lowering the level of noise generated from digital circuits is important. In a mixed-signal system, the noise generated from the digital "aggressor" can adversely affect the analog part's operation or even cause its total malfunction. GALS methods, however, can significantly reduce the noise generated on power supply lines.

To estimate the effect of GALS application on EMI characteristics, researchers developed a Matlab model of the supply current variation of externally and internally driven GALS systems and compared them with an equivalent synchronous system [6]. According to this analysis, GALS reduces the maximum spectral peak by around 20 dB. For a wide range of frequencies, the spectral components in the GALS system are at least 10 dB lower than those of the synchronous circuit. Furthermore, in the time domain, the supply current peaks are about 40% lower than those of the synchronous system. This maximum peak current reduction can reduce the chip's power supply network, as well as the total number of power pads, leading to significant savings in area.

The goal of the GALS design methodology is primarily to combine the advantages of asynchronous design techniques with the convenience of using a well-supported digital design methodology. Many current asynchronous circuit designs are geared toward secure cryptographic chips. Because an asynchronous circuit's power spectrum does not contain large peaks at multiples of a global clock frequency, some researchers believe it reveals less information about the circuit's operation. Furthermore, because of their asynchronous components, GALS chips are less controllable and their timing is less predictable. Thus, it might be possible to develop GALS-based systems that provide increased immunity against differential power analysis attacks, a major threat to secure hardware implementations such as smart cards [16].

GALS techniques in the research community
Over the past 20 years, many publications on the GALS approach have appeared. We have performed a search to see trends in the research community. We tried to find as many Internet publications as possible
GALS implementation
Feature | Acacia | Faust | IHP baseband processor
Designed by | ETHZ | CEA-LETI | IHP Microelectronics
Process (μm) | 0.25 | 0.13 | 0.25
Area (mm²) | 1.1 | 80 | 45
Clock frequency (MHz) | 80–200 | 160–250 | 20–80
GALS type | Pausible clock | FIFO | Request-driven (pausible clock)
* CEA-LETI: French Atomic Energy Commission Laboratory for Electronics and Information Technology; ETHZ: Swiss Federal Institute of Technology Zurich; Faust: Flexible Architecture of a Unified System for Telecommunications.
4. J. Muttersbach, T. Villiger, and W. Fichtner, "Practical Design of Globally-Asynchronous Locally-Synchronous Systems," Proc. 6th Int'l Symp. Advanced Research in Asynchronous Circuits and Systems (ASYNC 00), IEEE CS Press, 2000, pp. 52-59.
5. J. Kessels et al., "Clock Synchronization through Handshake Signalling," Proc. 8th Int'l Symp. Asynchronous Circuits and Systems (ASYNC 02), IEEE CS Press, 2002, pp. 59-68.
6. M. Krstić et al., "System Integration by Request-Driven GALS Design," IEE Proc. Computers & Digital Techniques, vol. 153, no. 5, Sept. 2006, pp. 362-372.
7. R. Mullins and S. Moore, "Demystifying Data-Driven and Pausible Clocking Schemes," Proc. 13th IEEE Int'l Symp. Asynchronous Circuits and Systems (ASYNC 07), IEEE CS Press, 2007, pp. 175-185.
8. T. Chelcea and S. Nowick, "Low-Latency Asynchronous FIFO's Using Token Rings," Proc. 6th Int'l Symp. Advanced Research in Asynchronous Circuits and Systems (ASYNC 00), IEEE CS Press, 2000, pp. 210-220.
9. A. Chakraborty and M. Greenstreet, "Efficient Self-Timed Interfaces for Crossing Clock Domains," Proc. 9th IEEE Int'l Symp. Asynchronous Circuits and Systems (ASYNC 03), IEEE CS Press, 2003, pp. 78-88.
10. E. Beigne and P. Vivet, "Design of On-Chip and Off-Chip Interfaces for a GALS NoC Architecture," Proc. 12th IEEE Int'l Symp. Asynchronous Circuits and Systems (ASYNC 06), IEEE CS Press, 2006, pp. 172-181.
11. R. Dobkin, R. Ginosar, and C. Sotiriu, "Data Synchronization Issues in GALS SoCs," Proc. 10th IEEE Int'l Symp. Asynchronous Circuits and Systems (ASYNC 04), IEEE CS Press, 2004, pp. 170-179.
12. T. Bjerregaard et al., "An OCP Compliant Network Adapter for GALS-Based SoC Design Using the MANGO Network-on-Chip," Proc. Int'l Symp. System-on-Chip (SoC 05), IEEE Press, 2005, pp. 171-174.
13. K. van Berkel, A. Peeters, and F. te Beest, "Adding Synchronous and LSSD Modes to Asynchronous Circuits," Proc. 8th IEEE Int'l Symp. Asynchronous Circuits and Systems (ASYNC 02), IEEE CS Press, 2002, pp. 161-170.
14. F. Gürkaynak et al., "GALS at ETH Zurich: Success or Failure?" Proc. 12th IEEE Int'l Symp. Asynchronous Circuits and Systems (ASYNC 06), IEEE CS Press, 2006, pp. 150-159.
15. E. Talpes and D. Marculescu, "Toward a Multiple Clock/Voltage Island Design Style for Power-Aware Processors," IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 13, no. 5, May 2005, pp. 591-603.
16. F.K. Gürkaynak et al., "Improving DPA Security by Using Globally-Asynchronous Locally-Synchronous Systems," Proc. 31st European Solid-State Circuits Conf. (ESSCIRC 05), IEEE Press, 2005, pp. 407-410.
17. D. Lattard et al., "A Telecom Baseband Circuit Based on an Asynchronous Network-on-Chip," Proc. Int'l Solid-State Circuits Conf. (ISSCC 07), IEEE Press, 2007, pp. 258-601.
18. R. Dobkin et al., "An Asynchronous Router for Multiple Service Levels Networks on Chip," Proc. 11th IEEE Int'l Symp. Asynchronous Circuits and Systems (ASYNC 05), IEEE CS Press, 2005, pp. 44-53.
19. G. DeMicheli and L. Benini, Networks on Chips: Technology and Tools (Systems on Silicon), Morgan Kaufmann, 2006.
20. Z. Yu and B.M. Baas, "Implementing Tile-Based Chip Multiprocessors with GALS Clocking Styles," Proc. IEEE Int'l Conf. Computer Design (ICCD 06), IEEE Press, 2006, pp. 174-180.

Miloš Krstić is a research associate at IHP Microelectronics, Frankfurt (Oder), Germany. His research interests include low-power digital design for wireless applications and globally asynchronous, locally synchronous (GALS) methodologies for digital-systems integration. Krstić has a Dipl-Ing in electronics and communications and an MSc in electronics from the University of Niš, Serbia, and a Dr-Ing in electronics from Brandenburg University of Technology, Cottbus, Germany.

Eckhard Grass is a research fellow at IHP Microelectronics, Frankfurt (Oder), Germany, where he leads a project on the development and implementation of a wireless broadband communication system in the 60-GHz band. His research interests include data-driven (asynchronous) signal-processing structures and low-power VLSI implementation of communication systems. Grass has a Dr-Ing in electronics from Humboldt University, Berlin.

Frank K. Gürkaynak is a research associate at the Swiss Federal Institute of Technology Lausanne, where he works on lab-on-chip systems. His research interests include