
Globally Asynchronous, Locally Synchronous PDF

Globally Asynchronous, Locally Synchronous (GALS) design addresses challenges in integrating complex digital blocks by allowing them to operate synchronously locally but communicate asynchronously globally. There are three main GALS strategies: 1) Using pausible clock generators to stop local clocks during data transfer, 2) Adding asynchronous FIFO buffers between blocks to hide synchronization, and 3) Performing synchronization only at block borders without stopping local clocks. GALS aims to improve modularity, scalability, and reduce power compared to traditional synchronous designs.


Globally Asynchronous, Locally Synchronous Design and Test

Globally Asynchronous, Locally Synchronous Circuits: Overview and Outlook

Miloš Krstić and Eckhard Grass, IHP Microelectronics
Pascal Vivet, CEA-LETI
Frank K. Gürkaynak, Swiss Federal Institute of Technology Lausanne

0740-7475/07/$25.00 © 2007 IEEE. Copublished by the IEEE CS and the IEEE CASS. IEEE Design & Test of Computers, September–October 2007.

Editor's note
This article provides a pragmatic survey on the state of the art in GALS architectural techniques, design flows, and applications. The authors also prescribe several industrial inventions and changes in methodology, tools, and design flow that would improve GALS-based integration of IP blocks.
Sandeep Shukla, Virginia Tech

The increased complexity of digital circuits leads to severe challenges in the design process. Most modern digital systems are implemented as SoCs. Consequently, system integration has become a crucial problem. The SoC designer faces physical-design issues such as global clock tree synthesis and top-level timing optimization. Even if technology scaling offers more integration possibilities, modularity and scalability at the physical level are more difficult to achieve. In addition, SoCs frequently incorporate several analog subblocks such as phase-locked loops and A/D and D/A converters. The clock signal used for the digital part is a very strong noise source for the analog part. Therefore, electromagnetic interference (EMI) effects must be reduced as much as possible. Finally, especially for mobile devices, reducing power consumption is highly important. The modern design flow should incorporate all possible tools for coping with these issues. A promising option for dealing with such design challenges is the deployment of globally asynchronous, locally synchronous (GALS) systems.

A GALS system consists of complex digital blocks operating synchronously. Those blocks are usually developed using standard synchronous CAD tools and design flow. However, the operation of the blocks is not mutually synchronized, hence the term locally synchronous. These locally synchronous blocks communicate with one another asynchronously; on the block level (globally), the system is asynchronous. A common approach is to add an asynchronous wrapper, which provides an interface from the synchronous to the asynchronous environment (and vice versa), to every locally synchronous block. The asynchronous wrapper also controls asynchronous communication between locally synchronous blocks. In GALS system design, the main issue is designing reliable GALS interfaces to handle the problem of metastability, which can occur between synchronous and asynchronous logic domains [1].

The earliest GALS proposals appeared in the 1980s [2]. However, interest increased in the mid-1990s and early 2000s, with the first practical proposals for pausible (or stretchable) clocking [3,4]. Since then, GALS proposals have featured various approaches [5,6]. Earlier solutions were designed to improve throughput, reduce area, and reduce power consumption. Certain test cases demonstrated benefits in operation speed, circuit area,
and power consumption, but for general implementations the overhead of adding asynchronous wrappers resulted in performance penalties. In any case, reported improvements were not large enough to warrant a change in design methodology. More recent GALS solutions have focused more on facilitating system integration, reducing EMI, and providing side-channel security. Although the GALS approach has not yet profoundly affected IC design, its system integration aspects will provide another opportunity for its wider adoption.

In this article, based on our broad practical experience, we present different GALS techniques and architectures, and we analyze the challenges and possibilities for wider adoption of these methods.

Existing GALS solutions

Several GALS methods address the problem of safe and reliable data transfer between independent clock domains. For example, Mullins and Moore give a detailed GALS analysis based on the clock generation processes and I/O port operations of the various methods [7]. In this article, we use a taxonomy based on the hardware architecture used to transfer data safely. This leads to three main strategies for implementing GALS systems:

• Pausible-clock generators: applying local (pausible, stretchable, or data-driven) clocking, which avoids metastability by ensuring no clock pulses are generated when data is transferred.
• FIFO buffers: using asynchronous FIFO buffers between locally synchronous blocks to hide the synchronization problem. A SoC architecture that uses distinct clock domains connected through bisynchronous FIFO buffers is commonly called a GALS system. In our case, however, we refer only to pure GALS systems, in which the blocks are connected asynchronously.
• Boundary synchronization: performing boundary synchronization on the signals crossing the borders of the locally synchronous island without stopping the complete locally synchronous block during data transfer.

Figure 1. Globally asynchronous, locally synchronous (GALS) system with pausible clocking.

GALS wrapper with pausible clocking

Many GALS systems presented in the past few years use pausible (or stretchable) clocking [3,4]. The basic idea of all these proposals is similar: transferring data between wrappers when both the data transmitter and data receiver clocks are stopped. This elegantly solves the problem of synchronization between the two clock domains. Figure 1 illustrates the general structure of such a system. The asynchronous wrapper contains input and output ports that perform the handshake process between the locally synchronous modules, and it generates a stretch signal to stop the activity of


both clocks. The basic GALS method focuses on point-to-point communication between blocks.

FIFO solutions

Another approach to interfacing locally synchronous blocks is using specially designed asynchronous FIFO buffers [8–10] and hiding the system synchronization problem within the FIFO buffers. Such a system can tolerate very large interconnect delays and is also robust with regard to metastability. Designers can use this method to interconnect asynchronous and synchronous systems and also to construct synchronous-synchronous and asynchronous-asynchronous interfaces. Figure 2 diagrams a typical FIFO interface, which achieves an acceptable data throughput [8]. In addition to the data cells, the FIFO structure includes an empty/full detector and a special deadlock detector.

Figure 2. Typical FIFO-based GALS system.

The advantage of FIFO synchronizers is that they don't affect the locally synchronous module's operation. However, with very wide interconnect data buses, FIFO structures can be costly in silicon area. Also, they require specialized complex cells to generate the empty/full flags used for flow control. The introduced latency might be significant and unacceptable for high-speed applications.

As an alternative, Beigne and Vivet designed a synchronous-asynchronous FIFO based on the bisynchronous classical FIFO design using Gray code, for the specific case of an asynchronous network-on-chip (NoC) interface [10]. Their aim was to maintain compatibility with existing design solutions and to use standard CAD tools. Thus, even with some performance degradation or suboptimal architecture, designers can achieve the main goal of designing GALS systems in the standard design environment.

Boundary synchronization

A third solution is to perform data synchronization at the borders of the locally synchronous island, without affecting the inner operation of locally synchronous blocks and without relying on FIFO buffers. For this purpose, designers can use standard two-flop, one-flop, predictive, or adaptive synchronizers for mesochronous systems, or locally delayed latching [1,11]. This method can achieve very reliable data transfer between locally synchronous blocks. On the other hand, such solutions generally increase latency and reduce data throughput, resulting in limited applicability for high-speed systems. Table 1 summarizes the properties of GALS systems' synchronization methods.

Advantages and limitations of GALS solutions

The scientific community has shown great interest in GALS solutions and architectures in the past two decades. However, this interest hasn't culminated in many commercial applications, despite all reported advantages. There are several reasons why standard design practice has not adopted GALS techniques.

Design and system integration issues

Many proposed solutions require programmable ring oscillators. This is an inexpensive solution that allows full control of the local clock. However, it has significant drawbacks. Ring oscillators are impractical for industrial use. They need careful calibration because they are very sensitive to process, voltage, and temperature variations. Moreover, embedded ring oscillators consume additional power through continuous switching of the chained inverters.

On the other hand, careful design of the delay line can reduce its power consumption to a level below that of a corresponding clock tree. In addition,



programmable delay lines offer a great opportunity to easily build dynamic frequency-scaling systems, which enable reduced power dissipation at the system level.

Table 1. Properties of GALS techniques.

Property | Pausible clocking [3,4,6] | FIFO-based [8–10] | Boundary synchronization [11,12]
Area overhead | Low | Medium to high | Low
Latency | Low | High | Medium
Throughput | Lowered according to clock pause rate | High | Medium
Power consumption | Low | High | Medium
Additional cells | Mutex, delay-line, Muller-C | Empty/full flag | Muller-C, mutex
Advantages | No metastability | Simple solution, throughput | Low overhead
Disadvantages | Local clock generators, throughput | Area overhead, latency | Requires verification, throughput

Contrary to earlier expectations, GALS-based solutions don't automatically offer performance gains. Interblock communication incurs some penalty in all GALS systems. In pausible-clock systems, the clock can be stretched when transferring data on slow communication links, reducing the locally synchronous modules' operating frequency. FIFO-based systems, depending on the communication link, suffer from additional latency. If designed carefully, performance degradation in a GALS system will be insignificant; however, in some examples (for various reasons), the reported performance degradation of the GALS system was as high as 23% [4].

The GALS approach is a vehicle for block interconnects. Crucial parameters for such an application are data throughput and latency. For many GALS solutions, the problem of data throughput is critical. Some pausible-clocking schemes can theoretically reach a maximum data throughput of one data item per clock cycle [4]. However, more often, data transfers are limited to every second clock cycle or even every fourth or fifth clock cycle of the locally synchronous block. In addition, in a multiport environment, the intensive data transfers significantly degrade performance. For FIFO-based solutions, the throughput problem is less severe, but latency increases.

IC testability is a crucial issue in industrial applications. For a chip to operate outside the lab environment, it must be extensively tested. Functional test of asynchronous circuits is very difficult because most automated test equipment (ATE) is cycle based and cannot provide event-based handshake signals. For GALS circuits, the process of arbitration and stretching leads to nondeterministic timing behavior. Therefore, the test result can differ from chip to chip and from test run to test run.

For synchronous circuits, the usual test strategy is to apply a scan chain. If we can design independent scan chains for different locally synchronous blocks and provide a method for the ATE to access these scan chains, we can also use this method for GALS systems. However, the question of how to test the asynchronous part of the chip remains. Fortunately, the problem is not as severe as in fully asynchronous circuits, because the number of asynchronous gates in a typical GALS system is comparatively small. To test the asynchronous part of a GALS system, we can use scan-based methods for asynchronous circuits [13], or we can devise specialized functional tests that cover most faults in these gates.

The main requirement for widespread acceptance of GALS-based design techniques is a stable and reliable design flow. Currently, support for a mixed asynchronous-synchronous design flow from commercial CAD suppliers is limited. Consequently, the design flow is not automated and is usually based on a mix of commercial synchronous CAD tools, supplemented by asynchronous tools from academic institutions and many customized, manual steps. Additionally, to achieve reliable operation of asynchronous components, it is necessary to generate additional standardized cells for mutual exclusion, programmable delay elements, C-elements, and even complete handshaking circuits [14]. Some companies, such as Silistix (http://www.silistix.com), offer ways to cope with GALS design flow issues. However, their approaches are not very general, because they focus on supporting their own interconnect.

Power reduction possibilities and limitations

Talpes and Marculescu performed an evaluation of the power-saving potential of GALS systems [15]. Their


investigation, based on the application of GALS to high-speed processor implementations, showed some general trends. In such environments, the clock signal is the dominant source of power consumption. Initially, the researchers assumed that splitting the clock network into several smaller subnetworks would lead to lower overall power consumption. The basic idea was that the locally synchronous block would be clocked only when data was to be transferred or processed, reducing the number of unproductive clock cycles to a minimum. After modeling a GALS superscalar processor, the researchers demonstrated that a GALS approach can actually lead to a performance drop. The achieved power reduction was not very impressive either. They estimated the drop in performance at about 5% to 25%, and observed an increase of around 1% in energy consumption in some cases.

However, GALS techniques allow each locally synchronous module's frequency and voltage to be set independently, making scaling far more convenient than with the standard synchronous approach. It's possible to set the optimal frequency for a GALS module, because all interblock communication is performed asynchronously. The block boundaries are clearly defined, and the GALS partitioning naturally leads to a hierarchical layout process, which eases the introduction of various power rings in the layout and the insertion of DC-DC converters. Talpes and Marculescu also investigated possible power savings from GALS in conjunction with dynamic voltage and clock frequency scaling (DVS). They estimated that using DVS in a GALS system can achieve an average energy reduction of up to 33%, with a slight performance drop of 10% [15]. However, even in purely synchronous systems, the performance reduction together with voltage scaling can lead to significant energy savings.

The GALS approach on its own has more or less the same limitations for power reduction as clock gating in synchronous circuits. Both approaches rest on the same paradigm: discarding unnecessary clock cycles. One advantage of GALS-based systems is the additional power savings resulting from simpler clock trees. However, this improvement is very limited. The only possibility for significant improvement over synchronous low-power methods is combining a GALS approach with voltage and frequency scaling. This method may bring power savings in SoCs that address multiple applications and modes. In that case, we will be able to dynamically tune the constraints, the performance, and thus the power of the SoC's local units.

EMI reduction

For many applications, lowering the level of noise generated from digital circuits is important. In a mixed-signal system, the noise generated from the digital "aggressor" can adversely affect the analog part's operation or even cause its total malfunction. GALS methods, however, can significantly reduce the noise generated on power supply lines.

To estimate the effect of GALS application on EMI characteristics, researchers developed a Matlab model of the supply current variation of externally and internally driven GALS systems and compared them with an equivalent synchronous system [6]. According to this analysis, GALS reduces the maximum spectral peak by around 20 dB. For a wide range of frequencies, the spectral components in the GALS system are at least 10 dB lower than those of the synchronous circuit. Furthermore, in the time domain, the supply current peaks are about 40% lower than those of the synchronous system. This maximum peak current reduction can reduce the size of the chip's power supply network, as well as the total number of power pads, leading to significant savings in area.

The goal of the GALS design methodology is primarily to combine the advantages of asynchronous design techniques with the convenience of using a well-supported digital design methodology. Many current asynchronous circuit designs are geared toward secure cryptographic chips. Because an asynchronous circuit's power spectrum does not contain large peaks at multiples of a global clock frequency, some researchers believe it reveals less information about the circuit's operation. Furthermore, because of their asynchronous components, GALS chips are less controllable and their timing is less predictable. Thus, it might be possible to develop GALS-based systems that provide increased immunity against differential power analysis attacks, a major threat to secure hardware implementations such as smart cards [16].

GALS techniques in the research community

Over the past 20 years, many publications on the GALS approach have appeared. We have performed a search to see trends in the research community. We tried to find as many Internet publications as possible



explicitly dealing with GALS methods, architectures, or evaluations. Figure 3 shows the results of our study. We based our search on leading asynchronous conferences such as the International Symposium on Asynchronous Circuits and Systems (ASYNC), the Asynchronous Circuit Design (ACiD) Workshop, the FM GALS Workshop, and the International Workshop on Power and Timing Modeling, Optimization, and Simulation (PATMOS). We included all important references found in the relevant papers.

Figure 3. Number of published works on GALS methods, architectures, or evaluations.

An interesting point is the number of publications that feature a practical demonstration of GALS techniques. The number of practical GALS demonstrators and GALS design cases (the "Applications" category in the figure) has increased over the years. The greatest increase in interest in the GALS approach occurred in 2002. Since then, interest has been more or less constant. Another interesting point is that research activities to develop new GALS architectures were stronger earlier. Recently, fewer papers have dealt with this issue. The most frequent topics are power and performance analysis of existing GALS systems, and modeling and formal verification.

GALS techniques in practice

So far, industrial application of GALS techniques has been extremely limited, and fully functional integrated GALS systems are extremely rare. We now present three GALS implementations, which to our knowledge are the main published GALS demonstrations. Table 2 lists the main design parameters of these designs. Each design targets a different application and embodies design decisions that result from practical requirements and the general research direction of the R&D group that created them. Hence, it is virtually impossible to compare their performance.

Table 2. Practical GALS demonstrators.*

Feature | Acacia | Faust | IHP baseband processor
Designed by | ETHZ | CEA-LETI | IHP Microelectronics
Process (µm) | 0.25 | 0.13 | 0.25
Area (mm²) | 1.1 | 80 | 45
Clock frequency (MHz) | 80–200 | 160–250 | 20–80
GALS type | Pausible clock | FIFO | Request-driven (pausible clock)

* CEA-LETI: French Atomic Energy Commission Laboratory for Electronics and Information Technology; ETHZ: Swiss Federal Institute of Technology Zurich; Faust: Flexible Architecture of a Unified System for Telecommunications.


Instead, we give a brief summary of these designs, describe how they profit from the GALS approach, and describe the designers' experiences in implementing them.

Acacia

As mentioned earlier, several researchers believe that asynchronous design methods are well-suited to implementing secure cryptographic hardware. The Swiss Federal Institute of Technology Zurich (ETHZ) designed the Acacia chip (see Figure 4) to explore opportunities of using the GALS design methodology in developing cryptographic hardware with increased resistance to side-channel attacks [16]. The chip implements the common 128-bit advanced encryption algorithm by using three locally synchronous blocks. In addition to implementing well-known countermeasures, Acacia allows pseudorandom changing of each locally synchronous island's clock period. Coupled with other countermeasures, this makes it increasingly difficult (if not impossible) for an attacker to sample the chip's power consumption at a given state over many operations (a key requirement for performing differential power analysis, a common side-channel attack). Like earlier ETHZ designs, this chip's GALS technique is based on the pausible-clocking scheme [4]. The designers did not use the GALS approach to reduce power consumption or increase performance; rather they used the programmable clock generators required in the GALS technique to construct additional countermeasures against side-channel attacks. Still, compared with a synchronous implementation using similar partitioning, Acacia provides slightly higher throughput (8%) with a small area overhead (less than 2%).

Acacia's development benefited significantly from the ETHZ Integrated Systems Laboratory's experience in designing GALS chips [14]. The developers based vital components such as local clock generators and asynchronous port controllers on silicon-proven earlier implementations. They also found that a standard hierarchical back-end design flow is well-suited to GALS systems. In this approach, the asynchronous wrapper occupies an additional level of hierarchy around the locally synchronous island. The project also tackled the challenge of providing a test solution. By combining classical scan-based testing for most of the locally synchronous blocks and functional testing for the asynchronous wrapper, the developers obtained a combined test coverage of 99.89%.

Figure 4. Die photo of Acacia chip.

Faust

Recently, CEA-LETI (French Atomic Energy Commission Laboratory for Electronics and Information Technology) implemented a complex GALS chip named Faust (Flexible Architecture of a Unified System for Telecommunications) in STMicroelectronics 130-nm CMOS technology [17]. Faust, an open platform for fourth-generation multicarrier code division multiple-access (MC-CDMA) telecommunications applications, was designed to validate the principles and feasibility of an innovative GALS NoC architecture.

CEA-LETI has proposed and developed a complete asynchronous network-on-chip (ANOC) architecture adapted to GALS systems [17], using virtual channels to provide low latency and high quality of service [12,18]. The ANOC is implemented in quasi-delay-insensitive asynchronous logic. A dedicated on-chip GALS NoC interface connects the synchronous and asynchronous NoC domains through FIFO buffers [10]. Compared with a more optimized FIFO solution [8], the Faust design adapts classical bisynchronous FIFO buffers with Gray code, which are compatible with standard CAD tools. The FIFO design lets users robustly interface the NoC protocol with high throughput (one transfer per clock cycle) and small latency overhead (two clock cycles).



For off-chip NoC communication, a dual synchronous/asynchronous-mode NoC port allows connection of different NoC-based subsystems.

The Faust chip (see Figure 5) integrates 20 asynchronous NoC nodes, 23 synchronous units including an ARM946 core, embedded memories, various programmable hardware blocks, reconfigurable data path engines, and one clock management unit that generates 23 distinct clocks. The Faust prototyping platform integrates two Faust chips and two FPGAs connected by off-chip NoC communications. The Faust open platform addresses software-defined-radio applications and implements a fourth-generation MC-CDMA multiple-input, multiple-output application (http://www.ist-4more.org). Faust is one of the most complex existing GALS systems, with more than 3 million gates and 3.5 Mbits of embedded RAM, corresponding to a chip area of 79.5 mm².

Figure 5. Die photo of Faust chip (Source: Lattard et al. [17]).

Because almost no design tools exist and no well-known, off-the-shelf GALS methodology is available, the Faust developers designed the NoC building blocks by hand. They also developed dedicated standard cells: about 50 cells with C-elements and mutual-exclusion (mutex) elements for specific asynchronous design parts. They were able to use standard place-and-route tools for physical layout, but their main CAD difficulties were timing analysis and timing optimization. Better CAD tool support of asynchronous logic would definitely help the design flow.

For testability, Faust uses a standard full-scan methodology for all the synchronous units (scan patterns are transported by the NoC itself), and functional test for the asynchronous NoC. Test coverage is expected to be about 95%. Despite the CAD difficulties, the GALS approach helped the implementation of the large chip by breaking the timing constraints with separate clock domains. Using a mixed top-down, bottom-up methodology made it easy to implement smaller synchronous units with distinct small clock trees rather than having a large single-clock system.

No synchronous version of the Faust design exists, so it is difficult to draw comparisons. The NoC infrastructure costs approximately 15% of the overall chip area, which is equivalent to a classical bus-based architecture, and the GALS FIFO interface area is comparable with a classical bisynchronous FIFO approach. The asynchronous NoC power consumption represents only 6% of the overall application consumption, which is 50% less than an equivalent bus-based solution. In addition to this low power consumption, the developers expect to obtain more energy savings at the system level through frequency scaling enabled by the GALS partitioning. Finally, because of the 23 distinct clock domains, EMI should be very low.

Baseband processor

IHP developed a GALS wireless local-area network (WLAN) baseband processor compliant with the IEEE 802.11a standard [6]. This chip (see Figure 6) serves as a feasibility study of a request-driven GALS technique. A synchronous version of the same system exists, so we can compare the two design methodologies.

In principle, the GALS design process was faster than the synchronous one. Dealing with smaller design blocks was less difficult. Challenges such as generating a global clock tree with an enormous number of leaves, dividing the clock, and handling clock gating disappeared. Clock skew within smaller clock domains was significantly reduced. However, with more stringent constraints, even better results are possible. Without a global clock tree, timing closure of the complete design was achieved far more easily.

However, during the design of the GALS baseband processor, several new issues arose. The main


special setup for asynchronous modules, such as


adjusting clock phases for different blocks.

GALS application prospects


In the past few years, there has been considerable
GALS research activity. However, the prospects for
industrial application of GALS systems are not entirely
optimistic. Research has resulted in only a few
practical demonstrators. With the previously described
GALS solutions in mind, we can define the main needs
that GALS methods must meet to be widely adopted:

& System integration. Simplification of design in-


tegration is the ultimate goal for any GALS
solution. The GALS design flow must be faster
and less error-prone, with smaller design periods
and fewer design iterations. Another interesting
feature is EMI reduction. Simply introducing
a GALS solution cannot achieve significant
power reduction.
& Standard interfaces. The GALS method must
incorporate clearly defined, standardized para-
Figure 6. Die photo of IHP baseband processor chip.
metrical interfaces and a simple protocol be-
tween synchronous modules and the GALS
difficulty was a lack of tool support for asynchronous wrapper.
components—that is, the immaturity of asynchronous & Standard EDA tools. The design flow must be
tools. For example, CAD tool limitations made direct simple and rely on commercial tools. It is
gate mapping of generated logic equations impossible. desirable to create tools for wrapper generation
Therefore, many operations had to be performed and verification, as well as for GALS partitioning.
manually. This degrades the final design’s perfor- & High-throughput and low-complexity solutions.
mance and introduces additional delay in the design The deployed GALS architecture should offer
process. In addition, the designers performed wrapper high throughput (up to 1 data transfer per cycle,
evaluation and improvement in parallel with the GALS if needed), and quality of service (QoS) must be
chip design. These issues caused additional iterations guaranteed. The proposed GALS architecture
of the design process. should introduce no, or very low, performance
Testing the GALS chip with a standard synchronous degradation at an acceptable power overhead
hardware tester was a problem. The designers (less than 1% to 2%), and low area overhead (no
embedded special BIST logic to allow the use of more than 5% to 10% for medium-size local
a classical hardware tester. They had to perform cores—100,000 gates).
a special calibration of the ring oscillators during & Popularity. To gain popularity, the GALS interface
testing to match the testing and simulation results. solution and source code should be made public
The GALS system implementation resulted in as an open core.
a hardware overhead of about 3% for the asynchro-
nous wrappers. Power measurements of both chips Many proposed GALS solutions aim at general
showed only a marginal improvement (1%) for the applications. In reality, however, asynchronous logic
GALS chip. On the other hand, supply variation noise doesn’t give good results under all circumstances.
measurements showed a clear advantage for the GALS Therefore, a GALS approach will be suitable for
solution. The absolute maximum of the GALS circuit’s applications in certain fields and less so in others. We
power spectrum was about 5 dB lower than that of the expect that GALS architectures will have the best chance
synchronous circuit.6 This was achieved without any for commercial application in the following fields.
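As a brief aside, the quantitative targets in the requirements list above (power overhead below 1% to 2%, area overhead of at most 5% to 10% for a local core of 100,000 gates) can be turned into a simple budget check. The following is only a minimal sketch: the names `GalsBudget`, `area_overhead`, `power_overhead`, and `meets_budget` are hypothetical, gate count is assumed to be a usable proxy for area, the looser end of each range is assumed as the limit, and the wrapper gate count and milliwatt figures are illustrative values, not measurements from any chip discussed here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GalsBudget:
    """Upper bounds from the requirements list, as fractions (assumption: looser bound used)."""
    max_power_overhead: float = 0.02  # "less than 1% to 2%"
    max_area_overhead: float = 0.10   # "no more than 5% to 10%"


def area_overhead(core_gates: int, wrapper_gates: int) -> float:
    """Wrapper size relative to the synchronous core (gate count as an area proxy)."""
    if core_gates <= 0:
        raise ValueError("core_gates must be positive")
    return wrapper_gates / core_gates


def power_overhead(sync_power_mw: float, gals_power_mw: float) -> float:
    """Relative power change of the GALS version; a negative value means GALS saves power."""
    if sync_power_mw <= 0:
        raise ValueError("sync_power_mw must be positive")
    return (gals_power_mw - sync_power_mw) / sync_power_mw


def meets_budget(area: float, power: float, budget: GalsBudget = GalsBudget()) -> bool:
    """Area must not exceed its bound; power overhead must stay strictly below its bound."""
    return area <= budget.max_area_overhead and power < budget.max_power_overhead


# Illustrative (hypothetical) numbers: a 100,000-gate core with a 3,000-gate wrapper,
# and a GALS version that draws 1% less power than its synchronous counterpart.
area = area_overhead(core_gates=100_000, wrapper_gates=3_000)
power = power_overhead(sync_power_mw=100.0, gals_power_mw=99.0)
print(f"area overhead: {area:.1%}, power overhead: {power:.1%}, "
      f"within budget: {meets_budget(area, power)}")
```

A stricter reading of the targets would pass `GalsBudget(max_power_overhead=0.01, max_area_overhead=0.05)` instead of the defaults.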



Moderate-performance designs with complex structures

One strength of GALS design is system integration. Large SoCs running with moderate clock frequencies and without stringent performance requirements will benefit greatly from a GALS approach. The possibilities for reducing power consumption and digital noise point to potential GALS applications in handheld and other mobile-communication SoCs. In view of the current GALS demonstrators, this outcome seems likely. Extreme-performance CPUs are not a realistic goal for GALS design techniques. These systems are highly specialized designs, optimized to the limits of available synchronous design methodologies. Even theoretical estimates show that performance gains from such realizations are marginal at best [15]. Practical implementation issues will more likely result in systems that show no measurable benefits.

Submicron systems with short time to market

GALS methods can be suitable for submicron systems with short time to market and for which using a standard automated design flow and commercial CAD tools is preferred. The GALS technique should be the vehicle for a modern, inexpensive, automated design process. On the other hand, high-performance systems typically use custom design flows anyway. In such cases, designers have very efficient solutions available for design problems (such as phase-locked loops and deskewing). A GALS solution is probably not the first choice for such applications.

Networks on chips

The NoC is a promising target platform for future applications [19]. The development of NoC architectures is currently a very attractive topic in the research community, resulting in several NoC platforms [10, 12, 18]. Most of the proposed NoC interfaces are based on FIFO-like GALS structures [13] or, for low-throughput applications, on synchronizers [12]. These studies have shown that such implementations can achieve sufficiently good performance. On the other hand, the presented NoC nodes introduce area and power overheads. For example, a 5 × 5 asynchronous NoC node from CEA-LETI contains 19,000 gates [17], and a solution from Technion contains 17,500 gates [18]. Optimization of current approaches can result in better figures.

It is almost certain that most NoC applications will include some sort of GALS system. Most of today’s research multiprocessor grid array architectures are built like NoCs and aim to apply GALS methods [20]. Finally, conceptually, it is not clear whether a packet-switched network protocol on a chip is the proper answer for SoC interconnects. It seems that the future of GALS methods is coupled with the future of NoCs. If the NoC concept fails, the chance of practical GALS applications will certainly be smaller. On the other hand, a NoC boom will lead to increased application of GALS systems.

Low-power systems

The GALS methodology has not yet proven itself as a way to achieve significant power savings at the SoC level. The only possibility for significant improvement over synchronous low-power methods is a combination of GALS-based systems with voltage and frequency scaling.

DESPITE ALL THE PROMISING features of GALS systems, GALS techniques are still not frequently used in industrial design practice and are not part of the standard design flow. This is mainly because it has not been shown that the gains offered by GALS methods can justify the additional effort needed for their implementation. Moreover, improvements of classical synchronous design have so far been able to deal with complex design issues. Nevertheless, the future for GALS techniques is still quite promising. Currently, GALS applications mainly target the area of NoCs, multiprocessor systems, and integration of highly complex SoCs. Since there are some conceptual commonalities between GALS design and NoCs, their future appears to be coupled. We believe that the GALS design flow issues are solvable, and we are encouraged by several new efforts in this direction. Once the right target system addresses these issues, the GALS approach will find wider acceptance.

References

1. R. Ginosar, “Fourteen Ways to Fool Your Synchronizer,” Proc. 9th IEEE Int’l Symp. Asynchronous Circuits and Systems (ASYNC 03), IEEE CS Press, 2003, pp. 89-96.
2. D. Chapiro, “Globally-Asynchronous Locally-Synchronous Systems,” doctoral dissertation, Dept. of Computer Science, Stanford Univ., 1984.
3. K. Yun and R. Donohue, “Pausible Clocking: A First Step toward Heterogeneous Systems,” Proc. IEEE Int’l Conf. Computer Design: VLSI in Computers and Processors (ICCD 96), IEEE CS Press, 1996, pp. 118-127.


4. J. Muttersbach, T. Villiger, and W. Fichtner, “Practical Design of Globally-Asynchronous Locally-Synchronous Systems,” Proc. 6th Int’l Symp. Advanced Research in Asynchronous Circuits and Systems (ASYNC 00), IEEE CS Press, 2000, pp. 52-59.
5. J. Kessels et al., “Clock Synchronization through Handshake Signalling,” Proc. 8th Int’l Symp. Asynchronous Circuits and Systems (ASYNC 02), IEEE CS Press, 2002, pp. 59-68.
6. M. Krstić et al., “System Integration by Request-Driven GALS Design,” IEE Proc. Computers & Digital Techniques, vol. 153, no. 5, Sept. 2006, pp. 362-372.
7. R. Mullins and S. Moore, “Demystifying Data-Driven and Pausible Clocking Schemes,” Proc. 13th IEEE Int’l Symp. Asynchronous Circuits and Systems (ASYNC 07), IEEE CS Press, 2007, pp. 175-185.
8. T. Chelcea and S. Nowick, “Low-Latency Asynchronous FIFO’s Using Token Rings,” Proc. 6th Int’l Symp. Advanced Research in Asynchronous Circuits and Systems (ASYNC 00), IEEE CS Press, 2000, pp. 210-220.
9. A. Chakraborty and M. Greenstreet, “Efficient Self-Timed Interfaces for Crossing Clock Domains,” Proc. 9th IEEE Int’l Symp. Asynchronous Circuits and Systems (ASYNC 03), IEEE CS Press, 2003, pp. 78-88.
10. E. Beigne and P. Vivet, “Design of On-Chip and Off-Chip Interfaces for a GALS NoC Architecture,” Proc. 12th IEEE Int’l Symp. Asynchronous Circuits and Systems (ASYNC 06), IEEE CS Press, 2006, pp. 172-181.
11. R. Dobkin, R. Ginosar, and C. Sotiriu, “Data Synchronization Issues in GALS SoCs,” Proc. 10th IEEE Int’l Symp. Asynchronous Circuits and Systems (ASYNC 04), IEEE CS Press, 2004, pp. 170-179.
12. T. Bjerregaard et al., “An OCP Compliant Network Adapter for GALS-Based SoC Design Using the MANGO Network-on-Chip,” Proc. Int’l Symp. System-on-Chip (SoC 05), IEEE Press, 2005, pp. 171-174.
13. K. van Berkel, A. Peeters, and F. te Beest, “Adding Synchronous and LSSD Modes to Asynchronous Circuits,” Proc. 8th IEEE Int’l Symp. Asynchronous Circuits and Systems (ASYNC 02), IEEE CS Press, 2002, pp. 161-170.
14. F. Gürkaynak et al., “GALS at ETH Zurich: Success or Failure?” Proc. 12th IEEE Int’l Symp. Asynchronous Circuits and Systems (ASYNC 06), IEEE CS Press, 2006, pp. 150-159.
15. E. Talpes and D. Marculescu, “Toward a Multiple Clock/Voltage Island Design Style for Power-Aware Processors,” IEEE Trans. Very Large Scale Integration (VLSI) Systems, vol. 13, no. 5, May 2005, pp. 591-603.
16. F.K. Gürkaynak et al., “Improving DPA Security by Using Globally-Asynchronous Locally-Synchronous Systems,” Proc. 31st European Solid-State Circuits Conf. (ESSCIRC 05), IEEE Press, 2005, pp. 407-410.
17. D. Lattard et al., “A Telecom Baseband Circuit Based on an Asynchronous Network-on-Chip,” Proc. Int’l Solid-State Circuits Conf. (ISSCC 07), IEEE Press, 2007, pp. 258-601.
18. R. Dobkin et al., “An Asynchronous Router for Multiple Service Levels Networks on Chip,” Proc. 11th IEEE Int’l Symp. Asynchronous Circuits and Systems (ASYNC 05), IEEE CS Press, 2005, pp. 44-53.
19. G. DeMicheli and L. Benini, Networks on Chips: Technology and Tools (Systems on Silicon), Morgan Kaufmann, 2006.
20. Z. Yu and B.M. Baas, “Implementing Tile-Based Chip Multiprocessors with GALS Clocking Styles,” Proc. IEEE Int’l Conf. Computer Design (ICCD 06), IEEE Press, 2006, pp. 174-180.

Miloš Krstić is a research associate at IHP Microelectronics, Frankfurt (Oder), Germany. His research interests include low-power digital design for wireless applications and globally asynchronous, locally synchronous (GALS) methodologies for digital-systems integration. Krstić has a Dipl-Ing in electronics and communications and an MSc in electronics from the University of Niš, Serbia, and a Dr-Ing in electronics from Brandenburg University of Technology, Cottbus, Germany.

Eckhard Grass is a research fellow at IHP Microelectronics, Frankfurt (Oder), Germany, where he leads a project on the development and implementation of a wireless broadband communication system in the 60-GHz band. His research interests include data-driven (asynchronous) signal-processing structures and low-power VLSI implementation of communication systems. Grass has a Dr-Ing in electronics from Humboldt University, Berlin.

Frank K. Gürkaynak is a research associate at the Swiss Federal Institute of Technology Lausanne, where he works on lab-on-chip systems. His research interests include design of VLSI systems, full-custom design, GALS systems, cryptography, and lab-on-chip systems. Gürkaynak has a BSc and an MSc in electrical and electronics engineering from Istanbul Technical University, and a PhD in electrical and electronics engineering from the Swiss Federal Institute of Technology Zurich. He is a member of the IEEE.

Pascal Vivet works in the Advanced Design Department of the French Atomic Energy Commission Laboratory for Electronics & Information Technology (CEA-LETI). His research interests include networks on chips, GALS architecture, and low-power design. Vivet has an MS and a PhD in electronics from Université Joseph Fourier, Grenoble, France.

Direct questions and comments about this article to Miloš Krstić, IHP, Im Technologiepark 25, 15236 Frankfurt (Oder), Germany; krstic@ihp-microelectronics.com.

For further information on this or any other computing topic, visit our Digital Library at http://www.computer.org/publications/dlib.

September–October 2007
441
