Multiple Clock Domain Synchronization for Network on Chip Architectures

Jabulani Nyathi, Souradip Sarkar, Partha Pratim Pande

Washington State University


Pullman, Washington-99163, USA
(jabu, ssarkar, pande)@eecs.wsu.edu

ABSTRACT

The Network-on-Chip (NoC) is emerging as a revolutionary methodology in solving the performance limitations arising out of long interconnects. Continued advancement of NoC designs is heavily dependent on the ability to effectively communicate among the constituent Intellectual Property (IP) blocks/Embedded cores, as well as manage/reduce energy dissipation. This paper presents a low-latency, low-energy synchronization mechanism for Network on Chip architectures, which enables the network to span a system-on-chip (SoC) with multiple independent clock domains. The proposed interface scheme has been compared to another existing scheme and shown to outperform it in terms of latency and energy dissipation.

I. INTRODUCTION

With shrinking geometries, global interconnects are becoming the principal performance bottleneck for high-performance Systems-on-Chip (SoCs) [1]. The Network-on-Chip (NoC) model [2] is emerging as a revolutionary methodology in solving the performance limitations arising out of long interconnects. It is a widely accepted fact that several clock cycles are required for a global signal to travel from one end of a chip to the other. Consequently, synchronization of future chips with a single clock source and negligible skew will be extremely difficult, if not impossible. As a result of the cross-chip signaling constraints, instead of trying to distribute synchronous clocks, it has been proposed that the whole SoC be divided into multiple functional islands with independent clocks. The inherent characteristics of NoCs allow this division into multiple clusters as dictated by the interconnect infrastructure. Given that these clusters operate at different frequencies, communication among them gives rise to new challenges in terms of data integrity, synchronization and energy dissipation.

One of the principal characteristics of the NoC architectures is that the functional blocks communicate with one another with the help of intelligent switches. Switches have FIFO buffers either at the input or output and we propose to re-use these buffers to manage multiple clock domain synchronization. Section II presents the general architectures of two FIFO interfaces considered in this work. Section III presents a performance evaluation of these two schemes, in terms of latency and energy dissipation, when applied to common NoC architectures. Section IV presents some concluding remarks.

II. NOC CLOCK SYNCHRONIZATION

Multiple clocks are necessary for communication among IPs for two reasons: different IP cores on a single chip have different functions and may run at different frequencies, and the SoC often must work with external clocks or with data obtained from an external clock. We aim to provide a low energy synchronization mechanism for Network on Chip (NoC) architectures to enable the network to span a SoC containing many IP Blocks or groups of blocks with completely independent clock domains. One of several possible synchronization schemes was presented by Hataminian and Cash [3]. They noted that for very regular structures, the skew can be divided into one horizontal and one vertical component. If the vertical clock lines are placed equidistant from each other, the horizontal difference in skew between two neighboring vertical lines becomes close to constant. Furthermore, the horizontal skew between two neighboring nodes on different vertical heights also becomes almost a constant.

Due to the regular structure of the NoCs, shown in Figure 1, it is proposed that the Hataminian and
Cash solution can be easily extended to these. In [4], the authors describe a method of distributing a quasi-synchronous clock, i.e., a synchronous clock with the same frequency but with a constant phase difference, across the entire NoC. The basic idea is to divide the chip into clock regions, where the difference in arrival time of the clock signal between any two neighboring clock regions can be controlled and/or calculated beforehand due to the regular structure of the NoCs. The principal limitation of these approaches is that the authors assume the distribution of a single synchronous clock with differing phases all along the chip. The phase difference is calculated assuming a MESH or Folded Torus-like regular NoC structure. But in reality there would be IP blocks running at different frequencies in a single SoC. Consequently the above assumption has very limited applicability. Instead of depending on the architectural regularity of NoC architectures for clock synchronization, we suggest designing the NoC switch blocks in such a way that they can handle communication of signals between different clock domains.

Figure 1: (a) Mesh; (b) Folded-Torus based NoCs (legend: embedded core, switch)

There are different circuit-level design methodologies to address interfacing signals crossing multiple clock boundaries. In this paper we present a new FIFO interfacing scheme that uses the asynchronous symmetric pulse protocol (asPP) approach, abbreviated GasP henceforth in accordance with the definition given in [5]. The interfacing control circuitry's performance is compared to that presented by Chelcea and Nowick [6]. Key metrics of energy dissipation and latency are evaluated.

Chakraborty and Greenstreet [7] present self-timed interfaces for crossing clock domains and address various clock relationships between two communicating modules. Chelcea and Nowick [6] present robust interfaces for mixed timing systems. Our scheme is capable of providing an extensive array of interfaces, but we limit our study to a scenario in which all the communicating modules are synchronous even though they might be clocked by clock signals with different frequencies.

A. The Chelcea-Nowick (C-N) Interfaces

The work of Chelcea and Nowick [6] discusses a number of low-latency mixed timing FIFO designs that interface system on chip modules running at different frequencies. It offers a wealth of designs to choose from, but of interest to our research is the synchronous to synchronous interface. The Chelcea-Nowick synchronous interfaces, referred to as the C-N scheme from here on, require detectors in order to compute the current state of the FIFO (full or empty). The full and empty detectors shown in Figure 2 monitor and report the status of the FIFO cell. A full FIFO cell cannot be written to by the sending module, but can be read from by the receiving module. An empty FIFO cell cannot be read from, but can be written to by the sending module. These detectors ensure that FIFO cell accesses occur only when valid operations can be performed. The C-N scheme also has external controllers for conditionally passing requests for data operations to cell arrays. These external controllers are the put and get modules. All the modules, along with their associated input and output signals, are shown in the block diagram of Figure 2.

Figure 2: Block diagram of the C-N synchronous-synchronous interface (courtesy of [6]). Blocks: Put Controller, Get Controller, Full Detector, Empty Detector, Cell0 to Cell3. Signals: req_put, en_put, data_put, CLK_put, req_get, en_get, valid_get, data_get, CLK_get, full, empty.

Further details on the basic functions of each interface module are not repeated in this study and an interested reader is referred to [6]. Logic level details that can be inferred from a close examination of the block diagram of Figure 2 indicate considerable logic depth. Essentially the C-N interface will incur significant delays due to the number of logic stages the majority of the signals have to traverse.
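The full/empty access rules described for the C-N scheme can be illustrated with a small behavioural model. The Python sketch below is only an illustration under stated assumptions: the class `FifoCell` and its methods are hypothetical names, the signal names (`req_put`, `en_put`, `req_get`, `en_get`, `valid_get`) are borrowed from the labels of Figure 2, and the single-cell gating shown here is a simplification rather than the gate-level design of [6].

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FifoCell:
    """Behavioural model of one FIFO cell holding at most one item (hypothetical)."""
    data: Optional[int] = None

    @property
    def full(self) -> bool:
        # A cell that holds an item is reported as full.
        return self.data is not None

    @property
    def empty(self) -> bool:
        # A cell that holds nothing is reported as empty.
        return self.data is None

    def put(self, req_put: bool, value: int) -> bool:
        """Pass a put request to the cell only when the cell is not full.

        Returns en_put, i.e. whether the write actually took place.
        """
        en_put = req_put and not self.full
        if en_put:
            self.data = value
        return en_put

    def get(self, req_get: bool) -> Tuple[bool, Optional[int]]:
        """Pass a get request to the cell only when the cell is not empty.

        Returns (valid_get, data); data is None when the read is not valid.
        """
        en_get = req_get and not self.empty
        if not en_get:
            return False, None
        value, self.data = self.data, None
        return True, value


if __name__ == "__main__":
    cell = FifoCell()
    print(cell.get(True))     # read from an empty cell is blocked
    print(cell.put(True, 7))  # write to an empty cell succeeds
    print(cell.put(True, 8))  # write to a full cell is blocked
    print(cell.get(True))     # read from a full cell returns the stored item
```

In this simplified view the detectors reduce to the two properties `full` and `empty`, and the put and get controllers reduce to the two gating expressions. The real interface spreads this logic over several stages, which is where the logic depth noted above comes from.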

B. New GasP Based Interfaces

In this subsection we present a newly proposed interface for crossing clock domains. The interface uses self-timed control circuitry to generate local clocks and allow communicating modules operating at different arbitrary frequencies to exchange data [8]. The communicating modules can be close to each other or far apart but the operation principle remains the same. The control circuitry depends on both clock1 and clock2 to trigger generation of the local clocks to enable the FIFO cells of the data path to shift data along the communicating channel. Figure 3 shows a block diagram of the GasP based scheme. There are fewer control signals involved in the generation of the enable signals than those of Figure 2.

Figure 3: Block diagram of the GasP based Scheme. Blocks: FIFO Control (Sender end), FIFO Control (Receiver end), two buffer cells with registers. Signals: clock1, clock2, empty, copy, enable 1, enable 2, data_in, data_out.

It must be noted that this scheme allows either the sending or receiving module to initiate a request for data transfer. The empty and copy signals play a major role in the synchronization of data transfer when clock1 and clock2 are independent clocks with arbitrary frequencies.

Logic 0 on the empty signal does not permit generation of the enable signal, so data stored at the buffer cell remains there. The request, if initiated by the sender, remains queued and when empty changes to logic 1 the enable signal gets generated. The change of the empty signal from logic 0 to logic 1 in this scenario results from the arrival of clock2. This depicts a situation in which clock2 arrives after clock1, implying that clock2 is slower. In the event that clock1 is slower than clock2, the empty signal would be at logic 1 before the sender's request (clock1) arrives. The arrival of clock1 will ensure that an enable signal is generated, albeit after some delay allowing data to be stable on the data bus. Note that the empty signal's status of logic 0 indicates a full status. The above description of clock activity represents cases (i) clock1 > clock2 and (ii) clock1 < clock2. The clock synchronization events are better understood by studying the circuit level diagram shown in Figure 4.

The full/empty signal in the block diagram of Figure 3 is represented on the schematic by the signal of node C. The sender's request is stored in the keeper circuitry with nodes A and A_bar, allowing transistor N2's gate to be at logic 1. When the full/empty signal transitions from logic 0 to logic 1 the enable signal gets generated. Note that in the event that the receiving module operates with a faster clock than that of the sending module, the full/empty signal is retained at node C. When clock1 arrives the enable signal gets generated. This operation ensures that events can start at either the sending module or the receiving module. It is this mechanism that permits clock1 and clock2 to be either of equal frequency, or of arbitrary frequencies. The enable signals constitute the local clocks and enable data propagation from one buffer cell to the next.

Figure 4: Circuit level representation of FIFO control circuitry. Transistors: P1 to P3, N1 to N4. Nodes: A, A_bar, B, C. Inputs: clock1, clock2. Output: enable (to buffer cell).

Latency for both interfaces is considered as the time from when the data is queued on the FIFO cells to the time the data is read. Latency and energy dissipation are evaluated for various clock scenarios, such as both modules operating at the same frequency and one module having a slower clock than that of the other module. Performance evaluation based on latency and energy metrics is presented in the next section.

III. PERFORMANCE EVALUATION

The previous section presented two FIFO interface mechanisms to handle communication between modules operating at different clocks with arbitrary frequencies. We considered a system with 64 embedded cores and mapped that onto MESH and Folded Torus-based NoCs. It is assumed that the NoC-switch blocks operate with different clock frequencies. Consequently the multiple clock domain crossing needs to be accounted for while
considering inter-switch communication. The experimental set up is depicted in Figure 5. The two communicating switch blocks are running with different clocks clock1 and clock2. The inter-switch wire lengths depend on the architecture under consideration. For the MESH-based NoC this inter-switch wire length turned out to be 3 mm and for Folded Torus it was 6 mm. Both the receiver and sender's clock signals are involved in the generation of the synchronization signals at the interface. The bi-directional control signal between the interface circuitry represents the empty/full signal. Simulations were done in 90 nm technology node and for both the C-N and the GasP based FIFO interfaces different clock frequencies were used.

Figure 5: The experimental set up

Table 1 shows the latency and energy values for the C-N and the GasP based FIFO interfaces. In all categories the GasP based FIFO interface outperforms the C-N FIFO interface. For various relationships between the sender and the receiver clocks the latency of the GasP based FIFO interface shows around 80% improvement over that of the C-N FIFO interface. The energy values shown in Table 1 show the C-N synchronous to synchronous FIFO interface to dissipate significantly more energy than the GasP controlled FIFO interface for both the MESH and Folded-Torus based NoC architectures. This is a direct result of the simplicity and the reduced circuitry of our design.

IV. CONCLUDING REMARKS

A FIFO interface scheme that addresses the multiple clock domain synchronization issue in a NoC platform has been presented. It has been shown in this study that communicating NoC switch blocks running at the same or arbitrary frequencies can be managed by the proposed FIFO interface. The proposed FIFO interface circuitry is simple yet effective, reducing energy dissipation significantly. At a minimum the C-N FIFO interface is shown to dissipate 1.97 and 1.78 times more energy than the newly proposed FIFO interface for the MESH and Folded Torus architectures respectively with much higher latency. Overall it has been shown in this paper that instead of depending on the architectural regularity of NoC architectures for clock synchronization, the NoC switch blocks can be designed in such a way that they can handle communication among modules operating in different clock domains.
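As a behavioural illustration of the enable generation described for the GasP based interface in Section II.B, the following Python sketch treats each enable pulse as a rendezvous between the sender's request (an edge of clock1) and the empty signal rising (driven by an edge of clock2): whichever event arrives first waits for the other. This is a deliberately simplified sketch, not the circuit of Figure 4. The function names, the pairing of the k-th edge of each clock, and the `gate_delay_ps` parameter are all assumptions introduced here for illustration.

```python
from typing import List


def period_ps(freq_ghz: float) -> float:
    """Clock period in picoseconds for a frequency given in GHz."""
    return 1000.0 / freq_ghz


def enable_times(period1_ps: float, period2_ps: float, transfers: int,
                 gate_delay_ps: float = 0.0) -> List[float]:
    """Return the time of each enable pulse under a simple rendezvous model.

    Assumptions (hypothetical, for illustration only):
      * the k-th sender request coincides with the k-th edge of clock1;
      * the empty signal rises for the k-th time on the k-th edge of clock2;
      * the enable fires once both events have occurred, after gate_delay_ps.
    """
    times = []
    for k in range(1, transfers + 1):
        t_request = k * period1_ps  # sender's request is queued at this time
        t_empty = k * period2_ps    # receiver side reports empty at this time
        # The earlier event is held until the later one arrives.
        times.append(max(t_request, t_empty) + gate_delay_ps)
    return times


if __name__ == "__main__":
    # Case (i): sender clock faster than receiver clock, so requests wait for empty.
    print(enable_times(period_ps(1.66), period_ps(0.66), transfers=3))
    # Case (ii): sender clock slower, so empty is already high when the request arrives.
    print(enable_times(period_ps(0.66), period_ps(1.66), transfers=3))
```

Under these assumptions the slower of the two clocks sets the pace of the enable pulses in both cases, which mirrors the statement that events can start at either the sending or the receiving module.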
Table 1: Performance comparison of the C-N and the GasP based FIFO interfaces

                Sender   Receiver   Latency (ps)         Energy Dissipation (pJ)
Architecture    (GHz)    (GHz)      C-N       GasP       C-N       GasP
MESH            1.00     1.00       1950      332        2.80      1.42
MESH            1.66     0.66       1940      340        4.58      1.28
MESH            0.66     1.66       1940      300        5.56      1.60
Folded Torus    1.00     1.00       2019      480        5.31      2.08
Folded Torus    1.66     0.66       2009      475        6.27      1.37
Folded Torus    0.66     1.66       2012      468        9.72      2.33
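The relative figures quoted in the text can be recomputed from the rows of Table 1. The short Python sketch below does this arithmetic. The data are the table values as transcribed here, while the names `ROWS`, `latency_improvement` and `energy_ratio` are hypothetical helpers introduced only for this illustration.

```python
# Rows of Table 1 as transcribed:
# (architecture, sender GHz, receiver GHz,
#  C-N latency ps, GasP latency ps, C-N energy pJ, GasP energy pJ)
ROWS = [
    ("MESH", 1.00, 1.00, 1950, 332, 2.80, 1.42),
    ("MESH", 1.66, 0.66, 1940, 340, 4.58, 1.28),
    ("MESH", 0.66, 1.66, 1940, 300, 5.56, 1.60),
    ("Folded Torus", 1.00, 1.00, 2019, 480, 5.31, 2.08),
    ("Folded Torus", 1.66, 0.66, 2009, 475, 6.27, 1.37),
    ("Folded Torus", 0.66, 1.66, 2012, 468, 9.72, 2.33),
]


def latency_improvement(cn_ps: float, gasp_ps: float) -> float:
    """Fractional latency reduction of GasP relative to C-N."""
    return (cn_ps - gasp_ps) / cn_ps


def energy_ratio(cn_pj: float, gasp_pj: float) -> float:
    """How many times more energy the C-N interface dissipates."""
    return cn_pj / gasp_pj


# Per-row latency improvements and per-architecture minimum energy ratios.
improvements = [latency_improvement(r[3], r[4]) for r in ROWS]
min_ratio = {}
for arch, _, _, _, _, cn_pj, gasp_pj in ROWS:
    ratio = energy_ratio(cn_pj, gasp_pj)
    min_ratio[arch] = min(ratio, min_ratio.get(arch, ratio))

if __name__ == "__main__":
    for row, imp in zip(ROWS, improvements):
        print(f"{row[0]:13s} {row[1]:.2f}/{row[2]:.2f} GHz: "
              f"latency improvement {imp:.1%}")
    for arch, ratio in min_ratio.items():
        print(f"{arch}: minimum C-N/GasP energy ratio {ratio:.2f}")
```

With the values as transcribed, the latency improvement falls between roughly 76% and 85% across all six rows, consistent with the "around 80%" stated in the text, and the minimum MESH energy ratio comes to about 1.97. For the Folded Torus rows as transcribed the minimum ratio computes to about 2.55, which is larger than the 1.78 quoted in the concluding remarks.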

References
[1] L. P. Carloni and A. L. Sangiovanni-Vincentelli, “Coping with Latency in SoC Design,” IEEE Micro, Oct. 2002, pp. 24-35.
[2] L. Benini and G. De Micheli, “Networks on Chips: A New SoC Paradigm,” IEEE Computer, Jan. 2002, pp. 70-78.
[3] M. Hataminian and G. Cash, “A 70-MHz 8 bit x 8 bit parallel pipelined multiplier in 2.5 um CMOS,” IEEE Journal on Solid-State Circuits, Vol. SC-21, Aug. 1986, pp. 505-513.
[4] E. Nilsson and J. Oberg, “Reducing Power and Latency in 2-D Mesh NoCs using Globally Pseudochronous Locally Synchronous Clocking,” Proceedings of International Conference Hardware/Software Co design and System Synthesis, 2004. CODES + ISSS 2004, pp. 176-181.
[5] I. E. Sutherland and J. Ebergen, “Computers Without Clocks,” Scientific American, Vol 287, No. 2, Aug. 2002, pp. 62-69.
[6] T. Chelcea and S. M. Nowick, “Robust Interfaces for Mixed-Timing Systems,” IEEE Transactions on Very Large Scale Integration Systems, Vol. 12, No. 8, Aug. 2004, pp. 857-873.
[7] A. Chakraborty and M. R. Greenstreet, “Efficient Self-Timed Interfaces for Crossing Clock Domains,” 2003 IEEE Proceedings of the Ninth International Symposium on Asynchronous Circuits and Systems (ASYNC’03), May 12-15, 2003, pp. 68-78.
[8] I. Sutherland and S. Fairbanks, “GasP: A Minimal FIFO Control,” Proc. of ASYNC, 2001, pp. 46-53.
