AdvVLSI - Module2 1 9
Module 2:
Syllabus:
Floor planning and placement: Goals and objectives, Measurement of delay in Floor planning, Floor
planning tools, Channel definition, I/O and Power planning and Clock planning. Placement: Goals and
Objectives, Min-cut Placement algorithm, Iterative Placement Improvement, Time driven placement
methods, Physical Design Flow. Routing: Global Routing: Goals and objectives, Global Routing Methods,
Global routing between blocks, Back annotation.
The input to the floorplanning step is the output of system partitioning and design entry—a
netlist. Floorplanning precedes placement, but we shall cover them together. The output of the
placement step is a set of directions for the routing tools.
The input to a floorplanning tool is a hierarchical netlist that describes the interconnection of the blocks
(RAM, ROM, ALU, cache controller, and so on); the logic cells (NAND, NOR, D flip-flop, and so on) within
the blocks; and the logic cell connectors (the terms terminals, pins, and ports mean the same thing as
connectors). The netlist is a logical description of the ASIC; the floorplan is a physical description of an
ASIC. Floorplanning is thus a mapping between the logical description (the netlist) and the physical
description (the floorplan).
The objectives of floorplanning are to minimize the chip area and minimize delay. Measuring area is
straightforward, but measuring delay is more difficult and we shall explore this next.
Throughout the ASIC design process we need to predict the performance of the final layout. In
floorplanning we wish to predict the interconnect delay before we complete any routing. Imagine trying
to predict how long it takes to get from Russia to China without knowing where in Russia we are or
where our destination is in China. Actually it is worse, because in floorplanning we may move Russia or
China.
FIGURE 16.4 Predicted capacitance. (a) Interconnect lengths as a function of fanout (FO) and circuit-block size. (b)
Wire-load table. There is only one capacitance value for each fanout (typically the average value). (c) The wire-load
table predicts the capacitance and delay of a net (with considerable error). Net A and net B both have a fanout of
1 and the same predicted net delay, but net B in fact has a much greater delay than net A in the actual layout
(of course, we shall not know what the actual layout is until much later in the design process).
To predict delay we need to know the parasitics associated with interconnect: the interconnect
capacitance (wiring capacitance or routing capacitance) as well as the interconnect resistance. At the
floorplanning stage we know only the fanout (FO) of a net (the number of gates driven by a net) and the
size of the block that the net belongs to. We cannot predict the resistance of the various pieces of the
interconnect path since we do not yet know the shape of the interconnect for a net. However, we can
estimate the total length of the interconnect and thus estimate the total capacitance. We estimate
interconnect length by collecting statistics from previously routed chips and analyzing the results. From
these statistics we create tables that predict the interconnect capacitance as a function of net fanout and
block size. A floorplanning tool can then use these predicted-capacitance tables (also known as
interconnect-load tables or wire-load tables). Figure 16.4 shows how we derive and use these
wire-load tables.
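As a sketch of how a floorplanning tool might use such a table, the snippet below looks up one average capacitance per fanout (bucketed by block size) and estimates net delay. All numbers and the bucket names are hypothetical, not taken from Figure 16.4; the point is that two nets with the same fanout receive the same prediction, whatever their eventual routed lengths.

```python
# Minimal sketch of a wire-load table lookup (all values hypothetical).
# One average net capacitance (pF) per fanout, bucketed by block size.

WIRE_LOAD = {
    # block_size_bucket: {fanout: average net capacitance in pF}
    "small": {1: 0.02, 2: 0.035, 3: 0.05, 4: 0.07},
    "large": {1: 0.05, 2: 0.08,  3: 0.11, 4: 0.15},
}

def predicted_net_delay(block_bucket, fanout, r_driver_kohm, c_pin_pf=0.01):
    """Estimate net delay (ns) as R_driver * (C_wire + fanout * C_pin).

    Fanouts beyond the table fall back to the largest tabulated entry.
    """
    table = WIRE_LOAD[block_bucket]
    c_wire = table.get(fanout, table[max(table)])
    return r_driver_kohm * (c_wire + fanout * c_pin_pf)

# Two nets with the same fanout get the same predicted delay, even though
# their routed lengths (and hence real delays) may differ greatly.
d_a = predicted_net_delay("small", 1, r_driver_kohm=2.0)
d_b = predicted_net_delay("small", 1, r_driver_kohm=2.0)
```

This is exactly the weakness the caption of Figure 16.4 points out: nets A and B are indistinguishable to the table.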
Example 1 / Technique 1:
Figure 16.6 (a) shows an initial random floorplan generated by a floorplanning tool. Two of the
blocks, A and C in this example, are standard-cell areas (the chip shown in Figure 16.1 is one large
standard-cell area). These are flexible blocks (or variable blocks) because, although their total area
is fixed, their shape (aspect ratio) and connector locations may be adjusted during the placement
step. The dimensions and connector locations of the other fixed blocks (perhaps RAM, ROM,
compiled cells, or megacells) can only be modified when they are created.
FIGURE 16.6 Floorplanning a cell-based ASIC. (a) Initial floorplan generated by the floorplanning tool. Two
of the blocks are flexible (A and C) and contain rows of standard cells (unplaced). A pop-up window shows
the status of block A. (b) An estimated placement for flexible blocks A and C. The connector positions are
known and a rat’s nest display shows the heavy congestion below block B. (c) Moving blocks to improve
the floorplan. (d) The updated display shows the reduced congestion after the changes.
The floorplanner can complete an estimated placement to determine the positions of connectors
at the boundaries of the flexible blocks. Figure 16.6 (b) illustrates a rat's nest display of the
connections between blocks. Connections are shown as bundles between the centers of blocks or
as flight lines between connectors. Figure 16.6 (c) and (d) show how we can move the blocks in a
floorplanning tool to minimize routing congestion.
Example 2 / Technique 2:
We need to control the aspect ratio of our floorplan because we have to fit our chip into the die cavity
(a fixed-size hole, usually square) inside a package. Figure 16.7 (a)–(c) show how we can rearrange
our chip to achieve a square aspect ratio. Figure 16.7 (c) also shows a congestion map, another form of
congestion analysis.
FIGURE 16.7 Congestion analysis. (a) The initial floorplan with a 2:1.5 die aspect ratio. (b) Altering the floorplan to
give a 1:1 chip aspect ratio. (c) A trial floorplan with a congestion map. Blocks A and C have been placed so that we
know the terminal positions in the channels. Shading indicates the ratio of channel density to the channel capacity.
Dark areas show regions that cannot be routed because the channel congestion exceeds the estimated capacity.
(d) Resizing flexible blocks A and C alleviates congestion.
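The shading rule in Figure 16.7 (c) can be sketched in a few lines: shade each channel by the ratio of channel density (tracks needed) to channel capacity (tracks available), and flag any channel whose ratio exceeds 1.0 as unroutable. The channel names and track counts below are made up for illustration.

```python
# Hypothetical sketch of the check behind a congestion map: a channel with
# density / capacity > 1.0 cannot be routed as floorplanned.

def congestion_ratio(density, capacity):
    """density = routing tracks needed; capacity = tracks available."""
    return density / capacity

# Made-up channels: (tracks needed, tracks available)
channels = {"ch1": (12, 20), "ch2": (25, 20), "ch3": (18, 20)}

overfull = [name for name, (d, c) in channels.items()
            if congestion_ratio(d, c) > 1.0]
```

Resizing a flexible block changes the capacities (and, by moving terminals, the densities), which is how the congestion in Figure 16.7 (d) is relieved.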
4. Channel Definition
During the floorplanning step we assign the areas between blocks that are to be used for
interconnect. This process is known as channel definition or channel allocation. Figure 16.8
shows a T-shaped junction between two rectangular channels and illustrates why we must
route the stem (vertical) of the T before the bar. The general problem of choosing the order of
rectangular channels to route is called channel ordering.
FIGURE 16.8 Routing a T-junction between two channels in two-level metal. The dots represent logic cell pins.
(a) Routing channel A (the stem of the T) first allows us to adjust the width of channel B. (b) If we route
channel B first (the top of the T), this fixes the width of channel A. We have to route the stem of a T-junction
before we route the top.
FIGURE 16.9 Defining the channel routing order for a slicing floorplan using a slicing tree. (a) Make a cut all the
way across the chip between circuit blocks. Continue slicing until each piece contains just one circuit block. Each
cut divides a piece into two without cutting through a circuit block. (b) A sequence of cuts: 1, 2, 3, and 4 that
successively slices the chip until only circuit blocks are left. (c) The slicing tree corresponding to the sequence of
cuts gives the order in which to route the channels: 4, 3, 2, and finally 1.
Figure 16.9 shows a floorplan of a chip containing several blocks. Suppose we cut along the block
boundaries, slicing the chip into two pieces (Figure 16.9 a). Then suppose we can slice each of
these pieces into two. If we can continue in this fashion until all the blocks are separated, then
we have a slicing floorplan (Figure 16.9 b). Figure 16.9 (c) shows how the sequence we use to
slice the chip defines a hierarchy of the blocks. Reversing the slicing order ensures that we route
the stems of all the channel T-junctions first.
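Reversing the slicing order is a post-order walk of the slicing tree: each cut's channel is routed only after the channels from all later (deeper) cuts inside it. A minimal sketch, with a hypothetical tree whose cut sequence 1, 2, 3, 4 matches Figure 16.9:

```python
# A slicing tree as nested tuples: (cut_id, left_subtree, right_subtree);
# leaves are circuit-block names. Routing order is a post-order walk, so
# the last cuts made (deepest in the tree) are routed first.

def channel_routing_order(node, order=None):
    if order is None:
        order = []
    if isinstance(node, tuple):
        cut, left, right = node
        channel_routing_order(left, order)
        channel_routing_order(right, order)
        order.append(cut)  # route this cut's channel after its children
    return order

# Hypothetical slicing tree for cuts 1, 2, 3, 4 (block names invented):
tree = (1, (2, (3, (4, "A", "B"), "C"), "D"), "E")
order = channel_routing_order(tree)  # channels routed 4, 3, 2, then 1
```

Because children are always emitted before their parent cut, every T-junction stem is routed before its top.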
Cyclic constraints
Figure 16.10 shows a floorplan that is not a slicing structure. We cannot cut the chip all the way
across with a knife without chopping a circuit block in two. This means we cannot route any of the
channels in this floorplan without routing all of the other channels first. We say there is a cyclic
constraint in this floorplan. There are two solutions to this problem. One solution is to move the
blocks until we obtain a slicing floorplan. The other solution is to allow the use of L-shaped, rather
than rectangular, channels (or areas with fixed connectors on all sides, called switch boxes). We need an
area-based router rather than a channel router to route L-shaped regions or switch boxes.
FIGURE 16.10 Cyclic constraints. (a) A nonslicing floorplan with a cyclic constraint that prevents
channel routing. (b) In this case it is difficult to find a slicing floorplan without increasing the chip area.
(c) This floorplan may be sliced (with initial cuts 1 or 2) and has no cyclic constraints, but it is inefficient
in area use and will be very difficult to route.
5. I/O and Power Planning
Every chip communicates with the outside world. Signals flow onto and off the chip and we need to
supply power. We need to consider the I/O and power constraints early in the floorplanning process.
A silicon chip or die is mounted on a chip carrier inside a chip package. Connections are made by
bonding the chip pads to fingers on a metal lead frame that is part of the package. The metal lead-frame
fingers connect to the package pins. A die consists of a logic core inside a pad ring.
FIGURE 16.12 Pad-limited and core-limited die. (a) A pad-limited die. The number of pads determines the
die size. (b) A core-limited die: The core logic determines the die size. (c) Using both pad-limited pads and
core-limited pads for a square die.
FIGURE 16.13 Bonding pads. (a) This chip uses both pad-limited and core-limited pads. (b) A hybrid corner
pad. (c) A chip with stagger-bonded pads. (d) An area-bump bonded chip (or flip-chip). The chip is turned
upside down and solder bumps connect the pads to the lead frame.
Figure 16.13 (a) and (b) are magnified views of the southeast corner of our example chip and show
the different types of I/O cells. Figure 16.13 (c) shows a stagger-bond arrangement using two rows of
I/O pads. In this case the design rules for bond wires (the spacing and the angle at which the bond
wires leave the pads) become very important. Figure 16.13 (d) shows an area-bump bonding
arrangement (also known as flip-chip, solder-bump, or C4, terms coined by IBM, which developed this
technology).
In an MGA the pad spacing and I/O-cell spacing are fixed: each pad occupies a fixed pad slot (or pad
site). This means that the properties of the pad I/O are also fixed but, if we need to, we can parallel
adjacent output cells to increase the drive. To increase flexibility further, the I/O cells can use a
separation, the I/O-cell pitch, that is smaller than the pad pitch. For example, three 4 mA driver
cells can occupy two pad slots. Then we can use two 4 mA output cells in parallel to drive one pad,
forming an 8 mA output pad, as shown in Figure 16.14. This arrangement also means the I/O pad
cells can be changed without changing the base array. This is useful as bonding techniques improve
and the pads can be moved closer together.
FIGURE 16.14 Gate-array I/O pads. (a) Cell-based ASICs may contain pad cells of different sizes and widths.
(b) A corner of a gate-array base. (c) A gate-array base with different I/O cell and pad pitches.
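The pitch arithmetic above is simple enough to sketch directly. The example uses the numbers from the text (an I/O-cell pitch two-thirds of the pad pitch, 4 mA driver cells); the function names and pitch units are invented for illustration.

```python
# Sketch of the gate-array I/O pitch arithmetic described in the text.

def cells_per_slots(cell_pitch, pad_pitch, n_slots):
    """How many I/O cells of the given pitch fit under n pad slots."""
    return (n_slots * pad_pitch) // cell_pitch

def parallel_drive_ma(cell_drive_ma, n_cells):
    """Total drive of n identical output cells wired in parallel."""
    return cell_drive_ma * n_cells

# Cell pitch 2/3 of the pad pitch (hypothetical units): three 4 mA driver
# cells occupy two pad slots, and two of them in parallel make an 8 mA pad.
n_cells = cells_per_slots(cell_pitch=2, pad_pitch=3, n_slots=2)
pad_drive = parallel_drive_ma(4, 2)
```

Because the base array is unchanged, only the I/O pad cells need redesigning when bonding improves and the pad pitch shrinks.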
Figure 16.15 shows two possible power distribution schemes. The long direction of a rectangular
channel is the channel spine. Some automatic routers may require that metal lines parallel to a
channel spine use a preferred layer (either m1, m2, or m3). Alternatively we say that a particular
metal layer runs in a preferred direction. Since we can have both horizontal and vertical channels, we
may have the situation shown in Figure 16.15 , where we have to decide whether to use a preferred
layer or the preferred direction for some channels. This may or may not be handled automatically by
the routing software.
6. Clock Planning
Figure 16.16 (a) shows a clock spine (not to be confused with a channel spine) routing scheme with
all clock pins driven directly from the clock driver. MGAs and FPGAs often use this fishbone type of
clock distribution scheme. Figure 16.16 (b) shows a clock spine for a cell-based ASIC. Figure 16.16
(c) shows the clock-driver cell, often part of a special clock-pad cell. Figure 16.16 (d) illustrates clock
skew and clock latency. Since all clocked elements are driven from one net with a clock spine, skew
is caused by differing interconnect lengths and loads. If the clock-driver delay is much larger than
the interconnect delays, a clock spine achieves minimum skew but with long latency.
FIGURE 16.16 Clock distribution. (a) A clock spine for a gate array. (b) A clock spine for a cell-based ASIC
(typical chips have thousands of clock nets). (c) A clock spine is usually driven from one or more
clock-driver cells. Delay in the driver cell is a function of the number of stages and the ratio of output to input
capacitance for each stage (taper). (d) Clock latency and clock skew. We would like to minimize both latency
capacitance for each stage (taper). (d) Clock latency and clock skew. We would like to minimize both latency
and skew.
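The latency/skew trade-off of a clock spine can be checked with a few lines of arithmetic. With one driver feeding every clock pin, each pin's arrival time is the driver delay plus its interconnect delay; latency is the worst-case arrival and skew is the spread of arrivals. The delay values below are hypothetical (ns).

```python
# Sketch of clock latency and skew for a spine, per Figure 16.16 (d).
# All delay numbers are hypothetical, in ns.

def clock_latency_and_skew(driver_delay, pin_interconnect_delays):
    """driver_delay: delay through the clock-driver cell.
    pin_interconnect_delays: spine delay from driver to each clock pin."""
    arrivals = [driver_delay + d for d in pin_interconnect_delays]
    latency = max(arrivals)               # worst-case arrival at any pin
    skew = max(arrivals) - min(arrivals)  # spread due to interconnect only
    return latency, skew

# Large driver delay, small interconnect spread: long latency, small skew,
# matching the text's observation about clock spines.
latency, skew = clock_latency_and_skew(2.0, [0.10, 0.12, 0.15])
```

Note that the driver delay cancels out of the skew: with a single spine, only the differing interconnect lengths and loads contribute to skew, which is why a slow driver gives minimum skew at the cost of latency.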