
ECE260B – CSE241A

Winter 2017

Floorplanning and Partitioning

Website: http://vlsicad.ucsd.edu/courses/ece260b-w17/

ECE 260B – CSE 241A Floorplanning and Partitioning 1 Andrew B. Kahng, UCSD
Physical Design Flow Overview

Physical Design Flow – Pictures!

[Figure: four panels showing Floorplanning, Powerplanning, Placement, Routing]

Step 0 – Architecture analysis
Understand architecture of the target design

From H. Kaeslin textbook


Step 1 – Partitioning major blocks
Circuit partitioning based on functionality
Not physically partitioned yet

From H. Kaeslin textbook


Step 2 – Pin budgeting
Considering die size, budgeting signal and power pads
Core size will be determined by the IO size

Core-limited floorplan

Pad-limited floorplan
From H. Kaeslin textbook
Step 3 – Macrocell placement
Macro cell placement

From H. Kaeslin textbook


Step 4 – Block placement
Determine size, location and A/R of soft-blocks
Blocks can be partitioned, and then placed/routed
independently – Hierarchical Physical Design

[Figure: example floorplan with soft blocks block1 through block7]

From H. Kaeslin textbook


Step 5 – Power routing
Power routing and placement results

From H. Kaeslin textbook


Floorplanning

Floorplanning Input
Design netlist
Not necessarily final netlist, but including macros, IOs
Area requirements
Die size, package style, BEOL layer stackup
Power requirements
Peak power per block, voltage islands, level shifters, power gating /
voltage scaling strategy, block placement guidance (e.g., close to power
supply pins)
Timing constraints (budgeted across top-level blocks)
Logical and/or physical hierarchy information
Datapath, control, memory, …
Structured-custom vs. ASIC; hierarchical vs. flat
IP integration requirements
Supply, isolation,
Pinout (= I/O placement for wirebond; redistribution layer in flip-chip)
SSO, ESD constraints
Sense of whether design is pad-limited or core-limited
Pad-limited: low utilization, muxing of IOs, use of pads for signals vs. power-
ground distribution, …
Core-limited: high utilization, routability, macro placement
Floorplanning Output
Per-block areas
In a hierarchical flow
IOs placed
Macros placed
Power domains created
Level shifters and power switches
Power grid designed and pre-routed
Standard cell placement regions defined
Placement guidance
Placement groups; assignment of placement groups to regions
Pre-placements
Soft blockages (forbidden to some types of cells)
Hard blockages (keep-outs)
- E.g., to preserve routability near a hard macro

Design ready for standard cell placement

Floorplan Picture

[Figure: modern SoC floorplan with many memories and a heavy power network]

Blocks
Blocks are inside a pad frame
Hard = defined outline = fixed (H,W)
Soft = defined area, but (H,W) flexible
Semi-soft = discrete set of (H,W) pairs
Shapes: rectangular, L, T, rectilinear
Pin locations defined
Can rotate, mirror
Routing inside, between blocks
Sometimes, over blocks
Floorplanning of different-sized blocks is harder than place and route of standard cells
Block placement is done by hand
Issues:
- access to power supply (power-hungry blocks)
- alignment of power grid to supply pins
- soft blockage / “halo” to ensure routability
- leave contiguous region for std-cell P&R
- buffer sites for nets that want to get around a macro
- data flow
[Figure: pad frame of I/O pads enclosing blocks labeled std cell row, RAM, data path]
Courtesy K. Yang, UCLA
Size Estimation in Standard-Cell Blocks
Why we care about size …
If area is too small: P&R will not finish or meet timing, will run too long
Schedule and size are inversely related (size will win out for high-volume
production – and everyone hopes that their chip will be high-volume…)
Performance and size have a complex relationship
[Figure: physical design trade-offs among performance, schedule (design time), and size]

Old rules of thumb (modulo corrections for power, clock, etc.):
- 3LM: Cell utilization 65 percent
- 4LM: Cell utilization 70 percent
- 5LM: Cell utilization 75 percent
- 6LM: Cell utilization 80 percent // high utilizations -> dynamic IR drop, EM !!!
Metrics for standard-cell blocks
Low interconnect density: cell utilization (std-cell area / std-cell row area)
High interconnect density: pin density (causes routing hotspots); post-placement, have congestion map (demand/supply for H, V, H+V resources)
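
The utilization metric above (std-cell area / std-cell row area) can be turned around to estimate how much row area a block needs. The following is a minimal sketch; the function name `required_row_area` and the 1,000,000 um^2 cell area are hypothetical, and the table only restates the old rules of thumb listed above.

```python
def required_row_area(cell_area_um2: float, utilization: float) -> float:
    """Row area for which cell_area / row_area equals the target utilization."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return cell_area_um2 / utilization

# Old rule-of-thumb utilizations, keyed by number of metal layers (3LM..6LM).
RULE_OF_THUMB = {3: 0.65, 4: 0.70, 5: 0.75, 6: 0.80}

cell_area = 1_000_000.0  # hypothetical total std-cell area in um^2
for layers, util in RULE_OF_THUMB.items():
    row_area = required_row_area(cell_area, util)
    print(f"{layers}LM: row area >= {row_area:,.0f} um^2")
```

Corrections for power, clock, and routing hotspots are not modeled here.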
Power Grid Definition During Floorplanning
Which layers are the primary mesh (= thick metal, e.g., M7, M6)
What is the width and pitch of the power rails?
Depends on peak current draws (e.g., “1 um width per mA”)
How frequently to tap down from primary mesh to M1 rails
What is the width and pitch of power rings?
Choose power routing widths and pitch of via stacks carefully to
avoid blocking extra routing tracks
Easy to make the design unroutable
If a track is blocked, then use the space…
Determine number and placement of power switches
Cadence SOC Encounter manual: ring, column, checkerboard
As soon as can get a quick placement, check IR drop before
continuing
All modes including test mode
Grid must be DRC-clean after floorplanning, before continuing

Automated Floorplanning
No automated floorplanning tool has ever made it to “prime
time”, but such a tool remains…
… one of several “holy grails” for the implementation flow
… the subject of MANY academic research papers
Issues
How should a floorplan be represented ?
- Completeness: should be able to represent all possible floorplans
- Efficiency: conversion between representation and actual realization
- Redundancy: not good to evaluate two floorplans, only to find they are same
- Nonoverlapping “packing” of rectilinear shapes, or ? (Kahng, ISPD-2000)
How do we search over the space of feasible floorplan representations?
- Often, simulated annealing (= a “metaheuristic”) is used
- Need a ‘perturbation’ or ‘neighborhood operator’ that induces a smooth cost
landscape
What is the optimization objective ?
- Area (whitespace minimization), wirelength, …
- Dataflow aware?
What are the optimization constraints ?
- Pre-placements, timing, routability, …

Simulated Annealing (SA)
Kirkpatrick, Gelatt, Vecchi, Science (1983): One of the most cited
scientific papers ever
SA is one of many “metaheuristics” that are used to deal with instances of
intractable (NP-hard) combinatorial problems
Genetic algorithms (Holland, U. Michigan)
Tabu search (Glover, U. Colorado)
Etc.
Combinatorial optimization has a physical analogy to the annealing (slow
cooling) of metals to produce a perfectly-ordered, minimum-energy state:
a “state” is a “solution”, “energy” is “cost”, etc.
Basic idea
Step 1: Initialize – Start with a random initial solution. Initialize high “temperature”.
Step 2: “Move” – Perturb current solution to obtain a ‘neighbor’ solution
Step 3: Calculate cost change – calculate the change in solution cost due to
the move (minimization: negative change is better, positive change is worse)
Step 4: Accept/Reject – Depending on the cost change, accept or reject the
move. Probability of acceptance depends on current “temperature”.
Step 5: Update – Update temperature, current solution. Go to Step 2.
Continue until termination condition (‘freezing’ or ‘quenching’) is satisfied

SA Pseudocode
http://www.ecs.umass.edu/ece/labs/vlsicad/ece665/slides/SimulatedAnnealing.ppt

Algorithm SIMULATED-ANNEALING
Begin
    temp = INIT-TEMP;
    currentSol = INIT-SOLUTION;
    for i = 1 to M
        candidateSol = NEIGHBOR(currentSol);
        ΔC = COST(candidateSol) - COST(currentSol);
        if (ΔC < 0) then
            currentSol = candidateSol;
        else with Pr = e^(-ΔC/temp)
            currentSol = candidateSol;
        temp = SCHEDULE(temp);
End

What happens when temp = +∞ ?
What happens when temp = 0 ?
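
The pseudocode above can be sketched in Python as follows. This is an illustration under stated assumptions, not the course's implementation: the geometric cooling schedule (`alpha`), the temperature floor, the `seed` parameter, and the best-so-far bookkeeping are all choices made for this example, and the toy problem (minimize (x - 3)^2 over the integers) is hypothetical.

```python
import math
import random

def simulated_annealing(init_solution, neighbor, cost, init_temp=100.0,
                        alpha=0.95, moves=10_000, seed=0):
    """Generic SA loop; returns the best solution seen and its cost."""
    rng = random.Random(seed)
    current = init_solution
    current_cost = cost(current)
    best, best_cost = current, current_cost
    temp = init_temp
    for _ in range(moves):
        candidate = neighbor(current, rng)
        delta = cost(candidate) - current_cost
        # Downhill moves are always accepted; others with Pr = exp(-delta/temp).
        if delta < 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        # Assumed SCHEDULE: geometric cooling; the floor avoids division by zero.
        temp = max(temp * alpha, 1e-12)
    return best, best_cost

# Toy usage (hypothetical problem): minimize (x - 3)^2 with +/-1 moves.
best, best_cost = simulated_annealing(
    init_solution=50,
    neighbor=lambda x, rng: x + rng.choice((-1, 1)),
    cost=lambda x: (x - 3) ** 2,
)
print(best, best_cost)
```

At very high temperature almost every move is accepted (a random walk); once the temperature is near zero only non-worsening moves are accepted, which is greedy descent.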
Simulated Annealing Facts

NEIGHBOR(solution) defines a topology over all solutions in the solution space
At a fixed value of temp, SA behavior corresponds to a homogeneous Markov chain
Matrix of transition probabilities between states
Steady-state (= equilibrium) probability of the Markov chain being in state A is proportional to e^(-cost(A)/temp)
When temp → 0, exponentially more likely to be in the global optimum state
“SA is optimal” (in the limit of ‘infinite time’)
Of course, we spend only a finite amount of time (#moves) at any temperature value
Is cooling the best strategy with finite time? See Boese/Kahng, 1993
[Figure: cost landscape. From the initial state, SA chooses an uphill move with nonzero probability (hill climbing); a greedy algorithm gets stuck at a locally optimum solution. SA converges to global opt solution with Pr = 1 (in limit of infinite time, infinitely slow cooling)]

Slicing Floorplan Representation (Otten, 1982)
A slicing floorplan can be recursively cut in two without cutting any blocks
A “wheel” is an example of a non-slicing floorplan
A slicing floorplan can be represented as a binary tree, with internal nodes representing slices in the floorplan and leaves representing blocks.
Polish Expression (PE): Post-order listing of nodes in depth-first traversal of binary tree: ABHCDEHHV
For given slicing floorplan, PE not unique, so some redundancy
[Figure: slicing floorplan with blocks A, B, C, D, E and cuts 1 to 4; slicing tree with root 1 (V), internal nodes 2 (H), 3 (H), 4 (H), and leaves A, B, C, D, E]
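
A Polish expression can be evaluated with a stack to get the bounding box of the slicing floorplan. The sketch below assumes the usual convention that H is a horizontal cut (the two sub-floorplans are stacked vertically) and V is a vertical cut (they sit side by side); the block sizes are hypothetical, and blocks are assumed to be hard with fixed orientation.

```python
def pe_bounding_box(expr, sizes):
    """Evaluate a Polish expression; returns (width, height) of the floorplan."""
    stack = []
    for token in expr:
        if token in ("H", "V"):
            w2, h2 = stack.pop()
            w1, h1 = stack.pop()
            if token == "H":   # horizontal cut: stack the two parts vertically
                stack.append((max(w1, w2), h1 + h2))
            else:              # vertical cut: place the two parts side by side
                stack.append((w1 + w2, max(h1, h2)))
        else:                  # operand: a block, looked up by name
            stack.append(sizes[token])
    if len(stack) != 1:
        raise ValueError("not a valid Polish expression")
    return stack[0]

# Hypothetical (W, H) for the five blocks of the expression on the slide.
sizes = {"A": (2, 3), "B": (2, 1), "C": (3, 2), "D": (1, 1), "E": (2, 1)}
print(pe_bounding_box("ABHCDEHHV", sizes))  # -> (5, 4)
```

Block names must not be "H" or "V" in this sketch, since those characters are reserved for the operators.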
Annealing of Slicing Floorplans (Wong/Liu, 1986)
Normalized Polish Expression (NPE): no two consecutive identical operators (no HH or VV)
Chain: HVHVH.... or VHVHV....
Example: 16H35V2HV74HV (chains: H, V, HV, HV)

Neighborhood operators (“moves”)
M1: Swap adjacent operands (ignoring chains)
M2: Complement some chain (swap H and V throughout the chain)
M3: Swap an adjacent operand and operator
(can give an invalid NPE, so must check validity of this move)

Fact: every pair of valid NPE’s is connected by some move sequence, i.e., “reachability” within neighborhood structure
Initial SA solution: 12V3V…nV (blocks 1, 2, 3, ..., n in a single row)
Adapted from D. Pan, EE382V Fall 2008, UT Austin
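
Because move M3 can produce an invalid expression, each M3 candidate needs a validity check. The sketch below checks the two standard conditions: the balloting property (in every prefix, operators are fewer than operands) and normalization (no two identical adjacent operators). The helper names `is_valid_npe` and `complement_chain` are hypothetical, and tokens are passed as a list so that multi-digit block names would also work.

```python
def is_valid_npe(tokens, operators=("H", "V")):
    """True if tokens form a normalized Polish expression."""
    operands = 0
    ops = 0
    prev = None
    for t in tokens:
        if t in operators:
            ops += 1
            if ops >= operands:   # balloting: operators must stay below operands
                return False
            if t == prev:         # normalized: no HH or VV
                return False
        else:
            operands += 1
        prev = t
    return operands == ops + 1    # a binary tree with n leaves has n - 1 cuts

def complement_chain(tokens, start):
    """Move M2: swap H and V along the operator chain beginning at index start."""
    out = list(tokens)
    i = start
    while i < len(out) and out[i] in ("H", "V"):
        out[i] = "V" if out[i] == "H" else "H"
        i += 1
    return out

npe = list("16H35V2HV74HV")
print(is_valid_npe(npe))                      # the slide's example is valid
print("".join(complement_chain(npe, 7)))      # chain HV at index 7 becomes VH
print(is_valid_npe(list("123VV")))            # M3 applied to 12V3V gives VV
```

Complementing a whole chain keeps the expression normalized, since the chain's neighbors on both sides are operands.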

Realization of Slicing Floorplans
Floorplanning (classically) is difficult for at least two reasons
Blocks have bounded or discrete aspect ratio (AR) = max (H/W, W/H)
Non-overlapping constraint: minimum area = minimum “dead space”
[Figure: discrete sizing, rotation, dead space]

Classical objective function: C = α∙Area + β∙Wirelength

Issue: How to estimate WL (and hence timing, noise, power, …) when pin
locations are not known, blockages not comprehended, etc.
[Figure: wirelength estimation for a 2-pin net and a 3-pin net]

Adapted from D. Pan, EE382V Fall 2008, UT Austin
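
The classical objective C = α∙Area + β∙Wirelength can be sketched as below. Since pin locations are not known at this stage, the example estimates wirelength by half-perimeter wirelength (HPWL) over block centers; that estimator is an assumption of this sketch (the slide does not prescribe one), and all names and coordinates are hypothetical.

```python
def hpwl(net, centers):
    """Half-perimeter of the bounding box of the centers of a net's blocks."""
    xs = [centers[b][0] for b in net]
    ys = [centers[b][1] for b in net]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))

def floorplan_cost(width, height, nets, centers, alpha=1.0, beta=1.0):
    """C = alpha * Area + beta * Wirelength, with wirelength estimated by HPWL."""
    area = width * height
    wirelength = sum(hpwl(net, centers) for net in nets)
    return alpha * area + beta * wirelength

# Hypothetical block centers, one 2-pin net and one 3-pin net.
centers = {"A": (1, 1), "B": (4, 1), "C": (4, 5)}
nets = [("A", "B"), ("A", "B", "C")]
print(floorplan_cost(6, 6, nets, centers, alpha=1.0, beta=0.5))  # -> 41.0
```

The weights α and β trade dead space against wirelength; how to set them is design-dependent and not addressed here.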

Realization of Slicing Floorplans
What is the implied area of a slicing tree?
Shape function captures set of feasible (W, H) pairs for each node in slicing tree
Shape functions can be combined recursively (bottom-up) in slicing tree
Pick best-area implementation of root node
Maintain k points on each shape curve, giving O(kn) time to compute shape function of slicing floorplan
Can be updated incrementally as well
Adapted from D. Pan, EE382V Fall 2008, UT Austin
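
Combining two shape functions at an internal node can be done with a linear merge, which is where the O(kn) bound comes from. The sketch below assumes each shape function is a list of non-dominated (W, H) points sorted by increasing W (so H decreases), and that V means side by side and H means stacked; the function name and the two rotatable example blocks are hypothetical.

```python
def combine(shape_a, shape_b, cut):
    """Merge two shape functions for a 'V' (side by side) or 'H' (stacked) cut."""
    if cut == "H":
        # Stacking is the side-by-side case with the roles of W and H swapped.
        flipped = combine([(h, w) for w, h in reversed(shape_a)],
                          [(h, w) for w, h in reversed(shape_b)], "V")
        return [(w, h) for h, w in reversed(flipped)]
    out, i, j = [], 0, 0
    while i < len(shape_a) and j < len(shape_b):
        (wa, ha), (wb, hb) = shape_a[i], shape_b[j]
        out.append((wa + wb, max(ha, hb)))
        # Only widening the taller child can reduce the combined height.
        if ha >= hb:
            i += 1
        if hb >= ha:
            j += 1
    return out

# Hypothetical hard blocks that may be rotated: 2x3 and 1x4.
a = [(2, 3), (3, 2)]
b = [(1, 4), (4, 1)]
side_by_side = combine(a, b, "V")
stacked = combine(a, b, "H")
print(side_by_side)                             # [(3, 4), (6, 3), (7, 2)]
print(stacked)                                  # [(2, 7), (3, 6), (4, 3)]
print(min(side_by_side, key=lambda p: p[0] * p[1]))  # best-area point (3, 4)
```

Each merge touches every input point at most once, so the work per tree node is linear in the number of points kept.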

Comments on Scalability, Time Constants
Chip implementation requires substantial resources
Time, hardware, people
Human engineer’s time constants
Interactive (real-time in layout editor, or < 2 minutes to get cursor back)
Cup of coffee (< 15 minutes)
Lunch (< 1 hour)
Overnight (< 12 hours)
Other costs and bounds
Until recently: 4GB addressable memory limit on 32-bit machines
Tool costs: SOC Encounter ($700K), PrimeTime ($100K), Design Compiler
($150K), etc. for 1-year time-based license
Engineer costs: ~$1K per workday
Instance sizes double with each technology node
But processors are not twice as fast per node
Parallel processing, more shortcuts
- E.g., #moves SA must consider per second constrains move set, cost function…
Reminder: internal course document How To Start Using
Tools Efficiently (read this !)
Sequence Pair Floorplan Representation

Based on layout partitions by non-overlapping ascending/descending staircases
Coded in two node sequences
E.g., CEDFAB for descending staircases and ABCDEF for ascending staircases
Larger solution space, finer representation
Optimize floorplan by searching over these representations
[Figure: floorplan with blocks A, B, C, D, E, F]

Courtesy K. Yang, UCLA
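
A sequence pair can be turned into block coordinates by a simple longest-path style computation. The sketch below uses a convention common in the sequence-pair literature, which is an assumption here and may not match the staircase ordering drawn on the slide: if a precedes b in both sequences, a is left of b; if a follows b in the first sequence and precedes b in the second, a is below b. Block names and sizes are hypothetical, and the O(n^2) loop favors clarity over speed.

```python
def pack_sequence_pair(pos_seq, neg_seq, sizes):
    """Return (x, y, width, height) for a sequence pair and block (W, H) sizes."""
    pos_index = {b: i for i, b in enumerate(pos_seq)}
    x, y = {}, {}
    # Visiting blocks in neg_seq order guarantees that every block to the left
    # of or below the current block has already been placed.
    for i, b in enumerate(neg_seq):
        earlier = neg_seq[:i]
        x[b] = max((x[a] + sizes[a][0] for a in earlier
                    if pos_index[a] < pos_index[b]), default=0)
        y[b] = max((y[a] + sizes[a][1] for a in earlier
                    if pos_index[a] > pos_index[b]), default=0)
    width = max(x[b] + sizes[b][0] for b in neg_seq)
    height = max(y[b] + sizes[b][1] for b in neg_seq)
    return x, y, width, height

sizes = {"a": (2, 1), "b": (1, 1), "c": (1, 2)}  # hypothetical (W, H)
x, y, width, height = pack_sequence_pair("abc", "bac", sizes)
print(x, y, width, height)
```

In this example a sits on top of b, and c is to the right of both, giving a 3 x 2 bounding box.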


Partitioning

Useful reference: Alpert/Kahng, “Recent Directions in Netlist Partitioning: A Survey”, Integration: the VLSI Journal, 19(1-2), 1995, pp. 1-81.

Hypergraphs in VLSI CAD
Circuit netlist represented by hypergraph

Courtesy K. Yang, UCLA


Hypergraph Partitioning in VLSI
Circuit netlist represented by hypergraph
Variants
- directed/undirected hypergraphs
- weighted/unweighted vertices, edges
- constraints, objectives, …
Human-designed instances
- up to 10,000,000 vertices
- sparse (vertex degree ≈ 4, hyperedge size ≈ 4)
- small number of very large hyperedges (clock, reset …)
Efficiency, flexibility: “KL-FM” style preferred
KL = Kernighan-Lin 1970
FM = Fiduccia-Mattheyses 1982

Courtesy K. Yang, UCLA
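
As a small illustration of the hypergraph view, a netlist can be stored as a list of nets (hyperedges), each a tuple of cells (vertices), and the hyperedge cut of a partition is the number of nets with cells in more than one part. The netlist and partition below are hypothetical.

```python
def hyperedge_cut(nets, part):
    """Count nets whose cells span more than one partition."""
    return sum(1 for net in nets if len({part[v] for v in net}) > 1)

# Hypothetical netlist: three small nets plus one large clock-like net.
nets = [("a", "b"), ("b", "c", "d"), ("c", "d"), ("clk_src", "a", "b", "c", "d")]
part = {"a": 0, "b": 0, "c": 1, "d": 1, "clk_src": 0}
print(hyperedge_cut(nets, part))  # -> 2
```

A net counts once no matter how many parts it touches, which is why a few very large nets such as clock or reset are cut by almost any partition.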


Example: Partitioning of a Circuit

#vertices = 48

Hyperedge Cut = 4 (9) Hyperedge Cut = 8 Hyperedge Cut = 4 (5)


Partition Size = 15 Partition Size = 16 Partition Size = 17

Courtesy K. Yang, UCLA


Partitioning Context: Top-Down Placement
Speed: 10,000 cells/minute to final detailed placement
- partitioning used only in top-down global placement
- implied partitioning runtime: <1 second for 25K cells, < 30 seconds for 1M cells
Structure: tight balance constraint on total cell areas in partitions
- widely varying cell areas
- fixed terminals (pads, terminal propagation, etc.)


Fiduccia-Mattheyses (FM) Approach

Pass:
start with all vertices free to move (unlocked)
label each possible move with immediate change in cost that it
causes (gain)
iteratively select and execute a move with highest gain, lock the
moving vertex (i.e., cannot move again during the pass), and
update affected gains
best solution seen during the pass is adopted as starting solution
for next pass

FM:
start with some initial solution
perform passes until a pass fails to improve solution quality
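
The pass structure above can be sketched as follows for bipartitioning with unit vertex weights. This is a simplified illustration, not the linear-time algorithm: gains are recomputed from scratch instead of being kept in gain buckets, and the balance rule (`max_imbalance`, the allowed difference in side sizes) and the example netlist are assumptions of this sketch.

```python
def cut_size(nets, side):
    return sum(1 for net in nets if len({side[v] for v in net}) == 2)

def gain(v, nets_of, nets, side):
    """Reduction in cut if v alone moves to the other side."""
    g = 0
    for n in nets_of[v]:
        same = sum(1 for u in nets[n] if side[u] == side[v])
        other = len(nets[n]) - same
        if same == 1 and other > 0:
            g += 1   # v is alone on its side: the net becomes uncut
        elif other == 0 and len(nets[n]) > 1:
            g -= 1   # the net is entirely on v's side: moving v cuts it
    return g

def fm_pass(nets, side, max_imbalance=2):
    """One pass: every vertex moves at most once; return the best prefix."""
    side = dict(side)
    nets_of = {v: [] for v in side}
    for i, net in enumerate(nets):
        for v in net:
            nets_of[v].append(i)
    free = set(side)                      # unlocked vertices
    cur = cut_size(nets, side)
    best_cut, best_side = cur, dict(side)
    while free:
        count = [sum(1 for s in side.values() if s == k) for k in (0, 1)]
        legal = [v for v in sorted(free)
                 if abs((count[side[v]] - 1) - (count[1 - side[v]] + 1))
                 <= max_imbalance]
        if not legal:
            break
        v = max(legal, key=lambda u: gain(u, nets_of, nets, side))
        cur -= gain(v, nets_of, nets, side)   # highest gain, even if negative
        side[v] = 1 - side[v]
        free.remove(v)                        # lock the moved vertex
        if cur < best_cut:
            best_cut, best_side = cur, dict(side)
    return best_side, best_cut

def fm(nets, side):
    """Repeat passes until a pass fails to improve the cut."""
    best = cut_size(nets, side)
    while True:
        new_side, new_cut = fm_pass(nets, side)
        if new_cut >= best:
            return side, best
        side, best = new_side, new_cut

# Hypothetical example: two triangles joined by one net, badly partitioned.
nets = [("a", "b"), ("b", "c"), ("a", "c"),
        ("d", "e"), ("e", "f"), ("d", "f"), ("c", "d")]
start = {"a": 0, "b": 0, "d": 0, "c": 1, "e": 1, "f": 1}
final_side, final_cut = fm(nets, start)
print(cut_size(nets, start), "->", final_cut)  # 5 -> 1
```

Moves with negative gain are still executed during a pass, which is what lets the heuristic climb out of local minima; only the best prefix of moves is kept.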

Gain Bucket Data Structure

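[Figure: bucket array indexed by gain from -pmax to +pmax, with a Max Gain pointer; each bucket holds a list of cell numbers; a cell array indexed 1, 2, ..., n]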

From D. Pan, EE382V Fall 2008, UT Austin
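
A minimal sketch of the gain bucket structure follows. It assumes gains are integers in the range -pmax to +pmax and uses Python sets where the figure shows per-bucket lists, so which cell is returned among equal-gain cells is arbitrary; the class and method names are hypothetical.

```python
class GainBuckets:
    """Array of buckets indexed by gain, plus a pointer to the max gain."""

    def __init__(self, pmax):
        self.pmax = pmax
        self.buckets = [set() for _ in range(2 * pmax + 1)]  # index = gain + pmax
        self.gain_of = {}            # cell -> current gain (the "cell array")
        self.max_gain = -pmax

    def insert(self, cell, gain):
        if abs(gain) > self.pmax:
            raise ValueError("gain out of range")
        self.buckets[gain + self.pmax].add(cell)
        self.gain_of[cell] = gain
        self.max_gain = max(self.max_gain, gain)

    def remove(self, cell):
        gain = self.gain_of.pop(cell)
        self.buckets[gain + self.pmax].discard(cell)

    def update(self, cell, delta):
        """Move a cell to the bucket for its new gain."""
        gain = self.gain_of[cell] + delta
        self.remove(cell)
        self.insert(cell, gain)

    def pop_max(self):
        """Remove and return a cell of highest gain, or None if empty."""
        while (self.max_gain > -self.pmax
               and not self.buckets[self.max_gain + self.pmax]):
            self.max_gain -= 1       # walk the pointer down past empty buckets
        bucket = self.buckets[self.max_gain + self.pmax]
        if not bucket:
            return None
        cell = bucket.pop()
        del self.gain_of[cell]
        return cell

gb = GainBuckets(pmax=3)
for cell, g in (("a", 1), ("b", -2), ("c", 3)):
    gb.insert(cell, g)
first = gb.pop_max()      # "c", the only cell with gain 3
gb.update("a", +1)        # a's gain becomes 2
second = gb.pop_max()     # "a"
print(first, second)
```

Insert, remove, and update take constant time because the bucket index is computed directly from the gain.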


FM Partitioning
Moves are made based on object gain

Object Gain: The amount of change in cut crossings that will occur if an object is moved from its current partition into the other partition
- each object is assigned a gain
- objects are put into a sorted gain list
- the object with the highest gain from the larger of the two sides is selected and moved.
- the moved object is "locked"
- gains of "touched" objects are recomputed
- gain lists are resorted
[Figure: example circuit with each object labeled by its gain]

From D. Pan, EE382V Fall 2008, UT Austin


FM Partitioning
[Figure sequence: the example circuit redrawn after each FM move, with the gain labels of the objects updated at every step. From D. Pan, EE382V Fall 2008, UT Austin]

Cut During One F-M Pass (Bipartitioning)

[Figure: plot of cut (vertical axis) versus moves (horizontal axis) during one pass]
Time Complexity of FM

For each pass:


Constant time to find the best vertex to move
After each move, time to update gain buckets is proportional to
degree of vertex moved
Total time is O(p), where p is total number of pins

Number of passes is usually small


In practice
Force #passes = 2 or less
Cut off the pass very early (after only 5% finished)
Together, these two heuristic modifications result in ~50X speedup!

Multilevel Partitioning (since ~1995)

Clustering / Coarsening
Refinement / Uncoarsening
Near-linear scalability
Homework (due Monday Jan 30 in Gradescope)
Q1. (a) What is “fixed-outline floorplanning”? Give a definition and at least one citation. (b) How does “fixed-outline floorplanning” differ from the “packing with minimum whitespace” formulations seen in the academic literature? (c) Why is “fixed-outline floorplanning” sensible in the context of modern design of a (large) logic block? (d) Look at the documentation of Cadence EDI / Innovus. List, and give explanations of, at least three commands that define “floorplan” regions that will affect the subsequent gate-level placement.
Q2. Download and skim U.S. Patent #6,223,329. Think about Figures 4, 5 and 8. (a) Explain in your own words how the invention claimed in this patent will define “ports” of blocks. (b) Why does the patent call the blocks in the figures “soft blocks”?

Readings linked from class webpage
C. M. Fiduccia and R. M. Mattheyses, “A Linear Time Heuristic for Improving Network Partitions”, Proc. ACM/IEEE Design Automation Conference, 1982, pp. 175-181.
A. E. Caldwell, A. B. Kahng and I. L. Markov, “Design and Implementation of the Fiduccia-Mattheyses Heuristic for VLSI Netlist Partitioning”, Proc. Workshop on Algorithm Engineering and Experimentation (ALENEX), January 1999.
C. J. Alpert and A. B. Kahng, “Recent Directions in Netlist Partitioning: A Survey”, Integration: The VLSI Journal 19(1-2), 1995, pp. 1-81.
