11 Timing Analysis Logic
Pingqiang Zhou
ShanghaiTech University
ASIC Timing: Role of CAD Tools
ASIC timing has deep interactions with logic and layout synthesis.
[Figure: High-level description + Timing Specifications feeding Logic Synthesis and Layout Synthesis]
Our Topics for ASIC Timing
Logic-side: Static Timing Analysis
How do we estimate the worst-case timing through a logic network?
It turns out to be a longest-path problem on a graph that properly models the gates and wires.
Timing Analysis at the Logic Level
Goal: Verify timing behavior of our logic design
Input:
A gate-level netlist.
Timing models of the gates and/or wires.
Output:
Signal arrival time at various points in the network.
Longest delays through gate network.
Does the netlist satisfy the timing requirement? If not, where are the key problems?
This is surprisingly complicated in the real world...
Analyzing Design Performance
Assume the design is synchronous.
All storage is in explicit sequential elements, e.g., flip-flops.
Consequence: we can just focus on delays through the combinational gates.
[Figure: launch flip-flops → combinational logic (no feedback loops) → capture flip-flops, all driven by Clock]
Question: Can’t We Just Simulate Logic?
What does logic simulation do?
It determines how a system will behave by simulating its logical function.
It gives the most accurate answer, given good simulation models.
… but it is (practically) impossible for it to give a complete answer, especially for timing.
A complete answer requires examining an exponential number of cases:
All possible input vectors …
With all possible relative timings …
Under all possible manufacturing variations …
We need a different, faster solution...
Timing Analysis: Basic Model
Assume we know the clock cycle, e.g., a 1 GHz clock has a cycle of 1 ns.
[Figure: flip-flops → combinational logic → flip-flops, all driven by Clock]
Requirement: longest delay through the combinational logic < clock cycle.
Timing Analysis: Gate Delay Models
First, we need a model of the delay through each logic gate.
Delay of a single gate: a change on input X appears at output Y after a delay ∆.
[Figures courtesy: UC Berkeley]
In Reality: Gate Delay is Very Complex
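Gate type affects delay: different gate types have different ∆.
Gate loading affects delay: the same gate driving different loads has different ∆.
Gate input pin affects delay: ∆ differs from one input pin to another.
Why? At the transistor level, inputs are not symmetric.
[Figure: probability density function (PDF) of a gate delay ∆, with axis ticks at 200, 240, 280]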
Our Model: Pin-to-Pin Delay
In this lecture, we keep it simple: a fixed, pin-to-pin delay model.
No slopes, no transition direction, no delay distributions. Loading effects are "pushed" into the gate delay itself.
Per-pin delays are essential in practice, but we will use just one value per gate, for simplicity.
It turns out this is enough to see all the interesting algorithmic ideas.
[Figure: example gates annotated with fixed delays ∆=3 and ∆=5]
Do We Consider Logical Function?
Does logic function matter?
Try an example in which we "erase" the gates.
In this example: PI = Primary Input, PO = Primary Output.
[Figure: a network with several PIs, one PO, and gates labeled only by their delays ∆=8, ∆=2, ∆=1; a second panel shows the same network with logic values 0 and 1 marked on gate inputs]
We cannot sensitize this path: no logic change at this input can propagate down this path to change this output.
Topological vs. Logical Timing Analysis
When we ignore logic, this is called Topological Analysis.
We only work with graph and delays, don’t consider logic.
We can get wrong answers: what we just found is called a false path.
Going forward, we ignore logic (it is too tough to deal with).
Assume that all paths are statically sensitizable.
This means we can find a constant pattern of inputs on the other PIs that makes some output sensitive to some input.
Reminder: this is exactly the Boolean difference concept of sensitivity.
This timing analysis has a name: Static Timing Analysis (STA).
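As a rough illustration of the Boolean difference idea, the sketch below brute-forces a constant assignment of the other inputs under which ∂f/∂x = f|x=1 XOR f|x=0 equals 1. The helper name `sensitizing_pattern` and the dict-based function encoding are assumptions for this example. Enumeration is exponential, so it only suits tiny functions, and it tests whether the output depends on an input at all, not whether one particular path through a netlist is sensitizable.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

# Hypothetical encoding: a Boolean function maps {input name: 0/1} to 0 or 1.
BoolFn = Callable[[Dict[str, int]], int]


def sensitizing_pattern(f: BoolFn, inputs: List[str], x: str) -> Optional[Dict[str, int]]:
    """Return constant values for the inputs other than x under which the
    Boolean difference df/dx = f(x=1) XOR f(x=0) is 1, or None if none exist."""
    others = [v for v in inputs if v != x]
    for values in product((0, 1), repeat=len(others)):
        env = dict(zip(others, values))
        if f({**env, x: 1}) ^ f({**env, x: 0}):
            return env  # with these constants, a change on x reaches the output
    return None  # the output never depends on x


def f(v: Dict[str, int]) -> int:
    # (a AND b) OR c: sensitive to a only when b = 1 and c = 0
    return (v["a"] & v["b"]) | v["c"]


def g(v: Dict[str, int]) -> int:
    # (a AND b) OR (a AND NOT b) simplifies to a, so b can never be sensitized
    return (v["a"] & v["b"]) | (v["a"] & (1 - v["b"]))


print(sensitizing_pattern(f, ["a", "b", "c"], "a"))  # {'b': 1, 'c': 0}
print(sensitizing_pattern(g, ["a", "b"], "b"))       # None
```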
STA Representation: Delay Graph
From gate-level network, we build a delay graph.
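Vertices: wires in the gate network, one per gate output, plus one for each PI and PO.
Edges: from each input pin to the output pin of its gate (one edge per input pin). Gate delays go on the edges.
[Figure: gate network with PIs a, b, d. Gate c has inputs a, b and ∆=4; gate e has inputs c, d and ∆=3; e is the PO. Delay graph edges: a→c (4), b→c (4), c→e (3), d→e (3).]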
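A minimal sketch of this construction, assuming a hypothetical netlist format that maps each gate output wire to its input wires and a single gate delay (the one-value-per-gate simplification used in this lecture):

```python
from typing import Dict, List, Tuple

# Hypothetical netlist format: gate output wire -> (input wires, gate delay).
Netlist = Dict[str, Tuple[List[str], int]]
Edge = Tuple[str, str, int]  # (from vertex, to vertex, delay)


def build_delay_graph(netlist: Netlist) -> List[Edge]:
    """One vertex per wire; one edge per gate input pin, carrying the gate delay."""
    edges: List[Edge] = []
    for out, (ins, delay) in netlist.items():
        for pin in ins:
            edges.append((pin, out, delay))
    return edges


# The slide's example: c = gate(a, b) with delay 4, e = gate(c, d) with delay 3.
netlist: Netlist = {"c": (["a", "b"], 4), "e": (["c", "d"], 3)}
print(build_delay_graph(netlist))
# [('a', 'c', 4), ('b', 'c', 4), ('c', 'e', 3), ('d', 'e', 3)]
```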
Delay Graph
Common convention: Add Source/Sink nodes
Add one “source” (src) node that has a 0-weight edge to
each PI.
Add one “sink” (snk) node that has a 0-weight edge from
each PO.
Why do this?
Now the network has exactly one "entry" node and one "exit" node.
All the longest (or shortest) path questions have the same start and end nodes.
[Figure: the previous delay graph, plus src with 0-weight edges to a, b, d, and a 0-weight edge from e to snk]
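Adding the source and sink is a simple edge-list transformation. The helper `add_source_sink` below is hypothetical and uses the `(from, to, delay)` edge-list representation assumed for these illustrations:

```python
from typing import List, Tuple

Edge = Tuple[str, str, int]  # (from vertex, to vertex, delay)


def add_source_sink(edges: List[Edge], pis: List[str], pos: List[str],
                    src: str = "src", snk: str = "snk") -> List[Edge]:
    """Return a new edge list with 0-weight edges src->PI and PO->snk added."""
    return ([(src, pi, 0) for pi in pis]
            + list(edges)
            + [(po, snk, 0) for po in pos])


edges = [("a", "c", 4), ("b", "c", 4), ("c", "e", 3), ("d", "e", 3)]
full = add_source_sink(edges, pis=["a", "b", "d"], pos=["e"])
print(full[:3], full[-1])
# [('src', 'a', 0), ('src', 'b', 0), ('src', 'd', 0)] ('e', 'snk', 0)
```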
Representation: Delay Graph
What about interconnect delay?
We can still use the delay graph: model each wire as a "special" gate that just has a delay.
[Figure: the same network with wire nodes x, y, w, z, q inserted. Delay graph edges: src→a, src→b, src→d (0); a→x (1), x→c (4); b→y (2), y→c (4); c→w (2), w→e (3); d→z (1), z→e (3); e→q (2); q→snk (0).]
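One way to sketch the "wire as a special gate" idea on an edge list: split the edge from a driver into a gate so that a delay-only wire node sits in between. `insert_wire` is a hypothetical helper; in this simplified version, if one driver feeds the same gate through two pins, both edges are routed through the same wire node.

```python
from typing import List, Tuple

Edge = Tuple[str, str, int]  # (from vertex, to vertex, delay)


def insert_wire(edges: List[Edge], driver: str, gate_out: str,
                wire: str, wire_delay: int) -> List[Edge]:
    """Replace edge driver->gate_out (gate delay d) by driver->wire (wire_delay)
    followed by wire->gate_out (d)."""
    result: List[Edge] = []
    for u, v, d in edges:
        if (u, v) == (driver, gate_out):
            result.append((u, wire, wire_delay))  # the wire's own delay
            result.append((wire, v, d))           # then the gate's pin delay
        else:
            result.append((u, v, d))
    return result


edges = [("a", "c", 4), ("b", "c", 4), ("c", "e", 3), ("d", "e", 3)]
edges = insert_wire(edges, "a", "c", "x", 1)  # wire x, delay 1
edges = insert_wire(edges, "b", "c", "y", 2)  # wire y, delay 2
print(edges[:4])
# [('a', 'x', 1), ('x', 'c', 4), ('b', 'y', 2), ('y', 'c', 4)]
```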
Operations on Delay Graph
So how do we use the delay graph to do timing analysis?
What we don't do: try to enumerate all the source-to-sink paths.
Why not? The number of paths explodes exponentially, even for a small graph.
[Figure: a chain of nodes 0, 1, 2, …, n, with two alternative edges between each consecutive pair]
How many paths from 0 to n? $2^n$.
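Counting paths does not require enumerating them: one pass over a topological order, adding each node's count into its successors, gives the total. The sketch below (hypothetical `count_paths`, assuming the chain has two parallel edges per stage) shows the count reaching 2^n while the work stays linear in the number of edges. This is the same per-node propagation idea that the AT and RAT computations rely on.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def count_paths(edges: List[Edge], order: List[int], start: int, end: int) -> int:
    """Count start->end paths in a DAG in one pass over a topological order.
    Parallel edges count as distinct paths."""
    succ: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    ways = {v: 0 for v in order}
    ways[start] = 1
    for u in order:  # all predecessors of u were processed before u
        for v in succ[u]:
            ways[v] += ways[u]
    return ways[end]


n = 20
chain = [(i, i + 1) for i in range(n) for _ in range(2)]  # two edges per stage
print(count_paths(chain, list(range(n + 1)), 0, n))  # 1048576, i.e. 2**20
```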
Define Values on Nodes in Delay Graph
Arrival Time at a node (AT)
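AT(n) = latest time the signal can become stable at node n.
Think: longest path from the source.
Required Arrival Time at a node (RAT)
RAT(n) = latest time the signal is allowed to become stable at node n.
Think: the cycle time minus the longest path from n to the sink.
[Figure: node n on a path from src to snk among other paths; AT(n) covers the src-to-n part, RAT(n) the n-to-snk part]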
Define Values on Nodes in Delay Graph
Slack at node n: Slack(n) = RAT(n) - AT(n)
The amount of timing "margin" for the signal: positive is good, negative is bad.
Determined by the longest path through the node.
The amount by which the signal at the node can be delayed without making any path through the node miss the cycle-time requirement.
If a node has positive slack, we can increase its delay (e.g., to reduce power or circuit area) without degrading overall performance.
[Figure: node n on a path from src to snk among other paths; Slack(n) = RAT(n) - AT(n)]
Slack is Hugely Important in Timing Analysis
About slacks:
Defined so that negative slack is always bad: it indicates a timing problem.
Measures the "sensitivity" of the network to this node's delay.
Positive slack
Good: we can change something at this node without hurting the network's overall timing.
Example: make this node slower, perhaps saving some power, without hurting timing.
Negative slack
Bad: there is a problem at this node; the more negative the slack, the bigger the problem.
Looking for a node to "fix" to help timing? These nodes are where to look first: they affect the critical paths the most.
How To Compute ATs? Recursively
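[Figure: node n, with predecessor paths from src arriving through its predecessors p, and successor paths to snk leaving through its successors s]
$$AT(n) = \text{maximum delay to } n = \begin{cases} 0, & \text{if } n \text{ is the source} \\ \max_{p \in \mathrm{pred}(n)} \{AT(p) + \Delta(p,n)\}, & \text{otherwise} \end{cases}$$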
How To Compute ATs?
Big idea
If we know the longest path to each predecessor of n, it’s a
simple “Maximum” operation to compute the longest path to
n itself.
Example: n has predecessors x, y, z with AT(x)=5, AT(y)=10, AT(z)=5, and edge delays ∆(x,n)=7, ∆(y,n)=1, ∆(z,n)=5:
$$AT(n) = \max_{p \in \{x,y,z\}} \{AT(p) + \Delta(p,n)\} = \max\{5+7,\ 10+1,\ 5+5\} = 12$$
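The recurrence can be transcribed almost literally as a memoized recursive function. In this sketch the `pred` table is hypothetical: the slide gives only AT(x)=5, AT(y)=10, AT(z)=5, so edges from src with those delays are assumed in order to reproduce them. For large delay graphs, the topsort-based iteration on the following slides avoids deep recursion.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical example: predecessors of each node with edge delays ∆(p, n).
pred: Dict[str, List[Tuple[str, int]]] = {
    "src": [],
    "x": [("src", 5)],
    "y": [("src", 10)],
    "z": [("src", 5)],
    "n": [("x", 7), ("y", 1), ("z", 5)],
}


@lru_cache(maxsize=None)
def arrival_time(n: str) -> int:
    """AT(n): 0 at the source, else max over predecessors of AT(p) + ∆(p, n)."""
    if n == "src":
        return 0
    return max(arrival_time(p) + d for p, d in pred[n])


print(arrival_time("n"))  # max(5 + 7, 10 + 1, 5 + 5) = 12
```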
How To Compute RATs?
[Figure: src → predecessor paths → p → n → s → successor paths → snk, with edge delay ∆(n,s) from n to s]
RAT(n): the latest time in the cycle at which n could change and the signal would still propagate to the sink before the end of the cycle.
First, what is RAT(snk)? RAT(snk) = Cycle Time.
What about an internal node n?
$$RAT(n) = \min_{s \in \mathrm{succ}(n)} \{RAT(s) - \Delta(n,s)\}$$
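The RAT recurrence mirrors the AT one, with min instead of max and the recursion running toward the sink. Below is a memoized sketch on a small hypothetical graph (x, y, z feed n, which feeds the sink) with an assumed cycle time of 12:

```python
from functools import lru_cache
from typing import Dict, List, Tuple

CYCLE_TIME = 12  # assumed clock cycle for this example

# Hypothetical example: successors of each node with edge delays ∆(n, s).
succ: Dict[str, List[Tuple[str, int]]] = {
    "x": [("n", 7)],
    "y": [("n", 1)],
    "z": [("n", 5)],
    "n": [("snk", 0)],
    "snk": [],
}


@lru_cache(maxsize=None)
def required_time(n: str) -> int:
    """RAT(snk) = cycle time; otherwise min over successors of RAT(s) - ∆(n, s)."""
    if n == "snk":
        return CYCLE_TIME
    return min(required_time(s) - d for s, d in succ[n])


print({v: required_time(v) for v in ("x", "y", "z", "n")})
# {'x': 5, 'y': 11, 'z': 7, 'n': 12}
```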
How To Compute RATs? Recursively
[Figure: src → predecessor paths → p → n → s → successor paths → snk, with edge delay ∆(n,s) from n to s]
Negative Slack is BAD!
[Timing diagram: launch edge, one clock cycle time, capture edge]
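[Figure: example delay graph with edge delays:
src→a 0, src→b 0, src→c 0;
a→d 1, d→g 3, g→i 2, i→snk 0;
d→f 5, f→i 1;
b→f 4, f→j 3, j→snk 0;
b→e 1, c→e 2, e→h 3, f→h 4, h→j 2, h→k 5, k→snk 0]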
Compute ATs
AT per node: src = 0, a = 0, b = 0, c = 0, d = 1, e = 2, f = 6, g = 4, h = 10, i = 7, j = 12, k = 15, snk = 15.
Compute RATs
Clock cycle is 12.
RAT per node: src = -3, a = -3, b = -1, c = 2, d = -2, e = 4, f = 3, g = 10, h = 7, i = 12, j = 12, k = 12, snk = 12.
Compute Slack
Slack = RAT - AT
(AT, RAT, Slack) per node: src (0, -3, -3), a (0, -3, -3), b (0, -1, -1), c (0, 2, 2), d (1, -2, -3), e (2, 4, 2), f (6, 3, -3), g (4, 10, 6), h (10, 7, -3), i (7, 12, 5), j (12, 12, 0), k (15, 12, -3), snk (15, 12, -3).
Analyzing the Example
Worst (most negative) slack is -3.
Big results:
Your timing violation at sink = the worst slack value.
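The worst slack appears along this entire worst path: src → a → d → f → h → k → snk, whose delay 0 + 1 + 5 + 4 + 5 = 15 exceeds the cycle time of 12 by 3.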
Analyzing the Example
Look at those slacks
A negative slack at an output (PO) means a failed timing
requirement.
A negative slack on an internal node n means there is a path from n to some failing PO.
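Because every node on the worst path shares the worst slack, the path itself can be recovered by walking backward from the sink along edges that set each node's AT. The sketch below uses the edge delays as read from this example's figure and the AT and RAT values from the slides; `worst_path` is a hypothetical helper that breaks ties by taking the first matching edge.

```python
from typing import Dict, List, Tuple

# The 13-node example (clock cycle 12): edges, ATs, and RATs as on the slides.
EDGES: List[Tuple[str, str, int]] = [
    ("src", "a", 0), ("src", "b", 0), ("src", "c", 0),
    ("a", "d", 1), ("d", "g", 3), ("g", "i", 2), ("i", "snk", 0),
    ("d", "f", 5), ("f", "i", 1), ("b", "f", 4), ("f", "j", 3), ("j", "snk", 0),
    ("b", "e", 1), ("c", "e", 2), ("e", "h", 3), ("f", "h", 4),
    ("h", "j", 2), ("h", "k", 5), ("k", "snk", 0),
]
AT: Dict[str, int] = {"src": 0, "a": 0, "b": 0, "c": 0, "d": 1, "e": 2, "f": 6,
                      "g": 4, "h": 10, "i": 7, "j": 12, "k": 15, "snk": 15}
RAT: Dict[str, int] = {"src": -3, "a": -3, "b": -1, "c": 2, "d": -2, "e": 4, "f": 3,
                       "g": 10, "h": 7, "i": 12, "j": 12, "k": 12, "snk": 12}


def worst_path(edges: List[Tuple[str, str, int]], at: Dict[str, int],
               src: str = "src", snk: str = "snk") -> List[str]:
    """Walk back from the sink, each step taking a predecessor edge that sets
    the node's AT, i.e. AT(p) + delay(p, n) == AT(n)."""
    path = [snk]
    while path[-1] != src:
        n = path[-1]
        p = next(u for u, v, d in edges if v == n and at[u] + d == at[n])
        path.append(p)
    return path[::-1]


path = worst_path(EDGES, AT)
print(path)                            # ['src', 'a', 'd', 'f', 'h', 'k', 'snk']
print([RAT[n] - AT[n] for n in path])  # [-3, -3, -3, -3, -3, -3, -3]
```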
The Most Typical STA Problem
Answer this question: what are all the too-slow paths that violate timing?
Most useful report:
Report the paths in order, from slowest to fastest.
In other words: enumerate these paths in delay order.
What Do We Need?
Calculate all the ATs.
Calculate all the RATs.
Calculate all the Slacks.
… do all of this very efficiently: delay graphs are huge!
… and enumerate the violating paths in worst-first delay order.
Computational Strategy
Topologically sort ("topsort") the delay graph.
Sort the vertices in the delay graph into one single ordered list.
Essential property: if there is an edge from p to s, then p appears before s in the sorted order.
Compute ATs by going forward through the sorted list.
Compute RATs by going backward through the sorted list.
[Figure: example DAG with edges a→b (3), a→c (4), b→d (5), b→e (11), c→e (9), d→f (6), e→f (15)]
Legal topsorting orders include:
a, b, c, d, e, f
a, b, d, c, e, f
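The slides do not say how to compute the topsort; Kahn's algorithm is one standard choice, sketched below on the example graph above. The function name `topsort` and the `(from, to, delay)` edge format are assumptions. It also detects cycles, which the combinational logic of a synchronous design should not contain.

```python
from collections import deque
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (from vertex, to vertex, delay)


def topsort(nodes: List[str], edges: List[Edge]) -> List[str]:
    """Kahn's algorithm: repeatedly emit a node with no unprocessed predecessors."""
    indeg: Dict[str, int] = {n: 0 for n in nodes}
    succ: Dict[str, List[str]] = {n: [] for n in nodes}
    for u, v, _ in edges:
        succ[u].append(v)
        indeg[v] += 1
    ready = deque(n for n in nodes if indeg[n] == 0)
    order: List[str] = []
    while ready:
        u = ready.popleft()
        order.append(u)
        for v in succ[u]:
            indeg[v] -= 1
            if indeg[v] == 0:  # last predecessor of v just got emitted
                ready.append(v)
    if len(order) != len(nodes):
        raise ValueError("graph has a cycle: no topological order exists")
    return order


EDGES = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
print(topsort(list("abcdef"), EDGES))  # ['a', 'b', 'c', 'd', 'e', 'f']
```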
Assume Have Topsort: Compute ATs
computeATs() {
    AT(SRC) = 0;
    foreach ( n in topsort order, skipping SRC ) {
        AT(n) = -∞;
        foreach ( node p in pred(n) )
            AT(n) = max( AT(n), AT(p) + ∆(p,n) );
    }
}
Compute RATs
Trick: pretend all edges are reversed, so they point from SNK toward SRC, and walk the graph backwards.
computeRATs() {
    RAT(SNK) = CycleTime;
    foreach ( n in reverse topsort order, skipping SNK ) {
        RAT(n) = +∞;
        foreach ( successor s in succ(n) )
            RAT(n) = min( RAT(n), RAT(s) - ∆(n,s) );
    }
}
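A runnable Python sketch of computeATs() and computeRATs(), plus slack, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+) for the topsort. The function name `sta` is hypothetical; the edges are as read from the earlier 13-node example figure, with a clock cycle of 12.

```python
from graphlib import TopologicalSorter  # Python 3.9+
from math import inf
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (from vertex, to vertex, delay)


def sta(edges: List[Edge], cycle_time: float, src: str = "src", snk: str = "snk"):
    """Return (AT, RAT, slack) dicts, mirroring computeATs() and computeRATs()."""
    pred: Dict[str, List[Tuple[str, int]]] = {}
    succ: Dict[str, List[Tuple[str, int]]] = {}
    for u, v, d in edges:
        succ.setdefault(u, []).append((v, d))
        pred.setdefault(v, []).append((u, d))
        pred.setdefault(u, [])  # make sure every vertex has both entries
        succ.setdefault(v, [])
    # TopologicalSorter takes {node: predecessors}; static_order() yields a topsort.
    graph = {n: [p for p, _ in ps] for n, ps in pred.items()}
    order = list(TopologicalSorter(graph).static_order())

    at: Dict[str, float] = {src: 0}
    for n in order:  # forward pass: predecessors always come first
        if n != src:
            at[n] = max((at[p] + d for p, d in pred[n]), default=-inf)

    rat: Dict[str, float] = {snk: cycle_time}
    for n in reversed(order):  # backward pass over the "reversed" edges
        if n != snk:
            rat[n] = min((rat[s] - d for s, d in succ[n]), default=inf)

    slack = {n: rat[n] - at[n] for n in order}
    return at, rat, slack


EDGES = [
    ("src", "a", 0), ("src", "b", 0), ("src", "c", 0),
    ("a", "d", 1), ("d", "g", 3), ("g", "i", 2), ("i", "snk", 0),
    ("d", "f", 5), ("f", "i", 1), ("b", "f", 4), ("f", "j", 3), ("j", "snk", 0),
    ("b", "e", 1), ("c", "e", 2), ("e", "h", 3), ("f", "h", 4),
    ("h", "j", 2), ("h", "k", 5), ("k", "snk", 0),
]
at, rat, slack = sta(EDGES, cycle_time=12)
worst = min(slack.values())
print(at["snk"], worst)                                 # 15 -3
print(sorted(n for n, s in slack.items() if s == worst))
# ['a', 'd', 'f', 'h', 'k', 'snk', 'src']
```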
Using Slack For Path Reporting
Useful slack property: all nodes on the longest path have the same slack, which is the worst slack value.
Surprising result: slack lets us find the N worst paths, even though we did not trace them all.
[Figure: the topsort example graph, assuming a clock cycle of 29, annotated per node:]
a: AT=0, RAT=0, Slack=0
b: AT=3, RAT=3, Slack=0
c: AT=4, RAT=5, Slack=1
d: AT=8, RAT=23, Slack=15
e: AT=14, RAT=14, Slack=0
f: AT=29, RAT=29, Slack=0
N-Worst Path Reporting
We grow partial paths; each partial path stores three things:
(Path itself, Delay of this path, Slack of the final node on the path)
We store the partial paths in a min heap keyed on the slack value.
Initially, this heap contains only the source node.
The algorithm is quite simple (and just like maze routing!):
Expand: pop the partial path with the smallest (most negative) slack off the heap.
Reach target? If its end node is the sink, print out the path.
Otherwise, extend: add each successor node to make new partial paths, and push them back onto the heap, each labeled with (Path, Delay, Slack).
Repeat until N paths are reported: go pop the next partial path.
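A heap-based sketch of this procedure, run on the example graph from the slack slide. One assumption differs from the slide: instead of the slack of the final node, the heap key here is RAT(end node) - (delay of the partial path), i.e., the slack of the worst completion of that partial path. This key never decreases as a path is extended, so completed paths come out in worst-delay-first order; on this example it gives the same order as the slides. The name `n_worst_paths` is hypothetical.

```python
import heapq
from itertools import count
from typing import Dict, List, Tuple

Edge = Tuple[str, str, int]  # (from vertex, to vertex, delay)


def n_worst_paths(edges: List[Edge], rat: Dict[str, int], source: str,
                  sink: str, n: int) -> List[Tuple[List[str], int]]:
    """Report up to n source->sink paths, worst (least slack) first."""
    succ: Dict[str, List[Tuple[str, int]]] = {}
    for u, v, d in edges:
        succ.setdefault(u, []).append((v, d))
    tie = count()  # tie-breaker so the heap never compares paths directly
    # Heap entries: (RAT(end) - delay, tie, path, delay)
    heap = [(rat[source], next(tie), [source], 0)]
    reported: List[Tuple[List[str], int]] = []
    while heap and len(reported) < n:
        _, _, path, delay = heapq.heappop(heap)  # smallest slack first
        end = path[-1]
        if end == sink:                          # reached target: report it
            reported.append((path, delay))
            continue
        for s, d in succ.get(end, []):           # extend by each successor
            new_delay = delay + d
            heapq.heappush(heap, (rat[s] - new_delay, next(tie), path + [s], new_delay))
    return reported


# The example graph (clock cycle = 29) and its RATs from the slide.
EDGES = [("a", "b", 3), ("a", "c", 4), ("b", "d", 5), ("b", "e", 11),
         ("c", "e", 9), ("d", "f", 6), ("e", "f", 15)]
RAT = {"a": 0, "b": 3, "c": 5, "d": 23, "e": 14, "f": 29}
for p, delay in n_worst_paths(EDGES, RAT, source="a", sink="f", n=3):
    print("-".join(p), delay)
# a-b-e-f 29
# a-c-e-f 28
# a-b-d-f 14
```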
Worst Case Path Reporting: Example
[Figure: the same example graph, with source a, sink f, and node slacks a=0, b=0, c=1, d=15, e=0, f=0]
Min heap entries have the form (Path, Delay, Slack).
Initially, the heap contains only the source node.
Expand path a-b, reaching d and e:
Min heap before: (a-b,3,0), (a-c,4,1)
Min heap after: (a-b-e,14,0), (a-c,4,1), (a-b-d,8,15)
Expand path a-b-e, reaching f.
Min heap before: (a-b-e,14,0), (a-c,4,1), (a-b-d,8,15)
f is the sink! Report the 1st worst path, a-b-e-f, with delay = 29.
Min heap after: (a-c,4,1), (a-b-d,8,15)
Expand path a-c, reaching e.
Min heap before: (a-c,4,1), (a-b-d,8,15)
Min heap after: (a-c-e,13,0), (a-b-d,8,15)
Expand path a-c-e, reaching f.
Min heap before: (a-c-e,13,0), (a-b-d,8,15)
f is the sink! Report the 2nd worst path, a-c-e-f, with delay = 28.
Min heap after: (a-b-d,8,15)
Expand path a-b-d, reaching f.
Min heap before: (a-b-d,8,15)
f is the sink! Report the 3rd worst path, a-b-d-f, with delay = 14.
Min heap after: (empty). Done!
Static Timing Analysis: Summary
STA is a very important step in the design of complex ASICs.
It’s a critical “sign off” step, which means: you don’t get to
fabricate unless you pass.
Several big ideas
Gate level delay models matter, and can be pretty complex in
real world.
Logical ≠ Topological path analysis (i.e., STA).
Build delay graph, calculate ATs, RATs, slacks recursively.
The concept of slack is big: it lets us locate the worst paths, and the problem gates on those paths.
An idea similar to maze routing lets us find the worst paths in delay order.
Static Timing Analysis: Aside
STA is a huge topic; there are several things we did not cover.
STA for sequential elements
How do we model flip-flops and latches so we can verify, e.g., that setup and hold times are met? More tricks with the delay graph.
Early mode versus late mode timing
Our development covered only so-called late mode timing, where we care about the longest paths. Early mode focuses on shortest paths, and is critical for more advanced timing, e.g., with transparent latches.
Incremental STA
In practice, if you change 10,000 gates out of 1,000,000, you don't want to redo the whole STA analysis. Advanced methods can update the results incrementally.