02c BranchPred
Branch Prediction
Outline
• Control Dependence and Branch
• Static Branch Prediction
• Dynamic Branch Prediction
– One Bit
– Two Bits
• Global Branch History
• Hybrid Branch Predictor
• Micro-architecture
– Branch Target Buffer
– Return Address Stack
• Branch Prediction in Real World
Control Dependencies
• Predict Branches
– And predict them well!
• Fetch, decode, etc. on the predicted path
– Option 1: Do not execute until the branch is resolved
– Option 2: Execute anyway (speculation)
• Recover from mispredictions
– Restart fetch from the correct path
Branch Prediction
• Need to know two things
– Whether the branch is taken or not (direction)
– The target address if it is taken (target)
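A minimal sketch of how the two predictions combine into the next fetch address. It assumes fixed 4-byte instructions and uses a plain dictionary as a stand-in for a target table; `next_fetch_pc` and `target_table` are hypothetical names, not part of any real design.

```python
def next_fetch_pc(pc, predict_taken, target_table, inst_size=4):
    """Pick the next fetch address from a direction and a target prediction.

    target_table is a hypothetical stand-in for a target predictor: it maps
    a branch PC to the target address seen when that branch was last taken.
    inst_size=4 assumes fixed-length 4-byte instructions.
    """
    if predict_taken and pc in target_table:
        return target_table[pc]   # direction = taken and a target is known
    return pc + inst_size         # not taken, or no target yet: fall through


targets = {0x100: 0x200}
print(hex(next_fetch_pc(0x100, True, targets)))   # 0x200
print(hex(next_fetch_pc(0x100, False, targets)))  # 0x104
```

In this sketch, a "taken" direction without a known target still falls through to the sequential address, so that case can only be fixed once the branch resolves.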
The Bit Is Not Enough!
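• One-bit predictor: predict that the branch does the same thing it did last time
• How often is the branch outcome != the previous outcome?
– Branch executed for 100,000 iterations: 2 / 100,000 → 99.998% prediction rate
– DC44: TTTTT ... TNTTTTT … TNTTTTT …: 2 / 100 → 98.0% prediction rate
– DC50: TNTNTNTNTNTNTNTNTNTNTNTNTNTNT …: 2 / 2 → 0.0% prediction rate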
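A small simulation of the one-bit predictor reproduces the DC44 and DC50 rates. The DC44 stream is an assumption inferred from the 2 / 100 figure (one N every 100 executions); `one_bit_accuracy` is a hypothetical helper name.

```python
def one_bit_accuracy(outcomes, last=True):
    """1-bit predictor: always predict the branch's previous outcome.

    `last` is the assumed initial value of the single bit (True = taken).
    """
    correct = 0
    for taken in outcomes:
        if taken == last:        # prediction = previous outcome
            correct += 1
        last = taken             # the one bit simply records this outcome
    return correct / len(outcomes)


T, N = True, False
dc44 = ([T] * 99 + [N]) * 1000   # assumed: one N every 100 executions
dc50 = [T, N] * 50000            # strictly alternating
print(f"DC44: {one_bit_accuracy(dc44):.1%}")   # DC44: 98.0%
print(f"DC50: {one_bit_accuracy(dc50):.1%}")   # DC50: 0.0%
```

Each lone N costs two mispredictions: one on the N itself and one on the T that follows it, because the bit now says N.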
Two Bits are Better Than One
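• 2-bit counter (2bC) with states 0 to 3
– States 0 and 1: predict NT; states 2 and 3: predict T
– Transition on T outcome: increment (saturating at 3)
– Transition on NT outcome: decrement (saturating at 0)
• Example (counter value before each outcome):
2bC:     0 1 2 3 3 3 3 2 3 3 …
Outcome: T T T T T T N T T T …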
This is bad!
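The 2bC trace can be reproduced with a few lines of simulation. This is a sketch assuming the counter starts at 0, as in the trace; `two_bit_trace` is a hypothetical name, and the DC44 stream is the same assumed one-N-per-100 pattern as before.

```python
def two_bit_trace(outcomes, counter=0):
    """Simulate one 2-bit saturating counter (2bC).

    Returns the counter value *before* each outcome and the number of
    mispredictions. States 0-1 predict NT, states 2-3 predict T.
    """
    states, misses = [], 0
    for taken in outcomes:
        states.append(counter)
        predict_taken = counter >= 2      # i.e. the counter's MSB
        if predict_taken != taken:
            misses += 1
        # increment on T, decrement on NT, saturating at 0 and 3
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return states, misses


T, N = True, False
states, misses = two_bit_trace([T, T, T, T, T, T, N, T, T, T])
print(states)  # [0, 1, 2, 3, 3, 3, 3, 2, 3, 3]
print(misses)  # 3

dc44 = ([T] * 99 + [N]) * 1000            # assumed DC44 pattern
_, m = two_bit_trace(dc44)
print(f"DC44 with 2bC: {1 - m / len(dc44):.1%}")  # DC44 with 2bC: 99.0%
```

The three misses are the two warm-up mispredictions (counter at 0 and 1) plus one for the lone N. Because state 2 still predicts T, the T after the N is predicted correctly, so each N now costs one misprediction instead of two.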
Importance of Branches
• 98% → 99% prediction accuracy
– Who cares?
– Actually, it's a 2% → 1% misprediction rate
– That's a halving of the number of mispredictions
• So what?
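– If the misprediction rate is 50% and 1 in 5 instructions is a branch, then the number of useful instructions that we can fetch is:
$5\left(1 + \tfrac{1}{2} + \left(\tfrac{1}{2}\right)^2 + \left(\tfrac{1}{2}\right)^3 + \cdots\right) = \dfrac{5}{1 - \tfrac{1}{2}} = 10$
– If we halve the miss rate down to 25%:
$5\left(1 + \tfrac{3}{4} + \left(\tfrac{3}{4}\right)^2 + \left(\tfrac{3}{4}\right)^3 + \cdots\right) = \dfrac{5}{1 - \tfrac{3}{4}} = 20$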
– Halving the miss rate doubles the number of useful
instructions that we can try to extract ILP from
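The arithmetic above is a geometric series with ratio q = 1 - (miss rate), so it has the closed form 5 / (miss rate). A quick check, under the same idealized model (5 instructions per branch, independent predictions); the function names are hypothetical:

```python
def useful_instructions(miss_rate, insts_per_branch=5):
    """Closed form of insts_per_branch * (1 + q + q**2 + ...), q = 1 - miss_rate.

    The k-th group of insts_per_branch instructions is on the correct path
    only if the k branches before it were all predicted correctly.
    """
    return insts_per_branch / miss_rate


def truncated_series(miss_rate, insts_per_branch=5, terms=200):
    """Same quantity, summing only the first `terms` terms of the series."""
    q = 1 - miss_rate
    return insts_per_branch * sum(q ** k for k in range(terms))


for rate in (0.50, 0.25):
    print(rate, useful_instructions(rate), round(truncated_series(rate), 6))
```

Since the result is inversely proportional to the miss rate, halving the miss rate always doubles the expected run of useful instructions, not just for the 50% → 25% case.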
How about the Branch at 0xdc50?
• 1bC and 2bC don't do too well (50% at best)
• But it’s still obviously predictable
• Why?
– It has a repeating pattern: (NT)*
– How about other patterns? (TTNTN)*
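[Figure: the previous outcome (prev, 1 = T) selects one of two 2-bit counters, shown as the counter for prev = 0 followed by the counter for prev = 1]
– (NT)*: prev = 1 → counters 3 0 → prediction = N; prev = 0 → counters 3 0 → prediction = T
– (TTNTN)*: a T is followed sometimes by T and sometimes by N, so the counter for prev = 1 keeps changing and one bit of history cannot predict this pattern perfectly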
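A sketch of the "prev" scheme for a single branch: the last outcome(s) of the branch select which 2-bit counter makes the prediction. Starting every counter at 2 (weakly taken) is an assumption, and `local_history_accuracy` is a hypothetical name.

```python
def local_history_accuracy(outcomes, history_bits=1):
    """One branch, predicted by 2**history_bits 2-bit counters.

    The last `history_bits` outcomes of this branch (1 = T, newest in
    bit 0) select which counter makes the prediction; history_bits=1 is
    the "prev" scheme. Counters start at 2 (weakly taken), an assumption.
    """
    counters = [2] * (1 << history_bits)
    mask = (1 << history_bits) - 1
    history, correct = 0, 0
    for taken in outcomes:
        c = counters[history]
        if (c >= 2) == taken:
            correct += 1
        counters[history] = min(c + 1, 3) if taken else max(c - 1, 0)
        history = ((history << 1) | int(taken)) & mask
    return correct / len(outcomes)


T, N = True, False
nt = [N, T] * 1000
ttntn = [T, T, N, T, N] * 1000
print(f"(NT)*:    {local_history_accuracy(nt):.1%}")
print(f"(TTNTN)*: {local_history_accuracy(ttntn):.1%}")
```

With one bit of history, (NT)* is predicted almost perfectly after a few warm-up misses. (TTNTN)* settles at about 80%, because the counter for prev = 1 sees T, N, N in every period.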
Deeper History Covers More
Patterns
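• (3, 2) predictor: 3 bits of history select one of 2^3 = 8 2-bit counters for each branch PC
– e.g. one counter is used if prev = 010, another if prev = 111
[Figure: table indexed by PC and 3-bit history, one 2-bit counter per entry]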
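A sketch of an (m, n) predictor: m bits of history select one of 2^m n-bit counters per branch. Assumptions: the history is kept per branch (with a single branch running, a global history register would hold the same bits), tables are keyed by the exact PC in a dict rather than indexed by low PC bits, and PC 0x400 is arbitrary. `MNPredictor` and `accuracy` are hypothetical names.

```python
class MNPredictor:
    """(m, n) predictor sketch: m history bits select an n-bit counter."""

    def __init__(self, m=3, n=2):
        self.m = m
        self.max = (1 << n) - 1          # largest counter value
        self.init = 1 << (n - 1)         # start weakly taken (assumption)
        self.tables = {}                 # pc -> list of 2**m counters
        self.history = {}                # pc -> last m outcomes, newest in bit 0

    def _counters(self, pc):
        return self.tables.setdefault(pc, [self.init] * (1 << self.m))

    def predict(self, pc):
        c = self._counters(pc)[self.history.get(pc, 0)]
        return c >= self.init            # MSB of the n-bit counter

    def update(self, pc, taken):
        counters = self._counters(pc)
        h = self.history.get(pc, 0)
        if taken:
            counters[h] = min(counters[h] + 1, self.max)
        else:
            counters[h] = max(counters[h] - 1, 0)
        self.history[pc] = ((h << 1) | int(taken)) & ((1 << self.m) - 1)


def accuracy(pred, pc, outcomes):
    correct = 0
    for taken in outcomes:
        correct += pred.predict(pc) == taken
        pred.update(pc, taken)
    return correct / len(outcomes)


T, N = True, False
ttntn = [T, T, N, T, N] * 1000
results = {m: accuracy(MNPredictor(m, 2), 0x400, ttntn) for m in (1, 3, 4)}
for m, acc in results.items():
    print(f"m = {m}: {acc:.1%}")
```

For (TTNTN)* specifically, 3 bits are not quite enough: the history T N T is followed by T at one point in the period and by N at another, so the same counter is asked to predict both. With m = 4, every position in the period has a distinct history and the pattern is predicted perfectly after warm-up.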
A: p = findNode(foo);
if ( p is parent )
do something;
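[Figure: a meta-predictor (Meta) selects between Pred0 and Pred1 to produce the final prediction]
• Final prediction: if meta-counter MSB = 0, use pred0; else use pred1
• Update of the meta-counter:
– Pred0 correct, Pred1 correct: --- (no change)
– Pred0 wrong, Pred1 correct: Inc
– Pred0 correct, Pred1 wrong: Dec
– Pred0 wrong, Pred1 wrong: --- (no change)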
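A sketch of the meta-counter selection and update rules. The two components are hypothetical choices for illustration: pred0 is a plain 2bC and pred1 uses 4 bits of the branch's own history to pick a 2bC. The meta-counter starting at 1 (weakly favoring pred0) is also an assumption.

```python
def sat2(c):
    """Clamp a 2-bit counter to the range 0..3."""
    return min(max(c, 0), 3)


class Tournament:
    def __init__(self, hist_bits=4):
        self.hist_bits = hist_bits
        self.bimodal = {}   # pc -> 2bC                       (pred0)
        self.local = {}     # pc -> 2**hist_bits 2bCs         (pred1)
        self.hist = {}      # pc -> recent outcomes, newest in bit 0
        self.meta = {}      # pc -> 2-bit meta-counter, starts at 1 (assumed)

    def _components(self, pc):
        p0 = self.bimodal.get(pc, 2) >= 2
        table = self.local.setdefault(pc, [2] * (1 << self.hist_bits))
        p1 = table[self.hist.get(pc, 0)] >= 2
        return p0, p1

    def predict(self, pc):
        p0, p1 = self._components(pc)
        # meta-counter MSB = 0 -> use pred0, else use pred1
        return p1 if self.meta.get(pc, 1) >= 2 else p0

    def update(self, pc, taken):
        p0, p1 = self._components(pc)
        m = self.meta.get(pc, 1)
        if p1 == taken and p0 != taken:
            m += 1                     # Inc: only pred1 was right
        elif p0 == taken and p1 != taken:
            m -= 1                     # Dec: only pred0 was right
        self.meta[pc] = sat2(m)        # both right or both wrong: no change
        step = 1 if taken else -1
        self.bimodal[pc] = sat2(self.bimodal.get(pc, 2) + step)
        h = self.hist.get(pc, 0)
        self.local[pc][h] = sat2(self.local[pc][h] + step)
        self.hist[pc] = ((h << 1) | int(taken)) & ((1 << self.hist_bits) - 1)


def accuracy(pred, pc, outcomes):
    correct = 0
    for taken in outcomes:
        correct += pred.predict(pc) == taken
        pred.update(pc, taken)
    return correct / len(outcomes)


T, N = True, False
hybrid = Tournament()
acc = accuracy(hybrid, 0x400, [T, T, N, T, N] * 1000)
print(f"hybrid: {acc:.1%}, meta-counter = {hybrid.meta[0x400]}")
```

On (TTNTN)* the plain 2bC keeps missing the N outcomes while the history-based component does not, so the meta-counter is only ever incremented after warm-up and saturates at 3, always choosing pred1.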
Common Combinations
• Global history + Local history
• “easy” branches + global history
– 2bC and gshare
• short history + long history
• Hybrid predictor
– combines local history and global history
components with a meta-predictor
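gshare, paired with a plain 2bC in the list above, indexes one shared table of 2-bit counters with the branch PC XORed with a global history register. This is a minimal sketch: the table size, the 8-bit history, and dropping the low 2 PC bits (assuming 4-byte instructions) are illustrative choices, and `GShare` and `accuracy` are hypothetical names.

```python
class GShare:
    """gshare sketch: (PC bits XOR global history) indexes shared 2bCs."""

    def __init__(self, index_bits=10, hist_bits=8):
        self.mask = (1 << index_bits) - 1
        self.hist_mask = (1 << hist_bits) - 1
        self.table = [2] * (1 << index_bits)   # 2-bit counters, start weakly taken
        self.ghr = 0                           # outcomes of recent branches, newest in bit 0

    def _index(self, pc):
        return ((pc >> 2) ^ self.ghr) & self.mask

    def predict(self, pc):
        return self.table[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        c = self.table[i]
        self.table[i] = min(c + 1, 3) if taken else max(c - 1, 0)
        # every branch shifts its outcome into the same global history
        self.ghr = ((self.ghr << 1) | int(taken)) & self.hist_mask


def accuracy(pred, pc, outcomes):
    correct = 0
    for taken in outcomes:
        correct += pred.predict(pc) == taken
        pred.update(pc, taken)
    return correct / len(outcomes)


T, N = True, False
acc_nt = accuracy(GShare(), 0x400, [N, T] * 1000)
acc_ttntn = accuracy(GShare(), 0x400, [T, T, N, T, N] * 1000)
print(f"(NT)*: {acc_nt:.1%}  (TTNTN)*: {acc_ttntn:.1%}")
```

With only one branch running, the global history is just that branch's own history, so both patterns are learned after warm-up. With many branches, the XOR spreads different (PC, history) pairs across the table, though unrelated pairs can still alias to the same counter.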
Example 2: Pentium-M