L13 Stalls and Flushes

The document discusses data hazards in pipelined CPUs, focusing on how stalls and forwarding can resolve issues when instructions depend on each other. It highlights the limitations of forwarding, particularly with load instructions, and shows how loads and branches may require stalling the pipeline, which preserves correctness at a cost in performance. Additionally, it covers methods for detecting stalls, branch prediction, and the implications of mispredictions on performance.

Uploaded by

govaje4313

Stalls and flushes

• So far, we have discussed data hazards that can occur in
pipelined CPUs if some instructions depend upon others that are
still executing.
— Many hazards can be resolved by forwarding data from the
pipeline registers, instead of waiting for the writeback stage.
— The pipeline continues running at full speed, with one
instruction beginning on every clock cycle.
• Now, we’ll see some real limitations of pipelining.
— Forwarding may not work for data hazards from load
instructions.
— Branches affect the instruction fetch for the next clock cycle.
• In both of these cases we may need to slow down, or stall, the
pipeline.

Data hazard review
• A data hazard arises if one instruction needs data that isn’t ready
yet.
— Below, the AND and OR both need to read register $2.
— But $2 isn’t updated by SUB until the fifth clock cycle.
• Dependency arrows that point backwards indicate hazards.

Clock cycle        1    2    3    4    5    6    7
sub $2, $1, $3     IM   Reg  ALU  DM   Reg
and $12, $2, $5         IM   Reg  ALU  DM   Reg
or $13, $6, $2               IM   Reg  ALU  DM   Reg

Forwarding
• The desired value ($1 - $3) has actually already been computed;
it just hasn’t been written to the registers yet.
• Forwarding allows other instructions to read ALU results directly
from the pipeline registers, without going through the register
file.

Clock cycle        1    2    3    4    5    6    7
sub $2, $1, $3     IM   Reg  ALU  DM   Reg
and $12, $2, $5         IM   Reg  ALU  DM   Reg
or $13, $6, $2               IM   Reg  ALU  DM   Reg

What about loads?
• Imagine if the first instruction in the example was LW instead of
SUB.
— How does this change the data hazard?

Clock cycle        1    2    3    4    5    6
lw $2, 20($3)      IM   Reg  ALU  DM   Reg
and $12, $2, $5         IM   Reg  ALU  DM   Reg

What about loads?
• Imagine if the first instruction in the example was LW instead of
SUB.
— The load data doesn’t come from memory until the end of
cycle 4.
— But the AND needs that value at the beginning of the same
cycle!
• This is a “true” data hazard: the data is not available when we
need it.

Clock cycle        1    2    3    4    5    6
lw $2, 20($3)      IM   Reg  ALU  DM   Reg
and $12, $2, $5         IM   Reg  ALU  DM   Reg
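The cycle-4 argument can be checked with a little arithmetic. The sketch below is not from the slides: it assumes the five-stage pipeline drawn above (IM, Reg, ALU, DM, Reg) with full forwarding, and the names `READY_STAGE` and `stalls_needed` are made up for illustration.

```python
# Stage positions in the five-stage pipeline: IM=1, Reg=2, ALU=3, DM=4, Reg=5.
# An instruction fetched in cycle c occupies stage s during cycle c + s - 1.
ALU_STAGE = 3

# Stage at the END of which the result first sits in a pipeline register
# (assumed model: ALU results after the ALU stage, loads after DM).
READY_STAGE = {"alu": 3, "load": 4}


def stalls_needed(producer_kind, distance=1):
    """Bubbles needed before a consumer that follows the producer by
    `distance` instructions, assuming full forwarding."""
    ready_end = READY_STAGE[producer_kind]      # producer is fetched in cycle 1
    alu_cycle = 1 + distance + ALU_STAGE - 1    # consumer's ALU cycle if never stalled
    # A value produced at the end of cycle n can be forwarded from cycle n + 1.
    return max(0, ready_end + 1 - alu_cycle)


print(stalls_needed("alu"))      # sub followed directly by and
print(stalls_needed("load"))     # lw followed directly by and
print(stalls_needed("load", 2))  # lw, one unrelated instruction, then and
```

Under these assumptions the three calls print 0, 1 and 0: forwarding alone covers the SUB case, the back-to-back LW case is one cycle short, and a single unrelated instruction in between hides the load delay.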

Stalling
• The easiest solution is to stall the pipeline.
• We could delay the AND instruction by introducing a one-cycle
delay into the pipeline, sometimes called a bubble.

Clock cycle        1    2    3    4    5    6    7
lw $2, 20($3)      IM   Reg  ALU  DM   Reg
and $12, $2, $5         IM   Reg  --   ALU  DM   Reg
(-- = bubble)

• Notice that we’re still using forwarding in cycle 5, to get data
from the MEM/WB pipeline register to the ALU.

Stalling and forwarding
• Without forwarding, we’d have to stall for two cycles to wait for
the LW instruction’s writeback stage.

[Pipeline diagram, clock cycles 1 to 8: lw $2, 20($3) followed by and $12, $2, $5, with the AND delayed by two bubbles.]

• In general, you can always stall to avoid hazards, but
dependencies are very common in real code, and frequent stalling
can reduce performance by a significant amount.
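To put a rough number on the cost of stalling, here is a minimal sketch. The workload figures are hypothetical, not measurements, and the model counts only load-use stalls on top of an ideal one instruction per cycle; `cpi_with_load_stalls` is an illustrative name.

```python
def cpi_with_load_stalls(load_fraction, load_use_fraction, stall_cycles):
    """Average cycles per instruction for a pipeline that would otherwise
    complete one instruction per cycle, charging `stall_cycles` for every
    load whose result is needed by the very next instruction."""
    return 1.0 + load_fraction * load_use_fraction * stall_cycles


# Hypothetical workload: 25% of instructions are loads, and half of those
# are immediately followed by an instruction that uses the loaded value.
with_forwarding = cpi_with_load_stalls(0.25, 0.5, 1)     # one bubble per load-use
without_forwarding = cpi_with_load_stalls(0.25, 0.5, 2)  # two bubbles per load-use
print(with_forwarding, without_forwarding)
```

With these made-up numbers the CPI rises from 1.0 to 1.125 with forwarding and to 1.25 without it. The second figure is optimistic, because without forwarding ALU results would cause stalls as well.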

Stalling delays the entire pipeline
• If we delay the second instruction, we’ll have to delay the third
one too.
— Why?

Clock cycle        1    2    3    4    5    6    7    8
lw $2, 20($3)      IM   Reg  ALU  DM   Reg
and $12, $2, $5         IM   Reg  --   ALU  DM   Reg
or $13, $12, $2              IM   --   Reg  ALU  DM   Reg
(-- = bubble)

Stalling delays the entire pipeline
• If we delay the second instruction, we’ll have to delay the third
one too.
— This is necessary to make forwarding work between AND and
OR.
— It also prevents problems such as two instructions trying to
write to the same register in the same cycle.

Clock cycle        1    2    3    4    5    6    7    8
lw $2, 20($3)      IM   Reg  ALU  DM   Reg
and $12, $2, $5         IM   Reg  --   ALU  DM   Reg
or $13, $12, $2              IM   --   Reg  ALU  DM   Reg
(-- = bubble)

What about EXE, MEM, WB?
• But what about the ALU during cycle 4, the data memory in cycle
5, and the register file write in cycle 6?

Clock cycle        1    2    3    4    5    6    7    8
lw $2, 20($3)      IM   Reg  ALU  DM   Reg
and $12, $2, $5         IM   Reg  Reg  ALU  DM   Reg
or $13, $12, $2              IM   IM   Reg  ALU  DM   Reg

• Those units aren’t used in those cycles because of the stall, so we
can set the EX, MEM and WB control signals to all 0s.

Stall = Nop conversion

Clock cycle        1    2    3    4    5    6    7    8
lw $2, 20($3)      IM   Reg  ALU  DM   Reg
and -> nop              IM   Reg  ALU  DM   Reg
and $12, $2, $5              IM   Reg  ALU  DM   Reg
or $13, $12, $2                   IM   Reg  ALU  DM   Reg

• The effect of a load stall is to insert an empty or nop instruction
into the pipeline.
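The "stall = nop" view is easy to mimic in software. The sketch below is an illustration only: `parse`, `insert_nops` and `diagram` are hypothetical helpers, and the parser handles only instructions whose first register is the destination (such as lw and R-type instructions).

```python
import re

STAGES = ["IM", "Reg", "ALU", "DM", "Reg"]


def parse(instr):
    """Split e.g. 'and $12, $2, $5' or 'lw $2, 20($3)' into
    (opcode, destination register, source registers).
    Only handles instructions whose first register is the destination."""
    op = instr.split()[0]
    regs = re.findall(r"\$\d+", instr)
    return op, regs[0], regs[1:]


def insert_nops(program):
    """Place a nop after every lw whose result the next instruction reads."""
    out = []
    pending_load = None                  # destination of the previous lw, if any
    for instr in program:
        op, dest, srcs = parse(instr)
        if pending_load in srcs:
            out.append("nop")
        out.append(instr)
        pending_load = dest if op == "lw" else None
    return out


def diagram(program):
    """Render one row per instruction, each starting one cycle later."""
    rows = []
    for i, instr in enumerate(insert_nops(program)):
        cells = "".join(f"{stage:<5}" for stage in STAGES)
        rows.append(f"{instr:<19}{' ' * (5 * i)}{cells}".rstrip())
    return "\n".join(rows)


print(diagram(["lw $2, 20($3)", "and $12, $2, $5", "or $13, $12, $2"]))
```

For the three-instruction example this prints four rows, with a nop row between the lw and the and, so the AND and OR each start one cycle later than they would without the hazard.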

Detecting stalls
• Detecting stalls is much like detecting data hazards.

• Recall the format of hazard detection equations:

if (EX/MEM.RegWrite = 1
and EX/MEM.RegisterRd = ID/EX.RegisterRs)
then Bypass Rs from EX/MEM stage latch

[Pipeline diagram: sub $2, $1, $3 followed by and $12, $2, $5, with the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers labelled between the stages.]
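The bypass equation above translates almost directly into code. This is a minimal sketch rather than the slides' own design: `PipeReg` and `forward_select` are illustrative names, the MEM/WB case mirrors the EX/MEM equation, and the check against register 0 is an added assumption (in MIPS, $0 always reads as zero, so it should never be bypassed).

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    reg_write: bool   # RegWrite control bit carried in the pipeline register
    rd: int           # destination register number (RegisterRd)


def forward_select(src, ex_mem, mem_wb):
    """Pick the source for one ALU operand, where `src` is the register
    number held in ID/EX.RegisterRs or ID/EX.RegisterRt."""
    # The newest result wins, so EX/MEM is checked before MEM/WB.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return "MEM/WB"
    return "REGFILE"


# Cycle 4 of the sub/and/or example: sub ($2) is in MEM, and is in EX.
print(forward_select(2, PipeReg(True, 2), PipeReg(False, 0)))
# Cycle 5: and ($12) is in MEM, sub ($2) is in WB, or reads $6 and $2.
print(forward_select(6, PipeReg(True, 12), PipeReg(True, 2)))
print(forward_select(2, PipeReg(True, 12), PipeReg(True, 2)))
```

The three calls select EX/MEM, REGFILE and MEM/WB respectively, matching the forwarding paths described for the SUB, AND, OR sequence.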

Detecting Stalls, cont.
• When should stalls be detected?

[Pipeline diagram: lw $2, 20($3) followed by and $12, $2, $5, with the AND repeating its Reg stage for one cycle and the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers labelled.]

• What is the stall condition?

if (

)
then stall
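The slide leaves the condition blank as an exercise. One commonly used form of the load-use check is sketched below as an assumption, not as the deck's official answer: it uses the ID/EX.MemRead and ID/EX.RegisterRt signals that appear on the hazard unit later in the deck, while the IF/ID source-register parameters and the returned field names are illustrative.

```python
def hazard_unit(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Decide, while the consumer is in ID, whether it must wait one cycle
    for a load that is currently in EX.

    id_ex_mem_read: True if the instruction in EX is a load (ID/EX.MemRead)
    id_ex_rt:       register that load will write (ID/EX.RegisterRt)
    if_id_rs/rt:    source registers of the instruction being decoded
    """
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "stall": stall,
        "pc_write": not stall,      # hold the PC so the next fetch repeats
        "if_id_write": not stall,   # hold IF/ID so the decode repeats
        "zero_controls": stall,     # send all-0 EX/MEM/WB controls: a bubble
    }


# lw $2, 20($3) in EX, and $12, $2, $5 in ID
print(hazard_unit(True, 2, 2, 5)["stall"])
# sub $2, $1, $3 in EX (not a load), and $12, $2, $5 in ID
print(hazard_unit(False, 2, 2, 5)["stall"])
```

The first call reports a stall and the second does not, since an ALU result can be forwarded in time. Checking in ID is what lets the PC and IF/ID register be frozen before the dependent instruction moves on.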

Adding hazard detection to the CPU
[Figure: the pipelined datapath (PC, instruction memory, IF/ID, Control, register file, ID/EX, ALU, EX/MEM, data memory, MEM/WB) with a Hazard Unit added. The Forwarding Unit is shown with the Rs, Rt, EX/MEM.RegisterRd and MEM/WB.RegisterRd labels.]

Adding hazard detection to the CPU
[Figure: the same datapath with the Hazard Unit connected. Labels on the unit: ID/EX.MemRead, ID/EX.RegisterRt, Rs, Rt, IF/ID Write, PC Write, and a 0/1 mux on the control signals entering ID/EX. The Forwarding Unit is again shown with Rs, Rt, EX/MEM.RegisterRd and MEM/WB.RegisterRd.]

Generalizing Forwarding/Stalling
• What if data memory access were so slow that we wanted to
pipeline it over 2 cycles?

[Pipeline diagram, clock cycles 1 to 6: IM, Reg, ALU, a data memory access spanning two cycles, Reg.]

• How many bypass inputs would the muxes in EXE have?
• Which instructions in the following require stalling and/or
bypassing?

lw r13, 0(r11)
add r7, r8, r9
add r15, r7, r13
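One way to explore these questions is with a small in-order timing model. Everything here is an assumption for illustration: stages are IM, Reg, ALU, `mem_cycles` memory cycles, Reg; forwarding is available from every pipeline register after the ALU; and the names `stall_counts` and `bypass_sources` are hypothetical.

```python
def stall_counts(program, mem_cycles=2):
    """program: list of (kind, dest, sources) with kind 'alu' or 'load'.
    Returns the number of bubbles inserted before each instruction."""
    ready_at = {}      # register -> first cycle its value can reach an ALU input
    prev_alu = 2       # so that the first instruction reaches the ALU in cycle 3
    stalls = []
    for kind, dest, sources in program:
        earliest = prev_alu + 1                       # in order, one per cycle
        alu_cycle = max([earliest] + [ready_at.get(r, 0) for r in sources])
        stalls.append(alu_cycle - earliest)
        prev_alu = alu_cycle
        extra = mem_cycles if kind == "load" else 0   # loads finish after memory
        ready_at[dest] = alu_cycle + extra + 1
    return stalls


def bypass_sources(mem_cycles=2):
    """Pipeline registers after the ALU that can hold a not-yet-written result:
    one after the ALU plus one after each memory cycle."""
    return 1 + mem_cycles


PROGRAM = [
    ("load", "r13", ["r11"]),        # lw  r13, 0(r11)
    ("alu", "r7", ["r8", "r9"]),     # add r7, r8, r9
    ("alu", "r15", ["r7", "r13"]),   # add r15, r7, r13
]
print(stall_counts(PROGRAM, mem_cycles=2))
print(stall_counts(PROGRAM, mem_cycles=1))
print(bypass_sources(2))
```

Under this model only the last add stalls (one bubble) when memory takes two cycles, and nothing stalls with a one-cycle memory. Each ALU input mux would see three bypass sources in addition to the value read from the register file.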

Branches in the original pipelined datapath
When are they resolved?
[Figure: the original pipelined datapath, with the PCSrc mux selecting the next PC, the PC + 4 adder, the branch-target adder fed by a shift-left-2 of the sign-extended Instr [15 - 0], the ALU with its Zero output, and the control signals RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite and MemToReg.]

Branches
• Most of the work for a branch computation is done in the EX
stage.
— The branch target address is computed.
— The source registers are compared by the ALU, and the Zero
flag is set or cleared accordingly.
• Thus, the branch decision cannot be made until the end of the EX
stage.
— But we need to know which instruction to fetch next, in order
to keep the pipeline running!
— This leads to what’s called a control hazard.

Clock cycle        1    2    3    4    5    6    7    8
beq $2, $3, Label  IM   Reg  ALU  DM   Reg
???                     IM

Stalling is one solution
• Again, stalling is always one possible solution.

[Pipeline diagram, clock cycles 1 to 8: beq $2, $3, Label runs normally; the following instruction (???) is held in instruction fetch until cycle 4 and then proceeds.]

• Here we just stall until cycle 4, by which time the branch
decision has been made.

Branch prediction
• Another approach is to guess whether or not the branch is taken.
— In terms of hardware, it’s easier to assume the branch is not
taken.
— This way we just increment the PC and continue execution, as
for normal instructions.
• If we’re correct, then there is no problem and the pipeline keeps
going at full speed.

Clock cycle        1    2    3    4    5    6    7
beq $2, $3, Label  IM   Reg  ALU  DM   Reg
next instruction 1      IM   Reg  ALU  DM   Reg
next instruction 2           IM   Reg  ALU  DM   Reg

Branch misprediction
• If our guess is wrong, then we would have already started
executing two instructions incorrectly. We’ll have to discard, or
flush, those instructions and begin executing the right ones from
the branch target address, Label.

Clock cycle        1    2    3    4    5    6    7    8
beq $2, $3, Label  IM   Reg  ALU  DM   Reg
next instruction 1      IM   Reg  (flushed)
next instruction 2           IM   (flushed)
Label: ...                        IM   Reg  ALU  DM   Reg

Performance gains and losses
• Overall, branch prediction is worth it.
— Mispredicting a branch means that two clock cycles are
wasted.
— But if our predictions are even just occasionally correct, then
this is preferable to stalling and wasting two cycles for every
branch.
• Virtually all modern high-performance CPUs use branch prediction.
— Accurate predictions are important for optimal performance.
— Most CPUs predict branches dynamically: statistics are kept at
run-time to determine the likelihood of a branch being taken.
• The pipeline structure also has a big impact on branch prediction.
— A longer pipeline may require more instructions to be flushed
for a misprediction, resulting in more wasted time and lower
performance.
— We must also be careful that instructions do not modify
registers or memory before they get flushed.
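The trade-off between stalling and predicting can be made concrete with a simple CPI model. The branch frequency and misprediction rate below are hypothetical, and `branch_cpi` is an illustrative helper that ignores every other source of stalls.

```python
def branch_cpi(branch_fraction, penalty, mispredict_rate=1.0):
    """Average cycles per instruction when each branch that is handled
    wrongly costs `penalty` cycles. A mispredict_rate of 1.0 models
    'always stall', where every branch pays the full penalty."""
    return 1.0 + branch_fraction * mispredict_rate * penalty


# Hypothetical workload: 20% branches, two-cycle penalty (branch resolved in EX).
always_stall = branch_cpi(0.2, 2)          # pay two cycles on every branch
predict_half = branch_cpi(0.2, 2, 0.5)     # guess wrong half of the time
print(always_stall, predict_half)
```

With these numbers, stalling on every branch gives a CPI of about 1.4, while a predictor that is right only half the time gives about 1.2. Any misprediction rate below 100% beats always stalling in this model, which is the point of the second bullet above.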

Implementing branches
• We can actually decide the branch a little earlier, in ID instead of
EX.
— Our sample instruction set has only a BEQ.
— We can add a small comparison circuit to the ID stage, after
the source registers are read.
• Then we would only need to flush one instruction on a
misprediction.

Clock cycle        1    2    3    4    5    6    7
beq $2, $3, Label  IM   Reg  ALU  DM   Reg
next instruction 1      IM   (flushed)
Label: ...                   IM   Reg  ALU  DM   Reg
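The link between where a branch is resolved and how many instructions are flushed can be written down in one line. This is a sketch under the predict-not-taken scheme described earlier; `flushed_on_mispredict` is a hypothetical helper name.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def flushed_on_mispredict(resolve_stage):
    """Number of wrong-path instructions fetched behind a branch before it
    resolves in `resolve_stage`: one per cycle the branch spends after IF,
    up to and including the resolving stage."""
    return STAGES.index(resolve_stage)


print(flushed_on_mispredict("EX"))   # original datapath
print(flushed_on_mispredict("ID"))   # comparison moved into ID
```

This prints 2 and then 1, matching the two flushed instructions on the earlier misprediction slide and the single flushed instruction here.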

Implementing flushes
• We must flush one instruction (in its IF stage) if the previous
instruction is BEQ and its two source registers are equal.
• We can flush an instruction from the IF stage by replacing it in the
IF/ID pipeline register with a harmless nop instruction.
— MIPS uses sll $0, $0, 0 as the nop instruction.
— This happens to have a binary encoding of all 0s: 0000 ....
0000.
• Flushing introduces a bubble into the pipeline, which represents
the one-cycle delay in taking the branch.
• The IF.Flush control signal shown on the next page implements
this idea, but no details are shown in the diagram.
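The all-zeros claim can be verified by assembling the instruction by hand. The sketch below uses the standard MIPS R-type field layout (opcode, rs, rt, rd, shamt, funct); `encode_r_type` is an illustrative helper, not part of any library.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack the six R-type fields into a 32-bit MIPS instruction word:
    opcode[31:26] rs[25:21] rt[20:16] rd[15:11] shamt[10:6] funct[5:0]."""
    return (opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct


SLL_FUNCT = 0x00   # function code of sll
SUB_FUNCT = 0x22   # function code of sub

nop = encode_r_type(rs=0, rt=0, rd=0, shamt=0, funct=SLL_FUNCT)   # sll $0, $0, 0
sub = encode_r_type(rs=1, rt=3, rd=2, shamt=0, funct=SUB_FUNCT)   # sub $2, $1, $3

print(f"{nop:032b}")
print(f"{sub:#010x}")
```

The first line printed is 32 zero bits, so clearing the instruction field of IF/ID is enough to turn the fetched instruction into a nop. The second line shows, for contrast, that an ordinary instruction such as sub $2, $1, $3 encodes to 0x00231022.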

Branching without forwarding and load stalls
[Figure: the pipelined datapath with the branch resolved in ID: an equality comparator (=) on the register file outputs, the branch-target adder with its shift-left-2 input, the PCSrc mux, and an IF.Flush control signal. Forwarding and hazard-detection hardware are omitted; the slide notes that “the other stuff just won’t fit!”]

Timing
• If no prediction:

IF  ID  EX  MEM WB
    IF  IF  ID  EX  MEM WB    (1 cycle lost)

• If prediction:
— If correct:
IF  ID  EX  MEM WB
    IF  ID  EX  MEM WB        (no cycle lost)
— If misprediction:
IF  ID  EX  MEM WB
    IF0 IF1 ID  EX  MEM WB    (1 cycle lost)

Summary
• Three kinds of hazards conspire to make pipelining difficult.
• Structural hazards result from not having enough hardware
available to execute multiple instructions simultaneously.
— These are avoided by adding more functional units (e.g., more
adders or memories) or by redesigning the pipeline stages.
• Data hazards can occur when instructions need to access
registers that haven’t been updated yet.
— Hazards from R-type instructions can be avoided with
forwarding.
— Loads can result in a “true” hazard, which must stall the
pipeline.
• Control hazards arise when the CPU cannot determine which
instruction to fetch next.
— We can minimize delays by doing branch tests earlier in the
pipeline.
— We can also take a chance and predict the branch direction, to
make the most of a bad situation.
