L13 Stalls and Flushes

The document discusses data hazards in pipelined CPUs, focusing on how stalls and forwarding resolve dependencies between instructions. It highlights the limits of forwarding, particularly for load instructions, and the control hazards caused by branches, both of which may require stalling the pipeline to keep execution correct. It also covers how stalls are detected, branch prediction, and the performance cost of mispredictions.

Uploaded by

govaje4313

Stalls and flushes

• So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
— Many hazards can be resolved by forwarding data from the pipeline registers, instead of waiting for the writeback stage.
— The pipeline continues running at full speed, with one instruction beginning on every clock cycle.
• Now, we’ll see some real limitations of pipelining.
— Forwarding may not work for data hazards from load instructions.
— Branches affect the instruction fetch for the next clock cycle.
• In both of these cases we may need to slow down, or stall, the pipeline.

Data hazard review
• A data hazard arises if one instruction needs data that isn’t ready yet.
— Below, the AND and OR both need to read register $2.
— But $2 isn’t updated by SUB until the fifth clock cycle.
• Dependency arrows that point backwards indicate hazards.
[Figure: pipeline diagram over clock cycles 1-7, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
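The dependency check behind this diagram can be sketched in a few lines of Python. This is an illustration only, not part of the original slides: the `(dest, src1, src2)` tuple format and the name `raw_hazards` are assumptions made for the example. It also assumes a five-stage pipeline without forwarding in which the register file is written before it is read within a cycle, so only the two instructions immediately after a producer can read a stale value.

```python
def raw_hazards(program, window=2):
    """Return (producer_index, consumer_index, register) for each hazard.

    program is a list of (dest, src1, src2) register names; this format is
    hypothetical. window=2 assumes no forwarding and a register file that is
    written before it is read in the same clock cycle.
    """
    hazards = []
    for i, (dest, _, _) in enumerate(program):
        if dest == "$0":  # $0 is hard-wired to zero in MIPS
            continue
        # Only the next `window` instructions can read the old value.
        for j in range(i + 1, min(i + 1 + window, len(program))):
            _, src1, src2 = program[j]
            if dest in (src1, src2):
                hazards.append((i, j, dest))
    return hazards


program = [
    ("$2", "$1", "$3"),   # sub $2, $1, $3
    ("$12", "$2", "$5"),  # and $12, $2, $5
    ("$13", "$6", "$2"),  # or  $13, $6, $2
]
print(raw_hazards(program))  # [(0, 1, '$2'), (0, 2, '$2')]
```

Both reported pairs correspond to the backward-pointing arrows: the AND and the OR each read $2 before the SUB has written it.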

Forwarding
• The desired value ($1 - $3) has actually already been computed; it just hasn’t been written to the registers yet.
• Forwarding allows other instructions to read ALU results directly from the pipeline registers, without going through the register file.
[Figure: pipeline diagram over clock cycles 1-7, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2

What about loads?
• Imagine if the first instruction in the example was LW instead of SUB.
— How does this change the data hazard?
[Figure: pipeline diagram over clock cycles 1-6, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
lw $2, 20($3)
and $12, $2, $5

What about loads?
• Imagine if the first instruction in the example was LW instead of SUB.
— The load data doesn’t come from memory until the end of cycle 4.
— But the AND needs that value at the beginning of the same cycle!
• This is a “true” data hazard: the data is not available when we need it.
[Figure: pipeline diagram over clock cycles 1-6, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
lw $2, 20($3)
and $12, $2, $5

Stalling
• The easiest solution is to stall the pipeline.
• We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes called a bubble.
[Figure: pipeline diagram over clock cycles 1-7, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
lw $2, 20($3)
and $12, $2, $5
• Notice that we’re still using forwarding in cycle 5, to get data from the MEM/WB pipeline register to the ALU.

Stalling and forwarding
• Without forwarding, we’d have to stall for two cycles to wait for the LW instruction’s writeback stage.
[Figure: pipeline diagram over clock cycles 1-8, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
lw $2, 20($3)
and $12, $2, $5
• In general, you can always stall to avoid hazards, but dependencies are very common in real code, and frequent stalling can reduce performance by a significant amount.
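The one-cycle and two-cycle stall counts quoted for the LW/AND pair follow from a simple rule, sketched below in Python. The function name `stall_cycles` and its parameters are hypothetical, and the sketch assumes the classic five-stage pipeline with a register file that is written before it is read within a cycle.

```python
def stall_cycles(producer_is_load, distance, forwarding=True):
    """Bubbles needed before a dependent instruction can proceed.

    distance counts how many instructions after the producer the consumer
    sits (1 = immediately after). Assumes a five-stage pipeline whose
    register file is written before it is read in the same cycle.
    """
    if forwarding:
        # ALU results can be forwarded straight away; load data only exists
        # after the memory stage, one cycle later.
        needed = 1 if producer_is_load else 0
    else:
        # The consumer's register read has to wait for the producer's
        # writeback stage.
        needed = 2
    # Each unrelated instruction in between hides one cycle of the delay.
    return max(0, needed - (distance - 1))


print(stall_cycles(True, 1))                    # 1: LW then AND, with forwarding
print(stall_cycles(True, 1, forwarding=False))  # 2: the same pair, no forwarding
print(stall_cycles(False, 1))                   # 0: SUB then AND, with forwarding
```

Under these assumptions a dependent instruction two slots after a load needs no stall once forwarding is available, which is why compilers try to schedule an independent instruction into that slot.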

Stalling delays the entire pipeline
• If we delay the second instruction, we’ll have to delay the third one too.
— Why?
[Figure: pipeline diagram over clock cycles 1-8, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
lw $2, 20($3)
and $12, $2, $5
or $13, $12, $2

Stalling delays the entire pipeline
• If we delay the second instruction, we’ll have to delay the third one too.
— This is necessary to make forwarding work between AND and OR.
— It also prevents problems such as two instructions trying to write to the same register in the same cycle.
[Figure: pipeline diagram over clock cycles 1-8, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
lw $2, 20($3)
and $12, $2, $5
or $13, $12, $2

What about EXE, MEM, WB?
• But what about the ALU during cycle 4, the data memory in cycle 5, and the register file write in cycle 6?
[Figure: pipeline diagram over clock cycles 1-8. The AND row repeats its Reg stage (IM Reg Reg DM Reg) and the OR row repeats its IM stage (IM IM Reg DM Reg).]
lw $2, 20($3)
and $12, $2, $5
or $13, $12, $2
• Those units aren’t used in those cycles because of the stall, so we can set the EX, MEM and WB control signals to all 0s.

Stall = Nop conversion
[Figure: pipeline diagram over clock cycles 1-8, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
lw $2, 20($3)
and -> nop
and $12, $2, $5
or $13, $12, $2
• The effect of a load stall is to insert an empty or nop instruction into the pipeline.
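Viewed this way, a load stall is just an extra nop in the instruction stream. The Python sketch below illustrates that view; the tuple format `(text, is_load, dest, sources)` and the name `insert_load_bubbles` are assumptions for the example, and it assumes forwarding covers every dependence other than a back-to-back load-use pair.

```python
def insert_load_bubbles(program):
    """Return the instruction stream with a nop after each load-use pair.

    program is a list of (text, is_load, dest, sources) tuples; this format
    is hypothetical. Forwarding is assumed to handle all other dependences.
    """
    out = []
    prev = None
    for instr in program:
        text, _, _, sources = instr
        if prev is not None:
            _, prev_is_load, prev_dest, _ = prev
            if prev_is_load and prev_dest in sources:
                out.append("nop")  # the bubble created by the stall
        out.append(text)
        prev = instr
    return out


program = [
    ("lw $2, 20($3)", True, "$2", ("$3",)),
    ("and $12, $2, $5", False, "$12", ("$2", "$5")),
    ("or $13, $12, $2", False, "$13", ("$12", "$2")),
]
for slot, text in enumerate(insert_load_bubbles(program), start=1):
    print(slot, text)
```

For the example sequence this prints the LW, a nop, the AND and then the OR, matching the four rows of the diagram.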

Detecting stalls
• Detecting stalls is much like detecting data hazards.
• Recall the format of hazard detection equations:

if (EX/MEM.RegWrite = 1
and EX/MEM.RegisterRd = ID/EX.RegisterRs)
then Bypass Rs from EX/MEM stage latch

[Figure: pipeline diagram with the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers marked between the stages of each instruction.]
sub $2, $1, $3
and $12, $2, $5
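The bypass equation above translates almost directly into code. The Python sketch below is illustrative only: the name `forward_select` and its return values are hypothetical, the EX/MEM case is the equation from the slide, and the MEM/WB case and the register-0 guard are assumptions added to make the example complete.

```python
def forward_select(id_ex_rs, ex_mem_regwrite, ex_mem_rd,
                   mem_wb_regwrite, mem_wb_rd):
    """Choose where the ALU takes its Rs operand from.

    Returns "EX/MEM", "MEM/WB" or "REGFILE". Only the EX/MEM case appears
    on the slide; the MEM/WB case and the check that the destination is not
    register 0 are assumptions of this sketch.
    """
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == id_ex_rs:
        return "EX/MEM"  # the most recent result takes priority
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == id_ex_rs:
        return "MEM/WB"
    return "REGFILE"


# and $12, $2, $5 is in EX while sub $2, $1, $3 is in MEM.
print(forward_select(id_ex_rs=2, ex_mem_regwrite=True, ex_mem_rd=2,
                     mem_wb_regwrite=False, mem_wb_rd=0))  # EX/MEM
```

Checking EX/MEM first matters when both pipeline registers hold a value for the same register: the younger result is the one the consumer should see.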

Detecting Stalls, cont.
• When should stalls be detected?
[Figure: pipeline diagram with the IF/ID, ID/EX, EX/MEM and MEM/WB pipeline registers marked; the AND row repeats its Reg stage (IM Reg Reg DM Reg).]
lw $2, 20($3)
and $12, $2, $5
• What is the stall condition?

if (                    )
then stall

Adding hazard detection to the CPU
[Figure: pipelined datapath (PC, instruction memory, IF/ID, register file, ID/EX, ALU, EX/MEM, data memory, MEM/WB) with a Hazard Unit added alongside the Forwarding Unit. Labelled Forwarding Unit inputs: Rs, Rt, EX/MEM.RegisterRd, MEM/WB.RegisterRd.]

Adding hazard detection to the CPU
[Figure: the same datapath with the Hazard Unit connected. Labelled Hazard Unit signals: ID/EX.MemRead, ID/EX.RegisterRt, Rs, Rt, PC Write, IF/ID Write, and a mux that can select 0 in place of the control signals entering ID/EX. The Forwarding Unit keeps its EX/MEM.RegisterRd and MEM/WB.RegisterRd inputs.]
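One common way to express what such a hazard unit computes is sketched below in Python. It is consistent with the signal names labelled in the diagram (ID/EX.MemRead, ID/EX.RegisterRt, PC Write, IF/ID Write), but the exact condition is not spelled out on the slides, so treat it as an assumption; the function names and the `zero_control` key are hypothetical.

```python
def load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID reads the register a load in EX writes.

    Assumed condition (not given on the slides): the instruction in EX is a
    load (MemRead set) and its destination Rt matches a source register of
    the instruction waiting in ID.
    """
    return bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)


def hazard_unit(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Return the control outputs for one cycle as a dict (names hypothetical)."""
    stall = load_use_stall(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt)
    return {
        "PCWrite": not stall,     # hold the PC so the same fetch repeats
        "IF/IDWrite": not stall,  # hold the instruction currently in ID
        "zero_control": stall,    # send all-0 control signals into ID/EX
    }


# lw $2, 20($3) is in EX; and $12, $2, $5 is in ID.
print(hazard_unit(id_ex_memread=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```

Holding PC and IF/ID keeps the dependent instruction and its successor in place for one cycle, while the zeroed control signals turn the slot entering EX into a bubble.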

Generalizing Forwarding/Stalling
• What if data memory access was so slow, we wanted to pipeline it over 2 cycles?
[Figure: pipeline diagram for a single instruction over clock cycles 1-6. Stage labels: IM, Reg, DM, Reg.]
• How many bypass inputs would the muxes in EXE have?
• Which instructions in the following require stalling and/or bypassing?

lw r13, 0(r11)
add r7, r8, r9
add r15, r7, r13

Branches in the original pipelined datapath
• When are they resolved?
[Figure: the original pipelined datapath, showing the PCSrc mux feeding the PC, the adder for PC + 4, the shift-left-2 unit and branch target adder, the register file, the ALU with its Zero output, the data memory, and the control signals RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite and MemToReg.]

Branches
• Most of the work for a branch computation is done in the EX stage.
— The branch target address is computed.
— The source registers are compared by the ALU, and the Zero flag is set or cleared accordingly.
• Thus, the branch decision cannot be made until the end of the EX stage.
— But we need to know which instruction to fetch next, in order to keep the pipeline running!
— This leads to what’s called a control hazard.
[Figure: pipeline diagram over clock cycles 1-8; the instruction after the branch is shown as ???.]
beq $2, $3, Label
???

Stalling is one solution
• Again, stalling is always one possible solution.
[Figure: pipeline diagram over clock cycles 1-8; the row after the branch repeats its IM stage (IM IM Reg DM Reg).]
beq $2, $3, Label
???
• Here we just stall until cycle 4, after we do make the branch decision.

Branch prediction
• Another approach is to guess whether or not the branch is taken.
— In terms of hardware, it’s easier to assume the branch is not taken.
— This way we just increment the PC and continue execution, as for normal instructions.
• If we’re correct, then there is no problem and the pipeline keeps going at full speed.
[Figure: pipeline diagram over clock cycles 1-7, one row per instruction. Stage labels: IM, Reg, DM, Reg.]
beq $2, $3, Label
next instruction 1
next instruction 2

Branch misprediction
• If our guess is wrong, then we would have already started executing two instructions incorrectly. We’ll have to discard, or flush, those instructions and begin executing the right ones from the branch target address, Label.
[Figure: pipeline diagram over clock cycles 1-8. Next instruction 1 is flushed after its IM and Reg stages, next instruction 2 is flushed after its IM stage, and the instruction at Label then runs through the full pipeline.]
beq $2, $3, Label
next instruction 1 (flushed)
next instruction 2 (flushed)
Label: ...

Performance gains and losses
• Overall, branch prediction is worth it.
— Mispredicting a branch means that two clock cycles are wasted.
— But if our predictions are even just occasionally correct, then this is preferable to stalling and wasting two cycles for every branch.
• All modern CPUs use branch prediction.
— Accurate predictions are important for optimal performance.
— Most CPUs predict branches dynamically: statistics are kept at run-time to determine the likelihood of a branch being taken.
• The pipeline structure also has a big impact on branch prediction.
— A longer pipeline may require more instructions to be flushed for a misprediction, resulting in more wasted time and lower performance.
— We must also be careful that instructions do not modify registers or memory before they get flushed.
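The trade-off between stalling on every branch and predicting can be put into numbers with a small calculation. The Python sketch below is illustrative: `effective_cpi` is a hypothetical helper, the base CPI of 1 ignores every other source of stalls, and the branch frequency (20%) and taken rate (40%) are made-up figures, not measurements from the slides.

```python
def effective_cpi(branch_fraction, mispredict_rate, penalty_cycles=2,
                  base_cpi=1.0):
    """Average cycles per instruction when only branches cost extra cycles.

    branch_fraction: share of instructions that are branches.
    mispredict_rate: share of branches that pay the penalty.
    penalty_cycles:  cycles wasted per penalised branch (2 when the branch
                     is resolved in EX, as in this part of the lecture).
    """
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles


# Hypothetical workload: 20% branches, 40% of them taken.
always_stall = effective_cpi(0.20, 1.0)        # every branch wastes 2 cycles
predict_not_taken = effective_cpi(0.20, 0.40)  # only taken branches do
print(round(always_stall, 2), round(predict_not_taken, 2))  # 1.4 1.16
```

With these made-up figures, predicting not-taken cuts the branch overhead from 0.40 to 0.16 cycles per instruction, and the prediction scheme never does worse than stalling because a wrong guess costs the same two cycles.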

Implementing branches
• We can actually decide the branch a little earlier, in ID instead of EX.
— Our sample instruction set has only a BEQ.
— We can add a small comparison circuit to the ID stage, after the source registers are read.
• Then we would only need to flush one instruction on a misprediction.
[Figure: pipeline diagram over clock cycles 1-7. Next instruction 1 is flushed after its IM stage; the instruction at Label then runs through the full pipeline.]
beq $2, $3, Label
next instruction 1 (flushed)
Label: ...

Implementing flushes
• We must flush one instruction (in its IF stage) if the previous instruction is BEQ and its two source registers are equal.
• We can flush an instruction from the IF stage by replacing it in the IF/ID pipeline register with a harmless nop instruction.
— MIPS uses sll $0, $0, 0 as the nop instruction.
— This happens to have a binary encoding of all 0s: 0000 .... 0000.
• Flushing introduces a bubble into the pipeline, which represents the one-cycle delay in taking the branch.
• The IF.Flush control signal shown on the next page implements this idea, but no details are shown in the diagram.
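The all-zeros encoding of the nop can be checked by packing the standard MIPS R-type fields (opcode, rs, rt, rd, shamt, funct) into a 32-bit word. The Python sketch below does this; the helper name `encode_r_type` is an assumption for the example.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack MIPS R-type fields into a 32-bit instruction word.

    Field widths: opcode 6, rs 5, rt 5, rd 5, shamt 5, funct 6 bits.
    """
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


SLL_FUNCT = 0x00  # function code of sll
SUB_FUNCT = 0x22  # function code of sub

nop = encode_r_type(rs=0, rt=0, rd=0, shamt=0, funct=SLL_FUNCT)  # sll $0, $0, 0
sub = encode_r_type(rs=1, rt=3, rd=2, shamt=0, funct=SUB_FUNCT)  # sub $2, $1, $3

print(f"{nop:032b}")  # 32 zero bits
print(f"{sub:#010x}")  # 0x00231022
```

Because every field of sll $0, $0, 0 is zero, clearing the instruction field of the IF/ID register is enough to turn whatever was fetched into a nop.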

Branching without forwarding and load stalls
[Figure: pipelined datapath with an equality comparison (=) on the register file outputs and an IF.Flush control signal into IF/ID. The forwarding and hazard detection logic is omitted, with the note “The other stuff just won’t fit!”]

Timing
• If no prediction:

IF ID EX MEM WB
   IF IF ID EX MEM WB (1 cycle lost)

• If prediction:
— If correct:
IF ID EX MEM WB
   IF ID EX MEM WB (no cycle lost)
— If misprediction:
IF ID EX MEM WB
   IF0 IF1 ID EX MEM WB (1 cycle lost)

Summary
• Three kinds of hazards conspire to make pipelining difficult.
• Structural hazards result from not having enough hardware available to execute multiple instructions simultaneously.
— These are avoided by adding more functional units (e.g., more adders or memories) or by redesigning the pipeline stages.
• Data hazards can occur when instructions need to access registers that haven’t been updated yet.
— Hazards from R-type instructions can be avoided with forwarding.
— Loads can result in a “true” hazard, which must stall the pipeline.
• Control hazards arise when the CPU cannot determine which instruction to fetch next.
— We can minimize delays by doing branch tests earlier in the pipeline.
— We can also take a chance and predict the branch direction, to make the most of a bad situation.
