CSCE 5610 Computer System Architecture: Instruction Level Parallelism

The document discusses pipeline hazards in computer architecture, categorizing them into structural, data, and control hazards. It explains how data hazards arise from instruction dependencies and presents solutions like stalling and forwarding to mitigate these issues. The text also details the mechanics of forwarding in pipelined architectures to enhance performance while addressing the challenges posed by load instructions and branches.

Pipeline hazards

● Hazards: Situations in a pipelined datapath in which the next instruction cannot execute in the following clock cycle.
● Three types of hazards:
    ● Structural Hazards
    ● Data Hazards
    ● Control Hazards

Structural Hazards

● Structural Hazards: The hardware cannot support the combination of instructions that are required to be executed in the same clock cycle.
● The MIPS instruction set is designed to be pipelined, making it easy to avoid structural hazards.
● Suppose we had a single memory instead of separate data and instruction memories: an instruction fetch and a data access could then need that memory in the same clock cycle.

Data Hazards

● Data Hazards occur when an instruction depends on the results of a previous instruction in a way that is exposed by the overlapping of instructions in the pipeline.
● Example 1:
add $s0, $t0, $t1
sub $t2, $s0, $t3
● The add instruction does not write its result until the fifth step.
● The result of the add instruction, $s0, is required by sub before that step.
Data Hazards

● Example 1:
add $s0, $t1, $t2
sub $t2, $s0, $t3
● Stalling: a simple but slow solution

Pipeline Performance with stalls

● Assume the stages are balanced. What is the speedup of pipelining?
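
One rough way to reason about the speedup question is the common textbook approximation speedup = depth / (1 + stall cycles per instruction), which assumes balanced stages and an ideal CPI of 1. The sketch below is only an illustration of that approximation; the function name and the sample inputs are assumptions, not something taken from the slides.

```python
def pipeline_speedup(depth: int, stalls_per_instruction: float = 0.0) -> float:
    """Approximate speedup over an unpipelined design.

    Assumes balanced stages and an ideal CPI of 1, so every stall cycle
    per instruction adds directly to the effective CPI.
    """
    if depth < 1 or stalls_per_instruction < 0:
        raise ValueError("depth must be >= 1 and stalls must be non-negative")
    return depth / (1.0 + stalls_per_instruction)


print(pipeline_speedup(5))      # no stalls: speedup equals the number of stages
print(pipeline_speedup(5, 2))   # two stall cycles after every instruction
```

Under this model a 5-stage pipeline that stalls twice per instruction keeps only about a third of its ideal speedup, which is why stalling is called a slow solution.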

Pipeline Performance with stalls

● If there are no stalls, the speedup is equal to the number of pipeline stages. However, adding two stalls after each instruction in a 5-stage pipelined MIPS processor reduces that speedup.

Data hazards – Dependency Arrows

Clock cycle          1    2    3    4    5    6    7    8    9
sub $2, $1, $3       IF   ID   EX   MEM  WB
and $12, $2, $5           IF   ID   EX   MEM  WB
or  $13, $6, $2                IF   ID   EX   MEM  WB
add $14, $2, $2                     IF   ID   EX   MEM  WB
sw  $15, 100($2)                         IF   ID   EX   MEM  WB

• Arrows indicate the flow of data between instructions.
— The tails of the arrows show when register $2 is written.
— The heads of the arrows show when $2 is read.
• Any arrow that points backwards in time represents a data hazard in our basic pipelined datapath. Here, hazards exist between instructions 1 & 2 and 1 & 3.
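
The "arrow pointing backwards in time" test can be sketched in a few lines: compare the cycle in which a register is written (WB) with the cycle in which a later instruction reads it (ID). This is a minimal illustration, not the slides' own method; the tuple layout and the function name are hypothetical, and it assumes that a write and a read of the same register in the same cycle are safe.

```python
# Each instruction: (text, destination register or None, source registers).
PROGRAM = [
    ("sub $2, $1, $3",   "$2",  ["$1", "$3"]),
    ("and $12, $2, $5",  "$12", ["$2", "$5"]),
    ("or  $13, $6, $2",  "$13", ["$6", "$2"]),
    ("add $14, $2, $2",  "$14", ["$2", "$2"]),
    ("sw  $15, 100($2)", None,  ["$15", "$2"]),
]


def find_hazards(program):
    """Return 1-based (producer, consumer) pairs whose arrow points backwards."""
    hazards = []
    for i, (_, dest, _) in enumerate(program):
        if dest is None:
            continue
        write_cycle = (i + 1) + 4            # WB is the fifth stage
        for j in range(i + 1, len(program)):
            read_cycle = (j + 1) + 1         # ID is the second stage
            # Assumption: same-cycle write and read is safe (write in the
            # first half of the cycle, read in the second half).
            if dest in program[j][2] and read_cycle < write_cycle:
                hazards.append((i + 1, j + 1))
            if program[j][1] == dest:        # a later writer hides this value
                break
    return hazards


print(find_hazards(PROGRAM))
```

For the five-instruction sequence above this reports the pairs (1, 2) and (1, 3), matching the hazards named on the slide.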

A Fancier Pipeline Diagram

[Figure: pipeline diagram]

A more Detailed Look at the Pipeline

• We have to eliminate the hazards, so the AND and OR instructions in our example will use the correct value for register $2.
• When is the data actually produced and consumed?
• What can we do?
• Let's look at when the data is actually produced and consumed.
— The SUB instruction produces its result in its EX stage, during cycle 3 in the diagram below.
— The AND and OR need the new value of $2 in their EX stages, during clock cycles 4-5 here.

Clock cycle          1    2    3    4    5    6    7
sub $2, $1, $3       IF   ID   EX   MEM  WB
and $12, $2, $5           IF   ID   EX   MEM  WB
or  $13, $6, $2                IF   ID   EX   MEM  WB

A more Detailed Look at the Pipeline

• The actual result $1 - $3 is computed in clock cycle 3, before it's needed in cycles 4 and 5.
• If we could somehow bypass the writeback and register read stages when needed, then we can eliminate these data hazards.
— Now, we'll focus on hazards involving arithmetic instructions.
— Next time, we'll examine the lw instruction.
• Essentially, we need to pass the ALU output from SUB directly to the AND and OR instructions, without going through the register file.
Where to find the ALU result

• The ALU result generated in the EX stage is normally passed through the pipeline registers to the MEM and WB stages, before it is finally written to the register file.
• This is an abridged diagram of our pipelined datapath.

[Figure: abridged pipelined datapath]

Forwarding/Bypassing

• Since the pipeline registers already contain the ALU result, we could just forward that value to subsequent instructions, to prevent data hazards.
— In clock cycle 4, the AND instruction can get the value $1 - $3 from the EX/MEM pipeline register used by SUB.
— Then in cycle 5, the OR can get that same result from the MEM/WB pipeline register being used by SUB.

Clock cycle          1    2    3    4    5    6    7
sub $2, $1, $3       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  ALU  DM   Reg
or  $13, $6, $2                IM   Reg  ALU  DM   Reg

Outline of Forwarding Hardware

• A forwarding unit selects the correct ALU inputs for the EX stage.
— If there is no hazard, the ALU's operands will come from the register file, just like before.
— If there is a hazard, the operands will come from either the EX/MEM or MEM/WB pipeline registers instead.
• The ALU sources will be selected by two new multiplexers, with control signals named ForwardA and ForwardB.

sub $2, $1, $3       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  ALU  DM   Reg
or  $13, $6, $2                IM   Reg  ALU  DM   Reg

Simplified Datapath with Forwarding Muxes

[Figure: simplified datapath with forwarding muxes]

Detecting EX/MEM data hazards

• So how can the hardware determine if a hazard exists?
• An EX/MEM hazard occurs between the instruction currently in its EX stage and the previous instruction if:
1. The previous instruction will write to the register file, and
2. The destination is one of the ALU source registers in the EX stage.
• There is an EX/MEM hazard between the two instructions below.

sub $2, $1, $3       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  ALU  DM   Reg

• Data in a pipeline register can be referenced using a class-like syntax. For example, ID/EX.RegisterRt refers to the rt field stored in the ID/EX pipeline register.

EX/MEM data hazard equations

• The first ALU source comes from the pipeline register when necessary.

if (EX/MEM.RegWrite = 1
    and EX/MEM.RegisterRd = ID/EX.RegisterRs)
then ForwardA = 2

• The second ALU source is similar.

if (EX/MEM.RegWrite = 1
    and EX/MEM.RegisterRd = ID/EX.RegisterRt)
then ForwardB = 2

sub $2, $1, $3       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  ALU  DM   Reg

Detecting MEM/WB data hazards

• A MEM/WB hazard may occur between an instruction in the EX stage and the instruction from two cycles ago.
• One new problem is if a register is updated twice in a row.

add $1, $2, $3
add $1, $1, $4
sub $5, $5, $1

• Register $1 is written by both of the previous instructions, but only the most recent result (from the second ADD) should be forwarded.

MEM/WB hazard equations

• Here is an equation for detecting and handling MEM/WB hazards for the first ALU source.

if (MEM/WB.RegWrite = 1
    and MEM/WB.RegisterRd = ID/EX.RegisterRs
    and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRs or EX/MEM.RegWrite = 0))
then ForwardA = 1

• The second ALU operand is handled similarly.

if (MEM/WB.RegWrite = 1
    and MEM/WB.RegisterRd = ID/EX.RegisterRt
    and (EX/MEM.RegisterRd ≠ ID/EX.RegisterRt or EX/MEM.RegWrite = 0))
then ForwardB = 1

Simplified Datapath with Forwarding

[Figure: simplified datapath with forwarding]
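
The EX/MEM and MEM/WB equations can be folded into one small selection function, as a minimal sketch. Checking EX/MEM first gives the "most recent result wins" behaviour that the extra MEM/WB clause encodes. The class and function names are hypothetical, and the value 0 for "use the register file" is an assumption, since the equations only name the values 2 and 1.

```python
from dataclasses import dataclass


@dataclass
class PipelineReg:
    """The two fields of EX/MEM or MEM/WB that the forwarding unit reads."""
    reg_write: bool
    register_rd: int


def forward_select(source_reg: int, ex_mem: PipelineReg, mem_wb: PipelineReg) -> int:
    """Return 2 (forward from EX/MEM), 1 (from MEM/WB) or 0 (register file)."""
    if ex_mem.reg_write and ex_mem.register_rd == source_reg:
        return 2
    # Reached only when EX/MEM does not match, which is exactly the
    # "EX/MEM.RegisterRd != source or EX/MEM.RegWrite = 0" clause.
    if mem_wb.reg_write and mem_wb.register_rd == source_reg:
        return 1
    return 0  # assumed encoding for "no forwarding"


def forwarding_unit(id_ex_rs: int, id_ex_rt: int,
                    ex_mem: PipelineReg, mem_wb: PipelineReg) -> tuple:
    """Return (ForwardA, ForwardB) for the instruction in the EX stage."""
    return (forward_select(id_ex_rs, ex_mem, mem_wb),
            forward_select(id_ex_rt, ex_mem, mem_wb))


# sub $5, $5, $1 in EX, preceded by add $1, $1, $4 and add $1, $2, $3.
print(forwarding_unit(5, 1, PipelineReg(True, 1), PipelineReg(True, 1)))
```

In the double-ADD case the second operand selects 2, so the result of the second ADD is forwarded rather than the stale one in MEM/WB.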

The Forwarding Unit

• The forwarding unit has several control signals as inputs.

ID/EX.RegisterRs    EX/MEM.RegisterRd    MEM/WB.RegisterRd
ID/EX.RegisterRt    EX/MEM.RegWrite      MEM/WB.RegWrite

(The two RegWrite signals are not shown in the diagram, but they come from the control unit.)

• The forwarding unit outputs are selectors for the ForwardA and ForwardB multiplexers attached to the ALU. These outputs are generated from the inputs using the equations on the previous pages.
• Some new buses route data from pipeline registers to the new muxes.

Example

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

• Assume again each register initially contains its number plus 100.
— After the first instruction, $2 should contain -2 (101 - 103).
— The other instructions should all use -2 as one of their operands.
• We'll try to keep the example short.
— Assume no forwarding is needed except for register $2.
— We'll skip the first two cycles, since they're the same as before.

[Figure: Clock cycle 3]
[Figure: Clock cycle 4: Forwarding $2 from EX/MEM]
[Figure: Clock cycle 5: Forwarding $2 from MEM/WB]
[Figure: Complete Pipelined Datapath...so far]

Stalls and Flushes

• So far, we have discussed data hazards that can occur in pipelined CPUs if some instructions depend upon others that are still executing.
— Many hazards can be resolved by forwarding data from the pipeline registers, instead of waiting for the writeback stage.
— The pipeline continues running at full speed, with one instruction beginning on every clock cycle.
• Now, we'll see some real limitations of pipelining.
— Forwarding may not work for data hazards from load instructions.
— Branches affect the instruction fetch for the next clock cycle.
• In both of these cases we may need to slow down, or stall, the pipeline.

What about loads?

• Imagine if the first instruction in the example was LW instead of SUB.
— How does this change the data hazard?

● Example 2:

Clock cycle          1    2    3    4    5    6
lw  $2, 20($3)       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  ALU  DM   Reg

Stalling

• The easiest solution is to stall the pipeline.
• We could delay the AND instruction by introducing a one-cycle delay into the pipeline, sometimes called a bubble.

Clock cycle          1    2    3    4    5    6    7
lw  $2, 20($3)       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  --   ALU  DM   Reg

(-- marks a bubble.)

• Notice that we're still using forwarding in cycle 5, to get data from the MEM/WB pipeline register to the ALU.

Stalling and Forwarding

• Without forwarding, we'd have to stall for two cycles to wait for the LW instruction's writeback stage.

Clock cycle          1    2    3    4    5    6    7    8
lw  $2, 20($3)       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   --   --   Reg  ALU  DM   Reg

• In general, you can always stall to avoid hazards, but dependencies are very common in real code, and stalling often can reduce performance by a significant amount.

Stalling Delays the Entire Pipeline

• If we delay the second instruction, we'll have to delay the third one too.
— Why?
— This is necessary to make forwarding work between AND and OR.
— It also prevents problems such as two instructions trying to write to the same register in the same cycle.

Clock cycle          1    2    3    4    5    6    7    8
lw  $2, 20($3)       IM   Reg  ALU  DM   Reg
and $12, $2, $5           IM   Reg  --   ALU  DM   Reg
or  $13, $12, $2               IM   --   Reg  ALU  DM   Reg

Stall = Nop conversion

Clock cycle          1    2    3    4    5    6    7    8
lw  $2, 20($3)       IM   Reg  ALU  DM   Reg
and -> nop                IM   Reg  ALU  DM   Reg
and $12, $2, $5                IM   Reg  ALU  DM   Reg
or  $13, $12, $2                    IM   Reg  ALU  DM   Reg

• The effect of a load stall is to insert an empty or nop instruction into the pipeline.

Detecting stalls

• Detecting stalls is much like detecting data hazards.
• Recall the format of hazard detection equations:

if (EX/MEM.RegWrite = 1
    and EX/MEM.RegisterRd = ID/EX.RegisterRs)
then Bypass Rs from EX/MEM stage latch

sub $2, $1, $3       IM [if/id] Reg [id/ex] ALU [ex/mem] DM [mem/wb] Reg
and $12, $2, $5         IM [if/id] Reg [id/ex] ALU [ex/mem] DM [mem/wb] Reg

Detecting Stalls, cont.

• When should stalls be detected?

lw  $2, 20($3)       IM [if/id] Reg [id/ex] ALU [ex/mem] DM [mem/wb] Reg
and $12, $2, $5         IM [if/id] Reg [if/id] Reg [id/ex] ALU [ex/mem] DM [mem/wb] Reg

Hazard detection unit

● To stall the pipeline, the instruction in the ID stage should be stalled. We can do it by continuing to read the same PC.
● To insert bubbles or nops, we set all nine control signals to "0", so no registers or memories are written, which creates a "do nothing" instruction.

• What is the stall condition?

if (ID/EX.MemRead = 1
    and (ID/EX.RegisterRt = IF/ID.RegisterRs or
         ID/EX.RegisterRt = IF/ID.RegisterRt))
then stall
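
The stall condition translates almost directly into a boolean function. This is a minimal sketch with hypothetical parameter names mirroring the pipeline-register fields; it covers only the load-use check stated in the condition and does not model the PC hold or the zeroing of control signals.

```python
def must_stall(id_ex_mem_read: bool, id_ex_rt: int,
               if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID needs the value a load in EX is fetching."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


# lw $2, 20($3) is in EX (rt = 2); and $12, $2, $5 is in ID (rs = 2, rt = 5).
print(must_stall(True, 2, 2, 5))
# sub $2, $1, $3 in EX is not a load, so forwarding alone is enough.
print(must_stall(False, 2, 2, 5))
```

Note that the check happens while the dependent instruction is still in ID, which is what lets the hardware hold it there for one cycle.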

Data Hazards: Reordering

Given: Consider the following code segment in C:

a = b + e;
c = b + f;

Here is the generated MIPS code for this segment, assuming all variables are in memory and are addressable as offsets from $t0:

lw $t1, 0($t0)
lw $t2, 4($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)

Sought: Find the hazards in the code segment above and reorder the instructions to avoid any pipeline stalls. (The pipelined datapath is equipped with a forwarding/bypassing mechanism.)

Data Hazards: Reordering

1) lw $t1,0($t0)
2) lw $t2,4($t0)
3) add $t3,$t1,$t2
4) sw $t3,12($t0)
5) lw $t4,8($t0)
6) add $t5,$t1,$t4
7) sw $t5,16($t0)

[Figure: dependency arrows labeled $t1, $t2, $t3, $t4 and $t5 between the instructions above]
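
A candidate ordering can be checked mechanically. The sketch below assumes the forwarding pipeline described earlier, where the only remaining stall is a load immediately followed by an instruction that reads the loaded register. The tuple layout and function name are hypothetical, and the second ordering (hoisting the third lw) is just one possible answer to the exercise.

```python
def count_load_use_stalls(program):
    """program: list of (opcode, dest or None, sources). One bubble per load-use pair."""
    stalls = 0
    for producer, consumer in zip(program, program[1:]):
        opcode, dest, _ = producer
        if opcode == "lw" and dest in consumer[2]:
            stalls += 1
    return stalls


LW1 = ("lw", "$t1", ["$t0"])
LW2 = ("lw", "$t2", ["$t0"])
ADD3 = ("add", "$t3", ["$t1", "$t2"])
SW3 = ("sw", None, ["$t3", "$t0"])
LW4 = ("lw", "$t4", ["$t0"])
ADD5 = ("add", "$t5", ["$t1", "$t4"])
SW5 = ("sw", None, ["$t5", "$t0"])

original = [LW1, LW2, ADD3, SW3, LW4, ADD5, SW5]
reordered = [LW1, LW2, LW4, ADD3, SW3, ADD5, SW5]

print(count_load_use_stalls(original))   # two load-use pairs
print(count_load_use_stalls(reordered))  # none
```

Moving the third load up separates each lw from its consumer by at least one instruction, so forwarding covers every remaining dependence.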

Data Hazards: Reordering

1) lw $t1,0($t0)
2) lw $t2,4($t0)
3) lw $t4,8($t0)
4) add $t3,$t1,$t2
5) sw $t3,12($t0)
6) add $t5,$t1,$t4
7) sw $t5,16($t0)

[Figure: dependency arrows labeled $t1, $t2, $t4, $t3 and $t5; annotation: "No need for forwarding. Data is already available."]

Other types of data hazard

➢ Write After Read (WAR) Hazard a.k.a. "antidependence"

add $t1, $t0, $s1
and $t0, $s3, $s4

● A WAR hazard results from the developer's choice to use the same register for two independent instructions, which can happen because of the limited number of registers available in a MIPS processor, i.e. 32 registers.
● Antidependence does not cause a hazard in the simple MIPS pipeline; however, it could lead to a data hazard in out-of-order processors if the dependent instruction (and) is moved too early (before add in this example).

➢ Write After Write (WAW) Hazard a.k.a. "output dependence"

add $t0, $t1, $s1
and $t0, $s3, $s4

● If the and instruction is moved before the add instruction, the following instructions would work with the wrong data.
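
The three dependence kinds differ only in which instruction reads and which writes the shared register, so they can be told apart with a few set-membership checks. This is a minimal sketch; the (destination, sources) tuple layout and the function name are hypothetical.

```python
def classify(first, second):
    """Return the dependence kinds from `first` to a later instruction `second`.

    Each instruction is (destination register or None, list of source registers).
    """
    kinds = set()
    dest1, srcs1 = first
    dest2, srcs2 = second
    if dest1 is not None and dest1 in srcs2:
        kinds.add("RAW")   # true dependence: second reads what first writes
    if dest2 is not None and dest2 in srcs1:
        kinds.add("WAR")   # antidependence: second overwrites what first reads
    if dest1 is not None and dest1 == dest2:
        kinds.add("WAW")   # output dependence: both write the same register
    return kinds


# add $t1, $t0, $s1 followed by and $t0, $s3, $s4
print(classify(("$t1", ["$t0", "$s1"]), ("$t0", ["$s3", "$s4"])))
# add $t0, $t1, $s1 followed by and $t0, $s3, $s4
print(classify(("$t0", ["$t1", "$s1"]), ("$t0", ["$s3", "$s4"])))
```

Only RAW reflects a real flow of data; WAR and WAW come from register reuse, which is why renaming can remove them.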

Loop Unrolling and scheduling

● Let's see how the compiler can increase the amount of instruction level parallelism by unrolling loops.
● Imagine the below code segment, in which x and s are double-precision floating point numbers:

for (i=999; i>=0; i=i-1)
    x[i]=x[i]+s;

● Assuming $s1 holds the address of the element in the array with the highest address, $s2 is the address of the last element in the array, and $f2 contains the value s, the MIPS code for the above loop would be:

Loop: l.d   $f0,0($s1)      #$f0=array element
      add.d $f4,$f0,$f2     #add s in $f2
      s.d   $f4,0($s1)      #store result
      addi  $s1,$s1,-8      #decrement pointer
      bne   $s1,$s2,Loop    #branch $s1 != $s2

Loop Unrolling and scheduling

● Given the below latencies for the dependent Floating-Point (FP) and Integer operations in a MIPS processor:

[Table: latencies between dependent FP and integer operations]

● Without any reordering/scheduling the loop will execute as follows:

Loop: l.d   $f0,0($s1)
      <Stall>
      add.d $f4,$f0,$f2
      <Stall>
      <Stall>
      s.d   $f4,0($s1)
      addi  $s1,$s1,-8
      bne   $s1,$s2,Loop

IPC = 5 inst / 8 clk = 0.625

Loop Unrolling

● Replicate loop body to expose more parallelism
    ● Reduces loop-control overhead
● Use different registers per replication
    ● Called "register renaming"
    ● Avoid loop-carried "anti-dependencies"
● Increasing Instruction-Level Parallelism:
    ● Avoiding Data Hazards

Loop Unrolling and scheduling

● Imagine the below code segment, in which x and s are double-precision floating point numbers:

for (i=2; i>=0; i=i-1)
    x[i]=x[i]+s;

● Assuming $s1 holds the address of the element in the array with the highest address, $s2 is the address of the last element in the array, and $f2 contains the value s, the MIPS code for the above loop would be:

Loop: l.d   $f0,0($s1)
      add.d $f4,$f0,$f2
      s.d   $f4,0($s1)
      addi  $s1,$s1,-8
      bne   $s1,$s2,Loop

Unrolling:

l.d   $f0,0($s1)
add.d $f4,$f0,$f2
s.d   $f4,0($s1)
addi  $s1,$s1,-8
l.d   $f0,0($s1)
add.d $f4,$f0,$f2
s.d   $f4,0($s1)
addi  $s1,$s1,-8
l.d   $f0,0($s1)
add.d $f4,$f0,$f2
s.d   $f4,0($s1)
addi  $s1,$s1,-8

With the pointer updates merged into the offsets:

l.d   $f0,0($s1)
add.d $f4,$f0,$f2
s.d   $f4,0($s1)
l.d   $f0,-8($s1)
add.d $f4,$f0,$f2
s.d   $f4,-8($s1)
l.d   $f0,-16($s1)
add.d $f4,$f0,$f2
s.d   $f4,-16($s1)
addi  $s1,$s1,-24
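
The step from three copies of the body to a single pointer update is mechanical: copy k of the body uses offset k times the stride, and one addi at the end moves the pointer by the whole block. The sketch below generates that listing as text; the function name, the `{off}` placeholder convention and the default stride of -8 bytes (one double) are assumptions made for illustration.

```python
def unroll(body, factor, stride=-8):
    """Return the unrolled listing with per-copy offsets and one pointer update.

    `body` is a list of instruction templates; "{off}" marks the memory offset.
    """
    lines = [line.format(off=copy * stride)
             for copy in range(factor)
             for line in body]
    lines.append(f"addi $s1,$s1,{factor * stride}")
    return lines


BODY = ["l.d $f0,{off}($s1)", "add.d $f4,$f0,$f2", "s.d $f4,{off}($s1)"]

for line in unroll(BODY, 3):
    print(line)
```

This sketch reuses $f0 and $f4 in every copy, as the listing above does; renaming the registers per copy is a separate step.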

Loop Unrolling and scheduling

● What is the maximum IPC that can be achieved when unrolling the loop with a factor of 3?

Unrolled loop                   Register renaming to remove antidependence

Loop: l.d   $f0,0($s1)          Loop: l.d   $f0,0($s1)
      <Stall>                         <Stall>
      add.d $f4,$f0,$f2               add.d $f4,$f0,$f2
      <Stall>                         <Stall>
      <Stall>                         <Stall>
      s.d   $f4,0($s1)                s.d   $f4,0($s1)
      l.d   $f0,-8($s1)               l.d   $f6,-8($s1)
      <Stall>                         <Stall>
      add.d $f4,$f0,$f2               add.d $f8,$f6,$f2
      <Stall>                         <Stall>
      <Stall>                         <Stall>
      s.d   $f4,-8($s1)               s.d   $f8,-8($s1)
      l.d   $f0,-16($s1)              l.d   $f10,-16($s1)
      <Stall>                         <Stall>
      add.d $f4,$f0,$f2               add.d $f12,$f10,$f2
      <Stall>                         <Stall>
      <Stall>                         <Stall>
      s.d   $f4,-16($s1)              s.d   $f12,-16($s1)
      addi  $s1,$s1,-24               addi  $s1,$s1,-24
      bne   $s1,$s2,Loop              bne   $s1,$s2,Loop

Loop Unrolling and scheduling

● Now let's do reordering to increase IPC:

Reordering/Scheduling

Loop: l.d   $f0,0($s1)
      l.d   $f6,-8($s1)
      l.d   $f10,-16($s1)
      add.d $f4,$f0,$f2
      add.d $f8,$f6,$f2
      add.d $f12,$f10,$f2
      s.d   $f4,0($s1)
      s.d   $f8,-8($s1)
      s.d   $f12,-16($s1)
      addi  $s1,$s1,-24
      bne   $s1,$s2,Loop

IPC = 11 inst / 11 clk = 1
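
The cycle counts behind these IPC figures can be reproduced with a small in-order, single-issue model. The sketch assumes one stall cycle between l.d and a dependent add.d and two between add.d and a dependent s.d; these values are inferred from the <Stall> slots in the listings, not read from the latency table, and all names are hypothetical.

```python
# Required stall cycles between a producer and a dependent consumer (assumed).
LATENCY = {("l.d", "add.d"): 1, ("add.d", "s.d"): 2}


def cycles(program):
    """Total cycles for one pass; program is a list of (opcode, dest, sources)."""
    last_write = {}   # register -> (issue cycle of its writer, writer opcode)
    clock = 0
    for opcode, dest, sources in program:
        earliest = clock + 1
        for reg in sources:
            if reg in last_write:
                when, producer = last_write[reg]
                earliest = max(earliest,
                               when + 1 + LATENCY.get((producer, opcode), 0))
        clock = earliest
        if dest is not None:
            last_write[dest] = (clock, opcode)
    return clock


def iteration(f_in, f_out):
    return [("l.d", f_in, ["$s1"]),
            ("add.d", f_out, [f_in, "$f2"]),
            ("s.d", None, [f_out, "$s1"])]


TAIL = [("addi", "$s1", ["$s1"]), ("bne", None, ["$s1", "$s2"])]
parts = [iteration("$f0", "$f4"), iteration("$f6", "$f8"), iteration("$f10", "$f12")]

original = parts[0] + TAIL
unrolled = parts[0] + parts[1] + parts[2] + TAIL
scheduled = ([p[0] for p in parts] + [p[1] for p in parts]
             + [p[2] for p in parts] + TAIL)

for name, prog in [("original", original), ("unrolled", unrolled), ("scheduled", scheduled)]:
    print(name, len(prog), "inst /", cycles(prog), "clk")
```

Under these assumptions the original loop takes 8 cycles for 5 instructions and the scheduled loop 11 cycles for 11 instructions, matching the IPC values of 0.625 and 1. The unrolled but unscheduled listing comes out at 20 cycles, so unrolling alone removes loop overhead but none of the stalls.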

Control hazards

● Control Hazards arise from the need to make a decision based on the results of one instruction while others are executing.
● Branch Instruction: The pipeline cannot know what the next instruction should be!

Branches in the original Pipelined Datapath

[Figure: branches in the original pipelined datapath]

Branches

• Most of the work for a branch computation is done in the EX stage.
— The branch target address is computed.
— The source registers are compared by the ALU, and the Zero flag is set or cleared accordingly.
• Thus, the branch decision cannot be made until the end of the EX stage.
— But we need to know which instruction to fetch next, in order to keep the pipeline running!
— This leads to what's called a control hazard.

Clock cycle          1    2    3    4    5    6    7    8
beq $2, $3, Label    IM   Reg  ALU  DM   Reg
???                       IM

Stalling is one Solution

• Again, stalling is always one possible solution.

Clock cycle          1    2    3    4    5    6    7    8
beq $2, $3, Label    IM   Reg  ALU  DM   Reg
???                       IM   --   IM   Reg  ALU  DM   Reg

• Here we just stall until cycle 4, after we do make the branch decision.

Branch prediction

● Branch Prediction: A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds according to that assumption rather than waiting to ensure the actual outcome.
● Possible Static Approaches:
1. always predict that branches will be untaken.
2. always predict that branches at the bottom of loops will be taken, and branches at the top of loops will be untaken.
● Dynamic Branch Prediction: Keep history for each branch as taken or untaken, and then use the recent past behavior to predict the future.

Branch prediction

• Another approach is to guess whether or not the branch is taken.
— In terms of hardware, it's easier to assume the branch is not taken.
— This way we just increment the PC and continue execution, as for normal instructions.
• If we're correct, then there is no problem and the pipeline keeps going at full speed.

Clock cycle          1    2    3    4    5    6    7
beq $2, $3, Label    IM   Reg  ALU  DM   Reg
next instruction 1        IM   Reg  ALU  DM   Reg
next instruction 2             IM   Reg  ALU  DM   Reg

Branch misprediction

• If our guess is wrong, then we would have already started executing two instructions incorrectly. We'll have to discard, or flush, those instructions and begin executing the right ones from the branch target address, Label.

Clock cycle          1    2    3    4    5    6    7    8
beq $2, $3, Label    IM   Reg  ALU  DM   Reg
next instruction 1        IM   Reg  flush
next instruction 2             IM   flush
Label: ...                          IM   Reg  ALU  DM   Reg

Implementing branches

• We can actually decide the branch a little earlier, in ID instead of EX.
— Our sample instruction set has only a BEQ.
— We can add a small comparison circuit to the ID stage, after the source registers are read.
• Then we would only need to flush one instruction on a misprediction.

Clock cycle          1    2    3    4    5    6    7
beq $2, $3, Label    IM   Reg  ALU  DM   Reg
next instruction 1        IM   flush
Label: ...                     IM   Reg  ALU  DM   Reg

Implementing flushes

• We must flush one instruction (in its IF stage) if the previous instruction is BEQ and its two source registers are equal.
• We can flush an instruction from the IF stage by replacing it in the IF/ID pipeline register with a harmless nop instruction.
— MIPS uses sll $0, $0, 0 as the nop instruction.
— This happens to have a binary encoding of all 0s: 0000 .... 0000.
• Flushing introduces a bubble into the pipeline, which represents the one-cycle delay in taking the branch.

Branch prediction

Example: Show what happens when the branch is taken in the below instruction sequence, assuming that the branch is predicted as "not taken".

[Figure: instruction sequence for the example]
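
The all-zeros claim about the nop under Implementing flushes can be checked by packing the MIPS R-type fields by hand. This is a minimal sketch; the function name is hypothetical, while the field widths (6-5-5-5-5-6) and the opcode and funct values of 0 for sll follow the standard MIPS32 encoding.

```python
def encode_r_type(rs, rt, rd, shamt, funct, opcode=0):
    """Pack MIPS R-type fields: opcode(6) rs(5) rt(5) rd(5) shamt(5) funct(6)."""
    return ((opcode << 26) | (rs << 21) | (rt << 16)
            | (rd << 11) | (shamt << 6) | funct)


# sll $0, $0, 0: rd = 0, rt = 0, shamt = 0, and sll has funct = 0.
nop = encode_r_type(rs=0, rt=0, rd=0, shamt=0, funct=0)
print(f"{nop:032b}")
```

Because every field is zero, clearing the instruction field of the IF/ID register is enough to turn the fetched instruction into a nop.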

Branch prediction

Timing

• If no prediction:

IF   ID   EX   MEM  WB
     IF   IF   ID   EX   MEM  WB      --- lost 1 cycle

• If prediction:
— If Correct

IF   ID   EX   MEM  WB
     IF   ID   EX   MEM  WB           -- no cycle lost

— If Misprediction:

IF   ID   EX   MEM  WB
     IF0  IF1  ID   EX   MEM  WB      --- 1 cycle lost

Dynamic Branch prediction

2-bit Dynamic Branch Prediction Scheme:

[Figure: state diagram with four states labeled "00", "01", "11" and "10"]

Given: Consider a loop that branches seven times in a row and then is not taken once. What is the prediction accuracy using the above 2-bit prediction scheme with the initial state of "00"?

Prediction correct?   True  True  True  True  True  True  True  False

7/8 = 87.5%

Dynamic Branch prediction

2-bit Dynamic Branch Prediction Scheme:

[Figure: state diagram with four states labeled "00", "01", "11" and "10"]

Given: Consider a loop that branches seven times in a row and then is not taken once. What is the prediction accuracy using the above 2-bit prediction scheme with the initial state of "10"?

Prediction correct?   False  True  True  True  True  True  True  False

6/8 = 75%
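
Both accuracies can be reproduced with a 2-bit saturating counter, sketched below. This is an assumption-laden illustration rather than the slide's exact state machine: the counter uses 0 and 1 for "predict not taken" and 2 and 3 for "predict taken", and the mapping of the slide's "00" to counter value 3 (strongly taken) and "10" to counter value 1 (weakly not taken) is inferred only from the stated results.

```python
def accuracy(initial_state, outcomes):
    """Fraction of correct predictions for a 2-bit saturating counter.

    States 0 and 1 predict not taken; states 2 and 3 predict taken.
    """
    state, correct = initial_state, 0
    for taken in outcomes:
        predicted_taken = state >= 2
        correct += predicted_taken == taken
        # Move one step toward the observed outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)


LOOP = [True] * 7 + [False]   # taken seven times, then not taken once

print(accuracy(3, LOOP))   # assumed to correspond to initial state "00"
print(accuracy(1, LOOP))   # assumed to correspond to initial state "10"
```

Starting from the strongly-taken state only the final loop exit is mispredicted, while starting from a not-taken state costs one extra misprediction at the first branch.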
