4.5 An Overview of Pipelining
Pipelining Analogy

Pipelined laundry: overlapping execution

Parallelism improves performance

Four loads:
Speedup = 8/3.5 = 2.3

Non-stop:
Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages

32 CS/ECE 3330 – Fall 2009
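
The two speedup figures above can be reproduced with a short calculation. This is a minimal sketch that assumes four laundry stages of 0.5 hours each; the stage count and duration are inferred from the formula 2n/(0.5n + 1.5) rather than stated on the slide, and the function names are illustrative.

```python
STAGES = 4          # assumed: e.g. wash, dry, fold, put away
STAGE_HOURS = 0.5   # assumed duration of each stage


def sequential_hours(loads: int) -> float:
    """Each load runs through every stage before the next load starts."""
    return loads * STAGES * STAGE_HOURS


def pipelined_hours(loads: int) -> float:
    """The first load fills the pipeline; each later load finishes one stage-time later."""
    return STAGES * STAGE_HOURS + (loads - 1) * STAGE_HOURS


def speedup(loads: int) -> float:
    return sequential_hours(loads) / pipelined_hours(loads)


if __name__ == "__main__":
    print(f"4 loads:    {speedup(4):.2f}")
    print(f"1000 loads: {speedup(1000):.2f}")  # approaches the number of stages
```

With few loads the fill time of the pipeline dominates, which is why four loads give a speedup well below the stage count.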
MIPS Pipeline

Five stages, one step per stage

IF: Instruction fetch from memory
ID: Instruction decode & register read
EX: Execute operation or calculate address
MEM: Access memory operand
WB: Write result back to register

Instr 1  IF  ID  EX  MEM WB
Instr 2      IF  ID  EX  MEM WB
Instr 3          IF  ID  EX  MEM WB
Time →

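The overlap of the five stages can be sketched as a small text diagram generator. This is an illustrative sketch only: it assumes an ideal pipeline with no stalls, and the helper names `pipeline_diagram` and `total_cycles` are made up for this example.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def pipeline_diagram(num_instructions: int) -> list[str]:
    """Return one text row per instruction, shifted right by one cycle each."""
    rows = []
    for i in range(num_instructions):
        cells = ["   "] * i + [f"{stage:<3}" for stage in STAGES]
        rows.append(f"Instr {i + 1}  " + " ".join(cells).rstrip())
    return rows


def total_cycles(num_instructions: int) -> int:
    """Cycles until the last instruction leaves WB, assuming no stalls."""
    return len(STAGES) + num_instructions - 1


if __name__ == "__main__":
    print("\n".join(pipeline_diagram(3)))
    print("cycles:", total_cycles(3))
```

Each additional instruction adds only one cycle to the total, which is the throughput gain the diagram illustrates.
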
Pipeline Performance

Assume time for stages is
100ps for register read or write
200ps for other stages
Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps

Pipeline Performance

Single-cycle (Tc = 800ps)

Pipelined (Tc = 200ps)

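The two clock periods follow from the stage times given earlier: the single-cycle clock must fit the slowest instruction, while the pipelined clock must fit the slowest stage. The sketch below works this out for a run of `lw` instructions; the dictionary keys and function names are illustrative, and the pipelined case assumes no stalls.

```python
# Stage latencies in picoseconds, as assumed on the slide.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_time(n: int) -> int:
    """Clock period covers all stages of the slowest instruction (lw)."""
    tc = sum(STAGE_PS.values())            # 800 ps
    return n * tc


def pipelined_time(n: int) -> int:
    """Clock period covers the slowest stage; n instructions take stages + n - 1 cycles."""
    tc = max(STAGE_PS.values())            # 200 ps
    return (len(STAGE_PS) + n - 1) * tc


if __name__ == "__main__":
    for n in (3, 1_000_000):
        ratio = single_cycle_time(n) / pipelined_time(n)
        print(n, single_cycle_time(n), pipelined_time(n), round(ratio, 2))
```

For long instruction streams the ratio tends to 800/200 = 4 rather than 5, because the stages are not balanced.
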
Pipeline Speedup

If all stages are balanced
i.e., all take the same time
Time between instructions_pipelined
  = Time between instructions_nonpipelined / Number of stages
If not balanced, speedup is lower
Speedup due to increased throughput
Latency (time for each instruction) does not decrease

Pipelining and ISA Design

MIPS ISA designed for pipelining
All instructions are 32 bits
– Easier to fetch and decode in one cycle
– c.f. x86: 1- to 17-byte instructions
Few and regular instruction formats
– Can decode and read registers in one step
Load/store addressing
– Can calculate address in 3rd stage, access memory in 4th stage
Alignment of memory operands
– Memory access takes only one cycle

Hazards

Situations that prevent starting the next instruction in the next cycle
Structure hazards
A required resource is busy
Data hazard
Need to wait for previous instruction to complete its data read/write
Control hazard
Deciding on control action depends on previous instruction

Structure Hazards

Conflict for use of a resource
In MIPS pipeline with a single memory
Load/store requires data access
Instruction fetch would have to stall for that cycle
– Would cause a pipeline “bubble”
Hence, pipelined datapaths require separate instruction/data memories
Or separate instruction/data caches

Data Hazards

An instruction depends on completion of data access by a previous instruction
add $s0, $t0, $t1
sub $t2, $s0, $t3

Forwarding (aka Bypassing)

Use result when it is computed
Don’t wait for it to be stored in a register
Requires extra connections in the datapath

Load-Use Data Hazard

Can’t always avoid stalls by forwarding
If value not computed when needed
Can’t forward backward in time!

Code Scheduling to Avoid Stalls

Reorder code to avoid use of load result in the next instruction
C code for A = B + E; C = B + F;

      lw  $t1, 0($t0)            lw  $t1, 0($t0)
      lw  $t2, 4($t0)            lw  $t2, 4($t0)
stall add $t3, $t1, $t2          lw  $t4, 8($t0)
      sw  $t3, 12($t0)           add $t3, $t1, $t2
      lw  $t4, 8($t0)            sw  $t3, 12($t0)
stall add $t5, $t1, $t4          add $t5, $t1, $t4
      sw  $t5, 16($t0)           sw  $t5, 16($t0)
      13 cycles                  11 cycles

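The 13-cycle and 11-cycle totals can be checked with a small counting model. This is a simplified sketch, not a full pipeline simulator: it assumes a five-stage pipeline with forwarding, charges one stall whenever an instruction reads a register loaded by the immediately preceding `lw`, and ignores every other hazard. The parser handles only the instruction forms used in this example.

```python
import re


def parse(line: str):
    """Return (opcode, destination register or None, source registers)."""
    op, rest = line.split(None, 1)
    regs = re.findall(r"\$\w+", rest)
    if op == "sw":
        return op, None, regs          # sw only reads registers
    return op, regs[0], regs[1:]       # first register is the destination


def count_cycles(program, stages: int = 5) -> int:
    """Cycles = instructions + pipeline fill + load-use stalls."""
    stalls = 0
    prev_op, prev_dest = None, None
    for line in program:
        op, dest, sources = parse(line)
        if prev_op == "lw" and prev_dest in sources:
            stalls += 1                # load result not ready for the next instruction
        prev_op, prev_dest = op, dest
    return len(program) + (stages - 1) + stalls


ORIGINAL = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "add $t3, $t1, $t2", "sw $t3, 12($t0)",
    "lw $t4, 8($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]
REORDERED = [
    "lw $t1, 0($t0)", "lw $t2, 4($t0)", "lw $t4, 8($t0)", "add $t3, $t1, $t2",
    "sw $t3, 12($t0)", "add $t5, $t1, $t4", "sw $t5, 16($t0)",
]

if __name__ == "__main__":
    print(count_cycles(ORIGINAL), count_cycles(REORDERED))
```

Moving the third `lw` up separates each load from its first use, so both stalls disappear without changing the result.
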
Control Hazards

Branch determines flow of control
Fetching next instruction depends on branch outcome
Pipeline can’t always fetch correct instruction
– Still working on ID stage of branch
In MIPS pipeline
Need to compare registers and compute target early in the pipeline
Add hardware to do it in ID stage

Stall on Branch

Wait until branch outcome determined before fetching next instruction

Branch Prediction

Longer pipelines can’t readily determine branch outcome early
Stall penalty becomes unacceptable
Predict outcome of branch
Only stall if prediction is wrong
In MIPS pipeline
Can predict branches not taken
Fetch instruction after branch, with no delay

MIPS with Predict Not Taken

Prediction correct

Prediction incorrect

More-Realistic Branch Prediction

Static branch prediction
Based on typical branch behavior
Example: loop and if-statement branches
– Predict backward branches taken
– Predict forward branches not taken
Dynamic branch prediction
Hardware measures actual branch behavior
– e.g., record recent history of each branch
Assume future behavior will continue the trend
– When wrong, stall while re-fetching, and update history
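
The static rule above (backward taken, forward not taken) reduces to a comparison of two addresses. The sketch below is illustrative: the function names and the example addresses are assumptions, and the accuracy helper simply replays a list of actual outcomes against the fixed guess.

```python
def predict_taken(branch_pc: int, target_pc: int) -> bool:
    """Static heuristic: backward branches (typically loops) are predicted taken."""
    return target_pc < branch_pc


def accuracy(branch_pc: int, target_pc: int, outcomes: list[bool]) -> float:
    """Fraction of actual outcomes that match the fixed static guess."""
    guess = predict_taken(branch_pc, target_pc)
    return sum(outcome == guess for outcome in outcomes) / len(outcomes)


if __name__ == "__main__":
    # A loop-closing branch that is taken 9 times and falls through once.
    loop_outcomes = [True] * 9 + [False]
    print(accuracy(0x40, 0x20, loop_outcomes))
```

A static guess never adapts, so a branch that behaves against the heuristic is mispredicted every time; that is the gap dynamic prediction is meant to close.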

Pipeline Summary

The BIG Picture

Pipelining improves performance by increasing instruction throughput
Executes multiple instructions in parallel
Each instruction has the same latency
Subject to hazards
Structure, data, control
Instruction set design affects complexity of pipeline implementation

Data Hazards

Think about which pipeline stage generates or uses a value

add $s0, $t0, $t1    IF ID EX MEM WB
sub $t2, $s0, $t3    IF ID EX MEM WB
lw  $s0, 20($t1)     IF ID EX MEM WB
sw  $t3, 12($t0)     IF ID EX MEM WB

Requested Detour: Branch Delay Slots

Execute the instruction following the branch unconditionally
MIPS uses them!
Behind the scenes
Compiler places a “control independent” instruction after the branch
Yuck!

slt $t0, $t1, $t2      Always executes
beq END_OF_LOOP
add $t0, $t1, $t2      Control dependent
add $t0, $t0, $t1
END_OF_LOOP:

MIPS Pipelined Datapath

Right-to-left flow (MEM, WB) leads to hazards

Pipeline registers

Need registers between stages

To hold information produced in previous cycle

Pipeline Operation

Cycle-by-cycle flow of instructions through the pipelined datapath
“Single-clock-cycle” pipeline diagram
– Shows pipeline usage in a single cycle
– Highlight resources used
c.f. “multi-clock-cycle” diagram
– Graph of operation over time
We’ll look at “single-clock-cycle” diagrams for load & store

IF for Load, Store, …

ID for Load, Store, …

EX for Load

MEM for Load

WB for Load
Wrong register number

Corrected Datapath for Load

EX for Store

MEM for Store

WB for Store

Multi-Cycle Pipeline Diagram

Form showing resource usage

Multi-Cycle Pipeline Diagram

Traditional form

Single-Cycle Pipeline Diagram

State of pipeline in a given cycle

Pipelining Demo

http://bellerofonte.dii.unisi.it/WEBMIPS/

4.7 Data Hazards: Forwarding vs. Stalling
Data Hazards in ALU Instructions
Consider this sequence:
sub $2, $1, $3
and $12, $2, $5
or  $13, $6, $2
add $14, $2, $2
sw  $15, 100($2)
We can resolve hazards with forwarding
How do we detect when to forward?


Dependencies & Forwarding

Detecting the Need to Forward

Pass register numbers along pipeline

e.g., ID/EX.RegisterRs = register number for Rs
sitting in ID/EX pipeline register
ALU operand register numbers in EX stage are
given by
ID/EX.RegisterRs, ID/EX.RegisterRt


Detecting the Need to Forward

Data hazards when
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs    (Fwd from EX/MEM pipeline reg)
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt    (Fwd from EX/MEM pipeline reg)
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs    (Fwd from MEM/WB pipeline reg)
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt    (Fwd from MEM/WB pipeline reg)

Detecting the Need to Forward

But only if forwarding instruction will write to

a register!
EX/MEM.RegWrite, MEM/WB.RegWrite
And only if Rd for that instruction is not $zero
EX/MEM.RegisterRd ≠ 0,
MEM/WB.RegisterRd ≠ 0


Forwarding Paths

a. Without forwarding

Forwarding Paths


Forwarding Conditions

EX hazard
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10

Forwarding Conditions

MEM hazard
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01


Double Data Hazard

Consider the sequence:
add $1,$1,$2
add $1,$1,$3
add $1,$1,$4
Both hazards occur
Want to use the most recent
Revise MEM hazard condition
Only fwd if EX hazard condition isn’t true

Double Data Hazard

Initially:
$1 = 1
$2 = 2
$3 = 3
$4 = 4


Revised Forwarding Condition

MEM hazard
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
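
The EX-hazard and revised MEM-hazard conditions can be folded into one priority check per ALU operand. The sketch below is an illustration of that logic in software, not the hardware itself: `PipeReg` and `forward_select` are names invented for this example, and the value 00 (no forwarding, operand comes from the register file) is the conventional default rather than something stated on these slides.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """The two fields of EX/MEM or MEM/WB that the forwarding unit inspects."""
    reg_write: bool
    rd: int


def forward_select(src: int, ex_mem: PipeReg, mem_wb: PipeReg) -> int:
    """Return the mux select for one ALU operand whose register number is src."""
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src:
        return 0b10                      # EX hazard: take the most recent result
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src:
        return 0b01                      # MEM hazard, only when no EX hazard applies
    return 0b00                          # assumed default: read the register file


if __name__ == "__main__":
    # Third add of "add $1,$1,$2; add $1,$1,$3; add $1,$1,$4": both stages hold $1.
    forward_a = forward_select(1, PipeReg(True, 1), PipeReg(True, 1))
    print(format(forward_a, "02b"))
```

Checking the EX hazard first is what implements the "only forward from MEM/WB if the EX hazard condition isn't true" rule. ForwardA would use ID/EX.RegisterRs as `src` and ForwardB would use ID/EX.RegisterRt.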

Datapath with Forwarding


Load-Use Data Hazard

Need to stall
for one cycle

Load-Use Hazard Detection

Check when using instruction is decoded in ID

stage
ALU operand register numbers in ID stage are
given by
IF/ID.RegisterRs, IF/ID.RegisterRt


Load-Use Hazard Detection

Load-use hazard when

ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt))
If detected, stall and insert bubble

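The detection condition above translates directly into a boolean function. This is a minimal sketch; the parameter names mirror the pipeline register fields, and the register numbers in the example are chosen only for illustration.

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rt: int,
                    if_id_rs: int, if_id_rt: int) -> bool:
    """True when the instruction in ID reads the register a load in EX will write."""
    return id_ex_mem_read and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)


if __name__ == "__main__":
    # lw $2, 20($1) in EX, followed by and $4, $2, $5 in ID: stall needed.
    print(load_use_hazard(True, 2, 2, 5))
    # Same registers, but the instruction in EX is not a load: no stall.
    print(load_use_hazard(False, 2, 2, 5))
```
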
How to Stall the Pipeline
Force control values in ID/EX register
to 0
EX, MEM and WB do nop (no-operation)
Prevent update of PC and IF/ID register
Using instruction is decoded again
Following instruction is fetched again
1-cycle stall allows MEM to read data for lw
– Can subsequently forward to EX stage


Stall/Bubble in the Pipeline

Stall inserted
here

Stall/Bubble in the Pipeline

Or, more
accurately…


Datapath with Hazard Detection

The Big Picture
Stalls reduce performance
But are required to get correct results
Compiler can arrange code to avoid hazards and stalls
Requires knowledge of the pipeline structure

4.8 Control Hazards

Branch Hazards

If branch outcome determined in MEM

Flush these instructions
(Set control values to 0)

Reducing Branch Delay

Move hardware to determine outcome to ID stage
Target address adder
Register comparator
Example: branch taken
36: sub $10, $4, $8
40: beq $1, $3, 7
44: and $12, $2, $5
48: or $13, $2, $6
52: add $14, $4, $2
56: slt $15, $6, $7
...
72: lw $4, 50($7)


Example: Branch Taken

Example: Branch Taken
Really, sll $0, $0, 0


Data Hazards for Branches

If a comparison register is a destination of 2nd or 3rd preceding ALU instruction

add $1, $2, $3        IF  ID  EX  MEM WB
add $4, $5, $6            IF  ID  EX  MEM WB
…                             IF  ID  EX  MEM WB
beq $1, $4, target                IF  ID  EX  MEM WB

Can resolve using forwarding

Data Hazards for Branches

If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
Need 1 stall cycle

lw  $1, addr          IF  ID  EX  MEM WB
add $4, $5, $6            IF  ID  EX  MEM WB
beq stalled                   IF  ID
beq $1, $4, target                    ID  EX  MEM WB

Data Hazards for Branches

If a comparison register is a destination of immediately preceding load instruction
Need 2 stall cycles

lw  $1, addr          IF  ID  EX  MEM WB
beq stalled               IF  ID
beq stalled                       ID
beq $1, $0, target                    ID  EX  MEM WB

Dynamic Branch Prediction

In deeper and superscalar pipelines, branch

penalty is more significant
Use dynamic prediction
Branch prediction buffer (aka branch history table)
Indexed by recent branch instruction addresses
Stores outcome (taken/not taken)
To execute a branch
– Check table, expect the same outcome
– Start fetching from fall-through or target
– If wrong, flush pipeline and flip prediction


1-Bit Predictor: Shortcoming

Inner loop branches mispredicted twice!

outer: …
…
inner: …
…
beq …, …, inner
…
beq …, …, outer

Mispredict as taken on last iteration of

inner loop
Then mispredict as not taken on first
iteration of inner loop next time around

2-Bit Predictor

Only change prediction on two successive

mispredictions
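
The difference between the 1-bit and 2-bit schemes shows up when an inner-loop branch is replayed several times. The sketch below is a hedged illustration: it models the 2-bit scheme as a saturating counter, which is one common way to realize "change prediction only after two successive mispredictions" and may differ in detail from the state diagram on the slide. The initial states and the outcome sequence are assumptions chosen for the example.

```python
def one_bit_misses(outcomes, state: bool = True) -> int:
    """1-bit predictor: always predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if state != taken:
            misses += 1
        state = taken
    return misses


def two_bit_misses(outcomes, counter: int = 3) -> int:
    """2-bit saturating counter: 0-1 predict not taken, 2-3 predict taken."""
    misses = 0
    for taken in outcomes:
        if (counter >= 2) != taken:
            misses += 1
        counter = min(counter + 1, 3) if taken else max(counter - 1, 0)
    return misses


# Inner loop of 4 iterations (taken, taken, taken, not taken), run 3 times.
INNER_LOOP = [True, True, True, False] * 3

if __name__ == "__main__":
    print(one_bit_misses(INNER_LOOP), two_bit_misses(INNER_LOOP))
```

After the first pass, the 1-bit predictor misses twice per pass of the inner loop (the exit and the re-entry), while the counter misses only on the exit.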


Calculating the Branch Target

Even with predictor, still need to calculate the

target address
1-cycle penalty for a taken branch
Branch target buffer
Cache of target addresses
Indexed by PC when instruction fetched
– If hit and instruction is branch predicted taken, can fetch
target immediately

Last Time

Data Hazards
Detection
Classification
Handling
Control Hazards and Branch Prediction


4.9 Exceptions
Exceptions and Interrupts

“Unexpected” events requiring change

in flow of control
Different ISAs use the terms differently
Exception
Arises within the CPU
– e.g., undefined opcode, overflow, syscall, …

Interrupt
From an external I/O controller
Dealing with them without sacrificing performance is hard

Sample Exceptions

I/O request
Invoke the operating system from user
program
Arithmetic overflow
Undefined instruction
Hardware malfunction


Handling Exceptions

Save PC of offending (or interrupted)

instruction
In MIPS: Exception Program Counter (EPC)
Save indication of the problem
In MIPS: Cause register
We’ll assume 1-bit
– 0 for undefined opcode, 1 for overflow
Jump to handler at 8000 0180

An Alternate Mechanism

Vectored Interrupts
Handler address determined by the cause
Example:
Undefined opcode: C000 0000
Overflow: C000 0020
…: C000 0040
Instructions either
Deal with the interrupt, or
Jump to real handler
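
With vectored interrupts the handler address is a function of the cause. The sketch below assumes a fixed spacing of 0x20 bytes between entries, which is inferred from the example addresses above and is not stated as a rule on the slide; the cause names and the table order are likewise illustrative.

```python
VECTOR_BASE = 0xC000_0000
VECTOR_STRIDE = 0x20   # assumed spacing: room for eight 4-byte instructions per entry

# Order follows the example above; the names are hypothetical labels.
CAUSES = ["undefined_opcode", "overflow"]


def handler_address(cause: str) -> int:
    """Address the hardware would jump to for the given cause."""
    return VECTOR_BASE + CAUSES.index(cause) * VECTOR_STRIDE


if __name__ == "__main__":
    for cause in CAUSES:
        print(cause, f"{handler_address(cause):08X}")
```

Because each entry has room for only a few instructions, it either handles the event in place or jumps to the real handler, as the slide notes.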

Handler Actions

Read cause, and transfer to relevant handler

Determine action required
If restartable
Take corrective action
Use EPC to return to program
Otherwise
Terminate program
Report error using EPC, cause, …

Exceptions in a Pipeline

Another form of control hazard

Consider overflow on add in EX stage
add $1, $2, $1
Prevent $1 from being clobbered
Complete previous instructions
Flush add and subsequent instructions
Set Cause and EPC register values
Transfer control to handler
Similar to mispredicted branch
Use much of the same hardware


Pipeline with Exceptions

Exception Properties

Restartable exceptions
Pipeline can flush the instruction
Handler executes, then returns to the instruction
– Refetched and executed from scratch

PC saved in EPC register

Identifies causing instruction
Actually PC + 4 is saved
– Handler must adjust


Exception Example

Exception on add in
40 sub $11, $2, $4
44 and $12, $2, $5
48 or  $13, $2, $6
4C add $1, $2, $1
50 slt $15, $6, $7
54 lw  $16, 50($7)
…
Handler
80000180 sw $25, 1000($0)
80000184 sw $26, 1004($0)
…

Exception Example


Exception Example

Multiple Exceptions

Pipelining overlaps multiple instructions

Could have multiple exceptions at once
Simple approach: deal with exception from
earliest instruction
Flush subsequent instructions
“Precise” exceptions
In complex pipelines
Multiple instructions issued per cycle
Out-of-order completion
Maintaining precise exceptions is difficult!

Imprecise Exceptions

Just stop pipeline and save state

Including exception cause(s)
Let the handler work out
Which instruction(s) had exceptions
Which to complete or flush
– May require “manual” completion

Simplifies hardware, but more complex

handler software
Not feasible for complex multiple-issue out-of-order pipelines

4.10 Parallelism and Advanced ILP
Instruction-Level Parallelism (ILP)

Pipelining: executing multiple instructions in parallel
To increase ILP
Deeper pipeline
– Less work per stage ⇒ shorter clock cycle
Multiple issue
– Replicate pipeline stages ⇒ multiple pipelines
– Start multiple instructions per clock cycle
– CPI < 1, so use Instructions Per Cycle (IPC)
– E.g., 4GHz 4-way multiple-issue
• 16 BIPS, peak CPI = 0.25, peak IPC = 4
– But dependencies reduce this in practice
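
The peak figures in the 4GHz example follow from two multiplications. The sketch below is a small arithmetic check; the function names are illustrative, and "peak" here means every issue slot is filled on every cycle, which real code does not achieve.

```python
def peak_instructions_per_second(clock_hz: float, issue_width: int) -> float:
    """Upper bound on throughput: every issue slot is filled every cycle."""
    return clock_hz * issue_width


def peak_cpi(issue_width: int) -> float:
    """Cycles per instruction is the reciprocal of instructions per cycle."""
    return 1 / issue_width


if __name__ == "__main__":
    bips = peak_instructions_per_second(4e9, 4) / 1e9
    print(f"{bips:.0f} BIPS, peak CPI = {peak_cpi(4)}, peak IPC = 4")
```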

Multiple Issue

Static multiple issue

Compiler groups instructions to be issued together
Packages them into “issue slots”
Compiler detects and avoids hazards
Dynamic multiple issue
CPU examines instruction stream and chooses
instructions to issue each cycle
Compiler can help by reordering instructions
CPU resolves hazards using advanced techniques
at runtime

Speculation

“Guess” what to do with an instruction

Start operation as soon as possible
Check whether guess was right
– If so, complete the operation
– If not, roll-back and do the right thing
Common to static and dynamic multiple issue
Examples
Speculate on branch outcome
– Roll back if path taken is different
Speculate on load
– Roll back if location is updated

Compiler/Hardware Speculation

Compiler can reorder instructions

e.g., move load before branch
Can include “fix-up” instructions to recover from incorrect guess
Hardware can look ahead for instructions to execute
Buffer results until it determines they are actually needed
Flush buffers on incorrect speculation

Speculation and Exceptions

What if exception occurs on a speculatively

executed instruction?
e.g., speculative load before null-pointer check
Static speculation
Can add ISA support for deferring exceptions
Dynamic speculation
Can buffer exceptions until instruction completion
(which may not occur)


Static Multiple Issue

Compiler groups instructions into “issue

packets”
Group of instructions that can be issued on a
single cycle
Determined by pipeline resources required
Think of an issue packet as a very long
instruction
Specifies multiple concurrent operations
⇒ Very Long Instruction Word (VLIW)

Scheduling Static Multiple Issue

Compiler must remove some/all hazards

Reorder instructions into issue packets
No dependencies within a packet
Possibly some dependencies between packets
– Varies between ISAs; compiler must know!
Pad with nop if necessary

MIPS with Static Dual Issue

Two-issue packets
One ALU/branch instruction
One load/store instruction
64-bit aligned
– ALU/branch, then load/store
– Pad an unused instruction with nop

Address  Instruction type  Pipeline Stages
n        ALU/branch        IF  ID  EX  MEM WB
n + 4    Load/store        IF  ID  EX  MEM WB
n + 8    ALU/branch            IF  ID  EX  MEM WB
n + 12   Load/store            IF  ID  EX  MEM WB
n + 16   ALU/branch                IF  ID  EX  MEM WB
n + 20   Load/store                IF  ID  EX  MEM WB

MIPS with Static Dual Issue


Hazards in the Dual-Issue MIPS

More instructions executing in parallel

EX data hazard
Forwarding avoided stalls with single-issue
Now can’t use ALU result in load/store in same
packet
– add $t0, $s0, $s1
  lw  $s2, 0($t0)
– Split into two packets, effectively a stall
Load-use hazard
Still one cycle use latency, but now two
instructions
More aggressive scheduling required

Scheduling Example

Schedule this for dual-issue MIPS

Loop: lw   $t0, 0($s1)       # $t0=array element
      addu $t0, $t0, $s2     # add scalar in $s2
      sw   $t0, 0($s1)       # store result
      addi $s1, $s1, -4      # decrement pointer
      bne  $s1, $zero, Loop  # branch $s1!=0

       ALU/branch             Load/store        cycle
Loop:  nop                    lw $t0, 0($s1)    1
       addi $s1, $s1, -4      nop               2
       addu $t0, $t0, $s2     nop               3
       bne  $s1, $zero, Loop  sw $t0, 4($s1)    4

IPC = 5/4 = 1.25 (c.f. peak IPC = 2)

Loop Unrolling

Replicate loop body to expose more

parallelism
Reduces loop-control overhead
Use different registers per replication
Called “register renaming”
Avoid loop-carried “anti-dependencies”
– Store followed by a load of the same register
– Aka “name dependence”
• Reuse of a register name

Loop Unrolling Example

       ALU/branch             Load/store         cycle
Loop:  addi $s1, $s1, -16     lw $t0, 0($s1)     1
       nop                    lw $t1, 12($s1)    2
       addu $t0, $t0, $s2     lw $t2, 8($s1)     3
       addu $t1, $t1, $s2     lw $t3, 4($s1)     4
       addu $t2, $t2, $s2     sw $t0, 16($s1)    5
       addu $t3, $t3, $s2     sw $t1, 12($s1)    6
       nop                    sw $t2, 8($s1)     7
       bne  $s1, $zero, Loop  sw $t3, 4($s1)     8

IPC = 14/8 = 1.75

Closer to 2, but at cost of registers and code size
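
The IPC figures for the two dual-issue schedules come from counting the slots that hold real work. The sketch below encodes each schedule as (ALU/branch, load/store) mnemonic pairs taken from the tables; the variable and function names are illustrative.

```python
# One tuple per cycle: (ALU/branch slot, load/store slot).
SINGLE_ITERATION = [
    ("nop", "lw"), ("addi", "nop"), ("addu", "nop"), ("bne", "sw"),
]
UNROLLED_BY_4 = [
    ("addi", "lw"), ("nop", "lw"), ("addu", "lw"), ("addu", "lw"),
    ("addu", "sw"), ("addu", "sw"), ("nop", "sw"), ("bne", "sw"),
]


def ipc(schedule) -> float:
    """Useful (non-nop) instructions issued per cycle."""
    useful = sum(op != "nop" for packet in schedule for op in packet)
    return useful / len(schedule)


if __name__ == "__main__":
    print(ipc(SINGLE_ITERATION), ipc(UNROLLED_BY_4))
```

Unrolling leaves only two empty slots in eight cycles, compared with three in four cycles for the original loop.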

Dynamic Multiple Issue

“Superscalar” processors
CPU decides whether to issue 0, 1, 2, … each
cycle
Avoiding structural and data hazards
Avoids the need for compiler scheduling
Though it may still help
Code semantics ensured by the CPU

Dynamic Pipeline Scheduling

Allow the CPU to execute instructions out of

order to avoid stalls
But commit result to registers in order
Example
lw $t0, 20($s2)
addu $t1, $t0, $t2
sub $s4, $s4, $t3
slti $t5, $s4, 20
Can start sub while addu is waiting for lw


Dynamically Scheduled CPU

Preserves dependencies

Hold pending operands

Results also sent to any waiting reservation stations

Reorder buffer for register writes

Can supply operands for issued instructions

Register Renaming

Reservation stations and reorder buffer

effectively provide register renaming
On instruction issue to reservation station
If operand is available in register file or reorder
buffer
– Copied to reservation station
– No longer required in the register; can be overwritten
If operand is not yet available
– It will be provided to the reservation station by a function
unit
– Register update may not be required


Speculation

Predict branch and continue issuing

Don’t commit until branch outcome determined
Load speculation
Avoid load and cache miss delay
– Predict the effective address
– Predict loaded value
– Load before completing outstanding stores
– Bypass stored values to load unit
Don’t commit load until speculation cleared

Why Do Dynamic Scheduling?

Why not just let the compiler schedule code?

Not all stalls are predictable
e.g., cache misses
Can’t always schedule around branches
Branch outcome is dynamically determined
Different implementations of an ISA have
different latencies and hazards


Does Multiple Issue Work?

The BIG Picture

Yes, but not as much as we’d like
Programs have real dependencies that limit ILP
Some dependencies are hard to eliminate
e.g., pointer aliasing
Some parallelism is hard to expose
Limited window size during instruction issue
Memory delays and limited bandwidth
Hard to keep pipelines full
Speculation can help if done well

Power Efficiency

Complexity of dynamic scheduling and speculation requires power
Multiple simpler cores may be better

Microprocessor  Year  Clock Rate  Pipeline Stages  Issue width  Out-of-order/Speculation  Cores  Power
i486            1989  25MHz       5                1            No                        1      5W
Pentium         1993  66MHz       5                2            No                        1      10W
Pentium Pro     1997  200MHz      10               3            Yes                       1      29W
P4 Willamette   2001  2000MHz     22               3            Yes                       1      75W
P4 Prescott     2004  3600MHz     31               3            Yes                       1      103W
Core            2006  2930MHz     14               4            Yes                       2      75W
UltraSparc III  2003  1950MHz     14               4            No                        1      90W
UltraSparc T1   2005  1200MHz     6                1            No                        8      70W

100% (1)
Nibbana Gaminipatipada Volume 2
537 pages
Pipeline Processor Design
No ratings yet
Pipeline Processor Design
89 pages
Computer Architecture and Organization
No ratings yet
Computer Architecture and Organization
49 pages
Ee6502 Microprocessors and Microcontrollers
0% (1)
Ee6502 Microprocessors and Microcontrollers
97 pages
Pipe Lining
No ratings yet
Pipe Lining
66 pages
Lect8 Pipelined DP Control
No ratings yet
Lect8 Pipelined DP Control
59 pages
Computer Architecture
No ratings yet
Computer Architecture
10 pages
Pipelinehazard For Class
No ratings yet
Pipelinehazard For Class
61 pages
Gatelevel Modeling
No ratings yet
Gatelevel Modeling
13 pages
Chapter 4 The Processor
No ratings yet
Chapter 4 The Processor
72 pages
Nibbana Gaminipatipada Volume 4
100% (1)
Nibbana Gaminipatipada Volume 4
441 pages
Nibbana Gaminipatipada Volume 1
100% (1)
Nibbana Gaminipatipada Volume 1
636 pages
06 Pipeline PDF
No ratings yet
06 Pipeline PDF
17 pages
Embedded Systems Design: Pipelining and Instruction Scheduling
No ratings yet
Embedded Systems Design: Pipelining and Instruction Scheduling
48 pages
Pipelining
No ratings yet
Pipelining
44 pages
Argentinargumentatory
No ratings yet
Argentinargumentatory
33 pages
On Mindfulness of In&out Breaths - Ānāpānassatikathā - Pasanna Citta
No ratings yet
On Mindfulness of In&out Breaths - Ānāpānassatikathā - Pasanna Citta
35 pages
Indian Statistical Institute: Students' Brochure
No ratings yet
Indian Statistical Institute: Students' Brochure
68 pages
Advanced Linux Programming
No ratings yet
Advanced Linux Programming
31 pages
16.482 / 16.561 Computer Architecture and Design: Instructor: Dr. Michael Geiger Fall 2013
No ratings yet
16.482 / 16.561 Computer Architecture and Design: Instructor: Dr. Michael Geiger Fall 2013
42 pages
Ca07 2014 PDF
No ratings yet
Ca07 2014 PDF
56 pages
Data Hazards in ALU Instructions: Consider This Sequence
No ratings yet
Data Hazards in ALU Instructions: Consider This Sequence
14 pages
01 Chapter
No ratings yet
01 Chapter
41 pages
3 Pipeline
No ratings yet
3 Pipeline
21 pages
COA Unit 3
No ratings yet
COA Unit 3
89 pages
HCF4020
No ratings yet
HCF4020
11 pages
COE 305 Syllabus - SHH
No ratings yet
COE 305 Syllabus - SHH
5 pages
Nand
No ratings yet
Nand
7 pages
List of Vlsi Books
100% (2)
List of Vlsi Books
15 pages
Verilog Question 2
No ratings yet
Verilog Question 2
27 pages
Pipeline and Vector
No ratings yet
Pipeline and Vector
29 pages
Logic Gates Grade 12 Physics Lesson 1-4
No ratings yet
Logic Gates Grade 12 Physics Lesson 1-4
38 pages
Digital Electronics MCQs
No ratings yet
Digital Electronics MCQs
3 pages
Cse410 10 Pipelining A
No ratings yet
Cse410 10 Pipelining A
7 pages
HRY-312 Computer Organization Introduction To Pipelining
No ratings yet
HRY-312 Computer Organization Introduction To Pipelining
30 pages
A1060204202 - 22152 - 28 - 2018 - 17891 - Logic Families PDF
No ratings yet
A1060204202 - 22152 - 28 - 2018 - 17891 - Logic Families PDF
18 pages
6 Sem Presentation
No ratings yet
6 Sem Presentation
15 pages
Pipelining: CIT 595 Spring 2007
No ratings yet
Pipelining: CIT 595 Spring 2007
16 pages
Introduction to PHP, Part 2, Second Edition
From Everand
Introduction to PHP, Part 2, Second Edition
Adam Majczak
No ratings yet
Therīgāthā
No ratings yet
Therīgāthā
51 pages
Parallel Chapter3
No ratings yet
Parallel Chapter3
29 pages
1 Low Vol Tech
No ratings yet
1 Low Vol Tech
6 pages
Ladder Logic Symbols
No ratings yet
Ladder Logic Symbols
13 pages
LEARN MPLS FROM SCRATCH PART-B: A Beginners guide to next level of networking
From Everand
LEARN MPLS FROM SCRATCH PART-B: A Beginners guide to next level of networking
POONAM DEVI
No ratings yet
Maximum Mode 8086 System
No ratings yet
Maximum Mode 8086 System
9 pages
Dawn of The Dhamma
No ratings yet
Dawn of The Dhamma
65 pages
The Notion of Emptiness in Early Buddhism
100% (1)
The Notion of Emptiness in Early Buddhism
75 pages
Lec12 Pipeline
No ratings yet
Lec12 Pipeline
23 pages
The Thirty-One Planes of Existence
No ratings yet
The Thirty-One Planes of Existence
3 pages
Going Forwards Practising The Jhānas Rob Burbea January 8, 2020
No ratings yet
Going Forwards Practising The Jhānas Rob Burbea January 8, 2020
8 pages
Chapter 3: Jump, Loop and Call Instructions
100% (1)
Chapter 3: Jump, Loop and Call Instructions
8 pages
Empty Universe
No ratings yet
Empty Universe
108 pages
General Principles of Pipelining: Andrew Warfield CS313
No ratings yet
General Principles of Pipelining: Andrew Warfield CS313
25 pages
Acc Verilog
No ratings yet
Acc Verilog
10 pages
Question Bank
No ratings yet
Question Bank
13 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


 Pipelined laundry: overlapping execution

 Speedup = 2n / (0.5n + 1.5) ≈ 4 as n grows large
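
A minimal sketch of the laundry speedup arithmetic, assuming (as the formula implies) that a load takes 2 hours done sequentially and that the pipelined version finishes one load every 0.5 hours after the first. The function name `laundry_speedup` is a hypothetical chosen for illustration.

```python
def laundry_speedup(n: int) -> float:
    """Speedup of pipelined over sequential laundry for n loads."""
    sequential_hours = 2.0 * n        # assumed: each load takes 2 hours start to finish
    pipelined_hours = 0.5 * n + 1.5   # assumed: first load takes 2 h, then one load per 0.5 h
    return sequential_hours / pipelined_hours


if __name__ == "__main__":
    # Four loads give 8/3.5; larger n approaches the stage count of 4.
    for n in (4, 100, 10_000):
        print(n, round(laundry_speedup(n), 2))
```

The ratio only approaches 4 for long runs because the 1.5 hours spent filling the pipeline is amortized over more loads.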

Five stages, one step per stage

 IF: Instruction fetch from memory

 ID: Instruction decode & register read

 WB: Write result back to register

 Assume time for stages is

Instr | Instr fetch | Register read | ALU op | Memory access | Register write | Total time

Single-cycle (Tc= 800ps)

Pipelined (Tc= 200ps)
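
The two clock periods can be compared with a short calculation. This is a sketch under stated assumptions: the stage latencies are the 100ps/200ps figures given earlier, every instruction passes through all five stages like `lw`, and there are no hazards or stalls. The names `STAGE_PS`, `single_cycle_time`, and `pipelined_time` are hypothetical.

```python
# Stage latencies in picoseconds: 100ps for register read/write, 200ps otherwise.
STAGE_PS = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_time(n: int) -> int:
    """Total time for n instructions when the clock must cover all stages."""
    tc = sum(STAGE_PS.values())  # 800ps, the length of the slowest instruction (lw)
    return n * tc


def pipelined_time(n: int) -> int:
    """Total time for n instructions when the clock covers the slowest stage."""
    tc = max(STAGE_PS.values())  # 200ps
    # The first instruction needs one cycle per stage; each later one adds a cycle.
    return (len(STAGE_PS) + n - 1) * tc


if __name__ == "__main__":
    for n in (3, 1_000_000):
        print(n, single_cycle_time(n), pipelined_time(n))
```

For three instructions the ratio is only 2400/1400; it approaches 800/200 = 4 as the instruction count grows, and falls short of the stage count of 5 because the stages are not balanced.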

 If all stages are balanced

Pipelining and ISA Design

MIPS ISA designed for pipelining

 All instructions are 32 bits

– Easier to fetch and decode in one cycle

– Can decode and read registers in one step

 Situations that prevent starting the next

Conflict for use of a resource

In MIPS pipeline with a single memory

 Load/store requires data access

Hence, pipelined datapaths require separate

 An instruction depends on completion of data

Forwarding (aka Bypassing)

Use result when it is computed

 Don’t wait for it to be stored in a register

 Requires extra connections in the datapath
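
The forwarding decision can be sketched as a small selection function. This is an illustrative model, not the slides' own logic: it assumes the conventional five-stage MIPS arrangement in which the EX/MEM and MEM/WB pipeline registers each carry a destination register number and a register-write flag, and register 0 is never forwarded. The names `PipeReg` and `forward_select` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PipeReg:
    """Fields of a pipeline register that matter for forwarding (assumed layout)."""
    reg_write: bool  # will the instruction in this stage write a register?
    rd: int          # destination register number


def forward_select(rs: int, ex_mem: PipeReg, mem_wb: PipeReg) -> str:
    """Choose where the ALU should take source register rs from."""
    # Prefer the newer result (EX/MEM) when both stages would write rs.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == rs:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == rs:
        return "MEM/WB"
    return "REGFILE"


if __name__ == "__main__":
    # add $s0, $t0, $t1 followed by sub $t2, $s0, $t3 ($s0 is register 16, $t3 is 11).
    add_in_ex_mem = PipeReg(reg_write=True, rd=16)
    idle = PipeReg(reg_write=False, rd=0)
    print(forward_select(16, add_in_ex_mem, idle))  # source $s0
    print(forward_select(11, add_in_ex_mem, idle))  # source $t3
```

Checking EX/MEM first matters when two earlier instructions both write the same register: the later one holds the value the program expects.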

 Can’t always avoid stalls by forwarding

Code Scheduling to Avoid Stalls

Reorder code to avoid use of load result in the

lw $t1, 0($t0) lw $t1, 0($t0)
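
The effect of reordering can be checked by counting load-use stalls. Only the first line of the two code columns survives here, so the instruction sequences below are an assumption: they follow the widely used textbook example for `A = B + E; C = B + F`. The model also assumes forwarding is present, so the only stall is a one-cycle bubble when an instruction uses the result of the `lw` immediately before it. The names `load_use_stalls` and `cycles` are hypothetical.

```python
def load_use_stalls(program):
    """Count one-cycle stalls where an instruction reads the register
    loaded by the immediately preceding lw."""
    stalls = 0
    for prev, curr in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, curr_sources = curr
        if prev_op == "lw" and prev_dest in curr_sources:
            stalls += 1
    return stalls


def cycles(program, stages=5):
    """Total cycles: pipeline fill, one cycle per instruction, plus stalls."""
    return len(program) + (stages - 1) + load_use_stalls(program)


# Each entry is (opcode, destination register or None, source registers).
# Assumed sequences; not recoverable from the surviving line alone.
original = [
    ("lw", "$t1", ("$t0",)),
    ("lw", "$t2", ("$t0",)),
    ("add", "$t3", ("$t1", "$t2")),
    ("sw", None, ("$t3", "$t0")),
    ("lw", "$t4", ("$t0",)),
    ("add", "$t5", ("$t1", "$t4")),
    ("sw", None, ("$t5", "$t0")),
]
# Same instructions with the third load moved up, away from its use.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

if __name__ == "__main__":
    print("original :", cycles(original), "cycles")
    print("reordered:", cycles(reordered), "cycles")
```

Moving the third load ahead of the first `add` separates each load from its use, so both stalls disappear without changing what the code computes.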

 Branch determines flow of control

Wait until branch outcome determined before

 Longer pipelines can’t readily determine

MIPS with Predict Not Taken

 Static branch prediction

The BIG Picture

 Pipelining improves performance by increasing

 Structure, data, control

 Think about which pipeline stage generates or

add $s0, $t0, $t1 IF ID EX MEM WB

sub $t2, $s0, $t3 IF ID EX MEM WB

lw $s0, 20($t1) IF ID EX MEM WB

sw $t3, 12($t0) IF ID EX MEM WB

Load-Use Data Hazard

Requested Detour: Branch Delay Slots

 Execute the instruction following the branch

Need registers between stages

Cycle-by-cycle flow of instructions through the

Form showing resource usage

State of pipeline in a given cycle

Pass register numbers along pipeline

Data hazards when

But only if forwarding instruction will write to

Check when using instruction is decoded in ID

Load-use hazard when

If branch outcome determined in MEM

Move hardware to determine outcome to ID