
This Study Resource Was: Pipelining Analogy

The document provides an overview of pipelining by: 1) Describing how pipelining works analogously to an assembly line, allowing instructions to overlap execution for improved performance. 2) Explaining the five stage MIPS pipeline and how instructions move through each stage in parallel. 3) Noting that pipelining achieves speedup by reducing the time between instructions compared to a single-cycle processor.

Uploaded by

arkaprava paul

4.5 An Overview of Pipelining
Pipelining Analogy

„ Pipelined laundry: overlapping execution
ƒ Parallelism improves performance
„ Four loads:
ƒ Speedup = 8/3.5 = 2.3
„ Non-stop:
ƒ Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages

CS/ECE 3330 – Fall 2009
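The two laundry speedup figures can be reproduced with a short Python sketch. It assumes, as the 2n and 0.5n + 1.5 terms suggest, four stages of 0.5 hours each; the function and parameter names are illustrative and do not come from the slides.

```python
def laundry_speedup(loads, stages=4, stage_hours=0.5):
    """Speedup of pipelined over sequential laundry for a number of loads."""
    sequential = loads * stages * stage_hours
    # The first load passes through every stage; each later load
    # finishes one stage-time after the previous one.
    pipelined = stages * stage_hours + (loads - 1) * stage_hours
    return sequential / pipelined

print(round(laundry_speedup(4), 1))       # four loads
print(round(laundry_speedup(10_000), 1))  # non-stop: approaches the stage count
```

With many loads the fixed 1.5-hour fill time becomes negligible, which is why the ratio tends toward the number of stages.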

MIPS Pipeline

„ Five stages, one step per stage
ƒ IF: Instruction fetch from memory
ƒ ID: Instruction decode & register read
ƒ EX: Execute operation or calculate address
ƒ MEM: Access memory operand
ƒ WB: Write result back to register

Instr 1   IF  ID  EX  MEM WB
Instr 2       IF  ID  EX  MEM WB
Instr 3           IF  ID  EX  MEM WB
Time →

https://www.coursehero.com/file/5657130/cs3330-chap4-pipeline-1/
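As a rough illustration of the overlap in the five-stage pipeline, the following Python sketch prints a multi-cycle diagram in which each instruction starts one cycle after its predecessor. It assumes an ideal pipeline with no stalls; `pipeline_diagram` and `total_cycles` are hypothetical helper names.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_diagram(n_instructions):
    """Return one text row per instruction, shifted one cycle per instruction."""
    rows = []
    for i in range(n_instructions):
        cells = [""] * i + STAGES  # i empty cycles before the instruction enters IF
        body = " ".join(f"{cell:<3}" for cell in cells).rstrip()
        rows.append(f"Instr {i + 1}  {body}")
    return rows

def total_cycles(n_instructions):
    """Cycles to finish n instructions when nothing stalls."""
    return len(STAGES) + n_instructions - 1

for row in pipeline_diagram(3):
    print(row)
print("cycles:", total_cycles(3))
```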
Pipeline Performance

„ Assume time for stages is
ƒ 100ps for register read or write
ƒ 200ps for other stages
„ Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200ps        100ps          200ps   200ps          100ps           800ps
sw        200ps        100ps          200ps   200ps                          700ps
R-format  200ps        100ps          200ps                  100ps           600ps
beq       200ps        100ps          200ps                                  500ps
Pipeline Performance

Single-cycle (Tc = 800ps)

Pipelined (Tc = 200ps)

Pipeline Speedup

„ If all stages are balanced
ƒ i.e., all take the same time
ƒ Time between instructions (pipelined) = Time between instructions (nonpipelined) / Number of stages
„ If not balanced, speedup is lower
„ Speedup due to increased throughput
ƒ Latency (time for each instruction) does not decrease

Pipelining and ISA Design

„ MIPS ISA designed for pipelining
ƒ All instructions are 32 bits
– Easier to fetch and decode in one cycle
– cf. x86: 1- to 17-byte instructions
ƒ Few and regular instruction formats
– Can decode and read registers in one step
ƒ Load/store addressing
– Can calculate address in 3rd stage, access memory in 4th stage
ƒ Alignment of memory operands
– Memory access takes only one cycle

Hazards

„ Situations that prevent starting the next instruction in the next cycle
„ Structure hazards
ƒ A required resource is busy
„ Data hazard
ƒ Need to wait for previous instruction to complete its data read/write
„ Control hazard
ƒ Deciding on control action depends on previous instruction

Structure Hazards

„ Conflict for use of a resource


„ In MIPS pipeline with a single memory
ƒ Load/store requires data access
ƒ Instruction fetch would have to stall for that cycle
– Would cause a pipeline “bubble”
„ Hence, pipelined datapaths require separate
instruction/data memories
ƒ Or separate instruction/data caches

Data Hazards

„ An instruction depends on completion of data access by a previous instruction
ƒ add $s0, $t0, $t1
  sub $t2, $s0, $t3

Data Hazards

„ Think about which pipeline stage generates or


uses a value

add $s0, $t0, $t1 IF ID EX MEM WB

sub $t2, $s0, $t3 IF ID EX MEM WB

lw $s0, 20($t1) IF ID EX MEM WB

sw $t3, 12($t0) IF ID EX MEM WB

Forwarding (aka Bypassing)

„ Use result when it is computed


ƒ Don’t wait for it to be stored in a register
ƒ Requires extra connections in the datapath


Load-Use Data Hazard

„ Can’t always avoid stalls by forwarding


ƒ If value not computed when needed
ƒ Can’t forward backward in time!

Code Scheduling to Avoid Stalls

„ Reorder code to avoid use of load result in the next instruction
„ C code for A = B + E; C = B + F;

      lw  $t1, 0($t0)            lw  $t1, 0($t0)
      lw  $t2, 4($t0)            lw  $t2, 4($t0)
stall add $t3, $t1, $t2          lw  $t4, 8($t0)
      sw  $t3, 12($t0)           add $t3, $t1, $t2
      lw  $t4, 8($t0)            sw  $t3, 12($t0)
stall add $t5, $t1, $t4          add $t5, $t1, $t4
      sw  $t5, 16($t0)           sw  $t5, 16($t0)
      13 cycles                  11 cycles
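The 13-cycle and 11-cycle totals can be checked with a small Python model. It assumes a five-stage pipeline with forwarding, where the only stall is one bubble when an instruction reads a register loaded by the immediately preceding lw. The tuple encoding `(op, dest, sources)` and the function name are hypothetical.

```python
def count_cycles(program, stages=5):
    """program: list of (op, dest, sources); returns total clock cycles."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        # Load-use hazard: the loaded value is not ready for the next instruction.
        if prev[0] == "lw" and prev[1] in cur[2]:
            stalls += 1
    return len(program) + (stages - 1) + stalls

original = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]
# Reordered version: the third load is hoisted above the first add.
reordered = [original[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(count_cycles(original), count_cycles(reordered))
```

Both versions execute the same seven instructions; the reordering only removes the two load-use bubbles.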

Control Hazards

„ Branch determines flow of control
ƒ Fetching next instruction depends on branch outcome
ƒ Pipeline can’t always fetch correct instruction
– Still working on ID stage of branch
„ In MIPS pipeline
ƒ Need to compare registers and compute target early in the pipeline
ƒ Add hardware to do it in ID stage

Stall on Branch

„ Wait until branch outcome determined before


fetching next instruction


Requested Detour: Branch Delay Slots

„ Execute the instruction following the branch unconditionally
„ MIPS uses them!
ƒ Behind the scenes
ƒ Compiler places a “control independent” instruction after the branch
„ Yuck!

slt $t0, $t1, $t2        Always executes
beq END_OF_LOOP
add $t0, $t1, $t2        Control dependent
add $t0, $t0, $t1
END_OF_LOOP:

Branch Prediction

„ Longer pipelines can’t readily determine


branch outcome early
ƒ Stall penalty becomes unacceptable
„ Predict outcome of branch
ƒ Only stall if prediction is wrong
„ In MIPS pipeline
ƒ Can predict branches not taken
ƒ Fetch instruction after branch, with no delay


MIPS with Predict Not Taken

Prediction
correct

Prediction
incorrect

More-Realistic Branch Prediction

„ Static branch prediction
ƒ Based on typical branch behavior
ƒ Example: loop and if-statement branches
– Predict backward branches taken
– Predict forward branches not taken
„ Dynamic branch prediction
ƒ Hardware measures actual branch behavior
– e.g., record recent history of each branch
ƒ Assume future behavior will continue the trend
– When wrong, stall while re-fetching, and update history

The Big Picture

„ Pipelining improves performance by increasing instruction throughput
ƒ Executes multiple instructions in parallel
ƒ Each instruction has the same latency
„ Subject to hazards
ƒ Structure, data, control
„ Instruction set design affects complexity of pipeline implementation

MIPS Pipelined Datapath

Right-to-left flow (MEM, WB) leads to hazards

Pipeline registers

„ Need registers between stages


ƒ To hold information produced in previous cycle

Pipeline Operation

„ Cycle-by-cycle flow of instructions through the


pipelined datapath
ƒ “Single-clock-cycle” pipeline diagram
– Shows pipeline usage in a single cycle
– Highlight resources used
ƒ c.f. “multi-clock-cycle” diagram
– Graph of operation over time
„ We’ll look at “single-clock-cycle” diagrams for
load & store


IF for Load, Store, …

ID for Load, Store, …

EX for Load

MEM for Load

WB for Load

Wrong register number

Corrected Datapath for Load

EX for Store

MEM for Store

WB for Store

Multi-Cycle Pipeline Diagram

„ Form showing resource usage

Multi-Cycle Pipeline Diagram

„ Traditional form

Single-Cycle Pipeline Diagram

„ State of pipeline in a given cycle

Pipelining Demo

http://bellerofonte.dii.unisi.it/WEBMIPS/

4.7 Data Hazards: Forwarding vs. Stalling
Data Hazards in ALU Instructions
„ Consider this sequence:
sub $2, $1,$3
and $12,$2,$5
or $13,$6,$2
add $14,$2,$2
sw $15,100($2)
„ We can resolve hazards with forwarding
ƒ How do we detect when to forward?


Dependencies & Forwarding

Detecting the Need to Forward

„ Pass register numbers along pipeline


ƒ e.g., ID/EX.RegisterRs = register number for Rs
sitting in ID/EX pipeline register
„ ALU operand register numbers in EX stage are
given by
ƒ ID/EX.RegisterRs, ID/EX.RegisterRt


Detecting the Need to Forward

„ Data hazards when
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs     Fwd from EX/MEM pipeline reg
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt     Fwd from EX/MEM pipeline reg
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs     Fwd from MEM/WB pipeline reg
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt     Fwd from MEM/WB pipeline reg

Detecting the Need to Forward

„ But only if forwarding instruction will write to


a register!
ƒ EX/MEM.RegWrite, MEM/WB.RegWrite
„ And only if Rd for that instruction is not $zero
ƒ EX/MEM.RegisterRd ≠ 0,
MEM/WB.RegisterRd ≠ 0


Forwarding Paths

a. Without forwarding

Forwarding Paths


Forwarding Conditions

„ EX hazard
ƒ if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
ForwardA = 10
ƒ if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
ForwardB = 10

Forwarding Conditions

„ MEM hazard
ƒ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
ƒ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01


Double Data Hazard


„ Consider the sequence:
add $1,$1,$2
add $1,$1,$3
add $1,$1,$4
„ Both hazards occur
ƒ Want to use the most recent
„ Revise MEM hazard condition
ƒ Only fwd if EX hazard condition isn’t true

Double Data Hazard

Initially:
$1 = 1
$2 = 2
$3 = 3
$4 = 4


Revised Forwarding Condition


„ MEM hazard
ƒ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
ForwardA = 01
ƒ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
and (MEM/WB.RegisterRd = ID/EX.RegisterRt))
ForwardB = 01
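
The EX and revised MEM hazard conditions can be sketched as a small Python function. Pipeline registers are modelled here as plain dictionaries with keys `RegWrite`, `Rd`, `Rs` and `Rt`; that encoding and the function names are assumptions for illustration, not part of the slides. Checking the EX hazard first gives it priority, which is the effect of the "and not (EX hazard)" term in the revised condition.

```python
def forward_select(ex_mem, mem_wb, src_reg):
    """Return the 2-bit forwarding mux control for one ALU operand."""
    ex_hazard = ex_mem["RegWrite"] and ex_mem["Rd"] != 0 and ex_mem["Rd"] == src_reg
    mem_hazard = mem_wb["RegWrite"] and mem_wb["Rd"] != 0 and mem_wb["Rd"] == src_reg
    if ex_hazard:
        return 0b10  # take the most recent result, from EX/MEM
    if mem_hazard:
        return 0b01  # otherwise take the older result, from MEM/WB
    return 0b00      # no hazard: use the register file value

def forwarding_unit(id_ex, ex_mem, mem_wb):
    """Return (ForwardA, ForwardB) for the instruction in EX."""
    return (forward_select(ex_mem, mem_wb, id_ex["Rs"]),
            forward_select(ex_mem, mem_wb, id_ex["Rt"]))

# Double data hazard: add $1,$1,$4 is in EX while both earlier adds wrote $1.
id_ex = {"Rs": 1, "Rt": 4}
ex_mem = {"RegWrite": True, "Rd": 1}
mem_wb = {"RegWrite": True, "Rd": 1}
print(forwarding_unit(id_ex, ex_mem, mem_wb))
```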

Datapath with Forwarding


Load-Use Data Hazard

Need to stall
for one cycle

Load-Use Hazard Detection

„ Check when using instruction is decoded in ID


stage
„ ALU operand register numbers in ID stage are
given by
ƒ IF/ID.RegisterRs, IF/ID.RegisterRt


Load-Use Hazard Detection

„ Load-use hazard when
ƒ ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt))
„ If detected, stall and insert bubble

How to Stall the Pipeline
„ Force control values in ID/EX register
to 0
ƒ EX, MEM and WB do nop (no-operation)
„ Prevent update of PC and IF/ID register
ƒ Using instruction is decoded again
ƒ Following instruction is fetched again
ƒ 1-cycle stall allows MEM to read data for lw
– Can subsequently forward to EX stage
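
Load-use detection and the resulting stall actions can be sketched together in Python. The dictionaries stand in for the ID/EX and IF/ID pipeline registers, and the output keys (`pc_write`, `if_id_write`, `zero_id_ex_controls`) are hypothetical names for the control actions described above.

```python
def hazard_detection(id_ex, if_id):
    """Decide whether the instruction in ID must stall behind a load in EX."""
    stall = bool(id_ex["MemRead"]) and id_ex["RegisterRt"] in (
        if_id["RegisterRs"], if_id["RegisterRt"])
    return {
        "stall": stall,
        "pc_write": not stall,         # hold the PC so the next fetch repeats
        "if_id_write": not stall,      # hold IF/ID so the using instruction decodes again
        "zero_id_ex_controls": stall,  # EX, MEM and WB then perform a nop (the bubble)
    }

# lw $2, 20($1) is in EX; and $4, $2, $5 is in ID.
print(hazard_detection({"MemRead": True, "RegisterRt": 2},
                       {"RegisterRs": 2, "RegisterRt": 5}))
```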


Stall/Bubble in the Pipeline

Stall inserted
here

Stall/Bubble in the Pipeline

Or, more
accurately…


Datapath with Hazard Detection

The Big Picture
„ Stalls reduce performance
ƒ But are required to get correct results
„ Compiler can arrange code to avoid hazards and stalls
ƒ Requires knowledge of the pipeline structure

4.8 Control Hazards
Branch Hazards

„ If branch outcome determined in MEM

Flush these instructions (Set control values to 0)

Reducing Branch Delay

„ Move hardware to determine outcome to ID stage
ƒ Target address adder
ƒ Register comparator
„ Example: branch taken
36: sub $10, $4, $8
40: beq $1, $3, 7
44: and $12, $2, $5
48: or $13, $2, $6
52: add $14, $4, $2
56: slt $15, $6, $7
...
72: lw $4, 50($7)

Example: Branch Taken

Example: Branch Taken
Really, sll $0, $0, 0


Data Hazards for Branches

„ If a comparison register is a destination of 2nd or 3rd preceding ALU instruction

add $1, $2, $3        IF  ID  EX  MEM WB
add $4, $5, $6            IF  ID  EX  MEM WB
…                             IF  ID  EX  MEM WB
beq $1, $4, target                IF  ID  EX  MEM WB

„ Can resolve using forwarding

Data Hazards for Branches

„ If a comparison register is a destination of preceding ALU instruction or 2nd preceding load instruction
ƒ Need 1 stall cycle

lw  $1, addr          IF  ID  EX  MEM WB
add $4, $5, $6            IF  ID  EX  MEM WB
beq stalled                   IF  ID
beq $1, $4, target                ID  EX  MEM WB

Data Hazards for Branches

„ If a comparison register is a destination of immediately preceding load instruction
ƒ Need 2 stall cycles

lw  $1, addr          IF  ID  EX  MEM WB
beq stalled               IF  ID
beq stalled                       ID
beq $1, $0, target                    ID  EX  MEM WB

Dynamic Branch Prediction

„ In deeper and superscalar pipelines, branch


penalty is more significant
„ Use dynamic prediction
ƒ Branch prediction buffer (aka branch history table)
ƒ Indexed by recent branch instruction addresses
ƒ Stores outcome (taken/not taken)
ƒ To execute a branch
– Check table, expect the same outcome
– Start fetching from fall-through or target
– If wrong, flush pipeline and flip prediction


1-Bit Predictor: Shortcoming

„ Inner loop branches mispredicted twice!

outer: …

inner: …

beq …, …, inner

beq …, …, outer

„ Mispredict as taken on last iteration of


inner loop
„ Then mispredict as not taken on first
iteration of inner loop next time around

2-Bit Predictor

„ Only change prediction on two successive


mispredictions
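
The difference between the two predictors can be seen in a short simulation. This Python sketch models each predictor as a saturating counter, which is one common way to realize a 2-bit scheme and is an assumption here, as are the starting state (predict taken) and the loop shape (an inner loop branch taken 9 times and then not taken, repeated 10 times).

```python
def mispredictions(outcomes, bits):
    """Count wrong predictions for a saturating-counter predictor of the given width."""
    max_state = (1 << bits) - 1
    threshold = 1 << (bits - 1)  # states at or above this predict taken
    state = max_state            # assumed start: (strongly) predict taken
    wrong = 0
    for taken in outcomes:
        if (state >= threshold) != taken:
            wrong += 1
        state = min(state + 1, max_state) if taken else max(state - 1, 0)
    return wrong

inner_loop = [True] * 9 + [False]  # taken 9 times, then falls through
outcomes = inner_loop * 10         # inner loop entered 10 times

print(mispredictions(outcomes, bits=1), mispredictions(outcomes, bits=2))
```

After the first pass, the 1-bit predictor misses twice per inner loop (on exit and on re-entry), while the 2-bit predictor misses only on exit.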


Calculating the Branch Target

„ Even with predictor, still need to calculate the


target address
ƒ 1-cycle penalty for a taken branch
„ Branch target buffer
ƒ Cache of target addresses
ƒ Indexed by PC when instruction fetched
– If hit and instruction is branch predicted taken, can fetch
target immediately

Last Time

„ Data Hazards
ƒ Detection
ƒ Classification
ƒ Handling
„ Control Hazards and Branch Prediction

4.9 Exceptions
Exceptions and Interrupts

„ “Unexpected” events requiring change in flow of control
ƒ Different ISAs use the terms differently
„ Exception
ƒ Arises within the CPU
– e.g., undefined opcode, overflow, syscall, …
„ Interrupt
ƒ From an external I/O controller
„ Dealing with them without sacrificing performance is hard

Sample Exceptions

„ I/O request
„ Invoke the operating system from user
program
„ Arithmetic overflow
„ Undefined instruction
„ Hardware malfunction


Handling Exceptions

„ Save PC of offending (or interrupted)


instruction
ƒ In MIPS: Exception Program Counter (EPC)
„ Save indication of the problem
ƒ In MIPS: Cause register
ƒ We’ll assume 1-bit
– 0 for undefined opcode, 1 for overflow
„ Jump to handler at 8000 0180

An Alternate Mechanism

„ Vectored Interrupts
ƒ Handler address determined by the cause
„ Example:
ƒ Undefined opcode: C000 0000
ƒ Overflow: C000 0020
ƒ …: C000 0040
„ Instructions either
ƒ Deal with the interrupt, or
ƒ Jump to real handler
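
The vectored scheme can be sketched as an address computation. In this Python example the base address and 0x20-byte spacing are read off the example addresses above, while the cause numbering (0 for undefined opcode, 1 for overflow) is borrowed from the 1-bit Cause register assumption on the Handling Exceptions slide. Combining the two this way is an illustrative assumption, and all names are hypothetical.

```python
VECTOR_BASE = 0xC000_0000
VECTOR_STRIDE = 0x20  # bytes per entry, room for eight 4-byte instructions

CAUSES = {"undefined_opcode": 0, "overflow": 1}

def handler_address(cause):
    """Address the hardware would jump to for the given cause."""
    return VECTOR_BASE + CAUSES[cause] * VECTOR_STRIDE

for name in CAUSES:
    print(name, hex(handler_address(name)))
```

Eight instruction slots per entry is enough either to handle a simple case in place or to jump to the real handler.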

Handler Actions

„ Read cause, and transfer to relevant handler


„ Determine action required
„ If restartable
ƒ Take corrective action
ƒ use EPC to return to program
„ Otherwise
ƒ Terminate program
ƒ Report error using EPC, cause, …

Exceptions in a Pipeline

„ Another form of control hazard


„ Consider overflow on add in EX stage
add $1, $2, $1
ƒ Prevent $1 from being clobbered
ƒ Complete previous instructions
ƒ Flush add and subsequent instructions
ƒ Set Cause and EPC register values
ƒ Transfer control to handler
„ Similar to mispredicted branch
ƒ Use much of the same hardware

Pipeline with Exceptions

Exception Properties

„ Restartable exceptions
ƒ Pipeline can flush the instruction
ƒ Handler executes, then returns to the instruction
– Refetched and executed from scratch
„ PC saved in EPC register
ƒ Identifies causing instruction
ƒ Actually PC + 4 is saved
– Handler must adjust

Exception Example

„ Exception on add in
40 sub $11, $2, $4
44 and $12, $2, $5
48 or $13, $2, $6
4C add $1, $2, $1
50 slt $15, $6, $7
54 lw $16, 50($7)

„ Handler
80000180 sw $25, 1000($0)
80000184 sw $26, 1004($0)

Exception Example


Exception Example

Multiple Exceptions

„ Pipelining overlaps multiple instructions


ƒ Could have multiple exceptions at once
„ Simple approach: deal with exception from
earliest instruction
ƒ Flush subsequent instructions
ƒ “Precise” exceptions
„ In complex pipelines
ƒ Multiple instructions issued per cycle
ƒ Out-of-order completion
ƒ Maintaining precise exceptions is difficult!

Imprecise Exceptions

„ Just stop pipeline and save state
ƒ Including exception cause(s)
„ Let the handler work out
ƒ Which instruction(s) had exceptions
ƒ Which to complete or flush
– May require “manual” completion
„ Simplifies hardware, but more complex handler software
„ Not feasible for complex multiple-issue out-of-order pipelines

4.10 Parallelism and Advanced ILP
Instruction-Level Parallelism (ILP)

„ Pipelining: executing multiple instructions in parallel
„ To increase ILP
ƒ Deeper pipeline
– Less work per stage ⇒ shorter clock cycle
ƒ Multiple issue
– Replicate pipeline stages ⇒ multiple pipelines
– Start multiple instructions per clock cycle
– CPI < 1, so use Instructions Per Cycle (IPC)
– E.g., 4GHz 4-way multiple-issue
• 16 BIPS, peak CPI = 0.25, peak IPC = 4
– But dependencies reduce this in practice

Multiple Issue

„ Static multiple issue
ƒ Compiler groups instructions to be issued together
ƒ Packages them into “issue slots”
ƒ Compiler detects and avoids hazards
„ Dynamic multiple issue
ƒ CPU examines instruction stream and chooses instructions to issue each cycle
ƒ Compiler can help by reordering instructions
ƒ CPU resolves hazards using advanced techniques at runtime

Speculation

„ “Guess” what to do with an instruction
ƒ Start operation as soon as possible
ƒ Check whether guess was right
– If so, complete the operation
– If not, roll back and do the right thing
„ Common to static and dynamic multiple issue
„ Examples
ƒ Speculate on branch outcome
– Roll back if path taken is different
ƒ Speculate on load
– Roll back if location is updated

Compiler/Hardware Speculation

„ Compiler can reorder instructions
ƒ e.g., move load before branch
ƒ Can include “fix-up” instructions to recover from incorrect guess
„ Hardware can look ahead for instructions to execute
ƒ Buffer results until it determines they are actually needed
ƒ Flush buffers on incorrect speculation

Speculation and Exceptions

„ What if exception occurs on a speculatively


executed instruction?
ƒ e.g., speculative load before null-pointer check
„ Static speculation
ƒ Can add ISA support for deferring exceptions
„ Dynamic speculation
ƒ Can buffer exceptions until instruction completion
(which may not occur)


Static Multiple Issue

„ Compiler groups instructions into “issue packets”
ƒ Group of instructions that can be issued in a single cycle
ƒ Determined by pipeline resources required
„ Think of an issue packet as a very long instruction
ƒ Specifies multiple concurrent operations
ƒ ⇒ Very Long Instruction Word (VLIW)

Scheduling Static Multiple Issue

„ Compiler must remove some/all hazards
ƒ Reorder instructions into issue packets
ƒ No dependencies within a packet
ƒ Possibly some dependencies between packets
– Varies between ISAs; compiler must know!
ƒ Pad with nop if necessary

MIPS with Static Dual Issue

„ Two-issue packets
ƒ One ALU/branch instruction
ƒ One load/store instruction
ƒ 64-bit aligned
– ALU/branch, then load/store
– Pad an unused instruction with nop

Address   Instruction type   Pipeline Stages
n         ALU/branch         IF  ID  EX  MEM WB
n + 4     Load/store         IF  ID  EX  MEM WB
n + 8     ALU/branch             IF  ID  EX  MEM WB
n + 12    Load/store             IF  ID  EX  MEM WB
n + 16    ALU/branch                 IF  ID  EX  MEM WB
n + 20    Load/store                 IF  ID  EX  MEM WB

MIPS with Static Dual Issue


Hazards in the Dual-Issue MIPS

„ More instructions executing in parallel
„ EX data hazard
ƒ Forwarding avoided stalls with single-issue
ƒ Now can’t use ALU result in load/store in same packet
– add $t0, $s0, $s1
  lw $s2, 0($t0)
– Split into two packets, effectively a stall
„ Load-use hazard
ƒ Still one cycle use latency, but now two instructions
„ More aggressive scheduling required

Scheduling Example

„ Schedule this for dual-issue MIPS

Loop: lw   $t0, 0($s1)       # $t0=array element
      addu $t0, $t0, $s2     # add scalar in $s2
      sw   $t0, 0($s1)       # store result
      addi $s1, $s1, -4      # decrement pointer
      bne  $s1, $zero, Loop  # branch $s1!=0

       ALU/branch              Load/store        cycle
Loop:  nop                     lw $t0, 0($s1)    1
       addi $s1, $s1, -4       nop               2
       addu $t0, $t0, $s2      nop               3
       bne $s1, $zero, Loop    sw $t0, 4($s1)    4

„ IPC = 5/4 = 1.25 (cf. peak IPC = 2)

Loop Unrolling

„ Replicate loop body to expose more


parallelism
ƒ Reduces loop-control overhead
„ Use different registers per replication
ƒ Called “register renaming”
ƒ Avoid loop-carried “anti-dependencies”
– Store followed by a load of the same register
– Aka “name dependence”
• Reuse of a register name

Loop Unrolling Example

       ALU/branch              Load/store         cycle
Loop:  addi $s1, $s1, -16      lw $t0, 0($s1)     1
       nop                     lw $t1, 12($s1)    2
       addu $t0, $t0, $s2      lw $t2, 8($s1)     3
       addu $t1, $t1, $s2      lw $t3, 4($s1)     4
       addu $t2, $t2, $s2      sw $t0, 16($s1)    5
       addu $t3, $t3, $s2      sw $t1, 12($s1)    6
       nop                     sw $t2, 8($s1)     7
       bne $s1, $zero, Loop    sw $t3, 4($s1)     8

„ IPC = 14/8 = 1.75
ƒ Closer to 2, but at cost of registers and code size
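
The IPC figures for the two dual-issue schedules can be checked by counting the non-nop slots. In this Python sketch each cycle is a pair (ALU/branch slot, load/store slot) holding just the opcode; the list encoding and the `ipc` helper are illustrative assumptions.

```python
def ipc(schedule):
    """Useful (non-nop) instructions per cycle for a list of issue packets."""
    useful = sum(1 for packet in schedule for op in packet if op != "nop")
    return useful / len(schedule)

# One (ALU/branch, load/store) pair per clock cycle.
simple_loop = [("nop", "lw"), ("addi", "nop"), ("addu", "nop"), ("bne", "sw")]
unrolled_loop = [
    ("addi", "lw"), ("nop", "lw"), ("addu", "lw"), ("addu", "lw"),
    ("addu", "sw"), ("addu", "sw"), ("nop", "sw"), ("bne", "sw"),
]

print(ipc(simple_loop), ipc(unrolled_loop))
```

Unrolling four iterations fills most of the slots that were nops in the simple schedule, which is where the gain from 1.25 to 1.75 comes from.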

Dynamic Multiple Issue

„ “Superscalar” processors
„ CPU decides whether to issue 0, 1, 2, … each
cycle
ƒ Avoiding structural and data hazards
„ Avoids the need for compiler scheduling
ƒ Though it may still help
ƒ Code semantics ensured by the CPU

Dynamic Pipeline Scheduling

„ Allow the CPU to execute instructions out of


order to avoid stalls
ƒ But commit result to registers in order
„ Example
lw $t0, 20($s2)
addu $t1, $t0, $t2
sub $s4, $s4, $t3
slti $t5, $s4, 20
ƒ Can start sub while addu is waiting for lw
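
Why sub can start while addu waits can be shown with a small dependence check. This Python sketch marks an instruction as blocked if any source register is still pending on the load, directly or through an earlier blocked instruction. The tuple encoding and the function name are hypothetical, and the model ignores everything a real out-of-order core tracks beyond register dependences.

```python
def not_waiting_on(program, pending_regs):
    """Return the ops that do not depend on the pending registers."""
    blocked = set(pending_regs)
    runnable = []
    for op, dest, sources in program:
        if blocked.intersection(sources):
            blocked.add(dest)  # its result is delayed too
        else:
            runnable.append(op)
    return runnable

# Instructions that follow lw $t0, 20($s2); $t0 is pending until the load completes.
after_load = [
    ("addu", "$t1", ["$t0", "$t2"]),
    ("sub",  "$s4", ["$s4", "$t3"]),
    ("slti", "$t5", ["$s4"]),
]
print(not_waiting_on(after_load, {"$t0"}))
```

Here slti is also free of the load, although it still has to follow sub because it reads $s4.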


Dynamically Scheduled CPU

Preserves dependencies

Hold pending operands

Results also sent to any waiting reservation stations

Reorder buffer for register writes

Can supply operands for issued instructions

Register Renaming

„ Reservation stations and reorder buffer


effectively provide register renaming
„ On instruction issue to reservation station
ƒ If operand is available in register file or reorder
buffer
– Copied to reservation station
– No longer required in the register; can be overwritten
ƒ If operand is not yet available
– It will be provided to the reservation station by a function
unit
– Register update may not be required


Speculation

„ Predict branch and continue issuing


ƒ Don’t commit until branch outcome determined
„ Load speculation
ƒ Avoid load and cache miss delay
– Predict the effective address
– Predict loaded value
– Load before completing outstanding stores
– Bypass stored values to load unit
ƒ Don’t commit load until speculation cleared

Why Do Dynamic Scheduling?

„ Why not just let the compiler schedule code?


„ Not all stalls are predictable
ƒ e.g., cache misses
„ Can’t always schedule around branches
ƒ Branch outcome is dynamically determined
„ Different implementations of an ISA have
different latencies and hazards


Does Multiple Issue Work?

The BIG Picture


„ Yes, but not as much as we’d like
„ Programs have real dependencies that limit ILP
„ Some dependencies are hard to eliminate
ƒ e.g., pointer aliasing
„ Some parallelism is hard to expose
ƒ Limited window size during instruction issue
„ Memory delays and limited bandwidth
ƒ Hard to keep pipelines full
„ Speculation can help if done well

Power Efficiency

„ Complexity of dynamic scheduling and speculation requires power
„ Multiple simpler cores may be better

Microprocessor   Year  Clock Rate  Pipeline Stages  Issue width  Out-of-order/Speculation  Cores  Power
i486             1989  25MHz       5                1            No                        1      5W
Pentium          1993  66MHz       5                2            No                        1      10W
Pentium Pro      1997  200MHz      10               3            Yes                       1      29W
P4 Willamette    2001  2000MHz     22               3            Yes                       1      75W
P4 Prescott      2004  3600MHz     31               3            Yes                       1      103W
Core             2006  2930MHz     14               4            Yes                       2      75W
UltraSparc III   2003  1950MHz     14               4            No                        1      90W
UltraSparc T1    2005  1200MHz     6                1            No                        8      70W
