05 Pipelining
¨ This lecture
¤ Impacts of pipelining on performance
¤ The MIPS five-stage pipeline
¤ Pipeline hazards
n Structural hazards
n Data hazards
Pipelining Technique
¨ Improving throughput at the expense of latency
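¤ Delay: D = T + nδ (T: combinational logic delay, n: number of pipeline stages, δ: delay added per pipeline register)
¤ Throughput: IPS = n/(T + nδ)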
¤ Example: combinational logic with critical path delay T = 30 followed by one register with δ = 1 (n = 1): D = 31, IPS = 1/31
[Figure: plot with x-axis "Number of Pipeline Stages" (0 to 200); y-axis ticks 0 to 15]
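The two formulas above can be evaluated directly. The following is a minimal sketch, assuming T is the total combinational delay, n the number of stages, and δ the per-stage register overhead; the function names are illustrative and not from the lecture.

```python
def pipeline_delay(T, n, delta):
    """Latency of one instruction: D = T + n * delta."""
    return T + n * delta


def pipeline_throughput(T, n, delta):
    """Instructions per unit time: IPS = n / (T + n * delta)."""
    return n / (T + n * delta)


T, delta = 30, 1  # values taken from the slide's example
for n in (1, 5, 30, 200):
    d = pipeline_delay(T, n, delta)
    ips = pipeline_throughput(T, n, delta)
    print(f"n={n:3d}  D={d:3d}  IPS={ips:.4f}")
```

Under these assumptions the delay grows linearly with n, while the throughput rises toward, but never reaches, 1/δ.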
Five Stage MIPS Pipeline
Simple Five Stage Pipeline
¨ A pipelined load-store architecture that processes
up to one instruction per cycle
Instruction Fetch
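[Figure: instruction fetch stage. The clocked PC addresses instruction memory; the next PC is either NPC = PC + 4 or the branch target; the fetched instruction and NPC are latched in a pipeline register.]
¤ Why increment by 4? MIPS instructions are 4 bytes wide and memory is byte-addressed.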
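The next-PC selection in the fetch stage can be sketched as follows. This is only an illustration: the `take_branch` flag and the function name are assumptions, not signals named on the slide.

```python
INSTRUCTION_BYTES = 4  # each MIPS instruction is 32 bits wide


def next_pc(pc, take_branch=False, branch_target=None):
    """Return the branch target if a branch is taken, otherwise NPC = PC + 4."""
    if take_branch:
        return branch_target
    return pc + INSTRUCTION_BYTES


print(hex(next_pc(0x100)))
print(hex(next_pc(0x100, take_branch=True, branch_target=0x200)))
```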
[Figure: decode stage between two pipeline registers: instruction, register file (two register outputs), decode logic producing ctrl, NPC, target]
Execute Stage
¨ Perform ALU operation
¤ Compute the ALU result
n Operation type: control signals
n First operand: contents of a register
n Second operand: either a register or the immediate value
[Figure: execute stage between two pipeline registers: ALU with register operands, result (Res), target, NPC, ctrl]
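The operand selection described for the execute stage might look like the sketch below. The operation names, the `use_immediate` control signal, and the 32-bit wrap-around are assumptions made for illustration.

```python
import operator

# Hypothetical subset of ALU operations, selected by a control signal.
ALU_OPS = {
    "add": operator.add,
    "sub": operator.sub,
    "and": operator.and_,
    "or": operator.or_,
}


def execute(alu_op, reg_a, reg_b, immediate, use_immediate):
    """First operand is a register; second is a register or the immediate."""
    second = immediate if use_immediate else reg_b
    return ALU_OPS[alu_op](reg_a, second) & 0xFFFFFFFF  # keep 32 bits


print(execute("add", 5, 7, 100, use_immediate=False))
print(execute("add", 5, 7, 100, use_immediate=True))
```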
Memory Access
¨ Access data memory
¤ Load/store address: ALU outcome
¤ Control signals determine read or write access
¤ Destination register
[Figure: memory access stage between two pipeline registers: data memory with addr and data ports, ALU result (Res), loaded data (Dat), target, reg, ctrl]
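A rough model of the memory access stage, assuming two control signals named `mem_read` and `mem_write` (hypothetical names) and a dictionary standing in for data memory:

```python
def memory_access(data_memory, alu_result, mem_read, mem_write, store_data=None):
    """Use the ALU result as the address; control signals pick read or write."""
    if mem_write:
        data_memory[alu_result] = store_data
        return None
    if mem_read:
        return data_memory.get(alu_result, 0)  # unwritten locations read as 0
    return None  # ALU instructions bypass data memory


memory = {}
memory_access(memory, 0x40, mem_read=False, mem_write=True, store_data=99)
print(memory_access(memory, 0x40, mem_read=True, mem_write=False))
```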
Register Write Back
¨ Update register file
¤ Control signals determine if a register write is needed
¤ Only one write port is required
n Write the ALU result to the destination register, or
n Write the loaded data into the register file
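The write-back choice can be sketched with a single write port. The signal names `reg_write` and `mem_to_reg` are assumptions for illustration; the slide only says that control signals decide.

```python
def write_back(register_file, reg_write, dest_reg, alu_result, loaded_data, mem_to_reg):
    """Write either the loaded data or the ALU result through one write port."""
    if not reg_write:
        return  # e.g. stores and branches do not update the register file
    register_file[dest_reg] = loaded_data if mem_to_reg else alu_result


regs = {}
write_back(regs, True, "R6", alu_result=3, loaded_data=None, mem_to_reg=False)
write_back(regs, True, "R1", alu_result=0x40, loaded_data=99, mem_to_reg=True)
print(regs)
```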
Five Stage Pipeline
¨ Ideal pipeline: IPC=1
¤ Are there enough resources to keep the pipeline stages
busy all the time?
[Figure: five-stage datapath: PC (+4), instruction memory, register file, ALU, data memory, register file]
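To see why IPC = 1 is an ideal rather than a guarantee, the sketch below counts cycles for a pipeline that never stalls. It assumes the first instruction needs one cycle per stage and every later instruction completes one cycle after its predecessor.

```python
def ideal_cycles(num_instructions, num_stages=5):
    """Cycles to finish all instructions when the pipeline never stalls."""
    return num_stages + num_instructions - 1


def ideal_ipc(num_instructions, num_stages=5):
    """Instructions per cycle, including the pipeline fill time."""
    return num_instructions / ideal_cycles(num_instructions, num_stages)


for count in (1, 4, 100, 10000):
    print(count, ideal_cycles(count), round(ideal_ipc(count), 4))
```

Even without hazards, IPC only approaches 1 for long instruction streams; every stall cycle lowers it further.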
Pipeline Hazards
¨ Structural hazards: multiple instructions compete for
the same resource
R1 ← Mem[R2]
R3 ← Mem[R20]
R6 ← R4-R5
R7 ← R1+R0
Structural Hazards
¨ 1. Unified memory for instruction and data
¨ 2. Register file with shared read/write access ports
R1 ← Mem[R2]
R3 ← Mem[R20]
R6 ← R4-R5
R7 ← R1+R0
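The unified-memory case can be checked mechanically. This sketch assumes one instruction is fetched per cycle with no stalls, so instruction i is in IF during cycle i and in MEM during cycle i + 3; the boolean encoding of the program is an illustrative choice.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def unified_memory_conflicts(uses_data_memory):
    """Return (cycle, memory_instr, fetched_instr) for each IF/MEM collision.

    uses_data_memory[i] is True when instruction i is a load or store.
    """
    count = len(uses_data_memory)
    conflicts = []
    for i, is_mem in enumerate(uses_data_memory):
        mem_cycle = i + STAGES.index("MEM")
        fetched = mem_cycle  # instruction j is fetched in cycle j
        if is_mem and fetched < count:
            conflicts.append((mem_cycle, i, fetched))
    return conflicts


# R1 <- Mem[R2]; R3 <- Mem[R20]; R6 <- R4-R5; R7 <- R1+R0
program = [True, True, False, False]
print(unified_memory_conflicts(program))  # [(3, 0, 3)]
```

In this four-instruction sequence the first load's memory access collides with the fetch of the fourth instruction. The same style of check applies to the register file, where ID reads and WB writes overlap in time.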
Data Hazards
¨ True dependence: read-after-write (RAW)
¤ Consumer has to wait for producer
Loaded data will be available two cycles later.
R1 ← Mem[R2]
R3 ← R1+R0
R4 ← R1-R3
Inserting two bubbles.
R1 ← Mem[R2]
Nothing
Nothing
R3 ← R1+R0
R4 ← R1-R3
Inserting single bubble + RF bypassing.
R1 ← Mem[R2]
Nothing
R3 ← R1+R0
R4 ← R1-R3
Load delay slot.
SW vs. HW management?
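On the software side, bubble insertion can be sketched as a small scheduling pass. The `load_gap` parameter is an assumption that stands for the number of slots required between a load and its first consumer (2 without RF bypassing, 1 with it, per the slides); the tuple encoding of instructions is illustrative. Only load-use hazards are handled here.

```python
def insert_bubbles(program, load_gap):
    """Insert None ("Nothing") so a load's consumer is at least load_gap slots later.

    Each instruction is a tuple (dest, sources, is_load).
    """
    scheduled = []
    for dest, sources, is_load in program:
        needed = 0
        recent = scheduled[-load_gap:] if load_gap > 0 else []
        for distance, earlier in enumerate(reversed(recent), start=1):
            if earlier is None:
                continue  # a bubble produces no value
            earlier_dest, _, earlier_is_load = earlier
            if earlier_is_load and earlier_dest in sources:
                needed = max(needed, load_gap - distance + 1)
        scheduled.extend([None] * needed)
        scheduled.append((dest, sources, is_load))
    return scheduled


program = [
    ("R1", ("R2",), True),        # R1 <- Mem[R2]
    ("R3", ("R1", "R0"), False),  # R3 <- R1+R0
    ("R4", ("R1", "R3"), False),  # R4 <- R1-R3
]
for gap in (2, 1):
    print(gap, ["Nothing" if s is None else s[0] for s in insert_bubbles(program, gap)])
```

A hardware interlock would make the same decision at run time by stalling the consumer instead of relying on inserted instructions.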
Using the result of an ALU instruction.
R1 ← R2+R3
R5 ← R1+R0
R3 ← R1+R0
R4 ← R1-R3
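How urgent each RAW dependence is depends on the distance between producer and consumer. The sketch below lists that distance for the sequence above; the (dest, sources) encoding and the function name are illustrative assumptions.

```python
def raw_distances(program):
    """Return (producer, consumer, register, distance) for each RAW dependence."""
    last_writer = {}
    dependences = []
    for j, (dest, sources) in enumerate(program):
        for src in sources:
            if src in last_writer:
                i = last_writer[src]
                dependences.append((i, j, src, j - i))
        last_writer[dest] = j  # recorded after the reads of the same instruction
    return dependences


program = [
    ("R1", ("R2", "R3")),  # R1 <- R2+R3
    ("R5", ("R1", "R0")),  # R5 <- R1+R0
    ("R3", ("R1", "R0")),  # R3 <- R1+R0
    ("R4", ("R1", "R3")),  # R4 <- R1-R3
]
for dep in raw_distances(program):
    print(dep)
```

Consumers at distance 1 need the result soonest, which is where stalling or bypassing matters most.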
¨ Anti dependence: write-after-read (WAR)
¤ Write must wait for earlier read
R1 ← R2+R1
R2 ← R8+R9
¨ Output dependence: write-after-write (WAW)
¤ An older write must not overwrite a younger write
R1 ← R2+R3
R1 ← R8+R9
Data Hazards
¨ How to detect and resolve data hazards
¤ Show all of the data hazards in the code below
R1 ← Mem[R2]
R2 ← R1+R0
R1 ← R1-R2
Mem[R3] ← R2
[Figure: dependence arrows between the instructions; one is labeled WAR]
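One way to approach the exercise is to enumerate register dependences pairwise. The sketch below is an assumption-laden illustration, not the lecture's method: each instruction is encoded as (dest, sources), a store has no destination register, and a pair is reported only when no instruction in between rewrites the register involved.

```python
def find_hazards(program):
    """Return (kind, earlier, later, register) tuples for RAW, WAR and WAW."""
    hazards = []
    for j, (dest_j, srcs_j) in enumerate(program):
        for i in range(j):
            dest_i, srcs_i = program[i]
            # Registers written strictly between instructions i and j.
            between = {program[k][0] for k in range(i + 1, j)}
            if dest_i is not None and dest_i not in between:
                if dest_i in srcs_j:
                    hazards.append(("RAW", i, j, dest_i))
                if dest_i == dest_j:
                    hazards.append(("WAW", i, j, dest_i))
            if dest_j is not None and dest_j in srcs_i and dest_j not in between:
                hazards.append(("WAR", i, j, dest_j))
    return hazards


program = [
    ("R1", ("R2",)),       # R1 <- Mem[R2]
    ("R2", ("R1", "R0")),  # R2 <- R1+R0
    ("R1", ("R1", "R2")),  # R1 <- R1-R2
    (None, ("R3", "R2")),  # Mem[R3] <- R2 (store: no destination register)
]
for hazard in find_hazards(program):
    print(hazard)
```

This lists dependences only. Whether a given one actually forces a stall depends on the pipeline's stage timing and on which bypass paths exist.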