Computer Science 37 Lecture 22
Control Hazards
Visualize: A beq instruction going through the pipeline. Branch: At what pipeline stage can the branch decision be made?
Question: What is going on while the beq is floating towards the MEM stage?
[Figure: a beq moving through the pipeline stages IF, ID, EX, MEM, WB]
Question: Do we have to wait until the beq hits the MEM stage to launch the execution of another instruction in the pipeline?
OK, so we say that we don't have to wait: we assume that the branch is not taken.
[Figure: multi-cycle pipeline diagram. Horizontal axis: time in clock cycles, CC 1 through CC 9. Vertical axis: program execution order (in instructions). Five instructions are drawn, each passing through IM, Reg, DM, Reg. The instruction labels that survive are 40 beq $1, $3, 7; 48 or $13, $6, $2; and 72 lw $4, 50($7).]
The beq reaches MEM and we figure out we want to branch. Interlude: Why do we branch to address 72? Question: What happens to the three instructions that started after the beq?
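The answer to the interlude can be checked with a short calculation. The sketch below assumes the usual MIPS rule that the beq immediate is a word offset relative to the address of the instruction after the branch; the function name `branch_target` is made up for this illustration.

```python
def branch_target(pc: int, imm: int) -> int:
    """Target address of a taken MIPS beq.

    pc  -- byte address of the beq itself
    imm -- the (already sign-extended) 16-bit immediate, counted in words
    """
    next_pc = pc + 4          # address of the instruction after the branch
    byte_offset = imm << 2    # "shift left 2": words to bytes
    return next_pc + byte_offset


# beq $1, $3, 7 sits at address 40: 40 + 4 + 7 * 4 = 72
print(branch_target(40, 7))
```

So the lw at address 72 is the branch target, and everything fetched between the beq and the lw was started speculatively.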
Win: We already started the execution of the instructions following the branch. No time was wasted; all stages in the pipeline were kept busy.
Lose: We have to cancel the execution of the three instructions that followed the branch. Three stages won't have done any useful work (three clock cycles wasted). To flush the pipeline, we must set the control bits of the instructions in IF, ID, and EX to 0s.
Question: Do we need extra controls in the pipeline to implement flush?
We could reduce the cost of the taken branch by moving the decision to an earlier stage in the pipeline. The optimization calls for two changes:
1) Do the branch address calculation right at the ID stage. The PC value and the immediate can be found in IF/ID, so move the branch adder from MEM to ID.
2) Branch decision: Let's not use the ALU. Equality can be tested by XOR-ing the two operands and OR-ing all the bits in the result: the operands are equal exactly when that OR is 0. We can do this easily in the ID stage. It's much faster than using the ALU, anyway.
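The XOR/OR equality test can be written out bit by bit. This is only a software sketch of what the hardware would do in parallel; the 32-bit register width and the name `regs_equal` are assumptions made for the illustration.

```python
MASK32 = 0xFFFFFFFF  # assumed 32-bit registers


def regs_equal(a: int, b: int) -> bool:
    """Equality test without an ALU subtraction: XOR, then OR-reduce."""
    diff = (a ^ b) & MASK32      # a bit is 1 wherever the operands differ
    any_bit_set = 0
    for i in range(32):          # OR together all 32 bits of the XOR result
        any_bit_set |= (diff >> i) & 1
    return any_bit_set == 0      # equal exactly when no bit differs


print(regs_equal(5, 5))   # True
print(regs_equal(5, 4))   # False
```

In hardware the loop is a tree of OR gates, so there is no carry chain to wait for, which is why the test fits comfortably in the ID stage.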
[Figure: pipelined datapath with the branch resolved in ID. Labeled elements: IF.Flush signal, hazard detection unit, forwarding unit, control (WB, M, EX), pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, shift left 2, sign extend, ALU, data memory, and multiplexers.]
To flush out an instruction from IF, we can just zero out all the bits in the IF/ID register, creating a nop.
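Why an all-zero IF/ID register behaves as a nop can be seen by decoding the zero word with the MIPS R-format field layout. In this sketch the dictionary keys `instruction` and `pc_plus_4` are hypothetical names for the IF/ID contents, and the example instruction word is arbitrary.

```python
def decode_r_fields(word: int) -> dict:
    """Split a 32-bit MIPS instruction word into its R-format fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "shamt": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


def flush_if_id(if_id: dict) -> dict:
    """Model of IF.Flush: every bit held in the IF/ID register becomes 0."""
    return {name: 0 for name in if_id}


if_id = {"instruction": 0x10230007, "pc_plus_4": 44}  # arbitrary contents
flushed = flush_if_id(if_id)
print(decode_r_fields(flushed["instruction"]))
```

Every field decodes to 0. Opcode 0 with funct 0 is sll, so the zero word is sll $0, $0, 0, which writes register $0 and therefore changes nothing.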
Performance: The more often the branch is not taken, the more improvement one will observe in this architecture.
The more often the branch is taken, the worse the CPI will be.
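The effect on CPI can be estimated with a simple model. The sketch below assumes a base CPI of 1, no stalls other than taken branches, a penalty of 3 cycles when the branch is decided in MEM and 1 cycle when it is decided in ID; the branch and taken fractions are made-up numbers for illustration only.

```python
def effective_cpi(branch_fraction: float, taken_fraction: float,
                  taken_penalty: int, base_cpi: float = 1.0) -> float:
    """Base CPI plus the average number of flushed cycles per instruction."""
    return base_cpi + branch_fraction * taken_fraction * taken_penalty


# Hypothetical workload: 20% of instructions are branches, half are taken.
cpi_decide_in_mem = effective_cpi(0.20, 0.50, taken_penalty=3)
cpi_decide_in_id = effective_cpi(0.20, 0.50, taken_penalty=1)
print(round(cpi_decide_in_mem, 2), round(cpi_decide_in_id, 2))
```

Under these assumptions the CPI is roughly 1.3 with the decision in MEM and roughly 1.1 with the decision in ID. If no branch is ever taken, both designs stay at the base CPI.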
Branch History Buffer
(beq $t1, $t2, 20)     0
(beq $a0, $t0, 100)    1
(beq $s4, $t3, 123)    0
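A buffer with one history bit per branch can be sketched as a small table that remembers the last outcome of each branch and predicts the same outcome next time. Keying the table by branch address, defaulting to "not taken" for unseen branches, and the names `OneBitPredictor` and `count_mispredictions` are all assumptions made for this illustration.

```python
class OneBitPredictor:
    """One history bit per branch: 1 = taken last time, 0 = not taken."""

    def __init__(self):
        self.history = {}  # branch address -> last outcome bit

    def predict(self, pc: int) -> bool:
        # Unseen branches default to "not taken" (an assumption).
        return self.history.get(pc, 0) == 1

    def update(self, pc: int, taken: bool) -> None:
        self.history[pc] = 1 if taken else 0


def count_mispredictions(predictor, pc, outcomes):
    """Run one branch through a sequence of actual outcomes."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            wrong += 1
        predictor.update(pc, taken)
    return wrong


p = OneBitPredictor()
loop_branch = [True, True, True, False]   # taken 3 times, then falls through
print(count_mispredictions(p, 100, loop_branch))  # 2
```

The two mispredictions are the first iteration (no history yet) and the loop exit (the bit still says "taken").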
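[Figure: instruction-scheduling alternatives, each showing a code fragment and what it becomes after rescheduling. Surviving fragments: "if $s2 = 0 then" with "add $s1, $s2, $s3"; "add $s1, $s2, $s3" followed by "if $s1 = 0 then"; and "sub $t4, $t5, $t6". Two of the alternatives are labeled "Best choice" and "2nd best".]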
When is this good?
Correctness: The optimization can never change the expected behavior of the program.
Performance: If the pipeline is short, an optimization that gains a cycle is a big relative improvement. If the pipeline is long, one cycle in many is not such a great improvement.
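[Figure: pipelined datapath extended with flush controls for every early stage. Labeled elements: IF.Flush, ID.Flush, and EX.Flush signals, hazard detection unit, forwarding unit, control (WB, M, EX), the constant 40000040, Cause and Except PC registers, pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WB, shift left 2, sign extend, ALU, data memory, and multiplexers.]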
Dynamic Pipeline Scheduling: Look at instructions down the sequence (past the stall) to start executing sooner.
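The idea of looking past a stalled instruction can be sketched in a few lines. This is a simplified software model, not a description of any particular hardware scheme: it assumes a small window of waiting instructions, a set of registers whose values are not yet available, and no register renaming, so an instruction may be picked only if it does not conflict with any earlier instruction that is still waiting. The names `Instr` and `pick_next` are made up for the illustration.

```python
from typing import NamedTuple, Optional


class Instr(NamedTuple):
    text: str
    dest: Optional[str]   # register written, if any
    srcs: tuple           # registers read


def pick_next(window, not_ready) -> Optional[int]:
    """Index of the first instruction in the window that can start now."""
    pending_writes = set()  # dests of earlier instructions still waiting
    pending_reads = set()   # sources of earlier instructions still waiting
    for i, ins in enumerate(window):
        operand_missing = any(s in not_ready for s in ins.srcs)
        reads_pending_result = any(s in pending_writes for s in ins.srcs)
        clobbers_earlier = ins.dest is not None and (
            ins.dest in pending_writes or ins.dest in pending_reads)
        if not (operand_missing or reads_pending_result or clobbers_earlier):
            return i
        # This instruction must wait; later ones must not conflict with it.
        if ins.dest is not None:
            pending_writes.add(ins.dest)
        pending_reads.update(ins.srcs)
    return None


window = [
    Instr("sub $t2, $t0, $t3", "$t2", ("$t0", "$t3")),  # waits for $t0
    Instr("add $t4, $t2, $t5", "$t4", ("$t2", "$t5")),  # needs the sub result
    Instr("or  $t6, $t7, $t8", "$t6", ("$t7", "$t8")),  # independent
]
print(pick_next(window, not_ready={"$t0"}))  # 2
```

With $t0 still outstanding (for example, from an earlier load), the sub stalls and the add depends on it, but the or can start right away instead of waiting behind them.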