Lecture 4.3 - The Processor - Pipelining
▪ When a 1-bit control to a 2-way multiplexor is asserted, the multiplexor selects the
input corresponding to 1.
▪ If the control is deasserted, the multiplexor selects the 0 input.
▪ PCSrc is controlled by an AND gate.
▪ If the Branch signal and the ALU Zero signal are both set, then PCSrc is 1; otherwise, it is 0.
▪ Control sets the Branch signal only during a beq instruction; otherwise, PCSrc is set to 0.
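The PCSrc logic above can be sketched in a few lines of Python. This is only an illustrative model: the function names `mux2`, `pc_src`, and `next_pc` are hypothetical, and the sequential next address is assumed to be PC + 4.

```python
def mux2(control: int, in0: int, in1: int) -> int:
    """2-way multiplexor: control asserted (1) selects in1, deasserted (0) selects in0."""
    return in1 if control else in0


def pc_src(branch: int, zero: int) -> int:
    """PCSrc is the AND of the Branch control signal and the ALU Zero output."""
    return branch & zero


def next_pc(pc: int, branch_target: int, branch: int, zero: int) -> int:
    """Select between the sequential address (assumed PC + 4) and the branch target."""
    return mux2(pc_src(branch, zero), pc + 4, branch_target)


# beq whose operands are equal: Branch = 1 and Zero = 1, so the target is selected.
taken_pc = next_pc(pc=40, branch_target=72, branch=1, zero=1)
# Any non-branch instruction: Branch = 0, so PCSrc = 0 regardless of Zero.
sequential_pc = next_pc(pc=40, branch_target=72, branch=0, zero=1)
```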
Multi-Cycle Pipeline Diagram
▪ Traditional form
Data Hazards in ALU Instructions
The first instruction writes into x2, and all the following
instructions read x2.
The colored lines from the top datapath to the lower ones
show the dependences. Those that must go backward in
time are pipeline data hazards.
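A rough way to see which dependences "go backward in time" is to check how far apart the producer and consumer are. The sketch below assumes a five-stage pipeline in which the register file is written in the first half of a cycle and read in the second half, so only consumers one or two instructions after the producer are hazards. The instruction sequence and the function name `find_data_hazards` are assumptions made for illustration; the figure itself is not reproduced here.

```python
from typing import List, Optional, Tuple

# Each instruction is (destination register or None, list of source registers).
Instr = Tuple[Optional[int], List[int]]


def find_data_hazards(program: List[Instr]) -> List[Tuple[int, int, int]]:
    """Return (producer index, consumer index, register) for each data hazard."""
    hazards = []
    for i, (rd, _) in enumerate(program):
        if rd is None or rd == 0:  # no destination, or x0 (never really written)
            continue
        for dist in (1, 2):  # a distance of 3 or more reads the value after write-back
            j = i + dist
            if j >= len(program):
                break
            if rd in program[j][1]:
                hazards.append((i, j, rd))
            if program[j][0] == rd:
                break  # rd is overwritten; later readers depend on instruction j
    return hazards


# Hypothetical sequence: the first instruction writes x2, the rest read x2.
program = [
    (2, [1, 3]),      # sub x2, x1, x3
    (12, [2, 5]),     # and x12, x2, x5
    (13, [6, 2]),     # or  x13, x6, x2
    (14, [2, 2]),     # add x14, x2, x2
    (None, [15, 2]),  # sd  x15, 100(x2)
]
hazards = find_data_hazards(program)
```

Under these assumptions only the `and` and `or` instructions need the value before it reaches the register file; the `add` and `sd` read it in or after the write-back cycle.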
Detecting the Need to Forward
▪ Pass register numbers along pipeline
▪ e.g., ID/EX.RegisterRs1 = the number of source register Rs1 held in the ID/EX pipeline
register (the register read through the first read port of the register file)
Mux control     Source    Explanation
ForwardB = 00   ID/EX     The second ALU operand comes from the register file.
ForwardB = 10   EX/MEM    The second ALU operand is forwarded from the prior ALU result.
ForwardB = 01   MEM/WB    The second ALU operand is forwarded from data memory or an earlier ALU result.
Forwarding Conditions
▪ EX hazard
▪ if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
ForwardA = 10
▪ if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
ForwardB = 10
▪ MEM hazard
▪ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1))
ForwardA = 01
▪ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2))
ForwardB = 01
Revised Forwarding Condition
▪ MEM hazard
▪ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1))
ForwardA = 01
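The EX-hazard and revised MEM-hazard conditions can be modelled as a small function. This is a sketch, not the actual hardware: the `PipelineRegs` class and the function names are hypothetical, and the revised condition is applied to ForwardB by symmetry, which is an assumption since only the ForwardA form is shown above.

```python
from dataclasses import dataclass


@dataclass
class PipelineRegs:
    """Fields of the pipeline registers that the forwarding unit inspects."""
    id_ex_rs1: int
    id_ex_rs2: int
    ex_mem_reg_write: bool
    ex_mem_rd: int
    mem_wb_reg_write: bool
    mem_wb_rd: int


def forward_select(rs: int, r: PipelineRegs) -> int:
    """Return the 2-bit mux control for one ALU operand whose source register is rs."""
    ex_hazard = r.ex_mem_reg_write and r.ex_mem_rd != 0 and r.ex_mem_rd == rs
    mem_hazard = r.mem_wb_reg_write and r.mem_wb_rd != 0 and r.mem_wb_rd == rs
    if ex_hazard:
        return 0b10  # forward the prior ALU result from EX/MEM
    if mem_hazard:
        return 0b01  # MEM hazard applies only when there is no EX hazard
    return 0b00      # no forwarding: use the value read from the register file


def forwarding_unit(r: PipelineRegs) -> tuple:
    """Return (ForwardA, ForwardB)."""
    return forward_select(r.id_ex_rs1, r), forward_select(r.id_ex_rs2, r)


# Double hazard: both earlier instructions write x1, so the most recent
# result (in EX/MEM) must win for the Rs1 operand.
double = PipelineRegs(id_ex_rs1=1, id_ex_rs2=4,
                      ex_mem_reg_write=True, ex_mem_rd=1,
                      mem_wb_reg_write=True, mem_wb_rd=1)
forward_a, forward_b = forwarding_unit(double)
```

Checking the EX hazard first is what implements the "and not (EX hazard)" term of the revised condition.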
More-Realistic Branch Prediction
▪ Branch Prediction
▪ A method of resolving a branch hazard that assumes a given outcome for the branch
and proceeds from that assumption rather than waiting to ascertain the actual outcome.
▪ Static branch prediction
▪ Based on typical branch behavior
▪ Example: loop and if-statement branches
▪ Predict backward branches taken
▪ Predict forward branches not taken
▪ In RISC-V pipeline
▪ Can predict branches not taken
▪ Fetch instruction after branch, with no delay
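The static rule "backward taken, forward not taken" depends only on the sign of the branch offset. A minimal sketch, assuming the offset is given in bytes and the fall-through address is PC + 4 (the function names are hypothetical):

```python
def predict_taken(offset_bytes: int) -> bool:
    """Static prediction: backward branches (negative offset) are predicted taken."""
    return offset_bytes < 0


def predicted_next_pc(pc: int, offset_bytes: int) -> int:
    """Address to fetch next under the static prediction."""
    return pc + offset_bytes if predict_taken(offset_bytes) else pc + 4


# A loop-closing branch jumps backward, so it is predicted taken.
loop_next = predicted_next_pc(pc=100, offset_bytes=-20)
# An if-statement branch jumps forward, so fetch continues sequentially.
if_next = predicted_next_pc(pc=100, offset_bytes=32)
```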
Assume Branch Not Taken
▪ Stalling until the branch is complete is too slow. One improvement over branch
stalling is to predict that the branch will not be taken and thus continue execution
down the sequential instruction stream.
▪ If the branch is taken, the instructions that are being fetched and decoded must be
discarded. Execution continues at the branch target. If branches are untaken half the
time, and if it costs little to discard the instructions, this optimization halves the cost of
control hazards.
▪ To discard instructions, we merely change the original control values to 0, much as we
did to stall for a load-use data hazard. The difference is that we must also change the
three instructions in the IF, ID, and EX stages when the branch reaches the MEM stage.
▪ For load-use stalls, we just change the control values to 0 in the ID stage and let them
filter through the pipeline. Discarding instructions, then, means we must be able to flush
instructions in the IF, ID, and EX stages of the pipeline.
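The flush and its cost can be illustrated with a toy model. The sketch below represents the pipeline as a dictionary from stage name to an instruction label, which is purely an assumption for illustration; real hardware zeroes control signals rather than replacing labels. The penalty function assumes three discarded instructions per taken branch, as described above.

```python
NOP = "nop"


def flush_on_taken_branch(stages: dict, branch_taken: bool) -> dict:
    """When the branch in MEM is taken, turn the instructions in IF, ID, and EX into bubbles."""
    flushed = dict(stages)  # copy so the caller's snapshot is left unchanged
    if branch_taken:
        for stage in ("IF", "ID", "EX"):
            flushed[stage] = NOP
    return flushed


def average_branch_penalty(taken_fraction: float, flush_cycles: int = 3) -> float:
    """Expected cycles lost per branch when predicting not taken."""
    return taken_fraction * flush_cycles


# Hypothetical snapshot with the branch in MEM and three younger instructions behind it.
before = {"IF": "add", "ID": "or", "EX": "and", "MEM": "beq", "WB": "sub"}
after = flush_on_taken_branch(before, branch_taken=True)

# Always stalling costs 3 cycles per branch; predicting not taken with
# half the branches untaken costs 1.5 cycles on average.
penalty = average_branch_penalty(0.5)
```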
Reducing Branch Delay
▪ Move hardware to determine outcome to ID stage
▪ Target address adder
▪ Register comparator
▪ Example: branch taken
36: sub x10, x4, x8
40: beq x1, x3, 16 // PC-relative branch
// to 40+16*2=72
44: and x12, x2, x5
48: or x13, x2, x6
52: add x14, x4, x2
56: sub x15, x6, x7
...
72: ld x4, 50(x7)
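The two pieces of hardware moved to ID can be modelled as follows. This is a sketch under the listing's own convention that the immediate counts halfwords (so it is multiplied by 2); the function names and the register values are hypothetical.

```python
def branch_target(pc: int, imm_halfwords: int) -> int:
    """Target address adder: PC plus the immediate scaled to bytes."""
    return pc + imm_halfwords * 2


def beq_taken(rs1_value: int, rs2_value: int) -> bool:
    """Register comparator: beq is taken when the two source values are equal."""
    return rs1_value == rs2_value


# beq x1, x3, 16 at address 40, with x1 and x3 assumed to hold equal values.
target = branch_target(pc=40, imm_halfwords=16)
taken = beq_taken(rs1_value=7, rs2_value=7)
next_fetch = target if taken else 40 + 4
```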
Example: Branch Taken
Clock cycle 4 shows the instruction at
location 72 being fetched and the single
bubble or nop instruction in the pipeline
as a result of the taken branch. (Since
the nop is really addi x0, x0, 0, it’s
arguable whether or not the ID stage in
clock 4 should be highlighted.)
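The fetch sequence for this example can be traced with a toy model of the IF and ID stages. The sketch assumes the branch is resolved in ID, that a taken branch turns the instruction currently in IF into a bubble, and that sequential fetch advances by 4; the function name `fetch_trace` is hypothetical.

```python
def fetch_trace(start_pc: int, taken_branches: dict, cycles: int) -> list:
    """Return the (IF, ID) contents for each clock cycle.

    taken_branches maps the address of each taken branch to its target.
    """
    trace = []
    pc = start_pc
    in_id = None  # nothing is in ID during the first cycle
    for _ in range(cycles):
        in_if = pc
        trace.append((in_if, in_id))
        if in_id in taken_branches:
            pc = taken_branches[in_id]  # redirect fetch to the branch target
            in_id = "nop"               # the instruction in IF is flushed
        else:
            pc = in_if + 4
            in_id = in_if
    return trace


# sub at 36, beq at 40 taken to 72.
trace = fetch_trace(start_pc=36, taken_branches={40: 72}, cycles=4)
```

In this model clock cycle 3 has the instruction at 44 in IF while the branch is in ID, and clock cycle 4 fetches location 72 with a single bubble in ID.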