
Lecture 4.3 - The Processor - Pipelining

Chapter 4 discusses the concept of pipelining in processors, focusing on control signals, data hazards, and forwarding techniques to resolve dependencies in ALU instructions. It explains how to detect and handle load-use and branch hazards, including static and dynamic branch prediction methods to optimize performance. The chapter emphasizes the importance of managing control signals and the flow of data through the pipeline to minimize stalls and improve execution efficiency.

Uploaded by

Kiet Do

Chapter 4

The Processor – Pipelining


7 Control Signals Cont.
▪ A signal is asserted when its logical state is set to true.
▪ A signal is deasserted when it is set to false or unknown.
▪ Some signals are true with low voltage and some with high.

▪ When a 1-bit control to a 2-way multiplexor is asserted, the multiplexor selects the input corresponding to 1.
▪ If the control is deasserted, the multiplexor selects the 0 input.
▪ PCSrc is controlled by an AND gate.
▪ If the Branch signal and the ALU Zero signal are both set, then PCSrc is 1; otherwise, it is 0.
▪ Control sets the Branch signal only during a beq instruction; otherwise, PCSrc is set to 0.
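
The multiplexor and PCSrc rules above can be illustrated with a minimal Python sketch. The function names (`mux2`, `pc_src`, `next_pc`) are hypothetical, and the sequential address is assumed to be PC + 4:

```python
def mux2(select: int, input0: int, input1: int) -> int:
    """2-way multiplexor: an asserted (1) select picks input1, a deasserted (0) select picks input0."""
    return input1 if select else input0


def pc_src(branch: int, zero: int) -> int:
    """PCSrc is the AND of the Branch control signal and the ALU Zero output."""
    return branch & zero


def next_pc(pc: int, branch_target: int, branch: int, zero: int) -> int:
    # Assumption: input 0 of the PC mux is the sequential address PC + 4,
    # input 1 is the branch target.
    return mux2(pc_src(branch, zero), pc + 4, branch_target)


print(next_pc(40, 72, branch=1, zero=1))  # beq with equal operands: branch target
print(next_pc(40, 72, branch=1, zero=0))  # beq with unequal operands: PC + 4
print(next_pc(40, 72, branch=0, zero=1))  # not a branch: PC + 4
```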
Multi-Cycle Pipeline Diagram
▪ Traditional form
Data Hazards in ALU Instructions

▪ Consider this sequence:


sub x2, x1, x3     # Register x2 written by sub
and x12, x2, x5    # 1st operand (x2) depends on sub
or  x13, x6, x2    # 2nd operand (x2) depends on sub
add x14, x2, x2    # 1st (x2) & 2nd (x2) depend on sub
sd  x15, 100(x2)   # Base (x2) depends on sub

▪ We can resolve hazards with forwarding


▪ How do we detect when to forward?
Dependencies & Forwarding

All the dependent actions are shown in color, and “CC 1” at the top of the figure means clock cycle 1.

The first instruction writes into x2, and all the following instructions read x2.

This register is written in clock cycle 5, so the proper value is unavailable before clock cycle 5. (A read of a register during a clock cycle returns the value written at the end of the first half of the cycle, when such a write occurs.)

The colored lines from the top datapath to the lower ones show the dependences. Those that must go backward in time are pipeline data hazards.
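
The timing argument above can be checked with a small Python sketch. It assumes a simplified model in which instruction i (counting from 0) is fetched in clock cycle i + 1 and moves one stage per cycle. The helper names are hypothetical:

```python
def id_cycle(index: int) -> int:
    """Clock cycle in which instruction `index` reads the register file (ID stage)."""
    return index + 2


def wb_cycle(index: int) -> int:
    """Clock cycle in which instruction `index` writes the register file (WB stage)."""
    return index + 5


def goes_backward_in_time(producer: int, consumer: int) -> bool:
    # A read in the same cycle as the write is fine, because the write
    # happens in the first half of the cycle and the read in the second.
    return id_cycle(consumer) < wb_cycle(producer)


# The sequence from the slides; every instruction after sub reads x2.
program = ["sub", "and", "or", "add", "sd"]
hazards = [name for i, name in enumerate(program)
           if i > 0 and goes_backward_in_time(0, i)]
print(hazards)
```

Under this model only `and` and `or` read x2 before clock cycle 5, so only those two dependences are data hazards; `add` reads in cycle 5 and `sd` in cycle 6.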
Detecting the Need to Forward
▪ Pass register numbers along pipeline
▪ e.g., ID/EX.RegisterRs1 = register number for Rs1 sitting in the ID/EX pipeline register
▪ the one from the first read port of the register file.
▪ ALU operand register numbers in EX stage are given by
▪ ID/EX.RegisterRs1, ID/EX.RegisterRs2
▪ Data hazards when
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs1 (forward from EX/MEM pipeline register)
1b. EX/MEM.RegisterRd = ID/EX.RegisterRs2 (forward from EX/MEM pipeline register)
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs1 (forward from MEM/WB pipeline register)
2b. MEM/WB.RegisterRd = ID/EX.RegisterRs2 (forward from MEM/WB pipeline register)
Detecting the Need to Forward

▪ But only if forwarding instruction will write to a register!
▪ EX/MEM.RegWrite, MEM/WB.RegWrite
▪ And only if Rd for that instruction is not x0
▪ EX/MEM.RegisterRd ≠ 0, MEM/WB.RegisterRd ≠ 0
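
As a rough illustration, the four hazard conditions together with the RegWrite and Rd ≠ 0 qualifiers could be written as the following Python sketch. The function name and argument names are hypothetical; registers are passed as plain register numbers:

```python
def detect_hazards(id_ex_rs1: int, id_ex_rs2: int,
                   ex_mem_rd: int, ex_mem_reg_write: bool,
                   mem_wb_rd: int, mem_wb_reg_write: bool) -> list:
    """Return the labels (1a, 1b, 2a, 2b) of the hazard conditions that hold."""
    found = []
    # Only an instruction that writes a register other than x0 can cause a hazard.
    if ex_mem_reg_write and ex_mem_rd != 0:
        if ex_mem_rd == id_ex_rs1:
            found.append("1a")
        if ex_mem_rd == id_ex_rs2:
            found.append("1b")
    if mem_wb_reg_write and mem_wb_rd != 0:
        if mem_wb_rd == id_ex_rs1:
            found.append("2a")
        if mem_wb_rd == id_ex_rs2:
            found.append("2b")
    return found


# sub x2,x1,x3 is in EX/MEM while and x12,x2,x5 is in ID/EX.
print(detect_hazards(2, 5, ex_mem_rd=2, ex_mem_reg_write=True,
                     mem_wb_rd=0, mem_wb_reg_write=False))
```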
Double Data Hazard
▪ Consider the sequence:
add x1,x1,x2
add x1,x1,x3
add x1,x1,x4

▪ Both hazards occur
▪ Want to use the most recent
▪ Revise MEM hazard condition
▪ Only fwd if EX hazard condition isn’t true
Revised Forwarding Condition
▪ MEM hazard
▪ if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1)) ForwardA = 01
▪ if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2)) ForwardB = 01
Forwarding Paths
No Forwarding vs Forwarding
Forwarding Conditions
Mux control     Source    Explanation
ForwardA = 00   ID/EX     The first ALU operand comes from the register file.
ForwardA = 10   EX/MEM    The first ALU operand is forwarded from the prior ALU result.
ForwardA = 01   MEM/WB    The first ALU operand is forwarded from data memory or an earlier ALU result.
ForwardB = 00   ID/EX     The second ALU operand comes from the register file.
ForwardB = 10   EX/MEM    The second ALU operand is forwarded from the prior ALU result.
ForwardB = 01   MEM/WB    The second ALU operand is forwarded from data memory or an earlier ALU result.
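
The table can be read as a 3-way multiplexor in front of each ALU input. A minimal Python sketch, with hypothetical argument names for the three candidate values:

```python
def alu_operand(forward: int, id_ex_value: int,
                ex_mem_alu_result: int, mem_wb_value: int) -> int:
    """Select an ALU operand according to a 2-bit ForwardA/ForwardB control."""
    sources = {
        0b00: id_ex_value,        # value read from the register file
        0b10: ex_mem_alu_result,  # prior ALU result held in EX/MEM
        0b01: mem_wb_value,       # data memory or earlier ALU result held in MEM/WB
    }
    return sources[forward]


print(alu_operand(0b10, id_ex_value=7, ex_mem_alu_result=99, mem_wb_value=55))
```

The encoding 11 is not listed in the table, so the sketch raises a `KeyError` for it rather than guessing a behavior.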
Forwarding Conditions
▪ EX hazard
▪ if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
ForwardA = 10
▪ if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
ForwardB = 10
▪ MEM hazard
▪ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1))
ForwardA = 01
▪ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2))
ForwardB = 01
Revised Forwarding Condition
▪ MEM hazard
▪ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs1))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs1))
ForwardA = 01
▪ if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0)
and not (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs2))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs2))
ForwardB = 01
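
Taken together, the EX hazard condition and the revised MEM hazard condition amount to a priority rule: forward from EX/MEM when possible, and from MEM/WB only otherwise. The Python sketch below is one way to express that; the function and argument names are hypothetical:

```python
def forwarding_unit(id_ex_rs1: int, id_ex_rs2: int,
                    ex_mem_rd: int, ex_mem_reg_write: bool,
                    mem_wb_rd: int, mem_wb_reg_write: bool) -> tuple:
    """Return (ForwardA, ForwardB) as 2-bit mux control values."""

    def select(rs: int) -> int:
        ex_hazard = ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == rs
        mem_hazard = mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == rs
        if ex_hazard:
            return 0b10  # most recent result, from EX/MEM
        if mem_hazard:
            return 0b01  # reached only when the EX hazard condition is false
        return 0b00      # no hazard: use the register file value

    return select(id_ex_rs1), select(id_ex_rs2)


# Double data hazard: the third add x1,x1,x4 is in EX while the second add
# (rd = x1) is in EX/MEM and the first add (rd = x1) is in MEM/WB.
print(forwarding_unit(1, 4, ex_mem_rd=1, ex_mem_reg_write=True,
                      mem_wb_rd=1, mem_wb_reg_write=True))
```

For the double data hazard sequence the sketch picks ForwardA = 10, so the third add receives the result of the second add rather than the stale value from the first.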
Datapath with Forwarding
Load-Use Hazard Detection
▪ Check when using instruction is decoded in ID stage
▪ ALU operand register numbers in ID stage are given by
▪ IF/ID.RegisterRs1, IF/ID.RegisterRs2
▪ Load-use hazard when
▪ ID/EX.MemRead and
((ID/EX.RegisterRd = IF/ID.RegisterRs1) or
(ID/EX.RegisterRd = IF/ID.RegisterRs2))
▪ If detected, stall and insert bubble

The first line tests to see if the instruction is a load: the only instruction that reads data memory is a load.

The next two lines check to see if the destination register field of the load in the EX stage matches either source register of the instruction in the ID stage. If the condition holds, the instruction stalls one clock cycle.

After this 1-cycle stall, the forwarding logic can handle the dependence and execution proceeds.
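
The load-use condition is short enough to write directly as a predicate. A minimal Python sketch, with hypothetical names; the example instructions in the comments are illustrative and not taken from the slides:

```python
def load_use_hazard(id_ex_mem_read: bool, id_ex_rd: int,
                    if_id_rs1: int, if_id_rs2: int) -> bool:
    """True when the instruction in ID must stall behind a load in EX."""
    # MemRead identifies a load; the remaining tests compare its destination
    # register with both source registers of the instruction being decoded.
    return bool(id_ex_mem_read) and (id_ex_rd == if_id_rs1 or id_ex_rd == if_id_rs2)


# Illustrative case: a load into x2 is in EX while an instruction
# reading x2 and x5 is in ID.
print(load_use_hazard(True, 2, 2, 5))   # stall needed
# Same registers, but the instruction in EX is not a load.
print(load_use_hazard(False, 2, 2, 5))  # forwarding alone suffices
```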
Load-Use Data Hazard
A bubble is inserted beginning in clock cycle 4, by changing the and instruction to a nop.

The and instruction is fetched and decoded in clock cycles 2 and 3, but its EX stage is delayed until clock cycle 5 (versus the non-stalled position in clock cycle 4).

Likewise the or instruction is fetched in clock cycle 3, but its ID stage is delayed until clock cycle 5 (versus the unstalled clock cycle 4 position).

After insertion of the bubble, all the dependences go forward in time and no further hazards occur.

[Figure label: Stall inserted here]
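
The cycle numbers quoted above can be reproduced with a small scheduling sketch in Python. It assumes a simplified model: instruction i (counting from 0) is fetched in clock cycle i + 1, and a single one-cycle stall is triggered while the dependent instruction sits in ID. The names `stage_cycle` and `stalled_index` are hypothetical:

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]


def stage_cycle(index: int, stage: str, stalled_index=None) -> int:
    """Clock cycle in which instruction `index` first occupies `stage`.

    `stalled_index` is the instruction that stalls for one cycle in ID;
    it and every later instruction are pushed back by one cycle for all
    stage entries after the cycle in which the stall is detected.
    """
    cycle = index + 1 + STAGES.index(stage)
    if stalled_index is not None and index >= stalled_index:
        stall_detected_in = stalled_index + 2  # ID cycle of the stalled instruction
        if cycle > stall_detected_in:
            cycle += 1
    return cycle


# Instruction 0 is the load, 1 is the and, 2 is the or.
print(stage_cycle(1, "EX", stalled_index=1))  # EX of and
print(stage_cycle(2, "ID", stalled_index=1))  # ID of or
```

In this model the load reaches WB in clock cycle 5 and the and instruction reaches EX in clock cycle 5, so the dependence no longer points backward in time.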
Datapath with Hazard Detection
Branch Hazards OR Control Hazards
▪ If branch outcome determined in MEM

The numbers to the left of the instruction (40, 44, ...) are the addresses of the instructions.

Since the branch instruction decides whether to branch in the MEM stage (clock cycle 4 for the beq instruction), the three sequential instructions that follow the branch will be fetched and begin execution.

Without intervention, those 3 following instructions will begin execution before beq branches to ld at location 72.

[Figure labels: Flush these instructions (set control values to 0); PC]
More-Realistic Branch Prediction
▪ Branch Prediction
▪ A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds from that assumption rather than waiting to ascertain the actual outcome.
▪ Static branch prediction
▪ Based on typical branch behavior
▪ Example: loop and if-statement branches
▪ Predict backward branches taken
▪ Predict forward branches not taken
▪ Dynamic branch prediction
▪ Hardware measures actual branch behavior
▪ e.g., record recent history of each branch
▪ Assume future behavior will continue the trend
▪ When wrong, stall while re-fetching, and update history
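
Both prediction styles can be sketched in a few lines of Python. The dynamic predictor below keeps a single bit of history per branch address, which is only one possible way to "record recent history"; the slides do not say which scheme is meant. The class name, the function names, and the not-taken default for unseen branches are all assumptions:

```python
def static_predict(branch_pc: int, target_pc: int) -> bool:
    """Static rule: predict backward branches taken, forward branches not taken."""
    return target_pc < branch_pc


class OneBitPredictor:
    """Hypothetical dynamic predictor: remembers the last outcome of each branch."""

    def __init__(self):
        self.history = {}  # branch address -> last outcome (True = taken)

    def predict(self, branch_pc: int) -> bool:
        # Assumption: a branch with no recorded history is predicted not taken.
        return self.history.get(branch_pc, False)

    def update(self, branch_pc: int, taken: bool) -> None:
        self.history[branch_pc] = taken


def count_mispredictions(predictor: OneBitPredictor, branch_pc: int, outcomes) -> int:
    """Run one branch through the predictor and count wrong predictions."""
    wrong = 0
    for taken in outcomes:
        if predictor.predict(branch_pc) != taken:
            wrong += 1
        predictor.update(branch_pc, taken)
    return wrong


# A loop branch that is taken three times and then falls through.
print(count_mispredictions(OneBitPredictor(), 40, [True, True, True, False]))
```

With this scheme the loop branch is mispredicted on its first iteration and again on loop exit.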
Branch Prediction
▪ Longer pipelines can’t really determine branch outcome early
▪ Stall penalty becomes unacceptable

▪ Predict outcome of branch


▪ Only stall if prediction is wrong

▪ In RISC-V pipeline
▪ Can predict branches not taken
▪ Fetch instruction after branch, with no delay
Assume Branch Not Taken
▪ Stalling until the branch is complete is too slow. One improvement over branch
stalling is to predict that the branch will not be taken and thus continue execution
down the sequential instruction stream.
▪ If the branch is taken, the instructions that are being fetched and decoded must be
discarded. Execution continues at the branch target. If branches are untaken half the
time, and if it costs little to discard the instructions, this optimization halves the cost of
control hazards.
▪ To discard instructions, we merely change the original control values to 0, much as we did to stall for a load-use data hazard. The difference is that we must also change the three instructions in the IF, ID, and EX stages when the branch reaches the MEM stage.
▪ For load-use stalls, we just change control to zero in the ID stage and let them filter through the pipeline. Discarding instructions, then, means we must be able to flush instructions in the IF, ID, and EX stages of the pipeline.
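
The "halves the cost" claim can be checked with a short calculation. The Python sketch below assumes a penalty of three cycles per resolved branch (the three instructions in IF, ID, and EX when the branch reaches MEM) and assumes that discarding instructions adds no further cost. The function names are hypothetical:

```python
def cost_always_stall(penalty_cycles: int) -> float:
    """Average cycles lost per branch when every branch waits for its outcome."""
    return float(penalty_cycles)


def cost_predict_not_taken(penalty_cycles: int, taken_fraction: float) -> float:
    """Average cycles lost per branch when only taken branches pay the penalty."""
    return taken_fraction * penalty_cycles


print(cost_always_stall(3))            # every branch pays 3 cycles
print(cost_predict_not_taken(3, 0.5))  # half of the branches pay 3 cycles
```

With branches untaken half the time, the average cost drops from 3 cycles to 1.5 cycles per branch.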
Reducing Branch Delay
▪ Move hardware to determine outcome to ID stage
▪ Target address adder
▪ Register comparator
▪ Example: branch taken
36: sub x10, x4, x8
40: beq x1, x3, 16 // PC-relative branch
// to 40+16*2=72
44: and x12, x2, x5
48: or x13, x2, x6
52: add x14, x4, x2
56: sub x15, x6, x7
...
72: ld x4, 50(x7)
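
The target address in the comment above follows from adding twice the branch immediate to the address of the branch. A one-function Python sketch, with a hypothetical function name:

```python
def branch_target(branch_pc: int, immediate: int) -> int:
    """PC-relative target: the immediate is doubled and added to the branch address."""
    return branch_pc + immediate * 2


print(branch_target(40, 16))  # target of the beq at address 40
```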
Example: Branch Taken
Clock cycle 4 shows the instruction at location 72 being fetched and the single bubble or nop instruction in the pipeline as a result of the taken branch. (Since the nop is really addi x0, x0, 0, it’s arguable whether or not the ID stage in clock 4 should be highlighted.)
