Pipeline Hazards

Pipeline hazards are situations that prevent the next instruction from executing during its designated clock cycle, reducing performance. There are three classes of hazards: structural, data, and control hazards, each arising from different conflicts in instruction execution. Solutions to these hazards include stalling the pipeline, forwarding results, and implementing branch prediction techniques.

Uploaded by

jekitoc589
Pipeline Hazards
• There are situations, called hazards, that prevent the next instruction in the
instruction stream from executing during its designated clock cycle.
• Hazards reduce the performance from the ideal speedup gained by pipelining.
• There are three classes of hazards:
• 1. Structural hazards arise from resource conflicts when the hardware cannot
support all possible combinations of instructions simultaneously in overlapped
execution.
• 2. Data hazards arise when an instruction depends on the results of a previous
instruction in a way that is exposed by the overlapping of instructions in the pipeline.
• 3. Control hazards arise from the pipelining of branches and other instructions that
change the PC.
• Hazards in pipelines can make it necessary to stall the pipeline.
• Avoiding a hazard often requires that some instructions in the
pipeline be allowed to proceed while others are delayed.
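The performance cost of stalls can be put in numbers. The following is a minimal sketch, assuming an ideal CPI of 1, perfectly balanced stages, and no clock overhead from pipelining; under those assumptions the usual textbook relation is speedup = pipeline depth / (1 + stall cycles per instruction). The function name and the sample values are illustrative, not taken from the slides.

```python
def pipeline_speedup(depth: int, stalls_per_instruction: float) -> float:
    """Speedup over an unpipelined machine.

    Assumes ideal CPI = 1, balanced stages, and no pipelining overhead.
    """
    if depth < 1 or stalls_per_instruction < 0:
        raise ValueError("depth must be >= 1 and stalls must be >= 0")
    return depth / (1.0 + stalls_per_instruction)


# A 5-stage pipeline with no stalls reaches the ideal speedup of 5.
print(pipeline_speedup(5, 0.0))   # 5.0
# An average of 0.25 stall cycles per instruction (assumed value) lowers it.
print(pipeline_speedup(5, 0.25))  # 4.0
```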
Structural Hazards
• When a processor is pipelined, the overlapped execution of
instructions requires pipelining of functional units and duplication of
resources to allow all possible combinations of instructions in the
pipeline.
• If some combination of instructions cannot be accommodated
because of resource conflicts, the processor is said to have a
structural hazard.
Structural Hazards
• The most common instances of structural hazards arise
when some functional unit is not fully pipelined.
• Then a sequence of instructions using that unpipelined unit
cannot proceed at the rate of one per clock cycle.
• Another common way that structural hazards appear is
when some resource has not been duplicated enough to
allow all combinations of instructions in the pipeline to
execute.
• For example, a processor may have only one register-file
write port, but under certain circumstances, the pipeline
might want to perform two writes in a clock cycle. This will
generate a structural hazard
Structural Hazards
• When a sequence of instructions encounters this hazard, the pipeline
will stall one of the instructions until the required unit is available.
Solution 1 - Structural Hazard: stall
[Figure: pipeline timing diagram; instruction i+3 is stalled until clock cycle 5]
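One way a stall like this can arise is sketched below. It assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB) with a single memory port shared by instruction fetch and data access; that configuration is an assumption made for illustration, since the slides do not state which resource is in conflict. The function name `fetch_cycles` is hypothetical.

```python
def fetch_cycles(uses_memory):
    """Return the clock cycle in which each instruction is fetched.

    uses_memory[i] is True when instruction i accesses data memory in MEM.
    Assumption: one memory port serves both IF and MEM, and data access
    has priority, so a fetch waits whenever the port is busy.
    """
    MEM_OFFSET = 3      # MEM comes three stages after IF
    port_busy = set()   # cycles in which a data access owns the port
    fetch = []
    cycle = 1
    for accesses_data in uses_memory:
        while cycle in port_busy:
            cycle += 1  # structural hazard: stall the fetch for one cycle
        fetch.append(cycle)
        if accesses_data:
            port_busy.add(cycle + MEM_OFFSET)
        cycle += 1
    return fetch


# A load followed by four instructions that do not touch data memory:
# the load's MEM stage occupies cycle 4, so instruction i+3 is fetched in cycle 5.
print(fetch_cycles([True, False, False, False, False]))  # [1, 2, 3, 5, 6]
```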
Solution 2-Structural Hazard
Data Hazards

• A major effect of pipelining is to change the relative timing of
instructions by overlapping their execution.
• This overlap introduces data and control hazards.
• Data hazards occur when the pipeline changes the order of read/write
accesses to operands so that the order differs from the order seen by
sequentially executing instructions on an unpipelined processor.
• Consider a DADD instruction that writes R1, followed immediately by a
DSUB instruction that reads R1.
• The DADD instruction writes the value of R1 in the WB pipe stage, but
the DSUB instruction reads the value during its ID stage, which occurs
before DADD reaches WB. Unless the pipeline intervenes, DSUB reads the
old value of R1. This problem is called a data hazard.
Solution 1: Forwarding
• Feed the results held in the EX/MEM and MEM/WB pipeline registers
directly back to the ALU inputs.
• If the forwarding hardware detects that a previous ALU operation has
written the register corresponding to a source for the current ALU
operation, control logic selects the forwarded result as the ALU input.
• Generalized forwarding
- pass a result directly to the functional unit that requires it;
- forward results not only to ALU inputs but also to other types of
functional units.
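The selection made by the forwarding control logic can be sketched as a small function. This is an illustrative model rather than a description of any particular processor: the `Latch` type, the field names, and the return labels are hypothetical, and the check against register 0 assumes a MIPS-style register that is hardwired to zero.

```python
from typing import NamedTuple


class Latch(NamedTuple):
    """The part of a pipeline register that the forwarding logic inspects."""
    reg_write: bool  # the instruction in this latch will write a register
    rd: int          # destination register number


def select_alu_input(src_reg: int, ex_mem: Latch, mem_wb: Latch) -> str:
    """Choose where one ALU operand should come from."""
    # The EX/MEM result is the most recent one, so it takes priority.
    if ex_mem.reg_write and ex_mem.rd != 0 and ex_mem.rd == src_reg:
        return "EX/MEM"
    if mem_wb.reg_write and mem_wb.rd != 0 and mem_wb.rd == src_reg:
        return "MEM/WB"
    return "REGFILE"  # no hazard: use the value read during ID


# DADD R1,R2,R3 is in EX/MEM when the following DSUB R4,R1,R5 reaches EX.
print(select_alu_input(1, Latch(True, 1), Latch(False, 0)))  # EX/MEM
print(select_alu_input(5, Latch(True, 1), Latch(False, 0)))  # REGFILE
```

Checking EX/MEM before MEM/WB matters when two earlier instructions both write the same register: the operand must come from the later of the two.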
Data Hazards Requiring Stalls
• Unfortunately, not all potential data hazards can be handled by
bypassing.
• Consider the following sequence of instructions:
LD R1,0(R2)
DSUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
• The LD instruction does not have the data until the end of clock cycle 4 (its MEM
cycle), while the DSUB instruction needs to have the data by the beginning of that
clock cycle.
• Thus, the data hazard from using the result of a load instruction cannot be
completely eliminated with simple hardware; the dependent instruction must be stalled.
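The load-use case can be detected by comparing each load with the instruction that follows it. The sketch below assumes the classic five-stage pipeline with full forwarding, where a load followed immediately by an instruction that reads its result costs exactly one stall cycle. The tuple encoding of instructions and the function name are hypothetical, and the model conservatively treats every source register the same way.

```python
def load_use_stalls(program):
    """Count one-cycle load-use stalls.

    program is a list of (opcode, dest_register, source_registers) tuples.
    Assumption: five-stage pipeline with forwarding, so only a load
    followed immediately by a consumer of its result needs a stall.
    """
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        if prev_op == "LD" and prev_dest in cur_sources:
            stalls += 1
    return stalls


sequence = [
    ("LD",   "R1", ("R2",)),       # LD   R1,0(R2)
    ("DSUB", "R4", ("R1", "R5")),  # DSUB R4,R1,R5  needs R1 one cycle too early
    ("AND",  "R6", ("R1", "R7")),  # AND  R6,R1,R7  served by forwarding
    ("OR",   "R8", ("R1", "R9")),  # OR   R8,R1,R9  reads the register file
]
print(load_use_stalls(sequence))  # 1
```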
Solution 2 – STALL
ADD R1,R2,R3
SUB R4,R1,R5
AND R6,R1,R7
OR R8,R1,R9
• ADD R1,R2,R3
• LOAD R4,8(R1)
• STR R4,12(R1)
• ADD R1,R4,R3
• LOAD R1,0(R3)
• SUB R4,R1,R5
• AND R6,R4,R1
• OR R8,R4,R1
• The IF, ID and WB stages take one clock cycle each to complete the
operation. The number of clock cycles for the EX stage depends on
the instruction. The ADD and SUB instructions need 1 clock cycle and
the MUL instruction needs 3 clock cycles in the EX stage. Operand
forwarding is used in the pipelined processor. What is the number of
clock cycles taken to complete the following sequence of instructions?
• ADD R2, R1, R0    R2 <- R0 + R1
• MUL R4, R3, R2    R4 <- R3 * R2
• SUB R6, R5, R4    R6 <- R5 - R4
• A 5-stage pipelined processor has Instruction Fetch (IF), Instruction
Decode (ID), Operand Fetch (OF), Execution (EXE) and Write
Operand (WO) stages. The IF, ID, OF and WO stages take 1 clock cycle each for any
instruction. The EXE stage takes 1 clock cycle for ADD and SUB instructions, 3
clock cycles for MUL instructions, and 6 clock cycles for DIV instructions.
Operand forwarding is used in the pipeline. What is the number of
clock cycles needed to execute the following sequence of instructions?

• Instruction           Meaning of instruction
• I0: MUL R2, R0, R1    R2 <- R0 * R1
• I1: DIV R5, R3, R4    R5 <- R3 / R4
• I2: ADD R2, R5, R2    R2 <- R5 + R2
• I3: SUB R5, R2, R6    R5 <- R2 - R6
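Both cycle-counting exercises above can be checked with a small timing model. The sketch below is one possible model, not a definitive one. It assumes instructions issue in order, each stage holds at most one instruction, an instruction waits in its current stage until the next stage is free, and forwarding makes a result usable in the cycle after its producer finishes the execute stage. The function name and the tuple encoding of instructions are hypothetical.

```python
def total_cycles(program, stages, ex_stage, ex_latency):
    """Cycles needed to run `program` under the stated assumptions.

    program    : list of (opcode, dest_register, source_registers)
    stages     : stage names in order, e.g. ["IF", "ID", "EX", "WB"]
    ex_stage   : name of the multi-cycle execute stage
    ex_latency : opcode -> cycles spent in the execute stage
    """
    ex = stages.index(ex_stage)
    last = len(stages) - 1
    starts = []  # starts[i][s]: cycle in which instruction i enters stage s
    lats = []    # lats[i][s]: cycles instruction i needs in stage s

    for i, (op, _dest, srcs) in enumerate(program):
        lat = [1] * len(stages)
        lat[ex] = ex_latency[op]
        start = []
        for s in range(len(stages)):
            # Earliest cycle at which this instruction is done with stage s-1.
            ready = 1 if s == 0 else start[s - 1] + lat[s - 1]
            if i > 0:
                prev, prev_lat = starts[i - 1], lats[i - 1]
                # Stage s is free once the previous instruction has left it.
                free = prev[s + 1] if s < last else prev[s] + prev_lat[s]
                ready = max(ready, free)
            if s == ex:
                # Forwarding: wait for the latest producer of each source.
                for src in srcs:
                    for j in range(i - 1, -1, -1):
                        if program[j][1] == src:
                            ready = max(ready, starts[j][ex] + lats[j][ex])
                            break
            start.append(ready)
        starts.append(start)
        lats.append(lat)

    return starts[-1][last] + lats[-1][last] - 1


first = [("ADD", "R2", ("R1", "R0")),
         ("MUL", "R4", ("R3", "R2")),
         ("SUB", "R6", ("R5", "R4"))]
print(total_cycles(first, ["IF", "ID", "EX", "WB"], "EX",
                   {"ADD": 1, "SUB": 1, "MUL": 3}))  # 8

second = [("MUL", "R2", ("R0", "R1")),
          ("DIV", "R5", ("R3", "R4")),
          ("ADD", "R2", ("R5", "R2")),
          ("SUB", "R5", ("R2", "R6"))]
print(total_cycles(second, ["IF", "ID", "OF", "EXE", "WO"], "EXE",
                   {"ADD": 1, "SUB": 1, "MUL": 3, "DIV": 6}))  # 15
```

Under this model the first sequence takes 8 clock cycles and the second takes 15. In both cases the delay comes from later instructions waiting for the multi-cycle execute stage to become free; with forwarding, the data dependences add no further wait.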
Branch Hazards
• Control hazards, also called branch hazards, are caused by
branch instructions.
• Branch instructions control the flow of program
execution.
• During the IF stage, remember that the PC is incremented by 4 in
preparation for the IF cycle of the next instruction.
• What happens if a branch is taken and we aren’t simply
incrementing the PC by 4?
• The easiest way to deal with a branch is to perform
the IF stage again once the branch target is known.
• The instruction after the branch is fetched, but the instruction is
ignored, and the fetch is restarted once the branch target is
known.
• If the branch is not taken, the second
IF for the branch successor is redundant. This will be addressed
shortly.
Reducing Pipeline Branch Penalties
• First solution
• The simplest scheme to handle branches is to freeze or flush the pipeline, holding
or deleting any instructions after the branch until the branch destination is known.
• Second solution
• A slightly more complex scheme is to treat every branch as not taken.
In the simple five-stage pipeline, this predicted-not-taken or predicted-untaken
scheme is implemented by continuing to fetch instructions as if the branch were a
normal instruction.
• The pipeline looks as if nothing out of the ordinary is happening. If the branch is
taken, however, we need to turn the fetched instruction into a no-op and restart the
fetch at the target address.
• Another scheme in use in some processors is called delayed branch. This
technique was heavily used in early RISC processors and works reasonably well
in the five-stage pipeline.
• In a delayed branch, the execution cycle with a branch delay of one is:
• branch instruction
• sequential successor 1
• branch target if taken
• The sequential successor is in the branch delay slot.
• This instruction is executed whether or not the branch is taken.
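The difference between flushing on every branch and predicting not taken can be estimated with a simple CPI model. This is a rough sketch: it assumes a base CPI of 1 with no other stalls, and the branch frequency, taken fraction, and one-cycle penalty are made-up sample values, not figures from the slides.

```python
def effective_cpi(branch_freq, taken_fraction, penalty_taken, penalty_untaken):
    """Average cycles per instruction, assuming a base CPI of 1."""
    avg_penalty = (taken_fraction * penalty_taken
                   + (1.0 - taken_fraction) * penalty_untaken)
    return 1.0 + branch_freq * avg_penalty


# Assumed workload: 20% of instructions are branches and 60% of them are taken.
BRANCH_FREQ, TAKEN = 0.20, 0.60

# Freeze/flush: every branch pays the one-cycle penalty.
flush = effective_cpi(BRANCH_FREQ, TAKEN, penalty_taken=1, penalty_untaken=1)

# Predicted-not-taken: only taken branches pay the penalty.
not_taken = effective_cpi(BRANCH_FREQ, TAKEN, penalty_taken=1, penalty_untaken=0)

print(round(flush, 2), round(not_taken, 2))  # 1.2 1.12
```

A delayed branch fits the same model: when the delay slot holds a useful instruction, that branch contributes no penalty.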
Reducing the Cost of Branches
through Prediction
• Static Branch Prediction
• A key way to improve compile-time branch prediction is to use
profile information collected from earlier runs.
• The key observation that makes this worthwhile is that the behavior
of branches is often bimodally distributed; that is, an individual
branch is often highly biased toward taken or untaken
Dynamic Branch Prediction and
Branch-Prediction Buffers
• The simplest dynamic branch-prediction scheme is a branch-
prediction buffer or branch history table.
• A branch-prediction buffer is a small memory indexed by the lower
portion of the address of the branch instruction.
• The memory contains a bit that says whether the branch was recently
taken or not.
• This scheme is the simplest sort of buffer; it has no tags and is useful
only to reduce the branch delay when it is longer than the time to
compute the possible target PCs
Dynamic Branch Prediction and
Branch-Prediction Buffers
• If the branch is taken, the bit is set to 1; if it is not taken, the bit is cleared to 0. The next time the branch
instruction is fetched, the bit tells us what the branch did last time, and we
predict that it will do the same again.
• This scheme adds some “history” to our previous discussion on
“branch taken” and “branch not taken” control hazard avoidance.
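A one-bit branch-prediction buffer of this kind can be sketched as a small table indexed by the low-order bits of the branch address. The class name, the table size of 16 entries, and the assumption of 4-byte instructions (so the two lowest address bits are dropped) are choices made for this illustration.

```python
class OneBitBuffer:
    """Untagged branch-prediction buffer with one history bit per entry."""

    def __init__(self, entries=16):
        self.entries = entries      # assumed to be a power of two
        self.bits = [0] * entries   # 0 = last not taken, 1 = last taken

    def _index(self, pc):
        # Assumption: 4-byte instructions, so bits 0-1 of the PC carry no information.
        return (pc >> 2) % self.entries

    def predict(self, pc):
        """Predict taken if the entry's bit is 1."""
        return bool(self.bits[self._index(pc)])

    def update(self, pc, taken):
        """Record the actual outcome of the branch."""
        self.bits[self._index(pc)] = int(taken)


buf = OneBitBuffer()
buf.update(0x40, True)    # the branch at 0x40 was taken
print(buf.predict(0x40))  # True
print(buf.predict(0x44))  # False: a different entry, still at its initial value
print(buf.predict(0x80))  # True: 0x80 maps to the same entry as 0x40
```

The last line shows the consequence of having no tags: two branches whose addresses share the same low-order bits use the same history bit, so the prediction may have been recorded by a different branch.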
2-bit Prediction Scheme
• This method is more reliable than using a single bit to represent
whether the branch was recently taken or not.
• The use of a 2-bit predictor will allow branches that favor taken (or
not taken) to be mispredicted less often than the one-bit case.
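The benefit can be seen by counting mispredictions for a loop branch. The sketch below models the predictor as an n-bit saturating counter that predicts taken when the counter is in the upper half of its range; this is one common form of the 2-bit scheme, and with one bit it reduces to the single-bit predictor. The branch pattern (nine taken, one not taken, repeated ten times) and the initial counter value of 0 are assumptions for the example.

```python
def mispredictions(outcomes, bits):
    """Count wrong predictions made by one n-bit saturating counter."""
    top = (1 << bits) - 1   # largest counter value
    counter = 0             # assumed initial state: strongly not taken
    wrong = 0
    for taken in outcomes:
        predicted_taken = counter > top // 2
        if predicted_taken != taken:
            wrong += 1
        # Move toward the actual outcome, saturating at both ends.
        counter = min(top, counter + 1) if taken else max(0, counter - 1)
    return wrong


# A loop branch: taken nine times, then not taken on exit, executed ten times.
loop = ([True] * 9 + [False]) * 10

print(mispredictions(loop, bits=1))  # 20
print(mispredictions(loop, bits=2))  # 12
```

The one-bit predictor is wrong twice per loop execution, once on exit and once on re-entry. After warming up, the two-bit counter is wrong only on exit, because a single not-taken outcome does not flip its prediction.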

ENGR9861 Winter 2005 JPR
Datapath and control considerations

1. There are separate instruction and data caches that use
separate address and data connections to the processor. This
requires two versions of the MAR register, IMAR for accessing
the instruction cache and DMAR for accessing the data cache.
2. The PC is connected directly to the IMAR, so that the contents
of the PC can be transferred to IMAR at the same time that an
independent ALU operation is taking place.
3. The data address in DMAR can be obtained directly from the
register file or from the ALU to support the register indirect and
indexed addressing modes.
Datapath and control considerations
4. Separate MDR registers are provided for read and write
operations. Data can be transferred directly between these
registers and the register file during load and store operations
without the need to pass through the ALU.
5. Buffer registers have been introduced at the inputs and output
of the ALU. These are registers SRC1, SRC2, and RSLT.
Forwarding connections may be added if desired.
6. The instruction register has been replaced with an instruction
queue, which is loaded from the instruction cache.
Load / Store Architecture
• RISC is referred to as Load/Store architecture.
• Equivalently, the operations in its instruction set are defined as register-to-register
operations.
• The reason is that all RISC machine operations are between operands that reside in
the General Purpose Register File (GPR).
• The result of the operation is also written back to the GPR. Restricting the locations of the
operands to the GPR allows for determinism in the RISC operation.
• In other words, a potentially multi-cycle and unpredictable access to memory has
been separated from the operation. Once the operands are available in the GPR, the
operation can proceed in a deterministic fashion.
• It is almost certain that, once commenced, the operation will be completed in the number of
cycles determined by the pipeline depth, and the result will be written back into the GPR.
• Memory access is accomplished through Load and Store instructions
only, thus the term “Load/Store Architecture” is often used when
referring to RISC.
• The RISC pipeline is specified so that it accommodates
both operation and memory access with equal efficiency.
