Ca CT2
1. Explain the flow of execution in a pipeline using a simple diagram.
2. a. Consider the following instructions, executed on a five-stage pipeline:
lw R1, 0(R2)
[opcode illegible] R3, R4, R5
add R6, R1, R3
Explain the flow of execution in a pipeline using a simple diagram.
b. For the instruction mix above, on which instruction results does the last add instruction depend?
3. Indicate the type of hazards and data dependences for the following instructions: (5)
b. lw R5, -16(R5)
sw R5, -16(R5)
add R5, R5, R5
4. For the above sequence, add nop instructions to eliminate the hazards. Assume that there is no
forwarding in this pipelined processor.
5. Discuss the schemes for dealing with the pipeline stalls caused by branch hazards. (3)
ChatGPT
Sure, let's dive into the strategies for mitigating pipeline stalls caused by branch hazards.
1. Branch Prediction:
• Static Prediction: This involves predicting branches at compile-time based on the program's
structure. However, it may not be very accurate for dynamic and runtime-dependent branches.
• Dynamic Prediction: Here, the processor predicts the outcome of a branch during runtime.
Two-level predictors, tournament predictors, and neural branch predictors are examples of
dynamic prediction techniques.
2. Branch Target Buffer (BTB):
• A BTB is a cache that stores target addresses of recent branches. When a branch instruction
is encountered, the BTB is checked to fetch the predicted target address. This can help in
reducing stalls as the pipeline can continue with the predicted instruction.
3. Delayed Branching:
• The instruction in the delay slot immediately after the branch always executes, regardless of
the branch outcome, so nothing is flushed. The compiler tries to fill the slot with a useful
instruction that is safe to run on both paths; when it cannot, the slot holds a nop and that
cycle is wasted.
4. Out-of-Order Execution:
• Modern processors often use out-of-order execution, allowing instructions after a branch to be
executed even before the branch outcome is determined. If the branch prediction was correct,
the results are committed; otherwise, the pipeline is rolled back.
5. Speculative Execution:
From <https://chat.openai.com/c/cb22d345-9d92-4035-ad45-9cbc0fb32bb6>
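The dynamic prediction idea in point 1 can be sketched with the simplest common scheme, a 2-bit saturating counter per branch. This is only an illustration: the class name, the table size of 16, the PC-based indexing, and the initial counter state are all assumptions, not something taken from the notes above.

```python
# Hypothetical sketch of a 2-bit saturating-counter branch predictor.
# Counter states 0 and 1 predict "not taken"; states 2 and 3 predict "taken".

class TwoBitPredictor:
    def __init__(self, table_size=16):
        # table_size and the initial state 1 (weakly not taken) are assumptions
        self.table_size = table_size
        self.counters = [1] * table_size

    def _index(self, pc):
        # Instructions are word-aligned, so drop the low 2 bits of the PC
        return (pc >> 2) % self.table_size

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


def count_mispredictions(predictor, pc, outcomes):
    """Predict, compare with the real outcome, then train the predictor."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(pc) != taken:
            misses += 1
        predictor.update(pc, taken)
    return misses


# A loop branch that is taken 9 times and then falls through on exit
outcomes = [True] * 9 + [False]
misses = count_mispredictions(TwoBitPredictor(), 0x400, outcomes)
print(misses)
```

With this starting state the predictor misses the first iteration and the loop exit, so each misprediction costs a flush only twice in ten executions of the branch.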
4.13.1: lw loads a word from memory into a register, sw stores a register into memory, and add
adds its two operands together.
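a
I1: lw $1,40($6)
I2: add $6,$2,$2
I3: sw $6,50($1)
RAW on $1 from I1 to I3
RAW on $6 from I2 to I3
WAR on $6 from I1 to I2
b
I1: lw $5,-16($5)
I2: sw $5,-16($5)
I3: add $5,$5,$5
RAW on $5 from I1 to I2 and I3
WAR on $5 from I1 and I2 to I3
WAW on $5 from I1 to I3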
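The dependence lists for 4.13.1 can be cross-checked mechanically. The sketch below is a rough helper, not a full MIPS parser: the function names `parse` and `dependences` are made up here, and it only understands lw, sw, and add written in the `$n` register style used in these notes. It compares register names pairwise and does not account for intervening writes.

```python
import re

def parse(instr):
    """Return (dest, sources) for lw, sw, and add only (assumption)."""
    op, rest = instr.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op == "sw":
        return None, set(regs)        # sw writes memory and reads both registers
    return regs[0], set(regs[1:])     # lw/add: the first register is written

def dependences(program):
    """List (kind, register, earlier, later) with 1-based instruction numbers."""
    found = []
    for i in range(len(program)):
        dest_i, src_i = parse(program[i])
        for j in range(i + 1, len(program)):
            dest_j, src_j = parse(program[j])
            if dest_i is not None and dest_i in src_j:
                found.append(("RAW", dest_i, i + 1, j + 1))
            if dest_j is not None and dest_j in src_i:
                found.append(("WAR", dest_j, i + 1, j + 1))
            if dest_i is not None and dest_i == dest_j:
                found.append(("WAW", dest_i, i + 1, j + 1))
    return found

seq_a = ["lw $1,40($6)", "add $6,$2,$2", "sw $6,50($1)"]
seq_b = ["lw $5,-16($5)", "sw $5,-16($5)", "add $5,$5,$5"]
for kind, reg, earlier, later in dependences(seq_a) + dependences(seq_b):
    print(f"{kind} on {reg} from I{earlier} to I{later}")
```

For sequence a this reports the WAR on $6 only from I1 to I2, because sw (I3) reads $6 and does not write any register.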
4.13.2: In the basic five-stage pipeline, WAR and WAW dependences do not cause any hazards. Without
forwarding, any RAW dependence between an instruction and either of the next two instructions causes
a hazard (assuming the register read happens in the second half of the clock cycle and the register
write happens in the first half). The code that eliminates these hazards by inserting nop instructions is:
Instruction sequence
a
lw $1,40($6)
add $6,$2,$2
nop
nop
sw $6,50($1)
b
lw $5,-16($5)
nop
nop
sw $5,-16($5)
add $5,$5,$5
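The no-forwarding rule in 4.13.2 amounts to this: a consumer must sit at least three slots after the instruction that writes its source register, so that its ID stage is no earlier than the producer's WB stage. A minimal sketch of a nop inserter follows; the helper names and the `min_distance` parameter are assumptions for illustration, and only lw, sw, and add are handled.

```python
import re

def parse(instr):
    """Return (dest, sources) for lw, sw, and add only (assumption)."""
    op, rest = instr.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op == "sw":
        return None, set(regs)        # sw writes memory and reads both registers
    return regs[0], set(regs[1:])     # lw/add: the first register is written

def insert_nops(program, min_distance=3):
    """Pad with nops so each reader is >= min_distance slots after its writer."""
    out = []
    last_write = {}                   # register -> slot of its most recent writer
    for instr in program:
        dest, sources = parse(instr)
        # Earliest slot at which every source register can be read in ID
        ready = max((last_write[r] + min_distance
                     for r in sources if r in last_write), default=0)
        while len(out) < ready:
            out.append("nop")
        if dest is not None:
            last_write[dest] = len(out)
        out.append(instr)
    return out

seq_a = ["lw $1,40($6)", "add $6,$2,$2", "sw $6,50($1)"]
seq_b = ["lw $5,-16($5)", "sw $5,-16($5)", "add $5,$5,$5"]
print("\n".join(insert_nops(seq_a)))
print("\n".join(insert_nops(seq_b)))
```

For sequence a this yields two nops between add and sw, and for sequence b two nops between lw and sw, matching the two stall cycles counted for the no-forwarding case in 4.13.4.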
4.13.3: With full forwarding, an ALU instruction can forward a value to the EX stage of the next
instruction without a hazard. However, a load cannot forward to the EX stage of the next instruction
(but it can to the instruction after that). The code that eliminates these hazards by inserting nop
instructions is:
Instruction sequence
a
lw $1,40($6)
add $6,$2,$2
sw $6,50($1)
b
lw $5,-16($5)
nop
sw $5,-16($5)
add $5,$5,$5
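With full forwarding, the only remaining case in 4.13.3 is the load-use hazard: an instruction right after a lw that reads the loaded register. The sketch below encodes just that rule. The function names are made up for this example, and it makes the simplifying assumption that every source register (including the register stored by sw) is needed in EX.

```python
import re

def parse(instr):
    """Return (op, dest, sources) for lw, sw, and add only (assumption)."""
    op, rest = instr.split(None, 1)
    regs = re.findall(r"\$\d+", rest)
    if op == "sw":
        return op, None, set(regs)    # sw writes memory and reads both registers
    return op, regs[0], set(regs[1:])

def insert_load_use_nops(program):
    """Insert one nop after a lw whose result the very next instruction reads."""
    out = []
    prev_op, prev_dest = None, None
    for instr in program:
        op, dest, sources = parse(instr)
        if prev_op == "lw" and prev_dest in sources:
            out.append("nop")         # the loaded value is not ready for EX yet
        out.append(instr)
        prev_op, prev_dest = op, dest
    return out

seq_a = ["lw $1,40($6)", "add $6,$2,$2", "sw $6,50($1)"]
seq_b = ["lw $5,-16($5)", "sw $5,-16($5)", "add $5,$5,$5"]
print("\n".join(insert_load_use_nops(seq_a)))
print("\n".join(insert_load_use_nops(seq_b)))
```

Sequence a comes back unchanged, while sequence b gets a single nop between lw and sw.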
4.13.4 The total execution time is the clock cycle time times the number of cycles. Without any stalls, a
three-instruction sequence executes in 7 cycles (5 to complete the first instruction, then one per
remaining instruction). The execution without forwarding must add a stall cycle for every nop we had in
4.13.2, and execution with forwarding must add a stall cycle for every nop we had in 4.13.3. Overall, we get:
a, with full forwarding: 7 × 400ps = 2800ps
b, without forwarding: (7 + 2) × 200ps = 1800ps
b, with full forwarding: (7 + 1) × 250ps = 2000ps
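The arithmetic in 4.13.4 follows one formula: cycles = pipeline stages + (instructions - 1) + stalls, multiplied by the clock period. A short check, with the function name `total_time_ps` and its parameters invented for this sketch and the cycle times and stall counts taken from the three lines above:

```python
def total_time_ps(instructions, stalls, cycle_ps, stages=5):
    """Execution time of a short sequence on a simple in-order pipeline."""
    # The first instruction needs `stages` cycles; each later one adds a cycle,
    # and each stall (nop) adds one more.
    cycles = stages + (instructions - 1) + stalls
    return cycles * cycle_ps

print(total_time_ps(3, 0, 400))   # no stalls, 400ps clock
print(total_time_ps(3, 2, 200))   # two stalls, 200ps clock
print(total_time_ps(3, 1, 250))   # one stall, 250ps clock
```

The faster clock of the no-forwarding design can outweigh its extra stall cycles on a sequence this short, which is why the 200ps case finishes first.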
4.13.5 With ALU-ALU-only forwarding, an ALU instruction can forward to the next instruction, but not to
the second-next instruction (because that would be forwarding from MEM to EX). A load cannot forward
at all, because its value is not available until the MEM stage.
Instruction sequence
a
lw $1,40($6)
add $6,$2,$2
nop
sw $6,50($1)
b
lw $5,-16($5)
nop
nop
sw $5,-16($5)
add $5,$5,$5
4.13.6: The total execution times of these instruction sequences with ALU-ALU-only forwarding and
with no forwarding are given below.
a, with ALU-ALU-only forwarding: (7 + 1) × 360ps = 2880ps
b, without forwarding: (7 + 2) × 200ps = 1800ps
b, with ALU-ALU-only forwarding: (7 + 2) × 220ps = 1980ps