Pipeline in ARM
1. Register Bank – It stores the state of the processor. It is used in arithmetic operations,
intermediate variable storage, temporary address storage, etc. The register bank pictured
above has 5 ports. The ports with red dots are the outgoing read ports. The ports with the
blue dots are the incoming write ports. The ports with the yellow dots are the read and
write ports of the program counter. The program counter requires special read and
write ports because it is responsible for holding the address of the next instruction. Through
the special write port, the updated address is written into the program counter. Through
the special read port, the current instruction address is read from the program counter.
2. The ALU or the Arithmetic Logic Unit performs numerical and logical operations. It does
not increment the PC.
3. The barrel shifter can shift or rotate one operand by any number of bits in a single
operation, using purely combinational logic instead of sequential logic. Barrel shifters are
also used in floating point arithmetic hardware.
4. The address register and incrementer select and hold all memory addresses and
generate sequential addresses when required.
Three stage pipeline:
There are three stages in this pipeline method:
1. Fetch – The instruction is fetched from the memory and stored in the instruction register.
2. Decode – The instruction is moved to the decoder, which decodes the instruction. It
activates the appropriate control signals and prepares everything needed for the
execute stage that follows.
3. Execute – The instruction is executed. Data transfer, logical and arithmetic operations all
take place during this stage.
PC behaviour:
During the execute stage of the first instruction, the third instruction is in the fetch stage.
That means the address in the program counter points to the third instruction. Thus the PC
always points 8 bytes ahead of the instruction currently being executed. (The size of one
instruction in ARM is 4 bytes. If the address of the current instruction is x, the next
instruction is at x+4 and the one after that is at x+8.)
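The PC offset described above can be checked with a few lines of Python. This is only a sketch of the arithmetic for the three-stage pipeline with 4-byte instructions; the function name and the sample address 0x8000 are made up for illustration.

```python
INSTRUCTION_SIZE = 4   # bytes per ARM instruction, as stated in the text
FETCH_LEAD = 2         # fetch runs two instructions ahead of execute (three-stage pipeline)


def pc_during_execute(instruction_address: int) -> int:
    """Value held in the PC while the instruction at `instruction_address` executes."""
    return instruction_address + FETCH_LEAD * INSTRUCTION_SIZE


x = 0x8000  # hypothetical address of the instruction being executed
print(hex(pc_during_execute(x)))  # 0x8008
```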
Advantages of pipe-lining:
2. CPU’s ALU can be designed to work faster. But this requires complex hardware.
Disadvantages of pipe-lining:
2. In a pipelined processor, the insertion of flip flops between modules increases the
instruction latency compared to a non-pipelined processor.
5. Not all instructions are independent of each other. The output of one instruction may be
the input to another instruction. In such cases, the pipeline must be stalled until the first
instruction has completed execution, because its output is needed by the subsequent
dependent instruction.
6. When writing assembly code, it is assumed that each instruction is executed after the
previous one has finished. When the pipeline does not honour this assumption, the program
behaves unexpectedly or incorrectly. Such situations are known as hazards.
Five Stage Pipeline in ARM
The time taken to execute a given program, Tprog, is given by:
Tprog = (Ninst × CPI) / fclk
where Ninst is the number of ARM instructions executed in the course of the program, CPI is
the average number of clock cycles per instruction and fclk is the processor’s clock frequency.
To increase the performance of the processor, we need to decrease the time taken to
execute the program. To decrease Tprog, we can reduce CPI or increase fclk (for a given
program and compiler, Ninst is constant).
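As a rough illustration of how CPI and fclk affect Tprog, the relationship between Ninst, CPI and fclk can be evaluated directly. The instruction count, CPI and clock frequencies below are invented numbers chosen only to show the trend, not measurements of any ARM core.

```python
def program_time(n_inst: int, cpi: float, f_clk_hz: float) -> float:
    """Execution time in seconds: Tprog = Ninst * CPI / fclk."""
    return n_inst * cpi / f_clk_hz


# Hypothetical workload: one million instructions.
baseline = program_time(1_000_000, cpi=1.5, f_clk_hz=50e6)
faster_clock = program_time(1_000_000, cpi=1.5, f_clk_hz=100e6)
lower_cpi = program_time(1_000_000, cpi=1.0, f_clk_hz=50e6)

print(f"baseline:     {baseline * 1e3:.1f} ms")
print(f"double fclk:  {faster_clock * 1e3:.1f} ms")
print(f"reduced CPI:  {lower_cpi * 1e3:.1f} ms")
```

Doubling fclk halves Tprog, and lowering CPI from 1.5 to 1.0 cuts it by a third, which is why the five-stage design targets both.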
In three stage pipe-lining, during the execute stage, data transfer from the source (from memory
to register or from register to ALU), data processing (arithmetic or logical operation in the
ALU) and data transfer to the destination all have to take place during a single clock cycle.
Because so many operations need to be done in a single clock cycle, the time
period of the clock cycle must be sufficiently long. This means fclk cannot be increased above
a certain maximum value (as fclk increases, the time period of one clock cycle decreases).
But if we split the execute stage into three stages, namely Execute, Buffer and Write back,
then each of these stages has only a small number of operations to complete in
one cycle. As a result, we can decrease the time period of one clock cycle and
correspondingly increase fclk. This is the 5-stage pipeline method.
The five stages of pipeline are:
1. Fetch – The instruction is fetched from the memory and stored in the instruction register.
2. Decode – The instruction is moved to the decoder, which decodes the instruction. It
activates the appropriate control signals and prepares everything needed for the
execute stage that follows.
3. Execute – An operand is shifted and the ALU result generated. If the instruction is a load
or store, the memory address is computed in the ALU.
4. Buffer/Data – Data memory is accessed if required. Otherwise the ALU result is simply
buffered for one cycle.
5. Write back – The results generated by the instruction are written back to the register file,
including any data loaded from memory.
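The overlap between the five stages can be made concrete with a small scheduling sketch. It assumes an ideal pipeline with no stalls, where a new instruction enters Fetch every cycle; the stage abbreviations (IF, ID, EX, MM, WB) and the function names are chosen for illustration.

```python
STAGES = ["IF", "ID", "EX", "MM", "WB"]  # Fetch, Decode, Execute, Buffer/Data, Write back


def schedule(instructions):
    """For each instruction, map clock cycle -> stage, assuming no stalls.

    Instruction number i (0-based) enters Fetch in cycle i + 1.
    """
    return [
        {i + 1 + offset: stage for offset, stage in enumerate(STAGES)}
        for i in range(len(instructions))
    ]


def render(instructions):
    """Return the schedule as text, one row per instruction."""
    rows = schedule(instructions)
    total_cycles = len(instructions) + len(STAGES) - 1
    cycles = range(1, total_cycles + 1)
    lines = [" ".join(f"{c:>2}" for c in cycles)]
    for row, text in zip(rows, instructions):
        cells = " ".join(f"{row.get(c, '--'):>2}" for c in cycles)
        lines.append(f"{cells}  {text}")
    return "\n".join(lines)


print(render(["instr 1", "instr 2", "instr 3", "instr 4"]))
```

With N instructions the ideal pipeline finishes in N + 4 cycles, because one instruction completes every cycle once the pipeline is full.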
One of the major disadvantages of pipelining is the occurrence of hazards. Hazards are
situations that prevent the pipeline from functioning properly, so that instructions cannot
be executed correctly. There are three types of hazards.
1. Structural Hazards:
Structural hazards are the result of resource conflicts in the hardware, when the hardware
cannot support all possible combinations of instructions in simultaneous execution. For
example, in clock cycle 4, the load instruction is in the data/buffer stage and instruction 3
is in the instruction fetch stage. Both are memory access operations. In a von Neumann
architecture, where both data and instructions are stored in the same memory, this causes a
resource conflict. The two instructions cannot access the memory simultaneously.
There are two possible solutions to this problem.
1. Stall the instruction fetch of the 3rd instruction for one clock cycle.
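The memory conflict described above can be located mechanically. The sketch below assumes a single shared memory port, an otherwise stall-free five-stage pipeline, and 0-based instruction numbers (so a load at position 0 reaches the data/buffer stage in cycle 4); the function name is hypothetical.

```python
def memory_conflicts(is_memory_op):
    """Find cycles in which a data access and an instruction fetch collide.

    is_memory_op[i] is True if instruction i (0-based) is a load or store.
    Instruction i is fetched in cycle i + 1 and is in the data/buffer stage
    in cycle i + 4. Returns (cycle, data_instr, fetched_instr) tuples.
    """
    conflicts = []
    for i, uses_memory in enumerate(is_memory_op):
        if not uses_memory:
            continue
        data_cycle = i + 4
        fetched = data_cycle - 1  # index of the instruction fetched in that cycle
        if fetched < len(is_memory_op):
            conflicts.append((data_cycle, i, fetched))
    return conflicts


# A load followed by three instructions that do not access data memory.
print(memory_conflicts([True, False, False, False]))  # [(4, 0, 3)]
```

Each reported tuple marks a cycle in which the fetch of the later instruction would have to be delayed by one cycle.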
2. Data Hazards:
Data hazards occur when successive instructions are not independent of each other. In
other words, the input of an instruction depends upon the output of a previous instruction.
In the example given below, we can see that SUB, AND, OR and XOR take the value in R1 as
an input. We can also see that the first instruction, ADD, modifies the value in R1. The result
of the ADD operation will be written back into R1 only in clock cycle 5, but SUB needs the
updated value of R1 in the fourth cycle itself. This causes the pipeline to malfunction. This
is called a data hazard.
Under the forwarding mechanism, the result of each ALU operation is fed back to the
ALU inputs. If the processor detects that an instruction
needs the output of the previously executed instruction, it selects this forwarded value
instead of the value read from the register. Otherwise it simply uses the data read from the
registers during the decode stage.
For example, ADD finishes its execution in the third cycle and forwards its output to the ALU
as a possible input. During the fourth cycle, the processor detects that the SUB instruction
requires the updated value of R1. R1 has not been updated yet (it will be updated in cycle 5),
but the result of the ADD operation is already available at the input of the ALU. Instead of
reading the R1 register, SUB simply makes use of this input. This is called forwarding. This
code can now execute without stalling.
Another solution to the data hazard is stalling, but this decreases efficiency and performance.
There are, however, certain data hazards that cannot be solved with forwarding alone and
require stalling as well. For example, a load instruction has a latency that cannot be
eliminated by forwarding, because the loaded value is only available after the data/buffer
stage. Hence stalling is necessary.
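The difference between an ALU result and a loaded value can be sketched as a small calculation of the stall cycles that remain once forwarding is in place. This is a simplified model, not a description of any specific ARM core: it assumes an ALU result can be forwarded at the end of Execute and a loaded value at the end of Buffer/Data. The Instr type, the LDR mnemonic and all registers other than R1 are hypothetical.

```python
from typing import NamedTuple, Tuple

EX_STAGE = 3    # stage numbers: 1 Fetch, 2 Decode, 3 Execute, 4 Buffer/Data, 5 Write back
DATA_STAGE = 4


class Instr(NamedTuple):
    op: str
    dest: str
    srcs: Tuple[str, ...]


def stall_cycles(producer: Instr, consumer: Instr, distance: int = 1) -> int:
    """Stall cycles needed with forwarding when `consumer` is issued
    `distance` instructions after `producer`."""
    if producer.dest not in consumer.srcs:
        return 0  # no dependency, so no hazard
    # The producer is fetched in cycle 1, so its value exists at the end of this cycle:
    ready_cycle = DATA_STAGE if producer.op == "LDR" else EX_STAGE
    # Without stalls, the consumer would enter Execute in this cycle:
    consumer_ex_cycle = EX_STAGE + distance
    return max(0, ready_cycle + 1 - consumer_ex_cycle)


add = Instr("ADD", "R1", ("R2", "R3"))
sub = Instr("SUB", "R4", ("R1", "R5"))
ldr = Instr("LDR", "R1", ("R6",))

print(stall_cycles(add, sub))  # 0: forwarding alone is enough
print(stall_cycles(ldr, sub))  # 1: load followed by a dependent instruction
```

Placing one independent instruction between the load and its consumer (distance=2) removes the stall in this model.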
3. Control Hazards:
A control hazard occurs when the pipeline makes a wrong decision on branch prediction and
therefore brings instructions into the pipeline that must subsequently be discarded. The term
branch hazard also refers to a control hazard.
In the example below, beqz (Branch if Equal to Zero) is a conditional branch instruction.
When beqz is in the instruction decode stage, the sub instruction is in the execute stage. The
beqz instruction depends on the value of R1, but R1 has not yet been modified by the sub
instruction. That will happen only two cycles later.
|_1__|_2__|_3__|_4__|_5__|_6__|_7__|_8__|_9__|_10_|  Instructions
|_IF_|_ID_|_EX_|_MM_|_WB_|                           ld r2, 0(r4)
     |_IF_|_ID_|_EX_|_MM_|_WB_|                      ld r3, 4(r4)
          |_IF_|_ID_|_EX_|_MM_|_WB_|                 sub r1, r2, r3
               |_IF_|_ID_|_EX_|_MM_|_WB_|            beqz r1, L1
Further, we don’t know which instruction to fetch after the beqz instruction. If the condition
is true, we have to fetch the instruction at the branch target. If the condition is false,
we have to continue with the next instruction in sequence. Ideally, at cycle 5 we should
fetch a new instruction, but we still don’t know the result of the condition at this stage.
The simplest method is to stall the pipeline until the MEM stage, when we will know the
result of the condition. If we don’t do that and instead go ahead with the
pipelining, assuming one result or the other, there is a chance that our assumption is
wrong. In other words, we have made a wrong decision on the branch prediction. This is
called a control hazard.
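The cost of the stalling approach can be estimated with a short sketch. It assumes the branch outcome becomes available at the end of the stage in which the branch is resolved, and that no prediction is attempted; the function names are made up for illustration.

```python
STAGES = ["IF", "ID", "EX", "MM", "WB"]


def branch_stall_cycles(resolve_stage: str) -> int:
    """Fetch cycles lost while waiting for a branch resolved in `resolve_stage`."""
    return STAGES.index(resolve_stage)


def next_fetch_cycle(branch_fetch_cycle: int, resolve_stage: str) -> int:
    """First cycle in which the instruction after the branch can be fetched."""
    return branch_fetch_cycle + 1 + branch_stall_cycles(resolve_stage)


# beqz is fetched in cycle 4 of the example; without stalls the next fetch would be cycle 5.
print(branch_stall_cycles("MM"))   # 3
print(next_fetch_cycle(4, "MM"))   # 8
```

Resolving the branch in an earlier stage shrinks this penalty, which is the motivation for guessing the outcome instead of always stalling.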