
CS 3351 Digital Principles and Computer Organization

UNIT IV PROCESSOR

Uploaded by

Dr.Kalaivazhi

UNIT 4 PROCESSOR

Instruction Execution – Building a Data Path – Designing a Control Unit – Hardwired Control,
Microprogrammed Control – Pipelining – Data Hazard – Control Hazards.

4.1 Instruction Cycle

Instructions are part of a program stored in memory. To execute an instruction, the processor first
fetches it from memory, then decodes it, and then executes it. This whole process is known as the instruction
cycle.

In a basic computer, each instruction cycle consists of the following phases:

• Fetch the instruction from memory.
• Decode the instruction.
• Read the effective address from memory if the instruction uses an indirect address.
• Execute the instruction.

1. After these four phases are complete, control returns to the first step and the same process repeats for
the next instruction.
2. The cycle therefore continues until a Halt condition is met.
3. Figure 4.1 shows the phases of the instruction cycle.
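
The four phases above can be sketched as a small simulation. This is only an illustrative toy machine: the opcodes (LOAD, ADD, STORE, HALT), the tuple encoding of an instruction, and the single accumulator are assumptions made for the example and do not come from these notes.

```python
# Toy accumulator machine (hypothetical encoding, for illustration only).
# Each instruction is a tuple: (opcode, addressing mode, address).
def run(memory):
    pc = 0    # program counter
    acc = 0   # accumulator
    while True:
        instruction = memory[pc]           # fetch
        pc += 1                            # PC now points to the next instruction
        opcode, mode, address = instruction  # decode
        if opcode == "HALT":               # Halt condition ends the cycle
            return acc
        if mode == "indirect":             # read the effective address from memory
            address = memory[address]
        if opcode == "LOAD":               # execute
            acc = memory[address]
        elif opcode == "ADD":
            acc += memory[address]
        elif opcode == "STORE":
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode: {opcode}")

memory = [
    ("LOAD", "direct", 5),     # acc = memory[5]
    ("ADD", "indirect", 6),    # acc += memory[memory[6]]
    ("STORE", "direct", 8),    # memory[8] = acc
    ("HALT", "direct", 0),
    0,                         # address 4: unused
    7,                         # address 5: data
    7,                         # address 6: pointer to address 7
    35,                        # address 7: data
    0,                         # address 8: result
]
result = run(memory)
```

With this memory image the loop runs three instructions and then halts, leaving the sum of the two data words in both the accumulator and address 8.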

Figure 4.1 Instruction Cycle


Fetch Cycle

• The program counter (PC) holds the address of the instruction to be executed.
• The processor fetches the instruction from the memory location pointed to by the PC.
• Next, the PC is incremented so that it holds the address of the next instruction.
• The fetched instruction is loaded into the instruction register (IR).
• The processor interprets the instruction and performs the required actions.

Dr.V.Kalaivaazhi B.E.,M.Tech.,Ph.D Page 1


Execute Cycle

During execution, data is transferred in one of two ways:

Processor-memory: data is transferred from the processor to memory or from memory to the processor.
Processor-Input/Output: data is transferred between the processor and a peripheral device through an
I/O device.

• Together, these two kinds of transfer complete the execute cycle.

Instruction cycle state transition diagram

Figure 4.2 State transition Diagram for Instruction Cycle

Instruction execution:

Executing an instruction involves the following steps:

• The program counter (PC) supplies the address of the instruction to be fetched from memory.
• Once the instruction has been fetched, its opcode is decoded.
• Decoding tells the processor how many operands the instruction has. If an operand must be fetched from
memory, its address is calculated.
• The operands are fetched from memory. If there is more than one operand, the address calculation and
operand fetch are repeated for each one.
• The data operation is then performed on the operands, and a result is generated.
• If the result is to be stored in a register, the instruction ends here.
• If the destination is memory, the destination address is calculated first and the result is then stored in
memory. If several results must be stored in memory, the destination address calculation and store are
repeated for each one.
• The current instruction has now been executed. Meanwhile, the PC has been incremented to hold the address
of the next instruction.
• The instruction cycle then repeats for the following instructions.

4.2 Building a Data Path


Data path element:
• A unit used to operate on or hold data within a processor.
• In the MIPS implementation, the data path elements include the instruction and data memories,
the register file, the ALU, and adders.

1. Figure 4.3a shows the first element needed: a memory unit that stores the instructions of a program and
supplies an instruction when given an address.
• The instruction memory need only provide read access because the data path does not write
instructions.
• Because the instruction memory is only read, it is treated as combinational logic: the output at any
time reflects the contents of the location specified by the address input.
• No read control signal is needed.
2. Figure 4.3b shows the program counter (PC), a register that holds the address of the current instruction.
• The program counter is a 32-bit register that is written at the end of every clock cycle, so it does
not need a write control signal.
3. Figure 4.3c shows the adder needed to increment the PC to the address of the next instruction.
• The adder is an ALU wired to always add its two 32-bit inputs and place the sum on its output.

Figure 4.3 Elements needed for data path design
Figure 4.4 Combination of three elements

4. Figure 4.4 shows how to combine the three elements from Figure 4.3 to form a data path that fetches
instructions and increments the PC to obtain the address of the next sequential instruction.
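
As a rough behavioural sketch of this fetch portion of the data path, the three elements can be modelled as below. The dictionary standing in for instruction memory, the function name `fetch`, and the sample instructions are assumptions for illustration; the increment of 4 reflects the 4-byte MIPS instruction size.

```python
# Hypothetical instruction memory: byte address -> instruction text.
INSTRUCTION_MEMORY = {
    0: "lw $t1, 0($t2)",
    4: "add $s0, $t0, $t1",
    8: "sw $s0, 4($t2)",
}

def fetch(pc):
    """One clock cycle of the fetch data path."""
    instruction = INSTRUCTION_MEMORY[pc]   # combinational read, no read control signal
    next_pc = (pc + 4) & 0xFFFFFFFF        # adder wired to add 4, 32-bit result
    return instruction, next_pc

pc = 0
trace = []
for _ in range(3):
    instruction, pc = fetch(pc)            # PC is written at the end of every cycle
    trace.append(instruction)
```

After three cycles the PC holds 12 and `trace` contains the three instructions in program order.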

R-format instructions
1. R-type (arithmetic-logical) instructions all read two registers, perform an ALU operation on the contents
of those registers, and write the result to a register.
2. This instruction class includes add, sub, AND, OR, and slt.
3. The processor’s 32 general-purpose registers are stored in a structure called a register file.
• It is a collection of registers in which any register can be read or written by specifying the number of
the register in the file.
• The register file contains the register state of the computer. An ALU is also needed to operate on the
values read from the registers.
4. R-format instructions have three register operands, so each instruction must
a. Read two data words from the register file. For each word to be read, the register file needs
• an input that specifies the number of the register to be read, and
• an output that carries the value read from that register.
b. Write one data word into the register file. A write needs
• two inputs: one to specify the number of the register to be written and one to supply the data to be
written into the register.
The register file always outputs the contents of whatever register numbers are on the Read register inputs.
5. Writes are controlled by the write control signal, which must be asserted for a write to occur at the clock
edge.
6. Figure 4.5a shows the register file used by R-format instructions: a total of four inputs are needed (three
for register numbers and one for data) and two outputs (both for data).
7. The register number inputs are 5 bits wide to specify one of 32 registers (32 = 2^5), whereas the data input
and the two data output buses are each 32 bits wide.
8. Figure 4.5b shows the ALU, which takes two 32-bit inputs and produces a 32-bit result, as well as a 1-bit
signal that indicates whether the result is 0.
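
The register file and ALU described above can be sketched behaviourally as follows. This is a simplified model under stated assumptions: the class and function names are hypothetical, operations are selected by name rather than by an encoded ALU control value, and clock edges are not modelled.

```python
MASK32 = 0xFFFFFFFF

def to_signed(value):
    """Interpret a 32-bit pattern as a two's-complement integer."""
    return value - (1 << 32) if value & 0x80000000 else value

class RegisterFile:
    def __init__(self):
        self.registers = [0] * 32          # 32 registers, 5-bit register numbers

    def read(self, read_register_1, read_register_2):
        # Reads need no control signal: outputs follow the register numbers.
        return self.registers[read_register_1], self.registers[read_register_2]

    def write(self, write_register, write_data, reg_write):
        # A write happens only when the write control signal is asserted.
        if reg_write:
            self.registers[write_register] = write_data & MASK32

def alu(a, b, operation):
    """Return (32-bit result, zero flag) for the R-format operations."""
    if operation == "add":
        result = a + b
    elif operation == "sub":
        result = a - b
    elif operation == "and":
        result = a & b
    elif operation == "or":
        result = a | b
    elif operation == "slt":
        result = int(to_signed(a) < to_signed(b))
    else:
        raise ValueError(f"unsupported operation: {operation}")
    result &= MASK32
    return result, int(result == 0)

register_file = RegisterFile()
register_file.write(8, 5, reg_write=True)
register_file.write(9, 5, reg_write=True)
register_file.write(10, 99, reg_write=False)   # control not asserted: no write
a, b = register_file.read(8, 9)
difference, zero = alu(a, b, "sub")
```

Subtracting two equal register values gives a result of 0 and sets the zero output, which is the signal the branch logic later relies on.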

Figure 4.5 Elements of R-Format instruction
Figure 4.6 Elements of loads and stores
Implementation of loads and stores
1. The MIPS load word and store word instructions compute a memory address by adding the base register
($t2 in the examples below) to the 16-bit signed offset field contained in the instruction.
lw $t1,offset_value($t2)
sw $t1,offset_value($t2)
2. A sign-extend unit is needed to extend the 16-bit offset field in the instruction to a 32-bit signed value,
along with a data memory unit to read from or write to, as shown in Figure 4.6.
3. Sign-extend: to increase the size of a data item by replicating the high-order sign bit of the original data
item in the high-order bits of the larger destination data item.



4. The data memory must be written on store instructions; hence, it has read and write control signals, an
address input, and an input for the data to be written into memory.
5. Diagram explanation
• The memory unit is a state element with inputs for the address and the write data, and a single output
for the read result.
• There are separate read and write controls, although only one of these may be asserted on any given
clock.
• The memory unit needs a read signal, since, unlike the register file, reading the value of an invalid
address can cause problems.
• The sign extension unit has a 16-bit input that is sign-extended into a 32-bit result appearing on the
output.
• The data memory is assumed to be edge-triggered for writes. Standard memory chips actually have a
write enable signal that is used for writes.
• Although the write enable is not edge-triggered, our edge-triggered design could easily be adapted to
work with real memory chips.
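
A minimal sketch of the sign-extension unit and the address calculation used by lw and sw is given below. The function names are hypothetical, and registers and the offset field are represented as unsigned bit patterns.

```python
def sign_extend16(value):
    """Replicate bit 15 of a 16-bit field into the upper 16 bits of a 32-bit word."""
    value &= 0xFFFF
    if value & 0x8000:               # sign bit set: fill the upper half with ones
        return value | 0xFFFF0000
    return value                     # sign bit clear: upper half stays zero

def effective_address(base, offset16):
    """Memory address for lw/sw: base register plus sign-extended offset."""
    return (base + sign_extend16(offset16)) & 0xFFFFFFFF

positive = effective_address(0x1000, 0x0008)   # offset +8
negative = effective_address(0x1000, 0xFFFC)   # offset -4
```

The second call shows why the extension must copy the sign bit: the field 0xFFFC encodes -4, so the address moves backwards from the base.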
Branch instructions
beq $t1,$t2,offset
1. The beq instruction has three operands:
• two registers that are compared for equality, and
• a 16-bit offset used to compute the branch target address relative to the branch instruction address.
2. Branch target address
• The address specified in a branch, which becomes the new program counter (PC) if the branch is
taken.
• In the MIPS architecture, the branch target is the sum of the offset field of the instruction and
the address of the instruction following the branch.
3. Definition of branch instructions
a. The ISA specifies that the base for the branch address calculation is the address of the instruction
following the branch.
b. It also states that the offset field is shifted left 2 bits so that it is a word offset; this shift increases the
effective range of the offset field by a factor of 4.
4. Branch taken
• A branch where the branch condition is satisfied and the program counter (PC) becomes the branch
target. All unconditional jumps are taken branches.
5. Branch not taken (untaken branch)
• A branch where the branch condition is false and the program counter (PC) becomes the address of
the instruction that sequentially follows the branch.
Thus, the branch data path must do two operations:
i. Compute the branch target address.
ii. Compare the register contents.
6. Figure 4.7 shows the structure of the data path segment that handles branches. The data path for a branch
uses
• the ALU, which evaluates the branch condition, and
• a separate adder, which computes the branch target as the sum of the incremented PC and the
sign-extended branch displacement shifted left 2 bits.
7. Shift left 2 unit: simply routes the signals between input and output so that 00 (binary) is appended to the
low-order end of the sign-extended offset field; no actual shift hardware is needed, since the amount of the
“shift” is constant.
8. Since the offset was sign-extended from 16 bits, the shift throws away only “sign bits.”



9. Control logic: decides whether the incremented PC or the branch target should replace the PC, based on the
Zero output of the ALU.
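
The branch steps above can be put together in a short behavioural sketch. The function names are hypothetical; the comparison is modelled as an ALU subtraction whose Zero output selects the next PC, as the text describes.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Extend a 16-bit field to 32 bits by replicating its sign bit."""
    value &= 0xFFFF
    return value | 0xFFFF0000 if value & 0x8000 else value

def beq_next_pc(pc, rs_value, rt_value, offset16):
    """Next PC for beq: branch target if the registers are equal, else PC + 4."""
    incremented_pc = (pc + 4) & MASK32
    zero = int(((rs_value - rt_value) & MASK32) == 0)      # ALU subtract, Zero output
    word_offset = (sign_extend16(offset16) << 2) & MASK32  # shift left 2
    branch_target = (incremented_pc + word_offset) & MASK32  # separate adder
    return branch_target if zero else incremented_pc       # selection by control logic

taken = beq_next_pc(0x100, 7, 7, 3)           # equal registers, offset +3 words
not_taken = beq_next_pc(0x100, 7, 8, 3)       # unequal registers
backward = beq_next_pc(0x100, 7, 7, 0xFFFE)   # equal registers, offset -2 words
```

Note that the offset counts words from the instruction after the branch, so an offset of 3 from address 0x100 lands at 0x104 + 12.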

Figure 4.7. The data path for a branch instruction


Creating a Single Data Path
1. The simplest data path, shown in Figure 4.8, attempts to execute all instructions in one clock cycle.
2. No data path resource can be used more than once per instruction, so any element needed more than
once must be duplicated.
3. Therefore, separate memories are needed for instructions and data. Although some of the functional units
need to be duplicated, many of the elements can be shared by different instruction flows.

Figure 4.8. MIPS architecture data path for different instruction classes.
4. To share a data path element between two different instruction classes,
• a multiplexor is used to allow multiple connections to the input of an element, and
• a control signal is used to select among the multiple inputs.
5. The branch instruction uses the main ALU to compare the register operands, so the separate adder is
still needed to compute the branch target address.



6. An additional multiplexor is required to select either the sequentially following instruction address (PC +
4) or the branch target address to be written into the PC.
7. The control unit must be able to take inputs and generate a write signal for each state element, the
selector control for each multiplexor, and the ALU control.

4.3 Designing a Control Unit:

Control Unit:
• It is the part of the computer’s central processing unit (CPU) that directs the operation of
the processor.
• It was included as part of the von Neumann architecture by John von Neumann.
• It is the responsibility of the control unit to tell the computer’s memory, arithmetic/logic unit,
and input and output devices how to respond to the instructions that have been sent to the
processor.
• It fetches the instructions of a program from main memory into the processor’s instruction
register and, based on the contents of this register, generates the control signals that
supervise the execution of these instructions.
• A control unit works by receiving input information, which it converts into control signals
that are then sent to the central processor.
• The computer’s processor then tells the attached hardware what operations to perform.
• The functions that a control unit performs depend on the type of CPU, because CPU
architecture varies from manufacturer to manufacturer.
Examples of devices that require a CU are:
i. Central Processing Units (CPUs)
ii. Graphics Processing Units (GPUs)

Figure 4.9 Block Diagram Of Control Unit

Functions of the Control Unit –

1. It coordinates the sequence of data movements into, out of, and between a processor’s many sub-units.
2. It interprets instructions.
3. It controls data flow inside the processor.
4. It receives external instructions or commands and converts them into a sequence of control signals.
5. It controls the many execution units (e.g. ALU, data buffers, and registers) contained within a CPU.
6. It also handles multiple tasks, such as fetching, decoding, execution handling, and storing results.

Types of Control Unit –

There are two types of control units:

i. Hardwired control unit
ii. Microprogrammed control unit

4.4 Hardwired Control Unit

In a hardwired control unit, the control signals needed for instruction execution are generated by
specially designed hardware logic circuits. The signal generation method cannot be modified without
physically changing the circuit structure.



The operation code of an instruction contains the basic data for control signal generation. The
operation code is decoded in the instruction decoder, which consists of a set of decoders that decode
different fields of the instruction opcode.

As a result, a few of the output lines leaving the instruction decoder carry active signal values.
These output lines are connected to the inputs of the matrix that generates control signals for the
execution units of the computer.

This matrix implements logical combinations of the decoded signals from the instruction
opcode with the outputs of the matrix that generates signals representing consecutive control unit states,
and with signals coming from outside the processor, e.g. interrupt signals.

The matrices are built in a similar way to programmable logic arrays.
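
The idea of a control signal matrix can be illustrated with a small combinational sketch. Everything specific here is an assumption made for the example: the 2-bit opcodes, the three control states T0 to T2, and the five control signal names are hypothetical and do not describe any particular processor.

```python
# Hypothetical 2-bit opcodes for a toy machine.
OPCODES = {0b00: "LOAD", 0b01: "ADD", 0b10: "STORE"}

def decode(opcode):
    """Instruction decoder: exactly one output line is active per opcode."""
    return {name: int(code == opcode) for code, name in OPCODES.items()}

def control_signals(opcode, state):
    """Control signal matrix: AND/OR combinations of decoder and state lines."""
    d = decode(opcode)
    t = {f"T{i}": int(state == i) for i in range(3)}   # control state lines
    return {
        "mem_read":  t["T0"] | (t["T1"] & (d["LOAD"] | d["ADD"])),
        "ir_load":   t["T0"],
        "alu_add":   t["T2"] & d["ADD"],
        "acc_load":  t["T2"] & (d["LOAD"] | d["ADD"]),
        "mem_write": t["T1"] & d["STORE"],
    }

fetch_state = control_signals(0b00, 0)
add_execute = control_signals(0b01, 2)
store_write = control_signals(0b10, 1)
```

Because each output is a fixed Boolean expression, changing the behaviour means changing the expressions themselves, which mirrors the point that a hardwired unit cannot be modified without altering the circuit.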

Figure 4.10 Block Diagram Of a hardwired control unit

Control signals for instruction execution have to be generated not at a single point in time but throughout
the time interval that corresponds to the instruction execution cycle.

Following the structure of this cycle, a suitable sequence of internal states is organized in the control
unit.

A number of signals generated by the control signal generator matrix are sent back to the inputs of the next
control state generator matrix.

This matrix combines these signals with the timing signals, which are generated by the timing unit from
the rectangular pulses usually supplied by a quartz generator. When a new instruction arrives at the
control unit, the control unit is in the initial state of new instruction fetching.

Instruction decoding allows the control unit to enter the first state relating to execution of the new
instruction, which lasts as long as the timing signals and other input signals, such as flags and state
information of the computer, remain unaltered.



A change in any of the signals mentioned earlier triggers a change of the control unit state.

As a result, a new input is generated for the control signal generator matrix. When an
external signal appears (e.g. an interrupt), the control unit enters the next control state, which is the
state concerned with the reaction to this external signal (e.g. interrupt processing).

The values of flags and state variables of the computer are used to select suitable states for the instruction
execution cycle.

The last states in the cycle are control states that begin fetching the next instruction of the program:
sending the program counter content to the main memory address buffer register and then reading the
instruction word into the instruction register of the computer.

When the current instruction is the stop instruction that ends program execution, the control unit enters
an operating system state, in which it waits for the next user directive.

4.5 Microprogrammed Control Unit

The fundamental difference between the structure of a microprogrammed control unit and that of the
hardwired control unit is the existence of a control store, which holds words containing the encoded
control signals needed for instruction execution.

In microprogrammed control units, subsequent instruction words are fetched into the instruction
register in the normal way. However, the operation code of each instruction is not decoded directly to
generate control signals immediately; instead, it supplies the initial address of a microprogram
contained in the control store.

With a single-level control store:

Here the instruction opcode from the instruction register is sent to the control store address register.
Based on this address, the first microinstruction of the microprogram that interprets execution of this
instruction is read into the microinstruction register.

The operation part of this microinstruction contains encoded control signals, normally as a few bit fields.
The fields are decoded in a set of microinstruction field decoders. The microinstruction also contains the
address of the next microinstruction of the given instruction's microprogram and a control field used to
control the activities of the microinstruction address generator.



Figure 4.11 Block Diagram Of Microprogrammed control unit with a single level control store

The last-mentioned field determines the addressing mode (addressing operation) to be applied to the
address embedded in the current microinstruction.

In microinstructions that use a conditional addressing mode, this address is modified using the
processor condition flags, which represent the status of computations in the current program.

The last microinstruction in the microprogram of a given instruction is the microinstruction that
fetches the next instruction from the main memory to the instruction register.
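
A single-level control store can be sketched as a lookup table of microinstructions. The addresses, signal names, and opcodes below are hypothetical and chosen only to illustrate the mechanism; conditional addressing with flags is left out to keep the sketch short.

```python
# Hypothetical control store: address -> (control signals, next address).
# Address 0 holds the microinstruction that fetches the next instruction.
CONTROL_STORE = {
    0:  (("pc_to_mar", "mem_read", "ir_load", "pc_increment"), None),
    10: (("addr_to_mar", "mem_read"), 11),      # LOAD microprogram
    11: (("mdr_to_acc",), 0),
    20: (("addr_to_mar", "mem_read"), 21),      # ADD microprogram
    21: (("alu_add", "acc_load"), 0),
}

# The opcode supplies the initial address of its microprogram.
START_ADDRESS = {"LOAD": 10, "ADD": 20}

def run_microprogram(opcode):
    """Return the control signals issued, one tuple per microinstruction."""
    address = START_ADDRESS[opcode]       # loaded into the control store address register
    issued = []
    while address is not None:
        signals, next_address = CONTROL_STORE[address]   # read the microinstruction
        issued.append(signals)
        address = next_address            # the next address comes from the microinstruction
    return issued

add_steps = run_microprogram("ADD")
```

Unlike the hardwired case, changing how an instruction behaves here only requires changing the contents of `CONTROL_STORE`, not the logic that steps through it.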

With a two-level control store:

A control unit with a two-level control store includes a nano-instruction memory in addition to the
control memory for microinstructions.

In such a control unit, microinstructions do not contain encoded control signals.

Instead, the operation part of a microinstruction contains the address of a word in the nano-instruction
memory, and that word contains the encoded control signals.

The nano-instruction memory contains all combinations of control signals that appear in the microprograms
that interpret the complete instruction set of a given computer, each written once in the form of a
nano-instruction.



Figure 4.12 Block Diagram Of Micro programmed control unit with a two- level control store

In this way, unnecessary storing of the same operation parts of microinstructions is avoided. In this case,
the microinstruction word can be much shorter than with the single-level control store.

This gives a much smaller size in bits of the microinstruction memory and, as a result, a much smaller size
of the entire control memory.

The microinstruction memory contains the control for selecting consecutive microinstructions, while
the control signals themselves are generated on the basis of nano-instructions.

In nano-instructions, control signals are frequently encoded using the 1 bit per signal method, which
eliminates decoding.

4.6 Pipelining
An implementation technique in which multiple instructions are overlapped in execution, much like an
assembly line.

MIPS instructions classically take five steps:


1. Fetch instruction from memory.
2. Read registers while decoding the instruction. The regular format of MIPS instructions allows
reading and decoding to occur simultaneously.
3. Execute the operation or calculate an address.
4. Access an operand in data memory.
5. Write the result into a register.



Design goal: to balance the length of each pipeline stage. If the stages are perfectly balanced, then

$$\text{Time between instructions}_{\text{pipelined}} = \frac{\text{Time between instructions}_{\text{nonpipelined}}}{\text{Number of pipe stages}}$$
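
The relation above can be checked with a small calculation. The timing figures (1000 ps per nonpipelined instruction, 5 stages, 1000 instructions) are assumed values for illustration, and the total-time formula assumes an ideal pipeline with perfectly balanced stages and no stalls.

```python
def pipelined_interval(nonpipelined_interval_ps, stages):
    """Ideal time between instructions once the pipeline is full."""
    return nonpipelined_interval_ps / stages

def total_pipelined_time(instructions, stages, stage_time_ps):
    """Ideal total time: fill the pipeline once, then finish one instruction per stage time."""
    return (stages + instructions - 1) * stage_time_ps

interval = pipelined_interval(1000, 5)                    # assumed: 1000 ps, 5 stages
pipelined_total = total_pipelined_time(1000, 5, 200)      # assumed: 1000 instructions
nonpipelined_total = 1000 * 1000
speedup = nonpipelined_total / pipelined_total
```

For a long run of instructions the speedup approaches the number of stages but stays slightly below it, because the pipeline has to fill first.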
Designing Instruction Sets for Pipelining

All MIPS instructions are the same length. This makes it easier to fetch instructions in the first pipeline stage
and to decode them in the second stage.

1. MIPS has only a few instruction formats, with the source register fields located in the same
place in each instruction.
• The second stage can begin reading the register file at the same time that the hardware is
determining what type of instruction was fetched.
• If MIPS instruction formats were not symmetric, stage 2 would have to be split, resulting in six
pipeline stages.

2. Memory operands appear only in loads or stores in MIPS.
• The execute stage can therefore be used to calculate the memory address.
3. Operands must be aligned in memory, so the requested data can be transferred between processor
and memory in a single pipeline stage.

Figure 4.13 Single-cycle, non - pipelined execution in top versus pipelined execution in bottom.



4.7 Hazards

A hazard is a situation in pipelining in which the next instruction cannot execute in the following clock
cycle. There are three types of hazard:
1. Structural hazards
2. Data hazards
3. Control hazards

1. Structural hazard

i. A structural hazard occurs when a planned instruction cannot execute in the proper clock cycle because
the hardware does not support the combination of instructions that are set to execute.
ii. The MIPS instruction set is designed so that structural hazards are easy to avoid.
• If the pipeline in Figure 4.13 had a fourth instruction, then in the same clock cycle the first
instruction would be accessing data from memory while the fourth instruction was fetching an
instruction from that same memory.
• Without two memories, the pipeline could have a structural hazard.

2. Data Hazards

i. A data hazard occurs when a planned instruction cannot execute in the proper clock cycle because data
that is needed to execute the instruction is not yet available.
ii. It occurs when the pipeline must be stalled because one step must wait for another to complete.
iii. Data hazards arise from the dependence of one instruction on an earlier one that is still in the pipeline.



Example: an add instruction followed immediately by a subtract instruction that uses the sum ($s0):
add $s0, $t0, $t1
sub $t2, $s0, $t3
Forwarding or bypassing:
• Adding extra hardware to retrieve the missing item early from the internal resources.
• A method of resolving a data hazard by retrieving the missing data element from internal buffers
rather than waiting for it to arrive from programmer-visible registers or memory.
Load-use data hazard:
• A specific form of data hazard in which the data being loaded by a load instruction has not yet
become available when it is needed by another instruction.
Pipeline stall or bubble:
• A stall initiated in order to resolve a hazard.
3. Control hazards or branch hazards
i. A control hazard occurs when the proper instruction cannot execute in the proper pipeline clock cycle
because the instruction that was fetched is not the one that is needed.
ii. The flow of instruction addresses is not what the pipeline expected.
iii. Computers use prediction to handle branches.
Branch prediction
i. A method of resolving a branch hazard that assumes a given outcome for the branch and proceeds
from that assumption rather than waiting to ascertain the actual outcome.
• One simple approach is always to predict that branches will be untaken. When the prediction is
right, the pipeline proceeds at full speed.
• Only when branches are taken does the pipeline stall.
ii. Dynamic hardware predictors make their guesses depending on the behavior of each branch and may
change predictions for a branch over the life of a program.
iii. One approach keeps a history for each branch as taken or untaken and then uses the recent past
behavior to predict the future.
iv. When the guess is wrong, the pipeline control must ensure that the instructions following the wrongly
guessed branch have no effect and must restart the pipeline from the proper branch address.
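
The history-based approach can be sketched with the simplest possible scheme, a one-bit predictor that remembers only the last outcome of each branch. The class name, the branch address 0x40, and the loop pattern are assumptions for illustration; real predictors usually keep more history than this.

```python
class OneBitPredictor:
    """Predict that each branch repeats its most recent outcome."""

    def __init__(self):
        self.history = {}                 # branch address -> last outcome (True = taken)

    def predict(self, branch_pc):
        # With no history yet, fall back to predicting untaken.
        return self.history.get(branch_pc, False)

    def update(self, branch_pc, taken):
        self.history[branch_pc] = taken

def count_mispredictions(outcomes, branch_pc=0x40):
    predictor = OneBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_pc) != taken:
            misses += 1                   # wrong guess: pipeline must be restarted
        predictor.update(branch_pc, taken)
    return misses

# A loop branch that is taken nine times and then falls through.
loop_outcomes = [True] * 9 + [False]
dynamic_misses = count_mispredictions(loop_outcomes)
always_untaken_misses = sum(loop_outcomes)    # static "untaken" guess misses every taken branch
```

On this pattern the one-bit scheme mispredicts only on the first and last iterations, whereas always predicting untaken mispredicts on every taken iteration.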
Advantages of pipelining
• Pipelining increases the number of simultaneously executing instructions and the rate at which
instructions are started and completed.
• Pipelining does not reduce the time it takes to complete an individual instruction, called the latency.
• Pipelining improves instruction throughput rather than individual instruction execution time or
latency.
• Latency (pipeline): the number of stages in a pipeline, or the number of stages between two
instructions during execution.
4.8 Pipelined data path and control
In a five-stage pipeline, each instruction is divided into five stages, so up to five instructions are in
execution during any single clock cycle.
1. IF: Instruction fetch
2. ID: Instruction decode and register file read
3. EX: Execution or address calculation
4. MEM: Data memory access
5. WB: Write back



Exceptions to the left-to-right flow of instructions:

• The write-back stage, which places the result back into the register file in the middle of the
data path.
• The selection of the next value of the PC, choosing between the incremented PC and the
branch address from the MEM stage.

Data flowing from right to left does not affect the current instruction; these reverse data movements
influence only later instructions in the pipeline.
• The first right-to-left flow of data can cause data hazards.
• The second can cause control hazards.
Figure 4.14 shows the pipelined version of the data path, with pipeline registers that hold the information
produced in the previous cycle.

Figure 4.14 The pipelined version of the data path with registers.

Pipelined data path for Load and Store instructions

• The right half of a register or memory is highlighted when it is being read.
• The left half of a register or memory is highlighted when it is being written.



| Stage | Load instruction | Store instruction |
|---|---|---|
| Instruction fetch | The instruction is read from memory using the address in the PC and placed in the IF/ID pipeline register. The PC address is incremented by 4 and written back into the PC to be ready for the next clock cycle. | The instruction is read from memory using the address in the PC and placed in the IF/ID pipeline register. |
| Instruction decode and register file read | The IF/ID pipeline register supplies the 16-bit immediate field, which is sign-extended to 32 bits, and the register numbers to read the two registers. All three values are stored in the ID/EX pipeline register, along with the incremented PC address. | The IF/ID pipeline register supplies the register numbers for reading two registers and the 16-bit immediate, which is sign-extended. These three 32-bit values are all stored in the ID/EX pipeline register. |
| Execute or address calculation | The load instruction reads the contents of register 1 and the sign-extended immediate from the ID/EX pipeline register and adds them using the ALU. That sum is placed in the EX/MEM pipeline register. | The effective address is placed in the EX/MEM pipeline register. |
| Memory access | The address from the EX/MEM pipeline register is used to read the data memory. The data is loaded into the MEM/WB pipeline register. | The data is written to memory. The register containing the data to be stored was read in an earlier stage and stored in ID/EX; the data is placed into the EX/MEM pipeline register in the EX stage to make it available during the MEM stage. |
| Write-back | The data is read from the MEM/WB pipeline register and written into the register file. | Nothing happens in the write-back stage of a store instruction. The instruction still passes through the stage, since every instruction behind the store is already in progress. |
Loads and stores illustrate a key point:
1. Each logical component of the data path can be used only within a single pipeline stage. Otherwise, a
structural hazard occurs.
2. The logical components of a data path are
• Instruction memory,
• Register read ports,
• ALU,
• Data memory, and
• Register write port.
Pipelined Control
It’s useful to group control signals by the stage with which they’re associated, as shown in Figure 4.15:
1. Instruction fetch: no control signals, because the same thing happens every time.
2. Instruction decode/register file read: no control signals, because the same thing happens every time.
3. Execution/address calculation:
• RegDst, ALUOp, and ALUSrc select the write register, the ALU operation, and either
read data 2 or the 16-bit sign-extended immediate offset as the second ALU input.
4. Memory access:
• Branch, MemRead, and MemWrite select whether the branch address will be written
to the PC, memory will be read, or memory will be written.
5. Write-back:
• MemToReg and RegWrite select the source of the write (ALU or memory) and whether or not
to write to the register.
After the instruction decode in stage 2, control information is passed via the pipeline registers.
Explanation of Figure 4.15
i. Four of the nine control lines are used in the EX phase.
ii. The remaining five control lines are passed on to the EX/MEM pipeline register, which is extended to
hold them.
iii. Three are used during the MEM stage.
iv. The last two are passed to MEM/WB for use in the WB stage.
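
The way the control lines thin out from stage to stage can be sketched as follows. The grouping follows the list above; the specific bit values shown for lw are the usual textbook settings and are stated here as an assumption, since these notes do not list them. ALUOp is counted as two lines, which gives the total of nine.

```python
# Control lines for a load word instruction, grouped by the stage that uses them.
LW_CONTROL = {
    "EX":  {"RegDst": 0, "ALUOp1": 0, "ALUOp0": 0, "ALUSrc": 1},
    "MEM": {"Branch": 0, "MemRead": 1, "MemWrite": 0},
    "WB":  {"RegWrite": 1, "MemToReg": 1},
}

def pass_on(control, finished_stage):
    """Drop the lines consumed by a stage; the rest go to the next pipeline register."""
    return {stage: lines for stage, lines in control.items() if stage != finished_stage}

def line_count(control):
    return sum(len(lines) for lines in control.values())

id_ex = LW_CONTROL                  # created during instruction decode
ex_mem = pass_on(id_ex, "EX")       # four lines used in EX
mem_wb = pass_on(ex_mem, "MEM")     # three lines used in MEM
counts = (line_count(id_ex), line_count(ex_mem), line_count(mem_wb))
```

The counts come out as nine, five, and two, matching the description of Figure 4.15.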

Figure 4.15 The control lines for the final three stages

The control information follows the instruction with which it’s associated.

Figure 4.16 The pipelined data path with the control signals



Explanation of figure 4.16
i. The control values for the last three stages are created during the instruction decode stage and then
placed in the ID/EX pipeline register.
ii. The control lines for each pipe stage are used, and remaining control lines are then passed to the next
pipeline stage.
4.9 Handling Data hazards
Data Hazards
A data hazard occurs when a planned instruction cannot execute in the proper clock cycle because data that
is needed to execute the instruction is not yet available.
add $s0, $t0, $t1
sub $t2, $s0, $t3
Forwarding or Bypassing
A method of resolving a data hazard by retrieving the missing data element from internal buffers rather
than waiting for it to arrive from registers or memory.
Consider the sequence:
sub $2, $1,$3
and $12,$2,$5
or $13,$6,$2
add $14,$2,$2
sw $15,100($2)
The last four instructions are all dependent on the result in register $2 of the first instruction.

Figure 4.17 Pipelined dependences

4.9.1 Figure 4.17 illustrates the execution of the above instructions using a multiple-clock-cycle pipeline
representation.
4.9.2 When a register is read and written in the same clock cycle, it is assumed that the write is in the
first half of the clock cycle and the read is in the second half, so the read delivers the newly
written value.
4.9.2.1 The value of register $2 changes during the middle of clock cycle 5, when the sub
instruction writes its result.



4.9.2.2 The values read for register $2 would not be the result of the sub instruction unless
the read occurred during clock cycle 5 or later.
4.9.2.3 Thus, taking register $2 to hold 10 before the sub and −20 afterwards, the instructions
that would get the correct value of −20 are add and sw; the and and or instructions would get
the incorrect value 10.
4.9.3 A notation that names the fields of the pipeline registers allows a more precise description of
dependences.
4.9.4 Using this notation, the two pairs of hazard conditions are
1a. EX/MEM.RegisterRd = ID/EX.RegisterRs
1b. EX/MEM.RegisterRd = ID/EX.RegisterRt
2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
2b. MEM/WB.RegisterRd = ID/EX.RegisterRt
4.9.4.1 For example,
i. “ID/EX.RegisterRs” refers to the number of the Rs register that is held in the pipeline register
ID/EX.
 The first hazard in the sequence is on register $2, between the result of sub $2,$1,$3 and the first read
operand of and $12,$2,$5.
 This hazard can be detected when the and instruction is in the EX stage and the prior instruction is
in the MEM stage, so this is hazard 1a:
EX/MEM.RegisterRd = ID/EX.RegisterRs = $2
 The sub-or is a type 2b hazard:
MEM/WB.RegisterRd = ID/EX.RegisterRt = $2
 The two dependences on sub-add are not hazards because the register file supplies the proper data
during the ID stage of add.
 There is no data hazard between sub and sw because sw reads $2 the clock cycle after sub writes $2.
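The classification above can be reproduced with a short sketch. It assumes the five-stage pipeline of these notes, where a consumer one instruction behind the producer gives a type 1 (EX/MEM) hazard and a consumer two instructions behind gives a type 2 (MEM/WB) hazard; the tuple layout and the function name `classify` are illustrative choices, and the sketch ignores the case where an intervening instruction overwrites the same register.

```python
# Each entry: (text, destination register, Rs, Rt).
# The destination is None when the instruction writes no register (sw).
program = [
    ("sub $2,$1,$3",   2,    1, 3),
    ("and $12,$2,$5",  12,   2, 5),
    ("or $13,$6,$2",   13,   6, 2),
    ("add $14,$2,$2",  14,   2, 2),
    ("sw $15,100($2)", None, 2, 15),
]

def classify(program):
    """Return (producer, consumer, label) for every hazard, e.g. label '1a'."""
    hazards = []
    for i, (producer, dest, _, _) in enumerate(program):
        if dest is None or dest == 0:      # nothing written, or register 0
            continue
        # distance 1 -> result still in EX/MEM; distance 2 -> result in MEM/WB
        for distance, kind in ((1, "1"), (2, "2")):
            j = i + distance
            if j >= len(program):
                continue
            consumer, _, rs, rt = program[j]
            if rs == dest:
                hazards.append((producer, consumer, kind + "a"))
            if rt == dest:
                hazards.append((producer, consumer, kind + "b"))
    return hazards

for producer, consumer, label in classify(program):
    print(label, ":", producer, "->", consumer)
```

The add and sw instructions do not appear in the result because they are three or more instructions behind sub, where the register file already supplies the new value.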
Conditions for detecting hazards and the control signals to resolve them:
1. EX hazard:
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 10
if (EX/MEM.RegWrite
and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10
 EX/MEM.RegisterRd field is the register destination for either an ALU instruction (which comes from
the Rd field of the instruction) or a load (which comes from the Rt field).
 This case forwards the result from the previous instruction to either input of the ALU.
 If the previous instruction is going to write to the register file, and the write register number matches
the read register number of ALU inputs A or B, provided it is not register 0, then steer the multiplexor
to pick the value instead from the pipeline register EX/MEM.
2. MEM hazard:
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01



No hazard in the WB stage, because
o It is assumed that the register file supplies the correct result if the instruction in the ID stage
reads the same register written by the instruction in the WB stage.
o Such a register file performs another form of forwarding, but it occurs within the register file.
Complication in Data hazards
1. A complication arises when hazards occur between the result of the instruction in the WB stage, the
result of the instruction in the MEM stage, and the source operand of the instruction in the ALU stage.
2. Example:
When summing a vector of numbers in a single register, a sequence of instructions will all read and
write to the same register:
add $1,$1,$2
add $1,$1,$3
add $1,$1,$4
3. In this case the result is forwarded from the MEM stage, because the result in the MEM stage is the
more recent result.
4. Thus, the control for the MEM hazard becomes (the added not(...) clause excludes the case that is
already handled as an EX hazard):
if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRs))
and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01

if (MEM/WB.RegWrite
and (MEM/WB.RegisterRd ≠ 0)
and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
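The EX and MEM hazard conditions can be combined into one small function, sketched below. Giving the EX hazard priority is equivalent to forwarding from MEM/WB only when no EX hazard applies to the same operand. The dictionary layout, the function name `forwarding_unit`, and the value 00 for "no forwarding" are assumptions made for this illustration.

```python
def forwarding_unit(ex_mem, mem_wb, id_ex):
    """Return (ForwardA, ForwardB) as two-bit strings.

    ex_mem, mem_wb: dicts with 'RegWrite' (bool) and 'RegisterRd' (int).
    id_ex: dict with 'RegisterRs' and 'RegisterRt' (int).
    """
    def select(source_register):
        ex_hazard = (ex_mem["RegWrite"]
                     and ex_mem["RegisterRd"] != 0
                     and ex_mem["RegisterRd"] == source_register)
        mem_hazard = (mem_wb["RegWrite"]
                      and mem_wb["RegisterRd"] != 0
                      and mem_wb["RegisterRd"] == source_register)
        if ex_hazard:
            return "10"   # most recent result: take it from EX/MEM
        if mem_hazard:
            return "01"   # otherwise take the older result from MEM/WB
        return "00"       # assumed encoding: use the register file value

    return select(id_ex["RegisterRs"]), select(id_ex["RegisterRt"])

# Third add of the vector-sum example: add $1,$1,$4 is in EX while both
# earlier adds (each writing $1) sit in EX/MEM and MEM/WB.
print(forwarding_unit({"RegWrite": True, "RegisterRd": 1},
                      {"RegWrite": True, "RegisterRd": 1},
                      {"RegisterRs": 1, "RegisterRt": 4}))
```

In the vector-sum case the first operand is taken from EX/MEM (ForwardA = 10) even though MEM/WB also matches, which is exactly the behaviour the revised MEM hazard condition is meant to guarantee.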
Data Hazards and Stalls
1. Forwarding cannot be used when an instruction tries to read a register following a load instruction
that writes the same register.
2. Something must stall the pipeline for the combination of a load followed by an instruction that reads
its result.
3. A hazard detection unit is used in addition to the forwarding unit to deal with such cases.
 It operates during the ID stage so that it can insert the stall between the load and its use.
 Checking for load instructions, the control for the hazard detection unit is this single condition:
if (ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt)))
stall the pipeline
 The first line checks whether the instruction is a load: the only instruction that reads data memory is
a load.
 The next two lines check whether the destination register field of the load in the EX stage matches
either source register of the instruction in the ID stage.
 If the condition holds, the instruction stalls one clock cycle.
 After this 1-cycle stall, the forwarding logic can handle the dependence and execution proceeds.
 To insert the stall, the control values in the ID/EX register are forced to 0.
o EX, MEM and WB then do a nop (no-operation).
o nop: an instruction that does no operation to change state.
o No registers or memories are written if the control values are all 0.
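The load-use test and the zeroing of the control values can be sketched together. The dictionary layout, the function names `must_stall` and `id_stage_control`, and the sample instruction pair (a load into $2 followed by an instruction reading $2) are assumptions chosen for illustration.

```python
def must_stall(id_ex, if_id):
    """Load-use test performed by the hazard detection unit during ID."""
    load_in_ex = id_ex["MemRead"]          # only a load reads data memory
    uses_loaded_register = id_ex["RegisterRt"] in (if_id["RegisterRs"],
                                                   if_id["RegisterRt"])
    return bool(load_in_ex and uses_loaded_register)

def id_stage_control(decoded_control, stall):
    """Control values written into ID/EX: all 0s (a bubble) when stalling."""
    if stall:
        return {name: 0 for name in decoded_control}
    return dict(decoded_control)

# Hypothetical pair: lw $2,20($1) in EX, and $4,$2,$5 in ID.
id_ex = {"MemRead": True, "RegisterRt": 2}
if_id = {"RegisterRs": 2, "RegisterRt": 5}
decoded = {"RegDst": 1, "ALUSrc": 0, "MemRead": 0, "MemWrite": 0, "RegWrite": 1}

stall = must_stall(id_ex, if_id)
print(stall, id_stage_control(decoded, stall))
```

With every control value at 0, the bubble passes through EX, MEM and WB without writing a register or memory, which is what makes it behave as a nop.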
Figure 4.18 shows how stalls are actually inserted into the pipeline:
1. A bubble is inserted beginning in clock cycle 4, by changing the and instruction to a nop.
2. Note that the and instruction is actually fetched and decoded in clock cycles 2 and 3, but its EX stage
is delayed until clock cycle 5.
3. Likewise, the OR instruction is fetched in clock cycle 3, but its ID stage is delayed until clock cycle 5.
4. After insertion of the bubble, all the dependences go forward in time and no further hazards occur.

Figure 4.18 The way stalls are inserted into the pipeline

Figure 4.19 Datapath with Hazard Detection and forwarding unit


Figure 4.19 highlights the pipeline connections for both the hazard detection unit and the forwarding unit.
1. The forwarding unit controls the ALU multiplexors to replace the value from a general-purpose
register with the value from the proper pipeline register.
2. The hazard detection unit controls the writing of the PC and IF/ID registers plus the multiplexor that
chooses between the real control values and all 0s.
3. The hazard detection unit stalls and deasserts the control fields if the load-use hazard test above is
true.
4.10 Handling Control Hazards
Control Hazards or Branch hazards
A control hazard arises when the proper instruction cannot execute in the proper pipeline clock cycle because:
a. The instruction that was fetched is not the one that is needed.
b. The flow of instruction addresses is not what the pipeline expected.
Solutions to control hazards
1. Stall.
2. Predict.
Assume Branch Not Taken
4.10.1 Stalling until the branch is complete is too slow.
4.10.2 So predict that the branch will not be taken and continue execution down the sequential
instruction stream.
4.10.3 If the branch is taken, the instructions that are being fetched and decoded must be discarded.
Execution continues at the branch target.
4.10.4 To discard instructions, the original control values are changed to 0s in the IF, ID, and EX stages
when the branch reaches the MEM stage.
4.10.5 For load-use stalls, only the control values in the ID stage are changed to 0, and they are then
allowed to pass through the pipeline.
4.10.6 Discarding instructions means flushing the instructions in the IF, ID, and EX stages of the pipeline.
4.10.7 Flush: To discard instructions in a pipeline, usually due to an unexpected event.
Reducing the Delay of Branches
1. One way to improve branch performance is to reduce the cost of the taken branch.
2. The MIPS architecture was designed to support fast single-cycle branches that could be pipelined
with a small branch penalty.
3. Many branches rely only on simple tests (equality or sign, for example) and such tests do not require
a full ALU operation but can be done with at most a few gates.
4. When a more complex branch decision is required, a separate instruction that uses an ALU to perform
a comparison is required.
5. Moving the branch decision up requires two actions to occur earlier:
a. Computing the branch target address
 The PC value is already known and the immediate field is in the IF/ID pipeline register, so
the branch adder is just moved from the EX stage to the ID stage.
 The branch target address calculation will be performed for all instructions, but only used
when needed.
b. Evaluating the branch decision.
 The harder part is the branch decision itself.
 For branch equal, the two registers read during the ID stage are compared to see if they are
equal.
 Equality can be tested by first exclusive ORing their respective bits and then ORing all the
results; an OR output of 0 means that the two registers are equal.
 Moving the branch test to the ID stage implies additional forwarding and hazard detection
hardware, since a branch dependent on a result in the pipeline must still work properly
with this optimization.



For example,
 To implement branch on equal (and its inverse), there is a need to forward results to the equality test
logic that operates during ID.
 There are two complicating factors:
1. During ID, the instruction must be decoded, it must be decided whether a bypass to the equality
unit is needed, and the equality comparison must be completed, so that if the instruction is a
branch, the PC can be set to the branch target address.
2. Because the values in a branch comparison are needed during ID but may be produced later in
time, it is possible that a data hazard can occur and a stall will be needed.
6. To flush instructions in the IF stage, an additional control line is used, called IF.Flush, that zeros the
instruction field of the IF/ID pipeline register.
7. Clearing the register transforms the fetched instruction into a nop, an instruction that has no action and
changes no state.
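The equality test used for branch equal (exclusive OR of the corresponding bits, followed by an OR of all the results) can be sketched bit by bit. The function name `registers_equal` and the 32-bit default width are assumptions for this illustration; hardware would do the same thing with a row of XOR gates feeding one wide OR gate.

```python
def registers_equal(a, b, width=32):
    """Equality test built from XOR followed by an OR-reduction."""
    mask = (1 << width) - 1
    diff = (a ^ b) & mask            # bit i is 1 exactly where a and b differ
    any_difference = 0
    for bit in range(width):         # OR together all the XOR outputs
        any_difference |= (diff >> bit) & 1
    return any_difference == 0       # an OR output of 0 means equal

print(registers_equal(0x1234, 0x1234))   # True
print(registers_equal(0x1234, 0x1235))   # False
```

No subtraction or carry chain is involved, which is why this test is cheap enough to be moved into the ID stage.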
Dynamic Branch Prediction
1. Dynamic branch prediction is the prediction of branches at runtime using runtime information.
2. One approach is to look up the address of the instruction to see if a branch was taken the last time
this instruction was executed and, if so, to begin fetching new instructions from the same place as
the last time.
3. A simple static prediction scheme will probably waste too much performance; with more
hardware it is possible to try to predict branch behavior during program execution.
4. One implementation of that approach is a branch prediction buffer, or branch history table.
5. A branch prediction buffer is a small memory indexed by the lower portion of the address of the
branch instruction, that contains one or more bits indicating whether the branch was recently
taken or not.
6. It is not known if the prediction is the right one—it may have been put there by another branch that
has the same low-order address bits. However, this doesn’t affect correctness.
7. Prediction is just a hint that we hope is correct, so fetching begins in the predicted direction.
8. If the hint turns out to be wrong, the incorrectly predicted instructions are deleted, the prediction
bit is inverted and stored back, and the proper sequence is fetched and executed.
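9. This simple 1-bit prediction scheme has a performance shortcoming: even if a branch is almost
always taken, it is likely to be predicted incorrectly twice, rather than once, when it is not taken:
once on the not-taken outcome itself, and once more on the next taken outcome, because the
prediction bit has been inverted.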

Figure 4.20 The states in a 2-bit prediction scheme


10. By using 2 bits rather than 1, a branch that strongly favors taken or not taken—as many branches
do—will be mispredicted only once. The 2 bits are used to encode the four states in the system.
11. The 2-bit scheme is an instance of a counter-based predictor:
a. The counter is incremented when the branch is taken.
b. Otherwise it is decremented.
c. The mid-point of its range is used as the division between taken and not taken.
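The difference between the 1-bit and 2-bit schemes can be checked with a small simulation. This is a sketch under stated assumptions: the 2-bit predictor is modelled as a saturating counter that counts up on taken and down on not taken, with values 2 and 3 predicting taken; the branch pattern (taken nine times, then not taken once, repeated three times) and the function names are illustrative.

```python
def mispredictions_1bit(outcomes, last_taken=True):
    """1-bit scheme: predict whatever the branch did last time."""
    misses = 0
    for taken in outcomes:
        if taken != last_taken:
            misses += 1
        last_taken = taken           # the bit is inverted only on a miss
    return misses

def mispredictions_2bit(outcomes, counter=3):
    """2-bit saturating counter in 0..3; 2 and 3 predict taken."""
    misses = 0
    for taken in outcomes:
        predict_taken = counter >= 2          # mid-point of the range
        if predict_taken != taken:
            misses += 1
        # move one step toward the actual outcome, saturating at 0 and 3
        counter = min(3, counter + 1) if taken else max(0, counter - 1)
    return misses

# A loop branch: taken nine times, then not taken once, executed three times.
pattern = ([True] * 9 + [False]) * 3

print(mispredictions_1bit(pattern))   # 5
print(mispredictions_2bit(pattern))   # 3
```

The 2-bit predictor misses only on each loop exit, while the 1-bit predictor also misses on the first iteration of every later run of the loop.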
Correlating predictor:

A branch predictor that combines local behavior of a particular branch and global information about the
behavior of some recent number of executed branches.
Tournament branch predictor

A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which
predictor to enable for a given branch.

University Question:

Problem 1 Consider a 4 stage pipeline processor. The number of cycles needed by the four instructions I1,
I2, I3 and I4 in stages S1, S2, S3 and S4 is shown below-

      S1   S2   S3   S4
I1     2    1    1    1
I2     1    3    2    2
I3     2    1    1    3
I4     1    2    2    2

What is the number of cycles needed to execute the following loop?


for(i=1 to 2) { I1; I2; I3; I4; }
A. 16
B. 23
C. 28
D. 30



Solution-

The phase-time diagram is-

From here, number of clock cycles required to execute the loop = 23 clock cycles.
Thus, Option (B) is correct.
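The count of 23 cycles can be cross-checked with a short sketch. It assumes that instructions pass through the stages in order, that each stage handles one instruction at a time, and that an instruction which has finished a stage can wait for the next stage without holding up the instruction behind it; the function name `total_cycles` is illustrative.

```python
def total_cycles(times, sequence):
    """times[name] = cycles needed in each stage; returns the last finish time."""
    n_stages = len(next(iter(times.values())))
    stage_free = [0] * n_stages      # cycle at which each stage becomes free
    for name in sequence:
        ready = 0                    # when this instruction left the previous stage
        for s in range(n_stages):
            start = max(ready, stage_free[s])
            ready = start + times[name][s]
            stage_free[s] = ready
    return stage_free[-1]

times = {
    "I1": (2, 1, 1, 1),
    "I2": (1, 3, 2, 2),
    "I3": (2, 1, 1, 3),
    "I4": (1, 2, 2, 2),
}
loop = ["I1", "I2", "I3", "I4"] * 2   # for(i=1 to 2) { I1; I2; I3; I4; }

print(total_cycles(times, loop))      # 23
```

The first iteration finishes its last stage at cycle 15 and the second at cycle 23, in agreement with option (B).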

Problem 2 Consider a pipelined processor with the following four stages-


IF : Instruction Fetch
ID : Instruction Decode and Operand Fetch
EX : Execute
WB : Write Back

The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles
for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL
instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What
is the number of clock cycles taken to complete the following sequence of instructions?

ADD R2, R1, R0    R2 ← R0 + R1
MUL R4, R3, R2    R4 ← R3 * R2
SUB R6, R5, R4    R6 ← R5 − R4



A. 7
B. 8
C. 10
D. 14

Solution-

The phase-time diagram is-

From here, number of clock cycles required to execute the instructions = 8 clock cycles.
Thus, Option (B) is correct.
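The answer of 8 cycles can be reproduced with a simplified timing sketch. It assumes that every instruction depends on the one before it (true for this ADD, MUL, SUB sequence), that forwarding makes a result usable in the cycle after the producer's last EX cycle, and that IF and ID are not modelled as blocking; the function name `completion_cycle` is illustrative.

```python
def completion_cycle(ex_latencies):
    """ex_latencies: EX cycles per instruction, in program order."""
    if_end = id_end = ex_end = wb_end = 0
    for latency in ex_latencies:
        if_end = if_end + 1                   # one instruction fetched per cycle
        id_end = max(if_end, id_end) + 1      # decode follows fetch, in order
        ex_start = max(id_end, ex_end)        # wait for the forwarded operand
        ex_end = ex_start + latency
        wb_end = max(ex_end, wb_end) + 1      # write back after EX
    return wb_end

# ADD (1 cycle in EX), MUL (3 cycles in EX), SUB (1 cycle in EX)
print(completion_cycle([1, 3, 1]))            # 8
```

ADD completes in cycle 4, MUL occupies EX in cycles 4 to 6 and completes in cycle 7, and SUB executes in cycle 7 and writes back in cycle 8.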
