CS 3351 Digital Principles and Computer Organization
Instruction Execution – Building a Data Path – Designing a Control Unit – Hardwired Control,
Microprogrammed Control – Pipelining – Data Hazard – Control Hazards.
Instructions are part of a program stored in memory. Each time the processor needs to execute an
instruction, it first fetches the instruction from memory, then decodes it, and then executes it.
The whole process is known as the instruction cycle.
In the basic computer, each instruction cycle is made up of four procedures.
1. After the four procedures are done, control switches back to the first step and repeats the
same process for the next instruction.
2. The cycle therefore continues until a Halt condition is met.
3. The figure shows the phases contained in the instruction cycle.
Data transfer during execution takes place in two ways:
Processor-memory: data is transferred from the processor to memory or from memory to the processor.
Processor-Input/Output: data is transferred between the processor and a peripheral (I/O) device.
Instruction execution :
The PC (program counter) register of the processor gives the address of the instruction to be fetched
from memory.
Once the instruction is fetched, its opcode is decoded.
On decoding, the processor identifies the number of operands. If any operand has to be fetched from
memory, the address of that operand is calculated.
Operands are fetched from memory. If there is more than one operand, the operand fetching process
is repeated (i.e. address calculation and operand fetch).
After this, the data operation is performed on the operands, and a result is generated.
If the result has to be stored in a register, the instruction ends here.
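The steps above can be sketched as a small interpreter loop. This is only an illustration, not a model
of any real processor: the (opcode, operand) instruction format, the opcode names LOAD, ADD, STORE and
HALT, and the single accumulator register are assumptions made for the example.

```python
# Toy fetch-decode-execute loop. The instruction format and the opcodes
# (LOAD, ADD, STORE, HALT) are hypothetical, chosen only for illustration.

def run(memory):
    """Execute the program stored in `memory` until HALT; return the memory."""
    pc = 0    # program counter: address of the next instruction
    acc = 0   # single accumulator register
    while True:
        opcode, operand = memory[pc]   # fetch the instruction addressed by PC
        pc += 1                        # PC now points at the next instruction
        if opcode == "HALT":           # decode, then execute
            return memory
        if opcode == "LOAD":
            acc = memory[operand]      # operand fetch from memory
        elif opcode == "ADD":
            acc += memory[operand]     # data operation on the operand
        elif opcode == "STORE":
            memory[operand] = acc      # result written back to memory
        else:
            raise ValueError(f"unknown opcode {opcode!r}")
```

For example, run([("LOAD", 4), ("ADD", 5), ("STORE", 6), ("HALT", 0), 7, 35, 0]) leaves 42 in location 6.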
1. Figure 4.3a shows the first element needed: A memory unit to store the instructions of a program and
supply instructions given an address.
The instruction memory need only provide read access because the data path does not write
instructions.
The instruction memory is treated as combinational logic since it only reads,
Output at any time reflects the contents of the location specified by the address input,
No read control signal is needed.
2. Figure 4.3b shows the program counter (PC), a register that holds the address of the current instruction.
The program counter is a 32-bit register that is written at the end of every clock cycle.
So, it does not need a write control signal.
3. Figure 4.3c shows the adder needed to increment the PC to the address of the next instruction.
The adder is an ALU wired to always add its two 32-bit inputs and place the sum on its output.
Figure 4.3 Elements needed for data path design
Figure 4.4 Combination of three elements
4. Figure 4.4 shows how to combine the three elements from Figure 4.3 to form a data path that fetches
instructions and increments the PC to obtain the address of the next sequential instruction.
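The fetch portion of the data path can be sketched in a few lines. The sketch assumes byte addressing
with 4-byte instructions (so the adder adds 4), and it models the instruction memory as a Python list
of words; the function name fetch is hypothetical.

```python
# Sketch of the instruction-fetch data path: instruction memory, PC and adder.
# Assumes byte addressing with 4-byte instructions, as in MIPS.

WORD_BYTES = 4
MASK32 = 0xFFFFFFFF

def fetch(instruction_memory, pc):
    """Return (instruction, next_pc) for one clock cycle.

    The instruction memory is read-only and behaves like combinational
    logic: its output depends only on the address input.
    """
    instruction = instruction_memory[pc // WORD_BYTES]
    next_pc = (pc + WORD_BYTES) & MASK32   # adder: PC + 4 with 32-bit wraparound
    return instruction, next_pc
```

Writing next_pc back into the PC at the end of every cycle needs no write control signal, which matches
the description of the PC above.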
Figure 4.5 Elements of R-Format instruction.
Figure 4.6. Elements of loads and stores
Implementation of loads and stores
1. The MIPS load word and store word instructions compute a memory address by adding the base register,
which is $t2, to the 16-bit signed offset field contained in the instruction.
lw $t1,offset_value($t2)
sw $t1,offset_value($t2)
2. A sign-extend unit is needed to extend the 16-bit offset field in the instruction to a 32-bit signed value,
and a data memory unit to read from or write to, as shown in Figure 4.6.
3. Sign-extend: to increase the size of a data item by replicating the high-order sign bit of the original data
item in the high-order bits of the larger, destination data item.
Figure 4.8. MIPS architecture data path for different instruction classes.
4. To share a datapath element between two different instruction classes,
A multiplexor is used to allow multiple connections to the input of an element,
A control signal is used to select one among the multiple inputs.
5. The branch instruction uses the main ALU for comparison of the register operands, so a separate adder
is needed for computing the branch target address.
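The sign extension, the load/store address calculation and the multiplexor selection described above
can be sketched as follows. The function names are hypothetical; values are kept as unsigned 32-bit
integers to mimic hardware registers.

```python
# Sketch of three data path elements: sign-extend unit, address adder, multiplexor.

MASK32 = 0xFFFFFFFF

def sign_extend16(value):
    """Extend a 16-bit field to 32 bits by replicating bit 15 (the sign bit)."""
    value &= 0xFFFF
    if value & 0x8000:        # sign bit set: fill the upper 16 bits with 1s
        value |= 0xFFFF0000
    return value

def memory_address(base, offset16):
    """Effective address of lw/sw: base register + sign-extended offset."""
    return (base + sign_extend16(offset16)) & MASK32

def mux2(select, input0, input1):
    """Two-input multiplexor: the control signal picks one of the inputs."""
    return input1 if select else input0
```

For example, with $t2 = 0x1000 and an offset field of 0xFFFC (that is, -4), memory_address returns 0x0FFC.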
Control Unit :
It is the part of the computer’s central processing unit (CPU), which directs the operation of
the processor.
It was included as part of the Von Neumann Architecture by John von Neumann.
It is the responsibility of the Control Unit to tell the computer’s memory, arithmetic/logic unit
and input and output devices how to respond to the instructions that have been sent to the
processor.
Dr.V.Kalaivaazhi B.E.,M.Tech.,Ph.D Page 10
It fetches the instructions of the program from main memory into the processor’s instruction
register and, based on the contents of this register, generates the control signals that
supervise the execution of these instructions.
A control unit works by receiving input information, which it converts into control signals
that are then sent to the central processor.
The computer’s processor then tells the attached hardware what operations to perform.
The functions that a control unit performs depend on the type of CPU, because CPU
architecture varies from manufacturer to manufacturer.
Examples of devices that require a CU are:
i. Central Processing Units (CPUs)
ii. Graphics Processing Units (GPUs)
1. It coordinates the sequence of data movements into, out of, and between a processor’s many sub-units.
2. It interprets instructions.
3. It controls data flow inside the processor.
4. It receives external instructions or commands, which it converts into a sequence of control signals.
5. It controls the many execution units (i.e. ALU, data buffers and registers) contained within a CPU.
6. It also handles multiple tasks, such as fetching, decoding, execution handling and storing results.
In the hardwired control unit, the control signals needed to control instruction execution
are generated by specially designed hardware logic circuits, whose signal generation method cannot be
modified without a physical change of the circuit structure.
The instruction decoder decodes the operation code of the instruction. As a result, a few output lines
of the instruction decoder take active signal values.
These output lines are connected to the inputs of the matrix that generates control signals for the
execution units of the computer.
This matrix implements logical combinations of the decoded signals from the instruction
opcode with the outputs of the matrix that generates signals representing consecutive control unit states,
and with signals coming from outside the processor, e.g. interrupt signals.
Control signals for an instruction have to be generated not at a single point in time but throughout the
entire time interval that corresponds to the instruction execution cycle.
Following the structure of this cycle, a suitable sequence of internal states is organized in the control
unit.
A number of signals generated by the control signal generator matrix are sent back to the inputs of the next
control state generator matrix.
This matrix combines these signals with the timing signals, which are generated by the timing unit from
the rectangular pulses usually supplied by a quartz generator. When a new instruction arrives at the
control unit, the control unit is in the initial state of new instruction fetching.
Instruction decoding allows the control unit to enter the first state relating to execution of the new
instruction, which lasts as long as the timing signals and other input signals, such as flags and state
information of the computer, remain unaltered.
A change in any of these signals changes the control unit state, which causes a new input to be generated
for the control signal generator matrix. When an external signal appears (e.g. an interrupt), the control
unit enters the next control state, which is the state concerned with the reaction to this external signal
(e.g. interrupt processing).
The values of flags and state variables of the computer are used to select suitable states for the instruction
execution cycle.
The last states in the cycle are control states that commence fetching the next instruction of the program:
sending the program counter content to the main memory address buffer register and then reading the
instruction word into the instruction register of the computer.
When the ongoing instruction is the stop instruction that ends program execution, the control unit enters
an operating system state, in which it waits for the next user directive.
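The two matrices described above can be pictured as two fixed functions: one that produces control
signals from the decoded opcode, the current state and external inputs, and one that produces the next
state. The sketch below is a simplified assumption, not the circuit of any particular machine; the state
names, opcodes and signal names are all hypothetical.

```python
# Hardwired control pictured as fixed combinational logic.
# Opcodes, state names and control signal names are hypothetical.

def control_signals(opcode, state, interrupt=False):
    """Control signal generator matrix: fixed logic over opcode, state and inputs."""
    if interrupt:
        return {"save_pc", "load_interrupt_vector"}
    if state == "FETCH":
        # send PC to the memory address register, read the word into IR
        return {"pc_to_mar", "mem_read", "load_ir", "pc_increment"}
    if state == "EXECUTE" and opcode == "ADD":
        return {"alu_add", "reg_write"}
    if state == "EXECUTE" and opcode == "LOAD":
        return {"mem_read", "reg_write"}
    return set()

def next_state(state, interrupt=False):
    """Next control state generator matrix."""
    if interrupt:
        return "INTERRUPT"            # react to the external signal
    return "EXECUTE" if state == "FETCH" else "FETCH"
```

Because the logic is written directly into the functions, changing the behaviour means rewriting them,
which mirrors the point that a hardwired unit cannot be modified without changing the circuit.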
The fundamental difference between the structure of a microprogrammed control unit and that of the
hardwired control unit is the existence of the control store, which is used for storing words containing
the encoded control signals needed for instruction execution.
In microprogrammed control units, subsequent instruction words are fetched into the instruction
register in the normal way. However, the operation code of each instruction is not directly decoded to
enable immediate control signal generation; instead, it supplies the initial address of a microprogram
contained in the control store.
The instruction opcode from the instruction register is sent to the control store address register.
Based on this address, the first microinstruction of the microprogram that interprets execution of this
instruction is read into the microinstruction register.
The operation part of this microinstruction contains encoded control signals, normally as a few bit fields.
The fields are decoded in a set of microinstruction field decoders. The microinstruction also contains the
address of the next microinstruction of the given instruction’s microprogram and a control field used to
control the activities of the microinstruction address generator.
The last-mentioned field decides the addressing mode (addressing operation) to be applied to the
address embedded in the ongoing microinstruction.
In microinstructions with a conditional addressing mode, this address is refined using the
processor condition flags that represent the status of computations in the current program.
The last microinstruction in the microprogram of the given instruction is the microinstruction that
fetches the next instruction from main memory into the instruction register.
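A minimal sketch of this scheme is shown below. The control store is modelled as a dictionary whose
entries hold a set of control signals and the address of the next microinstruction; the addresses,
opcodes and signal names are hypothetical, and conditional addressing is left out for brevity.

```python
# Microprogrammed control sketch. Control store addresses, opcode names and
# control signal names are hypothetical.

# Each microinstruction: (control signals, address of the next microinstruction).
# A next address of None marks the end of the microprogram.
CONTROL_STORE = {
    0: ({"pc_to_mar", "mem_read", "load_ir"}, None),  # fetch the next instruction
    10: ({"read_registers"}, 11),                     # ADD microprogram
    11: ({"alu_add", "reg_write"}, 0),
    20: ({"compute_address"}, 21),                    # LOAD microprogram
    21: ({"mem_read", "reg_write"}, 0),
}

# The opcode supplies the initial address of its microprogram.
START_ADDRESS = {"ADD": 10, "LOAD": 20}

def run_microprogram(opcode):
    """Return the control signal sets issued, in order, for one instruction."""
    address = START_ADDRESS[opcode]   # opcode -> control store address register
    issued = []
    while address is not None:
        signals, next_address = CONTROL_STORE[address]  # read the microinstruction
        issued.append(signals)
        address = next_address
    return issued
```

In this sketch every microprogram ends by jumping to address 0, the microinstruction that fetches the
next instruction, so changing an instruction's behaviour only requires editing the control store.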
In a control unit with a two-level control store, a nano-instruction memory is included besides the
control memory for microinstructions.
The operation part of a microinstruction contains the address of a word in the nano-instruction memory,
which contains encoded control signals.
The nano-instruction memory contains all combinations of control signals that appear in the microprograms
that interpret the complete instruction set of a given computer, each written once in the form of a
nano-instruction.
In this way, unnecessary storing of the same operation parts of microinstructions is avoided. The
microinstruction word can then be much shorter than with a single-level control store.
This gives a much smaller microinstruction memory size in bits and, as a result, a much smaller size
of the entire control memory.
The microinstruction memory contains the control for selection of consecutive microinstructions, while
the control signals are generated on the basis of nano-instructions.
In nano-instructions, control signals are frequently encoded using the 1 bit per signal method, which
eliminates decoding.
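The size saving can be checked with a rough count of bits. The sketch below assumes one bit per control
signal in a control word and a fixed-width next-address field; the function name and the example sizes
are hypothetical.

```python
import math

def control_store_bits(microprogram, signal_count, next_addr_bits):
    """Return (single_level_bits, two_level_bits) for a list of control words.

    microprogram: list of control words, each a tuple of 0/1 flags with one
    flag per control signal (signal_count flags in total).
    """
    # Single level: every microinstruction stores its full control word.
    single = len(microprogram) * (signal_count + next_addr_bits)

    # Two level: each distinct control word is stored once as a nano-instruction,
    # and microinstructions hold only the address of that nano-instruction.
    nano_words = set(microprogram)
    nano_addr_bits = max(1, math.ceil(math.log2(len(nano_words))))
    micro = len(microprogram) * (nano_addr_bits + next_addr_bits)
    nano = len(nano_words) * signal_count
    return single, micro + nano
```

With 8 microinstructions that use only 2 distinct 16-signal control words and a 3-bit next-address field,
the single-level store needs 152 bits and the two-level store needs 64 bits.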
4.6 Pipelining
Pipelining is an implementation technique in which multiple instructions are overlapped in execution,
much like an assembly line.
All MIPS instructions are the same length. This makes it easier to fetch instructions in the first pipeline stage
and to decode them in the second stage.
1. MIPS has only a few instruction formats, with the source register fields located in the same
place in each instruction.
The second stage can therefore begin reading the register file at the same time that the hardware is
determining what type of instruction was fetched.
If MIPS instruction formats were not symmetric, stage 2 would have to be split, resulting in six
pipeline stages.
Figure 4.13 Single-cycle, non-pipelined execution (top) versus pipelined execution (bottom).
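The benefit of overlapping can be estimated with a simple cycle count. The sketch assumes an ideal
pipeline in which every stage takes exactly one cycle and no hazards occur; the function names are
hypothetical, and the non-pipelined count is expressed in the same one-stage time units.

```python
def cycles_non_pipelined(instructions, stages):
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages

def cycles_pipelined(instructions, stages):
    """The first instruction fills the pipeline; each later one completes one cycle after."""
    return stages + instructions - 1
```

For 3 instructions on a 5-stage pipeline this gives 15 stage times without pipelining and 7 cycles with it.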
A situation in pipelining in which the next instruction cannot execute in the following clock cycle is called
a hazard. There are three types of hazards:
1. Structural hazard
2. Data Hazards
3. Control Hazards
1. Structural hazard
i. When a planned instruction cannot execute in the proper clock cycle because the hardware does not
support the combination of instructions that are set to execute.
ii. The MIPS instruction set is designed to avoid structural hazards.
If the pipeline in Figure 4.13 had a fourth instruction, then in the same clock cycle the first
instruction would be accessing data from memory while the fourth instruction would be fetching an
instruction from that same memory.
Without two memories, the pipeline could have a structural hazard.
2. Data Hazards
i. When a planned instruction cannot execute in the proper clock cycle because data that is needed to
execute the instruction is not yet available.
ii. It occurs when the pipeline must be stalled because one step must wait for another to complete.
iii. Data hazards arise from the dependence of one instruction on an earlier one that is still in the pipeline.
Two paths in the data path carry data from right to left:
The write-back stage, which places the result back into the register file in the middle of the
data path.
The selection of the next value of the PC, choosing between the incremented PC and the
branch address from the MEM stage.
Data flowing from right to left does not affect the current instruction; these reverse data movements
influence only later instructions in the pipeline.
The first right-to-left flow of data causes data hazards.
The second causes control hazards.
Figure 4.14 shows the pipelined version of the data path with registers that hold information produced in
the previous cycle.
Figure 4.14 The pipelined version of the data path with registers.
Figure 4.15 The control lines for the final three stages
The control information follows the instruction with which it’s associated.
Figure 4.16 The pipelined data path with the control signals
4.9.1 Figure 4.17 illustrates the execution of the above instructions using a multiple-clock-cycle pipeline
representation.
4.9.2 When a register is read and written in the same clock cycle, it is assumed that the write is in the
first half of the clock cycle and the read is in the second half, so the read delivers the value
just written.
4.9.2.1 The value of register $2 changes during the middle of clock cycle 5, when the sub
instruction writes its result.
if (MEM/WB.RegWrite
    and (MEM/WB.RegisterRd ≠ 0)
    and not(EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0)
        and (EX/MEM.RegisterRd = ID/EX.RegisterRt))
    and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01
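The MEM-stage forwarding condition can be sketched as a small function in which the more recent
EX/MEM result takes priority when both stages write the register being read. Only the ForwardB = 01 case
appears in the text; the select values "10" (forward from EX/MEM) and "00" (no forwarding) and the
function name are assumptions added to make the sketch complete.

```python
def forward_b(ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd, id_ex_rt):
    """Return the ForwardB multiplexor select as a 2-bit string."""
    ex_hazard = ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == id_ex_rt
    mem_hazard = mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == id_ex_rt
    if ex_hazard:
        return "10"   # assumption: forward the newer result from EX/MEM
    if mem_hazard:
        return "01"   # forward from MEM/WB only when there is no EX hazard
    return "00"       # assumption: no forwarding, use the register file value
```

Register 0 is excluded in both checks because a write to it must not be forwarded as a new value.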
Data Hazards and Stalls
1. Forwarding cannot be used when an instruction tries to read a register following a load instruction
that writes the same register.
2. Something must stall the pipeline for the combination of a load followed by an instruction that reads its
result.
3. A hazard detection unit is used in addition to the forwarding unit to deal with such cases.
It operates during the ID stage so that it can insert the stall between the load and its use.
Checking for load instructions, the control for the hazard detection unit is this single condition:
if (ID/EX.MemRead and
((ID/EX.RegisterRt = IF/ID.RegisterRs) or
(ID/EX.RegisterRt = IF/ID.RegisterRt)))
stall the pipeline
The first line checks whether the instruction is a load: the only instruction that reads data memory is a
load.
The next two lines check whether the destination register field of the load in the EX stage matches
either source register of the instruction in the ID stage.
If the condition holds, the instruction stalls one clock cycle.
After this 1-cycle stall, the forwarding logic can handle the dependence and execution proceeds.
To insert the stall, force the control values in the ID/EX register to 0, so that the EX, MEM and WB
stages do a nop (no-operation).
Figure 4.18 The way stalls are really inserted into the pipeline
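The load-use check and the zeroing of the control values can be sketched as two small functions. The
function names and the dictionary representation of the control values are hypothetical.

```python
def must_stall(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Load-use hazard check performed in the ID stage."""
    return bool(id_ex_mem_read) and (id_ex_rt == if_id_rs or id_ex_rt == if_id_rt)

def id_ex_control(control, stall):
    """On a stall, force every control value to 0 so EX, MEM and WB do a nop."""
    return {name: 0 for name in control} if stall else dict(control)
```

For example, a load writing register 2 followed by an instruction that reads register 2 makes
must_stall return True, and the control values passed on become all zeros for that cycle.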
Correlating predictor
A branch predictor that combines local behavior of a particular branch and global information about the
behavior of some recent number of executed branches.
Tournament branch predictor
A branch predictor with multiple predictions for each branch and a selection mechanism that chooses which
predictor to enable for a given branch.
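One way to picture a tournament predictor is sketched below. It is a simplified assumption, not the
design of any particular processor: it uses 2-bit saturating counters for a per-branch (local) predictor,
a predictor indexed by recent global history, and a per-branch chooser. All class and attribute names are
hypothetical.

```python
class TwoBitCounter:
    """Saturating counter: values 0-1 predict not taken, 2-3 predict taken."""
    def __init__(self):
        self.value = 1

    def predict(self):
        return self.value >= 2

    def update(self, taken):
        self.value = min(3, self.value + 1) if taken else max(0, self.value - 1)


class TournamentPredictor:
    def __init__(self, history_bits=2):
        self.local = {}        # counters indexed by branch address
        self.global_ = {}      # counters indexed by recent global history
        self.chooser = {}      # per-branch selector: "taken" means trust global
        self.history = 0       # outcomes of the most recent branches
        self.mask = (1 << history_bits) - 1

    def _parts(self, pc):
        return (self.local.setdefault(pc, TwoBitCounter()),
                self.global_.setdefault(self.history, TwoBitCounter()),
                self.chooser.setdefault(pc, TwoBitCounter()))

    def predict(self, pc):
        local, glob, chooser = self._parts(pc)
        return glob.predict() if chooser.predict() else local.predict()

    def update(self, pc, taken):
        local, glob, chooser = self._parts(pc)
        local_ok = local.predict() == taken
        global_ok = glob.predict() == taken
        if local_ok != global_ok:       # train the chooser toward the correct one
            chooser.update(global_ok)
        local.update(taken)
        glob.update(taken)
        self.history = ((self.history << 1) | int(taken)) & self.mask
```

The chooser is trained only when the two predictors disagree, so each branch gradually settles on
whichever predictor has been more accurate for it.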
University Question:
Problem 1 Consider a 4-stage pipeline processor. The number of cycles needed by the four instructions I1,
I2, I3 and I4 in stages S1, S2, S3 and S4 is shown below:
      S1   S2   S3   S4
I1     2    1    1    1
I2     1    3    2    2
I3     2    1    1    3
I4     1    2    2    2
From here, number of clock cycles required to execute the loop = 23 clock cycles.
Thus, Option (B) is correct.
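The count can be checked with a short timing model. The loop itself is not reproduced in the text, so
the sketch assumes its body is I1, I2, I3, I4 executed for two iterations; it also assumes that an
instruction enters a stage as soon as it has left the previous stage and the preceding instruction has
left that stage. The function and variable names are hypothetical.

```python
def total_cycles(stage_times, sequence):
    """Cycle at which the last instruction leaves the last stage.

    stage_times maps an instruction name to its cycle count in each stage.
    """
    stage_count = len(next(iter(stage_times.values())))
    free_at = [0] * stage_count   # cycle at which each stage becomes free
    for name in sequence:
        done = 0                  # cycle at which this instruction left the previous stage
        for stage, cycles in enumerate(stage_times[name]):
            done = max(done, free_at[stage]) + cycles
            free_at[stage] = done
    return free_at[-1]

STAGE_TIMES = {
    "I1": (2, 1, 1, 1),
    "I2": (1, 3, 2, 2),
    "I3": (2, 1, 1, 3),
    "I4": (1, 2, 2, 2),
}
```

Under these assumptions, total_cycles(STAGE_TIMES, ["I1", "I2", "I3", "I4"] * 2) evaluates to 23, which
agrees with the answer above.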
The IF, ID and WB stages take one clock cycle each to complete the operation. The number of clock cycles
for the EX stage depends on the instruction. The ADD and SUB instructions need 1 clock cycle and the MUL
instruction needs 3 clock cycles in the EX stage. Operand forwarding is used in the pipelined processor. What
is the number of clock cycles taken to complete the following sequence of instructions?
Solution-
From here, number of clock cycles required to execute the instructions = 8 clock cycles.
Thus, Option (B) is correct.