Ddco M5
The processing unit that executes machine instructions and coordinates the
activities of the other units of the computer is called the Instruction Set Processor (ISP),
the processor, or the Central Processing Unit (CPU).
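Consider the instruction Move R2,(R1), which writes the contents of R2 into the
memory location whose address is in R1. This requires the following sequence:
1) R1out, MARin ; the desired address is loaded into MAR.
2) R2out, MDRin, Write ; the data to be written are loaded into MDR and the Write
command is issued.
3) MDRoutE, WMFC ; the data are transferred from MDR into the memory location
pointed to by R1, and the processor waits for the memory operation to complete.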
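The three control steps above can be traced with a small register-transfer model. This is only an illustrative sketch: the function name move_register_to_memory, the dictionary-based registers and memory, and the example values are assumptions made for the illustration, and the memory wait (WMFC) is treated as instantaneous.

```python
def move_register_to_memory(regs, memory, src="R2", addr="R1"):
    """Trace the control steps of Move src,(addr) on a toy machine model."""
    # Step 1: R1out, MARin -> the address held in R1 is loaded into MAR.
    mar = regs[addr]
    # Step 2: R2out, MDRin, Write -> the data in R2 is loaded into MDR
    # and a write is requested.
    mdr = regs[src]
    # Step 3: MDRoutE, WMFC -> memory takes the word from MDR.
    # The wait for completion is instantaneous in this model.
    memory[mar] = mdr
    return mar, mdr


regs = {"R1": 0x20, "R2": 99}  # assumed example values
memory = {}                     # address -> stored word
mar, mdr = move_register_to_memory(regs, memory)
print(hex(mar), mdr, memory)
```

After the call, the word from R2 sits at the address that was held in R1, while R1 and R2 themselves are unchanged.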
Pipelining:
5.7 Basic Concepts:
The speed of execution of programs is influenced by many factors.
One way to improve performance is to use faster circuit technology to build the
processor and the main memory. Another possibility is to arrange the hardware so that more
than one operation can be performed at the same time. In this way, the number of
operations performed per second is increased even though the elapsed time needed to perform
any one operation is not changed.
Pipelining is a particularly effective way of organizing concurrent activity in a computer
system.
Pipelining is the technique of decomposing a sequential process into sub-operations,
with each sub-operation being executed in a dedicated segment.
It is commonly known as an assembly-line operation.
Now consider a computer that has two separate hardware units, one for fetching
instructions and another for executing them, as shown in Figure b. The instruction
fetched by the fetch unit is deposited in an intermediate storage buffer, B1. This buffer is
needed to enable the execution unit to execute the instruction while the fetch unit is
fetching the next instruction. The results of execution are deposited in the destination
location specified by the instruction.
The computer is controlled by a clock, and any instruction fetch or execute step is
completed in one clock cycle.
Operation of the computer proceeds as in Figure 8.1c.
In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and
stores it in buffer B1 at the end of the clock cycle.
In the second clock cycle, the instruction fetch unit proceeds with the fetch
operation for instruction I2 (step F2). Meanwhile, the execution unit performs
the operation specified by instruction I1, which is available to it in buffer B1
(step E1). By the end of the second clock cycle, the execution of instruction I1 is
completed and instruction I2 is available. Instruction I2 is stored in B1, replacing
I1, which is no longer needed.
Step E2 is performed by the execution unit during the third clock cycle, while
instruction I3 is being fetched by the fetch unit. In this manner, both the fetch
and execute units are kept busy all the time. If the pattern in Figure 8.1c can be
sustained for a long time, the completion rate of instruction execution will be
twice that achievable by the sequential operation depicted in Figure a.
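The overlap described above can be tabulated with a short sketch. It assumes, as the text does, that every fetch and every execute step takes exactly one clock cycle; the function names are hypothetical and chosen only for this illustration.

```python
def sequential_cycles(n):
    """Cycles for n instructions when each is fetched, then executed, in turn."""
    return 2 * n


def pipelined_cycles(n):
    """Cycles for n instructions when fetch and execute overlap."""
    return n + 1 if n > 0 else 0


def two_stage_schedule(n):
    """Map each clock cycle to the (fetch step, execute step) active in it."""
    schedule = {}
    for cycle in range(1, n + 2):
        fetch = f"F{cycle}" if cycle <= n else None        # fetch unit
        execute = f"E{cycle - 1}" if cycle >= 2 else None  # execution unit
        schedule[cycle] = (fetch, execute)
    return schedule


for cycle, steps in two_stage_schedule(3).items():
    print(cycle, steps)
print(sequential_cycles(1000) / pipelined_cycles(1000))
```

For long instruction sequences the ratio of the two cycle counts approaches 2, which is the doubling of the completion rate mentioned in the text.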
For a pipeline with four stages, Fetch (F), Decode (D), Execute (E), and Write (W),
the sequence of events is shown in Figure a. Four instructions are in
progress at any given time. This means that four distinct hardware units are needed,
as shown in Figure b. These units must be capable of performing their tasks
simultaneously and without interfering with one another. Information is passed
from one unit to the next through a storage buffer. As an instruction progresses
through the pipeline, all the information needed by the stages downstream
must be passed along. For example, during clock cycle 4, the information in the
buffers is as follows:
Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being
decoded by the instruction-decoding unit.
Buffer B2 holds both the source operands for instruction I2 and the specification of
the operation to be performed. This is the information produced by the decoding
hardware in cycle 3. The buffer also holds the information needed for the write step of
instruction I2 (step W2). Even though it is not needed by stage E, this information
must be passed on to stage W in the following clock cycle to enable that stage to
perform the required Write operation.
Buffer B3 holds the results produced by the execution unit and the destination
information for instruction I1.
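The buffer contents listed above follow from a simple rule, sketched below under the assumption of an ideal pipeline in which every stage takes one cycle and nothing stalls: instruction Ii occupies stage number s (counting F as 0) during cycle i + s. The helper names are hypothetical.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Write


def stage_occupancy(cycle, n_instructions):
    """Return which instruction each stage is working on in a given cycle."""
    occupancy = {}
    for s, name in enumerate(STAGES):
        i = cycle - s  # instruction index that reaches stage s in this cycle
        occupancy[name] = f"I{i}" if 1 <= i <= n_instructions else None
    return occupancy


def buffer_contents(cycle, n_instructions):
    """Each buffer holds the information for the instruction in the stage it feeds."""
    occ = stage_occupancy(cycle, n_instructions)
    return {"B1": occ["D"], "B2": occ["E"], "B3": occ["W"]}


print(stage_occupancy(4, 4))
print(buffer_contents(4, 4))
```

For cycle 4 the sketch places I3 in B1, I2 in B2, and I1 in B3, matching the description in the text.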
5.9 Pipeline Performance:
The potential increase in performance resulting from pipelining is
proportional to the number of pipeline stages.
However, this increase would be achieved only if pipelined operation as
depicted in Figure a could be sustained without interruption throughout
program execution.
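The ideal case can be quantified with a simple cycle count. The symbols here are introduced only for illustration: k is the number of pipeline stages, n is the number of instructions, and every stage is assumed to take exactly one clock cycle with no stalls. Without pipelining each instruction needs k cycles; with pipelining the first instruction completes after k cycles and one more completes in each cycle after that.

```latex
T_{\text{seq}} = n\,k, \qquad
T_{\text{pipe}} = k + (n - 1), \qquad
S = \frac{T_{\text{seq}}}{T_{\text{pipe}}} = \frac{n\,k}{k + n - 1}
\;\longrightarrow\; k \quad \text{as } n \to \infty
```

The speedup S approaches k only for long, uninterrupted instruction sequences.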
Unfortunately, this is not the case.
Floating-point operations, for example, may take many clock cycles.
For a variety of reasons, one of the pipeline stages may not be able to
complete its processing task for a given instruction in the time allotted. For
example, stage E in the four-stage pipeline of Figure b is responsible for
arithmetic and logic operations, and one clock cycle is assigned for this task.
Although this may be sufficient for most operations, some operations, such
as divide, may require more time to complete. The figure shows an example in
which the operation specified in instruction I2 requires three cycles to
complete, from cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write
stage must be told to do nothing, because it has no data to work with.
Meanwhile, the information in buffer B2 must remain intact until the
Execute stage has completed its operation. This means that stage 2 and, in
turn, stage 1 are blocked from accepting new instructions because the
information in B1 cannot be overwritten. Thus, steps D4 and F5 must be
postponed as shown.
Pipelined operation in Figure 8.3 is said to have been stalled for two clock cycles.
Normal pipelined operation resumes in cycle 7. Any condition that causes the
pipeline to stall is called a hazard. We have just seen an example of a data
hazard.
1) A data hazard is any condition in which either the source or the destination
operands of an instruction are not available at the time expected in the
pipeline. As a result, some operation has to be delayed, and the pipeline
stalls.
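The two-cycle stall described earlier, in which I2 needs three cycles in the Execute stage, can be reproduced with a small scheduling sketch. It is a simplified model that rests on stated assumptions: instructions stay in program order, each stage holds one instruction at a time, and an instruction may advance only when the next stage has been vacated. The function name schedule and the durations table are hypothetical.

```python
STAGES = ["F", "D", "E", "W"]  # Fetch, Decode, Execute, Write


def schedule(durations):
    """durations[i][s] = cycles instruction I(i+1) needs in stage s.

    Returns start[i][s], the first cycle that instruction spends in stage s.
    """
    last = len(STAGES) - 1
    start = []
    for i, d in enumerate(durations):
        row = []
        for s in range(len(STAGES)):
            # Earliest cycle at which this instruction is ready for stage s.
            ready = 1 if s == 0 else row[s - 1] + d[s - 1]
            # Earliest cycle at which stage s is free of the previous instruction.
            if i == 0:
                free = 1
            elif s < last:
                free = start[i - 1][s + 1]  # previous one moved to the next stage
            else:
                free = start[i - 1][s] + durations[i - 1][s]
            row.append(max(ready, free))
        start.append(row)
    return start


# Five instructions; only I2 needs three cycles in stage E.
durations = [[1, 1, 1, 1], [1, 1, 3, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
for i, row in enumerate(schedule(durations), start=1):
    print(f"I{i}", dict(zip(STAGES, row)))
```

In this model W2 moves from cycle 5 to cycle 7, and D4 and F5 both slip to cycle 7, which is the cycle in which the text says normal pipelined operation resumes.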