
Fifthmodulenotes

Uploaded by

Bhavya Sri G
Copyright: © All Rights Reserved

Lecture Notes Digital Design and Computer Organization

Module 5
Basic Processing Unit (BPU)
➢ Fundamental Concepts
• Processor fetches one instruction at a time, and performs the operation specified.
• Instructions are fetched from successive memory locations until a branch or a jump
instruction is encountered.
• Processor keeps track of the address of the memory location containing the next
instruction to be fetched using Program Counter (PC).
• Instruction Register (IR) holds the instruction that is currently being executed.

Executing an Instruction

• Fetch the contents of the memory location pointed to by the PC. The contents of this
location are loaded into the IR (fetch phase).
IR [[PC]]
• Assuming that the memory is byte addressable, increment the contents of the PC by 4
(fetch phase).
PC  [PC] + 4
• Carry out the actions specified by the instruction in the IR (execution phase).
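The fetch phase above can be sketched in a few lines of Python. This is only an illustration: the dictionary-based memory, the instruction strings and the "Halt" marker are assumptions made for the example, not part of the notes.

```python
# Hypothetical memory: each 4-byte instruction word sits at an address
# that is a multiple of 4 (byte-addressable memory, 4-byte instructions).
def fetch(memory, regs):
    """Perform the fetch phase: IR <- [[PC]], then PC <- [PC] + 4."""
    regs["IR"] = memory[regs["PC"]]   # IR <- [[PC]]
    regs["PC"] = regs["PC"] + 4       # PC <- [PC] + 4

memory = {0: "Add (R3), R1", 4: "Move R1, R4", 8: "Halt"}
regs = {"PC": 0, "IR": None}
trace = []
while regs["IR"] != "Halt":
    fetch(memory, regs)
    # The execution phase for the instruction in IR would run here.
    trace.append((regs["IR"], regs["PC"]))
```

After every fetch the PC already points to the next instruction, which is why a branch has to overwrite the PC to change the flow.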

Processor Organization (Single BUS)

• The ALU and all the registers are interconnected via a single common bus.
• The data and address lines of the external memory bus are connected to the internal
processor bus via the memory data register, MDR, and the memory address register,
MAR, respectively.
• Register MDR has two inputs and two outputs.
• Data may be loaded into MDR either from the memory bus or from the internal
processor bus.
• The data stored in MDR may be placed on either bus.

• The input of MAR is connected to the internal bus, and its output is connected to the
external bus.
• The control lines of the memory bus are connected to the instruction decoder and
control logic.
• This unit is responsible for issuing the signals that control the operation of all the
units inside the processor and for interacting with the memory bus.
• The MUX selects either the output of register Y or a constant value 4 to be provided
as input A of the ALU.
• The constant 4 is used to increment the contents of the program counter.
Figure 1 below shows the single-bus organization of the data path inside the processor:

Figure 1: Single-bus Organization of the Data path inside CPU


Executing an Instruction
Executing an instruction involves one or more of the following operations:

1. Transfer a word of data from one processor register to another or to the ALU. (REGISTER TRANSFERS)
2. Perform an arithmetic or a logic operation and store the result in a processor
register. (ARITHMETIC OR LOGIC OPERATION)
3. Fetch the contents of a given memory location and load them into a processor
register. (FETCHING)
4. Store a word of data from a processor register into a given memory location. (STORING)

Figure 2: Input and Output gating for the registers


Instruction execution involves a sequence of steps in which data are transferred from one
register to another.
• For each register, two control signals are used: one to place the contents of that register
on the bus and one to load the data on the bus into the register (represented symbolically
in Figure 2).
• The input and output of register Ri are connected to the bus via switches controlled by
the signals Riin and Riout respectively.
• When Riin is set to 1, the data on the bus are loaded into Ri.
• Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus.

1. REGISTER TRANSFERS
Example: MOVE R1, R4

Suppose we wish to transfer the contents of register R1 to register R4. This can be
accomplished as follows:
• Enable the output of register R1 by setting R1out to 1. This places the contents of R1
on the processor bus.
• Enable the input of register R4 by setting R4in to 1. This loads data from the processor
bus into register R4.
CONTROL SEQUENCE
1. R1out, R4in
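As a rough model of the gating described above, the sketch below treats one control step as a list of signal names. The function name `run_step` and the convention of deriving the register name from the "out"/"in" suffix are assumptions made for this illustration.

```python
def run_step(regs, signals):
    """Apply one control step: one Rout drives the bus, every Rin latches it."""
    outs = [s[:-3] for s in signals if s.endswith("out")]
    ins = [s[:-2] for s in signals if s.endswith("in")]
    if len(outs) != 1:
        # Only one register output may be connected to the bus in a step.
        raise ValueError("exactly one register may drive the bus per step")
    bus = regs[outs[0]]          # Riout = 1 places Ri on the bus
    for name in ins:
        regs[name] = bus         # Riin = 1 loads the bus value into Ri
    return bus

regs = {"R1": 25, "R4": 0}
run_step(regs, ["R1out", "R4in"])   # MOVE R1, R4
```

The source register keeps its value; only the destination changes.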

2. PERFORM AN ARITHMETIC OR LOGIC OPERATION

Example: ADD R1, R2, R3


• The ALU is a combinational circuit that has no internal storage.
• The ALU gets its two operands from the MUX (input A) and from the bus (input B). The
result is temporarily stored in register Z.
• The sequence of operations to add the contents of register R1 to those of R2 and store
the result in R3 is shown below:

CONTROL SEQUENCE

1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in


• In step 1, the output of register R1 and the input of register Y are enabled, causing
the contents of R1 to be transferred over the bus to Y.
• In step 2, the multiplexer's select signal is set to SelectY, causing the multiplexer to gate
the contents of register Y to input A of the ALU. At the same time, the contents of
register R2 are gated onto the bus and, hence, to input B.
• The function performed by the ALU depends on the signals applied to its control lines.
• In this case, the ADD line is set to 1, causing the output of the ALU to be the sum of the
two numbers at inputs A and B.
• This sum is loaded into register Z because its input control signal is activated. In step 3,
the contents of register Z are transferred to the destination register R3. This last transfer
cannot be carried out during step 2, because only one register output can be connected
to the bus during any clock cycle.
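The three-step sequence for ADD R1, R2, R3 can be traced with a small simulation. It is a sketch under simplifying assumptions: registers are entries in a dictionary, the MUX is modelled by a conditional expression, and the helper name `step` is made up for the example.

```python
def step(regs, signals):
    """Execute one clock step of the single-bus data path."""
    outs = [s[:-3] for s in signals if s.endswith("out")]
    assert len(outs) <= 1, "only one register output may drive the bus"
    bus = regs[outs[0]] if outs else None
    alu_out = None
    if "Add" in signals:
        a = regs["Y"] if "SelectY" in signals else 4   # MUX feeds ALU input A
        alu_out = a + bus                                # ALU input B is the bus
    for s in signals:
        if s == "Zin":
            regs["Z"] = alu_out      # Z latches the ALU output, not the bus
        elif s.endswith("in"):
            regs[s[:-2]] = bus

regs = {"R1": 7, "R2": 5, "R3": 0, "Y": 0, "Z": 0}
sequence = [
    ["R1out", "Yin"],
    ["R2out", "SelectY", "Add", "Zin"],
    ["Zout", "R3in"],
]
for signals in sequence:
    step(regs, signals)
```

Step 3 cannot be merged into step 2 in this model either: R2out and Zout would both try to drive the bus and the assertion would fail.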

3. FETCHING A WORD FROM MEMORY

Example: Move (R1), R2

• The processor has to specify the address of the memory location where this information is
stored and request a Read operation.
• This applies whether the information to be fetched represents an instruction in a program
or an operand specified by an instruction.
• The processor transfers the required address to the MAR, whose output is connected to
the address lines of the memory bus.
• At the same time, the processor uses the control lines of the memory bus to indicate that a
read operation is needed.
• When the requested data are received from the memory they are stored in register MDR,
from where they can be transferred to other registers in the processor.
• Because the response time of the memory can vary, the processor waits until it receives an
indication that the requested operation has been completed (Memory-Function-Completed, MFC).
• The output of MAR is enabled all the time.
• Thus the contents of MAR are always available on the address lines of the memory bus.
• When a new address is loaded into MAR, it will appear on the memory bus at the
beginning of the next clock cycle.
• A Read control signal is activated at the same time MAR is loaded.


The actions needed to execute this instruction are:

1. MAR [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2  [MDR]

The connections and control signals for register MDR are shown below:

Figure 3: Connections for register MDR

CONTROL SEQUENCE

1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
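The effect of WMFC on timing can be illustrated with the sketch below. The `latency` parameter (how many clock cycles pass before the memory raises MFC) is a hypothetical value chosen for the example; the notes do not give one.

```python
def read_word(regs, memory, latency=3):
    """Simulate Move (R1), R2 and return the number of clock cycles used."""
    cycles = 0
    # Step 1: R1out, MARin, Read
    regs["MAR"] = regs["R1"]
    cycles += 1
    # Step 2: MDRinE, WMFC. The step is stretched until MFC arrives.
    cycles += latency
    regs["MDR"] = memory[regs["MAR"]]
    # Step 3: MDRout, R2in
    regs["R2"] = regs["MDR"]
    cycles += 1
    return cycles

regs = {"R1": 100, "R2": 0, "MAR": 0, "MDR": 0}
memory = {100: 42}
cycles = read_word(regs, memory)
```

With the assumed latency of 3 cycles, the three control steps take 5 clock cycles in total, because step 2 does not end until the memory responds.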

4. STORING A WORD IN MEMORY

Example: Move R2, (R1)

• Writing a word into a memory location follows a similar procedure.
• The desired address is loaded into MAR.
• Then, the data to be written are loaded into MDR, and a Write command is issued.


CONTROL SEQUENCE

1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
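For comparison with the read case, the write sequence can be sketched as follows. As before, the dictionary-based registers and memory are assumptions for illustration, and the wait for MFC is not timed here.

```python
def write_word(regs, memory):
    """Simulate Move R2, (R1): store the contents of R2 at the address in R1."""
    # Step 1: R1out, MARin
    regs["MAR"] = regs["R1"]
    # Step 2: R2out, MDRin, Write
    regs["MDR"] = regs["R2"]
    # Step 3: MDRoutE, WMFC (MDR drives the external bus until MFC arrives)
    memory[regs["MAR"]] = regs["MDR"]

regs = {"R1": 200, "R2": 9, "MAR": 0, "MDR": 0}
memory = {200: 0}
write_word(regs, memory)
```

Note the direction of MDR: it is loaded from the internal bus (MDRin) and emptied onto the external bus (MDRoutE), the reverse of the read sequence.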

➢ EXECUTION OF A COMPLETE INSTRUCTION

Let us now put together the sequence of elementary operations required to execute one
instruction.
Consider the instruction
ADD (R3), R1

which adds the contents of the memory location pointed to by R3 to register R1. Executing this
instruction requires the following actions:
1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.

Figure 5 below gives the sequence of control steps required to perform these operations
for the single-bus architecture of Figure 1:

Figure 5: Control Sequence for execution of the instruction ADD (R3), R1

Instruction execution proceeds as follows:

FETCH PHASE

• In step 1, the instruction fetch operation is initiated by loading the contents of the PC into
the MAR and sending a Read request to the memory.
• In the same step, the MUX select signal is set to Select4, so the constant 4 is applied to
input A of the ALU.
• This value is added to the operand at input B, which is the contents of the PC, and the
result is stored in register Z.
• The updated value is moved from register Z back into the PC during step 2, while waiting
for the memory to respond.
• In step 3, the word fetched from the memory is loaded into the IR.
• Steps 1 to 3 constitute the instruction fetch phase, which is the same for all instructions.

EXECUTE PHASE
• The instruction is decoded and the control circuitry activates the control signals for steps
4 through 7, which constitute the execution phase.
• The contents of register R3 are transferred to MAR in step 4, and a memory read
operation is initiated.
• Then the contents of R1 are transferred to register Y in step 5, to prepare for the addition
operation.
• When the read operation is completed, the memory operand is available in register MDR,
and the addition operation is performed in step 6.
• The addition is performed by ALU and the sum is stored in register Z, and then
transferred to R1 in step 7.
• The END signal causes a new instruction fetch cycle to begin by returning to step 1.

This discussion accounts for all control signals in Figure 5 except Yin in step 2. There
is no need to copy the updated contents of the PC into register Y when executing the Add
instruction. But in branch instructions, the updated value of the PC is needed to compute the
branch target address. To speed up the execution of branch instructions, this value is copied
into register Y in step 2.
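The seven steps described above can be run through a small interpreter. The control sequence listed in the code is reconstructed from the step-by-step description in the text (the figure itself is not reproduced here), so treat the exact signal lists as an assumption. Memory is modelled as responding whenever WMFC is asserted.

```python
def run(steps, regs, memory):
    """Run a list of control steps on a simplified single-bus data path."""
    for signals in steps:
        outs = [s[:-3] for s in signals if s.endswith("out")]
        assert len(outs) <= 1, "only one register output may drive the bus"
        bus = regs[outs[0]] if outs else None
        alu = None
        if "Add" in signals:
            a = 4 if "Select4" in signals else regs["Y"]   # MUX -> ALU input A
            alu = a + bus                                    # bus -> ALU input B
        for s in signals:
            if s == "Zin":
                regs["Z"] = alu
            elif s.endswith("in"):
                regs[s[:-2]] = bus
        if "WMFC" in signals:
            regs["MDR"] = memory[regs["MAR"]]   # read data arrives with MFC

ADD_R3_R1 = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],   # step 1
    ["Zout", "PCin", "Yin", "WMFC"],                       # step 2
    ["MDRout", "IRin"],                                    # step 3
    ["R3out", "MARin", "Read"],                            # step 4
    ["R1out", "Yin", "WMFC"],                              # step 5
    ["MDRout", "SelectY", "Add", "Zin"],                   # step 6
    ["Zout", "R1in", "End"],                               # step 7
]

regs = {"PC": 0, "IR": None, "MAR": 0, "MDR": 0, "Y": 0, "Z": 0,
        "R1": 10, "R3": 100}
memory = {0: "Add (R3), R1", 100: 32}
run(ADD_R3_R1, regs, memory)
```

After the run, R1 holds 10 + 32 = 42, the PC holds 4, and the IR holds the fetched instruction word.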


BRANCH INSTRUCTIONS
• A branch instruction replaces the contents of the PC with the branch target address.
• This address is usually obtained by adding an offset X, which is given in the branch
instruction, to the updated value of PC.
• Figure 6 below gives the control sequence that implements an unconditional branch
instruction.

Figure 6: Control sequence for unconditional branch instruction

• Processing starts, as usual, with the fetch phase. This phase ends when the instruction is
loaded into the IR in step 3.
• The offset value is extracted from the IR by the instruction decoding circuit.
• Since the value of the updated PC is already available in register Y, the offset X is gated
onto the bus in step 4, and an addition operation is performed.
• The result, which is the branch target address, is loaded into the PC in step 5.

Consider now a conditional branch. In this case we need to check the status of the
condition codes before loading the new value into the PC.
For example, for a Branch-on-negative (Branch<0) instruction, step 4 in Figure 6 is
replaced with
Offset-field-of-IRout, Add, Zin, If N=0 then END
Thus, if N=0, the processor returns to step 1 immediately after step 4. If N=1, step 5 is
performed to load a new value into the PC, thus performing the branch operation.
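The branch logic can be condensed into a short function. It is a sketch that skips the bus-level detail; the function name and the sample addresses are made up for the example.

```python
def branch_on_negative(pc_updated, offset, n_flag):
    """Return the next PC for Branch<0, given the already-incremented PC."""
    y = pc_updated            # copied into Y during step 2 of the fetch phase
    z = y + offset            # step 4: Offset-field-of-IRout, Add, Zin
    if n_flag == 0:
        return pc_updated     # "If N=0 then END": step 5 is skipped
    return z                  # step 5: Z is loaded into the PC

taken = branch_on_negative(104, -24, n_flag=1)
not_taken = branch_on_negative(104, -24, n_flag=0)
```

An unconditional branch behaves like the N=1 case every time. The offset is relative to the updated PC, not to the address of the branch instruction itself.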


Figure 11: Generation of the Zin control signal

The Zin signal is asserted during time slot T1 for all instructions, during T6 for an
ADD instruction, during T4 for an unconditional branch instruction, and so on.
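Written as a logic function, the three cases named above look like this. The terms hidden behind "and so on" are not given in the notes, so the sketch covers only the listed ones; the instruction labels "ADD" and "BR" are naming assumptions.

```python
def z_in(time_slot, instruction):
    """Zin = T1 + T6.ADD + T4.BR (only the terms named in the text)."""
    return (
        time_slot == 1
        or (time_slot == 6 and instruction == "ADD")
        or (time_slot == 4 and instruction == "BR")
    )
```

For example, `z_in(1, "MOVE")` is true because T1 applies to every instruction, while `z_in(6, "BR")` is false.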

Pipelining

The speed of execution of programs is influenced by many factors. One way to improve
performance is to use pipelining. It is a particularly effective way of organizing concurrent
activity in a computer system.

Consider how the idea of pipelining can be used in a computer. The processor executes a
program by fetching and executing instructions one after the other. Let Fi and Ei refer to the
fetch and execute steps for instruction Ii.

Suppose the computer has two separate hardware units, one for fetching instructions and another for
executing them. The instruction fetched by the fetch unit is deposited in an intermediate
storage buffer B1. This buffer is needed to enable the execution unit to execute the
instruction while the fetch unit is fetching the next instruction.


The processing of an instruction need not be divided into only two steps. For example, a pipelined
processor may process each instruction in four steps, as follows:

• F: Fetch
• D: Decode
• E: Execute
• W: Write

This means that four distinct hardware units are needed.

During clock cycle 4, the information in the buffers is as follows.

Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the
instruction decoding unit.

Buffer B2 holds both the source operands for instruction I2 and the specification of the
operation to be performed. This is the information produced by the decoding hardware in cycle
3. The buffer also holds the information needed for the write step of instruction I2 (step
W2). Even though it is not needed by stage E, this information must be passed on to stage W
in the following clock cycle to enable that stage to perform the required write operation.

Buffer B3 holds the results produced by the execution unit and the destination information
for Instruction I1.
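The occupancy of the four stages in each clock cycle can be generated mechanically. The sketch below assumes an ideal pipeline in which every stage takes exactly one cycle and instruction Ii is fetched in cycle i.

```python
STAGES = ["F", "D", "E", "W"]

def pipeline_chart(n_instructions):
    """Return {cycle: {stage: instruction number}} for an ideal 4-stage pipeline."""
    chart = {}
    for i in range(1, n_instructions + 1):
        for s, stage in enumerate(STAGES):
            # Instruction Ii is in stage number s during cycle i + s.
            chart.setdefault(i + s, {})[stage] = i
    return chart

chart = pipeline_chart(4)
for cycle in sorted(chart):
    row = ", ".join(f"{st}{chart[cycle][st]}" for st in STAGES if st in chart[cycle])
    print(f"cycle {cycle}: {row}")
```

In cycle 4 the chart shows F4, D3, E2 and W1 at the same time, which matches the buffer contents described above: B1 feeds the decoding of I3, B2 feeds the execution of I2, and B3 feeds the write step of I1.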


Role of cache memory

Each stage in a pipeline is expected to complete its operation in one clock cycle. Hence the
clock period should be sufficiently long to complete the task being performed in any stage.

If different units require different amounts of time, the clock period must allow the longest
task to be completed. A unit that completes its task early is idle for the remainder of the
clock period.

Hence pipelining is most effective in improving performance if the tasks being performed
in different stages require about the same amount of time.

The clock cycle has to be equal to or greater than the time needed to complete a fetch
operation. However, the access time of the main memory may be as much as 10 times
greater than the time needed to perform basic pipeline stage operations inside the
processor, such as adding two numbers. Thus, if each instruction fetch required access to the
main memory, pipelining would be of little value.


The use of cache memories solves the memory access problem. In particular, when a cache
is included on the same chip as the processor, the access time to the cache is usually the same as
the time needed to perform other basic operations inside the processor.
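A small calculation shows why the slowest stage matters. The stage times below are hypothetical numbers in nanoseconds, chosen only so that a fetch from main memory is 10 times slower than the other operations.

```python
def clock_period(stage_times):
    """The clock period must be long enough for the slowest stage."""
    return max(stage_times.values())

def idle_per_cycle(stage_times):
    """Time each stage sits idle in every clock period."""
    period = clock_period(stage_times)
    return {stage: period - t for stage, t in stage_times.items()}

# Hypothetical stage times in nanoseconds.
with_cache = {"F": 2, "D": 2, "E": 2, "W": 2}
without_cache = {"F": 20, "D": 2, "E": 2, "W": 2}   # fetch is 10 times slower

period_cached = clock_period(with_cache)
period_uncached = clock_period(without_cache)
idle_uncached = idle_per_cycle(without_cache)
```

With these numbers the clock period grows from 2 ns to 20 ns without a cache, and the D, E and W units are idle for 18 ns of every cycle.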

Pipeline Performance

In the example, the operation specified in instruction I2 requires three cycles to complete, from
cycle 4 through cycle 6. Thus, in cycles 5 and 6, the Write stage must be told to do
nothing, because it has no data to work with. Meanwhile, the information in buffer B2 must
remain intact until the Execute stage has completed its operation. This means that stage 2
and, in turn, stage 1 are blocked from accepting new instructions because the information in
B1 cannot be overwritten. Thus, steps D4 and F5 must be postponed.

Pipelined operation is said to have been stalled for two clock cycles. Normal pipelined
operation resumes in cycle 7. Any condition that causes the pipeline to stall is called a
hazard. A data hazard is any condition in which either the source or the destination
operands of an instruction are not available at the time expected in the pipeline. As a result,
some operation has to be delayed and the pipeline stalls.
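The stall can be reproduced with a simple scheduling model. It assumes that the long operation belongs to the second instruction, I2 (which is what the postponed steps D4 and F5 imply), that instructions move through the stages in order, and that an instruction stays in its stage until the next stage is free.

```python
def schedule(exec_cycles):
    """Return, per instruction, the cycle in which it enters F, D, E and W."""
    enter = []
    for i, e in enumerate(exec_cycles):
        dur = [1, 1, e, 1]            # cycles of work in F, D, E, W
        row = []
        for s in range(4):
            # Earliest cycle at which this instruction has finished the previous stage.
            ready = 1 if s == 0 else row[s - 1] + dur[s - 1]
            if i > 0:
                prev = enter[i - 1]
                # A stage is free once the previous instruction has moved on.
                free = prev[s + 1] if s < 3 else prev[3] + 1
                ready = max(ready, free)
            row.append(ready)
        enter.append(row)
    return enter

ideal = schedule([1, 1, 1, 1, 1])
stalled = schedule([1, 3, 1, 1, 1])   # the Execute step of I2 takes 3 cycles
```

In the stalled schedule, I2 enters E in cycle 4 and W in cycle 7, D4 and F5 both move to cycle 7, and the fifth instruction completes two cycles later than in the ideal case.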

The pipeline may also be stalled because of a delay in the availability of an instruction. For
example, this may be the result of a miss in the cache, requiring the instruction to be fetched from
the main memory. Such hazards are often called control hazards or instruction hazards.


The figure gives the function performed by each pipeline stage in each clock cycle. Note
that the Decode unit is idle in cycles 3 through 5, the Execute unit is idle in cycles 4
through 6, and the Write unit is idle in cycles 5 through 7. Such idle periods are called stalls. They
are also often referred to as bubbles in the pipeline.

A third type of hazard that may be encountered in pipelined operation is known as a
structural hazard. This is the situation when two instructions require the use of a given
hardware resource at the same time. The most common case in which this hazard may arise
is in access to memory. One instruction may need to access memory as part of the Execute
or Write stage while another instruction is being fetched.

If instructions and data reside in the same cache unit, only one instruction can proceed and
the other instruction is delayed. Many processors use separate instruction and data caches to
avoid this delay.
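The clash can be located in an ideal four-stage schedule with a few lines of code. The sketch assumes a single shared cache, that a "load" instruction accesses memory during its Execute stage, and that instruction Ii is fetched in cycle i; the instruction kinds are made-up labels.

```python
def memory_conflicts(kinds):
    """List (cycle, accessing instruction, fetched instruction) clashes."""
    conflicts = []
    for i, kind in enumerate(kinds, start=1):
        cycle = i + 2          # Ii is in stage E during cycle i + 2
        fetched = cycle        # Ij is fetched in cycle j
        if kind == "load" and fetched <= len(kinds):
            conflicts.append((cycle, i, fetched))
    return conflicts

program = ["add", "load", "add", "add", "add"]
clashes = memory_conflicts(program)
```

Here the load I2 needs the cache in cycle 4, the same cycle in which I4 is being fetched. With separate instruction and data caches the two accesses go to different units and neither instruction is delayed.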
