DDCO (BCS302) Modules 1-5 Notes
Module 1
Introduction
An electronic circuit is composed of individual electronic components such as resistors, diodes, capacitors,
transistors, etc. An electronic circuit can usually be categorized as either analog or digital.
Analog electronics is an electronics field in which signals change continuously. An analog signal is a signal
whose amplitude can take any value between given limits. An analog circuit operates on continuous signals.
Digital electronics is a field of electronics involving the study of digital signals and the engineering of
devices that use or produce them. A digital signal is a signal whose amplitude can take only given discrete
values between defined limits, i.e., a signal that changes amplitude in discrete steps. A digital circuit operates on
discrete signals. Digital signals are represented using the binary values 0 and 1.
There are two types of digital circuits: 1. Combinational logic circuits 2. Sequential logic circuits
A clock is a periodic, rectangular waveform used as a basic timing signal. The duty cycle of a periodic digital
signal is the ratio of high-level time to the period (or the ratio of low-level time to the period). A table that shows
all of the input/output possibilities of a logic circuit is called a truth table.
NOT gate:
A gate with only one input and a complemented output.
OR gate:
A gate with two or more inputs. The output is high when any input is high.
AND gate:
A gate with two or more inputs. The output is high only when all inputs are high.
If any logic function/logic circuit can be implemented using only one kind of gate, then such a gate is called a
universal logic gate. The NOR gate and the NAND gate are universal logic gates.
NOR gate:
A gate with two or more inputs. The output is low when any input is high.
NAND gate:
A gate with two or more inputs. The output is low only when all inputs are high.
Exclusive-OR (XOR) gate:
A gate with two or more inputs whose output is HIGH only when the number of HIGH inputs is odd.
Equivalence/exclusive-NOR (XNOR) gate:
A gate with two or more inputs whose output is HIGH only when the number of HIGH inputs is even.
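The gate definitions above can be summarized by their truth tables. The following sketch (illustrative; the function names are my own, not from the notes) prints the two-input truth table of each gate:

```python
# Two-input versions of the gates defined above, expressed on bits 0/1.
GATES = {
    "AND":  lambda a, b: a & b,
    "OR":   lambda a, b: a | b,
    "NAND": lambda a, b: 1 - (a & b),   # complement of AND
    "NOR":  lambda a, b: 1 - (a | b),   # complement of OR
    "XOR":  lambda a, b: a ^ b,         # HIGH when the number of HIGH inputs is odd
    "XNOR": lambda a, b: 1 - (a ^ b),   # HIGH when the number of HIGH inputs is even
}

def truth_table(gate):
    """Return the list of outputs for inputs 00, 01, 10, 11."""
    return [GATES[gate](a, b) for a in (0, 1) for b in (0, 1)]

for name in GATES:
    print(name, truth_table(name))
```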
Active-low refers to the concept in which a signal must be low to cause something to happen or to indicate
that something has happened. Assert means to activate. If an input line has a bubble on it, you assert the input
by making it low. If there is no bubble, you assert the input by making it high.
The minterm expression can be written by collecting all terms for which the function evaluates to 1, i.e., a high
output.
Example:
An incompletely specified function is a Boolean function that defines output values for only a subset of its
input combinations, i.e., a Boolean function whose output is a don't-care for at least one of its input combinations.
The Xs in the truth table indicate that we don't care whether the value 0 or 1 is assigned to F.
Example: Truth Table with Don’t-Cares.
In SOP form we use m to denote the required minterms and d to denote the don't-care minterms:
F(A, B, C) = Σ m(0, 3, 7) + Σ d(1, 6)
In POS form we use M to denote the required maxterms and D to denote the don't-care maxterms:
F(A, B, C) = Π M(2, 4, 5) · Π D(1, 6)
Write minterm and maxterm expansions for the following truth table.
The minimum sum of products is not necessarily unique; that is, a given function may have two different
minimum sums of products forms, each with the same number of terms and the same number of literals.
Given a minterm expansion, the minimum sum-of products form can often be obtained by the following
procedure:
(i) Combine terms by using XY′ + XY = X(Y′ + Y) = X. Do this repeatedly to eliminate as many literals as
possible. A given term may be used more than once because X + X = X.
(ii) Eliminate redundant terms by using the theorems of Boolean Algebra.
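As a quick sanity check (illustrative, not part of the notes), the identities used in step (i) can be verified exhaustively over all bit values:

```python
# Exhaustive check of the theorems used above: XY' + XY = X, and X + X = X.
for X in (0, 1):
    for Y in (0, 1):
        lhs = (X & (1 - Y)) | (X & Y)   # XY' + XY
        assert lhs == X                  # the two terms combine to X
    assert (X | X) == X                  # so a term may be reused freely
print("identities hold for all input combinations")
```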
A minimum product of sums expression for a function is defined as a product of sum terms which
(i) has a minimum number of terms, and
(ii) of all those expressions which have the same number of terms, has a minimum number of literals.
Unlike the maxterm expansion, the minimum product of sums form of a function is not necessarily unique.
Given a maxterm expansion, the minimum product of sums can often be obtained by a procedure similar to
that used in the minimum sum of products case, except that the theorem (X+Y′)(X+Y)= X is used to combine
terms.
Simplification of a Boolean function reduces the gate count required to implement the circuit; the circuit
works faster and consumes less power.
Switching/Boolean functions can generally be simplified using algebraic techniques. The
disadvantages of the algebraic procedure are:
(i) The procedures are difficult to apply in a systematic way,
(ii) It is difficult to tell when a minimum solution has been reached.
The Karnaugh map (K-map) is a method of simplifying and manipulating switching functions. The K-map method is
faster and easier to apply than other simplification methods.
A quad is a group of four 1s that are horizontally or vertically adjacent; a quad eliminates two variables
and their complements.
An octet is a group of eight 1s that are horizontally or vertically adjacent; an octet eliminates three variables
and their complements.
Overlapping of groups: We are allowed to use the same 1 more than once.
Rolling of Map:
Groups may wrap around the table. The leftmost cell in a row may be grouped with the rightmost cell
and the top cell in a column may be grouped with the bottom cell. Roll and overlap to get largest group.
Any single 1, or any group of 1s which can be combined together on a map of the function F, represents a
product term which is called an implicant of F. Several implicants of F may be possible. A product-term
implicant is called a prime implicant if it cannot be combined with another term to eliminate a variable.
The following procedure can then be used to obtain a minimum sum of products from a Karnaugh map.
1) Choose a minterm (a 1) which has not yet been covered.
2) Find all 1’s and X’s adjacent to that minterm. (Check the n adjacent squares on an n-variable map.)
3) If a single term covers the minterm and all of the adjacent 1’s and X’s, then that term is an
essential prime implicant, so select that term. (don’t-care terms are treated like 1’s in steps 2 and 3
but not in step 1.)
4) Repeat steps 1, 2, and 3 until all essential prime implicants have been chosen.
5) Find a minimum set of prime implicants which cover the remaining 1’s on the map. (If there is
more than one such set, choose a set with a minimum number of literals.)
The following figure shows the flowchart for determining a minimum sum of products using a
Karnaugh map, with an example.
Solve S = F(A, B, C, D) = Σ m(0, 1, 3, 5, 6, 7, 11, 12, 14) using a K-map and implement using basic gates, NAND
only, and NOR only.
Solve S = F(A, B, C, D) = Σ m(0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13) using a K-map and implement using basic gates,
NAND only, and NOR only.
Limitations of the K-map:
The complexity of the K-map simplification process increases with the number of variables. The K-map is a
manual technique, and the simplification process depends heavily on human ability.
Module 2
COMBINATIONAL LOGIC CIRCUITS
MULTIPLEXERS:
A multiplexer (or data selector, abbreviated as MUX) has a group of data inputs and a group of control inputs. The
control inputs are used to select one of the data inputs and connect it to the output terminal. The following Figure
shows a 2-to-1 multiplexer.
When the control input A is 0, the switch is in the upper position and the MUX output is Z = I0; when A is 1, the
switch is in the lower position and the MUX output is Z = I1. In other words, a MUX acts like a switch that
selects one of the data inputs (I0 or I1) and transmits it to the output. The logic equation for the 2-to-1 MUX is
therefore: 𝑍 = 𝐴′ 𝐼0 + 𝐴𝐼1
The following Figure shows diagrams for a 4-to-1 multiplexer, 8-to-1 multiplexer, and 2^n-to-1 multiplexer.
The 4-to-1 MUX acts like a four-position switch that transmits one of the four inputs to the output. Two control
inputs (A and B) are needed to select one of the four inputs. If the control inputs are AB = 00, the output is I0;
similarly, the control inputs 01, 10, and 11 give outputs of I1, I2, and I3, respectively.
The 4- to-1 multiplexer is described by the equation: 𝑍 = 𝐴′ 𝐵′𝐼0 + 𝐴′ 𝐵𝐼1 + 𝐴𝐵′𝐼2 + 𝐴𝐵𝐼3
Similarly, the 8-to-1 MUX selects one of eight data inputs using three control inputs.
It is described by the equation: 𝑍 = 𝐴′ 𝐵′𝐶′𝐼0 + 𝐴′ 𝐵′ 𝐶𝐼1 + 𝐴′𝐵𝐶′𝐼2 + 𝐴′𝐵𝐶𝐼3 + 𝐴𝐵′𝐶′𝐼4 + 𝐴𝐵′ 𝐶𝐼5 +
𝐴𝐵𝐶′𝐼6 + 𝐴𝐵𝐶𝐼7.
If the OR gate in the above Figure is replaced by a NOR gate, then the 8-to-1 MUX inverts the selected input.
To distinguish between these two types of multiplexers, we will say that the multiplexers without the inversion
have active high outputs, and the multiplexers with the inversion have active low outputs.
In general, a multiplexer with n control inputs can be used to select any one of 2^n data inputs. The general
equation for the output of a MUX with n control inputs and 2^n data inputs is
Z = Σ (k = 0 to 2^n − 1) m_k I_k
where m_k is a minterm of the n control variables and I_k is the corresponding data input.
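The general MUX equation can be sketched in Python (illustrative code, not from the notes): since exactly one minterm m_k of the control variables is 1, the output equals the selected data input I_k.

```python
# A 2^n-to-1 MUX sketch: the control bits form an index k, and Z = I_k,
# because only minterm m_k of the control variables evaluates to 1.
def mux(control_bits, data_inputs):
    """control_bits: tuple of n bits (MSB first); data_inputs: 2**n values."""
    k = 0
    for bit in control_bits:          # build the index k from the control bits
        k = (k << 1) | bit
    return data_inputs[k]             # Z = I_k

# 4-to-1 example: control AB = 10 selects I2
print(mux((1, 0), ["I0", "I1", "I2", "I3"]))  # -> I2
```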
Multiplexers are frequently used in digital system design to select the data which is to be processed or stored.
The following Figure shows how a quadruple 2-to-1 MUX is used to select one of two 4-bit data words. If the
control A = 0, the values of x0, x1, x2, and x3 will appear at the z0, z1, z2, and z3 outputs; if A = 1, the values
of y0, y1, y2, and y3 will appear at the outputs.
Multiplexer Logic:
A digital design usually begins with a truth table. The problem is to come up with a logic circuit that has the
same truth table. We have two standard methods for implementing a truth table – the SOP and the POS solution.
The third method is the multiplexer solution.
Problem: Implement Y (A, B, C, D) = ∑m (0, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 15) using 16-to-1 multiplexer
(IC 74150) & 8-to-1 multiplexer.
We follow a procedure that is similar to the one that we adopted in Entered Variable Map method to
implement Y using 8-to-1 MUX.
8-to-1 MUX
A B C | Data input
0 0 0 | D̅
0 0 1 | 1
0 1 0 | 1
0 1 1 | 0
1 0 0 | 1
1 0 1 | 1
1 1 0 | 1
1 1 1 | D
Problem:
Design a 32-to-1 multiplexer using two 16-to-1 multiplexers and one 2-to-1 multiplexer.
Solution: The circuit diagram is shown in the following Fig. A 32-to-1 multiplexer requires 5 (= log2 32) select
lines (say, ABCDE). The lower four select lines (BCDE) drive the two 16-to-1 multiplexers. The 2-to-1 multiplexer
chooses one of the outputs of the two 16-to-1 multiplexers, depending on the 5th select line.
Problem: Realize 𝑌 = 𝐴̅𝐵 + 𝐵̅𝐶̅ + 𝐴𝐵𝐶 using an 8-to-1 multiplexer. Also, realize the same with a 4-to-1 multiplexer.
Solution: Given, 𝑌 = 𝐴̅𝐵 + 𝐵̅ 𝐶̅ + 𝐴𝐵𝐶
𝑌 = 𝐴̅𝐵(𝐶̅ + 𝐶) + 𝐵̅ 𝐶̅(𝐴̅ + 𝐴) + 𝐴𝐵𝐶
𝑌 = 𝐴̅𝐵𝐶̅ + 𝐴̅ 𝐵𝐶 + 𝐴̅𝐵̅ 𝐶̅ + 𝐴𝐵̅ 𝐶̅ + 𝐴𝐵𝐶
Y = ∑m (0, 2, 3, 4, 7).
Hence, to generate the given logic function, using 8-to-1 multiplexer,we find D0 = D2 = D3 = D4 = D7 = 1 and
D1 =D5 = D6 = 0.
Alternative method
8-to-1 MUX
A B C | Data input
0 0 0 | 1 = D0
0 0 1 | 0 = D1
0 1 0 | 1 = D2
0 1 1 | 1 = D3
1 0 0 | 1 = D4
1 0 1 | 0 = D5
1 1 0 | 0 = D6
1 1 1 | 1 = D7

4-to-1 MUX
Grouping 𝑌 = 𝐴̅𝐵̅𝐶̅ + 𝐴̅𝐵𝐶̅ + 𝐴̅𝐵𝐶 + 𝐴𝐵̅𝐶̅ + 𝐴𝐵𝐶 by the select inputs A and B:
𝑌 = 𝐴̅𝐵̅(𝐶̅) + 𝐴̅𝐵(𝐶̅ + 𝐶) + 𝐴𝐵̅(𝐶̅) + 𝐴𝐵(𝐶)
Hence, for a 4-to-1 multiplexer, D0 = 𝐶̅, D1 = 1, D2 = 𝐶̅, and D3 = C generate the given function.
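As an illustrative cross-check (not part of the notes' figures), the 8-to-1 and the 4-to-1 realizations can be verified to produce the same truth table as Y = Σ m(0, 2, 3, 4, 7):

```python
# Both MUX realizations of Y = sum m(0, 2, 3, 4, 7) must agree on all inputs.
minterms = {0, 2, 3, 4, 7}

def y_8to1(a, b, c):
    data = [1, 0, 1, 1, 1, 0, 0, 1]          # D0..D7: constants only
    return data[(a << 2) | (b << 1) | c]     # select lines A, B, C

def y_4to1(a, b, c):
    data = [1 - c, 1, 1 - c, c]              # D0 = C', D1 = 1, D2 = C', D3 = C
    return data[(a << 1) | b]                # select lines A, B

for i in range(8):
    a, b, c = (i >> 2) & 1, (i >> 1) & 1, i & 1
    expected = 1 if i in minterms else 0
    assert y_8to1(a, b, c) == expected == y_4to1(a, b, c)
print("both realizations match the minterm list")
```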
Normally, a logic circuit will not operate correctly if the outputs of two or more gates or other logic devices are
directly connected to each other. Use of three-state logic permits the outputs of two or more gates or other logic
devices to be connected together. The following Figure shows a three-state buffer and its logical equivalent.
When the enable input B is 1, the output C equals A; when B is 0, the output C acts like an open circuit. In other
words, when B is 0, the output C is effectively disconnected from the buffer output so that no current can flow.
This is often referred to as a Hi-Z (high-impedance) state of the output because the circuit offers a very high
resistance or impedance to the flow of current. Three-state buffers are also called tri-state buffers.
The following Figure shows the truth tables for four types of three-state buffers.
In Figures (a) and (b), the enable input B is not inverted, so the buffer output is enabled when B = 1 and disabled
when B = 0. That is, the buffer operates normally when B = 1, and the buffer output is effectively an open circuit
when B = 0. We use the symbol Z to represent this high-impedance state.
In Figure (b), the buffer output is inverted so that C = A’ when the buffer is enabled.
The buffers in Figures (c) and (d) operate the same as in (a) and (b) except that the enable input is inverted, so the
buffer is enabled when B = 0.
In the following Figure, the outputs of two three-state buffers are tied together. When B = 0, the top buffer is
enabled, so that D = A; when B = 1, the lower buffer is enabled, so that D = C. Therefore, 𝐷 = 𝐵′ 𝐴 + 𝐵𝐶. This is
logically equivalent to using a 2-to-1 multiplexer to select the A input when B = 0 and the C input when B = 1.
When we connect two three-state buffer outputs together, as shown in the following Figure, if one of the buffers
is disabled (output = Z), the combined output F is the same as the other buffer output. If both buffers
are disabled, the output is Z. If both buffers are enabled, a conflict can occur. If A = 0 and C = 1, we do not know
what the hardware will do, so the F output is unknown (X). If one of the buffer inputs is unknown, the F output
will also be unknown. The table in the following Figure summarizes the operation of the circuit. S1 and S2
represent the outputs the two buffers would have if they were not connected together. When a bus is driven by
three-state buffers, we call it a three-state bus. The signals on this bus can have values of 0, 1, Z, and perhaps X.
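The bus-resolution behavior described above can be sketched as a small function (illustrative; the value names '0', '1', 'Z', and 'X' follow the text):

```python
# Resolve the value on a bus driven by two three-state buffer outputs.
def resolve(s1, s2):
    if s1 == 'Z':
        return s2                 # a disabled buffer does not drive the bus
    if s2 == 'Z':
        return s1
    if s1 == s2 and s1 in ('0', '1'):
        return s1                 # both buffers drive the same value
    return 'X'                    # conflict, or an unknown buffer input

print(resolve('Z', '1'))  # -> 1 (only one buffer enabled)
print(resolve('0', '1'))  # -> X (bus conflict)
```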
A multiplexer may be used to select one of several sources to drive a device input. For example, if an adder input
must come from four different sources, a 4-to-1 MUX may be used to select one of the four sources. An
alternative is to set up a three-state bus, using three-state buffers to select one of the sources (see the following
Figure). In this circuit, each buffer symbol actually represents four three-state buffers that have a common enable
signal.
Integrated circuits are often designed using bi-directional pins for input and output. Bi-directional means that the
same pin can be used as an input pin and as an output pin, but not both at the same time. To accomplish this, the
circuit output is connected to the pin through a three-state buffer, as shown in the following Figure. When the
buffer is enabled, the pin is driven with the output signal. When the buffer is disabled, an external source can
drive the input pin.
The decoder is another commonly used type of integrated circuit. The following Figure shows the diagram and
truth table for a 3-to-8 line decoder. This decoder generates all of the minterms of the three input variables.
Exactly one of the output lines will be 1 for each combination of the values of the input variables.
The following Figure illustrates a 4-to-10 decoder. This decoder has inverted outputs (indicated by the small
circles). For each combination of the values of the inputs, exactly one of the output lines will be 0. When a
binary-coded-decimal digit is used as an input to this decoder, one of the output lines will go low to indicate
which of the 10 decimal digits is present.
In general, an n-to-2^n line decoder generates all 2^n minterms (or maxterms) of the n input variables. The
outputs are defined by the equations:
yi = mi, i = 0 to 2^n − 1 (non-inverted outputs)
or
yi = mi′ = Mi, i = 0 to 2^n − 1 (inverted outputs)
where mi is a minterm of the n input variables and Mi is a maxterm.
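A decoder's behavior, y_i = m_i, can be sketched as follows (illustrative code, not from the notes): exactly one output is 1 for each input combination.

```python
# An n-to-2^n decoder sketch: output y_i equals minterm m_i of the inputs.
def decoder(bits):
    """bits: tuple of n input bits (MSB first); returns the 2**n outputs."""
    n = len(bits)
    i = 0
    for b in bits:
        i = (i << 1) | b                     # the input value selects minterm m_i
    return [1 if k == i else 0 for k in range(2 ** n)]

print(decoder((1, 0, 1)))  # 3-to-8 decoder: only y5 (minterm m5) is 1
```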
Problem: Realize f1 = m1 + m2 + m4 and f2 = m4 + m7 + m9 using a decoder.
Solution:
An n-input decoder generates all of the minterms of n variables. Hence, n-variable functions can be realized by
ORing together selected minterm outputs from a decoder.
Rewriting the given f1 and f2, we have f1 = (m1′ m2′ m4′)′ and f2 = (m4′ m7′ m9′)′. Now, f1 and f2 can be
generated using NAND gates, as shown in the following Figure.
Problem:
Show how, using a 3-to-8 decoder and multi-input OR gates, the following Boolean expressions can be realized
simultaneously.
F1 (A, B, C) = ∑m (0, 4, 6) F2 (A, B, C) = ∑m (0, 5) F3 (A, B, C) = ∑m (1, 2, 3, 7).
Solution: Since we get all the minterms at the decoder output, we use them as shown in the following Fig to get
the required Boolean expressions.
An encoder (which converts an active input signal to a coded output signal) performs the inverse function of a decoder.
The following Figure shows an 8-to-3 priority encoder with inputs y0 through y7. If input yi is 1 and the other
inputs are 0, then the abc outputs represent a binary number equal to i. For example, if y3 = 1, then abc = 011.
If more than one input is 1 at the same time, the output can be defined using a priority scheme. The truth table in
the above Figure uses the following scheme: If more than one input is 1, the highest numbered input determines
the output. For example, if inputs y1, y4, and y5 are 1, the output is abc = 101.The X’s in the table are don’t-
cares; for example, if y5 is 1, we do not care what inputs y0 through y4 are. Output d is 1 if any input is 1,
otherwise, d is 0. This signal is needed to distinguish the case of all 0 inputs from the case where only y0 is 1.
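The priority scheme described above can be sketched in Python (illustrative; the scan order implements the highest-numbered-input-wins rule, and d flags whether any input is active):

```python
# 8-to-3 priority encoder: the highest-numbered active input determines abc.
def priority_encoder(y):
    """y: list of 8 input bits y0..y7; returns (a, b, c, d)."""
    for i in range(7, -1, -1):        # scan from the highest priority down
        if y[i]:
            return ((i >> 2) & 1, (i >> 1) & 1, i & 1, 1)
    return (0, 0, 0, 0)               # no input active: d = 0

# y1, y4 and y5 active: y5 has priority, so abc = 101 and d = 1
print(priority_encoder([0, 1, 0, 0, 1, 1, 0, 0]))  # -> (1, 0, 1, 1)
```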
Module 3
Basic Structure of computers
3. CONTROL UNIT
It controls the data transfer operations between memory and the processor, and between I/O devices and the
processor. It generates control signals for memory and I/O devices.
4. PC (PROGRAM COUNTER)
It is a special-purpose register used to hold the address of the next instruction to be executed. The contents
of the PC are incremented after an instruction or data word is fetched from memory: by 1 for an 8-bit CPU,
by 2 for a 16-bit CPU, and by 4 for a 32-bit CPU.
5. REGISTER ARRAY
The structure of the register file is as shown in the above figure. It consists of a set of registers.
A register is defined as a group of flip-flops; each flip-flop is designed to store 1 bit of data and is a
storage element.
6. IR (INSTRUCTION REGISTER)
It holds the instruction to be executed. Its output is available to the control unit.
7. ALU (ARITHMETIC and LOGIC UNIT)
It performs arithmetic and logical operations on given data.
BUS STRUCTURE
Bus: It is defined as a set of parallel wires used for data communication. Each wire carries 1 bit
of data. There are 3 types of buses, namely
1. Address bus
2. Data bus and
3. Control bus.
1. Address bus :
It is unidirectional.
The CPU sends the address of an I/O device or Memory device by means of this bus.
2. Data bus
It is a bidirectional bus.
Data is transferred between memory and the CPU, and between I/O devices and the CPU,
by means of this bus.
3. Control bus:
This bus carries control signals for Memory and I/O devices.
PERFORMANCE
The performance of a computer system is based on the hardware design of the processor
and the instruction set of the processor. To obtain high performance of a computer
system, it is necessary to reduce the execution time of the processor.
Execution time: It is defined as the total time required to execute one complete program.
The performance of the processor is inversely proportional to the execution time of the
processor:
More performance = Less execution time.
Less performance = More execution time.
CACHE MEMORY: It is defined as a fast-access memory located between the CPU and main
memory. It is part of the processor, as shown in the fig.
The processor needs more time to read data and instructions from main memory
because main memory is farther from the processor, as shown in the figure. Hence it slows
down the performance of the system.
The processor needs less time to read the data and instructions from Cache Memory
because it is part of the processor. Hence it improves the performance of the system.
PROCESSOR CLOCK:
The processor circuits are controlled by timing signals called the clock. The clock defines constant
time intervals called clock cycles. To execute one instruction there are 3 basic
steps, namely
1. Fetch
2. Decode
3. Execute.
The processor uses one clock cycle to perform one operation as shown in the figure
Clock Cycle → T1 T2 T3
Instruction → Fetch Decode Execute
The performance of the processor depends on the length of the clock cycle. To obtain
high performance, reduce the length of the clock cycle. Let P be the length of one clock
cycle (the clock period) and R be the clock rate; then P = 1/R.
Ex 1: R = 500 MHz, P = ? P = 1/(500 × 10^6) = 0.002 × 10^-6 s = 2 ns
Ex 2: R = 1250 MHz, P = ? P = 1/(1250 × 10^6) = 0.0008 × 10^-6 s = 0.8 ns
Let N be the number of instructions contained in the program. To execute one instruction
there are 3 steps, namely 1. Fetch 2. Decode 3. Execute.
Let S be the average number of basic steps required to execute one instruction, and let R be
the clock rate (the number of clock cycles per second generated by the processor). The
Processor Execution Time is then given by
T = (N × S) / R
This equation is called the Basic Performance Equation.
For the programmer the value of T is important. To obtain high performance it is necessary to
reduce the values of N and S and to increase the value of R.
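The basic performance equation can be illustrated with made-up numbers (the function name and the values are my own, purely for illustration):

```python
# T = (N * S) / R: execution time from instruction count, average steps
# per instruction, and clock rate.
def execution_time(n_instructions, avg_steps, clock_rate_hz):
    return n_instructions * avg_steps / clock_rate_hz

# Illustrative: 10 million instructions, 4 steps each, 500 MHz clock
t = execution_time(10_000_000, 4, 500_000_000)
print(f"T = {t} s")  # -> T = 0.08 s
```

Doubling R (or halving N or S) halves T, which is why the notes say to reduce N and S and increase R.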
CLOCK RATE
Improving the integrated-circuit (IC) technology makes logic circuits faster, which
reduces the time needed to complete a basic step. This allows the clock period P to be
reduced and the clock rate R to be increased.
Reducing the amount of processing done in one basic step also makes it possible to
reduce the clock period.
PERFORMANCE MEASUREMENT
The computer community adopted the idea of measuring computer performance
using benchmark programs.
The performance measure is the time it takes a computer to execute a given benchmark.
An organization called the System Performance Evaluation Corporation (SPEC) selects and
publishes programs for different application domains.
It also provides many test results for commercially available computers.
Benchmark suites developed in 1995 and 2000 are called SPEC95 and SPEC2000,
respectively.
Programs are selected from various fields like games, database, numerical
calculations.
The program is compiled for the computer under test, and running time on that
computer is measured.
The same program is compiled and run on one computer selected as a reference.
For SPEC95, the reference is the SUN SPARC station, for SPEC2000, the reference
computer is an ULTRA SPARC workstation.
SPEC rating:
The SPEC rating is computed as follows:
SPEC rating = (running time on the reference computer) / (running time on the computer under test)
A SPEC rating of 50 means that the computer under test is 50 times as fast as the
reference computer for that particular benchmark.
The test is repeated for all programs in the SPEC suite, and the geometric mean of the
results is computed. Let SPEC_i be the rating for program i in the suite.
The overall SPEC rating for the computer is given by
SPEC rating = ( Π (i = 1 to n) SPEC_i )^(1/n)
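The geometric-mean computation can be sketched as follows (illustrative; the ratings are made up):

```python
# Overall SPEC rating: geometric mean of the per-program ratings.
def spec_rating(ratings):
    product = 1.0
    for r in ratings:
        product *= r                      # Π SPEC_i
    return product ** (1.0 / len(ratings))  # n-th root of the product

print(spec_rating([4.0, 16.0]))  # -> 8.0 (geometric mean of 4 and 16)
```

The geometric mean is used rather than the arithmetic mean so that one extreme program rating cannot dominate the overall score.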
1. Memory is a storage device. It is used to store character operands, data operands and
instructions.
2. It consists of a number of semiconductor cells, and each cell holds 1 bit of
information. A group of 8 bits is called a byte, and a group of 16-64 bits is called a
word.
Word length is defined as the number of bits in a word: word length = 16 for a 16-bit CPU
and word length = 8 for an 8-bit CPU.
Memory is organized in terms of bytes or words.
The organization of memory for 32 bit processor is as shown in the fig.
Memory words
The contents of memory location can be accessed for read and write operation either by
specifying address of the memory location or by name of the memory location.
Address space: It is defined as the number of bytes accessible to the CPU; it depends on the
number of address lines.
BYTE ADDRESSABILITY
The computer performs ALU operations on 3 quantities, namely bit, byte and word. It is
impractical to assign addresses to individual bits of information. Hence, for practical reasons,
addresses are assigned to successive bytes.
In the little-endian technique, the lower byte of data is assigned to the lower address of memory
and the higher byte of data is assigned to the higher address of memory.
The structure of memory representing a 32-bit number with the little-endian assignment is as shown
in the fig.
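Little-endian byte ordering can be illustrated with Python's int.to_bytes (the value 0x12345678 is an arbitrary example, not from the notes):

```python
# Little endian: the low-order byte of a 32-bit value goes to the lowest
# memory address; big endian stores the high-order byte first.
value = 0x12345678
little = value.to_bytes(4, "little")   # byte at the lowest address first
big = value.to_bytes(4, "big")

print([hex(b) for b in little])  # -> ['0x78', '0x56', '0x34', '0x12']
print([hex(b) for b in big])     # -> ['0x12', '0x34', '0x56', '0x78']
```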
WORD ALIGNMENT
Word size may be 16, 32, or 64 bits.
Words are said to be aligned in memory if they begin at byte addresses that are multiples of
the number of bytes in the word; this is called word alignment.
A character occupies 1 byte of memory and hence has a byte address; numbers occupy a
word (e.g., 2 bytes) of memory and hence have word addresses.
MEMORY OPERATION
There are two types of memory operations namely 1. Memory read and 2. Memory
write
ADD R0, R1, R2
Opcode Source1, Source2, Destination
This instruction adds the contents of R0 with the contents of R1, and the result is stored in R2.
The mathematical representation of this statement is
R2 ← [R0] + [R1]
Such notation is called Register Transfer Notation; the form ADD R0, R1, R2 itself is
assembly language notation.
Consider the arithmetic expression C = A + B, where A, B, and C are memory locations.
Steps for evaluation
1. Access the first memory operand whose symbolic name is given by A.
2. Access the second memory operand whose symbolic name is given by B.
3. Perform the addition operation between two memory operands.
4. Store the result into the 3rd memory location C.
5. The mathematical representation is C ←[A] + [B].
opcode operand
Ex1: LOAD A
This instruction copies the contents of the memory location whose symbolic name is 'A'
into the Accumulator, as shown in the figure.
ADD B
This instruction adds the contents of the Accumulator with the contents of memory
location 'B', and the result is stored in the Accumulator.
STORE B
This instruction copies the contents of the Accumulator into the memory location whose
symbolic name is 'B'.
The 3-instruction program is stored in successive memory locations, as shown in the fig.
The system bus consists of a unidirectional address bus, a bidirectional data bus, and a control bus.
“It is the process of accessing instructions from successive memory locations: the instruction
whose address is stored in the program counter is fetched from memory into the Instruction
Register (IR) by means of the bidirectional data bus, and after the instruction is accessed, the
contents of the PC are incremented by 4 in order to point to the next instruction. Such a
process is called Straight-Line Sequencing.”
INSTRUCTION EXECUTION
There are 5 steps for instruction execution
1 Fetch the instruction from memory into the Instruction Register (IR) whose
address is stored in PC.
IR ← [ [PC] ]
2 Increment the contents of PC by 4.
PC ← [PC] + 4.
3 Decode the instruction.
4 Perform the operation according to the opcode of an instruction
5 Load the result into the destination.
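The five steps can be sketched as a toy fetch-decode-execute loop (purely illustrative; the instruction encoding and register names are invented for this sketch, not the notes' processor):

```python
# Toy memory holding two "instructions" at word-aligned addresses 0 and 4.
memory = {0: ("ADD", "R0", "R1", "R2"), 4: ("HALT",)}
regs = {"PC": 0, "IR": None, "R0": 5, "R1": 7, "R2": 0}

while True:
    regs["IR"] = memory[regs["PC"]]   # 1. fetch: IR <- [[PC]]
    regs["PC"] += 4                   # 2. increment: PC <- [PC] + 4
    op, *args = regs["IR"]            # 3. decode the opcode and operands
    if op == "HALT":
        break
    if op == "ADD":                   # 4-5. execute and load the result
        src1, src2, dst = args
        regs[dst] = regs[src1] + regs[src2]

print(regs["R2"])  # -> 12
```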
BRANCHING
Instead of using a long list of add instructions, it is possible to place a single Add
instruction in a program loop as shown in figure.
The loop is a straight- line sequence of instructions executed as many times as
needed.
The condition code flags are N, Z, V, and C.
1 N (NEGATIVE) Flag:
It is designed to differentiate between positive and negative results.
It is set to 1 if the result is negative, and set to 0 if the result is positive.
2 Z (ZERO) Flag:
It is set to 1 when the result of an ALU operation is zero; otherwise it is cleared.
3 V (OVERFLOW) Flag:
In the 2's-complement number system, an n-bit number can represent a range of
values from -2^(n-1) to +2^(n-1) - 1. The overflow flag is set to 1 if the result is
found to be out of this range.
4 C (CARRY) Flag:
This flag is set to 1 if there is a carry from addition or a borrow from
subtraction; otherwise it is cleared.
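The four flags can be illustrated for 8-bit two's-complement addition (a sketch with an invented helper name, not the processor's actual flag logic):

```python
# Compute result and N, Z, V, C flags for an n-bit addition (default 8-bit).
def add_flags(x, y, bits=8):
    mask = (1 << bits) - 1
    raw = (x & mask) + (y & mask)
    result = raw & mask
    n = (result >> (bits - 1)) & 1            # N: sign bit of the result
    z = 1 if result == 0 else 0               # Z: result is zero
    c = 1 if raw > mask else 0                # C: carry out of the MSB
    sx, sy = (x >> (bits - 1)) & 1, (y >> (bits - 1)) & 1
    v = 1 if (sx == sy and n != sx) else 0    # V: like signs, unlike result sign
    return result, n, z, v, c

# 0x7F + 0x01 = 0x80: negative result from two positives, so N = 1 and V = 1
print(add_flags(0x7F, 0x01))  # -> (128, 1, 0, 1, 0)
```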
ADDRESSING MODES
The different ways in which the location of an operand is specified in an instruction are
referred to as addressing modes.
1.REGISTER ADDRESSING
In this mode operands are stored in the registers of CPU. The name of the register is directly
specified in the instruction.
Ex: MOVE R1,R2 Where R1 and R2 are the Source and Destination registers respectively.
This instruction transfers 32 bits of data from R1 register into R2 register. This instruction
does not refer memory for operands. The operands are directly available in the registers.
2. ABSOLUTE ADDRESSING
It is also called Direct Addressing Mode. In this addressing mode, operands are stored in
memory locations, and the name of the memory location is directly specified in the
instruction.
Ex: MOVE X, R1 : Where X is the memory location and R1 is the Register.
This instruction transfers 32 bits of data from memory location X into the General Purpose
Register R1.
3. IMMEDIATE ADDRESSING
In this Addressing Mode, operands are directly specified in the instruction. The source field is
used to hold the operand, which is written with a # (hash) sign.
Ex: MOVE #23, R0
4. INDEX ADDRESSING
In this Addressing Mode, the effective address (EA) of an operand is the sum of an offset X
given in the instruction and the contents of an index register Ri:
EA of an operand = X + (Ri)
RELATIVE ADDRESSING
In this Addressing Mode the EA of an operand is computed as in the Index Addressing Mode,
but the PC (Program Counter) is used in place of a GPR. The symbolic representation of this
mode is X(PC), where X is the offset value and PC is the Program Counter holding the
address of the next instruction to be executed.
It can be represented as
EA of an operand = X + (PC).
This Addressing Mode is useful to calculate the EA of the target memory location.
ADDITIONAL MODES
Autodecrement mode: the EA of the operand is held in one of the GPRs of the CPU. The
contents of the register are first decremented (by 4 for 32-bit operands) and then used as the
effective address from which the data is transferred to the destination.
Module 4
INPUT/OUTPUT ORGANIZATION
⚫ The program shown in the figure reads a line of characters from the keyboard and stores it
in a memory buffer starting at location LINE.
⚫ As each character is read , it is echoed back to the display.
⚫ Register R0 is used as a pointer to the memory buffer area.
⚫ The contents of R0 are updated using the auto increment mode so that successive
characters are stored in successive memory locations.
⚫ Each character is checked to see if it is the carriage-return (CR) character, which has
ASCII code 0D (hex).
⚫ If it is, a line-feed character (ASCII code 0A) is sent to move the cursor one line down on
the display; otherwise the program loops back to wait for another character from the
keyboard.
⚫ This example illustrates program controlled I/O in which the processor repeatedly checks
a status flag to achieve the required synchronization between the processor and an input
or output device.
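The polling scheme can be sketched as a small simulation (illustrative, not the notes' assembly program; the device is faked with iterators):

```python
# Program-controlled I/O: busy-wait on a status flag, then read a character;
# stop when the carriage-return character (ASCII 0x0D) is read.
def read_line(next_char, is_ready):
    line = []
    while True:
        while not is_ready():          # wait loop: repeatedly check the flag
            pass
        ch = next_char()               # device is ready: read the character
        if ch == "\x0d":               # CR ends the line
            break
        line.append(ch)                # store (and, on real hardware, echo) it
    return "".join(line)

# Simulated device that becomes ready only on every other status check
chars = iter("hi\x0d")
ready = iter([False, True] * 10)
print(read_line(lambda: next(chars), lambda: next(ready)))  # -> hi
```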
INTERRUPTS
⚫ In the program-controlled I/O technique, the processor initiates the action and checks the
status of the device by entering a wait loop.
⚫ During this period, the processor is not performing any useful computation.
⚫ There are many situations where other tasks can be performed while waiting for an I/O
device to become ready.
⚫ To allow this to happen, we can arrange for the I/O device to alert the processor when it
becomes ready.
⚫ It can do so by sending a hardware signal called an INTERRUPT to the processor.
⚫ At least one of the bus control lines, called an INTERRUPT REQUEST LINE is usually
dedicated for this purpose.
⚫ Using interrupts, waiting periods can ideally be eliminated.
EXAMPLE:
⚫ Consider the task that requires some computations to be performed and the results to be
printed on a line printer.
⚫ Let the program consist of two routines, COMPUTE and PRINT.
⚫ Assume that COMPUTE produces a set of “N” lines of output, to be printed by the
PRINT routine.
⚫ But the printer accepts only one line of text at a time.
⚫ First COMPUTE routine is executed to produce the first “N” lines of output.
Then the PRINT routine is executed to send the first line of text to the printer. At this
time, instead of waiting for the line to be printed, the PRINT routine may be temporarily
suspended and execution of the COMPUTE routine continued.
⚫ Whenever the printer becomes ready, it alerts the processor by sending an interrupt-request
signal.
⚫ In response, the processor interrupts the execution of the COMPUTE routine and
transfers control to the PRINT routine.
⚫ The PRINT routine sends the second line to the printer and is again suspended.
⚫ The routine executed in response to an interrupt request is called the interrupt-service
routine (ISR). An ISR may not have anything in common with the program being executed at the
time the interrupt request is received.
⚫ In fact, the two programs often belong to different tasks.
⚫ Therefore, before starting execution of the interrupt-service routine, any information
that may be altered during its execution must be saved; this saved information is restored
when the interrupted program is resumed.
⚫ The task of saving and restoring information can be done automatically by the processor
or by program instructions.
⚫ Saving registers increases the delay between the time an interrupt request is received and
the start of execution of the interrupt-service routine.
⚫ This delay is called INTERRUPT LATENCY.
⚫ In some earlier processors, particularly those with a small number of registers, all
registers are saved automatically by the processor at the time an interrupt request is accepted.
⚫ The data saved are restored to their respective registers as part of the execution of the
return from interrupt instruction.
⚫ Some computers provide two types of interrupts
1) one saves all register contents
2) the other does not.
Interrupt Hardware
⚫ We discussed that an I/O device requests an interrupt by activating a bus line called
interrupt request line.
⚫ Most computers are likely to have several I/O devices that can request an interrupt.
⚫ A single interrupt-request line may be used to serve n devices, as shown in the figure.
⚫ If all interrupt-request signals are inactive, that is, if all switches are open, the
voltage on the line is equal to Vdd; this is the inactive state of the line.
⚫ When a device requests an interrupt by closing its switch, the voltage on the line drops
to 0, causing the interrupt-request signal INTR received by the processor to go to 1.
⚫ When an interrupt arrives the processor suspends the execution of one program and
begins the execution of another program requested by an I/O device.
⚫ Because interrupts can arrive at any time, they may alter the sequence of events.
⚫ A fundamental facility found in all computers is the ability to enable and disable such
interrupts.
⚫ There are many situations in which the processor should ignore interrupt requests.
⚫ For these reasons, some means of enabling and disabling interrupts must be available to
the programmer.
⚫ A simple way is to provide machine instructions, such as Interrupt-enable and Interrupt-disable,
that perform these functions.
⚫ Let us consider in detail the specific case of a single interrupt request from one device.
⚫ When a device activates the interrupt-request signal, it keeps the signal activated until
it learns that the processor has accepted its request.
⚫ It is essential to ensure that this active request signal does not lead to successive
interruptions, causing the system to enter an infinite loop from which it cannot recover.
First Possibility:
⚫ The processor hardware ignores the interrupt-request line until the execution of the first
instruction of the interrupt-service routine has been completed.
⚫ Then, an Interrupt-disable instruction used as the first instruction in the interrupt-service
routine prevents further requests from being accepted.
⚫ Typically, an Interrupt-enable instruction will be the last instruction in the interrupt-service
routine.
Second Possibility:
⚫ The processor automatically disables the interrupts before starting the execution of the
ISR.
⚫ Prior to disabling, the processor should save the contents of PC and PROCESSOR
STATUS REGISTER(PS) on the stack.
⚫ The processor status register has one bit called interrupt-enable which will enable
interrupts when set to 1.
⚫ After saving the contents of the PS on the stack, the processor clears the interrupt-enable
bit in its PS register, thus disabling further interrupts.
⚫ When return from interrupt instruction is executed, the contents of the PS are restored
from the stack, setting the interrupt enable bit back to 1, hence interrupts are again
enabled.
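The save-and-restore behaviour of the second possibility can be sketched with a toy processor model. The bit position chosen for the interrupt-enable bit and the register names are assumptions for illustration, not a real machine's layout.

```python
# Sketch (toy processor model) of the second possibility: the processor saves
# the PC and the processor status word PS on the stack, clears the
# interrupt-enable bit, and restores both on return-from-interrupt, which
# re-enables interrupts.

IE_BIT = 1 << 0                 # assumed position of the interrupt-enable bit

class CPU:
    def __init__(self):
        self.PC = 100           # arbitrary address in the interrupted program
        self.PS = IE_BIT        # interrupts initially enabled
        self.stack = []

    def accept_interrupt(self, isr_address):
        self.stack.append((self.PC, self.PS))  # save PC and PS on the stack
        self.PS &= ~IE_BIT                     # disable further interrupts
        self.PC = isr_address                  # jump to the ISR

    def return_from_interrupt(self):
        self.PC, self.PS = self.stack.pop()    # restore PC and PS; IE back to 1

cpu = CPU()
cpu.accept_interrupt(0x2000)
disabled = cpu.PS & IE_BIT      # 0 while the ISR runs
cpu.return_from_interrupt()
```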
Third Possibility:
⚫ The processor has a special interrupt-request line for which the interrupt-handling circuit
responds only to the leading edge of the signal.
⚫ In this case, the processor receives only one request, regardless of how long the line is
activated.
⚫ Let us consider the situation where a number of devices capable of initiating interrupts
are connected to the processor.
⚫ Because these devices are operationally independent, there is no definite order in which
they will generate interrupts.
⚫ For example, device X may request an interrupt while an interrupt caused by Y is being
serviced or several devices may request interrupts at exactly the same time.
⚫ Given that different devices are likely to require different interrupt-service routines, how
can processor obtain the starting address of the appropriate routine in each case?
⚫ Should a device be allowed to interrupt the processor while another interrupt is being
serviced?
Polling Technique
⚫ When a device raises an interrupt request, it sets to 1 one of the bits in its status register,
which we call the IRQ bit.
⚫ For example bits KIRQ and DIRQ are the interrupt request bits for the keyboard and the
display.
⚫ The processor interrogates these bits in some order; the first device encountered with its
IRQ bit set is the device that should be serviced.
⚫ The main disadvantage of polling is the time spent interrogating the IRQ bits of devices
that are not requesting any service.
Vectored Interrupts
⚫ To reduce the time involved in the polling process, a device requesting an interrupt may
identify itself directly to the processor.
⚫ Then, the processor can immediately start executing the corresponding interrupt-service
routine.
⚫ A device requesting an interrupt can identify itself by sending a special code to the
processor over the bus.
⚫ This enables the processor to identify individual devices even if they share a single
interrupt-request line.
⚫ The code supplied by the device may represent the starting address of the interrupt-
service routine for that device.
⚫ The location pointed to by the interrupting device is used to store the starting address of
the interrupt-service routine.
⚫ When a device sends an interrupt request, the processor may not be ready to receive the
interrupt-vector code immediately.
⚫ The interrupting device must therefore wait to put data on the bus until the processor is
ready to receive it.
⚫ When processor is ready to receive the interrupt-vector code, it activates the interrupt-
acknowledge line, INTA.
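The dispatch step can be sketched as a table lookup. The table contents and addresses below are made-up examples, not values from the text:

```python
# Sketch of vectored interrupts: when the processor activates INTA, the
# requesting device puts its vector code on the bus; the code indexes a table
# holding ISR starting addresses (the interrupt-vector table).

interrupt_vector_table = {0: 0x1000, 1: 0x1040, 2: 0x1080}  # example addresses

def acknowledge(device_code):
    """Processor asserts INTA, receives the code, and jumps to the ISR."""
    return interrupt_vector_table[device_code]  # new PC = ISR starting address

pc = acknowledge(2)     # device 2 identified itself; its ISR starts at 0x1080
```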
Interrupt nesting
⚫ The same arrangement is often used when several devices are involved, in which case
execution of a given interrupt-service routine, once started, always continues to
completion before the processor accepts an interrupt request from a second device.
⚫ Interrupt service routines are typically short, and the delay they may cause is acceptable
for most simple devices.
⚫ For some devices, however, a long delay in responding to an interrupt request may cause
errors.
⚫ Consider, for example, a computer that keeps track of the time of day using a real-time
clock.
⚫ This is a device that sends interrupt requests to the processor at regular intervals.
⚫ For each of these requests, the processor executes a short interrupt-service routine to
increment a set of counters in the memory that keep track of time in seconds, minutes and
so on.
⚫ It may be necessary to accept an interrupt request from the clock during the execution of
an interrupt-service routine for another device.
⚫ This example suggests that I/O devices should be organized in a priority structure.
⚫ An interrupt request from a high-priority device should be accepted while the processor
is servicing another request from a lower-priority device.
⚫ To implement this scheme, we can assign a priority level to the processor that can be
changed under program control.
⚫ The priority level of the processor is the priority of the program that is currently being
executed.
⚫ The processor accepts interrupts only from devices that have priorities higher than its
own.
⚫ At the time the execution of an interrupt-service routine for some device is started, the
priority of the processor is raised to that of the device.
⚫ This action disables interrupts from devices at the same level of priority or lower.
⚫ The processor’s priority is usually encoded in a few bits of the processor status word.
⚫ A multiple-priority scheme can be implemented by using separate interrupt-request and
interrupt-acknowledge lines for each priority level. Interrupt requests received over these
lines are sent to a priority arbitration circuit in the processor.
⚫ A request is accepted only if it has a higher priority level than that currently assigned to
the processor.
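The priority rule above (accept only strictly higher-priority requests, raise the processor's priority while servicing) can be sketched directly. The numeric priority levels assigned to the devices are illustrative:

```python
# Sketch of priority-based interrupt nesting: a request is accepted only if the
# device's priority exceeds the processor's current priority, which is raised
# to the device's level while its ISR runs and restored afterwards.

class PriorityCPU:
    def __init__(self):
        self.priority = 0          # running an ordinary, lowest-priority program
        self.saved = []

    def request(self, device_priority):
        if device_priority <= self.priority:
            return False           # same or lower priority: request not accepted
        self.saved.append(self.priority)
        self.priority = device_priority   # raised while servicing this device
        return True

    def return_from_interrupt(self):
        self.priority = self.saved.pop()  # restore the previous priority

cpu = PriorityCPU()
accepted_disk = cpu.request(2)     # disk ISR starts; processor priority -> 2
accepted_printer = cpu.request(1)  # lower priority: ignored during disk ISR
accepted_clock = cpu.request(4)    # real-time clock preempts the disk ISR
```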
Simultaneous Requests
⚫ When multiple requests are received over a single request line at the same time, the
processor must have some means of deciding which request to service first.
Daisy-Chain:
⚫ The daisy chain is a commonly used hardware arrangement for handling many requests over a
single interrupt-request line.
⚫ In this method, priority is determined by the order in which the devices are connected in
the chain.
⚫ When several devices raise an interrupt request and the INTR line is activated, the
processor responds by setting the INTA line to 1.
⚫ Device 1 passes the signal on to device 2 only if it does not require any service.
⚫ If device 1 has a pending interrupt request, it blocks the INTA signal and proceeds to
put its identifying code on the data lines.
⚫ Therefore, in the daisy-chain arrangement, the device that is electrically closest to the
processor has the highest priority, the next device has the second highest priority, and so on.
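The propagation of the acknowledge signal along the chain can be sketched as follows; the device names and identifying codes are hypothetical:

```python
# Sketch of the daisy chain: INTA propagates from the processor through the
# devices in electrical order; the first device with a pending request blocks
# INTA and puts its identifying code on the data lines.

def daisy_chain(pending, codes):
    """pending[i] is True if device i has a request; codes[i] is its ID code.
    Devices are listed in electrical order, closest to the processor first."""
    for i, has_request in enumerate(pending):
        if has_request:
            return codes[i]   # blocks INTA; later devices never see the grant
    return None

winner = daisy_chain([False, True, True], ["DEV1", "DEV2", "DEV3"])
```

DEV2 wins even though DEV3 also has a request pending, because it is closer to the processor in the chain.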
Priority Groups
⚫ Devices are organized in groups, and each group is connected at a different priority level.
⚫ Within a group, devices are connected in a daisy-chain, this organization is used in many
systems.
Direct Memory Access (DMA)
⚫ DMA transfers are performed by a control circuit that is part of the I/O device interface.
We refer to this circuit as a DMA controller.
⚫ To initiate the transfer of a block of words, the processor sends the DMA controller the
starting address, the number of words in the block, and the direction of the transfer.
⚫ On receiving this information, the DMA controller proceeds to perform the requested
operation.
⚫ When the entire block has been transferred, the controller informs the processor by
raising an interrupt signal.
⚫ While a DMA transfer is taking place, the program that requested the transfer cannot
continue, and the processor can be used to execute another program.
⚫ After the DMA transfer is completed, the processor can return to the program that
requested the transfer.
⚫ I/O operations are always performed by the Operating System of the computer.
⚫ The OS is also responsible for suspending the execution of one program and starting
another.
⚫ Thus, for an I/O operation involving DMA, the OS puts the program that requested the
transfer in the blocked state, initiates the DMA operation, and starts the execution of
another program.
⚫ When the transfer is completed, the DMA controller informs the processor by sending an
interrupt request.
⚫ FIGURE shows an example of the DMA controller registers that are accessed by the
processor to initiate transfer operations.
⚫ Two registers are used for storing the starting address and the word count. A third
register contains status and control flags, including an R/W bit that determines the
direction of the transfer: when this bit is 1, the controller performs a read operation;
otherwise it performs a write operation.
⚫ When the controller has completed transferring a block of data it sets the DONE flag to
1.
⚫ When the IE (interrupt-enable) flag is set to 1, it causes the controller to raise an
interrupt after it has completed transferring a block of data.
⚫ The controller sets the IRQ bit to 1 when it has requested an interrupt.
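The register set just described can be sketched as a small model. The flag names follow the text (R/W, IE, Done, IRQ), but the class interface and the memory model are assumptions for illustration:

```python
# Sketch of the DMA controller registers described above: a starting address, a
# word count, and status/control flags R/W, IE, Done, and IRQ.

class DMAController:
    def __init__(self):
        self.start_addr = 0
        self.word_count = 0
        self.RW = 1        # 1: read operation, 0: write operation
        self.IE = 0        # interrupt-enable: raise IRQ when the block is done
        self.Done = 0
        self.IRQ = 0

    def transfer(self, memory, buffer):
        """Move word_count words between memory and the device buffer."""
        for i in range(self.word_count):
            addr = self.start_addr + i
            if self.RW:
                buffer.append(memory[addr])   # read: memory -> device buffer
            else:
                memory[addr] = buffer[i]      # write: device buffer -> memory
        self.Done = 1                         # entire block transferred
        if self.IE:
            self.IRQ = 1                      # alert the processor

mem = {100: 7, 101: 8, 102: 9}
dma = DMAController()
dma.start_addr, dma.word_count, dma.IE = 100, 3, 1
buf = []
dma.transfer(mem, buf)
```

After the call, Done and IRQ are both set, mirroring the controller informing the processor that the block transfer is complete.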
⚫ An example of a computer system showing how DMA controllers may be used is given in the
figure.
⚫ The disk controller, which controls two disks, also has DMA capability and provides two
DMA channels.
⚫ It can perform two independent DMA operations, as if each disk had its own DMA
controller.
⚫ The registers needed to store the memory address, the word count, and so on are
duplicated, so that one set can be used with each device.
⚫ To start a DMA transfer of a block of data from the main memory to one of the disks, the
processor loads the address and word-count information into the registers of the
corresponding channel of the disk controller.
⚫ When the DMA transfer is completed, this fact is recorded in the status and control
register of the DMA channel by setting the DONE bit.
⚫ Requests by DMA devices for using the bus are always given higher priority than processor
requests.
⚫ Since the processor originates most memory access cycles, the DMA controller can be
said to “STEAL” memory cycles from the processor. Hence this technique is called
CYCLE STEALING.
BLOCK/BURST Mode:
⚫ The DMA controller may be given exclusive access to the main memory to transfer a
block of data without interruption.
⚫ Most DMA controllers contain a data storage buffer. In the case of the network interface
in the figure, for example, the DMA controller reads a block of data from main memory and
stores it in its input buffer; the data in the buffer are then transmitted over the
network.
Bus Arbitration
⚫ A conflict may arise if both the processor and a DMA controller, or two DMA controllers,
try to use the bus at the same time to access the main memory.
⚫ The device that is allowed to initiate data transfers on the bus at any given time is called
the BUS MASTER.
⚫ When the current bus master relinquishes control of the bus, another device can acquire
this status.
⚫ Bus arbitration is the process by which the next device to become bus master is selected
and bus mastership is transferred to it. There are two approaches:
1) Centralized Arbitration
2) Distributed Arbitration
⚫ In centralized arbitration, a single bus arbiter performs the required arbitration,
whereas in distributed arbitration all devices participate in the selection of the next bus
master.
Centralized Arbitration
⚫ In centralized arbitration, the bus arbiter may be the processor or a separate unit
connected to the bus.
⚫ The figure shows a basic arrangement in which the processor contains the bus-arbitration
circuitry.
⚫ In this case, the processor is normally the bus master unless it grants bus mastership to
one of the DMA controllers.
⚫ A DMA controller indicates that it needs to become the bus master by activating the
Bus-Request line, BR.
⚫ When the Bus-Request line is activated, the processor activates the Bus-Grant signal, BG1,
indicating to the DMA controllers that they may use the bus when it becomes free.
⚫ Thus, if DMA controller 1 is requesting the bus, it blocks the propagation of the grant
signal to the other devices; otherwise, it passes the grant signal on to the next device.
⚫ The current bus master indicates to all devices that it is using the bus by activating
another line called Bus-Busy (BBSY).
⚫ Hence, after receiving the Bus-Grant signal, a DMA controller waits for Bus-Busy to
become inactive, then assumes bus mastership; at this time it activates Bus-Busy.
⚫ The timing diagram in the figure shows the sequence of events for the devices
Distributed Arbitration
⚫ In distributed arbitration all devices participate in the selection of next bus master.
⚫ When one or more devices request the bus, they assert the Start-Arbitration signal and
place their 4-bit identification numbers on four lines, ARB0 through ARB3.
⚫ A winner is selected as a result of the interaction among the signals transmitted over
these lines by all contenders.
⚫ The arbitration lines are driven by open-collector drivers, so they carry the logical OR
of the patterns placed on them: if one device puts a 1 on a line while another puts a 0 on
the same line, the line carries a 1 (electrically, the low-voltage state).
⚫ Consider that two devices A and B having ID numbers 5 and 6 respectively are
requesting the use of bus.
⚫ Device A transmits the pattern 0101, and device B transmits the pattern 0110.
⚫ Each device compares the pattern on the arbitration lines to its own ID, starting from the
most significant bit.
⚫ If it detects a difference at any bit position, it disables its drivers at that bit
position and for all lower-order bits.
⚫ In our example, device A detects the difference on line ARB1; hence it disables its
drivers on lines ARB1 and ARB0. This causes the pattern on the arbitration lines to change
to 0110, which means that device B has won the contention.
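The contention between devices A and B can be simulated. This is a behavioural sketch of the algorithm (the real circuit settles combinationally; here the lines are re-evaluated iteratively until they stabilize):

```python
# Sketch of distributed arbitration: each contender drives its 4-bit ID onto
# open-collector lines, which carry the OR of the driven patterns. On seeing a
# 1 on a line where its own bit is 0, a device disables its drivers for that
# bit and all lower-order bits. The surviving pattern is the winning ID.

def arbitrate(ids, width=4):
    lines = [0] * width
    while True:
        new_lines = [0] * width
        for d in ids:
            for i in range(width):            # i = 0 is the most significant bit
                bit = (d >> (width - 1 - i)) & 1
                if bit == 0 and lines[i] == 1:
                    break                     # difference: drop this and lower bits
                new_lines[i] |= bit           # wired-OR of the driven bits
        if new_lines == lines:                # lines have settled
            break
        lines = new_lines
    return int("".join(map(str, lines)), 2)

winner = arbitrate([5, 6])   # device A = 0101, device B = 0110
```

As in the worked example, device A (0101) withdraws at ARB1, the lines settle at 0110, and device B wins; the device with the larger ID always wins the contention.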
The next level in the memory hierarchy is called secondary memory. It holds huge
amounts of data.
Cache Memories
A cache is a small, fast memory located between the processor and main memory, as shown in
the fig. It is designed to reduce the effective memory access time.
Many instructions in localized areas of the program are executed repeatedly during some
time period, and the remainder of the program is accessed relatively infrequently. This is
referred to as Locality of Reference.
The memory control circuitry is designed to take advantage of the property of locality of
Reference.
The Temporal aspect of the locality of Reference suggests that whenever an information
item is first needed this item should be brought into the cache where it will hopefully
remain until it is needed again.
The Spatial aspect suggests that instead of fetching just one item from the main memory
to the cache, it is useful to fetch several items that reside at adjacent addresses as
well. We will use the term block to refer to a set of contiguous address locations of some
size.
The processor does not need to know explicitly about the existence of the cache.
The cache control circuitry determines whether the requested word currently exists in the
cache.
If it does, the Read or Write operation is performed on the appropriate cache location.
This is referred to as a Read or Write hit.
In a Read operation the main memory is not involved. For a write operation the system
can proceed in 2 ways.
In the first technique called the write through protocol the cache location and the main
memory location are updated simultaneously.
The second technique is to update only the cache location and to mark it as updated with
an associated flag bit, often called the dirty or modified bit. The main memory location
of the word is updated later, when the block containing this marked word is removed from
the cache to make room for a new block. This technique is known as the write-back or
copy-back protocol.
The write back protocol may also result in unnecessary write operations because when a
cache block is written back to the memory all words of the block are written back even
if only a single word has been changed while the block was in the cache.
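The two write policies can be contrasted with a minimal sketch. The cache is modelled as a plain dictionary and the addresses are illustrative:

```python
# Sketch contrasting the two write policies: write-through updates the cache
# and main memory together; write-back updates only the cache and sets a dirty
# bit, writing the block back to memory only when it is evicted.

memory = {"A": 1}
cache = {"A": {"data": 1, "dirty": False}}

def write_through(addr, value):
    cache[addr]["data"] = value
    memory[addr] = value                    # both copies updated simultaneously

def write_back(addr, value):
    cache[addr]["data"] = value
    cache[addr]["dirty"] = True             # memory updated only on eviction

def evict(addr):
    if cache[addr]["dirty"]:
        memory[addr] = cache[addr]["data"]  # copy-back of the marked block
    del cache[addr]

write_back("A", 5)
stale = memory["A"]       # still the old value: memory not yet updated
evict("A")                # dirty block written back; memory now holds 5
```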
When the addressed word in a Read operation is not in the cache, a Read miss occurs. The
block of words that contains the requested word is copied from the main memory into the
cache. After the entire block is loaded into the cache, the particular word requested is
forwarded to the processor.
Mapping functions
There are 3 techniques to map main memory blocks into cache memory.
1. Direct mapped cache
The simplest way to determine cache locations in which to store memory blocks is the
direct mapping technique as shown in the figure.
Blocks 0, 128, and 256 of main memory are mapped into cache block 0. Similarly, blocks 1,
129, and 257 of main memory are loaded into cache block 1.
Note that contention may arise when more than one memory block maps to a single cache
block, even when the cache is not full.
A main memory block is placed into a cache block based on its memory address. The main
memory address consists of 3 fields, as shown in the figure.
Each block consists of 16 words; hence the least significant 4 bits select one of the 16
words in a block.
The next 7 bits of the memory address specify the cache block position.
The most significant 5 bits of the memory address are stored as the tag bits; they
identify which of the 2^5 = 32 memory blocks that map to this position currently resides
in the cache block.
The higher order 5 bits of memory address are compared with the tag bits. If they match,
then the desired word is in that block of the cache.
If there is no match, then the block containing the required word must first be read from
the main memory and loaded into the cache. It is very easy to implement, but not
flexible.
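The 4/7/5-bit address split above can be sketched directly. The addresses used in the usage lines are chosen so the word field is zero:

```python
# Sketch of the direct-mapped address split described above: a 16-bit address
# is divided into a 4-bit word field, a 7-bit cache-block field, and a 5-bit
# tag field.

def direct_map(address):
    word = address & 0xF             # low 4 bits: one of 16 words in the block
    block = (address >> 4) & 0x7F    # next 7 bits: cache block position (0..127)
    tag = (address >> 11) & 0x1F     # top 5 bits: which of 32 blocks mapped here
    return tag, block, word

# Memory block j maps to cache block j mod 128, e.g. blocks 0, 128, 256 -> 0:
tag0, blk0, _ = direct_map(0 << 4)       # memory block 0
tag1, blk1, _ = direct_map(128 << 4)     # memory block 128
tag2, blk2, _ = direct_map(256 << 4)     # memory block 256
```

All three blocks land in cache block 0 but carry different tags (0, 1, 2), which is exactly how a hit or miss is decided on a later access.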
2. Associative Mapping
It is also called an associative-mapped cache. It is much more flexible.
In this technique, a main memory block can be placed into any cache block position.
In this case, 12 tag bits are required to identify a memory block when it is resident in
the cache.
The Associative Mapping technique is illustrated as shown in the fig.
In this technique 12 bits of address generated by the processor are compared with the tag
bits of each block of the cache to see if the desired block is present. This is called as
associative mapping technique.
It gives more flexibility to choose the cache location in which to place the memory block.
3. Set-Associative Mapping
In this technique, the blocks of the cache are grouped into sets; a memory block maps to a
particular set but can be placed in any block of that set. The cache in the figure
consists of 64 sets; hence a 6-bit set field is used to select one of the 64 cache sets.
The tag field (6 bits) of the memory address is compared with the tags of the blocks of
the selected set to determine whether the memory block is present.
In this case, memory blocks 0, 64, 128, ..., 4032 map into cache set 0, and they can
occupy either of the two block positions within that set.
The following figure clearly describes the working principle of Set Associative Mapping
technique.
Hence the contention problem of the direct method is eased by having a few choices for
block placement. At the same time, the hardware cost is reduced by decreasing the size of
the associative search.
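The two-way set-associative split can be sketched in the same style as the direct-mapped case, assuming 16-word blocks, 64 sets, and a 6-bit tag:

```python
# Sketch of the two-way set-associative split: a 4-bit word field, a 6-bit set
# field (64 sets), and a 6-bit tag; memory block j maps to set j mod 64.

def set_assoc_map(address):
    word = address & 0xF
    set_index = (address >> 4) & 0x3F   # 6-bit set field selects one of 64 sets
    tag = (address >> 10) & 0x3F        # 6-bit tag compared within the set
    return tag, set_index, word

# Blocks 0, 64, 128, ..., 4032 all map to set 0 and may occupy either of the
# two block positions (ways) of that set:
sets = [set_assoc_map(b << 4)[1] for b in (0, 64, 128, 4032)]
```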
Module 5
Basic Processing Unit (BPU)
Fundamental Concepts
The processor fetches one instruction at a time and performs the operation specified.
Instructions are fetched from successive memory locations until a branch or a jump
instruction is encountered.
The processor keeps track of the address of the memory location containing the next
instruction to be fetched using the Program Counter (PC).
The fetched instruction is held in the Instruction Register (IR).
Executing an Instruction
Fetch the contents of the memory location pointed to by the PC. The contents of this
location are loaded into the IR (fetch phase).
IR ← [[PC]]
Assuming that the memory is byte addressable, increment the contents of the PC by 4
(fetch phase).
PC ← [PC] + 4
Carry out the actions specified by the instruction in the IR (execution phase).
ALU and all the registers are interconnected via a single common bus.
The data and address lines of the external memory bus are connected to the internal
processor bus via the memory data register (MDR) and the memory address register (MAR),
respectively.
Register MDR has two inputs and two outputs.
Data may be loaded into MDR either from the memory bus or from the internal
processor bus.
The data stored in MDR may be placed on either bus.
The input of MAR is connected to the internal bus, and its output is connected to the
external bus.
The control lines of the memory bus are connected to the instruction decoder and
control logic.
This unit is responsible for issuing the signals that control the operation of all the
units inside the processor and for interacting with the memory bus.
The MUX selects either the output of register Y or a constant value 4 to be provided
as input A of the ALU.
The constant 4 is used to increment the contents of the program counter.
Figure 1 below shows the single-bus organization of the data path inside the processor:
Executing an Instruction
Instruction execution involves a sequence of steps in which data are transferred from one
register to another.
For each register, two control signals are used: one to place the contents of that
register on the bus, and one to load the data on the bus into the register (symbolically
represented in the figure above).
The input and output of register Ri are connected to the bus via switches controlled by
the signals Riin and Riout respectively.
When Riin is set to 1, the data on the bus are loaded into Ri.
Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus.
1. REGISTER TRANSFERS
Example: MOVE R1, R4
Suppose we wish to transfer the contents of register R1 to register R4. This can be
accomplished as follows:
Enable the output of register R1 by setting R1out to 1. This places the contents of R1
on the processor bus.
Enable the input of register R4 by setting R4in to 1. This loads data from the processor
bus into register R4.
CONTROL SEQUENCE
1. R1out, R4in
Example: performing the operation R3 ← [R1] + [R2]
CONTROL SEQUENCE
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
In step 1, the output of register R1 and the input of register Y are enabled, causing
the contents of R1 to be transferred over the bus to Y.
In Step 2, the multiplexer’s select signal is set to Select Y, causing the multiplexer to gate
the contents of register Y to input A of the ALU. At the same time, the contents of
register R2 are gated onto the bus and, hence, to input B.
The function performed by the ALU depends on the signals applied to its control lines.
In this case, the ADD line is set to 1, causing the output of the ALU to be the sum of the
two numbers at inputs A and B.
This sum is loaded into register Z because its input control signal is activated. In step 3,
the contents of register Z are transferred to the destination register R3. This last transfer
cannot be carried out during step 2, because only one register output can be connected
to the bus during any clock cycle.
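The three control steps above can be simulated on a toy model of the single-bus datapath. Registers are a plain dictionary and step 2 folds the multiplexer, ALU, and Z latch into one line; the register values are arbitrary examples:

```python
# Sketch simulating the three-step control sequence above on the single-bus
# datapath: one register output drives the bus per step; Y feeds ALU input A
# via the multiplexer, the bus feeds input B, and Z latches the ALU result.

regs = {"R1": 20, "R2": 22, "Y": 0, "Z": 0, "R3": 0}

def step(out_reg, in_reg):
    """Gate out_reg onto the bus and load the bus into in_reg (Rout, Rin)."""
    regs[in_reg] = regs[out_reg]

step("R1", "Y")                        # 1. R1out, Yin
regs["Z"] = regs["Y"] + regs["R2"]     # 2. R2out, SelectY, Add, Zin
step("Z", "R3")                        # 3. Zout, R3in
```

Note that each `step` call moves exactly one value, mirroring the constraint that only one register output can drive the bus in any clock cycle, which is why the transfer to R3 needs its own step.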
2. FETCHING A WORD FROM MEMORY
To fetch a word of information from memory, the processor has to specify the address of
the memory location where this information is stored and request a Read operation.
This applies whether the information to be fetched represents an instruction in a program
or an operand specified by an instruction.
The processor transfers the required address to the MAR, whose output is connected to
the address lines of the memory bus.
At the same time, the processor uses the control lines of the memory bus to indicate that a
read operation is needed.
When the requested data are received from the memory they are stored in register MDR,
from where they can be transferred to other registers in the processor.
To accommodate this, the processor waits until it receives an indication that the requested
operation has been completed (Memory-Function-Completed, MFC).
The output of MAR is enabled all the time.
Thus the contents of MAR are always available on the address lines of the memory bus.
When a new address is loaded into MAR, it will appear on the memory bus at the
beginning of the next clock cycle
A read control signal is activated at the same time MAR is loaded.
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
The connections and control signals for register MDR are shown below:
CONTROL SEQUENCE (storing a word in memory, e.g. Move R2, (R1)):
1. R1out,MARin
2. R2out,MDRin, Write
3. MDRoutE,WMFC
Let us now put together the sequence of elementary operations required to execute one
instruction.
Consider the instruction
ADD (R3), R1
which adds the contents of the memory location pointed to by R3 to register R1. Executing
this instruction requires the following actions:
1. Fetch the instruction.
2. Fetch the first operand (the contents of the memory location pointed to by R3).
3. Perform the addition.
4. Load the result into R1.
Figure5 below gives the sequence of control steps required to perform these operations
for the single-bus architecture of figure1:
Instruction execution proceeds as follows:
FETCH PHASE
In step 1 instruction fetch operation is initiated by loading the contents of the PC into the
MAR and sending a read request to the memory.
The Select signal is set to select the constant 4.
This value is added to the operand at input B, which is the contents of the PC, and the
result is stored in register Z.
The updated value is moved from register Z back into the PC during step 2, while waiting
for the memory to respond.
In step 3, the word fetched from the memory is loaded into the IR.
Steps 1 to 3 constitute the instruction fetch phase, which is the same for all instructions.
EXECUTE PHASE
The instruction is decoded and the control circuitry activates the control signals for steps
4 through 7, which constitute the execution phase.
The contents of register R3 are transferred to MAR in step 4, and a memory read
operation is initiated.
Then the contents of R1 are transferred to register Y in step 5, to prepare for the addition
operation.
When the read operation is completed, the memory operand is available in register MDR,
and the addition operation is performed in step 6.
The addition is performed by ALU and the sum is stored in register Z, and then
transferred to R1 in step 7.
The END signal causes a new instruction fetch cycle to begin by returning to step 1.
This discussion accounts for all control signals in figure 5 except Yin in step 2. There
is no need to copy the updated contents of the PC into register Y when executing the Add
instruction. But in branch instructions the updated value of the PC is needed to compute
the branch target address. To speed up the execution of branch instructions, this value is
copied into register Y in step 2.
BRANCH INSTRUCTIONS
A branch instruction replaces the contents of the PC with the branch target address.
This address is usually obtained by adding an offset X, which is given in the branch
instruction, to the updated value of PC.
Figure 6 below gives the control sequence that implements an unconditional branch
instruction.
Processing starts, as usual, with the fetch phase. This phase ends when the instruction is
loaded into the IR in step 3.
The offset value is extracted from the IR by the instruction decoding circuit.
Since the value of the updated PC is already available in register Y, the offset X is gated
onto the bus in step 4, and an addition operation is performed.
The result, which is the branch target address, is loaded into the PC in step 5.
Consider now a conditional branch. In this case we need to check the status of the
condition codes before loading the new value into the PC.
For example, for a Branch-on-negative (Branch < 0) instruction step 4 in figure 6 is
replaced with
Offset-field-of-IRout, Add, Zin, If N=0 then END
Thus if N=0 the processor returns to step 1 immediately after step 4. If N=1, step 5 is
performed to load a new value into the PC, thus performing the branch instruction.
The Zin signal, for example, is asserted during time slot T1 for all the instructions,
during T6 for an Add instruction, during T4 for an unconditional branch instruction, and
so on.
Pipelining
The speed of execution of programs is influenced by many factors; one way to improve
performance is to use pipelining. It is a particularly effective way of organizing
concurrent activity in a computer system.
Consider how the idea of pipelining can be used in a computer. The processor executes a
program by fetching and executing instructions one after the other. Let Fi and Ei refer to the
fetch and execute steps for instruction Ii
A computer has 2 separate hardware units, one for fetching instructions and another for
executing them. The instruction fetched by the fetch unit is deposited in an intermediate
storage buffer B1. This buffer is needed to enable the execution unit to execute the
instruction while the fetch unit is fetching the next instruction.
The processing of an instruction need not be divided into only 2 steps. For example, a
pipelined processor may process each instruction in 4 steps: Fetch (F), Decode (D),
Execute (E), and Write (W).
Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded by the
instruction-decoding unit.
Buffer B2 holds both the source operands for instruction I2 and the specification of the
operation to be performed. This is the information produced by the decoding hardware in
cycle 3. The buffer also holds the information needed for the write step of instruction I2
(step W2). Even though it is not needed by stage E, this information must be passed on to
stage W in the following clock cycle to enable that stage to perform the required write
operation.
Buffer B3 holds the results produced by the execution unit and the destination information
for Instruction I1.
Each stage in a pipeline is expected to complete its operation in one clock cycle. Hence the
clock period should be sufficiently long to complete the task being performed in any stage.
If different units require different amounts of time, the clock period must allow the longest
task to be completed. A unit that completes its task early is idle for the remainder of the
clock period.
Hence pipelining is most effective in improving performance if the tasks being performed
in different stages require about the same amount of time.
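The effect of unbalanced stages on the clock period can be shown with a few lines of arithmetic. The stage delays below are hypothetical numbers chosen purely for illustration:

```python
# The clock period is set by the slowest stage; faster stages sit idle.
# Stage delays in nanoseconds -- hypothetical values for illustration only.
stage_delay_ns = {"F": 2.0, "D": 1.5, "E": 2.5, "W": 1.0}

clock_period = max(stage_delay_ns.values())   # longest task wins: 2.5 ns
idle_ns = {s: clock_period - d for s, d in stage_delay_ns.items()}

print(clock_period)    # 2.5
print(idle_ns["W"])    # 1.5 -- the write stage is idle 1.5 ns every cycle
```

If all four stages took 2.5 ns the pipeline would lose nothing; the closer the stage times are to each other, the less time is wasted per cycle.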
The clock cycle has to be equal to or greater than the time needed to complete a fetch
operation. However, the access time of the main memory may be as much as 10 times
greater than the time needed to perform basic pipeline stage operations inside the
processor, such as adding two numbers. Thus, if each instruction fetch required access to
the main memory, pipelining would be of little value.
The use of cache memories solves the memory access problem. In particular, when a cache
is included on the same chip as the processor, access time to the cache is usually the same
as the time needed to perform other basic operations inside the processor.
Pipeline Performance
In the example, the operation specified in instruction I2 requires three cycles to complete,
from cycle 4 through cycle 6. Thus, in cycles 5 and 6 the write stage must be told to do
nothing because it has no data to work with. Meanwhile, the information in buffer B2 must
remain intact until the execute stage has completed its operation. This means that stage 2
and, in turn, stage 1 are blocked from accepting new instructions because the information
in B1 cannot be overwritten. Thus, steps D4 and F5 must be postponed.
Pipelined operation is said to have been stalled for two clock cycles. Normal pipelined
operation resumes in cycle 7. Any condition that causes the pipeline to stall is called a
hazard. A data hazard is any condition in which either the source or the destination
operands of an instruction are not available at the time expected in the pipeline. As a
result, some operation has to be delayed and the pipeline stalls.
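The cost of such a stall can be sketched by counting completion cycles in the four-stage model, assuming a multi-cycle Execute step blocks everything behind it (the function and its scheduling rule are a simplification for illustration):

```python
# Sketch: total cycles for a 4-stage pipeline (F, D, E, W) when one
# instruction's Execute step takes several cycles, stalling those behind it.
def completion_cycles(execute_cycles):
    """execute_cycles[i] = number of E-stage cycles for instruction i."""
    done_e = 0                              # cycle in which the previous E finished
    for i, e in enumerate(execute_cycles):
        start_e = max(done_e + 1, i + 3)    # E of Ii cannot start before cycle i+3
        done_e = start_e + e - 1            # E may occupy several cycles
    return done_e + 1                       # one more cycle for the final Write

# Ideal case: 4 single-cycle instructions finish in 7 cycles.
print(completion_cycles([1, 1, 1, 1]))   # 7
# If I2's Execute takes 3 cycles (cycles 4-6), the pipeline stalls
# for 2 cycles and the same 4 instructions need 9 cycles.
print(completion_cycles([1, 3, 1, 1]))   # 9
```

The two-cycle difference matches the example: W2 waits through cycles 5 and 6, and D4 and F5 are postponed by the same amount.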
The pipeline may also be stalled because of a delay in the availability of an instruction. For
example, this may be a result of a miss in the cache, requiring the instruction to be fetched
from the main memory. Such hazards are often called control hazards or instruction hazards.
This figure gives the function performed by each pipeline stage in each clock cycle. Note
that the Decode unit is idle in cycles 3 through 5, the Execute unit is idle in cycles 4
through 6, and the Write unit is idle in cycles 5 through 7. Such idle periods are called
stalls. They are also often referred to as bubbles in the pipeline.
If instructions and data reside in the same cache unit, only one instruction can proceed and
the other is delayed. Many processors use separate instruction and data caches to avoid
this delay.