Computer Architecture
Course Notes
Syllabus
Unit 1: Basic functional blocks of a computer: functional units, basic operational concepts, bus structures (single bus architecture and multiple bus architecture), instruction set architecture (ISA) of a CPU, register transfer language (RTL) notation, instruction execution (straight-line sequencing and branching), and addressing modes.
Unit 2: Computer arithmetic: fixed-point addition, subtraction, multiplication and division; floating-point arithmetic; data representation methods; Booth's multiplication algorithm with examples; integer division algorithm with examples; IEEE standard single- and double-precision formats with examples.
Unit 3: I/O organization: interrupts (enabling and disabling interrupts, handling multiple devices), direct memory access, bus arbitration, interface circuits, and standard I/O interfaces (PCI, SCSI and USB).
Unit 4: Pipelining: basic concepts of pipelining, arithmetic and instruction pipelines, throughput and speedup, pipeline hazards, and logic design conventions.
UNIT-1
Functional Units of Computer
The functional units of a computer are the essential components that work together to perform tasks. They are typically classified as the input unit, the central processing unit (CPU), the memory unit, and the output unit.
Each unit plays a crucial role in ensuring the computer system functions efficiently.
The basic operational concepts of the functional units of a computer involve the flow and
processing of data to perform tasks effectively. Each functional unit has a specific role in this
process. Below are the core concepts:
1. Input Unit
Function: Accepts data and instructions from the user or external devices.
Operation: Converts user-friendly input (e.g., key presses, mouse clicks) into binary
data that the computer understands.
Example: Typing on a keyboard sends data to the CPU for further processing.
2. Central Processing Unit (CPU)
The CPU is the "brain" of the computer and controls all operations. It comprises three main
components:
a. Arithmetic Logic Unit (ALU):
Function: Performs arithmetic operations (addition, subtraction, multiplication, division) and logical operations (AND, OR, NOT, comparisons) on data.
b. Control Unit (CU):
Function: Directs the flow of data between the CPU, memory, and I/O devices.
Operation:
1. Fetches instructions from memory.
2. Decodes them into control signals.
3. Executes them by coordinating with other units.
c. Registers:
Function: Small, high-speed storage locations inside the CPU that hold instructions, addresses, and intermediate results during processing.
3. Memory Unit
a. Primary Memory (RAM/ROM): Holds the instructions and data that the CPU is currently working on.
b. Secondary Memory: Provides long-term storage for programs and data (e.g., hard disks, SSDs).
4. Output Unit
Function: Presents processed results to the user or to external devices (e.g., monitor, printer).
5. Data Flow
Data moves from the input unit into memory, is processed by the CPU, and the results are delivered to the output unit.
6. Communication/Bus System
Function: Facilitates data transfer between the CPU, memory, and I/O units.
Operation:
o Address Bus: Carries memory addresses.
o Data Bus: Transfers actual data.
o Control Bus: Sends control signals.
These basic operational concepts define how data flows and is processed within a computer
system, ensuring efficient performance of tasks.
Single Bus Structure vs. Double Bus Structure
1. In a single bus structure, one common bus is used for communication between peripherals and the processor. In a double bus structure, two buses are used: one for communication with the peripherals and the other for the processor.
2. In a single bus structure, instructions and data are transferred over the same bus. In a double bus structure, instructions and data are transferred over different buses.
3. Advantages of the single bus structure: less expensive; simpler design. Advantages of the double bus structure: better performance; improved efficiency.
Instruction Set Architecture (ISA)
The Instruction Set Architecture (ISA) is the interface between the hardware and software of a
computer system. It defines the set of instructions that the CPU can execute, as well as the
associated data formats, addressing modes, and control mechanisms.
1. Instruction Set:
o A collection of machine instructions that the CPU can execute. These instructions
are divided into categories:
Data Transfer Instructions: Move data between memory, registers, and
I/O devices.
Examples: MOV, LOAD, STORE
Arithmetic Instructions: Perform arithmetic operations like addition,
subtraction, multiplication, and division.
Examples: ADD, SUB, MUL, DIV
Logical Instructions: Perform logical operations such as AND, OR,
NOT, XOR.
Examples: AND, OR, XOR, NOT
2. Registers:
o Defines the number, type, and purpose of CPU registers, which are high-speed
storage locations.
General Purpose Registers (GPRs): Hold temporary data.
Special Purpose Registers: Include Program Counter (PC), Stack Pointer
(SP), and Status Register.
3. Addressing Modes:
o Determines how the CPU accesses data operands in memory or registers.
Common modes include:
Immediate Addressing: Data is part of the instruction.
Register Addressing: Data resides in a register.
Direct Addressing: Instruction specifies the memory address.
Indirect Addressing: Memory address is obtained from a register.
Indexed Addressing: Combines a base address and an index.
4. Data Types:
o Specifies the types of data the CPU can process, such as:
Integers (signed and unsigned)
Floating-point numbers
Characters
Packed/Unpacked BCD (Binary Coded Decimal)
5. Instruction Format:
o Defines the structure of an instruction, including:
Opcode: Specifies the operation to perform.
Operands: Specifies the data to operate on.
Mode Bits: Define the addressing mode.
Types of ISA
Common ISA styles include CISC (Complex Instruction Set Computer, e.g., x86), RISC (Reduced Instruction Set Computer, e.g., ARM, RISC-V), and VLIW (Very Long Instruction Word).
Role of ISA
The ISA acts as the contract between hardware and software: compilers and programmers target the ISA, while hardware designers may implement it in different ways as long as the specified behaviour is preserved.
Register Transfer Language (RTL)
Register Transfer Language (RTL) is a symbolic notation used to describe the operations, data
transfers, and processes occurring at the register level within a computer's architecture. It is
commonly used in computer design and analysis.
R1 ← R2
This indicates the content of R2 is copied into R1.
3. Control Signals:
o Used to indicate when specific actions occur.
o Example:
If (C = 1) then R1 ← R2
The transfer occurs only if the control signal C is active (true).
5. Memory Access:
o Data is moved between memory and registers.
o Example:
Reading from memory:
R1 ← M[100]
The data at memory address 100 is transferred to R1.
Writing to memory:
M[100] ← R1
The content of R1 is stored at memory address 100.
6. Control Instructions:
o Indicate program flow changes, such as jumps or branches.
o Example:
If (R1 = 0) then PC ← 200
If R1 is zero, the program counter (PC) jumps to instruction address 200.
5. Memory Read/Write:
o Read:
R1 ← M[Address]
Data from memory at Address is loaded into R1.
o Write:
M[Address] ← R1
The content of R1 is stored in memory at Address.
6. Conditional Execution:
If (Flag = 1) then R1 ← R2
PC ← PC + 1
Advances the program counter to the next instruction.
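To make the notation concrete, the short Python sketch below (an illustrative model written for these notes, not part of any standard) mimics a few RTL statements such as R1 ← R2, R1 ← M[100], a conditional transfer, and PC ← PC + 1. Register names, memory addresses, and values are arbitrary.

# Illustrative model of RTL statements using a register file and a small memory.
registers = {"R1": 0, "R2": 25, "PC": 100}
memory = {100: 7, 200: 42}

# R1 <- R2            (register transfer)
registers["R1"] = registers["R2"]

# R1 <- M[100]        (memory read)
registers["R1"] = memory[100]

# M[100] <- R1        (memory write)
memory[100] = registers["R1"]

# If (C = 1) then R1 <- R2   (conditional transfer controlled by signal C)
C = 1
if C == 1:
    registers["R1"] = registers["R2"]

# PC <- PC + 1        (advance the program counter)
registers["PC"] += 1

print(registers, memory)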
Advantages of RTL
1. Clarity: Provides a clear and precise way to describe data flow and operations.
Instruction Execution
1. Straight-Line Sequencing
Definition: Instructions are executed sequentially, one after the other, in the order they
appear in memory.
Process:
1. The Program Counter (PC) holds the address of the next instruction.
2. The CPU fetches the instruction from the memory location pointed to by the PC.
3. The PC is incremented to point to the next instruction.
4. The CPU decodes and executes the fetched instruction.
Instruction 1
Instruction 2
Instruction 3
2. Branching
Definition: The control flow of the program changes based on specific conditions,
allowing non-sequential execution of instructions.
Types of Branching:
1. Unconditional Branching:
The program always jumps to a specified address.
Example
JUMP Address
2. Conditional Branching:
The program jumps to a specified address only if a condition is satisfied (e.g., BRANCH_IF_ZERO Address).
3. Subroutine Call and Return:
CALL Subroutine
...
RETURN
Applications:
o Used in decision-making, loops, and subroutine handling in programs.
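The following Python sketch (a toy fetch-decode-execute model; the instruction encoding, addresses, and register names are invented for illustration) shows how the PC drives straight-line sequencing and how a conditional branch changes the flow.

# Toy fetch-decode-execute loop: straight-line sequencing with one conditional branch.
memory = {
    0: ("LOAD", "R1", 5),                 # R1 <- 5
    1: ("SUB", "R1", 1),                  # R1 <- R1 - 1
    2: ("BRANCH_IF_NOT_ZERO", "R1", 1),   # if R1 != 0, PC <- 1
    3: ("HALT",),
}
registers = {"R1": 0}
pc = 0

while True:
    opcode, *operands = memory[pc]        # fetch the instruction addressed by the PC
    pc += 1                               # straight-line sequencing: PC now points to the next instruction
    if opcode == "LOAD":
        registers[operands[0]] = operands[1]
    elif opcode == "SUB":
        registers[operands[0]] -= operands[1]
    elif opcode == "BRANCH_IF_NOT_ZERO":
        if registers[operands[0]] != 0:
            pc = operands[1]              # branching: overwrite the PC with the target address
    elif opcode == "HALT":
        break

print(registers)                          # R1 has counted down to 0 before the loop exits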
Addressing Modes
Addressing modes define how the CPU identifies the location of data (operands) to be used in an
instruction. These modes are critical for optimizing program performance and flexibility.
1. Immediate Addressing
ADD R1, #5
Adds the value 5 directly to the content of R1.
2. Register Addressing
ADD R1, R2
Adds the content of R2 to R1.
3. Direct Addressing
Definition: The instruction specifies the memory address of the operand directly.
Example:
LOAD R1, [2000]
Loads the content of memory location 2000 into R1.
4. Indirect Addressing
Definition: The instruction specifies a register or memory location that contains the
memory address of the operand.
Example:
LOAD R1, (R2)
The address of the operand is held in R2.
5. Indexed Addressing
Definition: Combines a base address and an index to compute the effective address.
Example:
LOAD R1, 100(R2)
Effective address = 100 + contents of index register R2.
6. Base Register Addressing
Definition: Similar to indexed addressing, but uses a base register instead of a fixed base
address.
Example:
LOAD R1, 4(BR)
Effective address = contents of base register BR + displacement 4.
7. Relative Addressing
Definition: The effective address is calculated relative to the current value of the
Program Counter (PC).
Example:
JUMP PC + Offset
8. Stack Addressing
Definition: Operands are implicitly taken from the top of the stack.
Example:
PUSH R1
POP R2
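As a summary, the hypothetical Python snippet below computes the operand (or its effective address) for each of the modes above. All register contents, base addresses, and memory values are made up purely for illustration.

# How each addressing mode locates its operand (values are illustrative).
registers = {"R2": 300, "BR": 1000, "PC": 50, "SP": 7000}
memory = {300: 11, 2000: 22, 1300: 33, 1004: 44, 60: 55, 7000: 66}

immediate = 5                                 # Immediate: operand is part of the instruction
register_operand = registers["R2"]            # Register: operand resides in a register
direct = memory[2000]                         # Direct: instruction holds the memory address
indirect = memory[registers["R2"]]            # Indirect: register holds the operand's address
indexed = memory[1000 + registers["R2"]]      # Indexed: base address 1000 + index register -> 1300
base_reg = memory[registers["BR"] + 4]        # Base register: base register + displacement -> 1004
relative = memory[registers["PC"] + 10]       # Relative: PC + offset -> 60
stack = memory[registers["SP"]]               # Stack: operand is at the top of the stack

print(immediate, register_operand, direct, indirect, indexed, base_reg, relative, stack)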
UNIT-III
Types of Interrupt
1. Software Interrupts
A software interrupt is raised by a program itself, for example by executing a trap/system-call instruction or when an exception such as division by zero occurs.
2. Hardware Interrupts
In a hardware interrupt, all the devices are connected to the Interrupt Request Line. A single
request line is used for all the n devices. To request an interrupt, a device closes its associated
switch. When a device requests an interrupt, the value of INTR is the logical OR of the
requests from individual devices.
Hardware interrupts are further divided into two types: maskable interrupts, which the processor can disable or ignore, and non-maskable interrupts, which cannot be disabled.
After the execution of the current instruction, the processor verifies the interrupt signal to
check whether any interrupt is pending. If no interrupt is pending then the processor
proceeds to fetch the next instruction in the sequence.
If the processor finds a pending interrupt, it suspends the execution of the current
program by saving the address of the next instruction to be executed, and it updates the
program counter with the starting address of the interrupt service routine that services the interrupt.
After the interrupt is serviced completely, the processor resumes the execution of the
program it had suspended.
For example, consider a situation in which a particular sequence of instructions must be executed
without any interruption, because the execution of an interrupt service routine might change the
data used by that sequence of instructions. The programmer must therefore have the facility to
enable and disable interrupts in order to control the events during the execution of the program.
Interrupts can be enabled and disabled at both ends, i.e., either at the processor end or at
the I/O device end. With this facility, if interrupts are enabled or disabled at the processor
end, the processor can accept or reject interrupt requests. If the I/O devices are allowed to
enable or disable interrupts at their end, then each I/O device is either allowed to raise an
interrupt request or prevented from raising one.
To enable or disable interrupts at the processor end, one bit of the processor status register, the IE (Interrupt
Enable) bit, is used. When the IE flag is set to 1, the processor accepts incoming interrupt requests; when the IE
flag is set to 0, the processor ignores them.
To enable and disable interrupts at the I/O device end, the control register present at the interface
of the I/O device is used. One bit of this control register is used to regulate the enabling and
disabling of interrupts from the I/O device end.
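The following Python fragment (the bit positions are illustrative assumptions, not taken from any particular processor) shows the idea of enabling and disabling interrupts by setting or clearing single control bits at the two ends.

# Enabling/disabling interrupts by manipulating single bits (bit positions are assumed).
IE_BIT = 0          # assumed position of the Interrupt Enable bit in the processor status register
DEV_EN_BIT = 3      # assumed position of the interrupt-enable bit in a device control register

status_register = 0b0000
device_control = 0b0000

status_register |= (1 << IE_BIT)        # processor end: IE = 1, interrupts are accepted
device_control |= (1 << DEV_EN_BIT)     # device end: the device may raise interrupt requests

def interrupts_accepted(status):
    return (status >> IE_BIT) & 1 == 1

status_register &= ~(1 << IE_BIT)       # IE = 0: the processor now ignores interrupt requests
print(interrupts_accepted(status_register))   # False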
Let us say device X may interrupt the processor when it is servicing the interrupt caused by
device Y. Or it may happen that multiple devices request interrupts simultaneously. These
situations trigger several questions like:
How will the processor identify which device has requested the interrupt?
If different devices request different types of interrupt, and the processor has to service them
with different service routines, how will the processor obtain the starting address of the
particular interrupt service routine?
Can a device interrupt the processor while it is servicing the interrupt produced by another
device?
How will the processor handle multiple devices requesting interrupts simultaneously?
How these situations are handled varies from computer to computer. Now, if multiple devices are
connected to the processor, each capable of raising an interrupt, how will the processor
determine which device has requested an interrupt?
The simplest solution is that whenever a device requests an interrupt, it sets its interrupt request
(IRQ) bit to 1 in its status register. The processor then checks the IRQ bit of each device, and the
device whose IRQ bit is found to be 1 is the device that raised the interrupt.
However, this is a time-consuming method, as the processor spends its time checking the IRQ bits of every
connected device. The time wastage can be reduced by using vectored interrupts.
The devices raising a vectored interrupt identify themselves directly to the processor. So
instead of wasting time in identifying which device has requested an interrupt, the processor
immediately starts executing the corresponding interrupt service routine for the requested
interrupt.
To identify itself directly to the processor, a device either requests with its own
interrupt request signal or sends a special code to the processor, which helps the processor
identify which device has requested an interrupt.
Usually, a permanent area in the memory is allotted to hold the starting address of each interrupt
service routine. The addresses referring to the interrupt service routines are termed as interrupt
vectors and all together they constitute an interrupt vector table. Now how does it work?
The device requesting an interrupt sends a specific interrupt request signal or a special code to
the processor. This information acts as a pointer into the interrupt vector table, and the
corresponding address (the address of the specific interrupt service routine required to service
the interrupt raised by the device) is loaded into the program counter.
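The sketch below models this vector-table lookup in Python. The device codes, table addresses, and routine assignments are all invented for illustration; the point is only that the code supplied by the device indexes a table of service-routine start addresses, one of which is loaded into the PC.

# Toy model of vectored interrupts: the device's code selects an entry of the
# interrupt vector table, and that entry (an ISR start address) is loaded into the PC.
interrupt_vector_table = {     # device code -> starting address of its service routine
    0x01: 0x2000,              # e.g., keyboard ISR
    0x02: 0x2100,              # e.g., disk ISR
    0x03: 0x2200,              # e.g., timer ISR
}

def take_interrupt(pc, device_code):
    saved_return_address = pc                      # save address of the next instruction
    pc = interrupt_vector_table[device_code]       # PC <- ISR start address from the vector table
    return pc, saved_return_address

pc, return_address = take_interrupt(pc=0x0450, device_code=0x02)
print(hex(pc), hex(return_address))                # 0x2100 0x450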
Interrupt Nesting
When the processor is busy in executing the interrupt service routine, the interrupts are disabled
in order to ensure that the device does not raise more than one interrupt. A similar kind of
arrangement is used where multiple devices are connected to the processor. So that the servicing
of one interrupt is not interrupted by the interrupt raised by another device.
What if multiple devices raise interrupts simultaneously? In that case, the interrupts are
prioritized.
BUS ARBITRATION
Introduction :
In a computer system, multiple devices, such as the CPU, memory, and I/O controllers, are
connected to a common communication pathway, known as a bus. In order to transfer data
between these devices, they need to have access to the bus. Bus arbitration is the process of
resolving conflicts that arise when multiple devices attempt to access the bus at the same time.
When multiple devices try to use the bus simultaneously, it can lead to data corruption and
system instability. To prevent this, a bus arbitration mechanism is used to ensure that only one
device has access to the bus at any given time.
There are several types of bus arbitration methods, including centralized, decentralized, and
distributed arbitration. In centralized arbitration, a single device, known as the bus controller or
arbiter, is responsible for deciding which master is granted access to the bus.
Advantages –
This method does not favor any particular device or processor.
The method is also quite simple.
If one device fails, the entire system does not stop working.
Disadvantages –
Adding bus masters is difficult, as it increases the number of address lines of the circuit.
(iii) Fixed priority or Independent Request method –
In this, each master has a separate pair of bus request and bus grant lines and each pair has a
priority assigned to it.
The built-in priority decoder within the controller selects the highest priority request and
asserts the corresponding bus grant signal.
(iv) Distributed Arbitration method –
In this, all devices participate in the selection of the next bus master. Each device on the bus is
assigned a 4-bit identification number. The priority of the device is determined by this ID.
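A tiny sketch of fixed-priority (independent request) arbitration is given below. The master names and priority values are made up: among all asserted bus-request lines, the arbiter asserts the grant for the requester with the highest priority (lowest number here).

# Fixed-priority bus arbiter: grant the bus to the highest-priority requester.
priorities = {"DMA": 0, "CPU": 1, "DISK": 2, "NIC": 3}   # 0 = highest priority (illustrative)

def arbitrate(bus_requests):
    """bus_requests: set of masters currently asserting their bus-request line."""
    if not bus_requests:
        return None                                        # no bus grant is asserted
    return min(bus_requests, key=lambda m: priorities[m])  # winner receives the bus grant

print(arbitrate({"CPU", "NIC"}))           # CPU wins over NIC
print(arbitrate({"DISK", "DMA", "CPU"}))   # DMA has the highest priority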
Uses of BUS Arbitration in Computer Organization:
Bus arbitration is a critical process in computer organization that has several uses and benefits,
including:
1. Efficient use of system resources: By regulating access to the bus, bus arbitration ensures
that each device has fair access to system resources, preventing any single device from
monopolizing the bus and causing system slowdowns or crashes.
2. Minimizing data corruption: Bus arbitration helps prevent data corruption by ensuring that
only one device has access to the bus at a time, which minimizes the risk of multiple
devices writing to the same location in memory simultaneously.
3. Support for multiple devices: Bus arbitration enables multiple devices to share a common
communication pathway, which is essential for modern computer systems with multiple
peripherals, such as printers, scanners, and external storage devices.
4. Real-time system support: In real-time systems, bus arbitration is essential to ensure that
high-priority tasks are executed quickly and efficiently. By prioritizing access to the bus,
bus arbitration can ensure that critical tasks are given the resources they need to execute in
a timely manner.
5. Improved system stability: By preventing conflicts between devices, bus arbitration helps
to improve system stability and reliability. This is especially important in mission-critical
systems where downtime or data corruption could have severe consequences.
INTERFACE CIRCUITS
The I/O interface circuit is circuitry that is designed to link the I/O devices to the processor. Now
the question is why do we require an interface circuit?
We know that every component or module of the computer has its distinct capabilities and
processing speed. For example, the processing speed of the CPU is much higher than the other
components of the computer such as keyboard, display, etc.
So, we need a mediator to make the computer communicate with the I/O modules. This mediator
is referred to as an interface circuit. Observing the figure below, we can easily see that one end
of the interface circuit is connected to the system bus lines, i.e., the address lines, data lines, and control
lines.
The address line is decoded by the interface circuit to determine if the processor has addressed
this particular I/O device or not. The control line is decoded to identify which kind of operation
is requested by the processor. The data line is used to transfer the data between I/O and the
processor.
The other side of the interface circuit has the connections that are essential to transfer data
between the I/O interface circuit and the I/O device. This side of the I/O interface is referred to as a
port, and it can be configured as either a parallel port or a serial port.
But before discussing these ports, let us take a brief look at the features of the I/O
interface circuit.
1. The interface circuit has a data register that stores the data temporarily while the data is being
exchanged between the I/O device and the processor.
2. The interface circuit also has a status register; the bits in the status register indicate to the
processor whether the I/O device is ready for the transmission or not.
3. The interface circuit also has a control register; the bits in the control register indicate the
type of operation (read or write) requested by the processor to the I/O interface.
4. The interface circuit also has address decoding circuitry which decodes the address over the
address line to determine whether it is being addressed by the processor.
5. The interface circuitry also generates the timing signals that synchronize the operation between
the processor and the I/O device.
6. The interface circuit is also responsible for the format conversion that is essential for
exchanging data between the processor and the I/O interface.
Now, let us learn about the parallel port and the serial port of the I/O interface circuit.
Parallel Port
To understand the interface circuit with a parallel port we will take the example of two I/O
devices. First, we will study an input device i.e., a keyboard that has an 8-bit input port, and then
an output device i.e., a display that has an 8-bit output port. Here multiple bits are transferred at
once.
Input Port
Observe the parallel input port that connects the keyboard to the processor. Now, whenever the
key is tapped on the keyboard an electrical connection is established that generates an electrical
signal. This signal is encoded by the encoder to convert it into ASCII code for the corresponding
character pressed at the keyboard.
Now, when the data is loaded into the KBD_DATA register, the KIN status flag in the
KBD_STATUS register is set to 1, which causes the processor to read the data from
KBD_DATA.
Once the processor reads the data from the KBD_DATA register, the KIN flag is cleared back to 0. Here
the input interface is connected to the processor using an asynchronous bus.
So, the way they alert each other is using the master ready line and the slave ready line.
Whenever the processor is ready to accept the data, it activates its master-ready line and
whenever the interface is ready with the data to transmit it to the processor it activates its slave-
ready line.
The bus connecting the processor and the interface has one more control line, R/W, which is set to
1 for a read operation.
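This interaction can be pictured with the small Python sketch below. The KBD_DATA and KIN names follow the text above, while the device model itself is an invention of this sketch: the processor polls until KIN becomes 1, reads KBD_DATA, and the flag is cleared.

# Toy model of the keyboard input port: the processor polls the KIN status flag,
# then reads the character from KBD_DATA; reading clears KIN back to 0.
class KeyboardPort:
    def __init__(self):
        self.KBD_DATA = 0      # data register
        self.KIN = 0           # status flag held in KBD_STATUS

    def key_pressed(self, ascii_code):
        self.KBD_DATA = ascii_code
        self.KIN = 1           # a new character is available

    def read(self):
        value = self.KBD_DATA
        self.KIN = 0           # cleared once the processor has read the data
        return value

port = KeyboardPort()
port.key_pressed(ord("A"))

while port.KIN != 1:           # processor polls until the interface is ready
    pass
print(chr(port.read()), port.KIN)   # 'A' 0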
Output Port
Observe the output interface shown in the figure below that connects the display and the processor.
The display device uses two handshake signals, ready and new-data, in addition to the master-ready
and slave-ready signals.
When the display unit is ready to display a character, it sets its ready line to 1, which sets
the DOUT flag in the DISP_STATUS register to 1. This signals the processor, and the
processor places the character into the DISP_DATA register.
Serial Port
Opposite to the parallel port, the serial port connects the processor to devices that transmit only
one bit at a time. Here on the device side, the data is transferred in the bit-serial pattern, and on
the processor side, the data is transferred in the bit-parallel pattern.
The transformation of the format from serial to parallel i.e., from device to processor, and from
parallel to serial i.e., from processor to device is made possible with the help of shift registers
(input shift register & output shift register).
Observe the figure above to understand the functioning of the serial interface at the device side.
The input shift register accepts one bit at a time, in a bit-serial fashion, until it has received all 8 bits.
When all 8 bits have been received, the input shift register loads its content into the DATA IN
register in parallel. In a similar fashion, the content of the DATA OUT register is transferred in
parallel to the output shift register.
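A minimal Python sketch of this serial-to-parallel conversion is given below (the bit ordering and register names are assumptions of the sketch): the input shift register collects one bit per clock, and after 8 bits the assembled byte is loaded into DATA IN in parallel.

# Serial-to-parallel conversion with an input shift register (illustrative model).
shift_register = []     # accepts one bit at a time
DATA_IN = None          # loaded in parallel once 8 bits have arrived

def receive_bit(bit):
    """Shift in one bit; after 8 bits, transfer the byte to DATA_IN in parallel."""
    global DATA_IN
    shift_register.append(bit)
    if len(shift_register) == 8:
        # assume the first bit received is the most significant bit
        DATA_IN = int("".join(str(b) for b in shift_register), 2)
        shift_register.clear()

for b in [0, 1, 0, 0, 0, 0, 0, 1]:   # 0b01000001 = 65 = ASCII 'A'
    receive_bit(b)

print(DATA_IN)   # 65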
The serial interface port connected to the processor via system bus functions similarly to the
parallel port. The status and control block has two status flags SIN and SOUT. The SIN flag is
set to 1 when the I/O device inputs the data into the DATA IN register through the input shift
register and the SIN flag is cleared to 0 when the processor reads the data from the DATA IN
register.
This makes the transmission convenient between the device that transmits and receives one bit at
a time and the processor that transmits and receives multiple bits at a time.
The serial interface does not have any clock line to carry timing information. So, the timing
information must be embedded with the transmitted data using the encoding scheme. There are
two techniques to do this.
In the asynchronous transmission, the clock used by the transmitter and receiver is not
synchronized. So, the bits to be transmitted are grouped into a group of 6 to 8 bits which has a
defined starting bit and ending bit. The start bit has a logic value 0 and the stop bit has a logic
value 1.
The data received at the receiver end is delimited by these start and stop bits. This approach is
useful where the transmission speed is low.
The start and stop bits used in asynchronous transmission provide the correct timing
information, but this approach is not useful where the transmission speed is high.
So, in synchronous transmission, the receiver generates a clock that is synchronized with
the clock of the transmitter. This allows large blocks of data to be transmitted at high speed.
This is all about the interface circuit, which is an intermediary circuit between the I/O device and the
processor. The parallel interface is faster, more costly, and efficient for devices that are close to the
processor, whereas the serial interface is slower, less costly, and efficient for long-distance
connections.
Device Configuration
When an I/O device is connected to a computer, several actions are needed to configure both the
device and the software that communicates with it.
The PCI simplifies this process by incorporating in each I/O device interface a small
configuration ROM memory that stores information about that device. The configuration ROMs of
all devices are accessible in the configuration address space.
The PCI initialization software reads these ROMs whenever the system is powered up or reset. In
each case, it determines whether the device is a printer, a keyboard, an Ethernet interface, or a disk
controller. It can further learn about various device options and characteristics. Devices are
assigned addresses during the initialization process. This means that during the bus configuration
operation, devices cannot be accessed based on their address, as they have not yet been assigned
one. Hence, the configuration address space uses a different mechanism. Each device has an input
signal called Initialization Device Select, IDSEL#.
The PCI bus has gained great popularity in the PC world. It is also used in many other computers,
such as SUNs, to benefit from the wide range of I/O devices for which a PCI interface is available.
In the case of some processors, such as the Compaq Alpha, the PCI-processor bridge circuit is
built on the processor chip itself, further simplifying system design and packaging.
SCSI
It is a standard bus defined by the American National Standards Institute (ANSI). A controller
connected to a SCSI bus is an initiator or a target. The processor sends a command to the SCSI
controller, which causes the following sequence of events to take place:
The SCSI controller contends for control of the bus (initiator).
When the initiator wins the arbitration process, it selects the target controller and hands over
control of the bus to it.
The target starts an output operation. The initiator sends a command specifying the required read
operation.
The target sends a message to the initiator indicating that it will temporarily suspend the
connection between them. Then it releases the bus.
What is a Pipeline?
Pipelining is an implementation technique in which the execution of several instructions is overlapped: the processor is divided into stages, and while one instruction occupies one stage, the next instruction can occupy the previous stage.
[Space-time diagram: two instructions I1 and I2 flow through the four stages S1-S4 in successive clock cycles, so both complete in 5 cycles.]
Total time = 5 cycles.
Pipeline Stages
A RISC processor has a 5-stage instruction pipeline to execute all the instructions in the RISC
instruction set. Following are the 5 stages of the RISC pipeline with their respective operations:
Stage 1 (Instruction Fetch): In this stage the CPU fetches the instruction from the
memory location whose address is stored in the program counter.
Stage 2 (Instruction Decode): The instruction is decoded and the register operands are read from the register file.
Stage 3 (Instruction Execute): The ALU performs the required operation or calculates an effective address.
Stage 4 (Memory Access): Data memory is read or written, if the instruction requires it.
Stage 5 (Write Back): The result is written back to the destination register.
Performance of a Pipelined Processor
Consider a 'k' segment pipeline with clock cycle time 'Tp'. Let there be 'n' tasks to be completed in the pipelined processor. Now, the first
instruction is going to take ‘k’ cycles to come out of the pipeline but the other ‘n – 1’
instructions will take only ‘1’ cycle each, i.e, a total of ‘n – 1’ cycles. So, time taken to execute
‘n’ instructions in a pipelined processor:
ETpipeline = k + n – 1 cycles
= (k + n – 1) Tp
In the same case, for a non-pipelined processor, the execution time of ‘n’ instructions will be:
ETnon-pipeline = n * k * Tp
So, speedup (S) of the pipelined processor over the non-pipelined processor, when ‘n’ tasks
are executed on the same processor is:
S = Performance of non-pipelined processor /
Performance of pipelined processor
As the performance of a processor is inversely proportional to the execution time, we have,
S = ETnon-pipeline / ETpipeline
=> S = [n * k * Tp] / [(k + n – 1) * Tp]
S = [n * k] / [k + n – 1]
When the number of tasks 'n' is significantly larger than k, that is, n >> k, then k + n – 1 ≈ n, so:
S ≈ [n * k] / n
S = k
where k is the number of stages in the pipeline.
Efficiency = speedup / maximum speedup = S / Smax. Since Smax = k, Efficiency = S / k.
Throughput = number of instructions / total time to complete the instructions, so Throughput = n / [(k + n – 1) * Tp].
Note: The cycles-per-instruction (CPI) value of an ideal pipelined processor is 1.
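The formulas above can be checked with a few lines of Python; the values k = 5, n = 100, and Tp = 2 ns are arbitrary example numbers.

# Pipelined vs non-pipelined execution time, speedup, efficiency, and throughput.
k, n, Tp = 5, 100, 2e-9        # stages, instructions, clock cycle time (arbitrary values)

et_pipeline = (k + n - 1) * Tp           # (k + n - 1) cycles of Tp each
et_non_pipeline = n * k * Tp             # every instruction takes k cycles

speedup = et_non_pipeline / et_pipeline  # = n*k / (k + n - 1)
efficiency = speedup / k                 # Smax = k
throughput = n / et_pipeline             # instructions completed per second

print(f"speedup = {speedup:.2f}  (limit is k = {k})")
print(f"efficiency = {efficiency:.2f}")
print(f"throughput = {throughput:.3e} instructions/s")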
The performance of a pipeline is measured using two main metrics: throughput and latency.
What is Throughput?
It measures the number of instructions completed per unit time.
It represents the overall processing speed of the pipeline.
Higher throughput indicates a faster pipeline.
It is calculated as: throughput = number of instructions executed / execution time.
It is affected by the pipeline length, the clock frequency, the efficiency of instruction execution,
and the presence of pipeline hazards or stalls.
What is Latency?
It measures the time taken for a single instruction to complete its execution.
It represents the delay, i.e., the time it takes for an instruction to pass through the pipeline stages.
Lower latency indicates better performance.
It is calculated as: latency = execution time / number of instructions executed.
It is influenced by the pipeline length and depth, the clock cycle time, instruction dependencies, and
pipeline hazards.
Advantages of Pipelining
Increased Throughput: Pipelining enhances the throughput capacity of a CPU by enabling
a number of instructions to be processed at the same time in different stages. This increases
the number of instructions completed in a given period of time, thus
improving the efficiency of the processor.
Arithmetic Pipeline
1. Arithmetic Pipeline:
An arithmetic pipeline divides an arithmetic operation into sub-operations that are executed
in successive pipeline segments. It is used for floating-point operations, multiplication, and various
other computations. The flowchart of an arithmetic pipeline for floating-point addition is
shown in the diagram.
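As an illustration, the classic four sub-operations of floating-point addition (compare exponents, align mantissas, add mantissas, normalize) can be written as one function per pipeline stage. The decimal mantissa/exponent representation used here is a simplification chosen for clarity, not the IEEE format.

# The four sub-operations of a floating-point addition pipeline, one function per stage.
# Numbers are held as (mantissa, exponent), meaning mantissa * 10**exponent (simplified, decimal).
def compare_exponents(a, b):                 # Stage 1: order operands by exponent
    return (a, b) if a[1] >= b[1] else (b, a)

def align_mantissas(big, small):             # Stage 2: shift the smaller mantissa right
    shift = big[1] - small[1]
    return big, (small[0] / (10 ** shift), big[1])

def add_mantissas(big, small):               # Stage 3: add the aligned mantissas
    return (big[0] + small[0], big[1])

def normalize(result):                       # Stage 4: renormalize the result
    m, e = result
    while abs(m) >= 10:
        m, e = m / 10, e + 1
    return (m, e)

x, y = (9.5, 3), (8.0, 2)                    # 9.5e3 + 8.0e2
stage1 = compare_exponents(x, y)
stage2 = align_mantissas(*stage1)
stage3 = add_mantissas(*stage2)
print(normalize(stage3))                     # (1.03, 4), i.e. 1.03e4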
2. Instruction Pipeline:
In this technique, a stream of instructions is executed by overlapping the fetch, decode and execute
phases of the instruction cycle. It is used to increase the throughput of the
computer system. An instruction pipeline reads instructions from memory while previous
instructions are being executed in other segments of the pipeline, so multiple instructions can be
executed simultaneously. The pipeline is more efficient if the instruction
cycle is divided into segments of equal duration. In the most general case, the computer needs to
process each instruction in the following sequence of steps:
1. Fetch the instruction from memory (FI)
2. Decode the instruction (DA)
3. Calculate the effective address
4. Fetch the operands from memory (FO)
5. Execute the instruction (EX)
6. Store the result in the proper place
The flowchart for instruction pipeline is shown below.
Pipeline Hazards
Structural Hazards
Data Hazards
Control Hazards
Structural Hazards
Structural hazards arise due to hardware resource conflict amongst the instructions in the
pipeline. A resource here could be the Memory, a Register in GPR or ALU. This resource
conflict is said to occur when more than one instruction in the pipe is requiring access to the
same resource in the same clock cycle. This is a situation that the hardware cannot handle all
possible combinations in an overlapped pipelined execution.
Observe figure 16.1. In any system, an instruction is fetched from memory in the IF machine cycle.
In our 4-stage pipeline, Result Writing (RW) may access memory or one of the General Purpose
Registers, so RW of one instruction can clash with the IF of a later instruction that needs the same resource in the same clock cycle.
Solution 1: Introduce bubble which stalls the pipeline as in figure 16.2. At t4, I4 is not allowed
to proceed, rather delayed. It could have been allowed in t5, but again a clash with I2 RW. For
the same reason, I4 is not allowed in t6 too. Finally, I4 could be allowed to proceed (stalled) in
the pipe only at t7.
This delay percolates to all the subsequent instructions too. Thus, while the ideal 4-stage
system would have taken 8 timing states to execute 5 instructions, due to the structural
dependency it now takes 11 timing states. And that is not all: by now you would have guessed that this
hazard is likely to recur at every 4th instruction. That is not a good solution for a heavily loaded
CPU. Is there a better way? Yes!
A better solution would be to increase the structural resources in the system using one of the few
choices below:
The pipeline may be increased to 5 or more stages and suitably redefine the functionality
of the stages and adjust the clock frequency. This eliminates the issue of the hazard at
every 4th instruction in the 4-stage pipeline
The memory may physically be separated into instruction memory and data memory. A
better choice is to design these as cache memories in the CPU rather than dealing with
main memory directly. IF then uses the instruction memory and Result Writing uses the data memory; these
become two separate resources, avoiding the dependency.
It is possible to have Multiple levels of Cache in CPU too.
There is a possibility of ALU in resource dependency. ALU may be required in IE
machine cycle by an instruction while another instruction may require ALU in IF stage to
calculate Effective Address based on addressing mode. The solution would be either
stalling or have an exclusive ALU for address calculation.
Register files are used in place of GPRs. Register files have multiport access with
exclusive read and write ports. This enables simultaneous access on one write register
and read register.
Data hazards occur when an instruction's execution depends on the result of a previous
instruction that is still being processed in the pipeline. Consider, for instance, an ADD instruction
that writes register R3, followed immediately by SUB, OR and AND instructions that all read R3.
In this case, the ADD instruction writes its result into register R3 only in t5. If bubbles
are not introduced to stall the following SUB instruction, all three later instructions would use
stale data from R3, i.e., the value present before the ADD result. The program goes wrong! The possible
solutions before us are:
Solution 1: Introduce three bubbles at SUB instruction IF stage. This will facilitate SUB – ID to
function at t6. Subsequently, all the following instructions are also delayed in the pipe.
Solution 2: Data forwarding - Forwarding is passing the result directly to the functional unit
that requires it: a result is forwarded from the output of one unit to the input of another. The
purpose is to make available the solution early to the next instruction.
In this case, the ADD result is available at the output of the ALU in the ADD-IE stage, i.e., at the end of t3. If this
result can be captured and forwarded by the control unit to the SUB-IE stage at t4, before it is written into the destination
register R3, then the pipeline can go ahead without any stalling. This requires extra logic to
identify this data hazard and act upon it. Note that although the operand fetch normally
happens in the ID stage, the operand is only used in the IE stage; hence the forwarded value is supplied to the IE stage as
input. Similar forwarding can be done for the OR and AND instructions too.
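The effect of forwarding can be illustrated with a toy calculation. The 4-stage IF/ID/IE/RW model and the 3-bubble penalty follow the discussion above; everything else (the function, its interface, the distance rule) is an assumption of this sketch rather than a standard formula.

# Toy stall counter for a 4-stage IF/ID/IE/RW pipeline (model assumptions, not a real design).
# Without forwarding, a consumer must wait until the producer's result is written back;
# with forwarding, the IE result is passed straight to the consumer's IE stage.
def stalls_needed(distance, forwarding):
    """distance: how many instructions after the producer the consumer appears (1 = next)."""
    bubbles_without = 3     # consumer immediately after the producer needs 3 bubbles (Solution 1)
    bubbles_with = 0        # forwarding from the IE output removes the stall entirely
    needed = (bubbles_with if forwarding else bubbles_without) - (distance - 1)
    return max(needed, 0)

print(stalls_needed(distance=1, forwarding=False))  # 3 bubbles, as in Solution 1
print(stalls_needed(distance=1, forwarding=True))   # 0 bubbles with data forwarding
print(stalls_needed(distance=3, forwarding=False))  # 1 bubble if two independent instructions intervene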
Solution 3: Instruction reordering - the compiler may rearrange independent instructions so that
enough unrelated work separates the dependent instructions, hiding the hazard without stalling.
Solution 4: In the event that the above reordering is infeasible, the compiler may detect the hazard and
introduce NOP (no operation) instruction(s). A NOP is a dummy instruction, equivalent to a bubble,
introduced by the software.
The compiler looks into data dependencies in the code optimisation stage of the compilation
process.
Read After Write (RAW): This is a case where an instruction uses data produced by a previous one; it is the true data dependency illustrated in the example above.
Write After Read (WAR): This is a case where the second instruction writes to a register before the first instruction
reads it. This is rare in a simple pipeline structure; however, in some machines with
complex and special instructions, WAR can happen.
Write After Write (WAW): This is a case where two parallel instructions write to the same register and must do so in the
order in which they were issued.
WAW and WAR hazards can only occur when instructions are executed in parallel or out of
order. They occur because the compiler has allotted the same register numbers, although this is
avoidable. The situation is fixed by renaming one of the registers in the compiler, or by
delaying the updating of a register until the appropriate value has been produced.
Modern CPUs not only have incorporated Parallel execution with multiple ALUs but also Out
of order issue and execution of instructions along with many stages of pipelines.
Control Hazards
Control hazards are also called branch hazards and are caused by branch instructions. Branch
instructions control the flow of program execution. Recall that we use conditional
statements in a higher-level language either for iterative loops or for condition checking
(correlate with for, while, if, and case statements). These are transformed into one of the variants of
BRANCH instructions. It is necessary to know the value of the condition being checked to determine
the program flow. Life gets complicated for you, and so it does for the CPU!
Thus a Conditional hazard occurs when the decision to execute an instruction is based on the
result of another instruction like a conditional branch, which checks the condition’s resultant
value.
The branch and jump instructions decide the program flow by loading the appropriate location in
the Program Counter(PC). The PC has the value of the next instruction to be fetched and
executed by the CPU. Consider a sequence of instructions I1, I2, I3 in which I2 is an unconditional JMP.
In this case, there is no point in fetching I3. What happens to the pipeline? While I2 is in the pipe, the I3
fetch needs to be stopped. This can be known only after I2 is decoded as JMP and not until then.
So the pipeline cannot proceed at its speed and hence this is a Control Dependency (hazard). In
case I3 is fetched in the meantime, it is not only a redundant work but possibly some data in
registers might have got altered and needs to be undone.
1. Stall the pipeline as soon as any kind of branch instruction is decoded; simply do not allow
any more instruction fetches. As always, stalling reduces throughput. Statistics say that in a program
at least 30% of the instructions are branches, so with stalling the pipeline essentially operates at about 50%
capacity.
2. Prediction – Imagine a for or while loop getting executed 100 times. We know that 100
times the program flows on without the branch condition being met, and only on the
101st time does the program come out of the loop. So it is wiser to allow the pipeline to
proceed and to undo/flush only when the branch condition is met. This does not affect the
throughput of the pipeline as much as stalling does.
3. Dynamic Branch Prediction - A history record is maintained with the help of a Branch
Table Buffer (BTB). The BTB is a kind of cache with a set of entries, each holding the PC
address of a branch instruction and the corresponding effective branch target address. An entry is
maintained for every branch instruction encountered. So whenever a conditional branch
instruction is encountered, a lookup for the matching branch instruction address is done in the
BTB. If it hits, then the corresponding target branch address is used for fetching the
next instruction. This is called dynamic branch prediction (a small sketch of this lookup is given after this list).
This method is successful to the extent of the temporal locality of reference in the
programs. When the prediction fails, flushing needs to take place.
4. Reordering instructions - Delayed branch, i.e., reordering the instructions to position the
branch instruction later in the order, such that safe and useful instructions which are not
affected by the result of the branch are brought in earlier in the sequence, thus delaying the
branch instruction fetch. If no such instructions are available, then NOPs are introduced.
This delayed branch is applied with the help of the compiler.
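Here is the sketch of the Branch Table Buffer lookup mentioned in point 3 above. The addresses are invented, and a real BTB is an associative hardware table rather than a Python dictionary; the sketch only shows the lookup, the update on resolution, and the flush on a misprediction.

# Toy Branch Table Buffer: PC of a branch instruction -> predicted target address.
btb = {}

def fetch_next_pc(pc, branch_taken=None, actual_target=None):
    """Return the address to fetch next; update the BTB once the branch outcome is known."""
    predicted = btb.get(pc)                 # BTB lookup: hit -> use the stored target
    next_pc = predicted if predicted is not None else pc + 1
    if branch_taken:                        # once resolved, remember the taken target
        btb[pc] = actual_target
        if predicted != actual_target:      # misprediction: wrongly fetched work is flushed
            next_pc = actual_target
    return next_pc

print(hex(fetch_next_pc(0x100)))                                          # miss: predict fall-through 0x101
print(hex(fetch_next_pc(0x100, branch_taken=True, actual_target=0x200)))  # resolve: flush, go to 0x200
print(hex(fetch_next_pc(0x100)))                                          # hit: predicted target 0x200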
I do not want to load the reader with more timing-state diagrams, but I am sure the earlier
discussions have familiarised the reader enough to follow the description in words.
Last but not least, in a pipelined design, the control unit is expected to handle the following
scenarios:
No Dependence
Dependence requiring Stall
Dependence solution by Forwarding
Dependence with access in order
Out of Order Execution
Branch Prediction Table and more
COMBINATIONAL ELEMENTS
Operate on data.
The output is a function of the input.
The output depends only on the current inputs.
Used for the ALU, multipliers, and other datapath elements.
SEQUENTIAL ELEMENTS
Store information.
A state element stores the state of the circuit.
The output depends on the current inputs and the current state.
---> A clock signal determines when the stored value is updated.
To write new data into the register, we use a D flip-flop with a Write Enable input.
--->Write Enable:
0: On the clock edge the register simply reloads its own output, so the data in the register does not change.
1: New data is fed to the flip-flop and the register changes its state on the clock edge.
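This behaviour can be modelled in a few lines of Python (a behavioural sketch, not hardware description code): on a clock edge the register keeps its old value when Write Enable is 0 and loads the new data when Write Enable is 1.

# Behavioural model of a register built from D flip-flops with a Write Enable input.
class Register:
    def __init__(self, value=0):
        self.q = value                 # stored state (the flip-flop outputs)

    def clock_edge(self, d, write_enable):
        # On the clock edge: keep the old value (WE = 0) or load the new data (WE = 1).
        if write_enable:
            self.q = d
        return self.q

r = Register(value=7)
print(r.clock_edge(d=42, write_enable=0))   # 7  -> data in the register does not change
print(r.clock_edge(d=42, write_enable=1))   # 42 -> register changes its state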
CLOCKING METHODOLOGY
Defines when signals can be read and when they can be written
Mainstream: An edge triggered methodology
Determine when data is valid and stable relative to the clock
Typical execution:
A Hardwired Control Unit (HCU) is a type of control unit in a computer’s CPU that generates
control signals using fixed electronic circuits, such as combinational logic and sequential logic
gates. It directly decodes the instruction and controls the execution process through predefined
hardware paths.
1. Instruction Fetch: The control unit fetches the instruction from memory.
2. Instruction Decode: The instruction is decoded to determine the required operations.
3. Signal Generation: Control signals are generated using logic circuits.
4. Execution: The required control signals are sent to different CPU components (ALU,
Registers, Memory, etc.).
5. Next Instruction: The control unit moves to the next instruction.
✅ Faster Execution – Since it is implemented using hardware circuits, it operates faster than
microprogrammed control units.
✅ Efficient for Simple Instruction Sets – Works well for simple and fixed instruction sets such as RISC.
Hardwired control units are typically designed using finite state machines (FSMs) to handle
different instruction cycles.
Difficult to Modify – Changing the control logic requires redesigning the entire circuit.
Complex for CISC Architecture – Complex instruction sets require large and complicated
hardware.
Not Scalable – If new instructions need to be added, the whole unit may need redesigning.
Microinstructions are fetched from control memory, decoded, and executed in sequence to
generate the required control signals for CPU operations.
Easier to Modify & Update – New instructions can be added by modifying the control
memory without redesigning hardware.
Supports Complex Instructions – Ideal for CISC (Complex Instruction Set Computing)
architectures.
Simpler & Cost-Effective Design – Requires fewer logic circuits compared to a hardwired
control unit.
Better Fault Tolerance – Errors in control signals can be corrected by updating
microinstructions.
Design: The control signals are generated by a fixed set of logic gates and combinational
circuits. These gates are hardwired to perform specific tasks.
Speed: Hardwired designs are generally faster because they use fixed logic paths.
Flexibility: They are less flexible because any change in the instruction set or control
logic requires physical changes to the hardware.
Complexity: The complexity can increase with the number of instructions because the
control unit requires more combinational logic.
Example: Simple, early processors like the 8080.
Design: In this design, control signals are generated by a set of microinstructions stored
in a control memory (also called microcode). Each instruction in the instruction set has a
corresponding microinstruction sequence.
Speed: Micro-programmed control units are typically slower than hardwired ones
because fetching and decoding microinstructions from memory adds overhead.
Flexibility: They are more flexible because modifying the control logic can be done by
changing the microcode, rather than redesigning the hardware.
Complexity: The control unit is more complex, but easier to modify and expand.
Example: More complex processors like the IBM 360, VAX.
Summary of Differences:
Feature        | Hardwired Control Unit                     | Micro-programmed Control Unit
Control Logic  | Fixed, combinational circuits              | Stored microinstructions in memory
Speed          | Faster                                     | Slower (due to memory access for microinstructions)
Flexibility    | Less flexible, requires hardware changes   | More flexible, modified via microcode
Complexity     | Less complex for simpler tasks             | More complex but easier to update
Modification   | Difficult to modify once designed          | Easy to modify by changing the microcode
In essence, hardwired control units are faster but less adaptable, while micro-programmed units
are slower but offer more flexibility and ease of modification.
Multi-Core Processors
1. Multiple Cores: Each core can execute instructions independently, and the cores share
memory, but each core has its own processing unit.
2. Parallelism: It allows for parallel execution, meaning that multiple tasks can be
processed at once, increasing the overall throughput of the system.
3. Shared Resources: Cores typically share cache memory (L1, L2, or even L3 caches) and
sometimes the main system memory.
Cores: Each core can independently execute tasks. Multiple cores help in parallel
execution of tasks.
Cache: Each core has its own cache (L1 cache), which stores frequently used data.
Caches can be shared between cores (L2 or L3 cache).
Shared Memory: All cores typically have access to a shared memory space, allowing
them to share data and communicate.
I/O Devices: Input and output devices are connected to the system, and their data is
processed by the cores.
Interconnect Bus: This connects the cores and other components, allowing data
exchange between the processor, memory, and I/O devices.
1. Task Division: The software divides tasks into smaller sub-tasks (threads). These threads
are then distributed across different cores to execute in parallel.
2. Execution: Each core processes its assigned tasks independently, allowing the system to
perform more computations in the same amount of time compared to a single-core
processor.
3. Synchronization: The cores communicate with each other through shared memory or
interconnects, ensuring that the data is consistent and operations are synchronized.
4. Load Balancing: The operating system or scheduler may dynamically allocate tasks to
different cores based on their availability to optimize overall performance.
Parallel processing is a type of computing architecture in which several processes are executed
simultaneously. It's designed to increase the speed of computation and handle large-scale tasks
efficiently. Here are some basic concepts in parallel processing:
1. Types of Parallelism
1. Data Parallelism: This involves distributing subsets of data across multiple processors.
Each processor performs the same operation on its subset of data simultaneously.
2. Task Parallelism: This involves dividing a task into smaller, independent tasks that can
be executed simultaneously on different processors.
2. Concurrency
1. Refers to the ability of a system to handle multiple tasks at the same time. Unlike
parallelism, concurrency doesn’t necessarily imply simultaneous execution; it just allows
for tasks to be interleaved, giving the illusion of simultaneous processing.
3. Threads
1. A thread is the smallest unit of a CPU's execution. Multiple threads can exist within the
same process and share resources like memory space, which makes it easier to implement
parallelism. Multithreading allows for better resource utilization and can be crucial for
parallel processing.
1. Processor (CPU): A CPU can be a single chip or multiple chips that execute
instructions.
2. Cores: A modern CPU often has multiple cores. Each core can independently execute a
task, so multi-core processors can execute multiple tasks at once, thus supporting
parallelism.
1. Shared Memory: Multiple processors share the same memory space, making it easy to
communicate and share data. However, managing access to shared memory (avoiding
conflicts) is crucial in this setup.
2. Distributed Memory: Each processor has its own private memory. Communication
between processors must occur over a network. This setup is used in systems like clusters
and supercomputers.
6. Synchronization
1. In parallel processing, it’s important to manage the execution order of threads and
processes to avoid issues like data races, where multiple threads access shared data
simultaneously, causing unpredictable results. Synchronization mechanisms (e.g., locks,
semaphores, barriers) help ensure that tasks are executed in a controlled manner.
7. Load Balancing
1. It’s the process of distributing tasks evenly across processors to ensure no single
processor is overburdened. Load balancing helps to maximize resource utilization and
improve the overall performance of a parallel system.
8. Scalability
1. Scalability refers to the ability of a parallel system to handle increasing amounts of work
by adding more resources (e.g., processors or cores) without sacrificing performance. A
scalable parallel system maintains efficiency as the workload grows.
9. Amdahl's Law
1. Amdahl's Law estimates the maximum speedup obtainable when only a fraction of a program can be parallelized:
Speedup = 1 / ((1 - P) + P / N)
Where:
P = the fraction of the program that can be parallelized,
N = the number of processors,
(1 - P) = the serial fraction that cannot be parallelized.
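A quick check of the formula in Python; P = 0.9 and the processor counts are arbitrary example values.

# Amdahl's Law: speedup is limited by the serial fraction (1 - P) of the program.
def amdahl_speedup(p, n):
    return 1.0 / ((1.0 - p) + p / n)

for n in (2, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
# With P = 0.9 the speedup approaches 1 / (1 - 0.9) = 10, no matter how many processors are added.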
10. Distributed Computing
1. This is a form of parallel processing in which computing resources are spread across
multiple physical machines that communicate over a network. Examples include cloud
computing and grid computing.
1. Speedup: The ratio of the time taken to execute a task on a single processor to the time
taken when the task is parallelized and run on multiple processors.
Michael J. Flynn proposed a classification based on the number of instruction and data streams
in a system: SISD (Single Instruction, Single Data), SIMD (Single Instruction, Multiple Data), MISD (Multiple Instruction, Single Data), and MIMD (Multiple Instruction, Multiple Data).
Distributed-Memory Systems
Each processor has its own local memory and communicates via message passing.
Advantages:
✅ Scalable to a large number of processors.
✅ Avoids memory contention.
Disadvantages:
✅ Requires complex communication protocols.
Example: Cluster Computing, MPI-based systems.
a) Fine-Grained Parallelism
Small units of work are executed in parallel, with frequent communication and synchronization between them.
b) Coarse-Grained Parallelism
Larger independent tasks are executed in parallel with less frequent synchronization.
Example: Distributed Computing (Hadoop, Spark).
2. Heterogeneous Computing
Topic:
Combines different types of processors (CPU, GPU, FPGA, TPU) for optimized
performance.
Used in High-Performance Computing (HPC) environments.
3. Quantum Computing
4. Neuromorphic Computing
5. Edge Computing
Moves computation closer to data sources (IoT devices, sensors) instead of centralized
cloud processing.
6. Fault-Tolerant Computing
Applications:
✅ Aerospace & Space Missions – NASA’s Mars rovers need fault tolerance.
✅ Financial Systems – Ensuring no transaction loss in banking.
✅ Cloud Computing – Data replication in distributed storage.