Micro 1

Here's a breakdown of the three addressing modes you asked about:
### 1. Immediate Addressing Mode
In **Immediate Addressing Mode**, the operand is a constant value or literal that is embedded directly in the instruction. The operand does not need to be fetched from memory; it is available immediately.
- **Example in Assembly:**
  MOV AL, 5 ; Move the immediate value 5 into register AL
- **Explanation:** The value `5` is provided directly in the instruction. Here, `AL` is a register, and `5` is the immediate operand.
### 2. Register Addressing Mode
In **Register Addressing Mode**, the operand is stored in a register. The instruction specifies the register in which the operand resides. This mode is faster because it avoids memory access.
- **Example in Assembly:**
  MOV AX, BX ; Move the contents of register BX into register AX
- **Explanation:** The data is located in the `BX` register and is copied to the `AX` register.
### 3. Absolute (or Direct) Addressing Mode
In **Absolute Addressing Mode** (also known as **Direct Addressing Mode**), the instruction contains the memory address where the operand is stored. The operand is fetched from this specific memory address.
- **Example in Assembly:**
  MOV AX, [1234H] ; Move the data from memory address 1234H into register AX
- **Explanation:** The instruction points to a specific memory location, `1234H`, where the operand is stored. The contents at that memory location are moved to the `AX` register.

Comparison between a microprogrammed control unit and a hardwired control unit:
### 1. **Design Approach:**
- **Hardwired Control Unit:** Uses fixed combinational logic circuits to generate control signals.
- **Microprogrammed Control Unit:** Uses a sequence of microinstructions stored in control memory to generate control signals.
### 2. **Flexibility:**
- **Hardwired Control Unit:** Less flexible; modifying the control logic requires redesigning the hardware.
- **Microprogrammed Control Unit:** Highly flexible; changes can be made by updating the microprogram.
### 3. **Speed:**
- **Hardwired Control Unit:** Generally faster, as control signals are generated directly by hardware.
- **Microprogrammed Control Unit:** Slightly slower, as it involves fetching and executing microinstructions.
### 4. **Complexity:**
- **Hardwired Control Unit:** More complex to design, especially for large instruction sets.
- **Microprogrammed Control Unit:** Easier to design, particularly for complex instruction sets.
### 5. **Applications:**
- **Hardwired Control Unit:** Common in RISC architectures with simpler instruction sets.
- **Microprogrammed Control Unit:** Common in CISC architectures with more complex instruction sets.

Using three-address, two-address, and one-address instructions, evaluate the expression C = (A * B) / D.
### 1. **Three-Address Instructions**
In three-address instructions, each instruction can specify two source operands and a destination.
```assembly
MUL R1, A, B   ; R1 = A * B
DIV C, R1, D   ; C = R1 / D   (C = (A * B) / D)
```
### 2. **Two-Address Instructions**
In two-address instructions, the result overwrites one of the source operands.
```assembly
MOV R1, A      ; R1 = A
MUL R1, B      ; R1 = R1 * B  (R1 = A * B)
DIV R1, D      ; R1 = R1 / D  (R1 = (A * B) / D)
MOV C, R1      ; C = R1       (C = (A * B) / D)
```
### 3. **One-Address Instructions**
In one-address instructions, the accumulator (AC) is used implicitly for operations.
```assembly
LOAD A         ; AC = A
MUL B          ; AC = AC * B  (AC = A * B)
DIV D          ; AC = AC / D  (AC = (A * B) / D)
STORE C        ; C = AC       (C = (A * B) / D)
```

Consider a 6-segment pipeline with segment delays of 10 ns, 16 ns, 11 ns, 12 ns, 14 ns, and 14 ns respectively. Calculate the processing time of the pipeline for 1000 tasks.
### Step 1: Determine the cycle time
- The cycle time of the pipeline is determined by the segment with the maximum delay.
- **Maximum segment delay:** 16 ns (this is the cycle time).
### Step 2: Calculate the total time for 1000 tasks
- **Time to fill the pipeline:** 6 × 16 ns = 96 ns
- **Time to process the remaining tasks:** (1000 − 1) × 16 ns = 999 × 16 ns = 15984 ns
### Step 3: Calculate the total processing time
- **Total time:** 96 ns + 15984 ns = 16080 ns
### Answer:
- The total processing time for 1000 tasks in the pipeline is 16080 ns.

Flash memory is a type of non-volatile storage technology that is widely used in various electronic devices. Here are its key characteristics:
1. **Non-Volatile Storage:** Flash memory retains data even when the power is turned off, making it ideal for storing data permanently or semi-permanently.
2. **Solid-State:** Flash memory has no moving parts, unlike traditional hard drives. This solid-state nature makes it more durable, resistant to physical shock, and faster in data access and transfer.
3. **High-Speed Access:** Flash memory provides quick read and write operations, which makes it suitable for applications that require fast data retrieval and storage, like booting operating systems or running applications.
4. **Rewritable:** Flash memory can be rewritten many times, although it has a finite number of write/erase cycles. This number can vary depending on the type of flash memory (e.g., NAND or NOR flash).
5. **Varied Form Factors:** Flash memory comes in different forms, such as USB drives, SD cards, SSDs (Solid State Drives), and embedded memory in devices like smartphones and tablets.
6. **Cost and Storage Capacity:** Flash memory is generally more expensive per gigabyte than traditional hard drives, but prices have been decreasing over time. It also offers a range of storage capacities, from a few megabytes to several terabytes.

Using the restoring division method, divide the values where M = 3 (divisor) and Q = 12 (dividend).
### Step 1: Convert to Binary
- M = 3 → `0011`
- Q = 12 → `1100`
### Step 2: Initialize
- **A (Accumulator):** `0000`
- **Q (Dividend):** `1100`
- **M (Divisor):** `0011`
### Step 3: Perform Division (4-bit process)
1. **Shift left `A:Q`:** `0000 1100` → `0001 1000`
   - **Subtract M:** `0001 - 0011 = 1110` (negative, so restore)
   - **Restore:** `A = 0001`, set `Q0 = 0` → `A:Q = 0001 1000`
2. **Shift left `A:Q`:** `0001 1000` → `0011 0000`
   - **Subtract M:** `0011 - 0011 = 0000` (positive, keep the result)
   - Set `Q0 = 1` → `A:Q = 0000 0001`
3. **Shift left `A:Q`:** `0000 0001` → `0000 0010`
   - **Subtract M:** `0000 - 0011 = 1101` (negative, so restore)
   - **Restore:** `A = 0000`, set `Q0 = 0` → `A:Q = 0000 0010`
4. **Shift left `A:Q`:** `0000 0010` → `0000 0100`
   - **Subtract M:** `0000 - 0011 = 1101` (negative, so restore)
   - **Restore:** `A = 0000`, set `Q0 = 0` → `A:Q = 0000 0100`
### Final Result
- **Quotient (Q):** `0100` (4 in decimal)
- **Remainder (A):** `0000` (0 in decimal)
**Result:** 12 ÷ 3 = 4 with a remainder of 0.

Booth's multiplication algorithm is an efficient method for multiplying signed binary numbers in two's complement form. It reduces the number of additions and subtractions by encoding the multiplier to handle sequences of 1's more effectively.
### Steps of Booth's Algorithm:
1. **Initialize:**
   - Set the multiplicand (`M`), the multiplier (`Q`), an extra bit (`Q-1` = 0), and an accumulator (`A` = 0).
2. **Process Bits:**
   - For each bit in the multiplier, examine the current least significant bit of `Q` (`Q0`) and `Q-1`.
   - If `Q0 = 0` and `Q-1 = 0`: Do nothing (just shift).
   - If `Q0 = 1` and `Q-1 = 0`: Subtract `M` from `A`.
   - If `Q0 = 0` and `Q-1 = 1`: Add `M` to `A`.
   - If `Q0 = 1` and `Q-1 = 1`: Do nothing (just shift).
   - Perform an arithmetic right shift on (`A`, `Q`, `Q-1`).
3. **Repeat:** Repeat for the number of bits in `Q`.
4. **Result:** After all bits are processed, the result is in the combined (`A`, `Q`) register.
### Example: Multiply 3 (0011) by -4 (1100)
- **Initial:** `A = 0000`, `Q = 1100`, `Q-1 = 0`, `M = 0011`.
- **Step 1:** `Q0 = 0`, `Q-1 = 0` → no operation; ARS → `A = 0000`, `Q = 0110`, `Q-1 = 0`.
- **Step 2:** `Q0 = 0`, `Q-1 = 0` → no operation; ARS → `A = 0000`, `Q = 0011`, `Q-1 = 0`.
- **Step 3:** `Q0 = 1`, `Q-1 = 0` → subtract `M` from `A` (`A = 1101`); ARS → `A = 1110`, `Q = 1001`, `Q-1 = 1`.
- **Step 4:** `Q0 = 1`, `Q-1 = 1` → no operation; ARS → `A = 1111`, `Q = 0100`, `Q-1 = 1`.
**Final Result:** `A:Q = 1111 0100` → Result = `-12` (correct for 3 × -4).
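The Booth example above can be checked mechanically. The following is a minimal sketch (the function name and helper logic are mine, not from the notes) that maintains `A`, `Q`, and `Q-1` exactly as in the worked trace:

```python
def booth_multiply(multiplicand, multiplier, bits=4):
    """Multiply two signed integers using Booth's algorithm.

    Examines (Q0, Q-1), adds or subtracts M into A, then performs an
    arithmetic right shift of the combined (A, Q, Q-1) register,
    repeating once per multiplier bit.
    """
    mask = (1 << bits) - 1
    M = multiplicand & mask
    A, Q, Q_1 = 0, multiplier & mask, 0
    for _ in range(bits):
        pair = (Q & 1, Q_1)
        if pair == (1, 0):           # Q0=1, Q-1=0: A = A - M
            A = (A - M) & mask
        elif pair == (0, 1):         # Q0=0, Q-1=1: A = A + M
            A = (A + M) & mask
        # arithmetic right shift of (A, Q, Q-1): A keeps its sign bit,
        # A's old LSB enters Q's MSB, Q's old LSB becomes Q-1
        Q_1 = Q & 1
        Q = ((Q >> 1) | ((A & 1) << (bits - 1))) & mask
        A = (A >> 1) | (A & (1 << (bits - 1)))
    result = (A << bits) | Q         # combined A:Q register
    if result & (1 << (2 * bits - 1)):
        result -= 1 << (2 * bits)    # interpret as signed two's complement
    return result

print(booth_multiply(3, -4))  # -12, matching the worked example
```

Tracing `booth_multiply(3, -4)` by hand reproduces the four steps above, ending with `A:Q = 1111 0100 = -12`.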
Define ROM and explain its significance in a computer system.
**Definition:**
ROM is a type of non-volatile memory used in computers and other electronic devices. As its name implies, data stored in ROM can only be read, not written to. It is used primarily to store firmware (software that is permanently programmed into the hardware).
**Significance in Computer Systems:**
- **Firmware Storage:** ROM stores the firmware, which is the low-level software that controls the hardware. For example, the BIOS (Basic Input/Output System) in a computer is stored in ROM.
- **Boot Process:** ROM plays a crucial role during the computer's startup process, providing the necessary instructions to initialize hardware and load the operating system.
- **Reliability:** ROM retains its contents even when the power is turned off, ensuring that essential instructions are always available to the system.
### Differences Between ROM and RAM

| **Aspect** | **ROM (Read-Only Memory)** | **RAM (Random-Access Memory)** |
|------------|----------------------------|--------------------------------|
| **Functionality** | Stores firmware and boot instructions; read-only memory | Temporary storage for data and programs in active use; read/write memory |
| **Volatility** | Non-volatile (retains data without power) | Volatile (loses data when power is off) |
| **Data Access** | Data is pre-written and not easily modified | Data is frequently written to and read from by the CPU |
| **Usage** | Holds critical instructions for system startup and hardware control | Holds data for currently running applications and processes |

### Types of ROM
1. **PROM (Programmable ROM):**
- Can be programmed by the user after manufacturing, but only once. Once programmed, it cannot be altered.
2. **EPROM (Erasable Programmable ROM):**
- Can be erased and reprogrammed using UV light. The contents are erased by exposing the chip to ultraviolet light, allowing for reprogramming.
3. **EEPROM (Electrically Erasable Programmable ROM):**
- Can be erased and reprogrammed electrically, typically while the chip is still in the computer. It allows for selective erasure and reprogramming of data, which makes it more flexible.
4. **Mask ROM:**
- Pre-programmed during the manufacturing process and cannot be modified afterward. It is typically used for large-scale production where the data does not change.

### Stack Organization in Computer Systems
**Definition:**
A stack is a data structure that operates on a Last-In, First-Out (LIFO) principle: the last element added (pushed) is the first one to be removed (popped). This structure is widely used in computer systems, particularly in the management of function calls, expression evaluation, and memory management.
### Key Concepts in Stack Organization
1. **Stack Pointer (SP):**
- A special-purpose register that holds the address of the top element in the stack.
- The SP is incremented or decremented on each operation, depending on whether the stack grows up or down in memory.
2. **Push Operation:**
- Adds (pushes) an element onto the stack.
- The SP is adjusted accordingly (decreased in a downward-growing stack or increased in an upward-growing stack).
3. **Pop Operation:**
- Removes (pops) the top element from the stack.
- The SP is adjusted in the opposite direction of the push.
4. **Top of the Stack:**
- Refers to the most recent element added to the stack.
### Example of Stack Operations
Consider a stack implemented in a downward-growing memory model (common in many systems):
#### Initial State
- **Stack Pointer (SP):** `0x1000`
- **Stack:** Empty
#### Push Operations
1. **Push 5 onto the Stack:**
- SP is decremented: `SP = 0x0FFC`
- The value 5 is stored at address `0x0FFC`.
2. **Push 10 onto the Stack:**
- SP is decremented: `SP = 0x0FF8`
- The value 10 is stored at address `0x0FF8`.
#### Stack After Pushes
- **Memory:**
- Address `0x0FF8` -> 10 (top of the stack)
- Address `0x0FFC` -> 5
- **SP:** `0x0FF8`
#### Pop Operation
1. **Pop from the Stack:**
- The value at `0x0FF8` (10) is removed.
- SP is incremented: `SP = 0x0FFC`.
#### Stack After Pop
- **Memory:**
- Address `0x0FF8` -> (empty)
- Address `0x0FFC` -> 5 (top of the stack)
- **SP:** `0x0FFC`
### Uses of Stack in Computer Systems
1. **Function Call Management:**
- **Call Stack:** During function calls, the return address, parameters, and local variables are pushed onto the stack. When the function completes, the stack is popped to restore the previous state.
- **Example:** In recursive functions, each call creates a new stack frame, which is popped off when the function returns.
2. **Expression Evaluation:**
- **Postfix Evaluation:** Expressions in postfix notation (e.g., `AB+`) are evaluated using a stack. Operands are pushed, and when an operator is encountered, the required operands are popped, the operation is performed, and the result is pushed back.
- **Example:** Evaluating the postfix expression `3 4 + 2 *`:
- Push 3
- Push 4
- Pop 4 and 3, compute `3 + 4 = 7`, and push 7
- Push 2
- Pop 2 and 7, compute `7 * 2 = 14`, and push 14 (final result)
3. **Interrupt Handling:**
- When an interrupt occurs, the CPU's current state (such as the program counter and flags) is pushed onto the stack. After the interrupt is handled, the state is restored by popping these values off the stack.
4. **Backtracking Algorithms:**
- Stacks are used in algorithms like depth-first search (DFS) where backtracking is required. As the algorithm progresses, states are pushed onto the stack; when a dead end is reached, the algorithm pops the stack to backtrack.

Differentiate between RISC and CISC architecture, with functional diagrams.
### 1. **Definition**
- **RISC (Reduced Instruction Set Computer):**
- A CPU design philosophy that uses a small, highly optimized set of instructions, all of which typically execute in a single clock cycle.
- **CISC (Complex Instruction Set Computer):**
- A CPU design philosophy that uses a large set of instructions, some of which are complex and can execute multiple low-level operations in a single instruction.
### 2. **Instruction Set**
- **RISC:**
- Fewer, simpler instructions.
- Each instruction typically takes one clock cycle.
- Instructions are of uniform length.
- **CISC:**
- More complex instructions, which can perform multiple operations.
- Instructions can take multiple clock cycles.
- Instructions are of variable length.
### 3. **Execution Time**
- **RISC:**
- Single-cycle execution for most instructions.
- Pipelining is easier to implement due to uniform instruction length.
- **CISC:**
- Multi-cycle execution for complex instructions.
- Pipelining is more challenging due to variable instruction length.
### 4. **Memory Usage**
- **RISC:**
- More memory usage for programs, since complex operations must be broken into simpler instructions.
- **CISC:**
- Less memory usage for programs, since a single complex instruction can perform multiple tasks.
### 5. **Pipeline Efficiency**
- **RISC:**
- High pipeline efficiency due to the simplicity and uniformity of instructions.
- **CISC:**
- Lower pipeline efficiency due to complex, multi-cycle instructions and variable instruction lengths.
### 6. **Hardware Complexity**
- **RISC:**
- Simpler hardware design with fewer transistors devoted to instruction decoding.
- **CISC:**
- More complex hardware design with more transistors used for instruction decoding and execution.
### 7. **Examples of Architectures**
- **RISC:**
- ARM, MIPS, SPARC, PowerPC
- **CISC:**
- x86, VAX, System/360
### 8. **Functional Diagram**
#### **RISC Architecture Functional Diagram**

    +-----------------+
    |  Instruction    |
    |  Fetch Unit     |
    +-----------------+
            |
            v
    +-----------------+
    |  Instruction    |
    |  Decode Unit    |
    +-----------------+
            |
            v
    +-----------------+      +----------------+
    |  Arithmetic &   | <--> |  Register File |
    |  Logic Unit     |      +----------------+
    |  (ALU)          |
    +-----------------+
            |
            v
    +-----------------+
    |  Memory Access  |
    |  Unit           |
    +-----------------+
            |
            v
    +-----------------+
    |  Write Back     |
    |  Unit           |
    +-----------------+

#### **CISC Architecture Functional Diagram**

    +-----------------+
    |  Instruction    |
    |  Fetch Unit     |
    +-----------------+
            |
            v
    +-----------------+      +----------------+
    |  Instruction    | <--> |   Microcode    |
    |  Decode Unit    |      |  Control Unit  |
    +-----------------+      +----------------+
            |
            v
    +-----------------+      +----------------+
    |  Arithmetic &   | <--> |  Register File |
    |  Logic Unit     |      +----------------+
    |  (ALU)          |
    +-----------------+
            |
            v
    +-----------------+
    |  Memory Access  |
    |  Unit           |
    +-----------------+
            |
            v
    +-----------------+
    |  Write Back     |
    |  Unit           |
    +-----------------+

### Signed Number Representation
**Signed number representation** allows both positive and negative integers to be represented in binary form. In a signed number system, the most significant bit (MSB) is used as the **sign bit**:
- `0` indicates a positive number.
- `1` indicates a negative number.
### 4-Bit Signed Number Representation (Range: -8 to 7)
In a 4-bit two's complement format, the range of values is from `-8` to `7`. Negative values are stored in two's complement form:

| Decimal | Binary (4-bit) |
|---------|----------------|
| 0  | `0000` |
| 1  | `0001` |
| 2  | `0010` |
| 3  | `0011` |
| 4  | `0100` |
| 5  | `0101` |
| 6  | `0110` |
| 7  | `0111` |
| -1 | `1111` |
| -2 | `1110` |
| -3 | `1101` |
| -4 | `1100` |
| -5 | `1011` |
| -6 | `1010` |
| -7 | `1001` |
| -8 | `1000` |

### Representing -35 in 8-Bit Format
To represent `-35` in 8-bit format using two's complement:
1. **Find the binary representation of +35:**
- 35 in binary (8-bit) is `00100011`.
2. **Find the two's complement:**
- Invert the digits: `11011100`
- Add 1 to the result: `11011100 + 1 = 11011101`
So, **-35** in 8-bit two's complement format is `11011101`.
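The two's complement procedure above is easy to verify programmatically. A small helper (the function name is mine, used only for illustration):

```python
def to_twos_complement(value, bits):
    """Return the two's complement bit string of `value` in `bits` bits.

    Masking with 2**bits - 1 performs the invert-and-add-one step in
    one operation for negative values.
    """
    if not -(1 << (bits - 1)) <= value < (1 << (bits - 1)):
        raise ValueError(f"{value} does not fit in {bits} signed bits")
    return format(value & ((1 << bits) - 1), f"0{bits}b")

print(to_twos_complement(-35, 8))  # 11011101, as derived above
print(to_twos_complement(-4, 4))   # 1100, the 4-bit table row for -4
```

Running it over the range -8 to 7 reproduces the 4-bit table exactly.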
### Floating Point Numbers
**Floating point numbers** are used to represent real numbers (numbers with fractional parts) in a way that can support a wide range of values. They are represented in a format similar to scientific notation, where a number is split into three parts: the sign, the exponent, and the mantissa (or significand).
### IEEE 754 Standard (32-bit Single Precision)
- **Sign bit (1 bit):** Indicates whether the number is positive (`0`) or negative (`1`).
- **Exponent (8 bits):** Encodes the exponent value (with a bias of 127).
- **Mantissa (23 bits):** Represents the significant digits of the number.
### Example Representation
Consider the number `-12.5` in IEEE 754 format:
1. **Convert to binary:**
- `12.5` in binary is `1100.1`.
2. **Normalize the number:**
- `1100.1` becomes `1.1001 × 2^3`.
3. **Determine the components:**
- **Sign bit:** `1` (since the number is negative)
- **Exponent:** `3 + 127 = 130`, which in binary is `10000010`.
- **Mantissa:** `10010000000000000000000` (the 23 bits after the binary point).
4. **IEEE 754 Representation:**
- `1 10000010 10010000000000000000000`
### Floating Point Operations
**1. Addition/Subtraction:**
- **Align the exponents:** Adjust the numbers so they have the same exponent.
- **Add/Subtract the mantissas:** Perform the operation on the mantissas.
- **Normalize the result:** Adjust the result so it fits the floating-point format.
**Example:**
- Add `1.25` (`1.01 × 2^0`) and `-2.5` (`-1.01 × 2^1`).
- Align exponents: `1.25` becomes `0.101 × 2^1`.
- Add mantissas: `0.101 + (-1.01) = -0.101`.
- Normalize: `-0.101 × 2^1 = -1.01 × 2^0 = -1.25`.
**2. Multiplication:**
- **Multiply the mantissas.**
- **Add the exponents.**
- **Normalize the result.**
**Example:**
- Multiply `2.5` (`1.01 × 2^1`) by `4.0` (`1.00 × 2^2`).
- Multiply mantissas: `1.01 × 1.00 = 1.01`.
- Add exponents: `1 + 2 = 3`.
- Result: `1.01 × 2^3 = 1010.0` in binary, i.e. `10.0` in decimal.

### Role of DMA in Enhancing I/O Performance
**Direct Memory Access (DMA)** is a crucial mechanism in computer systems that enhances the performance of input/output (I/O) operations by allowing peripherals to transfer data directly to and from memory without involving the CPU in each transfer. This offloads the CPU, allowing it to perform other tasks while the DMA controller handles the data transfer, thereby improving overall system efficiency.
### How DMA Improves I/O Operations
1. **Reduces CPU Overhead:** By allowing data transfer directly between memory and I/O devices, DMA reduces the need for the CPU to manage each data byte, freeing up CPU cycles for other operations.
2. **Faster Data Transfers:** Since DMA operates independently, it can transfer data faster than if the CPU were managing the transfer, especially for large blocks of data.
3. **Increases System Throughput:** By offloading I/O tasks to the DMA controller, the system can handle more operations simultaneously, leading to higher throughput.
### Steps Involved in a Typical DMA Transfer
1. **Initiation:**
- The CPU sets up the DMA controller by providing the source address (memory or I/O), destination address, the number of bytes to transfer, and the direction of data transfer (read or write).
- The CPU then issues a command to the DMA controller to start the transfer.
2. **Address and Control:**
- The DMA controller takes control of the system bus to manage the data transfer between the memory and the I/O device.
- For a **read operation** (DMA reads from memory to an I/O device): the DMA controller reads data from the memory address specified and writes it to the I/O device.
- For a **write operation** (DMA writes from an I/O device to memory): the DMA controller reads data from the I/O device and writes it to the memory address specified.
3. **Transfer:**
- The DMA controller transfers data directly between the memory and the I/O device without CPU intervention. The data is transferred in bursts or blocks, depending on the setup.
4. **Completion:**
- Once the transfer is complete, the DMA controller sends an interrupt to the CPU, indicating that the data transfer is finished.
- The CPU then resumes control and continues processing with the data now available in memory.
### Example: Impact of DMA on I/O Operations
**Without DMA:**
- Suppose a system is transferring a 1MB file from a hard disk to memory. Without DMA, the CPU would have to handle each byte, checking, reading, and writing data, which would be slow and resource-intensive.
**With DMA:**
- With DMA, the CPU simply initiates the transfer and then moves on to other tasks. The DMA controller manages the 1MB transfer independently, significantly speeding up the process and freeing the CPU for other operations.

### Basic Concept of Pipelining
**Pipelining** is a technique used in computer architecture to improve the throughput of a processor by overlapping the execution of multiple instructions. In a pipelined processor, an instruction is divided into several stages, and different stages of multiple instructions are processed simultaneously.
### Stages in a Simple Pipeline
A basic instruction pipeline typically consists of the following stages:
1. **Fetch (IF):** Retrieve the instruction from memory.
2. **Decode (ID):** Decode the instruction to understand what action is needed.
3. **Execute (EX):** Perform the operation (e.g., arithmetic calculation).
4. **Memory Access (MEM):** Access memory if needed.
5. **Write Back (WB):** Write the result back to the register.
### Pipelining vs. Sequential (Non-Pipelined) Processing
- **Sequential Processing:**
- Each instruction is completed before the next instruction begins.
- Instructions are processed one after another without overlap.
- **Pipelined Processing:**
- Multiple instructions are processed simultaneously, with each instruction at a different stage of execution.
- As one instruction moves from one stage to the next, the next instruction enters the pipeline.
### Comparison with Diagram
**Sequential Processing (Non-Pipelined):**

    Time:   T1  T2  T3  T4  T5   T6  T7  T8  T9  T10
    Inst1:  IF  ID  EX  MEM WB
    Inst2:                       IF  ID  EX  MEM WB
    Inst3:  (begins at T11)

**Pipelined Processing:**

    Time:   T1  T2  T3  T4  T5  T6  T7  T8
    Inst1:  IF  ID  EX  MEM WB
    Inst2:      IF  ID  EX  MEM WB
    Inst3:          IF  ID  EX  MEM WB
    Inst4:              IF  ID  EX  MEM WB
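The overlap in the pipelined diagram above can be computed rather than drawn. A minimal sketch (function name and default stage list are mine), assuming an ideal stall-free pipeline:

```python
def pipeline_schedule(n_instructions, stages=("IF", "ID", "EX", "MEM", "WB")):
    """Return {instruction: {stage: cycle}} for an ideal pipeline.

    Instruction i enters one cycle after instruction i-1, so n
    instructions through k stages finish in k + n - 1 cycles total.
    """
    sched = {}
    for i in range(n_instructions):
        sched[f"Inst{i + 1}"] = {s: i + j + 1 for j, s in enumerate(stages)}
    return sched

sched = pipeline_schedule(4)
print(sched["Inst1"]["WB"])  # cycle 5
print(sched["Inst4"]["WB"])  # cycle 8, i.e. k + n - 1 = 5 + 4 - 1
```

The same k + n − 1 formula underlies the 6-segment exercise earlier in these notes: (6 + 1000 − 1) cycles × 16 ns = 16080 ns.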
why we need an instruction buffer in a pipelined CPU explain: - **L3 Cache:** - **Source Address 2:** Specifies the second operand.
### 1. **Ensures Continuous Instruction Flow** - **Size:** Larger (2MB to 16MB or more). - **Destination Address:** Specifies where the result should be stored.
- **Pipelining Requires Continuous Input:** Pipelined CPUs have multiple stages for processing - **Speed:** Moderate (tens of nanoseconds). - **Instruction:** `MOV A, B, C` (moves the result of the operation on `A` and `B` into `C`).
instructions (fetch, decode, execute, etc.). For the pipeline to be effective, each stage must be - **Role:** Shares data among multiple CPU cores and acts as a last level cache before main - **Example in Assembly:** `MOV R1, R2, R3` (moves the result of an operation involving `R2`
continuously supplied with instructions. An instruction buffer stores a queue of instructions memory. and `R3` into `R1`).
fetched from memory, ensuring that each stage of the pipeline receives instructions without delay, 3. **Main Memory (RAM):** 4. **Register Format:**
even if there are variations in memory access times. - **Size:** Large (4GB to 64GB or more). - **Structure:** All operands are specified by registers.
### 2. **Reduces Pipeline Stalls** - **Speed:** Moderate (hundreds of nanoseconds). - **Example:**
- **Prevents Idle Stages:** Without an instruction buffer, the pipeline might experience stalls due - **Role:** Holds data and instructions that are currently being used or processed. It is slower - **Opcode:** Defines the operation.
to delays in fetching instructions from memory. The buffer acts as a temporary holding area that than cache but has much more capacity. - **Source Register:** Specifies the register containing the source operand.
smooths out these delays, preventing pipeline stages from stalling when there’s a delay in fetching 4. **Secondary Storage (e.g., SSDs, HDDs):** - **Destination Register:** Specifies the register to store the result.
instructions. - **Size:** Very large (hundreds of GBs to several TBs). - **Instruction:** `ADD R1, R2` (adds contents of register `R2` to register `R1`).
### 3. **Handles Memory Access Latency** - **Speed:** Slower (milliseconds). - **Example in Assembly:** `ADD R1, R2` (adds the contents of register `R2` to register `R1`).
- **Mitigates Latency Issues:** Memory access times can vary due to cache misses or other delays. - **Role:** Stores data and programs not currently in use. It is non-volatile, meaning data 5. **Immediate Address Format:**
The instruction buffer allows instructions to be pre-fetched and stored, so the CPU can continue persists without power. - **Structure:** Includes an opcode and an immediate value (constant).
processing instructions even if there is a delay in accessing the next set of instructions from 5. **Tertiary and Off-line Storage (e.g., Optical Discs, External Drives):** - **Example:**
memory. - **Size:** Extremely large (multiple TBs). - **Opcode:** Defines the operation.
### 4. **Facilitates Branch Prediction and Handling** - **Speed:** Slowest (seconds to minutes). - **Immediate Value:** Specifies the constant operand.
- **Branch Handling Efficiency:** In cases of branch instructions, the CPU might need to fetch and - **Role:** Used for backup and archival purposes. Typically not used for active data access. - **Instruction:** `MOV A, #5` (loads the constant `5` into the register `A`).
decode multiple possible instruction paths. The instruction buffer can hold instructions for both - **Example in Assembly:** `MOV R1, #10` (loads the constant value `10` into register `R1`).
potential paths (branch and not-branch), allowing the CPU to quickly switch paths without having ### what is instruction format , explain different types of instruction format ### Instruction Cycle
to wait for the branch decision to be resolved. **Instruction format** refers to the layout or structure of the bits in an instruction word. It The **instruction cycle** is the process through which the CPU executes an instruction. It
### 5. **Supports Instruction Prefetching** determines how different parts of the instruction are encoded and how they are used by the CPU. generally includes the following stages:
- **Improves Performance:** The instruction buffer allows the CPU to prefetch instructions ahead An instruction typically consists of several fields, including an **opcode** (operation code) and 1. **Fetch:** Retrieve the instruction from memory.
of the current execution point. This means that while the pipeline is working on one set of various **operand** fields. 2. **Decode:** Interpret the opcode and determine the operation and operand addresses.
instructions, the buffer can be loading the next set of instructions, reducing idle times and ### Types of Instruction Formats 3. **Execute:** Perform the operation specified by the opcode using the operands.
improving overall throughput. 1. **One-Address Format:** 4. **Store/Write Back:** Save the result back to memory or a register.
- **Structure:** Includes an opcode and a single address. ### Example
### Principle of Memory Hierarchy - **Example:** Consider the instruction `ADD R1, R2` (in a 2-address format).
The **memory hierarchy** is a structure used in computer systems to balance the trade-offs - **Opcode:** Defines the operation. 1. **Fetch:** The CPU fetches the `ADD R1, R2` instruction from memory.
between memory size, speed, and cost. It organizes memory into a hierarchy of levels, each with - **Address:** Specifies the location of the operand. 2. **Decode:** The CPU decodes the opcode `ADD` and identifies `R1` as the destination register
different characteristics, to optimize performance and cost-efficiency. The principle is based on the - **Instruction:** `LOAD A` (where `A` is the address of the data to be loaded into the and `R2` as the source register.
observation that while faster memory is more expensive and has less capacity, slower memory is accumulator). 3. **Execute:** The CPU adds the contents of `R2` to `R1`.
cheaper and has more capacity. - **Example in Assembly:** `LOAD 5000` (loads data from memory address 5000 into the 4. **Store/Write Back:** The result is stored in `R1`.
### Levels of Memory Hierarchy accumulator).
1. **Registers:** 2. **Two-Address Format:** [15]
- **Size:** Smallest (typically a few KBs or less). - **Structure:** Includes an opcode and two addresses (source and destination).
- **Speed:** Fastest (clock cycle time or a few nanoseconds). - **Example:**
- **Role:** Hold data that the CPU is currently processing. Immediate access required. - **Opcode:** Defines the operation.
2. **Cache Memory:** - **Source Address:** Specifies the location of the source operand.
- **L1 Cache:** - **Destination Address:** Specifies where the result should be stored.
- **Size:** Very small (32KB to 64KB per core). - **Instruction:** `ADD A, B` (adds data from address `B` to data at address `A` and stores the
- **Speed:** Very fast (a few nanoseconds). result in `A`).
- **Role:** Stores frequently accessed data and instructions close to the CPU to reduce access - **Example in Assembly:** `ADD R1, R2` (adds the contents of register `R2` to register `R1`).
time. 3. **Three-Address Format:**
- **L2 Cache:** - **Structure:** Includes an opcode and three addresses (two source addresses and one
- **Size:** Small to medium (256KB to 2MB). destination address).
- **Speed:** Fast (a few nanoseconds to tens of nanoseconds). - **Example:**
- **Role:** Provides a second level of caching to reduce the time it takes to access data that is - **Opcode:** Defines the operation.
not in L1. - **Source Address 1:** Specifies the first operand.
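The payoff of the memory-hierarchy levels described above is commonly summarized by the average memory access time, AMAT = hit time + miss rate × miss penalty. The sketch below applies this formula to a hypothetical two-level cache hierarchy; the latencies and miss rates are made-up illustrative numbers, not measurements.

```python
# Average memory access time (AMAT) across a two-level cache hierarchy.
# All latencies are in nanoseconds and are illustrative values only.
def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty

# A miss in L2 costs a main-memory access (~100 ns assumed here).
l2 = amat(hit_time=5, miss_rate=0.20, miss_penalty=100)   # 25.0 ns
# A miss in L1 costs the full effective L2 access time computed above.
l1 = amat(hit_time=1, miss_rate=0.05, miss_penalty=l2)    # 2.25 ns
print(l1)  # → 2.25
```

Even though main memory is ~100× slower than a register-speed L1 hit, the hierarchy keeps the *average* access close to L1 speed because most accesses hit in the fast levels.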
### Data Hazards and Control Hazards in Computer Architecture
In pipelined processors, hazards can disrupt the smooth execution of instructions. The two main types of hazards are **data hazards** and **control hazards**.
### 1. Data Hazards
**Data hazards** occur when instructions that are close together in the pipeline need the same data. They fall into three types:
- **Read After Write (RAW):** Also known as a true dependency; occurs when an instruction needs to read a value that a previous instruction is still writing.
- **Write After Read (WAR):** Occurs when an instruction writes to a location before a previous instruction has read from that location.
- **Write After Write (WAW):** Occurs when two instructions write to the same location and the order of writing affects the result.
#### Example of Data Hazard
Consider the following instructions:
1. `I1: ADD R1, R2, R3` ; R1 = R2 + R3
2. `I2: SUB R4, R1, R5` ; R4 = R1 - R5
- **RAW Hazard:** `I2` needs the result of `I1`, but `I1` has not finished writing to `R1` when `I2` starts reading `R1`.
#### Diagram
```
Cycle:  1    2    3    4    5    6
I1:     IF   ID   EX   MEM  WB
I2:          IF   ID   EX   MEM  WB
```
**Resolution:**
- **Stall:** Delay the execution of `I2` until `I1` has completed.
- **Forwarding/Bypassing:** Pass the result of `I1` directly to `I2` without waiting for `I1` to complete.
### 2. Control Hazards
**Control hazards** arise from branch instructions that affect the flow of control in the pipeline. When a branch instruction is encountered, the next instruction to execute depends on the outcome of the branch.
- **Branch Instructions:** These include conditional and unconditional jumps, which can alter the flow of instructions.
#### Example of Control Hazard
Consider the following instructions:
1. `I1: BEQ R1, R2, LABEL` ; if R1 == R2, jump to LABEL
2. `I2: ADD R3, R4, R5` ; may or may not be executed, depending on the branch
- **Control Hazard:** The pipeline must wait to determine whether `I2` should execute based on the result of `I1`.
#### Diagram
```
Cycle:  1    2    3    4    5
I1:     IF   ID   EX   MEM  WB
I2:          IF   ID   (depends on I1)
```
**Resolution:**
- **Stall:** Insert no-operation (NOP) instructions until the branch decision is made.
- **Branch Prediction:** Predict the outcome of the branch and continue executing instructions based on the prediction.
- **Branch Target Buffer (BTB):** Store the target addresses of branches so the next instruction to fetch can be determined quickly.
### Importance of Cache Memory
**Cache memory** is a small, fast type of volatile memory that provides high-speed data access to the CPU and improves overall system performance. It acts as an intermediary between the CPU and the slower main memory (RAM), significantly reducing the time the CPU needs to access frequently used data and instructions.
#### **Key Benefits:**
1. **Speed Improvement:** Cache memory is much faster than main memory, allowing quicker access to data and instructions that are used frequently or recently.
2. **Reduction in Latency:** By storing copies of frequently accessed data, cache memory minimizes the time required to fetch data from the slower main memory.
3. **Enhanced CPU Efficiency:** With faster access to necessary data, the CPU spends less time waiting on memory and can perform more operations in a given period.
4. **Better System Throughput:** Faster data access results in improved overall system performance and responsiveness.
### Different Types of Cache Policies
Cache policies dictate what data is stored and replaced and how cache hits and misses are handled. The primary cache policies are:
#### **1. Cache Replacement Policies**
When the cache is full, the replacement policy determines which data to evict to make space for new data. Common policies include:
- **Least Recently Used (LRU):**
  - **Description:** Replaces the cache line that has not been used for the longest period.
  - **Advantage:** Often performs well by keeping frequently used data in the cache.
  - **Example:** When the cache is full and new data must be loaded, the least recently accessed line is evicted.
- **First-In, First-Out (FIFO):**
  - **Description:** Replaces the cache line that was loaded first.
  - **Advantage:** Simple to implement, though it may not perform as well as LRU.
  - **Example:** The oldest cache line is removed when new data needs to be loaded.
- **Least Frequently Used (LFU):**
  - **Description:** Replaces the cache line that has been used the least number of times.
  - **Advantage:** Effective when access patterns are consistent over time.
  - **Example:** When the cache is full, the line with the lowest usage count is evicted.
- **Random Replacement:**
  - **Description:** Randomly selects a cache line to evict.
  - **Advantage:** Easy to implement and can sometimes perform comparably to more complex policies.
  - **Example:** Any cache line may be chosen for removal when new data needs to be loaded.
#### **2. Cache Write Policies**
Cache write policies determine how data modifications are propagated between the cache and main memory:
- **Write-Through:**
  - **Description:** Every write to the cache is also written to main memory simultaneously.
  - **Advantage:** Ensures that main memory is always up to date with the cache.
  - **Disadvantage:** Can result in higher write latency and more memory traffic.
  - **Example:** When data is written to the cache, it is immediately written to main memory as well.
- **Write-Back (Write-Behind):**
  - **Description:** Writes go to the cache only; modified data is written back to main memory when it is evicted from the cache.
  - **Advantage:** Reduces the number of write operations to main memory and can improve performance.
  - **Disadvantage:** Main memory may hold stale data if the cache is not properly managed.
  - **Example:** Modifications are made in the cache and written to main memory only when the cache line is replaced.
#### **3. Cache Coherence Policies**
In multi-core systems, cache coherence policies ensure that the caches of different processors maintain a consistent view of memory:
- **MESI Protocol (Modified, Exclusive, Shared, Invalid):**
  - **Description:** A widely used coherence protocol that tracks the state of each cache line to ensure consistency.
  - **Advantage:** Maintains a coherent view of memory across multiple caches.
  - **Example:** When one core modifies a cache line, other cores are notified to update or invalidate their copies.
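As a concrete illustration of the replacement policies above, here is a short Python sketch of LRU for a tiny fully associative cache. The capacity, the `LRUCache` class, and the access trace are made up for illustration.

```python
from collections import OrderedDict

# Sketch of LRU replacement for a tiny fully associative cache
# (the capacity and access trace below are illustrative).
class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # keys kept in least- to most-recently-used order

    def access(self, address):
        """Return True on a hit, False on a miss (the line is loaded on a miss)."""
        if address in self.lines:
            self.lines.move_to_end(address)   # mark as most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False)    # evict the least recently used line
        self.lines[address] = True            # load the new line
        return False

cache = LRUCache(capacity=2)
trace = [0x10, 0x20, 0x10, 0x30, 0x20]
hits = [cache.access(a) for a in trace]
print(hits)  # [False, False, True, False, False]
```

In the trace, the third access to `0x10` hits because the earlier access kept it "recent"; loading `0x30` then evicts `0x20` (the least recently used line), so the final access to `0x20` misses again. Swapping in a FIFO eviction (`popitem` on insertion order regardless of hits) would change which lines survive.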
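Returning to branch prediction under control hazards: one common hardware scheme is a 2-bit saturating counter, which must mispredict twice in a row before flipping its prediction. The sketch below models a single counter (real predictors keep a table of counters indexed by branch address); the outcome sequence is invented for illustration.

```python
# Sketch of a 2-bit saturating-counter branch predictor, a common form of
# the branch prediction mentioned above (single counter; real hardware
# keeps a table of these, indexed by the branch instruction's address).
class TwoBitPredictor:
    def __init__(self):
        self.counter = 0   # 0-1 => predict not taken, 2-3 => predict taken

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        """Nudge the counter toward the actual outcome, saturating at 0 and 3."""
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)

predictor = TwoBitPredictor()
# A loop-like branch: taken on every iteration except one loop exit.
outcomes = [True, True, True, True, False, True, True, True, True]
correct = 0
for taken in outcomes:
    correct += predictor.predict() == taken
    predictor.update(taken)
print(f"{correct}/{len(outcomes)} predicted correctly")  # 6/9
```

Note that the single not-taken outcome only moves the counter from 3 to 2, so the predictor still predicts taken on the next iteration; this hysteresis is why 2-bit counters handle loop branches better than a 1-bit scheme, which would mispredict twice per loop exit.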