Study Notes COAL Mids
Architecture vs. Organization:
• Computer Architecture:
o Refers to attributes visible to the programmer (e.g., instruction set, number of
bits for data, addressing techniques).
o Impacts how programs are written and executed logically.
o Example: Whether a processor uses 32-bit or 64-bit instructions.
• Computer Organization:
o Refers to how these architectural features are implemented in hardware (e.g.,
control signals, memory technology).
o Focuses on the physical structure and behavior of the system.
o Example: How the CPU communicates with memory via buses.
Key Difference: Architecture is what the programmer sees; organization is how it’s built.
• Structure: How components (e.g., CPU, memory) are connected and relate to each
other.
• Function: What each component does as part of the system.
• Hierarchical System:
o Computers are designed as a set of interrelated subsystems (e.g., CPU contains
ALU, registers).
o Designers focus on one level at a time, simplifying complexity.
1. Data Processing: Handles various data forms with broad processing needs (e.g.,
calculations).
2. Data Storage:
o Short-term (e.g., registers, cache).
o Long-term (e.g., hard drives).
3. Data Movement:
o Input/Output (I/O): Moves data to/from devices directly connected to the
computer (e.g., keyboard, monitor).
o Data Communications: Moves data over long distances (e.g., network).
4. Control: Manages resources and coordinates functions via a control unit.
CPU Subcomponents:
Multicore Structure:
Semiconductor Memory:
• 1970: Fairchild’s first large semiconductor memory (256 bits, faster than core
memory).
• 1974: Cheaper than core memory, leading to widespread adoption.
• Evolution: 13 generations since 1970, each with 4x storage density, lower cost, and
faster access.
Microprocessors:
• Trends:
o Costs drop, performance rises dramatically.
o Today’s laptops match IBM mainframes from 10–15 years ago.
• Applications Needing Power:
o Image processing, 3D rendering, speech recognition, videoconferencing,
multimedia authoring, simulations.
• Servers:
o Handle transactions, databases, and client/server networks.
o Cloud providers use high-performance server banks for high-volume tasks.
Obstacles to Chip Improvement:
• Power Density: More logic and higher clock speeds increase heat.
• RC Delay: Thinner wires increase resistance, closer wires increase capacitance,
slowing signals.
• Memory Latency: Memory speeds lag behind processors.
Techniques for Increasing Processor Speed:
1. Pipelining:
o Instructions are processed in stages (e.g., fetch, decode, execute) like an
assembly line, with all stages working simultaneously.
o Increases throughput by overlapping tasks.
2. Branch Prediction:
o Processor predicts which instructions (branches) will be executed next based
on patterns, fetching them early.
o Reduces delays from waiting for branch decisions.
3. Superscalar Execution:
o Executes multiple instructions per clock cycle using parallel pipelines.
o Example: Issuing two or more instructions at once.
4. Data Flow Analysis:
o Analyzes instruction dependencies to optimize their execution order.
o Ensures efficient use of processor resources.
5. Speculative Execution:
o Executes instructions ahead of time (based on predictions), storing results
temporarily.
o Keeps the processor busy, discarding results if predictions are wrong.
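The payoff of pipelining (technique 1 above) can be seen with a minimal cycle-count model; the 5-stage pipeline and 100-instruction program below are illustrative values, not from the notes:

```python
# Ideal pipeline throughput model (illustrative sketch).
def cycles_unpipelined(n_instr: int, n_stages: int) -> int:
    """Each instruction passes through every stage before the next starts."""
    return n_instr * n_stages

def cycles_pipelined(n_instr: int, n_stages: int) -> int:
    """Assembly-line overlap: the first instruction fills the pipe
    (n_stages cycles), then one instruction completes every cycle."""
    return n_stages + (n_instr - 1)

n, k = 100, 5  # assumed values: 100 instructions, 5-stage pipeline
print(cycles_unpipelined(n, k))  # 500
print(cycles_pipelined(n, k))    # 104 -> roughly 4.8x higher throughput
```

The model ignores stalls; branch prediction and speculative execution (techniques 2 and 5) exist precisely to keep this overlap from breaking down.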
• Power Issues:
o Power density rises with more logic and higher clock speeds, making heat
dissipation a challenge.
• RC Delay:
o Resistance (R) and capacitance (C) in wires slow signal speed (RC product
increases).
o Thinner wires (higher R) and closer wires (higher C) worsen delays as
components shrink.
• Memory Latency:
o Memory speeds lag behind processor speeds, creating a bottleneck.
Multicore Strategy:
• Uses multiple simpler processors (cores) on one chip instead of one complex
processor.
• Benefits:
o Performance increases without raising clock speed.
o Larger caches (e.g., L1, L2, L3) are justified, improving data access speed.
• Evolution: Started with two cores, now includes three or more cache levels on-chip.
• MIC (Many Integrated Core):
o A chip with many general-purpose cores for high parallel performance.
o Challenge: Software must be designed to use many cores effectively.
• GPU:
o Specialized core for parallel graphics tasks (e.g., 2D/3D rendering, video
processing).
o Also used as vector processors for repetitive computations in other fields.
o Typically found on graphics cards but increasingly integrated into CPUs.
Clock Speed:
• Definitions:
o Instruction Count (Ic): Total machine instructions executed in a program.
o Cycles Per Instruction (CPI): Average clock cycles per instruction.
▪ Varies by instruction type (e.g., arithmetic = 1 cycle, branch = 4
cycles).
▪ Formula: CPI = Σ(CPI_i × I_i) / Ic
▪ CPI_i = cycles for instruction type i, I_i = count of type i
instructions.
o Execution Time (T): Time to run a program.
▪ Formula: T = Ic × CPI × τ, where τ = 1/f is the clock cycle time.
• MIPS Rate: Millions of instructions per second, a common performance measure.
o Formula: MIPS = Ic / (T × 10^6) = f / (CPI × 10^6)
• Example Calculation:
o Program: 2M instructions, 400 MHz processor.
o Instruction Mix:
▪ Arithmetic (60%, CPI = 1), Load/Store (18%, CPI = 2), Branch (12%,
CPI = 4), Cache Miss (10%, CPI = 8).
o CPI = (0.6 × 1) + (0.18 × 2) + (0.12 × 4) + (0.1 × 8) = 0.6 + 0.36 + 0.48 + 0.8
= 2.24
o MIPS = (400 × 10^6) / (2.24 × 10^6) ≈ 178.6 MIPS
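The worked example above can be checked with a short script (same instruction mix and 400 MHz clock as in the notes):

```python
# Reproduces the example: 2M instructions, 400 MHz, given instruction mix.
instruction_mix = {
    # type: (fraction of instructions, CPI for that type)
    "arithmetic": (0.60, 1),
    "load_store": (0.18, 2),
    "branch":     (0.12, 4),
    "cache_miss": (0.10, 8),
}

cpi = sum(frac * cycles for frac, cycles in instruction_mix.values())
f_hz = 400e6                 # 400 MHz clock
mips = f_hz / (cpi * 1e6)    # MIPS = f / (CPI * 10^6)

print(f"CPI  = {cpi:.2f}")   # 2.24
print(f"MIPS = {mips:.1f}")  # 178.6
```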
Benchmarks:
Amdahl’s Law:
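Amdahl's law bounds the overall speedup when only a fraction f of execution time is improved by a factor N; a minimal sketch of the standard formula (the example numbers are illustrative):

```python
# Amdahl's law: speedup = 1 / ((1 - f) + f/n), where f is the fraction of
# execution time that benefits and n is the speedup of that fraction.
def amdahl_speedup(f: float, n: float) -> float:
    """Overall speedup when fraction f of the time is sped up n times."""
    return 1.0 / ((1.0 - f) + f / n)

# Even a 10x improvement on 90% of the work gives only ~5.3x overall;
# as n grows, speedup is capped at 1 / (1 - f).
print(amdahl_speedup(0.9, 10))
print(amdahl_speedup(0.9, 1000))  # approaches the 1/(1-f) = 10x ceiling
```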
Problem-Solving Examples
Given: A program of 100,000 instructions runs on a 40 MHz processor; 45,000
instructions take 1 cycle each, 32,000 take 2 cycles, 15,000 take 2 cycles, and
8,000 take 2 cycles. Find CPI, execution time, and MIPS rate.
Solution:
1. CPI:
o Total cycles = (45,000 × 1) + (32,000 × 2) + (15,000 × 2) + (8,000 × 2)
o = 45,000 + 64,000 + 30,000 + 16,000 = 155,000 cycles
o CPI = 155,000 / 100,000 = 1.55
2. Execution Time (T):
o τ = 1 / (40 × 10^6) = 25 ns = 25 × 10^-9 s
o T = Ic × CPI × τ = 100,000 × 1.55 × 25 × 10^-9 = 3.875 × 10^-3 s = 3.875 ms
3. MIPS Rate:
o MIPS = f / (CPI × 10^6) = (40 × 10^6) / (1.55 × 10^6) ≈ 25.81 MIPS
Given: Two machines run the same program on a 200 MHz processor:
• Machine A:
o Arithmetic/Logic: 8M (CPI = 1), Load/Store: 4M (CPI = 3), Branch: 2M (CPI
= 4), Others: 4M (CPI = 3)
• Machine B:
o Arithmetic/Logic: 10M (CPI = 1), Load/Store: 8M (CPI = 2), Branch: 2M
(CPI = 4), Others: 4M (CPI = 3)
Find: CPI, MIPS, execution time for each.
Solution:
1. Machine A:
o Ic = 8M + 4M + 2M + 4M = 18M
o Cycles = (8M × 1) + (4M × 3) + (2M × 4) + (4M × 3) = 8M + 12M + 8M +
12M = 40M
o CPI = 40M / 18M ≈ 2.22
o T = Ic × CPI × τ = 18M × 2.22 × (1 / 200M) = 0.2 s
o MIPS = 200 / 2.22 ≈ 90
2. Machine B:
o Ic = 10M + 8M + 2M + 4M = 24M
o Cycles = (10M × 1) + (8M × 2) + (2M × 4) + (4M × 3) = 10M + 16M + 8M +
12M = 46M
o CPI = 46M / 24M ≈ 1.92
o T = 24M × 1.92 × (1 / 200M) = 0.23 s
o MIPS = 200 / 1.92 ≈ 104
Comment: Machine B is faster (higher MIPS, lower CPI) despite more
instructions, due to a better instruction mix and lower average CPI.
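The Machine A/B comparison above can be reproduced with one helper applying the formulas from this chapter (CPI = cycles/Ic, T = Ic × CPI × τ, MIPS = f/(CPI × 10^6)):

```python
# Both machines run at 200 MHz, as in the example above.
F_HZ = 200e6

def evaluate(mix):
    """mix: list of (instruction_count, CPI) pairs.
    Returns (CPI, execution time in seconds, MIPS rate)."""
    ic = sum(n for n, _ in mix)
    cycles = sum(n * c for n, c in mix)
    cpi = cycles / ic
    t = ic * cpi * (1 / F_HZ)     # T = Ic * CPI * tau, tau = 1/f
    mips = F_HZ / (cpi * 1e6)     # MIPS = f / (CPI * 10^6)
    return cpi, t, mips

machine_a = [(8e6, 1), (4e6, 3), (2e6, 4), (4e6, 3)]
machine_b = [(10e6, 1), (8e6, 2), (2e6, 4), (4e6, 3)]

for name, mix in (("A", machine_a), ("B", machine_b)):
    cpi, t, mips = evaluate(mix)
    print(f"Machine {name}: CPI={cpi:.2f}, T={t:.2f}s, MIPS={mips:.0f}")
```

Running it confirms the hand calculation: B wins on every metric despite executing 6M more instructions.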
Answer (Stored-Program Concept):
• Software:
o Sequence of instructions (codes) interpreted by hardware to generate control
signals.
o New programs change the code sequence without rewiring hardware.
• Major Components:
o CPU:
▪ Instruction Interpreter: Decodes and executes instructions.
▪ Arithmetic/Logic Module: Performs general-purpose computations.
o I/O Components:
▪ Input Module: Accepts data/instructions, converts them to internal
signals.
▪ Output Module: Reports results (e.g., to display or storage).
Registers:
• Memory Address Register (MAR): Specifies memory address for next read/write.
• Memory Buffer Register (MBR): Holds data to be written to memory or read from
memory.
• I/O Address Register (I/OAR): Identifies a specific I/O device.
• I/O Buffer Register (I/OBR): Exchanges data between CPU and I/O module.
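The read path through MAR and MBR can be sketched as follows; the toy memory array and the address/value used are invented for illustration:

```python
# A memory read moves the address out through MAR and the data back
# through MBR (toy model; sizes and contents are illustrative).
memory = [0] * 256       # toy byte-addressable memory
memory[0x10] = 42        # assume this value was stored earlier

mar = 0x10               # MAR: address for the next read/write
mbr = memory[mar]        # MBR: data just read from that address

print(mbr)  # 42
```

A write runs the same path in reverse: the CPU loads MBR with the data, MAR with the address, then signals memory to store.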
3.2 Instruction Cycle
Action Categories:
3.3 Interrupts
• Purpose: Allow the processor to handle events without polling (checking constantly).
• Classes of Interrupts (Table 3.1):
o Program: Caused by instruction execution (e.g., overflow, illegal instruction,
memory access violation).
o Timer: Generated by a processor timer for regular OS tasks.
o I/O: Signaled by I/O controller (e.g., operation complete, error, service
request).
o Hardware Failure: Triggered by faults (e.g., power loss, memory parity
error).
Instruction Cycle with Interrupts:
• Process:
o Processor checks for interrupts after each instruction.
o If an interrupt occurs, it saves the current state (e.g., PC) and jumps to an
Interrupt Handler.
o After handling, it restores the state and resumes the program.
• Diagrams:
o Figure 3.7: Shows program flow with/without interrupts (short/long I/O
waits).
o Figure 3.9: Adds interrupt check to the basic cycle.
o Figure 3.12: State diagram including interrupt handling.
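The process above can be sketched in code; the program and handler lists are invented for illustration:

```python
# Instruction cycle with an interrupt check after each instruction
# (illustrative sketch, not a real processor model).
def run(program, pending_interrupts):
    pc = 0
    while pc < len(program):
        program[pc]()                  # fetch/decode/execute one instruction
        pc += 1
        if pending_interrupts:         # interrupt check at end of each cycle
            saved_pc = pc              # save state (e.g., the PC)
            handler = pending_interrupts.pop(0)
            handler()                  # jump to the interrupt handler
            pc = saved_pc              # restore state and resume the program

trace = []
program = [lambda: trace.append("i0"), lambda: trace.append("i1")]
run(program, [lambda: trace.append("irq")])
print(trace)  # ['i0', 'irq', 'i1'] -- handler runs between instructions
```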
Multiple Interrupts:
• Handled either by disabling interrupts while one is being serviced (processed
sequentially) or by assigning priorities so a higher-priority interrupt can
preempt a lower-priority handler.
I/O Techniques:
• Basic I/O:
o I/O module exchanges data directly with the processor.
o Processor uses I/O instructions (not memory instructions) to read/write data.
o I/OAR identifies the device; I/OBR holds the data.
• Direct Memory Access (DMA):
o I/O module reads/writes to memory directly, bypassing the processor.
o Processor grants permission, then continues other tasks.
o Benefit: Frees the processor from managing data transfers.
Bus:
Bus Types:
1. Data Bus:
o Moves data between modules.
o Width (e.g., 32, 64, 128 bits) determines bits transferred at once.
o Wider bus = higher performance.
2. Address Bus:
o Specifies source/destination of data (e.g., memory address, I/O port).
o Width determines maximum memory capacity (e.g., 32-bit = 4GB).
o Higher-order bits select module; lower-order bits select location within
module.
3. Control Bus:
o Manages access to data/address lines (shared by all components).
o Timing Signals: Indicate when data/address are valid.
o Command Signals: Specify operations (e.g., read, write).
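The high-order/low-order address split described under Address Bus can be illustrated as follows; the 8-bit-module / 24-bit-offset split is an assumed example, not from the notes:

```python
# Splitting a 32-bit address into module-select and in-module-offset fields.
# The 8/24 split below is an assumption for illustration.
MODULE_BITS = 8                   # high-order bits: which module
OFFSET_BITS = 24                  # low-order bits: location within module

addr = 0x03_00_1F_40              # example 32-bit address
module = addr >> OFFSET_BITS
offset = addr & ((1 << OFFSET_BITS) - 1)

print(hex(module), hex(offset))   # 0x3 0x1f40
```

With all 32 bits usable, the address bus can reach 2^32 locations, i.e., the 4GB maximum mentioned above.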
Point-to-Point Interconnect:
• Why Used:
o High-frequency synchronous buses face electrical constraints (hard to
synchronize/arbitrate).
o Shared buses struggle with high data rates and low latency on-chip.
• Advantages:
o Lower latency, higher data rate, better scalability than traditional buses.
• Example: Modern CPUs use point-to-point links (e.g., Intel’s QuickPath
Interconnect).