
Study Notes for Midterm Examination

Course: Computer Organization and Architecture

Textbook: William Stallings, Computer Organization and Architecture, 10th Edition

Additional References:

• Computer Systems: A Programmer's Perspective, 3rd Edition, by Randal E. Bryant and David R. O'Hallaron
• Assembly Language for x86 Processors, 7th Edition, by Kip R. Irvine

Chapter 1: Basic Concepts and Computer Evolution


1.1 Computer Architecture vs. Computer Organization

• Computer Architecture:
o Refers to attributes visible to the programmer (e.g., instruction set, number of
bits for data, addressing techniques).
o Impacts how programs are written and executed logically.
o Example: Whether a processor uses 32-bit or 64-bit instructions.
• Computer Organization:
o Refers to how these architectural features are implemented in hardware (e.g.,
control signals, memory technology).
o Focuses on the physical structure and behavior of the system.
o Example: How the CPU communicates with memory via buses.

Key Difference: Architecture is what the programmer sees; organization is how it’s built.

1.2 IBM System/370 Architecture

• Introduced in 1970 by IBM.


• Key Features:
o Multiple models with varying speeds and costs.
o Upgradable to faster models without changing software (backward
compatibility).
o Same architecture across models, protecting software investment.
• Legacy: Still the basis for IBM’s modern mainframe product line.
• Why It Matters: Shows how a consistent architecture supports long-term system
evolution.

1.3 Structure and Function

• Structure: How components (e.g., CPU, memory) are connected and relate to each
other.
• Function: What each component does as part of the system.
• Hierarchical System:
o Computers are designed as a set of interrelated subsystems (e.g., CPU contains
ALU, registers).
o Designers focus on one level at a time, simplifying complexity.

Four Basic Computer Functions:

1. Data Processing: Handles various data forms with broad processing needs (e.g.,
calculations).
2. Data Storage:
o Short-term (e.g., registers, cache).
o Long-term (e.g., hard drives).
3. Data Movement:
o Input/Output (I/O): Moves data to/from devices directly connected to the
computer (e.g., keyboard, monitor).
o Data Communications: Moves data over long distances (e.g., network).
4. Control: Manages resources and coordinates functions via a control unit.

1.4 Computer Structure

Four Main Components:

1. CPU: Controls operations and processes data.


2. Main Memory: Stores data and instructions.
3. I/O: Transfers data between the computer and external devices.
4. System Interconnection: Communication mechanism (e.g., buses) between CPU,
memory, and I/O.

CPU Subcomponents:

• Control Unit: Directs CPU operations.


• Arithmetic and Logic Unit (ALU): Performs calculations and logic operations.
• Registers: Fast internal storage for temporary data.
• CPU Interconnection: Links control unit, ALU, and registers.

Multicore Structure:

• Core: An individual processing unit on a chip (like a mini-CPU).


• Processor: A chip with one or more cores (multicore if multiple).
• Benefit: Multiple cores improve performance by parallel processing.

Cache Memory:

• Small, fast memory between processor and main memory.


• Stores frequently used data to speed up access.
• Levels: L1 (closest to core, fastest), L2, L3 (farther, slower but larger).
• Why It Helps: Reduces time to fetch data from slower main memory.

1.5 History of Computers

First Generation (1946–1957): Vacuum Tubes

• Technology: Vacuum tubes for logic and memory.


• IAS Computer:
o Introduced stored program concept by John von Neumann (1945).
o Programs and data stored in the same memory.
o Built at Princeton, completed in 1952.
o Prototype for all modern general-purpose computers.
• Speed: ~40,000 operations per second.
• Registers (IAS):
o Memory Buffer Register (MBR): Holds data to/from memory or I/O.
o Memory Address Register (MAR): Specifies memory address.
o Instruction Buffer Register (IBR): Holds right-hand instruction temporarily.
o Instruction Register (IR): Holds current instruction’s opcode.
o Program Counter (PC): Address of next instruction.
o Accumulator (AC) & Multiplier Quotient (MQ): Temporary storage for
ALU operations.

Second Generation (1957–1964): Transistors

• Technology: Transistors (smaller, cheaper, less heat than vacuum tubes).


• Invented: 1947 at Bell Labs, widely used by late 1950s.
• Speed: ~200,000 operations per second.
• Features:
o More complex ALU and control units.
o High-level programming languages (e.g., Fortran, COBOL).
o System software for loading programs, managing I/O, and libraries.

Third Generation (1965–1971): Integrated Circuits

• Technology: Integrated circuits (ICs) combine multiple transistors on a chip.


• Invented: 1958, used widely in computers by mid-1960s.
• Speed: ~1,000,000 operations per second.
• Key Systems: IBM System/360, DEC PDP-8.
• Components:
o Memory Cells: Store data.
o Gates: Process data (logic operations).
o Paths: Move data between components.
o Control Signals: Coordinate operations.

Later Generations:

• Fourth (1972–1977): Large Scale Integration (LSI), ~10,000,000 ops/sec.


• Fifth (1978–1991): Very Large Scale Integration (VLSI), ~100,000,000 ops/sec.
• Sixth (1991–): Ultra Large Scale Integration (ULSI), >1,000,000,000 ops/sec.

Semiconductor Memory:

• 1970: Fairchild’s first large semiconductor memory (256 bits, faster than core
memory).
• 1974: Cheaper than core memory, leading to widespread adoption.
• Evolution: 13 generations since 1970, each with 4x storage density, lower cost, and
faster access.

Microprocessors:

• 1971: Intel 4004 (first CPU on a single chip).


• 1972: Intel 8008 (first 8-bit microprocessor).
• 1974: Intel 8080 (first general-purpose microprocessor, faster, richer instruction set).


Chapter 2: Performance Issues


2.1 Designing for Performance

• Trends in Computer Systems:


o Costs are dropping dramatically while performance and capacity rise
significantly.
o Example: Today’s laptops have the power of IBM mainframes from 10–15
years ago.
o Microprocessors are so cheap they’re disposable (e.g., in everyday devices).
• Applications Requiring High Performance:
o Image processing, 3D rendering, speech recognition, videoconferencing,
multimedia authoring, voice/video file annotation, simulation modeling.
• Business Use:
o Servers handle transactions, database processing, and massive client/server
networks (replacing old mainframes).
o Cloud providers use banks of high-performance servers for high-volume, high-
transaction-rate applications.

Microprocessor Speed Techniques:

1. Pipelining:
o Instructions are processed in stages (e.g., fetch, decode, execute) like an
assembly line, with all stages working simultaneously.
o Increases throughput by overlapping tasks (see the cycle-count sketch after this list).
2. Branch Prediction:
o Processor predicts which instructions (branches) will be executed next based
on patterns, fetching them early.
o Reduces delays from waiting for branch decisions.
3. Superscalar Execution:
o Executes multiple instructions per clock cycle using parallel pipelines.
o Example: Issuing two or more instructions at once.
4. Data Flow Analysis:
o Analyzes instruction dependencies to optimize their execution order.
o Ensures efficient use of processor resources.
5. Speculative Execution:
o Executes instructions ahead of time (based on predictions), storing results
temporarily.
o Keeps the processor busy, discarding results if predictions are wrong.
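
To make the pipelining point concrete, here is a minimal Python sketch (not from the textbook) that counts cycles for an idealized 5-stage pipeline, assuming one stage per clock and no stalls from hazards or branches:

# Minimal sketch: idealized cycle counts with and without a 5-stage pipeline.
# Assumes one stage per clock cycle and no stalls (hazards/branches ignored).

def sequential_cycles(n_instructions, n_stages=5):
    # Without pipelining, each instruction finishes all stages before the next starts.
    return n_instructions * n_stages

def pipelined_cycles(n_instructions, n_stages=5):
    # With pipelining, the first instruction fills the pipeline (n_stages cycles),
    # then one instruction completes every cycle.
    return n_stages + (n_instructions - 1)

print(sequential_cycles(100))  # 500 cycles
print(pipelined_cycles(100))   # 104 cycles -> roughly an n_stages-fold throughput gain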

Improvements in Chip Organization and Architecture:

• Increase Hardware Speed:


o Shrinking logic gate size allows more gates, tighter packing, and higher clock
rates.
o Reduced signal propagation time due to smaller distances.
• Increase Cache Size and Speed:
o Part of the processor chip is dedicated to cache, reducing access time to
frequently used data.
o Example: L1 cache is faster than main memory.
• Change Processor Organization:
o Enhances instruction execution speed through parallelism (e.g., multiple cores
or pipelines).

Problems with Clock Speed and Logic Density:

• Power Issues:
o Power density rises with more logic and higher clock speeds, making heat
dissipation a challenge.
• RC Delay:
o Resistance (R) and capacitance (C) in wires slow signal speed (RC product
increases).
o Thinner wires (higher R) and closer wires (higher C) worsen delays as
components shrink.
• Memory Latency:
o Memory speeds lag behind processor speeds, creating a bottleneck.

Multicore Strategy:

• Uses multiple simpler processors (cores) on one chip instead of one complex
processor.
• Benefits:
o Performance increases without raising clock speed.
o Larger caches (e.g., L1, L2, L3) are justified, improving data access speed.
• Evolution: Started with two cores, now includes three or more cache levels on-chip.

Many Integrated Core (MIC) and Graphics Processing Unit (GPU):

• MIC:
o A chip with many general-purpose cores for high parallel performance.
o Challenge: Software must be designed to use many cores effectively.
• GPU:
o Specialized core for parallel graphics tasks (e.g., 2D/3D rendering, video
processing).
o Also used as vector processors for repetitive computations in other fields.
o Typically found on graphics cards but increasingly integrated into CPUs.

2.2 Performance Assessment

Clock Speed:

• Key Parameters for Performance:


o Performance, cost, size, security, reliability, power consumption.
• System Clock:
o Measured in Hz (e.g., MHz, GHz); drives CPU operations.
o Cycle Time (τ): Time per clock cycle, τ = 1/f (f = frequency).
o Example: 400 MHz → τ = 1 / (400 × 10^6) = 2.5 ns.
• Instruction Execution:
o Takes multiple clock cycles (e.g., fetch, decode, load/store, execute).
o Signals need time to settle (1 or 0), and operations must be synchronized.
• Pipelining:
o Overlaps instruction stages, improving efficiency beyond raw clock speed.
• Key Point: Clock speed alone doesn’t determine performance—other factors matter
too.

Instruction Execution Rate:

• Definitions:
o Instruction Count (Ic): Total machine instructions executed in a program.
o Cycles Per Instruction (CPI): Average clock cycles per instruction.
▪ Varies by instruction type (e.g., arithmetic = 1 cycle, branch = 4
cycles).
▪ Formula: CPI = Σ(CPI_i × I_i) / Ic
▪ CPI_i = cycles for instruction type i, I_i = count of type i
instructions.
o Execution Time (T): Time to run a program.
▪ Formula: T = Ic × CPI × τ
• MIPS Rate: Millions of instructions per second, a common performance measure.
o Formula: MIPS = Ic / (T × 10^6) = f / (CPI × 10^6)
• Example Calculation (re-checked in the sketch after this list):
o Program: 2M instructions, 400 MHz processor.
o Instruction Mix:
▪ Arithmetic (60%, CPI = 1), Load/Store (18%, CPI = 2), Branch (12%,
CPI = 4), Cache Miss (10%, CPI = 8).
o CPI = (0.6 × 1) + (0.18 × 2) + (0.12 × 4) + (0.1 × 8) = 0.6 + 0.36 + 0.48 + 0.8
= 2.24
o MIPS = (400 × 10^6) / (2.24 × 10^6) ≈ 178 MIPS
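
The Python sketch below simply re-applies the three formulas above (CPI, T, MIPS) to the example instruction mix; the function names are illustrative, and the inputs are the numbers given above:

# Re-applying the formulas above to the example mix (names are illustrative).

def effective_cpi(mix):
    # mix: list of (fraction of instructions, cycles per instruction)
    return sum(frac * cpi for frac, cpi in mix)

def mips_rate(freq_hz, cpi):
    return freq_hz / (cpi * 1e6)           # MIPS = f / (CPI * 10^6)

def exec_time(ic, cpi, freq_hz):
    return ic * cpi / freq_hz               # T = Ic * CPI * tau, with tau = 1/f

mix = [(0.60, 1), (0.18, 2), (0.12, 4), (0.10, 8)]  # arithmetic, load/store, branch, cache miss
cpi = effective_cpi(mix)
print(cpi)                                  # ~2.24
print(mips_rate(400e6, cpi))                # ~178.6 MIPS
print(exec_time(2e6, cpi, 400e6))           # ~0.0112 s for the 2M-instruction program
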
Benchmarks:

• Purpose: Test processor performance with standardized programs.


• Characteristics:
o Written in high-level languages (e.g., C, C++, Fortran).
o Portable across systems, widely used, and easily measured.
• Example: SPEC CPU2006:
o 17 floating-point programs, 12 integer programs, ~3M lines of code.
o Measures speed (single task) and throughput (multiple tasks).
• Types: Systems, numerical, commercial tasks.

Amdahl’s Law:

• Concept: Limits speedup from multiple processors due to serial code.


• Key Idea:
o Fraction f of code is parallelizable (no overhead).
o Fraction (1-f) is inherently serial.
• Formula:
o Speedup = T_single / T_parallel = 1 / [(1-f) + (f / N)]
▪ T = execution time on a single processor.
▪ N = number of processors.
• Conclusions:
o Small f → little speedup from parallelism.
o As N → ∞, speedup maxes out at 1 / (1-f).
o Diminishing returns with more processors (illustrated in the sketch after this list).
• Applications:
o Servers benefit from parallel connections.
o Databases can split tasks for parallelism.
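
A one-function Python sketch of the speedup formula above; f and N are as defined in the notes, and the sample value f = 0.9 is only for illustration:

# Amdahl's law as stated above: f = parallelizable fraction, n = number of processors.

def speedup(f, n):
    return 1.0 / ((1.0 - f) + f / n)

# With f = 0.9 the speedup can never exceed 1 / (1 - f) = 10, no matter how large n gets.
for n in (2, 8, 64, 1_000_000):
    print(n, round(speedup(0.9, n), 2))     # 1.82, 4.71, 8.77, ~10.0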

Problem-Solving Examples

Problem 1 (Page 53):

Given:

• 40 MHz processor, 100,000 instructions.


• Instruction Mix:
o Integer Arithmetic: 45,000 (CPI = 1)
o Data Transfer: 32,000 (CPI = 2)
o Floating Point: 15,000 (CPI = 2)
o Control Transfer: 8,000 (CPI = 2)

Find: Effective CPI, MIPS rate, execution time.

Solution:

1. CPI:
o Total cycles = (45,000 × 1) + (32,000 × 2) + (15,000 × 2) + (8,000 × 2)
o = 45,000 + 64,000 + 30,000 + 16,000 = 155,000 cycles
o CPI = 155,000 / 100,000 = 1.55
2. Execution Time (T):
o τ = 1 / (40 × 10^6) = 25 ns = 25 × 10^-9 s
o T = Ic × CPI × τ = 100,000 × 1.55 × 25 × 10^-9 = 3.875 × 10^-3 s = 3.875 ms
3. MIPS Rate:
o MIPS = f / (CPI × 10^6) = (40 × 10^6) / (1.55 × 10^6) ≈ 25.81 MIPS

Answer: CPI = 1.55, MIPS ≈ 25.81, T = 3.875 ms

Problem 2 (Page 54):

Given: Two machines, 200 MHz clock, benchmark results:

• Machine A:
o Arithmetic/Logic: 8M (CPI = 1), Load/Store: 4M (CPI = 3), Branch: 2M (CPI
= 4), Others: 4M (CPI = 3)
• Machine B:
o Arithmetic/Logic: 10M (CPI = 1), Load/Store: 8M (CPI = 2), Branch: 2M
(CPI = 4), Others: 4M (CPI = 3)

Find: CPI, MIPS, execution time for each.

Solution:

1. Machine A:
o Ic = 8M + 4M + 2M + 4M = 18M
o Cycles = (8M × 1) + (4M × 3) + (2M × 4) + (4M × 3) = 8M + 12M + 8M +
12M = 40M
o CPI = 40M / 18M ≈ 2.22
o T = Ic × CPI × τ = 18M × 2.22 × (1 / 200M) = 0.2 s
o MIPS = 200 / 2.22 ≈ 90
2. Machine B:
o Ic = 10M + 8M + 2M + 4M = 24M
o Cycles = (10M × 1) + (8M × 2) + (2M × 4) + (4M × 3) = 10M + 16M + 8M +
12M = 46M
o CPI = 46M / 24M ≈ 1.92
o T = 24M × 1.92 × (1 / 200M) = 0.23 s
o MIPS = 200 / 1.92 ≈ 104

Comment: Machine B has the higher MIPS rate and the lower CPI, but Machine A finishes the benchmark sooner (0.2 s vs. 0.23 s) because it executes fewer instructions. This shows that MIPS alone can be a misleading measure of performance. (The sketch after the answer below recomputes both machines.)

Answer:

• Machine A: CPI = 2.22, MIPS ≈ 90, T = 0.2 s


• Machine B: CPI = 1.92, MIPS ≈ 104, T = 0.23 s
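
As flagged in the comment above, the short Python sketch below just recomputes both machines to check the arithmetic; the helper name and layout are illustrative:

# Recomputing Problem 2 for both machines (instruction counts and CPI per class).

CLOCK_HZ = 200e6

def analyze(mix):
    # mix: list of (instruction count, CPI) pairs for one machine
    ic = sum(count for count, _ in mix)
    cycles = sum(count * cpi for count, cpi in mix)
    cpi = cycles / ic
    t = cycles / CLOCK_HZ                   # execution time in seconds
    mips = CLOCK_HZ / (cpi * 1e6)
    return round(cpi, 2), round(mips), t

machine_a = [(8e6, 1), (4e6, 3), (2e6, 4), (4e6, 3)]
machine_b = [(10e6, 1), (8e6, 2), (2e6, 4), (4e6, 3)]
print(analyze(machine_a))   # (2.22, 90, 0.2)   -- faster in wall-clock time
print(analyze(machine_b))   # (1.92, 104, 0.23) -- higher MIPS, yet slower overall
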
Key Takeaways for Midterm

• Understand how performance is improved (pipelining, caches, multicore).


• Know the limits (power, RC delay, memory latency, Amdahl’s Law).
• Master CPI, MIPS, and execution time calculations with examples.
• Be ready to analyze benchmarks and compare processor designs.

Chapter 3: A Top-Level View of Computer Function and Interconnection

3.1 Computer Components

• Von Neumann Architecture:


o Developed by John von Neumann at the Institute for Advanced Study,
Princeton.
o Three Key Concepts:
1. Single Memory: Data and instructions stored in the same read-write
memory.
2. Addressable Memory: Contents accessed by location, regardless of
data type.
3. Sequential Execution: Instructions executed one after another (unless
modified).
o Hardwired Program: Result of physically connecting components in a fixed
configuration.
• Why It Matters: Basis for most modern computers, emphasizing memory and
sequential processing.

Hardware and Software:

• Software:
o Sequence of instructions (codes) interpreted by hardware to generate control
signals.
o New programs change the code sequence without rewiring hardware.
• Major Components:
o CPU:
▪ Instruction Interpreter: Decodes and executes instructions.
▪ Arithmetic/Logic Module: Performs general-purpose computations.
o I/O Components:
▪ Input Module: Accepts data/instructions, converts them to internal
signals.
▪ Output Module: Reports results (e.g., to display or storage).

Registers:

• Memory Address Register (MAR): Specifies memory address for next read/write.
• Memory Buffer Register (MBR): Holds data to be written to memory or read from
memory.
• I/O Address Register (I/OAR): Identifies a specific I/O device.
• I/O Buffer Register (I/OBR): Exchanges data between CPU and I/O module.

3.2 Instruction Cycle

• Basic Instruction Cycle:


o Fetch Cycle:
▪ Processor fetches the next instruction from memory.
▪ Program Counter (PC): Holds address of the next instruction;
incremented after fetch.
▪ Instruction loaded into Instruction Register (IR).
o Execute Cycle:
▪ Processor interprets the instruction and performs the action (e.g.,
arithmetic, data transfer).
• Diagram (Figure 3.3): Two phases—Fetch and Execute—run in a loop.

Action Categories:

1. Processor-Memory: Data transferred between processor and memory.


2. Processor-I/O: Data transferred between processor and I/O devices.
3. Data Processing: Arithmetic or logic operations on data.
4. Control: Manages execution flow (e.g., branching).

Hypothetical Machine Example (Figure 3.4):

• Instruction Format: 16 bits (4-bit opcode, 12-bit address).


• Integer Format: 16 bits (1-bit sign, 15-bit magnitude).
• Registers: PC (address), IR (instruction), AC (accumulator for temporary storage).
• Opcodes:
o 0001 = Load AC from memory.
o 0010 = Store AC to memory.
o 0101 = Add to AC from memory.
• Execution Example (Figure 3.5):
o Shows step-by-step memory/register changes in hexadecimal during a program run (a toy version of this loop is sketched below).
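
As noted in the execution-example bullet, the following toy Python loop walks the fetch-decode-execute cycle for this hypothetical machine. The opcodes are the three listed above; the sample program and data values are only illustrative (in the spirit of Figure 3.5, not copied from it):

# Toy fetch-decode-execute loop for the hypothetical machine above.
# 16-bit instructions: top 4 bits = opcode, low 12 bits = address.

memory = {
    0x300: 0x1940,   # 0001: load AC from location 940
    0x301: 0x5941,   # 0101: add contents of location 941 to AC
    0x302: 0x2941,   # 0010: store AC to location 941
    0x940: 0x0003,
    0x941: 0x0002,
}

pc, ac = 0x300, 0
while pc in memory:                  # stop when the PC runs past the program
    ir = memory[pc]                  # fetch cycle: instruction -> IR, PC incremented
    pc += 1
    opcode, addr = ir >> 12, ir & 0x0FFF
    if opcode == 0b0001:             # load AC from memory
        ac = memory[addr]
    elif opcode == 0b0010:           # store AC to memory
        memory[addr] = ac
    elif opcode == 0b0101:           # add to AC from memory
        ac = (ac + memory[addr]) & 0xFFFF

print(hex(memory[0x941]))            # 0x5 (3 + 2)
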

3.3 Interrupts

• Purpose: Allow the processor to handle events without polling (checking constantly).
• Classes of Interrupts (Table 3.1):
o Program: Caused by instruction execution (e.g., overflow, illegal instruction,
memory access violation).
o Timer: Generated by a processor timer for regular OS tasks.
o I/O: Signaled by I/O controller (e.g., operation complete, error, service
request).
o Hardware Failure: Triggered by faults (e.g., power loss, memory parity
error).

Instruction Cycle with Interrupts:

• Process:
o Processor checks for interrupts after each instruction.
o If an interrupt occurs, it saves the current state (e.g., PC) and jumps to an
Interrupt Handler.
o After handling, it restores the state and resumes the program (see the sketch after the diagram list below).
• Diagrams:
o Figure 3.7: Shows program flow with/without interrupts (short/long I/O
waits).
o Figure 3.9: Adds interrupt check to the basic cycle.
o Figure 3.12: State diagram including interrupt handling.
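
A minimal Python sketch of the check-and-handle flow described above (sequential handling of a single interrupt; the "printer" example and all names are illustrative, not from the text):

# Sketch: the interrupt check happens after each instruction, never in the middle of one.
from collections import deque

pending = deque()                          # interrupt requests, e.g. raised by I/O modules

def run(program, handlers):
    pc = 0
    while pc < len(program):
        program[pc]()                      # fetch + execute one instruction
        pc += 1
        if pending:                        # interrupt check at the end of the cycle
            saved_pc = pc                  # save state (here just the PC)
            handlers[pending.popleft()]()  # jump to the interrupt handler
            pc = saved_pc                  # restore state and resume the program

handlers = {"printer": lambda: print("handling printer interrupt")}
program = [lambda: pending.append("printer"),   # instruction during which an I/O interrupt arrives
           lambda: print("next user instruction")]
run(program, handlers)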

Multiple Interrupts:

• Sequential Processing: Handles one interrupt at a time (Figure 3.13a).


• Nested Processing: Allows higher-priority interrupts to interrupt lower ones (Figure
3.13b).
• Example (Figure 3.14):
o User program interrupted by printer, then communication, then disk—shows
timing sequence.

3.4 I/O Function

• Basic I/O:
o I/O module exchanges data directly with the processor.
o Processor uses I/O instructions (not memory instructions) to read/write data.
o I/OAR identifies the device; I/OBR holds the data.
• Direct Memory Access (DMA):
o I/O module reads/writes to memory directly, bypassing the processor.
o Processor grants permission, then continues other tasks.
o Benefit: Frees the processor from managing data transfers.

3.5 Interconnection Structures

• Types of Transfers (Figure 3.15):


o Memory to Processor: Fetch instructions/data.
o Processor to Memory: Store data.
o I/O to Processor: Read data from devices.
o Processor to I/O: Send data to devices.
o I/O to/from Memory: DMA transfers.

Bus:

• Definition: Shared communication pathway connecting devices.


• Characteristics:
o Signals from one device are received by all attached devices.
o Multiple devices transmitting at once causes signal overlap (garbled data).
o Made of multiple lines (e.g., data, address, control).
• System Bus: Connects major components (CPU, memory, I/O).

Bus Types:

1. Data Bus:
o Moves data between modules.
o Width (e.g., 32, 64, 128 bits) determines bits transferred at once.
o Wider bus = higher performance.
2. Address Bus:
o Specifies source/destination of data (e.g., memory address, I/O port).
o Width determines maximum memory capacity (e.g., 32-bit = 4 GB; checked in the sketch after this list).
o Higher-order bits select module; lower-order bits select location within
module.
3. Control Bus:
o Manages access to data/address lines (shared by all components).
o Timing Signals: Indicate when data/address are valid.
o Command Signals: Specify operations (e.g., read, write).
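
A quick Python check of the 4 GB figure in the address-bus item above, assuming byte-addressable memory:

# Maximum addressable memory for a given address-bus width (byte-addressable).
for width in (16, 32, 64):
    print(width, "bits ->", 2**width, "bytes")   # 32 bits -> 4,294,967,296 bytes = 4 GB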

Point-to-Point Interconnect:

• Why Used:
o High-frequency synchronous buses face electrical constraints (hard to
synchronize/arbitrate).
o Shared buses struggle with high data rates and low latency on-chip.
• Advantages:
o Lower latency, higher data rate, better scalability than traditional buses.
• Example: Modern CPUs use point-to-point links (e.g., Intel’s QuickPath
Interconnect).

Key Diagrams to Understand

• Figure 3.2: Top-level view of computer components (CPU, memory, I/O).


• Figure 3.6: Instruction cycle state diagram (fetch, execute).
• Figure 3.16: Bus interconnection scheme (data, address, control lines).

Key Takeaways for Midterm

• Von Neumann Architecture: Understand its principles and components (CPU,


memory, I/O).
• Instruction Cycle: Master fetch-execute process and how interrupts modify it.
• Interrupts: Know types, handling (sequential/nested), and their role in I/O.
• I/O and DMA: Differentiate processor-driven I/O vs. DMA.
• Buses: Grasp data, address, and control bus roles, plus point-to-point alternatives.
