02.EECE 345 Computer Architecture ISA Design
Spring 2025
January 2025
American University in Dubai
EECE345 Computer Architecture
Lecture II
Vinod Pangracious
School of Engineering
Chapter II: Computer Architecture

Computer Architecture
Levels of the computing stack (top to bottom): Markets, Applications, Operating Systems, Programming Languages and Compilers, Architecture, Microarchitecture, Hardware, Fabrication Technology.
• Each level of design imposes different requirements and constraints, which change over time.
• History and economics: there is commercial pressure to evolve in a way that minimizes disruption and possible costs to the ecosystem (e.g., software).
• There is also a need to look forward and not design for yesterday’s technology and workloads!
• Design decisions should be carefully justified through experimentation.
Ex: The Smartphone
(figure: a single smartphone contains many processors — Cortex-A application cores in the apps processor and 2G/3G/4G/5G modem, Cortex-R real-time cores in the modem and Wi-Fi subsystems, and Cortex-M microcontrollers for the camera, sensor hub, touchscreen and sensor hub, power management, flash controller, GPS, and Bluetooth)
What is A Computer?
• Computation
• Communication
• Storage (memory)
What is A Computer?
• We will cover all three components:
• Processing: control (sequencing) and datapath
• Memory: program and data
• I/O
The Von Neumann Model/Architecture
• Also called stored program computer (instructions in memory). Two key properties:
• Stored program
• Instructions stored in a linear memory array
• Memory is unified between instructions and data
• When is a value interpreted as an instruction?
• The interpretation of a stored value depends on the control signals
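The stored-program property means instructions and data live in the same memory array, and a stored value only becomes an instruction when the control unit fetches and interprets it. A toy fetch-decode-execute loop illustrates this (the encoding, accumulator design, and opcode names are made up for illustration, not any real ISA):

```python
# Toy von Neumann machine: one unified memory holds both the program
# and its data; the instruction pointer (IP) walks through it.
memory = [
    ("LOAD", 4),        # acc <- M[4]
    ("ADD", 5),         # acc <- acc + M[5]
    ("STORE", 6),       # M[6] <- acc
    ("HALT",),
    3, 4, 0,            # data lives in the same memory array as code
]

ip, acc = 0, 0
while True:
    instr = memory[ip]  # fetch: the stored value is *interpreted* as an
    ip += 1             # instruction only because control fetched it here
    op = instr[0]
    if op == "LOAD":
        acc = memory[instr[1]]
    elif op == "ADD":
        acc += memory[instr[1]]
    elif op == "STORE":
        memory[instr[1]] = acc
    elif op == "HALT":
        break

# after execution, memory[6] holds 3 + 4 = 7
```

Note that nothing distinguishes the tuple at `memory[0]` from the integer at `memory[4]` except how the control loop reaches it — exactly the unified-memory property above.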
The Von Neumann
Model/Architecture
• Recommended reading
• Burks, Goldstein, von Neumann, “Preliminary discussion of the logical design of an
electronic computing instrument,” 1946.
• Patt and Patel book, Chapter 4, “The von Neumann Model”
The Von Neumann Model (of a Computer)
(figure: MEMORY with a Memory Address Register; PROCESSING UNIT with ALU and TEMP registers; CONTROL UNIT with Instruction Pointer (IP) and Instruction Register; INPUT and OUTPUT units)
The Von Neumann Model (of a Computer)
• Q: Is the von Neumann model the only way to build a computer?
• A: No.
• Qualified answer: but it has been the dominant way
• i.e., the dominant paradigm for computing
• for N decades
The Dataflow Model (of a Computer)
• Von Neumann model: An instruction is fetched and executed in control flow order
• As specified by the instruction pointer
• Sequential unless explicit control flow instruction
• Dataflow model: An instruction is fetched and executed in data flow order
• i.e., when its operands are ready
Example, in sequential form (the corresponding dataflow graph feeds +, *2, -, and + nodes into a final * producing z):
v <= a + b;
w <= b * 2;
x <= v - w;
y <= v + w;
z <= x * y;
Which model is more natural to you as a programmer?
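The five assignments above can be run through a minimal dataflow interpreter, where each node fires as soon as all of its operands are ready rather than in program order (an illustrative sketch; the scheduler and node representation are made up):

```python
# Minimal dataflow interpreter for the example: execution order is
# driven by data readiness, not by an instruction pointer.
def run_dataflow(a, b):
    values = {"a": a, "b": b}           # operands available at start
    # (result name, operation, input operand names)
    nodes = [
        ("v", lambda e: e["a"] + e["b"], ("a", "b")),
        ("w", lambda e: e["b"] * 2,      ("b",)),
        ("x", lambda e: e["v"] - e["w"], ("v", "w")),
        ("y", lambda e: e["v"] + e["w"], ("v", "w")),
        ("z", lambda e: e["x"] * e["y"], ("x", "y")),
    ]
    fired = []                          # record firing order
    while len(fired) < len(nodes):
        for name, op, deps in nodes:
            # fire any node whose operands are all ready
            if name not in values and all(d in values for d in deps):
                values[name] = op(values)
                fired.append(name)
    return values["z"], fired

z, order = run_dataflow(a=3, b=2)
# v=5 and w=4 are both ready immediately (they could fire in parallel),
# then x=1 and y=9, then z=9
```

The point of the sketch: `v` and `w` have no ordering between them, which is exactly the parallelism the dataflow model exposes and the sequential model hides.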
More on Data Flow
Data Flow Nodes
An Example Data Flow Program
ISA-level Tradeoff: Instruction Pointer
Let’s Get Back to the Von Neumann Model
The von Neumann Model
• All major instruction set architectures today use this model
• x86, ARM, MIPS, SPARC, Alpha, POWER
• Underneath (at the microarchitecture level), the execution model of almost all
implementations (or, microarchitectures) is very different
• Pipelined instruction execution: Intel 80486 uarch
• Multiple instructions at a time: Intel Pentium uarch
• Out-of-order execution: Intel Pentium Pro uarch
• Separate instruction and data caches
• But, what happens underneath that is not consistent with the von Neumann model is not
exposed to software
• Difference between ISA and microarchitecture
What is Computer Architecture?
Last Lecture Recap
• Levels of Transformation
• Algorithm, ISA, Microarchitecture
• Moore’s Law
• What is Computer Architecture
• Why Study Computer Architecture
• Fundamental Concepts
• Von Neumann Model
• Dataflow Model
• ISA vs. Microarchitecture
• Digital System Design (ALU Architecture)
• Assignments: 1, 2 and 3
Review: ISA vs. Microarchitecture
Levels of transformation: Problem, Algorithm, Program, ISA, Microarchitecture, Circuits, Electrons.
• ISA
• Agreed-upon interface between software and hardware
• SW/compiler assumes, HW promises
• What the software writer needs to know to write and debug system/user programs
• Microarchitecture
• Specific implementation of an ISA
• Not visible to the software
• Microprocessor
• ISA, uarch, circuits
• “Architecture” = ISA + microarchitecture
Review: ISA
• Instructions
• Opcodes, Addressing Modes, Data Types
• Instruction Types and Formats
• Registers, Condition Codes
• Memory
• Address space, Addressability, Alignment
• Virtual memory management
Property of ISA vs. Uarch?
• Remember
• Microarchitecture: Implementation of the ISA under specific design constraints and goals
Design Point
• A set of design considerations and their importance
• leads to tradeoffs in both ISA and uarch
• Considerations
• Cost
• Performance
• Maximum power consumption
• Energy consumption (battery life)
• Availability
• Reliability and Correctness
• Time to Market
Tradeoffs: Soul of Computer Architecture
• ISA-level tradeoffs
• Microarchitecture-level tradeoffs
• Computer architecture is the science and art of making the appropriate trade-offs to meet a design point
• Why art?
Why Is It (Somewhat) Art?
MIPS
ARM
Set of Instructions, Encoding, and Spec
Bit Steering in Alpha
What Are the Elements of An ISA?
• Instruction sequencing model
• Control flow vs. data flow
• Tradeoffs?
-- Computations that are not easily expressible with “postfix notation” are difficult to map to stack machines
• Cannot perform operations on many values at the same time (only top N values on the stack at the same time)
• Not flexible
An Example: Stack Machine (II)
An Example: Stack Machine Operation
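A stack machine evaluates expressions in postfix form: operands are pushed, and each operator pops its inputs from the top of the stack and pushes its result. A minimal interpreter sketch (the instruction names and tuple encoding are illustrative, not from any real stack ISA):

```python
# Tiny stack-machine interpreter. PUSH places a value on the stack;
# ADD/SUB/MUL pop the top two entries and push the result.
def run(program):
    stack = []
    ops = {"ADD": lambda a, b: a + b,
           "SUB": lambda a, b: a - b,
           "MUL": lambda a, b: a * b}
    for instr in program:
        if instr[0] == "PUSH":
            stack.append(instr[1])
        else:
            b = stack.pop()          # top of stack = second operand
            a = stack.pop()
            stack.append(ops[instr[0]](a, b))
    return stack.pop()

# (2 + 3) * 4 in postfix: 2 3 ADD 4 MUL
result = run([("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",)])
# result == 20
```

Note that no instruction names an operand location: everything is implicit in stack position, which is what makes the encoding compact but the model inflexible.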
Other Examples
• PDP-11: A 2-address machine
• PDP-11 ADD: 4-bit opcode, two 6-bit operand specifiers
• Why? Limited bits to specify an instruction
• Disadvantage: One source operand is always clobbered with the result of the instruction
• How do you ensure you preserve the old value of the source?
• Data types
• Definition: Representation of information for which there are instructions that operate on the representation
• Integer, floating point, character, binary, decimal, BCD
• Doubly linked list, queue, string, bit vector, stack
• VAX: INSQUE and REMQUE instructions on a doubly linked list or queue; FINDFIRST
• Digital Equipment Corp., “VAX11/780 Architecture Handbook,” 1977.
• x86: SCAS opcode operates on character strings; PUSH/POP
Data Type Tradeoffs
• What is the benefit of having more or high-level data types in the ISA?
• What is the disadvantage?
What Are the Elements of An ISA?
• Registers
• How many
• Size of each register
Programmer Visible (Architectural) State
• Memory: an array of storage locations M[0], M[1], M[2], M[3], M[4], …, named by addresses
• Registers
• given special names in the ISA (as opposed to addresses)
• general vs. special purpose
• Microarchitectural state
• Programmer cannot access this directly
Evolution of Register Architecture
• Accumulator
• a legacy from the “adding” machine days
• Operate instructions
• Process data: arithmetic and logical operations
• Fetch operands, compute result, store result
• Implicit sequential control flow
What Are the Elements of An ISA?
• Addressing modes specify how to obtain the operands
• Absolute: LW rt, 10000
use immediate value as address
• Register Indirect: LW rt, (rbase)
use GPR[rbase] as address
• Displaced or based: LW rt, offset(rbase)
use offset+GPR[rbase] as address
• Indexed: LW rt, (rbase, rindex)
use GPR[rbase]+GPR[rindex] as address
• Memory Indirect: LW rt, ((rbase))
use value at M[GPR[rbase]] as address
• Auto inc/decrement: LW rt, (rbase)
use GPR[rbase] as address, but inc. or dec. GPR[rbase] each time
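The effective-address computations above can be sketched directly (a toy model with a register file `GPR` and a memory `M`; the register values and addresses are made up for illustration):

```python
# Toy effective-address computation for the addressing modes listed above.
GPR = {"rbase": 100, "rindex": 8}       # register file
M = {100: 200}                          # memory, used by memory-indirect

def effective_address(mode, imm=0):
    if mode == "absolute":              # the immediate *is* the address
        return imm
    if mode == "register_indirect":     # address held in a register
        return GPR["rbase"]
    if mode == "displaced":             # offset + base register
        return imm + GPR["rbase"]
    if mode == "indexed":               # base register + index register
        return GPR["rbase"] + GPR["rindex"]
    if mode == "memory_indirect":       # memory holds a pointer to the address
        return M[GPR["rbase"]]
    raise ValueError(mode)

assert effective_address("absolute", 10000) == 10000
assert effective_address("register_indirect") == 100
assert effective_address("displaced", 16) == 116
assert effective_address("indexed") == 108
assert effective_address("memory_indirect") == 200
```

(Auto increment/decrement is omitted: it computes the same address as register indirect but additionally updates GPR[rbase] as a side effect.)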
What Are the Benefits of Different
Addressing Modes?
• Orthogonal ISA:
• All addressing modes can be used with all instruction types
• Example: VAX
• (~13 addressing modes) x (>300 opcodes) x (integer and FP formats)
Is the LC-3b ISA Orthogonal?
LC-3b: Addressing Modes of ADD
LC-3b: Addressing Modes of JSR(R)
What Are the Elements of An ISA?
• How to interface with I/O devices
• Memory mapped I/O
• A region of memory is mapped to I/O devices
• I/O operations are loads and stores to those locations
• Special I/O instructions (e.g., IN and OUT in x86)
• Tradeoffs?
• Which one is more general purpose?
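Memory-mapped I/O can be sketched as a load/store path that routes a reserved address range to device registers instead of RAM (a toy model; the address range and device register are invented for illustration):

```python
# Toy memory-mapped I/O: addresses >= 0xFF00 are routed to a device
# register file instead of RAM, so ordinary loads and stores double
# as I/O operations -- no special I/O instructions are needed.
RAM = [0] * 0xFF00
DEVICE = {0xFF00: 0}                    # one hypothetical device register

def store(addr, value):
    if addr >= 0xFF00:                  # reserved I/O region
        DEVICE[addr] = value            # "write to the device"
    else:
        RAM[addr] = value

def load(addr):
    if addr >= 0xFF00:
        return DEVICE[addr]             # "read from the device"
    return RAM[addr]

store(0x1000, 42)                       # ordinary memory store
store(0xFF00, 7)                        # same operation, but it is I/O
assert load(0x1000) == 42
assert load(0xFF00) == 7
```

This is why memory-mapped I/O is the more general-purpose choice: any instruction that can access memory can also perform I/O.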
What Are the Elements of An ISA?
• Privilege modes
• User vs supervisor
• Who can execute what instructions?
• Virtual memory
• Each program has the illusion of the entire memory space, which is greater than
physical memory
Another Question or Two
Complex vs. Simple Instructions
• Complex instruction: An instruction does a lot of work, e.g. many
operations
• Insert in a doubly linked list
• Compute FFT
• String copy
ISA-level Tradeoffs: Semantic Gap
• Where to place the ISA? Semantic gap
• Closer to high-level language (HLL)? Small semantic gap, complex instructions
• Closer to hardware control signals? Large semantic gap, simple instructions
• Enabled by the ability to specify repeated execution of an instruction (in the ISA)
• Using a “prefix” called the REP prefix
• RISC motivated by
• Memory stalls (no work done in a complex instruction when there is a memory stall?)
• When is this correct?
• Simplifying the hardware: lower cost, higher frequency
• Enabling the compiler to optimize the code better
• Find fine-grained parallelism to reduce stalls
An Aside
• An Historical Perspective on RISC Development at IBM
• http://www-03.ibm.com/ibm/history/ibm100/us/en/icons/risc/
How High or Low Can You Go?
• Examples:
• Limited on-chip and off-chip memory size
• Limited compiler optimization technology
• Limited memory bandwidth
• Need for specialization in important applications (e.g., MMX)
• One can translate from one ISA to another ISA to change the semantic
gap tradeoffs
• ISA (virtual ISA) → Implementation ISA
• Examples
• Intel’s and AMD’s x86 implementations translate x86 instructions into
programmer-invisible microoperations (simple instructions) in hardware
• Transmeta’s x86 implementations translated x86 instructions into “secret” VLIW
instructions in software (code morphing software)
Klaiber, “The Technology Behind Crusoe Processors,” Transmeta White Paper 2000.
Software-Based Translation
Klaiber, “The Technology Behind Crusoe Processors,” Transmeta White Paper 2000.
ISA-level Tradeoffs: Instruction Length
• Tradeoffs
• Code size (memory space, bandwidth, latency) vs. hardware complexity
• ISA extensibility and expressiveness vs. hardware complexity
• Performance? Smaller code vs. ease of decode
ISA-level Tradeoffs: Uniform Decode
• Uniform decode: Same bits in each instruction correspond to the same
meaning
• Opcode is always in the same location
• Ditto operand specifiers, immediate values, …
• Many “RISC” ISAs: Alpha, MIPS, SPARC
+ Easier decode, simpler hardware
+ Enables parallelism: generate target address before knowing the instruction is a branch
-- Restricts instruction format (fewer instructions?) or wastes space
• Non-uniform decode
• E.g., opcode can be the 1st-7th byte in x86
+ More compact and powerful instruction format
-- More complex decode logic
x86 vs. Alpha Instruction Formats
(figure: x86 and Alpha instruction-format diagrams)
MIPS Instruction Format
• R-type, 3 register operands
opcode (6-bit) | rs (5-bit) | rt (5-bit) | rd (5-bit) | shamt (5-bit) | funct (6-bit)   — R-type, opcode = 0
• Simple Decoding
• 4 bytes per instruction, regardless of format
• must be 4-byte aligned (2 LSBs of PC must be 2'b00)
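The fixed-field R-type layout above can be packed with shifts and ORs. A small encoder sketch (`encode_rtype` is a made-up helper name; the register numbers and funct code follow standard MIPS conventions):

```python
# Encode a MIPS R-type instruction:
#   opcode(6) | rs(5) | rt(5) | rd(5) | shamt(5) | funct(6)
# R-type instructions use opcode 0; the funct field selects the operation.
def encode_rtype(rs, rt, rd, shamt=0, funct=0):
    return (0 << 26) | (rs << 21) | (rt << 16) | (rd << 11) \
           | (shamt << 6) | funct

# add $t0, $t1, $t2  ->  rd=$t0(8), rs=$t1(9), rt=$t2(10), funct=0x20
word = encode_rtype(rs=9, rt=10, rd=8, funct=0x20)
# word == 0x012A4020, the canonical 4-byte encoding of this add
```

Because every field sits at a fixed bit position, a decoder can extract all register specifiers in parallel before it even knows which operation the funct field selects — the uniform-decode benefit noted earlier.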
ARM
A Note on Length and Uniformity
A Note on RISC vs. CISC
• Usually, …
• RISC
• Simple instructions
• Fixed length
• Uniform decode
• Few addressing modes
• CISC
• Complex instructions
• Variable length
• Non-uniform decode
• Many addressing modes
ISA-level Tradeoffs: Number of Registers
• Affects:
• Number of bits used for encoding register address
• Number of values kept in fast storage (register file)
• (uarch) Size, access time, power consumption of register file
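The encoding cost is direct: with n architectural registers, each register specifier needs ceil(log2(n)) bits, and a 3-operand instruction pays that cost three times. A quick sketch:

```python
import math

# Bits needed per register specifier for a given register-file size.
def specifier_bits(num_regs):
    return math.ceil(math.log2(num_regs))

assert specifier_bits(8) == 3    # e.g., an LC-3b-sized register file
assert specifier_bits(32) == 5   # e.g., MIPS: 3 operands cost 3*5 = 15 bits
```

Doubling the register count adds one bit per specifier in every instruction, which is why register-file size trades off directly against instruction length and opcode space.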
ISA-level Tradeoffs: Addressing Modes
• Addressing mode specifies how to obtain an operand of an instruction
• Register
• Immediate
• Memory (displacement, register indirect, indexed, absolute, memory indirect, autoincrement, autodecrement, …)
• More modes:
+ help better support programming constructs (arrays, pointer-based accesses)
-- make it harder for the architect to design
-- too many choices for the compiler?
• Many ways to do the same thing complicates compiler design
• Wulf, “Compilers and Computer Architecture,” IEEE Computer 1981
x86 vs. Alpha Instruction Formats
(figure: x86 and Alpha instruction-format diagrams)
x86
(figure: x86 addressing-mode examples — register, register indirect, memory absolute, register + displacement, SIB + displacement)

x86
(figure: more x86 addressing-mode examples — indexed (base + index) and scaled (base + index*4))
X86 SIB-D Addressing Mode
(figure: SIB-D uses — static addresses, dynamic storage, arrays, records, 2D arrays)
• Virtual memory
• vs. overlay programming
• Should the programmer be concerned about the size of code blocks fitting physical memory?
• Addressing modes
• Unaligned memory access
• Compiler/programmer needs to align data
MIPS: Aligned Access
(word layout: MSB | byte-3 | byte-2 | byte-1 | byte-0 | LSB)
• LWL/LWR is slower
• Note LWL and LWR still fetch within word boundary
X86: Unaligned Access
• LD/ST instructions automatically align data that spans a “word” boundary
• Programmer/compiler does not need to worry about where data is stored (whether or not in a word-aligned location)
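The hardware's job for an access that spans a word boundary can be sketched as two aligned fetches plus a byte merge (a toy little-endian model, not x86's actual implementation; the memory contents are made up):

```python
# Sketch: an unaligned 32-bit load built from two aligned word loads,
# as hardware (or an LWL/LWR-style instruction pair) must do when the
# access crosses a 4-byte word boundary. Little-endian byte order.
MEM = bytes(range(16))                  # bytes 0x00, 0x01, ..., 0x0F

def aligned_word(addr):
    base = addr & ~3                    # force 4-byte alignment
    return MEM[base:base + 4]

def load32(addr):
    if addr % 4 == 0:                   # aligned: one memory access
        word = aligned_word(addr)
    else:                               # unaligned: two accesses + merge
        lo = aligned_word(addr)
        hi = aligned_word(addr + 4)
        off = addr % 4
        word = (lo + hi)[off:off + 4]
    return int.from_bytes(word, "little")

assert load32(4) == 0x07060504          # aligned: a single fetch
assert load32(6) == 0x09080706          # spans words 4-7 and 8-11
```

The cost is visible in the sketch: the unaligned case performs two memory accesses where the aligned case needs one, which is why even ISAs that permit unaligned access execute it more slowly.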
What About ARM?
• https://www.scss.tcd.ie/~waldroj/3d1/arm_arm.pdf
• Section A2.8
Historical Performance Gains
• By 1985, it was possible to integrate a complete microprocessor onto
a single die or “chip.”
• As fabrication technology improved, and transistors got smaller, the
performance of a single core improved quickly.
• Performance improved at the rate of 52% per year for nearly 20 years
(measured using SPEC benchmark data).
• Note: the data are for desktop/server processors
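Compounding the 52% annual rate cited above over roughly 20 years shows just how large the cumulative gain was (a back-of-the-envelope check, not a measured figure):

```python
# Cumulative single-core speedup at 52% per year sustained for ~20
# years (the historical rate from SPEC benchmark data cited above).
rate = 1.52
years = 20
speedup = rate ** years
# roughly a ~4,300x cumulative improvement over two decades
assert 4000 < speedup < 5000
```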
Historical Performance Gains
• Clock frequency improved (the clock period shrank)
Clocks Per Instruction (CPI)
• Eventually, the industry was also able to fetch and execute multiple
instructions per clock cycle. This reduced CPI to below 1.
• When we fetch and execute multiple instructions together, we often
refer to Instructions Per Cycle (IPC), which is 1/CPI.
• For instructions to be executed at the same time, they must be
independent.
• Again, growing transistor budgets were exploited to help find and exploit
this Instruction-Level Parallelism (ILP).
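The CPI/IPC relationship is just a ratio of instruction and cycle counts; a quick sketch with made-up numbers for a core that sustains more than one instruction per cycle:

```python
# IPC = instructions / cycles; CPI = 1 / IPC. A superscalar core that
# completes more than one instruction per cycle has CPI < 1.
# (All numbers below are invented for illustration.)
instructions = 8_000_000
cycles = 5_000_000

ipc = instructions / cycles             # 1.6 instructions per cycle
cpi = 1 / ipc                           # 0.625 cycles per instruction

# Execution time = instructions * CPI * clock period
clock_hz = 2_000_000_000                # hypothetical 2 GHz core
seconds = instructions * cpi / clock_hz
```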
What Is Pipelining?
(figure: car-assembly analogy — cars A, B, and C flow through chassis, engine, and paint stages, with the stages overlapped in time)

• The remaining gains (~8x) were from a reduction in instruction count, better compiler optimizations, and improvements in IPC.
• The accompanying graph plots performance (SpecInt2000 benchmark performance per MHz for Intel processors) against time, 1984-2004.
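The assembly-line analogy can be quantified: with S overlapped stages, steady-state throughput approaches one item per stage time instead of one per full latency. A small idealized sketch (equal stage times, no stalls; the numbers are made up):

```python
# Time to finish N items with and without pipelining, assuming S stages
# of equal duration stage_time each (an idealized, stall-free model).
def unpipelined_time(n, stages, stage_time):
    return n * stages * stage_time      # each item runs start to finish

def pipelined_time(n, stages, stage_time):
    # first item takes the full latency; each later item completes one
    # stage_time after its predecessor
    return (stages + (n - 1)) * stage_time

assert unpipelined_time(3, 3, 1) == 9   # cars A, B, C, one at a time
assert pipelined_time(3, 3, 1) == 5     # overlapped: 3 + 2 = 5
```

As N grows, the pipelined time per item approaches one stage time: pipelining improves throughput, not the latency of any single item.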
A Shorter Critical Path