03.EECE345 Computer Architecture ISA Design 02
Single-Cycle
Vinod PANGRACIOUS
School of Engineering
American University in Dubai
Agenda for Today & Next Few Lectures
Start Microarchitecture
Single-cycle Microarchitectures
Multi-cycle Microarchitectures
Microprogrammed Microarchitectures
Pipelining
Recap of Two Weeks and Last Lecture
Computer Architecture Today and Basics (Lectures 1 & 2)
Fundamental Concepts (Lecture 3)
ISA basics and tradeoffs (Lectures 3 & 4)
Food for Thought for You
How would you design a new ISA?
RISC
Simple instructions
Fixed length
Uniform decode
Few addressing modes
CISC
Complex instructions
Variable length
Non-uniform decode
Many addressing modes
Now That We Have an ISA
How do we implement it?
Implementing the ISA: Microarchitecture Basics
How Does a Machine Process Instructions?
What does processing an instruction mean?
Remember the von Neumann model
AS → Process instruction → AS’
[Figure: combinational logic computes AS’ from AS; sequential logic (state) holds AS]
Remember: Programmer Visible (Architectural) State
Memory
array of storage locations (M[0], M[1], ..., M[N-1]) indexed by an address
Registers
- given special names in the ISA (as opposed to addresses)
- general vs. special purpose
Program Counter
memory address of the current instruction
Multi-cycle machines
Instruction processing broken into multiple cycles/stages
State updates can be made during an instruction’s execution
Architectural state updates made only at the end of an
instruction’s execution
Advantage over single-cycle: The slowest “stage” determines
cycle time
Fetch
Decode
Evaluate Address
Fetch Operands
Execute
Store Result
Not all instructions require all six stages (see P&P Ch.
Instruction Processing “Cycle” vs. Machine Clock Cycle
Single-cycle machine:
All six phases of the instruction processing cycle take a
single machine clock cycle to complete
Multi-cycle machine:
All six phases of the instruction processing cycle can
take multiple machine clock cycles to complete
In fact, each phase can take multiple clock cycles to
complete
Instruction Processing Viewed Another Way
Instructions transform Data (AS) to Data’ (AS’)
Multi-cycle machine:
Control signals needed in the next cycle can be
generated in the current cycle
Latency of control processing can be overlapped with
latency of datapath operation (more parallelism)
Hardwired/combinational vs. microcoded/microprogrammed control
Control signals generated by combinational logic
versus
Control signals stored in a memory structure
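The contrast between the two control styles can be sketched in Python. This is only an illustration: RegWrite and MemWrite are signal names from the datapath figures, while the instruction-class keys, the table contents, and both function names are assumptions made for this sketch. A real microprogrammed controller sequences through several microinstructions; the table here stands in for a single lookup.

```python
# Hardwired control: signals are a pure (combinational) function of the opcode.
def hardwired_control(opcode: str) -> dict:
    return {
        "RegWrite": opcode in ("ALU", "ALUi", "LW"),
        "MemWrite": opcode == "SW",
    }

# Microprogrammed control: the same signals are read out of a stored table,
# a stand-in for the control store memory (contents are illustrative).
CONTROL_STORE = {
    "ALU":  {"RegWrite": True,  "MemWrite": False},
    "ALUi": {"RegWrite": True,  "MemWrite": False},
    "LW":   {"RegWrite": True,  "MemWrite": False},
    "SW":   {"RegWrite": False, "MemWrite": True},
}

def microcoded_control(opcode: str) -> dict:
    return CONTROL_STORE[opcode]
```

Both functions produce the same signals for the same instruction class; the difference is whether the values come from logic or from a memory lookup.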
Remember…
Single-cycle machine
[Figure: single-cycle machine. Combinational logic computes AS’ from AS; sequential logic (state) holds AS]
Let’s Start with the State Elements
[Figure: state elements and basic building blocks
a. Instruction memory, b. Program counter, c. Adder
a. Registers (two 5-bit read register numbers, one 5-bit write register number, write data, RegWrite; outputs read data 1 and read data 2), b. ALU (3-bit ALU control; outputs Zero and ALU result)
Data memory (address, write data, read data, MemWrite) and sign-extension unit (16 to 32 bits)]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
For Now, We Will Assume
“Magic” memory and register file
Combinational read
output of the read data port is a combinational
function of the register file contents and the
corresponding read select port
Synchronous write
the selected register is updated on the positive edge
clock transition when write enable is asserted
Cannot affect read output in between clock edges
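A toy Python model of these assumptions may help. It is a sketch only: the class name, the method names, and the default of 32 registers are choices made for illustration, not part of the lecture.

```python
class RegisterFile:
    """Toy 'magic' register file: combinational read, synchronous write."""

    def __init__(self, num_regs: int = 32):
        self.regs = [0] * num_regs
        self._pending = None  # (index, value) waiting for the next clock edge

    def read(self, index: int) -> int:
        # Combinational read: depends only on current contents and the select.
        return self.regs[index]

    def set_write(self, index: int, value: int, write_enable: bool) -> None:
        # Write inputs may change between edges without affecting read().
        self._pending = (index, value) if write_enable else None

    def clock_edge(self) -> None:
        # Synchronous write: state changes only on the positive clock edge.
        if self._pending is not None:
            index, value = self._pending
            self.regs[index] = value
            self._pending = None


rf = RegisterFile()
rf.set_write(3, 42, write_enable=True)
before = rf.read(3)   # old value: the clock edge has not happened yet
rf.clock_edge()
after = rf.read(3)    # new value is visible after the edge
```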
[Figure: instruction processing steps mapped onto the datapath: IF (PC, instruction memory), ID/RF (registers), EX/AG (ALU), MEM (data memory), WB (data written back to registers)]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
What Is To Come: The Full MIPS Datapath
[Figure: full MIPS datapath, including the jump path (PCSrc1=Jump; Instruction [25–0] shifted left 2, 26 to 28 bits, forming Jump address [31–0]) and ALU control driven by Instruction [5–0]]
Single-Cycle Datapath for Arithmetic and Logical Instructions
R-Type ALU Instructions
Assembly (e.g., register-register signed addition)
ADD rd_reg rs_reg rt_reg
Machine encoding
0      rs     rt     rd     0      ADD     (R-type)
6-bit  5-bit  5-bit  5-bit  5-bit  6-bit
Semantics
if MEM[PC] == ADD rd rs rt
GPR[rd] ← GPR[rs] + GPR[rt]
PC ← PC + 4
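The ADD semantics can be sketched in Python as a function on the architectural state. The field positions (bits 25:21, 20:16, 15:11) are taken from the ALU datapath figure; the function name, the list used for the GPRs, and the 32-bit wraparound are assumptions of this sketch, and overflow handling is ignored.

```python
def execute_add(gpr: list, pc: int, inst: int) -> int:
    """Apply ADD to the register state in place; return the next PC."""
    rs = (inst >> 21) & 0x1F   # bits 25:21
    rt = (inst >> 16) & 0x1F   # bits 20:16
    rd = (inst >> 11) & 0x1F   # bits 15:11
    gpr[rd] = (gpr[rs] + gpr[rt]) & 0xFFFFFFFF  # keep the result to 32 bits
    return pc + 4


gpr = [0] * 32
gpr[1], gpr[2] = 5, 7
# rs=1, rt=2, rd=3; opcode and funct bits are left as zero in this sketch
inst = (1 << 21) | (2 << 16) | (3 << 11)
next_pc = execute_add(gpr, 0x100, inst)
```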
ALU Datapath
[Figure: datapath for R-type ALU instructions. The PC addresses instruction memory and an adder computes the next PC; instruction bits 25:21 and 20:16 select read registers 1 and 2, bits 15:11 select the write register; the ALU result is the register write data, with RegWrite = 1. Steps: IF ID EX MEM WB]
if MEM[PC] == ADD rd rs rt
GPR[rd] ← GPR[rs] + GPR[rt]
PC ← PC + 4
Combinational state update logic
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
I-Type ALU Instructions
Assembly (e.g., register-immediate signed additions)
ADDI rt_reg rs_reg immediate_16
Machine encoding
ADDI   rs     rt     immediate   (I-type)
6-bit  5-bit  5-bit  16-bit
Semantics
if MEM[PC] == ADDI rt rs immediate
GPR[rt] ← GPR[rs] + sign-extend (immediate)
PC ← PC + 4
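A short Python sketch of the ADDI semantics, with sign extension made explicit. The helper name sign_extend16, the function name, and the 32-bit wraparound are assumptions for illustration; field positions follow the I-type encoding above.

```python
def sign_extend16(imm: int) -> int:
    """Interpret the low 16 bits of imm as a signed two's-complement value."""
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_addi(gpr: list, pc: int, inst: int) -> int:
    """Apply ADDI to the register state in place; return the next PC."""
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    gpr[rt] = (gpr[rs] + sign_extend16(inst)) & 0xFFFFFFFF
    return pc + 4


gpr = [0] * 32
gpr[1] = 10
inst = (1 << 21) | (2 << 16) | 0xFFFF   # rs=1, rt=2, immediate=-1
next_pc = execute_addi(gpr, 0, inst)
```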
Datapath for R and I-Type ALU Insts.
[Figure: combined datapath for R-type and I-type ALU instructions]
Load Instructions
Assembly (e.g., load 4-byte word)
LW rt_reg offset_16 (base_reg)
Machine encoding
LW     base   rt     offset   (I-type)
6-bit  5-bit  5-bit  16-bit
Semantics
if MEM[PC] == LW rt offset_16 (base)
EA = sign-extend(offset) + GPR[base]
GPR[rt] ← MEM[ translate(EA) ]
PC ← PC + 4
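The load semantics can be sketched as follows. Several things here are assumptions for illustration only: memory is a Python dict keyed by address, translate is an identity mapping (no virtual memory), and the function names are made up for the sketch.

```python
def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def translate(ea: int) -> int:
    # Placeholder for address translation: identity mapping in this sketch.
    return ea


def execute_lw(gpr: list, mem: dict, pc: int, inst: int) -> int:
    """Apply LW to the state in place; return the next PC."""
    base = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    ea = (sign_extend16(inst) + gpr[base]) & 0xFFFFFFFF  # effective address
    gpr[rt] = mem[translate(ea)]
    return pc + 4


gpr = [0] * 32
gpr[4] = 0x1008
mem = {0x1004: 0xDEADBEEF}
inst = (4 << 21) | (5 << 16) | 0xFFFC   # base=4, rt=5, offset=-4
next_pc = execute_lw(gpr, mem, 0, inst)
```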
LW Datapath
[Figure: datapath for LW. The ALU adds read data 1 and the sign-extended 16-bit offset to form the data memory address; read data from memory is written to the register file. Controls: ALU operation = add, RegDest = isItype, ALUSrc = isItype, RegWrite = 1, MemRead = 1, MemWrite = 0]
Store Instructions
Machine encoding
Semantics
if MEM[PC] == SW rt offset_16 (base)
EA = sign-extend(offset) + GPR[base]
MEM[ translate(EA) ] ← GPR[rt]
PC ← PC + 4
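A matching Python sketch for the store semantics. As with any such sketch, the dict-based memory, the identity address translation, and the function names are assumptions; only the EA computation and the direction of the data movement come from the semantics above.

```python
def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_sw(gpr: list, mem: dict, pc: int, inst: int) -> int:
    """Apply SW to the state in place; return the next PC."""
    base = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    ea = (sign_extend16(inst) + gpr[base]) & 0xFFFFFFFF
    mem[ea] = gpr[rt]   # translate(EA) is taken to be the identity here
    return pc + 4


gpr = [0] * 32
gpr[4], gpr[5] = 0x2000, 99
mem = {}
inst = (4 << 21) | (5 << 16) | 8   # base=4, rt=5, offset=8
next_pc = execute_sw(gpr, mem, 0x40, inst)
```

Unlike the load, the store changes memory and leaves every register untouched.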
SW Datapath
[Figure: datapath for SW. The ALU adds read data 1 and the sign-extended 16-bit offset to form the data memory address; read data 2 is the memory write data. Controls: ALU operation = add, RegDest = isItype, ALUSrc = isItype, RegWrite = 0, MemRead = 0, MemWrite = 1]
[Figure: combined load/store datapath. Controls: ALU operation = add, MemWrite = isStore, RegWrite = !isStore, RegDest = isItype, ALUSrc = isItype, MemRead = isLoad, MemtoReg = isLoad]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Single-Cycle Datapath for Control Flow Instructions
Unconditional Jump Instructions
Assembly
J immediate_26
Machine encoding
J      immediate   (J-type)
6-bit  26-bit
Semantics
if MEM[PC] == J immediate_26
target = { PC[31:28], immediate_26, 2’b00 }
PC ← target
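The concatenation in the jump target can be written out with masks and shifts. This sketch follows the text literally and takes the upper four bits from PC; the function name is an assumption, and the example opcode value (2 for J) is the standard MIPS encoding rather than something stated in this section.

```python
def jump_target(pc: int, inst: int) -> int:
    """Compute { PC[31:28], immediate_26, 2'b00 } as a 32-bit address."""
    imm26 = inst & 0x03FFFFFF          # low 26 bits of the instruction
    return (pc & 0xF0000000) | (imm26 << 2)


inst = (2 << 26) | 0x40                # opcode field = 2, immediate = 0x40
target = jump_target(0x40001000, inst)
```

Because the immediate is shifted left by 2, the target is always word-aligned, and the jump can only reach addresses that share the upper four bits.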
Unconditional Jump Datapath
[Figure: datapath for J. PCSrc = isJ selects the concatenated (concat) jump target; RegWrite = 0 and MemWrite = 0; ALU operation and ALUSrc are don't-cares (X)]
if MEM[PC] == J immediate_26
PC = { PC[31:28], immediate_26, 2’b00 }
What about JR, JAL, JALR?
Conditional Branch Instructions
Assembly (e.g., branch if equal)
BEQ rs_reg rt_reg immediate_16
Machine encoding
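The BEQ semantics are not spelled out in this section. As a sketch, assuming the standard MIPS definition (branch target = PC + 4 plus the sign-extended immediate shifted left by 2, taken when the two registers are equal) and ignoring branch delay slots, the behavior could be modeled as below. The function names are illustrative; bcond is the name used in the datapath figures.

```python
def sign_extend16(imm: int) -> int:
    imm &= 0xFFFF
    return imm - 0x10000 if imm & 0x8000 else imm


def execute_beq(gpr: list, pc: int, inst: int) -> int:
    """Return the next PC for BEQ (assumed standard MIPS semantics)."""
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    target = (pc + 4 + (sign_extend16(inst) << 2)) & 0xFFFFFFFF
    bcond = gpr[rs] == gpr[rt]         # branch condition from the ALU
    return target if bcond else pc + 4


gpr = [0] * 32
gpr[1] = gpr[2] = 7
taken = execute_beq(gpr, 0x100, (1 << 21) | (2 << 16) | 3)
gpr[2] = 8
not_taken = execute_beq(gpr, 0x100, (1 << 21) | (2 << 16) | 3)
```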
Conditional Branch Datapath (for you to finish)
[Figure: partial datapath for conditional branches. PC + 4 from the instruction datapath is added to the sign-extended 16-bit immediate shifted left 2 to form the branch target; the ALU performs sub on the two register reads and sends bcond to the branch control logic; PCSrc also has the concat (jump) input; RegWrite = 0. Annotation: watch out]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Single-Cycle Hardwired Control
As combinational function of Inst=MEM[PC]
[Figure: instruction formats with field boundaries at bits 31, 26, 21, 16, 11, 6, 0]
Consider
All R-type and I-type ALU instructions
LW and SW
Single-Bit Control Signals
JAL and JALR require additional RegDest and MemtoReg options
Single-Bit Control Signals
JR and JALR require additional PCSrc options
ALU Control
case opcode
‘0’ → select operation according to funct
‘ALUi’ → select operation according to opcode
‘LW’ → select addition
‘SW’ → select addition
‘Bxx’ → select bcond generation function
__ → don’t care
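The case statement above maps directly onto a small Python function. This is a sketch under assumptions: opcodes are represented as strings, the ALUi and branch tables list only an illustrative subset of instructions, and the returned operation names are made up for the example.

```python
from typing import Optional

ALUI_OPS = {"ADDI": "add", "ANDI": "and", "ORI": "or"}  # illustrative subset
BRANCHES = {"BEQ", "BNE"}                               # illustrative subset


def alu_control(opcode: str, funct: Optional[str] = None) -> Optional[str]:
    """Select the ALU operation from the opcode (and funct for R-type)."""
    if opcode == "0":             # R-type: operation comes from funct
        return funct
    if opcode in ALUI_OPS:        # I-type ALU: operation comes from opcode
        return ALUI_OPS[opcode]
    if opcode in ("LW", "SW"):    # address calculation
        return "add"
    if opcode in BRANCHES:        # branch condition generation
        return "bcond"
    return "X"                    # don't care
```

For example, alu_control("0", "sub") yields "sub", while alu_control("LW") and alu_control("SW") both yield "add" because loads and stores use the ALU only to form the effective address.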
R-Type ALU
[Figure: full datapath with control settings for an R-type ALU instruction; ALU control selects the operation according to funct]
[Figure: full datapath with control settings for an I-type ALU instruction; ALU control selects the operation according to opcode]
[Figure: full datapath with control settings for a load; ALU control selects Add]
[Figure: full datapath with control settings for a store; ALU control selects Add; some multiplexer controls are don't-cares (X)]
[Figures (two slides): full datapath with control settings for a conditional branch; ALU control selects the bcond generation function; some multiplexer controls are don't-cares (X)]
[Figure: full datapath with control settings for a jump; PCSrc1=Jump selects the jump address; RegWrite = 0 and the remaining datapath controls are don't-cares (X)]
Evaluating the Single-Cycle Microarchitecture
A Single-Cycle Microarchitecture
Is this a good idea/design?
A Single-Cycle Microarchitecture: Analysis
Every instruction takes 1 cycle to execute
CPI (Cycles per instruction) is strictly 1
What is the Slowest Instruction to Process?
Let’s go back to the basics
Single-Cycle Datapath Analysis
Assume
memory units (read or write): 200 ps
steps      IF   ID   EX   MEM  WB   Delay
resources  mem  RF   ALU  mem  RF
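A rough Python sketch of this analysis. Only the 200 ps memory delay is stated above; the 100 ps ALU delay and the 50 ps register file delay are assumptions, chosen because they are consistent with the 400, 600, 550, 350 and 200 ps labels in the timing figures. The assignment of resources to each instruction class is likewise an assumption based on the IF/ID/EX/MEM/WB steps.

```python
# Delay of each resource in picoseconds (RF and ALU values are assumed).
DELAY_PS = {"mem": 200, "RF": 50, "ALU": 100}

# Resources used along the critical path of each instruction class (assumed).
RESOURCES = {
    "R-type": ["mem", "RF", "ALU", "RF"],
    "I-type": ["mem", "RF", "ALU", "RF"],
    "LW":     ["mem", "RF", "ALU", "mem", "RF"],
    "SW":     ["mem", "RF", "ALU", "mem"],
    "Branch": ["mem", "RF", "ALU"],
    "Jump":   ["mem"],
}

latency = {name: sum(DELAY_PS[r] for r in path)
           for name, path in RESOURCES.items()}

# The single-cycle clock period must fit the slowest instruction.
cycle_time = max(latency.values())
slowest = max(latency, key=latency.get)
```

With CPI fixed at 1, execution time is the instruction count times cycle_time, so under these assumptions every instruction pays the latency of the slowest one.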
[Figures: critical-path delay annotations on the full datapath, one slide per instruction class (slide titles lost in extraction). The largest delay labelled on successive slides is 400 ps, 600 ps, 550 ps, 350 ps, and 200 ps]
What is the Slowest Instruction to Process?
Memory is not magic
Single Cycle uArch: Complexity
Contrived
All instructions run as slow as the slowest instruction
Inefficient
All instructions run as slow as the slowest instruction
Must provide worst-case combinational resources in parallel as
required by any instruction
Need to replicate a resource if it is needed more than once by
an instruction during different parts of the instruction
processing cycle
Balanced design
Balance instruction/data flow through hardware
components
Design to eliminate bottlenecks: balance the hardware for the work
Single-Cycle Design vs. Design Principles
Critical path design
Balanced design
Aside: System Design Principles
When designing computer systems/architectures, it
is important to follow good principles
Aside: From Lecture 1
“architecture […] based upon principle, and not
upon precedent”
Aside: System Design Principles
We will continue to cover key principles in this
course
Here are some references where you can learn more
Aside: One Important Principle
Keep it simple