03.EECE345 Computer Architecture ISA Design 02

The document introduces microarchitecture concepts, focusing on single-cycle microarchitectures and comparing them with multi-cycle architectures. It outlines the instruction processing cycle, emphasizing the phases involved and the differences in control and data handling between single-cycle and multi-cycle machines. Additionally, it discusses the implications of design choices in ISA and the performance analysis of different microarchitecture types.


Intro to Microarchitecture: Single-Cycle

Vinod PANGRACIOUS
School of Engineering
American University in Dubai
Agenda for Today & Next Few Lectures
 Start Microarchitecture
 Single-cycle Microarchitectures
 Multi-cycle Microarchitectures
 Microprogrammed Microarchitectures
 Pipelining
 Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …

Recap of Two Weeks and Last Lecture
 Computer Architecture Today and Basics (Lectures 1 & 2)
 Fundamental Concepts (Lecture 3)
 ISA basics and tradeoffs (Lectures 3 & 4)
 Last Lecture: ISA tradeoffs continued + MIPS ISA
   Instruction length
   Uniform vs. non-uniform decode
   Number of registers
   Addressing modes
   Aligned vs. unaligned access
   RISC vs. CISC properties
   MIPS ISA Overview

Food for Thought for You
 As you learn the MIPS ISA, think about what tradeoffs the designers have made
   in terms of the ISA properties we talked about
 And think about the pros and cons of design choices
   In comparison to ARM, Alpha
   In comparison to x86, VAX
 And think about the potential mistakes
   Branch delay slot?
   Load delay slot?
   No FP, no multiply, MIPS (initial)

Look Backward

Food for Thought for You
 How would you design a new ISA?
 Where would you place it?
 What design choices would you make in terms of ISA properties?
 What would be the first question you ask in this process?
   “What is my design point?”

Look Forward & Up

Review: Other Example ISA-level Tradeoffs
 Condition codes vs. not
 VLIW vs. single instruction
 SIMD (single instruction multiple data) vs. SISD
 Precise vs. imprecise exceptions
 Virtual memory vs. not
 Unaligned access vs. not
 Hardware interlocks vs. software-guaranteed interlocking
 Cache coherence (hardware vs. software)

Think Programmer vs. (Micro)architect

Review: A Note on RISC vs. CISC
 Usually, …
 RISC
   Simple instructions
   Fixed length
   Uniform decode
   Few addressing modes
 CISC
   Complex instructions
   Variable length
   Non-uniform decode
   Many addressing modes

Now That We Have an ISA
 How do we implement it?
   i.e., how do we design a system that obeys the hardware/software interface?
 Aside: “System” can be solely hardware or a combination of hardware and software
   Remember “Translation of ISAs”
   A virtual ISA can be converted by “software” into an implementation ISA
 We will assume “hardware” for most lectures

Implementing the ISA: Microarchitecture Basics

How Does a Machine Process Instructions?
 What does processing an instruction mean?
 Remember the von Neumann model

AS = Architectural (programmer visible) state before an instruction is processed
    ↓ Process instruction
AS’ = Architectural (programmer visible) state after an instruction is processed

 Processing an instruction: Transforming AS to AS’ according to the ISA specification of the instruction

The “Process instruction” Step
 ISA specifies abstractly what AS’ should be, given an instruction and AS
   It defines an abstract finite state machine where
     State = programmer-visible state
     Next-state logic = instruction execution specification
   From ISA point of view, there are no “intermediate states” between AS and AS’ during instruction execution
     One state transition per instruction
 Microarchitecture implements how AS is transformed to AS’
   There are many choices in implementation
   We can have programmer-invisible state to optimize the speed of instruction execution: multiple state transitions per instruction
     Choice 1: AS → AS’ (transform AS to AS’ in a single clock cycle)
     Choice 2: AS → AS+MS1 → AS+MS2 → AS+MS3 → AS’ (take multiple clock cycles to transform AS to AS’)

A Very Basic Instruction Processing Engine
 Each instruction takes a single clock cycle to execute
 Only combinational logic is used to implement instruction execution
   No intermediate, programmer-invisible state updates

AS = Architectural (programmer visible) state at the beginning of a clock cycle
    ↓ Process instruction in one clock cycle
AS’ = Architectural (programmer visible) state at the end of a clock cycle

A Very Basic Instruction Processing Engine
 Single-cycle machine

[Figure: Combinational Logic reads AS from Sequential Logic (State) and produces AS’, which is written back into the state.]

 What is the clock cycle time determined by?
 What is the critical path of the combinational logic determined by?

Remember: Programmer Visible (Architectural) State
 Memory: array of storage locations, M[0] … M[N-1], indexed by an address
 Registers
   given special names in the ISA (as opposed to addresses)
   general vs. special purpose
 Program Counter: memory address of the current instruction

 Instructions (and programs) specify how to transform the values of programmer visible state

Single-cycle vs. Multi-cycle Machines
 Single-cycle machines
   Each instruction takes a single clock cycle
   All state updates made at the end of an instruction’s execution
   Big disadvantage: The slowest instruction determines cycle time → long clock cycle time
 Multi-cycle machines
   Instruction processing broken into multiple cycles/stages
   State updates can be made during an instruction’s execution
   Architectural state updates made only at the end of an instruction’s execution
   Advantage over single-cycle: The slowest “stage” determines cycle time
 Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitecture level

Instruction Processing “Cycle”
 Instructions are processed under the direction of a “control unit” step by step.
 Instruction cycle: Sequence of steps to process an instruction
 Fundamentally, there are six phases:
   Fetch
   Decode
   Evaluate Address
   Fetch Operands
   Execute
   Store Result
 Not all instructions require all six stages (see P&P Ch. …)

Instruction Processing “Cycle” vs. Machine Clock Cycle
 Single-cycle machine:
   All six phases of the instruction processing cycle take a single machine clock cycle to complete
 Multi-cycle machine:
   All six phases of the instruction processing cycle can take multiple machine clock cycles to complete
   In fact, each phase can take multiple clock cycles to complete

Instruction Processing Viewed Another Way
 Instructions transform Data (AS) to Data’ (AS’)
 This transformation is done by functional units
   Units that “operate” on data
 These units need to be told what to do to the data
 An instruction processing engine consists of two components
   Datapath: Consists of hardware elements that deal with and transform data signals
     functional units that operate on data
     hardware structures (e.g. wires and muxes) that enable the flow of data into the functional units and registers
     storage units that store data (e.g., registers)
   Control logic: Consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data

Single-cycle vs. Multi-cycle: Control & Data
 Single-cycle machine:
   Control signals are generated in the same clock cycle as the one during which data signals are operated on
   Everything related to an instruction happens in one clock cycle (serialized processing)
 Multi-cycle machine:
   Control signals needed in the next cycle can be generated in the current cycle
   Latency of control processing can be overlapped with latency of datapath operation (more parallelism)
 We will see the difference clearly in microprogrammed multi-cycle microarchitectures

Many Ways of Datapath and Control Design
 There are many ways of designing the data path and control logic
 Single-cycle, multi-cycle, pipelined datapath and control
 Single-bus vs. multi-bus datapaths
 Hardwired/combinational vs. microcoded/microprogrammed control
   Control signals generated by combinational logic versus
   Control signals stored in a memory structure
 Control signals and structure depend on the datapath design

Flash-Forward: Performance Analysis
 Execution time of an instruction
   {CPI} x {clock cycle time}
 Execution time of a program
   Sum over all instructions [{CPI} x {clock cycle time}]
   {# of instructions} x {Average CPI} x {clock cycle time}
 Single cycle microarchitecture performance
   CPI = 1
   Clock cycle time = long
 Multi-cycle microarchitecture performance
   CPI = different for each instruction
     Average CPI → hopefully small
   Clock cycle time = short
   Now, we have two degrees of freedom to optimize independently

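As a rough illustration of the program execution time formula, the sketch below sums {CPI} x {clock cycle time} over an instruction mix for a single-cycle and a multi-cycle machine. The instruction mix, the per-class CPI values, and both cycle times are hypothetical numbers chosen for the example; none of them come from the lecture.

```python
# Execution time = sum over all instructions of (CPI x clock cycle time).
# Every number below is hypothetical and only illustrates the formula.

def execution_time_ps(instruction_counts, cpi_by_class, cycle_time_ps):
    """Total program time in ps, given per-class instruction counts and CPIs."""
    total_cycles = sum(
        count * cpi_by_class[cls] for cls, count in instruction_counts.items()
    )
    return total_cycles * cycle_time_ps


# Hypothetical mix for a 1000-instruction program.
mix = {"alu": 500, "load": 250, "store": 150, "branch": 100}

# Single-cycle machine: CPI = 1 for every class, but a long clock cycle.
single_cpi = {cls: 1 for cls in mix}
single_time = execution_time_ps(mix, single_cpi, cycle_time_ps=600)

# Multi-cycle machine: CPI differs per class, but the clock cycle is short.
multi_cpi = {"alu": 4, "load": 5, "store": 4, "branch": 3}
multi_time = execution_time_ps(mix, multi_cpi, cycle_time_ps=200)

print(single_time, multi_time)
```

With these made-up numbers the multi-cycle machine is actually slower, which is why the average CPI has to stay small for the shorter clock cycle to pay off.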
A Single-Cycle Microarchitecture: A Closer Look

Remember…
 Single-cycle machine

[Figure: Combinational Logic reads AS from Sequential Logic (State) and produces AS’, which is written back into the state.]

Let’s Start with the State Elements
[Figure: datapath building blocks with their data and control inputs: instruction memory, program counter (PC), adder, register file (two 5-bit read register numbers, a 5-bit write register number, write data, RegWrite, two read data outputs), ALU (3-bit ALU control, Zero and ALU result outputs), data memory unit (Address, Write data, Read data, MemRead, MemWrite), and sign-extension unit (16 → 32 bits).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

For Now, We Will Assume
 “Magic” memory and register file
 Combinational read
   output of the read data port is a combinational function of the register file contents and the corresponding read select port
 Synchronous write
   the selected register is updated on the positive edge clock transition when write enable is asserted
     Cannot affect read output in between clock edges
 Single-cycle, synchronous memory
   Contrast this with memory that tells when the data is ready
   i.e., Ready bit: indicating the read or write is done

Instruction Processing
 5 generic steps (P&H book)
 Instruction fetch (IF)
 Instruction decode and register operand fetch (ID/RF)
 Execute/Evaluate memory address (EX/AG)
 Memory operand fetch (MEM)
 Store/writeback result (WB)

[Figure: datapath overview annotated with the five steps: IF (PC and instruction memory), ID/RF (register file), EX/AG (ALU), MEM (data memory), and WB (data written back to the registers).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

What Is To Come: The Full MIPS Datapath
[Figure: full single-cycle MIPS datapath. The Control unit decodes Instruction [31–26] and drives RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; ALU control decodes Instruction [5–0]; PCSrc1 = Jump and PCSrc2 = Br Taken select the next PC. JAL, JR, JALR omitted.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Processor Design
[Figure: full single-cycle MIPS datapath with the Control unit, ALU control, and the PCSrc1 (Jump) and PCSrc2 (Br Taken) multiplexers.]

Single-Cycle Datapath for Arithmetic and Logical Instructions

R-Type ALU Instructions
 Assembly (e.g., register-register signed addition)
    ADD rdreg rsreg rtreg

 Machine encoding
    | 0     | rs    | rt    | rd    | 0     | ADD   |  R-type
      6-bit   5-bit   5-bit   5-bit   5-bit   6-bit

 Semantics
    if MEM[PC] == ADD rd rs rt
        GPR[rd] ← GPR[rs] + GPR[rt]
        PC ← PC + 4

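The R-type encoding and the ADD semantics can be sketched in a few lines of Python. This is a minimal model under stated assumptions: the function names are hypothetical, the register file is a plain list of 32 integers, results wrap to 32 bits, and the overflow exception of the real MIPS ADD and the hardwired-zero register are ignored.

```python
# Sketch of R-type field extraction and ADD semantics.
# Assumptions: gpr is a list of 32 ints; results wrap to 32 bits;
# overflow exceptions and the hardwired-zero register are not modeled.

def decode_r_type(inst):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "opcode": (inst >> 26) & 0x3F,  # bits 31:26
        "rs": (inst >> 21) & 0x1F,      # bits 25:21
        "rt": (inst >> 16) & 0x1F,      # bits 20:16
        "rd": (inst >> 11) & 0x1F,      # bits 15:11
        "shamt": (inst >> 6) & 0x1F,    # bits 10:6
        "funct": inst & 0x3F,           # bits 5:0
    }


def execute_add(gpr, pc, inst):
    """Return (new register file, new PC) after GPR[rd] <- GPR[rs] + GPR[rt]."""
    fields = decode_r_type(inst)
    new_gpr = list(gpr)
    new_gpr[fields["rd"]] = (gpr[fields["rs"]] + gpr[fields["rt"]]) & 0xFFFFFFFF
    return new_gpr, (pc + 4) & 0xFFFFFFFF


# ADD $3, $1, $2 encodes as 0x00221820 (opcode 0, funct 0x20).
registers = [0] * 32
registers[1], registers[2] = 5, 7
after, next_pc = execute_add(registers, 0x00400000, 0x00221820)
```

Both the register update and the PC update are computed from the old state and returned together, mirroring the single state transition AS → AS’ per instruction.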
ALU Datapath
[Figure: single-cycle datapath for ADD rdreg rsreg rtreg. The PC addresses the instruction memory; instruction bits 25:21 and 20:16 select the two read registers and bits 15:11 select the write register; the ALU result is the register write data; RegWrite = 1; an adder updates the PC. Steps: IF, ID, EX, MEM, WB.]

if MEM[PC] == ADD rd rs rt
    GPR[rd] ← GPR[rs] + GPR[rt]
    PC ← PC + 4
(combinational state update logic)
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

I-Type ALU Instructions
 Assembly (e.g., register-immediate signed additions)
    ADDI rtreg rsreg immediate16

 Machine encoding
    | ADDI  | rs    | rt    | immediate |  I-type
      6-bit   5-bit   5-bit   16-bit

 Semantics
    if MEM[PC] == ADDI rt rs immediate
        GPR[rt] ← GPR[rs] + sign-extend (immediate)
        PC ← PC + 4

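The sign-extension step in the ADDI semantics is easy to get wrong, so here is a small sketch of it. The helper names are hypothetical, registers are modeled as a list of 32 integers, and the overflow exception of the real MIPS ADDI is ignored.

```python
# Sketch of ADDI semantics with explicit sign extension.
# Assumptions: gpr is a list of 32 ints; results wrap to 32 bits;
# overflow exceptions are not modeled.

def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def execute_addi(gpr, pc, inst):
    """Return (new register file, new PC) after GPR[rt] <- GPR[rs] + sext(imm)."""
    rs = (inst >> 21) & 0x1F   # bits 25:21
    rt = (inst >> 16) & 0x1F   # bits 20:16
    imm = inst & 0xFFFF        # bits 15:0
    new_gpr = list(gpr)
    new_gpr[rt] = (gpr[rs] + sign_extend16(imm)) & 0xFFFFFFFF
    return new_gpr, (pc + 4) & 0xFFFFFFFF


# ADDI $2, $1, -1 encodes as 0x2022FFFF (opcode 0x08, immediate 0xFFFF).
registers = [0] * 32
registers[1] = 10
after, next_pc = execute_addi(registers, 0x00400000, 0x2022FFFF)
```

Note that the destination is rt (bits 20:16) here, not rd as in the R-type case, which is what the RegDest multiplexer in the datapath has to select.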
Datapath for R and I-Type ALU Insts.
[Figure: the ALU datapath extended for I-type instructions. A RegDest multiplexer chooses the write register between instruction bits 20:16 and 15:11, and an ALUSrc multiplexer chooses the second ALU input between read data 2 and the sign-extended 16-bit immediate; both are driven by isItype. RegWrite = 1. Steps: IF, ID, EX, MEM, WB.]

ADDI rtreg rsreg immediate16
if MEM[PC] == ADDI rt rs immediate
    GPR[rt] ← GPR[rs] + sign-extend (immediate)
    PC ← PC + 4
(combinational state update logic)
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Single-Cycle Datapath for Data Movement Instructions

Load Instructions
 Assembly (e.g., load 4-byte word)
    LW rtreg offset16 (basereg)

 Machine encoding
    | LW    | base  | rt    | offset |  I-type
      6-bit   5-bit   5-bit   16-bit

 Semantics
    if MEM[PC]==LW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        GPR[rt] ← MEM[ translate(EA) ]
        PC ← PC + 4

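A minimal sketch of the LW semantics follows. It assumes a dictionary keyed by word-aligned byte addresses as the "magic" memory and an identity translate function (no virtual memory); both are simplifications, and all names are hypothetical.

```python
# Sketch of LW semantics.
# Assumptions: memory is a dict keyed by word-aligned byte address,
# translate() is the identity mapping, gpr is a list of 32 ints.

def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def translate(effective_address):
    """Identity address translation (assumption: no virtual memory)."""
    return effective_address


def execute_lw(gpr, memory, pc, inst):
    """Return (new register file, new PC) after GPR[rt] <- MEM[translate(EA)]."""
    base = (inst >> 21) & 0x1F   # bits 25:21
    rt = (inst >> 16) & 0x1F     # bits 20:16
    offset = inst & 0xFFFF       # bits 15:0
    ea = (sign_extend16(offset) + gpr[base]) & 0xFFFFFFFF
    new_gpr = list(gpr)
    new_gpr[rt] = memory[translate(ea)]
    return new_gpr, (pc + 4) & 0xFFFFFFFF


# LW $2, 8($1) encodes as 0x8C220008 (opcode 0x23).
registers = [0] * 32
registers[1] = 0x1000
data_memory = {0x1008: 42}
after, next_pc = execute_lw(registers, data_memory, 0x00400000, 0x8C220008)
```

The effective address is produced by the same adder path used for ADDI, which is why LW can reuse the ALU with the operation fixed to addition.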
LW Datapath
[Figure: datapath for LW with the data memory unit and sign-extension unit added. The ALU operation is add; RegDest and ALUSrc are driven by isItype; for LW, RegWrite = 1, MemRead = 1, and MemWrite = 0. Steps: IF, ID, EX, MEM, WB.]

LW rtreg offset16 (basereg)
if MEM[PC]==LW rt offset16 (base)
    EA = sign-extend(offset) + GPR[base]
    GPR[rt] ← MEM[ translate(EA) ]
    PC ← PC + 4
(combinational state update logic)
Store Instructions
 Assembly (e.g., store 4-byte word)
    SW rtreg offset16 (basereg)

 Machine encoding
    | SW    | base  | rt    | offset |  I-type
      6-bit   5-bit   5-bit   16-bit

 Semantics
    if MEM[PC]==SW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        MEM[ translate(EA) ] ← GPR[rt]
        PC ← PC + 4

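The SW semantics can be sketched the same way. As before, the dictionary-based memory, the identity translate function, and the function names are assumptions made for illustration only.

```python
# Sketch of SW semantics.
# Assumptions: memory is a dict keyed by word-aligned byte address,
# translate() is the identity mapping, gpr is a list of 32 ints.

def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def translate(effective_address):
    """Identity address translation (assumption: no virtual memory)."""
    return effective_address


def execute_sw(gpr, memory, pc, inst):
    """Return (new memory, new PC) after MEM[translate(EA)] <- GPR[rt]."""
    base = (inst >> 21) & 0x1F   # bits 25:21
    rt = (inst >> 16) & 0x1F     # bits 20:16
    offset = inst & 0xFFFF       # bits 15:0
    ea = (sign_extend16(offset) + gpr[base]) & 0xFFFFFFFF
    new_memory = dict(memory)
    new_memory[translate(ea)] = gpr[rt]
    return new_memory, (pc + 4) & 0xFFFFFFFF


# SW $2, -4($1) encodes as 0xAC22FFFC (opcode 0x2B, offset 0xFFFC = -4).
registers = [0] * 32
registers[1], registers[2] = 0x1000, 42
after_memory, next_pc = execute_sw(registers, {}, 0x00400000, 0xAC22FFFC)
```

Unlike LW, the store leaves the register file untouched, which corresponds to RegWrite being de-asserted for SW.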
SW Datapath
[Figure: datapath for SW with the data memory unit and sign-extension unit. The ALU operation is add; RegDest and ALUSrc are driven by isItype; for SW, MemWrite = 1, RegWrite = 0, and MemRead = 0. Steps: IF, ID, EX, MEM, WB.]

SW rtreg offset16 (basereg)
if MEM[PC]==SW rt offset16 (base)
    EA = sign-extend(offset) + GPR[base]
    MEM[ translate(EA) ] ← GPR[rt]
    PC ← PC + 4
(combinational state update logic)
Load-Store Datapath
[Figure: combined load-store datapath. The ALU operation is add; MemWrite = isStore; RegWrite = !isStore; MemRead = isLoad; RegDest and ALUSrc are driven by isItype.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

Datapath for Non-Control-Flow Insts.
[Figure: datapath for all non-control-flow instructions. MemWrite = isStore; RegWrite = !isStore; MemRead = isLoad; RegDest and ALUSrc are driven by isItype; a MemtoReg multiplexer, driven by isLoad, selects between the ALU result and the memory read data as register write data.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Single-Cycle Datapath for Control Flow Instructions

Unconditional Jump Instructions
 Assembly
    J immediate26

 Machine encoding
    | J     | immediate |  J-type
      6-bit   26-bit

 Semantics
    if MEM[PC]==J immediate26
        target = { PC[31:28], immediate26, 2’b00 }
        PC ← target

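The jump-target concatenation { PC[31:28], immediate26, 2’b00 } can be written with masks and shifts. The sketch below follows the semantics exactly as stated on this slide (upper bits taken from PC); the function name is hypothetical.

```python
# Sketch of the J target computation: { PC[31:28], immediate26, 2'b00 }.
# Follows the slide's semantics, which take the upper four bits from PC.

def jump_target(pc, immediate26):
    """Concatenate PC[31:28], the 26-bit immediate, and two zero bits."""
    upper = pc & 0xF0000000                    # PC[31:28], kept in place
    lower = (immediate26 & 0x03FFFFFF) << 2    # immediate26 followed by 2'b00
    return upper | lower


example_target = jump_target(0x10000008, 0x0000001)
```

Because the two fields occupy disjoint bit ranges, a bitwise OR is enough to join them; no adder is needed on this path.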
Unconditional Jump Datapath
[Figure: datapath for J. A concat block forms the jump target, and a PCSrc multiplexer driven by isJ selects it as the next PC. RegWrite, MemWrite, and MemRead are 0; ALUSrc and ALU operation are don’t-cares (X).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

if MEM[PC]==J immediate26
    PC = { PC[31:28], immediate26, 2’b00 }

What about JR, JAL, JALR?

Conditional Branch Instructions
 Assembly (e.g., branch if equal)
    BEQ rsreg rtreg immediate16

 Machine encoding
    | BEQ   | rs    | rt    | immediate |  I-type
      6-bit   5-bit   5-bit   16-bit

 Semantics (assuming no branch delay slot)
    if MEM[PC]==BEQ rs rt immediate16
        target = PC + 4 + sign-extend(immediate) x 4
        if GPR[rs]==GPR[rt] then PC ← target
        else PC ← PC + 4

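The BEQ next-PC selection can be sketched as follows, assuming no branch delay slot as on the slide. The function names are hypothetical, and the register values are passed in directly rather than read from a register file.

```python
# Sketch of BEQ semantics (no branch delay slot, as assumed on the slide).

def sign_extend16(value):
    """Interpret the low 16 bits of value as a two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def beq_next_pc(pc, rs_value, rt_value, immediate16):
    """Return the next PC: the branch target if the values match, else PC + 4."""
    fall_through = (pc + 4) & 0xFFFFFFFF
    # target = PC + 4 + sign-extend(immediate) x 4
    target = (pc + 4 + sign_extend16(immediate16) * 4) & 0xFFFFFFFF
    return target if rs_value == rt_value else fall_through


taken_pc = beq_next_pc(0x100, 5, 5, 3)
not_taken_pc = beq_next_pc(0x100, 5, 6, 3)
```

Both candidate values are computed regardless of the comparison, which is how the hardware works: the adders run in parallel and a multiplexer picks the result.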
Conditional Branch Datapath (for you to finish)
[Figure: datapath for conditional branches. A second adder computes the branch target from PC + 4 (taken from the instruction datapath) and the sign-extended immediate shifted left by 2; the ALU operation is sub and its bcond/Zero output goes to the branch control logic that drives PCSrc; RegWrite = 0. Annotation on the figure: “watch out”.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

How to uphold the delayed branch semantics?

Putting It All Together
[Figure: full single-cycle MIPS datapath. The Control unit decodes Instruction [31–26] and drives RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc, and RegWrite; ALU control decodes Instruction [5–0]; PCSrc1 = Jump and PCSrc2 = Br Taken select the next PC. JAL, JR, JALR omitted.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Single-Cycle Control Logic

Single-Cycle Hardwired Control
 As combinational function of Inst=MEM[PC]

    R-type:  | 0 [31:26] | rs [25:21] | rt [20:16] | rd [15:11] | shamt [10:6] | funct [5:0] |
               6-bit       5-bit        5-bit        5-bit        5-bit          6-bit
    I-type:  | opcode [31:26] | rs [25:21] | rt [20:16] | immediate [15:0] |
               6-bit            5-bit        5-bit        16-bit
    J-type:  | opcode [31:26] | immediate [25:0] |
               6-bit            26-bit

 Consider
   All R-type and I-type ALU instructions
   LW and SW
   BEQ, BNE, BLEZ, BGTZ
   J, JR, JAL, JALR

Single-Bit Control Signals

Signal   | When De-asserted                                       | When asserted                                           | Equation
RegDest  | GPR write select according to rt, i.e., inst[20:16]    | GPR write select according to rd, i.e., inst[15:11]     | opcode==0
ALUSrc   | 2nd ALU input from 2nd GPR read port                   | 2nd ALU input from sign-extended 16-bit immediate       | (opcode!=0) && (opcode!=BEQ) && (opcode!=BNE)
MemtoReg | Steer ALU result to GPR write port                     | Steer memory load to GPR write port                     | opcode==LW
RegWrite | GPR write disabled                                     | GPR write enabled                                       | (opcode!=SW) && (opcode!=Bxx) && (opcode!=J) && (opcode!=JR)

JAL and JALR require additional RegDest and MemtoReg options
Single-Bit Control Signals

Signal   | When De-asserted       | When asserted                                        | Equation
MemRead  | Memory read disabled   | Memory read port returns load value                  | opcode==LW
MemWrite | Memory write disabled  | Memory write enabled                                 | opcode==SW
PCSrc1   | According to PCSrc2    | next PC is based on 26-bit immediate jump target     | (opcode==J) || (opcode==JAL)
PCSrc2   | next PC = PC + 4       | next PC is based on 16-bit immediate branch target   | (opcode==Bxx) && “bcond is satisfied”

JR and JALR require additional PCSrc options
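The equations in the two control-signal tables translate directly into a combinational function of the instruction. The sketch below is one reading of those equations, not the lecture's implementation: the constant and function names are hypothetical, the opcode values are the standard MIPS encodings, Bxx is taken to mean BEQ, BNE, BLEZ, and BGTZ, and JR is recognized as opcode 0 with funct 0x08 because it is an R-type instruction in MIPS.

```python
# Hardwired control as a pure function of opcode, funct, and bcond.
# Assumptions: standard MIPS opcode values; Bxx = {BEQ, BNE, BLEZ, BGTZ};
# JR is detected as opcode 0 with funct 0x08.

OP_RTYPE, OP_J, OP_JAL = 0x00, 0x02, 0x03
OP_BEQ, OP_BNE, OP_BLEZ, OP_BGTZ = 0x04, 0x05, 0x06, 0x07
OP_LW, OP_SW = 0x23, 0x2B
FUNCT_JR = 0x08
BRANCH_OPCODES = {OP_BEQ, OP_BNE, OP_BLEZ, OP_BGTZ}


def control_signals(opcode, funct=0, bcond=False):
    """Return the single-bit control signals as a dict of booleans."""
    is_branch = opcode in BRANCH_OPCODES
    is_jr = opcode == OP_RTYPE and funct == FUNCT_JR
    return {
        "RegDest": opcode == OP_RTYPE,
        "ALUSrc": opcode != OP_RTYPE and opcode not in (OP_BEQ, OP_BNE),
        "MemtoReg": opcode == OP_LW,
        "RegWrite": (opcode != OP_SW and not is_branch
                     and opcode != OP_J and not is_jr),
        "MemRead": opcode == OP_LW,
        "MemWrite": opcode == OP_SW,
        "PCSrc1": opcode in (OP_J, OP_JAL),
        # PCSrc2 depends on data: it needs the bcond result from the ALU.
        "PCSrc2": is_branch and bcond,
    }


lw_signals = control_signals(OP_LW)
beq_taken_signals = control_signals(OP_BEQ, bcond=True)
```

PCSrc2 is the only signal here that depends on bcond, i.e., on the result of processing data rather than on the instruction bits alone.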
ALU Control
 case opcode
    ‘0’ → select operation according to funct
    ‘ALUi’ → select operation according to opcode
    ‘LW’ → select addition
    ‘SW’ → select addition
    ‘Bxx’ → select bcond generation function
    __ → don’t care
 Example ALU operations
   ADD, SUB, AND, OR, XOR, NOR, etc.
   bcond on equal, not equal, LE zero, GT zero, etc.

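The case statement above can be sketched as a lookup function. The operation names returned here are hypothetical labels, only a handful of instructions are covered, and the opcode and funct values are the standard MIPS encodings; None stands for the don’t-care case.

```python
# Sketch of ALU control as a function of opcode and funct.
# Assumptions: standard MIPS encodings; returned strings are made-up labels;
# None represents "don't care".

FUNCT_TO_OP = {0x20: "add", 0x22: "sub", 0x24: "and",
               0x25: "or", 0x26: "xor", 0x27: "nor"}
ALUI_TO_OP = {0x08: "add", 0x0C: "and", 0x0D: "or", 0x0E: "xor"}
BRANCH_TO_COND = {0x04: "eq", 0x05: "ne", 0x06: "lez", 0x07: "gtz"}
OP_LW, OP_SW = 0x23, 0x2B


def alu_operation(opcode, funct=0):
    """Select the ALU operation for an instruction, or None for don't care."""
    if opcode == 0x00:                 # R-type: decided by funct
        return FUNCT_TO_OP.get(funct)
    if opcode in ALUI_TO_OP:           # I-type ALU: decided by opcode
        return ALUI_TO_OP[opcode]
    if opcode in (OP_LW, OP_SW):       # address generation uses addition
        return "add"
    if opcode in BRANCH_TO_COND:       # branches: bcond generation function
        return "bcond_" + BRANCH_TO_COND[opcode]
    return None                        # e.g., J: ALU result is unused


sub_operation = alu_operation(0x00, funct=0x22)
```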
R-Type ALU
[Figure: full single-cycle MIPS datapath annotated with the control-signal values for R-type ALU instructions; ALU control selects the ALU operation according to funct.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
I-Type ALU
[Figure: full single-cycle MIPS datapath annotated with the control-signal values for I-type ALU instructions; ALU control selects the ALU operation according to opcode.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
LW
[Figure: full single-cycle MIPS datapath annotated with the control-signal values for LW; ALU control selects Add.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
SW
[Figure: full single-cycle MIPS datapath annotated with the control-signal values for SW; ALU control selects Add, and the register-write multiplexers are marked don’t-care (X).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Branch (Not Taken)
Some control signals are dependent on the processing of data
[Figure: full single-cycle MIPS datapath annotated with the control-signal values for a not-taken branch; ALU control selects the bcond function, and the register-write multiplexers are marked don’t-care (X).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Branch (Taken)
Some control signals are dependent on the processing of data
[Figure: full single-cycle MIPS datapath annotated with the control-signal values for a taken branch; ALU control selects the bcond function, and the register-write multiplexers are marked don’t-care (X).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
Jump
[Figure: full single-cycle MIPS datapath annotated with the control-signal values for J; the ALU operation and the multiplexers not on the jump path are marked don’t-care (X).]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
What is in That Control Box?
 Combinational Logic → Hardwired Control
   Idea: Control signals generated combinationally based on instruction
   Necessary in a single-cycle microarchitecture…
 Sequential Logic → Sequential/Microprogrammed Control
   Idea: A memory structure contains the control signals associated with an instruction
   Control Store

Evaluating the Single-Cycle Microarchitecture

A Single-Cycle Microarchitecture
 Is this a good idea/design?
 When is this a good design?
 When is this a bad design?
 How can we design a better microarchitecture?

A Single-Cycle Microarchitecture: Analysis
 Every instruction takes 1 cycle to execute
   CPI (Cycles per instruction) is strictly 1
 How long each instruction takes is determined by how long the slowest instruction takes to execute
   Even though many instructions do not need that long to execute
 Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction
   Critical path of the design is determined by the processing time of the slowest instruction

What is the Slowest Instruction to Process?
 Let’s go back to the basics
 All six phases of the instruction processing cycle take a single machine clock cycle to complete

   Six phases:
     Fetch
     Decode
     Evaluate Address
     Fetch Operands
     Execute
     Store Result

   Five steps:
    1. Instruction fetch (IF)
    2. Instruction decode and register operand fetch (ID/RF)
    3. Execute/Evaluate memory address (EX/AG)
    4. Memory operand fetch (MEM)
    5. Store/writeback result (WB)

 Does each of the above phases take the same time (latency) for all instructions?

Single-Cycle Datapath Analysis
 Assume
   memory units (read or write): 200 ps
   ALU and adders: 100 ps
   register file (read or write): 50 ps
   other combinational logic: 0 ps

steps     | IF  | ID | EX  | MEM | WB | Delay (ps)
resources | mem | RF | ALU | mem | RF |
R-type    | 200 | 50 | 100 |     | 50 | 400
I-type    | 200 | 50 | 100 |     | 50 | 400
LW        | 200 | 50 | 100 | 200 | 50 | 600
SW        | 200 | 50 | 100 | 200 |    | 550
Branch    | 200 | 50 | 100 |     |    | 350
Jump      | 200 |    |     |     |    | 200
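The Delay column can be reproduced by summing the assumed resource delays along each instruction's path; the largest sum is the clock cycle time of the single-cycle machine. The sketch below uses the delays assumed on this slide; the variable names are hypothetical.

```python
# Per-instruction latency from the assumed resource delays (in ps).
# "other combinational logic" is assumed to take 0 ps and is omitted.

DELAY_PS = {"mem": 200, "ALU": 100, "RF": 50}

# Resources used by each instruction class, in step order (IF, ID, EX, MEM, WB).
RESOURCES_USED = {
    "R-type": ["mem", "RF", "ALU", "RF"],
    "I-type": ["mem", "RF", "ALU", "RF"],
    "LW":     ["mem", "RF", "ALU", "mem", "RF"],
    "SW":     ["mem", "RF", "ALU", "mem"],
    "Branch": ["mem", "RF", "ALU"],
    "Jump":   ["mem"],
}

latency_ps = {
    name: sum(DELAY_PS[resource] for resource in resources)
    for name, resources in RESOURCES_USED.items()
}

# The slowest instruction determines the single-cycle clock period.
slowest = max(latency_ps, key=latency_ps.get)
cycle_time_ps = latency_ps[slowest]

print(latency_ps, slowest, cycle_time_ps)
```

Under these assumptions LW sets the cycle time, so a Jump that needs only 200 ps still occupies the full 600 ps cycle.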
Let’s Find the Critical Path
[Figure: full single-cycle MIPS datapath with the Control unit, ALU control, and the PCSrc1 (Jump) and PCSrc2 (Br Taken) multiplexers, shown for tracing the critical path.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
R-Type and I-Type ALU
[Figure: full single-cycle MIPS datapath annotated with arrival times along the R-type/I-type path: 200 ps after the instruction memory, 250 ps after the register read, 350 ps after the ALU, and 400 ps at the register write; the PC adders are labeled 100 ps.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
LW
[Figure: full single-cycle MIPS datapath annotated with arrival times along the LW path: 200 ps after the instruction memory, 250 ps after the register read, 350 ps after the ALU, 550 ps after the data memory, and 600 ps at the register write; the PC adders are labeled 100 ps.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
SW

[Figure: single-cycle MIPS datapath for SW, with control signals PCSrc1=Jump and PCSrc2=Br Taken. Delay annotations shown in the figure: 100ps, 100ps, 200ps, 250ps, 350ps, 550ps.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
68
Branch Taken

[Figure: single-cycle MIPS datapath for a taken branch, with control signals PCSrc1=Jump and PCSrc2=Br Taken. Delay annotations shown in the figure: 200ps, 100ps, 350ps, 350ps, 200ps, 250ps.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
69
Jump

[Figure: single-cycle MIPS datapath for Jump, with control signals PCSrc1=Jump and PCSrc2=Br Taken. Delay annotations shown in the figure: 100ps, 200ps, 200ps.]
[Based on original figure from P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
70
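The datapath figures above annotate each instruction class with path delays. As a minimal sketch, assume the largest annotation on each figure is that class's end-to-end latency (400 ps for R-type/I-type ALU, 600 ps for LW, 550 ps for SW, 350 ps for a taken branch, 200 ps for Jump). That reading of the figures is an assumption, and the names `LATENCY_PS`, `single_cycle_clock_ps`, and `critical_instruction` are illustrative only. Under that assumption, the clock period of a single-cycle machine is simply the longest latency:

```python
# Assumed per-class latencies in picoseconds, read off the figures as the
# largest delay annotation on each one (an assumption, not stated in the slides).
LATENCY_PS = {
    "alu": 400,
    "lw": 600,
    "sw": 550,
    "branch": 350,
    "jump": 200,
}


def single_cycle_clock_ps(latency_ps):
    """Clock period of a single-cycle machine: the longest instruction latency."""
    return max(latency_ps.values())


def critical_instruction(latency_ps):
    """Name of the instruction class that sets the clock period."""
    return max(latency_ps, key=latency_ps.get)


clock_ps = single_cycle_clock_ps(LATENCY_PS)

# Idle time per instruction class: how long each class waits for the clock edge.
slack_ps = {name: clock_ps - t for name, t in LATENCY_PS.items()}
```

With these assumed numbers, LW sets the clock, and every other class finishes early and then idles until the next clock edge.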
What About Control Logic?
 How does that affect the critical path?
 Food for thought for you:
     Can control logic be on the critical path?
     A note on CDC 5600: control store access too long…

71
What is the Slowest Instruction to Process?
 Memory is not magic
 What if memory sometimes takes 100ms to access?
 Does it make sense to have a simple register to register add or jump to take {100ms+all else to do a memory operation}?
 And, what if you need to access memory more than once to process an instruction?
     Which instructions need this?
     Do you provide multiple ports to memory?

72
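One way to approach the question of which instructions access memory more than once is to count accesses per instruction. The sketch below assumes a single unified memory holding both instructions and data, and uses a handful of MIPS instructions as examples. The table `DATA_ACCESSES` and the helper names are hypothetical, introduced only for illustration.

```python
# Assumption: one unified memory serves both instruction fetch and data access.
FETCH_ACCESSES = 1  # every instruction must be fetched from memory

# Data-memory accesses per instruction (illustrative MIPS examples).
DATA_ACCESSES = {
    "add": 0,  # register-to-register ALU operation
    "beq": 0,  # branch: compares registers, no data access
    "j": 0,    # jump: no data access
    "lw": 1,   # load: reads data memory
    "sw": 1,   # store: writes data memory
}


def memory_accesses(instr):
    """Total memory accesses needed to process one instruction."""
    return FETCH_ACCESSES + DATA_ACCESSES[instr]


def needs_second_port(instr):
    """In a single cycle, each access beyond the first needs its own port
    or a replicated memory, because nothing can be reused within the cycle."""
    return memory_accesses(instr) > 1


multi_access = sorted(i for i in DATA_ACCESSES if needs_second_port(i))
```

Under these assumptions only the loads and stores need a second access, which is one motivation for providing separate instruction and data memories (or multiple ports) in a single-cycle design.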
Single Cycle uArch: Complexity
 Contrived
     All instructions run as slow as the slowest instruction
 Inefficient
     All instructions run as slow as the slowest instruction
     Must provide worst-case combinational resources in parallel as required by any instruction
     Need to replicate a resource if it is needed more than once by an instruction during different parts of the instruction processing cycle
 Not necessarily the simplest way to implement an ISA
     Single-cycle implementation of REP MOVS (x86) or INDEX (VAX)?
 Not easy to optimize/improve performance
     Optimizing the common case does not work (e.g. common
73
(Micro)architecture Design Principles
 Critical path design
     Find and decrease the maximum combinational logic delay
     Break a path into multiple cycles if it takes too long
 Bread and butter (common case) design
     Spend time and resources on where it matters most
     i.e., improve what the machine is really designed to do
     Common case vs. uncommon case
 Balanced design
     Balance instruction/data flow through hardware components
     Design to eliminate bottlenecks: balance the hardware for the work
74
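The common-case principle can be made concrete with Amdahl's Law: if a fraction f of execution time is sped up by a factor s, the overall speedup is 1 / ((1 - f) + f / s). The sketch below is a minimal illustration. The function name `amdahl_speedup` and the example fractions are assumptions chosen for the demonstration, not figures from the lecture.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of execution time is sped up by `factor`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    if factor <= 0.0:
        raise ValueError("factor must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / factor)


# Hypothetical comparison: a modest gain on the common case versus
# a large gain on an uncommon case.
common_case = amdahl_speedup(0.9, 2.0)     # 90% of time made 2x faster
uncommon_case = amdahl_speedup(0.1, 10.0)  # 10% of time made 10x faster
```

In this made-up comparison the 2x improvement to the common case yields a larger overall speedup than the 10x improvement to the uncommon case, which is the point of spending resources where it matters most.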
Single-Cycle Design vs. Design Principles
 Critical path design
 Bread and butter (common case) design
 Balanced design

How does a single-cycle microarchitecture fare in light of these principles?

75
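One rough way to evaluate the single-cycle design against the common-case principle is to compare the time each instruction actually needs with the fixed clock period it is given. The sketch below reuses per-class latencies read off the earlier datapath figures (an assumption: the largest annotation on each figure is taken as the class latency) and a purely hypothetical instruction mix. Neither the mix nor the names come from the lecture.

```python
# Assumed latencies in picoseconds (largest annotation on each datapath figure).
LATENCY_PS = {"alu": 400, "lw": 600, "sw": 550, "branch": 350, "jump": 200}

# Hypothetical dynamic instruction mix (fractions sum to 1); not from the slides.
MIX = {"alu": 0.45, "lw": 0.25, "sw": 0.10, "branch": 0.15, "jump": 0.05}


def average_needed_ps(latency_ps, mix):
    """Average time an instruction would need if each took only its own latency."""
    return sum(mix[name] * latency_ps[name] for name in mix)


def single_cycle_utilization(latency_ps, mix):
    """Fraction of each clock period spent on useful work, on average."""
    clock_ps = max(latency_ps.values())  # single-cycle clock = slowest instruction
    return average_needed_ps(latency_ps, mix) / clock_ps


needed = average_needed_ps(LATENCY_PS, MIX)
utilization = single_cycle_utilization(LATENCY_PS, MIX)
```

With this made-up mix, the average instruction needs well under the full clock period, so part of every cycle is idle. Making the common ALU instructions faster would not help, because the clock is still set by the slowest class.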
Aside: System Design Principles
 When designing computer systems/architectures, it is important to follow good principles
 Remember: “principled design” from our first lecture
     Frank Lloyd Wright: “architecture […] based upon principle, and not upon precedent”

76
Aside: From Lecture 1
 “architecture […] based upon principle, and not upon precedent”

77
Aside: System Design Principles
 We will continue to cover key principles in this course
 Here are some references where you can learn more
     Yale Patt, “Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution,” Proc. of IEEE, 2001. (Levels of transformation, design point, etc.)
     Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. (Flynn’s Bottleneck → Balanced design)
     Gene M. Amdahl, “Validity of the single processor approach to achieving large scale computing capabilities,” AFIPS Conference, April 1967. (Amdahl’s Law → Common-case design)
     Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.
         http://research.microsoft.com/pubs/68221/acrobat.pdf

78
Aside: One Important Principle
 Keep it simple
 “Everything should be made as simple as possible, but no simpler.”
     Albert Einstein
 And, do not forget: “An engineer is a person who can do for a dime what any fool can do for a dollar.”
 For more, see:
     Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.
         http://research.microsoft.com/pubs/68221/acrobat.pdf
79
Multi-Cycle Microarchitectures

80
