03.EECE345 Computer Architecture ISA Design 02

The document introduces microarchitecture concepts, focusing on single-cycle microarchitectures and comparing them with multi-cycle architectures. It outlines the instruction processing cycle, emphasizing the phases involved and the differences in control and data handling between single-cycle and multi-cycle machines. Additionally, it discusses the implications of design choices in ISA and the performance analysis of different microarchitecture types.

Uploaded by

ahmed27042004

Intro to Microarchitecture: Single-Cycle

Vinod PANGRACIOUS
School of Engineering
American University in Dubai

Agenda for Today & Next Few Lectures
• Start Microarchitecture
• Single-cycle Microarchitectures
• Multi-cycle Microarchitectures
• Microprogrammed Microarchitectures
• Pipelining
• Issues in Pipelining: Control & Data Dependence Handling, State Maintenance and Recovery, …

Recap of Two Weeks and Last Lecture
• Computer Architecture Today and Basics (Lectures 1 & 2)
• Fundamental Concepts (Lecture 3)
• ISA basics and tradeoffs (Lectures 3 & 4)
• Last Lecture: ISA tradeoffs continued + MIPS ISA
  - Instruction length
  - Uniform vs. non-uniform decode
  - Number of registers
  - Addressing modes
  - Aligned vs. unaligned access
  - RISC vs. CISC properties
  - MIPS ISA Overview

Food for Thought for You
• As you learn the MIPS ISA, think about what tradeoffs the designers have made
  - in terms of the ISA properties we talked about
• And think about the pros and cons of design choices
  - In comparison to ARM, Alpha
  - In comparison to x86, VAX
• And think about the potential mistakes
  - Branch delay slot?
  - Load delay slot?
  - No FP, no multiply, MIPS (initial)
Look Backward

Food for Thought for You
• How would you design a new ISA?
• Where would you place it?
• What design choices would you make in terms of ISA properties?
• What would be the first question you ask in this process?
  - “What is my design point?”
Look Forward & Up

Review: Other Example ISA-level Tradeoffs
• Condition codes vs. not
• VLIW vs. single instruction
• SIMD (single instruction multiple data) vs. SISD
• Precise vs. imprecise exceptions
• Virtual memory vs. not
• Unaligned access vs. not
• Hardware interlocks vs. software-guaranteed interlocking
• Cache coherence (hardware vs. software)
Think Programmer vs. (Micro)architect

Review: A Note on RISC vs. CISC
Usually, …
• RISC
  - Simple instructions
  - Fixed length
  - Uniform decode
  - Few addressing modes
• CISC
  - Complex instructions
  - Variable length
  - Non-uniform decode
  - Many addressing modes

Now That We Have an ISA
• How do we implement it?
  - i.e., how do we design a system that obeys the hardware/software interface?
• Aside: “System” can be solely hardware or a combination of hardware and software
  - Remember “Translation of ISAs”
  - A virtual ISA can be converted by “software” into an implementation ISA
• We will assume “hardware” for most lectures

Implementing the ISA: Microarchitecture Basics

How Does a Machine Process Instructions?
• What does processing an instruction mean?
• Remember the von Neumann model
    AS = Architectural (programmer visible) state before an instruction is processed
        ↓ Process instruction
    AS’ = Architectural (programmer visible) state after an instruction is processed
• Processing an instruction: Transforming AS to AS’ according to the ISA specification of the instruction

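The AS to AS’ view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the toy two-instruction ISA, the tuple encoding of instructions, and the names `ArchState` and `process_instruction` are hypothetical and do not come from the lecture.

```python
from dataclasses import dataclass, field


@dataclass
class ArchState:
    """Programmer-visible state: PC, registers, memory (toy model)."""
    pc: int = 0
    regs: list = field(default_factory=lambda: [0] * 32)
    mem: dict = field(default_factory=dict)  # address -> instruction tuple


def process_instruction(state: ArchState) -> ArchState:
    """One state transition AS -> AS' for a hypothetical toy ISA."""
    inst = state.mem[state.pc]
    # Build AS' as a fresh copy so AS itself is never partially updated.
    nxt = ArchState(state.pc, list(state.regs), dict(state.mem))
    op = inst[0]
    if op == "ADD":      # ("ADD", rd, rs, rt)
        _, rd, rs, rt = inst
        nxt.regs[rd] = (state.regs[rs] + state.regs[rt]) & 0xFFFFFFFF
    elif op == "ADDI":   # ("ADDI", rt, rs, imm)
        _, rt, rs, imm = inst
        nxt.regs[rt] = (state.regs[rs] + imm) & 0xFFFFFFFF
    else:
        raise ValueError(f"unknown instruction {inst!r}")
    nxt.pc = state.pc + 4
    return nxt


state = ArchState(mem={0: ("ADDI", 1, 0, 5), 4: ("ADD", 2, 1, 1)})
state = process_instruction(state)
state = process_instruction(state)
print(state.pc, state.regs[1], state.regs[2])
```

Each call is exactly one transition with no programmer-visible intermediate state, which is all the ISA promises; how many clock cycles the hardware spends per call is a microarchitecture choice.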
The “Process instruction” Step
• ISA specifies abstractly what AS’ should be, given an instruction and AS
  - It defines an abstract finite state machine where
      State = programmer-visible state
      Next-state logic = instruction execution specification
  - From ISA point of view, there are no “intermediate states” between AS and AS’ during instruction execution
      One state transition per instruction
• Microarchitecture implements how AS is transformed to AS’
  - There are many choices in implementation
  - We can have programmer-invisible state to optimize the speed of instruction execution: multiple state transitions per instruction
      Choice 1: AS → AS’ (transform AS to AS’ in a single clock cycle)
      Choice 2: AS → AS+MS1 → AS+MS2 → AS+MS3 → AS’ (take multiple clock cycles to transform AS to AS’)

A Very Basic Instruction Processing Engine
• Each instruction takes a single clock cycle to execute
• Only combinational logic is used to implement instruction execution
  - No intermediate, programmer-invisible state updates
    AS = Architectural (programmer visible) state at the beginning of a clock cycle
        ↓ Process instruction in one clock cycle
    AS’ = Architectural (programmer visible) state at the end of a clock cycle

A Very Basic Instruction Processing Engine
• Single-cycle machine
[Figure: combinational logic reads AS from the sequential logic (state) and produces AS’, which is written back into the state.]
• What is the clock cycle time determined by?
• What is the critical path of the combinational logic determined by?

Remember: Programmer Visible (Architectural) State
• Memory: array of storage locations indexed by an address (M[0], M[1], M[2], M[3], M[4], …, M[N-1])
• Registers
  - given special names in the ISA (as opposed to addresses)
  - general vs. special purpose
• Program Counter: memory address of the current instruction
Instructions (and programs) specify how to transform the values of programmer visible state

Single-cycle vs. Multi-cycle Machines
• Single-cycle machines
  - Each instruction takes a single clock cycle
  - All state updates made at the end of an instruction’s execution
  - Big disadvantage: The slowest instruction determines cycle time → long clock cycle time
• Multi-cycle machines
  - Instruction processing broken into multiple cycles/stages
  - State updates can be made during an instruction’s execution
  - Architectural state updates made only at the end of an instruction’s execution
  - Advantage over single-cycle: The slowest “stage” determines cycle time
• Both single-cycle and multi-cycle machines literally follow the von Neumann model at the microarchitecture level

Instruction Processing “Cycle”
• Instructions are processed under the direction of a “control unit” step by step.
• Instruction cycle: Sequence of steps to process an instruction
• Fundamentally, there are six phases:
  - Fetch
  - Decode
  - Evaluate Address
  - Fetch Operands
  - Execute
  - Store Result
• Not all instructions require all six stages (see P&P Ch. …)

Instruction Processing “Cycle” vs. Machine Clock Cycle
• Single-cycle machine:
  - All six phases of the instruction processing cycle take a single machine clock cycle to complete
• Multi-cycle machine:
  - All six phases of the instruction processing cycle can take multiple machine clock cycles to complete
  - In fact, each phase can take multiple clock cycles to complete

Instruction Processing Viewed Another Way
• Instructions transform Data (AS) to Data’ (AS’)
• This transformation is done by functional units
  - Units that “operate” on data
• These units need to be told what to do to the data
• An instruction processing engine consists of two components
  - Datapath: Consists of hardware elements that deal with and transform data signals
      functional units that operate on data
      hardware structures (e.g. wires and muxes) that enable the flow of data into the functional units and registers
      storage units that store data (e.g., registers)
  - Control logic: Consists of hardware elements that determine control signals, i.e., signals that specify what the datapath elements should do to the data

Single-cycle vs. Multi-cycle: Control & Data
• Single-cycle machine:
  - Control signals are generated in the same clock cycle as the one during which data signals are operated on
  - Everything related to an instruction happens in one clock cycle (serialized processing)
• Multi-cycle machine:
  - Control signals needed in the next cycle can be generated in the current cycle
  - Latency of control processing can be overlapped with latency of datapath operation (more parallelism)
• We will see the difference clearly in microprogrammed multi-cycle microarchitectures

Many Ways of Datapath and Control Design
• There are many ways of designing the data path and control logic
• Single-cycle, multi-cycle, pipelined datapath and control
• Single-bus vs. multi-bus datapaths
• Hardwired/combinational vs. microcoded/microprogrammed control
  - Control signals generated by combinational logic versus
  - Control signals stored in a memory structure
• Control signals and structure depend on the datapath design

Flash-Forward: Performance Analysis
• Execution time of an instruction
  - {CPI} x {clock cycle time}
• Execution time of a program
  - Sum over all instructions [{CPI} x {clock cycle time}]
  - {# of instructions} x {Average CPI} x {clock cycle time}
• Single cycle microarchitecture performance
  - CPI = 1
  - Clock cycle time = long
• Multi-cycle microarchitecture performance
  - CPI = different for each instruction
      Average CPI → hopefully small
  - Clock cycle time = short
  - Now, we have two degrees of freedom to optimize independently

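As a worked instance of the execution-time formula, the sketch below plugs in made-up numbers. The instruction count, the average CPI of 4.0 and the 200 ps multi-cycle clock are illustrative assumptions, not measurements from the lecture; `execution_time_ps` is a hypothetical helper name.

```python
def execution_time_ps(instr_count, avg_cpi, cycle_time_ps):
    """{# of instructions} x {Average CPI} x {clock cycle time}, in ps."""
    return instr_count * avg_cpi * cycle_time_ps


# Single-cycle: CPI is exactly 1, but the clock must fit the slowest instruction.
single = execution_time_ps(1_000_000, 1.0, 600)

# Multi-cycle (hypothetical numbers): shorter clock, more cycles per instruction.
multi = execution_time_ps(1_000_000, 4.0, 200)

print(single, multi)
```

With these particular assumed numbers the multi-cycle design is slower (800 µs vs. 600 µs), which is why the average CPI has to be "hopefully small": the two degrees of freedom only pay off if CPI and cycle time are optimized together.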
A Single-Cycle Microarchitecture: A Closer Look

Remember…
• Single-cycle machine
[Figure: combinational logic reads AS from the sequential logic (state) and produces AS’, which is written back into the state.]

Let’s Start with the State Elements
• Data and control inputs
[Figure: the state elements and basic datapath building blocks: instruction memory, program counter, adder, register file (two 5-bit read register numbers, one 5-bit write register number, write data, RegWrite), ALU (3-bit ALU control input, Zero and ALU result outputs), data memory unit (Address, Write data, Read data, MemWrite, MemRead), and a 16-to-32-bit sign-extension unit.]
**Based on original figure from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]

For Now, We Will Assume
• “Magic” memory and register file
• Combinational read
  - output of the read data port is a combinational function of the register file contents and the corresponding read select port
• Synchronous write
  - the selected register is updated on the positive edge clock transition when write enable is asserted
      Cannot affect read output in between clock edges
• Single-cycle, synchronous memory
  - Contrast this with memory that tells when the data is ready
  - i.e., Ready bit: indicating the read or write is done

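The combinational-read, synchronous-write assumption can be mimicked in software. The following is a minimal sketch, not a hardware description: the class name `RegisterFile`, the `clock_edge` method and the single pending-write slot are assumptions made for illustration, and the MIPS convention that register 0 always reads as zero is deliberately not modeled.

```python
class RegisterFile:
    """Toy register file: reads see current contents, writes land on the clock edge."""

    def __init__(self, num_regs=32):
        self.regs = [0] * num_regs
        self._pending = None  # (select, data) captured for the next clock edge

    def read(self, select):
        # Combinational read: a pure function of contents and the select input.
        return self.regs[select]

    def write(self, select, data, write_enable):
        # Present write inputs; nothing changes until clock_edge() is called.
        self._pending = (select, data & 0xFFFFFFFF) if write_enable else None

    def clock_edge(self):
        # Positive clock edge: commit the pending write, if any.
        if self._pending is not None:
            select, data = self._pending
            self.regs[select] = data
            self._pending = None


rf = RegisterFile()
rf.write(3, 42, write_enable=True)
before = rf.read(3)   # still the old value: the write cannot affect reads mid-cycle
rf.clock_edge()
after = rf.read(3)    # new value is visible after the edge
print(before, after)
```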
Instruction Processing
• 5 generic steps (P&H book)
  - Instruction fetch (IF)
  - Instruction decode and register operand fetch (ID/RF)
  - Execute/Evaluate memory address (EX/AG)
  - Memory operand fetch (MEM)
  - Store/writeback result (WB)
[Figure: datapath overview with the IF, MEM and WB steps marked on the data flow. Based on original figure from P&H CO&D.]

What Is To Come: The Full MIPS Datapath
[Figure: full single-cycle MIPS datapath with its control unit. PCSrc1 = Jump and PCSrc2 = Br Taken select the next PC; the control unit decodes Instruction[31-26] into RegDst, Jump, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; ALU control uses Instruction[5-0]; the ALU produces the bcond output. Based on original figure from P&H CO&D.]
JAL, JR, JALR omitted

Processor Design
[Figure: the MIPS datapath and control diagram (same diagram as on the previous slide).]

Single-Cycle Datapath for Arithmetic and Logical Instructions

R-Type ALU Instructions
• Assembly (e.g., register-register signed addition)
    ADD rdreg rsreg rtreg
• Machine encoding (R-type)
    0      rs     rt     rd     0      ADD
    6-bit  5-bit  5-bit  5-bit  5-bit  6-bit
• Semantics
    if MEM[PC] == ADD rd rs rt
        GPR[rd] ← GPR[rs] + GPR[rt]
        PC ← PC + 4

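The R-type semantics can be checked with a small sketch that pulls the fields out of a 32-bit instruction word and applies the ADD state update. The function name `execute_add` is hypothetical; the funct value 0x20 is the standard MIPS32 code for ADD. One simplification is assumed: real ADD raises an exception on signed overflow, while this sketch simply wraps to 32 bits.

```python
ADD_FUNCT = 0x20  # MIPS32 funct code for ADD (opcode field is 0 for R-type)


def execute_add(inst, gpr, pc):
    """Apply GPR[rd] <- GPR[rs] + GPR[rt]; PC <- PC + 4 for an ADD word."""
    opcode = (inst >> 26) & 0x3F
    rs = (inst >> 21) & 0x1F
    rt = (inst >> 16) & 0x1F
    rd = (inst >> 11) & 0x1F
    funct = inst & 0x3F
    if opcode != 0 or funct != ADD_FUNCT:
        raise ValueError("not an ADD instruction")
    new_gpr = list(gpr)  # AS' is a new copy; AS is left untouched
    new_gpr[rd] = (gpr[rs] + gpr[rt]) & 0xFFFFFFFF  # overflow trap not modeled
    return new_gpr, pc + 4


# add $3, $1, $2 encodes as 0x00221820
gpr = [0] * 32
gpr[1], gpr[2] = 7, 8
new_gpr, new_pc = execute_add(0x00221820, gpr, 0x100)
print(new_gpr[3], hex(new_pc))
```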
ALU Datapath
[Figure: datapath for R-type ALU instructions. The PC addresses instruction memory; instruction bits 25:21 and 20:16 select read registers 1 and 2, bits 15:11 select the write register; the ALU result feeds the register file write data port with RegWrite = 1; an adder updates the PC. Based on original figure from P&H CO&D.]
ADD rdreg rsreg rtreg
IF ID EX MEM WB
if MEM[PC] == ADD rd rs rt
    GPR[rd] ← GPR[rs] + GPR[rt]
    PC ← PC + 4
(combinational state update logic)

I-Type ALU Instructions
• Assembly (e.g., register-immediate signed additions)
    ADDI rtreg rsreg immediate16
• Machine encoding (I-type)
    ADDI   rs     rt     immediate
    6-bit  5-bit  5-bit  16-bit
• Semantics
    if MEM[PC] == ADDI rt rs immediate
        GPR[rt] ← GPR[rs] + sign-extend (immediate)
        PC ← PC + 4

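The sign-extension step is the only new ingredient compared with R-type ADD. A minimal sketch, assuming 32-bit wraparound arithmetic and hypothetical helper names `sign_extend16` and `addi`:

```python
def sign_extend16(imm16):
    """Interpret the low 16 bits as a two's-complement signed value."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def addi(gpr_rs, imm16):
    """GPR[rt] <- GPR[rs] + sign-extend(immediate), wrapped to 32 bits."""
    return (gpr_rs + sign_extend16(imm16)) & 0xFFFFFFFF


print(sign_extend16(0xFFFF))  # all-ones immediate is -1
print(addi(10, 0xFFFF))       # 10 + (-1)
print(addi(10, 0x0005))       # 10 + 5
```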
Datapath for R and I-Type ALU Insts.
[Figure: the R-type ALU datapath extended with a 16-to-32-bit sign-extension unit. isItype drives RegDest (choosing between instruction bits 20:16 and 15:11 as the write register) and ALUSrc (choosing between read data 2 and the sign-extended immediate); RegWrite = 1. The data memory is shown but not used. Based on original figure from P&H CO&D.]
ADDI rtreg rsreg immediate16
IF ID EX MEM WB
if MEM[PC] == ADDI rt rs immediate
    GPR[rt] ← GPR[rs] + sign-extend (immediate)
    PC ← PC + 4
(combinational state update logic)

Single-Cycle Datapath for Data Movement Instructions

Load Instructions
• Assembly (e.g., load 4-byte word)
    LW rtreg offset16 (basereg)
• Machine encoding (I-type)
    LW     base   rt     offset
    6-bit  5-bit  5-bit  16-bit
• Semantics
    if MEM[PC]==LW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        GPR[rt] ← MEM[ translate(EA) ]
        PC ← PC + 4

LW Datapath
[Figure: datapath for LW. The ALU operation is add (base register plus sign-extended offset); RegDest and ALUSrc are driven by isItype; RegWrite = 1, MemRead = 1, MemWrite = 0; the data memory read data is written to the register file. Based on original figure from P&H CO&D.]
LW rtreg offset16 (basereg)
IF ID EX MEM WB
if MEM[PC]==LW rt offset16 (base)
    EA = sign-extend(offset) + GPR[base]
    GPR[rt] ← MEM[ translate(EA) ]
    PC ← PC + 4
(combinational state update logic)

Store Instructions
• Assembly (e.g., store 4-byte word)
    SW rtreg offset16 (basereg)
• Machine encoding (I-type)
    SW     base   rt     offset
    6-bit  5-bit  5-bit  16-bit
• Semantics
    if MEM[PC]==SW rt offset16 (base)
        EA = sign-extend(offset) + GPR[base]
        MEM[ translate(EA) ] ← GPR[rt]
        PC ← PC + 4

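The load and store semantics share the same effective-address computation and differ only in the direction of the data transfer. The sketch below assumes that translate() is the identity, that memory is a dictionary of words keyed by address, and that unwritten locations read as 0; the function names are hypothetical.

```python
def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def effective_address(base_value, offset16):
    """EA = sign-extend(offset) + GPR[base], wrapped to 32 bits."""
    return (base_value + sign_extend16(offset16)) & 0xFFFFFFFF


def lw(gpr, mem, rt, base, offset16):
    """GPR[rt] <- MEM[EA] (address translation assumed to be the identity)."""
    gpr[rt] = mem.get(effective_address(gpr[base], offset16), 0)


def sw(gpr, mem, rt, base, offset16):
    """MEM[EA] <- GPR[rt]."""
    mem[effective_address(gpr[base], offset16)] = gpr[rt]


gpr = [0] * 32
mem = {}
gpr[4] = 0x1000   # base register
gpr[5] = 99       # value to store
sw(gpr, mem, rt=5, base=4, offset16=0xFFFC)   # offset -4, so EA = 0x0FFC
lw(gpr, mem, rt=6, base=4, offset16=0xFFFC)   # load the same word back
print(hex(effective_address(gpr[4], 0xFFFC)), gpr[6])
```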
SW Datapath
[Figure: datapath for SW. The ALU operation is add (base register plus sign-extended offset); ALUSrc is driven by isItype; RegWrite = 0, MemRead = 0, MemWrite = 1; read data 2 of the register file is the data memory write data. Based on original figure from P&H CO&D.]
SW rtreg offset16 (basereg)
IF ID EX MEM WB
if MEM[PC]==SW rt offset16 (base)
    EA = sign-extend(offset) + GPR[base]
    MEM[ translate(EA) ] ← GPR[rt]
    PC ← PC + 4
(combinational state update logic)

Load-Store Datapath
[Figure: combined load/store datapath. ALU operation = add; MemWrite = isStore; MemRead = isLoad; RegWrite = !isStore; RegDest and ALUSrc are driven by isItype. Based on original figure from P&H CO&D.]

Datapath for Non-Control-Flow Insts.
[Figure: datapath covering ALU, load and store instructions. MemWrite = isStore; MemRead = isLoad; RegWrite = !isStore; RegDest and ALUSrc are driven by isItype; MemtoReg = isLoad selects between the ALU result and the memory read data for the register write port. Based on original figure from P&H CO&D.]

Single-Cycle Datapath for Control Flow Instructions

Unconditional Jump Instructions
• Assembly
    J immediate26
• Machine encoding (J-type)
    J      immediate
    6-bit  26-bit
• Semantics
    if MEM[PC]==J immediate26
        target = { PC[31:28], immediate26, 2’b00 }
        PC ← target

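The concatenation in the jump target can be written with masks and shifts. A minimal sketch with a hypothetical function name; it follows the semantics exactly as stated on this slide, taking the upper four bits from PC.

```python
def jump_target(pc, imm26):
    """target = { PC[31:28], immediate26, 2'b00 }"""
    upper = pc & 0xF0000000                 # PC[31:28] stays in place
    middle = (imm26 & 0x03FFFFFF) << 2      # 26-bit immediate, shifted left by 2
    return upper | middle                   # low two bits are 00


print(hex(jump_target(0x40000010, 0x0000100)))
```

Because the top four bits come from the PC, a J instruction can only reach targets inside the current 256 MB-aligned region.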
Unconditional Jump Datapath
[Figure: datapath for J. isJ drives PCSrc, selecting the concatenated jump target instead of PC + 4; RegWrite, MemWrite and MemRead are 0; the ALU-related control signals are don’t-cares (X). Based on original figure from P&H CO&D.]
if MEM[PC]==J immediate26
    PC = { PC[31:28], immediate26, 2’b00 }
What about JR, JAL, JALR?

Conditional Branch Instructions
• Assembly (e.g., branch if equal)
    BEQ rsreg rtreg immediate16
• Machine encoding (I-type)
    BEQ    rs     rt     immediate
    6-bit  5-bit  5-bit  16-bit
• Semantics (assuming no branch delay slot)
    if MEM[PC]==BEQ rs rt immediate16
        target = PC + 4 + sign-extend(immediate) x 4
        if GPR[rs]==GPR[rt] then PC ← target
        else PC ← PC + 4

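The branch semantics combine a target computation with a data-dependent choice of next PC. A minimal sketch, assuming no branch delay slot (as on the slide) and hypothetical function names:

```python
def sign_extend16(imm16):
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16


def beq_next_pc(pc, gpr_rs, gpr_rt, imm16):
    """Next PC for BEQ: target if the registers are equal, else PC + 4."""
    target = (pc + 4 + sign_extend16(imm16) * 4) & 0xFFFFFFFF
    return target if gpr_rs == gpr_rt else pc + 4


print(hex(beq_next_pc(0x100, 5, 5, 3)))       # taken, forward by 3 instructions
print(hex(beq_next_pc(0x100, 5, 5, 0xFFFF)))  # taken, offset -1: branch to itself
print(hex(beq_next_pc(0x100, 5, 6, 3)))       # not taken
```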
Conditional Branch Datapath (for you to finish)
[Figure: partial datapath for conditional branches. A second adder computes the branch target from PC + 4 (taken from the instruction datapath, marked “watch out”) and the sign-extended immediate shifted left by 2; the ALU operation is sub and its Zero/bcond output goes to the branch control logic that drives PCSrc; RegWrite = 0.]
How to uphold the delayed branch semantics?

Putting It All Together
[Figure: the full single-cycle MIPS datapath with its control unit, combining the ALU, load/store, jump (PCSrc1 = Jump) and branch (PCSrc2 = Br Taken) paths. Based on original figure from P&H CO&D.]

Single-Cycle Hardwired Control
• As combinational function of Inst=MEM[PC]

    R-type   bits:   31:26    25:21   20:16   15:11   10:6    5:0
             field:  0        rs      rt      rd      shamt   funct
             width:  6-bit    5-bit   5-bit   5-bit   5-bit   6-bit

    I-type   bits:   31:26    25:21   20:16   15:0
             field:  opcode   rs      rt      immediate
             width:  6-bit    5-bit   5-bit   16-bit

    J-type   bits:   31:26    25:0
             field:  opcode   immediate
             width:  6-bit    26-bit

• Consider
  - All R-type and I-type ALU instructions
  - LW and SW
  - BEQ, BNE, BLEZ, BGTZ
  - J, JR, JAL, JALR

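Because the three formats place their fields at fixed bit positions, decode reduces to shifts and masks. A minimal sketch (the function name `decode` and the dictionary layout are assumptions) that extracts every field unconditionally and leaves it to the control logic to decide which ones matter:

```python
def decode(inst):
    """Slice a 32-bit MIPS instruction word into all R/I/J-type fields."""
    return {
        "opcode": (inst >> 26) & 0x3F,   # bits 31:26
        "rs":     (inst >> 21) & 0x1F,   # bits 25:21
        "rt":     (inst >> 16) & 0x1F,   # bits 20:16
        "rd":     (inst >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":  (inst >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct":  inst & 0x3F,           # bits 5:0   (R-type)
        "imm16":  inst & 0xFFFF,         # bits 15:0  (I-type)
        "imm26":  inst & 0x03FFFFFF,     # bits 25:0  (J-type)
    }


r = decode(0x00221820)   # add $3, $1, $2
i = decode(0x8C450004)   # lw  $5, 4($2)
print(r["rs"], r["rt"], r["rd"], hex(r["funct"]))
print(hex(i["opcode"]), i["rs"], i["rt"], i["imm16"])
```

Extracting all fields in parallel mirrors the uniform decode property of the ISA: the register read ports can be driven before the opcode has been interpreted.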
Single-Bit Control Signals

• RegDest
    When de-asserted: GPR write select according to rt, i.e., inst[20:16]
    When asserted:    GPR write select according to rd, i.e., inst[15:11]
    Equation:         opcode==0
• ALUSrc
    When de-asserted: 2nd ALU input from 2nd GPR read port
    When asserted:    2nd ALU input from sign-extended 16-bit immediate
    Equation:         (opcode!=0) && (opcode!=BEQ) && (opcode!=BNE)
• MemtoReg
    When de-asserted: Steer ALU result to GPR write port
    When asserted:    Steer memory load to GPR write port
    Equation:         opcode==LW
• RegWrite
    When de-asserted: GPR write disabled
    When asserted:    GPR write enabled
    Equation:         (opcode!=SW) && (opcode!=Bxx) && (opcode!=J) && (opcode!=JR)

JAL and JALR require additional RegDest and MemtoReg options

Single-Bit Control Signals

• MemRead
    When de-asserted: Memory read disabled
    When asserted:    Memory read port returns load value
    Equation:         opcode==LW
• MemWrite
    When de-asserted: Memory write disabled
    When asserted:    Memory write enabled
    Equation:         opcode==SW
• PCSrc1
    When de-asserted: According to PCSrc2
    When asserted:    next PC is based on 26-bit immediate jump target
    Equation:         (opcode==J) || (opcode==JAL)
• PCSrc2
    When de-asserted: next PC = PC + 4
    When asserted:    next PC is based on 16-bit immediate branch target
    Equation:         (opcode==Bxx) && “bcond is satisfied”

JR and JALR require additional PCSrc options
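
The equations in the two control-signal tables translate directly into a combinational function of the opcode. The sketch below is an illustration under stated assumptions: opcodes are represented as mnemonic strings rather than 6-bit values, "RTYPE" stands for opcode==0, JR is treated as its own opcode as the tables do, and the extra JAL/JALR/JR options mentioned in the notes are not modeled.

```python
BRANCHES = {"BEQ", "BNE", "BLEZ", "BGTZ"}   # the "Bxx" group


def control_signals(opcode, bcond=False):
    """Single-bit control signals as a pure function of opcode (and bcond)."""
    return {
        "RegDest":  opcode == "RTYPE",
        "ALUSrc":   opcode not in {"RTYPE", "BEQ", "BNE"},
        "MemtoReg": opcode == "LW",
        "RegWrite": opcode not in ({"SW", "J", "JR"} | BRANCHES),
        "MemRead":  opcode == "LW",
        "MemWrite": opcode == "SW",
        "PCSrc1":   opcode in {"J", "JAL"},
        # PCSrc2 is the one signal that also depends on datapath results.
        "PCSrc2":   opcode in BRANCHES and bcond,
    }


for op in ("RTYPE", "LW", "SW", "BEQ", "J"):
    print(op, control_signals(op, bcond=True))
```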
ALU Control
• case opcode
    ‘0’ → select operation according to funct
    ‘ALUi’ → select operation according to opcode
    ‘LW’ → select addition
    ‘SW’ → select addition
    ‘Bxx’ → select bcond generation function
    __ → don’t care
• Example ALU operations
  - ADD, SUB, AND, OR, XOR, NOR, etc.
  - bcond on equal, not equal, LE zero, GT zero, etc.

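The case statement above maps onto a small selection function. In this sketch the funct codes are the standard MIPS32 values for the listed R-type operations; the mnemonic-string opcodes, the operation names such as "bcond_eq", and the use of `None` for don’t-care are assumptions made for illustration.

```python
# Standard MIPS32 funct codes for a few R-type ALU instructions.
FUNCT_TO_OP = {0x20: "add", 0x22: "sub", 0x24: "and",
               0x25: "or", 0x26: "xor", 0x27: "nor"}
# I-type ALU instructions select the operation from the opcode itself.
ALUI_TO_OP = {"ADDI": "add", "ANDI": "and", "ORI": "or", "XORI": "xor"}
# Branches select the bcond generation function (names are hypothetical).
BRANCH_TO_OP = {"BEQ": "bcond_eq", "BNE": "bcond_ne",
                "BLEZ": "bcond_lez", "BGTZ": "bcond_gtz"}


def alu_operation(opcode, funct=None):
    """Select the ALU operation; None means don't care."""
    if opcode == "RTYPE":
        return FUNCT_TO_OP[funct]
    if opcode in ALUI_TO_OP:
        return ALUI_TO_OP[opcode]
    if opcode in ("LW", "SW"):
        return "add"              # effective-address computation
    if opcode in BRANCH_TO_OP:
        return BRANCH_TO_OP[opcode]
    return None                   # e.g. J: the ALU result is unused


print(alu_operation("RTYPE", 0x22), alu_operation("LW"), alu_operation("BEQ"), alu_operation("J"))
```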
R-Type ALU
[Figure: the full MIPS datapath with the control signal values for R-type ALU instructions annotated on the diagram; ALU control selects the ALU operation from funct. Based on original figure from P&H CO&D.]

I-Type ALU
[Figure: the full MIPS datapath with the control signal values for I-type ALU instructions annotated on the diagram; ALU control selects the ALU operation from opcode. Based on original figure from P&H CO&D.]

LW
[Figure: the full MIPS datapath with the control signal values for LW annotated on the diagram; ALU control selects Add. Based on original figure from P&H CO&D.]

SW
[Figure: the full MIPS datapath with the control signal values for SW annotated on the diagram; ALU control selects Add, and the register write path is marked don’t-care (X). Based on original figure from P&H CO&D.]

Branch (Not Taken)
Some control signals are dependent on the processing of data
[Figure: the full MIPS datapath with the control signal values for a not-taken branch annotated on the diagram; ALU control selects the bcond function, and the register write path is marked don’t-care (X). Based on original figure from P&H CO&D.]

Branch (Taken)
Some control signals are dependent on the processing of data
[Figure: the full MIPS datapath with the control signal values for a taken branch annotated on the diagram; ALU control selects the bcond function, and the register write path is marked don’t-care (X). Based on original figure from P&H CO&D.]

Jump
[Figure: the full MIPS datapath with the control signal values for J annotated on the diagram; PCSrc1 = Jump selects the jump address, and the branch mux, register write path and ALU operation are marked don’t-care (X). Based on original figure from P&H CO&D.]

What is in That Control Box?
• Combinational Logic → Hardwired Control
  - Idea: Control signals generated combinationally based on instruction
  - Necessary in a single-cycle microarchitecture…
• Sequential Logic → Sequential/Microprogrammed Control
  - Idea: A memory structure contains the control signals associated with an instruction
  - Control Store

Evaluating the Single-Cycle Microarchitecture

A Single-Cycle Microarchitecture
• Is this a good idea/design?
• When is this a good design?
• When is this a bad design?
• How can we design a better microarchitecture?

A Single-Cycle Microarchitecture: Analysis
• Every instruction takes 1 cycle to execute
  - CPI (Cycles per instruction) is strictly 1
• How long each instruction takes is determined by how long the slowest instruction takes to execute
  - Even though many instructions do not need that long to execute
• Clock cycle time of the microarchitecture is determined by how long it takes to complete the slowest instruction
  - Critical path of the design is determined by the processing time of the slowest instruction

What is the Slowest Instruction to Process?
• Let’s go back to the basics
• All six phases of the instruction processing cycle take a single machine clock cycle to complete
  - Fetch
  - Decode
  - Evaluate Address
  - Fetch Operands
  - Execute
  - Store Result
• The 5 generic steps:
  1. Instruction fetch (IF)
  2. Instruction decode and register operand fetch (ID/RF)
  3. Execute/Evaluate memory address (EX/AG)
  4. Memory operand fetch (MEM)
  5. Store/writeback result (WB)
• Does each of the above phases take the same time (latency) for all instructions?

Single-Cycle Datapath Analysis
• Assume
  - memory units (read or write): 200 ps
  - ALU and adders: 100 ps
  - register file (read or write): 50 ps
  - other combinational logic: 0 ps

    Steps       IF     ID     EX     MEM    WB     Delay (ps)
    Resources   mem    RF     ALU    mem    RF
    R-type      200    50     100    -      50     400
    I-type      200    50     100    -      50     400
    LW          200    50     100    200    50     600
    SW          200    50     100    200    -      550
    Branch      200    50     100    -      -      350
    Jump        200    -      -      -      -      200

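The delay table can be reproduced by summing the resource delays along each instruction’s path. The resource delays and per-instruction resource lists below are taken from the table; the dictionary names are hypothetical.

```python
# Resource delays in picoseconds, as assumed on the slide.
DELAY_PS = {"mem": 200, "RF": 50, "ALU": 100}

# Resources each instruction class uses, in IF/ID/EX/MEM/WB order.
RESOURCES = {
    "R-type": ["mem", "RF", "ALU", "RF"],
    "I-type": ["mem", "RF", "ALU", "RF"],
    "LW":     ["mem", "RF", "ALU", "mem", "RF"],
    "SW":     ["mem", "RF", "ALU", "mem"],
    "Branch": ["mem", "RF", "ALU"],
    "Jump":   ["mem"],
}

latency_ps = {name: sum(DELAY_PS[r] for r in used)
              for name, used in RESOURCES.items()}

# In a single-cycle machine the clock must fit the slowest instruction.
cycle_time_ps = max(latency_ps.values())
slowest = max(latency_ps, key=latency_ps.get)

print(latency_ps)
print(slowest, cycle_time_ps)
```

Under these assumptions LW sets the clock at 600 ps, so a Jump that needs only 200 ps still occupies the full cycle.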
Let’s Find the Critical Path
[Figures: the full MIPS datapath repeated over several slides with cumulative path delays annotated (100 ps, 200 ps, 250 ps, 350 ps, 400 ps, 550 ps, 600 ps). In order, the annotated totals reach 400 ps, 600 ps, 550 ps, 350 ps and 200 ps, matching the R-type/I-type, LW, SW, branch and jump delays in the preceding table. Based on original figure from P&H CO&D.]
• Food for thought for you:
  - Can control logic be on the critical path?
  - A note on CDC 5600: control store access too long…

What is the Slowest Instruction
toMemory
 Process?
is not magic

 What if memory sometimes takes 100ms to access?

 Does it make sense to have a simple register to

 And, what if you need to access memory more than once to process an instruction?
 Which instructions need this?
 Do you provide multiple ports to memory?
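
To make the questions above concrete, here is a minimal sketch of how the slowest instruction sets the clock period of a single-cycle machine. The component delays and the per-instruction component lists are illustrative assumptions, not values stated on this slide, and `latency` and `cycle_time` are hypothetical helper names.

```python
# Assumed (illustrative) component delays in picoseconds.
DELAYS_PS = {"imem": 200, "regread": 50, "alu": 100, "dmem": 200, "regwrite": 50}

# Assumed components on the path of each instruction class, in order.
PATHS = {
    "alu_op": ["imem", "regread", "alu", "regwrite"],
    "load":   ["imem", "regread", "alu", "dmem", "regwrite"],
    "store":  ["imem", "regread", "alu", "dmem"],
    "branch": ["imem", "regread", "alu"],
    "jump":   ["imem"],
}

def latency(instr, delays):
    """Combinational delay one instruction class actually needs."""
    return sum(delays[c] for c in PATHS[instr])

def cycle_time(delays):
    """Single-cycle clock period: the slowest instruction sets it for everyone."""
    return max(latency(i, delays) for i in PATHS)

normal = cycle_time(DELAYS_PS)

# What if data memory takes 100 ms (1e11 ps) to access?
slow_mem = dict(DELAYS_PS, dmem=100_000_000_000)
stretched = cycle_time(slow_mem)

print("cycle time, assumed delays:", normal, "ps")
print("cycle time, 100 ms data memory:", stretched, "ps")
print("a jump needs only", latency("jump", slow_mem), "ps of logic")
```

Under these assumptions, even a jump, which never touches data memory, is charged the full stretched clock period, which is the concern the slide raises.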

Single Cycle uArch: Complexity
 Contrived
 All instructions run as slow as the slowest instruction

 Inefficient
 All instructions run as slow as the slowest instruction
 Must provide worst-case combinational resources in parallel as required by any instruction
 Need to replicate a resource if it is needed more than once by an instruction during different parts of the instruction processing cycle

 Not necessarily the simplest way to implement an ISA

 Single-cycle implementation of REP MOVS (x86) or INDEX (VAX)?

 Not easy to optimize/improve performance

 Optimizing the common case does not work (e.g., common instructions)
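
The inefficiency described on this slide can be sketched numerically. The per-instruction latencies and the instruction mix below are assumptions chosen for illustration only, and the function names are hypothetical.

```python
# Assumed per-instruction latencies (ps) and an assumed dynamic instruction mix (%).
LATENCY_PS = {"alu_op": 400, "load": 600, "store": 550, "branch": 350, "jump": 200}
MIX_PERCENT = {"alu_op": 50, "load": 20, "store": 10, "branch": 15, "jump": 5}

def single_cycle_time(latency):
    """Every instruction takes one cycle, and the cycle fits the slowest one."""
    return max(latency.values())

def ideal_average_time(latency, mix_percent):
    """Average time if each instruction took only as long as it needs."""
    return sum(mix_percent[i] * latency[i] for i in latency) / 100

single = single_cycle_time(LATENCY_PS)
ideal = ideal_average_time(LATENCY_PS, MIX_PERCENT)

# Speed up the most common instruction class: the clock period does not move.
faster_alu = dict(LATENCY_PS, alu_op=300)
single_after = single_cycle_time(faster_alu)

print("single-cycle time per instruction:", single, "ps")
print("ideal average time per instruction:", ideal, "ps")
print("single-cycle time after speeding up alu_op:", single_after, "ps")
```

In this sketch the common-case improvement lowers the ideal average but leaves the single-cycle clock period unchanged, because only the worst-case instruction determines it.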
(Micro)architecture Design Principles
 Critical path design
 Find and decrease the maximum combinational logic delay
 Break a path into multiple cycles if it takes too long

 Bread and butter (common case) design
 Spend time and resources on where it matters most
 i.e., improve what the machine is really designed to do
 Common case vs. uncommon case

 Balanced design
 Balance instruction/data flow through hardware components
 Design to eliminate bottlenecks: balance the hardware for the work
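
As a rough illustration of the critical path and balance principles above, the sketch below compares evaluating one long path in a single cycle against breaking it into several cycles. The delays and the particular split are assumptions, and `clock_period` is a hypothetical helper.

```python
def clock_period(groups):
    """Clock period when each inner list of delays is evaluated in one cycle."""
    return max(sum(group) for group in groups)

# Assumed delays (ps) of the components along one long path.
path = [200, 50, 100, 200, 50]

# Whole path in one cycle: the period equals the full path delay.
one_cycle = clock_period([path])

# Hypothetical split of the same path into three cycles.
split = [[200], [50, 100], [200, 50]]
three_cycle = clock_period(split)
total_latency = three_cycle * len(split)

print("period, one cycle:", one_cycle, "ps")
print("period, three cycles:", three_cycle, "ps")
print("latency of this path over three cycles:", total_latency, "ps")
```

Breaking the path shortens the clock period, but because the three groups are not balanced, the total latency for this path grows beyond the original delay. That is one reason balanced design matters alongside critical path design.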
Single-Cycle Design vs. Design Principles
 Critical path design

 Bread and butter (common case) design

 Balanced design

How does a single-cycle microarchitecture fare in light of these principles?
Aside: System Design Principles
 When designing computer systems/architectures, it is important to follow good principles

 Remember: “principled design” from our first lecture

 Frank Lloyd Wright: “architecture […] based upon principle, and not upon precedent”
Aside: From Lecture 1
 “architecture […] based upon principle, and not upon precedent”
Aside: System Design Principles
 We will continue to cover key principles in this course
 Here are some references where you can learn more

 Yale Patt, “Requirements, Bottlenecks, and Good Fortune: Agents for Microprocessor Evolution,” Proc. of IEEE, 2001. (Levels of transformation, design point, etc.)
 Mike Flynn, “Very High-Speed Computing Systems,” Proc. of IEEE, 1966. (Flynn’s Bottleneck → Balanced design)
 Gene M. Amdahl, "Validity of the single processor approach to achieving large scale computing capabilities," AFIPS Conference, April 1967. (Amdahl’s Law → Common-case design)
 Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.
 http://research.microsoft.com/pubs/68221/acrobat.pdf
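
The link between Amdahl's Law and common-case design noted in the references above can be sketched in a few lines. The fractions and speedup factors are made-up inputs, and `amdahl_speedup` is a hypothetical helper name.

```python
def amdahl_speedup(fraction, factor):
    """Overall speedup when `fraction` of execution time is sped up by `factor`."""
    return 1.0 / ((1.0 - fraction) + fraction / factor)

# Improving the common case (90% of the time) by 10x.
common = amdahl_speedup(0.9, 10)

# Improving an uncommon case (10% of the time) by the same 10x.
uncommon = amdahl_speedup(0.1, 10)

# Upper bound on the common-case improvement: the untouched 10% remains.
bound = 1.0 / (1.0 - 0.9)

print("speedup from improving the common case:", round(common, 3))
print("speedup from improving the uncommon case:", round(uncommon, 3))
print("limit as the factor grows without bound:", round(bound, 3))
```

The same improvement factor pays off far more when applied to the part of the work that dominates execution time, which is the argument for common-case design.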

Aside: One Important Principle
 Keep it simple

 “Everything should be made as simple as possible, but no simpler.”
 Albert Einstein

 And, do not forget: “An engineer is a person who can do for a dime what any fool can do for a dollar.”

 For more, see:
 Butler W. Lampson, “Hints for Computer System Design,” ACM Operating Systems Review, 1983.
 http://research.microsoft.com/pubs/68221/acrobat.pdf
Multi-Cycle Microarchitectures
