The Processor: Computer Organization and Design
5th Edition
The Hardware/Software Interface
Chapter 4
The Processor
§4.1 Introduction
Introduction
CPU performance factors
Instruction count
Determined by ISA and compiler
CPI and Cycle time
Determined by CPU hardware
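The three factors above combine in the standard relation CPU time = instruction count × CPI × clock cycle time. A minimal Python sketch of that relation follows; the function name `cpu_time` and the example numbers are illustrative assumptions, not values from the slides.

```python
def cpu_time(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time in seconds = instruction count x CPI x clock cycle time."""
    return instruction_count * cpi * cycle_time_s


# Illustrative values only (not from the slides): one million instructions,
# a CPI of 2.0, and a 1 ns clock cycle.
example_seconds = cpu_time(1_000_000, 2.0, 1e-9)
print(f"{example_seconds:.6f} s")
```

The ISA and compiler influence the first argument; the CPU hardware influences the other two.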
A 32-bit ALU
[Figure: 32-bit ALU with 32-bit inputs a and b, a CarryIn input, and a 32-bit result]
1-bit addition: inputs a, b, cin; outputs Sum, cout
cout = a·b + a·cin + b·cin
How could we build a 1-bit ALU for add, and, and or?
How could we build a 32-bit ALU?
[Figure: 1-bit ALU, with the sign-bit value labeled]
Control lines
Ainv  Binv  Operation  Function
0     0     00         and
0     0     01         or
0     0     10         add
0     1     10         sub
0     1     11         slt
1     1     00         nor
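The control-line table can be exercised with a small behavioral model. This is a sketch under stated assumptions: the names `alu`, `run`, and `CONTROL` are hypothetical, Binv is assumed to double as the carry-in of bit 0 (so that a + ~b + 1 = a - b), and slt is computed as the sign bit of the difference corrected for overflow, which goes slightly beyond the sign-bit note on the slide.

```python
MASK32 = 0xFFFFFFFF

# (Ainv, Binv, Operation) for each function, copied from the control-line table.
CONTROL = {
    "and": (0, 0, 0b00),
    "or":  (0, 0, 0b01),
    "add": (0, 0, 0b10),
    "sub": (0, 1, 0b10),
    "slt": (0, 1, 0b11),
    "nor": (1, 1, 0b00),
}


def alu(a: int, b: int, ainv: int, binv: int, operation: int) -> int:
    """Behavioral model of a 32-bit ALU driven by Ainv, Binv, and Operation."""
    a &= MASK32
    b &= MASK32
    if ainv:
        a = ~a & MASK32
    if binv:
        b = ~b & MASK32
    # Assumption: Binv also feeds the carry-in of bit 0, giving a + ~b + 1 = a - b.
    total = (a + b + binv) & MASK32
    if operation == 0b00:
        return a & b          # with both inputs inverted this is NOR
    if operation == 0b01:
        return a | b
    if operation == 0b10:
        return total
    if operation == 0b11:
        sign = (total >> 31) & 1
        # Signed overflow: both addends share a sign that differs from the sum's.
        overflow = (((a ^ total) & (b ^ total)) >> 31) & 1
        return sign ^ overflow
    raise ValueError("Operation must be a 2-bit value")


def run(name: str, a: int, b: int) -> int:
    """Look up the control lines for a named function and apply them."""
    ainv, binv, operation = CONTROL[name]
    return alu(a, b, ainv, binv, operation)


print(run("add", 5, 7), hex(run("sub", 5, 7)), run("slt", 5, 7))
```

Note that nor reuses the AND path: inverting both inputs and ANDing them gives NOT(a OR b).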
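[Figure: combinational and state elements, with inputs A, B and output Y]
Arithmetic/Logic Unit: inputs A, B; function select F; Y = F(A, B)
Multiplexer: inputs In0, In1; select S; Y = S ? In1 : In0
Register: input D, output Q, clocked by Clk
Register with write control: inputs D and Write, output Q, clocked by Clk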
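The multiplexer equation and the two register elements can be modeled in a few lines. This is a minimal sketch, assuming an edge-triggered register that updates Q from D only on a clock edge when Write is 1; the names `mux`, `Register`, and `clock_edge` are hypothetical.

```python
def mux(s: int, in0: int, in1: int) -> int:
    """Two-input multiplexer: Y = S ? In1 : In0."""
    return in1 if s else in0


class Register:
    """Clocked register with an optional write-enable input."""

    def __init__(self, value: int = 0) -> None:
        self.q = value  # output Q holds the stored value

    def clock_edge(self, d: int, write: int = 1) -> None:
        """On a clock edge, Q takes the value of D only when Write is 1."""
        if write:
            self.q = d


reg = Register()
reg.clock_edge(d=5, write=0)   # Write is 0: Q keeps its old value
reg.clock_edge(d=mux(1, 10, 20), write=1)
print(reg.q)
```

Leaving `write` at its default of 1 models the plain register that updates on every clock edge.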
[Figure: instruction fetch. The PC is a 32-bit register; it is incremented by 4 for the next instruction.]
[Figures: datapath portions for instruction fetch, load & store, R-type, and beq]
Composing the Elements
First-cut datapath does an instruction in one clock cycle
Each datapath element can only do one function at a time
Hence, we need separate instruction and data memories
Use multiplexers where alternate data sources are used for different instructions
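[Figure: full datapath; multiplexer inputs annotated with the instructions that use them: (beq), (sw), (R), (lw), (lw, sw), (R, lw)]
Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0
Register fields by instruction:
add $t0, $t1, $t2    rd, rs, rt
lw $t0, 4($t1)       rt, rs
beq $t0, $t1, L      rs, rt
Register write for R-type and load: a multiplexer selects rt (lw) or rd (R)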
A Simple Datapath for MIPS Inst. Set
R-type: add $rd, $rs, $rt
Load: lw $rt, 32($rs)
Branch: beq $rs, $rt, L
§4.4 A Simple Implementation Scheme
ALU Control
ALU used for
Load/Store: F = add
Branch: F = subtract
R-type: F depends on funct field
ALU control  Function
0000         AND
0001         OR
0010         add
0110         subtract
0111         set-on-less-than (slt)
1100         NOR
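The mapping from instruction class to ALU function can be sketched as a lookup. Assumptions to note: the 2-bit ALUOp encoding used here (00 for load/store, 01 for branch, 10 for R-type) follows the ALUOp1/ALUOp0 columns of the control table later in these slides, the funct values are the standard MIPS encodings, and the names `alu_control` and `FUNCT_TO_ALU` are hypothetical.

```python
# Standard MIPS funct values mapped to the 4-bit ALU control codes listed above.
FUNCT_TO_ALU = {
    0b100000: 0b0010,  # add
    0b100010: 0b0110,  # sub
    0b100100: 0b0000,  # and
    0b100101: 0b0001,  # or
    0b101010: 0b0111,  # slt
    0b100111: 0b1100,  # nor
}


def alu_control(alu_op: int, funct: int = 0) -> int:
    """Return the 4-bit ALU control code for a 2-bit ALUOp and 6-bit funct."""
    if alu_op == 0b00:      # load/store: F = add (address calculation)
        return 0b0010
    if alu_op == 0b01:      # branch: F = subtract (compare operands)
        return 0b0110
    if alu_op == 0b10:      # R-type: F depends on the funct field
        return FUNCT_TO_ALU[funct & 0x3F]
    raise ValueError("unsupported ALUOp value")


print(format(alu_control(0b10, 0b101010), "04b"))
```

The funct field is only consulted for R-type instructions, which is why it defaults to 0 for the other two cases.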
[Figure: ALU Control block with inputs Instruction [5:0] and ALUOp]
Load/Store   35 or 43   rs      rt      address
             31:26      25:21   20:16   15:0
Branch       4          rs      rt      address
             31:26      25:21   20:16   15:0
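The bit ranges above translate directly into shifts and masks. The following is a minimal sketch; the function names are hypothetical, and the example word encodes lw $t0, 4($t1) assuming the usual MIPS register numbers ($t0 = 8, $t1 = 9).

```python
def decode_i_format(word: int) -> dict:
    """Split a 32-bit load/store/branch instruction into its fields."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31:26
        "rs": (word >> 21) & 0x1F,       # bits 25:21
        "rt": (word >> 16) & 0x1F,       # bits 20:16
        "address": word & 0xFFFF,        # bits 15:0
    }


def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


# lw $t0, 4($t1): opcode 35, rs = 9 ($t1), rt = 8 ($t0), address = 4
lw_word = (35 << 26) | (9 << 21) | (8 << 16) | 4
print(decode_i_format(lw_word))
```

Opcode 35 is lw, 43 is sw, and 4 is beq, matching the values in the format table.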
[Figure: datapath with control; register fields rs, rt, rd labeled]
Control line values: 1 0 0 1 0 0 0 1 0
Load Instruction
lw $rt, 32($rs)
Control line values: 0 1 1 1 1 0 0 0 0
Branch-on-Equal Instr.
Control line values: x 0 x 0 0 0 1 0 1
The Main Control Unit
The setting of the control lines:
Four loads: Speedup = 8/3.5 = 2.3
Non-stop: Speedup = 2n/(0.5n + 1.5) ≈ 4 = number of stages
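The two speedup figures follow from one formula. The sketch below assumes, reading the numbers off the expressions above, that an unpipelined instruction takes 2 time units and that the pipeline has 4 stages of 0.5 time units each, so n instructions finish in 0.5(n + 3) = 0.5n + 1.5 units. The function names and default parameters are hypothetical.

```python
def pipelined_time(n: int, stages: int = 4, stage_time: float = 0.5) -> float:
    """Time for n instructions: the first takes `stages` cycles, each later one adds a cycle."""
    return (n + stages - 1) * stage_time


def speedup(n: int, unpipelined_per_instr: float = 2.0,
            stages: int = 4, stage_time: float = 0.5) -> float:
    """Ratio of unpipelined to pipelined execution time for n instructions."""
    return (n * unpipelined_per_instr) / pipelined_time(n, stages, stage_time)


print(round(speedup(4), 1))          # four loads: 8 / 3.5
print(round(speedup(1_000_000), 3))  # approaches the number of stages
```

As n grows, the fill time of 1.5 units becomes negligible and the ratio tends to 2/0.5 = 4.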
[Figure: branch prediction pipeline timing; panels: prediction correct, prediction incorrect]
[Figure: pipelined datapath with the MEM and WB stages labeled]
Right-to-left flow leads to hazards
Memory[PC] → IF/ID
PC+4 → PC
PC+4 → IF/ID
Note: Shaded left-half: Write. Shaded right-half: Read.
Reg[IF/ID.rs] → ID/EX
Reg[IF/ID.rt] → ID/EX
Sign-extend(IF/ID.Instr[15:0]) → ID/EX
IF/ID.pc+4 → ID/EX
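The register transfers above can be written out as two small functions, one per stage. This is only a sketch: the pipeline registers are modeled as plain dictionaries, instruction memory as a dictionary keyed by byte address, and all names (`fetch`, `decode`, the dictionary keys) are hypothetical.

```python
def sign_extend16(value: int) -> int:
    """Interpret a 16-bit field as a signed two's-complement integer."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def fetch(pc: int, instr_mem: dict) -> tuple:
    """IF stage: Memory[PC] and PC+4 go to IF/ID; PC+4 also becomes the new PC."""
    if_id = {"instr": instr_mem[pc], "pc_plus_4": pc + 4}
    return pc + 4, if_id


def decode(if_id: dict, regs: list) -> dict:
    """ID stage: register values, the sign-extended immediate, and PC+4 go to ID/EX."""
    instr = if_id["instr"]
    rs = (instr >> 21) & 0x1F
    rt = (instr >> 16) & 0x1F
    return {
        "read_data_1": regs[rs],
        "read_data_2": regs[rt],
        "imm": sign_extend16(instr & 0xFFFF),
        "pc_plus_4": if_id["pc_plus_4"],
    }


regs = [0] * 32
regs[9] = 100                                        # illustrative register contents
memory = {0: (35 << 26) | (9 << 21) | (8 << 16) | 4} # lw $8, 4($9)
pc, if_id = fetch(0, memory)
print(pc, decode(if_id, regs))
```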
[Figure: write-back stage for load, annotated "Wrong register number"; fields rt and rd labeled]
MEM/WB.mem-data → Reg[rt]
Control lines grouped by pipeline stage:
Execution/address calculation stage: RegDst, ALUOp1, ALUOp0, ALUSrc
Memory access stage: Branch, MemRead, MemWrite
Write-back stage: RegWrite, MemtoReg
Instr.    RegDst  ALUOp1  ALUOp0  ALUSrc  Branch  MemRead  MemWrite  RegWrite  MemtoReg
R-format  1       1       0       0       0       0        0         1         0
lw        0       0       0       1       0       1        0         1         1
sw        X       0       0       1       0       0        1         0         X
beq       X       0       1       0       1       0        0         0         X
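The control table can be held as data and split by the stage that consumes each signal. A minimal sketch follows; the bit strings are copied from the table (with X for don't-care), while the names `SIGNALS`, `STAGE_GROUPS`, and `control_for` are hypothetical.

```python
SIGNALS = ("RegDst", "ALUOp1", "ALUOp0", "ALUSrc",
           "Branch", "MemRead", "MemWrite",
           "RegWrite", "MemtoReg")

# One character per signal, in SIGNALS order; "X" marks a don't-care.
CONTROL_TABLE = {
    "R-format": "110000010",
    "lw":       "000101011",
    "sw":       "X0010010X",
    "beq":      "X0101000X",
}

# Which pipeline stage uses which signals.
STAGE_GROUPS = {
    "EX":  SIGNALS[0:4],
    "MEM": SIGNALS[4:7],
    "WB":  SIGNALS[7:9],
}


def control_for(instr: str) -> dict:
    """Return the control values for an instruction, grouped by pipeline stage."""
    values = dict(zip(SIGNALS, CONTROL_TABLE[instr]))
    return {stage: {name: values[name] for name in names}
            for stage, names in STAGE_GROUPS.items()}


print(control_for("lw"))
```

Grouping the signals this way mirrors how they travel: each stage uses its own group and passes the remaining groups along in the next pipeline register.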
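[Figure: data taken from the EX/MEM and MEM/WB pipeline registers; panels: without forwarding paths, with forwarding paths]
[Figure: load-use hazard. Need to stall for one cycle; stall inserted here]
[Figure: forwarding multiplexers fwdA and fwdB, each with select values 00, 01, 10]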
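The select values for fwdA and fwdB can be derived from the destination registers held in the later pipeline registers. The sketch below assumes the usual textbook convention, which the slide residue does not spell out: 00 selects the register-file value, 10 forwards from EX/MEM, and 01 forwards from MEM/WB, with EX/MEM taking priority because it holds the more recent result. The function and parameter names are hypothetical.

```python
def forward_select(src_reg: int,
                   ex_mem_regwrite: bool, ex_mem_rd: int,
                   mem_wb_regwrite: bool, mem_wb_rd: int) -> int:
    """Return the 2-bit mux select for one ALU operand.

    src_reg is ID/EX.rs when computing fwdA and ID/EX.rt when computing fwdB.
    """
    # Most recent result first: the instruction now in the MEM stage.
    if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return 0b10
    # Otherwise the instruction now in the WB stage.
    if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return 0b01
    # No hazard: use the value read from the register file.
    return 0b00


# Example: the previous instruction writes register 2, which the current one reads as rs.
fwd_a = forward_select(2, True, 2, False, 0)
print(format(fwd_a, "02b"))
```

Register 0 is excluded because it always reads as zero, so a pending write to it must never be forwarded.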
Stalls and Performance
The BIG Picture
Flush these instructions (set control values to 0)
[Figure: pipeline diagrams with stages IF ID EX MEM WB for beq $1, $4, target, including cycles where beq is stalled]
1-bit: outcomes    T – N – T – N – N – T – N
       predictions T   T   N   T   N   N   T
2-bit: outcomes    T – N – T – N – N – T – N
       predictions T   T   T   T   T   N   T
2-bit: outcomes    T – N – T – N – N – N – T – N
       predictions T   T   T   T   T   N   N   N
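The prediction sequences above can be reproduced with two short simulations. This sketch makes assumptions the slides do not state: the 1-bit predictor starts out predicting taken, and the 2-bit predictor is modeled as a saturating counter (0 to 3, predicting taken at 2 or 3) that starts at 3. The function names are hypothetical.

```python
def one_bit_predictions(outcomes: str, start: str = "T") -> str:
    """1-bit predictor: predict whatever the branch did last time."""
    prediction = start
    result = []
    for actual in outcomes:
        result.append(prediction)
        prediction = actual            # remember the latest outcome
    return "".join(result)


def two_bit_predictions(outcomes: str, start: int = 3) -> str:
    """2-bit saturating counter: the prediction flips only after two misses in a row."""
    counter = start
    result = []
    for actual in outcomes:
        result.append("T" if counter >= 2 else "N")
        if actual == "T":
            counter = min(3, counter + 1)
        else:
            counter = max(0, counter - 1)
    return "".join(result)


print(one_bit_predictions("TNTNNTN"))
print(two_bit_predictions("TNTNNTN"))
print(two_bit_predictions("TNTNNNTN"))
```

Under these assumptions the three printed strings match the three prediction rows listed above.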
Calculating the Branch Target
Exception Properties
Restartable exceptions
Pipeline can flush the instruction
Handler executes, then returns to the instruction
Refetched and executed from scratch
PC saved in EPC register
Identifies causing instruction
Actually PC + 4 is saved
Handler must adjust
Multiple issue
Replicate pipeline stages ⇒ multiple pipelines
[Figure labels: hold pending operands; 72 physical registers]
FP is 5 stages longer
Up to 106 RISC-ops in progress
Bottlenecks
Complex instructions with long dependencies
Branch mispredictions
Memory access delays