CMP3010L07 Tomasulo
Dina Tantawy
Computer Engineering Department
Cairo University
Dynamic Multiple-Issue Processors
• Superscalar Processors
– An advanced pipelining technique that enables the processor to execute
more than one instruction per clock cycle; the hardware selects which
instructions to run together at execution time.
Dynamic Pipeline Scheduling
• Hardware support for reordering instruction execution to avoid stalls.
• The hardware chooses which instructions to execute in a given clock cycle
while trying to avoid hazards and stalls (out-of-order execution).
• Example (figure not reproduced): the subtract does not need to wait for the addu or the load.
Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
– Multiple functional units
– Write Result
– Commit unit
Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
– Fetches instructions, decodes them, and sends each instruction to the corresponding
functional unit for execution.
– If a reservation station is free (no structural hazard) and a reorder buffer slot is free, issue the instruction
and send the operands and the reorder buffer number for the destination (this stage is sometimes called “dispatch”).
– Source operand values are sent with the instruction if they are already in the registers.
– If source operands are not in the registers, rename registers (this eliminates WAR and WAW hazards)
and keep track of the functional units producing the operands.
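The issue step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `regs`, `producer`, `stations` and `issue` are hypothetical, the register values are the initial ones used in these slides (Fn holds n), and the reorder buffer slot check is left out to keep the sketch short.

```python
# Hypothetical names throughout; values mirror the examples in these slides.
regs = {"F2": 2, "F4": 4, "F6": 6}   # register file values
producer = {"F2": "L2"}              # register -> station that will produce it (pending)
stations = {"M1": None, "M2": None}  # None means the station is free

def issue(op, dest, src1, src2):
    """Place an instruction in a free station, or return None to stall."""
    tag = next((t for t, s in stations.items() if s is None), None)
    if tag is None:
        return None                  # structural hazard: no free station
    entry = {"op": op}
    for v, q, src in (("Vj", "Qj", src1), ("Vk", "Qk", src2)):
        pending = producer.get(src)
        entry[v] = None if pending else regs[src]  # copy the value if it is available
        entry[q] = pending           # otherwise remember the tag to watch on the bus
    stations[tag] = entry
    producer[dest] = tag             # rename: later readers of dest wait on this tag
    return tag
```

Issuing `MULT F0, F2, F4` with F2 still pending on L2 fills M1 with the value 4 and the tag L2, and a third multiply or divide stalls because both stations are taken.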
Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
– Multiple functional units (operate on operands (EX))
• Each functional unit has buffers, called reservation stations, that hold the
operands and the operation.
• If both operands are ready, execute.
• If not, watch the Common Data Bus for the result (this avoids RAW hazards).
Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
– Multiple functional units
– Write Results
• When the result is complete, it is sent over the common data bus to any reservation stations waiting for it
and to the commit unit, which buffers the result (in the reorder buffer) until it is safe to put the result
into the register file or, for a store, into memory.
• Write on the Common Data Bus to all units; mark the reservation station as available.
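A minimal sketch of the write-result and commit behaviour described above, assuming a reorder buffer kept in program order. The names `rob`, `waiting`, `write_result` and `commit` are hypothetical; the value 46 is the M1 result from the life-cycle example in these slides, and 7 is an arbitrary placeholder.

```python
from collections import deque

# Reorder buffer in program order; each entry is [tag, destination register, value or None].
rob = deque([["M1", "F0", None], ["A1", "F8", None]])
waiting = {"M2": {"Qj": "M1", "Vj": None}}   # a station waiting for M1's result
regs = {}

def write_result(tag, value):
    """Broadcast (tag, value) on the common data bus."""
    for st in waiting.values():
        if st["Qj"] == tag:                  # this station was listening for the tag
            st["Vj"], st["Qj"] = value, None
    for entry in rob:
        if entry[0] == tag:                  # buffer the result until commit
            entry[2] = value

def commit():
    """Retire finished entries from the head only, so registers update in program order."""
    while rob and rob[0][2] is not None:
        _, dest, value = rob.popleft()
        regs[dest] = value
```

If A1 finishes first, `commit()` does nothing because M1 is still at the head of the queue; once M1 writes 46, both entries retire in order.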
Tomasulo’s Algorithm
• FU buffers are called reservation stations; they hold pending operands
Reservation Station Components
• Busy
– Indicates that the reservation station is busy
• Op
– Operation to perform in the unit (e.g., + or –)
• Vj, Vk
– Values of the source operands
– Store buffers have a V field holding the result to be stored
• Qj, Qk
– Reservation stations producing the source operands (Qj, Qk = 0 means ready)
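The fields listed above map naturally onto a small record. The following is a sketch, not the lecture's own code: the class name and the use of `None` in place of the tag value 0 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    busy: bool = False
    op: Optional[str] = None      # operation to perform, e.g. "ADD" or "SUB"
    vj: Optional[float] = None    # operand values, valid when the matching Q is None
    vk: Optional[float] = None
    qj: Optional[str] = None      # tag of the producing station; None plays the role of 0
    qk: Optional[str] = None

    def ready(self) -> bool:
        """An instruction may start executing once neither operand is pending."""
        return self.busy and self.qj is None and self.qk is None
```

For example, a station holding a subtract whose second operand is still being loaded by L2 is not ready until that operand arrives and `qk` is cleared.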
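[Figure: Tomasulo datapath used in the following examples]
• Fetch & Decode unit feeding three groups of reservation stations (RS):
– Load RS: L1, L2, L3 (fields: busy, Address), feeding Mem; LD latency 2 cycles
– Add RS: A1, A2, A3 (fields: b, op, Vi, Vj, Qi, Qj), feeding the Add/Sub unit; Add/Sub latency 2 cycles
– Mult RS: M1, M2 (fields: b, op, Vi, Vj, Qi, Qj), feeding the Mult/div unit; Mult latency 10 cycles, Div latency 40 cycles
• Register file F1 to F10 with columns Val, Rs, O; initially Fn holds the value n, Rs is empty and O = 0
• Commit unit (reorder buffer)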
Life Cycle of one Instruction
Cycle#1
Fetch & Decode
Life Cycle of one Instruction: Clock Cycle 1
• MULT F0, F2, F4 is fetched and decoded; its issue is recorded as cycle 1.
Life Cycle of one Instruction: Clock Cycle 2
[Figure: datapath snapshots, two builds]
• Timing (issue, start, write): MUL 1, 2
• Mult RS: M1 busy, op M, operand values 2 and 23; M2 free. Load RS and Add RS are all free.
• Register file: F0 has Rs = M1, O = 0.
• First build: F4 has Rs = L1, O = 1 and the reorder buffer holds L1: 23, which supplies the operand value 23.
• Second build: the reorder buffer holds an entry for M1.
• Checks before execution starts:
1. Are all operands ready?
2. Is the Mult unit busy?
Fast forward to
Cycle#12
Write Result
Life Cycle of one Instruction: Clock Cycle 12
[Figure: datapath snapshots, five builds]
• Timing (issue, start, write): MUL 1, 2, 12
• Write-result steps:
1. Is the common data bus free?
2. Go to everyone listening to the bus.
• The result M1: 46 is placed on the bus and stored in the reorder buffer.
• M1 is then marked not busy, and the O field of F0 (Rs = M1) changes from 0 to 1.
Fast forward to sometime …
Life Cycle of one Instruction: Clock Cycle X
[Figure: datapath snapshots, two builds]
• Timing (issue, start, write): MUL 1, 2, 12
• Commit check:
1. Is the top of the queue ready to commit?
• Before commit: the reorder buffer holds M1: 46, and F0 has Rs = M1, O = 1.
• After commit: the reorder buffer entry is removed, and the Rs and O fields of F0 are cleared.
Another Example
• Program:
– LD F6, 34(R2)
– LD F2, 45(R3)
– MULT F0, F2, F4
– SUBD F8, F6, F2
– DIVD F10, F0, F6
– ADDD F6, F8, F2
• Latencies (clock cycles):
– LD: 2
– MULT: 10
– DIVD: 40
– ADDD, SUBD: 2
Clock Cycle 1
[Figure: datapath snapshots, two builds]
• Instruction queue shown in the snapshots: the six instructions of the example followed by a seventh, MULT F4, F1, F5.
• Timing (issue, start, write): LD F6: 1
• Load RS: L1 busy, address 34+R2; L2 and L3 free. Add RS and Mult RS are all free.
• Register file: F6 has Rs = L1, O = 0.
Clock Cycle 2
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2; LD F2: 2
• Load RS: L1 busy (34+R2); L2 busy (45+R3); L3 free
• Reorder buffer: L1
• Register file: F2 has Rs = L2; F6 has Rs = L1
Clock Cycle 3
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2; LD F2: 2; MULT: 3
• Load RS: L1 busy (34+R2); L2 busy (45+R3)
• Mult RS: M1 busy, op M, one operand value 4 (from F4), the other operand waiting on L2
• Reorder buffer: L1
• Register file: F0 has Rs = M1; F2 has Rs = L2; F6 has Rs = L1
• Note that we wrote L2 instead of the register value, because F2 is still being loaded.
Clock Cycle 4
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4; MULT: 3; SUBD: 4
• Load RS: L1 has written its result and its RS is freed; L2 (45+R3) started executing
• Add RS: A1 busy, op Sub, one operand V(L1), the other operand waiting on L2
• Mult RS: M1 busy, op M, operand value 4, waiting on L2
• Reorder buffer: L1:F6, L2
• Register file: F0 has Rs = M1; F2 has Rs = L2; F6 has Rs = L1 with O = 1 (set because the value is in the reorder buffer); F8 has Rs = A1
• Sub reads the L1 value from the reorder buffer.
Clock Cycle 5
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4; MULT: 3; SUBD: 4; DIVD: 5
• Load RS: L2 busy (45+R3)
• Add RS: A1 busy, op Sub, V(L1), waiting on L2
• Mult RS: M1 busy, op M, operand value 4, waiting on L2; M2 busy, op D, one operand V(L1), the other waiting on M1
• Reorder buffer: L2 (the L1 entry has been freed)
• Register file: F0 has Rs = M1; F2 has Rs = L2; F6 now holds V(L1) with Rs cleared and O = 0; F8 has Rs = A1; F10 has Rs = M2
Clock Cycle 6
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3; SUBD: 4; DIVD: 5; ADDD: 6
• Load RS: all free (L2 has written its result)
• Add RS: A1 busy, op Sub, V(L1), V(L2); A2 busy, op Add, one operand V(L2), the other waiting on A1
• Mult RS: M1 busy, op M, V(L2), 4; M2 busy, op D, V(L1), waiting on M1
• Reorder buffer: L2:F2
• Register file: F0 has Rs = M1; F2 has Rs = L2, O = 1; F6 has Rs = A2; F8 has Rs = A1; F10 has Rs = M2
• Now Sub is ready; it will start executing next cycle.
• Mult is also ready; what will happen next cycle?
Clock Cycle 7
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7; SUBD: 4, 7; DIVD: 5; ADDD: 6
• The seventh instruction (MULT F4, F1, F5) cannot issue: no place in the Mult RS, so it stalls.
• Add RS: A1 busy, op Sub, V(L1), V(L2); A2 busy, op Add, V(L2), waiting on A1
• Mult RS: M1 busy, op M, V(L2), 4; M2 busy, op D, V(L1), waiting on M1
• Reorder buffer: A1, M1
• Register file: F2 has Rs cleared and O = 0; F0 has Rs = M1; F6 has Rs = A2; F8 has Rs = A1; F10 has Rs = M2
Clock Cycle 8
• No change from cycle 7: MULT and SUBD are executing, and the seventh instruction is still stalled.
Clock Cycle 9
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7; SUBD: 4, 7, 9; DIVD: 5; ADDD: 6
• The seventh instruction is still stalled (no place in the Mult RS).
• Add RS: A1 freed; A2 busy, op Add, V(A1), V(L2)
• Mult RS: M1 busy, op M, V(L2), 4; M2 busy, op D, V(L1), waiting on M1
• Reorder buffer: A1:F8, M1
• Register file: F0 has Rs = M1; F6 has Rs = A2; F8 has Rs = A1, O = 1; F10 has Rs = M2
Clock Cycle 10
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7; SUBD: 4, 7, 9; DIVD: 5; ADDD: 6, 10
• The seventh instruction is still stalled (no place in the Mult RS).
• Add RS: A2 busy, op Add, V(A1), V(L2), now executing
• Mult RS: M1 busy, op M, V(L2), 4; M2 busy, op D, V(L1), waiting on M1
• Reorder buffer: A2, A1:F8, M1
• Register file: F0 has Rs = M1; F6 has Rs = A2; F8 has Rs = A1, O = 1; F10 has Rs = M2
Clock Cycle 11
• No change from cycle 10.
Clock Cycle 12
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7; SUBD: 4, 7, 9; DIVD: 5; ADDD: 6, 10, 12
• The seventh instruction is still stalled (no place in the Mult RS).
• Add RS: all free (A2 has written its result)
• Mult RS: M1 busy, op M, V(L2), 4; M2 busy, op D, V(L1), waiting on M1
• Reorder buffer: A1:F8, A2:F6, M1
• Register file: F0 has Rs = M1; F6 has Rs = A2, O = 1; F8 has Rs = A1, O = 1; F10 has Rs = M2
Clock Cycle 13
• No change from cycle 12.
Fast forward to
clock#17
Write Result
Clock Cycle 17
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7, 17; SUBD: 4, 7, 9; DIVD: 5; ADDD: 6, 10, 12
• Mult RS: M1 freed; M2 busy, op D, V(M1), V(L1)
• Reorder buffer: M1:F0, A1:F8, A2:F6
• Register file: F0 has Rs = M1, O = 1; F6 has Rs = A2, O = 1; F8 has Rs = A1, O = 1; F10 has Rs = M2
• M1 is now free, so the stalled MULT can be loaded next cycle.
Clock Cycle 18
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7, 17; SUBD: 4, 7, 9; DIVD: 5, 18; ADDD: 6, 10, 12; MULT F4: 18
• Mult RS: M1 busy, op M, operand values 1 and 5 (from F1 and F5); M2 busy, op D, V(M1), V(L1), now executing
• Reorder buffer: M1:F0, A1:F8, A2:F6, M2
• Register file: F0 holds V(M1) with O = 0; F4 has Rs = M1; F6 has Rs = A2, O = 1; F8 has Rs = A1, O = 1; F10 has Rs = M2
• M1 is ready; can it execute?
Clock Cycle 19
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7, 17; SUBD: 4, 7, 9; DIVD: 5, 18; ADDD: 6, 10, 12; MULT F4: 18
• Mult RS: M1 busy, op M, 1, 5; M2 busy, op D, V(M1), V(L1)
• Reorder buffer: A1:F8, A2:F6, M2
• A1 can now be written to the register file: F8 holds V(A1) with O = 0.
• Register file: F0 holds V(M1); F4 has Rs = M1; F6 has Rs = A2, O = 1; F10 has Rs = M2
Clock Cycle 20
[Figure: datapath snapshot]
• Timing unchanged from cycle 19.
• Reorder buffer: A2:F6, M2
• Why is A2 still in the reorder buffer?
Fast forward to
clock#58
Write Result
Clock Cycle 58
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7, 17; SUBD: 4, 7, 9; DIVD: 5, 18, 58; ADDD: 6, 10, 12; MULT F4: 18, 58
• Mult RS: M1 busy, op M, 1, 5; M2 freed
• Reorder buffer: M2:F10, A2:F6, M1
• Register file: F0 holds V(M1); F4 has Rs = M1; F6 has Rs = A2, O = 1; F8 holds V(A1); F10 has Rs = M2, O = 1
• M1 is ready; can it execute?
Fast forward to
clock#68
Write Result
Clock Cycle 68
[Figure: datapath snapshot]
• Timing (issue, start, write): LD F6: 1, 2, 4; LD F2: 2, 4, 6; MULT: 3, 7, 17; SUBD: 4, 7, 9; DIVD: 5, 18, 58; ADDD: 6, 10, 12; MULT F4: 18, 58, 68
• All reservation stations are free.
• Reorder buffer: M1:F4
• Register file: F0 holds V(M1); F4 has Rs = M1, O = 1; F6 holds V(A2); F8 holds V(A1); F10 holds V(M2)
Important Notes
• If an instruction is fetched (issued) at cycle #1 and takes 4 cycles to execute, it
starts execution at cycle #2, finishes at the end of cycle #5, and
writes back at cycle #6.
• The next instruction that requires the same FU will start at #6.
• The next instruction that requires the same RS will use it at #7.
• The next instruction that requires the data produced in Exe will use it at #7
(no forwarding).
• Execution units are non-pipelined unless stated otherwise.
• Only one execution unit is available per function unless stated otherwise.
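The timing rules above can be checked mechanically. The sketch below is an illustration under assumptions, not part of the lecture: the names `schedule`, `PROGRAM`, `UNIT` and `STATIONS` are hypothetical, one instruction issues per cycle in program order, and each unit is granted to instructions in program order (a simplification that happens to fit this example). Latencies and station counts come from the slides.

```python
# Latencies and station counts from the slides; one non-pipelined unit per class.
LATENCY = {"LD": 2, "MULT": 10, "DIVD": 40, "ADDD": 2, "SUBD": 2}
UNIT = {"LD": "load", "MULT": "mul", "DIVD": "mul", "ADDD": "add", "SUBD": "add"}
STATIONS = {"load": 3, "add": 3, "mul": 2}

PROGRAM = [                      # (op, destination, sources...)
    ("LD", "F6", "R2"),
    ("LD", "F2", "R3"),
    ("MULT", "F0", "F2", "F4"),
    ("SUBD", "F8", "F6", "F2"),
    ("DIVD", "F10", "F0", "F6"),
    ("ADDD", "F6", "F8", "F2"),
    ("MULT", "F4", "F1", "F5"),
]

def schedule(program):
    """Return one (issue, start, write) tuple per instruction."""
    reg_ready = {}                        # register -> first cycle its new value is usable
    unit_free = {u: 0 for u in STATIONS}  # unit -> earliest cycle its next operation may start
    held = {u: [] for u in STATIONS}      # unit -> write cycles of instructions holding a station
    rows, prev_issue = [], 0
    for op, dest, *srcs in program:
        unit = UNIT[op]
        issue = prev_issue + 1            # assumed: in-order issue, one instruction per cycle
        # A station stays occupied through its write cycle; stall until one is free.
        while sum(w >= issue for w in held[unit]) >= STATIONS[unit]:
            issue += 1
        operands = max(reg_ready.get(s, 0) for s in srcs)
        start = max(issue + 1, operands, unit_free[unit])
        write = start + LATENCY[op]
        reg_ready[dest] = write + 1       # no forwarding: usable the cycle after write
        unit_free[unit] = write           # next user of the unit may start at the write cycle
        held[unit].append(write)
        rows.append((issue, start, write))
        prev_issue = issue
    return rows
```

For `PROGRAM`, `schedule` returns (1, 2, 4), (2, 4, 6), (3, 7, 17), (4, 7, 9), (5, 18, 58), (6, 10, 12) and (18, 58, 68), which are the issue, start and write values shown in the cycle 68 snapshot.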
Tomasulo’s Summary
• Prevents the register file from becoming a bottleneck
• Lasting contributions
– Dynamic scheduling
– Register renaming
– Load/store buffers
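Tomasulo’s Summary
• Final timing of the example (issue, start, write):
– LD F6, 34(R2): 1, 2, 4
– LD F2, 45(R3): 2, 4, 6
– MULT F0, F2, F4: 3, 7, 17
– SUBD F8, F6, F2: 4, 7, 9
– DIVD F10, F0, F6: 5, 18, 58
– ADDD F6, F8, F2: 6, 10, 12
– MULT F4, F1, F5: 18, 58, 68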
Thank you