CMP3010L07 Tomasulo

The document discusses dynamic multiple-issue processors, particularly focusing on superscalar processors and their ability to execute multiple instructions per clock cycle. It explains dynamic pipeline scheduling, the components involved, and introduces Tomasulo's Algorithm for out-of-order execution, which helps avoid data hazards. The document also details the life cycle of an instruction within this architecture, emphasizing the importance of reservation stations and register renaming.

Uploaded by

Mostafa Mohamed

CMP3010: Computer Architecture

L06: Multiple Issue Techniques

Dina Tantawy
Computer Engineering Department
Cairo University
Dynamic Multiple-Issue Processors
• Superscalar Processors
– An advanced pipelining technique that enables the processor to execute
more than one instruction per clock cycle by selecting them during
execution.

• In the simplest superscalar processors, instructions issue in order, and the processor decides whether zero, one, or more instructions can issue in a given clock cycle.
Difference between Simple Superscalar
and VLIW Processor
• In a superscalar processor, the code is guaranteed by the hardware to execute correctly.
• Compiled code will always run correctly, independent of the issue rate or pipeline structure of the processor.
• In some VLIW designs this was not the case, and recompilation was required when moving across different processor models.
In VLIW, the compiler does the scheduling. How can we do this in a superscalar processor?
Dynamic Pipeline Scheduling
• Hardware support for reordering instruction execution to avoid stalls.
• Chooses which instructions to execute in a given clock cycle while trying to avoid hazards and stalls (out-of-order execution).

The subtract does not need to wait for the addu or the load.
Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
– Multiple functional units
– Write Result
– Commit unit
Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
• Fetches instructions, decodes them, and sends each instruction to a corresponding functional unit for execution.
• If a reservation station is free (no structural hazard) and a reorder buffer slot is free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called “dispatch”).
• Source operand values are sent with the instruction if they are already in the registers.
• If the reservation stations or the reorder buffer are full, the instruction stalls.
• If source operands are not yet in the registers, rename registers (eliminating WAR and WAW hazards) and keep track of the functional units that will produce the operands.
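The issue step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the dictionaries `regs`, `reg_status`, and `stations` and the function `issue` are hypothetical names, only the two multiply stations are modelled, and the reorder buffer check is left out for brevity.

```python
# Register values (Fi = i) and register status: which station will produce a register.
regs = {f"F{i}": float(i) for i in range(11)}
reg_status = {"F2": "L2"}                        # assumption: F2 is still being loaded by L2

stations = {"M1": None, "M2": None}              # None means the station is free

def issue(op, dest, src1, src2):
    """Try to issue one instruction; return the station name, or None on a stall."""
    free = next((name for name, entry in stations.items() if entry is None), None)
    if free is None:
        return None                              # structural hazard: no free station
    entry = {"op": op, "V": [None, None], "Q": [None, None]}
    for slot, src in enumerate((src1, src2)):
        if src in reg_status:                    # value not produced yet: keep the producer's tag
            entry["Q"][slot] = reg_status[src]
        else:                                    # value available: copy it into the station
            entry["V"][slot] = regs[src]
    stations[free] = entry
    reg_status[dest] = free                      # rename: later readers of dest wait on this station
    return free

first = issue("MULT", "F0", "F2", "F4")          # takes M1, waits on L2 for F2
second = issue("DIVD", "F10", "F0", "F6")        # takes M2, waits on M1 for F0
third = issue("MULT", "F4", "F1", "F5")          # no free station: stalls
```

Because a stalled instruction returns before touching `reg_status`, it does not rename its destination until it actually gets a station.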
Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
– Multiple functional units (operate on operands (EX))
• Each functional unit has buffers, called reservation stations, that hold the operands and the operation.
• If both operands are ready, execute.
• If an operand is not ready, watch the Common Data Bus for the result (this avoids RAW hazards).
Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
– Multiple functional units
– Write Results
• When the result is completed, it is sent on the common data bus to any reservation stations waiting for this particular result as well as to the commit unit, which buffers the result (in the reorder buffer) until it is safe to put the result into the register file or, for a store, into memory.
• Write on the Common Data Bus to all units; mark the reservation station available.
• Common Data Bus: data + source tag; it broadcasts to all listeners.
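A minimal sketch of the write-result broadcast, assuming each reservation station is a small dictionary with value slots `V` and tag slots `Q`. The function name `broadcast`, the dictionary layout, and the value 2.5 are illustrative assumptions, not part of the slides.

```python
def broadcast(tag, value, stations, reorder_buffer):
    """Put (tag, value) on the common data bus; every listener waiting on tag captures value."""
    for entry in stations.values():
        if entry is None:
            continue
        for slot in (0, 1):
            if entry["Q"][slot] == tag:          # this operand was waiting for the producer
                entry["V"][slot], entry["Q"][slot] = value, None
    reorder_buffer[tag] = value                  # buffered until it is safe to commit
    stations[tag] = None                         # the producing station becomes available

stations = {
    "L2": {"op": "LD", "V": [None, None], "Q": [None, None]},
    "M1": {"op": "MULT", "V": [None, 4.0], "Q": ["L2", None]},
    "A1": {"op": "SUBD", "V": [6.0, None], "Q": [None, "L2"]},
}
reorder_buffer = {}
broadcast("L2", 2.5, stations, reorder_buffer)   # 2.5 is a made-up loaded value
```

Both waiting stations pick up the value in the same step, which is the point of a broadcast bus: the producer does not need to know who its consumers are.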

Dynamic Pipeline Scheduling
• The pipeline is divided into 4 major units:
– Instruction fetch and issue unit
– Multiple functional units
– Write results
– Commit unit
• The unit that decides when it is safe to release the result of an operation to
programmer-visible registers and memory.
• The unit is used to make sure that the results of all instructions will be written
in the same order that instructions are fetched.
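In-order commit can be sketched with a queue. This is a hedged illustration: `commit_one`, the entry fields, and the numeric values are assumptions, and retiring at most one entry per call (one per cycle) is a simplification chosen for this sketch.

```python
from collections import deque

def commit_one(rob, regs):
    """Retire the entry at the head of the reorder buffer if its result is present."""
    if rob and rob[0]["done"]:
        entry = rob.popleft()
        regs[entry["dest"]] = entry["value"]     # result becomes programmer-visible
        return entry["tag"]
    return None                                  # head not finished: nothing younger may commit

regs = {"F6": 6.0, "F10": 10.0}
rob = deque([
    {"tag": "M2", "dest": "F10", "done": False, "value": None},  # older, still executing
    {"tag": "A2", "dest": "F6", "done": True, "value": 8.0},     # younger, finished (8.0 is made up)
])

first = commit_one(rob, regs)         # None: A2 must wait behind M2
rob[0].update(done=True, value=0.5)   # the older result arrives (0.5 is made up)
second = commit_one(rob, regs)        # retires M2
third = commit_one(rob, regs)         # now A2 can retire
```

A finished younger instruction stays in the buffer until every older instruction has committed, which keeps register and memory updates in fetch order.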
Tomasulo’s Algorithm
• It is a computer architecture hardware algorithm for dynamic
scheduling of instructions that allows out-of-order execution,
designed to efficiently utilize multiple execution units.
• Developed for IBM (1966)
• Goal: High Performance without special compilers
Tomasulo’s Algorithm
• Tracks when operands are available to satisfy data dependences.
• Removes name dependences through register renaming.
• Very similar to what is used today: Almost all modern high-
performance processors use a derivative of Tomasulo’s… much of
the terminology survives to today.
Tomasulo’s Algorithm
• Avoid RAW hazards
– Execute an instruction only when its operands are available
– Has a scheme to track when operands are available
• Avoid WAR and WAW hazards
– Supports register renaming
– Renames all destination registers: an out-of-order write does not affect any instruction that depends on an earlier value of an operand

DIVD F0, F2, F4
ADDD F6, F0, F8
SD   F6, 0(R1)
SUBD F8, F10, F14
MULD F6, F10, F8
Tomasulo’s Algorithm
• Renaming the same code with temporary registers S and T removes the WAR and WAW hazards:

Hazard      Original              Renamed
            DIVD F0, F2, F4       DIVD F0, F2, F4
            ADDD F6, F0, F8       ADDD S, F0, F8
            SD   F6, 0(R1)        SD   S, 0(R1)
WAR         SUBD F8, F10, F14     SUBD T, F10, F14
WAR & WAW   MULD F6, F10, F8      MULD F6, F10, T
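The hazards in this code sequence can be checked mechanically. The sketch below is an assumption-laden illustration: the `Instr` tuple and `find_hazards` are hypothetical helpers, and the check is a simple pairwise comparison that is sufficient for this short sequence.

```python
from typing import NamedTuple, Optional, Tuple

class Instr(NamedTuple):
    op: str
    dest: Optional[str]           # None for a store, which writes no register
    srcs: Tuple[str, ...]

def find_hazards(program):
    """Return a set of (kind, earlier index, later index, register) tuples."""
    hazards = set()
    for i, early in enumerate(program):
        for j in range(i + 1, len(program)):
            late = program[j]
            if early.dest is not None and early.dest in late.srcs:
                hazards.add(("RAW", i, j, early.dest))
            if late.dest is not None and late.dest in early.srcs:
                hazards.add(("WAR", i, j, late.dest))
            if early.dest is not None and early.dest == late.dest:
                hazards.add(("WAW", i, j, early.dest))
    return hazards

original = [
    Instr("DIVD", "F0", ("F2", "F4")),
    Instr("ADDD", "F6", ("F0", "F8")),
    Instr("SD", None, ("F6", "R1")),
    Instr("SUBD", "F8", ("F10", "F14")),
    Instr("MULD", "F6", ("F10", "F8")),
]
renamed = [
    Instr("DIVD", "F0", ("F2", "F4")),
    Instr("ADDD", "S", ("F0", "F8")),
    Instr("SD", None, ("S", "R1")),
    Instr("SUBD", "T", ("F10", "F14")),
    Instr("MULD", "F6", ("F10", "T")),
]

before = find_hazards(original)
after = find_hazards(renamed)
name_hazards_left = {h for h in after if h[0] != "RAW"}
```

After renaming, only the true (RAW) dependences remain; these are the ones the hardware must still wait for.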
Tomasulo’s Algorithm
• FU buffers are called reservation stations; they hold pending operands
• Registers in instructions are replaced by values or pointers to reservation stations (RS); this is called register renaming
– Avoids WAR and WAW hazards
• A Common Data Bus broadcasts results to all FUs
• Loads and stores are treated as FUs with reservation stations as well
Reservation Station Components
• Busy
– Indicates that the reservation station is busy
• Op
– Operation to perform in the unit (e.g., + or –)
• Vj, Vk
– Values of the source operands
– Store buffers have a V field with the result to be stored
• Qj, Qk
– Reservation stations producing the source operands (Qj, Qk = 0 => ready)
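The fields listed above map naturally onto a small record type. The following is a sketch under assumptions: the class name, the use of `None` in place of "Q = 0", and the `ready` helper are illustrative choices rather than anything prescribed by the slides.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ReservationStation:
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None    # value of the first source operand
    vk: Optional[float] = None    # value of the second source operand
    qj: Optional[str] = None      # station producing the first operand (None plays the role of 0)
    qk: Optional[str] = None      # station producing the second operand

    def ready(self) -> bool:
        """A busy station may start executing once neither operand is pending."""
        return self.busy and self.qj is None and self.qk is None

# A multiply whose first operand is still being loaded by station L2.
m1 = ReservationStation("M1", busy=True, op="MULT", vk=4.0, qj="L2")
waiting = m1.ready()

# The load result arrives (2.5 is a made-up value): store it and clear the tag.
m1.vj, m1.qj = 2.5, None
runnable = m1.ready()
```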
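[Figure: the example machine. Fetch & Decode feeds three groups of reservation stations: Load RS (L1–L3; fields: busy, Address) in front of the memory unit (LD → 2 cycles), Add RS (A1–A3; fields: b, op, Vi, Vj, Qi, Qj) in front of the Add/Sub unit (Add/Sub → 2 cycles), and Mult RS (M1–M2; same fields) in front of the Mult/div unit (Mult → 10 cycles, Div → 40 cycles). The register file holds F1–F10 with columns Val, Rs, and O; initially each Fi holds the value i, Rs is empty, and O = 0. Results pass through the Commit Unit (reorder buffer).]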
Life Cycle of one Instruction

Life Cycle of one Instruction: Clock Cycle 1 (Fetch & Decode, Issue)
• Instruction: MULT F0, F2, F4
• Initial state: all reservation stations are free; F4 is tagged Rs = L1 with O = 1, and the reorder buffer holds L1: 23.
1. Is there a place in the relevant reservation station and the reorder buffer? Yes: M1 is free, so M1 becomes busy with op = M.
2. Get the operands from the register file or the reorder buffer: Vi = 2 (F2, from the register file) and Vj = 23 (F4, from reorder buffer entry L1).
3. Rename the destination register: F0 is tagged Rs = M1.
Timeline (issue/start/write): MUL 1/-/-
Life Cycle of one Instruction: Clock Cycle 2 (Execute Phase)
1. Are all operands ready? Yes: M1 holds Vi = 2 and Vj = 23.
2. Is the Mult unit busy? No, so M1 starts executing on the Mult/div unit.
Timeline (issue/start/write): MUL 1/2/-
Life Cycle of one Instruction: fast forward to Clock Cycle 12 (Write Result)
• The multiply (10 cycles) has finished with the result M1: 46.
1. Is the common data bus free? Yes.
2. Go to everyone listening to the bus: the result M1: 46 is placed in the reorder buffer, reservation station M1 is freed, and the O flag of F0 (Rs = M1) is set to 1.
Timeline (issue/start/write): MUL 1/2/12
Life Cycle of one Instruction: fast forward to some later Clock Cycle X (Commit)
1. Is the top of the queue ready to commit? Yes: M1: 46 is at the head of the reorder buffer.
• The result is written to F0, the Rs tag and O flag of F0 are cleared, and the entry leaves the reorder buffer.
Timeline (issue/start/write): MUL 1/2/12
Another Example
LD   F6, 34(R2)
LD   F2, 45(R3)
MULT F0, F2, F4
SUBD F8, F6, F2
DIVD F10, F0, F6
ADDD F6, F8, F2

Latencies (clock cycles)
LD          2
MULT        10
DIVD        40
ADDD, SUBD  2
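The issue/start/write times of this example can be reproduced with a small scheduling sketch. It is a simplification built on stated assumptions, not a full simulator: one instruction issues per cycle in program order, instructions claim their functional unit in program order, bus contention and load addresses are ignored, and a seventh instruction (MULT F4, F1, F5, which appears in the cycle-by-cycle slides) is appended. The names `schedule`, `UNIT`, and `STATIONS` are hypothetical.

```python
LATENCY = {"LD": 2, "MULT": 10, "DIVD": 40, "ADDD": 2, "SUBD": 2}
UNIT = {"LD": "load", "MULT": "mult", "DIVD": "mult", "ADDD": "add", "SUBD": "add"}
STATIONS = {"load": 3, "add": 3, "mult": 2}       # reservation stations per unit

PROGRAM = [
    ("LD", "F6", ()),
    ("LD", "F2", ()),
    ("MULT", "F0", ("F2", "F4")),
    ("SUBD", "F8", ("F6", "F2")),
    ("DIVD", "F10", ("F0", "F6")),
    ("ADDD", "F6", ("F8", "F2")),
    ("MULT", "F4", ("F1", "F5")),
]

def schedule(program):
    rs_free = {u: [1] * n for u, n in STATIONS.items()}   # cycle at which each station is reusable
    fu_free = {u: 1 for u in STATIONS}                     # cycle at which each unit can start new work
    ready_at = {}                                          # register -> first cycle a consumer may start
    rows, prev_issue = [], 0
    for op, dest, srcs in program:
        unit = UNIT[op]
        slot = min(range(len(rs_free[unit])), key=rs_free[unit].__getitem__)
        issue = max(prev_issue + 1, rs_free[unit][slot])   # stall until a station is free
        operands = max((ready_at.get(s, 0) for s in srcs), default=0)
        start = max(issue + 1, operands, fu_free[unit])
        write = start + LATENCY[op]
        rs_free[unit][slot] = write + 1                    # station reusable the cycle after write
        fu_free[unit] = write                              # unit reusable in the write cycle
        ready_at[dest] = write + 1                         # no forwarding: usable one cycle later
        rows.append((op, issue, start, write))
        prev_issue = issue
    return rows

table = schedule(PROGRAM)
for op, issue, start, write in table:
    print(f"{op:5} {issue:3} {start:3} {write:3}")
```

Looking up the source registers before recording the new destination is what stands in for renaming here: DIVD reads the F6 produced by the first load even though ADDD later overwrites F6.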

Clock Cycle 1
• Initial state: all reservation stations (L1–L3, A1–A3, M1–M2) are free, the reorder buffer is empty, and each register Fi holds the value i with no Rs tag and O = 0. The instruction stream is the six instructions above followed by MULT F4, F1, F5.
• LD F6, 34(R2) issues to L1 (busy = Yes, Address = 34+R2); F6 is tagged Rs = L1.
Timeline (issue/start/write): LD F6 1/-/-
Clock Cycle 2
• The first LD (L1) starts executing in the memory unit.
• LD F2, 45(R3) issues to L2 (busy = Yes, Address = 45+R3); F2 is tagged Rs = L2.
Timeline (issue/start/write): LD F6 1/2/-; LD F2 2/-/-
Clock Cycle 3
• MULT F0, F2, F4 issues to M1 (op = M): F4 is read from the register file (Vj = 4), while F2 is still being loaded, so the station records the tag Qi = L2. Note that we wrote L2 instead of the register value.
• F0 is tagged Rs = M1.
Timeline (issue/start/write): LD F6 1/2/-; LD F2 2/-/-; MULT F0 3/-/-
Clock Cycle 4
• The first LD writes its result: reservation station L1 is freed, the reorder buffer holds L1:F6, and the O flag of F6 is set to 1 (its value is in the reorder buffer).
• L2 starts executing in the memory unit.
• SUBD F8, F6, F2 issues to A1 (op = Sub): it reads the L1 value, Vi = V(L1), and waits on L2 for F2 (Qj = L2). F8 is tagged Rs = A1.
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/-; MULT F0 3/-/-; SUBD 4/-/-
Clock Cycle 5
• L1:F6 commits: F6 now holds V(L1), its Rs tag and O flag are cleared, and the reorder buffer entry is freed.
• DIVD F10, F0, F6 issues to M2 (op = D): Qi = M1 for F0 and Vj = V(L1) for F6. F10 is tagged Rs = M2.
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/-; MULT F0 3/-/-; SUBD 4/-/-; DIVD 5/-/-
Clock Cycle 6
• The second LD writes its result: L2 is freed, the reorder buffer holds L2:F2, and the O flag of F2 is set to 1. A1 (Sub) now holds V(L1) and V(L2), and M1 (Mult) now holds V(L2) and 4.
• ADDD F6, F8, F2 issues to A2 (op = Add): Qi = A1 for F8 and Vj = V(L2) for F2. F6 is tagged Rs = A2.
• Sub is now ready and will start executing next cycle. Mult is also ready; what will happen next cycle?
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/-/-; SUBD 4/-/-; DIVD 5/-/-; ADDD 6/-/-
Clock Cycles 7 and 8
• Cycle 7: SUBD (A1) starts on the Add/Sub unit and MULT (M1) starts on the Mult/div unit. L2:F2 has left the reorder buffer, so F2 has no Rs tag and O = 0.
• MULT F4, F1, F5 cannot issue: there is no place in the Mult RS (M1 and M2 are both busy), so it stalls.
• Cycle 8: no change; both units are still executing and the stall continues.
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/-; SUBD 4/7/-; DIVD 5/-/-; ADDD 6/-/-
Clock Cycle 9
• SUBD writes its result: A1 is freed, the reorder buffer holds A1:F8, and the O flag of F8 is set to 1. A2 (Add) captures the value and now holds V(A1) and V(L2).
• MULT F4, F1, F5 is still stalled (no place in the Mult RS).
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/-; SUBD 4/7/9; DIVD 5/-/-; ADDD 6/-/-
Clock Cycles 10 and 11
• Cycle 10: ADDD (A2) starts on the Add/Sub unit. The reorder buffer still holds A1:F8.
• Cycle 11: no change; MULT F4, F1, F5 is still stalled.
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/-; SUBD 4/7/9; DIVD 5/-/-; ADDD 6/10/-
Clock Cycles 12 and 13
• Cycle 12: ADDD writes its result: A2 is freed, the reorder buffer holds A1:F8 and A2:F6, and the O flag of F6 is set to 1.
• Cycle 13: no change; M1 is still executing and MULT F4, F1, F5 is still stalled.
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/-; SUBD 4/7/9; DIVD 5/-/-; ADDD 6/10/12
Fast forward to Clock Cycle 17 (Write Result)
• MULT F0 writes its result: the reorder buffer holds M1:F0, A1:F8, and A2:F6, and the O flag of F0 is set to 1. M2 (Div) captures the value and now holds V(M1) and V(L1).
• M1 is now free, so MULT F4, F1, F5 can be loaded next cycle.
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/17; SUBD 4/7/9; DIVD 5/-/-; ADDD 6/10/12
Clock Cycle 18
• F0 now holds V(M1), with its Rs tag and O flag cleared.
• DIVD (M2) starts on the Mult/div unit.
• MULT F4, F1, F5 issues to M1 with Vi = 1 and Vj = 5; F4 is tagged Rs = M1.
• M1 is ready; can it execute?
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/17; SUBD 4/7/9; DIVD 5/18/-; ADDD 6/10/12; MULT F4 18/-/-
Clock Cycle 19
• A1 can now be written to the register file: F8 holds V(A1), with its Rs tag and O flag cleared.
• A2:F6 remains in the reorder buffer.
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/17; SUBD 4/7/9; DIVD 5/18/-; ADDD 6/10/12; MULT F4 18/-/-
Clock Cycle 20
• The reorder buffer now holds only A2:F6. Why is A2 still in the reorder buffer?
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/17; SUBD 4/7/9; DIVD 5/18/-; ADDD 6/10/12; MULT F4 18/-/-
Fast forward to Clock Cycle 58 (Write Result)
• DIVD writes its result: M2 is freed, the reorder buffer holds M2:F10 and A2:F6, and the O flag of F10 is set to 1.
• M1 is ready; can it execute? The timeline shows MULT F4 starting on the Mult/div unit in this cycle.
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/17; SUBD 4/7/9; DIVD 5/18/58; ADDD 6/10/12; MULT F4 18/58/-
Fast forward to Clock Cycle 68 (Write Result)
• MULT F4 writes its result: M1 is freed, the reorder buffer holds M1:F4, and the O flag of F4 is set to 1.
• By now M2:F10 and A2:F6 have committed: F10 holds V(M2) and F6 holds V(A2).
Timeline (issue/start/write): LD F6 1/2/4; LD F2 2/4/6; MULT F0 3/7/17; SUBD 4/7/9; DIVD 5/18/58; ADDD 6/10/12; MULT F4 18/58/68
Important Notes
• If an instruction is fetched at cycle #1 and takes 4 cycles to execute, it starts execution at cycle #2, finishes at the end of cycle #5, and writes back at cycle #6.
• The next instruction that requires the same FU will start at #6.
• The next instruction that requires the same RS will use it at #7.
• The next instruction that requires the data produced by the execution will use it at #7 (no forwarding).
• Execution units are non-pipelined unless stated otherwise.
• Only one execution unit is available per function unless stated otherwise.
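The timing rules in these notes reduce to a few additions. The helper below is a minimal sketch, assuming the operands and the functional unit are already available when the instruction is fetched; the function name `timing` and the dictionary keys are hypothetical.

```python
def timing(fetch_cycle, latency):
    """Apply the notes' timing rules to one instruction with no operand or unit conflicts."""
    start = fetch_cycle + 1            # execution begins the cycle after fetch/issue
    finish = start + latency - 1       # last cycle spent in the execution unit
    write = finish + 1                 # write-back happens the following cycle
    return {
        "start": start,
        "finish": finish,
        "write": write,
        "fu_free": write,              # next user of the same FU may start here
        "rs_free": write + 1,          # next user of the same RS may take it here
        "data_usable": write + 1,      # dependent instruction may start here (no forwarding)
    }

example = timing(1, 4)
first_load = timing(1, 2)              # the first LD of the example: issue 1, start 2, write 4
print(example)
```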
Tomasulo’s Summary
• Prevents the register file from becoming a bottleneck
• Avoids the different data hazards
• Lasting contributions
– Dynamic scheduling
– Register renaming
– Load/store buffers

Performance is limited by the Common Data Bus. Why?
Tomasulo’s Summary
• Without re-order buffer
– In-order issue, out-of-order execution, and out-of-order completion
• What will happen in case of a control hazard?
• Tomasulo with a re-order buffer is called speculative Tomasulo
• What is the speedup of this processor compared to a similar architecture without dynamic scheduling?

        issue  start  write
LD        1      2      4
LD        2      4      6
MUL       3      7     17
SUBD      4      7      9
DIV       5     18     58
ADD       6     10     12
MUL      18     58     68
Four Stages of Tomasulo’s Algorithm
1. Issue: get an instruction from the FP Op Queue
If a reservation station and a reorder buffer slot are free, issue the instruction and send the operands and the reorder buffer number for the destination (this stage is sometimes called “dispatch”).
2. Execution: operate on operands (EX)
When both operands are ready, execute; if not ready, watch the CDB for the result. Waiting until both operands are in the reservation station is what checks RAW hazards (this stage is sometimes called “issue”).
3. Write result: finish execution (WB)
Write on the Common Data Bus to all awaiting FUs and the reorder buffer; mark the reservation station available.
4. Commit: update the register with the reorder buffer result
When the instruction is at the head of the reorder buffer and its result is present, update the register with the result (or store to memory) and remove the instruction from the reorder buffer. A mispredicted branch flushes the reorder buffer (this stage is sometimes called “graduation”).
Speculative Tomasulo’s Algorithm
Important Terms
Exercise
• LD F1, 0(R2) // LD → 3 cycles
• ADD F2, F3, F1 // ALU → 1 cycle
• SUB F2, F4, F5
• XOR F4, F2, F1
• SW F1, 4(R1) // SW → 3 cycles
Thank you

From Everand
Projects With Microcontrollers And PICC
Guillermo Perez Guillen
5/5 (1)
C Programming
From Everand
C Programming
Netra
No ratings yet
CMP3010L03 Pipelining
No ratings yet
CMP3010L03 Pipelining
42 pages
Pattern Classification 08. Gaussian Mixture Model: Abdelmoniem Bayoumi, PHD
No ratings yet
Pattern Classification 08. Gaussian Mixture Model: Abdelmoniem Bayoumi, PHD
12 pages
05 Density Estimation
No ratings yet
05 Density Estimation
29 pages
02 Training Patterns
No ratings yet
02 Training Patterns
18 pages
Audio Technica ATH-M20x
No ratings yet
Audio Technica ATH-M20x
1 page
Ps Primer: Description
No ratings yet
Ps Primer: Description
2 pages
Dickson Winter Catalog 2010
No ratings yet
Dickson Winter Catalog 2010
20 pages
Android App Dissertation
100% (2)
Android App Dissertation
5 pages
SLG Math 10 Quarter 2 Week 2
No ratings yet
SLG Math 10 Quarter 2 Week 2
5 pages
PNZ Series
No ratings yet
PNZ Series
2 pages
Lab Jam WASv8 Development Lab
No ratings yet
Lab Jam WASv8 Development Lab
121 pages
JT808-2013 Protocol
No ratings yet
JT808-2013 Protocol
88 pages
CODE UNNATI Marathon by BHIBHUSHITAM
No ratings yet
CODE UNNATI Marathon by BHIBHUSHITAM
91 pages
Breakdown Price: Jasa
No ratings yet
Breakdown Price: Jasa
2 pages
Lect6 Traffic Safety
No ratings yet
Lect6 Traffic Safety
83 pages
1.3. Clarification To Comments On Turbne Foundation Load Calculation - Rev A
No ratings yet
1.3. Clarification To Comments On Turbne Foundation Load Calculation - Rev A
2 pages
MA1014 Lecture 15 and 16 Semester 1 Intake 2023
No ratings yet
MA1014 Lecture 15 and 16 Semester 1 Intake 2023
2 pages
EM-80/EM-300 MDS 5150A/LIT Actuator System: Applications
No ratings yet
EM-80/EM-300 MDS 5150A/LIT Actuator System: Applications
5 pages
HTML Tag Sheet
100% (2)
HTML Tag Sheet
1 page
Template For GigaByte Journal Data Report Submissions
No ratings yet
Template For GigaByte Journal Data Report Submissions
10 pages
Mitsubishi - FD30N
100% (1)
Mitsubishi - FD30N
7 pages
API - Pipeline Fact Sheet - RV8
No ratings yet
API - Pipeline Fact Sheet - RV8
1 page
How To Apply, Submission of Application and Printing of Admit Card
No ratings yet
How To Apply, Submission of Application and Printing of Admit Card
3 pages
Crashing
No ratings yet
Crashing
33 pages
23-04-2024 Tuesday Educational Information and o
No ratings yet
23-04-2024 Tuesday Educational Information and o
2 pages
Pipe2024 Help Manual
No ratings yet
Pipe2024 Help Manual
1,861 pages
113 Trellix NX 4600 Ds Trellix Network Security Tech Specifications Datasheet
No ratings yet
113 Trellix NX 4600 Ds Trellix Network Security Tech Specifications Datasheet
9 pages
ELCB F
No ratings yet
ELCB F
5 pages
2 Crypto
No ratings yet
2 Crypto
86 pages
EO Catalyst
No ratings yet
EO Catalyst
30 pages
Wpq-105-03 Gmaw 3g Jose A. Rivas
No ratings yet
Wpq-105-03 Gmaw 3g Jose A. Rivas
1 page
API ISCAN-LITE Scanner
No ratings yet
API ISCAN-LITE Scanner
4 pages
