Instruction Level Parallelism: Omid Fatemi Advanced Computer Architecture

This document discusses instruction level parallelism using the MIPS R4000 processor as a case study. It describes: - The 8 stage pipeline of the MIPS R4000 and how it leads to load and branch delays. - How dynamic scheduling allows instructions to execute out-of-order to avoid stalls from data hazards, enabled by a scoreboard to track dependencies. - How a scoreboard handles write-after-read and write-after-write hazards from out-of-order execution by stalling instructions during issue or writeback until dependencies are resolved.

Instruction Level Parallelism

Omid Fatemi
Advanced Computer Architecture

University of Tehran 1
Outline

• MIPS R4000

• HW Schemes: Instruction Parallelism

• Dynamic Scheduling

• Scoreboard Implications

• MIPS with a Scoreboard

• Scoreboard Example

Putting It All Together: MIPS R4000
• 8 Stage Pipeline:
– IF: first half of instruction fetch; PC selection happens here, as does initiation of the instruction cache access.
– IS: second half of the instruction cache access.
– RF: instruction decode and register fetch, hazard checking, and instruction cache hit detection.
– EX: execution, which includes effective address calculation, ALU operation, and branch target computation and condition evaluation.
– DF: data fetch, first half of the data cache access.
– DS: second half of the data cache access.
– TC: tag check, which determines whether the data cache access hit.
– WB: write back for loads and register-register operations.
• 8 stages: what is the impact on load delay? On branch delay? Why?

Case Study: MIPS R4000
TWO-cycle load latency:
IF IS RF EX DF DS TC WB
   IF IS RF EX DF DS TC
      IF IS RF EX DF DS
         IF IS RF EX DF
            IF IS RF EX
               IF IS RF
                  IF IS
                     IF

THREE-cycle branch latency (conditions evaluated during EX phase):
IF IS RF EX DF DS TC WB
   IF IS RF EX DF DS TC
      IF IS RF EX DF DS
         IF IS RF EX DF
            IF IS RF EX
               IF IS RF
                  IF IS
                     IF
Delay slot plus two stalls
Branch likely cancels delay slot if not taken

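The two latencies on this slide follow from stage positions alone. The sketch below is a minimal illustration, not part of the original deck: it assumes that load data is available (and forwarded) at the end of DS and that the branch outcome is known at the end of EX, and the helper name `delay_cycles` is made up for this example.

```python
# Stage order of the R4000 integer pipeline, as listed on the slide.
STAGES = ["IF", "IS", "RF", "EX", "DF", "DS", "TC", "WB"]


def delay_cycles(produced_at_end_of: str, needed_at_start_of: str) -> int:
    """Extra cycles an instruction issued right behind the producer must wait.

    The producer reaches stage index p in cycle p, so its result exists from
    cycle p + 1. The follower, one cycle behind, would enter stage index n in
    cycle n + 1, so it needs (p - n) additional cycles.
    """
    return STAGES.index(produced_at_end_of) - STAGES.index(needed_at_start_of)


# Assumption: load data is forwarded from the end of DS to a consumer's EX.
load_delay = delay_cycles("DS", "EX")
# Assumption: the branch is resolved at the end of EX and redirects IF.
branch_delay = delay_cycles("EX", "IF")

print(load_delay, branch_delay)
```

Under these assumptions the load delay comes out as two cycles and the branch delay as three, which matches the slide (one delay slot plus two stalls).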
Branch Delay

(Figure: pipeline timing for the taken and not-taken branch cases.)

MIPS R4000 Floating Point
• FP Adder, FP Multiplier, FP Divider
• Last step of FP Multiplier/Divider uses FP Adder HW
• 8 kinds of stages in FP units:
Stage  Functional unit  Description
A      FP adder         Mantissa ADD stage
D      FP divider       Divide pipeline stage
E      FP multiplier    Exception test stage
M      FP multiplier    First stage of multiplier
N      FP multiplier    Second stage of multiplier
R      FP adder         Rounding stage
S      FP adder         Operand shift stage
U                       Unpack FP numbers

MIPS FP Pipe Stages
FP Instr        1  2    3          4     5  6  7    8  …
Add, Subtract   U  S+A  A+R        R+S
Multiply        U  E+M  M          M     M  N  N+A  R
Divide          U  A    R          D^28  …  D+A  D+R, D+R, D+A, D+R, A, R
Square root     U  E    (A+R)^108  …  A  R
Negate          U  S
Absolute value  U  S
FP compare      U  A    R
(An exponent such as D^28 is a repeat count: the D stage is used for 28 consecutive cycles.)
Stages:
A  Mantissa ADD stage
D  Divide pipeline stage
E  Exception test stage
M  First stage of multiplier
N  Second stage of multiplier
R  Rounding stage
S  Operand shift stage
U  Unpack FP numbers

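The stage table can be used to find structural stalls between two FP operations. The sketch below is a simplified illustration, not the R4000's actual interlock logic: it takes the Multiply and Add rows from the table above and delays the second instruction as a whole until it never needs a stage in the same cycle as the first. The function names are hypothetical.

```python
# Per-cycle stage usage, copied from the Multiply and Add rows of the table.
MULTIPLY = ["U", "E+M", "M", "M", "M", "N", "N+A", "R"]
ADD = ["U", "S+A", "A+R", "R+S"]


def usage(stages, start):
    """Map each clock cycle to the set of hardware stages in use."""
    return {start + k: set(s.split("+")) for k, s in enumerate(stages)}


def stall_cycles(first, second, gap):
    """Smallest delay so that `second`, issued `gap` cycles after `first`,
    never needs a stage that `first` occupies in the same cycle.

    Simplifying assumption: the second instruction is delayed as a whole.
    """
    busy = usage(first, 0)
    delay = 0
    while any(busy.get(cycle, set()) & need
              for cycle, need in usage(second, gap + delay).items()):
        delay += 1
    return delay


for gap in range(1, 8):
    print("add issued", gap, "cycles after multiply: stall",
          stall_cycles(MULTIPLY, ADD, gap))
```

With these rows, an add issued four cycles after a multiply collides with the multiplier's use of the adder stages (A in N+A, then R) and is delayed two cycles; one issued five cycles after is delayed one cycle. A multiply issued after an add never collides, consistent with the "No stall" note on the "Add followed by a Multiply" slide.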
Latencies

Assuming the destination instruction is an FP operation (one cycle less if it is an FP store).

Multiply followed by Add

Add followed by a Multiply
No stall

Add followed by a Divide

Divide followed by Add

R4000 Performance
• Not ideal CPI of 1:
– Load stalls (1 or 2 clock cycles)
– Branch stalls (2 cycles + unfilled slots)
– FP result stalls: RAW data hazard (latency)
– FP structural stalls: Not enough FP hardware (parallelism)
(Figure: stacked bar chart, vertical axis from 0 to 4.5, one bar per benchmark: gcc, doduc, espresso, nasa7, ora, tomcatv, eqntott, li, spice2g6, su2cor. Bar components: Base, Load stalls, Branch stalls, FP result stalls, FP structural stalls.)

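The stall categories on this slide add on top of the ideal CPI of 1. The sketch below only illustrates that bookkeeping; the per-instruction stall rates are made-up placeholders, not values read from the chart, and `pipeline_cpi` is a hypothetical helper name.

```python
def pipeline_cpi(stalls_per_instruction):
    """Ideal CPI of 1 plus the average stall cycles per instruction."""
    return 1.0 + sum(stalls_per_instruction.values())


# Hypothetical stall rates (cycles per instruction), for illustration only.
example_stalls = {
    "load": 0.10,
    "branch": 0.36,
    "fp_result": 0.50,
    "fp_structural": 0.04,
}

print(round(pipeline_cpi(example_stalls), 2))
```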
HW Schemes: Instruction Parallelism
• Why in HW at run time?
– Works when real dependences cannot be known at compile time
– Compiler is simpler
– Code for one machine runs well on another

• Key idea: Allow instructions behind a stall to proceed
DIVD F0,F2,F4
ADDD F10,F0,F8
SUBD F12,F8,F14
– ADDD must wait for F0 from DIVD, but SUBD depends on neither instruction and can proceed
– Enables out-of-order execution
– Implies out-of-order completion
– ID stage checked both for structural and data hazards
– Scoreboard dates to CDC 6600 in 1963

RISC or CISC

• For ILP?
– Example: A ← B + C
– CISC: 1 instruction
– RISC: 4 instructions (two loads, an add, and a store)

• RISC:
– More chances to schedule

Dynamic Scheduling
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F12, F8, F14
• 7-cycle divider
• 4-cycle adder

In-order
Cycle               1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
DIV.D F0, F2, F4    F  I  D  E  E  E  E  E  E  E  M  W
ADD.D F10, F0, F8      F  I  D  S  S  S  S  S  S  E  E  E  E  M  W
SUB.D F12, F8, F14        F  I  D  S  S  S  S  S  S  S  S  S  E  E  E  E  M  W

Out-of-order
Cycle               1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19 20
DIV.D F0, F2, F4    F  I  D  E  E  E  E  E  E  E  M  W
ADD.D F10, F0, F8      F  I  D  S  S  S  S  S  S  E  E  E  E  M  W
SUB.D F12, F8, F14        F  I  D  E  E  E  E  M  W

• Instructions are issued in order (leftmost I)
• Execution can begin out of order (leftmost E)
• Execution can terminate out of order (W)

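The two timing diagrams can be reproduced with a few lines of bookkeeping. This is a minimal sketch under stated assumptions, not a general pipeline model: F, I and D each take one cycle, M and W follow the last E cycle, a result is usable the cycle after its last E, and structural hazards on the adder are ignored. The function name `schedule` is hypothetical.

```python
LATENCY = {"DIV.D": 7, "ADD.D": 4, "SUB.D": 4}  # E cycles, from the slide

PROGRAM = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F12", "F8", "F14"),
]


def schedule(program, in_order):
    """Return (op, first E cycle, W cycle) for each instruction."""
    ready = {}       # register -> last E cycle of the instruction writing it
    prev_e_end = 0   # last E cycle of the previous instruction
    rows = []
    for index, (op, dest, src1, src2) in enumerate(program):
        decode = index + 3            # F, I, D occupy three cycles
        start = decode + 1
        for src in (src1, src2):      # RAW: wait for the producer to finish E
            start = max(start, ready.get(src, 0) + 1)
        if in_order:                  # may not start E before its predecessor is done
            start = max(start, prev_e_end + 1)
        e_end = start + LATENCY[op] - 1
        ready[dest] = e_end
        prev_e_end = e_end
        rows.append((op, start, e_end + 2))   # M, then W
    return rows


print(schedule(PROGRAM, in_order=True))
print(schedule(PROGRAM, in_order=False))
```

In this model SUB.D writes back in cycle 20 when execution must start in order and in cycle 11 when it may start out of order, as in the diagrams.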
Explanation of I
• To be able to execute the SUB.D instruction
– A functional unit must be available
» Adder is free in the example
– There should be no data hazards preventing early execution
» None in this example
– We must be able to recognize the two previous conditions
» Must examine several instructions before deciding on what to execute
• I represents the instruction window (or issue window) in which this examination happens
– If every instruction starts execution in order, then I is superfluous
– Otherwise:
» Instructions enter the issue window in order
» Several instructions may be in the issue window at any instant
» Execution can begin out of order

HW Schemes: Instruction Parallelism
• Out-of-order execution divides the ID stage:
1. Issue: decode instructions, check for structural hazards
2. Read operands: wait until no data hazards, then read operands
• Scoreboards allow an instruction to execute whenever 1 & 2 hold, not waiting for prior instructions
• CDC 6600: in-order issue, out-of-order execution, out-of-order commit (also called completion)

Scoreboard Implications
• Out-of-order completion => WAR, WAW hazards?
• WAR (SUBD writes F8, which the earlier ADDD must read first):
» DIVD F0,F2,F4
» ADDD F10,F0,F8
» SUBD F8,F10,F14
• WAW (both instructions write F0):
» DIVD F0,F2,F4
» ADDD F0,F5,F8
• Scoreboard keeps track of dependencies and the state of operations
– for WAW: stall in Issue until the previous write completes
– for WAR: stall in Write Result until the previous read completes
• Need to have multiple instructions in the execution phase => multiple execution units or pipelined execution units
• Scoreboard replaces ID, EX, WB with 4 stages

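The hazard kinds named on this slide can be found mechanically by comparing register names. The sketch below is a simplified illustration: it compares every pair of instructions in program order and ignores intervening redefinitions, which is enough for the short sequences shown here. The function name `hazards` is hypothetical.

```python
def hazards(program):
    """List (kind, register, earlier, later) for each pair of instructions.

    `program` is a list of (name, dest, src1, src2) in program order.
    """
    found = []
    for i, (name1, dest1, *srcs1) in enumerate(program):
        for name2, dest2, *srcs2 in program[i + 1:]:
            if dest1 in srcs2:       # later reads what earlier writes
                found.append(("RAW", dest1, name1, name2))
            if dest2 in srcs1:       # later writes what earlier reads
                found.append(("WAR", dest2, name1, name2))
            if dest1 == dest2:       # both write the same register
                found.append(("WAW", dest1, name1, name2))
    return found


war_example = [
    ("DIVD", "F0", "F2", "F4"),
    ("ADDD", "F10", "F0", "F8"),
    ("SUBD", "F8", "F10", "F14"),
]
waw_example = [
    ("DIVD", "F0", "F2", "F4"),
    ("ADDD", "F0", "F5", "F8"),
]

print(hazards(war_example))
print(hazards(waw_example))
```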
Out-of-order Execution and Renaming
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F10, F8, F14

• WAW hazard on register F10: prevents out-of-order execution on a machine like the CDC 6600
• If the processor were capable of register renaming:
– the WAW hazard would be eliminated
» SUB.D could execute early as before
– example: IBM 360/91

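A minimal sketch of why renaming removes the WAW hazard: every destination gets a fresh name, and later sources are redirected to the newest name of the register they read. The names P0, P1, … and the function `rename` are assumptions for illustration; this is not how the IBM 360/91 implemented renaming.

```python
from itertools import count


def rename(program):
    """Give each destination a fresh name; sources follow the latest mapping."""
    fresh = (f"P{n}" for n in count())
    mapping = {}     # architectural register -> most recent fresh name
    renamed = []
    for op, dest, src1, src2 in program:
        sources = [mapping.get(src, src) for src in (src1, src2)]
        mapping[dest] = next(fresh)
        renamed.append((op, mapping[dest], *sources))
    return renamed


program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F10", "F8", "F14"),
]

for instruction in rename(program):
    print(instruction)
```

After renaming, ADD.D and SUB.D write different names (P1 and P2), so SUB.D no longer has to wait behind ADD.D, while the true dependence of ADD.D on the DIV.D result (P0) is preserved.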
What is a Scoreboard?

• A scoreboard is a table maintained by the hardware:
– keeps track of instructions being fetched, issued, executed etc.
– keeps track of the resources (functional units and operands) they use/need
– keeps track of which instructions modify which registers
• It uses this information to dynamically schedule instructions
» very similar to a pen and paper calculation
» simple step-by-step procedure easily implemented in hardware

MIPS with a Scoreboard

Dynamic Scheduling with a Scoreboard
• Original development in CDC 6600
• Simplified example in the book for MIPS FP operations (Read Section C.7)
– Using neither renaming nor forwarding
» Values always move from registers to functional units, and from functional units back to registers
– Out-of-order completion can give rise to WAR and WAW hazards
» Machine “knows” original program order (needed for hazard detection)
– Machine model
» 2 FP multipliers (10 cycles), 1 FP adder (2 cycles), 1 FP divider (40 cycles), all non-pipelined
» 1 integer unit for everything else (incl. memory references)

Four Stages of Scoreboard Control
1. Issue: decode instruction and check for structural hazards (ID1)
– If a functional unit is free and there is no WAW hazard with another active instruction …
» … the scoreboard issues the instruction to the functional unit and updates its internal data structure.
– If a structural or WAW hazard exists …
» … instruction issue stalls
» no further instructions can issue until these hazards are cleared.

2. Read operands: wait until no data hazards, then read (ID2)
– A source operand is available if no earlier issued active instruction is going to write it.
– When all source operands are available …
» … the scoreboard tells the functional unit to proceed to read the operands from registers and begin execution.
– Thus, the scoreboard resolves RAW hazards dynamically in this step
» instructions may be sent into execution out of order

Four Stages of Scoreboard Control (cont.)
3. Execution: operate on operands
– The functional unit begins execution upon receiving operands
– When the result is ready, it notifies the scoreboard

4. Write Result: finish execution (WB)
– Once the scoreboard is aware that the functional unit has completed execution, it checks for WAR hazards.
– If no WAR hazard …
» … it writes results
– If WAR hazard …
» … it stalls the completing instruction
– Example:
DIV.D F0,F2,F4
ADD.D F10,F0,F8
SUB.D F8,F8,F14
» CDC 6600 scoreboard would stall SUB.D until ADD.D reads its operands

Three Parts of the Scoreboard
1. Instruction status: which of the 4 steps the instruction is in

2. Functional unit status: indicates the state of the functional unit (FU). 9 fields for each functional unit:
Busy: indicates whether the unit is busy or not
Op: operation to perform in the unit (e.g., + or –)
Fi: destination register
Fj, Fk: source-register numbers
Qj, Qk: functional units producing source registers Fj, Fk
Rj, Rk: flags indicating when Fj, Fk are ready

3. Register result status: indicates which functional unit will write each register, if one exists. Blank when no pending instruction will write that register

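The functional unit status and register result status can be written down as plain data. The sketch below is one possible encoding, not the hardware layout: the class and helper names are hypothetical, and the values are transcribed from the "Scoreboard Example Cycle 7" slide later in the deck.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class FUStatus:
    """Functional unit status: the 9 fields listed on the slide."""
    busy: bool = False
    op: Optional[str] = None    # operation to perform
    fi: Optional[str] = None    # destination register
    fj: Optional[str] = None    # first source register
    fk: Optional[str] = None    # second source register
    qj: Optional[str] = None    # unit producing Fj, if still pending
    qk: Optional[str] = None    # unit producing Fk, if still pending
    rj: Optional[bool] = None   # is Fj ready?
    rk: Optional[bool] = None   # is Fk ready?


# Functional unit status at cycle 7 of the example.
fu_status: Dict[str, FUStatus] = {
    "Integer": FUStatus(True, "Load", "F2", None, "R3", None, None, None, True),
    "Mult1": FUStatus(True, "Mult", "F0", "F2", "F4", "Integer", None, False, True),
    "Mult2": FUStatus(),
    "Add": FUStatus(True, "Sub", "F8", "F6", "F2", None, "Integer", True, False),
    "Divide": FUStatus(),
}

# Register result status at cycle 7: which unit will write each register.
register_result: Dict[str, str] = {"F0": "Mult1", "F2": "Integer", "F8": "Add"}


def waiting_on(units: Dict[str, FUStatus], producer: str) -> List[str]:
    """Units with a source operand still to be produced by `producer`."""
    return [name for name, unit in units.items() if producer in (unit.qj, unit.qk)]


print(waiting_on(fu_status, "Integer"))
```

At this point both Mult1 and Add are waiting for the Integer unit to deliver F2.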
Detailed Scoreboard Pipeline Control
Instruction status: Issue
  Wait until:  Not Busy(FU) and not Result(D)
  Bookkeeping: Busy(FU) ← yes; Op(FU) ← op; Fi(FU) ← D; Fj(FU) ← S1; Fk(FU) ← S2; Qj ← Result(S1); Qk ← Result(S2); Rj ← not Qj; Rk ← not Qk; Result(D) ← FU
Instruction status: Read operands
  Wait until:  Rj and Rk
  Bookkeeping: Rj ← No; Rk ← No
Instruction status: Execution complete
  Wait until:  Functional unit done
Instruction status: Write result
  Wait until:  ∀f((Fj(f) ≠ Fi(FU) or Rj(f) = No) & (Fk(f) ≠ Fi(FU) or Rk(f) = No))
  Bookkeeping: ∀f(if Qj(f) = FU then Rj(f) ← Yes); ∀f(if Qk(f) = FU then Rk(f) ← Yes); Result(Fi(FU)) ← 0; Busy(FU) ← No

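The "wait until" conditions for Issue and Write result translate directly into predicates. The sketch below uses plain dictionaries and hypothetical function names. The snapshot corresponds to cycle 17 of the example in this deck, with the R flags set as the bookkeeping column prescribes (Rj and Rk become No once operands have been read), which differs from how the example slides display them.

```python
def can_issue(fu, result, unit, dest):
    """Issue: the unit is not busy and no active instruction will write dest."""
    return not fu[unit]["busy"] and dest not in result


def can_write(fu, unit):
    """Write result: no busy unit still has to read the old value of Fi(unit)."""
    dest = fu[unit]["fi"]
    return all(
        (f["fj"] != dest or not f["rj"]) and (f["fk"] != dest or not f["rk"])
        for f in fu.values() if f["busy"]
    )


def entry(op, fi, fj, fk, rj, rk):
    """Build the status record of a busy functional unit."""
    return {"busy": True, "op": op, "fi": fi, "fj": fj, "fk": fk, "rj": rj, "rk": rk}


idle = {"busy": False}
cycle17 = {
    "Integer": idle,
    "Mult1": entry("Mult", "F0", "F2", "F4", False, False),   # operands already read
    "Mult2": idle,
    "Add": entry("Add", "F6", "F8", "F2", False, False),      # operands already read
    "Divide": entry("Div", "F10", "F0", "F6", False, True),   # F6 ready but unread
}
result17 = {"F0": "Mult1", "F6": "Add", "F10": "Divide"}

# Same state after DIVD has read its operands.
cycle22 = dict(cycle17, Divide=entry("Div", "F10", "F0", "F6", False, False))

print(can_write(cycle17, "Add"), can_write(cycle22, "Add"))
```

ADDD may not write F6 while DIVD still holds F6 as a ready, unread source (a WAR hazard); once DIVD has read its operands the write is allowed. The issue predicate covers both the structural case (unit busy) and the WAW case (a pending write to the same destination).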
Scoreboard Example
Latencies: ADD 2 clock cycles, MUL 10 clock cycles, DIV 40 clock cycles
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30

Scoreboard Example Cycle 1
Latencies: ADD 2 clock cycles, MUL 10 clock cycles, DIV 40 clock cycles
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
1                          Integer

Scoreboard Example Cycle 2
Latencies: ADD 2 clock cycles, MUL 10 clock cycles, DIV 40 clock cycles
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
2                          Integer
• Issue 2nd LD?

Scoreboard Example Cycle 3
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
3                          Integer
• Issue MULT?

Scoreboard Example Cycle 4
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F6        R2                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
4                          Integer

Scoreboard Example Cycle 5
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5
MULTD  F0    F2   F4
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    No
      Mult2    No
      Add      No
      Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
5             Integer

Scoreboard Example Cycle 6
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      No
      Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
6      Mult1  Integer

Scoreboard Example Cycle 7
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
      Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
7      Mult1  Integer               Add
• Read multiply operands?

Scoreboard Example Cycle 8a
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  Yes   Load  F2        R3                          Yes
      Mult1    Yes   Mult  F0   F2   F4   Integer           No   Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2            Integer  Yes  No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
8      Mult1  Integer               Add  Divide

Scoreboard Example Cycle 8b
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6
SUBD   F8    F6   F2   7
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
8      Mult1                        Add  Divide

Scoreboard Example Cycle 9
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
10    Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
2     Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
9      Mult1                        Add  Divide
• Read operands for MULT & SUBD? Issue ADDD?

Scoreboard Example Cycle 11
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
8     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
0     Add      Yes   Sub   F8   F6   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
11     Mult1                        Add  Divide

Scoreboard Example Cycle 12
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
7     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      No
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
12     Mult1                             Divide
• Read operands for DIVD?

Scoreboard Example Cycle 13
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
6     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
13     Mult1               Add           Divide

Scoreboard Example Cycle 14
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
5     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
2     Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
14     Mult1               Add           Divide

Scoreboard Example Cycle 15
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
4     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
1     Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
15     Mult1               Add           Divide

Scoreboard Example Cycle 16
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
3     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
0     Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
16     Mult1               Add           Divide

Scoreboard Example Cycle 17
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
2     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
17     Mult1               Add           Divide
• Write result of ADDD?

Scoreboard Example Cycle 18
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
1     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
18     Mult1               Add           Divide

Scoreboard Example Cycle 19
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
0     Mult1    Yes   Mult  F0   F2   F4                     Yes  Yes
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6   Mult1             No   Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
19     Mult1               Add           Divide

Scoreboard Example Cycle 20
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8
ADDD   F6    F8   F2   13     14             16
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
20                         Add           Divide

Scoreboard Example Cycle 21
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21
ADDD   F6    F8   F2   13     14             16
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      Yes   Add   F6   F8   F2                     Yes  Yes
      Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
21                         Add           Divide

Scoreboard Example Cycle 22
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21
ADDD   F6    F8   F2   13     14             16                  22
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
40    Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
22                                       Divide

Scoreboard Example Cycle 61
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21             61
ADDD   F6    F8   F2   13     14             16                  22
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   Yes   Div   F10  F0   F6                     Yes  Yes
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
61                                       Divide

Scoreboard Example Cycle 62
Instruction status
Instruction  j    k    Issue  Read operands  Execution complete  Write result
LD     F6    34+  R2   1      2              3                   4
LD     F2    45+  R3   5      6              7                   8
MULTD  F0    F2   F4   6      9              19                  20
SUBD   F8    F6   F2   7      9              11                  12
DIVD   F10   F0   F6   8      21             61                  62
ADDD   F6    F8   F2   13     14             16                  22
Functional unit status (Fi: dest; Fj, Fk: S1, S2; Qj, Qk: FU for j, k; Rj, Rk: Fj?, Fk?)
Time  Name     Busy  Op    Fi   Fj   Fk   Qj       Qk       Rj   Rk
      Integer  No
      Mult1    No
      Mult2    No
      Add      No
0     Divide   No
Register result status (FU that will write each register)
Clock  F0     F2       F4  F6       F8   F10     F12  ...  F30
62

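The instruction status columns of the walkthrough can be reproduced with a small cycle-by-cycle simulation. This is a sketch under stated assumptions, not the CDC 6600 logic: every decision in a cycle sees only the state left by earlier cycles, one instruction issues per cycle in program order, the Integer unit takes 1 execution cycle (implied by the load timings on the slides), and all names (`Instr`, `simulate`, `active`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

LATENCY = {"Integer": 1, "Mult": 10, "Add": 2, "Divide": 40}  # execution cycles
UNITS = {"Integer": 1, "Mult": 2, "Add": 1, "Divide": 1}      # units per kind


@dataclass
class Instr:
    name: str
    unit: str
    dest: str
    srcs: tuple
    issue: Optional[int] = None
    read: Optional[int] = None
    done: Optional[int] = None
    write: Optional[int] = None
    producers: dict = field(default_factory=dict)  # source -> pending writer


def active(instr, t):
    """Issued, and its result had not been written before cycle t."""
    return instr.issue is not None and (instr.write is None or instr.write >= t)


def simulate(program):
    t, next_issue = 0, 0
    while any(i.write is None for i in program):
        t += 1
        for i in program:   # write result, unless an older read is pending (WAR)
            if i.done is not None and i.done < t and i.write is None:
                war = any(f is not i and active(f, t)
                          and (f.read is None or f.read >= t)
                          and i.dest in f.srcs
                          and f.producers.get(i.dest) is not i
                          for f in program)
                if not war:
                    i.write = t
        for i in program:   # read operands once every producer has written (RAW)
            if i.issue is not None and i.issue < t and i.read is None:
                if all(p is None or (p.write is not None and p.write < t)
                       for p in i.producers.values()):
                    i.read = t
                    i.done = t + LATENCY[i.unit]
        if next_issue < len(program):   # in-order issue, one per cycle
            i, earlier = program[next_issue], program[:next_issue]
            busy = sum(1 for f in earlier if f.unit == i.unit and active(f, t))
            waw = any(f.dest == i.dest and active(f, t) for f in earlier)
            if busy < UNITS[i.unit] and not waw:
                i.issue = t
                for s in i.srcs:   # remember who will produce each source
                    i.producers[s] = next(
                        (f for f in reversed(earlier)
                         if f.dest == s and active(f, t)), None)
                next_issue += 1
    return [(i.name, i.issue, i.read, i.done, i.write) for i in program]


def example_program():
    return [
        Instr("LD F6", "Integer", "F6", ("R2",)),
        Instr("LD F2", "Integer", "F2", ("R3",)),
        Instr("MULTD", "Mult", "F0", ("F2", "F4")),
        Instr("SUBD", "Add", "F8", ("F6", "F2")),
        Instr("DIVD", "Divide", "F10", ("F0", "F6")),
        Instr("ADDD", "Add", "F6", ("F8", "F2")),
    ]


for row in simulate(example_program()):
    print(row)
```

Each printed row is (instruction, issue, read operands, execution complete, write result). ADDD finishes executing in cycle 16 but writes only in cycle 22, after DIVD has read F6 in cycle 21.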
CDC 6600 Scoreboard
• Speedup:
– 1.7 from compiler (Fortran programs);
– 2.5 from hand-coded assembly programs.
– BUT slow memory (no cache) limits the benefit.
• The scoreboard had as much logic as one of the functional units.
• Limitations of the 6600 scoreboard:
– No forwarding hardware.
– Limited to instructions in a basic block (small window).
– Small number of functional units (structural hazards), especially integer/load-store units.
– Does not issue on structural hazards.
– Waits for WAR hazards.
– Prevents WAW hazards.

Summary
• Instruction Level Parallelism (ILP) in HW
• HW exploiting ILP
– Works when dependences cannot be known at compile time
– Code for one machine runs well on another
• Key idea of Scoreboard: Allow instructions behind a stall to proceed (Decode => Issue instruction & read operands)
– Enables out-of-order execution => out-of-order completion
– ID stage checked both for structural and WAW hazards
