Chapter 10 Principles of Pipelining

This document discusses the principles of pipelining in computer architecture, outlining the structure of a pipelined data path and the various stages involved in instruction processing. It covers pipeline hazards, including data hazards, control hazards, and structural hazards, along with solutions like inserting nop instructions and using interlocks. The document emphasizes the importance of maintaining efficiency and correctness in pipelined processors.


PowerPoint Slides

Basic Computer Architecture
Prof. Smruti Ranjan Sarangi
IIT Delhi

Chapter 10: Principles of Pipelining

1
2nd version

www.basiccomparch.com
* Download the pdf of the book
* Videos
* Slides, software, solution manual

The pdf version of the book and all the learning resources can be freely downloaded from the website: www.basiccomparch.com

Print version (Publisher: WhiteFalcon, 2021): available on e-commerce sites.

Outline
* Overview of Pipelining
* A Pipelined Data Path
* Pipeline Hazards
* Pipeline with Interlocks
* Forwarding
* Performance Metrics
* Interrupts/ Exceptions

3
Up till now ….
* We have designed a processor that can execute all the SimpleRisc instructions
* We have looked at two styles:
* With a hardwired control unit
* With a microprogrammed control unit
* Microprogrammed data path
* Microassembly language
* Microinstructions

4
Designing Efficient Processors

* Microprogrammed processors are much slower than hardwired processors
* Even hardwired processors
* Have a lot of waste !!!
* We have 5 stages.
* What is the IF stage doing when the MA stage is active ?
* ANSWER : It is idling
5
The Notion of Pipelining
* Let us go back to the car assembly line
* Is the engine shop idle, when the paint shop is
painting a car ?
* NO : It is building the engine of another car
* When this engine goes to the body shop, it
builds the engine of another car, and so on ….
* Insight :
* Multiple cars are built at the same time.
* A car proceeds from one stage to the next

6
Pipelined Processors
[Figure: five instructions in flight at the same time. inst 5 is in Instruction Fetch (IF), inst 4 in Operand Fetch (OF), inst 3 in Execute (EX), inst 2 in Memory Access (MA), and inst 1 in Register Write (RW)]

* The IF, OF, EX, MA, and RW stages process 5 instructions simultaneously
* Each instruction proceeds from one stage to the next
* This is known as pipelining
7
Advantages of Pipelining

* We keep all parts of the data path busy all the time
* Let us assume that all the 5 stages do the same amount of work
* Without pipelining, an instruction completes its execution every T seconds
* With pipelining, a new instruction completes its execution every T/5 seconds (once the pipeline is full)

8
Design of a Pipeline
* Splitting the Data Path
* We divide the data path into 5 parts : IF, OF, EX,
MA, and RW
* Timing
* We insert latches (registers) between
consecutive stages
* 4 Latches → IF-OF, OF-EX, EX-MA, and MA-RW
* At the negative edge of a clock, an instruction
moves from one stage to the next
9
Pipelined Data Path with Latches

[Figure: the five stages, Instruction Fetch (IF), Operand Fetch (OF), Execute (EX), Memory Access (MA), and Register Write (RW), with a latch between each pair of consecutive stages]

* Add a latch between consecutive stages
* The latches are triggered by a negative clock edge

10
The Instruction Packet
* What travels between stages ?
* ANSWER : the instruction packet
* Instruction Packet
* Instruction contents
* Program counter
* All intermediate results
* Control signals
* Every instruction moves with its entire state, so there is no interference between instructions
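
The instruction packet can be pictured as a small record that is copied from latch to latch. The sketch below is only an illustration of that idea: the class name `InstructionPacket`, the helper `advance`, and the dictionary layout are assumptions made here, not part of the SimpleRisc design, and the values used are placeholders.

```python
from dataclasses import dataclass, field, replace
from typing import Dict


@dataclass(frozen=True)
class InstructionPacket:
    """Hypothetical model of the state that travels with one instruction."""
    instruction: int                                        # instruction contents (raw encoding)
    pc: int                                                 # program counter of this instruction
    results: Dict[str, int] = field(default_factory=dict)   # intermediate results (A, B, aluResult, ...)
    control: Dict[str, bool] = field(default_factory=dict)  # control signals (isLd, isSt, ...)


def advance(packet: InstructionPacket, **new_results: int) -> InstructionPacket:
    """Return the packet written into the next latch, with this stage's results added."""
    return replace(packet, results={**packet.results, **new_results})


# Placeholder values: a packet sitting in the OF-EX latch moves to EX-MA.
in_ex = InstructionPacket(instruction=0x0, pc=8, results={"A": 2, "B": 3},
                          control={"isLd": False, "isSt": False})
in_ma = advance(in_ex, aluResult=5)
print(in_ma.results)
```

Because each packet is an immutable copy, a later stage can never modify the state of a younger instruction, which mirrors the "no interference" property.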
11
Outline
* Overview of Pipelining
* A Pipelined Data Path
* Pipeline Hazards
* Pipeline with Interlocks
* Forwarding
* Performance Metrics
* Interrupts/ Exceptions

12
IF Stage

[Figure: IF stage; the fetched instruction is written to the IF-OF register]

* Instruction contents saved in the instruction field

13
OF Stage
[Figure: OF stage with the control unit and the immediate and branch target unit; the OF-EX register holds pc, branchTarget, B, A, op2, instruction, and control]

* A, B → ALU operands
* op2 → store operand
* control → set of all control signals

14
EX Stage

[Figure: EX stage. Inputs from the OF-EX register: pc, branchTarget, B, A, op2, instruction, and control. Units: ALU (driven by aluSignals), flags, and branch unit. Signals shown: isBeq, isBgt, isUBranch, isRet, isBranchTaken, and branchPC (sent to the fetch unit). Outputs to the EX-MA register: pc, aluResult, op2, instruction, and control]

* aluResult → result of the ALU operation
* op2, control, pc, instruction (passed from OF-EX)

15
MA Stage
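[Figure: MA stage. Inputs from the EX-MA register: pc, aluResult, op2, instruction, and control. The memory unit (controlled by isLd and isSt) accesses the data memory through mar and mdr. Outputs to the MA-RW register: pc, ldResult, aluResult, instruction, and control]

* ldResult → result of the load operation
* aluResult, control, pc, instruction (passed from EX-MA)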

16
RW Stage

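[Figure: RW stage. Inputs from the MA-RW register: pc, ldResult, aluResult, instruction, and control. A multiplexer controlled by isLd and isCall selects the data to write (aluResult, ldResult, or pc + 4); a second multiplexer selects the destination address (rd or ra(15)); isWb enables the write to the register file]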

17
[Figure: the complete pipelined data path, combining the IF, OF, EX, MA, and RW stages. © Smruti R. Sarangi <srsarangi@cse.iitd.ac.in>]

18
Abridged Diagram

[Figure: abridged data path. Pipeline registers: IF-OF, OF-EX, EX-MA, and MA-RW. Units: fetch unit and instruction memory; control unit, immediate and branch unit, and register file (op1, op2); branch unit, flags, and ALU; memory unit and data memory; register write unit]

19
Outline
* Overview of Pipelining
* A Pipelined Data Path
* Pipeline Hazards
* Pipeline with Interlocks
* Forwarding
* Performance Metrics
* Interrupts/ Exceptions

20
Pipeline Hazards
* Now, let us consider correctness
* Let us introduce a new tool → the pipeline diagram

[1]: add r1, r2, r3
[2]: sub r4, r5, r6
[3]: mul r8, r9, r10

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2  3
OF               1  2  3
EX                  1  2  3
MA                     1  2  3
RW                        1  2  3

21
Rules for Constructing a Pipeline
Diagram
* It has 5 rows
* One per stage
* The rows are named: IF, OF, EX, MA, and RW
* Each column represents a clock cycle
* Each cell represents the execution of an instruction in a stage
* It is annotated with the name (label) of the instruction
* Instructions proceed from one stage to the next across clock cycles

22
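
The rules for constructing a pipeline diagram can be turned into a few lines of code. This is a minimal sketch for a pipeline without stalls; the function names `pipeline_diagram` and `render` are made up for this illustration, and it assumes that instruction i is fetched in cycle i.

```python
STAGES = ["IF", "OF", "EX", "MA", "RW"]


def pipeline_diagram(num_insts, num_cycles=9):
    """Return {stage: [label per cycle]} for a pipeline with no stalls."""
    rows = {stage: [""] * num_cycles for stage in STAGES}
    for inst in range(1, num_insts + 1):
        for depth, stage in enumerate(STAGES):
            cycle = inst + depth          # assumption: instruction i is in IF in cycle i
            if cycle <= num_cycles:
                rows[stage][cycle - 1] = str(inst)
    return rows


def render(rows):
    """Format the diagram as text: one row per stage, one column per cycle."""
    num_cycles = len(next(iter(rows.values())))
    lines = ["    " + "".join(f"{c:>3}" for c in range(1, num_cycles + 1))]
    for stage in STAGES:
        lines.append(f"{stage:<4}" + "".join(f"{cell:>3}" for cell in rows[stage]))
    return "\n".join(lines)


print(render(pipeline_diagram(3)))
```

For three instructions this places instruction 1 in RW in cycle 5 and instruction 3 in RW in cycle 7.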
Example

[1]: add r1, r2, r3
[2]: sub r4, r2, r5
[3]: mul r5, r8, r9

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2  3
OF               1  2  3
EX                  1  2  3
MA                     1  2  3
RW                        1  2  3

23
Data Hazards

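[1]: add r1, r2, r3
[2]: sub r3, r1, r4

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2
EX                  1  2
MA                     1  2
RW                        1  2

* Instruction 2 will read incorrect values !!!
* It reads r1 in its OF stage (cycle 3), but instruction 1 writes r1 only in its RW stage (cycle 5)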

24
Data Hazard
Definition: A hazard is defined as the possibility of erroneous execution of an
instruction in a pipeline. A data hazard represents the possibility of erroneous
execution because of the unavailability of data, or the availability of incorrect
data.

* This situation represents a data hazard
* Specifically, it is a RAW (read after write) hazard
* The earliest we can dispatch instruction 2 is cycle 5
25
Other Types of Data Hazards

* Our pipeline is in-order

Definition: In an in-order pipeline (such as ours), a preceding instruction is
always ahead of a succeeding instruction in the pipeline. Modern processors
however use out-of-order pipelines that break this rule. It is possible for later
instructions to execute before earlier instructions.

* We will only have RAW hazards in our pipeline.
* Out-of-order pipelines can have WAR and WAW hazards
26
WAW Hazards

[1]: add r1, r2, r3

[2]: sub r1, r4, r3

* Instruction [2] cannot write the value of r1 before instruction [1] writes to it → otherwise we will have a WAW hazard

27
WAR Hazards

[1]: add r1, r2, r3

[2]: add r2, r5, r6

* Instruction [2] cannot write the value of r2 before instruction [1] reads it → otherwise we will have a WAR hazard

28
Control Hazards

[1]: beq .foo

[2]: mov r1, 4
[3]: add r2, r4, r3
...
...
.foo:
[100]: add r4, r1, r2

* If the branch is taken, instructions [2] and [3] might get fetched incorrectly

29
Control Hazard – Pipeline Diagram

[1]: beq .foo
[2]: mov r1, 4
[3]: add r2, r4, r3

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2  3
OF               1  2  3
EX                  1  2  3
MA                     1  2  3
RW                        1  2  3

* The two instructions fetched immediately after a branch instruction might have been fetched incorrectly.
30
Control Hazards
* The two instructions fetched immediately
after a branch instruction might have
been fetched incorrectly.
* These instructions are said to be on the
wrong path
* A control hazard represents the possibility of
erroneous execution in a pipeline because
instructions in the wrong path of a branch can
possibly get executed and save their results in
memory, or in the register file
31
Structural Hazards

* A structural hazard may occur when two instructions have a conflict on the same set of resources in a cycle
* Example :
* Assume that we have an add instruction that can read one operand from memory
* add r1, r2, 10[r3]

32
Structural Hazards - II

[1]: st r4, 20[r5]

[2]: sub r8, r9, r10
[3]: add r1, r2, 10[r3]

* This code will have a structural hazard

* [3] tries to read 10[r3] (MA unit) in cycle 4
* [1] tries to write to 20[r5] (MA unit) in cycle 4
* Does not happen in our pipeline

33
Solutions in Software
* Data hazards
* Insert nop instructions, reorder code
[1]: add r1, r2, r3
[2]: sub r3, r1, r4

[1]: add r1, r2, r3

[2]: nop
[3]: nop
[4]: nop
[5]: sub r3, r1, r4
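
The nop-insertion rule can be sketched as a small pass over the program. This is an illustrative model, not a real SimpleRisc assembler pass: instructions are represented as hypothetical `(opcode, dest, sources)` tuples, and the minimum producer-to-consumer distance of 4 slots (3 instructions in between) is taken from the three nops shown above.

```python
NOP = ("nop", None, ())
MIN_DISTANCE = 4  # a consumer must be at least 4 slots after its producer


def insert_nops(program):
    """Insert nops so that no instruction reads a register written by one of
    the 3 instructions immediately before it."""
    out = []
    for opcode, dest, srcs in program:
        needed = 0
        # Look back at the last 3 emitted instructions.
        for back in range(1, MIN_DISTANCE):
            if back > len(out):
                break
            prev_dest = out[-back][1]
            if prev_dest is not None and prev_dest in srcs:
                needed = max(needed, MIN_DISTANCE - back)
        out.extend([NOP] * needed)
        out.append((opcode, dest, srcs))
    return out


program = [
    ("add", "r1", ("r2", "r3")),
    ("sub", "r3", ("r1", "r4")),
]
for inst in insert_nops(program):
    print(inst)
```

Running it on the two-instruction example yields the add, three nops, and then the sub, which matches the listing above.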

34
Code Reordering

Original code          Reordered code
add r1, r2, r3         add r1, r2, r3
add r4, r1, 3          add r8, r5, r6
add r8, r5, r6         add r10, r11, r12
add r9, r8, r5         nop
add r10, r11, r12      add r4, r1, 3
add r13, r10, 2        add r9, r8, r5
                       add r13, r10, 2

35
Control Hazards
* Trivial Solution : Add two nop
instructions after every branch
* Better solution :
* Assume that the two instructions fetched after a
branch are valid instructions
* These instructions are said to be in the delay
slots
* Such a branch is known as a delayed branch

36
Example with 2 Delay Slots

Original code          With 2 delay slots
add r1, r2, r3         b .foo
add r4, r5, r6         add r1, r2, r3
b .foo                 add r4, r5, r6
add r8, r9, r10        add r8, r9, r10

* The compiler transfers instructions before the branch to the delay slots.
* If it cannot find 2 valid instructions, it inserts nops.

37
Outline
* Overview of Pipelining
* A Pipelined Data Path
* Pipeline Hazards
* Pipeline with Interlocks
* Forwarding
* Performance Metrics
* Interrupts/ Exceptions

38
Why interlocks ?

* We cannot always trust the compiler to do a good job, or even to introduce nop instructions correctly.
* Compilers now need to be tailored to specific hardware.
* We should ideally not expose the details of the pipeline to the compiler (they might also be confidential)
* Hardware mechanism to enforce correctness → interlock

39
Two kinds of Interlocks
* Data-Lock
* Do not allow a consumer instruction to move
beyond the OF stage till it has read the
correct values. Implication : Stall the IF and
OF stages.
* Branch-Lock
* We never execute instructions in the wrong path.
* The hardware needs to ensure both
these conditions.
40
Comparison between Software and Hardware (with interlocks)

* Portability
* Software: limited to a specific processor
* Hardware: programs can be run on any processor irrespective of the nature of the pipeline
* Branches
* Software: possible to have no performance penalty, by using delay slots
* Hardware: need to stall the pipeline for 2 cycles in our design
* RAW hazards
* Software: possible to eliminate them through code scheduling
* Hardware: need to stall the pipeline
* Performance
* Software: highly dependent on the nature of the program
* Hardware: the basic version of a pipeline with interlocks is expected to be slower than the version that relies on software

41
Conceptual Look at Pipeline with
Interlocks

[1]: add r1, r2, r3

[2]: sub r4, r1, r2

* We have a RAW hazard
* We need to stall instruction [2] at the OF stage for 3 cycles
* We need to keep sending nop instructions to the EX stage during these 3 cycles

42
Example

[1]: add r1, r2, r3
[2]: sub r4, r1, r2

Clock cycles  1  2  3  4  5  6  7  8  9   (b = bubble)
IF            1  2
OF               1  2  2  2  2
EX                  1  b  b  b  2
MA                     1  b  b  b  2
RW                        1  b  b  b  2

43
A Pipeline Bubble
* A pipeline bubble is inserted into a stage,
when the previous stage needs to be
stalled
* It is a nop instruction
* To insert a bubble
* Create a nop instruction packet
* OR, set a designated bubble bit to 1

44
Bubbles in the Case of a Branch
Instruction

[1]: beq .foo
[2]: add r1, r2, r3
[3]: sub r4, r5, r6
....
....
.foo:
[4]: add r8, r9, r10

Clock cycles  1  2  3  4  5  6  7  8  9   (b = bubble)
IF            1  2  3  4
OF               1  2  b  4
EX                  1  b  b  4
MA                     1  b  b  4
RW                        1  b  b  4

45
Control Hazards and Bubbles

* We know that an instruction is a branch in the OF stage
* When it reaches the EX stage and the branch is taken, let us convert the instructions in the IF and OF stages to bubbles
* Ensures the branch-lock condition

46
Ensuring the Data-Lock Condition

* When an instruction reaches the OF stage, check if it has a conflict with any of the instructions in the EX, MA, and RW stages
* If there is no conflict, nothing needs to be done
* Otherwise, stall the pipeline (IF and OF stages only)

47
Algorithm 5: Algorithm to detect conflicts between instructions
Data: instructions [A] and [B]
Result: conflict exists (true), no conflict (false)
if [A].opcode ∈ (nop, b, beq, bgt, call) then
    /* Does not read from any register */
    return false
end
if [B].opcode ∈ (nop, cmp, st, b, beq, bgt, ret) then
    /* Does not write to any register */
    return false
end
/* Set the sources */
src1 ← [A].rs1
src2 ← [A].rs2
if [A].opcode = st then
    src2 ← [A].rd
end
if [A].opcode = ret then
    src1 ← ra
end
/* Check if the first operand is used */
hasSrc1 ← true
if [A].opcode ∈ (not, mov) then
    hasSrc1 ← false
end
/* Set the destination */
dest ← [B].rd
if [B].opcode = call then
    dest ← ra
end
/* Check the second operand to see if it is a register */
hasSrc2 ← true
if [A].opcode ≠ st then
    if [A].I = 1 then
        hasSrc2 ← false
    end
end
/* Detect conflicts */
if (hasSrc1 = true) and (src1 = dest) then
    return true
else if (hasSrc2 = true) and (src2 = dest) then
    return true
end
return false
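
For readers who prefer executable code, the conflict check can be transcribed roughly as follows. This is a sketch that follows the pseudocode step by step; the `Inst` record, its field names, and the use of register numbers are assumptions made for this example (ra is taken to be register 15, as labelled in the data path figures).

```python
from dataclasses import dataclass

RA = 15  # return address register, shown as ra(15) in the data path


@dataclass
class Inst:
    """Hypothetical decoded instruction: opcode, register fields, and the I bit."""
    opcode: str
    rd: int = 0
    rs1: int = 0
    rs2: int = 0
    imm: bool = False  # True when the second operand is an immediate (I = 1)


def has_conflict(a: Inst, b: Inst) -> bool:
    """True if instruction a (in OF) reads a register that instruction b writes."""
    if a.opcode in {"nop", "b", "beq", "bgt", "call"}:
        return False                      # a does not read from any register
    if b.opcode in {"nop", "cmp", "st", "b", "beq", "bgt", "ret"}:
        return False                      # b does not write to any register
    src1 = RA if a.opcode == "ret" else a.rs1
    src2 = a.rd if a.opcode == "st" else a.rs2
    has_src1 = a.opcode not in {"not", "mov"}
    has_src2 = a.opcode == "st" or not a.imm
    dest = RA if b.opcode == "call" else b.rd
    return (has_src1 and src1 == dest) or (has_src2 and src2 == dest)


add = Inst("add", rd=1, rs1=2, rs2=3)      # add r1, r2, r3
sub = Inst("sub", rd=4, rs1=1, rs2=2)      # sub r4, r1, r2
print(has_conflict(sub, add))
```

The final line checks the RAW pair used in the interlock examples (sub reads r1, which add writes).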

49
How to Stall a Pipeline ?

* Disable the write functionality of :

* The IF-OF register
* and the Program Counter (PC)
* To insert a bubble
* Write a bubble (nop instruction) into the OF-EX
register
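
The stall mechanism can be simulated cycle by cycle. The sketch below is a simplified model written for this illustration: instructions are hypothetical `(dest, sources)` pairs, a stall freezes IF and OF (modelling the disabled PC and IF-OF writes), and `None` in a stage stands for a bubble or an empty slot.

```python
def conflicts(consumer, producer):
    """True if the consumer reads the register that the producer writes."""
    dest, _ = producer
    _, srcs = consumer
    return dest is not None and dest in srcs


def run_with_data_lock(program):
    """Return one {stage: instruction index or None} snapshot per clock cycle."""
    IF = OF = EX = MA = RW = None
    fetched = 0
    timeline = []
    while True:
        if IF is None and fetched < len(program):
            IF = fetched                      # fetch the next instruction
            fetched += 1
        if all(s is None for s in (IF, OF, EX, MA, RW)):
            break                             # pipeline has drained
        timeline.append({"IF": IF, "OF": OF, "EX": EX, "MA": MA, "RW": RW})
        stall = OF is not None and any(
            p is not None and conflicts(program[OF], program[p])
            for p in (EX, MA, RW)
        )
        if stall:
            # IF and OF keep their contents; a bubble enters EX.
            RW, MA, EX = MA, EX, None
        else:
            RW, MA, EX, OF, IF = MA, EX, OF, IF, None
    return timeline


# [1]: add r1, r2, r3   [2]: sub r4, r1, r2
program = [("r1", ("r2", "r3")), ("r4", ("r1", "r2"))]
timeline = run_with_data_lock(program)
print(len(timeline), [t["OF"] for t in timeline])
```

In this model the sub stays in OF for four cycles and the pair takes 9 cycles in total, in line with the 3-cycle stall described for this pipeline.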

50
Data Path with Interlocks (Data-Lock)

[Figure: abridged data path with a data-lock unit. The data-lock unit generates two stall signals and one bubble signal for the early stages of the pipeline]

51
Ensuring the Branch-Lock Condition

* Option 1 :
* Use delay slots (interlocks not required)
* Option 2 :
* Convert the instructions in the IF, and OF stages,
to bubbles once a branch instruction reaches the
EX stage.
* Start fetching from the next PC (not taken) or the
branch target (taken)

52
Ensuring the Branch-Lock Condition - II

* Option 3
* If the branch instruction in the EX stage is taken, then invalidate the instructions in the IF and OF stages. Start fetching from the branch target.
* Otherwise, do not take any special action
* This method is also called predict not-taken (we shall use this method because it is more efficient than option 2)

53
Data Path with Interlocks
[Figure: abridged data path with interlocks. The branch-lock unit takes isBranchTaken as input and generates two bubble signals; the data-lock unit generates the stall signals]

54
Outline
* Overview of Pipelining
* A Pipelined Data Path
* Pipeline Hazards
* Pipeline with Interlocks
* Forwarding
* Performance Metrics
* Interrupts/ Exceptions

55
Relook at the Pipeline Diagram

[1]: add r1, r2, r3
[2]: sub r4, r1, r2

(a) with bubbles (b = bubble)

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2  2  2  2
EX                  1  b  b  b  2
MA                     1  b  b  b  2
RW                        1  b  b  b  2

(b) no bubbles (may lead to wrong results)

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2
EX                  1  2
MA                     1  2
RW                        1  2
56
Crucial Insight (Figure (b))
* When does instruction 2 need the value of r1 ?
* ANSWER : Cycle 3, OF Stage (wrong!!!)
* CORRECT ANSWER : Cycle 4, EX Stage
* When does instruction 1 produce the value of r1 ?
* ANSWER : Cycle 5, RW Stage (wrong!!!)
* CORRECT ANSWER : End of Cycle 3, EX Stage

57
Forwarding
[1]: add r1, r2, r3
[2]: sub r4, r1, r2

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2
EX                  1  2
MA                     1  2
RW                        1  2

* If the correct value is already there in another stage, we can forward it.

58
Forwarding from MA to EX
[1]: add r1, r2, r3
[2]: sub r4, r1, r2

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2
EX                  1  2
MA                     1  2
RW                        1  2

* Forwarding in cycle 4 from instruction [1] (MA stage) to instruction [2] (EX stage)

59
Different Forwarding Paths

* We need to add a multitude of forwarding paths
* Rules for creating forwarding paths
* Add a path from a later stage to an earlier stage
* Try to add a forwarding path as late as possible. For example, we avoid the EX → OF forwarding path, since we have the MA → EX forwarding path
* The IF stage is not a part of any forwarding path.

60
Forwarding Path
* 3 Stage Paths
* RW → OF
* 2 Stage Paths
* RW → EX
* MA → OF (✗ not required)
* 1 Stage Paths
* RW → MA (load to store)
* MA → EX (ALU instructions, load, store)
* EX → OF (✗ not required)

61
Forwarding Paths : RW → MA

[1]: ld r1, 4[r2]
[2]: st r1, 10[r3]

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2
EX                  1  2
MA                     1  2
RW                        1  2

* Forwarding in cycle 5 from instruction [1] (RW stage) to instruction [2] (MA stage)

62
Forwarding Paths : RW → EX

[1]: ld r1, 4[r2]
[2]: st r8, 10[r3]
[3]: add r2, r1, r4

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2  3
OF               1  2  3
EX                  1  2  3
MA                     1  2  3
RW                        1  2  3

* Forwarding in cycle 5 from instruction [1] (RW stage) to instruction [3] (EX stage)

63
Forwarding Path : MA → EX

[1]: add r1, r2, r3
[2]: sub r4, r1, r2

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2
EX                  1  2
MA                     1  2
RW                        1  2

* Forwarding in cycle 4 from instruction [1] (MA stage) to instruction [2] (EX stage)

64
Forwarding Path : RW → OF

[1]: ld r1, 4[r2]
[2]: st r4, 10[r3]
[3]: st r5, 10[r6]
[4]: sub r7, r1, r2

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2  3  4
OF               1  2  3  4
EX                  1  2  3  4
MA                     1  2  3  4
RW                        1  2  3  4

* Forwarding in cycle 5 from instruction [1] (RW stage) to instruction [4] (OF stage)

65
Data Hazards with Forwarding

* Forwarding has unfortunately not eliminated all data hazards
* We are left with one special case.
* Load-use hazard
* The instruction immediately after a load instruction has a RAW dependence with it.

66
Load-Use Hazard
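[1]: ld r1, 10[r2]
[2]: sub r4, r1, r2

Clock cycles  1  2  3  4  5  6  7  8  9
IF            1  2
OF               1  2
EX                  1  2
MA                     1  2
RW                        1  2

* Instruction [1] produces r1 at the end of cycle 4 (MA stage), but instruction [2] needs it at the beginning of cycle 4 (EX stage)
* Cannot forward (arrow goes backwards in time)
* Need to add a bubble (then use RW → EX forwarding)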

67
Implementation of Forwarding

* At every stage there is a choice of inputs for each functional unit
* Use the default inputs from the previous stage
* OR, use one of the forwarded inputs
* Use a multiplexer to choose between the inputs
* A dedicated forwarding unit generates the signals for the forwarding multiplexers
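
The choice made by a forwarding multiplexer in the EX stage can be sketched as a priority selection. This is an illustrative model only: the function name and the `(dest_reg, value)` tuples are assumptions made here, and the priority of MA over RW reflects the rule of forwarding from the most recent producer. It also assumes that the MA-stage value is already available, so the load-use case still has to be handled by an interlock.

```python
def select_ex_operand(src_reg, default_value, ma, rw):
    """Pick the value of one EX-stage operand.

    ma, rw: (dest_reg, value) of the instruction currently in that stage,
            or None if the stage holds a bubble or writes no register.
    Returns (value, source), where source names the chosen multiplexer input.
    """
    if ma is not None and ma[0] == src_reg:
        return ma[1], "MA"          # latest producer wins
    if rw is not None and rw[0] == src_reg:
        return rw[1], "RW"
    return default_value, "OF-EX"   # default input from the previous stage


# r1 was written by both older instructions; the one in MA is more recent.
print(select_ex_operand(1, 0, ma=(1, 42), rw=(1, 7)))
```

A real forwarding unit would derive `ma` and `rw` from the instruction packets in the EX-MA and MA-RW registers using the conflict checks on rs1/ra and rs2/rd.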

68
OF Stage with Forwarding
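[Figure: OF stage with forwarding, between the IF-OF and OF-EX registers. Multiplexers M1, M2, and M' are placed on the paths that produce A, B, and op2 from the register file outputs (op1, op2); the forwarded input arrives from the RW stage]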
69
EX Stage with Forwarding
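[Figure: EX stage with forwarding, between the OF-EX and EX-MA registers. Multiplexers M3, M4, and M5 are placed on the A, B, and op2 paths; forwarded inputs arrive from the MA and RW stages; the branch unit output goes to the IF stage]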

70
MA Stage with Forwarding

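[Figure: MA stage with forwarding, between the EX-MA and MA-RW registers. Multiplexer M6 is placed on the op2 input of the memory unit, with a forwarded input from the RW stage; aluResult is forwarded to the EX stage]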

71
RW Stage with Forwarding
[Figure: RW stage with forwarding; the value from the MA-RW register is forwarded to the OF, EX, and MA stages]

72
Data Path with Forwarding

[Figure: abridged data path with forwarding paths and forwarding multiplexers added to the OF, EX, and MA stages]

73
Forwarding Conditions
* Determine if there is a conflict
between two instructions in different
stages
* Find if there is a conflict for the first operand
(rs1/ ra)
* Find if there is a conflict for the second operand
(rs2/rd)
* Always forward from the latest
instruction (that is earlier than the
74
Algorithm 6: Conflict on the first operand (rs1/ra)
Data: instructions, [A], and [B] (possible forwarding: [B] → [A])
Result: conflict exists on rs1/ra (true), no conflict (false)
if [A].opcode ∈ (nop,b,beq,bgt,call,not,mov) then
/* Does not read from any register */
return false
end
if [B].opcode ∈ (nop, cmp, st, b, beq, bgt, ret) then
/* Does not write to any register */
return false
end
/* Set the sources */
src1 ← [A].rs1
if [A].opcode = ret then
src1 ← ra
end
/* Set the destination */
dest ← [B].rd
if [B].opcode = call then
dest ← ra
end
/* Detect conflicts */
if src1 = dest then
return true
end
return false

75
Algorithm 7: Conflict on the second operand (rs2/rd)
Data: instructions, [A], and [B] (possible forwarding: [B] → [A])
Result: conflict exists on second operand (rs2/rd) (true), no conflict
(false)
if [A].opcode ∈ (nop,b,beq,bgt,call) then
/* Does not read from any register */
return false
end
if [B].opcode ∈ (nop, cmp, st, b, beq, bgt, ret) then
/* Does not write to any register */
return false
end
/* Check the second operand to see if it is a register */
if [A].opcode ≠ st then
if [A].I = 1 then
return false
end
end
/* Set the sources */
src2 ← [A].rs2
if [A].opcode = st then
src2 ← [A].rd
end

76
/* Set the destination */
dest ← [B].rd
if [B].opcode = call then
dest ← ra
end
/* Detect conflicts */
if src2 = dest then
return true
end
return false

77
Interlocks with Forwarding

* Data-Lock
* We need to only check for the load-use hazard
* If the instruction in the EX stage is a load, and the
instruction in the OF stage uses its loaded value,
then stall for 1 cycle
* Branch-Lock
* Remains the same as before.
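
The load-use check can be sketched as a single predicate. The dictionary layout below is an assumption made for this illustration; in particular, `ex_srcs` is a hypothetical field listing only the registers that the consumer needs by its EX stage, so a store whose data register was just loaded is not stalled (it can rely on RW → MA forwarding).

```python
def needs_load_use_stall(ex_inst, of_inst):
    """True if the instruction in OF must wait one cycle for a load in EX."""
    if ex_inst is None or ex_inst["opcode"] != "ld":
        return False
    return ex_inst["rd"] in of_inst["ex_srcs"]


ld = {"opcode": "ld", "rd": "r1"}                      # ld r1, 10[r2]
sub = {"opcode": "sub", "ex_srcs": {"r1", "r2"}}       # sub r4, r1, r2
st = {"opcode": "st", "ex_srcs": {"r3"}}               # st r1, 10[r3]: r1 is needed only in MA
print(needs_load_use_stall(ld, sub), needs_load_use_stall(ld, st))
```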

78
The Curious Case of the call instruction

* The call instruction generates the value of ra in the RW stage
* Any instruction that uses the value of ra (such as ret) still works correctly !!!
* Prove it ....
79
Complete Data Path
[Figure: complete data path with the branch-lock unit (input: isBranchTaken, output: bubble), the data-lock unit (output: stall), and the forwarding unit]
80
Outline
* Overview of Pipelining
* A Pipelined Data Path
* Pipeline Hazards
* Pipeline with Interlocks
* Forwarding
* Performance Metrics
* Interrupts/ Exceptions

81
Measuring Performance
* What do we mean by the performance of
a processor ?
* ANSWER : Almost nothing
* What should we ask instead ?
* What is the performance with respect to a given
program or a set of programs ?
* Performance is inversely proportional to the time it
takes to execute a program

82
Computing the Time a Program Takes

$\tau = \frac{\#insts \times CPI}{f}$ seconds
* #insts → number of instructions
* CPI → Cycles per instruction
* f → frequency (cycles per second)

83
The Performance Equation

$P \propto \frac{IPC \times f}{\#insts}$

* IPC → 1/CPI (Instructions per Cycle)
* What are the units of performance ?
* ANSWER : arbitrary

84
Number of Instructions (#insts)

Static Instruction: The binary or executable of a program contains a list of static instructions.
Dynamic Instruction: A dynamic instruction is a running instance of a static instruction, which is created by the processor when an instruction enters the pipeline.

* Note that #insts counts dynamic instructions, NOT static instructions
* A smart compiler can reduce the number of executed instructions
85
Number of Instructions(#insts) – 2

* Dead code removal

* Often programmers write code that does not
determine the final output
* This code is redundant
* It can be identified and removed by the compiler
* Function inlining
* Very small functions have a lot of overhead → call, ret
instructions, register spilling, and restoring
* Paste the code of the callee in the code of the caller
(known as inlining)
86
Computing the CPI
* CPI for a single cycle processor = 1
* CPI for an ideal pipeline(no hazards)
* Assume we have n instructions, and k stages
* The first instruction enters the pipeline in cycle 1
* It leaves the pipeline in cycle k
* The rest of the (n-1) instructions leave in the next (n-1) consecutive cycles
* In total, n instructions take n + k − 1 cycles

$CPI = \frac{n + k - 1}{n}$
87
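
As a quick numeric illustration of the ideal-pipeline formula CPI = (n + k − 1)/n, the sketch below evaluates it for a 5-stage pipeline and a few instruction counts. The function names are chosen for this example only.

```python
from fractions import Fraction


def ideal_cycles(n, k):
    """Cycles needed by n instructions on a k-stage pipeline with no stalls."""
    return n + k - 1


def ideal_cpi(n, k):
    """CPI = (n + k - 1) / n, kept as an exact fraction."""
    return Fraction(ideal_cycles(n, k), n)


for n in (1, 5, 100, 1_000_000):
    print(n, float(ideal_cpi(n, k=5)))
```

A single instruction pays the full pipeline depth (CPI = k), while for long programs the CPI approaches 1.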
Computing the Maximum Frequency

* Let the maximum amount of time that it takes to execute any instruction be tmax (also known as algorithmic work)
* Minimum clock cycle time of a single cycle processor → tmax
* In the case of a pipeline, let us assume that all the pipeline stages are balanced
* Time per stage → tmax / k
88
Maximum Frequency - II
* Let the latch delay be l
* We thus have :

$t_{stage} = \frac{t_{max}}{k} + l$

* The minimum cycle time (1/f) is equal to tstage. Let us thus assume that our cycle time is as low as possible:

$\frac{1}{f} = \frac{t_{max}}{k} + l$

89
Performance of an Ideal Pipeline

* Let us assume that the number of instructions is a constant

$P = \frac{f}{CPI}$
90
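Optimal Number of Pipeline Stages

$P = \frac{f}{CPI} = \frac{n}{\left(\frac{t_{max}}{k} + l\right)(n + k - 1)}$

$\frac{\partial P}{\partial k} = 0 \;\Rightarrow\; k = \sqrt{\frac{(n-1)\,t_{max}}{l}}$
* k is inversely proportional to $\sqrt{l}$
* k is proportional to $\sqrt{t_{max}}$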
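
As a numeric sanity check of the ideal-pipeline model (1/f = tmax/k + l, CPI = (n + k − 1)/n, P = f/CPI), the sketch below searches for the best integer number of stages by brute force and compares it with the closed form k = √((n − 1) · tmax / l) that follows from setting ∂P/∂k = 0. The parameter values are illustrative and do not come from the slides.

```python
import math


def performance(k, n, t_max, l):
    """P = f / CPI with 1/f = t_max/k + l and CPI = (n + k - 1)/n."""
    f = 1.0 / (t_max / k + l)
    cpi = (n + k - 1) / n
    return f / cpi


def best_stage_count(n, t_max, l, k_limit=200):
    """Brute-force search over integer stage counts 1..k_limit."""
    return max(range(1, k_limit + 1), key=lambda k: performance(k, n, t_max, l))


# Illustrative values only: n = 10 instructions, t_max = 10, latch delay l = 0.1
n, t_max, l = 10, 10.0, 0.1
analytic_k = math.sqrt((n - 1) * t_max / l)
print(best_stage_count(n, t_max, l), analytic_k)
```

Raising the latch delay in this model lowers the best stage count, and raising tmax increases it.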

91
Implications
* As we increase the latch delay, we should have fewer pipeline stages
* We need to minimise the time wasted in accessing
latches
* As we increase the amount of algorithmic
work, we require more pipeline stages for
ideal performance
* More pipeline stages help distribute the work better,
and increase the overlap across instructions

92
Implications - II

* As the number of instructions tends to ∞, the ideal number of pipeline stages also tends to ∞
* The higher startup time gets amortized in the long run

93
A Non-Ideal Pipeline

* Our ideal CPI is 1 (CPIideal = 1)
* However, in reality, we have stalls

$CPI = CPI_{ideal} + stall\_rate \times stall\_penalty$

* Let us assume that the stall rate is a function of the program and the nature of its dependences

94
Non-Ideal Pipeline - II

* Let us assume that the stall penalty is proportional to the number of pipeline stages
* Both these assumptions are not strictly correct. They are being used to make a coarse-grained mathematical model.
* CPI = (n+k-1)/n + rck
* r → stall rate, c → constant of proportionality

95
Mathematical Model

$P = \frac{f}{CPI} = \frac{1}{\left(\frac{t_{max}}{k} + l\right)\left(\frac{n + k - 1}{n} + rck\right)}$

96
Mathematical Model - II
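
The body of this slide did not survive extraction. One way to carry the model forward, using only the definitions 1/f = tmax/k + l and CPI = (n + k − 1)/n + rck, is sketched below; it is a reconstruction under those assumptions and not necessarily the derivation shown on the original slide.

```latex
% Maximizing P = f / CPI is the same as minimizing D(k) = (1/f) * CPI.
\begin{align*}
D(k) &= \left(\frac{t_{max}}{k} + l\right)\left(\frac{n-1}{n} + k\left(\frac{1}{n} + rc\right)\right) \\
\frac{\partial D}{\partial k} &= -\frac{t_{max}(n-1)}{n k^{2}} + l\left(\frac{1}{n} + rc\right) = 0 \\
k &= \sqrt{\frac{(n-1)\, t_{max}}{l\,(1 + nrc)}} \\
k &\to \sqrt{\frac{t_{max}}{l\, r\, c}} \quad \text{as } n \to \infty
\end{align*}
```

With r = 0 this reduces to the ideal-pipeline result k = √((n − 1) · tmax / l), and for large n the optimal k grows with √(tmax / l) and shrinks as r or c grows.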

97
Implications
* For programs with a lot of dependences (high value of r) → use fewer pipeline stages
* For a pipeline with forwarding → c is smaller (than in a pipeline that just has interlocks)
* It requires a larger number of pipeline stages for optimal performance

98
Implications

* The optimal number of pipeline stages is directly proportional to √(tmax / l)
* This ratio is not changing significantly across technologies.
* This explains why the number of pipeline stages has remained more or less constant for the last 5-10 years

99
Example
Example: Consider two programs that have the following characteristics.

Instruction type          Program 1 (fraction)   Program 2 (fraction)
loads                     0.4                    0.3
branches                  0.2                    0.1
ratio (taken branches)    0.5                    0.4

100
Example

101
Performance, Architecture, Compiler

[Figure: P depends on f and IPC. f is determined by the technology and the architecture; IPC is determined by the compiler and the architecture]

102
Outline
* Overview of Pipelining
* A Pipelined Data Path
* Pipeline Hazards
* Pipeline with Interlocks
* Forwarding
* Performance Metrics
* Interrupts/ Exceptions

103
What happens when you press a key ?

* The keyboard logs the key press

* Converts the key to ASCII or Unicode
* Sends the code to the processor
* The processor thus receives an interrupt
* It suspends the current program
* Jumps to the interrupt handler.
* The interrupt handler draws the shape associated with
the key
* The processor returns to execute the original
program
104
Exceptions
* Exceptions are generated when
* A program accesses an illegal address
* We try to divide by zero (for example, 5/0)
* We issue an invalid instruction
* …
* Exceptions are treated the same way as interrupts
* Jump to the exception handler
* Come back and start executing programs

105
Precise Exceptions

* Informal definition
* We need to return to the original program at exactly
the same point, at which we had left it
* The execution of the interrupt handler should not
disrupt the execution of the original program in any
way. The outcome of the original program should be
independent of the interrupt (unless the program
caused an exception).

106
Precise Exceptions - II
* Formal Definition
* Let us number the dynamic instructions in a program :
I1 … In
* Let us assume that an instruction completes after it
either updates memory, writes to registers, or reaches
the MA stage (cmp, b, beq, bgt, ret)
* Let the last program instruction that completes before
the first instruction in the interrupt handler completes,
be Ik
* Let all the program instructions that complete before
the first instruction in the interrupt handler completes,
be C
107
Precise Exceptions - III

$I_j \in C \Leftrightarrow (j \le k)$

* We need to ensure this condition.
* Also, no instruction of the form Ik' (k' > k) should complete before all the instructions in the interrupt handler complete
* After returning, we can seamlessly execute Ik (same instruction) or Ik+1 (next instruction)
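
The condition Ij ∈ C ⇔ (j ≤ k) says that the completed set must be exactly the prefix I1 … Ik of the dynamic instruction stream. A minimal check of that property, with instruction indices standing in for the instructions themselves (an encoding chosen for this example), could look like this:

```python
def is_precise(completed, k):
    """True iff exactly the instructions I_1..I_k have completed (I_j in C <=> j <= k)."""
    return set(completed) == set(range(1, k + 1))


print(is_precise({1, 2, 3}, 3))      # a clean prefix
print(is_precise({1, 3}, 3))         # I_2 is missing
print(is_precise({1, 2, 3, 5}, 3))   # I_5 completed too early
```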

108
Marking Instructions

* When an interrupt arrives, let us mark the instruction in the MA stage
* Otherwise, if there is an exception
* We mark the instruction as soon as it encounters a fault/exception
* Once an instruction is marked, we have two kinds of instructions in the pipeline
* Instructions before/after the marked instruction

109

Implementing Precise Exceptions

* Wait till a marked instruction reaches the end of the pipeline
* Convert all the instructions after the marked instruction to bubbles
* Ensures that both the conditions of a precise exception are met.
* Once the marked instruction reaches the end of the pipeline, the exception unit loads the pc of the interrupt handler
110
Saving/Restoring Program State

* Program State
* PC
* Registers
* Flags
* Memory
* Memory → Assume that there is no overlap
of memory regions, unless explicitly
intended

111
oldPC Register
* Let us add an npc field in the instruction packet
* For taken branches it is equal to the branch target
* For all other instructions it is equal to (pc + 4)
* We populate the npc field in the EX stage
* Depending on the type of the exception, we might
want to return to pc or npc
* The exception unit sets the oldPC register to the right return
address

112
Spilling/ Restoring Registers

* In the case of functions, we stored registers on the stack
* In this case, the interrupt handler has a separate stack.
* We cannot overwrite the stack pointer (we will lose its
previous value)
* Solution :
* Use an additional register, oldSP, to save the stack pointer
of the program.
* Load the new stack pointer, and spill all the registers
* Save oldPC
113
The Strange Case of the Flags

* Naive solution :
* Do not allow any instruction after the marked
instruction to update the flags register
* We detect an exception typically towards the
middle or end of a cycle
* By that time, the instruction might have already
updated the flags register (at least the master
latch)

114
Solution

* Add a flags field to the instruction packet

* Every instruction saves the updated value of
the flags register to the flags field in its
instruction packet.
* The exception unit saves the flags field of the
marked instruction to the oldFlags register
* We store the oldFlags register to a location in
the interrupt handler's stack
115
Privileged Instructions
* We add the following new registers to
the ISA
* special registers → flags, oldFlags, oldPC, oldSP
* The flags register is now accessible to the ISA
* Add the z series of privileged instructions
* movz instruction
* Transfers values between regular registers and
special registers
116
Privileged Instructions - II
* retz
* pc ← oldPC
* The z series instructions can only be used by the interrupt handler and the operating system
* We use a CPL (current privilege level) bit for setting permissions
* User programs (CPL = 1)
* Interrupt handlers, kernel programs (CPL = 0)
117
Implementing movz and retz

* Define two new opcodes
* These opcodes are usable only when CPL = 0
* Augment the OF and RW stages to use the special registers.
* movz and retz see a different view of registers.

Register    Encoding
r0          0000
oldPC       0001
oldSP       0010
flags       0011
oldFlags    0100
sp          1110

118
Assembly Code for Spilling Registers

/* save the stack pointer */
movz oldSP, sp
mov sp, 0xFFFC

/* spill all the registers other than sp */
st r0, -4[sp]
st r1, -8[sp]
st r2, -12[sp]
st r3, -16[sp]
st r4, -20[sp]
st r5, -24[sp]
st r6, -28[sp]
st r7, -32[sp]
st r8, -36[sp]
st r9, -40[sp]
st r10, -44[sp]
st r11, -48[sp]
st r12, -52[sp]

Spilling Registers - II

st r13, -56[sp]
st r15, -60[sp]

/* save the stack pointer */
movz r0, oldSP
st r0, -64[sp]

/* save the flags register */
movz r0, oldFlags
st r0, -68[sp]

/* save the oldPC */
movz r0, oldPC
st r0, -72[sp]

/* update the stack pointer */
sub sp, sp, 72

/* code of the interrupt handler */
....
....
....

Restoring Registers

/* update the stack pointer */
add sp, sp, 72

/* restore the oldPC register */
ld r0, -72[sp]
movz oldPC, r0

/* restore the flags register */
ld r0, -68[sp]
movz flags, r0

/* restore all the registers other than sp */
ld r0, -4[sp]
ld r1, -8[sp]
ld r2, -12[sp]
ld r3, -16[sp]
ld r4, -20[sp]
ld r5, -24[sp]
ld r6, -28[sp]
ld r7, -32[sp]
ld r8, -36[sp]

Restoring Registers – II

ld r9, -40[sp]
ld r10, -44[sp]
ld r11, -48[sp]
ld r12, -52[sp]
ld r13, -56[sp]
ld r15, -60[sp]
/* restore the stack pointer */
ld sp, -64[sp]
/* return to the program */
retz
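
The frame layout used by the spill and restore sequences can be checked with a small Python model. This is a sketch under stated assumptions: registers are held in a dictionary, memory is a dictionary keyed by byte address, and the function names `spill` and `restore` are illustrative. The offsets and the handler stack address 0xFFFC are the ones used in the assembly code.

```python
HANDLER_STACK_TOP = 0xFFFC  # value loaded into sp by the spill code
FRAME_BYTES = 72            # total bytes written by the spill code

# r0..r13 and r15 are spilled in this order; r14 is sp and is saved via oldSP.
GPRS = [f"r{i}" for i in range(14)] + ["r15"]


def spill(state, mem):
    """Model of the spill sequence: save registers on the handler's stack."""
    state["oldSP"] = state["sp"]           # movz oldSP, sp
    state["sp"] = HANDLER_STACK_TOP        # mov sp, 0xFFFC
    sp = state["sp"]
    for slot, reg in enumerate(GPRS, start=1):
        mem[sp - 4 * slot] = state[reg]    # st rX, -4*slot[sp]
    mem[sp - 64] = state["oldSP"]          # program's stack pointer
    mem[sp - 68] = state["oldFlags"]       # flags of the marked instruction
    mem[sp - 72] = state["oldPC"]          # PC to return to
    state["sp"] = sp - FRAME_BYTES         # sub sp, sp, 72


def restore(state, mem):
    """Model of the restore sequence: undo spill() before retz."""
    state["sp"] += FRAME_BYTES             # add sp, sp, 72
    sp = state["sp"]
    state["oldPC"] = mem[sp - 72]          # ld r0, -72[sp]; movz oldPC, r0
    state["flags"] = mem[sp - 68]          # ld r0, -68[sp]; movz flags, r0
    for slot, reg in enumerate(GPRS, start=1):
        state[reg] = mem[sp - 4 * slot]    # ld rX, -4*slot[sp]
    state["sp"] = mem[sp - 64]             # ld sp, -64[sp]
```

Calling `spill`, overwriting the general-purpose registers, and then calling `restore` returns every register and the program's stack pointer to its earlier value, and `flags` receives the saved oldFlags value. The 15 general-purpose slots occupy offsets -4 to -60, which is why the three special values sit at -64, -68, and -72.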

[Figure: pipelined data path with exception support. Labeled blocks: fetch unit, instruction memory, control unit, immediate and branch unit, register file, branch unit, ALU, flags, memory unit, data memory, register write unit, the IF-OF, OF-EX, EX-MA, and MA-RW pipeline registers, branch-lock unit, data-lock unit, forwarding unit, and exception unit. Labeled signals: PC of exception handler, isBranchTaken, bubble, stall, CPL, (pc/npc), flags. Special registers: oldFlags, oldPC, oldSP, flags.]

THE END
