CS429: Computer Organization and Architecture: Pipeline III

There are two types of hazards that can interfere with instruction flow through a pipeline: data hazards and control hazards. Data hazards occur when a value produced by one instruction is needed by a subsequent instruction before it is available. Control hazards occur when the execution of a branch instruction makes the next instruction to fetch ambiguous. These hazards can be addressed by stalling the pipeline to wait for dependent values, forwarding values within the pipeline before writeback, or a combination of both. Load-use hazards, where an instruction needs a value from a prior load before it is available from memory, are handled with stalling and bubbles inserted into the pipeline.

CS429: Computer Organization and Architecture

Pipeline III

Dr. Bill Young

Department of Computer Science
University of Texas at Austin

Last updated: July 11, 2019 at 15:02

CS429 Slideset 16: 1 Pipeline III

Data Hazard vs. Control Hazard

There are two types of hazards that interfere with flow through a
pipeline.

Data hazard: values produced by one instruction are not available when
needed by a subsequent instruction.

Control hazard: a branch in the control flow makes it ambiguous which
instruction to fetch next.


How Do We Fix the Pipeline?

Possibilities:

1. Pad the program with NOPs. That could mean two things:
    Change the program itself. That violates our Pipeline Correctness Axiom. Why?
    Make the implementation behave as if there were NOPs inserted.
2. That’s called stalling the pipeline.
    Data hazards:
        Wait for producing instruction to complete.
        Then proceed with consuming instruction.
    Control hazards:
        Wait until new PC has been determined, then fetch.
        Make a guess and patch later, if wrong.
    How is this better than inserting NOPs into the program?


How Do We Fix the Pipeline?

3. Forward data within the pipeline.
    Grab the result from somewhere in the pipe
        after it has been computed,
        but before it has been written back.
    This gives an opportunity to avoid performance degradation due to stalling for hazards.
4. Do some clever combination of these.

The implemented solution (4) is a combination of 2 and 3: forward
data when possible and stall the pipeline only when necessary.


Data Forwarding

irmovq $10, %rdx
irmovq $3, %rax
addq %rdx, %rax

Naive pipeline
    A register isn’t written until completion of the write-back stage.
    Source operands are read from the register file in the decode stage.
    The value needs to be in the register file at the start of that stage.
Observation: the value was available in the execute or memory stage.
Trick:
    Pass the value directly from the generating instruction to the decode stage.
    It needs to be available at the end of the decode stage.
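
The timing problem in the naive pipeline can be checked with a little arithmetic. The following is a minimal sketch, assuming the five-stage pipeline (fetch, decode, execute, memory, write-back) with one instruction fetched per cycle and no stalls; the function names are made up for illustration and are not part of the course material.

```python
# Assumed stage order for the five-stage pipeline in these slides.
STAGES = ["F", "D", "E", "M", "W"]

def cycle_in_stage(fetch_cycle, stage):
    """Cycle in which an instruction fetched in fetch_cycle occupies stage,
    assuming no stalls."""
    return fetch_cycle + STAGES.index(stage)

def naive_hazard(producer_fetch, consumer_fetch):
    """True if the consumer reads the register file (decode) no later than
    the cycle in which the producer is still writing it (write-back)."""
    written = cycle_in_stage(producer_fetch, "W")  # register updated at the end of this cycle
    read = cycle_in_stage(consumer_fetch, "D")     # register file read during this cycle
    return read <= written

# irmovq $10,%rdx is fetched in cycle 1, irmovq $3,%rax in cycle 2, addq in cycle 3.
rdx_hazard = naive_hazard(1, 3)
rax_hazard = naive_hazard(2, 3)
print(rdx_hazard, rax_hazard)  # True True
```

Under these assumptions the consumer would have to be fetched at least four cycles after the producer (three intervening instructions) for the naive pipeline to read the right value. That delay is what forwarding avoids.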


Data Forwarding Example


Bypass Paths

Decode stage:
    Forwarding logic selects valA and valB.
    Normally from the register file.
    Forwarding: get valA or valB from a later pipeline stage.

Forwarding sources:
    Execute: valE
    Memory: valE, valM
    Write back: valE, valM

[Figure: pipeline datapath (Fetch, Decode, Execute, Memory, Write back) with bypass paths e_valE, M_valE, m_valM, W_valE, and W_valM feeding the forwarding logic in the decode stage.]

Data Forwarding Example 2

Register %rdx: generated by the ALU during the previous cycle; forwarded from memory as valA.
Register %rax: value just generated by the ALU; forwarded from execute as valB.

Implementing Forwarding

Add new feedback paths from the E, M, and W pipeline registers into the decode stage.
Create logic blocks to select from multiple sources for valA and valB in the decode stage.
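
The selection logic can be sketched as a priority search over the forwarding sources. This is only an illustration, not the course's HCL. It assumes the sources are checked from the youngest in-flight instruction to the oldest (execute, then memory, then write-back), which is the ordering used in the CS:APP textbook design; the helper name `select_operand` and the signal names in the comments are assumptions.

```python
RNONE = 0xF          # Y86-64 "no register" ID
RAX, RDX = 0x0, 0x2  # Y86-64 register IDs

def select_operand(src, regfile_value, sources):
    """Pick the value for one decode-stage operand (valA or valB).

    sources is a list of (destination_register, value) pairs ordered from
    the youngest in-flight instruction to the oldest. The first match wins,
    so the most recently produced value is used.
    """
    if src == RNONE:
        return regfile_value
    for dst, value in sources:
        if dst == src:
            return value
    return regfile_value  # no pending write: use the register file

# addq %rdx,%rax is in decode; irmovq $3,%rax is in execute and
# irmovq $10,%rdx is in memory.
sources = [
    (RAX, 3),       # e_dstE, e_valE (assumed names)
    (RNONE, None),  # M_dstM, m_valM
    (RDX, 10),      # M_dstE, M_valE
    (RNONE, None),  # W_dstM, W_valM
    (RNONE, None),  # W_dstE, W_valE
]
valA = select_operand(RDX, 0, sources)  # register file still holds a stale 0
valB = select_operand(RAX, 0, sources)
print(valA, valB)  # 10 3
```

Checking the youngest source first matters when two in-flight instructions write the same register: the consumer must see the later of the two values.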

Limitation of Forwarding

Load-use (data) dependency:
    Value needed by end of decode stage in cycle 7.
    Value read from memory in memory stage of cycle 8.

Dealing with Load/Use Hazard

Notice that the needed value is not yet in any pipeline register.
    Stall the using instruction for one cycle; this requires one bubble.
    The loaded value can then be picked up by forwarding from the memory stage.

What’s a Bubble?

If we stall the pipeline at one stage and let the instructions ahead
proceed, that creates a gap that has to be filled.
A bubble is a “virtual nop” created by populating the pipeline
registers at that stage with values as if there had been a nop at that
point in the program. The bubble can flow through the pipeline just
like any other instruction.

A bubble is used for two purposes:
    1. fill the gap created when the pipeline is stalled;
    2. replace a real instruction that was fetched erroneously.


Control for Load/Use Hazard

Stall instructions in fetch and decode stages.
Inject bubble into execute stage.

Condition         F      D      E       M       W
Load/Use Hazard   stall  stall  bubble  normal  normal
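
The table can be turned into a small detection-and-control sketch. It assumes the detection condition used in the CS:APP textbook design: the instruction in execute is a load (mrmovq or popq) and its memory destination register is a source of the instruction in decode. The function names are hypothetical, and the signal names follow the pipeline-register naming in the bypass diagram.

```python
IMRMOVQ, IPOPQ = 0x5, 0xB  # Y86-64 icodes that load a register from memory
RNONE = 0xF                # Y86-64 "no register" ID
RAX, RBX = 0x0, 0x3        # Y86-64 register IDs

def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """True if the load in execute writes a register that the instruction
    in decode is about to read."""
    return (E_icode in (IMRMOVQ, IPOPQ)
            and E_dstM != RNONE
            and E_dstM in (d_srcA, d_srcB))

def pipeline_control(hazard):
    """Action for each pipeline register, following the table above."""
    if hazard:
        return {"F": "stall", "D": "stall", "E": "bubble",
                "M": "normal", "W": "normal"}
    return {stage: "normal" for stage in "FDEMW"}

# mrmovq 0(%rbx),%rax is in execute while addq %rax,%rbx is in decode.
hazard = load_use_hazard(IMRMOVQ, RAX, RAX, RBX)
control = pipeline_control(hazard)
print(hazard, control["D"], control["E"])  # True stall bubble
```

Stalling F and D holds the consumer in decode for one more cycle, and the bubble in E fills the gap it leaves. On the next cycle the load is in the memory stage and its value can be forwarded.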

Control Hazards: Recall Our Prediction Strategy

Instructions that don’t transfer control:
    Predict next PC to be valP; this is always reliable.
Call and Unconditional Jumps:
    Predict next PC to be valC (destination); this is always reliable.
Conditional Jumps:
    Predict next PC to be valC (destination).
    Only correct if the branch is taken; right about 60% of the time.
Return Instruction:
    Don’t try to predict.

Note that we could have used a different prediction strategy.
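
The strategy above can be written as a short function. This is a sketch rather than the course's HCL; the function name `predict_pc` and the use of `None` for "no prediction" are assumptions made for illustration. The icode values are the standard Y86-64 encodings, in which conditional and unconditional jumps share one icode.

```python
IJXX, ICALL, IRET = 0x7, 0x8, 0x9  # Y86-64 icodes for jumps, call, and ret

def predict_pc(icode, valC, valP):
    """Predicted address of the next instruction to fetch.

    valC is the constant word in the instruction (the jump or call
    destination); valP is the address of the following instruction.
    Returns None for ret, where no prediction is attempted.
    """
    if icode == IRET:
        return None
    if icode in (IJXX, ICALL):
        return valC  # always-taken prediction for jumps; exact for call
    return valP      # every other instruction falls through

# A conditional jump at 0x002 with destination 0x016 and fall-through 0x00b.
predicted = predict_pc(IJXX, 0x016, 0x00b)
print(hex(predicted))  # 0x16
```

A different strategy, for example never-taken, would change only the jump case.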


Branch Misprediction Example

0x000:    xorq %rax, %rax
0x002:    jne target          # Not taken
0x00b:    irmovq $1, %rax     # Fall through
0x015:    halt
0x016: target:
0x016:    irmovq $2, %rdx     # Target
0x020:    irmovq $3, %rcx     # Target + 1
0x02a:    halt

Should only execute the first 4 instructions.


Handling Misprediction

Predict branch as taken
    Fetch 2 instructions at target
Cancel when mispredicted
    Detect branch not taken in execute stage
    On the following cycle, replace the instructions in the execute and decode stages with bubbles.
    No side effects have occurred yet.

Control for Misprediction

Condition            F       D       E       M       W
Mispredicted Branch  normal  bubble  bubble  normal  normal
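
A minimal sketch of the detection and control for this case follows. It assumes the misprediction is recognized when a jump is in the execute stage and its branch condition evaluates to false. The signal name `e_Bch` is borrowed from the "Bch" label in the bypass diagram and is an assumption, as are the function names.

```python
IJXX = 0x7  # Y86-64 icode shared by all jump instructions

def mispredicted(E_icode, e_Bch):
    """True if a jump predicted as taken turns out not to be taken."""
    return E_icode == IJXX and not e_Bch

def pipeline_control(E_icode, e_Bch):
    """Action for each pipeline register, following the table above."""
    if mispredicted(E_icode, e_Bch):
        return {"F": "normal", "D": "bubble", "E": "bubble",
                "M": "normal", "W": "normal"}
    return {stage: "normal" for stage in "FDEMW"}

# xorq %rax,%rax produces zero, so ZF is set and jne is not taken.
ZF = 1
e_Bch = (ZF == 0)  # jne is taken only when ZF is clear
control = pipeline_control(IJXX, e_Bch)
print(control["D"], control["E"])  # bubble bubble
```

The two bubbles cancel the two instructions fetched from the target, which are in the fetch and decode stages when the jump reaches execute. Fetch stays normal so that it can pick up the fall-through instruction.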


Return Example

    irmovq Stack, %rsp   # Initialize stack pointer
    call p               # Procedure call
    irmovq $5, %rsi      # Return point
    halt
.pos 0x20
p:  irmovq $-1, %rdi     # procedure
    ret
    irmovq $1, %rax      # should not be executed
    irmovq $2, %rcx      # should not be executed
    irmovq $3, %rdx      # should not be executed
    irmovq $4, %rbx      # should not be executed
.pos 0x100
Stack:                   # Stack pointer

Without stalling, could execute three additional instructions.


Correct Return Example

ret
bubble
bubble
bubble
irmovq $5, %rsi # Return

As ret passes through the pipeline, stall at the fetch stage while it is in the
decode, execute, and memory stages.
Inject a bubble into the decode stage.
Release the stall when ret reaches the write-back stage.


Control for Return

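This is a bit confusing, because there are actually three bubbles
inserted. The table shows a single bubble in D, but the condition holds
for three consecutive cycles (ret in decode, execute, and memory), so one
bubble is injected on each of them. Stall until the ret reaches write back.

ret
bubble
bubble
bubble
irmovq $5, %rsi # Return

Condition       F      D       E       M       W
Processing ret  stall  bubble  normal  normal  normal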
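
The count of three bubbles can be confirmed by stepping a ret through the stages. This is a sketch under the assumption that the "processing ret" condition holds whenever a ret is in the decode, execute, or memory stage; the function names are hypothetical.

```python
INOP, IRET = 0x1, 0x9  # Y86-64 icodes for nop and ret

def processing_ret(D_icode, E_icode, M_icode):
    """True while a ret occupies the decode, execute, or memory stage."""
    return IRET in (D_icode, E_icode, M_icode)

def count_ret_bubbles():
    """Step a ret from decode to write-back and count injected bubbles."""
    stages = [IRET, INOP, INOP]  # icodes in D, E, M as ret enters decode
    bubbles = 0
    while processing_ret(*stages):
        bubbles += 1                  # F stalls; a bubble is injected into D
        stages = [INOP] + stages[:2]  # ret advances; the bubble (a nop) enters D
    return bubbles

bubbles = count_ret_bubbles()
print(bubbles)  # 3
```

The loop ends when ret has moved into write-back. At that point the return address has been read from memory and fetch can resume.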


Pipeline Summary
Data Hazards
    Most are handled by forwarding with no performance penalty.
    Load/use hazard requires a one-cycle stall.

Control Hazards
    Cancel instructions when a mispredicted branch is detected; two cycles wasted.
    Stall the fetch stage while ret passes through the pipeline; three cycles wasted.

Control Combinations
    Must analyze carefully.
    First version had a subtle bug.
    Only arises with an unusual instruction combination.


Performance Analysis with Pipelining

$$
\text{CPU time} = \frac{\text{Seconds}}{\text{Program}}
  = \frac{\text{Instructions}}{\text{Program}} \times \frac{\text{Cycles}}{\text{Instruction}} \times \frac{\text{Seconds}}{\text{Cycle}}
$$

Ideal pipelined machine: Cycles per Instruction (CPI) = 1
    One instruction completed per cycle.
    But much faster cycle time than unpipelined machine.
However, hazards work against the ideal.
    Hazards resolved using forwarding are fine, with no penalty.
    Stalling degrades performance, and the instruction completion rate is interrupted.
CPI is a measure of the “architectural efficiency” of the design.


Computing CPI

CPI is a function of useful instructions ($C_i$) and bubbles ($C_b$):

$$
\text{CPI} = \frac{C_i + C_b}{C_i} = 1.0 + \frac{C_b}{C_i}
$$

You can reformulate this to account for:
    load/use penalties (lp): 1 bubble
    branch misprediction penalties (mp): 2 bubbles
    return penalties (rp): 3 bubbles

$$
\text{CPI} = 1.0 + \frac{lp + mp + rp}{C_i}
$$


Computing CPI (2)

So, how do we determine the penalties?

Depends on how often each situation occurs on average.
    How often does a load occur and how often does that load cause a stall?
    How often does a branch occur and how often is it mispredicted?
    How often does a return occur?
We can measure these using:
    a simulator, or
    hardware performance counters.
We can also estimate them through historical averages.
Then use estimates to make early design tradeoffs for the architecture.
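
As a rough illustration of the simulator approach, the sketch below counts bubbles in a recorded instruction trace. The trace format (tuples of operation, register loaded from memory, registers read, and branch outcome) and the operation name "jXX" for conditional jumps are invented for this example. The penalties of 1, 2, and 3 bubbles are the ones used in these slides, with branches predicted as taken.

```python
LOADS = {"mrmovq", "popq"}  # instructions that load a register from memory

def count_bubbles(trace):
    """Total bubbles for a trace of (op, loaded_reg, srcs, taken) tuples.

    loaded_reg is the register written from memory (or None), srcs the
    registers read, and taken is True/False for conditional jumps and
    None for everything else.
    """
    bubbles = 0
    for i, (op, loaded_reg, srcs, taken) in enumerate(trace):
        nxt = trace[i + 1] if i + 1 < len(trace) else None
        if op in LOADS and nxt is not None and loaded_reg in nxt[2]:
            bubbles += 1  # load/use hazard
        if op == "jXX" and taken is False:
            bubbles += 2  # predicted taken, actually not taken
        if op == "ret":
            bubbles += 3  # fetch stalls until ret reaches write-back
    return bubbles

def measured_cpi(trace):
    """CPI = 1.0 + bubbles / instructions executed."""
    return 1.0 + count_bubbles(trace) / len(trace)

trace = [
    ("mrmovq", "%rax", ("%rbx",), None),
    ("addq", None, ("%rax", "%rcx"), None),  # reads %rax right after the load
    ("jXX", None, (), False),                # conditional jump, not taken
    ("ret", None, ("%rsp",), None),
]
print(count_bubbles(trace), measured_cpi(trace))  # 6 2.5
```

A four-instruction trace is far too short to be representative; real measurements would run over whole programs.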


Computing CPI (3)

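Assume some hypothetical counts:

Cause       Name  Instruction  Condition  Stalls  Product
                  Frequency    Frequency
Load/use    lp    0.30         0.3        1       0.09
Mispredict  mp    0.20         0.4        2       0.16
Return      rp    0.02         1.0        3       0.06
Total penalty                                     0.31

Because the frequencies are per instruction, each product is already the
average number of bubbles per instruction from that cause.

CPI = 1 + 0.31 = 1.31, i.e., 31% more cycles per instruction than the ideal CPI of 1.
This is not ideal.

This gets worse when:
    you also account for non-ideal memory access latency;
    the pipeline is deeper (where stalls per hazard increase).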
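
The arithmetic in the table can be reproduced in a few lines. The function name `cpi` and the tuple layout are choices made for this sketch; the numbers are the hypothetical ones from the table.

```python
def cpi(penalties):
    """CPI = 1.0 + sum of (instruction frequency * condition frequency * bubbles)."""
    return 1.0 + sum(f_instr * f_cond * bubbles
                     for f_instr, f_cond, bubbles in penalties)

# (instruction frequency, condition frequency, bubbles per occurrence)
table = [
    (0.30, 0.3, 1),  # load/use
    (0.20, 0.4, 2),  # mispredicted branch
    (0.02, 1.0, 3),  # return
]
result = cpi(table)
print(f"{result:.2f}")  # 1.31
```

Changing a single entry shows how sensitive the design is to each cause. For example, the mispredict row alone accounts for more than half of the total penalty here.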
