Pipelinenew

Uploaded by

Dibakar Banerjee

PIPELINING

Pipelining is the process of accumulating instructions from the processor and passing them through a pipeline.

It allows instructions to be stored and executed in an orderly, stage-by-stage process. It is also known as pipeline processing.
What Is Pipelining

• Laundry Example
• Ann, Brian, Cathy, and Dave (A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• "Folder" takes 20 minutes
What Is Pipelining

[Figure: timeline from 6 PM to midnight; loads A, B, C, D in task order, each taking 30 + 40 + 20 minutes, one load after another]

Sequential laundry takes 6 hours for 4 loads

If they learned pipelining, how long would laundry take?
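The laundry question can be checked with a short sketch. It assumes each machine handles one load at a time and that a load moves on as soon as the next machine is free; the function names are illustrative and not part of the slides.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """Loads overlap: a stage starts a load once the load and the stage are both free."""
    # finish[s] = time at which stage s finished its most recent load
    finish = [0] * len(stage_minutes)
    for _ in range(loads):
        ready = 0  # time at which this load leaves the previous stage
        for s, minutes in enumerate(stage_minutes):
            start = max(ready, finish[s])
            finish[s] = start + minutes
            ready = finish[s]
    return finish[-1]


stages = [30, 40, 20]  # washer, dryer, "folder"
print(sequential_minutes(stages, 4))  # 360 minutes = 6 hours
print(pipelined_minutes(stages, 4))   # 210 minutes = 3.5 hours
```

Under these assumptions the slowest stage (the 40-minute dryer) sets the pace once the pipeline is full.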
Motivation
• Non-pipelined design
  • Single-cycle implementation
    • The cycle time depends on the slowest instruction
    • Every instruction takes the same amount of time
  • Multi-cycle implementation
    • Divide the execution of an instruction into multiple steps
    • Each instruction may take a variable number of steps (clock cycles)
• Pipelined design
  • Divide the execution of an instruction into multiple steps (stages)
  • Overlap the execution of different instructions in different stages
    • In each cycle, a different instruction is executed in each stage
    • For example, in a 5-stage pipeline (Fetch-Decode-Read-Execute-Write), 5 instructions are executed concurrently in 5 different pipeline stages
  • Complete the execution of one instruction every cycle (instead of every 5 cycles)
  • Can increase the throughput of the machine by up to 5 times
PIPELINE PRINCIPLE

• Linear Pipeline
  • Asynchronous
  • Synchronous

• Nonlinear Pipeline
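Asynchronous

• The input is sent together with a Ready signal, and the receiving stage answers with an Ack signal once it has taken the data

[Figure: stage S1 with Input and Output/Input data lines, a Ready line into the stage, and ACK lines returned to the sender]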
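Synchronous

Each stage is a pure combinational circuit; clocked latches (L) between the stages hold the intermediate results.

A valid bit from the previous stage is used to gate the clock signal.

[Figure: Input passing through latch L, stage S1, latch L, ..., stage Sn, latch L, with all latches driven by the clock pulse (CLK)]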
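Pipeline Example

5 stage pipeline:
Fetch – Decode – Read – Execute – Write

Non-pipelined processor: 25 cycles = number of instrs (5) * number of stages (5)

I1: F D R E W   (cycles 1 to 5)
I2: F D R E W   (cycles 6 to 10)
I3: F D R E W   (cycles 11 to 15)
I4: F D R E W   (cycles 16 to 20)
I5: F D R E W   (cycles 21 to 25)

Pipelined processor: 9 cycles = start-up latency (4) + number of instrs (5)

Cycle: 1 2 3 4 5 6 7 8 9
I1:    F D R E W
I2:      F D R E W
I3:        F D R E W
I4:          F D R E W
I5:            F D R E W

Filling the pipeline: cycles 1 to 4
Draining the pipeline: cycles 6 to 9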
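The cycle counts in this example can be reproduced with a small sketch. It assumes one cycle per stage and no stalls; the helper names are made up for illustration.

```python
STAGES = "FDREW"  # Fetch, Decode, Read, Execute, Write


def non_pipelined_cycles(num_instrs, num_stages):
    """Every instruction runs all stages before the next one starts."""
    return num_instrs * num_stages


def pipelined_cycles(num_instrs, num_stages):
    """Start-up latency (stages - 1) plus one completion per cycle."""
    return (num_stages - 1) + num_instrs


def timing_rows(num_instrs, stages=STAGES):
    """One text row per instruction; each cycle occupies two characters."""
    return ["  " * i + " ".join(stages) for i in range(num_instrs)]


print(non_pipelined_cycles(5, 5))  # 25
print(pipelined_cycles(5, 5))      # 9
for row in timing_rows(5):
    print(row)
```

Each printed row is shifted right by one cycle relative to the row above it, which gives the staircase shape of the pipelined timing diagram.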
Theoretical Speedup due to Pipelining

• The theoretical speedup offered by a pipeline can be determined as follows:
  • Let k be the total number of stages and tp be the time per stage
  • Each instruction represents a task, T, in the pipeline, and n is the total number of tasks
  • The first task (instruction) requires k × tp time to complete in a k-stage pipeline
  • The remaining (n - 1) tasks emerge from the pipeline one per cycle
  • So the total time to complete the remaining tasks is (n - 1)tp
  • Thus, to complete n tasks using a k-stage pipeline requires:
    (k × tp) + (n - 1)tp = (k + n - 1)tp
• Without a pipeline, each task takes tn = k × tp, so n tasks take n × tn = n × k × tp
• If we take the time required to complete n tasks without a pipeline and divide it by the time it takes to complete n tasks using a pipeline, we find:
    Speedup S = (n × tn) / ((k + n - 1)tp) = (n × k) / (k + n - 1)
• If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:
    S = k
Speed up & Efficiency

Sk = n × k / (k + (n - 1))
• Efficiency is measured by the percentage of busy time-space spans over the total time-space span:
Ek = Sk / k = n / (k + (n - 1))
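A minimal sketch of these formulas, assuming every stage takes the same time tp and the pipeline never stalls. The function names are illustrative.

```python
def pipeline_time(n, k, tp=1.0):
    """Time for n tasks on a k-stage pipeline: (k + n - 1) * tp."""
    return (k + n - 1) * tp


def speedup(n, k):
    """Sk = n * k / (k + n - 1); the stage time tp cancels out."""
    return n * k / (k + n - 1)


def efficiency(n, k):
    """Ek = Sk / k = n / (k + n - 1)."""
    return n / (k + n - 1)


print(pipeline_time(5, 5))    # 9.0 stage times
print(speedup(5, 5))          # 25/9, about 2.78
print(speedup(1_000_000, 5))  # approaches k = 5 for large n
print(efficiency(5, 5))       # 5/9, about 0.56
```

For a small number of tasks the fill and drain cycles dominate, which is why five instructions on five stages give a speedup well below 5.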
Impact on Clock cycle time due to Pipelining

• Let's recall that program execution time = instruction count × cycles per instruction × time per cycle

• If we lower the time per cycle, this will lower the program execution time and hence improve performance
• This implies that if we shorten the time per pipeline stage, we will lower the clock cycle time. This can be achieved by adding more pipe stages of shorter duration
Non-linear Dynamic Pipelines
• Multiple processors (k stages) arranged as in a linear pipeline
• Variable functions of individual processors
• Functions may be dynamically assigned
• Feedforward and feedback connections
Reservation Table

• A reservation table displays the time-space flow of data through the pipeline for one function evaluation

• A static pipeline is specified by a single reservation table

• A dynamic pipeline may be specified by multiple reservation tables
Static Pipeline

Time   1  2  3  4
S1     X
S2        X
S3           X
S4              X
Dynamic Pipeline

S1 X X X
S2 X X
S3 X X X
Latency Analysis
• Latency: the number of clock cycles between two initiations of the pipeline

• Collision: an attempt by two initiations to use the same pipeline stage at the same time

• Some latencies cause collisions, some do not

Latency analysis cont.
• Latency: The number of time units between two initiations of a pipeline is the latency between them. It is always a positive integer.

• Forbidden latency: A latency that causes a collision is called a forbidden latency.
State Diagram
Method of finding Latency

• Forbidden Latency Set: take the union of the forbidden latencies found in each row of the reservation table
  F = {5} ∪ {2} ∪ {2} = {2, 5}
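As a sketch of how the forbidden latency set is found: within each row of a reservation table, the distance between any two checkmarks is a forbidden latency. The table below is hypothetical (the column positions of the slide's table are not recoverable); it was chosen only because it yields the same per-row sets {5}, {2}, {2}.

```python
from itertools import combinations


def forbidden_latencies(table):
    """Union, over all stages, of the distances between checkmarks in the same row."""
    forbidden = set()
    for columns in table.values():
        for earlier, later in combinations(sorted(columns), 2):
            forbidden.add(later - earlier)
    return forbidden


# Hypothetical reservation table: stage name -> columns (cycles) holding a checkmark.
HYPOTHETICAL_TABLE = {
    "S1": [1, 6],  # distance 5
    "S2": [2, 4],  # distance 2
    "S3": [3, 5],  # distance 2
}

print(sorted(forbidden_latencies(HYPOTHETICAL_TABLE)))  # [2, 5]
```

A row with a single checkmark contributes nothing, so a linear pipeline with one checkmark per stage has no forbidden latencies.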
State Diagram
• The initial collision vector (ICV) is a binary vector formed from F such that
  C = (Cn ... C2 C1), where Ci = 1 if latency i is in F and Ci = 0 otherwise
  Thus in our example
  F = {2, 5}
  C = (1 0 0 1 0)
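A minimal sketch of forming the collision vector from F. The vector is written with Cn on the left and C1 on the right, as above; the function names are illustrative.

```python
def collision_vector(forbidden):
    """Bit string Cn ... C2 C1 with Ci = 1 exactly when latency i is forbidden."""
    n = max(forbidden)
    return "".join("1" if i in forbidden else "0" for i in range(n, 0, -1))


def permissible_latencies(vector):
    """Latencies i, counted from the right, whose bit Ci is 0."""
    n = len(vector)
    return [i for i in range(1, n + 1) if vector[n - i] == "0"]


icv = collision_vector({2, 5})
print(icv)                         # 10010
print(permissible_latencies(icv))  # [1, 3, 4]
```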
State Diagram
• The procedure is as follows:
  1. Start with the ICV
  2. For each unprocessed state CVi, and for each bit i of CVi which is 0, do the following:
     a. Shift CVi right by i bits
     b. Drop the i rightmost bits
     c. Append zeros to the left
     d. Logically OR with the ICV
     e. If step (d) results in a new state, form a new node for this state. In either case, join the resulting state to the node of CVi by an arc marked i.
• This shifting process needs to continue until no more new states can be generated.
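The procedure can be sketched as follows. This is an illustrative implementation, not taken from the slides: states are held as integers, and one extra arc with latency m + 1 (m being the vector length) is added by assumption to stand for every latency longer than the collision vector, all of which lead back to the initial state.

```python
def build_state_diagram(icv):
    """Map each state to {latency: next state} using shift-right and OR with the ICV."""
    m = len(icv)
    initial = int(icv, 2)
    arcs = {}
    pending = [initial]
    while pending:
        state = pending.pop()
        if state in arcs:
            continue  # already processed
        arcs[state] = {}
        for i in range(1, m + 1):
            if (state >> (i - 1)) & 1:
                continue  # bit i is 1: latency i would cause a collision
            nxt = (state >> i) | initial  # steps a to d
            arcs[state][i] = nxt          # step e: arc marked i
            pending.append(nxt)
        arcs[state][m + 1] = initial  # assumed arc for "m + 1 or more"

    def as_bits(value):
        return format(value, f"0{m}b")

    return {as_bits(s): {i: as_bits(t) for i, t in out.items()}
            for s, out in arcs.items()}


for state, out in build_state_diagram("10010").items():
    print(state, out)
```

For the ICV 10010 this yields three states (10010, 11011, 10011); the arc keyed 6 should be read as "6 or more".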
State Diagram

Initial state: 10010 (the ICV)

From state 10010 (zero bits: 1, 3, 4):
  i = 1:   CVi = 01001, ICV OR CVi gives CV* = 11011 (new state)
  i = 3:   CVi = 00010, CV* = 10010 (arc back to 10010)
  i = 4:   CVi = 00001, CV* = 10011 (new state)
  i = 6+:  CVi = 00000, CV* = 10010 (arc back to 10010; latency 5 itself is forbidden because C5 = 1)

From state 10011 (zero bits: 3, 4):
  i = 3:   CVi = 00010, CV* = 10010
  i = 4:   CVi = 00001, CV* = 10011 (arc back to 10011)
  i = 6+:  CVi = 00000, CV* = 10010

From state 11011 (zero bit: 3):
  i = 3:   CVi = 00011, CV* = 10011
  i = 6+:  CVi = 00000, CV* = 10010
State Diagram

• The state with all zeros has a self-loop, which corresponds to an empty pipeline; it is possible to wait for an indefinite number of cycles, giving latency cycles of the form (7), (8), (9), (10), etc.
• Simple Cycle: a latency cycle in which each state is encountered only once.
• Complex Cycle: a cycle that consists of more than one simple cycle.
• It is enough to look for simple cycles.
State Diagram

• Greedy Cycle: A simple cycle is a greedy cycle if each latency contained in the cycle is the minimal latency (outgoing arc) from a state in the cycle.
• A good task initiation sequence should include the greedy cycle.
Simple cycles & Greedy cycles
• The simple cycles are?
• The greedy cycles are?
Simple cycles & Greedy cycles
• The simple cycles include (3), (6), (1,3,3), (4,3) and (4)
• The greedy cycle is (1,3,3)
State Diagram

• In the above example, the cycle that offers the minimal average latency (MAL) is (1, 3, 3)
  (MAL = (1 + 3 + 3)/3 = 2.333)
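The MAL can be checked by enumerating the simple cycles of the state diagram. In the sketch below the arcs are hard-coded; they were worked out by applying the shift-and-OR procedure to the ICV 10010, and the arc keyed 6 is an assumption standing for "6 or more". Function names are illustrative.

```python
from fractions import Fraction

# state -> {latency: next state}
ARCS = {
    "10010": {1: "11011", 3: "10010", 4: "10011", 6: "10010"},
    "11011": {3: "10011", 6: "10010"},
    "10011": {3: "10010", 4: "10011", 6: "10010"},
}


def simple_cycles(arcs):
    """Latency cycles that pass through each state at most once."""
    order = {state: idx for idx, state in enumerate(arcs)}
    cycles = []

    def walk(start, state, latencies, visited):
        for latency, nxt in arcs[state].items():
            if nxt == start:
                cycles.append(tuple(latencies + [latency]))
            elif order[nxt] > order[start] and nxt not in visited:
                # only visit later states, so each cycle is reported once
                walk(start, nxt, latencies + [latency], visited | {nxt})

    for start in arcs:
        walk(start, start, [], {start})
    return cycles


def average_latency(cycle):
    return Fraction(sum(cycle), len(cycle))


def greedy_cycle(arcs, start):
    """Follow the smallest outgoing latency until a state repeats."""
    path, seen, state = [], {}, start
    while state not in seen:
        seen[state] = len(path)
        latency = min(arcs[state])
        path.append(latency)
        state = arcs[state][latency]
    return tuple(path[seen[state]:])


best = min(simple_cycles(ARCS), key=average_latency)
print(best, float(average_latency(best)))  # (1, 3, 3) with average 7/3
print(greedy_cycle(ARCS, "10010"))         # (1, 3, 3)
```

Here the greedy cycle and the cycle with the minimal average latency coincide, in agreement with the MAL of 2.333 stated above.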
Reservation Table (Cont.)
• The number of columns in a reservation table is called the evaluation time of a given function.

• The checkmarks in a row correspond to the time instants (cycles) at which a particular stage will be used.

• Multiple checkmarks in a row → repeated usage of the same stage in different cycles
Reservation Table (Cont.)

• Contiguous checkmarks → extended usage of a stage over more than one cycle

• Multiple checkmarks in one column → multiple stages are used in parallel

• A dynamic pipeline may allow different initiations to follow a mix of reservation tables
Visualizing Pipelining

[Figure: time (clock cycles, Cycle 1 to Cycle 7) on the horizontal axis and instruction order on the vertical axis; four instructions each pass through Ifetch, Reg, ALU, DMem, Reg, with each instruction starting one cycle after the previous one]
Collision Free Scheduling

Goal: to find the shortest average latency

Lengths: for a reservation table with n columns, the maximum forbidden latency is m <= n – 1, and a permissible latency p satisfies 1 <= p <= m – 1

Ideal case: p = 1 (static pipeline)

Collision vector: C = (Cm Cm-1 . . . C2 C1)

[ Ci = 1 if latency i causes a collision ]
[ Ci = 0 for permissible latencies ]
Thank you
