Pipelinenew

Uploaded by

Dibakar Banerjee
PIPELINING

Pipelining is the process of feeding instructions to the processor through a pipeline, so that the execution of several instructions overlaps.

It allows instructions to be stored and executed in an orderly process. It is also known as pipeline processing.
What Is Pipelining

• Laundry Example
• Ann, Brian, Cathy, Dave (loads A, B, C, D) each have one load of clothes to wash, dry, and fold
• Washer takes 30 minutes
• Dryer takes 40 minutes
• "Folder" takes 20 minutes

What Is Pipelining

[Figure: timeline from 6 PM to midnight. Loads A, B, C, D run one after another in task order; each load takes 30 + 40 + 20 minutes.]

Sequential laundry takes 6 hours for 4 loads


If they learned pipelining, how long would laundry take?
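
The question can be answered with a small simulation. The sketch below assumes one washer, one dryer and one folder, and that a finished load may wait in front of a busy machine; the function names are illustrative and not part of the slides.

```python
def sequential_minutes(stage_minutes, loads):
    """Each load finishes every stage before the next load starts."""
    return loads * sum(stage_minutes)


def pipelined_minutes(stage_minutes, loads):
    """A load enters a stage as soon as both the load and the machine are free."""
    free_at = [0] * len(stage_minutes)  # time at which each machine becomes free
    for _ in range(loads):
        t = 0  # time at which the current load is ready for its next stage
        for stage, duration in enumerate(stage_minutes):
            start = max(t, free_at[stage])
            t = start + duration
            free_at[stage] = t
    return free_at[-1]


laundry = [30, 40, 20]  # washer, dryer, folder
print(sequential_minutes(laundry, 4) / 60)  # 6.0 hours
print(pipelined_minutes(laundry, 4) / 60)   # 3.5 hours
```

The 40-minute dryer is the slowest stage, so it sets the pace once the pipeline is full.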
Motivation
• Non-pipelined design
  - Single-cycle implementation
    - The cycle time depends on the slowest instruction
    - Every instruction takes the same amount of time
  - Multi-cycle implementation
    - Divide the execution of an instruction into multiple steps
    - Each instruction may take a variable number of steps (clock cycles)
• Pipelined design
  - Divide the execution of an instruction into multiple steps (stages)
  - Overlap the execution of different instructions in different stages
    - In each cycle, different instructions are executed in different stages
    - For example, in a 5-stage pipeline (Fetch-Decode-Read-Execute-Write), 5 instructions are executed concurrently in 5 different pipeline stages
  - Complete the execution of one instruction every cycle (instead of every 5 cycles)
    - Can increase the throughput of the machine by up to 5 times
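PIPELINE PRINCIPLE

• Linear Pipeline
  - Asynchronous
  - Synchronous
• Nonlinear Pipeline

Asynchronous

• Data flow between adjacent stages is controlled by handshaking: the input is sent together with a Ready signal, which is followed by an Ack signal from the receiving stage.

[Figure: stage S1 with Input and Output/Input data lines, a Ready line, and Ack lines.]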
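Synchronous

• Clocked latches (L) between the stages S1 ... Sn hold the data passed from one stage to the next.
• Each stage is a pure combinational circuit.
• A valid bit from the previous stage is used to gate the clock signal.

[Figure: input latch followed by stages S1 to Sn, separated by latches L that are driven by a common clock pulse (CLK).]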
Pipeline Example

5 stage pipeline:
Fetch - Decode - Read - Execute - Write

Non-pipelined processor: 25 cycles = number of instrs (5) * number of stages (5)

Each instruction completes all five stages (F D R E W) before the next one starts.

Pipelined processor: 9 cycles = start-up latency (4) + number of instrs (5)

Cycle  1 2 3 4 5 6 7 8 9
I1     F D R E W
I2       F D R E W
I3         F D R E W
I4           F D R E W
I5             F D R E W

Filling the pipeline: cycles 1 to 4. Draining the pipeline: cycles 6 to 9.
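
The cycle counts in this example can be reproduced with a short sketch. It assumes an ideal pipeline with no stalls, in which instruction i enters the Fetch stage in cycle i; the helper names are illustrative only.

```python
STAGES = ["F", "D", "R", "E", "W"]


def pipeline_schedule(num_instrs, stages=STAGES):
    """Return {instruction: {cycle: stage}} for an ideal, stall-free pipeline."""
    schedule = {}
    for i in range(1, num_instrs + 1):
        # Instruction i is in stage number s during cycle i + s (s counted from 0).
        schedule[i] = {i + s: name for s, name in enumerate(stages)}
    return schedule


def total_cycles(schedule):
    """Last cycle in which any instruction is still in the pipeline."""
    return max(max(row) for row in schedule.values())


sched = pipeline_schedule(5)
last = total_cycles(sched)
for instr, row in sched.items():
    cells = "".join(f"{row.get(c, ' '):>2}" for c in range(1, last + 1))
    print(f"I{instr}:{cells}")
print("pipelined cycles:", last)                 # 9
print("non-pipelined cycles:", 5 * len(STAGES))  # 25
```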
Theoretical Speedup due to Pipelining

• The theoretical speedup offered by a pipeline can be determined as follows:
• Let k be the total number of stages and tp be the time per stage
• Each instruction represents a task, T, in the pipeline; let n be the total number of tasks
• The first task (instruction) requires k × tp time to complete in a k-stage pipeline
• The remaining (n - 1) tasks emerge from the pipeline one per cycle, so the total time to complete the remaining tasks is (n - 1)tp
• Thus, to complete n tasks using a k-stage pipeline requires:
  (k × tp) + (n - 1)tp = (k + n - 1)tp
• Without a pipeline, each task takes tn = k × tp, so n tasks take n × tn
• If we take the time required to complete n tasks without a pipeline and divide it by the time it takes to complete n tasks using a pipeline, we find:
  Sk = (n × tn) / ((k + n - 1)tp) = (n × k) / (k + n - 1)
• If we take the limit as n approaches infinity, (k + n - 1) approaches n, which results in a theoretical speedup of:
  Sk = k
Speed up & Efficiency

• Speedup of a k-stage pipeline over n tasks: Sk = n.k/(k+(n-1))
• Efficiency is the percentage of busy time-space spans over the total time-space span: Ek = Sk/k = n/(k+(n-1))
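
A minimal numeric sketch of the speedup formula follows. Efficiency is taken here as Sk/k, the usual definition of busy time-space span over total time-space span; the function names are illustrative.

```python
def speedup(n, k):
    """Sk = n*k / (k + n - 1) for n tasks on a k-stage pipeline."""
    return (n * k) / (k + n - 1)


def efficiency(n, k):
    """Ek = Sk / k = n / (k + n - 1), assuming efficiency is speedup per stage."""
    return speedup(n, k) / k


# The speedup approaches k (and the efficiency approaches 1) as n grows.
for n in (5, 100, 10_000):
    print(n, round(speedup(n, 5), 3), round(efficiency(n, 5), 3))
```

For the 5-instruction, 5-stage example, this gives 25/9, which is about 2.78 and well below the limit of 5.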
Impact on Clock cycle time due to Pipelining

• Let's recall:
• If we lower the time per cycle, this will lower the program execution time and hence improve performance
• This implies that if we shorten the time per pipeline stage, we will lower the clock cycle time. This can be achieved by adding more pipe stages of shorter duration
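
The effect of pipeline depth can be sketched with the (k + n - 1)tp expression from the speedup derivation. Two assumptions are made here that the slides do not state: the total logic delay is split evenly across the k stages, and each stage may add a fixed latch delay (a hypothetical parameter, zero by default).

```python
def pipelined_time(n_tasks, k_stages, logic_delay, latch_delay=0.0):
    """Total time for n tasks: (k + n - 1) * tp, with tp the time per stage."""
    # Assumption: the logic delay divides evenly over the stages.
    tp = logic_delay / k_stages + latch_delay
    return (k_stages + n_tasks - 1) * tp


# 1000 tasks, 10 time units of logic per task, no latch overhead.
for k in (1, 5, 10):
    print(k, pipelined_time(1000, k, 10.0))
```

With a non-zero latch delay the gain from each extra stage shrinks, which is why deeper pipelines do not help indefinitely under this model.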
Non-linear Dynamic Pipelines
• Multiple processors (k-stages) as linear pipeline
• Variable functions of individual processors
• Functions may be dynamically assigned
• Feedforward and feedback connections
Reservation Table

• A reservation table displays the time-space flow of data through the pipeline for one function evaluation
• A static pipeline is specified by a single reservation table
• A dynamic pipeline may be specified by multiple reservation tables
Static Pipeline

Time  1  2  3  4
S1    X
S2       X
S3          X
S4             X
Dynamic Pipeline

S1 X X X
S2 X X
S3 X X X
Latency Analysis
• Latency: the number of clock cycles (time units) between two initiations of the pipeline; it is always a positive integer
• Collision: an attempt by two initiations to use the same pipeline stage at the same time
• Some latencies cause collisions, some do not
• Forbidden latency: a latency that causes a collision
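State Diagram
Method of finding Latency

• Forbidden latency set: take the distance between every pair of checkmarks in the same row of the reservation table, then collect the distances over all rows:
  F = {5} ∪ {2} ∪ {2} = {2, 5}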
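
The forbidden latency set can be computed mechanically from a reservation table. The table below is hypothetical: its column positions are chosen only so that the three rows produce the per-row sets {5}, {2} and {2}; the slides do not give the actual positions.

```python
from itertools import combinations

# Hypothetical reservation table: stage -> cycles (1-based columns) in which it is used.
RESERVATION_TABLE = {
    "S1": [1, 6],
    "S2": [2, 4],
    "S3": [3, 5],
}


def forbidden_latencies(table):
    """Collect the distances between every pair of checkmarks in the same row."""
    forbidden = set()
    for cycles in table.values():
        for earlier, later in combinations(sorted(cycles), 2):
            forbidden.add(later - earlier)
    return forbidden


print(sorted(forbidden_latencies(RESERVATION_TABLE)))  # [2, 5]
```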
State Diagram
• The initial collision vector (ICV) is a binary vector formed from F such that
  C = (Cn ... C2 C1), where Ci = 1 if i is in F and Ci = 0 otherwise
  Thus in our example
  F = {2, 5}
  C = (1 0 0 1 0)
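
As a small sketch, the collision vector can be built from F as a bit string whose leftmost character is the bit for the largest forbidden latency and whose rightmost character is C1. The function name is illustrative.

```python
def collision_vector(forbidden):
    """Return the collision vector as a string, highest latency bit first, C1 last."""
    m = max(forbidden)  # largest forbidden latency sets the vector length
    return "".join("1" if i in forbidden else "0" for i in range(m, 0, -1))


print(collision_vector({2, 5}))  # 10010
```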
State Diagram
• The procedure is as follows:
  1. Start with the ICV
  2. For each unprocessed state CVi, and for each bit position i of CVi that is 0 (a permissible latency i), do the following:
     a. Shift CVi right by i bits; that is,
     b. drop the i rightmost bits and
     c. append i zeros to the left
     d. Logically OR the result with the ICV
     e. If step (d) results in a new state, form a new node for this state; in either case, join the resulting state to the node of CVi by an arc marked i
• This shifting process continues until no more new states can be generated.
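
The procedure can be sketched as a small worklist algorithm. States are kept as bit strings with C1 as the rightmost character; latencies longer than the vector always lead back to the ICV and are stored under the key `len(icv) + 1`, standing for "that latency or more". All names are illustrative.

```python
def next_state(state, latency, icv):
    """Shift the collision vector right by `latency` bits, then OR with the ICV."""
    width = len(icv)
    shifted = int(state, 2) >> latency  # drops rightmost bits, zero-fills the left
    return format(shifted | int(icv, 2), f"0{width}b")


def permissible_latencies(state):
    """Latency i is permissible when bit Ci (i-th character from the right) is 0."""
    return [i for i in range(1, len(state) + 1) if state[-i] == "0"]


def build_state_diagram(icv):
    """Return {state: {latency: next state}} for every state reachable from the ICV."""
    arcs = {}
    todo = [icv]
    while todo:
        state = todo.pop()
        if state in arcs:
            continue  # already processed
        arcs[state] = {}
        for i in permissible_latencies(state):
            arcs[state][i] = next_state(state, i, icv)
            todo.append(arcs[state][i])
        # Any latency longer than the vector shifts out every bit.
        arcs[state][len(icv) + 1] = icv
    return arcs


for state, out in build_state_diagram("10010").items():
    print(state, out)
```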
State Diagram

Applying the procedure to the ICV 10010 (CV* = shifted CVi OR ICV):

From state 10010:
  i = 1: 01001 OR 10010 = 11011 (new state, arc 1)
  i = 3: 00010 OR 10010 = 10010 (self-loop, arc 3)
  i = 4: 00001 OR 10010 = 10011 (new state, arc 4)
From state 10011:
  i = 3: 00010 OR 10010 = 10010 (arc 3 back to 10010)
  i = 4: 00001 OR 10010 = 10011 (self-loop, arc 4)
From state 11011:
  i = 3: 00011 OR 10010 = 10011 (arc 3 to 10011)
From every state:
  i >= 6: 00000 OR 10010 = 10010 (arc 6+ back to 10010; latency 5 itself is forbidden because C5 = 1)

The resulting diagram has three states: 10010, 11011 and 10011.
State Diagram

• The state with all zeros has a self-loop which corresponds to an empty pipeline; it is possible to wait for an indefinite number of cycles, giving latency cycles of the form (7), (8), (9), (10), etc.
• Simple cycle: a latency cycle in which each state is encountered only once.
• Complex cycle: a cycle that consists of more than one simple cycle.
• It is enough to look for simple cycles.
State Diagram

• Greedy cycle: a simple cycle is a greedy cycle if each latency contained in the cycle is the minimal latency (outgoing arc) from a state in the cycle.
• A good task initiation sequence should include the greedy cycle.
Simple cycles & Greedy cycles
• The simple cycles are?
• The greedy cycles are?
Simple cycles & Greedy cycles
• The simple cycles include (3), (6), (1,3,3), (4,3) and (4)
• The greedy cycle is (1,3,3)
State Diagram

• In the above example, the cycle that offers the minimal average latency (MAL) is (1, 3, 3):
  MAL = (1 + 3 + 3)/3 = 2.333
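
The greedy cycle and its average latency can be checked with a short sketch that always follows the smallest permissible latency out of each state, starting from the ICV. It finds only the greedy cycle reached from the initial state; in general a greedy cycle is not guaranteed to give the MAL, although it does in this example. Function names are illustrative.

```python
def next_state(state, latency, icv):
    """Shift right by `latency` bits, then OR with the initial collision vector."""
    return format((int(state, 2) >> latency) | int(icv, 2), f"0{len(icv)}b")


def min_latency(state):
    """Smallest i whose bit Ci (i-th character from the right) is 0."""
    for i in range(1, len(state) + 1):
        if state[-i] == "0":
            return i
    return len(state) + 1  # all bits set: wait until the vector is shifted out


def greedy_cycle(icv):
    """Follow minimal latencies from the ICV until a state repeats."""
    first_seen = {}  # state -> position in `latencies` where it was first left
    latencies = []
    state = icv
    while state not in first_seen:
        first_seen[state] = len(latencies)
        step = min_latency(state)
        latencies.append(step)
        state = next_state(state, step, icv)
    return latencies[first_seen[state]:]


cycle = greedy_cycle("10010")
print(cycle, sum(cycle) / len(cycle))
```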
Reservation Table (Cont.)
• The number of columns in a reservation table is called the evaluation time of a given function.
• The checkmarks in a row correspond to the time instants (cycles) at which a particular stage will be used.
• Multiple checkmarks in a row indicate repeated usage of the same stage in different cycles.
• Contiguous checkmarks in a row indicate extended usage of a stage over more than one cycle.
• Multiple checkmarks in one column indicate that multiple stages are used in parallel.
• A dynamic pipeline may allow different initiations to follow a mix of reservation tables.
Visualizing Pipelining

[Figure: time in clock cycles (Cycle 1 to Cycle 7) on the horizontal axis and instruction order on the vertical axis; each of four instructions passes through Ifetch, Reg, ALU, DMem, Reg.]
Collision Free Scheduling

Goal: to find the shortest average latency

Lengths: for a reservation table with n columns, the maximum forbidden latency is m <= n - 1, and a permissible latency p satisfies 1 <= p <= m - 1

Ideal case: p = 1 (static pipeline)

Collision vector: C = (Cm Cm-1 ... C2 C1)
[ Ci = 1 if latency i causes a collision ]
[ Ci = 0 for permissible latencies ]
Thank you
