
Question 1 (50 Points) Pipelining

This document contains instructions for an online final exam with 3 independent questions worth a total of 50 points over 40 minutes. Question 1 has 3 parts about pipelining concepts such as speedup calculations, maximum clock rates, and pipeline hazards. Part 1 computes the speedup of a pipelined processor over a single-cycle processor. Part 2 compares a 5-stage and a 6-stage pipeline implementation. Part 3 analyzes the execution of a loop through a 6-stage pipeline, showing stalls and forwarding.

Zoom/Online Final - question Q1

There are 3 independent questions.
Total: 50 points    Duration: 40 minutes

GOOD LUCK!

Question 1 (50 points) Pipelining


The following 3 parts are independent; you should answer each as if it were a separate question. Do not
forget to write your name on every page.

PART 1 (15 points) Assume you have a single-cycle processor operating at 1 GHz. You are going to
make a 5-stage pipeline out of this processor. Although the processor can potentially operate at a
higher frequency, overheads associated with pipelining force you to operate the pipelined processor at
3 GHz. In a given program, assume that 40% of the instructions are memory instructions, 50% are ALU
instructions and the rest (10%) are branch instructions. 10% of the memory instructions cause stalls of
20 clock cycles each due to cache misses and 50% of the branch instructions cause stalls of 4 cycles
each. Assume that there are no stalls associated with the execution of ALU instructions. For this
program, what is the speedup achieved by the pipelined processor over the single-cycle processor?

Answer:

time_single_cycle = IC x CPI x t_clock
                  = IC x 1 x 1/(1 GHz)

CPI_pipeline = 1 + Overhead due to mem instr + Overhead due to branch instr
             = 1 + 0.4 x 0.1 x 20 + 0.1 x 0.5 x 4
             = 1 + 0.8 + 0.2 = 2

time_pipeline = IC x CPI_pipeline x t_clock
              = IC x 2 x 1/(3 GHz)

Speedup = time_single_cycle / time_pipeline
        = (IC x 1 x 1/(1 GHz)) / (IC x 2 x 1/(3 GHz))
        = 3/2 = 1.5
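
The Part 1 arithmetic can be checked with a short Python sketch. The function names `pipeline_cpi` and `speedup` are hypothetical helpers introduced here for illustration; the model assumes, as the answer does, that stall cycles simply add to a base CPI of 1 and that the instruction count is the same on both processors.

```python
def pipeline_cpi(base_cpi, stall_sources):
    """Base CPI plus the average stall cycles added per instruction.

    stall_sources: iterable of
        (fraction of instructions, fraction of those that stall, stall cycles)
    """
    return base_cpi + sum(frac * prob * cycles
                          for frac, prob, cycles in stall_sources)


def speedup(cpi_old, f_old_hz, cpi_new, f_new_hz):
    """Ratio of execution times; the instruction count IC cancels out."""
    time_old = cpi_old / f_old_hz   # time per instruction, old design
    time_new = cpi_new / f_new_hz   # time per instruction, new design
    return time_old / time_new


# Memory: 40% of instructions, 10% stall for 20 cycles.
# Branch: 10% of instructions, 50% stall for 4 cycles.
cpi_pipe = pipeline_cpi(1.0, [(0.4, 0.1, 20), (0.1, 0.5, 4)])
result = speedup(1.0, 1e9, cpi_pipe, 3e9)
print(f"CPI_pipeline = {cpi_pipe:.2f}, speedup = {result:.2f}")
```

Changing the stall fractions or the pipelined clock rate in the last lines shows how quickly cache-miss stalls erode the 3x frequency advantage.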
PART 2 (15 points) Compare two pipeline implementations: A and B with 5 and 6 stages,
respectively. The logic delays of the pipeline stages are as follows:

Stage   1       2       3       4       5       6
A       250ps   180ps   400ps   200ps   150ps   -
B       200ps   150ps   250ps   250ps   150ps   180ps

a) What are the maximum clock rates for the two implementations? Note that 1 ps = 10^-12 seconds.

Option A f_max = (include the unit with your result)

Tc = 400 ps, f = 1/(400 ps), therefore f = 1 / (400 * 10^-12) = 2.5 * 10^9 Hz = 2.5 GHz

Option B f_max = (include the unit with your result)

Tc = 250 ps, f = 1/(250 ps), therefore f = 1 / (250 * 10^-12) = 4 * 10^9 Hz = 4 GHz
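
As a quick check of part a), the sketch below takes the slowest stage as the clock period, which is the assumption the answer makes (no extra latch or register overhead is modelled). The names `STAGE_DELAYS_PS` and `max_clock_ghz` are illustrative only.

```python
# Stage logic delays in picoseconds, copied from the table in the question.
STAGE_DELAYS_PS = {
    "A": [250, 180, 400, 200, 150],
    "B": [200, 150, 250, 250, 150, 180],
}


def max_clock_ghz(stage_delays_ps):
    """The clock period must cover the slowest stage; 1/(1 ps) = 1000 GHz."""
    cycle_time_ps = max(stage_delays_ps)
    return 1000.0 / cycle_time_ps


for name, delays in STAGE_DELAYS_PS.items():
    print(f"Pipeline {name}: Tc = {max(delays)} ps, "
          f"f_max = {max_clock_ghz(delays):.2f} GHz")
```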

b) Consider a program that executes 2 billion instructions on pipeline A with a CPI of 1.5, but
1.5 billion instructions on pipeline B with a CPI of 4. Which implementation would you prefer
for this program?
T = IC * CPI * Tc
T_A = 2 * 10^9 * 1.5 * 400 * 10^-12 = 1.2 sec.
T_B = 1.5 * 10^9 * 4 * 250 * 10^-12 = 1.5 sec.
Therefore, A is faster for this program and should be chosen.
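
The comparison in part b) can be reproduced with the same T = IC * CPI * Tc formula. This is a minimal sketch; `execution_time_s` is a hypothetical helper name, and the cycle times are the 400 ps and 250 ps values from part a).

```python
def execution_time_s(instruction_count, cpi, cycle_time_s):
    """Execution time T = IC * CPI * Tc, in seconds."""
    return instruction_count * cpi * cycle_time_s


t_a = execution_time_s(2e9, 1.5, 400e-12)    # pipeline A
t_b = execution_time_s(1.5e9, 4, 250e-12)    # pipeline B
faster = "A" if t_a < t_b else "B"
print(f"T_A = {t_a:.2f} s, T_B = {t_b:.2f} s, faster: {faster}")
```

Although B has the higher clock rate and the lower instruction count, its much higher CPI outweighs both advantages for this program.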
PART 3 (20 points) Assume you have a 6 stage pipeline which is composed of the following stages:

Stage     Resource
F         Instruction Memory
D         RegFile
X1, X2    ALU
M         Data Memory
W         RegFile

Note that the execute stage requires two clock cycles (X1 and X2). Also, the register file does NOT
support early write and late read (a register written in W cannot be read in D during that same
cycle). Assuming that the execute stage is designed in such a way that a new execution can begin
while the previous one is still in progress, we have a pipeline which can theoretically start (and
complete) one instruction per clock cycle. But hazards complicate things, and unavoidable stalls
will result in a CPI greater than 1. Assume that branch decisions are made in the X1 stage. The
following code needs to be run:

I1: Loop: add $t0, $t1, $t2
I2:       lw  $t3, 0($t0)
I3:       beq $t3, $t0, Loop
I4: Exit: ...

Consider only 2 iterations of the loop, that is, for a total of 3x2=6 instructions:
a) How many clock cycles does this code take in an ideal world if there were no control dependencies
or data dependencies?

b) In a table like the one below, show which stage of each instruction is executed (F, D, X1, X2, M,
W) in each clock cycle, using the information given above and assuming that the pipeline has
forwarding hardware. Also, clearly show forwarding with arrows between stages (if any). Make sure
that you explicitly show stalls (if any).

Clock Cycle No. (use as many as needed)

Instr    1  2  3  4  5  6  7  8  9  10 11 12 13 14 15 16 17 18 19
I1 add   F  D  X1 X2 M  W
I2 lw       F  D  -  X1 X2 M  W
I3 beq         F  -  D  -  -  X1 X2 M  W
I1 add               -  -  -  -  F  D  X1 X2 M  W
I2 lw                               F  D  -  X1 X2 M  W
I3 beq                                 F  -  D  -  -  X1 X2 M  W
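
The cycle counts for parts a) and b) can be cross-checked with the sketch below. The ideal count uses the standard fill-then-drain formula (stages + instructions - 1). The fetch cycles 1, 2, 3, 9, 10, 11 are an assumed reading of the table rows, with each "-" counted as one stall cycle and the four leading dashes of the second add treated as the delay before its fetch; the helper names are hypothetical.

```python
def ideal_cycles(num_stages, num_instructions):
    """Stall-free pipeline: fill once, then one instruction completes per cycle."""
    return num_stages + num_instructions - 1


# (label, assumed fetch cycle, cells from F through W; "-" is a stall)
ROWS = [
    ("I1 add", 1,  "F D X1 X2 M W"),
    ("I2 lw",  2,  "F D - X1 X2 M W"),
    ("I3 beq", 3,  "F - D - - X1 X2 M W"),
    ("I1 add", 9,  "F D X1 X2 M W"),       # fetched after beq resolves in X1
    ("I2 lw",  10, "F D - X1 X2 M W"),
    ("I3 beq", 11, "F - D - - X1 X2 M W"),
]


def stage_cycle(fetch_cycle, cells, stage):
    """Clock cycle in which the given stage of a row executes."""
    return fetch_cycle + cells.split().index(stage)


def completion_cycle(fetch_cycle, cells):
    """Clock cycle of the last cell (W) of a row."""
    return fetch_cycle + len(cells.split()) - 1


ideal = ideal_cycles(num_stages=6, num_instructions=len(ROWS))
total = max(completion_cycle(fetch, cells) for _, fetch, cells in ROWS)

# Hazard checks for the first iteration under the assumed reading:
add_x2 = stage_cycle(1, ROWS[0][2], "X2")   # add result ready after X2
lw_x1 = stage_cycle(2, ROWS[1][2], "X1")    # lw needs $t0 in X1
lw_m = stage_cycle(2, ROWS[1][2], "M")      # loaded value ready after M
beq_x1 = stage_cycle(3, ROWS[2][2], "X1")   # beq needs $t3 in X1

print(f"ideal = {ideal} cycles, with stalls = {total} cycles")
print(f"add X2 @ {add_x2} -> lw X1 @ {lw_x1}; lw M @ {lw_m} -> beq X1 @ {beq_x1}")
```

Under this reading, each consumer's X1 falls exactly one cycle after the stage that produces its operand, which is what forwarding from X2 to X1 and from M to X1 would allow, and the second add is fetched one cycle after the branch is resolved in X1.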
