CPU Performance
COE 233
Logic Design and Computer Organization
Dr. Muhamed Mudawar
❖ Amdahl’s Law
Response Time and Throughput
❖ Response Time
Time between start and completion of a task, as observed by end user
Response Time = CPU Time + Waiting Time (I/O, OS scheduling, etc.)
❖ Throughput
Number of tasks the machine can run in a given period of time
Higher Performance = Less Execution Time
PerformanceX = 1 / Execution timeX
To compare two machines: PerformanceX / PerformanceY = Execution timeY / Execution timeX
What do we mean by Execution Time?
❖ Real Elapsed Time
Counts everything:
▪ Waiting time, Input/output, disk access, OS scheduling, … etc.
❖ CPU Execution Time
CPU Execution Time = CPU cycles × Cycle time = CPU cycles / Clock rate
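For illustration, a minimal Python sketch of this relation, using a made-up cycle count and clock rate (not values from the slides):

```python
# Hypothetical example: CPU execution time from clock cycles and clock rate.
cpu_cycles = 10_000_000        # assumed number of CPU clock cycles
clock_rate = 2.5e9             # assumed clock rate: 2.5 GHz
cycle_time = 1 / clock_rate    # seconds per clock cycle

cpu_time = cpu_cycles * cycle_time     # CPU cycles × cycle time
same_time = cpu_cycles / clock_rate    # equivalently, CPU cycles / clock rate

print(f"CPU execution time = {cpu_time * 1e3:.1f} ms")   # 4.0 ms
```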
What is the Clock Cycle?
❖ Operation of digital hardware is governed by a clock
[Figure: clock signal waveform; one period of the signal is the clock cycle]
❖ Important point
Changing the cycle time often changes the number of cycles
required for various instructions
Performance Equation
❖ To execute, a given program will require …
▪ Some number of machine instructions
▪ Some number of clock cycles
▪ Some number of seconds
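In its standard form, consistent with the examples that follow, the CPU performance equation is:

```latex
\[
\text{CPU Time}
  = \text{Instruction Count} \times \text{CPI} \times \text{Cycle Time}
  = \frac{\text{Instruction Count} \times \text{CPI}}{\text{Clock Rate}}
\]
```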
Understanding Performance Equation
                 Instruction Count    CPI    Cycle Time
Program                  X
Compiler                 X             X
ISA                      X             X
Organization                           X          X
Technology                                         X
Using the Performance Equation
❖ Suppose we have two implementations of the same ISA
❖ For a given program
Machine A has a clock cycle time of 250 ps and a CPI of 2.0
Machine B has a clock cycle time of 500 ps and a CPI of 1.2
Which machine is faster for this program, and by how much?
❖ Solution:
Both computers execute the same count of instructions = I
CPU execution time (A) = I × 2.0 × 250 ps = 500 × I ps
CPU execution time (B) = I × 1.2 × 500 ps = 600 × I ps
Computer A is faster than B by a factor of (600 × I) / (500 × I) = 1.2
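A minimal Python sketch of the same comparison; the instruction count I cancels in the ratio, so the placeholder value below is arbitrary:

```python
# Machine A vs. machine B for the same program (instruction count cancels out).
I = 1e9                         # arbitrary placeholder instruction count

time_a = I * 2.0 * 250e-12      # CPI = 2.0, cycle time = 250 ps
time_b = I * 1.2 * 500e-12      # CPI = 1.2, cycle time = 500 ps

print(f"A: {time_a:.2f} s, B: {time_b:.2f} s, A faster by {time_b / time_a:.1f}x")
# A: 0.50 s, B: 0.60 s, A faster by 1.2x
```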
Determining the CPI
❖ Different types of instructions have different CPI
Let CPIi = clocks per instruction for class i of instructions
Let Ci = instruction count for class i of instructions
CPU cycles = ∑ (CPIi × Ci), summed over the n instruction classes
❖ Example: classes A, B, and C have CPI of 1, 2, and 3, respectively.
The 1st sequence executes 2 A, 1 B, and 2 C instructions (5 in total);
the 2nd sequence executes 4 A, 1 B, and 1 C instructions (6 in total).
❖ Solution
CPU cycles (1st sequence) = (2×1) + (1×2) + (2×3) = 2+2+6 = 10 cycles
CPU cycles (2nd sequence) = (4×1) + (1×2) + (1×3) = 4+2+3 = 9 cycles
Second sequence is faster, even though it executes one extra instruction
CPI (1st sequence) = 10/5 = 2 CPI (2nd sequence) = 9/6 = 1.5
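The same calculation as a short sketch, assuming class CPIs of 1, 2, and 3 as in the solution above:

```python
# Total cycles and average CPI for the two instruction sequences above.
cpi = [1, 2, 3]        # CPI of the three instruction classes

seq1 = [2, 1, 2]       # 1st sequence: instructions per class (5 total)
seq2 = [4, 1, 1]       # 2nd sequence: instructions per class (6 total)

def cycles_and_cpi(counts):
    cycles = sum(k * c for k, c in zip(cpi, counts))   # ∑ CPIi × Ci
    return cycles, cycles / sum(counts)                # total cycles, average CPI

print(cycles_and_cpi(seq1))    # (10, 2.0)
print(cycles_and_cpi(seq2))    # (9, 1.5)
```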
Second Example on CPI
Given: instruction mix of a program on a RISC processor
What is the average CPI?
What is the percentage of time used by each instruction class?
Class     Freqi   CPIi   CPIi × Freqi    %Time
ALU       50%     1      0.5×1 = 0.5     0.5/2.2 = 23%
Load      20%     5      0.2×5 = 1.0     1.0/2.2 = 45%
Store     10%     3      0.1×3 = 0.3     0.3/2.2 = 14%
Branch    20%     2      0.2×2 = 0.4     0.4/2.2 = 18%
Average CPI = 0.5 + 1.0 + 0.3 + 0.4 = 2.2
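The same instruction mix worked out in a small sketch (frequencies and CPIs taken from the table above):

```python
# Average CPI and per-class share of execution time for the given mix.
mix = {"ALU": (0.50, 1), "Load": (0.20, 5), "Store": (0.10, 3), "Branch": (0.20, 2)}

avg_cpi = sum(freq * cpi for freq, cpi in mix.values())    # ∑ Freqi × CPIi
print(f"Average CPI = {avg_cpi:.1f}")                      # 2.2

for name, (freq, cpi) in mix.items():
    share = 100 * freq * cpi / avg_cpi                     # class share of CPU time
    print(f"{name:6s} {share:3.0f}% of time")              # 23%, 45%, 14%, 18%
```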
❖ Amdahl’s Law
Drawback of Single Cycle Processor
❖ Single Cycle ➔ CPI = 1 for all instructions
❖ The major drawback is the long cycle time
❖ All instructions take as much time as the slowest instruction
[Figure: single-cycle datapath timing. An ALU instruction passes through Instruction Fetch, Decode & Register Read, ALU, and Register Write; a Load instruction also computes the address in the ALU and reads Data Memory before the Register Write, giving it the longest delay.]
Single-Cycle versus Multi-Cycle Performance
❖ Assume the following operation times for components:
Access time for Instruction and data memories: 200 ps
Delay in ALU and adders: 180 ps
Delay in Decode and Register file access (read or write): 150 ps
Ignore the other delays in PC, mux, extender, and wires
Solution
Instruction   Instruction   Register   ALU         Data     Register
Class         Memory        Read       Operation   Memory   Write      Total
ALU           200           150        180         —        150        680 ps
Load          200           150        180         200      150        880 ps
Store         200           150        180         200      —          730 ps
Branch        200           150        180         —        —          530 ps  (compare and update PC)
Jump          200           150        —           —        —          350 ps  (decode and update PC)
The slowest class (Load, 880 ps) determines the single-cycle clock period.
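A short sketch of the same per-class delays, using the component times from the assumptions; the single-cycle clock must accommodate the slowest class:

```python
# Per-class datapath delay (in ps) and the resulting single-cycle clock period.
IMEM, REG, ALU_OP, DMEM = 200, 150, 180, 200   # component delays from the assumptions

delay = {
    "ALU":    IMEM + REG + ALU_OP + REG,          # 680 ps
    "Load":   IMEM + REG + ALU_OP + DMEM + REG,   # 880 ps
    "Store":  IMEM + REG + ALU_OP + DMEM,         # 730 ps
    "Branch": IMEM + REG + ALU_OP,                # 530 ps
    "Jump":   IMEM + REG,                         # 350 ps
}

print(f"Single-cycle clock = {max(delay.values())} ps")    # 880 ps, set by Load
```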
Drawbacks of MIPS
❖ MIPS = Millions of Instructions Per Second
MIPS is a rate of executing instructions, not an execution time, so a higher
MIPS rating does not always mean faster execution, as the following example shows
MIPS example
❖ Two different compilers are being tested on the same program
for a 4 GHz machine with three different classes of instructions:
Class A, Class B, and Class C, which require 1, 2, and 3 cycles,
respectively.
❖ The instruction count produced by the first compiler is 5 billion
Class A instructions, 1 billion Class B instructions, and 1 billion
Class C instructions.
❖ The second compiler produces 10 billion Class A instructions, 1
billion Class B instructions, and 1 billion Class C instructions.
❖ Which compiler produces a higher MIPS?
❖ Which compiler produces a better execution time?
Solution to MIPS Example
❖ First, we find the CPU cycles for both compilers
CPU cycles (compiler 1) = (5×1 + 1×2 + 1×3) × 10⁹ = 10×10⁹
CPU cycles (compiler 2) = (10×1 + 1×2 + 1×3) × 10⁹ = 15×10⁹
❖ Next, we find the execution time for both compilers
Execution time (compiler 1) = 10×10⁹ cycles / 4×10⁹ Hz = 2.5 sec
Execution time (compiler 2) = 15×10⁹ cycles / 4×10⁹ Hz = 3.75 sec
❖ Compiler 1 generates the faster program (less execution time)
❖ Now, we compute the MIPS rate for both compilers
MIPS = Instruction Count / (Execution Time × 10⁶)
MIPS (compiler 1) = (5+1+1) × 10⁹ / (2.5 × 10⁶) = 2800
MIPS (compiler 2) = (10+1+1) × 10⁹ / (3.75 × 10⁶) = 3200
❖ So, code from compiler 2 has a higher MIPS rating !!!
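The same comparison in a short sketch, showing why the higher-MIPS code is nevertheless slower:

```python
# MIPS rating vs. execution time for the two compilers above.
cpi = [1, 2, 3]            # cycles per instruction for classes A, B, C
clock_rate = 4e9           # 4 GHz
programs = {"compiler 1": [5e9, 1e9, 1e9], "compiler 2": [10e9, 1e9, 1e9]}

for name, counts in programs.items():
    cycles = sum(n * c for n, c in zip(counts, cpi))   # total CPU cycles
    time = cycles / clock_rate                         # execution time in seconds
    mips = sum(counts) / (time * 1e6)                  # instruction count / (time × 10⁶)
    print(f"{name}: {time:.2f} s, {mips:.0f} MIPS")
# compiler 1: 2.50 s, 2800 MIPS   compiler 2: 3.75 s, 3200 MIPS
```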
Amdahl’s Law
❖ Amdahl's Law is a measure of Speedup
How a program performs after improving a portion of the computer,
relative to how it performed previously
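In its usual statement: if a fraction f of the original execution time is improved by a factor s, the overall speedup is

```latex
\[
\text{Speedup}
  = \frac{\text{Execution time}_{\text{old}}}{\text{Execution time}_{\text{new}}}
  = \frac{1}{(1 - f) + f / s}
\]
```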