Minmin 9

This document discusses computer organization and exploiting memory hierarchy. It provides an example of mapping a byte address to a cache block. A simplified performance model is presented that shows decreasing miss rate or penalty can improve performance. Example problems are given and solved regarding the performance impact of a perfect cache, reducing CPI, and increasing clock rate. Faster systems are shown to be hurt more by cache misses.
COMPUTER ORGANIZATION
After Midterm
In the name of God, the Most Gracious, the Most Merciful
14
Exploiting Memory Hierarchy
"Allah is sufficient for us; Allah will grant us of His bounty; to Allah we turn in hope."
Example Problem
• Consider a cache with 64 blocks and a block size of 16 bytes. What block number does byte address 1200 map to?
• Divide the byte address by the block size to get the block address, then take the block address modulo the number of cache blocks to get the cache block number.
• As block size = 16 bytes:
  byte address 1200 → block address ⌊1200/16⌋ = 75
• As cache size = 64 blocks:
  block address 75 → cache block (75 mod 64) = 11
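The mapping above can be sketched in a few lines of Python. This is only an illustration: the function name `cache_block_index` is hypothetical, and the modulo placement rule assumes a direct-mapped cache, which is what the example's arithmetic implies.

```python
def cache_block_index(byte_address: int, block_size: int, num_blocks: int) -> int:
    """Map a byte address to a cache block number (direct-mapped cache assumed)."""
    # Floor division drops the byte offset within the block.
    block_address = byte_address // block_size
    # The block address wraps around the number of blocks in the cache.
    return block_address % num_blocks


block_address = 1200 // 16
index = cache_block_index(1200, block_size=16, num_blocks=64)
print(block_address, index)  # prints: 75 11
```

Byte addresses 1200 through 1215 all fall in block address 75, so they all map to cache block 11.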
Performance
• Simplified model assuming equal read and write miss penalties:
  • CPU time = (execution cycles + memory stall cycles) × cycle time
  • memory stall cycles = memory accesses × miss rate × miss penalty
• Therefore, two ways to improve cache performance:
  • decrease miss rate
  • decrease miss penalty
• What happens if we increase block size?
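A minimal sketch of the simplified model, to show how the two levers affect CPU time. The function names and the workload numbers (access count, execution cycles, cycle time) are assumptions chosen for illustration and do not come from the slides.

```python
def memory_stall_cycles(memory_accesses, miss_rate, miss_penalty):
    """memory stall cycles = memory accesses x miss rate x miss penalty"""
    return memory_accesses * miss_rate * miss_penalty


def cpu_time(execution_cycles, stall_cycles, cycle_time):
    """CPU time = (execution cycles + memory stall cycles) x cycle time"""
    return (execution_cycles + stall_cycles) * cycle_time


# Hypothetical workload, for illustration only.
accesses = 1_000_000
exec_cycles = 2_000_000
cycle_time_ns = 1.0

base = cpu_time(exec_cycles, memory_stall_cycles(accesses, 0.04, 40), cycle_time_ns)
lower_rate = cpu_time(exec_cycles, memory_stall_cycles(accesses, 0.02, 40), cycle_time_ns)
lower_penalty = cpu_time(exec_cycles, memory_stall_cycles(accesses, 0.04, 20), cycle_time_ns)

print(f"baseline:           {base:,.0f} ns")
print(f"half the miss rate: {lower_rate:,.0f} ns")
print(f"half the penalty:   {lower_penalty:,.0f} ns")
```

In this model, halving the miss rate and halving the miss penalty reduce the stall term by the same amount, because the two factors multiply.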
Example Problems
• Assume for a given machine and program:
  • instruction cache miss rate: 2%
  • data cache miss rate: 4%
  • miss penalty: always 40 cycles
  • CPI without memory stalls: 2
  • frequency of loads/stores: 36% of instructions

1. How much faster is a machine with a perfect cache that never misses?
2. What happens if we speed up the machine by reducing its CPI to 1 without changing the clock rate?
3. What happens if we speed up the machine by doubling its clock rate, while the absolute time of a miss penalty remains the same?
Solution
1.
• Assume instruction count = I
• Instruction miss cycles = I × 2% × 40 = 0.8 × I
• Data miss cycles = I × 36% × 4% × 40 = 0.576 × I
• So, total memory-stall cycles = 0.8 × I + 0.576 × I = 1.376 × I; in other words, 1.376 stall cycles per instruction
• Therefore, CPI with memory stalls = 2 + 1.376 = 3.376
• Assuming instruction count and clock rate remain the same for a perfect cache and a cache that misses:
  CPU time with stalls / CPU time with perfect cache = 3.376 / 2 = 1.688
• Performance with a perfect cache is better by a factor of 1.688
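The arithmetic of Solution 1 can be checked with a short script. The helper name `stall_cycles_per_instruction` is hypothetical. The model assumes every instruction causes one instruction fetch and that 36% of instructions also make one data access, as in the problem statement.

```python
def stall_cycles_per_instruction(i_miss_rate, d_miss_rate, load_store_freq, miss_penalty):
    """Memory stall cycles per instruction for split instruction/data caches."""
    # Every instruction is fetched once from the instruction cache.
    instruction_stalls = i_miss_rate * miss_penalty
    # Only loads and stores access the data cache.
    data_stalls = load_store_freq * d_miss_rate * miss_penalty
    return instruction_stalls + data_stalls


base_cpi = 2.0
stalls = stall_cycles_per_instruction(0.02, 0.04, 0.36, 40)
cpi_with_stalls = base_cpi + stalls
slowdown = cpi_with_stalls / base_cpi

print(f"{stalls:.3f} {cpi_with_stalls:.3f} {slowdown:.3f}")  # prints: 1.376 3.376 1.688
```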
Solution
2.
• CPI without stalls = 1
• CPI with stalls = 1 + 1.376 = 2.376 (the clock has not changed, so stall cycles per instruction remain the same)
• CPU time with stalls / CPU time with perfect cache
  = CPI with stalls / CPI without stalls
  = 2.376 / 1 = 2.376
• Performance with a perfect cache is better by a factor of 2.376
• Conclusion: with a lower CPI, cache misses "hurt more" than with a higher CPI (a factor of 2.376 at CPI = 1 versus 1.688 at CPI = 2)
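To compare the two base CPIs side by side, the sketch below takes the 1.376 stall cycles per instruction from Solution 1 as given. The function names are hypothetical. The stall fraction it prints is an extra derived quantity, not a figure from the slides.

```python
STALLS_PER_INSTRUCTION = 1.376  # from Solution 1; unchanged because the clock is unchanged


def perfect_cache_factor(base_cpi, stalls=STALLS_PER_INSTRUCTION):
    """Ratio of CPU time with stalls to CPU time with a perfect cache."""
    return (base_cpi + stalls) / base_cpi


def stall_fraction(base_cpi, stalls=STALLS_PER_INSTRUCTION):
    """Share of total cycles spent waiting on memory."""
    return stalls / (base_cpi + stalls)


for base_cpi in (2.0, 1.0):
    print(
        f"base CPI {base_cpi:.0f}: "
        f"factor {perfect_cache_factor(base_cpi):.3f}, "
        f"stalled {stall_fraction(base_cpi):.1%} of the time"
    )
```

The same 1.376 stall cycles take up a larger share of execution when the base CPI is smaller, which is why the machine with CPI = 1 loses more to cache misses.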
Solution
3.
• With doubled clock rate, miss penalty = 2 × 40 = 80 clock cycles
• Total stall cycles = (I × 2% × 80) + (I × 36% × 4% × 80) = 2.752 × I; in other words, 2.752 stall cycles per instruction
• So, the faster machine with cache misses has CPI = 2 + 2.752 = 4.752
• CPU time with stalls / CPU time with perfect cache
  = CPI with stalls / CPI without stalls
  = 4.752 / 2 = 2.376
• Performance with a perfect cache is better by a factor of 2.376
• Conclusion: with a higher clock rate, cache misses "hurt more" than with a lower clock rate (a factor of 2.376 versus 1.688 at the original clock rate)
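A short sketch of the doubled-clock case. It also compares the fast machine against the original one, a step the slides leave implicit. The function name `cpi_with_stalls` is hypothetical, and cycle times are expressed relative to the original clock (1.0 for the original, 0.5 for the doubled clock).

```python
def cpi_with_stalls(base_cpi, miss_penalty_cycles,
                    i_miss_rate=0.02, d_miss_rate=0.04, load_store_freq=0.36):
    """CPI including memory stalls, using the miss rates from the example."""
    stalls = (i_miss_rate + load_store_freq * d_miss_rate) * miss_penalty_cycles
    return base_cpi + stalls


slow_cpi = cpi_with_stalls(2.0, miss_penalty_cycles=40)  # original clock
fast_cpi = cpi_with_stalls(2.0, miss_penalty_cycles=80)  # same miss time, twice as many cycles

# Time per instruction = CPI x cycle time (cycle time relative to the original clock).
slow_time = slow_cpi * 1.0
fast_time = fast_cpi * 0.5
speedup = slow_time / fast_time

print(f"CPI at original clock: {slow_cpi:.3f}")
print(f"CPI at doubled clock:  {fast_cpi:.3f}")
print(f"speedup from doubling the clock: {speedup:.2f}")
```

Under these assumptions, doubling the clock rate makes the machine only about 1.42 times faster rather than 2 times, because the memory stall time does not shrink with the cycle time.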