CMP3010 L09: Memory II
Dina Tantawy
Computer Engineering Department
Cairo University
Agenda
• Review
• Fully Associative Cache
• Set Associative Cache
• Cache performance
• Cache vs virtual memory
• Software optimization based on caching
Review: memory hierarchy
Principle of locality
Programs access a relatively small portion of their address space at any instant of time.
Review: Terminologies
• Hit: data appears in some block in the upper level
– Hit Rate: the fraction of memory accesses found in the upper level
– Hit Time: time to access the upper level, which consists of
cache access time + time to determine hit/miss
Review: Direct Mapped Cache
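As a refresher, the direct-mapped lookup splits an address into tag, index, and byte-offset fields. The sketch below assumes a hypothetical cache of 1024 blocks of 32 bytes each (these sizes are illustrative, not from the slide):

```python
# Hypothetical direct-mapped cache: 1024 blocks x 32 bytes/block
# (assumed parameters). A 32-bit address splits into:
#   [ tag | index | byte offset ]
BLOCK_SIZE = 32          # bytes per block -> 5 offset bits
NUM_BLOCKS = 1024        # -> 10 index bits

OFFSET_BITS = BLOCK_SIZE.bit_length() - 1   # log2(32) = 5
INDEX_BITS = NUM_BLOCKS.bit_length() - 1    # log2(1024) = 10

def split_address(addr):
    offset = addr & (BLOCK_SIZE - 1)                 # byte within block
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1) # which cache line
    tag = addr >> (OFFSET_BITS + INDEX_BITS)         # stored for comparison
    return tag, index, offset

print(split_address(0x12345678))   # (9320, 691, 24)
```

The index picks exactly one line, so two addresses with the same index but different tags always conflict.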
Review: Read and Write Policies
• Two write options when the data block is already in the cache (write hit):
• Write Through: write to cache and memory at the same time.
• Isn’t memory too slow for this? (A write buffer helps hide the latency.)
• Write Back: write to cache only. Write the cache block to memory
when that cache block is being replaced on a cache miss.
• Need a “dirty” bit for each cache block
• Control can be complex
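The two write-hit policies above can be contrasted with a toy model. This is only an illustrative sketch (a real controller also handles misses, replacement, and write buffers):

```python
# Toy model contrasting write-through and write-back on a write hit.
class Cache:
    def __init__(self, policy, memory):
        self.policy = policy     # "through" or "back"
        self.memory = memory     # backing store: addr -> value
        self.lines = {}          # cached blocks: addr -> value
        self.dirty = set()       # dirty lines (write-back only)

    def write_hit(self, addr, value):
        self.lines[addr] = value
        if self.policy == "through":
            self.memory[addr] = value    # memory updated immediately
        else:
            self.dirty.add(addr)         # defer: just mark line dirty

    def evict(self, addr):
        # On replacement, a dirty write-back line must be flushed first.
        if addr in self.dirty:
            self.memory[addr] = self.lines[addr]
            self.dirty.discard(addr)
        self.lines.pop(addr, None)

wt_mem = {0x10: 0}
wt = Cache("through", wt_mem)
wt.lines[0x10] = 0
wt.write_hit(0x10, 5)
print(wt_mem[0x10])   # 5: memory updated immediately

wb_mem = {0x40: 0}
wb = Cache("back", wb_mem)
wb.lines[0x40] = 0
wb.write_hit(0x40, 7)
print(wb_mem[0x40])   # 0: memory is stale until eviction
wb.evict(0x40)
print(wb_mem[0x40])   # 7: dirty block written back on replacement
```

The dirty set plays the role of the per-block dirty bit mentioned on the slide.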
Review: Write Miss Policies
• Write allocate (also called fetch on write): the block containing the
missed-write location is loaded into the cache, followed by a write-hit
operation. In this approach, write misses are like read misses.
• No-write allocate (write around): the data is written directly to memory
without loading the block into the cache.
What about other mappings?
Flexible placement of blocks: Associativity
[Figure: memory block numbers 0–31 and a small cache — in a fully associative cache, a block (e.g., the one holding bytes 32–63) can be placed in any cache entry.]
A Four-way Set Associative Cache
• N-way set associative: N entries for each cache index
• N direct-mapped caches operate in parallel
• Example: four-way set associative cache
• Cache index selects a “set” from the cache
• The four tags in the set are compared in parallel
• Data is selected based on the tag comparison result
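A lookup in a set-associative cache can be sketched as follows. The sizes here (4 ways, 128 sets, 32-byte blocks) are assumed for illustration; hardware compares the tags of all ways in parallel, while the sketch does it sequentially:

```python
# Sketch of a lookup in an N-way set-associative cache (N = 4).
WAYS = 4
NUM_SETS = 128
BLOCK_SIZE = 32
OFFSET_BITS = 5          # log2(32)
INDEX_BITS = 7           # log2(128)

# Each set holds up to WAYS tags (valid entries only, for simplicity).
sets = [[] for _ in range(NUM_SETS)]

def lookup(addr):
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)   # selects a set
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    # Hardware compares all WAYS tags in parallel; here, one by one.
    return any(t == tag for t in sets[index])

addr = 0xABCD00
index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
sets[index].append(addr >> (OFFSET_BITS + INDEX_BITS))
print(lookup(addr))                           # True: hit
print(lookup(addr + BLOCK_SIZE * NUM_SETS))   # same set, new tag: miss
```

Note that an address offset by `BLOCK_SIZE * NUM_SETS` lands in the same set with a different tag, which is exactly the conflict case associativity is designed to absorb.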
Replacement Policy
In an associative cache, which block from a set should be
evicted when the set becomes full?
• Random
• Least-Recently Used (LRU)
• LRU cache state must be updated on every access
• True LRU is only feasible for small sets (e.g., 2-way); larger sets use approximations
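A minimal LRU replacement sketch for a single set, assuming 2-way associativity as on the slide (an ordered dictionary stands in for the hardware's recency state):

```python
from collections import OrderedDict

# Minimal LRU replacement for one cache set (2-way assumed).
class LRUSet:
    def __init__(self, ways=2):
        self.ways = ways
        self.entries = OrderedDict()   # tag -> block; oldest first

    def access(self, tag):
        if tag in self.entries:
            self.entries.move_to_end(tag)     # mark most-recently used
            return "hit"
        if len(self.entries) >= self.ways:
            self.entries.popitem(last=False)  # evict least-recently used
        self.entries[tag] = True
        return "miss"

s = LRUSet(ways=2)
print([s.access(t) for t in [1, 2, 1, 3, 2]])
# ['miss', 'miss', 'hit', 'miss', 'miss']
```

The final access to tag 2 misses because tag 3's insertion evicted it, showing why LRU state must be updated on every access, not just on misses.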
Quiz
Assume a 16 Kbyte cache that holds both instructions and data.
Additional specs for the 16 Kbyte cache include:
- Each block holds 32 bytes of data
- The cache is 4-way set associative
- Physical addresses are 32 bits
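The slide lists only the specs, not the question itself. A standard question for these specs is how a 32-bit address divides into tag, index, and offset fields, which can be worked out as follows:

```python
# Working the 16 KB / 32 B-block / 4-way quiz specs (the exact
# question is not on the slide; this computes the usual address split).
CACHE_BYTES = 16 * 1024
BLOCK_BYTES = 32
WAYS = 4
ADDR_BITS = 32

num_blocks = CACHE_BYTES // BLOCK_BYTES          # 512 blocks total
num_sets = num_blocks // WAYS                    # 128 sets
offset_bits = (BLOCK_BYTES - 1).bit_length()     # 5
index_bits = (num_sets - 1).bit_length()         # 7
tag_bits = ADDR_BITS - index_bits - offset_bits  # 20

print(num_sets, offset_bits, index_bits, tag_bits)   # 128 5 7 20
```

Adding valid and tag bits per block then gives the total storage overhead beyond the 16 KB of data.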
Cache Performance
Measuring Cache Performance
Impact of cache misses on performance
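The standard textbook model for this (assumed here, since the slide's own numbers are not in the text) counts memory-stall cycles per instruction as accesses/instruction × miss rate × miss penalty:

```python
# Memory-stall model: effective CPI = ideal CPI + stall cycles/instruction,
# where stalls/instruction = accesses/instruction x miss rate x miss penalty.
def cpi_with_stalls(cpi_ideal, accesses_per_instr, miss_rate, miss_penalty):
    return cpi_ideal + accesses_per_instr * miss_rate * miss_penalty

# Illustrative values: ideal CPI 1.0, 1.33 memory accesses/instruction
# (1 fetch + 0.33 data accesses), 2% miss rate, 100-cycle miss penalty.
print(round(cpi_with_stalls(1.0, 1.33, 0.02, 100), 2))   # 3.66
```

Even a 2% miss rate more than triples the CPI here, which is why the next slides focus on reducing miss rate and miss penalty.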
Example
Example: Solution
Example: Solution (cont.)
Improving Cache Performance
To improve performance:
• reduce the hit time
• reduce the miss rate
• reduce the miss penalty
Sources of Cache Misses
• Compulsory (cold start, first reference): first access to a block
• Misses that would occur even with infinite cache
• “Cold” fact of life: not a whole lot you can do about it
• Conflict (collision):
• Multiple memory locations mapped to the same cache location
• Solution 1: increase cache size
• Solution 2: increase associativity
• Capacity:
• Cache cannot contain all blocks accessed by the program
• Solution: increase cache size
Reducing Miss Penalty Using Multilevel caches
• Use a small, fast L1 backed by a larger L2 to reduce the effective miss penalty
CPU → L1 → L2 → DRAM
Performance of Multilevel Caches
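A common way to evaluate a two-level hierarchy is average memory access time (AMAT); the formulation and the numbers below are assumed for illustration:

```python
# Two-level AMAT model:
# AMAT = L1 hit time + L1 miss rate x (L2 hit time + L2 local miss rate x memory penalty)
def amat(l1_hit, l1_miss, l2_hit, l2_local_miss, mem_penalty):
    return l1_hit + l1_miss * (l2_hit + l2_local_miss * mem_penalty)

# Illustrative: 1-cycle L1, 4% L1 miss rate, 10-cycle L2,
# 50% L2 local miss rate, 100-cycle DRAM access.
print(round(amat(1, 0.04, 10, 0.5, 100), 2))   # 3.4

# Without L2, every L1 miss pays the full DRAM penalty:
print(round(amat(1, 0.04, 0, 1.0, 100), 2))    # 5.0
```

The L2 turns most 100-cycle penalties into 10-cycle ones, which is the payoff the slide's "smaller L1 + larger L2" arrangement is after.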
Effect of Cache Parameters on Performance
• Larger cache size
+ reduces capacity and conflict misses
- hit time will increase
• Higher associativity
+ reduces conflict misses
- may increase hit time
Quiz
• Suppose a processor executes at
• Clock Rate = 1 GHz (1 ns per cycle), Ideal (no misses) CPI = 1.5
• 40% arith/logic, 40% ld/st, 20% control
• Suppose that 5% of memory operations (involving data) get 100 cycle miss penalty
• Suppose that 2% of instructions get same miss penalty
Determine how much faster a processor with a perfect cache (one that never misses) would run.
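One way to work this quiz with the standard stall-cycle accounting (every instruction is fetched, so instruction misses apply to 100% of instructions; data misses apply only to the 40% that are loads/stores):

```python
# Worked sketch of the quiz using the stall-cycle model.
cpi_ideal = 1.5
frac_ldst = 0.40        # fraction of instructions that access data
data_miss = 0.05        # 5% of data accesses miss
instr_miss = 0.02       # 2% of instruction fetches miss
penalty = 100           # cycles per miss

stalls = frac_ldst * data_miss * penalty + 1.0 * instr_miss * penalty
cpi_real = cpi_ideal + stalls          # 1.5 + 2.0 + 2.0 = 5.5
speedup = cpi_real / cpi_ideal         # perfect cache vs real cache
print(round(cpi_real, 2), round(speedup, 2))   # 5.5 3.67
```

So the perfect-cache machine would run about 3.67× faster, with memory stalls accounting for most of the real machine's CPI.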
Is Virtual Memory the Same as Caching?
Virtual Memory vs. Cache Memory
Software Optimization via Blocking
• When dealing with arrays, we can get good performance from the memory
system if we store the array so that accesses to it are sequential in
memory. What about matrices, where algorithms often traverse both rows
and columns?
Software Optimization via Blocking
Thank you