CMP3010L09 MemoryII

The document covers the concepts of memory hierarchy, focusing on cache types, performance metrics, and optimization techniques. It discusses fully associative and set associative caches, cache performance measurements, and the differences between cache and virtual memory. Additionally, it explores software optimization strategies for improving cache efficiency, particularly in matrix operations.


CMP3010: Computer Architecture

L09: Memory Hierarchy II

Dina Tantawy
Computer Engineering Department
Cairo University
Agenda
• Review
• Fully Associative Cache
• Set Associative Cache
• Cache performance
• Cache vs virtual memory
• Software optimization based on caching
Review: memory hierarchy

[Figure: memory hierarchy diagram (not reproduced)]
Review: memory hierarchy

Principle of locality
States that programs access a relatively small portion of their
address space at any instant of time.

Temporal locality: if an item is referenced, it will tend to be referenced again soon.

Spatial locality: if an item is referenced, items whose addresses are close by will tend to be referenced soon.
Review: Terminologies
• Hit: data appears in some block in the upper level
– Hit Rate: the fraction of memory accesses found in the upper level
– Hit Time: time to access the upper level, which consists of
cache access time + time to determine hit/miss

• Miss: data needs to be retrieved from a block in the lower level
– Miss Rate = 1 - (Hit Rate)
– Miss Penalty = time to replace a block in the upper level + time to deliver the
block to the processor

• Hit Time << Miss Penalty


6
Review: The Basics of Caches
• How do we know if a data item is in the cache?
• How do we find it?

7
Review: Direct Mapped Cache

8
Review: Read and Write Policies
• Two write options when the data block is in the cache:
• Write Through: write to cache and memory at the same time.
• Isn’t memory too slow for this?

• Write Back: write to cache only. Write the cache block to memory
when that cache block is being replaced on a cache miss.
• Need a “dirty” bit for each cache block
• Control can be complex

9
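The write-back bookkeeping above can be sketched in C. This is an illustration, not the lecture's code: the names (cache_line, cache_write, cache_evict) are made up, and the cache is reduced to a single block so only the dirty-bit logic shows.

```c
#include <stdbool.h>
#include <string.h>

#define BLOCK_BYTES 32          /* hypothetical block size */

typedef struct {
    bool valid;
    bool dirty;                 /* the "dirty" bit from the slide */
    unsigned tag;
    unsigned char data[BLOCK_BYTES];
} cache_line;

unsigned char memory[1024];     /* toy backing store */

/* Write-back on a write hit: update only the cache and set dirty. */
void cache_write(cache_line *line, unsigned offset, unsigned char value) {
    line->data[offset] = value;
    line->dirty = true;         /* memory is now stale until eviction */
}

/* On replacement, a dirty block must first be written back to memory. */
void cache_evict(cache_line *line, unsigned block_addr) {
    if (line->valid && line->dirty)
        memcpy(&memory[block_addr], line->data, BLOCK_BYTES);
    line->valid = false;
    line->dirty = false;
}
```

With write-through there would be no dirty bit: cache_write would also store to memory immediately, which is why the slide asks whether memory is too slow for that.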
Review: Write Miss Policies
• Write allocate (also called fetch on write): data at the missed-write location is loaded into the cache, followed by a write-hit operation. In this approach, write misses are like read misses.

• No-write allocate (also called write-no-allocate or write around): data at the missed-write location is not loaded into the cache, and is written directly to the backing store. In this approach, data is loaded into the cache on read misses only.
What about other mappings?
Flexible placement of blocks: Associativity

[Figure: a 32-block memory and an 8-block cache, showing the possible placements of memory block 12]

• Fully associative: block 12 can be placed anywhere in the cache.
• (2-way) set associative: block 12 can be placed anywhere in set 0 (12 mod 4).
• Direct mapped: block 12 can be placed only into block 4 (12 mod 8).

12
Fully Associative
• Fully Associative Cache -- push the set associative idea to its limit!
• Forget about the Cache Index
• Compare the Cache Tags of all cache entries in parallel
• Example: Block Size = 32 B, so we need N 27-bit comparators
• By definition: Conflict Miss = 0 for a fully associative cache
[Figure: a fully associative lookup — the address splits into a 27-bit Cache Tag (bits 31–5) and a Byte Select (bits 4–0, e.g. 0x01); the tag is compared against every valid entry's tag in parallel to select the matching block's data (Byte 0 … Byte 31, Byte 32 … Byte 63, …)]
13
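The address split on this slide (32-bit addresses, 32 B blocks, so a 5-bit byte select and a 27-bit tag) can be sketched with two small helpers; the names are illustrative, not from the slides.

```c
#include <stdint.h>

/* For 32-byte blocks, the low 5 bits select a byte within the block;
   in a fully associative cache there is no index field, so all
   remaining 27 bits form the tag (assuming 32-bit addresses). */
enum { BYTE_SELECT_BITS = 5 };

uint32_t tag_of(uint32_t addr) {
    return addr >> BYTE_SELECT_BITS;                 /* upper 27 bits */
}

uint32_t byte_select_of(uint32_t addr) {
    return addr & ((1u << BYTE_SELECT_BITS) - 1);    /* lower 5 bits */
}
```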
Flexible placement of blocks: Associativity

[Figure: the range of cache organizations, from direct mapped through set associative to fully associative (not reproduced)]

14
A Four-way Set Associative Cache
• N-way set associative: N
entries for each Cache Index
• N direct mapped caches
operate in parallel
• Example: Four-way set
associative cache
• Cache Index selects a “set”
from the cache
• The four tags in the set are
compared in parallel
• Data is selected based on the
tag result

15
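The lookup described above can be sketched in C. The sizes are illustrative (matching the quiz on the next slide: 16 KB, 32 B blocks, 4 ways), and the tags are compared sequentially here where real hardware compares all four in parallel.

```c
#include <stdint.h>
#include <stdbool.h>

#define WAYS        4        /* 4-way set associative */
#define NUM_SETS    128      /* e.g. 16 KB / (32 B * 4 ways) */
#define OFFSET_BITS 5        /* 32-byte blocks */
#define INDEX_BITS  7        /* log2(128) sets */

typedef struct { bool valid; uint32_t tag; } way_entry;

way_entry cache[NUM_SETS][WAYS];

/* Cache Index selects a set; the four tags in it are then compared. */
bool lookup(uint32_t addr) {
    uint32_t index = (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1);
    uint32_t tag   = addr >> (OFFSET_BITS + INDEX_BITS);
    for (int w = 0; w < WAYS; w++)
        if (cache[index][w].valid && cache[index][w].tag == tag)
            return true;     /* hit in one of the four ways */
    return false;            /* miss */
}
```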
Replacement Policy
In an associative cache, which block from a set should be
evicted when the set becomes full?

• Random
• Least-Recently Used (LRU)
• LRU cache state must be updated on every access
• true implementation only feasible for small sets (2-way)

• First-In, First-Out (FIFO) a.k.a. Round-Robin
• used in highly associative caches

Replacement only happens on misses

16
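For the 2-way case where true LRU is feasible, the state per set is a single bit naming the least recently used way. A minimal sketch, with hypothetical names:

```c
/* One bit of LRU state per 2-way set: the way to evict next. */
typedef struct { int lru_way; } set_state;

/* LRU state must be updated on every access: the way just
   touched becomes most recent, so the other way becomes LRU. */
void touch(set_state *s, int way) { s->lru_way = 1 - way; }

/* On a miss with a full set, evict the least recently used way. */
int victim(const set_state *s)    { return s->lru_way; }
```

For higher associativity this bookkeeping grows quickly, which is why the slide notes that true LRU is only feasible for small sets.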
Quiz
Assume a 16 Kbyte cache that holds both instructions and data.
Additional specs for the 16 Kbyte cache include:
- Each block will hold 32 bytes of data
- The cache would be 4-way set associative
- Physical addresses are 32 bits

Q1: How many blocks would be in this cache?


Q2: How many bits of tag are stored with each block entry?

17
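The quiz arithmetic can be checked mechanically (spoiler: the code below computes the answers, so try the quiz first). All parameters come from the slide; the function names are made up.

```c
/* Quiz parameters from the slide. */
enum { CACHE_BYTES = 16 * 1024, BLOCK_BYTES = 32, WAYS = 4, ADDR_BITS = 32 };

static int ilog2(int x) { int b = 0; while (x > 1) { x >>= 1; b++; } return b; }

/* Q1: number of blocks = cache size / block size. */
int quiz_blocks(void) { return CACHE_BYTES / BLOCK_BYTES; }

/* Q2: tag bits = address bits - index bits - byte-offset bits,
   where the index selects one of blocks/ways sets. */
int quiz_tag_bits(void) {
    int sets = quiz_blocks() / WAYS;
    return ADDR_BITS - ilog2(sets) - ilog2(BLOCK_BYTES);
}
```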
Cache Performance

Measuring Cache Performance
Impact of cache misses on performance

19
Example

20
Example: Solution

21
Example: Solution

22
Improving Cache Performance

Average memory access time (AMAT) =
Hit time + Miss rate × Miss penalty

To improve performance:
• reduce the hit time
• reduce the miss rate
• reduce the miss penalty

23
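The AMAT formula above is a one-liner in code. The numbers in the comment are made up for illustration, not taken from the slides.

```c
/* Average memory access time: hit time plus the expected miss cost.
   Example: 1-cycle hit, 5% miss rate, 100-cycle penalty -> 6 cycles. */
double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}
```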
Sources of Cache Misses
• Compulsory (cold start, first reference): first access to a block
• Misses that would occur even with infinite cache
• “Cold” fact of life: not a whole lot you can do about it

• Conflict (collision):
• Multiple memory locations mapped to the same cache location
• Solution 1: increase cache size
• Solution 2: increase associativity

• Capacity:
• Cache cannot contain all blocks accessed by the program
• Solution: increase cache size
Reducing Miss Penalty Using Multilevel Caches
• Use smaller L1 if there is also L2
• Trade increased L1 miss rate for reduced L1 hit time
and reduced L1 miss penalty
• Reduces average access time

CPU ↔ L1 ↔ L2 ↔ DRAM

25
Performance of Multilevel Caches

26
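The figure for this slide is not reproduced; what such slides typically show is the standard two-level extension of AMAT, where an L1 miss pays the L2 hit time and an L2 miss additionally pays the DRAM penalty. A sketch (the formula is the standard textbook relation, not copied from the missing figure; the example numbers are made up):

```c
/* AMAT with two cache levels:
   AMAT = HitTime_L1 + MissRate_L1 * (HitTime_L2 + MissRate_L2 * MissPenalty_L2) */
double amat_two_level(double l1_hit, double l1_miss_rate,
                      double l2_hit, double l2_miss_rate,
                      double dram_penalty) {
    return l1_hit + l1_miss_rate * (l2_hit + l2_miss_rate * dram_penalty);
}
```

For example, a 1-cycle L1 hit, 10% L1 miss rate, 10-cycle L2 hit, 20% L2 miss rate, and 100-cycle DRAM penalty give 1 + 0.1 × (10 + 0.2 × 100) = 4 cycles on average.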
Effect of Cache Parameters on Performance
• Larger cache size
+ reduces capacity and conflict misses
- hit time will increase

• Higher associativity
+ reduces conflict misses
- may increase hit time

• Larger block size
+ reduces compulsory misses and reload
- increases conflict misses and miss penalty

27
Quiz
• Suppose a processor executes at
• Clock Rate = 1 GHz (1 ns per cycle), Ideal (no misses) CPI = 1.5
• 40% arith/logic, 40% ld/st, 20% control
• Suppose that 5% of memory operations (involving data) get 100 cycle miss penalty
• Suppose that 2% of instructions get same miss penalty

Determine how much faster a processor with a perfect cache (one that never misses) would run.

28
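As with the earlier quiz, the arithmetic can be checked in code (spoiler: it computes the answer). The reasoning: with 40% of instructions being loads/stores and a 5% data miss rate, data stalls add 0.40 × 0.05 × 100 = 2 cycles per instruction; instruction misses add 0.02 × 100 = 2 more. The function names are made up.

```c
/* Memory-stall cycles per instruction, from the quiz parameters. */
double stall_cpi(void) {
    return 0.40 * 0.05 * 100.0   /* data-access stalls  = 2.0 CPI */
         + 0.02 * 100.0;         /* instr-fetch stalls  = 2.0 CPI */
}

/* Speedup of a perfect cache = (ideal CPI + stalls) / ideal CPI. */
double speedup_with_perfect_cache(double ideal_cpi) {
    return (ideal_cpi + stall_cpi()) / ideal_cpi;
}
```

With the ideal CPI of 1.5, the real machine runs at CPI 5.5, so the perfect-cache machine is about 3.67× faster.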
Is Virtual Memory the same as Caching?
Virtual Memory vs Cache Memory

• Virtual memory increases the capacity of main memory; cache memory increases the accessing speed of the CPU.
• Virtual memory is not a memory unit, it is a technique; cache memory is hardware.
• The operating system manages virtual memory; hardware manages the cache memory.
• The size of virtual memory can be greater than main memory; the size of cache memory is less than main memory.
How can we benefit from cache?

Software Optimization via Blocking
• When dealing with arrays, we can get good performance from the
memory system if we store the array in memory so that accesses to
the array are sequential in memory. What about matrices?

• How is a matrix stored?
• Row major (row by row)
• Column major (column by column)

• A 512x512 matrix needs 1 MB, much bigger than the level-1
cache. It doesn't fit in the cache!
Software Optimization via Blocking
• How is matrix multiplication done?

/* C = C + A * B, column-major n x n matrices
   (the slide's snippet; the outer i loop was missing) */
for (int i = 0; i < n; ++i)
    for (int j = 0; j < n; ++j)
    {
        double cij = C[i+j*n];            /* cij = C[i][j] */
        for (int k = 0; k < n; k++)
            cij += A[i+k*n] * B[k+j*n];   /* cij += A[i][k]*B[k][j] */
        C[i+j*n] = cij;                   /* C[i][j] = cij */
    }

Software Optimization via Blocking

Do we need to store all three matrices? Isn't that increasing cache misses due to replacement?

[Figure legend: white = not accessed, light grey = old access, dark grey = new access]
Software Optimization via Blocking

[Figure: Blocked DGEMM — the blocked access pattern and resulting performance (not reproduced)]
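The blocked DGEMM the slides refer to can be sketched as follows. This is the standard textbook formulation, not the slides' own code: it uses the same column-major indexing as the earlier loop, BLOCKSIZE and the function names are illustrative, and n is assumed to be a multiple of BLOCKSIZE. The point is that each call to do_block touches only three BLOCKSIZE × BLOCKSIZE submatrices, chosen small enough to stay resident in the level-1 cache.

```c
#define BLOCKSIZE 32    /* illustrative; pick so 3 blocks fit in L1 */

/* Multiply one block: C += A * B over the BLOCKSIZE-wide submatrices
   whose top-left corners are given by (si, sj, sk), column-major. */
static void do_block(int n, int si, int sj, int sk,
                     const double *A, const double *B, double *C) {
    for (int i = si; i < si + BLOCKSIZE; ++i)
        for (int j = sj; j < sj + BLOCKSIZE; ++j) {
            double cij = C[i + j * n];               /* C[i][j] */
            for (int k = sk; k < sk + BLOCKSIZE; ++k)
                cij += A[i + k * n] * B[k + j * n];  /* A[i][k]*B[k][j] */
            C[i + j * n] = cij;
        }
}

/* Blocked DGEMM: C = C + A * B, n assumed a multiple of BLOCKSIZE. */
void dgemm_blocked(int n, const double *A, const double *B, double *C) {
    for (int sj = 0; sj < n; sj += BLOCKSIZE)
        for (int si = 0; si < n; si += BLOCKSIZE)
            for (int sk = 0; sk < n; sk += BLOCKSIZE)
                do_block(n, si, sj, sk, A, B, C);
}
```

Because each block of C is completed from cache-resident blocks of A and B before moving on, the replacement misses the previous slide asks about are largely avoided.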
Thank you

