
Chapter 4 - Cache Memory

Luis Tarrataca
luis.tarrataca@gmail.com

CEFET-RJ

L. Tarrataca Chapter 4 - Cache Memory 1 / 143


Table of Contents I

1 Introduction

2 Computer Memory System Overview

Characteristics of Memory Systems

Memory Hierarchy

3 Cache Memory Principles

L. Tarrataca Chapter 4 - Cache Memory 2 / 143


Table of Contents I

4 Elements of Cache Design

Cache Addresses

Cache Size

Mapping Function
Direct Mapping

Associative Mapping

Set-associative mapping

Replacement Algorithms

Write Policy

Line Size

Number of caches
L. Tarrataca Chapter 4 - Cache Memory 3 / 143
Table of Contents II
Multilevel caches

Unified versus split caches

L. Tarrataca Chapter 4 - Cache Memory 4 / 143


Table of Contents I

5 Intel Cache

Intel Cache Evolution

Intel Pentium 4 Block diagram

L. Tarrataca Chapter 4 - Cache Memory 5 / 143


Introduction

Introduction

Remember this guy? What was he famous for?

L. Tarrataca Chapter 4 - Cache Memory 6 / 143


Introduction

• John von Neumann;


• Hungarian-born scientist;
• Manhattan project;
• von Neumann Architecture:
• CPU ;
• Memory;
• I/O Module

L. Tarrataca Chapter 4 - Cache Memory 7 / 143


Introduction

Today’s focus: memory module of von Neumann’s architecture.

• Why, you may ask?


• Because that is the order that your book follows =P

L. Tarrataca Chapter 4 - Cache Memory 8 / 143


Introduction

Although simple in concept, computer memory exhibits a wide range of:

• type;

• technology;

• organization;

• performance;

• and cost.

No single technology is optimal in satisfying all of these...

L. Tarrataca Chapter 4 - Cache Memory 9 / 143


Introduction

Typically:

• Higher performance → higher cost;

• Lower performance → lower cost;

L. Tarrataca Chapter 4 - Cache Memory 10 / 143


Introduction

Typically:

• Higher performance ⇒ higher cost;

• Lower performance ⇒ lower cost;

What to do then? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 11 / 143


Introduction

Typically, a computer has a hierarchy of memory subsystems:

• some internal to the system


• i.e. directly accessible by the processor;

• some external
• accessible via an I/O module;

L. Tarrataca Chapter 4 - Cache Memory 12 / 143


Introduction

Typically, a computer has a hierarchy of memory subsystems:

• some internal to the system


• i.e. directly accessible by the processor;

• some external
• accessible via an I/O module;

Can you see any advantages / disadvantages with using each one?

L. Tarrataca Chapter 4 - Cache Memory 13 / 143


Computer Memory System Overview Characteristics of Memory Systems

Computer Memory System Overview


Classification of memory systems according to their key characteristics:

Figure: Key Characteristics Of Computer Memory Systems (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 14 / 143


Computer Memory System Overview Characteristics of Memory Systems

Let's see if you can guess what each one of these signifies... Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 15 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Location: either internal or external to the processor.


• Forms of internal memory:
• registers;
• cache;
• and others;

• Forms of external memory:


• disk;
• magnetic tape (too old... =P );
• devices that are accessible to the processor via I/O controllers.

L. Tarrataca Chapter 4 - Cache Memory 16 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Capacity: amount of information the memory is capable of holding.


• Typically expressed in terms of bytes (1 byte = 8 bits) or words;

• A word represents each addressable block of the memory


• common word lengths are 8, 16, and 32 bits;

• External memory capacity is typically expressed in terms of bytes;

L. Tarrataca Chapter 4 - Cache Memory 17 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Unit of transfer: number of bytes read / written into memory at a time.


• Need not equal a word or an addressable unit;

• Also possible to transfer blocks:


• Much larger units than a word;
• Used in external memory...
• External memory is slow...
• Idea: minimize the number of accesses while optimizing the amount of data transferred;

L. Tarrataca Chapter 4 - Cache Memory 18 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Access Method: How are the units of memory accessed?


• Sequential Method: Memory is organized into units of data, called records.
• Access must be made in a specific linear sequence;
• Stored addressing information is used to assist in the retrieval process.
• A shared read-write head is used;
• The head must be moved from its current location to the desired one;
• Passing and rejecting each intermediate record;
• Access times are thus highly variable.

Figure: Sequential Method Example: Magnetic Tape

L. Tarrataca Chapter 4 - Cache Memory 19 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Access Method: How are the units of memory accessed?


• Direct Access Memory:
• Involves a shared read-write mechanism;
• Individual records have a unique address;
• Requires accessing general record vicinity plus sequential searching, counting,
or waiting to reach the final location;

• Access time is also variable;

Figure: Direct Access Memory Example: Magnetic Disk

L. Tarrataca Chapter 4 - Cache Memory 20 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Access Method: How are the units of memory accessed?


• Random Access: Each addressable location in memory has a unique,
physically wired-in addressing mechanism.
• Constant time;
• independent of the sequence of prior accesses;
• Any location can be selected at random and directly accessed;
• Main memory and some cache systems are random access.

L. Tarrataca Chapter 4 - Cache Memory 21 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Access Method: How are the units of memory accessed?


• Associative: RAM that enables one to make a comparison of desired bit
locations within a word for a specified match
• a word is retrieved based on a portion of its contents rather than its address;
• retrieval time is constant independent of location or prior access patterns
• E.g.: cache, neural networks.

L. Tarrataca Chapter 4 - Cache Memory 22 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Performance:
• Access time (latency):
• For RAM: time to perform a read or write operation;
• For Non-RAM: time to position the read-write head at desired location;

• Memory cycle time: Primarily applied to RAM:


• Access time + additional time required before a second access;
• Required for electrical signals to be terminated/regenerated;
• Concerns the system bus.

L. Tarrataca Chapter 4 - Cache Memory 23 / 143


Computer Memory System Overview Characteristics of Memory Systems

• Transfer rate: rate at which data can be transferred into / out of memory;

• For RAM: transfer rate = 1 / (cycle time);

• For Non-RAM: Tn = TA + n/R, where:

• Tn: Average time to read or write n bits;
• TA: Average access time;
• n: Number of bits;
• R: Transfer rate, in bits per second (bps)

L. Tarrataca Chapter 4 - Cache Memory 24 / 143
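
As a quick sanity check of the non-RAM formula above, here is a small Python sketch; the 0.1 ms access time, 512-bit transfer, and 1 Mbps rate are invented numbers for illustration only:

```python
# Average time to read or write n bits from a non-random-access memory:
#   Tn = TA + n / R
def transfer_time(t_access_s: float, n_bits: int, rate_bps: float) -> float:
    """TA + n/R, all times in seconds."""
    return t_access_s + n_bits / rate_bps

# Hypothetical numbers: 0.1 ms average access time, 512-bit transfer, 1 Mbps rate.
print(transfer_time(t_access_s=1e-4, n_bits=512, rate_bps=1e6))  # ≈ 0.000612 s
```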


Computer Memory System Overview Characteristics of Memory Systems

• Physical characteristics:
• Volatile: information decays naturally or is lost when powered off;

• Nonvolatile: information remains without deterioration until changed:


• no electrical power is needed to retain information;
• E.g.: Magnetic-surface memories are nonvolatile;

• Semiconductor memory (memory on integrated circuits) may be either


volatile or nonvolatile.

L. Tarrataca Chapter 4 - Cache Memory 25 / 143


Computer Memory System Overview Characteristics of Memory Systems

Now that we have a better understanding of key memory aspects:

• We can try to relate some of these dimensions...

L. Tarrataca Chapter 4 - Cache Memory 26 / 143


Computer Memory System Overview Memory Hierarchy

Memory Hierarchy

Design constraints on memory can be summed up by three questions:

• How much?
• If memory exists, applications will likely be developed to use it.

• How fast?
• Best performance achieved when memory keeps up with the processor;

• I.e. as the processor executes instructions, memory should minimize pausing /


waiting for instructions or operands.

• How expensive?
• Cost of memory must be reasonable in relationship to other components;

L. Tarrataca Chapter 4 - Cache Memory 27 / 143


Computer Memory System Overview Memory Hierarchy

Memory tradeoffs are a sad part of reality =’(

• Faster access time, greater cost per bit;

• Greater capacity, smaller cost per bit;

• Greater capacity, slower access time;

L. Tarrataca Chapter 4 - Cache Memory 28 / 143


Computer Memory System Overview Memory Hierarchy

These tradeoffs imply a dilemma:

• Large capacity memories are desired:


• low cost and because the capacity is needed;

• However, to meet performance requirements, the designer needs:


• to use expensive, relatively lower-capacity memories with short access times.

L. Tarrataca Chapter 4 - Cache Memory 29 / 143


Computer Memory System Overview Memory Hierarchy

These tradeoffs imply a dilemma:

• Large capacity memories are desired:


• low cost and because the capacity is needed;

• However, to meet performance requirements, the designer needs:


• to use expensive, relatively lower-capacity memories with short access times.

How can we solve this issue? Or at least mitigate the problem? Any
ideas?

L. Tarrataca Chapter 4 - Cache Memory 30 / 143


Computer Memory System Overview Memory Hierarchy

The way out of this dilemma:

• Don’t rely on a single memory;


• Instead employ a memory hierarchy;
• Supplement:
• smaller, more expensive, faster
memories with...
• ...larger, cheaper, slower
memories;
• engineering FTW =)
Figure: The memory hierarchy (Source:
[Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 31 / 143


Computer Memory System Overview Memory Hierarchy

The way out of this dilemma:

• As one goes down the hierarchy:


• Decreasing cost per bit;
• Increasing capacity;
• Increasing access time;
• Decreasing frequency of
access of memory by
processor

Figure: The memory hierarchy (Source:


[Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 32 / 143


Computer Memory System Overview Memory Hierarchy

Key to the success of this organization is the last item:

• Decreasing frequency of access of memory by processor.

But why is this key to success? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 33 / 143


Computer Memory System Overview Memory Hierarchy

The key to the success of this organization is the last item:

• Decreasing frequency of access of the memory by the processor.

But why is this key to success? Any ideas?

• As we go down the hierarchy we gain in size but lose in speed;

• Therefore: not efficient for the processor to access these memories;

• Requires having specific strategies to minimize such accesses;

L. Tarrataca Chapter 4 - Cache Memory 34 / 143


Computer Memory System Overview Memory Hierarchy

So now the question is...

How can we develop strategies to minimize these accesses? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 35 / 143


Computer Memory System Overview Memory Hierarchy

How can we develop strategies to minimize these accesses? Any ideas?

Space and Time locality of reference principle:

• Space:
• if we access a memory location, close by addresses will very likely be accessed;

• Time:
• if we access a memory location, we will very likely access it again;

L. Tarrataca Chapter 4 - Cache Memory 36 / 143


Computer Memory System Overview Memory Hierarchy

Space and Time locality of reference principle:

• Space:
• if we access a memory location, close by addresses will very likely be accessed;

• Time:
• if we access a memory location, we will very likely access it again;

But why does this happen? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 37 / 143


Computer Memory System Overview Memory Hierarchy

Space and Time locality of reference principle:

• Space:
• if we access a memory location, close by addresses will very likely be accessed;

• Time:
• if we access a memory location, we will very likely access it again;

But why does this happen? Any ideas?

This is a consequence of using iterative loops and subroutines:

• instructions and data will be accessed multiple times;

L. Tarrataca Chapter 4 - Cache Memory 38 / 143


Computer Memory System Overview Memory Hierarchy

Example (1/5)

Suppose that the processor has access to two levels of memory:

• Level 1 - L1 :
• contains 1000 words and has an access time of 0.01µs;

• Level 2 - L2 :
• contains 100,000 words and has an access time of 0.1µs.

• Assume that:
• if word ∈ L1 , then the processor accesses it directly;

• If word ∈ L2 , then word is transferred to L1 and then accessed by the


processor.

L. Tarrataca Chapter 4 - Cache Memory 39 / 143


Computer Memory System Overview Memory Hierarchy

Example (2/5)

For simplicity:

• ignore time required for processor to determine whether word is in L1 or L2 .

Also, let:

• H defines the fraction of all memory accesses that are found in L1;

• T1 is the access time to L1;

• T2 is the access time to L2

L. Tarrataca Chapter 4 - Cache Memory 40 / 143


Computer Memory System Overview Memory Hierarchy

Example (3/5)
General shape of the curve that covers this situation:

Figure: Performance of accesses involving only L1 (Source: [Stallings, 2015])


L. Tarrataca Chapter 4 - Cache Memory 41 / 143
Computer Memory System Overview Memory Hierarchy

Example (4/5)

Textual description of the previous plot:

• For high percentages of L1 access, the average total access time is much
closer to that of L1 than that of L2 ;

Now lets consider the following scenario:

• Suppose 95% of the memory accesses are found in L1 .

• Average time to access a word is:

(0.95)(0.01 µs) + (0.05)(0.01 µs + 0.1 µs) = 0.0095 + 0.0055 = 0.015 µs


• Average access time is much closer to 0.01µs than to 0.1µs, as desired.

L. Tarrataca Chapter 4 - Cache Memory 42 / 143
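
The same arithmetic as a small Python sketch, so other hit ratios can be plugged in; this is just an illustration of the slide's reasoning, reusing the 95%, 0.01 µs and 0.1 µs figures from the example:

```python
def avg_access_time(hit_ratio: float, t1_us: float, t2_us: float) -> float:
    """Average access time for a two-level memory.

    A hit costs t1; a miss costs t1 + t2 (the word is first moved from
    L2 into L1 and then accessed), matching the example's assumption.
    """
    return hit_ratio * t1_us + (1.0 - hit_ratio) * (t1_us + t2_us)

print(avg_access_time(0.95, 0.01, 0.1))  # ≈ 0.015 µs, as in the example
```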


Computer Memory System Overview Memory Hierarchy

Example (5/5)

Strategy to minimize accesses should be:

• organize data across the hierarchy such that


• percentage of accesses to lower levels is substantially less than that of upper
levels

• I.e. L2 memory contains all program instructions and data:


• Data that is currently being used should be placed in L1;

• Eventually, data in L1 will have to be swapped back to L2 to make room for


new data;

• On average, most references will be to data contained in L1 .

L. Tarrataca Chapter 4 - Cache Memory 43 / 143


Computer Memory System Overview Memory Hierarchy

This principle can be applied across more than two levels of memory:

• Processor registers:
• fastest, smallest, and most expensive type of memory
• Followed immediately by the cache:
• Stages data movement between registers and main memory;
• Improves performance;
• Is not usually visible to the processor;
• Is not usually visible to the programmer.
• Followed by main memory:
• principal internal memory system of the computer;
• Each location has a unique address.

L. Tarrataca Chapter 4 - Cache Memory 44 / 143


Computer Memory System Overview Memory Hierarchy

This means that we should maybe have a closer look at the cache =)

Guess what the next section is...

L. Tarrataca Chapter 4 - Cache Memory 45 / 143


Cache Memory Principles

Cache Memory Principles

Cache memory is designed to combine:

• the memory access time of expensive, high-speed memory with...

• ...the large memory size of less expensive, lower-speed memory.

Figure: Cache and main memory - single cache approach (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 46 / 143


Cache Memory Principles

Figure: Cache and main memory - single cache approach (Source: [Stallings, 2015])

• large and slow memory together with a smaller, faster memory;

• the cache contains a copy of portions of main memory.

L. Tarrataca Chapter 4 - Cache Memory 47 / 143


Cache Memory Principles

When the processor attempts to read a word of memory:

• a check is made to determine if the word is in the cache;


• If so, the word is delivered to the processor.

• If the word is not in cache:


• a block of main memory is read into the cache;
• then the word is delivered to the processor.

• Because of the locality of reference principle:


• when a block of data is fetched into the cache;

• it is likely that there will be future references to that same memory location;

L. Tarrataca Chapter 4 - Cache Memory 48 / 143


Cache Memory Principles

Can you see any way of improving the cache concept? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 49 / 143


Cache Memory Principles

Can you see any way of improving the cache concept? Any ideas?

• What if we introduce multiple levels of cache?


• L2 cache is slower and typically larger than the L1 cache

• L3 cache is slower and typically larger than the L2 cache.

Figure: Cache and main memory - three-level cache organization (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 50 / 143


Cache Memory Principles

So, what is the structure of the main-memory system?

L. Tarrataca Chapter 4 - Cache Memory 51 / 143


Cache Memory Principles

Main memory:

• Consists of 2^n addressable words;

• Each word has a unique n-bit address;

• Memory consists of a number of fixed-length blocks of K words each;

• There are M = 2^n / K blocks;

Figure: Main memory (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 52 / 143
Cache Memory Principles

So, what is the structure of the cache system?

L. Tarrataca Chapter 4 - Cache Memory 53 / 143


Cache Memory Principles

Cache memory (1/2):

• Consists of m blocks, called lines;

• Each line contains K words;

• m≪M

• Each line also includes control bits:

• Not shown in the figure;
• MESI protocol (later chapter).

Figure: Cache memory (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 54 / 143


Cache Memory Principles

Cache memory (2/2):

• If a word in a block of memory is read:

• block is transferred to a cache line;

• Because m ≪ M, lines:

• cannot permanently store a block.

• need to identify the block stored;
• info stored in the tag field;

Figure: Cache memory (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 55 / 143


Cache Memory Principles

Now that we have a better understanding of the cache structure:

What is the specific set of operations that need to be performed for a


read operation issued by the processor? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 56 / 143


Cache Memory Principles

Figure: Cache read address (RA) (Source: [Stallings, 2015])


L. Tarrataca Chapter 4 - Cache Memory 57 / 143
Cache Memory Principles

Read operation:

• Processor generates read address (RA) of word to be read;

• If the word ∈ cache, it is delivered to the processor;

• Otherwise:
• Block containing that word is loaded into the cache;

• Word is delivered to the processor;

• These last two operations occur in parallel.

L. Tarrataca Chapter 4 - Cache Memory 58 / 143


Cache Memory Principles

Typical contemporary cache organization:

Figure: Typical cache organization (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 59 / 143


Cache Memory Principles

In this organization the cache:

• connects to the processor via data, control, and address lines;

• the data and address lines also attach to data and address buffers:
• which attach to a system bus...

• ...from which main memory is reached.

L. Tarrataca Chapter 4 - Cache Memory 60 / 143


Cache Memory Principles

What do you think happens when a word is in cache? Any ideas?

Figure: Typical cache organization (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 61 / 143


Cache Memory Principles

What do you think happens when a word is in cache? Any ideas?

When a cache hit occurs (word is in cache):

• the data and address buffers are disabled;

• communication is only between processor and cache;

• no system bus traffic.

L. Tarrataca Chapter 4 - Cache Memory 62 / 143


Cache Memory Principles

What do you think happens when a word is not in cache? Any ideas?

Figure: Typical cache organization (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 63 / 143


Cache Memory Principles

What do you think happens when a word is not in cache? Any ideas?

When a cache miss occurs (word is not in cache):

• the desired address is loaded onto the system bus;

• the data are returned through the data buffer...

• ...to both the cache and the processor

L. Tarrataca Chapter 4 - Cache Memory 64 / 143


Elements of Cache Design

Elements of Cache Design


Cache architectures can be classified according to key elements:

Figure: Elements of cache design (Source: [Stallings, 2015])


L. Tarrataca Chapter 4 - Cache Memory 65 / 143
Elements of Cache Design Cache Addresses

Cache Addresses

There are two types of cache addresses:

• Physical addresses:
• Actual memory addresses;

• Logical addresses:
• Virtual-memory addresses;

L. Tarrataca Chapter 4 - Cache Memory 66 / 143


Elements of Cache Design Cache Addresses

Cache Addresses

What is virtual memory?

Virtual memory performs mapping between:

• Logical addresses used by a program into physical addresses.

• Why is this important? We will see in a later chapter...

L. Tarrataca Chapter 4 - Cache Memory 67 / 143


Elements of Cache Design Cache Addresses

Cache Addresses

Main idea behind virtual memory:

• Disregard amount of main memory available;

• Transparent transfers to/from:


• main memory and...

• ...secondary memory:

• Idea: use RAM, when space runs out use HD ;)

• requires a hardware memory management unit (MMU):


• to translate each virtual address into a physical address in main memory.

L. Tarrataca Chapter 4 - Cache Memory 68 / 143


Elements of Cache Design Cache Addresses

With virtual memory cache may be placed:

• between the processor and the MMU;

Figure: Virtual Cache (Source: [Stallings, 2015])

• between the MMU and main memory;

Figure: Physical Cache (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 69 / 143


Elements of Cache Design Cache Addresses

What is the difference? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 70 / 143


Elements of Cache Design Cache Addresses

Virtual cache stores data using logical addresses.

• Processor accesses the cache directly, without going through the MMU.

• Advantage:
• Faster access speed;

• Cache can respond without the need for an MMU address translation;

• Disadvantage:
• Same virtual address in two different applications refers to two different
physical addresses;

• Therefore cache must be flushed with each application context switch...

• ...or extra bits must be added to each cache line


• to identify which virtual address space this address refers to.

L. Tarrataca Chapter 4 - Cache Memory 71 / 143


Elements of Cache Design Cache Size

Cache Size

Cache size should be:

• small enough so that the overall average cost per bit is close to that of
main memory alone;

• large enough so that the overall average access time is close to that of
the cache alone;

The larger the cache, the more complex the addressing logic:

• The result is that large caches tend to be slightly slower than small ones

The available chip and board area also limits cache size.

L. Tarrataca Chapter 4 - Cache Memory 72 / 143


Elements of Cache Design Cache Size

Conclusion: It is impossible to arrive at a single "optimal" cache size.

• as illustrated by the table in the next slide...

L. Tarrataca Chapter 4 - Cache Memory 73 / 143


Elements of Cache Design Cache Size

L. Tarrataca Chapter 4 - Cache Memory 74 / 143


Elements of Cache Design Mapping Function

Mapping Function

Recall that there are fewer cache lines than main memory blocks

How should one map main memory blocks into cache lines? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 75 / 143


Elements of Cache Design Mapping Function

Three techniques can be used for mapping blocks into cache lines:

• direct;

• associative;

• set associative

Let's have a look at each one of these...

• I know that you like when we go into specific details ;)

L. Tarrataca Chapter 4 - Cache Memory 76 / 143


Elements of Cache Design Mapping Function

Direct Mapping

Maps each block of main memory into only one possible cache line as:

i = j mod m

where:

• i = cache line number;

• j = main memory block number;

• m = number of lines in the cache

L. Tarrataca Chapter 4 - Cache Memory 77 / 143


Elements of Cache Design Mapping Function

Figure: Direct mapping (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 78 / 143


Elements of Cache Design Mapping Function

Previous picture shows mapping of main memory blocks into cache:

• The first m blocks of main memory map one-to-one onto the m lines of the cache;

• The next m blocks of main memory map in the following manner:


• Bm maps into line L0 of cache;

• Bm+1 maps into line L1 ;

• and so on...

• Modulo operation implies repetitive structure;

L. Tarrataca Chapter 4 - Cache Memory 79 / 143


Elements of Cache Design Mapping Function

With direct mapping blocks are assigned to lines as follows:

Figure: (Source: [Stallings, 2015])

Over time:

• each line can have a different main memory block;

• we need the ability to distinguish between these;

• the most significant s - r bits (tag) serve this purpose.

L. Tarrataca Chapter 4 - Cache Memory 80 / 143


Elements of Cache Design Mapping Function

Each main memory address (s + w bits) can be viewed as:

• Offset (w bits): identifies a word within a block of main memory;

• Line (r bits): specifies one of the 2^r cache lines;

• Tag (s − r bits): to distinguish blocks that are mapped to the same line:
• Note that:
• s allows addressing of all blocks in main memory...
• ...whilst the number of available cache lines is much smaller than 2^s
• The tag field identifies the block that is currently mapped to a cache line;

L. Tarrataca Chapter 4 - Cache Memory 81 / 143


Elements of Cache Design Mapping Function

To determine whether a block is in the cache:

Figure: Direct mapping cache organization (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 82 / 143


Elements of Cache Design Mapping Function

To determine whether a block is in the cache:

1 Use the line field of the memory address to index the cache line;

2 Compare the tag from the memory address with the line tag;
1 If both match, then Cache Hit:
1 Use the line field of the memory address to index the cache line;

2 Retrieve the corresponding word from the cache line;

2 If both do not match, then Cache Miss:


1 Use the line field of the memory address to index the cache line;

2 Update the cache line (word + tag);

L. Tarrataca Chapter 4 - Cache Memory 83 / 143
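
As a concrete illustration of the two steps above, here is a minimal Python sketch of a direct-mapped lookup; the field widths (w = 2, r = 3, s = 10) and the toy memory are made-up values for demonstration, not anything prescribed by the book:

```python
# Direct-mapped lookup: address = | tag (s-r bits) | line (r bits) | word (w bits) |
W_BITS, R_BITS, S_BITS = 2, 3, 10          # hypothetical widths: 4-word blocks, 8 lines

def split_address(addr: int):
    word = addr & ((1 << W_BITS) - 1)
    line = (addr >> W_BITS) & ((1 << R_BITS) - 1)
    tag = addr >> (W_BITS + R_BITS)
    return tag, line, word

# Each cache line holds (tag, block_data); None means the line is empty.
cache = [None] * (1 << R_BITS)

def read(addr: int, main_memory):
    tag, line, word = split_address(addr)
    entry = cache[line]
    if entry is not None and entry[0] == tag:          # cache hit
        return entry[1][word]
    block_start = addr & ~((1 << W_BITS) - 1)          # cache miss: fetch the whole block
    block = main_memory[block_start:block_start + (1 << W_BITS)]
    cache[line] = (tag, block)                          # update the line (word + tag)
    return block[word]

memory = list(range(1 << (S_BITS + W_BITS)))           # toy main memory of words
print(read(37, memory), read(38, memory))              # second access hits the same block
```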


Elements of Cache Design Mapping Function

Direct mapping technique:

• Advantage: simple and inexpensive to implement;

• Disadvantage: there is a fixed cache location for any given block;


• if a program happens to reference words repeatedly from two different
blocks that map into the same line;

• then the blocks will be continually swapped in the cache;

• hit ratio will be low (a.k.a. thrashing).

L. Tarrataca Chapter 4 - Cache Memory 84 / 143


Elements of Cache Design Mapping Function

Associative Mapping

Overcomes the disadvantage of direct mapping by:

• permitting each block to be loaded into any cache line:

Figure: Associative Mapping (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 85 / 143


Elements of Cache Design Mapping Function

Cache interprets a memory address as a Tag and a Word field:

• Tag: (s bits) uniquely identifies a block of main memory;

• Word: (w bits) uniquely identifies a word within a block;

L. Tarrataca Chapter 4 - Cache Memory 86 / 143


Elements of Cache Design Mapping Function

Figure: Fully associative cache organization (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 87 / 143


Elements of Cache Design Mapping Function

To determine whether a block is in the cache:

• simultaneously compare every line’s tag for a match:

• If a match exists, then Cache Hit:


1 Use the tag field of the memory address to index the cache line;

2 Retrieve the corresponding word from the cache line;

• If a match does not exist, then Cache Miss:


1 Choose a cache line. How?

2 Update the cache line (word + tag);

L. Tarrataca Chapter 4 - Cache Memory 88 / 143
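
A similarly hedged Python sketch for the fully associative case; the dictionary lookup stands in for the parallel tag comparison done in hardware, and the block size, line count, and eviction choice are invented for illustration:

```python
# Fully associative lookup: any block can live in any line, so a lookup must
# compare the tag against every occupied line (done in parallel in hardware).
W_BITS, NUM_LINES = 2, 4                     # hypothetical: 4-word blocks, 4 lines

cache = {}                                   # tag -> block data (at most NUM_LINES entries)

def read(addr: int, main_memory):
    tag, word = addr >> W_BITS, addr & ((1 << W_BITS) - 1)
    if tag in cache:                         # hit: some line holds this tag
        return cache[tag][word]
    if len(cache) == NUM_LINES:              # miss with a full cache: a line must be chosen...
        cache.pop(next(iter(cache)))         # ...for replacement (policies discussed later)
    block_start = addr & ~((1 << W_BITS) - 1)
    cache[tag] = main_memory[block_start:block_start + (1 << W_BITS)]
    return cache[tag][word]

memory = list(range(64))
print(read(5, memory), read(21, memory), read(6, memory))  # 5 21 6; third access is a hit
```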


Elements of Cache Design Mapping Function

Associative mapping technique:

• Advantage: flexibility as to which block to replace when a new block is


read into the cache;

• Disadvantage: complex circuitry required to examine the tags of all cache


lines in parallel.

L. Tarrataca Chapter 4 - Cache Memory 89 / 143


Elements of Cache Design Mapping Function

Can you see any way of improving this scheme? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 90 / 143


Elements of Cache Design Mapping Function

Can you see any way of improving this scheme? Any ideas?

Idea: Perform fewer comparisons

• Instead of comparing the tag against all lines

• compare only against a subset of the cache lines.

• Welcome to set-associative mapping =)

L. Tarrataca Chapter 4 - Cache Memory 91 / 143


Elements of Cache Design Mapping Function

Set-associative mapping

Combination of direct and associative approaches:

• Cache consists of a number of sets, each consisting of a number of lines.

• From direct mapping:


• each block can only be mapped into a single set;

• I.e. Block Bj always maps to set j;

• Done in a modulo way =)

• From associative mapping:


• each block can be mapped into any cache line of a certain set.

L. Tarrataca Chapter 4 - Cache Memory 92 / 143


Elements of Cache Design Mapping Function

• The relationships are:

m = v × k
i = j mod v

where:
• i = cache set number;

• j = main memory block number;

• m = number of lines in the cache;

• v = number of sets;

• k = number of lines in each set

L. Tarrataca Chapter 4 - Cache Memory 93 / 143


Elements of Cache Design Mapping Function

Figure: v associative mapped caches (Source: [Stallings, 2015])

Idea:

• 1 memory block → 1 single set, but to any line of that set.

• can be physically implemented as v associative caches


L. Tarrataca Chapter 4 - Cache Memory 94 / 143
Elements of Cache Design Mapping Function

Cache interprets a memory address as a Tag, a Set and a Word field:

• Set: identifies a set (d bits, v = 2^d sets);

• Tag: used in conjunction with the set bits to identify a block (s − d bits);

• Word: identifies a word within a block;

L. Tarrataca Chapter 4 - Cache Memory 95 / 143


Elements of Cache Design Mapping Function

To determine whether a block is in the cache:

1 Determine the set through the set fields;

2 Compare address tag simultaneously with all cache line tags;

3 If a match exists, then Cache Hit:


1 Retrieve the corresponding word from the cache line;

4 If a match does not exist, then Cache Miss:


1 Choose a cache line within the set. How?

2 Update the cache line (word + tag);

L. Tarrataca Chapter 4 - Cache Memory 96 / 143
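
A minimal Python sketch of a k-way set-associative lookup built on the relations above; the 2-way, 4-set geometry and the placeholder eviction choice are invented for illustration only:

```python
# Set-associative lookup: the set index is d bits (v = 2**d sets), and a block
# may sit in any of the k lines of its set.
W_BITS, D_BITS, K_WAYS = 2, 2, 2             # hypothetical: 4-word blocks, v = 4 sets, 2-way

# Each set is a list of at most K_WAYS (tag, block) pairs.
sets = [[] for _ in range(1 << D_BITS)]

def read(addr: int, main_memory):
    word = addr & ((1 << W_BITS) - 1)
    set_idx = (addr >> W_BITS) & ((1 << D_BITS) - 1)
    tag = addr >> (W_BITS + D_BITS)
    for stored_tag, block in sets[set_idx]:  # compare against the lines of one set only
        if stored_tag == tag:                # hit
            return block[word]
    if len(sets[set_idx]) == K_WAYS:         # miss with a full set: evict one line...
        sets[set_idx].pop(0)                 # ...(FIFO here, just as a placeholder policy)
    block_start = addr & ~((1 << W_BITS) - 1)
    block = main_memory[block_start:block_start + (1 << W_BITS)]
    sets[set_idx].append((tag, block))
    return block[word]

memory = list(range(256))
print(read(9, memory), read(25, memory), read(10, memory))  # 9 25 10; blocks 2 and 6 share set 2
```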


Elements of Cache Design Mapping Function

To determine whether a block is in the cache:

Figure: K -Way Set Associative Cache Organization (Source: [Stallings, 2015])


L. Tarrataca Chapter 4 - Cache Memory 97 / 143
Elements of Cache Design Mapping Function

Hint: The specific details about these models would make great exam
questions ;)

L. Tarrataca Chapter 4 - Cache Memory 98 / 143


Elements of Cache Design Mapping Function

Ok, we saw a lot of details, but:

What happens with cache performance?

E.g.: How does the direct mapping compare against others?

E.g.: what happens when we vary the number of lines k in each set?

L. Tarrataca Chapter 4 - Cache Memory 99 / 143


Elements of Cache Design Mapping Function

Figure: Varying associativity degree k (lines per set) over cache size
L. Tarrataca Chapter 4 - Cache Memory 100 / 143
Elements of Cache Design Mapping Function

Key points from the plot:

• Based on simulating the execution of the GCC compiler:


• Different applications may yield different results;

• Significant performance difference between:


• Direct and 2-way set associative up to at least 64kB;

• Beyond 32kB:
• increase in cache size brings no significant increase in performance.

• Difference between:
• 2-way and 4-way at 4kB is much less than the...
• ...difference in going from 4kB to 8kB in cache size;

L. Tarrataca Chapter 4 - Cache Memory 101 / 143


Elements of Cache Design Replacement Algorithms

Replacement Algorithms

We have seen three mapping techniques:

• Direct Mapping;

• Associative Mapping;

• Set-Associative Mapping

Why do we need replacement algorithms? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 102 / 143


Elements of Cache Design Replacement Algorithms

Replacement Algorithms

Eventually: cache will fill and blocks will need to be replaced:

• For direct mapping, there is only one possible line for any particular block:
• Thus no choice is possible;

• For the associative and set-associative techniques:


• a replacement algorithm is needed

L. Tarrataca Chapter 4 - Cache Memory 103 / 143


Elements of Cache Design Replacement Algorithms

Most common replacement algorithms (1/2):

• Least recently used (LRU):


• Probably the most effective;

• Replace the block in the set that has been in the cache longest with no reference to it;

• Maintains a list of indexes to all the lines in the cache:


• Whenever a line is used move it to the front of the list;
• Choose the line at the back of the list when replacing a block;

L. Tarrataca Chapter 4 - Cache Memory 104 / 143
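
A compact way to prototype the list-based bookkeeping described above is Python's OrderedDict; this is only an illustrative model of LRU for a single set, not how hardware implements it:

```python
from collections import OrderedDict

class LRUSet:
    """Models the lines of one cache set with least-recently-used replacement."""

    def __init__(self, num_lines: int):
        self.num_lines = num_lines
        self.lines = OrderedDict()            # tag -> block, most recently used last

    def access(self, tag, block_loader):
        if tag in self.lines:                 # hit: mark as most recently used
            self.lines.move_to_end(tag)       #   (the slide's "move to the front of the list")
            return self.lines[tag]
        if len(self.lines) == self.num_lines:
            self.lines.popitem(last=False)    # evict the least recently used line
        self.lines[tag] = block_loader(tag)
        return self.lines[tag]

# Toy usage: a 2-line set; loading a block just fabricates some data for the demo.
s = LRUSet(2)
for t in [1, 2, 1, 3]:                        # tag 2 is the LRU victim when 3 arrives
    s.access(t, lambda tag: f"block-{tag}")
print(list(s.lines))                          # [1, 3]
```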


Elements of Cache Design Replacement Algorithms

Most common replacement algorithms (2/2):

• First-in-first-out (FIFO):
• Replace the block in the set that has been in the cache longest.

• easily implemented as a round-robin or circular buffer technique

• Least frequently used (LFU):


• Replace the block in the set that has experienced the fewest references;

• implemented by associating a counter with each line.

L. Tarrataca Chapter 4 - Cache Memory 105 / 143


Elements of Cache Design Replacement Algorithms

Can you think of any other technique?

L. Tarrataca Chapter 4 - Cache Memory 106 / 143


Elements of Cache Design Replacement Algorithms

Strange possibility: random line replacement:

• studies have shown only slightly inferior performance to LRU, LFU and FIFO.

• =)

L. Tarrataca Chapter 4 - Cache Memory 107 / 143


Elements of Cache Design Write Policy

Write Policy

What happens when a block resident in the cache needs to be replaced?


Any ideas?

Can you see any implications that having a cache has on memory
management? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 108 / 143


Elements of Cache Design Write Policy

Write Policy

Two cases to consider:

• If the old block in the cache has not been altered:


• simply overwrite with a new block;

• If at least one write operation has been performed:


• main memory must be updated before bringing in the new block.

L. Tarrataca Chapter 4 - Cache Memory 109 / 143


Elements of Cache Design Write Policy

Some problem examples of having multiple memories:

• more than one device may have access to main memory, e.g.:
• I/O module may be able to read-write directly to memory;

• if a word has been altered only in the cache:


• the corresponding memory word is invalid.

• If the I/O device has altered main memory:


• then the cache word is invalid.

• Multiple processors, each with its own cache


• if a word is altered in one cache, invalidates the same word in other caches.

L. Tarrataca Chapter 4 - Cache Memory 110 / 143


Elements of Cache Design Write Policy

How can we tackle these issues? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 111 / 143


Elements of Cache Design Write Policy

How can we tackle these issues? Any ideas?

We have two possible techniques:

• Write through;

• Write back;

Let's have a look at these two techniques =)

L. Tarrataca Chapter 4 - Cache Memory 112 / 143


Elements of Cache Design Write Policy

Write through technique:

• all write operations are made to main memory as well as to the cache;

• ensuring that main memory is always valid;

• Disadvantage: lots of memory accesses → worse performance;

L. Tarrataca Chapter 4 - Cache Memory 113 / 143


Elements of Cache Design Write Policy

Write back technique:

• Minimizes memory writes;

• Updates are made only in the cache:


• When an update occurs, a dirty bit (use bit) associated with the line is set.

• When a block is replaced, it is written to memory iff the use bit is on.

• Disadvantage:
• I/O module can access main memory (later chapter)...

• But now all updates must pass through the cache...

• This makes for complex circuitry and a potential bottleneck

L. Tarrataca Chapter 4 - Cache Memory 114 / 143
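
To make the dirty-bit bookkeeping concrete, here is a small Python sketch of a single write-back cache line; the class and method names are invented for illustration and are not from the book:

```python
class WriteBackLine:
    """One cache line under a write-back policy: memory is updated only on eviction."""

    def __init__(self, tag, block):
        self.tag, self.block = tag, list(block)
        self.dirty = False                    # the dirty / use bit from the slide

    def write_word(self, offset, value):
        self.block[offset] = value            # update the cache copy only
        self.dirty = True                     # remember that main memory is now stale

    def evict(self, main_memory, block_start):
        if self.dirty:                        # write the whole line back iff the bit is set
            main_memory[block_start:block_start + len(self.block)] = self.block
        return self.block

memory = [0] * 8
line = WriteBackLine(tag=0, block=memory[0:4])
line.write_word(1, 99)                        # memory[1] is still 0 here (stale but cheap)
line.evict(memory, 0)                         # the update reaches memory only now
print(memory[:4])                             # [0, 99, 0, 0]
```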


Elements of Cache Design Write Policy

Example (1/2)

Consider a system with:

• 32 byte cache line size;


• 30 ns main memory transfer time for a 4-byte word;

What is the number of times that the line must be written before being
swapped out for a write-back cache to be more efficient than a write-
through cache?

L. Tarrataca Chapter 4 - Cache Memory 115 / 143


Elements of Cache Design Write Policy

Example (2/2)

What is the number of times that the line must be written before being
swapped out for a write-back cache to be more efficient than a write-
through cache?

• Write-back case:
• At swap-out time we need to transfer 32/4 = 8 words;
• Thus we need 8 × 30 = 240ns
• Write-through case:
• Each line update requires that one word be written to memory, taking 30ns
• Conclusion:
• If line gets written more than 8 times, the write-back method is more efficient;

L. Tarrataca Chapter 4 - Cache Memory 116 / 143
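
The same break-even reasoning as a tiny Python check; the 32-byte line, 4-byte word, and 30 ns transfer time come from the example, while the function itself is just an illustrative sketch:

```python
def write_traffic_ns(writes_before_eviction: int, line_bytes: int = 32,
                     word_bytes: int = 4, word_transfer_ns: int = 30):
    """Memory-write time spent per cache line under each policy."""
    write_through = writes_before_eviction * word_transfer_ns       # one word per write
    write_back = (line_bytes // word_bytes) * word_transfer_ns      # whole line once, at eviction
    return write_through, write_back

for n in (8, 9):
    wt, wb = write_traffic_ns(n)
    print(n, wt, wb, "write-back wins" if wb < wt else "tie or write-through wins")
# 8 writes: 240 ns vs 240 ns (tie); 9 writes: 270 ns vs 240 ns (write-back wins)
```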


Elements of Cache Design Write Policy

But what happens when we have multiple caches?

L. Tarrataca Chapter 4 - Cache Memory 117 / 143


Elements of Cache Design Write Policy

But what happens when we have multiple caches?

Can you see the implications of having multiple caches for memory
management?

L. Tarrataca Chapter 4 - Cache Memory 118 / 143


Elements of Cache Design Write Policy

But what happens when we have multiple caches?

Can you see the implications of having multiple caches for memory
management?

What happens if data is altered in one cache?

L. Tarrataca Chapter 4 - Cache Memory 119 / 143


Elements of Cache Design Write Policy

If data in one cache is altered:

• invalidates not only the corresponding word in main memory...

• ...but also that same word in other caches:


• if any other cache happens to have that same word

• Even if a write-through policy is used:


• other caches may contain invalid data;

• We want to guarantee cache coherency (Chapter 5).

L. Tarrataca Chapter 4 - Cache Memory 120 / 143


Elements of Cache Design Write Policy

What are the possible mechanisms for dealing with cache coherency?
Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 121 / 143


Elements of Cache Design Write Policy

Possible approaches to cache coherency (1/3):

• Bus watching with write through:


• Each cache monitors the address lines to detect write operations to memory;

• If a write is detected to memory that also resides in the cache:


• cache line is invalidated;

L. Tarrataca Chapter 4 - Cache Memory 122 / 143


Elements of Cache Design Write Policy

Possible approaches to cache coherency (2/3):

• Hardware transparency:
• Use additional hardware to ensure that all updates to main memory via
cache are reflected in all caches

L. Tarrataca Chapter 4 - Cache Memory 123 / 143


Elements of Cache Design Write Policy

Possible approaches to cache coherency (3/3):

• Noncacheable memory:
• Only a portion of main memory is shared by more than one processor, and
this is designated as noncacheable;

• All accesses to shared memory are cache misses, because the shared
memory is never copied into the cache.

• MESI Protocol:
• We will see this in better detail later on...

L. Tarrataca Chapter 4 - Cache Memory 124 / 143


Elements of Cache Design Line Size

Line Size

Another design element is the line size:

• Lines store memory blocks:


• Includes not only the desired word but also some adjacent words.

• As the block size increases from very small to larger sizes:


• Hit ratio will at first increase because of the principle of locality;

• However, as the block becomes even bigger:


• Hit ratio will begin to decrease;
• A lot of the words in bigger blocks will be irrelevant...

L. Tarrataca Chapter 4 - Cache Memory 125 / 143


Elements of Cache Design Line Size

Two specific effects come into play:

• Larger blocks reduce the number of blocks that fit into a cache.
• Also, because each block fetch overwrites older cache contents...

• ...a small number of blocks results in data being overwritten shortly after they
are fetched.

• As a block becomes larger:


• each additional word is farther from the requested word...

• ... and therefore less likely to be needed in the near future.

L. Tarrataca Chapter 4 - Cache Memory 126 / 143


Elements of Cache Design Line Size

The relationship between block size and hit ratio is complex:

• depends on the locality characteristics of a program;

• no definitive optimum value has been found

L. Tarrataca Chapter 4 - Cache Memory 127 / 143


Elements of Cache Design Number of caches

Number of caches

Recent computer systems:

• use multiple caches;

This design issue covers the following topics:

• number of cache levels;

• also, the use of unified versus split caches;

Let's have a look at the details of each one of these...

L. Tarrataca Chapter 4 - Cache Memory 128 / 143


Elements of Cache Design Number of caches

Multilevel caches

As logic density increased:

• became possible to have a cache on the same chip as the processor:


• reduces the processor’s external bus activity;

• therefore improving performance;

• when the requested instruction or data is found in the on-chip cache:


• bus access is eliminated;

• because of the short data paths internal to the processor:


• cache accesses will be faster than even zero-wait state bus cycles.
• Furthermore, during this period the bus is free to support other transfers.

L. Tarrataca Chapter 4 - Cache Memory 129 / 143


Elements of Cache Design Number of caches

With the continued shrinkage of processor components:

• processors now incorporate a second cache level (L2) or more:

• savings depend on the hit rates in both the L1 and L2 caches.


• In general: use of a second-level cache does improve performance;

• However, multilevel caches complicate design issues:


• size;
• replacement algorithms;
• write policy;

L. Tarrataca Chapter 4 - Cache Memory 130 / 143


Elements of Cache Design Number of caches

Two-level cache performance as a function of cache size:

Figure: Total hit ratio (L1 and L2) for 8-Kbyte and 16-Kbyte L1 (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 131 / 143


Elements of Cache Design Number of caches

Figure from previous slide (1/2):

• assumes that both caches have the same line size;

• shows the total hit ratio:

L. Tarrataca Chapter 4 - Cache Memory 132 / 143


Elements of Cache Design Number of caches

Figure from previous slide (2/2):

• shows the impact of L2 size on total hits with respect to L1 size.


• Steepest part of the slope for an L1 cache:
• of 8 Kbytes is for an L2 cache of 16 Kbytes;
• of 16 Kbytes is for an L2 cache size of 32 Kbytes;

• L2 has little effect on performance until it is at least double the L1 cache size.
• Otherwise, L2 cache has little impact on total cache performance.

L. Tarrataca Chapter 4 - Cache Memory 133 / 143


Elements of Cache Design Number of caches

It may be a strange question, but why do we need an L2 cache to be


larger than L1? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 134 / 143


Elements of Cache Design Number of caches

It may be a strange question, but why do we need an L2 cache to be


larger than L1? Any ideas?

• If the L2 cache has the same line size and capacity as the L1 cache...

• ...its contents will more or less mirror those of the L1 cache.

Also, there is a performance advantage to adding an L3 and an L4 cache.

L. Tarrataca Chapter 4 - Cache Memory 135 / 143


Elements of Cache Design Number of caches

Unified versus split caches

In recent computer systems:

• it has become common to split the cache into two:


• Instruction cache;

• Data cache;

• both exist at the same level:


• typically as two L1 caches:

• When the processor attempts to fetch:


• an instruction from main memory, it first consults the instruction L1 cache,
• data from main memory, it first consults the data L1 cache.

L. Tarrataca Chapter 4 - Cache Memory 136 / 143


Elements of Cache Design Number of caches

Two potential advantages of a unified cache:

• Higher hit rate than split caches:


• it automatically balances the load between instruction and data fetches, i.e.:

• if an execution pattern involves more instruction fetches than data fetches...


• ...the cache will tend to fill up with instructions;

• if an execution pattern involves relatively more data fetches...


• ...the cache will tend to fill up with data;

• Only one cache needs to be designed and implemented.

L. Tarrataca Chapter 4 - Cache Memory 137 / 143


Elements of Cache Design Number of caches

Unified caches seems pretty good...

So why the need for split caches? Any ideas?

L. Tarrataca Chapter 4 - Cache Memory 138 / 143


Elements of Cache Design Number of caches

Key advantage of the split cache design:

• eliminates competition for the cache between


• Instruction fetch/decode/execution stages...

• and the load / store data stages;

• Important in any design that relies on the pipelining of instructions:


• fetch instructions ahead of time

• thus filling a pipeline with instructions to be executed.

• Chapter 14

L. Tarrataca Chapter 4 - Cache Memory 139 / 143


Elements of Cache Design Number of caches

With a unified instruction / data cache:

• Data / instructions will be stored in a single location;

• Pipelining:
• allows for multiples stages of the instruction cycle to be executed
simultaneously.

• Because there is a single cache:


• Multiple stages cannot access the cache simultaneously;

• Performance bottleneck;

Split cache structure overcomes this difficulty.

L. Tarrataca Chapter 4 - Cache Memory 140 / 143


Intel Cache Intel Cache Evolution

Intel Cache Evolution

Figure: Intel Cache Evolution (Source: [Stallings, 2015])


L. Tarrataca Chapter 4 - Cache Memory 141 / 143
Intel Cache Intel Pentium 4 Block diagram

Intel Pentium 4 Block diagram

Figure: Pentium 4 block diagram (Source: [Stallings, 2015])

L. Tarrataca Chapter 4 - Cache Memory 142 / 143


References

References I

Stallings, W. (2015). Computer Organization and Architecture. Pearson Education.

L. Tarrataca Chapter 4 - Cache Memory 143 / 143
