2015Sp CS61C L16 Kavs Caches3

The document discusses caches and memory hierarchies. It begins by reviewing cache parameters such as block size, associativity, capacity, write policies, and replacement policies. It then discusses the three sources of cache misses: compulsory, capacity, and conflict misses. The document ends by discussing how changing cache parameters such as block size, associativity, and capacity affects performance metrics such as hit time and miss rate. Specifically, increasing block size initially lowers the miss rate but eventually increases conflict misses; increasing associativity lowers the miss rate but increases hit time; and increasing cache capacity lowers the miss rate but increases hit time.


CS 61C: Great Ideas in Computer Architecture (Machine Structures)


Caches Part 3

Instructors:
Krste Asanovic & Vladimir Stojanovic
http://inst.eecs.berkeley.edu/~cs61c/
You Are Here!
• Parallel Requests: assigned to computer, e.g., search "Katz" (warehouse-scale computer)
• Parallel Threads: assigned to core, e.g., lookup, ads
• Parallel Instructions: >1 instruction @ one time, e.g., 5 pipelined instructions (today's lecture: the memory/cache level)
• Parallel Data: >1 data item @ one time, e.g., add of 4 pairs of words (A0+B0, A1+B1, A2+B2, A3+B3)
• Hardware descriptions: all gates @ one time
• Programming Languages

[Diagram: harnessing parallelism to achieve high performance across the software/hardware stack, from smart phone and warehouse-scale computer down through computer, cores, memory (cache), input/output, instruction unit(s), functional unit(s), main memory, and logic gates.]
2
Caches Review
• Direct-Mapped vs. Set-Associative vs. Fully
Associative
• AMAT = Hit Time + Miss Rate * Miss Penalty
• 3 Cs of cache misses: Compulsory, Capacity,
Conflict
• Effect of cache parameters on performance

3
Primary Cache Parameters
• Block size (aka line size)
– how many bytes of data in each cache entry?
• Associativity
– how many ways in each set?
– Direct-mapped => Associativity = 1
– Set-associative => 1 < Associativity < #Entries
– Fully associative => Associativity = #Entries
• Capacity (bytes) = Total #Entries * Block size
• #Entries = #Sets * Associativity
4
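As a quick illustration of these relations, here is a minimal C sketch that derives #Entries, #Sets, and the resulting tag/index/offset split of a 32-bit address. The 16 KB / 4-way / 32 B figures are invented example parameters, not numbers from the slides.

#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint32_t capacity = 16 * 1024;   /* bytes (assumed example)  */
    uint32_t block    = 32;          /* bytes per entry (line)   */
    uint32_t assoc    = 4;           /* ways per set             */

    uint32_t entries = capacity / block;   /* Capacity = #Entries * Block size */
    uint32_t sets    = entries / assoc;    /* #Entries = #Sets * Associativity */

    /* Address split (GCC/Clang builtin gives log2 of a power of two) */
    int offset_bits = __builtin_ctz(block);   /* log2(32)  = 5 */
    int index_bits  = __builtin_ctz(sets);    /* log2(128) = 7 */
    int tag_bits    = 32 - index_bits - offset_bits;

    printf("%u entries, %u sets; tag/index/offset = %d/%d/%d bits\n",
           entries, sets, tag_bits, index_bits, offset_bits);
    return 0;
}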
Other Cache Parameters
• Write policy
• Replacement policy

5
Write Policy Choices
• Cache hit:
  – write through: writes both cache & memory on every access
    • Generally higher memory traffic, but simpler pipeline & cache design
  – write back: writes cache only; memory is written only when a dirty
    entry is evicted
    • A dirty bit per line reduces write-back traffic
    • Must handle 0, 1, or 2 accesses to memory for each load/store
• Cache miss:
  – no write allocate: only write to main memory
  – write allocate (aka fetch on write): fetch the block into the cache,
    then write it
• Common combinations (contrasted in the sketch below):
  – write through and no write allocate
  – write back with write allocate
6
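To make the two common combinations concrete, here is a toy single-line cache in C, a sketch for intuition only: a counter stands in for memory write traffic, and the block fetch on allocation is elided.

#include <stdio.h>
#include <stdint.h>
#include <stdbool.h>

static int mem_writes;  /* stand-in for memory write traffic */

typedef struct { uint32_t tag; bool valid, dirty; uint32_t data; } Line;

/* Write through + no write allocate: update the cache only on a hit,
   write memory on every store; a miss does not fetch the block. */
static void store_wt(Line *l, uint32_t addr, uint32_t data) {
    if (l->valid && l->tag == addr)
        l->data = data;
    mem_writes++;
}

/* Write back + write allocate: allocate on a miss (fetch elided in
   this toy), write the cache and mark it dirty; memory sees the data
   only when the dirty line is evicted. */
static void store_wb(Line *l, uint32_t addr, uint32_t data) {
    if (!(l->valid && l->tag == addr)) {
        if (l->valid && l->dirty)
            mem_writes++;            /* write back the victim */
        l->tag = addr; l->valid = true; l->dirty = false;
    }
    l->data = data;
    l->dirty = true;
}

int main(void) {
    Line wt = {0}, wb = {0};
    for (uint32_t i = 0; i < 4; i++) store_wt(&wt, 0x10, i);
    printf("write-through: %d memory writes\n", mem_writes);  /* 4 */
    mem_writes = 0;
    for (uint32_t i = 0; i < 4; i++) store_wb(&wb, 0x10, i);
    printf("write-back:    %d memory writes\n", mem_writes);  /* 0 until eviction */
    return 0;
}

Four stores to the same word cost four memory writes under write-through but none (yet) under write-back, which is the traffic difference the slide describes.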
Replacement Policy
In an associative cache, which line from a set should be
evicted when the set becomes full?
• Random
• Least-Recently Used (LRU)
  – LRU cache state must be updated on every access
  – True implementation only feasible for small sets (2-way)
  – Pseudo-LRU binary tree often used for 4-8 way (sketched below)
• First-In, First-Out (FIFO) a.k.a. Round-Robin
  – Used in highly associative caches
• Not-Most-Recently Used (NMRU)
  – FIFO with an exception for the most-recently used line or lines

This is a second-order effect. Why?
Replacement only happens on misses.

7
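For the 4-way case, a tree pseudo-LRU needs only three bits per set. Below is a minimal sketch assuming one common convention (a 0 bit means "the victim is on the left"); real implementations vary.

#include <stdio.h>
#include <stdint.h>

/* Tree pseudo-LRU state for one 4-way set: root bit b0 picks a half,
   b1/b2 pick a way within the left/right half. */
typedef struct { uint8_t b0, b1, b2; } plru4_t;

/* On every hit or fill, steer the bits on the accessed way's path
   toward the opposite side, so the victim points away from it. */
static void plru4_touch(plru4_t *s, int way) {
    if (way <= 1) { s->b0 = 1; s->b1 = (way == 0); }
    else          { s->b0 = 0; s->b2 = (way == 2); }
}

/* On a miss, follow the bits to the pseudo-least-recently-used way. */
static int plru4_victim(const plru4_t *s) {
    if (s->b0 == 0) return s->b1 == 0 ? 0 : 1;
    else            return s->b2 == 0 ? 2 : 3;
}

int main(void) {
    plru4_t s = {0, 0, 0};
    for (int w = 0; w < 4; w++) plru4_touch(&s, w);
    printf("victim after touching ways 0,1,2,3: way %d\n",
           plru4_victim(&s));   /* way 0, matching true LRU here */
    return 0;
}

Only log2(ways) bits are updated per access, which is why this scales to 4-8 ways where true LRU bookkeeping does not.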
Sources of Cache Misses (3 C's)
• Compulsory (cold start, first reference):
  – 1st access to a block; a "cold" fact of life, not a lot you can
    do about it
  – If running billions of instructions, compulsory misses are
    insignificant
• Capacity:
  – Cache cannot contain all blocks accessed by the program
  – Misses that would not occur with an infinite cache
• Conflict (collision):
  – Multiple memory locations mapped to the same cache set
  – Misses that would not occur with an ideal fully associative cache

8
Impact of Cache Parameters on
Performance
• AMAT = Hit Time + Miss Rate * Miss Penalty
  – Note: we assume the cache is always searched first, so
    hit time is charged on both hits and misses!
• For misses, characterize by the 3 Cs (worked example below)

9
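A worked instance of the formula, with invented numbers (1-cycle hit time, 5% miss rate, 100-cycle miss penalty; these are assumptions for illustration, not course data):

#include <stdio.h>

/* AMAT = Hit Time + Miss Rate * Miss Penalty; hit time is charged
   on every access, hits and misses alike. */
static double amat(double hit_time, double miss_rate, double miss_penalty) {
    return hit_time + miss_rate * miss_penalty;
}

int main(void) {
    printf("AMAT = %.1f cycles\n", amat(1.0, 0.05, 100.0));  /* 6.0 */
    return 0;
}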
CPU-Cache Interaction
(5-stage pipeline)

[Diagram: the 5-stage pipeline with a primary instruction cache feeding instruction fetch (PC, IR) and a primary data cache accessed after the ALU in the memory stage; on a miss, refill data comes from the lower levels of the memory hierarchy via the memory controller.]

Stall entire CPU on data cache miss.

10
Increasing Block Size?
• Hit time as block size increases?
  – Mostly unchanged, though there may be a slight reduction: fewer
    blocks means fewer tags, so the memory holding the tags is smaller
    and faster to access
• Miss rate as block size increases?
  – Goes down at first due to spatial locality, then increases as
    fewer blocks in the cache lead to more conflict misses
• Miss penalty as block size increases?
  – Rises with block size, though the fixed initial latency is
    amortized over the whole block
11
Increasing Associativity?
• Hit time as associativity increases?
  – Increases, with a large step from direct-mapped to >=2 ways,
    since the correct way must now be muxed to the processor
  – Smaller increases in hit time for further increases in
    associativity
• Miss rate as associativity increases?
  – Goes down due to reduced conflict misses, but most of the gain is
    from 1->2->4 ways, with limited benefit from higher associativities
• Miss penalty as associativity increases?
  – Unchanged; the replacement policy runs in parallel with
    fetching the missing line from memory

12
Increasing #Entries?
• Hit time as #entries increases?
  – Increases, since tags and data are read from larger memory
    structures
• Miss rate as #entries increases?
  – Goes down due to reduced capacity and conflict misses
  – Architects' rule of thumb: miss rate drops ~2x for every ~4x
    increase in capacity (only a gross approximation; see the
    sketch below)
• Miss penalty as #entries increases?
  – Unchanged

13
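Taken literally, the rule of thumb (miss rate halves for every 4x capacity) says miss rate scales roughly as 1/sqrt(capacity). A throwaway sketch with an invented 32 KB / 4% baseline (compile with -lm):

#include <stdio.h>
#include <math.h>

int main(void) {
    double base_kb = 32.0, base_miss = 0.04;  /* assumed baseline */
    for (double kb = 32.0; kb <= 512.0; kb *= 4.0)
        printf("%4.0f KB -> miss rate ~ %.3f\n",
               kb, base_miss * sqrt(base_kb / kb));
    return 0;
}

This prints 0.040, 0.020, 0.010 for 32, 128, and 512 KB; remember it is only a gross approximation, not a model.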
Administrivia
• Project 2, Part 2 due 3/22
• No assigned work over spring break
• Next assignment, HW5, due 04/05
• Midterm II is 04/09
– Conflict? Email Sagar
– DSP students will receive an email about accommodations
soon

14
How to Reduce Miss Penalty?
• Could there be locality on misses from a
cache?
• Use multiple cache levels!
• With Moore’s Law, more room on die for
bigger L1 caches and for second-level (L2)
cache
• And in some cases even an L3 cache!
• IBM mainframes have ~1GB L4 cache off-chip.
15
Review: Memory Hierarchy

[Diagram: a pyramid of levels in the memory hierarchy; the processor sits at the top, with inner Level 1 below it, then Level 2, Level 3, ... down to outer Level n; the size of the memory at each level grows with increasing distance from the processor, while speed decreases.]

As we move to outer levels, the latency goes up
and the price per bit goes down.
16
From Lecture 11: In the News
• At ISSCC 2015 in San Francisco, IBM presented details of its
latest mainframe chip
• z13 designed in 22nm SOI technology with
seventeen metal layers, 4 billion transistors/chip
• 8 cores/chip, with 2MB L2 cache, 64MB L3 cache,
and 480MB L4 off-chip cache.
• 5GHz clock rate, 6 instructions per cycle, 2
threads/core
• Up to 24 processor chips in shared memory node

17
IBM z13 Memory Hierarchy

[Figure: the z13 cache hierarchy, from per-core L1 and 2MB L2 caches through the 64MB shared on-chip L3 to the 480MB off-chip L4.]
18
Local vs. Global Miss Rates
• Local miss rate – the fraction of references to
one level of a cache that miss
  – Local miss rate L2$ = L2$ Misses / L1$ Misses
• Global miss rate – the fraction of references that
miss in all levels of a multilevel cache
  – L2$ local miss rate >> global miss rate

19
L1 Cache: 32KB I$, 32KB D$
L2 Cache: 256 KB
L3 Cache: 4 MB

20


Local vs. Global Miss Rates
• Local miss rate – the fraction of references to one
level of a cache that miss
  – Local miss rate L2$ = L2$ Misses / L1$ Misses
• Global miss rate – the fraction of references that
miss in all levels of a multilevel cache
  – L2$ local miss rate >> global miss rate
• Global miss rate = L2$ Misses / Total Accesses
  = (L2$ Misses / L1$ Misses) x (L1$ Misses / Total Accesses)
  = Local miss rate L2$ x Local miss rate L1$
• AMAT = Time for a hit + Miss rate x Miss penalty
• AMAT = Time for a L1$ hit + (local) Miss rate L1$ x
(Time for a L2$ hit + (local) Miss rate L2$ x L2$ Miss penalty)

21
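Putting the two formulas together, a short sketch with invented numbers (10% local L1$ miss rate, 50% local L2$ miss rate, 1-cycle L1$ hit, 10-cycle L2$ hit, 100-cycle L2$ miss penalty; all assumptions for illustration):

#include <stdio.h>

int main(void) {
    double l1_hit = 1.0,  l1_miss = 0.10;   /* local L1$ miss rate */
    double l2_hit = 10.0, l2_miss = 0.50;   /* local L2$ miss rate */
    double l2_penalty = 100.0;

    double global_l2 = l1_miss * l2_miss;   /* fraction of all accesses */
    double amat = l1_hit + l1_miss * (l2_hit + l2_miss * l2_penalty);

    printf("global L2$ miss rate = %.2f, AMAT = %.1f cycles\n",
           global_l2, amat);                /* 0.05, 7.0 cycles */
    return 0;
}

Note how the 50% local L2$ miss rate is far larger than the 5% global miss rate, exactly the gap the slide calls out.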
Clickers/Peer Instruction
• Overall, what are L2 and L3 local miss rates?
A: L2 > 50%, L3 > 50%
B: L2 ~ 50%, L3 < 50%
C: L2 ~ 50%, L3 ~ 50%
D: L2 < 50%, L3 < 50%
E: L2 > 50%, L3 ~50%

22
23
CPI/Miss Rates/DRAM Access
SPECint2006

[Chart: per-benchmark CPI, cache miss rates (first two panels, data only), and DRAM accesses (instructions and data) for SPECint2006.]

24

In Conclusion, Cache Design Space
• Several interacting dimensions
  – Cache size
  – Block size
  – Associativity
  – Replacement policy
  – Write-through vs. write-back
  – Write-allocation
• Optimal choice is a compromise
  – Depends on access characteristics
    • Workload
    • Use (I-cache, D-cache)
  – Depends on technology / cost
• Simplicity often wins

[Figure: the cache design space drawn as a solid with cache size, associativity, and block size as axes, plus a tradeoff curve running from "Good" to "Bad" as Factor A is traded against Factor B, from less to more.]

25
