Advanced Computer Architecture

Chapter 1.4

Caches: a quick review of introductory memory system architecture

Objective: bring everyone up to speed, and also establish some key ideas that will come up later in the course in more complicated contexts

October 2023
Paul H J Kelly

These lecture notes are partly based on the course text, Hennessy and
Patterson’s Computer Architecture: A Quantitative Approach (6th ed), and on
the lecture slides of David Patterson’s Berkeley course (CS252)

Course materials online at https://scientia.doc.ic.ac.uk/2324/modules/60001/materials and https://www.doc.ic.ac.uk/~phjk/AdvancedCompArchitecture/aca20/
Intel Skylake quad-core die photo
https://en.wikichip.org/wiki/File:skylake_(quad-core)_(annotated).png
We finished the last lecture by asking: how fast can a pipelined processor go?
A simple 5-stage pipeline can run at 5-9GHz
Limited by the critical path through the slowest pipeline stage's logic
Tradeoff: do more per cycle? Or increase clock rate? Or do more per cycle, in parallel…
At 3GHz, the clock period is about 330 picoseconds
The time light takes to travel about four inches
About 10 gate delays
For example, the Cell BE is designed for 11 FO4 (“fan-out-of-4”) gate delays per cycle:
www.fe.infn.it/~belletti/articles/ISSCC2005-cell.pdf
Pipeline latches etc account for 3-5 FO4 delays, leaving only 5-8 for actual work

How can we build a RAM that can implement our MEM stage in 5-8 FO4 delays?
Life used to be so easy
Processor-DRAM Memory Gap (latency)

[Chart: relative performance (1 to 1000, log scale) versus year, 1980-2000. CPU performance ("Moore's Law") improves ~60%/yr (2X/1.5yr); DRAM latency improves ~9%/yr (2X/10yrs); the processor-memory performance gap grows ~50% per year.]

In 1980 a large RAM's access time was close to the CPU cycle time of 1980s machines.
Levels of the Memory Hierarchy

Level                              Capacity           Access time             Cost                       Managed by            Transfer unit (to level below)
CPU Registers                      100s Bytes         <1ns                                               programmer/compiler   Instructions and Operands, 1-16 bytes
Cache (perhaps multilevel, L1-L3)  10s-1000s KBytes   1-10 ns                 ~$10/MByte                 cache controller      Blocks, 8-128 bytes
Main Memory                        GBytes             100ns-300ns             $0.01/MByte                Operating System      Pages, 4K-8K bytes
Disk                               100s GBytes        10 ms (10,000,000 ns)   $0.00005/MByte ($50/TB)    user/operator         Files, MBytes
Tape                               infinite           sec-min                 $0.00005/MByte

Upper levels are faster; lower levels are larger.
~ Exponential increase in access latency, block size and capacity as we move down the hierarchy
• The Principle of Locality:
  – Programs access a relatively small portion of the address space at any instant of time.

• Two Different Types of Locality:
  – Temporal Locality (Locality in Time): If an item is referenced, it will tend to be referenced again soon (e.g., loops, reuse)
  – Spatial Locality (Locality in Space): If an item is referenced, items whose addresses are close by tend to be referenced soon (e.g., straight-line code, array access)

• Most modern architectures are heavily reliant (totally reliant?) on locality for speed
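
As a concrete illustration (my example, not from the slides), a simple reduction loop shows both kinds of locality:

#include <stdio.h>

/* Illustrative sketch: a simple reduction loop exhibits both kinds of locality.
 * - Temporal locality: `sum` and the loop variables are re-referenced every iteration.
 * - Spatial locality: a[i] and a[i+1] sit in adjacent bytes, so one fetched cache
 *   block (e.g. 32B = 8 ints) serves several consecutive iterations.
 */
int main(void) {
    int a[256];
    for (int i = 0; i < 256; ++i) a[i] = i;    /* sequential writes: spatial locality */

    long sum = 0;
    for (int i = 0; i < 256; ++i) sum += a[i]; /* sequential reads: spatial locality;
                                                  repeated use of `sum`: temporal locality */
    printf("%ld\n", sum);
    return 0;
}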
1 KB “Direct Mapped” Cache, 32B blocks
• For a 2^N byte cache:
  – The uppermost (32 - N) bits are always the Cache Tag
  – The lowest M bits are the Byte Select (Block Size = 2^M)

[Figure: a 32-bit address split into Cache Tag (bits 31-10, example 0x50), Cache Index (bits 9-5, example 0x01) and Byte Select (bits 4-0, example 0x00), indexing a table of 32 cache blocks, each with a Valid bit, a Cache Tag and 32 bytes of Cache Data (Byte 0 … Byte 31 in block 0, Byte 32 … Byte 63 in block 1, …, up to Byte 1023 in block 31).]

Tags: metadata to enable us to check whether we have a hit
Data: the cached data itself, arranged in cache lines/blocks

Direct-mapped cache – storage
1 KB “Direct Mapped” Cache, 32B blocks
• For a 2^N byte cache:
  – The uppermost (32 - N) bits are always the Cache Tag
  – The lowest M bits are the Byte Select (Block Size = 2^M)

[Figure: read access. The load address issued by the processor is split into Cache Tag (example 0x50), Cache Index (example 0x01) and Byte Select (example 0x00). The Cache Index selects one of the 32 blocks; the stored tag of that block is compared with the address tag, and ANDed with the Valid bit, to produce Hit; the Byte Select picks the requested data out of the 32-byte block.]

Direct-mapped cache – read access
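
A minimal sketch (my addition; the example address is arbitrary) of how this 1 KB, 32-byte-block cache splits a 32-bit load address into the three fields shown above:

#include <stdint.h>
#include <stdio.h>

/* Sketch: address decomposition for the 1 KB direct-mapped cache with 32 B blocks.
 * Byte select = bits 4:0, index = bits 9:5, tag = bits 31:10. */
int main(void) {
    uint32_t addr = 0x00014020u;             /* arbitrary example address */
    uint32_t byte_select = addr & 0x1F;      /* low 5 bits: byte within the block */
    uint32_t index = (addr >> 5) & 0x1F;     /* next 5 bits: which of the 32 blocks */
    uint32_t tag = addr >> 10;               /* remaining 22 bits: stored with the block */
    printf("tag=0x%x index=%u byte=%u\n", tag, index, byte_select);
    return 0;
}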
1 KB Direct Mapped Cache, 32B blocks

[Figure: main memory drawn as a column of 32-byte blocks (0, 1, 2, …, 35, …) alongside the 32-block cache data array.]

Cache location 0 can be occupied by data from main memory block 0, 32, 64, … etc.
Cache location 1 can be occupied by data from main memory block 1, 33, 65, … etc.
In general, all addresses with the same Address<9:5> bits map to the same location in the cache.
  Which one should we place in the cache?
  How can we tell which one is in the cache?
Associativity conflicts in a direct-mapped cache

Consider a loop that repeatedly reads part of two different arrays:

  int A[256];
  int B[256];
  int r = 0;
  for (int i = 0; i < 10; ++i) {
    for (int j = 0; j < 64; ++j) {
      r += A[j] + B[j];
    }
  }

The inner loop repeatedly re-reads 64 values from both A and B. The accessed part of A spans 64x4 = 256 bytes (lines at A+0, A+32, A+32*2, A+32*3, …), i.e. 8 32B cache lines, and likewise for B. Array B is located exactly 1024 bytes after array A.

For the accesses to A and B to be mostly cache hits, we need a cache big enough to hold 2x64 ints, i.e. 512B.

Consider the 1KB direct-mapped cache on the previous slide - what might go wrong?
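
A minimal sketch (my addition, not from the slides; the base addresses are hypothetical) of why this pattern conflicts: because B starts exactly 1024 bytes after A, which is exactly the cache capacity, A[j] and B[j] always share the same index bits <9:5>, so each access evicts the line the other array just brought in.

#include <stdint.h>
#include <stdio.h>

/* Sketch: compute the direct-mapped index (1 KB cache, 32 B lines) for A[j] and B[j]
 * under the assumption that B is laid out immediately after A (1024 bytes later). */
int main(void) {
    uint32_t base_A = 0x1000;          /* hypothetical base address of A */
    uint32_t base_B = base_A + 1024;   /* B placed 1024 bytes after A    */
    for (int j = 0; j < 64; j += 8) {  /* one sample per 32-byte line    */
        uint32_t addr_A = base_A + 4 * j, addr_B = base_B + 4 * j;
        printf("j=%2d  index(A[j])=%2u  index(B[j])=%2u\n",
               j, (addr_A >> 5) & 0x1F, (addr_B >> 5) & 0x1F);
        /* The indices are always equal: A[j] and B[j] evict each other,
         * so almost every access misses even though only 512 B are live. */
    }
    return 0;
}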
Direct-mapped Cache - structure
• Capacity: C bytes (eg 1KB)
• Blocksize: B bytes (eg 32)
• Byte select bits: 0..log(B)-1 (eg 0..4)
• Number of blocks: C/B (eg 32)
• Address size: A (eg 32 bits)
• Cache index size: I=log(C/B) (eg log(32)=5)
• Tag size: A-I-log(B) (eg 32-5-5=22)

[Figure: the Cache Index selects one row of the (Valid, Cache Tag, Cache Data) arrays; the stored tag is compared with the address tag (Adr Tag) and ANDed with the Valid bit to produce Hit; the selected Cache Block supplies the data.]
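
For reference, here is a compact sketch (my illustration, using the parameters above: C = 1 KB, B = 32 B, so 32 blocks, 5 index bits and a 22-bit tag) of the direct-mapped hit check this structure implements. A call such as dm_lookup(&c, 0x00014020, &byte) returns false until the corresponding block has been filled and marked valid.

#include <stdbool.h>
#include <stdint.h>

/* Sketch of a direct-mapped cache's hit check, parameterised as on this slide. */
enum { BLOCKS = 32, BLOCK_BYTES = 32 };

struct direct_mapped_cache {
    bool     valid[BLOCKS];
    uint32_t tag[BLOCKS];
    uint8_t  data[BLOCKS][BLOCK_BYTES];
};

/* Returns true on a hit and copies the requested byte into *out. */
bool dm_lookup(const struct direct_mapped_cache *c, uint32_t addr, uint8_t *out) {
    uint32_t byte_sel = addr & (BLOCK_BYTES - 1);
    uint32_t index    = (addr / BLOCK_BYTES) % BLOCKS;  /* Address<9:5>   */
    uint32_t tag      = addr / (BLOCK_BYTES * BLOCKS);  /* Address<31:10> */
    if (c->valid[index] && c->tag[index] == tag) {      /* exactly one comparator */
        *out = c->data[index][byte_sel];
        return true;
    }
    return false;                                       /* miss: fetch from lower level */
}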
Two-way Set Associative Cache
• N-way set associative: N entries for each Cache Index
– N direct mapped caches operated in parallel (N typically 2 to 4)
• Example: Two-way set associative cache
– Cache Index selects a “set” from the cache
– The two tags in the set are compared in parallel
– Data is selected based on the tag result

[Figure: the Cache Index selects one set, i.e. one row from each of two (Valid, Cache Tag, Cache Data) arrays; both stored tags are compared with the address tag (Adr Tag) in parallel; the two compare results are ORed to form Hit, and drive a mux (Sel0/Sel1) that selects the matching way's Cache Block.]
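
Extending the same sketch (again my own illustration, with assumed parameters of 16 sets x 2 ways x 32 B = 1 KB so the numbers stay comparable), a two-way lookup checks both tags in the selected set; in hardware the two comparisons happen in parallel and a mux selects the matching way.

#include <stdbool.h>
#include <stdint.h>

/* Sketch of a two-way set-associative hit check (assumed parameters: 32 B blocks,
 * 16 sets x 2 ways = 1 KB total). */
enum { WAYS = 2, SETS = 16, SA_BLOCK_BYTES = 32 };

struct set_assoc_cache {
    bool     valid[SETS][WAYS];
    uint32_t tag[SETS][WAYS];
    uint8_t  data[SETS][WAYS][SA_BLOCK_BYTES];
};

bool sa_lookup(const struct set_assoc_cache *c, uint32_t addr, uint8_t *out) {
    uint32_t byte_sel = addr & (SA_BLOCK_BYTES - 1);
    uint32_t set      = (addr / SA_BLOCK_BYTES) % SETS;
    uint32_t tag      = addr / (SA_BLOCK_BYTES * SETS);
    for (int way = 0; way < WAYS; ++way) {          /* both compares run in parallel
                                                        in the real hardware          */
        if (c->valid[set][way] && c->tag[set][way] == tag) {
            *out = c->data[set][way][byte_sel];     /* mux selects the matching way  */
            return true;
        }
    }
    return false;
}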
Disadvantage of Set Associative Cache
• N-way Set Associative Cache v. Direct Mapped Cache:
  – N comparators vs. 1
  – Extra MUX delay for the data
  – Data comes AFTER Hit/Miss
• In a direct mapped cache, the Cache Block is available BEFORE Hit/Miss:
  – Possible to assume a hit and continue. Recover later if miss.

[Figure: the same two-way structure as the previous slide; the data mux cannot select a way until the tag comparisons complete, so the data arrives only after Hit/Miss is known.]
Example: Intel Pentium 4 Level-1 cache (pre-Prescott)
Capacity: 8K bytes (total amount of data the cache can store)
Block: 64 bytes (so there are 8K/64 = 128 blocks in the cache)
Ways: 4 (addresses with the same index bits can be placed in one of 4 ways)
Sets: 32 (= 128/4, that is, each RAM array holds 32 blocks)
Index: 5 bits (since 2^5 = 32 and we need the index to select one of the 32 sets)
Tag: 21 bits (= 32 minus 5 for index, minus 6 to address the byte within the block)
Access time: 2 cycles (.6ns at 3GHz; pipelined, dual-ported [load+store])

[Figure: the set-associative lookup diagram from the previous slides, reused here to illustrate the 4-way lookup.]
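
As a quick sanity check on these numbers (my own arithmetic, not part of the slide), the field widths follow directly from the capacity, block size and associativity:

#include <stdio.h>

/* Sketch: derive the Pentium 4 L1 field widths from capacity, block size and ways. */
int main(void) {
    int capacity = 8 * 1024;                           /* 8 KB          */
    int block    = 64;                                 /* 64 B blocks   */
    int ways     = 4;
    int blocks   = capacity / block;                   /* 128           */
    int sets     = blocks / ways;                      /* 32            */
    int offset_bits = 6;                               /* log2(64)      */
    int index_bits  = 5;                               /* log2(32)      */
    int tag_bits    = 32 - index_bits - offset_bits;   /* 21            */
    printf("blocks=%d sets=%d offset=%d index=%d tag=%d\n",
           blocks, sets, offset_bits, index_bits, tag_bits);
    return 0;
}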
4 Questions for Memory Hierarchy
• Q1: Where can a block be placed in the upper level?
  – Block placement
• Q2: How is a block found if it is in the upper level?
  – Block identification
• Q3: Which block should be replaced on a miss?
  – Block replacement
• Q4: What happens on a write?
  – Write strategy
Q1: Where can a block be placed in the upper level?

[Figure: an 8-block cache drawn three ways – direct mapped, two-way set associative (4 sets of 2), and fully associative – with main-memory block 12 highlighted.]

In a direct-mapped cache, block 12 can only be placed in one cache location, determined by its low-order address bits: (12 mod 8) = 4

In a two-way set-associative cache, the set is determined by its low-order address bits: (12 mod 4) = 0. Block 12 can be placed in either of the two cache locations in set 0.

In a fully-associative cache, block 12 can be placed in any location in the cache.

More associativity:
  More comparators – larger, more energy
  Better hit rate (diminishing returns)
  Reduced storage layout sensitivity
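
A tiny illustration (mine, not from the slides) of the placement arithmetic for block 12 in an 8-block cache:

#include <stdio.h>

/* Sketch: where can main-memory block 12 live in an 8-block cache? */
int main(void) {
    int block = 12, cache_blocks = 8;

    int dm_slot = block % cache_blocks;         /* direct mapped: exactly one slot */
    int sa_set  = block % (cache_blocks / 2);   /* 2-way: one of 4 sets, 2 slots   */

    printf("direct mapped: slot %d\n", dm_slot);               /* 4 */
    printf("2-way set assoc: set %d (either way)\n", sa_set);  /* 0 */
    printf("fully associative: any of the %d slots\n", cache_blocks);
    return 0;
}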
Q2: How is a block found if it is in the upper level?

[Figure: the two-way set-associative lookup again – the Cache Index selects a set, the address tag (Adr Tag) is compared against both stored tags in parallel, the results are ORed to form Hit, and a mux (Sel0/Sel1) selects the matching Cache Block.]

• Tag on each block
  – No need to check index or block offset

  Address fields: | Tag | Index | Block Offset |   (Tag + Index = Block Address)

• Increasing associativity shrinks index, expands tag
Q3: Which block should be replaced on a miss?
• With Direct Mapped there is no choice
• With Set Associative or Fully Associative we want to choose
  – Ideal: the block that will be re-used least soon
  – LRU (Least Recently Used) is a popular approximation
  – Random is remarkably good in large caches

Miss rates by cache size, associativity and replacement policy:

  Size      2-way LRU   2-way Random   4-way LRU   4-way Random   8-way LRU   8-way Random
  16 KB     5.2%        5.7%           4.7%        5.3%           4.4%        5.0%
  64 KB     1.9%        2.0%           1.5%        1.7%           1.4%        1.5%
  256 KB    1.15%       1.17%          1.13%       1.13%          1.12%       1.12%

Benchmark studies show that LRU beats random only with small caches

LRU can be pathologically bad.......
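
One classic pathological case (my sketch, not from the slides): cyclically sweeping over one block more than a fully-associative LRU cache can hold makes every access miss, because LRU always evicts exactly the block that will be needed next.

#include <stdio.h>

/* Sketch: LRU worst case. A fully-associative cache of CAP blocks, managed with
 * LRU, is swept cyclically over CAP+1 distinct blocks: every access is a miss. */
enum { CAP = 4, SWEEP = CAP + 1, ACCESSES = 100 };

int main(void) {
    int cache[CAP];                 /* cache[0] = LRU ... cache[used-1] = MRU */
    int used = 0, misses = 0;

    for (int a = 0; a < ACCESSES; ++a) {
        int want = a % SWEEP;       /* cyclic sweep: 0,1,2,3,4,0,1,...        */
        int hit = -1;
        for (int i = 0; i < used; ++i)
            if (cache[i] == want) { hit = i; break; }
        if (hit < 0) {              /* miss: evict LRU (slot 0), shift up     */
            ++misses;
            if (used == CAP) {
                for (int i = 1; i < CAP; ++i) cache[i - 1] = cache[i];
                used = CAP - 1;
            }
            cache[used++] = want;   /* insert as MRU                          */
        } else {                    /* hit: move to MRU position              */
            int b = cache[hit];
            for (int i = hit + 1; i < used; ++i) cache[i - 1] = cache[i];
            cache[used - 1] = b;
        }
    }
    printf("misses: %d / %d accesses\n", misses, ACCESSES);  /* 100 / 100 */
    return 0;
}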


Q4: What happens on a write?
• Write through—The information is written to both the block in the cache and to the block in the lower-level memory

• Write back—The information is written only to the block in the cache. The modified cache block is written to main memory only when it is replaced.
  – is the block clean or dirty?

• Pros and Cons of each?
  – WT: read misses cannot result in writes
  – WB: absorbs repeated writes to the same location

• WT is always combined with write buffers so that we don't wait for lower-level memory
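
A short sketch (my own, reusing the 1 KB direct-mapped organisation from earlier; the flat main_memory array stands in for the lower level) of how a write-back cache uses a dirty bit, updating memory only when a dirty line is evicted:

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sketch: write-back policy with a dirty bit, for a 1 KB direct-mapped cache
 * (32 blocks of 32 bytes). */
enum { WB_BLOCKS = 32, WB_BLOCK_BYTES = 32, MEM_BYTES = 1 << 16 };

struct line { bool valid, dirty; uint32_t tag; uint8_t data[WB_BLOCK_BYTES]; };
static struct line cache[WB_BLOCKS];
static uint8_t main_memory[MEM_BYTES];

void wb_store_byte(uint32_t addr, uint8_t value) {
    uint32_t index = (addr / WB_BLOCK_BYTES) % WB_BLOCKS;
    uint32_t tag   = addr / (WB_BLOCK_BYTES * WB_BLOCKS);
    struct line *l = &cache[index];

    if (!l->valid || l->tag != tag) {                 /* write miss: allocate line  */
        if (l->valid && l->dirty) {                   /* write the dirty victim back */
            uint32_t victim = (l->tag * WB_BLOCKS + index) * WB_BLOCK_BYTES;
            memcpy(&main_memory[victim], l->data, WB_BLOCK_BYTES);
        }
        uint32_t base = (addr / WB_BLOCK_BYTES) * WB_BLOCK_BYTES;
        memcpy(l->data, &main_memory[base], WB_BLOCK_BYTES);  /* fetch new block    */
        l->valid = true;
        l->tag   = tag;
        l->dirty = false;
    }
    l->data[addr % WB_BLOCK_BYTES] = value;           /* update only the cached copy */
    l->dirty = true;                                  /* memory updated on eviction  */
}

int main(void) {
    wb_store_byte(0x40, 0xAB);          /* dirty line stays in the cache...           */
    wb_store_byte(0x40 + 1024, 0xCD);   /* ...until this conflicting store evicts it  */
    printf("memory[0x40] = 0x%02X\n", main_memory[0x40]);   /* now 0xAB              */
    return 0;
}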
Caches are a big topic
• Cache coherency
  – If your data can be in more than one cache, how do you keep the copies consistent?
• Victim caches
  – Stash recently-evicted blocks in a small fully-associative cache (a “competitive strategy”)
• Prefetching
  – Use a predictor to guess which block to fetch next – before the processor requests it
• And much much more........
What’s at the bottom of the memory hierarchy?
• StorageTek STK 9310 (“Powderhorn”)
  – 2,000, 3,000, 4,000, 5,000, or 6,000 cartridge slots per library storage module (LSM)
  – Up to 24 LSMs per library (144,000 cartridges)
  – 120 TB (1 LSM) to 28,800 TB capacity (24 LSM)
  – Each cartridge holds 300GB, readable at up to 40 MB/sec
• Up to 28.8 petabytes
• Average 4s to load a tape
• 2017 product: Oracle SL8500
  – Up to 1.2 Exabytes per unit
  – Combine up to 32 units into a single robotic tape drive system
  – http://www.oracle.com/us/products/servers-storage/storage/tape-storage/034341.pdf

https://www.itnews.com.au/gallery/inside-suns-multi-storey-colorado-data-centre-135385/page1
IBM System Storage Tape Library TS3500 / TS4500, from https://www.youtube.com/watch?v=CVN93H6EuAU&list=PLp5rLKqrfZu_EvvnFM1HDptl_n0k5Th_q
StorageTek Powderhorn before disassembly, CERN 2007: http://www.flickriver.com/photos/naezmi/2074280052/#large
Can we live without cache?
• Interesting exception: Cray/Tera MTA, first delivered June 1999:
  – www.cray.com/products/systems/mta/
• Each CPU switches every cycle between 128 threads
• Each thread can have up to 8 outstanding memory accesses
• 3D toroidal mesh interconnect
• Memory accesses hashed to spread load across banks
• MTA-1 fabricated using Gallium Arsenide, not silicon
  – “nearly un-manufacturable” (Wikipedia)
• Third-generation Cray XMT:
  – http://www.cray.com/Products/XMT.aspx

http://www.karo.com
Summary:
• Without caches we are in trouble: DRAM access times are commonly >100 cycles
  – We will look at various techniques to exploit memory parallelism to overcome this, especially in GPUs
• Without locality caches won't help: spatial vs temporal locality
• Direct-mapped, set-associative, associativity conflicts
  – We will see similar structures, and issues, in branch predictors, prefetching etc
• Policy questions: write-through, write-back, many more – see next chapter!
  – We will see similar choices in cache coherency protocols for multicore
Next:

Discussion exercise – the “Turing Tax”

Then dynamic scheduling

Then a deeper dive into caches and the memory hierarchy
In response to a student question:
• There is a tag for each 32-byte cache block (and in the 1KB cache, there
would, as you say, be 32 blocks, since 1024=32x32).
• Two adjacent cache blocks could (normally will) hold 32-byte blocks from
different parts of the memory.
• In a fully-associative cache we would have a tag and a tag comparator for
every 32-byte block.
• In a direct-mapped cache, we have a tag for every block, but only one tag
comparator.
• This is cheaper, faster and lower-power. But in order to make it work, we
use some of the low-order address bits to index the cache - to select just
one cache block. If its tag matches, we have a hit. If not, we
don't. Similarly, when data is allocated into the cache, the same index
bits are used to select the cache block that will be used (perhaps
displacing whatever was there before).
• This means that different addresses that happen to have the same index
bits map to the same cache block. So only one of them can be in the
cache at the same time.
Running long simulations… (helpful student’s edstem post)
Hello! Here is a quick list of tips I’ve picked up when running remote workloads on
the lab machines.
• I presume you are familiar with SSH-ing into the machines; normally I SSH with 2 hops (you should not run any apps on the shell servers, as far as I know they are meant only for accessing the DoC network): my laptop -> shell[1-5].doc.ic.ac.uk -> [your chosen lab machine]. Also, I have found that at rare times the shell servers can be slow. Either try another one or use the Imperial VPN and skip the first hop. Here is a list of lab machines for your convenience: link.
• Now, to find an empty lab machine, try to run htop.

• Here is some sample output: you can see in the first image an empty machine (low RAM/CPU usage) and after that a busy one. To quit just press q.
Running long simulations… (helpful student’s edstem post)

• To make your session persistent you can use GNU Screen.

• Just type screen in your terminal and it should open up a new bash session.
• You can do your work there, and when you are ready to leave just type CTRL-A then D to detach the session - it should now persist even if you log out!
• To list your sessions, type screen -ls.
• When you are ready to reconnect, just SSH into the same
machine and run screen -r (potentially pass the name of
the session as well if you have multiple). Don't forget to close
your sessions when you are done (a simple exit will do).

Finally, a caveat: checkpoint your work - if your machine gets reset for whatever reason you will lose your sessions.
Feeding curiosity
• Does LRU have pathological worst-case behaviour? How much
worse might it be than an optimal replacement policy?
– See: Some Mathematical Facts About Optimal Cache Replacement, Pierre
Michaud, ACM TACO 2016 https://dl.acm.org/doi/pdf/10.1145/3017992
• Can we reason about the efficiency of programs in a way that takes
into account how well they will use the memory hierarchy – can we
reason about the asymptotic complexity of programs as the distance
to memory grows?
– See: The Uniform Memory Hierarchy Model of Computation, Alpern et al,
Algorithmica 1994
https://link.springer.com/content/pdf/10.1007/BF01185206.pdf
• As memory gets bigger, latency gets worse. But could we pipeline it? Under what circumstances does an algorithm’s time complexity depend on memory latency?
– See: On approximating the ideal random access machine by physical machines,
Bilardi et al JACM 2009 https://dl.acm.org/doi/10.1145/1552285.1552288
