William Stallings Computer Organization and Architecture 7th Edition Cache Memory
Computer Organization
and Architecture
7th Edition
Chapter 4
Cache Memory
Characteristics of Memory Systems
• Location
• Capacity
• Unit of transfer
• Access method
• Performance
• Physical type
• Physical characteristics
• Organisation
Location
• CPU
• Internal
—Directly accessible by CPU
• External
—Accessible by CPU via an I/O module
Capacity
• Word size
—The natural unit of organisation
• Number of words
—or Bytes
Unit of Transfer
• Internal
—Usually governed by data bus width
• External
—Usually a block which is much larger than a
word
• Addressable unit
—Smallest location which can be uniquely
addressed
—Word internally
—Cluster on disks
Access Methods (1)
• Sequential
—Start at the beginning and read through in
order
—Access time depends on location of data and
previous location
—e.g. tape
• Direct
—Individual blocks have unique address
—Access is by jumping to vicinity plus sequential
search
—Access time depends on location and previous
location
—e.g. disk
Access Methods (2)
• Random
—Individual addresses identify locations exactly
—Access time is independent of location or
previous access
—e.g. RAM
• Associative
—Data is located by a comparison with contents
of a portion of the store
—Access time is independent of location or
previous access
—e.g. cache
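The difference between random and associative access can be sketched in a few lines. This is only an illustration: the memory contents, the tags, and the function names `read_random` and `read_associative` are made up for the example, and real associative hardware compares all entries in parallel rather than in a loop.

```python
# Random access: the address itself selects the location.
ram = [10, 20, 30, 40]  # hypothetical 4-word memory

def read_random(address):
    return ram[address]

# Associative access: the search key is compared with a stored portion
# (the tag) of every entry. The loop only models the result of the
# comparison; hardware would perform all comparisons at once.
store = [(0x1A, "alpha"), (0x2B, "beta"), (0x3C, "gamma")]  # (tag, data) pairs

def read_associative(key):
    for tag, data in store:
        if tag == key:
            return data
    return None  # no entry matched the key
```

In both cases the access time does not depend on which location is requested or on the previous access.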
Performance
• Access time
—Time between presenting the address and
getting the valid data
• Memory Cycle time
—Time may be required for the memory to
“recover” before next access
—Cycle time is access + recovery
• Transfer Rate
—Rate at which data can be moved
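As a rough sketch of how these quantities relate, the functions below compute cycle time from access and recovery time, and take the transfer rate of a random-access memory as one word per cycle. The function names and the 60 ns / 40 ns figures are assumptions chosen for illustration, not values from the slides.

```python
def cycle_time_ns(access_ns, recovery_ns):
    # Cycle time is access time plus the recovery time before the next access.
    return access_ns + recovery_ns

def transfer_rate_words_per_s(cycle_ns):
    # For random-access memory, one word can be moved per memory cycle.
    return 1e9 / cycle_ns

cycle = cycle_time_ns(60, 40)            # hypothetical 60 ns access, 40 ns recovery
rate = transfer_rate_words_per_s(cycle)  # words per second at that cycle time
```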
Physical Types
• Semiconductor
—RAM
• Magnetic
—Disk & Tape
• Optical
—CD & DVD
Physical Characteristics
• Volatile/non-volatile
—Main memory
—Flash memory
—Hard Disk
• Erasable/non-erasable
—RAM
—ROM
—CD-ROM
Organisation
• Physical arrangement of bits into words
• Not always obvious
• e.g. interleaved
The Bottom Line
• How much?
—Capacity
• How fast?
—Time is money
• How expensive?
So you want fast?
• It is possible to build a computer which
uses only static RAM (see later)
• This would be very fast
• This would need no cache
—How can you cache cache?
• This would cost a very large amount
Memory Hierarchy
• Registers
—In CPU
• Internal or Main memory
—May include one or more levels of cache
—“RAM”
• External memory
—Backing store
Memory Hierarchy - Diagram
Hierarchy List
• Registers
• L1 Cache
• L2 Cache
• Main memory
• Disk cache
• Disk
• Optical
• Tape
Locality of Reference
• Basis for the performance advantage of
memory hierarchy
• During the course of the execution of a
program, memory references tend to
cluster
• e.g. loops, arrays
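A small experiment can make the clustering visible. The sketch below counts how many distinct memory blocks a sequence of references touches; the block size of 4 words and the two address patterns are assumptions picked for the example.

```python
def blocks_touched(addresses, block_size):
    # Each address belongs to block number address // block_size.
    return len({a // block_size for a in addresses})

# A loop walking a 64-word array in order (strong locality).
sequential = list(range(64))
# The same number of references, spread 16 words apart (poor locality).
scattered = [i * 16 for i in range(64)]

seq_blocks = blocks_touched(sequential, block_size=4)
scat_blocks = blocks_touched(scattered, block_size=4)
```

With these inputs the sequential sweep touches 16 blocks while the scattered pattern touches 64, so a small fast memory holding a few recent blocks would serve most of the sequential references.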
Cache: concept
• Small amount of fast memory
• Sits between normal main memory and
CPU
• May be located on CPU chip or module
Cache/Main Memory Structure
Cache operation – overview
• CPU requests contents of memory location
• Check cache for this data
• If present, get from cache (fast)
• If not present, read required block from
main memory to cache
• Then deliver from cache to CPU
• Cache includes tags to identify which
block of main memory is in each cache
slot
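The read sequence above can be sketched as a toy simulation. Everything here is an assumption made for illustration: the class name `SimpleCache`, the 4-word block size, and the use of a dictionary keyed by block number as the tag store. The sketch has no capacity limit, so it never needs a replacement algorithm.

```python
BLOCK_SIZE = 4  # words per block (assumed for the example)

class SimpleCache:
    def __init__(self, memory):
        self.memory = memory  # list of words standing in for main memory
        self.slots = {}       # tag (block number) -> copy of that block
        self.hits = 0
        self.misses = 0

    def read(self, address):
        tag, offset = divmod(address, BLOCK_SIZE)
        if tag in self.slots:
            self.hits += 1    # present: deliver from cache
        else:
            self.misses += 1  # not present: read the whole block into cache
            start = tag * BLOCK_SIZE
            self.slots[tag] = self.memory[start:start + BLOCK_SIZE]
        return self.slots[tag][offset]  # always delivered from cache

memory = list(range(100, 116))  # 16 hypothetical words
cache = SimpleCache(memory)
values = [cache.read(a) for a in (0, 1, 2, 3, 4, 0)]
```

The first access to each block misses and loads four words; the later accesses to the same block hit.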
Cache Read Operation - Flowchart
Typical Cache Organization
Cache Design
• Design parameters
—Cache Size
—Mapping Function
—Replacement Algorithm
—Write Policy
—Block Size
—Number of Caches
Size does matter
• Cost
—More cache is expensive
• Speed
—More cache is faster (up to a point)
—Checking cache for data takes time
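One common way to reason about this trade-off is an average access time estimate: every access pays the cache check, and misses pay an extra penalty. The formula below is a standard simplification rather than something stated on the slide, and the timings are hypothetical.

```python
def average_access_ns(hit_ratio, cache_ns, miss_penalty_ns):
    # Every access checks the cache; a miss adds the main-memory penalty.
    return cache_ns + (1.0 - hit_ratio) * miss_penalty_ns

# Hypothetical figures: 1 ns cache check, 100 ns extra on a miss.
small_cache = average_access_ns(0.80, 1.0, 100.0)  # lower hit ratio
large_cache = average_access_ns(0.95, 1.0, 100.0)  # higher hit ratio
# A larger cache that is slower to check gives back part of the gain.
large_slow = average_access_ns(0.95, 3.0, 100.0)
```

Raising the hit ratio lowers the average, but if the bigger cache also takes longer to check, the benefit shrinks, which is the "up to a point" in the slide.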
Comparison of Cache Sizes
Processor        Type                           Year of Introduction  L1 cache (a)    L2 cache         L3 cache
IBM 360/85       Mainframe                      1968                  16 to 32 KB     —                —
PDP-11/70        Minicomputer                   1975                  1 KB            —                —
VAX 11/780       Minicomputer                   1978                  16 KB           —                —
IBM 3033         Mainframe                      1978                  64 KB           —                —
IBM 3090         Mainframe                      1985                  128 to 256 KB   —                —
Intel 80486      PC                             1989                  8 KB            —                —
Pentium          PC                             1993                  8 KB/8 KB       256 to 512 KB    —
PowerPC 601      PC                             1993                  32 KB           —                —
PowerPC 620      PC                             1996                  32 KB/32 KB     —                —
PowerPC G4       PC/server                      1999                  32 KB/32 KB     256 KB to 1 MB   2 MB
IBM S/390 G4     Mainframe                      1997                  32 KB           256 KB           2 MB
IBM S/390 G6     Mainframe                      1999                  256 KB          8 MB             —
Pentium 4        PC/server                      2000                  8 KB/8 KB       256 KB           —
IBM SP           High-end server/supercomputer  2000                  64 KB/32 KB     8 MB             —
CRAY MTA (b)     Supercomputer                  2000                  8 KB            2 MB             —
Itanium          PC/server                      2001                  16 KB/16 KB     96 KB            4 MB
SGI Origin 2001  High-end server                2001                  32 KB/32 KB     4 MB             —
Itanium 2        PC/server                      2002                  32 KB           256 KB           6 MB
IBM POWER5       High-end server                2003                  64 KB           1.9 MB           36 MB
CRAY XD-1        Supercomputer                  2004                  64 KB/64 KB     1 MB             —
Mapping Function
• Map memory block to cache line
—Where to store a memory block in cache
—How to determine whether a memory block is
in cache
• Three techniques
—Direct
—Associative
—Set associative
Direct Mapping
• Each block of main memory maps to only
one cache line
—i.e. if a block is in cache, it must be in one
specific place
• Address is in two parts
• Least Significant w bits identify unique
word
• Most Significant s bits specify one memory
block
• The MSBs are split into a cache line field r
and a tag of s-r (most significant)
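The field split can be sketched with shifts and masks. The function name `split_address` and the example widths (a 24-bit address with w = 2 and r = 14, leaving an 8-bit tag) are assumptions for illustration; the slide itself only defines the symbols w, r and s.

```python
def split_address(address, w, r):
    # Least significant w bits: word within the block.
    word = address & ((1 << w) - 1)
    # Next r bits: cache line field.
    line = (address >> w) & ((1 << r) - 1)
    # Remaining most significant s - r bits: tag.
    tag = address >> (w + r)
    return tag, line, word

# Hypothetical 24-bit address with w = 2 and r = 14.
tag, line, word = split_address(0x16339C, w=2, r=14)
```

For this address the tag is 0x16, the line is 0x0CE7 and the word is 0. Two addresses with the same line field but different tags compete for the same cache line.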
Direct Mapping
Cache Line Table
Cache line    Main Memory blocks held
0             0, m, 2m, 3m, …
1             1, m+1, 2m+1, …
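The pattern in the table is block number modulo m, where m is the number of cache lines. A minimal sketch, with hypothetical helper names and m = 4 chosen only to keep the output short:

```python
def cache_line(block, m):
    # A memory block can only live in line (block mod m).
    return block % m

def blocks_for_line(line, m, count):
    # First `count` memory blocks that map to the given cache line.
    return [line + k * m for k in range(count)]

m = 4  # assumed number of cache lines
line0 = blocks_for_line(0, m, 4)
line1 = blocks_for_line(1, m, 4)
```

With m = 4, line 0 holds blocks 0, 4, 8, 12 and line 1 holds blocks 1, 5, 9, 13, matching the rows 0, m, 2m, 3m and 1, m+1, 2m+1.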
• Problem: Increased processor speed results in external bus becoming a bottleneck for cache access.
— Solution: Move external cache on-chip, operating at the same speed as the processor.
— Processor: 486
• Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place.
— Solution: Create separate data and instruction caches.
— Processor: Pentium
• Problem: Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small.
— Solution: Add external L3 cache.
— Processor: Pentium III
— Solution: Move L3 cache on-chip.
— Processor: Pentium 4
Pentium 4 Block Diagram
Pentium 4 Core Processor
• Fetch/Decode Unit
— Fetches instructions from L2 cache
— Decodes them into micro-ops
— Stores micro-ops in L1 cache
• Out of order execution logic
— Schedules micro-ops
— Based on data dependence and resources
— May speculatively execute
• Execution units
— Execute micro-ops
— Data from L1 cache
— Results in registers
• Memory subsystem
— L2 cache and systems bus
Pentium 4 Design Reasoning
• Decodes instructions into RISC-like micro-ops before L1
cache
• Micro-ops fixed length
— Superscalar pipelining and scheduling
• Pentium instructions long & complex
• Performance improved by separating decoding from
scheduling & pipelining
— (More later – ch14)
• Data cache is write back
— Can be configured to write through
• L1 cache controlled by 2 bits in register
— CD = cache disable
— NW = not write through
— 2 instructions: one invalidates (flushes) the cache, the other
writes back and then invalidates
• L2 and L3 8-way set-associative
— Line size 128 bytes
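The associativity and line size fix how a cache is divided into sets. The sketch below works this out for an 8-way cache with 128-byte lines; the 256 KB capacity is the Pentium 4 L2 figure from the cache size comparison, and the function name `num_sets` is hypothetical.

```python
def num_sets(cache_bytes, line_bytes, ways):
    # Total lines in the cache, then grouped into sets of `ways` lines each.
    lines = cache_bytes // line_bytes
    return lines // ways

# 256 KB, 128-byte lines, 8-way set-associative.
sets = num_sets(256 * 1024, 128, 8)
```

This gives 2048 lines arranged as 256 sets, so 8 address bits select the set and 7 bits select the byte within a line.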
PowerPC Cache Organization
• 601 – single 32 KB cache, 8-way set associative
• 603 – 16 KB (2 x 8 KB), two-way set
associative
• 604 – 32 KB
• 620 – 64 KB
• G3 & G4
—64 KB L1 cache
– 8-way set associative
—256 KB, 512 KB or 1 MB L2 cache
– two-way set associative
• G5
—32 KB instruction cache
—64 KB data cache
PowerPC G5 Block Diagram
Internet Sources
• Manufacturer sites
—Intel
—IBM/Motorola
• Search on cache