CS6461 - Computer Architecture Fall 2016 Morris Lancaster - Memory Systems
Fall 2016
Morris Lancaster
Adapted from Professor Stephen Kaisler's Notes
Lecture 4 Memory Systems
[Figure: the memory hierarchy, with increasing distance from the processor in access time.]
Processor <-> L1$: 4-64 bytes (word)
L1$ <-> L2$: 16-128 bytes (block)
L2$ <-> Main Memory: 1 to 8 blocks
Main Memory <-> Secondary Memory (usually disk): 1,024 - 4M bytes (disk sector = page)
The hierarchy is inclusive: what is in L1$ is a subset of what is in L2$, which is a subset of what is in Main Memory, which is a subset of what is in Secondary Memory.
We show only two levels of cache here, but as we will see in later lectures, some processors have three levels of cache.
The Williams tube depends on an effect called secondary emission. When a dot is drawn on a cathode ray tube, the area of the dot becomes slightly positively charged and the area immediately around it becomes slightly negatively charged, creating a charge well. The charge well remains on the surface of the tube for a fraction of a second, allowing the device to act as a computer memory. The lifetime of the charge well depends on the electrical resistance of the inside of the tube.
Magnetic cores are little donuts with three wires passing through them: two select x and y, while the third sets the charge or not on the core.
Who invented magnetic cores?
DRAM
1970  RAM                              4.77 MHz
1987  Fast-Page Mode DRAM              20 MHz
1995  Extended Data Output (EDO) DRAM  20 MHz
1997  PC66 Synchronous DRAM            66 MHz
1998  PC100 Synchronous DRAM           100 MHz
1999  Rambus DRAM                      800 MHz
1999  PC133 Synchronous DRAM           133 MHz
2000  DDR Synchronous DRAM             266 MHz
2002  Enhanced DRAM                    450 MHz
2005  DDR2                             660 MHz
2009  DDR3                             800 MHz
And so on...
You can think of computer memory as being one big array of data.
The address serves as an array index.
Each address refers to one word of data.
You can read or modify the data at any given memory address, just
like you can read or modify the contents of an array at any given
index.
If you've worked with pointers in C or C++, then you've already
worked with memory addresses.
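As a minimal C sketch of this idea (the array and variable names are illustrative, not from the lecture):

    #include <stdio.h>
    #include <stdint.h>

    int main(void) {
        uint32_t memory[8] = {0};     /* a tiny "memory" of eight words       */
        uint32_t *addr = &memory[3];  /* a pointer holds a memory address     */

        *addr = 42;                   /* modify the word at that address      */
        printf("memory[3] = %u\n", memory[3]);  /* prints 42: same location   */
        return 0;
    }

Dereferencing the pointer and indexing the array touch the same word, because an address is just an index into memory.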
A 2^k x n memory has a k-bit address input ADRS, an n-bit data input DATA, an n-bit output OUT, a chip select CS, and a write enable WR:

CS  WR  Memory operation
0   x   None
1   0   Read selected word
1   1   Write selected word
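A minimal C sketch of this truth table (function and array names are assumptions for illustration):

    #include <stdint.h>

    #define K 4                        /* k address bits -> 2^k words        */
    static uint16_t mem[1 << K];       /* the 2^k x 16-bit storage array     */

    /* One access to the chip: cs = chip select, wr = write enable.
       CS=0: no operation; CS=1,WR=0: read; CS=1,WR=1: write.      */
    uint16_t mem_access(int cs, int wr, unsigned adrs, uint16_t data) {
        if (!cs) return 0;             /* chip not selected: do nothing      */
        if (wr)  mem[adrs] = data;     /* write the selected word            */
        return mem[adrs];              /* read back the selected word        */
    }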
[Figure: internal organization of a memory array. The N+M address bits are split: the Row Address Decoder uses N bits to select one of 2^N word lines (rows 1 to 2^N), and the Column Decoder & Sense Amplifiers use the remaining M bits to select among 2^M bit lines (columns 1 to 2^M). Each memory cell stores one bit; the selected bit drives the data line D.]
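A small C sketch of that address split (the bit widths N and M are illustrative):

    #include <stdint.h>

    #define N 10                       /* row address bits:    2^N word lines */
    #define M 10                       /* column address bits: 2^M bit lines  */

    /* Split a flat (N+M)-bit address into the row sent to the row
       decoder and the column sent to the column decoder.           */
    void split_address(uint32_t addr, uint32_t *row, uint32_t *col) {
        *row = (addr >> M) & ((1u << N) - 1);  /* upper N bits pick the row    */
        *col = addr & ((1u << M) - 1);         /* lower M bits pick the column */
    }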
Average memory access time = hit time + miss rate * miss penalty
To improve performance we need to reduce memory access time, i.e.,
reduce hit time, miss rate, and miss penalty.
As L1 caches are in the critical path of instruction execution, hit time
is the most important parameter.
When one parameter is improved, others might suffer.
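For example, with a 1-cycle hit time, a 5% miss rate, and a 50-cycle miss penalty (illustrative numbers), AMAT = 1 + 0.05 * 50 = 3.5 cycles. As a tiny C helper:

    /* Average memory access time = hit time + miss rate * miss penalty. */
    double amat(double hit_time, double miss_rate, double miss_penalty) {
        return hit_time + miss_rate * miss_penalty;
    }
    /* e.g. amat(1.0, 0.05, 50.0) == 3.5 cycles */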
Misses:
Compulsory miss: always occurs on the first reference to a block.
Capacity miss: reduced by increasing the cache size.
Conflict miss: reduced by increasing the level of associativity.
Types:
Instruction or Data Cache: 1-way or 2-way associative
Data Cache: write-through & write-back
[Figure: Processor, Cache, and DRAM, with a Write Buffer between the processor and DRAM.]
Q. Are Read After Write (RAW) hazards an issue for the write buffer?
A. Yes! Either drain the buffer before the next read, or send the read first after checking the write buffer.
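A minimal sketch of the second option, checking the write buffer before issuing a read (the structure and names are assumptions, not from the lecture; it also assumes at most one pending write per address):

    #include <stdint.h>
    #include <stdbool.h>

    #define WB_ENTRIES 4
    struct wb_entry { uint32_t addr, data; bool valid; };
    static struct wb_entry wbuf[WB_ENTRIES];

    /* On a read, first check the write buffer: if the address has a
       pending write, forward its data instead of reading stale DRAM. */
    bool wbuf_forward(uint32_t addr, uint32_t *data) {
        for (int i = 0; i < WB_ENTRIES; i++)
            if (wbuf[i].valid && wbuf[i].addr == addr) {
                *data = wbuf[i].data;  /* RAW hazard avoided by forwarding */
                return true;
            }
        return false;                  /* safe to send the read to DRAM    */
    }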
[Figure: direct-mapped cache. The 3-bit index (values 000 through 111) selects one of 8 cache lines; 32 memory blocks are cacheable.]
Advantages:
simple and cheap;
the tag field is short: only the bits that are not used to index the
cache have to be stored (compare with the following approaches);
access is very fast.
Disadvantage:
a given block fits into a fixed cache location, so a given cache line
will be replaced whenever there is a reference to another memory
block that maps to the same line, regardless of the status of the
other cache lines.
This can produce a low hit ratio, even if only a very small part of
the cache is effectively used. (A lookup sketch follows below.)
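A sketch of the direct-mapped lookup (block and index sizes are illustrative):

    #include <stdint.h>
    #include <stdbool.h>

    #define BLOCK_BITS 4               /* 16-byte blocks */
    #define INDEX_BITS 3               /* 8 cache lines  */

    struct line { uint32_t tag; bool valid; };
    static struct line cache[1 << INDEX_BITS];

    /* Each block maps to exactly one line: the index is the middle bits
       of the address, and only the remaining upper bits are stored as
       the tag.                                                          */
    bool lookup(uint32_t addr) {
        uint32_t index = (addr >> BLOCK_BITS) & ((1u << INDEX_BITS) - 1);
        uint32_t tag   = addr >> (BLOCK_BITS + INDEX_BITS);
        return cache[index].valid && cache[index].tag == tag;   /* hit? */
    }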
Assume a hit takes 1 clock cycle and the miss penalty is 50 cycles.
Assume a load or store takes 1 extra clock cycle on a unified cache,
since there is only one cache port.
Assume 75% of memory accesses are instruction references, with a
0.64% miss rate for the instruction cache and a 6.47% miss rate for
the data cache.
Overall miss rate of the split cache: (75% x 0.64%) + (25% x 6.47%) = 2.10%
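A quick check of that arithmetic in C (per-cache miss rates taken from the example above):

    /* Overall miss rate of the split cache, weighted by the access mix. */
    double split_miss_rate(void) {
        double i_rate = 0.0064, d_rate = 0.0647;  /* per-cache miss rates   */
        return 0.75 * i_rate + 0.25 * d_rate;     /* = 0.020975, i.e. 2.10% */
    }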
Cache addressing alternatives, by where the tag and index come from:

Index \ Tag    Physical   Virtual
Physical       PP         PV
Virtual        VP         VV
Cache Optimizations - IV