Disks, Memories & Buffer Management: "The Two Offices of Memory Are Collection and Distribution." - Samuel Johnson
Buffer Management
CS3223 - Storage 1
What does a DBMS Store?
• Relations – Actual data
• Indexes – Data structures to speed up access to relations
• System catalog (a.k.a. data dictionary) stores metadata
about relations
– Relation schemas – structure of relations, constraints,
triggers
– View definitions
– Statistical information about relations for use by query
optimizer
– Index metadata
• Log files – information maintained for data recovery
Where are the data stored?
• Memory Hierarchy
– Primary memory: registers, static RAM (caches), dynamic RAM
(physical memory)
• Currently used data
– Secondary memory: magnetic disks (HDD), solid state disks (SSD)
• Main database
• SSD can also be used as an intermediary between disk and RAM
– Tertiary memory: optical disks, tapes, jukebox
• Archiving older versions of the data
• Infrequently accessed data
• Tradeoffs:
– Capacity
– Cost
– Access speed
– Volatile vs non-volatile
Memory Hierarchy
Data Access
• DBMS stores information on non-volatile (“hard”)
disks
• DBMS processes data in main memory (RAM)
• This has major implications for DBMS design!
– READ: transfer data from disk to main memory (RAM)
– WRITE: transfer data from RAM to disk
– Both are high-cost operations, relative to in-memory
operations, so must be planned carefully!
Disks
• Secondary storage device of choice
• Offers random access to data
• Data is stored and retrieved in units called disk
pages or blocks (a number of consecutive pages)
– Typical page size is 4KB – 1MB
– Typical block size is 1MB – 64MB
• Unlike RAM, time to retrieve a disk page varies
depending upon its “relative” location on disk at
the time of access
– Therefore, relative placement of pages on disk has major impact on
DBMS performance!
Components of a Disk
The platters spin (say, 120rps)
Components of Disk Access Time
Accessing a Disk Page
• Time to access (read/write) a disk block:
– seek time (moving arms to position disk head on track)
– rotational delay (waiting for block to rotate under head)
– transfer time (actually moving data to/from disk surface)
• Seek time and rotational delay dominate
– Seek time varies from about 0.3 to 10msec
– Rotational delay varies from 0 to 4msec
– Transfer time is about 0.05msec per 8KB page
• Key to lower I/O cost: reduce seek/rotation
delays!
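To see why seek and rotation dominate, the three components can be added up for a single page. The following is a minimal sketch that uses only the ranges quoted above; the function name and the 100-page comparison are illustrative assumptions, and the contiguous case ignores track and cylinder switches.

```python
# Figures from the bullet points above, all in milliseconds.
SEEK_MS = (0.3, 10.0)         # seek time range (min, max)
ROTATION_MS = (0.0, 4.0)      # rotational delay range (min, max)
TRANSFER_MS_PER_PAGE = 0.05   # transfer time for one 8KB page


def access_time_ms(seek_ms, rotation_ms, pages=1):
    """Time to read `pages` consecutive pages after one seek and one rotational wait."""
    return seek_ms + rotation_ms + pages * TRANSFER_MS_PER_PAGE


best = access_time_ms(SEEK_MS[0], ROTATION_MS[0])
worst = access_time_ms(SEEK_MS[1], ROTATION_MS[1])

# 100 pages fetched one by one from scattered locations (worst case each time)
# versus 100 pages read as one contiguous run after a single worst-case seek.
scattered = 100 * worst
contiguous = access_time_ms(SEEK_MS[1], ROTATION_MS[1], pages=100)

print(f"best single-page access:  {best:.2f} ms")
print(f"worst single-page access: {worst:.2f} ms")
print(f"100 scattered pages:      {scattered:.2f} ms")
print(f"100 contiguous pages:     {contiguous:.2f} ms")
```

Under these assumptions the transfer itself is a small fraction of a worst-case access, which is why placing related pages next to each other pays off.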
Improving Access Time of Secondary Storage
An Example
• How long does it take to read a 2,048,000-byte file
that is divided into 8,000 256-byte records
assuming the following disk characteristics?
average seek time 18 ms
track-to-track seek time 5 ms
average rotational delay 8.3 ms
maximum transfer rate 16.7 ms/track
bytes/sector 512
sectors/track 40
tracks/cylinder 11
tracks/surface 1,331
• 1 track contains 40*512 = 20,480 bytes, the file
needs 100 tracks (~10 cylinders)
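The arithmetic for this example can be sketched as follows. The cost model is an assumption, not necessarily the one used in the lecture: the sequential case charges one average seek, a track-to-track seek per cylinder switch, and a rotational delay plus a full-track transfer per track; the random case charges an average seek, a rotational delay and a one-sector transfer per record.

```python
import math

# Disk and file characteristics from the example above.
FILE_BYTES = 2_048_000
RECORDS = 8_000
AVG_SEEK_MS = 18.0
TRACK_TO_TRACK_SEEK_MS = 5.0
AVG_ROTATIONAL_DELAY_MS = 8.3
TRANSFER_MS_PER_TRACK = 16.7
BYTES_PER_SECTOR = 512
SECTORS_PER_TRACK = 40
TRACKS_PER_CYLINDER = 11

bytes_per_track = BYTES_PER_SECTOR * SECTORS_PER_TRACK
tracks_needed = math.ceil(FILE_BYTES / bytes_per_track)
cylinders_needed = math.ceil(tracks_needed / TRACKS_PER_CYLINDER)


def sequential_read_ms():
    """Assumed model: file laid out on consecutive tracks and cylinders."""
    first_seek = AVG_SEEK_MS
    cylinder_switches = (cylinders_needed - 1) * TRACK_TO_TRACK_SEEK_MS
    per_track = AVG_ROTATIONAL_DELAY_MS + TRANSFER_MS_PER_TRACK
    return first_seek + cylinder_switches + tracks_needed * per_track


def random_read_ms():
    """Assumed model: every record is fetched with its own seek and rotation."""
    transfer_one_sector = TRANSFER_MS_PER_TRACK / SECTORS_PER_TRACK
    per_record = AVG_SEEK_MS + AVG_ROTATIONAL_DELAY_MS + transfer_one_sector
    return RECORDS * per_record


print(f"tracks needed:    {tracks_needed}")
print(f"cylinders needed: {cylinders_needed}")
print(f"sequential read:  {sequential_read_ms() / 1000:.2f} s")
print(f"random read:      {random_read_ms() / 1000:.2f} s")
```

With this model the sequential read takes a few seconds while the random read takes a few minutes, which illustrates how much the placement of data on disk matters.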
Design Issues
Design Issues
* The actual value is smaller, as the last cylinder does not need all 11 tracks to be read
Why Not Store Everything in Main Memory?
• Costs too much? Not any more
– <$1 will buy you 1 GB of RAM
• Data is also increasing at an alarming rate
– “Big-Data” phenomenon
• Main memory is volatile
– We want data to be saved between runs
• Memory errors
– Larger memory means higher chances of data corruption
• Energy issues
– In a typical query execution in an in-memory database, 59% of the overall energy
is spent in main memory
– Furthermore, there are inherent physical limitations related to leakage current
and voltage scaling that prevent DRAM from further scaling
• Multiple applications
– A DBMS serves more than one application and manages more than one
database. These all compete for memory.
Disk Space Management
• Many files will be stored on a single disk
• Need to allocate space to these files so that
– disk space is effectively utilized
– files can be quickly accessed
• Several issues
– How is the free space in a disk managed?
• system maintains a free space list, implemented as a bitmap or a
linked list
– How is the free space allocated to files?
• granularity of allocation (blocks, extents)
• allocation methods (contiguous, linked)
– How is the allocated space managed?
Managing Free Space: Bitmap
• Each block (one or more pages) is represented by one bit
• A bitmap is kept for all blocks in the disk
– if a block is free, its corresponding bit is 0
– if a block is allocated, its corresponding bit is 1
• To allocate space, scan the map for 0s
• Consider a disk whose blocks 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17, etc. are
free. The bitmap would be
• 110000110000001...
(Figure: disk blocks numbered 0 to 15.)
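The bitmap scheme can be sketched in a few lines. This is an illustrative in-memory version; the class and method names are assumptions, and a real DBMS would pack the bits into bytes stored on disk rather than use a Python list.

```python
class FreeSpaceBitmap:
    """One bit per disk block: 0 = free, 1 = allocated."""

    def __init__(self, num_blocks):
        self.bits = [0] * num_blocks

    def allocate(self):
        """Scan the map for the first 0, mark it allocated, return its block number."""
        for block, bit in enumerate(self.bits):
            if bit == 0:
                self.bits[block] = 1
                return block
        raise RuntimeError("no free block left")

    def free(self, block):
        """Return a block to the free pool."""
        self.bits[block] = 0

    def __str__(self):
        return "".join(str(bit) for bit in self.bits)


# Reproduce the example: of blocks 0..17, only these are free.
free_blocks = {2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 17}
bitmap = FreeSpaceBitmap(18)
for block in range(18):
    if block not in free_blocks:
        bitmap.bits[block] = 1

layout = str(bitmap)
print(layout)

first = bitmap.allocate()
print("allocated block", first)
```

The printed layout begins with the same bit pattern as the example, and the first allocation returns the lowest-numbered free block.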
Managing Free Space: Linked Lists
• Link all the free disk blocks together
– each free block points to the next free block
• DBMS maintains a free space list head (FSLH) to the first
free block
• To allocate space
– look up FSLH
– follow the pointers
– reset the FSLH
(Figure: disk blocks numbered 0 to 15, with FSLH pointing to the first free block.)
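The three allocation steps can be sketched as below. On a real disk the pointer to the next free block is stored inside each free block; here a dictionary stands in for those on-disk pointers, and all names are illustrative assumptions.

```python
class FreeList:
    """Free blocks chained together; fslh is the free space list head."""

    def __init__(self, free_blocks):
        self.next_free = {}   # block -> next free block (stand-in for on-disk pointers)
        self.fslh = None
        # Build the chain back to front so the head is the first block given.
        for block in reversed(free_blocks):
            self.next_free[block] = self.fslh
            self.fslh = block

    def allocate(self):
        if self.fslh is None:
            raise RuntimeError("no free block left")
        block = self.fslh                       # look up FSLH
        self.fslh = self.next_free.pop(block)   # follow the pointer, reset the FSLH
        return block

    def free(self, block):
        """A freed block becomes the new head of the list."""
        self.next_free[block] = self.fslh
        self.fslh = block


free_list = FreeList([2, 3, 4, 5, 8])
a = free_list.allocate()
b = free_list.allocate()
print("allocated", a, "and", b, "- head is now", free_list.fslh)
```

Allocation and deallocation touch only the head, but consecutive allocations are not guaranteed to be physically adjacent once blocks have been freed in arbitrary order.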
Allocation of Free Space
• Granularity
– pages vs blocks (multiple consecutive pages) vs
extents (multiple consecutive blocks)
• smaller granularity leads to more fragmentation
• larger granularity leads to lower space utilization; good as file
grows in size
• Allocation methods
– contiguous: all pages/blocks/extents are close by
• may need to reclaim space frequently
– linked lists: simple but may be fragmented
Managing Space Allocated to Files: Heap (Unordered) File Implemented as a List
(Figure: a DIRECTORY of entries pointing to the data pages, up to Data Page N.)
• The entry for a page can include the number of free bytes on the
page.
• The directory is a collection of pages; linked list implementation
is just one alternative
– Much smaller than linked list of all HF pages!
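A directory whose entries record the free bytes on each page might look like the sketch below. The page size, the first-fit search, and all names are assumptions made for illustration only.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


class PageDirectory:
    """Directory for a heap file: entry i holds the free bytes on data page i."""

    def __init__(self):
        self.free_bytes = []

    def add_page(self):
        self.free_bytes.append(PAGE_SIZE)
        return len(self.free_bytes) - 1

    def find_page_for(self, record_size):
        """First fit: return the first page with enough room, or a new page."""
        for page_id, free in enumerate(self.free_bytes):
            if free >= record_size:
                return page_id
        return self.add_page()

    def insert(self, record_size):
        """Reserve space for a record and return the page it was placed on."""
        if record_size > PAGE_SIZE:
            raise ValueError("record does not fit on a page")
        page_id = self.find_page_for(record_size)
        self.free_bytes[page_id] -= record_size
        return page_id


directory = PageDirectory()
placements = [directory.insert(size) for size in (3000, 2000, 1000)]
print(placements, directory.free_bytes)
```

Because the directory is consulted instead of the data pages themselves, finding a page with enough room does not require reading every page of the heap file.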
Buffer Management in a DBMS
• Data must be in RAM for DBMS to operate on it!
• Buffer pool = main memory allocated for DBMS
• Buffer pool is partitioned into pages called frames
• Table of <frame#, pageid> pairs is maintained
• Each frame has two values: pin count and dirty flag
(Figure: page requests from higher levels arrive at the BUFFER POOL in MAIN MEMORY; disk pages are read from the DB on DISK into free frames; the choice of frame is dictated by the replacement policy.)
When a Page is Requested ...
• If requested page is not in the buffer pool:
– If no free frames available
• Choose a frame for replacement
– What are such frames?? How to choose?
• If frame is dirty, write it to disk
– Read requested page into chosen frame
• Pin the page (or increase pin count) and return its address
• What if
– a page is requested/shared by multiple transactions?
– no page can be replaced? (when will this happen?)
• Cost to access a page??
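The request logic can be sketched as follows. The replacement policy here is LRU over unpinned frames, which is an assumption since the text above does not name a policy; the class names, the `release` method, and the dictionary standing in for the disk are likewise illustrative.

```python
from collections import OrderedDict


class Frame:
    def __init__(self, page_id, data):
        self.page_id = page_id
        self.data = data
        self.pin_count = 0
        self.dirty = False


class BufferPool:
    def __init__(self, num_frames, disk):
        self.num_frames = num_frames
        self.disk = disk             # dict page_id -> data, a stand-in for the disk
        self.frames = OrderedDict()  # page_id -> Frame, least recently used first
        self.disk_reads = 0
        self.disk_writes = 0

    def request(self, page_id):
        frame = self.frames.get(page_id)
        if frame is None:                        # page is not in the buffer pool
            if len(self.frames) >= self.num_frames:
                self._evict()                    # no free frame: replace one
            frame = Frame(page_id, self.disk[page_id])
            self.disk_reads += 1
            self.frames[page_id] = frame
        self.frames.move_to_end(page_id)         # mark as most recently used
        frame.pin_count += 1                     # pin the page
        return frame

    def release(self, page_id, dirty=False):
        """Unpin a page; the caller says whether it modified the page."""
        frame = self.frames[page_id]
        if frame.pin_count == 0:
            raise ValueError("page is not pinned")
        frame.pin_count -= 1
        frame.dirty = frame.dirty or dirty

    def _evict(self):
        # Only frames with pin count 0 are candidates for replacement.
        victim = next((pid for pid, f in self.frames.items() if f.pin_count == 0), None)
        if victim is None:
            raise RuntimeError("all frames are pinned")
        frame = self.frames.pop(victim)
        if frame.dirty:                          # write back before reuse
            self.disk[victim] = frame.data
            self.disk_writes += 1


disk = {1: "a", 2: "b", 3: "c"}
pool = BufferPool(num_frames=2, disk=disk)
page1 = pool.request(1)
page1.data = "A"
pool.release(1, dirty=True)
pool.request(2)
pool.release(2)
pool.request(3)   # pool is full: page 1 is unpinned and least recently used
print(disk, pool.disk_reads, pool.disk_writes)
```

In this sketch a page shared by several transactions simply accumulates a higher pin count, and a request fails when every frame is pinned, which speaks to the two "what if" questions above. A request costs no I/O on a hit, one read on a miss, and an extra write when the evicted frame is dirty.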
Files of Records
• Page or block is OK when doing I/O, but higher levels
of DBMS operate on records, and files of records.
• FILE: A collection of pages, each containing a
collection of records. Must support:
– Create/insert/delete/modify record
– Read a particular record (specified using record id)
– Scan all records (possibly with some conditions on
the records to be retrieved)
How are records stored?
Record Formats
• Fixed Length: fields F1 F2 F3 F4 with lengths L1 L2 L3 L4
• Variable Length: a field count (4) followed by fields F1 F2 F3 F4, each terminated by a delimiter ($)
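Both record formats can be sketched with byte strings. The three-field schema, the one-byte field count, and the function names are hypothetical; the variable-length version assumes that no field contains the delimiter.

```python
import struct

# Fixed length: every field has a known size, so field offsets are implicit.
# Hypothetical schema: 4-byte int id, 10-byte name, 8-byte float salary.
FIXED_FORMAT = "<i10sd"


def encode_fixed(rec_id, name, salary):
    # struct pads the name with zero bytes up to 10 bytes.
    return struct.pack(FIXED_FORMAT, rec_id, name.encode(), salary)


def decode_fixed(raw):
    rec_id, name, salary = struct.unpack(FIXED_FORMAT, raw)
    return rec_id, name.rstrip(b"\0").decode(), salary


# Variable length: a field count, then each field terminated by "$".
DELIMITER = b"$"


def encode_variable(fields):
    body = b"".join(field.encode() + DELIMITER for field in fields)
    return bytes([len(fields)]) + body


def decode_variable(raw):
    count = raw[0]
    parts = raw[1:].split(DELIMITER)
    return [part.decode() for part in parts[:count]]


fixed = encode_fixed(7, "Ann", 3.5)
variable = encode_variable(["7", "Ann", "CS", "3.5"])
print(len(fixed), decode_fixed(fixed))
print(variable, decode_variable(variable))
```

Fixed-length records always occupy the same number of bytes regardless of content, while the delimited form grows and shrinks with the field values at the cost of scanning to locate a field.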
How are pages structured?
Page Formats: Fixed Length Records
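(Figure: UNPACKED, BITMAP page format. The page holds Slot 1, Slot 2, ..., Slot N, free space, ..., Slot M; the end of the page stores the number of slots M and a bitmap with one bit per slot, M ... 3 2 1, marking which slots are occupied.)
(Figure: SLOT DIRECTORY page format, Page i. Records Rid = (i,1) of 24 bytes, Rid = (i,2) of 16 bytes and Rid = (i,N) of 20 bytes; the slot directory at the end of the page holds the entries 20, 16, 24 for slots N ... 2 1, the number of slots N, and a pointer to the start of free space.)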
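A page with a slot directory can be sketched as below. This is a simplified in-memory model under stated assumptions: the directory is kept in a Python list instead of at the end of the page, its space is not charged against the page, and the class and method names are illustrative.

```python
class SlottedPage:
    """Page with a slot directory; a record id is (page id, slot number)."""

    def __init__(self, page_id, size=4096):
        self.page_id = page_id
        self.data = bytearray(size)
        self.slots = []      # slot directory: (offset, length), or None if deleted
        self.free_start = 0  # pointer to the start of free space

    def insert(self, record):
        if self.free_start + len(record) > len(self.data):
            raise ValueError("page full")
        offset = self.free_start
        self.data[offset:offset + len(record)] = record
        self.free_start += len(record)
        self.slots.append((offset, len(record)))
        return (self.page_id, len(self.slots))   # slot numbers start at 1

    def read(self, rid):
        _, slot_no = rid
        entry = self.slots[slot_no - 1]
        if entry is None:
            raise KeyError("record was deleted")
        offset, length = entry
        return bytes(self.data[offset:offset + length])

    def delete(self, rid):
        # Only the directory entry is cleared, so other rids stay valid.
        self.slots[rid[1] - 1] = None


page = SlottedPage(page_id=5)
rid1 = page.insert(b"x" * 24)
rid2 = page.insert(b"y" * 16)
rid3 = page.insert(b"z" * 20)
print(rid1, rid2, rid3, page.slots, page.free_start)
```

Because a record is always reached through its directory entry, it can be moved within the page, or a neighbor deleted, without changing its rid.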
Disk storage