Indexing
Outline
1. Indexes
For an index on the SSN field, assume the field size V = 9 bytes and
the block pointer size P = 6 bytes.
Then:
index entry size Ri = (V + P) = (9 + 6) = 15 bytes
index blocking factor bfri = ⌊B / Ri⌋ = ⌊1024 / 15⌋ = 68 entries/block
number of index entries ri = 3000 (since this is a primary index, we have one
index entry per data block)
number of index blocks bi = ⌈ri / bfri⌉ = ⌈3000 / 68⌉ = 45 blocks
a binary search of the index file needs ⌈log2(bi)⌉ = ⌈log2(45)⌉ = 6 block accesses
(+1 additional access to the data file to get the block)
Compare these 6 + 1 = 7 block accesses with the 12 block accesses required for
a binary search of the data file.
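The arithmetic of this example can be checked with a short script. This is a minimal sketch in Python; the function name primary_index_cost and its parameter names are illustrative and not part of the slides, while the values (B = 1024 bytes, V = 9, P = 6, 3000 data blocks) are the ones used in the example.

```python
import math

def primary_index_cost(block_size, key_size, pointer_size, data_blocks):
    """Return (entries_per_block, index_blocks, block_accesses) for a primary index."""
    entry_size = key_size + pointer_size            # Ri = V + P
    entries_per_block = block_size // entry_size    # bfri = floor(B / Ri)
    index_entries = data_blocks                     # ri: one entry per data block
    index_blocks = math.ceil(index_entries / entries_per_block)  # bi
    # Binary search over the index blocks (at least one block must be read),
    # plus one access to fetch the data block itself.
    search = max(1, math.ceil(math.log2(index_blocks)))
    return entries_per_block, index_blocks, search + 1

bfri, bi, accesses = primary_index_cost(1024, 9, 6, 3000)
data_file_accesses = math.ceil(math.log2(3000))     # binary search on the data file
print(bfri, bi, accesses, data_file_accesses)       # 68 45 7 12
```

Changing data_blocks or block_size shows how the gap between searching the index and searching the data file grows with file size.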
2.1. Primary indexes (5)
The techniques used for ordered files can be used here too.
Both the data file and the index file are periodically
reorganized.
2.2. Clustering indexes (1)
Includes one index entry for each distinct value of the field; the index
entry points to the first data block that contains records with that field
value.
It is a sparse index since there is one index entry per distinct index field
value rather than per record in the data file.
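The definition above can be illustrated with a small in-memory sketch. It assumes the data file is modelled as a list of blocks, each block a list of (field_value, record) pairs already ordered on the clustering field; the function name build_clustering_index and the sample data are hypothetical.

```python
def build_clustering_index(blocks):
    """Return one (value, block_no) entry per distinct field value.

    blocks must be ordered on the clustering field, so a value is new
    exactly when it differs from the previous record's value.
    """
    index = []
    previous = object()  # sentinel that equals no real field value
    for block_no, block in enumerate(blocks):
        for value, _record in block:
            if value != previous:
                index.append((value, block_no))  # first block holding this value
                previous = value
    return index

# Hypothetical data file: 3 blocks, 9 records, 5 distinct values.
blocks = [
    [(1, "a"), (1, "b"), (2, "c")],
    [(2, "d"), (3, "e"), (3, "f")],
    [(3, "g"), (4, "h"), (5, "i")],
]
clustering_index = build_clustering_index(blocks)
print(clustering_index)  # [(1, 0), (2, 0), (3, 1), (4, 2), (5, 2)]
```

The index has 5 entries for 9 records, which is why it counts as sparse: its size follows the number of distinct values, not the number of records.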
2.2. Clustering indexes (2)
2.2. Clustering indexes (3)
All data records with the same index field value are placed in the same
block (or cluster of blocks).
2.2. Clustering indexes (4)
Clustering Index with a Separate Block Cluster for Each Group
of Records That Share the Same Value for the Clustering Field
2.3. Secondary indexes (1)
• A secondary index on a key field includes one entry for each record in the
data file; hence, it is a dense index.
• A secondary index usually needs more storage space and longer search
time than a primary index (why?)
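A dense secondary index on a key field can be sketched as follows. The data file is assumed to be an unordered list of blocks holding (key, payload) pairs; the names build_secondary_index and lookup, and the sample records, are illustrative only.

```python
from bisect import bisect_left

def build_secondary_index(blocks):
    """Return a sorted list with one (key, block_no) entry per record."""
    entries = [(key, block_no)
               for block_no, block in enumerate(blocks)
               for key, _payload in block]
    entries.sort()  # the index is ordered even though the data file is not
    return entries

def lookup(index, key):
    """Binary-search the index; return the block number or None."""
    keys = [k for k, _ in index]  # rebuilt per call to keep the sketch short
    pos = bisect_left(keys, key)
    if pos < len(index) and index[pos][0] == key:
        return index[pos][1]
    return None

# Hypothetical unordered data file: 2 blocks, 4 records.
data_blocks = [[(40, "x"), (7, "y")], [(19, "z"), (3, "w")]]
secondary = build_secondary_index(data_blocks)
print(secondary)              # [(3, 1), (7, 0), (19, 1), (40, 0)]
print(lookup(secondary, 19))  # 1
print(lookup(secondary, 5))   # None
```

Because the data file is not ordered on this field, every record needs its own entry, so the index has as many entries as the file has records.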
For the secondary index on a key field, assume the field size V = 9 bytes and
the block pointer size P = 6 bytes.
Then:
index entry size Ri = (V + P) = (9 + 6) = 15 bytes
index blocking factor bfri = ⌊B / Ri⌋ = ⌊1024 / 15⌋ = 68 entries/block
• If the indexing field is not a key field of the data file, multiple data records can
have the same value for the indexing field.
Option 1: Include in the index one entry for each data record – dense index.
Option 2: Use a variable length record for the index entries with a repeating
field for the pointer.
Option 3: (most common) have a single entry for each index field value and
have the pointer point to a block of pointers (or to a cluster or linked list of
blocks if necessary).
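Option 3 adds one level of indirection between the index and the records. The sketch below models it in memory, under the assumption that each record is identified by an integer record pointer and that a "block of pointers" is a plain Python list; build_nonkey_index and the sample department values are hypothetical.

```python
from collections import defaultdict

def build_nonkey_index(records):
    """records: iterable of (record_pointer, field_value).

    Returns (index, pointer_blocks): index holds one (value, pointer_block_id)
    entry per distinct value; pointer_blocks[id] lists the record pointers.
    """
    groups = defaultdict(list)
    for pointer, value in records:
        groups[value].append(pointer)

    index, pointer_blocks = [], []
    for value in sorted(groups):              # keep the index ordered on the field
        index.append((value, len(pointer_blocks)))
        pointer_blocks.append(groups[value])  # the extra level of indirection
    return index, pointer_blocks

# Hypothetical records: (record pointer, department).
records = [(0, "Sales"), (1, "HR"), (2, "Sales"), (3, "IT"), (4, "HR")]
index, pointer_blocks = build_nonkey_index(records)
print(index)           # [('HR', 0), ('IT', 1), ('Sales', 2)]
print(pointer_blocks)  # [[1, 4], [3], [0, 2]]
```

The index entries stay fixed-length, at the price of one extra access to read the block of pointers before reaching the records.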
                 Ordering field      Non-ordering field
Key field        Primary index       Secondary index (key)
Non-key field    Clustering index    Secondary index (non-key)
3. Summary of single-level indexes (2)
Secondary (key)    Number of records in data file    Dense    No    No
• The blocking factor of the second (and higher level) indexes is called the
fan-out of the multilevel index.
• We can repeat the process, creating a third, fourth, ..., top level until all
entries of the top level fit in one disk block.
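The repeated construction can be sketched as a loop that stops once a level fits in one block. This is an illustrative Python sketch: the function name multilevel_index_levels is hypothetical, and the sample call assumes the fan-out equals the first-level blocking factor of 68 and reuses the 3000 first-level entries from the primary index example.

```python
import math

def multilevel_index_levels(first_level_entries, fan_out):
    """Return the number of index blocks at each level, bottom level first."""
    if first_level_entries < 1 or fan_out < 2:
        raise ValueError("need at least one entry and a fan-out of 2 or more")
    blocks_per_level = []
    entries = first_level_entries
    while True:
        blocks = math.ceil(entries / fan_out)
        blocks_per_level.append(blocks)
        if blocks == 1:        # the top level fits in one disk block
            break
        entries = blocks       # next level: one entry per block of this level
    return blocks_per_level

levels = multilevel_index_levels(3000, 68)
print(levels)           # [45, 1]
print(len(levels) + 1)  # 3 block accesses: one per index level plus the data block
```

A search reads one block per level and then the data block, so under these assumptions the multilevel index needs 3 block accesses.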