Indexing

Uploaded by Thanuja Reddy
Copyright © All Rights Reserved

PHYSICAL ORGANIZATION

Indexing
Outline

1. Indexes

2. Types of single-level indexes

3. Summary of single-level indexes

4. Multiple level indexes


1. Indexes (1)

•A single-level index is an auxiliary file that makes it more efficient to
search for a record in the data file.
•The index is usually specified on one field of the file called the indexing
field (although it could be specified on several fields).
•One form of an index is a file of index entries
<field value, pointer to block>, which is ordered
by field value.
•The index is called an access path on the field.
•The index file usually occupies considerably fewer disk blocks than the
data file because its entries are much smaller and, depending on the type
of index, there may be fewer entries than there are records in the data file.
•A binary search on the index yields a pointer to the data file record or to
the block of the data file that holds the record.
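As a minimal sketch of that binary search, assume the index file is held in memory as a Python list of `(field value, pointer)` tuples sorted by field value, with each pointer modelled as an integer block number. These representations and the name `index_lookup` are hypothetical, chosen only for illustration:

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# One index entry: <field value, pointer to block>. The pointer is modelled
# as an integer block number, which is an illustrative assumption.
IndexEntry = Tuple[int, int]


def index_lookup(entries: List[IndexEntry], key: int) -> Optional[int]:
    """Binary-search an index file ordered by field value.

    Returns the block pointer stored with `key`, or None if no entry matches.
    """
    # (key,) sorts before every (key, pointer) tuple, so bisect_left lands on
    # the first entry whose field value is >= key.
    pos = bisect_left(entries, (key,))
    if pos < len(entries) and entries[pos][0] == key:
        return entries[pos][1]
    return None


index = [(101, 7), (205, 3), (310, 12), (422, 0)]
print(index_lookup(index, 310))  # 12
print(index_lookup(index, 300))  # None
```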
2. Types of single-level indexes (1)

•There are several types of single-level ordered indexes.


•A primary index is specified on the ordering key field of an ordered
file of data records.
•A clustering index is specified on the ordering non-key field of an
ordered file of data records.
•A secondary index can be specified on any
non-ordering field of a file of data records.
•A data file can have at most one primary index or one clustering index,
but not both, because it can be physically ordered on only one field.
•A data file can have multiple secondary indexes in addition to its primary
access path.
2. Types of single-level indexes (2)

•Indexes can be characterized as dense or sparse.


• A dense index has an index entry for every record in the data file.
• A non-dense (or sparse) index has index entries for only some
of the records in the data file.
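To make the distinction concrete, here is a hedged sketch that builds both kinds of index over the same toy file. It assumes the file is modelled as a list of blocks of integer keys, sorted on the key so that a sparse index can work; `dense_index` and `sparse_index` are hypothetical helper names:

```python
from typing import List, Tuple

# Toy data file: a list of blocks, each block a list of record keys.
# The file is assumed to be sorted on the key, so a sparse index can work.
data_file: List[List[int]] = [[3, 5, 8], [11, 14, 20], [23, 27, 31]]


def dense_index(blocks: List[List[int]]) -> List[Tuple[int, int]]:
    """One <key, block number> entry for every record in the data file."""
    return sorted((key, b) for b, block in enumerate(blocks) for key in block)


def sparse_index(blocks: List[List[int]]) -> List[Tuple[int, int]]:
    """One <key, block number> entry per block, so only some records appear."""
    return [(block[0], b) for b, block in enumerate(blocks) if block]


print(len(dense_index(data_file)))  # 9
print(sparse_index(data_file))      # [(3, 0), (11, 1), (23, 2)]
```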
2.1. Primary indexes (1)

•Defined on an ordered data file


•The data file is ordered on a key field
•Includes one index entry for each block in the data file – hence it is a
sparse index.
•The index entry has the key field value for the first record in the data
block, which is called the block anchor.
•A similar scheme can use the last record in each block as the anchor.
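A minimal sketch of a primary index built from block anchors follows. It assumes the data file is a list of non-empty blocks of integer keys sorted on the key, and the class name `PrimaryIndex` is hypothetical. A lookup finds the last anchor not greater than the search key, then reads that one data block:

```python
from bisect import bisect_right
from typing import List, Optional


class PrimaryIndex:
    """Sparse primary index: one entry per block, keyed by the block anchor."""

    def __init__(self, data_file: List[List[int]]) -> None:
        # Entry i is <anchor of block i, pointer to block i>; the pointer is
        # simply the block number i here. Every block is assumed non-empty.
        self.data_file = data_file
        self.anchors = [block[0] for block in data_file]

    def lookup(self, key: int) -> Optional[int]:
        """Return the block number holding `key`, or None if it is absent."""
        # The only block that can hold `key` is the one with the largest
        # anchor <= key.
        pos = bisect_right(self.anchors, key) - 1
        if pos < 0:
            return None  # key is smaller than the first anchor
        # One extra access: read that data block and check for the record.
        return pos if key in self.data_file[pos] else None


idx = PrimaryIndex([[3, 5, 8], [11, 14, 20], [23, 27, 31]])
print(idx.lookup(14))  # 1
print(idx.lookup(15))  # None
```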
2.1. Primary indexes (2)
2.1. Primary indexes (3)

Example: Consider the following data file: EMPLOYEE(NAME, SSN, ADDRESS, JOB, ... )
Assume that SSN is the ordering key field.
Suppose that:
number of records in the data file r = 30,000 records
record size R = 100 bytes
block size B = 1024 bytes
Then, we get:
blocking factor $\mathit{bfr} = \lfloor B/R \rfloor = \lfloor 1024/100 \rfloor = 10$ records/block
number of data file blocks $b = \lceil r/\mathit{bfr} \rceil = \lceil 30000/10 \rceil = 3000$ blocks

The binary search cost of the data file would be:
$\lceil \log_2 b \rceil = \lceil \log_2 3000 \rceil = 12$ block accesses
2.1. Primary indexes (4)

For an index on the SSN field, assume the field size V = 9 bytes and the
block pointer size P = 6 bytes.

Then:
index entry size $R_i = V + P = 9 + 6 = 15$ bytes
index blocking factor $\mathit{bfr}_i = \lfloor B/R_i \rfloor = \lfloor 1024/15 \rfloor = 68$ entries/block
number of index entries $r_i = 3000$ (since this is a primary index, we have one
index entry per data block)
number of index blocks $b_i = \lceil r_i/\mathit{bfr}_i \rceil = \lceil 3000/68 \rceil = 45$ blocks
binary search of the index file needs $\lceil \log_2 b_i \rceil = \lceil \log_2 45 \rceil = 6$ block accesses
(+1 additional access to the data file to get the block)
Compare these 7 block accesses with the 12 required for a binary search of the data
file.
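The arithmetic above can be checked with a short script. This is a sketch: the function name `primary_index_cost` is hypothetical, and `ceil_log2` uses Python's `int.bit_length` to compute the ceiling of log2(n) exactly instead of going through floating point:

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling of a / b."""
    return -(-a // b)


def ceil_log2(n: int) -> int:
    """Exact ceiling of log2(n) for n >= 1, avoiding floating point."""
    return (n - 1).bit_length()


def primary_index_cost(r: int, R: int, B: int, V: int, P: int):
    bfr = B // R                      # records per data block (floor)
    b = ceil_div(r, bfr)              # number of data blocks
    data_search = ceil_log2(b)        # binary search directly on the data file
    bfri = B // (V + P)               # index entries per block (floor)
    bi = ceil_div(b, bfri)            # primary index: one entry per data block
    index_search = ceil_log2(bi) + 1  # index blocks + 1 data block
    return b, bi, data_search, index_search


print(primary_index_cost(r=30_000, R=100, B=1024, V=9, P=6))  # (3000, 45, 12, 7)
```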
2.1. Primary indexes (5)

•A major problem with a primary index is the insertion and deletion of
records, since inserting or deleting a record in an ordered file may move
records between blocks and change the block anchors.

The techniques used for ordered files can be used here too.

•For insertion of records we can use an unordered overflow file or a
linked list of overflow records for each block in the data file.

•For deletion of records we can use deletion markers.

Both the data file and the index file are periodically
reorganized.
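One possible shape for the per-block overflow list and deletion markers is sketched below. It is a simplified, in-memory illustration: the `Block` class and its methods are hypothetical, keys are assumed unique (a key field), and a Python list stands in for the linked list of overflow records:

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Block:
    records: List[int]                                  # sorted main records
    overflow: List[int] = field(default_factory=list)   # unordered overflow chain
    deleted: Set[int] = field(default_factory=set)      # deletion markers

    def insert(self, key: int) -> None:
        if key in self.deleted:
            # Re-inserting a marked record: clearing the marker is enough.
            self.deleted.discard(key)
        else:
            # Append to the overflow chain so sorted records are not shifted.
            self.overflow.append(key)

    def delete(self, key: int) -> None:
        # Mark the record rather than removing it physically.
        if self.contains(key):
            self.deleted.add(key)

    def contains(self, key: int) -> bool:
        if key in self.deleted:
            return False
        return key in self.records or key in self.overflow

    def reorganize(self) -> None:
        # Periodic reorganization: merge the overflow chain, drop marked
        # records and re-sort. The anchor may change, so the index entry
        # for this block must be rebuilt afterwards.
        live = [k for k in self.records + self.overflow if k not in self.deleted]
        self.records, self.overflow, self.deleted = sorted(live), [], set()


blk = Block([10, 12, 15])
blk.insert(13)
blk.delete(12)
blk.reorganize()
print(blk.records)  # [10, 13, 15]
```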
2.2. Clustering indexes (1)

•Defined on an ordered data file

The data file is ordered on a non-key field

Includes one index entry for each distinct value of the field; the index
entry points to the first data block that contains records with that field
value.

It is a sparse index since there is one index entry per distinct index field
value rather than per record in the data file.
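A hedged sketch of building and using a clustering index follows. It assumes a toy file modelled as a list of blocks of field values sorted on the clustering field; `build_clustering_index` and `blocks_for_value` are hypothetical names. Because the file is ordered, all records with a given value sit in consecutive blocks, so the index only needs to point to the first one:

```python
from bisect import bisect_left
from typing import List, Tuple

# Toy data file ordered on a non-key field (for example a department
# number); each block lists the field values of its records.
data_file: List[List[int]] = [[1, 1, 2], [2, 2, 3], [3, 5, 5], [5, 8, 8]]


def build_clustering_index(blocks: List[List[int]]) -> List[Tuple[int, int]]:
    """One <distinct value, first block containing it> entry per value."""
    index: List[Tuple[int, int]] = []
    for b, block in enumerate(blocks):
        for value in block:
            # The file is ordered, so a value differing from the last indexed
            # one is being seen here for the first time.
            if not index or index[-1][0] != value:
                index.append((value, b))
    return index


def blocks_for_value(blocks: List[List[int]], index: List[Tuple[int, int]],
                     value: int) -> List[int]:
    """Follow the index entry, then read forward while blocks hold `value`."""
    pos = bisect_left(index, (value,))
    if pos == len(index) or index[pos][0] != value:
        return []
    b, found = index[pos][1], []
    # Records with equal values are contiguous in an ordered file.
    while b < len(blocks) and value in blocks[b]:
        found.append(b)
        b += 1
    return found


idx = build_clustering_index(data_file)
print(idx)                                  # [(1, 0), (2, 0), (3, 1), (5, 2), (8, 3)]
print(blocks_for_value(data_file, idx, 2))  # [0, 1]
```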
2.2. Clustering indexes (2)
2.2. Clustering indexes (3)

•To facilitate insertion and deletion of records, it is common to reserve in
the data file a whole block (or a cluster of contiguous blocks) for each
value of the index field.

All data records with the same index field value are placed in the same
block (or cluster of blocks).


2.2. Clustering indexes (4)
Clustering Index with a Separate Block Cluster for Each Group of Records That Share the Same Value for the Clustering Field
2.3. Secondary indexes (1)

•Defined on a data file that is not ordered on the indexing field.

Can be defined on a key field or a non-key field


2.3. Secondary indexes on a key field

•A secondary index on a key field includes one entry for each record in the
data file; hence, it is a dense index.

A pointer in an index entry points to a record or to the block in which the
record is stored.

•A secondary index usually needs more storage space and longer search
time than a primary index (why?)

•The improvement in search time for an arbitrary record is much greater
for a secondary index than for a primary index (why?)
2.3. Secondary indexes on a key field

A Dense Secondary Index (with Block Pointers) on a Nonordering Key Field of a File
2.3. Secondary indexes on a key field

Example: Consider the previous example: EMPLOYEE(NAME, SSN, ADDRESS, JOB, ... )
We construct a secondary index on a non-ordering key field of size V = 9
bytes.
As previously:
number of records in the data file r = 30,000 records
record size R = 100 bytes
block size B = 1024 bytes
We have computed:
blocking factor $\mathit{bfr} = \lfloor B/R \rfloor = \lfloor 1024/100 \rfloor = 10$ records/block
number of data file blocks $b = \lceil r/\mathit{bfr} \rceil = \lceil 30000/10 \rceil = 3000$ blocks
Then:
The average linear search cost of the data file is $b/2 = 3000/2 = 1500$
block accesses (assuming that the record exists in the file).
2.3. Secondary indexes on a key field

For the secondary index on a key field, assume the block pointer size
P = 6 bytes.

Then:
index entry size $R_i = V + P = 9 + 6 = 15$ bytes
index blocking factor $\mathit{bfr}_i = \lfloor B/R_i \rfloor = \lfloor 1024/15 \rfloor = 68$ entries/block
number of index blocks $b_i = \lceil r/\mathit{bfr}_i \rceil = \lceil 30000/68 \rceil = 442$ blocks (we have one index entry per record in
the data file)
binary search of the index file needs $\lceil \log_2 b_i \rceil = \lceil \log_2 442 \rceil = 9$ block accesses
(+1 additional access to the data file to get the data block)
Compare these 10 block accesses with the 1500 required on average for a linear
search of the data file.
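Under the same assumptions, the numbers can be reproduced with a few lines of Python. The helper names are hypothetical; the only change from the primary-index case is that the number of index entries is r (one per record) rather than b:

```python
def ceil_div(a: int, b: int) -> int:
    return -(-a // b)                 # integer ceiling of a / b


def ceil_log2(n: int) -> int:
    return (n - 1).bit_length()       # exact ceiling of log2(n), n >= 1


def secondary_key_index_cost(r: int, R: int, B: int, V: int, P: int):
    bfr = B // R                      # records per data block
    b = ceil_div(r, bfr)              # data blocks
    linear_avg = b / 2                # average linear scan, record present
    bfri = B // (V + P)               # index entries per block
    bi = ceil_div(r, bfri)            # dense index: one entry per record
    return linear_avg, bi, ceil_log2(bi) + 1  # +1 to read the data block


print(secondary_key_index_cost(r=30_000, R=100, B=1024, V=9, P=6))  # (1500.0, 442, 10)
```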
2.3. Secondary indexes on a non-key field

• If the indexing field is not a key field of the data file, multiple data records can
have the same value for the indexing field.

• There are different implementation options:

Option 1: Include in the index one entry for each data record – dense index.

Option 2: Use a variable length record for the index entries with a repeating
field for the pointer.

Option 3: (most common) have a single entry for each index field value and
have the pointer point to a block of pointers (or to a cluster or linked list of
blocks if necessary).

• A secondary index provides a logical ordering of the data records by the
indexing field.
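Option 3 can be sketched as follows. It assumes records are identified by integer pointers and the index is kept in memory; the names `build_option3_index` and `lookup` are hypothetical, and a Python list plays the role of the separate block of record pointers:

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

# Toy unordered data file: record pointer -> value of a non-key field
# (JOB, say). Record pointers are plain integers for illustration.
records: Dict[int, str] = {0: "clerk", 1: "engineer", 2: "clerk",
                           3: "manager", 4: "engineer", 5: "clerk"}


def build_option3_index(recs: Dict[int, str]) -> List[Tuple[str, List[int]]]:
    """One entry per distinct value, pointing to a 'block' of record pointers."""
    pointer_blocks: Dict[str, List[int]] = {}
    for rid, value in recs.items():
        pointer_blocks.setdefault(value, []).append(rid)
    # Like any ordered index file, the entries are sorted by field value.
    return sorted(pointer_blocks.items())


def lookup(index: List[Tuple[str, List[int]]], value: str) -> List[int]:
    """Binary-search the entries and return the pointer block for `value`."""
    pos = bisect_left(index, (value,))
    if pos < len(index) and index[pos][0] == value:
        return index[pos][1]
    return []


idx = build_option3_index(records)
print(lookup(idx, "engineer"))  # [1, 4]
# Walking the index in order gives a logical ordering of records by JOB,
# even though the data file itself is unordered.
print([rid for _, ptrs in idx for rid in ptrs])  # [0, 2, 5, 1, 4, 3]
```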

2.3. Secondary indexes on a non-key field
3. Summary of single-level indexes (1)

Types of indexes based on the properties of the indexing field:

Indexing field | Ordering field | Non-ordering field
Key field | Primary index | Secondary index (key)
Non-key field | Clustering index | Secondary index (non-key)
3. Summary of single-level indexes (2)

Properties of index types.

Type of index | Number of first-level index entries | Dense/sparse | Block anchoring | Comment
Primary | Number of blocks in data file | Sparse | Yes |
Clustering | Number of distinct index field values | Sparse | Yes/No | Yes if no two distinct values of the ordering field are in the same data block
Secondary (key) | Number of records in data file | Dense | No |
Secondary (non-key), option 1 | Number of records in data file | Dense | No | One index entry for each data record
Secondary (non-key), option 3 | Number of distinct index field values | Sparse | No | One index entry for each distinct value of the index field, and separate blocks of data record pointers
4. Multiple level indexes (1)

•Because a single-level index is an ordered file, we can create a primary
index to the index itself; in this case, the original index file is called the
first-level index and the index to the index is called the second-level
index.

•The blocking factor of the second-level (and higher-level) indexes is called
the fan-out of the multilevel index.

•We can repeat the process, creating a third, fourth, ..., top level until all
entries of the top level fit in one disk block.

•A multi-level index can be created for any type of first-level index
(primary, secondary, clustering) as long as the first-level index consists of
more than one disk block.
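A minimal sketch of the construction follows. It assumes the first-level index is represented just by its sorted list of keys (pointers omitted) and that every index block holds exactly `fo` entries; `build_multilevel` and `multilevel_search` are hypothetical names. Each higher level keeps one anchor per block of the level below, and a search reads one block per level:

```python
from bisect import bisect_right
from typing import List, Optional, Tuple


def build_multilevel(first_level: List[int], fo: int) -> List[List[int]]:
    """Return the keys of every index level, first level first.

    Each new level is a primary index on the level below: one entry (the
    block anchor) for each block of `fo` entries.
    """
    levels = [first_level]
    while len(levels[-1]) > fo:        # top level does not fit in one block yet
        levels.append(levels[-1][::fo])  # anchors of the blocks below
    return levels


def multilevel_search(levels: List[List[int]], fo: int,
                      key: int) -> Tuple[Optional[int], int]:
    """Descend from the top level, reading one index block per level.

    Returns (position of the matching first-level entry or None, blocks read).
    """
    block_no, reads = 0, 0  # the top level is a single block
    for level in reversed(levels):
        block = level[block_no * fo:(block_no + 1) * fo]
        reads += 1
        i = bisect_right(block, key) - 1  # last entry <= key in this block
        if i < 0:
            return None, reads
        # Entry number at this level == block number at the level below;
        # after the first level it is the position of the entry itself.
        block_no = block_no * fo + i
    found = levels[0][block_no] == key
    return (block_no if found else None), reads


keys = list(range(30_000))  # dense first level: one key per record
levels = build_multilevel(keys, fo=68)
print([len(level) for level in levels])       # [30000, 442, 7]
print(multilevel_search(levels, 68, 12_345))  # (12345, 3)
```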
4. Multiple level indexes (2)
4. Multiple level indexes (3)
Example: Consider the previous example: EMPLOYEE(NAME, SSN, ADDRESS, JOB, ... )
We convert the dense secondary index into a
multilevel index.
We have computed:
number of first-level index blocks $b_1 = \lceil r/\mathit{bfr}_1 \rceil = \lceil 30000/68 \rceil = 442$ blocks.
Then:
The fan-out $\mathit{fo}$ of the multilevel index equals $\mathit{bfr}_1 = 68$.
number of second-level index blocks $b_2 = \lceil b_1/\mathit{fo} \rceil = \lceil 442/68 \rceil = 7$ blocks
number of third-level index blocks $b_3 = \lceil b_2/\mathit{fo} \rceil = \lceil 7/68 \rceil = 1$ block (top level of
the index)

To access a data record through the multilevel index we need 3 + 1 = 4 block
accesses (one per index level plus one for the data block).

Compare these 4 block accesses with the 10 needed when a single-level index and binary
search are used.
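The level-by-level block counts follow a simple loop, sketched here with hypothetical helper names. It keeps dividing by the fan-out, rounding up, until a level fits in a single block:

```python
from typing import List


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)  # integer ceiling of a / b


def multilevel_blocks(num_entries: int, fo: int) -> List[int]:
    """Blocks per level, first level first, until one block is left on top."""
    levels = [ceil_div(num_entries, fo)]
    while levels[-1] > 1:
        levels.append(ceil_div(levels[-1], fo))
    return levels


fo = 1024 // (9 + 6)                  # fan-out = bfr_1 = 68
levels = multilevel_blocks(30_000, fo)
print(levels)                         # [442, 7, 1]
print(len(levels) + 1)                # 4: one block per level + the data block
```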
4. Multiple level indexes (4)

•A multi-level index is a form of search tree; however, insertion and
deletion of index entries is a severe problem because every level of
the index is an ordered file.

•Because of the insertion and deletion problem, most practical multi-level indexes
are dynamic multilevel indexes, which leave space in each tree node
(disk block) to allow for new index entries.

•Dynamic multilevel indexes are often implemented using data structures
called B-trees and B+-trees.
