Indexing

Indexing is a technique that improves data retrieval efficiency in databases by creating auxiliary data structures, allowing the DBMS to quickly locate data without full table scans. There are two main types of indexing: primary indexing, which is based on the primary key and can be dense or sparse, and secondary indexing, which allows for different ordering of data. While indexing enhances query performance and data access, it also requires careful consideration due to its impact on storage, data modification performance, and maintenance overhead.

Indexing

Indexing is a technique for creating auxiliary data structures that enable the database management system
(DBMS) to locate and access desired data more efficiently. Think of an index as a lookup table that provides a
shortcut, directing the DBMS to the location of data that matches the search criteria. Without an index, the DBMS
might need to perform a full table scan, reading each record in the table, to find the ones that match the query, which
is slow and resource-intensive, especially for large tables.

An index consists of two main components:

Search Key: This contains a copy of the column(s) on which the index is created, with values stored in sorted
order to allow for efficient searching.
Data Reference/Pointer: This is a set of pointers associated with each search key value that point to the disk
location of the corresponding data records. This allows the DBMS to directly access the data blocks containing
the matching records.
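The two components can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `records` list stands in for a data file, list positions stand in for disk addresses, and the helper names (`build_index`, `index_lookup`, `full_scan`) are hypothetical.

```python
from bisect import bisect_left

# Hypothetical data file: records stored in arbitrary order.
records = [
    {"id": 42, "name": "Dana"},
    {"id": 7, "name": "Ali"},
    {"id": 19, "name": "Chen"},
    {"id": 3, "name": "Bea"},
]

def build_index(records, column):
    """Return (search_keys, pointers); search_keys is sorted and
    pointers[i] is the record position for search_keys[i]."""
    entries = sorted((rec[column], pos) for pos, rec in enumerate(records))
    search_keys = [key for key, _ in entries]
    pointers = [pos for _, pos in entries]
    return search_keys, pointers

def index_lookup(records, search_keys, pointers, key):
    """Binary-search the sorted keys, then follow the pointer to the record."""
    i = bisect_left(search_keys, key)
    if i < len(search_keys) and search_keys[i] == key:
        return records[pointers[i]]
    return None

def full_scan(records, column, key):
    """Baseline without an index: inspect every record in turn."""
    for rec in records:
        if rec[column] == key:
            return rec
    return None

search_keys, pointers = build_index(records, "id")
print(index_lookup(records, search_keys, pointers, 19))
```

Both lookups return the same record; the difference is that the indexed lookup inspects a logarithmic number of keys, while the scan may read every record.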

There are two main types of indexing methods based on their attributes: Primary indexing and Secondary
indexing.

Primary Indexing Or Single Level Indexing

A primary index is an ordered file of fixed-length records with two fields. The first field holds a primary key value, and the second field
points to the data block that contains the record with that key. Each index entry corresponds to exactly one location in the data file. Primary
indexing can be further divided into two types:
Dense Index: An index record is created for every search key value in the data file. This gives faster lookups
but needs more storage space for index records. Each record in a dense index contains the search
key value and a pointer to the actual record on the disk.
Sparse Index: An index record appears for only some of the search key values in the file. This reduces the space cost of
dense indexing by storing one data block address for a whole range of search key values. To retrieve a record,
the DBMS fetches the block address for the range that covers the key and then scans that block. Sparse indices need less space and incur less maintenance overhead for
insertions and deletions, but they can be slower for locating records.
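The space trade-off between the two types can be illustrated with a small sketch. The block size, the key values, and the function names are assumptions chosen for illustration; a real DBMS works with disk blocks rather than Python lists.

```python
# Hypothetical sorted data file: 12 records, 4 records per block.
BLOCK_SIZE = 4
sorted_keys = [10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120]
blocks = [sorted_keys[i:i + BLOCK_SIZE]
          for i in range(0, len(sorted_keys), BLOCK_SIZE)]

def build_dense_index(blocks):
    """One entry per search key value: key -> (block number, offset in block)."""
    return {key: (b, off)
            for b, block in enumerate(blocks)
            for off, key in enumerate(block)}

def build_sparse_index(blocks):
    """One entry per block: (first key in the block, block number)."""
    return [(block[0], b) for b, block in enumerate(blocks)]

dense = build_dense_index(blocks)
sparse = build_sparse_index(blocks)
print(len(dense), len(sparse))  # 12 dense entries versus 3 sparse entries
```

With these assumed sizes the sparse index holds one entry per block instead of one per record, which is where the space and maintenance savings come from.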

Secondary Indexing Or Multilevel Indexing

Secondary indexing is a method where the search key specifies an order different from the sequential order of the file.
The data is not physically stored in the order of the index. Instead, the index uses pointers or references (often in leaf nodes)
to indicate the actual data location. For example, a book's contents page provides an ordered reference to the
location of information, even though the information itself is not physically arranged in that order.
A secondary index must be dense: because the data is not physically ordered by the index's search key, every search key
value needs its own entry. A lookup takes longer than with a clustered index because it requires an extra step to follow the pointer to the actual
data location.
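A minimal sketch of a secondary index follows, assuming a small in-memory file ordered by `id` and indexed on a non-ordering column. The `students` data and the function names are hypothetical, and list positions stand in for record pointers.

```python
from collections import defaultdict

# Hypothetical data file, physically ordered by id (the primary key).
students = [
    {"id": 1, "city": "Pune"},
    {"id": 2, "city": "Delhi"},
    {"id": 3, "city": "Pune"},
    {"id": 4, "city": "Agra"},
]

def build_secondary_index(records, column):
    """Dense index: one entry per distinct value, with pointers to every match."""
    buckets = defaultdict(list)
    for pos, rec in enumerate(records):
        buckets[rec[column]].append(pos)
    return dict(sorted(buckets.items()))  # keep search keys in sorted order

def lookup(records, index, value):
    """Extra step compared with a clustered index: follow each pointer."""
    return [records[pos] for pos in index.get(value, [])]

city_index = build_secondary_index(students, "city")
print(lookup(students, city_index, "Pune"))
```

Matching records for one city are scattered through the file, which is why each one needs its own pointer.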

B-Tree Index

The B-tree index is a widely used, multilevel, tree-based indexing technique built on balanced multiway search trees (not binary trees: each node can hold many keys).
In the B+-tree variant, all leaf nodes hold the actual data pointers and are interconnected with a linked list, supporting both
random and sequential access. Key features, where n is the maximum number of pointers a node can hold, include:

Every path from the root to a leaf is of the same length.
Each node (except the root and leaf nodes) has between ⌈n/2⌉ and n children.
Each leaf node has between ⌈(n-1)/2⌉ and n-1 values.
For example, with n = 5, leaf nodes must have between 2 and 4 values, and non-leaf nodes (excluding the root) have between 3 and 5 children.
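The specific numbers in this list (2 to 4 values, 3 to 5 children) appear to follow from the common textbook B+-tree occupancy rules with n = 5, where n is assumed to be the maximum number of pointers per node. The short sketch below computes those limits for any n; the function name is hypothetical.

```python
from math import ceil

def bplus_tree_bounds(n):
    """Occupancy limits for a B+-tree whose nodes hold at most n pointers.

    Assumes the usual textbook convention: leaves hold between
    ceil((n - 1) / 2) and n - 1 values, and internal nodes (other than
    the root) have between ceil(n / 2) and n children.
    """
    return {
        "leaf_values": (ceil((n - 1) / 2), n - 1),
        "internal_children": (ceil(n / 2), n),
    }

print(bplus_tree_bounds(5))
```

Under that convention, n = 5 gives leaf nodes 2 to 4 values and internal nodes 3 to 5 children, matching the figures in the list.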

Advantages of Indexing

Improved Query Performance: Indexing significantly reduces the time to execute queries that use indexed
columns. The DBMS can quickly find matching rows without accessing every row in the database.
Faster Search and Retrieval: Users can access data more quickly and efficiently.
Reduced Tablespace: In an index-organized table, rows are stored in the index structure itself, so no separate ROWID needs
to be stored for each index entry, which can reduce tablespace requirements.
Efficient Data Access: By reducing the number of disk I/O operations required to retrieve data, indexing
improves data access efficiency. For frequently accessed data, the index can help keep the relevant data pages
in memory, minimizing the need to read from the disk.
Optimized Data Sorting: Because an index keeps its search keys in sorted order, the DBMS can often return
rows in index order instead of sorting the entire table.
Consistent Data Performance: Indexing helps maintain consistent database performance even as the data
volume increases.
Enforced Data Integrity: Indexing can be used to enforce data integrity by ensuring that only unique values are
inserted into indexed columns designated as unique.
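Two of these advantages, faster lookups and enforced uniqueness, can be observed with Python's built-in `sqlite3` module. This is a sketch using SQLite as a stand-in for any DBMS; the table, column, and index names are invented for illustration, and the exact wording of the query plan differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, email TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO students (email, city) VALUES (?, ?)",
    [(f"user{i}@example.com", f"city{i % 50}") for i in range(1000)],
)

def query_plan(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[-1] for row in rows)

sql = "SELECT * FROM students WHERE city = 'city7'"
plan_before = query_plan(sql)   # no index on city yet: a table scan
conn.execute("CREATE INDEX idx_students_city ON students (city)")
plan_after = query_plan(sql)    # the planner can now search via the index

# A unique index rejects duplicate values in the indexed column.
conn.execute("CREATE UNIQUE INDEX idx_students_email ON students (email)")
try:
    conn.execute("INSERT INTO students (email, city) VALUES ('user1@example.com', 'x')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(plan_before)
print(plan_after)
print(duplicate_rejected)
```

Comparing the two plans shows the switch from scanning the table to searching through `idx_students_city`.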

Disadvantages of Indexing

Requirement for a Primary Key: An index-organized table requires a primary key with unique values.
Restrictions on Indexed Data: A table can have only one clustering index, because its rows can be physically ordered in only one way; any further index on the same table must be secondary.
Limitations on Index-Organized Tables: Some DBMSs restrict how index-organized tables can be partitioned.
Performance Impact on Data Modification: Indexing can negatively impact the performance of INSERT,
DELETE, and UPDATE queries.
Increased Storage Space: Storing the index data structure requires additional storage space.
Increased Database Maintenance Overhead: Maintaining the index structure as data is added, deleted, or
modified adds to the database maintenance overhead.
Difficulty in Choosing an Index: Selecting the right indexes for a specific query or application can be
challenging and might require careful analysis of data and access patterns.
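The storage cost can be made concrete with SQLite, used here only as a convenient stand-in. The sketch compares the database's page count before and after creating an index; the table and index names are hypothetical, and the exact numbers depend on the SQLite build and page size.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
conn.executemany(
    "INSERT INTO orders (customer) VALUES (?)",
    [(f"customer-{i:06d}",) for i in range(20000)],
)

def page_count():
    """Total number of pages currently used by the database."""
    return conn.execute("PRAGMA page_count").fetchone()[0]

pages_without_index = page_count()
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
pages_with_index = page_count()

extra_pages = pages_with_index - pages_without_index
print(pages_without_index, pages_with_index, extra_pages)
```

The extra pages hold the index entries. Every later INSERT, UPDATE, or DELETE on `customer` must also modify those pages, which is the source of the modification overhead listed above.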

Choosing the Right Indices is crucial for database design and performance tuning. The specific indices to create
depend on:

The nature of the data
The types of queries frequently executed
The database's overall workload

Understanding the different types of indices, their advantages, and their limitations is essential for designing
and managing efficient database systems.

Primary Indexing

Primary indexing is a type of indexing in which the index is built on the Primary Key of a table and induces a
sequential file organisation. This means that the data in the table is physically stored in the same order as the
index's search key values. The primary key is a column or group of columns in a table that uniquely identifies every
row in that table. Because primary keys are unique and are stored in sorted order, searching operations are very
efficient. There can only be one primary or clustered index per table.

Note: lookups on the sorted index are mainly done with binary search.

Primary indexing is further divided into two types:

Dense index: An index entry is created for every search key value in the data file. The index entry contains the
search key value and a pointer to the first data record with that search key value. Because data records with the
same search key value are stored sequentially after the first record, lookups using a dense index are very fast.
However, dense indices take up more space and have greater maintenance overhead for insertions and
deletions. Figure 14.2 illustrates a dense index.
Sparse index: An index entry appears for only some of the search key values in the data file. This method
addresses the space and maintenance costs of dense indexing. Each entry points to a block of data records: the index entry
contains a search key value and a pointer to the first data record with that search key value in the block. To
locate a record, the DBMS finds the index entry with the largest search key value that is less than or equal to the
search key value being sought and starts at the record pointed to by that index entry. The DBMS then follows the
pointers in the data file, reading records sequentially until it finds the desired record. The number of block accesses
required is about log₂(n)+1, where n is the number of blocks occupied by the index file: log₂(n) accesses for a binary search
of the index, plus one to read the data block. Sparse indices require less
space and have less maintenance overhead for insertions and deletions than dense indices. However, sparse
indices are slower than dense indices for locating records. Figure 14.3 shows a sparse index.
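The sparse-index lookup procedure can be sketched as follows. The block contents and helper names are assumptions for illustration, and the access estimate rounds log₂(n) up to a whole number of block reads.

```python
from bisect import bisect_right
from math import ceil, log2

# Hypothetical sorted data file split into blocks, with one index entry per block.
BLOCKS = [[10, 20, 30, 40], [50, 60, 70, 80], [90, 100, 110, 120]]
SPARSE_INDEX = [(block[0], b) for b, block in enumerate(BLOCKS)]

def sparse_lookup(key):
    """Return (block number, offset) of key, or None if it is absent."""
    first_keys = [k for k, _ in SPARSE_INDEX]
    # Largest index entry whose search key is <= the key being sought.
    i = bisect_right(first_keys, key) - 1
    if i < 0:
        return None
    block_no = SPARSE_INDEX[i][1]
    # Read the block sequentially until the key is found or passed.
    for offset, value in enumerate(BLOCKS[block_no]):
        if value == key:
            return block_no, offset
        if value > key:
            break
    return None

def estimated_block_accesses(index_blocks):
    """Binary search over the index blocks, plus one data block read."""
    return ceil(log2(index_blocks)) + 1

print(sparse_lookup(70))
print(estimated_block_accesses(1024))
```

For an index that occupies 1024 blocks, the estimate is 10 accesses for the binary search plus 1 for the data block.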

Since a primary index determines the physical order of data records in the table, it is also called a clustering
index. A clustering index can also be created on non-primary-key columns that are not unique
for each record. In these cases, to identify records faster, two or more columns can be grouped together to create a
unique value, and an index is then created on that composite search key. This essentially groups records with similar
properties together.

Primary indices are suitable for queries that retrieve a range of values because the data is physically stored
in sorted order. For example, a primary index on the 'date of birth' column could be used to quickly retrieve all
students born between 1990 and 2000.
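The date-of-birth example can be sketched as a range scan over a sorted file. The student data and the `born_between` helper are hypothetical; the point is that two binary searches locate the ends of the range, and everything in between is read as one contiguous run.

```python
from bisect import bisect_left, bisect_right
from datetime import date

# Hypothetical student file, physically sorted by date of birth.
students = [
    (date(1988, 5, 1), "Ana"),
    (date(1990, 1, 15), "Ben"),
    (date(1995, 7, 30), "Cy"),
    (date(2000, 12, 31), "Di"),
    (date(2003, 3, 9), "Eli"),
]
birth_dates = [dob for dob, _ in students]

def born_between(start, end):
    """Return all students with start <= date of birth <= end."""
    lo = bisect_left(birth_dates, start)   # first record in the range
    hi = bisect_right(birth_dates, end)    # one past the last record
    return students[lo:hi]                 # contiguous slice, no full scan

matches = born_between(date(1990, 1, 1), date(2000, 12, 31))
print([name for _, name in matches])
```

With a secondary index the matching records would be scattered across the file, so the same query would need one pointer lookup per match.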

Note: If the records in a database table are stored in a B+-tree file organisation or another file organisation that
requires the relocation of records, secondary indices, which are non-clustering indices, usually do not store pointers
to the data records. Instead, they store the value of the attribute that is used as the search key in the B+-tree file
organisation. This means that two steps are required to access a record through a secondary index in these file
organisations: first, the secondary index is searched to find the B+-tree file organisation search key value; then, the
B+-tree file organisation is searched to find the record.
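The two-step access described in this note can be sketched as below. A plain dictionary keyed by primary key stands in for the B+-tree file organisation, which is a simplifying assumption, and all names and data are hypothetical.

```python
# Stand-in for a B+-tree file organisation: records keyed by primary key.
primary_file = {
    101: {"id": 101, "name": "Asha", "city": "Pune"},
    102: {"id": 102, "name": "Ravi", "city": "Delhi"},
    103: {"id": 103, "name": "Meera", "city": "Pune"},
}

def build_secondary_index(column):
    """Map each value of `column` to primary key values, not record pointers."""
    index = {}
    for key, rec in primary_file.items():
        index.setdefault(rec[column], []).append(key)
    return index

city_index = build_secondary_index("city")

def find_by_city(city):
    # Step 1: search the secondary index for the primary key values.
    primary_keys = city_index.get(city, [])
    # Step 2: search the primary organisation for each record.
    return [primary_file[k] for k in primary_keys]

print([rec["name"] for rec in find_by_city("Pune")])
```

Because the secondary index stores key values rather than physical addresses, it stays valid when the primary organisation relocates records, at the cost of the second search.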
