Lec 8 Indexing & Data Structures for Query Processing

The document provides an overview of indexing and data structures used in query processing within databases, highlighting the importance of indexing for faster data retrieval and improved query performance. It discusses various types of indexes, including B-Trees, Hash Indexes, and Composite Indexes, along with their advantages and use cases. Additionally, it covers the mechanisms of unique, primary, and clustered indexes, emphasizing their roles in ensuring data integrity and optimizing queries.

Uploaded by

mhariskhan513

Indexing & Data Structures for Query Processing
Lecture Agenda
⚫ Introduction to Query Processing & Indexing
⚫ B-Tree Indexing
⚫ Hash Indexing
⚫ Indexing Mechanisms in DBMS
⚫ Unique, Composite, and Covering Indexes
What is Indexing?
⚫ Indexing is a data structure technique used in
databases to speed up the retrieval of records.
⚫ Instead of scanning every page (or record), the
database uses the index to quickly locate the data.
Advantages of Indexing
⚫ Faster query execution:
Reduces the number of rows scanned, especially in large tables
⚫ Improved performance:
Optimizes SELECT queries and JOIN operations
⚫ Efficient searching:
Locates rows based on key columns much faster
Advantages (contd.)
⚫ Enforces uniqueness:
Unique indexes help enforce constraints like primary
keys or unique values
⚫ Better sorting and grouping:
Speeds up ORDER BY, GROUP BY and range-based
queries
⚫ Supports composite searches:
Composite indexes help with queries using multiple
columns in conditions
Example
⚫ Without index: linear scan (full table scan)
SELECT * FROM employee WHERE emp_id = 105;
The DBMS checks each row until it finds emp_id = 105.
⚫ With an index on emp_id: logarithmic time (B-tree) or near-constant time (hash)
The DBMS uses the index to go almost directly to the
matching record, which is much faster.
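The difference can be observed by asking a database for its query plan. The sketch below is only an illustration: it assumes SQLite (through Python's built-in sqlite3 module) rather than any particular DBMS from the lecture, and the index name idx_emp_id and the helper plan() are made up for the example. Other systems word their plans differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# emp_id is deliberately not a PRIMARY KEY here, so the table starts without an index on it.
conn.execute("CREATE TABLE employee (emp_id INT, name TEXT)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [(i, f"emp{i}") for i in range(1, 1001)],
)


def plan(sql):
    """Return SQLite's description of how it would execute the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return [row[-1] for row in rows]  # the last column holds the readable detail


query = "SELECT * FROM employee WHERE emp_id = 105"

plan_without_index = plan(query)  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_emp_id ON employee(emp_id)")  # hypothetical index name
plan_with_index = plan(query)     # expected to describe a search using the index

print(plan_without_index)
print(plan_with_index)
```

The first plan starts with SCAN (every row is visited), the second with SEARCH and names the index.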
Key vs. Index
Key
⚫ Ensures uniqueness and integrity of data
⚫ A constraint defined on a column
⚫ Enforces rules (e.g., no duplicate primary key)
⚫ Creating a key often creates an index
⚫ Examples: primary key, unique key, foreign key
Index
⚫ Improves query performance (faster data retrieval)
⚫ A data structure (e.g., B-tree, hash) used to find data quickly
⚫ Does not enforce rules, just finds data faster
⚫ Creating an index does not create a key
⚫ Examples: B-tree, hash, composite index
Example
Key:
CREATE TABLE employee (
    emp_id INT PRIMARY KEY, -- This is a key (enforces uniqueness)
    name VARCHAR(50)
);
Index:
CREATE INDEX emp_name_index ON employee(name);
This index improves query performance, but does not
enforce uniqueness.
Primary and unique keys are backed by indexes in most DBMSs, but not all indexes are keys.
Indexing Mechanism
⚫ Database creates a special data structure (like a tree or
table) based on one or more columns.
⚫ Each entry in the index points to the location of the
corresponding record in the database table.
Types of Indexes
Type | Meaning | Example Use
Primary Index | Automatically created on primary key | Student ID
Secondary Index | Created on non-primary attributes | Student Name
Unique Index | Ensures all values are unique | Email address
Composite Index | Also called a multi-column index | Dept, age
Clustered Index | Physically reorders the table data to match the index | One per table (e.g., Order IDs)
Non-clustered Index | Separate structure; doesn't change table order | Search by Customer Name
Primary Index
⚫ A Primary Index is an index that is automatically
created on a table’s Primary Key.
⚫ It organizes the data rows based on the primary key.
⚫ It makes searching very fast because the data is sorted
and indexed by that key.
⚫ One primary index per table only.
⚫ No duplicate values allowed (because Primary Key
itself must be unique).
Secondary Index
⚫ A Secondary Index is an index that is created on a
non-primary key column.
⚫ It helps to speed up searching on columns other than
the primary key.
⚫ A table can have many secondary indexes.
Primary Index = on Primary Key
Secondary Index = on Any other column
Clustered Index
⚫ A Clustered Index is an index where the actual data
rows are stored in the same order as the index.
⚫ Data IS the index — it’s arranged physically on disk
according to the indexed column.
⚫ Only one clustered index per table (because you can
physically sort the table in only one way).
Example of Clustered Index
Suppose you have a Student table:
RollNo Name Grade
101 Ali A
102 Sara B
103 John A
If you create a Clustered Index on RollNo:
⚫ The database sorts the records physically by RollNo.
⚫ Searching becomes very fast because records are
already in order.
Non-Clustered Index
⚫ A Non-Clustered Index is an index where the index
and the actual data are stored separately.
⚫ It does NOT change the physical order of the table’s
rows.
⚫ It just creates a new structure (like a lookup table) with
pointers to the actual data rows.
Think of it like:
⚫ A separate mini table that knows where each row is.
⚫ You find the data using the index without moving the
original data.
Example of Non-Clustered Index
Consider the Student table:
⚫ RollNo is the Primary Key → Clustered Index.
⚫ You can create a Non-Clustered Index on Name.
Now when searching for Name = 'Sara':
⚫ Database looks at the Non-Clustered Index for Name.
⚫ Then it uses a pointer to jump to the correct row.
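The two lookup paths can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how a real storage engine is built: the table is a list kept sorted by RollNo (standing in for the clustered index), and the non-clustered index is a separate sorted list of (name, row position) pairs. The names students, name_index, find_by_rollno and find_by_name are invented for the illustration.

```python
import bisect

# Clustered: the rows themselves are kept sorted by RollNo (the primary key).
students = [
    (101, "Ali", "A"),
    (102, "Sara", "B"),
    (103, "John", "A"),
]
roll_numbers = [row[0] for row in students]


def find_by_rollno(rollno):
    """Clustered lookup: binary search directly over the ordered rows."""
    i = bisect.bisect_left(roll_numbers, rollno)
    if i < len(students) and students[i][0] == rollno:
        return students[i]
    return None


# Non-clustered: a separate sorted structure of (name, row_position) entries.
# The table order is not changed by building it.
name_index = sorted((row[1], pos) for pos, row in enumerate(students))
names = [name for name, _ in name_index]


def find_by_name(name):
    """Non-clustered lookup: search the index, then follow the pointer."""
    i = bisect.bisect_left(names, name)
    if i < len(names) and names[i] == name:
        row_position = name_index[i][1]
        return students[row_position]
    return None


print(find_by_rollno(102))
print(find_by_name("Sara"))
```

Both calls return Sara's row; the second one needs the extra hop from the index entry to the row position.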
Unique Index
⚫ A unique index ensures that no two rows in a table
have the same value in the indexed column(s). It is used
to enforce uniqueness and also improves query
performance.
Key features:
⚫ Prevents duplicate values in a column or a group of
columns.
⚫ Typically created on columns that must be unique (like
e-mail, username, etc.).
⚫ A unique constraint creates a unique index.
Example
Scenario:
In a user registration system, each user must have a unique email.
CREATE TABLE users (
    user_id INT PRIMARY KEY,
    email VARCHAR(100),
    name VARCHAR(50)
);
Now we enforce uniqueness:
-- Create a unique index on email
CREATE UNIQUE INDEX indx_unique_email ON users(email);
Now, trying to insert a duplicate email:
-- This succeeds (first row with this email)
INSERT INTO users VALUES (1, 'john@example.com', 'John');
-- This fails (email already exists)
INSERT INTO users VALUES (2, 'john@example.com', 'Johnny');
Error: Duplicate entry for email
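The same scenario can be run end to end. The sketch below assumes SQLite through Python's sqlite3 module purely for convenience; the exact error text differs between DBMSs, so only the fact that the second insert is rejected is checked.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE users (user_id INT PRIMARY KEY, email VARCHAR(100), name VARCHAR(50))"
)
conn.execute("CREATE UNIQUE INDEX indx_unique_email ON users(email)")

# First row with this email: accepted.
conn.execute("INSERT INTO users VALUES (1, 'john@example.com', 'John')")

# Second row with the same email: rejected by the unique index.
try:
    conn.execute("INSERT INTO users VALUES (2, 'john@example.com', 'Johnny')")
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("Rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(row_count)
```

Only the first row ends up in the table.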
Composite Index
A Composite Index (also called a multi-column index)
is an index created on two or more columns of a table.
Purpose:
⚫ Speeds up queries that filter or sort using multiple
columns together.
⚫ Helps with performance when WHERE, ORDER BY,
or GROUP BY uses multiple fields.
Think of sorting books in a library by Author + Title.
If you only sort by Author → results may be messy.
Sorting by both Author and Title makes searches faster
and more organized.
Example with composite unique index
Example 1:
CREATE UNIQUE INDEX indx_unique_course ON enrollments(student_id, course_id);
This ensures that:
⚫ A student cannot register for the same course more than once, but
⚫ A student can register for different courses
⚫ And multiple students can register for the same course
Example with composite index
Example 2:
CREATE INDEX idx_dept_age ON employee(dept, age);
This index will help with queries like:
-- Uses dept AND age
SELECT * FROM employee WHERE dept = 'HR' AND age = 28;
Index is fully used: fast search
What if you only use the first column?
-- Uses only dept
SELECT * FROM employee WHERE dept = 'HR';
Index is partially used: the leading column alone still narrows the search (still helpful)
Example with composite index
What if you filter only on the second column?
-- Uses only age
SELECT * FROM employee WHERE age = 30;
Index is not used effectively, because age is the second column in
the composite index.
Rule of Thumb: Left-to-Right Prefix Rule
A composite index on (A, B, C) will help with:
WHERE A = ?
WHERE A = ? AND B = ?
WHERE A = ? AND B = ? AND C = ?
But not with WHERE B = ? alone (or WHERE C = ? alone).
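The prefix rule can be checked against a real query planner. This sketch assumes SQLite via Python's sqlite3 module and an employee table with two extra columns (emp_id, name) added for illustration; plan() is a made-up helper. Other DBMSs may behave differently, for example by using skip scans when statistics are available.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INT, name TEXT, dept TEXT, age INT)")
conn.execute("CREATE INDEX idx_dept_age ON employee(dept, age)")


def plan(sql):
    """Return SQLite's query plan for the statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " / ".join(row[-1] for row in rows)


# Filters on the full index key (dept, age).
both_columns = plan("SELECT * FROM employee WHERE dept = 'HR' AND age = 28")
# Filters on the leading column only: still a prefix of the index.
first_only = plan("SELECT * FROM employee WHERE dept = 'HR'")
# Filters on the second column only: not a prefix of the index.
second_only = plan("SELECT * FROM employee WHERE age = 30")

print(both_columns)
print(first_only)
print(second_only)
```

The first two plans are index searches on idx_dept_age, while the age-only query falls back to scanning the table.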


Composite Index ≠ Multiple Indexes
⚫ One Composite Index (good for joint filtering)
CREATE INDEX idx_dept_age ON employee(dept, age);
⚫ Two Single Indexes (good mainly for filtering on one column at a time; combining them for a two-column filter is usually less efficient than one composite index)
CREATE INDEX idx_dept ON employee(dept);
CREATE INDEX idx_age ON employee(age);
Optimizing Complex Queries
⚫ Query with two conditions
SELECT * FROM employee
WHERE dept = 'IT' AND age > 25
ORDER BY age;
Composite index on (dept, age) improves performance
for both WHERE and ORDER BY.
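One way to see the ORDER BY benefit is to look for a separate sort step in the query plan. The sketch assumes SQLite via Python's sqlite3 module, where a sort shows up as a "TEMP B-TREE" line; the table columns beyond dept and age, and the helper plan(), are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INT, name TEXT, dept TEXT, age INT)")

QUERY = "SELECT * FROM employee WHERE dept = 'IT' AND age > 25 ORDER BY age"


def plan(sql):
    """Return SQLite's query plan for the statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " / ".join(row[-1] for row in rows)


plan_before = plan(QUERY)  # no index: scan the table, then sort the result
conn.execute("CREATE INDEX idx_dept_age ON employee(dept, age)")
plan_after = plan(QUERY)   # index: rows for dept = 'IT' come out already ordered by age

print(plan_before)
print(plan_after)
```

Within one dept value the index entries are already ordered by age, so the planner can filter and return rows in order without an extra sort.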
Common Data Structures Used
Structure | Why Used?
B-Tree | Balanced, fast for searching and inserting
B+ Tree | Like B-Tree but better for range queries
Hash Index | Fast lookups for exact matches (e.g., WHERE id = 5)
Bitmap Index | Efficient for columns with few distinct values (like gender: M/F)
B-Tree Indexing in Databases
What is a B-tree?
⚫ A B-tree is a balanced tree (the "B" is commonly read as "balanced").
⚫ It is a self-balancing search tree.
⚫ All leaf nodes are at the same level, ensuring
logarithmic search time.
⚫ Widely used in databases for range queries, sorting,
and searching.
⚫ It is a sorted, balanced, branching structure that
organizes data for very fast lookups.
B-tree Structure
⚫ Each node can store multiple keys (not just one like
binary trees).
⚫ Each node has multiple child pointers.
⚫ B-tree remains balanced by splitting or merging nodes
during insertions/deletions.
Example: Simple B-tree
         [30]
        /    \
  [10,20]    [40,50]
Keys are sorted in each node.
To search for 40:
Compare with 30 → move to right child → find 40.
Fast search with few comparisons
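The search walk can be written out as a short Python sketch. It is a simplified, in-memory model (no disk pages, no insertion or rebalancing); the Node class and search() function are names made up for the illustration.

```python
import bisect
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list                                    # sorted keys stored in this node
    children: list = field(default_factory=list)  # empty for a leaf node


def search(node, key):
    """Return (found, path), where path lists the keys of each node visited."""
    path = []
    while True:
        path.append(node.keys)
        i = bisect.bisect_left(node.keys, key)  # position of key among this node's keys
        if i < len(node.keys) and node.keys[i] == key:
            return True, path
        if not node.children:                   # reached a leaf without finding the key
            return False, path
        node = node.children[i]                 # descend into the i-th subtree


# The tree from the example: root [30] with children [10,20] and [40,50].
root = Node([30], [Node([10, 20]), Node([40, 50])])

found, path = search(root, 40)
print(found, path)
```

For key 40 the walk visits the root and then the right child, two nodes in total.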


Types of B-tree Indexes in Databases
1. Single-Column B-tree Index
Index on one column only.
Example:
CREATE INDEX idx_emp_id ON employee(emp_id);

Good for:
⚫ Searching or filtering by emp_id
⚫ Sorting by emp_id
Types of B-tree Indexes in Databases
2. Composite B-tree Index
Index on multiple columns (multi-column B-tree).
Example:
CREATE INDEX idx_dept_age ON employee(dept, age);

Good for:
⚫ Queries filtering by both dept and age
⚫ Follows Left-to-Right Prefix Rule
Types of B-tree Indexes in Databases
3. Unique B-tree Index
A B-tree index that enforces uniqueness on a column or
a combination of columns.
Example:
CREATE UNIQUE INDEX idx_email ON users(email);

Good for:
⚫ Preventing duplicate values (e.g., duplicate emails).
Types of B-tree Indexes in Databases
4. Clustered B-tree Index
In some DBMSs (like SQL Server and MySQL's InnoDB engine):
A Clustered Index means the table rows are physically
ordered based on the index.
Example:
In MySQL (InnoDB), the PRIMARY KEY automatically becomes the
clustered index:
CREATE TABLE student (
    student_id INT PRIMARY KEY,
    name VARCHAR(50)
);
Good for:
⚫ Fast range queries
⚫ Faster retrieval when ordering is needed.
[Figure: B-Tree Index]
B+ Tree Indexing in Databases
What is a B+ Tree?
⚫ A B+ Tree is a variant of the B-tree.
⚫ It is widely used in database systems (MySQL, Oracle,
PostgreSQL) for indexing large amounts of data.
⚫ It keeps all the data only in the leaf nodes.
⚫ Internal nodes store only keys (used for navigation).
Structure of B+ Tree Index
⚫ Internal nodes: Only keys and pointers (no actual data).
⚫ Leaf nodes:
⚫ Store the keys together with the data records (or pointers to them).
⚫ Are linked together (like a linked list) for fast sequential
access.
Simple B+ Tree Example
            [20 | 40]
           /    |    \
   [5 10]  [25 30 35]  [45 50 60]
⚫ Internal node [20 | 40] just helps navigate.
⚫ Actual values 5,10,25,30,35,45,50,60 are stored only in
leaf nodes.
⚫ Leaves are linked for quick range scanning.
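A small Python model of this tree shows how the linked leaves are used. It is a sketch under simplifying assumptions (keys only, no records, no insertion logic); Leaf, Internal, find_leaf and range_scan are illustrative names, not part of any database API.

```python
import bisect
from dataclasses import dataclass
from typing import Optional


@dataclass
class Leaf:
    keys: list
    next: Optional["Leaf"] = None  # link to the right sibling leaf


@dataclass
class Internal:
    keys: list      # separator keys, used only for navigation
    children: list  # always one more child than keys


def find_leaf(node, key):
    """Descend from the root to the leaf that could contain key."""
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
    return node


def range_scan(root, low, high):
    """Collect all keys in [low, high] by walking the linked leaves."""
    leaf = find_leaf(root, low)
    result = []
    while leaf is not None:
        for k in leaf.keys:
            if k > high:
                return result
            if k >= low:
                result.append(k)
        leaf = leaf.next  # follow the sibling link instead of going back to the root
    return result


# The tree from the slide.
right = Leaf([45, 50, 60])
middle = Leaf([25, 30, 35], next=right)
left = Leaf([5, 10], next=middle)
root = Internal([20, 40], [left, middle, right])

print(find_leaf(root, 35).keys)
print(range_scan(root, 10, 45))
```

A point lookup touches one internal node and one leaf; a range scan descends once and then only moves sideways through the leaves.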
Why B+ Tree?
⚫ Faster Range Queries:
Linked leaves allow quick sequential access
⚫ Balanced Structure:
Always O(log n) search time
⚫ Efficient Disk Access:
Designed to minimize disk I/O
⚫ Suitable for Huge Data:
Handles millions of records easily
B+ Tree Index
Example: Search for 35
At root [20 | 40]
20 ≤ 35 < 40 → Go to middle child.
At middle child [25 30 35]
Found 35!
Fast: Only 2 levels accessed.


How Insertions Work
⚫ Insert data into a leaf node.
⚫ If the leaf overflows, split it into two leaves.
⚫ Copy the middle key (the first key of the new right leaf) up to the parent.
⚫ Parent may also split if it overflows.
Always keeps the tree balanced.
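The leaf-level part of these steps can be sketched as follows. The capacity MAX_KEYS = 3 and the function name insert_into_leaf are assumptions made for the example, and updating the parent (including a possible parent split) is left out.

```python
import bisect

MAX_KEYS = 3  # assumed leaf capacity for this sketch


def insert_into_leaf(leaf_keys, key):
    """Insert key into a sorted leaf.

    Returns (left, right, separator). When the leaf does not overflow,
    right and separator are None and left is the updated leaf.
    """
    keys = list(leaf_keys)
    bisect.insort(keys, key)           # keep the leaf sorted
    if len(keys) <= MAX_KEYS:
        return keys, None, None
    mid = len(keys) // 2
    left, right = keys[:mid], keys[mid:]
    # The separator is copied up to the parent; it also stays in the right leaf.
    return left, right, right[0]


no_split = insert_into_leaf([5, 10], 7)
left, right, separator = insert_into_leaf([25, 30, 35], 28)
print(no_split)
print(left, right, separator)
```

Inserting 28 into the full leaf [25 30 35] yields the leaves [25 28] and [30 35], with 30 as the key handed to the parent.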


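SQL Example of B+ Tree Index
CREATE INDEX idx_salary ON employee(salary);
In MySQL (InnoDB), this automatically creates a B+ Tree Index,
roughly like:
⚫ Internal nodes: only salaries (keys).
⚫ Leaf nodes: salaries plus a reference to each matching row (in InnoDB, the row's primary key value). Full employee records are stored in the leaves of the clustered (primary key) index.
Searching salary ranges (e.g., salary BETWEEN 30000
AND 60000) becomes very efficient.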
Hash Index in Database Management Systems
What is a Hash Index?
⚫ Hash Indexing uses a hash function to compute the
address (location) where a record should be stored.
⚫ Instead of searching linearly or through a tree
structure, the hash value directly points to the data.
⚫ Super fast for exact match queries (=, not < or >).
Key Concepts
⚫ Hash Function: A function that converts a key into a
number (called a hash value).
⚫ Buckets: Locations or slots where records are stored.
⚫ Collisions: When two keys map to the same bucket
(handled separately).
Example of Hash Index
Suppose we have a table:
EmpID Name
101 Alice
102 Bob
103 Charlie
104 David
We want to create a Hash Index on EmpID.
Hash Function Example
Let's say:
hash(EmpID) = EmpID % 3
(Meaning: remainder when divided by 3.)
Hash Computations:
⚫ 101 % 3 = 2 → Bucket 2
⚫ 102 % 3 = 0 → Bucket 0
⚫ 103 % 3 = 1 → Bucket 1
⚫ 104 % 3 = 2 → Bucket 2 (collision!)
Bucket Representation
Bucket No | Entries
0 | 102 (Bob)
1 | 103 (Charlie)
2 | 101 (Alice), 104 (David)
Collisions are handled either by chaining (linked list) or open
addressing (find next empty spot).
How Search Works
To search for EmpID 104:
⚫ Calculate hash(104) → 104 % 3 = 2.
⚫ Go to Bucket 2.
⚫ Search within the bucket.
Very fast: no need to scan the whole table!
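The worked example translates almost directly into Python. The sketch below assumes chaining for collisions (each bucket is a list) and uses made-up names (NUM_BUCKETS, hash_key, insert, search); a real DBMS uses far more buckets and a stronger hash function.

```python
NUM_BUCKETS = 3


def hash_key(emp_id):
    """The hash function from the example: remainder when divided by 3."""
    return emp_id % NUM_BUCKETS


# One list per bucket; colliding keys are chained in the same list.
buckets = [[] for _ in range(NUM_BUCKETS)]


def insert(emp_id, name):
    buckets[hash_key(emp_id)].append((emp_id, name))


def search(emp_id):
    """Hash to one bucket, then look only at the entries in that bucket."""
    for key, name in buckets[hash_key(emp_id)]:
        if key == emp_id:
            return name
    return None


for emp_id, name in [(101, "Alice"), (102, "Bob"), (103, "Charlie"), (104, "David")]:
    insert(emp_id, name)

print(buckets)
print(search(104))
```

The bucket contents match the table above, and the lookup for 104 inspects only the two entries in bucket 2.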


Properties of Hash Index
Property | Hash Index
Good for | Exact match queries (WHERE EmpID = 104)
Not good for | Range queries (WHERE EmpID BETWEEN 100 AND 200)
Search speed | O(1) ideally (constant time!)
Handling collisions | Chaining or open addressing
