Lec 7 Query Processing, Optimization & Indexing
Lec 7 Query Processing, Optimization & Indexing
Indexing
Lecture Agenda
⚫ Query Processing in DBMS
⚫ Query Parsing & Optimization
⚫ Execution Plans & Join Algorithms
⚫ Indexing & Index Structures
⚫ B-Trees, Hash Indexes
⚫ Unique, Composite, and Covering Indexes
What is Query Processing?
⚫ Query processing is the series of steps a database
management system (DBMS) follows to interpret and
execute a user's request (query) to retrieve or
manipulate data stored in a database.
⚫ Translation of SQL queries into low-level instructions
Query Processing Phases
Query Processing Involves multiple stages: Parsing,
Optimization, Execution
⚫ Parsing: Analyzing SQL syntax and semantics
⚫ Optimization: Finding the most efficient execution
plan
⚫ Execution: Running the query using the selected plan
Query Parsing
⚫ Query parsing is the process of analyzing a query
written in SQL (or another query language) to
understand its syntax and structure.
⚫ The parser checks whether the query follows the
correct grammar of the query language. If the syntax is
correct, it translates the query into a parse tree or
abstract syntax tree (AST), which is a tree
representation of the query's logical structure.
Steps in Query Parsing
⚫ Lexical Analysis: The query string is divided into
tokens (e.g., keywords, operators, identifiers).
⚫ Syntax Analysis: The tokens are checked against the
grammar of the query language to ensure correct
structure.
⚫ Semantic Analysis: The parser checks for semantic
errors (e.g., referencing non-existent tables or
columns).
⚫ Query Rewrite: Some queries are rewritten for
optimization purposes (e.g., transforming a subquery
into a join).
Query Optimization
⚫ Query Optimization is the process where the database
system evaluates multiple ways to execute a query and
chooses the most efficient one, usually based on cost
estimates (e.g., time, memory, or I/O operations).
⚫ It happens after parsing and translating the query into a
logical plan, but before actual execution.
⚫ Why Optimize? Reduce I/O, CPU, memory use
Types of Query Optimization
⚫ Query optimization can be categorized in several ways
based on how and when the optimization occurs
⚫ Key Types are
1. Heuristic query optimization
2. Cost-based Query optimization
Heuristic Query Optimization
⚫ Uses a set of rules of thumb (heuristics) to transform
and simplify the query into a more efficient form
without considering cost estimates.
Key Techniques:
⚫ Apply selection operations as early as possible (push
down selections).
⚫ Combine selections and projections to reduce the
number of columns/rows.
⚫ Use smaller tables first in joins.
⚫ Reorder joins and other operations based on known
patterns.
Heuristic optimization
Advantages:
⚫ Fast and simple
⚫ Good for basic improvements
Disadvantage:
Doesn’t always find the most efficient plan
Cost-Based Query Optimization
⚫ Evaluates multiple possible query execution plans using
statistics (e.g., table size, indexes, row selectivity) and chooses
the one with the lowest estimated cost.
Key Steps:
⚫ Generate All Possible Execution Plans: The optimizer
generates all possible ways to execute the query using
different combinations of joins, scans, and sorting.
⚫ Estimate the Cost of Each Plan: Each plan’s cost is
estimated based on factors such as:
⚫ Number of I/O operations: Reading and writing data.
⚫ CPU time: Time spent processing data.
⚫ Memory usage: Storage used during query processing.
⚫ Select the Best Plan: The plan with the lowest cost is
selected as the optimal execution plan.
Cost Based optimization
Factors Affecting Query Cost:
⚫ Table Size: Larger tables require more I/O
operations to read.
⚫ Index Availability: If indexes are available, they
can speed up data retrieval.
⚫ Join Methods: Different join algorithms (e.g.,
nested loop, merge join, hash join) have different
costs.
⚫ Sort Operations: Sorting data can be expensive if
it requires additional memory or disk I/O.
Cost Based optimization
Advantage:
⚫ More accurate and efficient plans
⚫ Adapts to real data distribution
Disadvantage:
⚫ More time-consuming than heuristic optimization
⚫ Depends heavily on up-to-date statistics
Query Execution
⚫ A Query Execution Plan (QEP) is a step-by-step
strategy chosen by the database management system
(DBMS) to execute a SQL query efficiently. After
optimization, the DBMS selects the best plan and uses
it to retrieve or modify the data.
⚫ Generated by the optimizer
⚫ Describes how tables are accessed and joined
Execution Plan
⚫ Access Paths
How data will be accessed: full table scan, index scan, etc.
⚫ Join Methods
How tables will be joined: nested loop, hash join, merge
join.
⚫ Join Order
In what sequence tables will be joined.
⚫ Selection/Projection
When and how WHERE and SELECT clauses are applied.
⚫ Intermediate Steps
Temporary tables, sorting, filtering, grouping.
⚫ Estimated Costs
Estimated time, CPU usage, I/O operations, number of
rows processed at each step.
Example: Optimized Query Execution
Plan
⚫ Student(student_id, name, age, course_id)
⚫ Course(course_id, course_name)
SELECT s.name, c.course_name
FROM Student s
JOIN Course c ON s.course_id = c.course_id
WHERE s.age > 20;
Execution Plan
Step Operation Table Method Notes
Use index on
1 Index Scan Student B-Tree Index age to filter
early
2 Filter Student - s.age > 20
Student + Join on
3 Hash Join Hash Table
Course course_id
Load all
Table Access Full Table
4 Course course
(Full) Scan
records
Output
s.name,
5 Projection Result -
c.course_na
me
How to View Execution Plans
Query:
Select the names of students enrolled in course of
database
Query optimization & Joins Example
⚫ Plan 1:
Select s.name from student
Where course_id
IN (select course_id from course where course_name =
Database)
⚫ Plan 2:
Select s.name from student s JOIN course c
ON s.course_id = c.course_id
Where c.course_name = ‘Database’;
Query optimization & Joins Example
⚫ Plan 3:
WITH registration AS(select course_id from course
where course_name = ‘Database’)
Select s.name from student s JOIN registration r ON
s.course_id = r.course_id;
Join Algorithm Comparison
⚫ Nested Loop: Small tables, O(n × m)
⚫ Hash Join: Medium/Large, EQ joins, O(n + m)
⚫ Merge Join: Sorted data, O(n + m)