0% found this document useful (0 votes)
16 views

Lec 7 Query Processing, Optimization & Indexing

The document covers the fundamentals of query processing in database management systems, detailing the stages of parsing, optimization, and execution. It discusses various optimization techniques, including heuristic and cost-based methods, as well as different join algorithms such as nested loop, hash, and merge joins. Additionally, it provides insights into execution plans and how to optimize SQL queries for efficiency.

Uploaded by

mhariskhan513
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
16 views

Lec 7 Query Processing, Optimization & Indexing

The document covers the fundamentals of query processing in database management systems, detailing the stages of parsing, optimization, and execution. It discusses various optimization techniques, including heuristic and cost-based methods, as well as different join algorithms such as nested loop, hash, and merge joins. Additionally, it provides insights into execution plans and how to optimize SQL queries for efficiency.

Uploaded by

mhariskhan513
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 29

Query Processing, Optimization &

Indexing
Lecture Agenda
⚫ Query Processing in DBMS
⚫ Query Parsing & Optimization
⚫ Execution Plans & Join Algorithms
⚫ Indexing & Index Structures
⚫ B-Trees, Hash Indexes
⚫ Unique, Composite, and Covering Indexes
What is Query Processing?
⚫ Query processing is the series of steps a database
management system (DBMS) follows to interpret and
execute a user's request (query) to retrieve or
manipulate data stored in a database.
⚫ Translation of SQL queries into low-level instructions
Query Processing Phases
Query Processing Involves multiple stages: Parsing,
Optimization, Execution
⚫ Parsing: Analyzing SQL syntax and semantics
⚫ Optimization: Finding the most efficient execution
plan
⚫ Execution: Running the query using the selected plan
Query Parsing
⚫ Query parsing is the process of analyzing a query
written in SQL (or another query language) to
understand its syntax and structure.
⚫ The parser checks whether the query follows the
correct grammar of the query language. If the syntax is
correct, it translates the query into a parse tree or
abstract syntax tree (AST), which is a tree
representation of the query's logical structure.
Steps in Query Parsing
⚫ Lexical Analysis: The query string is divided into
tokens (e.g., keywords, operators, identifiers).
⚫ Syntax Analysis: The tokens are checked against the
grammar of the query language to ensure correct
structure.
⚫ Semantic Analysis: The parser checks for semantic
errors (e.g., referencing non-existent tables or
columns).
⚫ Query Rewrite: Some queries are rewritten for
optimization purposes (e.g., transforming a subquery
into a join).
Query Optimization
⚫ Query Optimization is the process where the database
system evaluates multiple ways to execute a query and
chooses the most efficient one, usually based on cost
estimates (e.g., time, memory, or I/O operations).
⚫ It happens after parsing and translating the query into a
logical plan, but before actual execution.
⚫ Why Optimize? Reduce I/O, CPU, memory use
Types of Query Optimization
⚫ Query optimization can be categorized in several ways
based on how and when the optimization occurs
⚫ Key Types are
1. Heuristic query optimization
2. Cost-based Query optimization
Heuristic Query Optimization
⚫ Uses a set of rules of thumb (heuristics) to transform
and simplify the query into a more efficient form
without considering cost estimates.
Key Techniques:
⚫ Apply selection operations as early as possible (push
down selections).
⚫ Combine selections and projections to reduce the
number of columns/rows.
⚫ Use smaller tables first in joins.
⚫ Reorder joins and other operations based on known
patterns.
Heuristic optimization
Advantages:
⚫ Fast and simple
⚫ Good for basic improvements
Disadvantage:
Doesn’t always find the most efficient plan
Cost-Based Query Optimization
⚫ Evaluates multiple possible query execution plans using
statistics (e.g., table size, indexes, row selectivity) and chooses
the one with the lowest estimated cost.
Key Steps:
⚫ Generate All Possible Execution Plans: The optimizer
generates all possible ways to execute the query using
different combinations of joins, scans, and sorting.
⚫ Estimate the Cost of Each Plan: Each plan’s cost is
estimated based on factors such as:
⚫ Number of I/O operations: Reading and writing data.
⚫ CPU time: Time spent processing data.
⚫ Memory usage: Storage used during query processing.
⚫ Select the Best Plan: The plan with the lowest cost is
selected as the optimal execution plan.
Cost Based optimization
Factors Affecting Query Cost:
⚫ Table Size: Larger tables require more I/O
operations to read.
⚫ Index Availability: If indexes are available, they
can speed up data retrieval.
⚫ Join Methods: Different join algorithms (e.g.,
nested loop, merge join, hash join) have different
costs.
⚫ Sort Operations: Sorting data can be expensive if
it requires additional memory or disk I/O.
Cost Based optimization
Advantage:
⚫ More accurate and efficient plans
⚫ Adapts to real data distribution
Disadvantage:
⚫ More time-consuming than heuristic optimization
⚫ Depends heavily on up-to-date statistics
Query Execution
⚫ A Query Execution Plan (QEP) is a step-by-step
strategy chosen by the database management system
(DBMS) to execute a SQL query efficiently. After
optimization, the DBMS selects the best plan and uses
it to retrieve or modify the data.
⚫ Generated by the optimizer
⚫ Describes how tables are accessed and joined
Execution Plan
⚫ Access Paths
How data will be accessed: full table scan, index scan, etc.
⚫ Join Methods
How tables will be joined: nested loop, hash join, merge
join.
⚫ Join Order
In what sequence tables will be joined.
⚫ Selection/Projection
When and how WHERE and SELECT clauses are applied.
⚫ Intermediate Steps
Temporary tables, sorting, filtering, grouping.
⚫ Estimated Costs
Estimated time, CPU usage, I/O operations, number of
rows processed at each step.
Example: Optimized Query Execution
Plan
⚫ Student(student_id, name, age, course_id)
⚫ Course(course_id, course_name)
SELECT s.name, c.course_name
FROM Student s
JOIN Course c ON s.course_id = c.course_id
WHERE s.age > 20;
Execution Plan
Step Operation Table Method Notes
Use index on
1 Index Scan Student B-Tree Index age to filter
early
2 Filter Student - s.age > 20
Student + Join on
3 Hash Join Hash Table
Course course_id
Load all
Table Access Full Table
4 Course course
(Full) Scan
records
Output
s.name,
5 Projection Result -
c.course_na
me
How to View Execution Plans

⚫ MYSQL EXPLAIN Select


Join Algorithms Overview
The type of join used in the execution plan of a query
depends on various factors, such as:
⚫ The size of the tables involved
⚫ Availability of indexes
⚫ Join condition (e.g., equality vs inequality)
⚫ The database engine's optimizer
Selection depends on: Input size, Sort order, Join
condition
Types of Join Algorithms
Types of Join Algorithms:
⚫ Nested Loop Join,
⚫ Hash Join,
⚫ Merge Join
Nested Loop Join
⚫ Brute-force method: Compare every pair
⚫ Time Complexity: O(n × m)
⚫ Best For: Small datasets or indexed lookups
How it works: For each row in the outer table, the DBMS
searches for matching rows in the inner table.
Best when: One table is small, or there's an index on the join
column of the inner table.
⚫ Example in plan:
Nested Loop
-> Index Scan on Student
-> Index Lookup on Course
Hash Join
⚫ Build Phase: Hash smaller table
⚫ Probe Phase: Scan larger table and match using hash
⚫ Efficient for equi-joins
How it works:
⚫ Build a hash table on the smaller table using the join key.
⚫ Scan the larger table and use the hash table to find matches.
Best when: No indexes, large tables, and equality joins.
Example in plan:
Hash Join
-> Seq Scan on Student
-> Hash on Course
Merge Join
⚫ Works on sorted inputs
⚫ Time Complexity: O(n + m)
⚫ Efficient for large, pre-sorted data
How it works: Both tables are sorted on the join key,
and then merged together like in merge sort.
Best when: Tables are already sorted or sorting is
efficient.
Example in plan:
Merge Join
-> Sort on Student.course_id
-> Sort on Course.course_id
Example
Consider Schema
Student(student_id, name, age, course_id)
Course(course_id, course_name)

Query:
Select the names of students enrolled in course of
database
Query optimization & Joins Example
⚫ Plan 1:
Select s.name from student
Where course_id
IN (select course_id from course where course_name =
Database)
⚫ Plan 2:
Select s.name from student s JOIN course c
ON s.course_id = c.course_id
Where c.course_name = ‘Database’;
Query optimization & Joins Example
⚫ Plan 3:
WITH registration AS(select course_id from course
where course_name = ‘Database’)
Select s.name from student s JOIN registration r ON
s.course_id = r.course_id;
Join Algorithm Comparison
⚫ Nested Loop: Small tables, O(n × m)
⚫ Hash Join: Medium/Large, EQ joins, O(n + m)
⚫ Merge Join: Sorted data, O(n + m)

Join Type Best For Common in execution


plan when
Nested Loop Small + Large Index exists on inner
Join (with index) table
Hash Join Large + Large No index, but equality
(equality only) join
Merge Join Sorted data Data already sorted or
sorted easily
Class Assignment
⚫ Optimize this query:
⚫ SELECT E.name, D.name FROM Employees E JOIN
Departments D ON E.dept_id = D.id WHERE E.salary
> 70000;
⚫ Suggest 2 execution plans, Recommend suitable
indexes

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy