dbi3

The document discusses query optimization in databases, focusing on the transformation of relational expressions, cost estimation, and the selection of evaluation plans. It outlines the steps involved in cost-based query optimization, including generating equivalent expressions and estimating the cost of different evaluation plans based on statistical information. Additionally, it presents various equivalence rules for relational algebra operations that can be used to optimize query execution.

Uploaded by

nadun.emailclient

Database Internals

Query Optimization
Lecture 3

1
Chapter 14: Query Optimization

• Introduction
• Transformation of Relational Expressions
• Estimating Statistics of Expression Results
• Choice of Evaluation Plans

2
Introduction
• There are alternative ways of evaluating a given query:
  • Equivalent expressions
  • Different algorithms for each operation

3
Introduction (Cont.)
• An evaluation plan defines exactly which algorithm is used for each operation, and how the execution of the operations is coordinated.

4
Introduction (Cont.)
• The cost difference between evaluation plans for a query can be enormous
  • E.g. seconds vs. days in some cases
• Steps in cost-based query optimization:
  1. Generate logically equivalent expressions using equivalence rules
  2. Annotate the resulting expressions to get alternative query plans
  3. Choose the cheapest plan based on estimated cost
• The estimate of plan cost is based on:
  • Statistical information about relations, e.g. number of tuples, number of distinct values for an attribute
  • Statistics estimation for intermediate results, needed to compute the cost of complex expressions
  • Cost formulae for algorithms, computed using statistics

5
Transformation of Relational Expressions
• Two relational algebra expressions are said to be equivalent if the two expressions generate the same set of tuples on every legal database instance
  • Note: the order of tuples is irrelevant
• In SQL, inputs and outputs are multisets of tuples
  • Two expressions in the multiset version of the relational algebra are said to be equivalent if the two expressions generate the same multiset of tuples on every legal database instance.
• An equivalence rule says that expressions of two forms are equivalent
  • We can replace an expression of the first form by the second, or vice versa

6
Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a sequence of individual selections.
   $\sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E))$
2. Selection operations are commutative.
   $\sigma_{\theta_1}(\sigma_{\theta_2}(E)) = \sigma_{\theta_2}(\sigma_{\theta_1}(E))$
3. Only the last in a sequence of projection operations is needed; the others can be omitted.
   $\Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E)$
4. Selections can be combined with Cartesian products and theta joins.
   a. $\sigma_{\theta}(E_1 \times E_2) = E_1 \bowtie_{\theta} E_2$
   b. $\sigma_{\theta_1}(E_1 \bowtie_{\theta_2} E_2) = E_1 \bowtie_{\theta_1 \wedge \theta_2} E_2$

7
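Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are commutative.
   $E_1 \bowtie_{\theta} E_2 = E_2 \bowtie_{\theta} E_1$
6. (a) Natural join operations are associative:
   $(E_1 \bowtie E_2) \bowtie E_3 = E_1 \bowtie (E_2 \bowtie E_3)$
   (b) Theta joins are associative in the following manner:
   $(E_1 \bowtie_{\theta_1} E_2) \bowtie_{\theta_2 \wedge \theta_3} E_3 = E_1 \bowtie_{\theta_1 \wedge \theta_3} (E_2 \bowtie_{\theta_2} E_3)$
   where $\theta_2$ involves attributes from only $E_2$ and $E_3$.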
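As a sanity check on rules 1, 2, 4a, 5 and 6a, the following minimal sketch evaluates both sides of each rule on small toy relations and compares the results as multisets. The relations `R`, `S`, `T`, the predicates, and the helper functions are all hypothetical and chosen only for illustration; a passing check on one instance does not prove a rule, it only illustrates it.

```python
from collections import Counter

def bag(rel):
    """Order-insensitive multiset of rows: duplicates count, order does not."""
    return Counter(frozenset(row.items()) for row in rel)

def select(pred, rel):
    return [row for row in rel if pred(row)]

def theta_join(pred, r, s):
    # Attribute names of r and s are assumed to be disjoint here.
    return [{**a, **b} for a in r for b in s if pred({**a, **b})]

def product(r, s):
    return theta_join(lambda row: True, r, s)

def natural_join(r, s):
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

# Hypothetical toy relations R(a, b), S(b, c), T(c, d); R contains a duplicate row.
R = [{"a": 1, "b": 1}, {"a": 2, "b": 1}, {"a": 2, "b": 1}, {"a": 3, "b": 2}]
S = [{"b": 1, "c": 10}, {"b": 2, "c": 20}, {"b": 3, "c": 30}]
T = [{"c": 10, "d": "x"}, {"c": 10, "d": "y"}, {"c": 20, "d": "z"}]

p1 = lambda row: row["a"] >= 2
p2 = lambda row: row["b"] == 1
theta = lambda row: row["a"] * 10 == row["c"]  # join condition between R and T

checks = {
    "rule 1": bag(select(lambda row: p1(row) and p2(row), R))
              == bag(select(p1, select(p2, R))),
    "rule 2": bag(select(p1, select(p2, R))) == bag(select(p2, select(p1, R))),
    "rule 4a": bag(select(theta, product(R, T))) == bag(theta_join(theta, R, T)),
    "rule 5": bag(theta_join(theta, R, T)) == bag(theta_join(theta, T, R)),
    "rule 6a": bag(natural_join(natural_join(R, S), T))
               == bag(natural_join(R, natural_join(S, T))),
}
```

Comparing `Counter` objects rather than lists reflects the definition of equivalence used in these slides: the same multiset of tuples, with tuple order ignored.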

8
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join operation under the following two conditions:
   (a) When all the attributes in $\theta_0$ involve only the attributes of one of the expressions ($E_1$) being joined.
   $\sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2$
   (b) When $\theta_1$ involves only the attributes of $E_1$ and $\theta_2$ involves only the attributes of $E_2$.
   $\sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2))$

9
Pictorial Depiction of Equivalence Rules

10
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join operation as follows.
   Consider a join $E_1 \bowtie_{\theta} E_2$.
   • Let $L_1$ and $L_2$ be sets of attributes from $E_1$ and $E_2$, respectively.
   • Let $L_3$ be attributes of $E_1$ that are involved in the join condition $\theta$, but are not in $L_1 \cup L_2$, and
   • let $L_4$ be attributes of $E_2$ that are involved in the join condition $\theta$, but are not in $L_1 \cup L_2$.

   (a) If $\theta$ involves only attributes from $L_1 \cup L_2$:
   $\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2))$

   (b) General rule:
   $\Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2) = \Pi_{L_1 \cup L_2}((\Pi_{L_1 \cup L_3}(E_1)) \bowtie_{\theta} (\Pi_{L_2 \cup L_4}(E_2)))$

11
Equivalence Rules (Cont.)
9. The set operations union and intersection are commutative.
   $E_1 \cup E_2 = E_2 \cup E_1$
   $E_1 \cap E_2 = E_2 \cap E_1$
   • (Set difference is not commutative.)
10. Set union and intersection are associative.
   $(E_1 \cup E_2) \cup E_3 = E_1 \cup (E_2 \cup E_3)$
   $(E_1 \cap E_2) \cap E_3 = E_1 \cap (E_2 \cap E_3)$
11. The selection operation distributes over $\cup$, $\cap$ and $-$.
   $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - \sigma_{\theta}(E_2)$
   and similarly for $\cup$ and $\cap$ in place of $-$.
   Also: $\sigma_{\theta}(E_1 - E_2) = \sigma_{\theta}(E_1) - E_2$
   and similarly for $\cap$ in place of $-$, but not for $\cup$.
12. The projection operation distributes over union.
   $\Pi_{L}(E_1 \cup E_2) = (\Pi_{L}(E_1)) \cup (\Pi_{L}(E_2))$
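The asymmetry in rule 11 (selecting on only the left operand is valid for difference and intersection, but not for union) is easy to miss. The sketch below checks it on two small hypothetical relations represented as Python sets, so it uses set semantics; the relation contents and the predicate `id >= 2` are assumptions made for illustration only.

```python
# Hypothetical relations as sets of (id, tag) tuples (set semantics).
E1 = {(1, "x"), (2, "y"), (3, "z"), (4, "w")}
E2 = {(0, "u"), (2, "y"), (4, "w"), (5, "v")}

def sel(rel):
    """Selection with an arbitrary example predicate: id >= 2."""
    return {t for t in rel if t[0] >= 2}

def proj(rel):
    """Projection onto the id attribute."""
    return {(t[0],) for t in rel}

# Rule 11, first form: selection applied to both operands.
distributes = {
    "difference": sel(E1 - E2) == sel(E1) - sel(E2),
    "union": sel(E1 | E2) == sel(E1) | sel(E2),
    "intersection": sel(E1 & E2) == sel(E1) & sel(E2),
}

# Rule 11, second form: selection applied to the left operand only.
left_only = {
    "difference": sel(E1 - E2) == sel(E1) - E2,
    "intersection": sel(E1 & E2) == sel(E1) & E2,
    # Not an equivalence: (0, "u") fails the predicate but enters via E2.
    "union": sel(E1 | E2) == sel(E1) | E2,
}

# Rule 12: projection distributes over union.
rule_12 = proj(E1 | E2) == proj(E1) | proj(E2)
```

On this instance every entry of `distributes` holds, while `left_only["union"]` is false because the unselected right operand contributes a tuple that the selection would have removed.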
12
Transformation Example: Pushing Selections

• Query: Find the names of all customers who have an account at some branch located in Brooklyn.
  $\Pi_{\text{customer\_name}}(\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch} \bowtie (\text{account} \bowtie \text{depositor})))$
• Transformation using rule 7a:
  $\Pi_{\text{customer\_name}}((\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch})) \bowtie (\text{account} \bowtie \text{depositor}))$
• Performing the selection as early as possible reduces the size of the relations to be joined.
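To make the effect of pushing the selection concrete, here is a minimal sketch that evaluates both forms of the Brooklyn query on a tiny in-memory database. The rows, the `natural_join` helper, and the "row pairs examined" measure are all hypothetical; they stand in for real relations and a real cost model.

```python
def natural_join(r, s):
    """Nested-loop natural join over lists of dict rows."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

# Hypothetical sample data following the bank schema used in these slides.
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn", "assets": 9},
    {"branch_name": "Perryridge", "branch_city": "Horseneck", "assets": 17},
    {"branch_name": "Brighton", "branch_city": "Brooklyn", "assets": 7},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Brighton", "balance": 900},
    {"account_number": "A-215", "branch_name": "Perryridge", "balance": 700},
]
depositor = [
    {"customer_name": "Johnson", "account_number": "A-101"},
    {"customer_name": "Hayes", "account_number": "A-102"},
    {"customer_name": "Lindsay", "account_number": "A-201"},
    {"customer_name": "Smith", "account_number": "A-215"},
]

def in_brooklyn(row):
    return row["branch_city"] == "Brooklyn"

acc_dep = natural_join(account, depositor)

# Original form: join everything, then select.
late = [row for row in natural_join(branch, acc_dep) if in_brooklyn(row)]
names_late = {row["customer_name"] for row in late}
pairs_late = len(branch) * len(acc_dep)      # row pairs examined by the outer join

# After rule 7a: select on branch first, then join.
brooklyn = [row for row in branch if in_brooklyn(row)]
early = natural_join(brooklyn, acc_dep)
names_early = {row["customer_name"] for row in early}
pairs_early = len(brooklyn) * len(acc_dep)
```

Both forms return the same customer names, while the second form feeds fewer branch tuples into the join. On realistic data volumes the gap would be far larger than in this toy instance.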

13
Example with Multiple Transformations
• Query: Find the names of all customers with an account at a Brooklyn branch whose account balance is over $1000.
  $\Pi_{\text{customer\_name}}(\sigma_{\text{branch\_city} = \text{"Brooklyn"} \wedge \text{balance} > 1000}(\text{branch} \bowtie (\text{account} \bowtie \text{depositor})))$
• Transformation using join associativity (Rules 6a, 7a):
  $\Pi_{\text{customer\_name}}((\sigma_{\text{branch\_city} = \text{"Brooklyn"} \wedge \text{balance} > 1000}(\text{branch} \bowtie \text{account})) \bowtie \text{depositor})$
• The second form provides an opportunity to apply the "perform selections early" rule, resulting in the subexpression (Rule 7b)
  $\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch}) \bowtie \sigma_{\text{balance} > 1000}(\text{account})$
• Thus a sequence of transformations can be useful

Branch_schema = (branch_name, branch_city, assets)
Account_schema = (account_number, branch_name, balance)
Depositor_schema = (customer_name, account_number)
14
Multiple Transformations (Cont.)

15
Transformation Example: Pushing Projections

$\Pi_{\text{customer\_name}}((\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch}) \bowtie \text{account}) \bowtie \text{depositor})$

• When we compute
  $(\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch}) \bowtie \text{account})$
  we obtain a relation whose schema is:
  (branch_name, branch_city, assets, account_number, balance)
• Push projections using equivalence rules 8a and 8b; eliminate unneeded attributes from intermediate results to get:
  $\Pi_{\text{customer\_name}}((\Pi_{\text{account\_number}}(\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch}) \bowtie \text{account})) \bowtie \text{depositor})$
• Performing the projection as early as possible reduces the size of the relation to be joined.
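The following sketch illustrates the projection push-down on hypothetical sample rows: the intermediate relation is narrowed from five attributes to the single attribute needed by the next join, and the final answer is unchanged. The data and the `natural_join` helper are assumptions for illustration only.

```python
def natural_join(r, s):
    """Nested-loop natural join over lists of dict rows."""
    return [{**a, **b} for a in r for b in s
            if all(a[k] == b[k] for k in a.keys() & b.keys())]

# Hypothetical sample data following the bank schema used in these slides.
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn", "assets": 9},
    {"branch_name": "Perryridge", "branch_city": "Horseneck", "assets": 17},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Downtown", "balance": 900},
]
depositor = [
    {"customer_name": "Johnson", "account_number": "A-101"},
    {"customer_name": "Hayes", "account_number": "A-102"},
    {"customer_name": "Lindsay", "account_number": "A-201"},
]

brooklyn = [row for row in branch if row["branch_city"] == "Brooklyn"]

# Intermediate result without projection: five attributes per tuple.
wide = natural_join(brooklyn, account)
wide_schema = sorted(wide[0].keys())

# Intermediate result with the projection pushed down: one attribute per tuple.
narrow = [{"account_number": row["account_number"]} for row in wide]
narrow_schema = sorted(narrow[0].keys())

names_wide = {row["customer_name"] for row in natural_join(wide, depositor)}
names_narrow = {row["customer_name"] for row in natural_join(narrow, depositor)}
```

Only `account_number` is kept because it is the one attribute that the join with `depositor` needs; `customer_name` itself comes from `depositor`.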

16
Join Ordering Example
• For all relations $r_1$, $r_2$, and $r_3$,
  $(r_1 \bowtie r_2) \bowtie r_3 = r_1 \bowtie (r_2 \bowtie r_3)$
  (Join Associativity)
• If $r_2 \bowtie r_3$ is quite large and $r_1 \bowtie r_2$ is small, we choose
  $(r_1 \bowtie r_2) \bowtie r_3$
  so that we compute and store a smaller temporary relation.

17
Join Ordering Example (Cont.)
• Consider the expression
  $\Pi_{\text{customer\_name}}((\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch})) \bowtie (\text{account} \bowtie \text{depositor}))$
• We could compute $\text{account} \bowtie \text{depositor}$ first, and join the result with
  $\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch})$
  but $\text{account} \bowtie \text{depositor}$ is likely to be a large relation.
• Only a small fraction of the bank's customers are likely to have accounts in branches located in Brooklyn, so it is better to compute
  $\sigma_{\text{branch\_city} = \text{"Brooklyn"}}(\text{branch}) \bowtie \text{account}$
  first.
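A small experiment can show how much the join order changes the size of the temporary relation. The sketch below builds synthetic data (10 branches, one of them in Brooklyn, 1000 accounts, 1000 depositor rows; all of these numbers are assumptions) and compares the two orders using a simple hash join written for this example.

```python
from collections import defaultdict

def hash_join(r, s, key):
    """Equi-join of two lists of dict rows on a single shared attribute."""
    index = defaultdict(list)
    for row in s:
        index[row[key]].append(row)
    return [{**a, **b} for a in r for b in index.get(a[key], [])]

# Synthetic data: branch B0 is the only Brooklyn branch (an assumption).
branch = [{"branch_name": f"B{i}", "branch_city": "Brooklyn" if i == 0 else "Other"}
          for i in range(10)]
account = [{"account_number": i, "branch_name": f"B{i % 10}", "balance": 100 + i}
           for i in range(1000)]
depositor = [{"customer_name": f"C{i}", "account_number": i} for i in range(1000)]

brooklyn = [row for row in branch if row["branch_city"] == "Brooklyn"]

# Order A: (account JOIN depositor) first, then join with the Brooklyn branches.
tmp_a = hash_join(account, depositor, "account_number")
result_a = hash_join(brooklyn, tmp_a, "branch_name")

# Order B: (Brooklyn branches JOIN account) first, then join with depositor.
tmp_b = hash_join(brooklyn, account, "branch_name")
result_b = hash_join(tmp_b, depositor, "account_number")

names_a = {row["customer_name"] for row in result_a}
names_b = {row["customer_name"] for row in result_b}
```

Both orders return the same 100 customers, but order A materializes a 1000-row temporary relation while order B materializes only 100 rows.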

18
Exercise:
Draw the expression trees and select the better option.

(A) $\Pi_{\text{proj\_name}, \text{budget}}((\sigma_{\text{emp\_city} = \text{"Moratuwa"}}(\text{Employee})) \bowtie (\text{Assignment} \bowtie \text{Project}))$

(B) $\Pi_{\text{proj\_name}, \text{budget}}(\sigma_{\text{emp\_city} = \text{"Moratuwa"}}((\text{Employee} \bowtie \text{Assignment}) \bowtie \text{Project}))$

Write a better evaluation expression using the following schema information:

Employee(emp_no, emp_name, emp_city, …)

Assignment(emp_no, proj_no, hours, …)

Project(proj_name, budget, proj_no, …)

19
Cost Estimation
• The cost of each operator is computed as described in Chapter 13
  • Need statistics of input relations
    • E.g. number of tuples, sizes of tuples
• Inputs can be results of sub-expressions
  • Need to estimate statistics of expression results
  • To do so, we require additional statistics
    • E.g. number of distinct values for an attribute

20
Statistical Information for Cost Estimation

• $n_r$: number of tuples in a relation $r$.
• $b_r$: number of blocks containing tuples of $r$.
• $l_r$: size of a tuple of $r$ (in bytes).
• $f_r$: blocking factor of $r$, i.e. the number of tuples of $r$ that fit into one block.
• $V(A, r)$: number of distinct values that appear in $r$ for attribute $A$; same as the size of $\Pi_A(r)$.
• If tuples of $r$ are stored together physically in a file, then:
  $b_r = \left\lceil \dfrac{n_r}{f_r} \right\rceil$
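As a worked instance of the block-count formula, the sketch below computes the blocking factor and the number of blocks for a hypothetical relation. The block size of 4096 bytes, the tuple size of 100 bytes, and the rule that tuples never span blocks are assumptions that do not come from the slides.

```python
import math

def blocking_factor(block_size, l_r):
    """f_r, assuming fixed-size tuples that never span two blocks."""
    return block_size // l_r

def blocks_needed(n_r, f_r):
    """b_r = ceil(n_r / f_r), valid when tuples of r are stored together in one file."""
    return math.ceil(n_r / f_r)

f_r = blocking_factor(4096, 100)          # 40 tuples per block
b_exact = blocks_needed(10_000, f_r)      # 10000 / 40 fills blocks exactly
b_partial = blocks_needed(10_001, f_r)    # one extra tuple needs one extra block
```

The ceiling matters because a partially filled last block still has to be read.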



21
Histograms
• Example: histogram on attribute age of relation person
  [Figure: histogram of person.age; not reproduced here]
• Equi-width histograms
• Equi-depth histograms
• If no histogram is available, the optimizer assumes that the distribution is uniform
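The two histogram types differ in what is held constant: an equi-width histogram uses buckets of equal value range, while an equi-depth histogram chooses boundaries so that each bucket holds roughly the same number of tuples. The sketch below builds both for a hypothetical list of ages; the data, the bucket count, and the function names are assumptions for illustration, and the equi-width helper assumes that not all values are equal.

```python
def equi_width(values, buckets):
    """Return (edges, counts) for buckets that all span the same value range."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / buckets            # assumes hi > lo
    counts = [0] * buckets
    for v in values:
        # The maximum value is placed in the last bucket.
        i = min(int((v - lo) / width), buckets - 1)
        counts[i] += 1
    edges = [lo + i * width for i in range(buckets + 1)]
    return edges, counts

def equi_depth(values, buckets):
    """Return bucket boundaries so each bucket holds about len(values)/buckets values."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(i * n) // buckets] for i in range(buckets)] + [ordered[-1]]

# Hypothetical ages, skewed toward the twenties.
ages = [21, 22, 22, 23, 25, 25, 26, 30, 41, 58, 60, 75]

edges, counts = equi_width(ages, 4)
bounds = equi_depth(ages, 4)
```

On this skewed sample the equi-width histogram puts 8 of the 12 values into its first bucket, whereas the equi-depth boundaries adapt so that each bucket covers three values.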
22
Selection Size Estimation
• $\sigma_{A = v}(r)$
  • $n_r / V(A, r)$: number of records that will satisfy the selection
  • Equality condition on a key attribute: size estimate = 1
• $\sigma_{A \le v}(r)$ (the case of $\sigma_{A \ge v}(r)$ is symmetric)
  • Let $c$ denote the estimated number of tuples satisfying the condition.
  • If $\min(A, r)$ and $\max(A, r)$ are available in the catalog:
    • $c = 0$ if $v < \min(A, r)$
    • $c = n_r$ if $v > \max(A, r)$
    • $c = n_r \cdot \dfrac{v - \min(A, r)}{\max(A, r) - \min(A, r)}$ otherwise
  • If histograms are available, the above estimate can be refined
  • In the absence of statistical information, $c$ is assumed to be $n_r / 2$.
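The estimation rules for simple selections translate directly into code. The sketch below is one possible reading of them; the function names and the sample statistics are hypothetical, and the boundary case where the comparison value equals the maximum is folded into the "all tuples" branch, which agrees with the interpolation formula at that point and avoids a division by zero when the minimum and maximum coincide.

```python
def estimate_equality(n_r, distinct_values):
    """Estimated size of an equality selection: n_r / V(A, r)."""
    return n_r / distinct_values

def estimate_leq(n_r, v, lo=None, hi=None):
    """Estimated size of a selection A <= v, given optional min and max of A."""
    if lo is None or hi is None:
        return n_r / 2                      # no statistics available
    if v < lo:
        return 0
    if v >= hi:
        return n_r
    return n_r * (v - lo) / (hi - lo)       # linear interpolation, uniform assumption

# Hypothetical catalog statistics for an account relation.
n_account = 10_000
eq_estimate = estimate_equality(n_account, 50)            # 50 distinct branch names
leq_estimate = estimate_leq(n_account, 2_500, 0, 10_000)  # balance <= 2500
no_stats_estimate = estimate_leq(n_account, 2_500)
below_min_estimate = estimate_leq(n_account, -1, 0, 10_000)
```

The interpolation assumes values are spread uniformly between the minimum and the maximum, which is exactly the assumption a histogram would let the optimizer relax.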

23
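Size Estimation of Complex Selections

• The selectivity of a condition $\theta_i$ is the probability that a tuple in the relation $r$ satisfies $\theta_i$.
  • If $s_i$ is the number of satisfying tuples in $r$, the selectivity of $\theta_i$ is given by $s_i / n_r$.
• Conjunction: $\sigma_{\theta_1 \wedge \theta_2 \wedge \ldots \wedge \theta_n}(r)$. Assuming independence, the estimated number of tuples in the result is:
  $n_r \cdot \dfrac{s_1 \cdot s_2 \cdot \ldots \cdot s_n}{n_r^{\,n}}$
• Disjunction: $\sigma_{\theta_1 \vee \theta_2 \vee \ldots \vee \theta_n}(r)$. Estimated number of tuples:
  $n_r \cdot \left(1 - \left(1 - \dfrac{s_1}{n_r}\right) \cdot \left(1 - \dfrac{s_2}{n_r}\right) \cdot \ldots \cdot \left(1 - \dfrac{s_n}{n_r}\right)\right)$
• Negation: $\sigma_{\neg\theta}(r)$. Estimated number of tuples:
  $n_r - \mathrm{size}(\sigma_{\theta}(r))$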
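The three formulas for complex selections can be checked numerically with a short sketch. The function names and the example statistics (1000 tuples, conditions satisfied by 100 and 500 tuples) are assumptions chosen for illustration.

```python
from math import prod

def estimate_conjunction(n_r, sizes):
    """n_r times the product of selectivities s_i / n_r (independence assumed)."""
    return n_r * prod(s / n_r for s in sizes)

def estimate_disjunction(n_r, sizes):
    """n_r times the probability that at least one condition holds."""
    return n_r * (1 - prod(1 - s / n_r for s in sizes))

def estimate_negation(n_r, size):
    """Tuples not satisfying the condition."""
    return n_r - size

n_r = 1000
sizes = [100, 500]                             # s_1 and s_2

conj = estimate_conjunction(n_r, sizes)        # 1000 * 0.1 * 0.5
disj = estimate_disjunction(n_r, sizes)        # 1000 * (1 - 0.9 * 0.5)
neg = estimate_negation(n_r, sizes[0])         # 1000 - 100
```

With these numbers the conjunction estimate is about 50 tuples and the disjunction estimate about 550, which is less than the naive sum 100 + 500 because tuples satisfying both conditions are counted only once.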

24
Size Estimation for Other Operations
• Projection: estimated size of $\Pi_A(r) = V(A, r)$ (duplicates eliminated)
• Aggregation: estimated size of ${}_A\mathcal{G}_F(r) = V(A, r)$ (grouping on $A$)
• Set operations
  • For unions/intersections of selections on the same relation: rewrite and use the size estimate for selections
    • E.g. $\sigma_{\theta_1}(r) \cup \sigma_{\theta_2}(r)$ can be rewritten as $\sigma_{\theta_1 \vee \theta_2}(r)$
  • For operations on different relations:
    • estimated size of $r \cup s$ = size of $r$ + size of $s$.
    • estimated size of $r \cap s$ = minimum of size of $r$ and size of $s$.
    • estimated size of $r - s$ = size of $r$.
    • All three estimates may be quite inaccurate, but provide upper bounds on the sizes.
• It is also possible to estimate the size of joins

25
Choice of Evaluation Plans
• Must consider the interaction of evaluation techniques when choosing evaluation plans
  • Choosing the cheapest algorithm for each operation independently may not yield the best overall algorithm. E.g.
    • merge-join may be costlier than hash-join, but may provide a sorted output which reduces the cost of an outer-level aggregation.
    • nested-loop join may provide an opportunity for pipelining
• Practical query optimizers incorporate elements of the following two broad approaches:
  1. Search all the plans and choose the best plan in a cost-based fashion.
  2. Use heuristics to choose a plan.

26
Cost-Based Optimization
• Consider finding the best join order for $r_1 \bowtie r_2 \bowtie \ldots \bowtie r_n$.
• There are $(2(n - 1))! / (n - 1)!$ different join orders for the above expression. With $n = 7$, the number is 665280; with $n = 10$, the number is greater than 17.6 billion!
• There is no need to generate all the join orders. Using dynamic programming, the least-cost join order for any subset of $\{r_1, r_2, \ldots, r_n\}$ is computed only once and stored for future use.
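The join-order count and the dynamic-programming idea can both be sketched briefly. The code below first evaluates the formula for the number of join orders, then finds a least-cost order by memoizing the best plan for each subset of relations. The cost model (sum of estimated intermediate result sizes), the cardinalities, and the selectivities are all hypothetical; a real optimizer would use per-algorithm cost formulae instead.

```python
from functools import lru_cache
from itertools import combinations
from math import factorial

def number_of_join_orders(n):
    """(2(n - 1))! / (n - 1)! join orders for n relations."""
    return factorial(2 * (n - 1)) // factorial(n - 1)

def best_join_order(cards, selectivity):
    """cards: {relation: tuple count}; selectivity: {frozenset({a, b}): fraction}.

    Returns (cost, plan) where cost is the sum of estimated sizes of all
    join results in the plan (a toy cost model).
    """
    def size(rels):
        est = 1.0
        for r in rels:
            est *= cards[r]
        for a, b in combinations(sorted(rels), 2):
            est *= selectivity.get(frozenset((a, b)), 1.0)  # 1.0 means cross product
        return est

    @lru_cache(maxsize=None)        # each subset is solved once and then reused
    def best(rels):
        if len(rels) == 1:
            (name,) = rels
            return 0.0, name
        members = sorted(rels)
        candidates = []
        for k in range(1, len(members)):
            for left in combinations(members, k):
                left = frozenset(left)
                right = rels - left
                lcost, lplan = best(left)
                rcost, rplan = best(right)
                candidates.append((lcost + rcost + size(rels),
                                   f"({lplan} JOIN {rplan})"))
        return min(candidates)

    return best(frozenset(cards))

orders_7 = number_of_join_orders(7)
orders_10 = number_of_join_orders(10)

# Hypothetical statistics: one Brooklyn branch, 1000 accounts, 1000 depositor rows.
cards = {"brooklyn": 1, "account": 1000, "depositor": 1000}
selectivity = {
    frozenset(("brooklyn", "account")): 0.1,
    frozenset(("account", "depositor")): 0.001,
}
cost, plan = best_join_order(cards, selectivity)
```

Under these assumed statistics the cheapest plan joins the Brooklyn branches with `account` first and brings in `depositor` last, matching the join-ordering example earlier in the lecture.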

27
Heuristic Optimization
• Cost-based optimization is expensive, even with dynamic programming.
• Systems may use heuristics to reduce the number of choices that must be made in a cost-based fashion.
• Heuristic optimization transforms the query tree by using a set of rules that typically (but not in all cases) improve execution performance:
  • Perform selection early (reduces the number of tuples)
  • Perform projection early (reduces the number of attributes)
  • Perform the most restrictive selection and join operations (i.e. those with the smallest result size) before other similar operations.
• Some systems use only heuristics; others combine heuristics with partial cost-based optimization.
