CH 14 Updated
Introduction
● Alternative ways of evaluating a given query
● Equivalent expressions
● Different algorithms for each operation
● select customer_name from branch, account, depositor where
branch.branch_name = account.branch_name and account.account_number = depositor.account_number
and branch.branch_city = 'Brooklyn';
Introduction (Cont.)
● An evaluation plan defines exactly what algorithm is used for each operation, and how
the execution of the operations is coordinated.
Introduction (Cont.)
Generating Equivalent Expressions
Transformation of Relational Expressions
● Two relational algebra expressions are said to be equivalent if the two expressions
generate the same set of tuples on every legal database instance
● Note: order of tuples is irrelevant
● In SQL, inputs and outputs are multisets of tuples
● Two expressions in the multiset version of the relational algebra are said to be
equivalent if the two expressions generate the same multiset of tuples on every
legal database instance.
● An equivalence rule says that expressions of two forms are equivalent; we can
replace an expression of the first form by one of the second, or vice versa
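The difference between the set and multiset definitions can be seen on a tiny instance. The sketch below is illustrative only: the relation `r` and the helper `union_all` are made up, and Python's `set` and `collections.Counter` stand in for sets and multisets of tuples.

```python
from collections import Counter

# Hypothetical one-attribute relation that contains a duplicate tuple.
r = [("Brooklyn",), ("Brooklyn",), ("Queens",)]


def union_all(a, b):
    """Multiset union: keeps duplicates, like SQL UNION ALL."""
    return list(a) + list(b)


e1 = union_all(r, r)  # expression 1: r UNION ALL r
e2 = list(r)          # expression 2: r

# Under set semantics the two expressions agree on this instance ...
same_as_sets = set(e1) == set(e2)
# ... but under multiset semantics they do not, because tuple counts differ.
same_as_multisets = Counter(e1) == Counter(e2)

print(same_as_sets, same_as_multisets)
```

Agreement on one instance does not prove equivalence, since the definition quantifies over every legal database instance; a single disagreeing instance, however, is enough to show that two expressions are not equivalent.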
Equivalence Rules
1. Conjunctive selection operations can be deconstructed into a sequence of
individual selections.
3. Only the last in a sequence of projection operations is needed; the others can be
omitted.
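The formulas for these two rules did not survive in the text. In the usual textbook notation they can be written as below, where E is any relational-algebra expression, θ1 and θ2 are predicates, and L1, ..., Ln are attribute lists (the projection rule assumes each list is contained in the next, so that the nested projections are well defined).

```latex
% Rule 1: a conjunctive selection splits into a cascade of selections
\[ \sigma_{\theta_1 \wedge \theta_2}(E) = \sigma_{\theta_1}(\sigma_{\theta_2}(E)) \]

% Rule 3: only the outermost projection in a cascade is needed
\[ \Pi_{L_1}(\Pi_{L_2}(\ldots(\Pi_{L_n}(E))\ldots)) = \Pi_{L_1}(E) \]
```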
Equivalence Rules (Cont.)
5. Theta-join operations (and natural joins) are commutative.
Pictorial Depiction of Equivalence Rules
Equivalence Rules (Cont.)
7. The selection operation distributes over the theta join operation under the following
two conditions:
(a) When all the attributes in θ0 involve only the attributes of one of the expressions
(E1) being joined.
(b) When θ1 involves only the attributes of E1 and θ2 involves only the attributes of E2.
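Written out in the usual textbook notation, the two cases of rule 7 read as follows. This is a reconstruction from the conditions stated above, with θ as the join predicate and θ0, θ1, θ2 as the selection predicates.

```latex
% (a) theta_0 mentions only attributes of E_1
\[ \sigma_{\theta_0}(E_1 \bowtie_{\theta} E_2) = (\sigma_{\theta_0}(E_1)) \bowtie_{\theta} E_2 \]

% (b) theta_1 mentions only attributes of E_1, theta_2 only attributes of E_2
\[ \sigma_{\theta_1 \wedge \theta_2}(E_1 \bowtie_{\theta} E_2)
   = (\sigma_{\theta_1}(E_1)) \bowtie_{\theta} (\sigma_{\theta_2}(E_2)) \]
```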
Equivalence Rules (Cont.)
8. The projection operation distributes over the theta join operation as follows:
(a) if θ involves only attributes from L1 ∪ L2:
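The formula for case (a) is missing from the text. In the usual textbook form it reads as below, under the assumption (not stated in the surviving text) that L1 and L2 are sets of attributes from E1 and E2 respectively.

```latex
% Rule 8(a): theta mentions only attributes in L_1 \cup L_2
\[ \Pi_{L_1 \cup L_2}(E_1 \bowtie_{\theta} E_2)
   = (\Pi_{L_1}(E_1)) \bowtie_{\theta} (\Pi_{L_2}(E_2)) \]
```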
Equivalence Rules (Cont.)
9. The set operations union and intersection are commutative
Equivalence Rules (Cont.)
11. The selection operation distributes over ∪, ∩ and –.
Also:
and similarly for ∩ in place of –, but not for ∪
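Rule 11 and its "Also" variant can be checked on a small example. The sketch below models relations as Python sets of integers; the relations `e1` and `e2`, the predicate `is_even`, and the helper `select` are all made up for illustration.

```python
# Hypothetical relations over a single integer attribute, modelled as sets.
e1 = {1, 2, 3, 4, 5, 6}
e2 = {2, 4, 7, 8}


def is_even(t):
    return t % 2 == 0


def select(pred, rel):
    """Selection: keep the tuples of rel that satisfy pred."""
    return {t for t in rel if pred(t)}


# Selection distributes over union, intersection and difference.
dist_union = select(is_even, e1 | e2) == select(is_even, e1) | select(is_even, e2)
dist_inter = select(is_even, e1 & e2) == select(is_even, e1) & select(is_even, e2)
dist_diff = select(is_even, e1 - e2) == select(is_even, e1) - select(is_even, e2)

# "Also" form: selecting on the left input alone suffices for difference
# and for intersection ...
left_only_diff = select(is_even, e1 - e2) == select(is_even, e1) - e2
left_only_inter = select(is_even, e1 & e2) == select(is_even, e1) & e2
# ... but not for union: the odd tuple 7 from e2 survives on the right-hand side.
left_only_union = select(is_even, e1 | e2) == select(is_even, e1) | e2

print(dist_union, dist_inter, dist_diff)
print(left_only_diff, left_only_inter, left_only_union)
```

Only `left_only_union` is false on this instance, which matches the "but not for ∪" remark.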
Transformation Example: Pushing Selections Early
● Query: Find the names of all customers who have an account at some branch
located in Brooklyn.
Πcustomer_name(σbranch_city = “Brooklyn” (branch ⋈ (account ⋈ depositor)))
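The effect of pushing the selection down to branch (rule 7a) can be seen on a toy instance. The sketch below is not how a real engine evaluates the query; the rows, the reduced schemas, and the helpers `natural_join`, `select` and `project` are assumptions made for illustration.

```python
# Toy instance; all rows are made up.
branch = [
    {"branch_name": "Downtown", "branch_city": "Brooklyn"},
    {"branch_name": "Perryridge", "branch_city": "Horseneck"},
    {"branch_name": "Redwood", "branch_city": "Palo Alto"},
]
account = [
    {"account_number": "A-101", "branch_name": "Downtown"},
    {"account_number": "A-102", "branch_name": "Perryridge"},
    {"account_number": "A-222", "branch_name": "Redwood"},
    {"account_number": "A-305", "branch_name": "Perryridge"},
]
depositor = [
    {"customer_name": "Johnson", "account_number": "A-101"},
    {"customer_name": "Hayes", "account_number": "A-102"},
    {"customer_name": "Lindsay", "account_number": "A-222"},
    {"customer_name": "Turner", "account_number": "A-305"},
]


def natural_join(r, s):
    """Naive nested-loop natural join on all shared attribute names."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out


def select(pred, r):
    return [t for t in r if pred(t)]


def project(attrs, r):
    # Set semantics: duplicates are removed.
    return {tuple(t[a] for a in attrs) for t in r}


def in_brooklyn(t):
    return t["branch_city"] == "Brooklyn"


# Original form: select after the three-way join.
full_join = natural_join(branch, natural_join(account, depositor))
late = project(["customer_name"], select(in_brooklyn, full_join))

# Transformed form: select on branch first, then join.
pushed_join = natural_join(select(in_brooklyn, branch),
                           natural_join(account, depositor))
early = project(["customer_name"], pushed_join)

print(late == early, len(full_join), len(pushed_join))
```

Both forms return the same customer names, but the join that follows the early selection produces fewer tuples on this instance.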
Example with Multiple Transformations
● Query: Find the names of all customers with an account at a Brooklyn branch whose
account balance is over $1000.
Πcustomer_name(σbranch_city = “Brooklyn” ∧ balance > 1000 (branch ⋈ (account ⋈ depositor)))
Multiple Transformations (Cont.)
Pushing selection early
Transformation Example: Pushing Projections
● Query: Find the names of all customers with an account at a Brooklyn
branch.
Πcustomer_name((σbranch_city = “Brooklyn” (branch) ⋈ account) ⋈ depositor)
● When we compute
(σbranch_city = “Brooklyn” (branch) ⋈ account)
we obtain an intermediate relation that carries every attribute of branch and account.
● Instead of keeping all of these attributes in the intermediate schema, push
projections using equivalence rules 8a and 8b so that only account_number is
projected; this eliminates unneeded attributes.
● Performing the projection as early as possible reduces the size of the relation to
be joined.
Join Ordering Example
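● Join is associative: for all relations r1, r2 and r3, (r1 ⋈ r2) ⋈ r3 = r1 ⋈ (r2 ⋈ r3)
● If r2 ⋈ r3 is quite large and r1 ⋈ r2 is small, we choose
(r1 ⋈ r2) ⋈ r3
so that we compute and store a smaller temporary relation.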
Join Ordering Example (Cont.)
● Consider the expression
Πcustomer_name ((σbranch_city = “Brooklyn” (branch)) ⋈ (account ⋈ depositor))
● Could compute account ⋈ depositor first, and join the result with
σbranch_city = “Brooklyn” (branch)
● However, only a small fraction of the bank’s customers are likely to have accounts in
branches located in Brooklyn
● So it is better to compute
σbranch_city = “Brooklyn” (branch) ⋈ account
first.
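The two join orders can be compared by the size of the temporary relation each one builds. The sketch below uses a synthetic instance whose sizes are pure assumptions (10 branches, one of them in Brooklyn, 1000 accounts, one depositor row per account) and reduced schemas; it illustrates the argument and is not a measurement.

```python
# Synthetic instance; every size here is an assumption made for illustration.
branch = [(f"B{i}", "Brooklyn" if i == 0 else "Elsewhere") for i in range(10)]
account = [(f"A{i}", f"B{i % 10}") for i in range(1000)]  # (account_number, branch_name)
depositor = [(f"C{i}", f"A{i}") for i in range(1000)]     # (customer_name, account_number)

brooklyn_names = {name for name, city in branch if city == "Brooklyn"}

# Index depositor by account_number so both orders use the same join method.
customers_of = {}
for customer, acc_no in depositor:
    customers_of.setdefault(acc_no, []).append(customer)

# Order 1: account ⋈ depositor first, then join with the selected branches.
acc_dep = [(acc_no, b, c)
           for acc_no, b in account
           for c in customers_of.get(acc_no, [])]
result1 = {c for acc_no, b, c in acc_dep if b in brooklyn_names}

# Order 2: selected branches ⋈ account first, then join with depositor.
br_acc = [(acc_no, b) for acc_no, b in account if b in brooklyn_names]
result2 = {c for acc_no, b in br_acc for c in customers_of.get(acc_no, [])}

print(len(acc_dep), len(br_acc), result1 == result2)
```

On this instance the first order materializes 1000 temporary tuples and the second only 100, while both produce the same answer.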
Enumeration of Equivalent Expressions
Implementing Transformation Based Optimization
● Space requirements reduced by sharing common sub-expressions:
● When E1 is generated from E2 by an equivalence rule, usually only the top levels of the two
differ; the subtrees below are the same and can be shared using pointers
• E.g. when applying join commutativity (E1 ⋈ E2 and E2 ⋈ E1 give the same result, so
both expressions can point to the same subtrees for E1 and E2)
Cost Estimation
● Cost of each operator computed as described in previous chapter.
● Need statistics of input relations
• E.g. number of tuples, sizes of tuples
● Inputs can be results of sub-expressions
● Need to estimate statistics of expression results
● To do so, we require additional statistics
• E.g. number of distinct values for an attribute
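As an illustration of how such statistics feed an estimate, the sketch below applies one commonly used textbook estimate for the size of a join on a single attribute A: n_r * n_s / max(V(A, r), V(A, s)), where V(A, r) is the number of distinct values of A in r. The formula assumes uniformly distributed values and is not taken from the slides; the relations `r` and `s` and both helper functions are hypothetical.

```python
def distinct_values(rel, i):
    """V(A, rel): number of distinct values in column i of rel."""
    return len({t[i] for t in rel})


def estimate_join_size(n_r, n_s, v_r, v_s):
    """Estimated tuple count of r ⋈ s on one attribute (uniformity assumed)."""
    return n_r * n_s / max(v_r, v_s)


# Made-up relations joined on their first column.
r = [(i % 4, f"r{i}") for i in range(8)]  # 8 tuples, 4 distinct join values
s = [(i % 2, f"s{i}") for i in range(6)]  # 6 tuples, 2 distinct join values

estimate = estimate_join_size(len(r), len(s),
                              distinct_values(r, 0), distinct_values(s, 0))
actual = sum(1 for t in r for u in s if t[0] == u[0])

print(estimate, actual)
```

The estimate matches the true size here only because the toy data is perfectly uniform; on skewed data the two can differ substantially.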
Choice of Evaluation Plans
● Must consider the interaction of evaluation techniques when choosing evaluation plans
● Choosing the cheapest algorithm for each operation independently may not
yield the best overall plan. E.g.
• merge-join may be costlier than hash-join, but may provide a sorted output
which reduces the cost for an outer level aggregation.
• nested-loop join may provide opportunity for pipelining
● Practical query optimizers incorporate elements of the following two broad
approaches:
1. Search all the plans and choose the best plan in a cost-based fashion.
2. Use heuristics to choose a plan.
Cost-Based Optimization
● Consider finding the best join order for r1 ⋈ r2 ⋈ . . . ⋈ rn.
● There are (2(n – 1))!/(n – 1)! different join orders for the above expression. For example,
with n = 5, the number is 1680; with n = 10, the number is greater than 17.6 billion!
● With n = 3, the number is 12. Therefore, if r1, r2 and r3 are three relations, there are
12 join orders in total, as follows.
r1 ⋈ (r2 ⋈ r3)   r1 ⋈ (r3 ⋈ r2)   (r2 ⋈ r3) ⋈ r1   (r3 ⋈ r2) ⋈ r1
r2 ⋈ (r1 ⋈ r3)   r2 ⋈ (r3 ⋈ r1)   (r1 ⋈ r3) ⋈ r2   (r3 ⋈ r1) ⋈ r2
r3 ⋈ (r1 ⋈ r2)   r3 ⋈ (r2 ⋈ r1)   (r1 ⋈ r2) ⋈ r3   (r2 ⋈ r1) ⋈ r3
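The count (2(n – 1))!/(n – 1)! is easy to evaluate directly. The short sketch below does so for n = 3, 5 and 10; the function name `num_join_orders` is made up.

```python
from math import factorial


def num_join_orders(n):
    """Number of join orders for n relations: (2(n - 1))! / (n - 1)!."""
    return factorial(2 * (n - 1)) // factorial(n - 1)


for n in (3, 5, 10):
    print(n, num_join_orders(n))
```

For n = 10 the formula evaluates to 17,643,225,600, i.e. about 17.6 billion.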
● No need to generate all the join orders. Using dynamic programming, the
least-cost join order for any subset of {r1, r2, . . . rn} is computed only
once and stored for future use.
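The idea of computing the best plan for each subset once and reusing it can be sketched with memoization. Everything below is a toy under stated assumptions: the relation sizes, the uniform selectivity (each join keeps 1/100 of the cross product), and the cost model (cost of both inputs plus the size of the result) are invented for illustration and are not the cost formulas of any real optimizer.

```python
from functools import lru_cache
from itertools import combinations

# Hypothetical statistics.
SIZES = {"r1": 1000, "r2": 100, "r3": 10}
DIVISOR = 100  # assumed: each join keeps 1/100 of the cross product


def result_size(rels):
    """Estimated size of joining all relations in rels (toy model)."""
    size = 1
    for r in rels:
        size *= SIZES[r]
    return size // DIVISOR ** (len(rels) - 1)


@lru_cache(maxsize=None)
def best_plan(rels):
    """Return (cost, plan) for the frozenset rels; each subset is solved once."""
    if len(rels) == 1:
        (r,) = rels
        return 0, r
    best = None
    members = sorted(rels)
    # Try every way of splitting rels into two non-empty halves.
    for k in range(1, len(members)):
        for left in combinations(members, k):
            left = frozenset(left)
            right = rels - left
            lcost, lplan = best_plan(left)    # reused from the cache if seen
            rcost, rplan = best_plan(right)
            cost = lcost + rcost + result_size(rels)
            if best is None or cost < best[0]:
                best = (cost, (lplan, rplan))
    return best


cost, plan = best_plan(frozenset(SIZES))
print(cost, plan)
```

Because `lru_cache` stores the result for every subset, the three-relation example solves only its 7 non-empty subsets, each once. This sketch considers all splits (bushy plans included) and reports only the first cheapest plan it finds.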
Optimization
Interesting Sort Orders
Heuristic Optimization
● Cost-based optimization is expensive, even with dynamic programming.
● Systems may use heuristics to reduce the number of choices that must be
made in a cost-based fashion.
● Heuristic optimization transforms the query-tree by using a set of rules that typically
(but not in all cases) improve execution performance:
● Perform selection early (reduces the number of tuples)
● Perform projection early (reduces the number of attributes)
● Perform most restrictive selection and join operations (i.e. with smallest result
size) before other similar operations.
● Some systems use only heuristics, others combine heuristics with partial cost-
based optimization.
Structure of Query Optimizers
● Many optimizers consider left-deep join orders.
● Plus heuristics to push selections and projections down the query tree
● Reduces optimization complexity and generates plans amenable to pipelined
evaluation.
BUT
● Even with the use of heuristics, cost-based query optimization imposes a substantial
overhead.
● But it is worth it for expensive queries
● Optimizers often use simple heuristics for very cheap queries, and perform
exhaustive enumeration for more expensive queries
End of Chapter