CH 13 Updated
Basic Steps in Query Processing (Cont.)
● Parsing and translation
• Translate the query into its internal form, which is then translated into relational algebra.
• The parser checks the syntax and verifies that the referenced relations exist.
● Evaluation
• The query-execution engine takes a query-evaluation plan, executes that plan, and returns the answers to the query.
Basic Steps in Query Processing: Optimization
● A relational algebra expression may have many equivalent expressions
● E.g., $\sigma_{balance<2500}(\Pi_{balance}(account))$ is equivalent to $\Pi_{balance}(\sigma_{balance<2500}(account))$
● Each relational algebra operation can be evaluated using one of several different
algorithms
● Correspondingly, a relational-algebra expression can be evaluated in many ways.
● An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
• E.g., use an index on balance to find accounts with balance < 2500, or perform a complete relation scan and discard accounts with balance ≥ 2500
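A minimal sketch of the equivalence above, assuming relations are modelled as Python lists of dicts with set semantics for projection. The sample `account` rows and the helper functions `select` and `project` are hypothetical. Both orderings of σ and Π give the same answer; they differ only in how much work each step does.

```python
# Hypothetical helpers: relations are lists of dicts; projection removes
# duplicates (set semantics), as in relational algebra.
def select(pred, relation):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in relation if pred(t)]

def project(attrs, relation):
    """Pi: keep only the listed attributes, eliminating duplicate rows."""
    seen, out = set(), []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(dict(zip(attrs, row)))
    return out

# Hypothetical sample data for the account relation.
account = [
    {"account_number": "A-101", "balance": 500},
    {"account_number": "A-102", "balance": 4000},
    {"account_number": "A-215", "balance": 700},
    {"account_number": "A-305", "balance": 500},
]

low = lambda t: t["balance"] < 2500
plan_1 = select(low, project(["balance"], account))  # sigma(Pi(account))
plan_2 = project(["balance"], select(low, account))  # Pi(sigma(account))
print(plan_1 == plan_2)  # True
```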
Basic Steps: Optimization (Cont.)
● Query optimization: among all equivalent evaluation plans, choose the one with the lowest cost.
● Cost is estimated using statistical information from the database catalog
• e.g., number of tuples in each relation, size of tuples, etc.
Measures of Query Cost
Measures of Query Cost (Cont.)
● For simplicity we just use the number of block transfers from disk and the number of
seeks as the cost measures
● $t_T$: time to transfer one block
● $t_S$: time for one seek
● Cost for b block transfers plus S seeks: $b \cdot t_T + S \cdot t_S$
● We ignore CPU costs for simplicity
• Real systems do take CPU cost into account
● Also, we do not include the cost of writing output to disk in our cost formulae
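The cost formula can be sketched as a one-line function and used to pick the cheaper of two candidate plans, as the optimizer does. The timings (100 µs per transfer, 4000 µs per seek) and the two plans with their transfer and seek counts are hypothetical numbers chosen only to show how seeks can dominate.

```python
def io_cost(b: int, S: int, t_T: float, t_S: float) -> float:
    """Estimated I/O time for b block transfers and S seeks: b*t_T + S*t_S.
    CPU time and the cost of writing the output are ignored."""
    return b * t_T + S * t_S

# Hypothetical timings in microseconds.
T_TRANSFER, T_SEEK = 100, 4000

# Hypothetical candidate plans, as (block transfers, seeks).
plans = {
    "contiguous scan": (100, 1),   # 100 contiguous blocks after one seek
    "scattered reads": (20, 20),   # 20 random block reads, one seek each
}
costs = {name: io_cost(b, S, T_TRANSFER, T_SEEK) for name, (b, S) in plans.items()}
best = min(costs, key=costs.get)   # choose the plan with the lowest cost
print(costs, best)
```

Here the scan reads five times as many blocks but still wins (14000 µs vs 82000 µs) because it needs only one seek.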
Selection Operation – Algorithms List
● Search algorithms are used to search for and retrieve records that fulfill a selection condition.
Selection Operation – Algorithms (Cont.)
● File scan – search algorithms that locate and retrieve records that fulfill a selection
condition.
● Algorithm A1 (linear search). Scan each file block and test all records to see
whether they satisfy the selection condition.
● Cost estimate = $b_r$ block transfers + 1 seek (one seek is required to access the first block of the file; the remaining blocks can then be read without further seeks if they are stored contiguously. If they are not stored contiguously, extra seeks may be required.)
• $b_r$ denotes the number of blocks containing records from relation r
● If the selection is an equality comparison on a key attribute, the scan can stop on finding the record, because key values are unique
• The average cost is ($b_r$/2) block transfers + 1 seek
• The worst case is $b_r$ block transfers + 1 seek
● Linear search can be applied to any file regardless of
• selection condition,
• ordering of records in the file, or
• availability of indices
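A minimal sketch of A1, assuming a file is modelled as a list of blocks (each block a list of dict records) stored contiguously. The function name `linear_search` and its `key_equality` flag are hypothetical; the function returns the matches together with the block transfers and seeks it used, so the cost estimates above can be checked.

```python
from typing import Callable, List, Tuple

Block = List[dict]

def linear_search(blocks: List[Block], pred: Callable[[dict], bool],
                  key_equality: bool = False) -> Tuple[List[dict], int, int]:
    """A1: scan every block and test every record.
    Returns (matching records, block transfers, seeks). Assumes contiguous
    blocks, so one seek suffices. If key_equality is True, the predicate is
    an equality on a key, so the scan stops at the first match."""
    result: List[dict] = []
    transfers = 0
    seeks = 1 if blocks else 0
    for block in blocks:
        transfers += 1                 # read the next block
        for rec in block:
            if pred(rec):
                result.append(rec)
                if key_equality:       # key values are unique: stop early
                    return result, transfers, seeks
    return result, transfers, seeks

# Hypothetical file: 8 records, 2 per block, 4 blocks.
records = [{"id": i, "balance": 100 * i} for i in range(1, 9)]
blocks = [records[i:i + 2] for i in range(0, 8, 2)]
found, transfers, seeks = linear_search(blocks, lambda r: r["id"] == 5, key_equality=True)
found2, transfers2, seeks2 = linear_search(blocks, lambda r: r["balance"] < 350)
print(transfers, transfers2)  # 3 4
```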
Selection Operation (Cont.)
● A2 (binary search). Applicable if the selection is an equality comparison on the attribute on which the file is ordered.
● Assume that the blocks of a relation are stored contiguously
● Cost estimate (number of disk blocks to be scanned):
• Cost of locating the first tuple that satisfies the condition by a binary search on the blocks:
• $\lceil \log_2(b_r) \rceil \cdot (t_T + t_S)$
Here, $\lceil \log_2(b_r) \rceil$ is the number of blocks examined in the worst case, and $(t_T + t_S)$ is the time cost of examining one block ($t_T$ is the block transfer time and $t_S$ is the seek time), since every block examined requires its own seek.
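A minimal sketch of A2, assuming the file is a list of non-empty blocks sorted on the selection attribute. The name `binary_search_blocks` and the sample keys are hypothetical. The binary search counts each block it probes (one seek plus one transfer per probe, matching the $(t_T + t_S)$ factor above), then scans forward through the contiguous blocks for further matches.

```python
from typing import List, Tuple

def binary_search_blocks(blocks: List[List[dict]], attr: str,
                         value) -> Tuple[List[dict], int]:
    """A2: equality selection on the attribute the file is sorted on.
    Returns (matching records, number of blocks probed by the binary search)."""
    lo, hi, probes = 0, len(blocks), 0
    # Find the first block whose last record has attr >= value.
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1                        # one seek + one transfer
        if blocks[mid][-1][attr] < value:
            lo = mid + 1
        else:
            hi = mid
    # Matching records are adjacent; read the following blocks sequentially.
    matches: List[dict] = []
    for block in blocks[lo:]:
        for rec in block:
            if rec[attr] == value:
                matches.append(rec)
            elif rec[attr] > value:
                return matches, probes
    return matches, probes

# Hypothetical file sorted on balance: 16 records, 2 per block, 8 blocks.
keys = [10, 20, 30, 30, 30, 40, 50, 60, 70, 80, 90, 100, 110, 120, 130, 140]
records = [{"balance": k} for k in keys]
blocks = [records[i:i + 2] for i in range(0, len(records), 2)]
matches, probes = binary_search_blocks(blocks, "balance", 30)
print(len(matches), probes)  # 3 4
```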
Implementation of Complex Selections
Algorithms for Complex Selections
Sorting
● For relations that fit in memory, techniques like quicksort can be used.
• Sort-merge join (merge-join) can be applied only to joins whose condition uses the = operator, i.e., equi-joins and natural joins.
Join Operation
Nested-Loop Join
● Nested-loop join computes the theta join $r \bowtie_\theta s$ of two relations r and s, where θ is an arbitrary join condition (it may use comparison operators other than =).
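A minimal sketch of the tuple-at-a-time nested-loop join, assuming relations are Python lists of dicts and θ is any Python predicate over a pair of tuples. The function name and the sample employee and grade data are hypothetical. Result tuples are kept as (tr, ts) pairs so attributes with the same name in r and s do not clash.

```python
from typing import Callable, List, Tuple

def nested_loop_join(r: List[dict], s: List[dict],
                     theta: Callable[[dict, dict], bool]) -> List[Tuple[dict, dict]]:
    """Nested-loop join r ⋈_theta s: r is the outer relation, s the inner.
    Works for any join condition, not just equality."""
    result = []
    for tr in r:                  # one pass over the outer relation
        for ts in s:              # full scan of the inner relation per tr
            if theta(tr, ts):
                result.append((tr, ts))
    return result

# Hypothetical non-equality join: match each salary to its grade range.
employees = [{"name": "Ann", "salary": 3000}, {"name": "Bob", "salary": 5200}]
grades = [{"grade": "A", "lo": 0, "hi": 4000}, {"grade": "B", "lo": 4000, "hi": 6000}]
pairs = nested_loop_join(employees, grades,
                         lambda tr, ts: ts["lo"] <= tr["salary"] < ts["hi"])
print([(tr["name"], ts["grade"]) for tr, ts in pairs])  # [('Ann', 'A'), ('Bob', 'B')]
```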
Block Nested-Loop Join
● Variant of nested-loop join in which every block of inner relation is paired with
every block of outer relation.
for each block Br of r do begin
  for each block Bs of s do begin
    for each tuple tr in Br do begin
      for each tuple ts in Bs do begin
        Check if (tr, ts) satisfy the join condition;
        if they do, add tr • ts to the result
      end
    end
  end
end
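The pseudocode above can be sketched in Python as follows, assuming each relation is given as a list of blocks and only one buffer block is available per relation. The function name and sample data are hypothetical; the function also counts block reads so the cost can be seen.

```python
from typing import Callable, List, Tuple

Block = List[dict]

def block_nested_loop_join(r_blocks: List[Block], s_blocks: List[Block],
                           theta: Callable[[dict, dict], bool]):
    """Pair every block of the outer relation r with every block of the
    inner relation s. Returns (result pairs, block reads)."""
    result: List[Tuple[dict, dict]] = []
    reads = 0
    for Br in r_blocks:
        reads += 1                    # each outer block is read once
        for Bs in s_blocks:
            reads += 1                # each inner block is re-read per outer block
            for tr in Br:
                for ts in Bs:
                    if theta(tr, ts):
                        result.append((tr, ts))
    return result, reads

# Hypothetical relations: r has 2 blocks, s has 3 blocks.
r_blocks = [[{"id": 1}, {"id": 2}], [{"id": 3}, {"id": 4}]]
s_blocks = [[{"id": 2, "x": "a"}], [{"id": 3, "x": "b"}], [{"id": 5, "x": "c"}]]
pairs, reads = block_nested_loop_join(r_blocks, s_blocks,
                                      lambda tr, ts: tr["id"] == ts["id"])
print([(tr["id"], ts["x"]) for tr, ts in pairs], reads)  # [(2, 'a'), (3, 'b')] 8
```

With one buffer block per relation, the reads come to $b_r + b_r \cdot b_s$ (here 2 + 2·3 = 8), because the inner relation is scanned once per outer block rather than once per outer tuple.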
Indexed Nested-Loop Join
● In nested-loop join, if an index is available on the inner relation's join attribute and the join is an equi-join or natural join, index lookups can be used instead of file scans.
● For each tuple tr in the outer relation r (e.g., customer), use the index on the join attribute of s to look up tuples in s that satisfy the join condition with tuple tr.
● If indices are available on join attributes of both r and s, use the relation with
fewer tuples as the outer relation.
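A minimal sketch of indexed nested-loop join. A Python dict from key to tuples stands in for the index on s (a real system would use an on-disk B+-tree or hash index); `build_index`, `indexed_nested_loop_join`, and the customer/depositor sample rows are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

def build_index(s: List[dict], attr: str) -> Dict[object, List[dict]]:
    """Stand-in for an index on s.attr: maps each key to its tuples in s."""
    index: Dict[object, List[dict]] = defaultdict(list)
    for ts in s:
        index[ts[attr]].append(ts)
    return index

def indexed_nested_loop_join(r: List[dict], s_index: Dict[object, List[dict]],
                             r_attr: str) -> List[Tuple[dict, dict]]:
    """For each tuple tr of the outer relation r, probe the index on the
    inner relation s instead of scanning s."""
    result = []
    for tr in r:
        for ts in s_index.get(tr[r_attr], []):   # index lookup, no file scan
            result.append((tr, ts))
    return result

# Hypothetical data: customer is the outer relation r, depositor is s.
customer = [{"customer_name": "Hayes"}, {"customer_name": "Jones"},
            {"customer_name": "Smith"}]
depositor = [{"customer_name": "Hayes", "account_number": "A-102"},
             {"customer_name": "Jones", "account_number": "A-217"},
             {"customer_name": "Hayes", "account_number": "A-101"}]
idx = build_index(depositor, "customer_name")
pairs = indexed_nested_loop_join(customer, idx, "customer_name")
print([ts["account_number"] for _, ts in pairs])  # ['A-102', 'A-101', 'A-217']
```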
Merge-Join (Sort-Merge-Join)
Merge-Join (Cont.)
● Hybrid merge-join: applicable if one relation is sorted and the other has a secondary B+-tree index on the join attribute.
● Merge the sorted relation with the leaf entries of the B+-tree. The result file will
contain tuples from the sorted relation and addresses for tuples of the unsorted
relation.
● Sort the result file on the addresses of the unsorted relation’s tuples
● Scan the unsorted relation in physical address order and merge with previous
result, to replace addresses by the actual tuples
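The three hybrid merge-join steps can be sketched as below for an equi-join. This assumes the B+-tree leaf level is given as a key-ordered list of (key, address) pairs and that an "address" is simply a position in the unsorted relation's list; all names and sample data are hypothetical.

```python
from typing import List, Tuple

def hybrid_merge_join(r_sorted: List[dict], leaf_entries: List[Tuple[object, int]],
                      s_file: List[dict], attr: str) -> List[Tuple[dict, dict]]:
    """r_sorted: relation sorted on attr.
    leaf_entries: (key, address) pairs from the leaves of a secondary
    B+-tree on s.attr, in key order.
    s_file: the unsorted relation s; an address is a position in it."""
    # Step 1: merge r with the leaf entries -> (tuple of r, address in s).
    partial: List[Tuple[dict, int]] = []
    i = j = 0
    while i < len(r_sorted) and j < len(leaf_entries):
        rk, sk = r_sorted[i][attr], leaf_entries[j][0]
        if rk < sk:
            i += 1
        elif rk > sk:
            j += 1
        else:
            k = j   # pair this r tuple with every leaf entry holding the key
            while k < len(leaf_entries) and leaf_entries[k][0] == rk:
                partial.append((r_sorted[i], leaf_entries[k][1]))
                k += 1
            i += 1  # j stays put: the next r tuple may share the key
    # Step 2: sort the intermediate result on the addresses of s's tuples.
    partial.sort(key=lambda pair: pair[1])
    # Step 3: visit s in address order, replacing addresses by tuples.
    return [(tr, s_file[addr]) for tr, addr in partial]

# Hypothetical data: s is unsorted; its index leaves list (key, address).
s_file = [{"id": 3, "v": "c"}, {"id": 1, "v": "a"},
          {"id": 2, "v": "b"}, {"id": 1, "v": "a2"}]
leaf_entries = [(1, 1), (1, 3), (2, 2), (3, 0)]
r_sorted = [{"id": 1, "w": "x"}, {"id": 3, "w": "y"}, {"id": 4, "w": "z"}]
out = hybrid_merge_join(r_sorted, leaf_entries, s_file, "id")
print([(tr["w"], ts["v"]) for tr, ts in out])  # [('y', 'c'), ('x', 'a'), ('x', 'a2')]
```

Note that the output comes out in the physical order of s, not in join-key order, because of step 2.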
Hash-Join
Hash-Join (Cont.)
Hash-Join (Cont.)
Handling of Overflows
● Partitioning is said to be skewed if some partitions have significantly more tuples than
some others
● Hash-table overflow occurs in partition si if si does not fit in memory. Possible reasons:
• Many tuples in s with the same value for the join attributes
• Bad hash function
● Overflow resolution
• Partition si is further partitioned using a different hash function.
• Partition ri must be partitioned in the same way as si.
● Overflow avoidance
• Perform partitioning carefully to avoid overflows
• E.g., partition the build relation into many small partitions, then combine them
● Both approaches fail with large numbers of duplicates
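A minimal sketch of a hash join with overflow resolution, assuming s is the build relation and "does not fit in memory" is modelled as a partition holding more than `mem_limit` tuples. The names `h`, `hash_join`, `n_parts`, `mem_limit`, and `max_depth` are hypothetical. A different hash function is obtained by changing the seed, and ri is repartitioned with the same function as si. Past `max_depth` (for example, when many tuples share one join value, where repartitioning cannot help) this sketch just builds the oversized table anyway.

```python
from typing import List, Tuple

def h(value, seed: int, n: int) -> int:
    """Partitioning hash function; a different seed gives a different function."""
    return hash((seed, value)) % n

def hash_join(r: List[dict], s: List[dict], attr: str, n_parts: int = 4,
              mem_limit: int = 2, seed: int = 0,
              max_depth: int = 3) -> List[Tuple[dict, dict]]:
    """Equi-join r ⋈ s on attr, using s as the build relation."""
    r_parts = [[] for _ in range(n_parts)]
    s_parts = [[] for _ in range(n_parts)]
    for tr in r:
        r_parts[h(tr[attr], seed, n_parts)].append(tr)
    for ts in s:
        s_parts[h(ts[attr], seed, n_parts)].append(ts)
    result: List[Tuple[dict, dict]] = []
    for r_i, s_i in zip(r_parts, s_parts):
        if len(s_i) > mem_limit and seed < max_depth:
            # Overflow resolution: repartition s_i, and r_i the same way,
            # with a different hash function (next seed).
            result += hash_join(r_i, s_i, attr, n_parts, mem_limit,
                                seed + 1, max_depth)
            continue
        table = {}
        for ts in s_i:                       # build phase
            table.setdefault(ts[attr], []).append(ts)
        for tr in r_i:                       # probe phase
            for ts in table.get(tr[attr], []):
                result.append((tr, ts))
    return result

# Hypothetical data with repeated join values.
r = [{"k": i % 5, "rid": i} for i in range(20)]
s = [{"k": i % 7, "sid": i} for i in range(14)]
print(len(hash_join(r, s, "k")))
```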
Complex Joins
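● Join with a conjunctive condition: compute one of the simpler joins $r \bowtie_{\theta_i} s$; the final result comprises the tuples of that intermediate result that satisfy the remaining conditions $\theta_1 \wedge \dots \wedge \theta_{i-1} \wedge \theta_{i+1} \wedge \dots \wedge \theta_n$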
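A minimal sketch of the conjunctive case, assuming each θ is a Python predicate over (tr, ts) and `i` is a 0-based position in the `conditions` list. The function name and data are hypothetical, and the simpler join is done with a plain nested loop here only for brevity; an index or hash join on θi could be used instead.

```python
def conjunctive_join(r, s, conditions, i):
    """r ⋈_{θ1 ∧ ... ∧ θn} s: evaluate the simpler join r ⋈_{θi} s first,
    then keep the pairs that also satisfy the remaining conditions."""
    theta_i = conditions[i]
    rest = conditions[:i] + conditions[i + 1:]
    intermediate = [(tr, ts) for tr in r for ts in s if theta_i(tr, ts)]
    return [(tr, ts) for tr, ts in intermediate
            if all(c(tr, ts) for c in rest)]

# Hypothetical relations and conditions: θ1 is r.a = s.a, θ2 is r.b > s.c.
r = [{"a": 1, "b": 5}, {"a": 2, "b": 1}]
s = [{"a": 1, "c": 3}, {"a": 2, "c": 4}]
conds = [lambda tr, ts: tr["a"] == ts["a"], lambda tr, ts: tr["b"] > ts["c"]]
print(conjunctive_join(r, s, conds, 0))  # [({'a': 1, 'b': 5}, {'a': 1, 'c': 3})]
```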
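● Join with a disjunctive condition: $r \bowtie_{\theta_1 \vee \theta_2 \vee \dots \vee \theta_n} s$
● Either use nested loops/block nested loops, or
● Compute as the union of the records in the individual joins $r \bowtie_{\theta_i} s$:
$(r \bowtie_{\theta_1} s) \cup (r \bowtie_{\theta_2} s) \cup \dots \cup (r \bowtie_{\theta_n} s)$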
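A minimal sketch of the union approach, assuming tuples are plain (hashable) Python tuples and each θ is a predicate over a pair. The function names and sample data are hypothetical. The union must be a set union, so a pair satisfying several θi appears only once.

```python
def join(r, s, theta):
    """Simple nested-loop join r ⋈_theta s over tuples."""
    return [(tr, ts) for tr in r for ts in s if theta(tr, ts)]

def disjunctive_join(r, s, conditions):
    """r ⋈_{θ1 ∨ ... ∨ θn} s computed as (r ⋈_θ1 s) ∪ ... ∪ (r ⋈_θn s)."""
    result = {}
    for theta in conditions:
        for pair in join(r, s, theta):
            result[pair] = None      # dict keys: keeps order, drops duplicates
    return list(result)

# Hypothetical relations: θ1 compares first fields, θ2 compares second fields.
r = [(1, 10), (2, 20), (1, 30)]
s = [(1, 30), (4, 20)]
conds = [lambda tr, ts: tr[0] == ts[0], lambda tr, ts: tr[1] == ts[1]]
print(disjunctive_join(r, s, conds))
```

The pair ((1, 30), (1, 30)) satisfies both conditions but appears once in the result.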
Other Operations
Other Operations : Aggregation
End of Chapter