CH 13 Updated


Chapter 13: Query Processing

Basic Steps in Query Processing

1. Parsing and translation


2. Optimization
3. Evaluation

Basic Steps in Query Processing (Cont.)
● Parsing and translation
● translate the query into its internal form. This is then translated into relational
algebra.
● Parser checks syntax, verifies relations
● Evaluation
● The query-execution engine takes a query-evaluation plan, executes that plan, and
returns the answers to the query.

Basic Steps in Query Processing :
Optimization
● A relational algebra expression may have many equivalent expressions
● E.g., σ_{balance<2500}(∏_{balance}(account)) is equivalent to
∏_{balance}(σ_{balance<2500}(account))
● Each relational algebra operation can be evaluated using one of several different
algorithms
● Correspondingly, a relational-algebra expression can be evaluated in many ways.
● An annotated expression specifying a detailed evaluation strategy is called an evaluation plan.
● E.g., can use an index on balance to find accounts with balance < 2500,
● or can perform complete relation scan and discard accounts with balance ≥ 2500

Basic Steps: Optimization (Cont.)

● Query Optimization: Amongst all equivalent evaluation plans choose the one with
lowest cost.
● Cost is estimated using statistical information from the database catalog
• e.g. number of tuples in each relation, size of tuples, etc.

Measures of Query Cost

● Cost is generally measured as total elapsed time for answering query


● Many factors contribute to time cost
• disk accesses, CPU, or even network communication
● Typically disk access is the main cost, and is also relatively easy to estimate. Measured
by taking into account
● Number of seeks * average-seek-cost
● Number of blocks read * average-block-read-cost
● Number of blocks written * average-block-write-cost
• Cost to write a block is greater than cost to read a block
– data is read back after being written to ensure that the write was successful

Measures of Query Cost (Cont.)
● For simplicity we just use the number of block transfers from disk and the number of
seeks as the cost measures
● tT – time to transfer one block
● tS – time for one seek
● Cost for b block transfers plus S seeks:
b * tT + S * tS
● We ignore CPU costs for simplicity
● Real systems do take CPU cost into account
● Also, we do not include the cost of writing output to disk in our cost formulae
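
The cost measure above can be written as a one-line function. This is only a sketch: the function name `disk_cost` and the device timings (100 µs per block transfer, 4000 µs per seek) are assumptions chosen for illustration, not values from the slides.

```python
def disk_cost(b, S, t_T, t_S):
    """Estimated time for b block transfers plus S seeks: b * tT + S * tS."""
    return b * t_T + S * t_S


# Hypothetical timings in microseconds (assumed, not from the slides).
T_TRANSFER_US = 100
T_SEEK_US = 4000

# Reading 100 contiguous blocks after a single seek.
print(disk_cost(b=100, S=1, t_T=T_TRANSFER_US, t_S=T_SEEK_US))
```

CPU time and the cost of writing the final output are deliberately left out, matching the simplifications listed above.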

Selection Operation – Algorithms List
● Search Algorithms are used to search and retrieve records that fulfill selection condition.

● File Scan : Entire relation (file) is scanned


● Two basic Algorithms
● Linear Search
● Binary Search
● Index Scan : Indexes are used to search records.
● Four Algorithms
● Primary index, equality on key
● Primary index, equality on nonkey
● Secondary index, equality on key
● Secondary index, equality on nonkey
● Algo. For Selection involving comparisons (<, <=, >, >=)
● Primary index, comparison
● Secondary index, comparison
● Algo. For complex selection (Conjunction, Disjunction, Negation)
● Conjunctive selection using one index
● Conjunctive selection using composite index
● Conjunctive selection by intersection of identifiers
● Disjunctive selection by union of identifiers

Selection Operation – Algo. (Cont…)

● File scan – search algorithms that locate and retrieve records that fulfill a selection
condition.
● Algorithm A1 (linear search). Scan each file block and test all records to see
whether they satisfy the selection condition.
● Cost estimate = br block transfers + 1 seek. (One seek is needed to reach the first block of the file; the remaining blocks need no further seeks if the file is stored contiguously. Otherwise, extra seeks may be required.)
• br denotes the number of blocks containing records from relation r
● If the selection is on a key attribute, the scan can stop as soon as the record is found, because the key value is unique
• The average cost is (br / 2) block transfers + 1 seek
• The worst case is still br block transfers + 1 seek
● Linear search can be applied to any file, regardless of
• the selection condition,
• the ordering of records in the file, or
• the availability of indices
Selection Operation (Cont.)
● A2 (binary search). Applicable if the selection is an equality comparison on the attribute on which the file is ordered.
● Assume that the blocks of the relation are stored contiguously
● Cost estimate:
• Cost of locating the first tuple that satisfies the condition by a binary search on the blocks:
⌈log2(br)⌉ * (tT + tS)
Here ⌈log2(br)⌉ is the number of blocks examined in the worst case, and (tT + tS) is the time cost per block (tT is the block transfer time and tS is the seek time).
• If multiple records satisfy the selection condition (selection on a non-key attribute)
– The cost of reading the extra blocks has to be added
– Add the transfer cost of the blocks containing records that satisfy the selection condition
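
To compare the two file-scan estimates side by side, the formulas for A1 and A2 can be sketched as below. The function names, the `extra_blocks` parameter, and the timings in the usage lines are assumptions made for this illustration.

```python
import math


def linear_search_cost(b_r, t_T, t_S, key_equality=False):
    """A1: b_r transfers + 1 seek; on average b_r / 2 transfers
    when the selection is an equality on a key attribute."""
    transfers = b_r / 2 if key_equality else b_r
    return transfers * t_T + t_S


def binary_search_cost(b_r, t_T, t_S, extra_blocks=0):
    """A2: ceil(log2(b_r)) block accesses, each costing a seek plus a
    transfer, plus one transfer per extra block of matching records."""
    return math.ceil(math.log2(b_r)) * (t_T + t_S) + extra_blocks * t_T


# Hypothetical relation of 1024 blocks; timings in microseconds (assumed).
print(linear_search_cost(1024, t_T=100, t_S=4000))
print(binary_search_cost(1024, t_T=100, t_S=4000))
```

With these assumed timings the binary search is cheaper, but it is only applicable when the file is ordered on the selection attribute.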
Selections Using Indices
● Index scan – search algorithms that use an index
● selection condition must be on the search-key of an index.
● A3 (primary index on primary key, equality). Retrieve the single record that satisfies the equality condition
● Cost = (hi + 1) * (tT + tS), where hi is the height of the B+-tree index
● A4 (primary index on nonkey, equality). Retrieve multiple records.
● Records will be on consecutive blocks
• Let b = number of blocks containing matching records
● Cost = hi * (tT + tS) + tS + tT * b
● A5 (secondary index on candidate key, equality).
● Retrieve a single record, since the search-key is a primary key (or candidate key)
• Cost = (hi + 1) * (tT + tS)
● A6 (secondary index on nonkey, equality).
● Retrieve multiple records, since the search-key is not a candidate key
• Each of the n matching records may be on a different block
• Cost = (hi + n) * (tT + tS)
– Can be very expensive!
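
The index-scan formulas can be evaluated directly to see why A6 is flagged as expensive. The function names and the numbers in the usage lines (index height 3, 5 matching blocks, 50 matching records, timings in microseconds) are assumed for illustration only.

```python
def a3_a5_cost(h_i, t_T, t_S):
    """A3 / A5: traverse the index (h_i levels) and fetch one record."""
    return (h_i + 1) * (t_T + t_S)


def a4_cost(h_i, b, t_T, t_S):
    """A4: traverse the index, then one seek and b consecutive transfers."""
    return h_i * (t_T + t_S) + t_S + t_T * b


def a6_cost(h_i, n, t_T, t_S):
    """A6: traverse the index, then a seek and a transfer per matching record."""
    return (h_i + n) * (t_T + t_S)


# Assumed values: h_i = 3, t_T = 100 us, t_S = 4000 us.
print(a3_a5_cost(3, 100, 4000))
print(a4_cost(3, b=5, t_T=100, t_S=4000))
print(a6_cost(3, n=50, t_T=100, t_S=4000))
```

In A6 every matching record may need its own seek, so the cost grows with n rather than with the number of blocks.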


Selections Involving Comparisons

● Selections of the form σ_{A ≤ v}(r) or σ_{A ≥ v}(r) can be implemented by using

● a linear file scan or binary search,
● or by using indices in the following ways:
● A7 (primary index, comparison). (Relation is sorted on A)
• For σ_{A ≥ v}(r), use the index to find the first tuple ≥ v and scan the relation sequentially from there
• For σ_{A ≤ v}(r), just scan the relation sequentially until the first tuple > v; do not use the index
● A8 (secondary index, comparison).
• For σ_{A ≥ v}(r), use the index to find the first index entry ≥ v and scan the index sequentially from there to find pointers to records.
• For σ_{A ≤ v}(r), just scan the leaf pages of the index, finding pointers to records, until the first entry > v
• In either case, retrieve the records that are pointed to
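
A small in-memory sketch of A7 follows. It assumes the relation is a Python list already sorted on A, and it uses `bisect` as a stand-in for the primary index lookup; the function names are hypothetical.

```python
import bisect


def select_ge(sorted_values, v):
    """A >= v: use the 'index' to find the first value >= v,
    then scan sequentially from there."""
    start = bisect.bisect_left(sorted_values, v)
    return sorted_values[start:]


def select_le(sorted_values, v):
    """A <= v: scan from the start until the first value > v (no index)."""
    result = []
    for a in sorted_values:
        if a > v:
            break
        result.append(a)
    return result


balances = [100, 500, 900, 2500, 3000]  # assumed sample data, sorted on A
print(select_ge(balances, 900))
print(select_le(balances, 900))
```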

Implementation of Complex Selections

● Conjunction: σ_{θ1 ∧ θ2 ∧ ... ∧ θn}(r)

● A9 (conjunctive selection using one index).
● Select a combination of θi and algorithms A1 through A8 that results in the least cost for σ_{θi}(r).
● Test other conditions on tuple after fetching it into memory buffer.
● A10 (conjunctive selection using multiple-key index).
● Use appropriate composite (multiple-key) index if available.
● A11 (conjunctive selection by intersection of identifiers).
● Requires indices with record pointers.
● Use corresponding index for each condition, and take intersection of all the
obtained sets of record pointers.
● Then fetch records from file
● If some conditions do not have appropriate indices, apply test in memory.
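
A11 can be sketched with dictionaries standing in for indices that store record pointers. Everything here is an assumption for illustration: record ids play the role of pointers, and `build_index` and `select_by_intersection` are hypothetical helper names.

```python
def build_index(records, attr):
    """Map each value of attr to the set of record ids that hold it."""
    index = {}
    for rid, rec in records.items():
        index.setdefault(rec[attr], set()).add(rid)
    return index


def select_by_intersection(records, indices, equalities):
    """equalities: non-empty list of (attr, value) pairs, each attr indexed.
    Intersect the pointer sets first, then fetch the records."""
    id_sets = [indices[attr].get(value, set()) for attr, value in equalities]
    rids = set.intersection(*id_sets)
    return [records[rid] for rid in sorted(rids)]


records = {  # assumed sample data
    1: {"branch": "Perryridge", "balance": 400},
    2: {"branch": "Perryridge", "balance": 900},
    3: {"branch": "Brighton", "balance": 400},
}
indices = {a: build_index(records, a) for a in ("branch", "balance")}
print(select_by_intersection(records, indices,
                             [("branch", "Perryridge"), ("balance", 400)]))
```

Only records whose pointers survive the intersection are fetched, which is the point of doing the set operation before touching the file.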

Algorithms for Complex Selections

● Disjunction: σ_{θ1 ∨ θ2 ∨ ... ∨ θn}(r)


● A12 (disjunctive selection by union of identifiers).
● Applicable if all conditions have available indices.
• Otherwise use linear scan.
● Use corresponding index for each condition, and take union of all the obtained sets of
record pointers.
● Then fetch records from file
● Negation: σ¬θ(r)
● Use linear scan on file
● If very few records satisfy ¬θ, and an index is applicable to θ
• Find satisfying records using index and fetch from file
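
The union step of A12 can be sketched in the same style, again with dictionaries standing in for indices and record ids standing in for record pointers. The helper names and sample data are assumptions.

```python
def build_index(records, attr):
    """Map each value of attr to the set of record ids that hold it."""
    index = {}
    for rid, rec in records.items():
        index.setdefault(rec[attr], set()).add(rid)
    return index


def select_by_union(records, indices, equalities):
    """equalities: list of (attr, value) pairs, every attr must be indexed.
    Union the pointer sets, then fetch each record once."""
    rids = set()
    for attr, value in equalities:
        rids |= indices[attr].get(value, set())
    return [records[rid] for rid in sorted(rids)]


records = {  # assumed sample data
    1: {"branch": "Perryridge", "balance": 400},
    2: {"branch": "Perryridge", "balance": 900},
    3: {"branch": "Brighton", "balance": 400},
}
indices = {a: build_index(records, a) for a in ("branch", "balance")}
print(select_by_union(records, indices,
                      [("branch", "Brighton"), ("balance", 900)]))
```

Because the pointers are collected in a set, a record that satisfies several conditions is fetched only once.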

Sorting

● Sorting is important in a DBMS because:

● A query may ask for tuples to be displayed in sorted order.
● Several relational operations, such as join, can be implemented efficiently if the input is sorted first.

● A relation could be sorted by building an index on it.

● An index orders the relation only logically, not physically. Reading tuples in sorted order through the index may therefore require one disk access (a seek plus a block transfer) per tuple, which can be very expensive because there are usually many more records than blocks. It is therefore desirable to order the records physically.

● For relations that fit in memory, techniques like quicksort can be used.

● For relations that don't fit in memory, external sort-merge is a good choice.
Example: External Sorting Using Sort-Merge
(Merge-Join)

• Merge-join, which relies on sort-merge to order its inputs, applies only to joins whose condition uses the = operator, i.e., to equi-joins.
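
The two phases of external sort-merge (create sorted runs, then merge them) can be sketched as follows. This is a simplification: in-memory lists stand in for runs written to disk, `run_size` stands in for the number of records that fit in memory, and all runs are merged in a single pass.

```python
import heapq


def external_sort(records, run_size):
    """Phase 1: sort chunks of at most run_size records into runs.
    Phase 2: N-way merge of the sorted runs."""
    runs = [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]
    return list(heapq.merge(*runs))


data = [24, 19, 31, 33, 14, 16, 3, 21, 2, 7, 16, 4]  # assumed sample keys
print(external_sort(data, run_size=3))
```

A real implementation would merge only as many runs per pass as there are free memory buffers, repeating passes until one run remains.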

Join Operation

● Several different algorithms to implement joins


● Nested-loop join
● Block nested-loop join
● Indexed nested-loop join
● Merge-join
● Hash-join
● Choice based on cost estimate

Nested-Loop Join

● To compute the theta join r ⋈_θ s of two relations r and s (a theta join may use any join condition θ, including comparisons other than =):

for each tuple tr in r do begin
    for each tuple ts in s do begin
        test pair (tr, ts) to see if they satisfy the join condition θ
        if they do, add tr • ts to the result.
    end
end
● r is called the outer relation and s the inner relation of the join.
● Requires no indices and can be used with any kind of join condition. (similar to
linear file scan)
● Expensive since it examines every pair of tuples (nr * ns) in the two relations.
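
The nested-loop pseudocode translates almost line for line into Python. In this sketch, tuples are Python tuples, tr • ts is tuple concatenation, and θ is passed as a function; the name `nested_loop_join` and the sample data are assumptions.

```python
def nested_loop_join(r, s, theta):
    """Test every pair (t_r, t_s); r is the outer and s the inner relation."""
    result = []
    for t_r in r:
        for t_s in s:
            if theta(t_r, t_s):
                result.append(t_r + t_s)  # concatenation plays the role of tr • ts
    return result


r = [(1, "a"), (2, "b")]
s = [(1, "x"), (3, "y")]
# A non-equality condition: first attribute of r less than first attribute of s.
print(nested_loop_join(r, s, lambda t_r, t_s: t_r[0] < t_s[0]))
```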

Block Nested-Loop Join

● Variant of nested-loop join in which every block of inner relation is paired with
every block of outer relation.
for each block Br of r do begin
    for each block Bs of s do begin
        for each tuple tr in Br do begin
            for each tuple ts in Bs do begin
                Check if (tr, ts) satisfy the join condition
                if they do, add tr • ts to the result.
            end
        end
    end
end
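
A sketch of the block variant, assuming relations are Python lists that are cut into fixed-size slices to imitate disk blocks. The helper `blocks`, the `block_size` parameter, and the sample data are hypothetical.

```python
def blocks(relation, block_size):
    """Yield consecutive slices of the relation, imitating disk blocks."""
    for i in range(0, len(relation), block_size):
        yield relation[i:i + block_size]


def block_nested_loop_join(r, s, theta, block_size):
    """Pair every block of r with every block of s, then join within the pair."""
    result = []
    for B_r in blocks(r, block_size):
        for B_s in blocks(s, block_size):
            for t_r in B_r:
                for t_s in B_s:
                    if theta(t_r, t_s):
                        result.append(t_r + t_s)
    return result


r = [(i,) for i in range(5)]
s = [(j,) for j in range(0, 10, 2)]
print(block_nested_loop_join(r, s, lambda t_r, t_s: t_r[0] == t_s[0], block_size=2))
```

The result contains the same tuples as the plain nested-loop join, though possibly in a different order; the benefit is that the inner relation is read once per outer block instead of once per outer tuple.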

Indexed Nested-Loop Join

● Used with existing indices.

● In nested-loop join, if an index is available on the inner relation's join attribute and the join is an equi-join or natural join, index lookups can replace file scans of the inner relation.

● For each tuple tr in the outer relation r, use the index on s (built on the join attribute) to look up the tuples in s that satisfy the join condition with tr.

● If indices are available on the join attributes of both r and s, use the relation with fewer tuples as the outer relation.
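
A sketch of the idea, with one caveat: the algorithm described above uses an index that already exists, whereas this example builds a dictionary on the fly as a stand-in for that index. The function name and the attribute-position parameters are assumptions.

```python
def indexed_nested_loop_join(r, s, r_attr, s_attr):
    """For each outer tuple, look up matching inner tuples through an index
    instead of scanning s. r_attr / s_attr are attribute positions."""
    # Stand-in for an existing index on the join attribute of s.
    index = {}
    for t_s in s:
        index.setdefault(t_s[s_attr], []).append(t_s)

    result = []
    for t_r in r:
        for t_s in index.get(t_r[r_attr], []):
            result.append(t_r + t_s)
    return result


customers = [("Hayes", 101), ("Jones", 102)]          # assumed sample data
accounts = [(101, 500), (101, 700), (103, 900)]
print(indexed_nested_loop_join(customers, accounts, r_attr=1, s_attr=0))
```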

Merge-Join (Sort-Merge-Join)

• Applied on sorted relations and for equi/natural join.


• Sort both relations on their join attribute (if not already sorted on the join attributes).
• Merge the sorted relations to join them using pointers pr and ps.
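
The merge step with the two pointers pr and ps can be sketched as below. The sketch sorts both inputs in memory first and handles duplicate join values on both sides; the function name and the attribute-position parameters are assumptions.

```python
def merge_join(r, s, r_attr, s_attr):
    """Equi-join by sorting both inputs and advancing two pointers."""
    r = sorted(r, key=lambda t: t[r_attr])
    s = sorted(s, key=lambda t: t[s_attr])
    result = []
    pr = ps = 0
    while pr < len(r) and ps < len(s):
        a, b = r[pr][r_attr], s[ps][s_attr]
        if a < b:
            pr += 1
        elif a > b:
            ps += 1
        else:
            # Find the group of s tuples sharing this join value.
            ps_end = ps
            while ps_end < len(s) and s[ps_end][s_attr] == a:
                ps_end += 1
            # Pair every r tuple with this value against the whole group.
            while pr < len(r) and r[pr][r_attr] == a:
                for t_s in s[ps:ps_end]:
                    result.append(r[pr] + t_s)
                pr += 1
            ps = ps_end
    return result


r = [(3, "c"), (1, "a"), (1, "b")]                 # assumed sample data
s = [(1, "x"), (2, "y"), (1, "z"), (3, "w")]
print(merge_join(r, s, r_attr=0, s_attr=0))
```

Each relation is traversed once after sorting, which is why the method pays off when the inputs are already sorted on the join attribute.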

Merge-Join (Cont.)

● hybrid merge-join: If one relation is sorted, and the other has a secondary B+-tree
index on the join attribute.
● Merge the sorted relation with the leaf entries of the B+-tree. The result file will
contain tuples from the sorted relation and addresses for tuples of the unsorted
relation.
● Sort the result file on the addresses of the unsorted relation’s tuples
● Scan the unsorted relation in physical address order and merge with previous
result, to replace addresses by the actual tuples

Hash-Join

● Applicable for equi-joins and natural joins.


● A hash function h is used to partition tuples of both relations.
● The tuples of each relation are partitioned into sets that have the same hash value on the join attributes.
● h maps JoinAttrs values to {0, 1, ..., n}, where JoinAttrs denotes the common attributes of r and s used in the natural join.
● r0, r1, ..., rn denote partitions of r tuples

• Each tuple tr ∈ r is put in partition ri, where i = h(tr[JoinAttrs]).

● s0, s1, ..., sn denote partitions of s tuples

• Each tuple ts ∈ s is put in partition si, where i = h(ts[JoinAttrs]).

Hash-Join (Cont.)

● r tuples in ri need only to be compared with s tuples in si


● Need not be compared with s tuples in any other partition, since:
● an r tuple and an s tuple that satisfy the join condition will have the same
value for the join attributes.
● If that value is hashed to some value i, the r tuple has to be in ri and the s
tuple in si.
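
The partition-then-join idea can be sketched in memory as follows. Assumptions: Python's built-in `hash` modulo the partition count plays the role of h, partitions are numbered 0 to `n_partitions - 1`, and within each partition a dictionary built on si is probed with the tuples of ri. The function name and parameters are hypothetical.

```python
def hash_join(r, s, r_attr, s_attr, n_partitions=4):
    """Equi-join: partition both relations with the same hash function,
    then join each r_i only with the matching s_i."""
    def h(value):
        return hash(value) % n_partitions

    r_parts = [[] for _ in range(n_partitions)]
    s_parts = [[] for _ in range(n_partitions)]
    for t_r in r:
        r_parts[h(t_r[r_attr])].append(t_r)
    for t_s in s:
        s_parts[h(t_s[s_attr])].append(t_s)

    result = []
    for r_i, s_i in zip(r_parts, s_parts):
        # Build an in-memory hash index on s_i, then probe it with r_i.
        build = {}
        for t_s in s_i:
            build.setdefault(t_s[s_attr], []).append(t_s)
        for t_r in r_i:
            for t_s in build.get(t_r[r_attr], []):
                result.append(t_r + t_s)
    return result


r = [(1, "a"), (2, "b"), (5, "c")]                 # assumed sample data
s = [(1, "x"), (5, "y"), (5, "z"), (7, "w")]
print(sorted(hash_join(r, s, r_attr=0, s_attr=0)))
```

The output order depends on how values fall into partitions, so the usage line sorts the result before printing.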

Handling of Overflows

● Partitioning is said to be skewed if some partitions have significantly more tuples than
some others
● Hash-table overflow occurs in specific partition si if si does not fit in memory.
Reasons could be
● Many tuples in s with same value for join attributes
● Bad hash function
● Overflow resolution
● Partition si is further partitioned using different hash function.
● Partition ri must be similarly partitioned as si.
● Overflow avoidance
● perform partitioning carefully to avoid overflows
● E.g. partition relation into many partitions, then combine them
● Both approaches fail with large numbers of duplicates

Complex Joins

● Join with a conjunctive condition ("and" operator in the where clause):
r ⋈_{θ1 ∧ θ2 ∧ ... ∧ θn} s
● Either use nested loops/block nested loops, or
● Compute the result of one of the simpler joins r ⋈_{θi} s
• The final result comprises those tuples in the intermediate result that satisfy the remaining conditions
θ1 ∧ . . . ∧ θi–1 ∧ θi+1 ∧ . . . ∧ θn
● Join with a disjunctive condition
r ⋈_{θ1 ∨ θ2 ∨ ... ∨ θn} s
● Either use nested loops/block nested loops, or
● Compute as the union of the records in the individual joins r ⋈_{θi} s:
(r ⋈_{θ1} s) ∪ (r ⋈_{θ2} s) ∪ . . . ∪ (r ⋈_{θn} s)

Other Operations

● Duplicate elimination can be implemented via hashing or sorting.


● After sorting, duplicates are adjacent to each other, and all but one copy of each set of duplicates can be deleted.
● Hashing is similar – duplicates will come into the same bucket.
● Projection:
● perform projection on each tuple followed by duplicate elimination.
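
Projection followed by duplicate elimination can be sketched with both approaches. The function names, the attribute-position lists, and the sample data are assumptions for illustration.

```python
def project_distinct_sorted(relation, attrs):
    """Sort the projected tuples so duplicates become adjacent,
    then keep one copy of each."""
    projected = sorted(tuple(t[a] for a in attrs) for t in relation)
    result = []
    for t in projected:
        if not result or result[-1] != t:
            result.append(t)
    return result


def project_distinct_hashed(relation, attrs):
    """Hash each projected tuple; duplicates land in the same bucket
    (here, a Python set) and are skipped."""
    seen = set()
    result = []
    for t in relation:
        p = tuple(t[a] for a in attrs)
        if p not in seen:
            seen.add(p)
            result.append(p)
    return result


accounts = [("Perryridge", 400), ("Brighton", 750), ("Perryridge", 900)]
print(project_distinct_sorted(accounts, attrs=[0]))
print(project_distinct_hashed(accounts, attrs=[0]))
```

The sorted variant returns tuples in sorted order, while the hashed variant keeps first-seen order; both return the same set of tuples.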

Other Operations : Aggregation

● Aggregation can be implemented in a manner similar to duplicate elimination.


● Sorting or hashing can be used to bring tuples in the same group together, and
then the aggregate functions can be applied on each group.
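
Hash-based grouping can be sketched with a dictionary keyed on the grouping attribute. The sketch computes a running sum per group instead of materializing each group first, which is one possible way to apply the aggregate; the function name and sample data are assumptions.

```python
def hash_aggregate_sum(relation, group_attr, value_attr):
    """Group tuples by group_attr using a hash table and sum value_attr
    within each group."""
    totals = {}
    for t in relation:
        key = t[group_attr]
        totals[key] = totals.get(key, 0) + t[value_attr]
    return totals


accounts = [("Perryridge", 400), ("Brighton", 750), ("Perryridge", 900)]
print(hash_aggregate_sum(accounts, group_attr=0, value_attr=1))
```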

End of Chapter
