
Vu Lec 33

Uploaded by zahid
Copyright © All Rights Reserved

Distributed Database Management Systems
Lecture 33
In the previous lecture
• Final phase of query decomposition (QD)
• Data localization for horizontal (HF), vertical (VF) and derived (DF) fragmentation
In today's lecture
• Data localization for hybrid fragmentation
• Query optimization
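Reduction for HyF
• HyF contains both types of fragmentation (horizontal and vertical).
• $EMP_1 = \sigma_{eNo \le E4}(\pi_{eNo, eName}(EMP))$
• $EMP_2 = \sigma_{eNo > E4}(\pi_{eNo, eName}(EMP))$
• $EMP_3 = \pi_{eNo, title}(EMP)$
• Query: Select eName from EMP where eNo = E5
• Localized (generic) query: $\pi_{eName}(\sigma_{eNo = E5}((EMP_1 \cup EMP_2) \bowtie_{eNo} EMP_3))$
• Reduced query: $\pi_{eName}(\sigma_{eNo = E5}(EMP_2))$
– EMP1 is eliminated because eNo = E5 contradicts its predicate eNo ≤ E4.
– EMP3 is eliminated because EMP2 already supplies eName, so the join with EMP3 is useless.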
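The two reduction steps can be sketched in Python. The fragment encoding (a predicate on eNo plus the set of attributes each fragment keeps) is a hypothetical representation chosen for illustration, not part of the lecture. Because this query selects a single eNo value, a contradiction can be detected simply by evaluating each fragment's predicate on that value; a real optimizer reasons about the predicates symbolically.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet

@dataclass(frozen=True)
class Fragment:
    name: str
    attrs: FrozenSet[str]        # attributes kept by the vertical part
    pred: Callable[[str], bool]  # horizontal predicate on eNo (always True if none)

# Hybrid fragmentation of EMP from the lecture; eNo values are compared as
# strings, which is only safe for single-digit keys E1..E9 (assumption).
EMP1 = Fragment("EMP1", frozenset({"eNo", "eName"}), lambda e: e <= "E4")
EMP2 = Fragment("EMP2", frozenset({"eNo", "eName"}), lambda e: e > "E4")
EMP3 = Fragment("EMP3", frozenset({"eNo", "title"}), lambda e: True)

def reduce_query(fragments, eno_value, needed_attrs, key="eNo"):
    """Hypothetical helper: keep fragments that can contain eNo = eno_value
    and that contribute at least one needed non-key attribute."""
    kept = []
    for f in fragments:
        if not f.pred(eno_value):
            continue  # selection contradicts the horizontal predicate
        if not (f.attrs - {key}) & needed_attrs:
            continue  # vertical fragment adds nothing the query uses
        kept.append(f.name)
    return kept

print(reduce_query([EMP1, EMP2, EMP3], "E5", {"eName"}))  # ['EMP2']
```

Only EMP2 survives, which matches the reduced query for Select eName from EMP where eNo = E5.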
Summary of what we have done so far
• Query decomposition: generates an efficient query in relational algebra
– Normalization, analysis, simplification, rewriting
• Data localization: applies the global query to fragments, so optimization can work at the finer level of fragments
• So, next is cost-based optimization.
• It mainly concentrates on the order in which joins are performed.
• Characteristics of relations, such as their cardinalities, are taken into account.
• First, query optimization (QO) in general.
• QO refers to producing a query execution plan (QEP) that represents an execution strategy for the query.
• Components of the optimizer:
– Search space: the set of equivalent alternative execution plans
– Cost model: predicts the cost of an execution plan
– Search strategy: explores the search space and produces the best plan
Search Space
• The search space consists of equivalent query trees produced using transformation rules.
• The optimizer concentrates on join trees, since joins have the greatest effect on cost.
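• Example:
Select eName, resp
From EMP, ASG, PROJ
Where EMP.eNo = ASG.eNo and ASG.pNo = PROJ.pNo
• Three equivalent join trees for this query:
1- $(EMP \bowtie_{eNo} ASG) \bowtie_{pNo} PROJ$
2- $(PROJ \bowtie_{pNo} ASG) \bowtie_{eNo} EMP$
3- $(EMP \times PROJ) \bowtie_{eNo, pNo} ASG$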
• For N relations, the number of alternative join trees is O(N!), obtained by applying the commutativity and associativity of joins.
• So, restrictions are applied to limit the search space:
1- Heuristics
- Perform selection and projection on base relations first
- Avoid Cartesian products
2- Shape of the tree
- Linear tree: at least one operand of each operator node is a base relation
- Bushy tree: may have operators whose operands are both intermediate tables; allows parallel execution
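To see how quickly the search space grows and how the "avoid Cartesian product" heuristic prunes it, the sketch below enumerates the N! left-linear join orders of the example query (EMP, ASG, PROJ) and keeps those in which every newly joined relation shares a join predicate with the relations already joined. It counts linear trees only; bushy trees would add further alternatives. JOIN_EDGES and avoids_cartesian are illustrative names, not from the lecture.

```python
from itertools import permutations

# Join graph of the example query: an edge means a join predicate exists
# (EMP-ASG on eNo, ASG-PROJ on pNo; EMP and PROJ share no predicate).
JOIN_EDGES = {frozenset({"EMP", "ASG"}), frozenset({"ASG", "PROJ"})}

def avoids_cartesian(order):
    """True if every relation added to the left-linear tree has a join
    predicate with at least one relation already joined."""
    joined = {order[0]}
    for rel in order[1:]:
        if not any(frozenset({rel, r}) in JOIN_EDGES for r in joined):
            return False  # rel could only be combined by a Cartesian product
        joined.add(rel)
    return True

relations = ["EMP", "ASG", "PROJ"]
all_orders = list(permutations(relations))  # N! = 6 left-linear join orders
good_orders = [o for o in all_orders if avoids_cartesian(o)]

print(len(all_orders), len(good_orders))  # 6 4
for o in good_orders:
    print(" JOIN ".join(o))
```

The two discarded orders both start by combining EMP with PROJ, which corresponds to the third tree above.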
Search Strategy
• The most popular strategy is dynamic programming (DP).
• DP starts with the base relations and keeps adding relations, calculating the cost of each partial plan.
• DP is almost exhaustive, so it produces the best plan.
• It becomes too expensive with more than 5 relations.
• Another option is a randomized strategy.
• Randomized strategies do not guarantee the best plan.
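A minimal sketch of the DP idea: build plans bottom-up over subsets of relations, adding one relation at a time (left-linear trees) and keeping only the cheapest plan for each subset. The cardinalities, join selectivities and the cost model (sum of the sizes of all join results) are hypothetical assumptions for illustration; Fraction is used so that cost ties compare exactly.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical statistics (not from the lecture): base cardinalities and the
# join selectivity factor SF_J for each pair of relations with a join predicate.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 100}
SF_J = {
    frozenset({"EMP", "ASG"}): Fraction(1, 400),
    frozenset({"ASG", "PROJ"}): Fraction(1, 100),
}

def join_card(left_rels, left_card, rel):
    """Estimate card(left JOIN rel) as SF_J * card(left) * card(rel).
    A pair without a join predicate gets SF = 1, i.e. a Cartesian product."""
    sf = Fraction(1)
    for r in left_rels:
        sf *= SF_J.get(frozenset({r, rel}), Fraction(1))
    return sf * left_card * CARD[rel]

def dp_join_order(relations):
    """Bottom-up DP over subsets, building left-linear trees.
    Hypothetical cost model: sum of the sizes of all join results."""
    best = {frozenset({r}): (0, CARD[r], (r,)) for r in relations}
    for size in range(2, len(relations) + 1):
        for subset in map(frozenset, combinations(relations, size)):
            for rel in sorted(subset):  # rel is the relation joined last
                cost, card, order = best[subset - {rel}]
                out = join_card(subset - {rel}, card, rel)
                candidate = (cost + out, out, order + (rel,))
                if subset not in best or candidate[0] < best[subset][0]:
                    best[subset] = candidate  # keep the cheapest plan per subset
    return best[frozenset(relations)]

cost, card, order = dp_join_order(["EMP", "ASG", "PROJ"])
print(order, card, cost)  # ('PROJ', 'ASG', 'EMP') 1000 2000
```

With these made-up numbers, any plan that starts with EMP × PROJ costs 41000 and is discarded, while plans that follow the join predicates cost 2000.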
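Cost Model
• Uses the cost of operators and statistics of the base data to predict the sizes of intermediate tables.
• Cost is expressed as total time or as response time.
• Total time = CPU time + I/O time + transmission time
• In a WAN, the major cost is transmission time.
• Initially the ratio of transmission to I/O cost was about 20:1; for a LAN it is about 1:1.6.
• Response time = CPU time + I/O time + transmission time, counting only the operations that are performed sequentially
• Difference: total time adds up all the work done at all sites, while response time is the elapsed time of the query, so operations running in parallel at different sites are not added up.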
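• $T_{CPU}$ = time for a CPU instruction
• $T_{I/O}$ = time for a disk I/O
• $T_{MSG}$ = fixed time for initiating and receiving a message
• $T_{TR}$ = time to transmit a data unit from one site to another
• Example: Site 1 sends x units and Site 2 sends y units to Site 3, in parallel.
• $TT = 2T_{MSG} + T_{TR} \cdot (x + y)$
• $RT = \max\{T_{MSG} + T_{TR} \cdot x,\ T_{MSG} + T_{TR} \cdot y\}$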
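Plugging numbers into the two formulas makes the difference between total time and response time concrete. This sketch assumes, as in the example, that each sending site incurs one $T_{MSG}$ and that the two transfers to Site 3 overlap completely; CPU and I/O time are ignored. The values of T_MSG, T_TR, x and y are made up.

```python
def total_time(t_msg, t_tr, units_per_site):
    """TT: one message per sending site plus transmission of every unit."""
    return len(units_per_site) * t_msg + t_tr * sum(units_per_site)

def response_time(t_msg, t_tr, units_per_site):
    """RT: the sites send in parallel, so only the slowest transfer counts."""
    return max(t_msg + t_tr * u for u in units_per_site)

# Hypothetical values: T_MSG = 10 and T_TR = 1 (same time unit), x = 100, y = 50.
T_MSG, T_TR = 10, 1
x, y = 100, 50
print(total_time(T_MSG, T_TR, [x, y]))     # 170
print(response_time(T_MSG, T_TR, [x, y]))  # 110
```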
Database Statistics
• The major cost factor is the size of intermediate tables.
• If intermediate results are to be transmitted, estimating their size is a must.
• More precise statistics cost more to maintain.
• For each relation $R[A_1, A_2, \ldots, A_n]$ fragmented as $R_1, \ldots, R_r$:
1. the length of each attribute: $length(A_i)$
2. the number of distinct values of each attribute in each fragment: $card(\pi_{A_i}(R_j))$
3. the maximum and minimum values in the domain of each attribute: $min(A_i)$, $max(A_i)$
4. the cardinality of each domain, $card(dom[A_i])$, and the cardinality of each fragment, $card(R_j)$
5. the join selectivity factor for some of the relations: $SF_J(R, S) = \frac{card(R \bowtie S)}{card(R) \cdot card(S)}$
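The statistics listed above can be illustrated on small in-memory relations. The sketch below uses made-up tuples (not the lecture's data) and computes, by direct counting, the number of distinct values, the minimum and the maximum of one attribute, plus $SF_J$ for an equi-join. A real DBMS would typically keep such values in its catalog rather than recompute them per query; attribute_stats and join_selectivity are hypothetical helper names.

```python
# Hypothetical toy relations: EMP tuples are (eNo, title), ASG tuples are (eNo, pNo).
EMP = [("E1", "Elect. Eng."), ("E2", "Analyst"), ("E3", "Mech. Eng."), ("E4", "Programmer")]
ASG = [("E1", "P1"), ("E2", "P1"), ("E2", "P2"), ("E3", "P3"), ("E5", "P2")]

def attribute_stats(rel, i):
    """card(pi_Ai(R)), min(Ai) and max(Ai) for the attribute at position i."""
    values = [t[i] for t in rel]
    return len(set(values)), min(values), max(values)

def join_selectivity(r, s, i, j):
    """SF_J(R, S) = card(R JOIN S) / (card(R) * card(S)) for R.i = S.j."""
    join_card = sum(1 for a in r for b in s if a[i] == b[j])
    return join_card / (len(r) * len(s))

print(attribute_stats(ASG, 1))           # (3, 'P1', 'P3')
print(join_selectivity(EMP, ASG, 0, 0))  # 0.2
```

Here 4 of the 20 possible (EMP, ASG) pairs agree on eNo, giving $SF_J = 4/20$.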
Cardinalities of Intermediate Results
Selection Operation (assuming uniformly distributed, independent attribute values)
• $card(\sigma_F(R)) = SF_S(F) \cdot card(R)$
• $SF_S(A = value) = \frac{1}{card(\pi_A(R))}$
• $SF_S(A > value) = \frac{max(A) - value}{max(A) - min(A)}$
• $SF_S(A < value) = \frac{value - min(A)}{max(A) - min(A)}$
• $SF_S(p(A_i) \wedge p(A_j)) = SF_S(p(A_i)) \cdot SF_S(p(A_j))$
• $SF_S(p(A_i) \vee p(A_j)) = SF_S(p(A_i)) + SF_S(p(A_j)) - SF_S(p(A_i)) \cdot SF_S(p(A_j))$
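The selectivity formulas translate directly into small functions. This is a sketch under the same uniformity and independence assumptions; the relation R, its attributes sal and dept, and all numbers are hypothetical.

```python
# Selectivity factors for selections, assuming uniformly distributed
# and independent attribute values.

def sf_eq(distinct_values):
    """SF_S(A = value) = 1 / card(pi_A(R))."""
    return 1 / distinct_values

def sf_gt(value, min_a, max_a):
    """SF_S(A > value) = (max(A) - value) / (max(A) - min(A))."""
    return (max_a - value) / (max_a - min_a)

def sf_lt(value, min_a, max_a):
    """SF_S(A < value) = (value - min(A)) / (max(A) - min(A))."""
    return (value - min_a) / (max_a - min_a)

def sf_and(sf1, sf2):
    """Conjunction of independent predicates."""
    return sf1 * sf2

def sf_or(sf1, sf2):
    """Disjunction: inclusion-exclusion on independent predicates."""
    return sf1 + sf2 - sf1 * sf2

# Hypothetical relation R: card(R) = 1000, sal ranges over [10000, 50000],
# dept has 20 distinct values. Predicate F: sal > 40000 AND dept = 'D1'.
card_r = 1000
sf_f = sf_and(sf_gt(40000, 10000, 50000), sf_eq(20))
estimate = sf_f * card_r  # card(sigma_F(R)) = SF_S(F) * card(R)
print(estimate)  # about 12.5 tuples
```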
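Cardinality of Projection
• Hard to determine precisely, because duplicates are eliminated
• Two cases where it is trivial:
1- Projection on a single attribute A: $card(\pi_A(R)) = card(A)$, the number of distinct values of A
2- The projection list A includes the primary key of R: $card(\pi_A(R)) = card(R)$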
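Cartesian Product
• $card(R \times S) = card(R) \cdot card(S)$
Cardinality of Join
• There is no general way to estimate it without additional information (such as $SF_J$).
• In the case of a PK/FK combination (A is the primary key of R and a foreign key in S): $card(R \bowtie_A S) = card(S)$
Semijoin
• $SF_{SJ}(R \ltimes_A S) = \frac{card(\pi_A(S))}{card(dom[A])}$
• $card(R \ltimes_A S) = SF_{SJ}(S.A) \cdot card(R)$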
• Union: hard to estimate
– Only bounds are possible: upper bound $card(R) + card(S)$, lower bound $\max\{card(R), card(S)\}$
• Difference: like union, only bounds are possible: upper bound $card(R)$ for $R - S$, lower bound 0
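The remaining estimates can be collected as small helpers. This is a sketch with hypothetical function names and numbers; the PK/FK case assumes every foreign-key value in S is non-null and references an existing key in R, so each S tuple joins with exactly one R tuple.

```python
def card_semijoin(card_r, distinct_a_in_s, dom_a):
    """card(R semijoin_A S) = SF_SJ * card(R), SF_SJ = card(pi_A(S)) / card(dom[A])."""
    sf_sj = distinct_a_in_s / dom_a
    return sf_sj * card_r

def card_join_pk_fk(card_s):
    """PK/FK join: each tuple of S (foreign-key side) matches one tuple of R."""
    return card_s

def union_bounds(card_r, card_s):
    """(lower, upper) bounds on card(R union S)."""
    return max(card_r, card_s), card_r + card_s

def difference_bounds(card_r):
    """(lower, upper) bounds on card(R - S)."""
    return 0, card_r

# Hypothetical: card(R) = 1000, S holds 50 distinct A values, card(dom[A]) = 200.
print(card_semijoin(1000, 50, 200))  # 250.0
print(union_bounds(300, 500))        # (500, 800)
print(difference_bounds(300))        # (0, 300)
```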
Centralized Query Optimization
Why Study It
1. A distributed query is transformed into local queries, which are optimized as centralized queries.
2. The issues are related, but more complex in distributed databases.
3. It is easier to understand.
• Two well-known approaches:
• INGRES
– Dynamic
– Recursively breaks the query into smaller ones
• System R
– Static
– Exhaustive search
