Vu Lec 30

Uploaded by

zahid
Distributed Database Management Systems

Lecture 30
In the previous lecture
• Locking-based concurrency control (CC)
• Timestamp-ordering-based CC
• Concluded transaction management (TM)
In this lecture
• Basic concepts of query optimization
• Query processing (QP) in centralized and distributed DBs
Introduction
• SQL is one of the success factors of RDBMSs
• The query processor transforms complex queries into concise and simple ones
• Query processing is a critical performance issue
• QP is a complex problem, especially in a DDBS environment
• The main function of QP is to transform an SQL query into an equivalent relational algebra query (a lower-level language)
• The transformation must achieve correctness and efficiency
• Correctness is straightforward since transformation rules exist
• An SQL query can have many equivalents in relational algebra
• Consider the tables
• EMP(eNo, eName, title)
• ASG(eNo, pNo, resp, dur)
• PROJ(pNo, pName, budget, loc)
• Query: Get the names of employees who are managing a project
• SELECT eName
  FROM EMP, ASG
  WHERE EMP.eNo = ASG.eNo
  AND resp = 'Manager'
π_eName(σ_{resp='Manager' ∧ EMP.eNo=ASG.eNo}(EMP × ASG))

π_eName(EMP ⋈_eNo (σ_{resp='Manager'}(ASG)))
• The second expression clearly needs fewer computing resources, since it avoids the Cartesian product
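The difference between the two equivalent expressions can be sketched on a tiny in-memory example. The tuples and the function names below are illustrative assumptions, not data from the lecture; the sketch only compares how many intermediate tuples each plan materializes before producing the same answer.

```python
# Toy relations (hypothetical sample data): EMP(eNo, eName, title), ASG(eNo, pNo, resp, dur)
EMP = [
    ("E1", "Ali", "Engineer"),
    ("E2", "Sara", "Analyst"),
    ("E3", "Omar", "Engineer"),
    ("E4", "Hina", "Programmer"),
]
ASG = [
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6),
    ("E3", "P3", "Manager", 10),
    ("E4", "P2", "Programmer", 18),
]


def plan_product_then_select(emp, asg):
    """First expression: select and project over the Cartesian product EMP x ASG."""
    pairs = [(e, a) for e in emp for a in asg]  # |EMP| * |ASG| intermediate tuples
    names = {e[1] for e, a in pairs if a[2] == "Manager" and e[0] == a[0]}
    return names, len(pairs)


def plan_select_then_join(emp, asg):
    """Second expression: select managers from ASG first, then join with EMP on eNo."""
    managers = [a for a in asg if a[2] == "Manager"]  # small intermediate result
    manager_enos = {a[0] for a in managers}
    names = {e[1] for e in emp if e[0] in manager_enos}
    return names, len(managers)


names_1, intermediate_1 = plan_product_then_select(EMP, ASG)
names_2, intermediate_2 = plan_select_then_join(EMP, ASG)
print(names_1 == names_2, intermediate_1, intermediate_2)
```

Both plans return the same set of names, but the first one builds every EMP/ASG pair before filtering, while the second one only keeps the manager assignments.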
• In a centralized DBMS, QP means choosing the best query execution plan
• Distributed QP is more complex; it also involves selecting the site at which to execute the query
• The same query in a DDBS
• Suppose EMP and ASG are horizontally fragmented (HF) as
• EMP1 = σ_{eNo ≤ 'E3'}(EMP)
• EMP2 = σ_{eNo > 'E3'}(EMP)
• ASG1 = σ_{eNo ≤ 'E3'}(ASG)
• ASG2 = σ_{eNo > 'E3'}(ASG)
• Further suppose fragments ASG1, ASG2, EMP1 and EMP2 are stored at sites 1, 2, 3 and 4 respectively, and the result is expected at site 5
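A minimal sketch of this horizontal fragmentation, assuming a few made-up EMP tuples and a hypothetical helper `fragment_by_eno`. It checks the two properties a horizontal fragmentation should have: the fragments are disjoint and their union reconstructs the original relation.

```python
# Hypothetical sample of EMP(eNo, eName, title); not data from the lecture.
EMP = [
    ("E1", "Ali", "Engineer"),
    ("E2", "Sara", "Analyst"),
    ("E3", "Omar", "Engineer"),
    ("E4", "Hina", "Programmer"),
    ("E5", "Bilal", "Analyst"),
]


def fragment_by_eno(relation, split="E3"):
    """Split a relation on its first attribute: eNo <= split and eNo > split.

    Note: this compares eNo as strings, which matches the slide's predicates
    only while employee numbers have the same length (e.g. 'E1'..'E9').
    """
    low = [t for t in relation if t[0] <= split]
    high = [t for t in relation if t[0] > split]
    return low, high


EMP1, EMP2 = fragment_by_eno(EMP)

# Completeness and reconstruction: the union of the fragments is the relation.
assert sorted(EMP1 + EMP2) == sorted(EMP)
# Disjointness: no tuple appears in both fragments.
assert not set(EMP1) & set(EMP2)
print(len(EMP1), len(EMP2))
```

The same helper would apply to ASG, since both relations are fragmented on eNo with the same predicate.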
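Site 5: result = EMP1' ∪ EMP2'
Site 3: EMP1' = EMP1 ⋈_eNo ASG1' (EMP1' is sent to Site 5)
Site 4: EMP2' = EMP2 ⋈_eNo ASG2' (EMP2' is sent to Site 5)
Site 1: ASG1' = σ_{resp='Manager'}(ASG1) (ASG1' is sent to Site 3)
Site 2: ASG2' = σ_{resp='Manager'}(ASG2) (ASG2' is sent to Site 4)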
Site 5: result = (EMP1 ∪ EMP2) ⋈_eNo σ_{resp='Manager'}(ASG1 ∪ ASG2)
Site 1 sends ASG1, Site 2 sends ASG2, Site 3 sends EMP1 and Site 4 sends EMP2 to Site 5


Let's assume
• size(EMP) = 400 tuples
• size(ASG) = 1000 tuples
• Tuple access cost = 1 unit
• Tuple transfer cost = 10 units
• There are 20 managers in ASG
• Data is distributed evenly across the sites
Strategy 1
• Produce ASG': 20 * 1 = 20
• Transfer ASG' to the sites of EMP: 20 * 10 = 200
• Produce EMP': (10 + 10) * 1 * 2 = 40
• Transfer EMP' to the result site: 20 * 10 = 200
Total = 460
Strategy 2
• Transfer EMP to site 5: 400 * 10 = 4000
• Transfer ASG to site 5: 1000 * 10 = 10000
• Produce ASG' by selecting from ASG: 1000 * 1 = 1000
• Join EMP and ASG': 8000
Total = 23000
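The arithmetic of the two strategies can be reproduced with a short script. The constant and function names are illustrative. One step is an interpretation rather than something stated on the slide: the join cost of 8000 in Strategy 2 is read here as comparing each of the 400 EMP tuples with each of the 20 tuples of ASG' at 1 unit per access.

```python
SIZE_EMP = 400       # tuples in EMP
SIZE_ASG = 1000      # tuples in ASG
ACCESS_COST = 1      # units per tuple access
TRANSFER_COST = 10   # units per tuple transfer
MANAGERS = 20        # tuples of ASG with resp = 'Manager' (10 per ASG fragment)


def strategy_1_cost():
    """Select at the ASG sites, join at the EMP sites, ship only results."""
    produce_asg = MANAGERS * ACCESS_COST           # 20
    transfer_asg = MANAGERS * TRANSFER_COST        # 200
    produce_emp = (10 + 10) * ACCESS_COST * 2      # 40
    transfer_emp = MANAGERS * TRANSFER_COST        # 200
    return produce_asg + transfer_asg + produce_emp + transfer_emp


def strategy_2_cost():
    """Ship both relations to the result site and do all the work there."""
    transfer_emp = SIZE_EMP * TRANSFER_COST        # 4000
    transfer_asg = SIZE_ASG * TRANSFER_COST        # 10000
    produce_asg = SIZE_ASG * ACCESS_COST           # 1000
    # Assumed reading of the slide's 8000: 400 EMP tuples x 20 ASG' tuples.
    join = SIZE_EMP * MANAGERS * ACCESS_COST       # 8000
    return transfer_emp + transfer_asg + produce_asg + join


print(strategy_1_cost(), strategy_2_cost())
```

Under these assumptions the first strategy is cheaper by a factor of 50, almost entirely because it transfers 40 tuples instead of 1400.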
Query Optimization
• An important aspect of QP
• Goal: minimize resource consumption
• Total cost = I/O cost + CPU cost + communication cost
• Only the first two apply in a centralized DB
• Communication cost dominates in a WAN
• It is less dominant in LANs, so the total cost should be considered there
• Query optimization (QO) can also aim to maximize throughput
Operators’ Complexity
• Select, Project (without duplicate elimination): O(n)
• Project (with duplicate elimination), Group: O(n log n)
• Join, Semi-Join, Division, Set Operators: O(n log n)
• Cartesian Product: O(n²)
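To illustrate why duplicate elimination moves projection from O(n) to O(n log n), here is a minimal sketch that assumes a sort-based implementation. The function names and sample tuples are hypothetical; real systems may use hashing instead of sorting.

```python
# Hypothetical sample of ASG(eNo, pNo, resp, dur).
ASG = [
    ("E1", "P1", "Manager", 12),
    ("E2", "P1", "Analyst", 24),
    ("E2", "P2", "Analyst", 6),
    ("E3", "P3", "Manager", 10),
]


def project(rows, cols):
    """Projection without duplicate elimination: one pass over the input, O(n)."""
    return [tuple(row[c] for c in cols) for row in rows]


def project_distinct(rows, cols):
    """Projection with duplicate elimination via sorting: O(n log n)."""
    projected = sorted(project(rows, cols))  # the sort dominates the cost
    result = []
    for t in projected:
        # After sorting, duplicates are adjacent, so one comparison is enough.
        if not result or result[-1] != t:
            result.append(t)
    return result


print(project(ASG, (2,)))
print(project_distinct(ASG, (2,)))
```

The plain projection keeps all four `resp` values, while the distinct version pays for a sort to reduce them to two.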
Characterization of Query Processors
• Types of Optimization
–Exhaustive search: compute the cost of each strategy to find the optimal one
–May be very costly when there are many options and many fragments
–Heuristics
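The contrast between exhaustive search and a heuristic can be sketched for join ordering. Everything numeric below is an assumption made for illustration: the cardinality of PROJ, the join selectivities, and the cost function (sum of intermediate result sizes for a left-deep order) are not from the lecture.

```python
from fractions import Fraction
from itertools import permutations

# Cardinalities: EMP and ASG follow the lecture's example; PROJ is hypothetical.
CARD = {"EMP": 400, "ASG": 1000, "PROJ": 50}

# Hypothetical join selectivities; 1 means no join predicate (Cartesian product).
SELECTIVITY = {
    frozenset(("EMP", "ASG")): Fraction(1, 400),
    frozenset(("ASG", "PROJ")): Fraction(1, 50),
    frozenset(("EMP", "PROJ")): Fraction(1),
}


def plan_cost(order):
    """Assumed cost model: sum of the intermediate result sizes of a left-deep plan."""
    joined = [order[0]]
    size = Fraction(CARD[order[0]])
    cost = Fraction(0)
    for rel in order[1:]:
        sel = Fraction(1)
        for other in joined:
            sel *= SELECTIVITY[frozenset((other, rel))]
        size = size * CARD[rel] * sel
        cost += size
        joined.append(rel)
    return cost


def exhaustive_search(relations):
    """Cost every possible order: n! plans, optimal under the cost model."""
    return min(permutations(relations), key=plan_cost)


def greedy_heuristic(relations):
    """Start from the smallest relation, then always add the cheapest next join."""
    remaining = set(relations)
    order = [min(sorted(remaining), key=lambda r: CARD[r])]
    remaining.remove(order[0])
    while remaining:
        nxt = min(sorted(remaining), key=lambda r: plan_cost(order + [r]))
        order.append(nxt)
        remaining.remove(nxt)
    return tuple(order)


best = exhaustive_search(["EMP", "ASG", "PROJ"])
quick = greedy_heuristic(["EMP", "ASG", "PROJ"])
print(best, plan_cost(best), quick, plan_cost(quick))
```

With three relations the exhaustive search only costs six plans, but the number of orders grows factorially, which is why heuristics become attractive as relations and fragments multiply. The greedy rule happens to reach the same cost here; in general a heuristic gives no such guarantee.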
• Optimization Timing
–Static: during compilation
• Size of intermediate tables is not always known
• Cost is justified by repeated execution
–Dynamic: during execution
• Size of intermediate tables is known
• Re-optimization may be required
• Statistics
–Relation/Fragment: cardinality, size of a tuple, fraction of tuples participating in a join with another relation
–Attribute: cardinality of domain, actual number of distinct values
• Decision Sites
–Centralized: simple, but needs knowledge about the entire distributed database
–Distributed: sites cooperate to determine the schedule; each needs only local information
–Hybrid: one site determines the global schedule, each site optimizes the local subqueries
• Other factors like:
–Network topology
–Replicated fragments
–Use of semijoins.
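The idea behind using semijoins can be sketched as follows, assuming EMP lives at one site and ASG at another, and the join is computed at EMP's site. Instead of shipping all of ASG, only the eNo values of EMP are sent to ASG's site, ASG is reduced to the matching tuples, and the reduced relation is shipped back. The data and function names are hypothetical.

```python
# Hypothetical data: EMP(eNo, eName) at site A, ASG(eNo, pNo, resp) at site B.
EMP = [("E1", "Ali"), ("E2", "Sara")]
ASG = [
    ("E1", "P1", "Manager"),
    ("E2", "P1", "Analyst"),
    ("E3", "P2", "Manager"),
    ("E4", "P2", "Programmer"),
    ("E5", "P3", "Engineer"),
]


def semijoin(relation, keys, key_index=0):
    """Keep only the tuples of `relation` whose join attribute appears in `keys`."""
    return [t for t in relation if t[key_index] in keys]


def join_on_eno(emp, asg):
    """Equi-join on eNo, the first attribute of both relations."""
    by_eno = {}
    for a in asg:
        by_eno.setdefault(a[0], []).append(a)
    return [e + a[1:] for e in emp for a in by_eno.get(e[0], [])]


# Step 1: site A sends only the join attribute values of EMP to site B.
emp_keys = {e[0] for e in EMP}
# Step 2: site B reduces ASG with the semijoin and ships the result to site A.
asg_reduced = semijoin(ASG, emp_keys)
# Step 3: site A joins EMP with the reduced ASG.
result = join_on_eno(EMP, asg_reduced)

print(len(ASG), len(asg_reduced), result == join_on_eno(EMP, ASG))
```

The semijoin pays for an extra message carrying the key values, so it helps when the reduction in shipped tuples outweighs that cost, as it does here with two of five ASG tuples transferred.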
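Layers of distributed query processing (input and output of each layer, with the information each layer uses):
SQL Query on Distributed Relations
→ QUERY DECOMPOSITION (uses the GLOBAL SCHEMA)
Algebraic Query on Distributed Relations
→ DATA LOCALIZATION (uses the FRAGMENT SCHEMA)
Fragment Query
→ GLOBAL OPTIMIZATION (uses STATISTICS ON FRAGMENTS)
Optimized Fragment Query with Communication Operations
→ LOCAL OPTIMIZATION (uses the LOCAL SCHEMA)
Optimized Local Query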