PDB Partitioning

Uploaded by

leena sakri
Copyright
© © All Rights Reserved
Data Partitioning Strategies in Parallel Database Systems
• Data partitioning distributes data over a number of processing elements.
• Each processing element then works on its own partition simultaneously with the other processing elements, thereby creating parallelism.
• By distributing the data evenly, so that the workload is shared equally among the processors, we can achieve better performance (better parallelism) for the whole system.
Partitioning strategies
• Various partitioning strategies have been proposed to distribute data evenly across multiple processors.
• Let us assume that our parallel database system has
◦ n processors P0, P1, P2, …, Pn-1 and
◦ n disks D0, D1, D2, …, Dn-1 across which we partition our data.
• The value of n is chosen according to the degree of parallelism required.
• The partitioning strategies are
◦ Round robin partitioning
◦ Hash partitioning and
◦ Range partitioning
Round Robin Partitioning
• The Emp_table has 14 records, and every record stores the name of the employee, his or her work grade, and the department name.
• Assume that we have 3 processors, namely P0, P1, P2, and three disks associated with those processors, namely D0, D1, D2.
• In the round robin strategy, we partition records in a round-robin manner using the function i mod n, where i is the position of the record in the table (counting from 1) and n is the number of partitions or disks; in our case n is 3.
• Applying this technique, the first record goes to D1 (1 mod 3 = 1), the second record goes to D2, the third record goes to D0 (3 mod 3 = 0), the fourth record goes to D1, and so on.
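The i mod n rule can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `round_robin_partition` and the placeholder records `r1` … `r14` are assumptions standing in for the 14 rows of Emp_table, and positions are assumed to start at 1 so that the first record lands on D1.

```python
def round_robin_partition(records, n):
    """Place the record at 1-based position i on disk number i mod n."""
    disks = [[] for _ in range(n)]
    for i, record in enumerate(records, start=1):
        disks[i % n].append(record)
    return disks


# Hypothetical placeholders for the 14 rows of Emp_table.
records = [f"r{i}" for i in range(1, 15)]
disks = round_robin_partition(records, 3)

for d, contents in enumerate(disks):
    print(f"D{d}: {contents}")
```

With 14 records and 3 disks the partitions differ in size by at most one record, which is why round robin gives an even distribution regardless of the attribute values.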
Hash Partitioning
• Let us take the GRADE attribute of the Emp_table to explain hash partitioning.
• Let us choose the following hash function:
◦ h(GRADE) = GRADE mod n
• where
◦ GRADE is the value of the GRADE attribute of a record
◦ n is the number of partitions, which is 3 in our case
• Applying hash partitioning on GRADE gives the following partitions of the Emp_table.
• For example,
◦ the GRADE of Smith is 1, so the hash function directs the record to partition 1 (1 mod 3 = 1).
◦ the GRADE of Blake is 4, which directs the record to partition 1 (4 mod 3 = 1).
◦ the GRADE of King is 5, which directs the record to partition 2 (5 mod 3 = 2).
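The three examples above can be reproduced with a short Python sketch. The function name `hash_partition` and the dictionary layout of the records are assumptions made for illustration; only the three employees whose grades are stated in the text are included.

```python
def hash_partition(records, key, n):
    """Send each record to partition h(value) = value mod n."""
    partitions = [[] for _ in range(n)]
    for record in records:
        partitions[record[key] % n].append(record)
    return partitions


# Only the employees whose grades are given in the text.
employees = [
    {"name": "Smith", "grade": 1},
    {"name": "Blake", "grade": 4},
    {"name": "King", "grade": 5},
]

partitions = hash_partition(employees, "grade", 3)
for p, rows in enumerate(partitions):
    print(f"Partition {p}: {[row['name'] for row in rows]}")
```

Unlike round robin, the placement depends only on the attribute value, so all records with the same GRADE end up in the same partition.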
Range Partitioning
• In range partitioning we identify one or more attributes as partitioning attributes, then choose a range partitioning vector to partition the table across n disks. The vector consists of values of the partitioning attribute that mark the boundaries between partitions.
• Let us partition the Emp_table on the GRADE attribute using range partitioning.
• To apply range partitioning, we first need to identify the partitioning vector.
• Let us choose [2, 4] as the range partitioning vector for our case.
◦ According to the vector, records with a grade value of 2 or less go into partition 0
◦ records with a grade value greater than 2 and less than or equal to 4 go into partition 1
◦ all other records, that is, those with a grade value greater than 4, go into partition 2
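The lookup against the vector [2, 4] can be sketched with Python's standard `bisect` module. The function name `range_partition_index` is an assumption for illustration; `bisect_left` is used because it treats each boundary value as belonging to the lower partition, which matches the "less than or equal to" rule above.

```python
from bisect import bisect_left


def range_partition_index(value, vector):
    """Return the partition for value, given a sorted partitioning vector.

    bisect_left counts the boundaries strictly below value, so a value
    equal to a boundary stays in the lower partition.
    """
    return bisect_left(vector, value)


vector = [2, 4]
for grade in range(1, 6):
    print(f"grade {grade} -> partition {range_partition_index(grade, vector)}")
```

A vector with k boundaries always yields k + 1 partitions, so [2, 4] produces the three partitions needed for three disks.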
SOLVE
• Consider a parallel DBMS in which each relation is stored by horizontally partitioning its tuples across all disks.
• The mgrid field of Departments is the eid of the manager. Each relation contains 20-byte tuples, and the sal and budget fields both contain uniformly distributed values in the range 0 to 1,000,000. The Employees relation contains 100,000 pages, the Departments relation contains 5,000 pages, and each processor has 100 buffer pages of 4,000 bytes each. The cost of one page I/O is td, and the cost of shipping one page is ts; tuples are shipped in units of one page by waiting for a page to be filled before sending a message from processor i to processor j. There are no indexes, and all joins that are local to a processor are carried out using a sort-merge join. Assume that the relations are initially partitioned using a round-robin algorithm and that there are 10 processors.
• For each of the following queries, describe the evaluation plan briefly and give its cost in terms of td and ts. You should compute the total cost across all sites as well as the ‘elapsed time’ cost (i.e., if several operations are carried out concurrently, the time taken is the maximum over these operations).
1. Find the highest paid employee.
2. Find the highest paid employee in the department with did 55.
3. Find the highest paid employee over all departments with budget less than 100,000.
4. Find the highest paid employee over all departments with budget less than 300,000.
5. Find the average salary over all departments with budget less than 300,000.
6. Find the salaries of all managers.
7. Find the salaries of all managers who manage a department with a budget less than 300,000 and earn more than 100,000.
8. Print the eids of all employees, ordered by increasing salaries. Each processor is connected to a separate printer, and the answer can appear as several sorted lists, each printed by a different processor, as long as we can obtain a fully sorted list by concatenating the printed lists (in some order).
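Before costing a plan for any of these queries, it helps to derive the per-processor sizes from the stated parameters. The sketch below is only a starting point, not a solution: it assumes that pages are completely filled with tuples and that round-robin partitioning splits each relation exactly evenly over the 10 processors. The variable names are chosen for illustration.

```python
# Parameters stated in the problem.
PAGE_BYTES = 4_000
TUPLE_BYTES = 20
PROCESSORS = 10
EMPLOYEES_PAGES = 100_000
DEPARTMENTS_PAGES = 5_000

# Assumption: every page is completely filled with tuples.
tuples_per_page = PAGE_BYTES // TUPLE_BYTES

# Assumption: round robin gives every processor an equal share.
emp_pages_per_proc = EMPLOYEES_PAGES // PROCESSORS
dept_pages_per_proc = DEPARTMENTS_PAGES // PROCESSORS

# A full parallel scan of Employees, counted in page I/Os (units of td).
scan_total_io = EMPLOYEES_PAGES        # summed over all sites
scan_elapsed_io = emp_pages_per_proc   # sites scan concurrently

print(f"tuples per page:               {tuples_per_page}")
print(f"Employees pages per processor: {emp_pages_per_proc}")
print(f"Departments pages per processor: {dept_pages_per_proc}")
print(f"Employees scan: total {scan_total_io} td, elapsed {scan_elapsed_io} td")
```

The gap between the total and elapsed figures is the benefit of partitioning: the total I/O is unchanged, but the elapsed time of a scan drops by a factor equal to the number of processors. Shipping costs in ts still have to be added for any plan that moves tuples between sites.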
