Chapter 7 Distributed Database Systems
Fikru T.(MSc.) 3
Parallel Versus Distributed Technology
– There are two main types of multiprocessor system architectures that are commonplace:
1. Shared memory (tightly coupled) architecture
▪ Multiple processors share secondary (disk) storage and also share primary
memory.
2. Shared disk (loosely coupled) architecture
▪ Multiple processors share secondary (disk) storage but each has its own
primary memory.
– These architectures enable processors to communicate without the overhead of
exchanging messages over a network.
– Database management systems developed using the above types of architectures are
termed parallel database management systems rather than DDBMSs, since they utilize
parallel processor technology.
– Another type of multiprocessor architecture is called shared nothing architecture.
– In this architecture, every processor has its own primary and secondary (disk) memory,
no common memory exists, and the processors communicate over a high-speed
interconnection network (bus or switch).
– Although the shared nothing architecture resembles a distributed database computing
environment, major differences exist in the mode of operation.
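– In shared nothing multiprocessor systems, there is symmetry and homogeneity of nodes;
in a distributed database environment, by contrast, heterogeneity of hardware and
operating systems across nodes is common.
– Shared nothing architecture is also considered an environment for parallel databases.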
– If both primary and secondary memories are shared, the architecture is also known as
shared everything architecture.
Advantages of Distributed Databases Cont.…
ii. Replication transparency: copies of data may be stored at multiple sites for
better availability, performance, and reliability.
• Replication transparency makes the user unaware of the existence of copies.
iii. Fragmentation transparency: Two types of fragmentation are possible.
a) Horizontal fragmentation distributes a relation into sets of tuples (rows).
b) Vertical fragmentation distributes a relation into subrelations where each
subrelation is defined by a subset of the columns of the original relation.
– A global query by the user must be transformed into several fragment queries.
Fragmentation transparency makes the user unaware of the existence of fragments.
2. Increased reliability and availability: These are two of the most common
potential advantages cited for distributed databases.
i. Reliability is the probability that a system is running (not down) at a certain
time point.
ii. Availability is the probability that the system is continuously available during
a time interval.
3. Improved performance: A distributed DBMS fragments the database by keeping
the data closer to where it is needed most.
• Data localization reduces the contention for CPU and I/O services and
simultaneously reduces access delays involved in wide area networks.
4. Easier expansion: In a distributed environment, expansion of the system in terms of
adding more data, increasing database sizes, or adding more processors is much easier.
– Additional Functions of Distributed Databases:
➢ Keeping track of data
➢ Distributed query processing
➢ Distributed transaction management
➢ Replicated data management
➢ Distributed database recovery
➢ Security
➢ Distributed directory (catalog) management
Data Fragmentation, Replication, and Allocation Techniques for
Distributed Database Design
1. Data fragmentation refers to the techniques used to break up the database into
logical units, called fragments, which may be assigned for storage at the various
sites.
i. Horizontal Fragmentation.
• A horizontal fragment of a relation is a subset of the tuples in that relation.
• The tuples that belong to the horizontal fragment are specified by a condition
on one or more attributes of the relation.
• Horizontal fragmentation divides a relation "horizontally" by grouping rows
to create subsets of tuples, where each subset has a certain logical meaning.
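The horizontal fragmentation described above can be sketched in a few lines of Python. This is only an illustration, not part of the original slides: the EMPLOYEE relation, its attribute names, and the helper horizontal_fragment are hypothetical, and each fragment is defined by a selection condition on the dno attribute.

```python
# Hypothetical EMPLOYEE relation; each tuple is represented as a dict.
EMPLOYEE = [
    {"ssn": 1, "name": "Abebe", "dno": 5, "salary": 30000},
    {"ssn": 2, "name": "Sara", "dno": 4, "salary": 40000},
    {"ssn": 3, "name": "Kebede", "dno": 5, "salary": 25000},
    {"ssn": 4, "name": "Hana", "dno": 1, "salary": 55000},
]


def horizontal_fragment(relation, condition):
    """Return the subset of tuples in `relation` that satisfy `condition`."""
    return [row for row in relation if condition(row)]


# One fragment per department number (a condition on a single attribute).
dept5 = horizontal_fragment(EMPLOYEE, lambda r: r["dno"] == 5)
dept4 = horizontal_fragment(EMPLOYEE, lambda r: r["dno"] == 4)
dept1 = horizontal_fragment(EMPLOYEE, lambda r: r["dno"] == 1)

# Because the conditions cover every tuple exactly once, the union of the
# fragments gives back the original relation.
reconstructed = sorted(dept5 + dept4 + dept1, key=lambda r: r["ssn"])
```

Each fragment could then be stored at the site where that department's data is used most.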
ii. Vertical Fragmentation.
• Each site may not need all the attributes of a relation, which would indicate
the need for a different type of fragmentation.
• Vertical fragmentation divides a relation "vertically" by columns.
• A vertical fragment of a relation keeps only certain attributes of the relation.
• A vertical fragment on a relation R can be specified by a π_Li(R) operation in
the relational algebra.
• A set of vertical fragments whose projection lists L1, L2, ... , Ln include all the
attributes in R but share only the primary key attribute of R is called a
complete vertical fragmentation of R.
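As a minimal sketch of a complete vertical fragmentation, the Python below projects a relation onto two attribute lists that share only the primary key and then rebuilds the relation by joining on that key. The EMPLOYEE relation, the attribute lists L1 and L2, and the helpers project and join_on_key are assumptions made for illustration.

```python
# Hypothetical EMPLOYEE relation with primary key "ssn".
EMPLOYEE = [
    {"ssn": 1, "name": "Abebe", "dno": 5, "salary": 30000},
    {"ssn": 2, "name": "Sara", "dno": 4, "salary": 40000},
    {"ssn": 3, "name": "Kebede", "dno": 5, "salary": 25000},
]


def project(relation, attributes):
    """Relational projection: keep only the listed attributes of each tuple."""
    return [{a: row[a] for a in attributes} for row in relation]


# The two projection lists cover all attributes and share only the key.
L1 = ["ssn", "name"]
L2 = ["ssn", "dno", "salary"]
frag1 = project(EMPLOYEE, L1)
frag2 = project(EMPLOYEE, L2)


def join_on_key(left, right, key):
    """Rebuild tuples by matching the two fragments on the shared key."""
    right_by_key = {row[key]: row for row in right}
    return [{**l, **right_by_key[l[key]]} for l in left]


rebuilt = join_on_key(frag1, frag2, "ssn")
```

Keeping the primary key in every fragment is what makes the reconstruction possible.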
Query Processing in Distributed Databases
– A DDBMS processes and optimizes a query with respect to the communication cost of
processing a distributed query, along with other parameters.
– The following factors are considered while processing a query:
1. Cost of Data Transfer
– This is a very important factor when processing queries.
– Intermediate data is transferred to other locations for processing, and the
final result is sent to the location where the query was issued.
– The cost of data transfer increases if the locations are connected via a slow
communication channel; a high-performance channel lowers it.
– DDBMS query optimization algorithms are used to minimize the cost of data
transfer.
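To make the data transfer cost concrete, the sketch below compares three ways of executing a join whose result is needed at a third site, counting only the bytes shipped over the network. All relation sizes, site numbers, and the helper transfer_cost are hypothetical values chosen for illustration; a real optimizer would also weigh local processing cost.

```python
def transfer_cost(rows, bytes_per_row):
    """Bytes moved over the network when shipping `rows` tuples."""
    return rows * bytes_per_row


# Hypothetical sizes: EMPLOYEE is stored at site 1, DEPARTMENT at site 2,
# and the join result is needed at site 3.
EMP = transfer_cost(10_000, 100)     # shipping all of EMPLOYEE
DEPT = transfer_cost(100, 35)        # shipping all of DEPARTMENT
RESULT = transfer_cost(10_000, 40)   # shipping the join result

strategies = {
    "ship both relations to site 3": EMP + DEPT,
    "ship EMPLOYEE to site 2, result to site 3": EMP + RESULT,
    "ship DEPARTMENT to site 1, result to site 3": DEPT + RESULT,
}

# Pick the strategy that moves the fewest bytes.
best = min(strategies, key=strategies.get)
```

With these assumed sizes, moving the small relation to the site of the large one transfers the least data.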
2. Semi-join based query optimization
– A semi-join is used to reduce the number of tuples in a relation before transferring it to another
location.
– Only joining columns are transferred in this method.
– This method reduces the cost of data transfer.
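The semi-join idea can be sketched as follows, assuming a hypothetical EMPLOYEE relation at site 1 and a DEPARTMENT relation at site 2, with the query issued at site 2. Only the join column values travel to site 1, and only the matching EMPLOYEE tuples travel back. The relation contents and the helper semi_join are illustrative assumptions.

```python
# Hypothetical relations: EMPLOYEE lives at site 1, DEPARTMENT at site 2.
EMPLOYEE = [
    {"ssn": 1, "name": "Abebe"},
    {"ssn": 2, "name": "Sara"},
    {"ssn": 3, "name": "Kebede"},
    {"ssn": 4, "name": "Hana"},
]
DEPARTMENT = [
    {"dnumber": 5, "dname": "Research", "mgr_ssn": 2},
    {"dnumber": 1, "dname": "Headquarters", "mgr_ssn": 4},
]


def semi_join(relation, attr, join_values):
    """Keep only the tuples whose `attr` value appears in `join_values`."""
    return [row for row in relation if row[attr] in join_values]


# Step 1 (site 2 -> site 1): ship only the join column of DEPARTMENT.
join_values = {d["mgr_ssn"] for d in DEPARTMENT}

# Step 2 (at site 1): reduce EMPLOYEE, then ship the reduced relation back.
reduced_employee = semi_join(EMPLOYEE, "ssn", join_values)

# Step 3 (at site 2): complete the join locally.
emp_by_ssn = {e["ssn"]: e for e in reduced_employee}
result = [
    {"dname": d["dname"], "manager": emp_by_ssn[d["mgr_ssn"]]["name"]}
    for d in DEPARTMENT
    if d["mgr_ssn"] in emp_by_ssn
]
```

Here two EMPLOYEE tuples are shipped instead of four; the saving grows with the size of the relation being reduced.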
3. Cost based query optimization
– It involves many operations, such as selection, projection, and aggregation.
– Cost of communication is considered in query optimization.
– In a centralized system, information about relations at remote locations is obtained from the
server's system catalogs.
– The data (query) manipulated at a local location is treated as a subquery by the other,
global locations.
– This process estimates the total cost needed to compute the intermediate relations.