
Chapter 7 Distributed Database Systems

The document discusses distributed database systems, highlighting their advantages such as improved reliability, availability, and performance through data fragmentation and replication techniques. It differentiates between various architectures like shared memory, shared disk, and shared nothing, as well as the types of distributed database systems based on homogeneity and autonomy. Additionally, it covers query processing strategies to optimize data transfer and minimize costs in distributed environments.


Advanced Database System

Chapter 7 Distributed Database System

6/15/2025 Fikru T. (MSc.)


Distributed Database Concepts
– Distributed databases bring the advantages of distributed computing to the
database management domain.
– A distributed computing system consists of a number of processing elements, not
necessarily homogeneous, that are interconnected by a computer network, and that
cooperate in performing certain assigned tasks.
– As a general goal, distributed computing systems partition a big, unmanageable
problem into smaller pieces and solve it efficiently in a coordinated manner.
– The economic viability of this approach stems from two reasons:
1. more computer power is harnessed to solve a complex task, and
2. each autonomous processing element can be managed independently and
develop its own applications.
Distributed Database Concepts Cont.…

– A distributed database (DDB) is a collection of multiple logically
interrelated databases distributed over a computer network.
– A distributed database management system (DDBMS) is a software
system that manages a distributed database while making the distribution
transparent to the user.
– A collection of files stored at different nodes of a network, with the
interrelationships among them maintained via hyperlinks, has become a
common organization on the Internet, as with the files of Web pages.

Parallel Versus Distributed Technology
– There are two main types of multiprocessor system architectures that are commonplace:
1. Shared memory (tightly coupled) architecture
▪ Multiple processors share secondary (disk) storage and also share primary
memory.
2. Shared disk (loosely coupled) architecture
▪ Multiple processors share secondary (disk) storage but each has its own
primary memory.
– These architectures enable processors to communicate without the overhead of
exchanging messages over a network.
– Database management systems developed using the above types of architectures are
termed parallel database management systems rather than DDBMSs, since they utilize
parallel processor technology.
Parallel Versus Distributed Technology Cont.…
– Another type of multiprocessor architecture is called shared nothing architecture.
– In this architecture, every processor has its own primary and secondary (disk) memory,
no common memory exists, and the processors communicate over a high-speed
interconnection network (bus or switch).
– Although the shared nothing architecture resembles a distributed database computing
environment, major differences exist in the mode of operation.
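– In shared nothing multiprocessor systems, there is symmetry and homogeneity of nodes;
this is not true of the distributed database environment, where heterogeneity of
hardware and operating system at each node is very common.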
– Shared nothing architecture is also considered as an environment for parallel databases.
– If both primary and secondary memories are shared, the architecture is also known as
shared everything architecture.
Parallel Versus Distributed Technology Cont.…

Figure: Shared nothing architecture.

Figure: A networked architecture with a centralized database at one of the sites.

Figure: A truly distributed database architecture.
Advantages of Distributed Databases
1. Management of distributed data with different levels of transparency
i. Distribution or network transparency: This refers to freedom for the user
from the operational details of the network.
• It may be divided into location transparency and naming transparency.
a) Location transparency refers to the fact that the command used to
perform a task is independent of the location of data and the location of
the system where the command was issued.
b) Naming transparency implies that once a name is specified, the named
objects can be accessed unambiguously without additional specification.

Advantages of Distributed Databases Cont.…
ii. Replication transparency: copies of data may be stored at multiple sites for
better availability, performance, and reliability.
• Replication transparency makes the user unaware of the existence of copies.
iii. Fragmentation transparency: Two types of fragmentation are possible.
a) Horizontal fragmentation distributes a relation into sets of tuples (rows).
b) Vertical fragmentation distributes a relation into subrelations where each
subrelation is defined by a subset of the columns of the original relation.
– A global query by the user must be transformed into several fragment queries.
Fragmentation transparency makes the user unaware of the existence of fragments.
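To make location and fragmentation transparency concrete, the following is a minimal sketch, assuming a hypothetical global catalog (`CATALOG`) that maps a global relation name to its fragments and their sites, and hypothetical in-memory site stores (`SITES`). The user query names only `EMPLOYEE`; the helper `global_select` (also hypothetical) turns it into one fragment query per site.

```python
# Hypothetical global catalog: global relation name -> [(fragment name, site)]
CATALOG = {
    "EMPLOYEE": [("EMP_D5", "site2"), ("EMP_D4", "site3")],
}

# Hypothetical fragment storage at each site: site -> fragment name -> tuples
SITES = {
    "site2": {"EMP_D5": [{"Ssn": "111", "Lname": "Smith", "Dno": 5}]},
    "site3": {"EMP_D4": [{"Ssn": "222", "Lname": "Wong", "Dno": 4}]},
}


def global_select(relation, condition):
    """Answer a selection on a global relation name.

    The caller names neither sites nor fragments: the catalog lookup gives
    location transparency, and rewriting the global query into one query per
    fragment gives fragmentation transparency.
    """
    result = []
    for fragment, site in CATALOG[relation]:
        # Fragment query: the same condition evaluated at the site holding the fragment.
        result.extend(t for t in SITES[site][fragment] if condition(t))
    return result


print(global_select("EMPLOYEE", lambda t: t["Dno"] == 4))
# [{'Ssn': '222', 'Lname': 'Wong', 'Dno': 4}]
```

A real DDBMS would also use the fragment definitions to skip fragments that cannot satisfy the condition; this sketch simply queries every fragment.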

Advantages of Distributed Databases Cont.…
2. Increased reliability and availability: These are two of the most common
potential advantages cited for distributed databases.
i. Reliability is the probability that a system is running (not down) at a certain
time point.
ii. Availability is the probability that the system is continuously available during
a time interval.
3. Improved performance: A distributed DBMS fragments the database and keeps
the data closer to where it is needed most.
• Data localization reduces the contention for CPU and I/O services and
simultaneously reduces access delays involved in wide area networks.
Advantages of Distributed Databases Cont.…
4. Easier expansion: In a distributed environment, expansion of the system in terms of
adding more data, increasing database sizes, or adding more processors is much easier.
– Additional Functions of Distributed Databases:
➢ Keeping track of data
➢ Distributed query processing
➢ Distributed transaction management
➢ Replicated data management
➢ Distributed database recovery
➢ Security
➢ Distributed directory (catalog) management
Data Fragmentation, Replication, and Allocation Techniques for
Distributed Database Design
1. Data fragmentation refers to the techniques used to break up the database into
logical units, called fragments, which may be assigned for storage at the various
sites.
i. Horizontal Fragmentation.
• A horizontal fragment of a relation is a subset of the tuples in that relation.
• The tuples that belong to the horizontal fragment are specified by a condition
on one or more attributes of the relation.
• Horizontal fragmentation divides a relation "horizontally" by grouping rows
to create subsets of tuples, where each subset has a certain logical meaning.
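Horizontal fragmentation by a condition on one attribute can be sketched as follows. The `EMPLOYEE` tuples and the choice of `Dno` as the fragmenting attribute are hypothetical; each fragment is $\sigma_C(R)$ for a department-specific condition, and because the conditions are disjoint and cover every tuple, a UNION of the fragments restores the relation.

```python
# Hypothetical EMPLOYEE relation, one dict per tuple
EMPLOYEE = [
    {"Ssn": "123", "Lname": "Smith", "Salary": 30000, "Dno": 5},
    {"Ssn": "333", "Lname": "Wong", "Salary": 40000, "Dno": 5},
    {"Ssn": "999", "Lname": "Zelaya", "Salary": 25000, "Dno": 4},
    {"Ssn": "888", "Lname": "Borg", "Salary": 55000, "Dno": 1},
]


def horizontal_fragment(relation, condition):
    """sigma_C(R): keep the whole tuples that satisfy condition C."""
    return [t for t in relation if condition(t)]


# One fragment per department: the conditions are disjoint and cover every tuple.
fragments = {
    dno: horizontal_fragment(EMPLOYEE, lambda t, d=dno: t["Dno"] == d)
    for dno in (1, 4, 5)
}


def reconstruct(frags):
    """UNION of the horizontal fragments gives back the original tuples."""
    return [t for frag in frags for t in frag]


print([t["Lname"] for t in fragments[5]])  # ['Smith', 'Wong']
```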
Data Fragmentation, Replication, and Allocation Techniques Cont.…
ii. Vertical Fragmentation.
• Each site may not need all the attributes of a relation, which would indicate
the need for a different type of fragmentation.
• Vertical fragmentation divides a relation "vertically" by columns.
• A vertical fragment of a relation keeps only certain attributes of the relation.
• A vertical fragment on a relation R can be specified by a $\pi_{L_i}(R)$ operation in
the relational algebra.
• A set of vertical fragments whose projection lists $L_1, L_2, \ldots, L_n$ include all the
attributes in R but share only the primary key attribute of R is called a
complete vertical fragmentation of R.
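A complete vertical fragmentation can be sketched with two projection lists that cover every attribute and share only the primary key. The relation, the attribute groups, and the helpers `project` and `join_on_key` are hypothetical; joining the fragments on the key reconstructs the original tuples.

```python
# Hypothetical EMPLOYEE relation with primary key Ssn
EMPLOYEE = [
    {"Ssn": "123", "Lname": "Smith", "Address": "Houston", "Salary": 30000, "Dno": 5},
    {"Ssn": "999", "Lname": "Zelaya", "Address": "Spring", "Salary": 25000, "Dno": 4},
]


def project(relation, attrs):
    """pi_L(R): keep only the listed attributes (the key is always listed here,
    so no duplicate elimination is needed)."""
    return [{a: t[a] for a in attrs} for t in relation]


def join_on_key(r, s, key):
    """Natural join of two vertical fragments on their shared primary key."""
    index = {t[key]: t for t in s}
    return [{**t, **index[t[key]]} for t in r if t[key] in index]


L1 = ["Ssn", "Lname", "Address"]  # hypothetical personal-data fragment
L2 = ["Ssn", "Salary", "Dno"]     # hypothetical payroll fragment
V1, V2 = project(EMPLOYEE, L1), project(EMPLOYEE, L2)

# Complete vertical fragmentation: L1 and L2 cover every attribute and share only the key.
assert set(L1) | set(L2) == set(EMPLOYEE[0]) and set(L1) & set(L2) == {"Ssn"}
print(join_on_key(V1, V2, "Ssn") == EMPLOYEE)  # True
```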
Data Fragmentation, Replication, and Allocation Techniques Cont.…

iii. Mixed (Hybrid) Fragmentation


• We can intermix the two types of fragmentation, yielding a mixed
fragmentation.
• In general, a fragment of a relation R can be specified by a SELECT-PROJECT
combination of operations $\pi_L(\sigma_C(R))$.
• A fragmentation schema of a database is a definition of a set of fragments
that includes all attributes and tuples in the database and satisfies the
condition that the whole database can be reconstructed from the fragments
by applying some sequence of OUTER UNION (or OUTER JOIN) and
UNION operations.
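A mixed fragment $\pi_L(\sigma_C(R))$ first picks rows, then columns. The sketch below is a hypothetical example with two row groups (by `Dno`) and two column groups, giving four fragments; it rebuilds each row group with a join on the key and then takes the UNION of the groups. A plain (inner) join suffices here only because every fragment keeps `Ssn` and no tuple is missing from either column group; the general case uses OUTER JOIN / OUTER UNION as stated above.

```python
# Hypothetical EMPLOYEE relation with primary key Ssn
EMPLOYEE = [
    {"Ssn": "123", "Lname": "Smith", "Salary": 30000, "Dno": 5},
    {"Ssn": "333", "Lname": "Wong", "Salary": 40000, "Dno": 5},
    {"Ssn": "999", "Lname": "Zelaya", "Salary": 25000, "Dno": 4},
]


def select(relation, condition):
    """sigma_C(R)"""
    return [t for t in relation if condition(t)]


def project(relation, attrs):
    """pi_L(R); attrs always include the key, so no duplicates arise."""
    return [{a: t[a] for a in attrs} for t in relation]


def mixed_fragment(relation, attrs, condition):
    """pi_L(sigma_C(R)): rows chosen by C, then columns chosen by L."""
    return project(select(relation, condition), attrs)


def join_on_key(r, s, key):
    index = {t[key]: t for t in s}
    return [{**t, **index[t[key]]} for t in r if t[key] in index]


PERSONAL = ["Ssn", "Lname"]         # hypothetical column groups
PAYROLL = ["Ssn", "Salary", "Dno"]
CONDITIONS = [lambda t: t["Dno"] == 5, lambda t: t["Dno"] != 5]  # disjoint row groups

# Four fragments; rebuild each row group with a JOIN on Ssn, then UNION the groups.
reconstructed = []
for cond in CONDITIONS:
    personal = mixed_fragment(EMPLOYEE, PERSONAL, cond)
    payroll = mixed_fragment(EMPLOYEE, PAYROLL, cond)
    reconstructed += join_on_key(personal, payroll, "Ssn")

print(mixed_fragment(EMPLOYEE, PERSONAL, CONDITIONS[1]))  # [{'Ssn': '999', 'Lname': 'Zelaya'}]
```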
Data Fragmentation, Replication, and Allocation Techniques Cont.…
• An allocation schema describes the allocation of fragments to sites of the
DDBS; hence, it is a mapping that specifies for each fragment the site(s) at
which it is stored.
• If a fragment is stored at more than one site, it is said to be replicated.
2. Data Replication and Allocation
– Replication is useful in improving the availability of data.
– The most extreme case is replication of the whole database at every site in
the distributed system, thus creating a fully replicated distributed database.
– This can improve availability remarkably because the system can continue to
operate as long as at least one site is up.
Data Fragmentation, Replication, and Allocation Techniques Cont.…
– It also improves performance of retrieval for global queries, because the
result of such a query can be obtained locally from any one site;
– hence, a retrieval query can be processed at the local site where it is
submitted, if that site includes a server module.
– The disadvantage of full replication is that it can slow down update
operations drastically, since a single logical update must be performed on
every copy of the database to keep the copies consistent.
– This is especially true if many copies of the database exist.
– Full replication makes the concurrency control and recovery techniques
more expensive than they would be if there were no replication.
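The allocation schema and the availability/update trade-off of replication can be sketched as below. The fragment names, sites, and site-up probability are hypothetical, and the availability formula assumes sites fail independently, which real systems rarely guarantee. For a fully replicated database on n sites, the whole database is available with probability `availability(n, p)`, while every logical update costs n writes.

```python
# Hypothetical allocation schema: fragment -> set of sites holding a copy
ALLOCATION = {
    "EMP_D5": {"site1", "site2"},
    "EMP_D4": {"site1", "site3"},
    "PROJECT": {"site1"},
}


def replicated_fragments(allocation):
    """A fragment stored at more than one site is replicated."""
    return sorted(f for f, sites in allocation.items() if len(sites) > 1)


def availability(num_copies, p_site_up):
    """P(at least one copy is up), assuming sites fail independently."""
    return 1 - (1 - p_site_up) ** num_copies


def writes_per_update(allocation, fragment):
    """One logical update must be applied to every copy to keep copies consistent."""
    return len(allocation[fragment])


print(replicated_fragments(ALLOCATION))  # ['EMP_D4', 'EMP_D5']
print(round(availability(1, 0.9), 3), round(availability(3, 0.9), 3))  # 0.9 0.999
print(writes_per_update(ALLOCATION, "EMP_D5"))  # 2
```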
Types of Distributed Database Systems
– The first factor we consider is the degree of homogeneity of the
DDBMS software.
– If all servers (or individual local DBMSs) use identical software and all
users (clients) use identical software, the DDBMS is called homogeneous;
– otherwise, it is called heterogeneous.
– Another factor related to the degree of homogeneity is the degree of
local autonomy.
– If there is no provision for the local site to function as a stand-alone
DBMS, then the system has no local autonomy.
Types of Distributed Database Systems Cont..
– On the other hand, if direct access by local transactions to a server is
permitted, the system has some degree of local autonomy.
– At one extreme of the autonomy spectrum, we have a DDBMS that "looks
like" a centralized DBMS to the user.
– A single conceptual schema exists, and all access to the system is obtained
through a site that is part of the DDBMS, which means that no local
autonomy exists.
– At the other extreme we encounter a type of DDBMS called a federated
DDBMS (or a multidatabase system).
Types of Distributed Database Systems Cont..
– In such a system, each server is an independent and autonomous centralized DBMS
that has its own local users, local transactions, and DBA and hence has a very high
degree of local autonomy.
– The term federated database system (FDBS) is used when there is some global
view or schema of the federation of databases that is shared by the applications.
– On the other hand, a multidatabase system does not have a global schema and
interactively constructs one as needed by the application.
– Both systems are hybrids between distributed and centralized systems, and the
distinction we made between them is not strictly followed.
– We will refer to them as FDBSs in a generic sense.
Types of Distributed Database Systems Cont..
– Issues in federated database management systems:
1. Differences in data models
2. Differences in constraints
3. Differences in query languages
Semantic Heterogeneity
– Semantic heterogeneity occurs when there are differences in the meaning,
interpretation, and intended use of the same or related data.
– Semantic heterogeneity among component database systems (DBSs) creates the biggest
hurdle in designing global schemas of heterogeneous databases.
– The design autonomy of component DBSs refers to their freedom of choosing the
following design parameters, which in turn affect the eventual complexity of the FDBS:
1. The universe of discourse from which the data is drawn
2. Representation and naming
3. The understanding, meaning, and subjective interpretation of data
4. Transaction and policy constraints
5. Derivation of summaries
– Communication autonomy of a component DBS refers to its ability to decide
whether to communicate with another component DBS.
– Execution autonomy refers to the ability of a component DBS to execute local
operations without interference from external operations by other component
DBSs and its ability to decide the order in which to execute them.
Types of Distributed Database Systems Cont..
– The association autonomy of a component DBS implies that it has the ability to
decide whether and how much to share its functionality (operations it supports)
and resources (data it manages) with other component DBSs.
– The major challenge of designing FDBSs is to let component DBSs interoperate
while still providing the above types of autonomies to them.
– A typical five-level schema architecture to support global applications in the FDBS
environment is shown in the figure below.
– In this architecture, the local schema is the conceptual schema (full database
definition) of a component database, and the component schema is derived by
translating the local schema into a canonical data model or common data model
(CDM) for the FDBS.
Types of Distributed Database Systems Cont..
– Schema translation from the local schema to the component schema is
accompanied by generation of mappings to transform commands on a
component schema into commands on the corresponding local schema.
– The export schema represents the subset of a component schema that is
available to the FDBS.
– The federated schema is the global schema or view, which is the result of
integrating all the shareable export schemas.
– The external schemas define the schema for a user group or an application, as in
the three-level schema architecture.
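The chain local schema → component schema → export schema → federated schema → external schema can be sketched with schemas as plain dictionaries. Everything here is hypothetical: the native type names, the `TYPE_MAP` into the common data model, the second component's export, and the "integration" step, which is a plain merge. Real schema integration must resolve semantic heterogeneity, and the command mappings between component and local schemas are not modeled.

```python
# Hypothetical local schema of one component DBS, in its own native type names
LOCAL_SCHEMA = {"EMP": {"ENO": "NUMBER", "ENAME": "VARCHAR2", "SAL": "NUMBER"}}
# Hypothetical mapping from that DBMS's types to the federation's CDM types
TYPE_MAP = {"NUMBER": "integer", "VARCHAR2": "string"}


def component_schema(local, type_map):
    """Local schema translated into the common data model (types only here)."""
    return {rel: {a: type_map[t] for a, t in attrs.items()} for rel, attrs in local.items()}


def export_schema(component, shared):
    """Subset of the component schema made available to the FDBS."""
    return {rel: {a: component[rel][a] for a in attrs} for rel, attrs in shared.items()}


def federated_schema(*exports):
    """Integration of all export schemas (a plain merge in this sketch)."""
    merged = {}
    for exp in exports:
        for rel, attrs in exp.items():
            merged.setdefault(rel, {}).update(attrs)
    return merged


def external_schema(federated, rel, attrs):
    """View for one user group or application."""
    return {rel: {a: federated[rel][a] for a in attrs}}


component = component_schema(LOCAL_SCHEMA, TYPE_MAP)
exported = export_schema(component, {"EMP": ["ENO", "ENAME"]})  # SAL is not shared
# Second argument: a hypothetical export schema from another component DBS
federated = federated_schema(exported, {"DEPT": {"DNO": "integer", "DNAME": "string"}})
print(external_schema(federated, "EMP", ["ENAME"]))  # {'EMP': {'ENAME': 'string'}}
```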
Types of Distributed Database Systems Cont..

Figure: The five-level schema architecture in a federated database system (FDBS).
Query Processing in Distributed Databases
– A DDBMS processes and optimizes a query in terms of the communication cost of
processing a distributed query, along with other parameters.
– Factors considered while processing a query:
1. Cost of Data Transfer
– This is a very important factor while processing queries.
– Intermediate data is transferred to other locations for processing, and the
final result is sent to the location where the query was issued.
– The cost of data transfer increases when the locations are connected by a
low-bandwidth (slow) communication channel.
– DDBMS query optimization algorithms are used to minimize the cost of data
transfer.
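Comparing strategies by bytes shipped makes the data-transfer factor concrete. The sketch below assumes a join R ⋈ S with R at site 1, S at site 2, and the query issued at site 3; all relation sizes are hypothetical, and local processing cost is ignored so that only communication is compared.

```python
def transfer_costs(size_r, size_s, size_result, r_site, s_site, query_site):
    """Bytes shipped by three strategies for R JOIN S; local processing cost is ignored."""

    def ship(nbytes, src, dst):
        # Moving data within one site costs nothing in this model.
        return 0 if src == dst else nbytes

    return {
        "ship R and S to the query site":
            ship(size_r, r_site, query_site) + ship(size_s, s_site, query_site),
        "ship R to S's site, then ship the result":
            ship(size_r, r_site, s_site) + ship(size_result, s_site, query_site),
        "ship S to R's site, then ship the result":
            ship(size_s, s_site, r_site) + ship(size_result, r_site, query_site),
    }


# Hypothetical sizes: R has 10,000 tuples of 100 bytes, S has 100 tuples of 35 bytes,
# and the join result has 10,000 tuples of 40 bytes; R, S and the query are at sites 1, 2, 3.
costs = transfer_costs(10_000 * 100, 100 * 35, 10_000 * 40, r_site=1, s_site=2, query_site=3)
best = min(costs, key=costs.get)
for strategy, nbytes in costs.items():
    print(f"{strategy}: {nbytes:,} bytes")
print("cheapest:", best)  # cheapest: ship S to R's site, then ship the result
```

Shipping the small relation to the large one's site wins here (403,500 bytes versus 1,003,500 and 1,400,000); with different sizes or site placements, a different strategy can win, which is why the optimizer compares them.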
Query Processing in Distributed Databases Cont.…
2. Semi-join based query optimization
– A semijoin is used to reduce the number of tuples in a relation before transferring it to another
location.
– In this method, only the joining column(s) of one relation are transferred to the other site.
– This method reduces the cost of data transfer.
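The semijoin steps can be sketched as follows, assuming hypothetical relations EMPLOYEE at site 1 and DEPARTMENT at site 2, with the final join done at site 2. Transfer is counted in attribute values shipped, a simplification that ignores message overhead. Only the join column of DEPARTMENT travels to site 1, and only the matching EMPLOYEE tuples (R ⋉ S) travel back.

```python
# Hypothetical EMPLOYEE(Ssn, Lname, Dno) at site 1 and DEPARTMENT(Dnumber, Dname) at site 2
EMPLOYEE = [
    {"Ssn": "123", "Lname": "Smith", "Dno": 5},
    {"Ssn": "999", "Lname": "Zelaya", "Dno": 4},
    {"Ssn": "888", "Lname": "Borg", "Dno": 1},
    {"Ssn": "444", "Lname": "Narayan", "Dno": 1},
    {"Ssn": "555", "Lname": "English", "Dno": 7},
]
DEPARTMENT = [
    {"Dnumber": 5, "Dname": "Research"},
    {"Dnumber": 4, "Dname": "Administration"},
]


def semijoin_plan(r, s, r_attr, s_attr):
    """Join R (site 1) with S (site 2) at site 2, using a semijoin to shrink R first.

    Returns (result, values shipped), counting one unit per attribute value sent.
    """
    join_values = {t[s_attr] for t in s}                    # 1. pi_{s_attr}(S) at site 2
    shipped = len(join_values)                              #    only the join column goes to site 1
    reduced_r = [t for t in r if t[r_attr] in join_values]  # 2. R semijoin S at site 1
    shipped += sum(len(t) for t in reduced_r)               #    only matching tuples go back
    s_by_key = {}
    for t in s:
        s_by_key.setdefault(t[s_attr], []).append(t)
    # 3. the actual join, performed at site 2
    result = [{**tr, **ts} for tr in reduced_r for ts in s_by_key[tr[r_attr]]]
    return result, shipped


def ship_all_cost(r):
    """Values shipped if all of R is sent to site 2 without reduction."""
    return sum(len(t) for t in r)


result, shipped = semijoin_plan(EMPLOYEE, DEPARTMENT, "Dno", "Dnumber")
print([(t["Lname"], t["Dname"]) for t in result])  # [('Smith', 'Research'), ('Zelaya', 'Administration')]
print(shipped, "values with semijoin vs", ship_all_cost(EMPLOYEE), "without")  # 8 values with semijoin vs 15 without
```

The saving depends on selectivity: if nearly every EMPLOYEE tuple matched, the extra trip for the join column would make the semijoin more expensive than shipping R directly.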
3. Cost based query optimization
– It involves many operations like selection, projection, aggregation.
– Cost of communication is considered in query optimization.
– In a centralized system, information about relations at remote locations is obtained from the
server's system catalogs.
– The data (query) manipulated at local sites is treated as a subquery of the global query
spanning the other sites.
– This process estimates the total cost needed to compute the intermediate relations.
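Cost-based optimization can be sketched as estimating the size of each intermediate relation from catalog statistics, adding communication cost to local cost, and keeping the cheapest plan. The plan names, cost units, per-byte communication weight, and the 5% selectivity below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Plan:
    name: str
    local_cost: float   # estimated CPU + I/O cost units (hypothetical)
    bytes_shipped: int  # estimated size of intermediate relations sent between sites


def estimated_size(num_tuples, tuple_bytes, selectivity=1.0):
    """Estimated size in bytes of sigma_C(R), given a catalog selectivity for C."""
    return round(num_tuples * selectivity) * tuple_bytes


def total_cost(plan, cost_per_byte):
    """Total cost = local processing cost + communication cost."""
    return plan.local_cost + cost_per_byte * plan.bytes_shipped


def choose_plan(plans, cost_per_byte=0.01):
    """Pick the plan with the lowest estimated total cost."""
    return min(plans, key=lambda p: total_cost(p, cost_per_byte))


# Hypothetical statistics: R has 10,000 tuples of 100 bytes; selection C keeps 5% of them.
plans = [
    Plan("ship R, select at the query site", 50, estimated_size(10_000, 100)),
    Plan("select at R's site, ship the result", 60, estimated_size(10_000, 100, 0.05)),
]
best = choose_plan(plans)
print(best.name)  # select at R's site, ship the result
```

If communication were free (`cost_per_byte=0.0`), the plan with the lower local cost would win instead, which shows why communication cost must be part of the estimate in a distributed setting.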
