19516_Week 2 Parallel and Distributed Database

The document discusses Parallel and Distributed Databases, highlighting the benefits of parallel databases in improving processing speeds through multiple CPUs and disks, and detailing their architectures: Shared Memory, Shared Disk, and Shared Nothing systems. It also explains Distributed Databases, which consist of interrelated databases across a network, managed by a Distributed Database Management System (DDBMS), and outlines the advantages and disadvantages of both systems. Key differences between Parallel and Distributed DBMS are also noted, emphasizing their operational and design purposes.

Uploaded by

Fountain Josiah
Copyright: © All Rights Reserved

TOPIC: Parallel and Distributed Database

Parallel database:
A parallel database improves processing and input/output speeds by using multiple CPUs and disks in
parallel. A parallel database system seeks to improve performance through parallelization of
various operations, such as loading data, building indexes and evaluating queries. In parallel
processing, many operations are performed simultaneously, as opposed to serial processing, in
which the computational steps are performed sequentially.
Organizations of every size benefit from databases because they improve the management of
information. A database has a server, a specialized program that oversees all user requests.
Organizations use the parallel database approach when they have a large user base and millions of
records to process. Parallel databases are fast, flexible and reliable.
Architecture for parallel database:
There are three main architectures for building a parallel DBMS:
Shared Memory
Shared Disk System
Shared Nothing System
1. Shared Memory System: This is where multiple processors are attached to an interconnection
network and access a common region of memory.
Advantages
1. It is closer to a conventional machine and easy to program.
2. Overhead is low.
3. OS services are leveraged to utilize the additional CPUs.
Disadvantages
1. It leads to a bottleneck problem, because all processors contend for the same memory.
2. Expensive to build.
3. It is less sensitive to partitioning.
2. Shared Disk System: This is where each processor has its own main memory and direct access to
all disks through an interconnection network.
Advantages
1. The same as for shared memory.
Disadvantages
1. More interference.
2. Increases network (N/W) bandwidth requirements.
3. The shared disk is less sensitive to partitioning.
3. Shared Nothing System: This is where each processor has its own local main memory and disk
space (mass storage). No two processors can access the same storage area, and all communication
between processors is through a network connection.
Advantages
1. It provides linear scale-up and linear speedup.
2. Shared nothing benefits from “good” partitioning.
3. Cheap to build.
Disadvantages
1. It is hard to program.
2. Addition of new nodes requires reorganization.
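The terms linear speedup and linear scale-up can be made concrete with a small calculation. The
sketch below uses the usual textbook definitions, which are an assumption here since the notes do
not define them: speedup compares the same workload on a small and a larger system, and scale-up
compares a small workload on a small system with an N-times larger workload on an N-times larger
system. The function names and timings are hypothetical.

```python
def speedup(time_on_small_system, time_on_large_system):
    """Same workload, more resources. Linear speedup on an N-times larger system is N."""
    return time_on_small_system / time_on_large_system


def scaleup(time_small_job_small_system, time_large_job_large_system):
    """N-times larger workload on an N-times larger system. Linear scale-up is 1."""
    return time_small_job_small_system / time_large_job_large_system


# Hypothetical timings in seconds, for illustration only.
print(speedup(100.0, 25.0))   # a 4-node system finishing the same job 4 times faster
print(scaleup(100.0, 100.0))  # 4x the data on 4x the nodes in the same elapsed time
```

A speedup below N, or a scale-up below 1, indicates that overheads such as interference or
communication are eating into the benefit of the added processors.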
Parallel query evaluation
A relational query execution plan is a graph (tree) of relational algebra operators, and the
operators in the graph can be executed in parallel. If one operator consumes the output of a
second operator while that output is still being produced, we have pipelined parallelism.
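Pipelined parallelism can be illustrated with two operators running in separate threads and
connected by a queue. This is only a minimal sketch: the operator names (scan, select), the DONE
marker and the queue size are assumptions made for the example, and a real parallel DBMS would run
operators on separate processors rather than Python threads.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker


def scan(rows, out):
    """Producer operator: emits tuples one at a time."""
    for row in rows:
        out.put(row)
    out.put(DONE)


def select(predicate, inp, out):
    """Consumer operator: filters tuples as soon as the scan emits them."""
    while True:
        row = inp.get()
        if row is DONE:
            out.put(DONE)
            return
        if predicate(row):
            out.put(row)


def run_pipeline(rows, predicate):
    scan_to_select = queue.Queue(maxsize=2)  # small buffer between the two operators
    results = queue.Queue()
    producer = threading.Thread(target=scan, args=(rows, scan_to_select))
    consumer = threading.Thread(target=select, args=(predicate, scan_to_select, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    output = []
    while True:
        row = results.get()
        if row is DONE:
            break
        output.append(row)
    return output


print(run_pipeline(range(10), lambda r: r % 2 == 0))
```

The selection starts working before the scan has finished, which is the point of pipelining: the
second operator does not wait for the complete output of the first.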
Data partitioning: Here a large database is partitioned horizontally across several disks. This
enables us to exploit the I/O bandwidth of the disks by reading and writing them in parallel. This
can be done in the following ways:
1. Round-robin partitioning: If there are n processors, the i-th tuple is assigned to processor
i mod n. Round-robin partitioning is suitable for efficiently evaluating queries that access the
entire relation. If only a subset of the tuples is required, hash partitioning and range
partitioning are better than round-robin partitioning.
2. Hash partitioning: A hash function is applied to (selected fields of) a tuple to determine its
processor. Hash partitioning has the additional virtue that it keeps data evenly distributed even if
the data grows and shrinks over time.
3. Range partitioning: Tuples are sorted and ranges are chosen for the sort key values so that each
range contains roughly the same number of tuples; tuples in range i are assigned to processor i.
Range partitioning can lead to data skew, that is, partitions holding widely varying numbers of
tuples.
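The three partitioning schemes can be sketched in a few lines of Python. This is an illustrative
sketch, not how any particular DBMS implements partitioning: the function names, the use of
Python's built-in hash on an integer key, and the way range boundaries are picked from the sorted
keys are all assumptions made for the example.

```python
import bisect


def round_robin_partition(tuples, n):
    """Assign the i-th tuple to processor i mod n."""
    parts = [[] for _ in range(n)]
    for i, t in enumerate(tuples):
        parts[i % n].append(t)
    return parts


def hash_partition(tuples, n, key):
    """Assign each tuple to the processor chosen by hashing its partitioning field."""
    parts = [[] for _ in range(n)]
    for t in tuples:
        parts[hash(key(t)) % n].append(t)
    return parts


def choose_boundaries(keys, n):
    """Pick n - 1 split points from the sorted keys so the ranges are roughly equal in size."""
    ordered = sorted(keys)
    return [ordered[(i * len(ordered)) // n] for i in range(1, n)]


def range_partition(tuples, boundaries, key):
    """Assign each tuple to the range its key falls into (range i goes to processor i)."""
    parts = [[] for _ in range(len(boundaries) + 1)]
    for t in tuples:
        parts[bisect.bisect_right(boundaries, key(t))].append(t)
    return parts


# Hypothetical relation of (id, name) tuples.
rows = [(i, "name%d" % i) for i in range(8)]
by_id = lambda t: t[0]

print(round_robin_partition(rows, 3))
print(hash_partition(rows, 4, by_id))
print(range_partition(rows, choose_boundaries([by_id(t) for t in rows], 4), by_id))
```

With hash or range partitioning, a query on the partitioning field only has to visit the
processors that can hold matching tuples, whereas round-robin partitioning forces it to visit all
of them.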
Advantages of parallel databases
A parallel database runs on many computers at the same time.
1. High performance
2. Speed
3. Reliability
4. Capacity
Disadvantages of parallel databases
1. Implementation is highly expensive.
2. Managing a parallel database, with many operations running simultaneously, is difficult and
complex.
3. A lot of resources are needed to support and maintain the database.
Distributed Database
A distributed database (DDB) is a collection of multiple, logically interrelated databases
distributed over a computer network.
A distributed database management system (DDBMS) is the software that manages the DDB and
provides an access mechanism that makes this distribution transparent to the users. A distributed
database system is a system that permits physical data storage across several sites, where each
site/node is managed by a DBMS that is capable of running independently of the other sites. It is
a database in which the storage devices are not all attached to a common processing unit such as
the CPU, and which is controlled by a distributed database management system. It may be stored in
multiple computers located in the same physical location, or may be dispersed over a network of
interconnected computers. System administrators can distribute collections of data (e.g., in a
database) across multiple physical locations. A distributed database can reside on network servers
on the internet, on corporate intranets, or on other company networks.
Two processes ensure that the distributed databases remain up to date and current:
Replication: This involves using specialized software that looks for changes in the distributed
database. Once the changes have been identified, the replication process makes all the databases
look the same. The replication process can be complex and time-consuming, depending on the size
and number of the distributed databases.
Duplication: This process is less complex. It identifies one database as a master and then
duplicates that database. The duplication process is normally done at a set time, to ensure that
each distributed location has the same data. In the duplication process, users may change only the
master database, which ensures that local data will not be overwritten.
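The difference between the two processes can be sketched with plain dictionaries standing in for
databases. This is a toy model under loud assumptions: each record is stored as a (value, version)
pair, and replication resolves differences by keeping the highest version. The notes do not say
how changes are detected or reconciled, so that rule, like the function names, is hypothetical.

```python
import copy


def duplicate(master, copies):
    """Duplication: overwrite every copy with the current contents of the master."""
    for site in copies:
        site.clear()
        site.update(copy.deepcopy(master))


def replicate(sites):
    """Replication: find the changes at every site, then make all sites look the same.

    Assumption: the record with the highest version number wins.
    """
    merged = {}
    for site in sites:
        for key, (value, version) in site.items():
            if key not in merged or version > merged[key][1]:
                merged[key] = (value, version)
    for site in sites:
        site.clear()
        site.update(merged)


# Duplication: only the master is changed, the copies are refreshed from it.
master = {"acct-1": ("balance=50", 3)}
branch = {"acct-1": ("balance=40", 2)}
duplicate(master, [branch])
print(branch)

# Replication: every site may have changed, so all sites must be compared.
site_a = {"x": ("old", 1)}
site_b = {"x": ("new", 2), "y": ("added", 1)}
replicate([site_a, site_b])
print(site_a == site_b)
```

Duplication only copies in one direction, which is why it is the simpler process. Replication
has to inspect every site, which is where the extra complexity and time come from.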
A distributed database management system may be designed for heterogeneous database platforms,
that is, for sites that run different database management systems. The following properties are
considered desirable:
1. Distributed Data Independence: Users should be able to ask queries without specifying where
the referenced relations, or copies or fragments of the relations, are located.
2. Distributed Transaction Atomicity: Users should be able to write transactions that access and
update data at several sites just as they would write transactions over purely local data.
Types of distributed database
There are two major types of distributed database systems:
1. Homogeneous distributed database
2. Heterogeneous distributed database
Homogeneous distributed database:
The following conditions must be satisfied for the homogeneous database:
1. The operating system used at each location must be the same.
2. The data structures and database application used at each location must be the same or
compatible.
Heterogeneous distributed database:
The following conditions hold for the heterogeneous database:
1. Different sites may use different schemas and software.
2. Different nodes or locations may have different hardware, software and data structures.
The three major distributed DBMS architectures are:
Client-Server
Collaborating Server
Middleware
1. Client-Server Architecture: In this architecture, the client (front end) does data presentation
or processing, while the server (back end) does storage, security and major data processing. The
client is responsible for user-interface issues, while servers manage data and execute
transactions. A client-server system has one or more client processes and one or more server
processes, and a client process can send a query to any one server process. Thus a client process
could run on a personal computer and send queries to a server running on a mainframe.
Client characteristics
1. Always initiates requests to servers.
2. Waits for replies.
3. Receives replies.
4. Usually connects to a small number of servers at one time.
Server characteristics
1. Always waits for a request from one of the clients.
2. Serves client requests, then replies to the clients with the requested data.
3. May communicate with other servers to serve a client request.
4. Is the source of the data that the client requests on behalf of its users.
Advantages of client-server architecture
1. Very easy to implement because of its clear separation of functionality and a centralized
server.
2. Allows users to run a graphical user interface.
3. It enables the roles and responsibilities of a computing system to be distributed among
several independent computers known to each other only through the network. It also provides
greater ease of maintenance.
4. Servers can better control access and resources, to guarantee that only those clients with the
appropriate permissions may access and change data.
5. Since data storage is centralized, updates to that data are much easier for administrators.
6. Many advanced client-server technologies are designed to ensure security, user-friendly
interfaces and ease of use.
7. It works with multiple clients of different specifications.

Disadvantages of client-server architecture
1. The client-server architecture does not permit a single query to span multiple servers.
2. It can become hard to separate and distinguish between the client and the server.
3. The functionality of the client process and the server process may overlap.
4. Network traffic congestion is one of the problems related to the client-server model.
2. Collaborating server system: This is a collection of database servers, each capable of running
transactions against local data, which cooperatively execute transactions spanning multiple
servers. This overcomes the client-server limitation that a single query cannot span multiple
servers.
3. Middleware architecture: All web transactions take place on the servers. The web server is
responsible for communicating with the browser while the database server is responsible for
storing the required information.
Advantages of distributed databases
1. Data is stored at many sites, also referred to as nodes.
2. The processors at nodes are interconnected by a computer network rather than a
multiprocessor configuration.
3. The distributed database is indeed a true database, not a collection of files that can be
stored individually at each node.
4. The overall system has the full functionality of a database management system.
5. Reliable transactions due to the replication of databases.
6. Hardware, operating system, network, fragmentation, DBMS, replication and location
independence.
7. Continuous operation, even if some nodes go offline.
8. Distributed query processing can improve performance.
9. Easier expansion.
10. Local autonomy or site autonomy: a department can control the data about itself.
11. Protection of valuable data: because the data is distributed over multiple sites, a
catastrophic event such as a fire at one site does not destroy all of it.
12. Modularity: systems can be modified, added and removed from the distributed database without
affecting other systems or modules.
13. It can be economical: a network of smaller computers may cost less than a single large
computer with the same power.
Disadvantages of distributed databases
1. Data integrity is difficult to maintain.
2. Distributed data is very complex in nature. For example, extra work must be done to maintain
multiple disparate systems, instead of one big one.
3. It can also be costly, because a more extensive infrastructure implies extra labour costs.
4. Absence of standards.
5. Additional software is needed.
6. Complexity in database design.
7. The operating system should support a distributed environment.
Storing data in a DDBS
Data storage in a distributed database involves two concepts:
1. Fragmentation
2. Replication
1. Fragmentation: This is the process of splitting a relation into smaller relations, or
fragments, and storing the fragments, possibly at different sites. In horizontal fragmentation,
each fragment consists of a subset of the rows of the original relation, while in vertical
fragmentation, each fragment consists of a subset of the columns of the original relation.
2. Replication: This means that several copies of a relation or relation fragment can be stored. An
entire relation can be replicated at one or more sites. Similarly, one or more fragments of a relation
can be replicated at other sites. For example, if a relation R is fragmented into R1, R2 and R3,
there might be just one copy of R1, whereas R2 is replicated at two other sites and R3 is replicated
at all sites.
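Horizontal and vertical fragmentation can be sketched on a small in-memory relation. The relation,
its column names and the helper functions are hypothetical. The sketch also assumes that every
vertical fragment keeps the key column, a common practice that lets the original relation be
rebuilt by a join, although the notes above do not state it.

```python
# Hypothetical relation, one dictionary per row.
EMPLOYEES = [
    {"id": 1, "name": "Ada", "city": "Lagos", "salary": 900},
    {"id": 2, "name": "Ben", "city": "Abuja", "salary": 700},
    {"id": 3, "name": "Chi", "city": "Lagos", "salary": 800},
]


def horizontal_fragment(rows, predicate):
    """A subset of the rows, with all columns."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, key, columns):
    """A subset of the columns, always including the key so rows can be matched up later."""
    return [{key: row[key], **{c: row[c] for c in columns}} for row in rows]


def reconstruct_horizontal(*fragments):
    """The original relation is the union of its horizontal fragments."""
    return [row for fragment in fragments for row in fragment]


def reconstruct_vertical(left, right, key):
    """The original relation is the join of its vertical fragments on the key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]} for row in left]


lagos = horizontal_fragment(EMPLOYEES, lambda r: r["city"] == "Lagos")
elsewhere = horizontal_fragment(EMPLOYEES, lambda r: r["city"] != "Lagos")
public = vertical_fragment(EMPLOYEES, "id", ["name", "city"])
payroll = vertical_fragment(EMPLOYEES, "id", ["salary"])

print(lagos)
print(payroll)
print(reconstruct_vertical(public, payroll, "id") == EMPLOYEES)
```

Each fragment could then be stored at the site that uses it most, for example the Lagos rows at a
Lagos site, and any fragment could additionally be replicated at other sites.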
Parallel DBMS versus distributed DBMS
Parallel Database System: seeks to improve performance through parallelization of various
operations, such as data loading, index building and query evaluation. Distributed Database
System: data is physically stored across several sites, and each site is typically managed by a
DBMS capable of running independently of the other sites. The distribution of data is governed by
factors such as local ownership and increased availability.
1. System components: A distributed DBMS consists of many geo-distributed, autonomous sites
connected by low-bandwidth links, while a parallel DBMS consists of tightly coupled,
non-autonomous nodes connected by high-bandwidth links.
2. Component role: Sites in a distributed DBMS can work independently to handle local transactions
or work together to handle global transactions, while nodes in a parallel DBMS can only work
together to handle global transactions.
3. Design purposes: A distributed DBMS is for sharing data, local autonomy and high availability,
while a parallel DBMS is for high performance and high availability.
