DDS Unit - 1-1
UNIT - I
Centralized database
Architectural Models
Client - Server Architecture for DDBMS
Peer - to - Peer Architecture for DDBMS
Multi - DBMS Architecture
Server performs
– Data management
– Query optimization
Client performs
– Application
– User interface
Data processor
– Local query optimizer: acts as the access path selector, responsible for choosing the best
access path to any data item.
– Local recovery manager: keeps the local database consistent even when failures occur.
Design approaches
– Bottom-up design
– Top-down design
Three issues in directory management
– A directory may either be global to the entire database or local to each site.
– A directory may be maintained centrally at one site, or in a distributed fashion by
distributing it over a number of sites. If the system is distributed, the directory is always
distributed.
– Replication may be single copy or multiple copies. Multiple copies provide more
reliability.
Bottom-Up Approach
Suitable for applications where a database already exists
The starting point is the individual conceptual schemas
Exists primarily in the context of heterogeneous databases.
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −
Non-replicated and non-fragmented
Fully replicated
Partially replicated
Fragmented
Mixed
Non-replicated & Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed so that it
is in close proximity to the site where it is used most. It is most suitable for database
systems where the percentage of queries that need to join information from tables placed at
different sites is low. If an appropriate distribution strategy is adopted, then this design
alternative helps to reduce the communication cost during data processing.
Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site. Since
each site has its own copy of the entire database, queries are very fast, requiring negligible
communication cost. On the contrary, the massive redundancy in data incurs a huge cost
during update operations. Hence, this is suitable for systems that must handle a large
number of queries but relatively few database updates.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the
tables is done in accordance with the frequency of access. This takes into consideration the
fact that the frequency of accessing the tables varies considerably from site to site. The
number of copies of the tables (or portions) depends on how frequently the access queries
execute and on the sites that generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at different sites. This exploits the fact that all the data
stored in a table is seldom required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of
each fragment in the system, i.e., there is no redundant data.
The three fragmentation techniques are −
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution: This is a combination of fragmentation and partial replication. Here, the
tables are initially fragmented in any form (horizontal or vertical), and then these fragments
are partially replicated across the different sites according to the frequency of accessing the
fragments.
Design Strategies
In the previous section, we introduced different design alternatives. In this section, we will
study the strategies that aid in adopting these designs. The strategies can be broadly divided
into replication and fragmentation. However, in most cases, a combination of the two is
used.
Data Replication
Data replication is the process of storing separate copies of the database at two or more
sites. It is a popular fault tolerance technique of distributed databases.
Advantages of Data Replication
Reliability − In case of failure of any site, the database system continues to work
since a copy is available at another site(s).
Reduction in Network Load − Since local copies of data are available, query
processing can be done with reduced network usage, particularly during prime hours.
Data updating can be done at non-prime hours.
Quicker Response − Availability of local copies of data ensures quick query
processing and consequently quick response time.
Simpler Transactions − Transactions require fewer joins of tables located at
different sites and minimal coordination across the network. Thus, they become
simpler in nature.
Disadvantages of Fragmentation
1. Applications whose views are defined on more than one fragment may suffer
performance degradation if the applications have conflicting requirements.
2. Simple tasks like checking for dependencies may result in chasing after data across a
number of sites.
3. When data from different fragments are required, data access may be very slow.
4. In case of recursive fragmentations, the job of reconstruction will need expensive
techniques.
5. Lack of back-up copies of data in different sites may render the database ineffective in
case of failure of a site.
Vertical Fragmentation
In vertical fragmentation, the fields or columns of a table are grouped into fragments. In
order to ensure reconstructability, each fragment should contain the primary key field(s)
of the table. Vertical fragmentation can be used to enforce privacy of data.
Grouping
Starts by assigning each attribute to one fragment
At each step, joins some of the fragments until some criterion is satisfied
Results in overlapping fragments
Splitting
Starts with a relation and decides on beneficial partitioning based on the access
behavior of applications to the attributes
Fits more naturally within the top-down design
Generates non-overlapping fragments
For example, let us consider that a University database keeps records of all registered
students in a Student table having the following schema.
STUDENT (Regd_No, Name, Course, Address, Semester, Fees, Marks)
Now, the fees details are maintained in the accounts section. In this case, the designer will
fragment the STUDENT table vertically as follows −
CREATE TABLE STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT;
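To keep the fragmentation lossless, the remaining attributes (together with the key) would form
a second fragment. A possible sketch, where the fragment name STD_INFO is only illustrative −
CREATE TABLE STD_INFO AS
SELECT Regd_No, Name, Course, Address, Semester, Marks
FROM STUDENT;
-- STUDENT can later be reconstructed by joining STD_INFO and STD_FEES on Regd_No.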
Horizontal Fragmentation
Horizontal fragmentation groups the tuples of a table according to the values of one or more
fields. Horizontal fragmentation should also conform to the rule of reconstructability. Each
horizontal fragment must have all columns of the original base table.
Primary horizontal fragmentation is defined by a selection operation on the owner
relation of a database schema.
Given a relation R, its horizontal fragments are given by
Ri = σFi (R), 1 <= i <= w
where Fi is the selection formula (predicate) used to obtain fragment Ri and w is the number
of fragments.
For example, an Emp relation can be fragmented on salary using the above formula as −
Emp1 = σSal <= 20K (Emp)
Emp2 = σSal > 20K (Emp)
For example, in the student schema, if the details of all students of the Computer Science
course need to be maintained at the School of Computer Science, then the designer will
horizontally fragment the database as follows −
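A possible SQL sketch for this fragment (the fragment name CSE_STUDENT and the course
value are illustrative) −
CREATE TABLE CSE_STUDENT AS
SELECT *
FROM STUDENT
WHERE Course = 'Computer Science';
-- Tuples of all other courses would be kept in the remaining fragment(s).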
Derived horizontal fragmentation: the link between the owner and the member relations is
defined as an equi-join.
Given a link L where owner(L) = S and member(L) = R, the derived horizontal
fragments of R are defined as
Ri = R ⋉ Si, 1 <= i <= w (where ⋉ denotes the semijoin)
Where,
Si = σFi (S)
w is the maximum number of fragments that will be defined on R
Fi is the formula using which the primary horizontal fragment Si is defined
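For instance, if a Dept (owner) relation has already been fragmented into Dept1 and Dept2, the
Emp (member) relation could be fragmented accordingly. The following SQL sketch is only
illustrative; the relation and column names (Dept1, DeptNo) are assumptions −
CREATE TABLE Emp1 AS
SELECT E.*
FROM Emp E
WHERE E.DeptNo IN (SELECT D.DeptNo FROM Dept1 D);
-- Emp1 holds exactly those employees whose department tuple lies in Dept1 (Emp ⋉ Dept1).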
Hybrid Fragmentation
In hybrid fragmentation, a combination of horizontal and vertical fragmentation techniques
is used. This is the most flexible fragmentation technique since it generates fragments with
minimal extraneous information. However, reconstruction of the original table is often an
expensive task.
Hybrid fragmentation can be done in two alternative ways −
At first, generate a set of horizontal fragments; then generate vertical fragments from one or
more of the horizontal fragments.
At first, generate a set of vertical fragments; then generate horizontal fragments from one or
more of the vertical fragments.
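Continuing the STUDENT example, a hybrid scheme might first take the horizontal fragment of
Computer Science students and then split off its fee information vertically; the fragment name
below is illustrative −
CREATE TABLE CSE_STD_FEES AS
SELECT Regd_No, Fees
FROM STUDENT
WHERE Course = 'Computer Science';
-- A horizontal selection (Course = 'Computer Science') followed by a vertical projection (Regd_No, Fees).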
Transparency
Transparency in a DBMS refers to the separation of the high-level semantics of the system from
the low-level implementation issues. The high-level semantics concern the end user, while the
low-level implementation is concerned with how the data is physically stored in the database
and the underlying hardware details. Transparency can be implemented in a DBMS by providing
data independence at the various layers of the database.
Distribution transparency is the property of distributed databases by virtue of which the
internal details of the distribution are hidden from the users. The DDBMS designer may
choose to fragment tables, replicate the fragments and store them at different sites.
However, since users are oblivious to these details, they find the distributed database as easy
to use as any centralized database.
Unlike a centralized DBMS, a DDBMS deals with a communication network, replicas and
fragments of data. Thus, transparency also involves these three factors.
Following are three types of transparency:
1. Location transparency
2. Fragmentation transparency
3. Replication transparency
Location Transparency
Location transparency ensures that the user can query any table(s) or fragment(s) of a
table as if they were stored locally at the user’s site. The fact that the table or its fragments
are stored at a remote site in the distributed database system should be completely hidden from
the end user. The address of the remote site(s) and the access mechanisms are completely
hidden. In order to incorporate location transparency, the DDBMS should have access to an
updated and accurate data dictionary and DDBMS directory which contains the details of the
locations of data.
Fragmentation Transparency
Fragmentation transparency enables users to query any table as if it were unfragmented.
Thus, it hides the fact that the table the user is querying is actually a fragment or a union of
some fragments. It also conceals the fact that the fragments are located at diverse sites. This is
somewhat similar to SQL views, where the user may not know that they are using a
view of a table instead of the table itself.
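As an illustration of the view analogy, a DDBMS could expose the fragmented STUDENT table
through a view that reassembles it. The fragment names (STD_INFO, STD_FEES) follow the
earlier sketch and are illustrative −
CREATE VIEW STUDENT_V AS
SELECT I.Regd_No, I.Name, I.Course, I.Address, I.Semester, F.Fees, I.Marks
FROM STD_INFO I JOIN STD_FEES F ON I.Regd_No = F.Regd_No;
-- Users query STUDENT_V as if the original unfragmented table still existed.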
Replication Transparency
Replication transparency ensures that replication of databases is hidden from the users. It
enables users to query a table as if only a single copy of the table exists. Replication
transparency is associated with concurrency transparency and failure transparency. Whenever
a user updates a data item, the update is reflected in all the copies of the table. However, this
operation should not be known to the user; this is concurrency transparency. Also, in case of
failure of a site, the user can still proceed with his queries using replicated copies without
any knowledge of the failure; this is failure transparency.
Combination of Transparencies
In any distributed database system, the designer should ensure that all the stated
transparencies are maintained to a considerable extent. The designer may choose to fragment
tables, replicate them and store them at different sites, all of which remains hidden from the
end user.
However, complete distribution transparency is a tough task and requires considerable design
efforts.
Database Control
Database control refers to the task of enforcing regulations so as to provide correct data to
authentic users and applications of a database. In order that correct data is available to users,
all data should conform to the integrity constraints defined in the database. Besides, data
should be screened away from unauthorized users so as to maintain security and privacy of
the database. Database control is one of the primary tasks of the database administrator
(DBA).
The three dimensions of database control are −
Authentication
Access Control
Integrity Constraints
Authentication
In a distributed database system, authentication is the process through which only legitimate
users can gain access to the data resources.
Authentication can be enforced in two levels −
Controlling Access to Client Computer − At this level, user access is restricted while logging
in to the client computer that provides the user interface to the database server. The most
common method is a username/password combination. However, more sophisticated methods
like biometric authentication may be used for high-security data.
Controlling Access to the Database Software − At this level, the database
software/administrator assigns some credentials to the user. The user gains access to the
database using these credentials. One of the methods is to create a login account within the
database server.
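As a sketch of this second level, a login account might be created inside the database server;
the exact syntax varies by DBMS, and the user name and password below are illustrative −
CREATE USER ABC IDENTIFIED BY 'StrongPassword123';
-- The user ABC can now authenticate to the database server with these credentials.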
Access Rights
A user’s access rights refer to the privileges that the user is given regarding DBMS
operations, such as the right to create a table, drop a table, add/delete/update tuples in a
table, or query the table.
In distributed environments, since there are a large number of tables and an even larger
number of users, it is not feasible to assign individual access rights to each user. So, the
DDBMS defines certain roles. A role is a construct with certain privileges within a database
system. Once the different roles are defined, the individual users are assigned one of these
roles. Often a hierarchy of roles is defined according to the organization’s hierarchy of
authority and responsibility.
For example, the following SQL statements create a role "Accountant" and then assign this
role to user "ABC".
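A minimal sketch of such statements (the exact syntax varies by DBMS, and the granted
privileges and the table name EMP_SAL are illustrative) −
CREATE ROLE ACCOUNTANT;
GRANT SELECT, INSERT, UPDATE ON EMP_SAL TO ACCOUNTANT;
GRANT ACCOUNTANT TO ABC;
-- User ABC now inherits all privileges attached to the ACCOUNTANT role.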