Cb3401-Unit 1
Reshma/AP/CSE
INTRODUCTION
A Database Management System (DBMS) is a software system that is designed to manage and organize data in a structured
manner. It allows users to create, modify, and query a database, as well as manage the security and access controls for that
database.
A DBMS provides an environment to store and retrieve data conveniently and efficiently.
Key Features of DBMS
• Data modeling: A DBMS provides tools for creating and modifying data models, which define the structure and
relationships of the data in a database.
• Data storage and retrieval: A DBMS is responsible for storing and retrieving data from the database, and can
provide various methods for searching and querying the data.
• Concurrency control: A DBMS provides mechanisms for controlling concurrent access to the database, to ensure
that multiple users can access the data without conflicting with each other.
• Data integrity and security: A DBMS provides tools for enforcing data integrity and security constraints, such as
constraints on the values of data and access controls that restrict who can access the data.
• Backup and recovery: A DBMS provides mechanisms for backing up and recovering the data in the event of a
system failure.
• DBMSs can be classified into two types: Relational Database Management Systems (RDBMS) and Non-Relational Database Management Systems (NoSQL or Non-SQL).
• RDBMS: Data is organized in the form of tables and each table has a set of rows and columns. The data are related
to each other through primary and foreign keys.
• NoSQL: Data is organized as key-value pairs, documents, graphs, or column families. These systems are designed to handle large-scale, high-performance workloads.
A database is a collection of interrelated data that supports efficient retrieval, insertion, and deletion of data, and organizes the data in the form of tables, views, schemas, reports, etc. For example, a university database organizes the data about students, faculty, admin staff, etc., so that this data can be retrieved, inserted, and deleted efficiently.
Database Languages in DBMS
o A DBMS has appropriate languages and interfaces to express database queries and updates.
o Database languages can be used to read, store and update the data in the database.
1. Data Definition Language (DDL)
o DDL stands for Data Definition Language. It is used to define the database structure or pattern.
o It is used to create schemas, tables, indexes, constraints, etc. in the database.
o Using DDL statements, you can create the skeleton of the database.
o The definitions made with DDL are stored as metadata, such as the number of tables and schemas, their names, indexes, the columns in each table, constraints, etc.
DDL commands such as CREATE, ALTER, DROP, TRUNCATE and RENAME change the database schema, which is why they come under the Data Definition Language.
2. Data Manipulation Language (DML)
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database, and it handles user requests. Typical DML commands are SELECT, INSERT, UPDATE and DELETE.
3. Data Control Language (DCL)
o DCL stands for Data Control Language. It is used to control access to the stored data, mainly through the GRANT command (give a user privileges) and the REVOKE command (take privileges back).
o In many systems, DCL execution is transactional, so it can be rolled back.
(But in the Oracle database, the execution of DCL statements cannot be rolled back.)
The privileges that can be revoked include CONNECT, INSERT, USAGE, EXECUTE, DELETE, UPDATE and SELECT.
4. Transaction Control Language (TCL)
TCL is used to manage the changes made by DML statements, which can be grouped into a logical transaction. Its commands include COMMIT, ROLLBACK and SAVEPOINT.
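As a rough illustration, the sketch below runs DDL, DML and TCL statements through Python's built-in sqlite3 module. The STUDENT table and its values are hypothetical. SQLite has no GRANT or REVOKE, so DCL is not shown, and the connection is opened in autocommit mode (isolation_level=None) so that BEGIN, COMMIT and ROLLBACK are issued explicitly.

```python
import sqlite3

# Autocommit mode: transaction boundaries (TCL) are issued explicitly below.
conn = sqlite3.connect(":memory:", isolation_level=None)
cur = conn.cursor()

# DDL: define the structure of a hypothetical STUDENT table.
cur.execute("CREATE TABLE STUDENT (ROLL_NO INTEGER PRIMARY KEY, NAME TEXT NOT NULL)")

# DML inside a transaction that is committed (TCL: BEGIN ... COMMIT).
cur.execute("BEGIN")
cur.execute("INSERT INTO STUDENT VALUES (1, 'RAM')")
cur.execute("INSERT INTO STUDENT VALUES (2, 'RAMESH')")
cur.execute("COMMIT")

# DML inside a transaction that is rolled back: the UPDATE is undone.
cur.execute("BEGIN")
cur.execute("UPDATE STUDENT SET NAME = 'X' WHERE ROLL_NO = 1")
cur.execute("ROLLBACK")

# DDL again: change the schema by adding a column.
cur.execute("ALTER TABLE STUDENT ADD COLUMN AGE INTEGER")

rows = cur.execute("SELECT ROLL_NO, NAME, AGE FROM STUDENT ORDER BY ROLL_NO").fetchall()
print(rows)  # [(1, 'RAM', None), (2, 'RAMESH', None)]
```

The rolled-back UPDATE leaves RAM unchanged, and the new AGE column is NULL for existing rows.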
A DBMS must keep data consistent whenever changes are made to it, because if the integrity of the data is affected, the whole data can become disturbed and corrupted. To maintain the integrity of the data, a DBMS supports four properties known as the ACID properties: Atomicity, Consistency, Isolation and Durability. These properties apply to a transaction, i.e., a group of operations that the DBMS executes as a single logical unit.
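A minimal sketch of atomicity, assuming a hypothetical ACCOUNT table in SQLite: a transfer consists of two UPDATE statements, and if the second one violates a CHECK constraint, the whole transaction is rolled back, so the first update does not survive either.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical ACCOUNT table; the CHECK forbids negative balances (consistency).
conn.execute("CREATE TABLE ACCOUNT (ID TEXT PRIMARY KEY, BALANCE INTEGER CHECK (BALANCE >= 0))")
conn.execute("INSERT INTO ACCOUNT VALUES ('A', 100), ('B', 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move money between two accounts as one atomic transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE + ? WHERE ID = ?", (amount, dst))
            conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE - ? WHERE ID = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok1 = transfer(conn, "A", "B", 30)   # both updates succeed
ok2 = transfer(conn, "A", "B", 500)  # second update fails the CHECK, so the first is undone too
balances = dict(conn.execute("SELECT ID, BALANCE FROM ACCOUNT"))
print(ok1, ok2, balances)  # True False {'A': 70, 'B': 80}
```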
Relational Databases
A relational database is a collection of information that organizes data in predefined relationships where data is stored in
one or more tables (or "relations") of columns and rows, making it easy to see and understand how different data structures
relate to each other. A relationship is a logical connection between different tables, usually established through shared key values (a primary key in one table referenced by a foreign key in another).
A relational database (RDB) is a way of structuring information in tables, rows, and columns. An RDB can establish links, or relationships, between information by joining tables, which makes it easy to understand and gain insights about the relationship between various data points.
Data Models
A data model is a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. It is used to describe the design of a database at each level of data abstraction. The following four data models are used for understanding the structure of the database:
1) Relational Data Model: This type of model designs the data in the form of rows and columns within a table. Thus, a
relational model uses tables for representing data and in-between relationships. Tables are also called relations. This model
was initially described by Edgar F. Codd in 1969. The relational data model is the most widely used data model, primarily in commercial data-processing applications.
2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and relationships among
them. These objects are known as entities, and a relationship is an association among these entities. This model was designed by Peter Chen and published in a 1976 paper. It is widely used in database design. A set of attributes describes the
entities. For example, student_name, student_id describes the 'student' entity. A set of the same type of entities is known as
an 'Entity set', and the set of the same type of relationships is known as 'relationship set'.
3) Object-based Data Model: This model extends the ER model with notions of functions (methods), encapsulation, and object identity. It supports a rich type system that includes structured and collection types. In the 1980s, various database systems following the object-oriented approach were developed. Here, an object is simply data together with its properties.
4) Semistructured Data Model: This type of data model is different from the other three data models (explained above).
The semistructured data model allows data in which individual data items of the same type may have different sets of attributes. The Extensible Markup Language, also known as XML, is widely used for representing semistructured data. Although XML was initially designed for adding markup information to text documents, it gained importance because of its application in the exchange of data.
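As a sketch of semistructured data, the hypothetical XML document below holds two student elements of the same type that carry different sets of sub-elements; Python's standard xml.etree.ElementTree module is used to read them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: same element type (student), different attribute sets.
doc = """
<students>
  <student><roll_no>1</roll_no><name>RAM</name><phone>9000000001</phone></student>
  <student><roll_no>2</roll_no><name>RAMESH</name><email>ramesh@example.com</email></student>
</students>
"""

root = ET.fromstring(doc.strip())
# One dict per student: tag -> text, so each record keeps only the fields it has.
records = [{child.tag: child.text for child in s} for s in root.findall("student")]
for r in records:
    print(sorted(r))
# ['name', 'phone', 'roll_no']
# ['email', 'name', 'roll_no']
```

A relational table would need a fixed set of columns (with NULLs for missing values), whereas here each item simply lists the attributes it has.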
Attributes (columns) specify a data type, and each record (or row) contains a value of that data type for each attribute. Each table in a relational database should have a primary key, an attribute (or set of attributes) that uniquely identifies a row. Rows of different tables are related using a foreign key, which is a reference to the primary key of another existing table.
How the relational database model works in practice: Say you have a Customer table and an Order table.
The Customer table contains data about the customer:
• Customer ID (primary key)
• Customer name
• Billing address
• Shipping address
In the Customer table, the customer ID is a primary key that uniquely identifies who the customer is in the relational
database. No other customer would have the same Customer ID.
The Order table contains transactional information about an order:
• Order ID (primary key)
• Order date
• Shipping date
• Order status
Here, the primary key to identify a specific order is the Order ID. You can connect a customer with an order by using a
foreign key to link the customer ID from the Customer table. The two tables are now related based on the shared customer
ID, which means you can query both tables to create formal reports or use the data for other applications. For instance, a
retail branch manager could generate a report about all customers who made a purchase on a specific date or figure out
which customers had orders that had a delayed delivery date in the last month.
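The Customer/Order example can be sketched in SQLite as follows. The column names, the table name ORDERS (ORDER is a reserved word in SQL) and all data values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when this is on
conn.executescript("""
CREATE TABLE CUSTOMER (
    CUSTOMER_ID   INTEGER PRIMARY KEY,
    NAME          TEXT NOT NULL,
    BILLING_ADDR  TEXT,
    SHIPPING_ADDR TEXT
);
-- ORDER is an SQL keyword, so the table is called ORDERS here.
CREATE TABLE ORDERS (
    ORDER_ID      INTEGER PRIMARY KEY,
    CUSTOMER_ID   INTEGER NOT NULL REFERENCES CUSTOMER(CUSTOMER_ID),  -- foreign key
    ORDER_DATE    TEXT,
    SHIPPING_DATE TEXT,
    STATUS        TEXT
);
""")
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?, ?, ?)", [
    (1, "Asha", "Delhi", "Delhi"),
    (2, "Vikram", "Pune", "Mumbai"),
    (3, "Meena", "Chennai", "Chennai"),  # has no orders
])
conn.executemany("INSERT INTO ORDERS VALUES (?, ?, ?, ?, ?)", [
    (101, 1, "2024-03-01", "2024-03-03", "DELIVERED"),
    (102, 2, "2024-03-01", "2024-03-09", "DELAYED"),
    (103, 1, "2024-03-05", None, "PENDING"),
])

# Declarative queries: state WHAT is wanted; the DBMS decides HOW to join the tables.
buyers = [row[0] for row in conn.execute("""
    SELECT DISTINCT C.NAME
    FROM CUSTOMER C JOIN ORDERS O ON C.CUSTOMER_ID = O.CUSTOMER_ID
    WHERE O.ORDER_DATE = '2024-03-01'
    ORDER BY C.NAME""")]
delayed = [row[0] for row in conn.execute("""
    SELECT C.NAME
    FROM CUSTOMER C JOIN ORDERS O ON C.CUSTOMER_ID = O.CUSTOMER_ID
    WHERE O.STATUS = 'DELAYED'""")]
print(buyers, delayed)  # ['Asha', 'Vikram'] ['Vikram']
```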
The above explanation is meant to be simple. But relational databases also excel at showing very complex relationships
between data, allowing you to reference data in more tables as long as the data conforms to the predefined relational schema
of your database. As the data is organized as pre-defined relationships, you can query the data declaratively. A declarative
query is a way to define what you want to extract from the system without expressing how the system should compute the
result. This is at the heart of a relational system as opposed to other systems.
Cloud-based relational databases like Cloud SQL, Cloud Spanner, and AlloyDB have become increasingly popular as they offer managed services for database maintenance, patching, capacity management, provisioning and infrastructure support.
Benefits of relational databases
• Flexibility
It’s easy to add, update, or delete tables, relationships, and make other changes to data whenever you need without
changing the overall database structure or impacting existing applications.
• ACID compliance
Relational databases support the ACID (Atomicity, Consistency, Isolation, Durability) properties to ensure data validity regardless of errors, failures, or other potential mishaps.
• Ease of use
It’s easy to run complex queries using SQL, which enables even non-technical users to learn how to interact with the
database.
• Collaboration
Multiple people can operate and access data simultaneously. Built-in locking prevents simultaneous access to data when
it’s being updated.
• Built-in security
Role-based security ensures data access is limited to specific users.
• Database normalization
Relational databases employ a design technique known as normalization that reduces data redundancy and improves data
integrity.
Unlike relational databases, NoSQL databases follow a flexible data model, making them ideal for storing data that changes
frequently or for applications that handle diverse types of data.
The relational model represents a database as a collection of relations. A relation is a table of values, and every row in the table denotes a real-world entity or relationship. The names of the tables and columns help us interpret the meaning of the values in each row. Thus, in the relational model, data is stored in the form of tables. However, the physical storage of this data is independent of its logical organization.
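Popular relational database management systems include Oracle, MySQL, Microsoft SQL Server, PostgreSQL and IBM DB2.
Consider the following EMPLOYEE relation:
ID_NO  NAME     ADDRESS  MOBILE  AGE
C1     RIYA     DELHI    15      20
C2     SUNITA   GURGAON  16      22
C3     ASHWANI  ROHTAK   12      18
C4     PREETI   DELHI            25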
Important Terminologies
Here are some Relational Model concepts in DBMS:
• Attribute: It refers to every column present in a table. The attributes refer to the properties that help us define a
relation. E.g., Employee_ID, Student_Rollno, SECTION, NAME, etc.
• Tuple – It is a single row of a table that consists of a single record. The relation above consists of four tuples, one
of which is like:
C1 RIYA DELHI 15 20
• Tables – In the relational model, all relations are stored in table format. A table has two properties: rows and columns. Rows represent records, and columns represent attributes.
• Degree: It refers to the total number of attributes that are there in the relation. The EMPLOYEE relation defined
here has degree 5.
• Relation Schema: It represents the relation's name along with its attributes. E.g., EMPLOYEE (ID_NO, NAME, ADDRESS, MOBILE, AGE) is the relation schema for EMPLOYEE. A set of relation schemas, one for each relation in the database, is known as a relational schema.
• Column: It represents the set of values for a certain attribute. E.g., the column ID_NO of the relation EMPLOYEE contains the values C1, C2, C3 and C4.
• Cardinality: It refers to the total number of rows present in the given table. The EMPLOYEE relation defined here
has cardinality 4.
• Relation instance – It refers to the finite set of tuples in a relation at a given point in time. A relation instance never has duplicate tuples.
• Attribute domain – Every attribute has a predefined set of permitted values, known as the attribute domain.
• Relation key – A set of one or more attributes that uniquely identifies each row is known as a relation key.
• NULL Values: A value that is not known or is unavailable is called a NULL value. In a table, a NULL value is shown as a blank space. E.g., the MOBILE of the EMPLOYEE having ID_NO C4 is NULL.
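A small sketch that loads the EMPLOYEE relation above into SQLite and computes its degree (number of columns) and cardinality (number of rows). The fourth column is assumed to be MOBILE, following the NULL-value example, and Python's None is stored as NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (ID_NO TEXT, NAME TEXT, ADDRESS TEXT, MOBILE INTEGER, AGE INTEGER)")
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?, ?)", [
    ("C1", "RIYA", "DELHI", 15, 20),
    ("C2", "SUNITA", "GURGAON", 16, 22),
    ("C3", "ASHWANI", "ROHTAK", 12, 18),
    ("C4", "PREETI", "DELHI", None, 25),  # None is stored as NULL
])

cur = conn.execute("SELECT * FROM EMPLOYEE")
degree = len(cur.description)      # number of attributes (columns)
cardinality = len(cur.fetchall())  # number of tuples (rows)
# NULL is tested with IS NULL; "= NULL" would never be true.
null_ids = [r[0] for r in conn.execute("SELECT ID_NO FROM EMPLOYEE WHERE MOBILE IS NULL")]
print(degree, cardinality, null_ids)  # 5 4 ['C4']
```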
Domain Constraints
Domain constraints are attribute-level constraints: an attribute can take only values that lie inside its domain. For example, if a constraint AGE > 0 is applied on the EMPLOYEE relation, inserting a negative value of AGE will fail.
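A domain constraint can be expressed in SQL with a CHECK clause. This sketch assumes a hypothetical EMPLOYEE table with the check AGE > 0 and shows SQLite rejecting an out-of-domain value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Domain constraint on AGE: only positive values are allowed.
conn.execute("CREATE TABLE EMPLOYEE (ID_NO TEXT, NAME TEXT, AGE INTEGER CHECK (AGE > 0))")
conn.execute("INSERT INTO EMPLOYEE VALUES ('C1', 'RIYA', 20)")  # inside the domain: accepted

try:
    conn.execute("INSERT INTO EMPLOYEE VALUES ('C5', 'RAVI', -3)")  # outside the domain
    rejected = False
except sqlite3.IntegrityError as e:
    rejected = True
    print("rejected:", e)  # the message reports a failed CHECK constraint

count = conn.execute("SELECT COUNT(*) FROM EMPLOYEE").fetchone()[0]
print(count)  # 1
```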
Key Integrity
Each relation in the database should have at least one set of attributes that uniquely identifies a tuple. Such sets of attributes are known as keys. For example, ID_NO in EMPLOYEE is a key: no two employees can have the same ID number. Thus, a key has these two properties:
• It must be unique for all tuples.
• It cannot have NULL values.
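The two key properties can be checked with a sketch like the following, assuming a hypothetical EMPLOYEE table keyed on ID_NO. NOT NULL is declared explicitly because SQLite, for historical reasons, allows NULL in a non-integer PRIMARY KEY column unless it is declared NOT NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# NOT NULL is spelled out: SQLite would otherwise accept NULL in a TEXT PRIMARY KEY column.
conn.execute("CREATE TABLE EMPLOYEE (ID_NO TEXT PRIMARY KEY NOT NULL, NAME TEXT)")
conn.execute("INSERT INTO EMPLOYEE VALUES ('C1', 'RIYA')")

def try_insert(id_no, name):
    """Attempt an insert and report whether the key constraint allowed it."""
    try:
        conn.execute("INSERT INTO EMPLOYEE VALUES (?, ?)", (id_no, name))
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"

results = [
    try_insert("C1", "SUNITA"),   # duplicate key value: violates uniqueness
    try_insert(None, "ASHWANI"),  # NULL key value: violates NOT NULL
    try_insert("C2", "SUNITA"),   # new, non-NULL key: allowed
]
print(results)  # ['rejected', 'rejected', 'accepted']
```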
Referential Integrity
Whenever an attribute of a relation can take only values that appear in another attribute of the same relation or of another relation, the constraint is termed referential integrity.
Consider the following two relations:
LEARNER
ID_NO  NAME    ADDRESS  MOBILE  AGE  SUBJECT_CODE
C1     RIYA    DELHI    15      20   CS
C2     SUNITA  GURGAON  16      22   CS
C4     PREETI  DELHI    18      25   IT
SUBJECT
SUBJECT_NAME            SUBJECT_CODE
COMPUTER SCIENCE        CS
INFORMATION TECHNOLOGY  IT
CIVIL ENGINEERING       CV
The SUBJECT_CODE of LEARNER can only take the values that are present in the SUBJECT_CODE of SUBJECT,
which is known as a referential integrity constraint. The relation that refers to the other relation is known as the REFERENCING RELATION (LEARNER in this case), while the relation to which the other relations refer is known as the REFERENCED RELATION (SUBJECT in this case).
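A sketch of the LEARNER/SUBJECT constraint in SQLite, using the SUBJECT rows above; the reduced set of LEARNER columns and the rejected code ME are hypothetical. SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # foreign-key enforcement is off by default in SQLite
conn.execute("CREATE TABLE SUBJECT (SUBJECT_NAME TEXT, SUBJECT_CODE TEXT PRIMARY KEY)")
# LEARNER is the referencing relation; SUBJECT is the referenced relation.
conn.execute("""CREATE TABLE LEARNER (
    ID_NO        TEXT PRIMARY KEY,
    NAME         TEXT,
    SUBJECT_CODE TEXT REFERENCES SUBJECT(SUBJECT_CODE))""")
conn.executemany("INSERT INTO SUBJECT VALUES (?, ?)", [
    ("COMPUTER SCIENCE", "CS"),
    ("INFORMATION TECHNOLOGY", "IT"),
    ("CIVIL ENGINEERING", "CV"),
])
conn.execute("INSERT INTO LEARNER VALUES ('C1', 'RIYA', 'CS')")  # CS exists in SUBJECT: accepted

def violates(sql):
    """Return True if SQLite rejects the statement with an integrity error."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

orphan_rejected = violates("INSERT INTO LEARNER VALUES ('C9', 'AMIT', 'ME')")  # ME is not a SUBJECT_CODE
delete_rejected = violates("DELETE FROM SUBJECT WHERE SUBJECT_CODE = 'CS'")     # still referenced by C1
print(orphan_rejected, delete_rejected)  # True True
```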
Relational Algebra
Relational Algebra is a procedural query language. Relational algebra mainly provides a theoretical foundation for relational
databases and SQL. The main purpose of using Relational Algebra is to define operators that transform one or more input
relations into an output relation. Given that these operators accept relations as input and produce relations as output, they
can be combined and used to express potentially complex queries that transform potentially many input relations (whose
data are stored in the database) into a single output relation (the query results). Because it is a mathematical notation, relational algebra uses no English keywords; its operators are represented by symbols.
Fundamental Operators
These are the basic/fundamental operators used in Relational Algebra.
1. Selection(σ)
2. Projection(π)
3. Union(U)
4. Set Difference(-)
5. Set Intersection(∩)
6. Rename(ρ)
7. Cartesian Product(X)
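1. Selection(σ): Selection is used to choose the tuples of a relation that satisfy a given condition.
Example: Consider the following relation R (Table 1):
A B C
1 2 4
2 2 3
3 2 3
4 3 4
For the above relation, σ(C>3)(R) selects the tuples whose value of C is greater than 3.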
A B C
1 2 4
4 3 4
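Note: Selection chooses rows (tuples) of a relation; to choose columns (attributes), the projection operator is used.
2. Projection(π): Projection keeps only the required columns (attributes) of a relation, and by default it removes duplicate tuples from the result.
Example: Consider Table 1. Suppose we want columns B and C from relation R. π(B,C)(R) gives the following result, in which the duplicate row (2, 3) appears only once.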
B C
2 4
2 3
3 4
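To make σ and π concrete, the sketch below models a relation as a Python set of tuples (a simplification: real systems store tables, not in-memory sets) and applies both operators to relation R from Table 1. Using a set means projection removes duplicates automatically; the helper names select and project are hypothetical.

```python
# Relation R from Table 1: attribute names plus a set of tuples.
R_ATTRS = ("A", "B", "C")
R = {(1, 2, 4), (2, 2, 3), (3, 2, 3), (4, 3, 4)}

def select(rel, attrs, predicate):
    """σ: keep the tuples for which predicate(row_as_dict) is true."""
    return {t for t in rel if predicate(dict(zip(attrs, t)))}

def project(rel, attrs, wanted):
    """π: keep only the wanted attributes; the set drops duplicate tuples."""
    idx = [attrs.index(a) for a in wanted]
    return {tuple(t[i] for i in idx) for t in rel}

sel = select(R, R_ATTRS, lambda r: r["C"] > 3)
proj = project(R, R_ATTRS, ("B", "C"))
print(sorted(sel))   # [(1, 2, 4), (4, 3, 4)]
print(sorted(proj))  # [(2, 3), (2, 4), (3, 4)]
```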
3. Union(U): Union operation in relational algebra is the same as the union operation in set theory.
Example: Consider the following tables of students having different optional subjects in their course.
FRENCH
Student_Name Roll_Number
Ram 01
Mohan 02
Vivek 13
Geeta 17
GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
π(Student_Name)FRENCH U π(Student_Name)GERMAN
Student_Name
Ram
Mohan
Vivek
Geeta
Shyam
Rohan
Note: The only constraint in the union of two relations is that both relations must have the same set of Attributes.
4. Set Difference(-): Set Difference in relational algebra is the same set difference operation as in set theory.
Example: From the above table of FRENCH and GERMAN, Set Difference is used as follows
π(Student_Name)FRENCH - π(Student_Name)GERMAN
Student_Name
Ram
Mohan
Note: The only constraint in the Set Difference between two relations is that both relations must have the same set of
Attributes.
5. Set Intersection(∩): Set Intersection in relational algebra is the same as the set intersection operation in set theory.
Example: From the above table of FRENCH and GERMAN, the Set Intersection is used as follows
π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN
Student_Name
Vivek
Geeta
Note: The only constraint in the Set Intersection of two relations is that both relations must have the same set of Attributes.
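Because these three operators behave like their set-theory counterparts, Python's set operators can mimic them. This sketch assumes each relation is a set of (Student_Name, Roll_Number) tuples and projects Student_Name first, so both operands have the same attributes.

```python
FRENCH = {("Ram", "01"), ("Mohan", "02"), ("Vivek", "13"), ("Geeta", "17")}
GERMAN = {("Vivek", "13"), ("Geeta", "17"), ("Shyam", "21"), ("Rohan", "25")}

def names(rel):
    """π(Student_Name): project the first attribute of each tuple."""
    return {(name,) for name, _roll in rel}

union = names(FRENCH) | names(GERMAN)         # ∪
difference = names(FRENCH) - names(GERMAN)    # -
intersection = names(FRENCH) & names(GERMAN)  # ∩
print(len(union))            # 6
print(sorted(difference))    # [('Mohan',), ('Ram',)]
print(sorted(intersection))  # [('Geeta',), ('Vivek',)]
```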
6. Rename(ρ): Rename is a unary operation used to rename the attributes of a relation. E.g., ρ(a/b)(R) renames attribute b of relation R to a.
7. Cross Product(X): The cross product A X B of two relations A and B contains all the attributes of A followed by all the attributes of B. Each record of A is paired with every record of B.
Note: If A has ‘n’ tuples and B has ‘m’ tuples, then A X B will have ‘n*m’ tuples.
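A sketch of the cross product using itertools.product on two small hypothetical relations, confirming the n*m rule.

```python
from itertools import product

# Hypothetical relations: A has n = 2 tuples, B has m = 3 tuples.
A = [(1, "x"), (2, "y")]
B = [("p",), ("q",), ("r",)]

# Each tuple of A is concatenated with every tuple of B (attributes of A, then of B).
AxB = [a + b for a, b in product(A, B)]
print(len(AxB))  # 6
print(AxB[0])    # (1, 'x', 'p')
```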
Derived Operators
These are some of the derived operators, which are derived from the fundamental operators.
• Natural Join(⋈)
• Conditional Join
1. Natural Join(⋈): Natural join is a binary operator. The natural join of two relations contains every combination of tuples that have equal values on their common attributes; each common attribute appears only once in the result.
Example: Consider the relations EMP and DEPT.
EMP
Name ID Dept_Name
A 120 IT
B 125 HR
C 110 Sales
D 111 IT
DEPT
Dept_Name Manager
Sales Y
Production Z
IT A
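Natural join between EMP and DEPT with the condition:
EMP.Dept_Name = DEPT.Dept_Name
EMP ⋈ DEPT
Name ID Dept_Name Manager
A 120 IT A
C 110 Sales Y
D 111 IT A
Employee B does not appear in the result because no DEPT tuple has Dept_Name HR.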
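The natural join of EMP and DEPT can be reproduced with a short, unoptimized nested-loop sketch; the helper name natural_join is hypothetical, and each relation is represented as a tuple of attribute names plus a list of rows.

```python
EMP_ATTRS = ("Name", "ID", "Dept_Name")
EMP = [("A", 120, "IT"), ("B", 125, "HR"), ("C", 110, "Sales"), ("D", 111, "IT")]
DEPT_ATTRS = ("Dept_Name", "Manager")
DEPT = [("Sales", "Y"), ("Production", "Z"), ("IT", "A")]

def natural_join(r_attrs, r, s_attrs, s):
    """⋈: pair tuples that agree on every common attribute; keep common attributes once."""
    common = [a for a in r_attrs if a in s_attrs]
    s_extra = [a for a in s_attrs if a not in common]
    out_attrs = tuple(r_attrs) + tuple(s_extra)
    out = []
    for t in r:
        tr = dict(zip(r_attrs, t))
        for u in s:
            us = dict(zip(s_attrs, u))
            if all(tr[a] == us[a] for a in common):
                out.append(t + tuple(us[a] for a in s_extra))
    return out_attrs, out

attrs, rows = natural_join(EMP_ATTRS, EMP, DEPT_ATTRS, DEPT)
print(attrs)  # ('Name', 'ID', 'Dept_Name', 'Manager')
for row in rows:
    print(row)
# ('A', 120, 'IT', 'A')
# ('C', 110, 'Sales', 'Y')
# ('D', 111, 'IT', 'A')
```

Real DBMSs use faster strategies (hash or sort-merge joins), but the result is the same relation.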
2. Conditional Join: Conditional join works similarly to natural join. In natural join, the default condition is equality between common attributes, while in conditional join we can specify any condition, such as greater than, less than, or not equal.
Example:
R
ID Sex Marks
1 F 45
2 F 55
3 F 60
S
ID Sex Marks
10 M 20
11 M 22
12 M 59
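The condition used with R and S is not shown here, so the sketch below assumes the condition R.Marks >= S.Marks. A conditional (theta) join can be computed as a selection applied to the cross product; the helper name theta_join is hypothetical.

```python
R = [(1, "F", 45), (2, "F", 55), (3, "F", 60)]    # (ID, Sex, Marks)
S = [(10, "M", 20), (11, "M", 22), (12, "M", 59)]  # (ID, Sex, Marks)

def theta_join(r, s, condition):
    """Conditional join: σ_condition(R X S), keeping all attributes of both relations."""
    return [t + u for t in r for u in s if condition(t, u)]

# Assumed condition: R.Marks >= S.Marks (Marks is the third attribute, index 2).
result = theta_join(R, S, lambda t, u: t[2] >= u[2])
for row in result:
    print(row)
print(len(result))  # 7
```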
Relational Calculus
Whereas Relational Algebra is a procedural query language, Relational Calculus is a non-procedural query language. It deals with the end result: it tells what to retrieve but not how to retrieve it.
There are two types of Relational Calculus
• Tuple Relational Calculus(TRC)
• Domain Relational Calculus(DRC)
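As an illustration, here is the query "names of employees in the IT department" over the EMP relation from the natural join example, written in both forms; the variable names t, n, i and d are arbitrary. For the EMP data shown earlier, both expressions give the names A and D.

```latex
% TRC: the variable t ranges over whole tuples of EMP
\{\, t.\mathrm{Name} \mid t \in \mathrm{EMP} \;\wedge\; t.\mathrm{Dept\_Name} = \text{`IT'} \,\}

% DRC: n, i, d range over the domains of Name, ID and Dept_Name
\{\, \langle n \rangle \mid \exists i\, \exists d\; \big( \langle n, i, d \rangle \in \mathrm{EMP} \;\wedge\; d = \text{`IT'} \big) \,\}
```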
We can create and interact with a database using SQL efficiently and easily. The benefit of SQL is that we don’t have to
specify how to get the data from the database. Rather, we simply specify what is to be retrieved, and SQL does the rest.
Although called a query language, SQL can do much more besides querying. SQL provides statements for defining the
structure of the data, manipulating data in the database, declaring constraints and retrieving data from the database in various
ways, depending on our requirements.
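The examples below use a relation STUDENT with the attributes ROLL_NO, NAME, ADDRESS, PHONE and AGE.
Case 1: If we want to retrieve the attributes ROLL_NO and NAME of all the students, the query will be:
SELECT ROLL_NO, NAME FROM STUDENT;
ROLL_NO NAME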
1 RAM
2 RAMESH
3 SUJIT
4 SURESH
Case 2: If we want to retrieve ROLL_NO and NAME of the students whose ROLL_NO is greater than 2, the query will be:
SELECT ROLL_NO, NAME FROM STUDENT WHERE ROLL_NO > 2;
ROLL_NO NAME
3 SUJIT
4 SURESH
Case 3: If we want to retrieve all attributes of the students, we can write * instead of listing all the attributes:
SELECT * FROM STUDENT;
Case 5: If we want to retrieve the distinct values of an attribute or group of attributes, DISTINCT is used as in:
SELECT DISTINCT ADDRESS FROM STUDENT;
ADDRESS
DELHI
GURGAON
ROHTAK
If DISTINCT is not used, DELHI will appear twice in the result set. Before understanding GROUP BY and HAVING, we need to understand aggregation functions in SQL.
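The query cases above can be tried in SQLite with the sketch below. The STUDENT rows are hypothetical: the ADDRESS and AGE values are chosen to agree with the result tables in this section, and the PHONE values are placeholders. ORDER BY is added only so that the output order is predictable; without it, SQL does not guarantee any row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)", [
    (1, "RAM", "DELHI", "9000000001", 18),
    (2, "RAMESH", "GURGAON", "9000000002", 18),
    (3, "SUJIT", "ROHTAK", "9000000003", 20),
    (4, "SURESH", "DELHI", "9000000004", 18),
])

case1 = conn.execute("SELECT ROLL_NO, NAME FROM STUDENT ORDER BY ROLL_NO").fetchall()
case2 = conn.execute("SELECT ROLL_NO, NAME FROM STUDENT WHERE ROLL_NO > 2 ORDER BY ROLL_NO").fetchall()
case3 = conn.execute("SELECT * FROM STUDENT ORDER BY ROLL_NO").fetchall()
case5 = conn.execute("SELECT DISTINCT ADDRESS FROM STUDENT ORDER BY ADDRESS").fetchall()
print(case2)  # [(3, 'SUJIT'), (4, 'SURESH')]
print(case5)  # [('DELHI',), ('GURGAON',), ('ROHTAK',)]
```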
Aggregation Functions
Aggregation functions are used to perform mathematical operations on data values of a relation. Some of the common
aggregation functions used in SQL are:
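• COUNT: The COUNT function is used to count the number of rows in a relation; COUNT(attribute) counts only the rows in which that attribute is not NULL. E.g.:
SELECT COUNT(PHONE) FROM STUDENT;
COUNT(PHONE)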
4
• SUM: The SUM function is used to add the values of an attribute in a relation. E.g.:
SELECT SUM(AGE) FROM STUDENT;
SUM(AGE)
74
In the same way, MIN, MAX and AVG can be used. As seen above, an aggregation function used without GROUP BY returns only one row.
• AVERAGE: AVG gives the average value of an attribute over the tuples. It is defined as the sum of the values divided by their count.
Syntax:
AVG(attributename)
OR
SUM(attributename)/COUNT(attributename)
The above-mentioned syntax also retrieves the average value of the tuples.
• MINIMUM: It extracts the minimum value among the set of all the tuples.
Syntax:
MIN(attributename)
• GROUP BY: GROUP BY is used to group the tuples of a relation based on an attribute or group of attributes. It is typically combined with an aggregation function, which is computed for each group. E.g.:
SELECT ADDRESS, SUM(AGE) FROM STUDENT GROUP BY ADDRESS;
In this query, SUM(AGE) is computed not for the entire table but for each address, i.e., the sum of AGE for address DELHI (18+18=36), and similarly for the other addresses. The output is:
ADDRESS SUM(AGE)
DELHI 36
GURGAON 18
ROHTAK 20
If we try to execute the query given below, it will result in an error in standard SQL: although SUM(AGE) is computed for each address, there is more than one ROLL_NO for each address we have grouped, so ROLL_NO can't be displayed in the result set. Whenever we use GROUP BY, any column after SELECT that is not in the GROUP BY clause must be used inside an aggregate function for the result set to make sense.
SELECT ROLL_NO, ADDRESS, SUM(AGE) FROM STUDENT GROUP BY ADDRESS;
NOTE:
• An attribute that is not a part of GROUP BY clause can’t be used for selection.
• Any attribute which is part of GROUP BY CLAUSE can be used for selection but it is not mandatory.
But we could use attributes that are not a part of the GROUP BY clause in an aggregate function.
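The aggregate and GROUP BY results above can be reproduced with the same hypothetical STUDENT rows (placeholder PHONE values; ADDRESS and AGE chosen to match the result tables). Note that SQLite, unlike standard SQL and systems such as PostgreSQL, accepts the ROLL_NO query shown above and returns an arbitrary ROLL_NO from each group, so this sketch runs only the valid queries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (ROLL_NO INTEGER, NAME TEXT, ADDRESS TEXT, PHONE TEXT, AGE INTEGER)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)", [
    (1, "RAM", "DELHI", "9000000001", 18),
    (2, "RAMESH", "GURGAON", "9000000002", 18),
    (3, "SUJIT", "ROHTAK", "9000000003", 20),
    (4, "SURESH", "DELHI", "9000000004", 18),
])

# Without GROUP BY, each aggregate collapses the whole relation into one row.
count_phone, total_age, avg_age, min_age = conn.execute(
    "SELECT COUNT(PHONE), SUM(AGE), AVG(AGE), MIN(AGE) FROM STUDENT"
).fetchone()
print(count_phone, total_age, avg_age, min_age)  # 4 74 18.5 18

# With GROUP BY, SUM(AGE) is computed once per ADDRESS group.
by_address = conn.execute(
    "SELECT ADDRESS, SUM(AGE) FROM STUDENT GROUP BY ADDRESS ORDER BY ADDRESS"
).fetchall()
print(by_address)  # [('DELHI', 36), ('GURGAON', 18), ('ROHTAK', 20)]
```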
Entity-Relationship Model
The Entity Relationship Model is a model for identifying the entities to be represented in the database and representing how those entities are related. The ER data model specifies an enterprise schema that represents the overall logical structure of a database graphically.
The Entity Relationship Diagram explains the relationship among the entities present in the database. ER models are used
to model real-world objects like a person, a car, or a company and the relation between these real-world objects. In short,
the ER Diagram is the structural format of the database.
Why Use ER Diagrams In DBMS?
• ER diagrams are used to represent the E-R model in a database, which makes them easy to convert into relations
(tables).
• ER diagrams model real-world objects, which makes them very useful.
• ER diagrams require no technical knowledge and no hardware support.
• These diagrams are very easy to understand and easy to create even for a naive user.
• It gives a standard solution for visualizing the data logically.
Components of ER Diagram
Entity
An Entity may be an object with a physical existence – a particular person, car, house, or employee – or it may be an object
with a conceptual existence – a company, a job, or a university course.
Entity Set: An entity is an object of an entity type, and the set of all entities of that type is called an entity set. For example, E1 is an entity having entity type Student, and the set of all students is called an entity set. In an ER diagram, an entity type is represented by a rectangle.
Figure: Entity Set
1. Strong Entity
A strong entity is a type of entity that has a key attribute. A strong entity does not depend on any other entity in the schema. It has a primary key, which helps in identifying it uniquely, and it is represented by a rectangle. These are called strong entity types.
2. Weak Entity
An entity type has a key attribute that uniquely identifies each entity in the entity set. But some entity types exist for which key attributes can't be defined. These are called weak entity types.
For example, a company may store the information of dependents (parents, children, spouse) of an employee. But the dependents can't exist without the employee. So Dependent will be a weak entity type, and Employee, which is a strong entity type, will be the identifying entity type for Dependent.
A weak entity type is represented by a Double Rectangle. The participation of weak entity types is always total. The
relationship between the weak entity type and its identifying strong entity type is called identifying relationship and it is
represented by a double diamond.
Attribute
1. Key Attribute
The attribute which uniquely identifies each entity in the entity set is called the key attribute. For example, Roll_No will be
unique for each student. In an ER diagram, a key attribute is represented by an oval with the attribute name underlined.
Figure: Key Attribute
2. Composite Attribute
An attribute composed of many other attributes is called a composite attribute. For example, the Address attribute of the
student entity type consists of Street, City, State, and Country. In an ER diagram, a composite attribute is represented by an oval connected to the ovals of its component attributes.
Figure: Composite Attribute
3. Multivalued Attribute
An attribute that can have more than one value for a given entity is called a multivalued attribute. For example, Phone_No (a student can have more than one). In an ER diagram, a multivalued attribute is represented by a double oval.
Figure: Multivalued Attribute
4. Derived Attribute
An attribute that can be derived from other attributes of the entity type is known as a derived attribute. E.g., Age (can be derived from DOB). In an ER diagram, a derived attribute is represented by a dashed oval.
Figure: Derived Attribute
The Complete Entity Type Student with its Attributes can be represented as:
Entity-Relationship Set
A set of relationships of the same type is known as a relationship set. The following relationship set depicts S1 as enrolled
in C2, S2 as enrolled in C1, and S3 as registered in C3.
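Figure: Relationship Set
Degree of a Relationship Set: the number of entity sets that participate in a relationship set is called its degree.
1. Unary Relationship: When there is only ONE entity set participating in a relationship, the relationship is called a unary relationship. For example, one person is married to only one person.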
2. Binary Relationship: When there are TWO entities set participating in a relationship, the relationship is called a binary
relationship. For example, a Student is enrolled in a Course.
Figure: Binary Relationship
3. n-ary Relationship: When there are n entity sets participating in a relationship, the relationship is called an n-ary relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is known as cardinality. Cardinality can be
of different types:
1. One-to-One: When each entity in each entity set can take part only once in the relationship, the cardinality is one-to-one.
Let us assume that a male can marry one female and a female can marry one male. So the relationship will be one-to-one.
The total number of tables that can be used in this is 2.
2. One-to-Many: In one-to-many mapping, an entity in one entity set can be related to more than one entity in the other entity set. Let us assume that one surgery department can accommodate many doctors. So the cardinality will be 1 to M: one department has many doctors. The total number of tables that can be used in this is 2, with the foreign key placed on the many side.
3. Many-to-One: When entities in one entity set can take part only once in the relationship set and entities in other entity
sets can take part more than once in the relationship set, cardinality is many to one. Let us assume that a student can take
only one course but one course can be taken by many students. So the cardinality will be n to 1. It means that for one course
there can be n students but for one student, there will be only one course.
The total number of tables that can be used in this is 2.
4. Many-to-Many: When entities in all entity sets can take part more than once in the relationship cardinality is many to
many. Let us assume that a student can take more than one course and one course can be taken by many students. So the
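relationship will be many to many. The total number of tables that can be used in this is 3: one for each entity set and one for the relationship.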
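A sketch of why a many-to-many relationship needs a third table, assuming hypothetical STUDENT, COURSE and ENROLLED tables: each ENROLLED row records one (student, course) pair, and its primary key is the combination of both foreign keys. A one-to-many relationship would instead need only a foreign key column on the many side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE STUDENT (S_ID TEXT PRIMARY KEY, NAME TEXT);
CREATE TABLE COURSE  (C_ID TEXT PRIMARY KEY, TITLE TEXT);
-- Third table for the many-to-many relationship: one row per (student, course) pair.
CREATE TABLE ENROLLED (
    S_ID TEXT REFERENCES STUDENT(S_ID),
    C_ID TEXT REFERENCES COURSE(C_ID),
    PRIMARY KEY (S_ID, C_ID)
);
INSERT INTO STUDENT VALUES ('S1', 'RIYA'), ('S2', 'SUNITA');
INSERT INTO COURSE VALUES ('C1', 'DBMS'), ('C2', 'OS');
INSERT INTO ENROLLED VALUES ('S1', 'C1'), ('S1', 'C2'), ('S2', 'C1');
""")

# A student can take many courses, and a course can have many students.
courses_of_s1 = [r[0] for r in conn.execute("SELECT C_ID FROM ENROLLED WHERE S_ID = 'S1' ORDER BY C_ID")]
students_in_c1 = [r[0] for r in conn.execute("SELECT S_ID FROM ENROLLED WHERE C_ID = 'C1' ORDER BY S_ID")]
print(courses_of_s1, students_in_c1)  # ['C1', 'C2'] ['S1', 'S2']
```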
Participation Constraint
Participation Constraint is applied to the entity participating in the relationship set.
1. Total Participation – Each entity in the entity set must participate in the relationship. If each student must enroll in a
course, the participation of students will be total. Total participation is shown by a double line in the ER diagram.
2. Partial Participation – An entity in the entity set may or may NOT participate in the relationship. If some courses are not taken by any student, the participation of courses will be partial.
The diagram depicts the ‘Enrolled in’ relationship set with Student Entity set having total participation and Course Entity
set having partial participation.
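Converting an ER Diagram to Relational Tables
Step 1:
• Figure out the strong (regular) entity types from the ER diagram and create a corresponding relation (table) for each that includes all its simple attributes.
• Choose one of the key attributes of the entity type as the primary key of the relation.
Figure: Relations after Step 1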
Step 2:
• Figure out the weak entity types from the diagram and create a corresponding relation(table) that includes all
its simple attributes.
• Add as foreign keys all of the primary key attributes of the relation corresponding to the owner entity.
• The primary key is the combination of all the primary key attributes from the owner and the partial key of the weak entity.
• For the given ER-Diagram we have Dependent as a weak entity, as it is enclosed in a double rectangle that is
indicative of an entity being weak.
• The Dependent relation(table) is created that is shown in the figure below.
Figure: Relations after Step 2
Step 3:
• Now we need to figure out the entities from ER diagram for which there exists a 1-to-1 relationship.
• For the entities between which a 1-to-1 relationship exists, choose one relation (table) as S and the other as T. It is better if S has total participation, as this reduces the number of NULL values.
• Then we need to add to S all the simple attributes of the relationship if there exists any.
• After that, we add as a foreign key in S the primary key attributes of T.
• For the given ER-Diagram there exists a 1-to-1 relationship between Employee and Department entity.
• Here Department has total participation therefore consider it as relation S and Employee as relation T.
• The 1-to-1 mapping between Employee and Department is depicted in the figure below.
Step 4:
• Now we need to figure out the entities from ER diagram for which there exists a 1-to-N relationship.
• For the entities between which a 1-to-N relationship exists, choose as S the relation of the entity type on the N-side of the relationship, and the other as T.
• Then we add as a foreign key to S all of the primary key attributes of T.
• In the given ER diagram, two 1-to-N relationships exist: between Employee and Department, and between Employee and Dependent.
• The 1-to-N mapping between Employee-Department and Employee-Dependent is depicted in the figure below.
Figure: Relations after Step 4
Step 5:
• Now we need to figure out the entities from ER diagram for which there exists an M-to-N relationship.
• Create a new relation(table) S.
• The primary keys of the relations (tables) between which the M-to-N relationship exists are added to the new relation S as foreign keys; together they form the primary key of S.
• Then we add any simple attributes of the M-to-N relationship to S.
• For the given ER-Diagram there exists an M-to-N relationship between the Employee and Project entities.
• The new table Works_On is created for mapping the relationship between the Employee and Project relations (tables).
Figure: Relations after Step 5
Step 6:
• Now identify the relations(tables) that contain multi-valued attributes.
• Then we need to create a new relation S.
• In the new relation S we add as foreign keys the primary keys of the corresponding relation.
• Then we add the multi-valued attribute to S; the combination of all attributes in S forms the primary key.
• For the given ER-Diagram there exists a multi-valued attribute (Locations) in the Department relation (table).
• So, we create a new relation called Dept_Locations. To this new relation we add the primary key of the Department table, D_Number, and the multi-valued attribute Locations.
Figure: Relations after Step 6
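Putting steps 1 to 6 together, the resulting schema might look like the sketch below. Since the ER diagram is not reproduced, every attribute name except D_Number and Locations (written here as D_NUMBER and LOCATION) is hypothetical; the helper primary_key reads each table's key back with SQLite's PRAGMA table_info.

```python
import sqlite3

SCHEMA = """
-- Step 1: strong entities. Step 4 (1-to-N): FK D_NUMBER on the N-side (EMPLOYEE).
CREATE TABLE EMPLOYEE (
    E_ID     TEXT PRIMARY KEY,
    NAME     TEXT,
    D_NUMBER INTEGER REFERENCES DEPARTMENT(D_NUMBER)
);
-- Step 3 (1-to-1): DEPARTMENT is S (total participation), so it holds the FK MGR_E_ID.
CREATE TABLE DEPARTMENT (
    D_NUMBER INTEGER PRIMARY KEY,
    D_NAME   TEXT,
    MGR_E_ID TEXT REFERENCES EMPLOYEE(E_ID)
);
CREATE TABLE PROJECT (P_NUMBER INTEGER PRIMARY KEY, P_NAME TEXT);
-- Step 2 (weak entity): PK = owner's PK + partial key DEP_NAME.
CREATE TABLE DEPENDENT (
    E_ID     TEXT REFERENCES EMPLOYEE(E_ID),
    DEP_NAME TEXT,
    PRIMARY KEY (E_ID, DEP_NAME)
);
-- Step 5 (M-to-N): new table whose PK combines both FKs; HOURS is a relationship attribute.
CREATE TABLE WORKS_ON (
    E_ID     TEXT REFERENCES EMPLOYEE(E_ID),
    P_NUMBER INTEGER REFERENCES PROJECT(P_NUMBER),
    HOURS    REAL,
    PRIMARY KEY (E_ID, P_NUMBER)
);
-- Step 6 (multivalued attribute): one row per (department, location).
CREATE TABLE DEPT_LOCATIONS (
    D_NUMBER INTEGER REFERENCES DEPARTMENT(D_NUMBER),
    LOCATION TEXT,
    PRIMARY KEY (D_NUMBER, LOCATION)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

def primary_key(table):
    """Return the PK columns of a table in key order (column 5 of table_info is the PK index)."""
    cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return [c[1] for c in sorted(cols, key=lambda c: c[5]) if c[5] > 0]

for t in ("EMPLOYEE", "DEPARTMENT", "PROJECT", "DEPENDENT", "WORKS_ON", "DEPT_LOCATIONS"):
    print(t, primary_key(t))
```

SQLite accepts the forward reference from EMPLOYEE to DEPARTMENT because foreign keys are only checked when rows are modified.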
Distributed Databases
A distributed database is a database that is not limited to one system; it is spread over different sites, i.e., on multiple computers or over a network of computers. A distributed database system is located on various sites that don't share physical components. This may be required when a particular database needs to be accessed by various users globally. It needs to be managed such that, to the users, it looks like one single database.
Types:
1. Homogeneous Database:
In a homogeneous database, all sites store the database identically. The operating system, the database management system, and the data structures used are the same at all sites. Hence, they're easy to manage.
Figure: Homogeneous Databases
2. Heterogeneous Database:
In a heterogeneous distributed database, different sites can use different schema and software that can lead to problems in
query processing and transactions. Also, a particular site might be completely unaware of the other sites. Different computers
may use a different operating system, different database application. They may even use different data models for the
database. Hence, translations are required for different sites to communicate.
Figure: Heterogeneous Database
Distributed Data Storage:
There are 2 ways in which data can be stored on different sites. These are:
1. Replication –
In this approach, the entire relation is stored redundantly at 2 or more sites. If the entire database is available at all
sites, it is a fully redundant database. Hence, in replication, systems maintain copies of data.
This is advantageous as it increases the availability of data at different sites. Also, now query requests can be processed in
parallel.
However, it has certain disadvantages as well. Data needs to be constantly updated. Any change made at one site needs to
be recorded at every site that relation is stored or else it may lead to inconsistency. This is a lot of overhead. Also,
concurrency control becomes much more complex, as concurrent access now needs to be checked over a number of sites.
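A toy sketch of full replication, with hypothetical site names and a hypothetical ReplicatedRelation class: every write is applied to all copies (the update overhead described above), while a read can be answered by any single site. Real systems must also handle failures and concurrent writers, which this sketch ignores.

```python
class ReplicatedRelation:
    """Toy model of full replication: every site keeps a complete copy of the relation."""

    def __init__(self, site_names):
        self.sites = {name: {} for name in site_names}  # site -> {key: row}

    def write(self, key, row):
        # Every replica must be updated, otherwise the copies become inconsistent.
        for copy in self.sites.values():
            copy[key] = row

    def read(self, site, key):
        # Any site can answer a query from its own local copy.
        return self.sites[site].get(key)

r = ReplicatedRelation(["Delhi", "Mumbai", "Chennai"])
r.write("C1", ("RIYA", "DELHI"))
print(r.read("Chennai", "C1"))  # ('RIYA', 'DELHI')
```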
2. Fragmentation –
In this approach, the relations are fragmented (i.e., they’re divided into smaller parts) and each of the fragments is stored
in different sites where they're required. The fragments must be such that they can be used to reconstruct the original relation (i.e., there isn't any loss of data).
Fragmentation is advantageous because it doesn't create copies of data, so consistency is not a problem.
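A sketch of fragmentation on the EMPLOYEE data (without the MOBILE column), assuming two common styles: splitting rows (horizontal fragmentation) and splitting columns while keeping the key ID_NO in every fragment (vertical fragmentation). In both cases the original relation is reconstructed without loss; apart from the key repeated in the vertical fragments, nothing is duplicated.

```python
# (ID_NO, NAME, ADDRESS, AGE)
EMPLOYEE = {
    ("C1", "RIYA", "DELHI", 20),
    ("C2", "SUNITA", "GURGAON", 22),
    ("C3", "ASHWANI", "ROHTAK", 18),
    ("C4", "PREETI", "DELHI", 25),
}

# Horizontal fragments: split the rows by ADDRESS; each set could live at a different site.
delhi_site = {t for t in EMPLOYEE if t[2] == "DELHI"}
other_site = {t for t in EMPLOYEE if t[2] != "DELHI"}

# Vertical fragments: split the columns; both keep the key ID_NO so they can be re-joined.
names_site = {(t[0], t[1]) for t in EMPLOYEE}
details_site = {(t[0], t[2], t[3]) for t in EMPLOYEE}

# Reconstruction: union for horizontal fragments, join on ID_NO for vertical ones.
from_horizontal = delhi_site | other_site
from_vertical = {(i, n, a, g) for (i, n) in names_site for (j, a, g) in details_site if i == j}
print(from_horizontal == EMPLOYEE, from_vertical == EMPLOYEE)  # True True
```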
The main advantage of a distributed database system is that it can provide higher availability and reliability than a
centralized database system. Because the data is stored across multiple sites, the system can continue to function even if
one or more sites fail. In addition, a distributed database system can provide better performance by distributing the data
and processing load across multiple sites.
There are several different architectures for distributed database systems, including:
• Client-server architecture: In this architecture, clients connect to a central server, which manages the distributed
database system. The server is responsible for coordinating transactions, managing data storage, and providing
access control.
• Peer-to-peer architecture: In this architecture, each site in the distributed database system is connected to all
other sites. Each site is responsible for managing its own data and coordinating transactions with other sites.
• Federated architecture: In this architecture, each site in the distributed database system maintains its own
independent database, but the databases are integrated through a middleware layer that provides a common
interface for accessing and querying the data.
Distributed database systems can be used in a variety of applications, including e-commerce, financial services, and
telecommunications. However, designing and managing a distributed database system can be complex and requires
careful consideration of factors such as data distribution, replication, and consistency.
Advantages of a Distributed Database System:
1) Data processing is fast, as several sites participate in request processing.
2) Reliability and availability of this system are high.
3) It has reduced operating costs.
4) It is easier to expand the system by adding more sites.
5) It has improved sharing ability and local autonomy.