DBMS Notes
A Database Management System (DBMS) is a software system that is designed to manage and
organize data in a structured manner. It allows users to create, modify, and query a database, as well
as manage the security and access controls for that database.
1. Data Security: The more accessible and usable a database is, the more exposed it is to security
issues. As the number of users grows, the volume of data transferred and shared grows too,
which increases the risk to data security. Companies invest large amounts of money, time, and
effort to ensure that data is secure and used properly. A Database Management System (DBMS)
provides a platform for data privacy and security policies, helping companies to improve
data security.
2. Data integration: A Database Management System gives access to well-managed and
synchronized data, which makes data handling easier. It provides an integrated view of how
an organization is working and helps to keep track of how one segment of the company
affects another segment.
3. Data abstraction: A major purpose of a database system is to provide users with an abstract
view of the data. Developers use many complex algorithms to increase the efficiency of
databases; these details are hidden from users through various data abstraction levels so
that users can interact with the system easily.
4. Reduction in data redundancy: When working with a structured database, a DBMS provides
features that prevent duplicate items from being entered. For example, if the same student is
entered in two different rows, a uniqueness constraint can reject the duplicate row.
5. Data sharing: A DBMS provides a platform for sharing data across multiple applications and
users, which can increase productivity and collaboration.
6. Data consistency and accuracy: DBMS ensures that data is consistent and accurate by
enforcing data integrity constraints and preventing data duplication. This helps to eliminate data
discrepancies and errors that can occur when data is stored and managed manually.
8. Efficient data access and retrieval: DBMS allows for efficient data access and retrieval by
providing indexing and query optimization techniques that speed up data retrieval. This reduces
the time required to process large volumes of data and increases the overall performance of the
system.
Application of DBMS:
There are different fields where a database management system is utilized. Following are a few
applications that use a database management system.
1. Railway Reservation System:
In a railway reservation system, the database stores the records of ticket bookings and the status
of train arrivals and departures. Additionally, if trains run late, passengers learn about it
through database updates.
3. Banking
A database management system is used to store the transaction data of customers in the
database.
4. Education Sector
Many schools and colleges now conduct examinations online and manage all examination
data through a database management system (DBMS). In addition, student enrolment details,
grades, courses, fees, attendance, results, and so forth are all stored in the database.
o Logical data independence refers to the ability to change the conceptual schema
without having to change the external schema.
o Logical data independence is used to separate the external level from the conceptual view.
o If we make any changes in the conceptual view of the data, then the user view of the data will
not be affected.
o Logical data independence occurs at the user interface level.
o Physical data independence can be defined as the capacity to change the internal schema
without having to change the conceptual schema.
o If we make any changes in the storage size of the database system server, then the conceptual
structure of the database will not be affected.
o Physical data independence is used to separate the conceptual level from the internal level.
o Physical data independence occurs at the logical interface level.
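Logical data independence can be illustrated with a view. The following is a minimal sketch using Python's built-in sqlite3 module; the student table, its columns, and the view name student_view are hypothetical and chosen only for illustration. The view plays the role of the external schema, and the base table plays the role of the conceptual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: a hypothetical base table.
conn.execute("CREATE TABLE student (stu_id INTEGER, stu_name TEXT)")
conn.execute("INSERT INTO student VALUES (1, 'Ram')")

# External schema: the view that users actually query.
conn.execute(
    "CREATE VIEW student_view AS SELECT stu_id, stu_name FROM student"
)

before = conn.execute("SELECT * FROM student_view").fetchall()

# Change the conceptual schema by adding a column to the base table.
conn.execute("ALTER TABLE student ADD COLUMN stu_addr TEXT")

after = conn.execute("SELECT * FROM student_view").fetchall()

# The user view returns the same rows before and after the change.
print(before == after)
```

The view lists its columns explicitly, so the extra column in the base table does not leak into what the user sees.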
“Schema” and “Instance” are key ideas in a database management system (DBMS) that help
organize and manage data. Let’s begin by examining their distinctions from one another.
Instances:
An instance is the state of an operational database, with its data, at a given point in time. It is
a snapshot of the database. An instance changes through CRUD operations such as the
addition and deletion of data. A search query, by contrast, does not change the
instance.
Example:
Suppose a database named School has a table teacher that holds 50 records, so
the instance of the database has 50 records for now. If we add another fifty
records tomorrow, tomorrow's instance will have a total of 100 records.
Schema:
Schema is the overall description of the database. The basic structure of how the data will be stored
in the database is called schema.
Schema is of three types: Logical Schema, Physical Schema, and View Schema.
Logical Schema – It describes the database designed at a logical level.
Physical Schema – It describes the database designed at the physical level.
View Schema – It defines the design of the database at the view level.
Schema | Instance
Defines the basic structure of the database, i.e. how the data will be stored in the database. | It is the set of information stored at a particular time.
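The difference between schema and instance can be observed directly. The sketch below uses Python's built-in sqlite3 module and mirrors the teacher example (50 records, then 100); the column names and the helper functions schema_of, instance_size, and add_teachers are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE teacher (teacher_id INTEGER, teacher_name TEXT)")


def schema_of(connection):
    # The schema is the stored definition of the table.
    row = connection.execute(
        "SELECT sql FROM sqlite_master WHERE name = 'teacher'"
    ).fetchone()
    return row[0]


def instance_size(connection):
    # The instance is the data present right now; here we just count it.
    return connection.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]


def add_teachers(connection, start, count):
    rows = [(i, f"Teacher {i}") for i in range(start, start + count)]
    connection.executemany("INSERT INTO teacher VALUES (?, ?)", rows)


schema_today = schema_of(conn)
add_teachers(conn, 1, 50)
size_today = instance_size(conn)

add_teachers(conn, 51, 50)
size_tomorrow = instance_size(conn)
schema_tomorrow = schema_of(conn)

print(size_today, size_tomorrow, schema_today == schema_tomorrow)
```

The record count changes as data is added, while the table definition stays the same.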
Data Dictionary:
A Data Dictionary can be defined as a collection of information on all data elements or contents of
databases such as data types, and text descriptions of the system.
Types of Data Dictionary
A data dictionary is basically of two types:
Integrated Data Dictionary
Stand Alone Data Dictionary
1. Internal Level: The internal level, or physical schema, describes the physical storage structure of
the database. It details how data is stored in blocks, the storage space allocations, access paths, and
the optimization of internal structures. This level is concerned with the technical aspects of data
storage, such as data compression, encryption, and the use of data structures like B-Trees and
hashing.
2. Conceptual Level: Also known as the logical level, the conceptual schema provides a blueprint of
the entire database structure. It outlines what data is stored and the relationships among those data.
This level abstracts the internal details of data storage, focusing instead on the overall design of
the database. Database administrators and developers typically interact with the DBMS at this
level.
3. External Level: The external level comprises multiple schemas or views, each tailored to different
user groups. These views, also referred to as subschemas, present a portion of the database that is
relevant to a particular group, hiding the rest. For example, a faculty member may have a view that
includes course details of students, while students may have a view that encompasses academic,
account, and hostel information.
Data Models
Data Model is the modeling of the data description, data semantics, and consistency constraints of the
data.
There are 4 types of data models:
A simple ER Diagram:
In the following diagram we have two entities Student and College and their
relationship. The relationship between Student and College is many to one as
a college can have many students however a student cannot study in multiple
colleges at the same time. Student entity has attributes such as Stu_Id,
Stu_Name & Stu_Addr and College entity has attributes such as Col_ID &
Col_Name.
The geometric shapes used in an E-R Diagram each have a meaning. We will
discuss these terms in detail in the next section (Components of an ER
Diagram) of this guide, so a quick read-through is enough for now.
Components of an ER Diagram
1. Entity
An entity is an object or component of data. An entity is represented as a
rectangle in an ER diagram.
For example: In the following ER diagram we have two entities, Student and
College, and these two entities have a many to one relationship, as many
students study in a single college. We will read more about relationships later;
for now, focus on entities.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on
its relationship with another entity is called a weak entity. A weak entity is
represented by a double rectangle. For example – a bank account cannot be
uniquely identified without knowing the bank to which the account belongs, so
bank account is a weak entity.
2. Attribute
An attribute describes a property of an entity. An attribute is represented as
an oval in an ER diagram. There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity from an entity set. For example,
a student roll number can uniquely identify a student from a set of students. A key
attribute is represented by an oval, the same as other attributes, but the text of
a key attribute is underlined.
2. Composite attribute:
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It
is represented with double ovals in an ER Diagram. For example – A person
can have more than one phone number, so the phone number attribute is
multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another
attribute. It is represented by a dashed oval in an ER Diagram. For example –
A person's age is a derived attribute, as it changes over time and can be derived
from another attribute (date of birth).
E-R diagram with multivalued and derived attributes:
3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the
relationship among entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
When more than one instance of an entity is associated with more than one
instance of another entity, it is called a many to many relationship. For
example, a student can be assigned to many projects and a project can be assigned
to many students.
Partial participation is represented using a single line between the entity set
and relationship set.
1. Primary Key
A primary key is a column (or set of columns) that uniquely identifies every record
within a table. A table has only one primary key, and this key must not contain
repeated or duplicated values across its rows.
The Primary Key field cannot be left NULL; the Primary Key column must
contain a value.
In that column, no two rows in the table may contain identical values.
If a foreign key refers to the primary key, values in this primary key column
should not be altered or modified.
1. Uniqueness is the most crucial element when selecting this key in DBMS. It
means that this column's value does not occur in any other row of the table.
2. The definition and values shouldn't be altered. Altering a PK column
value requires updating all referencing rows in the child (related) table, and
altering a PK's columns requires redefining all related foreign
keys.
3. If a composite primary key is used, no single column or smaller group of
its columns should be able to identify each row uniquely on its own.
4. Use as few columns as you can and, where possible, pick columns whose values
are simple to read and recall.
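The uniqueness and NOT NULL rules for a primary key can be checked with a small experiment. This is a minimal sketch using Python's built-in sqlite3 module; the student table, its columns, the sample values, and the helper try_insert are hypothetical. NOT NULL is spelled out explicitly here, as an assumption specific to SQLite, which does not always reject NULL in a primary key column by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (stud_id TEXT NOT NULL PRIMARY KEY, name TEXT)"
)
conn.execute("INSERT INTO student VALUES ('S1', 'Ram')")


def try_insert(stud_id, name):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (stud_id, name))
        return True
    except sqlite3.IntegrityError:
        return False


duplicate_ok = try_insert("S1", "Mohan")  # repeats an existing key value
null_ok = try_insert(None, "Vivek")       # leaves the key empty
new_ok = try_insert("S2", "Geeta")        # a fresh, non-null key value

print(duplicate_ok, null_ok, new_ok)
```

Only the row with a new, non-null key value is accepted.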
2. Candidate Key
Candidate keys play a vital role in upholding the integrity and consistency of a
database. The purpose of this key in DBMS is to guarantee that each row in a table is
unique and can be identified independently. Because foreign keys in other tables refer
to a candidate key (usually the primary key), candidate keys also underpin the
relationships between tables.
The candidate keys Roll No., Stud ID, and Email in the table enable us to identify
each student record uniquely.
Primary Key | Candidate Key
A primary key is a column that allows each entry in a database to be identified only once. It is selected as the primary key for the table from the list of potential keys. | A column that can uniquely identify each record in a database is sometimes referred to as a candidate key. It might be used as the primary key.
The primary key enforces entity integrity, | Candidate keys are potential candidates
The primary key is chosen by the database designer or administrator. | A candidate key can be chosen from the set of candidate keys for a table.
3. Super Key
A super key is any set of one or more columns that enables us to identify every row in
the table uniquely. Every combination of table columns that identifies the rows
uniquely functions as a super key.
4. Foreign Key
A foreign key is a column in one table that refers to the primary key of
another table.
At any given time, it links two or more relations.
Foreign keys serve as cross-references between the tables.
121 Science
213 English
513 Computer
In this example, we have two tables at a school: instructor and department.
As they stand, it is hard to tell which instructor is assigned to which
department.
We can link the two tables by adding Deptcode as a foreign key alongside the
Teacher name.
Foreign keys connect data in one table to data in another. To cross-reference
the two tables, a foreign key column in one table refers to a
column with unique values in the other table.
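The instructor and department link can be sketched with Python's built-in sqlite3 module. The department rows (121 Science, 213 English, 513 Computer) come from the notes above; the table and column names, the instructor names, and the department code 999 are hypothetical. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite

conn.execute(
    "CREATE TABLE department (deptcode INTEGER PRIMARY KEY, deptname TEXT)"
)
conn.execute(
    "CREATE TABLE instructor ("
    " teacher_name TEXT,"
    " deptcode INTEGER REFERENCES department(deptcode))"
)
conn.executemany(
    "INSERT INTO department VALUES (?, ?)",
    [(121, "Science"), (213, "English"), (513, "Computer")],
)

# A row that points at an existing department is accepted.
conn.execute("INSERT INTO instructor VALUES ('Teacher A', 213)")

# A row that points at a department that does not exist is rejected.
try:
    conn.execute("INSERT INTO instructor VALUES ('Teacher B', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

# The foreign key lets us look up each instructor's department.
joined = conn.execute(
    "SELECT i.teacher_name, d.deptname "
    "FROM instructor i JOIN department d ON d.deptcode = i.deptcode"
).fetchall()

print(orphan_rejected, joined)
```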
5. Alternate Key
A table may have several keys that could be selected as the primary key. Any
key that has the potential to be the primary key but has not been chosen as it is
considered an alternate key. It's a prospective primary key that hasn't been
selected.
Alternate keys are all candidate keys other than the primary key.
An alternate key acts as a backup key.
It has two or more fields that allow it to recognize two or more records.
These criteria are reiterated.
StudID, Roll No., and Email are all candidate keys. However, because StudID is
chosen as the primary key, Roll No. and Email become the alternate keys.
6. Compound Key
This key in DBMS contains two or more attributes that together identify a specific
record uniquely. It's possible that none of the columns in the table is
unique on its own. However, when paired with the additional column or columns,
the combination becomes unique. Each record in the table is then
uniquely identified by the compound key.
Neither OrderNo nor ProductID can be the primary key on its own, since neither uniquely
identifies a record. A compound key combining OrderNo and ProductID can be used to
identify each record uniquely.
Compound keys are typically constructed from the primary keys of two or more other tables.
Each key uniquely identifies data in its own table, but both are required to do
so in the table that uses the compound key.
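A compound key over OrderNo and ProductID might look like the following minimal sketch, again using Python's built-in sqlite3 module. The table name order_item, the quantity column, and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE order_item ("
    " order_no INTEGER NOT NULL,"
    " product_id INTEGER NOT NULL,"
    " quantity INTEGER,"
    " PRIMARY KEY (order_no, product_id))"
)

# order_no repeats (1, 1) and product_id repeats (10, 10),
# but every (order_no, product_id) pair is distinct.
rows = [(1, 10, 2), (1, 11, 1), (2, 10, 5)]
conn.executemany("INSERT INTO order_item VALUES (?, ?, ?)", rows)

# Repeating an existing pair violates the compound key.
try:
    conn.execute("INSERT INTO order_item VALUES (1, 10, 9)")
    pair_repeat_rejected = False
except sqlite3.IntegrityError:
    pair_repeat_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM order_item").fetchone()[0]
print(pair_repeat_rejected, row_count)
```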
Key Type | Definition | Uniqueness | Null Values | Purpose | Index | Usage | Alteration
Foreign Key | Establishes a relationship between tables. | Not Required | Allowed | Data consistency, relationship maintenance. | May create Index | Between tables | More flexible for maintenance.
Candidate Key | Alternative unique keys that could be primary keys. | Required | Not Allowed | Backup primary key options. | May create Index | Within same table | May become primary key.
Super Key | Combination of attributes that uniquely identify. | Required | Allowed | Conceptual, not enforced. | May create Index | Conceptual use | N/A
Composite Key | Combined attributes used as a single key. | Required | Not Allowed | Specialized unique identification. | Creates Composite Index | Within same table | May be complex, impact performance.
Surrogate Key | Artificial keys assigned for record identification. | Required | Not Allowed | Enhanced data privacy, data warehousing. | Creates Unique Index | Within same table | Generally static, non-changing.
Cardinality in DBMS (Mapping Constraints)
Cardinality describes how entities are related to each other, that is, the relationship
structure between entities in a relationship set. In a Database Management System,
cardinality is a number that denotes how many times an entity participates
with another entity in a relationship set. Cardinality is a very important
property in representing the structure of a database. For a table, cardinality also refers
to the number of rows or tuples it contains.
1. One to one
2. Many to one
3. One to many
4. Many to many
One to one cardinality is represented by a 1:1 symbol. In this, there is at most one
relationship from one entity to another entity. There are a lot of examples of one-to-one
cardinality in real life databases.
For example, one student can have only one student id, and one student id can belong
to only one student. So, the relationship mapping between student and student id will be
one to one cardinality mapping.
Another example is the relationship between the director of the school and the school
because one school can have a maximum of one director, and one director can belong to
only one school.
Many to One Cardinality:
In many to one cardinality mapping, multiple entities of set 1 can be
related to a single entity of set 2. Equivalently, one
entity of set 2 can be related to more than one entity of set 1.
One to one cardinality is a special case of many to one cardinality. Many to one
cardinality is represented by M:1.
For example, there are multiple patients in a hospital who are served by a single doctor,
so the relationship between patients and doctors can be represented by Many to one
Cardinality.
Many to many cardinality is represented by M:N or N:M.
One to one, one to many, and many to one cardinality are all special cases
of many to many cardinality.
For example, in a college, multiple students can work on a single project, and a single
student can also work on multiple projects. So, the relationship between the project and
the student can be represented by many to many cardinality.
What is a Hierarchical Data Model?
The hierarchical data model is the oldest type of the data model. It was
developed by IBM in 1968. It organizes data in a tree-like structure.
Hierarchical model consists of the following:
It contains nodes which are connected by branches.
The topmost node is called the root node.
If multiple nodes appear at the top level, then these can be
called root segments.
Each node has exactly one parent.
One parent may have many children.
In the above figure, Electronics is the root node, which has two children, i.e.
Televisions and Portable Electronics. These two have further children for
which they act as parents. For example, Televisions has the children Tube,
LCD, and Plasma, and acts as the parent of these three. The model follows a one
to many relationship.
In the above figure, Project is the root node, which has two children, i.e.
Project 1 and Project 2. Project 1 has 3 children and Project 2 has 2
children, giving 5 parent-child links in total among three departments: Department A,
Department B, and Department C. They are network-related children because, in this
model, a node can have more than one parent. So Department B and Department
C each have two parents, i.e. Project 1 and Project 2.
Advantages of the Network Data Model
The relational data model was developed by E.F. Codd in 1970. There are
no physical links as they are in the hierarchical data model. Following are
the properties of the relational data model:
Data is represented in the form of table only.
It deals only with the data not with the physical structure.
It provides information regarding metadata.
At the intersection of row and column there will be only one value for
the tuple.
It provides a way to handle the queries with ease.
For certain kinds of straightforward data retrieval tasks, they may not
perform as well as hierarchical models.
Demands a deeper comprehension of SQL and normalization
principles.
Command | Description | Syntax
UPDATE | Update existing data within a table | UPDATE table_name SET column1 = value1, column2 = value2 WHERE condition;
DELETE | Delete records from a database table | DELETE FROM table_name WHERE condition;
CALL | Call a PL/SQL or Java subprogram | CALL procedure_name(arguments);
EXPLAIN PLAN | Describe the access path to data | EXPLAIN PLAN FOR SELECT * FROM table_name;
Fundamental Operators:
1. Selection(σ)
2. Projection(π)
3. Union(U)
4. Set Difference(-)
5. Set Intersection(∩)
6. Rename(ρ)
7. Cartesian Product(X)
A B C
1 2 4
2 2 3
3 2 3
4 3 4
For the above relation, σ(C>3)(R) selects the tuples whose value of C is greater
than 3.
A B C
1 2 4
4 3 4
Note: The selection operator only selects the required tuples but does not
display them. For display, the data projection operator is used.
2 4
2 3
3 4
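Selection and projection on the relation R above can be sketched in plain Python. The helper names select and project are hypothetical. The three rows listed just above match the projection of R onto B and C with duplicates removed, so the sketch assumes that is the operation they illustrate.

```python
# Relation R from the notes, one dict per tuple.
R = [
    {"A": 1, "B": 2, "C": 4},
    {"A": 2, "B": 2, "C": 3},
    {"A": 3, "B": 2, "C": 3},
    {"A": 4, "B": 3, "C": 4},
]


def select(relation, predicate):
    # Selection keeps whole tuples that satisfy the predicate.
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    # Projection keeps only the named attributes and drops duplicate rows.
    seen = []
    for row in relation:
        reduced = tuple(row[name] for name in attributes)
        if reduced not in seen:
            seen.append(reduced)
    return seen


selected = select(R, lambda row: row["C"] > 3)
projected = project(R, ["B", "C"])

print(selected)
print(projected)
```

Selection reduces the number of rows, while projection reduces the number of columns and then removes any rows that have become identical.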
FRENCH
Student_Name Roll_Number
Ram 01
Mohan 02
Vivek 13
Geeta 17
GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
Student_Name
Ram
Mohan
Vivek
Geeta
Shyam
Rohan
Note: The only constraint in the union of two relations is that both relations
must have the same set of Attributes.
Student_Name
Ram
Mohan
Note: The only constraint in the Set Difference between two relations is that
both relations must have the same set of Attributes.
Vivek
Geeta
Note: The only constraint in the Set Intersection between two relations is that
both relations must have the same set of Attributes.
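The three set operations on the FRENCH and GERMAN relations can be reproduced with Python sets. This is a minimal sketch; the helper names is hypothetical, and it projects each relation onto Student_Name first so that both operands have the same attribute.

```python
# FRENCH and GERMAN relations from the notes as sets of (name, roll number).
french = {("Ram", "01"), ("Mohan", "02"), ("Vivek", "13"), ("Geeta", "17")}
german = {("Vivek", "13"), ("Geeta", "17"), ("Shyam", "21"), ("Rohan", "25")}


def names(relation):
    # Projection onto Student_Name.
    return {name for name, _roll_number in relation}


union = names(french) | names(german)         # in either course
difference = names(french) - names(german)    # in FRENCH but not in GERMAN
intersection = names(french) & names(german)  # in both courses

print(sorted(union))
print(sorted(difference))
print(sorted(intersection))
```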
A
Name Age Sex
Ram 14 M
Sona 15 F
Kim 20 M
B
ID Course
1 DS
2 DBMS
A X B
Name Age Sex ID Course
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘
tuples.
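The n*m rule can be checked with a short sketch that uses itertools.product from the Python standard library on the relations A and B from the notes. The variable name cross is hypothetical.

```python
from itertools import product

# Relation A (Name, Age, Sex) and relation B (ID, Course) from the notes.
A = [("Ram", 14, "M"), ("Sona", 15, "F"), ("Kim", 20, "M")]
B = [(1, "DS"), (2, "DBMS")]

# Pair every tuple of A with every tuple of B and concatenate them.
cross = [a + b for a, b in product(A, B)]

for row in cross:
    print(row)
print(len(cross) == len(A) * len(B))
```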
Derived Operators
These are some of the derived operators, which are derived from the
fundamental operators.
1. Natural Join(⋈)
2. Conditional Join
EMP
Name ID Dept_Name
A 120 IT
B 125 HR
C 110 Sales
D 111 IT
DEPT
Dept_Name Manager
Sales Y
Production Z
IT A
EMP ⋈ DEPT
Name ID Dept_Name Manager
A 120 IT A
C 110 Sales Y
D 111 IT A
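A natural join of EMP and DEPT can be sketched as a nested loop that matches tuples on the attributes the two relations share. The function natural_join is a hypothetical helper written for this illustration and is not an optimized join algorithm.

```python
EMP = [
    {"Name": "A", "ID": 120, "Dept_Name": "IT"},
    {"Name": "B", "ID": 125, "Dept_Name": "HR"},
    {"Name": "C", "ID": 110, "Dept_Name": "Sales"},
    {"Name": "D", "ID": 111, "Dept_Name": "IT"},
]
DEPT = [
    {"Dept_Name": "Sales", "Manager": "Y"},
    {"Dept_Name": "Production", "Manager": "Z"},
    {"Dept_Name": "IT", "Manager": "A"},
]


def natural_join(left, right):
    if not left or not right:
        return []
    # Attributes that appear in both relations (here only Dept_Name).
    common = [name for name in left[0] if name in right[0]]
    result = []
    for l_row in left:
        for r_row in right:
            if all(l_row[name] == r_row[name] for name in common):
                result.append({**l_row, **r_row})
    return result


joined = natural_join(EMP, DEPT)
for row in joined:
    print(row)
```

Employee B is dropped because HR has no matching row in DEPT, and the Production department is dropped because no employee belongs to it.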
R
ID Sex Marks
1 F 45
2 F 55
3 F 60
S
ID Sex Marks
10 M 20
11 M 22
12 M 59
1 F 45 10 M 20
1 F 45 11 M 22
2 F 55 10 M 20
2 F 55 11 M 22
3 F 60 10 M 20
3 F 60 11 M 22
3 F 60 12 M 59
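A conditional join keeps only those pairs from the Cartesian product that satisfy a condition. The notes do not state the condition for this example; the seven rows above are consistent with R.Marks >= S.Marks, so the sketch below assumes that condition. The helper theta_join is hypothetical.

```python
# Relations R and S from the notes as (ID, Sex, Marks) tuples.
R = [(1, "F", 45), (2, "F", 55), (3, "F", 60)]
S = [(10, "M", 20), (11, "M", 22), (12, "M", 59)]


def theta_join(left, right, condition):
    # Keep each concatenated pair for which the condition holds.
    return [l + r for l in left for r in right if condition(l, r)]


# Assumed condition: R.Marks >= S.Marks (index 2 holds Marks).
joined = theta_join(R, S, lambda r, s: r[2] >= s[2])

for row in joined:
    print(row)
```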
Normalization
A large database defined as a single relation may result in data duplication. This
repetition of data may result in:
What is Normalization?
○ Normalization is the process of organizing the data in the database.
○ Normalization is used to minimize the redundancy from a relation or
set of relations. It is also used to eliminate undesirable characteristics
like Insertion, Update, and Deletion Anomalies.
○ Normalization divides a larger table into smaller tables and links them using
relationships.
○ The normal form is used to reduce redundancy from the database table.
The main reason for normalizing relations is to remove these anomalies.
Failure to eliminate anomalies leads to data redundancy and can cause data
integrity and other problems as the database grows. Normalization consists of a
series of guidelines that help you create a good database
structure.
Advantages of Normalization
○ Normalization helps to minimize data redundancy.
○ Greater overall database organization.
○ Data consistency within the database.
○ Much more flexible database design.
○ Enforces the concept of relational integrity.
Disadvantages of Normalization
○ You cannot start building the database before knowing what the user
needs.
○ The performance degrades when normalizing the relations to higher
normal forms, i.e., 4NF, 5NF.
○ It is very time-consuming and difficult to normalize relations of a higher
degree.
○ Careless decomposition may lead to a bad database design, leading to
serious problems.
EMPLOYEE table:
14 John 7272826385, 9064738238 UP
8589830302
The decomposition of the EMPLOYEE table into 1NF has been shown below:
14 John 9064738238 UP
Example: Let's assume, a school can store the data of teachers and the subjects
they teach. In a school, a teacher can teach more than one subject.
TEACHER table:
TEACHER_ID SUBJECT TEACHER_AGE
25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID TEACHER_AGE
25 30
47 35
83 38
TEACHER_SUBJECT table:
TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
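The decomposition above can be sketched in plain Python, together with a check that joining the two smaller tables on TEACHER_ID gives back the original table. The column order (TEACHER_ID, SUBJECT, TEACHER_AGE) and the assumption that TEACHER_AGE depends on TEACHER_ID alone are inferred from the decomposed tables.

```python
# Original TEACHER table: (TEACHER_ID, SUBJECT, TEACHER_AGE).
TEACHER = [
    (25, "Chemistry", 30),
    (25, "Biology", 30),
    (47, "English", 35),
    (83, "Math", 38),
    (83, "Computer", 38),
]

# TEACHER_DETAIL: one row per teacher, so each age is stored only once.
teacher_detail = sorted({(tid, age) for tid, _subject, age in TEACHER})

# TEACHER_SUBJECT: one row per (teacher, subject) pair.
teacher_subject = [(tid, subject) for tid, subject, _age in TEACHER]

# Joining the two tables on TEACHER_ID rebuilds the original rows.
age_of = dict(teacher_detail)
rebuilt = [(tid, subject, age_of[tid]) for tid, subject in teacher_subject]

print(teacher_detail)
print(rebuilt == TEACHER)
```

Because the join rebuilds every original row and adds none, no information is lost by the decomposition.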
A relation is in third normal form if it satisfies at least one of the following conditions
for every non-trivial functional dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
1. Super key in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME,
EMP_ZIP}, and so on
2. Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are
non-prime.
Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends
on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore transitively
dependent on the super key (EMP_ID), which violates the rule of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.
EMPLOYEE table:
EMPLOYEE_ZIP table:
EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
Example: Let's assume there is a company where employees work in more than
one department.
EMPLOYEE table:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP_ID, EMP_DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone is a key.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID EMP_COUNTRY
264 India
264 India
EMP_DEPT table:
EMP_DEPT_MAPPING table:
EMP_ID EMP_DEPT
D394 283
D394 300
D283 232
D283 549
Functional dependencies:
EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
Now the tables are in BCNF, because the left-hand side of both functional dependencies is
a key.