DBMS Unit 2

Uploaded by

RIYA SUDRIK

MIT School of Computing

Department of Computer Science & Engineering

Third Year Engineering

21BTCS502-Database Management System

Class - T.Y.PLD
(Division-)

AY 2023-2024
SEM-I

Unit – II

Database Design


Syllabus

● Functional Dependency, Purpose of Normalization, Data
Redundancy and Update Anomalies, Functional Dependency
Single Valued Dependencies.
● Single Valued Normalization: 1NF, 2NF, 3NF, BCNF.
● Decomposition: lossless join decomposition and dependency
preservation.
● Multi valued Normalization (4NF), Join Dependencies and the
Fifth Normal Form.

Functional Dependency
• A functional dependency (FD) is a constraint between two sets
of attributes of a table: the value of one set determines the
value of the other.
• X → Y holds in a relation if any two rows that agree on X also
agree on Y.
• Functional dependencies help you maintain the quality of data
in the database, because they expose the redundancy that
normalization removes.
• A functional dependency is denoted by an arrow →.
• X → Y is read as "X determines Y",
• where X is the determinant attribute and Y is the dependent
attribute.
• E.g. sid → sname
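The definition can be checked against actual rows. The following is a minimal sketch, not part of the original slides: the function name fd_holds and the student rows are hypothetical and invented only to illustrate sid → sname.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in lhs)
        value = tuple(row[c] for c in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False  # two rows agree on lhs but differ on rhs
    return True


# Hypothetical student rows, invented for illustration only.
students = [
    {"sid": 1, "sname": "Asha", "city": "Pune"},
    {"sid": 2, "sname": "Ravi", "city": "Pune"},
    {"sid": 3, "sname": "Asha", "city": "Mumbai"},
]

print(fd_holds(students, ["sid"], ["sname"]))   # True: each sid has one sname
print(fd_holds(students, ["sname"], ["city"]))  # False: "Asha" maps to two cities
```

A check like this can only refute a dependency on the given data. Whether an FD really holds is a design decision about all possible rows, not only the current ones.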
Types of Functional Dependency
1. Multivalued dependency
2. Trivial functional dependency
3. Non-trivial functional dependency
4. Transitive dependency
1. Multivalued dependency: A multivalued dependency occurs when
a single table holds two or more multivalued attributes that are
independent of each other.
2. Trivial functional dependency: X → Y is a trivial functional
dependency if Y is a subset of X.
For example, {sid, sname} → sid.
3. Non-trivial functional dependency: A → B is a non-trivial
functional dependency if it holds and B is not a subset of A.
For example, sid → sname.
4. Transitive dependency: A transitive dependency is a functional
dependency that is formed indirectly by two other functional
dependencies: if X → Y and Y → Z hold, then X → Z is a transitive
dependency.
Properties of functional dependencies
1. Reflexivity: if Y is a subset of X, then X → Y always holds.
e.g. sid → sid
2. Augmentation: if X → Y, then XZ → YZ.
e.g. {sid, phoneno} → {sname, phoneno}
3. Transitivity: if X → Y and Y → Z, then X → Z.
e.g. if sid → sname and sname → city, then sid → city
4. Union: if X → Y and X → Z, then X → YZ.
5. Decomposition: if X → YZ, then X → Y and X → Z.
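Applying these properties repeatedly gives the attribute closure of X: the set of all attributes that X determines. The sketch below is an illustration added here, not something from the slides. The function name closure and the representation of each FD as a pair of Python sets are assumptions.

```python
def closure(attrs, fds):
    """Return the closure of attrs under the functional dependencies fds.

    fds is a list of (lhs, rhs) pairs, each side a set of attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If everything on the left is already determined, the right
            # side is determined too (transitivity and union).
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


# The FDs from the transitivity example: sid -> sname, sname -> city.
fds = [({"sid"}, {"sname"}), ({"sname"}, {"city"})]
sid_closure = closure({"sid"}, fds)
print(sorted(sid_closure))  # ['city', 'sid', 'sname']
```

Because city appears in the closure of sid, the dependency sid → city follows from the two given dependencies.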


Purpose of Normalization
• If a database design is poor, it may contain anomalies that
make the database difficult to manage and to keep consistent.
• Update anomalies − If the same data item is stored in several
places and the copies are not linked properly, an update may
change some copies but not others, leaving the data inconsistent.
• Deletion anomalies − We tried to delete a record, but parts of it
were left undeleted because, without our being aware of it, the
data was also saved somewhere else.
• Insert anomalies − We tried to insert data that belongs to a
record that does not exist yet.
Data Redundancy and Update Anomalies
Codd's 12 Rules:
Rule 1: Information Rule
• The data stored in a database, may it be user data or
metadata, must be a value of some table cell.
Everything in a database must be stored in a table
format.
Rule 2: Guaranteed Access Rule
• Every single data element (value) is guaranteed to be
accessible logically with a combination of table-name,
primary-key (row value), and attribute-name (column
value). No other means, such as pointers, can be used
to access data.
Rule 3: Systematic Treatment of NULL Values
• The NULL values in a database must be given a systematic
and uniform treatment. This is a very important rule
because a NULL can be interpreted as one of the following −
data is missing, data is not known, or data is not applicable.
Rule 4: Active Online Catalog
• The structure description of the entire database must be
stored in an online catalog, known as data dictionary,
which can be accessed by authorized users. Users can use
the same query language to access the catalog which they
use to access the database itself.

Rule 5: Comprehensive Data Sub-Language Rule
• A database can only be accessed using a language
having linear syntax that supports data definition, data
manipulation, and transaction management operations.
This language can be used directly or by means of
some application. If the database allows access to data
without any help of this language, then it is considered
as a violation.
Rule 6: View Updating Rule
• All the views of a database, which can theoretically be
updated, must also be updatable by the system.
Rule 7: High-Level Insert, Update, and Delete Rule
• A database must support high-level insertion, update,
and deletion. This must not be limited to a single row,
that is, it must also support union, intersection and
minus operations to yield sets of data records.
Rule 8: Physical Data Independence
• The data stored in a database must be independent of
the applications that access the database. Any change in
the physical structure of a database must not have any
impact on how the data is being accessed by external
applications.
Rule 9: Logical Data Independence
• The logical data in a database must be independent of its user’s
view (application). Any change in logical data must not affect the
applications using it. For example, if two tables are merged or
one is split into two different tables, there should be no impact or
change on the user application. This is one of the most difficult
rules to apply.
Rule 10: Integrity Independence
• A database must be independent of the application that uses it.
All its integrity constraints can be independently modified
without the need of any change in the application. This rule
makes a database independent of the front-end application and
its interface.

Rule 11: Distribution Independence
• The end-user must not be able to see that the data is
distributed over various locations. Users should always
get the impression that the data is located at one site
only. This rule has been regarded as the foundation of
distributed database systems.
Rule 12: Non-Subversion Rule
• If a system has an interface that provides access to low-
level records, then the interface must not be able to
subvert the system and bypass security and integrity
constraints.
Normalization
• Normalization is a database design technique that reduces data
redundancy and eliminates undesirable characteristics like
Insertion, Update and Deletion Anomalies.
Types of Normal forms
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF (Boyce Codd Normal Form)
5. Fourth Normal Form
6. Fifth Normal Form
1st Normal Form (1NF)

• A table is in First Normal Form (1NF) if every attribute in it
is atomic.
• Here, atomicity means that a single cell cannot hold multiple
values. It must hold only a single-valued attribute.
• The First Normal Form disallows multi-valued attributes,
composite attributes, and their combinations.

Let's understand the First Normal Form with the help of an example.
Below is a students' record table that has information about
student roll number, student name, student course, and age of
the student.

In the students' record table, the course column holds two values
in a single cell. Thus it does not follow the First Normal Form.
Now, if you apply the First Normal Form to the previous table,
you get the table below as a result.

By applying the First Normal Form, you achieve atomicity:
every cell holds exactly one value.
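The conversion to 1NF can be sketched in a few lines. The student rows below are hypothetical and invented for illustration; only the column names follow the description above. The helper to_1nf is likewise an assumption, not a standard function.

```python
# Hypothetical rows: the course cell of the first student holds two values.
students = [
    {"rollno": 1, "name": "Asha", "course": "Java, Python", "age": 20},
    {"rollno": 2, "name": "Ravi", "course": "C++", "age": 21},
]


def to_1nf(rows, column, sep=","):
    """Split a multi-valued column so that every cell holds one value."""
    flat = []
    for row in rows:
        for value in row[column].split(sep):
            new_row = dict(row)              # copy the other columns unchanged
            new_row[column] = value.strip()  # one atomic value per row
            flat.append(new_row)
    return flat


flat = to_1nf(students, "course")
for row in flat:
    print(row)
```

After the split, the first student appears in two rows, one per course, so rollno alone no longer identifies a row. The key of the 1NF table becomes the combination of rollno and course.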
Before proceeding with the Second Normal Form, get familiar with
Candidate Key and Super Key.
Candidate Key
A candidate key is a minimal set of one or more columns that can
identify a record uniquely in a table. Any candidate key can be
chosen as the Primary Key.
Now, let's use an example to understand this better.
Super Key
A super key is a set of one or more columns that can identify a
record uniquely in a table. Every candidate key, and therefore the
Primary Key, is a super key, but a super key may also contain
extra columns.
Let's understand this with the help of an example.
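One way to make the two definitions concrete is to compute keys from functional dependencies. The sketch below is an added illustration under stated assumptions: it borrows the TEACHER relation from the 2NF example later in this unit, with the single dependency TEACHER_ID → TEACHER_AGE, and the helper names closure, is_super_key, and candidate_keys are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_super_key(attrs, attributes, fds):
    """A super key determines every attribute of the relation."""
    return closure(attrs, fds) == set(attributes)


def candidate_keys(attributes, fds):
    """Candidate keys are the minimal super keys (brute force, small schemas)."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            candidate = set(combo)
            # A superset of a key already found is a super key, but not minimal.
            if any(key <= candidate for key in keys):
                continue
            if is_super_key(candidate, attributes, fds):
                keys.append(candidate)
    return keys


attributes = {"TEACHER_ID", "SUBJECT", "TEACHER_AGE"}
fds = [({"TEACHER_ID"}, {"TEACHER_AGE"})]

keys = candidate_keys(attributes, fds)
print(keys)  # one candidate key: TEACHER_ID together with SUBJECT
print(is_super_key(attributes, attributes, fds))      # True, but not minimal
print(is_super_key({"TEACHER_ID"}, attributes, fds))  # False
```

The full attribute set is always a super key, while only the minimal combination {TEACHER_ID, SUBJECT} is a candidate key.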
Second Normal Form (2NF)

•For 2NF, the relation must be in 1NF.
•In the second normal form, all non-key attributes are fully
functionally dependent on the primary key.
Example: Let's assume a school stores the data of teachers and
the subjects they teach. In a school, a teacher can teach more than
one subject.
• TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30
25 Biology 30
47 English 35
83 Math 38
83 Computer 38

In the given table, the candidate key is {TEACHER_ID, SUBJECT}.
The non-prime attribute TEACHER_AGE is dependent on TEACHER_ID,
which is a proper subset of the candidate key. That's why it
violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into
two tables:
TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE
25 30
47 35
83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT
25 Chemistry
25 Biology
47 English
83 Math
83 Computer
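The decomposition above is a pair of projections of the TEACHER table. As a minimal sketch, the rows below are copied from the TEACHER table in this example, while the helper name project is an assumption made for illustration.

```python
def project(rows, columns):
    """Project rows onto columns and drop duplicate rows, keeping order."""
    result = []
    for row in rows:
        projected = tuple(row[c] for c in columns)
        if projected not in result:
            result.append(projected)
    return result


teacher = [
    {"TEACHER_ID": 25, "SUBJECT": "Chemistry", "TEACHER_AGE": 30},
    {"TEACHER_ID": 25, "SUBJECT": "Biology", "TEACHER_AGE": 30},
    {"TEACHER_ID": 47, "SUBJECT": "English", "TEACHER_AGE": 35},
    {"TEACHER_ID": 83, "SUBJECT": "Math", "TEACHER_AGE": 38},
    {"TEACHER_ID": 83, "SUBJECT": "Computer", "TEACHER_AGE": 38},
]

teacher_detail = project(teacher, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, ["TEACHER_ID", "SUBJECT"])

print(teacher_detail)   # [(25, 30), (47, 35), (83, 38)]
print(len(teacher_subject))  # 5
```

Each teacher's age is now stored once in TEACHER_DETAIL instead of once per subject, which removes the redundancy caused by the partial dependency.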
Third Normal Form (3NF)

•A relation is in 3NF if it is in 2NF and does not contain any
transitive dependency of a non-prime attribute on a candidate key.
•3NF is used to reduce data duplication. It is also used to
achieve data integrity.
•If a relation in 2NF has no transitive dependency for non-prime
attributes, then the relation is in third normal form.
A relation is in third normal form if it satisfies at least one of
the following conditions for every non-trivial functional dependency
X → Y:
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of some
candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID EMP_NAME EMP_ZIP EMP_STATE EMP_CITY
222 Harry 201010 UP Noida
333 Stephan 02228 US Boston
444 Lan 60007 US Chicago
555 Katharine 06389 UK Norwich
666 John 462007 MP Bhopal

Super keys in the table above:
{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends
on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) are therefore
transitively dependent on the super key (EMP_ID), which violates the rule
of third normal form.
That's why we need to move EMP_CITY and EMP_STATE to a new
EMPLOYEE_ZIP table, with EMP_ZIP as its primary key.
EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP
222 Harry 201010
333 Stephan 02228
444 Lan 60007
555 Katharine 06389
666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY
201010 UP Noida
02228 US Boston
60007 US Chicago
06389 UK Norwich
462007 MP Bhopal
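The two 3NF conditions can be tested mechanically once the dependencies are written down. In the sketch below, the dependency sets are read off the EMPLOYEE_DETAIL example (EMP_ID determines EMP_NAME and EMP_ZIP, and EMP_ZIP determines EMP_STATE and EMP_CITY). The helper names closure and third_nf_violations are hypothetical and added only for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def third_nf_violations(attributes, fds, candidate_keys):
    """Return the FDs X -> Y where X is no super key and Y is not all prime."""
    prime = set().union(*candidate_keys)
    violations = []
    for lhs, rhs in fds:
        dependents = rhs - lhs  # ignore the trivial part of the FD
        if not dependents:
            continue
        x_is_super_key = closure(lhs, fds) == set(attributes)
        if not x_is_super_key and not dependents <= prime:
            violations.append((lhs, rhs))
    return violations


# EMPLOYEE_DETAIL before decomposition.
detail_attrs = {"EMP_ID", "EMP_NAME", "EMP_ZIP", "EMP_STATE", "EMP_CITY"}
detail_fds = [
    ({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"}),
    ({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"}),
]
before = third_nf_violations(detail_attrs, detail_fds, [{"EMP_ID"}])
print(before)  # the EMP_ZIP dependency is reported

# The two tables after decomposition.
employee = third_nf_violations(
    {"EMP_ID", "EMP_NAME", "EMP_ZIP"},
    [({"EMP_ID"}, {"EMP_NAME", "EMP_ZIP"})],
    [{"EMP_ID"}],
)
employee_zip = third_nf_violations(
    {"EMP_ZIP", "EMP_STATE", "EMP_CITY"},
    [({"EMP_ZIP"}, {"EMP_STATE", "EMP_CITY"})],
    [{"EMP_ZIP"}],
)
print(employee, employee_zip)  # [] []
```

The check reports EMP_ZIP → {EMP_STATE, EMP_CITY} for the original table and nothing for the two decomposed tables, which matches the reasoning above.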
Boyce Codd Normal Form (BCNF)
Boyce Codd Normal Form is also known as 3.5NF. It is a stricter
version of 3NF and was developed by Raymond F. Boyce and
Edgar F. Codd to tackle certain types of anomalies that were
not resolved by 3NF.

The first condition for a table to be in Boyce Codd Normal
Form is that the table should be in the third normal form.
Secondly, the left-hand side of every non-trivial functional
dependency must be a super key of that particular table.
For example:
You have a functional dependency X → Y. For the table to be in
BCNF, X has to be a super key of the provided table.
Consider the subject table, with the columns student_id, subject,
and professor.

The subject table follows these conditions:
•Each student can enroll in multiple subjects.
•Multiple professors can teach a particular subject.
•For each subject, a professor is assigned to the student.

In the above table, student_id and subject together form the primary
key, because using student_id and subject you can determine all the
table columns.
Another important point to note here is that one professor
teaches only one subject, but one subject may have two professors.
This shows that there is a dependency between subject and professor,
i.e. subject depends on the professor's name.
This table satisfies all the normal forms except the Boyce Codd
Normal Form.
As you can see, student_id and subject form the primary key, which
means the subject attribute is a prime attribute.
However, there exists yet another dependency: professor → subject.
The table is not in BCNF because professor, the determinant of this
dependency, is a non-prime attribute and not a super key, while
subject is a prime attribute.
To transform the table into BCNF, you divide the table
into two parts.
The first table holds student_id, which already exists, and a newly
created column profid. The second table has the columns profid,
subject, and professor, which satisfies BCNF.
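The BCNF condition is stricter than the 3NF one because it has no exception for prime attributes. The following sketch tests it for the subject table. The two dependencies are taken from the discussion above, and the helper names closure and bcnf_violations are assumptions introduced for illustration.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return every non-trivial FD whose left-hand side is not a super key."""
    violations = []
    for lhs, rhs in fds:
        if rhs <= lhs:
            continue  # trivial dependency, always allowed
        if closure(lhs, fds) != set(attributes):
            violations.append((lhs, rhs))
    return violations


attributes = {"student_id", "subject", "professor"}
fds = [
    ({"student_id", "subject"}, {"professor"}),  # the primary key
    ({"professor"}, {"subject"}),                # one professor, one subject
]

violations = bcnf_violations(attributes, fds)
print(violations)  # only professor -> subject is reported
```

The primary-key dependency passes because its left-hand side determines every column. professor → subject fails because professor alone does not determine student_id.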
Multivalued Dependency
•A multivalued dependency occurs when two attributes in a table are
independent of each other but both depend on a third attribute.
•A multivalued dependency involves at least two attributes that
depend on a third attribute, which is why it always requires at
least three attributes.
Example: Suppose there is a bike manufacturing company that
produces two colors (white and black) of each model every year.
BIKE_MODEL MANUF_YEAR COLOR
M2001 2008 White
M2001 2008 Black
M3001 2013 White
M3001 2013 Black
M4006 2017 White
M4006 2017 Black
Here the columns COLOR and MANUF_YEAR are dependent on
BIKE_MODEL and independent of each other.

In this case, these two columns are said to be multivalued
dependent on BIKE_MODEL.

The representation of these dependencies is shown below:

1. BIKE_MODEL →→ MANUF_YEAR
2. BIKE_MODEL →→ COLOR

This can be read as "BIKE_MODEL multidetermines
MANUF_YEAR" and "BIKE_MODEL multidetermines
COLOR".
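A multivalued dependency X →→ Y can be tested on data: for each value of X, every Y value must appear together with every value of the remaining columns. The sketch below is an added illustration. The function name mvd_holds is hypothetical, and the rows follow the bike table under the assumption that its first two rows describe the same model, M2001. The extra row in the second check is invented.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Return True if the multivalued dependency x ->> y holds in rows."""
    columns = list(rows[0].keys())
    z = [c for c in columns if c not in x and c not in y]  # remaining columns
    groups = {}
    for row in rows:
        key = tuple(row[c] for c in x)
        pair = (tuple(row[c] for c in y), tuple(row[c] for c in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        y_values = {p[0] for p in pairs}
        z_values = {p[1] for p in pairs}
        # Independence: every y value is combined with every z value.
        if pairs != set(product(y_values, z_values)):
            return False
    return True


COLUMNS = ["BIKE_MODEL", "MANUF_YEAR", "COLOR"]
data = [
    ("M2001", 2008, "White"), ("M2001", 2008, "Black"),
    ("M3001", 2013, "White"), ("M3001", 2013, "Black"),
    ("M4006", 2017, "White"), ("M4006", 2017, "Black"),
]
bikes = [dict(zip(COLUMNS, row)) for row in data]

print(mvd_holds(bikes, ["BIKE_MODEL"], ["COLOR"]))  # True

# Invented counter-example: M3001 also built in 2014, but only in white.
broken = bikes + [dict(zip(COLUMNS, ("M3001", 2014, "White")))]
print(mvd_holds(broken, ["BIKE_MODEL"], ["COLOR"]))  # False
```

In the counter-example, the combination (Black, 2014) is missing for M3001, so color and year are no longer independent for that model.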
Fourth normal form (4NF)
•A relation is in 4NF if it is in Boyce Codd normal form and
has no non-trivial multi-valued dependency whose determinant is
not a super key.
•For a dependency A →→ B, if a single value of A is associated
with multiple values of B, independently of the other attributes,
then the dependency is a multi-valued dependency.
Example:
The given STUDENT table is in 3NF, but COURSE and
HOBBY are two independent entities. Hence, there is no relationship
between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two
hobbies, Dancing and Singing. So there is a multi-valued
dependency on STU_ID, which leads to unnecessary repetition of
data.
So to make the above table into 4NF, we can decompose it into
two tables:
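The decomposition can be sketched as two projections. The original STUDENT table is not reproduced here, so the rows below are an assumption reconstructed from the prose about the student with STU_ID 21. The table names STUDENT_COURSE and STUDENT_HOBBY are also illustrative.

```python
# Assumed rows for STU_ID 21: every course paired with every hobby.
student = [
    (21, "Computer", "Dancing"),
    (21, "Computer", "Singing"),
    (21, "Math", "Dancing"),
    (21, "Math", "Singing"),
]

# Project onto (STU_ID, COURSE) and (STU_ID, HOBBY), dropping duplicates.
student_course = sorted({(stu_id, course) for stu_id, course, _ in student})
student_hobby = sorted({(stu_id, hobby) for stu_id, _, hobby in student})

print(student_course)  # [(21, 'Computer'), (21, 'Math')]
print(student_hobby)   # [(21, 'Dancing'), (21, 'Singing')]

# Joining the two tables on STU_ID gives back the original rows.
rejoined = sorted(
    (stu_id, course, hobby)
    for stu_id, course in student_course
    for other_id, hobby in student_hobby
    if stu_id == other_id
)
print(rejoined == sorted(student))  # True
```

Four rows shrink to two rows in each table, and a new hobby for the student now needs one inserted row instead of one row per course.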
Relational Decomposition
•When a relation in the relational model is not in an appropriate
normal form, decomposition of the relation is required.
•In a database, decomposition breaks a table into multiple tables.
•If the relation is not decomposed properly, it may lead to
problems like loss of information.
•Decomposition is used to eliminate some of the problems of bad
design, like anomalies, inconsistencies, and redundancy.

Types of Decomposition

1. Lossless Decomposition
2. Lossy decomposition
Lossless Decomposition
•If no information is lost when a relation is decomposed,
then the decomposition is lossless.
•A lossless decomposition guarantees that joining the decomposed
relations yields exactly the relation that was decomposed.
•A decomposition is said to be lossless if the natural join of
all the decomposed relations gives the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing
The above relation is decomposed into two relations,
EMPLOYEE and DEPARTMENT.
EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai
33 Alina 25 Delhi
46 Stephan 30 Bangalore
52 Katherine 36 Mumbai
60 Jack 40 Noida

DEPARTMENT table

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales
438 33 Marketing
869 46 Finance
575 52 Production
678 60 Testing
Now, when these two relations are joined on the common column
"EMP_ID", the resultant relation will look like:
Employee ⋈ Department
EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME
22 Denim 28 Mumbai 827 Sales
33 Alina 25 Delhi 438 Marketing
46 Stephan 30 Bangalore 869 Finance
52 Katherine 36 Mumbai 575 Production
60 Jack 40 Noida 678 Testing

Hence, the decomposition is a lossless join decomposition.
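The same check can be carried out in code: project the original relation onto the two schemas, join the projections, and compare with the original. This is a minimal sketch using the rows of the EMPLOYEE_DEPARTMENT table. The helper names project, natural_join, and as_set are assumptions introduced for illustration.

```python
def project(rows, columns):
    """Project a list of dict rows onto columns, dropping duplicates."""
    result = []
    for row in rows:
        projected = {c: row[c] for c in columns}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Natural join of two lists of dict rows on their shared column names."""
    if not left or not right:
        return []
    common = [c for c in left[0] if c in right[0]]
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[c] == r_row[c] for c in common):
                joined.append({**l_row, **r_row})
    return joined


def as_set(rows):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


COLUMNS = ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY", "DEPT_ID", "DEPT_NAME"]
data = [
    (22, "Denim", 28, "Mumbai", 827, "Sales"),
    (33, "Alina", 25, "Delhi", 438, "Marketing"),
    (46, "Stephan", 30, "Bangalore", 869, "Finance"),
    (52, "Katherine", 36, "Mumbai", 575, "Production"),
    (60, "Jack", 40, "Noida", 678, "Testing"),
]
original = [dict(zip(COLUMNS, row)) for row in data]

employee = project(original, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"])
department = project(original, ["DEPT_ID", "EMP_ID", "DEPT_NAME"])

rejoined = natural_join(employee, department)
lossless = as_set(rejoined) == as_set(original)
print(lossless)  # True

# A lossy split for comparison: the only shared column is EMP_CITY.
by_city = natural_join(
    project(original, ["EMP_ID", "EMP_NAME", "EMP_AGE", "EMP_CITY"]),
    project(original, ["EMP_CITY", "DEPT_ID", "DEPT_NAME"]),
)
print(len(by_city))  # 7 rows: two spurious rows for Mumbai
```

Joining on EMP_ID is lossless because EMP_ID is a key of EMPLOYEE. Joining on EMP_CITY is lossy because two employees share Mumbai, which produces rows that were never in the original relation.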


Dependency Preserving

•It is an important constraint of the database.
•In dependency preservation, every dependency must be satisfied by
at least one decomposed table.
•If a relation R is decomposed into relations R1 and R2, then each
dependency of R must either be a part of R1 or R2 or be
derivable from the combination of the functional dependencies of R1
and R2.
•For example, suppose there is a relation R(A, B, C, D) with the
functional dependency set {A → BC}. The relation R is decomposed
into R1(ABC) and R2(AD), which is dependency preserving because
the FD A → BC is a part of relation R1(ABC).
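Dependency preservation can also be tested without enumerating every derived dependency. The sketch below uses a common textbook procedure: grow the closure of the left-hand side using only what each decomposed schema can see. The function names are assumptions, and the second decomposition is an invented counter-example.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds (list of (lhs, rhs) sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def closure_within(attrs, fds, schemas):
    """Closure of attrs using only dependencies that fit inside one schema."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for schema in schemas:
            # What this schema alone can derive from the attributes it shares.
            gained = closure(result & schema, fds) & schema
            if not gained <= result:
                result |= gained
                changed = True
    return result


def preserves_dependencies(fds, schemas):
    """True if every FD can be enforced on the decomposed schemas alone."""
    return all(rhs <= closure_within(lhs, fds, schemas) for lhs, rhs in fds)


# The example above: R(A, B, C, D), A -> BC, split into R1(ABC) and R2(AD).
fds = [({"A"}, {"B", "C"})]
preserved = preserves_dependencies(fds, [{"A", "B", "C"}, {"A", "D"}])
print(preserved)  # True

# Invented counter-example: A -> B, B -> C split into (A, B) and (A, C).
chain = [({"A"}, {"B"}), ({"B"}, {"C"})]
lost = preserves_dependencies(chain, [{"A", "B"}, {"A", "C"}])
print(lost)  # False: B -> C cannot be checked in either table
```

In the counter-example, B and C never appear together in one table, so B → C can only be verified by joining the tables first.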
Fifth normal form (5NF)
•A relation is in 5NF if it is in 4NF, does not contain any
non-trivial join dependency that is not implied by its candidate
keys, and joining is lossless.
•5NF is satisfied when all the tables are broken into as many tables
as possible in order to avoid redundancy.
•5NF is also known as Project-join normal form (PJ/NF).
In the above table, John takes both Computer and Math classes in
Semester 1, but he doesn't take the Math class in Semester 2.
In this case, a combination of all these fields is required to
identify valid data.

Suppose we add a new semester, Semester 3, but do not yet know
the subject or who will be taking that subject, so we leave
Lecturer and Subject as NULL. But all three columns together act
as the primary key, so we can't leave the other two columns blank.

So to make the above table into 5NF, we can decompose it into
three relations P1, P2 and P3.
Summary
Normal Form   Description
1NF    A relation is in 1NF if every attribute contains only atomic values.
2NF    A relation is in 2NF if it is in 1NF and all non-key
       attributes are fully functionally dependent on the primary key.
3NF    A relation is in 3NF if it is in 2NF and no transitive
       dependency exists.
BCNF   A stronger definition of 3NF is known as Boyce Codd normal
       form.
4NF    A relation is in 4NF if it is in Boyce Codd normal form
       and has no multi-valued dependency.
5NF    A relation is in 5NF if it is in 4NF, does not contain any join
       dependency, and joining is lossless.
Question
• What is functional dependency? Explain its types with examples.
• What do you mean by redundancy? How can it be avoided?
• What is multivalued dependency? Explain with an example.
• What is the lossless join property?
• What is a fully functional dependency?
• What is lossy decomposition?
• What are 1NF and 2NF? Explain with suitable examples.
• What are 3NF and BCNF (Boyce-Codd Normal Form)?
• Explain the fifth normal form in detail.
• State the need for normalization. Explain 1NF, 2NF and 3NF.
• List the properties of decomposition. Explain lossless join with an
example.
• Specify Codd's rules to be satisfied by an RDBMS.

• What is lossless decomposition? Suppose that we decompose the schema
R = (A, B, C, D, E) into (A, B, C) and (A, D, E). Show that this decomposition is a
lossless decomposition if the following set F of functional dependencies
holds: A → BC, CD → E, B → D, E → A.
• Given relation R(A, B, C, D, E) with dependencies AB → C, CD → E, DE → B,
is AB a candidate key? Is ABD a candidate key?
• List the desirable properties of decomposition. Explain lossless join with
an example.
• Define Boyce Codd Normal Form. How is it different from 3NF? Why is it
considered a stronger form of 3NF?
