normalisation

Uploaded by

Kashish Bhalla
Copyright
© © All Rights Reserved
Normalization

Features of good relational database design

• Simple model: The relational model is simple and easy to use compared with other data models.
• Flexible: The relational model is more flexible than other data models.
• Secure: The relational model is more secure than other data models.
• Data accuracy: Keys and constraints help keep data accurate in the relational model.
• Data integrity: The integrity of the data is maintained in the relational model.
• Operations can be applied easily: Operations are easier to perform in the relational model.
Data Normalization
• Normalization is the process of organizing the data in a database.
• It is used to minimize redundancy in a relation or set of relations, and to eliminate undesirable characteristics such as insertion, update, and deletion anomalies.
• Normalization divides a larger table into smaller tables and links them using relationships.
• The normal forms are used to reduce redundancy in database tables.
Advantages of Normalization
• Normalization helps to minimize data redundancy.
• Greater overall database organization.
• Data consistency within the database.
• Much more flexible database design.
• Enforces the concept of relational integrity.
Disadvantages of Normalization
• You cannot start building the database before knowing what the user needs.
• Performance can degrade when relations are normalized to higher normal forms such as 4NF and 5NF, because queries then need more joins.
• Normalizing relations of a higher degree is time-consuming and difficult.
• Careless decomposition may lead to a bad database design and serious problems.
Normal Forms
• First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell contains only a single (atomic) value, and each column has a unique name. 1NF removes repeating groups and simplifies queries.
• Second Normal Form (2NF): 2NF builds on 1NF by requiring that each non-key attribute depend on the whole primary key. There must be no partial functional dependency, that is, no non-key column may depend on only part of a composite key.
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that non-key attributes do not depend on other non-key attributes. There must be no transitive dependency: each non-key column depends directly on the key, not on another non-key column in the same table.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that requires every determinant in a table to be a candidate key. In other words, every non-trivial functional dependency in the table has a candidate key (or a superkey) on its left-hand side.

Normalization Process:
1. Identify all candidate keys of the given relation by computing the closure of its attribute sets.
2. Check the relation for 2NF. If it does not satisfy 2NF, decompose the table so that the resulting relations are in 2NF.
3. If the relation satisfies 2NF, check it for 3NF. If it does not satisfy 3NF, decompose the table so that the resulting relations are in 3NF.
First Normal Form
• If a relation contains a composite or multi-valued attribute, it violates first normal form. Equivalently, a relation is in first normal form if it does not contain any composite or multi-valued attribute.
• A relation is in first normal form if every attribute in that relation is a single-valued attribute.
Example 1: Relation STUDENT in table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its decomposition into 1NF is shown in table 2.
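
The 1NF conversion described above can be sketched in a few lines of Python. This is only an illustration: the rows, names and phone numbers below are invented placeholders (the slide's tables 1 and 2 are not reproduced here), and `to_1nf` is a hypothetical helper name.

```python
# Hypothetical STUDENT rows; the names and phone numbers are made up.
# STUD_PHONE holds a list, i.e. it is a multi-valued attribute.
student = [
    {"STUD_NO": 1, "STUD_NAME": "RAM", "STUD_PHONE": ["555-0101", "555-0102"]},
    {"STUD_NO": 2, "STUD_NAME": "SURESH", "STUD_PHONE": ["555-0103"]},
]


def to_1nf(rows, multi_valued_attr):
    """Return one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_valued_attr]:
            new_row = dict(row)              # copy the single-valued attributes
            new_row[multi_valued_attr] = value
            flat.append(new_row)
    return flat


student_1nf = to_1nf(student, "STUD_PHONE")
for row in student_1nf:
    print(row)
```

After the conversion every cell holds exactly one value, at the cost of repeating STUD_NO and STUD_NAME for each phone number.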
Second Normal Form
• To be in second normal form, a relation must be in first normal form and must not contain any partial dependency. A relation is in 2NF if no non-prime attribute (an attribute that is not part of any candidate key) depends on a proper subset of any candidate key of the table.
• Partial dependency: if a proper subset of a candidate key determines a non-prime attribute, the dependency is called a partial dependency.
Example 1: Consider table 3.
• Note that many courses have the same course fee.
• COURSE_FEE alone cannot decide the value of COURSE_NO or STUD_NO.
• COURSE_FEE together with STUD_NO cannot decide the value of COURSE_NO.
• COURSE_FEE together with COURSE_NO cannot decide the value of STUD_NO.
• COURSE_FEE is therefore a non-prime attribute, as it does not belong to the only candidate key {STUD_NO, COURSE_NO}.
• But COURSE_NO -> COURSE_FEE, that is, COURSE_FEE depends on COURSE_NO, which is a proper subset of the candidate key. This is a partial dependency, so the relation is not in 2NF.
• To convert the above relation to 2NF, we split the table into two tables: Table 1 (STUD_NO, COURSE_NO) and Table 2 (COURSE_NO, COURSE_FEE).

Table 1
STUD_NO   COURSE_NO
1         C1
2         C2
1         C4
4         C3
4         C1

Table 2
COURSE_NO   COURSE_FEE
C1          1000
C2          1500
C3          1000
C4          2000
C5          2000

NOTE: 2NF tries to reduce the redundant data stored in memory. For instance, if 100 students take course C1, we do not need to store its fee of 1000 in all 100 records. Instead, we store it once in the second table as the course fee for C1.
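
As a rough illustration of why this split loses nothing, the sketch below joins Table 1 and Table 2 back together on COURSE_NO. The data is copied from the two tables above; the function name `join_on_course` is just a label chosen for this sketch.

```python
# Table 1: (STUD_NO, COURSE_NO)
enrolment = [(1, "C1"), (2, "C2"), (1, "C4"), (4, "C3"), (4, "C1")]

# Table 2: COURSE_NO -> COURSE_FEE, each fee stored exactly once
course_fee = {"C1": 1000, "C2": 1500, "C3": 1000, "C4": 2000, "C5": 2000}


def join_on_course(enrolment, course_fee):
    """Natural join of Table 1 and Table 2 on COURSE_NO."""
    return [(stud, course, course_fee[course]) for stud, course in enrolment]


joined = join_on_course(enrolment, course_fee)
for row in joined:
    print(row)
```

The fee for C1 appears twice in the joined result but only once in Table 2, which is the redundancy that 2NF removes.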
Types of Functional dependencies in DBMS
• In relational database management, a functional dependency is a constraint between two sets of attributes in which one set of attributes determines the value of the other.
• It is denoted as X → Y, where the attribute set on the left side of the arrow, X, is called the determinant, and Y is called the dependent.
Example:
roll_no   name   dept_name   dept_building
42        abc    CO          A4
43        pqr    IT          A3
44        xyz    CO          A4
45        xyz    IT          A3
46        mno    EC          B2
47        jkl    ME          B2


• roll_no → {name, dept_name, dept_building}
Here, roll_no determines the values of the fields name, dept_name and dept_building, hence this is a valid functional dependency.
• roll_no → dept_name
Since roll_no determines the whole set {name, dept_name, dept_building}, it also determines its subset dept_name.
• dept_name → dept_building
dept_name identifies dept_building accurately, since rows with the same dept_name always have the same dept_building.
• More valid functional dependencies: roll_no → name, {roll_no, name} → {dept_name, dept_building}, etc.
Here are some invalid functional dependencies:
• name → dept_name
Students with the same name can have different dept_name values, hence this is not a valid functional dependency.
• dept_building → dept_name
There can be multiple departments in the same building. For example, in the above table the departments ME and EC are in the same building B2, hence dept_building → dept_name is an invalid functional dependency.
• More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building → roll_no, etc.
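
The valid and invalid examples above can be checked mechanically against the sample table. The following is a minimal sketch: `fd_holds` is a hypothetical helper that tests whether a dependency is contradicted by the rows it is given.

```python
# The six sample rows from the table above.
rows = [
    {"roll_no": 42, "name": "abc", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 43, "name": "pqr", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 44, "name": "xyz", "dept_name": "CO", "dept_building": "A4"},
    {"roll_no": 45, "name": "xyz", "dept_name": "IT", "dept_building": "A3"},
    {"roll_no": 46, "name": "mno", "dept_name": "EC", "dept_building": "B2"},
    {"roll_no": 47, "name": "jkl", "dept_name": "ME", "dept_building": "B2"},
]


def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, value) != value:
            return False
    return True


print(fd_holds(rows, ["roll_no"], ["name", "dept_name", "dept_building"]))
print(fd_holds(rows, ["dept_name"], ["dept_building"]))
print(fd_holds(rows, ["name"], ["dept_name"]))
print(fd_holds(rows, ["dept_building"], ["dept_name"]))
```

A check like this can only refute a dependency. Passing on six sample rows does not prove that the dependency holds for every possible state of the table: {name, dept_name} → roll_no, for instance, happens to pass on these rows even though two students in the same department could share a name.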
Types of Functional Dependencies in DBMS
• Trivial functional dependency
• Non-Trivial functional dependency
• Multivalued functional dependency
• Transitive functional dependency
Trivial Functional Dependency
• In a trivial functional dependency, the dependent is always a subset of the determinant.
• That is, if X → Y and Y is a subset of X, then X → Y is called a trivial functional dependency.
• Example:
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of the determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of a trivial functional dependency.
Non-trivial Functional Dependency
• In a non-trivial functional dependency, the dependent is not a subset of the determinant.
• That is, if X → Y and Y is not a subset of X, then X → Y is called a non-trivial functional dependency.
• Example:
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.
Multivalued Functional Dependency
• In a multivalued functional dependency, the attributes of the dependent set are not dependent on each other.
• That is, if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.
• For example,
roll_no   name   age
42        abc    17
43        pqr    18
44        xyz    18
45        abc    19

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and age are not dependent on each other (i.e. neither name → age nor age → name holds). Note that this is a different notion from the multivalued dependency X ->-> Y on which 4NF is defined.
Transitive Functional Dependency
• In a transitive functional dependency, the dependent is indirectly dependent on the determinant.
• That is, if a → b and b → c, then according to the axiom of transitivity, a → c. This is a transitive functional dependency.
• For example,
enrol_no   name   dept   building_no
42         abc    CO     4
43         pqr    EC     2
44         xyz    IT     1
45         abc    EC     2

Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity, enrol_no → building_no is a valid functional dependency. This is an indirect functional dependency, hence it is called a transitive functional dependency.
Armstrong’s axioms/properties of functional dependencies:
• Reflexivity: If Y is a subset of X, then X → Y holds by the reflexivity rule.
Example: {roll_no, name} → name is valid.
• Augmentation: If X → Y is a valid dependency, then XZ → YZ is also valid by the augmentation rule.
Example: {roll_no, name} → dept_building is valid, hence {roll_no, name, dept_name} → {dept_building, dept_name} is also valid.
• Transitivity: If X → Y and Y → Z are both valid dependencies, then X → Z is also valid by the transitivity rule.
Example: roll_no → dept_name and dept_name → dept_building, so roll_no → dept_building is also valid.
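
Applying these rules repeatedly yields the closure of an attribute set, that is, everything the set determines. The sketch below uses the standard fixed-point algorithm for this. The function name `closure` and the encoding of each dependency as a `(lhs, rhs)` pair of tuples are choices made for this illustration, and the FD list is taken from the roll_no example.

```python
def closure(attrs, fds):
    """Attribute closure: all attributes determined by attrs under fds."""
    result = set(attrs)                      # reflexivity: attrs determine themselves
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # if the left side is already determined, so is the right side
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Dependencies from the roll_no example, each written as (lhs, rhs).
fds = [
    (("roll_no",), ("name",)),
    (("roll_no",), ("dept_name",)),
    (("dept_name",), ("dept_building",)),
]

print(sorted(closure(["roll_no"], fds)))
print(sorted(closure(["dept_name"], fds)))
print(sorted(closure(["name"], fds)))
```

roll_no reaches dept_building only through dept_name, which is the transitivity rule at work; name determines nothing beyond itself.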
Fully Functional Dependency
• In a full functional dependency, an attribute or a set of attributes uniquely determines another attribute or set of attributes, and no proper subset of the determinant does so.
• If a relation R has attributes X, Y, Z with the dependencies X -> Y and X -> Z, then those dependencies are fully functional, because the single attribute X has no proper subset that could determine Y or Z.
Partial Functional Dependency
• In a partial functional dependency, a non-key attribute depends on a part of the composite key rather than on the whole key. If a relation R has attributes X, Y, Z, where {X, Y} is the composite key and Z is a non-key attribute, then X -> Z is a partial functional dependency.
Third Normal Form (3NF)
• A relation is in third normal form if it is in second normal form and has no transitive dependency for non-prime attributes. A relation is in 3NF if at least one of the following conditions holds for every non-trivial functional dependency X -> Y:
1. X is a super key.
2. Y is a prime attribute (each element of Y is part of some candidate key).
• In other words, a relation that is in first and second normal form, and in which no non-primary-key attribute is transitively dependent on the primary key, is in third normal form (3NF).
• Note: If A -> B and B -> C are two FDs, then A -> C is called a transitive dependency.
• The normalization of 2NF relations to 3NF involves the removal of transitive dependencies.
• If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation along with a copy of the determinant.
Example
• FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO -> STUD_AGE}
• Candidate key: {STUD_NO}
• For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE -> STUD_COUNTRY are true, so STUD_COUNTRY is transitively dependent on STUD_NO. This violates third normal form.
• To convert it to third normal form, we decompose the relation STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE) as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_AGE)
STATE_COUNTRY (STATE, COUNTRY)
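
The two 3NF conditions (X is a super key, or Y is prime) can be tested for each dependency in the FD set of this example. The sketch below does so under one assumption: STUD_PHONE is left out of the attribute list because the FD set does not mention it. The helper names `closure` and `violates_3nf` are illustrative.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(attributes, fds, candidate_keys):
    """Return the dependencies X -> Y where X is not a super key and Y is not prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        new_attrs = set(rhs) - set(lhs)      # ignore the trivial part of the FD
        if not new_attrs:
            continue
        is_superkey = closure(lhs, fds) >= set(attributes)
        if not is_superkey and not new_attrs <= prime:
            bad.append((lhs, rhs))
    return bad


# Assumption: STUD_PHONE is omitted, since the FD set does not cover it.
attributes = ["STUD_NO", "STUD_NAME", "STUD_STATE", "STUD_COUNTRY", "STUD_AGE"]
fds = [
    (("STUD_NO",), ("STUD_NAME",)),
    (("STUD_NO",), ("STUD_STATE",)),
    (("STUD_STATE",), ("STUD_COUNTRY",)),
    (("STUD_NO",), ("STUD_AGE",)),
]
violations = violates_3nf(attributes, fds, [{"STUD_NO"}])
print(violations)
```

Only STUD_STATE -> STUD_COUNTRY is reported, which is exactly the dependency that the decomposition moves into its own relation.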
Example 2:
• Consider relation R(A, B, C, D, E) with the functional dependencies
• A -> BC,
• CD -> E,
• B -> D,
• E -> A
• The candidate keys of this relation are {A, E, CD, BC}. B alone is not a key, because its closure is only {B, D}. Every attribute on the right-hand side of every functional dependency is prime, so the relation is in 3NF.
• Note:
• Third Normal Form (3NF) is considered adequate for normal relational database design because most 3NF tables are free of insertion, update, and deletion anomalies. Moreover, a relation can always be decomposed into 3NF relations in a way that is both lossless and dependency-preserving.
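
The candidate keys of Example 2 can be enumerated by brute force: try attribute sets in order of size and keep those whose closure is the whole relation and that contain no smaller key. This is a sketch only; it is exponential in the number of attributes, and `closure` and `candidate_keys` are names chosen for the illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Minimal attribute sets whose closure covers every attribute."""
    attrs = sorted(attributes)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            candidate = set(combo)
            if any(key <= candidate for key in keys):
                continue                     # contains a smaller key, so not minimal
            if closure(candidate, fds) == set(attrs):
                keys.append(candidate)
    return keys


# R(A, B, C, D, E) with A -> BC, CD -> E, B -> D, E -> A
fds = [("A", "BC"), ("CD", "E"), ("B", "D"), ("E", "A")]
keys = candidate_keys("ABCDE", fds)
print(["".join(sorted(k)) for k in keys])    # ['A', 'E', 'BC', 'CD']
```

For this FD set the search yields A, E, BC and CD. B on its own only determines D, so it needs C alongside it to form a key.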
Boyce-Codd Normal Form (BCNF)
• Application of the general definitions of 2NF and 3NF may identify additional redundancy caused by dependencies that violate one or more candidate keys.
• However, despite these additional constraints, dependencies can still exist that will cause redundancy to be present in 3NF relations.
• This weakness in 3NF resulted in the presentation of a stronger normal form called the Boyce-Codd Normal Form (Codd, 1974).
Boyce–Codd Normal Form (BCNF)
• Although 3NF is an adequate normal form for relational databases, it may not remove 100% of the redundancy, because a functional dependency X -> Y can hold where X is not a candidate key of the given relation. This can be solved by Boyce-Codd Normal Form (BCNF).
• BCNF is based on functional dependencies that take into account all candidate keys in a relation; however, BCNF also has additional constraints compared with the general definition of 3NF.
Rules for BCNF
• Rule 1: The table should be in 3rd normal form.
• Rule 2: X should be a superkey for every non-trivial functional dependency (FD) X -> Y in the given relation.
Examples
• Here we discuss some basic examples that illustrate the properties of BCNF.
Stu_ID   Stu_Branch                                Stu_Course             Branch_Number   Stu_Course_No
101      Computer Science & Engineering            DBMS                   B_001           201
101      Computer Science & Engineering            Computer Networks      B_001           202
102      Electronics & Communication Engineering   VLSI Technology        B_003           401
102      Electronics & Communication Engineering   Mobile Communication   B_003           402

The functional dependencies of the above table are:
• Stu_ID -> Stu_Branch
• Stu_Course -> {Branch_Number, Stu_Course_No}
• The candidate key of the above table is {Stu_ID, Stu_Course}.
Why this Table is Not in BCNF?
• The table above is not in BCNF because neither Stu_ID nor Stu_Course alone is a super key, yet each of them is the determinant of a functional dependency.
• As the rules above state, for a table to be in BCNF, X must be a super key in every functional dependency X -> Y. This property fails here, which is why the table is not in BCNF.
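
The argument above amounts to computing the closure of each determinant and checking whether it covers the whole table. A minimal sketch, using the two dependencies listed for this table (the helper names are illustrative):

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial dependencies whose determinant is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(attributes)
    ]


attributes = ["Stu_ID", "Stu_Branch", "Stu_Course", "Branch_Number", "Stu_Course_No"]
fds = [
    (("Stu_ID",), ("Stu_Branch",)),
    (("Stu_Course",), ("Branch_Number", "Stu_Course_No")),
]

violations = bcnf_violations(attributes, fds)
for lhs, rhs in violations:
    print(lhs, "->", rhs, "violates BCNF")

# The composite key does determine every attribute.
print(closure(("Stu_ID", "Stu_Course"), fds) == set(attributes))
```

Both dependencies are reported, because Stu_ID and Stu_Course each determine only part of the table, while the pair {Stu_ID, Stu_Course} determines all of it.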
How to Satisfy BCNF?
• To bring this table into BCNF, we decompose it into further tables. Here is the full procedure.
• We divide the main table into three tables: Stu_Branch, Stu_Course, and Stu_ID to Stu_Course_No.
Stu_Branch Table
• Candidate key for this table: Stu_ID.
Stu_ID   Stu_Branch
101      Computer Science & Engineering
102      Electronics & Communication Engineering

Stu_Course Table
• Candidate key for this table: Stu_Course.
Stu_Course             Branch_Number   Stu_Course_No
DBMS                   B_001           201
Computer Networks      B_001           202
VLSI Technology        B_003           401
Mobile Communication   B_003           402

Stu_ID to Stu_Course_No Table
• Candidate key for this table: {Stu_ID, Stu_Course_No}.
Stu_ID   Stu_Course_No
101      201
101      202
102      401
102      402

After this decomposition, every table is in BCNF, because in each functional dependency X -> Y that holds in a table, X is a super key of that table.
Example -2
Find the highest normal form of a relation R(A, B, C, D, E) with FD set { BC->D, AC->BE, B->E }.
Explanation:
• Step-1: (AC)+ = {A, C, B, E, D}, but no proper subset of AC determines all attributes of the relation, so AC is a candidate key. Neither A nor C can be derived from any other attribute of the relation, so {AC} is the only candidate key.
• Step-2: Prime attributes are those that are part of a candidate key, {A, C} in this example; the others, {B, D, E}, are non-prime.
• Step-3: The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.
• The relation is in 2nd normal form because none of its FDs is a partial dependency: in BC->D, BC is not a proper subset of the candidate key AC; in AC->BE, AC is the candidate key itself; and in B->E, B is not a proper subset of AC.
• The relation is not in 3rd normal form because in BC->D neither is BC a super key nor is D a prime attribute, and in B->E neither is B a super key nor is E a prime attribute. To satisfy 3rd normal form, either the LHS of an FD must be a super key or the RHS must be a prime attribute. So the highest normal form of the relation is 2nd normal form.
• Note: A prime attribute cannot be transitively dependent on a key in a BCNF relation.
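
The steps of this example can be collected into one small routine. The sketch below assumes that the relation is already in 1NF and that the candidate keys are supplied by the caller. `highest_normal_form` is a hypothetical name, and the routine stops at BCNF (it does not look at 4NF or 5NF).

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def highest_normal_form(attributes, fds, candidate_keys):
    """Return '1NF', '2NF', '3NF' or 'BCNF' for a relation assumed to be in 1NF."""
    attrs = set(attributes)
    prime = set().union(*candidate_keys)
    non_prime = attrs - prime
    # 2NF: no proper subset of a candidate key may determine a non-prime attribute.
    for key in candidate_keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if (closure(part, fds) - set(part)) & non_prime:
                    return "1NF"
    # 3NF / BCNF: inspect every non-trivial FD whose left side is not a super key.
    in_bcnf = True
    for lhs, rhs in fds:
        new_attrs = set(rhs) - set(lhs)
        if not new_attrs or closure(lhs, fds) == attrs:
            continue
        in_bcnf = False
        if not new_attrs <= prime:
            return "2NF"                     # right side is non-prime: 3NF fails
    return "BCNF" if in_bcnf else "3NF"


# R(A, B, C, D, E) with BC -> D, AC -> BE, B -> E and candidate key AC
print(highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")], [{"A", "C"}]))
```

For the relation above the routine stops at 2NF, in agreement with the explanation: BC -> D has a non-super-key determinant and a non-prime dependent.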
Example
• R(A, B, C, D)
• FD {AB → CD, D → A}
• AB+ → ABCD (CK)
• CB+ → CB (not a CK)
• DB+ → DBAC (CK)
• ABD (super key)
• CK {AB, DB}
• Prime attributes → (A, B, D)
• Non-prime → C
• D → A: the right-hand side is a prime attribute (3NF)
• ABCD → R1(DA), R2(BCD)
• BD+ → BC
• BD+ → BDAC
4th and 5th Normal Form in DBMS
• Two of the highest levels of database normalization are the fourth normal form (4NF) and the fifth normal form (5NF).
• Multivalued dependencies are handled by 4NF, whereas join dependencies are handled by 5NF.
• A multivalued dependency arises when two or more independent relations are kept in a single relation. In other words, a multivalued dependency occurs when the presence of one or more rows in a table implies the presence of one or more other rows in that same table.
• Put another way, two attributes (or columns) in a table are independent of one another, but both depend on a third attribute.
• A multivalued dependency always requires at least three attributes, because it consists of at least two attributes that are dependent on a third.
Intro..
• For a dependency A -> B, if multiple values of B exist for a single value of A, then the table may have a multivalued dependency.
• For a multivalued dependency A ->-> B, the table should have at least 3 attributes (A, B and C), and B and C should be independent of each other.
Person   Mobile      Food_Likes
Mahesh   9893/9424   Burger/Pizza
Ramesh   9191        Pizza

Person ->-> Mobile,
Person ->-> Food_Likes
This is read as “person multi determines mobile” and “person multi determines food_likes.”
Note that a functional dependency is a special case of a multivalued dependency. In a functional dependency X -> Y, every x determines exactly one y, never more than one.
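
To make the definition concrete, the sketch below tests a multivalued dependency on the Person table, using the usual tuple-swapping definition: X ->-> Y holds if, for any two rows that agree on X, the row combining the Y values of the first with the remaining values of the second is also present. It assumes the table has first been flattened to one value per cell, so Mahesh contributes four rows. `mvd_holds` is an illustrative name.

```python
# Person table flattened to one value per cell (every mobile paired with every food).
attrs = ["Person", "Mobile", "Food_Likes"]
rows = {
    ("Mahesh", "9893", "Burger"),
    ("Mahesh", "9893", "Pizza"),
    ("Mahesh", "9424", "Burger"),
    ("Mahesh", "9424", "Pizza"),
    ("Ramesh", "9191", "Pizza"),
}


def mvd_holds(rows, attrs, x, y):
    """Check X ->-> Y on the given rows by the tuple-swapping definition."""
    index = {a: i for i, a in enumerate(attrs)}
    z = [a for a in attrs if a not in x and a not in y]   # the remaining attributes

    def proj(row, names):
        return tuple(row[index[a]] for a in names)

    present = {(proj(r, x), proj(r, y), proj(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if proj(t1, x) == proj(t2, x):
                # Y values of t1 combined with Z values of t2 must also be a row
                if (proj(t1, x), proj(t1, y), proj(t2, z)) not in present:
                    return False
    return True


print(mvd_holds(rows, attrs, ["Person"], ["Mobile"]))

# Dropping one combination breaks the independence of Mobile and Food_Likes.
incomplete = rows - {("Mahesh", "9424", "Pizza")}
print(mvd_holds(incomplete, attrs, ["Person"], ["Mobile"]))
```

The first check succeeds because every mobile number of Mahesh is paired with every food he likes; removing a single pairing makes the second check fail.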
Fourth Normal Form (4NF)
• The Fourth Normal Form (4NF) is a level of database normalization in which every non-trivial multivalued dependency has a candidate key (more generally, a superkey) as its determinant.
• It builds on the first three normal forms (1NF, 2NF, and 3NF) and the Boyce-Codd Normal Form (BCNF).
• It states that, in addition to meeting the requirements of BCNF, a relation must not contain more than one independent multivalued dependency.
Properties
A relation R is in 4NF if and only if the following conditions are satisfied:
1. It should be in the Boyce-Codd Normal Form (BCNF).
2. The table should not have any non-trivial multivalued dependency whose determinant is not a superkey.
• A table with such a multivalued dependency violates the normalization standard of the Fourth Normal Form (4NF) because it creates unnecessary redundancies and can contribute to inconsistent data.
• To bring such a table up to 4NF, it is necessary to break the information into two tables.
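
A small sketch of that split, applied to the Person / Mobile / Food_Likes table from the earlier slide (flattened to one value per cell, which is an assumption of this illustration). Projecting onto two tables and joining them back on Person should reproduce the original rows.

```python
# Flattened Person table: one value per cell.
person_table = {
    ("Mahesh", "9893", "Burger"),
    ("Mahesh", "9893", "Pizza"),
    ("Mahesh", "9424", "Burger"),
    ("Mahesh", "9424", "Pizza"),
    ("Ramesh", "9191", "Pizza"),
}

# 4NF decomposition: one table per independent multivalued fact.
person_mobile = {(person, mobile) for person, mobile, _ in person_table}
person_food = {(person, food) for person, _, food in person_table}

# Natural join of the two projections on Person.
rejoined = {
    (person, mobile, food)
    for person, mobile in person_mobile
    for other, food in person_food
    if person == other
}

print(sorted(person_mobile))
print(sorted(person_food))
print(rejoined == person_table)
```

The five original rows shrink to three rows in each projection, and the join returns exactly the original table, so the split is lossless.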
Example:
• Consider the database of a class that has two relations: R1 contains student ID (SID) and student name (SNAME), and R2 contains course ID (CID) and course name (CNAME).
Table R1
SID   SNAME
S1    A
S2    B

Table R2
CID   CNAME
C1    C
C2    D

Table R1 X R2
SID   SNAME   CID   CNAME
S1    A       C1    C
S1    A       C2    D
S2    B       C1    C
S2    B       C2    D

Multivalued dependencies (MVD) are:
• SID->->CID;
• SID->->CNAME;
• SNAME->->CNAME
Join Dependency
• A join dependency is a further generalization of multivalued dependencies. If the join of R1 and R2 over C is equal to relation R, then we say that a join dependency (JD) exists, where R1(A, B, C) and R2(C, D) are the decomposition of a given relation R(A, B, C, D).
• Alternatively, R1 and R2 are a lossless decomposition of R. A JD ⋈ {R1, R2, …, Rn} is said to hold over a relation R if R1, R2, …, Rn is a lossless-join decomposition.
• *((A, B, C), (C, D)) is a JD of R if the join over the common attribute is equal to the relation R. Here, *(R1, R2, R3) indicates that the relations R1, R2, R3 and so on are a JD of R. Let R be a relation schema and R1, R2, R3, …, Rn be a decomposition of R. r(R) is said to satisfy the join dependency if and only if the natural join of its projections onto R1, R2, …, Rn is equal to r:
π_R1(r) ⋈ π_R2(r) ⋈ … ⋈ π_Rn(r) = r

Join Dependency
Example:
Table R1
Company   Product
C1        Pendrive
C1        mic
C2        speaker
C2        speaker
Company ->-> Product

Table R2
Agent   Company
Aman    C1
Aman    C2
Mohan   C1
Agent ->-> Company

Table R3
Agent   Product
Aman    Pendrive
Aman    Mic
Aman    speaker
Mohan   speaker
Agent ->-> Product

Table R1⋈R2⋈R3
Company   Product    Agent
C1        Pendrive   Aman
C1        mic        Aman
C2        speaker    Aman
C1        speaker    Aman
Fifth Normal Form/Project-Join Normal Form (5NF)
• A relation R is in Fifth Normal Form if and only if every join dependency in R is implied by the candidate keys of R.
• A relation decomposed into two relations must have the lossless join property, which ensures that no spurious or extra tuples are generated when the relations are reunited through a natural join.
Properties
A relation R is in 5NF if and only if it satisfies the following conditions:
• 1. R should already be in 4NF.
• 2. It cannot be further decomposed without loss (join dependency).
• Example: Consider the above schema, with the constraint “if a company makes a product and an agent is an agent for that company, then he always sells that product for the company”. Under these circumstances, the ACP table is shown as:

Table ACP
Agent   Company   Product
A1      PQR       Nut
A1      PQR       Bolt
A1      XYZ       Nut
A1      XYZ       Bolt
A2      PQR       Nut

The relation ACP is again decomposed into 3 relations. Now, the natural join of all three relations will be shown as:
Decomposition
Table R1
Agent   Company
A1      PQR
A1      XYZ
A2      PQR

Table R2
Agent   Product
A1      Nut
A1      Bolt
A2      Nut

Table R3
Company   Product
PQR       Nut
PQR       Bolt
XYZ       Nut
XYZ       Bolt

The result of the natural join of R1 and R3 over ‘Company’, followed by the natural join of R13 and R2 over ‘Agent’ and ‘Product’, will be table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP is a lossless join decomposition. Therefore, the relation is in 5NF as it does not violate the property of lossless join.
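
The lossless-join claim for ACP can be verified directly: project the table onto the three pairs of columns and join the projections back. The sketch below uses the ACP rows from the example; the set comprehensions are just one simple way to write the projections and the three-way natural join.

```python
# Table ACP: (Agent, Company, Product)
acp = {
    ("A1", "PQR", "Nut"),
    ("A1", "PQR", "Bolt"),
    ("A1", "XYZ", "Nut"),
    ("A1", "XYZ", "Bolt"),
    ("A2", "PQR", "Nut"),
}

# Projections onto the three two-column tables.
r1 = {(agent, company) for agent, company, _ in acp}       # Agent, Company
r2 = {(agent, product) for agent, _, product in acp}       # Agent, Product
r3 = {(company, product) for _, company, product in acp}   # Company, Product

# Natural join of R1, R2 and R3: agents must match between R1 and R2,
# and the resulting (company, product) pair must appear in R3.
rejoined = {
    (agent, company, product)
    for agent, company in r1
    for other_agent, product in r2
    if agent == other_agent and (company, product) in r3
}

print(len(r1), len(r2), len(r3))
print(rejoined == acp)
```

The join returns the five original rows and no spurious ones, which is what the join dependency on (R1, R2, R3) asserts for this table.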
Thank you!!
