1ST Normalization - Mcan102


SUBJECT : MCAN-102

CHAPTER : NORMALIZATION
MODULE - 2
MCA 1ST SEM

Compiled By: SANDIP GHOSHAL


1. Database System Concepts, Korth

2. SQL, PL/SQL, Ivan Bayross (BPB Publications)



Normalization
What is data integrity?
➢ Data integrity is the overall accuracy, completeness, and
consistency of data.
➢ It is maintained by a collection of processes, rules, and
standards implemented during the design phase.
➢ When the integrity of data is secure, the information
stored in a database will remain complete, accurate, and
reliable no matter how long it’s stored or how often it’s
accessed.
➢ Data integrity also means that the data is protected from
unauthorized or accidental changes by outside forces.

Integrity Constraints
Integrity constraints are a set of rules used to maintain
the quality of information.
They ensure that insertion, updating, and other operations
are performed in such a way that data integrity is not
affected.
Thus, integrity constraints guard against accidental damage
to the database.

1. Domain constraints
A domain constraint defines the valid set of values for an
attribute.
Typical domain data types include string, character, integer,
time, date, and currency. Every value of the attribute must
belong to the corresponding domain.
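
A minimal sketch of a domain constraint, using Python's built-in sqlite3 module. The student table, its columns and the allowed section values are assumptions made for illustration; they do not come from the notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table: section may only take values from a fixed domain.
conn.execute(
    "CREATE TABLE student ("
    " rollno INTEGER PRIMARY KEY,"
    " section TEXT NOT NULL CHECK (section IN ('A', 'B'))"
    ")"
)
conn.execute("INSERT INTO student VALUES (1, 'A')")  # value inside the domain


def insert_rejected(rollno, section):
    """Return True if the database rejects the row as a constraint violation."""
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", (rollno, section))
        return False
    except sqlite3.IntegrityError:
        return True


rejected = insert_rejected(2, "Z")  # 'Z' is outside the declared domain
```

Here the CHECK clause plays the role of the domain: a value outside the listed set is refused at insert time.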

2. Entity integrity constraints
The entity integrity constraint states that a primary key
value can't be null.
This is because the primary key value is used to identify
individual rows in a relation; if the primary key were
null, we could not identify those rows.
Fields other than the primary key may contain null values.

3. Referential Integrity Constraints
A referential integrity constraint is specified between two
tables.
If a foreign key in Table 1 refers to the primary key of
Table 2, then every value of the foreign key in Table 1 must
either be null or match a primary key value present in
Table 2.
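
The rule can be tried out with a small sketch using Python's sqlite3 module. The dept and student tables and their columns are hypothetical names chosen for this example. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.execute("CREATE TABLE dept (dept_no INTEGER PRIMARY KEY, dept_name TEXT)")
conn.execute(
    "CREATE TABLE student ("
    " rollno INTEGER PRIMARY KEY,"
    " dept_no INTEGER REFERENCES dept(dept_no)"
    ")"
)
conn.execute("INSERT INTO dept VALUES (10, 'MCA')")
conn.execute("INSERT INTO student VALUES (1, 10)")    # matches an existing dept row
conn.execute("INSERT INTO student VALUES (2, NULL)")  # a null foreign key is allowed

try:
    conn.execute("INSERT INTO student VALUES (3, 99)")  # no dept 99 exists
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

The first two inserts satisfy the constraint (a matching value and a null); the third refers to a department that does not exist and is refused.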

4. Key constraints
A key is an attribute or set of attributes used to identify
an entity uniquely within its entity set.
An entity set can have multiple keys, one of which is chosen
as the primary key. A primary key value must be unique and
can't be null in the relational table.


What are Keys in DBMS?

A key in a DBMS is an attribute or set of attributes that
helps you identify a row (tuple) in a relation (table).
Keys also let you establish the relationship between two
tables.

Keys help you uniquely identify a row in a table by a
combination of one or more columns in that table.

Types of Keys in RDBMS:
There are mainly seven different types of keys in a
DBMS, and each key has its own function:

1. Super Key - a set of one or more attributes that
uniquely identifies rows in a table.
2. Primary Key - a column or group of columns in a
table that uniquely identifies every row in that table.
3. Candidate Key - a set of attributes that uniquely
identifies tuples in a table. A candidate key is a super key
with no redundant attributes [minimal super key].
4. Alternate Key - a candidate key that was not chosen as
the primary key. It still uniquely identifies every row in
the table.
5. Foreign Key - a column (or group of columns) that creates
a relationship between two tables. Foreign keys maintain
data integrity and allow navigation between two different
instances of an entity. A foreign key in one table points to
the primary key of another table.
6. Composite Key - a key with two or more attributes that
together uniquely identify a specific record. Each column
may not be unique by itself within the table.
7. Surrogate Key - an artificial key whose only purpose is to
uniquely identify each record. Surrogate keys are generated
by the system, typically when no natural primary key
exists.
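
The difference between super keys and candidate keys can be sketched in Python. The sample rows and the helper names is_superkey and candidate_keys are assumptions made for this illustration. A key is a property of the schema, so uniqueness in one sample of rows only suggests a key; it does not prove one.

```python
from itertools import combinations

# Hypothetical sample rows; rollno and email happen to be unique, name is not.
rows = [
    {"rollno": 1, "email": "ab@example.com", "name": "AB"},
    {"rollno": 2, "email": "cd@example.com", "name": "CD"},
    {"rollno": 3, "email": "ef@example.com", "name": "AB"},
]


def is_superkey(rows, attrs):
    """True if no two rows share the same values for attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, attributes):
    """Minimal super keys: no proper subset is itself a super key."""
    found = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            contains_smaller_key = any(set(k) <= set(combo) for k in found)
            if is_superkey(rows, combo) and not contains_smaller_key:
                found.append(combo)
    return found


keys = candidate_keys(rows, ["rollno", "email", "name"])
```

With this data, (rollno, name) is a super key but not a candidate key, because rollno alone is already unique. If rollno is chosen as the primary key, email becomes an alternate key.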
Types of Keys - Example:

[Figures: example tables marking the super key, primary key, alternate key and foreign key]

Problems caused by redundancy are:
insertion anomaly, deletion anomaly and update anomaly.

Redundancy means having multiple copies of the same data in
the database. This problem arises when a database is not
normalized. Suppose a table of student details has the
attributes: student id, student name, college name, college
rank, course opted.

In such a table the values of college name, college rank and
course are repeated across rows, which can lead to the
following problems.
1. Insertion Anomaly –
If the details of a student whose course has not been
decided yet have to be inserted, the insertion is not
possible until a course is decided for that student.

This problem happens when a data record cannot be inserted
without adding some additional, unrelated data to the
record.
2. Deletion Anomaly –

If the details of all students in this table are deleted,
the details of the college are deleted as well, even though
the college information should be kept.

This anomaly happens when deleting a data record also loses
unrelated information that was stored as part of the deleted
record.

3. Update Anomaly –
If the rank of the college changes, the change has to be
made in every row that stores it, which is time-consuming
and computationally costly.

If the update does not occur in all places, the database
will be in an inconsistent state.
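
A small Python sketch of the update anomaly. The rows, the college name XYZ and the helper inconsistent_colleges are made up for illustration.

```python
# Hypothetical un-normalized rows: the college rank is repeated per student.
students = [
    {"student_id": 1, "name": "AB", "college": "XYZ", "college_rank": 5},
    {"student_id": 2, "name": "CD", "college": "XYZ", "college_rank": 5},
]


def inconsistent_colleges(rows):
    """Return the colleges that appear with more than one rank."""
    ranks = {}
    for row in rows:
        ranks.setdefault(row["college"], set()).add(row["college_rank"])
    return {college for college, seen in ranks.items() if len(seen) > 1}


before = inconsistent_colleges(students)  # every copy agrees so far
students[0]["college_rank"] = 3           # the update reaches only one copy
after = inconsistent_colleges(students)   # the copies now disagree
```

If the rank were stored once, in a separate college table, a single update would be enough and no such disagreement could arise.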

Functional Dependency

A functional dependency (FD) is a constraint between two
sets of attributes in a relation.
A functional dependency says that if two tuples have the
same values for attributes A1, A2, ..., An, then those two
tuples must also have the same values for attributes
B1, B2, ..., Bm.
A functional dependency is represented by an arrow sign →,
that is, X → Y, where X functionally determines Y.
The left-hand side attributes determine the values of the
attributes on the right-hand side.
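
The definition above translates almost directly into a check over a set of rows. This is a sketch only: the function name fd_holds and the sample rows are assumptions. A check on sample data can refute an FD, but it cannot prove that the FD holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs (lhs -> rhs)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical sample rows.
rows = [
    {"rollno": 1, "name": "AB", "subject": "DBMS"},
    {"rollno": 1, "name": "AB", "subject": "C"},
    {"rollno": 2, "name": "CD", "subject": "DBMS"},
]

name_depends_on_rollno = fd_holds(rows, ["rollno"], ["name"])
subject_depends_on_rollno = fd_holds(rows, ["rollno"], ["subject"])
```

In this sample rollno → name holds, while rollno → subject fails because roll number 1 appears with two different subjects.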

Trivial Functional Dependency

Trivial − If an FD X → Y holds, where Y is a subset of
X, then it is called a trivial FD.
Non-trivial − If an FD X → Y holds, where Y is not a
subset of X, then it is called a non-trivial FD.
Completely non-trivial − If an FD X → Y holds, where
X ∩ Y = ∅ (X and Y share no attribute), it is said to be a
completely non-trivial FD.
Partially trivial − If an FD X → Y holds, where Y is
not a subset of X but X ∩ Y ≠ ∅, it is said to be a
partially trivial FD.
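
These categories depend only on how the two attribute sets overlap, so they can be sketched as a short Python function. The name classify_fd and the attribute names are illustrative.

```python
def classify_fd(lhs, rhs):
    """Classify X -> Y by comparing the attribute sets X and Y."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # Y is a subset of X
    if x & y:
        return "partially trivial"       # Y not a subset of X, but they overlap
    return "completely non-trivial"      # X and Y share no attribute


kinds = [
    classify_fd(["rollno", "name"], ["name"]),
    classify_fd(["rollno", "name"], ["name", "section"]),
    classify_fd(["rollno"], ["name"]),
]
```

The last two results are both non-trivial FDs; the function reports the finer sub-category.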

Normalization

If a database design is not sound, it may contain
anomalies, which are like a bad dream for any
database administrator. Managing a database with
anomalies is very difficult.
Normalization is a process through which we analyze a
given relational schema to reduce data redundancy and to
remove the following three anomalies.
▪ Update anomalies
▪ Deletion anomalies
▪ Insert anomalies
NORMALIZATION (Normal Form)

➢ 1NF → Remove repeating groups and non-atomic
attributes.

➢ 2NF → Remove partial dependencies.

➢ 3NF → Remove transitive dependencies.

➢ BCNF / 3.5NF → Every determinant must be a
candidate key.


➢ 4NF → Separate SVD (Single Valued Dependency)
and MVD (Multi Valued Dependency).

➢ 5NF → Remove JD (Join Dependency). Create a table
for each existing MVD. Also known as PJNF [Project-Join
Normal Form].

➢ 6NF → Stricter than 5NF: the relation satisfies no
non-trivial join dependency at all. Applied mainly to
temporal databases.

➢ DK/NF → Every constraint on the relation must follow
from its domain constraints and key constraints.

First Normal Form
❑ First Normal Form is built into the definition of a
relation (table) itself. The rule is that all the
attributes in a relation must have atomic domains.
The values in an atomic domain are indivisible units.

UNF → 1NF:
✓ Remove repeating groups, either by:
✓ Entering appropriate data in the empty columns of
rows, or
✓ Placing the repeating data, along with a copy of the
original key attribute, in a separate relation and
identifying a primary key for each of the new relations.
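
A sketch of the second option (moving the repeating group into its own relation) in Python. The unnormalized rows and the relation names student and student_subject are assumptions chosen for illustration.

```python
# Hypothetical unnormalized rows: "subjects" is a repeating group (non-atomic).
unf = [
    {"rollno": 1, "name": "AB", "subjects": ["DBMS", "C"]},
    {"rollno": 2, "name": "CD", "subjects": ["DBMS"]},
]

# Relation 1: the atomic attributes, with rollno as primary key.
student = [{"rollno": r["rollno"], "name": r["name"]} for r in unf]

# Relation 2: one row per repeated value, carrying a copy of the original key.
# Its primary key is the pair (rollno, subject).
student_subject = [
    {"rollno": r["rollno"], "subject": s}
    for r in unf
    for s in r["subjects"]
]
```

Every value in both resulting relations is atomic, and the original rows can be rebuilt by joining the two relations on rollno.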

Second Normal Form
❑ Prime attribute − An attribute that is part of a
candidate key (in the simplest case, the primary key) is
known as a prime attribute.
❑ Non-prime attribute − An attribute that is not part of
any candidate key is said to be a non-prime attribute.
❑ In second normal form, every non-prime attribute must be
fully functionally dependent on the key. That is, if
X → A holds for a key X and a non-prime attribute A, then
there should not be any proper subset Y of X for which
Y → A also holds.

1NF → 2NF:
✓ Remove partial dependencies:
✓ The partially dependent attributes are removed from
the relation by placing them in a new relation along with a
copy of their determinant.
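
Partial dependencies can be found mechanically with attribute closure. The sketch below assumes a relation with a single candidate key; the schema, the FDs and the function names closure and partial_dependencies are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, all_attrs, fds):
    """Non-prime attributes determined by a proper subset of the key."""
    non_prime = set(all_attrs) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            dependents = (closure(subset, fds) - set(subset)) & non_prime
            if dependents:
                found.append((subset, sorted(dependents)))
    return found


# Hypothetical relation R(rollno, subject, name, marks) with key (rollno, subject).
attrs = ["rollno", "subject", "name", "marks"]
fds = [
    (("rollno", "subject"), ("marks",)),
    (("rollno",), ("name",)),  # name depends on only part of the key
]
violations = partial_dependencies(("rollno", "subject"), attrs, fds)
```

The reported pair says that name depends on rollno alone, so under the 1NF → 2NF step it would move to a new relation (rollno, name).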
Third Normal Form
❑ For a relation to be in Third Normal Form, it must be in
Second Normal Form and satisfy the following –
❑ No non-prime attribute is transitively dependent on the
key. Equivalently, for every non-trivial functional
dependency X → A, either –
X is a super key, or
A is a prime attribute.

2NF → 3NF:
✓ Remove transitive dependencies:
✓ The transitively dependent attributes are removed from
the relation by placing them in a new relation along
with a copy of their determinant.
Boyce-Codd Normal Form [ 3.5 Normal Form ]

❑ Boyce-Codd Normal Form (BCNF) is a stricter version of
Third Normal Form.

BCNF states that –
❑ For any non-trivial functional dependency X → A, X
must be a super key.

Transformation to BCNF:
✓ Remove each violating functional dependency by placing
its attributes in a new relation.
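
The BCNF condition can be tested with attribute closure: X is a super key exactly when the closure of X contains every attribute of the relation. The following is a sketch; the schema, the FDs and the names closure and bcnf_violations are assumptions made for illustration.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(all_attrs, fds):
    """Non-trivial FDs whose left-hand side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and closure(lhs, fds) != set(all_attrs)
    ]


# Hypothetical relation R(rollno, dept_no, dept_name).
attrs = ["rollno", "dept_no", "dept_name"]
fds = [
    (("rollno",), ("dept_no",)),
    (("dept_no",), ("dept_name",)),  # dept_no is a determinant but not a key
]
violations = bcnf_violations(attrs, fds)
```

Placing dept_no and dept_name in their own relation, and keeping (rollno, dept_no) in the original one, removes the violation.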
Multi Valued Dependency [ MVD ]

• A multi valued dependency occurs when two attributes
in a table are independent of each other but both
depend on a third attribute.
• A multi valued dependency involves at least two
attributes that depend on a third attribute, which is
why it always requires at least three attributes.

Example: Suppose a bike manufacturer produces each model
in two colors (white and black) every year.

BIKE_MODEL   MANUF_YEAR   COLOR
M2001        2008         White
M2001        2008         Black
M3001        2013         White
M3001        2013         Black
M4006        2017         White
M4006        2017         Black

Here the columns COLOR and MANUF_YEAR depend on
BIKE_MODEL and are independent of each other.
In this case, these two columns are said to be multi
valued dependent on BIKE_MODEL. The dependencies are
written as:
BIKE_MODEL →→ MANUF_YEAR
BIKE_MODEL →→ COLOR
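
One way to test a multi valued dependency X →→ Y on sample rows is sketched below: within each group of rows sharing the same X value, every Y value must appear with every value of the remaining attributes. The function name mvd_holds, the column names and the rows are assumptions made for illustration.

```python
from itertools import product


def mvd_holds(rows, x, y, all_attrs):
    """True if x ->> y holds in rows; z is every attribute not in x or y."""
    z = [a for a in all_attrs if a not in x and a not in y]
    groups = {}
    for row in rows:
        xv = tuple(row[a] for a in x)
        yv = tuple(row[a] for a in y)
        zv = tuple(row[a] for a in z)
        ys, zs, pairs = groups.setdefault(xv, (set(), set(), set()))
        ys.add(yv)
        zs.add(zv)
        pairs.add((yv, zv))
    # Each group must contain the full cross product of its y and z values.
    return all(pairs == set(product(ys, zs)) for ys, zs, pairs in groups.values())


attrs = ["model", "year", "color"]
bikes = [
    {"model": "M3001", "year": 2013, "color": "White"},
    {"model": "M3001", "year": 2013, "color": "Black"},
    {"model": "M4006", "year": 2017, "color": "White"},
    {"model": "M4006", "year": 2017, "color": "Black"},
]
holds = mvd_holds(bikes, ["model"], ["color"], attrs)

# Adding a new year for M3001 in only one color breaks the independence.
broken = bikes + [{"model": "M3001", "year": 2014, "color": "White"}]
holds_after = mvd_holds(broken, ["model"], ["color"], attrs)
```

After the extra row, model M3001 has two years and two colors but only three of the four combinations, so the dependency no longer holds.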
Fourth normal form (4NF):
❑ Fourth normal form (4NF) is a level of database
normalization in which every non-trivial multi valued
dependency is determined by a super key (candidate key).
❑ It states that, in addition to meeting the requirements of
BCNF, a relation must not contain two or more independent
multi-valued facts about the same entity.

Properties – A relation R is in 4NF if and only if
the following conditions are satisfied:
✓ It is in Boyce-Codd Normal Form (BCNF).
✓ It has no non-trivial multi-valued dependency whose
determinant is not a super key.
Join Dependency

❑ A join dependency is a further generalization
of multi valued dependencies.

❑ If the join of R1 and R2 over C is equal to the
relation R, then we can say that a join
dependency (JD) exists.

❑ Here R1 (A, B, C) and R2 (C, D) are the
decompositions of a given relation R (A, B, C, D).

Fifth Normal Form / Project-Join Normal Form
(5NF/PJNF):
❑ A relation R is in 5NF if and only if every join
dependency in R is implied by the candidate keys of R.
❑ A relation decomposed into two relations must have the
lossless join property, which ensures that no spurious
or extra tuples are generated when the relations are
reunited through a natural join.
Properties – A relation R is in 5NF if and only if it
satisfies the following conditions:
✓ R is already in 4NF.
✓ It cannot be decomposed further without loss (it has no
join dependency beyond those implied by its candidate
keys).

A spurious tuple is a record that does not belong to the
original data but gets created when two tables are joined
badly.
In database terms, spurious tuples are created when
two tables are joined on attributes that are
neither primary keys nor foreign keys.

Types of Database

1. Non-temporal or time stamp
2. Temporal
a. Uni-temporal
b. Bi-temporal
c. Tri-temporal

Temporal database
A temporal database stores data relating to time instances. It
offers temporal data types and stores information relating to
past, present and future time. Temporal databases could
be uni-temporal, bi-temporal or tri-temporal.
More specifically, the temporal aspects usually include valid
time, transaction time and decision time:
• Valid time is the time period during which a fact is true in
the real world. It consists of VST [Valid Start Time] & VET
[Valid End Time].
• Transaction time is the time period during which a fact
stored in the database was known. It consists of TST
[Transaction Start Time] & TET [Transaction End Time].
• Decision time is the time period during which a fact stored
in the database was decided to be valid.
Uni-Temporal
❑ A uni-temporal database has one axis of time, either the
validity range or the system time range.

Bi-Temporal
A bi-temporal database has two axes of time.
❑ valid time. [VST & VET]
❑ transaction time or decision time. [TST & TET]

Tri-Temporal
A tri-temporal database has three axes of time.
❑ valid time.
❑ transaction time
❑ decision time.

Sixth Normal Form

A relation R [table] is in sixth normal form (abbreviated
6NF) if and only if it satisfies no nontrivial join
dependencies at all, where, as before, a join
dependency is trivial if and only if at least one of
the projections (possibly U_projections) involved
is taken over the set of all attributes of the relation R
[table] concerned.

Domain-key normal form (DK/NF)
❑ Domain-key normal form (DK/NF) is a normal
form used in database normalization which
requires that the database contains no
constraints other than domain constraints and
key constraints.
❑ A domain constraint specifies the permissible values for
a given attribute, while a key constraint specifies the
attributes that uniquely identify a row in a given table.
❑ The domain/key normal form is achieved when every
constraint on the relation is a logical consequence of the
definition of keys and domains, so that enforcing the key
and domain constraints causes all other constraints
to be met. Thus, it avoids all non-temporal anomalies.
Example of 1NF; 2NF; 3NF & BCNF (3.5 NF)
TABLE1
{ROLLNO, DEPT_NO, HOD, DEPT_NAME, S_NAME,
STATUS, SUB, SECTION}

FDs =>
{ROLLNO, DEPT_NO, HOD} → {DEPT_NAME, S_NAME,
STATUS, SUB, SECTION}

{DEPT_NO, HOD} → {DEPT_NAME, STATUS} ----- [Partial Dependency]

{S_NAME} → {SECTION} ----- [Transitive Dependency]

{DEPT_NAME} → {HOD}
[Assuming HOD & S_NAME are atomic]

1NF => Remove SUB as a repeating group / non-atomic
attribute.
TABLE2
{ROLLNO, DEPT_NO, HOD, DEPT_NAME, S_NAME,
STATUS, SECTION}
TABLE3
{ROLLNO, DEPT_NO, HOD, SUB}

2NF => Remove PARTIAL DEPENDENCY.
TABLE3
{ROLLNO, DEPT_NO, HOD, SUB}
TABLE4
{ROLLNO, DEPT_NO, HOD, S_NAME, SECTION}
TABLE5
{DEPT_NO, HOD, DEPT_NAME, STATUS}
3NF => Remove TRANSITIVE DEPENDENCY.
TABLE3
{ROLLNO, DEPT_NO, HOD, SUB}
TABLE5
{ DEPT_NO, HOD, DEPT_NAME, STATUS}
TABLE6
{ROLLNO, DEPT_NO, HOD, S_NAME}
TABLE7
{S_NAME, SECTION}
BCNF (3.5 NF) => Remove “Non Key Determinants”.
TABLE3
{ROLLNO, DEPT_NO, HOD, SUB}
TABLE6
{ROLLNO, DEPT_NO, HOD, S_NAME}
TABLE7
{S_NAME, SECTION}
TABLE8
{DEPT_NO, HOD, STATUS}
TABLE9
{DEPT_NAME, HOD}
Example of 4NF; 5NF; Spurious Tuples
TABLE1
ROLLNO   NAME   SUBJECT   MARKS
1        AB     DBMS      30
1        AB     C         29
2        CD     DBMS      29
1        AB     DBMS      29

FDs =>

{ROLLNO} → {NAME} ----- SVD

{ROLLNO} →→ {SUBJECT} ----- MVD
{ROLLNO} →→ {MARKS} ----- MVD
{SUBJECT} →→ {MARKS} ----- MVD
[Assuming NAME is atomic]

4NF => Separate SVD & MVD.

TABLE2 {ROLLNO, NAME}
ROLLNO   NAME
1        AB
2        CD

TABLE3 {ROLLNO, SUBJECT, MARKS}
ROLLNO   SUBJECT   MARKS
1        DBMS      30
1        C         29
2        DBMS      29
1        DBMS      29

5NF / PJNF => Remove JD. Separate all MVDs.

TABLE2 {ROLLNO, NAME}
ROLLNO   NAME
1        AB
2        CD

TABLE4 {ROLLNO, SUBJECT}
ROLLNO   SUBJECT
1        DBMS
1        C
2        DBMS

TABLE5 {ROLLNO, MARKS}
ROLLNO   MARKS
1        30
1        29
2        29

TABLE6 {SUBJECT, MARKS}
SUBJECT   MARKS
DBMS      30
C         29
DBMS      29
Example of a spurious tuple
❑ While normalizing into 5NF, if TABLE3 is decomposed into only
TABLE4 & TABLE5 [as those two tables cover all the attributes of
TABLE3], that is a bad decomposition (wrong normalization), which
may cause the problem of spurious tuples.
❑ After joining TABLE4 & TABLE5 we get the following set of data,
which is not the same as TABLE3.
TABLE4 joined with TABLE5
ROLLNO   SUBJECT   MARKS
1        DBMS      30
1        DBMS      29
1        C         30
1        C         29
2        DBMS      29
❑ The 3rd row is extra information, that is, a spurious tuple.
❑ But if we join TABLE4, TABLE5 & TABLE6, we get the same data as
TABLE3.
❑ So, spurious tuples may be caused by bad or wrong normalization.
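
The two joins described above can be reproduced with plain Python sets. This sketch uses the TABLE3 rows from the example; the variable names are chosen here for illustration.

```python
# TABLE3 {ROLLNO, SUBJECT, MARKS} from the example.
table3 = {(1, "DBMS", 30), (1, "C", 29), (2, "DBMS", 29), (1, "DBMS", 29)}

# Projections of TABLE3 used in the 5NF decomposition.
table4 = {(rollno, subject) for rollno, subject, marks in table3}
table5 = {(rollno, marks) for rollno, subject, marks in table3}
table6 = {(subject, marks) for rollno, subject, marks in table3}

# Natural join of TABLE4 and TABLE5 on ROLLNO.
join45 = {
    (rollno, subject, marks)
    for rollno, subject in table4
    for rollno2, marks in table5
    if rollno == rollno2
}
spurious = join45 - table3  # tuples that were never in TABLE3

# Joining with TABLE6 as well keeps only (SUBJECT, MARKS) pairs that exist.
join456 = {row for row in join45 if (row[1], row[2]) in table6}
```

Joining only TABLE4 and TABLE5 produces one tuple that was never in TABLE3, namely (1, C, 30). The three-way join gives back exactly the rows of TABLE3.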
Example of DK/NF
TABLE1
{ROLLNO,NAME, FEES_PAID,SECTION}

FDs =>

{ROLLNO} → {NAME, FEES_PAID, SECTION}
[Assuming NAME is atomic]

List of Implicit Constraint and Explicit Constraint / Domain Constraint

ROLLNO    → Primary Key
NAME      → Char(50)                         Implicit Constraint
FEES_PAID → {‘Yes’ or ‘No’}                  Explicit Constraint
SECTION   → {‘A’ or ‘B’ or ‘K1’ or ‘K2’}     Domain Constraint

DK/NF =>
Separate Implicit and Explicit Constraint.

TABLE2
{ROLLNO, NAME}

TABLE3
{ROLLNO, FEES_PAID, SECTION}



THANK YOU
