Normalization-DBMS
Normalization is the process of organizing data in a database. This includes creating tables and establishing
relationships between those tables according to rules designed both to protect the data and to make the
database more flexible by eliminating redundancy and inconsistent dependency.
The inventor of the relational model, Edgar Codd, proposed the theory of data normalization with the
introduction of the First Normal Form, and he continued to extend the theory with the Second and Third Normal
Forms. Later, he joined Raymond F. Boyce to develop the theory of the Boyce-Codd Normal Form.
Objectives of Normalization
1. It removes duplicate data and database anomalies from relational tables.
2. It reduces redundancy and complexity.
3. It divides large database tables into smaller tables and links them using relationships.
4. It ensures that a table contains no duplicate data or repeating groups.
Data redundancy occurs in a relational database when two or more rows or columns hold the same or repeated
values, leading to unnecessary use of storage.
Student Table:
There are two students in the above table, 'James' and 'Ritchie Rich', whose records repeat every time a new
CourseID is entered. As a result, the studRegistration, StudName and address attributes are repeated as well.
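Because the Student table itself is not reproduced here, the following Python sketch uses hypothetical rows (only the names 'James' and 'Ritchie Rich' come from the text; registrations, addresses and course IDs are invented) to show one way of measuring the redundancy: it counts how many extra copies of each student's studRegistration, StudName and address are stored because every course needs its own row.

```python
from collections import Counter

# Hypothetical rows of the unnormalized Student table:
# (studRegistration, StudName, address, CourseID).
# Only the two names come from the text; every other value is invented.
student_rows = [
    ("R1", "James", "Address-1", "C101"),
    ("R1", "James", "Address-1", "C102"),
    ("R2", "Ritchie Rich", "Address-2", "C101"),
    ("R2", "Ritchie Rich", "Address-2", "C103"),
    ("R2", "Ritchie Rich", "Address-2", "C104"),
]


def redundant_copies(rows):
    """Map each student name to the number of extra copies of their
    (studRegistration, StudName, address) facts stored in the table."""
    counts = Counter(row[:3] for row in rows)
    # Every occurrence after the first repeats facts already stored once.
    return {facts[1]: n - 1 for facts, n in counts.items() if n > 1}


print(redundant_copies(student_rows))  # {'James': 1, 'Ritchie Rich': 2}
```

Storing the student facts once in their own table and keeping only (studRegistration, CourseID) pairs elsewhere would bring both counts to zero.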
Types of Anomalies
1. Insert Anomaly: An insert anomaly occurs when some attributes or data items cannot be inserted into the
database without the existence of other attributes. For example, in the Student table, if we want to insert a new
CourseID, we must wait until a student enrolls in that course. This makes it difficult to insert new records into
the table, which is why it is called an insertion anomaly.
2. Update Anomaly: An update anomaly occurs when duplicate data is updated in only one place and not in all of
its instances, which leaves the table in an inconsistent state. For example, suppose the student 'James' belongs
to the Student table. If we update his course in the Student table, we must make the same update in the Course
table; otherwise, the data becomes inconsistent, with some rows reflecting the updated values and others still
holding the old ones.
3. Delete Anomaly: A delete anomaly occurs when some data is unintentionally lost because other records are
deleted from the table. For example, if we remove Trent Bolt from the Student table, his address, course and
other details are removed along with him. In other words, deleting some data can remove other, unrelated data
from the table.
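The update and delete anomalies can be reproduced on a small unnormalized table. This Python sketch assumes hypothetical rows (only the student names come from the text; addresses and course IDs are invented), and `inconsistent_students` is a hypothetical helper name used for illustration.

```python
# Hypothetical unnormalized Student rows; addresses and course IDs are invented.
rows = [
    {"name": "James", "address": "Address-1", "course": "C101"},
    {"name": "James", "address": "Address-1", "course": "C102"},
    {"name": "Trent Bolt", "address": "Address-3", "course": "C105"},
]


def inconsistent_students(table):
    """Return the names stored with more than one address (symptom of an update anomaly)."""
    addresses = {}
    for row in table:
        addresses.setdefault(row["name"], set()).add(row["address"])
    return sorted(name for name, found in addresses.items() if len(found) > 1)


# Update anomaly: James's address is changed in only one of his two rows.
rows[0]["address"] = "Address-9"
print(inconsistent_students(rows))  # ['James']

# Delete anomaly: removing Trent Bolt also removes the only record of course C105.
remaining = [row for row in rows if row["name"] != "Trent Bolt"]
print(sorted({row["course"] for row in remaining}))  # ['C101', 'C102']
```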
First Normal Form -
A relation is in first normal form (1NF) if every attribute in that relation is single-valued, that is, if it
does not contain any composite or multi-valued attribute. A relation that contains a composite or multi-valued
attribute violates first normal form.
Example 1 – Relation STUDENT in Table 1 is not in 1NF because of the multi-valued attribute STUD_PHONE. Its
decomposition into 1NF stores each phone number in a separate row, so that STUD_PHONE holds a single value per
tuple.
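The decomposition into 1NF can be sketched in Python by emitting one row per value of the multi-valued attribute. The STUDENT rows below are hypothetical (numbers, names and phone numbers are invented); only the attribute names STUDENT and STUD_PHONE come from the example, and STUD_NO and STUD_NAME are assumed column names.

```python
# Hypothetical STUDENT rows: STUD_PHONE holds a list, i.e. a multi-valued attribute.
# Student numbers, names and phone numbers are invented for illustration.
student = [
    {"STUD_NO": 1, "STUD_NAME": "Asha", "STUD_PHONE": ["555-0101", "555-0102"]},
    {"STUD_NO": 2, "STUD_NAME": "Ravi", "STUD_PHONE": ["555-0201"]},
]


def to_1nf(rows, multi_valued):
    """Return a new list with one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = dict(row)            # copy the single-valued attributes
            new_row[multi_valued] = value  # keep exactly one phone number
            flat.append(new_row)
    return flat


for row in to_1nf(student, "STUD_PHONE"):
    print(row)
```

After flattening, STUD_NO alone no longer identifies a row; in this sketch the combination (STUD_NO, STUD_PHONE) does.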
Second Normal Form -
To be in second normal form, a relation must be in first normal form and must not contain any partial
dependency. In other words, a relation is in 2NF if no non-prime attribute (an attribute that is not part of any
candidate key) depends on a proper subset of any candidate key of the table.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, the dependency is
called a partial dependency.
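A partial dependency can be detected mechanically with attribute closure: compute everything each proper subset of the candidate key determines and check whether a non-prime attribute appears. The Python sketch below assumes the original Customer relation had attributes Cust_ID, Store_ID and Location, with candidate key {Cust_ID, Store_ID} and the dependency Store_ID → Location (inferred from Tables 1 and 2 of the following example). It checks one candidate key; a full 2NF test would repeat it for every candidate key.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already derivable, add the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(candidate_key, non_prime, fds):
    """List (proper subset of the key, non-prime attribute) pairs where the subset
    determines the attribute, i.e. the partial dependencies that violate 2NF."""
    found = []
    key = sorted(candidate_key)
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            for attr in sorted(closure(set(subset), fds) & non_prime):
                found.append((subset, attr))
    return found


# Assumed dependencies of the Customer relation (Cust_ID, Store_ID, Location).
fds = [({"Store_ID"}, {"Location"})]
print(partial_dependencies({"Cust_ID", "Store_ID"}, {"Location"}, fds))
# [(('Store_ID',), 'Location')]
```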
Example-
Customer Table
Table 1
Cust_ID Store_ID
1 1
1 3
2 1
3 2
4 3
Table 2
Store_ID Location
1 Delhi
2 Bangalore
3 Mumbai
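Joining Table 1 and Table 2 on Store_ID recovers each customer's store location, while each location is now stored only once per store. In this Python sketch Table 2 is modeled as a dictionary keyed by Store_ID (its key); the combined (Cust_ID, Store_ID, Location) shape is an assumption about the original Customer table.

```python
# Table 1 (Cust_ID, Store_ID) and Table 2 (Store_ID -> Location) from the example.
table1 = [(1, 1), (1, 3), (2, 1), (3, 2), (4, 3)]
table2 = {1: "Delhi", 2: "Bangalore", 3: "Mumbai"}


def join_customer_location(customers, stores):
    """Natural join on Store_ID: rebuild (Cust_ID, Store_ID, Location) rows."""
    return [(cust, store, stores[store]) for cust, store in customers if store in stores]


for row in join_customer_location(table1, table2):
    print(row)
```

If a store's location changes, only one entry in Table 2 has to be updated, which is how the decomposition removes the update anomaly.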
Third Normal Form -
A relation is in Third Normal Form (3NF) if and only if it is in second normal form and there is no transitive
dependency for non-prime attributes.
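Transitive Dependency – If A → B and B → C hold, where B is not a superkey and C is a non-prime attribute, then
A → C is called a transitive dependency.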
Student Table
Table 1
Table 2
Exam_Id Exam_Name Total_Marks
Example-
Solution: Table 1
Table 2
Employee Employee
Zipcode City
110033 Model Town
110044 Badarpur
110028 Naraina
110064 Hari Nagar
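A 3NF violation can be found with the equivalent test "for every non-trivial dependency X → A, either X is a superkey or A is a prime attribute"; for a relation already in 2NF, the dependencies this reports are exactly the transitive ones. The Python sketch assumes a hypothetical Employee relation: Emp_ID and Emp_Name are invented attribute names, while Zipcode → City comes from the example table.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by `attrs`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attributes, fds, candidate_keys):
    """List (X, A) for dependencies X -> A where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys)
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attributes
        for attr in sorted(rhs - lhs):  # skip trivial parts of the dependency
            if not is_superkey and attr not in prime:
                bad.append((tuple(sorted(lhs)), attr))
    return bad


# Hypothetical Employee relation; Emp_ID and Emp_Name are assumed names.
attributes = {"Emp_ID", "Emp_Name", "Zipcode", "City"}
fds = [
    (frozenset({"Emp_ID"}), frozenset({"Emp_Name", "Zipcode"})),
    (frozenset({"Zipcode"}), frozenset({"City"})),
]
print(violations_3nf(attributes, fds, [{"Emp_ID"}]))  # [(('Zipcode',), 'City')]
```

Here Emp_ID → Zipcode and Zipcode → City give the transitive dependency Emp_ID → City, which is why City moves into a separate Zipcode/City table.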
Fourth Normal Form -
A relation is in 4NF if it is in Boyce-Codd Normal Form and has no multi-valued dependency.
For a dependency A →→ B, if multiple values of B exist for a single value of A, independently of the table's
other attributes, the relation has a multi-valued dependency.
Example
STUDENT
STU_ID COURSE HOBBY
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
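Because a student's courses and hobbies are independent of each other, a single table that records both facts consistently must pair every course of a student with every hobby of that student. This Python sketch joins the two 4NF tables on STU_ID to show the effect: student 21 needs 2 × 2 = 4 rows in the combined form, whereas the decomposition stores two rows in each table. Projecting the join back onto each pair of columns returns the original tables.

```python
# STUDENT_COURSE and STUDENT_HOBBY, as in the tables above.
student_course = [(21, "Computer"), (21, "Math"), (34, "Chemistry"), (74, "Biology"), (59, "Physics")]
student_hobby = [(21, "Dancing"), (21, "Singing"), (34, "Dancing"), (74, "Cricket"), (59, "Hockey")]


def join_on_student(courses, hobbies):
    """Natural join on STU_ID: every course of a student paired with every hobby of that student."""
    return sorted(
        (sid, course, hobby)
        for sid, course in courses
        for hid, hobby in hobbies
        if sid == hid
    )


combined = join_on_student(student_course, student_hobby)
print(len(combined))                              # 7
print([row for row in combined if row[0] == 21])  # the 4 rows for student 21
```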
Fifth Normal Form -
A relation is in 5NF if it is in 4NF, does not contain any join dependency, and every join is lossless.
5NF is satisfied when all tables have been broken into as many tables as possible in order to avoid redundancy.
5NF is also known as Project-Join Normal Form (PJ/NF).
Table 1
SEMESTER SUBJECT
Semester 1 Computer
Semester 1 Math
Semester 1 Chemistry
Semester 2 Math
Table 2
SUBJECT LECTURER
Computer Anshika
Computer John
Math John
Math Akash
Chemistry Praveen
Table 3
SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
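The role of all three tables can be checked by joining them. In this Python sketch, joining only Table 1 and Table 2 on SUBJECT produces 7 (SEMESTER, SUBJECT, LECTURER) rows, two of which pair a lecturer with a semester that Table 3 does not list; also requiring each (SEMESTER, LECTURER) pair to appear in Table 3 removes those spurious rows and leaves 5. The tables are modeled as Python sets, so repeated rows collapse automatically.

```python
# Tables 1, 2 and 3 from the example, as sets of tuples.
semester_subject = {("Semester 1", "Computer"), ("Semester 1", "Math"),
                    ("Semester 1", "Chemistry"), ("Semester 2", "Math")}
subject_lecturer = {("Computer", "Anshika"), ("Computer", "John"), ("Math", "John"),
                    ("Math", "Akash"), ("Chemistry", "Praveen")}
semester_lecturer = {("Semester 1", "Anshika"), ("Semester 1", "John"),
                     ("Semester 2", "Akash"), ("Semester 1", "Praveen")}


def join_two(sem_sub, sub_lec):
    """Join Table 1 and Table 2 on SUBJECT -> (SEMESTER, SUBJECT, LECTURER)."""
    return {(sem, sub, lec) for sem, sub in sem_sub for sub2, lec in sub_lec if sub == sub2}


def join_three(sem_sub, sub_lec, sem_lec):
    """Also join with Table 3: keep rows whose (SEMESTER, LECTURER) pair is in Table 3."""
    return {row for row in join_two(sem_sub, sub_lec) if (row[0], row[2]) in sem_lec}


two_way = join_two(semester_subject, subject_lecturer)
three_way = join_three(semester_subject, subject_lecturer, semester_lecturer)
print(len(two_way), len(three_way))  # 7 5
print(sorted(two_way - three_way))
# [('Semester 1', 'Math', 'Akash'), ('Semester 2', 'Math', 'John')]
```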
Domain Constraint
The values of an attribute must come from a defined set of values (its domain); for example, EmployeeID should
be four digits long.
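A domain constraint is a membership test on each value. This Python sketch assumes the EmployeeID domain is "exactly four ASCII digits" and stores the ID as a string; in a DBMS the same rule would usually be declared as a column type plus a CHECK constraint rather than in application code.

```python
import re

# Assumed domain for EmployeeID: exactly four decimal digits 0-9.
EMPLOYEE_ID_DOMAIN = re.compile(r"[0-9]{4}")


def in_domain(employee_id: str) -> bool:
    """True if the whole value matches the EmployeeID domain."""
    return EMPLOYEE_ID_DOMAIN.fullmatch(employee_id) is not None


print([in_domain(v) for v in ["1234", "123", "12a4", "12345"]])  # [True, False, False, False]
```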
Key Constraint
A candidate key is an attribute, or a combination of attributes, that uniquely identifies each record; it does
not allow duplicate values.
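A key constraint holds when no two records share the same value for the key attributes. This Python sketch checks that rule on hypothetical Employee rows (the IDs and names are invented); `duplicate_keys` is a hypothetical helper that returns the key values that occur more than once.

```python
def duplicate_keys(rows, key_columns):
    """Return key values (as tuples) that occur in more than one row."""
    seen, duplicates = set(), []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen and key not in duplicates:
            duplicates.append(key)
        seen.add(key)
    return duplicates


# Hypothetical Employee rows; EmployeeID 1001 appears twice, violating the key constraint.
employees = [
    {"EmployeeID": "1001", "Name": "A"},
    {"EmployeeID": "1002", "Name": "B"},
    {"EmployeeID": "1001", "Name": "C"},
]
print(duplicate_keys(employees, ["EmployeeID"]))  # [('1001',)]
```

Passing several column names checks a composite key, for example ["Cust_ID", "Store_ID"] for Table 1 of the 2NF example.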
Normal Forms
Normal Form Description
2NF A relation is in 2NF if it is in 1NF and all non-key attributes are fully
functionally dependent on the primary key.
5NF A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.