0% found this document useful (0 votes)
93 views39 pages

IS202 Data Management: Part 2: Normalization

This document discusses data normalization and provides examples to illustrate the concepts and process. It begins by motivating the need for normalization to avoid data anomalies. It then defines the various normal forms - first normal form (1NF), second normal form (2NF), and third normal form (3NF) - and explains how to decompose relations to eliminate functional dependencies and anomalies. Examples are provided to demonstrate how to normalize relations from the first normal form up through the third normal form. The document concludes with a summary of the key concepts and processes of data normalization.

Uploaded by

Stephen Ngo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
93 views39 pages

IS202 Data Management: Part 2: Normalization

This document discusses data normalization and provides examples to illustrate the concepts and process. It begins by motivating the need for normalization to avoid data anomalies. It then defines the various normal forms - first normal form (1NF), second normal form (2NF), and third normal form (3NF) - and explains how to decompose relations to eliminate functional dependencies and anomalies. Examples are provided to demonstrate how to normalize relations from the first normal form up through the third normal form. The document concludes with a summary of the key concepts and processes of data normalization.

Uploaded by

Stephen Ngo
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PPT, PDF, TXT or read online on Scribd
You are on page 1/ 39

IS202 Data Management

Chapter 4
Part 2: Normalization

Motivation for Data Normalization

Why an entity is mapped to a relation?


Why a many-to-many relationship has to
be mapped to a new relation? Why not use
one relation to capture all the information?

Motivation for Data Normalization

Question Is this a relation?

Question Whats the


primary key?

Anomalies in this Table


Insertion cant enter a new employee without
having the employee take a class
Deletion if we remove employee 140, we lose
information about the existence of a Tax Acc class
Modification giving a salary increase to
employee 100 forces us to update multiple
records
Why do these anomalies exist?

Well-Structured Relations
A relation that contains minimal data redundancy
and allows users to insert, delete, and update
rows without causing data inconsistencies
Goal is to avoid anomalies
Insertion Anomaly adding new rows forces user to
create duplicate data
Deletion Anomaly deleting rows may cause a loss of
data that would be needed for other future rows
Modification Anomaly changing data in a row forces
changes to other rows because of duplication

General rule of thumb: a table should not pertain to


more than one entity type
5

Data Normalization
A tool to validate and improve a logical design
so that it satisfies certain constraints that

avoid unnecessary duplication of


data

The problems of having duplication of data


Waste of space
Difficulty in consistency control
Any other?
The process of decomposing relations with
anomalies to produce smaller, wellstructured relations

Data Normalization
Data Normalization
Functional Dependency
1st Normal Form (1NF)
2nd Normal Form (2NF)
3rd Normal Form (3NF)

Figure 4.22 -Steps in normalization

First Normal Form


No multivalued attributes
Every attribute value is atomic

All relations are in 1st Normal Form

Figure 4-2 (a) Table with multivalues attributes

Fig. 4-2a is not a relation (multivalued


attributes).
It is not in 1st Normal Form (1NF).

10

Figure 4-2b Eliminating multivalued attributes


EMPLOYEE2 relation

Fig. 4-2b is in 1st Normal form

11

Second Normal Form


1NF plus every non-key attribute is fully
functionally dependent on the ENTIRE
primary key
Every non-key attribute must be defined by
the entire key, not by only part of the key
No partial functional dependencies

12

STUDENT

True of False (based on data in the table only)

The value of Name determines that of Nationality


The value of Name determines that of Email
The value of Email determines that of AveGPA
The value of Nationality determines that of Email
The value of ID determines that of Nationality

13

Functional Dependencies and Keys


Candidate Key:
An attribute (or a combination of multiple attributes) that
can uniquely identify a row.
One of the candidate keys is chosen as identifier (in ER
stage), which becomes the Primary Key (in LD stage)
E.g. for a student, both email address and student id are unique.
However, only one of them becomes the identifier (and
subsequently the primary key)

Non-key attributes: attributes that are not candidate keys

Functional Dependency: The value of one attribute,


or of combination of attributes, (the determinant)
determines the value of another attribute
Each non-key field is functionally dependent on every
candidate key
14

Student

True or False (assume each student has a unique email


address):

Name is a non-key attribute:


Email is a candidate key:
Name is a candidate key:
Name depends on ID:
Name depends on Email :
Email depends on AveGPA :
Nationality depends on Email :
AveGPA depends on Name :

15

Representing Functional
Dependencies
Student

Graphical representation: a link is issued from


the determinant pointing to the other attribute
For example: ID is the determinant, and Name is
functionally dependent on ID

Text representation:
ID -> Name, AveGPA, Nationality
Email -> Name, AveGPA, Nationality

16

Partial Functional Dependency

A function dependency in which one or


more non-key attributes are functionally
dependent on part (but not in all) of the
primary key is defined as a partial
functional dependency.

17

Fig 4.23(b) Functional Dependencies in EMPLOYEE2

Dependency on entire primary key

Dependencies on part of the key (partial functional dependencies)


EmpID Name, DeptName, Salary
CourseID Course_Title
EmpID, CourseID DateCompleted

Partial dependencies => NOT in 2nd Normal Form!!


18

Getting it into 2nd Normal Form


Decompose the relation into three separate
relations
Emp

EmpID
Emp_Course

Course

Name DeptName Salary

EmpID

CourseID

No partial
functional
dependencies

DateCompleted

CourseID CourseTitle

19

Third Normal Form


2NF PLUS no transitive
dependencies
A transitive dependency is a
functional dependency between two
(or more) non-key attributes.

20

Example -- Relation with transitive dependency


(a) Customer_Order Relation with simple data

21

Example -- Relation with transitive dependency


CUSTOMER_ORDER

BUT
Order_ID Order_Date
Order_ID Customer_ID
Order_ID Customer_Name
Order_ID Customer_Address

Customer_ID Customer_Name
Customer_ID Customer_Address

Transitive Dependency: Not 3rd NF

All this is OK
(2nd NF)

22

Order_ID Customer_ID
Customer_ID Customer_Name, Customer_Address
Now, there are no transitive dependencies
Both relations are in 3rd NF
23

Another Example
Fig 4-26 Invoice relation (1NF)
(Pine Valley Furniture Company)

24

2NF: Remove Partial Functional


Dependency

OrderID OrderDate, CustomerID, CustomerName, CustomerAddress


ProductID ProductDescription, ProductFinish, UnitPrice
OrderID, ProductID OrderQuantity

Therefore, NOT in 2nd Normal Form

25

Getting it into the Second Normal Form


Fig 4-28 Removing Partial Dependencies

No partial functional dependency, and all three


relations are in 2NF
26

3NF: Remove Transitive Dependency

CustomerID CustomerName, CustomerAddress

Therefore, CUSTOMER_ORDER relation is


NOT in 3rd Normal Form

Getting it into the Third Normal Form


Fig 4-29 Removing Transitive Dependencies

Transitive dependencies are removed.

28

In Class Exercise 5.7


PRODUCT
Product
_ID

Product_Description

Product_Finish

Standard_Price

Product_Line_Id

1 End Table

Cherry

$175.00

2 Coffer Table

Natural Ash

$200.00

3 Computer Desk

Natural Ash

$175.00

4 Entertainment Center

Walnut

$650.00

5 Writers Desk

Cherry

$325.00

6 8-Drawer Desk

White Ash

$750.00

7 Dining Table

Natural Ash

$800.00

Walnut

$500.00

Computer Desk

Find all the function dependencies.


Is this in 2NF? If not, get it into 2NF.
Is this in 3NF? If not, get it into 3NF.
29

Data Normalization Summary


1st Normal Form
no multivalued attributes, and every attribute value is
atomic
All relations are in 1st Normal Form

2nd Normal Form


1NF + every non-key attribute is fully functionally
dependent on the ENTIRE primary key
Decomposing the relation into two new relations

3rd Normal Form


2NF PLUS no transitive dependencies
Decomposing the relation into two new relations

30

Other Normal Forms


(from Appendix B)
Boyce-Codd NF
All determinants are candidate keysthere is no
determinant that is not a unique identifier

4th NF
No multivalued dependencies

5th NF
No lossless joins

Domain-key NF
The ultimate NFperfect elimination of all
possible anomalies

31

In Class Exercise 5.8


Transitive Dependency Removal
INVOICE

32

In Class Exercise 5.9

The structure and sample data are provided for following table. Break it into
relations in 3NF (assumption: Dept_Manager must be a Emp who has a unique
Emp_Code, and Emp_Educ is a multi-valued attribute).

33

In Class Exercise 5.10


Consider the following
relation definition and
sample data:
PROJECT (ProjectID,
EmployeeName,
EmployeeSalary)
ProjectID: name of a
work project
EmployeeName: name
of an employee
EmployeeSalary: salary
of the employee

PROJECT
ProjectID Employee
Name

Employee
Salary

100A

Jones

64K

100A

Smith

51K

100B

Smith

51K

200A

Jones

64K

200B

Jones

64K

200C

Parks

28K

200C

Smith

51K

200D

Parks

28K

34

In Class Exercise (contd)


Assuming that all of the functional dependencies
and constraints are apparent in this data, which
of the following statements is true?
ProjectID EmployeeName
ProjectID EmployeeSalary
(ProjectID, EmployeeName)
EmployeeSalary
EmployeeName EmployeeSalary
EmployeeSalary ProjectID
EmployeeSalary (ProjectID,
EmployeeName)
35

In Class Exercise (contd)


What is the key of PROJECT?
Are all non-key attributes (if any)
dependent on all of the key?
In what normal form is PROJECT?
Describe one modification anomaly from
which PROJECT suffers.
Is ProjectID a determinant?
Is EmployeeName a determinant?

36

In Class Exercise (contd)


Is (ProjectID, EmployeeName) a
determinant?
Is EmployeeSalary a determinant?
Does this relation contain a partial
dependency? If so, what is it?
Redesign this relation to eliminate the
modification anomalies.

37

What we have learned


Key concepts:
Functional dependency
Partial functional dependency
Transitive dependency

Data normalization
1NF: must be a relation
2NF: 1NF + no partial functional dependency
3NF: 2NF + no transitive dependency

38

What you need to do


Review Chapter 4 (part 2)
Concepts

Recommendation

Introduction to Normalization

Normalization Example: Pine Valley


Furniture Company

A Final Step for Defining Relational


Keys

: must read, : good for you to read

39

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy