
AD Chap3

Chapter 3 discusses functional dependencies and normalization in relational databases, emphasizing the importance of designing schemas that minimize redundancy and update anomalies. It introduces key concepts such as functional dependencies, the lossless join property, and various inference rules for deriving additional dependencies. The chapter also provides algorithms for determining attribute closures and identifying keys within a relation.

Chapter 3: Functional Dependencies and Normalization for Relational Databases
Course: Advanced Database Systems

Outline
1. Semantics of the Relation Attributes
2. Functional Dependencies
3. Normal Forms

1. Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
• Attributes of different entities (EMPLOYEE, DEPARTMENT, PROJECT) should not be mixed in the same relation
• Only foreign keys should be used to refer to other entities
• Entity and relationship attributes should be kept apart as much as possible.
Bottom Line: Design a schema that can be explained easily relation by relation. The semantics of attributes should be easy to interpret.

Figure 3.1 A simplified COMPANY relational database schema

Redundant Information in Tuples and Update Anomalies
• Mixing attributes of multiple entities may cause problems
• Information is stored redundantly, wasting storage
• Problems with update anomalies:
  - Insertion anomalies
  - Deletion anomalies
  - Modification anomalies

EXAMPLE OF AN UPDATE ANOMALY (1)
Consider the relation:
EMP_PROJ (Emp#, Proj#, Ename, Pname, No_hours)
• Update Anomaly: Changing the name of project number P1 from “Billing” to “Customer-Accounting” forces this update to be made in the tuples of all 100 employees working on project P1.

EXAMPLE OF AN UPDATE ANOMALY (2)
• Insert Anomaly: Cannot insert a project unless an employee is assigned to it. Conversely, cannot insert an employee unless he or she is assigned to a project.
• Delete Anomaly: When a project is deleted, all the employees who work on that project are deleted as well. Alternatively, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.

Figure 3.2 Two relation schemas suffering from update anomalies

Guideline to Redundant Information in Tuples and Update Anomalies
• GUIDELINE 2: Design a schema that does not suffer from insertion, deletion, and update anomalies. If any are present, note them so that applications can be made to take them into account.

Null Values in Tuples
GUIDELINE 3: Relations should be designed such that their tuples will have as few NULL values as possible
• Attributes that are NULL frequently could be placed in separate relations (with the primary key)
• Reasons for nulls:
  - attribute not applicable or invalid
  - attribute value unknown (may exist)
  - value known to exist, but unavailable

Spurious Tuples
• Bad designs for a relational database may result in erroneous results for certain JOIN operations
• The "lossless join" property is used to guarantee meaningful results for join operations
GUIDELINE 4: The relations should be designed to satisfy the lossless join condition. No spurious tuples should be generated by doing a natural-join of any relations.

Spurious Tuples (2)
There are two important properties of decompositions:
(a) non-additivity (losslessness) of the corresponding join
(b) preservation of the functional dependencies.
Note that property (a) is extremely important and cannot be sacrificed. Property (b) is less stringent and may be sacrificed.

2. Functional Dependencies
• Functional dependencies (FDs) are used to specify formal measures of the "goodness" of relational designs
• FDs and keys are used to define normal forms for relations
• FDs are constraints that are derived from the meaning and interrelationships of the data attributes
• A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies (cont.)
• X → Y holds if whenever two tuples have the same value for X, they must have the same value for Y.
• For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y]
• X → Y in R specifies a constraint on all relation instances r(R)
• Written as X → Y; can be displayed graphically on a relation schema as in the figures (denoted by the arrow).
• FDs are derived from the real-world constraints on the attributes

Examples of FD constraints
• Social security number determines employee name:
  SSN → ENAME
• Project number determines project name and location:
  PNUMBER → {PNAME, PLOCATION}
• Employee SSN and project number determine the hours per week that the employee works on the project:
  {SSN, PNUMBER} → HOURS

Examples of FD constraints (cont.)
• An FD is a property of the attributes in the schema R
• The constraint must hold on every relation instance r(R)
• If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K])

Inference Rules for FDs
• Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold
Armstrong's inference rules:
IR1. (Reflexive) If Y ⊆ X, then X → Y
IR2. (Augmentation) If X → Y, then XZ → YZ
(Notation: XZ stands for X ∪ Z)
IR3. (Transitive) If X → Y and Y → Z, then X → Z
• IR1, IR2, IR3 form a sound and complete set of inference rules

Example
• Consider relation r below. Relation r satisfies the FD: A → B
r(A  B  C  D)
  a1 b1 c1 d1
  a2 b2 c1 d1
  a1 b1 c1 d2
  a3 b3 c3 d3
And hence, the FDs: AB → B, AC → B, AD → B, ABC → B, ABD → B, ACD → B and ABCD → B.

AB → B (due to IR1)
AC → A (IR1) and A → B ⇒ AC → B (due to transitive rule)
AD → BD (IR2) and BD → B ⇒ AD → B (due to transitive rule)
ABC → B, ABD → B and ABCD → B (due to IR1).
ACD → AD and AD → B ⇒ ACD → B (due to transitive rule)

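The definition "if t1[X] = t2[X], then t1[Y] = t2[Y]" can be checked mechanically against a relation instance. The following is a minimal sketch, not part of the original slides; the function name `satisfies` and the dictionary-per-tuple encoding are assumptions made for illustration. Note that an instance can only refute an FD: an FD is a constraint on every instance of the schema, so passing this check on one instance does not establish it.

```python
def satisfies(rows, lhs, rhs):
    """Return True if no two rows agree on `lhs` but differ on `rhs`."""
    seen = {}  # maps each observed X-value to the Y-value it must determine
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # setdefault stores y the first time x is seen, then returns the stored value
        if seen.setdefault(x, y) != y:
            return False
    return True

# Relation r(A, B, C, D) from the example slide.
r = [
    {"A": "a1", "B": "b1", "C": "c1", "D": "d1"},
    {"A": "a2", "B": "b2", "C": "c1", "D": "d1"},
    {"A": "a1", "B": "b1", "C": "c1", "D": "d2"},
    {"A": "a3", "B": "b3", "C": "c3", "D": "d3"},
]

print(satisfies(r, "A", "B"))    # True
print(satisfies(r, "ACD", "B"))  # True
print(satisfies(r, "A", "D"))    # False: a1 appears with both d1 and d2
```

Attribute sets are written as strings of single-letter names here only because the example uses one-letter attributes; a tuple of names would be needed for longer attribute names.
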
Example
• Consider relation r below. Relation r satisfies the FDs: A → B and B → C
r(A  B  C  D)
  a1 b1 c2 d1
  a2 b2 c1 d2
  a3 b1 c2 d1
  a4 b1 c2 d3
Relation r therefore also satisfies A → C by transitivity.

Additional Inference Rules for FDs
Some additional inference rules that are useful:
(Decomposition) If X → YZ, then X → Y and X → Z
(Union) If X → Y and X → Z, then X → YZ
(Pseudotransitivity) If X → Y and WY → Z, then WX → Z

Each follows from IR1, IR2 and IR3:
Decomposition: X → YZ, YZ → Y ⇒ X → Y; X → YZ, YZ → Z ⇒ X → Z
Union: X → XY, XY → YZ ⇒ X → YZ
Pseudotransitivity: WX → WY, WY → Z ⇒ WX → Z

• The last three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property)
Note: decomposition = projective; union = additive

Example
• Consider relation r below. Relation r satisfies the FDs: A → B, A → C
r(A  B  C  D)
  a1 b1 c1 d1
  a2 b2 c1 d1
  a1 b1 c1 d2
  a3 b3 c3 d3
and hence, relation r also satisfies A → BC by the additivity rule.
• Consider relation r above. Relation r satisfies the FD: A → BC. And hence, relation r also satisfies A → B and A → C by the projectivity rule.

Closure
• Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
• Closure of a set of attributes X with respect to F is the set X+ of all attributes that are functionally determined by X
• X+ can be calculated by repeatedly applying IR1, IR2, IR3 using the FDs in F.
• Computing the closure F+ of a set of FDs is expensive because F+ can be very large, but the closure X+ of a set of attributes can be computed efficiently.

Example: Closure of a set of FDs
• Given a set of FDs F = {AB → C, C → B} on relation r(ABC).
• The closure of the set F is
F+ = {A → A, AB → A, AC → A, ABC → A, B → B, AB → B, BC → B, ABC → B, C → C, AC → C, BC → C, ABC → C, AB → AB, ABC → AB, AC → AC, ABC → AC, BC → BC, ABC → BC, ABC → ABC, AB → C, AB → AC, AB → BC, AB → ABC, C → B, C → BC, AC → B, AC → AB, AC → BC, AC → ABC}

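For a schema this small, F+ can be enumerated by brute force: X → Y is in F+ exactly when Y is a subset of the attribute closure X+. The sketch below is an illustration rather than part of the slides; the names `closure`, `nonempty_subsets` and `fd_closure` are assumptions, and the approach is exponential in the number of attributes, which is why it is only practical for toy examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def nonempty_subsets(attrs):
    """Yield every non-empty subset of `attrs` as a sorted string such as 'AB'."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield "".join(combo)

def fd_closure(universe, fds):
    """All pairs (X, Y), both non-empty, such that Y is a subset of X+."""
    return {
        (x, y)
        for x in nonempty_subsets(universe)
        for y in nonempty_subsets(closure(x, fds))
    }

F = [("AB", "C"), ("C", "B")]
F_plus = fd_closure("ABC", F)
print(("AC", "B") in F_plus)  # True: AC -> B follows from C -> B
print(("A", "B") in F_plus)   # False: A alone determines only A
for lhs, rhs in sorted(F_plus):
    print(f"{lhs} -> {rhs}")
```
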
Algorithm 3.1 Finding the closure of the set of attributes.
Algorithm 3.1: Determine X+, the closure of the set of attributes X under F.
X+ := X;
repeat
    oldX+ := X+;
    for each functional dependency Y → Z in F do
        if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = oldX+)

Example
Given a relation R(A,B,C,D,E,F) and a functional dependency set F = {f1: D → B, f2: A → C, f3: AD → E, f4: C → F}. Find A+ and {AD}+.
• Find A+: At first, A+ = {A}. We do the first scan on the functional dependency set F: from f2, A+ = {AC}; from f4, A+ = {ACF}. We then perform a second scan on F, but in this scan A+ does not change. Finally, A+ = {ACF}.
• Find {AD}+: At first, {AD}+ = {AD}. We do the first scan on the functional dependency set F: from f1, {AD}+ = {ADB}; from f2, {AD}+ = {ADBC}; from f3, {AD}+ = {ADBCE}; from f4, {AD}+ = {ADBCEF} = U (the attribute set of the relation R). So, {AD}+ = {ADBCEF}.

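Algorithm 3.1 translates almost line for line into code. The following is a minimal Python sketch under the assumption that each FD is encoded as a (left side, right side) pair of attribute strings; the function name `closure` is chosen here for illustration. It reproduces the two closures computed in the example.

```python
def closure(attrs, fds):
    """Closure X+ of the attribute set `attrs` under the FDs in `fds`."""
    result = set(attrs)                  # X+ := X
    while True:
        old = set(result)                # oldX+ := X+
        for lhs, rhs in fds:             # for each FD Y -> Z in F
            if set(lhs) <= result:       # if X+ contains Y
                result |= set(rhs)       #     X+ := X+ union Z
        if result == old:                # until X+ = oldX+
            return result

F = [("D", "B"), ("A", "C"), ("AD", "E"), ("C", "F")]
print(sorted(closure("A", F)))   # ['A', 'C', 'F']
print(sorted(closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```

Each pass over F either adds at least one attribute or terminates the loop, so the number of passes is bounded by the number of attributes in R.
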
How to check membership of F+
• To check if the set of FDs F implies X → Y (i.e. F |= X → Y), we should check whether X → Y ∈ F+.
• However, it is difficult to find F+ since this set is very large. So we want to check if F implies X → Y without computing F+. To do so, we check whether Y ⊆ X+.
• Example: Given a functional dependency set F = {D → B, A → C, AD → E, C → B}. We want to check if the set F implies A → B or not by finding A+F (the closure of A with respect to F). We obtain A+F = {ACB}, and B ∈ A+F. Therefore F implies A → B.

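The membership test reduces to one closure computation followed by a subset check. A short sketch, assuming the same (left side, right side) string encoding of FDs; `closure` and `implies` are illustrative names, not part of the slides.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if F implies lhs -> rhs, i.e. rhs is a subset of lhs+."""
    return set(rhs) <= closure(lhs, fds)

F = [("D", "B"), ("A", "C"), ("AD", "E"), ("C", "B")]
print(sorted(closure("A", F)))  # ['A', 'B', 'C']
print(implies(F, "A", "B"))     # True
print(implies(F, "A", "E"))     # False: E needs both A and D
```
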
Algorithm 3.2 Finding a Key K for R given a set F of functional dependencies on the attributes of R.
Input: A universal relation R and a set of functional dependencies F on the attributes of R.
1. Set K := U; // U: the set of attributes in relation R
2. for each attribute A in K do
   {
     compute (K - {A})+ with respect to F;
     if (K - {A})+ contains all the attributes in R then
       K := K - {A}
   }
// K is now a key of R. The algorithm returns only one key; which one depends on the order in which the attributes are examined.

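A minimal Python sketch of Algorithm 3.2 follows. The `order` parameter is an assumption added here (it is not in the slides) to make visible that the key returned depends on the order in which attributes are tried; FDs are again encoded as (left side, right side) string pairs.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_key(universe, fds, order=None):
    """Shrink the full attribute set to one key by dropping redundant attributes."""
    U = set(universe)
    key = set(U)                                # K := U
    for attr in (order or sorted(U)):           # for each attribute A
        if closure(key - {attr}, fds) == U:     # (K - {A})+ still covers R
            key -= {attr}                       # K := K - {A}
    return key

F1 = [("D", "B"), ("A", "C"), ("AD", "E"), ("C", "F")]
print(sorted(find_key("ABCDEF", F1)))                  # ['A', 'D']

F2 = [("A", "D"), ("C", "AF"), ("AB", "EC")]
print(sorted(find_key("ABCDEF", F2)))                  # ['B', 'C']
print(sorted(find_key("ABCDEF", F2, order="FEDCBA")))  # ['A', 'B']
```

The second FD set has more than one key, so the two attribute orders yield different results; finding every key requires a different procedure.
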
Algorithm 3.3 Finding the set K of all the keys for R given a set F of FDs on the attributes of R.
procedure Set_of_Keys (U, F, K)
begin
    N := U - ∪f∈F right(f); // U: the set of attributes in relation R
    if N+F = U then K := {N}
    else
    begin
        D := ∪f∈F right(f) - ∪f∈F left(f);
        L := U - (N+F ∪ D);
        K := ∅;
        for each non-empty Li ⊆ L do
            if (N ∪ Li)+F = U then K := K ∪ {N ∪ Li};
        while ∃ Ki, Kj ∈ K and Ki ⊂ Kj do
            K := K - {Kj};
    end
end;
Note: right(f) is the set of attributes on the right side of FD f; left(f) is the set of attributes on its left side.

Example 1
• Given a relation R(A,B,C,D,E,F) and a set of functional dependencies F = {D → B, A → C, AD → E, C → F}. To find all the keys for R, we perform the following computations:
• N = U - ∪f∈F right(f) = {ABCDEF} - {BCEF} = {AD}
  N+F = {AD}+F = {ADBCEF} = U
So the relation R has only one key, that is {AD}.

Example 2
• Given a relation R(A,B,C,D,E,F) and a set of functional dependencies F = {A → D, C → AF, AB → EC}. To find all the keys for R, we perform the following computations:
• N = U - ∪f∈F right(f) = {ABCDEF} - {DAFEC} = {B}
  N+F = {B}+F = {B} ≠ U
• D := ∪f∈F right(f) - ∪f∈F left(f) = {DAFEC} - {ACB} = {DFE}
• L := U - (N+F ∪ D) = {ABCDEF} - {BDFE} = {AC}. The non-empty subsets of L are {A}, {C} and {AC}.
• {BA}+F = {BADECF} = U. So {BA} is a key for R.
• {BC}+F = {BCAFED} = U. So {BC} is a key for R.
• We don’t need to calculate {BAC}+F since {BAC} is a superset of {BA} and {BC}.
• Finally, relation R has two keys {BA} and {BC}.

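The steps of Algorithm 3.3 can be sketched in Python and checked against Examples 1 and 2. This is an illustrative sketch, not the authors' implementation: the function names and the string encoding of FDs are assumptions, and the subset enumeration is exponential in the size of L.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def all_keys(universe, fds):
    """Return every key of the relation as a list of attribute sets."""
    U = set(universe)
    right = set().union(*(set(rhs) for _, rhs in fds))
    left = set().union(*(set(lhs) for lhs, _ in fds))
    N = U - right                          # attributes no FD can derive
    if closure(N, fds) == U:
        return [N]                         # N is the only key
    D = right - left                       # derived only, never needed in a key
    L = sorted(U - (closure(N, fds) | D))  # attributes that may extend N
    superkeys = []
    for size in range(1, len(L) + 1):
        for Li in combinations(L, size):
            candidate = N | set(Li)
            if closure(candidate, fds) == U:
                superkeys.append(candidate)
    # Keep only the minimal candidates (drop any proper superset of another one).
    return [k for k in superkeys if not any(other < k for other in superkeys)]

F1 = [("D", "B"), ("A", "C"), ("AD", "E"), ("C", "F")]
F2 = [("A", "D"), ("C", "AF"), ("AB", "EC")]
print(["".join(sorted(k)) for k in all_keys("ABCDEF", F1)])  # ['AD']
print(["".join(sorted(k)) for k in all_keys("ABCDEF", F2)])  # ['AB', 'BC']
```
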
3. Normal Forms
• Normalization: The process of decomposing unsatisfactory "bad" relations by breaking up their attributes into smaller relations
• Normal form: Condition using keys and FDs of a relation to certify whether a relation schema is in a particular normal form

Normalization of Relations
• 2NF, 3NF, and Boyce-Codd Normal Form (BCNF) are based on the keys and functional dependencies of a relation schema
• 4NF is based on keys and multivalued dependencies

Definitions of Keys and Attributes Participating in Keys
• If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
• A prime attribute must be a member of some candidate key
• A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.

Second Normal Form
• A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key.

Third Normal Form
• A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key

Definition of superkey
• Superkey of relation schema R: a set of attributes S of R that contains a key of R

BCNF (Boyce-Codd Normal Form)
• A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X → A holds in R, then X is a superkey of R
• Each normal form is strictly stronger than the previous one
  - Every 2NF relation is in 1NF
  - Every 3NF relation is in 2NF
  - Every BCNF relation is in 3NF
• There exist relations that are in 3NF but not in BCNF
• The goal is to have each relation in BCNF (or 3NF)

Figure 3.3 Example on Boyce-Codd normal form

Figure 3.4 A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition (1)
• Two FDs exist in the relation TEACH:
  fd1: {student, course} → instructor
  fd2: instructor → course
• {student, course} is a candidate key for this relation, and the dependencies shown follow the pattern in Figure 3.3 (b). So this relation is in 3NF but not in BCNF
• A relation NOT in BCNF should be decomposed so as to meet this property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.

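The BCNF condition can be tested directly with attribute closures: a nontrivial FD X → A violates BCNF when X+ does not cover all attributes of R. The sketch below applies this to TEACH. It is an illustration only; the function names are assumptions, attribute sets are encoded as tuples of names, and the check relies on the standard result that testing the FDs listed in F is enough for the original relation.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(universe, fds):
    """Return the nontrivial FDs whose left side is not a superkey."""
    U = set(universe)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)     # ignore trivial FDs
        and closure(lhs, fds) != U      # left side is not a superkey
    ]

TEACH = ("student", "course", "instructor")
fds = [
    (("student", "course"), ("instructor",)),  # fd1
    (("instructor",), ("course",)),            # fd2
]
print(bcnf_violations(TEACH, fds))
# [(('instructor',), ('course',))]  -> fd2 violates BCNF
```
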
Achieving the BCNF by Decomposition (2)
• Three possible decompositions for relation TEACH
  1. {student, instructor} and {student, course}
  2. {course, instructor} and {course, student}
  3. {instructor, course} and {instructor, student}
• All three decompositions will lose fd1. We have to settle for sacrificing the functional dependency preservation. But we cannot sacrifice the non-additivity (lossless) property after decomposition.
• Out of the above three, only the 3rd decomposition will not generate spurious tuples after join (and hence has the non-additivity property).

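Which of the three decompositions is lossless can be confirmed with the standard test for binary decompositions: {R1, R2} has the non-additive join property if the common attributes R1 ∩ R2 functionally determine either R1 - R2 or R2 - R1. The following sketch applies that test to TEACH; the function names and the tuple encoding of attribute sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1)."""
    r1, r2 = set(r1), set(r2)
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

fds = [
    (("student", "course"), ("instructor",)),  # fd1
    (("instructor",), ("course",)),            # fd2
]
decompositions = [
    (("student", "instructor"), ("student", "course")),
    (("course", "instructor"), ("course", "student")),
    (("instructor", "course"), ("instructor", "student")),
]
for number, (r1, r2) in enumerate(decompositions, start=1):
    print(number, is_lossless(r1, r2, fds))
# 1 False
# 2 False
# 3 True
```

In the third decomposition the shared attribute is instructor, and fd2 (instructor → course) is what makes the join non-additive.
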
Fourth Normal Form
• When a relation is in BCNF, there may still be anomalies that result from multivalued dependencies.
• For example, there is a user view which shows for each course the instructors who can teach that course and the textbooks that are used.
• In this view we have the following assumptions:
  - Each course has a well-defined set of instructors
  - Each course has a well-defined set of textbooks that are used.
  - The textbooks that are used for a given course are independent of the instructor for that course.

Figure 3.5. Data with multivalued dependencies

The relation OFFERING is in first normal form. The primary key of this relation consists of all three attributes. Because there are no determinants other than the primary key, the relation is in BCNF.
Update anomaly: suppose we want to add a third textbook to the Management course. We then have to add three new rows to the relation.

Multivalued dependency
• The type of dependency that exists when there are at least 3 attributes (e.g. A, B, and C) in a relation, with a well-defined set of B and C values for each A value, but those B and C values are independent of each other.
• To remove the multivalued dependency from a relation, we divide the relation into 2 relations. Each of these tables contains two attributes that have a multivalued relationship in the original relation.

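The independence of the B and C values can be stated as a check on a relation instance: whenever two tuples agree on A, the tuple combining the B value of one with the C value of the other must also be present. The sketch below tests this on a small OFFERING-style instance. The rows and names are hypothetical (the contents of Figure 3.5 are not reproduced here), and `satisfies_mvd` and `project` are illustrative helpers.

```python
def satisfies_mvd(rows, a, b, c):
    """True if, for each `a` value, every (b, c) combination is present."""
    tuples = {(row[a], row[b], row[c]) for row in rows}
    for a1, b1, _ in tuples:
        for a2, _, c2 in tuples:
            if a1 == a2 and (a1, b1, c2) not in tuples:
                return False
    return True

def project(rows, attrs):
    """Projection of `rows` onto `attrs`, with duplicates removed."""
    return sorted({tuple(row[x] for x in attrs) for row in rows})

# Hypothetical data: 3 instructors and 2 textbooks for Management.
offering = [
    {"course": "Management", "instructor": i, "textbook": t}
    for i in ("White", "Green", "Black")
    for t in ("Text1", "Text2")
] + [
    {"course": "Finance", "instructor": "Gray", "textbook": t}
    for t in ("Text3", "Text4")
]

print(satisfies_mvd(offering, "course", "instructor", "textbook"))      # True
print(satisfies_mvd(offering[1:], "course", "instructor", "textbook"))  # False
print(len(offering))                                    # 8
print(len(project(offering, ("course", "instructor"))))  # 4
print(len(project(offering, ("course", "textbook"))))    # 4
```

With this data, adding one textbook to Management would require three new rows in the combined relation (one per instructor) but only one new row in the (course, textbook) projection.
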
Figure 3.6. Relations in 4NF
• Fourth normal form (4NF): A relation in BCNF that contains no multivalued dependencies.

Reference
• Ramez Elmasri, Shamkant B. Navathe, Fundamentals of Database Systems, 6th Edition, Pearson, 2011.
• Dương Tuấn Anh, Nguyễn Trung Trực, Hệ Cơ Sở Dữ Liệu, Nhà Xuất Bản Đại Học Quốc Gia TP. Hồ Chí Minh, 2006.

tuple, update anomalies, null value, functional dependency,
Armstrong’s inference rules, reflexive, augmentation,
transitive, decomposition rule, union, pseudo-transitive,
closure of a set of attributes, closure of a set of functional
dependencies, candidate key, primary key, prime attribute,
non-prime attribute, lossless decomposition, spurious
tuples, superkey, Boyce Codd Normal Form, 4th Normal
Form, multivalued dependencies.
