AD Chap3
Outline
1. Semantics of the Relation
Attributes
GUIDELINE 1: Informally, each tuple in a relation
should represent one entity or relationship instance.
(Applies to individual relations and their attributes).
Attributes of different entities (EMPLOYEE,
DEPARTMENT, PROJECT) should not be mixed in the
same relation
Only foreign keys should be used to refer to other
entities
Entity and relationship attributes should be kept apart
as much as possible.
Bottom Line: Design a schema that can be explained
easily relation by relation. The semantics of
attributes should be easy to interpret.
Figure 3.1 A simplified COMPANY relational database schema
Redundant Information in Tuples
and Update Anomalies
Mixing attributes of multiple entities may cause
problems
Information is stored redundantly wasting
storage
Problems with update anomalies
Insertion anomalies
Deletion anomalies
Modification anomalies
EXAMPLE OF AN UPDATE ANOMALY
(1)
EXAMPLE OF AN UPDATE ANOMALY
(2)
Figure 3.2 Two relation schemas
suffering from update anomalies
Guideline to Redundant
Information in Tuples and Update
Anomalies
GUIDELINE 2: Design a schema that does not
suffer from the insertion, deletion and update
anomalies. If there are any present, then note them
so that applications can be made to take them into
account.
Null Values in Tuples
Spurious Tuples
Spurious Tuples (2)
2. Functional Dependencies
Functional dependencies (FDs) are used to
specify formal measures of the "goodness" of
relational designs
FDs and keys are used to define normal forms
for relations
FDs are constraints that are derived from the
meaning and interrelationships of the data
attributes
A set of attributes X functionally determines a
set of attributes Y if the value of X determines a
unique value for Y.
Functional Dependencies (cont.)
X → Y holds if whenever two tuples have the same value
for X, they must have the same value for Y.
For any two tuples t1 and t2 in any relation instance r(R):
If t1[X] = t2[X], then t1[Y] = t2[Y]
X → Y in R specifies a constraint on all relation
instances r(R)
Written as X → Y; can be displayed graphically on a
relation schema as in the figures (denoted by an arrow).
FDs are derived from the real-world constraints on the
attributes
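The tuple-pair condition above can be tested on one relation instance. The sketch below is illustrative only: the function name fd_holds and the sample rows are assumptions, not part of the slides. Because an FD constrains all instances r(R), a single instance can refute an FD but cannot prove it for the schema.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is satisfied by this relation instance.

    rows: list of dicts mapping attribute name to value.
    lhs, rhs: lists of attribute names.
    """
    seen = {}  # value of X -> value of Y first observed with it
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


# Hypothetical instance, made up for illustration.
rows = [
    {"A": "a1", "B": "b1", "C": "c1"},
    {"A": "a2", "B": "b2", "C": "c1"},
    {"A": "a1", "B": "b1", "C": "c2"},
]

print(fd_holds(rows, ["A"], ["B"]))  # A -> B is satisfied here
print(fd_holds(rows, ["C"], ["A"]))  # C -> A is violated by rows 1 and 2
```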
Examples of FD constraints
social security number determines employee name
SSN → ENAME
project number determines project name and
location
PNUMBER → {PNAME, PLOCATION}
employee SSN and project number determine the
hours per week that the employee works on the
project
{SSN, PNUMBER} → HOURS
Examples of FD constraints (cont.)
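Inference Rules for FDs
Armstrong's inference rules:
IR1 (Reflexive) If Y ⊆ X, then X → Y
IR2 (Augmentation) If X → Y, then XZ → YZ
IR3 (Transitive) If X → Y and Y → Z, then X → Z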
Example
Consider relation r below. Relation r satisfies the FD: A → B
r(A B C D)
a1 b1 c1 d1
a2 b2 c1 d1
a1 b1 c1 d2
a3 b3 c3 d3
And hence, the FDs: AB → B, AC → B, AD → B, ABC → B, ABD → B, ACD → B
and ABCD → B.
AB → B (due to IR1)
AC → A (IR1) and A → B ⟹ AC → B (due to transitive rule)
AD → BD (IR2) and BD → B ⟹ AD → B (due to transitive rule)
ABC → B, ABD → B and ABCD → B (due to IR1).
ACD → AD and AD → B ⟹ ACD → B (due to transitive rule)
Example
Additional Inference Rules for FDs
Some additional inference rules that are useful:
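(Decomposition) If X → YZ, then X → Y and X → Z
(Union) If X → Y and X → Z, then X → YZ
(Pseudotransitivity) If X → Y and WY → Z, then WX → Z
Proofs:
Decomposition: X → YZ, YZ → Y ⟹ X → Y; X → YZ, YZ → Z ⟹ X → Z
Union: X → XY, XY → YZ ⟹ X → YZ
Pseudotransitivity: WX → WY, WY → Z ⟹ WX → Z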
Closure
Example: Closure of a set of FDs
Given a set of FDs F = {AB → C, C → B} on relation
r(ABC).
The closure of the set F is
F+ = {A → A, AB → A, AC → A, ABC → A, B → B, AB → B,
BC → B, ABC → B, C → C, AC → C, BC → C,
ABC → C, AB → AB, ABC → AB, AC → AC, ABC → AC,
BC → BC, ABC → BC, ABC → ABC, AB → C,
AB → AC, AB → BC, AB → ABC, C → B, C → BC,
AC → B, AC → AB, AC → BC, AC → ABC}
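For a relation this small, F+ can be enumerated mechanically. The sketch below is one possible way, not the method of the slides: it lists every X → Y with nonempty X and Y such that Y is contained in the attribute closure of X. The names closure, nonempty_subsets and fd_closure are assumptions made for this illustration.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds (pairs of attribute strings)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return frozenset(result)


def nonempty_subsets(attrs):
    """Yield every nonempty subset of attrs as a frozenset."""
    items = sorted(attrs)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)


def fd_closure(universe, fds):
    """All FDs X -> Y (X, Y nonempty) over universe that follow from fds."""
    return {
        (x, y)
        for x in nonempty_subsets(universe)
        for y in nonempty_subsets(closure(x, fds))
    }


F = [("AB", "C"), ("C", "B")]
F_plus = fd_closure("ABC", F)
for x, y in sorted(F_plus, key=lambda fd: (sorted(fd[0]), sorted(fd[1]))):
    print("".join(sorted(x)), "->", "".join(sorted(y)))
```

The enumeration is exponential in the number of attributes, which is why it is only practical for toy schemas like r(ABC).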
Algorithm 3.1 Finding the closure of the
set of attributes.
Algorithm 3.1: Determine X+, the closure of the set of
attributes X under F.
X+ := X;
repeat
  oldX+ := X+;
  for each functional dependency Y → Z in F do
    if X+ ⊇ Y then X+ := X+ ∪ Z;
until (X+ = oldX+)
Example: Given a relation R(A,B,C,D,E,F) and a functional
dependency set F = {f1: D → B, f2: A → C, f3: AD → E, f4: C → F}.
Find A+ and {AD}+.
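Algorithm 3.1 translates almost line for line into Python. In this sketch the function name attribute_closure and the encoding of each FD as a pair of attribute strings are assumptions made for illustration.

```python
def attribute_closure(x, fds):
    """Closure of the attribute set x under fds, following Algorithm 3.1.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    single-character attribute names, e.g. ("AD", "E") for AD -> E.
    """
    closure = set(x)
    while True:
        old = set(closure)              # oldX+ := X+
        for lhs, rhs in fds:
            if set(lhs) <= closure:     # X+ contains Y
                closure |= set(rhs)     # X+ := X+ union Z
        if closure == old:              # until X+ = oldX+
            return closure


F = [("D", "B"), ("A", "C"), ("AD", "E"), ("C", "F")]
print(sorted(attribute_closure("A", F)))   # ['A', 'C', 'F']
print(sorted(attribute_closure("AD", F)))  # ['A', 'B', 'C', 'D', 'E', 'F']
```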
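Example
Given a relation R(A,B,C,D,E,F) and a functional dependency set
F = {f1: D → B, f2: A → C, f3: AD → E, f4: C → F}. Find A+ and {AD}+.
A+ = {A, C, F}: f2 adds C, then f4 adds F.
{AD}+ = {A, B, C, D, E, F}: f1 adds B, f2 adds C, f3 adds E and f4 adds F.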
How to check membership in F+
To check if the set of FDs F implies X → Y (i.e. F ⊨ X → Y),
we should check whether X → Y ∈ F+.
However, it is difficult to compute F+ since this set can be
exponentially large in the number of attributes.
So we want to check if F implies X → Y without computing
F+. To do so, we check whether Y ⊆ X+.
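The membership test reduces to one attribute-closure computation. A minimal sketch follows; the function names implies and attribute_closure are assumptions for this illustration, and FDs are encoded as pairs of attribute strings.

```python
def attribute_closure(x, fds):
    """Closure of attribute set x under fds, a list of (lhs, rhs) pairs."""
    closure = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(fds, lhs, rhs):
    """True if fds imply lhs -> rhs, i.e. rhs is a subset of lhs+."""
    return set(rhs) <= attribute_closure(lhs, fds)


F = [("D", "B"), ("A", "C"), ("AD", "E"), ("C", "F")]
print(implies(F, "A", "F"))   # True: A -> C and C -> F
print(implies(F, "A", "B"))   # False: B is not in A+
```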
Algorithm 3.2 Finding a Key K for R
given a set F of functional dependencies
on the attributes of R.
Input: A universal relation R and a set of functional
dependencies F on the attributes of R.
1. Set K := U; // U: the set of attributes in relation R
2. for each attribute A in K do
{ compute (K − {A})+ with respect to F;
  if (K − {A})+ contains all the attributes in R then
    K := K − {A}
}
// K is now a key of R; which key is found depends on the order
// in which the attributes are tried
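A possible Python rendering of Algorithm 3.2 is sketched below. The names find_key and attribute_closure are assumptions for this illustration; attributes are single characters and are tried in the order given.

```python
def attribute_closure(x, fds):
    """Closure of attribute set x under fds, a list of (lhs, rhs) pairs."""
    closure = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def find_key(attributes, fds):
    """Return one key of the relation, following Algorithm 3.2."""
    universe = set(attributes)
    key = list(attributes)                  # K := U
    for a in list(attributes):              # try to drop each attribute once
        candidate = [b for b in key if b != a]
        if attribute_closure(candidate, fds) == universe:
            key = candidate                 # K := K - {A}
    return set(key)


F1 = [("D", "B"), ("A", "C"), ("AD", "E"), ("C", "F")]
print(sorted(find_key("ABCDEF", F1)))  # ['A', 'D']
```

When a relation has several keys, the attribute order decides which one is returned, so this procedure finds a key rather than all keys.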
Algorithm 3.3 Finding the set K of all
the keys for R given a set F of FDs on
the attributes of R.
procedure Set_of_Keys (U, F, K)
begin
  N := U − ⋃_{f∈F} right(f); // U: the set of attributes in relation R
  if N+F = U then K := {N}
  else
  begin
    D := ⋃_{f∈F} right(f) − ⋃_{f∈F} left(f);
    L := U − N+F − D;
    K := ∅;
    for each nonempty Li ⊆ L do
      if {N ∪ Li}+F = U then K := K ∪ {N ∪ Li};
    while ∃ Ki, Kj ∈ K and Ki ⊂ Kj do
      K := K − {Kj};
  end
end;
Note: right(f) is the set of attributes on the right side of FD f, left(f) is
the set of attributes on its left side, and N+F is the closure of N under F.
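The procedure can be sketched in Python as follows. The names all_keys and attribute_closure are assumptions for this illustration. One deliberate difference from the pseudocode: candidate subsets are tried in order of increasing size and supersets of keys already found are skipped, which replaces the final while loop that removes non-minimal sets.

```python
from itertools import combinations


def attribute_closure(x, fds):
    """Closure of attribute set x under fds, a list of (lhs, rhs) pairs."""
    closure = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def all_keys(attributes, fds):
    """Return every key of the relation as a list of frozensets."""
    u = set(attributes)
    right = set().union(*(set(rhs) for _, rhs in fds))
    left = set().union(*(set(lhs) for lhs, _ in fds))
    n = u - right                      # attributes that must be in every key
    n_closure = attribute_closure(n, fds)
    if n_closure == u:
        return [frozenset(n)]
    d = right - left                   # attributes that are in no key
    l = sorted(u - n_closure - d)      # attributes that may extend N
    keys = []
    for size in range(1, len(l) + 1):
        for li in combinations(l, size):
            candidate = frozenset(n | set(li))
            if any(k <= candidate for k in keys):
                continue               # superset of a known key: not minimal
            if attribute_closure(candidate, fds) == u:
                keys.append(candidate)
    return keys


F = [("A", "D"), ("C", "AF"), ("AB", "EC")]
for key in all_keys("ABCDEF", F):
    print("".join(sorted(key)))
```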
Example 1
Given a relation R(A,B,C,D,E,F) and a set of functional
dependencies F = {D → B, A → C, AD → E, C → F}. To
find all the keys for R, we perform the following
computations:
N = U − ⋃_{f∈F} right(f) = {ABCDEF} − {BCEF} = {AD}
N+F = {AD}+F = {ADBCEF} = U
So the relation R has only one key, that is {AD}.
Example 2
Given a relation R(A,B,C,D,E,F) and a set of functional
dependencies F = {A → D, C → AF, AB → EC}. To find all the
keys for R, we perform the following computations:
N = U − ⋃_{f∈F} right(f) = {ABCDEF} − {DAFEC} = {B}
N+F = {B}+F = {B} ≠ U
D := ⋃_{f∈F} right(f) − ⋃_{f∈F} left(f) = {DAFEC} − {ACB} = {DFE}
L := U − N+F − D = {ABCDEF} − {BDFE} = {AC}. The nonempty subsets of L
are {A}, {C} and {AC}.
{BA}+F = {BADECF} = U. So {BA} is a key for R.
{BC}+F = {BCAFED} = U. So {BC} is a key for R.
We don't need to calculate {BAC}+F since {BAC} is a superset of
{BA} and {BC}.
Finally, relation R has two keys, {BA} and {BC}.
3. Normal Forms
Normalization: The process of decomposing
unsatisfactory "bad" relations by breaking up
their attributes into smaller relations
Normalization of Relations
Definitions of Keys and Attributes
Participating in Keys
If a relation schema has more than one key, each is
called a candidate key. One of the candidate keys
is arbitrarily designated to be the primary key, and
the others are called secondary keys.
A Prime attribute must be a member of some
candidate key
A Nonprime attribute is not a prime attribute—that
is, it is not a member of any candidate key.
Second Normal Form
A relation schema R is in second normal form (2NF)
if every non-prime attribute A in R is fully functionally
dependent on the primary key.
Definition of superkey
Definition:
Superkey of relation schema R - a set of attributes S
of R that contains a key of R
BCNF (Boyce-Codd Normal Form)
Figure 3.3 Example on Boyce-Codd
normal form
Figure 3.4 A relation TEACH that is in
3NF but not in BCNF
Achieving the BCNF by Decomposition
(1)
Two FDs exist in the relation TEACH:
fd1: {student, course} → instructor
fd2: instructor → course
{student, course} is a candidate key for this relation, and
the dependencies shown follow the pattern in
Figure 3.3 (b). So this relation is in 3NF (course is a prime
attribute) but not in BCNF (instructor, the determinant of fd2,
is not a superkey).
A relation NOT in BCNF should be decomposed so as
to meet this property, while possibly forgoing the
preservation of all functional dependencies in the
decomposed relations.
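The BCNF claim for TEACH can be checked mechanically. The sketch below assumes the usual definition (every nontrivial FD X → A in F must have a superkey X as its determinant) and tests only the FDs listed in F. The names bcnf_violations and attribute_closure are assumptions for this illustration.

```python
def attribute_closure(x, fds):
    """Closure of attribute set x under fds, a list of (lhs, rhs) pairs."""
    closure = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def bcnf_violations(attributes, fds):
    """Return the nontrivial FDs whose left side is not a superkey."""
    universe = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs)                      # skip trivial FDs
        and attribute_closure(lhs, fds) != universe      # lhs not a superkey
    ]


TEACH = ["student", "course", "instructor"]
FDS = [
    (["student", "course"], ["instructor"]),   # fd1
    (["instructor"], ["course"]),              # fd2
]
print(bcnf_violations(TEACH, FDS))  # only fd2 is reported
```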
Achieving the BCNF by Decomposition
(2)
Three possible decompositions for relation TEACH
1. {student, instructor} and {student, course}
2. {course, instructor } and {course, student}
3. {instructor, course } and {instructor, student}
All three decompositions will lose fd1. We have to settle for
sacrificing the functional dependency preservation. But we
cannot sacrifice the non-additivity (lossless) property after
decomposition.
Out of the above three, only the 3rd decomposition will not
generate spurious tuples after a join (and hence has the
non-additivity property).
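One way to confirm which decomposition is lossless without sample data is the standard test for binary decompositions, which is not stated on these slides: {R1, R2} has the non-additive join property if R1 ∩ R2 functionally determines R1 − R2 or R2 − R1. The sketch below applies it to the three candidates; the names is_lossless and attribute_closure are assumptions for this illustration.

```python
def attribute_closure(x, fds):
    """Closure of attribute set x under fds, a list of (lhs, rhs) pairs."""
    closure = set(x)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_lossless(r1, r2, fds):
    """Binary non-additive join test on two attribute sets."""
    r1, r2 = set(r1), set(r2)
    common_closure = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


FDS = [
    (["student", "course"], ["instructor"]),   # fd1
    (["instructor"], ["course"]),              # fd2
]
decompositions = [
    (["student", "instructor"], ["student", "course"]),
    (["course", "instructor"], ["course", "student"]),
    (["instructor", "course"], ["instructor", "student"]),
]
for number, (r1, r2) in enumerate(decompositions, start=1):
    print(number, is_lossless(r1, r2, FDS))
```

Only the third decomposition passes, because its common attribute instructor determines course through fd2.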
Fourth Normal form
When a relation is in BCNF, there may still be
anomalies that result from multivalued
dependencies.
For example, there is a user view which shows for
each course the instructors who can teach that
course and the textbooks that are used.
In this view we have the following assumptions:
Each course has a well-defined set of instructors.
Each course has a well-defined set of textbooks that
are used.
The textbooks that are used for a given course are
independent of the instructor for that course.
Figure 3.5. Data with multivalued
dependencies
The relation OFFERING is in first normal form. The primary key of this
relation consists of all three attributes. Because there are no
determinants other than the primary key, the relation is in BCNF.
Update anomaly: if we want to add a third textbook to the
Management course, we have to add three new rows to the relation,
one for each instructor of that course.
Multivalued dependency
The type of dependency that exists when there are at
least 3 attributes (e.g. A, B, and C) in a relation, with a
well-defined set of B and C values for each A value,
but those B and C values are independent of each other.
To remove the multivalued dependency from a relation,
we divide the relation into 2 relations. Each of these tables
contains two attributes that have a multivalued relationship
in the original relation.
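The effect of the split can be illustrated with a small experiment. Everything in this sketch is hypothetical: the rows, the instructor and textbook labels, and the names teacher, text and natural_join are made up for illustration and are not the data of the figures.

```python
def natural_join(teacher, text):
    """Join (course, instructor) with (course, textbook) on course."""
    return {(c, i, t) for (c, i) in teacher for (c2, t) in text if c == c2}


# Hypothetical data: one course with three instructors and two textbooks.
offering = {
    ("Management", instructor, textbook)
    for instructor in ("I1", "I2", "I3")
    for textbook in ("T1", "T2")
}

# Divide the relation into two relations, each pairing course with one
# of the independent multivalued attributes.
teacher = {(c, i) for (c, i, _) in offering}
text = {(c, t) for (c, _, t) in offering}

# Joining the two projections gives back exactly the original rows.
print(natural_join(teacher, text) == offering)

# Adding a third textbook is one new row in the decomposed design ...
text_after = text | {("Management", "T3")}
print(len(text_after) - len(text))

# ... but corresponds to one new row per instructor in the single relation.
new_rows = natural_join(teacher, text_after) - offering
print(len(new_rows))
```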
Figure 3.6. Relations in 4NF
Reference
Ramez Elmasri, Shamkant B. Navathe,
Fundamentals of Database Systems,
6th Edition, Pearson, 2011.
Dương Tuấn Anh, Nguyễn Trung Trực, Hệ Cơ
Sở Dữ Liệu, Nhà Xuất Bản Đại Học Quốc Gia
TP. Hồ Chí Minh, 2006.
tuple, update anomalies, null value, functional dependency,
Armstrong’s inference rules, reflexive, augmentation,
transitive, decomposition rule, union, pseudo-transitive,
closure of a set of attributes, closure of a set of functional
dependencies, candidate key, primary key, prime attribute,
non-prime attribute, lossless decomposition, spurious
tuples, superkey, Boyce Codd Normal Form, 4th Normal
Form, multivalued dependencies.