Relational Database Design
Redundancy:
Data for branch-name, branch-city, assets are repeated for each
loan that a branch makes
Wastes space
Complicates updating, introducing possibility of inconsistency of
assets value
Null values
Cannot store information about a branch if no loans exist
Can use null values, but they are difficult to handle.
Decomposition
Decomposition of R = (A, B)
R1 = (A) R2 = (B)
[Figure: a relation r over (A, B), its projections Π_A(r) and Π_B(r), and their join Π_A(r) ⋈ Π_B(r)]
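The effect of decomposing R = (A, B) into two single-attribute schemas can be sketched in a few lines of Python. The tuple values below are hypothetical and chosen only for illustration. Because R1 and R2 share no attributes, their natural join is a cross product, and it can contain tuples that were never in r.

```python
from itertools import product

# Hypothetical instance of r over (A, B); the values are illustrative only.
r = {("a1", 1), ("a1", 2), ("b1", 1)}

proj_a = {(a,) for a, _ in r}  # projection of r onto A
proj_b = {(b,) for _, b in r}  # projection of r onto B

# R1 = (A) and R2 = (B) have no common attribute, so the natural join
# degenerates to a cross product of the two projections.
joined = {(a, b) for (a,), (b,) in product(proj_a, proj_b)}

spurious = joined - r  # tuples produced by the join that were not in r
print(sorted(spurious))  # [('b1', 2)]
```

The join returns every original tuple plus a spurious one, so the decomposition loses information about which A values go with which B values.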
Goal: Devise a Theory for the Following
F+ = F
repeat
    for each functional dependency f in F+
        apply reflexivity and augmentation rules on f
        add the resulting functional dependencies to F+
    for each pair of functional dependencies f1 and f2 in F+
        if f1 and f2 can be combined using transitivity
            then add the resulting functional dependency to F+
until F+ does not change any further
Algorithm to compute α+, the closure of α under F:
result := α;
while (changes to result) do
    for each β → γ in F do
        begin
            if β ⊆ result then result := result ∪ γ
        end
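A minimal Python sketch of the attribute closure loop follows. The function name `attribute_closure` and the representation of each dependency as an `(lhs, rhs)` pair are assumptions made for this example, not part of the slides.

```python
def attribute_closure(attrs, fds):
    """Return the closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, absorb the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Small illustrative dependency set: A -> B, B -> C
fds = [("A", "B"), ("B", "C")]
print(sorted(attribute_closure("A", fds)))  # ['A', 'B', 'C']
```

Each attribute is written as a single character, so a string such as `"CG"` stands for the attribute set {C, G}.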
Example of Attribute Set Closure
R = (A, B, C, G, H, I)
F = {A → B
A → C
CG → H
CG → I
B → H}
(AG)+
1. result = AG
2. result = ABCG (A → C and A → B)
3. result = ABCGH (CG → H and CG ⊆ AGBC)
4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
Is AG a candidate key?
1. Is AG a superkey?
   1. Does AG → R? That is, is (AG)+ ⊇ R?
2. Is any subset of AG a superkey?
   1. Does A → R? That is, is A+ ⊇ R?
   2. Does G → R? That is, is G+ ⊇ R?
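The candidate key questions above can be checked mechanically. The sketch below uses the relation and dependencies from this example. The helper names `closure`, `is_superkey`, and `is_candidate_key` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


R = set("ABCGHI")
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]


def is_superkey(attrs):
    return closure(attrs, F) >= R


def is_candidate_key(attrs):
    attrs = set(attrs)
    # A candidate key is a superkey none of whose proper subsets is a superkey.
    proper_subsets = (
        set(c) for k in range(len(attrs)) for c in combinations(sorted(attrs), k)
    )
    return is_superkey(attrs) and not any(is_superkey(s) for s in proper_subsets)


print(sorted(closure("AG", F)))  # ['A', 'B', 'C', 'G', 'H', 'I']
print(sorted(closure("A", F)))   # ['A', 'B', 'C', 'H']
print(sorted(closure("G", F)))   # ['G']
print(is_candidate_key("AG"))    # True
```

Neither A+ nor G+ covers R, so AG is a superkey with no superkey among its proper subsets.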
Uses of Attribute Closure
There are several uses of the attribute closure algorithm:
Testing for superkey:
To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R.
Testing functional dependencies
To check if a functional dependency α → β holds (or, in other words, is in F+), just check if β ⊆ α+.
That is, we compute α+ by using attribute closure, and then check if it contains β.
This is a simple and cheap test, and very useful.
Computing closure of F
For each γ ⊆ R, we find the closure γ+, and for each S ⊆ γ+, we output a functional dependency γ → S.
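Two of these uses are sketched below in Python: testing whether a dependency is implied, and enumerating F+ through attribute closures. The names `holds` and `fd_closure` are hypothetical. As a simplifying assumption, the enumeration skips empty left and right sides. It is exponential in the number of attributes, so it is only practical for very small schemas.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def holds(lhs, rhs, fds):
    """True if lhs -> rhs is in F+, i.e. rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


def nonempty_subsets(attrs):
    attrs = sorted(attrs)
    return [
        frozenset(c)
        for k in range(1, len(attrs) + 1)
        for c in combinations(attrs, k)
    ]


def fd_closure(R, fds):
    """Enumerate F+ as (lhs, rhs) pairs: for each subset g of R, every subset of g+."""
    return {
        (g, s)
        for g in nonempty_subsets(R)
        for s in nonempty_subsets(closure(g, fds))
    }


F = [("A", "B"), ("B", "C")]
print(holds("A", "C", F))  # True
print(holds("C", "A", F))  # False
f_plus = fd_closure("ABC", F)
print((frozenset("A"), frozenset("BC")) in f_plus)  # True
```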
Canonical Cover
R = (A, B, C)
F = {A → BC
B → C
A → B
AB → C}
Combine A → BC and A → B into A → BC
Set is now {A → BC, B → C, AB → C}
A is extraneous in AB → C because B → C logically implies AB → C.
Set is now {A → BC, B → C}
C is extraneous in A → BC since A → BC is logically implied by A → B and B → C.
The canonical cover is:
A → B
B → C
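The steps of this example can be sketched in Python as follows. The function `canonical_cover` is a hypothetical helper and one possible formulation. It may remove extraneous attributes in a different order than the worked example, although for this input it reaches the same cover. Storing dependencies in a dictionary keyed by left side applies the union rule automatically.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, an iterable of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def canonical_cover(fds):
    cover = {}  # left side -> right side; merging by key is the union rule
    for lhs, rhs in fds:
        cover.setdefault(frozenset(lhs), set()).update(rhs)
    changed = True
    while changed:
        changed = False
        for lhs in list(cover):
            rhs = cover[lhs]
            # Left-side attribute a is extraneous if (lhs - a)+ still contains rhs.
            for a in sorted(lhs):
                reduced = lhs - {a}
                if reduced and rhs <= closure(reduced, cover.items()):
                    del cover[lhs]
                    cover.setdefault(reduced, set()).update(rhs)
                    changed = True
                    break
            if changed:
                break
            # Right-side attribute b is extraneous if lhs+ still contains b
            # after b is dropped from this dependency.
            for b in sorted(rhs):
                trial = {l: (r - {b} if l == lhs else r) for l, r in cover.items()}
                if b in closure(lhs, trial.items()):
                    rhs.discard(b)
                    if not rhs:
                        del cover[lhs]
                    changed = True
                    break
            if changed:
                break
    return {(lhs, frozenset(rhs)) for lhs, rhs in cover.items()}


F = [("A", "BC"), ("B", "C"), ("A", "B"), ("AB", "C")]
for lhs, rhs in sorted(canonical_cover(F), key=lambda fd: sorted(fd[0])):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# A -> B
# B -> C
```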
Goals of Normalization
Decide whether a particular relation R is in “good” form.
In the case that a relation R is not in “good” form,
decompose it into a set of relations {R1, R2, ..., Rn} such
that
each relation is in good form
the decomposition is a lossless-join decomposition
Our theory is based on:
functional dependencies
multivalued dependencies
Normalization Using Functional Dependencies
R = (A, B, C)
F = {A → B, B → C}
R1 = (A, B), R2 = (B, C)
Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
Dependency preserving
R1 = (A, B), R2 = (A, C)
Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
Not dependency preserving
(cannot check B → C without computing R1 ⋈ R2)
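Both properties can be tested with attribute closures. In the sketch below, `is_lossless` checks whether the common attributes of a two-way decomposition functionally determine one of the two schemas. `preserves_dependencies` grows each left side using only closures restricted to the individual schemas. Both function names are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Binary decomposition is lossless if R1 ∩ R2 determines R1 or R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


def preserves_dependencies(decomposition, fds):
    """Check every dependency in fds without joining the decomposed relations."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False
    return True


F = [("A", "B"), ("B", "C")]
print(is_lossless("AB", "BC", F), preserves_dependencies(["AB", "BC"], F))  # True True
print(is_lossless("AB", "AC", F), preserves_dependencies(["AB", "AC"], F))  # True False
```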
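Testing for Dependency Preservation
Boyce-Codd Normal Form
A relation schema R is in BCNF with respect to F if, for every functional dependency α → β in F+, at least one of the following holds:
α → β is trivial (i.e., β ⊆ α)
α is a superkey for R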
Example
R = (A, B, C)
F = {A → B
B → C}
Key = {A}
R is not in BCNF
Decomposition R1 = (A, B), R2 = (B, C)
R1 and R2 in BCNF
Lossless-join decomposition
Dependency preserving
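A small sketch of the BCNF check for this example follows. The function `bcnf_violations` is hypothetical. It inspects only the dependencies it is given. For R1 and R2, the dependencies that apply to each schema are passed in by hand, which is an assumption that works for this small case but is not a general projection of F+.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(R, fds):
    """Return the nontrivial dependencies in fds whose left side is not a superkey of R."""
    R = set(R)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= R
    ]


print(bcnf_violations("ABC", [("A", "B"), ("B", "C")]))  # [('B', 'C')]
print(bcnf_violations("AB", [("A", "B")]))               # []
print(bcnf_violations("BC", [("B", "C")]))               # []
```

B → C violates BCNF on R because B+ = BC does not cover R, while each decomposed schema has no violation.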
Testing for BCNF
JK → L
Third Normal Form: Motivation
Example
R = (J, K, L)
F = {JK → L, L → K}
Two candidate keys: JK and JL
R is in 3NF
JK → L: JK is a superkey
L → K: K is contained in a candidate key
BCNF decomposition has (JL) and (LK)
Testing for JK → L requires a join
There is some redundancy in this schema
Equivalent to example in book:
Banker-schema = (branch-name, customer-name, banker-name)
banker-name → branch-name
branch-name customer-name → banker-name
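The 3NF reasoning for R = (J, K, L) can be reproduced with a brute-force sketch. The helpers `candidate_keys` and `is_3nf` are hypothetical names. The key search tries every attribute subset, which is fine for three attributes but does not scale. As in the example, the check looks only at the dependencies in F.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(R, fds):
    """All minimal attribute sets whose closure covers R (smallest first)."""
    R = set(R)
    keys = []
    for k in range(1, len(R) + 1):
        for combo in combinations(sorted(R), k):
            s = set(combo)
            if closure(s, fds) >= R and not any(key <= s for key in keys):
                keys.append(s)
    return keys


def is_3nf(R, fds):
    prime = set().union(*candidate_keys(R, fds))  # attributes in some candidate key
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs:                      # trivial dependency
            continue
        if closure(lhs, fds) >= set(R):     # left side is a superkey
            continue
        if not (rhs - lhs) <= prime:        # some right-side attribute is not prime
            return False
    return True


F = [("JK", "L"), ("L", "K")]
print(["".join(sorted(k)) for k in candidate_keys("JKL", F)])  # ['JK', 'JL']
print(is_3nf("JKL", F))  # True
```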
Testing for 3NF
Relation schema:
Banker-info-schema = (branch-name, customer-name, banker-name, office-number)
The functional dependencies for this relation schema are:
banker-name → branch-name office-number
customer-name branch-name → banker-name
The key is:
{customer-name, branch-name}
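The stated key can be verified with an attribute closure. The sketch below uses the attribute names and dependencies listed above. The `closure` helper is hypothetical. It confirms only that {customer-name, branch-name} is a superkey and that neither attribute alone is one. It does not search for other keys.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


schema = {"branch-name", "customer-name", "banker-name", "office-number"}
fds = [
    ({"banker-name"}, {"branch-name", "office-number"}),
    ({"customer-name", "branch-name"}, {"banker-name"}),
]
key = {"customer-name", "branch-name"}

print(closure(key, fds) == schema)                # True: the key is a superkey
print(closure({"customer-name"}, fds) == schema)  # False
print(closure({"branch-name"}, fds) == schema)    # False
print(closure({"banker-name"}, fds) == schema)    # False: banker-name is not a superkey
```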
Applying 3NF to Banker-info-schema
J     L   K
j2    l1  k1
j3    l1  k1
null  l2  k2
A schema that is in 3NF but not in BCNF has the problems of
repetition of information (e.g., the relationship l1, k1)
need to use null values (e.g., to represent the relationship
l2, k2 where there is no corresponding value for J).
Design Goals
teaches
course             teacher
database           Avi
database           Hank
database           Sudarshan
operating systems  Avi
operating systems  Jim

text
course             book
database           DB Concepts
database           Ullman
operating systems  OS Concepts
operating systems  Shaw
We shall see that these two relations are in
Fourth Normal Form (4NF)
Multivalued Dependencies (MVDs)
Tabular representation of α →→ β
Example
In our example:
course →→ teacher
course →→ book
The above formal definition is supposed to
formalize the notion that given a particular
value of Y (course) it has associated with it a
set of values of Z (teacher) and a set of
values of W (book), and these two sets are in
some sense independent of each other.
Note:
If Y → Z then Y →→ Z
Indeed we have (in above notation) Z1 = Z2
The claim follows.
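The independence idea can be tested on a concrete relation instance. The sketch below assumes the usual tuple-based definition of a multivalued dependency: whenever two tuples agree on α, the tuple that takes its β values from the first and its remaining values from the second must also be present. The function `satisfies_mvd` is a hypothetical helper. The data is the teaches and text relations from the Design Goals example, joined on course.

```python
def satisfies_mvd(rows, schema, alpha, beta):
    """Check alpha ->-> beta on one instance (a set of tuples over schema)."""
    index = {name: i for i, name in enumerate(schema)}
    rest = [a for a in schema if a not in alpha and a not in beta]

    def pick(t, attrs):
        return tuple(t[index[a]] for a in attrs)

    present = {(pick(t, alpha), pick(t, beta), pick(t, rest)) for t in rows}
    for t1 in rows:
        for t2 in rows:
            if pick(t1, alpha) == pick(t2, alpha):
                # The "mixed" tuple must also be in the relation.
                if (pick(t1, alpha), pick(t1, beta), pick(t2, rest)) not in present:
                    return False
    return True


teaches = [("database", "Avi"), ("database", "Hank"), ("database", "Sudarshan"),
           ("operating systems", "Avi"), ("operating systems", "Jim")]
text = [("database", "DB Concepts"), ("database", "Ullman"),
        ("operating systems", "OS Concepts"), ("operating systems", "Shaw")]

schema = ("course", "teacher", "book")
classes = {(c, t, b) for c, t in teaches for c2, b in text if c == c2}

print(satisfies_mvd(classes, schema, ["course"], ["teacher"]))  # True
broken = classes - {("database", "Avi", "Ullman")}
print(satisfies_mvd(broken, schema, ["course"], ["teacher"]))   # False
```

A check on one instance can refute a multivalued dependency but cannot prove that it holds for every legal instance of the schema.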
Use of Multivalued Dependencies
result := {R};
done := false;
compute D+;
Let Di denote the restriction of D+ to Ri
while (not done)
    if (there is a schema Ri in result that is not in 4NF)
        then
            begin
                let α →→ β be a nontrivial multivalued dependency that holds
                on Ri such that α → Ri is not in Di, and α ∩ β = ∅;
                result := (result - Ri) ∪ (Ri - β) ∪ (α, β);
            end
        else done := true;
Note: each Ri is in 4NF, and decomposition is lossless-join
Example
R = (A, B, C, G, H, I)
F = {A →→ B
B →→ HI
CG →→ H}
R is not in 4NF since A →→ B and A is not a superkey for R
Decomposition
a) R1 = (A, B) (R1 is in 4NF)
b) R2 = (A, C, G, H, I) (R2 is not in 4NF)
c) R3 = (C, G, H) (R3 is in 4NF)
d) R4 = (A, C, G, I) (R4 is not in 4NF)
Since A →→ B and B →→ HI, A →→ HI, A →→ I
e) R5 = (A, I) (R5 is in 4NF)
f) R6 = (A, C, G) (R6 is in 4NF)
Further Normal Forms