DB Design
When designing a database using the relational model, often one has to select
among alternative sets of relation schemes. Some choices are more convenient
than others for various reasons. The goal of the designer at this point should be
to select relation schemes such that database consistency is preserved following
any update operations. If the relation schemes are not properly chosen,
anomalies can occur after database operations. For example, let us consider the
relation scheme SUPPLIER (sname, saddress, sphone, item, price). The
problems with this scheme design are as follows:
1. Redundancy: The address of the supplier is repeated once for each item
supplied.
Functional Dependency
A data dependency requires that if some tuples in the database fulfil certain
equalities, then either some other tuples must also exist in the database, or some
values of the given tuples must be equal.
The underlying semantics of these two fds are that the roll_no of a student is
unique and that a student gets a unique grade in each course.
The only way to determine the functional dependencies that hold for a relation
scheme is by careful examination of the meaning of each attribute. In this
sense, the dependencies actually capture semantics of the data.
Full functional dependency
The term full functional dependency is used to indicate the minimum set of
attributes in the left side of an fd to functionally determine the right hand side.
Formally Y is fully functionally dependent on X if
i) Y is functionally dependent on X
ii) Y is not functionally dependent on any proper subset of X.
Keys
The concept of key is very important in any file management system. The key
is used to uniquely identify a record. There is an analogous concept for
relations with functional dependencies. If R (A1, A2, . . ., An) is a relation
scheme with fds F, and X is a subset of { A1 , A2, . . ., An }, we say X is a key
of R if
i) X → A1, A2, . . ., An is implied by F
ii) R is not functionally dependent on any proper
subset of X, i.e. X → A1, A2, . . ., An is a full
functional dependency.
There may be more than one key for a relation. These are called candidate
keys. Among the candidate keys, the one chosen by the designer is
termed the primary key. The term super key is used to denote any superset of a
key.
For example, in a STUDENT relation the reg_no of each student is unique, so a
student record can be found with reg_no alone. Alternatively, the combination (course,
dept, sem, roll_no) also identifies a student record uniquely. Hence there are two
candidate keys for identifying a student record. A designer in this context may
choose either (reg_no) or (course, dept, sem, roll_no) as the primary key. The
combination (reg_no, name) may be considered a super key.
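The definitions of key and super key can be checked mechanically. The following is only a sketch: the attribute list of STUDENT and the two fds are assumptions chosen to match the example above (the notes do not list them), and `closure` and `candidate_keys` are illustrative helper names.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = frozenset(combo)
            # A superset of a key already found is only a super key.
            if any(key <= candidate for key in keys):
                continue
            if closure(candidate, fds) == schema:
                keys.append(candidate)
    return keys

# Assumed STUDENT scheme and fds (hypothetical, for illustration only).
STUDENT = frozenset({"reg_no", "name", "course", "dept", "sem", "roll_no"})
FDS = [
    (frozenset({"reg_no"}),
     frozenset({"name", "course", "dept", "sem", "roll_no"})),
    (frozenset({"course", "dept", "sem", "roll_no"}), frozenset({"reg_no"})),
]

keys = candidate_keys(STUDENT, FDS)
# (reg_no, name) determines every attribute but is not minimal: a super key.
is_super_key = closure({"reg_no", "name"}, FDS) == STUDENT
```

The search tries every subset, so it is only suitable for small schemes such as this one.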
Armstrong’s axioms for functional dependencies
There are several inference rules which tell us how one or more dependencies
imply other dependencies. The set of rules is often called Armstrong’s axioms.
There are several other inference rules that follow from Armstrong’s axioms.
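As an illustration, the three rules usually listed as Armstrong's axioms (reflexivity, augmentation and transitivity) can be written as small functions, and one of the derived rules, the union rule, can be obtained from them. This is a sketch only; the function names and the representation of an fd as a pair of frozensets are assumptions made for the example.

```python
def reflexivity(x, y):
    """If Y is a subset of X, then X -> Y holds."""
    if not set(y) <= set(x):
        raise ValueError("reflexivity needs Y to be a subset of X")
    return (frozenset(x), frozenset(y))

def augmentation(fd, z):
    """From X -> Y infer XZ -> YZ."""
    lhs, rhs = fd
    return (frozenset(lhs | set(z)), frozenset(rhs | set(z)))

def transitivity(fd1, fd2):
    """From X -> Y and Y -> Z infer X -> Z."""
    (x, y), (y2, z) = fd1, fd2
    if y != y2:
        raise ValueError("right side of fd1 must equal left side of fd2")
    return (x, z)

# Union rule: from A -> B and A -> C derive A -> BC.
a_to_b = (frozenset("A"), frozenset("B"))
a_to_c = (frozenset("A"), frozenset("C"))
step1 = augmentation(a_to_b, "A")    # A -> AB
step2 = augmentation(a_to_c, "B")    # AB -> BC
union = transitivity(step1, step2)   # A -> BC
```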
Prime and non-prime attributes: Attributes that are part of any candidate
key of a relation are called prime attributes; all others are non-prime attributes. For
example, in the STUDENT relation reg_no, course, dept, sem and roll_no are prime
attributes, while name and address are non-prime attributes.
Computing Closures
All such sets X+, taken together, form the closure of F.
At the other extreme, computing X+ for a set of attributes X is not hard; it takes
time proportional to the length of all the dependencies in F. There is a simple
way (the membership algorithm) to compute X+, which can be explained with the
help of an example.
AB → C       D → EG
C → A        BE → C
BC → D       CG → BD
ACD → B      CE → AG
Let X = BD. X(0) = BD. To compute X(1) we look for dependencies whose left
side is B, D or BD. There is only one, D → EG, so we adjoin E and G to X(0)
and make X(1) = BDEG. We continue in this way until the results of two
successive iterations are equal: X(2) = BCDEG (using BE → C) and
X(3) = ABCDEG (using C → A, BC → D, CG → BD and CE → AG). X(4) = X(3), so
X+ = ABCDEG.
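The iteration can be sketched in a few lines. The function name `closure_steps` is an assumption for this example; it records each X(i) so the passes can be compared with the hand computation for X = BD.

```python
def closure_steps(x, fds):
    """Return the successive sets X(0), X(1), ... computed for X+."""
    current = set(x)
    steps = [current]
    while True:
        nxt = set(current)
        for lhs, rhs in fds:
            # Use every fd whose left side lies inside the current set.
            if set(lhs) <= current:
                nxt |= set(rhs)
        if nxt == current:      # two iterations agree: X+ is reached
            return steps
        steps.append(nxt)
        current = nxt

# The eight fds of the example; each attribute is a single letter.
F = [("AB", "C"), ("D", "EG"), ("C", "A"), ("BE", "C"),
     ("BC", "D"), ("CG", "BD"), ("ACD", "B"), ("CE", "AG")]

steps = closure_steps("BD", F)
x_plus = steps[-1]
```

To test whether an fd X → Y follows from F (the membership test), it is enough to check that Y is contained in the last set returned for X.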
Minimal Cover
((F´ − { X → A }) ∪ { Z → A }) is a cover of F.
Here the first condition implies that no dependency in F´ is redundant. The
second condition guarantees that no attribute on any left side is redundant, i.e.
each fd is a full functional dependency. It should be pointed out that the minimal
cover of a set of fds is not unique.
The membership algorithm can be used to check whether each fd can be
deduced from the rest of the set.
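One common way to obtain a minimal cover is sketched below, under the assumption that attributes are single letters and fds are given as pairs of strings. It first splits right sides, then shortens left sides, then drops redundant fds using the membership test. The result depends on the order in which fds are examined, which is one reason the minimal cover is not unique. The names `closure` and `minimal_cover` are illustrative.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (pairs of frozensets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split right sides so every fd has one attribute on the right.
    cover = [(frozenset(lhs), frozenset([a])) for lhs, rhs in fds for a in rhs]
    # Step 2: drop a left-side attribute when the shorter fd still follows.
    for i, (lhs, rhs) in enumerate(cover):
        for attr in sorted(lhs):
            reduced = cover[i][0] - {attr}
            if reduced and rhs <= closure(reduced, cover):
                cover[i] = (reduced, rhs)
    cover = list(dict.fromkeys(cover))  # remove duplicates, keep order
    # Step 3: drop each fd that the membership test derives from the rest.
    for fd in list(cover):
        rest = [g for g in cover if g != fd]
        if fd[1] <= closure(fd[0], rest):
            cover = rest
    return cover

F = [("AB", "C"), ("D", "EG"), ("C", "A"), ("BE", "C"),
     ("BC", "D"), ("CG", "BD"), ("ACD", "B"), ("CE", "AG")]
cover = minimal_cover(F)
```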
Normal Forms
There are a number of normal forms. These normal forms guarantee that most
of the problems of redundancy and anomalies do not occur.
Mech no. Skill no. Skill cat. Mech name Mech age Shop no city Supv. Prof
If Hary is the only mechanic in shop number 52 and he quits, all tuples
concerning him will have to be deleted. But at the same time, the information
that Jay is the supervisor of shop no. 52 will be deleted. So we are going to
lose a piece of information. There are many other problems as well.
Thus a poor tabular design for a database may cause problems of redundancy
and anomalies. Normal forms evolved in the relational model as an attempt to
do away with these problems.
First Normal Form (1NF)
A relation in 1NF has the property that every data entry for each attribute is
non-decomposable. In the above table, ‘skill no.’ and ‘skill cat’ contain
decomposable data. So the above table may be converted to 1NF as follows:
Mech no. Skill no. Skill cat. Mech name Mech age Shop no city Supv. Prof
One may verify that the combination of attributes (mech no, skill no) is a valid key
for the above relation. Inspecting the set of fds, it can be seen that many of the
non-prime attributes are functionally dependent on only part of the key, i.e.
either mech no or skill no. So the relation violates the requirement of 2NF. The table has to
be converted into a 2NF-compatible form as follows:
Mechanic :
Skill :
skill no   skill cat
113        Body
55         Engn
21         Axle
28         Tire
Proficiency :
mech no   skill no   prof
92        113        3
47        113        5
47        55         1
43        55         6
52        21         2
52        28         6
Another Example
For the example “personnel data”, the “Skill” and “Proficiency” relations are
already in 3NF, as no fd of the form nonkey → nonkey holds in these
relations. In the “Mechanic” relation, however, there are two such fds.
They are shop no → supv and shop no → city. So the relation “Mechanic” may
be converted to two relations as follows:
Mechanic : Shop :
Let us consider the relation ORDER (order#, parts, supplier, unit_price, qty)
with functional dependencies as follows:
order# → parts, supplier, qty
supplier, parts → unit_price
Here order# is a key, but unit_price depends on the non-prime attributes supplier
and parts. Hence this relation is not in 3NF. In this example there are
insertion/deletion anomalies. We cannot store the unit_price of
a part supplied by a supplier unless an order has been placed for that part.
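The 3NF condition can be tested for the ORDER relation with a short sketch. It reports every fd X → A in which X is not a super key and A is not a prime attribute. The helper names are assumptions for this example, and the brute-force key search is only meant for small schemes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            cand = frozenset(combo)
            if any(k <= cand for k in keys):
                continue
            if closure(cand, fds) == schema:
                keys.append(cand)
    return keys

def third_nf_violations(schema, fds):
    """List (X, A) where X -> A breaks 3NF: X no super key, A non-prime."""
    prime = set().union(*candidate_keys(schema, fds))
    violations = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= schema:     # left side is a super key
            continue
        for attr in sorted(rhs - lhs):
            if attr not in prime:
                violations.append((lhs, attr))
    return violations

ORDER = frozenset({"order#", "parts", "supplier", "unit_price", "qty"})
FDS = [
    (frozenset({"order#"}), frozenset({"parts", "supplier", "qty"})),
    (frozenset({"supplier", "parts"}), frozenset({"unit_price"})),
]

keys = candidate_keys(ORDER, FDS)
violations = third_nf_violations(ORDER, FDS)
```

Here `violations` contains the single dependency supplier, parts → unit_price, which is the one singled out in the text.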
Boyce-Codd Normal Form (BCNF)
Here the first fd implies that at any instant of time a quarter is allocated to
only one employee and the rent is fixed at the time of occupancy. The second
fd represents the information that an employee is allocated only one quarter.
This relation is in 3NF with qtr# and date_of_occupancy as key. This scheme
suffers from the anomaly that we cannot record the quarter allocated to an
employee unless we know the date of occupancy. To remove such problems the
relation should be in Boyce-Codd normal form, i.e. BCNF.
Another Example:
One student can enrol for multiple subjects. For example, student with
student_id 101, has opted for subjects - Java & C++
And, there can be multiple professors teaching one subject like we have
for Java.
In the table above student_id, subject together form the primary key,
because using student_id and subject, we can find all the columns of the
table.
One more important point to note here is, one professor teaches only one
subject, but one subject may have two different professors.
This table satisfies the 1st Normal form because all the values are atomic,
column names are unique and all the values stored in a particular column are of
same domain.
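This table also satisfies the 2nd Normal Form as there is no partial
dependency.
And there is no nonkey → nonkey dependency, hence the table also satisfies
the 3rd Normal Form.
However, professor → subject holds while professor is not a super key of the
table, so the table is not in BCNF.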
BCNF conversion
Student
student_id p_id
101 1
101 2
and so on...
Professor
p_id professor subject
1 P.Java Java
2 P.Cpp C++
and so on...
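The BCNF condition for the original student/subject/professor table can be checked with a sketch like the following. It assumes the two fds described in the text, (student_id, subject) → professor and professor → subject; the function names are illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Return the non-trivial fds whose left side is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and not closure(lhs, fds) >= schema]

TABLE = frozenset({"student_id", "subject", "professor"})
FDS = [
    (frozenset({"student_id", "subject"}), frozenset({"professor"})),
    (frozenset({"professor"}), frozenset({"subject"})),
]

violations = bcnf_violations(TABLE, FDS)
```

Only professor → subject is reported, because (student_id, subject) determines every column while professor alone does not.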
The fourth normal form deals with multiple associations among the attributes.
Let us take the same example of personnel information of an automobile shop.
A shop may have more than one supervisor. If shop number 21 now has two
supervisors Bob and Katz, it would be necessary to add 2 additional tuples to
the relation. The updated table is shown below:
Mech no. Skill no. Skill cat. Mech name Mech age Shop no city Supv. Prof
Mechanic :
Skill :
skill no   skill cat
113        Body
55         Engn
21         Axle
28         Tire
Proficiency :
mech no   skill no   prof
92        113        3
47        113        5
47        55         1
43        55         6
52        21         2
52        28         6
Shop No.   City
52         Delhi
44         Bombay
21         Madras
Shop No.   Supervisor
52         Jay
44         Chris
21         Bob
21         Katz
3.
If all these conditions are true for a relation (table), it is said to have a
multi-valued dependency.
Example 1
Brown W Jim
Brown X Jim
Brown Y Jim
Brown Z Jim
Brown W Joan
Brown X Joan
Brown Y Joan
Brown Z Joan
Brown W Bob
Brown X Bob
Brown Y Bob
Brown Z Bob
Emp_Project Emp_dependents
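The twelve rows of Example 1 can be used to check a multivalued dependency on an actual instance. The sketch below assumes the three columns are employee, project and dependent (the column headings are not shown above) and uses the usual instance test for employee -->> project: whenever two tuples agree on employee, the tuple combining the project of the first with the dependent of the second must also be present.

```python
from itertools import product

# All combinations of Brown's projects W..Z and dependents, as in the table.
rows = {("Brown", p, d) for p, d in product("WXYZ", ["Jim", "Joan", "Bob"])}

def holds_mvd(rel):
    """Check employee -->> project on (employee, project, dependent) triples."""
    return all((e1, p1, d2) in rel
               for (e1, p1, d1) in rel
               for (e2, p2, d2) in rel
               if e1 == e2)

# Projections corresponding to Emp_Project and Emp_dependents.
emp_project = {(e, p) for e, p, d in rows}
emp_dependents = {(e, d) for e, p, d in rows}

# Natural join of the two projections on employee.
rejoined = {(e, p, d)
            for e, p in emp_project
            for e2, d in emp_dependents
            if e == e2}
```

Because the dependency holds, the 4 + 3 tuples of the two projections carry the same information as the 12 tuples of the original table, and `rejoined` equals `rows`.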
Example 2
V_I_P
V3 I1 P2
Vendor_Supply
Vendor_code Item_code
V1 I1
V1 I2
V2 I2
V2 I3
V3 I1
(a)
Vendor_Project
Vendor_code Project_no
V1 P1
V1 P3
V2 P1
V3 P1
V3 P2
(b)
These relations (a) & (b) still have a problem. Even though a vendor is able
to supply an item, a project may not need it. For example, V2 is able
to supply I3 but that item is not needed by project P1.
Project_Items
Project_no Item_code
P1 I1
P1 I2
P2 I1
P3 I1
P3 I2
(c)
Now if the Vendor_Supply and Vendor_Project relations are joined, the original
V_I_P can be constructed. If the intermediate result is then
joined with Project_Items, the following desired relation is obtained:
modified_V_I_P
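The two joins can be reproduced on the three tables (a), (b) and (c) printed above. This is only a sketch using Python sets of tuples; the variable names `two_way` and `three_way` are assumptions, and the output is simply what the joins give for those three tables.

```python
vendor_supply = {("V1", "I1"), ("V1", "I2"), ("V2", "I2"),
                 ("V2", "I3"), ("V3", "I1")}
vendor_project = {("V1", "P1"), ("V1", "P3"), ("V2", "P1"),
                  ("V3", "P1"), ("V3", "P2")}
project_items = {("P1", "I1"), ("P1", "I2"), ("P2", "I1"),
                 ("P3", "I1"), ("P3", "I2")}

# Join (a) and (b) on vendor code.
two_way = {(v, i, p)
           for v, i in vendor_supply
           for v2, p in vendor_project
           if v == v2}

# Join the intermediate result with (c) on project number and item code.
three_way = {(v, i, p) for v, i, p in two_way if (p, i) in project_items}
```

The intermediate relation contains the tuple (V2, I3, P1), which pairs vendor V2 with an item that project P1 does not need; the join with Project_Items removes it.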
Let us consider that certain technicians represent certain companies and also
specialise in servicing certain models of computer. The following relation
Business embodies these two constraints.
Business
The above relation contains two MVDs –
It is clear that the relation Business contains some redundancy due to the above
MVDs and may be reduced to 4NF by decomposing it into the following two
relations:
Represents Sells
Let us now assume that an additional constraint exists in this enterprise. Not all
technicians are trained to service all computer models. The services by
technicians are given below :
Therefore the relation Business does not correctly represent these constraints.
The solution is to add an MVD, Technician -->> Model, into the database.
This can be achieved if we join Business with the relation Training (given
below).
Training
Technician   Model
Chow D       Portable
Chow D       Rainbow
Smith J      AT
Smith J      RT
Smith J      PS2
Decomposition of Relation Schemes
Lossless Joins
Let us take a relation SAIP (supplier_name, address, item, price). Suppose it has
been decomposed into two schemes, SA and SIP. The fds are S → A and SI → P. A table
may be drawn as follows:
S A I P
------------------------------------------
a1 a2 b13 b14
a1 b22 a3 a4
Since S → A holds and the two rows agree on S, we may equate their symbols for A,
making b22 become a2. The resulting table is as follows:
S A I P
------------------------------------------
a1 a2 b13 b14
a1 a2 a3 a4
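In the resulting table the second row consists only of a-symbols, which is the condition for the decomposition to have a lossless join. The table test can be sketched as follows. The symbol naming ("a" and "b1", "b2", ... per column) and the function names are assumptions made for this example.

```python
def chase(schema, parts, fds):
    """Build the table for a decomposition and equate symbols using the fds."""
    # Row i has "a" in the columns of scheme Ri and its own b-symbol elsewhere.
    rows = [{attr: "a" if attr in part else f"b{i}" for attr in schema}
            for i, part in enumerate(parts, 1)]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if r1 is r2 or any(r1[x] != r2[x] for x in lhs):
                        continue
                    for attr in rhs:
                        if r1[attr] != r2[attr]:
                            # Keep the a-symbol if there is one.
                            keep = min(r1[attr], r2[attr])
                            drop = max(r1[attr], r2[attr])
                            for row in rows:
                                if row[attr] == drop:
                                    row[attr] = keep
                            changed = True
    return rows

def is_lossless(schema, parts, fds):
    """Lossless if some row ends up with a-symbols in every column."""
    return any(all(row[attr] == "a" for attr in schema)
               for row in chase(schema, parts, fds))

FDS = [("S", "A"), ("SI", "P")]
table = chase("SAIP", ["SA", "SIP"], FDS)
lossless = is_lossless("SAIP", ["SA", "SIP"], FDS)
```

For comparison, splitting SAIP into SA and IP leaves no row of a-symbols, so `is_lossless("SAIP", ["SA", "IP"], FDS)` is false.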
Example :
R = ABCDE, R1 = AD, R2 = AB, R3 = BE, R4 = CDE, R5 = AE
The table :
A B C D E
------------------------------------------------------------
a1 b12 b13 a4 b15
a1 a2 b23 b24 b25
b31 a2 b33 b34 a5
b41 b42 a3 a4 a5
a1 b52 b53 b54 a5
The above algorithm can be applied to any number of relations. But in the case
of decomposition into two relations there is a simpler test.
Note that these dependencies need not be in the given set F, it is sufficient that
they be in F+.
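A sketch of the two-relation test follows. It assumes the test is the usual one: the decomposition (R1, R2) has a lossless join if (R1 ∩ R2) → (R1 − R2) or (R1 ∩ R2) → (R2 − R1) is in F+. Membership in F+ is decided by computing the closure of R1 ∩ R2; the function names are illustrative.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """Two-scheme test: the common attributes must determine one side."""
    r1, r2 = set(r1), set(r2)
    reach = closure(r1 & r2, fds)
    return r1 - r2 <= reach or r2 - r1 <= reach

FDS = [("S", "A"), ("SI", "P")]
sa_sip = lossless_binary("SA", "SIP", FDS)   # common attribute S determines A
sa_ip = lossless_binary("SA", "IP", FDS)     # no common attribute at all
```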
Input : Relation scheme R and set of fds F, which we assume without loss of
generality to be a minimal cover.
Algorithm
i := 0
for each fd X → Y in F do
begin
    i := i + 1;
    Ri := XY;
end
if none of the schemes Rj, 1 ≤ j ≤ i,
contains a candidate key for R then
begin
    i := i + 1;
    Ri := any candidate key for R;
end
if ∪(j = 1..i) Rj ≠ R
then begin
    Ri+1 := R − ∪(j = 1..i) Rj;
    i := i + 1;
end
return (R1, R2, . . . , Ri)
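The algorithm above translates almost line by line into Python. This is a sketch that assumes, as the algorithm does, that F is already a minimal cover; attributes are single letters, and `some_candidate_key` is an illustrative helper that shrinks R to one candidate key.

```python
def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def some_candidate_key(schema, fds):
    """Drop attributes one by one while the rest still determines R."""
    key = set(schema)
    for attr in sorted(schema):
        if closure(key - {attr}, fds) >= set(schema):
            key.remove(attr)
    return frozenset(key)

def synthesize_3nf(schema, fds):
    schema = frozenset(schema)
    # One scheme XY for each fd X -> Y.
    schemes = [frozenset(lhs) | frozenset(rhs) for lhs, rhs in fds]
    # A scheme contains a candidate key exactly when it is a super key of R.
    if not any(closure(s, fds) >= schema for s in schemes):
        schemes.append(some_candidate_key(schema, fds))
    # Attributes not covered so far form one more scheme.
    leftover = schema - frozenset().union(*schemes)
    if leftover:
        schemes.append(leftover)
    return schemes

# fds of the CTHRSG scheme (course, teacher, hour, room, student, grade).
F = [("C", "T"), ("HR", "C"), ("HT", "R"), ("CS", "G"), ("HS", "R")]
schemes = synthesize_3nf("CTHRSG", F)
small = synthesize_3nf("ABC", [("A", "B")])
```

For CTHRSG the scheme HSR already contains the key HS, so no extra scheme is added. For the small scheme ABC with A → B, the key AC has to be added as a second scheme.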
Lossless Join Decomposition into BCNF
Any relation scheme has a lossless join decomposition into BCNF, and it has a
decomposition into 3NF that has a lossless join and is also dependency
preserving. However, there may be no decomposition of a relation scheme into
BCNF that is dependency preserving.
Example: Let us consider a relation scheme CTHRSG. The fds are as follows:
C → T      Each course has one teacher
HR → C     Only one course can meet in a room at one time
HT → R     A teacher can be in only one room at one time
CS → G     Each student has one grade in each course
HS → R     A student can be in only one room at one time
CTHRSG   (key = HS;  C → T, CS → G, HR → C, HS → R, TH → R)
    CSG   (key = CS;  CS → G)
    CTHRS   (key = HS;  C → T, TH → R, HR → C, HS → R)
        CT   (key = C;  C → T)
        CHRS   (key = HS;  CH → R, HR → C, HS → R)
            CHR   (keys = CH, HR;  CH → R, HR → C)
            CHS   (key = HS;  HS → C)
Tree of decomposition
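A decomposition of this kind can be sketched recursively: find an fd X → A that holds inside the current scheme with X not a super key of it, split the scheme into XA and the scheme without A, and repeat on both parts. The function names are assumptions, and the search over all subsets is only practical for small schemes. The order in which violations are found differs from the tree above, but for this example the procedure ends with the same four schemes as its leaves.

```python
from itertools import combinations

def closure(attrs, fds):
    """Return the set of attributes functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(scheme, fds):
    """Return (X, A): X -> A holds in scheme, X is not a super key of it."""
    for size in range(1, len(scheme)):
        for combo in combinations(sorted(scheme), size):
            x = frozenset(combo)
            # Closure under the full set F, restricted to this scheme.
            determined = closure(x, fds) & scheme
            if determined != scheme and determined - x:
                return x, min(determined - x)
    return None

def bcnf_decompose(scheme, fds):
    scheme = frozenset(scheme)
    violation = find_violation(scheme, fds)
    if violation is None:
        return [scheme]               # already in BCNF
    x, a = violation
    # Split into XA and the scheme without A, then recurse on both.
    return bcnf_decompose(x | {a}, fds) + bcnf_decompose(scheme - {a}, fds)

F = [("C", "T"), ("HR", "C"), ("HT", "R"), ("CS", "G"), ("HS", "R")]
leaves = bcnf_decompose("CTHRSG", F)
```

As in the tree, the dependency HT → R cannot be checked inside any single resulting scheme, which illustrates that a BCNF decomposition need not be dependency preserving.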