DBMS_Module4
• There are two levels at which we can discuss the goodness of relation
schemas.
– logical (or conceptual) level—how users interpret the relation
schemas and the meaning of their attributes.
– implementation (or physical storage) level—how the tuples in a
base relation are stored and updated.
• As with many design problems, database design may be performed using
two approaches:
– Bottom-up or Top-down.
– A bottom-up design methodology (also called design by synthesis)
considers the basic relationships among individual attributes as the
starting point and uses those to construct relation schemas.
• In contrast, a top-down design methodology (also called design by
analysis) starts with a number of groupings of attributes into relations that
exist together naturally, for example, on an invoice, a form, or a report.
• One goal of schema design is to minimize the storage space used by the
base relations (and hence the corresponding files).
– Grouping attributes into relation schemas has a significant effect on
storage space.
• For example, compare the space used by the two base relations
EMPLOYEE and DEPARTMENT with that for an EMP_DEPT base
relation, which is the result of applying the NATURAL JOIN operation to
EMPLOYEE and DEPARTMENT.
• Storing natural joins of base relations leads to an additional problem
referred to as update anomalies.
– These can be classified into insertion anomalies, deletion anomalies,
and modification anomalies.
• Consider the relation:
– EMP_PROJ( Emp#, Proj#, Ename, Pname, No_hours)
• Insertion Anomalies:
– Suppose you want to add new information to a relation but cannot do so
because of some constraints.
– Ex:
• Cannot insert a project unless an employee is assigned to it.
• Conversely
– Cannot insert an employee unless he/she is assigned to a
project.
• It is difficult to insert a new department that has no employees as yet in
the EMP_DEPT relation.
• The only way to do this is to place NULL values in the attributes for
employee.
– This violates the entity integrity for EMP_DEPT because its primary
key Ssn cannot be null.
• This problem does not occur in the design of Figure 14.2 because a
department is entered in the DEPARTMENT relation whether or not any
employees work for it, and whenever an employee is assigned to that
department, a corresponding tuple is inserted in EMPLOYEE.
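The insertion anomaly can be reproduced with a small sketch using Python's built-in sqlite3 module. The column list is a simplified assumption (only Ssn, Ename, Dnumber, Dname), and the department values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Simplified, assumed EMP_DEPT schema: employee and department facts in one
# relation, keyed by Ssn. NOT NULL is stated explicitly to enforce entity integrity.
conn.execute("""
    CREATE TABLE EMP_DEPT (
        Ssn     TEXT PRIMARY KEY NOT NULL,
        Ename   TEXT,
        Dnumber INTEGER,
        Dname   TEXT
    )
""")

def try_insert_department_only(connection):
    """Try to record a department that has no employees yet."""
    try:
        connection.execute(
            "INSERT INTO EMP_DEPT (Ssn, Ename, Dnumber, Dname) "
            "VALUES (NULL, NULL, 6, 'Planning')"  # hypothetical department
        )
        return "inserted"
    except sqlite3.IntegrityError:
        return "rejected"

anomaly_result = try_insert_department_only(conn)

# With a separate DEPARTMENT relation the same fact is stored without trouble.
conn.execute("CREATE TABLE DEPARTMENT (Dnumber INTEGER PRIMARY KEY, Dname TEXT)")
conn.execute("INSERT INTO DEPARTMENT VALUES (6, 'Planning')")
department_count = conn.execute("SELECT COUNT(*) FROM DEPARTMENT").fetchone()[0]
```

Here the insert into EMP_DEPT is rejected because the key would be NULL, while the insert into DEPARTMENT succeeds.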
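Deletion Anomalies:
– If we delete from EMP_DEPT the tuple for the last employee working in a
department, the information about that department is lost as well.
– This problem does not occur in the database of Figure 14.2 because
DEPARTMENT tuples are stored separately.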
Modification Anomalies:
• In some schema designs we may group many attributes together into a “fat”
relation.
• If many of the attributes do not apply to all tuples in the relation, we end up
with many NULLs in those tuples.
• This can waste space at the storage level and may also lead to problems with
understanding the meaning of the attributes and with specifying JOIN
operations at the logical level.
• Another problem with NULLs is how to account for them when aggregate
operations such as COUNT or SUM are applied.
• SELECT and JOIN operations involve comparisons; if NULL values are
present, a comparison evaluates to unknown rather than true or false, so
results may be unexpected.
• NULLs can have multiple interpretations, such as: the attribute does not apply to
this tuple, the attribute value for this tuple is unknown, or the value is known but
absent (not yet recorded).
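The effect of NULLs on aggregates and comparisons can be seen in a short sqlite3 sketch. The EMPLOYEE table and its three rows are hypothetical and exist only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Ssn TEXT, Salary INTEGER)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?)",
    [("111", 1000), ("222", None), ("333", 3000)],  # one salary is NULL
)

# COUNT(*) counts rows; COUNT(col), SUM(col) and AVG(col) skip NULL values.
count_rows, count_salary, sum_salary, avg_salary = conn.execute(
    "SELECT COUNT(*), COUNT(Salary), SUM(Salary), AVG(Salary) FROM EMPLOYEE"
).fetchone()

# A comparison with NULL is unknown, so the NULL row satisfies neither
# 'Salary >= 2000' nor 'Salary < 2000' and is left out of the result.
matched = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE WHERE Salary >= 2000 OR Salary < 2000"
).fetchone()[0]
```

The table has three rows, but only two of them contribute to COUNT(Salary), SUM, AVG, and the WHERE clause.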
Guideline 3:
• As far as possible, avoid placing attributes in a base relation whose
values may frequently be NULL.
• If NULLs are unavoidable, make sure that they apply in exceptional
cases only and do not apply to a majority of tuples in the relation.
Generation of Spurious Tuples
• Bad designs for a relational database may result in erroneous results for
certain JOIN operations.
• A spurious tuple is a tuple in a join result that represents information
not present in the original relation.
– Spurious tuples typically arise when two tables are joined on attributes
that are neither primary keys nor foreign keys.
• The "lossless join" property is used to guarantee meaningful results for
join operations.
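A minimal sketch of how spurious tuples appear: a relation is projected onto two schemas that share only a non-key attribute and is then rejoined. The relation works_on, its rows, and the helper names are hypothetical.

```python
def project(relation, attrs):
    """Projection with duplicate elimination; rows are dicts."""
    seen = {tuple(row[a] for a in attrs) for row in relation}
    return [dict(zip(attrs, values)) for values in sorted(seen)]

def natural_join(left, right):
    """Natural join on all attribute names the two relations share."""
    joined = []
    for l in left:
        for r in right:
            common = set(l) & set(r)
            if all(l[a] == r[a] for a in common):
                joined.append({**l, **r})
    return joined

def as_set(relation):
    """Turn a list of dict rows into a set so relations can be compared."""
    return {frozenset(row.items()) for row in relation}

# Hypothetical data: two employees on different projects at the same location.
works_on = [
    {"Ename": "Smith", "Pname": "ProductX", "Plocation": "Houston"},
    {"Ename": "Wong", "Pname": "ProductZ", "Plocation": "Houston"},
]

# Plocation is neither a primary key nor a foreign key in either projection.
emp_locs = project(works_on, ["Ename", "Plocation"])
proj_locs = project(works_on, ["Pname", "Plocation"])

rejoined = natural_join(emp_locs, proj_locs)
spurious = as_set(rejoined) - as_set(works_on)
```

The rejoined relation has four tuples instead of two; the two extra ones pair each employee with a project he or she does not work on.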
Guideline 4:
• Design relation schemas so that they can be joined with equality
conditions on attributes that are appropriately related (primary key,
foreign key) pairs in a way that guarantees that no spurious tuples are
generated.
• Avoid relations that contain matching attributes that are not (foreign key,
primary key) combinations because joining on such attributes may
produce spurious tuples.
Functional Dependencies
• From the semantics of the attributes and the relation, we know that the
following functional dependencies should hold:
– Ssn → Ename
– Pnumber → {Pname, Plocation}
– {Ssn, Pnumber} → Hours
• These functional dependencies specify that:
– (a) the value of an employee’s Social Security number (Ssn) uniquely
determines the employee name (Ename),
– (b) the value of a project’s number (Pnumber) uniquely determines the
project name (Pname) and location (Plocation), and
– (c) a combination of Ssn and Pnumber values uniquely determines the
number of hours the employee currently works on the project per week
(Hours).
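A functional dependency is a statement about all legal states, so a single relation state can only refute it, never prove it. The sketch below checks whether a given state violates a dependency; the sample EMP_PROJ rows are hypothetical.

```python
def fd_holds(relation, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in relation:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical EMP_PROJ state used only for illustration.
emp_proj = [
    {"Ssn": "123", "Pnumber": 1, "Hours": 32.5, "Ename": "Smith",
     "Pname": "ProductX", "Plocation": "Bellaire"},
    {"Ssn": "123", "Pnumber": 2, "Hours": 7.5, "Ename": "Smith",
     "Pname": "ProductY", "Plocation": "Sugarland"},
    {"Ssn": "453", "Pnumber": 1, "Hours": 20.0, "Ename": "English",
     "Pname": "ProductX", "Plocation": "Bellaire"},
]

ok_a = fd_holds(emp_proj, ["Ssn"], ["Ename"])
ok_b = fd_holds(emp_proj, ["Pnumber"], ["Pname", "Plocation"])
ok_c = fd_holds(emp_proj, ["Ssn", "Pnumber"], ["Hours"])
not_an_fd = fd_holds(emp_proj, ["Ssn"], ["Hours"])  # Ssn alone does not fix Hours
```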
Normal Forms Based on Primary Keys
The attribute ID is the identification key. All attributes are single valued (1NF). The
table is also in 2NF.
(a) The LOTS relation with its functional dependencies FD1 through FD4.
• Suppose that the following two additional functional dependencies hold in
LOTS:
– dependency FD3 says that the tax rate is fixed for a given county (does
not vary lot by lot within the same county),
– whereas FD4 says that the price of a lot is determined by its area
regardless of which county it is in.
• The LOTS relation schema violates the general definition of 2NF because
Tax_rate is partially dependent on the candidate key {County_name,
Lot#}, due to FD3.
• To normalize LOTS into 2NF, we decompose it into the two relations LOTS1
and LOTS2, shown in Figure 14.12(b).
(b) Decomposing into the 2NF relations LOTS1 and LOTS2.
(c) Decomposing LOTS1 into the 3NF relations LOTS1A and LOTS1B.
General Definition of Third Normal Form:
• A relation schema R is in third normal form (3NF) if, whenever a
nontrivial functional dependency X → A holds in R, either
(a) X is a superkey of R, or
(b) A is a prime attribute of R. (each element of A is part of some
candidate key)
• According to this definition, LOTS2 (Figure 14.12(b)) is in 3NF.
• FD4 in LOTS1 violates 3NF because
– Area is not a superkey and Price is not a prime attribute in LOTS1.
• To normalize LOTS1 into 3NF, decompose it into the relation schemas
LOTS1A and LOTS1B shown in Figure 14.12(c).
– Both LOTS1A and LOTS1B are in 3NF.
(d) Progressive normalization of LOTS into a 3NF design.
Assignment
Convert the following invoice table to 1NF, 2NF, and 3NF.
1NF
2NF
3NF
• Identify the candidate key in the relation below. Decompose the table into
1NF, 2NF, 3NF.
• The dependencies STUD_NO → STUD_STATE and
STUD_STATE → STUD_COUNTRY hold.
– So STUD_COUNTRY is transitively dependent on STUD_NO, which
violates third normal form.
• To convert the relation to third normal form, we decompose it.
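The transitive dependency can be detected mechanically with the general 3NF definition. The sketch below is one possible check, not a prescribed method: it assumes the relation has only the three attributes named here, finds candidate keys by brute force, and inspects only the listed dependencies. The decomposition into (STUD_NO, STUD_STATE) and (STUD_STATE, STUD_COUNTRY) is the usual one for this pattern.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(attributes, fds):
    """Brute-force search for all minimal keys; fine for small schemas."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(sorted(attributes), size):
            s = set(combo)
            if any(k <= s for k in keys):
                continue  # a smaller key is already contained in s
            if closure(s, fds) >= set(attributes):
                keys.append(s)
    return keys

def violations_3nf(attributes, fds):
    """List (X, A) pairs where X is not a superkey and A is not prime."""
    prime = set().union(*candidate_keys(attributes, fds))
    found = []
    for lhs, rhs in fds:
        for a in sorted(rhs - lhs):  # nontrivial part only
            if not closure(lhs, fds) >= set(attributes) and a not in prime:
                found.append((lhs, a))
    return found

# Assumed schema: only the three attributes mentioned in the text.
student = {"STUD_NO", "STUD_STATE", "STUD_COUNTRY"}
fds = [({"STUD_NO"}, {"STUD_STATE"}), ({"STUD_STATE"}, {"STUD_COUNTRY"})]

before = violations_3nf(student, fds)
after_1 = violations_3nf({"STUD_NO", "STUD_STATE"}, fds[:1])
after_2 = violations_3nf({"STUD_STATE", "STUD_COUNTRY"}, fds[1:])
```

Before decomposition the check reports STUD_STATE → STUD_COUNTRY; after decomposition neither relation has a violation.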
– One student can enroll for multiple subjects. For example, the student with
student_id 101 has opted for the subjects Java and C++.
– In the table above, student_id and subject together form the primary key,
because together they determine all the other columns of
the table.
– One more important point to note here is, one professor teaches only
one subject, but one subject may have two different professors.
• Here columns manuf_year and color are independent of each other and
dependent on bike_model.
• In this case these two columns are said to be multivalued dependent on
bike_model. These dependencies can be represented like this:
» bike_model ->-> manuf_year
» bike_model ->-> color
4NF (Contd..)
• As you can see in the table above, student with s_id 1 has opted for two
courses, Science and Maths, and has two hobbies, Cricket and Hockey.
• What is the problem?
• The two records for the student with s_id 1 give rise to two more records, as
shown below, because the student has two hobbies, and each hobby must be
listed along with each of the courses.
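The requirement that every course be paired with every hobby is exactly what a multivalued dependency demands. The sketch below tests s_id ->-> course on a relation state; the column names course and hobby and the helper mvd_holds are assumptions made for the illustration.

```python
from itertools import product

def mvd_holds(relation, x, y):
    """Check X ->-> Y on a non-empty relation state given as a list of dicts."""
    attrs = list(relation[0])
    z = [a for a in attrs if a not in x and a not in y]
    rows = {tuple(r[a] for a in attrs) for r in relation}
    for t1, t2 in product(relation, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            # The tuple taking X and Y from t1 and the rest from t2 must exist.
            t3 = {**{a: t1[a] for a in x + y}, **{a: t2[a] for a in z}}
            if tuple(t3[a] for a in attrs) not in rows:
                return False
    return True

two_records = [
    {"s_id": 1, "course": "Science", "hobby": "Cricket"},
    {"s_id": 1, "course": "Maths", "hobby": "Hockey"},
]
# Every course combined with every hobby for the same student.
four_records = [
    {"s_id": 1, "course": c, "hobby": h}
    for c in ("Science", "Maths")
    for h in ("Cricket", "Hockey")
]

incomplete = mvd_holds(two_records, ["s_id"], ["course"])
complete = mvd_holds(four_records, ["s_id"], ["course"])
```

Splitting the table into (s_id, course) and (s_id, hobby) stores the same facts in two rows each and removes the need for the cross product.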
• Fifth normal form is reached when tables have been decomposed as far as
possible to remove redundancy without losing information.
• Also known as project join NF.
• Once it is in fifth normal form it cannot be broken into smaller relations
without changing the facts or the meaning.
• A relation decomposed into two relations must have the lossless join
property, which ensures that no spurious or extra tuples are generated
when the relations are reunited through a natural join.
• A relation R is in 5NF if, whenever a nontrivial join dependency
*{R1, R2, …, Rn} holds in R, every Ri (all the attributes of Ri) is a superkey of R.
Example:
• Let us consider the relation STOCK(Agent, Company, Product)
• We assume that:
• A key ‘K’ is a superkey with the additional property that removal of any
attribute from ‘K’ will cause ‘K’ not to be a superkey any more.
– Ssn → Ssn
– Dnumber → Dname
• The closure F+ of F is the set of all functional dependencies that can be inferred
from F.
• A set of inference rules can be used to infer new dependencies from a given set
of dependencies.
• We use the notation F |= X → Y to denote that the functional dependency X → Y is
inferred from the set of functional dependencies F.
• They were proposed first by Armstrong (1974) and hence are known as
Armstrong’s axioms.
• Reflexive Rule (IR1)
– In the reflexive rule, if Y is a subset of X, then X determines Y.
• If X ⊇ Y then X → Y
• Example:
– X = {a, b, c, d, e}
– Y = {a, b, c}
• Augmentation Rule (IR2)
– Augmentation adds the same attributes to both sides of a dependency.
– If X determines Y, then XZ determines YZ for any Z.
• If X → Y then XZ → YZ
• Example:
– For R(ABCD), if A → B then AC → BC
• Transitive Rule (IR3)
– In the transitive rule, if X determines Y and Y determines Z, then X
must also determine Z.
• If X → Y and Y → Z then X → Z
• Union or additive Rule (IR4)
– Union rule says, if X determines Y and X determines Z, then X must
also determine Y and Z
• If X → Y and X → Z then X → YZ
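– Proof: from X → Y, augmenting with X gives X → XY (IR2); from X → Z,
augmenting with Y gives XY → YZ (IR2); by transitivity (IR3), X → YZ.
• Decomposition or Projective Rule (IR5)
– If X determines Y and Z together, then X determines Y and X determines Z.
• If X → YZ then X → Y and X → Z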
Definition:
For each such set of attributes X, we determine the set X+ of attributes that
are functionally determined by X based on F; X+ is called the closure of X
under F.
Determining X+, the Closure of X under F
• Algorithm 15.1 starts by setting X+ to all the attributes in X. By IR1, we
know that all these attributes are functionally dependent on X.
• Using inference rules IR3 and IR4, we add attributes to X+, using each
functional dependency in F.
• We keep going through all the dependencies in F (the repeat loop) until no
more attributes are added to X+ during a complete cycle (of the for loop)
through the dependencies in F.
• For example, consider the following relation schema about classes held at a
university in a given academic year.
CLASS ( Classid, Course#, Instr_name, Credit_hrs, Text, Publisher,
Classroom, Capacity)
• Let F, the set of functional dependencies for the above relation include the
following f.d.s:
– FD1: Classid → Course#, Instr_name, Credit_hrs, Text, Publisher,
Classroom, Capacity;
– FD2: Course# → Credit_hrs;
– FD3: {Course#, Instr_name} → Text, Classroom;
– FD4: Text → Publisher
– FD5: Classroom → Capacity
• Using the inference rules about the FDs and applying the definition of
closure, we can define the following closures:
– { Classid } + = { Classid , Course#, Instr_name, Credit_hrs, Text,
Publisher, Classroom, Capacity } = CLASS
– { Course#} + = { Course#, Credit_hrs}
– {Course#, Instr_name}+ = {Course#, Instr_name, Credit_hrs, Text, Publisher,
Classroom, Capacity}
– {Text}+ = {Text, Publisher}
– {Classroom}+ = {Classroom, Capacity}
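The closure computation described above can be sketched in a few lines. The representation (each dependency as a pair of Python sets) and the function name closure are choices made for this illustration. Note that the closure of a set always contains the set itself, by IR1.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)                 # start with X itself (IR1)
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs           # lhs is determined, so rhs is too
                changed = True
    return result

# FD1 to FD5 for the CLASS relation, as listed above.
F = [
    ({"Classid"}, {"Course#", "Instr_name", "Credit_hrs", "Text",
                   "Publisher", "Classroom", "Capacity"}),
    ({"Course#"}, {"Credit_hrs"}),
    ({"Course#", "Instr_name"}, {"Text", "Classroom"}),
    ({"Text"}, {"Publisher"}),
    ({"Classroom"}, {"Capacity"}),
]

classid_plus = closure({"Classid"}, F)
course_plus = closure({"Course#"}, F)
course_instr_plus = closure({"Course#", "Instr_name"}, F)
```

Only {Classid}+ covers every attribute of CLASS, which is why Classid is a key and {Course#, Instr_name} is not.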
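Consider the relation schema EMP_PROJ. From the semantics of the
attributes, we specify the following set F of functional dependencies:
• SSN → ENAME
• PNUMBER → {PNAME, PLOCATION}
• {SSN, PNUMBER} → HOURS
• Using Algorithm 15.1, we calculate the following closure sets with respect
to F:
– {SSN}+ = {SSN, ENAME}
– {PNUMBER}+ = {PNUMBER, PNAME, PLOCATION}
– {SSN, PNUMBER}+ = {SSN, PNUMBER, ENAME, PNAME, PLOCATION, HOURS}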
Equivalence of Sets of Functional Dependencies
Step 3. Since FD2 covers FD1 and FD1 covers FD2, the two FD sets are
equivalent: every dependency in one set can be inferred from the other.
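The cover test in both directions can be sketched with attribute closures. The two sets F1 and F2 below are hypothetical and are not the sets from the slide; attributes are single letters.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, g):
    """True if every dependency in g can be inferred from f."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in g)

def equivalent(f, g):
    """Two FD sets are equivalent when each covers the other."""
    return covers(f, g) and covers(g, f)

# Hypothetical FD sets over attributes A, C, D, E, H.
F1 = [({"A"}, {"C"}), ({"A", "C"}, {"D"}), ({"E"}, {"A", "D"}), ({"E"}, {"H"})]
F2 = [({"A"}, {"C", "D"}), ({"E"}, {"A", "H"})]
F3 = [({"A"}, {"C"})]  # strictly weaker than F1

same = equivalent(F1, F2)
weaker = equivalent(F1, F3)
```

F1 and F2 come out equivalent, while F3 is covered by F1 but does not cover it.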
Find whether the following FD sets are equivalent
Definition:
A canonical cover of a set of functional dependencies F is a simplified set
of functional dependencies that has the same closure as the original set F.
• Canonical cover: A canonical cover Fc of a set of functional dependencies
F is a set such that ALL of the following properties are satisfied:
Algorithm 15.2. Finding a Minimal Cover F for a Set of
Functional Dependencies E
Input: A set of functional dependencies E.
1. Set F := E.
2. Replace each functional dependency X → {A1, A2, … , An} in F by the n
functional dependencies X → A1, X → A2, … , X → An.
3. For each functional dependency X → A in F
– for each attribute B that is an element of X
» if { {F − {X → A} } ∪ { (X − {B} ) → A} } is equivalent to F
• then replace X → A with (X − {B} ) → A in F.
– [This constitutes removal of an extraneous attribute B contained in
the left-hand side X of a functional dependency X → A when possible]
4. For each remaining functional dependency X → A in F if {F − {X → A} } is
equivalent to F, then remove X → A from F.
– [This constitutes removal of a redundant functional dependency X →
A from F when possible]
Example 1: Let the given set of FDs be E: {B → A, D → A, AB → D}. We have to
find the minimal cover of E.
• All the above dependencies are in canonical form (that is, they have only one attribute
on the right-hand side), so step 2 of Algorithm 15.2 is already complete and we can
proceed to step 3.
• In step 3 we need to determine whether AB → D has any redundant (extraneous) attribute
on the left-hand side; that is, can it be replaced by B → D or A → D?
– Since B → A, by augmenting with B on both sides (IR2), we have BB → AB, or
B → AB (i). However, AB → D as given (ii).
– Hence by the transitive rule (IR3), we get from (i) and (ii), B → D. Thus AB →
D may be replaced by B → D.
• We now have a set equivalent to the original E, say E′: {B → A, D → A, B → D}.
• No further reduction is possible in step 3 since all FDs have a single attribute on the
left-hand side.
• In step 4 we look for a redundant FD in E′. By using the transitive rule on B → D
and D → A, we derive B → A. Hence B → A is redundant in E′ and can be
eliminated.
• Therefore, the minimal cover of E is F: {B → D, D → A}.
Example 2: Let the given set of FDs be G: {A → BCDE, CD → E}.
• Here, the given FDs are NOT in the canonical form. So we first convert
them into:
• E: {A → B, A→ C, A→ D, A→ E, CD → E}.
• In step 3 of the algorithm, for CD → E, neither C nor D is extraneous on
the left-hand side, since we cannot derive C → E or D → E from the
given FDs. Hence we cannot replace it with either.
• In step 4, we want to see if any FD is redundant. Since A → CD and CD →
E, by the transitive rule (IR3), we get A → E. Thus, A → E is redundant in E.
• So we are left with the set F, equivalent to the original set G: {A → B,
A → C, A → D, CD → E}. F is the minimal cover.
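The steps of Algorithm 15.2 can be sketched as follows, assuming single-letter attribute names so that a dependency can be written as a pair of strings. The extraneous-attribute test uses the equivalent closure check "A is in (X − {B})+"; the helper names are illustrative. A set of FDs may have several minimal covers, and this sketch returns the one its processing order happens to produce.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def minimal_cover(fds):
    """fds: list of (lhs, rhs) strings, one character per attribute."""
    # Step 2: one attribute on every right-hand side.
    f = [(frozenset(lhs), frozenset(a)) for lhs, rhs in fds for a in sorted(rhs)]
    # Step 3: drop extraneous attributes from left-hand sides.
    for i, (lhs, rhs) in enumerate(f):
        for b in sorted(lhs):
            reduced = lhs - {b}
            if reduced and rhs <= closure(reduced, f):
                lhs = reduced
                f[i] = (lhs, rhs)
    # Step 4: drop dependencies that follow from the remaining ones.
    result = list(dict.fromkeys(f))      # remove duplicates, keep order
    for fd in list(result):
        rest = [g for g in result if g != fd]
        if fd[1] <= closure(fd[0], rest):
            result = rest
    return set(result)

def show(cover):
    """Readable, sorted form such as ['B->D', 'D->A']."""
    return sorted("".join(sorted(l)) + "->" + "".join(sorted(r)) for l, r in cover)

example_1 = show(minimal_cover([("B", "A"), ("D", "A"), ("AB", "D")]))
example_2 = show(minimal_cover([("A", "BCDE"), ("CD", "E")]))
```

For the two examples above this yields {B → D, D → A} and {A → B, A → C, A → D, CD → E}.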
• F = {A → C, AB → C, C → DI, CD → I, EC → AB, EI → C}
• F={BC->ADEF,F->DE}
Algorithm 15.2(a). Finding a Key K for R Given a Set F of Functional
Dependencies
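A common formulation of this algorithm starts with K equal to all attributes of R and removes an attribute whenever the remaining set still determines every attribute. The sketch below follows that formulation, using EMP_PROJ as the example; it returns only one key, and which key is found may depend on the order in which attributes are tried.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_key(attributes, fds):
    """Return one key of R by shrinking the full attribute set."""
    key = list(attributes)                    # 1. Set K := R
    for a in list(attributes):                # 2. For each attribute A in K
        candidate = [b for b in key if b != a]
        if closure(candidate, fds) >= set(attributes):
            key = candidate                   #    (K - A)+ = R, so drop A
    return set(key)

R = ["Ssn", "Pnumber", "Ename", "Pname", "Plocation", "Hours"]
F = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname", "Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

key = find_key(R, F)
```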
Definition:
Formally, a decomposition D = {R1, R2, … , Rm} of R has the lossless
(nonadditive) join property with respect to the set of dependencies F on R if,
for every relation state r of R that satisfies F, the following holds, where
* is the NATURAL JOIN of all the relations in D: *(πR1(r), … , πRm(r)) =
r.
• The word loss in lossless refers to loss of information, not to loss of tuples. If a
decomposition does not have the lossless join property, we may get additional spurious
tuples.
• The nonadditive join property ensures that no spurious tuples result after the application of
PROJECT and JOIN operations.
Algorithm 15.3. Testing for Nonadditive Join Property
• Example: Consider the schema R = ABCD, subject to FDs F = {A → B, B →
C}, and the non-binary partition D1 = {ACD, AB, BC}. Question: Is D1 a
lossless decomposition?
• Example: Consider the schema R = ABCD, subject to FDs F =
{A → B, B → C}, and the binary partition D2 = {ABC, AD}.
• Example: Consider the schema R = ABCDE, subject to FDs F = {A → C,
B → C, C → D, DE → C, CE → A}, and the non-binary partition D4 = {AD,
B, BE, CDE, AE}. Question: Is D4 a lossless decomposition?
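The nonadditive join test is usually carried out with a matrix of symbols (the chase). The sketch below is one way to implement it, with single-letter attributes assumed: each row stands for one Ri, "a" marks an attribute that belongs to Ri, and "b0", "b1", … are row-specific placeholders. It is run on D1 and D2 from the examples above and on a hypothetical partition {AB, CD}.

```python
def is_lossless(attributes, decomposition, fds):
    """Matrix test for the nonadditive join property."""
    # Row i has "a" in the columns of Ri and a unique "b<i>" elsewhere.
    rows = [
        {a: ("a" if a in ri else f"b{i}") for a in attributes}
        for i, ri in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for r1 in rows:
                for r2 in rows:
                    if all(r1[a] == r2[a] for a in lhs):
                        for a in rhs:
                            if r1[a] != r2[a]:
                                # Equate the two symbols; "a" sorts before any
                                # "b…" symbol, so it is kept when present.
                                keep = min(r1[a], r2[a])
                                drop = max(r1[a], r2[a])
                                for r in rows:
                                    if r[a] == drop:
                                        r[a] = keep
                                changed = True
    # Lossless if some row consists of "a" symbols only.
    return any(all(r[a] == "a" for a in attributes) for r in rows)

R = "ABCD"
F = [({"A"}, {"B"}), ({"B"}, {"C"})]

d1 = is_lossless(R, [set("ACD"), set("AB"), set("BC")], F)
d2 = is_lossless(R, [set("ABC"), set("AD")], F)
lossy = is_lossless(R, [set("AB"), set("CD")], F)   # hypothetical partition
```

Both D1 and D2 pass the test, while {AB, CD} does not because the two schemas share no attributes.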
• Example: Your turn…
– Consider the schema R = ABCD, subject to FDs F = {A → B, C → D},
and the non-binary partitions D4 = {AB, AC, AD} and D5 = {AB, AC,
CD}. Question: Are partitions D4 and D5 lossless decompositions?
Algorithms for Relational Database
Schema Design
■ Preserves dependencies
■ Has the nonadditive join property
■ Is such that each resulting relation schema in the decomposition is in
3NF
Algorithm 15.4 Relational Synthesis into 3NF with Dependency Preservation
and Nonadditive Join Property
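The synthesis approach is commonly presented in four steps: find a minimal cover, create one relation per distinct left-hand side, add a relation holding a key of R if none of the relations contains one, and drop any relation that is contained in another. The sketch below follows that outline and assumes the minimal cover is already given; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def find_key(attributes, fds):
    """Return one key of R by shrinking the full attribute set."""
    key = list(attributes)
    for a in list(attributes):
        candidate = [b for b in key if b != a]
        if closure(candidate, fds) >= set(attributes):
            key = candidate
    return set(key)

def synthesize_3nf(attributes, cover):
    """cover: a minimal cover (assumed given) as a list of (lhs, rhs) sets."""
    # One schema per distinct left-hand side X, holding X and all its A's.
    groups = {}
    for lhs, rhs in cover:
        groups.setdefault(frozenset(lhs), set(lhs)).update(rhs)
    schemas = list(groups.values())
    # Add a key schema if no schema is a superkey of R.
    if not any(closure(s, cover) >= set(attributes) for s in schemas):
        schemas.append(find_key(attributes, cover))
    # Drop schemas that are strictly contained in another schema.
    return [s for s in schemas if not any(s < t for t in schemas)]

R = ["Ssn", "Pnumber", "Ename", "Pname", "Plocation", "Hours"]
cover = [
    ({"Ssn"}, {"Ename"}),
    ({"Pnumber"}, {"Pname"}),
    ({"Pnumber"}, {"Plocation"}),
    ({"Ssn", "Pnumber"}, {"Hours"}),
]

schemas = sorted(sorted(s) for s in synthesize_3nf(R, cover))
```

For EMP_PROJ this produces three relations: (Ssn, Ename), (Pnumber, Pname, Plocation), and (Ssn, Pnumber, Hours).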
• DKNF stands for Domain Key Normal Form. It requires that the database
contain no constraints other than domain constraints and key constraints.
• In DKNF, it is easy to build a database.
• It avoids general constraints in the database which are not clear domain or
key constraints.
• A constraint in this definition is any rule that’s precise enough that you can
evaluate whether or not it’s true. A key is a unique identifier of a row in a
table. A domain is the set of permitted values of an attribute.
• The 3NF, 4NF, 5NF and BCNF are special cases of the DKNF.
• It is achieved when every constraint on the relation is a logical consequence
of the definitions of its keys and domains.
• Look at this database, which is in 1NF, to see what you must do to put that
database in DK/NF.