Chapter14 - Revised
Navathe
CHAPTER 14
Figure 14.4
Design # 2
◼ Update Anomaly:
◼ Changing the name of project number P1 from
“Billing” to “Customer-Accounting” may cause this
update to be made for all 100 employees working on
project P1.
◼ Insert Anomaly:
◼ Cannot insert a project unless an employee is
assigned to it.
◼ Conversely
◼ Cannot insert an employee unless he/she is assigned
to a project.
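The update and insert anomalies above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the table definition, the sample rows, and the helper names build_emp_proj, rename_project and try_insert_project_only are assumptions, not part of the original design.

```python
import sqlite3


def build_emp_proj(n_employees=100):
    """Create an in-memory EMP_PROJ table with n_employees rows, all on project P1."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE EMP_PROJ ("
        "Ssn TEXT NOT NULL, Pnumber TEXT NOT NULL, Hours REAL, "
        "Ename TEXT, Pname TEXT, PRIMARY KEY (Ssn, Pnumber))"
    )
    # Hypothetical sample data: the project name is repeated in every row.
    rows = [(f"{i:09d}", "P1", 10.0, f"Emp{i}", "Billing") for i in range(n_employees)]
    conn.executemany("INSERT INTO EMP_PROJ VALUES (?, ?, ?, ?, ?)", rows)
    return conn


def rename_project(conn, pnumber, new_name):
    """Rename a project and return how many rows had to be rewritten."""
    cur = conn.execute(
        "UPDATE EMP_PROJ SET Pname = ? WHERE Pnumber = ?", (new_name, pnumber)
    )
    return cur.rowcount


def try_insert_project_only(conn, pnumber, pname):
    """Try to record a project that has no employee yet (Ssn would be NULL)."""
    try:
        conn.execute(
            "INSERT INTO EMP_PROJ (Ssn, Pnumber, Pname) VALUES (NULL, ?, ?)",
            (pnumber, pname),
        )
        return True
    except sqlite3.IntegrityError:
        # Ssn is part of the primary key, so it cannot be NULL.
        return False


conn = build_emp_proj()
touched = rename_project(conn, "P1", "Customer-Accounting")  # update anomaly
inserted = try_insert_project_only(conn, "P2", "Payroll")    # insert anomaly
print(touched, inserted)
```

One logical change (renaming one project) rewrites every employee row for that project, and a project without employees cannot be stored at all.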
Copyright © 2017 Ramez Elmasri and Shamkant B. Navathe Slide 14- 14
EXAMPLE OF A DELETE ANOMALY
◼ Delete Anomaly:
◼ When a project is deleted, it will result in deleting all
the employees who work on that project.
◼ Alternatively, if an employee is the sole employee on a
project, deleting that employee would result in deleting
the corresponding project.
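A small sketch of both directions of the delete anomaly, with EMP_PROJ held as a list of tuples. The sample rows and the function names are hypothetical.

```python
# (Ssn, Pnumber, Ename, Pname): hypothetical sample rows
emp_proj = [
    ("111", "P1", "Smith", "Billing"),
    ("222", "P1", "Wong", "Billing"),
    ("333", "P2", "Zelaya", "Payroll"),
]


def delete_project(rows, pnumber):
    """Remove a project by dropping every row that mentions it."""
    return [r for r in rows if r[1] != pnumber]


def delete_employee(rows, ssn):
    """Remove an employee by dropping every row that mentions them."""
    return [r for r in rows if r[0] != ssn]


def employees(rows):
    return {r[0] for r in rows}


def projects(rows):
    return {r[1] for r in rows}


# Deleting project P1 also erases employees 111 and 222.
after_project_delete = delete_project(emp_proj, "P1")

# Employee 333 is the sole employee on P2, so deleting 333 erases P2 as well.
after_employee_delete = delete_employee(emp_proj, "333")

print(employees(after_project_delete), projects(after_employee_delete))
```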
Figure 14.3 Two relation schemas suffering from update anomalies. (a) EMP_DEPT and (b) EMP_PROJ.
◼ GUIDELINE 2:
◼ Design a schema that does not suffer from the
insertion, deletion and update anomalies.
◼ GUIDELINE 3:
◼ Relations should be designed such that their
tuples will have as few NULL values as possible
◼ Attributes that are NULL frequently could be
placed in separate relations (with the primary key)
◼ Reasons for nulls:
◼ Attribute not applicable or invalid
◼ Attribute value unknown (may exist)
◼ Value known to exist, but unavailable
◼ GUIDELINE 4:
◼ Design relation schemas so that they can be joined with
equality conditions on attributes that are appropriately
related (primary key, foreign key) pairs in a way that
guarantees that no spurious tuples are generated
If we attempt a NATURAL JOIN operation on EMP_PROJ1 and EMP_LOCS, the result produces many more tuples than the original set of tuples in EMP_PROJ.
Figure 14.4(b).
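The effect can be reproduced on a tiny example. In this sketch, EMP_PROJ is projected onto EMP_LOCS(Ename, Plocation) and EMP_PROJ1(Ssn, Pnumber, Plocation) and then re-joined on the shared attribute Plocation, which is neither a primary key nor a foreign key. The sample rows, the reduced attribute list, and the helpers project and natural_join are assumptions made for illustration.

```python
def project(rows, attrs):
    """Relational projection: keep only attrs and drop duplicate tuples."""
    result = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


def natural_join(left, right):
    """Natural join of two lists of dict-rows on their common attribute names."""
    if not left or not right:
        return []
    common = set(left[0]) & set(right[0])
    result = []
    for l in left:
        for r in right:
            if all(l[a] == r[a] for a in common):
                merged = {**l, **r}
                if merged not in result:
                    result.append(merged)
    return result


# Hypothetical EMP_PROJ state
emp_proj = [
    {"Ssn": "111", "Ename": "Smith", "Pnumber": 1, "Plocation": "Bellaire"},
    {"Ssn": "111", "Ename": "Smith", "Pnumber": 2, "Plocation": "Sugarland"},
    {"Ssn": "222", "Ename": "Wong", "Pnumber": 2, "Plocation": "Sugarland"},
    {"Ssn": "222", "Ename": "Wong", "Pnumber": 3, "Plocation": "Houston"},
]

emp_locs = project(emp_proj, ["Ename", "Plocation"])
emp_proj1 = project(emp_proj, ["Ssn", "Pnumber", "Plocation"])

joined = natural_join(emp_proj1, emp_locs)
spurious = [t for t in joined if t not in emp_proj]
print(len(emp_proj), len(joined), len(spurious))
```

Every original tuple is still in the join result, but the two Sugarland rows pair each Ssn with the other employee's name, so the join contains tuples that were never in EMP_PROJ.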
◼ Normalization:
◼ The process of decomposing unsatisfactory "bad"
relations by breaking up their attributes into
smaller relations
◼ Normal form:
◼ Condition using keys and FDs of a relation to
certify whether a relation schema is in a particular
normal form
Figure 14.3 Two relation schemas suffering from update anomalies. (a) EMP_DEPT and (b) EMP_PROJ.
◼ Denormalization:
◼ The process of storing the join of higher normal form relations
as a base relation—which is in a lower normal form
3.3 Definitions of Keys and Attributes Participating in Keys (1)
◼ A superkey of a relation schema R = {A1, A2, ..., An} is a set of
attributes S ⊆ R with the property that no two distinct tuples t1
and t2 in any legal relation state r of R will have t1[S] = t2[S].
◼ A key K is a superkey with the additional property that removing
any attribute from K leaves a set that is no longer a superkey.
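The two definitions can be checked against a single relation state, as in the sketch below. Note the limitation: a superkey must hold in every legal state, so one state can only refute a superkey, never prove it. The function names and the sample rows are hypothetical.

```python
def is_superkey_in_state(rows, attrs):
    """True if no two rows of this state agree on all attributes in attrs."""
    seen = set()
    for r in rows:
        value = tuple(r[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def is_key_in_state(rows, attrs):
    """True if attrs is a superkey here and dropping any one attribute breaks that."""
    attrs = list(attrs)
    if not is_superkey_in_state(rows, attrs):
        return False
    return all(
        not is_superkey_in_state(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )


# Hypothetical state of a relation with attributes Ssn, Pnumber, Hours
state = [
    {"Ssn": "111", "Pnumber": 1, "Hours": 32.5},
    {"Ssn": "111", "Pnumber": 2, "Hours": 7.5},
    {"Ssn": "222", "Pnumber": 2, "Hours": 10.0},
    {"Ssn": "222", "Pnumber": 3, "Hours": 10.0},
]

print(is_superkey_in_state(state, ["Ssn"]))                      # Ssn repeats
print(is_key_in_state(state, ["Ssn", "Pnumber"]))                # minimal
print(is_key_in_state(state, ["Ssn", "Pnumber", "Hours"]))       # not minimal
```

Checking single-attribute removals is enough for minimality, because any subset of a non-superkey is also not a superkey.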
◼ First normal form (1NF) disallows:
◼ composite attributes
◼ multivalued attributes
◼ nested relations; attributes whose values for an
individual tuple are non-atomic
◼ Considered to be part of the definition of a relation
◼ Most RDBMSs allow only those relations to be
defined that are in First Normal Form
Figure 14.10
Normalizing nested relations into 1NF. (a) Schema of the EMP_PROJ relation with a
nested relation attribute PROJS. (b) Sample extension of the EMP_PROJ relation
showing nested relations within each tuple. (c) Decomposition of EMP_PROJ into
relations EMP_PROJ1 and EMP_PROJ2 by propagating the primary key.
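The decomposition in part (c) of the figure can be sketched as follows: the nested PROJS attribute is removed into its own relation, and the primary key Ssn is propagated into it. The sample data and the function name to_1nf are assumptions made for illustration.

```python
# Hypothetical nested EMP_PROJ state: PROJS holds a relation inside each tuple.
nested_emp_proj = [
    {
        "Ssn": "123456789",
        "Ename": "Smith, John B.",
        "Projs": [{"Pnumber": 1, "Hours": 32.5}, {"Pnumber": 2, "Hours": 7.5}],
    },
    {
        "Ssn": "666884444",
        "Ename": "Narayan, Ramesh K.",
        "Projs": [{"Pnumber": 3, "Hours": 40.0}],
    },
]


def to_1nf(nested_rows):
    """Split a nested relation into two flat relations.

    EMP_PROJ1(Ssn, Ename) keeps the atomic attributes.
    EMP_PROJ2(Ssn, Pnumber, Hours) holds the unnested projects, with the
    primary key Ssn copied into every row.
    """
    emp_proj1 = []
    emp_proj2 = []
    for emp in nested_rows:
        emp_proj1.append((emp["Ssn"], emp["Ename"]))
        for proj in emp["Projs"]:
            emp_proj2.append((emp["Ssn"], proj["Pnumber"], proj["Hours"]))
    return emp_proj1, emp_proj2


emp_proj1, emp_proj2 = to_1nf(nested_emp_proj)
print(emp_proj1)
print(emp_proj2)
```

Every value in the two resulting relations is atomic, and the primary key of EMP_PROJ2 becomes the combination {Ssn, Pnumber}.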
◼ Examples:
◼ {SSN, PNUMBER} ➔ HOURS is a full FD, since neither
SSN ➔ HOURS nor PNUMBER ➔ HOURS holds
◼ {SSN, PNUMBER} ➔ ENAME is not a full FD (it is called a
partial dependency), since SSN ➔ ENAME also holds
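The two examples can be tested mechanically on a sample state. This sketch checks whether an FD holds in one given set of rows and whether it stops holding when any attribute is removed from the left-hand side. An FD is a constraint on all legal states, so a single state can only refute it; the rows and the function names fd_holds and is_full_fd are hypothetical.

```python
def fd_holds(rows, lhs, rhs):
    """True if rows that agree on lhs always agree on rhs (in this state)."""
    seen = {}
    for r in rows:
        x = tuple(r[a] for a in lhs)
        y = tuple(r[a] for a in rhs)
        if seen.setdefault(x, y) != y:
            return False
    return True


def is_full_fd(rows, lhs, rhs):
    """True if lhs -> rhs holds and fails once any lhs attribute is removed."""
    if not fd_holds(rows, lhs, rhs):
        return False
    return all(
        not fd_holds(rows, [a for a in lhs if a != dropped], rhs)
        for dropped in lhs
    )


# Hypothetical EMP_PROJ state
emp_proj = [
    {"SSN": "111", "PNUMBER": 1, "HOURS": 32.5, "ENAME": "Smith"},
    {"SSN": "111", "PNUMBER": 2, "HOURS": 7.5, "ENAME": "Smith"},
    {"SSN": "222", "PNUMBER": 2, "HOURS": 10.0, "ENAME": "Wong"},
    {"SSN": "222", "PNUMBER": 3, "HOURS": 20.0, "ENAME": "Wong"},
]

full_hours = is_full_fd(emp_proj, ["SSN", "PNUMBER"], ["HOURS"])
full_ename = is_full_fd(emp_proj, ["SSN", "PNUMBER"], ["ENAME"])
print(full_hours, full_ename)
```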
Figure 14.3 Two relation schemas suffering from update anomalies. (a) EMP_DEPT and (b) EMP_PROJ.
◼ That is:
◼ Lot numbers are unique only within each county, but
◼ Property_id# numbers are unique across counties for the entire state.
Figure 14.12
Normalization into 2NF and 3NF.
◼ NOTE:
◼ In X ➔ Y and Y ➔ Z, with X as the primary key, we consider
this a problem only if Y is not a candidate key.
◼ When Y is a candidate key, there is no problem with the
transitive dependency.
◼ E.g., consider EMP (SSN, Emp#, Salary).
◼ Here, SSN ➔ Emp# ➔ Salary and Emp# is a candidate key.
4. General Normal Form Definitions (For Multiple Keys) (1)
◼ Definition:
◼ Recall that a superkey of relation schema R is a set of
attributes S of R that contains a key of R
◼ A relation schema R is in third normal form (3NF) if
whenever a FD X → A holds in R, then either:
◼ (a) X is a superkey of R, or
◼ (b) A is a prime attribute of R
◼ The LOTS1 relation violates 3NF because Area ➔ Price holds,
Area is not a superkey of LOTS1, and Price is not a prime attribute
(see Figure 14.12).
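The general 3NF test can be sketched with attribute closure. The FD list for LOTS1 below is an assumption based on the running example (Property_id# determines everything, {County_name, Lot#} determines everything, and Area determines Price); the attribute names drop the "#" suffix, and the helper names closure, candidate_keys and violations_3nf are hypothetical. The check looks only at the listed FDs.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under a list of (lhs, rhs) FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """All minimal attribute sets whose closure is the whole schema."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            candidate = set(combo)
            if any(k <= candidate for k in keys):
                continue  # contains a smaller key, so not minimal
            if closure(candidate, fds) == set(schema):
                keys.append(candidate)
    return keys


def violations_3nf(schema, fds):
    """Return (lhs, attribute) pairs that break the general 3NF definition."""
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)
    bad = []
    for lhs, rhs in fds:
        lhs_is_superkey = closure(lhs, fds) == set(schema)
        for a in sorted(set(rhs) - set(lhs)):  # skip trivial parts
            if not lhs_is_superkey and a not in prime:
                bad.append((tuple(lhs), a))
    return bad


# Assumed schema and FDs for LOTS1
lots1 = {"Property_id", "County_name", "Lot", "Area", "Price"}
lots1_fds = [
    (("Property_id",), ("County_name", "Lot", "Area", "Price")),
    (("County_name", "Lot"), ("Property_id", "Area", "Price")),
    (("Area",), ("Price",)),
]

keys = candidate_keys(lots1, lots1_fds)
bad = violations_3nf(lots1, lots1_fds)
print(keys, bad)
```

Under these assumed FDs the only violation reported is Area ➔ Price: Area is not a superkey, and Price belongs to neither candidate key.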
Figure 14.13
Boyce-Codd normal form. (a) BCNF normalization of
LOTS1A with the functional dependency FD2 being lost in
the decomposition. (b) A schematic relation with FDs; it is
in 3NF, but not in BCNF due to the f.d. C → B.
◼ Suppose that we have thousands of lots in the relation but the lots are from only
two counties: DeKalb and Fulton.
◼ Suppose also that lot sizes in DeKalb County are only 0.5, 0.6, 0.7, 0.8, 0.9, and 1.0
acres, whereas lot sizes in Fulton County are restricted to 1.1, 1.2, … , 1.9, and 2.0
acres.
◼ In such a situation we would have the additional functional dependency FD5:
Area → County_name. If we add this to the other dependencies, the relation
schema LOTS1A still is in 3NF because this f.d. conforms to clause (b) in the
general definition of 3NF, County_name being a prime attribute.
◼ The fact that the area of a lot determines its county, as specified by FD5, can be
represented by 16 tuples in a separate relation R(Area, County_name), since
there are only 16 possible Area values (see Figure 14.13).
◼ This representation reduces the redundancy of repeating the same information in
the thousands of LOTS1A tuples. BCNF is a stronger normal form that would
disallow LOTS1A and suggest the need for decomposing it.
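A short sketch of the separate relation R(Area, County_name) described above. It builds one tuple per possible lot size from the two ranges given in the text; the function name build_area_county is hypothetical.

```python
def build_area_county():
    """R(Area, County_name): one tuple for each possible lot size."""
    r = []
    for tenths in range(5, 11):     # 0.5, 0.6, ..., 1.0 acres
        r.append((tenths / 10, "DeKalb"))
    for tenths in range(11, 21):    # 1.1, 1.2, ..., 2.0 acres
        r.append((tenths / 10, "Fulton"))
    return r


r = build_area_county()

# FD5 (Area -> County_name) holds in R: each Area value appears exactly once,
# so R can be used as a lookup from area to county.
area_to_county = dict(r)
print(len(r), area_to_county[0.5], area_to_county[2.0])
```

With this relation in place, the county of a lot follows from its area through 16 stored tuples, instead of being repeated in each of the thousands of LOTS1A tuples.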
Important Practice Example