Lecture Notes on Database Normalization

Chengkai Li
Department of Computer Science and Engineering
The University of Texas at Arlington

April 15, 2012

I decided to write this document because many students do not have the textbook, the slides
alone are not sufficient, and we did not have enough time in lecture to elaborate on the topics in
further detail. My goal is to summarize the concepts we learned and explain various points about
normalization through examples. These examples can help you solve similar problems in homework
and exams. To thoroughly understand these topics, you should read the textbook.

1 Keys of a Relation and Transitive Closure

1.1 Concepts and Procedures
Concept 1 (superkeys, candidate keys (keys), primary key, secondary keys) Here we summarize
various concepts related to keys.
Superkeys: A set of attributes in a relation is a superkey if it can determine the rest of the
attributes in the relation. In other words, given a tuple, if we know its values on all the attributes
in a superkey, then we can determine its values on all remaining attributes. Under set semantics,
it means we can uniquely identify a tuple by its superkey value; under bag semantics, it means we
can identify the identical tuples by their superkey value.
Candidate keys (also simply called keys): A candidate key is a superkey none of whose proper
subsets is a superkey. In other words, a candidate key is a minimal superkey. It cannot be made
smaller. Removing any attribute from it will disqualify it as a superkey. (Hence any proper superset
of a candidate key is not a candidate key, and any proper subset of a candidate key is not a candidate
key either.)
Primary key: If a relation has multiple candidate keys, then one of the candidate keys is
designated as the primary key.
Secondary keys: If a relation has multiple candidate keys, except the primary key, every other
candidate key is a secondary key.

Concept 2 (prime attribute, nonprime (nonkey) attribute) An attribute is a prime attribute
if it belongs to at least one candidate key. (Note that we cannot replace “candidate key” by “superkey” in
this concept, because the set of all attributes of a relation is also a superkey, making every attribute
belong to at least one superkey.)
An attribute is a nonprime (nonkey) attribute if it is not a prime attribute, i.e., it is not part
of any candidate key.

Note: Table 15.1 in the textbook (and in the slides of Chapter 15) is not fully rigorous.
For 3NF, it says “Relation should not have a nonkey attribute functionally determined by another
nonkey attribute (or by a set of nonkey attributes).” It should be “Relation should not have a
nonkey attribute functionally determined by another nonkey attribute (or by a set of attributes
that is not a superkey).” The original statement does not cover the case in which a nonkey attribute
is functionally determined by a set of attributes that contains both nonkey and prime attributes but
is not a superkey.

Concept 3 (Functional Dependency) The concept of functional dependency (FD) was repeatedly
explained in lectures and you can find its definition and explanation in slides and textbook.
It is very important for understanding all other concepts.
Note that given any superkey (and thus any candidate key) X in a relation schema and
any set of attributes Y in the schema, there exists an FD X → Y. (The FD is nontrivial if Y is
not a subset of X.)

Concept 4 (Transitive Closure) Given a set of attributes X, the transitive closure of X, denoted
by X+, is the set of attributes that can be derived directly or transitively from X according to
functional dependencies (FDs). To simplify the notation, we also use a+ (instead of {a}+) to
denote the transitive closure of a single attribute a.

Procedure 1 (How to compute transitive closure?) Given a set of attributes X, to compute
X+, we start with X+ = X. If there exists a functional dependency (FD) A → B such that A ⊆ X+,
we will make X+ = X+ ∪ B. We keep doing this until we reach a fixed point, i.e., X+ does not change
anymore.
A less formal and more intuitive way of describing the procedure is as follows. At the beginning,
the transitive closure of X includes the attributes in X itself. We find functional dependencies (FDs)
whose left-hand sides are included in the transitive closure and expand the transitive closure to
include the right-hand sides. We keep doing this, until the transitive closure cannot be further
expanded.
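The fixed-point procedure above can be sketched in a few lines of Python. This is only an illustration, not part of the original notes: the function name closure and the encoding of FDs as (left-hand side, right-hand side) pairs of strings, with one character per attribute, are assumptions made here for brevity.

```python
def closure(attrs, fds):
    """Return the transitive closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is a string whose
    characters are single-letter attribute names (an assumed encoding).
    """
    result = set(attrs)              # start with X+ = X
    changed = True
    while changed:                   # repeat until a fixed point is reached
        changed = False
        for lhs, rhs in fds:
            # If the left-hand side is inside X+, add the right-hand side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# The FDs of Problem 1 in Section 1.2: a -> bc, cd -> e, b -> d, e -> a
fds = [("a", "bc"), ("cd", "e"), ("b", "d"), ("e", "a")]
print("".join(sorted(closure("a", fds))))   # abcde
print("".join(sorted(closure("b", fds))))   # bd
```

The loop stops as soon as one full pass over the FDs adds nothing new, which is exactly the fixed point described in Procedure 1.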

Procedure 2 (How to find candidate keys by transitive closure?) Given a set of attributes
X in a relation schema, if its transitive closure X+ contains all attributes in the schema, then X is a
superkey. Furthermore, if none of X’s proper subsets can be a superkey, then X is a candidate key.
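As a rough sketch of Procedure 2, the following Python code enumerates attribute sets from small to large and keeps those whose closure covers the whole schema. The names closure and candidate_keys and the string encoding of attributes are assumptions of this sketch. Because sets are tried in increasing size and supersets of already found keys are skipped, every set that is kept is a minimal superkey.

```python
from itertools import combinations


def closure(attrs, fds):
    """Transitive closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    """Brute-force search for all candidate keys (exponential in the schema size)."""
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue                      # superset of a key: not minimal
            if closure(x, fds) == schema:     # X+ covers the schema: superkey
                keys.append(x)
    return keys


# Example: R(a, b, c, d, e) with a -> bc, cd -> e, b -> d, e -> a
fds = [("a", "bc"), ("cd", "e"), ("b", "d"), ("e", "a")]
keys = candidate_keys("abcde", fds)
print(sorted("".join(sorted(k)) for k in keys))   # ['a', 'bc', 'cd', 'e']
```

The exhaustive enumeration is only practical for small schemas such as the ones in these notes.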

1.2 Examples
Problem 1 Consider the following FDs for relation schema R(a, b, c, d, e): a → bc; cd → e; b →
d; e → a. List all candidate keys for R.

Solution 1 To find candidate keys, we can compute the transitive closure of every possible subset
of the attributes, i.e., every possible left-hand side of an FD.
Compute a+ : Begin with a+ ={a}; expand a+ to {abc} by FD a → bc; further expand it to
{abcd} by FD b → d; further expand it to {abcde} by FD cd → e. Hence a+ ={abcde}.
Compute b+ : b+ ={b}; then b+ ={bd} by FD b → d.
Compute c+ : c+ ={c}. It cannot be expanded, because none of the FDs has only c on its
left-hand side.
Compute d+ : d+ ={d}.
Compute e+ : e+ ={e}; then e+ ={ae} by FD e → a; we already know a+ ={abcde}, therefore
e+ ={abcde}.
Since a+ ={abcde} and e+ ={abcde}, we know that both a and e are candidate keys.
To find remaining candidate keys, we can continue the enumeration of all possible combinations
of attributes and compute their transitive closures. However, we can avoid the enumeration of
several cases, by making the following observation.

Remember we showed in Concept 1 that any proper superset of a candidate key cannot be a
candidate key. Instead, such a proper superset is only a superkey. Hence, we can conclude that
any proper superset of a or e is not a candidate key. Based on this observation, any multi-attribute
candidate key must not contain a or e.
For finding possible two-attribute candidate keys, we only need to look at bc, bd, and cd.
{bc}+ : First {bc}+ ={bc}; then {bc}+ ={bcd} by b → d; furthermore {bc}+ ={bcde} by cd →
e; finally {bc}+ ={abcde} by e → a. Neither b nor c alone is a superkey. Therefore bc is a
candidate key.
{bd}+ : First {bd}+ ={bd}. It cannot be further expanded, so bd is not a superkey. (Another way
of reasoning about it: We have FD b → d. Hence {bd}+ is equal to b+ , which we have already computed.)
{cd}+ : First {cd}+ ={cd}; then {cd}+ ={cde} by cd → e. Based on the previous steps, we
already know e+ ={abcde}. Hence {cd}+ ={abcde}. Since neither c nor d alone is a superkey, cd is
also a candidate key.
For finding possible three-attribute candidate keys, we only need to consider bcd. However, it
cannot be a candidate key, because it is a proper superset of the candidate keys bc and cd.
In summary, a, e, bc, cd are all the candidate keys.

Problem 2 Consider the following FDs for relation schema R(a, b, c, d, e): ab → c; cd → e; de →
b. List all candidate keys for R.

Solution 2 Again, we can enumerate all subsets of the schema including itself and compute the
transitive closures of these subsets. However, we can save time and avoid such exhaustive
examination, by following some key observations.
Note that none of the FDs has a or d in its right-hand side. In other words, a and d cannot be
derived from any other attributes. Hence any candidate key must contain both a and d.
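Consider possible 2-attribute candidate keys, i.e., ad. {ad}+ ={ad}. It is not a candidate key.
Consider possible 3-attribute candidate keys, i.e., abd, acd, ade. {abd}+ ={abcde}, {acd}+ ={abcde},
{ade}+ ={abcde}. All of them are superkeys. They are also minimal, because the only smaller set
that contains both a and d is ad, which is not a superkey. Hence all of them are candidate keys.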
It is easy to see that there cannot be any 4-attribute or 5-attribute candidate keys. As mentioned
above, a candidate key must contain both a and d. Thus a 4-attribute or 5-attribute candidate key
must be a superset of abd, or acd, or ade, which are all candidate keys. That would disqualify it
from being considered a candidate key.
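The shortcut used in Solution 2 can also be written down as code. In the sketch below (hypothetical helper names, same string encoding of FDs as assumed elsewhere), the attributes that never appear on a right-hand side form a core that every candidate key must contain, and only extensions of that core are enumerated.

```python
from itertools import combinations


def closure(attrs, fds):
    """Transitive closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def core_attributes(schema, fds):
    """Attributes on no right-hand side; they cannot be derived from others."""
    on_rhs = set()
    for _lhs, rhs in fds:
        on_rhs |= set(rhs)
    return set(schema) - on_rhs


def candidate_keys(schema, fds):
    """Enumerate only attribute sets that contain the core."""
    schema = set(schema)
    core = core_attributes(schema, fds)
    rest = sorted(schema - core)
    keys = []
    for size in range(len(rest) + 1):
        for extra in combinations(rest, size):
            x = core | set(extra)
            if any(k <= x for k in keys):
                continue                      # proper superset of a key
            if closure(x, fds) == schema:
                keys.append(x)
    return keys


# Problem 2: R(a, b, c, d, e) with ab -> c, cd -> e, de -> b
fds = [("ab", "c"), ("cd", "e"), ("de", "b")]
core = core_attributes("abcde", fds)
keys = candidate_keys("abcde", fds)
print(sorted(core))                                # ['a', 'd']
print(sorted("".join(sorted(k)) for k in keys))    # ['abd', 'acd', 'ade']
```

For Problem 2 this tests 8 attribute sets instead of the 31 nonempty subsets of the schema.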

2 Normal Forms
In defining normal forms, we only consider nontrivial FDs. An FD X → Y is trivial if X subsumes
Y, i.e., X ⊇ Y.
We also only consider simple FDs, in which the right-hand side is a single attribute. Thus we
will represent such an FD by X → y, where the lower case y indicates it is a single attribute. An FD
with multiple attributes in the right-hand side can be split into individual simple FDs.

2NF: The following statements are all equivalent. They all define what 2NF is about.
• A relation is in 2NF if every nonprime attribute is fully functionally dependent on every
candidate key.
• A relation is in 2NF if there does not exist an FD X → y such that y is a nonprime attribute
and X is a proper subset of a candidate key.
• A relation is in 2NF if there does not exist a nonprime attribute y that is partially dependent
on any candidate key.

3NF: The following statements are all equivalent. They all define what 3NF is about.
• A relation is in 3NF if it is in 2NF and there does not exist a nonprime attribute y that is
transitively dependent on any candidate key.
• A relation is in 3NF if there does not exist a nonprime attribute y that is dependent on any
set of attributes that is not a superkey.
• A relation is in 3NF if given any FD X → y, either y is a prime attribute or X is a superkey.

BCNF: The BCNF condition is really simple.

• A relation is in BCNF if given any FD X → y, X is a superkey.

In practice, we usually want our relations to be in BCNF or 3NF. The higher normal forms 4NF
and 5NF are rarely pursued in practice, and the lower form 2NF is often insufficient.

3 Decomposition
3.1 Decomposition Algorithms
Procedure 3 (How to determine if a relation is in 2NF, 3NF, or BCNF?) Find all candidate
keys based on given FDs. Then check if any FD violates the corresponding normal form,
according to the definitions of 2NF, 3NF, and BCNF.
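One possible way to mechanize Procedure 3 is sketched below. The function names and the string encoding of FDs are assumptions of this sketch, and it only examines the FDs that are explicitly listed (split into simple FDs X → y), not every FD implied by them.

```python
from itertools import combinations


def closure(attrs, fds):
    """Transitive closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(schema, fds):
    schema = set(schema)
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            x = set(combo)
            if not any(k <= x for k in keys) and closure(x, fds) == schema:
                keys.append(x)
    return keys


def violations(schema, fds):
    """List the given simple FDs X -> y that violate BCNF, 3NF, and 2NF."""
    schema = set(schema)
    keys = candidate_keys(schema, fds)
    prime = set().union(*keys)               # attributes in some candidate key
    report = {"2NF": [], "3NF": [], "BCNF": []}
    for lhs, rhs in fds:
        x = set(lhs)
        if closure(x, fds) >= schema:
            continue                         # X is a superkey: no violation
        for y in sorted(set(rhs) - x):       # skip the trivial part of the FD
            report["BCNF"].append((lhs, y))
            if y not in prime:               # nonprime y, X not a superkey
                report["3NF"].append((lhs, y))
                if any(x < k for k in keys): # X is a proper subset of a key
                    report["2NF"].append((lhs, y))
    return report


# R(a, b, c, d) with ab -> c, c -> d, d -> a (Problem 3 in Section 3.3)
r = violations("abcd", [("ab", "c"), ("c", "d"), ("d", "a")])
print(r["BCNF"])   # [('c', 'd'), ('d', 'a')]
print(r["3NF"])    # []
```

In this example every attribute is prime, so the relation is in 3NF even though two FDs violate BCNF.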

Proposition 1 Any relation schema with two attributes is in BCNF.

Proof 1 We have proved this in class. It is also a question in HW3. We will provide the proof in
HW3 solution.

Procedure 4 (How to decompose a relation R into a set of relations in BCNF?)

1. To start with, pick an FD X → y that violates BCNF in R.

2. Compute X+ , the transitive closure of X.

3. Decompose R into R1 and R2. R1 contains all attributes in X+ , R2 contains the set of
attributes (R−X+ )∪X, which is also equivalent to the set of attributes R − (X+ − X). In other
words, R2 contains X itself and all attributes outside of the transitive closure of X.

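4. Find which FDs hold in R1 and R2. An original FD of R is still applicable (i.e., preserved) in
a new relation if all of its attributes are in that relation. Note that FDs implied by the original
ones also hold there; they can be found by computing the transitive closure of attribute sets
within the relation. Find all candidate keys of R1 and R2 based on these FDs, and find if they
are in BCNF.

5. If R1 or R2 is not in BCNF, recursively decompose the relation, until all relations are in
BCNF. (It is guaranteed that we will eventually have a set of relations that are all in BCNF,
because every decomposition step produces relations with fewer attributes and, by Proposition 1,
any relation with two attributes is in BCNF.)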
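The recursive procedure above might be sketched as follows. This is a hedged illustration rather than the exact algorithm of the notes: the helper names are hypothetical, and instead of tracking which original FDs survive, the sketch detects violations inside a sub-relation by intersecting closures (computed under the original FDs) with that sub-relation, so that implied FDs are taken into account as well.

```python
from itertools import combinations


def closure(attrs, fds):
    """Transitive closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, X+ restricted to rel) for some BCNF violation in rel, or None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            # X determines something new inside rel but is not a superkey of rel.
            if x_plus != x and x_plus != rel:
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    """Split rel recursively until no sub-relation has a BCNF violation."""
    rel = set(rel)
    found = find_violation(rel, fds)
    if found is None:
        return [rel]
    x, x_plus = found
    r1 = x_plus                   # all attributes in X+
    r2 = (rel - x_plus) | x       # X plus everything outside X+
    return bcnf_decompose(r1, fds) + bcnf_decompose(r2, fds)


# R(a, b, c, d) with ab -> c, c -> d, d -> a (Problem 3 in Section 3.3)
fds = [("ab", "c"), ("c", "d"), ("d", "a")]
result = bcnf_decompose("abcd", fds)
print(sorted("".join(sorted(r)) for r in result))   # ['ad', 'bc', 'cd']
```

Both r1 and r2 are strictly smaller than rel at every step, so the recursion terminates. The subset enumeration is exponential and is meant only for small textbook schemas.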

3.2 Lossless Join Property and Preservation of FDs

There are two consequences of decomposition:
(1) The original schema is decomposed into multiple relations. Thus the attributes are no longer
all together. Some of the original FDs may be preserved in one or more of the new relations,
i.e., they can be checked in these relations. Some FDs may be lost. It is not always possible to
preserve all FDs.

(2) The tuples in the original relation are projected into the new relations. We may wonder:
can we recover the original relation? In other words, do we get exactly the same relation if we
perform natural joins of the new relations? After all, we will join these relations for various queries
that are supposed to be executed on the original single relation.

Proposition 2 If we follow the above procedure to decompose a relation into BCNF relations, it
is guaranteed that the original relation can be recovered losslessly, i.e., we can obtain the original
relation exactly by natural joins of the new relations.

In HW3, through one question, you will see that if we do not follow the above procedure (e.g.,
if we decompose a relation in a fashion that is not based on a violating FD), then the lossless
property is not guaranteed. We may not get back the exact original relation.
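For a decomposition into exactly two relations, there is a standard textbook test for the lossless join property that is not spelled out in these notes: the join is lossless if the common attributes of R1 and R2 functionally determine all of R1 or all of R2. A minimal sketch, with assumed function names and the same string encoding of FDs as in the other sketches:

```python
def closure(attrs, fds):
    """Transitive closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """True if the common attributes of r1 and r2 determine all of r1 or all of r2."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus


# R(a, b, c, d) with ab -> c, c -> d, d -> a (Problem 3 in Section 3.3)
fds = [("ab", "c"), ("c", "d"), ("d", "a")]

# Split based on the violating FD c -> d: the common attribute c determines (a, c, d).
print(is_lossless_binary("bc", "acd", fds))   # True

# A split that is not based on a violating FD: no common attributes at all.
print(is_lossless_binary("ab", "cd", fds))    # False
```

A split produced by Procedure 4 always passes this test, because the common attributes are X and R1 is X+.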

Proposition 3 By following an algorithm of 3NF decomposition, a relation can be decomposed
into 3NF relations such that (1) the lossless join property is guaranteed and (2) all FDs are
preserved. We will not discuss the details of this algorithm. Instead, I encourage you to apply the
BCNF decomposition algorithm, but use an FD that violates 3NF to start with. Although it is not
the same as the 3NF decomposition algorithm, you get some exercise and better understanding by
checking if the join is lossless and if FDs are preserved.

3.3 Examples
Problem 3 Consider the following FDs for relation schema R(a, b, c, d): ab → c; c → d; d → a.
(1) Find all candidate keys.
(2) Find all BCNF violations.
(3) Decompose R into relations in BCNF.
(4) Which FDs are not preserved by the BCNF decomposition?

Solution 3
(1) b is not in the right-hand side of any of the given FDs. Hence every candidate key must contain
b. b+ ={b}. Hence b itself is not a candidate key. We need to consider its supersets. {ab}+ ={abcd},
{bc}+ ={abcd}, {bd}+ ={abcd}. All of them are candidate keys. There cannot be any candidate key
with more than 2 attributes, because any such attribute set containing b is a proper superset of
ab, bc, or bd.

(2) ab, bc, and bd are candidate keys. Let’s check each of the given FDs:
ab → c: left-hand side is a candidate key, and thus a superkey. Hence it does not violate BCNF.
c → d: left-hand side is not a superkey. Hence it violates BCNF.
d → a: left-hand side is not a superkey. Hence it violates BCNF.

(3) We can decompose by starting with one of the violating FDs.

(3.1) Start with c → d: The transitive closure of c is c+ ={a, c, d}. Hence R can be decomposed into
R1(b, c) and R2(a, c, d). For R1(b, c), we know a 2-attribute relation schema must be in BCNF.
In R2(a, c, d), the original FDs c → d and d → a are preserved. The only candidate key is c. Hence
there is a transitive dependency: c → d and then d → a. a is a nonprime attribute in R2. Thus
the FD d → a in R2 violates 3NF (and thus BCNF too). By this violating FD, we will further
decompose R2 into R3(c, d) and R4(a, d). Both R3 and R4 have only 2 attributes. Therefore they
are in BCNF. Hence the original R is decomposed into R1, R3, R4 so that all of them are in BCNF.

(3.2) Start with d → a: The transitive closure of d is d+ ={a, d}. Hence R can be decomposed
into R5(b, c, d) and R6(a, d). R6 must be in BCNF. In R5(b, c, d), c → d is preserved. In addition,
the implied FD bd → c holds in R5, because {bd}+ ={a, b, c, d} under the original FDs. Hence bc
and bd are the candidate keys of R5. c → d violates BCNF in R5 since c is not a superkey. (It does
not violate 3NF, because d is a prime attribute in R5.) We further decompose R5
by this violating FD into R7(b, c) and R8(c, d). Both are in BCNF. The original R is decomposed
into R6, R7, R8 so that all of them are in BCNF.

(4) If we follow the decomposition in (3.1) or (3.2), ab → c is not preserved in any of the resulting
relations. As we can see, it is not always possible to preserve all original FDs during decomposition.
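The claim in (4) can be checked mechanically with the standard dependency-preservation test, which is not part of these notes: starting from the left-hand side of an FD, repeatedly add whatever can be derived inside each individual relation, and see whether the right-hand side is ever reached. A minimal sketch with assumed function names:

```python
def closure(attrs, fds):
    """Transitive closure of attrs under fds, a list of (lhs, rhs) string pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_preserved(lhs, rhs, relations, fds):
    """True if lhs -> rhs can be enforced by checking each relation separately."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for rel in relations:
            rel = set(rel)
            # Attributes of rel derivable from the part of z that lies in rel.
            gained = closure(z & rel, fds) & rel
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


# R(a, b, c, d) with ab -> c, c -> d, d -> a, decomposed into (b,c), (c,d), (a,d)
fds = [("ab", "c"), ("c", "d"), ("d", "a")]
relations = ["bc", "cd", "ad"]
for lhs, rhs in fds:
    print(lhs, "->", rhs, is_preserved(lhs, rhs, relations, fds))
# ab -> c False
# c -> d True
# d -> a True
```

Only ab → c is lost, which agrees with the answer to (4).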
