Theory of Database Systems: Lecture 10. The Process of Normalization I
The document discusses the process of normalization in database design. Normalization is done in steps to reach higher normal forms like 1NF, 2NF and 3NF. This removes anomalies like insertion, deletion and modification anomalies. The document explains reaching 1NF by removing repeating groups and 2NF by removing partial dependencies on primary keys through decomposition into multiple tables.
Normalization
• Normalization is a technique for producing a set of suitable relations that support the data requirements of an enterprise.
Suitable set of relations
• Characteristics of a suitable set of relations include:
– the minimal number of attributes necessary to support the data requirements of the enterprise;
– attributes with a close logical relationship are found in the same relation;
– minimal redundancy, with each attribute represented only once, with the important exception of attributes that form all or part of foreign keys.
Benefits of a suitable set of relations
• The benefit of using a database that has a suitable set of relations is that the database will:
– be easier for the user to access and maintain;
– take up minimal storage space on the computer.
How Normalization Supports Database Design
• Normalization can be used as a bottom-up approach to DB design that begins by examining the relationships between attributes.
• However, a top-down approach can also be used: it begins by identifying the main entities and relationships and uses normalization as a validation technique.
The Process of Normalization
• Normalization is a formal technique for analyzing a relation based on its primary key and the functional dependencies between the attributes of that relation.
• It is often executed as a series of steps. Each step corresponds to a specific normal form, which has known properties.
Normalization
• The four most commonly used normal forms are first (1NF), second (2NF) and third (3NF) normal forms, and Boyce–Codd normal form (BCNF).
• Normalization is based on functional dependencies among the attributes of a relation.
• A relation can be normalized to a specific form to prevent the possible occurrence of update anomalies.
The Process of Normalization
The relationship between the normal forms.
It shows that some 1NF relations are also in 2NF, some 2NF relations are also in 3NF, and so on. Every relation in a higher normal form is also in all the lower ones.
Unnormalized Form (UNF)
• Before discussing first normal form, we first define the state that precedes it.
• Unnormalized form is a table that contains one or more repeating groups.
• To create an unnormalized table:
– transform the data from the information source (e.g. a form) into table format with columns and rows.
• In this format, the table is in unnormalized form (UNF).
Repeating group
• A repeating group is an attribute, or group of attributes, within a table that occurs with multiple values for a single occurrence of the nominated key attribute(s) of that table.
• Nominated key: the attribute(s) that uniquely identify each row within the unnormalized table.
Example: Form
Collection of DreamHome leases.
In the example it is assumed that a client rents a given property only once and cannot rent more than one property at any one time.
UNF example
• Sample data is taken from two leases for two different clients and is transferred into table format with rows and columns.
• This is an unnormalized table.
ClientRental unnormalized table.
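The following minimal Python sketch holds the unnormalized ClientRental table as one row per client, keyed on the nominated key clientNo. Each repeating-group column holds a list of values, so several values sit at a single row/column intersection. All data values are invented for illustration; only the owner name Tony Diamond is borrowed from the Remarks later in this lecture. `UNF_CLIENT_RENTAL` and `repeating_columns` are hypothetical names.

```python
from typing import Any

# Hypothetical UNF ClientRental table (values invented for illustration).
# One row per client, keyed by the nominated key clientNo; each repeating-group
# column holds a list with one value per lease.
UNF_CLIENT_RENTAL: list[dict[str, Any]] = [
    {"clientNo": "C1", "cName": "Ann Lee",
     "propertyNo": ["P1", "P2"], "pAddress": ["1 High St", "2 Low Rd"],
     "rentStart": ["2004-01-01", "2005-01-01"],
     "rentFinish": ["2004-12-31", "2005-12-31"],
     "rent": [350, 450], "ownerNo": ["O1", "O2"],
     "ownerName": ["Tony Diamond", "Tina Smith"]},
    {"clientNo": "C2", "cName": "Bob Ray",
     "propertyNo": ["P1", "P3"], "pAddress": ["1 High St", "3 Mill Ln"],
     "rentStart": ["2005-02-01", "2005-07-01"],
     "rentFinish": ["2005-06-30", "2006-06-30"],
     "rent": [350, 400], "ownerNo": ["O1", "O1"],
     "ownerName": ["Tony Diamond", "Tony Diamond"]},
]


def repeating_columns(table: list[dict[str, Any]]) -> set[str]:
    """Return the columns that hold more than one value for some key occurrence."""
    return {
        col
        for row in table
        for col, value in row.items()
        if isinstance(value, list) and len(value) > 1
    }


print(sorted(repeating_columns(UNF_CLIENT_RENTAL)))
```

The columns it reports are exactly those that make up the repeating group. The non-repeating columns (clientNo, cName) are not reported.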
UNF example
• We identify the key attribute for the ClientRental unnormalized table as clientNo.
• Next we identify the repeating group in the unnormalized table:
Repeating Group = (propertyNo, pAddress, rentStart, rentFinish, rent, ownerNo, ownerName)
• As a consequence, there are multiple values at the intersection of certain rows and columns.
First Normal Form (1NF)
• A relation in which the intersection of each row and column contains one and only one value.
UNF to 1NF
• To transform the unnormalized table to first normal form, we identify and remove repeating groups within the table:
– nominate an attribute or group of attributes to act as the key for the unnormalized table;
– identify the repeating group(s) in the unnormalized table which repeat for the key attribute(s).
• There are two common approaches to removing repeating groups from unnormalized tables.
Method 1
• We remove the repeating group by entering appropriate data into the empty columns of rows containing the repeating data (‘flattening’ the table). We fill in the blanks by duplicating the non-repeating data.
• The resulting relation contains atomic values at the intersection of each row and column, and is therefore in 1NF.
• With this approach, redundancy is introduced into the resulting relation.
Method 1 example
• Remove the repeating group by entering the appropriate client data into each row.
• The resulting relation ClientRental is in 1NF, as there is a single value at the intersection of each row and column.
Method 1 example
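The following minimal Python sketch implements Method 1 ("flattening"). It assumes the UNF table is held as one row per client, with a list in each repeating-group column. The rows are invented for illustration, and `flatten` and `is_1nf` are hypothetical helper names, not part of any library.

```python
from typing import Any

Row = dict[str, Any]

# Repeating group identified on the UNF example slide.
REPEATING = ("propertyNo", "pAddress", "rentStart", "rentFinish",
             "rent", "ownerNo", "ownerName")

# Hypothetical UNF ClientRental rows (values invented for illustration):
# one row per client, a list in each repeating-group column.
UNF: list[Row] = [
    {"clientNo": "C1", "cName": "Ann Lee",
     "propertyNo": ["P1", "P2"], "pAddress": ["1 High St", "2 Low Rd"],
     "rentStart": ["2004-01-01", "2005-01-01"],
     "rentFinish": ["2004-12-31", "2005-12-31"],
     "rent": [350, 450], "ownerNo": ["O1", "O2"],
     "ownerName": ["Tony Diamond", "Tina Smith"]},
    {"clientNo": "C2", "cName": "Bob Ray",
     "propertyNo": ["P1", "P3"], "pAddress": ["1 High St", "3 Mill Ln"],
     "rentStart": ["2005-02-01", "2005-07-01"],
     "rentFinish": ["2005-06-30", "2006-06-30"],
     "rent": [350, 400], "ownerNo": ["O1", "O1"],
     "ownerName": ["Tony Diamond", "Tony Diamond"]},
]


def flatten(table: list[Row], repeating: tuple[str, ...]) -> list[Row]:
    """Method 1: one output row per repeating-group entry, with the
    non-repeating values (clientNo, cName) duplicated into each row."""
    flat: list[Row] = []
    for row in table:
        fixed = {c: v for c, v in row.items() if c not in repeating}
        # zip walks the parallel lists one lease at a time.
        for values in zip(*(row[c] for c in repeating)):
            flat.append({**fixed, **dict(zip(repeating, values))})
    return flat


def is_1nf(rows: list[Row]) -> bool:
    """True if every row/column intersection holds a single value."""
    return all(not isinstance(v, (list, tuple, set))
               for r in rows for v in r.values())


client_rental = flatten(UNF, REPEATING)
for r in client_rental:
    print(r["clientNo"], r["cName"], r["propertyNo"], r["rentStart"])
```

Each client's name is now copied into every one of that client's lease rows. This is the redundancy that Method 1 introduces.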
• We identify the candidate keys for the ClientRental relation as being composite keys:
– (clientNo, propertyNo)
– (clientNo, rentStart)
– (propertyNo, rentStart)
• We select (clientNo, propertyNo) as the primary key.
• The relation contains data describing clients, property rented, and property owners, which is repeated several times. As a result, the relation contains significant data redundancy.
Method 2
• We remove the repeating group by placing the repeating data, along with a copy of the original key attribute(s), into a separate relation.
• A primary key is identified for the new relation.
• This approach produces relations in at least 1NF with less redundancy.
Method 2 example
• Using the second approach, we remove the repeating group by placing the repeating data, along with a copy of the original key attribute (clientNo), into a separate table called PropertyRentalOwner.
Method 2 example
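The following Python sketch implements Method 2 under the same assumptions as before: invented rows, with one list per repeating-group column. The table names Client and PropertyRentalOwner and the key (clientNo, propertyNo) follow the slides. `split_repeating_group` and `unique_in_sample` are hypothetical helpers. A sample instance can only show that a proposed key is violated; it cannot prove that the key holds.

```python
from typing import Any

Row = dict[str, Any]

# Repeating group identified on the UNF example slide.
REPEATING = ("propertyNo", "pAddress", "rentStart", "rentFinish",
             "rent", "ownerNo", "ownerName")

# Hypothetical UNF ClientRental rows (values invented for illustration).
UNF: list[Row] = [
    {"clientNo": "C1", "cName": "Ann Lee",
     "propertyNo": ["P1", "P2"], "pAddress": ["1 High St", "2 Low Rd"],
     "rentStart": ["2004-01-01", "2005-01-01"],
     "rentFinish": ["2004-12-31", "2005-12-31"],
     "rent": [350, 450], "ownerNo": ["O1", "O2"],
     "ownerName": ["Tony Diamond", "Tina Smith"]},
    {"clientNo": "C2", "cName": "Bob Ray",
     "propertyNo": ["P1", "P3"], "pAddress": ["1 High St", "3 Mill Ln"],
     "rentStart": ["2005-02-01", "2005-07-01"],
     "rentFinish": ["2005-06-30", "2006-06-30"],
     "rent": [350, 400], "ownerNo": ["O1", "O1"],
     "ownerName": ["Tony Diamond", "Tony Diamond"]},
]


def split_repeating_group(table: list[Row], key: tuple[str, ...],
                          repeating: tuple[str, ...]) -> tuple[list[Row], list[Row]]:
    """Method 2: keep the non-repeating columns in one table and move the
    repeating group, with a copy of the key, into a new table."""
    parent = [{c: v for c, v in row.items() if c not in repeating}
              for row in table]
    child: list[Row] = []
    for row in table:
        key_copy = {k: row[k] for k in key}
        for values in zip(*(row[c] for c in repeating)):
            child.append({**key_copy, **dict(zip(repeating, values))})
    return parent, child


def unique_in_sample(rows: list[Row], attrs: tuple[str, ...]) -> bool:
    """True if no two rows share the same values for attrs."""
    combos = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(combos)) == len(combos)


client, property_rental_owner = split_repeating_group(UNF, ("clientNo",), REPEATING)
print(client)
print(unique_in_sample(property_rental_owner, ("clientNo", "propertyNo")))
```

In this sample, Tony Diamond's details still appear in three PropertyRentalOwner rows. This is the kind of redundancy that remains after Method 2.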
• Then we identify a primary key for the new table: (clientNo, propertyNo).
• The format of the resulting 1NF relations is as follows:
• Both the Client and PropertyRentalOwner tables are in 1NF, but the PropertyRentalOwner table contains significant redundancy.
Second Normal Form (2NF)
• Second normal form is based on the concept of full functional dependency.
• Full functional dependency: if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A.
• A functional dependency A → B is a full functional dependency if removing any attribute from A means the dependency no longer holds.
Second Normal Form (2NF)
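The definition of full functional dependency can be checked mechanically against a sample relation instance. The following minimal Python sketch assumes an invented 1NF ClientRental instance; `fd_holds` and `is_full_fd` are hypothetical names. A real functional dependency is a statement about the meaning of the data, so an instance can refute a dependency but cannot prove one.

```python
from itertools import combinations
from typing import Any

COLUMNS = ("clientNo", "cName", "propertyNo", "pAddress", "rentStart",
           "rentFinish", "rent", "ownerNo", "ownerName")
# Hypothetical 1NF ClientRental rows (values invented for illustration).
ROWS: list[dict[str, Any]] = [dict(zip(COLUMNS, v)) for v in [
    ("C1", "Ann Lee", "P1", "1 High St", "2004-01-01", "2004-12-31", 350, "O1", "Tony Diamond"),
    ("C1", "Ann Lee", "P2", "2 Low Rd", "2005-01-01", "2005-12-31", 450, "O2", "Tina Smith"),
    ("C2", "Bob Ray", "P1", "1 High St", "2005-02-01", "2005-06-30", 350, "O1", "Tony Diamond"),
    ("C2", "Bob Ray", "P3", "3 Mill Ln", "2005-07-01", "2006-06-30", 400, "O1", "Tony Diamond"),
]]


def fd_holds(rows, lhs, rhs) -> bool:
    """True if no two rows agree on lhs but differ on rhs (in this instance)."""
    seen: dict[tuple, tuple] = {}
    for r in rows:
        left = tuple(r[a] for a in lhs)
        right = tuple(r[a] for a in rhs)
        if seen.setdefault(left, right) != right:
            return False
    return True


def is_full_fd(rows, lhs, rhs) -> bool:
    """lhs -> rhs holds, and removing any one attribute from lhs breaks it.
    Checking single removals suffices: if rhs depended on a smaller subset of
    lhs, it would also depend on every larger subset containing it."""
    return fd_holds(rows, lhs, rhs) and not any(
        fd_holds(rows, subset, rhs) for subset in combinations(lhs, len(lhs) - 1)
    )


pk = ("clientNo", "propertyNo")
print(is_full_fd(ROWS, pk, ("rentStart", "rentFinish")))  # True
print(is_full_fd(ROWS, pk, ("cName",)))                   # False: clientNo alone suffices
```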
• A relation that is in 1NF and in which every non-primary-key attribute is fully functionally dependent on the primary key.
– Second normal form applies to relations with composite keys (primary keys composed of two or more attributes).
– A 1NF relation with a single-attribute primary key is automatically in at least 2NF.
1NF to 2NF
• Identify the primary key for the 1NF relation.
• Identify the functional dependencies in the relation.
• If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
Partial dependency
• A functional dependency A → B is a partial dependency if there is some attribute that can be removed from A and the dependency still holds.
2NF example
Consider the ClientRental relation.
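The following minimal Python sketch performs the 2NF test. The dependency set is an assumption reconstructed from the discussion in this lecture: cName depends on clientNo, the property attributes depend on propertyNo, ownerName depends on ownerNo, and the lease dates depend on the whole primary key. It is not a copy of the original figure. `closure` is the standard attribute-closure algorithm. `partial_dependencies` is a hypothetical helper that reports the non-key attributes already determined by a proper subset of the primary key.

```python
from itertools import combinations

# Assumed FDs for ClientRental (reconstructed from this lecture's discussion).
FDS: list[tuple[frozenset[str], frozenset[str]]] = [
    (frozenset({"clientNo", "propertyNo"}), frozenset({"rentStart", "rentFinish"})),
    (frozenset({"clientNo"}), frozenset({"cName"})),
    (frozenset({"propertyNo"}), frozenset({"pAddress", "rent", "ownerNo", "ownerName"})),
    (frozenset({"ownerNo"}), frozenset({"ownerName"})),
]
ATTRIBUTES = {"clientNo", "cName", "propertyNo", "pAddress", "rentStart",
              "rentFinish", "rent", "ownerNo", "ownerName"}
PRIMARY_KEY = ("clientNo", "propertyNo")


def closure(attrs, fds) -> set[str]:
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def partial_dependencies(pk, attributes, fds) -> dict[tuple[str, ...], list[str]]:
    """Map each proper subset of the primary key to the non-key attributes it
    already determines; any entry is a partial dependency."""
    non_key = set(attributes) - set(pk)
    found = {}
    for size in range(1, len(pk)):
        for subset in combinations(pk, size):
            determined = closure(subset, fds) & non_key
            if determined:
                found[subset] = sorted(determined)
    return found


for part, attrs in partial_dependencies(PRIMARY_KEY, ATTRIBUTES, FDS).items():
    print(part, "->", attrs)
```

A non-empty result means the relation is not in 2NF. Moving each reported group into its own relation, together with a copy of its determinant, is the 1NF to 2NF step.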
• This ClientRental table is in 1NF. The primary key of the table is (clientNo, propertyNo).
• In order to move this table to a 2NF solution, we must identify and remove the partial dependencies from the table.
Functional dependencies in ClientRental relation
• The functional dependencies (fd) for the ClientRental relation are as follows:
• The presence of partial dependencies shows that the table is not in 2NF:
– cName is partially dependent on the primary key; in other words, it depends on only the clientNo attribute.
– The property attributes are also partially dependent on the primary key, depending on only propertyNo.
Transform the ClientRental relation into 2NF
• To remove the partial dependencies, we create new tables: the partially dependent non-primary-key columns are moved out, along with a copy of the part of the primary key on which they are fully functionally dependent.
• This results in the creation of three new relations called Client, Rental, and PropertyOwner.
2NF relations derived from ClientRental relation
• The three tables, Client, Rental and PropertyOwner, are in 2NF because every non-primary-key column is fully functionally dependent on the primary key of its table.
Remarks
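The three 2NF relations can be sketched in Python as duplicate-eliminating projections of the 1NF ClientRental rows. The rows and the new owner name are invented. The attribute split across Client, Rental and PropertyOwner is assumed from the partial dependencies identified above. `project` and `rename_owner` are hypothetical helper names. The sketch also reproduces the update anomaly described in these remarks: an owner of two properties appears in two PropertyOwner tuples, so renaming that owner must touch both.

```python
from typing import Any

Row = dict[str, Any]

COLUMNS = ("clientNo", "cName", "propertyNo", "pAddress", "rentStart",
           "rentFinish", "rent", "ownerNo", "ownerName")
# Hypothetical 1NF ClientRental rows (values invented for illustration).
ROWS: list[Row] = [dict(zip(COLUMNS, v)) for v in [
    ("C1", "Ann Lee", "P1", "1 High St", "2004-01-01", "2004-12-31", 350, "O1", "Tony Diamond"),
    ("C1", "Ann Lee", "P2", "2 Low Rd", "2005-01-01", "2005-12-31", 450, "O2", "Tina Smith"),
    ("C2", "Bob Ray", "P1", "1 High St", "2005-02-01", "2005-06-30", 350, "O1", "Tony Diamond"),
    ("C2", "Bob Ray", "P3", "3 Mill Ln", "2005-07-01", "2006-06-30", 400, "O1", "Tony Diamond"),
]]


def project(rows: list[Row], attrs: tuple[str, ...]) -> list[Row]:
    """Relational projection onto attrs, dropping duplicate tuples
    (dict.fromkeys keeps first-seen order)."""
    unique = dict.fromkeys(tuple(r[a] for a in attrs) for r in rows)
    return [dict(zip(attrs, t)) for t in unique]


# The three 2NF relations named on the slides (attribute split assumed).
client = project(ROWS, ("clientNo", "cName"))
rental = project(ROWS, ("clientNo", "propertyNo", "rentStart", "rentFinish"))
property_owner = project(ROWS, ("propertyNo", "pAddress", "rent",
                                "ownerNo", "ownerName"))


def rename_owner(rows: list[Row], owner_no: str, new_name: str) -> int:
    """Set ownerName on every tuple for owner_no and return how many changed.
    Stopping after the first tuple would leave the relation inconsistent."""
    changed = 0
    for r in rows:
        if r["ownerNo"] == owner_no:
            r["ownerName"] = new_name
            changed += 1
    return changed


print(len(client), len(rental), len(property_owner))  # 2 4 3
print(rename_owner(property_owner, "O1", "T. Diamond"))  # 2
```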
• Although 2NF relations have less redundancy than those in 1NF, they may still suffer from update anomalies.
• For example, if we want to update the name of an owner such as Tony Diamond, we have to update two tuples in the PropertyOwner relation.
• If we update only one tuple and not the other, the database is left in an inconsistent state.
• This update anomaly is caused by a transitive dependency.
• We need to remove such dependencies by progressing