Normalization Bbc2024

The document discusses unique identifiers and database normalization, emphasizing the importance of primary keys, candidate keys, and foreign keys in maintaining data integrity and establishing relationships between tables. It outlines the normalization process, which organizes data into smaller related tables to eliminate redundancy and inconsistencies, and describes the first three normal forms (1NF, 2NF, 3NF) necessary for effective database design. Additionally, it highlights the issues of data anomalies, such as insert, update, and delete anomalies, that arise from unnormalized data.

Uploaded by

wamalaeddie2
Copyright
© © All Rights Reserved
4. UIDS AND DATABASE NORMALISATION

[an introduction to Database normalization]

Before we delve into the normalization process, it is worth reminding ourselves of the concept of
unique identifiers and other keys.

Unique identifier (UID) is a broad term for any attribute, or set of attributes, whose value uniquely
identifies a record in a database. It can be a primary key or a candidate key.

Primary key- a column/attribute that uniquely identifies each record in a table. A table can have only one
primary key. For example, in a student database we can choose the student number as the primary key.
The implication is that every student will have a unique student number, and whenever we want to know
anything about a student, we can use the student number to identify them. Typically, a primary key is
chosen from the candidate keys. The primary key enforces data integrity by ensuring uniqueness, and it
provides a means to reference and relate records in other tables through foreign keys.

A candidate key is a minimal set of one or more attributes that can be used to uniquely identify each
record. A table can have many candidate keys. Every candidate key is a potential primary key; the one
we designate becomes the primary key. Did you think about registration number, email or phone number
when I talked about a primary key? I know you did, and you are on the right path. The student number
and any of these attributes could uniquely identify tuples in a relation. They are therefore candidate keys.

Task1: Read about alternate keys and super keys as well.

Foreign key- a column in one table that refers to the primary key of another table. In simple terms, if
the primary key of table A appears in table B, we call it a foreign key in table B. A foreign key can thus be
regarded as a referential key. It is very important in establishing relationships between tables. You will
further appreciate this concept when we look at JOINS in SQL.

Composite keys- sometimes we may require two or more attributes to uniquely identify each record in the
database. When the primary key is made of more than one attribute, we call it a composite primary key
or simply a composite key.

Key attributes- columns or attributes that are part of a candidate key (for a composite key, each of
the attributes that make it up). They may belong to the primary key or to an alternate key.

Non-key attributes are not used for identification but contain descriptive data that provides more
details or characteristics of the entity represented in a record. They are essential for data analysis,
reporting and understanding the entity's properties.
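
The key types above can be tried out in a small sketch using Python's built-in sqlite3 module. The table
names, column names and sample values here are assumptions made for illustration only; they are not
part of the course database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
CREATE TABLE student (
    student_no TEXT PRIMARY KEY,      -- the chosen primary key
    email      TEXT NOT NULL UNIQUE,  -- another candidate key
    name       TEXT NOT NULL          -- a non-key attribute
);
CREATE TABLE enrolment (
    student_no TEXT NOT NULL REFERENCES student(student_no),  -- foreign key
    class_name TEXT NOT NULL,
    PRIMARY KEY (student_no, class_name)                      -- composite key
);
""")

conn.execute("INSERT INTO student VALUES ('S001', 'moreen@example.com', 'Atwine Moreen')")
conn.execute("INSERT INTO enrolment VALUES ('S001', 'Biology 1')")


def rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


duplicate_pk = rejected("INSERT INTO student VALUES ('S001', 'other@example.com', 'Someone Else')")
duplicate_email = rejected("INSERT INTO student VALUES ('S002', 'moreen@example.com', 'Someone Else')")
orphan_fk = rejected("INSERT INTO enrolment VALUES ('S999', 'Biology 1')")
print(duplicate_pk, duplicate_email, orphan_fk)
```

Each of the three rejected statements breaks a different rule: a repeated primary key, a repeated
candidate key, and a foreign key that points to no existing student.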
DATABASE NORMALISATION

Database normalization, or just normalization as it is commonly called, is a process in data modeling
and database design that organizes data into smaller related tables so that it can be updated and
queried efficiently. It is a design principle for structuring data in a consistent way, which in turn
improves the integrity and efficiency of the database.

Normalization gives data meaning by defining its relationship with other data. It is thus about connecting
data in the right way.

The main objective of normalization is to allow data to be stored without unnecessary redundancy,
thereby eliminating data inconsistencies, so that users can retrieve the data without much difficulty.

Elmasri et al. (2006) argued that tables generated from a semantic data model such as an ERD may not
require further normalization, as long as all the necessary entities have been identified correctly and
precisely, but that functional dependencies between attributes of an entity could still exist. The moral
of this assertion is to ensure that each table represents a logically coherent grouping of attributes
and has the measure of goodness associated with normalization.

Therefore, database normalization is the process of organizing data in the form of related tables to
eliminate unnecessary data redundancy and inconsistencies and to improve the integrity of the
data. It usually involves dividing a larger table into smaller related tables and linking them using
foreign keys (we looked at foreign keys and primary keys in topic two). The aim is to reduce data
redundancy as much as possible but not to remove it completely, as this may not be tenable.
In other words, we remove only the unnecessary redundancies.

Advantages of Normalization

✓ It reduces redundancy and ensures data is stored logically.
✓ It helps to produce database designs that are cost effective and have better security models.
✓ It makes updating and querying data easier, since it is hard to extract data that is
unorganized.
✓ It leads to flexible database designs.
✓ It improves data integrity.

DATA REDUNDANCY

This means storing the same piece of information in different locations within the same database.
It is, literally, avoidable repetition of data. What is wrong with this? The obvious answer is that it
wastes resources. The other reason, and perhaps the most important in this particular
context, is that it leads to data anomalies.

Data anomalies are glitches, errors, abnormalities or issues in the process of data management, usually
as a result of not normalizing the database. In more technical terms, a data anomaly is an
unexpected side effect of trying to insert, update or delete a row. There are insert, update and delete
anomalies. I will illustrate what each of them means in the following section.

Practical example 1.

Let us assume a student database as an example. It looks like:

STUDENT ID | STUDENT NAME   | FEES PAID | COURSE NAME      | CLASS1      | CLASS2         | CLASS3
1          | Atwine Moreen  | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Namubiru | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Dinah Kihunde  | 400       | Medicine         | Biology 2   |                |
4          | Hassan Magumba | 850       | Dentistry        |             |                |

The table keeps track of a few pieces of information: the name of the student, the fees the student has
paid, the course the student is on and the classes the student is taking, if any.

This is not a normalized table and it suffers from a few data anomalies. Can you figure out what
exactly is wrong with it? It is important to note that unnormalized data has unnecessary data
redundancies and dependencies. These two contribute to the data anomalies in unnormalized
databases.
INSERT ANOMALY

An insert anomaly happens when we try to insert a record into this table without having all the data the
record needs. Suppose we wanted to add a new student without knowing their course name.

The new record will look like this:

STUDENT ID | STUDENT NAME   | FEES PAID | COURSE NAME      | CLASS1      | CLASS2         | CLASS3
1          | Atwine Moreen  | 200       | Economics        | Economics 1 | Biology 1      |
2          | Maria Namubiru | 500       | Computer Science | Biology 1   | Business Intro | Programming 2
3          | Dinah Kihunde  | 400       | Medicine         | Biology 2   |                |
4          | Hassan Magumba | 850       | Dentistry        |             |                |
5          | Annet Knolly   | 0         |                  |             |                |

We would be adding incomplete data to our table, which can cause issues when trying to analyse this
data.

UPDATE ANOMALY

This issue happens when we want to update data, but we update some of it and not the rest.
Some data that ought to be changed remains unchanged. Suppose Biology 1 is
renamed Introduction to Biology: we need to update this for every student doing Biology 1. The hassle
grows if we have hundreds of them! There is obviously the risk of missing a value, which
would leave the data inconsistent.

In short, an update anomaly occurs when part, but not all, of the data that ought to be updated is updated.
This causes inconsistencies in the data. The ideal situation would require that we update the value only
once and in one location.
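
The risk can be made concrete with a short Python sketch. It copies only the class columns of practical
example 1 into a list of dictionaries; the "careless" update and the class id 10 in the second half are
assumptions made up for this illustration.

```python
# Flat, unnormalized rows from practical example 1 (class columns only).
students = [
    {"id": 1, "class1": "Economics 1", "class2": "Biology 1", "class3": None},
    {"id": 2, "class1": "Biology 1", "class2": "Business Intro", "class3": "Programming 2"},
    {"id": 3, "class1": "Biology 2", "class2": None, "class3": None},
]

# A careless update that only looks at the CLASS1 column.
for row in students:
    if row["class1"] == "Biology 1":
        row["class1"] = "Introduction to Biology"

# Students whose row still carries the old name somewhere.
leftovers = [row["id"] for row in students if "Biology 1" in row.values()]
print(leftovers)  # [1]: student 1 has the old name in CLASS2

# Normalized alternative: the name is stored once and referenced by id.
classes = {10: "Biology 1"}              # class_id -> class name (the id is made up)
enrolments = [(1, 10), (2, 10)]          # (student_id, class_id)
classes[10] = "Introduction to Biology"  # one update, in one location
names = {classes[class_id] for _, class_id in enrolments}
print(names)  # {'Introduction to Biology'}
```

In the flat table the old and new names now coexist, which is exactly the inconsistency described
above. In the second layout there is only one place where the name can be changed.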
DELETE ANOMALY

A delete anomaly occurs when we want to delete data from the table, but we end up deleting more than
we intended.

Let's refer to our table again. In the event that Dinah Kihunde leaves the school, we shall have to delete
her from our records. Deleting her record would, however, mean that we also delete Biology 2. Oops!
Biology 2 is only stored in this row, and the same goes for Medicine. The ideal situation would guarantee
that deleting one type of data has no impact on other records we do not want to delete.
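
Here is a minimal Python sketch of the same situation, using a simplified copy of three rows from
practical example 1. The tuple layout and the second, normalized layout are assumptions for
illustration.

```python
# Simplified rows: (student_id, student_name, course_name, class1).
rows = [
    (1, "Atwine Moreen", "Economics", "Economics 1"),
    (2, "Maria Namubiru", "Computer Science", "Biology 1"),
    (3, "Dinah Kihunde", "Medicine", "Biology 2"),
]


def classes_on_record(table):
    """Collect every class name that still appears somewhere in the table."""
    return {class1 for _, _, _, class1 in table}


before = classes_on_record(rows)
rows = [row for row in rows if row[1] != "Dinah Kihunde"]  # Dinah leaves the school
lost = before - classes_on_record(rows)
print(lost)  # {'Biology 2'}

# Normalized alternative: classes are stored on their own, apart from enrolments.
classes = {"Economics 1", "Biology 1", "Biology 2"}
enrolments = {(1, "Economics 1"), (2, "Biology 1"), (3, "Biology 2")}
enrolments = {(student, cls) for student, cls in enrolments if student != 3}
still_offered = "Biology 2" in classes
print(still_offered)  # True
```

Removing the student from the flat table also removes the only record of Biology 2, while in the
second layout only her enrolment disappears.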

The Normalization Process.

The process involves applying rules to a set of data. Each of these rules transforms the data into a certain
structure, called a normal form. There are about six normal forms, but the first three are the main and
most common ones. When the first rule has been applied, the relation or table is in the first normal form;
when the second rule has been applied, it is in the second normal form; and the same principle
applies to the other normal forms. It is also interesting to learn that there is no standard definition of
the sixth normal form yet; the debate is still ongoing. Fortunately, we shall not get involved in the debate,
since our scope confines us to the 3NF! In fact, if we can normalize our databases at least up to the 3NF,
we shall have given ourselves a reason to celebrate. This would be sufficient.

The First Normal Form [1NF]

A relation is in the 1NF if each attribute contains only simple (atomic) values and the relation has a
primary key. An atomic value is one that cannot be further divided into meaningful data elements. This
rule ensures that each cell in a table holds a single value and avoids storing multiple values in a single
attribute. The idea is that there should not be any multivalued attributes in the relation!

There must be unique column names and no repeating rows of data.

The order of rows and columns does not matter.

Consider the relation Employees below:

Employee_ID | Name             | Station | Phone                  | Department
2           | Agness Namugumya | Nansana | 0702124567             | Sales
1           | Atwine Moreen    | Banda   | 0781852097, 0701659713 | Production

Imagine a situation where we want to update or delete Moreen's Airtel phone number. We cannot do so
without touching the MTN number as well, because both sit in the same cell. These two values belong to
the same attribute, and yet we could divide it perfectly into two meaningful parts: phone 1 and phone 2.
I know you now remember atomic values and the concept of multivalued attributes!

So is this relation in 1NF? No, because Phone contains non-atomic values. I hope you guessed it right.

We can decompose Phone into two attributes, Phone1 and Phone2, and move them to a separate
relation.
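
The decomposition just described can be sketched in a few lines of Python. The helper name
split_phones and the dictionary layout are assumptions for illustration; the phone values are the ones
in the Employees relation above.

```python
employees = [
    {"employee_id": 2, "name": "Agness Namugumya", "phone": "0702124567"},
    {"employee_id": 1, "name": "Atwine Moreen", "phone": "0781852097, 0701659713"},
]


def split_phones(value):
    """Split a comma-separated phone cell into a list of atomic values."""
    return [part.strip() for part in value.split(",") if part.strip()]


contact_info = []  # rows of (employee_id, phone1, phone2)
for emp in employees:
    phones = split_phones(emp["phone"])
    if len(phones) > 2:
        raise ValueError("Phone1/Phone2 cannot hold more than two numbers")
    phones += [None] * (2 - len(phones))  # pad a missing second number with None
    contact_info.append((emp["employee_id"], phones[0], phones[1]))

print(contact_info)
# [(2, '0702124567', None), (1, '0781852097', '0701659713')]
```

Note the limit this design builds in: a third phone number has nowhere to go. Storing one row per
phone number in the separate relation is a common alternative that avoids the limit.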

Employees

Employee_ID | Name             | Station | Department
2           | Agness Namugumya | Nansana | Sales
3           | Jerome Gisimbi   | Mbarara | Procurement
1           | Atwine Moreen    | Banda   | Production

Contact_infor

Employee_ID | Phone1     | Phone2
2           | 0781852097 | 0781659712
1           | 0701659712 | 0781852099
3           | 070050970  | 078050097

The above two relations are in 1NF

In practice, a database is bad if it does not at least conform to the rules of the 1NF.

THE SECOND NORMAL FORM [2NF]

For a relation to be in 2NF, it must:

1. Be in the 1NF first. This means that it should satisfy the requirements of the 1NF before we
even ask whether it is in the 2NF or not.
2. Have every non-key attribute functionally dependent on the entire primary key. This means
that there should be no partial dependencies. If a relation has a composite primary key (one
made up of multiple attributes), ensure that each non-key attribute depends on
the entire primary key, not just part of it. If any non-key attribute depends on only a portion of the
primary key, it should be moved to a separate relation.
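
One rough way to look for partial dependencies is to test candidate dependencies against sample rows,
as in the Python sketch below. The enrolment table, its composite key (student_id, class) and the
grade column are hypothetical. Keep in mind that sample data can only disprove a dependency; it can
never prove that one holds for all future data.

```python
def determines(rows, lhs, rhs):
    """True if, in these rows, equal values of lhs always come with equal values of rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical enrolment table; the composite primary key is (student_id, class).
enrolment = [
    {"student_id": 1, "class": "Economics 1", "student_name": "Atwine Moreen", "grade": "A"},
    {"student_id": 1, "class": "Biology 1", "student_name": "Atwine Moreen", "grade": "B"},
    {"student_id": 2, "class": "Biology 1", "student_name": "Maria Namubiru", "grade": "A"},
]

key = ["student_id", "class"]
partial_deps = {}
for attr in ["student_name", "grade"]:
    # Which single parts of the key are enough, on their own, to fix this attribute?
    partial_deps[attr] = [part for part in key if determines(enrolment, [part], [attr])]

print(partial_deps)
# {'student_name': ['student_id'], 'grade': []}
```

Here student_name depends on student_id alone, so it is a partial dependency and belongs in a
separate student relation. The grade needs both parts of the key, so it stays where it is.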

So is our Employees relation in 2NF now? Let's ask ourselves if it is in 1NF first. Since the
attributes now have atomic values, this is satisfied. Are all non-key columns functionally dependent
on the whole primary key? The primary key is the single attribute Employee_ID, so no partial
dependency is possible, and the name, station and department all relate to the
employee entity type. Therefore, the relation is already in 2NF.

We can still move department to a separate relation and link it to employees by means of a
foreign key.

Department_ID | Department_Name | Employee_ID
1             | Sales           | 2
2             | Production      | 1

THE THIRD NORMAL FORM [3NF]

This is usually the final stage in the normalization process; however, some database designers go
beyond the 3NF. For a relation to be in the 3NF, it must:

1. Meet the requirements of 2NF.
2. Have no transitive dependencies. This means that every non-key attribute must depend
on the primary key, and on the primary key only.

Is our relation in 3NF already? Are there transitive dependencies? There are none,
therefore our relation is in 3NF.

But suppose we introduced an attribute called supervisor name into the employee table. This is
how our table would look.

Employee_ID | Name             | Station | Department | Supervisor Name
2           | Agness Namugumya | Nansana | Sales      | Andrew Male
1           | Atwine Moreen    | Banda   | Production | Akankunda Lonitah

In this relation, the name, station and department belong to the employee. They are all
attributes of this entity type and none of them determines another. Now let us think about
supervisor name: does it depend on the employee id? Different departments have different
supervisors, so the dependency between employee id and supervisor name is established
through the department. We therefore see a transitive dependency, and this violates 3NF.

We can remove this troublesome attribute from the relation and create a new table for it called
supervisors and link it to the employee by means of a special key called……...? What did we say
serves as the referential key? That is the answer.
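
A minimal sketch of this split, using Python's built-in sqlite3 module, is given here. The column types
and the integer supervisor ids are assumptions made for the illustration; the names and departments
come from the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supervisors (
    supervisor_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    department    TEXT NOT NULL
);
CREATE TABLE employees (
    employee_id   INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    station       TEXT NOT NULL,
    department    TEXT NOT NULL,
    supervisor_id INTEGER REFERENCES supervisors(supervisor_id)  -- the referential key
);
INSERT INTO supervisors VALUES
    (2, 'Andrew Male', 'Sales'),
    (1, 'Akankunda Lonitah', 'Production');
INSERT INTO employees VALUES
    (2, 'Agness Namugumya', 'Nansana', 'Sales', 2),
    (1, 'Atwine Moreen', 'Banda', 'Production', 1);
""")

# The supervisor's name is looked up through the key instead of being repeated per employee.
rows = conn.execute("""
    SELECT e.name, s.name
    FROM employees AS e
    JOIN supervisors AS s ON s.supervisor_id = e.supervisor_id
    ORDER BY e.employee_id
""").fetchall()
print(rows)
# [('Atwine Moreen', 'Akankunda Lonitah'), ('Agness Namugumya', 'Andrew Male')]
```

If a supervisor's name changes, it is now corrected in one row of the supervisors table and every
employee picks up the change through the join.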

Supervisors Table

Supervisor_ID | Name              | email               | Department
2             | Andrew Male       | male@fantansy.com   | Sales
1             | Akankunda Lonitah | lnitah@fantansy.com | Production

Our Employees table now:

Employee_ID | Name             | Station | Department | Supervisor_ID
2           | Agness Namugumya | Nansana | Sales      | S1
1           | Atwine Moreen    | Banda   | Production | P1

Department Table

Department_ID | Department_Name | Employee_ID
1             | Sales           | 2
2             | Production      | 1

Contact_infor

Employee_ID | Phone1     | Phone2
2           | 0781852097 | 0781659712
1           | 0701659712 | 0781852099
3           | 070050970  | 078050097

The Concept of Partial dependencies and Transitive dependencies.

These two concepts are often confused. Both are common in database design
and normalization. The most concise difference between the two is that a partial dependency occurs
when an attribute depends on only a part of a composite primary key, while a transitive dependency
exists when an attribute in a relation depends on another non-key attribute, which in turn depends on the
primary key. The dependency "A depends on B and B depends on C" creates a transitive dependency
between A and C. Likewise, the notation A → B → C implies two things:
a. B functionally depends on A and C functionally depends on B.
b. A transitive dependency between C and A through B exists.

Please note that the value of the attribute on the left determines the value of the attribute on the right.
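
The chaining of dependencies can be followed mechanically with the standard attribute closure
procedure, sketched here in Python. The function name closure and the two dependencies (taken from
the supervisor example) are assumptions for illustration.

```python
def closure(attributes, dependencies):
    """Return every attribute determined by `attributes` under the given dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in dependencies:
            # If we already know the whole left side, we also know the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# Each pair reads "left side determines right side".
fds = [
    (("Employee_ID",), ("Department",)),
    (("Department",), ("Supervisor_Name",)),
]

reachable = closure({"Employee_ID"}, fds)
print(sorted(reachable))
# ['Department', 'Employee_ID', 'Supervisor_Name']
```

Supervisor_Name is reached from Employee_ID only by passing through Department, a non-key
attribute, which is what makes the dependency transitive.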

Dealing with Multivalued Attributes and Composite Attributes

A multi-valued attribute is an attribute that can have multiple values for a single entity instance. In
database normalization, multi-valued attributes are always resolved by creating a separate table to
represent the attribute, and linking the two tables by means of a foreign key.

Imagine a database for a company that tracks employee information, including an attribute 'skills'. If we
transform the ERD for such a database into a table, it would be as follows:

EMPLOYEE
EMPLOYEE_ID | NAME | SKILLS

In this case, the skills attribute (later the skills column, at the level of physical design and
implementation) can have multiple values representing the various skills possessed by an individual
employee (a single instance of the employee entity), such as 'programming', 'public speaking', 'design'
and 'database management'. To address this issue, we can create a separate table for skills and establish
a relationship with the employee table.
The new Employee table would be (EmployeeID, Name). Let's mark this as Table 1.

The skills table would be (EmployeeID, Skills). Oops! This still leaves us with a many-to-many
relationship, since one employee can have many skills and one skill can belong to many employees,
and the multivalued attribute has not gone away.

So we need to resolve the many-to-many relationship. We do this by creating a junction/bridge table. This
table's role is to store the relationship between the employee and skills tables. Let me call it the
EmpSkills table.

Table 2: Skills (SkillID, SkillName)

Table 3: EmpSkills (SkillID, EmployeeID). This is the junction table.
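
The three tables can be sketched with Python's built-in sqlite3 module as follows. The column types,
the sample employees and the assignment of skills to employees are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE skills   (skill_id INTEGER PRIMARY KEY, skill_name TEXT NOT NULL UNIQUE);
CREATE TABLE emp_skills (                 -- the junction table
    skill_id    INTEGER NOT NULL REFERENCES skills(skill_id),
    employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
    PRIMARY KEY (skill_id, employee_id)   -- one row per (skill, employee) pair
);
INSERT INTO employee VALUES (1, 'Atwine Moreen'), (2, 'Agness Namugumya');
INSERT INTO skills VALUES (1, 'programming'), (2, 'public speaking'), (3, 'design');
INSERT INTO emp_skills VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")


def skills_of(employee_id):
    """Return the skill names of one employee, in alphabetical order."""
    rows = conn.execute("""
        SELECT s.skill_name
        FROM emp_skills AS es
        JOIN skills AS s ON s.skill_id = es.skill_id
        WHERE es.employee_id = ?
        ORDER BY s.skill_name
    """, (employee_id,))
    return [name for (name,) in rows]


print(skills_of(1))  # ['programming', 'public speaking']
print(skills_of(2))  # ['design', 'public speaking']
```

Every cell now holds a single value: an employee with many skills simply has many rows in the
junction table, and a skill shared by many employees is still named only once.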

On the other hand, a Composite attribute is one that can be further subdivided into multiple meaningful
components. In other words, it’s made up of other attributes. We decompose the composite attribute into
individual components and create a separate attribute for each component.

Let us reflect on the attributes employee_name and address. Consider an employee called 'Nabukeera
Brittney Ainokumka'. The employee name in question (the attribute value) can be divided into three
meaningful parts: 'Brittney' (first name), 'Nabukeera' (last name/surname) and the remaining name
(maiden name or other name). So, we would rather have the attributes (columns, if we are at the physical
design or implementation stage) FirstName, LastName and OtherName than the composite attribute
'employeeName'.

Similarly, consider the address of the employee. This could mean their email address,
street name, zip code, postal address, etc. To resolve this, we would again subdivide the address attribute.

Task2: You are required to normalize the table we looked at when we were discussing anomalies (marked
practical example 1) at least up to the third normal form.

CONCLUSION

In this chapter, we reviewed the concept of unique identifiers and looked at the concept of database
normalization. We concentrated on the first three normal forms. Sometimes we find ourselves normalizing
the data up to 3NF almost automatically, by looking at the data and applying a little common sense. We
also noted that a database is bad if it does not at least conform to the rules of the first normal form. There
is a related concept called denormalization, which is the process of intentionally and carefully adding
redundancy to a database design to improve database performance. We have not looked at this concept
in depth, because it will be covered comprehensively in a sister course called Business Intelligence
and Data Warehousing.
