
Chapter 2 RM and Normalization V2

The document provides an overview of the relational model and normalization in database design, covering key concepts such as functional dependencies, normal forms, and the importance of reducing redundancy. It explains the structure of relations, candidate keys, primary keys, and foreign keys, as well as various types of anomalies that can occur in poorly designed databases. The content is aimed at students in a preparatory cycle for IT studies, focusing on foundational principles of database management.


Dr. Nour El-Houda BENALIA
Responsible for the IT subject
2nd year Preparatory cycle - ENP

Chapter 02: Relational model and Normalization
2 You will learn
• Functional dependencies.
• Normal forms.
• Normalization of a relation.

Normalization is a process that improves a database design by generating relations that are of higher normal forms.

[Diagram: ER → Transformation → RM → Normalisation (Functional dependencies, Normal forms)]
3 The relational model
• Informally, the relational model can be defined as follows:
- Data is organized in the form of two-dimensional tables, also called relations, the rows of which are called n-tuples or tuples.
- Data is manipulated by relational algebra operators.
- The coherent state of the database is defined by a set of integrity constraints.
4 Concepts of the relational model
1. Relation
• A relation is a subset of the Cartesian product of n attribute domains (n > 0).
• A relation is represented as a two-dimensional table in which the n attributes appear as the titles of the n columns.
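The definition can be made concrete with a small sketch. The two domains and the tuples below are hypothetical values chosen only for illustration; they do not come from the course.

```python
from itertools import product

# Hypothetical attribute domains, chosen only for illustration.
num_domain = {1, 2}
name_domain = {"Ali", "Sara"}

# The Cartesian product contains every possible (num, name) pair.
all_pairs = set(product(num_domain, name_domain))

# A relation is any subset of that product; here it holds two tuples.
person = {(1, "Ali"), (2, "Sara")}

print(len(all_pairs))       # number of possible tuples
print(person <= all_pairs)  # True when the relation is a subset of the product
```

Because the relation is a Python set, duplicates and tuple order are ignored, which matches the set-based view of a relation.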
5 Concepts of the relational model
2. Relation Schema
• A relation schema specifies the name of the relation as well as the list of attributes with their domains.
• Example of a relation schema:
Person (num : integer, FN : String, LN : String)
• Remarks:
- a relation is a set of tuples, so there is no notion of order on the tuples,
- on the other hand, a tuple is an ordered sequence of attribute values,
- an attribute value is atomic but can be null (a particular value which indicates that the value is missing).
6 Concepts of the relational model
3. Degree
The degree of a relation is its number of attributes.

4. Cardinality
The cardinality of a relation is its number of occurrences (tuples).

5. Occurrence
An occurrence is an element of the set represented by a relation. In other words, an occurrence is a row in the table.
7 Concepts of the relational model
6. Candidate key
• In relational theory, all tuples must be distinct.
• A relation therefore has at least one subset (A1, A2, …, Ak) of its set of attributes that makes it possible to identify each of the tuples of the relation.
• Informally, a key is a minimal set of attributes whose values are enough to identify a unique tuple of the relation considered.
• A key is invariant over time. In general, there can be several keys for the same relation R.
8 Concepts of the relational model
• Every relation has at least one candidate key and can have several. As a consequence, there can never be two identical occurrences within a relation: two such occurrences would in fact represent the same object.
• The candidate keys of a relation do not necessarily have the same number of attributes.
9 Concepts of the relational model
7. Primary key
• Among the candidate keys, we choose one, which is called the primary key (underlined in the relation schema).
• The primary key of a relation is one of its candidate keys.

Key examples:
• EMPLOYEE (number, last name, first name, salary) has a candidate key: number.
• If we are certain that two employees never have the same last name and first name, (last name, first name) is a second candidate key.
• So EMPLOYEE (number, last name, first name, salary) has 2 candidate keys: number (which we will certainly choose as the primary key) and (last name, first name).
10 Concepts of the relational model
8. Foreign key
• A foreign key in a relation is formed from one or more of its attributes that constitute a candidate key in another relation.
• A foreign key is not necessarily a candidate key of the relation in which it appears.
11 Database Schema and Integrity Constraints
• A relational database schema S (intension) is a set of relation schemas S = {R1, R2, …, Rp} and a set of integrity constraints (IC).
• An integrity constraint is a property of the schema, invariant over time.

There are different types of integrity constraints:
• model-related (no duplicates in a relation);
• domain (number_time < 100; no null value for primary key);
• referential (foreign key) constraints, which require that the attribute value of relation r1 appear as a key value in another relation r2.
12 Functional Dependencies

• A functional dependency, denoted by X → Y, between two sets of attributes X and Y (X and Y are subsets of R) specifies a constraint on the possible tuples that can form a relation instance r of R: for any two tuples t1 and t2 in r such that t1[X] = t2[X], we must have t1[Y] = t2[Y].
• If X → Y, we say X functionally determines Y or Y is functionally dependent on X.
• We abbreviate functional dependency by FD. X is called the left-hand side of the FD. Y is called the right-hand side of the FD.
• A functional dependency is a property of the meaning or semantics of the attributes, i.e., a property of the relation schema. It must hold on all relation states (extensions) of R. Relation extensions r(R) that satisfy the FD are called legal extensions.
13 Functional Dependencies (FD)

Example:
• CUSTOMER(CustomerNum, CustomerLastName, CustomerFirstName)
• CustomerNum → CustomerLastName.
• There is an FD between CustomerNum and CustomerLastName, because if we know a value of CustomerNum (e.g. 4553), only one value of CustomerLastName can correspond to it (e.g. Benbou).
• The converse is false: CustomerLastName → CustomerNum is not an FD. If we know the value of CustomerLastName, we cannot deduce CustomerNum from it, because there can be several customers who have the same name.
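As a rough illustration of the definition, the sketch below tests whether an FD is satisfied by one relation instance. The function name fd_holds and the second customer row (4554) are assumptions made for the example; only the pair 4553 / Benbou comes from the slide.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the FD lhs -> rhs is satisfied by this list of dict rows."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first Y value seen for an X value is remembered; a different one breaks the FD.
        if seen.setdefault(x, y) != y:
            return False
    return True

# Hypothetical instance: two customers sharing the same last name.
customers = [
    {"CustomerNum": 4553, "CustomerLastName": "Benbou"},
    {"CustomerNum": 4554, "CustomerLastName": "Benbou"},
]

print(fd_holds(customers, ["CustomerNum"], ["CustomerLastName"]))
print(fd_holds(customers, ["CustomerLastName"], ["CustomerNum"]))
```

A check like this can only refute an FD on a given instance. It cannot establish that the FD holds on the schema, since that depends on the meaning of the attributes.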
14 Functional Dependencies (Cont.)

 Examples.
1. SSN → ENAME
2. PNUMBER → {PNAME, PLOCATION}
3. {SSN, PNUMBER} → HOURS
4. Others?

 Diagrammatic notation for displaying FDs. (Fig. 14.3)

• An FD is a property of the relation schema R, not of a particular relation state/instance r(R).
• FDs cannot be inferred from a given relation extension r, but must be defined explicitly by someone who knows the semantics of the attributes of R. (Fig. 14.7)
15
Figure 14.3 Two relation schemas and their functional dependencies. Both suffer from update anomalies. (a) The EMP_DEPT relation schema. (b) The EMP_PROJ relation schema.
16 Functional Dependencies (Cont.)
 From the FDs:
F = {SSN → { ENAME, BDATE, ADDRESS, DNUMBER},
DNUMBER → {DNAME, DMGRSSN}}
we can infer the following FDs:
SSN → {ENAME, DMGRSSN},
SSN → SSN,
DNUMBER → DNAME

• An FD X → Y is inferred from a set of dependencies F specified on R if X → Y holds in every relation state r that is a legal extension of R.
• F |= X → Y denotes that X → Y is inferred from F.
• The closure of F, denoted by F+, is the set of all FDs that can be inferred from F.
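One common way to test whether F |= X → Y without enumerating all of F+ is to compute the attribute closure X+ and check that Y is contained in it. The slides do not name this technique; the sketch below, with the assumed helper names attribute_closure and is_inferred, applies it to the set F given above.

```python
def attribute_closure(attrs, fds):
    """Compute X+: every attribute determined by attrs under the FDs in fds."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already determined, the right side is too.
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure

def is_inferred(lhs, rhs, fds):
    """F |= lhs -> rhs exactly when rhs is inside the closure of lhs."""
    return set(rhs) <= attribute_closure(lhs, fds)

F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(is_inferred({"SSN"}, {"ENAME", "DMGRSSN"}, F))  # an FD listed as inferred above
print(is_inferred({"DNUMBER"}, {"SSN"}, F))           # not inferable from F
```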
17 Exercise 01: FDs
• Consider the following schema of the SCHEDULE relation:
SCHEDULE (number-st, name-st, day, time, room, teacher, module, name-department, university)
1) What do the following FDs mean, if they hold on SCHEDULE?
• number-st → module
• number-st, day → module
• number-st, day, time → module
• teacher, module → room
18
• A student takes only one module.
• A student takes only one module per day.
• A student takes a single module at a given time of the day.
• A teacher teaches a given module in a single room.
19 Exercise 01(cont.)
2) Translate, if possible, the following statements into FDs.
 A teacher can teach in several departments.
 A teacher teaches in the same room every day.
 A student can be registered at several universities.
 A student is registered in only one department of a university,
but can be registered in several universities.
20
- impossible
- teacher, day → room
- impossible
- number-st, number-univ → name-department
21 Elementary and Partial Functional Dependence
• An elementary FD (EFD) is a functional dependence of the form X → A, where A is a single attribute not belonging to X and where there does not exist X′ strictly included in X (i.e. X′ ⊂ X) such that X′ → A.
In other words, a functional dependency is elementary if the target is a single attribute and if the source does not include superfluous attributes. An FD whose source includes superfluous attributes is called partial.
Remark
• The question of whether a functional dependence is elementary therefore only arises when its left part includes several attributes.
• Example:
Employee (EmpMatr, ProjectNum, EmpName, ProjectDesign)
EmpMatr, ProjectNum → EmpName is not elementary (it is partial), because EmpMatr → EmpName.
22 Direct functional dependence
• An FD X → A is a direct FD (DFD) if there exists no set of attributes Y, distinct from X and A, such that X → Y and Y → A.
In other words, the dependence between X and A cannot be obtained by transitivity.
Example
Student(StudentNum, StudentName, DepartmentNum, DepartmentName)
The FD StudentNum → DepartmentNum is direct.
On the other hand, the FD StudentNum → DepartmentName is not direct. We have in fact: StudentNum → DepartmentNum and DepartmentNum → DepartmentName.
23 Inference rules for FDs

• Abbreviated notation: XYZ → UV for {X, Y, Z} → {U, V}
• Reflexive: If Y ⊆ X, then X → Y
• Augmentation: {X → Y} |= XZ → YZ
• Transitive: {X → Y, Y → Z} |= X → Z
• Decomposition (projective): {X → YZ} |= X → Y
• Union (additive): {X → Y, X → Z} |= X → YZ
• Pseudotransitive: {X → Y, WY → Z} |= WX → Z
• The first three rules, called Armstrong's inference rules, are sound and complete.
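Since the first three rules are complete, the other rules can be derived from them. As a worked example (a standard derivation, not taken from the slides), the union rule follows from augmentation and transitivity:

```latex
\begin{align*}
&1.\; X \to Y   && \text{(given)}\\
&2.\; X \to XY  && \text{(augmentation of 1 with } X\text{, using } XX = X\text{)}\\
&3.\; X \to Z   && \text{(given)}\\
&4.\; XY \to YZ && \text{(augmentation of 3 with } Y\text{)}\\
&5.\; X \to YZ  && \text{(transitivity on 2 and 4)}
\end{align*}
```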
24 Functional Dependencies (Cont.)

• Equivalence of sets of FDs.
• E is covered by F if every FD in E is also in F+, i.e., every FD in E can be inferred from F.
• E and F are equivalent if E+ = F+, i.e., E covers F and F covers E.
• F is minimal if
- every dependency in F has a single attribute for its right-hand side;
- we cannot remove any FD from F and still have a set of FDs equivalent to F;
- we cannot replace any FD X → A in F with an FD Y → A where Y ⊂ X and still have a set of FDs equivalent to F.
• Minimal set: a standard or canonical form with no redundancies.
• A minimal cover of F is a minimal set of dependencies, Fmin, that is equivalent to F.


25 Minimal Cover (MC)
• The minimal cover aims to find a minimal set of FDs representing “the same information” as a given set of FDs F, but without redundancies.
• MC: a set F of elementary FDs associated with a set of attributes and satisfying the following properties:
a) No FD in F is redundant, i.e. for any FD f of F, F - f is not equivalent to F;
b) Any elementary FD on the attributes is in the closure of F.
26 Example
F = {A → B, A → C, B → C, C → B}
Find the minimal cover(s).

MC1 = {A → C, B → C, C → B}
MC2 = {A → B, B → C, C → B}
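The two answers can be checked mechanically. The sketch below is a minimal verification, assuming FDs with a single attribute on each side (as in this example); the helper names closure, implies, equivalent and non_redundant are chosen for illustration.

```python
def closure(attrs, fds):
    """Attributes reachable from attrs; each FD is a (lhs, rhs) pair of single attributes."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs in result and rhs not in result:
                result.add(rhs)
                changed = True
    return result

def implies(fds, fd):
    """True if fd can be inferred from fds."""
    lhs, rhs = fd
    return rhs in closure({lhs}, fds)

def equivalent(f, g):
    """Two sets of FDs are equivalent when each one covers the other."""
    return all(implies(g, fd) for fd in f) and all(implies(f, fd) for fd in g)

def non_redundant(fds):
    """True if no FD can be inferred from the remaining ones."""
    return all(not implies([g for g in fds if g != fd], fd) for fd in fds)

F = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "B")]
MC1 = [("A", "C"), ("B", "C"), ("C", "B")]
MC2 = [("A", "B"), ("B", "C"), ("C", "B")]

for name, mc in (("MC1", MC1), ("MC2", MC2)):
    print(name, equivalent(F, mc), non_redundant(mc))
print("F non-redundant:", non_redundant(F))
```

Both candidate covers are equivalent to F and contain no redundant FD, while F itself does, which is why an FD has to be removed.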
27 Functional Dependency Graphs (FDG)
• FDs can be represented by a graph called a functional dependency graph (FDG).
• Let G be a set of attributes and F a set of FDs. The FDG relative to G and F is a graph in which:
- the vertices are the attributes of G, and
- the arcs are the functional dependencies of F.
28 Uses of Functional Dependency Graphs (FDG)
• Check that the graph is indeed minimal.
• Find the identifiers (keys) of the relation.
• Test whether the relation is good (well normalized).
• Otherwise, find the decompositions.

Example of a non-minimal graph:
E → D is deduced from E → C and C → D.
We must remove E → D from the graph.
29 Transitive Closure
• Transitive closure: the set of considered FDs enriched with all the elementary FDs deduced by transitivity.
• Example:
Let F be the set of FDs relative to R(N°veh, Type, Couleur, Marque, Puissance):
F = {N°veh → Type ; Type → Marque ; Type → Puissance ; N°veh → Couleur}
The transitive closure of F is:
FT = F ∪ {N°veh → Marque ; N°veh → Puissance}
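The example can be reproduced with a short sketch. It assumes every FD has a single attribute on each side, as in this example; the function name transitive_closure is chosen for illustration.

```python
def transitive_closure(fds):
    """Add X -> Z whenever X -> Y and Y -> Z are both present (single attributes)."""
    result = set(fds)
    changed = True
    while changed:
        changed = False
        for x, y in list(result):
            for y2, z in list(result):
                if y == y2 and x != z and (x, z) not in result:
                    result.add((x, z))
                    changed = True
    return result

F = {
    ("N°veh", "Type"),
    ("Type", "Marque"),
    ("Type", "Puissance"),
    ("N°veh", "Couleur"),
}

# FDs that transitivity adds on top of F.
added = transitive_closure(F) - F
print(sorted(added))
```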
30

Normalization
31 What is Normalization ?

▪ NORMALIZATION is a database design technique that organizes tables in a manner that reduces redundancy and dependency of data.
▪ Normalization divides larger tables into smaller tables and links them using relationships.
▪ The purpose of Normalization is to eliminate redundant (useless) data and ensure data is stored logically.
▪ E.F. Codd, the inventor of the relational model, proposed the theory of normalization.
32 Redundancy

• Row Level Redundancy: if SID is the primary key of each row, you can use it to remove the duplicates as shown below.

Before:
SID  SName  Age
1    Jojo   20
2    Kit    25
1    Jojo   20

After:
SID  SName  Age
1    Jojo   20
2    Kit    25
33 Redundancy (Cont..)

• Column Level Redundancy:
• Here the rows are all distinct, because Sid is the primary key, but the same values are repeated in the columns Cid, Cname, Fid, Fname and Salary (redundant column values).

Sid  Sname  Cid  Cname  Fid  Fname  Salary
1    AA     C1   DBMS   F1   Jojo   30000
2    BB     C2   JAVA   F2   KK     50000
3    CC     C1   DBMS   F1   Jojo   30000
4    DD     C1   DBMS   F1   Jojo   30000

34 What is an Anomaly?

• Problems that can occur in poorly planned, unnormalized databases where all the data is stored in one table (a flat-file database).
 Types of Anomalies:
• Insert
• Delete
• Update
35 Anomalies in DBMS

• Insert Anomaly: An Insert Anomaly occurs when certain attributes cannot be inserted into the database without the presence of other attributes.
• Delete Anomaly: A Delete Anomaly exists when certain attributes are lost because of the deletion of other attributes.
• Update Anomaly: An Update Anomaly exists when one or more instances of duplicated data are updated, but not all.
36 Anomaly Example

• The University table consists of seven attributes: Sid, Sname, Cid, Cname, Fid, Fname, and Salary. Sid acts as the key attribute (primary key) of the relation.
37 Insertion Anomaly

• Suppose a new faculty member joins the University, and the Database Administrator wants to insert the faculty data into the above table. The insertion is not possible until a student is associated with that faculty member, because Sid is the primary key and cannot be NULL. This type of anomaly is known as an insertion anomaly.
38 Delete Anomaly

• When the Database Administrator deletes the student details of Sid=2 from the above table, the faculty and course information stored in that row is deleted too and cannot be recovered.
SQL:
DELETE FROM University WHERE Sid=2;
39 Update Anomaly
• When the Database Administrator wants to change the salary of faculty F1 from 30000 to 40000 in the above table University, the salary must be updated in more than one row because of data redundancy. If some of these rows are missed, the table becomes inconsistent. This is an update anomaly.
SQL:
UPDATE University
SET Salary = 40000
WHERE Fid = 'F1';

To remove all these anomalies, we need to normalize the data in the database.
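The update and delete anomalies can be observed on a small in-memory database. The sketch below uses Python's standard sqlite3 module with the four University rows from the column-level redundancy slide; the column types are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE University (
    Sid INTEGER PRIMARY KEY, Sname TEXT, Cid TEXT, Cname TEXT,
    Fid TEXT, Fname TEXT, Salary INTEGER)""")
rows = [
    (1, "AA", "C1", "DBMS", "F1", "Jojo", 30000),
    (2, "BB", "C2", "JAVA", "F2", "KK", 50000),
    (3, "CC", "C1", "DBMS", "F1", "Jojo", 30000),
    (4, "DD", "C1", "DBMS", "F1", "Jojo", 30000),
]
conn.executemany("INSERT INTO University VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# Update anomaly: one salary change has to touch every row that repeats F1.
updated = conn.execute(
    "UPDATE University SET Salary = 40000 WHERE Fid = 'F1'").rowcount

# Delete anomaly: removing student 2 also removes the only trace of course C2.
conn.execute("DELETE FROM University WHERE Sid = 2")
c2_left = conn.execute(
    "SELECT COUNT(*) FROM University WHERE Cid = 'C2'").fetchone()[0]

print(updated, c2_left)
```

The single logical change to F1's salary modifies three rows, and after the delete no row mentions course C2 or faculty F2 any more.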
40 Normal forms
• The theory of data normalization in SQL is still being developed further. For example, there are discussions even on a 6th Normal Form. However, in most practical applications, normalization achieves its best in 3rd Normal Form. The evolution of normalization theories is illustrated below.

[Figure: Database Normalization]
41 First Normal Form (1NF)

• According to E.F. Codd, a relation is in 1NF if each cell of the relation contains only an atomic value.
42 1NF Example

• Example:
The following Course_Content relation is not in 1NF because the Content attribute contains multiple values.
43 1NF Example (Cont..)

 The below relation student is in 1NF:


44 Rules of 1NF
The official qualifications for 1NF are:
1. Each attribute name must be unique.
2. Each attribute value must be single (atomic).
3. Each row must be unique.

• Additional:
Choose a primary key.
• Reminder:
A primary key is unique, not null and unchanging. A primary key can be either a single attribute or a combination of attributes.
45 Second Normal Form (2NF)

• According to E.F. Codd, a relation is in 2NF if it satisfies the following conditions:
- The table should be in the First Normal Form.
- There should be no Partial Dependency.

46 Prime and Non Prime Attributes
Prime attributes: the attributes that belong to some candidate key are called prime attributes.

Non-prime attributes: the attributes that do not belong to any candidate key are called non-prime attributes.

• Prime attributes: Roll No., Course Code
• Non-prime attributes: First Name of Student, Last Name of Student

47 2NF Example
• In the Student_Project relation, the prime key attributes are Stu_ID and Proj_ID.
• According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must depend on both of them together and not on any single prime key attribute.
• But we find that Stu_Name can be identified by Stu_ID alone and Proj_Name can be identified by Proj_ID alone. This is called partial dependency, which is not allowed in Second Normal Form.

• Candidate key: {Stu_ID, Proj_ID}
• Non-prime attributes: Stu_Name, Proj_Name
48 2NF Example (Cont..)

• We broke the relation in two as depicted in the above picture, so there is no longer any partial dependency.
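The partial dependencies of Student_Project can also be found programmatically. The sketch below assumes a single candidate key (so every attribute outside the key is non-prime) and uses the two FDs stated in the example; the helper names closure and partial_dependencies are chosen for illustration.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attributes determined by attrs; each FD is a (lhs_set, rhs_set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, fds):
    """List (subset, attribute) pairs where a proper subset of the key
    determines an attribute outside the key."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(sorted(key), size):
            for attr in sorted(closure(subset, fds) - set(key)):
                found.append((subset, attr))
    return found

fds = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]
key = {"Stu_ID", "Proj_ID"}

print(partial_dependencies(key, fds))
```

With a single-attribute key there is no proper non-empty subset to test, so the function returns an empty list, which matches the observation that such a relation cannot have partial dependencies.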
49 Example 2NF

• The Course Name depends only on CourseID, which is a part of the primary key, not on the whole primary key {CourseID, SemesterID}. This is called partial dependency.
• Solution:
• Remove CourseID and Course Name together to create a new table.
50 Example 2NF (Cont..)
CourseID  SemesterID  Num Student
IT101     201301      25
IT101     201302      25
IT102     201301      30
IT102     201302      35
IT103     201401      20

Done? Oh no, it is still not in 1NF yet. Remove the repeating groups too. Finally, connect the relationship.

CourseID  Course Name
IT101     Database
IT102     Web Prog
IT103     Networking
51 Third Normal Form (3NF)

• According to E.F. Codd, a relation is in third normal form (3NF) if it satisfies the following conditions:
✓ It should be in the Second Normal Form.
✓ It should not have any Transitive Dependency.
✓ All transitive dependencies are removed and placed in another table.
52 3NF Example
• We find that in the above Student_detail relation, Stu_ID is the key and the only prime key attribute.
• We find that City can be identified by Stu_ID as well as by Zip itself.
• Zip is not a superkey and City is not a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.

Candidate key: {Stu_ID}
Prime attribute: Stu_ID
Non-prime attributes: {Stu_Name, City, Zip}
53 3NF Example (Cont..)

• To bring this relation into third normal form, we break the relation into two relations as follows:
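The 3NF condition used in this example (an FD X → A is acceptable only if X is a superkey or A is a prime attribute) can be checked with a short sketch. The FD Stu_ID → Stu_Name is assumed from the fact that Stu_ID is the key, and the helper names are chosen for illustration.

```python
def closure(attrs, fds):
    """Attributes determined by attrs; each FD is a (lhs_set, rhs_set) pair."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def third_nf_violations(attrs, fds, prime):
    """Return the FDs X -> A that break 3NF: X is not a superkey
    and A is not a prime attribute."""
    bad = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) >= attrs
        for a in sorted(rhs - lhs):
            if not is_superkey and a not in prime:
                bad.append((sorted(lhs), a))
    return bad

attrs = {"Stu_ID", "Stu_Name", "City", "Zip"}
fds = [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})]

print(third_nf_violations(attrs, fds, prime={"Stu_ID"}))
```

The only violation reported is Zip → City, the dependency that the decomposition has to move into its own relation.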
54 Example 3NF

The Teacher Tel is a nonkey attribute, and the Teacher Name is also a nonkey attribute. But Teacher Tel depends on Teacher Name. This is called transitive dependency.
Solution: Remove Teacher Name and Teacher Tel together to create a new table.
55 Example 3NF
StudyID  Course Name  T.ID
1        Database     T1
2        Database     T2
3        Web Prog     T3
4        Web Prog     T3
5        Networking   T4

Done? Oh no, it is still not in 1NF yet. Remove the repeating rows.

ID  Teacher Name  Teacher Tel
T1  Sok Piseth    012 123 456
T2  Sao Kanha     0977 322 111
T3  Chan Veasna   012 412 333
T4  Pou Sambath   077 545 221

Note about primary key:
- In theory, you can choose Teacher Name to be a primary key.
- But in practice, you should add Teacher ID as the primary key.
56 Example Table

 StudentID is the primary key.

Is it 1NF?
How can you make it 1NF?
57 Example 1 (Cont..)

• Create new rows so each cell contains only one value.
• But now StudentID no longer uniquely identifies each row. You now need StudentID and Subject together to uniquely identify each row. So the new key is StudentID and Subject.
Is it 2NF?
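The "one value per cell" step can be sketched as follows. The rows are hypothetical (the original example table is not reproduced here); only the column names StudentID, StudentName and Subject follow the example.

```python
# Hypothetical rows: the Subject cell holds several values, so the table is not in 1NF.
students = [
    {"StudentID": 1, "StudentName": "Amina", "Subject": ["Math", "Physics"]},
    {"StudentID": 2, "StudentName": "Karim", "Subject": ["Math"]},
]

# Create one row per (student, subject) pair so that every cell is atomic.
flat = [
    {"StudentID": s["StudentID"], "StudentName": s["StudentName"], "Subject": subj}
    for s in students
    for subj in s["Subject"]
]

# StudentID alone repeats now; the pair (StudentID, Subject) identifies each row.
keys = [(r["StudentID"], r["Subject"]) for r in flat]
print(len(flat), len(set(keys)) == len(keys))
```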
58 Example 1 (Cont..)

• StudentName and Address are dependent on StudentID (which is part of the key). This is good. But they are not dependent on Subject (the other part of the key).
• And 2NF requires that all non-key fields are dependent on the ENTIRE key (StudentID + Subject).
59 Example 1 (Cont..)

• Make new tables:
- Make a new table for each primary key field.
- Give each new table its own primary key.
- Move columns from the original table to the new table that matches their primary key.
60 Example (Cont..)

• STUDENT TABLE (key = StudentID)
• RESULTS TABLE (key = StudentID + Subject)
• SUBJECTS TABLE (key = Subject)

But is it 3NF?
61 Example 1 (Cont..)

• HouseName is dependent on both StudentID + HouseColour
Or
• HouseColour is dependent on both StudentID + HouseName
• But either way, non-key fields are dependent on MORE THAN THE PRIMARY KEY (StudentID). And 3NF says that non-key fields must depend on nothing but the key.
62 Example 1 (Cont..)
63 Example 1 (Cont..)

• The Final Scheme


64 Example 2

• We will use the Student_Grade_Report table below, from a School database, as our example to explain the process for 1NF.

Student_Grade_Report (StudentNo, StudentName, Major, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)
65 Process for 1NF
• In the Student Grade Report table, the repeating group is the course information. A student can take many courses.
• Remove the repeating group. In this case, it’s the course information for each student.
• Identify the PK for your new table.
• The PK must uniquely identify the attribute value (StudentNo and CourseNo).
• After removing all the attributes related to the course and student, you are left with the student course table (StudentCourse).
• The Student table (Student) is now in first normal form with the repeating group removed.
• The two new tables are shown below:

Student (StudentNo, StudentName, Major)
StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)
66 Example 2 (Cont..)
Student (StudentNo, StudentName, Major)
StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo,
InstructorName, InstructorLocation, Grade)

• To move to 2NF, a table must first be in 1NF.
• The Student table is already in 2NF because it has a single-column PK.
• When examining the StudentCourse table, we see that not all the attributes are fully dependent on the PK; specifically, all the course information. The only attribute that is fully dependent is Grade.
• Identify the new table that contains the course information.
• Identify the PK for the new table.
• The three new tables are shown below.
67 Example 2 (Cont..)

Student (StudentNo, StudentName, Major)
CourseGrade (StudentNo, CourseNo, Grade)
CourseInstructor (CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation)
68 Process for 3NF

• Eliminate all dependent attributes in transitive relationship(s) from each of the tables that have a transitive relationship.
• Create new table(s) with the removed dependency.
• Check the new table(s) as well as the table(s) modified to make sure that each table has a determinant and that no table contains inappropriate dependencies.
• See the four new tables below.
69 Process for 3NF

Student (StudentNo, StudentName, Major)
CourseGrade (StudentNo, CourseNo, Grade)
Course (CourseNo, CourseName, InstructorNo)
Instructor (InstructorNo, InstructorName, InstructorLocation)
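As a possible implementation of this 3NF schema, the sketch below creates the four tables in an in-memory SQLite database through Python's standard sqlite3 module. The column types are assumptions, and the foreign keys are inferred from the shared attribute names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign key checks off by default
conn.executescript("""
CREATE TABLE Student (
    StudentNo INTEGER PRIMARY KEY, StudentName TEXT, Major TEXT);
CREATE TABLE Instructor (
    InstructorNo INTEGER PRIMARY KEY, InstructorName TEXT, InstructorLocation TEXT);
CREATE TABLE Course (
    CourseNo TEXT PRIMARY KEY, CourseName TEXT,
    InstructorNo INTEGER REFERENCES Instructor(InstructorNo));
CREATE TABLE CourseGrade (
    StudentNo INTEGER REFERENCES Student(StudentNo),
    CourseNo TEXT REFERENCES Course(CourseNo),
    Grade TEXT,
    PRIMARY KEY (StudentNo, CourseNo));
""")

# A grade for a student and course that do not exist violates referential integrity.
try:
    conn.execute("INSERT INTO CourseGrade VALUES (1, 'C101', 'A')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

print(rejected)
```

Each fact is now stored once: instructor details live only in Instructor, and CourseGrade keeps only the attribute that depends on the whole key (StudentNo, CourseNo).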
70 Process for 3NF

• At this stage, there should be no anomalies in third normal form.

Student (StudentNo, StudentName, Major)
StudentCourse (StudentNo, CourseNo, CourseName, InstructorNo, InstructorName, InstructorLocation, Grade)
71

END
