Dbms Unit 3 Part2

Schema refinement is a process aimed at eliminating data redundancy and maintaining consistency in databases through normalization and decomposition. Redundant data can lead to various anomalies such as update, deletion, and insertion anomalies, which complicate database management. Decomposition can be either lossless or lossy, impacting the integrity of the data when tables are divided into smaller relations.

Uploaded by

shinchanm1809

Schema refinement

• Schema refinement addresses the problems caused by redundant storage of
information, using an approach based on decomposition. Redundancy is the
root cause of these problems. Although decomposition can eliminate
redundancy, it can lead to problems of its own and should be used with caution.
• Normalisation, or schema refinement, is a technique for organizing the data in
the database. It is a systematic approach of decomposing tables to eliminate
data redundancy and undesirable characteristics such as insertion, update and
deletion anomalies.
• Goal: to remove the redundancy problem from the database through
normalization.
• Goal: to maintain consistency and integrity.
Problems caused by redundancy
• Redundancy means having multiple copies of the same data in the database.
This problem arises when a database is not normalized.
• If a database design is not sound, it may contain anomalies. Managing a
database with anomalies is next to impossible.
Ex: STUDENT relation
SID  NAME     ADDRESS  COURSE
1    Rashmi   Kolkata  C++
2    Rahul    Punjab   Algorithms
3    Sonia    Haryana  C++
3    Sonia    Haryana  Algorithms
4    Shwetha  Delhi    NULL
1. Update anomalies − If copies of a data item are scattered and are not
linked to each other properly, updates can lead to strange situations.
For example, when we try to update one data item whose copies are
scattered over several places, a few instances get updated properly
while a few others are left with old values. Such instances leave the
database in an inconsistent state.
Ex: Updating the address of Sonia, say she shifted from Haryana to
Chandigarh, means updating every row that contains Sonia's record.
Otherwise the rows disagree with each other. This inconsistency is
called an update anomaly and leads to loss of integrity.

2. Deletion anomalies – A deletion anomaly arises when deleting one
fact unintentionally removes other facts stored in the same row, or when
copies of the deleted data remain elsewhere because the user was
unaware of them. Certain attributes are lost because of the deletion of
some other attributes.
Ex: Let us say that Rahul dropped the course "Algorithms". As
soon as we delete the row corresponding to the course
"Algorithms", the entire record of Rahul gets deleted. To avoid this
we could insert a NULL entry in the course attribute instead of deleting
the record, but a better option is normalization.

3. Insertion anomalies –
An insertion anomaly arises when certain attributes cannot be
inserted into the database without the presence of other
attributes.
Ex: Adding the record of a new student who is not taking any course
means inserting a student record with a NULL value in the course
field. The NULL value is used to work around the insertion
anomaly; without it we could not insert a record for a
student who has not yet opted for any course.
Decomposition
• To avoid redundancy and the problems it causes, we use a
refinement technique called decomposition.
• Decomposition in DBMS removes redundancy, anomalies and
inconsistencies from a database by dividing a table into multiple
tables.
• Decomposition:- the process of decomposing a larger relation into
smaller relations.
• Each of the smaller relations contains a subset of the attributes of the
original relation.
Decomposing a relation into a number of relations

R(ABCD…)

R1(AB)   R2(BC)   ……   Rn(ACD)
• Decompositions are of two types
1.Lossless Join Decomposition.
2.Lossy Join Decomposition.
1. Lossless Join Decomposition-
• Consider there is a relation R which is decomposed into sub
relations R1 , R2 , …. , Rn.
• This decomposition is called lossless join decomposition when
the join of the sub relations results in the same relation R that
was decomposed.
• For lossless join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn = R
where ⋈ is a natural join operator
Example-
Consider the following relation R( A , B , C )-
A B C
1 2 1
2 5 3
3 3 3
Consider this relation is decomposed into two sub relations R1( A , B )
and R2( B , C )-
The two sub relations are-
R1( A , B )
A B
1 2
2 5
3 3
R2( B , C )
B C
2 1
5 3
3 3
• Now, let us check whether this decomposition is
lossless or not.
• For lossless decomposition, we must have-
• R1 ⋈ R2 = R
• Now, if we perform the natural join ( ⋈ ) of the
sub relations R1 and R2 , we get-
A B C
1 2 1
2 5 3
3 3 3
• This relation is same as the original relation R.
• Thus, we conclude that the above decomposition is
lossless join decomposition.
NOTE-
• Lossless join decomposition is also known as non-
additive join decomposition.
• This is because the resultant relation after joining
the sub relations is same as the decomposed
relation.
• No extraneous tuples appear after joining of the
sub-relations.
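The check above can be reproduced with a small sketch. It assumes relations are stored as lists of dictionaries; the helper name `natural_join` is illustrative and not part of the original slides.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts (attribute -> value)."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])  # attributes shared by both schemas
    joined = []
    for t in r:
        for u in s:
            # Keep the pair only when it agrees on every common attribute.
            if all(t[a] == u[a] for a in common):
                row = {**t, **u}
                if row not in joined:
                    joined.append(row)
    return joined

R = [{"A": 1, "B": 2, "C": 1}, {"A": 2, "B": 5, "C": 3}, {"A": 3, "B": 3, "C": 3}]
R1 = [{"A": 1, "B": 2}, {"A": 2, "B": 5}, {"A": 3, "B": 3}]
R2 = [{"B": 2, "C": 1}, {"B": 5, "C": 3}, {"B": 3, "C": 3}]

result = natural_join(R1, R2)
same = len(result) == len(R) and all(row in R for row in result)
print(same)  # True
```

Because every B value in R1 matches exactly one row of R2, the join returns the three original tuples and nothing else.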
2. Lossy Join Decomposition-
• Consider there is a relation R which is decomposed into sub
relations R1 , R2 , …. , Rn.
• This decomposition is called lossy join decomposition when the
join of the sub relations does not result in the same relation R that
was decomposed.
• The natural join of the sub relations is always found to have some
extraneous tuples.
• For lossy join decomposition, we always have-
R1 ⋈ R2 ⋈ R3 ……. ⋈ Rn ⊃ R
where ⋈ is a natural join operator
Example-
• Consider the following relation R( A , B , C )-
A B C
1 2 1
2 5 3
3 3 3

R( A , B , C )
Consider this relation is decomposed into two sub relations R1( A , C ) and R2( B , C )-

The two sub relations are-
R1( A , C )
A C
1 1
2 3
3 3
R2( B , C )
B C
2 1
5 3
3 3

Now, let us check whether this decomposition is lossy or not.
For lossy decomposition, we must have-
R1 ⋈ R2 ⊃ R
Now, if we perform the natural join ( ⋈ ) of the sub relations R1 and
R2 we get-
A B C
1 2 1
2 5 3
2 3 3
3 5 3
3 3 3
This relation is not the same as the original relation R
and contains some extraneous tuples.
Clearly, R1 ⋈ R2 ⊃ R.
Thus, we conclude that the above decomposition is
lossy join decomposition.

NOTE-
• Lossy join decomposition is also known as careless
decomposition.
• This is because extraneous tuples get introduced in the natural
join of the sub-relations.
• Extraneous tuples make the identification of the original tuples
difficult.
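To see which tuples are extraneous, the following sketch projects R onto the two sub-schemas, joins the projections and subtracts the original tuples. The helper names (`project`, `join`, `spurious_tuples`) are hypothetical, and the sketch assumes the two sub-schemas together cover every attribute of R.

```python
def project(rows, schema, attrs):
    """Project a set of tuples over `schema` onto `attrs` (duplicates removed)."""
    idx = [schema.index(a) for a in attrs]
    return {tuple(row[i] for i in idx) for row in rows}

def join(r, r_attrs, s, s_attrs):
    """Natural join; returns (rows, attrs) with common attributes kept once."""
    common = [a for a in r_attrs if a in s_attrs]
    extra = [a for a in s_attrs if a not in r_attrs]
    out = set()
    for t in r:
        for u in s:
            if all(t[r_attrs.index(a)] == u[s_attrs.index(a)] for a in common):
                out.add(t + tuple(u[s_attrs.index(a)] for a in extra))
    return out, r_attrs + extra

def spurious_tuples(rows, schema, attrs1, attrs2):
    """Tuples in the join of the two projections that are not in the original."""
    joined, attrs = join(project(rows, schema, attrs1), attrs1,
                         project(rows, schema, attrs2), attrs2)
    # Reorder the joined columns to match the original schema before comparing.
    order = [attrs.index(a) for a in schema]
    return {tuple(t[i] for i in order) for t in joined} - set(rows)

schema = ["A", "B", "C"]
R = {(1, 2, 1), (2, 5, 3), (3, 3, 3)}
print(spurious_tuples(R, schema, ["A", "B"], ["B", "C"]))          # set()
print(sorted(spurious_tuples(R, schema, ["A", "C"], ["B", "C"])))  # [(2, 3, 3), (3, 5, 3)]
```

An empty result means the decomposition is lossless for this instance; a non-empty result lists exactly the tuples that make it lossy.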
NOTE: R ⋈ S (natural join):-
If (Attributes of Relation R) ∩ (Attributes of Relation S) = Ø, then the
natural join acts as a cross product. Otherwise the common
attributes appear only once in the result.
Ex:-
Relation (R)
A B
a1 b1
a2 b2
a3 b3
Relation (S)
C D
c1 d1
c2 d2
R ⋈ S = R ✕ S when
(Attributes of Relation R) ∩ (Attributes of Relation S) = Ø
Here R has M = 3 tuples and S has N = 2 tuples, so the result has
M*N = 3*2 = 6 tuples in total.
A B C D
a1 b1 c1 d1
a1 b1 c2 d2
a2 b2 c1 d1
a2 b2 c2 d2
a3 b3 c1 d1
a3 b3 c2 d2
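As a quick illustration, the same six tuples can be produced with the standard-library `itertools.product`; storing each relation as a list of tuples is an assumption made for this sketch.

```python
from itertools import product

R = [("a1", "b1"), ("a2", "b2"), ("a3", "b3")]  # schema (A, B)
S = [("c1", "d1"), ("c2", "d2")]                # schema (C, D)

# No shared attributes, so every tuple of R pairs with every tuple of S.
cross = [r + s for r, s in product(R, S)]
print(len(cross))  # 6
for row in cross:
    print(row)
```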
Functional Dependency
• A functional dependency (FD) describes how one attribute (or set of
attributes) determines another attribute in a relation.
• Functional dependency helps you to maintain the quality of data in
the database.
• Functional dependencies play a vital role in telling good database
designs from bad ones.
• FD is the generalization of the concept of key.
• A functional dependency is denoted by an arrow →.
• The functional dependency of Y on X is represented by X → Y, i.e.
X → Y means X determines Y (equivalently, Y is functionally dependent on X).
• Here X is called the determinant and Y is called the dependent.
• X → Y says that if two tuples agree on the values of attribute X they
must also agree on the value in attribute Y.
• Given a relation R, a set of attributes X in R is said to functionally
determine another set of attributes Y also in R, i.e. X → Y, iff (if and
only if) each X value is associated with precisely one value of Y. R is
then said to satisfy the FD X → Y.
Find valid and invalid FDs

A  B  C  D
a1 b1 c1 d1
a1 b2 c2 d1
a2 b2 c1 d2
a3 b3 c2 d2
a4 b4 c4 d4
a5 b3 c3 d3

Invalid FDs are
A→B, A→C, B→A, B→C, B→D, C→A, C→B, C→D, D→A, D→B, D→C, AD→B, AD→C.
Valid FDs are
A→D, AB→C, AB→D, AC→B, AC→D, BC→A, BC→D, BD→A, BD→C, CD→A, CD→B.
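The valid and invalid lists above can be checked mechanically. The sketch below tests whether an FD is satisfied by one particular table instance; the function name `holds` is illustrative. Note that an instance can only refute an FD: holding in this table does not prove the FD holds for every legal instance of the relation.

```python
def holds(rows, schema, lhs, rhs):
    """True if the FD lhs -> rhs is satisfied by this particular set of rows."""
    seen = {}
    for row in rows:
        t = dict(zip(schema, row))
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        # setdefault stores val the first time a key is seen and
        # returns the stored value on later visits.
        if seen.setdefault(key, val) != val:
            return False
    return True

schema = "ABCD"
rows = [
    ("a1", "b1", "c1", "d1"),
    ("a1", "b2", "c2", "d1"),
    ("a2", "b2", "c1", "d2"),
    ("a3", "b3", "c2", "d2"),
    ("a4", "b4", "c4", "d4"),
    ("a5", "b3", "c3", "d3"),
]
print(holds(rows, schema, "A", "D"))   # True
print(holds(rows, schema, "A", "B"))   # False
print(holds(rows, schema, "AD", "C"))  # False
print(holds(rows, schema, "BC", "A"))  # True
```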
Reasoning about functional dependencies

Armstrong Axioms :
Armstrong's axioms define a set of rules for
reasoning about functional dependencies and for
inferring all the functional dependencies that hold on a
relational database.
The inference rules fall into two groups:
1. Primary axioms 2. Secondary or derived axioms
Rule 4 : Composition
If X → Y and Z → W then XZ → YW
Attribute closure:
The attribute closure of an attribute set is the set of all attributes
which can be functionally determined from it.
NOTE:
To find the attribute closure of an attribute set-
1) Add the elements of the attribute set to the result set.
2) Repeatedly add to the result set every attribute which can be
functionally determined from the elements already in the result set,
until nothing new can be added.
Ex1: R(A, B, C) FD’s:{A → B, B → C}
Attribute closure of A, i.e. A+ = {A, B, C}
Attribute closure of B, i.e. B+ = {B, C}
Ex2: R(A, B, C, D, E, F) FD’s: {AB → C, BC → AD, D →E, CE →B}
Find (AB)+, (BC)+, (D)+, (CE)+

Solution:
(AB)+ = {A, B, C, D, E}, so the following FD’s can be derived
from (AB)+:
AB → A, AB → B, AB → C, AB → D, AB → E
(BC)+ ={B, C, A, D, E} = {A, B, C, D, E}
(D)+={D,E}
(CE)+={C, E, B, A, D}={A, B, C, D, E}

Ex3: R(A, B, C, D, E, F, G)
FD: {AB → CD, AF → D, DE → F, C → G, F → E, G → A}
Find (CF)+, (BG)+, (AF)+, (AB)+
(CF)+ = {C, F, G, E, A, D}
(BG)+ = {B, G, A, C, D}
(AF)+ = {A, F, E, D}
(AB)+ = {A, B, C, D, G}

Ex4: R(A, B, C, D, E)
FD: {A → B, B → D, C →DE, CD → AB}
Find (A)+ , (B)+ , (C)+ , (D)+ , (F)+ , (ABD)+

Solution:
(A)+ = {A, B, D}
(B)+ = {B, D}
(C)+ = {C, D, E, A, B}
(D)+ = {D}
(F)+ = {F} (F is not an attribute of R and appears in no FD, so it determines only itself)
(ABD)+ = {A, B, D}
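The two-step procedure for attribute closure can be sketched as a fixed-point loop. This is a minimal illustration, assuming each FD is written as a pair of attribute strings such as `("AB", "C")` for AB → C; the function name `closure` is not from the slides.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)          # step 1: start with the attributes themselves
    changed = True
    while changed:               # step 2: repeat until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Ex2 from above
fds = [("AB", "C"), ("BC", "AD"), ("D", "E"), ("CE", "B")]
print(sorted(closure("AB", fds)))  # ['A', 'B', 'C', 'D', 'E']
print(sorted(closure("D", fds)))   # ['D', 'E']
```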
Types of functional dependencies:
1) Trivial functional dependency:- An FD X→Y is said to be trivial if
and only if (iff) Y⊆X. In other words, if the RHS of an FD is a
subset of its LHS, the FD is called trivial.
Ex: AB → A, AB → B, AB → AB
AB → C (non-trivial FD), AB → AC (non-trivial FD)
2) Completely non-trivial functional dependency:- If X→Y and X∩Y=Ø,
i.e. no attribute of the RHS appears in the LHS, then it is called a
completely non-trivial functional dependency.
Ex: AB → C, AB → CD
3) Semi non-trivial FD:- If X→Y, X ∩ Y ≠ Ø and Y is not a subset of X, then
it is called a semi non-trivial FD. In other words, the RHS shares at least
one attribute with the LHS and also has at least one attribute that is not
part of the LHS.
Ex: AB → BC (AB ∩ BC = B, i.e. ≠ Ø), AB → AC. By contrast, AB → A is trivial.
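The three cases can be told apart with two set comparisons. The following is a small sketch; the function name `classify` and the string encoding of attribute sets are assumptions made for illustration.

```python
def classify(lhs, rhs):
    """Classify the FD lhs -> rhs as trivial, semi or completely non-trivial."""
    x, y = set(lhs), set(rhs)
    if y <= x:
        return "trivial"                 # RHS is a subset of LHS
    if x & y:
        return "semi non-trivial"        # some overlap, but RHS adds attributes
    return "completely non-trivial"      # no overlap at all

for lhs, rhs in [("AB", "A"), ("AB", "BC"), ("AB", "CD")]:
    print(f"{lhs} -> {rhs}: {classify(lhs, rhs)}")
```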
Prime and non-prime attributes
Attributes which are part of any candidate key of a relation are called
prime attributes; the others are non-prime attributes.
Candidate Key:
A candidate key is a minimal super key, i.e. a minimal set of attributes of a
relation which can be used to identify a tuple uniquely.
Consider student table: student(sno, sname, sphone, age)
We can take sno as a candidate key. A table can have more than one candidate
key.
Types of candidate keys:
1. simple (having only one attribute)
2. composite (having multiple attributes as candidate key)
Super Key:
A super key is a set of attributes of a relation which can
be used to identify a tuple uniquely.
• Adding zero or more attributes to a candidate key
generates a super key.
• Every candidate key is a super key, but not every super key is a
candidate key.
Consider student table: student(sno, sname,
sphone, age)
We can take sno or (sno, sname) as a super key.
EX 1: R(A, B, C, D) FD’s are {A → BC, B → CD, D → AB}
Find candidate keys, super keys, prime attributes and non-prime
attributes.
Solution: 1. A candidate key is a minimal super key: a minimal set of
attributes that determines all attributes of the relation and can therefore
be used to identify a tuple uniquely.
2. A super key determines all the attributes of the relation.
3. Attributes which are part of any candidate key of the relation are
called prime attributes; the others are non-prime attributes.
Note: 1. If X+ contains all the attributes of the relation then X is a
super key of relation R.
2. If, in addition, no proper subset of X is a super key, then X is a
candidate key of relation R.
A+ = ABCD
B+ = BCDA
C+ = C
D+ = DABC
Here A, B and D are candidate keys, so A, B and D are prime attributes.
C is a non-prime attribute.
EX 2: R(A, B, C) FD’s are {A → B, B → C}
Solution: A+ = ABC: A is a SK and a CK.
B+ = BC: B is neither a SK nor a CK because it does not determine all
attributes of relation R.
C+ = C: C is neither a SK nor a CK because it does not determine all
attributes of relation R.
(AB)+ = ABC: AB is a SK but not a CK (it is not minimal).
(AC)+ = ABC: AC is a SK but not a CK.
(BC)+ = BC: BC is neither a SK nor a CK.
(ABC)+ = ABC: ABC is a SK but not a CK.
Prime attribute is A.
Non-prime attributes are B and C.
Ex 3: R(ABCDE) FD’s are {AB→C, C→D, B→E}. Find Candidate keys,
Super keys, Prime attributes and Non-Prime attributes ?
Solution:
• Ex 4: R(ABCDE) FD’s are {AB→C, C→D, B→EA}. Find Candidate
keys, Super keys, Prime attributes and Non-Prime attributes ?
Solution:
• Ex 5: R(ABCD) FD’s are {AB→CD, A→B}. Find Candidate keys,
Super keys, Prime attributes and Non-Prime attributes ?
Solution:
• Ex 6: R(ABCDE) FD’s are {A→B, BC→D, D→AE}. Find Candidate
keys, Super keys, Prime attributes and Non-Prime attributes ?
Solution:
• Ex 7: R(ABCDE) FD’s are {AB→C, CD→E, DE→B}. Find Candidate
keys, Super keys, Prime attributes and Non-Prime attributes ?
Solution:
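The worked solutions for Ex 3 to Ex 7 are not shown here. As a way to check an answer, the sketch below finds candidate keys by brute force: it tries attribute sets in order of increasing size and keeps those whose closure is the whole relation, skipping supersets of keys already found. The helper names and the string encoding of FDs are assumptions, and the search is exponential in the number of attributes, so it is only suitable for small textbook relations.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """All minimal attribute sets whose closure is the whole relation."""
    all_attrs = set(relation)
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            # A superset of a known key is a super key but not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if closure(combo, fds) == all_attrs:
                keys.append("".join(combo))
    return keys

# EX 1 from above: R(A, B, C, D) with {A -> BC, B -> CD, D -> AB}
fds = [("A", "BC"), ("B", "CD"), ("D", "AB")]
keys = candidate_keys("ABCD", fds)
prime = sorted(set("".join(keys)))
print(keys)   # ['A', 'B', 'D']
print(prime)  # ['A', 'B', 'D']
```

Non-prime attributes are then the attributes of the relation that do not appear in `prime`, and every superset of a candidate key is a super key.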
Normalization of Database

Database normalization is a technique of organizing the data in the
database. Normalization is a systematic approach of decomposing
tables to eliminate data redundancy (repetition) and undesirable
characteristics like insertion, update and deletion anomalies. It is a
multi-step process that puts data into tabular form, removing
duplicated data from the relation tables. Normalization is used
mainly for two purposes:
• Eliminating redundant (useless) data.
• Ensuring data dependencies make sense, i.e. data is logically stored.
It helps in designing a good database and involves a set of normal
forms, as follows -
First normal form
A relation is said to be in first normal form if every attribute contains
only atomic (single) values.
Example:
Course       Content
Programming  Java, C++
Web          HTML, Php, ASP
The above table holds multiple values in a single column. It can be
brought into first normal form by reducing them to atomic values as
follows
Course       Content
Programming  Java
Programming  C++
Web          HTML
Web          Php
Web          ASP
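The conversion above amounts to emitting one row per atomic value. A minimal sketch, assuming the multi-valued column is stored as a comma-separated string (the function name `to_1nf` is illustrative):

```python
rows = [("Programming", "Java, C++"), ("Web", "HTML, Php, ASP")]

def to_1nf(rows):
    """Split a comma-separated second column into one row per atomic value."""
    return [(course, item.strip())
            for course, content in rows
            for item in content.split(",")]

for row in to_1nf(rows):
    print(row)
```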
Second normal form
• A relation is said to be in second normal form if it is in first normal
form and has no partial dependencies.
• In second normal form, non-prime attributes should not depend on a
proper subset of any candidate key.
Example:
Student ID   Student Name   Project ID   Project Name

Here (Student ID, Project ID) are key attributes and (Student Name,
Project Name) are non-prime attributes. It is decomposed as
Project ID   Project Name
Student ID   Student Name   Project ID
(The second relation is free of partial dependencies only if each student
works on a single project, so that Student ID alone is its key.)
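Ex: R(A, B, C, D) FD’s are {A→B, B→C} Find whether it is in 2nd Normal
form or not ?
Solution: (AD)+ = {A, D, B, C}
Note: A+ = {A, B, C}, D+ = {D}
So CK=AD
Prime attributes are A and D.
Non-prime attributes are B and C.
A→B is a partial dependency, because A is a proper subset of the candidate
key AD. So R is not in 2nd Normal form and is decomposed using A+ = {A, B, C}:
R1={A, B, C}
FDs are {A → BC, B → C}
A+=ABC, B+=BC, C+=C
CK=A for R1
R2={A, D}
FDs are {Nil}
A+=ABC and D+=D under the original FDs, so CK=AD for R2
So R1 and R2 are each in 2nd Normal form.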
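The search for partial dependencies in this example can be sketched as follows. The candidate key and the prime attributes are passed in as known inputs (an assumption of this sketch), and the helper names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def partial_dependencies(candidate_key, prime, fds):
    """(subset, attrs) pairs where a proper subset of the key determines non-prime attrs."""
    found = []
    for size in range(1, len(candidate_key)):
        for subset in combinations(candidate_key, size):
            dependents = closure(subset, fds) - set(subset) - set(prime)
            if dependents:
                found.append(("".join(subset), "".join(sorted(dependents))))
    return found

fds = [("A", "B"), ("B", "C")]
print(partial_dependencies("AD", "AD", fds))  # [('A', 'BC')]  R is not in 2NF
print(partial_dependencies("A", "A", fds))    # []             R1 with key A
```

A single-attribute key has no proper non-empty subset, which is why the second call returns an empty list.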
Third Normal Form (3NF)
A table is said to be in the Third Normal Form when,
1. It is in the Second Normal form.
2. And, it doesn't have any transitive dependency.
Transitive dependency – If A->B and B->C are two FDs then A->C is
called transitive dependency.

A relation is in 3NF if at least one of the following conditions holds for
every non-trivial functional dependency X –> Y
1. X is a super key.
Or
2. Y is a prime attribute (each element of Y is part of some candidate
key).
Example:
Student Id   Student Name   City   Country   Zip

This table is decomposed as
Student Id   Student Name   Zip
Zip   City   Country

Ex: R(A, B, C, D) FD’s are {A→B, B→C, C→D} Find whether it is in 3rd
Normal form or not ?
Solution: A+ = {A, B, C, D}
So CK=A
Prime attribute is A.
Non-prime attributes are B, C and D.
B→C, C→D- These two FDs are transitive dependencies, so R is in 2nd NF
but not in 3rd NF.
R is decomposed using the closures of these FDs:
B→C: B+ = {B, C, D}, so R1={B, C, D} with FDs {B→CD, C→D}
C→D: C+ = {C, D}, so R2={C, D} with FDs {C→D}
The remaining FD gives R3={A, B} with FDs {A→B}
R1 is still in 2nd NF but not in 3rd NF.
In R1, C→D is a transitive dependency, so R1 is divided into R11 and
R12:
C+ = {C, D}, so R11={C, D} with FDs {C→D}
R12={B, C} with FDs {B→C}
R11 is the same relation as R2.
So the final sub relations are R12(B, C), R2(C, D), R3(A, B), and each of
them is in 3rd NF.
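The two 3NF conditions can be tested FD by FD. The sketch below assumes that the FDs passed in are the ones that hold on the relation being checked and that the prime attributes are already known; the function name `violates_3nf` is illustrative.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def violates_3nf(relation, fds, prime):
    """FDs X -> a (single attribute a) that break 3NF in `relation`."""
    bad = []
    for lhs, rhs in fds:
        for a in rhs:
            if a in lhs:
                continue  # trivial part of the FD
            if closure(lhs, fds) >= set(relation):
                continue  # condition 1: X is a super key
            if a in prime:
                continue  # condition 2: a is a prime attribute
            bad.append((lhs, a))
    return bad

fds = [("A", "B"), ("B", "C"), ("C", "D")]
print(violates_3nf("ABCD", fds, prime="A"))           # [('B', 'C'), ('C', 'D')]
print(violates_3nf("BC", [("B", "C")], prime="B"))    # []
print(violates_3nf("CD", [("C", "D")], prime="C"))    # []
print(violates_3nf("AB", [("A", "B")], prime="A"))    # []
```

The first call reports the two transitive dependencies of R; the other three calls confirm that R12, R2 and R3 have no violations.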
Boyce and Codd Normal Form (BCNF)
Boyce and Codd Normal Form is a stricter version of the Third Normal Form. It
deals with a certain type of anomaly that is not handled by 3NF. A 3NF table which
does not have multiple overlapping candidate keys is said to be in BCNF. For a table
to be in BCNF, the following conditions must be satisfied:
• R must be in 3rd Normal Form and
• for each non-trivial functional dependency ( X → Y ), X should be a super key.
Example:
St Name   Course   Teacher

This table is divided further into
St Name   Course
Course   Teacher
Ex: R(A, B, C, D, E, F, G, H) FD’s are {A→BD, B→C, E→FG, AE→H}. Find
whether it is in BCNF or not ?
Solution: (AE)+ = {A, E, B, D, C, F, G, H}
So CK=AE
Prime attributes are A and E.
Non-prime attributes are B, C, D, F, G and H.
In A→BD, B→C and E→FG the left-hand side is not a super key, so R is not in
BCNF and is decomposed.
A→BD: A+ = {A, B, D, C}, so R1=(ABCD)
A+ = {A, B, D, C}, B+ = {B, C}, C+ = {C}, D+ = {D}
FDs are {A→BCD, B→C}
In R1, B→C has a left-hand side that is not a super key, so R1 is divided
into R11 and R12:
R11={B, C}: B+ = {B, C}, C+ = {C}, FDs are {B→C}
R12={A, B, D}: A+ = {A, B, D, C}, D+ = {D}, FDs are {A→BD}
So these are in BCNF.
B→C: B+ = {B, C}, so R2={B, C}
C+ = {C}, FDs are {B→C}
It is in BCNF.
E→FG: E+ = {E, F, G}, so R3={E, F, G}
F+ = {F}, G+ = {G}, FDs are {E→FG}
It is in BCNF.
R4={H, A, E}
H+ = {H}, A+ = {A, B, D, C}, E+ = {E, F, G}
(AE)+ = AEBDCFGH, (AH)+ = AHBDC, (HE)+ = HEFG, (HAE)+ = HAEBDCFG
FDs are {AE→H}
All of the above (R2, R3 and R4) are in BCNF.
So {R11 ∪ R12 ∪ R2 ∪ R3 ∪ R4}=R
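The BCNF condition can be checked with one closure computation per FD. This is a minimal sketch under the assumption that the FDs passed in are those that hold on the relation being checked; the function name `bcnf_violations` is hypothetical.

```python
def closure(attrs, fds):
    """Closure of `attrs` under `fds`, a list of (lhs, rhs) attribute strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs whose left-hand side is not a super key of `relation`."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= set(lhs)
            and not closure(lhs, fds) >= set(relation)]

fds = [("A", "BD"), ("B", "C"), ("E", "FG"), ("AE", "H")]
print(bcnf_violations("ABCDEFGH", fds))       # [('A', 'BD'), ('B', 'C'), ('E', 'FG')]
print(bcnf_violations("ABD", [("A", "BD")]))  # []  R12
print(bcnf_violations("AEH", [("AE", "H")]))  # []  R4
```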
Fourth Normal Form (4NF)
A table is said to be in the Fourth Normal Form when,
1. It is in the Boyce-Codd Normal Form.
2. it doesn't have Multi-Valued Dependency .
Note: In some cases multi value dependencies may exist not more
than one time in a given relation.

S_ID  Course   Hobby
1     Science  Cricket
1     Maths    Hockey
2     C#       Cricket
2     Php      Hockey

The multi-valued dependencies are (→ → is the MVD symbol):
S_ID → → Course
S_ID → → Hobby

The table is decomposed as

S_ID → → Course
S_ID  Course
1     Science
1     Maths
2     C#
2     Php

S_ID → → Hobby
S_ID  Hobby
1     Cricket
1     Hockey
2     Cricket
2     Hockey
Fifth Normal Form / Project-Join Normal Form (5NF)

A relation R is in 5NF if and only if every join dependency in R is
implied by the candidate keys of R. A relation decomposed into two
relations must have the lossless join property, which ensures that no
spurious or extra tuples are generated when the relations are reunited
through a natural join.
Properties – A relation R is in 5NF if and only if it satisfies the following
conditions:
1. R should already be in 4NF.
2. It cannot be losslessly decomposed any further (it has no join
dependency that is not implied by its candidate keys).


Relation(R)
Agent  Company  Product
Smith  C1       Book
Smith  C1       Pencil
Smith  C2       Laptop
Sam    C1       Laptop

R1
Agent  Company
Smith  C1
Smith  C2
Sam    C1

R2
Agent  Product
Smith  Book
Smith  Pencil
Smith  Laptop
Sam    Laptop

R1⋈R2 is
Agent  Company  Product
Smith  C1       Book
Smith  C1       Pencil
Smith  C1       Laptop
Smith  C2       Book
Smith  C2       Pencil
Smith  C2       Laptop
Sam    C1       Laptop
R1⋈R2 has 7 tuples while R has only 4, so the join contains spurious
tuples and this decomposition of R is not lossless.
Key Points related to normal forms –

1. BCNF is free from redundancy caused by functional dependencies.
2. If a relation is in BCNF, then 3NF is also satisfied.
3. If all attributes of a relation are prime attributes, then the relation is always in
3NF.
4. A relation in a relational database is always at least in 1NF.
5. Every binary relation (a relation with only 2 attributes) is always in BCNF.
6. If a relation has only singleton candidate keys (i.e. every candidate key consists
of only 1 attribute), then the relation is always in 2NF (because no partial
functional dependency is possible).
7. Sometimes going for BCNF may not preserve functional dependencies. In
that case go for BCNF only if the lost FD(s) are not required, else normalize till
3NF only.
8. There are more normal forms beyond BCNF, such as 4NF and 5NF.
But in real world database systems it’s generally not required to go beyond
BCNF.
