ADBMSUnit2pptx 2023 07 31 11 36 38

The document discusses data redundancy in databases, outlining its disadvantages such as insert, update, and delete anomalies. It explains normalization as a method to organize data and minimize redundancy, detailing various normal forms including 1NF, 2NF, 3NF, and BCNF. The document also provides examples of how to determine the highest normal form of a relation based on functional dependencies.

Uploaded by

parthx932
DIPLOMA – COMPUTER ENGINEERING
Advance Database Management System (09CE1502)
Unit 2: Normalization
Data redundancy
 Redundancy means that the same data is repeated in the database.
 It leads to various problems:
 Insert anomalies
 Update anomalies
 Delete anomalies
Disadvantages of data redundancy
 In the example table, the values of the attributes college name, college rank and course are repeated, which can lead to problems.
Insert anomaly
 If the details of a student whose course has not yet been decided have to be inserted, the insertion is not possible until a course is decided for that student.
 This problem occurs when a data record cannot be inserted without adding some additional, unrelated data to the record.
Update anomaly
 If the rank of a college changes, the change has to be made in every row that stores that rank, which is time-consuming and computationally costly.
 If the update does not occur at all of these places, the database will be left in an inconsistent state.
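The update anomaly can be illustrated with a small sketch. The table contents, the attribute names and the helper functions below are hypothetical and only serve to show how a partial update leaves two different ranks stored for the same college.

```python
# Hypothetical denormalized student table: college details repeat on every row.
students = [
    {"student": "Asha", "college": "ABC College", "college_rank": 5},
    {"student": "Ravi", "college": "ABC College", "college_rank": 5},
    {"student": "Meera", "college": "XYZ College", "college_rank": 9},
]

def ranks_per_college(rows):
    """Map each college to the set of distinct ranks stored for it."""
    ranks = {}
    for row in rows:
        ranks.setdefault(row["college"], set()).add(row["college_rank"])
    return ranks

def inconsistent_colleges(rows):
    """Return the colleges that are stored with more than one rank."""
    return sorted(c for c, r in ranks_per_college(rows).items() if len(r) > 1)

# Update the rank in only one of the two ABC College rows.
students[0]["college_rank"] = 3
print(inconsistent_colleges(students))  # ['ABC College']
```

If the college rank were stored once in a separate college table, a single update would be enough and this inconsistency could not arise.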
Delete anomaly
 If the details of the students in this table are deleted, the details of the college are deleted as well, which should not happen.
 This anomaly occurs when deleting a data record also loses some unrelated information that was stored as part of the deleted record.
Normalization
 Normalization is the process of organizing the data in the database.
 Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics such as insertion, update and deletion anomalies.
 Normalization divides a larger table into smaller tables and links them using relationships.
 Normal forms are used to reduce redundancy in database tables.
Types of normal form
 1NF
 2NF
 3NF
 BCNF (Boyce-Codd NF)
 4NF
 5NF
First Normal Form (1NF)
 A relation is in 1NF if every attribute contains only atomic values.
 An attribute of a table cannot hold multiple values; it must hold a single value only.
 First normal form disallows multi-valued attributes, composite attributes, and their combinations.
 Relation EMPLOYEE is not in 1NF because of the multi-valued attribute EMP_PHONE.
 To bring EMPLOYEE into 1NF, the table is decomposed so that each row holds a single EMP_PHONE value.
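A minimal sketch of such a decomposition is given here. Only the names EMPLOYEE and EMP_PHONE come from the slides; EMP_ID, EMP_NAME, the sample values and the helper to_1nf are assumptions made for illustration.

```python
def to_1nf(rows, multi_valued):
    """Return one row per value of the multi-valued attribute."""
    flat = []
    for row in rows:
        for value in row[multi_valued]:
            new_row = dict(row)            # copy the single-valued attributes
            new_row[multi_valued] = value  # keep exactly one value per row
            flat.append(new_row)
    return flat

# Hypothetical EMPLOYEE rows in which EMP_PHONE holds a list of values.
employee = [
    {"EMP_ID": 14, "EMP_NAME": "John", "EMP_PHONE": ["555-0101", "555-0102"]},
    {"EMP_ID": 20, "EMP_NAME": "Harry", "EMP_PHONE": ["555-0103"]},
]

for row in to_1nf(employee, "EMP_PHONE"):
    print(row)
```

After flattening, EMP_ID alone no longer identifies a row; in this sketch the pair (EMP_ID, EMP_PHONE) does.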
Second Normal Form (2NF)
 Before we learn about the second normal form, we need to understand the following terms:
 Prime attribute: an attribute that is part of some candidate key.
 Non-prime attribute: an attribute that is not part of any candidate key.
 To be in 2NF, a relation must first be in 1NF.
 In the second normal form, every non-prime attribute is fully functionally dependent on each candidate key; no non-prime attribute depends on only part of a candidate key.
 In the given table, the non-prime attribute TEACHER_AGE depends on TEACHER_ID, which is a proper subset of a candidate key. That is why the table violates the rule for 2NF.
 To convert the given table into 2NF, we decompose it into two tables: Teacher_detail and Teacher_subject.
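The decomposition can be sketched in a few lines. The sample rows, the SUBJECT attribute and the helpers fd_holds and project are assumptions for illustration; only TEACHER_ID, TEACHER_AGE and the two table names come from the slides.

```python
def fd_holds(rows, lhs, rhs):
    """True if equal lhs values always imply equal rhs values in rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples, keeping order."""
    out, seen = [], set()
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

# Hypothetical TEACHER rows; the candidate key is assumed to be (TEACHER_ID, SUBJECT).
teacher = [
    {"TEACHER_ID": 25, "SUBJECT": "Chemistry", "TEACHER_AGE": 30},
    {"TEACHER_ID": 25, "SUBJECT": "Biology", "TEACHER_AGE": 30},
    {"TEACHER_ID": 47, "SUBJECT": "English", "TEACHER_AGE": 35},
    {"TEACHER_ID": 83, "SUBJECT": "Math", "TEACHER_AGE": 38},
    {"TEACHER_ID": 83, "SUBJECT": "Computer", "TEACHER_AGE": 38},
]

# TEACHER_AGE depends on TEACHER_ID alone, i.e. on part of the key.
print(fd_holds(teacher, ["TEACHER_ID"], ["TEACHER_AGE"]))  # True

teacher_detail = project(teacher, ["TEACHER_ID", "TEACHER_AGE"])
teacher_subject = project(teacher, ["TEACHER_ID", "SUBJECT"])
print(len(teacher_detail), len(teacher_subject))  # 3 5
```

Each teacher's age is now stored once in teacher_detail instead of once per subject taught.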
Third Normal Form (3NF)
 A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on a candidate key.
 3NF is used to reduce data duplication and to help achieve data integrity.
 Equivalently, for every non-trivial functional dependency X → Y, either X is a super key or every attribute of Y is a prime attribute.
 Here, EMP_STATE and EMP_CITY depend on EMP_ZIP, and EMP_ZIP depends on EMP_ID.
 The non-prime attributes EMP_STATE and EMP_CITY are therefore transitively dependent on the primary key EMP_ID.
 This violates the rule of third normal form.
 That is why we need to move EMP_CITY and EMP_STATE to a new EMPLOYEE_ZIP table, with EMP_ZIP as its primary key. The result is two tables: EMPLOYEE and EMPLOYEE_ZIP.
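A short sketch can show that this decomposition loses no information: joining the two smaller tables on EMP_ZIP gives back the original rows. The sample values and the helpers project and natural_join are hypothetical; only the attribute and table names come from the slides.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples but keeping order."""
    unique = dict.fromkeys(tuple(row[a] for a in attrs) for row in rows)
    return [dict(zip(attrs, values)) for values in unique]

def natural_join(left, right):
    """Join two non-empty lists of dict rows on the attribute names they share."""
    common = [a for a in left[0] if a in right[0]]
    return [{**lrow, **rrow} for lrow in left for rrow in right
            if all(lrow[a] == rrow[a] for a in common)]

# Hypothetical rows of the original (unnormalized) EMPLOYEE table.
employee = [
    {"EMP_ID": 1, "EMP_ZIP": "Z1", "EMP_STATE": "S1", "EMP_CITY": "C1"},
    {"EMP_ID": 2, "EMP_ZIP": "Z1", "EMP_STATE": "S1", "EMP_CITY": "C1"},
    {"EMP_ID": 3, "EMP_ZIP": "Z2", "EMP_STATE": "S2", "EMP_CITY": "C2"},
]

employee_3nf = project(employee, ["EMP_ID", "EMP_ZIP"])
employee_zip = project(employee, ["EMP_ZIP", "EMP_STATE", "EMP_CITY"])
rejoined = natural_join(employee_3nf, employee_zip)

print(len(employee_zip))     # state and city are stored once per EMP_ZIP
print(rejoined == employee)  # the join reconstructs the original rows
```

The join is lossless here because EMP_ZIP, the shared attribute, is a key of EMPLOYEE_ZIP.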
Boyce-Codd Normal Form (BCNF)
 BCNF is the advanced version of 3NF. It is stricter than 3NF.
 A table is in BCNF if, for every non-trivial functional dependency X → Y, X is a super key of the table.
 In other words, the table should be in 3NF, and the LHS of every FD should be a super key.
 Candidate super keys:
 {STU_ID}: not possible
 {STU_ID, SUBJECT}: possible
 {SUBJECT, PROFESSOR_ID}: not possible
 Dependencies:
 {STU_ID, SUBJECT} → PROFESSOR_ID
 PROFESSOR_ID → SUBJECT
 In the second dependency the determinant PROFESSOR_ID is not a super key, although the dependent attribute SUBJECT is a prime attribute.
 It violates the rule of BCNF.
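The BCNF test for this example can be sketched as follows. The function names closure and bcnf_violations and the representation of FDs as pairs of Python sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Return the non-trivial FDs whose left-hand side is not a super key."""
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs and closure(lhs, fds) != relation]

relation = {"STU_ID", "SUBJECT", "PROFESSOR_ID"}
fds = [
    ({"STU_ID", "SUBJECT"}, {"PROFESSOR_ID"}),
    ({"PROFESSOR_ID"}, {"SUBJECT"}),
]

for lhs, rhs in bcnf_violations(relation, fds):
    print(sorted(lhs), "->", sorted(rhs))  # ['PROFESSOR_ID'] -> ['SUBJECT']
```

Only PROFESSOR_ID → SUBJECT is reported, because the closure of {PROFESSOR_ID} is {PROFESSOR_ID, SUBJECT}, which does not contain STU_ID.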
Relationship between FDs and keys
 A set X of attributes in R is a superkey of R if and only if X+ contains all attributes of R. In other words, X is a superkey if and only if it determines all other attributes.
 X is a candidate key if and only if it is a superkey, but none of its proper subsets is a superkey.
 In other words, X is a candidate key if and only if X+ = R, but for any proper subset Y of X, Y+ ≠ R.
 The above observation provides a way to find all candidate keys of R using the functional dependencies: we can simply try every subset of R and test whether it is a candidate key.
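These definitions translate almost directly into code. The sketch below assumes single-letter attribute names, so that a string such as "AB" stands for the attribute set {A, B}; the function names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_superkey(x, relation, fds):
    """X is a superkey if and only if X+ contains all attributes of R."""
    return closure(x, fds) >= set(relation)

def is_candidate_key(x, relation, fds):
    """X is a candidate key if it is a superkey and no proper subset is one."""
    x = set(x)
    if not is_superkey(x, relation, fds):
        return False
    # Dropping one attribute at a time is enough: if any proper subset were a
    # superkey, the larger subset obtained by removing one attribute would be too.
    return not any(is_superkey(x - {a}, relation, fds) for a in x)

R = "ABCDE"
F = [("AB", "C"), ("DE", "B"), ("CD", "E")]
print(sorted(closure("AD", F)))        # ['A', 'D']
print(is_candidate_key("ABD", R, F))   # True
print(is_candidate_key("ABCD", R, F))  # False: ABD is already a superkey
```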
 However, the candidate keys can be found more efficiently using the observations below:
 Observation 1: every candidate key must contain all attributes that do not appear on the RHS of any functional dependency.
 Observation 2: if an attribute appears on the RHS of some FD but not on the LHS of any FD, then it cannot be in any candidate key.
Candidate keys using FDs
 Given a set F of functional dependencies that hold on R, we can find all candidate keys of R as follows:
1. Find all attributes that do not appear on the RHS of any FD. Denote this set by A.
2. Denote by B the set of attributes that appear on the RHS of some FD but not on the LHS of any FD.
3. Compute the closure A+. If A+ = R, then A is the only candidate key.
4. If A+ ≠ R, then for each attribute x in R - A - B, test whether A ∪ {x} is a candidate key. If not, try adding another attribute from R - A - B and test whether the result is a candidate key.
5. Repeat step 4 until all candidate keys have been found.
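The five steps can be sketched as one function. This is one possible reading of the procedure, assuming single-letter attribute names and FDs given as (lhs, rhs) string pairs; the names candidate_keys, core, excluded and optional are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """Find all candidate keys of relation under fds."""
    relation = set(relation)
    rhs_attrs = set().union(*(set(rhs) for _, rhs in fds))
    lhs_attrs = set().union(*(set(lhs) for lhs, _ in fds))
    core = relation - rhs_attrs        # step 1: set A, part of every key
    excluded = rhs_attrs - lhs_attrs   # step 2: set B, part of no key
    if closure(core, fds) == relation: # step 3: A is the only candidate key
        return [core]
    optional = sorted(relation - core - excluded)
    keys = []
    for size in range(1, len(optional) + 1):  # steps 4 and 5
        for extra in combinations(optional, size):
            candidate = core | set(extra)
            if any(key <= candidate for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(candidate, fds) == relation:
                keys.append(candidate)
    return keys

fds = [("AB", "C"), ("DE", "B"), ("CD", "E")]
print(sorted("".join(sorted(k)) for k in candidate_keys("ABCDE", fds)))
# ['ABD', 'ACD', 'ADE']
```

Trying the extra attributes in order of increasing set size guarantees that a set is only accepted when none of its subsets is already a key, which is what makes each result minimal.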
Candidate keys using FDs: Example 1
 Relation schema R(ABCDE). FDs: AB → C, DE → B, CD → E.
 Since A and D do not appear on the RHS of any FD, A and D must be part of every candidate key.
 (AD)+ = (AD)
 Since (AD)+ does not contain all attributes of R, AD cannot be a candidate key.
 Now consider combinations of A and D with one more attribute:
 (ADB)+ = (ABCD) (using AB → C)
   = (ABCDE) (using CD → E)
 (ADC)+ = (ACDE) (using CD → E)
   = (ABCDE) (using DE → B)
 (ADE)+ = (ABDE) (using DE → B)
   = (ABCDE) (using AB → C)
 ADB, ADC and ADE are candidate keys of the relation R, because (ADB)+, (ADC)+ and (ADE)+ each contain all attributes of R and AD alone is not a superkey.
Candidate keys using FDs: Example 2
 R(ABCDE). FDs: A → C, CD → B.
 A, D and E do not appear on the RHS of any FD, so they must be part of every candidate key.
 (ADE)+ = (ACDE) (using A → C)
   = (ABCDE) (using CD → B)
 ADE is the only candidate key of the relation R, because (ADE)+ contains all attributes of R.
Highest normal form
 Steps to find the highest normal form of a relation:
1. Find all possible candidate keys of the relation.
2. Divide all attributes into two categories: prime attributes and non-prime attributes.
3. Check for 1st normal form, then 2nd, and so on. If the relation fails to satisfy the condition for the nth normal form, its highest normal form is the (n-1)th.
 Is the relation in 1NF? Yes, as long as every attribute is atomic, which a relational DBMS enforces.
 Is the relation in 2NF? Is there a composite candidate key? If not, 2NF is automatically achieved. Otherwise, check the LHS and RHS of each FD: if a proper subset of a candidate key determines a non-prime attribute, the relation is not in 2NF.
 Is the relation in 3NF? Does the relation have any non-prime attributes? If not, the relation is automatically in 3NF. Otherwise: is there an FD whose LHS is not a super key and whose RHS is a non-prime attribute, for example a non-prime attribute that determines another non-prime attribute (a transitive dependency, because the last non-prime attribute is determined indirectly)? If yes, 3NF fails (and the relation is thus in 2NF).
 Is the relation in BCNF? Are all the determinants super keys? If yes, BCNF passes.
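The checklist can be combined into a single function. The sketch below assumes the relation is already in 1NF, uses single-letter attribute names, and finds candidate keys by brute force over all subsets, which is only practical for small relations; all function names are illustrative.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(relation, fds):
    """Brute force: try subsets by increasing size, keep minimal super keys."""
    relation, keys = set(relation), []
    for size in range(1, len(relation) + 1):
        for combo in combinations(sorted(relation), size):
            cand = set(combo)
            if any(key <= cand for key in keys):
                continue
            if closure(cand, fds) == relation:
                keys.append(cand)
    return keys

def highest_normal_form(relation, fds):
    """Return '1NF', '2NF', '3NF' or 'BCNF' (atomic attributes are assumed)."""
    relation = set(relation)
    keys = candidate_keys(relation, fds)
    prime = set().union(*keys)
    non_prime = relation - prime
    # 2NF: no proper subset of a candidate key determines a non-prime attribute.
    for key in keys:
        for size in range(1, len(key)):
            for part in combinations(sorted(key), size):
                if closure(part, fds) & non_prime:
                    return "1NF"
    result = "BCNF"
    for lhs, rhs in fds:
        lhs, rhs = set(lhs), set(rhs)
        if rhs <= lhs or closure(lhs, fds) == relation:
            continue          # trivial FD or super key LHS: allowed in BCNF
        if rhs - lhs <= prime:
            result = "3NF"    # allowed in 3NF only, because the RHS is prime
        else:
            return "2NF"      # non-super-key LHS determines a non-prime attribute
    return result

print(highest_normal_form("ABCDE", [("BC", "D"), ("AC", "BE"), ("B", "E")]))  # 2NF
```

The printed result follows from the candidate key AC: no part of AC determines a non-prime attribute, but BC → D has a LHS that is not a super key and a non-prime RHS.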
Highest normal form: Example 1
 Find the highest normal form of a relation R(A,B,C,D,E) with FD set {A → D, B → A, BC → D, AC → BE}.
 Step 1. (AC)+ = {A,C,B,E,D}, but no proper subset of AC determines all attributes of the relation, so AC is a candidate key. A can be derived from B, so we can replace A in AC by B; BC is therefore also a candidate key. There are two candidate keys: {AC, BC}.
 Step 2. Prime attributes are those that are part of a candidate key, {A,B,C} in this example; the others are non-prime, {D,E} in this example.
 Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.
 The relation is not in 2nd normal form, because A → D is a partial dependency (A, which is a proper subset of the candidate key AC, determines the non-prime attribute D) and 2nd normal form does not allow partial dependencies.
 So the highest normal form is 1st normal form.
Highest normal form: Example 2
 Find the highest normal form of a relation R(A,B,C,D,E) with FD set {BC → D, AC → BE, B → E}.
 Step 1. (AC)+ = {A,C,B,E,D}, but no proper subset of AC determines all attributes of the relation, so AC is a candidate key. Neither A nor C can be derived from any other attribute of the relation, so there is only one candidate key: {AC}.
 Step 2. Prime attributes are those that are part of a candidate key, {A,C} in this example; the others are non-prime, {B,D,E} in this example.
 Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.
 The relation is in 2nd normal form because no FD is a partial dependency: in BC → D, BC is not a proper subset of the candidate key AC; in AC → BE, AC is the candidate key; and in B → E, B is not a proper subset of the candidate key AC.
 The relation is not in 3rd normal form because of BC → D (BC is not a super key and D is not a prime attribute) and B → E (B is not a super key and E is not a prime attribute). To satisfy 3rd normal form, either the LHS of each FD must be a super key or its RHS must be a prime attribute.
 So the highest normal form of the relation is 2nd normal form.
Highest normal form: Example 3
 Find the highest normal form of a relation R(A,B,C,D,E) with FD set {B → A, A → C, BC → D, AC → BE}.
 Step 1. (B)+ = {B,A,C,D,E}, so B is a candidate key. B can be derived from AC using AC → B (decomposing AC → BE into AC → B and AC → E), so AC is a super key. However, (C)+ = {C} and (A)+ = {A,C,B,E,D}, so A (a subset of AC) is a candidate key. There are two candidate keys: A and B.
 Step 2. Prime attributes are those that are part of a candidate key, {A,B} in this example; the others are non-prime, {C,D,E} in this example.
 Step 3. The relation R is in 1st normal form, as a relational DBMS does not allow multi-valued or composite attributes.
 The relation is in 2nd normal form because the LHS of every FD is a super key: B in B → A, A in A → C, BC in BC → D, and AC in AC → BE.
 The relation is in 3rd normal form because the LHS of every FD is a super key. For the same reason it is in BCNF. So the highest normal form is BCNF.
Highest normal form: Examples
 Find the highest normal form of a relation R(A,B,C,D,E,F,G,H) with FD set {AB → C, A → DE, B → F, F → GH}.
 Find the highest normal form of a relation R(A,B,C,D,E) with FD set {CE → D, D → B, C → A}.
 Find the highest normal form of a relation R(A,B,C,D,E,F) with FD set {AB → C, DC → AE, E → F}.
THANK YOU
