UNIT-III Redundancy

The document discusses schema refinement in databases, focusing on the problems caused by redundancy, such as insertion, deletion, and updation anomalies, which can lead to data inconsistency and increased storage requirements. It outlines the concepts of normalization and decomposition as methods to reduce redundancy and improve data integrity, along with the types of decomposition (lossless and lossy) and the importance of dependency preservation. Additionally, it highlights the advantages and disadvantages of data redundancy in database management systems.
Copyright: © All Rights Reserved
UNIT-III

Schema Refinement
Problems caused by redundancy:

Redundancy means having multiple copies of the same data in the
database. This problem arises when a database is not normalized.
Suppose a table of student details has the attributes student ID,
student name, contact, college name, course opted, and college rank.

Student_ID  Name      Contact     College  Course  Rank

100         Himanshu  7300934851  GEU      B.Tech  1

101         Ankit     7900734858  GEU      B.Tech  1

102         Ayush     7300936759  GEU      B.Tech  1

103         Ravi      7300901556  GEU      B.Tech  1

It can be observed that the values of the attributes college name,
college rank, and course are repeated in every row, which can lead to
problems. Problems caused due to redundancy are:

• Insertion anomaly
• Deletion anomaly
• Updation anomaly
Insertion Anomaly

• If the details of a student whose course has not yet been decided
have to be inserted, the insertion is not possible until a course
is decided for the student (assuming the Course attribute cannot
be left empty).

Student_ID  Name      Contact     College  Course       Rank

100         Himanshu  7300934851  GEU      (undecided)  1

• This problem happens when the insertion of a data record is not
possible without adding some additional unrelated data to the
record.
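The insertion anomaly can be reproduced with a minimal sketch using Python's built-in sqlite3 module. The table name `student`, the column names, and the NOT NULL constraint on `course` are assumptions made for illustration; the notes do not give a schema.

```python
import sqlite3

# In-memory database; the schema below is a hypothetical rendering of the
# student table, with course declared NOT NULL (an assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE student (
        student_id   INTEGER PRIMARY KEY,
        name         TEXT NOT NULL,
        contact      TEXT,
        college      TEXT NOT NULL,
        course       TEXT NOT NULL,
        college_rank INTEGER
    )
    """
)

try:
    # The course is not decided yet, so None (NULL) is supplied for it.
    conn.execute(
        "INSERT INTO student VALUES (?, ?, ?, ?, ?, ?)",
        (100, "Himanshu", "7300934851", "GEU", None, 1),
    )
    inserted = True
except sqlite3.IntegrityError:
    # SQLite rejects the row because course may not be NULL.
    inserted = False

row_count = conn.execute("SELECT COUNT(*) FROM student").fetchone()[0]
```

Under these assumptions the student cannot be recorded at all until a course value is available, which is the situation the anomaly describes.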

Deletion Anomaly
If the details of all the students in this table are deleted, then the
details of the college are deleted as well, which should not happen.
This anomaly occurs when the deletion of a data record results in
losing some unrelated information that was stored as part of the
deleted record. In other words, it is not possible to delete some
information without losing some other information in the table as well.
Updation Anomaly
Suppose the rank of the college changes. The change then has to be
made in every row that stores the rank, which is time-consuming and
computationally costly.

Student_ID  Name      Contact     College  Course  Rank

100         Himanshu  7300934851  GEU      B.Tech  1

101         Ankit     7900734858  GEU      B.Tech  1

102         Ayush     7300936759  GEU      B.Tech  1

103         Ravi      7300901556  GEU      B.Tech  1

All places should be updated. If the update does not occur at all
places, then the database will be in an inconsistent state.
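The update anomaly can be illustrated with a small Python sketch. The list-of-dicts layout, the helper names `update_rank` and `ranks_of`, and the `missed_ids` parameter (which simulates a row that an update failed to reach) are all hypothetical.

```python
# Hypothetical in-memory copy of the student table from the text.
students = [
    {"student_id": 100, "name": "Himanshu", "college": "GEU", "rank": 1},
    {"student_id": 101, "name": "Ankit", "college": "GEU", "rank": 1},
    {"student_id": 102, "name": "Ayush", "college": "GEU", "rank": 1},
    {"student_id": 103, "name": "Ravi", "college": "GEU", "rank": 1},
]


def update_rank(rows, college, new_rank, missed_ids=()):
    """Set the rank on each row of `college`; rows listed in missed_ids are
    skipped to simulate an incomplete update."""
    for row in rows:
        if row["college"] == college and row["student_id"] not in missed_ids:
            row["rank"] = new_rank


def ranks_of(rows, college):
    """Return the distinct rank values stored for one college."""
    return {row["rank"] for row in rows if row["college"] == college}


# Redundant design: one row is missed, so two ranks now exist for GEU.
update_rank(students, "GEU", 2, missed_ids={103})
inconsistent_ranks = ranks_of(students, "GEU")

# Normalized alternative: the rank is stored once, keyed by college.
colleges = {"GEU": {"rank": 1}}
enrolments = [
    {"student_id": s["student_id"], "college": s["college"]} for s in students
]
colleges["GEU"]["rank"] = 2  # a single update, nothing to miss
normalized_ranks = {colleges[e["college"]]["rank"] for e in enrolments}
```

In the redundant layout the college ends up with two different ranks, while the normalized layout has only one place to change.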
Redundancy in a database occurs when the same data is stored in
multiple places. Redundancy can cause various problems such as data
inconsistencies, higher storage requirements, and slower data
retrieval.
Problems Caused Due to Redundancy:

• Data Inconsistency: Redundancy can lead to data
inconsistencies, where the same data is stored in multiple locations,
and changes to one copy of the data are not reflected in the other
copies. This can result in incorrect data being used in decision-making
processes and can lead to errors and inconsistencies in the
data.
• Storage Requirements: Redundancy increases the storage
requirements of a database. If the same data is stored in multiple
places, more storage space is required to store the data. This can
lead to higher costs and slower data retrieval.
• Update Anomalies: Redundancy can lead to update anomalies,
where changes made to one copy of the data are not reflected in the
other copies. This can result in incorrect data being used in
decision-making processes and can lead to errors and
inconsistencies in the data.
• Performance Issues: Redundancy can also lead to performance
issues, as the database must spend more time updating multiple
copies of the same data. This can lead to slower data retrieval and
slower overall performance of the database.
• Security Issues: Redundancy can also create security issues, as
multiple copies of the same data can be accessed and manipulated
by unauthorized users. This can lead to data breaches and
compromise the confidentiality, integrity, and availability of the
data.
• Maintenance Complexity: Redundancy can increase the
complexity of database maintenance, as multiple copies of the same
data must be updated and synchronized. This can make it more
difficult to troubleshoot and resolve issues and can require more
time and resources to maintain the database.
• Data Duplication: Redundancy can lead to data duplication,
where the same data is stored in multiple locations, resulting in
wasted storage space and increased maintenance complexity. This
can also lead to confusion and errors, as different copies of the data
may have different values or be out of sync.
• Data Integrity: Redundancy can also compromise data integrity,
as changes made to one copy of the data may not be reflected in the
other copies. This can result in inconsistencies and errors and can
make it difficult to ensure that the data is accurate and up-to-date.
• Usability Issues: Redundancy can also create usability issues, as
users may have difficulty accessing the correct version of the data
or may be confused by inconsistencies and errors. This can lead to
frustration and decreased productivity, as users spend more time
searching for the correct data or correcting errors.
To prevent redundancy in a database, normalization techniques can be
used. Normalization is the process of organizing data in a database to
eliminate redundancy and improve data
integrity. Normalization involves breaking down a larger table into
smaller tables and establishing relationships between them. This
reduces redundancy and makes the database more efficient and
reliable.
Advantages of data redundancy in DBMS

o Provides Data Security: Data redundancy can enhance data
security, as it is difficult for cyber attackers to attack data that
is stored in different locations.
o Provides Data Reliability: Redundant copies improve reliability
because organizations can compare them to check and confirm
whether the data is correct.
o Create Data Backup: Data redundancy helps in backing up the
data.

Disadvantages of data redundancy in DBMS

o Data corruption: Redundant data leads to high chances of data
corruption.
o Wastage of storage: Redundant data requires more space,
leading to a need for more storage space.
o High cost: Large storage is required to store and maintain
redundant data, which is costly.

How to reduce data redundancy in DBMS


We can reduce data redundancy using the following methods:

o Database Normalization: In normalization, a large table is
divided into two or more smaller tables to remove redundancy.
Normalization removes the insert, update, and delete anomalies.
o Deleting Unused Data: Data that is no longer needed adds to the
redundancy in the DBMS. It is a good practice to remove such
unwanted data.
o Master Data: The data administrator shares master data across
multiple systems. Although this does not remove data redundancy,
it keeps the redundant copies up to date whenever the data is
changed.
Decomposition

Decomposition refers to the division of a table into multiple tables to
produce consistency in the data. This section gives the definition of
decomposition in DBMS, its types, and its properties.
What is Decomposition in DBMS?
When we divide a table into multiple tables, or a relation into
multiple relations, the process is termed decomposition in DBMS.
It is performed in a database management system when we need to
ensure consistency and remove anomalies and duplicate data present
in the database. When we perform decomposition, we must try to
ensure that no information or data is lost.

Types of Decomposition
There are two types of Decomposition:
• Lossless Decomposition
• Lossy Decomposition
Lossless Decomposition

o If no information is lost from the relation that is decomposed, then the
decomposition is lossless.
o A lossless decomposition guarantees that joining the decomposed relations
results in the same relation that was decomposed.
o A decomposition is said to be lossless if the natural join of all the
decomposed relations gives the original relation.
Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID EMP_NAME EMP_AGE EMP_CITY

22 Denim 28 Mumbai

33 Alina 25 Delhi

46 Stephan 30 Bangalore

52 Katherine 36 Mumbai

60 Jack 40 Noida

DEPARTMENT table:

DEPT_ID EMP_ID DEPT_NAME

827 22 Sales

438 33 Marketing

869 46 Finance

575 52 Production

678 60 Testing
Now, when these two relations are joined on the common column "EMP_ID", then the
resultant relation will look like:

Employee ⋈ Department

EMP_ID EMP_NAME EMP_AGE EMP_CITY DEPT_ID DEPT_NAME

22 Denim 28 Mumbai 827 Sales

33 Alina 25 Delhi 438 Marketing

46 Stephan 30 Bangalore 869 Finance

52 Katherine 36 Mumbai 575 Production

60 Jack 40 Noida 678 Testing

Hence, the decomposition is Lossless join decomposition.

Example:

There is a relation called R(A, B, C)


A B C

55 16 27

48 52 89
Now we decompose this relation into two sub relations R1 and R2
R1(A, B)
A B

55 16

48 52
R2(B, C)
B C

16 27

52 89
After performing the Join operation we get the same original relation

A B C

55 16 27

48 52 89

The natural join of R1 and R2 is taken on their common attribute B.
Since it returns the original relation R for this data, this is a
lossless decomposition.

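Lossy Decomposition

Example: Consider a table `R(A, B, C)` with the single dependency
`A → B`. If you decompose it into `R1(A, B)` and `R2(B, C)`, the
decomposition can be lossy: the common attribute B determines neither
A nor C, so the natural join of R1 and R2 may contain spurious tuples
that were not in the original table.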
Example: Consider a relation R(A,B,C) with the following data:

|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
Suppose we decompose R into R1(A,B) and R2(A,C).
R1(A, B):

|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
R2(A, C):

|A |C |
|----|----|
|1 |P |
|2 |Q |
The two rows of R with A = 1 both project to (1, P), and a relation
keeps each tuple only once.
Now, if we take the natural join of R1 and R2 on attribute A, we get
back the original relation R. Therefore, this is a lossless decomposition.
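The join test used in these examples can be sketched in Python. The helpers `project`, `natural_join`, and `is_lossless_for_instance` are hypothetical names, and `lossy_rows` is made-up data chosen to satisfy A → B while repeating a B value. The check only inspects one table instance; it does not prove that a decomposition is lossless for every instance of the schema.

```python
def project(rows, attrs):
    """Project rows onto attrs, dropping duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join two relations on all attributes they have in common."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                merged = {**l_row, **r_row}
                if merged not in result:
                    result.append(merged)
    return result


def as_set(rows):
    """Order-independent form of a relation, used for comparison."""
    return {tuple(sorted(row.items())) for row in rows}


def is_lossless_for_instance(rows, attrs1, attrs2):
    """True if joining the two projections gives back exactly rows."""
    joined = natural_join(project(rows, attrs1), project(rows, attrs2))
    return as_set(joined) == as_set(rows)


# Data from the R(A, B, C) example above, split into R1(A, B) and R2(A, C).
rows = [
    {"A": 1, "B": "X", "C": "P"},
    {"A": 1, "B": "Y", "C": "P"},
    {"A": 2, "B": "Z", "C": "Q"},
]
lossless = is_lossless_for_instance(rows, ["A", "B"], ["A", "C"])

# Hypothetical data satisfying A -> B; splitting on B creates spurious tuples.
lossy_rows = [
    {"A": 1, "B": "X", "C": "P"},
    {"A": 2, "B": "X", "C": "Q"},
]
lossy_join = natural_join(project(lossy_rows, ["A", "B"]),
                          project(lossy_rows, ["B", "C"]))
still_lossless = is_lossless_for_instance(lossy_rows, ["A", "B"], ["B", "C"])
```

For the hypothetical data, the join on B pairs every A value with every C value, so it returns four tuples where the original table had two.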

Dependency Preserving
o It is an important constraint of the database.
o In dependency preservation, every dependency must be satisfied
by at least one decomposed table.
o If a relation R is decomposed into relation R1 and R2, then the
dependencies of R either must be a part of R1 or R2 or must be
derivable from the combination of functional dependencies of R1
and R2.
o For example, suppose there is a relation R(A, B, C, D) with the
functional dependency set {A → BC}. The relation R is
decomposed into R1(ABC) and R2(AD), which is dependency
preserving because the FD A → BC is a part of relation R1(ABC).
o Dependency Preservation: A decomposition D = { R1, R2,
R3, …, Rn } of R is dependency preserving with respect to a set F
of functional dependencies if

(F1 ∪ F2 ∪ … ∪ Fn)+ = F+,

where Fi is the set of dependencies of F+ that involve only the
attributes of Ri.

o Consider a relation R with a set of functional dependencies F.
o R is decomposed into R1 with FD set F1 and R2 with FD set F2.
Comparing closures, there can be three cases:
o (F1 ∪ F2)+ = F+ : the decomposition is dependency preserving.
o (F1 ∪ F2)+ is a proper subset of F+ : not dependency preserving.
o (F1 ∪ F2)+ is a proper superset of F+ : this case is not possible,
because F1 and F2 are derived from F.
Problem:
Let R(A, B, C, D) be a relation with the functional dependencies
{AB → C, C → D, D → A}. R is decomposed into R1(A, B, C) and
R2(C, D). Check whether the decomposition is dependency preserving
or not.
Solution:

R1(A, B, C) and R2(C, D)

Let us find F1 and F2, the projections of F onto R1 and R2.

To find F1, compute the closure (under F) of every proper subset of
{A, B, C}, i.e., of A, B, C, AB, BC and AC, and keep only the attributes
that belong to R1. ABC itself is not considered because it can only
give trivial dependencies.

closure(A) = {A} // Trivial
closure(B) = {B} // Trivial
closure(C) = {C, D, A}, but D cannot be kept as D is not present in R1
= {C, A}
C --> A // Removing C from the right side as it is a trivial attribute

closure(AB) = {A, B, C, D}
= {A, B, C}
AB --> C // Removing AB from the right side as these are trivial attributes

closure(BC) = {B, C, D, A}
= {A, B, C}
BC --> A // Removing BC from the right side as these are trivial attributes

closure(AC) = {A, C, D}
= {A, C}
No non-trivial dependency // Only the trivial attributes A and C remain

F1 = {C --> A, AB --> C, BC --> A}.

Similarly, closure(C) = {C, D, A} restricted to R2 is {C, D}, and
closure(D) = {D, A} restricted to R2 is {D}, so F2 = {C --> D}.

The original set of dependencies is F = {AB --> C, C --> D, D --> A}.

AB --> C is present in F1.
C --> D is present in F2.
D --> A is not preserved: closure(D) under F1 ∪ F2 is {D}, which does
not contain A.

(F1 ∪ F2)+ is therefore a proper subset of F+, so the given
decomposition is not dependency preserving.
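The procedure in this solution can be sketched in Python. The function names `closure`, `project_fds`, and `preservation_report` are hypothetical, and the sketch assumes single-letter attribute names so that a string such as "AB" can stand for the set {A, B}. It enumerates all proper subsets of a sub-relation, which is fine for small schemas but grows exponentially.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def project_fds(fds, schema):
    """Non-trivial FDs implied by fds that use only attributes of schema."""
    schema = set(schema)
    projected = []
    for size in range(1, len(schema)):  # proper subsets only
        for lhs in combinations(sorted(schema), size):
            rhs = (closure(lhs, fds) & schema) - set(lhs)
            if rhs:
                projected.append(("".join(lhs), "".join(sorted(rhs))))
    return projected


def preservation_report(fds, schemas):
    """Map each original FD to True if it follows from the projected FDs."""
    union = [fd for schema in schemas for fd in project_fds(fds, schema)]
    return {(lhs, rhs): set(rhs) <= closure(lhs, union) for lhs, rhs in fds}


# The problem above: F = {AB -> C, C -> D, D -> A}, R1(A, B, C), R2(C, D).
F = [("AB", "C"), ("C", "D"), ("D", "A")]
F1 = project_fds(F, "ABC")
F2 = project_fds(F, "CD")
report = preservation_report(F, ["ABC", "CD"])
is_dependency_preserving = all(report.values())
```

Here `F1` and `F2` match the sets found by hand, and the report marks only D → A as not preserved.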

Problems Related to Decomposition

1. Loss of Information (lossless decomposition or lossy decomposition)

2. Loss of Functional Dependency

• Once tables are decomposed, certain functional dependencies
might not be preserved, which can lead to the inability to enforce
specific integrity constraints.
• Example: If you have the functional dependency `A → B` in the
original table, but no decomposed table contains both `A` and `B`,
this functional dependency is not preserved unless it can be
derived from the dependencies that hold in the decomposed tables.
Example: Let's consider a relation R with attributes A,B, and C and the
following functional dependencies:
A→B
B→C

Now, suppose we decompose R into two relations:


R1(A,B) with FD A → B
R2(B,C) with FD B → C

In this case, the decomposition is dependency-preserving because all
the functional dependencies of the original relation R can be found in
the decomposed relations R1 and R2. We do not need to join R1 and
R2 to enforce or check any of the functional dependencies.
The implied dependency A → C is also preserved, because it follows
from A → B and B → C by transitivity. However, if R additionally had
a dependency such as C → A, whose attributes never appear together in
R1 or R2 and which cannot be derived from the dependencies that hold
in R1 and R2, then the decomposition would not be
dependency-preserving for that specific FD.

3. Increased Complexity

• Decomposition leads to an increase in the number of tables, which
can complicate queries and maintenance tasks. While tools and
ORM (Object-Relational Mapping) libraries can mitigate this to
some extent, it still adds complexity.

4. Redundancy

• Incorrect decomposition might not eliminate redundancy, and in
some cases, can even introduce new redundancies.

5. Performance Overhead

• An increased number of tables, while aiding normalization, can
also lead to more complex SQL queries involving multiple joins,
which can introduce performance overheads.
Functional Dependency

A functional dependency is a relationship between two attributes (or
sets of attributes) in which the value of one determines the value of
the other. It typically exists between the primary key and a non-key
attribute within a table.

X → Y

The left side of the FD is known as the determinant, and the right
side is known as the dependent.

For example:

Assume we have an employee table with the attributes Emp_Id,
Emp_Name, and Emp_Address.

Here the Emp_Id attribute uniquely identifies the Emp_Name attribute
of the employee table, because if we know the Emp_Id, we can tell the
employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.
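As a rough illustration, the following Python sketch tests whether a dependency is violated by a given set of rows. The function name `fd_holds` and the sample employee rows are hypothetical. A table instance can only refute a dependency; whether Emp_Id → Emp_Name holds in general is a rule about the schema, not about one snapshot of data.

```python
def fd_holds(rows, lhs, rhs):
    """True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # Remember the first rhs value seen for each lhs value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical rows; two employees happen to share a name.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Mumbai"},
    {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Noida"},
]

id_determines_name = fd_holds(employees, ["Emp_Id"], ["Emp_Name"])
name_determines_id = fd_holds(employees, ["Emp_Name"], ["Emp_Id"])
```

In this sample, Emp_Id → Emp_Name is not violated, while Emp_Name → Emp_Id is, because the name "Asha" maps to two different ids.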

Types of Functional dependency


1. Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.
o Dependencies such as A → A and B → B are also trivial.

Example:

Consider a table with two columns Employee_Id and Employee_Name.

{Employee_Id, Employee_Name} → Employee_Id is a trivial functional
dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name
are trivial dependencies too.

2. Non-trivial functional dependency

A → B is a non-trivial functional dependency if B is not a subset of A.

When the intersection of A and B is empty, A → B is called completely
non-trivial.

Example:

ID → Name,
Name → DOB
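The three cases can be told apart with a small Python sketch. The function name `classify_fd` is hypothetical, and the third call below uses a made-up dependency to show the overlapping case.

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                 # rhs is a subset of lhs
    if lhs & rhs:
        return "non-trivial"             # sides overlap, rhs not contained
    return "completely non-trivial"      # sides share no attribute


trivial_case = classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"})
complete_case = classify_fd({"ID"}, {"Name"})
# Hypothetical dependency whose two sides share the attribute Name.
overlap_case = classify_fd({"ID", "Name"}, {"Name", "DOB"})
```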
Reasoning about Functional Dependency:
Inference Rule (IR):

1. Armstrong's axioms are the basic inference rules.

2. Armstrong's axioms are used to infer functional dependencies on
a relational database.

3. An inference rule is a type of assertion. It can be applied to a set
of FDs (functional dependencies) to derive other FDs.

4. Using the inference rules, we can derive additional functional
dependencies from the initial set.

There are 6 inference rules for functional dependencies. IR1 to IR3 are
Armstrong's axioms; IR4 to IR6 can be derived from them.

1. Reflexive Rule (IR1):

In the reflexive rule, if Y is a subset of X, then X determines Y.

If X ⊇ Y then X → Y

Example:

X = {a, b, c, d, e}
Y = {a, b, c}

Since Y is a subset of X, X → Y holds.

2. Augmentation Rule (IR2):

In augmentation, if X determines Y, then XZ determines YZ for any Z.

If X → Y then XZ → YZ
Example:

For R(ABCD), if A → B then AC → BC

3. Transitive Rule (IR3):

In the transitive rule, if X determines Y and Y determines Z, then X
must also determine Z.

If X → Y and Y → Z then X → Z

4. Union Rule (IR4):

The union rule says that if X determines Y and X determines Z, then X
must also determine Y and Z.

If X → Y and X → Z then X → YZ
Proof:

1. X → Y (given)
2. X → Z (given)
3. X → XY (using IR2 on 1 by augmentation with X, where XX = X)
4. XY → YZ (using IR2 on 2 by augmentation with Y)
5. X → YZ (using IR3 on 3 and 4)
5. Decomposition Rule (IR5):

The decomposition rule is also known as the project rule. It is the
reverse of the union rule.

This rule says that if X determines Y and Z, then X determines Y and X
determines Z separately.

If X → YZ then X → Y and X → Z

Proof:

1. X → YZ (given)
2. YZ → Y (using IR1)
3. X → Y (using IR3 on 1 and 2)
4. YZ → Z (using IR1)
5. X → Z (using IR3 on 1 and 4)
6. Pseudo transitive Rule (IR6):

In the pseudo transitive rule, if X determines Y and YZ determines W,
then XZ determines W.

If X → Y and YZ → W then XZ → W

Proof:

1. X → Y (given)
2. YZ → W (given)
3. XZ → YZ (using IR2 on 1 by augmenting with Z)
4. XZ → W (using IR3 on 3 and 2)
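The derived rules can be checked mechanically with attribute closure: X → Y follows from a set of dependencies exactly when Y is contained in the closure of X. The Python sketch below assumes single-letter attribute names, and the function names `closure` and `implies` are hypothetical.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def implies(fds, lhs, rhs):
    """True if lhs -> rhs can be derived from fds."""
    return set(rhs) <= closure(lhs, fds)


# IR4 (union): X -> Y and X -> Z give X -> YZ.
union_ok = implies([("X", "Y"), ("X", "Z")], "X", "YZ")

# IR5 (decomposition): X -> YZ gives X -> Y and X -> Z.
decomposition_ok = (implies([("X", "YZ")], "X", "Y")
                    and implies([("X", "YZ")], "X", "Z"))

# IR6 (pseudo transitivity): X -> Y and YZ -> W give XZ -> W.
pseudo_ok = implies([("X", "Y"), ("YZ", "W")], "XZ", "W")

# A dependency that does not follow: X -> Y does not give Y -> X.
reverse_ok = implies([("X", "Y")], "Y", "X")
```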
