Unit 3 DBMS (2 Part)
2. Data Anomalies
These are inconsistencies that arise due to redundancy.
Update Anomalies: When you have the same piece of data stored in
multiple places, updating it in one place can lead to inconsistency if it's not
updated everywhere.
Insertion Anomalies: You may be unable to record a new fact without also
supplying unrelated data, or you may have to repeat existing data in every
new row, which invites inconsistencies.
Deletion Anomalies: Deleting a row can unintentionally remove the only
copy of other facts stored in that row, even though they are still needed.
3. Increased Complexity
Querying and maintaining redundant data can be more complex.
4. Performance Issues
Duplicate data can slow down search, update, and insert operations.
Problems:
Solution:
Normalizing the database can resolve these problems. In this example, splitting
the table into two tables, `Orders` and `Customers`, would be a start:
1. Customers Table:
R(A, B, C):
|A |B |C |
|----|----|----|
|1 |X |P |
|1 |Y |P |
|2 |Z |Q |
R1(A, B):
|A |B |
|----|----|
|1 |X |
|1 |Y |
|2 |Z |
R2(A, C):
|A |C |
|----|----|
|1 |P |
|1 |P |
|2 |Q |
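Now, if we take the natural join of R1 and R2 on attribute A, we get back
exactly the rows of the original relation R, with no spurious tuples.
Therefore, this is a lossless decomposition. (The repeated row (1, P) in R2
counts only once, because a relation is a set of tuples.)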
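The lossless-join claim can be checked mechanically. The following is a minimal sketch, assuming relations are represented as lists of Python dicts; the helper names `project`, `natural_join`, and `same_rows` are illustrative and not part of the notes.

```python
def project(rel, attrs):
    """Project a relation onto attrs, dropping duplicate rows (set semantics)."""
    out = []
    for row in rel:
        p = {a: row[a] for a in attrs}
        if p not in out:
            out.append(p)
    return out


def natural_join(r1, r2):
    """Natural join on all attributes the two relations share."""
    if not r1 or not r2:
        return []
    common = set(r1[0]) & set(r2[0])
    joined = []
    for a in r1:
        for b in r2:
            if all(a[c] == b[c] for c in common):
                row = {**a, **b}
                if row not in joined:
                    joined.append(row)
    return joined


def same_rows(x, y):
    """True if both relations contain the same rows, ignoring order."""
    return len(x) == len(y) and all(r in y for r in x)


# The relation R(A, B, C) from the example above.
R = [
    {"A": 1, "B": "X", "C": "P"},
    {"A": 1, "B": "Y", "C": "P"},
    {"A": 2, "B": "Z", "C": "Q"},
]
R1 = project(R, ["A", "B"])
R2 = project(R, ["A", "C"])  # the duplicate (1, P) collapses to one row

rejoined = natural_join(R1, R2)
is_lossless = same_rows(rejoined, R)
print(is_lossless)  # True
```

A decomposition would be lossy if `rejoined` contained rows that are not in `R` (spurious tuples).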
3. Increased Complexity
Decomposition leads to an increase in the number of tables, which can
complicate queries and maintenance tasks. While tools and ORM (Object-
Relational Mapping) libraries can mitigate this to some extent, it still adds
complexity.
4. Redundancy
Incorrect decomposition might not eliminate redundancy, and in some
cases, can even introduce new redundancies.
5. Performance Overhead
An increased number of tables, while aiding normalization, can also lead
to more complex SQL queries involving multiple joins, which can
introduce performance overheads.
Best Practices
1. Ensure decomposition is non-lossy. After decomposition, it should be
possible to recreate the original data using natural joins.
2. Preserve functional dependencies to enforce integrity constraints.
3. Strike a balance. While normalization and decomposition are essential, in
some scenarios (like reporting databases), a certain level of
denormalization might be preferred for performance reasons.
4. Regularly review and optimize the database design, especially as the
application's requirements evolve.
1. sid functionally determines sname: for a given student ID, there is
only one possible student name.
2. zipcode functionally determines cityname: a specific zip code should
map to exactly one city name.
3. cityname functionally determines state: a city name could determine a
state (this assumes city names are not shared across states).
Mathematically, these functional dependencies can be represented as:
sid→sname
zipcode→cityname
cityname→state
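Whether a functional dependency holds in a given table instance can be tested directly: no two rows may agree on the left-hand side and differ on the right-hand side. The sketch below assumes rows are dicts; the function `fd_holds` and the sample rows are hypothetical and only illustrate the three dependencies above.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs holds in this table instance."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val the first time key is seen, then returns
        # the stored value so later rows can be compared against it.
        if seen.setdefault(key, val) != val:
            return False
    return True


# Hypothetical sample data, invented for illustration.
students = [
    {"sid": 1, "sname": "Ann", "zipcode": "10001", "cityname": "New York", "state": "NY"},
    {"sid": 2, "sname": "Bob", "zipcode": "10001", "cityname": "New York", "state": "NY"},
    {"sid": 3, "sname": "Ann", "zipcode": "94105", "cityname": "San Francisco", "state": "CA"},
]

print(fd_holds(students, ["sid"], ["sname"]))         # True
print(fd_holds(students, ["zipcode"], ["cityname"]))  # True
print(fd_holds(students, ["cityname"], ["state"]))    # True
print(fd_holds(students, ["sname"], ["sid"]))         # False: two students named Ann
```

Note that a check like this can only refute a dependency. A dependency that holds in one instance is still a design assumption about all valid data.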
3. Transitive Dependency
- If A -> B and B -> C, then C is transitively dependent on A through B (A -> C holds via B).
Example: Consider a relation Employees with the following attributes:
4. Closure
- The closure of a set of attributes X with respect to a set of functional
dependencies FD, denoted as X+, is the set of attributes that are functionally
determined by X.
- For example, given FDs: {A -> B, B -> C}, the closure of {A}, denoted as A+,
would be {A, B, C}.
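The closure can be computed by repeatedly applying every dependency whose left-hand side is already covered, until nothing new is added. This is a minimal sketch; representing each dependency as a pair of sets `(lhs, rhs)` is an assumption made for the example.

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply lhs -> rhs when lhs is covered and rhs adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

a_plus = closure({"A"}, fds)
b_plus = closure({"B"}, fds)
print(sorted(a_plus))  # ['A', 'B', 'C']
print(sorted(b_plus))  # ['B', 'C']
```

If the closure of X contains every attribute of the relation, X is a superkey, which is how closures are typically used when checking normal forms.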
Introduction to Normal Forms
| Student_ID | Subjects |
|------------|-------------------|
|1 | Math, English |
|2 | English, Science |
|3 | Math, History |
The table above is not in 1NF because the "Subjects" column contains
multiple values.
To transform it to 1NF:
| Student_ID | Subject |
|------------|-----------|
|1 | Math |
|1 | English |
|2 | English |
|2 | Science |
|3 | Math |
|3 | History |
Now, each combination of "Student_ID" and "Subject" is unique, and
every attribute contains only atomic values, ensuring the table is in
1NF.
Achieving 1NF is a fundamental step in database normalization, laying
the foundation for further normalization processes to eliminate
redundancy and ensure data integrity.
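The 1NF transformation shown above amounts to splitting each multi-valued cell into one row per value. A minimal sketch, assuming the subjects are stored as a comma-separated string (the function name `to_1nf` is illustrative):

```python
unnormalized = [
    {"Student_ID": 1, "Subjects": "Math, English"},
    {"Student_ID": 2, "Subjects": "English, Science"},
    {"Student_ID": 3, "Subjects": "Math, History"},
]


def to_1nf(rows):
    """Emit one (Student_ID, Subject) row per subject in the list."""
    out = []
    for row in rows:
        for subject in row["Subjects"].split(","):
            out.append({"Student_ID": row["Student_ID"], "Subject": subject.strip()})
    return out


normalized = to_1nf(unnormalized)
for row in normalized:
    print(row["Student_ID"], row["Subject"])
```

The result has six rows, matching the 1NF table above, and each (Student_ID, Subject) pair appears once.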
All non-key attributes (i.e., columns that aren't part of the primary
key) should be functionally dependent on the *entire* primary
key. This rule is especially relevant for tables with composite
primary keys (i.e., primary keys made up of more than one
column).
In simpler terms, no column should depend on just a part of the
composite primary key.
| Student_ID | Course_ID |
|------------|-----------|
|1 | C1 |
|1 | C2 |
|2 | C1 |
|3 | C3 |
Course Table
Simply put, in 3NF, non-key attributes should not depend on other non-
key attributes; they should only depend on the primary key.
Vendor Table
| Vendor_Name | Vendor_Address |
|-------------|-----------------|
| TechCorp | 123 Tech St. |
| FurniShop | 456 Furni Rd. |
Now, the `Product` table has `Product_ID` as the primary key, and all
attributes in this table depend only on the primary key. The `Vendor`
table has `Vendor_Name` as its primary key, and the address in this
table depends only on the vendor name.
This normalization eliminates the transitive dependency and reduces
redundancy. If we need to change a vendor's address, we now only have
to make the change in one place in the `Vendor` table.
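The single-point update can be illustrated with two in-memory lookups standing in for the `Vendor` and `Product` tables. This is only a sketch: the vendor rows come from the table above, while the product rows, the `Product_Name` column, and the new address are hypothetical values invented for the example.

```python
# Vendor table: Vendor_Name -> Vendor_Address (rows from the table above).
vendor = {
    "TechCorp": "123 Tech St.",
    "FurniShop": "456 Furni Rd.",
}

# Product table keyed by Product_ID (hypothetical rows).
product = {
    "P1": {"Product_Name": "Laptop", "Vendor_Name": "TechCorp"},
    "P2": {"Product_Name": "Monitor", "Vendor_Name": "TechCorp"},
    "P3": {"Product_Name": "Desk", "Vendor_Name": "FurniShop"},
}


def vendor_address_of(product_id):
    """Look up a product's vendor address through the Vendor table."""
    return vendor[product[product_id]["Vendor_Name"]]


# One update in the Vendor table is seen by every product of that vendor.
vendor["TechCorp"] = "789 New Tech Ave."  # hypothetical new address

print(vendor_address_of("P1"))  # 789 New Tech Ave.
print(vendor_address_of("P2"))  # 789 New Tech Ave.
print(vendor_address_of("P3"))  # 456 Furni Rd.
```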
To further refine the database structure, we might proceed to other
normalization forms like BCNF, but 3NF is often sufficient for many
practical applications and strikes a good balance between minimizing
redundancy and maintaining a manageable schema.
Boyce-Codd Normal Form (BCNF) in DBMS
Boyce-Codd Normal Form (BCNF) is an advanced step in the
normalization process, and it's a stronger version of the Third Normal
Form (3NF). In fact, every relation in BCNF is also in 3NF, but the
converse isn't necessarily true. BCNF was introduced to handle certain
anomalies that 3NF does not deal with.
A relation is in BCNF if:
1. It is already in 3NF.
2. For every non-trivial functional dependency X→Y, X is a
superkey. This essentially means that the only determinants in the
relation are superkeys.
Here, "non-trivial" means that Y is not a subset of X, and a "superkey"
is a set of attributes that functionally determines all other attributes in
the relation.
Here:
- Each professor is associated with exactly one topic.
- The primary key is {Student, Professor}, meaning a professor can
supervise multiple students, but each student has one thesis and
thus one topic.
- There's a functional dependency {Professor} → {Topic} since
each professor supervises only one topic.
Now, observe that {Professor} is not a superkey (because the primary
key is a combination of Student and Professor), but it determines
another attribute in the table (Topic). This violates the definition of
BCNF.
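The violation can be confirmed with an attribute-closure test: a determinant is a superkey exactly when its closure covers every attribute. The sketch below assumes the relation has the attributes Student, Professor, and Topic with the two dependencies described above; the function names are illustrative.

```python
def closure(attrs, fds):
    """Closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, all_attrs, fds):
    """attrs is a superkey if its closure contains every attribute."""
    return closure(attrs, fds) >= set(all_attrs)


def bcnf_violations(all_attrs, fds):
    """List the non-trivial dependencies whose left side is not a superkey."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not is_superkey(lhs, all_attrs, fds)
    ]


attrs = {"Student", "Professor", "Topic"}
fds = [
    (frozenset({"Student", "Professor"}), frozenset({"Topic"})),
    (frozenset({"Professor"}), frozenset({"Topic"})),
]

violations = bcnf_violations(attrs, fds)
for lhs, rhs in violations:
    print(sorted(lhs), "->", sorted(rhs))  # ['Professor'] -> ['Topic']
```

Only {Professor} → {Topic} is reported, since {Student, Professor} determines every attribute and is therefore a superkey.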
To bring this table into BCNF, we can decompose it into two tables:
StudentSupervision Table:
| Student | Professor |
|---------|-----------|
| Alice | Mr. A |
| Bob | Mr. B |
| Charlie | Mr. C |
ProfessorTopic Table:
| Professor | Topic |
|-----------|--------|
| Mr. A | Math |
| Mr. B | Math |
| Mr. C | Physics|
In the table:
- For student `S1`, there are two hobbies (`Painting` and `Hiking`)
and two courses (`Math` and `Physics`), resulting in a
combination of every hobby with every course.
- This design suggests a multi-valued dependency between
`Student_ID` and `Hobby`, and also between `Student_ID` and
`Course`.
To bring the table to 4NF, we can decompose it into two separate tables:
StudentHobbies Table:
| Student_ID | Hobby |
|------------|------------|
| S1 | Painting |
| S1 | Hiking |
| S2 | Reading |
StudentCourses Table:
| Student_ID | Course |
|------------|------------|
| S1 | Math |
| S1 | Physics |
| S2 | Chemistry |
| S2 | Biology |
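To see why the combined design repeats data, the two tables can be joined back on `Student_ID`. This sketch uses the rows from the two tables above, represented as tuples; the variable names are illustrative.

```python
student_hobbies = [
    ("S1", "Painting"),
    ("S1", "Hiking"),
    ("S2", "Reading"),
]
student_courses = [
    ("S1", "Math"),
    ("S1", "Physics"),
    ("S2", "Chemistry"),
    ("S2", "Biology"),
]

# Joining on Student_ID pairs every hobby with every course of a student.
combined = [
    (sid, hobby, course)
    for sid, hobby in student_hobbies
    for sid2, course in student_courses
    if sid == sid2
]

s1_rows = [row for row in combined if row[0] == "S1"]
print(len(s1_rows))   # 4 (2 hobbies x 2 courses)
print(len(combined))  # 6
```

In the combined form, adding one more hobby for `S1` would require one new row per course, whereas in the decomposed form it is a single row in `StudentHobbies`.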
| Supplier | Part |
|----------|-------|
| S1 | P1 |
| S1 | P2 |
| S2 | P2 |
SupplierProjects:
| Supplier | Project |
|----------|---------|
| S1 | J1 |
| S1 | J2 |
| S2 | J2 |
PartsProjects:
| Part | Project |
|-------|---------|
| P1 | J1 |
| P2 | J1 |
| P1 | J2 |
| P2 | J2 |
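Joining the three projections shows which (Supplier, Part, Project) combinations they jointly allow. This is a sketch under assumptions: the first table is unlabeled in these notes, so the name `supplier_parts` is invented here, and whether the result equals the original three-column relation depends on that relation satisfying the corresponding join dependency.

```python
supplier_parts = [("S1", "P1"), ("S1", "P2"), ("S2", "P2")]     # name assumed
supplier_projects = [("S1", "J1"), ("S1", "J2"), ("S2", "J2")]
parts_projects = [("P1", "J1"), ("P2", "J1"), ("P1", "J2"), ("P2", "J2")]

# A triple survives only if all three of its pairwise projections exist.
parts_projects_set = set(parts_projects)
rejoined = [
    (s, p, j)
    for s, p in supplier_parts
    for s2, j in supplier_projects
    if s == s2 and (p, j) in parts_projects_set
]

for row in rejoined:
    print(row)
print(len(rejoined))  # 5
```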
In the above table, each row specifies the salary of an employee for a
specific time interval. As you can imagine, updates (like giving a raise)
could become complicated and might require adjustments in the
`ValidTo` and `ValidFrom` columns, especially if you have multiple
date ranges.
To bring this into 6NF, you could decompose the table into separate
relations, one capturing the essence of the entity (e.g., the employee
and some constant attributes) and others capturing the temporal
aspects.
Employee:
| EmployeeID | OtherConstantAttributes |
|------------|-------------------------|
| E1 | ... |
| E2 | ... |
EmployeeSalaryHistory: