Dbms Notes Unit 4
FACULTY
SURESH
suresh.mentor@gmail.com
Database Management Systems B. Tech (CSE) II Year II Sem
UNIT IV
Schema Refinement (Normalization): Purpose of normalization or schema refinement,
concept of functional dependency, normal forms based on functional dependency (1NF,
2NF and 3NF), concept of surrogate key, Boyce-Codd normal form (BCNF), lossless join
and dependency preserving decomposition, Fourth normal form (4NF), Fifth normal
form (5NF).
SCHEMA REFINEMENT (Normalization)
The problems caused by redundant data in tables can be reduced by a
decomposition technique called schema refinement (also called normalization).
Problems Caused by Redundancy:
Storing the same information redundantly, that is, in more than one place within
a database, can lead to several problems:
Redundant storage: Some information is stored repeatedly.
Update anomalies: If one copy of such repeated data is updated, an
inconsistency is created unless all copies are similarly updated.
Insertion anomalies: It may not be possible to store some information unless
some other information is stored as well.
Deletion anomalies: It may not be possible to delete some information without
losing some other information as well.
What is decomposition?
Decomposition is the process of breaking something down into parts or elements.
It replaces a relation with a collection of smaller relations.
It breaks one table into multiple tables in a database.
It should always be lossless, because that guarantees that the information in the
original relation can be accurately reconstructed from the decomposed
relations.
If a relation is not decomposed properly, it may lead to
problems such as loss of information.
Hourly_Emps2(ssn,name,lot,rating,hours_worked)
Wages(rating,hourly_wages)
Hourly_Emps2 table:
ssn name lot rating hours_worked
11-22 Aaa 48 8 40
22-33 Bbb 22 8 30
33-44 Ccc 35 5 30
44-55 Ddd 35 5 32
55-66 Eee 40 8 40
Wages table:
rating hourly_wages
8 10
5 7
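As a sketch of why this decomposition loses no information, the Python code below joins the two tables above on their common attribute rating. It assumes that the original Hourly_Emps relation is Hourly_Emps2 with an hourly_wages column added (appended as the last column here). The function name join_on_rating is hypothetical.

```python
# Rows of Hourly_Emps2 (ssn, name, lot, rating, hours_worked), copied from the table above.
hourly_emps2 = [
    ("11-22", "Aaa", 48, 8, 40),
    ("22-33", "Bbb", 22, 8, 30),
    ("33-44", "Ccc", 35, 5, 30),
    ("44-55", "Ddd", 35, 5, 32),
    ("55-66", "Eee", 40, 8, 40),
]
# Wages: rating -> hourly_wages. rating is the key, so each wage is stored once.
wages = {8: 10, 5: 7}


def join_on_rating(emps, wage_table):
    """Natural join of Hourly_Emps2 and Wages on the common attribute rating.

    Each employee row gets the hourly wage for its rating appended at the end.
    """
    return [row + (wage_table[row[3]],) for row in emps if row[3] in wage_table]


hourly_emps = join_on_rating(hourly_emps2, wages)
for row in hourly_emps:
    print(row)
```

Because rating is the key of Wages, every employee row matches exactly one wage row, so the join creates no spurious rows. Changing the wage for rating 8 now means updating one row of Wages instead of three employee rows.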
The first question can be answered using the several normal forms that have been proposed
for relations: whether a relation needs to be decomposed depends on the normal form it is in.
The second question can be answered using the two properties of decomposition.
Properties of Decomposition:
1. Lossless Decomposition
2. Dependency Preservation
Functional Dependency:
A functional dependency holds when one attribute (or set of attributes) determines
another attribute (or set of attributes) in a relation. It is a relationship between
two sets of attributes and typically exists between the primary key and the non-key
attributes of a table.
A functional dependency is denoted by an arrow →
The functional dependency of Y on X (Y is functionally dependent on X) is represented by X → Y
The left side of an FD is known as the determinant, and the right side is known
as the dependent.
For example:
Assume we have an employee table with attributes: Emp_Id, Emp_Name,
Emp_Address.
Here the Emp_Id attribute uniquely identifies the Emp_Name attribute of the
employee table, because if we know the Emp_Id, we can tell the employee name
associated with it.
Functional dependency can be written as:
Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on Emp_Id.
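A functional dependency can be checked against the rows of a table: X → Y is violated if two rows agree on X but differ on Y. The Python sketch below performs this check. fd_holds is a hypothetical helper, and the sample employee rows (including the addresses) are made up for illustration.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if lhs -> rhs is not violated by this table instance.

    rows is a list of dicts; lhs and rhs are lists of column names.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False  # same determinant value, different dependent value
    return True


# Hypothetical sample rows for the employee table.
employee = [
    {"Emp_Id": 1, "Emp_Name": "Harry", "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "George", "Emp_Address": "Pune"},
    {"Emp_Id": 3, "Emp_Name": "Harry", "Emp_Address": "Agra"},
]
print(fd_holds(employee, ["Emp_Id"], ["Emp_Name"]))       # True
print(fd_holds(employee, ["Emp_Name"], ["Emp_Address"]))  # False: two Harrys, two addresses
```

A single table instance can only show that an FD is violated. Whether Emp_Id → Emp_Name holds in general is a rule about all valid data, decided from the meaning of the attributes.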
For example:
Emp_id Emp_name
AS555 Harry
AS811 George
AS999 Kevin
Consider this table with two columns Emp_id and Emp_name.
{Emp_id, Emp_name} -> Emp_id
is a trivial functional dependency as Emp_id is a subset of {Emp_id,Emp_name}.
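The subset test behind a trivial functional dependency can be written directly. is_trivial is a hypothetical helper name.

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial when every attribute of Y already appears in X."""
    return set(rhs) <= set(lhs)


print(is_trivial({"Emp_id", "Emp_name"}, {"Emp_id"}))  # True
print(is_trivial({"Emp_id"}, {"Emp_name"}))            # False
```

A trivial FD holds in every possible instance of the table, which is why it carries no information about the data.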
Transitive dependency:
A transitive dependency is a type of functional dependency that is formed
indirectly by two functional dependencies: if X -> Y and Y -> Z hold, then X -> Z holds.
Example:
Company CEO Age
Alibaba Jack Ma 54
{Company} -> {CEO} (if we know the company, we know its CEO's name)
{CEO} -> {Age} (if we know the CEO, we know the CEO's age)
Therefore, according to the rule of transitive dependency:
{Company} -> {Age} should hold. That makes sense, because if we know the
company name, we can know its CEO's age.
Note: You need to remember that transitive dependency can only occur in a
relation of three or more attributes.
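The transitivity rule can be sketched as a function that pairs up dependencies X -> Y and Y -> Z. This sketch assumes single-attribute FDs and applies the rule only once; longer chains need repeated application (or the attribute closure described later). transitive_fds is a hypothetical name.

```python
def transitive_fds(fds):
    """Derive X -> Z for every pair X -> Y, Y -> Z in fds.

    fds is a set of (determinant, dependent) pairs of single attributes.
    Only newly derived dependencies are returned.
    """
    derived = set()
    for x, y in fds:
        for y2, z in fds:
            if y == y2 and x != z:
                derived.add((x, z))
    return derived - set(fds)


fds = {("Company", "CEO"), ("CEO", "Age")}
print(transitive_fds(fds))  # {('Company', 'Age')}
```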
BC -> D
The above is not possible, because AB cannot determine D, A cannot determine
D, and BC also cannot determine D.
Now we will calculate the closure of all the attributes present in the relation
using the three steps mentioned below.
Step-1 : Add the attributes present on the LHS of the first functional dependency to the closure.
{Roll_no}+ = {Roll_No}
Step-2 : Add the attributes present on the RHS of that functional
dependency to the closure.
{Roll_no}+ = {Roll_No, Name}
Step-3 : Add the other attributes that can be derived from the attributes already present
in the closure. Roll_No cannot functionally determine any further attribute, but the Name
attribute can determine other attributes such as Marks and Location
using the 2nd functional dependency (Name -> Marks, Location).
{Roll_no}+ = {Roll_No, Name, Marks, Location}
Now, we need to calculate the closure of attributes of the relation R. The closures
will be:
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {B, C}
{D}+ = {D, E}
{E}+ = {E}
Closure Of Functional Dependency : Calculating Candidate Key
“A Candidate Key of a relation is an attribute or set of attributes that can
determine the whole relation or contains all the attributes in its closure."
Let’s try to understand how to calculate candidate keys.
Example-1 : Consider the relation R(A,B,C) with given functional dependencies :
FD1 : A -> B
FD2 : B -> C
Now, calculating the closure of the attributes as :
{A}+ = {A, B, C}
{B}+ = {B, C}
{C}+ = {C}
Clearly, “A” is the candidate key, as its closure contains all the attributes present
in the relation “R”.
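The three closure steps described earlier can be sketched in Python as a loop that keeps applying FDs until the closure stops growing. The sketch assumes single-letter attribute names, so a string such as "BC" stands for the set {B, C}; closure is a hypothetical helper name. It reproduces the closures of Example-1:

```python
def closure(attrs, fds):
    """Compute the closure of attrs under fds, a list of (lhs, rhs) strings.

    Start from attrs themselves, then keep adding the RHS of any FD whose
    LHS is already inside the closure, until nothing changes.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [("A", "B"), ("B", "C")]  # Example-1: FD1 A -> B, FD2 B -> C
for attr in "ABC":
    print(attr, sorted(closure(attr, fds)))
# A ['A', 'B', 'C']
# B ['B', 'C']
# C ['C']
```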
Example-2 : Consider another relation R(A, B, C, D, E) having the Functional
dependencies :
FD1 : A -> BC
FD2 : C-> B
FD3 : D -> E
FD4 : E -> D
Now, calculating the closure of the attributes as :
{A}+ = {A, B, C}
{B}+ = {B}
{C}+ = {C, B}
{D}+ = {E, D}
{E}+ = {E, D}
In this case, no single attribute can determine all the attributes on its own,
unlike the previous example. Here, we need to combine two or more attributes to
determine the candidate keys.
{A, D}+ = {A, B, C, D, E}
{A, E}+ = {A, B, C, D, E}
Hence, "AD" and "AE" are the two candidate keys of the given relation “R”. Any
other combination that determines all of R contains AD or AE, so its remaining
attributes are extraneous.
NOTE : Any relation “R” can have either single or multiple candidate keys.
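Trying attribute combinations from smallest to largest, as done by hand above, can be sketched as a brute-force search. It is exponential in the number of attributes, which is acceptable for textbook-sized relations. The helper names and the single-letter attribute convention are assumptions. For Example-2:

```python
from itertools import combinations


def closure(attrs, fds):
    """Attributes determined by attrs under fds (single-letter names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attributes, fds):
    """Return the minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for combo in combinations(attributes, size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key: a super key, not a candidate key
            if closure(combo, fds) == set(attributes):
                keys.append("".join(combo))
    return keys


fds = [("A", "BC"), ("C", "B"), ("D", "E"), ("E", "D")]  # Example-2
print(candidate_keys("ABCDE", fds))  # ['AD', 'AE']
```

Skipping supersets of keys already found is what keeps the result minimal, so only candidate keys are returned, not every super key.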
Candidate Keys
Candidate Keys are super keys for which no proper subset is a super key. In other
words candidate keys are minimal super keys.
2. Primary Key: the column(s) you choose to maintain uniqueness in a table. In the Employee
table you can choose either the EmployeeID or the SSN column; EmployeeID is the preferable choice, as SSN is
a sensitive value.
3. Alternate Key: a candidate key other than the primary key; if EmployeeID is the PK, then SSN
would be the alternate key.
4. Super Key: if you add any other column/attribute to a primary key, it becomes a super key,
like EmployeeID + FullName.
5. Composite Key: if a table does not have a single column that qualifies as a candidate key, then you
have to select 2 or more columns to make a row unique. For example, if there were no EmployeeID or SSN
columns, you could make FullName + DateOfBirth the composite primary key. But there can still
be a small chance of duplicate rows.
R(A,B,C,D,E)
A→BCDE This means the attribute 'A' uniquely determines the other attributes B,C,D,E.
BC→ADE This means the attributes 'BC' jointly determine all the other attributes
A,D,E in the relation.
Primary Key : A
Candidate Keys : A, BC
Super Keys : A, BC, ABC (and any other superset of A or BC)
ABC is not a candidate key, since it is not a minimal super key.
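The super key and candidate key tests for this relation can be sketched with the attribute closure. The helper names and the single-letter attribute convention are assumptions made for this sketch.

```python
def closure(attrs, fds):
    """Attributes determined by attrs; single-letter names, e.g. "BC" = {B, C}."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, relation, fds):
    return closure(attrs, fds) == set(relation)


def is_candidate_key(attrs, relation, fds):
    """A super key none of whose proper subsets is a super key.

    Since any superset of a super key is also a super key, it is enough to
    test the subsets obtained by dropping one attribute.
    """
    if not is_superkey(attrs, relation, fds):
        return False
    return not any(is_superkey(attrs.replace(a, ""), relation, fds) for a in attrs)


fds = [("A", "BCDE"), ("BC", "ADE")]
for attrs in ["A", "BC", "ABC"]:
    print(attrs, is_superkey(attrs, "ABCDE", fds), is_candidate_key(attrs, "ABCDE", fds))
# A True True
# BC True True
# ABC True False
```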
Normalization
Normalization is the process of minimizing redundancy in a relation or set of
relations. Redundancy in a relation may cause insertion, deletion and update
anomalies, so normalization helps to minimize redundancy in relations. Normal forms are used
to eliminate or reduce redundancy in database tables.
333 40
teacher_subject table:
teacher_id subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables comply with Second normal form (2NF).
Check whether the given table is in 2NF or not
Problem 1: Check the R(A,B,C,D) is in 2NF or not which contains following
Functional Dependencies?
A->B
C->D
Solution:
1) Find candidate key: AC is candidate key because [AC]+ = ABCD
2) Prime attributes are A and C
3) Non-prime attributes are B and D
4) Check A->B for partial dependency.
A->B is a partial dependency because A is a proper subset of the candidate key AC
and B is a non-prime attribute.
5) Check C->D for partial dependency.
C->D is a partial dependency because C is a proper subset of the candidate key AC
and D is a non-prime attribute.
Conclusion: Since both are partial dependencies, the relation is not in 2NF.
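The partial-dependency test used in Problem 1 can be sketched as follows. It takes the candidate keys as input (found in step 1) and, like the worked solution, checks only the FDs that are listed; a complete test would also consider FDs implied by them. partial_dependencies is a hypothetical helper, and attributes are assumed to be single letters.

```python
def partial_dependencies(fds, candidate_keys):
    """FDs whose LHS is a proper subset of some candidate key and whose RHS
    contains a non-prime attribute (single-letter attribute names)."""
    prime = set("".join(candidate_keys))
    found = []
    for lhs, rhs in fds:
        non_prime_rhs = set(rhs) - prime
        if non_prime_rhs and any(set(lhs) < set(key) for key in candidate_keys):
            found.append((lhs, rhs))
    return found


fds = [("A", "B"), ("C", "D")]
keys = ["AC"]  # candidate key from step 1
violations = partial_dependencies(fds, keys)
print(violations)  # [('A', 'B'), ('C', 'D')]
print("in 2NF" if not violations else "not in 2NF")  # not in 2NF
```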
Conclusion: Since one partial dependency exists in the above relation, the relation is not in 2NF.
employee table:
emp_id emp_name emp_zip
1001 John 282005
1002 Ajeet 222008
1006 Lora 282007
1101 Lilly 292008
1201 Steve 222999
employee_zip table:
emp_zip emp_state emp_city emp_district
282005 UP Agra Dayal Bagh
222008 TN Chennai M-City
282007 TN Chennai Urrapakkam
292008 UK Pauri Bhagwan
222999 MP Gwalior Ratan
Problem 2 Check the R(A,B,C) is in 3NF or not which contains following Functional
Dependencies?
AB->C
C->B
Solution:
1) Find candidate key: AB is a candidate key because [AB]+ = ABC. AC is also a
candidate key: since C->B, replacing B by C in the candidate key AB gives AC, and [AC]+ = ABC.
We now have two candidate keys, AB and AC.
2) Prime attributes : A, B, C
3) Non-prime attributes : None
4) Check AB->C for 3NF.
AB->C satisfies 3NF because AB is a super key (LHS is true) and C is a prime
attribute (RHS is true). Both are true.
5) Check C->B for 3NF.
C->B satisfies 3NF because, although C is not a super key (LHS is false), B is a
prime attribute (RHS is true). At least one is true.
Conclusion: Since both FDs satisfy 3NF, the relation is in 3NF.
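The 3NF test applied above (for each non-trivial FD, the LHS is a super key or the RHS is a prime attribute) can be sketched as follows. The candidate keys are taken as input, only the listed FDs are checked, and the helper names and single-letter attributes are assumptions.

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (single-letter names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def violates_3nf(fds, relation, candidate_keys):
    """Non-trivial FDs X -> Y where X is not a super key and Y has a non-prime attribute."""
    prime = set("".join(candidate_keys))
    bad = []
    for lhs, rhs in fds:
        extra = set(rhs) - set(lhs)  # ignore any trivial part of the RHS
        if not extra:
            continue
        lhs_is_superkey = closure(lhs, fds) == set(relation)
        if not lhs_is_superkey and not extra <= prime:
            bad.append((lhs, rhs))
    return bad


fds = [("AB", "C"), ("C", "B")]
print(violates_3nf(fds, "ABC", ["AB", "AC"]))  # [] -> the relation is in 3NF
```

For contrast, with A->B and B->C in R(A,B,C) and candidate key A, the same function reports B->C, because B is not a super key and C is not prime.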
emp_dept_mapping table:
emp_id emp_dept
1001 Production
1001 stores
1002 design
1002 Purchasing
Functional dependencies:
emp_id -> emp_nationality
emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF, as in both functional dependencies the left-hand side is a
key.
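The BCNF condition (every non-trivial FD has a super key on its left-hand side) can be sketched as a check per table. The names emp_nationality and emp_dept for the first two tables are assumed for illustration, and "combined" is assumed to stand for the table before decomposition. As a simplification, only FDs whose attributes all lie inside a table are checked against it; a general test would work with the FDs implied on each projection.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under fds, a list of (lhs, rhs) sets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def is_bcnf(relation, fds):
    """True if every non-trivial FD lying inside relation has a super key as its LHS."""
    for lhs, rhs in fds:
        if not (lhs | rhs) <= relation or rhs <= lhs:
            continue  # FD does not apply to this table, or it is trivial
        if not relation <= closure(lhs, fds):
            return False  # LHS is not a super key of this table
    return True


fds = [
    ({"emp_id"}, {"emp_nationality"}),
    ({"emp_dept"}, {"dept_type", "dept_no_of_emp"}),
]
# Hypothetical names for the first two tables; the third is emp_dept_mapping.
emp_nationality = {"emp_id", "emp_nationality"}
emp_dept = {"emp_dept", "dept_type", "dept_no_of_emp"}
emp_dept_mapping = {"emp_id", "emp_dept"}
combined = emp_nationality | emp_dept | emp_dept_mapping

for name, table in [("emp_nationality", emp_nationality), ("emp_dept", emp_dept),
                    ("emp_dept_mapping", emp_dept_mapping), ("combined", combined)]:
    print(name, is_bcnf(table, fds))
# emp_nationality True
# emp_dept True
# emp_dept_mapping True
# combined False
```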
Fourth normal form (4NF)
o A relation will be in 4NF if it is in Boyce Codd normal form and has no
multi-valued dependency.
o For a dependency A → B, if a single value of A is associated with multiple values of B,
then the dependency is a multi-valued dependency (written A ->-> B).
Example
STUDENT
STU_ID COURSE HOBBY
21 Computer Dancing
21 Math Singing
34 Chemistry Dancing
74 Biology Cricket
59 Physics Hockey
The given STUDENT table is in 3NF, but COURSE and HOBBY are two
independent entities; there is no relationship between COURSE and HOBBY.
In the STUDENT relation, the student with STU_ID 21 has two
courses, Computer and Math, and two hobbies, Dancing and Singing. So there is a
multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID COURSE
21 Computer
21 Math
34 Chemistry
74 Biology
59 Physics
STUDENT_HOBBY
STU_ID HOBBY
21 Dancing
21 Singing
34 Dancing
74 Cricket
59 Hockey
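The two 4NF tables are projections of STUDENT onto (STU_ID, COURSE) and (STU_ID, HOBBY). The Python sketch below computes those projections from the STUDENT rows and joins them back on STU_ID; project is a hypothetical helper that keeps first-seen row order.

```python
# STUDENT rows (STU_ID, COURSE, HOBBY), copied from the table above.
student = [
    (21, "Computer", "Dancing"),
    (21, "Math", "Singing"),
    (34, "Chemistry", "Dancing"),
    (74, "Biology", "Cricket"),
    (59, "Physics", "Hockey"),
]


def project(rows, *cols):
    """Projection onto the given column positions, dropping duplicate rows."""
    result = []
    for row in rows:
        t = tuple(row[c] for c in cols)
        if t not in result:
            result.append(t)
    return result


student_course = project(student, 0, 1)  # (STU_ID, COURSE)
student_hobby = project(student, 0, 2)   # (STU_ID, HOBBY)
print(student_course)
print(student_hobby)

# Joining back on STU_ID pairs every course of a student with every hobby.
rejoined = {(s, c, h) for s, c in student_course for s2, h in student_hobby if s == s2}
print(len(rejoined))  # 7: four rows for student 21, one each for 34, 74 and 59
```

The join gives student 21 all four course/hobby combinations. That is exactly what the multi-valued dependency STU_ID ->-> COURSE asserts: courses and hobbies are independent, so under this dependency the full STUDENT relation would hold every combination, and the two smaller tables avoid storing that repetition.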
Example 2:
PERSON MOBILE FOOD_LIKES
Mahesh 9893 / 9424 Burger / pizza
Ramesh 9191 Pizza
Person ->-> Mobile
Person ->-> Food_Likes
So to make the above table into 4NF, we can decompose it into two tables:
Person_mobile Table:
PERSON MOBILE
Mahesh 9893
Mahesh 9424
Ramesh 9191
Person_FoodLikes Table:
PERSON FOOD_LIKES
Mahesh Burger
Mahesh pizza
Ramesh Pizza
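The split into Person_mobile and Person_FoodLikes can be sketched as expanding each multi-valued cell into one row per value. This assumes that values inside a cell are separated by "/", as in the PERSON table; split_values is a hypothetical helper.

```python
# PERSON rows with multi-valued cells, copied from the table above.
person = [
    ("Mahesh", "9893 / 9424", "Burger / pizza"),
    ("Ramesh", "9191", "Pizza"),
]


def split_values(cell):
    """Split a multi-valued cell such as '9893 / 9424' into its separate values."""
    return [v.strip() for v in cell.split("/")]


# One row per (person, value) pair in each new table.
person_mobile = [(p, m) for p, mobiles, _ in person for m in split_values(mobiles)]
person_food = [(p, f) for p, _, foods in person for f in split_values(foods)]
print(person_mobile)  # [('Mahesh', '9893'), ('Mahesh', '9424'), ('Ramesh', '9191')]
print(person_food)    # [('Mahesh', 'Burger'), ('Mahesh', 'pizza'), ('Ramesh', 'Pizza')]
```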
EmpSkills EmpJob
Networking EJ001
Web Development EJ002
Programming EJ002
Our Join Dependency:
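{(EmpName, EmpSkills), (EmpName, EmpJob), (EmpSkills, EmpJob)}
The original relation <Employee> has this join dependency (which is not implied by its
candidate keys), so it is not in 5NF. The join dependency means that the natural join of
the above three relations is equal to our original relation <Employee>, so decomposing
<Employee> into these three relations loses no information.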
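A join dependency can be tested on a table instance by projecting the table onto each component, joining the projections, and comparing the result with the original rows. The Python sketch below does this. The Employee rows are hypothetical (made up for illustration), as are the helper names. Checking one instance only shows whether the JD holds for that data, not whether it is a rule of the schema.

```python
from itertools import product


def project(rows, attrs):
    """Projection of rows (dicts) onto attrs, as a set of tuples."""
    return {tuple(r[a] for a in attrs) for r in rows}


def natural_join(left, left_attrs, right, right_attrs):
    """Natural join of two sets of tuples; returns (tuples, attribute list)."""
    common = [a for a in left_attrs if a in right_attrs]
    out_attrs = left_attrs + [a for a in right_attrs if a not in left_attrs]
    out = set()
    for left_row, right_row in product(left, right):
        ld, rd = dict(zip(left_attrs, left_row)), dict(zip(right_attrs, right_row))
        if all(ld[a] == rd[a] for a in common):
            merged = {**ld, **rd}
            out.add(tuple(merged[a] for a in out_attrs))
    return out, out_attrs


def satisfies_jd(rows, components):
    """True if joining the projections onto components gives back exactly rows."""
    joined, attrs = project(rows, components[0]), list(components[0])
    for comp in components[1:]:
        joined, attrs = natural_join(joined, attrs, project(rows, comp), list(comp))
    return joined == project(rows, attrs)


jd = [("EmpName", "EmpSkills"), ("EmpName", "EmpJob"), ("EmpSkills", "EmpJob")]
# Hypothetical Employee rows, made up for illustration.
employee = [
    {"EmpName": "Smith", "EmpSkills": "Networking", "EmpJob": "EJ001"},
    {"EmpName": "Smith", "EmpSkills": "Programming", "EmpJob": "EJ002"},
    {"EmpName": "Jones", "EmpSkills": "Networking", "EmpJob": "EJ002"},
    {"EmpName": "Smith", "EmpSkills": "Networking", "EmpJob": "EJ002"},
]
print(satisfies_jd(employee, jd))      # True
print(satisfies_jd(employee[:3], jd))  # False: the join adds (Smith, Networking, EJ002)
```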
LOSSLESS JOIN DECOMPOSITION
If we decompose a relation R into relations R1 and R2,
Decomposition is lossy if R1 ⋈ R2 ⊃ R
Decomposition is lossless if R1 ⋈ R2 = R
To check for lossless join decomposition using FD set, following conditions must
hold:
1. Union of Attributes of R1 and R2 must be equal to attribute of R. Each attribute
of R must be either in R1 or in R2.
Att(R1) U Att(R2) = Att(R)
2. Intersection of Attributes of R1 and R2 must not be empty.
Att(R1) ∩ Att(R2) ≠ Φ
3. Common attribute must be a key for at least one relation (R1 or R2)
Att(R1) ∩ Att(R2) -> Att(R1) or Att(R1) ∩ Att(R2) -> Att(R2)
For Example, A relation R (A, B, C, D) with FD set{A->BC} is decomposed into R1(ABC)
and R2(AD) which is a lossless join decomposition as:
1. First condition holds true as Att(R1) U Att(R2) = (ABC) U (AD) = (ABCD) =
Att(R).
2. Second condition holds true as Att(R1) ∩ Att(R2) = (ABC) ∩ (AD) = (A) ≠ Φ
3. Third condition holds true as Att(R1) ∩ Att(R2) = A is a key of R1(ABC) because
A->BC is given.
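The three conditions can be checked mechanically. Condition 3 is tested by computing the closure of the common attributes and checking whether it covers all of R1 or all of R2. This is a sketch in Python with hypothetical helper names and single-letter attributes:

```python
def closure(attrs, fds):
    """Attributes determined by attrs under fds (single-letter names)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r, r1, r2, fds):
    """Three-condition test for decomposing R into R1 and R2."""
    if set(r1) | set(r2) != set(r):  # condition 1: every attribute of R is kept
        return False
    common = set(r1) & set(r2)
    if not common:  # condition 2: R1 and R2 share at least one attribute
        return False
    determined = closure(common, fds)
    return set(r1) <= determined or set(r2) <= determined  # condition 3


fds = [("A", "BC")]
print(is_lossless("ABCD", "ABC", "AD", fds))  # True
print(is_lossless("ABCD", "ABC", "BD", fds))  # False: B is not a key of either part
```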
Example:
Consider a relation R(A,B,C,D)