
Database Management Systems

Unit-3
Need of Normalization in DBMS

Consider a Student table that stores data for four Computer Science students along with their branch details.

The values of the fields branch, hod (Head of Department), and office_tel are repeated for every student who belongs to the same branch of the college. This repetition is Data Redundancy.
Need of Normalization in DBMS
1. Insertion Anomaly in DBMS
• Suppose that, for a new admission, the student's data cannot be inserted until the student opts for a branch; otherwise we must set the branch information to NULL.
• Also, if we insert data for 100 students of the same branch, the branch information is repeated for all 100 of them.
• These scenarios are Insertion anomalies.
2. Updation Anomaly in DBMS
• What if Mr. X leaves the college, or is no longer the HOD of the Computer Science department? In that case, all the student records will have to be updated, and if we miss even one record by mistake, the data becomes inconsistent.

• This is an Updation anomaly: one piece of information changed, yet every record in the table must be updated.
Need of Normalization in DBMS
3. Deletion Anomaly in DBMS
• In our Student table, two different pieces of information are kept together: the Student information and the Branch information.

• So if only a single student is enrolled in a branch, and that student leaves the college (or the student's entry is deleted for some reason), we lose the branch information too. This is a Deletion anomaly.
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize redundancy in a relation or set of relations. It is also used to eliminate undesirable characteristics like Insertion, Update, and Deletion anomalies.
• Normalization divides a larger table into smaller tables and links them using relationships.
• Normal forms are used to reduce redundancy in database tables.
Functional Dependency

• A Functional Dependency describes a relationship between attributes within a single relation.

• An attribute is functionally dependent on another if the value of one attribute determines the value of the other. Formally, functional dependency is defined as:
• In a given relation R with attributes X and Y, attribute Y is functionally dependent on attribute X if each value of X determines exactly one value of Y.
• This is written as X → Y.
Functional Dependency
In any relation, a functional dependency α → β holds if any two tuples that have the same value of attribute set α also have the same value of attribute set β.

Mathematically,
If α and β are the two sets of attributes in a relational table R where-
α⊆R
β⊆R
Then, for a functional dependency to exist from α to β,
If t1[α] = t2[α], then t1[β] = t2[β]
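As a quick illustration of this definition, here is a minimal Python sketch (not from the slides; the table values are made up) that tests whether a dependency α → β holds in a relation stored as a list of rows:

```python
# Testing whether a functional dependency alpha -> beta holds in a
# relation: no two rows may agree on alpha while disagreeing on beta.

def fd_holds(rows, alpha, beta):
    """rows: list of dicts mapping attribute name -> value."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in alpha)
        val = tuple(t[b] for b in beta)
        if key in seen and seen[key] != val:
            return False   # two tuples agree on alpha but differ on beta
        seen[key] = val
    return True

students = [
    {"roll_no": 1, "name": "A", "dept_name": "CS"},
    {"roll_no": 2, "name": "B", "dept_name": "CS"},
]
print(fd_holds(students, ["roll_no"], ["name", "dept_name"]))   # True
print(fd_holds(students, ["dept_name"], ["name"]))              # False
```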
Functional Dependency
If t1[α] = t2[α], then t1[β] = t2[β]

Example-
Consider the following table-

A → B
AB → C
A → E
DE → C
AD → C
Functional Dependency
If t1[α] = t2[α], then t1[β] = t2[β]

Example-
Consider the following table-
roll_no → { name, dept_name, dept_building }
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency

In a trivial functional dependency, the dependent is always a subset of the determinant, i.e., if X → Y and Y is a subset of X, then it is called a trivial functional dependency.

Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of the determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of a trivial functional dependency.
2. Non-trivial Functional Dependency

In a non-trivial functional dependency, the dependent is strictly not a subset of the determinant, i.e., if X → Y and Y is not a subset of X, then it is called a non-trivial functional dependency.

Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset of the determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not a subset of {roll_no, name}.
3. Multivalued Functional Dependency
In a multivalued functional dependency, the attributes of the dependent set are not dependent on each other, i.e., if a → {b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional dependency.

Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name and age are not dependent on each other (i.e., neither name → age nor age → name holds).
4. Transitive Functional Dependency
In a transitive functional dependency, the dependent is indirectly dependent on the determinant, i.e., if a → b and b → c, then by the axiom of transitivity, a → c. This is a transitive functional dependency.

Here, enrol_no → dept and dept → building_no. Hence, by the axiom of transitivity, enrol_no → building_no is a valid functional dependency. Because this dependency is indirect, it is called a transitive functional dependency.
Fully Functional Dependency
In a full functional dependency, an attribute depends on the entire determinant: X → Y is fully functional if Y depends on all of X and not on any proper subset of X. For example, if a relation R has attributes X, Y, Z with the dependencies X → Y and X → Z, and neither dependency holds on a proper subset of X, those dependencies are fully functional.

Partial Functional Dependency

In a partial functional dependency, a non-key attribute depends on a part of the composite key rather than on the whole key. If a relation R has attributes X, Y, Z where {X, Y} is the composite key and Z is a non-key attribute, then X → Z is a partial functional dependency in an RDBMS.
Armstrong’s Axioms
• Reflexivity rule: if β ⊆ α, then α → β.
• Augmentation rule: if α → β, then αγ → βγ.
• Transitivity rule: if α → β and β → γ, then α → γ.
• Union rule: if α → β holds and α → γ holds, then α → βγ holds.
• Decomposition rule: if α → βγ holds, then α → β holds and α → γ holds.
• Pseudotransitivity rule: if α → β holds and γβ → δ holds, then αγ → δ holds.
Armstrong’s Axioms
• Reflexivity: for example, {roll_no, name} → name is valid.
• Augmentation: for example, if {roll_no, name} → dept_building is valid, then {roll_no, name, dept_name} → {dept_building, dept_name} is also valid.
• Transitivity: for example, if roll_no → dept_name and dept_name → dept_building, then roll_no → dept_building is also valid.
Closure of an Attribute Set
The set of all attributes that can be functionally determined from an attribute set is called the closure of that attribute set.
The closure of attribute set {X} is denoted {X}+.
Example-
Consider a relation R ( A , B , C , D , E , F , G ) with the functional dependencies-
A → BC
BC → DE
D→F
CF → G
Closure of attribute A- A+ = { A , B , C , D , E , F , G }
Closure of attribute D- D+ = { D , F }
Closure of attribute set {BC}- { B , C }+ = { B , C , D , E , F , G }
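The closure computation follows a simple fixed-point algorithm; here is a Python sketch of it (the function name and FD representation are our own), using the FDs of this example:

```python
# Fixed-point computation of an attribute closure. FDs are (lhs, rhs)
# pairs of attribute sets; the values match the example above.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # once the whole left side is in the closure,
            # the right side is functionally determined too
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

fds = [({"A"}, {"B", "C"}),
       ({"B", "C"}, {"D", "E"}),
       ({"D"}, {"F"}),
       ({"C", "F"}, {"G"})]

print(sorted(closure({"A"}, fds)))         # A+  = A B C D E F G
print(sorted(closure({"D"}, fds)))         # D+  = D F
print(sorted(closure({"B", "C"}, fds)))    # BC+ = B C D E F G
```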
Keys
Super Key in DBMS
A super key is a set of attributes that identifies a row (tuple) uniquely. The word super denotes its superiority: a super key is a superset of a key known as a Candidate Key. Super key values may also be NULL.
Candidate Key
• It is a minimal super key: a super key with no redundant attributes.
• It is the minimal set of attributes that can uniquely identify a record.
• It must contain unique values.
• It can contain NULL values.
• Every table must have at least one candidate key.
• A table can have multiple candidate keys but only one primary key.
• There can be more than one candidate key in a relation.
Primary Key
• It is a unique key: it identifies exactly one tuple (record) at a time.
• It has no duplicate values; all its values are unique.
• It cannot be NULL.
• A primary key is not necessarily a single column; more than one column together can form the primary key of a table (a composite key).
Normalization
Normalization is a systematic approach to decomposing tables to eliminate data redundancy and undesirable characteristics like Insertion, Update, and Deletion anomalies.
A good decomposition should have two properties:
o Lossless join property - guarantees that no spurious tuples are generated when the decomposed relations are joined back.
o Dependency preservation property - ensures that each functional dependency is represented in some individual relation resulting after decomposition.
**Prime attribute - An attribute of relation schema R is called a prime attribute of R if it is a member of some candidate key of R.
**Non-prime attribute - An attribute is called non-prime if it is not a prime attribute, that is, if it is not a member of any candidate key.
If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated the primary key, and the others are called secondary keys.

Normal forms: The normal form of a relation refers to the highest normal-form condition that it meets, and hence indicates the degree to which it has been normalized. Normal forms are used to eliminate or reduce redundancy in database tables.
First Normal Form (1NF)
A relation R is said to be in first normal form if and only if all the attributes of R are atomic in nature. For example, consider the Department table in figure (b). It is not in 1NF because one of its attributes, Dlocations, is non-atomic: it contains more than one value in row 1. To make it 1NF-compliant, create a separate row for each value of Dlocations of row 1, as shown in figure (c).

Figure: (a) a relation schema, (b) a sample state of relation Department that is not in 1NF, (c) the 1NF version of the same relation.
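As an illustration, here is a small Python sketch of the 1NF conversion (the Dname/Dlocations values are assumed, not taken from the figure): each non-atomic Dlocations list is split into one row per location.

```python
# Splitting a non-atomic multivalued attribute (Dlocations) into atomic
# rows, as required by 1NF. Sample values are assumed for illustration.

department = [
    {"Dname": "Research", "Dnumber": 5,
     "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dname": "Administration", "Dnumber": 4,
     "Dlocations": ["Stafford"]},
]

dept_1nf = [
    {"Dname": d["Dname"], "Dnumber": d["Dnumber"], "Dlocation": loc}
    for d in department
    for loc in d["Dlocations"]
]

for row in dept_1nf:
    print(row)   # one atomic Dlocation value per row
```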
Second Normal Form (2NF)
A relation is said to be in Second Normal Form (2NF) if and only if:
• It is in First Normal Form (1NF), and
• No partial dependency exists between non-key attributes and key attributes.

If such a partial dependency is found among the FDs, then R is not in 2NF.
Third Normal Form (3NF):
A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on any candidate key.
• To reach 3NF, we need to remove this transitive dependency. To do that, we split the Employee table into two tables (an Employee table and a Rate table) as given below.
Boyce-Codd Normal Form (BCNF)
Rules for BCNF
Rule 1: The table should be in the 3rd Normal Form.
Rule 2: For any functional dependency (A → B), A must be a super key (a candidate key also qualifies).
In simple words: if B is a prime attribute, then A cannot be a non-prime attribute.
Fourth normal form (4NF)
• A relation will be in 4NF if it is in Boyce-Codd Normal Form and has no multi-valued dependency.
• For a dependency A → B: if for a single value of A multiple values of B exist, then A →→ B is a multi-valued dependency.
Fourth normal form (4NF)
Consider the following table:

Regno   Phoneno   Qualification
1       P1        Diploma
1       P1        B.Tech
1       P1        M.Tech
1       P2        Diploma
1       P2        B.Tech
1       P2        M.Tech

Here, regno →→ phoneno and regno →→ qualification; both are nontrivial MVDs.
The given relation is in BCNF (since no functional dependency exists), but it is not in 4NF (since there is a nontrivial MVD).

Anomalies
The table also suffers from the following anomalies:
Insertion anomaly: To insert a new phoneno for a given regno (e.g., regno 1), we have to insert 3 rows, because each phoneno is stored with all three combinations of qualification.
Deletion anomaly: To delete the qualification Diploma, we have to delete it in more than one place.
Updation anomaly: To update the qualification Diploma to IT, we have to update it in more than one place.
Fourth normal form (4NF)
4NF decomposition

If R(X, Y, Z, P) has X →→ Y and X →→ Z, then R is decomposed into R1(X, Y) and R2(X, Z, P).
=> R(regno, phoneno, qualification) is decomposed into R1(regno, phoneno) and R2(regno, qualification). All the anomalies discussed above are removed, and the two resulting relations are in 4NF.

R1
Regno   Phoneno
1       P1
1       P2

Now, regno →→ phoneno is a trivial MVD (since {regno} ∪ {phoneno} = R1)
=> R1 is in 4NF.

R2
Regno   Qualification
1       Diploma
1       B.Tech
1       M.Tech

regno →→ qualification is a trivial MVD (since {regno} ∪ {qualification} = R2)
=> R2 is in 4NF.
Fifth normal form (5NF)
• A relation is in 5NF if it is in 4NF and contains no join dependency, and every join is lossless.
• 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
• 5NF is also known as Project-Join Normal Form (PJ/NF).
Fifth normal form (5NF)
SUBJECT     LECTURER   SEMESTER
Computer    Anshika    Semester1
Computer    John       Semester1
Math        John       Semester1
Math        Akash      Semester2
Chemistry   Praveen    Semester1

In the above table, John takes both Computer and Math classes for Semester 1, but he doesn't take a Math class for Semester 2. In this case, a combination of all three fields is required to identify a valid row.
Suppose we add a new semester, Semester 3, but do not yet know the subject or who will be teaching it, so we would have to leave Lecturer and Subject as NULL. But all three columns together act as the primary key, so we can't leave the other two columns blank.
To bring the above table into 5NF, we can decompose it into three relations P1, P2 and P3:
Fifth normal form (5NF)
P1
SEMESTER    SUBJECT
Semester1   Computer
Semester1   Math
Semester1   Chemistry
Semester2   Math

P2
SUBJECT     LECTURER
Computer    Anshika
Computer    John
Math        John
Math        Akash
Chemistry   Praveen

P3
SEMESTER    LECTURER
Semester1   Anshika
Semester1   John
Semester2   Akash
Semester1   Praveen
Decomposition
• Functional decomposition is the process of breaking down the functions of an organization into progressively greater (finer and finer) levels of detail.
• In decomposition, one function is described in greater detail by a set of other supporting functions.
• The decomposition of a relation schema R consists of replacing the relation schema by two or more relation schemas that each contain a subset of the attributes of R and together include all attributes of R.
• Decomposition helps in eliminating some of the problems of bad design such as redundancy, inconsistencies, and anomalies.
Decomposition
Method-1
Now, let us check whether this decomposition is lossless or not.
For a lossless decomposition, we must have R1 ⋈ R2 = R.
If we perform the natural join (⋈) of the sub-relations R1 and R2, we get:

A   B   C
1   2   1
2   5   3
3   3   3

This is exactly R, so it is a lossless decomposition.
Decomposition
Method-2:
2.1 Union: Attr(R1) ∪ Attr(R2) = Attr(R), i.e., A, B, C (we get all attributes of R).
2.2 Intersection: Attr(R1) ∩ Attr(R2) = B (a common attribute exists).
2.3 B is a candidate key of one of the sub-relations (is the common attribute a key?).
Hence it is a lossless decomposition.
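Method-2 can be mechanized. The sketch below (function names are our own; the FD set is chosen for illustration) tests the three conditions for a binary decomposition, treating condition 2.3 as "the common attributes are a key of R1 or of R2":

```python
# A sketch of the Method-2 test: a binary decomposition R -> (R1, R2) is
# lossless if the attribute union gives back R, the intersection is
# non-empty, and the common attributes are a key of R1 or of R2.
# closure() is the same fixed-point helper sketched earlier.

def closure(attrs, fds):
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(R, R1, R2, fds):
    if R1 | R2 != R:                 # 2.1: union must yield all attributes
        return False
    common = R1 & R2                 # 2.2: common attributes must exist
    if not common:
        return False
    cplus = closure(common, fds)     # 2.3: common part must be a key
    return R1 <= cplus or R2 <= cplus

# Illustration with R(A, B, C) split on the common attribute B,
# assuming B -> C so that B is a key of R2(B, C).
fds = [({"B"}, {"C"})]
print(is_lossless({"A", "B", "C"}, {"A", "B"}, {"B", "C"}, fds))  # True
```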
Dependency Preservation Decomposition:

Consider a relation R with a set F of functional dependencies. If R is decomposed into relations R1, ..., Rm with FD sets F1, ..., Fm, the decomposition is dependency preserving when

(F1 ∪ F2 ∪ … ∪ Fm)+ = F+.

If R is decomposed into R1 with FD set F1 and R2 with FD set F2, then there can be three cases:

F1 ∪ F2 = F -----> the decomposition is dependency preserving.
F1 ∪ F2 is a subset of F -----> not dependency preserving.
F1 ∪ F2 is a superset of F -----> this case is not possible.
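The same closure machinery gives a practical test: F1 ∪ F2 preserves F exactly when every FD in F is implied by F1 ∪ F2. A sketch, with an assumed example decomposition:

```python
# Dependency preservation: F1 U F2 preserves F iff every FD X -> Y in F
# is implied by F1 U F2, i.e. Y lies inside the closure of X computed
# using only F1 U F2. closure() is the same helper as in earlier sketches.

def closure(attrs, fds):
    result, changed = set(attrs), True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def preserves(f_parts, f):
    """True if the union of the decomposed FD sets implies every FD in f."""
    g = [fd for part in f_parts for fd in part]
    return all(rhs <= closure(lhs, g) for lhs, rhs in f)

# Assumed example: R(A, B, C) with F = {A -> B, B -> C}, decomposed into
# R1(A, B) carrying F1 = {A -> B} and R2(B, C) carrying F2 = {B -> C}.
F  = [({"A"}, {"B"}), ({"B"}, {"C"})]
F1 = [({"A"}, {"B"})]
F2 = [({"B"}, {"C"})]
print(preserves([F1, F2], F))   # True: dependency preserving
```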
Difference between 3NF and BCNF:
3NF permits a nontrivial functional dependency X → A if X is a super key or A is a prime attribute; BCNF permits it only if X is a super key. Hence every relation in BCNF is also in 3NF, but not every relation in 3NF is in BCNF.
Transaction
• Transaction: A transaction is a unit of program execution that accesses and possibly updates various data items. As an example, consider the following transaction that transfers $50 from account A to account B.
• It can be written as:
1. read(A)
2. A := A − 50
3. write(A)
4. read(B)
5. B := B + 50
6. write(B)

This transaction consists of 6 actions that need to take place in order to transfer $50 from account A to account B.
Transaction
• Comparing Transaction and Program: A transaction results from the execution of a user program written in a high-level data manipulation language (DML) or programming language, and it is delimited by the statements begin transaction and end transaction.

• Transaction operations: Every transaction accesses data using two operations:

• read(X): transfers the data item X from the database to a local buffer belonging to the transaction that executed the read operation.
• write(X): transfers the data item X from the local buffer of the transaction that executed the write back to the database.
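For a concrete feel, here is a runnable sketch of this transfer using Python's sqlite3 module (the schema and starting balances are assumed); the with-block gives the all-or-nothing behaviour discussed next:

```python
# A transfer of $50 from A to B as a single sqlite3 transaction:
# either both UPDATEs commit, or neither does.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)",
                [("A", 1000), ("B", 2000)])
con.commit()

try:
    with con:  # opens a transaction; commits on success, rolls back on error
        con.execute("UPDATE account SET balance = balance - 50 WHERE name='A'")
        con.execute("UPDATE account SET balance = balance + 50 WHERE name='B'")
except sqlite3.Error:
    pass  # on failure, neither write becomes visible (atomicity)

print(con.execute("SELECT * FROM account").fetchall())
# [('A', 950), ('B', 2050)]
```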
Properties of the Transaction
• To ensure integrity of the data, we require that the database system maintain the following properties of the transactions:
1. Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.
2. Consistency. Execution of a transaction in isolation (that is, with no other transaction executing concurrently) preserves the consistency of the database.
3. Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware of other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executing transactions.
• That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions executing concurrently in the system.
4. Durability. After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
These properties are often called the ACID properties.
Properties of the Transaction
To gain a better understanding of ACID properties and the need for them, consider a simplified banking system consisting of several accounts and a set of transactions that access and update those accounts.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as:
Ti:
read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Let us now consider each of the ACID requirements.
Consistency: The consistency requirement here is that the sum of A and B be unchanged by the execution of the transaction. It can be verified easily that, if the database is consistent before an execution of the transaction, the database remains consistent after the execution of the transaction.
Properties of the Transaction
Atomicity: Suppose that, just before the execution of transaction Ti, the values of accounts A and B are $1000 and $2000, respectively.
Now suppose that, during the execution of transaction Ti, a failure occurs (a power failure, hardware failure, or software error) that prevents Ti from completing its execution successfully.
Further, suppose that the failure happened after the write(A) operation but before the write(B) operation. In this case, the values of accounts A and B reflected in the database are $950 and $2000. The system destroyed $50 as a result of this failure. In particular, we note that the sum A + B is no longer preserved.
We call such a state an inconsistent state. We must ensure that such inconsistencies are not visible in a database system. This state, however, is eventually replaced by the consistent state where the value of account A is $950 and the value of account B is $2050.
This is the reason for the atomicity requirement: if the atomicity property is present, all actions of the transaction are reflected in the database, or none are.
The basic idea behind ensuring atomicity is this: the database system keeps track (on disk) of the old values of any data on which a transaction performs a write, and, if the transaction does not complete its execution, the database system restores the old values to make it appear as though the transaction never executed.
Properties of the Transaction
Isolation: If several transactions are executed concurrently, their operations may interleave in some undesirable way, resulting in an inconsistent state. For example, if between steps 3 and 6 another transaction T2 is allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it should be).

One way to avoid the problem of concurrently executing transactions is to execute transactions serially, that is, one after the other. However, executing multiple transactions concurrently has significant benefits, as we will see later.

Durability: The durability property guarantees that, once a transaction completes successfully (i.e., the transfer of the $50 has taken place), all the updates that it carried out on the database persist, even if there is a system failure after the transaction completes execution.
Transaction State
1. Active State – While the instructions of the transaction are running, the transaction is in the active state. If all the read and write operations are performed without any error, it goes to the partially committed state; if any instruction fails, it goes to the failed state.
2. Partially Committed – After completion of all the read and write operations, the changes are held in main memory or a local buffer. If the changes are made permanent on the database, the state changes to committed; in case of failure, it goes to the failed state.
3. Failed State – A transaction goes to the failed state when any of its instructions fails, or when a failure occurs while making the changes permanent on the database.
4. Aborted State – After any type of failure, the transaction goes from the failed state to the aborted state; since in the previous states the changes were made only to the local buffer or main memory, these changes are deleted or rolled back.
5. Committed State – This is the state in which the changes are made permanent on the database; the transaction is complete and is then terminated in the terminated state.
6. Terminated State – If there isn't any rollback, or the transaction comes from the committed state, then the system is consistent and ready for a new transaction, and the old transaction is terminated.
Schedules
• A schedule is a sequence of instructions that specifies the chronological order in which instructions of transactions are executed.
Serial Schedule:
If transactions are executed from start to finish, one after another, the schedule is called a serial schedule.
Characteristics - serial schedules are always:
• Consistent
• Recoverable
• Cascadeless
• Strict
Concurrent schedule/ Non-Serial Schedule

• If the instructions of different transactions are interleaved, the schedule is called a concurrent (non-serial) schedule.
• For a set of n transactions, there exist n! different valid serial schedules.

• If two transactions are running concurrently, the operating system may execute one transaction for a little while, then perform a context switch, execute the second transaction for some time, and then switch back to the first transaction, and so on. With multiple transactions, the CPU time is shared among all the transactions.
• Concurrent execution of transactions improves the performance of the system. The following figure shows a concurrent schedule.
Concurrent schedule/ Non-Serial Schedule

• Not all concurrent executions result in a correct state. To illustrate, consider the schedule shown in the following figure: the sum A + B is not preserved by the execution of the two transactions.
• We can ensure consistency of the database under concurrent execution by making sure that every schedule executed has the same effect as some serial schedule, i.e., by allowing only serializable schedules.
Serializability

• Some non-serial schedules may lead to inconsistency of the database.

• Serializability is a concept that helps to identify which non-serial schedules are correct and will maintain the consistency of the database.
• If a given non-serial schedule of n transactions is equivalent to some serial schedule of n transactions, it is called a serializable schedule.
Conflict Serializability

Conflict Serializable: A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping adjacent non-conflicting operations; this can be tested using a precedence graph. Two operations conflict if they belong to different transactions, access the same data item, and at least one of them is a write:

Transaction_i   Transaction_j   Conflicting?
Read(X)         Read(X)         Non-conflicting
Read(X)         Write(X)        Conflicting
Write(X)        Read(X)         Conflicting
Write(X)        Write(X)        Conflicting
Conflict Serializability
Precedence graph
It is used to check conflict serializability. The steps to check conflict serializability are as follows:
• For each transaction T, put a node or vertex in the graph.
• For each conflicting pair where an operation of Ti precedes a conflicting operation of Tj, put an edge from Ti to Tj.
• If there is a cycle in the graph, the schedule is not conflict serializable; otherwise it is conflict serializable.

Figure: an example schedule that is not conflict serializable.
Conflict Serializability

Check whether the given schedule S is conflict serializable or not:

S: R1(A), R2(A), R1(B), R2(B), R3(B), W1(A), W2(B)
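One way to answer such questions mechanically is to build the precedence graph in code. The sketch below (the operation representation is our own) constructs edges from conflicting pairs and then checks for a cycle; run on the schedule S above, it reports that S is not conflict serializable, since R2(A) before W1(A) gives T2 → T1 and R1(B) before W2(B) gives T1 → T2, a cycle.

```python
# Building the precedence graph for a schedule and checking it for cycles.
# Operations are (transaction, action, item) triples in schedule order.
from itertools import combinations

def conflict_serializable(schedule):
    edges = set()
    for (t1, a1, x1), (t2, a2, x2) in combinations(schedule, 2):
        # conflicting pair: different transactions, same item, >= one write
        if t1 != t2 and x1 == x2 and "W" in (a1, a2):
            edges.add((t1, t2))       # earlier transaction precedes later one
    nodes = {t for e in edges for t in e}
    while nodes:                      # Kahn-style elimination
        sources = [n for n in nodes if all(v != n for _, v in edges)]
        if not sources:
            return False              # a cycle remains: not conflict serializable
        nodes -= set(sources)
        edges = {(u, v) for u, v in edges if u in nodes and v in nodes}
    return True

S = [("T1", "R", "A"), ("T2", "R", "A"), ("T1", "R", "B"), ("T2", "R", "B"),
     ("T3", "R", "B"), ("T1", "W", "A"), ("T2", "W", "B")]
print(conflict_serializable(S))       # False
```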


View Serializability
If a given schedule is found to be view equivalent to some serial schedule, it is called a view serializable schedule.
View Equivalent Schedules-
Consider two schedules S1 and S2, each consisting of two transactions T1 and T2.
Schedules S1 and S2 are called view equivalent if the following three conditions hold:
Condition-01: Initial Read
For each data item X, if transaction Ti reads X from the database initially in schedule S1, then in schedule S2 also, Ti must perform the initial read of X from the database.
Condition-02: Write-Read
If transaction Ti reads a data item that has been updated by transaction Tj in schedule S1, then in schedule S2 also, transaction Ti must read the same data item updated by transaction Tj.
Condition-03: Final Write
For each data item X, if X has been updated last by transaction Ti in schedule S1, then in schedule S2 also, X must be updated last by transaction Ti.
View Serializability
Checking view serializability
1. First, check for conflict serializability.
• If the given schedule is conflict serializable, then it is surely view serializable. Stop and report your
answer.
• If the given schedule is not conflict serializable, then it may or may not be view serializable. Go and
check using other methods.
2. Check for a blind write. If there is a blind write, then the schedule can be view serializable. So, check
its view serializability using the view equivalent schedule technique . If there is no blind write, then
the schedule can never be view serializable.
(Blind write is writing a value or piece of data without reading it.)
View Serializability
Figure: an example schedule checked for view serializability using the conditions above.
Recoverability of Schedule
Recoverable Schedules
Non-serial schedules which are not serializable are called non-serializable schedules. Non-serializable schedules may be recoverable or irrecoverable.

If a transaction performs a dirty read from an uncommitted transaction, and its commit operation is delayed until that uncommitted transaction has either committed or rolled back, then such a schedule is called a Recoverable Schedule.

Irrecoverable schedule:
A schedule is said to be irrecoverable if a transaction commits before the transaction from which it has read the data commits.
Recoverable Schedules
Types of recoverable schedules
There are three types of recoverable schedules, explained below with relevant examples:
• Cascading schedules
• Cascadeless schedules
• Strict schedules
Cascading Schedule

• If, in a schedule, failure of one transaction causes several other dependent transactions to roll back or abort, then such a schedule is called a Cascading Schedule (also Cascading Rollback or Cascading Abort).
• It leads to wastage of CPU time.

Here,
Transaction T2 depends on transaction T1.
Transaction T3 depends on transaction T2.
Transaction T4 depends on transaction T3.

In this schedule,
the failure of transaction T1 causes transaction T2 to roll back,
the rollback of transaction T2 causes transaction T3 to roll back,
and the rollback of transaction T3 causes transaction T4 to roll back.
Such a rollback is called a Cascading Rollback.
Cascadeless Schedule
If, in a schedule, a transaction is not allowed to read a data item until the last transaction that has written it has committed or aborted, then such a schedule is called a Cascadeless Schedule.
In other words,
• A cascadeless schedule allows only committed read operations.
• Therefore, it avoids cascading rollback and thus saves CPU time.
Cascadeless Schedule

A cascadeless schedule allows only committed read operations. However, it still allows uncommitted write operations.
Strict Schedule

If, in a schedule, a transaction is neither allowed to read nor to write a data item until the last transaction that has written it has committed or aborted, then such a schedule is called a Strict Schedule.

In other words,
• A strict schedule allows only committed read and write operations.
• Clearly, a strict schedule implements more restrictions than a cascadeless schedule.
• Strict schedules are stricter than cascadeless schedules.
• All strict schedules are cascadeless schedules.
• Not all cascadeless schedules are strict schedules.
Concurrency Problems
When multiple transactions execute concurrently in an uncontrolled or unrestricted manner, it may lead to several problems. Such problems are called concurrency problems.
1. Dirty Read Problem
Reading data written by an uncommitted transaction is called a dirty read.
This read is called a dirty read because:
• There is always a chance that the uncommitted transaction might roll back later.
• Thus, an uncommitted transaction might make other transactions read a value that does not even exist.
• This leads to inconsistency of the database.

NOTE-
• A dirty read does not always lead to inconsistency.
• It becomes problematic only when the uncommitted transaction fails and rolls back later due to some reason.

In this example,
T2 reads the dirty value of A written by the uncommitted transaction T1.
T1 fails in later stages and rolls back.
Thus, the value that T2 read now stands to be incorrect, and the database becomes inconsistent.
2. Unrepeatable Read Problem

This problem occurs when a transaction reads different values of the same variable in its different read operations, even though it has not updated the value itself.

In this example,
• T2 reads a different value of X in its second reading.
• T2 wonders how the value of X changed, because according to it, it is running in isolation.
3. Lost Update Problem
This problem occurs when multiple transactions execute concurrently and updates from one or more transactions get lost.

NOTE-
• This problem occurs whenever there is a write-write conflict.
• In a write-write conflict, there are two writes, one by each transaction, on the same data item, without any read in the middle.

In this example,
T1's write of X is overwritten in the database.
Thus, the update from T1 gets lost.
4. Phantom Read Problem
This problem occurs when a transaction reads some variable from the buffer, and when it reads the same variable later, it finds that the variable does not exist.

In this example,
T2 finds that variable X no longer exists when it tries to read X again.
T2 wonders who deleted variable X because, according to it, it is running in isolation.
Avoiding Concurrency Problems
• To ensure consistency of the database, it is very important to prevent the occurrence of the above problems.
• Concurrency Control Protocols help to prevent the occurrence of the above problems and maintain the consistency of the database.
Two Phase Locking Protocol
Locking in a database management system is used for handling transactions. The two-phase locking protocol ensures conflict serializable schedules. (A schedule is called conflict serializable if it can be transformed into a serial schedule by swapping non-conflicting operations.)

Before understanding the two phases of locking, let's understand the types of locks used in transaction control.

• Shared lock: Data can only be read when a shared lock is applied; it cannot be written. It is denoted lock-S.
• Exclusive lock: Data can be read as well as written when an exclusive lock is applied. It is denoted lock-X.

The two phases of locking are:

• Growing Phase: In the growing phase, the transaction only obtains locks; it cannot release any lock. The growing phase ends at the lock point, after which the shrinking phase begins.
• Shrinking Phase: In the shrinking phase, the transaction only releases locks; it cannot obtain any new lock. Locks are released once the corresponding data changes are finished.
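A toy sketch of the two-phase rule itself (no real locking or concurrency; the class and method names are our own): lock requests are allowed only until the first unlock, which marks the lock point.

```python
# Enforcing the two-phase discipline: acquire-only while growing;
# the first unlock ends the growing phase, after which any further
# lock request violates the protocol.

class TwoPhaseTxn:
    def __init__(self):
        self.held = set()
        self.growing = True

    def lock(self, item):            # allowed only in the growing phase
        if not self.growing:
            raise RuntimeError("2PL violated: lock after first unlock")
        self.held.add(item)

    def unlock(self, item):          # first unlock starts the shrinking phase
        self.growing = False
        self.held.discard(item)

t = TwoPhaseTxn()
t.lock("A"); t.lock("B")   # growing phase
t.unlock("A")              # lock point passed; shrinking phase begins
try:
    t.lock("C")            # rejected: would break the two-phase rule
except RuntimeError as e:
    print(e)
```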
Two Phase Locking Protocol

Transaction T1
The growing phase is from steps 1-3.
The shrinking phase is from steps 5-7.
Lock point at step 3.
Transaction T2
The growing phase is from steps 2-6.
The shrinking phase is from steps 8-9.
Lock point at step 6.

Lock Point
The point at which the growing phase ends, i.e., when a transaction takes the final lock it needs to carry on its work.
Two Phase Locking Protocol
Drawbacks of 2-PL
• Cascading rollback is possible under 2-PL.
• Deadlocks and starvation are possible.
Two Phase Locking Protocol
Deadlock in 2-PL
• In the above transactions, at step 3, T1 cannot obtain the exclusive lock on R2, as the exclusive lock has already been obtained by transaction T2 in step 1.
• Also, at step 3, the exclusive lock on R1 cannot be obtained by T2, as T1 has already applied the exclusive lock on R1.
Two Phase Locking Protocol

So, if we draw a wait-for graph, we can see that T1 waits for T2 and T2 waits for T1. This creates a cycle, the waiting never ends, and the result is a deadlock.

Hence, deadlock is possible in the two-phase locking protocol.
Two Phase Locking Protocol

Problems with Two-Phase Locking

• It does not ensure recoverability; this can be solved by strict two-phase locking and rigorous two-phase locking.
• It does not ensure cascadeless schedules; this too can be solved by strict two-phase locking and rigorous two-phase locking.
• It may suffer from deadlock, which can be addressed by conservative two-phase locking.
Timestamp Ordering Protocols

• The Timestamp Ordering Protocol orders transactions based on their timestamps; the order of transactions is simply the ascending order of their creation times.
• The older transaction has the higher priority, which is why it executes first. To determine the timestamp of a transaction, this protocol uses system time or a logical counter.
• Lock-based protocols manage the order between conflicting pairs of transactions at execution time, but timestamp-based protocols start working as soon as a transaction is created.
• Suppose transaction T1 entered the system at time 007 and transaction T2 at time 009. T1 has the higher priority, so it executes first, as it entered the system first.
• The timestamp ordering protocol also maintains the timestamps of the last read and write operations on each data item.
Timestamp Ordering Protocols
Basic timestamp ordering protocol works as follows:
1. Whenever a transaction Ti issues a Read(X) operation, check:
If W_TS(X) > TS(Ti), the operation is rejected and Ti is rolled back.
If W_TS(X) <= TS(Ti), the operation is executed and R_TS(X) is updated.
2. Whenever a transaction Ti issues a Write(X) operation, check:
If TS(Ti) < R_TS(X), the operation is rejected.
If TS(Ti) < W_TS(X), the operation is rejected and Ti is rolled back; otherwise the operation is executed and W_TS(X) is updated.
Where,
TS(Ti) denotes the timestamp of transaction Ti.
R_TS(X) denotes the read timestamp of data item X.
W_TS(X) denotes the write timestamp of data item X.
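A sketch of these two checks in Python (dictionary-based timestamps; the names are our own, and the timestamps 7 and 9 echo the T1/T2 example above):

```python
# Basic timestamp-ordering checks. read_ts/write_ts play the roles of
# R_TS(X) and W_TS(X); ts is TS(Ti) of the issuing transaction.

read_ts, write_ts = {}, {}

def read_item(ts, x):
    if write_ts.get(x, 0) > ts:          # a younger txn already wrote X
        return "reject"                  # Ti must be rolled back
    read_ts[x] = max(read_ts.get(x, 0), ts)
    return "execute"

def write_item(ts, x):
    if ts < read_ts.get(x, 0) or ts < write_ts.get(x, 0):
        return "reject"                  # Ti must be rolled back
    write_ts[x] = ts
    return "execute"

print(read_item(7, "X"))    # execute: T1 (TS 7) reads X
print(write_item(9, "X"))   # execute: younger T2 (TS 9) writes X
print(write_item(7, "X"))   # reject: T1's write arrives too late
```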
Timestamp Ordering Protocols

Advantages
• Timestamp-based protocols ensure serializability, since transactions are ordered by their creation timestamps.
• No transaction ever waits under this protocol, so deadlock cannot occur; in particular, no older transaction waits indefinitely.
• The protocol ensures that conflicting operations are executed in timestamp order.
Timestamp Ordering Protocols

Disadvantages

• Timestamp-based protocols may not be cascade-free, and may not even be recoverable.

• There is a possibility of starvation of long transactions if a sequence of conflicting short transactions causes repeated restarting of the long transaction.
Log Based Recovery

Log-based recovery in DBMS provides the ability to maintain or recover data in case of system failure. The DBMS keeps a record of every transaction on some stable storage device so that the data can be recovered when the system fails. A log record is created for every operation performed on the database. The log record for a modification must be written to stable storage before the modification itself is applied to the database.
Log Based Recovery

Log and log records

The log is a sequence of log records, recording all the update activities in the database. Logs for each transaction are maintained in stable storage. Any operation performed on the database is recorded in the log. Prior to performing any modification to the database, an update log record is created to reflect that modification. An update log record, represented as <Ti, Xj, V1, V2>, has these fields:

Transaction identifier: unique identifier of the transaction that performed the write operation.
Data item: unique identifier of the data item written.
Old value: value of the data item prior to the write.
New value: value of the data item after the write operation.
Log Based Recovery
Other types of log records are:
<Ti start>: It contains information about when a transaction Ti starts.
<Ti commit>: It contains information about when a transaction Ti commits.
<Ti abort>: It contains information about when a transaction Ti aborts.
Undo and Redo Operations
Because all database modifications must be preceded by the creation of a log record, the system has available both the old value prior to the modification of the data item and the new value that is to be written for the data item.
Undo: using a log record, sets the data item specified in the log record to the old value.
Redo: using a log record, sets the data item specified in the log record to the new value.
Log Based Recovery

Recovery using log records

After a system crash has occurred, the system consults the log to determine which transactions need to be redone and which need to be undone.

• Transaction Ti needs to be undone if the log contains the record <Ti start> but contains neither the record <Ti commit> nor the record <Ti abort>.
• Transaction Ti needs to be redone if the log contains the record <Ti start> and either the record <Ti commit> or the record <Ti abort>.
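A sketch of this classification over a toy log (the record format is assumed):

```python
# Classifying transactions after a crash: redo those with start plus
# commit/abort in the log, undo those with only a start record.

log = [("T1", "start"), ("T1", "commit"),
       ("T2", "start"), ("T2", "abort"),
       ("T3", "start")]                     # T3 was active at the crash

started, finished = set(), set()
for txn, rec in log:
    if rec == "start":
        started.add(txn)
    elif rec in ("commit", "abort"):
        finished.add(txn)

redo = started & finished
undo = started - finished
print("redo:", sorted(redo))   # ['T1', 'T2']
print("undo:", sorted(undo))   # ['T3']
```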
Log Based Recovery

Advantages of Log-based Recovery

• Durability: In the event of a breakdown, the log file offers a dependable and long-lasting method of recovering data. It guarantees that in the event of a system crash, no committed transaction is lost.
• Faster Recovery: Since log-based recovery recovers databases by replaying committed transactions from the log file, it is typically faster than alternative recovery methods.
• Incremental Backup: Backups can be made in increments using log-based recovery. Just the changes made since the last backup are kept in the log file, rather than creating a complete backup of the database each time.
• Lowers the Risk of Data Corruption: By making sure that all transactions are correctly committed or cancelled before they are written to the database, log-based recovery lowers the risk of data corruption.

Disadvantages of Log-based Recovery

• Additional overhead: Maintaining the log file incurs an additional overhead on the database system, which can reduce the performance of the system.
• Complexity: Log-based recovery is a complex process that requires careful management and administration. If not managed properly, it can lead to data inconsistencies or loss.
• Storage space: The log file can consume a significant amount of storage space, especially in a database with a large number of transactions.
• Time-consuming: The process of replaying the transactions from the log file can be time-consuming, especially if there are a large number of transactions to recover.
Checkpoint

• A checkpoint is a mechanism by which all the previous logs are removed from the system and stored permanently on the storage disk.
• The checkpoint is like a bookmark. During the execution of transactions, such checkpoints are marked; as the transaction executes, the log file is created from the steps of the transaction.
• When a checkpoint is reached, the transaction's updates are written into the database, and the entire log file up to that point is removed. The log file is then updated with the new transaction steps until the next checkpoint, and so on.
• The checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.
Recovery using Checkpoint
In the following manner, a recovery system recovers the database from
this failure:
• The recovery system reads the log file from the end towards the start, i.e., from T4 back to T1.
• The recovery system maintains two lists: a redo-list and an undo-list.
• A transaction is put on the redo-list if the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All the transactions on the redo-list are redone.
• For example: in the log file, transactions T2 and T3 have both <Tn, Start> and <Tn, Commit>. Transaction T1 has only <Tn, Commit> in the post-checkpoint log, because it committed after crossing the checkpoint. Hence T1, T2 and T3 are put on the redo-list.
• A transaction is put on the undo-list if the recovery system sees a log with <Tn, Start> but no commit or abort record. All the transactions on the undo-list are undone, and their logs are removed.
• For example: transaction T4 has only <Tn, Start>, so T4 is put on the undo-list, since this transaction is not yet complete and failed midway.
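A sketch of the same redo/undo classification, now honouring the checkpoint: only records after the last checkpoint are scanned (the toy log mirrors the T1-T4 example above; the record format is assumed):

```python
# Checkpoint-based recovery: scan only the log tail after the most recent
# checkpoint; commits there go on the redo-list, bare starts on the undo-list.

log = [("T1", "start"), ("checkpoint", None), ("T1", "commit"),
       ("T2", "start"), ("T2", "commit"),
       ("T3", "start"), ("T3", "commit"),
       ("T4", "start")]                       # crash happens here

# keep only the records after the most recent checkpoint
last_cp = max(i for i, (tag, _) in enumerate(log) if tag == "checkpoint")
tail = log[last_cp + 1:]

redo = sorted({t for t, rec in tail if rec == "commit"})
undo = sorted({t for t, rec in tail if rec == "start"} - set(redo))
print("redo:", redo)   # ['T1', 'T2', 'T3']
print("undo:", undo)   # ['T4']
```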
Advantages of Checkpoints

• Checkpoints help us recover the transactions of the database in case of a random shutdown of the database.
• They enhance the consistency of the database when multiple transactions are executing simultaneously.
• They speed up the data recovery process.
• Checkpoints work as a synchronization point between the database and the transaction log file.
• Checkpoint records in the log file are used to prevent unnecessary redo operations.
• Since dirty pages are flushed out continuously in the background, checkpointing has a very low overhead and can be done frequently.
• Checkpoints provide the baseline information needed for restoring the lost state in the event of a system failure.
• A database checkpoint keeps track of change information and enables incremental database backup.
• A database storage checkpoint can be mounted, allowing regular file system operations to be performed.
• Database checkpoints can be used for application solutions that include backup, recovery, or database modifications.
Disadvantages of Checkpoints
1. Database storage checkpoints can only be used to recover from logical errors (e.g., a human error).

2. Because all the data blocks are on the same physical device, database storage checkpoints cannot be used to restore files after a media failure.
