DBMS Unit-3
Need of Normalization in DBMS
In the Student table on the slide, we have data for four Computer Science students.
As we can see, the values of the fields branch, hod (Head of Department), and office_tel are repeated for the
students who are in the same branch of the college; this repetition is Data Redundancy.
1. Insertion Anomaly in DBMS
• Suppose for a new admission, until and unless a student opts for a branch, data of the student cannot be
inserted, or else we will have to set the branch information as NULL.
• Also, if we have to insert data for 100 students of the same branch, then the branch information will be
repeated for all those 100 students.
• These scenarios are nothing but Insertion anomalies.
2. Updation Anomaly in DBMS
• What if Mr. X leaves the college, or is no longer the HOD of the Computer Science department?
In that case, all the student records will have to be updated, and if by mistake we miss any record, it will
lead to data inconsistency.
• This is an Updation anomaly because you need to update all the records in your table just because one
piece of information got changed.
3. Deletion Anomaly in DBMS
• In our Student table, two different pieces of information are kept together, the Student information and
the Branch information.
• So if only a single student is enrolled in a branch, and that student leaves the college, or for some reason,
the entry for the student is deleted, we will lose the branch information too.
Normalization
• Normalization is the process of organizing the data in the database.
• Normalization is used to minimize the redundancy from a relation or set of relations. It
is also used to eliminate undesirable characteristics like Insertion, Update, and
Deletion Anomalies.
• Normalization divides the larger table into smaller and links them using relationships.
• The normal form is used to reduce redundancy from the database table.
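The split described above can be sketched in Python. This is a minimal illustration with hypothetical data (names, phone numbers, and dict layouts are invented for the example, not taken from the slide's table):

```python
# Redundant form: branch details are repeated for every student in the branch.
students = [
    {"roll_no": 1, "name": "Ravi",  "branch": "CSE", "hod": "Mr. X", "office_tel": "100"},
    {"roll_no": 2, "name": "Meena", "branch": "CSE", "hod": "Mr. X", "office_tel": "100"},
]

branch_table = {}
student_table = []
for row in students:
    # Each branch's details are stored exactly once, keyed by branch name.
    branch_table[row["branch"]] = {"hod": row["hod"], "office_tel": row["office_tel"]}
    # The student relation keeps only the branch name as a link.
    student_table.append({"roll_no": row["roll_no"], "name": row["name"],
                          "branch": row["branch"]})

# Updating the HOD now touches a single row instead of every student record.
branch_table["CSE"]["hod"] = "Ms. Y"
```

This is exactly the update-anomaly fix: one change in the Branch relation replaces a change in every matching student row.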
Functional Dependency
A functional dependency, written α → β, means that the value of α determines the value of β.
Mathematically, if α and β are two sets of attributes of a relation R, where
α ⊆ R and β ⊆ R,
then the functional dependency α → β holds if, for every pair of tuples t1 and t2 in R:
If t1[α] = t2[α], then t1[β] = t2[β]
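The definition translates directly into a brute-force check over all tuple pairs. A small Python sketch (the `rows` instance is hypothetical, since the slide's table is not reproduced here):

```python
from itertools import combinations

def fd_holds(rows, alpha, beta):
    """Return True if alpha -> beta holds in `rows` (a list of dicts):
    whenever two tuples agree on alpha, they also agree on beta."""
    for t1, t2 in combinations(rows, 2):
        if all(t1[a] == t2[a] for a in alpha) and any(t1[b] != t2[b] for b in beta):
            return False
    return True

# Hypothetical instance for illustration.
rows = [
    {"roll_no": 1, "name": "Ann", "dept": "CS"},
    {"roll_no": 2, "name": "Bob", "dept": "CS"},
]
```

Here `fd_holds(rows, ["roll_no"], ["name", "dept"])` is true (no two tuples share a roll_no), while `fd_holds(rows, ["dept"], ["name"])` is false, because both tuples agree on dept but differ on name.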
Example-
Consider a relation R(A, B, C, D, E) (the table itself is not reproduced here) in which the following dependencies hold:
A → B
AB → C
A → E
DE → C
AD → C
Example-
In a Student(roll_no, name, dept_name, dept_building) table, roll_no determines the remaining attributes:
roll_no → { name, dept_name, dept_building }
Types of Functional Dependencies in DBMS
1. Trivial functional dependency
2. Non-Trivial functional dependency
3. Multivalued functional dependency
4. Transitive functional dependency
1. Trivial Functional Dependency
In a trivial functional dependency, the dependent is always a subset of the determinant, i.e., if X → Y and Y is a
subset of X, then it is called a trivial functional dependency.
2. Non-Trivial Functional Dependency
In a non-trivial functional dependency, the dependent is not a subset of the determinant, i.e., if X → Y and Y
is not a subset of X, then it is called a non-trivial functional dependency.
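The trivial/non-trivial distinction is just a subset test, which a one-line sketch makes concrete (the attribute names are placeholders):

```python
def is_trivial(X, Y):
    # X -> Y is trivial exactly when Y is a subset of X.
    return set(Y) <= set(X)

# {A, B} -> {A} is trivial; {A} -> {B} is non-trivial.
```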
Normal forms: The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to
which it has been normalized. Normal forms are used to eliminate or reduce redundancy in database tables.
First Normal Form (1NF)
A relation R is said to be in the first normal form if and only if all the attributes of the relation R are atomic
in nature. For example, consider the Department table given in figure (b). It is not in 1NF because one of
its attributes, Dlocations, is non-atomic: it contains more than one value in row 1. To make it 1NF
compliant, create a separate row for each value of Dlocations of row 1, as shown in figure (c).
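The "one row per atomic value" repair can be sketched in Python. The sample values below are illustrative, since figures (b) and (c) are not reproduced here:

```python
# Non-1NF data: Dlocations is non-atomic (holds several values per row).
department = [
    {"dname": "Research", "dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"dname": "HQ",       "dlocations": ["Houston"]},
]

# 1NF repair: emit one row per atomic Dlocation value.
department_1nf = [
    {"dname": d["dname"], "dlocation": loc}
    for d in department
    for loc in d["dlocations"]
]
```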
regno  phoneno  qualification
1      P1       B.Tech
1      P1       M.Tech
1      P1       Diploma
1      P2       B.Tech
1      P2       M.Tech
1      P2       Diploma
Anomalies
The table also suffers from anomalies, which are as follows:
Insertion anomaly: If we want to insert a new phoneno for a regno, then we have to insert 3 rows, because for each phoneno we have stored all three combinations of qualification.
Deletion anomaly: If we want to delete the qualification Diploma, then we have to delete it in more than one place.
Updation anomaly: If we want to update the qualification Diploma to IT, then we have to update it in more than one place.
Fourth normal form (4NF)
A relation is in 4NF if it is in BCNF and has no non-trivial multivalued dependencies.
3.5.1 4NF decomposition
A relation holding independent multivalued facts about semesters, subjects, and students is decomposed into separate two-attribute relations:
(SEMESTER, SUBJECT): (Semester1, Computer), (Semester1, Chemistry), (Semester2, Math)
(SEMESTER, STUDENT): (Semester1, Anshika), (Semester1, John), (Semester2, Akash)
(SUBJECT, STUDENT): (Computer, Anshika), (Computer, John), (Math, Akash), (Chemistry, Praveen)
Decomposition
• A functional decomposition is the process of breaking down the functions of an organization into progressively greater (finer and finer) levels of detail.
• In decomposition, one function is described in greater detail by a set of other supporting functions.
• The decomposition of a relation schema R consists of replacing the relation schema by two or more relation schemas that each contain a subset of the attributes of R and together include all attributes in R.
• Decomposition helps in eliminating some of the problems of bad design, such as redundancy, inconsistencies, and anomalies.
Method-1:
Consider R(A, B, C) decomposed into R1(A, B) and R2(B, C). Now, let us check whether this decomposition is lossless or not.
For a lossless decomposition, we must have R1 ⋈ R2 = R.
Now, if we perform the natural join (⋈) of the sub-relations R1 and R2, we get back exactly the original relation:
A  B  C
1  2  1
2  5  3
3  3  3
Hence, it is a lossless decomposition.
Method-2:
2.1 Union: Attributes(R1) ∪ Attributes(R2) = {A, B, C}, i.e., together the sub-relations cover all attributes of R.
2.2 Intersection: Attributes(R1) ∩ Attributes(R2) = {B}, i.e., the sub-relations share a common attribute.
2.3 The common attribute B is a candidate key of at least one of the sub-relations.
Since all three conditions hold, it is a lossless decomposition.
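The lossless check of Method-1 can be verified mechanically by computing the natural join and comparing it with the original relation. A small sketch using the table above (relations are lists of dicts; this is a toy representation, not a DBMS API):

```python
def natural_join(r1, r2, common):
    # Join tuples of r1 and r2 that agree on the common attributes.
    return [
        {**t1, **t2}
        for t1 in r1
        for t2 in r2
        if all(t1[c] == t2[c] for c in common)
    ]

R  = [{"A": 1, "B": 2, "C": 1},
      {"A": 2, "B": 5, "C": 3},
      {"A": 3, "B": 3, "C": 3}]
R1 = [{"A": t["A"], "B": t["B"]} for t in R]   # projection of R on (A, B)
R2 = [{"B": t["B"], "C": t["C"]} for t in R]   # projection of R on (B, C)

joined = natural_join(R1, R2, ["B"])
# Lossless iff R1 ⋈ R2 produces exactly the original tuples of R.
lossless = sorted(joined, key=lambda t: t["A"]) == sorted(R, key=lambda t: t["A"])
```

Because every B value is distinct here, the join introduces no spurious tuples, so `lossless` is true.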
Dependency Preservation Decomposition:
A decomposition of a relation R with dependency set F is dependency preserving if the union of the dependencies that hold on the individual decomposed relations is equivalent to F, so that every dependency can be checked without joining the relations back together.
Transaction
• A transaction is a unit of program execution that accesses and possibly updates various data items. For example, transferring $50 from account A to account B consists of 6 actions that need to take place.
• Comparing Transaction and Program: A transaction results from the execution of a user program written in a high-level data manipulation language (DML) or programming language, and it is started and ended between the statements begin transaction and end transaction.
Properties of the Transaction
• To ensure integrity of the data, we require that the database system maintain the following properties
of the transactions: ;
1.Atomicity. Either all operations of the transaction are reflected properly in the database, or none are.
2.Consistency. Execution of a transaction in isolation (that is, with no other transaction executing
concurrently) preserves the consistency of the database.
3.Isolation. Although multiple transactions may execute concurrently, each transaction must be unaware
of other concurrently executing transactions. Intermediate transaction results must be hidden from other
concurrently executed transactions.
• That is, for every pair of transactions Ti and Tj, it appears to Ti that either Tj finished execution before Ti
started, or Tj started execution after Ti finished. Thus, each transaction is unaware of other transactions
executing concurrently in the system.
4.Durability. After a transaction completes successfully, the changes it has made to the database persist,
even if there are system failures.
These properties are often called the ACID properties.
Properties of the Transaction
To gain a better understanding of ACID properties and the need for them, consider a simplified banking system consisting of
several accounts and a set of transactions that access and update those accounts.
Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined as:
Ti: read(A);
    A := A − 50;
    write(A);
    read(B);
    B := B + 50;
    write(B).
Let us now consider each of the ACID requirements.
Consistency: The consistency requirement here is that the sum of A and
B be unchanged by the execution of the transaction. It can be verified easily that, if the database is consistent before an
execution of the transaction, the database remains consistent after the execution of the transaction.
Properties of the Transaction
Atomicity: Suppose that, just before the execution of transaction Ti, the values of accounts A and B are $1000 and $2000,
respectively.
Now suppose that, during the execution of transaction Ti, a failure occurs (a power failure, hardware failure, or software
error) that prevents Ti from completing its execution successfully.
Further, suppose that the failure happened after the write(A) operation but before the write(B) operation. In this case, the
values of accounts A and B reflected in the database are $950 and $2000. The system destroyed $50 as a result of this
failure. In particular, we note that the sum A + B is no longer preserved.
Such a state is called an inconsistent state. We must ensure that such inconsistencies are not visible in a database system.
During a successful execution, however, this state is only temporary: it is eventually replaced by the consistent state where
the value of account A is $950 and the value of account B is $2050.
This is the reason for the atomicity requirement: if the atomicity property is present, all actions of the transaction are
reflected in the database, or none are.
The basic idea behind ensuring atomicity is this: the database system keeps track (on disk) of the old values of any data on
which a transaction performs a write, and, if the transaction does not complete its execution, the database system restores
the old values to make it appear as though the transaction never executed.
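The restore-old-values idea can be sketched in Python with an in-memory "database". The dict and function names are illustrative, not a real DBMS API, and the simulated crash stands in for a power or hardware failure:

```python
accounts = {"A": 1000, "B": 2000}

def transfer(db, frm, to, amount, fail_midway=False):
    old = dict(db)                 # keep the old values, like the log's before-images
    try:
        db[frm] -= amount          # write(A)
        if fail_midway:
            raise RuntimeError("crash between write(A) and write(B)")
        db[to] += amount           # write(B)
    except Exception:
        db.clear()
        db.update(old)             # undo: restore old values, as if Ti never ran
        raise
```

A failed transfer leaves A + B unchanged; a successful one moves the $50.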
Properties of the Transaction
Isolation: If several transactions are executed concurrently, their operations may interleave in some undesirable way,
resulting in an inconsistent state. For example, if between steps 3 and 6 another transaction T2 is
allowed to access the partially updated database, it will see an inconsistent database (the sum A + B will be less than it
should be).
One way to avoid the problem of concurrently executing transactions is to execute transactions serially, that is, one after the
other.
However, executing multiple transactions concurrently has significant benefits, as we will see later.
Durability: The durability property guarantees that, once a transaction completes successfully (i.e., the transfer of the $50
has taken place), all the updates that it carried out on the database persist, even if there is a system failure after the
transaction completes execution.
Transaction State
1. Active State – When the instructions of the transaction are running, the transaction is in the active state. If all the
read and write operations are performed without any error, it goes to the partially committed state; if any
instruction fails, it goes to the failed state.
2. Partially Committed – After completion of all the read and write operations, the changes are made in main memory or
a local buffer. If the changes are made permanent on the database, the state changes to the committed state;
in case of failure, it goes to the failed state.
3. Failed State – A transaction goes to the failed state when any instruction of the transaction fails, or when a failure
occurs while making a permanent change of data on the database.
4. Aborted State – After any type of failure, the transaction goes from the failed state to the aborted state. Since in the
previous states the changes were made only to the local buffer or main memory, these changes are deleted or
rolled back.
5. Committed State – This is the state when the changes are made permanent on the database; the transaction is
complete and therefore moves on to the terminated state.
6. Terminated State – If there isn't any rollback, or the transaction comes from the committed state, then the system
is consistent and ready for a new transaction, and the old transaction is terminated.
Schedules
• A schedule is a sequence of instructions that
specify the chronological order in which
instructions of transactions are executed.
Serial Schedule:
If transactions are executed from start to finish, one after another, then the schedule is called a
serial schedule.
Characteristics-
Consistent
Recoverable
Cascadeless
Strict
Concurrent schedule/ Non-Serial Schedule
• If the instructions of different transactions are interleaved then the schedule is called as a
concurrent schedule.
• Thus, for a set of n transactions, there exist n! different valid serial schedules.
• If two transactions are running concurrently, the operating system may execute one transaction for
a little while, then perform a context switch, execute the second transaction for some time, and
then switch back to the first transaction for some time, and so on. With multiple transactions, the
CPU time is shared among all the transactions.
• Concurrent execution of transactions improves the performance of the system. Following figure
shows a concurrent schedule.
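The n! count of serial schedules is just the number of orderings of the transactions, which a two-line sketch makes concrete (transaction names are placeholders):

```python
from itertools import permutations

transactions = ["T1", "T2", "T3"]

# Every ordering of the transactions, executed one after another,
# is a valid serial schedule: n! of them for n transactions.
serial_schedules = list(permutations(transactions))
```

For n = 3 this yields 3! = 6 serial schedules.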
Cascading Schedule
• If, in a schedule, the failure of one transaction causes several other dependent transactions to roll back or
abort, then such a schedule is called a Cascading Schedule (also Cascading Rollback or Cascading Abort).
• It simply leads to the wastage of CPU time. Here,
Transaction T2 depends on transaction T1.
Transaction T3 depends on transaction T2.
Transaction T4 depends on transaction T3.
In this schedule,
The failure of transaction T1 causes the transaction
T2 to rollback.
The rollback of transaction T2 causes the
transaction T3 to rollback.
The rollback of transaction T3 causes the
transaction T4 to rollback.
Such a rollback is called as a Cascading Rollback.
Cascadeless Schedule
If, in a schedule, a transaction is not allowed to read a data item until the last transaction that has
written it has committed or aborted, then such a schedule is called a Cascadeless Schedule.
In other words,
• Cascadeless schedule allows only committed read operations.
• Therefore, it avoids cascading roll back and thus saves CPU time.
Strict Schedule
If, in a schedule, a transaction is neither allowed to read nor to write a data item until the last transaction
that has written it has committed or aborted, then such a schedule is called a Strict Schedule.
In other words,
• Strict schedule allows only committed read and write operations.
• Clearly, strict schedule implements more restrictions than cascadeless schedule.
• All strict schedules are cascadeless schedules.
• Not all cascadeless schedules are strict schedules.
Concurrency Problems
When multiple transactions execute concurrently in an uncontrolled or unrestricted manner, then it might lead
to several problems. Such problems are called concurrency problems.
1. Dirty Read Problem
Reading the data written by an uncommitted transaction is called a dirty read.
This read is called a dirty read because-
• There is always a chance that the uncommitted transaction might roll back later.
• Thus, an uncommitted transaction might make other transactions read a value that does not even exist.
• This leads to inconsistency of the database.
In this example,
• T2 reads the dirty value of A written by the uncommitted transaction T1.
• T1 fails in later stages and rolls back.
• Thus, the value that T2 read now stands to be incorrect.
• Therefore, the database becomes inconsistent.
NOTE-
• Dirty read does not always lead to inconsistency.
• It becomes problematic only when the uncommitted transaction fails and rolls back later due to some reason.
2. Unrepeatable Read Problem
This problem occurs when a transaction reads the same data item twice and gets a different value each time,
because another transaction has updated the item between the two reads.
3. Lost Update Problem
In this example,
the value of X written by T1 is overwritten in the database.
Thus, the update from T1 gets lost.
4. Phantom Read Problem
This problem occurs when a transaction reads
some variable from the buffer and when it reads
the same variable later, it finds that the variable
does not exist.
In this example,
T2 finds that there does not exist any variable X when it tries
reading X again.
T2 wonders who deleted the variable X because according to it,
it is running in isolation.
Avoiding Concurrency Problems
• To ensure consistency of the database, it is very important to prevent the occurrence of above
problems.
• Concurrency Control Protocols help to prevent the occurrence of above problems and maintain the
consistency of the database.
Two Phase Locking Protocol
Locking in a database management system is used for handling transactions in databases. The two-phase locking
protocol ensures conflict-serializable schedules. A schedule is called conflict serializable if it can be transformed
into a serial schedule by swapping non-conflicting operations.
Before understanding the two phases of Locking, let's understand the types of locks used in transaction control.
• Shared Lock: Data can only be read when a shared lock is applied; it cannot be written. It is denoted as
lock-S.
• Exclusive Lock: Data can be read as well as written when an exclusive lock is applied. It is denoted as lock-X.
• Growing Phase: In the growing phase, the transaction only obtains locks; it cannot release any lock. The
growing phase ends at the lock point, after which the shrinking phase begins.
• Shrinking Phase: In the shrinking phase, the transaction only releases locks; it cannot obtain any new lock.
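The two-phase rule — no lock may be acquired once any lock has been released — can be sketched as a small Python class. This is a toy model of the rule itself (no real lock manager, item names are placeholders):

```python
class TwoPhaseTransaction:
    """Sketch: locks may only be acquired before the first release (2PL rule)."""

    def __init__(self):
        self.held = set()
        self.shrinking = False   # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violated: cannot lock in the shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True    # the lock point has passed
        self.held.discard(item)
```

Acquiring R1 and R2, releasing R1, and then trying to lock R3 raises an error, since the transaction is already in its shrinking phase.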
Transaction T1
The growing Phase is from steps 1-3
The shrinking Phase is from steps 5-7
Lock Point at 3
Transaction T2
The growing Phase is from steps 2-6
The shrinking Phase is from steps 8-9
Lock Point at 6
Lock Point
The Point at which the growing phase ends, i.e., when
a transaction takes the final lock it needs to carry on its
work.
Drawbacks of 2-PL
• Cascading Rollback is possible under 2-PL.
• Deadlocks and Starvation are possible.
Deadlock in 2PL
• In the above transactions, in step 3, T1 cannot obtain the exclusive lock on R2, as the exclusive lock has
already been obtained by transaction T2 in step 1.
• Likewise, in step 3, the exclusive lock on R1 cannot be obtained by T2, as T1 has already applied the
exclusive lock on R1. Each transaction waits for the other: a deadlock.
Timestamp Ordering Protocols
• The Timestamp Ordering Protocol is used to order the transactions based on their timestamps. The order
of the transactions is simply the ascending order of their creation times.
• The older transaction has the higher priority, which is why it executes first. To determine the timestamp of a
transaction, this protocol uses the system time or a logical counter.
• The lock-based protocol manages the order between conflicting pairs among transactions at
execution time, whereas timestamp-based protocols start working as soon as a transaction is created.
• Let's assume there are two transactions T1 and T2. Suppose transaction T1 entered the system
at time 007 and transaction T2 entered the system at time 009. T1 has the higher priority, so it
executes first, as it entered the system first.
• The timestamp ordering protocol also maintains the timestamps of the last 'read' and 'write' operations on
each data item.
Timestamp Ordering Protocols
Basic timestamp ordering protocol works as follows:
1. Check the following condition whenever a transaction Ti issues a Read(X) operation:
If W_TS(X) > TS(Ti), then the operation is rejected and Ti is rolled back.
If W_TS(X) <= TS(Ti), then the operation is executed and R_TS(X) is set to max(R_TS(X), TS(Ti)).
2. Check the following condition whenever a transaction Ti issues a Write(X) operation:
If TS(Ti) < R_TS(X), then the operation is rejected and Ti is rolled back.
If TS(Ti) < W_TS(X), then the operation is rejected and Ti is rolled back; otherwise, the operation is
executed and W_TS(X) is set to TS(Ti).
Where,
TS(Ti) denotes the timestamp of the transaction Ti.
R_TS(X) denotes the read timestamp of data item X.
W_TS(X) denotes the write timestamp of data item X.
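The two checks can be sketched in Python. The dictionaries `R_TS` and `W_TS` stand in for the per-item timestamps, and unset timestamps default to 0 — an assumption made for the sketch, not part of the protocol's specification:

```python
R_TS, W_TS = {}, {}

def to_read(ts, x):
    # Read(X) by Ti: reject if a younger transaction has already written X.
    if W_TS.get(x, 0) > ts:
        return False                       # Ti would be rolled back
    R_TS[x] = max(R_TS.get(x, 0), ts)
    return True

def to_write(ts, x):
    # Write(X) by Ti: reject if a younger transaction has already read or written X.
    if R_TS.get(x, 0) > ts or W_TS.get(x, 0) > ts:
        return False                       # Ti would be rolled back
    W_TS[x] = ts
    return True
```

For example, after a transaction with timestamp 7 writes X, a read by an older transaction (timestamp 5) is rejected, while a read by a newer one (timestamp 9) succeeds and raises R_TS(X) to 9.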
Timestamp Ordering Protocols
Advantages
• Timestamp-based protocols ensure serializability, since transactions are ordered by their creation timestamps; in the
precedence graph of a timestamp-ordered schedule, every edge goes from an older transaction to a newer one.
Disadvantages
• Starvation is possible: a long transaction may be repeatedly rolled back if a stream of conflicting shorter
transactions keeps restarting it.
• The schedules produced are not guaranteed to be recoverable or cascadeless.
Log-based recovery in DBMS provides the ability to maintain or recover data in case of a system failure.
The DBMS keeps a record of every transaction on some stable storage device to provide easy access to data
when the system fails. A log record is written for every operation performed on the database.
The log record describing a modification must be written to stable storage before the modification itself is applied to the database.
Log Based Recovery
An update log record describes a single database write and has the following fields:
Transaction identifier: unique identifier of the transaction that performed the write operation.
Data item: unique identifier of the data item written.
Old value: value of the data item prior to the write.
New value: value of the data item after the write operation.
Log Based Recovery
Other types of log records are:
<Ti start>: It contains information about when a transaction Ti
starts.
<Ti commit>: It contains information about when a transaction Ti
commits.
<Ti abort>: It contains information about when a transaction Ti
aborts.
Undo and Redo Operations
Because all database modifications must be preceded by the creation of
a log record, the system has available both the old value prior to the
modification of the data item and new value that is to be written for
data item.
Undo: using a log record, sets the data item specified in the log record to the
old value.
Redo: using a log record, sets the data item specified in the log record to the
new value.
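Both operations follow directly from the update log record's fields. A minimal sketch (the tuple layout for a log record and the in-memory `db` dict are illustrative assumptions):

```python
db = {"A": 1000}

# An update log record: (transaction id, data item, old value, new value).
record = ("T1", "A", 1000, 950)

def undo(db, rec):
    _txn, item, old, _new = rec
    db[item] = old     # restore the before-image

def redo(db, rec):
    _txn, item, _old, new = rec
    db[item] = new     # reapply the after-image
```

Redoing the record sets A to 950; undoing it restores the old value 1000.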
Checkpoint
• A checkpoint is a mechanism by which all the previous logs are removed from the
system and stored permanently on the storage disk.
• The checkpoint is like a bookmark. During the execution of transactions, such
checkpoints are marked, and as the transactions execute, log files are created from
their steps.
• When a checkpoint is reached, the transactions recorded so far are applied to the
database, and the log file up to that point is removed. The log file is then updated with
the steps of subsequent transactions until the next checkpoint, and so on.
• The checkpoint is used to declare a point before which the DBMS was in a
consistent state and all transactions were committed.
Recovery using Checkpoint
In the following manner, a recovery system recovers the database from
this failure:
• The recovery system reads the log files from the end to the start, i.e., from T4 back to T1.
• The recovery system maintains two lists: a redo-list and an undo-list.
• A transaction is put into the redo-list if the recovery system sees a log with both <Tn,
Start> and <Tn, Commit>, or just <Tn, Commit>. All the transactions in the redo-list are
then redone, in order, using their log records.
• For example: in the log file, transactions T2 and T3 have both <Tn, Start> and <Tn,
Commit>. Transaction T1 has only <Tn, Commit> in the log file, because it started before
the checkpoint and committed after the checkpoint was crossed. Hence, T1,
T2, and T3 are put into the redo-list.
• A transaction is put into the undo-list if the recovery system sees a log with <Tn,
Start> but no commit or abort record. All the transactions in the undo-list are
undone, and their logs are removed.
• For example: transaction T4 has only <Tn, Start>, so T4 is put into the undo-list,
since this transaction is not yet complete and failed midway.
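The classification step can be sketched as a single scan over the post-checkpoint log. The `(transaction, action)` tuple format is an illustrative assumption, not a real log layout:

```python
def classify(log):
    """Split transactions found in the log into a redo-list (committed)
    and an undo-list (started but neither committed nor aborted)."""
    started, committed = set(), set()
    for txn, action in log:
        if action == "start":
            started.add(txn)
        elif action == "commit":
            committed.add(txn)
        elif action == "abort":
            started.discard(txn)   # aborted work needs neither redo nor undo here
    redo = sorted(committed)
    undo = sorted(started - committed)
    return redo, undo

# Log after the checkpoint: T1 committed (it started before the checkpoint),
# T2 and T3 ran to commit, T4 started but never finished.
log = [("T1", "commit"), ("T2", "start"), ("T2", "commit"),
       ("T3", "start"), ("T3", "commit"), ("T4", "start")]
```

On this log, T1, T2, and T3 land in the redo-list and T4 in the undo-list, matching the example above.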
Advantages of Checkpoints
• Checkpoints help us in recovering the transactions of the database in case of a random shutdown of the
database.
• They enhance the consistency of the database when multiple transactions are executing in the database
simultaneously.
• They speed up the data recovery process.
• Checkpoints work as a synchronization point between the database and the transaction log file in the database.
• Checkpoint records in the log file are used to prevent unnecessary redo operations.
• Since dirty pages are flushed out continuously in the background, it has a very low overhead and can be done
frequently.
• Checkpoints provide the baseline information needed for the restoration of the lost state in the event of a
system failure.
• A database checkpoint keeps track of change information and enables incremental database backup.
• A database storage checkpoint can be mounted, allowing regular file system operations to be performed.
• Database checkpoints can be used for application solutions which include backup, recovery or database
modifications.
Disadvantages of Checkpoints
1. Database storage checkpoints can only be used to restore from
logical errors (e.g., a human error).
2. Because all the data blocks are on the same physical device,
database storage checkpoints cannot be used to restore files due to a
media failure.