DBMS MOD 5 NOTES
Module – 5
Chapter 1 – Transaction Processing
5.1 Introduction to Transaction Processing
• The concept of transaction provides a mechanism for describing logical units of
database processing.
• Transaction processing systems are systems with large databases and hundreds of
concurrent users executing database transactions.
• Examples of such systems include airline reservations, banking, credit card
processing, online retail purchasing, stock markets, supermarket checkouts, and
many other applications.
• Single-user DBMSs are mostly restricted to personal computer systems; most other
DBMSs are multiuser. For example, an airline reservations system is used by
hundreds of users and travel agents concurrently.
• A DBMS is multiuser if many users can use the system and access the database concurrently.
• A data item can be a database record, but it can also be a larger unit such as a whole
disk block, or even a smaller unit such as an individual field (attribute) value of some
record in the database.
• The transaction processing concepts are independent of the data item granularity
(size) and apply to data items in general.
• Each data item has a unique name that is used to identify it.
• The basic database access operations that a transaction can include are as follows:
■ read_item(X). Reads a database item named X into a program variable.
■ write_item(X). Writes the value of program variable X into the database item named X.
DBMS Buffers:
• The basic unit of data transfer from disk to main memory is one disk page (disk
block).
• Executing a read_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in
some main memory buffer). The size of the buffer is the same as the disk block size.
3. Copy item X from the buffer to the program variable named X.
• Executing a write_item(X) command includes the following steps:
1. Find the address of the disk block that contains item X.
2. Copy that disk block into a buffer in main memory (if that disk block is not already in
some main memory buffer).
3. Copy item X from the program variable named X into its correct location in the buffer.
4. Store the updated disk block from the buffer back to disk (either immediately or at some
later point in time).
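The read_item/write_item steps above can be sketched in Python. DISK, BUFFERS, and block_of() are illustrative stand-ins for the real storage structures, not a DBMS API:

```python
# Hypothetical model: blocks on disk, and a main-memory buffer pool.
DISK = {"B1": {"X": 100, "Y": 50}}   # block name -> items stored in that block
BUFFERS = {}                          # buffer pool: block name -> block contents

def block_of(item):
    """Step 1: find the address of the disk block that contains the item."""
    return next(b for b, items in DISK.items() if item in items)

def read_item(item):
    b = block_of(item)
    if b not in BUFFERS:              # Step 2: copy the block into a buffer if absent
        BUFFERS[b] = dict(DISK[b])
    return BUFFERS[b][item]           # Step 3: copy the item to the program variable

def write_item(item, value, flush=False):
    b = block_of(item)
    if b not in BUFFERS:              # Steps 1-2: locate and buffer the block
        BUFFERS[b] = dict(DISK[b])
    BUFFERS[b][item] = value          # Step 3: update the item in the buffer
    if flush:                         # Step 4: write the block back now, or later
        DISK[b] = dict(BUFFERS[b])

x = read_item("X")                    # x holds 100
write_item("X", x - 10)               # updated only in the buffer so far
write_item("X", read_item("X"), flush=True)   # now persisted to disk
```

Note that until the flush in step 4, the buffer and the disk copy of a block can differ, which is exactly why the recovery manager must track buffered updates.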
• A main memory buffer is handled by the recovery manager of the DBMS in
cooperation with the underlying operating system.
• The DBMS will maintain in the database cache a number of data buffers in main
memory. Each buffer typically holds the contents of one database disk block, which
contains some of the database items being processed.
• When these buffers are all occupied, and additional database disk blocks must be
copied into memory, some buffer replacement policy is used to choose which of the
currently occupied buffers is to be replaced. A commonly used buffer replacement
policy is LRU (least recently used).
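A minimal sketch of the LRU policy over a fixed-size buffer pool; the LRUBufferPool class and the load_block callback (which stands in for reading a block from disk) are illustrative assumptions:

```python
from collections import OrderedDict

class LRUBufferPool:
    def __init__(self, capacity):
        self.capacity = capacity
        self.buffers = OrderedDict()            # block id -> block contents

    def get(self, block_id, load_block):
        if block_id in self.buffers:
            self.buffers.move_to_end(block_id)  # mark as most recently used
            return self.buffers[block_id]
        if len(self.buffers) >= self.capacity:
            self.buffers.popitem(last=False)    # evict the least recently used block
        self.buffers[block_id] = load_block(block_id)
        return self.buffers[block_id]

pool = LRUBufferPool(capacity=2)
pool.get("B1", lambda b: f"contents of {b}")
pool.get("B2", lambda b: f"contents of {b}")
pool.get("B1", lambda b: f"contents of {b}")    # B1 becomes most recently used
pool.get("B3", lambda b: f"contents of {b}")    # pool full: evicts B2, the LRU block
```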
• The read-set of a transaction is the set of all items that the transaction reads, and
the write-set is the set of all items that the transaction writes.
• For example, the read-set of T1 in Figure 20.2 is {X, Y} and its write-set is also {X, Y}.
• Concurrency control and recovery mechanisms are mainly concerned with the
database commands in a transaction.
• Transactions submitted by the various users may execute concurrently and may
access and update the same database items.
• The Temporary Update (or Dirty Read) Problem. This problem occurs when one
transaction updates a database item and then the transaction fails for some reason.
Meanwhile, the updated item is accessed (read) by another transaction before it is
changed back (or rolled back) to its original value.
• The Incorrect Summary Problem. If one transaction is calculating an aggregate
summary function on a number of database items while other transactions are
updating some of these items, the aggregate function may calculate some values
before they are updated and others after they are updated.
• The Unrepeatable Read Problem. Another problem that may occur is called
unrepeatable read, where a transaction T reads the same item twice and the item is
changed by another transaction between the two reads. Hence, T receives different
values for its two reads of the same item.
• Types of failures. Transactions can fail for reasons such as (1) a computer failure
(system crash), (2) a transaction or system error, (3) local errors or exception
conditions detected by the transaction, (4) concurrency control enforcement, (5) disk
failure, and (6) physical problems and catastrophes. The last category refers to an
endless list of problems that includes power or air-conditioning failure, fire, theft,
overwriting disks or tapes by mistake, and mounting of a wrong tape by the operator.
• For recovery purposes, the system needs to keep track of when each transaction
starts, terminates, and commits or aborts. Therefore, the recovery manager of the
DBMS needs to keep track of the following operations: BEGIN_TRANSACTION, READ
or WRITE, END_TRANSACTION, COMMIT_TRANSACTION, and ROLLBACK (or ABORT).
5.2.1 Transaction States and Additional Operations:
• The DBMS cache will hold the disk pages that contain information currently being
processed in main memory buffers. If all the buffers in the DBMS cache are occupied
and new disk pages are required to be loaded into main memory from disk, a page
replacement policy is needed to select the particular buffers to be replaced. Some
page replacement policies that have been developed specifically for database
systems are:
• Domain Separation (DS) Method. In a DBMS, various types of disk pages exist: index
pages, data file pages, log file pages, and so on. In this method, the DBMS cache is
divided into separate domains (sets of buffers). Each domain handles one type of
disk page via the basic LRU (least recently used) page replacement policy.
• Hot Set Method. This page replacement algorithm is useful in queries that have to
scan a set of pages repeatedly, such as when a join operation is performed using the
nested-loop method. The hot set method determines for each database processing
algorithm the set of disk pages that will be accessed repeatedly, and it does not
replace them until their processing is completed.
• The DBMIN Method. This page replacement policy uses a model known as QLSM
(query locality set model), which predetermines the pattern of page references for
each algorithm for a particular type of database operation.
• Every transaction has certain characteristics attributed to it. These characteristics are
specified by a SET TRANSACTION statement in SQL. The characteristics are the access
mode, the diagnostic area size, and the isolation level.
• The access mode can be specified as READ ONLY or READ WRITE. The default is READ
WRITE, which allows SELECT, UPDATE, INSERT, DELETE, and CREATE commands to be
executed. A mode of READ ONLY, as the name implies, is simply for data retrieval.
• The diagnostic area size option, DIAGNOSTIC SIZE n, specifies an integer value n,
which indicates the number of conditions that can be held simultaneously in the
diagnostic area. These conditions supply feedback information (errors or exceptions)
to the user or program on the n most recently executed SQL statements.
• The isolation level option is specified using the statement ISOLATION LEVEL
&lt;isolation&gt;, where the value for &lt;isolation&gt; can be READ UNCOMMITTED, READ
COMMITTED, REPEATABLE READ, or SERIALIZABLE.
• The default isolation level is SERIALIZABLE, although some systems use READ
COMMITTED as their default. The use of the term SERIALIZABLE here is based on not
allowing the following violations:
1. dirty read: A transaction T1 may read the update of a transaction T2, which has not yet
committed. If T2 fails and is aborted, then T1 would have read a value that does not exist
and is incorrect.
2. unrepeatable read: A transaction T1 may read a given value from a table. If another
transaction T2 later updates that value and T1 reads that value again, T1 will see a different
value.
3. phantoms: A transaction T1 may read a set of rows from a table, perhaps based on some
condition specified in the SQL WHERE-clause. Now suppose that a transaction T2 inserts a
new row r that also satisfies the WHERE-clause condition used in T1, into the table used by
T1. The record r is called a phantom record because it was not there when T1 starts but is
there when T1 ends.
V. SHARVANI, ASST. PROF.,MCA,BITM
A Lock is a variable assigned to any data item in order to keep track of the
status of that data item so that isolation and non-interference is ensured
during concurrent transactions.
At its most basic, a database lock exists to prevent two or more database users from
changing the same data item at the very same time.
This lock is usually associated with every data item in the database (at the table
level, the row level, or even the level of the entire database).
If item X is unlocked, the corresponding lock variable LOCK(X) has the value 0. The
instant a transaction begins updating the contents of item X, LOCK(X) is set to 1.
There are two operations used to implement binary locks: lock_item(X) and
unlock_item(X). The algorithms are given below:
lock_item(X):
B: if LOCK(X) = 0 (* item is unlocked *)
then LOCK(X) ← 1 (* lock the item *)
else begin
wait (until LOCK(X) = 0 and the lock manager wakes up the transaction);
go to B
end;
unlock_item(X):
LOCK(X) ← 0; (* unlock the item *)
if any transactions are waiting
then wakeup one of the waiting transactions;
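The binary lock algorithm can be sketched in Python with a condition variable standing in for the lock manager's wait/wakeup mechanism; the BinaryLockTable class is an illustrative assumption:

```python
import threading

class BinaryLockTable:
    def __init__(self):
        self.lock_bit = {}                       # item -> 0 (unlocked) or 1 (locked)
        self.cond = threading.Condition()

    def lock_item(self, x):
        with self.cond:
            while self.lock_bit.get(x, 0) == 1:  # B: item locked -> wait, then go to B
                self.cond.wait()
            self.lock_bit[x] = 1                 # LOCK(X) <- 1

    def unlock_item(self, x):
        with self.cond:
            self.lock_bit[x] = 0                 # LOCK(X) <- 0
            self.cond.notify()                   # wake up one waiting transaction

table = BinaryLockTable()
table.lock_item("X")
# ... read or write X in the critical section ...
table.unlock_item("X")
```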
The motivation for these locks is the restrictive nature of binary locks. Shared/exclusive
locks permit other transactions to issue read operations, since a READ is non-conflicting.
However, if a transaction demands a write operation on item X, then that transaction
must be given exclusive access to item X.
We require a kind of multi-mode lock which is what shared/exclusive locks are. They
are also known as Read/Write locks.
READ-LOCKED – a transaction only needs to read the contents of item X, and the
lock permits only reading. This is also known as a shared lock.
WRITE-LOCKED –If a transaction needs to update or write to item X, the lock must
restrict all other transactions and provide exclusive access to the current transaction.
Thus, these locks are also known as exclusive locks.
read_lock(X):
B: if LOCK(X) = “unlocked”
then begin LOCK(X) ← “read-locked”;
no_of_reads(X) ← 1
end
else if LOCK(X) = “read-locked”
then no_of_reads(X) ← no_of_reads(X) + 1
else begin (* LOCK(X) = “write-locked” *)
wait (until LOCK(X) = “unlocked” and the lock manager wakes up the transaction);
go to B
end;
write_lock(X):
B: if LOCK(X) = “unlocked”
then LOCK(X) ← “write-locked”
else begin
wait (until LOCK(X) = “unlocked” and the lock manager wakes up the transaction);
go to B
end;
unlock(X):
if LOCK(X) = “write-locked”
then begin LOCK(X) ← “unlocked”;
wakeup one of the waiting transactions, if any
end
else if LOCK(X) = “read-locked”
then begin no_of_reads(X) ← no_of_reads(X) − 1;
if no_of_reads(X) = 0
then begin LOCK(X) ← “unlocked”;
wakeup one of the waiting transactions, if any
end
end;
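The shared/exclusive lock algorithms can be sketched in Python, again with a condition variable in place of the lock manager; the RWLockTable class and its field names are illustrative assumptions:

```python
import threading

class RWLockTable:
    def __init__(self):
        self.state = {}       # item -> "unlocked" | "read-locked" | "write-locked"
        self.readers = {}     # item -> no_of_reads(X)
        self.cond = threading.Condition()

    def read_lock(self, x):
        with self.cond:
            while self.state.get(x, "unlocked") == "write-locked":
                self.cond.wait()                       # writer holds X: wait
            self.state[x] = "read-locked"
            self.readers[x] = self.readers.get(x, 0) + 1   # no_of_reads(X) += 1

    def write_lock(self, x):
        with self.cond:
            while self.state.get(x, "unlocked") != "unlocked":
                self.cond.wait()                       # any lock on X blocks a writer
            self.state[x] = "write-locked"

    def unlock(self, x):
        with self.cond:
            if self.state.get(x) == "write-locked":
                self.state[x] = "unlocked"
            else:                                      # releasing a shared lock
                self.readers[x] -= 1
                if self.readers[x] == 0:
                    self.state[x] = "unlocked"
            self.cond.notify_all()                     # wake up waiting transactions

table = RWLockTable()
table.read_lock("X")
table.read_lock("X")      # a second reader may share the lock
table.unlock("X")
table.unlock("X")
```

Note how multiple readers share the lock via the reader count, while a writer must find the item fully unlocked.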
An algorithm must ensure that, for each item accessed by conflicting operations
in the schedule, the order in which the item is accessed does not violate the
timestamp ordering. To ensure this, two timestamp values are associated with each
database item X: read_TS(X) and write_TS(X).
The protocol manages concurrent execution such that the timestamps determine
the serializability order. Whenever some Transaction T tries to issue a R_item(X) or a
W_item(X), the Basic TO algorithm compares the timestamp of T with R_TS(X) &
W_TS(X) to ensure that the Timestamp order is not violated. This describes the Basic
TO protocol in the following two cases.
The Timestamp Ordering protocol ensures conflict serializability, since every edge in
the precedence graph points from an older transaction to a younger one, so the graph
can contain no cycles.
A variation of Basic TO, called Strict TO, ensures that schedules are both strict
and conflict serializable. In this variation, a transaction T that issues a read_item(X) or
write_item(X) such that TS(T) > write_TS(X) has its read or write operation delayed until
the transaction T′ that wrote the value of X has committed or aborted.
• Efficiency: the technique is efficient and scalable, since it does not require locking and
can handle a large number of transactions.
A modification of the basic TO algorithm, known as Thomas’s write rule, does not
enforce conflict serializability, but it rejects fewer write operations by modifying the
checks for the write_item(X) operation as follows:
1. If read_TS(X) > TS(T), then abort and roll back T and reject the operation.
2. If write_TS(X) > TS(T), then do not execute the write operation but continue
processing. This is because some transaction with a timestamp greater than TS(T), and
hence after T in the timestamp ordering, has already written the value of X. Thus, the
write_item(X) operation of T is ignored because it is already outdated and obsolete.
Notice that any conflict arising from this situation would be detected by case (1).
3. If neither the condition in part (1) nor the condition in part (2) occurs, then execute
the write_item(X) operation of T and set write_TS(X) to TS(T).
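The three cases of Thomas's write rule can be sketched as a single check function; the read_ts/write_ts dictionaries are an illustrative stand-in for the timestamp table:

```python
# Hypothetical timestamp table for item X: last read at TS 8, last written at TS 10.
read_ts = {"X": 8}
write_ts = {"X": 10}

def thomas_write(item, ts_t):
    """Decide the fate of write_item(item) for a transaction with timestamp ts_t."""
    if read_ts[item] > ts_t:          # case (1): a younger transaction already read X
        return "abort"
    if write_ts[item] > ts_t:         # case (2): X already holds a newer value; skip
        return "ignore"
    write_ts[item] = ts_t             # case (3): perform the write, update write_TS(X)
    return "write"
```

A transaction with TS 9 falls into case (2): its write is simply skipped rather than aborted, which is exactly where Thomas's rule is more permissive than basic TO.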
IV. Multiversion Concurrency Control Techniques
Multi-Version Concurrency Control (MVCC) is a database optimization technique
that keeps multiple copies (versions) of records so that data can be read and
updated concurrently and safely. Under MVCC, reads and writes do not block one
another. Concurrency control in general keeps concurrent transactions from
producing read/write conflicts or other anomalies in the database.
For each version Xi of a data item X, the system maintains the following three fields:
1. The value of the version.
2. Read_TS (Xi): The read timestamp of Xi is the largest timestamp of any transaction
that successfully reads version Xi.
3. Write_TS(Xi): The write timestamp of Xi is the largest timestamp of any transaction
that successfully writes version Xi.
• To ensure serializability, the following two rules are used. Suppose a transaction T
issues a read or write request for a data item X, and let Xi be the version of X having
the largest Write_TS(Xi) of all versions of X that is also less than or equal to TS(T).
• Rule 1: If transaction T issues a Read(X) request, the system returns the value of Xi
to T and sets Read_TS(Xi) to the larger of TS(T) and the current Read_TS(Xi).
• Rule 2: If transaction T issues a Write(X) request and TS(T) < Read_TS(Xi), the
system aborts transaction T. Otherwise, if TS(T) = Write_TS(Xi), the system overwrites
the contents of Xi; if TS(T) > Write_TS(Xi), it creates a new version of X.
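The two multiversion rules can be sketched over a list of versions; the Version class and helper names are illustrative assumptions:

```python
class Version:
    """One version of item X: its value plus read/write timestamps."""
    def __init__(self, value, ts):
        self.value, self.read_ts, self.write_ts = value, ts, ts

versions = [Version(100, 0)]          # initial version of X, written at TS 0

def relevant(ts_t):
    """Xi: the version with the largest write_TS that is <= TS(T)."""
    return max((v for v in versions if v.write_ts <= ts_t),
               key=lambda v: v.write_ts)

def read_x(ts_t):                     # Rule 1
    v = relevant(ts_t)
    v.read_ts = max(v.read_ts, ts_t)  # bump Read_TS(Xi) if TS(T) is larger
    return v.value

def write_x(ts_t, value):             # Rule 2
    v = relevant(ts_t)
    if ts_t < v.read_ts:
        return "abort"                # a younger transaction already read Xi
    if ts_t == v.write_ts:
        v.value = value               # same timestamp: overwrite Xi in place
    else:
        versions.append(Version(value, ts_t))   # otherwise create a new version
    return "ok"
```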
• In this multiversion 2PL scheme, reads can proceed concurrently with a single write
operation—an arrangement not permitted under the standard 2PL schemes.
• The cost is that a transaction may have to delay its commit until it obtains exclusive
certify locks on all the items it has updated.
• It can be shown that this scheme avoids cascading aborts, since transactions are only
allowed to read the version X that was written by a committed transaction.
• However, deadlocks may occur if upgrading of a read lock to a write lock is allowed,
and these must be handled by variations of the deadlock-handling techniques
described earlier.
In the validation (optimistic) concurrency control technique, no checking is done while
the transaction is executing; all updates are applied to local copies of the data items
kept for the transaction. A transaction goes through three phases:
Read Phase:
Values of committed data items from the database can be read by a transaction.
Updates are only applied to local data versions.
Validation Phase:
Checking is performed to make sure that there is no violation of serializability
when the transaction updates are applied to the database.
Write Phase:
On the success of the validation phase, the transaction updates are applied to
the database; otherwise, the updates are discarded and the transaction is restarted.
In order to perform the validation test, each transaction goes through the phases
described above. The following three timestamps are assigned to a transaction Ti
to check its validity:
1. Start(Ti): the time when Ti started its execution.
2. Validation(Ti): the time when Ti finished its read phase and began its
validation phase.
3. Finish(Ti): the time when Ti finished all its write operations in the database
during the write phase.
In the validation phase for transaction Ti, the protocol checks that Ti does not
overlap or interfere with any transaction Tj that has committed or is currently in its
validation phase. For Ti to pass validation, one of the following conditions must hold
with respect to each such Tj:
1. Tj completes its write phase before Ti starts its read phase.
2. Ti begins its write phase after Tj completes its write phase, and the read_set of
Ti is disjoint from the write_set of Tj.
3. Tj completes its read phase before Ti completes its read phase, and both the
read_set and write_set of Ti are disjoint from the write_set of Tj.
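The three validation conditions can be sketched as a predicate over two transactions, modeled here (an assumption) as dictionaries of phase timestamps and read/write sets, with Validation(T) used as the start of T's write phase:

```python
def validates(ti, tj):
    """True if Ti passes the validation test against committed transaction Tj."""
    # (1) Tj finishes its write phase before Ti starts its read phase.
    if tj["finish"] < ti["start"]:
        return True
    # (2) Ti starts its write phase after Tj finishes its write phase,
    #     and read_set(Ti) is disjoint from write_set(Tj).
    if tj["finish"] < ti["validation"] and not (ti["read_set"] & tj["write_set"]):
        return True
    # (3) Tj finishes its read phase before Ti finishes its read phase, and both
    #     read_set(Ti) and write_set(Ti) are disjoint from write_set(Tj).
    if (tj["validation"] < ti["validation"]
            and not (ti["read_set"] & tj["write_set"])
            and not (ti["write_set"] & tj["write_set"])):
        return True
    return False

ti = {"start": 10, "validation": 20, "read_set": {"X"}, "write_set": {"Y"}}
tj = {"start": 1, "validation": 4, "finish": 5, "write_set": {"Z"}}
# tj finished before ti started, so condition (1) lets ti pass against tj
```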