
UNIT IV

Syllabus

 Transaction processing
 Concurrency control,
 ACID property,
 Serializability of scheduling,
 Locking and timestamp-based schedulers,
 Multi-version and
 Optimistic Concurrency Control schemes,
 Database Recovery.

1. Transactions
A transaction is a collection of operations that form a single logical unit of work. A transaction is a
unit of program execution that accesses and possibly updates various data items. A transaction is
initiated by a user program written in a high-level data manipulation language, where it is delimited
by statements of the form 'begin transaction' and 'end transaction'. The transaction consists of all
operations executed between the begin transaction and end transaction.

1.1 Example
Consider the following fund transfer transaction

• Begin transaction
• Transfer 100 from Account X to Account Y
• Commit

1.2 Operations in this transaction


i. Read Account X from disk
ii. If balance is less than 100, return with an appropriate message.
iii. Subtract 100 from balance of Account X.
iv. Write Account X back to disk.
v. Read Account Y from disk.
vi. Add 100 to balance of Account Y.
vii. Write Account Y back to disk.
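The seven steps above can be sketched as a short program. This is an illustrative sketch only: the `db` dict and the `transfer` function are assumptions standing in for disk reads and writes through a buffer manager.

```python
# Minimal sketch of the fund-transfer transaction above.
# The in-memory dict stands in for disk-resident accounts (an assumption
# for illustration; a real DBMS reads/writes pages via a buffer manager).

def transfer(db, src, dst, amount):
    """Return True on commit, False on abort (insufficient funds)."""
    x = db[src]                 # i.  read Account X
    if x < amount:              # ii. insufficient balance: abort
        return False
    db[src] = x - amount        # iii/iv. subtract and write back X
    db[dst] = db[dst] + amount  # v-vii.  read, add, write back Y
    return True                 # commit

accounts = {"X": 500, "Y": 200}
assert transfer(accounts, "X", "Y", 100) is True
assert accounts == {"X": 400, "Y": 300}
assert transfer(accounts, "X", "Y", 1000) is False  # rejected, unchanged
```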

2. Transaction Processing
Transaction processing systems are systems with large databases and hundreds of concurrent users
executing database transactions. Examples: systems for reservations, banking, credit card
processing, etc.

2.1 Transactions access data using two operations


• read(X): which transfers the data item X from the database to a local buffer belonging to the
transaction that executed the read operation.
• write(X): which transfers the data item X from the local buffer of the transaction that
executed the write back to the database.

Prepared By: Minu Choudhary


3. States of a Transaction
A transaction is a program unit whose execution may change the contents of a database. If the
database was in a consistent state before the transaction, then on completion of the execution of the
program unit corresponding to the transaction, the database will again be in a consistent state.

Figure 4.1 States of a Transaction

A transaction can end in three possible ways.

i. It can end after a commit operation (a successful termination).


ii. It can detect an error during its processing and decide to abort itself by performing a rollback
operation (a suicidal termination).
iii. The DBMS or the operating system can force it to be aborted for one reason or another (a
murderous termination).
We assume that the database is in a consistent state before a transaction starts. A transaction starts
when the first statement of the transaction is executed.

Step1: Initially, the transaction is in the "modify" state, entered when the first statement of the
transaction is executed. At the end of the modify state, there is a transition into one of the following
states: start-to-commit, abort, or error.

Step2: If the transaction completes the modification state, it enters the start-to-commit state where it
instructs the DBMS to reflect the changes made by it into the database.

Step3: Once all the changes made by the transaction are propagated to the database, the transaction is
said to be in the “commit” state and from there the transaction is terminated, the database once again
being in a “consistent” state.



Step4: Possibly, all the modifications made by the transaction cannot be propagated to the database
due to conflicts or hardware failures. In this case the system forces the transaction to the “abort”
state.

Step5: The abort state could also be entered from the modify state if there are system errors, for ex,
division by zero. In case the transaction detects an error while in the modify state, it decides to
terminate itself (suicide) and enters the error state and then the “rollback” state. If the system aborts a
transaction, it may have to initiate a rollback to undo partial changes made by the transaction.

The transaction outcome can be either successful (if the transaction goes through the commit state),
suicidal (if the transaction goes through the rollback state) or murdered (if the transaction goes
through the abort state).

4. ACID PROPERTIES (Properties of a Transaction)


Atomicity: Atomicity requires that database modifications must follow an "all or nothing" rule. Each
transaction is said to be atomic. If one part of the transaction fails, the entire transaction fails and the
database state is left unchanged. A transaction implies that it will run to completion as an indivisible
unit, at the end of which either no changes have occurred to the database or the database has been
changed in a consistent manner.

Consistency: The consistency property of a transaction implies that if the database was in a consistent
state before the start of a transaction, then on termination of a transaction the database will also be in a
consistent state.

Isolation: The isolation property of a transaction indicates that actions performed by a transaction will
be isolated or hidden from outside the transaction until the transaction terminates. This property gives
the transaction a measure of relative independence.

Durability: The durability property of a transaction ensures that the commit action of a transaction,
on its termination, will be reflected in the database. The permanence of the commit action of a
transaction requires that any failure after the commit operation will not cause loss of the updates made
by the transaction.

Example

Let Ti be a transaction that transfers $50 from account A to account B. This transaction can be defined
as

Ti: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).

Consistency: The consistency requirement here is that the sum of A and B be unchanged by the
execution of the transaction. It can be verified easily that, if the database is consistent before an
execution of the transaction, the database remains consistent after the execution of the transaction.



Atomicity: Suppose that, just before the execution of transaction Ti the values of accounts A and B
are $1000 and $2000, respectively. Now suppose that, during the execution of transaction Ti, a failure
occurs that prevents Ti from completing its execution successfully. Suppose that the failure happened
after the write(A) operation but before the write(B) operation. In this case, the values of accounts A
and B reflected in the database are $950 and $2000. Thus, because of the failure, the state of the
system no longer reflects a real state of the world that the database is supposed to capture. We term
such a state an inconsistent state.

If the atomicity property is present, all actions of the transaction are reflected in the database, or none
are. Ensuring atomicity is the responsibility of the database system itself; specifically, it is handled by
a component called the transaction-management component

Durability: Once the execution of the transaction completes successfully, and the user who initiated
the transaction has been notified that the transfer of funds has taken place, it must be the case that no
system failure will result in a loss of data corresponding to this transfer of funds. The durability
property guarantees that, once a transaction completes successfully, all the updates that it carried out
on the database persist, even if there is a system failure after the transaction completes execution.

We can guarantee durability by ensuring that either:

1. The updates carried out by the transaction have been written to disk before the transaction
completes.

2. Information about the updates carried out by the transaction and written to disk is sufficient to
enable the database to reconstruct the updates when the database system is restarted after the failure.

Ensuring durability is the responsibility of a component of the database system called the recovery-
management component.

Isolation: The database is temporarily inconsistent while the transaction to transfer funds from A to B
is executing, with the deducted total written to A and the increased total yet to be written to B. If a
second concurrently running transaction reads A and B at this intermediate point and computes A+B, it
will observe an inconsistent value.

A way to avoid the problem of concurrently executing transactions is to execute transactions
serially, that is, one after the other. The isolation property of a transaction ensures that the concurrent
execution of transactions results in a system state that is equivalent to a state that could have been
obtained had these transactions executed one at a time in some order.



Concurrent Executions
Multiple transactions are allowed to run concurrently in the system.

Advantages are:

o increased processor and disk utilization: one transaction can be using the CPU while another
is reading from or writing to the disk
o reduced average response time for transactions: short transactions need not wait behind long
ones.
If control of concurrent execution is left entirely to the operating system, many possible schedules,
including ones that leave the database in an inconsistent state, are possible. It is the job of the database
system to ensure that any schedule that gets executed will leave the database in a consistent state. The
concurrency-control component of the database system carries out this task.


SCHEDULES
Schedules represent the chronological order in which instructions of concurrent transactions are
executed in the system. A schedule for a set of transactions must consist of all instructions of those
transactions. Must preserve the order in which the instructions appear in each individual transaction.
A transaction that successfully completes its execution will have commit instructions as the last
statement. A transaction that fails to successfully complete its execution will have an abort instruction
as the last statement.

Serial and Non-serial schedule


A schedule S is serial if, for every transaction T participating in the schedule, all the operations of T
are executed consecutively in the schedule; otherwise the schedule is called non-serial.

Schedule 1: Let T1 transfer $50 from A to B, and T2 transfer 10% of the balance from A to B. A serial
schedule in which T1 is followed by T2: A=1000, B=2000

Figure 4.2 Schedule 1



Schedule 2: A serial schedule where T2 is followed by T1

Figure 4.3 Schedule 2

Schedule 3: Let T1 and T2 be the transactions defined previously. The following schedule is not a
serial schedule, but it is equivalent to Schedule 1. In Schedules 1, 2 and 3, the sum A + B is preserved.

Figure 4.4 Schedule 3

Schedule 4: The following concurrent schedule does not preserve the value of (A + B).

Figure 4.5 Schedule 4

Major goals of Concurrency control


Concurrency control mechanisms are usually designed to achieve some of, or all the following goals:

I. Serializability
II. Recoverability



SERIALIZABILITY OF SCHEDULES
A schedule is serializable if it is equivalent to a serial schedule.
Types of serializability
1. Conflict serializability
2. View serializability

Conflict operations
Two operations conflict if:

 They are issued by different transactions,


 They operate on the same data item, and
 At least one of them is a write operation.
li = read(Q), lj = read(Q): li and lj do not conflict.
li = read(Q), lj = write(Q): they conflict.
li = write(Q), lj = read(Q): they conflict.
li = write(Q), lj = write(Q): they conflict.
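The three conditions above can be written as a small predicate. The `(transaction, kind, item)` tuple representation of an operation is an assumption made for illustration.

```python
# Sketch: deciding whether two schedule operations conflict, following
# the three conditions above. Operations are (txn, kind, item) tuples.

def conflicts(op1, op2):
    t1, kind1, item1 = op1
    t2, kind2, item2 = op2
    return (t1 != t2                        # different transactions
            and item1 == item2              # same data item
            and "write" in (kind1, kind2))  # at least one is a write

assert not conflicts(("T1", "read", "Q"), ("T2", "read", "Q"))
assert conflicts(("T1", "read", "Q"), ("T2", "write", "Q"))
assert conflicts(("T1", "write", "Q"), ("T2", "write", "Q"))
assert not conflicts(("T1", "write", "Q"), ("T1", "read", "Q"))  # same txn
```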

Conflict equivalent
If a schedule S can be transformed into a schedule S´ by a series of swaps of non-conflicting
instructions, we say that S and S´ are conflict equivalent.

Conflict Serializability
An execution is conflict-serializable if it is conflict equivalent to a serial schedule.

Example

Schedule 3 can be transformed into Schedule 5, a serial schedule where T2 follows T1, by a series of
swaps of non-conflicting instructions. Therefore Schedule 3 is conflict serializable.

Figure 4.6 Schedule 3 and Schedule 5

Schedule 3 is converted into the serial schedule (Schedule 5) by swapping non-conflicting instructions:

 Swap the write(A) instruction of T2 with the read(B) instruction of T1.


 Swap the read(B) instruction of T1 with the read(A) instruction of T2.
 Swap the write(B) instruction of T1 with the write(A) instruction of T2.
 Swap the write(B) instruction of T1 with the read(A) instruction of T2.
Example of a schedule that is not conflict serializable

We are unable to swap instructions in the schedule of Figure 4.7 to obtain either the serial schedule
< T3, T4 > or the serial schedule < T4, T3 >.



Figure 4.7 schedule that is not conflict serializable

View Serializability

Let S and S´ be two schedules with the same set of transactions. S and S´ are view equivalent if the
following three conditions are met:

i. For each data item Q, if Ti reads the initial value of Q in S, then Ti also reads the initial value of Q in S´.
ii. For each data item Q, if Ti reads the value of Q written by Tj in S, then Ti also reads the value of Q
written by Tj in S´.
iii. For each data item Q, if Ti writes the final value of Q in S, then Ti also writes the final value of Q in S´.

Conditions 1 and 2 ensure that each transaction reads the same values in both schedules and,
therefore, performs the same computation. Condition 3, coupled with conditions 1 and 2, ensures
that both schedules result in the same final system state.

A schedule S is view serializable if it is view equivalent to a serial schedule. Every conflict-serializable
schedule is also view serializable, but there are view-serializable schedules that are not conflict
serializable. In the schedule below, transactions T4 and T6 perform write(Q) operations without having
performed a read(Q) operation. Writes of this sort are called blind writes.

Figure 4.8 blind writes

Indeed, the schedule in Figure 4.8 is not conflict serializable, since every pair of consecutive
instructions conflicts, and, thus, no swapping of instructions is possible.

TESTING FOR SERIALIZABILITY


Consider some schedule S of a set of transactions T1, T2, ..., Tn. We construct a directed graph, called
a precedence graph, from S. This graph consists of a pair G = (V, E), where V is a set of vertices and
E is a set of edges. The set of vertices consists of all the transactions participating in the schedule.

The set of edges consists of all edges Ti →Tj for which one of three conditions holds:

i. Ti executes write(Q) before Tj executes read(Q).


ii. Ti executes read(Q) before Tj executes write(Q).
iii. Ti executes write(Q) before Tj executes write(Q).



Figure 4.9 Schedule 1 and Precedence graph

Figure 4.10 Schedule 2 and Precedence graph

Figure 4.11 Schedule 4 and Precedence graph

Test for Conflict Serializability


A schedule is conflict serializable if and only if its precedence graph is acyclic. Schedules 1 and 2
indeed do not contain cycles. Both are conflict serializable. The precedence graph for schedule 4
contains a cycle, indicating that this schedule is not conflict serializable.
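The test above can be sketched as: build the precedence graph from the conflict pairs, then search it for a cycle. The schedule representation and function names here are illustrative assumptions, not a standard API.

```python
# Sketch of the serializability test: build the precedence graph from a
# schedule and check it for cycles. Schedule entries are (txn, op, item)
# tuples in execution order.

def precedence_graph(schedule):
    edges = set()
    for i, (ti, op_i, item_i) in enumerate(schedule):
        for tj, op_j, item_j in schedule[i + 1:]:
            if ti != tj and item_i == item_j and "write" in (op_i, op_j):
                edges.add((ti, tj))  # Ti -> Tj for each conflict pair
    return edges

def has_cycle(edges):
    # Depth-first search for a transaction reachable from itself.
    graph = {}
    for u, v in edges:
        graph.setdefault(u, []).append(v)
    def visit(node, path):
        if node in path:
            return True
        return any(visit(n, path | {node}) for n in graph.get(node, []))
    return any(visit(u, set()) for u in graph)

# T1 then T2, serially, on item A: acyclic graph, conflict serializable.
serial = [("T1", "read", "A"), ("T1", "write", "A"),
          ("T2", "read", "A"), ("T2", "write", "A")]
assert not has_cycle(precedence_graph(serial))

# Conflicting operations in both directions: cycle, not serializable.
bad = [("T1", "read", "A"), ("T2", "write", "A"), ("T1", "write", "A")]
assert has_cycle(precedence_graph(bad))
```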



Question 1

Consider the following precedence graph. Is the corresponding schedule conflict serializable? Explain
your answer.

Solution 1: There is a conflict-serializable schedule corresponding to the given precedence graph,
since the graph is acyclic.

Question 2

Define and explain conflict serializability. Show that the following schedule S1 is conflict serializable
whereas S2 is not.

Solution 2:

• For schedule S1:
• The precedence graph is acyclic.
• So S1 is conflict serializable.

• For schedule S2:
• The precedence graph is cyclic.
• So S2 is not conflict serializable.



Question 3

Test the serializability of the given schedule

Solution 3:

• The precedence graph is cyclic.
• So the schedule is not conflict serializable.



Concurrency Control
When several transactions execute concurrently in the database, however, the isolation property may
no longer be preserved. To ensure that it is, the system must control the interaction among the
concurrent transactions; this control is achieved through one of a variety of mechanisms called
concurrency-control schemes.

Concurrency Control techniques

1. Lock-Based Protocols
2. Timestamp-Based Protocols
3. Validation-Based Protocols/Optimistic concurrency control
4. Multi-version Schemes

1. Lock-Based Protocols
A lock is a mechanism to control concurrent access to a data item.
Data items can be locked in two modes:
I. Exclusive (X) mode: Data item can be both read as well as written. X-lock is requested using
lock-X instruction.
II. Shared (S) mode: Data item can only be read. S-lock is requested using lock-S instruction.

Lock requests are made to concurrency-control manager. Transaction can proceed only after request is
granted.

Lock-compatibility matrix

A transaction may be granted a lock on an item if the requested lock is compatible with locks already
held on the item by other transactions. Any number of transactions can hold shared locks on an item.
If a lock cannot be granted, the requesting transaction is made to wait till all incompatible locks held
by other transactions have been released. The lock is then granted.
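The compatibility rule above can be sketched as a small table lookup; the matrix layout and helper name are assumptions made for illustration.

```python
# Sketch of the lock-compatibility matrix: a requested lock is granted
# only if it is compatible with every lock already held on the item by
# other transactions.

COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested, held_modes):
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

assert can_grant("S", ["S", "S"])   # any number of shared locks
assert not can_grant("X", ["S"])    # X waits while S is held
assert not can_grant("S", ["X"])    # S waits while X is held
assert can_grant("X", [])           # free item: grant immediately
```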

Example of a transaction performing locking:


T2: lock-S(A);
read (A);
unlock(A);
lock-S(B);
read (B);
unlock(B);
display(A+B)

A locking protocol is a set of rules followed by all transactions while requesting and releasing locks.

Pitfalls of Lock-Based Protocols

The potential for deadlock exists in most locking protocols. Deadlocks are a necessary evil.
Starvation is also possible if the concurrency-control manager is badly designed. For example:



A transaction may be waiting for an X-lock on an item, while a sequence of other transactions request
and are granted an S-lock on the same item. The same transaction is repeatedly rolled back due to
deadlocks. Concurrency control manager can be designed to prevent starvation.

The Two-Phase Locking Protocol

This is a protocol which ensures serializable schedules. This protocol requires that each transaction
issue lock and unlock requests in two phases:

Phase 1: Growing Phase: A transaction may obtain locks, but may not release any lock
Phase 2: Shrinking Phase: A transaction may release locks, but may not obtain any new locks

Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the
transaction releases a lock, it enters the shrinking phase, and it can issue no more lock requests. The
point in the schedule where the transaction has obtained its final lock is called the lock point of the
transaction. Transactions can be ordered according to their lock points—this ordering is, in fact, a
serializability ordering for the transactions. Two-phase locking does not ensure freedom from
deadlocks. Cascading roll-back is possible under two-phase locking.

To avoid this, follow a modified protocol called strict two-phase locking. Here a transaction must
hold all its exclusive locks till it commits/aborts. Rigorous two-phase locking is even stricter: here
all locks are held till commit/abort. Most database systems implement either strict or rigorous two-
phase locking.
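The two-phase rule can be checked mechanically for a single transaction's lock/unlock sequence: once any lock is released, no new lock may be acquired. The event-list format below is an assumption for illustration; note the second example mirrors transaction T2 shown earlier, which unlocks A before locking B.

```python
# Sketch: verify that one transaction's lock/unlock sequence obeys
# two-phase locking (no lock request after the first unlock).

def is_two_phase(events):
    shrinking = False
    for action, _item in events:
        if action == "unlock":
            shrinking = True   # growing phase is over
        elif shrinking:        # any lock request after an unlock
            return False
    return True

assert is_two_phase([("lock-S", "A"), ("lock-X", "B"),
                     ("unlock", "A"), ("unlock", "B")])
assert not is_two_phase([("lock-S", "A"), ("unlock", "A"),
                         ("lock-S", "B"), ("unlock", "B")])
```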

Lock Conversions

A mechanism for upgrading a shared lock to an exclusive lock and downgrading an exclusive lock to
a shared lock. We denote conversion from shared to exclusive modes by upgrade, and from
exclusive to shared by downgrade.

Two-phase locking with lock conversions:


– First Phase:
 can acquire a lock-S on item
 can acquire a lock-X on item
 can convert a lock-S to a lock-X (upgrade)
– Second Phase:
 can release a lock-S
 can release a lock-X
 can convert a lock-X to a lock-S (downgrade)

Incomplete schedule with a lock conversion



Implementation of Locking
A lock manager can be implemented as a separate process to which transactions send lock and
unlock requests. The lock manager replies to a lock request by sending a lock grant message (or a
message asking the transaction to roll back, in case of a deadlock). The requesting transaction waits
until its request is answered. The lock manager maintains a data-structure called a lock table to record
granted locks and pending requests.

Lock Table
Black rectangles indicate granted locks, white ones indicate waiting requests. Lock table also records
the type of lock granted or requested. New request is added to the end of the linked list of requests for
the data item, and granted if it is compatible with all earlier locks. Unlock requests result in the
request being deleted, and later requests are checked to see if they can now be granted. If transaction
aborts, all waiting or granted requests of the transaction are deleted. Lock manager may keep a list of
locks held by each transaction, to implement this efficiently.

2. Timestamp-Based Protocols

Each transaction is issued a timestamp when it enters the system. A timestamp is a unique value
assigned to every transaction; it indicates the order in which transactions enter the system. If an old
transaction Ti has timestamp TS(Ti), a new transaction Tj is assigned timestamp TS(Tj) such that
TS(Ti) < TS(Tj).
The protocol manages concurrent execution such that the time-stamps determine the serializability
order.
There are two simple methods for implementing this scheme:

i. Use the value of the system clock as the timestamp; that is, a transaction’s timestamp is
equal to the value of the clock when the transaction enters the system.
ii. Use a logical counter that is incremented after a new timestamp has been assigned; that
is, a transaction’s timestamp is equal to the value of the counter when the transaction enters
the system.



In order to ensure such behavior, the protocol maintains for each data item Q two timestamp values:
• W-timestamp(Q): the timestamp of the latest transaction that performed write(Q)
successfully.
• R-timestamp(Q): the timestamp of the latest transaction that performed read(Q) successfully.

The timestamp ordering protocol ensures that any conflicting read and write operations are
executed in timestamp order. This protocol operates as follows:

1) Suppose a transaction Ti issues a read(Q)


i. If TS(Ti) < W-timestamp(Q), then the read operation is rejected and Ti is rolled back.
ii. Otherwise, the read operation is executed, and R-timestamp(Q) is set to max(R-
timestamp(Q), TS(Ti)).
2) Suppose that transaction Ti issues write(Q).
i. If TS(Ti) < R-timestamp(Q), then the write operation is rejected, and Ti is rolled
back.
ii. If TS(Ti) < W-timestamp(Q), then write operation is rejected, and Ti is rolled back.
iii. Otherwise, the write operation is executed, and W-timestamp(Q) is set to TS(Ti).
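The read and write rules above can be sketched as a small class. The dict-based timestamp store and method names are illustrative assumptions; rejected operations return False to stand for rolling the transaction back.

```python
# Sketch of the timestamp-ordering protocol: each item keeps R- and
# W-timestamps, and conflicting operations out of timestamp order are
# rejected (the issuing transaction would be rolled back).

class TimestampOrdering:
    def __init__(self):
        self.r_ts = {}  # R-timestamp(Q)
        self.w_ts = {}  # W-timestamp(Q)

    def read(self, ts, q):
        if ts < self.w_ts.get(q, 0):
            return False  # Ti would read an already-overwritten value
        self.r_ts[q] = max(self.r_ts.get(q, 0), ts)
        return True

    def write(self, ts, q):
        if ts < self.r_ts.get(q, 0) or ts < self.w_ts.get(q, 0):
            return False  # a younger transaction already read/wrote Q
        self.w_ts[q] = ts
        return True

proto = TimestampOrdering()
assert proto.write(5, "Q")      # sets W-timestamp(Q) = 5
assert not proto.read(3, "Q")   # TS(Ti)=3 < W-timestamp(Q): rejected
assert proto.read(7, "Q")       # sets R-timestamp(Q) = 7
assert not proto.write(6, "Q")  # TS(Ti)=6 < R-timestamp(Q): rejected
```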

Case 1: write operation of T2 will execute.
Case 2: read operation of T2 will execute.
Case 3: write operation of T2 will execute.



Case 4: write operation of T1 is rejected.
Case 5: read operation of T1 is rejected.
Case 6: write operation of T1 is rejected.

Disadvantages
• Each value stored in the database requires two additional time stamp fields.
• One for the last time the field was read and one for the last update.
• Time stamping thus increases the memory needs and the database processing overhead.

3. Validation-Based Protocol / Optimistic concurrency control

Execution of transaction Ti is done in three phases.


1. Read phase: During this phase, the system executes transaction Ti. It reads the values of the
various data items and stores them in variables local to Ti. It performs all write operations on
temporary local variables, without updating the actual database.
2. Validation phase: Transaction Ti performs a validation test to determine whether it can copy
to the database the temporary local variables that hold the results of write operations without
causing a violation of serializability.
3. Write phase: If transaction Ti succeeds in validation (step 2), then the system applies the
actual updates to the database. Otherwise, the system rolls back Ti .

Example:

To perform the validation test, we need to know when the various phases of transactions Ti took place.
1. Start(Ti): the time when Ti started its execution.
2. Validation(Ti): the time when Ti finished its read phase and started its validation phase.
3. Finish(Ti): the time when Ti finished its write phase.



The validation test for transaction Tj requires that, for all transactions Ti with TS(Ti) < TS(Tj), one of
the following two conditions must hold:
• finish(Ti) < start(Tj)
• start(Tj) < finish(Ti) < validation(Tj) and the set of data items written by Ti does not intersect
with the set of data items read by Tj.
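The two conditions above can be sketched as a predicate for one pair (Ti, Tj). Passing the phase times and read/write sets in explicitly is an assumption for illustration; a real validator would look these up in transaction metadata.

```python
# Sketch of the validation test for Tj against an earlier Ti
# (i.e., TS(Ti) < TS(Tj)).

def validates(finish_i, write_set_i, start_j, validation_j, read_set_j):
    # Condition 1: Ti finished before Tj started (no overlap at all).
    if finish_i < start_j:
        return True
    # Condition 2: Ti finished its writes before Tj's validation, and
    # Ti wrote nothing that Tj read.
    if start_j < finish_i < validation_j:
        return not (set(write_set_i) & set(read_set_j))
    return False

assert validates(finish_i=10, write_set_i={"A"},
                 start_j=20, validation_j=30, read_set_j={"A"})
assert validates(finish_i=25, write_set_i={"B"},
                 start_j=20, validation_j=30, read_set_j={"A"})
assert not validates(finish_i=25, write_set_i={"A"},
                     start_j=20, validation_j=30, read_set_j={"A"})
```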

Schedule Produced by Validation

4. Multiversion Schemes
In Multiversion concurrency control schemes, each write(Q) operation creates a new version of Q.
When a transaction issues a read (Q) operation, the concurrency control manager selects one of the
versions of Q to be read. The concurrency-control scheme must ensure that the version to be read is
selected in a manner that ensures serializability.

There are two schemes:


1. Multiversion Timestamp Ordering
2. Multiversion Two-Phase Locking

Multiversion Timestamp Ordering

Each data item Q has a sequence of versions <Q1, Q2, ..., Qm>. Each version Qk contains three data
fields:
• Content -- the value of version Qk.
• W-timestamp(Qk) -- timestamp of the transaction that created (wrote) version Qk
• R-timestamp(Qk) -- largest timestamp of a transaction that successfully read version Qk

When a transaction Ti creates a new version Qk of Q by issuing a write(Q) operation, its W-
timestamp and R-timestamp are initialized to TS(Ti).



• The R-timestamp of Qk is updated whenever a transaction Tj reads Qk and TS(Tj) > R-
timestamp(Qk).

Suppose that transaction Ti issues a read(Q) or write(Q) operation. Let Qk denote the version of Q
whose write timestamp is the largest write timestamp less than or equal to TS(Ti).
• If transaction Ti issues a read(Q), then the value returned is the content of version Qk.
• If transaction Ti issues a write(Q)
• if TS(Ti) < R-timestamp(Qk), then transaction Ti is rolled back.
• if TS(Ti) = W-timestamp(Qk), the contents of Qk are overwritten
• else a new version of Q is created.
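The version-selection and write rules above can be sketched as follows. The list-of-dicts version layout and the function names are assumptions made for illustration.

```python
# Sketch of multiversion timestamp ordering: a reader is served the
# version with the largest W-timestamp <= TS(Ti); a writer may be
# rolled back if a younger transaction already read that version.

def pick_version(versions, ts):
    """Version Qk whose W-timestamp is the largest one <= ts."""
    eligible = [v for v in versions if v["w_ts"] <= ts]
    return max(eligible, key=lambda v: v["w_ts"])

def mv_read(versions, ts):
    qk = pick_version(versions, ts)
    qk["r_ts"] = max(qk["r_ts"], ts)  # bump R-timestamp for younger reader
    return qk["content"]

def mv_write(versions, ts, value):
    qk = pick_version(versions, ts)
    if ts < qk["r_ts"]:
        return False                  # younger reader saw Qk: roll back
    if ts == qk["w_ts"]:
        qk["content"] = value         # overwrite Ti's own version
    else:
        versions.append({"content": value, "w_ts": ts, "r_ts": ts})
    return True

q = [{"content": 10, "w_ts": 0, "r_ts": 0}]
assert mv_read(q, ts=5) == 10           # serves the w_ts=0 version
assert mv_write(q, ts=7, value=20)      # creates a new version
assert not mv_write(q, ts=3, value=30)  # rejected: base r_ts is now 5
```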

Multiversion Two-Phase Locking


 Differentiates between read-only transactions and update transactions.
 Update transactions acquire read and write locks, and hold all locks up to the end of the
transaction. That is, update transactions follow rigorous two-phase locking.
 Each successful write results in the creation of a new version of the data item written.
 Each version of a data item has a single timestamp whose value is obtained from a
counter ts-counter that is incremented during commit processing.
 Read-only transactions are assigned a timestamp by reading the current value of ts-counter
before they start execution; they follow the multiversion timestamp-ordering protocol for
performing reads.
 When an update transaction wants to read a data item:
 it obtains a shared lock on it, and reads the latest version.
 When it wants to write an item:
 it obtains an X lock on it; it then creates a new version of the item and sets this
version's timestamp to ∞.
 When update transaction Ti completes, commit processing occurs:
 Ti sets the timestamp on the versions it has created to ts-counter + 1
 Ti increments ts-counter by 1
 Read-only transactions that start after Ti increments ts-counter will see the values updated by
Ti.
 Read-only transactions that start before Ti increments ts-counter will see the values before
the updates by Ti.
 Only serializable schedules are produced.



Database Recovery
Database recovery is the process of restoring a database to the consistent state that existed
before the failure. When a system crashes, it may have several transactions being executed
and various files opened for them to modify the data items. Transactions are made of various
operations, which are atomic in nature. But according to ACID properties of DBMS,
atomicity of transactions as a whole must be maintained, that is, either all the operations are
executed or none.

When a DBMS recovers from a crash, it should ensure the following:


• It should check the states of all the transactions, which were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
• It should check whether the transaction can be completed now or it needs to be rolled
back.
• No transactions would be allowed to leave the DBMS in an inconsistent state.

RECOVERY TECHNIQUES
1. Log Based recovery technique
o Deferred update
o Immediate update
2. Shadow Paging

Log-Based Recovery
The most widely used structure for recording database modifications is the log. The log is a sequence
of log records, recording all the update activities in the database. There are several types of log
records. An update log record describes a single database write. It has these fields:

• Transaction identifier is the unique identifier of the transaction that performed the write operation.
• Data-item identifier is the unique identifier of the data item written. Typically, it is the location on
disk of the data item.
• Old value is the value of the data item prior to the write.
• New value is the value that the data item will have after the write.

Other special log records exist to record significant events during transaction processing, such as the
start of a transaction and the commit or abort of a transaction.

We denote the various types of log records as:


• <Ti start>. Transaction Ti has started.
• <Ti, Xi, V1, V2>. Transaction Ti has performed a write on data item Xi. Xi
had value V1 before the write, and will have value V2 after the write.
• <Ti commit>. Transaction Ti has committed.
• <Ti abort>. Transaction Ti has aborted.

Whenever a transaction performs a write, it is essential that the log record for that write be created
before the database is modified. Once a log record exists, we can output the modification to the
database if that is desirable. Also, we have the ability to undo a modification that has already been
output to the database. We undo it by using the old-value field in log records. For log records to be
useful for recovery from system and disk failures, the log must reside in stable storage.
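The write-ahead discipline described above can be sketched in a few lines of Python — a toy in-memory model, not any particular DBMS; the `log` list stands in for stable storage:

```python
# Toy model of update log records <Ti, Xi, V1, V2> and the write-ahead rule:
# the log record is appended BEFORE the data item is modified.
log = []                           # stands in for the stable-storage log
database = {"A": 1000, "B": 2000}  # stands in for data items on disk

def write(txn_id, item, new_value):
    old_value = database[item]
    # 1. Create the log record for this write first (write-ahead rule).
    log.append((txn_id, item, old_value, new_value))
    # 2. Only then output the modification to the database.
    database[item] = new_value

log.append(("T0", "start"))        # <T0 start>
write("T0", "A", 950)              # logs <T0, A, 1000, 950>, then updates A
log.append(("T0", "commit"))       # <T0 commit>
```

Because the old value is kept in each record, a modification already output to the database can later be undone by writing V1 back.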

Prepared By: Minu Choudhary


Deferred Database Modification
The deferred-modification technique ensures transaction atomicity by recording all database
modifications in the log, but deferring the execution of all write operations of a transaction until the
transaction partially commits.

When a transaction partially commits, the information on the log associated with the transaction is
used in executing the deferred writes. If the system crashes before the transaction completes its
execution, or if the transaction aborts, then the information on the log is simply ignored.

The execution of transaction Ti proceeds as follows:

 The deferred database modification scheme records all modifications to the log, but defers
all the writes to after partial commit.
 Assume that transactions execute serially
 Transaction starts by writing <Ti start> record to log.
 A write(X) operation results in a log record <Ti , X, V> being written, where V is the new
value for X
 Note: old value is not needed for this scheme
 The write is not performed on X at this time, but is deferred.
 When Ti partially commits, <Ti commit> is written to the log.
 Finally, the log records are read and used to actually execute the previously deferred writes.
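The steps above can be sketched as follows — a minimal Python model assuming serial execution; only the new value V is logged, since the deferred scheme never needs to undo:

```python
# Toy model of deferred database modification: write(X) only logs <Ti, X, V>;
# the database itself is changed only after the transaction partially commits.
log = []
database = {"A": 1000, "B": 2000}

def deferred_write(txn_id, item, new_value):
    log.append((txn_id, item, new_value))   # defer: log only, no DB change

def commit(txn_id):
    log.append((txn_id, "commit"))
    # Now execute the previously deferred writes recorded in the log.
    for record in log:
        if len(record) == 3 and record[0] == txn_id:
            _, item, value = record
            database[item] = value

log.append(("T0", "start"))
deferred_write("T0", "A", 950)
deferred_write("T0", "B", 2050)
assert database["A"] == 1000   # still the old value: the write was deferred
commit("T0")                   # the deferred writes are applied here
```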

Let T0 be a transaction that transfers $50 from account A to account B:


T0: read(A);
A := A − 50;
write(A);
read(B);
B := B + 50;
write(B).
Let T1 be a transaction that withdraws $100 from account C:
T1: read(C);
C := C − 100;
write(C).

Suppose that these transactions are executed serially, in the order T0 followed by T1, and that the
values of accounts A, B, and C before the execution took place were $1000, $2000, and $700,
respectively. The portion of the log containing the relevant information on these two transactions
appears in Figure 5.1. There are various orders in which the actual outputs can take place to both the
database system and the log as a result of the execution of T0 and T1. One such order appears in
Figure 5.2.

< T0 start>
< T0 , A, 950>
< T0, B, 2050>
< T0 commit>
< T1 start>
< T1 , C, 600>
< T1 commit>
Figure 5.1 Portion of the database log corresponding to T0 and T1.

The recovery scheme uses the following recovery procedure:


• redo(Ti) sets the value of all data items updated by transaction Ti to the new values.

Prepred By: Minu Choudhary Page 20


 After a failure, the recovery subsystem consults the log to determine which transactions need
to be redone.
 Transaction Ti needs to be redone if and only if the log contains both the record < Ti start>
and the record < Ti commit>.
 Thus, if the system crashes after the transaction completes its execution, the recovery scheme
uses the information in the log to restore the system to a previous consistent state after the
transaction had completed.

Figure 5.2 State of the log and database corresponding to T0 and T1.

(a)
< T0 start>
< T0, A, 950>
< T0, B, 2050>

(b)
< T0 start>
< T0, A, 950>
< T0, B, 2050>
< T0 commit>
< T1 start>
< T1, C, 600>

(c)
< T0 start>
< T0, A, 950>
< T0, B, 2050>
< T0 commit>
< T1 start>
< T1, C, 600>
< T1 commit>

Figure 5.3 The same log as that in Figure 5.1, shown at three different times.

Case a:

In Figure 5.3(a), when the system comes back up, no redo actions need to be taken, since no commit
record appears in the log. The values of accounts A and B remain $1000 and $2000, respectively. The
log records of the incomplete transaction T0 can be deleted from the log.

Case b:

In Figure 5.3(b), when the system comes back up, the operation redo(T0) is performed, since the
record < T0 commit> appears in the log on the disk. After this operation is executed, the values of
accounts A and B are $950 and $2050, respectively. The value of account C remains $700. As before,
the log records of the incomplete transaction T1 can be deleted from the log.

Case c:

In Figure 5.3(c), when the system comes back up, two commit records are in the log: one for T0 and
one for T1. Therefore, the system must perform operations redo(T0) and redo(T1), in the order in
which their commit records appear in the log. After the system executes these operations, the values
of accounts A, B, and C are $950, $2050, and $600, respectively.
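The redo-only recovery rule used in the three cases can be sketched as follows — a toy Python model; the tuple-encoded log records are an assumption of the sketch, not a standard format:

```python
# Toy model of deferred-modification recovery: redo Ti iff the log contains
# both <Ti start> and <Ti commit>; records of incomplete txns are ignored.
def recover(log, database):
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    for rec in log:                     # forward scan, in log order
        if len(rec) == 3 and rec[0] in committed:
            txn, item, new_value = rec
            database[item] = new_value  # redo: re-install the new value
    return database

# The log of case (b): T0 committed, T1 incomplete at the crash.
log = [("T0", "start"), ("T0", "A", 950), ("T0", "B", 2050),
       ("T0", "commit"), ("T1", "start"), ("T1", "C", 600)]
db = recover(log, {"A": 1000, "B": 2000, "C": 700})
# T0 is redone (A = 950, B = 2050); T1's record is ignored, so C stays 700.
```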

Immediate Database Modification


The immediate-modification technique allows database modifications to be output to the database
while the transaction is still in the active state. Data modifications written by active transactions are
called uncommitted modifications. In the event of a crash or a transaction failure, the system must



use the old-value field of the log records to restore the modified data items to the value they had prior
to the start of the transaction. As an illustration, let us reconsider our simplified banking system, with
transactions T0 and T1 executed one after the other in the order T0 followed by T1. The portion of the
log containing the relevant information concerning these two transactions appears in Figure 5.4.
Figure 5.5 shows one possible order in which the actual outputs took place in both the database
system and the log as a result of the execution of T0 and T1.

< T0 start>
< T0 , A, 1000, 950>
< T0 , B, 2000, 2050>
<T0 commit>
< T1 start>
< T1 , C, 700, 600>
< T1 commit>
Figure 5.4 Portion of the system log corresponding to T0 and T1.

Log Database
< T0 start>
< T0 , A, 1000, 950>
< T0 , B, 2000, 2050>
A = 950
B = 2050
< T0 commit>
< T1 start>
< T1 , C, 700, 600>
C = 600
< T1 commit>
Figure 5.5 State of system log and database corresponding to T0 and T1.

Using the log, the system can handle any failure that does not result in the loss of information in
nonvolatile storage. The recovery scheme uses two recovery procedures:
• undo(Ti) restores the value of all data items updated by transaction Ti to the old values.
• redo(Ti) sets the value of all data items updated by transaction Ti to the new values.

After a failure has occurred, the recovery scheme consults the log to determine which transactions
need to be redone, and which need to be undone:
i. Transaction Ti needs to be undone if the log contains the record < Ti start>, but does not
contain the record < Ti commit>.
ii. Transaction Ti needs to be redone if the log contains both the record < Ti start> and the
record < Ti commit>.

As an illustration, return to our banking example, with transaction T0 and T1 executed one after the
other in the order T0 followed by T1. Suppose that the system crashes before the completion of the
transactions. We shall consider three cases. The state of the logs for each of these cases appears in
Figure 5.6.



Figure 5.6 The same log, shown at three different times.

In Figure 5.6(a), when the system comes back up, it finds the record < T0 start> in the log, but no
corresponding < T0 commit> record. Thus, transaction T0 must be undone, so an undo(T0) is
performed. As a result, the values in accounts A and B (on the disk) are restored to $1000 and $2000,
respectively.

In Figure 5.6(b), when the system comes back up, two recovery actions need to be taken. The operation
undo(T1) must be performed, since the record < T1 start> appears in the log, but there is no record <
T1 commit>. The operation redo(T0)must be performed, since the log contains both the record < T0
start> and the record < T0 commit>. At the end of the entire recovery procedure, the values of
accounts A, B, and C are $950, $2050, and $700, respectively. Note that the undo(T1) operation is
performed before the redo(T0). In this example, the same outcome would result if the order were
reversed.

In Figure 5.6(c), when the system comes back up, both T0 and T1 need to be redone, since the records <
T0 start> and < T0 commit> appear in the log, as do the records < T1 start> and < T1 commit>. After
the system performs the recovery procedures redo(T0) and redo(T1), the values in accounts A, B, and
C are $950, $2050, and $600, respectively.
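The undo/redo decision rules can be sketched as follows — a toy Python model that reproduces the case (b) outcome; undoing backwards and then redoing forwards follows the procedure described in the text:

```python
# Toy model of immediate-modification recovery with records
# (txn, "start"), (txn, item, old, new), (txn, "commit").
def recover(log, database):
    started = {rec[0] for rec in log if rec[1:] == ("start",)}
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    # undo(Ti): backward scan restoring old values of uncommitted txns
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in started - committed:
            database[rec[1]] = rec[2]
    # redo(Ti): forward scan re-installing new values of committed txns
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            database[rec[1]] = rec[3]
    return database

# The log of case (b): T0 committed, T1 incomplete at the crash.
log = [("T0", "start"), ("T0", "A", 1000, 950), ("T0", "B", 2000, 2050),
       ("T0", "commit"), ("T1", "start"), ("T1", "C", 700, 600)]
db = recover(log, {"A": 950, "B": 2050, "C": 600})
# undo(T1) restores C to 700; redo(T0) re-installs A = 950 and B = 2050.
```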

Checkpoints
When a system failure occurs, we must consult the log to determine those transactions that need to be
redone and those that need to be undone. In principle, we need to search the entire log to determine
this information. There are two major difficulties with this approach:
1. The search process is time consuming.

2. Most of the transactions that, according to our algorithm, need to be redone have already
written their updates into the database. Although redoing them will cause no harm, it will
nevertheless cause recovery to take longer.

To reduce these types of overhead, we introduce checkpoints. During execution, the system maintains
the log, using one of the two techniques, i.e. deferred or immediate database modification. In addition,
the system periodically performs checkpoints, which require the following sequence of actions to
take place:

1. Output onto stable storage all log records currently residing in main memory.

2. Output to the disk all modified buffer blocks.

3. Output onto stable storage a log record <checkpoint>.
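The three actions can be sketched as follows — a toy Python model; the lists and dicts standing in for the log buffer, buffer pool, and stable storage are illustrative:

```python
# Toy model of taking a checkpoint. No transaction updates are allowed
# while these three steps run.
stable_log = []                    # log records already on stable storage
log_buffer = [("T2", "B", 7, 8)]   # log records still in main memory
disk_pages = {}                    # database pages on disk
dirty_buffers = {"B": 8}           # modified buffer blocks not yet on disk

def checkpoint():
    stable_log.extend(log_buffer)        # 1. force the in-memory log records
    log_buffer.clear()
    disk_pages.update(dirty_buffers)     # 2. force all modified buffer blocks
    dirty_buffers.clear()
    stable_log.append(("checkpoint",))   # 3. write the <checkpoint> record

checkpoint()
```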



Transactions are not allowed to perform any update actions, such as writing to a buffer block or
writing a log record, while a checkpoint is in progress. The presence of a <checkpoint> record in the
log allows the system to streamline its recovery procedure.

During recovery we need to consider only the most recent transaction Ti that started before the
checkpoint, and transactions that started after Ti.

1. Scan backwards from end of log to find the most recent <checkpoint> record.
2. Continue scanning backwards until a record <Ti start> is found.
3. Once the system has identified transaction Ti the redo and undo operations need to be
applied to only transaction Ti and all transactions Tj that started executing after
transaction Ti.
4. For all transactions (starting from Ti or later) with no <Ti commit>, execute
undo(Ti).
5. Scanning forward in the log, for all transactions starting from Ti or later with a <Ti
commit>, execute redo(Ti).
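The checkpoint-limited scan above can be sketched as follows — a toy Python model assuming serial transactions and tuple-encoded log records; all names are illustrative:

```python
# Toy model: find the most recent <checkpoint>, then the most recent
# <Ti start> before it; only Ti and later transactions need recovery work.
def txns_to_consider(log):
    cp = max(i for i, rec in enumerate(log) if rec == ("checkpoint",))
    ti = next(rec[0] for rec in reversed(log[:cp]) if rec[1:] == ("start",))
    order = [rec[0] for rec in log if rec[1:] == ("start",)]
    return order[order.index(ti):]

log = [("T1", "start"), ("T1", "A", 5, 6), ("T1", "commit"),
       ("T2", "start"), ("T2", "B", 7, 8),
       ("checkpoint",),
       ("T2", "commit"), ("T3", "start"), ("T3", "commit"),
       ("T4", "start")]
todo = txns_to_consider(log)
# Only T2, T3 and T4 are considered: T1's updates reached disk at the
# checkpoint, so T1 can be ignored; T2 and T3 are redone, T4 is undone.
```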
As an illustration, consider the set of transactions { T0, T1, . . ., T100 } executed in the order of the
subscripts. Suppose that the most recent checkpoint took place during the execution of transaction T67.
Thus, only transactions T67, T68, . . ., T100 need to be considered during the recovery scheme. Each of
them needs to be redone if it has committed; otherwise, it needs to be undone.

Figure 5.7 Example of Checkpoints

In the above example:

 T1 can be ignored (its updates were already output to disk before the checkpoint).
 T2 and T3 are redone.
 T4 is undone.

Shadow Paging

An alternative to log-based crash-recovery techniques is shadow paging. Under certain
circumstances, shadow paging may require fewer disk accesses than the log-based methods do. As
before, the database is partitioned into some number of fixed-length blocks, which are referred to as
pages. Assume that there are n pages, numbered 1 through n. We use a page table with n entries,
one for each database page. Each entry contains a pointer to a page on disk. The first entry
contains a pointer to the first page of the database, the second entry points to the second page,
and so on. The example in Figure 5.8 shows that the logical order of database pages need not
correspond to the physical order in which the pages are placed on disk.



Figure 5.8 Sample page table

The key idea behind the shadow-paging technique is to maintain two page tables during the life of a
transaction: the current page table and the shadow page table. When the transaction starts, both
page tables are identical. The shadow page table is never changed over the duration of the transaction.
The current page table may be changed when a transaction performs a write operation. All input and
output operations use the current page table to locate database pages on disk.

Whenever any page is about to be written for the first time:

 A copy of this page is made onto an unused page.
 The current page table is then made to point to the copy.
 The update is performed on the copy.
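The copy-on-write steps can be sketched as follows — a toy Python model in which a growable list models disk pages and pages hold plain strings:

```python
# Toy model of shadow paging's copy-on-write for a 4-page database.
disk = ["p1", "p2", "p3", "p4"]   # physical pages on disk
shadow = [0, 1, 2, 3]             # shadow page table: never changed
current = list(shadow)            # current page table: starts identical

def write_page(logical_page, new_contents):
    if current[logical_page] == shadow[logical_page]:
        # First write to this page: copy it onto an unused page and
        # make the current page table point to the copy.
        disk.append(disk[current[logical_page]])
        current[logical_page] = len(disk) - 1
    disk[current[logical_page]] = new_contents  # update only the copy

write_page(3, "p4-new")
# The shadow table still reaches the old contents; only current sees the new.
```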
Figure 5.9 shows the shadow and current page tables for a transaction performing a write to the fourth
page of a database consisting of 10 pages. Intuitively, the shadow-page approach to recovery is to
store the shadow page table in nonvolatile storage, so that the state of the database prior to the
execution of the transaction can be recovered in the event of a crash, or transaction abort. When the
transaction commits, the system writes the current page table to nonvolatile storage.



Figure 5.9 shadow and current page tables

To commit a transaction, we must do the following:

1. Ensure that all buffer pages in main memory are output to disk.
2. Output the current page table to disk.
3. Output the disk address of the current page table to the fixed location in stable storage
containing the address of the shadow page table. This action overwrites the address of the old
shadow page table. Therefore, the current page table has become the shadow page table, and
the transaction is committed.
• No recovery is needed after a crash — new transactions can start right away, using the
shadow page table.
• Pages not pointed to from current/shadow page table should be freed (garbage collected).
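The commit sequence can be sketched as follows — a toy Python model; step 3's single pointer overwrite in stable storage is the atomic commit point, and the names are illustrative:

```python
# Toy model of a shadow-paging commit. Steps 1 and 2 (flushing buffer pages
# and the current page table to disk) are assumed already done.
stable_storage = {"page_table_addr": "shadow_table_addr"}

def commit(current_table_addr):
    # Step 3: overwrite the fixed-location pointer. After this single
    # write, the current page table has become the shadow page table.
    stable_storage["page_table_addr"] = current_table_addr

commit("current_table_addr")
```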

Advantages of shadow-page technique


1. The overhead of log-record output is eliminated.
2. Recovery from crashes is significantly faster (since no undo or redo operations are needed).

Drawbacks to the shadow-page technique

• Commit overhead. The commit of a single transaction using shadow paging requires multiple
blocks to be output—the actual data blocks, the current page table, and the disk address of the current
page table. Log-based schemes need to output only the log records, which, for typical small
transactions, fit within one block.

The overhead of writing an entire page table can be reduced by implementing the page table as a tree
structure, with page table entries at the leaves.



• Data fragmentation. Shadow paging causes database pages to change location when they are
updated. As a result, either we lose the locality property of the pages or we must resort to more
complex, higher-overhead schemes for physical storage management.

• Garbage collection. Each time that a transaction commits, the database pages containing the old
version of data changed by the transaction become inaccessible. In Figure 5.9, the page pointed to by
the fourth entry of the shadow page table will become inaccessible once the transaction of that
example commits. Such pages are considered garbage, since they are not part of free space and do not
contain usable information. Garbage may be created also as a side effect of crashes. Periodically, it is
necessary to find all the garbage pages, and to add them to the list of free pages. This process, called
garbage collection, imposes additional overhead and complexity on the system. There are several
standard algorithms for garbage collection.

