Transaction Management
1. What is a Transaction?
A transaction is a logical unit of database processing that includes one or more operations (like reading,
writing, updating) on the database. The term transaction refers to a collection of operations that
form a single logical unit of work. For instance, transfer of money from one account to another is a
transaction consisting of two updates, one to each account.
All these steps must happen together. If any step fails, none of them should have any effect.
Every transaction should start with a consistent state of the database and leave it in a consistent state after it finishes. Consistency constraints (like primary key, foreign key, and domain constraints) must never be violated: if a transaction is applied to a consistent database and the transaction logic is correct, it will result in a new consistent state.
When transactions run concurrently, one transaction could observe the intermediate, inconsistent values written by another. To prevent this, isolation mechanisms like locking or multiversion concurrency control are used.
Even if there is a power failure, system crash, or disk failure after the commit, the changes must remain.
Durability is achieved using logs and recovery techniques (such as write-ahead logging and the ARIES protocol), which guarantee that committed data will not be lost.
In real systems, multiple users and applications access the database simultaneously. The DBMS uses concurrency-control protocols (such as 2PL, timestamps, and MVCC) to prevent interference between concurrent transactions and to ensure serializability.
a. Handling Failures
Types of failures:
o Transaction failure (e.g., divide by zero, logical error)
o System crash (e.g., OS crash, power failure)
o Disk failure
DBMS ensures atomicity (all-or-nothing) and durability by:
o Logging changes
o Rollback for uncommitted transactions
o Recovery from system failures
b. Handling Concurrent Execution
Problems that can arise when transactions interleave without control:
o Lost Update
o Temporary Inconsistency
o Cascading Abort
A transaction is a unit of program execution that accesses and possibly updates various data items. To preserve the integrity of data, the database system must ensure the following properties (the ACID properties):
1. Atomicity
Definition:
Atomicity ensures that a transaction is treated as a single indivisible unit. Either all operations in the
transaction are executed successfully, or none are.
Example:
Consider a transaction to transfer ₹1000 from Account A to Account B:
1. Debit ₹1000 from A
2. Credit ₹1000 to B
If the system crashes after step 1 but before step 2, the database will be in an inconsistent state.
With atomicity, the completed step is rolled back, and no money is transferred.
How it is ensured:
Using logs and rollback mechanisms. If failure occurs before commit, the transaction is aborted and
changes are undone.
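As a rough illustration (not real DBMS internals), the sketch below makes the transfer all-or-nothing by recording undo information and rolling back on any failure; the starting balances and the simulate_crash flag are assumptions made here for the example.

```python
# Illustrative sketch of atomicity: keep undo records and roll back on failure.
accounts = {"A": 3000, "B": 2000}                  # assumed starting balances

def transfer(amount, src, dst, simulate_crash=False):
    undo_log = []                                  # (item, old value) pairs
    try:
        undo_log.append((src, accounts[src]))
        accounts[src] -= amount                    # step 1: debit src
        if simulate_crash:
            raise RuntimeError("crash between debit and credit")
        undo_log.append((dst, accounts[dst]))
        accounts[dst] += amount                    # step 2: credit dst
    except Exception:
        for item, old in reversed(undo_log):       # rollback: restore old values
            accounts[item] = old
        print("transaction aborted, changes undone")

transfer(1000, "A", "B", simulate_crash=True)
print(accounts)    # {'A': 3000, 'B': 2000} -- all-or-nothing preserved
```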
2. Consistency
Definition:
Consistency ensures that a transaction takes the database from one consistent state to another.
It must preserve all integrity constraints.
Example:
In a banking system, the rule may be:
Total amount = A's balance + B's balance = ₹5000
Transaction:
A has ₹3000
B has ₹2000
Transfer ₹1000 from A to B
After transaction:
A has ₹2000
B has ₹3000
Total = ₹5000 ✅ → Database is consistent
If a bug deducts ₹1000 from A but doesn’t credit B, the total becomes ₹4000 ❌ → violates consistency
How it is ensured:
By enforcing integrity constraints and correct transaction logic.
3. Isolation
Definition:
Isolation ensures that concurrent execution of transactions does not interfere with each other.
Each transaction should execute as if it were the only one in the system.
Example:
Suppose transaction T1 transfers money from account A to account B while transaction T2 reads both balances. If T2 reads A's balance after T1's debit but before the corresponding credit to B, it might see a temporary, incorrect total.
How it is ensured:
Using concurrency-control techniques such as locking (e.g., two-phase locking), timestamp ordering, and multiversion concurrency control (MVCC).
4. Durability
Definition:
Durability ensures that once a transaction commits, its changes are permanent, even in the event of
system crashes.
Example:
Suppose a transaction inserts a new record and commits, and the system crashes immediately afterwards. With durability, when the system restarts, the record must still be there.
How it is ensured:
By writing log records to stable storage before the commit completes (write-ahead logging) and by recovery procedures that redo committed changes after a crash.
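The sketch below illustrates the write-ahead idea in miniature: a log record describing the committed change is forced to disk before the commit is considered complete. The file name and record format are assumptions, not any particular DBMS's log format.

```python
# Toy write-ahead-logging sketch: force the commit log record to disk first.
import json, os

LOG_FILE = "txn.log"    # assumed log file name

def commit(txn_id, changes):
    with open(LOG_FILE, "a") as log:
        record = {"txn": txn_id, "changes": changes, "status": "commit"}
        log.write(json.dumps(record) + "\n")
        log.flush()
        os.fsync(log.fileno())      # log record reaches nonvolatile storage
    # Only now may the database pages be updated in place; after a crash,
    # recovery replays committed log records, so the change is never lost.

commit("T1", {"A": 2000, "B": 3000})
```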
Transaction States
A transaction passes through several states during its lifetime. Understanding these states helps manage failures, rollbacks, commits, and concurrency properly.
1. Active State
Definition:
The transaction is in the active state when it is being executed.
Activities:
o Reading data
o Writing data (tentative)
o Performing computations
Note:
This is the initial state of every transaction.
Transition:
o If execution is successful → Partially Committed
o If an error occurs → Failed
2. Partially Committed State
Definition:
The transaction has completed its final statement and is almost ready to commit.
Activities:
o The system performs final checks.
o Updates are written to the log, not yet to the database.
Transition:
o If commit is successful → Committed
o If a failure occurs before commit → Failed
3. Failed State
Definition:
The transaction cannot proceed due to an error or failure.
Causes:
o Logical errors (e.g., divide by zero)
o System crashes
o Violated integrity constraints
Action:
o The system undoes the transaction's changes using the log.
o Then moves to Aborted State.
4. Aborted State
Definition:
The transaction has been rolled back, and all changes made are undone.
Two Possibilities:
o Restart the transaction: The system can restart the transaction, but only if it was aborted as a result of some hardware or software error that was not created through the internal logic of the transaction. A restarted transaction is considered to be a new transaction.
o Terminate permanently: The system can kill the transaction. It usually does so because of some internal logical error that can be corrected only by rewriting the application program, because the input was bad, or because the desired data were not found in the database.
We must be cautious when dealing with observable external writes, such as writes to a user’s
screen, or sending email.
Note:
The database is returned to its consistent state using undo operations.
5. Committed State
Definition:
The transaction has completed successfully and permanently saved its changes in the
database.
Durability Ensured:
o Even if the system crashes now, the changes will not be lost.
o Ensured by Write-Ahead Logging (WAL).
Summary Table
Active: Transaction is executing
Partially Committed: All statements are executed, final step pending
Failed: Error detected, transaction cannot continue
Aborted: Changes undone, may restart or terminate
Committed: All changes are saved permanently
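As a quick illustration, the state diagram summarized above can be written as a transition table; the event names are informal labels chosen here, not standard terminology.

```python
# Transaction state diagram as a transition table (event names are informal).
TRANSITIONS = {
    ("Active", "last statement executed"): "Partially Committed",
    ("Active", "error"): "Failed",
    ("Partially Committed", "commit succeeds"): "Committed",
    ("Partially Committed", "failure before commit"): "Failed",
    ("Failed", "rollback completed"): "Aborted",
}

def next_state(state, event):
    return TRANSITIONS.get((state, event), state)

state = "Active"
for event in ("last statement executed", "commit succeeds"):
    state = next_state(state, event)
print(state)    # Committed
```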
Storage Types
1. Volatile Storage
2. Nonvolatile Storage
3. Stable Storage
🔹 1. Volatile Storage
✅ Definition:
Volatile storage refers to memory that loses data when power is lost or a system crash
occurs.
🧠 Examples:
o Main memory (RAM)
o Cache memory
⚡ Characteristics:
o Very fast access, since data is held directly in memory
❌ Limitation:
o Contents are lost on a system crash or power failure, so volatile storage alone cannot provide durability
🔹 2. Nonvolatile Storage
✅ Definition:
Nonvolatile storage retains data even if the system crashes or is powered off.
🧠 Examples:
Secondary storage:
o Magnetic disks (e.g., HDDs)
o Flash storage (e.g., SSDs)
Tertiary storage:
o Optical media (CDs/DVDs)
o Magnetic tapes
🛠 Characteristics:
o Slower to access than volatile storage, but much larger in capacity
⚠ Limitation:
Though data survives crashes, hardware failures (e.g., disk crash) can still cause data loss.
Not sufficient alone to guarantee stable long-term durability.
🔹 3. Stable Storage
✅ Definition:
Stable storage is designed to never lose information—even during power failures, crashes,
or hardware faults.
📝 Note:
Perfect stable storage is theoretical (nothing is 100% fail-proof).
Practically, we approximate it using redundancy and careful design.
Redundancy:
o Store multiple copies of data across independent nonvolatile storage devices.
Failure-Resistant Techniques:
o If one copy fails, others remain intact.
Safe update protocols:
o Ensure no data is lost if a failure occurs during an update.
RAID systems often help approximate stable storage.
Battery-backed RAM and write-ahead logging also contribute.
✅ Atomicity
Either all updates of a transaction reach the database or none do; a log kept on stable storage allows incomplete transactions to be undone after a failure.
✅ Durability
Once a transaction is committed, its effects must persist, even if the system crashes afterward. This is ensured by writing committed data to stable storage.
Transaction Isolation
🔹 Definition
Transaction isolation refers to the ability of a database to allow multiple transactions to occur concurrently
without interfering with each other and still maintain data consistency.
Though serial execution (one after another) ensures consistency easily, concurrent execution is preferred
due to:
1. Improved Throughput & Resource Utilization
o Transactions involve both CPU and I/O operations.
o While one transaction is waiting for disk I/O, the CPU can process another
transaction.
o Multiple transactions can use CPU and disk in parallel, which:
Increases system throughput (more transactions completed per unit time)
Improves CPU and disk utilization
Reduces system idle time
2. Reduced Waiting Time
o Some transactions are short, others are long.
o In serial execution, short transactions must wait for longer ones.
o With concurrency:
Transactions can run in parallel if they use different data
Reduces unpredictable delays
Decreases average response time
Concurrent Execution
Consider a simplified banking system that has several accounts, and a set of transactions that access and update those accounts. Suppose the current values of accounts A and B are $1000 and $2000, respectively.
Suppose also that the two transactions are executed one at a time in the order T1 followed by T2, with the instruction steps in chronological order from top to bottom. The final values of accounts A and B after this execution are $855 and $2145, respectively.
Similarly, if the transactions are executed one at a time in the order T2 followed by T1, the corresponding serial execution again leaves the database in a consistent state: the sum A + B is preserved.
The execution sequences just described are called schedules. They represent the chronological order in which instructions are executed in the system. A schedule for a set of transactions must consist of all instructions of those transactions, and must preserve the order in which the instructions appear in each individual transaction.
These schedules are serial, and for a set of n transactions there exist n factorial (n!) different valid serial schedules.
If the two transactions are instead executed concurrently, their instructions may be interleaved. After the execution of one such schedule, we arrive at a state where the final values of accounts A and B are $950 and $2100, respectively. This final state is an inconsistent state, since we have gained $50 in the process of the concurrent execution; indeed, the sum A + B is not preserved by the execution of the two transactions.
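The sketch below reproduces this example under the usual textbook assumption (not stated explicitly above) that T1 transfers $50 from A to B and T2 transfers 10% of A's balance to B. Each transaction works on local copies and writes them back; the chosen interleaving yields the inconsistent $950/$2100 outcome.

```python
# Simulate serial and interleaved schedules of the two bank transactions.
def run(schedule):
    db = {"A": 1000, "B": 2000}
    buf = {"T1": {}, "T2": {}}                     # per-transaction local copies
    for txn, op in schedule:
        b = buf[txn]
        if op == "read_A":
            b["A"] = db["A"]
        elif op == "read_B":
            b["B"] = db["B"]
        elif op == "compute":                      # T1: amt = 50; T2: amt = 10% of A
            b["amt"] = 50 if txn == "T1" else b["A"] * 0.10
            b["A"] -= b["amt"]
        elif op == "add_B":
            b["B"] += b["amt"]
        elif op == "write_A":
            db["A"] = b["A"]
        elif op == "write_B":
            db["B"] = b["B"]
    return db, db["A"] + db["B"]

T1 = [("T1", s) for s in ("read_A", "compute", "write_A", "read_B", "add_B", "write_B")]
T2 = [("T2", s) for s in ("read_A", "compute", "write_A", "read_B", "add_B", "write_B")]

print(run(T1 + T2))                                # serial T1 then T2: A=855, B=2145, sum=3000
interleaved = T1[:2] + T2[:4] + T1[2:] + T2[4:]
print(run(interleaved))                            # concurrent schedule: A=950, B=2100, sum=3050
```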
Serializability
🔹 What is Serializability?
A concurrent schedule is serializable if its effect on the database is the same as that of some serial schedule of the same transactions.
When steps from multiple transactions are interleaved, it is harder to determine whether the result is the same as that of a serial execution. Transactions are programs with many instructions, so rather than analyzing every detail, we consider only their read and write operations on data items.
Conflict Serializability
🔹 Swapping Instructions
In a schedule, if two consecutive instructions I (from transaction Ti) and J (from Tj, where i ≠
j):
o Access different data items → they can be swapped safely.
o Access the same data item (Q) → the order may affect the result, depending on the
operation types.
🔹 Key Point
If a schedule S can be transformed into a schedule S′ by a series of swaps of nonconflicting instructions, we say that S and S′ are conflict equivalent.
The concept of conflict equivalence leads to the concept of conflict serializability. We say that a schedule S is conflict serializable if it is conflict equivalent to a serial schedule.
If the precedence graph for S has a cycle, then schedule S is not conflict serializable.
If the graph contains no cycles, then the schedule S is conflict serializable.
A serializability order of the transactions can be obtained by finding a linear
order consistent with the partial order of the precedence graph. This process is
called topological sorting.
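A small sketch of this test, assuming a schedule encoded as (transaction, operation, data item) triples: build the precedence graph from conflicting pairs, then attempt a topological sort; if not every transaction can be ordered, the graph has a cycle and the schedule is not conflict serializable.

```python
from collections import defaultdict

def precedence_graph(schedule):
    """schedule: list of (txn, op, item) with op in {'read', 'write'}."""
    edges = defaultdict(set)
    for i, (ti, opi, qi) in enumerate(schedule):
        for tj, opj, qj in schedule[i + 1:]:
            # conflicting pair: same item, different transactions, at least one write
            if ti != tj and qi == qj and "write" in (opi, opj):
                edges[ti].add(tj)                  # edge Ti -> Tj
    return edges

def conflict_serial_order(schedule, txns):
    """Topological sort of the precedence graph; None means a cycle exists."""
    edges = precedence_graph(schedule)
    indeg = {t: 0 for t in txns}
    for t in edges:
        for u in edges[t]:
            indeg[u] += 1
    order = []
    ready = [t for t in txns if indeg[t] == 0]
    while ready:
        t = ready.pop()
        order.append(t)
        for u in edges[t]:
            indeg[u] -= 1
            if indeg[u] == 0:
                ready.append(u)
    return order if len(order) == len(txns) else None

s = [("T1", "read", "A"), ("T2", "write", "A"), ("T1", "write", "A")]
print(conflict_serial_order(s, ["T1", "T2"]))      # None: cycle T1 -> T2 -> T1
```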
So far, we have studied schedules while assuming implicitly that there are no transaction failures. We now
address the effect of transaction failures during concurrent execution.
If a transaction Ti fails, for whatever reason, we need to undo the effect of this transaction to ensure the
atomicity property of the transaction. In a system that allows concurrent execution, the atomicity
property requires that any transaction Tj that is dependent on Ti (that is, Tj has read data written by
Ti) is also aborted. To achieve this, we need to place restrictions on the type of schedules permitted
in the system.
Recoverable Schedules
Schedule 9 (referred to above) is a partial schedule, because we have not included a commit or abort operation for T6; it is an example of a nonrecoverable schedule. A recoverable schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data item previously written by Ti, the commit operation of Ti appears before the commit operation of Tj.
Cascadeless Schedules
Cascading rollback is undesirable, since it leads to the undoing of a significant amount of work. It is
desirable to restrict the schedules to those where cascading rollbacks cannot occur. Such schedules
are called cascadeless schedules.
Formally, a cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a
data item previously written by Ti , the commit operation of Ti appears before the read operation of
Tj . It is easy to verify that every cascadeless schedule is also recoverable.
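A sketch of the recoverability test, assuming a schedule encoded as (transaction, action, item) triples; the T6/T7 transaction names simply mirror the nonrecoverable example mentioned above.

```python
def is_recoverable(schedule):
    """schedule: list of (txn, action, item) with action in {'read', 'write', 'commit'}."""
    commit_pos = {t: i for i, (t, a, _) in enumerate(schedule) if a == "commit"}
    last_writer = {}                    # item -> transaction that last wrote it
    reads_from = set()                  # (Tj, Ti): Tj read an item last written by Ti
    for t, action, item in schedule:
        if action == "write":
            last_writer[item] = t
        elif action == "read" and last_writer.get(item, t) != t:
            reads_from.add((t, last_writer[item]))
    for tj, ti in reads_from:
        if tj in commit_pos and commit_pos.get(ti, float("inf")) > commit_pos[tj]:
            return False                # Tj committed before Ti did: not recoverable
    return True

schedule_9 = [("T6", "write", "A"), ("T7", "read", "A"), ("T7", "commit", None)]
print(is_recoverable(schedule_9))       # False: T7 reads T6's write but commits first
```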
Isolation Levels
• Serializable usually ensures serializable execution. However, some database systems implement this isolation level in a manner that may, in certain cases, allow nonserializable executions.
• Repeatable read allows only committed data to be read and further requires
that, between two reads of a data item by a transaction, no other transaction
is allowed to update it. However, the transaction may not be serializable
with respect to other transactions. For instance, when it is searching for data
satisfying some conditions, a transaction may find some of the data inserted
by a committed transaction, but may not find other data inserted by the same
transaction.
• Read committed allows only committed data to be read, but does not require repeatable reads. For instance, between two reads of a data item by the transaction, another transaction may have updated the data item and committed.
Concurrency Control
When several transactions execute concurrently in the database, the isolation property may no longer be preserved. To ensure that it is, the system uses one of a variety of concurrency-control schemes. No one scheme is clearly the best; each one has advantages. In practice, the most frequently used schemes are two-phase locking and snapshot isolation.
Lock-Based Protocols
One way to ensure isolation is to require that data items be accessed in a mutually exclusive manner; that is, while one transaction is accessing a data item, no other transaction can modify it. The most common method used to implement this requirement is to allow a transaction to access a data item only if it is currently holding a lock on that item.
Locks
There are various modes in which a data item may be locked. In this section, we restrict our attention to
two modes:
1. Shared. If a transaction Ti has obtained a shared-mode lock (denoted by S) on item Q, then Ti can read,
but cannot write, Q.
2. Exclusive. If a transaction Ti has obtained an exclusive-mode lock (denoted by X) on item Q, then Ti can
both read and write Q.
The transaction makes the request to the concurrency-control manager. The transaction can proceed with
the operation only after the concurrency-control manager grants the lock to the transaction. The use
of these two lock modes allows multiple transactions to read a data item but limits write access to
just one transaction at a time.
Suppose that transaction Ti requests a lock in mode A on item Q, on which another transaction currently holds a lock in mode B. If Ti can be granted the lock immediately, in spite of the presence of the mode B lock, then we say mode A is compatible with mode B. At any time, several shared-mode locks can be held simultaneously (by different transactions) on a particular data item. A subsequent exclusive-mode lock request has to wait until the currently held shared-mode locks are released.
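In code, the compatibility rule for these two modes can be expressed as a small table plus a check against the locks currently held; this is only an illustration of the rule, not a lock manager.

```python
# Shared/exclusive lock compatibility: a request is granted immediately only
# if the requested mode is compatible with every lock currently held.
COMPATIBLE = {("S", "S"): True, ("S", "X"): False,
              ("X", "S"): False, ("X", "X"): False}

def can_grant(requested_mode, held_modes):
    return all(COMPATIBLE[(requested_mode, held)] for held in held_modes)

print(can_grant("S", ["S", "S"]))   # True  -- many shared locks coexist
print(can_grant("X", ["S"]))        # False -- exclusive request must wait
```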
Transaction Ti may unlock a data item that it had locked at some earlier point. Note that a transaction must
hold a lock on a data item as long as it accesses that item. Moreover, it is not necessarily desirable
for a transaction to unlock a data item immediately after its final access of that data item, since
serializability may not be ensured.
Suppose that the values of accounts A and B are $100 and $200, respectively, and consider two transactions: one transfers money between the two accounts, and the other displays the total A + B. If these two transactions are executed serially, either in the order T1, T2 or the order T2, T1, then transaction T2 will display the value $300. If, however, these transactions are executed concurrently and locks are released too early, an inconsistent state may arise.
Suppose now that unlocking is delayed to the end of the transaction. You should verify that the sequence of reads and writes in schedule 1, which led to an incorrect total of $250 being displayed, is no longer possible.
Unfortunately, locking can lead to an undesirable situation. Consider the partial schedule of Figure 15.7 for
T3 and T4.
We have arrived at a state where neither of these transactions can ever proceed with its normal execution.
This situation is called deadlock. When deadlock occurs, the system must roll back one of the two
transactions.
If we do not use locking, or if we unlock data items too soon after reading or writing them, we may get
inconsistent states. On the other hand, if we do not unlock a data item before requesting a lock on
another data item, deadlocks may occur.
We shall require that each transaction in the system follow a set of rules, called a locking protocol,
indicating when a transaction may lock and unlock each of the data items.
A schedule S is legal under a given locking protocol if S is a possible schedule for a set of transactions that follows the rules of the locking protocol. We say that a locking protocol ensures conflict serializability if and only if all legal schedules are conflict serializable; in other words, for all legal schedules the associated precedence (→) relation is acyclic.
Granting of Locks
When a transaction requests a lock on a data item in a particular mode and no other transaction has a lock on the same data item in a conflicting mode, the lock can be granted. If a conflicting lock is held, the request must wait, and a transaction that keeps waiting while other transactions are repeatedly granted compatible locks on the same item may never make progress; this situation is called starvation.
We can avoid starvation of transactions by granting locks in the following manner: when a transaction Ti requests a lock on a data item Q in a particular mode M, the concurrency-control manager grants the lock provided that:
o There is no other transaction holding a lock on Q in a mode that conflicts with M.
o There is no other transaction that is waiting for a lock on Q and that made its lock request before Ti.
Two-Phase Locking Protocol
One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each transaction issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.
The point in the schedule where the transaction has obtained its final lock (the end of its growing phase) is
called the lock point of the transaction. Now, transactions can be ordered according to their lock
points—
this ordering is, in fact, a serializability ordering for the transactions.
Cascading rollbacks can be avoided by a modification of two-phase locking called the strict two-phase
locking protocol. This protocol requires not only that locking be two phase, but also that all
exclusive-mode locks taken by a transaction be held until that transaction commits. Another variant
of two-phase locking is the rigorous two-phase locking protocol, which requires that all locks be held
until the transaction commits.
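A minimal sketch of the two-phase rule itself, assuming a simple transaction object that records whether it has entered its shrinking phase; a real system combines this with a lock manager and, for strict or rigorous 2PL, holds locks until commit.

```python
class TwoPhaseTxn:
    """Tracks the growing/shrinking phases of a single transaction."""
    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False                     # True once any lock is released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot acquire {item} in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True                      # first release ends the growing phase
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.lock("A"); t.lock("B")                           # growing phase
t.unlock("A")                                      # shrinking phase begins (lock point passed)
try:
    t.lock("C")                                    # violates the two-phase rule
except RuntimeError as e:
    print(e)                                       # T1: cannot acquire C in shrinking phase
```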
Consider the following two transactions, for which we have shown only some of the significant read and write operations:
T8:
read(a1);
read(a2);
...
read(an);
write(a1).
T9: read(a1);
read(a2);
display(a1 + a2).
If we employ the two-phase locking protocol, then T8 must lock a1 in exclusive mode. Therefore, any
concurrent execution of both transactions amounts to a serial execution. Notice, however, that T8
needs an exclusive lock on a1 only at the end of its execution, when it writes a1. Thus, if T8 could
initially lock a1 in shared mode, and then could later change the lock to exclusive mode, we could
get more concurrency, since T8 and T9 could access a1 and a2 simultaneously.
This observation suggests a refinement of two-phase locking in which a shared lock can later be upgraded to an exclusive lock. Locks can then be acquired automatically as follows:
• When a transaction Ti issues a read(Q) operation, the system issues a lock-S(Q) instruction followed by the read(Q) instruction.
• When Ti issues a write(Q) operation, the system checks to see whether Ti already holds a shared lock on
Q. If it does, then the system issues an upgrade(Q) instruction, followed by the write(Q) instruction.
Otherwise, the system issues a lock-X(Q) instruction, followed by the write(Q) instruction.
• All locks obtained by a transaction are unlocked after that transaction commits or aborts.
The lock manager maintains a linked list of lock requests for each data item. When a lock request message arrives, the lock manager adds a record to the end of the linked list for that data item, if such a list is already present; otherwise it creates a new linked list containing only the record for the request.
The lock manager always grants a lock request on a data item that is not currently locked. If the transaction requests a lock on an item on which a lock is currently held, the lock manager grants the request only if it is compatible with the locks that are currently held and all earlier requests have already been granted. Otherwise, the request has to wait.
When the lock manager receives an unlock message from a transaction, it deletes the record for that transaction from the linked list corresponding to the data item. It then tests the record that follows, if any, as described in the previous paragraph, to see whether that request can now be granted. If it can, the lock manager grants the request and processes the record following it, if any, in the same way, and so on.
• If a transaction aborts, the lock manager deletes any waiting request made by the transaction. Once the
database system has taken appropriate actions to undo the transaction it releases all locks held by
the aborted transaction.
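Below is a toy lock manager along the lines just described: one FIFO list of requests per data item, shared and exclusive modes, and granting of waiting requests as locks are released. The data structures and method names are assumptions made for illustration.

```python
from collections import defaultdict

COMPAT = {("S", "S")}                              # the only compatible pair of modes

class LockManager:
    def __init__(self):
        self.table = defaultdict(list)             # data item -> FIFO list of [txn, mode, granted]

    def request(self, txn, item, mode):
        queue = self.table[item]
        # grant only if every earlier request is already granted and compatible
        ok = all(granted and (mode, m) in COMPAT for _, m, granted in queue)
        queue.append([txn, mode, ok])
        return ok                                  # False means the caller must wait

    def release(self, txn, item):
        queue = [r for r in self.table[item] if r[0] != txn]
        self.table[item] = queue
        for req in queue:                          # try to grant waiters in FIFO order
            if req[2]:
                continue
            if all((req[1], m) in COMPAT for _, m, g in queue if g):
                req[2] = True
            else:
                break                              # preserve FIFO: stop at the first blocked request

lm = LockManager()
print(lm.request("T1", "Q", "S"))   # True  -- item unlocked, grant immediately
print(lm.request("T2", "Q", "S"))   # True  -- shared locks are compatible
print(lm.request("T3", "Q", "X"))   # False -- exclusive request must wait
lm.release("T1", "Q"); lm.release("T2", "Q")
print(lm.table["Q"])                # [['T3', 'X', True]] -- waiter granted after releases
```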
Graph-Based Protocols
To develop protocols that are not two phase, we need additional information on how each transaction will access the database. There are various models that can give us the additional information, each
differing in the amount of information provided. The simplest model requires that we have prior
knowledge about the order in which the database items will be accessed. Given such information, it
is possible to construct locking protocols that are not two phase, but that, nevertheless, ensure
conflict serializability.
To acquire such prior knowledge, we impose a partial ordering → on the set D = {d1, d2, . . . , dh} of all data items. If di → dj, then any transaction accessing both di and dj must access
di before accessing dj . This partial ordering may be the result of either the logical or the physical
organization of the data, or it may be imposed solely for the purpose of concurrency control.
The partial ordering implies that the set D may now be viewed as a directed acyclic graph, called a
database graph.
In the tree protocol, the only lock instruction allowed is lock-X. Each transaction Ti can lock a data item at most once and must observe the following rules:
1. The first lock by Ti may be on any data item.
2. Subsequently, a data item Q can be locked by Ti only if the parent of Q is currently locked by Ti.
3. Data items may be unlocked at any time.
4. A data item that has been locked and unlocked by Ti cannot subsequently be relocked by Ti.
All schedules that are legal under the tree protocol are conflict serializable.
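A sketch of the tree-protocol rules as checks applied to a single transaction; the parent table used as the database graph is made up for illustration.

```python
# Assumed database graph: A is the root, B and C are children of A, and so on.
PARENT = {"B": "A", "C": "A", "D": "B", "E": "B"}

class TreeProtocolTxn:
    def __init__(self):
        self.held, self.ever_locked = set(), set()

    def lock(self, item):
        if item in self.ever_locked and item not in self.held:
            raise RuntimeError(f"{item}: an item unlocked earlier cannot be relocked")
        if self.ever_locked and PARENT.get(item) not in self.held:
            raise RuntimeError(f"{item}: parent must currently be locked")
        self.held.add(item)
        self.ever_locked.add(item)

    def unlock(self, item):
        self.held.discard(item)       # rule 3: unlocking is allowed at any time

t = TreeProtocolTxn()
t.lock("B")                           # rule 1: first lock may be anywhere
t.lock("D")                           # ok: parent B is currently held
t.unlock("B")                         # ok: early unlocking is permitted
try:
    t.lock("E")                       # parent B no longer held -> rejected
except RuntimeError as e:
    print(e)
```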
The tree protocol in Figure 15.12 does not ensure recoverability and cascadelessness.
To ensure recoverability and cascadelessness, the protocol can be
modified to not permit release of exclusive locks until the end of the transaction.
Holding exclusive locks until the end of the transaction reduces concurrency.
Here is an alternative that improves concurrency, but ensures only recoverability:
For each data item with an uncommitted write, we record which transaction
performed the last write to the data item. Whenever a transaction Ti performs a
read of an uncommitted data item, we record a commit dependency of Ti on the transaction that
performed the last write to the data item. Transaction Ti is then
not permitted to commit until the commit of all transactions on which it has a
commit dependency. If any of these transactions aborts, Ti must also be aborted.
The tree-locking protocol has an advantage over the two-phase locking protocol
in that, unlike two-phase locking, it is deadlock-free, so no rollbacks are
required. The tree-locking protocol has another advantage over the two-phase
locking protocol in that unlocking may occur earlier. Earlier unlocking may lead
to shorter waiting times, and to an increase in concurrency.
However, the protocol has the disadvantage that, in some cases, a transaction
may have to lock data items that it does not access. For example, a transaction that needs to access data
items A and J in the database graph of Figure 15.11 must lock not only A and J, but also data items
B, D, and H. This additional locking results in increased locking overhead, the possibility of additional
waiting time, and a potential decrease in concurrency.
Deadlock Handling
A system is in a deadlock state if there exists a set of transactions such that every transaction in the set is
waiting for another transaction in the set.
The only remedy to this undesirable situation is for the system to invoke
some drastic action, such as rolling back some of the transactions involved in
the deadlock. Rollback of a transaction may be partial: That is, a transaction may be rolled back to the
point where it obtained a lock whose release resolves the deadlock.
There are two principal methods for dealing with the deadlock problem. We
can use a deadlock prevention protocol to ensure that the system will never enter a deadlock state.
Alternatively, we can allow the system to enter a deadlock state, and then try to recover by using a
deadlock detection and deadlock recovery scheme.
Deadlock Prevention
One approach to preventing deadlocks is to use preemption and
transaction rollbacks. In preemption, when a transaction Tj requests a lock that
transaction Ti holds, the lock granted to Ti may be preempted by rolling back
of Ti , and granting of the lock to Tj . To control the preemption, we assign a
unique timestamp, based on a counter or on the system clock, to each transaction
when it begins. The system uses these timestamps only to decide whether a
transaction should wait or roll back. Locking is still used for concurrency control.
If a transaction is rolled back, it retains its old timestamp when restarted. Two
different deadlock-prevention schemes using timestamps have been proposed:
1. The wait–die scheme is a nonpreemptive technique. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp smaller than that of Tj (that is, Ti is older than Tj). Otherwise, Ti is rolled back (dies).
2. The wound–wait scheme is a preemptive technique. When transaction Ti requests a data item currently held by Tj, Ti is allowed to wait only if it has a timestamp larger than that of Tj (that is, Ti is younger than Tj). Otherwise, Tj is rolled back (Tj is wounded by Ti).
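A one-function sketch of the wait–die decision, assuming timestamps are integers assigned from a counter when each transaction starts (a smaller timestamp means an older transaction).

```python
def wait_die(requester_ts, holder_ts):
    """Return 'wait' if the requester may wait, otherwise 'die' (roll back)."""
    return "wait" if requester_ts < holder_ts else "die"

print(wait_die(5, 10))   # 'wait' -- older transaction waits for the younger one
print(wait_die(10, 5))   # 'die'  -- younger requester is rolled back
```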
Deadlock Recovery: Choosing a Victim
When a deadlock is detected, the system must roll back one of the deadlocked transactions. It should choose the victim whose rollback incurs the minimum cost, based on factors such as:
o How long the transaction has computed, and how much longer it will compute before it completes its designated task.
o How many more data items the transaction needs in order to complete.
o How many transactions will be involved in the rollback.
Multiple Granularity
Problem:
Locking many small items individually (like every record) is slow and heavyweight for big operations, but locking everything (the entire database) would kill concurrency for smaller transactions.
Solution:
Use different levels of granularity (database, area, file, record) structured as a tree. Each level can be locked separately.
Each node in the tree can be locked individually. As in the two-phase locking protocol, we shall use shared and exclusive lock modes. When a transaction locks a node, in either shared or exclusive mode, it also implicitly locks all the descendants of that node in the same lock mode. For example, if transaction Ti gets an explicit lock on a file in exclusive mode, then it has an implicit lock in exclusive mode on all the records belonging to that file.
Key ideas:
Implicit locks:
If a transaction locks a node (like a file), it automatically has an implicit lock on all children
(like records in that file).
This protocol enhances concurrency and reduces lock overhead. It is particularly useful in applications that include a mix of short and long transactions:
Short transactions that touch just a few records do not block long transactions that need a whole file (and vice versa).
Fewer locks mean less overhead.
Serializability is still maintained.
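The sketch below illustrates implicit locking in a small, made-up database/file/record hierarchy: an explicit exclusive lock on a file counts as an exclusive lock on every record in that file. It models only the implicit-lock idea, not the full multiple-granularity protocol.

```python
# Assumed granularity hierarchy: database -> files -> records.
TREE = {"DB": ["file1", "file2"], "file1": ["rec1", "rec2"], "file2": ["rec3"]}

def descendants(node):
    out = []
    for child in TREE.get(node, []):
        out.append(child)
        out.extend(descendants(child))
    return out

explicit_locks = {"T1": {("file1", "X")}}          # T1 explicitly locks file1 in X mode

def effective_lock(txn, item):
    for node, mode in explicit_locks.get(txn, set()):
        if item == node or item in descendants(node):
            return mode                            # implicit lock inherited from the ancestor
    return None

print(effective_lock("T1", "rec2"))   # 'X'  -- implicit via the lock on file1
print(effective_lock("T1", "rec3"))   # None -- rec3 is outside the locked subtree
```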