dbms_unit_4
Failure Classification
To see where the problem has occurred, we generalize a failure into various
categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from
where it cannot go any further. This is called transaction failure, where only a few
transactions or processes are affected.
Reasons for a transaction failure could be −
Logical errors − Where a transaction cannot complete because it has some
code error or any internal error condition.
System errors − Where the database system itself terminates an active
transaction because the DBMS is not able to execute it, or it has to stop
because of some system condition. For example, in case of deadlock or
resource unavailability, the system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop
abruptly and cause the system to crash. For example, interruptions in power supply
may cause the failure of underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
A disk failure occurs when a disk block loses its content as a result of a head crash,
disk corruption, or a failure during a data-transfer operation.
Storage Structure
We have already described the storage system. In brief, the storage structure can
be divided into two categories −
Volatile storage − As the name suggests, volatile storage cannot survive
system crashes. Volatile storage devices are placed very close to the CPU;
normally they are embedded on the chipset itself. Main memory and cache
memory are examples of volatile storage. They are fast but can store only a
small amount of information.
Non-volatile storage − These memories are made to survive system
crashes. They are huge in data storage capacity, but slower in accessibility.
Examples include hard disks, magnetic tapes, flash memory, and
non-volatile (battery-backed-up) RAM.
When a system crashes, it may have several transactions being executed and
various files opened for them to modify the data items. Transactions are made of
various operations, which are atomic in nature. But according to ACID properties of
DBMS, atomicity of transactions as a whole must be maintained, that is, either all
the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
It should check the states of all the transactions that were being executed.
A transaction may have been in the middle of some operation; the DBMS
must ensure the atomicity of the transaction in this case.
It should check whether the transaction can be completed now or needs to
be rolled back.
No transaction should be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques that can help a DBMS recover from a crash while
maintaining the atomicity of a transaction −
Maintaining the logs of each transaction, and writing them onto some stable
storage before actually modifying the database.
Maintaining shadow paging, where the changes are done in volatile
memory, and later, the actual database is updated.
Log-Based Recovery
When the system crashes, it consults the log to find which transactions need
to be undone and which need to be redone.
1. If the log contains both the record <Ti, Start> and the record
<Ti, Commit>, then transaction Ti needs to be redone.
2. If the log contains the record <Ti, Start> but contains neither
<Ti, Commit> nor <Ti, Abort>, then transaction Ti needs to be undone.
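A minimal sketch of these two rules in Python (the log format, a list of
(transaction, action) pairs, is an illustrative assumption rather than a real
DBMS log layout):

# Classify transactions into redo and undo lists from a simple log.
def classify_transactions(log):
    started, outcome = set(), {}
    for txn, action in log:
        if action == "start":
            started.add(txn)
        elif action in ("commit", "abort"):
            outcome[txn] = action
    redo = [t for t in sorted(started) if outcome.get(t) == "commit"]
    undo = [t for t in sorted(started) if t not in outcome]
    return redo, undo

log = [("T1", "start"), ("T1", "commit"),
       ("T2", "start")]                    # T2 has neither commit nor abort
print(classify_transactions(log))          # (['T1'], ['T2'])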
Checkpoint
o Whenever more than one transaction is being executed, the logs of the
transactions get interleaved. During recovery, it would become difficult
for the recovery system to backtrack through all the logs and then start
recovering.
o To ease this situation, the 'checkpoint' concept is used by most DBMSs.
Recovery Using Checkpoint
o The recovery system reads the log files from the end to the start, i.e.
from T4 back to T1.
o The recovery system maintains two lists: a redo-list and an undo-list.
o A transaction is put into the redo-list if the recovery system sees a
log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All the
transactions in the redo-list are redone, and their log records are
retained.
o For example: in the log file, transactions T2 and T3 will have <Tn,
Start> and <Tn, Commit>. The T1 transaction will have only <Tn,
Commit> in the log file, because T1 committed after the checkpoint was
crossed. Hence the recovery system puts T1, T2 and T3 into the
redo-list.
o A transaction is put into the undo-list if the recovery system sees a
log with <Tn, Start> but no commit or abort record for it. All the
transactions in the undo-list are undone, and their log records are
removed.
o For example: transaction T4 will have only <Tn, Start>, so T4 is put
into the undo-list, since this transaction is not yet complete and
failed in the middle of execution.
Transaction Rollback:
In this scheme, we roll back a failed transaction by using the log.
The system scans the log backward for the failed transaction; for every update log
record found, the system restores the data item to its old value.
Checkpoints:
A checkpoint is a process of saving a snapshot of the application's state so that it can
restart from that point in case of failure.
A checkpoint is a point of time at which a record is written onto the database from the
buffers.
Checkpoints shorten the recovery process.
When a checkpoint is reached, the updates recorded in the log up to that point are
written into the database, and that portion of the log file can then be discarded. The
log file is then updated with the subsequent transaction steps until the next
checkpoint, and so on.
The checkpoint declares a point before which the DBMS was in a consistent state and
all the transactions were committed.
In this scheme, we use checkpoints to reduce the number of log records that the
system must scan when it recovers from a crash.
In a concurrent transaction processing system, we require that the checkpoint log
record be of the form <checkpoint L>, where 'L' is a list of the transactions active at
the time of the checkpoint.
A fuzzy checkpoint is a checkpoint where transactions are allowed to perform updates
even while buffer blocks are being written out.
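As a rough illustration of taking a <checkpoint L> record in a concurrent
system (the data structures below are illustrative stand-ins, not a real
storage engine):

# Simulated stable storage: the on-disk log and the on-disk database pages.
stable_log, database_disk = [], {}

def take_checkpoint(log_buffer, dirty_pages, active_transactions):
    stable_log.extend(log_buffer)      # 1. force buffered log records to disk
    log_buffer.clear()
    database_disk.update(dirty_pages)  # 2. force modified buffer blocks to disk
    # 3. append the <checkpoint L> record naming the active transactions
    stable_log.append(("checkpoint", sorted(active_transactions)))

take_checkpoint([("write", "T1", "x", 5)], {"page7": "..."}, {"T1"})
print(stable_log[-1])                  # ('checkpoint', ['T1'])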
Restart Recovery:
When the system recovers from a crash, it constructs two lists.
The undo-list consists of the transactions to be undone, and the redo-list consists of
the transactions to be redone.
The system constructs the two lists as follows: initially, both are empty. The system
scans the log backward, examining each record, until it finds the first <checkpoint L>
record. For each <Ti, Commit> record found, it adds Ti to the redo-list; for each
<Ti, Start> record whose transaction is not in the redo-list, it adds Ti to the
undo-list; finally, every transaction in L that is not in the redo-list is also added
to the undo-list.
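A minimal sketch of this list construction (the log record format is an
illustrative assumption):

# Build the redo and undo lists by scanning the log backward to <checkpoint L>.
def build_lists(log):
    redo, undo = set(), set()
    for kind, body in reversed(log):           # scan from the end of the log
        if kind == "commit":
            redo.add(body)
        elif kind == "start" and body not in redo:
            undo.add(body)
        elif kind == "checkpoint":             # body is L, the active list
            undo.update(t for t in body if t not in redo)
            break                              # stop at the first checkpoint
    return redo, undo

log = [("start", "T1"), ("checkpoint", ["T1"]),
       ("start", "T2"), ("commit", "T2"), ("start", "T3")]
print(build_lists(log))                        # redo={'T2'}, undo={'T1', 'T3'}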
Introduction to Shadow Paging
Shadow paging is a recovery technique that is used to recover the database. In this
technique, the database is considered to be made up of fixed-size logical units of
storage, referred to as pages. Pages are mapped into physical blocks of storage with
the help of a page table, which has one entry for each logical page of the database.
This method uses two page tables, named the current page table and the shadow page
table.
The entries in the current page table point to the most recent database pages on disk.
The shadow page table is created when the transaction starts, by copying the current
page table. After this, the shadow page table is saved on disk, and the current page
table is used for the transaction. Entries in the current page table may be changed
during execution, but the shadow page table is never changed. After the transaction
completes successfully, both tables become identical.
This technique is also known as out-of-place updating.
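A toy sketch of the idea (lists and dicts simulate disk blocks and page
tables; a real implementation operates on actual disk blocks):

# Simulated disk: a growing list of physical blocks.
disk = ["A0", "B0", "C0"]
current_table = {0: 0, 1: 1, 2: 2}    # logical page -> physical block index

def begin_transaction():
    return dict(current_table)        # shadow page table: a saved copy

def write_page(logical_page, value):
    disk.append(value)                # out-of-place: write to a fresh block
    current_table[logical_page] = len(disk) - 1

def abort(shadow_table):
    current_table.clear()             # recovery: reinstate the shadow table
    current_table.update(shadow_table)

shadow = begin_transaction()
write_page(1, "B1")                   # old block "B0" is left untouched
abort(shadow)                         # page 1 again maps to block "B0"
print(disk[current_table[1]])         # B0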
Failure with Loss of Nonvolatile Storage
Until now, we have considered only the case where a failure results in the loss of
information residing in volatile storage while the content of the nonvolatile storage
remains intact. Although failures in which the content of nonvolatile storage is lost
are rare, we nevertheless need to be prepared to deal with this type of failure. In this
section, we discuss only disk storage. Our discussions apply as well to other
nonvolatile storage types.
The basic scheme is to dump the entire content of the database to stable storage
periodically—say, once per day. For example, we may dump the database to one or
more magnetic tapes. If a failure occurs that results in the loss of physical database
blocks, the system uses the most recent dump in restoring the database to a
previous consistent state. Once this restoration has been accomplished, the system
uses the log to bring the database system to the most recent consistent state.
More precisely, no transaction may be active during the dump procedure, and a
procedure similar to checkpointing must take place:
1. Output all log records currently residing in main memory onto stable storage.
2. Output all buffer blocks onto the disk.
3. Copy the contents of the database to stable storage.
4. Output a log record <dump> onto the stable storage.
Steps 1, 2, and 4 correspond to the three steps used for checkpoints in Section 17.4.3.
To recover from the loss of nonvolatile storage, the system restores the database to disk by
using the most recent dump. Then, it consults the log and redoes all the transactions that have
committed since the most recent dump occurred. Notice that no undo operations need to be
executed.
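A minimal sketch of this redo-only recovery (the log and snapshot formats below are
illustrative assumptions):

# Restore the latest dump, then redo only the writes of committed transactions.
def recover_from_dump(dump_snapshot, log):
    db = dict(dump_snapshot)                    # reload the archival dump
    tail = log[log.index(("dump",)) + 1:]       # records after the <dump> record
    committed = {r[1] for r in tail if r[0] == "commit"}
    for r in tail:                              # redo pass; no undo pass needed
        if r[0] == "write" and r[1] in committed:
            _, _txn, key, value = r
            db[key] = value
    return db

log = [("dump",), ("write", "T1", "x", 5), ("commit", "T1"),
       ("write", "T2", "y", 9)]                 # T2 never committed: ignored
print(recover_from_dump({"x": 1, "y": 2}, log))  # {'x': 5, 'y': 2}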
A dump of the database contents is also referred to as an archival dump, since we can archive
the dumps and use them later to examine old states of the database.
Dumps of a database and checkpointing of buffers are similar.
The simple dump procedure described here is costly for two reasons.
First, the entire database must be copied to stable storage, resulting in considerable
data transfer. Second, since transaction processing is halted during the dump
procedure, CPU cycles are wasted. Fuzzy dump schemes have been developed that allow
transactions to be active while the dump is in progress.
Database Backup and Recovery from Catastrophic Failures
So far, all the techniques we have discussed apply to noncatastrophic failures. A key
assumption has been that the system log is maintained on the disk and is not lost as a result of
the failure. Similarly, the shadow directory must be stored on disk to allow recovery when
shadow paging is used. The recovery techniques we have discussed use the entries in the
system log or the shadow directory to recover from failure by bringing the database back
to a consistent state.
The recovery manager of a DBMS must also be equipped to handle more catastrophic failures
such as disk crashes. The main technique used to handle such crashes is a database backup,
in which the whole database and the log are periodically copied onto a cheap storage medium
such as magnetic tapes or other large capacity offline storage devices. In case of a
catastrophic system failure, the latest backup copy can be reloaded from the tape to the disk,
and the system can be restarted.
Data from critical applications such as banking, insurance, stock market, and other databases
is periodically backed up in its entirety and moved to physically separate safe locations.
Subterranean storage vaults have been used to protect such data from flood, storm,
earthquake, or fire damage. Events like the 9/11 terrorist attack in New York (in 2001) and
the Katrina hurricane disaster in New Orleans (in 2005) have created a greater awareness
of disaster recovery of business-critical databases.
To avoid losing all the effects of transactions that have been executed since the last backup, it
is customary to back up the system log at more frequent intervals than full database backup
by periodically copying it to magnetic tape. The system log is usually substantially smaller
than the database itself and hence can be backed up more frequently. Therefore, users do not
lose all transactions they have performed since the last database backup. All committed
transactions recorded in the portion of the system log that has been backed up to tape can
have their effect on the database redone. A new log is started after each database backup.
Hence, to recover from disk failure, the database is first recreated on disk from its latest
backup copy on tape. Following that, the effects of all the committed transactions whose
operations have been recorded in the backed-up copies of the system log are reconstructed.
Authentication
User authentication is to make sure that the person accessing the database is who
he claims to be. Authentication can be done at the operating system level or even
the database level itself. Many authentication systems such as retina scanners or
bio-metrics are used to make sure unauthorized people cannot access the database.
Authorization
Authorization is a privilege provided by the Database Administrator. Users of the
database can only view the contents they are authorized to view. The rest of the
database is out of bounds to them.
The different integrity constraints enforced by a DBMS are the following (a small
code sketch follows the list):
Entity Integrity - This is related to the concept of primary keys. All tables
should have their own primary keys which should uniquely identify a row and
not be NULL.
Referential Integrity - This is related to the concept of foreign keys. A
foreign key is a key of a relation that is referred in another relation.
Domain Integrity - This means that there should be a defined domain for all
the columns in a database.
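To make the three constraints concrete, here is a small sketch using Python's
built-in sqlite3 module (the table and column names are made up for
illustration):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only if enabled
conn.execute("""
    CREATE TABLE department (
        dept_id INTEGER PRIMARY KEY          -- entity integrity
    )""")
conn.execute("""
    CREATE TABLE employee (
        emp_id  INTEGER PRIMARY KEY,         -- entity integrity
        salary  INTEGER CHECK (salary > 0),  -- domain integrity
        dept_id INTEGER REFERENCES department(dept_id)  -- referential integrity
    )""")
conn.execute("INSERT INTO department VALUES (10)")
conn.execute("INSERT INTO employee VALUES (1, 50000, 10)")     # accepted
try:
    conn.execute("INSERT INTO employee VALUES (2, 60000, 99)")  # no dept 99
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # FOREIGN KEY constraint failed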
Relational databases have catered to the needs of backend databases in the
application development arena for quite a long time, but there are other
types of databases that have the potential to do the same. We should not
overlook their merits while dealing with an RDBMS, as they can serve similar
purposes equally well, and possibly better. These database models are
related to the relational concepts in doing what they do best.
This article provides a perspective on the philosophies of database
systems that are designed to be different from the relational database
model yet are similar in their approach.
Introduction
There are quite a few popular variants or types of database system such
as relational (RDBMS), object-relational (ORDBMS) and object-oriented
(OODBMS). The differences lie in the architecture or design. They have
many common as well as distinct features. Here, we provide a brief
overview of each of them.
The idea of a relational database and its associated features, which form a
package called RDBMS, flowered from the basic concept of a collection of
relations. Each relation is represented as a table, and the rows in the table
represent a collection of related data values. The table and the column
names give a clue to the meaning of the values in each row. For example,
each row in the EMPLOYEE table represents a specific real-world entity,
such as the record of an employee. The column names, such as emp_id,
name, birth_date, and join_date, determine how to interpret the data values
in each record according to the column they are in. Of course, all values in a
column must be of the same data type; otherwise, consistency is violated.
There are many merits and demerits of this system. If we sum up its
merit, it is its robustness, which has stood the test of time. However,
over the years its upkeep has significantly increased both its size and
complexity. The addition and modification of features, as well as the tools
built to support changing needs, often had to coerce different technologies
together. This has led to unavoidable complexity.
Even with many pros and cons to deal with, relational database
management systems (RDBMS) are here to stay. This is true even in the
day and age of big data, NoSQL and what not. Professionals have
devoted a lot of their expertise to improving it and assimilating it with many
new technological systems. When the wave of object-oriented
technology came to the forefront of application development, RDBMS
seemed incompatible. But the two were bridged with
techniques like Hibernate, JTA and the like. These techniques became so
popular and effective that an RDBMS need not seem like a system of yore
and can be quite a reliable backend companion in many projects. The fact
is, there are many old systems still in use today and many new
systems thriving on the relational model.
Note that the core technology used in an ORDBMS is based on the
relational model. Commercial implementations simply added a layer
of object-oriented principles on top of the relational database
management system. The simplest example is Microsoft SQL Server.
Since this system is based on the relational model, one more problem is
added to it: translating object-oriented concepts to relational mechanisms.
However, this problem is mitigated by an object-relational mapping layer
that handles the communication between the object-oriented application
and the underlying relational database.
Conclusion
Database technologies have been taken to new heights due to advances
in technology. RDBMS has stayed relevant throughout this evolution.
ORDBMS and OODBMS draw much inspiration from RDBMS. ORDBMS
provides some layer of data encapsulation and behavior. Database vendors
often build extensions to the statement-response interfaces by extending
SQL to contain object descriptors and spatial query mechanisms.
An object-oriented database (OODBMS), or object database management system
(ODBMS), is a database that is based on object-oriented programming (OOP). The
data is represented and stored in the form of objects. OODBMSs are also called
object databases or object-oriented database management systems.
In this article, we will discuss what object-oriented databases are and why they are
useful.
Object-Oriented Database
The idea of object databases originated in 1985, and today support has become
common for various OOP languages, such as C++, Java, C#, Smalltalk,
and LISP. Common examples are GemStone, which uses Smalltalk; Gbase, which
uses LISP; and Vbase, which uses COP.
Object databases are commonly used in applications that require high performance,
complex calculations, and faster results. Some of the common applications that use
object databases are real-time systems, architectural and engineering 3D modeling,
telecommunications, scientific products, molecular science, and astronomy.
In a typical relational database, the program data is stored in rows and columns. To
store and read that data and convert it into program objects in memory requires
reading the data, loading it into objects, and storing it in memory. Imagine creating a
class in your program and saving it as-is in a database, then reading it back and
starting to use it again.
Object databases bring permanent persistence to objects. Objects can be stored in
persistent storage forever.
Some object databases can be used from multiple languages. For example, the
GemStone database supports the C++, Smalltalk and Java programming languages.
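As a toy illustration of this "save the object as-is" idea, here is a sketch
using Python's built-in shelve module (a stand-in for an OODBMS; real object
databases add querying, transactions, and concurrency control on top):

import shelve

class Employee:
    def __init__(self, emp_id, name):
        self.emp_id = emp_id
        self.name = name

# Store the object directly: no mapping to rows and columns is needed.
with shelve.open("objects.db") as db:
    db["emp:1"] = Employee(1, "Ada")

# Read the object back and start using it again.
with shelve.open("objects.db") as db:
    emp = db["emp:1"]
    print(emp.emp_id, emp.name)   # 1 Ada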
What Is an Object Relational Database?
An object-relational database (ORDBMS) is a database management system similar to
a relational database, except that it has an object-oriented database model: objects,
classes and inheritance are directly supported in database schemas and in the query
language.
Key Differences
1. In an object oriented database, relationships are represented by
references via the object identifier (OID).
2. Object oriented systems employ indexing techniques to locate the
disk pages that store an object. Therefore, they are able to
provide persistent storage for complex-structured objects.
3. They handle larger and more complex data than an RDBMS.
4. The constraints supported by object oriented systems vary from
system to system.
5. In object oriented systems, the data management language is
typically incorporated into a programming language such as
C++.
6. Data entries are stored and described as objects.
7. An object oriented database can handle different types of data.
8. In an object oriented database, the data is stored in the form of
objects.
This chapter introduces the concept of a DDBMS. In a distributed database, there are a
number of databases that may be geographically distributed all over the world. A
distributed DBMS manages the distributed database in such a manner that it appears
as one single database to users. In the later part of the chapter, we go on to study
the factors that lead to distributed databases, and their advantages and disadvantages.
A distributed database is a collection of multiple interconnected databases, which
are spread physically across various locations that communicate via a computer
network.
Features
Databases in the collection are logically interrelated with each other. Often
they represent a single logical database.
Data is physically stored across multiple sites. Data in each site can be
managed by a DBMS independent of the other sites.
The processors in the sites are connected via a network. They do not have
any multiprocessor configuration.
A distributed database is not a loosely connected file system.
A distributed database incorporates transaction processing, but it is not
synonymous with a transaction processing system.
Advantages of Distributed Databases
Reliability − In case of the failure of a site, users can continue to access
data from other sites while the damaged site is being reconstructed. Thus,
database failure may become almost inconspicuous to users.
Support for Multiple Application Software − Most organizations use a
variety of application software, each with its specific database support. A
DDBMS provides uniform functionality for using the same data among
different platforms.
Types of Distributed Databases
In a homogeneous distributed database, all the sites use identical DBMS and
operating systems. Its properties are −
The sites use very similar software.
The sites use identical DBMS or DBMS from the same vendor.
Each site is aware of all other sites and cooperates with other sites to process
user requests.
The database is accessed through a single interface as if it is a single
database.
In a heterogeneous distributed database, by contrast, different sites may use
different DBMS products, operating systems and data models, so translations are
required for the sites to communicate with each other.
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows −
Non-replicated and Non-fragmented
In this design alternative, different tables are placed at different sites. Data is placed
so that it is in close proximity to the site where it is used most. This alternative is
most suitable for database systems where the percentage of queries needing to join
information in tables placed at different sites is low. If an appropriate distribution
strategy is adopted, then this design alternative helps to reduce the communication
cost during data processing.
Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site.
Since each site has its own copy of the entire database, queries are very fast,
requiring negligible communication cost. On the contrary, the massive redundancy
in data incurs a huge cost during update operations. Hence, this is suitable for
systems where a large number of queries need to be handled while the
number of database updates is low.
Partially Replicated
Copies of tables, or portions of tables, are stored at different sites. The distribution of
the tables is done in accordance with the frequency of access. This takes into
consideration the fact that the frequency of accessing the tables varies considerably
from site to site. The number of copies of the tables (or portions) depends on how
frequently the access queries execute and on the sites that generate them.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or
partitions, and each fragment can be stored at different sites. This considers the fact
that it seldom happens that all data stored in a table is required at a given site.
Moreover, fragmentation increases parallelism and provides better disaster
recovery. Here, there is only one copy of each fragment in the system, i.e. no
redundant data.
The three fragmentation techniques are −
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
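A rough sketch of the horizontal and vertical cases in Python (plain dicts
stand in for rows; this is illustrative, not a real DDBMS API):

employees = [
    {"emp_id": 1, "name": "Ada",  "branch": "Delhi",  "salary": 50000},
    {"emp_id": 2, "name": "Alan", "branch": "Mumbai", "salary": 60000},
]

# Horizontal fragmentation: each site keeps only the rows relevant to it.
delhi_fragment = [row for row in employees if row["branch"] == "Delhi"]

# Vertical fragmentation: each site keeps a subset of the columns; the key
# (emp_id) is kept in every fragment so the table can be reconstructed.
payroll_fragment = [{"emp_id": row["emp_id"], "salary": row["salary"]}
                    for row in employees]

print(delhi_fragment)     # rows stored at the Delhi site
print(payroll_fragment)   # columns stored at the payroll site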
Mixed Distribution
This is a combination of fragmentation and partial replication. Here, the tables are
initially fragmented in some form (horizontal or vertical), and then these fragments are
partially replicated across the different sites according to the frequency of access to
the fragments.
Spatial Databases
A spatial database is a database that is optimized for storing and querying data that
represents objects defined in a geometric space. Most spatial databases allow representing
simple geometric objects such as points, lines and polygons. Some spatial databases handle
more complex structures such as 3D objects, topological coverages, linear networks,
and TINs. While typical databases have developed to manage various numeric and
character types of data, such databases require additional functionality to process spatial
data types efficiently, and developers have often added geometry or feature data types.
The Open Geospatial Consortium developed the Simple Features specification (first released
in 1997)[1], which sets standards for adding spatial functionality to database
systems.[2] The SQL/MM Spatial ISO/IEC standard is a part of the SQL/MM multimedia
standard and extends the Simple Features standard with data types that support circular
interpolations.
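A toy sketch of the simplest spatial query, a bounding-box search over points
(the coordinates are illustrative; a real spatial database would answer this
with a spatial index such as an R-tree rather than a linear scan):

# City -> (latitude, longitude); plain tuples stand in for a geometry type.
cities = {"Delhi": (28.6, 77.2), "Mumbai": (19.1, 72.9), "Chennai": (13.1, 80.3)}

def in_bbox(point, min_lat, min_lon, max_lat, max_lon):
    lat, lon = point
    return min_lat <= lat <= max_lat and min_lon <= lon <= max_lon

# Which cities fall inside a bounding box over northern India?
print([name for name, p in cities.items() if in_bbox(p, 20, 70, 35, 85)])
# ['Delhi']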
Disadvantages of DBMS
More Costly
Creating and managing a database is quite costly. High-cost software and hardware
are required for the database. Highly trained staff are also required to handle the
database, and it needs continuous maintenance. All of this ends up making a
database quite a costly venture.
High Complexity
A Database Management System is quite complex, as it involves creating, modifying
and editing a database. Consequently, the people who handle a database or work
with it need to be quite skilled; otherwise, valuable data can be lost.
Database handling staff required
As discussed in the previous point, databases and DBMSs are quite complex. Hence,
skilled personnel are required to handle the database so that it works in optimal
condition. This is a costly venture, as these professionals need to be well paid.
Database Failure
All the relevant data for any company is stored in a database. So it is imperative that
the database works in optimal condition and there are no failures. A database failure
can be catastrophic and can lead to loss or corruption of very important data.
High Hardware Cost
A database contains a vast amount of data, so a large disk storage is required to store
it all. Sometimes extra storage may even be needed. All this increases
hardware costs considerably and makes a database quite expensive.
Huge Size
A database contains a large amount of data, especially for bigger organisations. This
data may increase further as more data is added to the database. All of this leads to a
large database.
The bigger the database, the more difficult it is to handle and maintain. It is also more
complex to ensure data consistency and user authentication across big databases.
Upgradation Costs
Often new functionalities are added to the database. This leads to database
upgradations. All of these upgradations cost a lot of money. Moreover, it is also quite
expensive to train the database managers and users to handle these new
upgradations.
Cost of Data Conversion
If the database is changed or modified in some manner, all the data needs to be
converted to the new form. This cost may even exceed the database creation and
management costs sometimes. This is the reason most organisations prefer to work
on their old databases rather than upgrade to new ones.