BCAR-202 SLM
BABASAHEB AMBEDKAR
OPEN UNIVERSITY
BCA
BACHELOR OF COMPUTER APPLICATION
BCAR-202
Database Management System (DBMS)
ISBN 978-81-949223-0-8
Edition : 2020
Acknowledgment
We sincerely hope this book will help you in every way you
expect.
Block Objectives :
After learning this block, you will be able to :
Block Structure :
Unit 1 : Introduction to Database Management System
Unit 2 : Data Models
Unit 3 : Entity Relationship Model and Diagrams
Unit 01 : INTRODUCTION TO DATABASE MANAGEMENT SYSTEM
UNIT STRUCTURE
1.0 Learning Objectives
1.1 Introduction
1.2 Definition of DBMS
1.2.1 What is Database ?
1.2.2 What is DBMS ?
1.2.3 Functions of a DBMS
1.2.4 Data Abstraction
1.3 Comparison of File Processing System and DBMS
1.4 Advantages and Disadvantages of DBMS
1.5 Users of DBMS
1.6 Capabilities of DBMS
1.7 Let Us Sum Up
1.8 Suggested Answer for Check Your Progress
1.9 Glossary
1.10 Assignment
1.11 Activities
1.12 Case Study
1.13 Further Readings
1.1 Introduction :
In the age of Information Technology, data is of crucial significance. Database Management System (DBMS) is a separate branch of IT that comprises several aspects of the management of data.
Data can be defined as the collection of facts and figures.
The data can be generated in any type of transaction taking place in
an organization. For example, the amount withdrawn from a particular account on a particular day is data generated in a bank. The basic unit of data is the byte. One byte comprises eight bits; a bit is a binary digit, i.e. zero or one. Thus data comprises a number of bits in the form of zeros and ones. The next unit of data is the kilobyte : one kilobyte is equivalent to 1024 bytes (i.e. 2^10 bytes). The larger units are the megabyte, gigabyte, terabyte and so on.
There is a difference between data and information. When we process data, it becomes information. For example, suppose we have recorded the daily financial transactions of a company. This forms data. When we process this data to generate some report, say a journal, profit and loss account or balance sheet, it becomes information.
In short, information is the processed form of data which helps managers to make decisions. On these lines, let us compare data and information.
Data | Information
Data is a collection of facts and figures. | Information is the processed form of data.
It is difficult to draw any conclusion from data. | Information can be used to draw some conclusion.
It is difficult to use data for decision making. | Information can be used for the purpose of decision making.
Data is a raw material for processing and is required to be processed further. | Information can also be further utilized to produce highly densified information.
Data is not generally time bound. | Information is generally time bound.
1.2 Definition of DBMS :
1.2.1 What is Database ?
A database is an orderly collection of information. Sales order, stock control or student records systems are all examples of databases.
• A database categorizes information into logical groups, which are physically
stored in files called tables
• A table is an orderly collection of records
• A record is a collection of fields
A database can be viewed as a storeroom of data. A collection of actual information regarding an organization is stored in a database. For example,
there are 1000 students in a college and we want to store their personal details,
marks details etc. This information can be recorded in a database.
A collection of programs that enables you to store, modify, and extract information from a database is known as a DBMS. The major goal of a DBMS is to provide a way to store and retrieve database information in a convenient and efficient manner.
Database systems are designed to handle large amounts of information. Management of data involves both defining structures for the storage of information and providing ways to manipulate the data. In addition, the database system must ensure the safety and accuracy of the data.
1.2.2 What is DBMS ?
A DBMS is a set of programs that enables you to store, modify, and extract important information from a database. There are many different types of DBMS, ranging from small systems that run on personal computers to huge systems that run on mainframes.
A student record database holds details of students, courses and staff. The student table below is an orderly collection of records; each row is one record and each column is a field :

Matric No | Name    | Address | Course
9712345   | Samir   | Mumbai  | MA
9682374   | Mihir   | Surat   | BE
9754322   | Nishita | Baroda  | MBA
9567892   | Alka    | Pune    | MBA
6763333   | Soham   | Pune    | MCA
The student record database contains tables for students, courses and staff.
The student table has a record for each student, and each student record has
fields Matriculation Number, Student Name, Address, Date–of–birth, Course
etc.
One of the rules of relational databases is that only these "rectangular"
tables are permitted.
A database comprises 'tables', which contain 'records', which in turn
contain 'fields'.
A Database Management System (DBMS) provides facilities for the
storage and retrieval of information and, in addition, provides facilities for
preservation of the 'integrity' of the data.
There are a number of different types of database management systems,
of which the 'relational' model, as used in Microsoft Access, is currently the
most common.
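To make the table/record/field terminology concrete, here is a minimal SQL sketch (SQL being the standard relational language introduced in Unit 4) of the student table shown above. The column types and sizes are assumptions for illustration only.

CREATE TABLE Student (
    MatricNo CHAR(7),        -- field : matriculation number
    Name     VARCHAR(50),    -- field : student name
    Address  VARCHAR(50),    -- field : address
    Course   VARCHAR(10)     -- field : course
);

-- Each INSERT statement adds one record (row) to the table.
INSERT INTO Student VALUES ('9712345', 'Samir',   'Mumbai', 'MA');
INSERT INTO Student VALUES ('9682374', 'Mihir',   'Surat',  'BE');
INSERT INTO Student VALUES ('9754322', 'Nishita', 'Baroda', 'MBA');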
1.2.3 Functions of a DBMS :
1. Data Storage, Retrieval, and Update.
2. A User–Accessible Catalog.
3. Transaction Support.
4. Concurrency Control Services.
5. Recovery Services.
6. Authorization Services.
7. Support for Data Communication.
8. Integrity Services.
9. Services to Promote Data Independence.
10. Utility Services.
1.2.4 Data Abstraction :
Levels of Abstraction :
Various levels of abstraction are as follows :
Physical Level : It is the lowest level of abstraction. It describes how
the data is actually stored and describes the data structure and access methods
to be used by the database.
At the physical level, complex low-level data structures are described in detail. The internal view is expressed by the internal schema, which contains the definition of stored records, the method of representing the data fields and the access aids used.
Conceptual level or logical level : It is the next higher level of abstraction. It describes what data is actually stored in the database. It also explains what relationships exist among the data.
At the conceptual level, the entire database is described in terms of
relatively simple data structures by conceptual schema.
View Level : It is the highest level of abstraction. At this level, only
a part of the database is described because the users of database may not be
concerned with the entire database. To simplify their interaction with the database, the view level is defined. The system may provide many views of the same database.
[Figure : The three levels of abstraction. The external/user views (View 1, View 2, View 3) are defined by users or application programmers in consultation with the Database Administrator, and sit above the conceptual and physical levels.]
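As a small illustration of the view level, consider the SQL sketch below. The table, column and view names are assumptions; the point is only that a view exposes part of the database, and many such views can be defined over the same stored data.

-- The stored table belongs to the logical (conceptual) level.
CREATE TABLE Student (
    MatricNo CHAR(7),
    Name     VARCHAR(50),
    Address  VARCHAR(50),
    Course   VARCHAR(10)
);

-- A view belongs to the view level : it shows only the part of the
-- database that a particular user needs, hiding the rest.
CREATE VIEW CourseList AS
    SELECT MatricNo, Course
    FROM Student;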
The database system supports three types of database schema –
Physical Schemas : These are the schema at the physical level. They
are at the lowest level.
Logical Schemas : These are the schema at the conceptual level. These
are provided at the next or intermediate level.
Sub–Schemas : These are the schema at view level. These are at the
highest level.
Data Independence :
The ability to modify schema definition at one level without affecting
schema definition in the next higher level is known as data independence.
There are two levels of data independence – physical data independence
and logical data independence. Physical data independence means the ability
to modify physical schema without causing application programs to be rewritten.
On similar lines, logical data independence means the ability to modify
the logical schema without causing application programs to be rewritten.
It is more difficult to achieve logical data independence than physical data independence.
Data Model : It is a collection of conceptual tools for describing data, data schema and consistency constraints.
Data models are classified into the following three categories –
(1) Object based logical model
(2) Record–based logical model
(3) Physical data model
Check Your Progress – 1 :
1. What is database ? Explain with example ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2. What is DBMS ? Explain with example ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
1.4 Advantages and Disadvantages of DBMS :
• Advantages of DBMS – One of the main advantages of using a database
system is that the organization can make use of, via the DBA, centralized
management and control over the data. The database administrator is the
focus of the centralized control. Any application requiring a change in
the structure of a data record requires an arrangement with the DBA,
who makes the necessary modifications.
• Reduction of Redundancies – Centralized control of data by the DBA
avoids unnecessary duplication of data and effectively reduces the total
amount of data storage required. It also eliminates the extra processing
necessary to trace the required data in a large mass of data.
• Elimination of Inconsistencies – The main advantage of avoiding
duplication is the elimination of inconsistencies that tend to be present
in redundant data files. Any redundancies that exist in the DBMS are
controlled and the system ensures that these multiple copies are consistent.
• Shared Data – A database allows the sharing of data under its control
by any number of application programs or users. For example, the
applications for the public relations and payroll departments can share
the same data.
• Integrity – Centralized control can also ensure that adequate checks are
incorporated in the DBMS to provide data integrity. Data integrity means
that the data contained in the database is both accurate and consistent.
Therefore, data values being entered for the storage could be checked
to ensure that they fall within a specified range and are of the correct
format.
• Security – Data is of vital importance to an organization and may be
confidential. Such confidential data must not be accessed by unauthorized
persons. The DBA who has the ultimate responsibility for the data in
the DBMS can ensure that proper access procedures are followed, including
proper authentication schemes for access to the DBMS and additional
checks before permitting access to sensitive data. Different levels of
security could be implemented for various types of data and operations.
• Conflict Resolution – Since the database is under the control of the
DBA, he/she should resolve the conflicting requirements of various users
and applications. In essence, the DBA chooses the best file structure and
access method to get optimal performance for the response–critical
applications, while permitting less critical applications to continue to use
the database, albeit with a relatively slower response.
• Data Independence – It is advantageous in the database environment
since it allows for changes at one level of the database without affecting
other levels. These changes are absorbed by the mapping between the
levels.
Disadvantages are as follows :
1. A complex conceptual design process
2. The need for multiple external databases
3. The need to hire database–related employees
4. High DBMS acquisition costs
5. A more complex programmer environment
6. Potentially catastrophic program failures
7. A longer running time for individual applications
8. Highly dependent DBMS operations
Check Your Progress – 3 :
1. What are the Advantages of DBMS ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
(d) Standalone users – These users maintain personal databases by using ready-made program packages. These packages provide a flexible environment with an easy-to-use menu or graphical user interface.
Check Your Progress – 4 :
1. Explain different users of database management system.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
3. _______ is a collection of interrelated data and a set of programs to access them.
(A) Data Structure (B) Programming language
(C) Database Management System (D) Database
4. A DBMS should provide the following feature(s) :
(A) Protect data from system crash
(B) All of these
(C) Safety of the information stored
(D) Authorized access
5. Which of the following is considered a DBMS ?
(A) Oracle (B) Foxpro (C) All of these (D) Access
6. _______ stores data about the data (meta–data).
(A) Model (B) Data Dictionary
(C) Template (D) None
1.9 Glossary :
1. Data – It can be defined as the collection of facts and figures.
2. Information – When we process data, it becomes information.
3. Database – A database is an orderly collection of information.
4. Database Management System (DBMS) – It is a collection of inter–
related data and a set of programs/methods to access those data.
5. Data Abstraction – It is the process of hiding certain details of how
data is stored and maintained.
6. Instances and Schemas – The collection of information stored in the database at a particular moment of time is called an instance of the database. The overall design of the database is called the database schema.
7. Data Independence – The ability to modify schema definition at one
level without affecting schema definition in the next higher level is known
as data independence.
8. Physical Data Independence – It means the ability to modify physical
schema without causing application programs to be rewritten.
9. Logical Data Independence – It means the ability to modify the logical
schema without causing application programs to be rewritten.
10. Integrity – The property of the database that ensures that the data
contained in the database is as accurate and consistent as possible.
11. Data Dictionary – The database stores metadata in an area called the
data dictionary, which describes the tables, fields, indexes, constraints,
and other related items that make up the database.
12. Concurrent Access – Two or more users operating on the same records
in the same database table at the same time.
1.10 Assignment :
Design a file structure for a student record consisting of student name, student roll no., student address, student course and student phone no., and save it so that it can later be converted into a database.
1.11 Activities :
List the major steps that you would take in setting up a database for a particular organization.
Unit 02 : DATA MODELS
UNIT STRUCTURE
2.0 Learning Objectives
2.1 Introduction
2.2 Types of Data Models
2.2.1 Object Base Logical Model
2.2.2 Record Base Logical Model
2.2.3 Physical Data Models
2.2.4 Relational
2.2.5 Network
2.2.6 Hierarchical Model
2.3 Let Us Sum Up
2.4 Suggested Answer for Check Your Progress
2.5 Glossary
2.6 Assignment
2.7 Activities
2.8 Case Study
2.9 Further Readings
2.1 Introduction :
In the earlier unit, we had a discussion on the fundamental concepts of
Database and Database Management Systems. One of the main purposes of
Database Management System is to provide some level of data abstraction by
hiding details of data storage that are not required by most of the database
users. Now, let us consider how the structure of data, the relationship between
the data, its semantics and consistency constraints can be conceptually described.
These details are provided by the tool called Data Model.
A data model is an integrated collection of concepts for describing data, relationships between data, and constraints on the data in an organization.
2.2 Types of Data Models :
• Domains – A domain definition specifies the kind of data represented
by the attribute. Particularly, a domain is the set of all possible values
that an attribute may validly contain. Domains are often confused with
data types. Data type is a physical concept while domain is a logical
one. "Number" is a data type and "Age" is a domain.
To give another example "Street Name" and "Surname" might both be
represented as text fields, but they are obviously different kinds of text
fields; they belong to different domains.
• Body of a Relation – The body of the relation consists of an unordered
set of zero or more tuples. There are some important facts :
First, the relation is unordered. Record numbers do not apply to relations.
Second, a relation with no tuples still qualifies as a relation.
Third, a relation is a set. The items in a set are, by definition,
uniquely identifiable. Therefore, for a table to qualify as a relation
each record must be uniquely identifiable and the table must
contain no duplicate records.
In this model, two sets of data are linked by a relationship. The
relationship could be 'one–to–one', 'one–to–many' or 'many–to–
many'.
Relational View of Sample Database :
• Let us take an example of a sample database consisting of supplier, parts and shipments tables. The table structure and some sample records for the supplier, parts and shipments tables are given in the tables shown below :
We assume that no more than one shipment exists for a given supplier/part combination in the shipments table.
• Note that the relations Parts and Shipments have PNo (Part Number)
in common and Supplier and Shipments relations have SNo (Supplier
Number) in common. The Supplier and Parts relations have City in
common. For example, the fact that supplier S3 and part P2 are located
in the same city is represented by the appearance of the same value,
Amritsar, in the City column of the two tuples of the respective relations.
Keys of a Relation :
It is a set of one or more columns whose combined values are unique
among all occurrences in a given table. A key is the relational means of
specifying uniqueness. Some different types of keys are :
• Primary key – is an attribute or a set of attributes of a relation which
possess the properties of uniqueness and irreducibility (No subset should
be unique). For example : Supplier number in the S table is a primary key, Part number in the P table is a primary key, and the combination of Supplier number and Part number in the SP table is a primary key.
• Foreign key – is an attribute (or set of attributes) of a table which refers to the primary key of some other table. A foreign key permits only those values which appear in the primary key of the table to which it refers, or the null (unknown) value. For example : SNo in the SP table refers to SNo of the S table, which is the primary key of the S table, so SNo in the SP table is a foreign key. Similarly, PNo in the SP table refers to PNo of the P table, which is the primary key of the P table, so PNo in the SP table is also a foreign key.
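The primary and foreign keys just described can be declared directly in SQL. The following is a hedged sketch of the supplier (S), parts (P) and shipments (SP) tables; the column names, types and sizes are assumptions for illustration.

CREATE TABLE S (                      -- suppliers
    SNo   CHAR(3) PRIMARY KEY,        -- primary key : unique and irreducible
    SName VARCHAR(30),
    City  VARCHAR(30)
);

CREATE TABLE P (                      -- parts
    PNo   CHAR(3) PRIMARY KEY,
    PName VARCHAR(30),
    City  VARCHAR(30)
);

CREATE TABLE SP (                     -- shipments
    SNo CHAR(3) REFERENCES S (SNo),   -- foreign key referring to S
    PNo CHAR(3) REFERENCES P (PNo),   -- foreign key referring to P
    Qty INTEGER,
    PRIMARY KEY (SNo, PNo)            -- composite primary key
);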
Operations in Relational Model :
The four basic operations – Insert, Update, Delete and Retrieve – are shown below on the sample database in the relational model :
• Insert Operation – The information of a supplier who does not supply any part can be inserted in the S table without any anomaly, e.g. S4 can be inserted in the S table. Similarly, the information of a new part that is not supplied by any supplier can be inserted into the P table. If a supplier starts supplying a new part, then this information can be stored in the shipment table SP with the supplier number, part number and supplied quantity. So, we can say that insert operations can be performed in all the cases without any anomaly.
• Update Operation – Suppose supplier S1 has moved from Qadian to
Jalandhar. In that case we need to make changes in the record, so that
the supplier table is up-to-date. Since supplier number is the primary key in the S (supplier) table, there is only a single entry for S1, which needs a single update, and the problem of data inconsistency does not arise. Similarly, part and shipment information can be updated by a single modification in the tables P and SP respectively without the problem of inconsistency. The update operation in the relational model is thus very simple and free of anomalies.
• Delete Operation – Suppose supplier S3 stops the supply of part P2; then we have to delete the shipment connecting part P2 and supplier S3 from the shipment table SP. This information can be deleted from the SP table without affecting the details of supplier S3 in the supplier table or part P2's information in the part table. Similarly, we can delete the information of parts in the P table and their shipments in the SP table, and we can delete the information of suppliers in the S table and their shipments in the SP table.
The Relational database model is based on the Relational Algebra.
Advantages :
1. Structural Independence
2. Conceptual Simplicity
3. Ease of design, implementation, maintenance and usage.
4. Ad hoc query capability
Disadvantages :
1. Hardware Overheads
2. Ease of design can lead to bad design
2.2.5 Network :
The popularity of the network data model coincided with the popularity
of the hierarchical data model. Some data were more naturally modeled with
more than one parent per child. So, the network model permitted the modeling
of many–to–many relationships in data. In 1971, the Conference on Data
Systems Languages (CODASYL) formally defined the network model. The
basic data modeling construct in the network model is the set construct. A
set consists of an owner record type, a set name, and a member record type.
A member record type in the Network Model can have that role in more
than one set; hence the multi parent concept is supported. An owner record
type can also be a member or owner in another set. The data model is a simple
network, and link and intersection record types (called junction records by
IDMS) may exist, as well as sets between them.
Thus, the complete network of relationships is represented by several pairwise sets; in each set one record type is the owner (at the tail of the network arrow) and one or more record types are members (at the head of the relationship arrow). Usually, a set defines a 1 : M relationship, although 1 : 1 is permitted. The CODASYL network model is based on mathematical set theory.
Network view of Sample Database :
Considering again the sample supplier–part database, its network view
is shown. In addition to the part and supplier record types, a third record type
is introduced which we will call as the connector. A connector occurrence
specifies the association (shipment) between one supplier and one part. It
contains data (quantity of the parts supplied) describing the association between
supplier and part records.
Advantages :
1. Conceptual Simplicity
2. Ease of data access
3. Data Integrity and capability to handle more relationship types
4. Data independence
5. Database standards
Disadvantages :
1. System complexity
2. Absence of structural independence
[Figure : A hierarchical tree with Class as the root, Student and Teacher as its child nodes, and Grade, ID and Department as their fields.]
In the hierarchical model, any information about the child can be accessed
only through the parent. For example, if you want any transaction information
about an account in the branch, then it will be accessed through the branch
– function account route. Direct access to the transaction, even if you know
its identity key, is not possible.
In this model, there are issues about loss of data and information, if
you delete the entity. When you delete the entity, you lose the information
of the child as well as of the tree structure e.g. if you delete account B, its
transaction and records are lost.
The typical characteristics of a hierarchical model are as follows:
1. A hierarchical model starts with a root, and it has only one root.
2. A root will have several branches.
3. Each branch is connected to one and only one root.
4. A branch has several leaves and a set of leaves is connected to one branch.
5. Thus, the hierarchical tree structure is made up of branches (nodes) and
leaves (fields).
In order to understand the hierarchical data model better, let us take the
example of the sample database consisting of supplier, parts and shipments.
The record structure and some sample records for supplier, parts and shipments
elements are as given in following tables.
We assume that each row in the Supplier table is identified by a unique SNo (Supplier Number) that uniquely identifies the entire row of the table.
Likewise each part has a unique Pno (Part Number). Also we assume that no
more than one shipment exists for a given supplier/part combination in the
shipments table.
Hierarchical View for the Suppliers–Parts Database :
The tree structure has the parts record superior to the supplier record. That is, parts form the parents and suppliers form the children. Each of the four trees in the figure consists of one part record occurrence, together with a set of subordinate supplier record occurrences. There is one supplier record for each supplier of a particular part. Each supplier occurrence includes the corresponding shipment quantity.
For example, supplier S3 supplies 300 units of part P2. Note that the set of supplier occurrences for a given part occurrence may contain any number of members, including zero (as for part P4). Part P1 is supplied by two suppliers, S1 and S2. Part P2 is supplied by three suppliers, S1, S2 and S3, and part P3 is supplied only by supplier S1, as shown in the figure.
Operations on Hierarchical Model :
There are four basic operations – Insert, Update, Delete and Retrieve – that can be performed on each model. Now we consider in detail how these basic operations are performed in the hierarchical database model.
• Insert Operation – It is not possible to insert the information of a supplier, e.g. S4, who does not supply any part, because a child node cannot exist without a parent. However, a part P5 that is not supplied by any supplier can be inserted without any problem, because a parent can exist without any child. So, we can say that the insert anomaly exists only for those children which have no corresponding parents.
• Update Operation – Suppose we wish to change the city of supplier S1 from Qadian to Jalandhar. Then we have to carry out two operations : searching for S1 under each part, and then multiple updates for the different occurrences of S1. But if we wish to change the city of part P1 from Qadian to Jalandhar, these problems do not occur, because there is only a single entry for part P1 and the problem of inconsistency does not arise. So, we can say that update anomalies exist only for children, not for parents, because children may have multiple entries in the database.
• Delete Operation – In the hierarchical model, quantity information is incorporated into the supplier record. Hence, the only way to delete a shipment (or supplied quantity) is to delete the corresponding supplier record. But such an action leads to loss of information about the supplier, which is not desired. For example, if supplier S2 stops supplying 250 units of part P1, then the whole record of S2 has to be deleted under part P1, which may lead to loss of information about the supplier. Another problem arises if we wish to delete the information of a part and that part happens to be the only part supplied by some supplier. In the hierarchical model, deletion of a parent causes the deletion of its child records also, and if a child occurrence is the only occurrence in the whole database, then the information in those child records is also lost with the deletion of the parent. For example, if we wish to delete the information of part P2, we also lose the information of suppliers S3, S2 and S1. The information of S2 and S1 can still be obtained from P1, but the information about supplier S3 is lost with the deletion of the record for P2.
Program Work Area / User Work Area :
In an application environment, as a number of records are accessed from
a database, they are retrieved into a set of program variables. The program
then accesses the field values of the database records through them. The
database system maintains a program work area or user work area. The PWA/
UWA is a buffer storage area for these variables.
The following variables are maintained in a PWA/UWA for each application
program:
a. Record templates – The variables with data types and width of each
field of the record types accessed by the application programs.
b. Currency Pointers – A set of pointers, one for each database tree
containing the address of the record in that particular tree accessed most
recently by the application program.
c. Status flag – A variable set (DB – status) by the system to indicate
to the application program, the outcome of the last database operation.
If DB–status = 0, then it indicates that the last operation succeeded.
Check Your Progress – 1 :
1. Explain the concept of Data models.
2. State the difference between Network data model and Hierarchical Model.
3. Explain advantages and disadvantages of relational model.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
4. A data model is a collection of conceptual tools for describing –
(A) Data (B) Schema (C) Constraints (D) All of these
5. Data Models in DBMS are classified into ______ categories.
(A) 3 (B) 2 (C) 5 (D) 4
6. Object based logical model(s) are used to describe data at – [Select the appropriate option(s)]
(A) View Level (B) Logical Level
(C) Physical Level (D) None
7. Which of the following is example of Object based logical model ?
(A) Relational Model (B) Hierarchical Model
(C) Network Model (D) ERM
8. _______ is a variable set (DB–status) by the system to indicate to the application program the outcome of the last database operation.
(A) Status flag (B) Currency Pointer
(C) None (D) Both
2.5 Glossary :
1. Data Model – It can be defined as an integrated collection of concepts
for describing and manipulating data, relationships between data, and
constraints on the data in an organization.
2. Object based data model – the model which uses concepts such as entities, attributes, and relationships.
3. Record based logical model – the model used in describing data at the logical and view levels.
4. Physical Model – Physical data models describe how data is stored in
the computer, representing information such as record structures, record
ordering, and access paths.
5. Relational model – the model stores data in the form of tables.
6. Network model – The model which organizes entities as a graph; records can be accessed through several paths.
7. Hierarchical model – This model is like a structure of a tree with the
records forming the nodes and fields forming the branches of the tree.
8. CODASYL – Conference on Data Systems Languages (CODASYL)
formally defined the network model.
9. HDML – Hierarchical Data Manipulation Language for Hierarchical
model
10. IMS – Information Management System is one of the oldest and widely
used database system introduced by IBM.
11. DDL – Data definition Language.
2.6 Assignment :
Pick various databases with which you are familiar and identify the data model that suits each of them.
2.7 Activities :
Study different object based logical models in detail.
http://rakeshpurohit.wordpress.com
http://www.gdcbemina.com/Study–Material/BCOM–FINAL–YEAR–
STUDY–MATERIAL(COMPUTER)/B.Com.3rd–Study–mat...
http://sql–plsql–notes.blogspot.in/2011/07/relational–database–management
–system.html
http://www.cs.iit.edu/~cs561/cs425/PANDURENGAN_VIGNESH
RelationalData baseIntro/test/DBModels.html
http://www.unixspace.com/context/databases.html
Unit 03 : ENTITY RELATIONSHIP MODEL & DIAGRAMS
UNIT STRUCTURE
3.0 Learning Objectives
3.1 Introduction
3.2 Entity Set
3.2.1 What is Entity Set ?
3.2.2 What is weak Entity Set ?
3.3 Attribute
3.4 Relationship Set
3.5 ER Diagrams
3.6 Let Us Sum Up
3.7 Suggested Answer for Check Your Progress
3.8 Glossary
3.9 Assignment
3.10 Activities
3.11 Case Study
3.12 Further Readings
3.1 Introduction :
The E–R data model is based on a perception of a real world that consists
of a set of basic objects called entities and of relationships among these objects.
It was developed to facilitate database design by allowing the specification of
an enterprise schema, which represents the overall logical structure of a database.
The E–R model is extremely useful in mapping the meanings and interactions
of real–world enterprises onto a conceptual schema. Because of this utility,
many database–design tools draw on concepts from the E–R model.
There are three basic notations that the E–R Model employs:
1. Entity Sets.
2. Relationship sets.
3. Attributes.
Entities and Entity sets :
3.2 Entity Set :
3.2.1 What is Entity Set ?
An entity is any object of interest to an organization, or any object to be represented in the database. Entities represent objects in the real world which are distinguishable from all other objects.
For Eg : Every person in a college is an entity.
Every room in a college is an entity.
Associated with an entity is a set of properties. These properties are used to distinguish one entity from another.
For Eg : 1. The Attributes of the entity of student are
USN, Name, Address.
2. The Attributes of the Entity of Vehicle are
Vehicle no, Make, Capacity.
For the purpose of accessing and storing information, only certain attributes are used.
Those attributes which uniquely identify every instance of the entity are termed the primary key.
An entity which has a set of attributes that can uniquely identify all its instances is termed a strong entity.
An entity whose primary key does not determine all instances of the entity uniquely is termed a weak entity.
A collection of similar entities, which have certain common properties, forms an entity set. For an organization such as a college, the objects of concern include Student, Teacher, Rooms and Subjects; each such collection of similar entities forms an entity set.
3.2.2 What is Weak Entity Set ?
An entity set which does not have sufficient attributes to form a primary key, and whose primary key comprises its partial key together with the primary key of its parent entity, is said to be a weak entity set.
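In relational terms, a weak entity set is usually realized as a table whose primary key combines the parent's primary key with the weak set's partial key. The sketch below uses assumed Loan and Payment tables purely for illustration.

CREATE TABLE Loan (
    LoanNo INTEGER PRIMARY KEY
);

-- Payment is a weak entity : PaymentNo alone does not identify a row.
-- Its key comprises the partial key plus the parent's primary key.
CREATE TABLE Payment (
    LoanNo    INTEGER REFERENCES Loan (LoanNo),
    PaymentNo INTEGER,                 -- partial key
    Amount    DECIMAL(10, 2),
    PRIMARY KEY (LoanNo, PaymentNo)
);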
Check Your Progress – 1 :
1. What is Entity Set ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
3.3 Attribute :
Attribute / Column :
A column stores an attribute of the entity. For example, if details of
students are stored then student name is an attribute; course is another attribute
and so on.
In general, an attribute is a property or characteristic. Colour, for example, is an attribute of your hair. In using or programming computers, an attribute
is a changeable property or characteristic of some component of a program
that can be set to different values.
In a database management system (DBMS), an attribute may describe
a component of the database, such as a table or a field, or may be used itself
as another term for a field.
Types of attributes :
1. Simple Attributes – The attributes which cannot be further divided into
subparts.
Eg : University Seat Number of a student is unique which cannot be
further divided.
2. Composite Attributes – The attributes which can be further divided into subparts.
Eg : The attribute Name in the Student database can be further divided into First name, Middle name and Last name.
3. Single valued Attributes – The attribute contains only one specific value at any instant.
Eg : The USN of a student is unique.
4. Multivalued Attributes – Certain attributes, for example the dependant name in a policy database, may have a set of values assigned to them. There may be more than one dependant for a single policy holder.
5. Stored and Derived Attributes – For a person entity, the value of age can be determined from the current date and the value of that person's birthdate. The Age attribute is hence a derived attribute and is said to be derivable from the Birthdate attribute, which is called a stored attribute.
6. NULL Attributes – A NULL value is used when an attribute does not have any value.
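Several of these attribute types map directly onto a relational table, as the following SQL sketch shows. The table and column names are assumptions for illustration only.

CREATE TABLE Student (
    USN        CHAR(10) PRIMARY KEY,   -- simple, single valued attribute
    FirstName  VARCHAR(20),            -- composite attribute Name stored
    MiddleName VARCHAR(20),            -- as its simple subparts
    LastName   VARCHAR(20),
    BirthDate  DATE,                   -- stored attribute
    Phone      VARCHAR(15)             -- may be NULL when no value exists
);

-- A multivalued attribute (e.g. dependant names) goes into its own table.
CREATE TABLE Dependant (
    USN  CHAR(10) REFERENCES Student (USN),
    Name VARCHAR(40),
    PRIMARY KEY (USN, Name)
);

-- Age is a derived attribute : computed from BirthDate at query time
-- rather than stored (this year arithmetic is only approximate).
SELECT USN,
       EXTRACT(YEAR FROM CURRENT_DATE) - EXTRACT(YEAR FROM BirthDate) AS Age
FROM Student;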
Check Your Progress – 2 :
1. Explain the concept of an attribute with proper examples.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
3.5 ER Diagrams :
The figure depicts the entity relationship diagram for a store system. In this diagram, the lines connecting entities are shown, and the arrows indicate the nature of the relationships. A one-to-one (1:1) relationship occurs when one entity is associated with exactly one other. For example, one receipt causes only one payable and there is only one payable for every receipt, so the line connecting these entities has a single arrow at each end.
There can also be a one-to-many (1:n) relationship. One order contains many different order items, so the line connecting order to order item has a dark (double) arrow pointing to the order item and a single arrow pointing to the order. Finally, there are many-to-many (n:n) relationships. There can be many items on a sale and many sales of an item, so double arrows point to each entity.
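In a relational schema, these cardinalities determine where keys are placed : a 1:1 or 1:n relationship is carried by a foreign key, while an n:n relationship needs a separate linking table. Below is a hedged sketch of the sale/item case, with assumed table and column names.

CREATE TABLE Sale (
    SaleNo INTEGER PRIMARY KEY
);

CREATE TABLE Item (
    ItemNo INTEGER PRIMARY KEY,
    Name   VARCHAR(30)
);

-- Many-to-many (n:n) : many items on a sale, many sales of an item.
-- The relationship becomes its own table holding two foreign keys.
CREATE TABLE SaleItem (
    SaleNo INTEGER REFERENCES Sale (SaleNo),
    ItemNo INTEGER REFERENCES Item (ItemNo),
    Qty    INTEGER,
    PRIMARY KEY (SaleNo, ItemNo)
);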
The symbols used in ER diagrams are as follows :
Step 5 : Draw the complete ER diagram
By connecting all these details, we can now draw the ER diagram as given below.
4. Association among several entities is called _______ .
(A) Relationship (B) Association
(C) Combination (D) Extraction
5. _______ expresses the number of entities to which another entity can be associated via a relationship set.
(A) Mapping Cardinality (B) Messaging Cardinality
(C) Logical Cardinality (D) None
6. _______ is an association of entities where the association includes one entity from each participating entity type.
(A) Entity (B) Relationships
(C) Diagram (D) Node
Check Your Progress 4 :
1 : See Section 3.5
2 : Entities 3 : Entity
4 : Relationships 5 : Mapping Cardinality
6 : Relationships
3.8 Glossary :
1. ER model – It is a conceptual data model that views the real world
as entities and relationships. A basic component of the model is the
Entity–Relationship diagram, which is used to visually represent data
objects.
2. Entity – An entity is something which is described in the database by storing its data; it may be a concrete entity or a conceptual entity.
3. Entity set – An entity set is a collection of similar entities.
4. Attribute – An attribute describes a property associated with entities.
Attribute will have a name and a value for each entity.
5. Domain – A domain defines a set of permitted values for an attribute.
6. Relationship – A relationship is a link or association between entities.
Relationships are usually denoted by verb phrases.
7. Weak entity – An entity which does not have sufficient attributes to form a primary key, and whose primary key comprises its partial key together with the primary key of its parent entity, is said to be a weak entity.
8. Candidate key – If an attribute can be thought of as a unique identifier
for an entity, it is called a candidate key.
9. Primary key – When a candidate key is chosen to be the unique
identifier, it becomes the primary key for the entity.
3.9 Assignment :
Explain in your own words how to define Entity, attribute and relationship
with example.
3.10 Activities :
Explain the ER-design methodology. What are the problems with the ER model ?
3.12 Further Readings :
1. Introduction to Database Systems by C. J. Date
2. Database Management Systems – Rajesh Narang – PHI Learning Pvt Ltd
3. Database System Concepts by Silberschatz, Korth – Tata McGraw–Hill Publication
4. An Introduction to Database Systems – Bipin Desai – Galgotia Publication
5. Database Management System by Raghu Ramkrishnan – Tata McGraw–Hill Publication
6. SQL, PL/SQL : The Programming Language of Oracle – Ivan Bayross – BPB Publication
Some useful websites are as follows :
http://ecomputernotes.com/fundamental/what–is–a–database/network–
model
http://rakeshpurohit.wordpress.com
http://www.authorstream.com/Presentation/nehasinghal–761246–dbms
http://www.slidsseshare.net/the10dar/document–33319174
http://www.gdcbemina.com/Study–Material/BCOM–FINAL–YEAR–
STUDY–MATERIAL(COMPUTER)/B.Com.3rd–Study–mat...
http://www.orafaq.com/wiki/Talk:DBMS
http://www.answers.com/Q/What_are_the_attributes_of_an_Object
http://www.programmingking.in/2013_05_26_archive.html
BLOCK SUMMARY :
Unit 1 provides the concept of a database and database management system. A database is well-organized data about a particular enterprise. A DBMS is a collection of interrelated data and a set of programs to access those data. The primary goal of a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient. Keeping organizational information in a file
processing system has major disadvantages like data redundancy and inconsistency,
difficulty in accessing data, data isolation, integrity and security and concurrent
access anomalies. In order to remove these drawbacks, the database management system came into existence. A major purpose of a database management system is to
provide users with an abstract view of the data. Data abstraction is provided
at physical, logical and view level. Database system has schemas according
to levels of abstraction: physical, logical and sub–schema. There are different
types of database users: naïve users, application programmers and sophisticated
users and specialized users. Database Administrator is person who has central
control over database. Components of DBMS are Transaction manager, storage
manager, query processor, file manager and buffer manager.
Unit 2 provides the concept of data models. Data Model can be defined
as an integrated collection of concepts for describing and manipulating data,
relationships between data, and constraints on the data in an organization. The
purpose of a data model is to represent data and to make the data understandable.
They fall into three broad categories: Object Based Data Models, Physical Data
Models and Record Based Data Models.
BLOCK ASSIGNMENT :
Short Questions :
Long Questions :
BIBLIOGRAPHY
http://www.webopedia.com/TERM/D/database_management_system_
DBMS.html
http://searchsoa.techtarget.com/definition/attribute
http://www.fidelcaptain.com/casestudy1/entitiesandattributescs1.html
Enrolment No. :
1. How many hours did you need for studying the units ?
Unit No. : 1 | 2 | 3
No. of Hrs. :
2. Please give your reactions to the following items based on your reading of the block :
Dr. Babasaheb Ambedkar Open University, Ahmedabad (BCAR-202 / DCAR-202)
RELATIONAL DATABASE AND DATABASE DESIGN
Block Introduction :
Unit 4 provides an overview of relational databases. "Relational database was
proposed by Edgar Codd (of IBM Research) around 1969. It has since become
the dominant database model for commercial applications (in comparison with
other database models such as hierarchical, network and object models). Today,
there are many commercial Relational Database Management System (RDBMS),
such as Oracle, IBM DB2 and Microsoft SQL Server. There are also many free
and open–source RDBMS, such as MySQL, mSQL (mini–SQL) and the embedded
JavaDB" (Apache Derby).
A relational database organizes data in tables (or relations). A table is made
up of rows and columns. A row is also called a record (or tuple). A column is
also called a field (or attribute). A database table is similar to a spreadsheet.
However, the relationships that can be created among the tables enable a relational database to efficiently store huge amounts of data, and effectively retrieve selected data.
Unit 5 discusses design issues in relational databases. The goal of relational
database design is to generate a set of relational schemas that allows us to store
information without redundancy and allows us to retrieve it easily. One approach
is to design schemas that are in appropriate normal forms.
In Unit 6 the need for the normalization process is explained. The basic three normal forms are discussed with examples; all six normal forms are then explained with examples to give a detailed understanding.
Block Objectives :
After learning this block, you will be able to :
Block Structure :
Unit 4 : Introduction To Relational Database
Unit 5 : Database Design
Unit 6 : Normalisation
Unit 04 : INTRODUCTION TO RELATIONAL DATABASE MANAGEMENT SYSTEM
UNIT STRUCTURE
4.0 Learning Objectives
4.1 Introduction
4.2 Codd's 12 Rules
4.3 Terms
4.4 Keys
4.5 Anomalies of Un–normalized Database
4.6 Comparison Hierarchical, Network and Relational Databases
4.7 Let Us Sum Up
4.8 Suggested Answer for Check Your Progress
4.9 Glossary
4.10 Assignment
4.11 Activities
4.12 Case Study
4.13 Further Readings
4.1 Introduction :
Databases have been a mainstay of business computing from the beginning of the digital age. The relational database was in fact born in 1970, when E. F. Codd, a researcher at IBM, wrote a paper outlining the model. Since then, relational databases have become popular and the standard.
Originally, databases were flat. This means that the information was stored in one long text file, called a tab-delimited file. Each entry in the tab-delimited file is separated by a special character, such as a vertical bar (|). Each entry contains multiple pieces of information about a particular object or person grouped together as a record. It is very difficult to search for specific information or to create reports in text files.
A relational database allows you to easily obtain specific information. It also lets you sort the data by any field and generate reports that contain only selected fields from each record. Relational databases use tables to store information : the standard fields and records are represented as columns (fields) and rows (records) in a table.
With a relational database, you can quickly compare information because of the arrangement of data in columns. The model uses the relationships among similar data to increase the speed and versatility of the database. The "relational" part of the name comes from mathematical relations. To gather information from other tables, each table has a key in a column or columns.
Relational databases are created using a special computer language, structured query language (SQL), which is the standard for database interoperability. SQL is the foundation for all of the popular database applications available today.
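For instance, defining a table, adding a record and retrieving sorted data are all expressed as SQL statements. The sketch below is illustrative only; the table, columns and sample values are assumptions.

-- Define a table : columns are the fields, rows are the records.
CREATE TABLE Customer (
    CustID INTEGER PRIMARY KEY,
    Name   VARCHAR(40),
    City   VARCHAR(30)
);

-- Add a record.
INSERT INTO Customer VALUES (1, 'Samir', 'Mumbai');

-- Retrieve specific information, sorted by any field.
SELECT Name, City FROM Customer ORDER BY Name;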
4.3 Terms :
• In the relational model, a database is a collection of relational tables.
• Relations can be represented as two–dimensional data tables with rows
and columns
• The rows of a relation are called tuples.
• The columns of a relation are called attributes.
• The attributes draw values from a domain (a legal pool of values).
• The number of tuples in a relation is called its cardinality while the
number of attributes in a relation is called its degree.
• A relation also consists of a schema and an instance.
• Schema defines the structure of a relation which consists of a fixed set
of attribute domain pairs.
• An instance of a relation is a time–varying set of tuples where each tuple
consists of attribute–value pairs.
Relational tables can be expressed concisely by eliminating the sample
data and showing just the table name and the column names.
For example,
AUTHOR (au_id, au_lname, au_fname, address, city, state, zip)
TITLE (title_id, title, type, price, pub_id)
PUBLISHER (pub_id, pub_name, city)
AUTHOR TITLE (au_id, title_id)
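Using this concise schema, a query can gather information from several tables by matching key columns. The sketch below assumes the author–title link table is named AUTHOR_TITLE; the join syntax is standard SQL.

-- List each author together with the titles he or she wrote,
-- joining through the AUTHOR_TITLE link table.
SELECT a.au_lname, a.au_fname, t.title
FROM AUTHOR a
JOIN AUTHOR_TITLE au_t ON au_t.au_id = a.au_id
JOIN TITLE t ON t.title_id = au_t.title_id;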
Properties of Relational Tables :
Relational tables have six properties :
1. Values are atomic.
2. Column values are of the same kind.
3. Each row is unique.
4. The sequence of columns is insignificant.
5. The sequence of rows is insignificant.
6. Each column must have a unique name.
Check Your Progress – 2 :
1. Explain properties of relational table.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
4.4 Keys :
Key :
A key is a set of attributes K whose values uniquely identify a tuple, and no proper subset of K has this property.
Example : {rollNumber} is a key for the student relation. The values of {rollNumber, name} can also uniquely identify a tuple, but the set is not minimal, so it is not a key. Whether a set of attributes forms a key cannot be determined from a particular instance of the data; it is a property of the schema, and can be determined only from the meaning of the attributes. A relation may have more than one key.
Each of the keys is called a candidate key.
Example : book (isbnNo, authorName, title, publisher, year)
(Assumption : books have only one author)
Keys : {isbnNo}, {authorName, title}
A relation has at least one key, i.e. the set of all attributes, in case no proper subset is a key. (A SQL sketch of the book relation is given after the list of key types below.)
• Superkey – A set of attributes that contains any key as a subset. A key
can also be defined as a minimal superkey.
• Primary Key – One of the candidate keys, chosen for indexing purposes. This is the key whose values uniquely identify each row in a table.
• Foreign key – is a key whose values are the same as the primary key of another table; in effect, a foreign key is a copy of the primary key from another relational table. The relationship between two relational tables is made by matching the values of the foreign key in one table with the values of the primary key in another.
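These key notions can all be expressed in SQL. Below is a hedged sketch of the book relation from the example earlier : one candidate key becomes the primary key, the other is declared UNIQUE, and any superset of a key is merely a superkey. The column sizes are assumptions.

CREATE TABLE Book (
    isbnNo     CHAR(13) PRIMARY KEY,    -- the chosen candidate key
    authorName VARCHAR(40) NOT NULL,
    title      VARCHAR(80) NOT NULL,
    publisher  VARCHAR(40),
    year       INTEGER,
    UNIQUE (authorName, title)          -- the other candidate key
);

-- {isbnNo, year} is a superkey : it contains the key {isbnNo} as a
-- subset, but it is not minimal and therefore not a key itself.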
• Domain constraints
• Key constraints
• Referential integrity constraints
Domain constraints – A domain is the set of atomic data values of a specific type. A domain constraint stipulates that the actual value of an attribute in every tuple must belong to that attribute's domain. For example, an employee ID must be unique, or an employee birthday must lie in the range [Jan 1, 1950, Jan 1, 2000]. Such information is provided in logical statements called integrity constraints.
Key constraints or Entity constraints – Each table is required to have a primary key. Neither the primary key nor any part of it can contain a null value, because a null value would mean that some rows could not be identified. Thus the entity integrity constraint states that the primary key cannot be null. For example, in the EMPLOYEE table, mobile number cannot be the key, since some people may not have a mobile.
Referential integrity constraints – This integrity constraint works on the concept of foreign keys. A key attribute of a relation can be referred to in another relation, where it is called a foreign key. The referential integrity rule states
that if a relational table has a foreign key, then every value of the foreign
key must either be null or match the values in the relational table in which
that foreign key is a primary key.
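All three categories of constraint can be declared in SQL. The sketch below is illustrative only; the Department table, the date range and the column names are assumptions drawn from the examples above.

CREATE TABLE Department (
    DeptNo INTEGER PRIMARY KEY
);

CREATE TABLE Employee (
    EmpID    INTEGER PRIMARY KEY,       -- key/entity constraint :
                                        -- unique and never NULL
    Birthday DATE CHECK (Birthday BETWEEN DATE '1950-01-01'
                                      AND DATE '2000-01-01'),
                                        -- domain constraint on values
    Mobile   VARCHAR(15),               -- may be NULL, so not a key
    DeptNo   INTEGER REFERENCES Department (DeptNo)
                                        -- referential integrity : must be
                                        -- NULL or match a Department row
);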
Check Your Progress – 3 :
1. Define primary, superkey and foreign key.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
FIRST (s#, status, city, p#, qty)
4.7 Let Us Sum Up :
Dr. Edgar F. Codd did extensive research on the relational model of database systems and, according to him, a true relational database is one that follows his twelve rules.
• Relations can be represented as two–dimensional data tables with rows
and columns
• The rows of a relation are called tuples.
• The columns of a relation are called attributes.
• The attributes draw values from a domain (a legal pool of values).
• The number of tuples in a relation is called its cardinality while the
number of attributes in a relation is called its degree
• A relation also consists of a schema and an instance
• Schema defines the structure of a relation which consists of a fixed set
of attribute domain pairs.
• An instance of a relation is a time–varying set of tuples where each tuple
consists of attribute–value pairs.
• Keys are fundamental to the concept of relational databases because they
enable tables in the database to be related with each other.
• Candidate key is an attribute or set of attributes that uniquely identifies
a row.
• The Candidate key that you choose to identify each row uniquely is called
the Primary Key.
• When a primary key of one table appears as an attribute in another table, it is called the foreign key in the second table. A foreign key is used to relate two tables.
• Integrity constraints are necessary conditions to be satisfied by the data
values in the relational instances so that the set of data values constitute
a meaningful database.
• Data Integrity falls into the following categories : Domain integrity, Entity integrity and Referential integrity.
• Domain integrity refers to the range of valid entries for a given column. It ensures that there are only valid entries in the column.
• Entity integrity ensures that each row can be uniquely identified by an
attribute called the Primary key. The Primary key cannot have a NULL
value.
• Referential integrity ensures that for every value of a foreign key, there
is a matching value of the Primary key.
• Most of these problems, namely redundant data and inconsistencies, are the result of bad design. Redundant data is unnecessarily recurring data. Inconsistencies, caused by irregular or inconsistent storage, undermine the integrity of your data.
• When we move across data models such as the hierarchical model, network model and relational model, we can identify a number of differences in terms of data structures, data manipulation and data integrity.
4.8 Suggested Answer for Check Your Progress :
Check Your Progress 1 :
See Section 4.2
Check Your Progress 2 :
See Section 4.3
Check Your Progress 3 :
See Section 4.4
Check Your Progress 4 :
See Section 4.5
Check Your Progress 5 :
1 : See Section 4.6
2 : Row, 3 : Tuple, 4 : Domain
5 : Schema, Instance, 6 : Relational
4.9 Glossary :
1. Relations – It can be represented as two–dimensional data tables with
rows and columns
2. Tuples – The rows of a relation.
3. Attributes – The columns of a relation.
4. Schema – It defines the structure of a relation which consists of a fixed
set of attribute domain pairs.
5. Instance of a relation – It is a time–varying set of tuples where each
tuple consists of attribute–value pairs.
6. Candidate key – It is an attribute or set of attributes that uniquely
identifies a row.
7. Primary Key – The Candidate key that you choose to identify each row
uniquely.
8. Foreign key – When a primary key of one table appears as an attribute in another table, it is called the foreign key in the second table. A foreign key is used to relate two tables.
9. Integrity constraints – These are necessary conditions to be satisfied
by the data values in the relational instances so that the set of data values
constitute a meaningful database.
10. Domain integrity – It refers to the range of valid entries for a given
column. It ensures that there are only valid entries in the column.
11. Entity integrity – It ensures that each row can be uniquely identified
by an attribute called the Primary key. The Primary key cannot have
a NULL value.
12. Referential integrity – It ensures that for every value of a foreign key,
there is a matching value of the Primary key.
4.10 Assignment :
Compare all three (hierarchical, relational and network) models of database with suitable examples.
4.11 Activities :
Study and write Codd's 12 rules in detail.
Unit
05 DATABASE DESIGN
UNIT STRUCTURE
5.0 Learning Objectives
5.1 Introduction
5.2 Database Development Life Cycle
5.3 Logical Design
5.4 Physical Model
5.5 Capacity Planning
5.6 Advantages and Disadvantages of Normalization
5.7 Let Us Sum Up
5.8 Suggested Answer for Check Your Progress
5.9 Glossary
5.10 Assignment
5.11 Activities
5.12 Case Study
5.13 Further Readings
5.1 Introduction :
Database design is made up of two main phases : logical and physical
database design.
Logical database design is the process of constructing a model of the
data used in a company based on a specific data model, but independent of
a particular DBMS and other physical considerations.
In the logical database design phase we build the logical representation
of the database, which includes identification of the important entities and
relationships, and then translate this representation to a set of tables. The logical
data model is a source of information for the physical design phase, providing
the physical database designer with a vehicle for making tradeoffs that are very
important to the design of an efficient database.
Physical database design is the process of producing a description of the implementation of the database on secondary storage; it describes the base
tables, file organizations, and indexes used to achieve efficient access to the
data, and any associated integrity constraints and security restrictions. In the
physical database design phase we decide how the logical design is to be
physically implemented in the target relational DBMS. This phase allows the
designer to make decisions on how the database is to be implemented. Therefore,
physical design is tailored to a specific DBMS.
5.3 Logical Design :
When a user creates a view with the CREATE VIEW statement, he or she automatically becomes the owner of the view, but does not necessarily
receive full privileges on the view. To create the view, a user must have
SELECT privilege to all the tables that make up the view. However, the
owner will only get other privileges if he or she holds those privileges
for every table in the view.
• Step 5 considers relaxing the normalization constraints imposed on the
tables to improve the overall performance of the system. This is a step
that you should undertake only if necessary, because of the inherent
problems involved in introducing redundancy while still maintaining
consistency.
Formally, the term denormalisation refers to a change to the structure
of a base table, such that the new table is in a lower normal form than
the original table. However, we also use the term more loosely to refer
to situations where we combine two tables into one new table, where
the new table is in the same normal form but contains more nulls than
the original tables.
• Step 6 is an ongoing process of monitoring and tuning the operational
system to identify and resolve any performance problems resulting from
the design and to implement new or changing requirements.
Check Your Progress – 2 :
1. What is physical design of database ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Need of Normalization :
Normalization is a step-by-step set of rules aimed at a well-designed relational database management system (RDBMS), in which data is put into its simplest form. We need normalization of data in a relational database for the following reasons :
• In order to minimize data redundancy.
• It is possible to add new data values and rows without reorganizing the
database structure.
• The data should be consistent throughout the database, i.e. it should not suffer from the following anomalies.
• Insert Anomaly – occurs when certain data cannot be inserted unless other data is present, since null values in keys must be avoided. This kind of anomaly can seriously damage a database.
• Update Anomaly – It is due to data redundancy i.e. multiple occurrences
of same values in a column. This can lead to inefficiency.
• Deletion Anomaly – It leads to loss of data for rows that are not stored
elsewhere. It could result in loss of vital data.
• Complex queries required by the user should be easy to handle.
• On decomposition of a relation into smaller relations with fewer attributes, the resulting relations, whenever joined, must produce the same relation without any extra rows. The join operations can be performed in any order. This is known as Lossless Join decomposition.
The resulting relations (tables) of normalization should possess qualities such as : each row is identified by a unique key, there are no repeating groups, each column is assigned a unique name, etc.
The normal forms are as follows :
• First Normal Form
• Second Normal Form
• Third Normal Form
• Boyce–Codd Normal Form
• Fourth Normal Form
• Fifth Normal Form
• Sixth or Domain–key Normal form
How to identify whether table is in normalized form or not ?
Let's take an example to understand this,
Suppose we want to create a database which stores friends' names and their three favorite artists. The database would be quite simple, so initially there is only one table, called the friends table. Here FID is the primary key.
FID FNAME FavoriteArtist
1 Srihari Akon, The Corrs, Robbie Williams.
2 Arvind Enigma, Chicane, Shania Twain
This table is not in normal form as the FavoriteArtist column is not atomic, i.e. it holds more than one value.
So we can modify this table
So we can modify this table
FID FNAME FavoriteArtist1 FavoriteArtist2 FavoriteArtist3
1 Srihari Akon The Corrs Robbie Williams.
2 Arvind Enigma Chicane Shania Twain
This table is still not in normal form, as we have multiple columns with the same kind of value, i.e. a repeating group of data or repeating columns.
To bring it in First Normal Form, we will perform the following steps:
• We'll first break single table into two.
• Each table should have information about only one entity. So we will
store friend's information in one table and his favorite artists' information
in another
FID FNAME
1 Srihari
2 Arvind
FID FavoriteArtist
1 Akon
1 The Corrs
1 Robbie Williams.
2 Enigma
2 Chicane
2 Shania Twain
FID is a foreign key in the Favorite Artist table which refers to FID in our Friends table.
Now we can say that our table is in first normal form.
So First Normal Form :
• Column values should be atomic, scalar or should be holding single value
• No repetition of information or values in multiple columns.
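A minimal SQL sketch of the two 1NF tables above (the data types and the table name FavoriteArtists are our own choices) :
CREATE TABLE Friends (
    FID   INT NOT NULL,
    FNAME VARCHAR(30),
    PRIMARY KEY (FID)
);
CREATE TABLE FavoriteArtists (
    FID            INT NOT NULL,          -- foreign key referring to Friends
    FavoriteArtist VARCHAR(40) NOT NULL,  -- one artist per row : atomic values
    PRIMARY KEY (FID, FavoriteArtist),
    FOREIGN KEY (FID) REFERENCES Friends (FID)
);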
So what does Second Normal Form mean ?
For second normal form our database should already be in first normal
form and every non–key column must depend on entire primary key.
Here we can say that our Friend database was already in second normal form because we don't have a composite primary key in our friends and favorite artists tables.
Composite primary keys are primary keys made up of more than one column.
But there is no such thing in our database.
But still let's try to understand second normal form with another example
This is our new table
Gadgets Supplier Cost Supplier Address
Headphone Abaci 123$ New York
Mp3 Player Sagas 250$ California
Headphone Mayas 100$ London
In the above table, Gadgets + Supplier together form a composite primary key.
Let's check for dependency. If we know the gadget, we still cannot find the cost, as the same gadget is provided by different suppliers at different rates.
If we know the supplier, we are still unable to find the cost, because the same supplier can provide different gadgets.
If we know both gadget and supplier, then we can find cost. So cost
is fully dependent (functionally dependent) on our composite primary key
(Gadgets+Supplier)
Let's look at another non–key column, Supplier Address.
If we know the gadget, we cannot find the supplier address from it.
If we know who the supplier is, then we can find the address.
So here Supplier Address is not completely dependent on (it is only partially dependent on) our composite primary key (Gadgets+Supplier).
This table is surely not in Second Normal Form.
To bring it in second normal form, we'll break the table in two.
Gadgets Supplier Cost
Headphone Abaci 123$
Mp3 Player Sagas 250$
Headphone Mayas 100$
Supplier Supplier Address
Abaci New York
Sagas California
Mayas London
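A hedged SQL sketch of this 2NF decomposition (the table names GadgetSupplier and SupplierAddress are our own) :
CREATE TABLE GadgetSupplier (
    Gadget   VARCHAR(30) NOT NULL,
    Supplier VARCHAR(30) NOT NULL,
    Cost     DECIMAL(8,2),               -- depends on the whole composite key
    PRIMARY KEY (Gadget, Supplier)
);
CREATE TABLE SupplierAddress (
    Supplier        VARCHAR(30) NOT NULL,
    SupplierAddress VARCHAR(60),         -- depends on Supplier alone
    PRIMARY KEY (Supplier)
);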
We know how to normalize till second normal form.
But let's take a break over here and learn some definitions and terms.
Composite Key :– Composite key is a primary key composed of multiple
columns.
Functional Dependency – When the value of one column is dependent on that of another column, so that if the value of one column changes, the value of the other column changes as well.
E.g. Supplier Address is functionally dependent on supplier name. If
supplier's name is changed in a record we need to change the supplier address
as well.
Supplier –> Supplier Address
"In our supplier table, the Supplier Address column is functionally dependent on the Supplier column"
Partial Functional Dependency – A non–key column is dependent on only some of the columns of a composite primary key.
In our above example Supplier Address was partially dependent on our
composite key columns (Gadgets+Supplier).
Transitive Dependency – a type of functional dependency in which the value of a non–key column is determined by the value of another non–key column.
With these definitions let's move to Third Normal Form.
For a table in third normal form
• It should already be in Second Normal Form.
• There should be no transitive dependency, i.e. we shouldn't have any
non–key column depending on any other non–key column.
Album Artist No. of tracks Country
Come on over Shania Twain 11 Canada
History Michael Jackson 15 USA
Up Shania Twain 11 Canada
MCMXC A.D. Enigma 8 Spain
The cross of changes Enigma 8 Spain
Again we need to make sure that the non–key columns depend upon
the primary key and not on any other non–key column.
Although the above table looks fine, there is still something in it because of which we will normalize it further.
Album is the primary key of the above table.
Artist and No. of tracks are functionally dependent on the Album (primary
key).
In the above table the Country value is getting repeated because of the artist.
So in the above table the Country column depends on the Artist column, which is a non–key column.
We will therefore move that information into another table and save the table from redundancy, i.e. from repeating values of the Country column.
Album Artist No. of tracks
Come on over Shania Twain 11
History Michael Jackson 15
Up Shania Twain 11
MCMXC A.D. Enigma 8
The cross of changes Enigma 8
Artist Country
Shania Twain Canada
Michael Jackson USA
Enigma Spain
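This 3NF decomposition can be sketched in SQL as follows (an illustration with assumed data types; the table names are our own) :
CREATE TABLE ArtistCountry (
    Artist  VARCHAR(40) NOT NULL,
    Country VARCHAR(30),                  -- Country now stored once per artist
    PRIMARY KEY (Artist)
);
CREATE TABLE Album (
    Album      VARCHAR(40) NOT NULL,
    Artist     VARCHAR(40),
    NoOfTracks INT,
    PRIMARY KEY (Album),
    FOREIGN KEY (Artist) REFERENCES ArtistCountry (Artist)
);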
(Figure : Steps in Normalization)
(Figure : Relationship between Normal Forms)
Let us summarize the basic three forms :
First Normal Form :
This is defined in the definition of relations (tables) itself. This rule
defines that all the attributes in a relation must have atomic domains. Values
in atomic domain are indivisible units.
A relation is in first normal form if it meets the definition of a relation:
1. Each attribute (column) value must be a single value only.
2. All values for a given attribute (column) must be of the same type.
3. Each attribute (column) name must be unique.
4. The order of attributes (columns) is insignificant.
5. No two tuples (rows) in a relation can be identical.
6. The order of the tuples (rows) is insignificant.
Converting from UNF to 1NF :
• Select attribute(s) to act as the key.
• Identify the repeating group(s) in the unnormalised table which repeats
for the key attribute(s).
• Remove the repeating group by entering data into empty columns of
rows which contain the repeating data or by placing the repeating data
along with a copy of the original key attribute(s) into a separate relation.
Second Normal Form (2NF) :
It is based on the concept of full functional dependency
• A relation is in second normal form (2NF) if all of its non–key attributes
are dependent on all of the key.
• Another way to say this: A relation is in second normal form if it is
free from partial–key dependencies.
• Relations that have a single attribute for a key are automatically in 2NF.
A 2NF relation is in 1NF and every non–primary–key attribute is fully
functionally dependent on the primary key.
Converting from 1NF to 2NF :
• Identify the primary key for the 1NF relation.
• Identify the functional dependencies in the relation.
• If partial dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
Third Normal Form :
It is based on the concept of transitive dependency. A relation that is
in 1NF and 2NF and in which no non–primary–key attribute is transitively
dependent on the primary key.
For a relation to be in Third Normal Form, it must be in Second Normal
form and the following must satisfy:
• No non–prime attribute is transitively dependent on prime key attribute
• For any non–trivial functional dependency X → A, either
• X is a superkey or,
• A is a prime attribute.
Converting from 2NF to 3NF :
• Identify the primary key in the 2NF relation.
• Identify functional dependencies in the relation.
• If transitive dependencies exist on the primary key, remove them by placing them in a new relation along with a copy of their determinant.
Advantages of Normalization
The advantages of the normalization have been discussed below.
• Here the data structure is more efficient.
• It also avoids redundant fields or columns.
• We are able to add new rows and data values easily
• It helps in better understanding of data.
• It ensures that distinct tables exist when necessary.
• It is very easy to perform operations and complex queries can be easily
handled.
• It even minimizes duplication of data.
• Close modeling of real world entities, processes and their relationships.
Disadvantages of Normalization
The disadvantages of the normalization have been discussed below.
1. One cannot start building the database before knowing what the user needs.
2. In the process of normalizing relations to higher normal forms, i.e. 4NF and 5NF, the performance degrades.
3. Normalizing relations of higher degree is a very time-consuming and difficult process.
4. Careless decomposition may lead to a bad database design, which in turn may lead to serious problems.
Check Your Progress – 4 :
1. What is normalization ? What is need of normalization ?
.......................................................................................................................
.......................................................................................................................
2. ________ can help us detect poor E–R design.
(A) Database Design Process (B) E–R Design Process
(C) Relational scheme (D) Functional dependencies
3. If a multivalued dependency holds and is not implied by the corresponding
functional dependency, it usually arises from one of the following sources.
(A) A many–to–many relationship set
(B) A multivalued attribute of an entity set
(C) A one–to–many relationship set
(D) Both A many–to–many relationship set and A multivalued attribute
of an entity set
4. In which of the following does each related entity set have its own schema, with an additional schema for the relationship set ?
(A) A many–to–many relationship set
(B) A multivalued attribute of an entity set
(C) A one–to–many relationship set
(D) All of the mentioned
5. In which of the following, a separate schema is created consisting of
that attribute and the primary key of the entity set.
(A) A many–to–many relationship set
(B) A multivalued attribute of an entity set
(C) A one–to–many relationship set
(D) All of the mentioned
6. A relation that has no partial functional dependencies is in ________
form :
(A) 3NF (B) 2NF (C) 4NF (D) BCNF
5.9 Glossary :
1. Prime attribute – an attribute, which is part of prime–key, is prime
attribute.
2. Non–prime attribute – an attribute, which is not a part of prime–key,
is said to be a non–prime attribute.
3. Normalization – It is a method to remove all these anomalies and bring
database to consistent state and free from any kinds of anomalies.
4. First Normal Form (1NF) – This rule defines that all the attributes
in a relation must have atomic domains.
5. Second Normal Form (2NF) – Second normal form says, that every
non–prime attribute should be fully functionally dependent on prime key
attribute.
That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also holds.
6. Third Normal Form (3NF) – For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must be satisfied : no non–prime attribute is transitively dependent on a prime key attribute; for any non–trivial functional dependency X → A, either X is a superkey or A is a prime attribute.
5.10 Assignment :
State the conceptual difference between simple bifurcation and normalization from a DBMS point of view.
5.11 Activities :
Study the BCNF, 4NF and 5NF normal forms in detail.
Unit
06 NORMALIZATION
UNIT STRUCTURE
6.0 Learning Objectives
6.1 Introduction
6.2 What is Normalization ?
6.3 Database Normal Forms and Example
6.4 1NF (First Normal Form)
6.5 2NF (Second Normal Form)
6.6 3NF (Third Normal Form)
6.7 BCNF (Boyce–Codd Normal Form)
6.8 4NF (Fourth Normal Form)
6.9 5NF & 6NF (Fifth & Sixth Normal Form)
6.10 Let Us Sum Up
6.11 Suggested Answer for Check Your Progress
6.12 Glossary
6.13 Assignment
6.14 Activities
6.15 Case Study
6.16 Further Readings
6.1 Introduction :
Database normalization is the process of organizing data into tables in such a way that the results of using the database are always
unambiguous and as intended. Such normalization is intrinsic to relational
database theory. It may have the effect of duplicating data within the database
and often results in the creation of additional tables. In this unit learner will
get insight about Normalization and its various forms.
6.4 1NF :
For a table to be in the First Normal Form, it should follow the following
4 rules :
• It should only have single (atomic) valued attributes/columns.
• Values stored in a column should be of the same domain
• All the columns in a table should have unique names.
• And the order in which data is stored, does not matter.
6.5 2NF :
For a table to be in the Second Normal Form,
• It should be in the First Normal form.
• And, it should not have Partial Dependency.
It is clear that we can't move our simple database forward into Second Normal Form unless we partition the table above.
We have divided our 1NF table into two tables viz. Table 1 and Table 2.
Table 1 contains member information. Table 2 contains information on movies
rented. We have introduced a new column called Membership_id which is the
primary key for table 1. Records can be uniquely identified in Table 1 using
membership id. Foreign Key references the primary key of another Table! It
helps connect your Tables.
You will only be able to insert values into your foreign key that exist
in the unique key in the parent table. This helps in referential integrity.
The above problem can be overcome by declaring membership id from
Table2 as foreign key of membership id from Table1. Now, if somebody tries
to insert a value in the membership id field that does not exist in the parent
table, an error will be shown
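A small sketch of this behaviour (table and column names are illustrative; Table 1 is assumed to be a Members table whose primary key is Membership_id) :
CREATE TABLE Rentals (
    Membership_id INT,
    Movie_Rented  VARCHAR(40),
    FOREIGN KEY (Membership_id) REFERENCES Members (Membership_id)
);
-- If Members holds only the ids 1 and 2, this insert is rejected
-- with a referential integrity (foreign key) violation :
INSERT INTO Rentals (Membership_id, Movie_Rented) VALUES (99, 'Casablanca');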
A transitive functional dependency is when changing a non–key column might cause any of the other non–key columns to change. Consider Table 1 : changing the non–key column Full Name may change Salutation.
We have again divided our tables and created a new table which stores
Salutations. There are no transitive functional dependencies, and hence our table
is in 3NF.
In Table 3, Salutation ID is the primary key, and in Table 1, Salutation ID is a foreign key referencing the primary key of Table 3.
Now our little example is at a level that cannot further be decomposed
to attain higher normal forms of normalization. In fact, it is already in higher
normalization forms. Separate efforts for moving into the next levels of normalizing data are normally needed in complex databases. However, we will discuss the next levels of normalization in brief in the following sections.
Advantages of removing Transitive Dependency :
The advantages of removing transitive dependency are that the amount of data duplication is reduced and data integrity is achieved.
6.7 BCNF :
Boyce and Codd Normal Form is a higher version of the Third Normal
form. This form deals with a certain type of anomaly that is not handled by 3NF. A 3NF table which does not have multiple overlapping candidate keys
is said to be in BCNF. For a table to be in BCNF, following conditions must
be satisfied:
• R must be in 3rd Normal Form
• and, for each functional dependency (X → Y), X should be a super key.
The second point sounds a bit tricky, right ? In simple words, it means that for a dependency A → B, A cannot be a non–prime attribute if B is a prime attribute. Boyce–Codd Normal Form or BCNF is an extension to the third normal form, and is also known as 3.5 Normal Form.
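The text gives no worked example here, so the following is our own illustrative sketch. Consider enrolment(student, subject, professor) with the dependencies (student, subject) → professor and professor → subject. The candidate keys {student, subject} and {student, professor} overlap, and professor → subject has a determinant that is not a superkey, so the table is in 3NF but not BCNF. A BCNF decomposition in SQL :
CREATE TABLE ProfessorSubject (
    professor VARCHAR(30) NOT NULL,
    subject   VARCHAR(30) NOT NULL,      -- each professor teaches one subject
    PRIMARY KEY (professor)
);
CREATE TABLE StudentProfessor (
    student   VARCHAR(30) NOT NULL,
    professor VARCHAR(30) NOT NULL,
    PRIMARY KEY (student, professor),
    FOREIGN KEY (professor) REFERENCES ProfessorSubject (professor)
);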
6.8 4NF :
A table is said to be in the Fourth Normal Form when,
• It is in the Boyce–Codd Normal Form.
• And, it doesn't have Multi–Valued Dependency.
Fourth Normal Form comes into picture when Multi–valued Dependency
occur in any relation. In this tutorial we will learn about Multi–valued Dependency,
how to remove it and how to make any table satisfy the fourth normal form.
What is Multi–valued Dependency ?
A table is said to have multi–valued dependency, if the following conditions
are true,
For a dependency A → B, if for a single value of A multiple values of B exist, then the table may have multi–valued dependency. Also, a table
should have at–least 3 columns for it to have a multi–valued dependency. And,
for a relation R(A, B, C), if there is a multi–valued dependency between, A
and B, then B and C should be independent of each other. If all these conditions
are true for any relation(table), it is said to have multi–valued dependency.
6.10 Let Us Sum Up :
Let us summarize the unit : it focused on a detailed understanding of the normalization process, from the first normal form to the sixth normal form, with examples.
6.12 Glossary :
1. First Normal Form (1NF) – This rule defines that all the attributes
in a relation must have atomic domains.
2. Second Normal Form (2NF) – Second normal form says, that every
non–prime attribute should be fully functionally dependent on prime key
attribute.
That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A also holds.
3. Third Normal Form (3NF) – For a relation to be in Third Normal Form, it must be in Second Normal Form and the following must be satisfied : no non–prime attribute is transitively dependent on a prime key attribute; for any non–trivial functional dependency X → A, either X is a superkey or A is a prime attribute.
6.13 Assignment :
State the conceptual difference between simple bifurcation and normalization from a DBMS point of view.
6.14 Activities :
Study the BCNF, 4NF and 5NF normal forms in detail.
6.15 Case Study :
Each painting is allocated a monthly rental price, defined by its owner. The owner of a painting is paid 10% of the rental price when a customer hires it. Any paintings that are not hired within six months are returned to the owner. However, three months later the owner can resubmit the painting.
Each painting can only have one artist associated with it.
Several reports are required from the system. The three main ones are :
1. For each client, a report showing an overview of all the pictures they have hired or are currently hiring
2. For each artist, a report of all the pictures submitted for hire
3. For each artist, a report of the pictures returned during the past six months
You are supposed to take each report in turn and normalize it to produce a set of relations. At each step in the normalization process you must show the relations.
BLOCK SUMMARY :
In unit 4, we discussed that in a relational model real-world objects are represented in tables. Each table is made up of rows and columns. Each row is also known as a tuple or record; each column, also known as an attribute, is a field. Each attribute stands for a certain feature of the real-world object. An attribute is defined by a name and its value.
Unit 5 discussed issues of bad database design and explained the need for the normalization process. It also explained logical and physical database design and the need for capacity planning. The basics of the normalization process were also discussed.
BLOCK ASSIGNMENT :
Short Questions :
1. Define Candidate key, primary key.
2. Define terms schema, tuple, instance, relation.
3. What is goal of normalization ?
4. Write any two advantages of normalization.
5. What is referential integrity ?
Long Questions :
1. Explain the anomalies in un–normalized database.
2. What is normalization ? Explain basic three normal forms.
3. Compare Hierarchical, network and relational model.
Enrolment No. :
1. How many hours did you need for studying the units ?
Unit No. 4 5 6
No. of Hrs.
2. Please give your reactions to the following items based on your reading
of the block :
Unit 7 will introduce query language for database. SQL (Structured Query
Language) is a database computer language designed for managing data in
relational database management systems (RDBMS). SQL is a standardized
computer language that was originally developed by IBM for querying, altering
and defining relational databases, using declarative statements. SQL has several
parts like DDL, DML, and TCL etc. You will discover how to use SQL to sort and retrieve data from tables and how to use SQL to filter retrieved data. You
will also learn how to gather significant statistics from data using aggregate
functions, and how to extract data from multiple tables simultaneously using joins
and subqueries. In addition, you'll learn how to manipulate data using the
INSERT, UPDATE, and DELETE statements. So this unit will provide you SQL's
basic constructs and concepts.
the database. Transactions start with an executable DML statement and end when
the statement or multiple statements are all either rolled back or committed to
the database, or when a DDL (Data Definition Language) statement is issued
during the transaction. You will learn different types of transactions in this unit.
Also you will understand the steps involved in processing local and remote
transaction. In addition, you will learn how to optimize the execution of
transaction.
technology. The result of this blend is Object Oriented Database (OODB). The
blending achieves integration of applications running on different platforms.
OODB has more or less the same features of DBMS, but, in addition, is in a
position to handle OO features. In the real world, both the applications exist,
namely, OO applications and RDBMS applications, with first generation client–
server computing. So in an organisation, both OODBs and RDBs exist. The user
must have access to both of them to manipulate data. The developer must
therefore develop applications that could source data from all databases (OODB,
relational database, etc.). Now there is a mismatch between the application
objects and relational data that needs to be mapped for use in the application.
The process of mapping and integrating begins with defining the relationships
between the table structures in RDB and class structure in the object model in
OODB.
Block Objectives :
After learning this block, you will be able to :
7.1 Introduction :
Dr. E. F. Codd published the paper, "A Relational Model of Data for
Large Shared Data Banks", in June 1970 in the journal, Communications of
the ACM. Codd's model is now accepted as the definitive model for relational
database management systems (RDBMS). The language, Structured English
Query Language (SEQUEL) was developed by IBM Corporation, Inc., to use
Codd's model. SEQUEL later became SQL (still pronounced "sequel"). In 1979,
Relational Software, Inc. (now Oracle) introduced the first commercially available
implementation of SQL. Today, SQL is accepted as the standard RDBMS
language.
SQL is a non–procedural language. Non–procedural means that SQL lacks the traditional control constructs such as IF–THEN–ELSE, WHILE, FOR, and GO TO statements found in procedural languages like C, Perl, Python, and other 3GLs.
SQL is called a declarative language, meaning that you only need to
specify what needs to be accomplished (such as a query or insert) and not
how to do it; the DBMS determines and performs internally the step by step
operation needed to obtain the result.
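For instance (a small illustration; the emp table is the one used later in this unit) :
-- Declarative : state WHAT is wanted; the DBMS decides how to get it.
SELECT ename
FROM emp
WHERE sal > 2000;
-- A procedural 3GL would instead spell out the steps, roughly :
--   open emp; for each row : if row.sal > 2000 then output row.ename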
The sets of records can be manipulated instead of one record at a time.
The syntax is free–flowing, enabling you to concentrate on the data presentation.
Oracle has two optimizers (cost– and rule–based) that will parse the syntax
and format it into an efficient statement before the database engine receives
it for processing. The database administrator (DBA) determines which optimizer
is in effect for each database instance.
7.2 History :
SQL History
The history of SQL begins in an IBM laboratory in San Jose, California,
where SQL was developed in the late 1970s. The initials stand for Structured
Query Language, and the language itself is often referred to as "sequel." It
was originally developed within IBM's System R project and later used in IBM's DB2 product (a relational database management system, or RDBMS, that can still be bought today for various platforms and environments). In fact, SQL makes an RDBMS possible. SQL
is a nonprocedural language, in contrast to the procedural or third–generation
languages (3GLs) such as COBOL and C that had been created up to that
time.
Two standards organizations, the American National Standards Institute
(ANSI) and the International Standards Organization (ISO), currently promote
SQL standards to industry. Although these standard–making bodies prepare
standards for database system designers to follow, all database products differ
from the ANSI standard to some degree. In addition, most systems provide
some proprietary extensions to SQL that extend the language into a true
procedural language.
Check Your Progress – 1 :
1. What is SQL ? Explain in brief History of SQL.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Basic Structure
SQL is based on set and relational operations with certain modifications and enhancements. A typical SQL query has the form :
Select A1, A2,..., An
From r1, r2,..., rm
Where P
– The Ai represent attributes
– The ri represent relations
– P is a predicate or condition.
Select Clause :
• The select clause corresponds to the projection operation of the relational
algebra. It is used to list the attributes desired in the result of a query.
• The general form for a SELECT statement, retrieving all of the rows
in the table is:
Select Column Name, Column Name,...
From Table Name;
• To get all columns of a table without typing all column names, use: Select
* From Table Name;
Note that SQL is not case sensitive. SELECT is the same as select.
Sql> Select * From Emp;
EMPNO ENAME JOB MGR HIREDATE SAL COMM DEPTNO
––– ––– ––– ––– ––– ––– ––– –––
7369 Smith Clerk 7902 17-Dec-80 800 20
7499 Allen Salesman 7698 20-Feb-81 1600 300 30
7521 Ward Salesman 7698 22-Feb-81 1250 500 30
7566 Jones Manager 7839 02-Apr-81 2975 20
7654 Martin Salesman 7698 28-Sep-81 1250 1400 30
7698 Blake Manager 7839 01-May-81 2850 30
7782 Clark Manager 7839 09-Jun-81 2450 10
7788 Scott Analyst 7566 19-Apr-87 3000 20
7839 King President 17-Nov-81 5000 10
7844 Turner Salesman 7698 08-Sep-81 1500 0 30
7876 Adams Clerk 7788 23-May-87 1100 20
7900 James Clerk 7698 03-Dec-81 950 30
7902 Ford Analyst 7566 03-Dec-81 3000 20
7934 Miller Clerk 7782 23-Jan-82 1300 10
14 Rows Selected.
Select All The Records From Dept Table
Sql> Select * From Dept;
DEPTNO DNAME LOC
––– ––– –––
10 ACCOUNTING NEW YORK
20 RESEARCH DALLAS
30 SALES CHICAGO
40 OPERATIONS BOSTON
• SQL allows duplicates in relations as well as in query results. To force
the elimination of duplicates, insert the keyword distinct after select.
Example : Find the names of all employees in the emp relation, and
remove duplicates.
select distinct ename
from emp;
• The keyword all specifies that duplicates should not be removed.
select all ename
from emp;
• The select clause can also contain arithmetic expressions involving the operators +, –, *, and /, operating on constants or attributes of tuples. The query :
Select ename, sal*100 From emp;
Would return a relation which is the same as the emp relation, except that the attribute sal is multiplied by 100.
WHERE clause:
• The where clause corresponds to the selection predicate of the relational
algebra. It consists of a predicate involving attributes of the relations
that appear in the from clause.
SYNTAX :
Where <search condition>
Find The Employee Name Who Is Working In Dept No 30
Sql> Select Ename From Emp Where Deptno=30;
Ename
Allen
Ward
Martin
Blake
Turner
James
6 Rows Selected.
• SQL uses the logical connectives and, or, and not. It allows the use of
arithmetic expressions as operands to the comparison operators. SQL
includes a between comparison operator in order to simplify where
clauses that specify that a value be less than or equal to some value
and greater than or equal to some other value.
• Find ename Who Are Manager And Getting Salary More Than 2000
Sql> Select ename From emp Where Job='Manager' And Sal>2000;
Ename
–––
Jones
Blake
Clark
• Find Those Employees Who Were Hired Between 1 Mar 1981 And 1
Jun 1983
Sql> Select Ename From Emp Where Hiredate Between '1–Mar–1981'
And '1–Jun–1983';
Ename
–––
Jones
Martin
Blake
Clark
King
Turner
James
Ford
Miller
9 Rows Selected.
FROM Clause :
• The from clause corresponds to the Cartesian product operation of the
relational algebra. It lists the relations to be scanned in the evaluation
of the expression.
• Find the Cartesian product emp×dept
Select ename, dname
From emp, dept
• Find the name of employees working at location 'DALLAS'
select distinct ename
from emp, dept
where emp.deptno = dept.deptno and loc= 'DALLAS';
Tuple variables :
Tuple variables are defined in the from clause via the use of the as clause.
• Find the emp names, deptno and their dept name
Select T.ename, S.Deptno, S.Dname
from emp as T, dept as S
where T.deptno = S.deptno
String Operations :
• SQL includes a string–matching operator for comparisons on character
strings. Patterns are described using two special characters :
– percent (%) : The % character matches any substring.
– underscore (_) : The _ character matches any single character.
• Find the names of all employees whose name includes the substring 'john'.
Select ename
From emp
Where ename like '%john%';
Create Table
The PURPOSE of this command is to create a table, the basic structure
to hold user data, specifying this information:
• column definitions
• integrity constraints
• the table's table space
• storage characteristics
Syntax :
Create Table [Schema.]Table
( { Column Datatype [Default Expr] [Column_Constraint]...
| Table_Constraint}
[, { Column Datatype [Default Expr] [Column_Constraint]...
| Table_Constraint} ]...)
Where:
• Schema – is the schema containing the table. If you omit schema, it is assumed that the table is in your own schema.
• Table – is the name of the table to be created.
• Column – specifies the name of a column of the table. The number
of columns in a table can range from 1 to 254.
• Datatype – is the datatype of a column.
The data types permitted are :
• Table Constraint – defines an integrity constraint as part of the table definition.
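As an illustrative instance of this syntax (the Student table and its columns are our own choice, using Oracle-style data types) :
CREATE TABLE Student (
    RollNo INT NOT NULL,                   -- column definition
    Name   VARCHAR2(30) NOT NULL,          -- column constraint : NOT NULL
    Marks  NUMBER(5,2) CHECK (Marks >= 0), -- column constraint : CHECK
    PRIMARY KEY (RollNo)                   -- table constraint
);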
For Example :
To drop the Student table, enter:
SQL> DROP TABLE Student;
Table dropped.
Check Your Progress – 3 :
1. Explain basic DDL statements.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
(Figure : Data Manipulation Language – SELECT, UPDATE, DELETE)
Queries :
To retrieve data from the database, use the SELECT statement. Once again, proper privileges are required and are maintained by the DBA. The SELECT statement has the following format :
SELECT column(s)
FROM tables(s)
WHERE conditions are met
GROUP BY selected columns
ORDER BY column(s);
Every SQL statement ends with a semicolon (;). When you are writing scripts (disk files) that will be executed, you can also use a slash (/) to execute the SQL statement.
When SELECT column(s) is used, it is assumed that all of the columns
fitting the WHERE clause will be retrieved. It is sometimes necessary to only
retrieve columns that are distinct from one another. To do this, use the reserved
word DISTINCT before the column descriptions. In the following example,
a SELECT statement is used to retrieve all of the cities and states from the
addresses table (defined previously).
SELECT city, state
FROM addresses;
When you run this code, every city and state will be retrieved from the
table. If 30 people lived in Rochester, NY, the data would be displayed 30
times. To see only one occurrence for each city and state use the DISTINCT
qualifier, as shown in the following example:
SELECT DISTINCT city, state
FROM addresses;
This will cause only one row to be retrieved for entries with Rochester,
NY.
The FROM clause is a listing of all tables needed for the query. You
can use table aliases to help simplify queries, as shown in the following
example:
SELECT adrs.city, adrs.state
FROM addresses adrs;
In this example, the alias adrs has been given to the table addresses.
The alias will be used to differentiate columns with the same name from
different tables.
The 'WHERE' clause is used to list the criteria necessary to restrict the
output from the query or to join tables in the FROM clause. See the following
example.
SELECT DISTINCT city, state
FROM addresses
WHERE state in ('CA','NY','CT')
AND city is NOT NULL;
The above example will retrieve cities and states that are in the states of California, New York, and Connecticut. The check for NOT NULL cities will not bring data back if the city field was not filled in.
The GROUP BY clause tells Oracle how to group the records together
when certain functions are used.
SELECT dept_no, SUM(emp_salary)
FROM emp
GROUP BY dept_no;
The GROUP BY example will list all department numbers once with
the summation of the employee salaries for that particular department.
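The ORDER BY clause, shown in the SELECT format above, sorts the rows that a query returns. A small illustration extending the GROUP BY example (the descending sort is our own choice) :
SELECT dept_no, SUM(emp_salary)
FROM emp
GROUP BY dept_no
ORDER BY SUM(emp_salary) DESC;  -- departments with the largest payroll first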
Check Your Progress – 5 :
1. Explain group by and order by clause with example.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
For example : Find the names of employees whose salary is greater than that of employee number 7566.
SELECT ename
FROM EMP
WHERE sal >
( SELECT sal
FROM emp
WHERE empno=7566);
Output :
ENAME
FORD
SCOTT
KING
• A common use of sub queries is to perform tests for set membership,
set comparisons, and set cardinality.
SET MEMBERSHIP:
The 'in' & 'not in' connectivity tests are used for set membership &
absence of set membership respectively.
For example, display the employee number and name for all employees
who work in a department with any employee whose name contains a T.
SELECT empno, ename
FROM emp
WHERE deptno IN
( SELECT deptno
FROM emp
WHERE ename LIKE '%T%' );
Set Comparisons:
The < some, > some, <= some, >= some, = some, <> some are the
constructs allowed for comparison.
= some is same as the 'in' connectivity.
<> some is not the same as the 'not in' connectivity.
Similarly, SQL also provides < all, > all, <= all, >= all, <> all comparisons.
<>all is same as the 'not in' construct.
SELECT empno, ename, job
FROM emp
WHERE sal < ANY
( SELECT sal
FROM emp
WHERE job = 'CLERK' );
Output :
EMPNO ENAME JOB
7369 SMITH CLERK
7900 JAMES CLERK
7876 ADAMS CLERK
7521 WARD SALESMAN
7654 MARTIN SALESMAN
Set Cardinality :
The 'exists' construct returns the value true if the argument subquery is
nonempty. We can test for the non–existence of tuples in a subquery by using
the 'not exists' construct. The 'not exists' construct can also be used to simulate the set containment operation (the superset). We can write "relation A contains relation B" as "not exists (B except A)".
For example, display employees and their dept. numbers where there exists an employee whose salary > 3500. (Since the subquery below is not correlated with the outer query, EXISTS is true for every row, so all employees are listed.)
SELECT ename, deptno
FROM emp
WHERE EXISTS
(SELECT *
FROM emp
WHERE sal >3500 )
ENAME DEPTNO
SMITH 20
ALLEN 30
WARD 30
JONES 20
MARTIN 30
BLAKE 30
CLARK 10
SCOTT 20
KING 10
TURNER 30
ADAMS 20
JAMES 30
FORD 20
MILLER 10
Check Your Progress – 6 :
1. Explain SQL statements involving set membership, set comparisons and
set cardinality operations.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Aggregate Functions
The five important aggregate functions are SUM, AVG, MAX, MIN, and
COUNT. They are called aggregate functions because they summarize the
results of a query, rather than listing all of the rows.
(1) SUM () gives the total of all the rows, satisfying any conditions, of the
given column, where the given column is numeric.
(2) AVG () gives the average of the given column.
(3) MAX () gives the largest figure in the given column.
(4) MIN () gives the smallest figure in the given column.
(5) COUNT (*) gives the number of rows satisfying the conditions.
• FIND ENAME WHO WAS HIRED FIRST :
SQL> SELECT ENAME FROM EMP WHERE HIREDATE IN(SELECT
MIN(HIREDATE) FROM EMP);
ENAME
–––––
SMITH
• FIND TOTAL SALARY FOR THOSE WHO ARE NOT MANAGER
SQL> SELECT SUM(SAL) FROM EMP WHERE JOB<>'MANAGER';
SUM(SAL)
20750
• FIND THE TOTAL SALARY OF EACH DEPT, EXCLUDING THE EMPLOYEES WHO ARE SALESMEN, AND DISPLAY ONLY THOSE DEPTS WHOSE TOTAL > 7000
SQL> SELECT DEPTNO,SUM(SAL) FROM EMP WHERE
JOB!='SALESMAN' GROUP BY DEPTNO HAVING SUM(SAL)>7000;
DEPTNO SUM (SAL)
10 8750
20 10875
• FIND AVG SALARY FOR ALL THE JOB TYPES WITH MORE THAN
2 EMPLOYEES
SQL> SELECT JOB,AVG(SAL) FROM EMP GROUP BY JOB HAVING
COUNT(JOB)>2;
JOB AVG(SAL)
CLERK 1037.5
MANAGER 2758.3333
SALESMAN 1400
• DISPLAY EMP. COUNT FOR EACH JOB CATAGORY
SQL> SELECT JOB,DEPTNO,COUNT(ENAME) FROM EMP GROUP
BY JOB,DEPTNO;
JOB DEPTNO COUNT(ENAME)
––––– ––––– –––––
ANALYST 20 2
CLERK 10 1
CLERK 20 2
CLERK 30 1
MANAGER 10 1
MANAGER 20 1
MANAGER 30 1
PRESIDENT 10 1
SALESMAN 30 4
9 ROWS SELECTED.
Check Your Progress – 7 :
1. Explain any five aggregate functions.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
2. Designers use which of the following to tune the performance of systems to support time–critical operations ?
(A) Denormalization (B) Redundant optimization
(C) Optimization (D) Realization
3. If one attribute is determinant of second, which in turn is determinant
of third, then the relation cannot be:
(A) Well–structured (B) 1NF
(C) 2NF (D) 3NF
4. 5NF is designed to cope with :
(A) Transitive dependency (B) Join dependency
(C) Multi valued dependency (D) None of these
5. Third normal form is inadequate in situations where the relation :
(A) has multiple candidate keys
(B) has candidate keys that are composite
(C) has overlapped candidate keys
(D) none of the above
6. ________ function gives the total of all the rows.
(A) sum() (B) Avg() (C) Min() (D) Max()
7.10 Suggested Answer for Check Your Progress :
Check Your Progress 1 :
See Section 7.2
Check Your Progress 2 :
See Section 7.3
Check Your Progress 3 :
See Section 7.4
Check Your Progress 4 :
See Section 7.5
Check Your Progress 5 :
See Section 7.6
Check Your Progress 6 :
See Section 7.7
Check Your Progress 7 :
1 : See Section 7.8, 2 : Denormalization
3 : 3NF, 4 : Join dependency
5 : None of the above, 6 : sum()
7.11 Glossary :
1. Aggregate – A single [summary] field's value, based on results calculated
from values found in like fields across an entire set or subset of records;
typically counts, sums, averages, first, last, minimum and maximum.
2. Column – Synonymous with field.
3. Constraints – Data restrictions specified in a database; rules that determine
what values the fields of the table can assume.
4. Data Retrieval – Data extraction from disparate sources, most operational,
some legacy –– typically in different formats.
5. Data Type – Every field in every table in a database must be declared
as a specific type of data with defined parameters and limitations (e.g.
numeric, character or text, date, logical, etc.), known as a data type.
6. Key – A key is a field, or combination of fields, that uniquely identifies
a record in a table.
7. Candidate Key – (1) One or more fields that will uniquely identify one
record in a table; (2) A potential primary key.
8. Composite Key – A key made up of two or more table columns that,
together, guarantee uniqueness, when there is no single column available
that can guarantee uniqueness by itself.
9. Foreign key – A column or group of columns in a table that corresponds
to or references a primary key in another table in the database. A foreign
Key need not itself be unique, but must uniquely identify the field or
fields in the table that the key references.
10. Primary key – A field or combination of fields that uniquely identifies each record in a table, so that each record can be uniquely distinguished
from every other occurring in the table. A table cannot have more than
one primary key, and a primary key, by definition may not contain a
null value.
11. Nonprocedural Language – For example, SQL; a computer language
in which the programmer cannot dictate in what order a procedure will
process its contained commands.
12. Procedural Language – A computer language that solves a problem by
sequentially executing steps in either a linear or loop back procedure
(or both) until the operation is completed.
13. Query Language – A computer language that enables an end–user to
create and run queries that interact directly with a DBMS to retrieve,
and possibly modify, the data it contains.
14. Nested query – A statement that contains one or more sub queries.
15. Referential Integrity – (1) A series of rules that defines and manages
the link between parent and child records; (2) A state in which all the
tables in the database are consistent with each other; (3) The facility
of any DBMS that ensures the validity of predefined relationships.
16. Schema – (1) The database's metadata –– the structure of an entire
database, which specifies, among other things, the tables, their fields,
and their domains. In some database systems, the linking or join fields
are also specified as part of the schema; (2) The description of a single
table.
17. Table – Synonymous with relation. A collection of data organized into records and fields (a.k.a. rows and columns), with fields being descriptions
of the kinds of information contained in each record (attributes); and
records being specific instances usually referring to specific objects or
persons (entities).
18. SQL – Pronounced "Sequel", it stands for Structured Query Language,
the standard format for commands that most database software understands.
7.12 Assignment :
Explain in your own words the importance of SQL in today's IT world.
7.13 Activities :
Explain in brief various relational databases available in IT world.
Unit
08 SQL CONSTRAINTS
UNIT STRUCTURE
8.0 Learning Objectives
8.1 Introduction
8.2 Not Null Constraint
8.3 Default Constraint
8.4 Unique Constraint
8.5 Primary Key
8.6 Foreign Key
8.7 Check Constraint
8.8 Let Us Sum Up
8.9 Suggested Answer for Check Your Progress
8.10 Glossary
8.11 Assignment
8.12 Activities
8.13 Case Study
8.14 Further Readings
8.1 Introduction :
Constraints are the rules enforced on the data columns of a table. These
are used to limit the type of data that can go into a table. This ensures the
accuracy and reliability of the data in the database. Constraints could be either
on a column level or a table level. The column level constraints are applied
only to one column, whereas the table level constraints are applied to the whole
table.
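A brief sketch of the two levels (the ACCOUNTS table is our own illustration) :
CREATE TABLE ACCOUNTS (
    ACC_NO  INT NOT NULL,                       -- column level : NOT NULL
    BALANCE DECIMAL(10,2) CHECK (BALANCE >= 0), -- column level : CHECK
    BRANCH  VARCHAR(20),
    CONSTRAINT PK_ACC PRIMARY KEY (ACC_NO),     -- table level
    CONSTRAINT UQ_ACC UNIQUE (BRANCH, ACC_NO)   -- table level, spans two columns
);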
Following are some of the most commonly used constraints available
in SQL. These constraints have already been discussed in SQL – RDBMS
Concepts chapter, but it's worth revising them at this point.
• NOT NULL Constraint – Ensures that a column cannot have NULL value.
• DEFAULT Constraint – Provides a default value for a column when none
is specified.
• UNIQUE Constraint – Ensures that all values in a column are different.
• PRIMARY Key – Uniquely identifies each row/record in a database table.
• FOREIGN Key – Uniquely identifies a row/record in another database table.
• CHECK Constraint – The CHECK constraint ensures that all the values
in a column satisfies certain conditions.
Constraints can be specified when a table is created with the CREATE
TABLE statement or you can use the ALTER TABLE statement to create
constraints even after the table is created.
Dropping Constraints :
Any constraint that you have defined can be dropped using the ALTER
TABLE command with the DROP CONSTRAINT option.
For example, to drop the primary key constraint in the EMPLOYEES
table, you can use the following command.
ALTER TABLE EMPLOYEES DROP CONSTRAINT
CONSTRAINT_NAME;
Some implementations may provide shortcuts for dropping certain
constraints. For example, to drop the primary key constraint for a table in
Oracle, you can use the following command.
ALTER TABLE EMPLOYEES DROP PRIMARY KEY;
8.3 Default Constraint :
The DEFAULT constraint provides a default value to a column when
the INSERT INTO statement does not provide a specific value.
Example :
For example, the following SQL creates a new table called CUSTOMERS
and adds five columns. Here, the SALARY column defaults to 5000.00, so if
the INSERT INTO statement does not provide a value for this column, the
column is set to 5000.00.
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2) DEFAULT 5000.00,
PRIMARY KEY (ID)
);
If the CUSTOMERS table has already been created, then to add a
DEFAULT constraint to the SALARY column, you would write a query like
the one which is shown in the code block below.
ALTER TABLE CUSTOMERS
MODIFY SALARY DECIMAL (18, 2) DEFAULT 5000.00;
Drop Default Constraint
To drop a DEFAULT constraint, use the following SQL query.
ALTER TABLE CUSTOMERS ALTER COLUMN SALARY DROP
DEFAULT;
8.4 Unique Constraint :
If the CUSTOMERS table has already been created, then to add a
UNIQUE constraint to the AGE column, you would write a statement like the
query that is given in the code block below.
ALTER TABLE CUSTOMERS MODIFY AGE INT NOT NULL UNIQUE;
You can also use the following syntax, which supports naming the
constraint in multiple columns as well.
ALTER TABLE CUSTOMERS ADD CONSTRAINT myUniqueConstraint
UNIQUE(AGE, SALARY);
DROP a UNIQUE Constraint
To drop a UNIQUE constraint, use the following SQL query.
ALTER TABLE CUSTOMERS DROP CONSTRAINT
myUniqueConstraint;
If you are using MySQL, then you can use the following syntax :
ALTER TABLE CUSTOMERS DROP INDEX myUniqueConstraint;
8.5 Primary Key :
For defining a PRIMARY KEY constraint on multiple columns, use the
SQL syntax given below.
CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25),
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID, NAME)
);
To create a PRIMARY KEY constraint on the "ID" and "NAME"
columns when the CUSTOMERS table already exists, use the following SQL
syntax.
ALTER TABLE CUSTOMERS
ADD CONSTRAINT PK_CUSTID PRIMARY KEY (ID, NAME);
Delete Primary Key
You can clear the primary key constraints from the table with the syntax
given below.
ALTER TABLE CUSTOMERS DROP PRIMARY KEY ;
8.6 Foreign Key :
CREATE TABLE ORDERS (
ID INT NOT NULL,
DATE DATETIME,
CUSTOMER_ID INT references CUSTOMERS(ID),
AMOUNT double,
PRIMARY KEY (ID)
);
If the ORDERS table has already been created and the foreign key has
not yet been set, then use the following syntax for specifying a foreign key
by altering the table.
ALTER TABLE ORDERS ADD FOREIGN KEY (Customer_ID)
REFERENCES CUSTOMERS (ID);
DROP a FOREIGN KEY Constraint
To drop a FOREIGN KEY constraint, use the following SQL syntax.
ALTER TABLE ORDERS DROP FOREIGN KEY;
8.7 Check Constraint :
To drop a CHECK constraint, use the following SQL syntax. This syntax
does not work with MySQL.
ALTER TABLE CUSTOMERS DROP CONSTRAINT myCheckConstraint;
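For context, a named CHECK constraint such as myCheckConstraint could
have been created with a statement like the sketch below; the condition
AGE >= 18 is only an assumed example, not one defined earlier in this unit.
ALTER TABLE CUSTOMERS
ADD CONSTRAINT myCheckConstraint CHECK (AGE >= 18);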
Check Your Progress :
1. Which of the following is not an integrity constraint ?
(A) Not null (B) Positive
(C) Unique (D) Check 'predicate'
2. Foreign key is the one in which the ________ of one relation is referenced
in another relation.
(A) Foreign key (B) Primary key
(C) References (D) Check constraint
3. Point out the correct statement.
(A) CHECK constraints enforce domain integrity
(B) UNIQUE constraints enforce the uniqueness of the values in a set
of columns
(C) In a UNIQUE constraint, no two rows in the table can have the
same value for the columns
(D) All of the mentioned
4. Which of the following constraints does not enforce uniqueness ?
(A) UNIQUE (B) Primary key
(C) Foreign key (D) None of the mentioned
5. Constraints can be applied on ___________
(A) Column (B) Table
(C) Field (D) All of the mentioned
8.10 Glossary :
1. NOT NULL – Ensures that a column cannot have a NULL value
2. UNIQUE – Ensures that all values in a column are different
3. PRIMARY KEY – A combination of a NOT NULL and UNIQUE.
Uniquely identifies each row in a table
4. FOREIGN KEY – Uniquely identifies a row/record in another table
5. CHECK – Ensures that all values in a column satisfy a specific
condition
6. DEFAULT – Sets a default value for a column when no value is specified
7. INDEX – Used to create and retrieve data from the database very quickly
8.11 Assignment :
Create a user registration table that makes use of all the constraints.
8.12 Activities :
Study the constraints in detail.
Unit
09 TRANSACTION PROCESSING
UNIT STRUCTURE
9.0 Learning Objectives
9.1 Introduction
9.2 Types of Transactions
9.2.1 Concurrent Transactions
9.2.2 Discrete Transactions
9.2.3 Distributed Transactions
9.2.4 In–Doubt Transactions
9.2.5 Normal Transactions
9.2.6 Read–Only Transactions
9.2.7 Remote Transactions
9.3 Read–Consistency
9.4 Steps to Processing a Transaction
9.4.1 Entering DML/DDL Statements
9.4.2 Assigning Rollback Segments
9.4.3 Long–Running Transactions and Rollback Segment Allocation
9.4.4 Using the Optimizer
9.4.5 Cost–Based Analysis
9.4.6 Rule–Based Analysis
9.4.7 Overriding the Optimizer_Mode Parameter
9.4.8 Parsing Statements
9.4.9 Handling Locks
9.4.10 Stepping Through the Transaction
9.5 Processing a Remote or Distributed Transaction
9.5.1 Entering DDL/DML Statements
9.5.2 Assigning Rollback Segments
9.5.3 Breaking down Statements
9.5.4 Optimizing Local Statements
9.5.5 Forwarding Remote Commands
9.5.6 Assigning Remote Rollback Segments and Writing Redo Logs
9.5.7 Optimizing Remote Statement
9.5.8 Returning Data to the Local Database
9.5.9 Summarizing Remote and Distributed Transactions
9.6 Let Us Sum Up
9.7 Suggested Answer for Check Your Progress
9.8 Glossary
9.9 Assignment
9.10 Activities
9.11 Case Study
9.12 Further Readings
9.1 Introduction :
Understanding how a transaction begins, executes, and ends, and knowing
what happens along each step of the way are vital parts of making Oracle
work for you. This knowledge is helpful not only to system and database
administrators, but to Oracle developers as well. Knowing when a transaction
is assigned a rollback segment, or how locking occurs in the database, can
drastically change the strategy of creating applications or nightly processes.
A transaction is directly related to a session, but it is still considered
a separate entity. A session, simply stated, is a single connection to a database
instance based upon a username and, optionally, a password. All sessions in
a database instance are unique, which means that they have a unique identifier
setting them apart from the other users and processes accessing the database.
This unique identifier, called a SID, is assigned by the instance and can be
reused after a session is ended. The combination of the SID and a session
serial number guarantees that no two sessions, even if the SID is reused, are
identical.
A transaction, also in simplified terms, is a specific task, or set of tasks,
to be executed against the database. Transactions start with an executable DML
statement and end when the statement or multiple statements are all either rolled
back or committed to the database, or when a DDL (Data Definition Language)
statement is issued during the transaction.
If COMMIT or ROLLBACK statements are issued from the command
line, the transaction is said to have been explicitly ended. However, if you
issue a DDL command (DROP TABLE, ALTER TABLE, etc.) the previous
statements in your transaction will be committed (or rolled back if unable to
commit), the transaction will be implicitly ended, and a new transaction will
begin and then end.
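As a small illustration of these rules, consider the following sketch (the
table names are hypothetical). The first transaction is ended explicitly by
COMMIT; the second is ended implicitly by the DDL statement.
UPDATE CUSTOMERS SET SALARY = 6000 WHERE ID = 1; -- begins transaction #1
COMMIT; -- explicitly ends transaction #1
UPDATE CUSTOMERS SET SALARY = 7000 WHERE ID = 2; -- begins transaction #2
DROP TABLE TEMP_CUSTOMERS; -- DDL : implicitly commits the UPDATE and ends
-- transaction #2, then begins and ends its own transaction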
To illustrate these rules, assume that you log into your database to update
and modify your customer tables. What you would like to do is enter 100 new
customers into your database. You do this by creating a temporary table to hold
the new customer information, searching your customer table for duplicates, and
updating it with the records that do not already exist. Though this is unlikely,
assume that you must update the customer table before checking for duplicate
entries. The sequence would look like the steps listed below (a sequence of
session and transaction begins and ends, without save points).
1. Connect to SQL*Plus (begin session #1).
2. Create temporary customer table (begin and end transaction #1).
3. Insert new customer information into temporary table (begin transaction
#2).
4. Step through entries in temporary table (continue transaction #2).
5. Update customer table (continue transaction #2).
6. Check for duplicate entries (continue transaction #2). If duplicates exist,
roll back entire transaction (end transaction #2).
7. Repeat steps 4–6 until complete or duplicates are found.
8. Drop temporary table (end transaction #2, begin and end transaction #3).
9. Exit from SQL*Plus (end session #1).
Notice how the create–table and drop–table steps (2 and 8) begin and
end a transaction. If you found duplicate entries in your tables, step 8 would
actually end transaction #3 and begin and end transaction #4. Also note that
the DDL command in step 8 implicitly ended transaction #2 by committing
any changes made before beginning transaction #3. Finally, it is important to
realize that if you had done another update between steps 8 and 9, the exit
from SQL*Plus would have implicitly ended transaction #4 (started by the
update) by issuing a commit before exiting.
One other form of implicitly ending a transaction includes terminating
a session either normally or abnormally. When these situations arise, the
instance automatically attempts to commit the current transaction. If that is
not possible, the transaction will be rolled back.
Commits, Rollbacks, and Save points
Although these topics are discussed elsewhere in this book, it is important
to note how they affect a given transaction. As mentioned earlier, commits
and rollbacks both end a transaction. Commit makes all changes made to the
data permanent. Rollback reverses all changes made during the transaction by
restoring the previous state of all modified data. With the use of save points,
the ROLLBACK command can also be used to roll back only a portion of
a transaction.
Save points were designed to be used as logical stopping points from
within a single transaction. They are helpful in splitting up extremely long
transactions into smaller portions, and they provide points of recovery along
the way. Using save points within a transaction enables you to roll back the
transaction to any given save point as long as a commit has not been issued
(which immediately commits all data, erases all save points, and ends the
transaction). Refer to Chapter 6, "SQL*Plus," to learn more about the
SAVEPOINT command as well as how to use ROLLBACK to roll the current
transaction back to a specified save point.
The following list is an update to the previously shown sequence with
the addition of save points. Refer to this example to show how save points
affect the transaction.
1. Connect to SQL*Plus (begin session #1).
2. Create temporary customer table (begin and end transaction #1).
3. Insert new customer information into temporary table (begin transaction
#2).
4. Step through each entry in the temporary table (continue transaction #2).
5. Create unique save point (continue transaction #2).
6. Update customer table with information from temporary table (continue
transaction #2).
7. If a duplicate customer is found, roll back to the save point (continue
transaction #2).
8. Repeat steps 4–7 until finished.
9. Issue commit (end transaction #2).
10. Drop temporary table (begin and end transaction #3).
11. Exit from SQL*Plus (end session #1).
Notice how the save point enables you to roll back to a point within
your current transaction without affecting the previous updates before the save
point. Anywhere within your procedure, you can roll back to any save point
or you can roll back the entire transaction. By using the save points, you are
providing a collection of recovery points that are available to you until you
end that transaction. Once the transaction is ended, all save points are erased.
Multiple save points give you even greater flexibility in rolling back
portions of long or complex transactions if an error occurs before the completion
of the process.
There is no limit to the number of save points you can create during
any given transaction, but be careful that the ones you do create are logically
named in case you must roll back to them.
Transaction Control Statements
Transaction control statements are statements that affect the execution
or properties of a transaction, whether it is the management of data or
characteristics of how the transaction executes. The family of transaction control
statements includes:
• COMMIT
• SAVEPOINT
• ROLLBACK
• SET TRANSACTION
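A minimal sketch of these four statements working together is given
below; the table and savepoint names are assumed for illustration only.
SET TRANSACTION READ WRITE; -- set properties of the new transaction
UPDATE CUSTOMERS SET SALARY = 5500 WHERE ID = 1;
SAVEPOINT after_first_update; -- logical stopping point
UPDATE CUSTOMERS SET SALARY = 6500 WHERE ID = 2;
ROLLBACK TO after_first_update; -- undo only the second update
COMMIT; -- make the first update permanent and end the transaction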
9.2 Types of Transactions :
The terms defined in the following sections are helpful in understanding
transactions and in interpreting Oracle errors returned during a transaction.
These terms cover types of transactions as well as other terms used in
identifying them.
9.2.1 Concurrent Transactions :
Concurrent transactions are transactions that are executed during the same
general time period. These transactions, because they have started so close to each
other, generally do not see the changes made by the other transactions. Any
data that has been updated by a concurrent transaction and requested by another
concurrently running transaction must be read from rollback segments until the
transaction requesting the data has completed.
9.2.2 Discrete Transactions :
A discrete transaction is used to improve the performance of short
transactions. For developers creating custom applications, the procedure
BEGIN_DISCRETE_TRANSACTION() changes the steps followed during the
duration of the transaction in order to speed its processing. The main differences
are as follows :
• All changes are held until the transaction ends
• Other transactions cannot see uncommitted changes
• Redo information is stored in a separate location in the SGA
• No rollback information is written because all changes are held until a
commit and then applied directly to the data block(s)
9.2.3 Distributed Transactions :
Distributed transactions are transactions in which one or more statements
manipulate data on two or more nodes, or remote systems of a distributed
database. If a transaction manipulates data on only one node, it is considered
a remote transaction. As in a remote transaction, none of the redo information
is stored locally.
9.2.4 In–Doubt Transactions :
An in–doubt transaction is actually a state of a transaction instead of
a type and refers to transactions within a distributed database environment.
One situation that causes this state is if an instance involved in a currently
running transaction fails, that transaction must be either rolled back or committed.
It is difficult, however, to do either without knowing the state of the transaction
in the affected database. In this case, all other instances in the distributed
environment mark this transaction as in–doubt. Once the instance is restarted,
the transaction can be analyzed and all instances can either commit or roll back.
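In Oracle, a DBA can inspect and manually resolve such in–doubt
transactions; the sketch below is illustrative, and the transaction identifier
shown is only a placeholder.
SELECT LOCAL_TRAN_ID, STATE FROM DBA_2PC_PENDING; -- list pending transactions
COMMIT FORCE '1.21.17'; -- or : ROLLBACK FORCE '1.21.17';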
9.2.5 Normal Transactions :
A normal transaction is a term used to refer to a local (non–remote)
transaction. All redo information is stored in the local database, and all data
manipulation is done to the same database. This type of transaction is the focus
for the discussion on transaction processing.
9.2.6 Read–Only Transactions :
Read–only refers to the type of read consistency that is set or defaulted
to for a given transaction. By default, the level of read consistency is statement
level, which is also known as read–write. This means that each consecutive
statement in your transaction will see the changes made to the database by
any previous statements, regardless of whose transaction has committed the
changes.
9.2.7 Remote Transactions :
Remote transactions are transactions containing single or multiple
statement(s) to be executed against a non–local database. All these statements
reference the same node. If they do not, they are considered separate remote
transactions and the instance will split them up. One of the major differences
between remote and normal transactions is that redo and rollback information
against a remote transaction is stored on the remote database. None of this
information is transferred to your local database to be used for recovery.
Check Your Progress – 1 :
1. Explain all the types of transactions.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
9.3 Read–Consistency :
Read–consistency is not a difficult concept to grasp. In short, read–
consistency guarantees that the data you are viewing while executing a transaction
does not change during that transaction. With read–consistency, if two users
are updating the same table at the same time, user 1 will not see the changes
made by the other user during their transaction. User 2, likewise, cannot see
any changes committed by user 1 until both transactions are complete. If they
happen to be working on the same row in the table, this becomes a locking
issue instead of read–consistency. A later section discusses locking.
Read–consistency is the major building block that enables multiple users
to update, select, and delete from the same tables without having to keep a
private copy for each user. When combined with locking techniques, read–
consistency provides the foundation for a multi–user database in which users
can do similar or identical operations without difficulty.
Take a look at an example of the way read–consistency works in a
theoretical telemarketing department : user 1 is entering an order for a customer,
while user 2 is changing customer information. Both users have concurrent
transactions (they are executing at the same time), but user 1 began the transaction
first. Suppose that user 2 makes a mistake and changes the phone number for
the same customer whose order is being entered. Because user 1 began the
transaction first, they will always be looking at the 'before picture' of the
customer data and will see the customer's previous phone number when querying
the customer's data. This is true even if user 2 commits their changes. Why ? Because
it is possible that user 1's transaction is solely dependent on the data that existed
when their transaction began. Imagine the confusion that would result if data
could be changed while an already executing query was making changes based
on that data! It would be nearly impossible to guarantee the coordination and
functioning of all processing within the application.
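The scenario can be sketched as two interleaved sessions (the PHONE
column is assumed here for illustration).
-- User 1 (transaction began first) :
SELECT PHONE FROM CUSTOMERS WHERE ID = 42; -- sees the old phone number
-- User 2 (concurrent transaction) :
UPDATE CUSTOMERS SET PHONE = '555 1234' WHERE ID = 42;
COMMIT;
-- User 1, within the same transaction, still reads the 'before picture'
-- reconstructed from the rollback segments.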
Read–consistency is also a secondary function of rollback segments.
Aside from being able to undo changes from a transaction, they also provide
other users with a 'before picture' of the data being modified by any process.
If a transaction must review data that has been modified by a concurrent
uncommitted transaction, it must look in the rollback segments to find that
data.
Check Your Progress – 2 :
1. Explain the concept read consistency.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
9.4.4 Using the Optimizer :
Oracle's optimizer is a critical part in the execution of a transaction. The
optimizer is responsible for taking a SQL statement, identifying the most
efficient way of executing the statement, and then returning the data requested.
There is a high likelihood that a SQL statement can be executed in more than
one way. The optimizer is responsible for identifying the most efficient means
of executing that statement.
Optimization can take many steps to complete, depending on the SQL
statement. The steps used to execute the SQL statement are called an execution
plan. Once the execution plan is completed, it is then followed to provide the
desired results (updated or returned data).
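In Oracle, you can ask the optimizer to record an execution plan without
running the statement by using EXPLAIN PLAN, as in the sketch below (it
assumes the standard PLAN_TABLE has already been created, for example with
the utlxplan.sql script; the query itself is illustrative).
EXPLAIN PLAN FOR
SELECT NAME, SALARY FROM CUSTOMERS WHERE ID = 100;
-- The chosen plan steps can then be inspected :
SELECT OPERATION, OPTIONS, OBJECT_NAME FROM PLAN_TABLE;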
9.4.5 Cost–Based Analysis :
Cost–based analysis is a mode of analyzing SQL statements to provide
the most efficient way of execution. When the optimizer is running in cost–
based mode, it follows these steps to decide which plan is the best way to
execute the statement unless the developer has provided a hint to use in the
execution.
1. Generate a set of execution plans based on available access paths.
2. Rank each plan based on estimated elapsed time to complete.
3. Choose the plan with the lowest ranking (shortest elapsed time).
Cost–based analysis uses statistics generated by the ANALYZE command
for tables, indexes, and clusters to estimate the total I/O, CPU, and memory
requirements required to run each execution plan. Because the goal of the cost–
based approach is to provide maximum throughput, the execution plan with
the lowest ranking or lowest estimated I/O, CPU, and memory requirements
will be used.
The analysis used to provide the final cost of an execution plan is based
on the following data dictionary views:
• USER_TABLES, USER_TAB_COLUMNS, USER_INDEXES,
USER_CLUSTERS
• ALL_TABLES, ALL_TAB_COLUMNS, ALL_INDEXES,
ALL_CLUSTERS
• DBA_TABLES, DBA_TAB_COLUMNS, DBA_INDEXES,
DBA_CLUSTERS
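The statistics that feed these views are gathered with the ANALYZE
command; a minimal sketch follows (the index name CUSTOMERS_PK is
assumed for illustration).
ANALYZE TABLE CUSTOMERS COMPUTE STATISTICS; -- exact statistics
ANALYZE INDEX CUSTOMERS_PK ESTIMATE STATISTICS; -- sampled statistics
-- The gathered figures become visible in the dictionary views, e.g. :
SELECT NUM_ROWS, BLOCKS FROM USER_TABLES WHERE TABLE_NAME = 'CUSTOMERS';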
9.4.6 Rule–Based Analysis :
Rule–based analysis rates the execution plans according to the access
paths available and the information in Table 9.1. The rule–based approach uses
those rankings to provide an overall rating on the execution plan and uses the
plan with the lowest ranking. Generally speaking, the lower the rating, the
shorter the execution time, though this is not always the case.
[Table 9.1 : Access type ratings (columns : Ranking, Type of Access); the
table's rows were not reproduced in this copy.]
9.4.8 Parsing Statements :
A parsed statement is not to be confused with an execution plan of a
statement. Whereas an execution plan examines the most efficient way to
execute a statement, parsing the statement creates the actual executable statements
to be used in retrieving the data. Parsing a statement is a single process in
which the optimizer does the following :
• Check semantics and syntax
• Verify that the user has the appropriate privileges to execute this statement
• Allocate private SQL space to store the statement
• Check for duplicate statements in the shared SQL area
• Generate an executable version of parsed SQL if necessary
• Allocate and store SQL in the shared library cache if it does not already
exist
When checking the syntax and semantics, the instance is verifying that
no key words or necessary parameters are missing. If the statement is in correct
form, the instance then verifies that the user has the correct privileges required
to carry out the execution of the statement. Once these have been verified,
space is allocated in the private SQL area for the user's statement. This statement
is saved until either it is needed again or the memory space is required to
store another parsed statement.
After allocating space in the private SQL area, the instance searches
through the shared SQL area for any duplicate statements. If a duplicate
statement is found, the executable version of the statement is retrieved from
memory and executed by the process, and the private SQL area is pointed to
the statement in the shared area. If it is not found, an executable version is
created and stored in the private SQL area only.
9.4.9 Handling Locks :
The locking of data rows and/or tables is completely automated and
transparent to the user. Once the executable version of the SQL statement is
run, Oracle automatically attempts to lock data at the lowest level required.
This means that if possible, a row will be locked instead of the entire table.
This is dependent solely on how the SQL statement was written and what types
of access are required (full table scan versus single rows).
A form of manual, or explicit, locking can take place by using the LOCK
TABLE command. By default, these commands are not necessary in day–to–
day processing. Oracle recommends that you allow the database to handle all
locking of data whenever possible.
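Should explicit locking nevertheless be required, the LOCK TABLE
command can be used, as in the sketch below (the CUSTOMERS table is
assumed for illustration).
LOCK TABLE CUSTOMERS IN EXCLUSIVE MODE; -- wait for the table-level lock
LOCK TABLE CUSTOMERS IN ROW EXCLUSIVE MODE NOWAIT; -- fail at once if unavailable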
9.4.10 Stepping Through the Transaction :
From this point, there are several paths that a transaction can take to
completion. Most commonly, the transaction is committed. Still, handling must
be taken into account for transactions that are rolled back. Following are the
steps taken during a commit:
1. Instance's transaction table marks transaction as complete.
2. A unique SCN (system change number) is generated.
3. Redo log entries are written to disk.
4. Any acquired locks are released.
5. Transaction is marked as having completed.
Should any of these steps fail, the transaction cannot be committed.
Depending on the nature of the error, the transaction will either wait for the
problem to be fixed so it can complete the transaction, or it will be rolled
back.
The following steps illustrate what must take place if a transaction is
rolled back:
1. All changes are rolled back to previous savepoint and savepoint is
preserved (or beginning of transaction if no savepoints have been specified).
2. If savepoint is not the last specified savepoint, all savepoints after this
one are erased.
3. Acquired locks are released.
4. Transaction continues (if no savepoints were specified then the transaction
is ended).
If the transaction is ended, rollback segments are released as well, though
no SCN is recorded.
Check Your Progress – 3 :
1. Explain the steps in processing transaction in brief.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Entering DML/DDL Statements
The issuing of DML or DDL statements can take place through a number
of ways, including SQL*Forms, SQL*Plus, and custom C programs. The rules
governing the start and end of transactions are the same no matter which way
a SQL statement is issued.
Rollback segments are assigned randomly by Oracle at the beginning of
each transaction (not session) when the first DML statement is issued, and
they are used to roll back the transaction to a specified save point or to the
beginning of the transaction. The selection of a rollback segment is based on
which rollback segments are currently available and how busy each segment
is. By default, DDL statements are not assigned rollback segments due to the
nature of DDL commands. They do, however, generate redo entries so that the
modifications can be reapplied if the database must be recovered.
Using the Optimizer
Oracle's optimizer is a critical part in the execution of a transaction. The
optimizer is responsible for taking a SQL statement, identifying the most
efficient way of executing the statement, and then returning the data requested.
There is a high likelihood that a SQL statement can be executed in more than
one way. The optimizer is responsible for identifying the most efficient means
of executing that statement.
Cost–based analysis is a mode of analyzing SQL statements to provide
the most efficient way of execution. When the optimizer is running in cost–
based mode, it follows these steps to decide which plan is the best way to
execute the statement unless the developer has provided a hint to use in the
execution.
Rule–Based Analysis
Rule–based analysis rates the execution plans according to the access
paths available. The rule–based approach uses those rankings to provide an
overall rating on the execution plan and uses the plan with the lowest ranking.
Handling Locks
The locking of data rows and/or tables is completely automated and
transparent to the user. Once the executable version of the SQL statement is
run, Oracle automatically attempts to lock data at the lowest level required.
This means that if possible, a row will be locked instead of the entire table.
This is dependent solely on how the SQL statement was written and what types
of access are required (full table scan versus single rows).
Stepping Through the Transaction –
Following are the steps taken during a commit.
• Instance's transaction table marks transaction as complete.
• A unique SCN (system change number) is generated.
• Redo log entries are written to disk.
• Any acquired locks are released.
• Transaction is marked as having completed.
9.7 Suggested Answer for Check Your Progress :
9.8 Glossary :
1. Transaction – In simplified terms, it is a specific task, or set of tasks,
to be executed against the database
2. Commit – The COMMIT command is the transactional command used
to save changes invoked by a transaction to the database.
3. Rollback – The ROLLBACK command is the transactional command
used to undo transactions that have not already been saved to the database.
4. Save point – A SAVEPOINT is a point in a transaction when you can
roll the transaction back to a certain point without rolling back the entire
transaction.
5. Set Transaction – The Set Transaction command can be used to initiate
a database transaction.
6. Concurrent transactions – These are transactions that are executed in
the same general time.
7. Discrete Transaction – It is used to improve the performance of short
transactions.
8. Distributed transactions – These are transactions in which one or more
statements manipulate data on two or more nodes, or remote systems,
of a distributed database
9. Remote transactions – These are transactions containing single or multiple
statement(s) to be executed against a non–local database
10. Execution plan – The steps used to execute the SQL statement are called
an execution plan
11. Cost–based analysis – It is a mode of analyzing SQL statements to
provide the most efficient way of execution.
12. Rule–based analysis – It rates the execution plans according to the access
paths available and the information.
13. Locking – Reserving the use of a database, table, record, or other
collection of data for access by one user at a time, to prevent the
anomalous database behavior that results when, for example, one user
reads or modifies data in a record which is in the process of being
modified by another user.
14. Read Lock – An imposed access barrier that allows a user to query
and view all or part of a database, but not to modify it. Some database
management systems will allow many users to have simultaneous read
locks.
15. Write Lock – A lock that allows the user to read and to modify the
database. Write locks almost always imply exclusive control of the
database so that other users will not be allowed to have either active
read or write locks as long as one write lock is operational.
9.9 Assignment :
State the difference between Data processing and Transaction Processing.
9.10 Activities :
Explain the use of optimizer in execution of transaction.
Unit
10 OBJECT ORIENTED DATABASE MANAGEMENT SYSTEMS (OODBMS)
UNIT STRUCTURE
10.0 Learning Objectives
10.1 Introduction
10.2 Introduction to Database Management Systems (DBMS)
10.3 Example of Bank Transactions
10.4 Object Oriented Database (OODB)
10.5 Related terms
10.5.1 Distributed Object Computing (DOC)
10.5.2 Objects Methods Users
10.5.3 Interfaces
10.5.4 Associations
10.5.5 Persistent Objects
10.5.6 Persistence Data
10.5.7 Transient Data
10.5.8 Referential Integrity
10.5.9 MDBS
10.5.10 ODBC (Open Database Connectivity)
10.5.11 Locks
10.5.12 ActiveX
10.5.13 OOSAD
10.5.14 CORBA
10.5.15 DCOM
10.5.16 OMG
10.5.17 CORBA Open DOC ActiveX
10.5.18 Virtual DBMS
10.6 Object Oriented Database Management Systems (OODBMS)
10.7 Comparison between RDBMS and OODBMS
10.8 A Three Schema Architecture
10.9 Mapping of OODBMS to RDBMS
10.10 Example of Railway Reservation System
10.11 Let Us Sum Up
10.12 Suggested Answer for Check Your Progress
10.13 Glossary
10.14 Assignment
10.15 Activities
10.16 Case Study
10.17 Further Readings
10.0 Learning Objectives :
After learning this unit, you will be able to understand :
• Basic concepts of Database, DBMS, RDBMS and OODBMS
• Use of OO concepts in database
• Practical applications such as bank transactions, railway reservations
• Principle of mapping of OODBMS to RDBMS
• Difference between OODBMS and RDBMS
• A three schema architecture
• Basic Concept of Persistent Object
• Basic Concept of Transient Object
10.1 Introduction :
Relational database systems have proved their worth in the domain of
business applications, particularly those dealing with accounting. The relational
data model, however, is not suitable for all application domains. New applications
involving complex data modeling (i.e. that do not map well to tables) now
require the services normally associated with database systems: persistence,
transactions, authorization, distribution, versioning, data stability, buffering, and
associative access.
To illustrate, let's examine a CAD/CAM application for a company that
manufactures airplanes. The application supports both the specification and
design of all parts required to build an airplane. Modeling physical objects
does not reduce easily to tabular, or relational, form. In particular, an airplane
requires many duplicate parts, each of which would require a unique tag to
be stored as a distinct entity in a relational database. Furthermore, the relations
representing sets of different parts that are mostly similar would require
separate, independent schemas. Finally, the application programmer almost
definitely would prefer to manipulate part designs as complex abstractions at
a level higher than that provided in the relational model.
Our example application, however, requires database services. An airplane
design team typically consists of several people, all of whom will desire access
to the current state of the design. In today's workplace, it is likely that these
designers will be using workstations distributed over a network. In addition,
some people should not be allowed total access to certain aspects of the design
(e.g. documenters do not need update access). Finally, a completed design can
involve hundreds of thousands of parts and direct access to each part becomes
impractical: thus, associative access is essential. For instance, a designer may
wish to know how many times a given part has been used before deciding
to change its specification.
Object–oriented databases, then, are an attempt to solve the problems
mentioned and still maintain the advantages of database systems. Object–
oriented databases treat each entity as a distinct object. An assembly composed
of several parts, therefore, can refer directly to its components instead of
explicitly associating some unique identifier with each component in some
relation. In addition, application programmers can manipulate database entities
at any desired level of abstraction by extending the set of types recognized
by the database system. This is an important point – it means that the
programmer need not be concerned with transforming an application's persistent
data into a form manipulable by the underlying storage subsystem. In many
systems, a programmer can also incorporate totally new, variable–sized data
types (e.g. multimedia objects). Finally, object–oriented databases allow embedded
semantics by associating procedural information with objects.
[Figure : A typical database application workflow – design the database; write
DDL code to set up the database structure; populate the database with
information; run the application to query and update the database; and write
code to compensate for DBMS shortcomings, provide a user interface, validate
data and perform computations.]
[Figure : A Bank Account object and its inheritance links]
Attributes :
• Customer
• Balance
• Interest
Operations :
• Deposit
• Withdraw
• Get Owner
existing data structure. The following parameters are empowered with
authorization capabilities to perform only certain operations :
• Objects
• Systems
• Users
The company may have functional databases such as BOM (Bill Of
Material), drawings, and equipment. These databases are heterogeneous in nature
and are distributed physically, logically and by responsibility of control.
Check Your Progress – 1 :
1. Why there is need of OODBMS ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Client applications accessing OODB do not have to specify relationships
because objects are stored with established relationships.
OODBs store and deliver objects and provide facilities to manipulate
objects.
When we need to create multiple versions of entity, OODB becomes very
handy.
OODB can be a solution to handle complex data structures.
The strength of OODB is best seen when:
• Data and relationships are irregular and complex
• Objects are tightly coupled
• Non–DBMS data sources and applications are integrated
An OODB should meet the following requirements when designing a
system in OO technology.
1. User interface support
2. Change management
3. Object translation into various activities.
4. OODB security
5. OODB recovery
6. Accessibility to legacy systems data
7. Maintenance of class libraries
8. Maintenance of data dictionary
9. OODB integrity
10. Providing concurrent access by multiple users
11. Distribute database with location transparency
12. Handling query with efficiency
13. C++ based OO model
14. Maintaining persistent objects
The user must have access to both OODB and RDB to manipulate data.
The developer therefore must develop applications that could source data from
all databases (OODB, RDB etc.).
Check Your Progress – 2 :
1. What are requirements OODB should meet to design a system in OO
technology ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
10.5 Related Terms :
10.5.1 Distributed Object Computing (DOC) :
In first generation client–server computing, methods/applications were
available to serve on demand at the server side. Having independent locations,
the objects need to communicate with each other for processing and computing.
This is called second generation client–server computing.
Since there is so much heterogeneity distributed in the network in terms
of servers, clients, databases, platforms, applications and architecture, which
cannot easily be dispensed with, what is needed is therefore a standard DOC
system.
The second generation client–server computing is based on distributed
object computing (DOC). DOC enables the development of extremely flexible
client–server systems as it is possible to locate reusable objects/components
stored anywhere in the network and manipulate them as per the application
requirements. DOC introduces a higher level of abstraction of the application.
10.5.2 Objects Methods Users :
With the advent of DOC, it is now possible to use the following components
anywhere in the network, running on different platforms.
a. Reusable Components
b. Distributed Components
c. Resident Components
10.5.3 Interfaces :
In the normal course, the developer would need to write interfaces for
each method or server application, wherever resident.
10.5.4 Associations :
Associations show logical link between objects.
The cardinality of association is specified when the class is defined. The
links are expressed in cardinality of associations.
10.5.5 Persistent Objects :
The object store stores objects that have the ability to survive even though
the program or system that created them is no longer live. Such objects are
called Persistent Objects.
Persistent Object Storage System under DBMS can provide the following:
• Unique ID to objects so as to reach correctly
• Communication between objects
• Large, stable and reliable storage capacity
The stores for persistent objects, such as disk files, do not support queries
or interactive user interface operations. In such stores the following problems
are seen :
• Controlling concurrent access
• Handling ad–hoc queries
• Controlling the physical location of data entry
These problems are solved by a DBMS.
10.5.6 Persistence Data :
The data that, once created by a process, survives until it is deleted
externally is called persistence data. The data that must exist permanently due
to its ongoing need is categorized under persistence data.
10.5.7 Transient Data :
The data that has no value once the process that created it is no longer
in existence is called Transient data. Dynamically allocated variables belong
to Transient data. The following are some examples of transient data :
• Discounts calculated to arrive at net price
• Intermediate values computed in discount algorithms
• Data extracted from external sources for reference or use.
10.5.8 Referential Integrity :
The integrity that guarantees the existence of all objects that need to be
referred to is known as Referential Integrity.
10.5.9 MDBS :
The Multi–database (MDBS) is a database system that sits on the top
of existing relational and OODBMS but gives impression of being a single
database to the users.
MDBS provides diverse set of interfaces to reach data stored at any
database and then manipulates it to produce the result. MDBS helps to integrate
homogeneous database system with multiple database and data types other than
relational data types. MDBS handles cross database functionality through virtual
database, by constructing unified schemes. MDBS coordinates the processing
of global transactions split into various local databases to achieve consistency
and integrity of data and results. MDBS translates global queries and updates
data sending it to the appropriate local database for actual processing. Automatic
generation of unified global database using a local database (OODB, RDB,
file system) is done by MDBS. Fig. shows virtual database model.
[Figure : MDBS virtual database model – applications access a single virtual
database layered over the underlying local databases.]
10.5.10 ODBC (Open Database Connectivity) :
An API (Application Programming Interface) that provides a solution to
reach the intended database is called ODBC. ODBC was developed by Microsoft
Corporation.
The ODBC driver manager manages an application's data needs by
mapping data requirements to data source in the database.
OODBMS provides all required advanced features in addition to all
features of RDBMS. ODBC with API provides standard database access through
a common interface, independent of application. With its strength, ODBC has
become an industry standard for interoperability across different databases.
The ODBC–API driver drives to reach to intended database, runs the
application, manipulates the data and puts it back in position as the new data
image.
Developers write applications using different databases. In order to
interconnect these databases, developer has to write a driver program called
ODBC driver.
JDBC (Java database Connectivity) and ODBC can be combined to form
a JDBC–ODBC Bridge.
[Figure : ODBC connectivity and application – application programs connect
through the ODBC driver manager and ODBC drivers (A, B, C) to data sources
such as an OODB, an RDB and a local DB.]
10.5.11 Locks :
Concurrency control and synchronization of different processes is achieved
by using locks. Following types of locks are available:
• Read Lock – No other transaction is committed till the earlier transaction
completes the task
• Write Lock – The lock that ensures that no other transaction is committed
till the transaction is in the mode of updating and creation of new version
of object is called Write Lock.
• Null lock – The lock in which objects are managed in a cache is called
null lock.
• Notify lock – Causes interruption of the program when a new value is
committed in the OODB.
Null and Notify locks are generally used outside transaction boundary
in a cache. On other hand, Read and Write locks are generally used inside
transaction boundary.
10.5.12 ActiveX :
ActiveX controls are third party controls. For example, if you are using
Visual C++, you can import .doc files from MS Word. ActiveX is also known
as OLE (Object Linking and Embedding). The ActiveX technology was developed
by Microsoft Corporation.
10.5.13 OOSAD :
When we move to OOSAD, server–resident applications will give way
to objects and distributed objects in distributed server environment.
10.5.14 CORBA :
The advantage of CORBA is that it allows the developer, through the
Interface Definition Language (IDL), to specify language neutral object–oriented
interfaces for applications and their components.
CORBA provides a communication channel through which applications
can access object interfaces and request data and services.
Microsoft's Component Object Model (COM) and Distributed Component
Object Model (DCOM) are alternatives to CORBA.
10.5.15 DCOM :
DCOM is the internet and component strategy where ActiveX plays a
role of DCOM object. An easy passage to distributed object computing technology
such as CORBA and DCOM is provided by ALC (Access Layer Classes).
DCOM is supported by Microsoft's web browser Internet Explorer.
Since there is heterogeneity in computing environment platform,
programming languages and different application architecture, we are living in
a non–standard environment.
10.5.16 OMG :
The Standard DOC system was developed by the organization of about
500 companies named Object Management Group (OMG).
The Object Management Group (OMG) suggested an abstract object
model with following features :
• Objects can be distinguished clearly.
• The object methods and operations are governed by parameters and their
values.
• The object's nature is persistent or transient.
• An object at any time has state
• An object has set of operations
• Object provides services to other clients.
Many organizations are now adopting OMG's Object Request Broker
Architecture Standard for integrating.
• Distributed business applications
• Heterogeneous business applications
• Business data types
Object Management Group (OMG) specifies an architecture for Open
Software Bus.
At present, following are the computing standards of OSB (Open Software
Bus).
10.5.17 CORBA Open DOC ActiveX :
Open Software Bus provides a platform for objects components to operate
across the networks and on different operating systems.
10.5.18 Virtual DBMS :
In multi–database environment when we are dealing with multiple
heterogeneous databases for application processing we need an interface to
access the Virtual DBMS.
Access to the Virtual DBMS can be provided by API. API is used by
developers to interface the application with the database.
Check Your Progress – 3 :
1. Explain the terms like Persistent objects, Persistent data, Transient Data
2. Explain the terms ODBC, MDBS, DCOM, CORBA
.......................................................................................................................
.......................................................................................................................
3. Full form of CORBA
4. Full Form of ODBC
5. Full form of OMG
The OODBMS recovery process ensures that the system failure does not
result in an inconsistent database state.
The significant features of OODBMS are as follows:
• OODBMS has a feature of managing traffic of persistent objects.
• OODBMS ensures that when a pointer refers to an object, the object
exists
• OODBMS manages logical link association and its cardinality
• OODBMS can work on following operating systems on server : MS
Windows, OS/2, Sun OS
• In context of transactions, OODBMS
a. Handles nested transactions
b. Supports MROW
c. Supports long transactions.
• Access to other OODBMS comprises of
a. Read data resident in other OODB
b. Modify data resident in other OODB
c. Read data resident on RDBMS
• OODBMS queries comprise of
a. Ad–hoc queries with C++
b. Ad–hoc queries with 4GL
c. Ad–hoc queries with LISP
• O2, Objectivity, Object Store and Versant are some of the Object Oriented
Database products
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support following DBMS criteria
a. SUN OS
b. AIX
c. MS Windows
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support following DBMS criteria
a. single–user, single–tasking environment
b. single–user, multi–tasking environment
c. multi–user environment
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support following criteria for access to other DBMS whenever an
application is running on the OODBMS
a. can read data that resides in other OODBMS
b. can modify data that resides in other OODBMS
c. can read data that resides in RDBMS ORACLE
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support following DBMS criteria :
a. embedded queries with a 4GL
b. ad–hoc updates of DB–schema with GUI
c. ad–hoc updates of DB–schema with OOL.
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support ad–hoc queries with the following
a. GUI
b. 4GL
c. C++
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support following DBMS criteria
a. OQL
b. ODMG C++ binding
c. Smalltalk binding
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support application programming in following languages.
a. C++
b. JAVA
c. SMALLTALK
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support following DBMS criteria
a. replication of data
b. data encryption
c. database language based on SQL
• In OODBMS product O2, the attributes of objects have to be defined
in following languages.
a. C++
b. JAVA
c. SMALLTALK
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support following DBMS criteria
a. multiple inheritance
b. concept of version
c. checking of cardinality between objects
• All OODBMS products namely O2, Objectivity, Object Store and Versant
support following DBMS criteria
a. user defined data type
b. is_a relationship
c. part_of relationship
• OODBMS supports comprise of
a. user defined data type
b. is_a relationship
c. part_of relationship
• OODBMS standards comprise of
a. ODL
b. OQL
c. ODMG C++ binding
• Following are the OO concepts managed by OODBMS
a. Complex objects storage structures
b. Query processing & displaying results
c. Concurrency control on multiple accesses
• Some parameters of OODBMS
Parameter : Example
OODBMS standards : ODL, OQL, ODMG C++ binding, standard SQL in
interactive and embedded mode
OODBMS supports : user derived datatype, is_a relationship, part_of
relationship
Queries : Ad–hoc queries with 4GL, Lisp, C++
Access to other OODBMS : Read/Modify data in other OODB
Check Your Progress – 4 :
1. Explain the features of OODBMS.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
• Records in RDBMS are passive, whereas records in OODBMS are active.
• RDBMS cannot handle a large number of data types, relationships and
behaviors; OODBMS handles a large number of data types, relationships
and behaviors.
• RDBMS is poor in navigational and associative access to information;
OODBMS supports navigational and associative access to information.
• In RDBMS, data representation has no character and no identity in itself;
in OODBMS, data has character.
• RDBMS does not cover the advanced features of OODBMS; OODBMS
provides all required advanced features in addition to all the features of
RDBMS.
Student Table
Roll Number Student Name Address College_Id
10 S.A. Patil 1102, Sadashiv Peth, Pune 1001
50 N.M. Joshi 102, M.G. Road, Pune 1002
456 P.G. Deshpande 119, Narayan Peth, Pune 1001
7890 M.R. Shinde 234, Parvati, Pune 1005
College Table
College_ID College Name Address
1001 National College 123, M.G. Road
1002 Model College 456, ShanwarPeth
1005 City College 678, Karvenagar
Roll number is a primary key of Student Table
College_ID is a primary key of College Table
College_ID is a foreign key of Student Table
The conversion of object class to table is shown in below figure. Class
student has attributes student name and address as discussed above. These
attributes are listed by table model and added to implicit object ID. It is required
to specify that the field roll_number cannot be null since it is a primary key.
Similarly, student_name also cannot be null. The name must be entered for
every student. However, the student_name is not a primary key because two
or more students may have the same name. The attribute 'address' can be null.
The domain is assigned to each attribute. The primary key is specified for each
table. The SQL code is used to create Student table. The index on student_name
enables quick retrieval for this attribute since it is frequently accessed. The
SQL also maps domain to data types.
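A sketch of the resulting SQL, consistent with the tables shown above
(the exact column sizes and names are assumed for illustration), might look
like this :
CREATE TABLE STUDENT(
ROLL_NUMBER INT NOT NULL, -- primary key : cannot be null
STUDENT_NAME VARCHAR (30) NOT NULL, -- required, but not unique
ADDRESS VARCHAR (50), -- may be null
COLLEGE_ID INT REFERENCES COLLEGE (COLLEGE_ID), -- foreign key
PRIMARY KEY (ROLL_NUMBER)
);
CREATE INDEX IDX_STUDENT_NAME ON STUDENT (STUDENT_NAME); -- quick retrieval by name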
Disadvantages of OODBMS :
• Lack of universal data model – There is no universally agreed data
model for an OODBMS, and most models lack a theoretical foundation.
This disadvantage is seen as a significant drawback, and is comparable
to pre–relational systems.
• Lack of experience – In comparison to RDBMSs, the use of OODBMSs is still relatively limited. This means that we do not yet have the level of experience that we have with traditional systems. OODBMSs are still very much geared towards the programmer rather than the naïve end-user. There is also resistance to the acceptance of the technology. As long as the OODBMS remains limited to a small niche market, this problem will continue to exist.
• Lack of standards – There is a general lack of standards for OODBMSs. We have already mentioned that there is no universally agreed data model. Similarly, there is no standard object-oriented query language.
• Competition – Perhaps one of the most significant issues facing OODBMS vendors is the competition posed by RDBMS and the emerging ORDBMS products. These products have an established user base with significant experience available. SQL is an approved standard, the relational data model has a solid theoretical foundation, and relational products have many supporting tools to help both end-users and developers.
• Query optimization compromises encapsulation – Query optimization requires an understanding of the underlying implementation to access the database efficiently. However, this compromises the concept of encapsulation.
• Locking at object level may impact performance – Many OODBMSs use locking as the basis for their concurrency control protocol. However, if locking is applied at the object level, locking of an inheritance hierarchy may be problematic, as well as impacting performance.
• Complexity – The increased functionality provided by an OODBMS (such as the illusion of a single-level storage model, pointer swizzling, long-duration transactions, version management, and schema evolution) makes the system more complex than traditional DBMSs. This complexity leads to products that are more expensive and more difficult to use.
• Lack of support for views – Currently, most OODBMSs do not provide
a view mechanism, which, as we have seen previously, provides many
advantages such as data independence, security, reduced complexity, and
customization.
• Lack of support for security – Currently, OODBMSs do not provide
adequate security mechanisms. The user cannot grant access rights on
individual objects or classes.
If OODBMSs are to expand fully into the business field, these deficiencies
must be rectified.
Check Your Progress – 7 :
1. How can an OODBMS be mapped to an RDBMS ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
Mapping an inheritance hierarchy into a single table is possible in an RDBMS. A relational DBMS is built on a schema representing complex relationships between entities.
The relational data model was invented by E.F. Codd. The concept of an RDBMS is based on tables.
Complex data structures are handled effectively by an OODB. An OODB makes use of product data management for this purpose.
The user must have access to both the OODB and the RDB to manipulate data. The developer therefore must develop applications that can source data from all databases (OODB, RDB etc.). A multi-database (MDBS) is a database system that sits on top of existing relational DBMSs and OODBMSs but gives the impression of being a single database to its users.
An OODBMS can handle a wide range of data types, supporting complex data and structures. The OODBMS is a result of blending OOP and database technology to meet the application needs of systems defined with OO technology. An OODBMS has more or less the same features as a DBMS but, in addition, is in a position to handle OO features.
The three-schema architecture is a standard architecture for database applications. This architecture was originally proposed by the ANSI/SPARC committee on DBMS.
Objects at one location should be able to refer to objects at another location.
Because an object is independent and not linked to a particular application, once a method is defined it is usable universally in all applications.
Large, complex systems requiring complex data support are handled through distributed data sources or databases.
10.14 Assignment :
Explain the concept of Object Oriented Database Management Systems (OODBMS) in your own words with a suitable example.
10.15 Activities :
Explain the main concepts of the object-oriented data model, such as object structure, object classes, inheritance, object identity and object containment.
BLOCK SUMMARY :
Unit 7 explains the query language for databases, that is, SQL. SQL commands are instructions, coded into SQL statements, which are used to communicate with the database to perform specific tasks, work, functions and queries with data. SQL commands can be used not only for searching the database but also to perform various other functions : for example, you can create tables, add data to tables, modify data, drop tables, and set permissions for users. SQL commands are grouped into four major categories depending on their functionality, as illustrated in the sketch following this list :
• Data Definition Language (DDL) – These SQL commands are used for creating, modifying, and dropping the structure of database objects. The commands are CREATE, ALTER, DROP.
• Data Manipulation Language (DML) – These SQL commands are used for storing, retrieving, modifying, and deleting data. These commands are SELECT, INSERT, UPDATE, and DELETE.
• Data Control Language (DCL) – These SQL commands are used for controlling access to data. The commands are GRANT and REVOKE.
• Transaction Control Language (TCL) – These SQL commands are used for managing transactions. The commands are COMMIT, ROLLBACK, and SAVEPOINT.
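A hedged illustration of these command categories (a minimal sketch using a hypothetical employee table and user name) :

    -- DDL : create a table
    CREATE TABLE employee (
        emp_id   INTEGER PRIMARY KEY,
        emp_name VARCHAR(50) NOT NULL
    );

    -- DML : store and retrieve data
    INSERT INTO employee (emp_id, emp_name) VALUES (1, 'A. Kumar');
    SELECT emp_name FROM employee WHERE emp_id = 1;

    -- DCL : allow another user to read the table
    GRANT SELECT ON employee TO report_user;

    -- DDL : drop the table when it is no longer needed
    DROP TABLE employee;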
Unit 8 demonstrates the different constraints with their practical aspects. The Primary Key, Foreign Key, Not Null and Check constraints are all explained in detail.
Unit 9 explains the concept of transaction processing. Knowing the steps that must be taken to process a transaction can greatly affect how a custom application is developed. If a transaction is querying data only, using a read-only transaction can greatly reduce the amount of processing overhead required to run an application. If many users are running the same read-only query, this saving on overhead can be considerable.
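As a hedged sketch of the read-only optimisation mentioned above (the statement below is standard SQL, also supported by Oracle; the table name is a hypothetical example) :

    SET TRANSACTION READ ONLY;
    SELECT COUNT(*) FROM orders;   -- queries only; no data is modified
    COMMIT;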
Likewise, knowing more about how statements are optimized can save a great deal of time in reaching the goals set for a new application. Because optimization methods play a critical role in meeting those goals, it is imperative that you take this into account.
Overall, knowing the transaction processing steps is simply helpful for administrators and developers alike.
Unit 10 explains the need for and concept of object-oriented databases. Object-oriented databases are also called Object Database Management Systems (ODBMS). Object databases store objects rather than plain data such as integers, strings or real numbers. Objects are used in object-oriented languages such as Smalltalk, C++, Java, and others. Objects basically consist of the following : attributes and methods. Attributes are data which define the characteristics of an object. This data may be simple, such as integers, strings, and real numbers, or it may be a reference to a complex object.
Methods define the behavior of an object and are what were formerly called procedures or functions. With traditional databases, data manipulated by the application is transient, and data in the database is persisted (stored on a permanent storage device).
In object databases, the application can manipulate both transient and persisted data. Object databases should be used when there is complex data and/or complex data relationships, including many-to-many object relationships.
Object databases should not be used when there would be few join tables and there are large volumes of simple transactional data.
Object databases work well with CAx applications (CASE – computer-aided software engineering, CAD – computer-aided design, and CAM – computer-aided manufacturing), multimedia applications, object projects that change over time, and commerce.
BLOCK ASSIGNMENT :
Short Questions :
Long Questions :
BIBLIOGRAPHY
http://www.webbasedprogramming.com/Oracle-Unleashed/oun19fi.htm
http://repository.mdp.ac.id/ebook/programming-books/Best%20DataBase%20Reference/Oracle%20Unleashed/oun19fi.htm
http://ecomputernotes.com/database-system/adv-database/object-oriented-database-oodb
http://whatis.techtarget.com/definition/DCOM-Distributed-Component-Object-Model
Enrolment No. :
1. How many hours did you need for studying the units ?
Unit No.            7        8        9        10
No. of Hrs.
2. Please give your reactions to the following items based on your reading of the block :
Unit
11 TYPES OF DATABASE
UNIT STRUCTURE
11.1 Introduction
11.2 Centralized Database
11.3 Distributed Database
11.4 Personal Database
11.5 End–User Database
11.6 Commercial Database
11.7 NoSQL Database
11.8 Operational Database
11.9 Relational Database
11.10 Cloud Database
11.11 Object–Oriented Database
11.12 Graph Database
11.13 Let Us Sum Up
11.14 Suggested Answers to Check Your Progress
11.15 Glossary
11.16 Assignment
11.17 Activities
11.18 Case Study
11.19 Further Readings
11.1 Introduction :
Depending upon usage requirements, the following types of databases are available in the market :
• Centralised database.
• Distributed database.
• Personal database.
• End–user database.
• Commercial database.
• NoSQL database.
• Operational database.
• Relational database.
• Cloud database.
• Object–oriented database.
• Graph database.
11.8 Operational Database :
Information related to the operations of an enterprise is stored in this database. Functional lines like marketing, employee relations and customer service require such databases.
11.11 Object–Oriented Databases :
3. Cloud databases have the ability to pay for storage capacity and bandwidth on a per-user basis, and they provide scalability on demand, along with high availability.
(A) True (B) False
4. Which type of database is efficient in analyzing large volumes of unstructured data that may be stored on multiple virtual servers in the cloud ?
(A) Centralized (B) Cloud (C) Relational (D) NoSQL DB
5. The operating systems, underlying hardware as well as application
procedures can be different at various sites of a DDB which is known
as ___________ DDB
(A) Heterogeneous (B) Homogenous
(C) Centralised (D) End User
11.15 Glossary :
Distributed databases : A distributed database is a type of database that
has contributions from the common database and information captured by local
computers. In this type of database system, the data is not in one place and
is distributed at various organizations.
Relational databases : This type of database defines database relationships in the form of tables. It is also called a Relational DBMS (RDBMS), which is the most popular DBMS type in the market. Examples of RDBMS products include MySQL, Oracle, and Microsoft SQL Server.
Object–oriented databases : This type of database supports the storage of all data types. The data is stored in the form of objects. The objects to be held in the database have attributes and methods that define what to do with the data. PostgreSQL is an example of an object-relational DBMS.
Centralized database : The data is stored in a centralized location, and users from different backgrounds can access it. This type of database stores application procedures that help users access the data even from a remote location.
11.16 Assignment :
Understand the different types of DBMS.
11.17 Activities :
Understand when to use which type of database.
11.18 Case Study :
List five database management systems for five different domain contexts.
Unit
DATA WAREHOUSING AND
12 DATA MINING
UNIT STRUCTURE
12.0 Learning Objectives
12.1 Introduction
12.2 Concept
12.3 Architecture
12.4 Various Tools in Data Warehousing
12.5 Tools in Data Mining
12.6 Difference Between Data Mining and Normal Query
12.7 Let Us Sum Up
12.8 Suggested Answer for Check Your Progress
12.9 Glossary
12.10 Assignment
12.11 Activities
12.12 Case Study
12.13 Further Readings
12.1 Introduction :
A data warehouse is formed through the integration of data from multiple
heterogeneous sources. The concept of "Data Warehouse" was first introduced
by Bill Inmon in 1990. He stated that a data warehouse is an integrated, subject-oriented, time-variant, and non-volatile collection of data. This data helps analysts to take decisions in an organization. It also supports analytical reporting, structured and/or ad hoc queries and decision making.
An operational database goes through continuous changes on a daily basis because of the regular transactions that take place. To understand this better, assume an employee of a business wants to examine earlier data, such as data about a product, a supplier, or a customer; the executive will have no data available to analyze, because the previous data has been updated by subsequent transactions.
A data warehouse offers us general and consolidated data in a multidimensional view. Besides this, it also provides Online Analytical Processing (OLAP) tools. These tools are of great help in the interactive and effective analysis of data in a multi-dimensional space. This analysis is of great help in data generalization and data mining.
Data mining functions such as association, clustering, classification and prediction can be integrated with Online Analytical Processing operations in order to enhance the interactive mining of knowledge at multiple levels of abstraction. That is the reason why the data warehouse has become an essential platform for data analysis as well as for online analytical processing.
12.2 Concept :
Data Warehouse :
'Data Warehouse' and 'Data Warehousing' have been defined differently by different people. However, a few of the important definitions are given below :
According to Barry Devlin, "A Data Warehouse is a single, complete and consistent store of data obtained from a variety of different sources, made available to end users in a way they can understand and use in a business context."
The table design, dimensions and organization should always be consistent
enough throughout a data warehouse so that reports or queries across the data
warehouse are always consistent. It can also be viewed as a database for
historical data from different functions within the same company.
A Data Warehouse is nothing but a collection of data which is basically
used for important decision making of the organisation. A Data warehouse is
always :
• Subject–oriented
• Integrated
• Time–varying
• Non–volatile
The meaning of each term used in the above definition is as follows :
Subject–oriented – The data warehouse is organized around the major key subjects of the enterprise. Major subjects include customers, patients, students and products.
Integrated – The data housed in the data warehouse are defined using consistent naming conventions, formats, encoding structures and related characteristics.
Time–variant – Data in the data warehouse contain a time dimension so that they may be used as a historical record of the business.
Non–volatile – Data in the data warehouse are loaded and refreshed from operational systems, but cannot be updated by end users.
Understanding a Data Warehouse :
• It is a database which is kept separate from the organization's operational database.
• There is no frequent updating in a data warehouse.
• It holds consolidated historical data that helps the organization to analyze its business properly.
• It enables executives to organize, understand, and use their data to take strategic decisions.
• It helps in the integration of a variety of application systems.
• It supports consolidated historical analysis of data.
A data warehouse is always kept separate from operational databases for the following reasons :
• An operational database is constructed for well-known tasks and workloads. In contrast, data warehouse queries are often very complex and present a general form of data.
• Operational databases support the simultaneous processing of multiple transactions, so concurrency control and recovery mechanisms are a must; they have to be very robust to preserve the consistency of the database.
• An operational database query allows reading as well as modifying operations, while an OLAP query needs only read access to stored data.
• It should be kept in mind that an operational database maintains current data, whereas a data warehouse maintains historical data.
Data warehousing is the method of constructing and using a data warehouse. A data warehouse is constructed by integrating data from multiple heterogeneous sources to support analytical reporting, structured and/or ad hoc queries, and decision making. Data warehousing involves data cleaning, data integration, and data consolidation.
Data Warehouse Applications :
As discussed earlier, a data warehouse helps business executives a great deal in organizing, analyzing, and using their data for important decision making. A data warehouse serves as a key part of a plan-execute-assess "closed-loop" feedback system for enterprise management. Data warehouses have been widely used in the following fields :
• The Financial Services
• The Banking Services
• The Consumer Goods
• The Retail Sectors
• The Controlled Manufacturing
Check Your Progress – 1 :
1. Explain the term Data Warehouse.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
12.3 Architecture :
Three–Tier Data Warehouse Architecture :
Generally a data warehouse has a three-tier architecture. The three tiers of the data warehouse architecture are as follows :
• Top Tier – This tier is the front-end client layer. This layer holds the query tools, reporting tools, analysis tools and data mining tools.
• Middle Tier – In the middle tier, we have the OLAP server, which is used in either of the following ways :
a. Through the Multidimensional OLAP (MOLAP) model, which directly implements multidimensional data and operations.
b. Through Relational OLAP (ROLAP), which is an extended type of relational database management system. It maps operations on multidimensional data to standard relational operations.
• Bottom Tier – This tier is the data warehouse database server, usually a relational database system. Back-end tools and utilities feed data into the bottom tier; they carry out the Extract, Clean, Load and Refresh functions.
The Three–Tier architecture of a data warehouse :
Data Warehouse Models :
From the point of view of data warehouse architecture, we have the following models :
(a) Virtual Warehouse
(b) Data Mart
(c) Enterprise Warehouse
(a) Virtual Warehouse :
The view over an operational data warehouse is known as a virtual warehouse. It is easy to create a virtual warehouse, but operating it requires additional capacity on the operational database servers.
(b) Data Mart :
A data mart is a subset of organization-wide data. This subset is valuable to specific groups within an organization; in other words, a data mart contains precise data for a particular group. For example, a marketing data mart may include data related to items, customers and sales. Data marts are confined to specific subjects. The following points should be considered when implementing a data mart :
• Windows or Unix/Linux-based servers are mostly used to implement data marts. They are implemented on low-cost servers.
• Data mart implementation cycles are measured in short periods of time, i.e., in weeks rather than months or years.
• The life cycle of a data mart may become complex in the long run if its planning and design are not organization-wide.
• Data marts are small in size.
• Data marts are customized by department.
• The source of a data mart is a departmentally prepared data warehouse.
• Data marts are flexible.
(c) Enterprise Warehouse :
• An enterprise warehouse collects all the information about subjects spanning the entire organization.
• It provides enterprise-wide data integration.
• The data is integrated from operational systems and external information providers.
• This data can vary in size from a few gigabytes to hundreds of gigabytes, terabytes or even beyond.
In a data warehouse environment, the relational data model is usually transformed into a specific architecture. There are several schema models designed for data storage; the one most commonly used is :
– Star Schema
The determination of which schema model should be used for a data warehouse should be based upon an analysis of project requirements, available tools and project team preferences.
Star Schema :
It is called a star schema because its architectural diagram resembles a star, with a fact table at the centre and dimension tables radiating around it. This is the simplest data warehouse schema. Fact tables in a star schema are typically in third normal form (3NF), whereas dimension tables are de-normalized. Despite the fact that the star schema is the simplest architecture, it is the one most often used today and is recommended by Oracle.
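A minimal, hedged SQL sketch of a star schema (a hypothetical sales example; all table and column names are illustrative assumptions) :

    -- Dimension tables (de-normalized)
    CREATE TABLE dim_product (
        product_id   INTEGER PRIMARY KEY,
        product_name VARCHAR(50),
        category     VARCHAR(30)
    );

    CREATE TABLE dim_date (
        date_id INTEGER PRIMARY KEY,
        day     INTEGER,
        month   INTEGER,
        year    INTEGER
    );

    -- Fact table at the centre of the star, referencing each dimension
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product (product_id),
        date_id    INTEGER REFERENCES dim_date (date_id),
        units_sold INTEGER,
        amount     DECIMAL(10, 2)
    );

    -- A typical analytical query : total sales per category per year
    SELECT p.category, d.year, SUM(f.amount) AS total_sales
    FROM   fact_sales f
    JOIN   dim_product p ON f.product_id = p.product_id
    JOIN   dim_date d    ON f.date_id = d.date_id
    GROUP BY p.category, d.year;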
Data mining derives its name from the resemblance between searching for valuable information in a large database and mining a mountain for a vein of valuable ore. Both imply either sifting through a large amount of material or ingeniously probing it to find exactly where the value resides; the point is to examine the contents. Since mining for gold in rocks is usually called "gold mining" and not "rock mining", by analogy data mining should perhaps have been called "knowledge mining"; the term is, strictly speaking, a misnomer.
However, data mining has become the customary term, and the trend took hold so rapidly that it even overshadowed more general terms such as knowledge discovery in databases (KDD), which describe the process more completely. Other similar terms referring to data mining are data dredging, knowledge extraction and pattern discovery.
In principle, data mining is not specific to one kind of media or data; it should be applicable to any type of information repository. However, although applicable to different types of data, the algorithms and approaches may differ. In reality, the challenges presented by different types of data vary greatly. Data mining is being studied and used for many kinds of repositories, including relational databases, object-relational databases and object-oriented databases, data warehouses, transactional databases, unstructured and semi-structured repositories such as the World Wide Web, advanced databases such as spatial databases, multimedia databases, time-series databases and textual databases, and even flat files.
The types of patterns that may be discovered depend largely on the data mining tasks employed. By and large, there are two types of data mining functions :
Descriptive data mining, which describes the general properties of the existing data.
Predictive data mining, which attempts to make predictions based on inference from the available data.
The data mining functionalities, which show the variety of knowledge that can be discovered, include characterization, discrimination, association analysis, classification, prediction, clustering, outlier analysis, and evolution and deviation analysis.
The tools used for data mining are :
Artificial neural networks – non-linear predictive models that learn through training and resemble biological neural networks in structure.
Decision trees – tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.
Rule induction – the extraction of useful if-then rules from data, based on statistical significance.
Genetic algorithms – optimization techniques that depend on the concepts of genetic combination, mutation, and natural selection.
Nearest neighbour – a classification technique that classifies each record in a historical database based on the records most similar to it.
Data Mining Applications :
Data mining can be defined as the process of extracting knowledge from huge sets of data. To put it in other words, we can say that data mining is mining knowledge from data. This knowledge can be used for any of the following applications :
• The Analysis of Market
• The Detection of Fraud
• Retention of the valuable Customer
• The Control of Production
• The Exploration of Science
• The Risk management and Corporate Analysis
• The Analysis of Financial Data
• The Retail Industry
• The Industry of Telecommunication
• Analysis of the Biological Data
• Some Other Scientific Applications
• Detection of Intrusion
Check Your Progress – 4 :
1. What is Data Mining ? Explain the steps in KDD process.
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
182
Table design, dimensions and organization should be consistent throughout Data Warehousing and
a data warehouse so that reports or queries across the data warehouse are Data Mining
consistent. A data warehouse can also be viewed as a database for historical
data from different functions within a company.
Data Warehouse is a collection of data that is used primarily in
organizational decision making.
A data warehouse is :
• Subject–oriented
• Integrated
• Time–varying
• Non–volatile
(Bill Inmon, Building the Data Warehouse, 1996)
Data warehousing is the process whereby organizations extract meaning from their informational assets through the use of data warehouses. A data warehouse supports structured analytical reporting and/or ad hoc queries, and decisions are made by integrating data from various different sources. Data cleaning, data integration and data consolidation are included in the process. The data in a data warehouse is used with the help of decision support technologies, which help the authorities use the warehouse quickly and effectively. It becomes much easier for them to gather the data, analyze it and take decisions on the basis of the information present in the warehouse.
Generally a data warehouse adopts a three-tier architecture. From the perspective of data warehouse architecture, the data warehouse models are the Virtual Warehouse, the Data Mart and the Enterprise Warehouse. Typically, the data in a data warehouse is loaded through the process of ETL, i.e. Extraction, Transformation and Loading, from one or more sources of data.
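As a hedged illustration of the loading step (a minimal sketch; the staging and warehouse table names are hypothetical assumptions), an ETL-style SQL statement can extract rows from a staging table, transform the values, and load them into the warehouse :

    -- Extract from staging, transform, and load into the warehouse
    INSERT INTO warehouse_sales (product_id, sale_date, amount)
    SELECT product_code,
           order_date,
           gross_amount - discount        -- transform : compute the net amount
    FROM   staging_orders
    WHERE  order_status = 'COMPLETE';     -- clean : keep only valid rows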
Data Mining, also popularly known as Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction of implicit, previously unknown and potentially useful information from data in databases. There is an enormous amount of data available in the information industry, and this data is of no use unless and until it is converted into useful information. Analysing this huge amount of data and extracting useful information from it is necessary. Data mining is the process of extracting information from huge sets of data. In other words, we can say that data mining is mining knowledge from data.
The Knowledge Discovery in Databases process comprises a few steps leading from raw data collections to some form of new knowledge. The iterative process consists of the steps : data cleaning, data integration, data selection, data transformation, data mining, pattern evaluation and knowledge representation.
The data mining functionalities are Characterization, Discrimination,
Association analysis, Classification, Prediction, Clustering, Outlier analysis and
Evolution and deviation analysis. Applications of data mining are like Market
Analysis, Fraud Detection, Customer Retention, Production Control and Science
Exploration etc.
12.8 Suggested Answer for Check Your Progress :
Check Your Progress 1 :
See Section 12.2
Check Your Progress 2 :
See Section 12.3
Check Your Progress 3 :
See Section 12.4
Check Your Progress 4 :
See Section 12.5
Check Your Progress 5 :
1 : See Section 12.6,
2 : Computational procedure that takes some value as input and produces
some value as output,
3 : Knowledge Discovery Database,
4 : Data Integration, Data Selection, Data Mining, Knowledge representation,
5 : Extract Transform Load
12.9 Glossary :
1. Data Warehouse – It is a collection of data that is used primarily in
organizational decision making.
2. Data warehouse is : Subject–oriented, Integrated, Time–varying and
Non–volatile.
3. Data warehousing – it can be explained as the process of building and
using a data warehouse. The process of data warehousing involves data
cleaning, integration, and consolidations.
4. OLAP – Its full form is Online Analytical Processing.
5. Data mart contains a subset of organization – wide data. This type
of data is very important for specific groups of an organization.
6. Enterprise Warehouse – It collects all the information and the subjects
spanning an entire organization.
7. Classification – analysis means the organization of data into given classes.
8. Clustering – Similar to classification, clustering is the organization of data into classes. However, unlike classification, in clustering the class labels are unknown, and it is up to the clustering algorithm to discover acceptable classes.
9. Outliers – Data elements that cannot be grouped into a given class or cluster. Also known as exceptions or surprises, they are often very important to identify.
12.10 Assignment :
Write the difference between data warehousing and data mining, with suitable examples. Search for information on the internet and relate it to the real world.
12.11 Activities :
How is data mining useful in market analysis ? Explain with an example.
Unit
13 DATABASE SECURITY
UNIT STRUCTURE
13.0 Learning Objectives
13.1 Introduction
13.2 Password Authentication
13.3 Operating System Authentication
13.4 Why Protect Passwords ?
13.4.1 Control
13.4.2 Protection
13.4.3 Integrity
13.4.4 Privileged Accounts
13.4.5 SYS
13.4.6 SYSTEM
13.5 Other Issues
13.5.1 Operating System Group : DBA
13.5.2 Object Security
13.5.3 Access Rights
13.5.4 Resolving Object Synonyms
13.5.5 System Security
13.5.6 Defined System Privileges
13.5.7 Object Security Model
13.5.8 Database Auditing
13.6 Recovery from Various Problems of Volatile and Non–Volatile Storage
Devices
13.7 Let Us Sum Up
13.8 Suggested Answers to Check Your Progress
13.9 Glossary
13.10 Assignment
13.11 Activities
13.12 Case Study
13.13 Further Readings
13.1 Introduction :
Although we are living in an extremely technologically advanced era, where so many satellites monitor the earth and billions of people are connected through information technology every second, failure is still to be expected, though not all the time.
A DBMS is considered to be one of the most complex systems, where hundreds of transactions are executed every second, and the availability of a DBMS depends to a very great extent on its complex architecture and on the underlying hardware and system software. In the event of failures or crashes while transactions are being executed, it is expected that the system will follow some sort of algorithm or technique through which data can be recovered from such crashes or failures.
Controlling data changes to objects within the database is one of the most difficult tasks. After all, users should be able to use the data in the database. Protecting the data, and controlling access to the physical database objects that hold it, should be the paramount concern of anyone who designs and implements a security plan.
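A hedged sketch of object-level access control in SQL (GRANT and REVOKE are standard SQL commands; the user and table names below are hypothetical) :

    -- Allow a reporting user to read, but not change, the employee table
    GRANT SELECT ON employee TO report_user;

    -- Allow a clerk to add and modify rows, but not delete them
    GRANT INSERT, UPDATE ON employee TO clerk_user;

    -- Withdraw a privilege that is no longer needed
    REVOKE UPDATE ON employee FROM clerk_user;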
13.4.4 Privileged Accounts :
Privileged accounts for the Oracle DBMS come in a number of forms. Some are conventional password-authenticated accounts, while others obtain specific database privileges through operating system group membership. Each site can also have its own custom-defined privileged DBA accounts.
13.4.5 SYS :
The database user 'SYS' is the main owner of all base tables, user views, and data dictionary views. These tables and views are created by internal mechanisms at the time of database creation.
13.4.6 SYSTEM :
Just like SYS, the SYSTEM account is created when the database is created. While the SYS user owns tables and views that reference internal database information, the SYSTEM account owns tables used by Oracle tools, such as SQL*Menu. This schema is generally used to install software products for most third-party tools.
Check Your Progress – 3 :
1. What are the basic reasons for limiting database access ?
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
.......................................................................................................................
13.9 Glossary :
ATA (Advanced Technology Attachment) : A disk drive interface standard
for IDE (Integrated Drive Electronics). A standard for storage devices that lets
them be treated as if they were hard drives on the system. Any ATA compatible
media can be read by any ATA device.
Access : Retrieval of data from or transfer of data into a storage device
or area such as RAM or a register.
Access Time : The amount of time, including seek time, latency and
controller time, needed for a storage device to retrieve information.
Adaptive Caching : A feature of hard disk drives that enables them to
improve performance and throughput by adapting to the application being run.
13.10 Assignment :
Identify and understand which recovery mechanisms apply in which situations.
13.11 Activity :
Create the tables and recover the tables and users.
Unit
14 RECOVERY MECHANISMS
UNIT STRUCTURE
14.0 Learning Objectives
14.1 Introduction
14.2 Concept–Properties–States of Transaction
14.3 Introduction to Mechanisms
14.3.1 Log
14.3.2 Deferred Update
14.3.3 Immediate Update
14.3.4 Caching/Buffering
14.3.5 Checkpoint
14.3.6 Shadow Paging
14.4 Let Us Sum Up
14.5 Suggested Answer for Check Your Progress
14.6 Glossary
14.7 Assignment
14.8 Activities
14.9 Case Study
14.10 Further Readings
14.1 Introduction :
Database systems, like any other computer system, are subject to failures
but the data stored in it must be available as and when required. When a
database fails it must possess the facilities for fast recovery. It must also have
atomicity i.e. either transactions are completed successfully and committed (the
effect is recorded permanently in the database) or the transaction should have
no effect on the database.
There are both automatic and non-automatic methods for backing up data and recovering from failure situations. Database recovery techniques are the techniques used to recover data lost due to system crashes, transaction errors, viruses, catastrophic failures, incorrect command execution, and so on. To prevent data loss, recovery techniques based on deferred update and immediate update, or on backing up data, can be used.
Recovery techniques are heavily dependent upon the existence of a special file known as a system log. It contains information about the start and end
of each transaction and any updates which occur in the transaction. The log
keeps track of all transaction operations that affect the values of database items.
This information is needed to recover from transaction failure.
• The log is kept on disk.
• start_transaction(T) : This log entry records that transaction T starts its execution.
• read_item(T, X) : This log entry records that transaction T reads the value
of database item X.
• write_item(T, X, old_value, new_value) : This log entry records that transaction T changes the value of the database item X from old_value to new_value. The old value is sometimes known as the before-image of X, and the new value is known as the after-image of X.
• commit(T) : This log entry records that transaction T has completed all
accesses to the database successfully and its effect can be committed
(recorded permanently) to the database.
• abort(T) : This records that transaction T has been aborted.
• checkpoint : Checkpoint is a mechanism where all the previous logs are
removed from the system and stored permanently in a storage disk.
Checkpoint declares a point before which the DBMS was in consistent
state, and all the transactions were committed.
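As a brief worked illustration (a hypothetical transaction T1 that updates items X and Y), the log for one committed transaction would contain, in order :

    start_transaction(T1)
    write_item(T1, X, 500, 400)   -- before-image 500, after-image 400
    write_item(T1, Y, 200, 300)   -- before-image 200, after-image 300
    commit(T1)

If a crash occurred before the commit(T1) record was written, the before-images would be used to undo the changes; once the commit record exists, the after-images can be used to redo them.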
• Atomicity – Although a transaction includes many low-level operations, this property states that a transaction must be treated as an atomic unit : either all of its actions are executed or none are. The database should never be left in a state where a transaction is partially completed. The state of the database should be clearly and properly defined either before the execution of the transaction or after its execution/abortion/failure.
• Isolation – In a database system where more than one transaction is being executed simultaneously, the property of isolation states that every transaction will be carried out and executed as if it were the only transaction in the system. No transaction will affect the existence of any other transaction.
Transaction Failure :
Whenever a transaction fails to perform, or reaches a point after which it cannot complete successfully, it has to abort. This is called transaction failure, where only a few transactions or processes are affected.
There are a number of reasons why a transaction may fail :
• Logical errors – A transaction cannot be completed because of a code error or some internal error condition.
• System errors – The DBMS itself ends an active transaction because it is unable to execute it, or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
• System crash – There are several problems, external to the system, that may cause the system to stop abruptly and crash, for example failure of the power supply or failure of the underlying hardware or software.
• Disk failure – In the early days of technology evolution, hard disk drive or storage drive failure was a common problem. Disk failures include the formation of bad sectors, unreachability of the disk, a disk head crash, or any other failure which destroys all or part of the disk storage.
Transaction States :
A transaction must necessarily be in one of the following states :
• Active – The initial state; the transaction stays in this state while it is executing.
• Partially committed – The transaction enters this state after its final statement has been executed.
• Failed – This state occurs when normal execution can no longer proceed.
• Aborted – This state occurs after the transaction has been rolled back and the database has been restored to its state prior to the start of the transaction.
• Committed – The transaction enters this state after successful completion.
The state diagram corresponding to a transaction is shown in Figure.
• It may happen that a transaction was in the middle of some operation; the DBMS must make sure that the atomicity of the transaction is not lost in this case.
• The DBMS should be able to determine whether the transaction can be completed or needs to be rolled back.
• No transaction should be allowed to leave the DBMS in an inconsistent state.
One should note that the two types of techniques which can help the DBMS in recovering, as well as in maintaining the atomicity of transactions, are :
• Proper maintenance of records or logs for each and every transaction, writing them onto stable and safe storage before actually modifying the database.
• Proper maintenance of shadow paging, where the changes are made in volatile memory and the actual database is updated later.
Thus a database recovery is the process of eliminating the effects of a
failure from the database. A failure is a state where data inconsistency is visible
to transactions if they are scheduled for execution.
14.3.1 Log :
The most widely used structure for recording database modifications is the log. The log is a sequence of records and maintains a record of all the update activities in the database.
The following data is recorded in the log :
• Transaction identifier – The unique identifier of the transaction that performed the write operation.
• Data item identifier – The unique identifier of the data item written.
• Old value – The value of the data item prior to the write.
• New value – The value of the data item after the write.
• Commit, abort or rollback transaction marker.
• When the transaction Ti starts, it registers itself by writing a <Ti start>
log record
• And before Ti executes write(X), a log record <Ti, X, V1, V2> is written,
where V1 is the value of X before the write, and V2 is the value to
be written to X.
• The log record notes that Ti has performed a write on data item Xj; Xj had value V1 before the write and will have value V2 after the write.
• When Ti finishes its last statement, the log record <Ti commit> is written.
• We assume for now that log records are written directly to stable storage
(that is, they are not buffered)
• Two approaches using logs
• Deferred database modification
• Immediate database modification
1. Deferred database modification – All logs are written to stable storage, and the database is updated only when the transaction commits.
2. Immediate database modification – Every log entry corresponds to a real database modification; the database is modified immediately after each write operation.
14.3.2 Deferred update :
This technique does not physically update the database on disk until a
transaction has reached its commit point. Before reaching commit, all transaction
updates are recorded in the local transaction workspace. If a transaction fails
before reaching its commit point, it will not have changed the database in any
way so UNDO is not needed. It may be necessary to REDO the effect of the
operations that are recorded in the local transaction workspace, because their
effect may not yet have been written to the database. Hence, deferred update is also known as the NO-UNDO/REDO algorithm.
14.3.3 Immediate update :
In the immediate update, the database may be updated by some operations
of a transaction before the transaction reaches its commit point. However, these
operations are recorded in a log on disk before they are applied to the database,
making recovery still possible. If a transaction fails to reach its commit point,
the effect of its operation must be undone i.e. the transaction must be rolled
back hence we require both undo and redo. This technique is known as undo/
redo algorithm.
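To make the contrast concrete, here is a hedged sketch (hypothetical transactions : T1 committed, T2 still uncommitted at the moment of the crash) of what the recovery manager does under each technique :

    Log at crash time :
        start_transaction(T1); write_item(T1, X, 100, 150); commit(T1)
        start_transaction(T2); write_item(T2, Y, 20, 60)   -- no commit record

    Deferred update (NO-UNDO/REDO) :
        REDO T1's write (set X to 150). Ignore T2 entirely, because its
        updates never reached the database on disk before commit.

    Immediate update (UNDO/REDO) :
        REDO T1's write (set X to 150). UNDO T2's write (restore Y to 20),
        because T2 may already have modified the database on disk.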
14.3.4 Caching/Buffering :
In this technique, one or more disk pages that include the data items to be updated are cached into main memory buffers and then updated in memory before being written back to disk. A collection of in-memory buffers, called the DBMS cache, is kept under the control of the DBMS for holding these buffers. A directory is used to keep track of which database items are in the buffers. A dirty bit is associated with each buffer : it is 0 if the buffer has not been modified and 1 if it has.
14.3.5 Check Point :
When more than one transaction is executed in parallel, the logs are interleaved, and at recovery time it would be difficult for the recovery system to trace back through all the logs before starting recovery. To avoid such a situation, most modern DBMSs use the concept of 'checkpoints'.
Maintaining logs in a real-time environment may sometimes fill all of the memory space available on the system, and as time passes the log file may become too big to be handled at all. Thus, a checkpoint is a mechanism by which all previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.
Recovery :
When a system with concurrent transactions crashes and recovers, it acts in the following manner :
• The recovery system reads the logs backwards, i.e. from the end to the last checkpoint.
• It builds two lists : an undo-list and a redo-list.
• If the recovery system finds a log record with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>, it puts the transaction in the redo-list.
• If the recovery system finds a log record with <Tn, Start> but no commit or abort log record, it puts the transaction in the undo-list.
Thereafter, all the transactions in the undo-list are undone and their logs are removed. For all the transactions in the redo-list, their previous logs are removed, the transactions are redone, and their logs are saved again.
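A small worked example (hypothetical transactions; the checkpoint record is assumed to be written after T1 commits) of how the two lists are built :

    Log, read backwards from the crash :
        <T1 Start> <T1 Commit> <checkpoint>
        <T2 Start> <T2 Commit>
        <T3 Start>                  -- no commit or abort record found
        -- CRASH --

    Lists built by the recovery system :
        redo-list : T2   (both <T2 Start> and <T2 Commit> found)
        undo-list : T3   (<T3 Start> found, but no commit/abort)

T3 is undone and its log records are removed; T2 is redone from its log. T1 needs no action, since it committed before the checkpoint.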
14.3.6 Shadow Paging :
Shadow paging is an alternative to log-based recovery techniques, with both advantages and disadvantages. It may require fewer disk accesses, but it is difficult to extend paging to allow multiple concurrent transactions. The technique is very similar to the paging schemes used by the operating system for memory management.
The idea is to maintain two page tables during the life of a transaction : the current page table and the shadow page table. When the transaction starts, both tables are identical. The shadow page table is never changed throughout the life of the transaction, but the current page table is updated with each write operation. Each table entry points to a page on the disk. When the transaction commits, the shadow page table entry becomes a copy of the current page table entry, and the disk block holding the old data is released. If the shadow page table is stored in non-volatile memory and a system crash occurs, the shadow page table is copied to the current page table. This ensures that the shadow page table points to the database pages corresponding to the state of the database prior to any transaction that was active at the time of the crash, making aborts automatic.
Check Your Progress – 2 :
1. Explain Log based recovery.
2. If a transaction does not modify the database until it has committed, it
is said to use the ___________ technique.
(a) Deferred–modification (b) Late–modification
(c) Immediate–modification (d) Undo
3. If database modifications occur while the transaction is still active, the
transaction is said to use the ___________technique.
(a) Deferred–modification (b) Late–modification
(c) Immediate–modification (d) Undo
4. ____________ using a log record sets the data item specified in the log
record to the old value.
(a) Deferred–modification (b) Late–modification
(c) Immediate–modification (d) Undo
5. In the __________ phase, the system replays updates of all transactions
by scanning the log forward from the last checkpoint.
(a) Repeating (b) Redo (c) Replay (d) Undo
6. _________________ module of a DBMS that restores the database to
a correct condition
(a) Transaction Manager (b) Recovery Manager
(c) Save Point (D) None
14.6 Glossary :
1. Aborted transaction – A transaction in progress that terminates abnormally.
2. Checkpoint – A facility by which a DBMS periodically refuses to accept
any new transactions. The system is in a quiet state, and the database
and transaction logs are synchronized.
3. Database log – A log that contains before and after images of records
that have been modified by transactions.
4. Database recovery – Mechanisms for restoring a database quickly and
accurately after loss or damage.
5. Database security – Protection of database data against accidental or
intentional loss, destruction, or misuse.
6. Recovery manager – A module of a DBMS that restores the database
to a correct condition when a failure occurs and then resumes processing
user questions.
7. Transaction – A discrete unit of work that must be completely processed
or not processed at all within a computer system. Entering a customer
order is an example of a transaction.
8. Transaction boundaries – The logical beginning and end of a transaction.
9. Transaction log – A record of the essential data for each transaction
that is processed against the database.
10. Transaction manager – In a distributed database, a software module
that maintains a log of all transactions and an appropriate concurrency
control scheme.
203
Database Management 11. Atomicity – Either all operations of the transaction are properly reflected
System in the database or none are.
12. Consistency – Execution of a transaction in isolation preserves the
consistency of the database.
13. Isolation – Although multiple transactions may execute concurrently, each transaction must be unaware of the other concurrently executing transactions. Intermediate transaction results must be hidden from other concurrently executing transactions.
14. Durability – After a transaction completes successfully, the changes it has made to the database persist, even if there are system failures.
14.7 Assignment :
Why is database security important in a DBMS ? Comment. How can we recover a database in case of failure ?
14.8 Activities :
Study different recovery techniques in detail.
BLOCK SUMMARY :
Unit 11 provides an introduction to database types and detailed information about the different types of databases.
Unit 12 provides an overview of data warehousing and data mining. Generally a data warehouse adopts a three-tier architecture. From the perspective of data warehouse architecture, the data warehouse models are the Virtual Warehouse, the Data Mart and the Enterprise Warehouse. Typically, the data in a data warehouse is loaded through the process of ETL, i.e. extraction, transformation and loading,
from one or more sources of data. Data Mining, also popularly known as
Knowledge Discovery in Databases (KDD), refers to the nontrivial extraction
of implicit, previously unknown and potentially useful information from data
in databases. The iterative process of KDD consists of the steps : Data cleaning,
Data integration, Data selection, Data transformation, Data mining, Pattern
evaluation and knowledge representation. The data mining functionalities are
Characterization, Discrimination, Association analysis, Classification, Prediction,
Clustering, Outlier analysis and Evolution and deviation analysis. Applications
of data mining are like Market Analysis, Fraud Detection, Customer Retention,
Production Control and Science Exploration etc.
Units 13 and 14 provide the concepts of database security and recovery. The data stored in the database needs protection from unauthorized access and malicious destruction or alteration. Database security refers to protection from malicious access. To protect a database, security measures can be taken at several levels : database system, operating system, network, physical and human. These units also explain the concept of database recovery; the different recovery mechanisms are explained in Unit 14.
A computer system, like any other device, is subject to failure. There are a variety of causes of such failure : disk crash, power failure, software errors etc. In addition to system failures, transactions may also fail due to violation of integrity constraints or deadlock. An integral part of a database system is the recovery scheme, which is responsible for the detection of failures and for the restoration of the database to a state that existed before the occurrence of the failure. It is thus the responsibility of the recovery scheme to ensure the atomicity and durability properties. There are basically two approaches for ensuring these properties : log-based schemes and shadow paging.
BLOCK ASSIGNMENT :
Short Questions :
1. Define the terms : Data Warehouse, Data Warehousing
2. What is data mining ?
3. What is ETL process ?
4. Why to protect passwords ?
5. What is check point ?
Long Questions :
1. Explain the need of data warehouse and architecture of data warehouse.
2. Explain the steps in data mining or KDD process and give its applications.
3. Explain the properties and states of a transaction.
Enrolment No. :
1. How many hours did you need for studying the units ?
Unit No. 11 12 13 14
No. of Hrs.
2. Please give your reactions to the following items based on your reading
of the block :
DR.BABASAHEB AMBEDKAR
OPEN UNIVERSITY
'Jyotirmay' Parisar,
Sarkhej-Gandhinagar Highway, Chharodi, Ahmedabad-382 481.
Website : www.baou.edu.in