MDU BCA- Database System

This document is a comprehensive guide on database systems for BCA 3rd semester students at Maharshi Dayanand University, created by Rohit Rathour. It covers fundamental concepts such as data, information, traditional file-based systems, database management systems, and various data models, along with their advantages and disadvantages. Additionally, it discusses database architecture, entity-relationship models, SQL, and query processing strategies.


MAHARSHI DAYANAND UNIVERSITY

Notes in Easy Language

CREATED BY: ROHIT RATHOUR

CONTACT NO-7827646303

MDU BCA-3rd SEMESTER


DATABASE SYSTEM
INDEX
SECTION-I
Basic Concepts:
 Data
 Information
 Records and Files
 Traditional File Based System
 File Based Approach
 Limitations of File Based Approach
 Database Approach
 Characteristics of Database Approach
 Advantages and Disadvantages of Database System
 Components of Database System
 Database Management System (DBMS)
 Components of DBMS Environment
 DBMS Functions and Components
 DBMS Users
 Advantages and Disadvantages of DBMS
 DBMS Languages
Roles in the Database Environment:
 Data and Database Administrator
 Database Designers
 Applications Developers and Users
SECTION-II

Database System Architecture:


 Three Levels of Architecture
 External, Conceptual and Internal Levels
 Schemas
 Mapping and Instances
Data Independence:
 Logical and Physical Data Independence
 Classification of Database Management System
 Centralized and Client Server Architecture to DBMS
Data Models:
 Record-based Data Models
 Object-based Data Models
 Physical Data Models
 Conceptual Modelling
SECTION-III
Entity-Relationship Model:
 Entity Types
 Entity Sets
 Attributes, Relationship Types
 Relationship Instances and ER Diagram
 Abstraction and Integration
 Basic Concept of Hierarchical and Network Data Model
 Relational Data Model: Brief History
 Relational Model Terminology: Relational Data Structure
 Database Relations
 Properties of Relations, Keys
 Domains

 Integrity Constraints Over Relations

SECTION-IV

 Relational Algebra
 Relational Calculus
 Relational Database Design: Functional Dependencies
 Modification Anomalies
 1st to 3rd NFs
 BCNF
 4th and 5th NFs
 Computing Closures of Sets of FDs
SQL:
 Data Types
 Basic Queries in SQL
 Insert, Delete and Update Statements
 Views
Query Processing:
 General Strategies of Query Processing
 Query Optimization
 Query Processor
 Concept of Security
 Concurrency and Recovery

SECTION-I
BASIC CONCEPT
Data:
Data, in the context of databases, refers to all the single items
that are stored in a database, either individually or as a set. Data
in a database is primarily stored in database tables, which are
organized into columns that dictate the data types stored
therein. So, if the “Customers” table has a column titled
“Telephone Number,” whose data type is defined as “Number,”
then only numerals can be stored in that column.
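As a small sketch of columns dictating data types, the following creates a "Customers" table in SQLite from Python, mirroring the example above. SQLite is used only because it needs no server; the schema is illustrative, not a prescribed design.

```python
import sqlite3

# Sketch of a table whose columns declare the stored data types,
# mirroring the "Customers" / "Telephone Number" example in the text.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerID INTEGER PRIMARY KEY,"
    " Name TEXT,"
    " TelephoneNumber INTEGER"  # declared type: this column holds numbers
    ")"
)
conn.execute("INSERT INTO Customers VALUES (1, 'Asha', 9876543210)")
row = conn.execute("SELECT Name, TelephoneNumber FROM Customers").fetchone()
print(row)  # ('Asha', 9876543210)
```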
Explanation about Data:
Data, even in a database, is rarely useful in its raw form. For
example, in a banking application, data is the whole collection
of bank account numbers; bank customers’ names, addresses,
and ages; bank transactions and so on. Being presented with
this mass of numbers will simply overwhelm the average
human -- an individual simply cannot process it all. However,
when data is arranged relationally, it then becomes information,
which is much more useful to users. For example, if the mass
of numbers stored in the banking database above is used to
extract the names and addresses of the top 100 clients by size
of deposit, then the data has been used to provide useful
information.
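The banking example can be sketched in a few lines: the raw deposit rows are data, and the ranked summary extracted from them is information. The table and column names here are invented for illustration.

```python
import sqlite3

# Minimal sketch of "data becomes information": raw deposit rows are
# data; the ranked summary extracted from them is information.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deposits (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO deposits VALUES (?, ?)",
    [("Ravi", 5000.0), ("Meena", 15000.0), ("Ravi", 7000.0), ("Tara", 3000.0)],
)
# Rank customers by total deposits -- the "top clients by size of
# deposit" report from the text (LIMIT 100 mirrors the top-100 example).
top_clients = conn.execute(
    "SELECT customer, SUM(amount) AS total"
    " FROM deposits GROUP BY customer"
    " ORDER BY total DESC LIMIT 100"
).fetchall()
print(top_clients)  # [('Meena', 15000.0), ('Ravi', 12000.0), ('Tara', 3000.0)]
```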

Information: Information is a set of data which is processed in


a meaningful way according to the given requirement.
Information is processed, structured, or presented in a given
context to make it meaningful and useful.
It is processed data, which means data that possesses context,
relevance, and purpose; producing it involves manipulation of
raw data.
Information assigns meaning and improves the reliability of the
data. It reduces uncertainty. So, when data is transformed into
information, the useless details are removed.

Data Vs. Information


Description:
  Data: Qualitative or quantitative variables.
  Information: A group of data which carries news and meaning
  and which helps to develop ideas or conclusions.
Etymology:
  Data: Comes from the Latin word "datum", which means "to give
  something". Over time, "data" has become the plural of datum.
  Information: The word has Old French and Middle English
  origins and referred to the "act of informing". It is mostly
  used for education or other known communication.
Format:
  Data: In the form of numbers, letters, or a set of characters.
  Information: Ideas and inferences.
Represented in:
  Data: Structured data, tabular data, graphs, data trees, etc.
  Information: Language, ideas, and thoughts based on the given
  data.
Meaning:
  Data: Does not have any specific purpose.
  Information: Carries meaning that has been assigned by
  interpreting data.
Interrelation:
  Data: Information that is collected.
  Information: Information that is processed.
Feature:
  Data: A single unit; it is raw and alone doesn't have any
  meaning.
  Information: The product of a group of data which jointly
  carry a logical meaning.
Dependence:
  Data: Never depends on information.
  Information: Depends on data.
Measuring unit:
  Data: Measured in bits and bytes.
  Information: Measured in meaningful units like time,
  quantity, etc.
Support for decision making:
  Data: Cannot be used for decision making.
  Information: Widely used for decision making.
Contains:
  Data: Unprocessed raw facts.
  Information: Facts processed in a meaningful way.
Knowledge level:
  Data: Low-level knowledge.
  Information: The second level of knowledge.
Characteristic:
  Data: The property of an organization; not available for sale
  to the public.
  Information: Available for sale to the public.
Dependency:
  Data: Depends upon the sources used for collecting it.
  Information: Depends upon data.
Example:
  Data: Ticket sales of a band on tour.
  Information: A sales report by region and venue; it shows
  which venue is profitable for the business.
Significance:
  Data: Alone has no significance.
  Information: Significant by itself.
Basis:
  Data: Based on records and observations, which are stored in
  computers or remembered by a person.
  Information: Considered more reliable than data; it helps the
  researcher to conduct a proper analysis.
Usefulness:
  Data: The data collected by the researcher may or may not be
  useful.
  Information: Useful and valuable, as it is readily available
  to the researcher for use.
Tailoring:
  Data: Never designed to the specific need of the user.
  Information: Always specific to the requirements and
  expectations, because all the irrelevant facts and figures
  are removed during the transformation process.

Records and Files:


A record in a database is a collection of information
organized in a table that pertains to a specific topic or
category. Another name for a database record is a tuple. A set
of records can be referred to as a data set, a data file, and a
table. The data in the table can come in a variety of forms. It
can be numeric, such as an individual's weight or financial
records related to a company's yearly spending. The data may
also include non-numeric information, such as an individual's
name and a person's job title.

The database itself contains a few important elements:


 Rows represent individual records. These run horizontally
across a spreadsheet.
 Columns represent categories of data pertaining to each
record. Another name for the columns is attributes. These
run vertically across a spreadsheet.
 Fields are each individual piece of data that is entered
into the database. Consider, for example, a database that
contains criminal records: each record contains five
separate fields of data, including the criminal's name, the
crime that was committed, the date of the crime, the
criminal's date of birth, and the penalty or fine of the
criminal offense.
 Primary keys are unique pieces of data that cannot be
repeated in the database. For example, a criminal's name
in the aforementioned criminal database is a primary key
that can only be used once in the database. Other pieces
of data can be used repeatedly, such as the type of crime
committed and the date of the crime.
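The elements above can be sketched with SQLite from Python: rows are records, columns are attributes, and the primary key rejects a duplicate name. The schema, names, and dates are all invented for illustration.

```python
import sqlite3

# Sketch of the criminal-records example: each row is a record, each
# column an attribute, and the name column is the primary key, so a
# duplicate name is rejected.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE criminal_records ("
    " name TEXT PRIMARY KEY,"  # unique: cannot repeat in the table
    " crime TEXT,"
    " crime_date TEXT)"
)
conn.execute(
    "INSERT INTO criminal_records VALUES ('J. Doe', 'Fraud', '2021-03-01')"
)
try:
    conn.execute(
        "INSERT INTO criminal_records VALUES ('J. Doe', 'Theft', '2022-05-09')"
    )
    duplicate_allowed = True
except sqlite3.IntegrityError:
    duplicate_allowed = False  # the primary key forbids a second 'J. Doe'
print(duplicate_allowed)  # False
```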

Files:
File Organization
o The File is a collection of records. Using the primary key,

we can access the records. The type and frequency of


access can be determined by the type of file organization
which was used for a given set of records.
o File organization is a logical relationship among various

records. This method defines how file records are mapped


onto disk blocks.
o File organization is used to describe the way in which the

records are stored in terms of blocks, and the blocks are


placed on the storage medium.

o The first approach to map the database to files is to use
several files and store only fixed-length records in any
given file. An alternative approach is to structure our
files so that they can contain records of multiple lengths.
o Files of fixed-length records are easier to implement than
files of variable-length records.
Objective of file organization
o It allows optimal selection of records, i.e., records can
be selected as fast as possible.
o Insert, delete and update transactions on the records
should be quick and easy.
o Duplicate records should not be induced as a result of an
insert, update or delete.
o Records should be stored efficiently, at minimal storage
cost.
Types of file organization:
File organization contains various methods. These particular
methods have pros and cons on the basis of access or selection.
In the file organization, the programmer decides the best-suited
file organization method according to his requirement.
Types of file organization are as follows:

o Sequential file organization


o Heap file organization
o Hash file organization
o B+ file organization
o Indexed sequential access method (ISAM)
o Cluster file organization
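A toy sketch (not a real DBMS) contrasts two of the organizations listed above: a heap file appends records in arrival order and must be scanned linearly, while a sequential file keeps records ordered by key so a binary search can find a record quickly. Records here are assumed (key, value) pairs.

```python
import bisect

# Heap vs. sequential file organization, modeled with Python lists.
heap_file = []        # records in arrival order
sequential_file = []  # records kept sorted by key

def heap_insert(rec):
    heap_file.append(rec)  # O(1): just append at the end of the file

def seq_insert(rec):
    bisect.insort(sequential_file, rec)  # keep the file in key order

def seq_search(key):
    # Key order allows binary search instead of a full linear scan.
    i = bisect.bisect_left(sequential_file, (key,))
    if i < len(sequential_file) and sequential_file[i][0] == key:
        return sequential_file[i][1]
    return None

for rec in [(30, "C"), (10, "A"), (20, "B")]:
    heap_insert(rec)
    seq_insert(rec)

print(heap_file)        # [(30, 'C'), (10, 'A'), (20, 'B')] -- arrival order
print(sequential_file)  # [(10, 'A'), (20, 'B'), (30, 'C')] -- key order
print(seq_search(20))   # B
```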

Traditional file –based System:


A file system is a collection of data. In this system, the user
has to write procedures for managing the database and must deal
with the details of data representation and storage. In this
system –
 Data is stored in files.

 Each file has specific format.

 Programs that use these files depend on knowledge about

that format.
 In earlier days, database applications were built on top of

file systems.

Pictorial representation of traditional file processing


The diagram given below represents the traditional file
processing in a database.

This approach is mostly obsolete but –


 Understanding problems inherent in file based systems may
prevent us from repeating these problems in our database
system.
 Understanding how file system works is extremely useful

when converting a file-based system to a database system.


Basically, it is a collection of application programs that
performs services for end users such as production of reports.
Each file defines and manages its own data.
It doesn't have a crash recovery mechanism, i.e., if the system
crashes while some data is being entered, the content of the
file will be lost. This is a disadvantage of a traditional
file-based system. Also, it is very difficult to protect a file
under the file system. Such a system can't efficiently store
and retrieve data.
Advantages of Traditional File System:
 File processing costs less and can be faster than a
database.
 The file processing design approach was well suited to
mainframe hardware and batch input.
 Companies mainly use file processing to handle large
volumes of structured data on a regular basis.
 It can be more efficient and cost less than a DBMS in
certain situations.
 Design is simple.
 Customization is easy and efficient.

Disadvantages of Traditional File System :


 Data redundancy and inconsistency.

 Difficulty in accessing data.

 Data isolation – multiple files and formats.

 Integrity problems

 Unauthorized access is not restricted.

 It co-ordinates only physical access.

File Based Approach:


The systems that are used to organize and maintain data
files are known as file based data systems. These file
systems are used to handle a single or multiple files and
are not very efficient.
Functionalities
The functionalities of a File-based Data Management
System are as follows −

 A file based system helps in basic data management


for any user.
 The data stored in the file based system should
remain consistent. Any transactions done in the file
based system should not alter the consistency
property.
 The file based system should not allow any illegal or
potentially hazardous operations to occur on the
data.
 The file based system should allow concurrent
access by different processes and this should be
carefully coordinated.
 The file based system should make sure that the data
is uniformly structured and stored so it is easier to
access it.
Advantages of File Based System
 The file Based system is not complicated and is
simpler to use.
 Because of the above point, this system is quite
inexpensive.
 Because the file based system is simple and cheap, it
is normally suitable for home users and owners of
small businesses.
 Since the file based system is used by smaller
organisations or individual users, it stores
comparatively lesser amount of data. Hence, the data
can be accessed faster and more easily.
Disadvantages of File Based System

 The File based system is limited to a smaller size


and cannot store large amounts of data.
 This system is relatively uncomplicated but this
means it cannot support complicated queries, data
recovery etc.
 There may be redundant data in the file based
system as it does not have a complex mechanism to
get rid of it.
 The data is not very secure in a file based system
and may be corrupted or destroyed.
 The data files in the file based system may be
stored across multiple locations. Consequently, it is
difficult to share the data easily with multiple users.

 Limitations of File-Based Approach:


The file-based system has some limitations. The limitations are
listed as follows:
o Isolation and separation of data: When the data is stored
in separate files it becomes complex to access. It becomes
very complex when the data has to be retrieved from more
than two files as a large amount of data has to be looked
for.
o Duplication of data: Because of the decentralised
approach, the file system leads to uncontrolled duplication
of data. This is undesirable, as the duplication wastes a
lot of storage space. It also costs time and money to enter
the data more than once. For instance, the address
information of a student may have to be duplicated in the
bus list file.

o Inconsistent Data: The data in a file system can become
inconsistent if more than one person changes the data
simultaneously, or if a change is applied only partially;
for example, a student changes residence and the change is
recorded only in his/her file and not in the bus list.
Entering wrong data is another source of inconsistency.
o Data dependence: The physical structure and storage of
data files and records are defined in the application code.
This means that it is very difficult to make changes to the
existing structure. The programmer would have to identify
all the affected programs, change them and retest them.
This characteristic of the file-based system is known as
program-data dependence.
o Incompatible File Formats: As the structure of the files
is embedded in application programs, the structure is fully
dependent on application programming languages.
Therefore the structure of a file generated by COBOL
programming language may be quite dissimilar from a file
generated by 'C' programming language. This
incompatibility makes them hard to process together. The
application developer may have to develop software to
change the files to some common format for processing.
However, this may be time-consuming and expensive.
o Fixed Queries: File-based systems are very much
dependent on application programs. Any report or query
needed by the organisation has to be developed by the
application programmer. Over time, the number and variety
of queries and reports increases. Producing new types of
queries or reports is not possible in file-based systems.
As a result, in some organisations the kinds of queries and
reports that can be produced are fixed; no new query or
report on the data can be produced.
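The program-data dependence limitation above can be sketched with a fixed-width record file: every program reading it must hard-code the layout (here, a 10-character name and an 8-character balance, both assumed widths), so changing the layout breaks every such program.

```python
import os
import tempfile

# Toy sketch of program-data dependence in a file-based system: the
# record layout is hard-coded in every program that touches the file.
RECORD = "{:<10}{:>8}"
path = os.path.join(tempfile.gettempdir(), "accounts.dat")

def write_records(path, rows):
    with open(path, "w") as f:
        for name, balance in rows:
            f.write(RECORD.format(name, balance) + "\n")

def read_records(path):
    # The reader must know the exact character offsets of each field;
    # changing the layout breaks this function and every program like it.
    rows = []
    with open(path) as f:
        for line in f:
            rows.append((line[:10].strip(), int(line[10:18])))
    return rows

write_records(path, [("Asha", 5000), ("Ravi", 1200)])
print(read_records(path))  # [('Asha', 5000), ('Ravi', 1200)]
```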

Database Approach:
In order to remove all limitations of the File Based Approach,
a new approach was required that must be more effective
known as Database approach.
The Database is a shared collection of logically related data,
designed to meet the information needs of an organization. A
database is a computer based record keeping system whose
over all purpose is to record and maintains information. The
database is a single, large repository of data, which can be used
simultaneously by many departments and users. Instead of
disconnected files with redundant data, all data items are
integrated with a minimum amount of duplication.
The database is no longer owned by one department but is a
shared corporate resource. The database holds not only the
organization’s operational data but also a description of this
data. For this reason, a database is also defined as a self-
describing collection of integrated records. The description of
the data is known as the Data Dictionary or Meta Data (the
‘data about data’). It is the self-describing nature of a database
that provides program-data independence.
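The self-describing property can be seen directly in SQLite, where the data dictionary is the built-in sqlite_master catalog table. The students table below is illustrative.

```python
import sqlite3

# Sketch of the self-describing property: the database stores a
# description of its own structure (meta-data) in a catalog.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (roll_no INTEGER PRIMARY KEY, name TEXT)")

# Query the catalog: the database describes its own structure.
meta = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchone()
print(meta[0])  # students
```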

A database implies separation of physical storage from use of


the data by an application program to achieve program/data
independence. Using a database system, the user or
programmer or application specialist need not know the details
of how the data are stored and such details are “transparent to
the user”. Changes (or updating) can be made to data without
affecting other components of the system. These changes
include, for example, change of data format or file structure or
relocation from one device to another.

In the DBMS approach, an application program written in some
programming language, such as Java, Visual Basic .NET or
Developer 2000, uses database connectivity to access the
database stored on the disk with the help of the operating
system's file management system.
The file system interface and DBMS interface for the
university management system is shown.

Building blocks of a Database

The following three components form the building blocks of a


database. They store the data that we want to save in our
database.
Columns. Columns are similar to fields, that is, individual
items of data that we wish to store. A Student’ Roll Number,
Name, Address etc. are all examples of columns. They are also
similar to the columns found in spreadsheets (the A, B, C etc.
along the top).
Rows. Rows are similar to records as they contain data of
multiple columns (like the 1, 2, 3 etc. in a spreadsheet). A row
can be made up of as many or as few columns as you want.
This makes reading data much more efficient – you fetch what
you want.

Tables. A table is a logical group of columns. For example,
you may have a table that stores details of customers' names
and addresses. Another table would be used to store details of
parts, and yet another for suppliers' names and addresses.
It is the tables that make up the entire database and it is
important that we do not duplicate data at all.

Characteristics of database

The data in a database should have the following features:


Organized/Related. It should be well organized and related.
Shared. Data in a database are shared among different users
and applications.
Permanent or Persistence. Data in a database exist
permanently in the sense the data can live beyond the scope of
the process that created it.
Validity/integrity/Correctness. Data should be correct with
respect to the real world entity that they represent.
Security. Data should be protected from unauthorized access.
Consistency. Whenever more than one data element in a
database represents related real world values, the values should
be consistent with respect to the relationship.

Non-redundancy: No two data items in a database should


represent the same real world entity.
Independence. Data at different levels should be independent
of each other so that the changes in one level should not affect
the other levels.
Easily Accessible. It should be available when and where it is
needed i.e. it should be easily accessible.
Recoverable. It should be recoverable in case of damage.
Flexible to change. It should be flexible to change.
To create, manage and manipulate data in databases, a
management system known as database management system
was developed.
Characteristics of Database Approach:



Some of the most important characteristics of the database
approach, as compared to the file-processing approach, are the
following.
Approach-1 :
Self-Describing Nature of a Database System :
 One of the most fundamental characteristics of the database
approach is that the database system contains not only the
database itself but also a complete definition or description
of the database structure and constraints, also known as the
metadata of the database.
 This definition is stored within the DBMS catalog, which
contains information such as the structure of every file, the
type and storage format of every data item, and various
constraints/rules on the data.

 The information stored within the catalog is called meta-data,
and it describes the structure of the primary database. The
catalog is used by the DBMS software and also by database
users, such as database administrators, who need to know the
structure of the database.
 A general-purpose DBMS software package is not written
for a particular database application. Therefore, it must
consult the catalog to learn the structure of the files in a
specific database, such as the type and format of the data it
will access.
 The DBMS software must work equally well with any
number of database applications (for example, a university
database, a banking database, or a corporation database), as
long as the database definition is stored within the catalog.
In traditional file processing, data definition is usually
part of the application programs themselves, so
file-processing software can access only specific databases.
Database management software can access various databases by
extracting the database definitions or schemas from the
catalog and using these definitions.

Approach-2 :
Isolation between Programs and Data, and Data
Abstraction :
 In a traditional file-processing system, the structure of
the data files is embedded within the application programs,
so any changes to the structure of a file may require
changing all programs that access that file.
 By contrast, DBMS access programs don't require such
changes in most cases, so independence is achieved between
them.

 The structure of the data files is stored within the DBMS
catalog separately from the programs that access them. We
call this property program-data independence.
 The characteristic that allows program-data independence
and program-operation independence is known as data
abstraction.
 A DBMS provides users with a conceptual representation of
data that does not include many of the details of how the
data is stored or how the operations are implemented
internally. Informally, a data model is a type of data
abstraction that is used to provide this conceptual
representation.
 The data model uses logical concepts, such as objects,
their properties, and the relationships between them, which
are easier for most users to understand than memory or
storage concepts. Hence, the data model hides storage and
implementation details that are not of interest to most
database users, so unnecessary complications are hidden from
them.
Approach-3 :
Support for Multiple Views of the Data :
 A database often has many users, each of whom may
require a different perspective or view of the database.
 A view may be a subset of the database, or it may
contain virtual data that is derived from the database files
but is not explicitly stored.
 Some users need not be aware of whether the data they
refer to is stored or derived.
 A multi-user DBMS whose users have a variety of distinct
applications must provide facilities for defining multiple
views. This provides many benefits for large databases, such
as the Aadhaar database.
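A minimal sketch of a view, using SQLite from Python: the directory view gives one user group the employee data without the salary column. The table, columns, and view name are invented for illustration.

```python
import sqlite3

# Sketch of "multiple views": different user groups see tailored
# subsets of the same stored data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, dept TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Asha", "HR", 50000.0), ("Ravi", "IT", 60000.0)],
)
# Directory users see names and departments; the salary column is hidden.
conn.execute("CREATE VIEW directory AS SELECT name, dept FROM employees")
rows = conn.execute("SELECT * FROM directory ORDER BY name").fetchall()
print(rows)  # [('Asha', 'HR'), ('Ravi', 'IT')]
```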

Approach-4 :
Sharing of Data and Multi-user Transaction Processing :
 A multi-user DBMS, as its name implies, must allow
multiple users to access the database at the same time,
i.e., concurrently.
 This is essential if data for multiple applications is to
be integrated and maintained in a single database, such as
the recent integration of WhatsApp with Facebook.
 The DBMS must implement concurrency control in the
software to make sure that several users trying to update
the same data do so in a controlled manner, so that the
results of the updates are correct.
 For instance, when several reservation agents attempt to
assign a seat on an airline flight, the DBMS should make
sure that each seat is accessed by only one agent at a time
for assignment to a passenger.
 These sorts of applications are generally called online
transaction processing (OLTP) applications. A fundamental
role of multi-user DBMS software is to make sure that
concurrent transactions operate correctly and efficiently,
with no inconsistency.
 The concept of a transaction has become central to many
database applications. A transaction is an executing
program or process that includes one or more database
accesses, such as reading or updating database records or
inserting new records.
 The isolation property ensures that every transaction
appears to execute in isolation from other transactions,
even though many transactions may be executed concurrently,
without affecting each other.
 The atomicity property ensures that either all the database
operations in a transaction are executed or none are; these
are among the ACID properties.

Advantages And Disadvantages Of Database System

Advantages Of Database
 Improved data sharing: An advantage of the database
management approach is that the DBMS helps to create an
environment in which end users have better access to more
and better-managed data. Such access makes it possible for
end users to respond quickly to changes in their
environment.
 Improved data security: The more users access the data,
the greater the risks of data security breaches.
Corporations invest considerable amounts of time, effort,
and money to ensure that corporate data is used properly. A
DBMS provides a framework for better enforcement of data
privacy and security policies.
 Better data integration: Wider access to well-managed
data promotes an integrated view of the organization's
operations and a clearer view of the big picture. It
becomes much easier to see how actions in one segment of
the organization affect other segments.
 Minimized data inconsistency: Data inconsistency exists
when different versions of the same data appear in
different places. For example, data inconsistency exists
when a company's sales department stores a salesperson's
name as "Bill Brown" and the company's personnel department
stores that same person's name as "William G. Brown," or
when the company's regional sales office shows the price of
a product as $45.95 and its national sales office shows the
same product's price as $43.95. The probability of data
inconsistency is greatly reduced in a properly designed
database.
 Improved data access: The DBMS makes it possible to
produce quick answers to ad hoc queries. From a database
perspective, a query is a specific request issued to the
DBMS for data manipulation, for instance, to read or update
the data. Simply put, a query is a question, and an ad hoc
query is a spur-of-the-moment question. The DBMS sends back
an answer (called the query result set) to the application.
 Improved decision making: Better-managed data and
improved data access make it possible to generate
better-quality information, on which better decisions are
based. The quality of the information generated depends on
the quality of the underlying data. Data quality is a
comprehensive approach to promoting the accuracy, validity,
and timeliness of the data. While the DBMS does not
guarantee data quality, it provides a framework to
facilitate data quality initiatives.
 Increased end-user productivity: The availability of
data, combined with the tools that transform data into
usable information, empowers end users to make quick,
informed decisions that can make the difference between
success and failure in the global economy.

Disadvantages Of Database
Albeit the database framework yields impressive benefits over
past information the executives draws near, database
frameworks in all actuality do convey critical hindrances.

 Expanded expenses: one of the disservices of DBMS is


Database frameworks require refined equipment and
programming and a profoundly talented work force. The
expense of keeping up with the equipment, programming,
and staff expected to work and deal with an information
base framework can be significant. Preparing, permitting,
and guideline consistence costs are regularly neglected
when database frameworks are executed.
 The board intricacy: Information base frameworks
connect with a wide range of innovations and essentially
affect an organization’s assets and culture. The
progressions presented by the reception of a database
framework should be appropriately figured out how to
guarantee that they assist with propelling the
organization’s targets. Given the way that information
base frameworks hold critical organization information
that are gotten to from numerous sources, security issues
should be evaluated continually.
 Maintaining currency: To maximize the efficiency of
the database system, you must keep your system current.
Therefore, you must perform frequent updates and apply
the latest patches and security measures to all
components. Because database technology advances
rapidly, personnel training costs tend to be significant.
 Vendor dependence: Given the heavy investment in
technology and personnel training, companies might be
reluctant to change database vendors. As a consequence,
vendors are less likely to offer pricing advantages to
existing customers, and those customers might be limited
in their choice of database system components.
 Frequent upgrade/replacement cycles: DBMS vendors
frequently upgrade their products by adding new
functionality. Such new features often come bundled in
new upgrade versions of the software. Some of these
versions require hardware upgrades. Not only do the
upgrades themselves cost money, it also costs money to
train database users and administrators to properly use
and manage the new features.
Comparison Table for Advantages and Disadvantages of
Database

Advantages | Disadvantages
Database eliminates the redundancy of data by integrating them | Complex software
Data consistency is increased | It requires more memory
Additional information can be derived from the same data | Multi-user DBMS can be more expensive
Database improves security | Performance can be poor sometimes
It is cost-efficient | Damage to the database affects virtually all application programs

Components Of Database System:


Components of a Database

The five major components of a database are:


1. Hardware
Hardware refers to the physical, electronic devices such as
computers and hard disks that offer the interface between
computers and real-world systems.
2. Software

Software is a set of programs used to manage and control the


database and includes the database software, operating system,
network software used to share the data with other users, and
the applications used to access the data.
3. Data
Data are raw facts that need to be organized and processed to
make them more meaningful. Data dictionaries are used to
centralize, document, control, and coordinate the use of data
within an organization. A data dictionary is a repository of
information about a database (this information is also called
metadata).
4. Procedures

Procedures refer to the instructions used in a database
management system and encompass everything from
instructions to set up and install, log in and log out, manage
day-to-day operations, take backups of data, and generate
reports.
5. Database Access Language

Database Access Language is a language used to write
commands to access, update, and delete data stored in a
database. Users write commands in the Database Access
Language and submit them to the database for execution.
Using the language, users can create new databases and
tables, insert data, and delete data.
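To make the Database Access Language component concrete, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in DBMS; the student table and its columns are invented for illustration.

```python
import sqlite3

# Connect to an in-memory SQLite database (a lightweight stand-in DBMS).
con = sqlite3.connect(":memory:")
cur = con.cursor()

# Create a table, insert data, then read it back -- the core operations
# a Database Access Language exposes.
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO student VALUES (?, ?)", (1, "Rohit"))
cur.execute("INSERT INTO student VALUES (?, ?)", (2, "Aman"))

cur.execute("SELECT name FROM student ORDER BY roll_no")
names = [row[0] for row in cur.fetchall()]
print(names)        # ['Rohit', 'Aman']

# Delete data through the same language.
cur.execute("DELETE FROM student WHERE roll_no = 2")
cur.execute("SELECT COUNT(*) FROM student")
count = cur.fetchone()[0]
print(count)        # 1
```

The same statements would work against MySQL or Oracle through their own client interfaces; only the connection step changes.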
Database Management System (DBMS):

Database Management System


o A database management system is software used to
manage a database. For example, MySQL and Oracle
are popular commercial database systems used in
different applications.
o DBMS provides an interface to perform various
operations like database creation, storing data in it,
updating data, creating a table in the database and a lot
more.
o It provides protection and security to the database. In the
case of multiple users, it also maintains data consistency.
DBMS allows users the following tasks:
o Data Definition: It is used for creation, modification, and
removal of definition that defines the organization of data
in the database.
o Data Updation: It is used for the insertion, modification,
and deletion of the actual data in the database.
o Data Retrieval: It is used to retrieve the data from the
database which can be used by applications for various
purposes.
o User Administration: It is used for registering and
monitoring users, maintaining data integrity, enforcing
data security, dealing with concurrency control,
monitoring performance and recovering information
corrupted by unexpected failures.
Characteristics of DBMS
o It uses a digital repository established on a server to store
and manage the information.
o It can provide a clear and logical view of the process that
manipulates data.
o DBMS contains automatic backup and recovery
procedures.
o It contains ACID properties which maintain data in a
healthy state in case of failure.
o It can reduce the complex relationship between data.
o It is used to support manipulation and processing of data.
o It is used to provide security of data.
o It can present the database from different viewpoints
according to the requirements of the user.
Advantages of DBMS
o Controls database redundancy: It can control data
redundancy because it stores all the data in one single
database file and that recorded data is placed in the
database.
o Data sharing: In DBMS, the authorized users of an
organization can share the data among multiple users.
o Easy maintenance: It can be easily maintained due to
the centralized nature of the database system.
o Reduced time: It reduces development time and
maintenance effort.
o Backup: It provides backup and recovery subsystems
which create automatic backups of data and restore the
data if required, protecting against hardware and
software failures.
o Multiple user interfaces: It provides different types of
user interfaces like graphical user interfaces and
application program interfaces.
Disadvantages of DBMS
o Cost of Hardware and Software: It requires a high speed
of data processor and large memory size to run DBMS
software.
o Size: It occupies a large space of disks and large memory
to run them efficiently.
o Complexity: Database system creates additional
complexity and requirements.
o Higher impact of failure: A failure has a high impact
because, in most organizations, all the data is stored in
a single database; if the database is damaged due to
power failure or corruption, the data may be lost
forever.

Components of DBMS Environment:

Hardware, Software, Data, Database Access Language,
Procedures and Users together form the components of a
DBMS environment.
Let us discuss the components one by one clearly.
Hardware
The hardware is the actual computer system used for keeping
and accessing the database. The conventional DBMS hardware
consists of secondary storage devices such as hard disks.
Databases run on a range of machines, from microcomputers
to mainframes.
Software

Software is the actual DBMS, sitting between the physical
database and the users of the system. All requests from users
for access to the database are handled by the DBMS.
Data
It is an important component of the database management
system. The main task of the DBMS is to process the data:
the database stores data, which can be retrieved and updated
as required.
Users
There are a number of users who can access or retrieve the data
on demand using the application and the interfaces provided by
the DBMS.
The users of the database can be classified into different groups

 Naive Users
 Online Users

 Sophisticated Users

 Specialized Users

 Application Users

 DBA- Database Administrator

The components of DBMS are given below in pictorial form −


DBMS Functions and Components:

Function of DBMS:

A DBMS performs several important functions that guarantee
the integrity and consistency of data in the database. Most of
these functions are transparent to end-users. The following
are important functions and services provided by a DBMS:
(i) Data Storage Management: It provides a mechanism for
management of permanent storage of the data. The internal
schema defines how the data should be stored by the storage
management mechanism and the storage manager interfaces
with the operating system to access the physical storage.
(ii) Data Manipulation Management: A DBMS furnishes
users with the ability to retrieve, update and delete existing
data in the database.
(iii) Data Definition Services: The DBMS accepts the data
definitions such as external schema, the conceptual schema,
the internal schema, and all the associated mappings in
source form.
(iv) Data Dictionary/System Catalog Management: The
DBMS provides a data dictionary or system catalog function
in which descriptions of data items are stored and which is
accessible to users.

(v) Database Communication Interfaces: The end-user's
requests for database access are transmitted to the DBMS in
the form of communication messages.
(vi) Authorization / Security Management: The DBMS
protects the database against unauthorized access, whether
intentional or accidental. It furnishes mechanisms to ensure
that only authorized users can access the database.

(vii) Backup and Recovery Management: The DBMS
provides mechanisms for backing up data periodically and
recovering from different types of failures. This prevents the
loss of data.
(viii) Concurrency Control Service: Since DBMSs support
sharing of data among multiple users, they must provide a
mechanism for managing concurrent access to the database.
DBMSs ensure that the database is kept in a consistent state
and that the integrity of the data is preserved.
(ix) Transaction Management: A transaction is a series of
database operations, carried out by a single user or
application program, which accesses or changes the contents
of the database. A DBMS must therefore provide a
mechanism to ensure either that all the updates corresponding
to a given transaction are made or that none of them is made.
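The all-or-nothing behavior described in (ix) can be sketched with Python's sqlite3 module; the account table, its CHECK constraint and the balances are assumptions made for this example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE account ("
    "  id INTEGER PRIMARY KEY,"
    "  balance INTEGER NOT NULL CHECK (balance >= 0))"
)
con.execute("INSERT INTO account VALUES (1, 100), (2, 50)")
con.commit()

# A funds transfer is one transaction: both updates must apply, or neither.
try:
    con.execute("UPDATE account SET balance = balance - 200 WHERE id = 1")
    con.execute("UPDATE account SET balance = balance + 200 WHERE id = 2")
    con.commit()
except sqlite3.IntegrityError:
    # The first update violates the CHECK constraint, so the whole
    # transaction is undone and the balances stay unchanged.
    con.rollback()

balances = [row[0] for row in con.execute("SELECT balance FROM account ORDER BY id")]
print(balances)   # [100, 50] -- the failed transaction changed nothing
```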

(x) Database Access and Application Programming
Interfaces: All DBMSs provide interfaces to enable
applications to use DBMS services. They provide data access
via Structured Query Language (SQL). The DBMS query
language contains two components: (a) a Data Definition
Language (DDL) and (b) a Data Manipulation Language
(DML).
Components of DBMS:

The database management system can be divided into five
major components:
1. Hardware
2. Software
3. Data
4. Procedures
5. Database Access Language
Let's have a simple diagram to see how they all fit together to
form a database management system.
DBMS Components: Hardware

When we say hardware, we mean the computer, hard disks,
I/O channels for data, and any other physical component
involved before any data is successfully stored in memory.
When we run Oracle or MySQL on our personal computer, then
our computer's Hard Disk, our Keyboard using which we type
in all the commands, our computer's RAM, ROM all become a
part of the DBMS hardware.

DBMS Components: Software

This is the main component, as this is the program which
controls everything. The DBMS software is a wrapper around
the physical database, providing us with an easy-to-use
interface to store, access and update data.
The DBMS software is capable of understanding the
Database Access Language and interpreting it into actual
database commands to execute on the database.

DBMS Components: Data

Data is the resource for which DBMS was designed. The
motive behind the creation of DBMS was to store and utilise
data. In a typical database, the user's saved data is present
along with metadata.
Metadata is data about the data. This is information stored by
the DBMS to better understand the data stored in it.
For example: When I store my Name in a database, the
DBMS will store when the name was stored in the database,
what is the size of the name, is it stored as related data to some
other data, or is it independent, all this information is metadata.
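As a sketch of this idea, SQLite (accessed through Python's sqlite3 module) exposes its own metadata through the sqlite_master catalog and the table_info pragma; the person table here is invented for illustration.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE person (name TEXT, age INTEGER)")

# The DBMS keeps data about the data. In SQLite the system catalog
# is the sqlite_master table...
row = con.execute("SELECT type, name FROM sqlite_master").fetchone()
print(row)        # ('table', 'person')

# ...and PRAGMA table_info describes each column: its name, declared
# data type, and so on -- all metadata, not user data.
cols = [(c[1], c[2]) for c in con.execute("PRAGMA table_info(person)")]
print(cols)       # [('name', 'TEXT'), ('age', 'INTEGER')]
```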

DBMS Components: Procedures

Procedures refer to general instructions for using a database
management system. This includes procedures to set up and
install a DBMS, to log in and log out of the DBMS software,
to manage databases, to take backups, to generate reports, etc.
DBMS Components: Database Access Language

Database Access Language is a simple language designed to
write commands to access, insert, update and delete data
stored in any database.
A user can write commands in the Database Access Language
and submit it to the DBMS for execution, which is then
translated and executed by the DBMS.
User can create new databases, tables, insert data, fetch stored
data, update data and delete the data using the access language.

Users

 Database Administrator: The Database Administrator
(DBA) is the one who manages the complete database
management system. The DBA takes care of the security
of the DBMS, its availability, managing license keys,
managing user accounts and access, etc.
 Application Programmer or Software Developer: This
user group is involved in developing and designing the
parts of the DBMS.
 End User: These days all modern applications, web or
mobile, store user data. How do they do it? Applications
are programmed in such a way that they collect user data
and store it on DBMS systems running on their servers.
End users are the ones who store, retrieve, update and
delete data.
DBMS users:

DBMS users are the different accounts associated with the
people who access the database. In a DBMS, different types
of users may have different types of access depending on
their requirements, and the DBMS categorizes users based on
the access they have to the database. Access is controlled
through each user's user id and password.

Types of Users
The users of the database can be classified into the following
groups −
Naive users − Naive users need not be aware of the presence
of the database system. They are end users of the database
who work through menu-driven application programs, where
the type and range of responses is always indicated to the
user.
Online users − Online users may communicate with databases
directly through an online terminal or indirectly through user
interface and application programs.
Sophisticated Users − These are users who interact with the
system without writing programs; instead, they form their
requests in a database query language. They are the SQL
programmers, who deal directly with the database. They
write queries to select, insert, update and delete data in the
database.
Specialized Users − Specialized users write specialized
database applications that do not fit into the traditional
database processing framework.
Application Programmer − Application programmers are the
users responsible for developing the application programs or
user interfaces. The application programs could be written in
a high-level language, for example Java, .NET, PHP, etc.
Database Administrator (DBA) − It is a person or the group
in charge of implementing the database system within the
organization. The DBA has all the privileges allowed by the
DBMS and can assign or remove the privileges from the users.
Classification of Users
DBMS users are mainly classified into three groups −
 End Users
 Application Programmers
 Database Administrator
The classification of users in DBMS is pictorially represented


below −
Advantages and Disadvantages of DBMS:

Advantages of DBMS
The advantages of the DBMS are explained below −
 Redundancy problem can be solved.
In the File System, duplicate data is created in many places
because all the programs have their own files which create data
redundancy resulting in wastage of memory. In DBMS, all the
files are integrated in a single database. So there is no chance
of duplicate data.
For example: A student record in a library or examination can
contain duplicate values, but when they are converted into a
single database, all the duplicate values are removed.
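The library/examination example above can be sketched with Python's sqlite3 module: integrating two hypothetical record files into one table with a primary key removes the duplicates. All names and roll numbers are invented.

```python
import sqlite3

# Two separate "files" (library and examination records) each
# repeat the same student details -- redundancy in a file system.
library_records = [("S01", "Rohit"), ("S02", "Aman")]
exam_records    = [("S01", "Rohit"), ("S03", "Priya")]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT)")

# Integrating both files into one table removes duplicate rows:
# the PRIMARY KEY rejects a second copy of S01.
for rec in library_records + exam_records:
    con.execute("INSERT OR IGNORE INTO student VALUES (?, ?)", rec)

count = con.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(count)   # 3 -- each student is stored exactly once
```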
 Has a very high security level.
Data security level is high by protecting your precious data
from unauthorized access. Only authorized users should have
the grant to access the database with the help of credentials.
 Presence of data integrity.

Data integrity means unifying many separate files into a
single, consistent database. DBMS enforces data integrity,
which decreases data duplication and reduces redundancy as
well as data inconsistency.
 Support multiple users.
DBMS allows multiple users to access the same database at a
time without any conflicts.
 Avoidance of inconsistency.
DBMS controls data redundancy and thereby maintains data
consistency. Data consistency means a change to a data item
needs to be made in only one place, rather than updating
multiple copies in separate files.
In DBMS, data is stored in a single database, so data is more
consistent than in file processing systems.
 Shared data
Data can be shared between authorized users of the database
in DBMS. All users have their own rights to access the
database. The admin has complete access to the database and
has the right to grant users access to it.
 Enforcement of standards
As DBMS have central control of the database. So, a DBA can
ensure that all the applications follow some standards such as
format of data, document standards etc. These standards help
in data migrations or in interchanging the data.
 Any unauthorized access is restricted
Unauthorized persons are not allowed to access the database
because of security credentials.
 Provide backup of data
Data loss is a big problem for all organizations. In a file
system, users have to back up files at regular intervals, which
wastes time and resources.
DBMS solves this problem by taking backups automatically
and providing recovery of the database.
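As an illustration of a backup subsystem, the sketch below uses the backup() method that Python's sqlite3 connections provide (Python 3.7+); the orders table and its rows are invented for the example.

```python
import sqlite3

# Source database with some data.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
src.execute("INSERT INTO orders VALUES (1, 'pen'), (2, 'book')")
src.commit()

# The DBMS backup facility copies the whole database; here we back up
# into a second in-memory database (a file path would work the same way).
dest = sqlite3.connect(":memory:")
src.backup(dest)

# The backup can now be used for recovery.
restored = dest.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(restored)   # 2
```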
 Tunability
Tuning means adjusting something to get better performance.
The same applies to a DBMS: it provides tunability to
improve performance, and the DBA adjusts the database to
get effective results.
Disadvantages of DBMS
The disadvantages of DBMS are as follows:
 Complexity
The provision of the functionality that is expected of a good
DBMS makes the DBMS an extremely complex piece of
software. Database designers, developers, database
administrators and end-users must understand this
functionality to take full advantage of it.
Failure to understand the system can lead to bad design
decisions, which can have serious consequences for an
organization.
 Size
The functionality of DBMS makes use of a large piece of
software which occupies megabytes of disk space.
 Performance
Performance may not run as fast as desired.
 Higher impact of a failure
The centralization of resources increases the vulnerability of
the system. Because all users and applications rely on the
availability of the DBMS, the failure of any component can
bring operations to a halt.
 Cost of DBMS
The cost of DBMS varies significantly depending on the
environment and functionality provided. There is also the
recurrent annual maintenance cost.

DBMS languages:

o A DBMS has appropriate languages and interfaces to
express database queries and updates.
o Database languages can be used to read, store and update
the data in the database.
Types of Database Language

1. Data Definition Language

o DDL stands for Data Definition Language. It is used to
define the database structure or pattern.
o It is used to create schemas, tables, indexes, constraints,
etc. in the database.
o Using the DDL statements, you can create the skeleton of
the database.
o Data definition language is used to store metadata, like
the number of tables and schemas, their names, indexes,
columns in each table, constraints, etc.
Here are some tasks that come under DDL:
o Create: It is used to create objects in the database.
o Alter: It is used to alter the structure of the database.
o Drop: It is used to delete objects from the database.
o Truncate: It is used to remove all records from a table.
o Rename: It is used to rename an object.
o Comment: It is used to comment on the data dictionary.
These commands are used to update the database schema;
that is why they come under Data Definition Language.
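The DDL commands above can be tried out with SQLite through Python's sqlite3 module. Note that SQLite itself has no TRUNCATE statement, so this sketch shows CREATE, ALTER and DROP only; the course table is invented.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# CREATE: define a new table (the skeleton of the data).
con.execute("CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT)")

# ALTER: change the structure by adding a column.
con.execute("ALTER TABLE course ADD COLUMN credits INTEGER")

cols = [c[1] for c in con.execute("PRAGMA table_info(course)")]
print(cols)      # ['code', 'title', 'credits']

# DROP: delete the object from the database.
# (In SQLite, DELETE FROM course would merely empty the table.)
con.execute("DROP TABLE course")
tables = con.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
).fetchone()[0]
print(tables)    # 0
```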
2. Data Manipulation Language
DML stands for Data Manipulation Language. It is used for
accessing and manipulating data in a database. It handles user
requests.
Here are some tasks that come under DML:
o Select: It is used to retrieve data from a database.
o Insert: It is used to insert data into a table.
o Update: It is used to update existing data within a table.
o Delete: It is used to delete records from a table.
o Merge: It performs UPSERT operation, i.e., insert or
update operations.
o Call: It is used to call a PL/SQL or Java subprogram.
o Explain Plan: It shows the execution plan the database
will use for a statement.
o Lock Table: It controls concurrency by locking a table.
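A minimal sketch of the core DML statements, using Python's sqlite3 module with an invented emp table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")

# INSERT: add rows to the table.
con.executemany("INSERT INTO emp VALUES (?, ?, ?)",
                [(1, "Asha", 30000), (2, "Ravi", 25000)])

# UPDATE: modify existing data.
con.execute("UPDATE emp SET salary = salary + 5000 WHERE id = 2")

# DELETE: remove selected rows.
con.execute("DELETE FROM emp WHERE id = 1")

# SELECT: retrieve data.
rows = con.execute("SELECT name, salary FROM emp").fetchall()
print(rows)   # [('Ravi', 30000)]
```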
3. Data Control Language

o DCL stands for Data Control Language. It is used to
control access rights and permissions on the data stored
in the database.
o The DCL execution is transactional and can support
rollback.
(But in the Oracle database, the execution of data control
language does not have the feature of rolling back.)
Here are some tasks that come under DCL:
o Grant: It is used to give user access privileges to a
database.
o Revoke: It is used to take back permissions from the user.
The following privileges can be granted or revoked:
CONNECT, INSERT, USAGE, EXECUTE, DELETE,
UPDATE and SELECT.
4. Transaction Control Language
TCL is used to manage the changes made by DML
statements, grouping them into logical transactions.
Here are some tasks that come under TCL:
o Commit: It is used to save the transaction's changes
permanently in the database.
o Rollback: It is used to restore the database to its state as
of the last Commit.
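Commit and Rollback can be demonstrated with Python's sqlite3 module; the log table is an invented example.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE log (msg TEXT)")

# COMMIT makes the change permanent.
con.execute("INSERT INTO log VALUES ('saved')")
con.commit()

# A change that is rolled back disappears, restoring the state
# as of the last COMMIT.
con.execute("INSERT INTO log VALUES ('discarded')")
con.rollback()

msgs = [row[0] for row in con.execute("SELECT msg FROM log")]
print(msgs)   # ['saved']
```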
CHAPTER-2

Roles in the Database Environment

Data and Database Administrator:

A database administrator (DBA) is the information technician
responsible for directing and performing all activities related
to maintaining a successful database environment. A DBA
makes sure an organization's databases and related
applications operate functionally and efficiently.

Importance of a DBA
If your organization uses a database management system
(DBMS) for mission-critical workloads, it's important to have
on board one or more database administrators to ensure that
applications have ongoing, uninterrupted access to data. Most
modern organizations of every size use at least one DBMS, and
therefore the need for database administrators is greater today
than ever before.

The DBA is responsible for understanding and managing the
overall database environment. By developing and
implementing a strategic blueprint to follow when deploying
databases within their organization, DBAs are instrumental in
the ongoing efficacy of applications that rely on databases for
data storage and access.

Without the DBA's oversight, application and system outages,
downtime, and slowdowns will inevitably occur. These kinds
of problems result in business outages that can negatively
affect revenue, customer experiences and business reputation.
Data Administrator (DA):

A data administrator is a person responsible for organizing
data into a convenient data model and for deciding which
data is relevant to be stored in the database. Data
administrator is less of a technical role and more of a business
role with some technical knowledge; it is also known as Data
Analyst. It is mostly a high-level function responsible for the
overall management of data resources in an organization.
Responsibilities :
 Filters out relevant data

 Monitor the data flow throughout the organization

 Designs concept-based data model

 Analyze and break down the data to be understood by the

non-tech person
02. Database Administrator (DBA):
A database administrator is a person who creates, updates
and maintains the database. It is a wide role: a database
administrator might be hired to create, maintain and back up
the database, to optimize the database for high performance,
or to help integrate databases into applications. The major
skills required to be an excellent database administrator are
troubleshooting, a logical thought process, and a strong will
to learn, as the role covers a vast area. This role is also known
as Database Coordinator or Database Programmer.
Responsibilities :
 Create and design a database

 Analyze and monitor database requirements

 Ensures data security

Difference between Data Administrator (DA) and Database
Administrator (DBA):

S.NO. | DATA ADMINISTRATOR | DATABASE ADMINISTRATOR
01. | Data admin is also known as data analyst. | Database admin is also known as database coordinator or database programmer.
02. | Data admin converts data into a convenient data model. | Database admin inputs data into the database.
03. | Data admin analyzes the database for relevant data. | Database admin optimizes and maintains the database.
04. | Data admin monitors data flow across the organization. | Database admin ensures database security.
05. | Data admin handles issues concerning the data. | Database admin handles issues with the database.
06. | Data admin requires excellent data analysis, expression of ideas, and strategic thinking. | Database admin mostly requires a logical thought process, troubleshooting skills and a will to learn.
07. | Data admin is less of a technical role and more of a business role. | Database admin is a wide role as it has multiple responsibilities.
08. | Main tasks include data planning, definition, architecture and management. | Main tasks include database design, construction, security, backup and recovery, performance tuning etc.
09. | It sets policies and standards, and coordinates and manages database design. | It enforces policies and procedures, and chooses and maintains technology.
10. | Generally it owns the data. | It owns the database.
11. | It performs a high-level function. | It performs a technical function.
12. | Data administration is DBMS independent. | Database administration is DBMS specific.

Database Designers:

Database design can be generally defined as the collection of
tasks or processes that enhance the designing, development,
implementation, and maintenance of an enterprise data
management system. Designing a proper database reduces
maintenance cost, improves data consistency, and makes
cost-effective use of disk storage space. Therefore, there has
to be a sound approach to designing a database: the designer
should follow the constraints and decide how the elements
correlate and what kind of data must be stored.
The main objectives behind database designing are to produce
the logical and physical design models of the proposed
database system. The logical model concentrates on the data
requirements, considered as a whole and independent of how
the data will be physically stored. The physical design model,
on the other hand, translates the logical design model onto the
physical media, taking account of the hardware resources and
software systems such as the Database Management System
(DBMS).
Why is Database Design important?
The important consideration that can be taken into account
while emphasizing the importance of database design can be
explained in terms of the following points given below.
1. Database designs provide the blueprints of how the data is
going to be stored in a system. A proper design of a
database highly affects the overall performance of any
application.
2. The designing principles defined for a database give a
clear idea of the behavior of any application and how the
requests are processed.
3. Another point emphasizing database design is that a
proper database design meets all the requirements of
users.
4. Lastly, the processing time of an application is greatly
reduced if the constraints of designing a highly efficient
database are properly implemented.
Life Cycle

Although the life cycle of a database is not the main focus of
this section, before jumping directly to the design models
constituting database design it is important to understand the
overall workflow and life cycle of the database.
Requirement Analysis
First of all, planning has to be done on the basic requirements
of the project under which the design of the database will be
taken forward. The stages can be defined as:
Planning - This stage is concerned with planning the entire
DDLC (Database Development Life Cycle). The strategic
considerations are taken into account before proceeding.
System definition - This stage covers the boundaries and
scopes of the proper database after planning.
Database Designing
The next step involves designing the database considering the
user-based requirements and splitting the design into models,
so that a heavy load or dependency is not imposed on a single
aspect. This is a model-centric approach, and this is where the
logical and physical models play a crucial role.
Physical Model - The physical model is concerned with the
practices and implementations of the logical model.
Logical Model - This stage is primarily concerned with
developing a model based on the proposed requirements. The
entire model is designed on paper without any implementation
or adopting DBMS considerations.
Implementation
The last step covers the implementation methods and
checking that the behavior matches our requirements. This is
ensured with continuous integration testing of the database
with different data sets and conversion of data into a
machine-understandable form. In these steps the focus is on
data manipulation: queries are run to check whether the
application behaves satisfactorily.
Data conversion and loading - This section is used to import
and convert data from the old to the new system.
Testing - This stage is concerned with error identification in
the newly implemented system. Testing is a crucial step
because it checks the database directly and compares the
requirement specifications.
Database Design Process
The process of designing a database carries various conceptual
approaches that are needed to be kept in mind. An ideal and
well-structured database design must be able to:
1. Save disk space by eliminating redundant data.
2. Maintain data integrity and accuracy.
3. Provide data access in useful ways.

Comparing Logical and Physical Data Models
Logical
A logical data model generally describes the data in as much
detail as possible, without being concerned about the physical
implementation in the database. Features of a logical data
model include:
1. All the entities and relationships amongst them.
2. Each entity has well-specified attributes.
3. The primary key for each entity is specified.
4. Foreign keys which are used to identify a relationship
between different entities are specified.
5. Normalization occurs at this level.
A logical model can be designed using the following approach:
1. Specify all the entities with primary keys.
2. Specify concurrent relationships between different
entities.
3. Figure out each entity's attributes.
4. Resolve many-to-many relationships.
5. Carry out the process of normalization.
Also, one important factor after following the above approach
is to critically examine the design against the gathered
requirements. If the above steps are strictly followed, there is
a good chance of creating a highly efficient database design.
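Step 4 of the approach (resolving many-to-many relationships) is conventionally done with a junction table holding foreign keys to both entities. A sketch using Python's sqlite3 module, with invented student and course data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE student (roll_no TEXT PRIMARY KEY, name TEXT);
    CREATE TABLE course  (code TEXT PRIMARY KEY, title TEXT);
    -- The many-to-many relationship is resolved by a junction table
    -- whose foreign keys reference the two entities.
    CREATE TABLE enrolment (
        roll_no TEXT REFERENCES student(roll_no),
        code    TEXT REFERENCES course(code),
        PRIMARY KEY (roll_no, code)
    );
    INSERT INTO student VALUES ('S01', 'Rohit'), ('S02', 'Aman');
    INSERT INTO course  VALUES ('C1', 'DBMS'), ('C2', 'Networks');
    INSERT INTO enrolment VALUES ('S01', 'C1'), ('S01', 'C2'), ('S02', 'C1');
""")

# One student takes many courses and one course has many students.
rows = con.execute("""
    SELECT s.name, c.title
    FROM enrolment e
    JOIN student s ON s.roll_no = e.roll_no
    JOIN course  c ON c.code = e.code
    ORDER BY s.name, c.title
""").fetchall()
print(rows)   # [('Aman', 'DBMS'), ('Rohit', 'DBMS'), ('Rohit', 'Networks')]
```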
To understand these points, see the image below to get a clear
picture.

If we compare the logical data model shown in the figure
above with a conceptual data model with some sample data,
we find that the conceptual data model has no primary keys,
whereas the logical data model has primary keys for all of its
entities. Also, the logical data model covers the relationships
between different entities and carries room for foreign keys to
establish relationships among them.
Physical
A physical data model represents how the database design will
actually be implemented. The main purpose of the physical
data model is to show the complete structure of each table,
including the column names, column data types, constraints,
keys (primary and foreign), and the relationships among tables.
The following are the features of a physical data model:
1. Specifies all the columns and tables.
2. Specifies foreign keys that usually define the relationship
between tables.
3. Based on user requirements, de-normalization might
occur.
4. Since physical considerations are taken into account, the
physical model will differ from the logical model for
straightforward reasons.
5. Physical models might be different for different RDBMS.
For example, the data type column may be different in
MySQL and SQL Server.
While designing a physical data model, the following points
should be taken into consideration:
1. Convert the entities into tables.
2. Convert the defined relationships into foreign keys.
3. Convert the data attributes into columns.
4. Modify the data model constraints based on physical
requirements.
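As an illustrative sketch of these conversion steps, the following uses Python's built-in sqlite3 module. The customer/orders tables and every column name here are invented for the example, not taken from the text:

```python
import sqlite3

# In-memory database; the entity "Customer" becomes a table,
# its attributes become typed columns (illustrative names only).
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,   -- the entity's primary key
    name        TEXT NOT NULL,
    email       TEXT UNIQUE            -- a column-level constraint
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL,
    -- the logical relationship becomes a foreign key
    FOREIGN KEY (customer_id) REFERENCES customer(customer_id)
);
""")

# PRAGMA table_info lists the physical columns derived from
# the entity's attributes.
cols = [row[1] for row in con.execute("PRAGMA table_info(customer)")]
print(cols)   # ['customer_id', 'name', 'email']
```

Note that the same logical model could map to different column types in MySQL or SQL Server, which is exactly the point made in feature 5 above.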
Comparing this physical data model with the previous logical
model, we can see that in a physical database the entity names
become table names and the attributes become column names.
Also, the data type of each column is defined in the physical
model depending on the actual database used.
Glossary
Entity - An entity in the database can be defined as abstract
data that we save in our database. For example, a customer,
products.
Attributes - An attribute is a property that describes an
entity, such as length, name, or price.
Relationship - A relationship can be defined as the connection
between two entities. For example, a person can be related to
multiple persons in a family.
Foreign key - It acts as a reference to the primary key of
another table. A foreign key contains columns with values that
exist only in the primary key column they refer to.
Primary key - A primary key is a unique, non-null value that
is used to uniquely identify each record of a table.
Normalization - A flexible data model needs to follow certain
rules. Applying these rules is called normalizing.

Applications Developers and Users:

Database Users
Database users are the ones who really use and take the benefits
of the database. There will be different types of users depending
on their needs and way of accessing the database.

1. Application Programmers – They are the developers
who interact with the database by means of DML queries.
These DML queries are written in the application
programs like C, C++, JAVA, Pascal, etc. These queries
are converted into object code to communicate with the
database. For example, writing a C program to generate
the report of employees who are working in a particular
department will involve a query to fetch the data from the
database. It will include an embedded SQL query in the C
Program.
2. Sophisticated Users – They are database developers, who
write SQL queries to select/insert/delete/update data.
They do not use any application or programs to request the
database. They directly interact with the database by
means of a query language like SQL. These users will be
scientists, engineers, analysts who thoroughly study SQL
and DBMS to apply the concepts in their requirements. In
short, we can say this category includes designers and
developers of DBMS and SQL.
3. Specialized Users – These are also sophisticated users,
but they write special database application programs.
They are the developers who develop complex programs
for specialized requirements.
4. Stand-alone Users – These users have a stand-alone
database for their personal use. Such databases come as
ready-made packages with menus and graphical interfaces.
5. Naive Users – These are the users who use an existing
application to interact with the database. For example,
online library systems, ticket booking systems, ATMs, etc.
have existing applications, and users use them to interact
with the database to fulfill their requests.
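The department-report example under Application Programmers can be sketched in Python rather than C (the employee table and its data are hypothetical), showing how a DML query is embedded in a host program:

```python
import sqlite3

# Hypothetical employee table for the report example.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, dept TEXT)")
con.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Asha", "Sales"), ("Ravi", "HR"), ("Meena", "Sales")])

def department_report(dept):
    # The embedded DML query; the host program supplies the
    # department as a parameter.
    rows = con.execute(
        "SELECT name FROM employee WHERE dept = ? ORDER BY name",
        (dept,))
    return [name for (name,) in rows]

print(department_report("Sales"))   # ['Asha', 'Meena']
```

In a real C program the same idea would use embedded SQL or a call-level interface, with the query compiled into object code as described above.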
Database Administrators
The life cycle of a database starts from designing,
implementing to the administration of it. A database for any
kind of requirement needs to be designed perfectly so that it
should work without any issues. Once all the design is
complete, it needs to be installed. Once this step is complete,
users start using the database. The database grows as the data
grows in the database. When the database becomes huge, its
performance comes down, and accessing the data becomes a
challenge. There may also be unused space inside the database,
inflating its size unnecessarily. This administration and
maintenance of the database are taken care of by the database
administrator – the DBA.
A DBA has many responsibilities. A good-performing database
is in the hands of DBA.
 Installing and upgrading the DBMS servers: – The DBA
is responsible for installing a new DBMS server for new
projects. He is also responsible for upgrading these servers
as new versions come into the market or as requirements
change. If an upgrade of an existing server fails, he should
be able to revert to the older version, thus keeping the
DBMS working. He is also responsible for applying service
packs, hotfixes, and patches to the DBMS servers.
 Design and implementation: – Designing the database
and implementing is also DBA’s responsibility. He should
be able to decide on proper memory management, file
organizations, error handling, log maintenance, etc for the
database.
 Performance tuning: – Since the database is huge and it
will have lots of tables, data, constraints, and indices,
there will be variations in the performance from time to
time. Also, because of some designing issues or data
growth, the database will not work as expected. It is the
responsibility of the DBA to tune the database
performance. He is responsible to make sure all the
queries and programs work in a fraction of seconds.
 Migrate database servers: – Sometimes users of Oracle
would like to shift to SQL Server or Netezza. It is the
responsibility of the DBA to make sure that the migration
happens without any failure and there is no data loss.
 Backup and Recovery: – Proper backup and recovery
procedures need to be developed and maintained by the
DBA. This is one of the main responsibilities of the DBA.
Data and objects should be backed up regularly so that if
there is any crash, they can be recovered without much
effort or data loss.
 Security: – DBA is responsible for creating various
database users and roles, and giving them different levels
of access rights.
 Documentation: – The DBA should properly document
all his activities so that if he quits or a new DBA comes
in, the newcomer can understand the database without
much effort. He should maintain records of all installation,
backup, recovery, and security procedures, and keep
various reports about database performance.
In order to perform all these tasks, he should have a very good
command over the DBMS.
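The backup responsibility described above can be sketched with SQLite's online-backup API via Python's sqlite3 module. This is a minimal illustration (the accounts table is invented); real DBA backup tooling involves scheduling, retention, and off-site copies:

```python
import sqlite3

# The "live" database (in-memory for the sketch).
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE accounts (id INTEGER, balance REAL)")
src.execute("INSERT INTO accounts VALUES (1, 500.0)")
src.commit()

# Online backup: copies the live database into another
# connection while it remains usable.
dest = sqlite3.connect(":memory:")
src.backup(dest)

# The copy can serve for recovery if the source is lost.
restored = dest.execute("SELECT balance FROM accounts").fetchone()[0]
print(restored)   # 500.0
```

In practice `dest` would be a connection to a file on separate storage, so a crash of the source machine does not take the backup with it.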
Types of DBA
There are different kinds of DBA depending on the
responsibility that he owns.

 Administrative DBA – This DBA is mainly concerned
with installing and maintaining DBMS servers. His prime
tasks are installation, backups, recovery, security,
replication, memory management, configuration, and
tuning. He is mainly responsible for all administrative
tasks of a database.
 Development DBA – He is responsible for creating
queries and procedures for the requirement. Basically, his
task is similar to any database developer.
 Database Architect – Database architect is responsible
for creating and maintaining the users, roles, access rights,
tables, views, constraints, and indexes. He is mainly
responsible for designing the structure of the database
depending on the requirement. These structures will be
used by developers and development DBA to code.
 Data Warehouse DBA – The DBA should be able to maintain
the data and procedures from various sources in the data
warehouse. These sources can be files, COBOL, or any
other programs. Here data and programs will be from
different sources. A good DBA should be able to keep the
performance and function levels from these sources at the
same pace to make the data warehouse work.
 Application DBA –He acts like a bridge between the
application program and the database. He makes sure all
the application program is optimized to interact with the
database. He ensures all the activities from installing,
upgrading, and patching, maintaining, backup, recovery to
executing the records work without any issues.
 OLAP DBA – He is responsible for installing and
maintaining the database in OLAP systems. He maintains
only OLAP databases.
SECTION-II

Database System Architecture


Database System Architecture:

DBMS Architecture
o The DBMS design depends upon its architecture. The

basic client/server architecture is used to deal with a large


number of PCs, web servers, database servers and other
components that are connected with networks.
o The client/server architecture consists of many PCs and a

workstation which are connected via the network.


o DBMS architecture depends upon how users are
connected to the database to get their request done.
Types of DBMS Architecture
Database architecture can be seen as single-tier or multi-tier.
Logically, however, database architecture is of two types:
2-tier architecture and 3-tier architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the

user. It means the user can directly sit on the DBMS and
uses it.
o Any changes done here will directly be done on the

database itself. It doesn't provide a handy tool for end


users.
o The 1-Tier architecture is used for development of the

local application, where programmers can directly


communicate with the database for the quick response.
2-Tier Architecture
o The 2-Tier architecture is the same as a basic client-server
architecture. In the two-tier architecture, applications on
the client end can directly communicate with the database
at the server side. For this interaction, APIs like ODBC
and JDBC are used.
o The user interfaces and application programs are run on

the client-side.
o The server side is responsible for providing functionalities
like query processing and transaction management.
o To communicate with the DBMS, client-side application

establishes a connection with the server side.


Fig: 2-tier Architecture


3-Tier Architecture
o The 3-Tier architecture contains another layer between the

client and server. In this architecture, client can't directly


communicate with the server.
o The application on the client-end interacts with an

application server which further communicates with the


database system.
o End user has no idea about the existence of the database

beyond the application server. The database also has no


idea about any other user beyond the application.
o The 3-Tier architecture is used in case of large web

application.
Fig: 3-tier Architecture

Three Levels of Architecture:

o The three schema architecture is also called ANSI/SPARC


architecture or three-level architecture.
o This framework is used to describe the structure of a
specific database system.
o The three schema architecture is also used to separate the
user applications and physical database.
o The three schema architecture contains three-levels. It
breaks the database down into three different categories.
The three-schema architecture is as follows:
In the above diagram:


o It shows the DBMS architecture.
o Mapping is used to transform the request and response
between various database levels of architecture.
o Mapping is not good for small DBMS because it takes
more time.
o In External / Conceptual mapping, it is necessary to
transform the request from external level to conceptual
schema.
o In Conceptual / Internal mapping, the DBMS transforms
the request from the conceptual level to the internal level.
Objectives of Three schema Architecture


The main objective of three level architecture is to enable
multiple users to access the same data with a personalized view
while storing the underlying data only once. Thus it separates
the user's view from the physical structure of the database. This
separation is desirable for the following reasons:
o Different users need different views of the same data.
o The way in which a particular user needs to see the
data may change over time.
o The users of the database should not worry about the
physical implementation and internal workings of the
database such as data compression and encryption
techniques, hashing, optimization of the internal structures
etc.
o All users should be able to access the same data according
to their requirements.
o The DBA should be able to change the conceptual structure
of the database without affecting the users' views.
o Internal structure of the database should be unaffected by
changes to physical aspects of the storage.
1. Internal Level

o The internal level has an internal schema which describes


the physical storage structure of the database.
o The internal schema is also known as a physical schema.


o It uses the physical data model. It is used to define how
the data will be stored in a block.
o The physical level is used to describe complex low-level
data structures in detail.
The internal level is generally concerned with the following
activities:
o Storage space allocations.
For Example: B-Trees, Hashing etc.
o Access paths.
For Example: Specification of primary and secondary
keys, indexes, pointers and sequencing.
o Data compression and encryption techniques.
o Optimization of internal structures.
o Representation of stored fields.
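Access paths such as indexes, listed above, can be seen at work in SQLite (an illustrative sketch; the table and index names are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll INTEGER, name TEXT)")

# Without an index the engine must scan the whole table;
# creating one adds an access path at the internal level.
con.execute("CREATE INDEX idx_roll ON student(roll)")

# EXPLAIN QUERY PLAN reports which access path the engine
# chose; the detail string is the last column of the row.
plan = con.execute(
    "EXPLAIN QUERY PLAN SELECT name FROM student WHERE roll = 7"
).fetchone()[-1]
print(plan)   # mentions idx_roll as the chosen access path
```

Note that the application's SQL did not change; only the internal-level structure did, which is the point of keeping this level separate.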
2. Conceptual Level

o The conceptual schema describes the design of a database


at the conceptual level. Conceptual level is also known as
logical level.
o The conceptual schema describes the structure of the
whole database.
o The conceptual level describes what data are to be stored


in the database and also describes what relationship exists
among those data.
o In the conceptual level, internal details such as an
implementation of the data structure are hidden.
o Programmers and database administrators work at this
level.
3. External Level

o At the external level, a database contains several schemas
that are sometimes called subschemas. A subschema is
used to describe a different view of the database.
o An external schema is also known as view schema.
o Each view schema describes the part of the database that a
particular user group is interested in and hides the remaining
database from that user group.
o The view schema describes the end user interaction with
database systems.
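The idea of a view schema exposing only what one user group needs can be sketched as follows (SQLite; the table, view, and column names are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE employee (name TEXT, dept TEXT, salary REAL)")
con.execute("INSERT INTO employee VALUES ('Asha', 'Sales', 50000)")

# A view (subschema) exposing only the columns one user group
# may see; salary is hidden from that group.
con.execute("CREATE VIEW emp_public AS SELECT name, dept FROM employee")

cur = con.execute("SELECT * FROM emp_public")
cols = [d[0] for d in cur.description]
print(cols)   # ['name', 'dept'] -- salary is not visible
```

In a full DBMS, access rights would additionally forbid the user group from querying the base table directly, so the view is their only window onto the data.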
Mapping between Views
The three levels of DBMS architecture don't exist
independently of each other. There must be correspondence
between the three levels i.e. how they actually correspond with
each other. DBMS is responsible for correspondence between
the three types of schema. This correspondence is called
Mapping.
There are basically two types of mapping in the database


architecture:
o Conceptual/ Internal Mapping
o External / Conceptual Mapping
Conceptual/ Internal Mapping
The Conceptual/ Internal Mapping lies between the conceptual
level and the internal level. Its role is to define the
correspondence between the records and fields of the
conceptual level and files and data structures of the internal
level.
External/ Conceptual Mapping
The external/Conceptual Mapping lies between the external
level and the Conceptual level. Its role is to define the
correspondence between a particular external and the
conceptual view.

External, Conceptual and Internal Levels:

This architecture contains three layers of database


management system, which are as follows −
 External level
 Conceptual level
 Internal level
External/ View level
This is the highest level of database abstraction. It includes a
number of external schemas or user views. This level provides
different views of the same database for a specific user or a
group of users. An external view provides a powerful and
flexible security mechanism by hiding the parts of the database
from a particular user.
Conceptual or Logical level
This level describes the structure of the whole database. It acts
as a middle layer between the physical storage and user view.
It explains what data to be stored in the database, what the data
types are, and what relationship exists among those data. There
is only one conceptual schema per database.
Internal or Physical level
This is the lowest level of database abstraction. It describes
how the data is stored in the database and provides the methods
to access data from the database. It allows viewing the physical
representation of the database on the computer system.
The interface between the conceptual and internal schema
identifies how an element in the conceptual schema is stored
and how it may be accessed. It is one which is closest to
physical storage. The internal schema not only defines
different stored record types, but also specifies what indices
exist, how stored fields are represented.
The three level schema architecture in DBMS is given below

Schemas, Mappings and Instances:

1. Instances :
Instances are the collection of information stored at a
particular moment. An instance can be changed by CRUD
operations such as addition and deletion of data. Note that a
search query does not make any changes to the instance.
Example –
Suppose a table teacher in our database School has 50 records
today; the current instance of the database therefore has 50
records. If we add another fifty records tomorrow, tomorrow's
instance will have 100 records in total. This is called an
instance.
2. Schema :

Schema is the overall description of the database. The basic


structure of how the data will be stored in the database is
called schema.

Schema is of three types: Logical Schema, Physical Schema


and view Schema.
1. Logical Schema – It describes the database designed at
logical level.
2. Physical Schema – It describes the database designed at
physical level.
3. View Schema – It defines the design of the database at the
view level.
78

Example –
Let’s say the teacher table in our database School requires the
attributes name, dob, and doj, so we design the structure as:
Teacher table
name: String
doj: date
dob: date
Above given is the schema of the table teacher.
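The Teacher schema above can be expressed as DDL, with the instance being whatever rows the table holds at a given moment. A SQLite sketch (SQLite has no native date type, so doj and dob are stored as TEXT here):

```python
import sqlite3

# The schema: the fixed structure of the teacher table.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE teacher (name TEXT, doj TEXT, dob TEXT)")

# The instance: the rows that exist right now (sample data).
con.execute(
    "INSERT INTO teacher VALUES ('A. Kumar', '2020-06-01', '1985-01-15')")
count = con.execute("SELECT COUNT(*) FROM teacher").fetchone()[0]
print(count)   # 1 -- today's instance has one record
```

Inserting more rows tomorrow changes the instance but leaves the schema untouched, which is exactly the distinction drawn below.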
Difference between Schema and Instance :
1. Schema is the overall description of the database, while an
instance is the collection of information stored in the database
at a particular moment.
2. The schema is the same for the whole database, while data
in an instance can be changed using addition, deletion, and
updation.
3. The schema does not change frequently; the instance
changes frequently.
4. The schema defines the basic structure of the database, i.e.
how the data will be stored; the instance is the set of
information stored at a particular time.
Data Independence:

o Data independence can be explained using the three-


schema architecture.
o Data independence refers to the characteristic of being able
to modify the schema at one level of the database system
without altering the schema at the next higher level.
There are two types of data independence:
1. Logical Data Independence
o Logical data independence refers to the characteristic of
being able to change the conceptual schema without having
to change the external schema.
o Logical data independence is used to separate the
external level from the conceptual view.
o If we do any changes in the conceptual view of the data,
then the user view of the data would not be affected.
o Logical data independence occurs at the user interface
level.
Logical Data Independence is the ability to change the
conceptual scheme without changing
1. External views
2. External API or programs
Any change made will be absorbed by the mapping between
external and conceptual levels.
When compared to Physical Data independence, it is
challenging to achieve logical data independence.
Examples of changes under Logical Data Independence


Due to Logical independence, any of the below change will
not affect the external layer.
1. Add/Modify/Delete a new attribute, entity or relationship
is possible without a rewrite of existing application
programs
2. Merging two records into one
3. Breaking an existing record into two or more records
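Point 1 above — adding an attribute without rewriting existing application programs — can be sketched as follows (SQLite; the student table and the get_names "application program" are invented for the example):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE student (roll INTEGER, name TEXT)")
con.execute("INSERT INTO student VALUES (1, 'Asha')")

# An "application program" written against the original schema.
def get_names():
    return [n for (n,) in con.execute("SELECT name FROM student")]

# The conceptual schema changes: a new attribute is added ...
con.execute("ALTER TABLE student ADD COLUMN email TEXT")

# ... yet the existing program still works unchanged, because
# its external view ("names of students") is unaffected.
print(get_names())   # ['Asha']
```

Had the program used `SELECT *` it would have been coupled to the conceptual schema and broken by the change, which is why views and explicit column lists matter for logical data independence.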

2. Physical Data Independence


o Physical data independence can be defined as the
capacity to change the internal schema without having to
change the conceptual schema.
o If we do any changes in the storage size of the database
system server, then the Conceptual structure of the
database will not be affected.
o Physical data independence is used to separate
conceptual levels from the internal levels.
o Physical data independence occurs at the logical
interface level.
With physical independence, you can easily change the
physical storage structures or devices without affecting the
conceptual schema. Any change done would be absorbed by
the mapping between the conceptual and internal levels.
Physical data independence is achieved by the presence of the
internal level of the database and then the transformation from
the conceptual level of the database to the internal level.
Examples of changes under Physical Data Independence
Due to Physical independence, any of the below change will
not affect the conceptual layer.
 Using a new storage device like Hard Drive or Magnetic
Tapes
 Modifying the file organization technique in the
database.
 Switching to different data structures.
 Changing the access method.
 Modifying indexes.
 Changes to compression techniques or hashing
algorithms.
 Change of Location of Database from say C drive to D
Drive

Difference between Physical and Logical Data Independence
1. Logical data independence is mainly concerned with the
structure or the changing of data definitions, while physical
data independence is mainly concerned with the storage of the
data.
2. Logical data independence is difficult to achieve, since the
retrieval of data depends heavily on the logical structure of
the data; physical data independence is comparatively easy to
achieve.
3. You need to make changes in the application programs if
new fields are added to or deleted from the database, whereas
a change at the physical level usually needs no change at the
application program level.
4. Modification at the logical level is significant whenever the
logical structures of the database are changed; modifications
made at the internal level may or may not be needed to
improve the performance of the structure.
5. Logical data independence is concerned with the conceptual
schema; physical data independence is concerned with the
internal schema.
6. Example of a logical change: add/modify/delete an attribute.
Example of a physical change: a change in compression
techniques, hashing algorithms, storage devices, etc.

Importance of Data Independence


 Helps you to improve the quality of the data
 Database system maintenance becomes affordable
 Enforcement of standards and improvement in database
security
 You don’t need to alter data structure in application
programs
 Permit developers to focus on the general structure of the
Database rather than worrying about the internal
implementation
 It helps keep the data in a consistent and intact state
 Database inconsistency is vastly reduced.
 Easily make modifications in the physical level is needed
to improve the performance of the system.
Classification of Database Management System:


There are various types of databases used for storing different
varieties of data:

1) Centralized Database
It is the type of database that stores data at a centralized
database system. It allows users to access the stored data
from different locations through several applications. These
applications contain an authentication process to let users
access the data securely. An example of a centralized database
is a Central Library that carries a central database of each
library in a college/university.
Advantages of Centralized Database
o It has decreased the risk of data management, i.e.,

manipulation of data will not affect the core data.


o Data consistency is maintained as it manages data in a

central repository.
o It provides better data quality, which enables
organizations to establish data standards.
o It is less costly because fewer vendors are required to


handle the data sets.
Disadvantages of Centralized Database
o The size of the centralized database is large, which

increases the response time for fetching the data.


o It is not easy to update such an extensive database system.

o If any server failure occurs, entire data will be lost, which

could be a huge loss.


2) Distributed Database
Unlike a centralized database system, in distributed systems,
data is distributed among different database systems of an
organization. These database systems are connected via
communication links. Such links help the end-users to access
the data easily. Examples of the Distributed database are
Apache Cassandra, HBase, Ignite, etc.
We can further divide a distributed database system into:
o Homogeneous DDB: Those database systems which
execute on the same operating system, use the same
application process, and carry the same hardware devices.
o Heterogeneous DDB: Those database systems which
execute on different operating systems under different
application procedures and carry different hardware
devices.
Advantages of Distributed Database
o Modular development is possible in a distributed database,

i.e., the system can be expanded by including new


computers and connecting them to the distributed system.
o One server failure will not affect the entire data set.

3) Relational Database
This database is based on the relational data model, which
stores data in the form of rows(tuple) and columns(attributes),
and together forms a table(relation). A relational database uses
SQL for storing, manipulating, as well as maintaining the data.
E.F. Codd proposed the relational model in 1970. Each table
in the database carries a key that makes the data unique from
others. Examples of relational databases are MySQL,
Microsoft SQL Server, Oracle, etc.
Properties of Relational Database
There are following four commonly known properties of a
relational model known as ACID properties, where:
A means Atomicity: This ensures the data operation will
complete either with success or with failure. It follows the 'all
or nothing' strategy. For example, a transaction will either be
committed or will abort.
C means Consistency: If we perform any operation over the


data, its value before and after the operation should be
preserved. For example, the account balance before and after
the transaction should be correct, i.e., it should remain
conserved.
I means Isolation: Multiple users can access data concurrently
from the database, so transactions should remain isolated from
one another. For example, when multiple transactions occur at
the same time, the effects of one transaction should not be
visible to the other transactions in the database.
D means Durability: It ensures that once it completes the
operation and commits the data, data changes should remain
permanent.
4) NoSQL Database
Non-SQL/Not Only SQL is a type of database that is used for
storing a wide range of data sets. It is not a relational database
as it stores data not only in tabular form but in several different
ways. It came into existence when the demand for building
modern applications increased. Thus, NoSQL presented a wide
variety of database technologies in response to the demands.
We can further divide a NoSQL database into the following
four types:
a. Key-value storage: It is the simplest type of database


storage where it stores every single item as a key (or attribute
name) holding its value, together.
b. Document-oriented Database: A type of database used
to store data as JSON-like document. It helps developers
in storing data by using the same document-model format
as used in the application code.
c. Graph Databases: It is used for storing vast amounts of
data in a graph-like structure. Most commonly, social
networking websites use the graph database.
d. Wide-column stores: It is similar to the data represented
in relational databases. Here, data is stored in large
columns together, instead of storing in rows.
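A key-value store of the kind described in point (a) can be sketched in a few lines of Python. This is purely illustrative: real key-value databases such as Redis add persistence, networking, and expiry on top of the same idea:

```python
# A minimal in-memory key-value store sketch (illustrative only).
store = {}

def put(key, value):
    # Every item is stored as a key holding its value, together.
    store[key] = value

def get(key, default=None):
    return store.get(key, default)

# Keys are opaque strings; the application chooses a naming scheme.
put("user:42:name", "Asha")
put("user:42:cart", ["pen", "book"])
print(get("user:42:name"))   # 'Asha'
```

Note that the store imposes no schema: the value under one key is a string, under another a list, which is why this model suits heterogeneous data sets.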
Advantages of NoSQL Database
o It enables good productivity in the application
development as it is not required to store data in a
structured format.
o It is a better option for managing and handling large data


sets.
o It provides high scalability.
o Users can quickly access data from the database through
key-value.
5) Cloud Database
A type of database where data is stored in a virtual environment
and executes over the cloud computing platform. It provides
users with various cloud computing services (SaaS, PaaS, IaaS,
etc.) for accessing the database. There are numerous cloud
platforms, but the best options are:
o Amazon Web Services(AWS)
o Microsoft Azure
o Kamatera
o PhonixNAP
o ScienceSoft
o Google Cloud SQL, etc.
6) Object-oriented Databases
The type of database that uses the object-based data model
approach for storing data in the database system. The data is
represented and stored as objects which are similar to the
objects used in the object-oriented programming language.
7) Hierarchical Databases
It is the type of database that stores data in the form of
parent-child relationship nodes. Here, it organizes data in a
tree-like structure.
Data gets stored in the form of records that are connected via
links. Each child record in the tree will contain only one parent.
On the other hand, each parent record can have multiple child
records.
8) Network Databases
It is the database that typically follows the network data model.
Here, the representation of data is in the form of nodes
connected via links between them. Unlike the hierarchical
database, it allows each record to have multiple children and
parent nodes to form a generalized graph structure.
9) Personal Database
Collecting and storing data on the user's system defines a
Personal Database. This database is basically designed for a
single user.
Advantage of Personal Database


o It is simple and easy to handle.

o It occupies less storage space as it is small in size.

10) Operational Database


The type of database which creates and updates the database in
real-time. It is basically designed for executing and handling
the daily data operations in several businesses. For example,
An organization uses operational databases for managing per
day transactions.
11) Enterprise Database
Large organizations or enterprises use this database for
managing a massive amount of data. It helps organizations to
increase and improve their efficiency. Such a database allows
simultaneous access to users.
Advantages of Enterprise Database:
o Multiple processes are supported over the Enterprise
database.
o It allows executing parallel queries on the system.

Centralized and Client-Server Architectures for DBMS


Centralized DBMS:
a) Everything is merged into a single system, including the
hardware, DBMS software, application programs, and user
interface processing software.
b) Users can still connect through remote terminals, but all
processing is done at the centralized site.
Physical Centralized Architecture:

Architectures for DBMSs have followed trends similar to those of general computer system architectures. Earlier architectures used mainframe computers to provide the main processing for all system functions, including user application programs, user interface programs, and all DBMS functionality. The reason was that most users accessed such systems via computer terminals that had no processing power and only provided display capabilities. All processing was therefore performed remotely on the central computer; only display information and controls were sent from the computer to the display terminals, which were connected to it via various types of communication networks.
As hardware prices declined, most users replaced their terminals with PCs and workstations. At first, database systems used these computers in much the same way as display terminals, so the DBMS itself was still a centralized DBMS in which all DBMS functionality, application program execution, and user-interface processing were carried out on one machine.
Basic 2-tier Client-Server Architectures:

• Specialized servers with specialized functions:
• Print server
• File server
• DBMS server
• Web server
• Email server
• Clients are able to access the specialized servers as needed
Logical two-tier client-server architecture:

Clients:
• Provide appropriate interfaces, through a client software module, to access and utilize the various server resources.
• Clients may be diskless machines, or PCs or workstations with disks, with only the client software installed.
• Connected to the servers by some form of network (LAN, wireless network, and so on).
DBMS Server:
• Provides database query and transaction services to the clients.
• Relational DBMS servers are often called query servers, SQL servers, or transaction servers.
• Applications running on clients use an Application Program Interface (API) to access server databases via standard interfaces such as:
ODBC - the Open Database Connectivity standard
JDBC - for Java program access
The client and server must install the appropriate client-module and server-module software for ODBC or JDBC.
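The ODBC/JDBC pattern described above (connect, send SQL, fetch results) is the same in every standard database API. A hedged sketch using Python's built-in DB-API, with SQLite standing in for the DBMS server (the table and column names are invented for illustration):

```python
import sqlite3

# Connect to the "server" (an in-memory SQLite database stands in here;
# with ODBC/JDBC the connect call would name a remote data source instead).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# The client sends SQL through the API; the DBMS executes it.
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
cur.execute("INSERT INTO student VALUES (?, ?)", (1, "Rohit"))
conn.commit()

# Query results come back through the same standard interface.
cur.execute("SELECT name FROM student WHERE roll_no = ?", (1,))
row = cur.fetchone()
```

Whatever the API (ODBC, JDBC, or Python's DB-API), the client never touches the stored data directly; it only exchanges SQL and result rows with the DBMS server.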
Two-Tier Client-Server Architecture:
a) A client program may connect to several DBMSs, sometimes called the data sources.
b) In general, data sources can be files or other non-DBMS software that manages data. Other variations of clients are possible; for example, in some object DBMSs more functionality is transferred to clients, including data dictionary functions, optimization, and recovery across multiple servers.

Three-Tier Client-Server Architecture:

a) Common for Web applications.
b) An intermediate layer, called the Application Server or Web Server:
c) Stores the web connectivity software and the business logic part of the application used to access the corresponding data from the database server.
d) Acts as a conduit for sending partially processed data between the database server and the client.
e) Three-tier architecture can enhance security:
• The database server is accessible only via the middle tier.
• Clients cannot directly access the database server.

Classification of DBMSs:
• Based on the data model used:
• Traditional: Network, Relational, Hierarchical.
• Emerging: Object-oriented and Object-relational.
• Other classifications:
• Single-user (typically used on personal computers) vs. multi-user (most DBMSs).
• Centralized (uses a single computer with one database) vs. distributed (uses multiple computers and multiple databases).
Variations of Distributed DBMSs (DDBMSs):
• Homogeneous DDBMS
• Heterogeneous DDBMS
• Federated or Multi-database Systems
• Distributed database systems have now come to be known as client-server based database systems because they do not support a totally distributed environment, but rather a set of database servers supporting a set of clients.
Cost considerations for DBMSs:

• Cost range: from free open-source systems to configurations costing millions of dollars.
• Examples of free relational DBMSs: MySQL, PostgreSQL, and others.

DBMS Architecture:
o The design of a DBMS depends on its architecture. The basic client/server architecture is used to deal with a large number of PCs, web servers, database servers, and other components connected over networks.
o The client/server architecture consists of many PCs and workstations connected via the network.
o DBMS architecture depends on how users are connected to the database to get their requests done.
Types of DBMS Architecture
Database architecture can be seen as single-tier or multi-tier. Logically, however, it is of two main types: 2-tier architecture and 3-tier architecture.
1-Tier Architecture
o In this architecture, the database is directly available to the user. The user can sit directly at the DBMS and use it.
o Any changes done here are made directly on the database itself. It does not provide a handy tool for end users.
o The 1-tier architecture is used for the development of local applications, where programmers can communicate directly with the database for a quick response.
2-Tier Architecture
o The 2-tier architecture is the same as the basic client-server model. In the two-tier architecture, applications on the client end can directly communicate with the database at the server side. For this interaction, APIs such as ODBC and JDBC are used.
o The user interfaces and application programs run on the client side.
o The server side is responsible for providing functionalities such as query processing and transaction management.
o To communicate with the DBMS, the client-side application establishes a connection with the server side.


Fig: 2-tier Architecture
3-Tier Architecture
o The 3-tier architecture contains another layer between the client and the server. In this architecture, the client cannot communicate directly with the server.
o The application on the client end interacts with an application server, which in turn communicates with the database system.
o The end user has no idea of the existence of the database beyond the application server, and the database has no idea of any user beyond the application.
o The 3-tier architecture is used for large web applications.
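The three-tier flow can be sketched as three small functions, one per tier (a toy illustration; the function and field names are invented):

```python
# Tier 3: database server - the only layer that touches the stored data.
_DATABASE = {"S1": {"name": "Rohit", "course": "BCA"}}

def db_server_query(student_id):
    return _DATABASE.get(student_id)

# Tier 2: application server - holds the business logic; the client never
# calls db_server_query() directly, which is what enhances security.
def app_server_get_student(student_id):
    record = db_server_query(student_id)
    if record is None:
        return "not found"
    return f"{record['name']} ({record['course']})"

# Tier 1: client - only talks to the application server.
def client_request(student_id):
    return app_server_get_student(student_id)
```

The client sees only the application server's answers; the database itself stays hidden behind the middle tier.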
Data Models:

A data model describes the structure of the data, its semantics, and its consistency constraints. It provides the conceptual tools for describing the design of a database at each level of data abstraction. The following four data models are used for understanding the structure of a database:
1) Relational Data Model: This model organizes the data in the form of rows and columns within tables. Thus, the relational model uses tables to represent data and the relationships among them; tables are also called relations. The model was first described by Edgar F. Codd in 1969. The relational data model is the most widely used model and is primarily used by commercial data-processing applications.
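A relation is just a set of rows sharing the same columns, and relationships between tables are expressed through shared column values. A small sketch of this idea (the table and column names are invented):

```python
# Two relations represented as lists of rows (dicts keyed by column name).
students = [
    {"roll_no": 1, "name": "Amit", "course_id": "C1"},
    {"roll_no": 2, "name": "Neha", "course_id": "C2"},
]
courses = [
    {"course_id": "C1", "title": "DBMS"},
    {"course_id": "C2", "title": "Networks"},
]

def join(left, right, column):
    """Relate the two tables through the shared column, like SQL's JOIN."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[column] == r[column]
    ]

joined = join(students, courses, "course_id")
```

The relationship between the two tables lives entirely in the matching `course_id` values; no pointers or links are stored, which is the defining trait of the relational model.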
2) Entity-Relationship Data Model: An ER model is the logical representation of data as objects and relationships among them. These objects are known as entities, and a relationship is an association among these entities. This model was designed by Peter Chen and published in a 1976 paper. It has been widely used in database design. A set of attributes describes each entity; for example, student_name and student_id describe the 'student' entity. A set of entities of the same type is known as an 'entity set', and a set of relationships of the same type is known as a 'relationship set'.
3) Object-based Data Model: An extension of the ER model with notions of functions, encapsulation, and object identity. This model supports a rich type system that includes structured and collection types. In the 1980s, various database systems following the object-oriented approach were developed. Here, the objects are the data, carrying their own properties.
4) Semistructured Data Model: This data model differs from the three models above. The semistructured data model allows data specifications in which individual data items of the same type may have different sets of attributes. The Extensible Markup Language (XML) is widely used for representing semistructured data. Although XML was initially designed for adding markup information to text documents, it gained importance because of its application in the exchange of data.

Records- based Data Models:

Data Model is the model that organizes elements of the data


and tell how they relate to one-another and with the properties
of real-world entities. The basic purpose of the data model is
to make sure that the data stored in the data model is
understood fully.
Further, it has three types-
1. Physical Data Model,
2. Record-Based Data Model,
3. Object-Oriented Data Model
Physical Data Model is not used much nowadays. In this, we
will study about the Record-Based Data Model in detail.
Record-Based Data Model:

When the database is organized as fixed-format records of several types, the model is known as a record-based data model. Each record type has a fixed number of fields or attributes, and each field is usually of fixed length.
Further, it is classified into three types-
1. Hierarchical Data Model:
In the hierarchical model, data are represented by collections of records, and relationships among the data are represented by links. A tree data structure is used in this model.
It was developed in the 1960s by IBM to manage large amounts of data for complex manufacturing projects. The basic logical structure of the hierarchical data model is an upside-down "tree".

Advantages -
Simplicity, data integrity, data security, efficiency, easy availability of expertise.
Disadvantages -
Complexity, inflexibility, lack of data independence, lack of querying facility, navigational data manipulation language, lack of standards.
2. Network Data Model:
In the network model, data are represented by collections of records, and relationships among the data are represented by links. Graph data structures are used in this model. It permits a record to have more than one parent.
For example: social media sites like Facebook, Instagram, etc.

Advantages -
Simplicity, data integrity, data independence, database standards.
Disadvantages -
System complexity, lack of structural independence.
3. Relational Data Model:
The relational data model uses tables to represent data and the relationships among these data. Each table has multiple columns, and each column is identified by a unique name. Compared with the hierarchical and network models, it is a higher-level model, since it hides the physical pointers and links from the user.
Advantages -
Structural independence, simplicity, ease of design, ease of implementation, ad-hoc query capability.
Disadvantages -
Hardware overheads; ease of design can result in bad design.

Object-based Data Models:

Need for the Object-Oriented Data Model:
To represent complex real-world problems, a data model was needed that is closely related to the real world. The object-oriented data model represents real-world problems easily.
Object-Oriented Data Model:
In the object-oriented data model, data and their relationships are contained in a single structure referred to as an object. Real-world problems are represented as objects with different attributes, and objects have multiple relationships between them. Basically, it is a combination of object-oriented programming and the relational database model:
Object-Oriented Data Model
= Object-Oriented Programming + Relational Database Model
Components of the Object-Oriented Data Model:

Basic Object-Oriented Data Model

 Objects -
An object is an abstraction of a real-world entity; in other words, it is an instance of a class. Objects encapsulate data and code into a single unit, which provides data abstraction by hiding the implementation details from the user. For example: instances of Student, Doctor, Engineer in the figure above.

 Attributes -
An attribute describes the properties of an object. For example: for the object STUDENT, the attributes are Roll_no and Branch in the Student class.

 Methods -
A method represents the behavior of an object; basically, it represents a real-world action. For example: finding a STUDENT's marks, as Setmarks() in the figure above.

 Class -
A class is a collection of similar objects with a shared structure (attributes) and shared behavior (methods). An object is an instance of a class. For example: Person, Student, Doctor, Engineer in the figure above.

class Student
{
    char name[20];   // attribute
    int roll_no;     // attribute
    // ... other attributes
public:
    void search();   // method
    void update();   // method
};
In this example, Student is the class, and objects such as S1 and S2 can be created from it in the main function.
 Inheritance -
Through inheritance, a new class can inherit the attributes and methods of an existing (base) class. For example: the classes Student, Doctor, and Engineer are inherited from the base class Person.
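The same Person/Student hierarchy can be sketched in Python (the class and attribute names follow the text; the marks logic is invented for illustration):

```python
class Person:
    """Base class: shared attributes for every kind of person."""
    def __init__(self, name):
        self.name = name

class Student(Person):
    """Derived class: inherits 'name' and adds its own attributes and methods."""
    def __init__(self, name, roll_no, branch):
        super().__init__(name)     # reuse the base-class initializer
        self.roll_no = roll_no
        self.branch = branch
        self.marks = None

    def set_marks(self, marks):    # behavior, like Setmarks() in the text
        self.marks = marks

s1 = Student("Rohit", 101, "BCA")
s1.set_marks(87)
```

Because Student inherits from Person, the name attribute and any Person methods are reused without being rewritten, which is the maintenance advantage listed below.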

Advantages of the Object-Oriented Data Model:

 Code can be reused due to inheritance.
 Easily understandable.
 Maintenance cost can be reduced, because attributes and functions are reused through inheritance.

Disadvantages of the Object-Oriented Data Model:
 It is not fully mature, so it is not yet widely accepted by users.

Physical Data Models:

A physical data model describes a database-specific implementation of the data model. It offers database abstraction and helps generate the schema, thanks to the rich metadata a physical data model carries. It also helps in visualizing the database structure by replicating database column keys, constraints, indexes, triggers, and other RDBMS features.
Characteristics of a physical data model:
 The physical data model describes the data needed for a single project or application, though it may be integrated with other physical data models based on project scope.
 It contains relationships between tables that address the cardinality and nullability of the relationships.
 It is developed for a specific version of a DBMS, location, data storage, or technology to be used in the project.
 Columns have exact datatypes, assigned lengths, and default values.
 Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined.

Conceptual Modeling:

A conceptual data model is an organized view of database concepts and their relationships. The purpose of creating a conceptual data model is to establish entities, their attributes, and their relationships. At this level of data modeling there is hardly any detail about the actual database structure. Business stakeholders and data architects typically create the conceptual data model.
The conceptual data model describes the database at a very high level and is useful for understanding the needs or requirements of the database. It is the model used in the requirement-gathering process, i.e., before database designers start building a particular database. One popular model of this kind is the entity/relationship model (ER model). The E/R model specializes in entities, relationships, and attributes, which are used by database designers. With these concepts, a discussion can be held even with non-computer-science (non-technical) users and stakeholders, and their requirements can be understood.
The three basic tenets of the conceptual data model are:
 Entity: A real-world thing
 Attribute: Characteristics or properties of an entity
 Relationship: Dependency or association between two entities
Data model example:
 Customer and Product are two entities. Customer number
and name are attributes of the Customer entity
 Product name and price are attributes of product entity
 Sale is the relationship between the customer and product

Conceptual Data Model


Characteristics of a conceptual data model:

 Offers organization-wide coverage of the business concepts.
 This type of data model is designed and developed for a business audience.
 The conceptual model is developed independently of hardware specifications (such as data storage capacity or location) and software specifications (such as DBMS vendor and technology). The focus is to represent data as a user sees it in the "real world."
Conceptual data models, also known as domain models, create a common vocabulary for all stakeholders by establishing basic concepts and scope.
SECTION-III

Entity-Relationship Model:

o ER model stands for Entity-Relationship model. It is a high-level data model, used to define the data elements and relationships for a specified system.
o It develops a conceptual design for the database, along with a very simple and easy-to-design view of the data.
o In ER modeling, the database structure is portrayed as a diagram called an entity-relationship diagram.
For example, suppose we design a school database. In this database, the student will be an entity with attributes like address, name, id, age, etc. The address can be another entity with attributes like city, street name, pin code, etc., and there will be a relationship between them.
Component of ER Diagram

1. Entity:
An entity may be any object, class, person, or place. In an ER diagram, an entity is represented as a rectangle.
Consider an organization as an example: manager, product, employee, department, etc. can be taken as entities.
a. Weak Entity
An entity that depends on another entity is called a weak entity. A weak entity does not contain any key attribute of its own and is represented by a double rectangle.

2. Attribute
An attribute is used to describe a property of an entity. An ellipse is used to represent an attribute.
For example, id, age, contact number, name, etc. can be attributes of a student.
a. Key Attribute
The key attribute is used to represent the main characteristics
of an entity. It represents a primary key. The key attribute is
represented by an ellipse with the text underlined.
b. Composite Attribute
An attribute that is composed of other attributes is known as a composite attribute. A composite attribute is represented by an ellipse, with its component attributes drawn as ellipses connected to it.

c. Multivalued Attribute
An attribute that can have more than one value is known as a multivalued attribute. A double oval is used to represent it.
For example, a student can have more than one phone number.
d. Derived Attribute
An attribute that can be derived from another attribute is known as a derived attribute. It is represented by a dashed ellipse.
For example, a person's age changes over time and can be derived from another attribute, such as date of birth.

3. Relationship
A relationship is used to describe the relation between entities.
Diamond or rhombus is used to represent the relationship.
Types of relationship are as follows:

a. One-to-One Relationship
When only one instance of each entity is associated with the relationship, it is known as a one-to-one relationship.
For example, a female can marry one male, and a male can marry one female.

b. One-to-Many Relationship
When one instance of the entity on the left is associated with more than one instance of the entity on the right, it is known as a one-to-many relationship.
For example, a scientist can make many inventions, but each invention is made by one specific scientist.
c. Many-to-One Relationship
When more than one instance of the entity on the left is associated with only one instance of the entity on the right, it is known as a many-to-one relationship.
For example, a student enrolls in only one course, but a course can have many students.

d. Many-to-Many Relationship
When more than one instance of the entity on the left is associated with more than one instance of the entity on the right, it is known as a many-to-many relationship.
For example, an employee can be assigned to many projects, and a project can have many employees.
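In a relational implementation, these cardinalities map onto keys: one-to-many becomes a foreign key on the "many" side, and many-to-many needs a separate junction table of pairs. A hedged sketch (the table and key names are invented):

```python
# One-to-many: each student row carries the key of its single course.
courses = {"C1": "DBMS", "C2": "Networks"}
students = {101: {"name": "Amit", "course_id": "C1"},
            102: {"name": "Neha", "course_id": "C1"}}

# Many-to-many: employee <-> project pairs live in a junction table.
works_on = [("E1", "P1"), ("E1", "P2"), ("E2", "P1")]

def projects_of(emp_id):
    """All projects an employee is assigned to."""
    return [p for e, p in works_on if e == emp_id]

def employees_of(proj_id):
    """All employees assigned to a project."""
    return [e for e, p in works_on if p == proj_id]
```

The student dict can hold only one course_id (many-to-one), while the `works_on` pair list lets both sides have many partners (many-to-many).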

Entity Types:

An entity in a DBMS is a real-world object with an existence. For example, in a college database, the entities can be Professor, Students, Courses, etc.
Entities have attributes, which can be considered as properties describing them. For example, for the Professor entity the attributes are Professor_Name, Professor_City, Professor_Salary, etc. The attribute values get stored in the database.
Example of Entity in DBMS
Let us see an example −
<Professor>
Professor_ID | Professor_Name | Professor_City | Professor_Salary
P01 | Tom | Sydney | $7000
P02 | David | Brisbane | $4500
P03 | Mark | Perth | $5000
Here, Professor_Name, Professor_City, and Professor_Salary are attributes, and Professor_ID is the primary key.
Types of DBMS Entities
The following are the types of entities in DBMS −
Strong Entity
A strong entity has a primary key, and its existence does not depend on any other entity. Weak entities are dependent on a strong entity.
A strong entity is represented by a single rectangle −
Continuing our previous example, Professor is a strong entity here, and the primary key is Professor_ID.
Weak Entity
A weak entity in DBMS does not have a primary key of its own and is dependent on a parent entity; it mainly depends on other entities.
A weak entity is represented by a double rectangle −

Continuing our previous example, Professor is a strong entity with primary key Professor_ID, while another entity, Professor_Dependents, is our weak entity.
<Professor_Dependents>
Name | DOB | Relation

This is a weak entity, since its existence depends on another entity, Professor, which we saw above: a professor has dependents.
Example of Strong and Weak Entity
The example of a strong and weak entity can be understood from the figure below.

The strong entity is Professor, whereas Dependent is a weak entity.
ID is the primary key (represented with a solid line), and Name in the Dependent entity is called a partial key (represented with a dotted line).

Entity Sets:

An entity set in a DBMS (database management system) is a collection of real-world items with particular properties, called attributes, that determine each entity's nature. An entity is an object that exists and is distinguishable from other objects. Entities are distinct, which means that each entity in a pair of entities possesses a property that distinguishes one from the other.
For example, a person with a given UID is an entity, as he can be uniquely identified as one particular person; for the Professor entity, the attributes are Professor_Name, Professor_Address, Professor_Salary, etc. The attribute values get stored in the database. Below we describe entity sets in DBMS and their types in detail.
What is an Entity Set in DBMS?
An entity may be concrete (a person) or abstract (a job). An entity set in DBMS is a collection of similar entities (e.g., all persons having an account at the bank). Entity sets need not be disjoint.
Entity Set in DBMS Example
For example, the entity set Employees (all employees of a bank) and the entity set Customers (all bank customers) may have members (entities) in common. An entity is described using a set of attributes, and all entities in a given entity set have the same attributes.
Entity Set in DBMS Types
The entity set in DBMS is classified into two types:
 Weak entity set in DBMS
 Strong entity set in DBMS
Weak Entity Set in DBMS
A weak entity set in DBMS does not possess sufficient attributes to form a primary key. Instead, it depends on another entity set, called the owner or identifying entity set. A weak entity set is represented using a double rectangle, as follows:

An example of a weak entity set in DBMS is as follows:

Dependents are an example of a weak entity set in DBMS. A weak entity in DBMS can be identified uniquely only by considering some of its attributes in conjunction with the primary key of another entity, called the identifying owner. In addition, the following restrictions must hold:
1. The owner entity set and the weak entity set must participate in a one-to-many relationship set (one owner entity is associated with one or more weak entities, but each weak entity has a single owner). This relationship set is called the identifying relationship set of the weak entity set.
2. The weak entity set must have total participation in the identifying relationship set.
For example, a Dependents entity can be identified uniquely only if we take the key of the owning Employees entity together with the pname of the Dependents entity. The set of attributes of a weak entity set that uniquely identify a weak entity for a given owner entity is called a partial key of the weak entity set. In our example, pname is a partial key for Dependents.
The Dependents weak entity set and its relationship to Employees are shown in the figure. The total participation of Dependents in Policy is indicated by linking them with a double line. The arrow from Policy to Dependents indicates that each Dependents entity appears in at most one (indeed, exactly one, because of the participation constraint) Policy relationship. Both are drawn with double lines to underscore that Dependents is a weak entity and Policy is its identifying relationship. We underline pname with a broken line to indicate that it is a partial key for Dependents: there may be two dependents with the same pname value.
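The "owner key + partial key" idea can be sketched directly: a dependent's pname alone is ambiguous, but paired with its owner's key it becomes unique (the names and values below are invented for illustration):

```python
# Owner entities: employees, identified by their own primary key (ssn).
employees = {"111": "Ravi", "222": "Meena"}

# Weak entities: dependents, keyed by (owner ssn, pname).
# pname alone is only a partial key: two owners may both have a "Sam".
dependents = {
    ("111", "Sam"): {"dob": "2015-03-01"},
    ("222", "Sam"): {"dob": "2018-07-12"},
    ("222", "Lily"): {"dob": "2020-01-30"},
}

def lookup_dependent(owner_ssn, pname):
    """A dependent is unique only together with its owner's key."""
    return dependents.get((owner_ssn, pname))

# The partial key "Sam" alone matches two different dependents:
sams = [key for key in dependents if key[1] == "Sam"]
```

The composite key `(owner ssn, pname)` is exactly what the text calls the owner's primary key combined with the weak entity's partial key.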
Strong Entity Set in DBMS

A strong entity set in DBMS is an entity set that contains sufficient attributes to identify all its entities uniquely. In other words, a primary key exists for the strong entity set, and it is represented by underlining it.
A strong entity set possesses its own primary key and is represented using a single rectangle. A diamond represents the relationship between two strong entities. A set of strong entities in DBMS is known as a strong entity set.
Example of a strong entity set in DBMS:
Consider an organization as an example: a manager, a product, an employee, a department, and so on can all be considered separate entities.

Attributes and Relationship Types:

An attribute is a property or characteristic of an entity. An entity may contain any number of attributes, one of which is considered the primary key. In an entity-relationship model, attributes are represented in an elliptical shape.
Example: a student has attributes like name, age, roll number, and many more. To uniquely identify the student, we use the roll number as the primary key, since it is not repeated. Attributes can also be subdivided into other sets of attributes.
There are five main types of attributes: simple, composite, single-valued, multi-valued, and derived. One more type exists, the complex attribute, which is rarely used.
Simple attribute:
An attribute that cannot be further subdivided into components is a simple attribute.
Example: the roll number of a student, the id number of an employee.
Composite attribute:
An attribute that can be split into components is a composite attribute.
Example: an address can be split into house number, street number, city, state, country, and pin code; a name can be split into first name, middle name, and last name.
Single-valued attribute:
An attribute that takes only a single value for each entity instance is a single-valued attribute.
Example: the age of a student.
Multi-valued attribute:
An attribute that takes more than a single value for each entity instance is a multi-valued attribute.
Example: the phone numbers of a student (landline and mobile).
Derived attribute:
An attribute that can be derived from other attributes is a derived attribute.
Example: the total and average marks of a student.
Complex attribute:
Attributes formed by the nesting of composite and multi-valued attributes are called complex attributes. They are rarely used in a DBMS (database management system), which is why they are not so popular.
Representation:
Complex attributes are the nesting of two or more composite and multi-valued attributes, so these multi-valued and composite attributes are called 'components' of the complex attribute.
Composite components are grouped between parentheses '( )', multi-valued attributes are written between curly braces '{ }', and components are separated by commas ','.
For example, consider a person having multiple phone numbers, multiple emails, and an address.
Here, phone number and email are examples of multi-valued attributes, and address is an example of a composite attribute, because it can be divided into house number, street, city, and state.

Complex attributes

Components:
Email, Phone number, Address (all separated by commas; multi-valued components are written between curly braces).
Complex attribute: Address_EmPhone (you can choose any name).
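The nesting convention above can be sketched with plain Python structures: lists for the multi-valued parts and a dict for the composite part (the attribute names mirror the example; the values are invented):

```python
# Multi-valued components: a person may have several phones and emails.
phone_numbers = ["98765-43210", "011-2345678"]
emails = ["amy@example.com", "amy.work@example.com"]

# Composite component: address splits into smaller simple attributes.
address = {"house_no": "12B", "street": "MG Road",
           "city": "Rohtak", "state": "Haryana"}

# Complex attribute: the nesting of the multi-valued and composite parts,
# written in the text's notation as { phone }, { email }, (address parts).
address_emphone = {
    "phones": phone_numbers,
    "emails": emails,
    "address": address,
}
```

The curly-brace and parenthesis notation of the text corresponds directly to the lists and the nested dict in this structure.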

Relationship Instances and ER Diagrams:

An entity-relationship diagram is a diagram that represents relationships among entities in a database; it is commonly known as an ER diagram. An ER diagram plays a crucial role in designing a database. In today's business world, the requirements demanded by users are first captured in the form of an ER diagram, which is then forwarded to the database administrators to design the database.
What is an ER Diagram?

An entity-relationship diagram (ER diagram) pictorially explains the relationships between the entities to be stored in a database. Fundamentally, the ER diagram is a structural design of the database. It acts as a framework, created with specialized symbols, for defining the relationships between the database entities. An ER diagram is created from three principal components: entities, attributes, and relationships.

The following diagram showcases two entities, Student and Course, and their relationship. The relationship between student and course is many-to-many, as a course can be opted for by several students, and a student can opt for more than one course. The Student entity possesses the attributes Stu_Id, Stu_Name, and Stu_Age. The Course entity has the attributes Cou_ID and Cou_Name.

What is an ER Model?

An Entity-Relationship Model represents the structure of


the database with the help of a diagram. ER Modelling is a
systematic process to design a database as it would require
you to analyze all data requirements before implementing
your database.

History of ER Models

Peter Chen proposed ER diagrams in the early 1970s, publishing the model in 1976, to create a uniform convention that could be used as a conceptual modeling tool. Many models had been presented and discussed, but none were suitable. His model was also inspired by the data structure diagrams offered by Charles Bachman.

Why Use ER Diagrams in DBMS?

 An ER diagram helps you conceptualize the database and shows which fields need to be embedded for a particular entity.
 An ER diagram gives a better understanding of the information to be stored in a database.
 It reduces complexity and allows database designers to build databases quickly.
 It helps describe elements using entity-relationship models.
 It allows users to get a preview of the logical structure of the database.

Symbols Used in ER Diagrams

 Rectangles: represent entity types.
 Ellipses: represent attributes.
 Diamonds: represent relationship types.
 Lines: link attributes to entity types, and entity types to relationship types.
 Primary key: shown by underlining the attribute.
 Double ellipses: represent multi-valued attributes.

Components of ER Diagram

You base an ER Diagram on three basic concepts:

 Entities
    Weak Entity
 Attributes
    Key Attribute
    Composite Attribute
    Multivalued Attribute
    Derived Attribute
 Relationships
    One-to-One Relationships
    One-to-Many Relationships
    Many-to-One Relationships
    Many-to-Many Relationships

Entities

An entity can be either a living or non-living thing.

An entity is shown as a rectangle in an ER diagram.

For example, in a student study course, both the student and


the course are entities.

Weak Entity
An entity that depends on another entity for its identification is
called a weak entity.

You showcase the weak entity as a double rectangle in ER


Diagram.

In the example below, school is a strong entity because it has


a primary key attribute - school number. Unlike school, the
classroom is a weak entity because it does not have any
primary key and the room number here acts only as a
discriminator.

Attribute

An attribute exhibits the properties of an entity.



You can illustrate an attribute with an oval shape in an ER


diagram.

Key Attribute
A key attribute uniquely identifies an entity within an entity set.

The text of a key attribute is underlined in the ER diagram.

For example: For a student entity, the roll number can


uniquely identify a student from a set of students.

Composite Attribute
An attribute that is composed of several other attributes is
known as a composite attribute.

An oval showcases the composite attribute, and the composite


attribute oval is further connected with other ovals.

Multivalued Attribute
Some attributes can hold more than one value; such attributes
are called multivalued attributes.

The double oval shape is used to represent a multivalued


attribute.

Derived Attribute
An attribute that can be derived from other attributes of the
entity is known as a derived attribute.

In the ER diagram, the dashed oval represents the derived


attribute.

Relationship

The diamond shape showcases a relationship in the ER


diagram.

It depicts the relationship between two entities.

In the example below, both the student and the course are
entities, and study is the relationship between them.

One-to-One Relationship
When a single element of an entity is associated with a single
element of another entity, it is called a one-to-one
relationship.

For example, a student has only one identification card and an


identification card is given to one person.

One-to-Many Relationship
When a single element of an entity is associated with more
than one element of another entity, it is called a one-to-many
relationship

For example, a customer can place many orders, but an order


cannot be placed by many customers.

Many-to-One Relationship
When more than one element of an entity is related to a single
element of another entity, then it is called a many-to-one
relationship.

For example, students have to opt for a single course, but a


course can have many students.

Many-to-Many Relationship
When more than one element of an entity is associated with
more than one element of another entity, this is called a many-
to-many relationship.

For example, you can assign an employee to many projects


and a project can have many employees.

How to Draw an ER Diagram?

Below are some important points to draw ER diagram:

 First, identify all the Entities. Embed all the entities in a


rectangle and label them properly.
 Identify relationships between entities and connect them
using a diamond in the middle, illustrating the relationship.
Do not connect relationships with each other.
 Connect attributes for entities and label them properly.
 Eradicate any redundant entities or relationships.
 Make sure your ER Diagram supports all the data provided
to design the database.
 Effectively use colors to highlight key areas in your
diagrams.

Abstraction And Integration:

Data Abstraction:

Data Abstraction is a process of hiding unwanted or


irrelevant details from the end user. It provides a different view
and helps in achieving data independence which is used to
enhance the security of data.
The database systems consist of complicated data structures
and relations. For users to access the data easily, these
complications are kept hidden, and only the relevant part of the
database is made accessible to the users through data
abstraction.

Levels of abstraction for DBMS


Database systems include complex data structures. To simplify
retrieval of data, improve usability for users, and make the system
efficient, developers use levels of abstraction that hide irrelevant
details from the users. Levels of abstraction also simplify
database design.
Mainly there are three levels of abstraction for DBMS, which
are as follows −
 Physical or Internal Level
 Logical or Conceptual Level

 View or External Level

These levels are shown in the diagram below −

Let us discuss each level in detail.


Physical or Internal Level
It is the lowest level of abstraction for DBMS which defines
how the data is actually stored, it defines data-structures to
store data and access methods used by the database. Actually,
it is decided by developers or database application
programmers how to store the data in the database.
So, overall, the entire database is described in this level that is
physical or internal level. It is a very complex level to
understand. For example, customer's information is stored in
tables and data is stored in the form of blocks of storage such
as bytes, gigabytes etc.
Logical or Conceptual Level
Logical level is the intermediate level or next higher level. It
describes what data is stored in the database and what

relationship exists among those data. It tries to describe the


entire or whole data because it describes what tables to be
created and what are the links among those tables that are
created.
It is less complex than the physical level. Logical level is used
by developers or database administrators (DBA). So, overall,
the logical level contains tables (fields and attributes) and
relationships among table attributes.
View or External Level
It is the highest level. In view level, there are different levels
of views and every view only defines a part of the entire data.
It also simplifies interaction with the user and it provides many
views or multiple views of the same database.
View level can be used by all users (all levels' users). This level
is the least complex and easy to understand.
For example, a user can interact with a system using GUI that
is view level and can enter details at GUI or screen and the user
does not know how data is stored and what data is stored, this
detail is hidden from the user.
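A rough sketch of the view (external) level in SQL, assuming a hypothetical `Customer` table: the view exposes only part of the data, and a user of the view never sees how or where the rest is stored.

```python
import sqlite3

# Sketch of the external level: a SQL view hides storage details and
# sensitive columns. Table and column names here are illustrative.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Customer (id INTEGER PRIMARY KEY, name TEXT, phone TEXT, balance REAL)")
cur.execute("INSERT INTO Customer VALUES (1, 'Ram', '9455123451', 5000.0)")
# External level: front-desk staff see names and phones, never balances.
cur.execute("CREATE VIEW ContactView AS SELECT name, phone FROM Customer")
contacts = cur.execute("SELECT * FROM ContactView").fetchall()
print(contacts)   # the balance column is simply not visible here
```

Different views over the same conceptual schema give different users different external levels of the same database.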
Data Integration:

Data Integration is a data preprocessing technique that


combines data from multiple heterogeneous data sources into
a coherent data store and provides a unified view of the data.
These sources may include multiple data cubes, databases, or
flat files.
The data integration approaches are formally defined as triple
<G, S, M> where,
G stand for the global schema,
S stands for the heterogeneous source of schema,

M stands for mapping between the queries of source and


global schema.

There are mainly 2 major approaches for data integration – one


is the “tight coupling approach” and another is the “loose
coupling approach”.
Tight Coupling:
 Here, a data warehouse is treated as an information retrieval

component.
 In this coupling, data is combined from different sources

into a single physical location through the process of ETL –


Extraction, Transformation, and Loading.
Loose Coupling:
 Here, an interface is provided that takes the query from the

user, transforms it in a way the source database can


understand, and then sends the query directly to the source
databases to obtain the result.

 And the data only remains in the actual source databases.
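The loose-coupling idea above can be sketched in a few lines, assuming two hypothetical sources with different schemas; the mapping M translates a query over the global schema G into each source's own attribute names, and the data never leaves the sources.

```python
# Illustrative loose-coupling mediator. All names are hypothetical.
source_a = [{"cust_name": "Ram", "city": "Delhi"}]
source_b = [{"customer": "Sujit", "location": "Rohtak"}]

# Mapping M between the global schema G and each source schema S.
mappings = {
    "A": {"name": "cust_name", "city": "city"},
    "B": {"name": "customer", "city": "location"},
}

def query_city(city):
    """Return names, under the global schema, of customers in `city`."""
    result = []
    for label, rows in (("A", source_a), ("B", source_b)):
        m = mappings[label]   # translate the query for this source
        result += [r[m["name"]] for r in rows if r[m["city"]] == city]
    return result

print(query_city("Delhi"))   # the mediator unifies both sources' answers
```

In a tight-coupling (ETL) design, the rows would instead be copied into one warehouse ahead of time.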


Issues in Data Integration:
There are three issues to consider during data integration:
Schema Integration, Redundancy Detection, and resolution of
data value conflicts. These are explained in brief below.
1. Schema Integration:
 Integrate metadata from different sources.

 Matching real-world entities from multiple sources is referred
to as the entity identification problem.


2. Redundancy:
 An attribute may be redundant if it can be derived or

obtained from another attribute or set of attributes.


 Inconsistencies in attributes can also cause redundancies in

the resulting data set.


 Some redundancies can be detected by correlation analysis.

3. Detection and resolution of data value conflicts:


 This is the third critical issue in data integration.

 Attribute values from different sources may differ for the

same real-world entity.


 An attribute in one system may be recorded at a lower level

of abstraction than the “same” attribute in another.
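As a rough illustration of the correlation-analysis point above, assuming a made-up pair of attributes where one is derivable from the other (a price stored both in rupees and in paise):

```python
import math

# Sketch: if one numeric attribute is (almost) a linear function of
# another, their Pearson correlation is close to +/-1, flagging one
# of them as potentially redundant.
def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data set: price in rupees vs. the same price in paise.
price_rs = [100, 250, 400, 80]
price_paise = [10000, 25000, 40000, 8000]
r = pearson(price_rs, price_paise)
print(round(r, 3))   # a correlation of 1.0 flags a redundant attribute
```

Correlation only flags numeric redundancy; categorical attributes need other tests (e.g. a chi-square test).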

Basic Concepts of Hierarchical and Network Data


Model:

Hierarchical Model :

This is one of the oldest data models; it was developed by IBM
in the 1950s. In a hierarchical model, data are viewed as a
collection of tables, or segments, that form a hierarchical
relation. The data is organized into a tree-like structure where
each record has one parent record and may have many children.
Even when the segments are connected in a chain-like structure
by logical associations, the resulting structure can be a fan
structure with multiple branches. These logical associations are
directional, running from parent to child.
In the hierarchical model, the segment pointed to by a logical
association is called the child segment and the other segment is
called the parent segment. A segment without a parent is called
the root, and segments that have no children are called the
leaves. The main limitation of the hierarchical model is that it
can represent only one-to-one and one-to-many relationships
between the nodes.
Applications of hierarchical model :
 Hierarchical models are generally used as semantic models

in practice as many real-world occurrences of events are


hierarchical in nature like biological structures, political, or
social structures.
 Hierarchical models are also commonly used as physical

models because of the inherent hierarchical structure of the


disk storage system like tracks, cylinders, etc. There are
various examples such as Information Management System
(IMS) by IBM, NOMAD by NCSS, etc.
Example 1: Consider the below Student database system
hierarchical model.

Hierarchical model

In the above-given figure, we have a few students and a few
course enrollments; each course-enroll record can belong to a
single student only, but a student can enroll in any number of
courses, so the relationship becomes one-to-many. We can
represent the given hierarchical model with the relational tables
below:
FACULTY Table
Name Dep Course-taught

John CSE CA

Jake CSE SE

Royal CSE DBMS


STUDENT Table
Name Course-enroll Grade

Gami CA 2.0

Mary SE 3.0

Mayen SE 4.0
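The parent-child structure of the FACULTY/STUDENT example can be sketched as a tree of nested records; the nesting layout itself is illustrative, but it shows why navigation in a hierarchical database is always top-down through the parent.

```python
# Sketch: each parent record "owns" its child records, so the whole
# database is a tree of nested records (data from the tables above).
faculty_tree = {
    "John": {"dep": "CSE", "course": "CA",
             "students": [{"name": "Gami", "grade": 2.0}]},
    "Jake": {"dep": "CSE", "course": "SE",
             "students": [{"name": "Mary", "grade": 3.0},
                          {"name": "Mayen", "grade": 4.0}]},
}

# To reach a student we must enter through a parent (faculty) record.
def find_student(tree, name):
    for parent, rec in tree.items():
        for child in rec["students"]:
            if child["name"] == name:
                return parent, child["grade"]
    return None

print(find_student(faculty_tree, "Mary"))   # found via parent 'Jake'
```

This entry-through-the-root navigation is exactly what makes many-to-many relationships awkward in the hierarchical model.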
Example 2: Consider the below cricket database system
hierarchical model scheme.

Hierarchical model

Here, in this example, for each player, there are some set of
positions (P_POSITION) he plays, a set of places (P_PLACE),
and also a set of birthdates (P_BDATE) of the players. In the
above figure, each node represents a logical record type and is
displayed by a list of its fields. The child node represents a set
of records that are connected to each record of the parent type;
the relationship from child to parent is many-to-one. In the
above figure, the root node PLAYER states that
for every player there will be a set of positions, a set of places
(only one), and a set of birthdates (which is only one).
Advantages of the hierarchical model :
 As the database is based on this architecture the
relationships between various layers are logically simple so,
it has a very simple hierarchical database structure.
 It has data sharing as all data are held in a common database

data and therefore sharing of data becomes practical.


 It offers data security and this model was the first database

model that offered data security.



 There’s also data integrity as it is based on the parent-child


relationship and also there’s always a link between the
parents and the child segments.
Disadvantages of the hierarchical model :
 Even though this model is conceptually simple and easy to

design at the same time it is quite complex to implement.


 This model also lacks flexibility as the changes in the new

tables or segments often yield very complex system


management tasks. Here, a deletion of one segment can lead
to the involuntary deletion of all segments under it.
 It has no standards as the implementation of this model does

not provide any specific standard.


 It is also limited as many of the common relationships do

not conform to the 1 to N format as required by the


hierarchical model.

Network Data Model:

Network Model :
This model was formalized by the Database Task Group
(DBTG) in the late 1960s. It is a generalization of the
hierarchical model. This model can consist of multiple parent
segments, and these segments are grouped as levels, but there
exists a logical association between segments belonging to any
level. Often there exists a many-to-many logical association
between two segments. The logical associations between the
segments form a graph. Therefore, this model replaces the
hierarchical tree with a graph-like structure, and with that there
can be more general connections among different nodes. It can
have M : N
relations i.e, many-to-many which allows a record to have
more than one parent segment.

Here, a relationship is called a set, and each set is made up of


at least 2 types of record which are given below:
 An owner record that is the same as of parent in the

hierarchical model.
 A member record that is the same as of child in the

hierarchical model.
Structure of a Network Model :

A Network data model

In the above figure, member TWO has only one owner ‘ONE’
whereas member FIVE has two owners i.e, TWO and THREE.
Here, each link between the two record types represents 1 : M
relationship between them. This model consists of both lateral
and top-down connections between the nodes. Therefore, it
allows 1: 1, 1 : M, M : N relationships among the given entities
which helps in avoiding data redundancy problems as it
supports multiple paths to the same record. There are various
examples such as TOTAL by Cincom Systems Inc., EDMS by
Xerox Corp., etc.
Example : Network model for a Finance Department.
Below we have designed the network model for a Finance
Department :

Network model of Finance Department.

So, In a network model, a one-to-many (1: N) relationship has


a link between two record types. Now, in the above figure,
SALES-MAN, CUSTOMER, PRODUCT, INVOICE,
PAYMENT, INVOICE-LINE are the types of records for the
sales of a company. Now, as you can see in the given figure,
INVOICE-LINE is owned by PRODUCT & INVOICE.
INVOICE has also two owners SALES-MAN &
CUSTOMER.
Let’s see another example, in which we have two segments,
Faculty and Student. Say that student John takes courses both
in CS and EE departments. Now, find how many instances will
be there?
For the above example, a student instance can have at least 2
parent instances; therefore, there exist relations between the
instances of the Student and Faculty segments.
very complex as if we use other segments say Courses and
logical associations like Student-Enroll and Faculty-course.
So, in this model, a student can be logically associated with
various instances of Faculties and Courses.

Advantages of Network Model :


 This model is very simple and easy to design like the

hierarchical data model.


 This model is capable of handling multiple types of

relationships which can help in modeling real-life


applications, for example, 1: 1, 1: M, M: N relationships.
 In this model, we can access the data easily, and also there

is a chance that the application can access the owner’s and


the member’s records within a set.
 This network does not allow a member to exist without an

owner which leads to the concept of Data integrity.


 Like the hierarchical model, this model also does not have a
widely accepted database standard (which is really a limitation
rather than an advantage).
Disadvantages of Network Model :
 The schema or the structure of this database is very complex

in nature as all the records are maintained by the use of


pointers.
 There’s an existence of operational anomalies as there is a

use of pointers for navigation which further leads to


complex implementation.
 The design or the structure of this model is not user-friendly.

 This model does not have any scope of automated query

optimization.
 This model fails in achieving structural independence even

though the network database model is capable of achieving


data independence.

Relational Data Model:

Relational Model was proposed by E.F. Codd to model data in


the form of relations or tables. After designing the conceptual
model of Database using ER diagram, we need to convert the
conceptual model in the relational model which can be

implemented using any RDBMS languages like Oracle SQL,


MySQL etc. So we will see what Relational Model is.
What is Relational Model?
Relational Model represents how data is stored in Relational
Databases. A relational database stores data in the form of
relations (tables). Consider a relation STUDENT with
attributes ROLL_NO, NAME, ADDRESS, PHONE and AGE
shown in Table 1.
STUDENT
ROLL_NO  NAME    ADDRESS  PHONE       AGE
1        RAM     DELHI    9455123451  18
2        RAMESH  GURGAON  9652431543  18
3        SUJIT   ROHTAK   9156253131  20
4        SURESH  DELHI                18

IMPORTANT TERMINOLOGIES
 Attribute: Attributes are the properties that define a

relation. e.g.; ROLL_NO, NAME


 Relation Schema: A relation schema represents name of

the relation with its attributes. e.g.; STUDENT (ROLL_NO,


NAME, ADDRESS, PHONE and AGE) is relation schema

for STUDENT. If a schema has more than 1 relation, it is


called Relational Schema.
 Tuple: Each row in the relation is known as tuple. The
above relation contains 4 tuples, one of which is shown as:
1 RAM DELHI 9455123451 18

 Relation Instance: The set of tuples of a relation at a


particular instance of time is called as relation instance.
Table 1 shows the relation instance of STUDENT at a
particular time. It can change whenever there is insertion,
deletion or updation in the database.
 Degree: The number of attributes in the relation is known
as degree of the relation. The STUDENT relation defined
above has degree 5.
 Cardinality: The number of tuples in a relation is known as
cardinality. The STUDENT relation defined above has
cardinality 4.
 Column: Column represents the set of values for a
particular attribute. The column ROLL_NO is extracted
from relation STUDENT.
ROLL_NO
1
2
3
4

 NULL Values: The value which is not known or


unavailable is called NULL value. It is represented by blank
space. e.g.; PHONE of STUDENT having ROLL_NO 4 is
NULL.
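Using the STUDENT relation above, degree and cardinality can be read off directly; this is just a sketch with the relation held as a Python list of tuples.

```python
# The STUDENT relation above, as a Python structure.
attributes = ("ROLL_NO", "NAME", "ADDRESS", "PHONE", "AGE")
rows = [
    (1, "RAM", "DELHI", "9455123451", 18),
    (2, "RAMESH", "GURGAON", "9652431543", 18),
    (3, "SUJIT", "ROHTAK", "9156253131", 20),
    (4, "SURESH", "DELHI", None, 18),   # NULL phone -> Python None
]
degree = len(attributes)      # number of attributes in the relation
cardinality = len(rows)       # number of tuples in this instance
print(degree, cardinality)
```

Degree is fixed by the schema, while cardinality changes with every insertion or deletion, just as the relation-instance definition above says.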
Constraints in Relational Model
While designing the relational model, we define some
conditions which must hold for the data present in the database;
these are called Constraints. These constraints are checked
before performing any operation (insertion, deletion and
updation) in the database. If any constraint is violated, the
operation will fail.
Domain Constraints: These are attribute-level constraints. An
attribute can only take values which lie inside the domain range.
e.g., if a constraint AGE > 0 is applied on the STUDENT
relation, inserting a negative value of AGE will result in failure.
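A minimal sketch of a domain constraint, assuming the same AGE > 0 rule expressed as a SQL CHECK clause in SQLite:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE STUDENT (
    ROLL_NO INTEGER PRIMARY KEY,
    NAME TEXT,
    AGE INTEGER CHECK (AGE > 0)   -- the domain constraint
)""")
cur.execute("INSERT INTO STUDENT VALUES (1, 'RAM', 18)")   # inside the domain
rejected = False
try:
    cur.execute("INSERT INTO STUDENT VALUES (2, 'SUJIT', -5)")
except sqlite3.IntegrityError:
    rejected = True   # the negative AGE violates the domain constraint
print("second insert rejected:", rejected)
count = cur.execute("SELECT COUNT(*) FROM STUDENT").fetchone()[0]
print("rows stored:", count)   # only the valid row remains
```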
Key Integrity: Every relation in the database should have at
least one set of attributes which defines a tuple uniquely.
Those set of attributes is called key. e.g.; ROLL_NO in
STUDENT is a key. No two students can have same roll
number. So a key has two properties:
 It should be unique for all tuples.

 It can’t have NULL values.

Referential Integrity: When one attribute of a relation can


only take values from other attribute of same relation or any
other relation, it is called referential integrity. Let us suppose
we have 2 relations
STUDENT
ROLL_NO  NAME    ADDRESS  PHONE       AGE  BRANCH_CODE
1        RAM     DELHI    9455123451  18   CS
2        RAMESH  GURGAON  9652431543  18   CS
3        SUJIT   ROHTAK   9156253131  20   ECE
4        SURESH  DELHI                18   IT
BRANCH
BRANCH_CODE  BRANCH_NAME
CS           COMPUTER SCIENCE
IT           INFORMATION TECHNOLOGY
ECE          ELECTRONICS AND COMMUNICATION ENGINEERING
CV           CIVIL ENGINEERING

BRANCH_CODE of STUDENT can only take the values


which are present in BRANCH_CODE of BRANCH which is
called referential integrity constraint. The relation which is
referencing to other relation is called REFERENCING
RELATION (STUDENT in this case) and the relation to
which other relations refer is called REFERENCED

RELATION (BRANCH in this case).

ANOMALIES
An anomaly is an irregularity, or something which deviates
from the expected or normal state. When designing databases,
we identify three types of
anomalies: Insert, Update and Delete.
Insertion Anomaly in Referencing Relation:
We can’t insert a row in REFERENCING RELATION if
referencing attribute’s value is not present in referenced
attribute value. e.g.; Insertion of a student with
BRANCH_CODE ‘ME’ in STUDENT relation will result in
error because ‘ME’ is not present in BRANCH_CODE of
BRANCH.
Deletion/ Updation Anomaly in Referenced Relation:
We can’t delete or update a row from REFERENCED
RELATION if value of REFERENCED ATTRIBUTE is used
in value of REFERENCING ATTRIBUTE. e.g; if we try to
delete tuple from BRANCH having BRANCH_CODE ‘CS’, it
will result in error because ‘CS’ is referenced by
BRANCH_CODE of STUDENT, but if we try to delete the
row from BRANCH with BRANCH_CODE CV, it will be
deleted as the value is not been used by referencing relation. It
can be handled by following method:

ON DELETE CASCADE: It will delete the tuples from


REFERENCING RELATION if value used by
REFERENCING ATTRIBUTE is deleted from
REFERENCED RELATION. e.g;, if we delete a row from
BRANCH with BRANCH_CODE ‘CS’, the rows in
STUDENT relation with BRANCH_CODE CS (ROLL_NO 1
and 2 in this case) will be deleted.

ON UPDATE CASCADE: It will update the


REFERENCING ATTRIBUTE in REFERENCING
RELATION if attribute value used by REFERENCING
ATTRIBUTE is updated in REFERENCED RELATION. e.g;,
if we update a row from BRANCH with BRANCH_CODE
‘CS’ to ‘CSE’, the rows in STUDENT relation with
BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will
be updated with BRANCH_CODE ‘CSE’.
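The ON DELETE CASCADE behaviour described above can be sketched in SQLite; note that SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite requires this opt-in
conn.executescript("""
CREATE TABLE BRANCH (BRANCH_CODE TEXT PRIMARY KEY, BRANCH_NAME TEXT);
CREATE TABLE STUDENT (
    ROLL_NO INTEGER PRIMARY KEY,
    NAME TEXT,
    BRANCH_CODE TEXT REFERENCES BRANCH(BRANCH_CODE) ON DELETE CASCADE
);
INSERT INTO BRANCH VALUES ('CS', 'COMPUTER SCIENCE'), ('IT', 'INFORMATION TECHNOLOGY');
INSERT INTO STUDENT VALUES (1, 'RAM', 'CS'), (2, 'RAMESH', 'CS'), (4, 'SURESH', 'IT');
""")
# Deleting the referenced branch cascades to the referencing students.
conn.execute("DELETE FROM BRANCH WHERE BRANCH_CODE = 'CS'")
left = conn.execute("SELECT ROLL_NO FROM STUDENT ORDER BY ROLL_NO").fetchall()
print(left)   # only the student referencing 'IT' survives
```

Without the cascade clause, the same DELETE would instead fail with a foreign-key error, which is the deletion anomaly described above.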

SUPER KEYS:
Any set of attributes that allows us to identify unique rows
(tuples) in a given relation are known as super keys. Out of
these super keys, the minimal ones (those with no proper subset
that is itself a super key) are known as Candidate keys, and one
candidate key is chosen as the primary key. If a combination of
two or more attributes is used as the primary key, then we call
it a Composite key.
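A brute-force sketch of finding super keys by checking uniqueness over a sample instance. Note the caveat: in practice, keys are chosen from the semantics of the data, not from one instance — here NAME merely happens to be unique in the tiny sample.

```python
from itertools import combinations

# Sketch: an attribute set is a super key (for this instance) if no two
# rows agree on all of its attributes.
attributes = ("ROLL_NO", "NAME", "AGE")
rows = [(1, "RAM", 18), (2, "RAMESH", 18), (3, "SUJIT", 20), (4, "SURESH", 18)]

def is_super_key(attrs):
    idx = [attributes.index(a) for a in attrs]
    projections = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projections)) == len(rows)   # all projections distinct

super_keys = [attrs
              for r in range(1, len(attributes) + 1)
              for attrs in combinations(attributes, r)
              if is_super_key(attrs)]
print(super_keys)   # AGE alone repeats, so it is never a super key
```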

History of Relational Data Model:

The relational model for databases starts with Codd. Not the
kind you eat, but a gentleman named E.F. Codd. Codd
developed the model in a 1970 research paper. His concepts
were put into practice in the late 70's in an IBM product
called System R. Subsequently, the Interactive Graphics
Retrieval System (INGRES) was developed at the University of
California, Berkeley.

The main appeal to the relational model is its simplicity. Prior


models (flat files, hierarchical, and network structures) were
difficult to use and query. Codd's proposal won him a Turing
Award in 1981 and it completely revolutionized data
management. Relational databases are currently the most
prevalent in industry today.

Relational Model Terminology:

Database Relations:

A relational database is a type of database that stores and


provides access to data points that are related to one another.
Relational databases are based on the relational model, an
intuitive, straightforward way of representing data in tables. In
a relational database, each row in the table is a record with a
unique ID called the key. The columns of the table hold
attributes of the data, and each record usually has a value for
each attribute, making it easy to establish the relationships
among data points.
A relational database example
Here’s a simple example of two tables a small business might
use to process orders for its products. The first table is a
customer info table, so each record includes a customer’s name,
address, shipping and billing information, phone number, and
other contact information. Each bit of information (each
attribute) is in its own column, and the database assigns a
unique ID (a key) to each row. In the second table—a customer
order table—each record includes the ID of the customer that
placed the order, the product ordered, the quantity, the selected

size and color, and so on—but not the customer’s name or


contact information.
These two tables have only one thing in common: the ID
column (the key). But because of that common column, the
relational database can create a relationship between the two
tables. Then, when the company’s order processing application
submits an order to the database, the database can go to the
customer order table, pull the correct information about the
product order, and use the customer ID from that table to look
up the customer’s billing and shipping information in the
customer info table. The warehouse can then pull the correct
product, the customer can receive timely delivery of the order,
and the company can get paid.
How relational databases are structured
The relational model means that the logical data structures—
the data tables, views, and indexes—are separate from the
physical storage structures. This separation means that database
administrators can manage physical data storage without
affecting access to that data as a logical structure. For example,
renaming a database file does not rename the tables stored
within it.
The distinction between logical and physical also applies to
database operations, which are clearly defined actions that
enable applications to manipulate the data and structures of the
database. Logical operations allow an application to specify the
content it needs, and physical operations determine how that
data should be accessed and then carries out the task.
To ensure that data is always accurate and accessible, relational
databases follow certain integrity rules. For example, an
integrity rule can specify that duplicate rows are not allowed in

a table in order to eliminate the potential for erroneous


information entering the database.
The relational model
In the early years of databases, every application stored data in
its own unique structure. When developers wanted to build
applications to use that data, they had to know a lot about the
particular data structure to find the data they needed. These data
structures were inefficient, hard to maintain, and hard to
optimize for delivering good application performance. The
relational database model was designed to solve the problem of
multiple arbitrary data structures.
The relational data model provided a standard way of
representing and querying data that could be used by any
application. From the beginning, developers recognized that
the chief strength of the relational database model was in its use
of tables, which were an intuitive, efficient, and flexible way to
store and access structured information.
Over time, another strength of the relational model emerged as
developers began to use structured query language (SQL) to
write and query data in a database. For many years, SQL has
been widely used as the language for database queries. Based
on relational algebra, SQL provides an internally consistent
mathematical language that makes it easier to improve the
performance of all database queries. In comparison, other
approaches must define individual queries.
Benefits of relational database management system
The simple yet powerful relational model is used by
organizations of all types and sizes for a broad variety of
information needs. Relational databases are used to track

inventories, process ecommerce transactions, manage huge


amounts of mission-critical customer information, and much
more. A relational database can be considered for any
information need in which data points relate to each other and
must be managed in a secure, rules-based, consistent way.
Relational databases have been around since the 1970s. Today,
the advantages of the relational model continue to make it the
most widely accepted model for databases.
Relational model and data consistency
The relational model is the best at maintaining data consistency
across applications and database copies (called instances). For
example, when a customer deposits money at an ATM and then
looks at the account balance on a mobile phone, the customer
expects to see that deposit reflected immediately in an updated
account balance. Relational databases excel at this kind of data
consistency, ensuring that multiple instances of a database have
the same data all the time.
It’s difficult for other types of databases to maintain this level
of timely consistency with large amounts of data. Some recent
databases, such as NoSQL, can supply only “eventual
consistency.” Under this principle, when the database is scaled
or when multiple users access the same data at the same time,
the data needs some time to “catch up.” Eventual consistency
is acceptable for some uses, such as to maintain listings in a
product catalog, but for critical business operations such as
shopping cart transactions, the relational database is still the
gold standard.
Commitment and atomicity

Relational databases handle business rules and policies at a


very granular level, with strict policies about commitment (that
is, making a change to the database permanent). For example,
consider an inventory database that tracks three parts that are
always used together. When one part is pulled from inventory,
the other two must also be pulled. If one of the three parts isn’t
available, none of the parts should be pulled—all three parts
must be available before the database makes any commitment.
A relational database won’t commit for one part until it knows
it can commit for all three. This multifaceted commitment
capability is called atomicity. Atomicity is the key to keeping
data accurate in the database and ensuring that it is compliant
with the rules, regulations, and policies of the business.
ACID properties and RDBMS
Four crucial properties define relational database transactions:
atomicity, consistency, isolation, and durability—typically
referred to as ACID.
 Atomicity defines all the elements that make up a complete
database transaction.
 Consistency defines the rules for maintaining data points in
a correct state after a transaction.
 Isolation keeps the effect of a transaction invisible to others
until it is committed, to avoid confusion.
 Durability ensures that data changes become permanent
once the transaction is committed.
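The all-or-nothing inventory example above can be sketched with an SQLite transaction. As an assumption for the sketch, a CHECK constraint stands in for the "part not available" condition: the failed third withdrawal rolls back the first two.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (part TEXT PRIMARY KEY, qty INTEGER CHECK (qty >= 0))")
conn.executemany("INSERT INTO inventory VALUES (?, ?)",
                 [("bolt", 3), ("nut", 3), ("washer", 0)])
conn.commit()   # the starting stock is now permanent

try:
    with conn:   # one transaction for all three withdrawals (atomicity)
        for part in ("bolt", "nut", "washer"):
            conn.execute("UPDATE inventory SET qty = qty - 1 WHERE part = ?", (part,))
except sqlite3.IntegrityError:
    pass   # 'washer' would go below zero, so the whole transaction rolls back

qty = dict(conn.execute("SELECT part, qty FROM inventory"))
print(qty)   # bolt and nut are untouched despite their successful updates
```

The `with conn:` block commits only if every statement succeeds; one failure undoes all three updates, which is exactly the commitment behaviour described above.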
Stored procedures and relational databases
Data access involves many repetitive actions. For example, a
simple query to get information from a data table may need to
be repeated hundreds or thousands of times to produce the
desired result. These data access functions require some type of
code to access the database. Application developers don’t want
to write new code for these functions in each new application.
Luckily, relational databases allow stored procedures, which
are blocks of code that can be accessed with a simple
application call. For example, a single stored procedure can
provide consistent record tagging for users of multiple
applications. Stored procedures can also help developers ensure
that certain data functions in the application are implemented
in a specific way.
Database locking and concurrency
Conflicts can arise in a database when multiple users or
applications attempt to change the same data at the same time.
Locking and concurrency techniques reduce the potential for
conflicts while maintaining the integrity of the data.
Locking prevents other users and applications from accessing
data while it is being updated. In some databases, locking
applies to the entire table, which creates a negative impact on
application performance. Other databases, such as Oracle
relational databases, apply locks at the record level, leaving the
other records within the table available, helping ensure better
application performance.
Concurrency manages the activity when multiple users or
applications invoke queries at the same time on the same
database. This capability provides the right access to users and
applications according to policies defined for data control.
What to look for when selecting a relational database
The software used to store, manage, query, and retrieve data
stored in a relational database is called a relational database
management system (RDBMS). The RDBMS provides an
interface between users and applications and the database, as
well as administrative functions for managing data storage,
access, and performance.
Several factors can guide your decision when choosing among
database types and relational database products. The RDBMS
you choose will depend on your business needs. Ask yourself
the following questions:
 What are our data accuracy requirements? Will data storage
and accuracy rely on business logic? Does our data have
stringent requirements for accuracy (for example, financial
data and government reports)?
 Do we need scalability? What is the scale of the data to be
managed, and what is its anticipated growth? Will the
database model need to support mirrored database copies (as
separate instances) for scalability? If so, can it maintain data
consistency across those instances?
 How important is concurrency? Will multiple users and
applications need simultaneous data access? Does the
database software support concurrency while protecting the
data?
 What are our performance and reliability needs? Do we need
a high-performance, high-reliability product? What are the
requirements for query-response performance? What are the
vendor’s commitments for service level agreements (SLAs)
or unplanned downtime?
The relational database of the future: The self-driving
database
Over the years, relational databases have gotten better, faster,
stronger, and easier to work with. But they’ve also gotten more
complex, and administering the database has long been a full-
time job. Instead of using their expertise to focus on developing
innovative applications that bring value to the business,
developers have had to spend most of their time on the
management activity needed to optimize database
performance.
Today, autonomous technology is building upon the strengths
of the relational model, cloud database technology,
and machine learning to deliver a new type of relational
database. The self-driving database (also known as the
autonomous database) maintains the power and advantages of
the relational model but uses artificial intelligence (AI),
machine learning, and automation to monitor and improve
query performance and management tasks. For example, to
improve query performance, the self-driving database can
hypothesize and test indexes to make queries faster, and then
push the best ones into production—all on its own. The self-
driving database makes these improvements continuously,
without the need for human involvement.
Autonomous technology frees up developers from the
mundane tasks of managing the database. For instance, they no
longer have to determine infrastructure requirements in
advance. Instead, with a self-driving database, they can add
storage and compute resources as needed to support database
growth. With just a few steps, developers can easily create an
autonomous relational database, accelerating the time for
application development.

Properties of Relations:

A relation is a two-dimensional table. It contains number of


rows (tuples) and columns (attributes). A relation has
following properties :
(a). In any given column of the table, all items are of the same
kind, whereas items in different columns may not be of the
same kind.
(b). For each row, every column must hold an atomic value; a
column in a row cannot hold more than one value.
(c). All rows of a relation are distinct. That is, a relation does
not contain two rows which are identical in every column;
each row of the relation can be uniquely identified by its
contents.
(d). The ordering of rows within a relation is immaterial. That
is, we cannot retrieve anything by saying that row number 5 is
to be accessed. There is no order maintained for rows inside a
relation.
(e). The columns of a relation are assigned distinct names and
the ordering of these columns is immaterial.

Keys:

o Keys play an important role in the relational database.


o It is used to uniquely identify any record or row of data
from the table. It is also used to establish and identify
relationships between tables.
For example, ID is used as a key in the Student table because
it is unique for each student. In the PERSON table,
passport_number, license_number, SSN are keys since they are
unique for each person.
Types of keys:

1. Primary key
o It is the first key used to identify one and only one instance

of an entity uniquely. An entity can contain multiple keys,


as we saw in the PERSON table. The key which is most
suitable from those lists becomes a primary key.
o In the EMPLOYEE table, ID can be the primary key since

it is unique for each employee. In the EMPLOYEE table,


we can even select License_Number and
Passport_Number as primary keys since they are also
unique.
o For each entity, the primary key selection is based on

requirements and developers.


2. Candidate key
o A candidate key is an attribute or set of attributes that can

uniquely identify a tuple.


o Except for the primary key, the remaining attributes are

considered a candidate key. The candidate keys are as


strong as the primary key.
For example: In the EMPLOYEE table, id is best suited for
the primary key. The rest of the attributes, like SSN,
Passport_Number, License_Number, etc., are considered a
candidate key.
3. Super Key
Super key is an attribute set that can uniquely identify a tuple.
A super key is a superset of a candidate key.

For example: In the above EMPLOYEE table, for
(EMPLOYEE_ID, EMPLOYEE_NAME), the names of two
employees can be the same, but their EMPLOYEE_ID can't be
the same. Hence, this combination can also be a key.
The super keys would be EMPLOYEE_ID, (EMPLOYEE_ID,
EMPLOYEE_NAME), etc.
4. Foreign key
o Foreign keys are the column of the table used to point to

the primary key of another table.


o Every employee works in a specific department in a

company, and employee and department are two different


entities. So we can't store the department's information in
the employee table. That's why we link these two tables
through the primary key of one table.
o We add the primary key of the DEPARTMENT table,

Department_Id, as a new attribute in the EMPLOYEE


table.
o In the EMPLOYEE table, Department_Id is the foreign

key, and both the tables are related.

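The EMPLOYEE and DEPARTMENT link described above can be sketched with Python's built-in sqlite3 module; the column names follow the text, while the sample rows are invented for this demo:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Department_Id is the primary key of DEPARTMENT ...
conn.execute("CREATE TABLE department (department_id INTEGER PRIMARY KEY, dept_name TEXT)")
# ... and reappears in EMPLOYEE as a foreign key linking the two tables.
conn.execute("""CREATE TABLE employee (
                    id INTEGER PRIMARY KEY,   -- primary key: unique, not null
                    name TEXT,
                    department_id INTEGER REFERENCES department(department_id))""")

conn.execute("INSERT INTO department VALUES (10, 'Sales')")
conn.execute("INSERT INTO employee VALUES (1, 'Asha', 10)")  # OK: department 10 exists

try:
    conn.execute("INSERT INTO employee VALUES (2, 'Ravi', 99)")  # no such department
except sqlite3.IntegrityError as e:
    print("rejected:", e)  # referential integrity: 99 is not in DEPARTMENT
```

The second insert is rejected because its foreign key value has no matching primary key in DEPARTMENT, which is exactly the relationship the two tables are linked by.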
5. Alternate key
There may be one or more attributes or a combination of
attributes that uniquely identify each tuple in a relation. These
attributes or combinations of the attributes are called the
candidate keys. One key is chosen as the primary key from
these candidate keys, and the remaining candidate key, if it
exists, is termed the alternate key. In other words, the total
number of the alternate keys is the total number of candidate
keys minus the primary key. The alternate key may or may not
exist. If there is only one candidate key in a relation, it does not
have an alternate key.
For example, employee relation has two attributes,
Employee_Id and PAN_No, that act as candidate keys. In this
relation, Employee_Id is chosen as the primary key, so the
other candidate key, PAN_No, acts as the Alternate key.

6. Composite key
Whenever a primary key consists of more than one attribute, it
is known as a composite key. This key is also known as
Concatenated Key.
For example, in employee relations, we assume that an
employee may be assigned multiple roles, and an employee
may work on multiple projects simultaneously. So the primary
key will be composed of all three attributes, namely Emp_ID,
Emp_role, and Proj_ID in combination. So these attributes act
as a composite key since the primary key comprises more than
one attribute.

7. Artificial key
A key created using arbitrarily assigned data is known as an
artificial key.
large and complex and has no relationship with many other
relations. The data values of the artificial keys are usually
numbered in a serial order.
For example, the primary key, which is composed of Emp_ID,
Emp_role, and Proj_ID, is large in employee relations. So it
would be better to add a new virtual attribute to identify each
tuple in the relation uniquely.
Domain:

The data type defined for a column in a database is called a


database domain. This data type can either be a built-in type
(such as an integer or a string) or a custom type that defines
data constraints.
To understand this more effectively, let's think like this :
A database schema has a set of attributes, also
called columns or fields, that define the database. Each
attribute has a domain that specifies the types of values that
can be used, as well as other information such as data
type, length, and values.
We will also take an example : Let's say that we have a
table, that stores student records. Now, suppose there is a
column, or attribute to store the students' contact numbers.
The column for contact numbers should only expect numeric
values, usually of the type INT, which is the domain for that
attribute.
Creating a Domain
To create a domain, we use the CREATE
DOMAIN command in SQL.
Let's look at the syntax :
CREATE DOMAIN C_Number INT(10) NOT NULL;
The above statement, for example, creates a ten-integer
C_Number attribute to store contact numbers, in which it is
not possible to use a NULL or an unknown value.
Domain Integrity Constraints
Domain constraints are user-defined rules on columns that
assist the user in entering values that are appropriate for the
data type. If an incorrect input is detected, the user is informed
that the column is not properly filled. In other words, a domain
constraint describes all of the potential values for the
attribute, such as integer, character, date, time, string, and
so on.
Types of Domain Constraints
There are two types of domain constraints :
1. NOT NULL :
The Not Null constraint prevents a column from
accepting null values. This implies that you can't create a
new record or change an existing one without first
putting a value in the field.
Example :
CREATE DOMAIN C_Number INT(10) NOT NULL;
2. Check :
It restricts the value of a column across ranges. It can
also be understood as it's like a condition or filter
checking before saving data into a column since it
defines a condition that each row must satisfy.
Example :
CREATE DOMAIN S_ID INT(3) NOT NULL
CHECK(VALUE > 0);
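Not every engine supports CREATE DOMAIN (SQLite, for instance, does not), but the same NOT NULL and CHECK rules can be attached directly to a column. A sketch using Python's sqlite3, with invented sample rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# s_id mirrors the S_ID domain above: an integer that must be present and positive.
conn.execute("CREATE TABLE student (s_id INTEGER NOT NULL CHECK (s_id > 0), name TEXT)")

conn.execute("INSERT INTO student VALUES (101, 'Meena')")  # satisfies the domain

for bad_row in [(None, "no id"), (-5, "negative id")]:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?)", bad_row)
    except sqlite3.IntegrityError as e:
        print("rejected:", e)  # NOT NULL and CHECK violations, respectively
```

Only the first row is stored; the NULL id violates NOT NULL and the negative id violates the CHECK condition, just as the domain definition in the text intends.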

Integrity Constraints Over Relations:

Integrity Constraints
o Integrity constraints are a set of rules. It is used to maintain

the quality of information.


o Integrity constraints ensure that the data insertion,

updating, and other processes have to be performed in


such a way that data integrity is not affected.
o Thus, integrity constraint is used to guard against

accidental damage to the database.


Types of Integrity Constraint
1. Domain constraints
o Domain constraints can be defined as the definition of a

valid set of values for an attribute.


o The data type of domain includes string, character,

integer, time, date, currency, etc. The value of the attribute


must be available in the corresponding domain.
Example:

2. Entity integrity constraints


o The entity integrity constraint states that primary key

value can't be null.


o This is because the primary key value is used to identify

individual rows in relation and if the primary key has a


null value, then we can't identify those rows.
o A table can contain a null value other than the primary key

field.
Example:
3. Referential Integrity Constraints


o A referential integrity constraint is specified between two

tables.
o In the Referential integrity constraints, if a foreign key in

Table 1 refers to the Primary Key of Table 2, then every


value of the Foreign Key in Table 1 must be null or be
available in Table 2.
Example:
4. Key constraints
o Keys are the entity set that is used to identify an entity

within its entity set uniquely.


o An entity set can have multiple keys, out of which one
key will be the primary key. A primary key can contain
only unique values and cannot contain a null value in the
relational table.
Example:
SECTION-IV

Relational algebra:

Relational Algebra is procedural query language, which takes


Relation as input and generate relation as output. Relational
algebra mainly provides theoretical foundation for relational
databases and SQL.
Operators in Relational Algebra
Projection (π)
Projection is used to project required column data from a
relation.
Example :
R
(A B C)
----------
1 2 4
2 2 3
3 2 3
4 3 4
π (BC)
B C
-----
2 4
2 3
3 4
Note: By Default projection removes duplicate data.

Selection (σ)
Selection is used to select required tuples of the relations.
for the above relation
σ (c>3)R
will select the tuples which have c more than 3.
Note: selection operator only selects the required tuples but
does not display them. For displaying, data projection
operator is used.
For the above selected tuples, to display we need to use
projection also.
π (σ (c>3)R ) will show the following tuples.

A B C
-------
1 2 4
4 3 4
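Treating a relation as a set of tuples, the two operators above can be sketched in a few lines of plain Python; the dict-per-row representation and function names are just for illustration, with the same relation R as the text:

```python
# Relation R with attributes A, B, C, stored as a list of rows (dicts).
R = [{"A": 1, "B": 2, "C": 4},
     {"A": 2, "B": 2, "C": 3},
     {"A": 3, "B": 2, "C": 3},
     {"A": 4, "B": 3, "C": 4}]

def project(relation, attrs):
    """π: keep only the named columns; duplicates are removed, as noted in the text."""
    seen, out = set(), []
    for row in relation:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def select(relation, predicate):
    """σ: keep only the tuples satisfying the predicate."""
    return [row for row in relation if predicate(row)]

print(project(R, ["B", "C"]))           # three tuples: (2,4), (2,3), (3,4)
print(select(R, lambda r: r["C"] > 3))  # the tuples with C > 3
```

The outputs match the worked example: π(BC) yields three distinct tuples, and σ(c>3)R keeps exactly the two rows whose C value exceeds 3.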

Union (U)
Union operation in relational algebra is the same as union
operation in set theory; the only constraint is that for the union
of two relations, both relations must have the same set of
attributes.

Set Difference (-)


Set Difference in relational algebra is the same set difference
operation as in set theory, with the constraint that both
relations should have the same set of attributes.
Rename (ρ)
Rename is a unary operation used for renaming attributes of a
relation.
ρ (a/b)R will rename the attribute ‘b’ of the relation R to ‘a’.

Cross Product (X)


Cross product between two relations, say A and B: A X B
results in all the attributes of A followed by all the attributes
of B. Each record of A pairs with every record of B.
below is the example
A B
(Name Age Sex ) (Id Course)
------------------ -------------
Ram 14 M 1 DS
Sona 15 F 2 DBMS
kim 20 M

AXB
Name Age Sex Id Course
---------------------------------
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: if A has ‘n’ tuples and B has ‘m’ tuples then A X B
will have ‘n*m’ tuples.

Natural Join (⋈)


Natural join is a binary operator. Natural join between two or
more relations results in the set of all combinations of tuples
that agree on their common attributes.
Let us see below example

Emp Dep
(Name Id Dept_name ) (Dept_name Manager)
------------------------ ---------------------
A 120 IT Sale Y
B 125 HR Prod Z
C 110 Sale IT A
D 111 IT

Emp ⋈ Dep

Name Id Dept_name Manager


-------------------------------
A 120 IT A
C 110 Sale Y
D 111 IT A
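A minimal Python sketch of the natural join above, matching tuples on their common attribute (here Dept_name); the list-of-dicts representation is just for illustration:

```python
Emp = [{"Name": "A", "Id": 120, "Dept_name": "IT"},
       {"Name": "B", "Id": 125, "Dept_name": "HR"},
       {"Name": "C", "Id": 110, "Dept_name": "Sale"},
       {"Name": "D", "Id": 111, "Dept_name": "IT"}]

Dep = [{"Dept_name": "Sale", "Manager": "Y"},
       {"Dept_name": "Prod", "Manager": "Z"},
       {"Dept_name": "IT", "Manager": "A"}]

def natural_join(r, s):
    """Pair each tuple of r with every tuple of s that agrees on all common attributes."""
    common = set(r[0]) & set(s[0])  # attributes present in both relations
    return [{**x, **y} for x in r for y in s
            if all(x[a] == y[a] for a in common)]

for row in natural_join(Emp, Dep):
    print(row)  # A/IT/A, C/Sale/Y, D/IT/A; HR has no match and is dropped
```

As in the Emp ⋈ Dep result above, employee B (dept HR) disappears from the output because HR has no matching tuple in Dep.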
Conditional Join
Conditional join works similarly to natural join. In natural
join, the default condition is equality on the common
attributes, while in conditional join we can specify any
condition, such as greater than, less than, or not equal.
Let us see below example
R S
(ID Sex Marks) (ID Sex Marks)
------------------ --------------------
1 F 45 10 M 20
2 F 55 11 M 22
3 F 60 12 M 59

Join between R And S with condition R.marks >= S.marks

R.ID R.Sex R.Marks S.ID S.Sex S.Marks


-----------------------------------------------
1 F 45 10 M 20
1 F 45 11 M 22
2 F 55 10 M 20
2 F 55 11 M 22
3 F 60 10 M 20
3 F 60 11 M 22
3 F 60 12 M 59
Relational calculus:

There is an alternate way of formulating queries known as


Relational Calculus. Relational calculus is a non-procedural
query language. In a non-procedural query language, the user
is not concerned with the details of how to obtain the end
results. The relational calculus tells what to do but never
explains how to do it. Most commercial relational languages
are based on aspects of relational calculus, including SQL,
QBE and QUEL.
Why it is called Relational Calculus?
It is based on Predicate calculus, a name derived from branch
of symbolic language. A predicate is a truth-valued function
with arguments. On substituting values for the arguments, the
function result in an expression called a proposition. It can be
either true or false. It is a tailored version of a subset of the
Predicate Calculus to communicate with the relational
database.
Many of the calculus expressions involves the use of
Quantifiers. There are two types of quantifiers:
o Universal Quantifiers: The universal quantifier, denoted
by ∀, is read as for all, which means that all tuples in a
given set satisfy a given condition.
o Existential Quantifiers: The existential quantifier,
denoted by ∃, is read as there exists, which means that in
a given set of tuples there is at least one occurrence whose
value satisfies a given condition.
Before using the concept of quantifiers in formulas, we need to
know the concept of Free and Bound Variables.
A tuple variable t is bound if it is quantified, which means that
it appears within the scope of a quantifier; a variable that is
not bound is said to be free.
Free and bound variables may be compared with global and
local variable of programming languages.
Types of Relational calculus:

1. Tuple Relational Calculus (TRC)


It is a non-procedural query language which is based on finding
a number of tuple variables also known as range variable for
which predicate holds true. It describes the desired information
without giving a specific procedure for obtaining that
information. The tuple relational calculus is specified to select
the tuples in a relation. In TRC, filtering variable uses the tuples
of a relation. The result of the relation can have one or more
tuples.
Notation:
A Query in the tuple relational calculus is expressed as
following notation
1. {T | P (T)} or {T | Condition (T)}


Where
T is the resulting tuples
P(T) is the condition used to fetch T.
For example:
1. { T.name | Author(T) AND T.article = 'database' }
Output: This query selects the tuples from the AUTHOR
relation. It returns a tuple with 'name' from Author who has
written an article on 'database'.
TRC (tuple relation calculus) can be quantified. In TRC, we
can use Existential (∃) and Universal Quantifiers (∀).
For example:
1. { R| ∃T ∈ Authors(T.article='database' AND R.name=T.name
)}
Output: This query will yield the same result as the previous
one.
2. Domain Relational Calculus (DRC)
The second form of relation is known as Domain relational
calculus. In domain relational calculus, filtering variable uses
the domain of attributes. Domain relational calculus uses the
same operators as tuple calculus. It uses logical connectives ∧
(and), ∨ (or) and ┓ (not). It uses Existential (∃) and Universal
Quantifiers (∀) to bind the variable. The QBE or Query by
example is a query language related to domain relational
calculus.
Notation:
1. { a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where
a1, a2 are attributes
P stands for formula built by inner attributes
For example:
1. {< article, page, subject > | ∈ javatpoint ∧ subject = 'database'
}
Output: This query will yield the article, page, and subject
from the relational javatpoint, where the subject is a database.

Relational database design:

Functional dependencies:

The functional dependency is a relationship that exists between


two attributes. It typically exists between the primary key and
non-key attribute within a table.
1. X → Y
The left side of the FD is known as the determinant, and the
right side is known as the dependent.
For example:
Assume we have an employee table with attributes: Emp_Id,
Emp_Name, Emp_Address.
Here Emp_Id attribute can uniquely identify the Emp_Name
attribute of employee table because if we know the Emp_Id, we
can tell that employee name associated with it.
Functional dependency can be written as:
1. Emp_Id → Emp_Name
We can say that Emp_Name is functionally dependent on
Emp_Id.
Types of Functional dependency

1. Trivial functional dependency


o A → B has trivial functional dependency if B is a subset

of A.
o The following dependencies are also trivial like: A → A,

B→B
Example:
Consider a table with two columns, Employee_Id and
Employee_Name.
{Employee_Id, Employee_Name} → Employee_Id is a trivial
functional dependency, as Employee_Id is a subset of
{Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name
→ Employee_Name are trivial dependencies too.
2. Non-trivial functional dependency
o A → B has a non-trivial functional dependency if B is not

a subset of A.
o When A intersection B is NULL, then A → B is called as

complete non-trivial.
Example:
1. ID → Name,
2. Name → DOB
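A functional dependency X → Y holds in a table exactly when no two rows agree on X but disagree on Y. That test can be sketched in plain Python; the `holds` helper and the sample rows are invented for illustration:

```python
def holds(rows, determinant, dependent):
    """True if determinant -> dependent holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in determinant)
        y = tuple(row[a] for a in dependent)
        if seen.setdefault(x, y) != y:  # same X value, different Y value: FD violated
            return False
    return True

employees = [{"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
             {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Pune"},
             {"Emp_Id": 3, "Emp_Name": "Asha", "Emp_Address": "Delhi"}]

print(holds(employees, ["Emp_Id"], ["Emp_Name"]))  # True: Emp_Id -> Emp_Name
print(holds(employees, ["Emp_Name"], ["Emp_Id"]))  # False: two Ashas with ids 1 and 3
```

Note that a check like this only shows whether the dependency is satisfied by the current rows; whether an FD is declared to hold for a schema is a design decision, not something derivable from one sample of data.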

Modification anomalies:

Anomalies
There are different types of anomalies which can occur in
referencing and referenced relation which can be discussed as:
Insertion anomaly: If a tuple is inserted in referencing
relation and referencing attribute value is not present in
referenced attribute, it will not allow inserting in referencing
relation. For Example, If we try to insert a record in
STUDENT_COURSE with STUD_NO =7, it will not allow.
Deletion and Updation anomaly: If a tuple is deleted or
updated from referenced relation and referenced attribute
value is used by referencing attribute in referencing relation, it
will not allow deleting the tuple from referenced relation. For
Example, If we try to delete a record from STUDENT with
STUD_NO =1, it will not allow. To avoid this, following can
be used in query:
 ON DELETE/UPDATE SET NULL: If a tuple is deleted

or updated from referenced relation and referenced attribute


value is used by referencing attribute in referencing relation,
it will delete/update the tuple from referenced relation and
set the value of referencing attribute to NULL.
 ON DELETE/UPDATE CASCADE: If a tuple is deleted

or updated from referenced relation and referenced attribute


value is used by referencing attribute in referencing relation,
it will delete/update the tuple from referenced relation and
referencing relation as well.
Ist to 3rd NFs:

First Normal Form (1NF)


o A relation will be in 1NF if it contains only atomic values.
o It states that an attribute of a table cannot hold multiple
values. It must hold only single-valued attributes.


o First normal form disallows the multi-valued attribute,

composite attribute, and their combinations.


Example: Relation EMPLOYEE is not in 1NF because of
multi-valued attribute EMP_PHONE.
EMPLOYEE table:

EMP_ID  EMP_NAME  EMP_PHONE               EMP_STATE
14      John      7272826385, 9064738238  UP
20      Harry     8574783832              Bihar
12      Sam       7390372389, 8589830302  Punjab
The decomposition of the EMPLOYEE table into 1NF has been
shown below:
EMP_ID  EMP_NAME  EMP_PHONE   EMP_STATE
14      John      7272826385  UP
14      John      9064738238  UP
20      Harry     8574783832  Bihar
12      Sam       7390372389  Punjab
12      Sam       8589830302  Punjab

Second Normal Form (2NF)


o In 2NF, the relation must be in 1NF.
o In the second normal form, all non-key attributes are fully
functionally dependent on the primary key.


Example: Let's assume, a school can store the data of teachers
and the subjects they teach. In a school, a teacher can teach
more than one subject.
TEACHER table

TEACHER_ID SUBJECT TEACHER_AGE

25 Chemistry 30

25 Biology 30
47 English 35

83 Math 38

83 Computer 38

In the given table, non-prime attribute TEACHER_AGE is


dependent on TEACHER_ID which is a proper subset of a
candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two
tables:
TEACHER_DETAIL table:

TEACHER_ID TEACHER_AGE

25 30

47 35

83 38

TEACHER_SUBJECT table:

TEACHER_ID SUBJECT

25 Chemistry

25 Biology
47 English

83 Math

83 Computer

Third Normal Form (3NF)


o A relation will be in 3NF if it is in 2NF and does not
contain any transitive dependency.


o 3NF is used to reduce the data duplication. It is also used

to achieve the data integrity.


o If there is no transitive dependency for non-prime

attributes, then the relation must be in third normal form.


A relation is in third normal form if it holds at least one of the
following conditions for every non-trivial functional
dependency X → Y.
1. X is a super key.
2. Y is a prime attribute, i.e., each element of Y is part of
some candidate key.
Example:
EMPLOYEE_DETAIL table:

EMP_ID  EMP_NAME   EMP_ZIP  EMP_STATE  EMP_CITY
222     Harry      201010   UP         Noida
333     Stephan    02228    US         Boston
444     Lan        60007    US         Chicago
555     Katharine  06389    UK         Norwich
666     John       462007   MP         Bhopal
Super key in the table above: {EMP_ID}, {EMP_ID,
EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}, and so on.
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes
except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY depend on EMP_ZIP,
and EMP_ZIP depends on EMP_ID. The non-prime
attributes (EMP_STATE, EMP_CITY) are transitively
dependent on the super key (EMP_ID). This violates
the rule of third normal form.
That's why we need to move the EMP_CITY and
EMP_STATE to the new <EMPLOYEE_ZIP> table, with
EMP_ZIP as a Primary key.
EMPLOYEE table:

EMP_ID EMP_NAME EMP_ZIP

222 Harry 201010
333 Stephan 02228

444 Lan 60007

555 Katharine 06389

666 John 462007

EMPLOYEE_ZIP table:

EMP_ZIP EMP_STATE EMP_CITY

201010 UP Noida

02228 US Boston

60007 US Chicago

06389 UK Norwich

462007 MP Bhopal

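The decomposition above is lossless: joining EMPLOYEE and EMPLOYEE_ZIP on EMP_ZIP reconstructs EMPLOYEE_DETAIL exactly. A quick check with Python's sqlite3, using the rows from the tables above (zip codes are stored as text to keep the leading zero in 02228):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_zip TEXT);
CREATE TABLE employee_zip (emp_zip TEXT PRIMARY KEY, emp_state TEXT, emp_city TEXT);
INSERT INTO employee VALUES
  (222,'Harry','201010'),(333,'Stephan','02228'),(444,'Lan','60007'),
  (555,'Katharine','06389'),(666,'John','462007');
INSERT INTO employee_zip VALUES
  ('201010','UP','Noida'),('02228','US','Boston'),('60007','US','Chicago'),
  ('06389','UK','Norwich'),('462007','MP','Bhopal');
""")

# Natural join on EMP_ZIP rebuilds the original EMPLOYEE_DETAIL rows.
rows = conn.execute("""SELECT e.emp_id, e.emp_name, e.emp_zip, z.emp_state, z.emp_city
                       FROM employee e JOIN employee_zip z USING (emp_zip)
                       ORDER BY e.emp_id""").fetchall()
print(len(rows))  # 5: no rows lost, none spurious
print(rows[0])    # (222, 'Harry', '201010', 'UP', 'Noida')
```

Getting back exactly the five original rows, with no extras, is what makes this a lossless-join decomposition.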
Boyce Codd normal form (BCNF)


o BCNF is the advance version of 3NF. It is stricter than

3NF.
o A table is in BCNF if every functional dependency X →

Y, X is the super key of the table.


o For BCNF, the table should be in 3NF, and for every FD,

LHS is super key.

Example: Let's assume there is a company where employees


work in more than one department.
EMPLOYEE table:

EMP_ID  EMP_COUNTRY  EMP_DEPT    DEPT_TYPE  EMP_DEPT_NO
264     India        Designing   D394       283
264     India        Testing     D394       300
364     UK           Stores      D283       232
364     UK           Developing  D283       549


1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither EMP_DEPT nor
EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into
three tables:
EMP_COUNTRY table:

EMP_ID EMP_COUNTRY

264 India
364 UK

EMP_DEPT table:

EMP_DEPT DEPT_TYPE EMP_DEPT_NO

Designing D394 283

Testing D394 300

Stores D283 232

Developing D283 549

EMP_DEPT_MAPPING table:

EMP_ID EMP_DEPT
264 Designing
264 Testing
364 Stores
364 Developing

Functional dependencies:
1. EMP_ID → EMP_COUNTRY
2. EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because the left side of both the
functional dependencies is a key.

Fourth normal form (4NF)


o A relation will be in 4NF if it is in Boyce Codd normal
form and has no multi-valued dependency.
o For a dependency A → B, if for a single value of A,
multiple values of B exist, then it is a multi-valued
dependency, written A →→ B.
Example
STUDENT

STU_ID COURSE HOBBY

21 Computer Dancing

21 Math Singing
34 Chemistry Dancing

74 Biology Cricket

59 Physics Hockey

The given STUDENT table is in 3NF, but COURSE and
HOBBY are two independent entities. Hence, there is no
relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains
two courses, Computer and Math and two
hobbies, Dancing and Singing. So there is a Multi-valued
dependency on STU_ID, which leads to unnecessary repetition
of data.
So to make the above table into 4NF, we can decompose it into
two tables:
STUDENT_COURSE

STU_ID COURSE

21 Computer

21 Math

34 Chemistry

74 Biology

59 Physics
STUDENT_HOBBY

STU_ID HOBBY

21 Dancing

21 Singing

34 Dancing

74 Cricket

59 Hockey

Fifth normal form (5NF)


o A relation is in 5NF if it is in 4NF and does not contain
any join dependency, and joining should be lossless.


o 5NF is satisfied when all the tables are broken into as

many tables as possible in order to avoid redundancy.


o 5NF is also known as Project-join normal form (PJ/NF).

Example
SUBJECT LECTURER SEMESTER

Computer Anshika Semester 1

Computer John Semester 1

Math John Semester 1

Math Akash Semester 2

Chemistry Praveen Semester 1

In the above table, John takes both Computer and Math class
for Semester 1 but he doesn't take Math class for Semester 2.
In this case, the combination of all these fields is required to
identify valid data.
Suppose we add a new Semester as Semester 3 but do not know
about the subject and who will be taking that subject so we
leave Lecturer and Subject as NULL. But all three columns
together act as the primary key, so we can't leave the other
two columns blank.
So to make the above table into 5NF, we can decompose it into
three relations P1, P2 & P3:
P1

SEMESTER SUBJECT

Semester 1 Computer

Semester 1 Math

Semester 1 Chemistry

Semester 2 Math

P2
SUBJECT LECTURER

Computer Anshika

Computer John

Math John

Math Akash

Chemistry Praveen

P3

SEMESTER LECTURER
Semester 1 Anshika
Semester 1 John
Semester 2 Akash
Semester 1 Praveen
Computing Closures of Sets of FDs:

Closure of an Attribute: Closure of an Attribute can be


defined as a set of attributes that can be functionally determined
from it.
OR
Closure of a set F of FDs is the set F+ of all FDs that can be
inferred from F
Closure of a set of attributes X concerning F is the set X+ of all
attributes that are functionally determined by X
Pseudocode to find the Closure of an Attribute
Determine X+, the closure of X under functional dependency
set F
X Closure := X; (initially, the closure contains X itself)
Repeat the process as:
old X Closure := X Closure;
for each functional dependency P → Q in FD set do
if P is a subset of X Closure then X Closure := X Closure U Q ;
Repeat until ( X Closure = old X Closure);
Algorithm of Determining X+, the Closure of X under F
Input: A set F of FDs on a relation schema R, and a set of
attributes X, which is a subset of R.
1. X+ := X;
2. repeat
3. oldX+ := X+ ;
4. for each functional dependency Y → Z in F do
5. if X+ ⊇ Y then X+ := X+ ∪ Z;
6. until (X+ = oldX+ );
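The algorithm above translates directly into a few lines of Python; FDs are represented as (left, right) pairs of attribute sets, and the FD set is the one from question 1 below:

```python
def closure(attrs, fds):
    """X+ : the set of attributes functionally determined by attrs under fds."""
    x_plus = set(attrs)
    changed = True
    while changed:  # repeat until X+ stops growing
        changed = False
        for left, right in fds:
            # if X+ contains the whole left side Y, add the right side Z
            if left <= x_plus and not right <= x_plus:
                x_plus |= right
                changed = True
    return x_plus

# FD set from question 1: P->Q, QR->ST, PTV->V
fds = [({'P'}, {'Q'}), ({'Q', 'R'}, {'S', 'T'}), ({'P', 'T', 'V'}, {'V'})]
print(closure({'Q', 'R'}, fds))  # {'Q','R','S','T'}      i.e. (QR)+ = QRST
print(closure({'P', 'R'}, fds))  # {'P','R','Q','S','T'}  i.e. (PR)+ = PRQST
```

The two printed closures match the answers worked out by hand in the questions that follow, including the point that PTV→V contributes nothing because its full left side is never contained in the closure.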
QUESTIONS ON THE CLOSURE SET OF ATTRIBUTES:
1) Given relational schema R(P Q R S T U V) having the
attributes P, Q, R, S, T, U and V, and a set of functional
dependencies denoted by FD = { P->Q, QR->ST, PTV->V }.
Determine the closures (QR)+ and (PR)+.
a) QR+ = QR (the closure of an attribute or set of attributes
contains the set itself).
Now, as per the algorithm, look for an FD whose complete left
side is contained in QR. Since FD QR→ST has its complete left
side QR in QR,
QR+ = QRST.
Again, trace the remaining two FDs for one whose complete left
side is contained in QRST. Since neither of the remaining two
FDs {P->Q, PTV->V} has its complete left side in QRST,
QR+ = QRST (Answer).
Note: In FD PTV→V, T is in QRST, but that cannot be
entertained, as the complete left side PTV should be a subset of
QRST.

b) PR+ = PR (the closure of an attribute or set of attributes
contains the set itself).
Now, as per the algorithm, look for an FD whose complete left
side is contained in PR. Since in FD P→Q, P is a subset of PR,
PR+ = PRQ.
Again, trace the remaining two FDs: since FD QR→ST has its
complete left side QR in PRQ,
PR+ = PRQST.
Again, trace the remaining FD {PTV->V}: since its complete
left side PTV is not contained in PRQST, we ignore it.
Therefore PR+ = PRQST (Answer).
2. Given relational schema R(P Q R S T) having the attributes
P, Q, R, S and T, and a set of functional dependencies denoted
by FD = { P->QR, RS->T, Q->S, T->P }.
Determine the closure (T)+.
T+ = T (the closure of an attribute or set of attributes contains
the set itself).
Now, as per the algorithm, look for an FD whose complete left
side is contained in T. Since FD T→P has its left side T in T,
T+ = TP.
Again, trace the remaining three FDs: since FD P→QR has its
complete left side P in TP,
T+ = TPQR.
Again, trace the remaining two FDs {RS->T, Q->S}: since FD
Q→S has its complete left side Q in TPQR,
T+ = TPQRS.
Again, trace the remaining FD {RS->T}: FD RS→T has its
complete left side RS in TPQRS, but T is already in TPQRS, so
there is no change.
Therefore T+ = TPQRS (Answer).

SQL:

o SQL stands for Structured Query Language. It is used for

storing and managing data in a relational database
management system (RDBMS).
o It is a standard language for Relational Database System.
It enables a user to create, read, update and delete
relational databases and tables.
o All the RDBMS like MySQL, Informix, Oracle, MS
Access and SQL Server use SQL as their standard
database language.
o SQL allows users to query the database in a number of
ways, using English-like statements.
Rules:
SQL follows the following rules:
o Structured Query Language is not case-sensitive. Generally,
keywords of SQL are written in uppercase.
o SQL statements are not tied to text lines: a single SQL
statement can be written on one or across multiple text lines.
o Using SQL statements, you can perform most of the
actions in a database.
o SQL is based on tuple relational calculus and relational
algebra.
SQL process:
o When an SQL command is executed for any RDBMS,

the system figures out the best way to carry out the
request, and the SQL engine determines how to
interpret the task.

o Various components are included in the process. These

components can be the optimization engine, the query engine,
the query dispatcher, the classic query engine, etc.
o All the non-SQL queries are handled by the classic query
engine, but the SQL query engine won't handle logical files.

Characteristics of SQL
o SQL is easy to learn.

o SQL is used to access data from relational database

management systems.
o SQL can execute queries against the database.

o SQL is used to describe the data.



o SQL is used to define the data in the database and


manipulate it when needed.
o SQL is used to create and drop the database and table.
o SQL is used to create a view, stored procedure, function
in a database.
o SQL allows users to set permissions on tables, procedures,
and views.
Advantages of SQL
There are the following advantages of SQL:
High speed
Using SQL queries, the user can quickly and efficiently
retrieve a large number of records from a database.
No coding needed
In the standard SQL, it is very easy to manage the database
system. It doesn't require a substantial amount of code to
manage the database system.
Well-defined standards
SQL databases use long-established standards that have been
adopted by ISO and ANSI.
Portability
SQL can be used in laptop, PCs, server and even some mobile
phones.

Interactive language
SQL is a domain-specific language used to communicate with
databases. It is also used to receive answers to complex
questions in seconds.
Multiple data view
Using the SQL language, the users can make different views of
the database structure.

Data types:

Data types are used to represent the nature of the data that can
be stored in the database table. For example, in a particular
column of a table, if we want to store a string type of data then
we will have to declare a string data type of this column.
Data types are mainly classified into three categories for every
database.
o String Data types
o Numeric Data types
o Date and time Data types
Data Types in MySQL, SQL Server and Oracle Databases
MySQL Data Types
A list of data types used in MySQL database. This is based on
MySQL 8.0.

MySQL String Data Types

CHAR(Size) It is used to specify a fixed


length string that can contain
numbers, letters, and special
characters. Its size can be 0 to
255 characters. Default is 1.

VARCHAR(Size) It is used to specify a variable


length string that can contain
numbers, letters, and special
characters. Its size can be from
0 to 65535 characters.

BINARY(Size) It is equal to CHAR() but


stores binary byte strings. Its
size parameter specifies the
column length in the bytes.
Default is 1.

VARBINARY(Size) It is equal to VARCHAR() but


stores binary byte strings. Its
size parameter specifies the
maximum column length in
bytes.

TEXT(Size) It holds a string with a

maximum length of 65,535
characters.

TINYTEXT It holds a string with a


maximum length of 255
characters.

MEDIUMTEXT It holds a string with a

maximum length of
16,777,215 characters.

LONGTEXT It holds a string with a


maximum length of
4,294,967,295 characters.

ENUM(val1, val2, It is used for a string object

val3,...) that can have only one value,
chosen from a list of possible
values. An ENUM list can
contain up to 65,535 values. If
you insert a value that is not in
the list, a blank value will be
inserted.

SET( It is used to specify a string


val1,val2,val3,....) that can have 0 or more values,
chosen from a list of possible
values. You can list up to 64
values at one time in a SET
list.

BLOB(size) It is used for BLOBs (Binary


Large Objects). It can hold up
to 65,535 bytes.

MySQL Numeric Data Types

BIT(Size) It is used for a bit-value type. The


number of bits per value is specified
in size. Its size can be 1 to 64. The
default value is 1.

INT(size) It is used for the integer value. Its


signed range varies from -
2147483648 to 2147483647 and
unsigned range varies from 0 to
4294967295. The size parameter
specifies the max display width that is
255.

INTEGER(size) It is equal to INT(size).

FLOAT(size, d) It is used to specify a floating point


number. Its size parameter specifies
the total number of digits. The number
of digits after the decimal point is
specified by d parameter.

FLOAT(p) It is used to specify a floating point

number. MySQL uses the p parameter to
determine whether to use FLOAT or
DOUBLE. If p is from 0 to 24, the
data type becomes FLOAT(). If p is
from 25 to 53, the data type becomes
DOUBLE().

DOUBLE(size, It is a normal size floating point


d) number. Its size parameter specifies
the total number of digits. The number
of digits after the decimal is specified
by d parameter.

DECIMAL(size, It is used to specify a fixed point


d) number. Its size parameter specifies
the total number of digits. The number
of digits after the decimal parameter is
specified by d parameter. The
maximum value for the size is 65, and
the default value is 10. The maximum
value for d is 30, and the default value
is 0.

DEC(size, d) It is equal to DECIMAL(size, d).

BOOL It is used to specify Boolean values


true and false. Zero is considered as
false, and nonzero values are
considered as true.

MySQL Date and Time Data Types

DATE It is used to specify date format


YYYY-MM-DD. Its supported
range is from '1000-01-01' to
'9999-12-31'.

DATETIME(fsp) It is used to specify date and time


combination. Its format is YYYY-
MM-DD hh:mm:ss. Its supported
range is from '1000-01-01
00:00:00' to 9999-12-31 23:59:59'.

TIMESTAMP(fsp) It is used to specify the timestamp.


Its value is stored as the number of
seconds since the Unix
epoch('1970-01-01 00:00:00'
UTC). Its format is YYYY-MM-
DD hh:mm:ss. Its supported range
is from '1970-01-01 00:00:01'
UTC to '2038-01-09 03:14:07'
UTC.

TIME(fsp) It is used to specify the time


format. Its format is hh:mm:ss. Its
supported range is from '-
838:59:59' to '838:59:59'

YEAR It is used to specify a year in four-


digit format. Values allowed in
four digit format from 1901 to
2155, and 0000.

SQL Server Data Types


SQL Server String Data Type

char(n) It is a fixed width character string data


type. Its size can be up to 8000 characters.

varchar(n) It is a variable width character string data


type. Its size can be up to 8000 characters.

varchar(max) It is a variable width character string data


types. Its size can be up to 1,073,741,824
characters.

text It is a variable width character string data


type. Its size can be up to 2GB of text data.

nchar It is a fixed width Unicode string data type.


Its size can be up to 4000 characters.

nvarchar It is a variable width Unicode string data


type. Its size can be up to 4000 characters.

ntext It is a variable width Unicode string data


type. Its size can be up to 2GB of text data.

binary(n) It is a fixed width Binary string data type.


Its size can be up to 8000 bytes.

varbinary It is a variable width Binary string data


type. Its size can be up to 8000 bytes.

image It is also a variable width Binary string data


type. Its size can be up to 2GB.

SQL Server Numeric Data Types

bit It is an integer that can be 0, 1 or null.

tinyint It allows whole numbers from 0 to 255.

Smallint It allows whole numbers between -32,768 and


32,767.

Int It allows whole numbers between -2,147,483,648


and 2,147,483,647.

bigint It allows whole numbers between -


9,223,372,036,854,775,808 and
9,223,372,036,854,775,807.

float(n) It is used to specify floating precision number


data from -1.79E+308 to 1.79E+308. The n
parameter indicates whether the field should hold
the 4 or 8 bytes. Default value of n is 53.

real It is a floating precision number data from -


3.40E+38 to 3.40E+38.

money It is used to specify monetary data from -


922,337,233,685,477.5808 to
922,337,203,685,477.5807.

SQL Server Date and Time Data Type

datetime It is used to specify date and time


combination. It supports range from
January 1, 1753, to December 31, 9999
with an accuracy of 3.33 milliseconds.

datetime2 It is used to specify date and time


combination. It supports range from
January 1, 0001 to December 31, 9999
with an accuracy of 100 nanoseconds

date It is used to store date only. It supports


range from January 1, 0001 to December
31, 9999

time It stores time only to an accuracy of 100


nanoseconds

timestamp It stores a unique number when a new


row gets created or modified. The time
stamp value is based upon an internal
clock and does not correspond to real
time. Each table may contain only one-
time stamp variable.

SQL Server Other Data Types

Sql_variant It is used for various data types except


for text, timestamp, and ntext. It stores
up to 8000 bytes of data.

XML It stores XML formatted data.


Maximum 2GB.

cursor It stores a reference to a cursor used for


database operations.

table It stores result set for later processing.

uniqueidentifier It stores GUID (Globally unique


identifier).

Oracle Data Types


Oracle String data types

CHAR(size) It is used to store character data


within the predefined length. It
can be stored up to 2000 bytes.

NCHAR(size) It is used to store national


character data within the
predefined length. It can be stored
up to 2000 bytes.

VARCHAR2(size) It is used to store variable string


data within the predefined length.
It can be stored up to 4000 byte.

VARCHAR(SIZE) It is the same as


VARCHAR2(size). You can also
use VARCHAR(size), but it is
suggested to use
VARCHAR2(size)

NVARCHAR2(size) It is used to store Unicode string


data within the predefined length.
We have to must specify the size
of NVARCHAR2 data type. It
can be stored up to 4000 bytes.

Oracle Numeric Data Types

NUMBER(p, s) It contains precision p and scale s.


The precision p can range from 1 to
38, and the scale s can range from -84
to 127.

FLOAT(p) It is a subtype of the NUMBER data


type. The precision p can range from
1 to 126.

BINARY_FLOAT It is used for binary precision( 32-


bit). It requires 5 bytes, including
length byte.

BINARY_DOUBLE It is used for double binary precision


(64-bit). It requires 9 bytes, including
length byte.

Oracle Date and Time Data Types

DATE It is used to store a valid date-time format with


a fixed length. Its range varies from January 1,
4712 BC to December 31, 9999 AD.

TIMESTAMP It is used to store the valid date in YYYY-


MM-DD with time hh:mm:ss format.

Oracle Large Object Data Types (LOB Types)

BLOB It is used to specify unstructured


binary data. Its range goes up to 2^32-1
bytes or 4 GB.

BFILE It is used to store binary data in an


external file. Its range goes up to 2^32-1
bytes or 4 GB.

CLOB It is used for single-byte character


data. Its range goes up to 2^32-1 bytes or
4 GB.

NCLOB It is used to specify single byte or fixed


length multibyte national character set

(NCHAR) data. Its range is up to 2^32-1
bytes or 4 GB.

RAW(size) It is used to specify variable length raw


binary data. Its range is up to 2000
bytes per row. Its maximum size must
be specified.

LONG RAW It is used to specify variable length raw

binary data. Its range is up to 2^31-1
bytes, or 2 GB, per row.

Basic Queries in SQL:

o SQL commands are instructions. It is used to


communicate with the database. It is also used to perform
specific tasks, functions, and queries of data.
o SQL can perform various tasks like create a table, add data
to tables, drop the table, modify the table, set permission
for users.
Types of SQL Commands
There are five types of SQL commands: DDL, DML, DCL,
TCL, and DQL.

1. Data Definition Language (DDL)


o DDL changes the structure of the table like creating a

table, deleting a table, altering a table, etc.


o All the commands of DDL are auto-committed, which means

they permanently save all the changes in the database.


Here are some commands that come under DDL:
o CREATE
o ALTER
o DROP
o TRUNCATE
a. CREATE: It is used to create a new table in the database.
Syntax:

CREATE TABLE TABLE_NAME (COLUMN_NAME DATATYPES[,....]);
Example:
CREATE TABLE EMPLOYEE(Name VARCHAR2(20), Email VARCHAR2(100), DOB DATE);
b. DROP: It is used to delete both the structure and record
stored in the table.
Syntax
DROP TABLE table_name;
Example
DROP TABLE EMPLOYEE;
c. ALTER: It is used to alter the structure of the database. This
change could be either to modify the characteristics of an
existing attribute or probably to add a new attribute.
Syntax:
To add a new column in the table:
ALTER TABLE table_name ADD column_name COLUMN-definition;
To modify an existing column in the table:
ALTER TABLE table_name MODIFY(column_definitions....);
EXAMPLE
ALTER TABLE STU_DETAILS ADD(ADDRESS VARCHAR2(20));
ALTER TABLE STU_DETAILS MODIFY (NAME VARCHAR2(20));
d. TRUNCATE: It is used to delete all the rows from the table
and free the space containing the table.
Syntax:
TRUNCATE TABLE table_name;
Example:
TRUNCATE TABLE EMPLOYEE;
2. Data Manipulation Language
o DML commands are used to modify the database. DML is

responsible for all forms of changes in the database.

o The commands of DML are not auto-committed, which means

they don't permanently save the changes in the database.
They can be rolled back.
Here are some commands that come under DML:
o INSERT
o UPDATE
o DELETE
a. INSERT: The INSERT statement is a SQL query. It is used
to insert data into the row of a table.
Syntax:
INSERT INTO TABLE_NAME
(col1, col2, col3,.... colN)
VALUES (value1, value2, value3, .... valueN);
Or
INSERT INTO TABLE_NAME
VALUES (value1, value2, value3, .... valueN);
For example:
INSERT INTO javatpoint (Author, Subject) VALUES ("Sonoo", "DBMS");
b. UPDATE: This command is used to update or modify the
value of a column in the table.
Syntax:
UPDATE table_name SET [column_name1 = value1,... column_nameN = valueN] [WHERE CONDITION]
For example:
UPDATE students
SET User_Name = 'Sonoo'
WHERE Student_Id = '3'
c. DELETE: It is used to remove one or more row from a table.
Syntax:
DELETE FROM table_name [WHERE condition];
For example:
DELETE FROM javatpoint
WHERE Author = "Sonoo";
3. Data Control Language
DCL commands are used to grant and take back authority from
any database user.

Here are some commands that come under DCL:


o Grant
o Revoke
a. Grant: It is used to give user access privileges to a database.
Example
GRANT SELECT, UPDATE ON MY_TABLE TO SOME_USER, ANOTHER_USER;
b. Revoke: It is used to take back permissions from the user.
Example
REVOKE SELECT, UPDATE ON MY_TABLE FROM USER1, USER2;
4. Transaction Control Language
TCL commands can be used only with DML commands like
INSERT, DELETE and UPDATE.
Operations such as creating or dropping tables are automatically
committed in the database, which is why TCL commands cannot
be used with them.
Here are some commands that come under TCL:
o COMMIT
o ROLLBACK
o SAVEPOINT
a. Commit: Commit command is used to save all the
transactions to the database.
Syntax:

COMMIT;
Example:
DELETE FROM CUSTOMERS
WHERE AGE = 25;
COMMIT;
b. Rollback: Rollback command is used to undo transactions
that have not already been saved to the database.
Syntax:
ROLLBACK;
Example:
DELETE FROM CUSTOMERS
WHERE AGE = 25;
ROLLBACK;
c. SAVEPOINT: It is used to roll the transaction back to a
certain point without rolling back the entire transaction.
Syntax:
SAVEPOINT SAVEPOINT_NAME;
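The three TCL commands above can be exercised with Python's built-in sqlite3 module (SQLite stands in for a full RDBMS here, and the customers table and its rows are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE customers (id INTEGER, name TEXT, age INTEGER)")
cur.executemany("INSERT INTO customers VALUES (?, ?, ?)",
                [(1, "Anu", 25), (2, "Raj", 30), (3, "Tina", 25)])
conn.commit()                        # COMMIT: make the inserts permanent

cur.execute("DELETE FROM customers WHERE age = 25")
conn.rollback()                      # ROLLBACK: undo the uncommitted DELETE
after_rollback = cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(after_rollback)                # 3 - the delete was rolled back

cur.execute("SAVEPOINT before_delete")     # SAVEPOINT: partial-rollback marker
cur.execute("DELETE FROM customers WHERE age = 25")
cur.execute("ROLLBACK TO before_delete")   # undo only back to the savepoint
after_savepoint = cur.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(after_savepoint)               # 3
```

The key point visible here is that only uncommitted work can be rolled back; the two committed inserts survive every rollback.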
5. Data Query Language
DQL is used to fetch the data from the database.
It uses only one command:
o SELECT
a. SELECT: This is the same as the projection operation of
relational algebra. It is used to select attributes based on the
condition described by the WHERE clause.

Syntax:
SELECT expressions
FROM TABLES
WHERE conditions;
For example:
SELECT emp_name
FROM employee
WHERE age > 20;
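The SELECT example above can be reproduced end-to-end with Python's sqlite3 module (the employee rows are invented sample data):

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway in-memory database
cur = conn.cursor()
cur.execute("CREATE TABLE employee (emp_name TEXT, age INTEGER)")
cur.executemany("INSERT INTO employee VALUES (?, ?)",
                [("Asha", 19), ("Ravi", 23), ("Meena", 31)])
# DQL: project emp_name, restricted by the WHERE condition
cur.execute("SELECT emp_name FROM employee WHERE age > 20")
names = sorted(row[0] for row in cur.fetchall())
print(names)  # ['Meena', 'Ravi']
```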

Insert Statement in SQL:

SQL INSERT statement is a SQL query. It is used to insert a

single record or multiple records into a table.
There are two ways to insert records:
1. Inserting data directly into a table (by specifying column
names, or without specifying column names)
2. Inserting data through an INSERT INTO ... SELECT statement

1) Inserting data directly into a table


You can insert a row in the table by using SQL INSERT INTO
command.
There are two ways to insert values in a table.
In the first method there is no need to specify the column
names where the data will be inserted; you need only their
values.

INSERT INTO table_name
VALUES (value1, value2, value3....);
The second method specifies both the column names and the
values which you want to insert:
INSERT INTO table_name (column1, column2, column3....)
VALUES (value1, value2, value3.....);


Let's take an example of a table which has five records within it.
INSERT INTO STUDENTS (ROLL_NO, NAME, AGE, CITY)
VALUES (1, 'ABHIRAM', 22, 'ALLAHABAD');
INSERT INTO STUDENTS (ROLL_NO, NAME, AGE, CITY)
VALUES (2, 'ALKA', 20, 'GHAZIABAD');
INSERT INTO STUDENTS (ROLL_NO, NAME, AGE, CITY)
VALUES (3, 'DISHA', 21, 'VARANASI');
INSERT INTO STUDENTS (ROLL_NO, NAME, AGE, CITY)
VALUES (4, 'ESHA', 21, 'DELHI');
INSERT INTO STUDENTS (ROLL_NO, NAME, AGE, CITY)
VALUES (5, 'MANMEET', 23, 'JALANDHAR');
It will show the following table as the final result.

ROLL_NO NAME AGE CITY

1 ABHIRAM 22 ALLAHABAD

2 ALKA 20 GHAZIABAD

3 DISHA 21 VARANASI

4 ESHA 21 DELHI

5 MANMEET 23 JALANDHAR

You can also create a record in the STUDENTS table by using
this syntax:
INSERT INTO STUDENTS
VALUES (6, 'PRATIK', 24, 'KANPUR');
The table will then be as follows:

ROLL_NO NAME AGE CITY

1 ABHIRAM 22 ALLAHABAD

2 ALKA 20 GHAZIABAD

3 DISHA 21 VARANASI

4 ESHA 21 DELHI

5 MANMEET 23 JALANDHAR

6 PRATIK 24 KANPUR

2) Inserting data through SELECT Statement


SQL INSERT INTO SELECT Syntax
INSERT INTO table_name
[(column1, column2, .... columnN)]
SELECT column1, column2, .... columnN
FROM table_name [WHERE condition];
Note: when you add a new row, you should make sure that the
data type of the value and the column match.
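A minimal sketch of INSERT INTO ... SELECT, again using sqlite3 (both table names and their columns are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (roll_no INTEGER, name TEXT, city TEXT)")
cur.execute("CREATE TABLE delhi_students (roll_no INTEGER, name TEXT)")
cur.executemany("INSERT INTO students VALUES (?, ?, ?)",
                [(1, "Esha", "Delhi"), (2, "Alka", "Ghaziabad"),
                 (3, "Ved", "Delhi")])
# copy only the matching rows into the second table
cur.execute("""INSERT INTO delhi_students (roll_no, name)
               SELECT roll_no, name FROM students WHERE city = 'Delhi'""")
count = cur.execute("SELECT COUNT(*) FROM delhi_students").fetchone()[0]
print(count)  # 2
```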

SQL INSERT Multiple Rows


Many times developers ask whether it is possible to insert
multiple rows into a single table in a single statement. Without
this feature, developers have to write multiple insert statements
when they insert values in a table, which is not only tedious but
also time-consuming.
Let us see few practical examples to understand this concept
more clearly. We will use the MySQL database for writing all
the queries.
Example 1:
To create a table in the database, first, we need to select the
database in which we want to create a table.
mysql> USE dbs;
Then we will write a query to create a table named student in
the selected database 'dbs'.

mysql> CREATE TABLE student(ID INT, Name VARCHAR(20), Percentage INT, Location VARCHAR(20), DateOfBirth DATE);

The student table is created successfully.


Now, we will write a single query to insert multiple records in
the student table:
mysql> INSERT INTO student(ID, Name, Percentage, Location, DateOfBirth) VALUES
(1, "Manthan Koli", 79, "Delhi", "2003-08-20"),
(2, "Dev Dixit", 75, "Pune", "1999-06-17"),
(3, "Aakash Deshmukh", 87, "Mumbai", "1997-09-12"),
(4, "Aaryan Jaiswal", 90, "Chennai", "2005-10-02"),
(5, "Rahul Khanna", 92, "Ambala", "1996-03-04"),
(6, "Pankaj Deshmukh", 67, "Kanpur", "2000-02-02"),
(7, "Gaurav Kumar", 84, "Chandigarh", "1998-07-06"),
(8, "Sanket Jain", 61, "Shimla", "1990-09-08"),
(9, "Sahil Wagh", 90, "Kolkata", "1968-04-03"),
(10, "Saurabh Singh", 54, "Kashmir", "1989-01-06");

To verify that multiple records are inserted in the student table,


we will execute the SELECT query.
mysql> SELECT * FROM student;

ID  Name             Percentage  Location    DateOfBirth

1   Manthan Koli     79          Delhi       2003-08-20
2   Dev Dixit        75          Pune        1999-06-17
3   Aakash Deshmukh  87          Mumbai      1997-09-12
4   Aaryan Jaiswal   90          Chennai     2005-10-02
5   Rahul Khanna     92          Ambala      1996-03-04
6   Pankaj Deshmukh  67          Kanpur      2000-02-02
7   Gaurav Kumar     84          Chandigarh  1998-07-06
8   Sanket Jain      61          Shimla      1990-09-08
9   Sahil Wagh       90          Kolkata     1968-04-03
10  Saurabh Singh    54          Kashmir     1989-01-06

The results show that all ten records are inserted successfully
using a single query.
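The same multi-row INSERT syntax also works against SQLite, so the example can be tried without a MySQL server (only the first three of the ten rows are used here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE student (ID INTEGER, Name TEXT,
               Percentage INTEGER, Location TEXT, DateOfBirth TEXT)""")
# one INSERT statement, several VALUES tuples
cur.execute("""INSERT INTO student (ID, Name, Percentage, Location, DateOfBirth)
               VALUES (1, 'Manthan Koli', 79, 'Delhi', '2003-08-20'),
                      (2, 'Dev Dixit', 75, 'Pune', '1999-06-17'),
                      (3, 'Aakash Deshmukh', 87, 'Mumbai', '1997-09-12')""")
count = cur.execute("SELECT COUNT(*) FROM student").fetchone()[0]
print(count)  # 3
```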
Example 2:
To create a table in the database, first, we need to select the
database in which we want to create a table.
mysql> USE dbs;
Then we will write a query to create a table named items_tbl in
the selected database 'dbs'.
mysql> CREATE TABLE items_tbl(ID INT, Item_Name VARCHAR(20), Item_Quantity INT, Item_Price INT, Purchase_Date DATE);

The table named items_tbl is created successfully.


Now, we will write a single query to insert multiple records in
the items_tbl table:

mysql> INSERT INTO items_tbl(ID, Item_Name, Item_Quantity, Item_Price, Purchase_Date) VALUES
(1, "Soap", 5, 200, "2021-07-08"),
(2, "Toothpaste", 2, 80, "2021-07-10"),
(3, "Pen", 10, 50, "2021-07-12"),
(4, "Bottle", 1, 250, "2021-07-13"),
(5, "Brush", 3, 90, "2021-07-15"),
(6, "Notebooks", 10, 1000, "2021-07-26"),
(7, "Handkerchief", 3, 100, "2021-07-28"),
(8, "Chips Packet", 5, 50, "2021-07-30"),
(9, "Marker", 2, 30, "2021-08-13"),
(10, "Scissors", 1, 60, "2021-08-13");

To verify that multiple records are inserted in the items_tbl


table, we will execute the SELECT query.
mysql> SELECT * FROM items_tbl;

ID  Item_Name     Item_Quantity  Item_Price  Purchase_Date

1   Soap          5              200         2021-07-08
2   Toothpaste    2              80          2021-07-10
3   Pen           10             50          2021-07-12
4   Bottle        1              250         2021-07-13
5   Brush         3              90          2021-07-15
6   Notebooks     10             1000        2021-07-26
7   Handkerchief  3              100         2021-07-28
8   Chips Packet  5              50          2021-07-30
9   Marker        2              30          2021-08-13
10  Scissors      1              60          2021-08-13

The results show that all ten records are inserted successfully
using a single query.

Delete Statement in SQL:

The SQL DELETE statement is used to delete rows from a
table. Generally, the DELETE statement removes one or more
records from a table.
SQL DELETE Syntax

Let's see the Syntax for the SQL DELETE statement:


DELETE FROM table_name [WHERE condition];
Here table_name is the table from which the rows are to be
deleted.
The WHERE clause in the SQL DELETE statement is optional
here.

SQL DELETE Example

Let us take a table, named "EMPLOYEE" table.

ID EMP_NAME CITY SALARY

101 Adarsh Singh Obra 20000

102 Sanjay Singh Meerut 21000

103 Priyanka Sharma Raipur 25000

104 Esha Singhal Delhi 26000

Example of delete with WHERE clause is given below:


DELETE FROM EMPLOYEE WHERE ID=101;
Resulting table after the query:

ID EMP_NAME CITY SALARY

102 Sanjay Singh Meerut 21000

103 Priyanka Sharma Raipur 25000

104 Esha Singhal Delhi 26000

Another example of delete statement is given below



DELETE FROM EMPLOYEE;

Resulting table after the query:

ID EMP_NAME CITY SALARY

It will delete all the records of the EMPLOYEE table, whereas
the earlier query deleted only the record of the EMPLOYEE
table where ID is 101.
The WHERE clause in the SQL DELETE statement is optional,
and it identifies the rows that get deleted.
The WHERE clause is used to prevent the deletion of all the rows
in the table; if you don't use the WHERE clause, you might lose
all the rows.
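Both forms of DELETE — with and without a WHERE clause — can be verified against the EMPLOYEE sample data using sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE employee (id INTEGER, emp_name TEXT, "
            "city TEXT, salary INTEGER)")
cur.executemany("INSERT INTO employee VALUES (?, ?, ?, ?)",
                [(101, "Adarsh Singh", "Obra", 20000),
                 (102, "Sanjay Singh", "Meerut", 21000),
                 (103, "Priyanka Sharma", "Raipur", 25000),
                 (104, "Esha Singhal", "Delhi", 26000)])
cur.execute("DELETE FROM employee WHERE id = 101")   # delete one row
remaining = [r[0] for r in cur.execute("SELECT id FROM employee ORDER BY id")]
print(remaining)  # [102, 103, 104]

cur.execute("DELETE FROM employee")                  # no WHERE: all rows go
count = cur.execute("SELECT COUNT(*) FROM employee").fetchone()[0]
print(count)  # 0
```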

Invalid DELETE Statement for ORACLE database

You cannot use * (asterisk) symbol to delete all the records.


DELETE * FROM EMPLOYEE;

SQL DELETE TABLE


The DELETE statement is used to delete rows from a table. If
you want to remove a specific row from a table, you should use
a WHERE condition.
DELETE FROM table_name [WHERE condition];
But if you do not specify the WHERE condition it will remove
all the rows from the table.

DELETE FROM table_name;


There are some more terms similar to DELETE statement like
as DROP statement and TRUNCATE statement but they are
not exactly same there are some differences between them.

Difference between DELETE and TRUNCATE statements


There is a slight difference between the DELETE and
TRUNCATE statements.
The DELETE statement only deletes the rows from the table,
based on the condition defined by the WHERE clause, or deletes
all the rows from the table when no condition is specified.
But it does not free the space occupied by the table.
The TRUNCATE statement is used to delete all the rows from
the table and free the space they occupy.
Let's see an "employee" table.

Emp_id Name Address Salary

1 Aryan Allahabad 22000

2 Shurabhi Varanasi 13000

3 Pappu Delhi 24000

Execute the following query to truncate the table:


TRUNCATE TABLE employee;

Difference between DROP and TRUNCATE statements

When you use the DROP statement, it deletes the table's rows
together with the table's definition, so all the relationships of
that table with other tables will no longer be valid.
When you drop a table:
o Table structure will be dropped
o Relationships will be dropped
o Integrity constraints will be dropped
o Access privileges will also be dropped
On the other hand, when we TRUNCATE a table, the table
structure remains the same, so none of the above problems
arise; only the rows are removed.

Update Statement in SQL:

The SQL commands UPDATE and DELETE are used to
modify data that is already in the database; both use a WHERE
clause to select the affected rows.
The SQL UPDATE statement is used to change the data of the
records held by tables. Which rows are to be updated is decided
by a condition. To specify the condition, we use the WHERE
clause.
The UPDATE statement can be written in the following form:
UPDATE table_name SET [column_name1 = value1,... column_nameN = valueN] [WHERE condition]
Let's see the Syntax:

UPDATE table_name
SET column_name = expression
WHERE conditions
Let's take an example: here we are going to update an entry in
the source table.
SQL statement:
UPDATE students
SET User_Name = 'beinghuman'
WHERE Student_Id = '3'
Source Table:

Student_Id FirstName LastName User_Name

1 Ada Sharma sharmili


2 Rahul Maurya sofamous
3 James Walker jonny

See the result after updating value:

Student_Id FirstName LastName User_Name

1 Ada Sharma sharmili


2 Rahul Maurya sofamous
3 James Walker beinghuman

Updating Multiple Fields:


If you are going to update multiple fields, you should separate
each field assignment with a comma.
SQL UPDATE statement for multiple fields:
UPDATE students
SET User_Name = 'beserious', FirstName = 'Johnny'
WHERE Student_Id = '3'
Result of the table is given below:

Student_Id FirstName LastName User_Name

1 Ada Sharma sharmili


2 Rahul Maurya sofamous
3 Johnny Walker beserious
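The multi-column UPDATE above can be checked with sqlite3 using the same students sample data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("""CREATE TABLE students (Student_Id INTEGER, FirstName TEXT,
               LastName TEXT, User_Name TEXT)""")
cur.executemany("INSERT INTO students VALUES (?, ?, ?, ?)",
                [(1, "Ada", "Sharma", "sharmili"),
                 (2, "Rahul", "Maurya", "sofamous"),
                 (3, "James", "Walker", "jonny")])
# update two columns of one row, selected by the WHERE condition
cur.execute("""UPDATE students
               SET User_Name = 'beserious', FirstName = 'Johnny'
               WHERE Student_Id = 3""")
row = cur.execute("SELECT FirstName, User_Name FROM students "
                  "WHERE Student_Id = 3").fetchone()
print(row)  # ('Johnny', 'beserious')
```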

MYSQL SYNTAX FOR UPDATING TABLE:


UPDATE table_name
SET field1 = new-value1, field2 = new-value2
[WHERE CLAUSE]
SQL UPDATE SELECT:
SQL UPDATE WITH SELECT QUERY:
We can use SELECT statement to update records through
UPDATE statement.
SYNTAX:

UPDATE tblDestination
SET tblDestination.col = value
WHERE EXISTS (
  SELECT col2.value
  FROM tblSource
  WHERE tblSource.join_col = tblDestination.join_col
  AND tblSource.constraint = value)
You can also try this one -
UPDATE
  table
SET
  table.column1 = other_table.column1,
  table.column2 = other_table.column2
FROM
  table
INNER JOIN
  other_table
ON
  table.id = other_table.id
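A portable alternative to the join form of UPDATE is a correlated subquery, which works on most engines including SQLite; the two category tables below are invented stand-ins for the cat/rel_cat example that follows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE category (cat_id INTEGER, cat_name TEXT)")
cur.execute("CREATE TABLE rel_category (rel_cat_id INTEGER, rel_cat_name TEXT)")
cur.executemany("INSERT INTO category VALUES (?, ?)",
                [(1, "old-a"), (2, "old-b")])
cur.executemany("INSERT INTO rel_category VALUES (?, ?)",
                [(1, "Books"), (2, "Toys")])
# pull each new name from the matching row of the other table
cur.execute("""UPDATE category
               SET cat_name = (SELECT rel_cat_name FROM rel_category
                               WHERE rel_cat_id = category.cat_id)
               WHERE EXISTS (SELECT 1 FROM rel_category
                             WHERE rel_cat_id = category.cat_id)""")
names = [r[0] for r in cur.execute("SELECT cat_name FROM category "
                                   "ORDER BY cat_id")]
print(names)  # ['Books', 'Toys']
```

The WHERE EXISTS guard keeps rows without a match untouched instead of setting them to NULL.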
MySQL SYNTAX:
If you want to UPDATE with SELECT in MySQL, you can
use this syntax.
Let's take an example having two tables. Here,
the first table contains -
Cat_id, cat_name,
and the second table contains -
Rel_cat_id, rel_cat_name

SQL UPDATE COLUMN:


We can update a single or multiple columns in SQL with SQL
UPDATE query.
SQL UPDATE EXAMPLE WITH UPDATING SINGLE
COLUMN:
UPDATE students
SET student_id = 001
WHERE student_name = 'AJEET';
This SQL UPDATE example would update the student_id to
'001' in the student table where student_name is 'AJEET'.
SQL UPDATE EXAMPLE WITH UPDATING
MULTIPLE COLUMNS:
To update more than one column with a single update
statement:
UPDATE students
SET student_name = 'AJEET',
Religion = 'HINDU'
WHERE student_name = 'RAJU';
This SQL UPDATE statement will change the student name to
'AJEET' and religion to 'HINDU' where the student name is
'RAJU'.

Views in SQL:

o Views in SQL are considered as a virtual table. A view


also contains rows and columns.
o To create the view, we can select the fields from one or
more tables present in the database.

o A view can either have specific rows based on certain


condition or all the rows of a table.
Sample table:
Student_Detail

STU_ID NAME ADDRESS

1 Stephan Delhi

2 Kathrin Noida

3 David Ghaziabad

4 Alina Gurugram

Student_Marks

STU_ID NAME MARKS

1 Stephan 97

2 Kathrin 86

3 David 74

4 Alina 90

5 John 96

1. Creating view

A view can be created using the CREATE VIEW statement.


We can create a view from a single table or multiple tables.
Syntax:
CREATE VIEW view_name AS
SELECT column1, column2.....
FROM table_name
WHERE condition;
2. Creating View from a single table
In this example, we create a View named DetailsView from the
table Student_Detail.
Query:
1. CREATE VIEW DetailsView AS
2. SELECT NAME, ADDRESS
3. FROM Student_Detail
4. WHERE STU_ID < 4;
Just like a table, we can query the view to see the data.
1. SELECT * FROM DetailsView;
Output:

NAME      ADDRESS
Stephan   Delhi
Kathrin   Noida
David     Ghaziabad
3. Creating View from multiple tables
A view from multiple tables can be created by simply including
multiple tables in the SELECT statement.
In the given example, a view is created named MarksView from
two tables Student_Detail and Student_Marks.
Query:
1. CREATE VIEW MarksView AS
2. SELECT Student_Detail.NAME, Student_Detail.ADDRESS,
Student_Marks.MARKS
3. FROM Student_Detail, Student_Marks
4. WHERE Student_Detail.NAME = Student_Marks.NAME;
To display data of View MarksView:
1. SELECT * FROM MarksView;
NAME      ADDRESS     MARKS
Stephan   Delhi       97
Kathrin   Noida       86
David     Ghaziabad   74
Alina     Gurugram    90
4. Deleting View
A view can be deleted using the DROP VIEW statement.
Syntax:
1. DROP VIEW view_name;
Example:
If we want to delete the View MarksView, we can do this as:
1. DROP VIEW MarksView;
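The whole lifecycle above (create, query, drop) can be replayed with Python's sqlite3 module, using the sample Student_Detail data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE Student_Detail (STU_ID INTEGER, NAME TEXT, ADDRESS TEXT);
INSERT INTO Student_Detail VALUES
  (1, 'Stephan', 'Delhi'), (2, 'Kathrin', 'Noida'),
  (3, 'David', 'Ghaziabad'), (4, 'Alina', 'Gurugram');

-- A view is a stored query; it holds no data of its own.
CREATE VIEW DetailsView AS
  SELECT NAME, ADDRESS FROM Student_Detail WHERE STU_ID < 4;
""")

rows = con.execute("SELECT * FROM DetailsView").fetchall()
print(rows)
# → [('Stephan', 'Delhi'), ('Kathrin', 'Noida'), ('David', 'Ghaziabad')]

con.execute("DROP VIEW DetailsView")  # the base table is unaffected
```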
Query processing:
Query Processing is the activity performed in extracting data
from the database. Query processing takes various steps
for fetching the data from the database. The steps involved are:
1. Parsing and translation
2. Optimization
3. Evaluation
The query processing works in the following way:
Parsing and Translation
Query processing includes certain activities for data retrieval.
The user query, given in a high-level database language such as
SQL, is translated into expressions that can be used at the
physical level of the file system. After this, the actual
evaluation of the query and a variety of query-optimizing
transformations take place.
Before processing a query, the system needs to translate it from
its human-readable form into an internal form. SQL, or
Structured Query Language, is the best-suited choice for
humans, but it is not perfectly suitable for the internal
representation of the query within the system; relational
algebra is well suited for the internal representation of a query.
The translation process in query processing is similar to
parsing the query. When a user executes a query, the parser in
the system checks the syntax of the query, verifies the name of
the relation in the database, the tuples, and finally the required
attribute values, in order to generate the internal form of the
query. The parser creates a tree of the query, known as a
'parse tree', and then translates it into the form of relational
algebra, replacing all uses of views that appear in the query.
Thus, the working of query processing can be pictured as a flow
from the parser and translator, through the optimizer and
evaluation-plan generator, to the evaluation engine.
Suppose a user executes a query. As we have learned, there
are various methods of extracting the data from the database.
In SQL, suppose a user wants to fetch the records of the
employees whose salary is greater than 10000. For doing this,
the following query is undertaken:
select emp_name from Employee where salary>10000;
Thus, to make the system understand the user query, it needs to
be translated into the form of relational algebra. We can bring
this query into relational algebra form as:
o πemp_name (σsalary>10000 (Employee))
o πemp_name (σsalary>10000 (πemp_name, salary (Employee)))
After translating the given query, each relational algebra
operation can be executed using one of several different
algorithms. So, in this way, query processing begins its working.
Evaluation
In addition to the relational algebra translation, it is
required to annotate the translated relational algebra expression
with the instructions used for specifying and evaluating each
operation. Thus, after translating the user query, the system
executes a query evaluation plan.
Query Evaluation Plan
o In order to fully evaluate a query, the system needs to
construct a query evaluation plan.
o The annotations in the evaluation plan may refer to the
algorithms to be used for a particular index or for
specific operations.
o Such relational algebra with annotations is referred to
as Evaluation Primitives. The evaluation primitives
carry the instructions needed for the evaluation of the
operation.
o Thus, a query evaluation plan defines a sequence of
primitive operations used for evaluating a query. The
query evaluation plan is also referred to as the query
execution plan.
o A query execution engine is responsible for generating
the output of the given query. It takes the query execution
plan, executes it, and finally produces the output for the
user query.
Optimization
o The cost of query evaluation can vary for different
types of queries. The system is responsible for
constructing the evaluation plan, so the user need not
write the query efficiently.
o Usually, a database system generates an efficient query
evaluation plan, which minimizes its cost. This task
performed by the database system is known as
Query Optimization.
o For optimizing a query, the query optimizer should have
an estimated cost analysis of each operation, because
the overall operation cost depends on the memory
allocations to several operations, execution costs, and so
on.
Finally, after selecting an evaluation plan, the system evaluates
the query and produces the output of the query.
General strategies of query processing:
The steps for SQL processing in a database management system
(DBMS) are as follows −
Step 1
Parser − While parsing, the database performs checks such as
the syntax check, semantic check, and shared pool check, after
converting the query into relational algebra.
 The syntax check concludes whether the SQL is syntactically
correct or not, that is, it checks the SQL's syntactic validity.
An example statement for the parser's syntax check is −
SELCT * FROM student;
The output is as follows −
error
Here the error of the wrong spelling of SELECT is reported by
this check.
 Semantic check − It determines whether the statement has
meaning or not. Example: a query that contains a table name
which does not exist is caught by this check.
 Shared pool check − This check determines the existence of
a written hash code in the shared pool. If the code exists in
the shared pool, then the database will not take the additional
steps of optimization and execution, because every query
possesses a hash code during its execution.
Step 2
Optimizer − In this stage, the database has to perform a hard
parse for at least one unique DML statement, and it has to do
optimization during this parse. The database never optimizes
DDL unless it includes a DML component.
Optimization is a process in which multiple query execution
plans for satisfying a query are examined and the most efficient
query plan is selected for execution.
The database catalogue stores the execution plans, and the
optimizer then passes the lowest-cost plan for execution.
Row Source Generation − The row source generator is
software which receives the optimal execution plan from the
optimizer and produces an iterative execution plan which is
used by the rest of the database. The iterative plan is a binary
program that, when executed by the SQL engine, produces the
result set.
Step 3
Execution Engine − The execution engine runs the query and
displays the required result.
Query Optimization:
A single query can be executed through different algorithms
or re-written in different forms and structures. Hence, the
question of query optimization comes into the picture: which
of these forms or pathways is the most optimal? The query
optimizer attempts to determine the most efficient way to
execute a given query by considering the possible query plans.
Importance: The goal of query optimization is to reduce the
system resources required to fulfil a query, and ultimately
provide the user with the correct result set faster.
 First, it provides the user with faster results, which makes
the application seem faster to the user.
 Secondly, it allows the system to service more queries in
the same amount of time, because each request takes less
time than unoptimized queries.
 Thirdly, query optimization ultimately reduces the amount
of wear on the hardware (e.g. disk drives), and allows the
server to run more efficiently (e.g. lower power
consumption, less memory usage).
There are broadly two ways a query can be optimized:
1. Analyze and transform equivalent relational expressions:
Try to minimize the tuple and column counts of the
intermediate and final query processes (discussed here).
2. Using different algorithms for each operation: These
underlying algorithms determine how tuples are accessed
from the data structures they are stored in, indexing,
hashing, data retrieval and hence influence the number of
disk and block accesses (discussed in query processing).
Analyze and transform equivalent relational expressions.
Here, we shall talk about generating minimal equivalent
expressions. To analyze equivalent expressions, a set of
equivalence rules is listed below. These generate equivalent
expressions for a query written in relational algebra. To
optimize a query, we can convert the query into an equivalent
form as long as an equivalence rule is satisfied.
1. Conjunctive selection operations can be written as a
sequence of individual selections: σθ1∧θ2(E) = σθ1(σθ2(E)).
This is called a sigma-cascade.
Explanation: Applying the conjunctive condition θ1 ∧ θ2
directly is expensive. Instead, filter out the tuples satisfying
condition θ2 (the inner selection) and then apply condition θ1
(the outer selection) to the resulting, fewer tuples. This leaves
us with fewer tuples to process the second time. This can be
extended for two or more intersecting selections. Since we
are breaking a single condition into a series of selections or
cascades, it is called a "cascade".
2. Selection is commutative: σθ1(σθ2(E)) = σθ2(σθ1(E)).
Explanation: The selection condition is commutative in
nature. This means it does not matter whether we apply σθ1
first or σθ2 first. In practice, it is better and more optimal to
apply first the selection which yields the fewer number of
tuples. This saves time on the outer selection.
3. In a cascade of projections, all but the first (outermost)
projection can be omitted: πL1(πL2(…(E))) = πL1(E). This is
called a pi-cascade.
Explanation: A cascade or series of projections is
meaningless, because in the end we are only selecting those
columns which are specified in the last, or outermost,
projection. Hence, it is better to collapse all the projections
into just one, i.e. the outermost projection.
4. Selections on Cartesian products can be re-written as
theta joins.
 Equivalence 1: σθ(E1 × E2) = E1 ⋈θ E2
Explanation: The cross-product operation is known to
be very expensive. This is because it matches each tuple
of E1 (m tuples in total) with each tuple of E2 (n tuples
in total), yielding m*n entries. If we apply a selection
operation after that, we would have to scan through m*n
entries to find the tuples satisfying the condition θ.
Instead of doing all of this, it is more optimal to use the
theta join, a join specifically designed to select only
those entries in the cross product which satisfy the
theta condition, without evaluating the entire cross
product first.
 Equivalence 2: σθ1(E1 ⋈θ2 E2) = E1 ⋈θ1∧θ2 E2
Explanation: The theta join radically decreases the
number of resulting tuples, so if we fold the intersection
of both join conditions, θ1 and θ2, into the theta join
itself, we get fewer scans to do. On the other hand, a
condition θ1 left outside the join unnecessarily increases
the number of tuples to scan.
5. Theta joins are commutative: E1 ⋈θ E2 = E2 ⋈θ E1.
Explanation: Theta joins are commutative, but the query
processing time depends to some extent on which table is
used as the outer loop and which one as the inner loop
during the join process (based on the indexing structures
and blocks).
6. Join operations are associative.
 Natural join: (E1 ⋈ E2) ⋈ E3 = E1 ⋈ (E2 ⋈ E3)
Explanation: Natural joins are commutative as well as
associative, so one should join first those two tables
which yield the fewer number of entries, and then apply
the other join.
 Theta join: (E1 ⋈θ1 E2) ⋈θ2∧θ3 E3 = E1 ⋈θ1∧θ3 (E2 ⋈θ2 E3)
Explanation: Theta joins are associative in the above
manner, where θ2 involves attributes from only E2 and
E3.
7. The selection operation can be distributed over the theta
join.
 Equivalence 1: σθ1(E1 ⋈θ E2) = (σθ1(E1)) ⋈θ E2,
where θ1 involves only the attributes of E1
Explanation: Applying a selection after doing the theta
join causes all the tuples returned by the theta join to
be examined after the join. If the selection contains
attributes from only E1, it is better to apply this
selection to E1 first (hence resulting in a fewer number
of tuples) and then join it with E2.
 Equivalence 2: σθ1∧θ2(E1 ⋈θ E2) = (σθ1(E1)) ⋈θ (σθ2(E2))
Explanation: This can be extended to two selection
conditions, θ1 and θ2, where θ1 contains the attributes
of only E1 and θ2 contains the attributes of only E2.
Hence, we can individually apply the selection criteria
before joining, to drastically reduce the number of
tuples joined.
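Rule 7 (selection pushdown) is easy to measure on toy data. The sketch below, with invented relations, counts the tuple pairs examined by each plan:

```python
# Toy relations: E1(emp_id, dept), E2(dept, floor). Invented data.
E1 = [(i, "dept%d" % (i % 5)) for i in range(100)]
E2 = [("dept%d" % d, "floor%d" % d) for d in range(5)]

# Plan A: join first, then apply the selection emp_id < 10.
pairs_examined_a = len(E1) * len(E2)                      # 500 pairs
joined = [(e, d) for e in E1 for d in E2 if e[1] == d[0]]
result_a = [p for p in joined if p[0][0] < 10]

# Plan B (Equivalence 1): push the selection into E1 before joining.
e1_small = [e for e in E1 if e[0] < 10]
pairs_examined_b = len(e1_small) * len(E2)                # 50 pairs
result_b = [(e, d) for e in e1_small for d in E2 if e[1] == d[0]]

# Both plans compute the same answer, but plan B scans 10x fewer pairs.
assert sorted(result_a) == sorted(result_b)
print(pairs_examined_a, pairs_examined_b)  # → 500 50
```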
8. Projection distributes over the theta join.
 Equivalence 1: πL1∪L2(E1 ⋈θ E2) = (πL1(E1)) ⋈θ (πL2(E2)),
where the join condition θ involves only attributes in L1 ∪ L2
Explanation: The idea discussed for selection can be
used for projection as well. Here, if L1 is a projection
that involves columns of only E1, and L2 another
projection that involves the columns of only E2, then it
is better to individually apply the projections on both the
tables before joining. This leaves us with a fewer
number of columns on either side, hence contributing to
an easier join.
 Equivalence 2: πL1∪L2(E1 ⋈θ E2) =
πL1∪L2((πL1∪L3(E1)) ⋈θ (πL2∪L4(E2)))
Explanation: Here, when the join condition θ also
involves extra attributes, say L3 of E1 and L4 of E2,
these attributes must be kept while projecting E1 and E2
so that the join can still be performed; the outer
projection then removes them from the final result.
9. Union and intersection are commutative:
E1 ∪ E2 = E2 ∪ E1, and E1 ∩ E2 = E2 ∩ E1.
Explanation: Union and intersection are both commutative;
the two operand tables can be written in either order
according to requirement and ease of access.
10. Union and intersection are associative:
(E1 ∪ E2) ∪ E3 = E1 ∪ (E2 ∪ E3), and
(E1 ∩ E2) ∩ E3 = E1 ∩ (E2 ∩ E3).
Explanation: Union and intersection are both associative;
we can enclose any tables in parentheses according to
requirement and ease of access.
11. The selection operation distributes over the union,
intersection, and difference operations:
σP(E1 − E2) = σP(E1) − σP(E2), and similarly for ∪ and ∩.
Explanation: In set difference, we know that only those
tuples are shown which belong to table E1 and do not
belong to table E2. So, applying a selection condition on
the entire set difference is equivalent to applying the
selection condition on the individual tables and then
applying the set difference. This will reduce the number of
comparisons in the set-difference step.
12. The projection operation distributes over the union
operation: πL(E1 ∪ E2) = (πL(E1)) ∪ (πL(E2)).
Explanation: Applying the individual projections before
computing the union of E1 and E2 is more optimal than the
left-hand expression, i.e. applying the projection after the
union step.
Query Processor:
The query processor in a database management system
receives as input a query request in the form of SQL text,
parses it, generates an execution plan, and completes the
processing by executing the plan and returning the results to
the client.
In a relational database system the query processor is the
module responsible for executing database queries. The query
processor receives as input queries in the form of SQL text,
parses and optimizes them, and completes their execution by
employing specific data access methods and database operator
implementations. The query processor communicates with the
storage engine, which reads and writes data from the disk,
manages records, controls concurrency, and maintains log
files.
Typically, a query processor consists of four sub-components;
each of them corresponds to a different stage in the lifecycle
of a query.
Concept of Security:
Security of databases refers to the array of controls, tools, and
procedures designed to ensure and safeguard confidentiality,
integrity, and availability. These notes concentrate on
confidentiality because it is the component most at risk in
data security breaches.
Security for databases must cover and safeguard the following
aspects:
o The database containing data.
o Database management systems (DBMS)
o Any applications that are associated with it.
o Physical database servers or the database server virtual,
and the hardware that runs it.
o The infrastructure for computing or network that is used
to connect to the database.
Security of databases is a complicated and challenging task that
requires all aspects of security practices and technologies. It is
inherently at odds with the accessibility of databases: the more
usable and accessible the database is, the more susceptible it is
to security threats, and the more invulnerable it is to attacks
and threats, the more difficult it is to access and utilize.
Why Database Security is Important?
According to the definition, a data breach refers to a breach of
data integrity in databases. The amount of damage an incident
like a data breach can cause our business is contingent on
various consequences or elements.
o Compromised intellectual property: Our intellectual
property (trade secrets, inventions, or proprietary methods)
could be vital for our ability to maintain an advantage in
our industry. If our intellectual property is stolen or
disclosed and our competitive advantage is lost, it could be
difficult to keep or recover.
o Damage to our brand's reputation: Customers or
partners may not want to purchase goods or services from
us (or deal with our business) if they do not feel they can
trust our company to protect their data.
o The concept of business continuity (or lack of it): Some
businesses cannot continue to function until a breach has
been resolved.
o Penalties or fines to be paid for non-compliance: The
cost of not complying with international regulations like
the Sarbanes-Oxley Act (SOX) or the Payment Card
Industry Data Security Standard (PCI DSS), industry-specific
data privacy regulations like HIPAA, or regional
privacy laws like the European Union's General Data
Protection Regulation (GDPR) can be severe, with fines in
the worst cases exceeding several million dollars for each
violation.
o Costs for repairing breaches and notifying consumers
about them: Alongside notifying customers of a breach,
the company that has been breached is required to cover
the investigation and forensic services such as crisis
management, triage repairs to the affected systems, and
much more.
Common Threats and Challenges
Numerous incorrect software configurations, vulnerabilities, or
patterns of carelessness or abuse can lead to a breach of
security. Here are some of the most prevalent causes of
security attacks and their reasons.
Insider Dangers
An insider threat is a security attack from any of three sources,
each having privileged access to the database:
o A malicious insider who wants to cause harm
o A negligent insider who makes mistakes that leave the
database vulnerable to attack
o An infiltrator, an outsider who acquires credentials by
using a method like phishing or by accessing the
credential information stored in the database itself
Insider dangers are among the most frequent sources of
database security breaches. They often occur as a consequence
of allowing too many employees to hold privileged user access
credentials.
Human Error
The unintentional mistakes, weak passwords or sharing
passwords, and other negligent or uninformed behaviours of
users remain the root causes of almost half (49 percent) of all
data security breaches.
Database Software Vulnerabilities can be Exploited
Hackers earn their money by identifying and exploiting
vulnerabilities in software such as database management
software. The major database software companies and open-source
database management platforms release regular
security patches to fix these weaknesses; however, failing to
apply the patches on time can increase the risk of being
hacked.
SQL/NoSQL Injection Attacks
A threat specific to databases is the insertion of arbitrary SQL
(and non-SQL) attack strings into database queries delivered
by web-based applications or HTTP headers. Companies
that do not follow safe coding practices for web applications
and do not conduct regular vulnerability tests are susceptible
to these attacks.
Buffer Overflow Exploits
Buffer overflow happens when a program seeks to copy more
data into a fixed-length memory block than the block can
accommodate. Attackers may make use of the excess data,
stored in adjacent memory addresses, as a basis from which to
begin attacks.
Denial of Service (DoS/DDoS) Attacks
In a denial-of-service (DoS) attack, the attacker overwhelms
the target server (in this case, the database server) with such a
large volume of requests that the server can no longer fulfil
legitimate requests made by actual users. In most cases, the
server becomes unstable or even fails to function.
Malware
Malware is software designed to exploit vulnerabilities or cause
harm to databases. Malware can arrive via any endpoint device
that connects to the database's network.
Attacks on Backups
Companies that do not protect backup data using the same
rigorous controls employed to protect databases themselves are
at risk of cyberattacks on backups.
The following factors amplify the threats:
o Data volumes are growing: Data capture, storage, and
processing continue to increase exponentially in almost all
organizations. Any tools or methods must be highly
flexible to meet current as well as far-off needs.
o The infrastructure is sprawling: Network environments
are becoming more complicated, especially as companies
shift their workloads into multiple clouds and hybrid cloud
architectures and make the selection of deployment,
management, and administration of security solutions
more difficult.
o More stringent regulatory compliance requirements:
The worldwide regulatory compliance landscape continues
to grow in complexity. This makes complying with every
mandate more challenging.
Database Security Best Practices
As databases are almost always accessible via the network, any
security risk to any component or part of the infrastructure can
threaten the database. Likewise, any security attack that
impacts a device or workstation could endanger the database.
Therefore, security for databases must go beyond the limits of
the database.
In evaluating the security of databases in our workplace and
determining our organization's top priorities, we should look at
each of the following areas.
o Physical security: Whether the database server is
on-premises or in a cloud data centre, it should be
located within a secure, climate-controlled environment.
(If our database server is in a cloud data centre, the cloud
provider will handle its security on our behalf.)
o Network access and administrative restrictions: The
practical minimum number of users should be granted
access to the database, and their access rights should be
restricted to the minimum levels required to fulfil their
tasks. Likewise, network access should be limited to the
minimum permissions needed.
o End-user account and device security: Be aware of
who has access to the database and when and how the data
is used. Data monitoring tools can notify us of data-related
activities that are uncommon or appear dangerous. Any
device that connects to the network hosting the database
should be physically secured (in the sole control of the
appropriate person) and subject to security checks at all
times.
o Encryption: ALL data, including data stored in databases
as well as credential information, should be secured using
best-in-class encryption both in storage and in transit. All
encryption keys must be handled in accordance with
best-practice guidelines.
o Database software security: Always use the most
current version of our database management software, and
apply all patches immediately after they are released.
o Security for web server applications and websites: Any
application or web server that connects to the database
can be a target and should be subjected to periodic
security testing and best-practice management.
o Security of backups: All backups, images, or copies of
the database should have the same (or equally rigorous)
security protections as the database itself.
o Auditing: Audits against database security standards
should be conducted every few months. Record all logins
to the database server and the operating system, and
record all operations performed on sensitive data as well.
Data protection tools and platforms
Today, a variety of companies provide data protection
platforms and tools. A comprehensive solution should have all
of the following features:
o Discovery: Discovery capability is often needed to meet
regulatory compliance requirements. Look for a tool that
can detect and categorize weaknesses across all our
databases, whether they are hosted in the cloud or
on-premises, and that provides recommendations to
address any vulnerabilities discovered.
o Monitoring of Data Activity: The solution should be
capable of monitoring and analysing the entire data
activity in all databases, whether our application is on-
premises, in the cloud, or inside a container. It will alert
us to suspicious activity in real-time to allow us to respond
more quickly to threats. It also provides visibility into the
state of our information through an integrated and
comprehensive user interface. It is also important to
choose a system that enforces rules that govern policies,
procedures, and the separation of duties. Be sure that the
solution we select is able to generate the reports we need
to comply with the regulations.
o The Ability to Tokenize and Encrypt Data: In case of an
incident, encryption is an additional line of protection
against compromise. Any software we choose must have
the flexibility to protect data in cloud, on-premises,
hybrid, or multi-cloud environments. Find a tool with
volume, file, and application encryption features that
meet our company's compliance regulations. This could
require tokenization (data masking) or advanced
management of security keys.
o Optimization of Data Security and Risk Analysis: A
tool that provides contextual insights by combining
security data with advanced analytics will allow users to
perform optimization, risk assessment, and reporting with
ease. Select a tool that is able to keep and combine large
amounts of recent and historical data about the security
and state of our databases, and choose a solution that
provides data exploration, auditing, and reporting
capabilities via an extensive but user-friendly self-service
dashboard.
Concurrency And Recovery:
Concurrency Control is the management procedure that is
required for controlling the concurrent execution of the
operations that take place on a database.
But before discussing concurrency control, we should know
about concurrent execution.
Concurrent Execution in DBMS
o In a multi-user system, multiple users can access and use
the same database at one time, which is known as
concurrent execution of the database. It means that the
same database is executed simultaneously on a multi-user
system by different users.
o While working on database transactions, there may be a
requirement for the database to be used by multiple users
for performing different operations, and in that case
concurrent execution of the database is performed.
o The simultaneous execution should be performed in an
interleaved manner, and no operation should affect the
other executing operations, thus maintaining the
consistency of the database. With concurrent execution of
transaction operations, however, several challenging
problems occur that need to be solved.
Problems with Concurrent Execution
In a database transaction, the two main operations
are READ and WRITE. These two operations need to be
managed in the concurrent execution of transactions because,
if they are not performed in a properly interleaved manner, the
data may become inconsistent. So, the following problems
occur with the concurrent execution of the operations:
Problem 1: Lost Update Problems (W - W Conflict)
The problem occurs when two different database transactions
perform the read/write operations on the same database items
in an interleaved manner (i.e., concurrent execution) that
makes the values of the items incorrect hence making the
database inconsistent.
For example:
Consider the below schedule, where two transactions TX and
TY are performed on the same account A, where the balance
of account A is $300.
o At time t1, transaction TX reads the value of account A,
i.e., $300 (only read).
o At time t2, transaction TX deducts $50 from account A that
becomes $250 (only deducted and not updated/write).
o Alternately, at time t3, transaction TY reads the value of
account A that will be $300 only because TX didn't update
the value yet.
o At time t4, transaction TY adds $100 to account A that
becomes $400 (only added but not updated/write).
o At time t6, transaction TX writes the value of account A
that will be updated as $250 only, as TY didn't update the
value yet.
o Similarly, at time t7, transaction TY writes the value of
account A, writing the value computed at time t4, i.e.
$400. This means the value written by TX is lost, i.e., $250
is lost.
Hence the data becomes incorrect, and the database is left
inconsistent.
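The t1 to t7 schedule above can be replayed deterministically in a few lines of Python, with local variables standing in for each transaction's private workspace:

```python
# account_A plays the shared database item, initially $300.
account_A = 300

tx_local = account_A        # t1: TX reads A ($300)
tx_local -= 50              # t2: TX deducts $50 locally ($250, not yet written)
ty_local = account_A        # t3: TY reads A, still $300 (TX has not written)
ty_local += 100             # t4: TY adds $100 locally ($400, not yet written)
account_A = tx_local        # t6: TX writes $250
account_A = ty_local        # t7: TY writes $400, overwriting TX's update

print(account_A)  # → 400, i.e. TX's $50 deduction is lost
```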
Dirty Read Problems (W-R Conflict)
The dirty read problem occurs when one transaction updates
an item of the database, then the transaction fails, and before
the data gets rolled back, the updated database item is accessed
by another transaction. This is the read-write conflict between
the two transactions.
For example:
Consider two transactions TX and TY in the below schedule,
performing read/write operations on account A, where the
available balance in account A is $300:
o At time t1, transaction TX reads the value of account A,
i.e., $300.
o At time t2, transaction TX adds $50 to account A that
becomes $350.
o At time t3, transaction TX writes the updated value in
account A, i.e., $350.
o Then at time t4, transaction TY reads account A that will
be read as $350.
o Then at time t5, transaction TX rolls back due to a server
problem, and the value changes back to $300 (as initially).
o But the value for account A remains $350 for transaction
TY, as already read, which is the dirty read and is
therefore known as the Dirty Read Problem.
Unrepeatable Read Problem (R-W Conflict)
Also known as the Inconsistent Retrievals Problem, this occurs
when, within a single transaction, two different values are read
for the same database item.
For example:
Consider two transactions, TX and TY, performing the
read/write operations on account A, having an available
balance = $300. The schedule is as follows:
o At time t1, transaction TX reads the value from account A,
i.e., $300.
o At time t2, transaction TY reads the value from account A,
i.e., $300.
o At time t3, transaction TY updates the value of account A
by adding $100 to the available balance, and then it
becomes $400.
o At time t4, transaction TY writes the updated value, i.e.,
$400.
o After that, at time t5, transaction TX reads the available
value of account A, and that will be read as $400.
o It means that within the same transaction TX, it reads two
different values of account A, i.e., $ 300 initially, and after
updation made by transaction TY, it reads $400. It is an
unrepeatable read and is therefore known as the
Unrepeatable read problem.
Thus, in order to maintain consistency in the database and avoid
such problems arising in concurrent execution, management is
needed, and that is where the concept of Concurrency Control
comes into play.
Concurrency Control
Concurrency Control is the working concept that is required for
controlling and managing the concurrent execution of database
operations and thus avoiding the inconsistencies in the
database. Thus, for maintaining the concurrency of the
database, we have the concurrency control protocols.
Concurrency Control Protocols
The concurrency control protocols ensure the atomicity,
consistency, isolation, durability and serializability of the
concurrent execution of the database transactions. Therefore,
these protocols are categorized as:
o Lock Based Concurrency Control Protocol
o Time Stamp Concurrency Control Protocol
o Validation Based Concurrency Control Protocol
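As a rough illustration of the lock-based approach (a sketch with assumed class and item names, not a full protocol implementation), an exclusive-lock table can prevent the dirty-read interleaving: TY simply cannot touch item A until TX releases its locks at commit or rollback.

```python
# Minimal sketch of exclusive (strict) locking -- illustrative only.
class LockManager:
    def __init__(self):
        self.locks = {}  # item -> transaction id holding the exclusive lock

    def acquire(self, txn, item):
        holder = self.locks.get(item)
        if holder is None or holder == txn:
            self.locks[item] = txn
            return True
        return False  # conflict: another transaction holds the lock

    def release_all(self, txn):
        # Locks are released only at commit/rollback (strict locking).
        self.locks = {k: v for k, v in self.locks.items() if v != txn}

lm = LockManager()
assert lm.acquire("TX", "A")      # TX locks A and updates it
assert not lm.acquire("TY", "A")  # TY is blocked: no dirty read possible
lm.release_all("TX")              # TX commits (or rolls back) and unlocks
assert lm.acquire("TY", "A")      # now TY reads only committed data
```

Because TY can only acquire the lock after TX has finished, it never observes TX's uncommitted writes.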
Recovery in Database:
Database systems, like any other computer system, are subject
to failures, but the data stored in them must be available as and
when required. When a database fails, it must possess facilities
for fast recovery. It must also guarantee atomicity, i.e. either a
transaction completes successfully and is committed (its effect
is recorded permanently in the database) or the transaction has
no effect on the database at all. There are both automatic and
non-automatic ways of backing up data and recovering from
failure situations. The techniques used to recover data lost due
to system crashes, transaction errors, viruses, catastrophic
failures, incorrect command execution, etc. are database
recovery techniques. So, to prevent data loss, recovery
techniques based on deferred update and immediate update, or
backing up data, can be used.
Recovery techniques are heavily dependent upon the existence
of a special file known as a system log. It contains information
about the start and end of each transaction and any updates
which occur during the transaction. The log keeps track of all
transaction operations that affect the values of database items.
This information is needed to recover from transaction failure.
 The log is kept on disk.
 start_transaction(T): This log entry records that transaction
T starts its execution.
 read_item(T, X): This log entry records that transaction T
reads the value of database item X.
 write_item(T, X, old_value, new_value): This log entry
records that transaction T changes the value of database
item X from old_value to new_value. The old value is
sometimes known as the before-image of X, and the new
value as the after-image of X.
 commit(T): This log entry records that transaction T has
completed all accesses to the database successfully and its
effect can be committed (recorded permanently) to the
database.
 abort(T): This records that transaction T has been aborted.
 checkpoint: Checkpoint is a mechanism where all the
previous logs are removed from the system and stored
permanently in a storage disk. A checkpoint declares a point
before which the DBMS was in a consistent state, and all
the transactions were committed.
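The log entries above can be modelled as simple records. The tuple shapes below are assumptions chosen for illustration; a real DBMS writes a binary on-disk log format:

```python
# Sketch of the system-log records described above (illustrative tuples).
log = []  # in-memory stand-in for the log file on stable storage

def start_transaction(t):
    log.append(("start", t))

def write_item(t, x, old, new):
    # old = before-image of x, new = after-image of x
    log.append(("write", t, x, old, new))

def commit(t):
    log.append(("commit", t))

start_transaction("T1")
write_item("T1", "A", 300, 350)
commit("T1")

print(log)
```

Storing both the before-image and the after-image in each write record is what makes both UNDO (restore old value) and REDO (reapply new value) possible later.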
A transaction T reaches its commit point when all its operations
that access the database have been executed successfully, i.e.
the transaction has reached the point at which it will not abort
(terminate without completing). Once committed, the
transaction is permanently recorded in the database.
Commitment always involves writing a commit entry to the log
and writing the log to disk. At the time of a system crash, the
log is searched backwards for all transactions T that have
written a start_transaction(T) entry but have not yet written a
commit(T) entry; these transactions may have to be rolled back
to undo their effect on the database during the recovery process.
 Undoing – If a transaction crashes, then the recovery
manager may undo the transaction, i.e. reverse its
operations. This involves examining the transaction's log
entries write_item(T, x, old_value, new_value) and setting
the value of item x in the database back to old_value. There
are two major techniques for recovery from non-
catastrophic transaction failures: deferred updates and
immediate updates.
 Deferred update – This technique does not physically
update the database on disk until a transaction has reached
its commit point. Before reaching commit, all transaction
updates are recorded in the local transaction workspace. If
a transaction fails before reaching its commit point, it will
not have changed the database in any way, so UNDO is not
needed. It may be necessary to REDO the effect of the
operations recorded in the local transaction workspace,
because their effect may not yet have been written to the
database. Hence, deferred update is also known as the
No-undo/redo algorithm.
 Immediate update – In the immediate update technique, the
database may be updated by some operations of a
transaction before the transaction reaches its commit point.
However, these operations are recorded in a log on disk
before they are applied to the database, making recovery
still possible. If a transaction fails to reach its commit point,
the effect of its operations must be undone, i.e. the
transaction must be rolled back; hence we require both undo
and redo. This technique is known as the undo/redo
algorithm.
 Caching/Buffering – In this technique, one or more disk
pages that include the data items to be updated are cached
into main memory buffers and then updated in memory
before being written back to disk. A collection of in-memory
buffers called the DBMS cache is kept under the control of
the DBMS for holding these buffers. A directory is used to
keep track of which database items are in the buffers. A
dirty bit is associated with each buffer: it is 0 if the buffer
has not been modified and 1 if it has.
 Shadow paging – This provides atomicity and durability. A
directory with n entries is constructed, where the ith entry
points to the ith database page on disk. When a transaction
begins executing, the current directory is copied into a
shadow directory. When a page is to be modified, a shadow
page is allocated in which the changes are made, and when
it is ready to become durable, all pages that referred to the
original are updated to refer to the new replacement page.
 Backward Recovery – The terms “Rollback” and “UNDO”
can also refer to backward recovery. When a backup of the
data is not available and previous modifications need to be
undone, this technique can be helpful. With the backward
recovery method, unwanted modifications are removed and
the database is returned to its prior condition. All
adjustments made during the previous transaction are
reversed during backward recovery. In other words, it
reprocesses valid transactions and undoes the erroneous
database updates.
 Forward Recovery – “Roll forward” and “REDO” refer to
forward recovery. When a database needs to be updated
with all verified changes, this forward recovery technique
is helpful. The changes of failed transactions are applied to
the database to roll those modifications forward. In other
words, the database is restored using the preserved data and
the valid transactions counted from their past saves.
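Backward recovery (UNDO) can be sketched as a reverse scan of the log that restores the before-image of every write made by a transaction that never committed. The log layout below reuses the record shapes described earlier in this section and is illustrative only:

```python
# Sketch of an UNDO pass over the log (illustrative, not a real DBMS).
db = {"A": 350, "B": 120}   # state on disk after the crash

log = [
    ("start", "T1"),
    ("write", "T1", "A", 300, 350),  # (txn, item, old_value, new_value)
    ("commit", "T1"),
    ("start", "T2"),
    ("write", "T2", "B", 100, 120),  # T2 crashed before committing
]

committed = {rec[1] for rec in log if rec[0] == "commit"}

for rec in reversed(log):            # scan backwards: newest entries first
    if rec[0] == "write" and rec[1] not in committed:
        _, txn, item, old_value, _ = rec
        db[item] = old_value         # restore the before-image

print(db)   # {'A': 350, 'B': 100} -- T2's effect undone, T1's survives
```

Forward recovery (REDO) is the mirror image: scan forwards and reapply the new_value of every write belonging to a committed transaction.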
Some of the backup techniques are as follows:
 Full database backup – In this, the full database, including
the data and the database meta information needed to
restore the whole database (including full-text catalogs), is
backed up at predefined intervals.
 Differential backup – It stores only the data changes that
have occurred since the last full database backup. When
some data has changed many times since the last full
database backup, a differential backup stores the most
recent version of the changed data. To restore from it, the
full database backup must be restored first.
 Transaction log backup – In this, all events that have
occurred in the database, i.e. a record of every single
statement executed, are backed up. It is the backup of
transaction log entries and contains all transactions that
have happened to the database. Through this, the database
can be recovered to a specific point in time. It is even
possible to perform a backup from a transaction log if the
data files are destroyed, without losing even a single
committed transaction.

Crash Recovery
A DBMS is a highly complex system with hundreds of
transactions being executed every second. The durability and
robustness of a DBMS depend on its complex architecture and
its underlying hardware and system software. If it fails or
crashes amid transactions, the system is expected to follow
some algorithm or technique to recover the lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure
into various categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it
reaches a point from where it can’t go any further. This is called
transaction failure where only a few transactions or processes
are hurt.
Reasons for a transaction failure could be −
 Logical errors − Where a transaction cannot complete
because it has some code error or any internal error
condition.
 System errors − Where the database system itself
terminates an active transaction because the DBMS is not
able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource
unavailability, the system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause
the system to stop abruptly and cause the system to crash. For
example, interruptions in power supply may cause the failure
of underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
In the early days of technology evolution, it was a common
problem that hard-disk drives or storage drives used to fail
frequently.
Disk failures include formation of bad sectors, unreachability
of the disk, disk head crash, or any other failure that destroys
all or a part of disk storage.
Storage Structure
We have already described the storage system. In brief, the
storage structure can be divided into two categories −
 Volatile storage − As the name suggests, a volatile
storage cannot survive system crashes. Volatile storage
devices are placed very close to the CPU; normally they
are embedded onto the chipset itself. For example, main
memory and cache memory are examples of volatile
storage. They are fast but can store only a small amount of
information.
 Non-volatile storage − These memories are made to
survive system crashes. They are huge in data storage
capacity, but slower in accessibility. Examples may
include hard-disks, magnetic tapes, flash memory, and
non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being
executed and various files opened for them to modify the data
items. Transactions are made of various operations, which are
atomic in nature. But according to ACID properties of DBMS,
atomicity of transactions as a whole must be maintained, that
is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the
following −
 It should check the states of all the transactions which
were being executed.
 A transaction may be in the middle of some operation; the
DBMS must ensure the atomicity of the transaction in this
case.
 It should check whether the transaction can be completed
now or needs to be rolled back.
 No transaction should be allowed to leave the DBMS in
an inconsistent state.
There are two types of techniques which can help a DBMS in
recovering as well as maintaining the atomicity of a
transaction −
 Maintaining the logs of each transaction, and writing them
onto some stable storage before actually modifying the
database.
 Maintaining shadow paging, where the changes are done
in volatile memory, and later the actual database is
updated.
Log-based Recovery
A log is a sequence of records that maintains a record of the
actions performed by a transaction. It is important that the logs
are written prior to the actual modification and stored on a
stable storage medium, which is failsafe.
Log-based recovery works as follows −
 The log file is kept on a stable storage medium.
 When a transaction enters the system and starts execution,
it writes a log entry about it −
<Tn, Start>
 When the transaction modifies an item X, it writes a log
entry as follows −
<Tn, X, V1, V2>
It reads: Tn has changed the value of X from V1 to V2.
 When the transaction finishes, it logs −
<Tn, commit>
The database can be modified using two approaches −
 Deferred database modification − All logs are written on
to the stable storage and the database is updated when a
transaction commits.
 Immediate database modification − Each log follows an
actual database modification. That is, the database is
modified immediately after every operation.
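A minimal sketch of deferred database modification (the class and item names are assumptions for illustration): writes accumulate in the log and in a private workspace, and the database itself is touched only at commit, so a crash before commit requires no UNDO.

```python
# Sketch of deferred database modification (illustrative only).
db = {"A": 300}

class Transaction:
    def __init__(self, name):
        self.name = name
        self.workspace = {}            # local updates, not yet in db
        self.log = [("start", name)]

    def write(self, item, value):
        old = self.workspace.get(item, db[item])
        self.log.append(("write", self.name, item, old, value))
        self.workspace[item] = value   # db itself is NOT modified here

    def commit(self):
        self.log.append(("commit", self.name))
        db.update(self.workspace)      # updates applied only at commit

t = Transaction("T1")
t.write("A", 350)
assert db["A"] == 300                  # database unchanged before commit
t.commit()
assert db["A"] == 350                  # effect applied at commit
```

Under immediate modification, `write` would update `db` directly (after logging), which is why that scheme needs both undo and redo at recovery time.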
Recovery with Concurrent Transactions
When more than one transaction is being executed in parallel,
the logs are interleaved. At the time of recovery, it would
become hard for the recovery system to backtrack all the logs
and then start recovering. To ease this situation, most modern
DBMSs use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in real
environment may fill out all the memory space available in the
system. As time passes, the log file may grow too big to be
handled at all. Checkpoint is a mechanism where all the
previous logs are removed from the system and stored
permanently in a storage disk. A checkpoint declares a point
before which the DBMS was in a consistent state, and all the
transactions were committed.
Recovery
When a system with concurrent transactions crashes and
recovers, it behaves in the following manner −
 The recovery system reads the logs backwards from the
end to the last checkpoint.
 It maintains two lists, an undo-list and a redo-list.
 If the recovery system sees a log with <Tn, Start> and
<Tn, Commit>, or just <Tn, Commit>, it puts the
transaction in the redo-list.
 If the recovery system sees a log with <Tn, Start> but no
commit or abort log is found, it puts the transaction in the
undo-list.
All the transactions in the undo-list are then undone and their
logs are removed. All the transactions in the redo-list and their
previous logs are removed, and then the transactions are redone
before saving their logs.
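The checkpoint-based classification above can be sketched as follows (the log layout is an illustrative assumption, not a real recovery manager):

```python
# Sketch of building undo/redo lists from the log after a crash.
log = [
    ("start", "T1"), ("commit", "T1"),
    ("checkpoint",),
    ("start", "T2"), ("commit", "T2"),
    ("start", "T3"),                   # crashed before commit/abort
]

# Only the portion after the last checkpoint needs to be examined.
last_cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")

undo_list, redo_list = [], []
for rec in log[last_cp + 1:]:
    if rec[0] == "start":
        undo_list.append(rec[1])       # provisionally mark for undo
    elif rec[0] == "commit":
        undo_list.remove(rec[1])
        redo_list.append(rec[1])       # commit found: redo instead

print(undo_list, redo_list)   # ['T3'] ['T2']
```

T1 needs no work at all, because the checkpoint guarantees its effects were already safely in the database before the crash.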