Database Management Systems-1
DATABASE
MANAGEMENT SYSTEM
(DBMS)
(R-13 Autonomous)
SYLLABUS
(R15A0509) DATABASE MANAGEMENT SYSTEMS
Objectives:
To understand the basic concepts and the applications of database systems
To master the basics of SQL and construct queries using SQL
To understand the relational database design principles
To become familiar with the basic issues of transaction processing and concurrency control
To become familiar with database storage structures and access techniques
UNIT I:
Database System Applications, Purpose of Database Systems, View of Data – Data Abstraction –
Instances and Schemas – Data Models – the ER Model – Relational Model – Other Models –
Database Languages – DDL – DML – Database Access for Application Programs – Database Users
and Administrator – Transaction Management – Database Architecture – Storage Manager – the
Query Processor
Database Design and ER Diagrams – ER Model – Entities, Attributes and Entity Sets – Relationships
and Relationship Sets – ER Design Issues – Concept Design – Conceptual Design for University
Enterprise.
Introduction to the Relational Model – Structure – Database Schema, Keys – Schema Diagrams
UNIT II:
Relational Query Languages, Relational Operations.
Relational Algebra – Selection and Projection – Set Operations – Renaming – Joins – Division –
Examples of Algebra Overviews – Relational Calculus – Tuple Relational Calculus – Domain Relational
Calculus.
Overview of the SQL Query Language – Basic Structure of SQL Queries, Set Operations, Aggregate
Functions – GROUP BY – HAVING, Nested Subqueries, Views, Triggers.
UNIT III:
Normalization – Introduction, Non-loss Decomposition and Functional Dependencies, First, Second,
and Third Normal Forms – Dependency Preservation, Boyce/Codd Normal Form.
Higher Normal Forms – Introduction, Multi-valued Dependencies and Fourth Normal Form, Join
Dependencies and Fifth Normal Form
UNIT IV:
Transaction Concept – Transaction State – Implementation of Atomicity and Durability – Concurrent
Executions – Serializability – Recoverability – Implementation of Isolation – Testing for
Serializability – Lock-Based Protocols – Timestamp-Based Protocols – Validation-Based Protocols –
Multiple Granularity.
Recovery and Atomicity – Log-Based Recovery – Recovery with Concurrent Transactions – Buffer
Management – Failure with Loss of Nonvolatile Storage – Advanced Recovery Systems – Remote Backup
Systems.
UNIT V:
File Organization – Various Kinds of Indexes. Query Processing – Measures of
Query Cost – Selection Operation – Projection Operation – Join Operation – Set Operation and Aggregate
Operation – Relational Query Optimization – Translating SQL Queries – Estimating the Cost –
Equivalence Rules.
TEXT BOOKS:
1. Database System Concepts, Silberschatz, Korth, McGraw Hill, Sixth Edition (all units
except Unit III).
2. Database Management Systems, Raghu Ramakrishnan, Johannes Gehrke, Tata
McGraw Hill, 3rd Edition.
REFERENCE BOOKS:
1. Fundamentals of Database Systems, Elmasri, Navathe, Pearson Education.
2. An Introduction to Database Systems, C.J. Date, A. Kannan, S. Swamynathan, Pearson, Eighth
Edition (for Unit III).
Outcomes:
Demonstrate the basic elements of a relational database management system
Ability to identify the data models for relevant problems
Ability to design entity-relationship diagrams, convert them into an RDBMS schema, and
formulate SQL queries on the respective data
Apply normalization in the development of application software
UNIT-1
What is a Database?
To find out what a database is, we have to start from data, which is the basic building block of any DBMS.
Data: Facts, figures, statistics etc. having no particular meaning on their own (e.g. 1, ABC, 19).
Record: Collection of related data items. In the above example the three data items had no meaning by themselves, but
if we organize them in the following way, then they collectively represent meaningful information.
Roll Name Age
1 ABC 19
The columns of this relation are called Fields or Attributes; the set of values an attribute may take is its Domain.
The rows are called Tuples or Records.
Database: Collection of related relations. Consider the following collection of tables:
T1                        T2
Roll  Name  Age           Roll  Address
1     ABC   19            1     KOL
2     DEF   22            2     DEL
3     XYZ   28            3     MUM
T3 T4
We now have a collection of 4 tables. They can be called a “related collection” because we can clearly find out
that there are some common attributes existing in a selected pair of tables. Because of these common
attributes we may combine the data of two or more tables together to find out the complete details of a
student. Questions like “Which hostel does the youngest student live in?” can be answered now, although
Age and Hostel attributes are in different tables.
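Combining tables through a common attribute can be sketched with Python's built-in sqlite3 module. The example below is only an illustration: it loads T1 and T2 exactly as shown above and, because the contents of T3 and T4 are not reproduced here, asks for the address (rather than the hostel) of the youngest student.

```python
import sqlite3

# In-memory database holding the two tables T1 and T2 from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T1 (Roll INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
    CREATE TABLE T2 (Roll INTEGER PRIMARY KEY, Address TEXT);
    INSERT INTO T1 VALUES (1, 'ABC', 19), (2, 'DEF', 22), (3, 'XYZ', 28);
    INSERT INTO T2 VALUES (1, 'KOL'), (2, 'DEL'), (3, 'MUM');
""")

# Roll is the common attribute, so it is the join condition.
youngest = conn.execute("""
    SELECT T1.Name, T2.Address
    FROM T1 JOIN T2 ON T1.Roll = T2.Roll
    ORDER BY T1.Age ASC
    LIMIT 1
""").fetchone()

print(youngest)
```

Age lives only in T1 and Address only in T2, yet one query answers a question that needs both, because Roll links the two tables.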
A database in a DBMS could be viewed by lots of different people with different responsibilities.
For example, within a company there are different departments, as well as customers, who each need to see
different kinds of data. Each employee in the company will have different levels of access to the database with
their own customized front-end application.
In a relational database, data is organized in row and column format. The rows are called Tuples or Records. The
data items within one row may belong to different data types. The columns are called
Attributes or Fields; all the data items within a single attribute come from the same domain and are of the same data type.
A database-management system (DBMS) is a collection of interrelated data and a set of programs to access
those data. The collection of data, usually referred to as the database, contains information relevant to an
enterprise. By data, we mean known facts that can be recorded and that have implicit meaning. The primary goal of
a DBMS is to provide a way to store and retrieve database information that is both convenient and efficient.
The management system is important because without some set of rules it is
not possible to maintain the database. We have to decide which attributes should be included in a
particular table, which common attributes create the relationship between two tables, and which tables must be
updated when a record is inserted or deleted. These issues must be resolved by
rules that maintain the integrity of the database.
Database systems are designed to manage large bodies of information. Management of data involves both
defining structures for storage of information and providing mechanisms for the manipulation of information. In
addition, the database system must ensure the safety of the information stored, despite system crashes or
attempts at unauthorized access. If data are to be shared among several users, the system must avoid possible
anomalous results.
Because information is so important in most organizations, computer scientists have developed a large body of
concepts and techniques for managing data. These concepts and techniques form the focus of this book. This
chapter briefly introduces the principles of database systems.
Databases touch all aspects of our lives. Some of the major areas of application are as follows:
1. Banking
2. Airlines
3. Universities
4. Manufacturing and selling
5. Human resources
• Enterprise Information
◦ Sales: For customer, product, and purchase information.
◦ Accounting: For payments, receipts, account balances, assets and other accounting information.
◦ Human resources: For information about employees, salaries, payroll taxes, and benefits, and for generation
of paychecks.
◦ Manufacturing: For management of the supply chain and for tracking production of items in factories,
inventories of items in warehouses and stores, and orders for items.
◦ Online retailers: For sales data noted above plus online order tracking, generation of recommendation lists,
and maintenance of online product evaluations.
• Banking and Finance
◦ Banking: For customer information, accounts, loans, and banking transactions.
◦ Credit card transactions: For purchases on credit cards and generation of monthly statements.
◦ Finance: For storing information about holdings, sales, and purchases of financial instruments such as
stocks and bonds; also for storing real-time market data to enable online trading by customers and
automated trading by the firm.
• Universities: For student information, course registrations, and grades (in addition to standard enterprise
information such as human resources and accounting).
• Airlines: For reservations and schedule information. Airlines were among the first to use databases in a
geographically distributed manner.
• Telecommunication: For keeping records of calls made, generating monthly bills, maintaining balances on
prepaid calling cards, and storing information about the communication networks.
System programmers wrote these application programs to meet the needs of the university.
New application programs are added to the system as the need arises. For example, suppose that a university
decides to create a new major (say, computer science). As a result, the university creates a new department
and creates new permanent files (or adds information to existing files) to record information about all the
instructors in the department, students in that major, course offerings, degree requirements, etc. The university
may have to write new application programs to deal with rules specific to the new major. New application
programs may also have to be written to handle new rules in the university. Thus, as time goes by, the system
acquires more files and more application programs.
This typical file-processing system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records from, and add
records to, the appropriate files. Before database management systems (DBMSs) were introduced,
organizations usually stored information in such systems. Keeping organizational information in a file-
processing system has a number of major disadvantages:
Data redundancy and inconsistency. Since different programmers create the files and application programs
over a long period, the various files are likely to have different structures and the programs may be written in
several programming languages. Moreover, the same information may be duplicated in several places (files).
For example, if a student has a double major (say, music and mathematics) the address and telephone number
of that student may appear in a file that consists of student records of students in the Music department and in
a file that consists of student records of students in the Mathematics department. This redundancy leads to
higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of
the same data may no longer agree. For example, a changed student address may be reflected in the Music
department records but not elsewhere in the system.
Difficulty in accessing data. Suppose that one of the university clerks needs to find out the names of all
students who live within a particular postal-code area. The clerk asks the data-processing department to
generate such a list. Because the designers of the original system did not anticipate this request, there is no
application program on hand to meet it. There is, however, an application program to generate the list of all
students.
The university clerk now has two choices: either obtain the list of all students and extract the needed
information manually or ask a programmer to write the necessary application program. Both alternatives are
obviously unsatisfactory. Suppose that such a program is written, and that, several days later, the same clerk
needs to trim that list to include only those students who have taken at least 60 credit hours. As expected, a
program to generate such a list does not exist. Again, the clerk has the preceding two options, neither of which
is satisfactory. The point here is that conventional file-processing environments do not allow needed data to be
retrieved in a convenient and efficient manner. More responsive data-retrieval systems are required for general
use.
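By contrast, a database system with a query language lets both of the clerk's requests be answered with a small change to one query rather than a new application program. The following is a minimal sketch using Python's sqlite3 module; the student table, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema and rows; the notes do not define a student file layout.
conn.execute(
    "CREATE TABLE student ("
    "id INTEGER PRIMARY KEY, name TEXT, postal_code TEXT, credit_hours INTEGER)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Asha", "500100", 72),
        (2, "Ravi", "500100", 45),
        (3, "Meena", "110001", 90),
    ],
)

def students_in_area(postal_code, min_credits=0):
    """Names of students in a postal-code area with at least min_credits credit hours."""
    rows = conn.execute(
        "SELECT name FROM student "
        "WHERE postal_code = ? AND credit_hours >= ? ORDER BY name",
        (postal_code, min_credits),
    ).fetchall()
    return [name for (name,) in rows]

in_area = students_in_area("500100")      # first request: everyone in the area
trimmed = students_in_area("500100", 60)  # later request: at least 60 credit hours
print(in_area, trimmed)
```

The second request only adds one condition to the WHERE clause; no new file format or retrieval program is needed.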
Data isolation. Because data are scattered in various files, and files may be in different formats, writing new
application programs to retrieve the appropriate data is difficult.
Integrity problems. The data values stored in the database must satisfy certain types of consistency
constraints. Suppose the university maintains an account for each department, and records the balance
amount in each account. Suppose also that the university requires that the account balance of a department
may never fall below zero. Developers enforce these constraints in the system by adding appropriate code in
the various application programs. However, when new constraints are added, it is difficult to change the
programs to enforce them. The problem is compounded when constraints involve several data items from
different files.
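In a database system, a constraint such as "the balance may never fall below zero" can be declared once with the data instead of being repeated in every program. The sketch below uses a CHECK constraint in SQLite through Python's sqlite3 module; the department table and the amounts are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The rule is stated once, in the table definition (hypothetical schema).
conn.execute("""
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        balance   INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO department VALUES ('A', 10000)")
conn.commit()

# Any program that tries to overdraw the account is refused by the database itself.
try:
    conn.execute(
        "UPDATE department SET balance = balance - 20000 WHERE dept_name = 'A'"
    )
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

balance = conn.execute(
    "SELECT balance FROM department WHERE dept_name = 'A'"
).fetchone()[0]
print(rejected, balance)
```

Because the check lives in the schema, adding or changing a constraint does not require editing each application program.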
Atomicity problems. A computer system, like any other device, is subject to failure. In many applications, it is
crucial that, if a failure occurs, the data be restored to the consistent state that existed prior to the failure.
Consider a program to transfer $500 from the account balance of department A to the account balance of
department B. If a system failure occurs during the execution of the program, it is possible that the $500 was
removed from the balance of department A but was not credited to the balance of department B, resulting in an
inconsistent database state. Clearly, it is essential to database consistency that either both the credit and debit
occur, or that neither occur.
That is, the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to ensure
atomicity in a conventional file-processing system.
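The all-or-nothing behaviour described above can be sketched with a database transaction. The example uses Python's sqlite3 module, where a `with conn:` block commits if it finishes normally and rolls back if an exception escapes. The table layout, the starting balance of department B, and the simulated failure are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE department (dept_name TEXT PRIMARY KEY, balance INTEGER NOT NULL)"
)
# Department B's starting balance is a made-up figure.
conn.executemany("INSERT INTO department VALUES (?, ?)", [("A", 10000), ("B", 2000)])
conn.commit()

def transfer(src, dst, amount, fail_midway=False):
    """Move amount from src to dst as a single transaction."""
    try:
        with conn:  # commit on success, roll back if an exception is raised
            conn.execute(
                "UPDATE department SET balance = balance - ? WHERE dept_name = ?",
                (amount, src),
            )
            if fail_midway:
                # Stand-in for a system failure between the debit and the credit.
                raise RuntimeError("simulated failure")
            conn.execute(
                "UPDATE department SET balance = balance + ? WHERE dept_name = ?",
                (amount, dst),
            )
    except RuntimeError:
        pass  # the debit has already been rolled back

def balances():
    return dict(conn.execute("SELECT dept_name, balance FROM department"))

transfer("A", "B", 500, fail_midway=True)
after_failure = balances()   # neither the debit nor the credit took effect
transfer("A", "B", 500)
after_success = balances()   # both took effect
print(after_failure, after_success)
```

After the simulated failure the $500 has not disappeared from department A: the partial debit is undone, so the database returns to the consistent state that existed before the transfer began.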
Concurrent-access anomalies. For the sake of overall performance of the system and faster response, many
systems allow multiple users to update the data simultaneously. Indeed, today, the largest Internet retailers
may have millions of accesses per day to their data by shoppers. In such an environment, interaction of
concurrent updates is possible and may result in inconsistent data. Consider department A, with an account
balance of $10,000. If two department clerks debit the account balance (by say $500 and $100, respectively) of
department A at almost exactly the same time, the result of the concurrent executions may leave the budget in
an incorrect (or inconsistent) state. Suppose that the programs executing on behalf of each withdrawal read the
old balance, reduce that value by the amount being withdrawn, and write the result back. If the two programs
run concurrently, they may both read the value $10,000, and write back $9500 and $9900, respectively.
Depending on which one writes the value last, the account balance of department A may contain either $9500
or $9900, rather than the correct value of $9400. To guard against this possibility, the system must maintain
some form of supervision.
But supervision is difficult to provide because data may be accessed by many different application programs
that have not been coordinated previously.
As another example, suppose a registration program maintains a count of students registered for a course, in
order to enforce limits on the number of students registered. When a student registers, the program reads the
current count for the course, verifies that the count is not already at the limit, adds one to the count, and stores
the count back in the database. Suppose two students register concurrently, with the count at (say) 39. The two
program executions may both read the value 39, and both would then write back 40, leading to an incorrect
increase of only 1, even though two students successfully registered for the course and the count should be 41.
Furthermore, suppose the course registration limit was 40; in the above case both students would be able to
register, leading to a violation of the limit of 40 students.
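The department A example can be replayed in a few lines of Python. The first half executes the unsafe interleaving step by step (both programs read before either writes). The second half uses a lock as a simple stand-in for the "supervision" mentioned above; it is an illustrative assumption, not the concurrency-control protocols a real DBMS uses, which are covered in Unit IV.

```python
import threading

balance = {"A": 10000}

# Unsafe interleaving: both programs read the old balance before either writes.
read_1 = balance["A"]            # first clerk's program reads 10000
read_2 = balance["A"]            # second clerk's program also reads 10000
balance["A"] = read_1 - 500      # first program writes 9500
balance["A"] = read_2 - 100      # second program overwrites it with 9900
lost_update_result = balance["A"]

# Supervised version: the read-modify-write is done while holding a lock,
# so the two debits cannot interleave.
balance["A"] = 10000
lock = threading.Lock()

def debit(amount):
    with lock:
        current = balance["A"]
        balance["A"] = current - amount

threads = [threading.Thread(target=debit, args=(amount,)) for amount in (500, 100)]
for t in threads:
    t.start()
for t in threads:
    t.join()
safe_result = balance["A"]

print(lost_update_result, safe_result)
```

In the unsafe replay the $500 debit is lost because the second write is based on a stale read; with the lock, each debit sees the result of the previous one and the balance ends at the correct $9400.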
Security problems. Not every user of the database system should be able to access all the data. For example,
in a university, payroll personnel need to see only that part of the database that has financial information. They
do not need access to information about academic records. But, since application programs are added to the
file-processing system in an ad hoc manner, enforcing such security constraints is difficult.
These difficulties, among others, prompted the development of database systems. In what follows, we shall see
the concepts and algorithms that enable database systems to solve the problems with file-processing systems.
Advantages of DBMS:
Controlling of Redundancy: Data redundancy refers to the duplication of data (i.e. storing the same data multiple
times). In a database system, a centralized database under the centralized control of the database administrator
(DBA) avoids unnecessary duplication of data. This saves storage space and eliminates the extra time needed to
process the duplicated data.
Improved Data Sharing : DBMS allows a user to share the data in any number of application programs.
Data Integrity : Integrity means that the data in the database is accurate and consistent. Centralized control of
the data lets the administrator define integrity constraints on the data in the database. For example, in a
customer database we can enforce a constraint that customers are accepted only from the cities of Noida and
Meerut.
Security : Having complete authority over the operational data enables the DBA to ensure that the only
means of access to the database is through proper channels. The DBA can define authorization checks to be
carried out whenever access to sensitive data is attempted.
Data Consistency : By eliminating data redundancy, we greatly reduce the opportunities for inconsistency. For
example, if a customer address is stored only once, we cannot have disagreement on the stored values. Also,
updating data values is greatly simplified when each value is stored in one place only. Finally, we avoid the
wasted storage that results from redundant data storage.
Efficient Data Access : In a database system, the data is managed by the DBMS and all access to the data is
through the DBMS, which provides a key to effective data processing.
Enforcement of Standards : With centralized control of data, the DBA can establish and enforce data standards,
which may include naming conventions, data quality standards etc.
Data Independence : In a database system, the database management system provides the interface between
the application programs and the data. When changes are made to the data representation, the metadata
maintained by the DBMS is changed, but the DBMS continues to provide the data to application programs in the
previously used way. The DBMS handles the task of transforming the data wherever necessary.
Reduced Application Development and Maintenance Time : A DBMS supports many important functions that
are common to many applications accessing data stored in the DBMS, which facilitates the quick development
of applications.
Disadvantages of DBMS
1) It is complex. Since it supports a wide range of functionality, the underlying software
has become complex. The designers and developers need a thorough knowledge of the software
to get the most out of it.
2) Because of its complexity and functionality, it occupies a large amount of storage. It also needs a large amount
of memory to run efficiently.
3) A DBMS typically works as a centralized system, i.e. all the users access the same
database. Hence any failure of the DBMS will impact all the users.
4) A DBMS is generalized software, i.e. it is written to work across many kinds of systems rather than for one
specific application. Hence some applications may run slower than they would on a purpose-built system.
View of Data
A database system is a collection of interrelated data and a set of programs that allow users to access and
modify these data. A major purpose of a database system is to provide users with an abstract view of the data.
That is, the system hides certain details of how the data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use
complex data structures to represent data in the database. Since many database-system users are not
computer trained, developers hide the complexity from users through several levels of abstraction, to simplify
users’ interactions with the system:
• Physical level (or Internal View / Schema): The lowest level of abstraction describes how the data are
actually stored. The physical level describes complex low-level data structures in detail.
• Logical level (or Conceptual View / Schema): The next-higher level of abstraction describes what data are
stored in the database, and what relationships exist among those data. The logical level thus describes the
entire database in terms of a small number of relatively simple structures. Although implementation of the
simple structures at the logical level may involve complex physical-level structures, the user of the logical level
does not need to be aware of this complexity. This is referred to as physical data independence. Database
administrators, who must decide what information to keep in the database, use the logical level of abstraction.
• View level (or External View / Schema): The highest level of abstraction describes only part of the entire
database. Even though the logical level uses simpler structures, complexity remains because of the variety of
information stored in a large database. Many users of the database system do not need all this information;
instead, they need to access only a part of the database. The view level of abstraction exists to simplify their
interaction with the system. The system may provide many views for the same database. Figure 1.2 shows the
relationship among the three levels of abstraction.
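The logical and view levels can be illustrated with a small sketch in Python's sqlite3 module. The table below plays the role of the logical schema, and the view exposes only part of it. The table name, the view name instructor_directory, and the sample rows are hypothetical; how SQLite lays the rows out on disk (the physical level) never appears in the code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Logical level: what data are stored (hypothetical schema and rows).
    CREATE TABLE instructor (
        ID        TEXT PRIMARY KEY,
        name      TEXT,
        dept_name TEXT,
        salary    INTEGER
    );
    INSERT INTO instructor VALUES ('10101', 'Instructor One', 'Comp. Sci.', 65000);
    INSERT INTO instructor VALUES ('12121', 'Instructor Two', 'Finance', 90000);

    -- View level: a directory that hides the salary column from its users.
    CREATE VIEW instructor_directory AS
        SELECT ID, name, dept_name FROM instructor;
""")

cursor = conn.execute("SELECT * FROM instructor_directory ORDER BY ID")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()
print(view_columns)
print(view_rows)
```

Users who query the view see three columns and have no way to tell that a salary column exists; several such views can be defined over the same table for different groups of users.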
An analogy to the concept of data types in programming languages may clarify the distinction among levels of
abstraction. Many high-level programming languages support the notion of a structured type. For example, we
may describe a record as follows:
type instructor = record
ID : char (5);
name : char (20);
dept_name : char (20);
salary : numeric (8,2);
end;
This code defines a new record type called instructor with four fields. Each field has a name and a type
associated with it. A university organization may have several such record types, including