Lecture 1 Introduction To Database Systems
Objectives
The objective of this lecture is to introduce database systems: their significance, the underlying theoretical principles that govern them, how they are constructed, and how they are managed. After this lecture, students will understand these fundamentals of database systems.
Reference Reading
• C. J. Date, “An Introduction to Database Systems”, 8th edition, 2015
• Chapter 1: An Overview of Database Management
• Ramakrishnan, Gehrke, “Database Management Systems”, 3rd Edition
• Chapter 1: Overview of Database Systems
• Abraham Silberschatz, Henry F. Korth, S. Sudarshan, “Database System Concepts”, 7th Edition
• Chapter 1: Introduction
• Elvis C. Foster, Shripad Godbole, “Database Systems, A Pragmatic Approach”, Second Edition, 2016
• Chapter 1: Introduction to Database Systems
1. Introduction
Most organizations store many files that contain the data they need to operate their businesses; for
example, businesses often need to maintain files containing data about employees, customers, inventory
items, and orders. Many organizations use a database to organize the information in these files. A database
holds a group of files that an organization needs to support its applications.
A database management system (DBMS) is computerized record-keeping software for creating, manipulating, and accessing a database. The overall purpose of a database system (DBS) is to maintain information and make it available whenever applications need to access the data. At the same time, it guarantees certain properties of the data and of those accesses.
The database management system consists of two parts. They are:
1. Database: a collection of related data about facts, figures, statistics, etc.
2. Management System: a software system that manages the data in the database.
Before studying databases further, it is important to distinguish between data and information.
Data refers to the raw materials that software systems act on in order to produce useful
information to end users.
o Example: Customer data:
o (1) Customer Name (cname), (2) Customer No (cno), (3) Customer City (ccity).
Information is processed and assimilated data that conveys meaning to its intended users.
o Example: customers who live in New York City.
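The distinction can be made concrete with a small Python sketch. The customer records below are hypothetical sample values, not data from the lecture. The raw tuples are data. The filtered list of names answers a question a user cares about, which makes it information.

```python
# Raw data: customer records as (cno, cname, ccity) tuples (hypothetical sample values)
customers = [
    (1, "Alice", "New York"),
    (2, "Bob", "Boston"),
    (3, "Carol", "New York"),
]

def customers_in(city: str) -> list[str]:
    """Process the raw data into information: names of customers living in `city`."""
    return [cname for (cno, cname, ccity) in customers if ccity == city]

print(customers_in("New York"))  # ['Alice', 'Carol']
```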
A database is an organized collection of data, also called structured data, that is stored in and accessed through a computer system. The data can be facts, figures, statistics, etc.
o Example: Customer data:
o (1) Customer Name (cname), (2) Customer No (cno), (3) Customer City (ccity).
o These can be stored in the form of a table as structured data.
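As an illustration, the sketch below stores the customer data as a table with one column per data item. It assumes SQLite, accessed through Python's standard `sqlite3` module, as a stand-in DBMS; the sample rows are hypothetical.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS in this sketch.
con = sqlite3.connect(":memory:")

# The customer data from the example, stored as a table with one column per data item.
con.execute("""
    CREATE TABLE customer (
        cno   INTEGER PRIMARY KEY,  -- customer number
        cname TEXT NOT NULL,        -- customer name
        ccity TEXT                  -- customer city
    )
""")
con.executemany(
    "INSERT INTO customer (cno, cname, ccity) VALUES (?, ?, ?)",
    [(1, "Alice", "New York"), (2, "Bob", "Boston"), (3, "Carol", "New York")],
)

# Each row is one customer; each column is one data item.
for row in con.execute("SELECT cno, cname, ccity FROM customer ORDER BY cno"):
    print(row)
```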
A database is basically just a computerized filing cabinet, that is, a repository or container for a collection of related computerized data files. The data in a database is referred to as “persistent” because of how long it lasts. Once data has been entered into the database by the database management system (DBMS), it can subsequently be removed only by some explicit request to the DBMS. A more precise definition of the term database is therefore:
“A database is a collection of persistent data that is used by the application systems of some
given enterprise.”
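Persistence can be observed directly. The sketch below assumes SQLite (via Python's `sqlite3`) as the DBMS and a hypothetical `account` table. It writes a row to a file-backed database and closes the connection. A new connection still finds the row, and the row disappears only after an explicit `DELETE` request.

```python
import os
import sqlite3
import tempfile

# A file-backed database, so the data outlives the program's connection.
path = os.path.join(tempfile.mkdtemp(), "enterprise.db")

con = sqlite3.connect(path)
con.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL)")
con.execute("INSERT INTO account VALUES (101, 250.0)")
con.commit()   # make the change durable
con.close()    # the program "ends" here

# Later, a new connection finds the data still there.
con = sqlite3.connect(path)
rows_after_reopen = con.execute("SELECT * FROM account").fetchall()
print(rows_after_reopen)  # [(101, 250.0)]

# The data is removed only through an explicit request to the DBMS.
con.execute("DELETE FROM account WHERE acc_no = 101")
con.commit()
remaining = con.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(remaining)  # 0
con.close()
```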
The term “enterprise” here means some organization or individual, for example a manufacturing company, a bank, a hospital, a university, or a government department.
The “persistent data” in those enterprises includes product data, account data, patient data, student data, and planning data.
(2) What is the Management System?
A Database Management System (DBMS) is a software system designed to manage a database, that is, to maintain and use large collections of data for easy, efficient, and reliable data processing and management. The software allows users to store, retrieve, and manage data efficiently.
It acts as an intermediary between the user and the data, ensuring organized and secure data handling. The software performs a variety of operations at users’ request, such as:
- adding new files to the database
- inserting data into existing files
- retrieving data from existing files
- deleting data from existing files
- changing data in existing files
- removing existing files from the database
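The six operations above map naturally onto SQL statements. The sketch below assumes a relational DBMS (SQLite, via Python's `sqlite3`), where a "file" corresponds to a table. The `employee` table and its rows are hypothetical.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Adding a new file to the database -> create a table
con.execute("CREATE TABLE employee (eno INTEGER PRIMARY KEY, ename TEXT, salary INTEGER)")

# Inserting data into an existing file
con.execute("INSERT INTO employee VALUES (1, 'Dana', 4000)")
con.execute("INSERT INTO employee VALUES (2, 'Eli', 3500)")

# Changing data in an existing file
con.execute("UPDATE employee SET salary = salary + 500 WHERE eno = 2")

# Deleting data from an existing file
con.execute("DELETE FROM employee WHERE eno = 1")

# Retrieving data from an existing file
remaining_rows = con.execute("SELECT eno, ename, salary FROM employee").fetchall()
print(remaining_rows)  # [(2, 'Eli', 4000)]

# Removing an existing file from the database -> drop the table
con.execute("DROP TABLE employee")
```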
In earlier times, data was stored and retrieved using files in a typical file system. For example: A
company might keep separate files for employees’ details, customer information, and daily sales. These
files could be stored as text documents, spreadsheets, or printed records in cabinets. This approach worked
fine for small amounts of data but became challenging as the volume of data increased. File systems were
the natural choice for several reasons:
Simplicity: It was easy to create and manage files without requiring specialized software.
Low Cost: There was no need to invest in additional tools or training to use file systems.
Direct Access: Users could access files directly from storage devices.
(6) Concurrency Issues
Multiple users could not access or update files simultaneously without causing conflicts or data loss.
To address these challenges, the Database Management System (DBMS) was developed. A DBMS is
software that allows users to store, retrieve, and manage data efficiently. It acts as an intermediary between
the user and the data, ensuring organized and secure data handling.
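The concurrency problem is one place where the DBMS visibly helps. The sketch below assumes SQLite (via Python's `sqlite3`) and a hypothetical `stock` table, with two connections playing two users. While user A holds a write transaction, the DBMS refuses user B's attempt to write instead of letting the updates collide. Once A commits, B's update applies on top of A's result. SQLite locks the whole database file; larger DBMSs typically lock at a finer granularity, such as individual rows.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "shared.db")

# Two independent connections act as two concurrent users.
# timeout=0: fail immediately instead of waiting for a lock.
# isolation_level=None: this script issues BEGIN/COMMIT itself.
user_a = sqlite3.connect(path, timeout=0, isolation_level=None)
user_b = sqlite3.connect(path, timeout=0, isolation_level=None)
user_a.execute("CREATE TABLE stock (item TEXT PRIMARY KEY, qty INTEGER)")
user_a.execute("INSERT INTO stock VALUES ('widget', 10)")

user_a.execute("BEGIN IMMEDIATE")           # user A starts a write transaction
user_a.execute("UPDATE stock SET qty = qty - 3 WHERE item = 'widget'")

try:
    user_b.execute("BEGIN IMMEDIATE")       # user B tries to write at the same time
    blocked = False
except sqlite3.OperationalError as exc:     # the DBMS refuses: the database is locked
    blocked = True
    print("user B must wait:", exc)

user_a.execute("COMMIT")                    # A finishes; the lock is released

user_b.execute("BEGIN IMMEDIATE")           # now B's update proceeds on A's result
user_b.execute("UPDATE stock SET qty = qty - 2 WHERE item = 'widget'")
user_b.execute("COMMIT")

final_qty = user_a.execute("SELECT qty FROM stock").fetchone()[0]
print(final_qty)  # 5
```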
Here are the key benefits that DBMS brought compared to traditional file systems:
Example: Customers and their orders can be linked using a “customer ID.”
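The customer/order example can be sketched with SQLite (via Python's `sqlite3`). The tables and rows are hypothetical. The `cno` column in `orders` is the linking customer ID, declared as a foreign key. The DBMS uses it both to join the two tables and to reject an order that points at a customer who does not exist.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when asked
con.execute("CREATE TABLE customer (cno INTEGER PRIMARY KEY, cname TEXT)")
con.execute("""
    CREATE TABLE orders (
        ono  INTEGER PRIMARY KEY,
        cno  INTEGER NOT NULL REFERENCES customer(cno),  -- the linking customer ID
        item TEXT
    )
""")
con.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
con.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(100, 1, "laptop"), (101, 1, "mouse"), (102, 2, "desk")])

# Follow the link: which customer placed each order?
rows = con.execute("""
    SELECT c.cname, o.item
    FROM orders AS o JOIN customer AS c ON c.cno = o.cno
    ORDER BY o.ono
""").fetchall()
print(rows)  # [('Alice', 'laptop'), ('Alice', 'mouse'), ('Bob', 'desk')]

# An order for a non-existent customer is rejected by the DBMS.
try:
    con.execute("INSERT INTO orders VALUES (103, 99, 'chair')")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)
```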
2.3 Comparison of File Systems and DBMS
Databases are essential to software engineering; many software systems have underlying databases that are
constantly accessed, though in a manner that is transparent to the end user. Table 1-1 provides some
examples. Companies that compete in the marketplace need databases to store and manage their mission-critical and other essential data.
Expert Systems: At the core of an expert system is a knowledge base containing cognitive data. An inference engine accesses and uses this data to draw conclusions based on input fed to the system.
CASE and RAD Tools: Like desktop applications, computer-aided software engineering (CASE) tools and rapid application development (RAD) tools rely on complex resource databases to service user requests and provide their features.
DBMS Suites: Like CASE and RAD tools, a DBMS also relies on complex resource databases to service user requests and provide its features. Additionally, a DBMS maintains a very sophisticated meta database, called a data dictionary or system catalog, for each user database that is created and managed via the DBMS.
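Most DBMSs let you query the system catalog like ordinary data. The sketch below assumes SQLite (via Python's `sqlite3`), whose catalog is the `sqlite_master` table, plus a hypothetical `patient` table and index. Other DBMSs expose their catalogs under different names, such as `information_schema` views.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patient (pid INTEGER PRIMARY KEY, pname TEXT, ward TEXT)")
con.execute("CREATE INDEX idx_patient_ward ON patient (ward)")

# sqlite_master is SQLite's system catalog: one row per table, index, view, or trigger.
catalog = con.execute("SELECT type, name FROM sqlite_master ORDER BY name").fetchall()
print(catalog)  # [('index', 'idx_patient_ward'), ('table', 'patient')]

# Column-level metadata: PRAGMA table_info returns (cid, name, type, notnull, default, pk).
columns = [row[1] for row in con.execute("PRAGMA table_info(patient)")]
print(columns)  # ['pid', 'pname', 'ward']
```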
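Figure 1-2 illustrates a simplified picture of a database system. The main components of a DBS include the hardware, the operating system, the DBMS, the database itself, the application programs, and the end users.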
Database users communicate with the software systems/applications, which in turn communicate
(through the programming interface) with the DBMS. The DBMS communicates with the operating system
(which in turn communicates with the hardware) to store data in and/or fetch data from the database. Figure
1-2 illustrates these basic concepts and Figure 1-3 shows data flows between components of DBS.
Figure 1-3. Communication among components of a DBS
There are several primary and secondary objectives of a database system that should concern the
computer science (CS) professional. Whether you are planning to design, construct, develop, and implement
a DBS, or you are simply shopping around for a DBMS, these objectives help you understand database systems and give useful insight into where the course is heading. As you will soon see, these objectives are
lofty, and it is by no means easy to achieve them all.
Security and protection, which include preventing access by unauthorized users and protecting against inter-process interference
Reliability, which is the assurance of stable, predictable performance
Facilitation of multiple users
Flexibility, including the ability to obtain data and effect action via various methods
Ease of data access and data change
Accuracy and consistency
Clarity, which includes standardization of data to avoid ambiguity
Ability to service unanticipated requests
• Protection of the investment, typically achieved through backup and recovery procedures
• Minimization of data proliferation, so new application needs may be met with existing data rather
than creating new files and programs
• Availability, so that data is available to users whenever it is required
Physical Data Independence: Storage hardware and storage techniques are insulated from
application programs.
Logical Data Independence: Data items can be added or removed, or the overall logical structure modified, without affecting existing application programs that access the database.
Control of Redundancy: The general rule is to store data minimally and not replicate that storage
in multiple places unless this is absolutely necessary.
Integrity Controls: Range checks and other controls must prevent invalid data from entering the
system.
Clear Data Definition: It is customary to maintain a data dictionary that unambiguously defines
each data item stored in the database.
A Suitably Friendly User Interface: It can be graphical, command-based, or menu-based.
Tunability: The database can easily be reorganized to improve performance without changing the application programs.
Automatic Reorganization or Migration: This improves performance.
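The "Integrity Controls" objective above can be sketched with declarative range checks. The sketch assumes SQLite (via Python's `sqlite3`), a hypothetical `student` table, and made-up valid ranges (age 16 to 99, GPA 0.0 to 4.0). The checks are declared once in the schema, and the DBMS enforces them for every program that inserts data.

```python
import sqlite3

con = sqlite3.connect(":memory:")
# Range checks live in the schema and are enforced by the DBMS for every program.
con.execute("""
    CREATE TABLE student (
        sno   INTEGER PRIMARY KEY,
        sname TEXT NOT NULL,
        age   INTEGER CHECK (age BETWEEN 16 AND 99),        -- hypothetical valid range
        gpa   REAL    CHECK (gpa >= 0.0 AND gpa <= 4.0)     -- hypothetical valid range
    )
""")
con.execute("INSERT INTO student VALUES (1, 'Fay', 20, 3.5)")   # valid: accepted

rejected = []
for bad_row in [(2, "Gus", 12, 3.0), (3, "Hal", 21, 4.7)]:      # age too low, GPA too high
    try:
        con.execute("INSERT INTO student VALUES (?, ?, ?, ?)", bad_row)
    except sqlite3.IntegrityError:                              # CHECK constraint failed
        rejected.append(bad_row[0])

print(rejected)  # [2, 3]
print(con.execute("SELECT COUNT(*) FROM student").fetchone()[0])  # 1
```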
A database system brings a number of advantages to its end users as well as to the company that owns it. Some of these advantages are discussed below.
A very important advantage of using a DBMS is that it offers data independence. Data independence is the separation of the database's data structure from the application programs that use the data. In a DBMS, you can easily change the structure of the database without modifying the application programs. In other words, data independence protects application programs from changes in the structure and access strategy of the data.
The architecture of the database systems provides a basis for achieving this data independence. It is
necessary for the following reasons:
Different applications and users will need to have different logical views (interpretation) of data.
The tuning of the system should not affect the application programs.
(1) Physical data independence implies that the user’s view is independent of physical file organization,
machine, or storage medium.
(2) Logical data independence implies that each user (or application program) can have his/her (its) own
logical view and does not need a global view of the database.
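Both kinds of data independence can be illustrated with SQLite (via Python's `sqlite3`). The table, view, and `application_query` function below are hypothetical. The application sees the data only through its own logical view (a SQL view). Adding an index, a physical tuning change, and adding a column, a logical change, leave the application's code and results untouched.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customer (cno INTEGER PRIMARY KEY, cname TEXT, ccity TEXT)")
con.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(1, "Alice", "New York"), (2, "Bob", "Boston")])

# One user's logical view of the data; the application only ever queries this view.
con.execute("CREATE VIEW ny_customers AS "
            "SELECT cno, cname FROM customer WHERE ccity = 'New York'")

def application_query(con):
    """Hypothetical application program: it knows the view, not the base table."""
    return con.execute("SELECT cname FROM ny_customers ORDER BY cno").fetchall()

before = application_query(con)

# Physical change (tuning): add an index. The application is not modified.
con.execute("CREATE INDEX idx_customer_city ON customer (ccity)")

# Logical change: add a new data item to the base table. The application is not modified.
con.execute("ALTER TABLE customer ADD COLUMN email TEXT")

after = application_query(con)
print(before, after)  # [('Alice',)] [('Alice',)]
```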
The traditional approach to database design is the conventional file processing system, which involves many files. Figure 1-4 illustrates the idea of the conventional file approach. Application programs exist to update files or retrieve information from files.
The main problem with the traditional approach is the absence of data independence. To illustrate
the problem, consider for a moment an information system consisting of 30 data files and 150 application
programs that manipulate those files. Suppose that each data file impacts 10–15 application programs.
Whenever it becomes necessary to adjust the structure of a data file in any way, it will be necessary to track
down 10–15 application programs and adjust them as well. This is clearly a very inefficient way of managing a complex software system that may depend on a far more complex database.
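To see why a layout change ripples through every program, consider a hypothetical comma-separated customer file. The program below hard-codes the position of each field, as file-processing programs typically did. When the file structure changes, the unchanged program silently reads the wrong data item. Each of the 10–15 programs that touch the file would need a similar fix.

```python
# Each program hard-codes the record layout of the data file (comma-separated fields).
OLD_FILE = ["C001,Alice,New York", "C002,Bob,Boston"]

def mailing_city(record: str) -> str:
    """Program 1 of many: it 'knows' that the city is the 3rd field."""
    return record.split(",")[2]

print([mailing_city(r) for r in OLD_FILE])   # ['New York', 'Boston']

# The file structure is adjusted: an e-mail field is inserted after the name.
NEW_FILE = ["C001,Alice,alice@example.com,New York", "C002,Bob,bob@example.com,Boston"]

# The unchanged program now returns the wrong data item, with no error raised.
print([mailing_city(r) for r in NEW_FILE])   # ['alice@example.com', 'bob@example.com']
```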
The data to be stored in the database can be defined in terms of a data model or database model, that is, the model underlying the structure of a database. A database model shows the logical structure of a database,
including the data relationships, data semantics and consistency constraints that determine how data can be
stored and accessed. Individual database models are designed based on the rules and concepts of
whichever broader data model the designers adopt. Most data models can be represented by an
accompanying database diagram.
There are many kinds of data models. Some of the most common ones include:
Figure 1-5. Hierarchical database model
(4) Object-oriented database model
This model defines a database as a collection of objects, or reusable software elements, with
associated features and methods. There are several kinds of object-oriented databases:
A multimedia database incorporates media, such as images, that could not be stored in a relational
database.
A hypertext database allows any object to link to any other object. It’s useful for organizing lots of
disparate data, but it’s not ideal for numerical analysis.
The object-oriented database model is the best-known post-relational database model, since it incorporates tables but isn't limited to tables. Such models are also known as hybrid database models.
Figure 1-9. The difference between Relational and object-oriented types of Database Models
Figure 1-10. Entity relationship model for database design
(6) The object-relational model, which combines the two that make up its name
This hybrid database model combines the simplicity of the relational model with some of the
advanced functionality of the object-oriented database model. In essence, it allows designers to incorporate
objects into the familiar table structure.
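The sketch below only mimics the object-relational idea; it is not a real object-relational DBMS. SQLite (via Python's `sqlite3`) has no object types of its own. Instead, an application-defined `Point` object is stored in an ordinary table column through `sqlite3`'s adapter/converter hooks and comes back out as a `Point`. The `Point` class and `branch` table are hypothetical. A true object-relational DBMS provides user-defined types inside the database itself.

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Point:
    """Hypothetical application-defined type to be stored in a table column."""
    x: float
    y: float

def adapt_point(p: Point) -> str:
    return f"{p.x};{p.y}"                      # object -> value the DBMS can store

def convert_point(raw: bytes) -> Point:
    x, y = map(float, raw.split(b";"))        # stored value -> object
    return Point(x, y)

sqlite3.register_adapter(Point, adapt_point)
sqlite3.register_converter("point", convert_point)

# PARSE_DECLTYPES makes sqlite3 apply the converter to columns declared as "point".
con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
con.execute("CREATE TABLE branch (name TEXT, location point)")
con.execute("INSERT INTO branch VALUES (?, ?)", ("Downtown", Point(40.7, -74.0)))

name, location = con.execute("SELECT name, location FROM branch").fetchone()
print(name, location)  # Downtown Point(x=40.7, y=-74.0)
```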
Inverted file model: A database built with the inverted file structure is designed to facilitate
fast full text searches. In this model, data content is indexed as a series of keys in a lookup
table, with the values pointing to the location of the associated files. This structure can
provide nearly instantaneous reporting in big data and analytics, for instance.
Flat model: The flat model is the earliest, simplest data model. It simply lists all the data in a
single table, consisting of columns and rows. In order to access or manipulate the data, the
computer has to read the entire flat file into memory, which makes this model inefficient for
all but the smallest data sets.
Multidimensional model: This is a variation of the relational model designed to facilitate
improved analytical processing. While the relational model is optimized for online
transaction processing (OLTP), this model is designed for online analytical processing
(OLAP). Each cell in a dimensional database contains data about the dimensions tracked by
the database. Visually, it’s like a collection of cubes, rather than two-dimensional tables.
Semi-structured model: In this model, the structural data usually contained in the database
schema is embedded with the data itself. Here the distinction between data and schema is
vague at best. This model is useful for describing systems, such as certain Web-based data
sources, which we treat as databases but cannot constrain with a schema. It’s also useful for
describing interactions between databases that don’t adhere to the same schema.
Context model: This model can incorporate elements from other database models as needed.
It cobbles together elements from object-oriented, semi-structured, and network models.
Associative model: This model divides all the data points based on whether they describe an
entity or an association. In this model, an entity is anything that exists independently,
whereas an association is something that only exists in relation to something else. The
associative model structures the data into two sets:
(1) A set of items, each with a unique identifier, a name, and a type
(2) A set of links, each with a unique identifier and the unique identifiers of a source, verb, and
target. The stored fact has to do with the source, and each of the three identifiers may refer
either to a link or an item.
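The associative model's two sets can be sketched with plain Python dictionaries. This is a toy illustration, not a real associative DBMS, and all items, verbs and dates are hypothetical. Items carry an identifier, a name and a type. Links carry an identifier plus the identifiers of a source, a verb and a target. Link 11 shows that a source may itself be a link. Identifiers are kept disjoint across the two sets so that each one is unambiguous.

```python
# Items: unique identifier -> (name, type)
items = {
    1: ("Flight BA1234", "Flight"),
    2: ("Heathrow", "Airport"),
    3: ("departs from", "Verb"),
    4: ("on", "Verb"),
    5: ("10 Aug", "Date"),
}

# Links: unique identifier -> (source id, verb id, target id).
# Any of the three ids may refer to an item or to another link.
links = {
    10: (1, 3, 2),    # Flight BA1234 departs from Heathrow
    11: (10, 4, 5),   # (that fact) on 10 Aug: a link whose source is another link
}

def render(ident: int) -> str:
    """Turn an item or link identifier back into readable text."""
    if ident in items:
        return items[ident][0]
    source, verb, target = links[ident]
    return f"({render(source)} {render(verb)} {render(target)})"

print(render(11))  # ((Flight BA1234 departs from Heathrow) on 10 Aug)
```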
Other, less common database models include:
Semantic model, which includes information about how the stored data relates to the real world
XML database, which allows data to be specified and even stored in XML format
Named graph
Triplestore
The hierarchical model, network model, and the inverted-list model are outdated database approaches.
The relational model and the object-oriented model are dominant contemporary database approaches. In
addition to these database approaches, there are three emerging database approaches that are worthy
of mention. They are summarized below:
1) Hadoop: This describes a framework for handling distributed processing of large data sets.
2) Entity-Attributes-Value (EAV) Model: This approach reduces a database to three principal
storage entities: an entity for defining other entities; an entity for defining properties (attributes) of
entities; and an EAV entity that connects the other two entities and stores values for entity-attribute combinations.
3) NoSQL: This approach refers to a family of non-relational database approaches that are designed
for managing large data sets, while providing benefits such as flexibility, scalability, availability,
lower costs, and special capabilities.
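The three EAV storage entities described in item 2) can be sketched as three tables in SQLite (via Python's `sqlite3`). The table names, the patient entities and the attributes are hypothetical. Adding a new attribute only adds a row to `attribute`, with no change to the schema. The cost is that reading one entity's attributes requires reassembling them with a join.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE entity    (eid INTEGER PRIMARY KEY, ename TEXT);   -- defines entities
    CREATE TABLE attribute (aid INTEGER PRIMARY KEY, aname TEXT);   -- defines attributes
    CREATE TABLE eav (                                              -- connects them, stores values
        eid   INTEGER REFERENCES entity(eid),
        aid   INTEGER REFERENCES attribute(aid),
        value TEXT,
        PRIMARY KEY (eid, aid)
    );
    INSERT INTO entity    VALUES (1, 'patient 1'), (2, 'patient 2');
    INSERT INTO attribute VALUES (1, 'blood_type'), (2, 'allergy');
    INSERT INTO eav VALUES (1, 1, 'O+'), (1, 2, 'penicillin'), (2, 1, 'A-');
""")

def attributes_of(eid: int) -> dict:
    """Reassemble one entity's attribute/value pairs from the EAV rows."""
    rows = con.execute("""
        SELECT a.aname, v.value FROM eav AS v JOIN attribute AS a ON a.aid = v.aid
        WHERE v.eid = ? ORDER BY a.aid
    """, (eid,))
    return dict(rows.fetchall())

print(attributes_of(1))  # {'blood_type': 'O+', 'allergy': 'penicillin'}
print(attributes_of(2))  # {'blood_type': 'A-'}
```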
In addition to the object database model, other non-SQL models have emerged in contrast to the
relational model. NoSQL databases are especially useful when working with large or fast-moving data
that may not fit neatly into a table. NoSQL databases use various data models for accessing and
managing data. These databases are optimized for applications needing flexible data models, handling
large volumes of data, and achieving low latency. They accomplish this by relaxing some of the data
consistency restrictions found in relational databases, making them ideal for dynamic, high-performance
applications that require scalability and speed. Instead of tables, NoSQL databases use more flexible
data models, such as key-value pairs, documents, or graphs. They offer scalability and flexibility,
making them suitable for handling large amounts of unstructured or semi-structured data. Examples include
MongoDB, Couchbase, Cassandra, and Redis. Some NoSQL models are:
The graph database model, which is even more flexible than a network model, allowing any node
to connect with any other.
The multi-value model, which breaks from the relational model by allowing attributes to contain a list of data rather than a single data point (for example, Key-Value and Wide-Column stores).
The document model, also called Document-oriented, which is designed for storing and
managing documents or semi-structured data, rather than atomic data.
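The document model can be sketched as a toy in-memory collection of JSON documents in Python. The `insert` and `find` helpers are hypothetical and are not the API of MongoDB or any other product. The two documents have different fields and need no shared schema. One of them also holds a list in a single attribute, the multi-value idea described above.

```python
import json

# A toy in-memory "collection": document id -> JSON text (hypothetical, not a real DB API).
collection: dict[str, str] = {}

def insert(doc_id: str, document: dict) -> None:
    collection[doc_id] = json.dumps(document)

def find(**criteria) -> list[dict]:
    """Return documents whose top-level fields equal all the given criteria."""
    docs = (json.loads(text) for text in collection.values())
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]

# No fixed schema: the documents have different fields, and one holds a list.
insert("c1", {"name": "Alice", "city": "New York", "phones": ["555-0100", "555-0101"]})
insert("c2", {"name": "Bob", "city": "Boston", "loyalty_tier": "gold"})

print(find(city="New York"))
```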
Table 1-2 lists two categories of database types: relational and NoSQL databases. NoSQL is further divided into four types: document-oriented, key-value, wide-column, and graph databases.
Table 1-2: Relational model and NoSQL model
Category | Type | Description | Examples
Relational (SQL) | Relational databases | Store data in a structured format with rows and columns. | MySQL, Oracle, PostgreSQL, Microsoft SQL Server
NoSQL | Document-oriented | Store data in documents, usually in formats like JSON. | MongoDB, Couchbase
NoSQL | Key-value stores | Store data as a collection of key-value pairs. | Redis, Amazon DynamoDB, Riak
NoSQL | Wide-column stores | Store data in rows and columns, where the set of columns can vary from row to row. | Apache Cassandra, Google’s Bigtable, HBase
Most websites rely on some kind of database to organize and present data to users. Whenever
someone uses the search functions on these sites, their search terms are converted into queries for a
database server to process. Typically, middleware connects the web server with the database.
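The step from a search box to a database query might look like the following. The sketch assumes SQLite (via Python's `sqlite3`) in place of a database server; the `product` table and the `search` function are hypothetical stand-ins for middleware. The user's term is passed as a query parameter rather than pasted into the SQL text, so input such as `'; DROP TABLE ...` is treated as plain text. The sketch does not escape the LIKE wildcards `%` and `_` inside the term.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE product (pid INTEGER PRIMARY KEY, title TEXT, price REAL)")
con.executemany("INSERT INTO product VALUES (?, ?, ?)", [
    (1, "Blue running shoes", 59.99),
    (2, "Red rain jacket", 89.50),
    (3, "Trail running socks", 9.99),
])

def search(term: str):
    """Hypothetical middleware step: turn search-box text into a parameterized query."""
    pattern = f"%{term}%"   # substring match
    return con.execute(
        "SELECT title, price FROM product WHERE title LIKE ? ORDER BY pid", (pattern,)
    ).fetchall()

print(search("running"))  # [('Blue running shoes', 59.99), ('Trail running socks', 9.99)]
```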
Databases are so widespread that they are used in almost every field, from online shopping to micro-targeting a voter segment as part of a political campaign. Various industries have developed their own norms for database design, from air transport to vehicle manufacturing.
Among these data models, the relational data model is the most widely used data model, and a
vast majority of current database systems are based on the relational model. Since the 1970s, relational
databases have dominated the field of database systems. Object databases created some interest for a while,
but it appears that they have been replaced by more contemporary approaches such as the EAV model,
Hadoop, and NoSQL. Still, relational databases continue to dominate. The hierarchical, network, and inverted-list approaches are traditional approaches that have been discarded because of their associated problems.
In the database approach, a database is created and managed via a database management system
(DBMS) or CASE tool. A user interface, developed with appropriate application development software, is
superimposed on the database, so that end users access the system through the user interface. All the data
resides in the database. Various software systems can then access the database. Therefore, contemporary
database systems must live up to de facto standards set by the software engineering industry. Roughly
speaking, a well-designed database system must exhibit the following features (more specific standards will
be discussed later in the course):
Because these features matter, the choice of DBMS is important: the DBMS largely determines which of these features the database system has and how they are provided.
Database design plays a crucial role in software development because it helps ensure that the software
system can efficiently and accurately store, retrieve, and manipulate data. A well-designed database can help improve the performance, scalability, and maintainability of a software system. It can also help reduce development costs and improve the user experience.
The entire database design and development process can be broadly split into two stages.
(1) Database Planning Stage: The first stage involves activities such as understanding the data
requirements and creating data models.
(2) Database Development Stage: The second stage involves actually building the database using a
specific database management system (DBMS).
Summary
• A database system is a computerized record keeping system with the overall purpose of maintaining
information and making it available on demand.
• The DBMS is the software that facilitates creation and administration of the database.
• The DBS is made up of the hardware, the operating system, the DBMS, the actual database, the
application programs, and the end users.
• There are several primary and secondary objectives of a DBS, which are of importance to the CS
professional.
• Many software systems rely on underlying database systems to provide critical information.
• A DBS brings a number of significant advantages to the business environment.
• There are three traditional approaches to constructing a DBS that are no longer prevalent today.
They are the instant small system, the file processing system, and the traditional non-relational
approaches.
• There are five contemporary approaches to constructing a DBS. They are the relational approach,
the object-oriented approach, the Hadoop framework, the EAV approach, and the NoSQL
approach. The relational approach is the most dominant.
• In striving to acquire a DBS, it is advisable to aspire to most of the objectives and advantages.
Additionally, one should aim for user friendliness, thorough documentation, and a DBMS that
provides platform independence, comprehensive system catalog, backup and recovery, appropriate
transaction management, communication with other systems, and adequate programming support.
• The database development life cycle outlines the main activities in the useful life of a DBS.
• Interested? We have only scratched the surface. There is a lot more to cover. Most successful
software systems are characterized by carefully designed databases. In fact, it is safe to say that the
efficacy of the software system is a function of its underlying database. So stay tuned: the next
chapter provides more clarification on the database environment.
Review Questions
Here are some review questions for you to answer. You are encouraged to write your responses down; that
way you will know whether you need to revisit related sections of the lecture.