Document From Shanmukha Reddy
MODULE-I
INTRODUCTION TO DATABASE CONCEPTS AND MODELING: Introduction to Data bases, Purpose of
Database Systems, View of Data, Data Models, Database Languages, Database Users, Database Systems
architecture.
OVERVIEW OF DATABASE DESIGN: Beyond ER Design, Entities, Attributes and Entity sets, Relationships
and Relationship sets, Conceptual Design with the ER Model.
Database systems are designed to manage large bodies of information. Management of data involves
both defining structures for storage of information and providing mechanisms for the manipulation of
information. In addition, the database system must ensure the safety of the information stored,
despite system crashes or attempts at unauthorized access. If data are to be shared among several users,
the system must avoid possible anomalous results.
System programmers wrote these application programs to meet the needs of the university, and new application programs are added to the system as the need arises.
For example, suppose the university decides to create a new major, say Computer Science. The university then creates a new department and creates new permanent files to record information about all the instructors in the department, the students in that major, course offerings, and so on. The university may also have to write new application programs to deal with rules specific to the new major. Thus, as time goes by, the system acquires more files and more application programs.
The file-processing system stores permanent records in various files, and it needs different
application programs to extract records from, and add records to, the appropriate files. Before database
management systems (DBMS) came along, organizations usually stored information in such systems.
Storing Organizational information in a file-processing system has a number of major disadvantages:
1. Data redundancy and inconsistency: The same data may be duplicated in several files. For example, the address and telephone number of a particular customer may appear both in a file of savings-account records and in a file of checking-account records. This redundancy leads to higher storage and access cost. In addition, it may lead to data inconsistency; that is, the various copies of the same data may no longer agree. For example, a changed customer address may be reflected in the savings-account records but not elsewhere in the system.
2. Difficulty in accessing data: Suppose that a bank officer needs to find the names of all customers who live within a particular postal-code area. The officer asks the data-processing department to generate such a list, but no application program exists to do so. The officer now has two choices: either obtain the list of all customers and extract the needed information manually, or ask a system programmer to write the necessary application program. Both alternatives are obviously unsatisfactory. Suppose that such a program is written, and that, several days later, the same officer needs to trim the list to include only those customers who have taken a loan. Again, no program exists to generate such a list, and the officer faces the same two choices.
The conventional file-processing environment does not allow needed data to be retrieved in a
convenient and efficient manner.
3. Data isolation: Because data are scattered in various files, and files may be in different formats, writing
new application programs to retrieve the appropriate data is difficult.
4. Integrity problems: The data values stored in the database must satisfy certain types of consistency constraints. Suppose the balance of a bank account may never fall below a prescribed amount (say, $500).
Developers enforce these constraints in the system by adding appropriate code in the various application
programs. However, when new constraints are added, it is difficult to change the programs to enforce
them. The problem is compounded when constraints involve several data items from different files.
5. Atomicity problems: A computer system, like any other mechanical or electrical device, is subject to
failure. In many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state
that existed prior to the failure. Consider a program to transfer $50 from account A to account B. If a
system failure occurs during the execution of the program, it is possible that the $50 was removed from
account A but was not credited to account B, resulting in an inconsistent database state. Clearly, it is
essential to database consistency that either both the credit and debit occur, or that neither occur. That is,
the funds transfer must be atomic—it must happen in its entirety or not at all. It is difficult to ensure
atomicity in a conventional file-processing system.
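The all-or-nothing behavior described above can be sketched with Python's sqlite3 module, which groups updates into a transaction; the account table and names here are illustrative assumptions, not taken from the original banking system:

```python
import sqlite3

# Illustrative schema: an account table with two accounts, A and B.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Debit src and credit dst atomically: both happen, or neither does."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        # A failure here would leave the debit uncommitted, so no money vanishes.
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()       # make both updates permanent together
    except Exception:
        conn.rollback()     # undo any partial work
        raise

transfer(conn, "A", "B", 50)
balances = dict(conn.execute("SELECT name, balance FROM account"))
print(balances)  # {'A': 450, 'B': 250}
```

If the process crashes between the two UPDATE statements, the uncommitted transaction is simply discarded, so the database never exposes the inconsistent intermediate state.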
6. Concurrent-access anomalies: For the sake of overall performance of the system and faster response,
many systems allow multiple users to update the data simultaneously. In such an environment, interaction
of concurrent updates may result in inconsistent data. Consider bank account A, containing $500. If two
customers withdraw funds (say $50 and $100 respectively) from account A at about the same time, the
result of the concurrent executions may leave the account in an incorrect (or inconsistent) state. Suppose
that the programs executing on behalf of each withdrawal read the old balance, reduce that value by the
amount being withdrawn, and write the result back. If the two programs run concurrently, they may both
read the value $500, and write back $450 and $400, respectively. Depending on which one writes the value
last, the account may contain $450 or $400, rather than the correct value of $350. To guard against this
possibility, the system must maintain some form of supervision. But supervision is difficult to provide
because data may be accessed by many different application programs that have not been coordinated
previously.
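The lost-update interleaving above can be replayed as a minimal Python sketch, with the two withdrawals written out as explicit read-modify-write steps:

```python
balance = 500   # account A

# Both withdrawal programs read the old balance before either writes back.
read_by_first = balance    # $50 withdrawal reads 500
read_by_second = balance   # $100 withdrawal also reads 500

balance = read_by_first - 50    # first writes back 450
balance = read_by_second - 100  # second overwrites it with 400

print(balance)  # 400: the $50 withdrawal is lost; the correct value is 350
```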
7. Security problems: Not every user of the database system should be able to access all the data. For
example, in a university, payroll personnel need to see only that part of the database that has financial
information. They do not need access to information about academic records. But, since application
programs are added to the system in an ad hoc manner, enforcing such security constraints is difficult.
These difficulties, among others, prompted the development of database systems.
VIEW OF DATA:
A database system is a collection of interrelated files and a set of programs that allow users to access and modify these files. A major purpose of a database system is to provide users with an "abstract view" of the data. That is, the system hides certain details of how data are stored and maintained.
Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led designers to use complex data structures to represent data in the database. System developers hide this complexity from users through several levels of abstraction, to simplify users' interactions with the system.
Levels of Abstraction:
Physical Level: The lowest Level of Abstraction describes “How” the data are actually stored. The physical
level describes complex low-level data structures.
Logical Level: The next higher level of abstraction describes "What" data are stored in the database and what relationships exist among those data. The logical level describes the entire database in terms of a small number of relatively simple structures. Although implementing those structures may involve complex physical-level structures, users of the logical level do not need to be aware of this complexity; this is referred to as physical data independence. The DBA, who must decide what information to keep in the database, works at this level.
View Level: This is the highest level of abstraction; it describes only part of the entire database. Different users require different types of data elements from the database, so for one physical level and one logical level there may be more than one view level: the system may provide many views for the same database.
The following figure shows the relationship among the three levels of abstraction:
[Figure: view level (view 1 ... view n) at the top, the logical level below it, and the physical level at the bottom]
For example, consider a record structured by type in high-level programming languages as follows:
type instructor = record
    ID: char(5);
    Name: char(15);
    Dept_name: char(10);
    Salary: numeric(8,2);
end;
This code defines a new record type called instructor with four fields. Each field has a name and a type associated with it. A university organization may have several such record types, including department, course, and student records.
At the physical level, an instructor, department, course, or student record can be described as a block of consecutive storage locations. Just as the compiler hides this level of detail from programmers, the database system hides many of the lowest-level storage details from database programmers. The DBA, however, may be aware of certain details of the physical organization of the data.
At the logical level, each record is described by a type definition. Programmers using a programming language work at this level of abstraction, and DBAs usually work at this level of abstraction as well.
Finally, at the view level, several views of the database are defined, and a database user sees some or all of these views. Views also provide a security mechanism to prevent users from accessing certain parts of the database.
Database systems have several schemas, partitioned according to the levels of abstraction. The physical schema describes the database design at the physical level, while the logical schema describes the database design at the logical level. A database may also have several schemas at the view level, sometimes called subschemas, that describe different views of the database.
Data Models:
Underlying the structure of a database is the data model: a collection of conceptual tools for describing data, data relationships, data semantics, and consistency constraints. The DBMS thus allows a user to define the stored data in terms of a data model. A data model provides a way to describe the design of a database at the physical, logical, and view levels.
Data models can be classified into four categories: the relational model, the entity-relationship model, object-based data models, and semi-structured data models. The first two are described below.
Relational Model:
The relational model uses a collection of tables to represent both data and the relationships among those data. Each table has multiple columns, and each column has a unique name. Tables are also known as relations. The relational model is an example of a record-based model: such models are so named because the database is structured in fixed-format records of several types. Each table contains records of a particular type, and each record type defines a fixed number of fields, or attributes. The columns of the table correspond to the attributes of the record type. The majority of current database systems are based on the relational model. For example, consider the following student relation:
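As a sketch of how such a relation looks in practice, the following uses Python's sqlite3 module; the table layout and sample rows are illustrative assumptions, since the original figure is not reproduced here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Each column corresponds to an attribute of the student record type.
conn.execute("""CREATE TABLE student (
    id        TEXT PRIMARY KEY,   -- key attribute: values must be unique
    name      TEXT,
    dept_name TEXT)""")
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [("S1", "Anil", "CSE"), ("S2", "Bina", "ECE")])

# Each row of the table is one record of the student type.
rows = conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()
print(rows)  # [('S1', 'Anil'), ('S2', 'Bina')]
```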
Entity-Relationship Model:
The entity-relationship (E-R) data model uses a collection of basic objects, called entities, and relationships among these entities. An entity is a "thing" or "object" in the real world that is distinguishable from other objects. A relationship is an association among two or more entities. The entity-relationship model is widely used in database design. For example, consider a customer entity and an account entity associated through the relationship depositor, as in the following figure:
DATABASE LANGUAGES
A database system provides a data definition language to specify the database schema and data
manipulation language to express database queries and updates. In practice, the data definition and
data manipulation languages are not two separate languages; instead they simply form parts of a single
database language, such as the widely used SQL language.
Database languages are:
Data Definition Language (DDL)
Data Manipulation Language (DML)
Data Control Language (DCL)
Transaction Control Language (TCL)
Referential Integrity: ensures that a value that appears in one relation for a given set of attributes also appears in a certain set of attributes in another relation. For example, the dept_name value in a course record must appear in the dept_name attribute of some record of the department relation.
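This constraint can be sketched with SQLite through a FOREIGN KEY clause (enforcement must be switched on explicitly in SQLite); the schema below is a simplified stand-in for the course/department example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only when enabled
conn.execute("CREATE TABLE department (dept_name TEXT PRIMARY KEY)")
conn.execute("""CREATE TABLE course (
    course_id TEXT PRIMARY KEY,
    dept_name TEXT REFERENCES department(dept_name))""")
conn.execute("INSERT INTO department VALUES ('CSE')")
conn.execute("INSERT INTO course VALUES ('CS101', 'CSE')")  # OK: 'CSE' exists

try:
    # 'History' does not appear in department, so this row is rejected.
    conn.execute("INSERT INTO course VALUES ('HS101', 'History')")
    violated = False
except sqlite3.IntegrityError:
    violated = True
print(violated)  # True: the referential-integrity constraint rejected the row
```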
Assertions: an assertion is any condition that the database must always satisfy. Domain constraints and referential-integrity constraints are special forms of assertions. For example, "every department must have at least five courses offered every semester" must be expressed as an assertion. When an assertion is created, the system tests it for validity; if the assertion is valid, any future modification to the database is allowed only if it does not cause that assertion to be violated.
Authorization: To differentiate among users by the types of access they are permitted on various data values in the database. These differentiations are expressed in terms of authorization:
read authorization, which allows reading but not modification of data;
insert authorization, which allows insertion of new data but not modification of existing data;
update authorization, which allows modification but not deletion of data;
delete authorization, which allows deletion of data.
The output of the DDL is placed in the data dictionary, which contains metadata, that is, data about data. The data dictionary is a special type of table that can be accessed and updated only by the database system itself. The database system consults the data dictionary before reading or modifying actual data.
The DDL commands in the SQL are CREATE, DROP, ALTER, DESC, RENAME.
Data Manipulation Language (DML)
A Data Manipulation Language is a language that enables users to access or manipulate data as organized by the appropriate data model. The types of access are:
Retrieval of information stored in the database.
Insertion of new information into the database.
Deletion of information from the database.
Modification of information stored in the database.
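The four types of access map directly onto the SQL statements SELECT, INSERT, DELETE, and UPDATE. A minimal sketch with Python's sqlite3 module (the emp table and its rows are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")

# Insertion of new information
conn.execute("INSERT INTO emp VALUES (1, 'Ravi', 40000)")
conn.execute("INSERT INTO emp VALUES (2, 'Sita', 50000)")

# Modification of stored information
conn.execute("UPDATE emp SET salary = 45000 WHERE id = 1")

# Deletion of information
conn.execute("DELETE FROM emp WHERE id = 2")

# Retrieval of information
rows = conn.execute("SELECT id, name, salary FROM emp").fetchall()
print(rows)  # [(1, 'Ravi', 45000)]
```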
Data Control Language (DCL): DCL controls the rights and permissions of users over the database. It is used to grant access privileges to users of the database and to take them back.
In SQL, there are commands like GRANT and REVOKE.
Grant: It is used to give user access privileges to a database.
Revoke: It is used to take back permissions from the user.
Transaction Control Language (TCL):
TCL controls the transactions over the database. The commands in SQL are COMMIT, ROLLBACK, and SAVEPOINT.
COMMIT: Commit command is used to save all the transactions to the database.
ROLLBACK: Rollback command is used to undo transactions that have not already been saved to the
database.
SAVEPOINT: It is used to roll the transaction back to a certain point without rolling back the entire
transaction.
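The three TCL commands can be exercised from Python's sqlite3 module; here the connection is put in manual-transaction mode so BEGIN, SAVEPOINT, ROLLBACK TO, and COMMIT are issued explicitly (the table is illustrative):

```python
import sqlite3

# isolation_level=None lets us issue transaction commands ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE t (v INTEGER)")

conn.execute("BEGIN")
conn.execute("INSERT INTO t VALUES (1)")
conn.execute("SAVEPOINT sp1")        # mark a point inside the transaction
conn.execute("INSERT INTO t VALUES (2)")
conn.execute("ROLLBACK TO sp1")      # undo only the work done after sp1
conn.execute("COMMIT")               # save the surviving work permanently

values = [row[0] for row in conn.execute("SELECT v FROM t")]
print(values)  # [1]: the second insert was rolled back, the first committed
```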
DATABASE SYSTEMS ARCHITECTURE
Database systems can be centralized or client-server, where one server machine executes work
on behalf of multiple client machines. A database system is partitioned into modules that deal with
each of the responsibilities of the overall system. The functional components of a database system can
be broadly divided into the storage manager and the query processor components as shown in the
following figure, the structure of database system architecture.
The storage manager is important because databases typically require a large amount of storage space; the databases of some big organizations range from gigabytes to terabytes. Because the main memory of computers cannot store this much information, the information is stored on disks, and data are moved between disk storage and main memory as needed.
The query processor is also very important because it helps the database system simplify and facilitate access to data, so quick processing of updates and queries is important. It is the job of the
database system to translate updates and queries written in a nonprocedural language, at the logical
level, into an efficient sequence of operations at the physical level.
Storage Manager:
A storage manager is the component of a database system that provides the interface between
the low-level data stored in the database and the application programs and queries submitted to the
system. The storage manager is responsible for the interaction with the file manager. The storage
manager translates the various DML statements into low-level file-system commands. Thus, the
storage manager is responsible for storing, retrieving, and updating data in the database.
The Storage Manager Components are:
Authorization and integrity manager: which tests for the satisfaction of integrity constraints and
checks the authority of users to access data.
Transaction manager: which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
File manager: which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
Buffer manager: which is responsible for fetching data from disk storage into main memory and
deciding what data to cache in main memory.
The storage manager implements several data structures as part of the physical system
implementation.
Data files are used to store the database itself.
The data dictionary is used to store metadata about the structure of the database, in particular the schema of the database.
Indices provide fast access to data items. A database index provides pointers to those data items that hold a particular value.
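A small sketch of the effect of an index, using SQLite's EXPLAIN QUERY PLAN to confirm that the lookup follows the index's pointers instead of scanning every row (table and index names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acct_no INTEGER, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [(i, i * 10) for i in range(1000)])

conn.execute("CREATE INDEX idx_acct ON account(acct_no)")

# Ask the planner how it will run an equality lookup on the indexed column.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT balance FROM account WHERE acct_no = 500"
).fetchall()
uses_index = any("idx_acct" in str(row) for row in plan)
print(uses_index)  # True: the planner chose the index over a full scan
```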
Query Processor:
Query Processor Components:
DDL interpreter: It interprets DDL statements and records the definitions in the data dictionary.
DML compiler: It translates DML statements in a query language into an evaluation plan consisting of
low-level instructions that the query evaluation engine understands.
Query evaluation engine: It executes low-level instructions generated by the DML compiler.
Application Architectures:
Database applications are usually partitioned into two or three parts, giving a two-tier architecture or a three-tier architecture.
Two-tier architecture:
The application resides at the client machine, which invokes database system functionality at the server
machine through query language statements. Application program interface standards like ODBC and
JDBC are used for interaction between the client and the server.
Three-tier architecture: The client machine acts as merely a front end and does not contain any direct
database calls. Instead, the client end communicates with an application server, usually through a
forms interface. The application server in turn communicates with a database system to access data.
The business logic of the application, which says what actions to carry out under what conditions, is
embedded in the application server, instead of being distributed across multiple clients. Three-tier
applications are more appropriate for large applications, and for applications that run on the World
Wide Web.
1. Requirements Analysis: The very first step in designing a database application is to understand what
data is to be stored in the database, what applications must be built on top of it, and what operations
are most frequent and subject to performance requirements. The database designers collect information about the organization and analyze it to identify the users' requirements; they must find out what the users want from the database.
2. Conceptual Database Design: Once the information is gathered in the requirements analysis step a
conceptual database design is developed and is used to develop a high level description of the data to
be stored in the database, along with the constraints that are known to hold over this data. This step is
often carried out using the ER model, or a similar high-level data model.
3. Logical Database Design: In this step we convert the conceptual database design into a database schema (the logical database design) in the data model of the chosen DBMS. For relational DBMSs, the task in the logical design step is to convert an ER schema into a relational database schema. The result, sometimes called the logical schema, is a schema in the relational data model.
BEYOND THE ER DESIGN: The first three steps are the ones most relevant to the ER model. Once the logical schema is defined, the designer considers the physical-level implementation and finally provides certain security measures. The remaining three steps of database design are briefly described below:
4. Schema Refinement: The fourth step in database design is to analyze the collection of relations in
our relational database schema to identify potential problems, and to refine it. In contrast to the
requirements analysis and conceptual design steps, which are essentially subjective, schema
refinement can be guided by some elegant and powerful theory.
5. Physical Database Design: In this step, we consider typical expected workloads that our database must support and further refine the database design to ensure that it meets the desired performance criteria. This step may simply involve building indexes on some tables and clustering some tables, or it may involve a substantial redesign of parts of the database schema obtained from the earlier design steps.
6. Security Design: The last step of database design is to include security features, which are required to avoid unauthorized access to the database. In practice, after all six steps, a tuning step is required in which all the steps are interleaved and repeated until the design is satisfactory.
ENTITY: An entity is a thing which is distinguishable from other objects. Entities are represented using rectangular boxes. For example, each person in a university is an entity.
Eg: student, course, employee, department, etc.
ENTITY SET: A set of entities of the same type that share the same properties or attributes is called an entity set. The entity set student might represent the set of all students in the university. There are two types of entities: 1. Strong entity. 2. Weak entity.
STRONG ENTITY: An entity whose existence does not depend on the existence of another entity is called a strong entity. An entity set that has a primary key is termed a strong entity set. A strong entity is represented by a rectangular box.
Here student is a strong entity, as it has a primary key attribute ID which is uniquely defined.
WEAK ENTITY: An entity whose existence depends on the existence of another entity is called a weak entity. An entity set may not have sufficient attributes to form a primary key; such an entity set is termed a weak entity set. A weak entity is represented by a double rectangular box.
Attribute: A property of an entity is called an attribute; the entity is described using a set of attributes. An attribute is represented with an oval.
Example: employee number, name, salary, etc.
There are 8 types of attributes. They are:
1. Simple attribute. 2. Composite attribute. 3. Single valued attribute. 4. Multi valued attribute. 5. Null attribute. 6. Key attribute. 7. Derived attribute. 8. Descriptive attribute.
SIMPLE ATTRIBUTE: The attributes that can’t be divided into subparts are known as simple attributes.
Here the attributes ID, salary cannot be divided into subparts. So they are simple attributes.
COMPOSITE ATTRIBUTE: The attributes that can be divided into subparts are known as composite
attributes.
For example, an attribute name could be structured as a composite attribute consisting of first_name, middle_initial, and last_name.
SINGLE VALUED ATTRIBUTE: An attribute that has a single value for a particular entity is known as a single valued attribute. For instance, the student ID attribute for a specific student entity refers to only one student ID.
MULTI VALUED ATTRIBUTE: An attribute that has many values for a particular entity is known as a multi valued attribute. For example, suppose we add to the instructor entity set an attribute dependent_name listing all the dependents. This attribute will be multi valued, since any particular instructor may have zero, one, or more dependents. To denote that an attribute is multi valued, it is represented by a double oval shape.
NULL ATTRIBUTE: A null attribute is an attribute that takes a null value when an entity does not have a value for it. A null attribute is represented by an empty oval shape.
For example, one may have no middle name. Null can also designate that an attribute value is
unknown. An unknown value may be either missing or not known.
KEY ATTRIBUTE: A key is a minimal set of attributes whose values uniquely identify an entity in the set.
Eg: The entity set instructor includes the attributes ID, name, dept_name and salary with ID forming
the primary key. It can be represented by underlining the key attribute.
DERIVED ATTRIBUTE: The values for this type of attribute can be derived from the values of another
related attribute. Derived attribute represented by dotted oval shape.
For example, instructor is an entity where age indicates the instructor's age. If it also has the attribute date_of_birth, we can calculate age from date_of_birth and the current date.
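The age-from-date_of_birth derivation can be sketched in a few lines of Python; the function name is an assumption for illustration:

```python
from datetime import date

def age_from_dob(dob: date, today: date) -> int:
    """Derive the age attribute from the stored date_of_birth attribute."""
    years = today.year - dob.year
    # Subtract one if the birthday has not yet occurred this year.
    if (today.month, today.day) < (dob.month, dob.day):
        years -= 1
    return years

# An instructor born 1 June 1980 is 43 on 1 January 2024 (birthday not yet reached).
print(age_from_dob(date(1980, 6, 1), date(2024, 1, 1)))  # 43
```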
DESCRIPTIVE ATTRIBUTE: A relationship can also have descriptive attributes. Descriptive attributes are
used to record the information about the relationship rather than about any one of the participating
entities.
Eg: Let us consider a relationship advisor with entity sets instructor and student and associate the
attribute date with that relationship to specify the date when an instructor became advisor of a
student.
Domain: The set of possible values associated with an attribute is called its domain. For each attribute associated with an entity set, we must identify a domain of possible values. For example, if the company rates employees on a scale of 1 to 10 and stores ratings in a field called rating, the associated domain consists of the integers 1 through 10.
Key: A key is a minimal set of attributes whose values uniquely identify an entity in the set. There could be more than one candidate key; in that case we select one of them as the primary key. Thus for each entity set we choose a key. Each attribute in the primary key is underlined.
Relationship Set: The collection of similar relationships is a relationship set. A relationship set can be thought of as a set of n-tuples: {(e1, …, en) | e1 ∈ E1, …, en ∈ En}. Each n-tuple denotes a relationship involving n entities e1 through en, where entity ei is in entity set Ei. The figure above shows the relationship set Works_In, in which each relationship indicates a department in which an employee works.
Descriptive Attributes: Descriptive attributes are used to record information about the relationship rather than about the participating entities. In the figure above, since is a descriptive attribute that stores information about the relationship.
Instance: An instance of a relationship set is a set of relationships. In the figure below, each Employees entity is denoted by its ssn, each Departments entity is denoted by its did, and the since value is shown beside each relationship.
Degree:
The number of entity sets that participate in a relationship set is known as the degree of the relationship.
1. Unary: the association is within a single entity set.
2. Binary: a relationship that associates two entity sets is known as a binary relationship.
3. Ternary: when three entity sets participate in the relationship, it is a ternary relationship.
4. N-ary: when more than three entity sets participate in the relationship, it is an n-ary relationship.
Consider another example of an ER diagram. Suppose that each department has offices in several locations and we want to record the location at which each employee works. This relationship is ternary because it must record an association between an employee, a department, and a location, as shown below:
Many to one: when entities in one entity set can take part only once in the relationship set and entities in the other entity set can take part more than once, the cardinality is many to one. Assume that a student can take only one course but one course can be taken by many students; the cardinality is then n to 1. It means that for one course there can be n students, but for one student there is only one course.
Many to many: when entities in all participating entity sets can take part more than once in the relationship set, the cardinality is many to many. Assume that a student can take more than one course and one course can be taken by many students; the relationship is then many to many.
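In a relational schema, a many-to-many relationship set becomes its own table whose key combines the keys of both participating entity sets. A sketch with SQLite (table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE student (sid TEXT PRIMARY KEY);
CREATE TABLE course  (cid TEXT PRIMARY KEY);
-- The takes relationship set is a table keyed by both sides.
CREATE TABLE takes (
    sid TEXT REFERENCES student(sid),
    cid TEXT REFERENCES course(cid),
    PRIMARY KEY (sid, cid));
INSERT INTO student VALUES ('S1'), ('S2');
INSERT INTO course  VALUES ('C1'), ('C2');
INSERT INTO takes   VALUES ('S1', 'C1'), ('S1', 'C2'), ('S2', 'C1');
""")

# S1 takes two courses while C1 is taken by two students: many to many.
n = conn.execute("SELECT COUNT(*) FROM takes WHERE sid = 'S1'").fetchone()[0]
print(n)  # 2
```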
It records the interval during which an employee works for a department. Now suppose it is possible for an employee to work in a given department over more than one period. Because a relationship is identified by the participating entities, this possibility is ruled out by the ER diagram's semantics: the problem is that we want to record several values of the descriptive attributes for each instance of the Works_In2 relationship. We may address this problem by introducing an entity set called, say, Duration, with attributes from and to, as shown in the figure below.
Entity versus Relationship:
Consider the relationship set called Manages in the figure below. Suppose that each department manager is given a discretionary budget (dbudget), as shown in the figure, in which the relationship set is renamed Manages2.
There is at most one employee managing a department, but a given employee could manage
several departments; we store the starting date and discretionary budget for each manager-
department pair. This approach is natural if we assume that a manager receives a separate
discretionary budget for each department that he or she manages. But what if the discretionary budget
is a sum that covers all departments managed by that employee? In this case each Manages2
relationship that involves a given employee will have the same value in the dbudget field. In general
such redundancy could be significant and could cause a variety of problems.
Binary versus Ternary Relationships:
Consider the ER diagram shown in below Figure. It models a situation in which an employee can own
several policies, each policy can be owned by several employees, and each dependent can be covered
by several policies.
Fig: Policies as an Entity Set
Suppose the following additional requirements also hold:
A policy cannot be owned jointly by two or more employees.
Every policy must be owned by some employee.
Dependent is a weak entity set, and each dependent entity is uniquely identified by taking
pname in conjunction with the policyid of a policy.
The first requirement suggests that we impose a key constraint on Policies with respect to Covers, but this constraint has the effect that a policy can cover only one dependent. The second requirement suggests that we impose a total participation constraint on Policies. This solution is acceptable if each policy covers at least one dependent.
The third requirement forces us to introduce an identifying relationship that is binary, as shown below:
As a good example of a ternary relationship, consider entity sets Parts, Suppliers, and Departments,
and a relationship set Contracts that involves all of them. A contract specifies that a supplier will supply
a part to a department. This relationship cannot be adequately captured by a collection of binary relationships without the use of aggregation: with binary relationships alone, we can denote that a supplier `can supply' certain parts, that a department `needs' some parts, or that a department `deals with' a certain supplier, but no combination of these captures the three-way association in a contract.
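The three-way Contracts relationship can be sketched as a single table referencing all three entity sets at once, which is exactly what a collection of binary relationships cannot express; the schema below is illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE supplier   (sid TEXT PRIMARY KEY);
CREATE TABLE part       (pid TEXT PRIMARY KEY);
CREATE TABLE department (did TEXT PRIMARY KEY);
-- One contracts row associates a supplier, a part, AND a department.
CREATE TABLE contracts (
    sid TEXT REFERENCES supplier(sid),
    pid TEXT REFERENCES part(pid),
    did TEXT REFERENCES department(did),
    PRIMARY KEY (sid, pid, did));
INSERT INTO supplier   VALUES ('Acme');
INSERT INTO part       VALUES ('bolt');
INSERT INTO department VALUES ('Tools');
INSERT INTO contracts  VALUES ('Acme', 'bolt', 'Tools');
""")

row = conn.execute("SELECT sid, pid, did FROM contracts").fetchone()
print(row)  # ('Acme', 'bolt', 'Tools')
```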
The choice between using aggregation or a ternary relationship is mainly determined by the existence
of a relationship that relates a relationship set to an entity set. The choice may also be guided by
certain integrity constraints that we want to express. For example, consider the ER diagram shown in
Figure “Aggregation”. According to this diagram, a project can be sponsored by any number of
departments, a department can sponsor one or more projects, and each sponsorship is monitored by
one or more employees.