Dbms-Unit 1,2 Notes
Dbms-Unit 1,2 Notes
Introduction to Database Design: Database design and ER diagrams, Entities, attributes and
entity sets, Relationships and relationship sets, Additional features of ER model, Conceptual
Design with ER model.
PL/SQL: Generic PL/SQL block, PL/SQL data types, Control structure, Procedures and
functions, Cursors, Database triggers.
Storage and Indexing: Data on external storage, File organizations and indexing – Clustered
indexes, Primary and secondary indexes; Index data structures – Hash based indexing, Tree
based indexing; Comparison of file organizations.
Module 1
INTRODUCTION TO DATABASE SYSTEMS AND DATABASEDESIGN (08 Periods)
Introduction to Database Systems: Database system applications, Purpose of
database systems,View of data - Data abstraction, Instances and schemas, Data
models; Database languages - Data Definition Language, Data Manipulation
Language; Database architecture, Database users and administrators.
Introduction:
Data: Data are raw facts. The word raw indicates that the facts have not yet been processed to reveal
their meaning.
Or
The term data referred to known raw facts. Text, graphics, images and videosegments that have meaning
in the user’s environment
Information: Information is the result of processing raw data to reveal ors meaning. Data processing can be as simple
as organizing data to reveal patterns or as complex as making forecasts or drawing inferences using statistical
modeling.
Or
Data that have been processed in such a way as to increase the knowledge of theperson who use the data.
Information is produced by processing data. Information is used to reveal the meaning of data. Accurate,
relevant, and timely information os the key
Database: The database consists of logically related data stored in a single datarepository.
Metadata:The metadata provide a description of data characteristics and the set ofrelationships that
link the data found with in the database.
Ex:
Field: A character or group of characters that has a specific meaning. A field is used todefine and store
data.
Record: A logically connected a set of one or more fields that describes a person, placeor thing.
1. Data Availability : Data availability refers to the fact That the data are creates available to thewide variety of
users can access the date easily in a meaningful format
2. Data Integrity : Data integrity refers to the correctness of the data in the database. In other words, the data
available in the database is a reliable data
3. Data Security : Data security refers to the fact that only authorized users can access the data. Data security can
be enforced by passwords.
If two separate users are accessing a particular data at the same time, The DBMS must not allow them to make
conflicting changes.
4. Data Independence : DBMS allows the user to store, update, and retrieve data in efficient manner.DBMS provides
an “abstract view” of how the data is stored in the database.
In order to store the information efficiently, complex data structures are usedto represent the data.
The system hides certain details of how the data are stored and maintained.
Early 1960s.
1. Charles Bachman at GE created the first general purpose DBMS IntegratedData Store.
2. It created the basis for the network model which was standardized byCODASYL
(Conference on Data System Language).
Late 1960s:
1. IBM developed the Information Management System (IMS).
2. IMS used an alternate model, called the Hierarchical Data Model.
In 1970:
1. Edgar Codd, from IBM created the Relational Data Model.
In 1981:
1. Codd received the Turing Award for his contributions to database theory.
2. Codd Passed away in April 2003.
In 1976:
Peter Chen presented Entity-Relationship model, which is widely used in databasedesign.
In 1980:
1. SQL developed by IBM, became the standard query language for databases. SQLwas
standardized by ISO.
• Telecom: There is a database to keeps track of the information regarding calls made, network usage, customer
details etc. Without the database systems it is hard to maintain that huge amount of data that keeps updating
every millisecond.
• Industry: Where it is a manufacturing unit, warehouse or distribution centre, each one needs a database to keep
the records of ins and outs. For example distribution centre should keep a track of the product units that supplied
into the centre as well as the products that got delivered out from the distribution centre on each day; this is where
DBMS comes into picture.
• Banking System: For storing customer info, tracking day to day credit and debit transactions, generating bank
statements etc. All this work has been done with the help of Database management systems. Also, banking system
needs security of data as the data is sensitive, this is efficiently taken care by the DBMS systems.
• Sales: To store customer information, production information and invoice details. Using DBMS, you can track,
manage and generate historical data to analyse the sales data.
• Airlines: To travel though airlines, we make early reservations, this reservation information along with flight
schedule is stored in database. This is where the real-time update of data is necessary as a flight seat reserved for
one passenger should not be allocated to another passenger, this is easily handled by the DBMS systems as the
data updates are in real time and fast.
• Education sector: Database systems are frequently used in schools and colleges to store and retrieve the data
regarding student details, staff details, course details, exam details, payroll data, attendance details, fees details
etc. There is a large amount of inter-related data that needs to be stored and retrieved in an efficient manner.
• Online shopping: You must be aware of the online shopping websites such as Amazon, Flipkart etc. These sites
store the product information, your addresses and preferences, credit details and provide you the relevant list of
products based on your query. All this involves a Database management system. Along with managing the vast
catalogue of items, there is a need to secure the user private information such as bank & card details. All this is
taken care of by database management systems.
PURPOSE OF DATABASE SYSTEMS
The purpose of DBMS is to transform the following −
• Data into information.
• Information into knowledge.
• Knowledge to the action.
The diagram given below explains the process as to how the transformation of data to information to knowledge to
action happens respectively in the DBMS −
Previously, the database applications were built directly on top of the file system
DBMS provides many advantages such as:
Disadvantages of DBMS
Database systems main disadvantages are
1. Increased costs
2. Management complexity
3. Maintaining currency
4. Vendor dependence
5. Frequent upgrade/replacement cycles
1. Increased costs:
a. Database System required sophisticated hardware and software and highly skilledpersonnel.
b. The cost of maintain the hardware, software and personnel required to operateand manage a
database system can be large.
2. Management Complexity:
a. Database Systems interface with many different technologies
b. And have a significant impact on a company’s resources and culture.
c. Database systems hold crucial company data
d. That are accessed from multiple sources, security issues must be assessedconstantly.
3. Maintaining currency:
a. To maximize the efficiency of the database system, you must key your systemcurrent.
b. User must perform frequent updates and apply.
c. The latest patches and security measures to all components.
d. Because database technologies advances rapidly, personnel training cost tend tobe significant.
4. Vendor dependence:
Given the heavy investment in technology and personnel training, companiesmight be reluctant to
change database vendors.
1. Data abstraction:Database systems are made-up of complex data structures. To ease the user interaction with
database, the developers hide internal irrelevant details from users. This process of hiding irrelevant details
from user is called data abstraction.
2. Instance and schema: Design of a database is called the schema. Schema is of three types: Physical schema,
logical schema and view schema. The data stored in database at a particular moment of time is called instance
of database. Database schema defines the variable declarations in tables that belong to a particular database;
the value of these variables at a moment of time is called the instance of that database.
DATA ABSTRACTION
Database systems are made-up of complex data structures. To ease the user interaction with database, the
developers hide internal irrelevant details from users. This process of hiding irrelevant details from user is called data
abstraction. The term “irrelevant” used here with respect to the user, it doesn’t mean that the hidden data is not
relevant with regard to the whole database. It just means that the user is not concerned about that data.
For example: When you are booking a train ticket, you are not concerned how data is processing at the back end
when you click “book ticket”, what processes are happening when you are doing online payments. You are just
concerned about the message that pops up when your ticket is successfully booked. This doesn’t mean that the
process happening at the back end is not relevant, it just means that you as a user are not concerned what is
happening in the database.
Three levels of abstraction
1. External level
It is also called view level. The reason this level is called “view” is because several users can view their desired data
from this level which is internally fetched from database with the help of conceptual and internal level mapping.
The user doesn’t need to know the database schema details such as data structure, table definition etc. user is only
concerned about data which is what returned back to the view level after it has been fetched from database (present
at the internal level).
External level is the “top level” of the Three Level DBMS Architecture.
2. Conceptual level
It is also called logical level. The whole design of the database such as relationship among data, schema of data etc.
are described in this level.
Database constraints and security are also implemented in this level of architecture. This level is maintained by DBA
(database administrator).
3. Internal level
This level is also known as physical level. This level describes how the data is actually stored in the storage devices.
This level is also responsible for allocating space to the data. This is the lowest level of the architecture.
Example: Let’s say we are storing customer information in a customer table. At physical level these records can be
described as blocks of storage (bytes, gigabytes, terabytes etc.) in memory. These details are often hidden from the
programmers.
At the logical level these records can be described as fields and attributes along with their data types, their
relationship among each other can be logically implemented. The programmers generally work at this level because
they are aware of such things about database systems.
At view level, user just interact with system with the help of GUI and enter the details at the screen, they are not
aware of how the data is stored and what data is stored; such details are hidden from them.
DBMS Schema
Definition of schema: Design of a database is called the schema. For example: An employee table in database exists
with the following attributes:
• Schema represents the logical view of the database. It helps you understand what data needs to go
where.
• Schema can be represented by a diagram as shown below.
• Schema helps the database users to understand the relationship between data. This helps in efficiently
performing operations on database such as insert, update, delete, search etc.
In the following diagram, we have a schema that shows the relationship between three tables: Course, Student and
Section. The diagram only shows the design of the database, it doesn’t show the data present in those tables.
Schema is only a structural view(design) of a database as shown in the diagram below.
The design of a database at physical level is called physical schema, how the data stored in blocks of storage is
described at this level.
Design of database at logical level is called logical schema, programmers and database administrators work at this
level, at this level data can be described as certain types of data records gets stored in data structures, however the
internal details such as implementation of data structure is hidden at this level (available at physical level).
Design of database at view level is called view schema. This generally describes end user interaction with database
systems.
DBMS Instance
Definition of instance: The data stored in database at a particular moment of time is called instance of database.
Database schema defines the attributes in tables that belong to a particular database. The value of these attributes
at a moment of time is called the instance of that database.
For example, we have seen the schema of table “employee” above. Let’s see the table with the data now. At this
moment the table contains two rows (records). This is the the current instance of the table “employee” because this
is the data that is stored in this table at this particular moment of time.
Data models
Data modeling, the first step in designing a database, refers to the process of creating a specific data model for
a definite problem domain (area or server).
A data model is a relatively simple representation, usually graphical, of more complex real word data structures.
A model is an abstraction (concept) of a more complex real-word object or event. The database environment,
a data model represents data structures and their characteristics, relations, constraints, transformations and
other constraints with the purpose of supporting a specific problem domain.
THE IMPORTANCE OF DATA MODELS: Data models can facilitate interaction among the designer the application
programmer, and the end user. A well developed data model can even foster improved understanding of the
organization for which the database design is developed.
1. Hierarchical Model
2. Network Model
3. Entity-Relationship Model
4. Relational Model
5. Object-Oriented Data Model
6. Object-Relational Data Model
7. Flat Data Model
8. Semi-Structured Data Model
9. Associative Data Model
10. Context Data Model
Hierarchical Model
Hierarchical Model was the first DBMS model. This model organises the data in the hierarchical tree structure. The
hierarchy starts from the root which has root data and then it expands in the form of a tree adding child node to
the parent node. This model easily represents some of the real-world relationships like food recipes, sitemap of a
website etc. Example: We can represent the relationship between the shoes present on a shopping website in the
following way:
Features of a Hierarchical Model
1. One-to-many relationship: The data here is organised in a tree-like structure where the one-to-many
relationship is between the datatypes. Also, there can be only one path from parent to any
node. Example: In the above example, if we want to go to the node sneakers we only have one path to reach
there i.e through men's shoes node.
2. Parent-Child Relationship: Each child node has a parent node but a parent node can have more than one
child node. Multiple parents are not allowed.
3. Deletion Problem: If a parent node is deleted then the child node is automatically deleted.
4. Pointers: Pointers are used to link the parent node with the child node and are used to navigate between
the stored data. Example: In the above example the 'shoes' node points to the two other nodes 'women
shoes' node and 'men's shoes' node.
Advantages of Hierarchical Model
Network Model
This model is an extension of the hierarchical model. It was the most popular model before the relational model.
This model is the same as the hierarchical model, the only difference is that a record can have more than one
parent. It replaces the hierarchical tree with a graph. Example: In the example below we can see that node student
has two parents i.e. CSE Department and Library. This was earlier not possible in the hierarchical model.
Features of a Network Model
1. Ability to Merge more Relationships: In this model, as there are more relationships so data is more related.
This model has the ability to manage one-to-one relationships as well as many-to-many relationships.
2. Many paths: As there are more relationships so there can be more than one path to the same record. This
makes data access fast and simple.
3. Circular Linked List: The operations on the network model are done with the help of the circular linked list.
The current position is maintained with the help of a program and this position navigates through the
records according to the relationship.
Advantages of Network Model
• The data can be accessed faster as compared to the hierarchical model. This is because the data is more
related in the network model and there can be more than one path to reach a particular node. So the data
can be accessed in many ways.
• As there is a parent-child relationship so data integrity is present. Any change in parent record is reflected
in the child record.
Disadvantages of Network Model
• As more and more relationships need to be handled the system might get complex. So, a user must be
having detailed knowledge of the model to work with the model.
• Any change like updation, deletion, insertion is very complex.
Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model diagram. In this model, we represent the
real-world problem in the pictorial form to make it easy for the stakeholders to understand. It is also very easy for
the developers to understand the system by just looking at the ER diagram. We use the ER diagram as a visual tool
to represent an ER Model. ER diagram has the following three components:
• Entities: Entity is a real-world thing. It can be a person, place, or even a concept. Example: Teachers,
Students, Course, Building, Department, etc are some of the entities of a School Management System.
• Attributes: An entity contains a real-world property called attribute. This is the characteristics of that
attribute. Example: The entity teacher has the property like teacher id, salary, age, etc.
• Relationship: Relationship tells how two attributes are related. Example: Teacher works for a department.
Example:
In the above diagram, the entities are Teacher and Department. The attributes of Teacher entity are
Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The attributes of entity Department entity are Dept_id,
Dept_name. The two entities are connected using the relationship. Here, each teacher works for a department.
Features of ER Model
• Graphical Representation for Better Understanding: It is very easy and simple to understand so it can be
used by the developers to communicate with the stakeholders.
• ER Diagram: ER diagram is used as a visual tool for representing the model.
• Database Design: This model helps the database designers to build the database and is widely used in
database design.
Advantages of ER Model
• Simple: Conceptually ER Model is very easy to build. If we know the relationship between the attributes and
the entities we can easily build the ER Diagram for the model.
• Effective Communication Tool: This model is used widely by the database designers for communicating their
ideas.
• Easy Conversion to any Model: This model maps well to the relational model and can be easily converted
relational model by converting the ER model to the table. This model can also be converted to any other
model like network model, hierarchical model etc.
Disadvatages of ER Model
• No industry standard for notation: There is no industry standard for developing an ER model. So one
developer might use notations which are not understood by other developers.
• Hidden information: Some information might be lost or hidden in the ER model. As it is a high-level view so
there are chances that some details of information might be hidden.
Relational Model
Relational Model is the most widely used model. In this model, the data is maintained in the form of a two -
dimensional table. All the information is stored in the form of row and columns. The basic structure of a relational
model is tables. So, the tables are also called relations in the relational model. Example: In this example, we have
an Employee table.
• Tuples: Each row in the table is called tuple. A row contains all the information about any instance of the
object. In the above example, each row has all the information about any specific individual like the first
row has information about John.
• Attribute or field: Attributes are the property which defines the table or relation. The values of the attribute
should be from the same domain. In the above example, we have different attributes of the employee like
Salary, Mobile_no, etc.
Advnatages of Relational Model
• Simple: This model is more simple as compared to the network and hierarchical model.
• Scalable: This model can be easily scaled as we can add as many rows and columns we want.
• Structural Independence: We can make changes in database structure without changing the way to access
the data. When we can make changes to the database structure without affecting the capability to DBMS
to access the data we can say that structural independence has been achieved.
Disadvantages of Relatinal Model
• Hardware Overheads: For hiding the complexities and making things easier for the user this model requires
more powerful hardware computers and data storage devices.
• Bad Design: As the relational model is very easy to design and use. So the users don't need to know how
the data is stored in order to access it. This ease of design can lead to the development of a poor database
which would slow down if the database grows.
But all these disadvantages are minor as compared to the advantages of the relational model. These problems can
be avoided with the help of proper implementation and organisation.
In the above example, we have two objects Employee and Department. All the data and relationships of each object
are contained as a single unit. The attributes like Name, Job_title of the employee and the methods which will be
performed by that object are stored as a single object. The two objects are connected through a common attribute
i.e the Department_id and the communication between these two will be done with the help of this common id.
Object-Relational Model
As the name suggests it is a combination of both the relational model and the object-oriented model. This model
was built to fill the gap between object-oriented model and the relational model. We can have many advanced
features like we can make complex data types according to our requirements using the existing data types. The
problem with this model is that this can get complex and difficult to handle. So, proper understanding of this model
is required.
Semi-Structured Model
Semi-structured model is an evolved form of the relational model. We cannot differentiate between data and
schema in this model. Example: Web-Based data sources which we can't differentiate between the schema and
data of the website. In this model, some entities may have missing attributes while others may have an extra
attribute. This model gives flexibility in storing the data. It also gives flexibility to the attributes. Example: If we are
storing any value in any attribute then that value can be either atomic value or a collection of values.
• Item: Items contain the name and the identifier(some numeric value).
• Links: Links contain the identifier, source, verb and subject.
Example: Let us say we have a statement "The world cup is being hosted by London from 30 May 2020". In this
data two links need to be stored:
1. The world cup is being hosted by London. The source here is 'the world cup', the verb 'is being' and the
target is 'London'.
2. ...from 30 May 2020. The source here is the previous link, the verb is 'from' and the target is '30 May 2020'.
This is represented using the table as follows:
These commands are used to update the database schema that's why they come under Data definition language.
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database. It handles
user requests.
(But in Oracle database, the execution of data control language does not have the feature of rolling back.)
There are the following operations which have the authorization of Revoke:
TCL is used to run the changes made by the DML statement. TCL can be grouped into a logical transaction.
DATABASE ARCHITECTURE
The architecture of DBMS depends on the computer system on which it runs. For example, in a client-server DBMS
architecture, the database systems at server machine can run several requests made by client machine. We will
understand this communication with the help of diagrams.
In this type of architecture, the database is readily available on the client machine, any request made by client
doesn’t require a network connection to perform the action on the database.
For example, lets say you want to fetch the records of employee from the database and the database is available on
your computer system, so the request to fetch employee details will be done by your computer and the records will
be fetched from the database by your computer as well. This type of system is generally referred as local database
system
2. Two tier architecture
In two-tier architecture, the Database system is present at the server machine and the DBMS application is present
at the client machine, these two machines are connected with each other through a reliable network as shown in the
above diagram.
Whenever client machine makes a request to access the database present at server using a query language like sql,
the server perform the request on the database and returns the result back to the client. The application connection
interface such as JDBC, ODBC are used for the interaction between server and client.
In three-tier architecture, another layer is present between the client machine and server machine. In this
architecture, the client application doesn’t communicate directly with the database systems present at the server
machine, rather the client application communicates with server application and the server application internally
communicates with the database system present at the server
Structure of DBMS:
A database system is partitioned into modules that deal with each of the responsibilities of the overall system. The
functional components of a database system can be broadly divided into the storage manager and the query
processor components. The storage manager is important because databases typically require a large amount of
storage space. The query processor is important because it helps the database system simplify and facilitate access
to data.
It is the job of the database system to translate updates and queries written in a nonprocedural language, at the
logical level, into an efficient sequence of operations at the physical level.
Database Users:
Users are differentiated by the way they expect to interact with the system:
• Application programmers:
o Application programmers are computer professionals who write application programs. Application
programmers can choose from many tools to develop user interfaces.
o Rapid application development (RAD) tools are tools that enable an application programmer to
construct forms and reports without writing a program.
• Sophisticated users:
o Sophisticated users interact with the system without writing programs. Instead, they form their
requests in a database query language.
o They submit each such query to a query processor, whose function is to break down DML statements
into instructions that the storage manager understands.
• Specialized users :
o Specialized users are sophisticated users who write specialized database applications that do not fit
into the traditional data-processing framework.
o Among these applications are computer-aided design systems, knowledge base and expert systems,
systems that store data with complex data types (for example, graphics data and audio data), and
environment-modeling systems.
• Naïve users :
o Naive users are unsophisticated users who interact with the system by invoking one of the application
programs that have been written previously.
o For example, a bank teller who needs to transfer $50 from account A to account B invokes a program
called transfer. This program asks the teller for the amount of money to be transferred, the account
from which the money is to be transferred, and the account to which the money is to be transferred.
Database Administrator:
• Coordinates all the activities of the database system. The database administrator has a good understanding
of the enterprise’s information resources and needs.
• Database administrator's duties include:
o Schema definition: The DBA creates the original database schema by executing a set of data definition
statements in the DDL.
o Storage structure and access method definition.
o Schema and physical organization modification: The DBA carries out changes to the schema and
physical organization to reflect the changing needs of the organization, or to alter the physical
organization to improve performance.
o Granting user authority to access the database: By granting different types of authorization, the
database administrator can regulate which parts of the database various users can access.
o Specifying integrity constraints.
o Monitoring performance and responding to changes in requirements.
Query Processor
· DDL interpreter, which interprets DDL statements and records the definitions in the data dictionary.
· DML compiler, which translates DML statements in a query language into an evaluation plan consisting
of low-level instructions that the query evaluation engine understands.
A query can usually be translated into any of a number of alternative evaluation plans that all give the same result.
The DML compiler also performs query optimization, that is, it picks the lowest cost evaluation plan from among the
alternatives.
· Query evaluation engine, which executes low-level instructions generated by the DML compiler.
Storage Manager
A storage manager is a program module that provides the interface between the lowlevel data stored in the database
and the application programs and queries submitted to the system. The storage manager is responsible for the
interaction with the file manager. The raw data are stored on the disk using the file system, which is usually provided
by a conventional operating system. The storage manager translates the various DML statements into low-level file-
system commands. Thus, the storage manager is responsible for storing, retrieving, and updating data in the
database.
· Authorization and integrity manager, which tests for the satisfaction of integrity constraints and checks
the authority of users to access data.
· Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures, and that concurrent transaction executions proceed without conflicting.
· File manager, which manages the allocation of space on disk storage and the data structures used to
represent information stored on disk.
· Buffer manager, which is responsible for fetching data from disk storage into main memory, and deciding
what data to cache in main memory. The buffer manager is a critical part of the database system, since it
enables the database to handle data sizes that are much larger than the size of main memory.
Transaction Manager
A transaction is a collection of operations that performs a single logical function in a database application. Each
transaction is a unit of both atomicity and consistency. Thus, we require that transactions do not violate any database-
consistency constraints. That is, if the database was consistent when a transaction started, the database must be
consistent when the transaction successfully terminates. Transaction - manager ensures that the database remains
in a consistent (correct) state despite system failures (e.g., power failures and operating system crashes) and
transaction failures.
1. Application Programmers – They are the developers who interact with the database by means
of DML queries. These DML queries are written in the application programs like C, C++, JAVA, Pascal, etc. These
queries are converted into object code to communicate with the database. For example, writing a C program
to generate the report of employees who are working in a particular department will involve a query to fetch
the data from the database. It will include an embedded SQL query in the C Program.
2. Sophisticated Users – They are database developers, who write SQL queries to select/insert/delete/update
data. They do not use any application or programs to request the database. They directly interact with the
database by means of a query language like SQL. These users will be scientists, engineers, analysts who
thoroughly study SQL and DBMS to apply the concepts in their requirements. In short, we can say this category
includes designers and developers of DBMS and SQL.
3. Specialized Users – These are also sophisticated users, but they write special database application programs.
They are the developers who develop the complex programs to the requirement.
4. Stand-alone Users – These users will have a stand-alone database for their personal use. These kinds of the
database will have readymade database packages which will have menus and graphical interfaces.
5. Native Users – these are the users who use the existing application to interact with the database. For
example, online library system, ticket booking systems, ATMs etc which has existing application and users use
them to interact with the database to fulfill their requests.
Database Administrators
The life cycle of a database starts from designing, implementing to the administration of it. A database for any kind
of requirement needs to be designed perfectly so that it should work without any issues. Once all the design is
complete, it needs to be installed. Once this step is complete, users start using the database. The database grows as
the data grows in the database. When the database becomes huge, its performance comes down. Also accessing the
data from the database becomes a challenge. There will be unused memory in the database, making the memory
inevitably huge. This administration and maintenance of the database are taken care of by the database Administrator
– DBA.
A DBA has many responsibilities. A good-performing database is in the hands of DBA.
• Installing and upgrading the DBMS Servers: – DBA is responsible for installing a new DBMS server for the new
projects. He is also responsible for upgrading these servers as there are new versions that come into the market
or requirement. If there is any failure in the up-gradation of the existing servers, he should be able to revert the
new changes back to the older version, thus maintaining the DBMS working. He is also responsible for updating
the service packs/ hotfixes/ patches to the DBMS servers.
• Design and implementation: – Designing the database and implementing is also DBA’s responsibility. He should
be able to decide on proper memory management, file organizations, error handling, log maintenance, etc for
the database.
• Performance tuning: – Since the database is huge and it will have lots of tables, data, constraints, and indices,
there will be variations in the performance from time to time. Also, because of some designing issues or data
growth, the database will not work as expected. It is the responsibility of the DBA to tune the database
performance. He is responsible to make sure all the queries and programs work in a fraction of seconds.
• Migrate database servers: – Sometimes, users using oracle would like to shift to SQL server or Netezza. It is the
responsibility of DBA to make sure that migration happens without any failure, and there is no data loss.
• Backup and Recovery: – Proper backup and recovery programs needs to be developed by DBA and has to be
maintained him. This is one of the main responsibilities of DBA. Data/objects should be backed up regularly so
that if there is any crash, it should be recovered without much effort and data loss.
• Security: – DBA is responsible for creating various database users and roles, and giving them different levels of
access rights.
• Documentation: – DBA should be properly documenting all his activities so that if he quits or any new DBA
comes in, he should be able to understand the database without any effort. He should basically maintain all his
installation, backup, recovery, security methods. He should keep various reports about database performance.
In order to perform his entire task, he should have very good command over DBMS.
Types of DBA
There are different kinds of DBA depending on the responsibility that he owns.
• Administrative DBA – This DBA is mainly concerned with installing, and maintaining DBMS servers. His prime
tasks are installing, backups, recovery, security, replications, memory management, configurations, and tuning.
He is mainly responsible for all administrative tasks of a database.
• Development DBA – He is responsible for creating queries and procedures for the requirement. Basically, his
task is similar to any database developer.
• Database Architect – Database architect is responsible for creating and maintaining the users, roles, access
rights, tables, views, constraints, and indexes. He is mainly responsible for designing the structure of the
database depending on the requirement. These structures will be used by developers and development DBA to
code.
• Data Warehouse DBA –DBA should be able to maintain the data and procedures from various sources in the
data warehouse. These sources can be files, COBOL, or any other programs. Here data and programs will be
from different sources. A good DBA should be able to keep the performance and function levels from these
sources at the same pace to make the data warehouse work.
• Application DBA –He acts like a bridge between the application program and the database. He makes sure all
the application program is optimized to interact with the database. He ensures all the activities from installing,
upgrading, and patching, maintaining, backup, recovery to executing the records work without any issues.
• OLAP DBA – He is responsible for installing and maintaining the database in OLAP systems. He maintains only
OLAP databases.
Introduction to Database Design: Database design and ER
diagrams, Entities, attributes and entity sets, Relationships
and relationship sets, Additional features of ER model,
Conceptual Design with ER model.
The logical model concentrates on the data requirements and the data to be stored independent of physical
considerations. It does not concern itself with how the data will be stored or where it will be stored physically.
The physical data design model involves translating the logical DB design of the database onto physical media using
hardware resources and software systems such as database management systems (DBMS).
Database design process in DBMS is crucial for high performance database system.
The database development life cycle has a number of stages that are followed when developing database systems.
The steps in the development life cycle do not necessarily have to be followed religiously in a sequential manner.
On small database systems, the process of database design is usually very simple and does not involve a lot of steps.
In order to fully appreciate the above diagram, let’s look at the individual components listed in each step for overview
of design process in DBMS.
Requirements analysis
• Planning – This stages of database design concepts are concerned with planning of entire Database
Development Life Cycle. It takes into consideration the Information Systems strategy of the organization.
• System definition – This stage defines the scope and boundaries of the proposed database system.
Database designing
• Logical model – This stage is concerned with developing a database model based on requirements. The entire
design is on paper without any physical implementations or specific DBMS considerations.
• Physical model – This stage implements the logical model of the database taking into account the DBMS and
physical implementation factors.
Implementation
• Data conversion and loading – this stage of relational databases design is concerned with importing and
converting data from the old system into the new database.
• Testing – this stage is concerned with the identification of errors in the newly implemented system. It checks
the database against requirement specifications.
1. Normalization
2. ER Modeling
ER DIAGRAM
E-R model is a graphical representation of the logical structure of the database. In this model, we represent
the real-world problem in the pictorial form to make it easy for the stakeholders to understand. It is also very easy
for the developers to understand the system by just looking at the ER diagram. We use the ER diagram as a visual
tool to represent an ER Model. ER diagram has the following three components:
• Entities: Entity is a real-world thing. It can be a person, place, or even a concept. It is represented by a
rectangle. Each entity has one of its attributes(attributes are properties of an entity) as the primary key
which can define it uniquely. Such an entity is called a strong entity. There are some entities which cant be
uniquely identified from there attributes. Such an entity is called a weak entity. We will study them in details
in some other blog. Example: Teachers, Student, Course, Building, Department, etc are some of the entities
of a School Management System. Student entity is represented as below.
• Attributes: An entity contains a real-world property called attribute. This is the characteristics of that
attribute. It is represented by an oval. Example: The entity car has properties like Car_no, Color, Price, etc.
There are many types of attributes which we will study in detail in some other blog. Here, we will have an
overview of it. A simple attribute contains an atomic value which cannot be further divided. Example: Roll
Number of Student.
Key Attribute: Key attribute is used to uniquley identify the records and it represents the primary key of the table.
It is represented by oval and the text in it is underlined. Example: If we have Roll_no as the key attribute then it
would be represneted as:
Composite Attribute: This attribute can be further divided into other attributes. Example: Name of a student can
be further divided into first name, middle name and last name.
Multivalued Attribute: This attribute has more than one value. It is represented by a double oval. Example: A
student can have more than one e-mail address.
Derived Attribute: The value of this attribute are derived from some other attributes. This is done mainly because
the value for such attribute keeps on changing. It is represented by a dashed oval. Example: The value of age
attribute is derived from the DOB(date of birth) attribute.
Note: Entities along with attributes are called entity type or schema. Example: Car entity has attributes Car_no,
Color, Price. So, we represent an entity type as follows:
• Relationship: Association between two entities is called a relationship. Entities take part in the relationship.
It is represented by a diamond shape. Example: Teacher teaches a student. There are three types of
relationships that can exist between two entities.
One-to-One Relationship: Such a relationship exists when each record of one table is related to only one record of
the other table. Example: If there are two entities 'Person' and 'Passport'. So, each person can have only one
passport and each passport belongs to only one person.
One-to-Many Relationship: Such a relationship exists when each record of one table can be related to one or more
than one record of the other table. Example: If there are two entities 'Owner' and 'Pet' then each owner can have
more than one pet but each pet is owned by only one owner.
Many-to-Many Relationship: Such a relationship exists when each record of the first table can be related to one or
more than one record of the second table and a single record of the second table can be related to one or more than
one record of the first table. Example: If there are two entities 'Customer' and 'Product' then each customer can buy
more than one product and a product can be bought by many different customers.
Using the above knowledge of how to make an ER diagram, we can make an ER diagram for a hospital management
system. We will have three entities i.e Doctor, Patient, Medicine.
In the above diagram, entity Doctor has key attribute 'doctor_id' which will be used to identify the
doctors. It also has two multivalued attributes as 'specialization' and 'qualification' as a doctor may have more than
one qualification and may be specialized in more than one fields. The Doctor and Patient entity have a one-to-many
relationship as a Doctor may treat more than one patient. Similarly, Patient and Medicine have a many-to-many
relationship as a patient may buy more than one medicine and vice-versa. 'Code' is the key attribute for Medicine
which is unique for every medicine. The Patient has many attributes Patient_id, DOB, Age, etc. 'Age' is the derived
attribute here. Also, it has a composite attribute 'Address' which can further be divided into two attributes 'Locality'
and 'Town'.
Features of ER Model
• Graphical Representation for Better Understanding: It is very easy and simple to understand so it can be
used by the developers to communicate with the stakeholders.
• ER Diagram: ER diagram is used as a visual tool for representing the model. As seen above the ER diagram
makes it very easy for us to represent the real-world problems.
• Database Design: This model helps the database designers to build the database and is widely used in
database design. ER model as discussed above provides a base for designing the database. Example: In the
above hospital management system, we can create the relational database according to the attributes,
entities and the relationship among them. We can have a separate table for each entity. The attributes can
make the column of each table and the association between the tables is defined according to the
relationship according to them.
Advantages of ER Model
• Simple: Conceptually ER Model is very easy to build. If we know the relationship between the attributes and
the entities we can easily build the ER Diagram for the model.
• Effective Communication Tool: This model is used widely by the database designers for communicating their
ideas.
• Easy Conversion to any Model: This model maps well to the relational model and can be easily converted
relational model by converting the ER model to the table. This model can also be converted to any other
model like network model, hierarchical model etc.
Disadvantages of ER Model
• No industry standard for notation: There is no industry standard for developing an ER model. So one
developer might use notations which are not understood by other developers.
• Hidden information: Some information might be lost or hidden in the ER model. As it is a high-level view so
there are chances that some details of information might be hidden.
ENTITIES:
Ex: If we have a table of a Student (Roll_no, Student_name, Age, Mobile_no) then each student in that table is an
entity and can be uniquely identified by their Roll Number i.e Roll_no.
ENTITY TYPE
The entity type is a collection of the entity having similar attributes. In the above Student table example, we have
each row as an entity and they are having common attributes i.e each row has its own value for attributes Roll_no,
Age, Student_name and Mobile_no. So, we can define the above STUDENT table as an entity type because it is a
collection of entities having the same attributes. So, an entity type in an ER diagram is defined by a name(here,
STUDENT) and a set of attributes(here, Roll_no, Student_name, Age, Mobile_no). The table below shows how the
data of different entities( different students) are stored.
The E-R representation of the above Student Entity Type is done below.
Strong Entities
1. A strong entity type is an entity that exists independently of other entity types.
2. Strong entity type always has a unique characteristic called an identifier.
3. That is an attribute or combination of attribute
4. That uniquely identified each occurrence (incidence) of that entity type.
5. It can be denoted as single line rectangle.
Weak Entities
1. A weak entity is an entity type whose existence depends on some other entity type.
2. The entity has not primary key
3. That is partially or totally derived from the parent entity in the relationship.
4. It is represented by double line rectangle.
Example: If we have two tables of Customer(Customer_id, Name, Mobile_no, Age, Gender) and Address(Locality,
Town, State, Customer_id). Here we cannot identify the address uniquely as there can be many customers from the
same locality. So, for this, we need an attribute of Strong Entity Type i.e ‘Customer’ here to uniquely ide ntify entities
of 'Address' Entity Type.
ASSOCIATIVE ENTITY
1. An associative entity is an entity type that associates the instances of one ormore entity
type.
2. It means a many to many relationship exits
3. It is also called composite entity or bridge entity
4. And it contains attributes that are peculiar to the relationship between those entityinstances.
5. It is represented by a diamond shape with in a rectangle.
ENTITY SET
Entity Set is a collection of entities of the same entity type. In the above example of STUDENT entity type, a
collection of entities from the Student entity type would form an entity set. We can say that entity type is a
superset of the entity set as all the entities are included in the entity type. Let's try to understand this with the
help of an example.
Example 1: In the below example, two entities E1 (2, Angel, 19, 8709054568) and E2(4, Analisa, 21, 9847852156)
form an entity set.
ATTRIBUTE: A real-world property of an entity type is called an attribute. This is the characteristics
of an entity. It is represented by an oval or ellipse in E-R diagram. Each attribute can take only a set of permitted
values. This is called the domain of that attribute.
For example, we define the roll_no of the ‘Student’ by a numeric value. So, the permitted values are only integers
and hence, ‘integer’ is the domain of attribute ‘roll_no’. Each attribute is represented by a separate column in a
relational table.
The entity ‘Student’ has properties like Name, Address, Roll_no, Mobile_no, Age, DOB, Class, Section, etc. So, when
we make an E-R diagram then Name, Address, Roll_no, Mobile_no, Age, DOB, Section and Class are represented
as the attributes of the entity type ‘Student’
Simple Attribute
A simple attribute contains an atomic value which cannot be further divided. It is simply represented by an oval. A
simple attribute is directly connected to the entity type. While making the E-R diagram, we directly connect the
oval with the rectangle.
For example, Roll_no of Student, Age of Student. It is represented in the E-R diagram as:
Composite Attribute
A Composite attribute can be further divided into other attributes. We represent this in E-R diagram by connecting
an oval with oval i.e the composite attribute is also represented by oval and the further divided attribute all also
represented by ovals. When we convert this E-R diagram to the relational table then we don't have any column of
the composite attribute. We have column only for the attribute which came after we further divided the composite
attribute.
For example, Name of a student can be further divided into first name, middle name and last name. The composite
attribute name is also represented by oval as well as the other attributes are also represented by oval and we
connect the all the further divided attributes with the composite attribute. In the table, we will have three columns
i.e. First_name, Middle_name, and Last_name. There is no such column called "Name".
Single-valued Attribute
This attribute has only one value. It is represented by a simple oval. Some simple attribute can also be a single-
valued attribute. For example, the Section of ‘Student’ is a simple attribute as it can’t be further divided. Also, it is
a single-valued attribute because a student can't study in more than one section.
For example, A student can have more than one e-mail address.
Stored Attribute
The value of this attribute should be provided by the user. This attribute is stored and can be used for finding the
value of other attributes. It cannot be derived from any other attribute. It is also represented by an oval. For
Example, The Date of Birth of ‘Student’ has to be provided by the student.
Derived Attribute
The value of this attribute is derived from some other attributes. We know the value of some other attribute(stored
attribute)and from stored attribute, we are deriving the value of this attribute(derived attribute). This is done
mainly because the value for such attribute keeps on changing. It is represented by a dashed oval.
For example, The value of age attribute is derived from the DOB(date of birth) attribute.
Example: We have Roll_no as the key attribute of the ‘Student’ because two students can never has same roll
number.
Non-Key Attribute
All the other attributes other than the key attribute are the non-key attributes. Two or more entities can have the
same value for this attribute. For example, the Class attribute would have the same value for all those students
who are studying in the same class.
Example: Class, Section, Age, Name etc, are the non-key attributes.
Note: The same attribute can be of more than one kind. For Example, The Address attribute is a composite attribute,
multivalued attribute, stored attribute and a non-key attribute.
Relationship:
1. Relationship is an association between entities.
2. The entities that participate each relationship is identified by a name that describes the
relationship
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can have attributes.
These attributes are called descriptive attributes.
Example- Set representation of above ER diagram is-
Degree of a Relationship Set- The number of entity sets that participate in a relationship set is termed as the
degree of that relationship set. Thus,
1. Unary Relationship Set- Unary relationship set is a relationship set where only one entity set participates
in a relationship set.
Example- One person is married to only one person
2. Binary Relationship Set- Binary relationship set is a relationship set where two entity sets participate in
a relationship set.
Example- Student is enrolled in a Course
3. Ternary Relationship Set- Ternary relationship set is a relationship set where three entity sets participate in a
relationship set.
Example-
4. N-ary Relationship Set- N-ary relationship set is a relationship set where ‘n’ entity sets participate in a
relationship set.
MAPPING CARDINALITIES
Cardinality defines the number of entities in one entity set, which can be associated with the number of entities of
other set via relationship set.
There are three types of relationships that can exist between two entities.
• One-to-One Relationship
• One-to-Many or Many-to-One Relationship
• Many-to-Many Relationship
One-to-One Relationship
Such a relationship exists when each record of one table is related to only one record of the other table.
For example, If there are two entities ‘Person’ (Id, Name, Age, Address)and ‘Passport’(Passport_id, Passport_no).
So, each person can have only one passport and each passport belongs to only one person.
Such a relationship is not very common. However, such a relationship is used for security purposes. In the above
example, we can easily store the passport id in the ‘Person’ table only. But, we make another table for the ‘Passport’
because Passport number may be sensitive data and it should be hidden from certain users. So, by making a
separate table we provide extra security that only certain database users can see it.
For example, If there are two entity type ‘Customer’ and ‘Account’ then each ‘Customer’ can have more than one
‘Account’ but each ‘Account’ is held by only one ‘Customer’. In this example, we can say that each Customer is
associated with many Account. So, it is a one-to-many relationship. But, if we see it the other way i.e many Account
is associated with one Customer then we can say that it is a many-to-one relationship.
Many-to-Many Relationship
Such a relationship exists when each record of the first table can be related to one or more than one record of the
second table and a single record of the second table can be related to one or more than one record of the first
table. A many-to-many relationship can be seen as a two one-to-many relationship which is linked by a 'linking
table' or 'associate table'. The linking table links two tables by having fields which are the primary key of the other
two tables. We can understand this with the following example.
Example: If there are two entity type ‘Customer’ and ‘Product’ then each customer can buy more than one product
and a product can be bought by many different customers.
Now, to understand the concept of the linking table here, we can have the ‘Order’ entity as a linking table which
links the ‘Customer’ and ‘Product’ entity. We can break this many-to-many relationship in two one-to-many
relationships. First, each ‘Customer’ can have many ‘Order’ whereas each ‘Order’ is related to only one ‘Customer’.
Second, each ‘Order’ is related only one Product wheres there can many orders for the same Product.
In the above concept of linking can be understood with the help of taking into consideration all the attributes of
the entities 'Customer', 'Order' and 'Product'. We can see that the primary key of both 'Customer' and 'Product'
entity are included in the linking table i.e 'Order' table. These key act as foreign keys while referring to the
respective table from the 'Order' table.
Participation Constraints
The relationship can be between two strong entity or a strong entity and a weak entity. Depending upon the type
of entity participating in the relationship, the participation can be partial or total. There are two types of
participation constraints:
• Partial Participation
• Total Participation
Partial Participation: Partial Participation exists when all the entity of an entity type is not associated with one or
the other entity of another entity type. This is represented by joining the relationship with the entity type with the
help of one line.
Example: We have two entity type ‘Customer’ and ‘Order’. Then there can be ‘Customer’ who have not done any
order. So, here there is partial participation of the entity in the relationship.
Total Participation: Total Participation exists when all the entity of an entity type is associated with one or the
other entity of another entity type. This is represented by joining the relationship with the entity type with the help
of a double parallel line. Such a relationship usually exist between a strong entity and a weak entity.
Example: We have two entity type ‘Employee’ and ‘Dependant’. Then all the ‘Dependent’ entity are related to one
or the other ‘Employee’ entity. This is called total participation of the entity in the relationship. But, it may be
possible that some ‘Employee’ is not related to any of the ‘Dependant’ entity. So, ‘Employee’ is showing partial
participation whereas the ‘Dependant’ is showing total participation in the relationship.
The attributes of Employee entity type are Name, Phone, Salary, Employee_id, and Address.
The attributes of Customer entity type are Name, Phone, Address, Credit, Customer_id, and Email.
We can see that the three attributes i.e. Name, Phone, and Address are common here. When
we generalize these two entities, we form a new higher-level entity type Person. The new entity type formed is
a generalized entity. We can see in the below E-R diagram that after generalization the new generalized entity
Person contains only the common attributes i.e. Name, Phone, and Address. Employee entity contains only the
specialized attribute like Employee_id and Salary. Similarly, the Customer entity type contains only specialized
attributes like Customer_id, Credit, and Email. So from this example, it is also clear that when we go from bottom
to top, it is a Generalization and when we go from top to bottom it is Specialization. Hence, we can say that
Generalization is the reverse of Specialization.
Specialization
Specialization, as the name suggests, is a process of specializing an entity type into a more specified
entity. In this process, we specialize a higher-level entity type by adding some additional attributes to the entity
type. In this approach, we add some additional attributes and specialize a higher-level entity type into some other
entity type. It is a top-down approach in which a higher-level entity is broken into smaller entities. For
example, Animals can be mammals or reptiles. Then these mammals can either be tiger, elephant or humans. The
reptiles can be snakes or lizards. So we are specializing as we are moving towards any particular type of animal.
Example: If we have a person entity type who has attributes such as Name, Phone_no, Address. Now, suppose we
are making software for a shop then the person can be of two types. This Person entity can further be divided into
two entity i.e. Employee and Customer. How we will specialize it? We will specialize the Person entity type to
Employee entity type by adding attributes like Emp_no and Salary. Also, we can specialize the Person entity type
to Customer entity type by adding attributes like Customer_no, Email, and Credit. These lower-level attributes will
inherit all the properties of the higher-level attribute. The Customer entity type will also have attributes like Name,
Phone_no, and Address which was present in the higher-level entity.
ISA Relationship and Attribute Inheritance
IS A relationship supports attribute inheritance and relationship participation. In the EER diagram, the
subclass relationship is represented by ISA relationship. Attribute inheritance is the property by which
subclass entities inherit values for all attributes of the superclass.
Consider the example of EMPLOYEEentity set in a bank. TheEMPLOYEE in a bank can be CLERK, MANAGER,
CASHIER, ACCOUNTANT, etc. It is to be observed that the CLERK, MANAGER, CASHIER, ACCOUNTANT
inherit some of the attributes of the EMPLOYEE.
In this example the superclass is EMPLOYEE and the subclasses are CLERK, MANAGER, and CASHIER. The subclasses
inherit the attributes of the superclass. Since each member of the subclass is an ISA member of the superclass, the
circle below the EMPLOYEE entity set represents ISA relationship.
Multiple Inheritance : A subclass with more than one superclass is called a shared subclass. A subclass
inherits attributes not only of its direct superclass, but also of all its predecessor superclass, that is it has
multiple inheritance from its super classes. In multiple inheritance a subclass can be subclass of more than
one superclass.
Composition is a stronger form of aggregation where the part cannot exist without its containing whole entity type
and the part can only be part of one entity type.
Consider the example of DEPARTMENT has PROJECT. Each project is associated with a particular DEPARTMENT.
There cannot be a PROJECT without DEPARTMENT. Hence DEPARTMENT has PROJECT is an example of composition.
In other words, make sure that all data needed are in the model and that all data in the model are needed. All
data elements required by the database transactions must be defined in the model, and all data elements defined in
the model must be used by at least one database transaction. The conceptual design has four steps, which are as
follows.
1. Data analysis and requirements
2. Entity relationship modeling and normalization
3. Data model verification
4. Distributed database design
• Directly observing the current system: existing and desired output. The end user usually has an existing system in
place, whether it’s manual or computer-based. The designer reviews the existing system to identify the data and their
characteristics.
• Interfacing with the systems design group. The database design process is part of the Systems Development Life
Cycle (SDLC). In some cases, the systems analyst in charge of designing the new system will also develop the
conceptual database model.
Before creating the ER model, the designer must communicate and enforce appropriate standards to be used
in the documentation of the design. The process of defining business rules and developing the conceptual model
using ER diagrams can be described using the following steps.
The data model verification step is one of the last steps in the conceptual design stage, and it is also one of the
most critical ones. In this step, the ER model must be verified against the proposed system processes in order to
corroborate that the intended processes can be supported by the database model. Verification requires that the
model be run through a series of tests against:
• End-user data views.
• All required transactions: SELECT, INSERT, UPDATE, and DELETE operations.
• Access rights and security.
• Business-imposed data requirements and constraints.
how choosing an entity set vs an attribute can change the whole ER design semantics. To understand this lets take an
example,
❖ Similar to the problem of wanting to record several addresses for an employee: we want to record several
values of the descriptive attributes for each instance of this relationship.
2. Choosing Entity Set vs. Relationship Sets
It is hard to decide that an object can be best represented by an entity set or relationship set. To comprehend and
decide the perfect choice between these two (entity vs relationship), the user needs to understand whether the entity
would need a new relationship if a requirement arise in future, if this is the case then it is better to choose entity set
rather than relationship set.
❖ First ER diagram OK if a manager gets a separate discretionary budget for each dept.
❖ What if a manager gets a discretionary budget that covers all managed depts?
Redundancy of dbudget, which is stored for each dept managed by the manager
Misleading: suggests dbudget tied to managed dept
In most cases, the relationships described in an ER diagrams are binary. The n-ary relationships are those where
entity sets are more than two, if the entity sets are only two, their relationship can be termed as binary relationship.
The n-ary relationships can make ER design complex, however the good news is that we can convert and represent
any n-ary relationship using multiple binary relationships.
This may sound confusing so lets take an example to understand how we can convert an n-ary relationship to
multiple binary relationships. Now lets say we have to describe a relationship between four family members: father,
mother, son and daughter. This can easily be represented in forms of multiple binary relationships, father-mother
relationship as “spouse”, son and daughter relationship as “siblings” and father and mother relationship with their
child as “child”.
The cardinality ratio in DBMS can help us determine in which scenarios we need to place relationship attributes.
It is recommended to represent the attributes of one to one or one to many relationship sets with any participating
entity sets rather than a relationship set.
For example, if an entity cannot be determined as a separate entity rather it is represented by the combination of
participating entity sets. In such case it is better to associate these entities to many-to-many relationship sets.
5. Functional dependencies:
1. e.g., A dept can’t order two distinct parts from the same supplier.
1. Can’t express this wrt ternary Contracts relationship.
Normalization refines ER design by considering FDs.
Inclusion dependencies:
– Special case: Foreign keys (ER model can express these).
– e.g., At least 1 person must report to each manager. (Set of ssn values in Manages must be subset of supervisor_ssn
values in Reports_To.) Foreign key? Expressible in ER model?
❖ General constraints:
– e.g., Manager’s discretionary budget less than 10% of the combined budget of all departments he or she manages.
Module 2: RELATIONAL MODEL AND RELATIONAL ALGEBRA (08
Periods)
Relational Model: Creating and modifying relations, Integrity constraints
over relations, Enforcing integrity constraints, Querying relational data,
Logical database design, Introduction to views, Destroying/altering tables
and views.
RELATIONAL MODEL
Relational Model (RM) represents the database as a collection of relations. A relation is nothing but a table of
values. Every row in the table represents a collection of related data values. These rows in the table denote a real-
world entity or relationship.
The table name and column names are helpful to interpret the meaning of values in each row. The data are
represented as a set of relations. In the relational model, data are stored as tables. However, the physical storage of
the data is independent of the way the data are logically organized.
1. The relational model uses a collection of tables to represent both data and relationships.
2. Tables are logical structures maintained by the database manager.
3. The relational model is a combination of three components,
a. Structural
b. Integrity and
c. Manipulative parts.
a. Structural Part : The structural part defines the database as a collection of relations.
b. Integrity Part : The database integrity is maintained in the relational model using primary and foreign keys.
c. Manipulative Part
1. The relational algebra and relational calculus are the tools
2. It used to manipulate data in the database.
3. Thus relational model has a strong mathematical background.
Relational Model Concepts: This database consists of various components based on the relational model. These
include:
1. Attribute: Each column in a Table. Attributes are the properties which define a relation. e.g., Student_Rollno,
NAME,etc.
2. Tables – In the Relational model the, relations are saved in the table format. It is stored along with its entities.
A table has two properties rows and columns. Rows represent records and columns represent attributes.
3. Tuple – It is nothing but a single row of a table, which contains a single record.
4. Relation Schema: A relation schema represents the name of the relation with its attributes.
5. Degree: The total number of attributes which in the relation is called the degree of the relation.
6. Cardinality: Total number of rows present in the Table.
7. Column: The column represents the set of values for a specific attribute.
Create: Create is used to create schema, tables, index, and constraints in the database. The basic syntax to create
table is as follows.
Syntax: CREATE TABLE table_name ( column1 data_type(size), column2 data_type(size), ..columnN data_type(size)
);
Ex1 : CREATE TABLE Employees ( Emp_Id int(3), Emp_Name varchar(20),address varchar(20),dept_id int);
Insert : Insert statement is used to insert new records into the table.
Syntax1: INSERT INTO TABLE_NAME (col1, col2, col3,...colN) VALUES (value1, value2, value3,...valueN);
It inserts value to each column from 1 to N. When we insert the data in this way, we need to make sure
datatypes of each column matches with the value we are inserting. Else, data will not be inserted or will insert wrong
values. Also, if there is any foreign key constraint on the table, then we have to make sure foreign key value already
exists in the parent table.
We can insert the values into the table without specifying the columns, provided we enter the value in the order
the table structure is. Imagine table structure for Employee is as follows and then we can insert the record without
specifying column names as below:
If we are specifying the column list then we need not insert it in the order of table structure. It inserts the values
in the order the column is listed in the INSERT statement.
INSERT INTO EMPLOYEE (EMP_ID, DEPT_ID, ADDRESS, EMP_NAME) VALUES (10001, 11101,’Troy’, ‘Joseph’);
We can even copy some of the column values from the existing table. Suppose we have to copy emp_id,
emp_name from Employee table to some temporary table. Then we can write as follows: (Assuming here that
datatype in both the tables are same and emp_name has enough buffers to store both first name and last name
combined).
Update : The update command is used to update or modify the existing data of a table in the database. It
changes the values of the records. Using UPDATE command we can update a single column as well as multiple
columns simultaneously.
Syntax: UPDATE table_name SET col1 = value1, col2 = value2,… colN = valueN WHERE condition;
If we do not specify ‘WHERE’ condition in the ‘UPDATE’ statement, then it will update the whole table. Hence we
have to specify which record/employee has to be updated.
Ex: UPDATE Employees SET Salary = 10000 WHERE Emp_Id = 007;
Modifies the salary(single column) of the employee with Emp_Id 7 and sets Salary to 10000.
Delete : Using Delete statement, we can delete the records in the entire table or specific record by specifying
the condition.
Ex: Suppose we have to delete an employee with id 110 from Employee table. Then the delete statement would be
DELETE FROM EMPLOYEE WHERE EMP_ID = 110;
Example: If we have to store the salary of the employees in the 'employee_table' then we can put constraints that
it should only be an INTEGER. Any entry other than integer like characters would not be acceptable and when we
try to give input like this, the DBMS will produce errors.
Entity Integrity
Each row for an entity in a table should be uniquely identified i.e. id some record is saved in the database
then that record should be uniquely identified from others. This is done with the help of primary keys. The entity
constraint says that the value of the primary key should not be NULL. If the value of the primary key is NULL then
we can't uniquely identify the rows if all other fields are the same. Also, with the help of primary key, we can
uniquely identify each record.
Example: 1 If we have a customer database and customer_table is present there with attributes like age and name.
Then each customer should be uniquely identified. There might be two customers with the same name and same
age, so there might be confusion while retrieving the data. If we retrieve the data of customer named 'Angel' then
two rows are having this name and there would be confusion. So, to resolve this issues primary keys are assigned
in each table and it uniquely identifies each entry of the table.
Example: 2
1. You can't delete a record from a primary table if matching records exist in a related table.
2. You can't change a primary key value in the primary table if that record has related records.
3. You can't enter a value in the foreign key field of the related table that doesn't exist in the primary key of the
primary table.
4. However, you can enter a Null value in the foreign key, specifying that the records are unrelated.
EXAMPLE-
Consider 2 relations "stu" and "stu_1" Where "Stu_id " is the primary key in the "stu" relation and foreign key in the
"stu_1" relation.
Relation "stu"
Relation "stu_1"
Examples
Rule 1. You can't delete any of the rows in the ”stu” relation that are visible since all the ”stu” are in use in the “stu_1”
relation.
Rule 2. You can't change any of the ”Stu_id” in the “stu” relation since all the “Stu_id” are in use in the ”stu_1”
relation. * Rule 3.* The values that you can enter in the” Stu_id” field in the “stu_1” relation must be in the” Stu_id”
field in the “stu” relation.
Rule 4 You can enter a null value in the "stu_1" relation if the records are unrelated.
Example2 : Let us suppose we have two tables of the student(student_id, name, age, course_id) and
course(course_id, course_name, duration). Now, if any course_id is present in the student table which is not there
in the course table then this is not allowed. The course_id in the student table should either be null or if any
course_id is present in the student table then it should also be present in the course table. This is how referential
integrity is maintained.
ENFORCING INTEGRITY CONSTRAINTS
SQL Constraints are the rules or restrictions applied to the database to limit the type and accuracy of data
entering the table. If the data to enter satisfies the constraints rule, it enters successfully. Let's consider the table
named Employee with EmpId as the primary key.
Let's consider another table named Designation. It contains the list of all the available designations in the company.
D_Id Column 2
1 Associate Consultant
2 Consultant
3 Data Analyst
4 Software Developer
5 Technical Staff
We will insert a new value in the table and check if they violate the constraints.
`INSERT into Employee VALUES (6, 'Patra', 'Suman', 'Business Analyst', 90000, 'Pune');`
Here, you can see that the designation Business Analyst is not available in the company list. So, none of the
employees can have a designation other than that present in the list. Entering any value other than what satisfies the
constraints rule would lead to a violation.
Integrity constraint comes into action when the primary key gets references by a foreign key. It defines
that the values of the foreign key must be present in the primary key or is NULL. There are three primary causes of
referential integrity constraint violation. They are as follows:
Applying the above modifications to the relations in the database may or may not violate the constraints of the
database.
Insertion in a Referencing Relation : Insertion of a tuple in the relationship can cause violations in different
ways. Let's have a look at each of them one by one.
Violation of Domain Constraint : Domain constraint in SQL restricts the values or type of attributes a column can
contain. The data type of values of a domain can be string, integer, currency, character, date, or time data type.
• For representing the age in a database, the value inserted should always be a number. Insertion of any
character or string or another data type may cause a violation of domain constraint.
• The domain constraint not only restricts the datatype but also defines the range or set of values of the
attribute. For example, the domain constraint can fix the age between 10 to 50. Another number inserted
greater than 50 or less than10 causes domain constraint violation.
Violation of the Entity integrity constraint : The entity constraint makes sure that none of the values in the primary
key column is assigned NULL. The primary key identifies the individual records and connects the relations and thus
cannot be NULL.
Violation of Key Constraints : Keys in DBMS are attributes or sets of attributes that uniquely identify rows in the
relation. The primary key is one of the multiple keys present in the entity set that can contain unique and non NULL
values in the database. Insertion of value violates the key constraints when it is already present in an existing tuple
in the table.
Violation of Referential Integrity Constraints : Values already present in the list of referenced values are allowed to
insert in the referencing attribute.
Values not present in the list of referenced values cannot get inserted in the referencing attribute. Inserting such
values causes referential integrity constraint violation.
Example: We will discuss with the help of the following two schemas:
The Student schema is the referencing relation, while the Branch schema is the referenced relation. It can
also be said as the Student relation references the Branch relation. Branch_Code is the foreign key in the tables.
Student
Roll_no Name Age Branch
1 Nisha 22 CSE
2 Suman 23 CE
3 Durgesh 21 MME
4 Sarfaraz 24 CSE
5 Yashwant 23 EE
Branch
Branch_Code Branch_Name
CSE Computer Science
ECE Electronics Engineering
EE Electrical Engineering
IT Information Technology
CE Civil Engineering
MME Metallurgy and Materials Engineering
According to the constraints, any value not present in the referenced table cannot get inserted into the
referencing table. For the above table, Students having Branch_Code other than the values in the Branch table cannot
get inserted into the Students table.
For example, a Student with Branch_Code BT ( BioTechnology ) cannot be present in the Students relation as
the branch code BT is not present in the relation Branch.
Deletion of values from a table can only cause referential integrity constraint violation.
According to the constraints, if the referencing attribute uses a value of the referenced attribute, that row cannot get
deleted from the referenced relation. Deleting such rows can cause referential integrity constraint violation.
Let's consider some examples from the above two relations that cause integrity constraint violation.
• A row having Branch_Code CSE ( Computer Science and Engineering ) cannot delete from the relation Branch.
It is because referencing relation Student references the value 'CSE' by the referencing attribute Branch_Code.
The above SQL statement deletes the record of the computer science branch from the Branch table. It violates the
constraints as there is already a student present in the Student table with its Branch as CSE.
Student
Roll_no Name Age Branch
1 Nisha 22 CSE
2 Suman 23 CE
3 Durgesh 21 MME
4 Sarfaraz 24 CSE
5 Yashwant 23 EE
Branch
Branch_Code Branch_Name
ECE Electronics Engineering
EE Electrical Engineering
IT Information Technology
CE Civil Engineering
MME Metallurgy and Materials Engineering
• A row having Branch_Code ECE ( Electronics Engineering ) can be safely deleted from the relation Branch as
the referencing relation Student does not reference the value 'ECE' by the referencing attribute Branch_Code.
The above SQL statement deletes the record of the Electronics Engineering branch from the Branch table. Deleting
that doesn't violate the constraints as there is no record in the Student table referenced by it.
Student
Roll_no Name Age Branch
1 Nisha 22 CSE
2 Suman 23 CE
3 Durgesh 21 MME
4 Sarfaraz 24 CSE
5 Yashwant 23 EE
Branch
Branch_Code Branch_Name
EE Electrical Engineering
IT Information Technology
CE Civil Engineering
MME Metallurgy and Materials Engineering
Handling the Violations : The referential integrity constraint violation caused by the deletion from a referenced
relation gets handled in the following three ways:
• It is by deleting the tuples from the referencing relation simultaneously where the referencing attribute has
the value of the referenced attribute to be deleted. This way of constraint violation handling is known as On
Delete Cascade.
• The second method is to abort or delete the deletion request of the referenced relation if the value to delete
is present in the referencing attribute of the referencing relation.
• The third method involves replacing the value with NULL or some other value in the referencing relation if
the referencing attribute uses the value to delete from the referenced relation.
The Branch_Code of a tuple in the relation Branch cannot update from CSE to CS as the referencing attribute
Branch_Code references value CSE in the referencing relation Student.
The referential integrity constraint violation caused by the update from a referenced relation gets handled in the
following three ways:
• It is by updating the tuples from the referencing relation simultaneously where the referencing attribute has
the value of the referenced attribute to be updated. This way of constraint violation handling is known as
On Update Cascade.
• The second method is to abort or delete the deletion request of the referenced relation if the value to
update is present in the referencing attribute of the referencing relation.
• The third method involves replacing the value with NULL or other value in the referencing relation if the
referencing attribute uses the value updated in the referenced relation.
Syntax 1: SELECT * FROM table_name; -- retrieves all the rows and columns from table table_name and displays it in
tabular form.
Ex: SELECT * FROM STUDENT; -- All the columns are retrieved and displayed in below format
Syntax 2: SELECT COLUMN1, COLUMN2, COLUMN3 from table_name; -- retrieves only 3 columns from table
table_name
Ex : SELECT STUDENT_NAME, ADDRESS FROM STUDENT; -- Retrieves only name and address from STUDENT table
WHERE Clause – here we can specify the filter conditions to the query. We can add any number of conditions.
The WHERE clause is used to filter certain records that meet the conditions mentioned in the query. It is evaluated
second after the FROM clause.
Operator Description
= Equal
>= Greater than equal to
<= less than equal to
> Greater than
< Less than
<> Not equal
BETWEEN Between a range
Like Search for a pattern
GROUP BY – We can combine specific categories of columns together and show the results. For example, there are
multiple employees working in different department. Using group by option we can display how many employees are
working in each department.
In the above query instead of count (e.DEPARTMENT_ID), we can give count (1). Both are same.
Some of the Group functions are
Count – it counts the total number of records in the table/s after applying ‘where’ clause. If where clause is not
specified, it gives the total number of records in the table. If Group by clause is applied, it filters the records based on
where clause (if any), then groups the records based on the columns in group by clause and gives the total count of
records in each grouped category.
SUM – It totals the value in each numeric column. We can use this function to find the total marks of a student, total
salary of an employee in a specific period etc.
AVG – It gives the average value of a column, provided column has numeric value. E.g.: Average age of students
present in particular class.
MAX – It gives the maximum value in a column. For example, highest scorer in the class can be retrieved by MAX
function.
MIN– It gives the minimum value in a column. For example, lowest paid employee in a department can be obtained
by MIN.
These group functions can be used with group by clause or without it.
Having Clause – Using this clause we can add conditions to the grouped categories and filter the records. For
examples, if we want to display the details of the department which has more than 100 employees.
ORDER BY – This clause helps to sort the records that are retrieved. By default, it displays the records in ascending
order of primary key. If we need to sort it based on different columns, then we need to specify it in ORDER BY clause.
If we need to order by descending order, then DESC keyword has to be added after the column list.
We can combine two or more columns into one column by using || in select statement.
SELECT EMP_FIRST_NAME || ‘ ‘ || EMP_LAST_NAME as emp_name
FROM EMPLOYEE WHERE EMP_ID = 1001; -- Here first name and last name of employee with id 1001 to show it as
emp_name
Distinct : The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values; and sometimes you only want to list the different
(distinct) values.
Syntax : SELECT DISTINCT column1, column2, ... FROM table_name;
The first relation contains all of the attributes of the entity type except the multivalued attribute.
The second relation contains two attributes that form the primary key of the second relation. The first of these
attributes is the primary key from the first relation, which becomes a foreign key in the second relation. The second
is the multivalued attribute.
In the database, every entity set or relationship set can be represented in tabular form.
o Entity type becomes a table. In the given ER diagram, LECTURE, STUDENT, SUBJECT and COURSE forms
individual tables.
o All single-valued attribute becomes a column for the table. In the STUDENT entity, STUDENT_NAME and
STUDENT_ID form the column of STUDENT table. Similarly, COURSE_NAME and COURSE_ID form the column
of COURSE table and so on.
o A key attribute of the entity type represented by the primary key. In the given ER diagram, COURSE_ID,
STUDENT_ID, SUBJECT_ID, and LECTURE_ID are the key attribute of the entity.
o The multivalued attribute is represented by a separate table. In the student table, a hobby is a multivalued
attribute. So it is not possible to represent multiple values in a single column of STUDENT table. Hence we
create a table STUD_HOBBY with column name STUDENT_ID and HOBBY. Using both the column, we create
a composite key.
o Composite attribute represented by components. In the given ER diagram, student address is a composite
attribute. It contains CITY, PIN, DOOR#, STREET, and STATE. In the STUDENT table, these attributes can
merge as an individual column.
o Derived attributes are not considered in the table. In the STUDENT table, Age is the derived attribute. It
can be calculated at any point of time by calculating the difference between current date and Date of Birth.
Using these rules, you can convert the ER diagram to tables and columns and assign the mapping between the
tables. Table structure for the given ER diagram is as below:
INTRODUCTION TO VIEWS
As you can see from the above image, we can extract data columns from more than one table by running queries on
the data tables. Views contain only the definition of the view data in the data dictionary, not the actual data.
Let's take an employee table that store data of employees in a particular company:
Employee Table:
This table contains details of employees in a particular company and has data fields such as EmpID, EmpName,
Address, Contact. We have added 6 records of employees for our purpose.
EmpRole Table: This table contains details of employees' roles in a particular company and has data fields as EmpID,
Role, Dept. We have stored all the records particular to the EmpID of each employee.
Creating View : The view can be created by using the CREATE VIEW statement, Views can be simple or
complex depending on their usage.
Syntax: CREATE VIEW veiwName AS SELECT column1, column2,.... FROM tableName WHERE condition;
Here, viewName is the Name for the View we set, tableName is the Name of the table and condition is the
Condition by which we select rows.
Creating a Simple View : Simple view is the view that is made from a single table, It takes only one table and
just the conditions, It also does not take any inbuilt SQL functions like AVG(), MIN(), MAX() etc, or GROUP BY clause.
While creating a simple view, we are not creating an actual table, we are just projecting the data from the
original table to create a view table.
Example 1: In this example, we are creating a view table from Employee Table for getting the EmpID and EmpName.
So the query will be:
Now to see the data in the EmpView1 view created by us, We have to simply use the SELECT statement.
Output:
The view table EmpView1 that we have created from Employee Table contains EmpID and EmpName as its data fields.
EmpID EmpName
1 Alex
2 Adolf
3 Aryan
4 Bhuvan
5 Carol
6 Steve
Example 2: In this example, we are creating a view table from EmpRole Table for getting the EmpID and Dept for
employees having EmpIDs less than 4. So the query will be:
CREATE VIEW EmpView2 AS SELECT EmpID, Dept FROM EmpRole WHERE EmpID<4;
Now to see the data in the EmpView2 view created by us, We have to simply use the SELECT statement.
Output:
The view table EmpView2 that we have created from EmpRole Table contains EmpID and Dept as its data fields.
EmpID Dept
1 Engineering
2 IT
3 HR
DESTROYING/ALTERING TABLES AND VIEWS
ALTERING TABLE : Alter command is used to modify the existing database objects. It can add, delete/drop or
modify columns in the existing table. It can also be used to add and drop various constraints on the existing table.
DESTROYING TABLE: DROP command is used to delete the various existing database objects. It deletes an
entire database, an entire table, a view of a table or other objects in the database. If you drop a table, all the rows
in the table are deleted and the table structure is removed from the database. Once a table is dropped we cannot
get it back.
Deleting view
what if we don't need our created views anymore, So we need to delete the view so as we DROP a table in SQL,
similarly, we can delete or drop a view using the DROP statement. The DROP statement completely deletes the
structure of the view.
Syntax:
DROP VIEW ViewName; ------ Here ViewName is the name of the view to be deleted.
Example:
Let's delete one of our created views, say EmpView1.
DROP VIEW EmpView1;
Let's use the SELECT statement on Deleted View.
SELECT * from EmpView1;
Output:
Now SQL Editor will give us an error saying the table does not exist as the table is now been deleted.
ORA-00942: table or view does not exist
Updating View
Suppose we want to add more columns to the created view so we will have to update the view.
For updating the views we can use CREATE OR REPLACE VIEW statement, new columns will replace or get added to
the view.
Syntax:
For Updating a view CREATE OR REPLACE statement is used. Here, viewName is the Name of the view we
want to update, tableName is the Name of the table and condition is the Condition by which we select rows.
Example 1:
let's take the above created simple view EmpView1 and we want to one more column of address to our EmpView1
from Employee Table.
Now to see the data in the EmpView1 view created by us, We have to simply use the SELECT statement.
Output:
The data fields of view table EmpView1 have been updated to EmpID, EmpName, and Address.
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the result. Rest of the attributes
are eliminated from the table.
o It is denoted by ∏.
Notation: ∏ A1, A2, An (r)
Where A1, A2, A3 is used as an attribute name of relation r.
Example: We can use the rename operator to rename STUDENT relation to STUDENT1. ρ(STUDENT1, STUDENT)
b. Join Operations
1. Join operation combines two relations to create a new relation.
2. The tables should be joined based on a common column.
3. The common column should be compatible in terms of domain.
4. It allows information to be combined from two or more tables.
5. Join is the real power behind the relation database, allowing the use of
independent table linked by common attributes (Characteristics).
Join Operations:
A Join operation combines related tuples from different relations, if and only if a given join condition is satisfied. It is
denoted by ⋈.
Example:
EMPLOYEE
EMP_CODE EMP_NAME
101 Stephan
102 Jack
103 Harry
SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000
1. Operation: (EMPLOYEE ⋈ SALARY)
Result:
Example: Let's use the above EMPLOYEE table and SALARY table:
Input:∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)
Output:
EMP_NAME SALARY
Stephan 50000
Jack 30000
Harry 25000
2. Outer Join:
The outer join operation is an extension of the join operation. It is used to deal with missing information.
Example:
EMPLOYEE
EMP_NAME STREET CITY
Ram Civil line Mumbai
Shyam Park street Kolkata
Ravi M.G. Street Delhi
Hari Nehru nagar Hyderabad
FACT_WORKERS
EMP_NAME BRANCH SALARY
Ram Infosys 10000
Shyam Wipro 20000
Kuber HCL 30000
Hari TCS 50000
Input:(EMPLOYEE ⋈ FACT_WORKERS)
Output:
EMP_NAME STREET CITY BRANCH SALARY
Ram Civil line Mumbai Infosys 10000
Shyam Park street Kolkata Wipro 20000
Hari Nehru nagar Hyderabad TCS 50000
Input:EMPLOYEE ⟖ FACT_WORKERS
Output:
Input:EMPLOYEE ⟗ FACT_WORKERS
Output:
Example:
CUSTOMER RELATION
CLASS_ID NAME
1 John
2 Harry
3 Jackson
PRODUCT
PRODUCT_ID CITY
1 Delhi
2 Mumbai
3 Noida
Input: CUSTOMER ⋈ PRODUCT
Output: