
DBMS Notes Unit 1

The document outlines the fundamentals of Database Management Systems (DBMS) for the academic year 2024-2025, focusing on relational databases, their purpose, advantages, disadvantages, and architecture. It discusses data abstraction, data models, and the components of a DBMS, including the storage manager and query processor. Additionally, it highlights the importance of data manipulation and definition languages in managing databases effectively.

Uploaded by

Vignes Waran
Copyright
© All Rights Reserved

CS3492 – DATABASE MANAGEMENT SYSTEMS

ACADEMIC YEAR 2024-2025 (EVEN)


Department of Computer Science and Engineering

UNIT-I - RELATIONAL DATABASES

Purpose of Database System – Views of data – Data Models – Database System Architecture –
Introduction to relational databases – Relational Model – Keys – Relational Algebra – SQL
fundamentals – Advanced SQL features – Embedded SQL – Dynamic SQL

1.1 INTRODUCTION
A database is a collection of data elements (facts) stored in a computer in a systematic way, such
that a computer program can consult it to answer questions. The answers to those questions become
information that can be used to make decisions that could not be made from the data elements alone. The
computer program used to manage and query a database is known as a database management system
(DBMS).
So a database is a collection of related data that we can use for
• Defining - specifying types of data
• Constructing - storing & populating
• Manipulating - querying, updating, reporting
A Database Management System (DBMS) is a software package that facilitates the creation and
maintenance of a computerized database.

1.2 Purpose of Database System


The file-processing system is supported by a conventional operating system. The system stores
permanent records in various files, and it needs different application programs to extract records from,
and add records to, the appropriate files. Before database management systems (DBMSs) were
introduced, organizations usually stored information in such systems.

Database management systems were developed to handle the difficulties of typical file-
processing systems supported by conventional operating systems.

SRMMCET / CSE / DBMS 1



Advantages of DBMS
A DBMS is designed to overcome the following disadvantages of keeping organizational information
in a file-processing system:
• Data redundancy and inconsistency - Since different programmers create the files and
application programs over a long period, the various files are likely to have different structures and the
programs may be written in several programming languages. Moreover, the same information may be
duplicated in several places (files).
For example, if a student has a double major (say, music and mathematics) the address and
telephone number of that student may appear in a file that consists of student records of students in the
Music department and in a file that consists of student records of students in the Mathematics
department. This redundancy leads to higher storage and access cost. In addition, it may lead to data
inconsistency; that is, the various copies of the same data may no longer agree. For example, a
changed student address may be reflected in the Music department records but not elsewhere in the
system.

• Difficulty in accessing data - Suppose that one of the university clerks needs to find out the
names of all students who live within a particular postal-code area. The clerk asks the data-processing
department to generate such a list. Because the designers of the original system did not anticipate this
request, there is no application program on hand to meet it. There is, however, an application program to
generate the list of all students. The clerk must therefore either extract the needed names manually from
that list or ask a programmer to write a new application program; neither alternative is satisfactory.

• Data isolation - Because data are scattered in various files, and files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.

• Integrity problems - The data values stored in the database must satisfy certain types of
consistency constraints. Suppose the university maintains an account for each department, and records
the balance amount in each account. Suppose also that the university requires that the account balance
of a department may never fall below zero. Developers enforce these constraints in the system by
adding appropriate code in the various application programs. However, when new constraints are
added, it is difficult to change the programs to enforce them.

• Atomicity of updates - A computer system, like any other device, is subject to failure. In
many applications, it is crucial that, if a failure occurs, the data be restored to the consistent state that
existed prior to the failure. Consider a program to transfer $500 from the account balance of
department A to the account balance of department B. If a system failure occurs during the execution
of the program, it is possible that the $500 was removed from the balance of department A, but was
not credited to the balance of department B, resulting in an inconsistent database state. Clearly, it is
essential to database consistency that either both the credit and debit occur, or that neither occurs.


• Concurrent access by multiple users - To improve the overall performance of the system,
many systems allow multiple users to update the data simultaneously. In such an environment,
interaction of concurrent updates is possible and may result in inconsistent data.

• Security problems - Not every user of the database system should be able to access all the
data. For example, in a university, payroll personnel need to see only that part of the database that has
financial information. They do not need access to information about academic records. But, since
application programs are added to the file-processing system in an ad hoc manner, enforcing such
security constraints is difficult.
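
The integrity and atomicity problems described above can be sketched with Python's built-in sqlite3 module. This is only an illustration, not part of the original notes: the table name department, the starting balances and the helper transfer are assumptions made for the example. The CHECK constraint states the "balance may never fall below zero" rule once, in the database, and the transaction ensures that the credit and the debit happen together or not at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The consistency constraint lives in the schema, not in each application program.
conn.execute(
    "CREATE TABLE department ("
    " dept_name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
# Hypothetical starting balances for departments A and B.
conn.executemany("INSERT INTO department VALUES (?, ?)", [("A", 300), ("B", 100)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move amount from src to dst; both updates are applied or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE department SET balance = balance + ? WHERE dept_name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE department SET balance = balance - ? WHERE dept_name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint, so the credit was undone too.
        return False


ok = transfer(conn, "A", "B", 500)  # A holds only 300, so this must fail
balances = dict(conn.execute("SELECT dept_name, balance FROM department"))
```

In this sketch the credit is deliberately applied first, so the failed debit shows that the earlier credit is rolled back as well.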

Disadvantages of DBMS
• A DBMS is complex. Because it supports a wide range of functionality, the underlying
software is complex, and designers and developers need a thorough knowledge of it to get the
most out of it.
• Because of this complexity and functionality, a DBMS occupies a large amount of storage and
needs a large amount of memory to run efficiently.
• A DBMS is typically centralized, i.e., all users access the same database. Hence any failure
of the DBMS affects all of its users.
• A DBMS is general-purpose software, i.e., it is written to work for many kinds of systems
rather than for one specific application. Hence some applications may run more slowly than
they would on a purpose-built system.

1.3 Views of data


A database system is a collection of interrelated data and a set of programs that allow users to
access and modify these data. A major purpose of a database system is to provide users with an abstract
view of the data. That is, the system hides certain details of how the data are stored and maintained.

Data Abstraction
For the system to be usable, it must retrieve data efficiently. The need for efficiency has led
designers to use complex data structures to represent data in the database. Since many database-system
users are not computer trained, developers hide the complexity from users through several levels of
abstraction, to simplify users’ interactions with the system. A database system provides three levels of
abstraction:
i. Physical level (or Internal View / Schema): The lowest level of abstraction describes how
the data are actually stored. The physical level describes complex low-level data structures in detail.

ii. Logical level (or Conceptual View / Schema): The next-higher level of abstraction
describes what data are stored in the database, and what relationships exist among those data. The
logical level thus describes the entire database in terms of a small number of relatively simple structures.
Although implementation of the simple structures at the logical level may involve complex physical-
level structures, the user of the logical level does not need to be aware of this complexity. This is
referred to as physical data independence. Database administrators, who must decide what information
to keep in the database, use the logical level of abstraction.

iii. View level (or External View / Schema): The highest level of abstraction describes only
part of the entire database. Even though the logical level uses simpler structures, complexity remains
because of the variety of information stored in a large database. Many users of the database system do
not need all this information; instead, they need to access only a part of the database. The view level of
abstraction exists to simplify their interaction with the system. The system may provide many views for
the same database. Fig. 1.1 shows the relationship among the three levels of abstraction.

Fig. 1.1 Levels of Abstraction in a DBMS
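
As a rough illustration of the view level, the sketch below uses Python's built-in sqlite3 module to define a view over an instructor table. The view name instructor_public and the sample row are assumptions made for this example. Users of the view see only part of the logical schema (the salary column is hidden) and need to know nothing about how the rows are stored.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Logical level: the full instructor relation.
conn.execute(
    "CREATE TABLE instructor ("
    " ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
# Illustrative sample row (an assumption, not data from these notes).
conn.execute("INSERT INTO instructor VALUES ('10101', 'Srinivasan', 'Comp. Sci.', 65000)")

# View level: expose only part of the database, hiding the salary attribute.
conn.execute(
    "CREATE VIEW instructor_public AS SELECT ID, name, dept_name FROM instructor"
)

cur = conn.execute("SELECT * FROM instructor_public")
view_columns = [d[0] for d in cur.description]
view_rows = cur.fetchall()
```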

1.4 Data Models


A data model is a collection of conceptual tools for describing data, data relationships, data
semantics, and consistency constraints. A data model provides a way to describe the design of a
database at the physical, logical, and view levels.
The data models can be classified into four different categories:
i. Relational Model
ii. Entity-Relationship Model
iii. Object-Based Data Model
iv. Semi-structured Data Model


i) Relational Model. The relational model uses a collection of tables to represent both data
and the relationships among those data. Each table has multiple columns, and each column has a
unique name. Tables are also known as relations. The relational model is an example of a record-based
model. Record-based models are so named because the database is structured in fixed-format records of
several types. Each table contains records of a particular type. Each record type defines a fixed number
of fields, or attributes. The columns of the table correspond to the attributes of the record type. The
relational data model is the most widely used data model, and a vast majority of current database
systems are based on the relational model.
ii) Entity-Relationship Model. The entity-relationship (E-R) data model uses a collection of
basic objects, called entities, and relationships among these objects. An entity is a “thing” or “object”
in the real world that is distinguishable from other objects. The entity-relationship model is widely
used in database design.

iii) Object-Based Data Model. Object-oriented programming has become the dominant
software-development methodology. This led to the development of an object-oriented data model
that can be seen as extending the E-R model with notions of encapsulation, methods, and object
identity.

iv) Semi-structured Data Model. The semi-structured data model is different from the other
three data models. It permits the specification of data in which individual data items of the
same type may have different sets of attributes.
The Extensible Markup Language, also known as XML, is widely used for representing
semi-structured data. Although XML was initially designed for adding markup information to
text documents, it has gained importance because of its application in the exchange of data.

Fig. 1.2 Data Models

Database Languages
A database system provides a data-definition language to specify the database schema and a
data-manipulation language to express database queries and updates. In practice, the data-definition and
data-manipulation languages are not two separate languages; instead they simply form parts of a single
database language, such as the widely used SQL language.


Data-Manipulation Language
A data-manipulation language (DML) is a language that enables users to access or manipulate
data as organized by the appropriate data model. The types of access are:
 Retrieval of information stored in the database
 Insertion of new information into the database
 Deletion of information from the database
 Modification of information stored in the database

There are basically two types of DML:


• Procedural DMLs require a user to specify what data are needed and how to get those data.
• Declarative DMLs (also referred to as nonprocedural DMLs) require a user to specify what data are
needed without specifying how to get those data. Declarative DMLs are usually easier to learn and use
than are procedural DMLs. However, since a user does not have to specify how to get the data, the
database system has to figure out an efficient means of accessing data.
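
The difference between the two styles can be sketched in Python. This is an illustrative comparison only: the sample records and both function names are assumptions made for the example. The procedural version spells out how to find the data by visiting every record, while the declarative version states only what is wanted in SQL and leaves the access strategy to the database system (here SQLite, through the built-in sqlite3 module).

```python
import sqlite3

# Illustrative instructor records: (ID, name, dept_name, salary).
records = [
    ("10101", "Srinivasan", "Comp. Sci.", 65000),
    ("22222", "Einstein", "Physics", 95000),
    ("33456", "Gold", "Physics", 87000),
]


def physics_names_procedural(records):
    """Procedural style: the caller states how to get the data."""
    names = []
    for _id, name, dept_name, _salary in records:  # visit every record
        if dept_name == "Physics":  # test each one by hand
            names.append(name)
    return sorted(names)


def physics_names_declarative(records):
    """Declarative style: the caller states only what data are needed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE instructor (ID TEXT, name TEXT, dept_name TEXT, salary INTEGER)"
    )
    conn.executemany("INSERT INTO instructor VALUES (?, ?, ?, ?)", records)
    rows = conn.execute(
        "SELECT name FROM instructor WHERE dept_name = 'Physics' ORDER BY name"
    )
    return [row[0] for row in rows]
```

Both functions return the same names; only the second leaves the choice of access path to the system.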

A query is a statement requesting the retrieval of information. The portion of a DML that
involves information retrieval is called a query language. Although technically incorrect, it is common
practice to use the terms query language and data-manipulation language synonymously.

Data-Definition Language (DDL)


We specify a database schema by a set of definitions expressed by a special language called a
data-definition language (DDL). The DDL is also used to specify additional properties of the data.
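
A minimal sketch of a DDL statement, run through Python's built-in sqlite3 module. The department table and its columns are assumptions chosen for illustration. The CREATE TABLE statement defines the schema together with an additional property of the data (a CHECK constraint), and SQLite records the definition in its own catalog, sqlite_master, which can be queried like any other table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema and an extra property of the data (budget >= 0).
conn.execute(
    """
    CREATE TABLE department (
        dept_name TEXT PRIMARY KEY,
        building  TEXT,
        budget    INTEGER CHECK (budget >= 0)
    )
    """
)

# The definition is stored as metadata in SQLite's catalog table.
tables = [
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
]
# PRAGMA table_info lists one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(department)")]
```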

1.5 Database System Architecture


A database system is partitioned into modules that deal with each of the responsibilities of the
overall system. The functional components of a database system can be broadly divided into the storage
manager and the query processor components.

The storage manager is important because databases typically require a large amount of storage
space. Corporate databases range in size from hundreds of gigabytes to, for the largest databases,
terabytes of data. A gigabyte is approximately 1000 megabytes (actually 1024) (1 billion bytes), and a
terabyte is 1 million megabytes (1 trillion bytes). Since the main memory of computers cannot store this
much information, the information is stored on disks. Data are moved between disk storage and main
memory as needed. Since the movement of data to and from disk is slow relative to the speed of the
central processing unit, it is imperative that the database system structure the data so as to minimize the
need to move data between disk and main memory.


The query processor is important because it helps the database system to simplify and facilitate
access to data. The query processor allows database users to obtain good performance while being able
to work at the view level. It is the job of the database system to translate updates and queries written in a
nonprocedural language, at the logical level, into an efficient sequence of operations at the physical
level.

Fig. 1.3 System Architecture

Storage Manager:
The storage manager is the component of a database system that provides the interface between
the low level data stored in the database and the application programs and queries submitted to the
system. The storage manager is responsible for the interaction with the file manager. The raw data are
stored on the disk using the file system provided by the operating system. The storage manager translates
the various DML statements into low-level file-system commands. Thus, the storage manager is
responsible for storing, retrieving, and updating data in the database.
The storage manager components include:
• Authorization and integrity manager, which tests for the satisfaction of integrity
constraints and checks the authority of users to access data.
• Transaction manager, which ensures that the database remains in a consistent (correct) state despite
system failures and that concurrent transaction executions proceed without conflicting.
• File manager, which manages the allocation of space on disk storage and the data
structures used to represent information stored on disk.
• Buffer manager, which is responsible for fetching data from disk storage into main
memory, and deciding what data to cache in main memory. The buffer manager is a
critical part of the database system, since it enables the database to handle data sizes that
are much larger than the size of main memory.
The storage manager implements several data structures as part of the physical system
implementation:
• Data files, which store the database itself.
• Data dictionary, which stores metadata about the structure of the database, in particular the
schema of the database.
• Indices, which can provide fast access to data items. A database index provides pointers to
those data items that hold a particular value. For example, we could use an index to find
the instructor record with a particular ID, or all instructor records with a particular name.
Hashing is an alternative to indexing that is faster in some but not all cases.
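
The effect of an index can be observed with a small sketch using Python's built-in sqlite3 module. The generated rows, the index name idx_instructor_name and the helper plan are assumptions made for this example. EXPLAIN QUERY PLAN asks SQLite how it intends to evaluate a query; the exact wording of its answer varies between SQLite versions, so the sketch only looks at whether the index is mentioned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instructor ("
    " ID TEXT PRIMARY KEY, name TEXT, dept_name TEXT, salary INTEGER)"
)
# Synthetic rows, generated only to give the table some content.
conn.executemany(
    "INSERT INTO instructor VALUES (?, ?, ?, ?)",
    [(str(i), f"name{i}", "Dept", 50000) for i in range(1000)],
)


def plan(sql, params=()):
    """Return SQLite's description of how it would evaluate the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params)
    return " ".join(row[3] for row in rows)  # column 3 holds the detail text


query = "SELECT * FROM instructor WHERE name = ?"
before = plan(query, ("name42",))  # no index on name: the table is scanned

conn.execute("CREATE INDEX idx_instructor_name ON instructor(name)")
after = plan(query, ("name42",))  # the plan now searches via the index
```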

The Query Processor:


The query processor components include:
• DDL interpreter, which interprets DDL statements and records the definitions in the data
dictionary.
• DML compiler, which translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands. A
query can usually be translated into any of a number of alternative evaluation plans that all
give the same result. The DML compiler also performs query optimization; that is, it picks
the lowest cost evaluation plan from among the alternatives.
• Query evaluation engine, which executes low-level instructions generated by the DML
compiler.


Database Architecture
The architecture of a database system is greatly influenced by the underlying computer system on
which the database system runs. Database systems can be centralized, or client-server, where one server
machine executes work on behalf of multiple client machines. Database systems can also be designed to
exploit parallel computer architectures. Distributed databases span multiple geographically separated
machines.

Most users of a database system today are not present at the site of the database system, but
connect to it through a network. We can therefore differentiate between client machines, on which remote
database users work, and server machines, on which the database system runs.

Database applications are usually partitioned into two or three parts. In a two-tier architecture, the
application resides at the client machine, where it invokes database system functionality at the server
machine through query language statements. Application program interface standards like ODBC and
JDBC are used for interaction between the client and the server.

In contrast, in a three-tier architecture, the client machine acts as merely a front end and does not
contain any direct database calls. Instead, the client end communicates with an application server, usually
through a forms interface. The application server in turn communicates with a database system to access
data. The business logic of the application, which says what actions to carry out under what conditions, is
embedded in the application server, instead of being distributed across multiple clients. Three-tier
applications are more appropriate for large applications, and for applications that run on the World Wide
Web.

Fig 1.5 Two-tier and Three-tier Architectures


1.6 Introduction to relational databases


A relational database consists of a collection of tables, each of which is assigned a unique name.
For example, consider the instructor table of Table 1.1, which stores information about instructors. The
table has four column headers: ID, name, dept_name, and salary. Each row of this table records information
about an instructor, consisting of the instructor’s ID, name, dept_name, and salary. Similarly, the course
table of Table 1.2 stores information about courses, consisting of a course_id, title, dept_name, and
credits, for each course. Note that each instructor is identified by the value of the column ID, while each
course is identified by the value of the column course_id.
Table 1.1 The instructor relation

Table 1.2 The course relation

1.7 Relational Model


The relational model is the primary data model for commercial data processing applications. The
relational model uses a collection of tables to represent both data and relationships among those data.
The relational model describes data at the logical and view levels, abstracting away low level details of
data storage.
Let us consider the instructor relation of Table 1.1 and the course relation of Table 1.2.
Table 1.3 shows a third table, prereq, which stores the prerequisite courses for each course. The
table has two columns, course_id and prereq_id. Each row consists of a pair of course identifiers such
that the second course is a prerequisite for the first course.

Thus, a row in the prereq table indicates that two courses are related in the sense that one course
is a prerequisite for the other. As another example, a row in the instructor table
can be thought of as representing the relationship between a specified ID and the corresponding values
of name, dept_name, and salary.


Table 1.3 Prereq relation

Thus, in the relational model the term relation is used to refer to a table, while the term tuple is
used to refer to a row. Similarly, the term attribute refers to a column of a table.

Examining Table 1.1, we can see that the relation instructor has four attributes: ID, name,
dept_name, and salary. We use the term relation instance to refer to a specific instance of a relation, i.e.,
containing a specific set of rows. The instance of instructor shown in Table 1.1 has 5 tuples,
each corresponding to one instructor.
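
The way a row of prereq relates two rows of course can be sketched with Python's built-in sqlite3 module. The sample rows below are assumptions made for illustration, since the contents of Tables 1.2 and 1.3 are not reproduced in the text. The query joins course to itself through prereq to turn each pair of identifiers into a pair of course titles.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE course (
        course_id TEXT PRIMARY KEY, title TEXT, dept_name TEXT, credits INTEGER
    );
    CREATE TABLE prereq (course_id TEXT, prereq_id TEXT);

    -- Illustrative rows only.
    INSERT INTO course VALUES ('CS-101', 'Intro. to Computer Science', 'Comp. Sci.', 4);
    INSERT INTO course VALUES ('CS-190', 'Game Design', 'Comp. Sci.', 4);
    INSERT INTO prereq VALUES ('CS-190', 'CS-101');
    """
)

# Each prereq tuple (course_id, prereq_id) links two tuples of course.
pairs = conn.execute(
    """
    SELECT c.title, p.title
    FROM prereq AS r
    JOIN course AS c ON c.course_id = r.course_id
    JOIN course AS p ON p.course_id = r.prereq_id
    """
).fetchall()
```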

Database Schema
• The database schema is the logical design of the database, and the database instance is a snapshot of
the data in the database at a given instant in time.
• The concept of a relation corresponds to the programming-language notion of a variable, while the
concept of a relation schema corresponds to the programming-language notion of a type definition.
• In general, a relation schema consists of a list of attributes and their corresponding domains.
• The concept of a relation instance corresponds to the programming-language notion of the value of a
variable.
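
The programming-language analogy can be made concrete with a short Python sketch. The class name Instructor and the sample tuples are assumptions made for this illustration. The NamedTuple plays the role of the relation schema (a type definition listing attributes and their domains), the variable instructor plays the role of the relation, and the set it currently holds is the relation instance.

```python
from typing import NamedTuple


class Instructor(NamedTuple):
    """Relation schema: a list of attributes with their domains."""

    ID: str
    name: str
    dept_name: str
    salary: int


# Relation (a variable) whose current value is a relation instance (a set of tuples).
instructor = {
    Instructor("10101", "Srinivasan", "Comp. Sci.", 65000),
    Instructor("22222", "Einstein", "Physics", 95000),
}
schema_before = Instructor._fields

# Adding a tuple changes the instance but not the schema.
instructor.add(Instructor("33456", "Gold", "Physics", 87000))
# A relation is a set, so adding the same tuple again has no effect.
instructor.add(Instructor("33456", "Gold", "Physics", 87000))
schema_after = Instructor._fields
```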

Schema Diagram
A schema is the blueprint or structure that defines how data is organized and stored in a
database. It outlines the tables, fields, relationships, views, indexes, and other elements within the
database. The schema defines the logical view of the entire database and specifies the rules that govern
the data, including its types, constraints, and relationships.

Fig. 1.6 Examples for Schema


Database Schema
A database schema is the design or structure of a database that defines how data is organized and
how different data elements relate to each other. It acts as a blueprint, outlining tables, fields,
relationships, and rules that govern the data.

Key points about a database schema:


• It defines how data is logically organized, including tables, fields, and relationships.
• It outlines the relationships between entities, such as primary and foreign keys.
• It helps resolve issues with unstructured data by organizing it in a clear, structured way.
• Database schemas guide how data is accessed, modified, and maintained.
In simple terms, the schema provides the framework that makes it easier to understand, manage,
and use data in a database. It’s created by database designers to ensure the data is consistent and
efficiently organized.

Fig. 1.7 Types of Database Schema


Types of Database Schemas
i. Physical Database Schema
A physical schema defines how data is stored in the storage system, including the arrangement of
files, indices and other storage structures. It specifies the actual code and syntax needed to create the
database structure. Essentially, it determines where and how the data is stored in the physical storage
medium.
The database administrator decides the storage locations and organization of data within the
storage blocks. This schema represents the lowest level of abstraction.

ii. Logical Database Schema


A logical database schema defines the logical structure of the data, including tables, views,
relationships, and integrity constraints. It describes how data is organized in tables and how the
attributes of these tables are connected. The logical schema ensures that the data is stored in an
organized manner, while maintaining data integrity.


Using Entity-Relationship (ER) modeling, the logical schema outlines the relationships between
different data components. It also defines integrity constraints to ensure the quality of data during
insertion and updates. This schema represents a higher level of abstraction compared to the physical
schema, focusing on logical constraints and how the data is structured, without dealing with the physical
storage details.

Database Schema Designs


There are many ways to structure a database and we should use the best-suited schema design for
creating our database because ineffective schema designs are difficult to manage & consume extra
memory and resources. Schema design mostly depends on the application’s requirements.
Some effective schema designs to create applications are:
i. Flat Model
ii. Hierarchical Model
iii. Network Model
iv. Relational Model
v. Star Schema
vi. Snowflake Schema
Flat Model
A flat model schema is a 2-D array in which every column contains the same type of
data/information and the elements within each row are related to each other. It is just like a table or a
spreadsheet. This schema is better for small applications that do not contain complex data.

Fig. 1.8 Flat Models


Hierarchical Model
Data is arranged using parent-child relationships and a tree-like structure in the Hierarchical
Database Model. Because each record can have several children but only one parent, it can be used to
illustrate one-to-many relationships in diagrams such as organizational charts. A hierarchical database
structure is well suited to storing nested data.


Fig 1.9 Hierarchical Model


Network Model
The network model is similar to the hierarchical model in that it represents data using nodes
(entities) and edges (relationships). However, unlike the hierarchical model, which enforces a strict
parent-child relationship, the network model allows for more flexible many-to-many relationships. This
flexibility means that a node can have multiple parent nodes and child nodes, making the structure more
dynamic.
The network model can contain cycles, that is, paths that
start and end at the same node. These cycles enable more complex relationships and allow for greater
data interconnectivity.

Fig. 1.10 Network Model


Relational Model
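The relational model is mainly used for relational databases, where the data is stored as relations
(tables). Rows in different tables are related through shared key values rather than through explicit
links, and applications written in object-oriented languages usually access such a schema through a
mapping between objects and rows.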

Fig. 1.11 Relational Model


Star Schema
A star schema is well suited to storing and analyzing large amounts of data. It has a fact table at its
center and multiple dimension tables connected to it, like a star. The fact table contains the
numerical data (measures) produced by business processes, and each dimension table contains data related
to a dimension such as product, time, or people; in other words, the dimension tables describe the facts
in the fact table.

Fig. 1.12 Star Schema
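
A very small star schema can be sketched with Python's built-in sqlite3 module. All table names, column names and values here are assumptions made for illustration. The fact table fact_sales holds a numerical measure (amount) plus one key per dimension, and the analysis query joins the fact table to its dimension tables and aggregates the measure.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables: descriptive data about each dimension.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
    CREATE TABLE dim_date (date_id INTEGER PRIMARY KEY, year INTEGER);

    -- Fact table at the center: one measure plus a key into each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        date_id    INTEGER REFERENCES dim_date(date_id),
        amount     INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Book');
    INSERT INTO dim_date VALUES (1, 2024), (2, 2025);
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 1, 5), (1, 2, 7), (2, 2, 20);
    """
)

# Typical analytical query: total sales per product per year.
totals = conn.execute(
    """
    SELECT p.product_name, d.year, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_date AS d ON d.date_id = f.date_id
    GROUP BY p.product_name, d.year
    ORDER BY p.product_name, d.year
    """
).fetchall()
```

In a snowflake variant of this sketch, a dimension such as dim_product would itself be split into further related tables, for example a separate category table.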


Snowflake Schema
Just like the star schema, the snowflake schema has a fact table at its center and multiple
dimension tables connected to it. The main difference between the two models is that in the snowflake
schema the dimension tables are further normalized into multiple related tables. The snowflake schema is
also used for analyzing large amounts of data.

Fig. 1.13 Snowflake Schema

1.8 Keys
In databases, keys are fundamental in maintaining data integrity and organization. They serve as
unique identifiers and establish relationships between tables, enabling efficient data retrieval and
manipulation.


Types of Database Keys


i. Primary key
ii. Candidate key
iii. Super Key
iv. Foreign key
v. Alternate key
vi. Composite key

Primary Key
There can be more than one candidate key in a relation, out of which one is chosen as the
primary key. For example, STUD_NO as well as PHONE are candidate keys for the relation
STUDENT, but STUD_NO can be chosen as the primary key (only one out of many candidate keys).

• A primary key is a unique key, meaning it can uniquely identify each record (tuple) in a table.
• It must have unique values and cannot contain any duplicate values.
• A primary key cannot be NULL, as it needs to provide a valid, unique identifier for every
record.
• A primary key does not have to consist of a single column. In some cases, a composite
primary key (made of multiple columns) can be used to uniquely identify records in a table.
• Most database systems automatically index the primary key, and some store rows physically
ordered by it, so that records can be accessed quickly by primary key.
Example:
STUDENT table -> Student(STUD_NO, SNAME, ADDRESS, PHONE), where STUD_NO is the
primary key.
Table STUDENT
Stud_No.  SName   Address  Phone
1         Shyam   Delhi    123456789
2         Rakesh  Kolkata  223365796
3         Suraj   Delhi    175468965
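
The uniqueness requirement of a primary key can be sketched with Python's built-in sqlite3 module, using the STUDENT rows shown above. The fourth student ('Mohan', 'Chennai') is a hypothetical row added only to provoke the error. The database itself rejects a second row with an existing STUD_NO, and a lookup by primary key identifies exactly one record.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student ("
    " stud_no INTEGER PRIMARY KEY, sname TEXT, address TEXT, phone TEXT)"
)
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?, ?)",
    [
        (1, "Shyam", "Delhi", "123456789"),
        (2, "Rakesh", "Kolkata", "223365796"),
        (3, "Suraj", "Delhi", "175468965"),
    ],
)

# Hypothetical row that reuses the primary key value 1.
try:
    conn.execute("INSERT INTO student VALUES (1, 'Mohan', 'Chennai', '998877665')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True  # the DBMS enforces uniqueness of the key

# A primary key value identifies at most one record.
found = conn.execute("SELECT sname FROM student WHERE stud_no = 2").fetchall()
```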

Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key.
For example, STUD_NO in the STUDENT relation.
• A candidate key is a minimal super key, meaning it can uniquely identify a record but contains
no extra attributes: removing any attribute from it destroys uniqueness.
• A candidate key must contain unique values, ensuring that no two rows have the same value in
the candidate key’s columns.
• Every table must have at least one candidate key.
• A table can have multiple candidate keys but only one primary key.

Example:
STUD_NO is the candidate key for relation STUDENT.

Super Key
A set of one or more attributes (columns) that can uniquely identify a tuple (record) is known
as a super key. For example, STUD_NO, (STUD_NO, SNAME), etc.
• A super key is a set of one or more attributes whose combined values uniquely identify each row
in a table. Unlike a primary key, a super key may include attributes that allow NULL values.
• A super key can contain extra attributes that aren’t necessary for uniqueness. For example, if
the “STUD_NO” column can uniquely identify a student, adding “SNAME” to it will still
form a valid super key, though it’s unnecessary.

Example: Let us consider Table STUDENT

STUD_NO   SNAME    ADDRESS   PHONE
1         Shyam    Delhi     123456789
2         Rakesh   Kolkata   223365796
3         Suraj    Delhi     175468965

Here, (STUD_NO, PHONE) is a super key, because STUD_NO alone is already unique.
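
The difference between a super key and a candidate key can be checked mechanically on sample data. The following is a minimal Python sketch, not part of the original notes; the helper names is_super_key and candidate_keys are hypothetical. A key is really a rule about every legal state of the table, so a test on three sample rows can only rule keys out. For instance, SNAME looks unique here only because the sample is small.

```python
from itertools import combinations

# Rows of the STUDENT table from the example above.
COLUMNS = ("STUD_NO", "SNAME", "ADDRESS", "PHONE")
STUDENT = [
    (1, "Shyam", "Delhi", "123456789"),
    (2, "Rakesh", "Kolkata", "223365796"),
    (3, "Suraj", "Delhi", "175468965"),
]

def is_super_key(attrs, rows=STUDENT, columns=COLUMNS):
    """True if no two rows agree on every attribute in attrs."""
    idx = [columns.index(a) for a in attrs]
    seen = {tuple(row[i] for i in idx) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows=STUDENT, columns=COLUMNS):
    """Minimal super keys: no attribute can be dropped without losing uniqueness."""
    found = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            # A set that contains a smaller key is a super key but not minimal.
            if any(set(k) <= set(attrs) for k in found):
                continue
            if is_super_key(attrs, rows, columns):
                found.append(attrs)
    return found

print(is_super_key(("STUD_NO", "PHONE")))  # unique, but not minimal
print(is_super_key(("ADDRESS",)))          # Delhi repeats, so not a key
print(candidate_keys())                    # minimal keys for this sample only
```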

Foreign Key
A foreign key is an attribute (or set of attributes) in one table that refers to the primary key of
another table. The table that contains the foreign key is called the referencing table, and the table
it points to is called the referenced table.
• A foreign key in one table points to the primary key in another table, establishing a
relationship between them.
• It connects two or more tables. Every foreign key value must match an existing key value in the
referenced table (or be NULL), which maintains referential integrity and helps avoid redundant data.
• Foreign keys act as a cross-reference between tables.
Example:
STUD_NO in STUDENT_COURSE is a foreign key that references STUD_NO in the STUDENT relation.
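
The effect of a foreign key can be observed with a small experiment. The following is a sketch using Python's built-in sqlite3 module rather than any particular DBMS from the notes. The COURSE_NO column and the inserted values are assumptions made for illustration. Note that SQLite checks foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.execute("CREATE TABLE STUDENT (STUD_NO INTEGER PRIMARY KEY, SNAME TEXT)")
conn.execute(
    "CREATE TABLE STUDENT_COURSE ("
    " STUD_NO INTEGER REFERENCES STUDENT(STUD_NO),"
    " COURSE_NO TEXT)"  # COURSE_NO is a hypothetical column
)
conn.execute("INSERT INTO STUDENT VALUES (1, 'Shyam')")
conn.execute("INSERT INTO STUDENT_COURSE VALUES (1, 'C1')")  # accepted: student 1 exists

try:
    conn.execute("INSERT INTO STUDENT_COURSE VALUES (99, 'C1')")  # no student 99
    rejected = False
except sqlite3.IntegrityError:
    rejected = True  # the referencing row has no matching referenced row

print(rejected)
```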


Alternate Key
An alternate key is any candidate key of a table that is not chosen as the primary key. In other
words, all the candidate keys that are not selected as the primary key are alternate keys.

An alternate key is also referred to as a secondary key. Like the primary key, it can uniquely
identify the records in a table.
An alternate key can consist of one or more columns (fields).
Example: If every student has a distinct PHONE, then STUD_NO and PHONE are both candidate
keys of the STUDENT table; with STUD_NO chosen as the primary key, PHONE is an alternate key.
SNAME and ADDRESS are not reliable alternate keys: two students may share a name, and the
address Delhi already appears in more than one row.

Composite Key
Sometimes a table has no single column/attribute that uniquely identifies all its records. In that
case, a combination of two or more columns/attributes can be used to identify rows uniquely. Not
every combination is unique, so we need to find a set of attributes whose combined values never
repeat across the rows of the table.
• A composite key can act as the primary key when no single attribute qualifies.
• Two or more attributes are used together to make a composite key.
• Different combinations of attributes differ in whether they identify the rows uniquely; only a
combination with no duplicate values is a key.

1.9 Relational Algebra


Relational algebra is a procedural query language. It gives a step by step process to obtain the
result of the query. It uses operators to perform queries.

Types of relational operations:


Select Operation (σ)
It selects tuples that satisfy the given predicate from a relation.
Notation − σp(r)
Where σ is the selection operator, p is the selection predicate, and r is the relation. p is a
propositional logic formula that may use connectives such as and, or, and not. Its terms may use
comparison operators such as =, ≠, ≥, <, >, ≤.

Example
σsubject = "database"(Books)
Output: Selects tuples from Books where subject is 'database'.
σsubject = "database" and price = 450(Books)
Output: Selects tuples from Books where subject is 'database' and price is 450.


σsubject = "database" and price = 450 or year > 2010(Books)
Output: Selects tuples from Books where subject is 'database' and price is 450, together with all
books published after 2010 (reading and as binding more tightly than or).
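
The selection examples can be sketched in a few lines of Python. This is only an illustration: the rows of Books and the helper name select are assumptions, and only the attribute names subject, price and year come from the text.

```python
# Hypothetical rows for the Books relation; only the attribute names come from the text.
Books = [
    {"title": "B1", "subject": "database", "price": 450, "year": 2008},
    {"title": "B2", "subject": "database", "price": 300, "year": 2012},
    {"title": "B3", "subject": "networks", "price": 450, "year": 2015},
    {"title": "B4", "subject": "networks", "price": 200, "year": 2005},
]

def select(predicate, relation):
    """sigma_p(r): keep the tuples of r for which the predicate p holds."""
    return [t for t in relation if predicate(t)]

r1 = select(lambda t: t["subject"] == "database", Books)
r2 = select(lambda t: t["subject"] == "database" and t["price"] == 450, Books)
# Python evaluates `and` before `or`, matching the reading of the third example.
r3 = select(
    lambda t: t["subject"] == "database" and t["price"] == 450 or t["year"] > 2010,
    Books,
)

print([t["title"] for t in r1])
print([t["title"] for t in r2])
print([t["title"] for t in r3])
```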

Project Operation (∏)


It keeps only the listed columns of a relation and discards the rest.
Notation − ∏A1, A2, ..., An (r), where A1, A2, ..., An are attribute names of relation r. Duplicate
rows are automatically eliminated, because a relation is a set.
Example
∏subject, author (Books)
Output: Projects the columns named subject and author from the relation Books.
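
Projection, including the removal of duplicate rows, can be sketched as follows. The rows and author names are hypothetical, and project is an illustrative helper, not a library function.

```python
# Hypothetical rows; two books share the same (subject, author) pair.
Books = [
    {"title": "B1", "subject": "database", "author": "Korth"},
    {"title": "B2", "subject": "database", "author": "Korth"},
    {"title": "B3", "subject": "networks", "author": "Tanenbaum"},
]

def project(attrs, relation):
    """pi_attrs(r): keep only the named attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = tuple(t[a] for a in attrs)
        if row not in result:  # a relation is a set, so duplicates collapse
            result.append(row)
    return result

print(project(("subject", "author"), Books))
```

Three input rows yield two output rows, because B1 and B2 agree on both projected attributes.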

Cartesian Product (×)


Combines information of two different relations into one by pairing every tuple of the first
relation with every tuple of the second.
Notation − r × s, where r and s are relations and the output is defined as
r × s = { q t | q ∈ r and t ∈ s }

Example
σauthor = 'tutorialspoint'(Books × Articles)
Output − Yields a relation that shows all the books and articles written by tutorialspoint.
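
A short Python sketch of the product followed by a selection is given here. The tuples are hypothetical, and each relation is assumed to carry its own author field, so the selection below tests both of them.

```python
from itertools import product

# Hypothetical tuples of the form (title, author).
Books = [("B1", "tutorialspoint"), ("B2", "someone_else")]
Articles = [("A1", "tutorialspoint"), ("A2", "tutorialspoint")]

def cartesian(r, s):
    """Concatenate every tuple q of r with every tuple t of s."""
    return [q + t for q, t in product(r, s)]

pairs = cartesian(Books, Articles)
# Selection on the product: keep pairs where both authors are tutorialspoint.
selected = [p for p in pairs if p[1] == "tutorialspoint" and p[3] == "tutorialspoint"]

print(len(pairs))   # number of books times number of articles
print(selected)
```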

Join operations
1. Natural Join
A natural join combines the tuples of R and S that have equal values on all their common
attribute names, keeping one copy of each common attribute. It is denoted by ⋈.
Example: Let’s consider EMPLOYEE table and SALARY table
EMPLOYEE
EMP_CODE EMP_NAME
101 Ajay
102 Suresh
103 Ram

SALARY
EMP_CODE SALARY
101 50000
102 30000
103 25000


Input: ∏EMP_NAME, SALARY (EMPLOYEE ⋈ SALARY)


Output:
EMP_NAME SALARY
Ajay 50000
Suresh 30000
Ram 25000
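
The natural join of EMPLOYEE and SALARY, followed by the projection on EMP_NAME and SALARY, can be sketched in Python as below. The helper natural_join is illustrative and assumes every tuple of a relation has the same attribute names.

```python
EMPLOYEE = [
    {"EMP_CODE": 101, "EMP_NAME": "Ajay"},
    {"EMP_CODE": 102, "EMP_NAME": "Suresh"},
    {"EMP_CODE": 103, "EMP_NAME": "Ram"},
]
SALARY = [
    {"EMP_CODE": 101, "SALARY": 50000},
    {"EMP_CODE": 102, "SALARY": 30000},
    {"EMP_CODE": 103, "SALARY": 25000},
]

def natural_join(r, s):
    """Pair the tuples that agree on every attribute name r and s share."""
    common = set(r[0]) & set(s[0]) if r and s else set()
    return [
        {**q, **t}            # the common attributes appear once in the result
        for q in r
        for t in s
        if all(q[a] == t[a] for a in common)
    ]

joined = natural_join(EMPLOYEE, SALARY)
result = [(t["EMP_NAME"], t["SALARY"]) for t in joined]  # the projection step
print(result)
```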

2. Outer Join
The outer join operation is an extension of the join operation. It deals with missing information
by keeping tuples that have no match in the other relation and padding their missing attributes
with NULL.
Example
EMPLOYEE
EMP_NAME   STREET         CITY
Ajay       M.G.R Street   Chennai
Suresh     Park Street    Trichy
Ram        Anna Nagar     Madurai

FACT_WORKERS
EMP_NAME   BRANCH      SALARY
Ajay       Wipro       50000
Ram        TCS         30000
Bala       Honeywell   25000
Input (plain natural join, shown for comparison):
(EMPLOYEE ⋈ FACT_WORKERS)
Output:
EMP_NAME   STREET         CITY      BRANCH   SALARY
Ajay       M.G.R Street   Chennai   Wipro    50000
Ram        Anna Nagar     Madurai   TCS      30000

An outer join is of three types:


a. Left outer join
A left outer join contains all the tuples of the natural join of R and S, plus every tuple of the
left relation R that has no match in S, padded with NULL for the attributes of S. It is denoted by ⟕.
Example: Let us consider the above EMPLOYEE and FACT_WORKERS tables.
Input: (EMPLOYEE ⟕ FACT_WORKERS)
Output
EMP_NAME   STREET         CITY      BRANCH   SALARY
Ajay       M.G.R Street   Chennai   Wipro    50000
Ram        Anna Nagar     Madurai   TCS      30000
Suresh     Park Street    Trichy    NULL     NULL


b. Right outer join


A right outer join contains all the tuples of the natural join of R and S, plus every tuple of the
right relation S that has no match in R, padded with NULL for the attributes of R. It is denoted by ⟖.
Input: (EMPLOYEE ⟖ FACT_WORKERS)
Output
EMP_NAME   BRANCH      SALARY   STREET         CITY
Ajay       Wipro       50000    M.G.R Street   Chennai
Ram        TCS         30000    Anna Nagar     Madurai
Bala       Honeywell   25000    NULL           NULL

c. Full outer join


A full outer join contains all rows from both tables: matching tuples are combined, and the
unmatched tuples of either relation are padded with NULL. It is denoted by ⟗.
Example:
Input: (EMPLOYEE ⟗ FACT_WORKERS)
Output
EMP_NAME   STREET         CITY      BRANCH      SALARY
Ajay       M.G.R Street   Chennai   Wipro       50000
Ram        Anna Nagar     Madurai   TCS         30000
Bala       NULL           NULL      Honeywell   25000
Suresh     Park Street    Trichy    NULL        NULL
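
The three outer joins can be sketched in Python on the EMPLOYEE and FACT_WORKERS tables. This is a simplified illustration, not a full implementation: it joins on a single named key (EMP_NAME) rather than on all common attribute names, the helper names are hypothetical, and Python's None stands in for NULL.

```python
EMPLOYEE = [
    {"EMP_NAME": "Ajay", "STREET": "M.G.R Street", "CITY": "Chennai"},
    {"EMP_NAME": "Suresh", "STREET": "Park Street", "CITY": "Trichy"},
    {"EMP_NAME": "Ram", "STREET": "Anna Nagar", "CITY": "Madurai"},
]
FACT_WORKERS = [
    {"EMP_NAME": "Ajay", "BRANCH": "Wipro", "SALARY": 50000},
    {"EMP_NAME": "Ram", "BRANCH": "TCS", "SALARY": 30000},
    {"EMP_NAME": "Bala", "BRANCH": "Honeywell", "SALARY": 25000},
]

def left_outer_join(r, s, key):
    """All matches on `key`, plus unmatched tuples of r padded with None."""
    s_attrs = [a for a in s[0] if a != key]
    out = []
    for q in r:
        matches = [t for t in s if t[key] == q[key]]
        if matches:
            out.extend({**q, **t} for t in matches)
        else:
            out.append({**q, **dict.fromkeys(s_attrs)})
    return out

def right_outer_join(r, s, key):
    """Mirror image of the left outer join: keep every tuple of s."""
    return left_outer_join(s, r, key)

def full_outer_join(r, s, key):
    """Left outer join plus the tuples of s that matched nothing in r."""
    r_keys = {q[key] for q in r}
    r_attrs = [a for a in r[0] if a != key]
    extra = [{**dict.fromkeys(r_attrs), **t} for t in s if t[key] not in r_keys]
    return left_outer_join(r, s, key) + extra

for row in full_outer_join(EMPLOYEE, FACT_WORKERS, "EMP_NAME"):
    print(row)
```

The rows come out in a different order from the tables above, which does not matter because a relation is an unordered set of tuples.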

3. Equi Join
An equi join is an inner join whose join condition uses only the equality comparison operator (=).
Only the pairs of tuples that satisfy the equality condition appear in the result.
Example:
CUSTOMER RELATION
CUST_ID   NAME
1         Ajay
2         Suresh
3         Ram
PRODUCT
PRODUCT_ID   CITY
1            Chennai
2            Trichy
3            Madurai

Input: CUSTOMER ⋈ (CUST_ID = PRODUCT_ID) PRODUCT


Output
CUST_ID   NAME     PRODUCT_ID   CITY
1         Ajay     1            Chennai
2         Suresh   2            Trichy
3         Ram      3            Madurai
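
An equi join is a Cartesian product restricted by an equality test, which the following Python sketch makes explicit. The helper equi_join and its positional arguments are assumptions made for illustration.

```python
CUSTOMER = [(1, "Ajay"), (2, "Suresh"), (3, "Ram")]          # (customer id, NAME)
PRODUCT = [(1, "Chennai"), (2, "Trichy"), (3, "Madurai")]    # (PRODUCT_ID, CITY)

def equi_join(r, s, i, j):
    """Keep the pairs (q, t) whose i-th field of q equals the j-th field of t."""
    return [q + t for q in r for t in s if q[i] == t[j]]

rows = equi_join(CUSTOMER, PRODUCT, 0, 0)  # customer id = PRODUCT_ID
for row in rows:
    print(row)
```

Unlike a natural join, both compared columns are kept in the result, which is why the output table has an id column from each relation.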


1.10 SQL fundamentals
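
The relational algebra operations of Section 1.9 have direct counterparts in SQL. The following is a sketch using Python's built-in sqlite3 module with the EMPLOYEE and SALARY tables of the natural join example. The threshold 28000 is an arbitrary value chosen for illustration, and other database systems may differ in small details of syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE EMPLOYEE (EMP_CODE INTEGER PRIMARY KEY, EMP_NAME TEXT);
    CREATE TABLE SALARY   (EMP_CODE INTEGER, SALARY INTEGER);
    INSERT INTO EMPLOYEE VALUES (101, 'Ajay'), (102, 'Suresh'), (103, 'Ram');
    INSERT INTO SALARY   VALUES (101, 50000), (102, 30000), (103, 25000);
""")

# Selection: the WHERE clause plays the role of the predicate.
selection = conn.execute(
    "SELECT * FROM SALARY WHERE SALARY > 28000 ORDER BY EMP_CODE"
).fetchall()

# Projection: the column list. SQL keeps duplicates unless DISTINCT is given.
projection = conn.execute(
    "SELECT DISTINCT EMP_NAME FROM EMPLOYEE ORDER BY EMP_NAME"
).fetchall()

# Cartesian product: CROSS JOIN pairs every row with every row.
product_size = conn.execute(
    "SELECT COUNT(*) FROM EMPLOYEE CROSS JOIN SALARY"
).fetchone()[0]

# Natural join followed by a projection.
joined = conn.execute(
    "SELECT EMP_NAME, SALARY FROM EMPLOYEE NATURAL JOIN SALARY ORDER BY EMP_CODE"
).fetchall()

print(selection, projection, product_size, joined, sep="\n")
```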



1.11 Advanced SQL features

Aggregate functions are functions that take a collection of values as input and return a single
value. Aggregate functions supported by SQL are
Average: avg
Minimum: min
Maximum: max
Total: sum
Count: count
Group by clause is used to apply aggregate functions to a set of tuples. The attributes given in the
group by clause are used to form groups. Tuples with the same value on all attributes in the group by
clause are placed in one group.

Aggregate Functions
- A type of request that cannot be expressed in the basic relational algebra is to specify
mathematical aggregate functions on collections of values from the database.
- Examples of such functions include retrieving the average or total salary of all employees or the
total number of employee tuples. These functions are used in simple statistical queries that
summarize information from the database tuples.
- Common functions applied to collections of numeric values include SUM, AVERAGE,
MAXIMUM, and MINIMUM. The COUNT function is used for counting tuples or values.


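Results of applying COUNT and AVERAGE to the EMPLOYEE relation:

a) Grouped by department number (DNO), with renamed result columns
DNO   NO_OF_EMPLOYEES   AVERAGE_SAL
5     4                 33250
4     3                 31000
1     1                 55000

b) Grouped by department number (DNO), with default result column names
DNO   COUNT_SSN   AVERAGE_SALARY
5     4           33250
4     3           31000
1     1           55000

c) Without grouping (all employees)
COUNT_SSN   AVERAGE_SALARY
8           35125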
Example:
Consider the following SQL query on the EMPLOYEE relation:
SELECT Lname, Fname
FROM EMPLOYEE
WHERE Salary > ( SELECT MAX (Salary)
                 FROM EMPLOYEE
                 WHERE Dno=5 );

This query retrieves the names of employees (from any department in the company) who
earn a salary that is greater than the highest salary in department 5. The query includes a nested
subquery and hence would be decomposed into two blocks: the inner block, which computes the
maximum salary in department 5, and the outer block, which compares each employee's salary with
that value.
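
Both the GROUP BY clause and the nested query can be tried out with Python's built-in sqlite3 module. In the sketch below, the eight EMPLOYEE rows are hypothetical: the names are placeholders, and the salaries were chosen only so that the per-department counts and averages come out as 4 / 33250, 3 / 31000 and 1 / 55000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE EMPLOYEE (Lname TEXT, Fname TEXT, Salary INTEGER, Dno INTEGER)")
# Hypothetical rows (placeholder names, assumed salaries).
conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
    ("L1", "F1", 30000, 5), ("L2", "F2", 40000, 5),
    ("L3", "F3", 38000, 5), ("L4", "F4", 25000, 5),
    ("L5", "F5", 25000, 4), ("L6", "F6", 43000, 4), ("L7", "F7", 25000, 4),
    ("L8", "F8", 55000, 1),
])

# One output row per department: tuples with the same Dno form one group.
grouped = conn.execute(
    "SELECT Dno, COUNT(*), AVG(Salary) FROM EMPLOYEE GROUP BY Dno ORDER BY Dno DESC"
).fetchall()

# Without GROUP BY, the aggregates range over the whole table.
overall = conn.execute("SELECT COUNT(*), AVG(Salary) FROM EMPLOYEE").fetchone()

# The nested query: the inner block runs first and yields a single value.
above_dept5 = conn.execute(
    "SELECT Lname, Fname FROM EMPLOYEE "
    "WHERE Salary > (SELECT MAX(Salary) FROM EMPLOYEE WHERE Dno = 5) "
    "ORDER BY Lname"
).fetchall()

print(grouped, overall, above_dept5, sep="\n")
```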

1.12 Embedded SQL


SQL statements embedded in a C or C++ source file are referred to as Embedded SQL. A
preprocessor translates these statements into calls to a runtime library. Embedded SQL is an ISO/ANSI
and IBM standard.

Embedded SQL is portable to other databases and other environments, and is functionally
equivalent in all operating environments. It is a comprehensive, low-level interface that provides all the
functionality available in the database product. Using embedded SQL requires knowledge of the C or
C++ programming language.

Embedded SQL provides a means by which a program can interact with a database server.
Under embedded SQL, the SQL statements are identified at compile time by a preprocessor,
which translates requests expressed in embedded SQL into function calls. At runtime, these function calls
connect to the database using an API that provides dynamic SQL facilities but may be specific to the
database that is being used.

Example:
The embedded SQL statements are parsed by an embedded SQL preprocessor and replaced by host-
language calls to a code library. The output from the preprocessor is then compiled by the host compiler.

1.13 Dynamic SQL


Dynamic SQL is a powerful SQL programming technique that allows us to construct and
execute SQL statements at runtime. Unlike static SQL, where queries are fixed during the development
phase, dynamic SQL enables developers to build flexible and general-purpose SQL queries that adapt to
varying conditions.
Steps to use Dynamic SQL
1. Declare Variables: Declare two variables, @var1 to hold the name of the table and @var2 to
hold the dynamic SQL:
DECLARE
    @var1 NVARCHAR(MAX),
    @var2 NVARCHAR(MAX);
2. Assign Values to Variables: Set the value of the @var1 variable to table_name:
SET @var1 = N'table_name';
3. Construct the SQL Statement: Build the dynamic SQL by concatenating the SELECT statement
with the table name variable:
SET @var2 = N'SELECT *
FROM ' + @var1;
4. Execute the SQL Statement: Run the sp_executesql stored procedure, passing @var2 as its argument:
EXEC sp_executesql @var2;

Example of Dynamic SQL


Dynamic SQL can be particularly useful when working with user inputs or dynamically
changing table names. Let us consider a table named geektable with sample data representing users and
their associated cities. This table will serve as the basis for our demonstration of constructing and
executing dynamic queries.


The following script dynamically retrieves data from the geektable table:

Query:
DECLARE
@tab NVARCHAR(128),
@st NVARCHAR(MAX);
SET @tab = N'geektable';
SET @st = N'SELECT *
FROM ' + @tab;
EXEC sp_executesql @st;


Advantages of Dynamic SQL


• Flexibility: Dynamic SQL lets us create queries that adapt to changing conditions, making it
ideal for situations where we need custom or on-the-fly queries.
• Handles User Input: We can build queries based on user-provided inputs, making our
database operations more interactive and versatile.
• Works with Dynamic Schemas: It enables operations on tables, columns, or other database
objects that may be created or modified dynamically at runtime.

Disadvantages of Dynamic SQL


• Performance Issues: Since the SQL query is built and processed at runtime, it can take longer
due to additional parsing and compiling steps.
• Security Risks: If user inputs are concatenated into the query text without validation, it opens
the door to SQL Injection attacks, which can compromise our database.
• Harder to Debug: Debugging dynamic queries is more complex, as they are not predefined in
the code and can change based on runtime conditions.
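
The security risk can be reduced by following two habits: check dynamic identifiers such as table names against a fixed list, and pass user-supplied values as bound parameters instead of concatenating them into the statement. The following is a minimal sketch using Python's built-in sqlite3 module rather than T-SQL. The columns and rows of geektable and the helper select_by_city are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE geektable (name TEXT, city TEXT)")  # hypothetical columns
conn.executemany(
    "INSERT INTO geektable VALUES (?, ?)",
    [("Asha", "Chennai"), ("Ravi", "Trichy")],                 # hypothetical rows
)

ALLOWED_TABLES = {"geektable"}  # identifiers cannot be bound, so whitelist them

def select_by_city(table, city):
    """Build the statement at run time: whitelisted table name, bound value."""
    if table not in ALLOWED_TABLES:
        raise ValueError(f"unknown table: {table!r}")
    sql = f"SELECT name FROM {table} WHERE city = ?"
    return conn.execute(sql, (city,)).fetchall()  # city is passed as data, not as SQL

print(select_by_city("geektable", "Chennai"))
# An injection attempt is treated as an ordinary string that matches no city.
print(select_by_city("geektable", "x' OR '1'='1"))
```

In SQL Server, sp_executesql likewise accepts parameter definitions and values after the statement text, so values can be bound there as well.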
