DBMS Mca Ii Sem Notes
1. RELATIONAL MODEL CONCEPTS
Key of a Relation:
Each row has a value of a data item (or set of items) that uniquely identifies that row in
the table
This item (or set of items) is called the key
In the STUDENT table, SSN is the key
Sometimes row-ids or sequential numbers are assigned as keys to identify the rows in a
table
Such a key is called an artificial key or surrogate key
Formal Definition - Relation
Schema
The Schema (or description) of a Relation:
Denoted by R(A1, A2, ..., An)
R is the name of the relation
The attributes of the relation are A1, A2, ..., An
Example:
CUSTOMER (Cust-id, Cust-name, Address, Phone#)
CUSTOMER is the relation name
Defined over the four attributes: Cust-id, Cust-name, Address, Phone#
Each attribute has a domain or a set of valid values.
For example, the domain of Cust-id is 6 digit numbers.
Tuple
A tuple is an ordered set of values (enclosed in angled brackets ‘< … >’)
Each value is derived from an appropriate domain.
A row in the CUSTOMER relation is a 4-tuple and would consist of four values, for example:
<632895, "John Smith", "101 Main St. Atlanta, GA 30332", "(404) 894-2000">
This is called a 4-tuple as it has 4 values
A tuple (row) in the CUSTOMER relation.
A relation is a set of such tuples (rows)
Domain
A domain has a logical definition:
Example: “USA_phone_numbers” are the set of 10 digit phone numbers valid in the U.S.
A domain also has a data-type or a format defined for it.
The USA_phone_numbers domain may have the format (ddd)ddd-dddd, where each d is a decimal
digit.
Dates can be written in various formats, for example as year, month, day (yyyy-mm-dd) or as
dd mm, yyyy.
The attribute name designates the role played by a domain in a relation:
Used to interpret the meaning of the data elements corresponding to that attribute
Example: The domain Date may be used to define two attributes named “Invoice-date”
and “Payment-date” with different meanings
State
The relation state is a subset of the Cartesian product of the domains of its attributes;
each domain contains the set of all possible values the attribute can take.
Example: attribute Cust-name is defined over the domain of character strings of maximum
length 25
dom(Cust-name) is varchar(25)
The role these strings play in the CUSTOMER relation is that of the name of a customer.
Informal term    Formal term
Table            Relation
Row              Tuple
Characteristics of Relation
Ordering of tuples in a relation r(R):
The tuples are not considered to be ordered, even though they appear to be in the
tabular form.
Ordering of attributes in a relation schema R (and of values within each tuple):
We will consider the attributes in R(A1, A2, ..., An) and the values in t=<v1, v2, ..., vn> to
be ordered.
(However, a more general alternative definition of relation does not require this
ordering).
Values in a tuple:
All values are considered atomic (indivisible).
Each value in a tuple must be from the domain of the attribute for that column
If t = <v1, v2, …, vn> is a tuple (row) in the relation state r of R(A1, A2, …, An),
then each vi must be a value from dom(Ai)
A special null value is used to represent values that are unknown or inapplicable to
certain tuples.
Notation:
We refer to component values of a tuple t by:
t[Ai] or t.Ai
This is the value vi of attribute Ai for tuple t
Similarly, t[Au, Av, ..., Aw] refers to the subtuple of t containing the values of attributes
Au, Av, ..., Aw, respectively in t
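The tuple notation and the domain rule above can be sketched in a few lines of Python. This is only an illustration: the dictionary representation, the helper names satisfies_domains and subtuple, and the exact domain tests (for instance, reading "6 digit numbers" as 100000 to 999999 and allowing null everywhere) are assumptions, not part of the notes.

```python
# Hypothetical schema for CUSTOMER: attribute name -> domain membership test.
CUSTOMER_SCHEMA = {
    "Cust-id": lambda v: isinstance(v, int) and 100000 <= v <= 999999,  # assumed: 6-digit numbers
    "Cust-name": lambda v: isinstance(v, str) and len(v) <= 25,         # varchar(25)
    "Address": lambda v: isinstance(v, str),
    "Phone#": lambda v: isinstance(v, str),
}

def satisfies_domains(t, schema):
    """True if every value of t is null (None) or lies in dom(Ai)."""
    return all(t[a] is None or in_domain(t[a]) for a, in_domain in schema.items())

def subtuple(t, attrs):
    """t[Au, Av, ..., Aw]: the values of the listed attributes, in that order."""
    return tuple(t[a] for a in attrs)

# The 4-tuple from the CUSTOMER example, keyed by attribute name.
t = {
    "Cust-id": 632895,
    "Cust-name": "John Smith",
    "Address": "101 Main St. Atlanta, GA 30332",
    "Phone#": "(404) 894-2000",
}

name_and_phone = subtuple(t, ["Cust-name", "Phone#"])   # t[Cust-name, Phone#]
valid = satisfies_domains(t, CUSTOMER_SCHEMA)
```

Here t["Cust-id"] plays the role of t[Ai], and subtuple(t, [...]) plays the role of t[Au, Av, ..., Aw].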
2. RELATIONAL MODEL CONSTRAINTS AND RELATIONAL DATABASE SCHEMAS
Constraints are conditions that must hold on all valid relation states.
There are three main types of constraints in the relational model:
Key constraints
Entity integrity constraints
Referential integrity constraints
Another implicit constraint is the domain constraint
Every value in a tuple must be from the domain of its attribute (or it could be null, if allowed for
that attribute)
Key Constraints
Superkey of R:
Is a set of attributes SK of R with the following condition:
No two tuples in any valid relation state r(R) will have the same value for SK
That is, for any distinct tuples t1 and t2 in r(R), t1[SK] ≠ t2[SK]
This condition must hold in any valid state r(R)
Key of R:
A "minimal" superkey
That is, a key is a superkey K such that removal of any attribute from K results in a set of
attributes that is not a superkey (does not possess the superkey uniqueness property)
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
CAR has two keys:
Key1 = {State, Reg#}
Key2 = {SerialNo}
Both are also superkeys of CAR
{SerialNo, Make} is a superkey but not a key.
In general:
Any key is a superkey (but not vice versa)
Any set of attributes that includes a key is a superkey
A minimal superkey is also a key
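As a rough illustration of the superkey and key definitions, the sketch below tests them against one sample state of CAR. The rows are made-up data and the function names are hypothetical. Note also that a single state can only show that a set of attributes is not a superkey; the real constraint must hold in every valid state.

```python
def is_superkey(rows, attrs):
    """No two tuples in this state agree on all attributes in attrs."""
    seen = set()
    for t in rows:
        value = tuple(t[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_key(rows, attrs):
    """A superkey that stops being one when any single attribute is removed."""
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != removed])
        for removed in attrs
    )

# Made-up sample state of CAR(State, Reg#, SerialNo, Make, Model, Year).
cars = [
    {"State": "TX", "Reg#": "ABC-739", "SerialNo": "A69352", "Make": "Ford", "Model": "Mustang", "Year": 2002},
    {"State": "FL", "Reg#": "ABC-739", "SerialNo": "B43696", "Make": "Oldsmobile", "Model": "Cutlass", "Year": 2005},
    {"State": "TX", "Reg#": "TVP-347", "SerialNo": "X83554", "Make": "Oldsmobile", "Model": "Delta", "Year": 2001},
]

key1 = is_key(cars, ["State", "Reg#"])                 # Key1 = {State, Reg#}
key2 = is_key(cars, ["SerialNo"])                      # Key2 = {SerialNo}
superkey_only = (is_superkey(cars, ["SerialNo", "Make"])
                 and not is_key(cars, ["SerialNo", "Make"]))
```

Checking single-attribute removals is enough for minimality, because any set that contains a superkey is itself a superkey.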
If a relation has several candidate keys, one is chosen arbitrarily to be the primary key.
The primary key attributes are underlined.
Example: Consider the CAR relation schema:
CAR(State, Reg#, SerialNo, Make, Model, Year)
We chose SerialNo as the primary key
The primary key value is used to uniquely identify each tuple in a relation
Provides the tuple identity
Also used to reference the tuple from another tuple
General rule: Choose as primary key the smallest of the candidate keys (in terms of size)
Not always applicable – choice is sometimes subjective
Entity Integrity:
The primary key attributes PK of each relation schema R in a database schema S cannot have
null values in any tuple of r(R).
This is because primary key values are used to identify the individual tuples.
t[PK] ≠ null for any tuple t in r(R)
If PK has several attributes, null is not allowed in any of these attributes
Note: Other attributes of R may be constrained to disallow null values, even though
they are not members of the primary key.
Referential Integrity
A constraint involving two relations
The previous constraints involve a single relation.
Used to specify a relationship among tuples in two relations:
The referencing relation and the referenced relation.
Tuples in the referencing relation R1 have attributes FK (called foreign key attributes) that
reference the primary key attributes PK of the referenced relation R2.
A tuple t1 in R1 is said to reference a tuple t2 in R2 if t1[FK] = t2[PK].
A referential integrity constraint can be displayed in a relational database schema as a directed
arc from R1.FK to R2.
Statement of the constraint
The value in the foreign key column (or columns) FK of the referencing relation R1
can be either:
(1) a value of an existing primary key value of a corresponding primary key PK in
the referenced relation R2, or
(2) a null.
In case (2), the FK in R1 should not be a part of its own primary key.
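The entity integrity and referential integrity rules can be written as two small checks over relation states. This is a minimal sketch: the DEPARTMENT and EMPLOYEE rows are made-up, the function names are hypothetical, and a foreign key with any null component is simply treated as null.

```python
def entity_integrity_ok(rows, pk):
    """t[PK] must not be null in any attribute of the primary key."""
    return all(t[a] is not None for t in rows for a in pk)

def referential_integrity_ok(r1, fk, r2, pk):
    """Each t1[FK] is either null or equal to t2[PK] for some t2 in r2."""
    pk_values = {tuple(t2[a] for a in pk) for t2 in r2}
    for t1 in r1:
        value = tuple(t1[a] for a in fk)
        if any(v is None for v in value):
            continue  # case (2): a null foreign key is allowed
        if value not in pk_values:
            return False  # references a primary key value that does not exist
    return True

# Made-up states: EMPLOYEE.DNO is a foreign key referencing DEPARTMENT.DNUMBER.
departments = [{"DNUMBER": 5, "DNAME": "Research"}, {"DNUMBER": 4, "DNAME": "Administration"}]
employees = [{"SSN": "123456789", "DNO": 5}, {"SSN": "333445555", "DNO": None}]

ok_entity = entity_integrity_ok(departments, ["DNUMBER"])
ok_reference = referential_integrity_ok(employees, ["DNO"], departments, ["DNUMBER"])
dangling = referential_integrity_ok(
    employees + [{"SSN": "999887777", "DNO": 9}], ["DNO"], departments, ["DNUMBER"]
)
```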
Displaying Relational Database Schema and its constraints
Each relation schema can be displayed as a row of attribute names
The name of the relation is written above the attribute names
The primary key attribute (or attributes) will be underlined
A foreign key (referential integrity) constraint is displayed as a directed arc (arrow) from the
foreign key attributes to the referenced table
The arrow can also point to the primary key of the referenced relation for clarity
3. UPDATE OPERATIONS AND DEALING WITH CONSTRAINT VIOLATION
Each relation will have many tuples in its current relation state
The relational database state is a union of all the individual relation states
Whenever the database is changed, a new state arises
Basic operations for changing the database:
INSERT a new tuple in a relation
DELETE an existing tuple from a relation
MODIFY an attribute of an existing tuple
Integrity constraints should not be violated by the update operations.
Several update operations may have to be grouped together.
Updates may propagate to cause other updates automatically. This may be necessary to
maintain integrity constraints.
In case of integrity violation, several actions can be taken:
Cancel the operation that causes the violation (RESTRICT or REJECT option)
Perform the operation but inform the user of the violation
Trigger additional updates so the violation is corrected (CASCADE option, SET NULL
option)
Execute a user-specified error-correction routine
Possible violations of each operation
INSERT may violate any of the constraints:
Domain constraint:
if one of the attribute values provided for the new tuple is not of the specified
attribute domain
Key constraint:
if the value of a key attribute in the new tuple already exists in another tuple in
the relation
Referential integrity:
if a foreign key value in the new tuple references a primary key value that does
not exist in the referenced relation
Entity integrity:
if the primary key value is null in the new tuple
DELETE may violate only referential integrity:
If the primary key value of the tuple being deleted is referenced from other tuples in the
database
Can be remedied by several actions: RESTRICT, CASCADE, SET NULL (see Chapter
8 for more details)
RESTRICT option: reject the deletion
CASCADE option: propagate the deletion by also deleting the
referencing tuples
SET NULL option: set the foreign keys of the referencing tuples to NULL
One of the above options must be specified during database design for each foreign key
constraint
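The three DELETE options can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: SQLite stands in for whatever DBMS the course uses, the department and employee tables are hypothetical, and the helper delete_department exists only for this illustration.

```python
import sqlite3

def delete_department(on_delete):
    """Delete department 5 under the given ON DELETE action and report the effect."""
    con = sqlite3.connect(":memory:")
    con.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
    con.execute("CREATE TABLE department (dnumber INTEGER PRIMARY KEY, dname TEXT)")
    con.execute(
        "CREATE TABLE employee (ssn TEXT PRIMARY KEY, "
        f"dno INTEGER REFERENCES department(dnumber) ON DELETE {on_delete})"
    )
    con.execute("INSERT INTO department VALUES (5, 'Research')")
    con.execute("INSERT INTO employee VALUES ('123456789', 5)")
    try:
        con.execute("DELETE FROM department WHERE dnumber = 5")
    except sqlite3.IntegrityError:
        return "rejected"  # the deletion would leave a dangling reference
    # Show what happened to the referencing tuple.
    return con.execute("SELECT ssn, dno FROM employee").fetchall()

restrict_result = delete_department("RESTRICT")   # deletion is rejected
cascade_result = delete_department("CASCADE")     # referencing tuple is deleted too
set_null_result = delete_department("SET NULL")   # foreign key becomes NULL
```

The option is chosen once, in the foreign key declaration, which matches the point that it must be specified at design time for each foreign key constraint.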
UPDATE may violate domain constraint and NOT NULL constraint on an attribute being modified
Any of the other constraints may also be violated, depending on the attribute being updated:
Updating the primary key (PK):
Similar to a DELETE followed by an INSERT
Need to specify similar options to DELETE
Updating a foreign key (FK):
May violate referential integrity
Updating an ordinary attribute (neither PK nor FK):
Can only violate domain constraints
THE RELATIONAL ALGEBRA AND RELATIONAL CALCULUS
Overview
Relational algebra is the basic set of operations for the relational model
These operations enable a user to specify basic retrieval requests (or queries)
The result of an operation is a new relation, which may have been formed from
one or more input relations
This property makes the algebra “closed” (all objects in relational algebra
are relations)
The algebra operations thus produce new relations
4. UNARY RELATIONAL OPERATIONS: SELECT AND PROJECT
SELECT Operation Properties
The SELECT operation σ<selection condition>(R) produces a
relation S that has the same schema (same attributes)
as R
SELECT is commutative:
σ<condition1>(σ<condition2>(R)) = σ<condition2>(σ<condition1>(R))
Because of the commutativity property, a cascade
(sequence) of SELECT operations may be applied in
any order:
σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond2>(σ<cond3>(σ<cond1>(R)))
A cascade of SELECT operations may be replaced by a
single selection with a conjunction of all the
conditions:
σ<cond1>(σ<cond2>(σ<cond3>(R))) = σ<cond1> AND <cond2> AND <cond3>(R)
The number of tuples in the result of a SELECT is
less than (or equal to) the number of tuples in the
input relation R
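The SELECT properties listed above can be checked on a small example. In this sketch a relation is modelled as a Python list of dictionaries standing in for a set of tuples; the EMPLOYEE rows, the SALARY attribute and the two conditions are made-up for illustration.

```python
def select(condition, relation):
    """SELECT: keep the tuples of R for which the condition holds."""
    return [t for t in relation if condition(t)]

# Made-up EMPLOYEE state.
EMPLOYEE = [
    {"SSN": "123456789", "DNO": 5, "SALARY": 30000},
    {"SSN": "333445555", "DNO": 5, "SALARY": 40000},
    {"SSN": "999887777", "DNO": 4, "SALARY": 25000},
]

def in_dept5(t):
    return t["DNO"] == 5

def high_paid(t):
    return t["SALARY"] > 30000

# Commutativity: the two conditions can be applied in either order.
a = select(in_dept5, select(high_paid, EMPLOYEE))
b = select(high_paid, select(in_dept5, EMPLOYEE))

# A cascade equals one selection with the conjunction (AND) of the conditions.
c = select(lambda t: in_dept5(t) and high_paid(t), EMPLOYEE)
```

All three results contain the same single tuple, and none can have more tuples than EMPLOYEE itself.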
The general form of the PROJECT
operation is:
π<attribute list>(R)
π (pi) is the symbol used to represent the
project operation
<attribute list> is the desired list of
attributes from relation R.
The project operation removes any
duplicate tuples
This is because the result of the project
operation must be a set of tuples
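A small sketch of the PROJECT operation, showing why duplicates disappear. The list-of-dictionaries representation and the sample EMPLOYEE rows are assumptions made for illustration.

```python
def project(attrs, relation):
    """PROJECT: keep only the listed attributes, then drop duplicate tuples."""
    result = []
    for t in relation:
        sub = {a: t[a] for a in attrs}
        if sub not in result:  # the result must be a set of tuples
            result.append(sub)
    return result

# Made-up EMPLOYEE state: two employees share DNO = 5.
EMPLOYEE = [
    {"SSN": "123456789", "DNO": 5, "SALARY": 30000},
    {"SSN": "333445555", "DNO": 5, "SALARY": 40000},
    {"SSN": "999887777", "DNO": 4, "SALARY": 25000},
]

dept_numbers = project(["DNO"], EMPLOYEE)
ssn_and_dept = project(["SSN", "DNO"], EMPLOYEE)
```

Projecting on DNO alone leaves two tuples instead of three, while a projection that includes the key SSN keeps all three.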
The general RENAME operation ρ can
be expressed by any of the following
forms:
ρS(B1, B2, …, Bn)(R) changes both:
the relation name to S, and
the column (attribute) names to B1, B2, …, Bn
ρS(R) changes:
the relation name only to S
ρ(B1, B2, …, Bn)(R) changes:
the column (attribute) names only to B1, B2, …, Bn
Example:
To retrieve the social security numbers of all
employees who either work in department 5
(RESULT1 below) or directly supervise an employee
who works in department 5 (RESULT2 below)
We can use the UNION operation as follows:
DEP5_EMPS ← σ DNO=5 (EMPLOYEE)
RESULT1 ← π SSN (DEP5_EMPS)
RESULT2(SSN) ← π SUPERSSN (DEP5_EMPS)
RESULT ← RESULT1 ∪ RESULT2
The union operation produces the tuples that are in
either RESULT1 or RESULT2 or both
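The same query can be traced step by step with Python sets. The EMPLOYEE rows below are made-up sample data; only the attribute names SSN, SUPERSSN and DNO come from the example.

```python
# Made-up EMPLOYEE state.
EMPLOYEE = [
    {"SSN": "123456789", "SUPERSSN": "333445555", "DNO": 5},
    {"SSN": "333445555", "SUPERSSN": "888665555", "DNO": 5},
    {"SSN": "999887777", "SUPERSSN": "987654321", "DNO": 4},
    {"SSN": "888665555", "SUPERSSN": None, "DNO": 1},
]

# DEP5_EMPS: select the employees with DNO = 5.
dep5_emps = [t for t in EMPLOYEE if t["DNO"] == 5]

# RESULT1: project on SSN (a set, so duplicates are removed).
result1 = {t["SSN"] for t in dep5_emps}

# RESULT2(SSN): project on SUPERSSN, read as a set of SSN values.
result2 = {t["SUPERSSN"] for t in dep5_emps}

# RESULT: union of the two type-compatible results.
result = result1 | result2
```

The value 333445555 appears in both intermediate results but only once in the union.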
Type compatibility of operands is required for
the binary set operation UNION ∪ (also for
INTERSECTION ∩ and SET DIFFERENCE –)
R1(A1, A2, ..., An) and R2(B1, B2, ..., Bn) are
type compatible if:
they have the same number of attributes, and
the domains of corresponding attributes are type
compatible (i.e. dom(Ai) = dom(Bi) for i = 1, 2, ..., n).
The resulting relation for R1 ∪ R2 (also for
R1 ∩ R2, or R1 – R2) has the
same attribute names as the first operand
relation R1 (by convention)
SET DIFFERENCE (also called MINUS or
EXCEPT) is denoted by –
The result of R – S, is a relation that
includes all tuples that are in R but not
in S
The attribute names in the result will
be the same as the attribute names
in R
The two operand relations R and S
must be “type compatible”
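A minimal sketch of type compatibility and SET DIFFERENCE. It assumes a relation is stored as a dictionary holding its attribute names, its domains (approximated here by Python types) and a set of rows; the STUDENT and INSTRUCTOR data are made-up.

```python
def relation(attrs, domains, rows):
    """Bundle a schema (attribute names and domains) with a state (set of rows)."""
    return {"attrs": list(attrs), "domains": list(domains), "rows": set(rows)}

def type_compatible(r, s):
    """Same number of attributes and dom(Ai) = dom(Bi) for every position i."""
    return r["domains"] == s["domains"]

def difference(r, s):
    """R - S: tuples in R but not in S; the result keeps R's attribute names."""
    if not type_compatible(r, s):
        raise ValueError("operands are not type compatible")
    return relation(r["attrs"], r["domains"], r["rows"] - s["rows"])

# Made-up states; the attribute names differ but the domains match.
STUDENT = relation(["FN", "LN"], [str, str],
                   {("Susan", "Yao"), ("Ramesh", "Shah"), ("Johnny", "Kohler")})
INSTRUCTOR = relation(["FNAME", "LNAME"], [str, str],
                      {("John", "Smith"), ("Susan", "Yao"), ("Ramesh", "Shah")})

students_only = difference(STUDENT, INSTRUCTOR)
```

Unlike union, difference is not symmetric: difference(INSTRUCTOR, STUDENT) would give a different relation.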
CARTESIAN (or CROSS) PRODUCT Operation
This operation is used to combine tuples from two
relations in a combinatorial fashion.
Denoted by R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm)
Result is a relation Q with degree n + m attributes:
Q(A1, A2, . . ., An, B1, B2, . . ., Bm), in that order.
The resulting relation state has one tuple for each
combination of tuples—one from R and one from
S.
Hence, if R has nR tuples (denoted as |R| = nR), and
S has nS tuples, then R x S will have nR * nS tuples.
The two operands do NOT have to be "type
compatible"
6. BINARY RELATIONAL OPERATIONS: JOIN AND DIVISION
Example: Suppose that we want to retrieve the
name of the manager of each department.
To get the manager’s name, we need to combine
each DEPARTMENT tuple with the EMPLOYEE tuple
whose SSN value matches the MGRSSN value in the
department tuple.
We do this by using the join operation, with the join condition
DEPARTMENT.MGRSSN = EMPLOYEE.SSN
The general case of JOIN operation is
called a Theta-join: R ⋈theta S
The join condition is called theta
Theta can be any general boolean
expression on the attributes of R and S;
for example:
R.Ai<S.Bj AND (R.Ak=S.Bi OR R.Ap<S.Bq)
Most join conditions involve one or
more equality conditions “AND”ed
together
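The CARTESIAN PRODUCT and the theta-join can be sketched together, since a join keeps exactly those combinations from the product that satisfy the join condition. The rows below are made-up, and merging two dictionaries assumes the two relations have no attribute names in common.

```python
from itertools import product

def cartesian_product(r, s):
    """R x S: one combined tuple for every pair (one tuple from R, one from S)."""
    return [{**t_r, **t_s} for t_r, t_s in product(r, s)]

def theta_join(r, s, theta):
    """Theta-join: the pairs from R x S that satisfy the condition theta."""
    return [{**t_r, **t_s} for t_r, t_s in product(r, s) if theta(t_r, t_s)]

# Made-up states.
DEPARTMENT = [
    {"DNAME": "Research", "DNUMBER": 5, "MGRSSN": "333445555"},
    {"DNAME": "Headquarters", "DNUMBER": 1, "MGRSSN": "888665555"},
]
EMPLOYEE = [
    {"SSN": "123456789", "LNAME": "Smith"},
    {"SSN": "333445555", "LNAME": "Wong"},
    {"SSN": "888665555", "LNAME": "Borg"},
]

all_pairs = cartesian_product(DEPARTMENT, EMPLOYEE)   # nR * nS tuples

# Join condition: DEPARTMENT.MGRSSN = EMPLOYEE.SSN
dept_mgr = theta_join(DEPARTMENT, EMPLOYEE, lambda d, e: d["MGRSSN"] == e["SSN"])
manager_names = [(t["DNAME"], t["LNAME"]) for t in dept_mgr]
```

Each tuple in the product has n + m attributes, and the join returns one tuple per department, paired with its manager.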
7. ADDITIONAL RELATIONAL OPERATION
GROUP FUNCTIONS (or) AGGREGATE FUNCTIONS
Sum
Avg
Max
Min
Count
A group function is applied to a set of rows and produces a single output value.
a) SUM
This will give the sum of the values of the specified column.
Syntax: sum (column)
Ex: SQL> select sum(sal) from emp;
SUM(SAL)
----------
38600
b) AVG
This will give the average of the values of the specified column.
Syntax: avg (column)
Ex: SQL> select avg(sal) from emp;
AVG(SAL)
---------------
2757.14286
c) MAX
This will give the maximum of the values of the specified column.
Syntax: max (column)
Ex: SQL> select max(sal) from emp;
MAX(SAL)
----------
5000
d) MIN
This will give the minimum of the values of the specified column.
Syntax: min (column)
Ex: SQL> select min(sal) from emp;
MIN(SAL)
----------
500
e) COUNT
This will give the number of non-null values in the specified column; count(*) gives the number of rows.
Syntax: count (column)
Ex: SQL> select count(sal),count(*) from emp;
COUNT(SAL) COUNT(*)
-------------- ------------
14 14
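The five group functions can be tried on a tiny table through Python's sqlite3 module. This is a sketch under assumptions: SQLite stands in for the Oracle session shown above, and the four emp rows are made-up (one has a null salary to show how count(sal) and count(*) differ).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (ename TEXT, deptno INTEGER, sal INTEGER)")
con.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [("KING", 10, 5000), ("CLARK", 10, 2450), ("SMITH", 20, 800), ("TEMP", 20, None)],
)

# All group functions skip the null salary; only count(*) counts every row.
row = con.execute(
    "SELECT sum(sal), avg(sal), max(sal), min(sal), count(sal), count(*) FROM emp"
).fetchone()
total, average, highest, lowest, salaries_counted, rows_counted = row
```

With these rows the average is taken over three salaries, not four, because the null value is ignored.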
GROUP BY AND HAVING
GROUP BY
Using group by, we can create groups of related information.
Every column in the select list that is not inside a group function must also appear in the
group by clause; otherwise the query fails with a "not a GROUP BY expression" error.
Ex: SQL> select deptno, sum(sal) from emp group by deptno;
DEPTNO SUM(SAL)
---------- ----------
10 8750
20 10875
30 9400
HAVING
HAVING filters groups in the way that WHERE filters rows. It is used with group by because a
where clause is applied before the rows are grouped, so it cannot test group functions such as sum(sal).
Ex: SQL> select deptno,job,sum(sal) tsal from emp group by deptno,job having
sum(sal) > 3000;
DEPTNO JOB TSAL
---------- --------- ----------
10 PRESIDENT 5000
20 ANALYST 6000
30 SALESMAN 5600
SQL> select deptno,job,sum(sal) tsal from emp group by deptno,job having sum(sal) >
3000 order by job;
DEPTNO JOB TSAL
---------- --------- ----------
20 ANALYST 6000
10 PRESIDENT 5000
30 SALESMAN 5600
ORDER OF EXECUTION
Group the rows together based on group by clause.
Calculate the group functions for each group.
Choose and eliminate the groups based on the having clause.
Order the groups based on the specified column.
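The order of execution listed above can be followed on a small table using Python's sqlite3 module. The emp rows below are made-up (chosen so that the group totals resemble the sample output), and SQLite is only a stand-in for the Oracle session in the notes.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE emp (deptno INTEGER, job TEXT, sal INTEGER)")
con.executemany(
    "INSERT INTO emp VALUES (?, ?, ?)",
    [
        (10, "PRESIDENT", 5000), (10, "MANAGER", 2450),
        (20, "ANALYST", 3000), (20, "ANALYST", 3000), (20, "CLERK", 800),
        (30, "SALESMAN", 1600), (30, "SALESMAN", 1250),
        (30, "SALESMAN", 1250), (30, "SALESMAN", 1500),
    ],
)

groups = con.execute(
    "SELECT deptno, job, sum(sal) AS tsal "
    "FROM emp "
    "GROUP BY deptno, job "      # 1. group the rows
    "HAVING sum(sal) > 3000 "    # 2-3. compute sum(sal) per group, drop small groups
    "ORDER BY job"               # 4. order the remaining groups
).fetchall()
```

The MANAGER and CLERK groups are formed and summed like the others, but the having clause removes them before ordering.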
RELATIONAL CALCULUS
Relational calculus is based on predicate calculus, a branch of symbolic logic. A predicate
is a truth-valued function with arguments. Substituting values for the arguments turns the function
into an expression called a proposition, which is either true or false. Relational calculus is a tailored
version of a subset of predicate calculus, used to communicate with the relational database.
Tuple relational calculus (TRC) is a non-procedural query language based on finding the tuple variables
(also known as range variables) for which a predicate holds true. It describes the desired information
without giving a specific procedure for obtaining it. TRC is used to select tuples from a relation: the
filtering variable ranges over the tuples of a relation, and the result is itself a relation containing
the tuples that satisfy the predicate.
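Notation:
{T | P(T)}
Where
T is the tuple variable and P(T) is the predicate (condition) that T must satisfy.
For example:
{T.name | AUTHOR(T) AND T.article = 'database'}
Output: This query selects tuples from the AUTHOR relation. It returns the 'name' of every
author who has written an article on 'database'.
TRC (tuple relational calculus) expressions can be quantified, using the Existential (∃) and Universal
(∀) quantifiers.
For example:
{R | ∃T ∈ AUTHOR (T.article = 'database' AND R.name = T.name)}
Output: This query will yield the same result as the previous one.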
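A tuple relational calculus query reads much like a Python set comprehension: a tuple variable ranges over a relation and a predicate filters it. The sketch below writes the two queries described above in that style. The AUTHOR rows, the attribute names name and article, and the calculus expressions in the comments are assumptions used for illustration.

```python
# Made-up AUTHOR state.
AUTHOR = [
    {"name": "Asha", "article": "database"},
    {"name": "Ben", "article": "networks"},
    {"name": "Chen", "article": "database"},
]

# One way to write the first query: { T.name | AUTHOR(T) AND T.article = 'database' }
names = {T["name"] for T in AUTHOR if T["article"] == "database"}

# Quantified form: { R | there exists T in AUTHOR
#                        (T.article = 'database' AND R.name = T.name) }
candidate_names = {T["name"] for T in AUTHOR}
names_with_exists = {
    r
    for r in candidate_names
    if any(T["article"] == "database" and r == T["name"] for T in AUTHOR)
}
```

Python's any() plays the role of the existential quantifier here; all() would correspond to the universal quantifier.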
The second form of relational calculus is known as domain relational calculus (DRC). In DRC, the
filtering variables range over the domains of attributes rather than over whole tuples. DRC uses the
same operators as tuple calculus: the logical connectives ∧ (and), ∨ (or) and ¬ (not), and the
Existential (∃) and Universal (∀) quantifiers to bind variables. QBE (Query by Example) is a query
language related to domain relational calculus.
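Notation:
{ a1, a2, a3, ..., an | P (a1, a2, a3, ..., an) }
Where
a1, a2, ..., an are domain variables (attribute values) and P is the predicate (formula) they must satisfy.
For example:
{ <article, page, subject> | <article, page, subject> ∈ javatpoint ∧ subject = 'database' }
Output: This query will yield the article, page, and subject from the relation javatpoint, where the
subject is 'database'.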
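In domain relational calculus the variables stand for individual attribute values rather than whole tuples. A rough Python counterpart unpacks each row into separate variables. The rows of the javatpoint relation below are made-up sample data.

```python
# Made-up state of the relation javatpoint(article, page, subject).
javatpoint = {
    ("Normalization", 12, "database"),
    ("TCP basics", 40, "networks"),
    ("Indexing", 57, "database"),
}

# Domain variables article, page and subject each range over one attribute's values;
# the predicate keeps the combinations that occur in javatpoint with subject 'database'.
result = {
    (article, page, subject)
    for (article, page, subject) in javatpoint
    if subject == "database"
}
```

Compared with the tuple calculus style, the only difference is that the comprehension binds three domain variables instead of one tuple variable.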