
Outer Join and Aggregate Function

The document discusses relational algebra, specifically focusing on the concept of Join, which is a binary operator used to combine related tuples from two relations. It covers various types of joins including Inner Join, Outer Join, Theta Join, Equi Join, and Natural Join, along with examples and explanations of their applications. Additionally, it addresses the implications of join operations on data integrity and provides a set of equivalences and queries related to relational algebra.

Relational Algebra

Join

Sanghita Bhattacharjee
Department of CSE
NIT Durgapur
References
• A. Silberschatz, H. F. Korth and S. Sudarshan, Database System Concepts, 5th Edition, 2006
• Ramez Elmasri and Shamkant B. Navathe, Fundamentals of Database Systems, 3rd Edition, Addison Wesley, 2000
• Video lectures:
(i) Database Management System by Prof. Partha Pratim Das
(ii) Introduction to database systems by Prof. P. Sreenivasa Kumar
(iii) Online DBMS tutorials
Join
• Binary operator
• Denoted by ⋈
• Join combines related tuples from two relations into single tuples
• Join is useful because it lets us process the relationships among relations
• Cartesian Product:
• All combinations of tuples are included in the result
• Only certain tuples in the result are meaningful
• Useful when a selection is applied after the Cartesian Product
• Join: only the combinations of tuples that satisfy the join condition c appear in the result
R ⋈_c S = σ_c(R × S)
Example of Join

Employee
EID  Ename  Dept
1    Smith  CS
2    David  IT
3    John   IT
4    Virat  EE

HOD
HODID  Dept  Year
1      CS    2020
2      IT    2019
4      EE    2019

Suppose we want to retrieve the names of the HODs of the various departments. The name (Ename) is stored only in Employee, so we have to join the two relations to retrieve it.
Here, the set of HODID values is a proper subset of the set of EID values.

π_Ename(Employee ⋈_{Employee.EID = HOD.HODID} HOD)

HODID is a foreign key that references the primary key EID. Referential integrity therefore guarantees that every HODID value matches the EID of some Employee tuple, which keeps the two join attributes consistent.
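The definition R ⋈_c S = σ_c(R × S) can be sketched in a few lines of Python. This is only an illustration, not an efficient join algorithm: relations are modelled as lists of dicts, and the helper name `theta_join` is an assumption made for this sketch. The data are the Employee and HOD tables of this slide.

```python
from itertools import product

# Relations from the slide, one dict per tuple.
employee = [
    {"EID": 1, "Ename": "Smith", "Dept": "CS"},
    {"EID": 2, "Ename": "David", "Dept": "IT"},
    {"EID": 3, "Ename": "John", "Dept": "IT"},
    {"EID": 4, "Ename": "Virat", "Dept": "EE"},
]
hod = [
    {"HODID": 1, "Dept": "CS", "Year": 2020},
    {"HODID": 2, "Dept": "IT", "Year": 2019},
    {"HODID": 4, "Dept": "EE", "Year": 2019},
]

def theta_join(r, s, condition):
    """Join as a selection over the Cartesian product: sigma_c(R x S).

    Tuples are kept as pairs because both relations have a Dept attribute.
    """
    return [(t, u) for t, u in product(r, s) if condition(t, u)]

# Employee joined with HOD on Employee.EID = HOD.HODID, then project Ename.
pairs = theta_join(employee, hod, lambda e, h: e["EID"] == h["HODID"])
hod_names = [e["Ename"] for e, _ in pairs]
print(hod_names)  # ['Smith', 'David', 'Virat']
```

Of the 12 combinations in the Cartesian product, only the 3 that satisfy the join condition survive.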
Types of Join

• Types of Join: Inner Join and Outer Join (Outer Join is covered later)
• Inner Join:
    Theta join
    Equi join
    Natural join (a variant of inner join)
• Outer join:
    Left outer
    Right outer
    Full outer
Theta Join
• Let R(A1, A2, …, An) and S(B1, B2, …, Bm) be relations with no attribute names in common (R ∩ S = ∅). The theta join combines tuples of R and S into a relation with (n + m) attributes, keeping only the combinations that satisfy the join condition Θ
• Theta join of R and S: R ⋈_Θ S
• The join condition Θ has the form c1 AND c2 AND c3 AND …
• Each condition ci has the form Ai θ Bj, where Ai is an attribute of R, Bj is an attribute of S, and Ai and Bj have the same domain
• θ is one of the comparison operators {<, >, ≤, ≥, =, ≠}
Theta Join Example

R
SID  Sname  Section
101  Alex   3
102  Rohit  2

S
Class  Course
2      CS01
2      PH01
3      ME01
1      BIO01

Q = R ⋈_{R.Section = S.Class} S

SID  Sname  Section  Class  Course
101  Alex   3        3      ME01
102  Rohit  2        2      CS01
102  Rohit  2        2      PH01

• The result has fewer tuples than the cross-product, so it may be possible to compute it more efficiently
• Tuples whose join attributes are null do not appear in the result. Thus join does not preserve all the information of the input relations
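As a rough sketch of the example above, the theta join can be written with the comparison operator passed in as a parameter. The function name `theta_join` and the list-of-dicts representation are assumptions of this sketch. The extra `operator.gt` call is not on the slide and is included only to show a non-equality comparison.

```python
import operator

R = [
    {"SID": 101, "Sname": "Alex", "Section": 3},
    {"SID": 102, "Sname": "Rohit", "Section": 2},
]
S = [
    {"Class": 2, "Course": "CS01"},
    {"Class": 2, "Course": "PH01"},
    {"Class": 3, "Course": "ME01"},
    {"Class": 1, "Course": "BIO01"},
]

def theta_join(r, s, attr_r, op, attr_s):
    """Merge every pair (t, u) for which t[attr_r] op u[attr_s] holds.

    Merging is safe here because R and S share no attribute names.
    """
    return [{**t, **u} for t in r for u in s if op(t[attr_r], u[attr_s])]

# R.Section = S.Class, as on the slide.
equal = theta_join(R, S, "Section", operator.eq, "Class")
for row in equal:
    print(row)

# R.Section > S.Class, an illustrative non-equality condition.
greater = theta_join(R, S, "Section", operator.gt, "Class")
print(len(greater))  # 4
```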
Equi Join
• Equality is the only comparison operator used in the join condition
• Equi Join is a kind of theta join in which every comparison is an equality

Example: Enrolment(SID, Cno, Semester, Grade)
         Course(CID, Cname, Credit, Hours)
Find the names and credits of the courses in which students have enrolled
T1 = π_{CID, Cname, Credit}(Course)
T2 = π_{SID, Cno}(Enrolment)
Resulting relation = π_{SID, Cname, Credit}(T1 ⋈_{T1.CID = T2.Cno} T2)

Equivalent SQL: Select SID, Cname, Credit from Course as T1 inner join Enrolment as T2 on T1.CID = T2.Cno
Equi Join Example

R
SID  Sname  Section
101  Alex   3
102  Rohit  2

S
Class  Course
2      CS01
2      PH01
3      ME01
1      BIO01

Q = R ⋈_{R.Section = S.Class} S

SID  Sname  Section  Class  Course
101  Alex   3        3      ME01
102  Rohit  2        2      CS01
102  Rohit  2        2      PH01
Why Natural Join

• In the result of an Equi Join, one or more pairs of attributes have identical values in every tuple because of the equality join condition
• In the previous slide, the values of the attributes Section and Class are identical for every tuple in the resulting relation Q
• One attribute of each such pair is superfluous. Natural Join was introduced to remove the superfluous attributes of an Equi Join
• The definition of Natural Join requires that the two join attributes (or each pair of join attributes) have the same name in both relations
Natural Join
• Natural join does not take an explicit comparison operator: equality on the common attributes is implied. It does not concatenate tuples the way the Cartesian product does
• Natural join can only be performed if the relations have at least one common attribute. The common attributes must have the same name and domain
• There can be a list of join attributes from each relation; each corresponding pair must have the same name
• Suppose relations R and S have common attributes X1, X2, X3
• Join condition:
(R.X1 = S.X1) ∧ (R.X2 = S.X2) ∧ (R.X3 = S.X3)
that is, the values of all common attributes must be equal
• Schema of the result Q = R ∪ (S − {X1, X2, X3})
Only one copy of the common attributes is kept
• Notation for natural join: Q = R * S
Natural Join Example

R
A  B
X  Y
X  Z
Y  Z
Z  V

S
B  C
Z  U
V  W
Z  V

R * S
A  B  C
X  Z  U
X  Z  V
Y  Z  U
Y  Z  V
Z  V  W
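The example can be reproduced with a minimal natural-join sketch. It assumes relations are non-empty lists of dicts whose keys are the attribute names, and the function name `natural_join` is chosen here for illustration. The common attributes are found by name, matched on equality, and kept only once in the output.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    if not r or not s:
        return []
    # Attribute names that appear in both schemas.
    common = [a for a in r[0] if a in s[0]]
    result = []
    for t in r:
        for u in s:
            # With no common attribute this is always true,
            # so the join degenerates to the Cartesian product.
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})  # common attributes appear once
    return result

R = [{"A": "X", "B": "Y"}, {"A": "X", "B": "Z"},
     {"A": "Y", "B": "Z"}, {"A": "Z", "B": "V"}]
S = [{"B": "Z", "C": "U"}, {"B": "V", "C": "W"}, {"B": "Z", "C": "V"}]

rows = [(t["A"], t["B"], t["C"]) for t in natural_join(R, S)]
for row in rows:
    print(row)
```

The tuple (X, Y) of R has no partner in S, so it does not appear in the result.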
More on Natural Join
• If the joining attributes of the relations do not have the same name, a renaming operation is performed first. Then the join is applied
Q = R * (ρ_A(S))
• If the attributes on which the natural join is performed have the same name in both relations, renaming is unnecessary
• If no combination of tuples satisfies the join condition, the result of the join is an empty relation with 0 tuples. If R has m tuples and S has n tuples, R * S has between 0 and m*n tuples
• If there is no join condition, all combinations of tuples qualify and the join becomes a Cartesian Product
• The natural join or Equi join can also be specified among multiple tables, leading to an n-way join
((R *a S) *b Q)
1. Consider two relations R(A, B, C) and S(B, D, E) with the functional dependencies
B → A
A → C
R has 200 tuples and S has 100 tuples. What is the maximum size of the natural join R * S?
Answer: 100 tuples. B → A and A → C make B a key of R, so each tuple of S joins with at most one tuple of R.

2. Consider two relations A(P, Q, R) and B(R, S, T) with primary keys P and R respectively. The relation A contains 200 tuples and B contains 250 tuples. What is the maximum size of A * B?
(i) 200 tuples
(ii) 250 tuples
(iii) 50000 tuples
(iv) 0 tuples
Q. What is the optimized version of the relational algebra expression
π_A1(π_A2(σ_F1(σ_F2(r))))
where A1, A2 are sets of attributes in r with A1 ⊆ A2, and F1 and F2 are Boolean expressions based on the attributes in r?
(i) π_A1(σ_{F1∧F2}(r))
(ii) π_A1(σ_{F1∨F2}(r))
(iii) π_A2(σ_{F1∧F2}(r))
(iv) π_A2(σ_{F1∨F2}(r))

Q. Which of the following query transformations are correct? r1, r2 are relations, c1, c2 are conditions and A1, A2 are attribute sets
(i) σ_c1(σ_c2(r1)) → σ_c2(σ_c1(r1))
(ii) σ_c1(r1 ∪ r2) → σ_c1(r1) ∪ σ_c1(r2)
(iii) σ_c1(π_A1(r1)) → π_A1(σ_c1(r1))
(iv) π_A1(σ_c1(r1)) → σ_c1(π_A1(r1))
Questions

• Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R * S ?

• Given R(A, B, C), S(D, E), what is R * S ?

• Given R(A, B), S(A, B), what is R * S ?


Complete Set of Operators
• The operators {σ, π, ∪, −, ×} are known as a complete set of operators, because any other relational algebra operation can be expressed as a sequence of operations from this set
R ∩ S = (R ∪ S) − ((R − S) ∪ (S − R))
R ⋈_C S = σ_C(R × S)
Rename
• Unary operator
• Changes the schema, not the instance
• Notation: ρ_(B1, …, Bn)(R) renames the columns (attributes) of R to B1, B2, …, Bn
• Notation: ρ_S(R) renames the relation R to S
• Example:
ρ_(SID, Sname, Age)(Student)

Student
RollNo  Name    Age
101     Rohit   18
102     Ranbir  19
103     Sandy   18

ρ_(SID, Sname, Age)(Student)
SID  Sname   Age
101  Rohit   18
102  Ranbir  19
103  Sandy   18
Equivalence
1. σ_{c1∧c2}(r) = σ_c1(σ_c2(r))
2. σ_c1(σ_c2(r)) = σ_c2(σ_c1(r))
3. If A ⊆ A1, π_A(r) = π_A(π_A1(r))
4. π_A(σ_c(r)) = σ_c(π_A(r)) if attributes in c ⊆ attributes in A
5. If attributes in c ⊆ attributes in r, σ_c(r ⋈ s) = σ_c(r) ⋈ s
6. (r ⋈ s) ⋈ q = r ⋈ (s ⋈ q), where ⋈ denotes join
7. σ_c(r ∪ s) = σ_c(r) ∪ σ_c(s) (also holds for intersection and difference)
8. π_A(r ∪ s) = π_A(r) ∪ π_A(s)
9. If c involves only attributes in A of r and B of s, π_{A∪B}(r ⋈_c s) = π_A(r) ⋈_c π_B(s)
Queries and RA # 1

Book = (BID, title, publisher, year)
Student = (SID, sname, age, major)
Author = (Aname, address)
Borrow = (BID, SID, date)
Has_written = (BID, Aname)
Describe = (BID, keyword)

Q. List year and title of each book
π_{title, year}(Book)

Q. List all information about the students whose major is 'CS'
σ_{major = 'CS'}(Student)

Q. List all students with books they can borrow
Student × Borrow

Q. List all books published by LPE before 1990
σ_{publisher = 'LPE' ∧ year < 1990}(Book)

Q. List name of students who are older than 24 and not studying CS
π_sname(σ_{age > 24}(Student)) − π_sname(σ_{major = 'CS'}(Student))

Q. List name of students who have borrowed a book and whose major is CS
π_sname(σ_{Student.SID = Borrow.SID}(σ_{major = 'CS'}(Student) × Borrow))

Q. List the books written by Korth
π_title(σ_{Has_written.BID = Book.BID}(σ_{Aname = 'Korth'}(Has_written) × Book))
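To see how such an expression is evaluated step by step, here is a small sketch of the query "students who have borrowed a book and whose major is CS" built from selection, Cartesian product and projection. The helper names (`select`, `project`, `product`, `qualify`) and all sample rows are hypothetical and are not taken from the slides. Attribute names are qualified with the relation name so that Student.SID and Borrow.SID stay distinct in the product.

```python
def select(rel, pred):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    """pi: keep the listed attributes and eliminate duplicate tuples."""
    seen, out = set(), []
    for t in rel:
        key = tuple(t[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def product(r, s):
    """Cartesian product; assumes the attribute names are disjoint."""
    return [{**t, **u} for t in r for u in s]

def qualify(rel, name):
    """Prefix every attribute with the relation name, e.g. Student.SID."""
    return [{f"{name}.{a}": v for a, v in t.items()} for t in rel]

# Hypothetical sample data.
student = qualify([
    {"SID": 1, "sname": "Asha", "age": 25, "major": "CS"},
    {"SID": 2, "sname": "Bimal", "age": 22, "major": "CS"},
    {"SID": 3, "sname": "Chitra", "age": 26, "major": "EE"},
], "Student")
borrow = qualify([
    {"BID": 10, "SID": 1, "date": "2024-01-05"},
    {"BID": 11, "SID": 3, "date": "2024-01-09"},
    {"BID": 12, "SID": 1, "date": "2024-02-01"},
], "Borrow")

cs = select(student, lambda t: t["Student.major"] == "CS")
matched = select(product(cs, borrow),
                 lambda t: t["Student.SID"] == t["Borrow.SID"])
result = project(matched, ["Student.sname"])
print(result)  # [{'Student.sname': 'Asha'}]
```

Applying the selection on major before the product keeps the intermediate result small, and the projection removes the duplicate caused by Asha's two loans.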
Queries in Join
Employee(eid, ename, street, city)
Works_on(eid, cname, salary)
Company(CID, cname, city)
Manage(eid, manager_name)

Q. Find names and cities of all employees who work for RS company
π_{ename, city}(σ_{cname = 'RS'}(Works_on) * Employee)
Q. Find name, street, city of all employees who work for RS and earn more than 20000
T1 = σ_{cname = 'RS' ∧ salary > 20000}(Works_on)
Result = π_{ename, street, city}(T1 * Employee)
Queries in Join

Q. Companies are in different cities. Find all companies located in every city in which RS company is located
T1 = π_city(σ_{cname = 'RS'}(Company))
T2 = π_{cname, city}(Company) ÷ T1
Result = π_cname(T2)
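The division step can be sketched as follows. Company is reduced to (cname, city) pairs, the function name `divide` is an assumption of this sketch, and the company and city names are hypothetical sample data rather than values from the slides.

```python
# Hypothetical (cname, city) pairs, i.e. Company projected on cname and city.
company = [
    ("RS", "Kolkata"), ("RS", "Delhi"),
    ("AB", "Kolkata"), ("AB", "Delhi"), ("AB", "Mumbai"),
    ("CD", "Kolkata"),
]

# T1: the cities in which RS is located.
rs_cities = {city for cname, city in company if cname == "RS"}

def divide(pairs, divisor):
    """Return every x such that (x, y) is in pairs for all y in divisor."""
    present = set(pairs)
    candidates = {x for x, _ in pairs}
    return {x for x in candidates
            if all((x, y) in present for y in divisor)}

result = divide(company, rs_cities)
print(sorted(result))  # ['AB', 'RS']
```

CD is located in only one of the two RS cities, so it is excluded. RS itself trivially qualifies.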
Outer Join and Aggregation
Inner Join
• Inner join is one of the most frequently used types of join
• It finds all rows that meet the join condition
• Theta join, equi-join and natural join are all inner joins
• The result of these operations contains only the matching tuples

EMP
EID  Ename   Salary  Dept
1    Smith   20000   HR
2    David   30000   IT
3    Alisha  40000   SALES
4    Alia    40000   HR
5    Dev     30000   FINANCE
6    Niya    20000   IT

MANAGER
EID  ControllingDept
4    HR
2    IT
3    SALES
5    FINANCE

EMP * MANAGER
EID  Ename   Salary  Dept     ControllingDept
4    Alia    40000   HR       HR
2    David   30000   IT       IT
3    Alisha  40000   SALES    SALES
5    Dev     30000   FINANCE  FINANCE

EMP * MANAGER gives information only about the employees who are managers, not about the other employees. So we lose some tuples when we perform a natural join between the relations: the information about Smith and Niya is not in the output relation.
Outer Join
• There are join operations that include all the tuples of a relation in the result, whether or not they have a matching tuple. They are known as outer joins
• Using an outer join, all tuples of relation r, of relation s, or of both r and s can be included in the resulting relation
• There are three kinds of outer join
    Left outer
    Right outer
    Full outer
• Outer join was developed to take the union of tuples from two relations that are not union compatible but only partially compatible
• Outer join can be used to avoid loss of information
Left Outer Join

• Left outer join: r ⟕ s
All the tuples of the left relation r are included in the resulting relation. If a tuple t in r has no matching tuple in s, the s-attributes of t are set to NULL in the resulting relation
Example of Outer Join

Relation: S
SID  Sname   Age  Dept  EMPID
101  David   18   CSE   4P101
102  Joy     19   IT    4P102
103  Rohit   20   CSE   NULL
104  Ronita  20   ME    4P103
105  Anu     19   CHE   NULL

Relation: F
EID    Ename    Sex
4P101  Alex     M
4P102  Joydeep  M
4P103  Ankita   F
4P104  Tanmay   M
Left Outer Join Example

• T1 = S ⟕_{S.EMPID = F.EID} F
SID  Sname   Age  Dept  EMPID  EID    Ename    Sex
101  David   18   CSE   4P101  4P101  Alex     M
102  Joy     19   IT    4P102  4P102  Joydeep  M
103  Rohit   20   CSE   NULL   NULL   NULL     NULL
104  Ronita  20   ME    4P103  4P103  Ankita   F
105  Anu     19   CHE   NULL   NULL   NULL     NULL

Find SID, name and the corresponding supervisor name, if any
Result = π_{SID, Sname, Ename}(T1)

SID  Sname   Ename
101  David   Alex
102  Joy     Joydeep
103  Rohit   NULL
104  Ronita  Ankita
105  Anu     NULL
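A minimal sketch of the left outer join on the relations S and F, with Python's `None` standing in for NULL. The function name `left_outer_join` and the list-of-dicts representation are assumptions of this sketch. A NULL join attribute never matches anything, so such tuples are always padded.

```python
S = [
    {"SID": 101, "Sname": "David", "Age": 18, "Dept": "CSE", "EMPID": "4P101"},
    {"SID": 102, "Sname": "Joy", "Age": 19, "Dept": "IT", "EMPID": "4P102"},
    {"SID": 103, "Sname": "Rohit", "Age": 20, "Dept": "CSE", "EMPID": None},
    {"SID": 104, "Sname": "Ronita", "Age": 20, "Dept": "ME", "EMPID": "4P103"},
    {"SID": 105, "Sname": "Anu", "Age": 19, "Dept": "CHE", "EMPID": None},
]
F = [
    {"EID": "4P101", "Ename": "Alex", "Sex": "M"},
    {"EID": "4P102", "Ename": "Joydeep", "Sex": "M"},
    {"EID": "4P103", "Ename": "Ankita", "Sex": "F"},
    {"EID": "4P104", "Ename": "Tanmay", "Sex": "M"},
]

def left_outer_join(r, s, attr_r, attr_s):
    """Keep every tuple of r; pad with None where s has no match."""
    s_attrs = list(s[0])
    out = []
    for t in r:
        matches = [u for u in s
                   if t[attr_r] is not None and t[attr_r] == u[attr_s]]
        if matches:
            out.extend({**t, **u} for u in matches)
        else:
            out.append({**t, **dict.fromkeys(s_attrs)})  # s-attributes -> None
    return out

T1 = left_outer_join(S, F, "EMPID", "EID")
result = [(t["SID"], t["Sname"], t["Ename"]) for t in T1]
for row in result:
    print(row)
```

A right outer join can be obtained from the same function by swapping the two arguments, since the roles of the relations are symmetric.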
Right Outer Join

• Right outer join: r ⟖ s
All the tuples of the right relation s are included in the resulting relation. If a tuple t in s has no matching tuple in r, the r-attributes of t are set to NULL in the resulting relation
Right Outer Join Example

• T1 = S ⟖_{S.EMPID = F.EID} F
SID   Sname   Age   Dept  EMPID  EID    Ename    Sex
101   David   18    CSE   4P101  4P101  Alex     M
102   Joy     19    IT    4P102  4P102  Joydeep  M
104   Ronita  20    ME    4P103  4P103  Ankita   F
NULL  NULL    NULL  NULL  NULL   4P104  Tanmay   M

Find SID, name and the corresponding supervisor name, if any
Result = π_{SID, Sname, Ename}(T1)

SID   Sname   Ename
101   David   Alex
102   Joy     Joydeep
104   Ronita  Ankita
NULL  NULL    Tanmay
Full Outer Join

• Full outer join: r ⟗ s
All the tuples of both relations r and s are in the result. If a tuple of either relation has no matching tuple in the other, the attributes of the other relation are set to NULL
Full Outer Join

• T1 = S ⟗_{S.EMPID = F.EID} F
SID   Sname   Age   Dept  EMPID  EID    Ename    Sex
101   David   18    CSE   4P101  4P101  Alex     M
102   Joy     19    IT    4P102  4P102  Joydeep  M
104   Ronita  20    ME    4P103  4P103  Ankita   F
NULL  NULL    NULL  NULL  NULL   4P104  Tanmay   M
103   Rohit   20    CSE   NULL   NULL   NULL     NULL
105   Anu     19    CHE   NULL   NULL   NULL     NULL

Result = π_{SID, Sname, Ename}(T1)

SID   Sname   Ename
101   David   Alex
102   Joy     Joydeep
104   Ronita  Ankita
NULL  NULL    Tanmay
103   Rohit   NULL
105   Anu     NULL
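The full outer join can be sketched as a left outer join followed by the tuples of the right relation that found no partner. To keep the sketch short, S and F are trimmed to the attributes needed for the final projection. The function name `full_outer_join` is an assumption, and `None` stands in for NULL.

```python
# S and F from the example, trimmed to the attributes used in the result.
S = [
    {"SID": 101, "Sname": "David", "EMPID": "4P101"},
    {"SID": 102, "Sname": "Joy", "EMPID": "4P102"},
    {"SID": 103, "Sname": "Rohit", "EMPID": None},
    {"SID": 104, "Sname": "Ronita", "EMPID": "4P103"},
    {"SID": 105, "Sname": "Anu", "EMPID": None},
]
F = [
    {"EID": "4P101", "Ename": "Alex"},
    {"EID": "4P102", "Ename": "Joydeep"},
    {"EID": "4P103", "Ename": "Ankita"},
    {"EID": "4P104", "Ename": "Tanmay"},
]

def full_outer_join(r, s, attr_r, attr_s):
    """Keep every tuple of r and of s, padding the missing side with None."""
    r_attrs, s_attrs = list(r[0]), list(s[0])
    out, matched_s = [], set()
    for t in r:
        hit = False
        for i, u in enumerate(s):
            if t[attr_r] is not None and t[attr_r] == u[attr_s]:
                out.append({**t, **u})
                matched_s.add(i)
                hit = True
        if not hit:
            out.append({**t, **dict.fromkeys(s_attrs)})  # unmatched left tuple
    for i, u in enumerate(s):
        if i not in matched_s:
            out.append({**dict.fromkeys(r_attrs), **u})  # unmatched right tuple
    return out

T1 = full_outer_join(S, F, "EMPID", "EID")
result = [(t["SID"], t["Sname"], t["Ename"]) for t in T1]
for row in result:
    print(row)
```

The rows come out in a different order from the slide, which does not matter because a relation is a set of tuples.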
Extended Relational-Algebra-Operations

• Generalized Projection
• Aggregate Functions
Aggregate Functions and Operations
• An aggregate function takes a collection of values and returns a single value as a result
    avg: average value
    min: minimum value
    max: maximum value
    sum: sum of values
    count: number of values
• Aggregate operation in relational algebra
    G1, G2, …, Gn 𝒢 F1(A1), F2(A2), …, Fn(An) (E)
• E is any relational-algebra expression
• G1, G2, …, Gn is a list of attributes on which to group (can be empty)
• Each Fi is an aggregate function
• Each Ai is an attribute name
• Duplicates are not eliminated before an aggregate function is applied


Aggregation Example

Group the employees based on department number
Retrieve the number of employees in each department
DNO 𝒢 count(EID) (Employee)

Employee
EID    DNO  HOD
4P101  CSE  X
4P102  CSE  X
4P103  ME   Y
4P104  IT   Z
4P105  ME   Y
4P106  IT   Z

Result
DNO  count(EID)
CSE  2
IT   2
ME   2
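Grouping followed by an aggregate function can be sketched as below. The helper name `group_aggregate` and the tuple layout (EID, DNO, HOD) are assumptions of this sketch. The data are the Employee rows of this slide.

```python
from collections import defaultdict

# Employee tuples as (EID, DNO, HOD).
employee = [
    ("4P101", "CSE", "X"),
    ("4P102", "CSE", "X"),
    ("4P103", "ME", "Y"),
    ("4P104", "IT", "Z"),
    ("4P105", "ME", "Y"),
    ("4P106", "IT", "Z"),
]

def group_aggregate(rel, group_idx, agg, value_idx):
    """Group tuples on one column, then apply agg to another column."""
    groups = defaultdict(list)
    for t in rel:
        groups[t[group_idx]].append(t[value_idx])
    return {g: agg(values) for g, values in groups.items()}

# Group on DNO (index 1) and count the EID values (index 0) in each group.
counts = group_aggregate(employee, 1, len, 0)
print(counts)  # {'CSE': 2, 'ME': 2, 'IT': 2}
```

Passing `min`, `max` or `sum` as `agg` gives the other aggregate functions on a numeric column.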
Renaming and Aggregation
• The result of aggregation does not have a name
• We can use the rename operation to give it a name
    ρ_R(DNO, No of employee)(DNO 𝒢 count(EID) (Employee))
• For convenience, we permit renaming as part of the aggregate operation
    DNO 𝒢 count(EID) as No of employee (Employee)

R
DNO  No of employee
CSE  2
IT   2
ME   2
Another Example

• Relation account grouped by branch-name:

branch-name  account-number  balance
Perryridge   A-102           400
Perryridge   A-201           900
Brighton     A-217           750
Brighton     A-215           750
Redwood      A-222           700

branch-name 𝒢 sum(balance) (account)

branch-name  sum(balance)
Perryridge   1300
Brighton     1500
Redwood      700

branch-name 𝒢 count-distinct(branch-name) as count (account)

branch-name  count
Perryridge   1
Brighton     1
Redwood      1
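Both results can be checked with a short sketch over the account rows. The tuple layout (branch-name, account-number, balance) and the variable names are assumptions. Note that sum keeps duplicates (Brighton's two balances of 750 add up to 1500), while count-distinct counts each different value once.

```python
from collections import defaultdict

# account tuples as (branch_name, account_number, balance).
account = [
    ("Perryridge", "A-102", 400),
    ("Perryridge", "A-201", 900),
    ("Brighton", "A-217", 750),
    ("Brighton", "A-215", 750),
    ("Redwood", "A-222", 700),
]

# Group the tuples by branch-name.
groups = defaultdict(list)
for row in account:
    groups[row[0]].append(row)

# sum(balance): duplicates are kept.
sum_balance = {b: sum(r[2] for r in rows) for b, rows in groups.items()}

# count-distinct(branch-name): distinct values inside each group.
count_distinct = {b: len({r[0] for r in rows}) for b, rows in groups.items()}

print(sum_balance)
print(count_distinct)
```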
Generalized Projection
• The generalized projection operation extends projection by allowing arithmetic expressions to be used in the projection list
• General form: π_{F1, F2, …, Fn}(E)
where E is any relational algebra expression or relation, and each of F1, F2, …, Fn is an arithmetic expression involving constants and attributes in the schema of E
• We want to find how much a person can spend
π_{Name, Limit − Expenditure as Available}(Balance_info)

Balance_info
Name   Limit  Expenditure
Smith  2000   700
Joe    1000   500
Lee    1500   1000
Ricky  2000   800
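As a quick illustration, the generalized projection on Balance_info amounts to computing one arithmetic expression per tuple. The list-of-dicts representation and the variable names are assumptions of this sketch.

```python
balance_info = [
    {"Name": "Smith", "Limit": 2000, "Expenditure": 700},
    {"Name": "Joe", "Limit": 1000, "Expenditure": 500},
    {"Name": "Lee", "Limit": 1500, "Expenditure": 1000},
    {"Name": "Ricky", "Limit": 2000, "Expenditure": 800},
]

# Projection list: Name, and Limit - Expenditure renamed to Available.
available = [
    {"Name": t["Name"], "Available": t["Limit"] - t["Expenditure"]}
    for t in balance_info
]
for row in available:
    print(row["Name"], row["Available"])
```

This gives 1300 for Smith, 500 for Joe, 500 for Lee and 1200 for Ricky.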
• Retrieve the sum of salaries, the average salary, the maximum salary and the minimum salary
SQL:
Select min(salary), max(salary), avg(salary), sum(salary) from EMP;
• Retrieve the sum of salaries of the employees working in the Research department
SQL:
Select sum(salary) from EMP where Dept = 'Research';
• Count total employees
SQL:
Select count(*) from EMP;
• Count the number of employees working in the HR department
SQL:
Select count(EMPID) from EMP where Dept = 'HR';
• For each department, find the number of employees
SQL: Select Dept, count(*) from EMP group by Dept;
• Find the names of employees with more than two dependents
SQL: Select Ename from EMP where EMPID in (Select EMPID from DEPENDENT group by EMPID having count(*) > 2);
Or
Select Ename from EMP where (Select count(*) from DEPENDENT where DEPENDENT.EMPID = EMP.EMPID) > 2;
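These aggregate queries can be tried out with Python's built-in sqlite3 module. The sketch below makes several assumptions: the column layouts of EMP and DEPENDENT, the column name Dname, and every sample row are hypothetical. The last query counts, for each employee, the DEPENDENT rows with the same EMPID, which is one way to express "more than two dependents".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schemas and rows, for illustration only.
conn.executescript("""
CREATE TABLE EMP (EMPID TEXT PRIMARY KEY, Ename TEXT, Dept TEXT, Salary INTEGER);
CREATE TABLE DEPENDENT (EMPID TEXT, Dname TEXT);
INSERT INTO EMP VALUES ('E1', 'Asha', 'HR', 20000);
INSERT INTO EMP VALUES ('E2', 'Bimal', 'HR', 30000);
INSERT INTO EMP VALUES ('E3', 'Chitra', 'Research', 40000);
INSERT INTO DEPENDENT VALUES ('E1', 'd1');
INSERT INTO DEPENDENT VALUES ('E1', 'd2');
INSERT INTO DEPENDENT VALUES ('E1', 'd3');
INSERT INTO DEPENDENT VALUES ('E2', 'd4');
INSERT INTO DEPENDENT VALUES ('E3', 'd5');
INSERT INTO DEPENDENT VALUES ('E3', 'd6');
""")

# Aggregates over the whole relation.
stats = conn.execute(
    "SELECT MIN(Salary), MAX(Salary), AVG(Salary), SUM(Salary) FROM EMP"
).fetchone()

# Aggregates per group.
per_dept = conn.execute(
    "SELECT Dept, COUNT(*) FROM EMP GROUP BY Dept ORDER BY Dept"
).fetchall()

# Correlated subquery: employees with more than two dependents.
many_dependents = conn.execute("""
    SELECT Ename FROM EMP
    WHERE (SELECT COUNT(*) FROM DEPENDENT
           WHERE DEPENDENT.EMPID = EMP.EMPID) > 2
""").fetchall()

print(stats)
print(per_dept)
print(many_dependents)
conn.close()
```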
NULL Value

• It is possible for tuples to have a null value, denoted by null, for some of their
attributes
• null signifies an unknown value or that a value does not exist
• The result of any arithmetic expression involving null is null
• Aggregate functions simply ignore null values
• For duplicate elimination and grouping, null is treated like any other value, and
two nulls are assumed to be the same
NULL Value
• Comparisons with null values return the special truth value unknown
• If false was used instead of unknown, then not (P < 5)
would not be equivalent to P >= 5
• Three-valued logic using the truth value unknown:
• OR: (unknown or true) = true
(unknown or false) = unknown
(unknown or unknown) = unknown
• AND: (true and unknown) = unknown,
(false and unknown) = false,
(unknown and unknown) = unknown
• NOT: (not unknown) = unknown
• Result of select predicate is treated as false if it evaluates to unknown
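The three-valued logic above can be sketched in Python with `None` playing the role of unknown. The function names `and3`, `or3`, `not3` and `less_than`, as well as the sample rows, are assumptions of this sketch.

```python
def and3(a, b):
    """Three-valued AND: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True

def or3(a, b):
    """Three-valued OR: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False

def not3(a):
    """Three-valued NOT: unknown stays unknown."""
    return None if a is None else not a

def less_than(x, y):
    """Comparison that returns unknown when either operand is null."""
    return None if x is None or y is None else x < y

# Selection keeps a tuple only when the predicate is true;
# unknown is treated like false.
rows = [{"P": 3}, {"P": 7}, {"P": None}]
selected = [t for t in rows if less_than(t["P"], 5) is True]
print(selected)  # [{'P': 3}]
```

For the tuple with a null P, both P < 5 and not (P < 5) evaluate to unknown, so neither selection returns it.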
