TDA357 L12 RelationalAlgebra
• Example: πid,name(Students)
• In SQL: SELECT id,name FROM Students;
• Called the projection operator (we project a certain view of the relation)
Sets, bags or lists? (Again)
• Remember: A set has no duplicates or internal ordering, bags allow duplicates,
lists allow duplicates and each value has a position
• Traditionally, relations are considered sets of tuples in relational algebra
• This makes them harder to translate to/from SQL where results are bags
• There are also things like sorting operators in most Relational Algebra
definitions, which are not compatible with either sets or bags
• In this course we use bag semantics
• Semantics ≈ what expressions mean, as opposed to how they look (syntax)
• You will need to understand the implications of this choice
Projection on sets/bags
• Projection is one of the operators where set/bag semantics differ
• The intuition of projection is that you just remove a few attributes
• With set semantics, the number of tuples/rows may decrease, because
removing attributes can make rows identical, and such duplicates are eliminated!
• One way to explain this in terms of SQL:
• With bag semantics, projection corresponds to the SELECT clause
• With set semantics, projection corresponds to SELECT DISTINCT
• In this course, we follow the intuition and use bag semantics for π
Table: WL
student   course  position
Student1  TDA357  1
Student2  TDA357  2
Student1  TDA143  1

πstudent(WL), set semantics:
student
Student1
Student2

πstudent(WL), bag semantics:
student
Student1
Student2
Student1
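The difference between the two semantics can be sketched in Python. This is only an illustration, not part of the course material: rows are modelled as tuples, and the helper names project_bag and project_set are made up for this sketch.

```python
# Table WL from the example above, as (student, course, position) rows.
WL = [
    ("Student1", "TDA357", 1),
    ("Student2", "TDA357", 2),
    ("Student1", "TDA143", 1),
]

def project_bag(rows, indexes):
    """Bag projection: keep the chosen columns of every row (like SELECT)."""
    return [tuple(row[i] for i in indexes) for row in rows]

def project_set(rows, indexes):
    """Set projection: identical rows collapse (like SELECT DISTINCT)."""
    return set(project_bag(rows, indexes))

bag_result = project_bag(WL, [0])   # three rows, Student1 appears twice
set_result = project_set(WL, [0])   # two rows
print(len(bag_result), len(set_result))
```

With bag semantics the row count of the result always equals the row count of the input, which is the intuition this course follows.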
Selection
• The σ (sigma) operator corresponds to the WHERE-clause in SQL
• Syntax: σ<condition on rows>(R)
• In SQL:
SELECT * FROM <SQL for R> WHERE <condition on rows>
• Conditions should be simple row-wise checks, do not put RA-expressions in
your conditions (unlike in SQL where subqueries are allowed)
• Boolean syntax from SQL (AND, OR, NOT ...) or logical symbols (∧, ∨, ¬ ...)
• Comparisons like <, >, = on constants and attributes
• Called the selection operator because it selects which rows to keep
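As a rough sketch of what σ does, selection can be modelled in Python as a row-wise filter. The sample rows and the helper name select are assumptions made for this illustration only.

```python
# Sample rows for a Grades(student, course, grade) relation (illustrative data).
Grades = [
    {"student": "S1", "course": "TDA357", "grade": 3},
    {"student": "S2", "course": "TDA357", "grade": 3},
    {"student": "S1", "course": "TDA143", "grade": 5},
]

def select(condition, rows):
    """σ<condition>(rows): keep exactly the rows where the condition holds."""
    return [row for row in rows if condition(row)]

# σ student='S1' AND grade>3 (Grades)
result = select(lambda r: r["student"] == "S1" and r["grade"] > 3, Grades)
print(result)
```

The condition only looks at one row at a time, which mirrors the rule that conditions should be simple row-wise checks.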
The most unfortunate naming mismatch ever
• Selection (σ) does not correspond to the SELECT clause in SQL!
• σ corresponds more closely to the WHERE clause
• Projection (π) corresponds to SELECT
πstudent(Grades) is SELECT student FROM Grades
σstudent=1(Grades) is SELECT * FROM Grades WHERE student=1
• "Take all idnr from students, and remove all idnr with a passing grade"
• Like in SQL, schemas must be compatible (same number of attributes)
Extending set operations to bags
• In sets, each tuple is either in or not in each relation
• In bags, each tuple occurs a number of times in each relation
• Assuming x occurs n times in R1 and m times in R2
• x occurs n+m times in R1 ∪ R2
• x occurs min(n,m) times in R1 ∩ R2
• x occurs max(n-m, 0) times in R1 - R2 (never fewer than 0 times)
• Translates to UNION ALL, INTERSECT ALL and EXCEPT ALL
• This is the semantics we use for union, intersection and difference in
this course
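These counting rules happen to match the arithmetic of Python's collections.Counter, so a bag can be sketched as a Counter that maps each tuple to its number of occurrences. The values "x", "y" and "z" are placeholder tuples chosen for this illustration.

```python
from collections import Counter

# x occurs n=3 times in R1 and m=2 times in R2.
R1 = Counter({"x": 3, "y": 1})
R2 = Counter({"x": 2, "z": 1})

union_all = R1 + R2       # counts are added: n+m
intersect_all = R1 & R2   # minimum of the counts: min(n,m)
except_all = R1 - R2      # n-m, and entries that reach 0 or below are dropped

print(union_all["x"], intersect_all["x"], except_all["x"])
```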
Grouping
• The grouping operator γ (gamma) is like a combined SELECT and GROUP BY
• Syntax: γ<attributes/aggregates>(R)
• Example: γstudent, AVG(grade) → average(Grades)
Table: Grades
student  course  grade
S1       TDA357  3
S2       TDA357  3
S1       TDA143  5

Result:
student  average
S1       4
S2       3
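The grouping example can be reproduced with a small Python sketch. The function name group_avg is made up for this illustration, and it hard-codes the one grouping attribute and the one aggregate of the example rather than implementing γ in general.

```python
from collections import defaultdict

# The Grades table from the example, as (student, course, grade) rows.
Grades = [
    ("S1", "TDA357", 3),
    ("S2", "TDA357", 3),
    ("S1", "TDA143", 5),
]

def group_avg(rows):
    """γ student, AVG(grade)→average: one output row per distinct student."""
    groups = defaultdict(list)
    for student, _course, grade in rows:
        groups[student].append(grade)
    return [(student, sum(gs) / len(gs)) for student, gs in groups.items()]

result = group_avg(Grades)
print(result)
```

Note that the output schema is (student, average): the course attribute is gone, just as with SELECT student, AVG(grade) AS average ... GROUP BY student.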
• Sanity check: All our conditions, projections etc. only mention attributes
that actually exist in their operands
Sanity check
Students(idnr, name)
Grades(student, course, grade)
student -> Students.idnr
• Not doing this simple sanity check is probably the most common way
to unnecessarily lose points on the exam
What about HAVING?
• In SQL the HAVING-clause is like an extra WHERE-clause that happens
after/during grouping. Having such an operator in RA does not make sense
• It is only a feature of SQL to avoid using subqueries all the time
• This query:
SELECT student FROM Grades
GROUP BY student
HAVING AVG(grade)>4;
Corresponds to this expression:
πstudent(σaverage>4(γstudent, AVG(grade)→average(Grades)))
Note: We need a name for the aggregate here (average)
• No need for a separate operator working on aggregates
• But it is important to do the selection outside the grouping when
translating a HAVING-clause to relational algebra
• Do the sanity check!
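The order of the three steps (group, then select, then project) can be traced in a short Python sketch. The sample rows are invented so that one student ends up above the threshold; they are not the Grades table used elsewhere in these slides.

```python
from collections import defaultdict

# Illustrative Grades(student, course, grade) rows.
Grades = [
    ("S1", "TDA357", 5),
    ("S1", "TDA143", 4),
    ("S2", "TDA357", 3),
]

# Step 1, γ student, AVG(grade)→average: the aggregate gets a name.
groups = defaultdict(list)
for student, _course, grade in Grades:
    groups[student].append(grade)
grouped = [(student, sum(gs) / len(gs)) for student, gs in groups.items()]

# Step 2, σ average>4: the HAVING condition, applied outside the grouping.
selected = [row for row in grouped if row[1] > 4]

# Step 3, π student: drop the helper attribute again.
result = [(student,) for student, _average in selected]
print(result)
```

Sanity check for the sketch: the selection mentions average, which exists only in the schema (student, average) produced by step 1, so it cannot be moved inside the grouping.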
Qualified names
• Base relations have names that can be used in conditions etc.
• The results of expressions do not have names though
• Technically, expressions like πR1.x(R1 × R2) are invalid, because the result
of (R1 × R2) does not have a name
• Like SELECT R1.x FROM (SELECT * FROM R1, R2), which is invalid
• This essentially means qualified names are never useful in projections
• This is often ignored in examples of relational algebra and each attribute
is understood to retain its qualified name
• I will allow this in this course
Qualified names
Students(idnr, name)
Grades(idnr, course, grade)
student -> Students.idnr
• If there are name clashes, it makes sense to sanity check with qualified names
(Grades.idnr, average)
• Use ρS(Students) to only rename the relation and keep attribute names
Renaming example
Table: Numbers
owner  num
Bart   11111
Lisa   22222
Bart   33333
• Consider this query (self join)
SELECT N1.num, N2.num, N1.owner
FROM Numbers AS N1, Numbers AS N2
WHERE N1.owner = N2.owner;
• Here the ρ operator is essential
πN1.num, N2.num, N1.owner(σN1.owner = N2.owner(ρN1(Numbers) × ρN2(Numbers)))
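To see why the renaming is needed, here is a Python sketch of the self join. Rows are modelled as dictionaries keyed by qualified attribute names, and the helper rename is an assumption of this sketch, not a standard function.

```python
# The Numbers table from the example, as (owner, num) rows.
Numbers = [("Bart", 11111), ("Lisa", 22222), ("Bart", 33333)]
ATTRS = ("owner", "num")

def rename(name, rows):
    """ρ<name>(rows): qualify every attribute with the new relation name."""
    return [{f"{name}.{a}": v for a, v in zip(ATTRS, row)} for row in rows]

N1 = rename("N1", Numbers)
N2 = rename("N2", Numbers)

# Cartesian product: without the two different names, the keys would clash.
product = [{**r1, **r2} for r1 in N1 for r2 in N2]

# σ N1.owner = N2.owner
selected = [r for r in product if r["N1.owner"] == r["N2.owner"]]

# π N1.num, N2.num, N1.owner
result = [(r["N1.num"], r["N2.num"], r["N1.owner"]) for r in selected]
print(result)
```

Merging two unrenamed copies of Numbers would give each row only the keys owner and num, so one side of the product would silently overwrite the other.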
πp1(πp2(R)) = πp1(R)
R1 ∩ R2 = R1 – (R1 – R2)
[Figure: expression trees, one with the numbers 1, 2, 3 as leaves and one with the relations R1, R2 as leaves]
• Each node in the tree can be computed into a value (or a schema), bottom up
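Bottom-up evaluation can be sketched in Python for the bag set operations. In this sketch (an assumption, not course notation) a leaf is a collections.Counter holding a bag, and an inner node is a tuple (operator, left subtree, right subtree). The same evaluator is then used to check the law R1 ∩ R2 = R1 – (R1 – R2) on one pair of sample bags.

```python
from collections import Counter

def evaluate(node):
    """Evaluate an expression tree bottom up: children first, then the node."""
    if isinstance(node, Counter):   # leaf: a base relation (bag)
        return node
    op, left, right = node
    l, r = evaluate(left), evaluate(right)
    if op == "union":
        return l + r
    if op == "intersect":
        return l & r
    if op == "minus":
        return l - r
    raise ValueError(f"unknown operator: {op}")

R1 = Counter({"a": 3, "b": 1})
R2 = Counter({"a": 1, "c": 2})

lhs = evaluate(("intersect", R1, R2))
rhs = evaluate(("minus", R1, ("minus", R1, R2)))
print(lhs == rhs)
```

One example does not prove the law, but it agrees with the counting argument: n - max(n-m, 0) equals min(n,m) for all counts n and m.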
All basic operators (a few more on next slide)
• Selection, "Sigma": σ<selection condition>(R)
• Projection, "Pi": π<attribute list>(R)
• Cartesian product: R1 × R2
• Other set operations: R1 ∪ R2, R1 ∩ R2, R1 - R2
• Grouping, "Gamma": γ<attributes/aggregates>(R)
• Join: R1 ⋈<condition> R2
• Renaming, "Rho": ρ<Relation name>(<optional attribute names>)(R)
Additional operators
• Apart from the operators we have seen so far there are several
extensions to match various features of SQL
• NATURAL JOIN: R1 ⋈ R2 (Just omit the Join-condition)
• JOIN USING: R1 ⋈idnr R2 (replace Join-condition with attribute)
• Outer joins (the join symbol with a superscript o, oL or oR):
• Full outer join: R1 ⋈o<join condition> R2
• Left/right join: R1 ⋈oL<join condition> R2 and R1 ⋈oR<join condition> R2
• DISTINCT: δ (delta), for converting from a bag to a set
e.g. R1 ∪ R2 is UNION ALL in SQL, δ(R1 ∪ R2) is UNION
• τ (tau), for ORDER BY on an expression. Examples:
τgrade(Grades) for SELECT * FROM Grades ORDER BY grade ASC
τ-grade(Grades) for SELECT * FROM Grades ORDER BY grade DESC
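A minimal Python sketch of δ and τ, using lists of tuples as rows. The function names delta and tau and the sample rows are assumptions made for this illustration. Note that the result of tau is a list with a meaningful order, which is exactly why sorting does not fit into pure set or bag semantics.

```python
def delta(rows):
    """δ: remove duplicate rows (dict keys keep the first occurrence of each)."""
    return list(dict.fromkeys(rows))

def tau(rows, key, descending=False):
    """τ: sort rows by a key; descending=True plays the role of τ-attr."""
    return sorted(rows, key=key, reverse=descending)

R1 = [("S1",), ("S2",)]
R2 = [("S1",), ("S3",)]
union_all = R1 + R2        # R1 ∪ R2 with bag semantics (UNION ALL)
union = delta(union_all)   # δ(R1 ∪ R2) (UNION)

Grades = [("S1", "TDA357", 3), ("S2", "TDA357", 3), ("S1", "TDA143", 5)]
ascending = tau(Grades, key=lambda row: row[2])                     # τgrade
descending = tau(Grades, key=lambda row: row[2], descending=True)   # τ-grade
print(union, descending[0])
```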
Is it OK if I just learn SQL and translate that to RA?
• Yes!
• But the translation is not always trivial
• Relational algebra is not just SQL in Greek!
Translating a single query
• A query with almost everything:
SELECT a1, MAX(a2) AS mx
FROM T1, T2
WHERE a3=5
GROUP BY a1,a3
HAVING COUNT(*) > 10
ORDER BY a1 ASC;
Note: Some things, like HAVING, require new names to be introduced
• A relational algebra expression for it:
τa1(πa1,mx (σtemp>10(γa1,a3,MAX(a2)→mx,COUNT(*)→temp(σa3=5(T1 × T2)))))
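The expression can be read inside-out as a pipeline, one operator per step. The Python sketch below follows that order. The schemas T1(a1, a2) and T2(a3) and all sample rows are assumptions, since the slides do not say which table each attribute belongs to.

```python
from collections import defaultdict

# Hypothetical sample data: T1(a1, a2) and T2(a3).
T1 = ([("c", i) for i in range(11)]
      + [("a", i) for i in range(12)]
      + [("b", i) for i in range(3)])
T2 = [(5,), (6,)]

# T1 × T2: every combination, giving (a1, a2, a3) rows.
product = [t1 + t2 for t1 in T1 for t2 in T2]

# σ a3=5 (the WHERE clause)
filtered = [row for row in product if row[2] == 5]

# γ a1,a3,MAX(a2)→mx,COUNT(*)→temp (GROUP BY plus both aggregates)
groups = defaultdict(list)
for a1, a2, a3 in filtered:
    groups[(a1, a3)].append(a2)
grouped = [(a1, a3, max(vals), len(vals)) for (a1, a3), vals in groups.items()]

# σ temp>10 (the HAVING clause, using the introduced name temp)
having = [row for row in grouped if row[3] > 10]

# π a1,mx (the SELECT clause)
projected = [(a1, mx) for a1, _a3, mx, _temp in having]

# τ a1 (ORDER BY a1 ASC)
result = sorted(projected, key=lambda row: row[0])
print(result)
```

The helper attribute temp exists only between the grouping and the final projection, which is where the extra name required by HAVING shows up.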