Chapter 2.3
Chapter 5: SQL Joins
Objectives
Horizontally combine data from multiple tables.
Distinguish between inner and outer SQL joins.
Compare SQL joins to DATA step merges.
Combining Data from Multiple Tables
SQL uses set operators to combine tables vertically.
   Table A
   Table B
This produces results that can be compared to a
DATA step concatenation.

Combining Data from Multiple Tables
SQL uses joins to combine tables horizontally.
   Table A   Table B
5.01 Multiple Choice Poll
Which of these DATA step statements is used to combine
tables horizontally?
a. SET
b. APPEND
c. MERGE
d. INPUT
e. INFILE
5.01 Multiple Choice Poll – Correct Answer
Which of these DATA step statements is used to combine
tables horizontally?
a. SET
b. APPEND
c. MERGE
d. INPUT
e. INFILE
Correct answer: c. MERGE
Types of Joins
PROC SQL supports two types of joins:
inner joins
outer joins
Inner joins
return only matching rows
enable a maximum of 256 tables to be joined
Outer joins
return all matching rows, plus nonmatching rows
from one or both tables
can process only two tables at a time.
Cartesian Product
To understand how SQL processes a join, it is important
to understand the concept of the Cartesian product.
A query that lists multiple tables in the FROM clause
without a WHERE clause produces all possible combinations
of rows from all tables. This result is called the Cartesian
product.
select *
from one, two;
s105d01
Cartesian Product
Table One    Table Two
X  A         X  B
1  a         2  x
4  d         3  y
2  b         5  v
Result Set
X  A  X  B
1  a  2  x
1  a  3  y
1  a  5  v
4  d  2  x
4  d  3  y
4  d  5  v
2  b  2  x
2  b  3  y
2  b  5  v
3 rows x 3 rows = 9 rows
s105d01
Cartesian Product
The number of rows in a Cartesian product is the product
of the number of rows in the contributing tables.
3x3=9
1,000 x 1,000 = 1,000,000
100,000 x 100,000 = 10,000,000,000
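
The row-count rule can be checked outside SAS. The following is a minimal sketch, assuming an in-memory SQLite database (through Python's standard sqlite3 module) as a stand-in for PROC SQL; the tables are recreated from the slide, and the variable names are illustrative only.

```python
import sqlite3

# In-memory SQLite database standing in for SAS tables One and Two (an assumption;
# the course itself runs these queries in PROC SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table one (x integer, a text);
    create table two (x integer, b text);
    insert into one values (1, 'a'), (4, 'd'), (2, 'b');
    insert into two values (2, 'x'), (3, 'y'), (5, 'v');
""")

# Two tables in the FROM clause and no WHERE clause: every row of One
# is paired with every row of Two.
cartesian_rows = conn.execute("select * from one, two").fetchall()

rows_one = conn.execute("select count(*) from one").fetchone()[0]
rows_two = conn.execute("select count(*) from two").fetchone()[0]

print(len(cartesian_rows))   # number of rows in the Cartesian product
print(rows_one * rows_two)   # product of the two input row counts
conn.close()
```

Both printed values should agree, which is the point of the slide: the cost of an unrestricted join grows multiplicatively with table size.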
5.02 Quiz
How many rows are returned from this query?
select *
from three, four;
Table Three    Table Four
X  A           X  B
1  a1          2  x1
1  a2          2  x2
2  b1          3  y
2  b2          5  v
4  d
s105a01
5.02 Quiz – Correct Answer
How many rows are returned from this query?
The query produces 20 rows (5 * 4 = 20).
select *
   from three, four;
Table Three    Table Four
X  A           X  B
1  a1          2  x1
1  a2          2  x2
2  b1          3  y
2  b2          5  v
4  d
Partial Results Set
X  A   X  B
1  a1  2  x1
1  a1  2  x2
1  a1  3  y
1  a1  5  v
1  a2  2  x1
1  a2  2  x2
1  a2  3  y
1  a2  5  v
2  b1  2  x1
2  b1  2  x2
s105a01
Inner Joins
Inner join syntax resembles Cartesian product syntax,
but a WHERE clause restricts which rows are returned.
General form of an inner join:
SELECT column-1<, …column-n>
   FROM table-1|view-1<, … table-n|view-n>
   WHERE join-condition(s)
         <AND other subsetting conditions>
   <other clauses>;
Inner Joins: Cartesian Product Built
Table One    Table Two
X  A         X  B
1  a         2  x
4  d         3  y
2  b         5  v
select *
   from one, two
X  A  X  B
1  a  2  x
1  a  3  y
1  a  5  v
4  d  2  x
4  d  3  y
4  d  5  v
2  b  2  x
2  b  3  y
2  b  5  v
s105d02
Inner Joins: WHERE Clause Restricts Rows
select *
   from one, two
   where one.x=two.x;
X  A  X  B
1  a  2  x
1  a  3  y
1  a  5  v
4  d  2  x
4  d  3  y
4  d  5  v
2  b  2  x   (one.x=two.x)
2  b  3  y
2  b  5  v
s105d02
Inner Joins: Results Are Returned
select *
   from one, two
   where one.x=two.x;
X  A  X  B
2  b  2  x
s105d02
Inner Joins
One method of displaying the X column only once is to use a
table qualifier in the SELECT list.
Table One Table Two
X A X B
1 a 2 x
4 d 3 y
2 b 5 v
select one.x, a, b
from one, two
where one.x=two.x;
X A B
2 b x
s105d03
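
As a hedged illustration of the two queries above, the sketch below runs them against an in-memory SQLite database (Python's standard sqlite3 module) rather than PROC SQL; SQLite is an assumption made so that the example is runnable anywhere, and the variable names are hypothetical.

```python
import sqlite3

# SQLite stands in for PROC SQL here (an assumption); tables copied from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table one (x integer, a text);
    create table two (x integer, b text);
    insert into one values (1, 'a'), (4, 'd'), (2, 'b');
    insert into two values (2, 'x'), (3, 'y'), (5, 'v');
""")

# The WHERE clause keeps only the Cartesian-product rows whose keys match.
inner_rows = conn.execute(
    "select * from one, two where one.x = two.x"
).fetchall()

# Qualifying the column as one.x in the SELECT list returns X only once.
qualified_rows = conn.execute(
    "select one.x, a, b from one, two where one.x = two.x"
).fetchall()

print(inner_rows)
print(qualified_rows)
conn.close()
```

Without the one. qualifier, a bare x in the SELECT list would be ambiguous because both tables contain a column of that name.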
Inner Joins
Display all combinations of rows with matching keys, including
duplicates.
Table Three    Table Four    Results Set
X  A           X  B          X  A   X  B
1  a1          2  x1         2  b1  2  x1
1  a2          2  x2         2  b1  2  x2
2  b1          3  y          2  b2  2  x1
2  b2          5  v          2  b2  2  x2
4  d
proc sql;
select *
from three, four
where three.x=four.x;
quit;
s105d04
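
The many-to-many behavior can be reproduced with a short sketch. It assumes SQLite (via Python's sqlite3 module) in place of PROC SQL, and adds an ORDER BY clause, which is not in the slide's query, only so that the row order is predictable.

```python
import sqlite3

# SQLite stands in for PROC SQL (an assumption); tables copied from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table three (x integer, a text);
    create table four (x integer, b text);
    insert into three values (1, 'a1'), (1, 'a2'), (2, 'b1'), (2, 'b2'), (4, 'd');
    insert into four values (2, 'x1'), (2, 'x2'), (3, 'y'), (5, 'v');
""")

# Each of the two X=2 rows in Three pairs with each of the two X=2 rows in Four.
# ORDER BY is added here only to make the output order deterministic.
matches = conn.execute("""
    select *
    from three, four
    where three.x = four.x
    order by a, b
""").fetchall()

for row in matches:
    print(row)
conn.close()
```

Two matching rows on each side give 2 * 2 = 4 result rows, which is the Cartesian product restricted to the matching key.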
Setup for the Poll
Run program s105a02 and review the results to determine
how many rows (observations) the DATA step MERGE
statement produces in the output table.
Three          Four
X  A           X  B
1  a1          2  x1
1  a2          2  x2
2  b1          3  y
2  b2          5  v
4  d
data new;
   merge three (in=InThree)
         four (in=InFour);
   by x;
   if InThree and InFour;
run;
s105a02
5.03 Multiple Choice Poll
How many rows (observations) result from the DATA step
MERGE statement in program s105a02?
a. 4
b. 2
c. 6
d. 20
e. None of the above
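5.03 Multiple Choice Poll – Correct Answer
How many rows (observations) result from the DATA step
MERGE statement in program s105a02?
a. 4
b. 2
c. 6
d. 20
e. None of the above
Correct answer: b. 2
Within the BY group X=2, MERGE pairs observations one-to-one,
so only 2 rows are produced (the SQL inner join produced 4).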
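
To see why the match-merge yields fewer rows than the SQL join, the following rough Python simulation may help. It is not SAS code and not how SAS is implemented; match_merge and its require_both parameter are hypothetical names, and the sketch assumes both inputs are already sorted by X and that each table has exactly one non-key column.

```python
from itertools import groupby

def match_merge(left, right, require_both=False):
    """Rough simulation of DATA step MERGE ... BY x (hypothetical helper).

    left holds (x, a) tuples and right holds (x, b) tuples, both sorted by x.
    require_both=True mimics the subsetting IF on both IN= variables.
    """
    left_groups = {k: [a for _, a in g] for k, g in groupby(left, key=lambda r: r[0])}
    right_groups = {k: [b for _, b in g] for k, g in groupby(right, key=lambda r: r[0])}
    merged = []
    for key in sorted(set(left_groups) | set(right_groups)):
        a_vals = left_groups.get(key, [])
        b_vals = right_groups.get(key, [])
        if require_both and not (a_vals and b_vals):
            continue
        # Rows are paired by position within the BY group. When the shorter
        # side runs out, its last value is retained; a side with no rows in
        # the group contributes a missing value (None).
        for i in range(max(len(a_vals), len(b_vals))):
            a = a_vals[min(i, len(a_vals) - 1)] if a_vals else None
            b = b_vals[min(i, len(b_vals) - 1)] if b_vals else None
            merged.append((key, a, b))
    return merged

three = [(1, 'a1'), (1, 'a2'), (2, 'b1'), (2, 'b2'), (4, 'd')]
four = [(2, 'x1'), (2, 'x2'), (3, 'y'), (5, 'v')]

new = match_merge(three, four, require_both=True)
print(new)
```

Positional pairing within the X=2 group gives two rows, whereas the SQL join forms every combination of matching rows.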
Business Scenario
Report layout:
Name           City         Birth Month
Last, First    City Name    1
Considerations:
orion.Employee_Addresses contains employee name,
country, and city data.
orion.Employee_Payroll contains employee birth dates.
Both orion.Employee_Addresses and
orion.Employee_Payroll contain Employee_ID.
Names are stored in the Employee_Name column
as Last, First.
Inner Joins
proc sql;
title "Australian Employees' Birth Months";
select Employee_Name as Name format=$25.,
City format=$25.,
month(Birth_Date) 'Birth Month' format=3.
from orion.Employee_Payroll,
orion.Employee_Addresses
where Employee_Payroll.Employee_ID=
Employee_Addresses.Employee_ID
and Country='AU'
order by 3,City, Employee_Name;
quit;
s105d05
Inner Joins
Partial PROC SQL Output
                                         Birth
Name                     City            Month
----------------------------------------------
Aisbitt, Sandy           Melbourne           1
Graham-Rowe, Jannene     Melbourne           1
Hieds, Merle             Melbourne           1
Sheedy, Sherie           Melbourne           1
Simms, Doungkamol        Melbourne           1
Tannous, Cos             Melbourne           1
Body, Meera              Sydney              1
Clarkson, Sharryn        Sydney              1
Dawes, Wilson            Sydney              1
Rusli, Skev              Sydney              1
Glattback, Ellis         Melbourne           2
Gromek, Gladys           Melbourne           2
Inner Join Alternate Syntax
An inner join can also be accomplished using an alternate
syntax, which limits the join to a maximum of two tables.
General form of an inner join:
SELECT column-1<, …column-n>
   FROM table-1
        INNER JOIN
        table-2
   ON join-condition(s)
   <other clauses>;
This syntax is common in SQL code produced by
code generators such as SAS Enterprise Guide.
The ON clause specifies the JOIN criteria; a
WHERE clause can be added to subset the
results.
Inner Join Alternate Syntax
proc sql;
title "Australian Employees' Birth Months";
select Employee_Name as Name format=$25.,
City format=$25.,
month(Birth_Date) 'Birth Month' format=3.
from orion.Employee_Payroll
inner join
orion.Employee_Addresses
on Employee_Payroll.Employee_ID=
Employee_Addresses.Employee_ID
where Country='AU'
order by 3,City, Employee_Name;
quit;
s105d06
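
The two inner join syntaxes can be compared side by side. This is a minimal sketch that assumes SQLite (Python's sqlite3 module) instead of PROC SQL and reuses the small tables One and Two from the earlier slides rather than the orion tables, whose contents are not reproduced here.

```python
import sqlite3

# SQLite stands in for PROC SQL (an assumption); One and Two come from earlier slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table one (x integer, a text);
    create table two (x integer, b text);
    insert into one values (1, 'a'), (4, 'd'), (2, 'b');
    insert into two values (2, 'x'), (3, 'y'), (5, 'v');
""")

# Comma-separated tables with the join condition in the WHERE clause.
where_syntax = conn.execute("""
    select one.x, a, b
    from one, two
    where one.x = two.x
""").fetchall()

# INNER JOIN keywords with the join condition in the ON clause.
on_syntax = conn.execute("""
    select one.x, a, b
    from one inner join two
    on one.x = two.x
""").fetchall()

print(where_syntax)
print(on_syntax)
conn.close()
```

Both forms should return the same rows; the ON form simply separates the join condition from any additional subsetting in a WHERE clause.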
5.04 Multiple Choice Poll
How many tables can be combined using a single inner
join?
a. 2
b. 32
c. 256
d. 512
e. Limited only by my computer’s resources
f. No limit
5.04 Multiple Choice Poll – Correct Answer
How many tables can be combined using a single inner
join?
a. 2
b. 32
c. 256
d. 512
e. Limited only by my computer’s resources
f. No limit
Correct answer: c. 256
Outer Joins
Inner joins return only matching rows. When you join
tables, you might want to include nonmatching rows as
well as matching rows.
You can retrieve both nonmatching and matching rows
using an outer join.
Outer joins include left, full, and right outer joins. Outer
joins can process only two tables at a time.
Compare Inner Joins and Outer Joins
The following table compares inner and outer join
syntax and limitations:
Key Point        Inner Join                   Outer Join
Table Limit      256                          2
Join Behavior    Returns matching rows only   Returns matching and nonmatching rows
Outer Joins
Outer join syntax is similar to the inner join alternate
syntax.
General form of an outer join:
SELECT column-1<, …column-n>
   FROM table-1
        LEFT|RIGHT|FULL JOIN
        table-2
   ON join-condition(s)
   <other clauses>;
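Determining Left and Right
Consider the position of the tables in the FROM clause.
Left joins include all rows from the first (left) table,
even if there are no matching rows in the second (right) table.
Right joins include all rows from the second (right) table,
even if there are no matching rows in the first (left) table.
FROM table-1 join-type table-2
   ON join-condition(s);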
Left Join
Table One    Table Two
X  A         X  B
1  a         2  x
4  d         3  y
2  b         5  v
select *
   from one left join two
   on one.x = two.x;
X  A  X  B
1  a  .
2  b  2  x
4  d  .
s105d07
Right Join
Table Two    Table One
X  B         X  A
2  x         1  a
3  y         4  d
5  v         2  b
select *
   from two right join one
   on one.x = two.x;
X  B  X  A
.     1  a
2  x  2  b
.     4  d
s105d08
Full Join
Table One    Table Two
X  A         X  B
1  a         2  x
4  d         3  y
2  b         5  v
select *
   from one full join two
   on one.x = two.x;
X  A  X  B
1  a  .
2  b  2  x
.     3  y
4  d  .
.     5  v
s105d09
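
The three outer joins can be sketched outside SAS as follows. This example assumes SQLite through Python's sqlite3 module; because older SQLite versions lack RIGHT JOIN and FULL JOIN, the right join is written as a left join with the tables' roles swapped, and the full join is emulated with UNION ALL. Those workarounds are choices made for this sketch, not PROC SQL syntax. Missing values appear as None rather than as a period or blank.

```python
import sqlite3

# SQLite stands in for PROC SQL (an assumption); tables copied from the slides.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table one (x integer, a text);
    create table two (x integer, b text);
    insert into one values (1, 'a'), (4, 'd'), (2, 'b');
    insert into two values (2, 'x'), (3, 'y'), (5, 'v');
""")

# Left join: every row of One, with Two's columns filled in when the keys match.
left_rows = conn.execute("""
    select one.x, a, two.x, b
    from one left join two on one.x = two.x
    order by one.x
""").fetchall()

# "two right join one" keeps every row of One, so it is rewritten here as
# "one left join two" with Two's columns listed first.
right_rows = conn.execute("""
    select two.x, b, one.x, a
    from one left join two on one.x = two.x
    order by one.x
""").fetchall()

# Full join emulation: all rows of One, plus rows of Two that found no match.
full_rows = conn.execute("""
    select one.x, a, two.x, b
    from one left join two on one.x = two.x
    union all
    select one.x, a, two.x, b
    from two left join one on one.x = two.x
    where one.x is null
""").fetchall()
# Sort by whichever key is present so the rows line up with the slide.
full_rows.sort(key=lambda r: r[0] if r[0] is not None else r[2])

print(left_rows)
print(right_rows)
print(full_rows)
conn.close()
```

The left and right results contain the same three rows of One; only the column order differs, which mirrors the point about table position in the FROM clause.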
Business Scenario
List the employee ID and gender for all
married employees. Include the names
of any charities to which the employee
donates via the company program.
Business Scenario
Considerations:
The table orion.Employee_Payroll
contains gender and marital status information.
The table orion.Employee_Donations contains
records only for those employees who
donate to a charity via the company program.
Less than half of all employees are married.
Employee_Payroll (all employees)
Employee_Donations (employees who donate to charity)
5.05 Multiple Choice Poll – Correct Answer
For the report, you need the data for all married employees
from orion.Employee_Payroll.
You also want to include the charity names from the
orion.Employee_Donations table if Employee_ID
matches. What type of join should you use
to combine the information from these two tables?
a. Inner Join
b. Left Join
c. Full Join
d. None of the above
Correct answer: b. Left Join
Outer Joins
proc sql;
select Employee_payroll.Employee_ID,
Employee_Gender, Recipients
from orion.Employee_payroll
left join
orion.Employee_donations
on Employee_payroll.Employee_ID=
Employee_donations.Employee_ID
where Marital_Status="M"
;
quit;
s105d10
Outer Joins
Partial PROC SQL Output (Rows 203-215)
              Employee_
Employee_ID   Gender      Recipients
--------------------------------------------------------------------------
     121128   F           Cancer Cures, Inc.
     121131   M           Vox Victimas 40%, Conserve Nature, Inc. 60%
     121132   M           EarthSalvors 50%, Vox Victimas 50%
     121133   M           Disaster Assist, Inc.
     121138   M           Cuidadores Ltd.
     121139   F
     121142   M           AquaMissions International 10%, Child Survivors 90%
     121143   M           Mitleid International 60%, Save the Baby Animals 40%
     121144   F
     121145   M           Save the Baby Animals
     121146   F
     121147   F           Cuidadores Ltd. 50%, Mitleid International 50%
     121148   M
Using a Table Alias
SELECT alias-1.column-1<, …alias-2.column-n>
   FROM table-1 AS alias-1
        join-type
        table-2 AS alias-2
   ON join-condition(s)
   <other clauses>;
Using a Table Alias
proc sql;
select p.Employee_ID, Employee_Gender,
Recipients
from orion.Employee_payroll as p
left join
orion.Employee_donations as d
on p.Employee_ID=d.Employee_ID
where Marital_Status="M"
;
quit;
s105d11
DATA Step Merge (Review)
A DATA step with MERGE and BY statements automatically
overlays same-name columns.
Table One    Table Two
X  A         X  B
1  a         2  x
4  d         3  y
2  b         5  v
Table One must be sorted or indexed on column X before
a merge can be performed.
data merged;
   merge one two;
   by x;
run;
proc print data=merged;
run;
Output
X  A  B
1  a
2  b  x
3     y
4  d
5     v
s105d12
SQL Join versus DATA Step Merge
SQL joins do not automatically overlay same-named
columns.
Table One    Table Two
X  A         X  B
1  a         2  x
4  d         3  y
2  b         5  v
proc sql;
select one.x, a, b
   from one full join two
   on one.x=two.x
;
quit;
Output
X  A  B
1  a
2  b  x
.     y
4  d
.     v
s105d12
SQL Join versus DATA Step Merge
You can use the COALESCE function to overlay columns.
Table One    Table Two
X  A         X  B
1  a         2  x
4  d         3  y
2  b         5  v
proc sql;
select coalesce(one.x,two.x)
       as x,a,b
   from one full join two
   on one.x=two.x;
quit;
Output
X  A  B
1  a
2  b  x
3     y
4  d
5     v
s105d12
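
The effect of COALESCE can be sketched as follows, assuming SQLite (Python's sqlite3 module) as a stand-in for PROC SQL. Since older SQLite versions lack FULL JOIN, the full join is emulated here with a left join plus the unmatched rows of Two; that emulation is an assumption of this sketch, not part of the slide's query.

```python
import sqlite3

# SQLite stands in for PROC SQL (an assumption); tables copied from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table one (x integer, a text);
    create table two (x integer, b text);
    insert into one values (1, 'a'), (4, 'd'), (2, 'b');
    insert into two values (2, 'x'), (3, 'y'), (5, 'v');
""")

# COALESCE returns its first non-missing argument, so the key is taken from
# One when available and from Two otherwise. The UNION ALL emulates a full join.
overlaid = conn.execute("""
    select coalesce(one.x, two.x) as x, a, b
    from one left join two on one.x = two.x
    union all
    select coalesce(one.x, two.x) as x, a, b
    from two left join one on one.x = two.x
    where one.x is null
    order by 1
""").fetchall()

for row in overlaid:
    print(row)
conn.close()
```

The result has one populated X column for every row, matching what the DATA step merge produces by overlaying same-named columns.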
SQL Join versus DATA Step Merge
Key Points                                      SQL Join        DATA Step Merge
Explicit sorting of data before join/merge      Not required    Required
Same-named columns in join/merge expressions    Not required    Required
Equality in join or merge expressions           Not required    Required
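
As one illustration of the last row of the comparison, a join condition need not be an equality. The sketch below assumes SQLite (Python's sqlite3 module) in place of PROC SQL; the less-than condition is chosen only for demonstration and does not come from the course programs. Table One is deliberately left unsorted, as on the slides.

```python
import sqlite3

# SQLite stands in for PROC SQL (an assumption); One is intentionally unsorted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table one (x integer, a text);
    create table two (x integer, b text);
    insert into one values (1, 'a'), (4, 'd'), (2, 'b');
    insert into two values (2, 'x'), (3, 'y'), (5, 'v');
""")

# Non-equality join: pair each row of One with every row of Two whose key is larger.
# ORDER BY is used only to make the printed order predictable.
non_equi_rows = conn.execute("""
    select one.x, a, two.x, b
    from one, two
    where one.x < two.x
    order by one.x, two.x
""").fetchall()

for row in non_equi_rows:
    print(row)
conn.close()
```

A DATA step MERGE with a BY statement has no direct counterpart to this condition, because BY-group matching is based on equal key values in sorted or indexed inputs.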
Exercise
Chapter 5: SQL Joins
Objectives
Create and use in-line views.
Use in-line views and subqueries to simplify coding
a complex query.
Chapter Review
1. How many rows are returned by the following query?
Table1       Table2
X  A         X  B
1  a         2  x
3  d         1  y
2  b         3  v
proc sql;
select *
   from table1,table2;
quit;
Chapter Review Answers
1. How many rows are returned by the following query?
Table1       Table2
X  A         X  B
1  a         2  x
3  d         1  y
2  b         3  v
proc sql;
select *
   from table1,table2;
quit;
The query returns 9 rows (3 * 3 = 9).
Chapter Review
2. Which of the following statements describes an
advantage of using a PROC SQL view?
a. Views often save space, because a view is usually
quite small compared with the data that it accesses.
b. Views can provide users a simpler alternative to
frequently retrieving and submitting query code to
produce identical results.
c. Views hide complex query details from users.
d. All of the above
Chapter Review Answers
2. Which of the following statements describes an
advantage of using a PROC SQL view?
a. Views often save space, because a view is usually
quite small compared with the data that it accesses.
b. Views can provide users a simpler alternative to
frequently retrieving and submitting query code to
produce identical results.
c. Views hide complex query details from users.
d. All of the above
Correct answer: d. All of the above
Chapter Review
3. Outer and Inner Joins:
a. An outer join can operate on a maximum of ___
tables simultaneously.
b. An inner join can operate on a maximum of ___
tables simultaneously.
Chapter Review Answers
3. Outer and Inner Joins:
a. An outer join can operate on a maximum of _2_
tables simultaneously.
b. An inner join can operate on a maximum of _256_
tables simultaneously.
Chapter Review
4. True or False:
An in-line view can be used on a WHERE or HAVING
clause and can return many rows of data, but must
return only one column.
Chapter Review Answers
4. True or False:
An in-line view can be used on a WHERE or HAVING
clause and can return many rows of data, but must
return only one column.
False
An in-line view is a query used in the FROM
clause in place of a table. An in-line view can
return any number of rows or columns.