Session 5 BIZ
08:30
https://unsplash.com/photos/gm3bxHin8VA
MySQL for Data Analytics
Sample Question 1
• You are asked to analyze a data file containing your
company's sales information from before 2022. Recently,
you received a complementary data file with the sales
information for 2022. Both files have the same data
structure, and you want to merge the two tables into
one. How would you do that?
27.09.2023
UNION [all]
• UNION operator combines two or more result
sets from multiple SELECT statements
- Each SELECT statement within UNION must have
the same number of columns
- The corresponding columns must also have similar
data types
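As a minimal sketch of the merge asked for in Sample Question 1, the block below stacks two tables that share one structure. The table names sales_before_2022 and sales_2022 and their columns are assumptions made for illustration; they are not part of the course database.

```sql
-- Hypothetical demo tables; names and columns are assumptions.
CREATE TEMPORARY TABLE sales_before_2022 (orderNumber INT, amount DECIMAL(10,2));
CREATE TEMPORARY TABLE sales_2022 (orderNumber INT, amount DECIMAL(10,2));

INSERT INTO sales_before_2022 VALUES (1, 100.00), (2, 250.00);
INSERT INTO sales_2022 VALUES (2, 250.00), (3, 80.00);

-- UNION removes duplicate rows: (2, 250.00) appears once, 3 rows in total.
SELECT orderNumber, amount FROM sales_before_2022
UNION
SELECT orderNumber, amount FROM sales_2022;

-- UNION ALL keeps every row from both inputs: 4 rows in total.
SELECT orderNumber, amount FROM sales_before_2022
UNION ALL
SELECT orderNumber, amount FROM sales_2022;
```

When two sales files are merged, UNION ALL is often the safer choice, because two identical rows may be two genuine sales rather than a duplicate.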
Solution
Table customer
Table employee
Question 2
• Your boss asks you to provide the contact details of both
customers and employees in the USA, including their
names and contacts (e.g., phone number or email).
• Assuming that your company is located in the USA, all
the employees should be on the list. However,
customers may not be from the USA.
Solution
SELECT contactLastName, contactFirstName, phone AS contact
FROM customers
WHERE country LIKE 'USA'
UNION
SELECT lastName, firstName, email
FROM employees;
What would happen if we used UNION ALL instead?
Adding a fixed value to a SELECT
(SELECT contactLastName, contactFirstName, phone AS contact,
        'customer' AS category
 FROM customers
 WHERE country LIKE 'USA'
 LIMIT 5)
UNION
SELECT lastName, firstName, email, 'employee' AS category
FROM employees
-- This unparenthesized LIMIT applies to the whole UNION result.
LIMIT 10;
Select…Group by…
Reference: https://www.w3resource.com/mysql/aggregate-functions-and-grouping/aggregate-functions-and-grouping-group_concat.php
Table orderdetails
This table lists the products included in each sales order.
• In one purchase (or sales order), a customer may buy several
products.
• The same product can be sold at different prices in different
sales orders.
Attention! A common mistake!
We can select the maximum quantity ordered of each
product with the following code:
SELECT productCode, MAX(quantityOrdered)
FROM orderdetails
GROUP BY productCode;
But how do we retrieve the price at which each product
was sold in the order with its largest quantity ordered?
[Figure: raw data]
AVG() + group by
Revenue generated by a
product in one order =
quantityOrdered × priceEach
Query improvement
SELECT productCode, AVG(priceEach*quantityOrdered)
FROM orderdetails
GROUP BY productCode;

SELECT productCode,
       ROUND(AVG(priceEach*quantityOrdered), 2) AS avg_revenue
FROM orderdetails
GROUP BY productCode
ORDER BY avg_revenue DESC;
Count() + group by
• Which product has been
purchased most often
by customers?
You can use a number to refer to a column
SELECT productCode, COUNT(*) AS frequency
FROM orderdetails
GROUP BY productCode
ORDER BY frequency DESC;
… is the same as …
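The second query of this comparison is not preserved in the text. A plausible positional version, offered only as a sketch, replaces each column name with its position in the SELECT list (1 for productCode, 2 for frequency):

```sql
-- Assumes the orderdetails table used on the previous slides.
-- Position 1 is productCode and position 2 is frequency in the SELECT list.
SELECT productCode, COUNT(*) AS frequency
FROM orderdetails
GROUP BY 1
ORDER BY 2 DESC;
```

Positions change silently whenever the SELECT list is reordered, and the MySQL manual describes column positions as deprecated, so names or aliases are usually the more robust choice.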
group_concat()+ Group by
• This function returns a string result with
the concatenated non-NULL values from a
group.
An example
• SELECT GROUP_CONCAT(contactFirstName, contactLastName)
FROM customers;
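To connect GROUP_CONCAT with GROUP BY, the sketch below builds one string per group. It assumes the orderdetails and customers tables from the earlier slides; the separators and aliases are illustrative choices.

```sql
-- One row per order, with its product codes joined into a single string.
SELECT orderNumber,
       GROUP_CONCAT(productCode ORDER BY productCode SEPARATOR ', ') AS products
FROM orderdetails
GROUP BY orderNumber;

-- Variant of the example above: CONCAT_WS puts a space between first and
-- last name, and SEPARATOR replaces the default comma between customers.
SELECT GROUP_CONCAT(CONCAT_WS(' ', contactFirstName, contactLastName)
                    SEPARATOR '; ') AS contacts
FROM customers;
```

The result string is truncated at the length set by the group_concat_max_len system variable (1024 by default), which can matter for large groups.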
Group_concat
Duplicated records have mistakenly appeared in the
table below. How can you obtain a clean table in which
each payment appears only once?
SELECT *
FROM payments
GROUP BY customerNumber, checkNumber, paymentDate, amount;
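An alternative sketch for the same clean-up uses SELECT DISTINCT, which states the intent more directly. It assumes that payments has exactly the four columns listed in the GROUP BY above.

```sql
-- Returns each distinct combination of the four columns once.
SELECT DISTINCT customerNumber, checkNumber, paymentDate, amount
FROM payments;
```

The GROUP BY version with SELECT * is accepted only because every selected column also appears in the GROUP BY list; with the default ONLY_FULL_GROUP_BY mode, an extra ungrouped column would typically cause an error.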
Select…Group by + having
• WHERE and HAVING behave similarly when the command
contains no GROUP BY.
It works!
Order of keyword operation
- Select command template (a simple version)
Select (columns or computed new columns)
From (table[s])
Where (conditions)
Group by (columns or computed new columns)
Having (conditions – based on computed new columns, e.g. count)
Order by (columns or computed new columns)
Limit (number)
- Sequence of operation:
From → Where → Group by → Select → Having → Order by → Limit
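The template can be illustrated with one query that uses every clause. This is a sketch against the orderdetails table from the earlier slides; the thresholds 30 and 20 are arbitrary illustration values.

```sql
SELECT productCode, COUNT(*) AS frequency   -- computed new column, one value per group
FROM orderdetails                           -- source table
WHERE quantityOrdered > 30                  -- filters single rows before grouping
GROUP BY productCode                        -- one group per product
HAVING frequency > 20                       -- filters groups by the computed column
ORDER BY frequency DESC                     -- sorts the remaining groups
LIMIT 5;                                    -- keeps the first five rows
```

WHERE cannot refer to frequency, because the count does not exist until the rows have been grouped; conditions on aggregated values therefore belong in HAVING.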
NULL value VS. Empty value
Manipulation on NULL value
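• Selecting empty values (or values that consist purely of spaces)
Select * from peoplenames where Middel_Name = '';
Select * from peoplenames where Middel_Name != '';
• Selecting NULL values
Select * from peoplenames where Middel_Name is null;
Select * from peoplenames where Middel_Name is not null;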
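The difference between NULL and an empty string can be checked directly on literals, without any table. The column aliases below are illustrative names only.

```sql
-- Comparisons with NULL yield NULL (unknown), not true or false.
SELECT NULL = ''     AS null_equals_empty,    -- NULL
       NULL = NULL   AS null_equals_null,     -- NULL
       NULL IS NULL  AS null_is_null,         -- 1
       '' IS NULL    AS empty_is_null,        -- 0
       '' = ''       AS empty_equals_empty;   -- 1
```

Because a comparison with NULL is never true, a condition such as Middel_Name = NULL matches no rows; IS NULL and IS NOT NULL are the operators to use.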
Manipulation on date
• Business-oriented data is normally time- or
date-based by assigning a time stamp to
record the occurrence of each event, e.g.:
- Supermarket receipt
- Time of delivery or making order
- Stock exchange
• current_time() function in MySQL.
Possible research questions
• Is the revenue generated on Monday higher
than that on Tuesday?
DATE(expr)
• Business data is often recorded with a precision of seconds.
• DATE(expr) extracts the date part of a date or datetime
expression expr.
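A short sketch of DATE() on a literal and on the current timestamp; the aliases are illustrative names.

```sql
SELECT DATE('2015-03-16 23:45:59') AS date_part;   -- 2015-03-16
SELECT DATE(NOW()) AS today;                       -- the current date, same as CURDATE()
```

Stripping the time part this way makes it possible to group second-level records by day.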
extract information
Command                                   Result
select hour('2015-03-16 23:45:59');       23
select minute('2015-03-16 23:45:59');     45
select second('2015-03-16 23:45:59');     59
select day('2015-03-16 23:45:59');        16
select week('2015-03-16 23:45:59');       11
select month('2015-03-16 23:45:59');      3
select quarter('2015-03-16 23:45:59');    1
select year('2015-03-16 23:45:59');       2015
DAYNAME(date)
MONTHNAME(date)
• DAYNAME(date);
• MONTHNAME(date)
Possible research question
• In a week, when will consumers most likely
submit their complaints to CFPB?
- DayName
- Group by
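The two hints can be combined as in the sketch below. The table complaints_demo and its rows are hypothetical stand-ins for the CFPB data; only the column name Date_received is taken from the slides.

```sql
-- Hypothetical demo table standing in for the CFPB complaints data.
CREATE TEMPORARY TABLE complaints_demo (Date_received DATE);
INSERT INTO complaints_demo VALUES
  ('2017-06-12'), ('2017-06-15'), ('2017-06-15'), ('2017-06-22');

-- Count complaints per weekday name, busiest day first.
SELECT DAYNAME(Date_received) AS weekday_name,
       COUNT(*) AS complaints
FROM complaints_demo
GROUP BY weekday_name
ORDER BY complaints DESC;
-- Thursday 3, Monday 1
```

If the real column is stored as text rather than as a DATE, it would first need to be converted, for example with STR_TO_DATE.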
http://presemo.aalto.fi/drm/
weekday() vs. dayofweek()
• For weekday(): 0 = Monday, 1 = Tuesday, 2 = Wednesday,
3 = Thursday, 4 = Friday, 5 = Saturday, 6 = Sunday.
• For dayofweek(): 1 = Sunday, 2 = Monday, 3 = Tuesday,
4 = Wednesday, 5 = Thursday, 6 = Friday, 7 = Saturday.
Examples
• Select dayofweek("2017-06-15");
- Return : 5
• Select weekday("2017-06-15");
- Return : 3
• Select dayname("2017-06-15");
- Return: Thursday
Manipulation on date VS. number
DATE_ADD(date, INTERVAL expr unit)
Question
• A person was born on March 16, 1998. On what date
will this person have been living in this world for
10,000 days?
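One way to approach this question is DATE_ADD with a DAY interval, sketched below together with two other units. The aliases are illustrative, and the first result assumes that the birthday itself counts as day 0.

```sql
SELECT DATE_ADD('1998-03-16', INTERVAL 10000 DAY) AS day_10000;
-- 2025-08-01

SELECT DATE_ADD('2015-03-16 23:45:59', INTERVAL 1 HOUR) AS one_hour_later;
-- 2015-03-17 00:45:59

SELECT DATE_SUB('2015-03-16', INTERVAL 1 MONTH) AS one_month_earlier;
-- 2015-02-16
```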
DATEDIFF(expr1,expr2)
Question
• In the CFPB data, which company has the largest
average interval between Date_received and
Date_sent_to_company? Only companies with more
than 50 records in the data are considered.
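A possible shape of the answer is sketched below. The table name cfpb_complaints and the column name Company are assumptions; Date_received and Date_sent_to_company are taken from the question and are assumed to be stored as DATE values.

```sql
-- DATEDIFF(expr1, expr2) returns expr1 minus expr2 in days.
SELECT DATEDIFF('2015-03-20', '2015-03-16') AS days_between;   -- 4

-- Hypothetical table and Company column; see the assumptions above.
SELECT Company,
       COUNT(*) AS records,
       AVG(DATEDIFF(Date_sent_to_company, Date_received)) AS avg_interval_days
FROM cfpb_complaints
GROUP BY Company
HAVING records > 50            -- only companies with more than 50 records
ORDER BY avg_interval_days DESC
LIMIT 1;                       -- the company with the largest average interval
```

The condition on the record count goes into HAVING rather than WHERE because it refers to a value computed per group.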
Alter table
• Alter Table table_name Add column_name
datatype
• Alter Table table_name Drop column_name
Update table
• Update table_name
Set column_name1 = value|expression,
column_name2 = value|expression,
…
column_nameN = value|expression
Where conditions;
Table products
• Price_difference = (MSRP-buyPrice)
Please create a new column named Price_difference.
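The task combines the ALTER TABLE and UPDATE templates from the previous slides. The sketch below assumes that products has numeric columns MSRP and buyPrice; the data type DECIMAL(10,2) for the new column is an assumption.

```sql
-- Step 1: add the empty column (the data type is an assumed choice).
ALTER TABLE products ADD Price_difference DECIMAL(10,2);

-- Step 2: fill it. Without a WHERE clause, every row is updated
-- with its own MSRP - buyPrice.
UPDATE products
SET Price_difference = MSRP - buyPrice;
```

Clients that run with safe-update mode enabled may reject an UPDATE without a WHERE clause on a key column; in that case the mode has to be switched off for the session or a key-based condition added.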
Delete records from table
• Delete from table_name
[where conditions]
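Since DELETE cannot be undone outside a transaction, the sketch below works on a hypothetical temporary table, payments_demo, and previews the affected rows first. The table and its values are made up for illustration.

```sql
-- Hypothetical demo table, so that no course data is touched.
CREATE TEMPORARY TABLE payments_demo (customerNumber INT, amount DECIMAL(10,2));
INSERT INTO payments_demo VALUES (103, 6066.78), (103, 14571.44), (112, 32641.98);

-- Preview the rows that the condition matches before deleting them.
SELECT * FROM payments_demo WHERE amount < 10000;

-- Deletes only the matching row; without WHERE, all rows would be deleted.
DELETE FROM payments_demo WHERE amount < 10000;

SELECT COUNT(*) AS remaining FROM payments_demo;   -- 2
```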
Tables: customers, payments
Example
Select *
from customers
where customerNumber in
(select customerNumber
from payments
where amount > 100000)
Question
• Retrieve the payment information of the customers
who live in Spain and have a creditLimit of over 5000.
SELECT *
FROM payments
WHERE customerNumber IN (
    SELECT customerNumber
    FROM customers
    WHERE country = 'Spain' AND
          creditLimit > 5000
);
Sub-Queries (2.1)
• The result of a SELECT command can depend on
multiple columns of another table, e.g.:
SELECT attributes
FROM table_1
WHERE (attribute1, … , attributeN) IN | NOT IN
      (SELECT column1, … , columnN
       FROM table_2
       WHERE conditions)
Question
• Assume we have a table of undelivered products. How can we
calculate the revenue of the delivered products?
Table: undelivered_products
Select sum(quantityOrdered*priceEach)
FROM orderdetails
WHERE (orderNumber,productCode) NOT IN
(select orderNumber,productCode FROM
undelivered_products)
Attention! A common mistake!
We can select the maximum quantity ordered of each
product with the following code:
SELECT productCode, MAX(quantityOrdered)
FROM orderdetails
GROUP BY productCode;
But how do we retrieve the price at which each product
was sold in the order with its largest quantity ordered?
[Figure: raw data]
Answer to the question on the common
“group by” mistake
The command to extract, for each productCode, the row with the
maximum quantityOrdered, including its priceEach
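The command itself is not preserved in the text. One possible formulation, given only as a sketch and not necessarily the one shown on the slide, reuses the multi-column sub-query pattern from Sub-Queries (2.1) on the orderdetails table:

```sql
-- The sub-query finds the largest quantity per product; the outer query
-- keeps the full rows that match both the product and that quantity.
SELECT productCode, quantityOrdered, priceEach
FROM orderdetails
WHERE (productCode, quantityOrdered) IN
      (SELECT productCode, MAX(quantityOrdered)
       FROM orderdetails
       GROUP BY productCode);
```

Simply adding priceEach to the SELECT list of the grouped query would not work, because the price is not tied to the row that holds the maximum. If several orders share the maximum quantity for a product, this query returns one row for each of them.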