Session 5 BIZ


The lecture will start soon at

08:30

https://unsplash.com/photos/gm3bxHin8VA
MySQL for Data Analytics

Lecturer: Yong Liu


Contact me at: Yong.liu@aalto.fi
Content for class 5
- Union
- Group by
- Handling NULL values
- Working with dates
- Sub-queries
- Altering and updating tables
- Deleting records

Sample Question 1
• You have been asked to analyze a data file containing your company's sales information before 2022. Recently, you received a complementary data file with the sales information for 2022. Both files have the same data structure. You want to merge the two tables into one table. How would you do that?

27.09.2023
UNION [all]
• The UNION operator combines the result sets of two or more SELECT statements into a single result set.
- Each SELECT statement within UNION must have the same number of columns.
- The corresponding columns should have compatible data types.
- The column names of the result are taken from the first SELECT statement.
Solution

select * from sale_before_2022
union
select * from sale_in_2022;

• UNION removes duplicate rows by default.
• UNION ALL keeps duplicate rows.
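The difference between UNION and UNION ALL can be checked with a minimal sketch. It uses Python's built-in sqlite3 module with an in-memory SQLite database as a stand-in for MySQL (the UNION syntax used here is the same in both); the two tables and all rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL; all rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sale_before_2022 (orderNumber INTEGER, amount REAL);
    CREATE TABLE sale_in_2022     (orderNumber INTEGER, amount REAL);
    INSERT INTO sale_before_2022 VALUES (1, 100.0), (2, 250.0);
    -- order 2 was exported into both files, so it is a duplicated row
    INSERT INTO sale_in_2022     VALUES (2, 250.0), (3, 80.0);
""")

union_rows = conn.execute(
    "SELECT * FROM sale_before_2022 UNION SELECT * FROM sale_in_2022 "
    "ORDER BY orderNumber").fetchall()
union_all_rows = conn.execute(
    "SELECT * FROM sale_before_2022 UNION ALL SELECT * FROM sale_in_2022 "
    "ORDER BY orderNumber").fetchall()

print(union_rows)      # [(1, 100.0), (2, 250.0), (3, 80.0)]
print(union_all_rows)  # [(1, 100.0), (2, 250.0), (2, 250.0), (3, 80.0)]
```

The duplicated order 2 appears once with UNION and twice with UNION ALL.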
Question 2

Table customers

Table employees
Question 2
• Your boss asks you to provide the contact details of both
customers and employees in the USA, including their
names and contacts (e.g., phone number or email).
• Assuming that your company is located in the USA, all employees should be on the list. Customers, however, may come from other countries, so they have to be filtered by country.
Solution
select contactLastName, contactFirstName, phone as contact
from customers
where country = 'USA'
union
select lastName, firstName, email
from employees;

What would happen if we used UNION ALL instead?
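Adding a fixed value to a select
(select contactLastName, contactFirstName,
        phone as contact, 'customer' as category
 from customers where country = 'USA' limit 5)
union
(select lastName, firstName, email,
        'employee' as category
 from employees limit 10);
-- Each LIMIT sits inside its own parentheses, so it applies only to that SELECT.
-- Without the second pair of parentheses, LIMIT 10 would apply to the whole UNION result.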
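A minimal sketch of the fixed-value idea, again using Python's sqlite3 module as a stand-in for MySQL, with hypothetical rows and a hypothetical example.com address. The per-SELECT LIMIT is left out to keep the sketch portable; it only shows the constant category column and that the result takes its column names from the first SELECT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (contactLastName TEXT, contactFirstName TEXT,
                            phone TEXT, country TEXT);
    CREATE TABLE employees (lastName TEXT, firstName TEXT, email TEXT);
    INSERT INTO customers VALUES ('King', 'Jean', '555-0101', 'USA'),
                                 ('Dupont', 'Anne', '40.32.2555', 'France');
    INSERT INTO employees VALUES ('Murphy', 'Diane', 'dmurphy@example.com');
""")

cur = conn.execute("""
    SELECT contactLastName, contactFirstName, phone AS contact,
           'customer' AS category           -- the same fixed value on every row
    FROM customers WHERE country = 'USA'
    UNION
    SELECT lastName, firstName, email, 'employee' AS category
    FROM employees
    ORDER BY category
""")
rows = cur.fetchall()
# The result's column names come from the first SELECT.
column_names = [description[0] for description in cur.description]

print(column_names)  # ['contactLastName', 'contactFirstName', 'contact', 'category']
print(rows)
```

The category column makes it possible to tell customers and employees apart after the two lists have been merged.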
Select…Group by…

• Without aggregate functions, GROUP BY behaves like DISTINCT: it returns one row for each unique combination of the specified columns.
• These two queries return the same rows (possibly in a different order):
select distinct productCode, priceEach
from orderdetails;

select productCode, priceEach
from orderdetails group by productCode, priceEach;
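The equivalence can be checked on a few hypothetical order lines. This sketch uses Python's sqlite3 module as a stand-in for MySQL and compares the two results as sets, because neither query guarantees a row order without ORDER BY.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL; the order lines are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                               quantityOrdered INTEGER, priceEach REAL);
    INSERT INTO orderdetails VALUES
        (10100, 'S18_1749', 30, 136.00),
        (10101, 'S18_1749', 25, 136.00),   -- same product, same price
        (10102, 'S18_1749', 41, 120.50),   -- same product, different price
        (10100, 'S18_2248', 50,  55.09);
""")

distinct_rows = set(conn.execute(
    "SELECT DISTINCT productCode, priceEach FROM orderdetails"))
grouped_rows = set(conn.execute(
    "SELECT productCode, priceEach FROM orderdetails "
    "GROUP BY productCode, priceEach"))

# Compared as sets, because without ORDER BY neither query fixes the row order.
print(distinct_rows == grouped_rows)  # True
print(len(distinct_rows))             # 3
```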
GROUP BY (Aggregate) Functions
Name               Description
AVG()              Return the average value of the argument
COUNT()            Return the number of rows (COUNT(*)) or of non-NULL values (COUNT(expr))
COUNT(DISTINCT)    Return the number of different non-NULL values
GROUP_CONCAT()     Return a concatenated string
MAX()              Return the maximum value
MIN()              Return the minimum value
STD()              Return the population standard deviation
SUM()              Return the sum
VARIANCE()         Return the population variance
Group by (II)

Table name: book_mast

SELECT cate_id, MAX(book_price)
FROM book_mast
GROUP BY cate_id;
Reference: https://www.w3resource.com/mysql/aggregate-functions-and-grouping/aggregate-functions-and-grouping-group_concat.php
Table orderdetails
This table lists the products included in each sales order.
• In one purchase (a sales order), a customer may buy several products.
• The same product can be sold at different prices in different sales orders.
Attention! A common mistake!
How do we retrieve the price of the order line with the largest quantity ordered for each product?
[Figure: raw data of table orderdetails]
If we can select the maximum quantity ordered of each product with the following query…
SELECT productCode, MAX(quantityOrdered)
FROM orderdetails
GROUP BY productCode;
… what would be the output of the following query?
SELECT productCode, MAX(quantityOrdered), priceEach
FROM orderdetails
GROUP BY productCode;
AVG() + group by
Revenue generated by a product when ordered = quantityOrdered × priceEach
What is the average revenue generated by each product when ordered?
Table orderdetails
Improving the presentation of the result?
SELECT productCode, AVG(priceEach*quantityOrdered)
FROM orderdetails
GROUP BY productCode;
Query improvement
Original query:
SELECT productCode, AVG(priceEach*quantityOrdered)
FROM orderdetails
GROUP BY productCode;

Round the average to two decimals:
SELECT productCode, ROUND(AVG(priceEach*quantityOrdered), 2)
FROM orderdetails
GROUP BY productCode;

Give the column a readable alias and sort by it:
SELECT productCode,
       ROUND(AVG(priceEach*quantityOrdered), 2) AS avg_revenue
FROM orderdetails
GROUP BY productCode
ORDER BY avg_revenue DESC;
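The final query can be tried on hypothetical order lines with Python's sqlite3 module (a stand-in for MySQL; ROUND, AVG, column aliases and ORDER BY on an alias work the same way for this query). The comments show each line's revenue so the averages can be checked by hand.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                               quantityOrdered INTEGER, priceEach REAL);
    INSERT INTO orderdetails VALUES
        (10100, 'S18_1749', 30, 136.00),   -- revenue 4080
        (10101, 'S18_1749', 25, 120.00),   -- revenue 3000
        (10100, 'S18_2248', 50,  55.00),   -- revenue 2750
        (10102, 'S18_2248', 10,  60.00),   -- revenue  600
        (10101, 'S24_1444', 10,  10.00),   -- revenue  100
        (10102, 'S24_1444', 10,  10.00),   -- revenue  100
        (10103, 'S24_1444',  1, 101.00);   -- revenue  101
""")

rows = conn.execute("""
    SELECT productCode,
           ROUND(AVG(priceEach * quantityOrdered), 2) AS avg_revenue
    FROM orderdetails
    GROUP BY productCode
    ORDER BY avg_revenue DESC
""").fetchall()
print(rows)  # [('S18_1749', 3540.0), ('S18_2248', 1675.0), ('S24_1444', 100.33)]
```

Without ROUND, the last average would be printed as 100.333…, which is the kind of output the rounding step is meant to tidy up.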
Count() + group by
• What is the product
that has been most
often purchased by
customers?

Table orderdetails
Answer
• select productCode, count(*) as frequency
from orderdetails group by productCode
order by frequency desc

You can use column positions instead of column names
select productCode, count(*) as frequency
from orderdetails group by productCode
order by frequency desc
… is equivalent to …
select productCode, count(*) as frequency
from orderdetails group by 1 order by 2 desc
(1 refers to the first column in the SELECT list, 2 to the second.)
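A minimal check that the positional form returns the same result, using Python's sqlite3 module as a stand-in for MySQL (SQLite also accepts column positions in GROUP BY and ORDER BY). The order lines are hypothetical and every product has a different frequency, so the sort order is fully determined.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT);
    INSERT INTO orderdetails VALUES
        (10100, 'S18_1749'), (10101, 'S18_1749'), (10102, 'S18_1749'),
        (10100, 'S18_2248'), (10103, 'S18_2248'),
        (10104, 'S24_1444');
""")

# Grouping and sorting by column names ...
by_name = conn.execute(
    "SELECT productCode, COUNT(*) AS frequency FROM orderdetails "
    "GROUP BY productCode ORDER BY frequency DESC").fetchall()
# ... and by column positions (1 = productCode, 2 = frequency).
by_position = conn.execute(
    "SELECT productCode, COUNT(*) AS frequency FROM orderdetails "
    "GROUP BY 1 ORDER BY 2 DESC").fetchall()

print(by_name)                 # [('S18_1749', 3), ('S18_2248', 2), ('S24_1444', 1)]
print(by_name == by_position)  # True
```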
Count(Distinct) + Group by
A product can be sold at different prices in different sales orders.
• Which product has been sold at the largest number of different prices?
Table orderdetails
Answer
select productCode, count(distinct priceEach)
as frequency
from orderdetails
group by productCode
order by frequency desc
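A minimal sketch contrasting COUNT(*) with COUNT(DISTINCT priceEach), run with Python's sqlite3 module as a stand-in for MySQL on hypothetical order lines. COUNT(*) counts every order line, while COUNT(DISTINCT …) counts each price only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                               priceEach REAL);
    INSERT INTO orderdetails VALUES
        (10100, 'S18_1749', 136.00), (10101, 'S18_1749', 136.00),
        (10102, 'S18_1749', 120.50), (10103, 'S18_1749',  99.90),
        (10100, 'S18_2248',  55.09), (10104, 'S18_2248',  55.09);
""")

rows = conn.execute("""
    SELECT productCode,
           COUNT(*) AS n_rows,                     -- every order line
           COUNT(DISTINCT priceEach) AS frequency  -- different prices only
    FROM orderdetails
    GROUP BY productCode
    ORDER BY frequency DESC
""").fetchall()
print(rows)  # [('S18_1749', 4, 3), ('S18_2248', 2, 1)]
```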

group_concat() + Group by
• This function returns a string with the concatenated non-NULL values from a group. By default, the values are separated by commas.
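An example
• select group_concat(contactFirstName, contactLastName) from customers
Without GROUP BY, all customers form a single group, so the query returns one string. For each customer, the first and last names are joined without a space. The result is truncated to group_concat_max_len characters (1024 by default).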
Group_concat
select country, city,
       group_concat(' ', contactFirstName, ' ', contactLastName) as contact_list
from customers
group by country, city
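To make the grouping step visible, GROUP_CONCAT can be roughly emulated in plain Python. This is an illustrative emulation, not MySQL itself: the function group_concat_contacts and the sample rows are hypothetical, and MySQL does not guarantee the order of the values inside a group unless GROUP_CONCAT(… ORDER BY …) is used.

```python
from itertools import groupby

# Hypothetical customer rows: (country, city, contactFirstName, contactLastName)
customers = [
    ("USA", "NYC", "Cara", "Diaz"),
    ("France", "Nantes", "Anna", "Berg"),
    ("France", "Nantes", "Ben", "Cole"),
]

def group_concat_contacts(rows, separator=","):
    """Rough emulation of
    GROUP_CONCAT(' ', contactFirstName, ' ', contactLastName) ... GROUP BY country, city.
    """
    def group_key(row):
        return (row[0], row[1])  # (country, city)

    result = {}
    # groupby() only merges adjacent rows, so sort by the grouping key first.
    for key, group in groupby(sorted(rows, key=group_key), key=group_key):
        # Build ' first last' for each row, then join the rows with the separator
        # (a comma, which is also MySQL's default separator).
        result[key] = separator.join(" " + first + " " + last
                                     for _, _, first, last in group)
    return result

contact_list = group_concat_contacts(customers)
print(contact_list[("France", "Nantes")])  # " Anna Berg, Ben Cole"
print(contact_list[("USA", "NYC")])        # " Cara Diaz"
```

The leading ' ' in the concatenated expression is what makes the comma-separated list read as ", ".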

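You find duplicated records mistakenly appearing in the table below (payments). How can you obtain a clean table in which each payment appears only once?

select * from payments
group by customerNumber, checkNumber, paymentDate, amount

Grouping by customerNumber, checkNumber, paymentDate and amount returns one row for each distinct combination of these columns, so each duplicated payment appears only once.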
Select…Group by + having
• WHERE and HAVING behave similarly if GROUP BY is not included in the command.

select priceEach from orderdetails where priceEach > 200;

select priceEach from orderdetails having priceEach > 200;

The two queries produce the same results.
Where versus Having
select priceEach as p from orderdetails where p > 200;
-- Doesn't work! (Error: Unknown column 'p' in 'where clause')

select priceEach as p from orderdetails having p > 200;
-- It works!

• WHERE is applied before SELECT and GROUP BY, so it cannot use aliases defined in the SELECT list; HAVING is applied after them, so it can.
Example
select orderNumber, count(*) as freq
from orderdetails
where priceEach > 50
group by orderNumber having freq > 10;
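A minimal sketch of how WHERE and HAVING filter at different stages, using Python's sqlite3 module as a stand-in for MySQL. The data are hypothetical and the threshold is lowered from 10 to 2 to fit the small sample; HAVING is written with COUNT(*) so the sketch does not depend on alias support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orderdetails (orderNumber INTEGER, priceEach REAL)")
# Hypothetical lines: both orders have 3 lines, but only order 1 has 3 lines priced over 50.
conn.executemany("INSERT INTO orderdetails VALUES (?, ?)", [
    (1, 60.0), (1, 75.0), (1, 99.0),
    (2, 10.0), (2, 20.0), (2, 80.0),
])

with_where = conn.execute("""
    SELECT orderNumber, COUNT(*) AS freq
    FROM orderdetails
    WHERE priceEach > 50        -- drops single rows before grouping
    GROUP BY orderNumber
    HAVING COUNT(*) > 2         -- drops whole groups after counting (MySQL also accepts freq > 2)
    ORDER BY orderNumber
""").fetchall()

without_where = conn.execute("""
    SELECT orderNumber, COUNT(*) AS freq
    FROM orderdetails
    GROUP BY orderNumber
    HAVING COUNT(*) > 2
    ORDER BY orderNumber
""").fetchall()

print(with_where)     # [(1, 3)]
print(without_where)  # [(1, 3), (2, 3)]
```

Order 2 survives the HAVING filter only when its cheap lines are not removed by WHERE first.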

Order of keyword evaluation
- Select command template (a simple version)
Select (columns or computed new columns)
From (table[s])
Where (conditions)
Group by (columns or computed new columns)
Having (conditions, e.g. based on computed new columns such as a count)
Order by (columns or computed new columns)
Limit (number)
- Sequence of operations (simplified):
From → Where → Group by → Select → Having → Order by → Limit
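NULL value vs. empty value
• NULL means that no value is stored: the value is missing or unknown.
• An empty value ('') is a string of length zero: a value that is known to be empty.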
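Manipulation on NULL value
• Selecting empty values (or values consisting only of spaces):
Select * from peoplenames where Middel_Name = '';
Select * from peoplenames where Middel_Name != '';   -- non-empty values
• Selecting NULL values:
Select * from peoplenames where Middel_Name is null;
Select * from peoplenames where Middel_Name is not null;
• Comparisons such as = '' or != '' never match NULL; use IS NULL or IS NOT NULL instead.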
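A minimal sketch of the difference between NULL and an empty string, using Python's sqlite3 module as a stand-in for MySQL. The table contents are hypothetical, and First_Name is an assumed column added only to identify the rows; Middel_Name follows the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; First_Name is an assumed column, Middel_Name follows the slide.
conn.execute("CREATE TABLE peoplenames (First_Name TEXT, Middel_Name TEXT)")
conn.executemany("INSERT INTO peoplenames VALUES (?, ?)", [
    ("Anna", "Maria"),  # has a middle name
    ("Ben", ""),        # empty string: a known value of length zero
    ("Carl", None),     # None is stored as NULL: no value at all
])

def names(condition):
    # The conditions below are fixed strings; never format user input into SQL.
    sql = f"SELECT First_Name FROM peoplenames WHERE {condition} ORDER BY First_Name"
    return [row[0] for row in conn.execute(sql)]

empty = names("Middel_Name = ''")
not_empty = names("Middel_Name != ''")     # Carl (NULL) is not returned here either
missing = names("Middel_Name IS NULL")
present = names("Middel_Name IS NOT NULL")
print(empty, not_empty, missing, present)  # ['Ben'] ['Anna'] ['Carl'] ['Anna', 'Ben']
```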
Manipulation on date
• Business data is usually time- or date-based: each event is recorded with a timestamp, e.g.:
- Supermarket receipts
- Time of delivery or of placing an order
- Stock exchange transactions
• MySQL provides functions such as NOW(), CURDATE() and CURRENT_TIME() to return the current date and/or time.
Possible research questions
• Is the revenue generated on Mondays higher than that on Tuesdays?
• Did consumers complain more often on weekends than on weekdays?
DATE(expr)
• Business data is often recorded with a precision of seconds.
• DATE(expr) extracts the date part of a date or datetime expression expr, e.g. DATE('2015-03-16 23:45:59') returns 2015-03-16.
Extracting information
Command                                    Result
select hour('2015-03-16 23:45:59');        23
select minute('2015-03-16 23:45:59');      45
select second('2015-03-16 23:45:59');      59
select day('2015-03-16 23:45:59');         16
select week('2015-03-16 23:45:59');        11
select month('2015-03-16 23:45:59');       3
select quarter('2015-03-16 23:45:59');     1
select year('2015-03-16 23:45:59');        2015
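DAYNAME(date)
MONTHNAME(date)
• DAYNAME(date) returns the name of the weekday, e.g. DAYNAME('2015-03-16') returns Monday.
• MONTHNAME(date) returns the full name of the month, e.g. MONTHNAME('2015-03-16') returns March.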
Possible research question
• On which day of the week are consumers most likely to submit their complaints to the CFPB?
- DayName
- Group by
http://presemo.aalto.fi/drm/
Column: Data_received    Table name: Tablex
Answer
select dayname(Data_received),
count(*) as freq
from tablex
group by dayname(Data_received)
order by freq desc

weekday() vs. dayofweek()
• For weekday(): 0 = Monday, 1 = Tuesday, 2 = Wednesday, 3 = Thursday, 4 = Friday, 5 = Saturday, 6 = Sunday.
• For dayofweek(): 1 = Sunday, 2 = Monday, 3 = Tuesday, 4 = Wednesday, 5 = Thursday, 6 = Friday, 7 = Saturday.

Examples
• Select dayofweek('2017-06-15');
- Returns: 5
• Select weekday('2017-06-15');
- Returns: 3
• Select dayname('2017-06-15');
- Returns: Thursday
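The two numbering schemes can be reproduced with Python's datetime module. This is a sketch with hypothetical helper functions named after the MySQL functions; Python's date.weekday() happens to use the same numbering as MySQL's WEEKDAY(), and the day names are hard-coded in English so the result does not depend on the locale.

```python
from datetime import date

# English day names indexed like Python's date.weekday() (0 = Monday).
DAY_NAMES = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]

def weekday(d: date) -> int:
    """Like MySQL WEEKDAY(): 0 = Monday ... 6 = Sunday."""
    return d.weekday()

def dayofweek(d: date) -> int:
    """Like MySQL DAYOFWEEK(): 1 = Sunday, 2 = Monday ... 7 = Saturday."""
    return (d.weekday() + 1) % 7 + 1

def dayname(d: date) -> str:
    """Like MySQL DAYNAME() with English day names."""
    return DAY_NAMES[d.weekday()]

d = date(2017, 6, 15)
print(dayofweek(d), weekday(d), dayname(d))  # 5 3 Thursday
```

The shift (weekday + 1) % 7 + 1 moves Sunday from position 6 to position 1 and every other day one step up.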
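Manipulation on date vs. number
• select '2008-01-02' - 1
• select '2008-01-02' - '2007-11-01'
Normal mathematical calculations like + and - cannot be applied to dates directly. In a numeric context, MySQL converts each string to a number using only its leading digits (2008 and 2007), so these queries return 2007 and 1 instead of a date or a number of days. Use DATE_ADD() and DATEDIFF() for date arithmetic.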
DATE_ADD(date, INTERVAL expr unit)

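• DATE_ADD() is a synonym for ADDDATE() (when used with an INTERVAL argument).
• It adds a time interval to a date, e.g. DATE_ADD('2015-03-16', INTERVAL 1 MONTH) returns 2015-04-16.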
Question
• A person was born on March 16, 1998. On what date will the person have been alive for 10,000 days?

Select date_add('1998-03-16', interval 10000 day)
→ 2025-08-01
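The answer can be cross-checked outside the database with Python's datetime module; adding a timedelta of 10,000 days plays the role of DATE_ADD(… INTERVAL 10000 DAY) in this sketch.

```python
from datetime import date, timedelta

birth = date(1998, 3, 16)
# Python counterpart of DATE_ADD('1998-03-16', INTERVAL 10000 DAY)
day_10000 = birth + timedelta(days=10000)
print(day_10000.isoformat())  # 2025-08-01
```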
DATEDIFF(expr1,expr2)
• DATEDIFF() returns expr1 – expr2 expressed
as a value in days from one date to the other.
expr1 and expr2 are date or date-and-time
expressions.
• Only the date parts of the values are used in the
calculation.
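A minimal Python sketch of the DATEDIFF() rule: the hypothetical helper datediff subtracts only the date parts, so the time of day is ignored. The two test dates mirror examples used in the MySQL manual.

```python
from datetime import datetime

def datediff(expr1: datetime, expr2: datetime) -> int:
    """Sketch of MySQL DATEDIFF(expr1, expr2): expr1 - expr2 in days.

    Only the date parts are used, so the time of day is ignored.
    """
    return (expr1.date() - expr2.date()).days

print(datediff(datetime(2007, 12, 31, 23, 59, 59), datetime(2007, 12, 30)))  # 1
print(datediff(datetime(2010, 11, 30, 23, 59, 59), datetime(2010, 12, 31)))  # -31
```

The first result is 1, not almost 2, because 23:59:59 is dropped before subtracting.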

Question
• In the CFPB data, which company has the largest average interval between Date_received and Date_sent_to_company? Only companies with more than 50 records in the data are considered.

Table name: Tablex
Answer
SELECT company,
avg(DATEDIFF(Data_sent_to_company, Data_received)) AS diff,
COUNT(*) AS freq
FROM tablex
GROUP BY company
HAVING freq > 50
ORDER BY diff DESC

Alter table
• Alter Table table_name Add column_name datatype;
• Alter Table table_name Drop column_name;
Update table
• Update table_name
Set column_name1 = value|expression,
column_name2 = value|expression,
…
column_nameN = value|expression
Where conditions;
Table products

• Price_difference = (MSRP - buyPrice)
Please create a new column of Price_difference

Alter table products add Price_difference decimal(10,2);

-- fill the new column for all rows:
Update products set Price_difference = (MSRP - buyPrice);

-- or only for the rows matching a condition:
Update products set Price_difference = (MSRP - buyPrice)
Where productName like '1996 Moto Guzzi%';
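A minimal sketch of the ALTER + UPDATE sequence, using Python's sqlite3 module as a stand-in for MySQL. The prices are hypothetical, and the column type is REAL here instead of the slide's DECIMAL(10,2).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical prices; SQLite stands in for MySQL here.
conn.executescript("""
    CREATE TABLE products (productName TEXT, MSRP REAL, buyPrice REAL);
    INSERT INTO products VALUES ('1996 Moto Guzzi 1100i', 120.00, 70.00),
                                ('1969 Harley Davidson Ultimate Chopper', 95.00, 48.00);
""")

# ALTER TABLE adds the column; every existing row gets NULL in it.
conn.execute("ALTER TABLE products ADD Price_difference REAL")  # slide uses DECIMAL(10,2)

# UPDATE with a WHERE condition changes only the matching rows.
cur = conn.execute("""
    UPDATE products SET Price_difference = (MSRP - buyPrice)
    WHERE productName LIKE '1996 Moto Guzzi%'
""")
conn.commit()
updated_count = cur.rowcount

rows = conn.execute(
    "SELECT productName, Price_difference FROM products ORDER BY productName").fetchall()
print(updated_count, rows)
# 1 [('1969 Harley Davidson Ultimate Chopper', None), ('1996 Moto Guzzi 1100i', 50.0)]
```

The row that did not match the WHERE condition keeps NULL in the new column.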
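Tips
• Update versus Select
- SELECT only presents a new result, while UPDATE actually changes and saves the data.

• Undo the last action? It does not work! MySQL runs with autocommit enabled by default, so an executed UPDATE is saved immediately. Test the WHERE condition with a SELECT first.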
Delete records from table
• Delete from table_name
[where conditions]

• Delete from table_name
- This command removes all the records from the table (output: an empty table).
Example
• In table customers, some values of the 'salesRepEmployeeNumber' column are NULL.
select * from customers
where salesRepEmployeeNumber is null

Delete from customers
where salesRepEmployeeNumber is null
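A minimal sketch of the preview-then-delete habit, using Python's sqlite3 module as a stand-in for MySQL on hypothetical customers. It also shows that comparing with = NULL matches nothing, so IS NULL is required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerNumber INTEGER, salesRepEmployeeNumber INTEGER);
    INSERT INTO customers VALUES (103, 1370), (112, 1166), (125, NULL), (168, NULL);
""")

# Preview with SELECT first: the same WHERE shows exactly which rows DELETE removes.
to_delete = conn.execute(
    "SELECT customerNumber FROM customers "
    "WHERE salesRepEmployeeNumber IS NULL ORDER BY customerNumber").fetchall()

# '= NULL' is never true, so this deletes nothing.
wrong_count = conn.execute(
    "DELETE FROM customers WHERE salesRepEmployeeNumber = NULL").rowcount

cur = conn.execute("DELETE FROM customers WHERE salesRepEmployeeNumber IS NULL")
conn.commit()
deleted_count = cur.rowcount

remaining = conn.execute(
    "SELECT customerNumber FROM customers ORDER BY customerNumber").fetchall()
print(to_delete, wrong_count, deleted_count, remaining)
# [(125,), (168,)] 0 2 [(103,), (112,)]
```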
Sub-Queries (1)
• Template 1:
Create table TB_name as
(Select attributes
from table or view
[Where conditions]
[Group by attributes [Having condition]]
[Order by attributes [asc | desc]]
[Limit])
Sub-Queries (2)
• If the result of the select command is based
on one column of another table. E.g.:
Select attributes
from table_1
Where attributes IN| NOT IN
(Select ONE_column
from table_2
Where attributes )
Example
• Please provide the contact information of
customers who made a payment over
100,000 Euro.

Tables: customers, payments
Example
Select *
from customers
where customerNumber in
(select customerNumber
from payments
where amount > 100000)
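A minimal sketch of the IN sub-query, run with Python's sqlite3 module as a stand-in for MySQL. The customers and payments are hypothetical; customer 1 has two qualifying payments but is returned only once, because IN tests membership rather than joining rows, and customer 3's payment of exactly 100000 does not satisfy amount > 100000.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customerNumber INTEGER, customerName TEXT, phone TEXT);
    CREATE TABLE payments  (customerNumber INTEGER, checkNumber TEXT, amount REAL);
    INSERT INTO customers VALUES (1, 'Alpha Ltd', '555-0001'),
                                 (2, 'Beta Oy',   '555-0002'),
                                 (3, 'Gamma AB',  '555-0003');
    -- customer 1 has two large payments; customer 3 pays exactly 100000
    INSERT INTO payments VALUES (1, 'A1', 120000.00), (1, 'A2', 150000.00),
                                (2, 'B1',  99999.99), (3, 'C1', 100000.00);
""")

rows = conn.execute("""
    SELECT * FROM customers
    WHERE customerNumber IN (SELECT customerNumber FROM payments
                             WHERE amount > 100000)
    ORDER BY customerNumber
""").fetchall()
print(rows)  # [(1, 'Alpha Ltd', '555-0001')]
```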
Question
• Retrieve the payment information of the customers who live in Spain and have a creditLimit of over 5000.

Tables: customers, payments
Answer
SELECT *
FROM payments
WHERE customerNumber IN (
    SELECT customerNumber
    FROM customers
    WHERE country = 'Spain' AND
          creditLimit > 5000
);
Sub-Queries (2.1)
• If the result of the select command is based
on multiple columns of another table. E.g.:
Select attributes
from table_1
Where (attribute1, … , attributeN) IN| NOT IN
(Select column1, … , columnN
from table_2
Where attributes )
Question
• Assume we have a table of undelivered products. How can we calculate the revenue of the delivered products?

Table: undelivered_products
Table: orderdetails
Reference: http://stackoverflow.com/questions/8435107/mysql-where-not-in-using-two-columns

Select sum(quantityOrdered*priceEach)
FROM orderdetails
WHERE (orderNumber, productCode) NOT IN
      (select orderNumber, productCode
       FROM undelivered_products)
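Why the two columns must be compared as a pair can be seen in a small sketch, run with Python's sqlite3 module as a stand-in for MySQL on hypothetical order lines. Testing each column with its own NOT IN wrongly drops a delivered line whose orderNumber happens to appear among the undelivered ones. As a general caution, if the compared sub-query columns can contain NULL, NOT IN may return no rows at all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                               quantityOrdered INTEGER, priceEach REAL);
    CREATE TABLE undelivered_products (orderNumber INTEGER, productCode TEXT);
    INSERT INTO orderdetails VALUES (10100, 'S18_1749', 10, 100.0),
                                    (10100, 'S18_2248',  5,  20.0),
                                    (10101, 'S18_1749',  2, 100.0);
    -- only this one order line is still undelivered
    INSERT INTO undelivered_products VALUES (10100, 'S18_2248');
""")

# Correct: exclude an order line only if the (orderNumber, productCode) PAIR matches.
(pair_revenue,) = conn.execute("""
    SELECT SUM(quantityOrdered * priceEach) FROM orderdetails
    WHERE (orderNumber, productCode) NOT IN
          (SELECT orderNumber, productCode FROM undelivered_products)
""").fetchone()

# Wrong: testing each column separately also drops (10100, 'S18_1749'),
# because its orderNumber appears in undelivered_products.
(separate_revenue,) = conn.execute("""
    SELECT SUM(quantityOrdered * priceEach) FROM orderdetails
    WHERE orderNumber NOT IN (SELECT orderNumber FROM undelivered_products)
      AND productCode NOT IN (SELECT productCode FROM undelivered_products)
""").fetchone()

print(pair_revenue, separate_revenue)  # 1200.0 200.0
```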
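Answer to the question on the common mistake with "group by"
In SELECT productCode, MAX(quantityOrdered), priceEach FROM orderdetails GROUP BY productCode, the column priceEach is neither grouped nor aggregated. MySQL rejects such a query when the ONLY_FULL_GROUP_BY SQL mode is enabled (the default since MySQL 5.7.5); when the mode is disabled, priceEach may come from any row of each group, not necessarily the row with the maximum quantityOrdered.
The command to extract the rows (including priceEach) with the maximum quantityOrdered for each productCode:

Select * from orderdetails
WHERE (productCode, quantityOrdered) IN
(SELECT productCode, MAX(quantityOrdered)
FROM orderdetails GROUP BY productCode);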
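The correct query can be tried with Python's sqlite3 module as a stand-in for MySQL, on hypothetical order lines. The sketch does not reproduce the mistaken query, because SQLite has a special rule that takes bare columns from the row holding the MAX(), which would hide the problem. Note that when several rows tie for the maximum, the row-value IN returns all of them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                               quantityOrdered INTEGER, priceEach REAL);
    INSERT INTO orderdetails VALUES
        (10100, 'S18_1749', 30, 136.00),
        (10101, 'S18_1749', 50, 120.00),   -- largest quantity for S18_1749
        (10100, 'S18_2248', 50,  55.00),   -- two rows tie for the largest
        (10102, 'S18_2248', 50,  60.00),   -- quantity of S18_2248
        (10103, 'S18_2248', 10,  58.00);
""")

rows = conn.execute("""
    SELECT productCode, quantityOrdered, priceEach FROM orderdetails
    WHERE (productCode, quantityOrdered) IN
          (SELECT productCode, MAX(quantityOrdered)
           FROM orderdetails GROUP BY productCode)
    ORDER BY productCode, priceEach
""").fetchall()
print(rows)
# [('S18_1749', 50, 120.0), ('S18_2248', 50, 55.0), ('S18_2248', 50, 60.0)]
```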
Think and solution
How do we select the records with the second largest quantityOrdered for each productCode?

Solution: if the rows with the largest quantityOrdered for each productCode are removed from the table (or excluded with NOT IN), the second largest value becomes the largest one.
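Following that idea, this sketch excludes each product's maximum with NOT IN and then takes the maximum of what remains. It runs on hypothetical data in SQLite through Python's sqlite3 module (a stand-in for MySQL). It returns the second largest distinct quantity: if two rows tie for the maximum, both are excluded first, and a product with only one distinct quantity drops out of the result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orderdetails (orderNumber INTEGER, productCode TEXT,
                               quantityOrdered INTEGER, priceEach REAL);
    INSERT INTO orderdetails VALUES
        (10100, 'S18_1749', 30, 136.00),
        (10101, 'S18_1749', 50, 120.00),
        (10104, 'S18_1749', 20, 130.00),
        (10100, 'S18_2248', 50,  55.00),
        (10102, 'S18_2248', 50,  60.00),
        (10103, 'S18_2248', 10,  58.00),
        (10105, 'S24_1444',  5,  10.00);   -- only one quantity value
""")

rows = conn.execute("""
    SELECT * FROM orderdetails
    WHERE (productCode, quantityOrdered) IN (
        -- step 2: the largest of the remaining quantities per product
        SELECT productCode, MAX(quantityOrdered)
        FROM orderdetails
        WHERE (productCode, quantityOrdered) NOT IN (
            -- step 1: each product's largest quantity, to be excluded
            SELECT productCode, MAX(quantityOrdered)
            FROM orderdetails GROUP BY productCode)
        GROUP BY productCode)
    ORDER BY productCode
""").fetchall()
print(rows)
# [(10100, 'S18_1749', 30, 136.0), (10103, 'S18_2248', 10, 58.0)]
```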
Change date format
• select date_format('2015-03-16', '%m.%d.%y');
Result: 03.16.15
• select date_format('2015-03-16', '%m-%d-%y');
Result: 03-16-15
• select date_format('2015-03-16', '%y-%m-%d');
Result: 15-03-16
• select date_format('2015-03-16', '%Y-%M-%D');
Result: 2015-March-16th
STR_TO_DATE()
• This is the inverse of the DATE_FORMAT() function. It takes a string str and a format string format and returns a date (or datetime/time) value, e.g.:
select str_to_date('03.16.15', '%m.%d.%y');
Result: 2015-03-16
