DataAnalyticEngineer Test
DataAnalyticEngineer Test
1/ [SQL]
Assuming that your company is running a loan marketplace where people who want to borrow
money are matched with appropriate loan products provided by different banks, the data
schemas are shown below:
Banks
Column name Data type Notes
Products
Column name Data type Notes
interest_rate float
Customers
marketplace
created_date DateTime
Leads
Column name Data type
Notes
customer_id int
product_id int
apply_date DateTime
A customer with an estimated risk level X will only be matched with a product that accepts
risk level X.
Based on the above data tables, please write SQL queries to:
a) Show the number of products available for each accepted risk level.
b) Show the average interest rates of products provided by HSBC and Techcombank banks.
d) Show which source brings to the marketplace more low-risk customers.
e) Show all months of the year 2018 that the number of customers applying for loans is 20%
higher than the monthly average number of customers of the year.
f) Show the names of all leads who applied in 2019 and are older than 95% of all leads
who applied in 2017
2/ [BUSINESS]
You are an Analytic Engineer for a company which produces a new generation of electric men
razor. Your company registered an e-commerce site at www.Coolmen-Coolrazors.com 1 month
ago to sell its product online instead of the traditional supermarket channel. During the last
month, it piloted advertising on 2 channels:
● Email Channel
● SMS Channel
Data are extracted from a centralized database and stored in the attached file called
“mkt_data.csv”. Dataset
coupon Format: float, the value of coupon expressed in each message, valid for
up to 3 units for each order
clicked Format: binary, either 0 (customer doesn’t click on the link in SMS/Email)
or 1 (they clicked)
last_step Format: string. It can have one of the following values: “received”,
“bounced”, “saw review”, “added to cart”, “payment page”,
“purchased”
nb_units
Format: integer, representing the number of units of customers’ order.
order_value
Format: float, representing the value of the order the customer made.
Already minus the coupon applied.
The column “last_step” is the final point of contact with customers before they leave our
website. Its values are explained below:
● Saw review: scroll down and read the review and information of the product
Financial Information
Together with the data above, you have additional information about the production cost and
the marketing campaigns.
● The production cost for each razor is 18$.
● Cost per one SMS is $0.050, cost per one email sent is $0.075.
● Each email or SMS will be supplied a coupon that can have a value of 2$, 4$ or 6$.
The coupon is valid for up to 3 razors in each order. They have the option to wrap the
items as a gift. Ignore wrapping and shipping costs.
● From experience (and some models), potential customers are divided into 4 age
groups:
○ 18 - 30
○ 31 - 45
○ 46 - 60
○ 60 +
Question
2.a.
For the next quarter, your marketing department has a budget of $60,000 to spend on online
campaigns. How would you allocate it between SMS and Email? Assume that we have a
potential customer pool for each age group as below:
Age Group Pool size
18 - 30
300,000
31 - 45
350,000
46 - 60 500,000
60+ 200,000
2.b.
Now assume that you are also responsible for the operation of the company’s website. Do you
have any comments or suggestions so that we can improve the website’s performance in
order to maximize net profit?
3/ [Python]
Please find attached the file “messy.xlsx”, use Python or your programming language of choice
to do the following:
● Clean the names of columns to lowercase separated by “_”, remove any empty
column if necessary.
● Change the name column to the title case (e.g: Jason Mraz).
● Filter those who join since 2019 and export to a csv file, delimited by “|”, file name
“emp_{report_date}.csv” with report_date = today.