Lecture 04. Case Study
Fundamentals of Data Analytics

Case Study
Instructional Team
Practical use of distributions
❏ Course objective: Theoretical foundations for the DA course → Be patient as
having a good foundation is very important!
❏ Statistics and Probability:
❏ Descriptive: describe the data that we have collected
❏ Inference: characteristics of the underlying population → Predict or Forecast
❏ How are distributions used in practice?
❏ Consider the phenomenon/event of interest as a random variable, the result of a random
experiment
❏ Depending on the nature of the problem, we can assume that this random variable follows a
certain distribution → Need to understand distributions and their assumptions
❏ Historical data will be used to estimate parameters of the distribution
❏ Estimated parameters will be used for predicting or forecasting if we believe that the process
will continue
Poisson Distribution
“...expresses the probability of a given number of events occurring in a fixed interval of
time or space if these events occur with a known constant rate and independently of
the time since the last event.” (Wikipedia.org)

Poisson Distribution - Examples
❏ Number of Be bike requests from 8h to 12h every day
❏ Assumptions:
❏ Fixed interval of time: 8h to 12h every day.
❏ Known constant rate:
❏ Independence of the current event from previous events: a request from one customer at,
for instance, 9 a.m. does not affect the probability of another customer requesting at 9:15 a.m.
❏ From historical data → estimate lambda
❏ Assume that this process will continue in the future → Calculate the probability of
having between 100 and 120 requests during that time
❏ Another example: The number of patients arriving in an emergency room
between 10 and 11 pm
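
The bike-request example can be sketched in Python as follows. This is a minimal illustration, not the course's own solution: the list `daily_requests` is hypothetical data invented for the example, and the helper names `poisson_pmf` and `poisson_prob_between` are assumptions. Lambda is estimated as the sample mean of the historical counts, which is the maximum-likelihood estimate for a Poisson distribution.

```python
import math


def poisson_pmf(k: int, lam: float) -> float:
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_prob_between(lo: int, hi: int, lam: float) -> float:
    """P(lo <= X <= hi), both bounds inclusive."""
    return sum(poisson_pmf(k, lam) for k in range(lo, hi + 1))


# Hypothetical counts of requests between 8h and 12h on ten past days.
daily_requests = [104, 97, 112, 108, 101, 119, 95, 110, 106, 98]

# Estimate lambda from historical data: the sample mean.
lam_hat = sum(daily_requests) / len(daily_requests)

# Assuming the process continues unchanged, the probability of 100 to 120 requests.
p = poisson_prob_between(100, 120, lam_hat)
print(f"lambda_hat = {lam_hat:.1f}, P(100 <= X <= 120) = {p:.3f}")
```

With real data, it is worth checking that the sample variance is close to the sample mean, since a Poisson distribution implies that the two are equal.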
Normal Distribution

Normal Distribution & Central Limit Theorem

❏ Example: sending SMS messages to potential customers to invite them to a product sale
❏ Define the event of interest as a customer responding to our SMS: respond: 1, not
respond: 0 → The response rate is the sample mean
❏ Run the experiment for one month to observe the daily click-through rate
❏ From this dataset, we can estimate the expected response rate (with a confidence
interval) when running future campaigns (assuming the same population)
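
A minimal sketch of this estimate, assuming one month of daily response rates has already been computed. The values in `daily_rates` are hypothetical, and `mean_confidence_interval` is an illustrative helper, not part of the course material. By the Central Limit Theorem the sample mean is approximately normal, so the interval uses a normal quantile; with only 30 observations a t quantile would be slightly wider.

```python
import math
from statistics import NormalDist, mean, stdev


def mean_confidence_interval(values, confidence=0.95):
    """Return (mean, lower, upper) using the normal approximation from the CLT."""
    n = len(values)
    m = mean(values)
    standard_error = stdev(values) / math.sqrt(n)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return m, m - z * standard_error, m + z * standard_error


# Hypothetical daily response rates (share of recipients who responded), 30 days.
daily_rates = [
    0.031, 0.028, 0.035, 0.030, 0.027, 0.033, 0.029, 0.032, 0.026, 0.034,
    0.030, 0.031, 0.029, 0.028, 0.036, 0.030, 0.027, 0.032, 0.031, 0.029,
    0.033, 0.030, 0.028, 0.034, 0.031, 0.029, 0.030, 0.032, 0.027, 0.033,
]

rate, low, high = mean_confidence_interval(daily_rates)
print(f"expected response rate = {rate:.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
```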
Optimization
❏ What + Why + How
❏ INPUTS + PROCESS + CONSTRAINTS = OUTPUT
❏ Inputs: Money in investment, Ingredients in cooking, Patient information in medical treatment
❏ Constraints: No active derivatives market in Vietnam, No oven, Ethical constraints in medical treatment
❏ Process: How do the inputs interact with one another in producing the output? → Both data
and a Data Analyst are needed here!
❏ Output: Revenue, Returns, Cakes, Best treatment procedure → You must decide this.

Question 1: What is the difference between inputs and constraints?

Question 2: Investment in financial markets


Case Study: Marketing Budget Allocation
❏ Objectives:
❏ Introduce the basic concepts of optimization
❏ Demonstrate the theoretical statistics/probability concepts we cover in this short course
❏ Give those who are new to Python a chance to practice with the language

This is an over-simplified version of a real budget allocation problem.

Hence, a lot of details are ignored or taken as given.


Background Information
❏ You are a Data Analyst for a company which produces a new generation of
men's electric razors.
❏ Your company registered an e-commerce site at www.Coolmen-Coolrazors.com
1 month ago to sell its product online instead of through the traditional
supermarket channel.
❏ During the last month, it piloted advertising on 2 channels:
❏ Email Channel
❏ SMS Channel
❏ Data are recorded in a centralized database (discuss later).
Background Information (cont.)
❏ Production cost for each razor is $18.
❏ Cost per SMS sent is $0.050, cost per email sent is $0.075.
❏ Each email or SMS carries a coupon with a value of $2, $4
or $6. The coupon is valid for up to 3 razors in each order. Customers have the option
to have the items gift-wrapped. Ignore wrapping and shipping costs.
❏ The price without coupon is $40 per razor.
❏ From experience (and some models), potential customers are divided into 4
age groups:
❏ 18 - 30
❏ 31 - 45
❏ 46 - 60
❏ 60 +
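
The cost and price figures above can be turned into a small unit-economics sketch. It rests on one assumption that the slides do not spell out: the coupon is treated as a per-razor discount applied to at most 3 razors in an order. The function names are illustrative only.

```python
PRICE = 40.0                 # price per razor without coupon
PRODUCTION_COST = 18.0       # production cost per razor
COST_PER_MESSAGE = {"SMS": 0.050, "Email": 0.075}
MAX_COUPON_UNITS = 3         # coupon valid for up to 3 razors per order


def order_value(nb_units: int, coupon: float) -> float:
    """Revenue of one order, ASSUMING the coupon is a per-razor discount."""
    discounted_units = min(nb_units, MAX_COUPON_UNITS)
    return nb_units * PRICE - discounted_units * coupon


def order_margin(nb_units: int, coupon: float) -> float:
    """Order value minus production cost (wrapping and shipping ignored)."""
    return order_value(nb_units, coupon) - nb_units * PRODUCTION_COST


def break_even_conversion(channel: str, nb_units: int = 1, coupon: float = 2.0) -> float:
    """Purchases per message sent at which a channel just covers its sending cost."""
    return COST_PER_MESSAGE[channel] / order_margin(nb_units, coupon)


for channel in COST_PER_MESSAGE:
    rate = break_even_conversion(channel)
    print(f"{channel}: break-even conversion = {rate:.4%} (1 razor, $2 coupon)")
```

Under this assumption, a one-razor order with a $2 coupon leaves a margin of $20, so a channel only needs a small fraction of messages to end in a purchase to pay for itself; the observed conversion rates in the dataset decide which channel clears that bar more comfortably.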
Dataset
Each row of the dataset describes one SMS/Email message. Columns:
❏ id: integer, representing each SMS/Email message
❏ send_date: date, when the message was sent
❏ estimated_age: integer, ranging from 0 to 100
❏ age_range: string. The audience is divided into 4 age ranges
❏ channel: string, either SMS or Email
❏ coupon: float, the value of the coupon offered in each message, valid for up to 3 units for each order
❏ clicked: binary, either 0 (the customer did not click on the link in the SMS/Email) or 1 (they clicked)
❏ last_step: string. It can have one of the following values: “received”, “bounced”, “saw review”, “added to cart”, “payment page”, “purchased”
❏ nb_units: integer, representing the number of units in the customer's order
❏ order_value: float, representing the value of the order the customer made, with the applied coupon already subtracted

Received: the SMS/email was sent successfully, but the customer did not click.
Bounced: the customer clicked but exited immediately.
Saw review: the customer scrolled down and read the reviews and information about the product.
Added to cart: the customer added the product to the cart to check out.
Payment page: the customer stopped at the payment page without finishing the payment.
Purchased: the customer made an order.
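
The `last_step` values can be read as an ordered funnel. The sketch below counts how many messages reached at least each step, assuming that a customer whose last step is a later stage also passed through every earlier stage; that ordering is an interpretation of the definitions above, and the sample list is hypothetical.

```python
from collections import Counter

# Funnel stages in the order a customer is assumed to pass through them.
FUNNEL_STEPS = [
    "received", "bounced", "saw review",
    "added to cart", "payment page", "purchased",
]


def funnel_counts(last_steps):
    """Number of messages whose recipient reached at least each funnel step."""
    stopped_at = Counter(last_steps)
    reached = {}
    running = 0
    # Walk from the deepest step backwards, accumulating everyone who got that far.
    for step in reversed(FUNNEL_STEPS):
        running += stopped_at[step]
        reached[step] = running
    return {step: reached[step] for step in FUNNEL_STEPS}


# Hypothetical last_step values for ten messages.
sample = ["received"] * 6 + ["bounced"] * 2 + ["saw review", "purchased"]

counts = funnel_counts(sample)
for step, n in counts.items():
    print(f"{step:<14} {n:>3}  ({n / counts['received']:.0%} of messages sent)")
```

Computing these counts separately per channel, coupon value and age range shows where each audience drops out of the funnel.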
Question & Suggestions
❏ Question: For the next quarter, your marketing department has a budget of
$60,000 to spend on online campaigns.
❏ How would you allocate it between SMS and Email?
❏ Add other comments if you have any so that we can maximize net profit
❏ Suggestions:
❏ Remember the Central Limit Theorem?
❏ Define the journey or funnel of potential customers
❏ What are the important metrics to consider here?
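
One possible way to frame the allocation, offered as a sketch rather than the expected answer: estimate the expected net profit per dollar spent on each channel, then fund channels in decreasing order of that metric until the budget or the reachable audience runs out. The conversion rates, margins and audience sizes below are hypothetical placeholders; in the case study they would be estimated from the dataset. Only the message costs and the $60,000 budget come from the slides.

```python
from dataclasses import dataclass


@dataclass
class Channel:
    name: str
    cost_per_message: float
    conversion_rate: float    # purchases per message sent (hypothetical)
    margin_per_order: float   # order value minus production cost (hypothetical)
    max_messages: int         # reachable audience size (hypothetical)

    @property
    def profit_per_dollar(self) -> float:
        """Expected net profit generated by one dollar of sending cost."""
        per_message = self.conversion_rate * self.margin_per_order - self.cost_per_message
        return per_message / self.cost_per_message


def allocate(budget: float, channels: list) -> dict:
    """Greedy allocation: fund the most profitable channel first, up to its audience cap."""
    allocation = {c.name: 0.0 for c in channels}
    remaining = budget
    for c in sorted(channels, key=lambda ch: ch.profit_per_dollar, reverse=True):
        if remaining <= 0 or c.profit_per_dollar <= 0:
            break  # never spend on a channel that loses money
        spend = min(remaining, c.max_messages * c.cost_per_message)
        allocation[c.name] = spend
        remaining -= spend
    return allocation


channels = [
    Channel("SMS", 0.050, conversion_rate=0.004, margin_per_order=25.0, max_messages=800_000),
    Channel("Email", 0.075, conversion_rate=0.005, margin_per_order=25.0, max_messages=1_000_000),
]

plan = allocate(60_000, channels)
for name, spend in plan.items():
    print(f"{name}: ${spend:,.0f}")
```

Because this model is linear in the number of messages, the greedy rule puts the whole budget into the better channel unless an audience cap binds. A more realistic model would add uncertainty in the estimated rates (for example via confidence intervals) and diminishing returns.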
