Lecture 04. Case Study
Instructional Team
Practical use of distributions
❏ Course objective: build the theoretical foundations for the DA course → Be patient:
a good foundation is very important!
❏ Statistics and Probability:
❏ Descriptive: describe the data that we have collected
❏ Inference: infer characteristics of the underlying population → predict or forecast
❏ How are distributions used in practice?
❏ Consider the phenomenon/event of interest as a random variable, the result of a random
experiment
❏ Depending on the nature of the problem, we can assume that this random variable follows a
certain distribution → Need to understand distributions and their assumptions
❏ Historical data will be used to estimate parameters of the distribution
❏ Estimated parameters will be used for predicting or forecasting if we believe that the process
will continue
Poisson Distribution
“...expresses the probability of a given number of events occurring in a fixed interval of
time or space if these events occur with a known constant rate and independently of
the time since the last event.” (Wikipedia.org)
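As a reminder of what this definition implies (a standard result, not spelled out on the slide): if X counts the events in the interval and λ is the average number of events per interval, then

```latex
\[
  P(X = k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \dots
\]
\[
  \mathbb{E}[X] = \operatorname{Var}(X) = \lambda
\]
```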
Poisson Distribution - Examples
❏ Number of Be bike requests from 8h - 12h every day
❏ Assumptions:
❏ Fixed interval of time: 8h - 12h every day.
❏ Known constant rate: the average number of requests in the interval (lambda)
❏ Independence of the current event from the previous event: a request from one customer at,
for instance, 9 a.m. does not affect the probability of another customer requesting at 9:15 a.m.
❏ From historical data → estimate lambda
❏ Assume that this process will continue in the future → Calculate the probability of
receiving between 100 and 120 requests during that interval
❏ Another example: The number of patients arriving in an emergency room
between 10 and 11 pm
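The bike-request workflow above (estimate lambda from historical data, then compute the probability of 100 to 120 requests) could be sketched roughly as follows. This is only an illustration: the `history` counts are made-up numbers, the function names are hypothetical, and the sample mean is used as the estimate of lambda, which is the usual maximum-likelihood choice for a Poisson model.

```python
import math


def estimate_rate(counts):
    """Estimate the Poisson rate as the sample mean of the observed counts."""
    if not counts:
        raise ValueError("need at least one observation")
    return sum(counts) / len(counts)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))


def poisson_prob_between(low, high, lam):
    """P(low <= X <= high), inclusive on both ends."""
    return sum(poisson_pmf(k, lam) for k in range(low, high + 1))


# Hypothetical request counts for the 8h-12h window on ten past days.
history = [104, 98, 117, 109, 112, 95, 121, 108, 103, 113]

lam_hat = estimate_rate(history)
p = poisson_prob_between(100, 120, lam_hat)

print(f"estimated lambda = {lam_hat:.1f}")
print(f"P(100 <= X <= 120) = {p:.3f}")
```

The result is only as good as the assumptions listed above: if the request rate drifts over time or requests come in bursts, the Poisson model (and this probability) may be a poor fit.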
Normal Distribution
Normal Distribution & Central Limit Theorem
❏ Column 1. Format: integer.
❏ Column 2. Format: date, representing when each SMS/Email message was sent.
❏ Column 3. Format: integer, ranging from 0 to 100.
❏ Column 4. Format: string. Audience is divided into 4 age ranges.
❏ Column 5. Format: string, either SMS or Email.
❏ Column 6. Format: float, the value of the coupon expressed in each message, valid for up to 3 units for each order.
❏ Column 7. Format: binary, either 0 (customer doesn’t click on the link in the SMS/Email) or 1 (they clicked).
❏ Column 8. Format: string. It can have one of the following values: “received”, “bounced”, “saw review”, “added to cart”, “payment page”, “purchased”.
❏ Column 9. Format: integer, representing the number of units of the customer’s order.
❏ Column 10. Format: float, representing the value of the order the customer made, already minus the coupon applied.