Da Public Slides Ch11 v3 2023
probabilities
Motivation
I What are the health benefits of not smoking? Considering the 50+ population, we
can investigate if differences in smoking habits are correlated with differences in
health status.
Data Analysis for Business, Economics, and Policy 2 / 43 Gábor Békés (Central European University)
Concepts LPM CS A1 Logit&probit CS A2-A3 Goodness of fit CS A4a Diagnostics CS A4b Summary
Binary events
I Start with binary events: things that either happen or don’t happen, captured by
a binary variable
I How can we model these events?
I We do not observe ‘on average’ larger values for y in this case.
\[ E[y] = P[y = 1] \]
I The average of a 0–1 binary variable is also the probability that it is one.
I Frequency (25% of cases) — probability (25% chance)
I Expected value = average probability of event happening
I Use the same tools, but interpretation is changing!
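A minimal sketch of the point above, using a made-up sample (the values in `y` are hypothetical, not from the case study): the average of a 0–1 variable equals the relative frequency of ones.

```python
# Hypothetical sample: 1 = the event happened, 0 = it did not.
y = [1, 0, 0, 1, 0, 0, 0, 0]

mean_y = sum(y) / len(y)          # average of the 0-1 variable
share_ones = y.count(1) / len(y)  # relative frequency of y = 1

print(mean_y, share_ones)
```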
\[ y^P = P[y = 1 \mid x_1, x_2, \ldots] \]
I Linear probability model (LPM) regression is
\[ y^P = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]
\[ y^P = \beta_0 + \beta_1 x_1 + \beta_2 x_2 \]
I $y^P$ denotes the probability that the dependent variable is one, conditional on the
right-hand-side variables of the model.
I $\beta_0$ shows the probability that $y = 1$ if all $x$ are zero.
I $\beta_1$ shows the difference in the probability that $y = 1$ for observations that are
different in $x_1$ but are the same in terms of $x_2$.
I Still true: average difference in $y$ corresponding to differences in $x_1$ with $x_2$ being
the same.
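The interpretation above can be illustrated with a small sketch. It assumes a simplified LPM with a single binary right-hand-side variable and made-up data; the helper `ols_simple` is hypothetical and only implements the textbook simple-regression formulas. In this setting the intercept is the share of y = 1 when x = 0, and the slope is the difference in shares between the two groups.

```python
def ols_simple(x, y):
    """OLS intercept and slope for a regression of y on a single x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Hypothetical data: x is a 0-1 group indicator, y is a 0-1 outcome.
x = [0, 0, 0, 0, 1, 1, 1, 1]
y = [1, 1, 1, 0, 1, 0, 0, 0]

b0, b1 = ols_simple(x, y)
print(b0, b1)  # share of y = 1 when x = 0, and the difference in shares
```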
I But in the LPM, predicted probabilities may be below 0 or above 1. There are no formal bounds in the model.
I With continuous variables that can take any value (GDP, population, sales, etc.), this
could be a serious issue.
I With only binary right-hand-side variables, this is not a problem (‘saturated models’).
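A small sketch of the bounds problem, assuming a one-variable LPM fitted by OLS on made-up data with a continuous x (the data and the helper `ols_simple` are hypothetical): the fitted line leaves the 0–1 range at both ends of x.

```python
def ols_simple(x, y):
    """OLS intercept and slope for a regression of y on a single x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    s_xx = sum((xi - mean_x) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return mean_y - b1 * mean_x, b1

# Hypothetical data: continuous x, binary y.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0, b1 = ols_simple(x, y)
predicted = [b0 + b1 * xi for xi in x]

# The smallest prediction is negative and the largest exceeds one.
print(min(predicted), max(predicted))
```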
The question of the case study is whether, and by how much, smokers are less likely to
stay healthy than non-smokers.
I focus on people of age 50 to 60 who consider themselves healthy
I ask them four years later as well
Data
LPM
Both the dependent and the independent variable are dummy variables.
Estimated β is -0.072
Scatterplot
LPM Interpretation
I The coefficient on smokes shows the difference in the probability of staying healthy
comparing current smokers and current nonsmokers.
I Current smokers are 7 percentage points less likely to stay healthy than those who
do not smoke.
I Can add additional controls to capture if quitting matters.
I Pick variables:
I gender dummy, age, years of education,
I income (measured by which of the 10 income groups individuals belong to within
their country),
I body mass index (a measure of weight relative to height),
I whether the person exercises regularly, the country in which they live.
I country - set of binary indicators.
[Figure, two panels: "Staying healthy and years of education"; "Staying healthy and income group"]
Decisions: (1) Include education as a piecewise linear spline with knots at 8 and 18 years; (2) include income in
a linear way.
LPM results
Table: Probability of staying healthy, extended model (dependent variable: staying healthy)
Observations 3,109
Robust standard errors in parentheses. ** p<0.01, * p<0.05
Y/N denotes binary vars. BMI and education entered as spline. Age in years. Income in deciles.
Link functions I.
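The logit model has the following form:
\[ y^P = \Lambda(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots)} \]
where the link function $\Lambda(z) = \frac{\exp(z)}{1 + \exp(z)}$ is called the logistic function.
The probit model has the following form:
\[ y^P = \Phi(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots) \]
where the link function $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right) du$ is the cumulative distribution
function (CDF) of the standard normal distribution.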
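The two link functions can be sketched in a few lines of standard-library Python. The function names `logistic` and `std_normal_cdf` are illustrative choices; the normal CDF is written through the error function, which is a standard identity rather than anything specific to these slides.

```python
import math

def logistic(z):
    """Logit link: exp(z) / (1 + exp(z)), written in an equivalent form."""
    return 1.0 / (1.0 + math.exp(-z))

def std_normal_cdf(z):
    """Probit link: CDF of the standard normal, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both links map any value of the linear combination into (0, 1).
for z in (-3.0, -1.0, 0.0, 1.0, 3.0):
    print(z, round(logistic(z), 3), round(std_normal_cdf(z), 3))
```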
I Both the probit and the logit transform the β0 + β1 x1 + ... linear combination
using a link function that shows an S-shaped curve.
I The slope of this curve keeps changing as we change whatever is inside.
I The slope is steepest when $y^P = 0.5$;
I it is flatter further away; and it becomes very flat if $y^P$ is close to zero or one.
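For the logit, the changing slope can be made explicit with a short derivation (a sketch for the logistic link only; the probit curve behaves similarly, but its slope is the normal density):

```latex
\frac{d\Lambda(z)}{dz}
  = \frac{\exp(z)}{\bigl(1 + \exp(z)\bigr)^2}
  = \Lambda(z)\,\bigl(1 - \Lambda(z)\bigr)
  = y^P \,(1 - y^P)
```

The product $y^P(1 - y^P)$ is largest, at 0.25, when $y^P = 0.5$; it is 0.09 at $y^P = 0.9$ and about 0.01 at $y^P = 0.99$.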
Marginal differences
I The link function makes the association between $x$ and $y^P$ vary with the values of $x$, so for logit and
probit models, we do not interpret raw coefficients!
I Instead, transform them into ‘marginal differences’ for interpretation purposes
I The average marginal difference for x is the average difference in the probability
of y = 1, that corresponds to a one unit difference in x.
I Software may call them ‘marginal effects’ or ‘average marginal effects (AME)’ or
‘average partial effects’.
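One common way to compute an average marginal difference for a logit with a continuous x is to average the slope of the predicted probability over the observations. The sketch below assumes hypothetical coefficients and x values (nothing here is estimated from the case study); for a binary x, one would instead average the difference in predicted probabilities between x = 1 and x = 0.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_difference(b0, b1, xs):
    """Average over observations of b1 * p * (1 - p), the logit slope in x."""
    slopes = []
    for x in xs:
        p = logistic(b0 + b1 * x)
        slopes.append(b1 * p * (1.0 - p))
    return sum(slopes) / len(slopes)

# Hypothetical logit coefficients and observed x values.
b0, b1 = -1.0, 0.5
xs = [0.0, 1.0, 2.0, 3.0, 4.0]

amd = average_marginal_difference(b0, b1, xs)
print(round(amd, 3))  # smaller than b1 / 4, the steepest possible slope
```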
I Maximum likelihood is another way to get coefficient estimates. It is done in
steps.
I You specify a (conditional) distribution that you will use during the estimation.
I This is logistic for the logit and normal for the probit model.
I You maximize the resulting likelihood function w.r.t. your β parameters → this gives the maximum
likelihood estimates for this model.
I No closed form solution → need to use search algorithms.
I Search algorithms will play critical role in machine learning as well.
I More in DA3.
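The steps above can be sketched for a logit with one right-hand-side variable. This is an illustration under stated assumptions, not how any particular software package does it: the data are made up, the function names are hypothetical, and the search algorithm is plain Newton's method with a fixed number of steps and no safeguards.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(b0, b1, x, y):
    """Log-likelihood of a logit model with intercept b0 and slope b1."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = logistic(b0 + b1 * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logit_newton(x, y, steps=25):
    """Search for the maximum of the log-likelihood with Newton's method."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1 - p)
            g0 += yi - p            # gradient w.r.t. b0
            g1 += (yi - p) * xi     # gradient w.r.t. b1
            h00 += w                # entries of the negative Hessian
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: add (negative Hessian)^(-1) times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: continuous x, binary y (not perfectly separated).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]

b0_hat, b1_hat = fit_logit_newton(x, y)
print(b0_hat, b1_hat, log_likelihood(b0_hat, b1_hat, x, y))
```

Starting from zero coefficients, each step moves the estimates uphill on the log-likelihood until the gradient is numerically zero, which is what "no closed form solution" means in practice.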
I Happens often:
I We cannot know which is the "right model" for inference.
I Often LPM is good enough for interpretation.
I Check whether logit/probit results are very different.
I If yes, investigate functional forms.
Brier score
I Brier score
\[ \text{Brier} = \frac{1}{n} \sum_{i=1}^{n} \left(\hat{y}_i^P - y_i\right)^2 \]
I The Brier score is the average distance (mean squared difference) between
predicted probabilities and the actual value of y .
I The smaller the Brier score, the better.
I When comparing two predictions, the one with the smaller Brier score is the better
prediction because it produces less (squared) error on average.
I Related to a main concept in prediction: mean squared error (MSE)
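The Brier score is simple enough to compute by hand. A minimal sketch with made-up predicted probabilities and outcomes (the function name `brier_score` is an illustrative choice):

```python
def brier_score(predicted, actual):
    """Mean squared difference between predicted probabilities and 0-1 outcomes."""
    n = len(actual)
    return sum((p - y) ** 2 for p, y in zip(predicted, actual)) / n

# Hypothetical predicted probabilities and realized outcomes.
predicted = [0.9, 0.2, 0.7, 0.4]
actual = [1, 0, 1, 0]

score = brier_score(predicted, actual)
print(round(score, 3))  # (0.01 + 0.04 + 0.09 + 0.16) / 4
```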
Pseudo R2
I Pseudo R-squared
I Similar to the R-squared – measures the goodness of fit, tailored to binary outcomes.
I Many versions of this measure. Most widely used: McFadden’s R-squared
I It is one minus the ratio of the log-likelihood of the model to that of an intercept-only model.
I Can be computed for the logit and the probit but not for the linear probability
model. (No likelihood function there...)
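A sketch of McFadden's R-squared, assuming we already have predicted probabilities from a logit or probit (here they are made-up numbers, not actual maximum likelihood output). The intercept-only benchmark predicts the sample share of y = 1 for every observation; the function names are illustrative.

```python
import math

def bernoulli_log_likelihood(predicted, actual):
    """Log-likelihood of 0-1 outcomes given predicted probabilities."""
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for p, y in zip(predicted, actual)
    )

def mcfadden_r2(predicted, actual):
    """One minus the ratio of model to intercept-only log-likelihood."""
    share = sum(actual) / len(actual)
    ll_model = bernoulli_log_likelihood(predicted, actual)
    ll_null = bernoulli_log_likelihood([share] * len(actual), actual)
    return 1.0 - ll_model / ll_null

# Hypothetical predicted probabilities and realized outcomes.
predicted = [0.9, 0.2, 0.7, 0.4]
actual = [1, 0, 1, 0]

r2 = mcfadden_r2(predicted, actual)
print(round(r2, 3))
```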
Practical use
I There are several measures of model fit; they often give the same ranking of
models.
I Do not use: R-squared could be computed for any model, but it no longer has the
interpretation we had for linear models with quantitative dependent variable.
I Only probit vs logit: pseudo R-squared may be used to rank logit and probit
models.
I Use, especially for prediction: Brier score is a metric that can be computed for all
models and is used in prediction.
I If, in our data, 20% of observations have y = 0 and 80% have y = 1, and the
average of our prediction is $\frac{1}{N} \sum_{i=1}^{N} \hat{y}_i = 0.8$, then our prediction is unbiased.
I A large value of bias indicates a greater tendency to underestimate or overestimate
the chance of an event.
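Bias can be sketched as the gap between the average prediction and the share of y = 1. The data below are made up to match the 80% example, and the function name `prediction_bias` is an illustrative choice.

```python
def prediction_bias(predicted, actual):
    """Average predicted probability minus the share of y = 1."""
    n = len(actual)
    return sum(predicted) / n - sum(actual) / n

# Hypothetical data: 80% of outcomes are one, and predictions average 0.8.
actual = [1, 1, 1, 1, 0]
predicted = [0.9, 0.8, 0.9, 0.7, 0.7]

bias = prediction_bias(predicted, actual)
print(round(bias, 3))  # zero up to rounding, so the prediction is unbiased
```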
Calibration
Calibration curve
I A calibration curve
I Horizontal axis shows the values of all predicted probabilities ($\hat{y}^P$).
I Vertical axis shows the fraction of y = 1 observations for all observations with the
corresponding predicted probability.
I In a well-calibrated case, the calibration curve is close to the 45-degree line.
I In practice we create bins of predicted probabilities and compare each bin's
predictions with the actual share of y = 1 in that bin.
I Use percentiles in general. In some cases equal-width bins are used (this gives a noisier
estimate).
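The binning step can be sketched as follows, assuming percentile-style bins with (roughly) equal numbers of observations. The data and the function name `calibration_points` are hypothetical; each returned pair is one point of the calibration curve.

```python
def calibration_points(predicted, actual, n_bins):
    """Return (mean predicted probability, share of y = 1) for each bin.

    Bins hold roughly equal numbers of observations, sorted by prediction.
    """
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty bins when there are few observations
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        share_ones = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_pred, share_ones))
    return points

# Hypothetical predicted probabilities and realized outcomes.
predicted = [0.1, 0.2, 0.2, 0.3, 0.6, 0.7, 0.8, 0.9]
actual = [0, 0, 1, 0, 1, 0, 1, 1]

points = calibration_points(predicted, actual, n_bins=2)
for mean_pred, share_ones in points:
    print(round(mean_pred, 2), round(share_ones, 2))
```

Points close to the 45-degree line, where the two numbers in a pair are similar, indicate good calibration in that bin.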
Calibration curve
I Find patterns with ease when y is binary - model probability with regressions
I Linear probability model is mostly good enough, easy inference.
I Predicted values could be below 0, above 1
I Logit (and probit) - better when the aim is prediction, predicted values strictly between
0 and 1
I Most often, LPM, logit, probit - similar inference
I Use marginal (average) differences
I No trivial goodness of fit. Brier score or pseudo-R-Squared.
I Calibration is a useful diagnostic tool: well-calibrated models will predict a 20%
chance for events that tend to happen in one out of five cases.