
Da Public Slides Ch11 v3 2023


Békés-Kézdi: Data Analysis, Chapter 11: Modelling probabilities

Data Analysis for Business, Economics, and Policy
Gábor Békés (Central European University)
Gábor Kézdi (University of Michigan)
Cambridge University Press, 2021
gabors-data-analysis.com

Central European University


Version: v3.1 License: CC BY-NC 4.0
Any comments or suggestions:
gabors.da.contact@gmail.com
Concepts LPM CS A1 Logit&probit CS A2-A3 Goodness of fit CS A4a Diagnostics CS A4b Summary

Motivation

I What are the health benefits of not smoking? Considering the 50+ population, we
can investigate if differences in smoking habits are correlated with differences in
health status.

Data Analysis for Business, Economics, and Policy 2 / 43 Gábor Békés (Central European University)

Binary events

I Start with binary events: things that either happen or don't happen, captured by a
binary (0-1) variable.
I How can we model these events?
I In this case y takes only two values, so 'larger values of y on average' needs a new interpretation.

I Solution: model the probabilities instead!

$$E[y] = P[y = 1]$$
I The average of a 0-1 binary variable is also the probability that it is one.
I Frequency (25% of cases) corresponds to probability (25% chance).
I Expected value = average probability of the event happening.
I Use the same tools, but the interpretation changes!
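
A minimal sketch of the identity E[y] = P[y = 1], using a made-up list of 0-1 outcomes (hypothetical values, not taken from the case study data):

```python
# Hypothetical 0-1 outcomes: 1 = the event happened, 0 = it did not.
y = [1, 0, 0, 1, 0, 0, 0, 0]

mean_y = sum(y) / len(y)             # average of the binary variable
share_of_ones = y.count(1) / len(y)  # relative frequency of y = 1

# Both numbers are the same: the mean of a 0-1 variable is the share of ones.
print(mean_y, share_of_ones)
```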


Modelling events: LPM


Linear probability model - LPM


I Modelling probability: regression with a binary dependent variable.
I The Linear Probability Model (LPM) is a linear regression with a binary dependent
variable.

I Differences in average y are also differences in the probability that y = 1.

I Linear regressions with binary dependent variables show differences in expected y by x,
which are also differences in the probability of y = 1 by x.
I Introduce notation for probability:

$$y^P = P[y = 1 \mid x_1, x_2, \dots]$$
I Linear probability model (LPM) regression is

$$y^P = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$


Linear probability model - interpretation

$$y^P = \beta_0 + \beta_1 x_1 + \beta_2 x_2$$

I $y^P$ denotes the probability that the dependent variable is one, conditional on the
right-hand-side variables of the model.
I $\beta_0$ shows the probability that y = 1 if all x are zero.
I $\beta_1$ shows the difference in the probability that y = 1 for observations that
differ by one unit in $x_1$ but are the same in terms of $x_2$.
I Still true: it is the average difference in y corresponding to differences in $x_1$, with $x_2$ being
the same.


Linear probability model - modelling

I Estimate the linear probability model (LPM) using OLS.

I We can use all the transformations of x that we used before:
I logs, polynomials, splines, dummies, interactions, etc.
I All formulae and interpretations for standard errors, confidence intervals,
hypotheses and p-values of tests are the same.
I Heteroskedasticity-robust standard errors are essential in this case!
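
A minimal sketch of an LPM estimated by OLS with one regressor, using only the Python standard library. The data are made up, `lpm_simple` is a hypothetical helper, and the HC0 (White) formula is one common robust estimator; the slides do not say which variant they use.

```python
import math

def lpm_simple(x, y):
    """OLS of a 0-1 outcome y on one regressor x; returns intercept, slope, HC0 robust SE of slope."""
    n = len(y)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # HC0 variance of the slope: each observation is weighted by its own squared residual.
    var_slope = sum((xi - x_bar) ** 2 * e ** 2 for xi, e in zip(x, resid)) / sxx ** 2
    return intercept, slope, math.sqrt(var_slope)

# Hypothetical data: x = 1 for smokers, y = 1 if the person stayed healthy.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
intercept, slope, robust_se = lpm_simple(x, y)
print(f"intercept = {intercept:.2f}, slope = {slope:.2f}, robust SE = {robust_se:.3f}")
```

With a single dummy regressor the intercept is the share of y = 1 in the x = 0 group and the slope is the difference in shares between the two groups.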


Predicted values in LPM


I Predicted values, $\hat{y}^P$, are calculated the same way as before but are to be
interpreted as probabilities, and this may be problematic.

$$\hat{y}^P = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$$

I Predicted values need to be between 0 and 1 because they are probabilities.

I But in LPM, they may be below 0 or above 1. No formal bounds in the model.
I With continuous variables that can take any value (GDP, population, sales, etc.), this
could be a serious issue.
I With binary variables, no problem ('saturated models').

I Problem if the goal is prediction!

I Not a big issue for inference → uncover patterns of association.
I But note that in theory it may give biased estimates...
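
A small illustration of why LPM predictions are not bounded. The coefficients below are hypothetical, chosen only to show the mechanics; they are not estimates from the case study.

```python
# Hypothetical LPM coefficients (illustration only).
beta0, beta1 = 0.2, 0.1

def lpm_predict(x):
    """Linear prediction; nothing in the formula keeps it inside [0, 1]."""
    return beta0 + beta1 * x

for x in [-5, 0, 4, 10]:
    p = lpm_predict(x)
    flag = "" if 0 <= p <= 1 else "  <- outside [0, 1]"
    print(f"x = {x:>3}: predicted probability = {p:.2f}{flag}")
```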

Does smoking pose a health risk?

The question of the case study is whether, and by how much, smokers are less likely to
stay healthy than non-smokers.
I Focus on people of age 50 to 60 who consider themselves healthy.
I Ask them four years later as well.

Research question: Does smoking lead to deteriorating health?


Data

I y = 1 if person stayed healthy
I y = 0 if person became unhealthy
I Data comes from SHARE (Survey of Health, Ageing and Retirement in Europe)
I 14 European countries
I Demographic information on all individuals
I 2011 and 2015 participants are used
I Being healthy means reporting “feeling excellent” or “very good”
I N = 3,109


LPM

Start with a simple univariate model with being a smoker as the only explanatory variable.

$$\text{stays healthy}^P = \alpha + \beta \, \text{smoker}$$

Both the dependent and the independent variable are dummy variables.

Estimated $\beta$ is -0.072

Can we draw a scatterplot?


Scatterplot

Figure: Staying healthy - scatterplot and regression line


LPM Interpretation

I The coefficient on smoker shows the difference in the probability of staying healthy
comparing current smokers and current non-smokers.

I Current smokers are 7 percentage points less likely to stay healthy than those who
do not currently smoke.
I Can add additional controls to capture if quitting matters.


LPM with many regressors I.

I Multiple regression – closer to causality


I compare people who are very similar in many respects but are different in smoking
habits
I find many confounders that could be correlated with smoking habits and health
outcomes
I Smokers / non-smokers – different in many other behaviors and conditions:
I personal traits
I behavior such as eating, exercise
I socio-economic conditions
I background - e.g. country they live in


LPM with many regressors II.

I Pick variables:
I gender dummy, age, years of education,
I income (measured as which of the 10 income groups the individual belongs to within
their country),
I body mass index (a measure of weight relative to height),
I whether the person exercises regularly,
I the country in which they live, entered as a set of binary indicators.

I Think about functional form:

I Continuous control variables might have a nonlinear relationship with staying healthy.
I Explore the relationship with nonparametric tools.


Functional form selection

Figures: Staying healthy and years of education; Staying healthy and income group

Decisions: (1) Include education as a piecewise linear spline with knots at 8 and 18 years; (2) include income in
a linear way.
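
A piecewise linear spline lets the slope on education differ below 8, between 8 and 18, and above 18 years. A minimal sketch of how the spline terms could be built; `linear_spline` is a hypothetical helper written for this illustration, not a function from the slides or from a library.

```python
def linear_spline(x, knots):
    """Split x into piecewise linear spline terms, one term per segment.

    The terms always add up to x, and each term only varies inside its own segment.
    """
    terms = []
    lower = None
    for upper in list(knots) + [None]:
        value = x if upper is None else min(x, upper)
        if lower is not None:
            value = max(value, lower) - lower
        terms.append(value)
        lower = upper
    return terms

# Knots at 8 and 18 years of education, as in the decision above.
for years in [5, 12, 20]:
    print(years, linear_spline(years, [8, 18]))
```

Each term then enters the regression as a separate variable, so each segment gets its own slope coefficient.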

LPM results
Probability of staying healthy - extended model
Dependent variable: Staying healthy

Variable                                  Coefficient   (Robust SE)
Current smoker (Y/N)                      -0.061*       (0.024)
Ever smoked (Y/N)                          0.015        (0.020)
Female (Y/N)                               0.033        (0.018)
Age                                       -0.003        (0.003)
Years of education (for < 8)              -0.001        (0.007)
Years of education (for >= 8 and < 18)     0.017**      (0.003)
Years of education (for >= 18)            -0.010        (0.012)
Income group                               0.008*       (0.003)
BMI (for < 35)                            -0.012**      (0.003)
BMI (for >= 35)                            0.006        (0.017)
Exercises regularly (Y/N)                  0.053**      (0.017)
Country indicators                         YES

Observations                               3,109
Robust standard errors in parentheses. ** p<0.01, * p<0.05
Y/N denotes binary vars. BMI and education entered as spline. Age in years. Income in deciles.

Detour: Regression Tables

I If you need to show many explanatory variables:

I Do not show a single table with 12*2 rows; people will not be able to read it.

I Either show only selected variables,

I or create two columns.

I Make sure you have a title, the number of observations, a footnote on SEs, and stars.

I SE, stars: many different notations. Check carefully.
I Default is *** = p<0.01. But often ** = p<0.01 (like here).


Does smoking pose a health risk? LPM interpretation


I coefficient on currently smoking is −0.06
I The 95% confidence interval is relatively wide [−0.11, −0.01], but it does not
contain zero
I no significant differences in staying healthy when comparing never smokers to
those who used to smoke but quit
I women are 3 percentage points more likely to stay in good health
I age does not seem to matter in this relatively narrow age range of 50 to 60 years
I differences in years of education
I do not matter if we compare people with less than 8 years or more than 18 years,
I matter a lot in between, with a one-year difference corresponding to a 1.7 percentage
point difference in the likelihood of staying healthy
I income matters somewhat less, maybe non-linear?
I Regular exercise matters.
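
The confidence interval for current smoking can be reproduced from the coefficient and robust standard error in the results table. This sketch assumes the usual normal approximation with a critical value of 1.96; the slides do not state how the interval was computed.

```latex
\hat{\beta} \pm 1.96 \times SE(\hat{\beta})
  = -0.061 \pm 1.96 \times 0.024
  = [-0.108,\ -0.014]
  \approx [-0.11,\ -0.01]
```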

LPM’s predicted probabilities

Histogram of the predicted probabilities


I Predicted probabilities are calculated from the extended linear probability model.
I Predicted probability of staying healthy from this linear probability model ranges
between 0.036 and 1.011.
I LPM means it can be below 0 or above 1...
I Here, only marginally above 1.

Source: share-health dataset.



Compare predicted probability distribution

I Drill down in distribution:


I Looking at the composition of people: top vs bottom part of probability distribution
I Look at average values of covariates for top and bottom 1% of predicted
probabilities!

Top 1% predicted probability:
I no current smokers, women,
I avg 17.3 years of education, higher income
I BMI of 20.7, and 90% of them exercise.

Bottom 1% predicted probability:
I 37.5% current smokers, 63% men
I 7.6 years of education, lower income
I BMI of 30.5, 19% exercise


Modelling events: logit


Probability models: logit and probit

I Prediction: predicted probabilities need to be between 0 and 1.

I For prediction, we use non-linear models.

I Relate the probability of the y = 1 event to a nonlinear function of the linear
combination of the explanatory variables: the 'link function'.
I The link function is some F(·), s.t. F(y) may be used in linear models.

I Two options: logit and probit, with different link functions.

I The resulting probability is always strictly between zero and one.


Link functions I.
The logit model has the following form:

$$y^P = \Lambda(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots)}$$

where the link function $\Lambda(z) = \frac{\exp(z)}{1 + \exp(z)}$ is called the logistic function.

The probit model has the following form:

$$y^P = \Phi(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots)$$

where the link function $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$ is the cumulative distribution
function (CDF) of the standard normal distribution.
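
The two link functions can be evaluated with the Python standard library. A minimal sketch; the function names are chosen for this illustration, and the normal CDF is written with the error function, which is an equivalent form of the integral above.

```python
import math

def logistic_cdf(z):
    """Logit link: Lambda(z) = exp(z) / (1 + exp(z)), rewritten as 1 / (1 + exp(-z))."""
    # Fine for moderate z; very large negative z would overflow math.exp.
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Probit link: standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

for z in [-3, -1, 0, 1, 3]:
    print(f"z = {z:>2}: logit {logistic_cdf(z):.3f}, probit {normal_cdf(z):.3f}")
```

Both functions return 0.5 at z = 0 and stay strictly between 0 and 1 over this range; the logistic values are further from 0 and 1 at z = -3 and z = 3, which is the thicker-tails property.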

Link functions II.

I Both $\Lambda$ and $\Phi$ are increasing S-shaped curves, bounded between 0 and 1.
(Y here is $\Lambda(z)$ and $\Phi(z)$.)
I Plotted against their respective "z" values (here, -3 to 3).
I Small difference (almost indistinguishable): logit is less steep close to zero and one
= thicker tails than the probit.
I In our models, 'z' is a linear combination of β coefficients and x-s. The parameter estimates are
typically different in probit vs logit.

Logit and probit interpretation

I Both the probit and the logit transform the $\beta_0 + \beta_1 x_1 + \dots$ linear combination
using a link function that shows an S-shaped curve.
I The slope of this curve keeps changing as we change whatever is inside.
I The slope is steepest when $y^P = 0.5$;
I it is flatter further away; and it becomes very flat if $y^P$ is close to zero or one.

I The difference in $y^P$ that corresponds to a unit difference in any explanatory
variable is not the same everywhere.
I You need to take the partial derivatives. It depends on the value of x.

I Important consequence: no direct interpretation of the raw coefficient values!
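
Why the slope changes can be seen from the derivative of the link function. A sketch, writing z for the linear combination; $\phi$ (the standard normal density) is notation introduced here for the probit case and is not used in the slides.

```latex
\begin{aligned}
z &= \beta_0 + \beta_1 x_1 + \dots \\
\text{logit:}\quad \frac{\partial y^P}{\partial x_1} &= \beta_1 \Lambda(z)\bigl(1 - \Lambda(z)\bigr) = \beta_1 \, y^P (1 - y^P) \\
\text{probit:}\quad \frac{\partial y^P}{\partial x_1} &= \beta_1 \, \phi(z)
\end{aligned}
```

For the logit, $y^P(1 - y^P)$ is largest at $y^P = 0.5$, where the slope equals $\beta_1 / 4$, and it shrinks towards zero as $y^P$ approaches 0 or 1.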


Marginal differences

I The link function makes the association between x and $y^P$ vary with x, so for logit and
probit models we do not interpret raw coefficients!
I Instead, transform them into 'marginal differences' for interpretation purposes.
I The average marginal difference for x is the average difference in the probability
of y = 1 that corresponds to a one unit difference in x.
I Software may call them 'marginal effects' or 'average marginal effects (AME)' or
'average partial effects'.

I The average marginal difference has the exact same interpretation as the
coefficient of linear probability models.
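
A minimal sketch of how an average marginal difference could be computed for a logit with one continuous x, using the derivative of the logistic function at each observation. The coefficients and x values are hypothetical. Statistical software may handle binary x differently, for example by differencing predicted probabilities at x = 1 and x = 0.

```python
import math

def logistic_cdf(z):
    """Logit link: 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def average_marginal_difference(beta0, beta1, xs):
    """Average over observations of the logit slope beta1 * p * (1 - p)."""
    slopes = []
    for x in xs:
        p = logistic_cdf(beta0 + beta1 * x)
        slopes.append(beta1 * p * (1.0 - p))
    return sum(slopes) / len(slopes)

# Hypothetical logit coefficients and observed x values (illustration only).
xs = [0, 1, 2, 3, 4]
amd = average_marginal_difference(0.5, -0.3, xs)
print(f"raw coefficient = -0.3, average marginal difference = {amd:.3f}")
```

The average marginal difference has the sign of the raw coefficient but is smaller in absolute value, at most a quarter of it for the logit.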


Maximum likelihood estimation

I When estimating a logit or probit model, we use 'maximum likelihood' estimation.

I See 11.U2 for details.

I Maximum likelihood is another way to get coefficient estimates. Done in
steps.
I You specify a (conditional) distribution that you will use during the estimation.
I This is logistic for the logit and normal for the probit model.
I This distribution defines the likelihood function of the data. You maximize this function
w.r.t. your β parameters → gives the maximum likelihood estimates for this model.
I No closed-form solution → need to use search algorithms.
I Search algorithms will play a critical role in machine learning as well.
I More in DA3.
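
A minimal sketch of such a search algorithm: Newton-Raphson steps that maximize the logit log-likelihood for one regressor plus an intercept. The data are made up, `fit_logit` is a hypothetical helper, and real software uses more general and more robust optimizers.

```python
import math

def fit_logit(x, y, steps=25):
    """Maximize the logit log-likelihood for one regressor by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        # Gradient of the log-likelihood with respect to (b0, b1).
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Negative Hessian (a symmetric 2x2 matrix), built from weights p * (1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system by hand.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical data: x = 1 for smokers, y = 1 if the person stayed healthy.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [1, 1, 1, 0, 1, 1, 0, 0, 1, 0]
b0, b1 = fit_logit(x, y)
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
```

In this example x is a single dummy, so the fitted probabilities equal the group shares (0.8 and 0.4): b0 is the log odds of the x = 0 group and b1 is the difference in log odds.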


Predictions for LPM, Logit and Probit I.

Figure: Comparing probabilities from models

I Compare the three model results

I Baseline is LPM - extended model.
I 45 degree line is LPM
I Predicted probabilities from the logit and the probit shown vs LPM


Predictions for LPM, Logit and Probit II.

Figure: Comparing probabilities from models

I Predicted probabilities from the logit and the probit are practically
the same
I their range is between 0.10 and 0.92, which is narrower than that of the
LPM, which ranges from 0.036 to 1.011
I LPM, logit and probit models produce almost exactly the same
predicted probabilities
I except for the lowest and highest probabilities


Coefficient results for logit and probit


                                        (1)        (2)            (3)               (4)             (5)
Dep.var.: stays healthy                 LPM        logit coeffs   logit marginals   probit coeffs   probit marginals
Current smoker                          -0.061*    -0.284**       -0.061**          -0.171*         -0.060*
                                        (0.024)    (0.109)        (0.023)           (0.066)         (0.023)
Ever smoked                             0.015      0.078          0.017             0.044           0.016
                                        (0.020)    (0.092)        (0.020)           (0.056)         (0.020)
Female                                  0.033      0.161*         0.034*            0.097           0.034
                                        (0.018)    (0.082)        (0.018)           (0.050)         (0.018)
Years of education (if < 8)             -0.001     -0.003         -0.001            -0.002          -0.001
                                        (0.007)    (0.033)        (0.007)           (0.020)         (0.007)
Years of education (if >= 8 and < 18)   0.017**    0.079**        0.017**           0.048**         0.017**
                                        (0.003)    (0.016)        (0.003)           (0.010)         (0.003)
Years of education (if >= 18)           -0.010     -0.046         -0.010            -0.029          -0.010
                                        (0.012)    (0.055)        (0.012)           (0.033)         (0.012)
Income group                            0.008*     0.036*         0.008*            0.022*          0.008*
                                        (0.003)    (0.015)        (0.003)           (0.009)         (0.003)
Exercises regularly                     0.053**    0.255**        0.055**           0.151**         0.053**
                                        (0.017)    (0.079)        (0.017)           (0.048)         (0.017)
Age, BMI, Country                       YES        YES            YES               YES             YES
Observations                            3,109      3,109          3,109             3,109           3,109

Does smoking pose a health risk?– logit and probit

I LPM: interpret the coefficients.

I Logit, probit: interpret the marginal differences. Basically the same.
I Marginal differences are essentially the same across the logit and the probit.
I Essentially the same as the corresponding LPM coefficients.

I Happens often:
I We cannot know which is the "right model" for inference.
I Often LPM is good enough for interpretation.
I Check if logit/probit are very different.
I Investigate functional forms if yes.


Goodness of fit measures

I There is no universally accepted goodness of fit measure...

I This is because we do not observe probabilities, only 1 and 0...

I R-squared does not have the same meaning as before.

I Evaluating fit for probability models, we compare predictions that are between zero
and one to values that are zero or one.
I Predicted probabilities strictly between zero and one can never exactly equal the
zero-one outcomes, so we'd never get it right.

I R-squared is a less natural measure of fit, but we can calculate it as usual.

I But: R-squared cannot be interpreted the same way we did for linear models.


Brier score

I Brier score

$$\text{Brier} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i^P - y_i)^2$$
I The Brier score is the average distance (mean squared difference) between
predicted probabilities and the actual value of y .
I Smaller the Brier score, the better.
I When comparing two predictions, the one with the smaller Brier score is the better
prediction because it produces less (squared) error on average.
I Related to a main concept in prediction: mean squared error (MSE)
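
The Brier score is simple to compute by hand. A minimal sketch comparing two sets of made-up predicted probabilities for the same outcomes (all values are hypothetical):

```python
def brier_score(predicted, actual):
    """Mean squared difference between predicted probabilities and 0-1 outcomes."""
    return sum((p - y) ** 2 for p, y in zip(predicted, actual)) / len(actual)

# Hypothetical outcomes and two competing sets of predictions.
actual = [1, 0, 1, 1]
sharp_predictions = [0.9, 0.2, 0.8, 0.7]
vague_predictions = [0.6, 0.5, 0.6, 0.6]

brier_sharp = brier_score(sharp_predictions, actual)
brier_vague = brier_score(vague_predictions, actual)
print(f"sharp: {brier_sharp:.4f}, vague: {brier_vague:.4f}")
```

The first set of predictions is closer to the realized outcomes, so it gets the smaller Brier score.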


Pseudo R2

I Pseudo R-squared
I Similar to the R-squared: measures the goodness of fit, tailored to binary outcomes.
I Many versions of this measure. Most widely used: McFadden's R-squared
I Computed as one minus the ratio of the log-likelihood of the model to that of the intercept-only model.
I Can be computed for the logit and the probit but not for the linear probability
model. (No likelihood function there...)

I Another alternative is the 'log-loss' measure

I Negative number. Better prediction comes with a smaller log-loss in absolute value.
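
A minimal sketch of both measures, given predicted probabilities that are strictly between 0 and 1. The sign convention for log-loss (average log-likelihood, hence negative) is an assumption that follows the slides; other sources report the same quantity with a positive sign. The intercept-only model is represented here by predicting the sample share of y = 1 for everyone. All data are hypothetical.

```python
import math

def log_likelihood(predicted, actual):
    """Bernoulli log-likelihood of 0-1 outcomes given predicted probabilities."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p)
               for p, y in zip(predicted, actual))

def log_loss(predicted, actual):
    """Average log-likelihood per observation (negative, as in the slides)."""
    return log_likelihood(predicted, actual) / len(actual)

def mcfadden_r2(predicted, actual):
    """One minus the ratio of model log-likelihood to intercept-only log-likelihood."""
    share = sum(actual) / len(actual)
    ll_null = log_likelihood([share] * len(actual), actual)
    return 1.0 - log_likelihood(predicted, actual) / ll_null

# Hypothetical outcomes and predicted probabilities.
actual = [1, 0, 1, 1]
predicted = [0.9, 0.2, 0.8, 0.7]
print(f"log-loss = {log_loss(predicted, actual):.3f}")
print(f"McFadden pseudo R-squared = {mcfadden_r2(predicted, actual):.3f}")
```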


Practical use

I There are several measures of model fit; they often give the same ranking of
models.
I Do not use: R-squared could be computed for any model, but it no longer has the
interpretation we had for linear models with quantitative dependent variable.
I Only probit vs logit: pseudo R-squared may be used to rank logit and probit
models.
I Use, especially for prediction: Brier score is a metric that can be computed for all
models and is used in prediction.


Does smoking pose a health risk?– Goodness of fit

Table: Statistics of goodness of fit for probability prediction models

Statistic          Linear probability   Logit    Probit
R-squared          0.103                0.104    0.104
Brier score        0.215                0.214    0.214
Pseudo R-squared   n.a.                 0.080    0.080
Log-loss           -0.621               -0.617   -0.617

Source: share-health data. People of age 50 to 60 from 14 European countries who reported to be healthy in 2011.
N=3109.


Does smoking pose a health risk?– Goodness of fit

I Stable ranking: better predictions have
I a higher R-squared and pseudo R-squared,
I a lower Brier score,
I a smaller log-loss in absolute value.
I Logit and probit are of the same quality.
I Logit/probit predictions are better than the predictions from the linear probability model, but the
differences are small.


Bias of the predictions

I Post-prediction: we may be interested to study some features of our model

I One specific goal: evaluating the bias of the prediction.


I Probability predictions are unbiased if they are right on average = the average of
predicted probabilities is equal to the actual probability of the outcome.
I If the prediction is unbiased, the bias is zero.

I If, in our data, 20% of observations have y = 0 and 80% have y = 1, and the
average of our predictions is $\sum_{i=1}^{N} \hat{y}_i / N = 0.8$, then our prediction is unbiased.
I A large value of bias indicates a greater tendency to underestimate or overestimate
the chance of an event.
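
A minimal sketch of the bias calculation. The sign convention (mean prediction minus mean outcome) is an assumption made here, and the data are hypothetical, built to mirror the 80% example above.

```python
def prediction_bias(predicted, actual):
    """Mean predicted probability minus the actual share of y = 1."""
    return sum(predicted) / len(predicted) - sum(actual) / len(actual)

# Hypothetical data: 80% of outcomes are 1.
actual = [0, 1, 1, 1, 1]
unbiased_predictions = [0.5, 0.7, 0.8, 1.0, 1.0]  # average is 0.8
too_low_predictions = [0.3, 0.5, 0.6, 0.8, 0.8]   # average is 0.6

print(f"bias (first set):  {prediction_bias(unbiased_predictions, actual):+.2f}")
print(f"bias (second set): {prediction_bias(too_low_predictions, actual):+.2f}")
```

A negative value under this convention means the predictions underestimate the chance of the event on average.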


Calibration

I Unbiasedness refers to the whole distribution of probability predictions.

I A finer and stricter concept is calibration.
I A prediction is well calibrated if the actual probability of the outcome is equal to the
predicted probability for each and every value of the predicted probability.
I You take predicted probabilities which are around 10% and check the average for
the realized outcome. If it is 10%, then the prediction is well calibrated there.
I A 'calibration curve' is used to show this.
I A model may be unbiased (right on average) but not well calibrated,
I e.g. if it underestimates high-probability events and overestimates low-probability ones.


Calibration curve

I A calibration curve
I Horizontal axis shows the values of all predicted probabilities ($\hat{y}^P$).
I Vertical axis shows the fraction of y = 1 observations for all observations with the
corresponding predicted probability.
I In a well-calibrated case, the calibration curve is close to the 45 degree line.

I In practice we create bins for predicted probabilities and compare each bin's average prediction with the
actual event's frequency in that bin.
I Use percentiles in general. In some cases equal-width bins are used (this gives a noisier
estimate).
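
A minimal sketch of the binning step behind a calibration curve. It sorts observations by predicted probability and splits them into bins with (nearly) equal numbers of observations, which approximates percentile-based bins. `calibration_curve` is a hypothetical helper written for this illustration, and the data are made up.

```python
def calibration_curve(predicted, actual, n_bins=10):
    """Return (mean predicted probability, share of y = 1) for equal-count bins."""
    pairs = sorted(zip(predicted, actual))
    n = len(pairs)
    points = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty bins when there are fewer observations than bins
        mean_pred = sum(p for p, _ in chunk) / len(chunk)
        share_ones = sum(y for _, y in chunk) / len(chunk)
        points.append((mean_pred, share_ones))
    return points

# Hypothetical data: predictions of 0.2 and 0.8, with matching event frequencies.
predicted = [0.2] * 5 + [0.8] * 5
actual = [1, 0, 0, 0, 0] + [1, 1, 1, 1, 0]
points = calibration_curve(predicted, actual, n_bins=2)
for mean_pred, share_ones in points:
    print(f"mean prediction {mean_pred:.2f} -> actual share {share_ones:.2f}")
```

Plotting the second element of each pair against the first gives the calibration curve; points on the 45 degree line indicate good calibration.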


Calibration curve

I A calibration curve for the logit model
I 10 bins
I Not only unbiased, but well calibrated!


Probability models summary

I Find patterns with ease when y is binary - model probability with regressions
I Linear probability model is mostly good enough, easy inference.
I Predicted values could be below 0 or above 1.
I Logit (and probit): better when the aim is prediction, predicted values strictly between
0 and 1.
I Most often, LPM, logit, probit give similar inference.
I Use average marginal differences.
I No trivial goodness of fit measure. Brier score or pseudo R-squared.
I Calibration is useful diagnostics tool: well-calibrated models will predict a 20%
chance for events that tend to happen one out of five cases.

