Békés-Kézdi: Data Analysis, Chapter 11: Modelling probabilities

Data Analysis for Business, Economics, and Policy
Gábor Békés (Central European University)
Gábor Kézdi (University of Michigan)
Cambridge University Press, 2021
gabors-data-analysis.com

Central European University

Version: v3.1 License: CC BY-NC 4.0
Any comments or suggestions:
gabors.da.contact@gmail.com
Concepts LPM CS A1 Logit&probit CS A2-A3 Goodness of fit CS A4a Diagnostics CS A4b Summary

Motivation

I What are the health benefits of not smoking? Considering the 50+ population, we
can investigate if differences in smoking habits are correlated with differences in
health status.

Data Analysis for Business, Economics, and Policy 2 / 43 Gábor Békés (Central European University)

Binary events

I Start with binary events: things that either happen or don’t happen, captured by a binary variable.
I How can we model these events?
I With a binary y we do not observe larger or smaller values 'on average', only 0 or 1.

I Solution - model instead the probabilities!

$E[y] = P[y = 1]$
I The average of a 0–1 binary variable is also the probability that it is one.
I Frequency (25% of cases) — probability (25% chance)
I Expected value = average probability of event happening
I Use the same tools, but interpretation is changing!
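
A minimal sketch of the point above, using made-up 0-1 outcomes (the data are hypothetical, not from the case study): the average of a binary variable equals the share of observations where it is one.

```python
# Hypothetical 0-1 outcomes (1 = event happened); the values are made up.
y = [1, 0, 0, 0, 1, 0, 0, 0]

mean_y = sum(y) / len(y)          # average of the binary variable
share_ones = y.count(1) / len(y)  # relative frequency of y = 1

print(mean_y, share_ones)
```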


Modelling events: LPM


Linear probability model - LPM

I Modelling probability – regression with binary dependent variable.
I Linear Probability Model (LPM) is a linear regression with a binary dependent
variable

I Differences in average y are also differences in the probability that y = 1

I Linear regressions with binary dependent variables show differences in expected y by x, which are also differences in the probability that y = 1 by x.
I Introduce notation for probability:

$y^P = P[y = 1 \mid x_1, x_2, \ldots]$
I Linear probability model (LPM) regression is

$y^P = \beta_0 + \beta_1 x_1 + \beta_2 x_2$


Linear probability model - interpretation

$y^P = \beta_0 + \beta_1 x_1 + \beta_2 x_2$

I $y^P$ denotes the probability that the dependent variable is one, conditional on the right-hand-side variables of the model.
I $\beta_0$ shows the probability that y = 1 if all x are zero.
I $\beta_1$ shows the difference in the probability that y = 1 for observations that are different in $x_1$ but are the same in terms of $x_2$.
I Still true: average difference in y corresponding to differences in $x_1$ with $x_2$ being the same.
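
To illustrate the interpretation in the simplest case, here is a sketch of an LPM with a single binary x, estimated by OLS in plain Python. The data and the helper `ols_simple` are made up for illustration. With one binary regressor, the intercept is the share of y = 1 in the x = 0 group and the slope is the difference in shares between the two groups.

```python
def ols_simple(x, y):
    """Return (intercept, slope) of a simple OLS regression of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

# Made-up data: x is a 0-1 explanatory variable, y is a 0-1 outcome.
x = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
y = [1, 1, 1, 1, 0, 1, 1, 0, 0, 0]

beta0, beta1 = ols_simple(x, y)

# Shares of y = 1 within each group of x.
p_x0 = sum(yi for xi, yi in zip(x, y) if xi == 0) / x.count(0)
p_x1 = sum(yi for xi, yi in zip(x, y) if xi == 1) / x.count(1)

print(beta0, beta1, p_x0, p_x1 - p_x0)
```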


Linear probability model - modelling

I Linear probability model (LPM) using OLS.

I We can use all transformations in x that we used before:
I Logs, polynomials, splines, dummies, interactions, etc.
I All formulae and interpretations for standard errors, confidence intervals, hypotheses and p-values of tests are the same.
I Heteroskedasticity-robust standard errors are essential in this case!
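
Why robust standard errors matter here: with a binary y, the conditional variance of y is $y^P(1 - y^P)$, so it changes with x by construction. The sketch below compares the classical and a heteroskedasticity-robust (HC0) standard error of the slope in a simple regression. It is an illustration on made-up data, and the function name `ols_with_se` is hypothetical; statistical software typically offers several robust variants.

```python
import math

def ols_with_se(x, y):
    """Slope of a simple OLS regression with classical and robust (HC0) SE."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical SE: assumes the same residual variance for every observation.
    se_classical = math.sqrt(sum(e * e for e in resid) / (n - 2) / sxx)
    # Robust (HC0) SE: lets the residual variance differ across observations.
    se_robust = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, se_classical, se_robust

# Made-up data: x is quantitative, y is binary.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 1, 0, 1, 1, 1]

slope, se_classical, se_robust = ols_with_se(x, y)
print(slope, se_classical, se_robust)
```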


Predicted values in LPM

I Predicted values - $\hat{y}^P$ - are calculated the same way as in any linear regression, but they are to be interpreted as probabilities, which may be problematic.

$\hat{y}^P = \hat{\beta}_0 + \hat{\beta}_1 x_1 + \hat{\beta}_2 x_2$

I Predicted values need to be between 0 and 1 because they are probabilities

I But in LPM, they may be below 0 and above 1. No formal bounds in the model.
I With continuous variables that can take any value (GDP, Population, sales, etc), this
could be a serious issue
I With binary variables, no problem (’saturated models’)

I Problem if goal is prediction!

I Not a big issue for inference → uncover patterns of association.
I But note in theory it may give biased estimates...
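
A small sketch of the bounds problem, with hypothetical LPM coefficients (not estimates from the case study): because the model is a straight line in x, predicted "probabilities" leave the [0, 1] range once x is far enough from its typical values.

```python
# Hypothetical LPM coefficients (not estimates from the case study).
beta0, beta1 = 0.10, 0.15

x_values = [-2, 0, 2, 4, 8]
predictions = [beta0 + beta1 * x for x in x_values]

# Collect the x values whose prediction falls outside the [0, 1] range.
out_of_range = [(x, round(p, 2)) for x, p in zip(x_values, predictions)
                if p < 0 or p > 1]
print(out_of_range)
```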

Does smoking pose a health risk?

The question of the case study is whether, and by how much, smokers are less likely to stay healthy than non-smokers.
I focus on people of age 50 to 60 who consider themselves healthy
I ask them four years later as well

Research question: Does smoking lead to deteriorating health?


Data

I y = 1 if person stayed healthy

I y = 0 if person became unhealthy
I Data comes from SHARE (Survey of Health, Ageing and Retirement in Europe)
I 14 European countries
I Demographic information on all individuals
I 2011 and 2015 participants are used
I Being healthy means reporting “feeling excellent” or “very good”
I N = 3,109


LPM

Start with a simple univariate model with being a smoker as the only explanatory variable.

$\text{stays healthy}^P = \alpha + \beta \times \text{smoker}$

Both the dependent and the explanatory variable are dummy variables.

Estimated β is -0.072

Can we draw a scatterplot?


Scatterplot

Figure: Staying healthy - scatterplot and regression line


LPM Interpretation

I The coefficient on smokes shows the difference in the probability of staying healthy
comparing current smokers and current nonsmokers.

I Current smokers are 7 percentage points less likely to stay healthy than current nonsmokers.
I Can add additional controls to capture if quitting matters.


LPM with many regressors I.

I Multiple regression – closer to causality

I compare people who are very similar in many respects but are different in smoking
habits
I find many confounders that could be correlated with smoking habits and health
outcomes
I Smokers / non-smokers – different in many other behaviors and conditions:
I personal traits
I behavior such as eating, exercise
I socio-economic conditions
I background - e.g. country they live in


LPM with many regressors II.

I Pick variables:
I gender dummy, age, years of education,
I income (measured as in which of the 10 income groups individuals belong within
their country),
I body mass index (a measure of weight relative to height),
I whether the person exercises regularly, the country in which they live.
I country - set of binary indicators.

I Think functional form:

I Continuous control variables might have nonlinear relationship with staying healthy
I Explore the relationship with nonparametric tools


Functional form selection

Figure: two panels, "Staying healthy and years of education" and "Staying healthy and income group".

Decisions: (1) Include education as a piecewise linear spline with knots at 8 and 18 years; (2) include income in a linear way.

LPM results
Probability of staying healthy - extended model

VARIABLES                                  Staying healthy   (Robust SE)
Current smoker (Y/N)                       -0.061*           (0.024)
Ever smoked (Y/N)                          0.015             (0.020)
Female (Y/N)                               0.033             (0.018)
Age                                        -0.003            (0.003)
Years of education (for < 8)               -0.001            (0.007)
Years of education (for >= 8 and < 18)     0.017**           (0.003)
Years of education (for >= 18)             -0.010            (0.012)
Income group                               0.008*            (0.003)
BMI (for < 35)                             -0.012**          (0.003)
BMI (for >= 35)                            0.006             (0.017)
Exercises regularly (Y/N)                  0.053**           (0.017)
Country indicators                         YES

Observations                               3,109
Robust standard errors in parentheses. ** p<0.01, * p<0.05
Y/N denotes binary vars. BMI and education entered as spline. Age in years. Income in deciles.

Detour: Regression Tables

I If you need to show many explanatory variables:

I Do not show a table with 12*2 rows; people will not be able to read it.

I Either show only selected variables

I Or create two columns.

I Make sure you have a title, the number of observations, and a footnote on SE and stars.

I SE, stars: many different notations. Check carefully.
I Default is ***= p<0.01. But often **=p<0.01 (like here)


Does smoking pose a health risk?– LPM interpret

I coefficient on currently smoking is −0.06
I The 95% confidence interval is relatively wide [−0.11, −0.01], but it does not
contain zero
I no significant differences in staying healthy when comparing never smokers to
those who used to smoke but quit
I women are 3 percentage points more likely to stay in good health
I age does not seem to matter in this relatively narrow age range of 50 to 60 years
I differences in years of education
I do not matter if we compare people with less than 8 years or more than 18 years,
I matters a lot in-between, with a one-year-difference corresponding to 1.7 percentage
point difference in the likelihood of staying healthy
I income matters somewhat less, maybe non-linear?
I Regular exercise matters.
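
As a rough check of the confidence interval quoted above, the arithmetic can be sketched as follows. It uses the current-smoker coefficient and robust standard error from the extended LPM table and assumes the usual normal approximation with critical value 1.96.

```python
coef, se = -0.061, 0.024   # LPM estimate and robust SE from the results table
z = 1.96                   # assumed normal critical value for a 95% interval

ci_low, ci_high = coef - z * se, coef + z * se
print(round(ci_low, 2), round(ci_high, 2))
```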

LPM’s predicted probabilities

Histogram of the predicted probabilities

I Predicted probabilities are
calculated from the extended
linear probability model.
I Predicted probability of
staying healthy from this
linear probability model ranges
between 0.036 and 1.011
I LPM means it can be
below 0 or above 1...
I Here, only marginally
above 1

Source: share-health dataset.


Compare predicted probability distribution

I Drill down in distribution:

I Looking at the composition of people: top vs bottom part of probability distribution
I Look at average values of covariates for top and bottom 1% of predicted
probabilities!

Top 1% predicted probability:
I no current smokers, women,
I avg 17.3 years of education, higher income
I BMI of 20.7, and 90% of them exercise.

Bottom 1% predicted probability:
I 37.5% current smokers, 63% men
I 7.6 years of education, lower income
I BMI of 30.5, 19% exercise


Modelling events: logit


Probability models: logit and probit

I Prediction: predicted probabilities need to be between 0 and 1

I For prediction, we use non-linear models

I Relate the probability of the y = 1 event to a nonlinear function of the linear
combination of the explanatory variables -> ‘Link function’
I Link function is some F(·), s.t. F(y) may be used in linear models.

I Two options: Logit and probit – different link function

I Resulting probability is always strictly between zero and one.


Link functions I.
The logit model has the following form:

$y^P = \Lambda(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots) = \frac{\exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots)}{1 + \exp(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots)}$

where the link function $\Lambda(z) = \frac{\exp(z)}{1 + \exp(z)}$ is called the logistic function.

The probit model has the following form:

$y^P = \Phi(\beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots)$

where the link function $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt$ is the cumulative distribution function (CDF) of the standard normal distribution.
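
The two functions can be sketched in a few lines of Python using only the standard library. The function names are chosen here for illustration; the standard normal CDF is written via the error function, $\Phi(z) = \frac{1}{2}\left(1 + \operatorname{erf}(z/\sqrt{2})\right)$.

```python
import math

def logistic(z):
    """Logistic function: exp(z) / (1 + exp(z)), written as 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Both curves pass through 0.5 at z = 0 and stay between 0 and 1.
for z in (-3, -1, 0, 1, 3):
    print(z, round(logistic(z), 4), round(normal_cdf(z), 4))
```

Note that the two curves are evaluated at the same z here; in estimated models the logit and probit coefficients differ, so the resulting probabilities end up much closer than this comparison suggests.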

Link functions II.

I Both Λ and Φ are increasing S-shaped curves, bounded between 0 and 1. (Y here is Λ(z) and Φ(z).)
I Plotted against their respective "z" values. (Here -3 to 3.)
I Small difference (practically indistinguishable): the logit is less steep close to zero and one, so it has thicker tails than the probit.
I In our models, ‘z’ is a linear combination of β coefficients and x-s. The parameter estimates are typically different in probit vs logit.

Logit and probit interpretation

I Both the probit and the logit transform the β0 + β1 x1 + ... linear combination
using a link function that shows an S-shaped curve.
I The slope of this curve keeps changing as we change whatever is inside.
I The slope is steepest when $y^P = 0.5$;
I it is flatter further away; and it becomes very flat if $y^P$ is close to zero or one.

I The difference in $y^P$ that corresponds to a unit difference in any explanatory variable is not constant.
I You need to take the partial derivatives. It depends on the value of x.

I Important consequence: no direct interpretation of the raw coefficient values!
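
A short sketch of why the raw coefficient has no direct interpretation, for the logit case. The derivative of the logistic function is $\Lambda'(z) = \Lambda(z)(1 - \Lambda(z))$, so the slope of $y^P$ with respect to x is $\beta \Lambda'(z)$ and changes with z. The coefficient value below is hypothetical.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_slope(z):
    """Derivative of the logistic function: L'(z) = L(z) * (1 - L(z))."""
    p = logistic(z)
    return p * (1.0 - p)

beta1 = 0.5  # hypothetical logit coefficient on x1

# Same coefficient, different slope in probability terms depending on z.
for z in (-3, 0, 3):
    print(z, round(logistic(z), 3), round(beta1 * logistic_slope(z), 3))
```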


Marginal differences

I The link function makes the association between x and $y^P$ vary with x, so for logit and probit models we do not interpret raw coefficients!
I Instead, transform them into ‘marginal differences’ for interpretation purposes
I The average marginal difference for x is the average difference in the probability
of y = 1, that corresponds to a one unit difference in x.
I Software may call them ‘marginal effects’ or ‘average marginal effects (AME)’ or
‘average partial effects’.

I Average marginal difference has the exact same interpretation as the coefficient of linear probability models.
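
One way to compute an average marginal difference for a logit with a continuous x is sketched below: evaluate the derivative $\beta_1 \Lambda(z_i)(1 - \Lambda(z_i))$ at every observation and take the average. The coefficients and x values are hypothetical; for binary explanatory variables, software may instead average differences in predicted probabilities.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical logit coefficients and x values (not from the case study).
beta0, beta1 = -1.0, 0.8
x = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]

# Marginal difference for each observation: beta1 * L(z_i) * (1 - L(z_i)).
p = [logistic(beta0 + beta1 * xi) for xi in x]
marginal = [beta1 * pi * (1.0 - pi) for pi in p]

# Average over all observations.
avg_marginal_difference = sum(marginal) / len(marginal)
print(round(avg_marginal_difference, 3))
```

The result is on the probability scale, and it is always smaller in absolute value than the raw logit coefficient because the logistic slope never exceeds 0.25.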


Maximum likelihood estimation

I When estimating a logit or probit model, we use ‘maximum likelihood’ estimation.

I See 11.U2 for details.

I Maximum likelihood is another way to get coefficient estimates. It is done in steps.
I You specify a (conditional) distribution that you will use during the estimation.
I This is logistic for the logit and normal for the probit model.
I You write down the likelihood of the observed data under this distribution and maximize it w.r.t. your β parameters → this gives the maximum likelihood estimates for this model.
I No closed form solution → need to use search algorithms.
I Search algorithms will play critical role in machine learning as well.
I More in DA3.
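
The search idea can be sketched with Newton's method for a logit with one explanatory variable. This is an illustrative implementation on made-up data, not the routine any particular software uses; `fit_logit` is a hypothetical name. Each step moves the coefficients using the gradient and the curvature of the log-likelihood.

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logit(x, y, n_iter=25):
    """Maximize the logit log-likelihood in (b0, b1) with Newton steps."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        p = [logistic(b0 + b1 * xi) for xi in x]
        # Gradient of the log-likelihood.
        g0 = sum(yi - pi for yi, pi in zip(y, p))
        g1 = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Negative Hessian (2x2, symmetric), weights are p(1 - p).
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton update: add (negative Hessian)^-1 times the gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Made-up data: binary x, with 7 of 10 ones when x = 0 and 4 of 10 when x = 1.
x = [0] * 10 + [1] * 10
y = [1] * 7 + [0] * 3 + [1] * 4 + [0] * 6

b0, b1 = fit_logit(x, y)
print(round(b0, 3), round(b1, 3))
```

With a single binary x, the fitted probabilities should reproduce the two group shares, which gives a simple way to check the search.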


Predictions for LPM, Logit and Probit I.

Comparing probabilities from models

I Compare the three model results

I Baseline is LPM - extended model.
I 45 degree line is LPM
I Predicted probabilities from the
logit and the probit shown vs LPM


Predictions for LPM, Logit and Probit II.

Figure: Comparing probabilities from models

I Predicted probabilities from the logit and the probit are practically the same
I range is between 0.10 and 0.92, which is narrower than the LPM, which ranges from 0.036 to 1.011
I LPM, logit and probit models
produce almost exactly the same
predicted probabilities
I except for the lowest and highest
probabilities


Coefficient results for logit and probit

                                        (1)       (2)           (3)              (4)            (5)
Dep.var.: stays healthy                 LPM       logit coeffs  logit marginals  probit coeffs  probit marginals
Current smoker                          -0.061*   -0.284**      -0.061**         -0.171*        -0.060*
                                        (0.024)   (0.109)       (0.023)          (0.066)        (0.023)
Ever smoked                             0.015     0.078         0.017            0.044          0.016
                                        (0.020)   (0.092)       (0.020)          (0.056)        (0.020)
Female                                  0.033     0.161*        0.034*           0.097          0.034
                                        (0.018)   (0.082)       (0.018)          (0.050)        (0.018)
Years of education (if < 8)             -0.001    -0.003        -0.001           -0.002         -0.001
                                        (0.007)   (0.033)       (0.007)          (0.020)        (0.007)
Years of education (if >= 8 and < 18)   0.017**   0.079**       0.017**          0.048**        0.017**
                                        (0.003)   (0.016)       (0.003)          (0.010)        (0.003)
Years of education (if >= 18)           -0.010    -0.046        -0.010           -0.029         -0.010
                                        (0.012)   (0.055)       (0.012)          (0.033)        (0.012)
Income group                            0.008*    0.036*        0.008*           0.022*         0.008*
                                        (0.003)   (0.015)       (0.003)          (0.009)        (0.003)
Exercises regularly                     0.053**   0.255**       0.055**          0.151**        0.053**
                                        (0.017)   (0.079)       (0.017)          (0.048)        (0.017)
Age, BMI, Country                       YES       YES           YES              YES            YES
Observations                            3,109     3,109         3,109            3,109          3,109

Does smoking pose a health risk?– logit and probit

I LPM – interpret the coefficients.

I Logit, probit - Interpret the marginal differences. Basically the same.
I Marginal differences are essentially the same across the logit and the probit.
I Essentially the same as the corresponding LPM coefficients.

I Happens often:
I We cannot know which is the "right model" for inference
I Often LPM is good enough for interpretation.
I Check if logit/probit very different.
I Investigate functional forms if yes.


Goodness of fit measures

I There is no generally accepted goodness of fit measure...

I This is because we do not observe probabilities, only 1 and 0...

I R-squared does not have the same meaning as before

I Evaluating fit for probability models, we compare predictions that are between zero and one to values that are zero or one.
I But predicted probabilities strictly between zero and one can never exactly equal the zero-one values, so the fit can never be perfect.

I R-squared less natural measure of fit, but we can calculate it as usual.

I But: R-squared can not be interpreted the same way we did for linear models.


Brier score

I Brier score
$\text{Brier} = \frac{1}{n} \sum_{i=1}^{n} (\hat{y}_i^P - y_i)^2$
I The Brier score is the average distance (mean squared difference) between
predicted probabilities and the actual value of y .
I The smaller the Brier score, the better.
I When comparing two predictions, the one with the smaller Brier score is the better
prediction because it produces less (squared) error on average.
I Related to a main concept in prediction: mean squared error (MSE)
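
The Brier score formula translates directly into code. A minimal sketch with made-up outcomes and predicted probabilities (the function name `brier_score` is chosen here for illustration):

```python
def brier_score(y, p):
    """Mean squared difference between predicted probabilities and outcomes."""
    return sum((pi - yi) ** 2 for yi, pi in zip(y, p)) / len(y)

# Made-up outcomes and two competing sets of predicted probabilities.
y = [1, 0, 1, 0]
p_sharp = [0.8, 0.2, 0.6, 0.4]
p_flat = [0.5, 0.5, 0.5, 0.5]

print(brier_score(y, p_sharp), brier_score(y, p_flat))
```

Here the first set of predictions has the smaller score, so it would be ranked as the better prediction.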


Pseudo R2

I Pseudo R-squared
I Similar to the R-squared – measures the goodness of fit, tailored to binary outcomes.
I Many versions of this measure. Most widely used: McFadden’s R-squared
I Computed as one minus the ratio of the log-likelihood of the model to the log-likelihood of an intercept-only model.
I Can be computed for the logit and the probit but not for the linear probability
model. (No likelihood function there...)

I Another alternative is ‘Log-loss’ measure

I Negative number. Better prediction comes with a smaller log-loss in absolute values.
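
Both measures are built from the log-likelihood, as the sketch below shows. It follows the sign convention used here, where log-loss is the average log-likelihood per observation and hence negative; some libraries report its negative instead. The predicted probabilities are made up rather than estimated, so the numbers only illustrate the arithmetic.

```python
import math

def log_likelihood(y, p):
    """Sum of log predicted probabilities of the outcomes that occurred."""
    return sum(math.log(pi) if yi == 1 else math.log(1.0 - pi)
               for yi, pi in zip(y, p))

def log_loss(y, p):
    """Average log-likelihood per observation (a negative number)."""
    return log_likelihood(y, p) / len(y)

def mcfadden_r2(y, p):
    """One minus the ratio of model to intercept-only log-likelihood."""
    p_null = sum(y) / len(y)  # intercept-only model predicts the sample share
    ll_null = log_likelihood(y, [p_null] * len(y))
    return 1.0 - log_likelihood(y, p) / ll_null

# Made-up outcomes and predicted probabilities.
y = [1, 0, 1, 0]
p = [0.8, 0.2, 0.6, 0.4]

print(round(log_loss(y, p), 3), round(mcfadden_r2(y, p), 3))
```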


Practical use

I There are several measures of model fit; they often give the same ranking of models.
I Do not use: R-squared could be computed for any model, but it no longer has the
interpretation we had for linear models with quantitative dependent variable.
I Only probit vs logit: pseudo R-squared may be used to rank logit and probit
models.
I Use, especially for prediction: Brier score is a metric that can be computed for all
models and is used in prediction.


Does smoking pose a health risk?– Goodness of fit

Table: Goodness-of-fit statistics for probability prediction models

Statistic          Linear probability   Logit    Probit
R-squared          0.103                0.104    0.104
Brier score        0.215                0.214    0.214
Pseudo R-squared   n.a.                 0.080    0.080
Log-loss           -0.621               -0.617   -0.617

Source: share-health data. People of age 50 to 60 from 14 European countries who reported to be healthy in 2011. N=3109.


Does smoking pose a health risk?– Goodness of fit

I Stable ranking: better predictions have
I a higher R-squared and pseudo R-squared,
I a lower Brier score,
I a smaller log-loss in absolute value.
I Logit and the probit are of the same quality.
I Logit/probit better than the predictions from linear probability model. The
differences are small.


Bias of the predictions

I Post-prediction: we may be interested to study some features of our model

I One specific goal: evaluating the bias of the prediction.

I Probability predictions are unbiased if they are right on average = the average of
predicted probabilities is equal to the actual probability of the outcome.
I If the prediction is unbiased, the bias is zero.

I If, in our data, 20% of observations have y = 0 and 80% have y = 1, and the average of our prediction is $\frac{1}{N}\sum_{i=1}^{N} \hat{y}_i = 0.8$, then our prediction is unbiased.
I A large value of bias indicates a greater tendency to underestimate or overestimate
the chance of an event.
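
The bias check amounts to comparing two averages. A minimal sketch on made-up data that mirrors the 80% example above (the function name `prediction_bias` is hypothetical):

```python
def prediction_bias(y, p):
    """Mean predicted probability minus the observed share of y = 1."""
    return sum(p) / len(p) - sum(y) / len(y)

y = [0, 1, 1, 1, 1]                  # 80% of outcomes are 1
p_good = [0.6, 0.7, 0.8, 0.9, 1.0]   # predictions averaging 0.8
p_low = [0.5, 0.5, 0.5, 0.5, 0.5]    # predictions averaging 0.5

print(prediction_bias(y, p_good), prediction_bias(y, p_low))
```

A negative value means the predictions underestimate the chance of the event on average.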


Calibration

I Unbiasedness refers to the whole distribution of probability predictions.

I A finer and stricter concept is calibration
I A prediction is well calibrated if the actual probability of the outcome is equal to the
predicted probability for each and every value of the predicted probability.
I You take predicted probabilities which are around 10% and check the average for
the realized outcome. If it is 10%, then the prediction is well calibrated.
I ‘Calibration curve’ is used to show this.
I A model may be unbiased (right on average) but not well calibrated
I underestimate high probability events and overestimate low probability ones


Calibration curve

I A calibration curve
I Horizontal axis shows the values of all predicted probabilities (ŷ P ).
I Vertical axis shows the fraction of y = 1 observations for all observations with the
corresponding predicted probability.
I In a well-calibrated case, the calibration curve is close to the 45 degree line.

I In practice we create bins for predicted probabilities and, within each bin, compare the average predicted probability to the actual share of events.
I Use percentiles in general. In some cases equal-width bins are used (this gives a noisier estimate).
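
The binning step can be sketched as follows, using equal-count bins (in the spirit of percentiles) on made-up data. The function name `calibration_bins` is hypothetical. Each bin contributes one point of the calibration curve: mean predicted probability on the horizontal axis, share of y = 1 on the vertical axis.

```python
def calibration_bins(y, p, n_bins):
    """Mean predicted probability and mean outcome within equal-count bins."""
    pairs = sorted(zip(p, y))  # order observations by predicted probability
    n = len(pairs)
    rows = []
    for b in range(n_bins):
        chunk = pairs[b * n // n_bins:(b + 1) * n // n_bins]
        if not chunk:
            continue  # skip empty bins when there are few observations
        mean_pred = sum(pi for pi, _ in chunk) / len(chunk)
        mean_actual = sum(yi for _, yi in chunk) / len(chunk)
        rows.append((mean_pred, mean_actual))
    return rows

# Made-up predictions and outcomes.
p = [0.1, 0.1, 0.1, 0.1, 0.5, 0.5, 0.5, 0.5, 0.9, 0.9, 0.9, 0.9]
y = [0, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0]

curve = calibration_bins(y, p, n_bins=3)
for mean_pred, mean_actual in curve:
    print(round(mean_pred, 2), round(mean_actual, 2))
```

In this toy example the middle bin lies on the 45 degree line, while the low bin is underpredicted and the high bin is overpredicted.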


Calibration curve

I A calibration curve for the logit model
I 10 bins
I Not only unbiased, but well
calibrated!


Probability models summary

I Find patterns with ease when y is binary - model probability with regressions
I Linear probability model is mostly good enough, easy inference.
I Predicted values could be below 0, above 1
I Logit (and probit) - better when aim is prediction, predicted values strictly between
0-1
I Most often, LPM, logit, probit - similar inference
I Use marginal (average) differences
I No trivial goodness of fit. Brier score or pseudo-R-Squared.
I Calibration is a useful diagnostic tool: well-calibrated models will predict a 20% chance for events that tend to happen in one out of five cases.

18 pages
WhartonOnline M3
No ratings yet
WhartonOnline M3
32 pages
Py Regex v4p0
No ratings yet
Py Regex v4p0
122 pages
Stochastic Processes
No ratings yet
Stochastic Processes
277 pages
How Data Is Driving Resilient Sustainable Supply Chain 2021
No ratings yet
How Data Is Driving Resilient Sustainable Supply Chain 2021
16 pages
Data Driven Digital Transformation For Emergency 2022 International Journal
No ratings yet
Data Driven Digital Transformation For Emergency 2022 International Journal
11 pages
P2-4 Setiyono 2014 Remote-Sensing Based Crop Yield Monitoring
No ratings yet
P2-4 Setiyono 2014 Remote-Sensing Based Crop Yield Monitoring
12 pages
Using AI To Detect Panic Buying
No ratings yet
Using AI To Detect Panic Buying
30 pages
Paccurate Report - How To Save The Planet - Think Inside The Box
No ratings yet
Paccurate Report - How To Save The Planet - Think Inside The Box
8 pages
MIT15 060F14 HW2 Work
No ratings yet
MIT15 060F14 HW2 Work
4 pages
How To Direct Source Products To Sell Online - NerdWallet
No ratings yet
How To Direct Source Products To Sell Online - NerdWallet
16 pages
A System Dynamics Archetype To Mitigate Rework 2022 International Journal o
No ratings yet
A System Dynamics Archetype To Mitigate Rework 2022 International Journal o
11 pages
Oecd Competitive Neutrality Reviews Vietnam 2021 Highlights
No ratings yet
Oecd Competitive Neutrality Reviews Vietnam 2021 Highlights
4 pages
Trade Partner Diversification
No ratings yet
Trade Partner Diversification
39 pages
Neo4j GDS Use Cases Supply Chain
No ratings yet
Neo4j GDS Use Cases Supply Chain
5 pages
SSRN 4477833
No ratings yet
SSRN 4477833
44 pages
Estimating Food Value Chain FAO
No ratings yet
Estimating Food Value Chain FAO
36 pages
1 What Is Ob
No ratings yet
1 What Is Ob
53 pages
Yael Navaro-Yashin - The Make-Believe Space - Affective Geography in A Postwar Polity-Duke University Press (2012) PDF
No ratings yet
Yael Navaro-Yashin - The Make-Believe Space - Affective Geography in A Postwar Polity-Duke University Press (2012) PDF
297 pages
The Impact of High Potential (Hipot) Testing On
No ratings yet
The Impact of High Potential (Hipot) Testing On
4 pages
Job Description of Nursing Personnel Job Description of Nursing Director
No ratings yet
Job Description of Nursing Personnel Job Description of Nursing Director
21 pages
Types of Power : Organization
No ratings yet
Types of Power : Organization
2 pages
Lacan Unary Trait PDF
No ratings yet
Lacan Unary Trait PDF
26 pages
Escorts
100% (2)
Escorts
59 pages
Agannath Niversity: Causes and Consequences of Water Pollution: A Study in Dhaka City
No ratings yet
Agannath Niversity: Causes and Consequences of Water Pollution: A Study in Dhaka City
8 pages
ID Analisis Tata Kelola Pajak Bumi Bangunan
No ratings yet
ID Analisis Tata Kelola Pajak Bumi Bangunan
11 pages
School Leadership Preparation Questionnaire
No ratings yet
School Leadership Preparation Questionnaire
2 pages
Remote Interview Vs Personal Interview
No ratings yet
Remote Interview Vs Personal Interview
13 pages
A Guide For Maccrate
No ratings yet
A Guide For Maccrate
53 pages
Sensitivity Analysis
No ratings yet
Sensitivity Analysis
8 pages
A Functional Approach To Basics of Data Science With Excel-Book - Chapter 1 and 2 - 1st Print
No ratings yet
A Functional Approach To Basics of Data Science With Excel-Book - Chapter 1 and 2 - 1st Print
13 pages
Intergroup and Third Party Peacemaking Interventions
No ratings yet
Intergroup and Third Party Peacemaking Interventions
23 pages
Use of Calculators in Examinations
No ratings yet
Use of Calculators in Examinations
3 pages
Copar - Community Organizing Participatory Action Research
No ratings yet
Copar - Community Organizing Participatory Action Research
2 pages
European E Democracy in Practice
No ratings yet
European E Democracy in Practice
359 pages
CN Consulting Women in Ai en
No ratings yet
CN Consulting Women in Ai en
41 pages
Critical Success Factors in Managing Modular Production Design Six Company Case Studies in Hong Kong, China, and Singapore - Lau - 2011
No ratings yet
Critical Success Factors in Managing Modular Production Design Six Company Case Studies in Hong Kong, China, and Singapore - Lau - 2011
16 pages
Capstone Project Proposal Draft
No ratings yet
Capstone Project Proposal Draft
5 pages
Chapter VI-0623c3a873ca902.59818737
No ratings yet
Chapter VI-0623c3a873ca902.59818737
10 pages
EC Marie Curie Initial Training Network: Advanced Technologies For Biogas Efficiency, Sustainability and Transport
No ratings yet
EC Marie Curie Initial Training Network: Advanced Technologies For Biogas Efficiency, Sustainability and Transport
15 pages
REviewer
No ratings yet
REviewer
2 pages
Fishing
No ratings yet
Fishing
23 pages
AI Magazine - 2023 - Munz - Maximizing AI Reliability Through Anticipatory Thinking and Model Risk Audits
No ratings yet
AI Magazine - 2023 - Munz - Maximizing AI Reliability Through Anticipatory Thinking and Model Risk Audits
12 pages
Tugas Biodas 1
No ratings yet
Tugas Biodas 1
5 pages
Ba ZG524 Course Handout
No ratings yet
Ba ZG524 Course Handout
7 pages