Logit and Probit Models
Binary Choice Models: Logit and
Probit Models
• Why?
• Specification
• Estimation Method: MLE
• Parameters Interpretation and Partial effect
• Testing in Binary Response Models
• Criteria of Model Choice
Why Binary Response Models?
• Recall the linear probability model, which can
be written as P(y = 1|x) = b0 + x'b
• Two further drawbacks add to those already
mentioned for the linear probability model:
– It assumes that Pi increases linearly with X, i.e., the
marginal effect of X is constant.
– In reality, increases in X beyond some threshold X* should
have little further effect on Pi, which a linear model
cannot capture.
• Given the drawbacks, an alternative is
required!
Specification of binary response model (1/5)
• The binary response models take the following form:
P(y = 1|x) = G(b0 + x’b) ≡ p(x)
• x is a (k x 1) vector.
• In LPM, G(b0 + x’b) = b0 + b1X1 + … + bkXk
• In the binary response models: 0 < G(b0 + x’b) < 1
• The function G(.) maps the index (b0 + x’b ) into
the response probability.
• In most applications, G(.) is a cumulative
distribution function (CDF).
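As a concrete illustration, the two common choices of G(.) used below (the standard normal CDF and the logistic function) can be evaluated directly. A minimal Python sketch; the function names are our own:

```python
import math

def probit_G(z):
    """Standard normal CDF: G(z) = (1 + erf(z / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def logit_G(z):
    """Logistic CDF: G(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

# Both map any value of the index b0 + x'b into the interval (0, 1)
for z in (-3.0, 0.0, 3.0):
    print(z, probit_G(z), logit_G(z))
```

Both functions are strictly increasing and equal 0.5 at z = 0, so either one always yields a valid response probability.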
Specification of binary response model (2/5)
• Sometimes binary dependent variable models
are motivated through a latent variables model.
• The idea is that there is an underlying variable
y*, that can be modeled as:
y* = b0 + x’b + u, but we only observe
y = 1 if y* > 0 , and y =0 if y* ≤ 0.
• G(.) is the CDF of u.
• u is assumed to be a continuously distributed
variable with a symmetric distribution about zero.
Specification of binary response model (3/5)
• 1 - G(-z) = G(z).
• Therefore
P(yi = 1|x) = P(b0 + x'b + ui > 0 | x)
            = P(ui > -(b0 + x'b) | x)
            = P(ui < b0 + x'b | x)
            = G(b0 + x'b)
Specification of binary response model (5/5)
• Choices of G(.)?
– Standard normal cumulative distribution function:
Probit Model.
– Logistic function: Logit Model.
The Probit Model (2/6)
• This model can be derived directly from
probability theory or as a latent regression or
as a random utility model.
• The latent regression model
• Assume
yi* = b0 + x’b + ui
ui ~ N(0,1)
i = 1, 2, …, n, and x and b are k × 1 vectors
• where yi* is an unobserved (latent) variable
reflecting some underlying response.
The Probit Model (3/6)
yi = 1 if yi* > 0, and yi = 0 otherwise.
The Probit Model (4/6)
• Derivation of the likelihood: The probability of
observing the value yi = 1 is given as
P(yi = 1) = P(b0 + x'b + ui > 0)
          = P(ui > -(b0 + x'b))
          = P(ui < b0 + x'b)
          = F(b0 + x'b)
• where F(.) is the standard normal distribution
function, thus
F(b0 + x'b) = ∫ from -∞ to b0 + x'b of (1/√(2π)) e^(-t²/2) dt
The Probit Model (5/6)
• Hence, the likelihood function is easily
obtained as
L(b; y, x) = ∏i [F(b0 + xi'b)]^yi [1 − F(b0 + xi'b)]^(1−yi)
The Probit Model (6/6)
• We need to maximize lnL with respect to the
b's.
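The objective being maximized is lnL = Σ [yi ln F(b0 + xi'b) + (1 − yi) ln(1 − F(b0 + xi'b))]. A minimal Python sketch of this log-likelihood for one regressor; the toy data are hypothetical, purely for illustration:

```python
import math

def norm_cdf(z):
    """Standard normal CDF F(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probit_loglik(b0, b1, x, y):
    """lnL = sum_i [ y_i ln F(b0 + b1*x_i) + (1 - y_i) ln(1 - F(b0 + b1*x_i)) ]."""
    ll = 0.0
    for xi, yi in zip(x, y):
        p = norm_cdf(b0 + b1 * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1.0 - p)
    return ll

# Hypothetical data: one regressor, four observations
x = [0.5, 1.0, 1.5, 2.0]
y = [0, 0, 1, 1]
print(probit_loglik(0.0, 1.0, x, y))
```

With b0 = b1 = 0 every fitted probability is 0.5, so lnL collapses to n·ln(0.5), a useful sanity check.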
The Logit Model (1/5)
• Another common choice for G(z) is the
logistic function, which is the cdf for a
standard logistic random variable
• G(z) = exp(z)/[1 + exp(z)] = L(z)
• This case is referred to as a logit model, or
sometimes as a logistic regression
• Both functions have similar shapes – they are
increasing in z, most quickly around 0
The Logit Model (2/5)
• The latent regression model
• Assume
yi* = b0 + x’b + ui
ui ~ Logistic(0, π²/3)
i = 1, 2, …, n, and x and b are k × 1 vectors
• And again,
yi = 1 if yi* > 0, and yi = 0 otherwise.
The Logit Model (3/5)
• Derivation of the likelihood: The probability of
observing the value yi = 1 is given as
P(yi = 1) = P(b0 + x'b + ui > 0)
          = P(ui > -(b0 + x'b))
          = P(ui < b0 + x'b)
          = L(b0 + x'b)
• where L(.) is the standard logistic distribution
function, thus
L(b0 + x'b) = exp(b0 + x'b) / [1 + exp(b0 + x'b)]
The Logit Model (4/5)
• Hence, the likelihood function is easily
obtained as
∏i [L(b0 + xi'b)]^yi [1 − L(b0 + xi'b)]^(1−yi)
The Logit Model (5/5)
• We need to maximize lnL with respect to the
b's. The first-order conditions are
∂lnL(b; y, x)/∂b0 = 0
∂lnL(b; y, x)/∂bj = 0, j = 1, …, k
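These first-order conditions have no closed-form solution, but Newton's method solves them quickly. A minimal pure-Python sketch for an intercept plus one regressor; the data are hypothetical, for illustration only:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logit_mle(x, y, iters=25):
    """Newton-Raphson on the logit FOCs dlnL/db0 = dlnL/db1 = 0.
    Gradient: sum_i (y_i - p_i) * (1, x_i); negative Hessian uses
    weights w_i = p_i * (1 - p_i)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = logistic(b0 + b1 * xi)
            w = p * (1.0 - p)
            g0 += yi - p
            g1 += (yi - p) * xi
            h00 += w
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step: b <- b + H^{-1} g, with H the negative Hessian
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

x = [0, 1, 2, 3, 4, 5, 6, 7]
y = [0, 0, 1, 0, 1, 1, 0, 1]
b0_hat, b1_hat = logit_mle(x, y)
print(b0_hat, b1_hat)
```

At the solution the score equations hold: the residuals yi − pi sum to zero, both raw and weighted by xi.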
The random utility models (1/2)
• Let Ua and Ub represent the utility of choosing
a and b respectively.
• The observed choice between the two reveals
which one provides the greater utility. The
observed indicator equals 1 if Ua > Ub and 0 if
Ua ≤ Ub.
• Assume the linear random utility model
Ua = ba + x'ba + ua and Ub = bb + x'bb + ub
The random utility models (2/2)
• The probability of choosing a is then
P(Ua > Ub) = P(ub − ua < (ba − bb) + x'(ba − bb))
• Only the differences of the parameters are
identified, not the parameters of each utility.
Example (1/3)
• Recall the labor force participation of women
Yi = 1 if the ith woman has or is looking for a job
Yi = 0 otherwise (not in the labor force)
Interpretation (3/7)
• It is clear that it is incorrect to simply compare
the coefficients across the three models, since
they are scaled differently.
Interpretation (4/7)
• The partial effect of a continuous variable:
∂P/∂xj = g(b0 + b1X1 + b2X2 + … + bKXK)·bj, where
g(z) = dG/dz
• This is the effect on the probability of a one-unit
increase in Xj, holding all other factors
constant.
• If X2 is a binary explanatory variable, the effect is
G(b0 + b1X1 + b2 + … + bKXK) − G(b0 + b1X1 + … + bKXK)
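For the logit, g(z) = L(z)[1 − L(z)], so both kinds of effect are easy to compute. A small sketch; all coefficient values are hypothetical:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def logistic_pdf(z):
    """g(z) = dG/dz = L(z) * (1 - L(z)) for the logit."""
    p = logistic(z)
    return p * (1.0 - p)

# Hypothetical estimates: intercept, continuous X1, binary X2
b0, b1, b2 = -1.0, 0.5, 0.8
x1 = 2.0

# Continuous X1: partial effect dP/dX1 = g(b0 + b1*X1) * b1 (at X2 = 0)
marginal = logistic_pdf(b0 + b1 * x1) * b1

# Binary X2: difference in probabilities at X2 = 1 versus X2 = 0
discrete = logistic(b0 + b1 * x1 + b2) - logistic(b0 + b1 * x1)

print(marginal, discrete)
```

Note that both effects depend on where the index is evaluated, unlike the constant effect in the LPM.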
Interpretation (5/7)
Other kinds of discrete variables
• e.g., years of education, number of children
• The magnitude of the effect of X2 on the
probability of moving from level c to level c + 1 is
G(b0 + b1X1 + b2(c + 1) + … + bKXK) − G(b0 + b1X1 + b2c + … + bKXK)
Interpretation (6/7)
• The results of the logit model are often
expressed in terms of odds ratios
Pr[y = 1] = L(b0 + x'b) = exp(b0 + x'b) / [1 + exp(b0 + x'b)]
[1 + exp(b0 + x'b)]·L(b0 + x'b) = exp(b0 + x'b)
L(b0 + x'b) + exp(b0 + x'b)·L(b0 + x'b) = exp(b0 + x'b)
L(b0 + x'b) = exp(b0 + x'b) − exp(b0 + x'b)·L(b0 + x'b)
L(b0 + x'b) = exp(b0 + x'b)·[1 − L(b0 + x'b)]
Interpretation (7/7)
L(b0 + x'b) / [1 − L(b0 + x'b)] = exp(b0 + x'b)
Pr[yi = 1|x] / (1 − Pr[yi = 1|x]) = exp(b0 + x'b)
Pr[yi = 1|x] / Pr[yi = 0|x] = exp(b0 + x'b), the "odds ratio"
• Example: If the probability of voting is 0.8, the
odds are 0.8/0.2 = 4 to 1 in favor of voting.
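The odds computation from the example, and the fact that exp(bj) multiplies the odds, can be checked numerically; the index value z and coefficient bj below are hypothetical:

```python
import math

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds in favor of the event: p / (1 - p)."""
    return p / (1.0 - p)

# Voting example: P = 0.8 gives odds of 4 to 1
print(odds(0.8))

# In the logit model the odds equal exp(b0 + x'b), so a one-unit
# increase in Xj multiplies the odds by exp(bj)
z, bj = 0.3, 0.7
ratio = odds(logistic(z + bj)) / odds(logistic(z))
print(ratio, math.exp(bj))
```

The two printed values in the last line agree, since odds(L(z)) = exp(z) exactly.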
Testing hypothesis of individual parameter (1/2)
• As in the linear model, H0: bj = 0 is tested with the
asymptotic z statistic, rejecting when it falls in either tail.
[Figure: two-tailed rejection regions of size a/2 beyond −c
and c, with acceptance region of size 1 − a between them]
The Likelihood Ratio Test
• Unlike the LPM, where we can compute F
statistics to test exclusion restrictions, we
need a new type of test.
• Maximum likelihood estimation (MLE) will
always produce a log-likelihood, lnL = l
• Just as in an F test, you estimate the
restricted and unrestricted model, then
form
LR = 2(lur − lr) ~ χ²q asymptotically, where q is
the number of restrictions
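As a sketch, the LR statistic and its p-value for a single restriction (q = 1) can be computed with only the standard library, using the identity that the χ²(1) survival function equals erfc(√(x/2)); the log-likelihood values below are hypothetical:

```python
import math

def lr_test_1df(ll_unrestricted, ll_restricted):
    """LR = 2 * (l_ur - l_r); p-value from chi-square with 1 df."""
    lr = 2.0 * (ll_unrestricted - ll_restricted)
    pval = math.erfc(math.sqrt(lr / 2.0))  # chi2(1) survival function
    return lr, pval

lr, p = lr_test_1df(-200.0, -202.5)
print(lr, p)  # LR = 5.0; p is about 0.025, so reject at the 5% level
```

For general q a chi-square routine (e.g., from a statistics library) would replace the erfc shortcut.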
Goodness of Fit (1/2)
• Unlike the LPM, where we can compute an R2
to judge goodness of fit, we need new
measures of goodness of fit
• One possibility is a pseudo R², based on the
log-likelihood and defined as 1 − lur/lr
– lur is the log-likelihood at the best parameter
estimates; lr is the log-likelihood when each Y is
predicted by the sample proportion of Y values
that are one.
Goodness of Fit (2/2)
• Fraction predicted correctly:
– If you take the prediction of yi to be 1 if G(.) > 0.5
and zero otherwise, then you get a prediction of
zero or one for each yi . The fraction correctly
predicted is:
(n1 + n2) / N
where n1 is the number of ones correctly predicted
and n2 the number of zeros correctly predicted.
• An alternative is Rp², the average of the
percentage of ones correctly predicted and
the percentage of zeros correctly predicted.
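Both measures are simple to compute once fitted probabilities are in hand. A minimal sketch with hypothetical fitted values and outcomes:

```python
def pseudo_r2(ll_model, ll_null):
    """Pseudo R^2 = 1 - l_ur / l_r (both log-likelihoods are negative)."""
    return 1.0 - ll_model / ll_null

def fraction_correct(p_hat, y):
    """Predict y_i = 1 when the fitted probability G(.) exceeds 0.5."""
    hits = sum(1 for p, yi in zip(p_hat, y) if (p > 0.5) == (yi == 1))
    return hits / len(y)

# Hypothetical fitted probabilities and observed outcomes
p_hat = [0.2, 0.6, 0.8, 0.4, 0.9]
y = [0, 1, 1, 1, 1]

print(pseudo_r2(-45.0, -60.0))      # 0.25
print(fraction_correct(p_hat, y))   # 4 of 5 predicted correctly: 0.8
```

The Rp² variant follows the same pattern, computing the hit rate separately within the y = 1 and y = 0 groups and averaging the two.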
Probit Model in Stata
• probit reports the coefficients and dprobit
reports the partial effects. The regression is
identical for each.
• Note that the partial effects depend on Z and
thus on X. You can specify the values at which
to evaluate the partial effects in dprobit with
the default being at the means.
• Partial effects of dummy variables are
reported (by default) as difference in
probabilities, with other variables at means.
Logit Model in Stata (1/4)
• logit reports coefficients and logistic reports
the "odds-ratio" exp(bj). (This is really the
proportional effect of the variable on the odds,
not the odds ratio itself.)
– If Xj increases by one, exp(xi'b) increases to
exp(xi'b + bj) = exp(xi'b)·exp(bj)