
Binary Choice Models

Logit and Probit Models

Binary Choice Models: Logit and Probit Models
• Why?
• Specification
• Estimation Method: MLE
• Parameter Interpretation and Partial Effects
• Testing in Binary Response Models
• Criteria of Model Choice

Why Binary Response Models?
• Recall the linear probability model, which can
be written as P(y = 1|x) = β0 + x'β
• Adding to the previously mentioned drawbacks
of the linear probability model:
– It assumes that Pi increases linearly with X.
– In reality, once X passes some threshold X*, further
increases should barely affect Pi; a constant slope
cannot capture this.
• Given the drawbacks, an alternative is
required!

Specification of binary response model (1/5)
• The binary response models take the following form:
P(y = 1|x) = G(β0 + x'β) ≡ p(x)
• x is a (k × 1) vector.
• In the LPM, G(β0 + x'β) = β0 + β1X1 + … + βkXk.
• In the binary response models: 0 < G(β0 + x'β) < 1.
• The function G(.) maps the index (β0 + x'β) into
the response probability.
• In most applications, G(.) is a cumulative
distribution function (CDF).
Specification of binary response model (2/5)
• Sometimes binary dependent variable models
are motivated through a latent variables model.
• The idea is that there is an underlying variable
y*, that can be modeled as:
y* = β0 + x'β + u, but we only observe
y = 1 if y* > 0, and y = 0 if y* ≤ 0.
• G(.) is the CDF of u.
• u is assumed to be continuously distributed,
with a distribution symmetric about zero.
Specification of binary response model (3/5)

• Symmetry of u about zero implies 1 − G(−z) = G(z).
• Therefore

$$P(y_i = 1 \mid x) = P(\beta_0 + x'\beta + u_i > 0 \mid x) = P\big(u_i > -(\beta_0 + x'\beta) \mid x\big) = P\big(u_i < \beta_0 + x'\beta \mid x\big) = G(\beta_0 + x'\beta)$$
Specification of binary response model (4/5)

• This model allows Pi to increase as Xi increases.
• It never steps outside [0, 1].
• Pi and Xi have a non-linear relationship.
Specification of binary response model (5/5)

• Choices of G(.)?
– Standard normal cumulative distribution function:
Probit Model.
– Logistic function: Logit Model.
• No real reason to prefer one over the other.
• Both probit and logit models are non-linear, so they
are estimated by MLE and not OLS. A quick comparison
of the two CDFs is sketched below.
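A minimal Python sketch (not from the original slides) evaluating the two candidate G(.) functions on an arbitrary grid of index values; both are increasing and stay strictly inside (0, 1):

```python
# A minimal sketch: evaluate the probit and logit choices of G(.)
# on a few arbitrary index values.
import numpy as np
from scipy.stats import norm

def logistic_cdf(z):
    """Logistic CDF: exp(z) / (1 + exp(z))."""
    return np.exp(z) / (1.0 + np.exp(z))

for z in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    print(f"z = {z:+.1f}   probit G(z) = {norm.cdf(z):.3f}   "
          f"logit G(z) = {logistic_cdf(z):.3f}")
```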
Maximum likelihood estimation (1/2)
• Focuses on the fact that different populations
generate different samples.
• For example:
– Suppose a sample (X1, X2, …, X8) is known to be
drawn from a normal population with a given
variance but unknown mean.
– Suppose the observations come from one of two
candidate distributions, A or B.
– The observations "select" whichever population (say A)
is the most likely to have generated the observed data.
Maximum likelihood estimation (2/2)
• Consists of estimating the unknown
parameters in such a manner that the
probability of observing the given Y's is as high as
possible.
• Thus we need the density of Yi. Assume it is
normally distributed and that the Y's are
independently and identically distributed (iid).
• The maximum likelihood estimator then maximizes
the joint density of the sample; for iid normal Yi
with known variance σ², this is

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(Y_i - \mu)^2}{2\sigma^2}\right)$$
The Probit Model (1/6)
• One choice for G(z) is the standard normal
cumulative distribution function (cdf):
$G(z) = \Phi(z) \equiv \int_{-\infty}^{z} \phi(v)\,dv$, where $\phi$ is the
standard normal density, $\phi(z) = (2\pi)^{-1/2}\exp(-z^2/2)$.
• This case is referred to as a probit model.
• Since it is a nonlinear model, it cannot be
estimated by our usual methods.
• Use maximum likelihood estimation.
The Probit Model (2/6)
• This model can be derived directly from
probability theory or as a latent regression or
as a random utility model.
• The latent regression model
• Assume
y_i* = β0 + x'β + u_i,   u_i ~ N(0, 1),
i = 1, 2, …, n, where x and β are (k × 1) vectors
• where y_i* is an unobserved (latent) variable
reflecting some underlying response.
The Probit Model (3/6)

• y_i* is related to the observed variable y_i by the
following relation:

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$
The Probit Model (4/6)
• Derivation of the likelihood: The probability of
observing the value y_i = 1 is given as

$$P(y_i = 1) = P(\beta_0 + x'\beta + u_i > 0) = P\big(u_i > -(\beta_0 + x'\beta)\big) = P\big(u_i < \beta_0 + x'\beta\big) = \Phi(\beta_0 + x'\beta)$$

• where Φ(.) is the standard normal distribution
function, thus

$$\Phi(\beta_0 + x'\beta) = \int_{-\infty}^{\beta_0 + x'\beta} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\, dt$$
The Probit Model (5/6)
• Hence, the likelihood function is easily
obtained as

$$L = \prod_{0}\big[1 - \Phi(\beta_0 + x_i'\beta)\big]\; \prod_{1}\Phi(\beta_0 + x_i'\beta)$$

• where $\prod_0$ and $\prod_1$ denote the product over
observations with y_i = 0 and y_i = 1,
respectively.
• The log likelihood function is given as

$$\ln L = \ell = \sum_{i=1}^{n}\Big[(1 - y_i)\ln\big(1 - \Phi(\beta_0 + x_i'\beta)\big) + y_i \ln \Phi(\beta_0 + x_i'\beta)\Big]$$
The Probit Model (6/6)
• We need to maximize ln L with respect to the
β's.
• This maximization has no closed-form solution
and must be carried out numerically, say using
the Newton–Raphson (N-R) algorithm; a sketch
follows.
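A minimal Python sketch of probit MLE on simulated data. The sample size and the true coefficients (0.5, 1.0) are illustrative assumptions, and scipy's BFGS quasi-Newton optimizer stands in for a hand-coded N-R loop:

```python
# A minimal sketch: simulate data from the latent-variable probit model
# and maximize the log likelihood numerically.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 1000
x = rng.normal(size=n)
y = (0.5 + 1.0 * x + rng.normal(size=n) > 0).astype(float)  # latent-variable DGP

def neg_loglik(b):
    """Negative probit log likelihood for parameters b = (b0, b1)."""
    p = norm.cdf(b[0] + b[1] * x)
    p = np.clip(p, 1e-12, 1 - 1e-12)  # guard the logs against 0 and 1
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print("estimates:", res.x)  # should land near (0.5, 1.0)
```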
The Logit Model (1/5)
• Another common choice for G(z) is the
logistic function, which is the cdf for a
standard logistic random variable
• G(z) = exp(z)/[1 + exp(z)] ≡ Λ(z)
• This case is referred to as a logit model, or
sometimes as a logistic regression
• Both functions have similar shapes – they are
increasing in z, most quickly around 0

The Logit Model (2/5)
• The latent regression model
• Assume
y_i* = β0 + x'β + u_i,   u_i ~ Logistic(0, π²/3),
i = 1, 2, …, n, where x and β are (k × 1) vectors
• And again

$$y_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}$$
The Logit Model (3/5)
• Derivation of the likelihood: The probability of
observing the value y_i = 1 is given as

$$P(y_i = 1) = P(\beta_0 + x'\beta + u_i > 0) = P\big(u_i > -(\beta_0 + x'\beta)\big) = P\big(u_i < \beta_0 + x'\beta\big) = \Lambda(\beta_0 + x'\beta)$$

• where Λ(.) is the standardized logistic
distribution, thus

$$\Lambda(\beta_0 + x'\beta) = \frac{\exp(\beta_0 + x'\beta)}{1 + \exp(\beta_0 + x'\beta)}$$
The Logit Model (4/5)
• Hence, the likelihood function is easily
obtained as

$$L = \prod_{0}\big[1 - \Lambda(\beta_0 + x_i'\beta)\big]\; \prod_{1}\Lambda(\beta_0 + x_i'\beta)$$

• where $\prod_0$ and $\prod_1$ denote the product over
observations with y_i = 0 and y_i = 1,
respectively.
• The log likelihood function is given as

$$\ln L = \ell = \sum_{i=1}^{n}\Big[(1 - y_i)\ln\big(1 - \Lambda(\beta_0 + x_i'\beta)\big) + y_i \ln \Lambda(\beta_0 + x_i'\beta)\Big]$$
The Logit Model (5/5)
• We need to maximize ln L with respect to the
β's, i.e., solve the first-order conditions

$$\frac{\partial \ln L(\beta; y, x)}{\partial \beta_0} = 0, \qquad \frac{\partial \ln L(\beta; y, x)}{\partial \beta_j} = 0, \quad j = 1, \ldots, k.$$

A Newton–Raphson sketch on simulated data follows.
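For the logit these first-order conditions take the closed form X'(y − p) = 0, which a few Newton–Raphson steps solve quickly. A minimal Python sketch (the sample size and true coefficients are illustrative assumptions):

```python
# A minimal Newton-Raphson sketch for the logit first-order conditions.
# Score: X'(y - p); Hessian: -X'WX with W = diag(p(1 - p)).
import numpy as np

rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
true_b = np.array([-0.5, 0.8])
y = rng.binomial(1, 1 / (1 + np.exp(-X @ true_b)))

b = np.zeros(2)
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    score = X.T @ (y - p)                     # gradient of ln L
    H = -(X * (p * (1 - p))[:, None]).T @ X   # Hessian of ln L
    step = np.linalg.solve(-H, score)         # Newton step: (X'WX)^{-1} X'(y - p)
    b = b + step
    if np.max(np.abs(step)) < 1e-10:          # converged
        break
print("estimates:", b)  # should land near (-0.5, 0.8)
```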
The random utility models (1/2)
• Let Ua and Ub represent the utility of choosing
a and b respectively.
• The observed choice between the two reveals
which one provides the greater utility. The
observed indicator equals 1 if Ua > Ub and 0 if
Ua ≤ Ub.
• Assume the linear random utility model:
U_a = b_a + x'β_a + u_a   and   U_b = b_b + x'β_b + u_b
The random utility models (2/2)
• The probability of choosing a is then

$$P(U_a > U_b) = P(b_a + x'\beta_a + u_a > b_b + x'\beta_b + u_b)$$
$$= P\big((b_a - b_b) + x'(\beta_a - \beta_b) + (u_a - u_b) > 0\big)$$
$$= P(\beta_0 + x'\beta + u > 0)$$

• which is the same as the latent regression
model.
Probits and Logits
• Both the probit and logit are nonlinear and
require maximum likelihood estimation.
• No real reason to prefer one over the other.
• Traditionally the logit was seen more often, mainly
because the logistic function leads to a more
easily computed model.
• Today, probit is easy to compute with
standard packages, so it is more popular.
Example (1/3)
• Recall the labor force participation of women:

$$Y_i = \begin{cases} 1 & \text{if the } i\text{th woman has or is looking for a job} \\ 0 & \text{otherwise (not in the labor force)} \end{cases}$$

• We had previously estimated an LPM with a linear
functional form:
Y_i = β0 + β1 M_i + β2 S_i + u_i
– M_i = 1 if the ith woman is married, 0 otherwise
– S_i = the number of years of schooling of the ith woman
Example (2/3)
• LPM results (standard errors in parentheses):

$$\hat{Y}_i = -0.28 - \underset{(0.15)}{0.38}\,M_i + \underset{(0.03)}{0.09}\,S_i$$

N = 30,  R² = 0.32,  R²_p = 0.81

• Logit results:

$$\text{Logit}\;\hat{P}(Y_i = 1) = -5.89 - \underset{(1.18)}{2.59}\,M_i + \underset{(0.31)}{0.69}\,S_i$$

t = −2.19, 2.19;  N = 30,  R²_p = 0.81,  iterations = 5
Example (3/3)
• Comparing the results of the two previous
equations:
– The signs of the slope coefficients are the same.
– The significance of the slope coefficients is the same.
– The overall fits are roughly the same.
– The logit coefficients are larger than the LPM's,
usually about four times bigger as a rule of thumb.
– The main difference is that the logit's estimated
probabilities never step outside the 0-1 range.
Interpretation (1/7)
• In general we care about the effect of x on
P(y = 1|x), that is, we care about ∂P/∂xj.
• For the linear case, this is easily computed as
the coefficient on xj.
• For the nonlinear probit and logit models, it is
more complicated. With z = β0 + x'β,

$$\frac{\partial \Pr[y = 1]}{\partial x_j} = \frac{d \Pr[y = 1]}{dz}\cdot\frac{\partial z}{\partial x_j} = G'(z)\,\beta_j = g(z)\,\beta_j$$

• where g(.) is the pdf associated with the cdf G(.).
Interpretation (2/7)
• Probit model:

$$\frac{\partial \Pr[y = 1]}{\partial x_j} = \phi(\beta_0 + x'\beta)\,\beta_j = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\tfrac{1}{2}(\beta_0 + x'\beta)^2\right)\beta_j$$

• Logit model:

$$\frac{\partial \Pr[y = 1]}{\partial x_j} = \frac{\exp(\beta_0 + x'\beta)}{\big[1 + \exp(\beta_0 + x'\beta)\big]^2}\,\beta_j = \beta_j\, P_i (1 - P_i)$$

Both effects are computed in the sketch below.
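A minimal Python sketch of the two formulas, evaluated at the mean of x; the coefficients and the mean are hypothetical numbers, not estimates:

```python
# A minimal sketch of partial effects at the mean of x for both models.
import numpy as np
from scipy.stats import norm

b0, b1 = -0.5, 0.8   # hypothetical intercept and slope
xbar = 1.2           # hypothetical sample mean of x
z = b0 + b1 * xbar

probit_pe = norm.pdf(z) * b1        # phi(b0 + x'b) * b_j

Lz = np.exp(z) / (1 + np.exp(z))    # Lambda(b0 + x'b)
logit_pe = b1 * Lz * (1 - Lz)       # b_j * P * (1 - P)

print(f"probit partial effect at the mean: {probit_pe:.4f}")
print(f"logit  partial effect at the mean: {logit_pe:.4f}")
```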
Interpretation (3/7)
• Clear that it's incorrect to just compare the
coefficients across the three models.
• Can compare sign and significance (based on
a standard t test) of coefficients, though.
• To compare the magnitude of effects, need to
calculate the derivatives, say at the means.
Interpretation (4/7)
• The partial effect of a continuous variable:
∂P/∂xj = g(β0 + β1X1 + β2X2 + … + βkXk)βj, where
g(z) is dG/dz.
• This is the effect on the probability of a one-unit
increase in Xj, holding all other factors constant.
• If X2 is a binary explanatory variable, the partial
effect is instead the discrete difference:

$$G(\beta_0 + \beta_1 X_1 + \beta_2 + \cdots + \beta_K X_K) - G(\beta_0 + \beta_1 X_1 + \cdots + \beta_K X_K)$$
Interpretation (5/7)
Other kinds of discrete variables
• e.g., years of education, number of children
• The magnitude of the effect on the probability of
increasing X3 from the level c3 to c3 + 1 is

$$G\big(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 (c_3 + 1) + \ldots + \beta_K X_K\big) - G\big(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 c_3 + \ldots + \beta_K X_K\big)$$

A small numerical sketch follows.
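A minimal Python sketch of this discrete change for a probit, i.e. G(.) = Φ(.); all coefficients and regressor values are hypothetical placeholders:

```python
# A minimal sketch: change in P(y = 1) when x3 goes from c3 to c3 + 1.
from scipy.stats import norm

b0, b1, b2, b3 = -0.5, 0.3, 0.2, 0.4  # hypothetical coefficients
x1, x2, c3 = 1.0, 0.5, 2.0            # hypothetical values of the regressors

base = b0 + b1 * x1 + b2 * x2
effect = norm.cdf(base + b3 * (c3 + 1)) - norm.cdf(base + b3 * c3)
print(f"change in P(y = 1) when x3 goes from {c3} to {c3 + 1}: {effect:.4f}")
```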
Interpretation (6/7)
• The results of the logit model are often
expressed in terms of odds ratios. Starting from

$$\Pr[y = 1] = \Lambda(\beta_0 + x'\beta) = \frac{\exp(\beta_0 + x'\beta)}{1 + \exp(\beta_0 + x'\beta)}$$

and rearranging:

$$\big(1 + \exp(\beta_0 + x'\beta)\big)\,\Lambda(\beta_0 + x'\beta) = \exp(\beta_0 + x'\beta)$$
$$\Lambda(\beta_0 + x'\beta) + \exp(\beta_0 + x'\beta)\,\Lambda(\beta_0 + x'\beta) = \exp(\beta_0 + x'\beta)$$
$$\Lambda(\beta_0 + x'\beta) = \exp(\beta_0 + x'\beta)\,\big(1 - \Lambda(\beta_0 + x'\beta)\big)$$
Interpretation (7/7)
Lb 0  x'β 
exp b 0  x'β  
1  Lb0  x'β
Pr[ yi  1 | x]
exp b 0  x'β  
1  Pr[ yi  1 | x]
Pr[ yi  1 | x]
exp b 0  x'β   " odds ratio"
Pr[ yi  0 | x]
• Example: If the probability of voting is 0.8, it
means that odds are 4 to 1 in favor of voting. 34
Testing hypothesis of individual parameter (1/2)

• Calculating the standard errors of the coefficient
estimators is complicated, but it is handled by
Stata. The asymptotic covariance matrix of any
MLE is the inverse of the "information matrix":

$$\operatorname{cov}(\hat{\beta}) = \left[I(\beta)\right]^{-1} = \left\{-E\left[\frac{\partial^2 \ln L(\beta; y, x)}{\partial \beta\,\partial \beta'}\right]\right\}^{-1}$$

• It can be approximated numerically in the sample
to get an estimated covariance matrix for the
parameter vector, as sketched below.
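Continuing the Newton–Raphson logit sketch from earlier (a minimal illustration that reuses X and b from that block; for the logit, the observed information matrix is X'WX with W = diag(p(1 − p))):

```python
# A minimal sketch: standard errors from the inverse information matrix.
import numpy as np

p = 1 / (1 + np.exp(-X @ b))                 # fitted probabilities at b-hat
info = (X * (p * (1 - p))[:, None]).T @ X    # information matrix X'WX
cov = np.linalg.inv(info)                    # estimated covariance of b-hat
se = np.sqrt(np.diag(cov))                   # standard errors
print("standard errors:", se)
print("t statistics:  ", b / se)
```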
Testing hypothesis of individual parameter (2/2)
• Two-Sided Alternatives:
H0: βj = 0    H1: βj ≠ 0

$$t = \frac{\hat\beta_j}{se(\hat\beta_j)}$$

• Reject H0 if |t| > c, where c is the critical value that
puts probability α/2 in each tail; otherwise fail to
reject.
The Likelihood Ratio Test
• Unlike the LPM, where we can compute F
statistics to test exclusion restrictions, we
need a new type of test.
• Maximum likelihood estimation (MLE) will
always produce a log likelihood, ln L = ℓ.
• Just as in an F test, you estimate the
restricted and unrestricted models, then form

$$LR = 2(\ell_{ur} - \ell_r) \sim \chi^2_q$$

where q is the number of restrictions. A sketch of
the computation follows.
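A minimal Python sketch of the LR test; the two maximized log likelihoods and the number of restrictions q are hypothetical values:

```python
# A minimal sketch of the likelihood ratio test.
from scipy.stats import chi2

l_ur, l_r, q = -612.4, -619.8, 3   # hypothetical log likelihoods and df
LR = 2 * (l_ur - l_r)              # test statistic
p_value = chi2.sf(LR, df=q)        # upper-tail chi-squared probability
print(f"LR = {LR:.2f}, p-value = {p_value:.4f}")
```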
Goodness of Fit (1/2)
• Unlike the LPM, where we can compute an R²
to judge goodness of fit, we need new
measures of goodness of fit.
• One possibility is a pseudo R² based on the
log likelihood and defined as 1 − ℓur/ℓr.
– This ratio compares the maximized log likelihood of
the estimated model (ℓur) with that of a model that
just predicts each Y by the sample proportion of Y
values that are one (ℓr).
Goodness of Fit (2/2)
• Fraction predicted correctly:
– If you take the prediction of yi to be 1 if G(.) > 0.5
and zero otherwise, then you get a prediction of
zero or one for each yi. The fraction correctly
predicted is:

$$\frac{n_1 + n_2}{N}$$

where n1 and n2 are the numbers of correctly
predicted ones and zeros, respectively.
• An alternative is Rp², the average of the
percentage of ones correctly predicted and
the percentage of zeros correctly predicted.
Both measures are sketched below.
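A minimal Python sketch of the fraction predicted correctly and Rp² on a tiny hypothetical sample; the outcomes and fitted probabilities are made up for illustration:

```python
# A minimal sketch of the two goodness-of-fit measures above.
import numpy as np

y = np.array([1, 0, 1, 1, 0, 0, 1, 0])               # observed outcomes
p_hat = np.array([.8, .3, .6, .4, .2, .55, .7, .1])  # fitted P(y = 1)

y_pred = (p_hat > 0.5).astype(int)          # predict 1 when G(.) > 0.5
frac_correct = np.mean(y_pred == y)

pct_ones = np.mean(y_pred[y == 1] == 1)     # ones correctly predicted
pct_zeros = np.mean(y_pred[y == 0] == 0)    # zeros correctly predicted
Rp2 = (pct_ones + pct_zeros) / 2

print(f"fraction correct: {frac_correct:.3f}   Rp^2: {Rp2:.3f}")
```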
Probit Model in Stata
• probit reports the coefficients and dprobit
reports the partial effects. The regression is
identical for each.
• Note that the partial effects depend on Z and
thus on X. You can specify the values at which
to evaluate the partial effects in dprobit with
the default being at the means.
• Partial effects of dummy variables are
reported (by default) as differences in
probabilities, with other variables at their means.
Logit Model in Stata (1/4)
• logit reports coefficients and logistic reports
the "odds-ratio" e^βj. (This is really the
proportional effect of the variable on the odds
ratio, not the odds ratio itself.)
– If Xji increases by one, e^(xi'β) increases to
e^(xi'β + βj) = e^(xi'β) e^βj, so e^βj measures the
estimated proportion by which a one-unit change
in Xji changes the odds ratio.
• Interpretation can be tricky!
Logit Model in Stata (2/4)
• All e^β values are positive.
• A zero effect means that β = 0 and e^β = 1.
• A variable that reduces the odds ratio has e^β < 1
(i.e., β < 0).
• A variable that increases the odds ratio has e^β > 1
(i.e., β > 0).
Logit Model in Stata (3/4)

• Example: If e^βj = 2 and the initial probability p of
Y = 1 for this observation is 0.2 (so the initial odds
ratio p/(1 − p) is 0.2/0.8 = 0.25), then a one-unit
increase in Xj multiplies the odds ratio by e^βj = 2,
making it 0.5, which means that the probability of
Y = 1 has increased from 0.2 to 0.333 = 0.5/(1 + 0.5).
Logit Model in Stata (4/4)

• If we do the same example for an observation with
an initial p = 0.5, then the initial odds ratio is 1, the
unit increase in Xj multiplies it by 2, making the new
odds ratio 2, and thus the probability has increased
from 0.5 to 2/(1 + 2) = 0.667. The arithmetic is
sketched below.
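The arithmetic in these two examples generalizes to a one-line conversion from the odds multiplier e^βj back to a probability; a minimal Python sketch:

```python
# A minimal sketch of the odds arithmetic used in the two examples above:
# apply the odds multiplier exp(b_j), then convert back to a probability.
def new_probability(p, odds_multiplier):
    """Update P(Y = 1) after the odds are multiplied by odds_multiplier."""
    new_odds = (p / (1 - p)) * odds_multiplier
    return new_odds / (1 + new_odds)

print(new_probability(0.2, 2.0))  # 0.333..., matching the first example
print(new_probability(0.5, 2.0))  # 0.666..., matching the second example
```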
