Discrete Choice Models
Econometrics
9/19/2023
We know that, in a regression framework, the dependent variable can be continuous or discrete.

Regression models essentially imply averaging y for given values of the Xs.

In other words, averaging in this context does not tell us the average value a qualitative variable assumes; rather, it tells us the probability that the qualitative variable will equal 1.
Some Applications
In some applications, you might be interested in investigating the factors affecting a qualitative event or a binary outcome.

The Linear Probability Model (LPM)
In all of the above examples, our dependent variable takes only two values: 1 if the event occurs and 0 otherwise.
Suppose we have the following multiple regression model:
y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \quad (1)
where E(u|x) = 0 by definition.

The dependent variable changes either from 1 to 0 or vice versa, or does not change at all.

Using the expectations operator, we have
E(y|x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \quad (2)
P(y = 1|x) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k \quad (3)

The probability is a linear function of the x's.

In the LPM, \beta_j measures the change in the probability of success when x_j changes, ceteris paribus:
\Delta P(y = 1|x) = \beta_j \Delta x_j \quad (4)
Alternative conceptual framework for the LPM
Consider the following unconditional expectation of a binary variable, y, defined as a probability:
E(y) = \Pr(y = 1)
E(y|x) = \Pr(y = 1|x)
y = F(x, \beta) + u

Taking expectations:
E(y|x) = F(x, \beta), \quad \text{as } E(u) = 0

Introducing disturbances u, we can write the model as
y = x'\beta + u
For n observations:
y_i = x_i'\beta + u_i
\hat{y}_i = x_i'\hat{\beta}
will give the estimated probability that the event will occur, or the characteristic will be observed, given the particular values of the x's.
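As a sketch of these ideas (simulated data with hypothetical coefficients, not the application below): fitting the LPM by OLS shows that the fitted "probabilities" \hat{y}_i can fall outside [0, 1].

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x = rng.normal(0.0, 2.0, size=n)                    # a single regressor
p_true = 1 / (1 + np.exp(-(0.5 + 1.2 * x)))         # true P(y=1|x) is nonlinear
y = (rng.uniform(size=n) < p_true).astype(float)    # binary outcome

X = np.column_stack([np.ones(n), x])                # add an intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)    # OLS estimates
y_hat = X @ beta_hat                                # fitted "probabilities"

# The linear fit can produce fitted values below 0 or above 1
print(y_hat.min(), y_hat.max())
```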
Problems with LPM
(1) Disturbances are non-normal (for each observation the error can take only two values).
If the event happens (y_i = 1):
1 = x_i'\beta + u_i, so u_i = 1 - x_i'\beta, with probability f(u_i) = x_i'\beta.
If the event does not happen (y_i = 0):
0 = x_i'\beta + u_i, so u_i = -x_i'\beta, with probability f(u_i) = 1 - x_i'\beta.

Why does the above expression show heteroskedasticity? Since u_i takes only these two values, \mathrm{Var}(u_i|x_i) = x_i'\beta(1 - x_i'\beta), which varies with x_i.

What is the solution? A common remedy is weighted least squares (see below).
The major criticism relates to the formulation – that the conditional expectation is interpreted as the probability that the event will occur.

(4) A probability cannot be linearly related to the independent variables for all their possible values.
Example
Suppose we are interested in the impact of the number of kids a woman has on her labour-market participation. The impact on the participation probability when the woman moves from having no (0) children to having 1 should not equal the impact of moving from having 1 to 2 children. In practice, subsequent children have a smaller impact than the first child.

Article
McCall, B. P. (1995) 'The Impact of Unemployment Insurance Benefit Levels on Recipiency', Journal of Business and Economic Statistics, 13, 189–198. See also Verbeek, Ch. 7, sec. 7.1.6.
Source | SS df MS Number of obs = 4877
-------------+------------------------------ F( 19, 4857) = 18.33
Model | 70.5531915 19 3.71332587 Prob > F = 0.0000
Residual | 983.900366 4857 .20257368 R-squared = 0.0669
-------------+------------------------------ Adj R-squared = 0.0633
Total | 1054.45356 4876 .216253806 Root MSE = .45008
------------------------------------------------------------------------------
y | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rr | .6288587 .3842068 1.64 0.102 -.1243605 1.382078
rr2 | -1.019059 .480955 -2.12 0.034 -1.961949 -.0761697
age | .0157489 .0047841 3.29 0.001 .0063698 .025128
age2ten | -.0014595 .0006016 -2.43 0.015 -.0026389 -.0002801
tenure | .0056531 .0012152 4.65 0.000 .0032708 .0080355
slack | .1281283 .0142249 9.01 0.000 .100241 .1560156
abol | -.0065206 .0248281 -0.26 0.793 -.0551948 .0421537
seasonal | .0578745 .0357985 1.62 0.106 -.0123067 .1280557
head | -.043749 .016643 -2.63 0.009 -.0763769 -.0111211
married | .0485952 .0161348 3.01 0.003 .0169637 .0802267
dkids | -.0305088 .0174321 -1.75 0.080 -.0646837 .003666
dykids | .0429115 .0197563 2.17 0.030 .0041803 .0816428
smsa | -.035195 .0140138 -2.51 0.012 -.0626684 -.0077217
nwhite | .0165889 .0187109 0.89 0.375 -.0200928 .0532707
yrdispl | -.0133149 .0030686 -4.34 0.000 -.0193307 -.007299
school12 | -.0140365 .0168433 -0.83 0.405 -.0470571 .018984
male | -.0363176 .0178142 -2.04 0.042 -.0712415 -.0013936
statemb | .0012394 .0002039 6.08 0.000 .0008396 .0016393
stateur | .0181479 .0030843 5.88 0.000 .0121012 .0241945
_cons | -.076869 .122056 -0.63 0.529 -.316154 .162416
------------------------------------------------------------------------------
Interpretation?
When we estimate the LPM using OLS, no corrections for heteroskedasticity are made and no attempt is made to keep the implied probabilities between 0 and 1.
Limited Dependent Variable (LDV) models
LDV – a dependent variable whose range of values is substantively restricted.
Two such models are the logit and probit models. One can estimate a logit or a probit model for an equation with a binary dependent variable.

The logit model assumes that the error term follows a logistic distribution, while the probit model assumes that it follows a normal distribution.
What about corner solution responses?
In practice, optimising behaviour of economic agents (e.g. individuals, households, etc.) leads to a corner solution response for some nontrivial fraction of the population.
Example:

Poisson and negative binomial models are count-data models which handle untypical dependent variables, for example when the dependent variable is in the form of counts (e.g. number of visits to a hospital in a given year).
The Logit model

Our primary interest is to estimate the response probability:
P(y = 1|x) = P(y = 1|x_1, x_2, \dots, x_k) \quad (5)
P(y = 1|x) = G(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k) = G(\beta_0 + x\beta) \quad (6)
where G(z) is evaluated at z = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k = \beta_0 + x\beta, and
y_i = \beta_0 + \beta_1 x_{1i} + \dots + \beta_k x_{ki} + \varepsilon_i = \beta_0 + x_i\beta + \varepsilon_i \quad (8)
The Likelihood function

In the case of random sampling, where all observations are sampled independently, the likelihood function will simply be the product of the individual contributions, as follows:
L = \prod_{i=1}^{n} p_i^{y_i}(1 - p_i)^{1 - y_i}
  = p_1^{y_1}(1 - p_1)^{1 - y_1} \times p_2^{y_2}(1 - p_2)^{1 - y_2} \times \dots \times p_n^{y_n}(1 - p_n)^{1 - y_n} \quad (9)
\ln L = \sum_i \left[ y_i \ln p_i + (1 - y_i)\ln(1 - p_i) \right] \quad (10)
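The log-likelihood in eq. (10) is straightforward to state in code; a minimal sketch with hypothetical probabilities p_i and outcomes y_i:

```python
import numpy as np

def bernoulli_loglik(y, p):
    """ln L of eq. (10): sum over i of y_i*ln(p_i) + (1-y_i)*ln(1-p_i)."""
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))

y = [1, 0, 1, 1]              # hypothetical outcomes
p = [0.8, 0.3, 0.6, 0.9]      # hypothetical fitted probabilities
print(bernoulli_loglik(y, p))
```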
Logit Model using MLE technique
Iteration 0: log likelihood = -3043.028
Iteration 1: log likelihood = -2875.8198
Iteration 2: log likelihood = -2873.2003
Iteration 3: log likelihood = -2873.1965
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rr | 3.06808 1.868225 1.64 0.101 -.5935732 6.729733
rr2 | -4.890618 2.333521 -2.10 0.036 -9.464236 -.3170007
age | .0676968 .0239095 2.83 0.005 .020835 .1145586
age2ten | -.0059681 .0030383 -1.96 0.050 -.0119231 -.000013
tenure | .0312492 .0066443 4.70 0.000 .0182267 .0442717
slack | .624822 .0706385 8.85 0.000 .4863731 .7632709
abol | -.0361753 .1178082 -0.31 0.759 -.2670751 .1947245
seasonal | .270874 .1711711 1.58 0.114 -.0646152 .6063633
head | -.2106822 .081226 -2.59 0.009 -.3698822 -.0514821
married | .2422656 .0794099 3.05 0.002 .0866251 .3979061
dkids | -.1579269 .0862177 -1.83 0.067 -.3269105 .0110566
dykids | .2058941 .0974924 2.11 0.035 .0148126 .3969756
smsa | -.1703537 .0697808 -2.44 0.015 -.3071216 -.0335858
nwhite | .0740701 .0929562 0.80 0.426 -.1081208 .256261
yrdispl | -.0637001 .0149972 -4.25 0.000 -.0930941 -.0343062
school12 | -.0652576 .0824126 -0.79 0.428 -.2267834 .0962681
male | -.179829 .087535 -2.05 0.040 -.3513944 -.0082636
statemb | .006027 .001009 5.97 0.000 .0040494 .0080046
stateur | .0956198 .0159116 6.01 0.000 .0644336 .126806
_cons | -2.800499 .6041675 -4.64 0.000 -3.984645 -1.616352
------------------------------------------------------------------------------
Computing predicted probabilities from the above logit estimates
G(z) = \Lambda(z) = \frac{\exp(z)}{1 + \exp(z)}
Dividing both the numerator and the denominator by \exp(z), we get
\Pr_i = G(z) = \frac{1}{1 + \exp(-z)}

What is the distinction between the logit model and the logistic model?
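The two forms of the logistic cdf above are numerically identical; a small check:

```python
import numpy as np

def logit_cdf(z):
    """Logistic cdf G(z) = exp(z)/(1+exp(z)) = 1/(1+exp(-z))."""
    return 1.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

z = np.linspace(-5, 5, 11)
same = np.allclose(logit_cdf(z), np.exp(z) / (1 + np.exp(z)))
print(logit_cdf(0.0), same)   # G(0) = 0.5, and the two forms agree
```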
The Probit model

G(z) = \Phi(z) = \int_{-\infty}^{z} \phi(v)\,dv, \quad (11)
where
\phi(z) = (2\pi)^{-1/2}\exp(-z^2/2) = \frac{1}{\sqrt{2\pi}}\exp\!\left(-\tfrac{1}{2}z^2\right).
Eq (11) can be rewritten as:
G(z) = \Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\exp\!\left(-\tfrac{1}{2}v^2\right) dv \quad (12)
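The integral in eq. (12) has no closed form, but the standard normal cdf is easy to evaluate numerically; a sketch via the error function:

```python
import math

def probit_cdf(z):
    """Standard normal cdf Phi(z), computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

print(probit_cdf(0.0))    # 0.5 by symmetry
print(probit_cdf(1.96))   # about 0.975
```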
------------------------------------------------------------------------------
y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
rr | 1.863475 1.127476 1.65 0.098 -.3463382 4.073287
rr2 | -2.980436 1.410589 -2.11 0.035 -5.74514 -.2157332
age | .0422141 .0142969 2.95 0.003 .0141927 .0702355
age2ten | -.0037741 .0018118 -2.08 0.037 -.0073251 -.0002231
tenure | .0176942 .0038077 4.65 0.000 .0102312 .0251572
slack | .3754931 .0424115 8.85 0.000 .2923681 .458618
abol | -.0223137 .071845 -0.31 0.756 -.1631274 .1185
seasonal | .1612071 .1039498 1.55 0.121 -.0425308 .3649451
head | -.1247463 .0491627 -2.54 0.011 -.2211034 -.0283892
married | .1454763 .0477579 3.05 0.002 .0518725 .2390801
dkids | -.0965778 .051813 -1.86 0.062 -.1981294 .0049738
dykids | .1236098 .058581 2.11 0.035 .0087931 .2384265
smsa | -.1001521 .04183 -2.39 0.017 -.1821373 -.0181668
nwhite | .0517939 .0559871 0.93 0.355 -.0579388 .1615266
yrdispl | -.0384797 .0090685 -4.24 0.000 -.0562535 -.0207058
school12 | -.0415518 .0497067 -0.84 0.403 -.1389751 .0558714
male | -.1067169 .0527926 -2.02 0.043 -.2101885 -.0032454
statemb | .0036399 .0006071 6.00 0.000 .0024499 .0048298
stateur | .0568271 .0094492 6.01 0.000 .038307 .0753472
_cons | -1.699991 .3622682 -4.69 0.000 -2.410024 -.9899586
------------------------------------------------------------------------------
Derivation of the above two models using a latent variable model
y^* = \beta_0 + x\beta + e, \quad y = 1[y^* > 0] \quad (13)

where 1[\cdot] is the indicator function:
y = 1 \text{ if } y^* > 0
y = 0 \text{ otherwise, i.e. if } y^* \le 0
Because the logistic and standard normal distributions are symmetric about zero, 1 - G(-z) = G(z), so that P(y = 1|x) = P(e > -(\beta_0 + x\beta)) = G(\beta_0 + x\beta).
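A simulation sketch of the latent variable model in eq. (13) with standard normal errors (hypothetical parameter values): the observed share of y = 1 near a given x should match \Phi(\beta_0 + x\beta).

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(42)
n = 100_000
x = rng.normal(size=n)
beta0, beta1 = 0.2, 0.8                          # hypothetical parameters
y_star = beta0 + beta1 * x + rng.normal(size=n)  # latent index, e ~ N(0, 1)
y = (y_star > 0).astype(int)                     # observed outcome y = 1[y* > 0]

phi = 0.5 * (1 + erf(beta0 / sqrt(2)))           # Phi(beta0) = P(y=1 | x=0)
share = y[np.abs(x) < 0.05].mean()               # empirical share near x = 0
print(phi, share)                                # the two should be close
```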
Y = 1(U_1^* > U_0^*)
  = 1(x'\beta_1 + u_1 > x'\beta_0 + u_0)
  = 1(u_1 - u_0 > -x'(\beta_1 - \beta_0))
Clearly, we cannot identify both sets of parameters \beta_0 and \beta_1; however, we implicitly parameterise the choice model as
y = 1(y^* > 0)
where y^* = x'(\beta_1 - \beta_0) + (u_1 - u_0) = x'\beta + u.
In other words, the latent variable approach to binary choice model specification can be derived from an economic model of behaviour.
While the underlying preference specification is by necessity fairly restrictive, the fact that the latent variable approach can be presented as having foundations in economic theory lends weight to its application in applied work.
Goodness-of-fit
Contrary to the linear regression model, there is no single measure of goodness-of-fit in binary response (choice) models.

Let \log L_1 denote the maximised log-likelihood of the model of interest, and let \log L_0 denote the maximum value of the log-likelihood function when all parameters, except the intercept, are set to zero.

Clearly, \log L_1 \ge \log L_0. The larger the difference between the two log-likelihood values, the more the extended model adds to the very restrictive model. A first goodness-of-fit measure is defined as:
R^2_{\text{pseudo}} = 1 - \frac{1}{1 + 2(\log L_1 - \log L_0)/N}
R^2_{\text{McFadden}} = 1 - \frac{\log L_1}{\log L_0}
and the McFadden measure is sometimes referred to as the Likelihood Ratio Index.
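The McFadden measure can be computed directly from the logit output shown earlier (convergence value log L1 = -2873.1965; I read the iteration 0 value, -3043.028, as the intercept-only log-likelihood log L0, which is an assumption about the software's starting point):

```python
logL1 = -2873.1965    # maximised log-likelihood of the full logit model
logL0 = -3043.028     # intercept-only log-likelihood (iteration 0 of the output)

mcfadden_r2 = 1 - logL1 / logL0
print(round(mcfadden_r2, 4))
```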
\frac{\partial p(x)}{\partial x_j} = g(\beta_0 + x\beta)\,\beta_j, \quad \text{where } g(z) = \frac{dG}{dz}(z). \quad (11)
G = cdf of a continuous random variable
g = pdf
G(.) is strictly increasing, so g(z) > 0 for all z.

Eq (11) tells us that the partial effect of x_j on p(x) depends on x through the positive quantity g(\beta_0 + x\beta), which means that the partial effect always has the same sign as \beta_j.
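For the logit, the pdf is g(z) = G(z)[1 - G(z)], so the partial effect g(\beta_0 + x\beta)\beta_j is easy to compute; a sketch with hypothetical values of the index and coefficient:

```python
import numpy as np

def logit_pdf(z):
    """g(z) = dG/dz for the logistic cdf: G(z) * (1 - G(z))."""
    G = 1 / (1 + np.exp(-z))
    return G * (1 - G)

beta_j = 0.6        # hypothetical coefficient on x_j
z = 0.4             # hypothetical index value beta_0 + x*beta
me = logit_pdf(z) * beta_j
print(me)           # positive, i.e. the same sign as beta_j
```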
What is the ME if the regressor is binary?
It will be:
G(\beta_0 + \beta_1 + \beta_2 x_2 + \dots + \beta_k x_k) - G(\beta_0 + \beta_2 x_2 + \dots + \beta_k x_k) \quad (12)
Note x_1 is 1 in the first term of eq (12) and 0 in the second term. Only the sign, not the magnitude, of the coefficient is directly interpretable.

Example:
If y is an employment indicator and the regressor is a location dummy (e.g. urban – rural). For a discrete regressor, the effect of increasing x_k from c_k to c_k + 1 is
G(\beta_0 + \beta_1 x_1 + \dots + \beta_k(c_k + 1)) - G(\beta_0 + \beta_1 x_1 + \dots + \beta_k c_k) \quad (13)
Other standard functional forms can be included among the regressors (e.g. polynomials of different order).
Example:
In the model,
P(y = 1|z) = G(\beta_0 + \beta_1 z_1 + \beta_2 z_1^2 + \beta_3 \log(z_2) + \beta_4 z_3) \quad (14)

The ME of z_1 on P(y = 1|z) is
\frac{\partial P(y = 1|z)}{\partial z_1} = g(\beta_0 + x\beta)(\beta_1 + 2\beta_2 z_1) \quad (15)
The ME of z_2 is
\frac{\partial P(y = 1|z)}{\partial z_2} = g(\beta_0 + x\beta)(\beta_3 / z_2) \quad (16)

where
x\beta = \beta_1 z_1 + \beta_2 z_1^2 + \beta_3 \log(z_2) + \beta_4 z_3 \quad (17)
Thus
g(\beta_0 + x\beta)(\beta_3 / 100)
is the approximate change in the response probability when z_2 increases by 1%.
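The chain-rule effects in eqs. (15)-(16) can be verified against numerical derivatives; a sketch with a logit G and hypothetical coefficients:

```python
import numpy as np

b0, b1, b2, b3, b4 = -0.5, 0.8, -0.1, 0.4, 0.2   # hypothetical parameters

def G(z):                                         # logistic cdf
    return 1 / (1 + np.exp(-z))

def p(z1, z2, z3):                                # response probability, eq. (14)
    return G(b0 + b1*z1 + b2*z1**2 + b3*np.log(z2) + b4*z3)

z1, z2, z3 = 1.0, 2.0, 0.5
z = b0 + b1*z1 + b2*z1**2 + b3*np.log(z2) + b4*z3
g = G(z) * (1 - G(z))                             # logistic pdf g(z)

me_z1 = g * (b1 + 2*b2*z1)                        # eq. (15)
me_z2 = g * (b3 / z2)                             # eq. (16)

h = 1e-6                                          # central finite differences
num_z1 = (p(z1 + h, z2, z3) - p(z1 - h, z2, z3)) / (2 * h)
num_z2 = (p(z1, z2 + h, z3) - p(z1, z2 - h, z3)) / (2 * h)
print(me_z1 - num_z1, me_z2 - num_z2)             # both differences are tiny
```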
Computing Marginal Effects using STATA
(Type mfx after estimating the equation.)
Note:
Interactions among regressors (including those between discrete and continuous variables) are handled similarly.

Estimation
We know that we have different ways of generating estimators, e.g. the method of moments, least squares and maximum likelihood estimation (MLE).
All of the discrete choice models we discussed above are estimated using the MLE technique. To estimate the LPM, we can use OLS or, in some cases, WLS (weighted least squares).

Assume that we have a random sample of size n. To obtain the ML estimator conditional on the regressors, we need the density of y_i given x_i. This is
f(y|x_i; \beta) = [G(x_i\beta)]^y [1 - G(x_i\beta)]^{1-y}, \quad y = 0, 1 \quad (18)
Differentiating this function w.r.t. the parameters gives us the following FOCs. Solving them for the parameters of interest will give us the ML estimates.
\frac{\partial L(\beta)}{\partial \beta_1} = 0, \; \frac{\partial L(\beta)}{\partial \beta_2} = 0, \; \dots, \; \frac{\partial L(\beta)}{\partial \beta_k} = 0 \quad (20)

If G(.) in eq. (19) is the standard logit cdf, we obtain \hat{\beta} (the vector of parameters) as the logit estimator; if G(.) is the standard normal cdf, the vector gives us the probit estimator.

The non-linear nature of the maximisation problem means there are no closed-form formulas for the logit or probit ML estimates; they must be computed numerically.
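Because no closed form exists, the estimates are found by numerical optimisation; a sketch fitting a logit by Newton-Raphson on simulated data (hypothetical true coefficients):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])                  # hypothetical truth
p = 1 / (1 + np.exp(-X @ beta_true))
y = (rng.uniform(size=n) < p).astype(float)

# Newton-Raphson iterations on the logit log-likelihood
b = np.zeros(2)
for _ in range(25):
    G = 1 / (1 + np.exp(-X @ b))
    score = X.T @ (y - G)                          # gradient of log L
    hess = -(X * (G * (1 - G))[:, None]).T @ X     # Hessian of log L
    b = b - np.linalg.solve(hess, score)

print(b)   # close to beta_true in a large sample
```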
Testing for Normality (Background)

For a standard normal random variable \varepsilon, the moments E(\varepsilon^j) are as follows:
E(\varepsilon) = 0 (mean) (c.1)
E(\varepsilon^2) = 1 (variance) (c.2)
E(\varepsilon^3) = 0 (skewness) (c.3)
E(\varepsilon^4) - 3 = 0 (excess kurtosis) (c.4)

When presented with a regression of the form y_i = x_i'\beta + u_i in which the error term u_i is maintained as normal, the error term should respect characteristics (c.1) to (c.4).
An obvious way to carry out such tests is to compare the sample moments of the standardised residuals \tilde{\varepsilon}_i = (y_i - x_i'\tilde{\beta})\,\tilde{\sigma}^{-1} with the theoretical moments above.

Note that the first two sample moments of \tilde{\varepsilon}_i respect (c.1) and (c.2) by definition:
N^{-1}\sum_{i=1}^{N}\tilde{\varepsilon}_i = 0 \text{ (mean)}, \qquad N^{-1}\sum_{i=1}^{N}\tilde{\varepsilon}_i^2 = 1 \text{ (variance)},

where \tilde{\varepsilon}_i = (y_i - x_i'\tilde{\beta})\,\tilde{\sigma}^{-1}. We can therefore use the third and fourth sample moments,
N^{-1}\sum_{i=1}^{N}\tilde{\varepsilon}_i^3 \quad \text{and} \quad N^{-1}\sum_{i=1}^{N}\tilde{\varepsilon}_i^4 - 3,
as test statistics.

Under the null of normality, these two statistics ought to respect conditions (c.3) and (c.4).
Why?
The skewness of any symmetric distribution, such as the normal, is zero: E(\varepsilon^3) is estimated by \sum_i(\varepsilon_i - \bar{\varepsilon})^3 / ((N-1)s^3), where s is the standard deviation and \varepsilon is a univariate random variable.

We also know that the kurtosis of a standard normal distribution, estimated by \sum_i(\varepsilon_i - \bar{\varepsilon})^4 / ((N-1)s^4), is 3. For this reason, excess kurtosis is defined as \sum_i(\varepsilon_i - \bar{\varepsilon})^4 / ((N-1)s^4) - 3, so that the standard normal distribution has an excess kurtosis of zero.
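These sample moments can be computed directly; a sketch checking that normal draws have skewness and excess kurtosis near zero (simulated data):

```python
import numpy as np

def skewness(e):
    e = np.asarray(e, dtype=float)
    s = e.std(ddof=1)
    return float(np.sum((e - e.mean())**3) / ((len(e) - 1) * s**3))

def excess_kurtosis(e):
    e = np.asarray(e, dtype=float)
    s = e.std(ddof=1)
    return float(np.sum((e - e.mean())**4) / ((len(e) - 1) * s**4) - 3)

rng = np.random.default_rng(7)
e = rng.normal(size=200_000)
print(skewness(e), excess_kurtosis(e))   # both should be near zero
```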
Likelihood Ratio (LR) Test for non-normality in probit models
Estimate an auxiliary regression of the form
y_i^* = x_i'\beta + \gamma_1(x_i'\tilde{\beta})^2 + \gamma_2(x_i'\tilde{\beta})^3 + \varepsilon_i;

3. Obtain the maximised log-likelihood \log L_N from the auxiliary regression.

LR test for heteroskedasticity in probit models

3. Obtain the maximised log-likelihood \log L_H from the auxiliary regression, and compare it with the value under the null of homoskedasticity.
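In each case the test statistic is LR = 2(\log L_{unrestricted} - \log L_{restricted}), compared with a \chi^2 critical value whose degrees of freedom equal the number of restrictions. As an illustration with the logit output shown earlier (again treating iteration 0 as the intercept-only log-likelihood, which is an assumption):

```python
logL_u = -2873.1965   # unrestricted (full) logit model
logL_r = -3043.028    # restricted (intercept-only) model
q = 19                # number of restrictions: all slopes set to zero

LR = 2 * (logL_u - logL_r)
print(round(LR, 2))   # well above the 5% chi2(19) critical value of about 30.1
```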
LR test for Omitted Variables in Probit Models
1. Estimate y_i^* = x_i'\beta + \varepsilon_i to obtain the ML estimates \tilde{\beta} and the maximised log-likelihood \log L_0, under the null of no incorrect omission.
Testing exclusion restrictions in logit and probit models
In many applications, we do not go beyond t- and F-tests to assess the statistical significance of parameters.

Suppose we have the following model:
y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u \quad (21)
Note: y can be continuous or discrete.
We test whether the last q of these variables all have zero population parameters. Thus, the null is
H_0: \beta_{k-q+1} = 0, \dots, \beta_k = 0 \quad (22)
which puts q exclusion restrictions on the model given in eq (21).
Step I:
The estimates from the restricted model can be given as:
\tilde{y} = \tilde{\beta}_0 + \tilde{\beta}_1 x_1 + \dots + \tilde{\beta}_{k-q} x_{k-q}, with residuals \tilde{u} \quad (23)

Step II:
This suggests running a regression of these residuals on those independent variables excluded under the null, which is almost what the LM test does.
This takes the form:
\tilde{u} = \gamma_0 + \gamma_1 x_1 + \dots + \gamma_k x_k \quad (24)
This is referred to as an auxiliary regression, which is used to compute a test statistic but whose coefficients are not of direct interest.
Step III
Compute the LM statistic. Under the null, this turns out to be the product of the sample size and the R-squared obtained from the auxiliary regression:
LM = nR^2_{\tilde{u}} \sim \chi^2_q \quad (25)

Step IV
Compare the statistic with the chi-squared critical values and decide.
Unlike the F-test, the degrees of freedom of the unrestricted model play no role in the LM test.

Caution:
If in Step I we mistakenly regress y on all of the independent variables and obtain the residuals from this unrestricted regression, we do not get an interesting statistic; the resulting R-squared will be exactly zero.
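Steps I-III can be sketched on simulated data where the null is true by construction (x2 and x3 genuinely excluded; hypothetical coefficients):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
x1, x2, x3 = rng.normal(size=(3, n))
y = 1 + 0.5 * x1 + rng.normal(size=n)   # x2, x3 are truly irrelevant (H0 holds)

def ols_resid(y, X):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ b

# Step I: restricted regression, excluding x2 and x3
u = ols_resid(y, np.column_stack([np.ones(n), x1]))

# Step II: auxiliary regression of the residuals on all the regressors
X_all = np.column_stack([np.ones(n), x1, x2, x3])
e = ols_resid(u, X_all)
r2 = 1 - (e @ e) / ((u - u.mean()) @ (u - u.mean()))

# Step III: LM = n * R^2, chi-squared with q = 2 df under the null
LM = n * r2
print(LM)   # should be modest; the 5% critical value of chi2(2) is about 5.99
```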
ii.) Wald test

This statistic is computed by econometrics packages (such as STATA) that allow exclusion restrictions to be tested after the unrestricted model has been estimated.

The statistic has a chi-square distribution, with degrees of freedom equal to the number of restrictions tested.

In conducting the above tests using STATA, one needs a good knowledge of post-estimation commands.
The Multinomial Logit (Mlogit) model
When we are interested in modelling decisions among multiple alternatives, where the outcomes are unordered, we cannot use ordered or binary models.

Imagine that we have M possible alternatives in a discrete (multiple) choice, each with an associated probability P_{mi}, m = 1,...,M, for i = 1,...,N.

Essentially, the Mlogit expresses these probabilities (relative to some benchmark outcome, say P_{Mi}) in relation to some non-linear transformation of a linear combination of a set of k explanatory variables x_i.
Suppressing the subscript i for ease of exposition, let
\frac{P_m}{P_m + P_M} = F(x'\beta_m) \quad (26)

Since P_m \in (0, 1), we therefore have that
\frac{P_m}{P_m + P_M} \to 0 \text{ as } P_m \to 0,\; P_M \to 1; \qquad \frac{P_m}{P_m + P_M} \to 1 \text{ as } P_m \to 1,\; P_M \to 0. \quad (28)

Hence, it must also be the case that F(.) is a monotone increasing function of its argument, such that F(u) \to 0 as u \to -\infty and F(u) \to 1 as u \to +\infty. Summing the M - 1 probability ratios, we have that
\sum_{j=1}^{M-1} \frac{P_j}{P_M} = \frac{1 - P_M}{P_M} = \frac{1}{P_M} - 1 \quad (29)
which therefore defines the Mth probability (the benchmark) as
P_M = \frac{1}{1 + \sum_{j=1}^{M-1} P_j / P_M} = \frac{1}{1 + \sum_{j=1}^{M-1} \psi(x'\beta_j)} \quad (30)

and the remaining M - 1 probabilities P_m as
P_m = \frac{\psi(x'\beta_m)}{1 + \sum_{j=1}^{M-1} \psi(x'\beta_j)} \quad (31)
for all m = 1,...,M-1.

In other words, each probability can be expressed in terms of the set of explanatory variables and an unknown set of parameter vectors \beta_1, \beta_2, \dots, \beta_{M-1}.
Estimation of Mlogit
In order to estimate the model, we need a specification for \psi(.), and the formulation for the mlogit turns out to be \psi(u) = \exp(u), such that the implied specification of F(.) is logistic (why?).

The likelihood function itself is a logical extension of the likelihoods in the binary case.
l_i = \prod_{m=1}^{M} \Pr(y_i = m \mid x_i)^{z_{im}} = \prod_{m=1}^{M} P_{im}^{z_{im}} \quad (32)
where z_{im} = 1(y_i = m), for m = 1,...,M, and where P_{im} are the probabilities defined using the formulae derived above.
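The probabilities in eqs. (30)-(31), with \psi(u) = \exp(u) and the Mth category as benchmark, can be sketched directly (hypothetical x and \beta vectors):

```python
import numpy as np

def mlogit_probs(x, betas):
    """P_1, ..., P_{M-1}, P_M for the mlogit, with the last
    (benchmark) category's coefficients normalised to zero."""
    u = np.array([x @ b for b in betas])         # x'beta_j, j = 1, ..., M-1
    denom = 1 + np.exp(u).sum()                  # 1 + sum_j psi(x'beta_j)
    return np.append(np.exp(u) / denom, 1 / denom)

x = np.array([1.0, 0.5])                                  # hypothetical regressors
betas = [np.array([0.2, -0.4]), np.array([-0.1, 0.3])]    # hypothetical beta_1, beta_2
p = mlogit_probs(x, betas)
print(p, p.sum())   # three probabilities that sum to 1
```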
Additional Example
Consider the outcomes 1, 2, 3, ..., m recorded in y, and the explanatory variables X. Assume there are m = 3 outcomes (e.g. choice of health care provider: public hospital, private hospital and traditional healer) with probabilities Pr(y=1), Pr(y=2) and Pr(y=3).
\Pr(y = 2) = \frac{e^{x\beta_2}}{e^{x\beta_1} + e^{x\beta_2} + e^{x\beta_3}}
\Pr(y = 3) = \frac{e^{x\beta_3}}{e^{x\beta_1} + e^{x\beta_2} + e^{x\beta_3}}

The model, however, is unidentified in the sense that there is more than one solution for \beta_1, \beta_2 and \beta_3 that leads to the same probabilities for y = 1, y = 2 and y = 3. Normalising \beta_1 = 0 removes the indeterminacy:
\Pr(y = 2) = \frac{e^{x\beta_2}}{1 + e^{x\beta_2} + e^{x\beta_3}}
\Pr(y = 3) = \frac{e^{x\beta_3}}{1 + e^{x\beta_2} + e^{x\beta_3}}
The errors in the latent variable model giving us the mlogit model may have the conventional interpretation of capturing the impact of factors known to the decision maker but not to the observer/econometrician.