
Nonlinear Econometrics - Universitatea de Vest Timisoara
Lecture 1
Binary Choice Models and Multiple Discrete Choice Models
1 Classes of Discrete Variable

There are two classes of discrete variables:

1. Binary: a variable Y_t which can take a value of either 1 or 0.

2. Multinomial, which can be further classified as:

Categorical:
y = 1 if income < $10,000
y = 2 if $10,000 ≤ income ≤ $20,000
y = 3 if $20,000 < income

A further classification of categorical variables depends on whether the specific outcomes taken by that variable have a natural ordering or sequence.

Nominal/unordered categorical:
y = 1 if the mode of transport is by car
y = 2 if the mode of transport is by bus
y = 3 if the mode of transport is by train

Sequential:
y = 1 if an individual is working
y = 2 if an individual is working part-time

Non-categorical:
y = the number of TV sets in a household

The characteristics of any discrete variable dictate the methods available for model solution.
2 About Binary Choice Models

Theoretical framework: Consider a binary dependent variable y which has only two possible outcomes (0 and 1), and a vector of explanatory variables x thought to influence the realization of y.

The unconditional expectation of the binary variable y is by definition a probability:

E(y) = Pr(y = 1)

Further, let the set of explanatory variables x influence the outcome of y. Then, the conditional expectation of y given x is:

E(y|x) = Pr(y = 1|x)

Relate this term to the standard regression analysis:

y = F(x, β) + u

has the conditional expectation

E(y|x) = F(x, β) + E(u|x) = F(x, β)

- Therefore, the standard regression functional F(x, β) is a representation of the conditional expectation of y given x.
- If the dependent variable in a regression relationship is binary, then the regression function equates directly to the conditional probability of observing y = 1.
- Thus, the characteristics of binary choice models depend on the way we specify F(x, β).

The latent variable approach: Assume that there is some underlying (and unobserved) latent propensity variable y*, where y* ∈ (-∞, ∞). We do not observe y* directly, but we do observe a binary outcome y such that

y = 1(y* > 0)

where 1(.) is the indicator function taking the value 1 if the condition within parentheses is satisfied, and 0 otherwise.

Define the latent equation in linear form,

y* = x'β + u,

where u is random with symmetric density f(.) and corresponding cdf F(.). We now have that

E(y|x) = Pr(y = 1|x)
       = Pr(y* > 0|x)
       = Pr((x'β + u) > 0)
       = Pr(u > -x'β) = F(x'β),

where the last equality uses the symmetry of f(.). By specifying an appropriate distribution function for u, we can derive binary choice models.
Example: Let y be a labour force participation variable with y = 1 if the individual works and y = 0 otherwise, and let the outcome (working, non-working) be described by the state-specific utilities U*(y), with

U*(y = 1) = x'β_1 + u_1
U*(y = 0) = x'β_0 + u_0

Participation in the labour force requires that U*(y = 1) > U*(y = 0), such that

y = 1[U*(y = 1) > U*(y = 0)]
  = 1(x'β_1 + u_1 > x'β_0 + u_0)
  = 1[u_1 - u_0 > -x'(β_1 - β_0)].

We cannot identify both parameters β_1 and β_0, but we can identify the difference β_1 - β_0. Hence,

y = 1(y* > 0)

where

y* = x'(β_1 - β_0) + (u_1 - u_0) = x'β + u.

Thus, the latent variable approach to a binary choice model can be derived from an economic model of behavior.
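For illustration, here is a minimal simulation sketch of this latent-variable mechanism in Python (NumPy only; the coefficient values are assumptions chosen for the example, not estimates from any dataset):

    import numpy as np

    rng = np.random.default_rng(42)
    n = 10_000
    beta = np.array([0.5, 1.0])   # assumed coefficients (intercept, slope)

    x = np.column_stack([np.ones(n), rng.normal(size=n)])  # design matrix with constant
    u = rng.normal(size=n)        # symmetric error, so this is a probit-type model
    y_star = x @ beta + u         # latent propensity y* = x'beta + u
    y = (y_star > 0).astype(int)  # observed binary outcome y = 1(y* > 0)

    # The empirical frequency of y = 1 tracks F(x'beta) on average.
    print(y.mean())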
3 The Linear Probability Model (LPM)

Consider a binary dependent variable y and a (k×1) vector of explanatory variables x. We may specify the conditional probability directly as:

Pr(y = 1|x) = F(x, β) = x'β.

Introducing random disturbances, we have

y = x'β + u,

where u represents the stochastic disturbance term in the relationship, f(u) represents its density and E(u|x) = 0 by definition. This model is known as the Linear Probability Model (LPM).

For a sample of n observations {y_i, x_i} drawn at random from a population,

y_i = x_i'β + u_i.

The Ordinary Least Squares estimation procedures may be applied.

The LPM might therefore be considered a first-order approximation to the arbitrary nonlinear probability function F(x_i, β); that is,

F(x, β) ≈ F(x_0, β) + (x - x_0)' ∂F(x_0, β)/∂x = x'β

using a first-order Taylor series expansion around x = x_0.
Problems with the LPM:

- the disturbance terms are non-normal:

u_i = 1 - x_i'β with probability x_i'β (for y = 1)
u_i = -x_i'β with probability 1 - x_i'β (for y = 0)

- the disturbance terms are heteroskedastic:

var(u_i|x_i) = E(u_i²|x_i)
             = (x_i'β)²(1 - x_i'β) + (1 - x_i'β)²(x_i'β)
             = (x_i'β)(1 - x_i'β)
             = Pr(y_i = 1|x_i) Pr(y_i = 0|x_i).

- the conditional expectation is not bounded between zero and one:

E(y_i|x_i) = Pr(y_i = 1|x_i) = x_i'β

Instead, it is defined over the entire real line. The fitted value x_i'β stands for Pr(y = 1|x), a probability, yet nothing restricts x_i'β to lie between zero and one; only the support of y itself is between zero and one by definition.

There are thus two problems. Firstly, the predicted probabilities in the LPM can be > 1 (or < 0), so that when you build weights to address heteroskedasticity (as we will see in the next section), the quantity x_i'β(1 - x_i'β) under the square root can be negative and the weight does not exist. Secondly, even after weighting, the predicted probabilities can still be > 1, which also makes no sense.
Weighted Least Squares: A Solution. We transform this model to give it a constant variance using a 1/w_i weight:

Y_i/w_i = (X_i/w_i)'β + u_i/w_i

In order to obtain a constant var(u_i/w_i) = 1, we use the weight

w_i = sqrt((x_i'β)(1 - x_i'β))

calculated from a first-stage estimation. The weights are a function of our estimates: since we use β̂ in the estimation of the weights, we introduce some randomness, but the transformed model may nonetheless look homoskedastic. We need a consistent estimator of the β: if we estimate with OLS, it will be consistent but heteroskedastic. Using w_i, the adjusted model becomes

y_i/w_i = (x_i/w_i)'β + u_i/w_i

which will generate β̂_LPM. We can draw inference with this new OLS model, where we have corrected the errors, since

var(u_i/w_i) = var(u_i)/w_i² = 1.

However, this model still does not return probabilities within the range [0, 1]. We must still assume that this is the case, or impose the restriction on the data. A better solution is to re-specify, or transform, the regression model itself to constrain the probability outcome.
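A minimal two-stage WLS sketch in Python (synthetic data; the variable names are my own, and the fitted values are clipped into (0, 1) before building weights, an ad hoc fix for the boundary problem noted above):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5_000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta_true = np.array([0.3, 0.2])                 # assumed DGP coefficients
    y = (rng.uniform(size=n) < X @ beta_true).astype(int)

    # Stage 1: plain OLS (consistent but heteroskedastic).
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    p_hat = np.clip(X @ b_ols, 0.01, 0.99)           # keep p_hat(1 - p_hat) > 0

    # Stage 2: reweight by w_i = sqrt(p_hat(1 - p_hat)) and re-run OLS.
    w = np.sqrt(p_hat * (1.0 - p_hat))
    b_wls = np.linalg.lstsq(X / w[:, None], y / w, rcond=None)[0]
    print(b_ols, b_wls)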
LPM Marginal Effects. The change in the expected value of y from a change in x is:

∂E(y_i|x_i)/∂x_i = β

since

E(y_i|x_i) = Pr(y_i = 1|x_i) = x_i'β.
4 The Probit Model

Any probability function is non-linear. If, however, we assume that u_i ~ N(0, σ²), then Pr(y_i = 1|x_i) follows the normal distribution function, with

y_i = Pr(y_i = 1|x_i) + u_i.

But if Pr(y_i = 1|x_i) is normal, we cannot represent it linearly. If it is normal, the density (PDF) is:

φ(u_i) = (1/sqrt(2πσ²)) e^{-u²/(2σ²)}

and the CDF is:

F(x_i, β) = Φ(z) = ∫_{-∞}^{z} φ(u) du

Since the normal is nonlinear, this integral is also non-linear and we have:

Pr(y_i = 1|x_i) = Φ(x'β/σ),

where the argument of the CDF is x'β/σ. When u is assumed normally distributed, the parameters must be scaled to force the variance of u to unity: β and σ are not separately identified, only the ratio β/σ enters the probability. That is why we replace (x'β) and normalize by dividing by σ:

Pr(y = 1|x) = Pr(u > -x'β)
            = Pr(u/σ > -x'β/σ)
            = Pr(z > -x'β/σ)
            = Φ(x'β/σ)

y_i = Φ(x'β/σ) + u_i.

Note: The function Φ(z) is a monotone increasing function of z. Moreover, the model returns well-defined probabilities:

F(x_i, β) → 0 as x_i'β → -∞
F(x_i, β) → 1 as x_i'β → +∞

Indeed, that is why we use it! However, because the transformed regression function is non-linear in β, we can no longer use OLS and must move to Maximum Likelihood solution techniques.
4.1 Estimation of the Probit Model

We construct a likelihood estimator with parameters β, σ.

Consider a sample of n observations {y_i, x_i}, where y_i is binary. Assume y_i = 1(y*_i > 0) and 1 - y_i = 1(y*_i ≤ 0) for y*_i = x_i'β + u_i.

For any vector β, the probability of observing y_i conditional on x_i for an individual is

L_i(β, σ|x_i, y_i) = Pr(y_i = 1|x_i)^{y_i} Pr(y_i = 0|x_i)^{1-y_i}

and for all individuals it is

L_T(β, σ|x, y) = ∏_{i=1}^{n} L_i = ∏_{i=1}^{n} [Pr(y_i = 1|x_i)^{y_i} Pr(y_i = 0|x_i)^{1-y_i}]

Since for the Probit model,

Pr(y_i = 1|x_i; β) = Φ(x_i'β),
Pr(y_i = 0|x_i; β) = 1 - Φ(x_i'β),

and normalizing by dividing by σ, we have:

L_T(β, σ|x, y) = ∏_{i=1}^{n} [Φ(x_i'β/σ)^{y_i} (1 - Φ(x_i'β/σ))^{1-y_i}]

Taking logs simplifies the equation:

ln L_T(β, σ|x, y) = Σ_{i=1}^{n} [y_i ln Φ(x_i'β/σ) + (1 - y_i) ln(1 - Φ(x_i'β/σ))]

Since y_i can only take on values of 0 and 1, one of the two terms inside the summation is equal to zero for each observation because of the indicator function.
First Order Conditions

(β̂, σ̂) = arg max_{β,σ} ln L_T

The first order condition for β (also known as the score function) is non-linear:

∂ln L_T/∂β = Σ_{i=1}^{n} [ y_i φ(x_i'β/σ)/Φ(x_i'β/σ) - (1 - y_i) φ(x_i'β/σ)/(1 - Φ(x_i'β/σ)) ] (x_i/σ) = 0

where Φ is the CDF, the cumulative distribution function, and φ is the PDF, the probability density function (the derivative of the CDF).

The first order condition for σ is:

∂ln L_T/∂σ = Σ_{i=1}^{n} [ Y_i (φ(X_i'β/σ)/Φ(X_i'β/σ))(-X_i'β/σ²) + (1 - Y_i) (φ(X_i'β/σ)(X_i'β/σ²))/(1 - Φ(X_i'β/σ)) ] = 0

To find β̂ and σ̂ requires solving a system of two equations in two unknowns, which is not straightforward to solve analytically. You can start with an initial value and work along a gradient until the first order condition (FOC) value is close to zero, which will give you your optimal β̂.
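A sketch of this numerical maximization in Python (SciPy; here σ is normalized to 1, and the data-generating values are assumptions for illustration):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    n = 5_000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta_true = np.array([-0.5, 1.0])                  # assumed true parameters
    y = (X @ beta_true + rng.normal(size=n) > 0).astype(int)

    def neg_loglik(beta):
        # ln L_T = sum_i [y_i ln Phi(x_i'b) + (1 - y_i) ln(1 - Phi(x_i'b))]
        p = norm.cdf(X @ beta)
        p = np.clip(p, 1e-12, 1 - 1e-12)               # guard the logs numerically
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    res = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")  # gradient-based search
    print(res.x)   # should be close to beta_true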
Probit Marginal Effects

∂E(Y_i|X_i)/∂X = ∂Pr(Y_i = 1|X_i)/∂X = ∂Φ(X'β/σ)/∂X = φ(X'β/σ)(β/σ)

But this depends on i, so each individual would have a different marginal effect. With the normal distribution normalized so that σ = 1, the denominators disappear and we have β_p. Let us assume σ = 1. Then,

φ(X_i'β)β_p = β_LPM

where φ(X_i'β) is between zero and one.
5 The Logit Model

If u_i is not normal, we can assume that it follows the logistic (Λ) distribution. In this case,

F(x_i, β) = Λ(x_i'β)

where

Λ(x_i'β) = Pr(y_i = 1|x_i; β) = exp(x_i'β)/(1 + exp(x_i'β))

or, writing z = x'β,

Λ(z) = exp(z)/(1 + exp(z)) = 1/(1 + exp(-z))

Λ is the CDF of the Logistic distribution, with u ~ Logistic and a mean of zero, E(u_i|X_i) = 0.

Like the Probit model, the Logit CDF is a monotone increasing function of z that returns well-defined probabilities between 0 and 1, but it is non-linear in β and must thus be estimated by Maximum Likelihood solution techniques.
5.1 Estimation of the Logit Model

We construct a maximum likelihood estimator for the logit model. Since

Pr(y_i = 1|x_i; β) = Λ(x_i'β) = exp(x_i'β)/(1 + exp(x_i'β));
Pr(y_i = 0|x_i; β) = 1 - Λ(x_i'β) = 1/(1 + exp(x_i'β)),

the likelihood for a given individual will be:

L_i = Λ(x_i'β)^{y_i} (1 - Λ(x_i'β))^{1-y_i}
    = (e^{x_i'β}/(1 + e^{x_i'β}))^{y_i} (1 - e^{x_i'β}/(1 + e^{x_i'β}))^{1-y_i}
    = (e^{x_i'β}/(1 + e^{x_i'β}))^{y_i} (1/(1 + e^{x_i'β}))^{1-y_i}
    = (e^{x_i'β})^{y_i} / (1 + e^{x_i'β})

L_T = ∏_{i=1}^{n} (e^{x_i'β})^{y_i} / (1 + e^{x_i'β})

ln L_T = Σ_{i=1}^{n} [y_i (x_i'β) - ln(1 + e^{x_i'β})]

This is non-linear in β, but easier to estimate than the probit model.

First Order Conditions. The FOC/score function is:

∂ln L_T/∂β = Σ_{i=1}^{n} (y_i x_i - (e^{x_i'β}/(1 + e^{x_i'β})) x_i) = Σ_{i=1}^{n} (y_i - e^{x_i'β}/(1 + e^{x_i'β})) x_i = 0

which gives the logistic estimate, β̂_L.
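Because the logit log-likelihood and score have the closed forms above, a sketch can supply the exact gradient to the optimizer (Python/SciPy again; the data-generating values are illustrative assumptions):

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(2)
    n = 5_000
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    beta_true = np.array([0.5, -1.0])                     # assumed true parameters
    p_true = 1.0 / (1.0 + np.exp(-(X @ beta_true)))
    y = (rng.uniform(size=n) < p_true).astype(int)

    def neg_loglik(b):
        xb = X @ b
        # ln L_T = sum_i [y_i x_i'b - ln(1 + e^{x_i'b})]
        return -np.sum(y * xb - np.logaddexp(0.0, xb))

    def neg_score(b):
        # dln L_T/db = sum_i (y_i - Lambda(x_i'b)) x_i
        lam = 1.0 / (1.0 + np.exp(-(X @ b)))
        return -X.T @ (y - lam)

    res = minimize(neg_loglik, np.zeros(2), jac=neg_score, method="BFGS")
    print(res.x)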
Logit Marginal Effects

∂Λ(x_i'β)/∂x_ij = ∂Pr(y_i = 1|x_i; β)/∂x_ij = (exp(x_i'β)/[1 + exp(x_i'β)]²) β_j
6 Comparing LPM, Probit and Logit Models

6.1 Interpreting/comparing binary choice model coefficients

The coefficients are:

LPM:    Pr(y_i = 1|x_i; β) = x_i'β
Probit: Pr(y_i = 1|x_i; β) = Φ(x_i'β)
Logit:  Pr(y_i = 1|x_i; β) = Λ(x_i'β)

If the parameter β_j associated with the jth explanatory variable is positive (negative), then

Pr(y_i = 1|x_i; β) = F(x_i'β)

will increase (decrease) with an increase in x_j.

The marginal effects are:

LPM:    ∂Pr(y_i = 1|x_i; β)/∂x_ij = β_j
Probit: ∂Pr(y_i = 1|x_i; β)/∂x_ij = φ(x_i'β)β_j
Logit:  ∂Pr(y_i = 1|x_i; β)/∂x_ij = (exp(x_i'β)/[1 + exp(x_i'β)]²) β_j

Coefficients are not the same as marginal effects. By implication, slope estimates are not directly comparable amongst models (e.g. the variances of the disturbances in the Logit model and the Probit model are different). Hence, the parameters are scaled differently as well.

Notice also that:

- the marginal effects in the LPM are constant (i.e. independent of the data);
- the marginal effects in the Probit and Logit models depend on x_i.
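The sketch below (Python; all coefficient values are hypothetical) evaluates the three marginal-effect formulas at a chosen point such as the sample mean of x, the usual point of comparison:

    import numpy as np
    from scipy.stats import norm

    x_bar = np.array([1.0, 0.2])      # hypothetical mean regressor vector (with constant)
    beta_probit = np.array([-0.4, 0.30])
    beta_logit = np.array([-0.7, 0.48])
    b_lpm, b_probit, b_logit = 0.10, 0.30, 0.48   # hypothetical slopes for x_2

    me_lpm = b_lpm                                        # constant, data-independent
    me_probit = norm.pdf(x_bar @ beta_probit) * b_probit  # phi(x'b) * b_j
    xb = x_bar @ beta_logit
    me_logit = np.exp(xb) / (1 + np.exp(xb))**2 * b_logit # Lambda'(x'b) * b_j
    print(me_lpm, me_probit, me_logit)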
6.2 Comparing LPM and Probit Estimators

The probit and LPM estimators should not be equal; they should be different. One is linear, one is not, yet they may both be consistent. Note that the probit functional form has to be between zero and one by definition, and the fitted LPM probability x'β̂_LPM has to be, and should be, between zero and one, either by assumption or by adjustment, so that 0 < x'β̂_LPM < 1.

If we know that the estimators should be different, we can find the ratio between β̂_LPM and β̂_P by linearizing the Φ:

Y_i = X_i'β̂_LPM + u_i
Y_i = F(X_i'β̂_P) + u_i
X_i'β̂_LPM = F(X_i'β̂_P)

Yet we know from each model's marginal effects that (when σ = 1):

∂E(y_i|x_i)/∂x = β̂_LPM
∂E(y_i|x_i)/∂x = φ(x_i'β)β̂_p

φ(X_i'β)β̂_p = β̂_LPM
β̂_p = β̂_LPM/φ(x_i'β)

and since φ(x_i'β) is between zero and one, we can see that β̂_p should be bigger than β̂_LPM. Or, equivalently, we can say that β̂_LPM should be smaller than β̂_p. In particular, the ratio should be roughly

β̂_LPM/β̂_p ≈ 0.25/0.625 = 0.4.

If you get a ratio different from this, say β̂_p = 1 and β̂_LPM = 0.7, then either the LPM estimator is not consistent, or the normality assumption on the error is incorrect. To see if the latter is the case, we can test for normality of the probit model's errors. If the errors are not normal, β̂_p is not consistent and it is useless. But this does not mean that β̂_LPM is consistent: the errors may be non-normal AND β̂_LPM may be inconsistent. If the ratio is preserved, we can take it as a sign that the model is good.
6.3 Comparing LPM, Probit and Logit Estimators

We have:

Linear: Y_i = X_i'β + u_i
Probit: Y_i = Φ(X_i'β) + u_i
Logit:  Y_i = Λ(X_i'β) + u_i

If all three estimators are consistent, then the marginal effect, or change in E(Y_i), should be the same. The marginal effects for each estimator are:

LPM:    β̂_LPM = ∂E(Y_i|X)/∂X
Probit: φ(X_i'β)β̂_P
Logit:  ∂Λ(X_i'β)/∂X

If the marginal effect of each estimator is the same, then we have:

β̂_LPM = φ(X_i'β)β̂_P = ∂Λ(X_i'β)/∂X

Yet we know that the derivative of Λ(X'β) with respect to X'β (or, equivalently, u) is just the PDF of the logistic. We know that the CDF of the logit is Λ(u) = e^u/(1 + e^u). So to find the PDF of the logit, we take the derivative of Λ(u):

λ(u) = ∂Λ(u)/∂u
     = ∂[e^u/(1 + e^u)]/∂u
     = [e^u(1 + e^u) - e^u e^u]/(1 + e^u)²
     = [e^u + (e^u)² - (e^u)²]/(1 + e^u)²
     = (e^u/(1 + e^u))(1/(1 + e^u))
     = Λ(u)(1 - Λ(u))

So, in words, the PDF of the logistic equals the CDF times one minus the CDF.
Thus the marginal effects for each model are:

LPM:    ∂Pr(y_i = 1|x_i; β)/∂x_ij = β_j
Probit: ∂Pr(y_i = 1|x_i; β)/∂x_ij = φ(x_i'β)β_j
Logit:  ∂Pr(y_i = 1|x_i; β)/∂x_ij = β̂_L Λ(x'β)(1 - Λ(x'β)) = β̂_L (e^u/(1 + e^u))(1 - e^u/(1 + e^u))

If all three models are consistently estimated, β̂_logit will be the largest. The ratios should be:

LPM: 0.7    Probit: 1    Logit: 1.4

If the ratios are not preserved (if, for example, you have LPM: 0.7, Probit: 1, Logit: 1/0.0625), then the LPM is not consistent; the probit and logit are consistent, and we can use the probit, which is more efficient because the tails of the normal are smaller. If all of the ratios fail to hold, then we do not know which estimator to trust; remain with the logistic.
6.4 Empirical Example: childcare take-up estimates

(y = 1 if the woman uses paid childcare, y = 0 otherwise)
Parameter Estimates
Variable LPM Probit Logit
single woman -0.059 -0.184 -0.310
other children aged 5+ -0.101 -0.318 -0.540
woman works 0.152 0.430 0.713
left school at 18 0.109 0.310 0.520
attended college/university 0.160 0.458 0.757
youngest child aged 2 0.186 0.556 0.928
youngest child aged 3-4 0.309 0.882 1.458
receives maintenance 0.089 0.264 0.432
constant 0.153 -0.995 -1.645
The reference in all cases is a married woman who:

- does not work,
- left school at 16,
- has one child aged less than 2, and
- receives no maintenance.

For the reference household, all explanatory variables take a value of 0, which leads to probability estimates in each model of:

LPM:    Pr(y_i = 1|x_i) = x'β = 0.153
Probit: Pr(y_i = 1|x_i) = Φ(-0.995) = 0.161
Logit:  Pr(y_i = 1|x_i) = exp(-1.645)/[1 + exp(-1.645)] = 0.162

How, for example, does the probability change for women who attended university?

LPM:    Pr(y_i = 1|x_i) = 0.153 + 0.160 = 0.313
Probit: Pr(y_i = 1|x_i) = Φ(-0.995 + 0.458) = Φ(-0.537) = 0.296
Logit:  Pr(y_i = 1|x_i) = exp(-1.645 + 0.757)/[1 + exp(-1.645 + 0.757)] = 0.291
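These fitted probabilities can be reproduced from the table in a few lines (a sketch in Python; the coefficient values are taken from the table above):

    import numpy as np
    from scipy.stats import norm

    # Constants and the college/university coefficients from the table.
    const = {"lpm": 0.153, "probit": -0.995, "logit": -1.645}
    uni = {"lpm": 0.160, "probit": 0.458, "logit": 0.757}

    logistic = lambda z: np.exp(z) / (1 + np.exp(z))

    # Reference household (all dummies equal to zero).
    print(const["lpm"], norm.cdf(const["probit"]), logistic(const["logit"]))
    # Woman who attended college/university.
    print(const["lpm"] + uni["lpm"],
          norm.cdf(const["probit"] + uni["probit"]),
          logistic(const["logit"] + uni["logit"]))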
7 Statistical Inference in Binary Choice Models

For the LPM, estimated standard errors can be derived easily; do not forget that the LPM is heteroskedastic.

For the Probit and Logit models, the ML estimator is asymptotically normal:

sqrt(n)(β̂ - β) →a N(0, I(β)⁻¹)

Computer software for ML estimation evaluates the variance-covariance matrix V(β̂) directly. Hence, statistical inference and hypothesis testing can then be carried out using standard inferential techniques.
Measures of goodness-of-fit. In order to assess the accuracy with which a binary choice model approximates the observed data, two measures based on likelihood ratios have been proposed, attributed to Cragg and Uhler (1970) and to McFadden (1974).

Let L_U represent the likelihood for the full unrestricted model.
Let L_R represent the likelihood for a restricted model estimated on an intercept alone.

Then the two proposed measures are as follows:

Cragg and Uhler: pseudo R² = (L_U^{2/n} - L_R^{2/n}) / (1 - L_R^{2/n})

McFadden: pseudo R² = 1 - ln L_U / ln L_R

An alternative outcome-based measure of performance or fit in binary choice models evaluates the proportion of correct predictions. Let

P̂_i = Pr(y_i = 1|x_i) = Φ(x_i'β̂) for a Probit model

and use the following rule to predict states for each observation:

ŷ_i = 1(P̂_i > 0.5)

The proportion of correct predictions may then be defined as:

P = (1/n) Σ_{i=1}^{n} 1(ŷ_i = y_i)
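As a sketch (Python; loglik_u and loglik_r stand for the fitted unrestricted and intercept-only log-likelihoods, and p_hat and y for fitted probabilities and outcomes, all hypothetical inputs):

    import numpy as np

    def fit_measures(loglik_u, loglik_r, n, p_hat, y):
        # McFadden: 1 - ln L_U / ln L_R
        mcfadden = 1.0 - loglik_u / loglik_r
        # Cragg-Uhler, using the likelihoods raised to the power 2/n
        lu, lr = np.exp(2 * loglik_u / n), np.exp(2 * loglik_r / n)
        cragg_uhler = (lu - lr) / (1.0 - lr)
        # Proportion of correct predictions under the 0.5 rule
        hit_rate = np.mean((p_hat > 0.5).astype(int) == y)
        return mcfadden, cragg_uhler, hit_rate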
Testing the overall significance of the regression.

Let L_U represent the likelihood for the full model.
Let L_R represent the likelihood for a restricted model.
Let r be the number of restrictions imposed.

To test the joint significance of the slope parameters in the ML model of binary choice, use the following statistic:

-2 ln(L_R/L_U) = 2(ln L_U - ln L_R) ~ χ²_r

For example,

H_0: β_2 = β_3 = ... = β_k = 0
H_1: at least one β_j ≠ 0, j = 2, ..., k

The technique can be generalized to test for a subset of restrictions.
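A sketch of the test (Python/SciPy; the two log-likelihood values and r are placeholders):

    from scipy.stats import chi2

    loglik_u, loglik_r, r = -420.5, -455.2, 8   # hypothetical values, r = k - 1 restrictions
    lr_stat = 2 * (loglik_u - loglik_r)         # -2 ln(L_R / L_U)
    p_value = chi2.sf(lr_stat, df=r)            # compare against the chi-squared(r) tail
    print(lr_stat, p_value)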
8 Using Binary Choice Models

8.1 Simulating transitions using binary choice models

Empirical binary choice models can be used to simulate individual responses to exogenous shocks. Consider a model of labour market participation which includes the level of non-means-tested benefit income among the exogenous variables describing the decision to work, and suppose that a policy option is being considered which would abolish the non-means-tested benefits. How might such a policy impact on labour market decisions?

Let

y*_i = x_i'β + u_i,

and consider now an exogenous shock on the exogenous variables x_i. Thus, x_i becomes x_i^N and y*_i becomes

y_i^N = (x_i^N)'β + u_i.

How does this exogenous shock impact on the probability P_i(jk) of transition from any state j to k (j, k = 0, 1)?

A correct way to find transition probabilities is found in Duncan and Weeks (1970). They represent the transition probabilities as

P_i(00) = min(p^B_i0, p^N_i0)
P_i(01) = 1(p^B_i0 > p^N_i0)(p^B_i0 - p^N_i0)
P_i(10) = 1(p^B_i0 < p^N_i0)(p^N_i0 - p^B_i0)
P_i(11) = min(p^B_i1, p^N_i1)

where p^B and p^N denote the state probabilities under the base and new values of the exogenous variables. In other words, one needs only to difference the two state probabilities for a correct measure of the probability of transition.
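A sketch of these formulas (Python; p1_base and p1_new are hypothetical state-1 probabilities before and after the shock, with the state-0 probabilities as their complements):

    import numpy as np

    def transition_probs(p1_base, p1_new):
        # State-0 probabilities are the complements of the state-1 probabilities.
        p0_base, p0_new = 1 - p1_base, 1 - p1_new
        return {
            "P(0->0)": np.minimum(p0_base, p0_new),
            "P(0->1)": (p0_base > p0_new) * (p0_base - p0_new),
            "P(1->0)": (p0_base < p0_new) * (p0_new - p0_base),
            "P(1->1)": np.minimum(p1_base, p1_new),
        }

    print(transition_probs(np.array([0.3, 0.6]), np.array([0.5, 0.4])))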
8.2 Exogenous & Dummy Regressors Case

Data Generating Process for the Probit Estimator. Using the same data generating process, the experimental design consists of the following model:

Y*_it = α + βX_it + γD_it + ε_it,

where

X_it ~ N(0, 1)
ε_it ~ N(0, 1)
D_it = 1[(X_it + η_it) > 0], η_it ~ N(0, 1)
Y_it = 1[Y*_it > 0].

Do the estimation of the model by maximizing the unconditional log-likelihood

L_U = Σ_t { y_it ln Φ(α + βx_it + γD_it) + (1 - y_it) ln Φ̄(α + βx_it + γD_it) },

where Φ̄(α + βx_it + γD_it) = 1 - Φ(α + βx_it + γD_it). An estimate for θ = {α, β, γ} is obtained as

θ̂_LU = arg max_θ { Σ_i L_U / N }.
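A simulation sketch of this experimental design (Python; N, T and the true parameter values are assumptions chosen for the example):

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    N, T = 500, 10
    alpha, beta, gamma = 0.2, 0.8, -0.5          # assumed true values

    x = rng.normal(size=(N, T))
    eta = rng.normal(size=(N, T))
    d = ((x + eta) > 0).astype(float)            # dummy regressor correlated with x
    eps = rng.normal(size=(N, T))
    y = ((alpha + beta * x + gamma * d + eps) > 0).astype(float)

    def neg_loglik(theta):
        a, b, g = theta
        p = np.clip(norm.cdf(a + b * x + g * d), 1e-12, 1 - 1e-12)
        return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
    print(res.x)   # should be near (alpha, beta, gamma)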