Econometrics
Måns Söderbom
2 May 2011
Ordered response models (e.g. modelling the rating of corporate payment default risk, which takes ordered values)
Multinomial response models (e.g. modelling occupational status: self-employed, wage-employed or unemployed)
Corner solution models and censored regression models (e.g. modelling household health expenditure: the dependent variable is non-negative, continuous above zero and has a lot of observations at zero)
These models are designed for situations in which the dependent variable is not strictly continuous and not binary. They can be viewed as extensions of the nonlinear binary choice models studied in the previous lecture.
References:
Greene 23.10 (ordered response); 23.11 (multinomial response); 24.2-3 (truncation & censored data).
You might also find the following sections in Wooldridge (2002) "Cross Section and Panel Data" useful: 15.9-15.10; 16.1-5; 16.6.3-4; 16.7; 17.3 (these references in Wooldridge are optional).
2. Ordered response models

What's the meaning of ordered response? Consider credit rating on a scale from zero to six, for instance, and suppose this is the variable that we want to model (i.e. this is the dependent variable). Clearly, this is a variable that has ordinal meaning: six is better than five, which is better than four etc.
The standard way of modelling ordered response variables is by means of ordered probit or ordered logit. These two models are very similar. I will discuss the ordered probit, but everything below carries over to the logit if we replace the normal CDF Φ(.) by the logistic CDF Λ(.).
Can you think of reasons why OLS may not be suitable for modelling an ordered response variable?
Let y be an ordered response taking on the values {0, 1, 2, ..., J}. We derive the ordered probit from a latent variable model:

y* = β1 x1 + ... + βk xk + ε
   = x'β + ε,   (2.1)

where ε is a normally distributed variable with the variance normalized to one. Notice that this model does not contain an intercept; the cut-off points introduced below play that role.
We do not observe the latent variable, but we do observe choices according to the following:

y = 0 if y* ≤ α1
y = 1 if α1 < y* ≤ α2
y = 2 if α2 < y* ≤ α3
(...)
y = J if αJ < y*.
Think of the cut-off points αj as intercept shifters. This is how Stata specifies the model. Greene does it slightly differently, including an intercept and normalizing the first cut-off point to zero.
Suppose y can take three values: 0, 1 or 2. We then have

y = 0 if x'β + ε ≤ α1
y = 1 if α1 < x'β + ε ≤ α2
y = 2 if α2 < x'β + ε.
We can now define the probabilities of observing y = 0, 1, 2. For the smallest and the largest value, the resulting expressions are very similar to what we have seen for the binary probit:

Pr(y = 0|x) = Pr(ε ≤ α1 − x'β)
            = Φ(α1 − x'β)
            = 1 − Φ(x'β − α1),

Pr(y = 2|x) = Pr(ε > α2 − x'β)
            = 1 − Φ(α2 − x'β)
            = Φ(x'β − α2).

The intermediate probability is obtained residually:

Pr(y = 1|x) = 1 − Pr(y = 0|x) − Pr(y = 2|x)
            = [1 − Φ(α1 − x'β)] − Φ(x'β − α2)
            = 1 − (1 − Φ(x'β − α1)) − Φ(x'β − α2)
            = Φ(x'β − α1) − Φ(x'β − α2),
or equivalently

Pr(y = 1|x) = Φ(α2 − x'β) − Φ(α1 − x'β)

(remember: Φ(a) = 1 − Φ(−a), because the normal distribution is symmetric - keep this in mind when studying ordered probits or you might get lost in the algebra). In the general case where there are several intermediate categories, all the associated probabilities will be of this form; see Greene, p.832. Notice that the probabilities sum to one by construction.
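To make these formulas concrete, here is a minimal Python sketch. The index value x'β and the cut-off points are made-up numbers for illustration, not estimates from any dataset:

```python
import numpy as np
from scipy.stats import norm

# Illustrative (made-up) values: a single index x'beta and two cut-off points.
xb = 0.4             # x'beta for one observation
a1, a2 = -0.5, 0.8   # cut-offs alpha_1 < alpha_2

p0 = norm.cdf(a1 - xb)                      # Pr(y = 0 | x)
p1 = norm.cdf(a2 - xb) - norm.cdf(a1 - xb)  # Pr(y = 1 | x)
p2 = 1 - norm.cdf(a2 - xb)                  # Pr(y = 2 | x)

print(p0, p1, p2, p0 + p1 + p2)  # the three probabilities sum to one
```

Replacing `norm.cdf` by the logistic CDF gives the ordered logit version of the same calculation.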
2.2. Interpretation
When discussing binary choice models we paid a lot of attention to marginal effects - i.e. the partial effects of a small change in an explanatory variable xj on the probability that we have a positive outcome. For ordered models, we can clearly compute marginal effects on the predicted probabilities along the same principles. It is not obvious (to me, anyway) that this is the most useful way of interpreting the results, however. Let's have a look at the marginal effects and then discuss.
When discussing marginal effects for binary choice models, we focussed on the effects on the probability that y (the binary dependent variable) is equal to one. We ignored discussing effects on the probability that y is equal to zero, as these will always be equal to minus one times the partial effect on the probability that y is equal to one.
Since we now have more than two outcomes, interpretation of partial effects on probabilities becomes somewhat more awkward. Sticking to the example in which we have three possible outcomes, we obtain:
∂Pr(y = 2|x)/∂xk = φ(x'β − α2) βk

for the highest category (note: analogous to the expression for binary probit). Moreover,

∂Pr(y = 1|x)/∂xk = [φ(x'β − α1) − φ(x'β − α2)] βk

for the intermediate category, and

∂Pr(y = 0|x)/∂xk = −φ(x'β − α1) βk
for the lowest category, assuming that xk is a continuous variable entering the index model linearly (if xk is discrete - typically binary - you just compute the discrete change in the predicted probabilities associated with the discrete change in xk). Note the following:

The partial effect of xk on the predicted probability of the highest outcome has the same sign as βk.

The partial effect of xk on the predicted probability of the lowest outcome has the opposite sign to βk.

The sign of the partial effect of xk on predicted probabilities of intermediate outcomes cannot, in general, be inferred from the sign of βk. This is because there are two offsetting effects - suppose βk > 0, then the intermediate category becomes more likely if you increase xk because the probability of the lowest category falls, but it also becomes less likely because the probability of the highest category increases (illustrate this in a graph). Typically, partial effects for intermediate probabilities are quantitatively small and often statistically insignificant. Don't let this confuse you!
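As a check on the three marginal-effect formulas above, the following sketch (with hypothetical parameter values) compares them with numerical derivatives of the predicted probabilities:

```python
import numpy as np
from scipy.stats import norm

b_k = 0.7            # hypothetical coefficient on x_k
a1, a2 = -0.5, 0.8   # hypothetical cut-offs
xb = 0.4             # index x'beta at the evaluation point

# Analytic partial effects from the text
me2 = norm.pdf(xb - a2) * b_k                        # highest category
me1 = (norm.pdf(xb - a1) - norm.pdf(xb - a2)) * b_k  # intermediate category
me0 = -norm.pdf(xb - a1) * b_k                       # lowest category

# Numerical check: perturb the index by b_k times a small step in x_k
def probs(z):
    return np.array([norm.cdf(a1 - z),
                     norm.cdf(a2 - z) - norm.cdf(a1 - z),
                     1 - norm.cdf(a2 - z)])

h = 1e-6
num = (probs(xb + b_k * h) - probs(xb)) / h
print(num, [me0, me1, me2])  # the two sets of numbers agree
```

Note that the three marginal effects sum to zero, as they must, since the probabilities sum to one.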
Discussion - how best to interpret results from an ordered probit (or logit)?
Clearly one option here is to look at the estimated β-parameters, emphasizing the underlying latent variable equation with which we started. Note that we don't identify the standard deviation of ε separately. Note also that consistent estimation of the β-parameters requires the model to be correctly specified - e.g. homoskedasticity and normality need to hold, if we are using ordered probit. Such assumptions are testable using, for example, the methods introduced for binary choice models. You don't often see this done in applied work however.

Another option might be to look at the effect on the expected value of the ordered response variable, e.g.

E(y|x) = 0 · Pr(y = 0|x) + 1 · Pr(y = 1|x) + 2 · Pr(y = 2|x)

in our example with three possible outcomes. This may make a lot of sense if y is a numerical variable - basically, if you are prepared to compute mean values of y and interpret them. For example, suppose you've done a survey measuring consumer satisfaction where 1="very unhappy", 2="somewhat unhappy", 3="neither happy nor unhappy", 4="somewhat happy", and 5="very happy", then most people would be prepared to look at the sample mean even though strictly the underlying variable is qualitative, thinking that 3.5 (for example) means something (consumers are on average a little bit happy?). In such a case you could look at partial effects on the conditional mean.

Alternatively, you might want to investigate the effect on the probability of observing categories j, j+1, ..., J combined - the effect on the probability that a consumer is "somewhat happy" or "very happy", for example.

Thus, it all boils down to presentation and interpretation here, and exactly what your quantity of interest is depends on the context. We can use the Stata command 'mfx compute' to obtain estimates of the partial effects on the predicted probabilities, but for more elaborate partial effects we may need to do some programming of our own.
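The conditional-mean and cumulative-probability quantities just discussed can be computed directly from the predicted probabilities. A small sketch with made-up cut-offs and index value:

```python
import numpy as np
from scipy.stats import norm

a1, a2 = -0.5, 0.8   # hypothetical cut-offs
xb = 0.4             # index x'beta

p = np.array([norm.cdf(a1 - xb),
              norm.cdf(a2 - xb) - norm.cdf(a1 - xb),
              1 - norm.cdf(a2 - xb)])

# E(y|x) = 0*Pr(y=0|x) + 1*Pr(y=1|x) + 2*Pr(y=2|x)
Ey = np.dot([0, 1, 2], p)

# Pr(y >= 1 | x): the probability of being in the upper categories combined
p_upper = p[1] + p[2]
print(Ey, p_upper)
```

Both quantities are simple functions of x'β, so their partial effects with respect to xj can be obtained by differentiating (or perturbing) the index, exactly as for the individual category probabilities.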
EXAMPLE: Incidence of corruption in Kenyan firms. Section 1 in the appendix.
3. Multinomial response models

Suppose now the dependent variable is such that more than two outcomes are possible, where the outcomes cannot be ordered in any natural way. For example, suppose we are modelling occupational status based on household data, where the possible outcomes are self-employed (SE), wage-employed (WE) or unemployed (UE). Alternatively, suppose we are modelling the transportation mode for commuting to work.
Binary probit and logit models are ill suited for modelling data of this kind. Of course, in principle we
could bunch two or more categories and so construct a binary outcome variable from the raw data (e.g.
if we don’t care if employed individuals are self-employed or wage-employees, we may decide to construct
a binary variable indicating whether someone is unemployed or employed). But in doing so, we throw
away potentially interesting information. And OLS is obviously not a good model in this context.
However, the logit model for binary choice can be extended to model more than two outcomes.
Suppose there are J possible outcomes in the data. The dependent variable y can then take J values, e.g.
0,1,...,J-1. So if we are modelling, say, occupational status, and this is either SE, WE or UE, we have
J = 3. There is no natural ordering of these outcomes, and so what number goes with what category is
arbitrary (but, as we shall see, it matters for the interpretation of the results). Suppose we decide on the
following:
y = 0 if individual is UE,
y = 1 if individual is WE,
y = 2 if individual is SE.
We write the conditional probability that an individual belongs to category j = 0, 1, 2 as

Pr(yi = j|xi),

where these probabilities must lie between zero and one and sum to one across the categories. One way of imposing these restrictions is to write the probabilities in logit form:

Pr(yi = 1|xi) = exp(xi'β1) / [1 + exp(xi'β1) + exp(xi'β2)],

Pr(yi = 2|xi) = exp(xi'β2) / [1 + exp(xi'β1) + exp(xi'β2)],

Pr(yi = 0|xi) = 1 / [1 + exp(xi'β1) + exp(xi'β2)].
The main difference compared to what we have seen so far is that there are now two parameter vectors, β1 and β2 (in the general case with J possible responses, there are J − 1 parameter vectors). This makes interpretation of the coefficients more difficult than for binary choice models.

The easiest case to think about is where β1k and β2k have the same sign. If β1k and β2k are positive (negative) then it is clear that an increase in the variable xk makes it less (more) likely that the individual belongs to the base category 0.

But what about the effects on Pr(yi = 1|xi) and Pr(yi = 2|xi)? This is much trickier than what we are used to. We know that, for sure, the sum of Pr(yi = 1|xi) and Pr(yi = 2|xi) will increase, but how this total increase is allocated between these two probabilities is not obvious. To find out,
we need to look at the marginal effects. We have

∂Pr(yi = 1|xi)/∂xik = β1k exp(xi'β1) [1 + exp(xi'β1) + exp(xi'β2)]^(−1)
    − exp(xi'β1) [1 + exp(xi'β1) + exp(xi'β2)]^(−2) [β1k exp(xi'β1) + β2k exp(xi'β2)],

∂Pr(yi = 1|xi)/∂xik = β1k Pr(yi = 1|xi)
    − Pr(yi = 1|xi) [1 + exp(xi'β1) + exp(xi'β2)]^(−1) [β1k exp(xi'β1) + β2k exp(xi'β2)],

or

∂Pr(yi = 1|xi)/∂xik = Pr(yi = 1|xi) { β1k − [β1k exp(xi'β1) + β2k exp(xi'β2)] / [1 + exp(xi'β1) + exp(xi'β2)] }.   (3.1)

Similarly, for j = 2:

∂Pr(yi = 2|xi)/∂xik = Pr(yi = 2|xi) { β2k − [β1k exp(xi'β1) + β2k exp(xi'β2)] / [1 + exp(xi'β1) + exp(xi'β2)] }.
Of course it's virtually impossible to remember, or indeed interpret, these expressions. The point is that whether the probability that y falls into, say, category 1 rises or falls as a result of varying xik depends not only on the parameter estimate β1k, but also on β2k. As you can see from (3.1), the marginal effect ∂Pr(yi = 1|xi)/∂xik may in fact be negative even if β1k is positive, and vice versa. Why might that happen?
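Formula (3.1) can be verified numerically. The sketch below uses hypothetical values for the two indices and coefficients (J = 3, with category 0 as the base):

```python
import numpy as np

# Hypothetical two-index example: x_i'beta_1, x_i'beta_2 and the
# coefficients on x_k in the two equations (made-up numbers).
xb1, xb2 = 0.5, -0.2
b1k, b2k = 0.8, -0.3

den = 1 + np.exp(xb1) + np.exp(xb2)
p1 = np.exp(xb1) / den
p2 = np.exp(xb2) / den

# Marginal effect (3.1): p1 times (b1k minus a probability-weighted average)
wavg = (b1k * np.exp(xb1) + b2k * np.exp(xb2)) / den
me1 = p1 * (b1k - wavg)

# Numerical check: move both indices by their coefficients times a small step
h = 1e-6
den_h = 1 + np.exp(xb1 + b1k * h) + np.exp(xb2 + b2k * h)
num = (np.exp(xb1 + b1k * h) / den_h - p1) / h
print(me1, num)  # the analytic and numerical derivatives agree
```

Experimenting with the signs of `b1k` and `b2k` shows how the marginal effect on category 1 can differ in sign from β1k.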
EXAMPLE: Appendix, Section 2. Occupational outcomes amongst Kenyan manufacturing workers.
The multinomial logit is very convenient for modelling an unordered discrete variable that can take on more than two values. One important limitation of the model is that the ratio of any two probabilities j and m depends only on the parameter vectors βj and βm, and the explanatory variables xi. For instance, in our three-outcome example,

Pr(yi = 1|xi) / Pr(yi = 2|xi) = exp(xi'(β1 − β2)).

It follows that the inclusion or exclusion of other categories must be irrelevant to the ratio of the two probabilities - the assumption of independence of irrelevant alternatives (IIA).
Example: Individuals can commute to work by three transportation means: blue bus, red bus, or train. Individuals choose one of these alternatives, and the econometrician estimates a multinomial logit on the resulting data. Suppose the bus company were to remove the blue bus from the set of options, so that individuals can choose only between red bus and train. If the econometrician were to estimate the multinomial logit on data generated under this regime, do you think the above probability ratio would be the same as before? If not, this suggests the multinomial logit modelling the choice between blue bus, red bus and train is mis-specified: the presence of a blue bus alternative is not irrelevant for the above probability ratio, and so the IIA assumption fails.
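The IIA property is easy to see by direct computation: under the multinomial logit, dropping one alternative and renormalizing leaves the ratio of the remaining probabilities unchanged. A sketch with made-up indices for the three commuting modes:

```python
import numpy as np

# Hypothetical indices x'beta for each alternative (made-up numbers)
xb = {"train": 0.4, "red_bus": 0.0, "blue_bus": 0.0}

def mlogit_probs(indices):
    """Multinomial logit probabilities for a given set of alternatives."""
    e = {k: np.exp(v) for k, v in indices.items()}
    s = sum(e.values())
    return {k: v / s for k, v in e.items()}

p_full = mlogit_probs(xb)
ratio_full = p_full["train"] / p_full["red_bus"]

# Remove blue bus from the choice set and recompute
p_small = mlogit_probs({k: v for k, v in xb.items() if k != "blue_bus"})
ratio_small = p_small["train"] / p_small["red_bus"]
print(ratio_full, ratio_small)  # identical by construction
```

The model forces this ratio to stay fixed, even though behaviorally we would expect former blue bus riders to switch mostly to the red bus (a close substitute) rather than to the train.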
Some authors (e.g. Greene; Stata manuals) claim we can test the IIA assumption for the multinomial logit by means of a Hausman test, along the following lines:
1. Estimate the full model. For example, with red bus, train and blue bus being the possible outcomes, and with red bus defined as the benchmark category. Retain the coefficient estimates.

2. Omit one category and re-estimate the model - e.g. exclude blue bus, and model the binary decision between red bus and train. Retain these coefficient estimates too.

3. Compare the coefficients from (1) and (2) above using the usual Hausman formula. Under the null that IIA holds, the coefficients should not be significantly different from each other.

I am not convinced this procedure is very informative, for the following reasons.
First, you don't really have data generated in the alternative regime (with blue bus not being an option), and so how can you hope to shed light on the behavioral effect of removing blue bus from the choice set?

Second, the estimates being compared may be mechanically similar because they don't depend on blue bus outcomes. So if you estimate a multinomial logit with only a constant included in the specification, the estimated constant in the train specification (with red bus as the reference outcome) will not change if you omit blue bus outcomes when estimating (i.e. step (2) above). Conceptually, a similar issue will hold if you have explanatory variables in the model, at least if you have a flexible functional form in your xi'β indices (e.g. mutually exclusive dummy variables).
Third, from what I have seen the Hausman test for the IIA does not work well in practice (not very surprising).

While testing for IIA in the context of a multinomial logit appears problematic, it may make more sense in a different setting. For example, it will work fine for conditional logit models, i.e. models where choices are made based on observable attributes of each alternative (e.g. ticket prices for blue bus, red bus and train may vary). So, what I have said above applies specifically to the multinomial logit.
Note that there are lots of other econometric models that can be used for multinomial response data - notably multinomial probit, conditional logit, nested logit etc. These will not be discussed here.
EXAMPLE: Hausman test for IIA based on multinomial logit gives you nonsense - appendix, Section 3.
4. Corner solution models and censored regression models

We now consider econometric issues that arise when the dependent variable is bounded but continuous within the bounds. We focus first on corner solution models, and then turn to the censored regression model. Suppose the dependent variable satisfies

lo ≤ yi ≤ hi,

where lo denotes the lower bound (limit) and hi the higher bound, and where these bounds are the result of the economic problem facing the individual rather than of data censoring.

By far the most common case is lo = 0 and hi = ∞, i.e. there is a lower limit at zero and no upper limit. The dependent variable takes the value zero for a nontrivial fraction of the population, and is roughly continuously distributed over positive values. You will often find this in micro data, e.g. household expenditure on health or education.
You can thus think of this type of variable as a hybrid between a continuous variable (for which the linear model is appropriate) and a binary variable (for which one would typically use a binary choice model). Indeed, as we shall see, the econometric model designed to model corner solution variables looks like a hybrid between OLS and the probit model. In what follows we focus on the case with a corner at zero.
Let y be a variable that is equal to zero for some non-zero proportion of the population, and that is continuous and positive if it is not equal to zero. As usual, we want to model y as a function of a set of explanatory variables

x = (1, x1, x2, ..., xk)'.
4.1. OLS

We have seen how for binary choice models OLS can be a useful starting point (yielding the linear probability model), even though the dependent variable is not continuous. We now have a variable which is 'closer' to being a continuous variable - it's discrete only in the sense that it is either at the corner (equal to zero) or not. Suppose we specify a linear model:

y = x'β + u.
We've seen that there are a number of reasons why we may not prefer to estimate binary choice models using OLS. For similar reasons OLS may not be an ideal estimator for corner response models:

Based on OLS estimates we can get negative predictions, which doesn't make sense since the dependent variable is non-negative (if we are modelling household expenditure on education, for example, a negative prediction is clearly nonsensical).

Conceptually, the idea that a corner solution variable is linearly related to a continuous independent variable for all possible values is a bit suspect. It seems more likely that for observations close to the corner (close to zero), changes in some continuous explanatory variable (say x1) have a smaller effect on the outcome than for observations far away from the corner. So if we are interested in such nonlinear effects, the linear model is not attractive.

A third (and less serious) problem is that the residual u is likely to be heteroskedastic - but we can deal with this by using robust standard errors.

A fourth and related problem is that, because the distribution of y has a 'spike' at zero, the residual cannot be normally distributed. This means that OLS point estimates are unbiased, but inference in small samples cannot be based on the usual suite of normality-based distributions such as the t test.
So you see all of this is very similar to the problems identified with the linear probability model.
4.2. Tobit

To fix these problems we follow a similar path as for binary choice models. We start, however, from the latent variable model

y* = x'β + u,   (4.1)

where the residual u is assumed normally distributed with a constant variance σu², and uncorrelated with x. The observed variable y is related to the latent variable as follows:

y = y* if y* > 0,
y = 0 if y* ≤ 0,   (4.2)

or, more compactly,

y = max(y*, 0).
First, y* satisfies the classical linear model assumptions, so had y* been observed the obvious choice of estimator would have been OLS.

Second, it is often helpful to think of y as a variable that is bounded below for economic reasons, and y* as a variable that reflects the 'desired' value if there were no constraints. Actual household expenditure on health is one example - this is bounded below at zero. In such a case y* could be interpreted as desired expenditure, in which case y* < 0 would reflect a desire to sell off one's personal (or family's) health. This may not be as far-fetched as it sounds - if you're very healthy and very poor, for instance, perhaps you wouldn't mind feeling a little less healthy if you got paid for it (getting paid here, of course, would be the same as having negative health expenditure).
We said above that a corner solution variable is a kind of hybrid: both discrete and continuous. The discrete part is due to the piling up of observations at zero. The probability that y is equal to zero can be written

Pr(y = 0|x) = Pr(y* ≤ 0)
            = Pr(x'β + u ≤ 0)
            = Pr(u ≤ −x'β)
            = Pr(u/σu ≤ −x'β/σu)
            = Φ(−x'β/σu) (integrate; normal distribution)
Pr(y = 0|x) = 1 − Φ(x'β/σu) (by symmetry),

while for positive values of y the model is the linear regression y = x'β + u, so the density of y given x is the normal density with mean x'β and standard deviation σu. Combining the two parts, the density of the observed y is

f(y|x; β, σu) = [1 − Φ(x'β/σu)]^1[y(i)=0] [(1/σu) φ((yi − xi'β)/σu)]^1[y(i)>0],

where 1[a] is a dummy variable equal to one if a is true. Thus the contribution of observation i to the sample log likelihood is

ln Li = 1[y(i)=0] ln[1 − Φ(xi'β/σu)] + 1[y(i)>0] ln[(1/σu) φ((yi − xi'β)/σu)],

and the tobit estimator maximizes the sum of ln Li over the sample.
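The log-likelihood contribution translates directly into code. The following is an illustrative Python sketch on simulated data - a bare-bones version of what tobit routines compute internally, not the author's own implementation:

```python
import numpy as np
from scipy.stats import norm

def tobit_loglik(y, X, beta, sigma):
    """Log likelihood of the tobit model with a corner at zero."""
    xb = X @ beta
    zero = (y == 0)
    ll = np.where(zero,
                  norm.logcdf(-xb / sigma),                      # ln[1 - Phi(x'b/sigma)]
                  norm.logpdf((y - xb) / sigma) - np.log(sigma)) # ln[(1/sigma) phi(.)]
    return ll.sum()

# Tiny simulated dataset: y* = 0.2 + 1.0*x + u, observed y = max(y*, 0)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y_star = X @ np.array([0.2, 1.0]) + rng.normal(size=200)
y = np.maximum(y_star, 0.0)
print(tobit_loglik(y, X, np.array([0.2, 1.0]), 1.0))
```

Maximizing this function over β and σu (e.g. with `scipy.optimize.minimize` on its negative) gives the tobit maximum likelihood estimates; in Stata this is what the `tobit` command does.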
Suppose the model can be written according to the equations (4.1)-(4.2), and suppose we have obtained estimates of the parameters β and σu. How do we interpret the results?
We see straight away from the latent variable model that βj is interpretable as the partial (marginal) effect of xj on the expected value of the latent variable:

∂E(y*|x)/∂xj = βj

if xj is continuous, while

E(y*|xj = 1) − E(y*|xj = 0) = βj

if xj is a dummy variable (of course if xj enters the model nonlinearly these expressions need to be modified accordingly). I have omitted i-subscripts for simplicity. If that's what we want to know, then there is no more work to do.
Typically, however, we are interested in the partial effect of xj on the expected actual outcome y, rather than on the latent variable. Think about the health example above. We are probably primarily interested in the partial effects of xj (perhaps household size) on expected actual - rather than desired - health expenditure, e.g. ∂E(y|x)/∂xj if xj is continuous. In fact there are two different potentially interesting marginal effects:

∂E(y|x)/∂xj   (unconditional on y)

and

∂E(y|x, y > 0)/∂xj.   (conditional on y > 0)

We need to be clear on which of these we are interested in. Now let's see what these marginal effects look like.
@E (yjx; y > 0)
:
@xj
y = max (y ; 0) ;
y = max (x + u; 0)
Because of the truncation (y is always positive, or, equivalently, u is always larger than x ), dealing
with the second term is not as easy as it may seem. We begin by taking on board the following result for
18
A useful result. If z follows a normal distribution with mean zero and variance equal to one (i.e. z is standard normal), then

E(z|z > c) = φ(c) / [1 − Φ(c)],   (4.3)

where c is a constant (i.e. the lower bound here), φ denotes the standard normal probability density, and Φ the standard normal CDF.
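Result (4.3) is easy to confirm by simulation:

```python
import numpy as np
from scipy.stats import norm

c = 0.5                                       # arbitrary truncation point
z = np.random.default_rng(1).normal(size=2_000_000)
sim = z[z > c].mean()                         # simulated E(z | z > c)
exact = norm.pdf(c) / (1 - norm.cdf(c))       # phi(c) / [1 - Phi(c)]
print(sim, exact)                             # the two values are very close
```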
The residual u is not, in general, standard normal because the variance is not necessarily equal to one, but by judiciously dividing and multiplying through by its standard deviation σu we can transform u into a standard normal variable. That is, (u/σu) is standard normal, and so we can apply the above 'useful result', i.e. eq (4.3), and write:

E(u|u > −x'β) = σu φ(−x'β/σu) / [1 − Φ(−x'β/σu)],

and thus

E(y|y > 0, x) = x'β + σu φ(−x'β/σu) / [1 − Φ(−x'β/σu)].

By symmetry of the normal distribution, this simplifies to

E(y|y > 0, x) = x'β + σu φ(x'β/σu) / Φ(x'β/σu),
E(y|y > 0, x) = x'β + σu λ(x'β/σu),   (4.4)
where the function λ, known as the inverse Mills ratio, is defined as

λ(z) = φ(z) / Φ(z).
Have a look at the inverse Mills ratio function in Section 4 in the appendix, Figure 1.
Equation (4.4) shows that the expected value of y, given that y is not zero, is equal to x'β plus a strictly positive term, σu λ(x'β/σu). For a continuous variable xj we obtain

∂E(y|y > 0, x)/∂xj = βj + σu ∂λ(x'β/σu)/∂xj
                   = βj + σu λ' βj/σu
                   = βj (1 + λ'),

where λ' denotes the partial derivative of λ with respect to (x'β/σu) (note: I am assuming here that xj is continuous and not functionally related to any other variable - i.e. it enters the model linearly - this means I can use calculus, and that I don't have to worry about higher-order terms). It is tedious but straightforward to show that

λ'(z) = −λ(z) [z + λ(z)]

in general, hence

∂E(y|y > 0, x)/∂xj = βj {1 − λ(x'β/σu) [x'β/σu + λ(x'β/σu)]}.
This shows that the partial effect of xj on E(y|y > 0, x) is not determined just by βj. In fact, it depends on all the β parameters in the model as well as on the values of all explanatory variables x, and the standard deviation of the residual. The term in {·} is often referred to as the adjustment factor, and it can be shown that this is always larger than zero and smaller than one (why is this useful to know?).
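The claim that the adjustment factor lies strictly between zero and one can be checked numerically over a grid of index values (a numerical check, not a proof):

```python
import numpy as np
from scipy.stats import norm

def lam(z):
    """Inverse Mills ratio: lambda(z) = phi(z) / Phi(z)."""
    return norm.pdf(z) / norm.cdf(z)

z = np.linspace(-4, 4, 801)          # grid of x'beta / sigma_u values
adj = 1 - lam(z) * (z + lam(z))      # adjustment factor from the text
print(adj.min(), adj.max())          # strictly between 0 and 1 on the grid
```

Because the adjustment factor is below one, the conditional marginal effect is attenuated relative to βj, and because it is positive, the effect always has the same sign as βj.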
It should be clear that, just as in the case for probits and logits, we need to evaluate the marginal effects at specific values of the explanatory variables. This should come as no surprise, since one of the reasons we may prefer tobit to OLS is that we have reasons to believe the marginal effects may differ according to how close to the corner (zero) a given observation is (see above). In Stata we can use the mfx compute command to compute marginal effects without too much effort. How this is done will be clearer in a moment, but first I want to go over the second type of marginal effect that I might be interested in.
Now consider the unconditional expectation E(y|x). Using

y = max(y*, 0),
y = max(x'β + u, 0),

we have

E(y|x) = Pr(y = 0|x) · 0 + Pr(y > 0|x) · E(y|y > 0, x)
       = Φ(x'β/σu) E(y|y > 0, x),

i.e. the probability that y is positive times the expected value of y given that y is indeed positive. To differentiate this product, note that

∂Φ(x'β/σu)/∂xj = φ(x'β/σu) βj/σu,
and we know from the previous sub-section that

∂E(y|y > 0, x)/∂xj = βj {1 − λ(x'β/σu) [x'β/σu + λ(x'β/σu)]}

and

E(y|y > 0, x) = x'β + σu λ(x'β/σu).

Hence

∂E(y|x)/∂xj = Φ(x'β/σu) βj {1 − λ(x'β/σu) [x'β/σu + λ(x'β/σu)]}
            + φ(x'β/σu) (βj/σu) [x'β + σu λ(x'β/σu)],

which looks complicated, but the good news is that several of the terms cancel out, so that:

∂E(y|x)/∂xj = βj Φ(x'β/σu)
(try to prove this). This has a straightforward interpretation: the marginal effect of xj on the expected value of y, conditional on the vector x, is simply the parameter βj times the probability that y is larger than zero. Of course, this probability is smaller than one, so it follows immediately that the marginal effect is smaller in absolute value than βj.
Now consider the example in Section 4 in the appendix, on investment in plant and machinery among a sample of manufacturing firms.
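The simplification ∂E(y|x)/∂xj = βj Φ(x'β/σu) can be verified numerically; the sketch below compares it with a numerical derivative of E(y|x), using made-up parameter values:

```python
import numpy as np
from scipy.stats import norm

b_j, sigma = 0.6, 1.5   # hypothetical coefficient and residual std dev
xb = 0.8                # index x'beta at the evaluation point

def E_y(z):
    """E(y|x) = Phi(z/sigma) * [z + sigma * lambda(z/sigma)]."""
    s = z / sigma
    return norm.cdf(s) * (z + sigma * norm.pdf(s) / norm.cdf(s))

# Analytic marginal effect: beta_j * Phi(x'beta / sigma)
me = b_j * norm.cdf(xb / sigma)

# Numerical derivative: the index moves by b_j per unit change in x_j
h = 1e-6
num = (E_y(xb + b_j * h) - E_y(xb)) / h
print(me, num)  # the two derivatives agree
```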
PhD Programme: Econometrics II
Department of Economics, University of Gothenburg
Appendix: Ordered & Multinomial Outcomes. Tobit regression.
Måns Söderbom
In the following example we consider a model of corruption in the Kenyan manufacturing sector.¹ Our dataset consists of 155 firms observed in year 2000.

corrupti* = α1 ln Ki + α2 (profit/K)i + si + towni + ei,
where
corrupt = incidence of corruption in the process of getting connected to public services
K = Value of the firm's capital stock
profit = Total profit
s = sector effect (food, wood, textile; metal is the omitted base category)
town = location effect (Nairobi, Mombasa, Nakuru; Eldoret – which is the most remote town – is the
omitted base category)
e = a residual, assumed homoskedastic and normally distributed with variance normalized to one.
Incidence of corruption is not directly observed. Instead we have subjective data, collected through
interviews with the firm's management, on the prevalence of corruption. Specifically, each firm was
asked the following question:
"Do firms like yours typically need to make extra, unofficial payments to get connected to public
services (e.g. electricity, telephone etc)?"
Given the data available, it makes sense to estimate the model using either ordered probit or ordered
logit.
¹ These data were collected by a team from the CSAE in 2000 – for details on the survey and the data, see
Söderbom, Måns “Constraints and Opportunities in Kenyan Manufacturing: Report on the Kenyan
Manufacturing Enterprise Survey 2000,” 2001, CSAE Report REP/2001‐03. Oxford: Centre for the Study of
African Economies, Department of Economics, University of Oxford. Available at
http://www.economics.ox.ac.uk/CSAEadmin/reports.
Summary statistics for these variables are as follows:
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
obribe | 155 3.154839 1.852138 1 6
lk | 155 15.67499 3.197098 7.258711 22.38821
profk | 155 -.3647645 2.449862 -15.73723 11.3445
wood | 155 .2 .4012966 0 1
textile | 155 .2903226 .4553826 0 1
-------------+--------------------------------------------------------
metal | 155 .2516129 .4353465 0 1
nairobi | 155 .5096774 .5015268 0 1
mombasa | 155 .2645161 .442505 0 1
nakuru | 155 .1032258 .3052398 0 1
------------------------------------------------------------------------------
obribe1 | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
lk | -.0809392 .0307831 -2.63 0.009 -.141273 -.0206054
profk | -.0569773 .0377651 -1.51 0.131 -.1309955 .0170409
wood | -.543739 .2698032 -2.02 0.044 -1.072543 -.0149345
textile | .1068028 .2405553 0.44 0.657 -.3646768 .5782825
metal | -.3959804 .251102 -1.58 0.115 -.8881313 .0961706
nairobi | .0740607 .2836262 0.26 0.794 -.4818364 .6299578
mombasa | -.1443718 .3005436 -0.48 0.631 -.7334265 .4446829
nakuru | -.0242636 .3644382 -0.07 0.947 -.7385494 .6900222
-------------+----------------------------------------------------------------
_cut1 | -2.065609 .5583871 (Ancillary parameters)
_cut2 | -1.539941 .5510676
_cut3 | -1.309679 .5479021
_cut4 | -.665663 .543653
_cut5 | -.5036779 .5442082
------------------------------------------------------------------------------
Marginal effects:
. mfx compute, predict(outcome(1));
Note: The signs of the marginal effects referring to the highest outcome are the
same as the sign of the estimated parameter beta(j), and the signs of the marginal
effects referring to the lowest outcome are the opposite to the sign of the
estimated parameter beta(j). For intermediate outcome categories, the signs of the
marginal effects are ambiguous and often close to zero (e.g. outcome 3 above). Why
is this?
2. Multinomial Logit
In the following example we consider a model of occupational choice within the Kenyan
manufacturing sector (see footnote 1 for a reference for the data). We have data on 950 individuals and
we want to investigate if education, gender and parental background determine occupation.
The explanatory variables are
years of education: educ
gender: male
parental background: f_prof, m_prof (father/mother professional), f_se, m_se
(father/mother self‐employed or trader)
Summary statistics for these variables are as follows:
. sum educ male f_prof f_se m_prof m_se;
A breakdown by occupation is a useful first step to see if there are any broad patterns in the data:
. tabstat educ male f_prof f_se m_prof m_se, by(job);
Results:
. mlogit job educ male f_prof f_se m_prof m_se;
------------------------------------------------------------------------------
job | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Manag |
educ | .738846 .0755869 9.77 0.000 .5906984 .8869935
male | .0277387 .3383262 0.08 0.935 -.6353685 .690846
f_prof | 1.135737 .3373116 3.37 0.001 .4746187 1.796856
f_se | .1189543 .4074929 0.29 0.770 -.679717 .9176256
m_prof | .3806786 .4661837 0.82 0.414 -.5330247 1.294382
m_se | -.6073577 .4413568 -1.38 0.169 -1.472401 .2576856
_cons | -10.25324 .9913425 -10.34 0.000 -12.19623 -8.310244
-------------+----------------------------------------------------------------
Admin |
educ | .2421636 .0333887 7.25 0.000 .1767229 .3076042
male | -.9075081 .2018354 -4.50 0.000 -1.303098 -.511918
f_prof | .5696015 .2570499 2.22 0.027 .065793 1.07341
f_se | -.0884656 .2616688 -0.34 0.735 -.601327 .4243958
m_prof | -.0135092 .3751632 -0.04 0.971 -.7488156 .7217972
m_se | -.5700617 .256966 -2.22 0.027 -1.073706 -.0664175
_cons | -2.350944 .3941898 -5.96 0.000 -3.123542 -1.578346
-------------+----------------------------------------------------------------
Support |
educ | .2805316 .0723475 3.88 0.000 .1387331 .4223302
male | -.9905816 .3642871 -2.72 0.007 -1.704571 -.276592
f_prof | .6547286 .4707312 1.39 0.164 -.2678877 1.577345
f_se | .8717071 .4237441 2.06 0.040 .0411839 1.70223
m_prof | .7996763 .5500412 1.45 0.146 -.2783846 1.877737
m_se | -.5924061 .5213599 -1.14 0.256 -1.614253 .4294405
_cons | -4.777905 .8675103 -5.51 0.000 -6.478193 -3.077616
------------------------------------------------------------------------------
(Outcome job==Prod is the comparison group)
Marginal effects
. mfx compute, predict(outcome(2)) nose;
Predicted job probabilities
1. PRIMARY
. list pp1 pp2 pp3 pp0;
+-------------------------------------------+
| pp1 pp2 pp3 pp0 |
|-------------------------------------------|
1. | .0108399 .2284537 .0304956 .7302108 |
+-------------------------------------------+
2. SECONDARY
. list sp1 sp2 sp3 sp0;
+-------------------------------------------+
| sp1 sp2 sp3 sp0 |
|-------------------------------------------|
1. | .1274372 .3683361 .0573239 .4469028 |
+-------------------------------------------+
3. UNIVERSITY
+-----------------------------------------+
| up1 up2 up3 up0 |
|-----------------------------------------|
1. | .6057396 .240109 .0435665 .110585 |
+-----------------------------------------+
/* first collapse the data: this gives a new data set consisting of one
observation containing the sample means of the variables */
/* now vary education: since 1985 the Kenyan education system has involved 8 years
for primary education, 4 years for secondary, and 4 years for university */
/* first do primary */
. replace educ = 8;
(1 real change made)
/* now do secondary */
. replace educ=12;
(1 real change made)
. predict sp1, outcome(1);
(option p assumed; predicted probability)
/* finally do university */
. replace educ=16;
(1 real change made)
3. Illustration: The Hausman test for IIA in multinomial logit is totally useless
.
. /* The results shown in the manual */
. mlogit insure male age
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Prepaid |
male | .5095747 .1977893 2.58 0.010 .1219148 .8972345
age | -.0100251 .0060181 -1.67 0.096 -.0218204 .0017702
_cons | .2633838 .2787574 0.94 0.345 -.2829708 .8097383
-------------+----------------------------------------------------------------
Uninsure |
male | .4748547 .3618446 1.31 0.189 -.2343477 1.184057
age | -.0051925 .0113821 -0.46 0.648 -.027501 .017116
_cons | -1.756843 .5309591 -3.31 0.001 -2.797504 -.7161824
------------------------------------------------------------------------------
(insure==Indemnity is the base outcome)
.
. mlogit insure male age if insure !="Uninsure":insure
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Prepaid |
male | .5144003 .1981735 2.60 0.009 .1259875 .9028132
age | -.0101521 .0060049 -1.69 0.091 -.0219214 .0016173
_cons | .2678043 .2775562 0.96 0.335 -.2761959 .8118046
------------------------------------------------------------------------------
(insure==Indemnity is the base outcome)
. hausman . allcats, alleqs constant
chi2(3) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 0.08
Prob>chi2 = 0.9944
(V_b-V_B is not positive definite)
.
. /* confirm that IIA test is nonsense in model with male dummy only */
. mlogit insure male
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Prepaid |
male | .477311 .1959282 2.44 0.015 .0932987 .8613234
_cons | -.1772065 .0968274 -1.83 0.067 -.3669847 .0125718
-------------+----------------------------------------------------------------
Uninsure |
male | .46019 .3593228 1.28 0.200 -.2440698 1.16445
_cons | -1.989585 .1884768 -10.56 0.000 -2.358993 -1.620177
------------------------------------------------------------------------------
(insure==Indemnity is the base outcome)
.
. mlogit insure male if insure !="Uninsure":insure
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Prepaid |
male | .477311 .1959283 2.44 0.015 .0932987 .8613234
_cons | -.1772065 .0968274 -1.83 0.067 -.3669847 .0125718
------------------------------------------------------------------------------
(insure==Indemnity is the base outcome)
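The Prepaid coefficients are identical whether or not the Uninsure category is included, which is exactly the IIA property of the multinomial logit: the odds between any two outcomes do not depend on what other alternatives are available. A minimal sketch (the linear indices below are hypothetical, chosen only to illustrate the property):

```python
from math import exp

def mnl_probs(utilities):
    """Multinomial logit choice probabilities: exp(u_j) / sum_k exp(u_k)."""
    expu = [exp(u) for u in utilities]
    s = sum(expu)
    return [e / s for e in expu]

# hypothetical linear indices: Indemnity (base, 0), Prepaid, Uninsure
full = mnl_probs([0.0, 0.26, -1.76])
restricted = mnl_probs([0.0, 0.26])  # Uninsure removed from the choice set

# under IIA the Prepaid/Indemnity odds are unchanged by dropping Uninsure
print(full[1] / full[0])
print(restricted[1] / restricted[0])
```

Since the odds ratio only involves the two indices being compared, removing an alternative simply rescales all probabilities, which is why the restricted model reproduces the same coefficients here.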
.
. hausman . allcats, alleqs constant
chi2(2) = (b-B)'[(V_b-V_B)^(-1)](b-B)
= 0.00
Prob>chi2 = 1.0000
(V_b-V_B is not positive definite)
.
. /* confirm that IIA test is nonsense in model with constant only */
. mlogit insure
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Prepaid |
_cons | -.0595623 .0837345 -0.71 0.477 -.2236789 .1045544
-------------+----------------------------------------------------------------
Uninsure |
_cons | -1.876917 .1600737 -11.73 0.000 -2.190656 -1.563179
------------------------------------------------------------------------------
(insure==Indemnity is the base outcome)
.
. mlogit insure if insure !="Uninsure":insure
------------------------------------------------------------------------------
insure | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
Prepaid |
_cons | -.0595623 .0837345 -0.71 0.477 -.2236789 .1045544
------------------------------------------------------------------------------
(insure==Indemnity is the base outcome)
. hausman . allcats, alleqs constant
chi2(1) = (b-B)'[(V_b-V_B)^(-1)](b-B)
                =   -0.00   chi2<0 ==> model fitted on these
                            data fails to meet the asymptotic
                            assumptions of the Hausman test;
                            see suest for a generalized test
4. Tobit
In the following example we consider a model of company investment within the Ghanaian
manufacturing sector. 2 Our dataset consists of 1,202 observations on firms over the 1991-99 period (in
fact, there is a panel dimension in the data, but we will ignore this for now).
(I/K)_it = max{0, α_0 + α_1 ln TFP_it + α_2 ln K_i,t-1 + u_it},
where
I = Gross investment in fixed capital (plant & machinery)
K = Value of the capital stock
TFP = Total factor productivity, defined as ln(output) – 0.3ln(K) – 0.7ln(L), where L is
employment
u = a residual, assumed homoskedastic and normally distributed.
There is evidence that physical capital is 'irreversible' in African manufacturing, i.e. selling off fixed
capital is difficult due to the lack of a market for second-hand capital goods (Bigsten et al., 2005). We
can thus view investment as a corner response variable: investment is bounded below at zero.
2 This is an extension of the dataset used by Söderbom and Teal (2004).
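The tobit likelihood mixes a probit-style contribution for the censored (zero) observations with a normal density for the positive ones: zeros contribute ln Φ(−x'β/σ) and positives contribute the log of the normal density of the residual. A sketch of one observation's contribution (the index and σ values are hypothetical, for illustration only):

```python
from math import erf, exp, log, pi, sqrt

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tobit_loglik(y, xb, sigma):
    """Log-likelihood contribution of one observation in a tobit
    censored at zero: ln Phi(-xb/sigma) if y == 0, else the log of
    the (scaled) normal density of the residual."""
    if y == 0.0:
        return log(Phi(-xb / sigma))
    return log(phi((y - xb) / sigma) / sigma)

# a censored and an uncensored observation, hypothetical values
print(tobit_loglik(0.0, xb=-0.1, sigma=0.25))
print(tobit_loglik(0.05, xb=-0.1, sigma=0.25))
```

Summing these contributions over the sample and maximizing over (β, σ) is what Stata's tobit command does; σ is reported as the ancillary parameter _se in the output below.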
Table 1. Summary statistics
Variable | Obs Mean Std. Dev. Min Max
-------------+--------------------------------------------------------
invrate | 1202 .0629597 .1477861 0 1
invdum | 1202 .4550749 .4981849 0 1
tfp | 1202 10.20903 1.108122 5.049412 14.7326
lk_1 | 1202 16.06473 3.104121 9.555573 23.51505
OLS estimates (dependent variable: invrate):
------------------------------------------------------------------------------
invrate | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tfp | .0114908 .0038443 2.99 0.003 .0039484 .0190331
lk_1 | .0002798 .0013724 0.20 0.838 -.0024127 .0029723
_cons | -.058845 .0440225 -1.34 0.182 -.1452148 .0275248
------------------------------------------------------------------------------
Tobit estimates (dependent variable: invrate, censored at zero):
------------------------------------------------------------------------------
invrate | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
tfp | .0344135 .0077922 4.42 0.000 .0191257 .0497012
lk_1 | .0123672 .0027384 4.52 0.000 .0069947 .0177397
_cons | -.6158372 .0913444 -6.74 0.000 -.7950496 -.4366247
-------------+----------------------------------------------------------------
_se | .2540915 .0083427 (Ancillary parameter)
------------------------------------------------------------------------------
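The predictions plotted in Figure 2 follow from the tobit formulas E(y | y>0, x) = x'β + σλ(x'β/σ), where λ(z) = φ(z)/Φ(z) is the inverse Mills ratio, and E(y | x) = Φ(x'β/σ)·E(y | y>0, x). A sketch evaluating these at the tobit estimates above, with lk_1 at its sample mean and a tfp value near its mean (Python used here only to illustrate the formulas):

```python
from math import erf, exp, pi, sqrt

def phi(z):
    """Standard normal density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def Phi(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def tobit_expectations(xb, sigma):
    """For a tobit censored at zero:
    E(y|y>0,x) = xb + sigma * lambda(xb/sigma),  lambda = phi/Phi;
    E(y|x)     = Phi(xb/sigma) * E(y|y>0,x)."""
    z = xb / sigma
    lam = phi(z) / Phi(z)            # inverse Mills ratio
    e_pos = xb + sigma * lam         # conditional on being positive
    return Phi(z) * e_pos, e_pos

# tobit estimates from the table above; tfp ~ sample mean, lk_1 at its mean
xb = -0.6158 + 0.0344 * 10.2 + 0.0124 * 16.06
uncond, cond = tobit_expectations(xb, sigma=0.2541)
print(uncond, cond)
```

Note that E(y | x) is always below E(y | y>0, x), and both exceed the linear index x'β whenever there is censoring, which is why the tobit curves in Figure 2 sit above the OLS fit for low TFP.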
Figure 2. Predicted investment rates as a function of log TFP
[Figure: investment rate (vertical axis, roughly 0.05 to 0.2) plotted against log TFP (horizontal axis, 9 to 11); three curves: estimated E(y|y>0,x) from the tobit, estimated E(y|x) from the tobit, and estimated E(y|x) from OLS.]
References
Bigsten, Arne, Paul Collier, Stefan Dercon, Marcel Fafchamps, Bernard Gauthier, Jan Willem
Gunning, Remco Oostendorp, Catherine Pattillo, Måns Söderbom, and Francis Teal (2005).
“Adjustment Costs, Irreversibility and Investment Patterns in African Manufacturing,” The B.E.
Journals in Economic Analysis & Policy: Contributions to Economic Analysis & Policy 4:1, Article
12, pp. 1-27.
Söderbom, Måns, and Francis Teal (2004). “Size and Efficiency in African Manufacturing Firms:
Evidence from Firm-Level Panel Data,” Journal of Development Economics 73, pp. 369-394.