Binary Response Models: Logits, Probits and Semiparametrics: Joel L. Horowitz and N.E. Savin

The document discusses binary response models used in econometrics and biometrics. It reviews commonly used logit and probit models, which assume a known functional form relating an outcome variable to explanatory variables. However, the functional form is often unknown in practice. Semiparametric and nonparametric models relax this assumption by leaving the functional form unspecified.

Journal of Economic Perspectives—Volume 15, Number 4 —Fall 2001—Pages 43–56

Binary Response Models: Logits, Probits and Semiparametrics

Joel L. Horowitz and N.E. Savin

A binary response model is a regression model in which the dependent
variable Y is a binary random variable that takes on only the values zero
and one. In many economic applications of this model, an agent makes a
choice between two alternatives: for example, a commuter chooses to drive a car to
work or to take public transport. Another example is the choice of a worker
between taking a job or not. Driving to work and taking a job are choices that
correspond to Y = 1, and taking public transport and not taking a job to Y = 0. The
model gives the probability that Y = 1 is chosen conditional on a set of explanatory
variables. In the transportation example, common explanatory variables include
the time and the cost of travel; in the worker example, common explanatory
variables include age, education and experience.
The econometric problem is to estimate the conditional probability that
Y = 1 considered as a function of the explanatory variables. The most commonly
used approach, notably logit and probit models, assumes that the functional form
of the dependence on the explanatory variables is known. The logit and probit
models have been used almost exclusively in econometric applications in the
leading journals.
However, the functional form is seldom known in practice. If the functional
form is misspecified, then the estimates of the coefficients and the inferences based
on them can be highly misleading. It is possible to relax the restrictive assumption
that the functional form is known by using either semiparametric or nonparametric
models. In these types of models, the functional form is unknown. The problems of
estimating semiparametric and nonparametric binary response models have
generated considerable interest in recent years. This paper traces the evolution of
estimation methods for binary response models.

Joel L. Horowitz is Professor of Economics, Northwestern University, Evanston, Illinois.
N.E. Savin is Professor of Economics, University of Iowa, Iowa City, Iowa.

Review of Linear, Logit and Probit Models

This section reviews the binary response models most commonly used in
applications. For simplicity, consider the case where the probability that Y takes on
the value zero or one is conditional on a single explanatory variable X.
Suppose that the true conditional probability of Y = 1 given X = x is
\(P(Y = 1 \mid X = x) = F(\beta_0 + x\beta_1)\). In the parametric approach to modeling, the
function F is known and the values of the parameters \(\beta_0\) and \(\beta_1\) are unknown.
The linear probability model specifies that the conditional probability is a linear
function of X: \(P(Y = 1 \mid X = x) = \beta_0 + x\beta_1\). This model says that the probability of
Y = 1 increases by \(\beta_1\) for each one unit increase in X. This implies that the
probability is negative if X is small enough and greater than one if X is large
enough. Hence, this model has the defect that the conditional probability is not
constrained to lie between zero and one.
This defect can be corrected by replacing the linear function with one with a
lower kink that keeps the conditional probability from being less than zero and an
upper kink that keeps it from being greater than one. The two-kink function is
illustrated in Figure 1; it is the cumulative distribution function of a uniform
distribution. The cumulative distribution function of a probability distribution is an
S-shaped curve that has a lower bound of zero and an upper bound of one. Thus,
if F is a cumulative distribution function, the conditional probabilities are automat-
ically constrained to lie between zero and one.
The kinks in the two-kink function are unrealistic, however. The commonly
used cumulative distribution functions are smooth S-shaped curves. Such a curve is
illustrated in Figure 1. It is the cumulative distribution function of a normal
distribution.
A binary response model is referred to as a probit model if F is the cumulative
normal distribution function. It is called a logit model if F is the cumulative logistic
distribution function. The logistic and normal distributions are both symmetrical
around zero and have very similar shapes, except that the logistic distribution has
fatter tails. As a result, the conditional probability functions are very similar for both
models, except in the extreme tails.
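The similarity of the two curves can be checked numerically. The sketch below is illustrative only: the function names are made up here, and rescaling the logistic distribution to unit variance (its standard deviation is π/√3) is an assumption introduced so that the two distribution functions are directly comparable.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Standard logistic cumulative distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


# Assumption for comparability: rescale the logistic to unit variance.
LOGISTIC_SD = math.pi / math.sqrt(3.0)


def logistic_cdf_unit_variance(z):
    return logistic_cdf(LOGISTIC_SD * z)


for z in (-4, -3, -2, -1, 0, 1, 2, 3, 4):
    print(z, normal_cdf(z), logistic_cdf_unit_variance(z))
```

Near the center the two columns stay close, while far out in the tails the logistic values are larger in relative terms, which is the fatter-tails property described above.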
The estimation problem is to estimate the unknown parameters \(\beta_0\) and \(\beta_1\). In
practice, the linear probability model is estimated by fitting a straight line to the
observations on X and Y by ordinary least squares. The ordinary least squares–
based predictions of the conditional probability can be greater than one or less
than zero. The logit and probit models are typically estimated by maximum
likelihood. This is because the maximum likelihood estimator has good properties
in large samples. In particular, it is asymptotically efficient; that is, it is the most
precise estimator in large samples.
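For reference, the objective that maximum likelihood maximizes in this single-regressor setting can be written in its standard textbook form. The observation index i and the sample size n are notation introduced here for illustration:

```latex
\[
\ln L(\beta_0, \beta_1)
  = \sum_{i=1}^{n} \Big[ y_i \ln F(\beta_0 + x_i\beta_1)
  + (1 - y_i) \ln\big(1 - F(\beta_0 + x_i\beta_1)\big) \Big]
\]
```

Taking F to be the cumulative logistic distribution function gives the logit log-likelihood, and taking F to be the cumulative normal distribution function gives the probit log-likelihood.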

Figure 1
Cumulative Probability Distributions

In the past, the main motivation for using ordinary least squares was compu-
tational effort. Software for ordinary least squares regression became widely avail-
able much earlier than software for maximum likelihood estimation of the logit
and probit models. Now, maximum likelihood can be performed by any one of
several well-known commercial statistical computing packages. There is no longer
any computational advantage to using ordinary least squares.

Biometric Background

The archetypical biometric study is one that analyzes the effect of a pesticide
on an insect. The two outcomes are dead and alive. The dependent variable, Y,
takes on the value one if the insect dies (“success”) and zero if it survives (“failure”).
The explanatory variable, X, is the dose of a pesticide. The data used in the study
are produced by an experiment in which the experimenter controls the dose
applied to each insect. Hence, the experimenter controls the values of X. A small
number of X values are used: the number typically ranges between three and eight.
An important feature of the experiment is that many insects receive the same
dose. This situation, which is referred to as one with many observations per cell, is
illustrated by a classic pesticide experiment. In the experiment, batches of about
fifty Macrosiphoniella sanborni, the chrysanthemum aphis, were sprayed with a series
of concentrations of pesticide rotenone. The results are reported in Table 1. Given
many observations per cell, the sample proportion of the dead insects can be
computed for each cell, that is, for each value of X in the experiment. (The insects
apparently dead, moribund or so badly affected as to be unable to walk more than
a few steps were classified as dead for the purpose of the analysis.) These sample
proportions are natural estimates of the conditional probabilities that Y = 1 for
those X values that occur in the sample data.

Table 1
Results of the Rotenone-Macrosiphoniella sanborni Experiment

Dose (in logarithms)   Number of Insects   Number Dead   Percentage Dead
0.41                   50                  6             12
0.58                   48                  16            33
0.71                   46                  24            52
0.89                   49                  42            86
1.01                   50                  44            88

Source: Finney (1971).

The focus in biometric applications, however, is to obtain the conditional
probability that Y = 1 given X = x for all values of X, not just for the values of X in
the data. One of the main objectives of pesticide studies is to estimate the dose that
kills a certain proportion of the insects, say, 50 percent. To obtain a good estimate
of this dose, the biometrician needs to have more information than is supplied by
the sample proportions for the X values in the data. Specifically, a model is needed
to estimate the probability that Y = 1 at points between the values of X in the data.
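As an illustration of how a model fills in the gaps between the observed doses, the sketch below fits a logit model to the grouped data of Table 1 by maximum likelihood and then solves for the dose at which the fitted probability of death is 50 percent. It is a minimal sketch under stated assumptions: the logit form is chosen here only because its derivatives are simple (Finney's analysis uses probit), the Newton-Raphson iteration and all function names are illustrative, and no fitted values are claimed.

```python
import math

# Table 1: (log dose, number of insects, number dead)
TABLE_1 = [
    (0.41, 50, 6),
    (0.58, 48, 16),
    (0.71, 46, 24),
    (0.89, 49, 42),
    (1.01, 50, 44),
]


def logistic_cdf(z):
    return 1.0 / (1.0 + math.exp(-z))


def score_vector(b0, b1, data):
    """Gradient of the grouped-data logit log-likelihood in (b0, b1)."""
    g0 = g1 = 0.0
    for dose, n, dead in data:
        resid = dead - n * logistic_cdf(b0 + b1 * dose)
        g0 += resid
        g1 += resid * dose
    return g0, g1


def fit_grouped_logit(data, iterations=50):
    """Newton-Raphson iterations for P(dead | dose) = F(b0 + b1 * dose)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0, g1 = score_vector(b0, b1, data)
        h00 = h01 = h11 = 0.0  # entries of the negative Hessian
        for dose, n, _dead in data:
            p = logistic_cdf(b0 + b1 * dose)
            w = n * p * (1.0 - p)
            h00 += w
            h01 += w * dose
            h11 += w * dose * dose
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


b0_hat, b1_hat = fit_grouped_logit(TABLE_1)
# The fitted probability equals 0.5 where b0 + b1 * dose = 0.
log_dose_50 = -b0_hat / b1_hat
print(b0_hat, b1_hat, log_dose_50)
```

The estimated 50 percent dose falls between two of the doses actually used in the experiment, which is exactly the kind of quantity that the sample proportions alone cannot supply.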
The probit model has been the dominant model in biometrics. The leading
textbook in biometrics for many years was Probit Analysis by Finney (1971). The
experience of one of us (Savin) with pesticide studies suggests that both logit and
probit models provide good fits to samples from laboratory-reared colonies. The
two models give similar predictions except for extreme values of the dose. There is
no compelling biological reason, however, to adopt either the logit or the probit
specification.
A key question in economics is whether models are “structural models,” that is,
whether the values of the parameters do not change across certain environments,
such as policy regimes. A similar problem arises with biometric data. Savin and his
colleagues reviewed a series of studies in which the probit model was estimated for
different generations of laboratory colonies of the same species and similarly for
the logit model. Their findings overwhelmingly rejected the hypothesis that the
parameter values were the same for different generations (Savin, Robertson and
Russell, 1977). This is consistent with Finney (1964, pp. 91–92): “In general, the
assumption that a response curve once determined can be used in further assays
[analysis] is inadmissible.”
The assumption that the binary response function is structural should be
treated with caution. If it does not hold up in laboratory-controlled colonies, then
there is reason to believe that it may not hold up in the real-world evidence studied
by economists. However, binary response studies in economics are seldom re-
peated, so it is more difficult to detect if there is parameter constancy. One notable
exception is mode choice for travel to work. The transferability of the parameters
of mode choice models among cities was studied intensively in the 1970s.

The major controversy in biometrics has not been over logit versus probit
modeling, but over estimation methods. For some decades, there was controversy
over whether to use the minimum chi-square method proposed by Berkson (1944)
or to use maximum likelihood. The controversy, which was resolved in favor of
maximum likelihood, is briefly summarized in Amemiya (1985).

Econometric Applications

The 1970s witnessed an upsurge of models involving discrete dependent
variables in econometrics, as documented by Amemiya (1981). In economic applications,
the data typically come from surveys rather than controlled experiments.
As will be explained, the nature of survey data has tipped the balance in favor of
maximum likelihood estimation and the logit model, at least until recently.
In survey data, there are often few observations on the dependent variable per
cell. The reason for few observations per cell is that survey data often involve several
explanatory variables, some of which are continuous. In the case of a continuous
variable, it is easy to see that there may be a large number of sparsely populated
cells. But this can happen even if the explanatory variables take on a modest
number of discrete values. For example, if there are four explanatory variables that
can each take on ten values, there are \(10^4 = 10{,}000\) possible cells. Hence, it is not
surprising that there are often few observations per cell for data with multiple
explanatory variables.
An advantage of maximum likelihood estimation is that it is feasible when
there are few observations per cell, which includes the case of no observations
in some cells. The logit work trip models considered by Domencich and
McFadden (1975) illustrate the case of few observations per cell. The models
explain the probability that an individual chooses auto over mass transit. The
transit mode is primarily bus, but there are also some streetcar trips. In one
version of the model, the explanatory variables are transit walk time (in min-
utes), auto-in-vehicle time (in minutes) minus transit station-to-station time (in
minutes), auto parking charges (in dollars) minus transit fare (in dollars), and
autos per worker in the household. The sample employed to estimate this
version of the model consists of 115 observations on individual trip makers. In
this example, there are many cells with no observations and some with only one
observation per cell. The maximum likelihood estimates of the parameters and
the t-ratios are presented in Table 2. The estimated coefficients indicate, among
other things, that the value of transit walk time is $3.94/hour (60 × 0.147/2.24),
whereas the value of transit station-to-station time is only $1.10/hour
(60 × 0.0411/2.24). In fact, most analyses of urban travel behavior find that
travelers place a higher value on walk time than on in-vehicle time.
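The arithmetic behind these values of time is the ratio of a time coefficient (per minute) to the cost coefficient (per dollar), scaled to an hourly figure. The short sketch below reproduces it from the magnitudes of the Table 2 coefficients; the function name is illustrative.

```python
def value_of_time_per_hour(time_coefficient, cost_coefficient):
    """Dollars per hour implied by a per-minute time coefficient
    and a per-dollar cost coefficient (magnitudes only)."""
    return 60.0 * time_coefficient / cost_coefficient


walk = value_of_time_per_hour(0.147, 2.24)
in_vehicle = value_of_time_per_hour(0.0411, 2.24)
print(round(walk, 2), round(in_vehicle, 2))
```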

Table 2
Maximum Likelihood Estimates of Logit Work Trip Model

Variable                                                     Parameter   Asymptotic t-ratio
Constant                                                     −4.76       −3.79
Transit walk time                                            0.147       −2.62
Auto-in-vehicle time minus transit station-to-station time   −0.0411     −2.05
Auto parking charges minus transit fare                      −2.24       −4.53
Autos per worker in the household                            3.78        4.06

Source: Domencich and McFadden (1975).

In survey data, there are often three or more possible responses. Models in
which a single dependent variable takes on three or more discrete and unordered
values are usually called multinomial response models. Multinomial
models have been extensively used in urban travel demand studies; for example,
a work trip may be taken using one of three travel modes: bus, carpool or
driving alone. They have also been used in many other applications. For
example, in an ownership study, the household may own an electric dryer, a gas
dryer or no dryer.
Multinomial versions of the logit and probit models have been the subject of
much attention in the econometric literature; see, for example, Amemiya (1985),
Mittelhammer, Judge and Miller (2000) and Ruud (2000). The multinomial logit
specification is attractive on analytical grounds because of an important and
elegant result by McFadden (1974), which shows that the multinomial logit model
can be derived from utility maximization under certain conditions and that the
probabilities have simple closed-form expressions.1 The logit specification also
became attractive in the 1970s and 1980s on computational grounds because
maximum likelihood estimation of the logit model was feasible with the computing
technology available at that time.
One danger in using the multinomial logit model is that it can produce
misleading inferences when some of the alternatives are close substitutes. This
arises because the multinomial logit specification imposes the restriction that the
odds (the ratios of the probabilities) of choosing the jth alternative over the ith
depend only on the characteristics of those two alternatives. In other words, the
characteristics of any other alternatives in the choice set have no influence on the
odds between the ith and jth alternatives. This feature is called the independence from
irrelevant alternatives property.

1 As an example, suppose an individual has three alternatives, and the utility associated with the jth
alternative is \(U_j = \mu_j + \varepsilon_j\), j = 1, 2, 3, where \(\mu_j\) is a nonstochastic function of the explanatory variables
and unknown parameters and \(\varepsilon_j\) is an unobservable random variable that follows the Type I extreme
value (log Weibull) distribution. Then the probability of an individual choosing the first alternative,
\(P(Y = 1 \mid X = x)\), is \(e^{\mu_1}\) divided by \(e^{\mu_1} + e^{\mu_2} + e^{\mu_3}\), and similarly for the other alternatives.

McFadden’s red bus and blue bus example illustrates the problem caused by
independence from irrelevant alternatives. Suppose the initial transportation
choice is between driving and taking a red bus. Assume individuals are split fifty-fifty
between driving and taking the red bus, which implies the odds of taking the red
bus over driving are 1:1. Now, suppose a new, rival bus company introduces a blue
bus, which aside from color is indistinguishable from the red bus. Then it is
reasonable to expect that the bus riders split their trips evenly between the two
buses and that car drivers will continue to drive.
If the independence from irrelevant alternatives property holds, however, then
the odds of taking the red bus over driving still would be 1:1. The odds of taking
the red bus over the blue bus must also be 1:1, because the two types of bus are
indistinguishable except for color. Therefore, because the probabilities of the three
alternatives have to sum to one, independence from irrelevant alternatives implies
that the probabilities of driving, taking the red bus and taking the blue bus must be
equal to one-third. Hence, the multinomial logit model gives the counter-intuitive
prediction that one-third (\((1/2 - 1/3)/(1/2) = 1/3\)) of the car drivers will switch to the bus
without any real change in the alternatives simply because there are now both blue
and red buses.
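The red bus and blue bus arithmetic can be reproduced with the multinomial logit choice probabilities. In the sketch below, setting every systematic utility to zero is an assumption that encodes the fifty-fifty split and the indistinguishable buses; the function and variable names are illustrative.

```python
import math


def mnl_probabilities(utilities):
    """Multinomial logit probabilities: exp(u_j) / sum_k exp(u_k)."""
    top = max(utilities.values())  # subtract the maximum for numerical stability
    weights = {name: math.exp(u - top) for name, u in utilities.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


before = mnl_probabilities({"car": 0.0, "red bus": 0.0})
after = mnl_probabilities({"car": 0.0, "red bus": 0.0, "blue bus": 0.0})

# Odds of red bus over car are unchanged by the new alternative.
odds_before = before["red bus"] / before["car"]
odds_after = after["red bus"] / after["car"]

# Share of former car drivers predicted to switch to a bus.
share_switching = (before["car"] - after["car"]) / before["car"]
print(after, odds_before, odds_after, share_switching)
```

The odds between the red bus and the car stay at 1:1, so the only way to make room for the blue bus is to pull riders away from the car, which is the counter-intuitive prediction described above.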
The practical message here is that multinomial logit specification should
only be used in applications where the alternatives are dissimilar. When the
multinomial logit specification is not plausible, other models can be used. One
is the so-called nested multinomial logit model (McFadden, 1977, 1981). An-
other model, which is more flexible than the nested logit model, is the multi-
nomial probit model. Until recently, the application of the multinomial probit
model with more than three or four choices was computationally difficult; it
involves evaluating multiple integrals, which is computer intensive. But due to
dramatic increases in computer speed and the recent development of new
computing algorithms, simulation-based estimation of multinomial probit mod-
els is now quite feasible.2 Currently available software for implementing these
methods is cited in Geweke and Keane (2001).

2 The ideas motivating the simulation estimation methods are due to Lerman and Manski (1981),
McFadden (1989) and Pakes and Pollard (1989). Several approaches that combine estimation and
simulation are described in Hajivassiliou and Ruud (1994) and Geweke and Keane (2001). Moreover,
Geweke, Keane and Runkle (1994, 1997) show that several competitive simulation-based methods yield
reliable inference in finite samples.

An Earnings Example

This section uses an earnings example to illustrate the differences between a
linear probability model and a logit model. The example consists of estimating the
probability that an individual’s weekly wages are below $280 a week, conditional on
education and experience. Following a commonly used approach in labor economics,
the general specification is

\[ P(Y = 1 \mid \mathit{ed}, \mathit{ex}) = F(\beta_0 + \beta_1 \mathit{ed} + \beta_2 \mathit{ex} + \beta_3 \mathit{ex}^2), \]

where F is a function that depends on the model being estimated, Y = 1 if the
individual’s weekly wage is less than or equal to $280, ed is years of education,
and ex is years of experience. A weekly wage of $280 is close to the tenth percentile
of weekly wages of all individuals in the data set. The data are taken from the 1993
Current Population Survey (CPS). They consist of observations of weekly wages,
years of education and years of experience for 3,123 full-time, full-year white males
who were employed in a metropolitan area in the north central region of the
United States.
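According to the linear probability model,

\[ Y = \beta_0 + \beta_1 \mathit{ed} + \beta_2 \mathit{ex} + \beta_3 \mathit{ex}^2 + U, \]

where U is an unobserved random variable whose mean is zero. The binary logit
model is

\[ P(Y = 1 \mid \mathit{ed}, \mathit{ex}) = F_L(\beta_0 + \beta_1 \mathit{ed} + \beta_2 \mathit{ex} + \beta_3 \mathit{ex}^2), \]

where \(F_L\) is the cumulative logistic distribution function, \(F_L(z) = 1/(1 + e^{-z})\), and

\[ z = \beta_0 + \beta_1 \mathit{ed} + \beta_2 \mathit{ex} + \beta_3 \mathit{ex}^2. \]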

We estimated the linear probability model by ordinary least squares and the binary
logit model by maximum likelihood.
The estimated coefficients and standard errors are reported in Table 3. For
both models, the estimated coefficients are different from zero at the usual
significance levels. The models can be easily compared by plotting the predicted
probabilities of earning less than $280. Figure 2 shows the predictions of
\(P(Y = 1 \mid \mathit{ed}, \mathit{ex})\) by the two models as a function of experience, ex, which is on the
horizontal axis, when the level of education is fixed. For each model, there are
two curves: one curve shows the probability of earning less than $280 for those
with a high school education only (12 years of education, ed = 12) and the other
for those with a college education only (16 years of education, ed = 16).
The predictions from the linear probability and logit models are very different,
so the two models are not substitutes for one another, and the linear model is not
a good approximation to the logit model. Most notably, the linear model gives
negative predicted probabilities for ed = 16 over a wide range of experience levels,
namely, 19 ≤ ex ≤ 40. This implies that the linear model is badly misspecified and
provides a poor fit to the data.
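The contrast between the two sets of predictions can be explored with the coefficients reported in Table 3. The sketch below is illustrative: the dictionary layout and function names are assumptions, and because the published coefficients are rounded, the experience range over which the linear prediction turns negative need not match the 19 ≤ ex ≤ 40 range stated in the text exactly.

```python
import math

# Coefficients as reported (rounded) in Table 3.
LINEAR = {"intercept": 0.715, "ex": -0.025, "ex2": 0.0004, "ed": -0.025}
LOGIT = {"intercept": 5.068, "ex": -0.251, "ex2": 0.004, "ed": -0.367}


def index_value(coef, ed, ex):
    """beta0 + beta1*ed + beta2*ex + beta3*ex^2 for a coefficient set."""
    return coef["intercept"] + coef["ed"] * ed + coef["ex"] * ex + coef["ex2"] * ex ** 2


def linear_probability(ed, ex):
    return index_value(LINEAR, ed, ex)


def logit_probability(ed, ex):
    return 1.0 / (1.0 + math.exp(-index_value(LOGIT, ed, ex)))


for ex in range(0, 45, 5):
    print(ex, round(linear_probability(16, ex), 3), round(logit_probability(16, ex), 3))
```

For college graduates with mid-career experience the linear prediction drops below zero, while the logit prediction stays strictly between zero and one by construction.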

Table 3
Coefficients of Linear and Logit Models

Model    Variable             Coefficient   Std. Error
Linear   Intercept            0.715         0.031
         Experience           −0.025        0.002
         Experience squared   0.0004        0.00001
         Education            −0.025        0.002
Logit    Intercept            5.068         0.422
         Experience           −0.251        0.020
         Experience squared   0.004         0.0004
         Education            −0.367        0.029

Figure 2
Predicted Probability that Weekly Wages are Less Than or Equal to $280

Source: 1993 Current Population Survey.

In many applications, the change in the conditional probability function with
respect to a change in the explanatory variables (the derivative) is of interest,
especially when analyzing the effects of policy changes. Given a fixed value for
education, the derivative with respect to experience is

\[ \partial F / \partial \mathit{ex} = \beta_2 + 2\beta_3 \mathit{ex} \]

for the linear probability model and

\[ \partial F / \partial \mathit{ex} = P(Y = 1 \mid \mathit{ed}, \mathit{ex}) \, P(Y = 0 \mid \mathit{ed}, \mathit{ex}) \, (\beta_2 + 2\beta_3 \mathit{ex}) \]

for the logit model. Figure 3 shows the derivatives of the linear probability and logit
models. The derivative function of the linear probability model is an upward
sloping straight line, a linear function of experience. By contrast, the derivative
function of the logit model is a curve, a nonlinear function of experience.
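The two derivative formulas can be evaluated directly from the rounded Table 3 coefficients. The following is a sketch with illustrative names; because of rounding in the published coefficients, the results should be expected to agree with the reported derivatives only approximately.

```python
import math

# Coefficients as reported (rounded) in Table 3.
LINEAR = {"intercept": 0.715, "ex": -0.025, "ex2": 0.0004, "ed": -0.025}
LOGIT = {"intercept": 5.068, "ex": -0.251, "ex2": 0.004, "ed": -0.367}


def linear_derivative(ex):
    """Derivative of the linear probability model with respect to experience."""
    return LINEAR["ex"] + 2.0 * LINEAR["ex2"] * ex


def logit_probability(ed, ex):
    z = LOGIT["intercept"] + LOGIT["ed"] * ed + LOGIT["ex"] * ex + LOGIT["ex2"] * ex ** 2
    return 1.0 / (1.0 + math.exp(-z))


def logit_derivative(ed, ex):
    """P(Y=1) * P(Y=0) * (beta2 + 2*beta3*ex) for the logit model."""
    p = logit_probability(ed, ex)
    return p * (1.0 - p) * (LOGIT["ex"] + 2.0 * LOGIT["ex2"] * ex)


for ex in (5, 10):
    print(ex, round(linear_derivative(ex), 3), round(logit_derivative(12, ex), 3))
```

The linear derivative does not depend on education at all, whereas the logit derivative changes with both education and experience through the probability terms.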
The derivative of the linear probability model and the derivative of the logit
model have the same value only where the straight line and the curve intersect.
Thus, it is clear from Figure 3 that the straight line is not a good approximation to
the curve for ed = 12, ed = 13.5 or ed = 16. Table 4 reports the derivatives evaluated
at three values of experience, including the mean value of experience conditional
on the educational level. The table also shows derivatives of the probability function
at the unconditional mean values of education and experience. Derivatives are
often evaluated at the mean values of the explanatory variables. The results in Table
4 show that the derivative at the mean value may be a poor approximation of the
derivative evaluated at other values and that the derivative of the linear probability
model at the mean of experience may be a poor approximation to the derivative of
the logit model at the mean.

Figure 3
Derivatives of Probability Function with Respect to Experience for Linear and
Logit Models

Table 4
Derivatives of Conditional Probability Function

Derivatives with Respect to Experience

Model           Education   Experience = 19.9   Mean Experience Conditional on Education   Experience = 5   Experience = 10
Linear          12                              −0.007                                     −0.021           −0.017
Logit           12                              −0.004                                     −0.050           −0.027
Linear          16                              −0.012                                     −0.021           −0.017
Logit           16                              −0.003                                     −0.023           −0.008
Linear          13.5        −0.008
Logit           13.5        −0.003
Nonparametric   12                              −0.003                                     −0.058           −0.031

Derivatives with Respect to Education

Model           Education   Experience = 19.9
Linear          13.5        −0.025
Logit           13.5        −0.013

Notes: The mean values of experience conditional on education are 21.4 and 16.3, respectively, for 12
and 16 years of education. The unconditional mean values of education and experience are 13.5 years
and 19.9 years, respectively.

Nonparametrics to Semiparametrics

As noted in the introduction, there is usually little justification for assuming
that the conditional probability function is known. Again, consider estimating the
probability that weekly wages are below $280 a week, conditional on education and
experience. The conditional probability function was previously estimated assum-
ing the logit model is correct. Another approach is to estimate the conditional
probability function nonparametrically. In nonparametric estimation, no assump-
tions are made about the form of the function or the nature of the dependence on
the explanatory variables (Härdle, 1990). Figure 4 displays the predictions as a
function of ex for ed = 12 for the logit model estimated by maximum likelihood and
for the nonparametric alternative.3 The predictions of the logit and nonparametric
models differ noticeably, especially when ex < 20. A formal test based on Horowitz
and Spokoiny (2001) confirms that the predictions are, in fact, significantly differ-
ent. This implies that the logit model is misspecified.
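To give a sense of what nonparametric estimation involves, here is a minimal sketch of a Nadaraya-Watson kernel estimator with a standard normal kernel for a single explanatory variable. The toy data are invented purely for illustration and are not the Current Population Survey sample; the bandwidth of 1.6 is reused from the paper's footnote only as a placeholder and would need to be chosen afresh for any real data set.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, used as the kernel."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def nadaraya_watson(x0, xs, ys, bandwidth):
    """Kernel-weighted average of ys at x0; with binary ys this
    estimates P(Y = 1 | X = x0) without assuming a functional form."""
    weights = [gaussian_kernel((x0 - x) / bandwidth) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


# Hypothetical toy data: the outcome becomes less likely as x grows.
TOY_X = [1, 2, 3, 4, 5, 6, 7, 8]
TOY_Y = [1, 1, 1, 0, 1, 0, 0, 0]

for x0 in (2, 4.5, 7):
    print(x0, nadaraya_watson(x0, TOY_X, TOY_Y, bandwidth=1.6))
```

Because the estimate is a weighted average of zeros and ones with positive weights, it always lies between zero and one, and no parametric form for the probability function is imposed.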
Of course, the fit of the logit model may be improved by adding powers of
experience. Searching over the powers amounts to informal nonparametric fitting.
The performance of such an informal specification search depends on the search
rule (what powers to add) and the stopping rule (how many powers to add). With
the search and stopping rules typically employed in practice but seldom explicitly
cited, informal nonparametric fitting is inconsistent; it does not recover the true
conditional probability function as the sample size increases.
The nonparametric approach maximizes flexibility in that it imposes no dis-
tributional assumptions on F. Matzkin (1992, 1993) has studied nonparametric
estimation of structural binary response models. However, the price of the flexi-
bility of the nonparametric models can be high for several reasons. One is that
estimation precision decreases rapidly as the number of explanatory variables
increases. This is due to the so-called curse of dimensionality. As a result, imprac-
ticably large samples may be needed to obtain acceptable estimation precision
when there are multiple explanatory variables. A second problem with nonpara-
metric estimation is that its results can be difficult to display, report and interpret
when there are multiple explanatory variables. A third problem is that nonpara-
metric estimation does not permit extrapolation.

Figure 4
Predicted Probability that Weekly Wages are Less than or Equal to $280, ed = 12

3 We used the Nadaraya-Watson kernel estimation method with the standard normal density function for the
kernel. The bandwidth was chosen by least squares cross-validation, which yielded a bandwidth value of 1.6.

The distinction between semiparametric and nonparametric models is that a
semiparametric model makes assumptions that avoid the curse of dimensionality.
Often, the resulting model includes an unknown finite-dimensional parameter.
Thus, the semiparametric approach is a halfway house between the parametric and
nonparametric approaches. It imposes less structure than does the parametric
approach, but more structure than the nonparametric approach imposes, and thus
it is a compromise between restrictive distributional assumptions on the one hand
and flexibility on the other.
This section briefly introduces two approaches to semiparametric estimation,
one based on single-index modeling and the other on the binary response version of
the median regression model. These two approaches are the consequence of dif-
ferent sets of identification conditions. The identification problem is to find
conditions under which the parameters are uniquely determined by the population
distribution of the data and auxiliary assumptions. Manski (1988) has shown that in
semiparametric settings the conditions required for identification are often subtle.
In the case of the single-index model, the conditional probability that Y = 1
given X = x has the form \(P(Y = 1 \mid X = x) = G(x\beta)\), where \(\beta\) is an unknown K × 1
constant vector and G is an unknown distribution function. The quantity \(x\beta\) is called
an index, where, in general, X is a 1 × K random vector. If there are two explanatory
variables, the index is \(x_1\beta_1 + x_2\beta_2\). The semiparametric estimation problem is to
estimate both \(\beta\) and G from observations on X and Y. Note that the above
specification is more flexible than are the logit and probit binary response models.
The logit (probit) model is a special case obtained by assuming the logistic
(normal) cumulative distribution function for G. Under the appropriate conditions,
the asymptotically efficient estimator of \(\beta\) in a single-index model is the
semiparametric maximum likelihood estimator of Klein and Spady (1993).
Another approach specifies that Y = 1 if \(X\beta + U > 0\) and Y = 0 otherwise,
where U is an unobserved random variable whose median is zero. Except in special
cases, the resulting model is not a single-index model. Manski (1985) proposed the
maximum score estimator for estimating \(\beta\) consistently. Horowitz (1992) modified this
estimator to improve its precision and other statistical properties. See Greene
(2000) for a textbook presentation of the maximum score estimator and an
empirical application.
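The idea behind the maximum score estimator can be sketched in a few lines: choose the coefficient vector whose sign prediction, Y = 1 when the index is positive, matches the observed outcomes as often as possible. The code below is a rough illustration under simplifying assumptions introduced here: two regressors, no intercept, a unit-length normalization of the coefficient vector (scale is not identified), a crude grid search over directions, and invented toy data. It is not the smoothed estimator of Horowitz (1992).

```python
import math


def score(beta, xs, ys):
    """Fraction of observations for which 1[x*beta > 0] equals y."""
    correct = 0
    for x, y in zip(xs, ys):
        index = sum(b * xi for b, xi in zip(beta, x))
        predicted = 1 if index > 0 else 0
        correct += 1 if predicted == y else 0
    return correct / len(ys)


def maximum_score_grid(xs, ys, steps=360):
    """Grid search over unit-length directions (cos t, sin t)."""
    best_beta, best_score = None, -1.0
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        beta = (math.cos(theta), math.sin(theta))
        current = score(beta, xs, ys)
        if current > best_score:
            best_beta, best_score = beta, current
    return best_beta, best_score


# Hypothetical toy data, perfectly classified by the sign of the first regressor.
TOY_X = [(1.0, 0.5), (-1.0, 0.2), (2.0, -1.0), (-2.0, -0.3)]
TOY_Y = [1, 0, 1, 0]

beta_hat, fit = maximum_score_grid(TOY_X, TOY_Y)
print(beta_hat, fit)
```

The objective is a step function of the coefficients, so in general many directions attain the same maximum; the grid search simply returns the first one it finds.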
This section concludes with an empirical example that illustrates the useful-
ness of semiparametric single-index models. The example is taken from Horowitz
and Härdle (1996) and consists of estimating a model of product innovation
by German manufacturers of investment goods. The estimation of a binary probit
model appeared to produce reasonable estimates for the parameters and the
probability of an innovation. If the probit model is correct, the first derivative of G
is the standard normal density. Semiparametric estimation revealed, however, that
the first derivative of G was bimodal. This is an important feature of the data that
could not easily be found by standard parametric methods. The product innovation
study demonstrates the usefulness of a semiparametric single-index model in
reducing the risk of obtaining misleading results.

Concluding Comments

The focus of this paper has been on the evolution of more flexible estimation
methods for binary response and its multinomial cousin. For the sake of brevity,
many topics involving binary variables have been omitted. These include random
coefficient models, panel data models and choice-based sampling. Moreover, the
evolution in estimation methods is ongoing in both classical and Bayesian
econometrics, although this paper considers only classical econometrics. Geweke, Keane
and Runkle (1994, 1997) present some recent developments using the Bayesian
approach. The expectation is that flexible methods will become more widely
employed with the development of user-friendly software. The new methods
promise to offer more insightful data analysis and, in turn, to provide a sounder basis for
policy and forecasting.

The authors thank John Geweke, Brad De Long, Alan Krueger, Jim McDonald, Forrest
Nelson, Timothy Taylor, Shawn Ulrick, Michael Waldman and Allan Würtz for their help
and constructive comments on earlier draft material. The research of Joel L. Horowitz was
supported in part by NSF Grant SES-9910925.

References

Amemiya, Takeshi. 1981. “Qualitative Response Models: A Survey.” Journal of Economic Literature. December, 19:4, pp. 1483–536.
Amemiya, Takeshi. 1985. Advanced Econometrics. Cambridge, Mass.: Harvard University Press.
Berkson, Joseph. 1944. “Application of the Logistic Function to Bio-Assay.” Journal of the American Statistical Association. 39, pp. 357–65.
Domencich, Tom A. and Daniel McFadden. 1975. Urban Travel Demand: A Behavioral Analysis. Amsterdam: North-Holland Publishing Co.
Finney, David J. 1964. Statistical Method in Biological Assay. New York: Hafner.
Finney, David J. 1971. Probit Analysis, Third Edition. Cambridge: Cambridge University Press.
Geweke, John and Michael Keane. 2001. “Computationally Intensive Methods for Integration in Econometrics,” in Handbook of Econometrics, Volume V. James J. Heckman and Edward E. Leamer, eds. Amsterdam: North-Holland. Forthcoming.
Geweke, John, Michael Keane and David Runkle. 1994. “Alternative Computational Approaches to Statistical Inference in Multinomial Probit Models.” Review of Economics and Statistics. November, 76:4, pp. 609–32.
Geweke, John, Michael Keane and David Runkle. 1997. “Statistical Inference in the Multinomial Multiperiod Probit Model.” Journal of Econometrics. September, 80:1, pp. 125–65.
Greene, William H. 2000. Econometric Analysis, Fourth Edition. Upper Saddle River, N.J.: Prentice Hall.
Hajivassiliou, Vassilis A. and Paul A. Ruud. 1994. “Classical Estimation for LDV Models Using Simulation,” in Handbook of Econometrics, Volume IV. Robert F. Engle and Daniel L. McFadden, eds. Amsterdam: North-Holland, pp. 2383–441.
Härdle, Wolfgang. 1990. Applied Nonparametric Regression. Cambridge: Cambridge University Press.
Horowitz, Joel L. 1992. “A Smoothed Maximum Score Estimator for Binary Response Models.” Econometrica. May, 60:3, pp. 505–31.
Horowitz, Joel L. and Wolfgang Härdle. 1996. “Direct Semiparametric Estimation of Single-Index Models with Discrete Covariates.” Journal of the American Statistical Association. December, 91:436, pp. 1632–40.
Horowitz, Joel L. and Vladimir G. Spokoiny. 2001. “An Adaptive, Rate Optimal Test of a Parametric Mean-Regression Model against a Nonparametric Alternative.” Econometrica. May, 69:3, pp. 599–631.
Klein, Roger W. and Richard H. Spady. 1993. “An Efficient Semiparametric Estimator for Binary Response Models.” Econometrica. March, 61:2, pp. 387–421.
Lerman, Steven R. and Charles F. Manski. 1981. “On the Use of Simulated Frequencies to Approximate Choice Probabilities,” in Structural Analysis of Discrete Data with Econometric Applications. Charles F. Manski and Daniel L. McFadden, eds. Cambridge, Mass.: MIT Press, pp. 305–19.
Manski, Charles F. 1985. “Semiparametric Analysis of Discrete Response: Asymptotic Properties of the Maximum Score Estimator.” Journal of Econometrics. March, 27:3, pp. 313–33.
Manski, Charles F. 1988. “Identification in Binary Response Model.” Journal of the American Statistical Association. September, 83:403, pp. 729–38.
Matzkin, Rosa L. 1992. “Nonparametric and Distribution-Free Estimation of the Binary Choice and the Threshold-Crossing Models.” Econometrica. March, 60:2, pp. 239–70.
Matzkin, Rosa L. 1993. “Nonparametric Identification and Estimation of Polychotomous Choice Models.” Journal of Econometrics. July, 58:1-2, pp. 137–68.
McFadden, Daniel L. 1974. “Conditional Logit Analysis of Qualitative Choice Behavior,” in Frontiers in Econometrics. Paul Zarembka, ed. New York: Academic Press, pp. 105–42.
McFadden, Daniel L. 1977. “Quantitative Methods for Analyzing Travel Behavior of Individuals: Some Recent Developments.” Cowles Foundation Discussion Paper No. 474.
McFadden, Daniel L. 1981. “Econometric Models of Probabilistic Choice,” in Structural Analysis of Discrete Data with Econometric Applications. Charles F. Manski and Daniel L. McFadden, eds. Cambridge, Mass.: MIT Press, pp. 198–272.
McFadden, Daniel L. 1989. “A Method of Simulated Moments for Estimation of Discrete Response Models without Numerical Integration.” Econometrica. 57:5, pp. 995–1026.
Mittelhammer, Ron C., George G. Judge and Douglas J. Miller. 2000. Econometric Foundations. Cambridge: Cambridge University Press.
Pakes, Ariel A. and David Pollard. 1989. “Simulation and the Asymptotics of Optimization Estimators.” Econometrica. 57:5, pp. 1027–57.
Ruud, Paul A. 2000. An Introduction to Classical Econometric Theory. Oxford: Oxford University Press.
Savin, N.E., Jaqueline L. Robertson and Robert M. Russell. 1977. “A Critical Evaluation of Bioassay in Insecticide Research: Likelihood Ratio Tests of Dose-Mortality Regression.” Bulletin of the Entomological Society of America. 23:4, pp. 257–66.