Probit Model: Conceptual Framework
In statistics, a probit model is a type of regression where the dependent variable can only take two
values, for example married or not married. The word is a portmanteau, coming
from probability + unit.[1] The purpose of the model is to estimate the probability that an observation
with particular characteristics will fall into one of the two categories; moreover, if estimated
probabilities greater than 1/2 are treated as classifying an observation into the corresponding category, the
probit model is a type of binary classification model.
A probit model is a popular specification for an ordinal[2] or a binary response model. As such, it treats
the same set of problems as logistic regression, using similar techniques. The probit model,
which employs a probit link function, is most often estimated using the standard maximum
likelihood procedure, such an estimation being called a probit regression.
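As a minimal sketch of such an estimation, assuming Python with NumPy and the statsmodels library (the data and variable names here are illustrative, not part of the original text), a probit regression could be fit roughly as follows:

import numpy as np
import statsmodels.api as sm

# Illustrative synthetic data: one regressor plus an intercept
rng = np.random.default_rng(0)
x = rng.normal(size=500)
X = sm.add_constant(x)                                   # adds the intercept column
y = (X @ np.array([0.5, 1.2]) + rng.normal(size=500) > 0).astype(int)

# Fit the probit model by maximum likelihood
result = sm.Probit(y, X).fit()
print(result.params)                                     # estimated intercept and slope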
Probit models were introduced by Chester Bliss in 1934;[3] a fast method for computing maximum
likelihood estimates for them was proposed by Ronald Fisher as an appendix to Bliss' work in 1935.
Conceptual framework
Suppose a response variable Y is binary, that is, it can have only two possible outcomes, which we
will denote as 1 and 0. For example, Y may represent presence/absence of a certain condition,
success/failure of some device, answer yes/no on a survey, etc. We also have a vector
of regressors X, which are assumed to influence the outcome Y. Specifically, we assume that the
model takes the form
Pr(Y = 1 | X) = Φ(Xᵀβ),
where Pr denotes probability, and Φ is the cumulative distribution function (CDF) of the
standard normal distribution. The parameters β are typically estimated by maximum likelihood.
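To make the maximum likelihood step concrete, here is a hedged sketch, assuming Python with NumPy and SciPy, that builds the probit log-likelihood directly from Pr(Y = 1 | X) = Φ(Xᵀβ) and maximizes it numerically (the function names, data, and coefficient values are illustrative):

import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_likelihood(beta, X, y):
    # Probit success probabilities Φ(Xᵀβ), clipped away from 0 and 1 for log stability
    p = np.clip(norm.cdf(X @ beta), 1e-12, 1 - 1e-12)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Illustrative synthetic data: an intercept column plus one regressor
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(400), rng.normal(size=400)])
y = (X @ np.array([-0.3, 0.8]) + rng.normal(size=400) > 0).astype(int)

beta_hat = minimize(neg_log_likelihood, x0=np.zeros(2), args=(X, y)).x
print(beta_hat)   # maximum likelihood estimates of the intercept and slope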
It is possible to motivate the probit model as a latent variable model. Suppose there exists an
auxiliary random variable
Y* = Xᵀβ + ε,
where ε ~ N(0, 1). Then Y can be viewed as an indicator for whether this latent variable is
positive:
Y = 1 if Y* > 0, and Y = 0 otherwise.
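A minimal simulation sketch of this latent-variable formulation, assuming Python with NumPy (the coefficient values and sample size are made up for illustration):

import numpy as np

rng = np.random.default_rng(2)
n = 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept plus one regressor
beta = np.array([0.4, -1.0])                             # illustrative coefficients

eps = rng.standard_normal(n)          # ε ~ N(0, 1)
y_star = X @ beta + eps               # latent variable Y* = Xᵀβ + ε
y = (y_star > 0).astype(int)          # observed outcome: indicator that Y* is positive
print(y_star, y)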
The use of the standard normal distribution causes no loss of generality compared with
using an arbitrary mean and standard deviation because adding a fixed amount to the
mean can be compensated by subtracting the same amount from the intercept, and
multiplying the standard deviation by a fixed amount can be compensated by multiplying
the weights by the same amount.
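To illustrate this normalization numerically, here is a hedged sketch, assuming Python with NumPy and SciPy; it checks that an error term with arbitrary mean μ and standard deviation σ gives the same probabilities as a standard-normal error once the intercept is shifted and the weights rescaled (the values of μ, σ, and the coefficients are invented for the example):

import numpy as np
from scipy.stats import norm

x = np.array([1.0, -0.7])        # a single observation (intercept term, one regressor)
beta = np.array([0.4, -1.0])     # original weights, paired with ε ~ N(μ, σ²)
mu, sigma = 0.5, 2.0             # non-standard error distribution

# Pr(xᵀβ + ε > 0) with ε ~ N(μ, σ²) equals Φ((xᵀβ + μ) / σ)
p_general = norm.cdf((x @ beta + mu) / sigma)

# Equivalent standard-normal parameterization: shift the intercept, rescale all weights
beta_std = (beta + np.array([mu, 0.0])) / sigma
p_standard = norm.cdf(x @ beta_std)

print(p_general, p_standard)     # identical probabilities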
To see that the two models are equivalent, note that
Pr(Y = 1 | X) = Pr(Y* > 0) = Pr(Xᵀβ + ε > 0) = Pr(ε > −Xᵀβ) = Pr(ε < Xᵀβ) = Φ(Xᵀβ),
where the second-to-last equality uses the symmetry of the standard normal distribution.