Recall the linear regression model we have worked with so far:

$$y = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + \varepsilon$$

where we made several assumptions for OLS to have good properties. One of
these was that the error term is normal, which implies that y is normal and
therefore has a continuous distribution.
Today we will depart from the OLS estimation of the linear regression model.
We will deal with cases where:
- The dependent variable will no longer vary continuously over an infinite
range, as we have assumed in the OLS world.
- Most models we will cover will not be linear in parameters, so the
Ordinary Least Squares (OLS) technique will not be applicable for
estimating these models. Instead we will see how Maximum
Likelihood Estimation (MLE) is used to estimate them.
- The parameter estimates from these models will no longer represent
the marginal change in the expected outcome, so we will have to learn
new ways of interpreting model estimates and finding marginal
effects.
The models can be classified into two main groups: qualitative response models
and limited dependent variable models.
I. Qualitative response (discrete choice) models:
1. Binary choice models (logit, probit)
Example: being in the labor force (=working or looking for work)
o No (=0)
o Yes (=1)
2. Multiple-choice models: choosing from unranked options
(multinomial or conditional or mixed logit model)
Example: choice among candidates in the Democratic Party primary (US)
II. Limited dependent variable models (e.g., truncated, censored, or
selected samples):
Example: suppose you want to estimate the effect of education on wages,
but you have wage values/data only for individuals who are in the labor
force, while you are missing values on wages for those individuals who
are currently out of the labor force. If you only use data for people in the
labor force to estimate the relationship of interest, you will no longer have
a random sample of the population: people who are in the labor force
might be a selective subsample of the population. In this case OLS will no
longer be unbiased, and we have to find a better way of estimating the
model.
Our focus will be on binary choice models (logit & probit) and the optimization
approach you need to take in order to estimate these models.
Can’t we use the OLS approach to deal with a binary dependent variable and
estimate a linear probability model?
A Linear Probability Model (LPM) is a type of regression model used to
estimate the probability of a binary outcome (e.g., yes/no, 0/1) as a linear
function of explanatory variables. It is a special case of linear regression where
the dependent variable is binary.
Technically, we could use a linear probability model when our dependent
variable is binary and estimate it using the Ordinary Least Squares approach,
but there are certain problems associated with doing so.
Some of the undesirable properties of the linear probability model estimated
via OLS include:
- The fit of the regression line is not flexible enough (linear vs. logistic)
- Estimated (fitted) values of Y can take values above 1 or below 0, even
though their interpretation is "probability of Y=1"
Let’s look a bit closer at what we would get if we estimated a model with binary
dependent variable (labor force participation) using a linear model and OLS
technique:
Model #1: Linear Probability Model (LPM) – estimated using OLS
A simple linear regression model where $Y_i$ is a binary dependent variable:

$$Y_i = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki} + u_i$$

reg lfp k5 k618 age wc hc lwg inc

Note that $E(Y|X_1, X_2, \dots, X_k)$ for a binary Y variable is simply equal to:

$$E(Y|X_1, \dots, X_k) = 1 \cdot P(Y=1|X_1, \dots, X_k) + 0 \cdot P(Y=0|X_1, \dots, X_k) = P(Y=1|X_1, \dots, X_k)$$

so the regression function of the LPM is itself the probability that $Y = 1$.
$\beta_j$ can be interpreted as the change in the probability that $Y_i=1$ associated with a
unit change in $X_j$, holding the other regressors constant. If $\beta_{income} = 0.09$, that
means that as income increases by $1, the probability of LF participation increases
by 0.09, i.e., by 9 percentage points.
If the intercept ($\beta_0$) coefficient is positive, the predicted probability of
participating in the LF is positive even when all regressors are 0.
Interpretation of fitted values ($\hat{y}$): if the fitted value for a particular set of X's is
0.70, this means individuals with those X values have a 70% chance of having a
score of 1 (being in the labor force). In other words, we would expect 70% of
the people who have this particular combination of values on X to fall into
category 1 of the dependent variable, while the other 30% would fall into
category 0. But the problem is that fitted values can go beyond 0 and 1, so they
are not exactly probabilities (see the sketch below).
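A quick way to see the out-of-range fitted values in practice is to predict from the
LPM and summarize. A minimal Stata sketch, assuming the same labor force
dataset used with reg above is in memory (the variable name yhat_lpm is just an
illustrative choice):

reg lfp k5 k618 age wc hc lwg inc
predict yhat_lpm, xb                    // fitted "probabilities" from the LPM
summarize yhat_lpm                      // min/max can fall outside [0,1]
count if yhat_lpm < 0 | yhat_lpm > 1    // number of impossible predictions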
So, how can we move from this linear probability model to a non-linear
probability model, and how can we estimate it?
What the logit model tries to estimate is the following model equation:

$$\ln\left(\frac{P(Y_i=1)}{1-P(Y_i=1)}\right) = \beta_0 + \beta_1 X_{1i} + \dots + \beta_k X_{ki}$$

where the coefficients represent the change in the log odds (the left-hand side of
the equation).
But let's see where this is coming from:
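As a quick check that this really is a probability model, we can solve the log-odds
equation for the probability itself, writing $p_i = P(Y_i=1)$:

$$\ln\frac{p_i}{1-p_i} = x_i'\beta \;\Rightarrow\; \frac{p_i}{1-p_i} = e^{x_i'\beta} \;\Rightarrow\; p_i = \frac{e^{x_i'\beta}}{1+e^{x_i'\beta}} = \Lambda(x_i'\beta)$$

so the probability is the logistic CDF $\Lambda(\cdot)$ evaluated at $x_i'\beta$, which always lies
strictly between 0 and 1, unlike the LPM's fitted values.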
Non-linear probability models: logit & probit
The binary choice model is derived from an underlying linear model with a
continuous (latent) response/dependent variable $y_i^*$ being determined by
observable factors ($x_1 \dots x_k$) and an error:

$$y_i^* = x_i'\beta + u_i \qquad (1)$$

We do not observe $y_i^*$ itself; we only observe the binary outcome
$y_i = 1$ if $y_i^* \ge 0$, and $y_i = 0$ otherwise.
In model (1) let's assume that the error term is normal, $u_i \sim N(0, \sigma^2)$. In this case
we will have the conditional distribution $y_i^* \sim N(x_i'\beta, \sigma^2)$, and we can write:

$$P(y_i = 1) = P(y_i^* \ge 0 \mid x_i'\beta, \sigma) = P\left(\frac{y_i^* - x_i'\beta}{\sigma} \ge \frac{0 - x_i'\beta}{\sigma}\right) = P\left(z_i \ge -\frac{x_i'\beta}{\sigma}\right)$$

$$= P\left(z_i < \frac{x_i'\beta}{\sigma}\right) = \Phi\left(\frac{x_i'\beta}{\sigma}\right),$$

i.e., the CDF (the probability, or area, below that value), which gives the
probability that $y_i = 1$.
The logit model assumes that the error terms follow a logistic distribution, while
the probit model assumes that the error terms follow a normal distribution.
We substitute the specific CDF into the likelihood function and then optimize in
order to find the values of the betas (note that $\sigma$ is not separately identified from
$\beta$, so in practice it is normalized to 1):

$$L(\beta, \sigma \mid y_i, x_i) = \prod_{i=1}^{n} \left[\Phi\left(\frac{x_i'\beta}{\sigma}\right)\right]^{y_i} \left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right]^{1-y_i}$$
If we take the log of the likelihood function, we will get the log-likelihood
function:
$$\ln L = \ln\left[L(\theta_1, \theta_2, \dots, \theta_k \mid y)\right] = \ln\left[\prod_{i=1}^{n} l_i(\theta_1, \theta_2, \dots, \theta_k \mid y_i)\right] = \sum_{i=1}^{n} \ln l_i(\theta_1, \theta_2, \dots, \theta_k \mid y_i)$$

For the probit model this becomes:

$$\ln L(\beta \mid \text{data}) = \sum_{i=1}^{n} \left[\, y_i \ln \Phi(x_i'\beta) + (1 - y_i) \ln\left(1 - \Phi(x_i'\beta)\right) \right]$$
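To make the optimization step concrete, here is a minimal sketch of coding this
probit log-likelihood by hand with Stata's ml command. It assumes the same labor
force dataset as above is in memory, normalizes $\sigma = 1$ as usual, and the program
name myprobit is just an illustrative choice:

capture program drop myprobit
program myprobit
    args lnf xb                                   // lnf: obs-level log likelihood; xb: x_i'beta
    quietly replace `lnf' = ln(normal(`xb'))      if $ML_y1 == 1
    quietly replace `lnf' = ln(1 - normal(`xb'))  if $ML_y1 == 0
end
ml model lf myprobit (lfp = k5 k618 age wc hc lwg inc)
ml maximize

The maximized coefficients should match those from Stata's built-in probit
command, which performs exactly this optimization internally.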
Once estimated, the model gives fitted probabilities of the form:

$$\hat{P}(y_i = 1) = F(\hat\beta_0 + \hat\beta_1 age_i + \hat\beta_2 educ_i + \hat\beta_3 exper_i + \dots + \hat\beta_k kids_i)$$

The marginal effect of a regressor $x_j$ is found by differentiating this expression:
$\partial \hat{P}(y_i=1)/\partial x_{ji} = f(x_i'\hat\beta)\,\hat\beta_j$, where $f(\cdot)$ is the PDF corresponding to the
CDF $F(\cdot)$. Since the value of the PDF is always positive, the sign of the marginal
effect will always depend on the sign of $\hat\beta_j$ obtained from the regression, while
its magnitude varies with $x_i$.
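In practice marginal effects are rarely computed by hand; in Stata they can be
obtained after estimation with the margins command (a sketch, again assuming
the same dataset and variables as above):

probit lfp k5 k618 age wc hc lwg inc
margins, dydx(*)            // average marginal effects over the sample
margins, dydx(*) atmeans    // marginal effects at the means of the regressors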
Odds ratios:
Often, studies using logit/probit regressions report odds ratios, which allow us
to interpret the effect of each explanatory variable X on the odds of the outcome,
$P(Y=1)/P(Y=0)$.
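In Stata, odds ratios for the labor force model can be requested directly (again
assuming the dataset from above is in memory):

logit lfp k5 k618 age wc hc lwg inc, or

The reported numbers are $e^{\hat\beta_j}$: a value above 1 means the variable raises the odds
of $Y=1$, and a value below 1 means it lowers them.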
Example: if the estimated odds ratio on an explanatory variable is 2, then a
one-unit increase in that variable doubles the odds of being in the labor force,
holding the other regressors constant.
A simple MLE example:
To see how maximum likelihood estimation works, consider $n$ independent
Bernoulli draws $y_1, \dots, y_n$, each equal to 1 with probability $\theta$. The likelihood of a
single observation is:

$$l_i = l(\theta \mid y_i) = \theta^{y_i}(1-\theta)^{1-y_i}$$

so the likelihood of the sample ($\Rightarrow L(\theta \mid y) = \prod_i l_i$) is:

$$L(\theta \mid y) = L(\theta \mid y_1, \dots, y_n) = \theta^{\sum_{i=1}^n y_i}(1-\theta)^{\sum_{i=1}^n (1-y_i)}$$

Taking logs, $\ln L = \left(\sum_{i=1}^n y_i\right)\ln\theta + \left(n - \sum_{i=1}^n y_i\right)\ln(1-\theta)$, and setting the
derivative with respect to $\theta$ to zero:

$$\frac{\sum_{i=1}^n y_i}{\theta} = \frac{n - \sum_{i=1}^n y_i}{1-\theta} \;\Rightarrow\; \sum_{i=1}^n y_i - \theta\sum_{i=1}^n y_i = n\theta - \theta\sum_{i=1}^n y_i \;\Rightarrow\; \hat\theta = \frac{\sum_{i=1}^n y_i}{n} = \bar{y}$$
Basically, we find that the sample mean is the maximum likelihood estimator of
the unknown parameter θ. For instance, if 60 out of 100 observations have $y_i = 1$,
then $\hat\theta = 0.6$.
Final comment:
Logit and probit models follow a similar procedure for estimating the regression
coefficients (betas); however, they assume different error distributions (logistic
vs. normal), and thus their likelihood functions also look different. Nevertheless,
the steps used in the optimization and derivation of coefficients in these models
are similar to the steps used in this example.
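As a closing illustration, both estimators are one-line commands in Stata, and
their output can be placed side by side (a sketch under the same dataset
assumption as above; m_logit and m_probit are just illustrative names):

quietly logit  lfp k5 k618 age wc hc lwg inc
estimates store m_logit
quietly probit lfp k5 k618 age wc hc lwg inc
estimates store m_probit
estimates table m_logit m_probit   // coefficients differ in scale, not in sign or story

The logit coefficients are larger in absolute value (the logistic error has variance
$\pi^2/3$ rather than 1), but the implied marginal effects and fitted probabilities from
the two models are usually very close.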