Lecture 2 - Instrumental Variable
Vu Van Huong
Instrumental variables
◼ Unbiasedness and consistency of OLS under
the exogeneity assumption
◼ Identifying assumptions
◼ Instrumental variable: just-identification
◼ Two-stage least squares
◼ Test of endogeneity
◼ Problems with IV
OLS regression: assumption of
exogeneity
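◼ Consider the model:
◼ $Y = X\beta + \varepsilon$
◼ where:
◼ $Y$ is an $n \times 1$ vector of observations on the
explained variable
◼ $X$ is an $n \times k$ matrix of observations on $k$
explanatory variables (including a constant)
◼ $\beta$ is a $k \times 1$ vector of unknown parameters
and $\varepsilon$ is an $n \times 1$ vector of error terms
◼ The main assumption is exogeneity of $X$,
i.e., $E(\varepsilon \mid X) = 0$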
OLS regression
◼ Minimizing the sum of squared errors,
we get:
$$\hat{\beta} = (X'X)^{-1} X'Y$$
◼ Under the exogeneity assumption:
◼ Unbiased estimator:
$$\begin{aligned}
\hat{\beta} &= (X'X)^{-1} X'Y = (X'X)^{-1} X'(X\beta + \varepsilon) \\
&= (X'X)^{-1}(X'X)\beta + (X'X)^{-1} X'\varepsilon \\
&= \beta + (X'X)^{-1} X'\varepsilon
\end{aligned}$$
OLS regression
◼ Then
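$$\begin{aligned}
E(\hat{\beta}) &= E(\beta) + E\left[(X'X)^{-1} X'\varepsilon\right] \\
&= \beta + E\left[(X'X)^{-1} X' E(\varepsilon \mid X)\right] \\
&= \beta
\end{aligned}$$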
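The unbiasedness result can be checked numerically. The following is a minimal simulation sketch, not part of the lecture: the sample size, number of replications, true coefficients and normal distributions are all hypothetical choices. The errors are drawn independently of X, so the exogeneity assumption holds by construction.

```python
import numpy as np

def ols(X, y):
    """Closed-form OLS: solve (X'X) b = X'y instead of inverting X'X."""
    return np.linalg.solve(X.T @ X, X.T @ y)

rng = np.random.default_rng(0)        # fixed seed, arbitrary choice
n, reps = 200, 2000                   # hypothetical sample size and replications
beta = np.array([1.0, 2.0])           # hypothetical true (constant, slope)

estimates = np.empty((reps, 2))
for r in range(reps):
    x = rng.normal(size=n)
    X = np.column_stack([np.ones(n), x])   # explanatory variables incl. a constant
    eps = rng.normal(size=n)               # independent of X, so E(eps | X) = 0
    y = X @ beta + eps
    estimates[r] = ols(X, y)

# Averaging over replications approximates E(beta_hat)
mean_estimate = estimates.mean(axis=0)
print("true beta:         ", beta)
print("mean of estimates: ", mean_estimate)
```

Under these assumptions the average of the estimates should lie close to the true coefficients, while any single estimate still varies around them.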
Violation of assumption of
exogeneity: Finding instruments
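◼ When $X$ is not independent of $\varepsilon$, we cannot
establish the unbiasedness and consistency of
the OLS estimator.
◼ The standard solution is to use an instrument $Z$
for each endogenous variable, satisfying:
◼ $\mathrm{Cov}(Z, \varepsilon) = 0$ (instrument exogeneity)
◼ $\mathrm{Cov}(Z, X) \neq 0$ (instrument relevance)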
Instrumental variables
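◼ From $y = X\beta + \varepsilon$, we have:
◼ $Z'y = Z'X\beta + Z'\varepsilon$
◼ Since $Z'y$ is a $k \times 1$ vector and $Z'X$ is a
$k \times k$ matrix (as many instruments as
regressors), the estimator of $\beta$ is obtained
directly, provided $Z'X$ is invertible:
$$\hat{\beta} = (Z'X)^{-1} Z'y$$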
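The just-identified estimator can be illustrated on simulated data. This is a sketch under assumed values, not the lecture's own example: the data-generating process, the coefficients 0.8 and 0.5, and the variable names are hypothetical. An unobserved factor `u` enters both the regressor and the error, which makes the regressor endogenous, while the instrument `z` is drawn independently of the error.

```python
import numpy as np

rng = np.random.default_rng(1)        # fixed seed, arbitrary choice
n = 100_000                           # large n to illustrate consistency
beta = np.array([1.0, 0.5])           # hypothetical true (constant, slope)

z = rng.normal(size=n)                # instrument: independent of the error
u = rng.normal(size=n)                # unobserved factor driving both x and eps
eps = u + rng.normal(size=n)          # structural error
x = 0.8 * z + u + rng.normal(size=n)  # endogenous: Cov(x, eps) = Var(u) != 0

X = np.column_stack([np.ones(n), x])  # n x k regressor matrix
Z = np.column_stack([np.ones(n), z])  # n x k: the constant instruments itself
y = X @ beta + eps

beta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)    # (Z'X)^{-1} Z'y

print("OLS slope:", beta_ols[1])      # pulled away from 0.5 by endogeneity
print("IV slope: ", beta_iv[1])       # should be close to 0.5
```

In this setup the OLS slope is biased upward because `x` and `eps` share the component `u`, whereas the IV slope should settle near the true value.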
Two-stage least squares (2SLS)
◼ When the number of instruments is lower
than the number of endogenous variables,
the equations $Z'y = Z'X\beta$ do not have a
unique solution => $\beta$ is underidentified.
◼ When the number of instruments is larger
than the number of endogenous variables,
the system $Z'y = Z'X\beta$ has more
equations than unknown parameters =>
$\beta$ is overidentified.
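Two-stage least squares (2SLS)
◼ The idea is to premultiply the $l \times n$ matrix $Z'$
by a $k \times l$ matrix $\Pi'$ to produce a $k \times n$
matrix $\Pi'Z'$ ($l$ is the number of instrumental
variables, $k$ is the number of unknown
parameters $\beta$).
◼ Hence we have:
$$\begin{aligned}
Y &= X\hat{\beta} \\
\Pi'Z'Y &= \Pi'Z'X\hat{\beta} \\
\hat{\beta} &= (\Pi'Z'X)^{-1}\,\Pi'Z'Y
\end{aligned}$$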
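A minimal sketch of the overidentified case follows. It assumes the k×l weighting matrix is built from the first-stage regression of X on Z, i.e. the l×k coefficient matrix (Z'Z)^{-1}Z'X, which is the usual 2SLS choice but is an assumption here rather than something the slide states. The function name `two_sls` and the simulated data are hypothetical.

```python
import numpy as np

def two_sls(y, X, Z):
    """2SLS with first-stage fitted values used as instruments."""
    # First stage: regress each column of X on Z (coefficient matrix is l x k)
    pi_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
    X_hat = Z @ pi_hat                       # n x k, so X_hat' = pi_hat' Z'
    # Second stage: beta = (pi_hat' Z' X)^{-1} pi_hat' Z' y
    return np.linalg.solve(X_hat.T @ X, X_hat.T @ y)

rng = np.random.default_rng(2)               # fixed seed, arbitrary choice
n = 100_000
beta = np.array([1.0, 0.5])                  # hypothetical true (constant, slope)

z1, z2 = rng.normal(size=n), rng.normal(size=n)   # two instruments
u = rng.normal(size=n)                            # unobserved common factor
eps = u + rng.normal(size=n)
x = 0.6 * z1 + 0.4 * z2 + u + rng.normal(size=n)  # endogenous regressor

X = np.column_stack([np.ones(n), x])         # k = 2 parameters
Z = np.column_stack([np.ones(n), z1, z2])    # l = 3 instruments > k
y = X @ beta + eps

beta_2sls = two_sls(y, X, Z)
print("2SLS estimate:", beta_2sls)           # slope should be close to 0.5
```

With l = k the same function reduces to the just-identified estimator, since the first-stage coefficient matrix is then square and cancels out of the formula.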
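◼ With a single regressor $X$ and a single
instrument $Z$:
$$\hat{\beta}_{IV} = \beta + \frac{\sum_{i=1}^{n}(Z_i - \bar{Z})\,\varepsilon_i}{\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$$
$$\operatorname{plim}\hat{\beta}_{IV} = \beta + \frac{\mathrm{Cov}(Z, \varepsilon)}{\mathrm{Cov}(Z, X)} = \beta + \frac{\sigma_{Z\varepsilon}}{\sigma_{ZX}}$$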
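The step from the estimator to its error decomposition can be spelled out as follows. This is a short worked derivation for the simple regression case; writing the model with an intercept $\alpha$ is a notational assumption made here.

```latex
% Assumed simple model: Y_i = \alpha + \beta X_i + \varepsilon_i,
% so that Y_i - \bar{Y} = \beta (X_i - \bar{X}) + (\varepsilon_i - \bar{\varepsilon}).
\begin{aligned}
\hat{\beta}_{IV}
  &= \frac{\sum_{i=1}^{n} (Z_i - \bar{Z})(Y_i - \bar{Y})}
          {\sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})} \\
  &= \frac{\sum_{i=1}^{n} (Z_i - \bar{Z})
           \left[ \beta (X_i - \bar{X}) + (\varepsilon_i - \bar{\varepsilon}) \right]}
          {\sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})} \\
  &= \beta + \frac{\sum_{i=1}^{n} (Z_i - \bar{Z})\,\varepsilon_i}
                  {\sum_{i=1}^{n} (Z_i - \bar{Z})(X_i - \bar{X})}
\end{aligned}
% The last line uses \sum_{i=1}^{n} (Z_i - \bar{Z})\,\bar{\varepsilon} = 0.
```

Dividing numerator and denominator by $n$ and letting $n$ grow turns the two sums into the covariances of $Z$ with $\varepsilon$ and of $Z$ with $X$.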
Problem with IV estimators
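◼ The asymptotic bias is small if $\sigma_{Z\varepsilon}$ is small and
$\sigma_{ZX}$ is large.
◼ Note that, for OLS:
$$\operatorname{plim}\hat{\beta}_{OLS} = \beta + \frac{\mathrm{Cov}(X, \varepsilon)}{\mathrm{Var}(X)} = \beta + \frac{\sigma_{X\varepsilon}}{\sigma_X^2}$$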
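◼ The IV estimator has a smaller asymptotic bias
than OLS when:
$$\left|\frac{\sigma_{Z\varepsilon}}{\sigma_{ZX}}\right| < \left|\frac{\sigma_{X\varepsilon}}{\sigma_X^2}\right| \iff |\rho_{Z\varepsilon}| < |\rho_{X\varepsilon}\,\rho_{XZ}|$$
where $\rho$ denotes a correlation coefficient.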
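The comparison can be made concrete with a few numbers. The sketch below plugs hypothetical population moments into the two asymptotic bias formulas; the values are illustrative assumptions, chosen so that the instrument is only slightly correlated with the error but is either strongly or weakly correlated with X.

```python
def asymptotic_bias_ols(cov_x_eps, var_x):
    """Asymptotic bias of OLS: Cov(X, eps) / Var(X)."""
    return cov_x_eps / var_x

def asymptotic_bias_iv(cov_z_eps, cov_z_x):
    """Asymptotic bias of IV: Cov(Z, eps) / Cov(Z, X)."""
    return cov_z_eps / cov_z_x

# Hypothetical population moments (not taken from any data set)
var_x = 1.0
cov_x_eps = 0.20        # X is moderately endogenous
cov_z_eps = 0.05        # the instrument is slightly invalid

bias_ols = asymptotic_bias_ols(cov_x_eps, var_x)
bias_iv_strong = asymptotic_bias_iv(cov_z_eps, cov_z_x=0.50)  # strong instrument
bias_iv_weak = asymptotic_bias_iv(cov_z_eps, cov_z_x=0.10)    # weak instrument

for label, bias in [("OLS", bias_ols),
                    ("IV, strong instrument", bias_iv_strong),
                    ("IV, weak instrument", bias_iv_weak)]:
    print(f"{label:>22}: asymptotic bias = {bias:.3f}")
```

With these assumed values the strong instrument gives a smaller bias than OLS, while the weak instrument magnifies the same small correlation with the error into a bias larger than that of OLS.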
Example
◼ We use the data on married working women in
MROZ.RAW to estimate the return to education in the
simple regression model: