Lecture 2 - Instrumental Variable

The document discusses instrumental variables (IV) in the context of OLS regression, focusing on the assumptions of exogeneity and the identification of instruments. It explains the two-stage least squares (2SLS) method for estimating parameters when endogeneity is present and outlines the testing for endogeneity using the Hausman test. Additionally, it highlights the challenges of finding valid instruments and the potential bias associated with weak instruments.

Uploaded by

tranhongcuc1304
Copyright
© © All Rights Reserved

Instrumental variables

Vu Van Huong
Instrumental variables
◼ Unbiasedness and consistency of OLS under the exogeneity assumption
◼ Identifying assumptions
◼ Instrumental variable: just-identification
◼ Two-stage least squares
◼ Test of endogeneity
◼ Problems with IV
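OLS regression: assumption of exogeneity
◼ Consider the model:
◼ $Y = X\beta + \varepsilon$
◼ where:
◼ $Y$ is an $n \times 1$ vector of observations on the explained variable
◼ $X$ is an $n \times k$ matrix of observations on $k$ explanatory variables (including a constant)
◼ $\beta$ is a $k \times 1$ vector of unknown parameters and $\varepsilon$ is an $n \times 1$ vector of error terms
◼ The main assumption is exogeneity of $X$, i.e., $E(\varepsilon \mid X) = 0$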
OLS regression
◼ Minimizing the sum of squared errors, we get:
$$\hat{\beta} = (X'X)^{-1}X'Y$$
◼ Under the exogeneity assumption:
◼ Unbiased estimator:
$$\begin{aligned}
\hat{\beta} &= (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) \\
&= (X'X)^{-1}X'X\beta + (X'X)^{-1}X'\varepsilon \\
&= \beta + (X'X)^{-1}X'\varepsilon
\end{aligned}$$
OLS regression
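◼ Then, by the law of iterated expectations:
$$\begin{aligned}
E(\hat{\beta}) &= E(\beta) + E\left[(X'X)^{-1}X'\varepsilon\right] \\
&= \beta + E\left[(X'X)^{-1}X'E(\varepsilon \mid X)\right] \\
&= \beta
\end{aligned}$$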
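The unbiasedness result can be checked numerically. The following is a minimal simulation sketch, not part of the original slides: it assumes a simple regression with one regressor, standard normal draws, and illustrative values for the intercept, slope, sample size and number of replications. The function names `ols_slope` and `mean_ols_slope` are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope of the OLS regression of y on x (with a constant)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def mean_ols_slope(alpha=1.0, beta=2.0, n=50, reps=2000, seed=0):
    """Average OLS slope over many samples with errors independent of x."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        eps = [rng.gauss(0.0, 1.0) for _ in range(n)]  # drawn independently of x
        y = [alpha + beta * xi + ei for xi, ei in zip(x, eps)]
        slopes.append(ols_slope(x, y))
    return sum(slopes) / reps


average_slope = mean_ols_slope()
print(round(average_slope, 3))  # should be close to the true slope of 2.0
```

Because the errors are generated independently of the regressor, the average of the estimated slopes across replications should sit close to the true slope, which is what $E(\hat{\beta}) = \beta$ predicts.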
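Violation of the exogeneity assumption: finding instruments
◼ When $X$ is not independent of $\varepsilon$, we cannot establish the unbiasedness and consistency of the OLS estimator.
◼ The standard solution is to use an instrument $Z$ for each endogenous variable, satisfying:
◼ $\operatorname{Cov}(Z, \varepsilon) = 0$ (the instrument is exogenous)
◼ $\operatorname{Cov}(Z, X) \neq 0$ (the instrument is relevant)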
Instrumental variables
◼ From y = X + , we have:
◼ Z’y = Z’X + Z’
◼ Since Z’y is a k*1 vector, Z’X is k*k matrix,
the estimator of  is obtained automatically:
ˆ = (ZX )−1 ZY
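As a small illustration of the just-identified formula $(Z'X)^{-1}Z'Y$, the sketch below uses a made-up four-observation data set with a constant and one endogenous regressor. The data are constructed (an assumption for illustration only) so that the error is exactly orthogonal to the instrument in the sample but correlated with the regressor. The helpers `solve_2x2` and `cross_product` are hypothetical names.

```python
def solve_2x2(a, b):
    """Solve the 2x2 linear system a @ x = b by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    x0 = (b[0] * a[1][1] - a[0][1] * b[1]) / det
    x1 = (a[0][0] * b[1] - a[1][0] * b[0]) / det
    return [x0, x1]


def cross_product(a_cols, b_cols):
    """Return A'B for matrices stored as lists of columns."""
    return [[sum(a * b for a, b in zip(ca, cb)) for cb in b_cols] for ca in a_cols]


# Made-up data: eps sums to zero and is orthogonal to z, but x depends on eps.
z = [1.0, 2.0, 3.0, 4.0]
eps = [1.0, -2.0, 1.0, 0.0]
x = [zi + ei for zi, ei in zip(z, eps)]               # endogenous regressor
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, eps)]   # true alpha = 1, beta = 2
ones = [1.0] * len(z)

Z_cols = [ones, z]   # instruments: constant and z
X_cols = [ones, x]   # regressors: constant and x

# IV estimator: solve (Z'X) b = Z'Y
beta_iv = solve_2x2(cross_product(Z_cols, X_cols),
                    [row[0] for row in cross_product(Z_cols, [y])])

# OLS estimator for comparison: solve (X'X) b = X'Y
beta_ols = solve_2x2(cross_product(X_cols, X_cols),
                     [row[0] for row in cross_product(X_cols, [y])])

print(beta_iv)   # intercept and slope from IV
print(beta_ols)  # intercept and slope from OLS
```

In this constructed sample the IV estimator recovers the true coefficients, while the OLS slope is pushed above 2 because $x$ is positively correlated with the error.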
Two-stage least squares (2SLS)
◼ When the number of instruments is lower than the number of endogenous variables, the system $Z'Y = Z'X\beta$ has fewer equations than unknown parameters and does not have a unique solution => $\beta$ is underidentified.
◼ When the number of instruments is larger than the number of endogenous variables, the system $Z'Y = Z'X\beta$ has more equations than unknown parameters $\beta$ => $\beta$ is overidentified.
Two-stage least squares (2SLS)
◼ The idea is to premultiply the $l \times n$ matrix $Z'$ by a $k \times l$ matrix $\Pi'$ to produce a $k \times n$ matrix $\Pi'Z'$ ($l$ is the number of instrumental variables, $k$ is the number of unknown parameters $\beta$).
◼ Hence we have:
$$\begin{aligned}
Y &= X\hat{\beta} \\
\Pi'Z'Y &= \Pi'Z'X\hat{\beta} \\
\hat{\beta} &= (\Pi'Z'X)^{-1}\Pi'Z'Y
\end{aligned}$$
as long as $\Pi'Z'X$ has full rank $k$.
Two-stage least squares (2SLS)
◼ $\Pi$ is estimated by:
$$\hat{\Pi} = (Z'Z)^{-1}Z'X$$
◼ Hence, $\hat{\Pi}$ is the $l \times k$ matrix of coefficients in the OLS regression of $X$ on $Z$.
◼ The 2SLS estimator is:
$$\hat{\beta}_{2SLS} = \left[X'Z(Z'Z)^{-1}Z'X\right]^{-1}X'Z(Z'Z)^{-1}Z'Y$$
Or: $\hat{\beta}_{2SLS} = (\hat{X}'X)^{-1}\hat{X}'Y$, where $\hat{X} = Z\hat{\Pi}$
Two-stage least squares (2SLS)
◼ 2SLS can be implemented in two stages:
(1) Run a regression of $X$ on $Z$:
$X = Z\Pi + u$
Compute the predicted value of $X$: $\hat{X} = Z\hat{\Pi}$
(2) Run a regression of $Y$ on $\hat{X}$:
$Y = \hat{X}\beta + v$
to obtain the estimator of $\beta$
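The two routes to the 2SLS estimator can be compared on a toy example. The sketch below is an illustration under stated assumptions, not the lecture's own code: it uses a made-up five-observation data set with a constant, one endogenous regressor and two instruments, constructed so that the error is orthogonal to every instrument in the sample. The helpers `transpose`, `matmul` and `solve` are hypothetical names for plain-Python matrix routines.

```python
def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    """Matrix product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, B):
    """Solve A @ X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


# Made-up data: eps is orthogonal to the constant, z1 and z2, but x depends on eps.
z1 = [-2, -1, 0, 1, 2]
z2 = [2, -1, -2, -1, 2]
eps = [-1, 2, 0, -2, 1]
x = [a + b + e for a, b, e in zip(z1, z2, eps)]
Y = [[1 + 2 * xi + ei] for xi, ei in zip(x, eps)]   # n x 1, true beta = (1, 2)
X = [[1, xi] for xi in x]                            # n x k with k = 2
Z = [[1, a, b] for a, b in zip(z1, z2)]              # n x l with l = 3

# Stage 1: regress X on Z, Pi_hat = (Z'Z)^{-1} Z'X, then X_hat = Z Pi_hat
Zt = transpose(Z)
Pi_hat = solve(matmul(Zt, Z), matmul(Zt, X))
X_hat = matmul(Z, Pi_hat)
Xh_t = transpose(X_hat)

# Formula from the slides: (X_hat'X)^{-1} X_hat'Y
beta_formula = solve(matmul(Xh_t, X), matmul(Xh_t, Y))

# Stage 2: OLS of Y on X_hat, (X_hat'X_hat)^{-1} X_hat'Y
beta_two_stage = solve(matmul(Xh_t, X_hat), matmul(Xh_t, Y))

print(beta_formula)
print(beta_two_stage)
```

Both computations give the same coefficients, since $\hat{X}'X = \hat{X}'\hat{X}$. One caveat worth keeping in mind: the standard errors reported by an ordinary second-stage OLS run are not the correct 2SLS standard errors, because they are based on residuals computed with $\hat{X}$ rather than $X$.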
Test of endogeneity
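◼ Test of endogeneity of a variable $X$:
$H_0$: $X$ is exogenous $\Rightarrow \operatorname{plim} \hat{\beta}_{OLS} = \operatorname{plim} \hat{\beta}_{IV/2SLS}$
$H_1$: $X$ is endogenous $\Rightarrow \operatorname{plim} \hat{\beta}_{OLS} \neq \operatorname{plim} \hat{\beta}_{IV/2SLS}$
◼ This test can be conducted as follows:
(1) Run a regression of $X$ on $Z$:
$X = Z\Pi + u$
Compute the residuals $\hat{u}$ (the estimated error term).
(2) Run a regression of $Y$ on $X$ and $\hat{u}$:
$Y = X\beta + \delta\hat{u} + \varepsilon$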
Test of endogeneity
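◼ The null hypothesis is now $H_0: \delta = 0$, where $\delta$ is the coefficient on $\hat{u}$.
◼ Besides, we can use the statistic (approximately following a t distribution under $H_0$):
$$t = \frac{\hat{\beta}_{IV/2SLS} - \hat{\beta}_{OLS}}{\sqrt{\operatorname{Var}(\hat{\beta}_{IV/2SLS}) - \operatorname{Var}(\hat{\beta}_{OLS})}}$$
to test the null hypothesis of exogeneity of $X$ versus the alternative hypothesis of endogeneity of $X$.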
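The regression-based version of the test (regress $X$ on $Z$, keep the residuals, then add them to the regression of $Y$ on $X$ and test their coefficient) can be sketched on simulated data. Everything below is an assumption made for illustration: the data-generating process, the sample size, the seed, the helper names, and the use of conventional homoskedastic OLS standard errors for the t statistic.

```python
import math
import random


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def solve(A, B):
    """Solve A @ X = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(v) for v in B[i]] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]


def ols(X, y):
    """Return OLS coefficients, conventional standard errors and residuals."""
    n, k = len(X), len(X[0])
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    beta = [row[0] for row in solve(XtX, matmul(Xt, [[v] for v in y]))]
    resid = [yi - sum(b * xij for b, xij in zip(beta, row)) for yi, row in zip(y, X)]
    sigma2 = sum(e * e for e in resid) / (n - k)
    identity = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]
    XtX_inv = solve(XtX, identity)
    se = [math.sqrt(sigma2 * XtX_inv[j][j]) for j in range(k)]
    return beta, se, resid


# Simulated data in which x is endogenous: eps shares the component v with x.
rng = random.Random(0)
n = 2000
z = [rng.gauss(0.0, 1.0) for _ in range(n)]
v = [rng.gauss(0.0, 1.0) for _ in range(n)]
w = [rng.gauss(0.0, 1.0) for _ in range(n)]
eps = [0.8 * vi + 0.6 * wi for vi, wi in zip(v, w)]
x = [zi + vi for zi, vi in zip(z, v)]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, eps)]

# Step 1: regress x on (1, z) and keep the residuals u_hat.
_, _, u_hat = ols([[1.0, zi] for zi in z], x)

# Step 2: regress y on (1, x, u_hat) and test the coefficient on u_hat.
beta, se, _ = ols([[1.0, xi, ui] for xi, ui in zip(x, u_hat)], y)
t_delta = beta[2] / se[2]

print("coefficient on x:", round(beta[1], 3))
print("coefficient on u_hat:", round(beta[2], 3), "t statistic:", round(t_delta, 1))
```

With this design the coefficient on $\hat{u}$ should be clearly different from zero, so exogeneity of $x$ would be rejected. The coefficient on $x$ in the second step coincides with the IV estimate, which is why it should land near the true value of 2 even though OLS of $y$ on $x$ alone would not.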
Test of endogeneity
◼ To perform the test, we need an instrument for each endogenous variable.
◼ This test is called the Hausman test or the Durbin-Wu-Hausman test.
◼ In addition, when there is more than one IV for one endogenous variable, we can perform an overidentification test. The idea is that, if all instruments are valid, the 2SLS estimates obtained using all of the instruments should be similar to those obtained using just a necessary subset of the instruments.
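The intuition behind the overidentification test can be illustrated informally by computing the IV slope with one instrument at a time and comparing the results. This is only a sketch of the idea, not a formal test statistic. It relies on the single-instrument formula $\operatorname{Cov}(Z, Y)/\operatorname{Cov}(Z, X)$ and on made-up data in which `z1` and `z2` are valid by construction while `z_bad` is deliberately correlated with the error. All names are hypothetical.

```python
def sample_cov(a, b):
    """Sample covariance with the n - 1 divisor."""
    n = len(a)
    a_bar = sum(a) / n
    b_bar = sum(b) / n
    return sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b)) / (n - 1)


def iv_slope(z, x, y):
    """IV slope with a single instrument: Cov(z, y) / Cov(z, x)."""
    return sample_cov(z, y) / sample_cov(z, x)


# Made-up data: eps is orthogonal to z1 and z2 in the sample, true slope is 2.
z1 = [-2, -1, 0, 1, 2]
z2 = [2, -1, -2, -1, 2]
eps = [-1, 2, 0, -2, 1]
x = [a + b + e for a, b, e in zip(z1, z2, eps)]
y = [1 + 2 * xi + ei for xi, ei in zip(x, eps)]

# An invalid instrument: it contains the error term itself.
z_bad = [a + e for a, e in zip(z1, eps)]

estimates = {
    "z1": iv_slope(z1, x, y),
    "z2": iv_slope(z2, x, y),
    "z_bad": iv_slope(z_bad, x, y),
}
print(estimates)
```

The two valid instruments agree on the slope, while the invalid one gives a different answer. A disagreement of this kind between instrument subsets is the pattern an overidentification test is designed to detect.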
Problem with IV estimators
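◼ Finding a valid instrument is difficult. A weak instrument can lead to large bias.
◼ Consider a simple equation:
$Y = \alpha + \beta X + \varepsilon$, with an endogenous variable $X$ and an instrument $Z$.
◼ $\beta$ can be identified as:
$$\begin{aligned}
\operatorname{Cov}(Z, Y) &= \operatorname{Cov}(Z, \alpha + \beta X + \varepsilon) \\
&= \operatorname{Cov}(Z, \alpha) + \beta\,\operatorname{Cov}(Z, X) + \operatorname{Cov}(Z, \varepsilon) \\
&= \beta\,\operatorname{Cov}(Z, X)
\end{aligned}$$
=> $\beta = \operatorname{Cov}(Z, Y)/\operatorname{Cov}(Z, X)$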
Problem with IV estimators
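◼ We can estimate this by:
$$\hat{\beta}_{IV} = \frac{\frac{1}{n-1}\sum_{i=1}^{n}(Z_i - \bar{Z})(Y_i - \bar{Y})}{\frac{1}{n-1}\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}$$
◼ And have:
$$\hat{\beta}_{IV} = \beta + \frac{\sum_{i=1}^{n}(Z_i - \bar{Z})\varepsilon_i}{\sum_{i=1}^{n}(Z_i - \bar{Z})(X_i - \bar{X})}, \qquad \operatorname{plim}\hat{\beta}_{IV} = \beta + \frac{\operatorname{Cov}(Z, \varepsilon)}{\operatorname{Cov}(Z, X)} = \beta + \frac{\sigma_{Z\varepsilon}}{\sigma_{ZX}}$$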
Problem with IV estimators
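◼ The asymptotic bias is small if $\sigma_{Z\varepsilon}$ is small and $\sigma_{ZX}$ is large.
◼ Note that, for OLS:
$$\operatorname{plim}\hat{\beta}_{OLS} = \beta + \frac{\operatorname{Cov}(X, \varepsilon)}{\operatorname{Var}(X)} = \beta + \frac{\sigma_{X\varepsilon}}{\sigma_X^2}$$
◼ The IV estimator has a smaller asymptotic bias than OLS when:
$$\frac{\sigma_{Z\varepsilon}}{\sigma_{ZX}} < \frac{\sigma_{X\varepsilon}}{\sigma_X^2} \iff \rho_{Z\varepsilon} < \rho_{X\varepsilon}\,\rho_{XZ}$$
(stated for positive covariances; in general, compare absolute values)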
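To make the bias comparison concrete, the sketch below plugs hypothetical population moments into the two asymptotic bias expressions, $\sigma_{Z\varepsilon}/\sigma_{ZX}$ for IV and $\sigma_{X\varepsilon}/\sigma_X^2$ for OLS. The numbers are invented for illustration and the function names are assumptions.

```python
def asymptotic_bias_iv(cov_z_eps, cov_z_x):
    """Asymptotic bias of the IV slope: Cov(Z, eps) / Cov(Z, X)."""
    return cov_z_eps / cov_z_x


def asymptotic_bias_ols(cov_x_eps, var_x):
    """Asymptotic bias of the OLS slope: Cov(X, eps) / Var(X)."""
    return cov_x_eps / var_x


# Hypothetical moments: the instrument is only slightly correlated with the error.
cov_z_eps = 0.02
cov_x_eps = 0.10
var_x = 1.0

bias_ols = asymptotic_bias_ols(cov_x_eps, var_x)
bias_iv_weak = asymptotic_bias_iv(cov_z_eps, cov_z_x=0.1)    # weak instrument
bias_iv_strong = asymptotic_bias_iv(cov_z_eps, cov_z_x=0.8)  # strong instrument

print("OLS bias:", bias_ols)
print("IV bias, weak instrument:", bias_iv_weak)
print("IV bias, strong instrument:", bias_iv_strong)
```

With the same small violation of instrument exogeneity, the strong instrument gives a smaller bias than OLS, whereas the weak instrument gives a larger one. This is the sense in which a weak instrument can make IV worse than OLS.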
Example
◼ We use the data on married working women in MROZ.RAW to estimate the return to education in the simple regression model:
$\log(wage) = \beta_0 + \beta_1\,educ + u$
◼ We use father's education (fatheduc) as an instrumental variable for educ. We have to maintain that fatheduc is uncorrelated with $u$. The second requirement is that educ and fatheduc are correlated.
