Short Guides to Microeconometrics Kurt Schmidheiny

Fall 2023 University of Basel

Instrumental Variables
matrix-free

1 Introduction

This handout extends the handout on "The Multiple Linear Regression Model" and refers to its definitions and assumptions in section 2. It discusses the violation of the exogeneity assumption (OLS3), its consequences and the potential solution through the use of instrumental variables.
In many applications of the linear model, we suspect that some regressors are endogenous, i.e. one or more regressors are correlated with the error term, $\mathrm{Cov}[x_{ik}, u_i] \neq 0$. In this situation, OLS cannot consistently estimate the causal effect of the regressor on the dependent variable.
Sometimes, we are able to find exogenous variables $z_{i\ell}$ which are correlated with the endogenous regressor but not correlated with the error term, i.e. $\mathrm{Cov}[z_{i\ell}, u_i] = 0$. Such variables $z_{i\ell}$ are called instrumental variables or instruments. If there are enough good instruments, we can estimate the causal effect of the regressor on the dependent variable.

2 Canonical Examples

2.1 Omitted Variables

Consider the following regression model

yi = γ0 + xi1 β1 + xi2 β2 + vi

which conforms with standard OLS assumptions. Suppose that the vari-
able x2 is not observed. The estimated regression model is therefore

yi = γ0 + xi1 β1 + ui

Version: 3-1-2024, 14:13



where $u_i = x_{i2}\beta_2 + v_i$. The regressor $x_1$ is therefore correlated with the error term $u$ if it is correlated with the omitted variable $x_2$. As $x_{i1}$ and $x_{i2}$ are scalars, $\mathrm{cov}(x_{i1}, u_i) = \beta_2\,\mathrm{cov}(x_{i1}, x_{i2})$.
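The covariance above, and the resulting inconsistency of OLS, can be derived in two lines. This is a sketch that only uses the assumption already made for the full model, namely that $v_i$ is uncorrelated with $x_{i1}$; the label $\hat\beta_1^{OLS}$ for the OLS slope of the short regression is introduced here for illustration.

```latex
\begin{align*}
\mathrm{cov}(x_{i1}, u_i)
  &= \mathrm{cov}(x_{i1},\, x_{i2}\beta_2 + v_i)
   = \beta_2\,\mathrm{cov}(x_{i1}, x_{i2}), \\
\mathrm{plim}\,\hat\beta_1^{OLS}
  &= \frac{\mathrm{cov}(x_{i1}, y_i)}{V(x_{i1})}
   = \beta_1 + \beta_2\,\frac{\mathrm{cov}(x_{i1}, x_{i2})}{V(x_{i1})}.
\end{align*}
```

The second term vanishes only if $\beta_2 = 0$ or if $x_1$ and $x_2$ are uncorrelated.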

2.2 Simultaneity and Reversed Causality

Consider the following system of equations

yi1 = α1 + β1 zi1 + γ1 yi2 + ui1

yi2 = α2 + β2 zi2 + γ2 yi1 + ui2

where we assume that both $z_1$ and $z_2$ are uncorrelated with both $u_1$ and $u_2$. This system is called a structural simultaneous equation system since $y_1$ and $y_2$ are simultaneously determined. The regressor $y_2$ depends on $y_1$ through the second equation. As $y_1$ depends directly on $u_1$, the regressor $y_2$ is also correlated with $u_1$ and hence endogenous in the first equation. Assuming that $u_1$ and $u_2$ are uncorrelated, $\mathrm{cov}(y_{i2}, u_{i1}) = [\gamma_2/(1-\gamma_1\gamma_2)]\,\sigma_{u_1}^2$. The above equation system is also described as reversed causality because the dependent variable $y_1$ has a feedback effect on the regressor $y_2$.
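The covariance between $y_2$ and $u_1$ follows from substituting the first equation into the second and solving for $y_{i2}$. The derivation below is a sketch that additionally assumes $\gamma_1\gamma_2 \neq 1$, so that the system has a solution.

```latex
\begin{align*}
y_{i2} &= \alpha_2 + \beta_2 z_{i2}
          + \gamma_2(\alpha_1 + \beta_1 z_{i1} + \gamma_1 y_{i2} + u_{i1}) + u_{i2} \\
\Rightarrow\quad
y_{i2} &= \frac{\alpha_2 + \gamma_2\alpha_1 + \gamma_2\beta_1 z_{i1} + \beta_2 z_{i2}
          + \gamma_2 u_{i1} + u_{i2}}{1 - \gamma_1\gamma_2} \\
\Rightarrow\quad
\mathrm{cov}(y_{i2}, u_{i1}) &= \frac{\gamma_2}{1 - \gamma_1\gamma_2}\,\sigma_{u_1}^2
\end{align*}
```

The last step uses that $z_1$, $z_2$ and $u_2$ are uncorrelated with $u_1$.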
In the above example $z_2$ and $z_1$ are straightforward instruments for IV estimation of the first and second equation, respectively. [1]

[1] Instead of estimating the single structural equations directly by IV, it is possible to formulate and estimate a so-called reduced form of the above equation system. The right-hand side of the reduced form equations consists of exogenous variables only. If the system is identified, the parameters in the structural form can be deduced from the estimated parameters in the reduced form.

2.3 Measurement Errors (Errors in Variables)

Consider the true regression model

$$y_i = \gamma_0 + \beta_1 x_i^* + u_i^*$$

which conforms with the standard OLS assumptions. Suppose that the variable $x^*$ is only observed with an error

$$x_i = x_i^* + v_i$$

where the error $v$ is uncorrelated with $x^*$ and with $u^*$. The estimated regression model uses $x$ as a proxy for $x^*$

$$y_i = \gamma_0 + \beta_1 x_i + u_i$$

where $u_i = u_i^* - \beta_1 v_i$. The regressor $x$ is therefore correlated with the error term $u$ as both depend on $v$. Assuming independence between $v$ and $u^*$, the covariance in the above example is $\mathrm{cov}(x, u) = -\beta_1 \sigma_v^2$.
In this special case of a bivariate regression, the OLS estimator is "biased towards zero" as

$$|\mathrm{plim}\,\hat\beta_1| = |\beta_1| \, \frac{1}{1 + V(v_i)/V(x_i^*)} < |\beta_1|.$$
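The attenuation result can be checked with a small simulation. The following is a minimal sketch, not part of the handout: the parameter values ($\beta_1 = 2$, $V(x^*) = 1$, $V(v) = 0.5$), the normal distributions and the helper `cov` are all assumptions chosen for illustration.

```python
import random

def cov(a, b):
    """Sample covariance (dividing by N) of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

random.seed(0)
N = 100_000
gamma0, beta1 = 1.0, 2.0      # assumed true parameters
var_xstar, var_v = 1.0, 0.5   # assumed V(x*) and V(v)

x_star = [random.gauss(0.0, var_xstar ** 0.5) for _ in range(N)]
v = [random.gauss(0.0, var_v ** 0.5) for _ in range(N)]
u_star = [random.gauss(0.0, 1.0) for _ in range(N)]

x = [xs + vi for xs, vi in zip(x_star, v)]                        # mismeasured regressor
y = [gamma0 + beta1 * xs + us for xs, us in zip(x_star, u_star)]  # true model

beta_ols = cov(x, y) / cov(x, x)               # OLS slope of y on the proxy x
beta_plim = beta1 / (1.0 + var_v / var_xstar)  # attenuation formula
u = [us - beta1 * vi for us, vi in zip(u_star, v)]
cov_xu = cov(x, u)                             # should be close to -beta1 * var_v

print(f"OLS slope {beta_ols:.3f}, formula {beta_plim:.3f}, cov(x,u) {cov_xu:.3f}")
```

With these assumed values the formula gives $2/1.5 \approx 1.33$, so the OLS slope settles well below the true $\beta_1 = 2$.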

3 The Econometric Model

Consider the multiple linear regression model for observations i = 1, ..., N

yi = γ0 + β1 xi1 + ... + βK xiK + γ1 wi1 + ... + γM wiM + ui

where $y_i$ is the dependent variable, $x_{i1}, ..., x_{iK}$ are $K$ endogenous regressors, $w_{i1}, ..., w_{iM}$ are $M$ exogenous regressors, $\gamma_0$, $\beta_k$ and $\gamma_m$ are $K + M + 1$ parameters and $u_i$ is the error term. Each observation is furthermore described by $L$ exogenous variables $z_{i1}, ..., z_{iL}$, called the instruments or excluded instruments.
The data generation process (dgp) is fully described by the following
set of assumptions:

IV1: Linearity
$y_i = \gamma_0 + \beta_1 x_{i1} + ... + \beta_K x_{iK} + \gamma_1 w_{i1} + ... + \gamma_M w_{iM} + u_i$, $E[u_i] = 0$

IV2: Independence
$\{x_{i1}, ..., x_{iK}, w_{i1}, ..., w_{iM}, z_{i1}, ..., z_{iL}, y_i\}_{i=1}^N$ i.i.d.

IV2 means that regressors, instruments and dependent variables are independent across observations. In practice, this is guaranteed by random sampling.

IV3: Exogeneity
$\forall m: \mathrm{Cov}[w_{im}, u_i] = 0$
$\forall \ell: \mathrm{Cov}[z_{i\ell}, u_i] = 0$
IV3 means that the exogenous variables (exogenous regressors and excluded instruments) are uncorrelated with the error term.

IV4: Error Variance
a) $V[u_i | w_{i1}, ..., w_{iM}, z_{i1}, ..., z_{iL}] = \sigma^2 < \infty$ (homoscedasticity)
b) $V[u_i | w_{i1}, ..., w_{iM}, z_{i1}, ..., z_{iL}] = \sigma_i^2 = g(w_{i1}, ..., z_{iL}) < \infty$ (conditional heteroscedasticity)

IV5: Identifiability
$(1, \hat x_{i1}, ..., \hat x_{iK}, w_{i1}, ..., w_{iM})$ are not linearly dependent
$0 < V[\hat x_{ik}] < \infty$ and $0 < \hat V[\hat x_{ik}]$ for all $k$
$0 < V[w_{im}] < \infty$ and $0 < \hat V[w_{im}]$ for all $m$
where $\hat x_{ik}$ is predicted by a regression of $x_k$ on $z_1, ..., z_L, w_1, ..., w_M$ and a constant
IV5 is also called instrument relevance and requires that there are at
least as many excluded instruments as endogenous regressors, L ≥ K,
that all instruments (but the constant) have non-zero variance and not
too many extreme values, that the instruments are relevant predictors for
the endogenous regressors and that the predicted endogenous regressors
are not perfectly collinear, i.e. that different endogenous regressors are
differently predicted by the instruments.

4 Estimation with OLS

The OLS estimators of $\beta_k$ and $\gamma_m$ are biased and inconsistent because OLS3 is violated.

5 Estimation with IV (2SLS)

The instrumental variables estimators of $\beta_k$ and $\gamma_m$ can be computed in a two-stage procedure:

(1) Regress each $x_k$ on $z_1, ..., z_L, w_1, ..., w_M$ and a constant. Predict $\hat x_{ik}$ for each variable $k$ and each observation $i$.

(2) Regress $y$ on $\hat x_1, ..., \hat x_K, w_1, ..., w_M$ and a constant.

The estimated coefficients $\hat\beta_k^{IV}$ and $\hat\gamma_m^{IV}$ of the 2nd stage are called the instrumental variables (IV) or two-stage least squares (2SLS) estimates.
For the bivariate regression model with one endogenous regressor and one instrument, $K = L = 1$ and $M = 0$, the IV estimators of $\gamma_0$ and $\beta_1$ can be directly calculated as:

$$\hat\beta_1^{IV} = \frac{\sum_{i=1}^N (\hat x_i - \bar{\hat x})(y_i - \bar y)}{\sum_{i=1}^N (\hat x_i - \bar{\hat x})^2} = \frac{\sum_{i=1}^N (z_i - \bar z)(y_i - \bar y)}{\sum_{i=1}^N (z_i - \bar z)(x_i - \bar x)}$$

$$\hat\gamma_0^{IV} = \bar y - \hat\beta_1^{IV}\,\bar x$$
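The two-stage procedure and the direct formula can be compared on simulated data. This is a minimal sketch for the case $K = L = 1$, $M = 0$ only; the data generating process (a common shock `c` that enters both $x$ and $u$, a first-stage coefficient of 0.8) and the helper names `mean` and `ols` are assumptions made for illustration.

```python
import random

def mean(a):
    return sum(a) / len(a)

def ols(x, y):
    """Bivariate OLS of y on x and a constant; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope

random.seed(1)
N = 50_000
gamma0, beta1 = 1.0, 2.0                         # assumed true parameters
z = [random.gauss(0.0, 1.0) for _ in range(N)]   # instrument
c = [random.gauss(0.0, 1.0) for _ in range(N)]   # common shock: makes x endogenous
u = [ci + random.gauss(0.0, 1.0) for ci in c]
x = [0.8 * zi + ci + random.gauss(0.0, 1.0) for zi, ci in zip(z, c)]
y = [gamma0 + beta1 * xi + ui for xi, ui in zip(x, u)]

_, b_ols = ols(x, y)                             # inconsistent benchmark

a0, a1 = ols(z, x)                               # stage 1: x on z and a constant
x_hat = [a0 + a1 * zi for zi in z]
g_2sls, b_2sls = ols(x_hat, y)                   # stage 2: y on x_hat and a constant

mz, mx, my = mean(z), mean(x), mean(y)           # direct ratio formula
b_ratio = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
           / sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x)))
g_ratio = my - b_ratio * mx

print(f"OLS {b_ols:.3f}, 2SLS {b_2sls:.3f}, ratio {b_ratio:.3f}")
```

Both IV computations coincide up to floating point error, while the OLS slope is pulled away from the assumed $\beta_1 = 2$ by the common shock.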

6 Small Sample Properties of the IV Estimator

No small sample properties can be analytically established. The IV estimator is in general biased.

7 Asymptotic Properties of the IV Estimator

The following large sample properties can be established under assumptions IV1 through IV4:

• The IV estimator is consistent:

$$\mathrm{plim}\,\hat\beta_k^{IV} = \beta_k$$

• The IV estimator is asymptotically normally distributed:

$$\sqrt{N}\,(\hat\beta_k^{IV} - \beta_k) \xrightarrow{d} N(0, \varsigma^2)$$

• The IV estimator is therefore approximately normally distributed:

$$\hat\beta_k^{IV} \overset{A}{\sim} N\left(\beta_k, \mathrm{Avar}[\hat\beta_k^{IV}]\right)$$

where the asymptotic variance is $\mathrm{Avar}[\hat\beta_k^{IV}] = \varsigma^2/N$. For the bivariate regression model under IV4a (homoscedasticity) it can be consistently estimated as

$$\widehat{\mathrm{Avar}}[\hat\beta_1^{IV}] = \frac{\hat\sigma^2}{\sum_{i=1}^N (\hat x_i - \bar{\hat x})^2}$$

with

$$\hat\sigma^2 = \frac{1}{N}\sum_{i=1}^N \hat u_i^2$$

where $\hat u_i = y_i - (\hat\gamma_0^{IV} + \hat\beta_1^{IV} x_i)$. In the general case with $K$ endogenous regressors, $M$ exogenous regressors, $L$ instruments and IV4b (heteroscedasticity), $\mathrm{Avar}[\hat\beta_k^{IV}]$ can be consistently estimated by the robust or Eicker-Huber-White estimator (see handout on "Heteroscedasticity in the Linear Model").

Note: The estimated asymptotic variance given in the usual output of the 2nd stage OLS regression is incorrect since $\hat\sigma^2$ will be based on
$\hat u_i = y_i - (\hat\gamma_0 + \hat\beta_1 \hat x_{i1} + ... + \hat\beta_K \hat x_{iK} + \hat\gamma_1 w_{i1} + ... + \hat\gamma_M w_{iM})$ rather than
$\hat u_i = y_i - (\hat\gamma_0 + \hat\beta_1 x_{i1} + ... + \hat\beta_K x_{iK} + \hat\gamma_1 w_{i1} + ... + \hat\gamma_M w_{iM})$.
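The difference between the two residual definitions in the note can be made concrete for the bivariate case under homoscedasticity. The sketch below is illustrative only: the data generating process is assumed (error variance 2 by construction), as are the helper names.

```python
import math
import random

def mean(a):
    return sum(a) / len(a)

def ols(x, y):
    """Bivariate OLS of y on x and a constant; returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return my - slope * mx, slope

random.seed(2)
N = 20_000
z = [random.gauss(0.0, 1.0) for _ in range(N)]
c = [random.gauss(0.0, 1.0) for _ in range(N)]   # common shock in x and u
u = [ci + random.gauss(0.0, 1.0) for ci in c]    # V(u) = 2 by construction
x = [0.8 * zi + ci + random.gauss(0.0, 1.0) for zi, ci in zip(z, c)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

a0, a1 = ols(z, x)                               # first stage
x_hat = [a0 + a1 * zi for zi in z]
g_iv, b_iv = ols(x_hat, y)                       # second stage
mxh = mean(x_hat)
sxx_hat = sum((xh - mxh) ** 2 for xh in x_hat)

# Correct residuals use the observed x; the 2nd stage OLS output uses x_hat.
u_hat = [yi - (g_iv + b_iv * xi) for xi, yi in zip(x, y)]
u_naive = [yi - (g_iv + b_iv * xh) for xh, yi in zip(x_hat, y)]

sigma2 = sum(r * r for r in u_hat) / N
sigma2_naive = sum(r * r for r in u_naive) / N
se = math.sqrt(sigma2 / sxx_hat)
se_naive = math.sqrt(sigma2_naive / sxx_hat)

print(f"sigma2 {sigma2:.2f} vs naive {sigma2_naive:.2f}; se {se:.4f} vs naive {se_naive:.4f}")
```

In this particular design the naive variance is far too large; in other designs it can err in the other direction, which is why dedicated IV routines should be used for inference.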

8 What are Valid Instruments

Valid instruments are typically derived from natural or random experiments (Angrist and Krueger 2001). Instruments are valid if the following two requirements are satisfied:

(1) Instrument Exogeneity (IV3): Valid instruments are uncorrelated with the error term. This requirement needs a strong theoretical argument and can in general not be tested (see section 9). The theoretical argument has to

(a) convincingly rule out any direct effect of the instruments on the dependent variable or any effect running through omitted variables. This is sometimes called the exclusion restriction.
(b) convincingly rule out any reverse effect of the dependent variable on the instruments.
(c) convincingly describe why the instruments influence the endogenous regressors. This is the influence after controlling for the effect through exogenous included regressors. If you do not understand why excluded instruments and endogenous regressors are correlated, then this correlation is likely a sign that either (a) or (b) is violated.

(2) Instrument Relevance (IV5): Valid instruments are highly correlated with the endogenous regressors even after controlling for the exogenous regressors. This requirement can be empirically tested in the first stage regression (see section 10).

In practice the two requirements are often conflicting.

9 Testing for the Exogeneity of the Instruments

The exogeneity of the instruments (IV3) can in general not be tested. In case we have more instruments than necessary, $L > K$, we can perform a so-called J-test for overidentifying restrictions. This tests whether all instruments are exogenous, assuming that at least one of the instruments is exogenous. The J-test will therefore not necessarily detect a situation in which all instruments are endogenous.

10 Testing for the Relevance of the Instruments

Instruments that have a low correlation with the endogenous regressors after controlling for the exogenous regressors are called weak instruments. There is empirical and theoretical evidence that IV estimation with weak instruments has poor statistical properties and may perform even worse than OLS (surveyed in Stock, Wright and Yogo 2002). In particular, hypothesis tests may not have correct size and confidence intervals may not be correct even in very large samples.
The relevance of the instruments is tested in the first-stage regression. As a rule of thumb, the F-statistic of a joint test of the null hypothesis that the coefficients on all excluded instruments ($z_{i1}, ..., z_{iL}$) are equal to zero should be bigger than 10 in case of a single endogenous regressor. This F-test should always be reported when reporting IV estimates. In case of a single instrument and a single endogenous regressor, this implies that the t-value for the instrument should be bigger than $\sqrt{10} \approx 3.2$ or the corresponding p-value below 0.0016.
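For a single instrument and no exogenous regressors, the first-stage F-statistic is simply the squared t-value of the instrument. The sketch below illustrates this with the usual homoscedastic OLS standard error; the two first-stage coefficients (0.8 and 0.02), the sample size and the function name `first_stage_t` are assumptions for illustration.

```python
import math
import random

def first_stage_t(z, x):
    """t-value of the slope in an OLS regression of x on z and a constant."""
    n = len(z)
    mz, mx = sum(z) / n, sum(x) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    szx = sum((zi - mz) * (xi - mx) for zi, xi in zip(z, x))
    slope = szx / szz
    intercept = mx - slope * mz
    rss = sum((xi - intercept - slope * zi) ** 2 for zi, xi in zip(z, x))
    se = math.sqrt(rss / (n - 2) / szz)   # homoscedastic standard error
    return slope / se

random.seed(3)
N = 1_000
z = [random.gauss(0.0, 1.0) for _ in range(N)]
noise = [random.gauss(0.0, 1.4) for _ in range(N)]
x_strong = [0.8 * zi + ni for zi, ni in zip(z, noise)]    # relevant instrument
x_weak = [0.02 * zi + ni for zi, ni in zip(z, noise)]     # nearly irrelevant instrument

t_strong = first_stage_t(z, x_strong)
t_weak = first_stage_t(z, x_weak)
F_strong, F_weak = t_strong ** 2, t_weak ** 2             # F = t^2 when L = 1

print(f"strong: t = {t_strong:.1f}, F = {F_strong:.1f}")
print(f"weak:   t = {t_weak:.1f}, F = {F_weak:.1f}")
```

Only the first design passes the rule of thumb comfortably; with several excluded instruments or additional exogenous regressors a joint F-test from a full regression routine is needed instead.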

11 Reduced Form Estimation

In the presence of weak instruments (see section 10), hypothesis tests based on IV estimates are no longer correct. Reduced form estimation offers a simple approach to test the null hypothesis $H_0$ that all $K$ coefficients $\beta_1, ..., \beta_K$ related to the endogenous explanatory variables $x_{i1}, ..., x_{iK}$ are simultaneously equal to zero.
The reduced form estimation is an OLS regression of the dependent variable $y_i$ on a constant, all excluded instruments $z_{i1}, ..., z_{iL}$ and all exogenous regressors $w_{i1}, ..., w_{iM}$

$$y_i = \phi_0 + \delta_1 z_{i1} + ... + \delta_L z_{iL} + \phi_1 w_{i1} + ... + \phi_M w_{iM} + v_i$$

where $\phi_0$, $\delta_\ell$ and $\phi_m$ are $L + M + 1$ parameters and $v_i$ is the error term. Under $H_0$, the excluded instruments do not have an effect on the dependent variable. The null hypothesis $H_0: \beta_1 = 0, ..., \beta_K = 0$ can therefore be tested by testing whether the coefficients $\delta_1, ..., \delta_L$ related to the excluded instruments $z_{i\ell}$ are simultaneously equal to zero in the reduced form regression. This can be tested with a standard joint Wald test. In case of a single endogenous regressor and a single instrument, it can be tested with a standard t-test. The reduced form test does not involve the first-stage regression(s) and is therefore also correct if the instruments are weak. See Chernozhukov and Hansen (2008) for motivation and generalizations.
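For one endogenous regressor and one instrument, the reduced form test is a t-test on the instrument in a regression of $y$ on $z$ and a constant. The sketch below shows the mechanics on simulated data; the data generating process (first-stage coefficient 0.2, a common shock making $x$ endogenous) and the helper names `slope_and_t` and `simulate` are assumptions, and the standard error is the homoscedastic one.

```python
import math
import random

def slope_and_t(z, y):
    """Slope and t-value from an OLS regression of y on z and a constant."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    szz = sum((zi - mz) ** 2 for zi in z)
    slope = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / szz
    intercept = my - slope * mz
    rss = sum((yi - intercept - slope * zi) ** 2 for zi, yi in zip(z, y))
    return slope, slope / math.sqrt(rss / (n - 2) / szz)

def simulate(beta1, pi, n, rng):
    """Draw (z, y) from a model where x = pi*z + c + noise and u = c + noise."""
    z, y = [], []
    for _ in range(n):
        zi, c = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        xi = pi * zi + c + rng.gauss(0.0, 1.0)
        ui = c + rng.gauss(0.0, 1.0)
        z.append(zi)
        y.append(1.0 + beta1 * xi + ui)
    return z, y

rng = random.Random(4)
z0, y0 = simulate(beta1=0.0, pi=0.2, n=5_000, rng=rng)   # H0 true
z1, y1 = simulate(beta1=2.0, pi=0.2, n=5_000, rng=rng)   # H0 false

delta_null, t_null = slope_and_t(z0, y0)
delta_alt, t_alt = slope_and_t(z1, y1)
print(f"H0 true:  delta = {delta_null:.3f}, t = {t_null:.2f}")
print(f"H0 false: delta = {delta_alt:.3f}, t = {t_alt:.2f}")
```

In the second design the reduced form coefficient is close to the product of the first-stage coefficient and $\beta_1$, here $0.2 \times 2 = 0.4$, and the t-test rejects.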

12 Testing for the Exogeneity of the Regressors

We may also want to know if there is an endogeneity problem in an application. This is usually tested by a (Durbin-Wu-)Hausman test. However, the Hausman test is only valid under homoscedasticity and often involves the cumbersome generalized inversion of a singular matrix.
Exogeneity of the regressors is better tested by running an auxiliary regression (Wooldridge 2010, eq. 6.25)

$$y_i = \gamma_0 + \beta_1 x_{i1} + ... + \beta_K x_{iK} + \gamma_1 w_{i1} + ... + \gamma_M w_{iM} + \delta_1 \hat v_{i1} + ... + \delta_K \hat v_{iK} + e_i$$

where $\hat v_{ik}$ are the residuals from the first stage regressions for all endogenous regressors (the variables $x_{i1}, ..., x_{iK}$). The exogeneity test is then a joint F- or Wald test that all $K$ coefficients $\delta_1, ..., \delta_K$ are equal to zero. This test is robust to heteroscedasticity if the robust (Eicker-Huber-White) variance estimator is used.
Note: This is a test for the exogeneity of the regressors $x_i$ and not for the exogeneity of the instruments $z_i$. If the instruments are not valid, the test is not valid either.
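The auxiliary regression can be sketched for $K = L = 1$ and $M = 0$ without any matrix algebra. By the Frisch-Waugh-Lovell theorem, the coefficient on $\hat v_i$ in a regression of $y$ on a constant, $x$ and $\hat v$ equals the slope from regressing $y$ on $\hat v$ after both have been residualized on a constant and $x$. The data generating process, the helper names and the use of the homoscedastic standard error (rather than the robust one recommended above) are assumptions of this illustration.

```python
import math
import random

def mean(a):
    return sum(a) / len(a)

def residualize(y, x):
    """Residuals from a bivariate OLS regression of y on x and a constant."""
    mx, my = mean(x), mean(y)
    slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
             / sum((xi - mx) ** 2 for xi in x))
    return [(yi - my) - slope * (xi - mx) for xi, yi in zip(x, y)]

random.seed(6)
N = 20_000
z = [random.gauss(0.0, 1.0) for _ in range(N)]
c = [random.gauss(0.0, 1.0) for _ in range(N)]   # common shock: x is endogenous
u = [ci + random.gauss(0.0, 1.0) for ci in c]
x = [0.8 * zi + ci + random.gauss(0.0, 1.0) for zi, ci in zip(z, c)]
y = [1.0 + 2.0 * xi + ui for xi, ui in zip(x, u)]

v_hat = residualize(x, z)      # first-stage residuals (x on z and a constant)
y_t = residualize(y, x)        # partial the constant and x out of y
v_t = residualize(v_hat, x)    # partial the constant and x out of v_hat

svv = sum(vt * vt for vt in v_t)
delta = sum(vt * yt for vt, yt in zip(v_t, y_t)) / svv   # coefficient on v_hat
rss = sum((yt - delta * vt) ** 2 for vt, yt in zip(v_t, y_t))
se = math.sqrt(rss / (N - 3) / svv)                      # three estimated parameters
t_delta = delta / se

print(f"delta = {delta:.3f}, t = {t_delta:.1f}")
```

In this design the first-stage error and $u$ share the shock `c`, so $\delta$ is close to $\mathrm{cov}(u, v)/V(v) = 0.5$ and exogeneity of $x$ is clearly rejected.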

13 Heterogeneous Effects

IV1 assumes that the parameters $\beta_k$ are constant across individuals $i$. However, in reality, effects $\beta_{ik}$ likely differ across $i$, i.e. the effects are heterogeneous and researchers seek to estimate an average treatment effect $ATE_k = E[\beta_{ik}]$. Unfortunately, the IV estimator $\hat\beta_k$ is in general not a consistent estimator of $ATE_k$.
An exception is the case with a single explanatory variable $d_i$ and a single instrument $z_i$ which are both dummy variables: $y_i = \beta_0 + \beta_1 d_i + u_i$. The dummy variable $d_i = g(z_i, v_i) \in \{0, 1\}$ is a function of the instrument $z_i \in \{0, 1\}$ and an error $v_i$ and takes value 1 for a treated individual and 0 for an individual in the control group. In this case, the IV estimator $\hat\beta_1$ can be interpreted as the local average treatment effect (LATE) even if the individual effects $\beta_{i1}$ are heterogeneous, provided that $z_i$ is independent of both $u_i$ (IV3) and $v_i$ and provided that the instrument $z_i$ does not decrease the treatment $d_i$ for any individual $i$ (monotonicity). The latter condition means that there are no individuals who are treated ($d_i = 1$) when $z_i = 0$ and would not have been treated ($d_i = 0$) when $z_i = 1$. Such individuals are called "defiers" and need to be ruled out for the LATE interpretation. The monotonicity requirement cannot be tested and its validity must be defended in the context of a particular application. "Local" in LATE means that the estimated effect is the ATE only for the sub-population of those individuals who are treated ($d_i = 1$) when $z_i = 1$ and would not have been treated ($d_i = 0$) when $z_i = 0$. This sub-population is called "compliers". Note that different instruments $z_i$ will lead to different sub-populations of compliers and hence different LATEs to be estimated. See Imbens and Angrist (1994) for explanations and generalizations.
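The LATE interpretation can be illustrated with a simulation in which there are no defiers. Everything in this sketch is an assumption made for illustration: the population shares (50% compliers, 25% always-takers, 25% never-takers), the individual effects (1, 4 and 0) and the labels themselves. With a binary instrument, the IV estimator equals the ratio of the covariances of $z$ with $y$ and with $d$.

```python
import random

def cov(a, b):
    """Sample covariance (dividing by N) of two equally long lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / n

random.seed(5)
N = 100_000
effects = {"complier": 1.0, "always": 4.0, "never": 0.0}   # assumed beta_i1 by type

z, d, y, beta_i = [], [], [], []
for _ in range(N):
    r = random.random()
    kind = "complier" if r < 0.5 else ("always" if r < 0.75 else "never")
    zi = 1 if random.random() < 0.5 else 0      # instrument, independent of type
    if kind == "always":
        di = 1                                  # treated regardless of z
    elif kind == "never":
        di = 0                                  # untreated regardless of z
    else:
        di = zi                                 # compliers follow the instrument
    # always-takers also have a higher baseline outcome, so d is endogenous
    ui = random.gauss(0.0, 1.0) + (1.0 if kind == "always" else 0.0)
    z.append(zi)
    d.append(di)
    y.append(0.5 + effects[kind] * di + ui)
    beta_i.append(effects[kind])

late_hat = cov(z, y) / cov(z, d)   # IV estimator with one binary instrument
ate = sum(beta_i) / N              # average effect over the whole population

print(f"IV estimate {late_hat:.3f}, complier effect 1.0, ATE {ate:.3f}")
```

The IV estimate settles near the assumed complier effect of 1 rather than the population average of 1.5, which is what the LATE interpretation predicts.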

Implementation in Stata 17

Stata calculates the IV (2SLS) estimator by the command


ivregress 2sls depvar [varlist1] (varlist2=varlist3)

where varlist1 are exogenous regressors ($w_{i1}, ..., w_{iM}$), varlist2 are endogenous regressors ($x_{i1}, ..., x_{iK}$) and varlist3 are excluded instruments ($z_{i1}, ..., z_{iL}$). For example, load data
webuse hsng2

and regress median monthly rents (rent) of census divisions on the share
of urban population (pcturban) and the median housing value (hsngval)
ivregress 2sls rent pcturban (hsngval = faminc reg2-reg4), vce(robust)

Housing values are likely endogenous and therefore instrumented by median family income (faminc) and 3 regional dummies (reg2, reg3, reg4). The Eicker-Huber-White covariance estimator, which is robust to heteroscedasticity, is reported with the option vce(robust). The option first requests that the first-stage regression results be displayed. First stage results are also provided by the postestimation command
estat firststage

which includes the F-statistic to assess weak instruments in case of K = 1 or the so-called rank F-statistic in case of K > 1.
The J-Test is reported with the postestimation command
estat overid

The test for exogeneity of the regressors can be calculated by adding the
first stage residuals to an auxiliary regression. For example,
regress hsngval pcturban faminc reg2-reg4
predict v, resid
regress rent hsngval pcturban v
test v

The reduced form test is performed by


regress rent pcturban faminc reg2-reg4, vce(robust)
test faminc reg2 reg3 reg4

Implementation in R 4.2.3

The IV (2SLS) estimator is conveniently implemented in the R package ivreg as command
ivreg(y ~ x1 + x2 + w1 + w2 | z1 + z2 + z3 + w1 + w2)

where x1 and x2 are endogenous regressors, w1 and w2 exogenous regressors, and z1 to z3 are excluded instruments. For example, load data
library(haven)
hsng2 <- read_dta("http://www.stata-press.com/data/r17/hsng2.dta")

and regress median monthly rents (rent) of census divisions on the share
of urban population (pcturban) and the median housing value (hsngval)
library(ivreg)
iv <- ivreg(rent~hsngval+pcturban|pcturban+faminc+reg2+reg3+reg4,
data = hsng2)
summary(iv)

Housing values are likely endogenous and therefore instrumented by median family income (faminc) and 3 regional dummies (reg2, reg3, reg4). The Eicker-Huber-White covariance estimator, which is robust to heteroscedastic error terms and corrected for degrees of freedom in small samples, is reported after estimation with
library(sandwich)
library(lmtest)
coeftest(iv, vcov=vcovHC, type="HC1")

First stage results are reported by explicitly estimating them, e.g.
first <- lm(hsngval~pcturban+faminc+reg2+reg3+reg4, data = hsng2)
summary(first)

In case of a single endogenous variable (K = 1), the F-statistic to assess weak instruments is reported after estimating the first stage with e.g.
waldtest(first, .~.-faminc-reg2-reg3-reg4)

or in case of heteroscedastic errors
waldtest(first, .~.-faminc-reg2-reg3-reg4, vcov=vcovHC(first, type="HC1"))

References

Introductory textbooks

Stock, James H. and Mark W. Watson (2012), Introduction to Econometrics, 3rd ed., Pearson Addison-Wesley, chapter 12.
Wooldridge, Jeffrey M. (2009), Introductory Econometrics: A Modern Approach, 4th ed., South-Western Cengage Learning, chapter 15.

Advanced textbooks

Cameron, A. Colin and Pravin K. Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press, 4.8–4.9.
Wooldridge, Jeffrey M. (2010), Econometric Analysis of Cross Section and Panel Data, MIT Press, chapter 5.
Davidson, Russell and James G. MacKinnon (2004), Econometric Theory and Methods, Oxford University Press, chapter 8.

Articles

Angrist, Joshua and Alan Krueger (2001), Instrumental Variables and the Search for Identification: From Supply and Demand to Natural Experiments, Journal of Economic Perspectives, 15/4, 69–85.
Chernozhukov, Victor and Christian Hansen (2008), The reduced form: A simple approach to inference with weak instruments, Economics Letters, 100, 68–71.
Hausman, Jerry (2001), Mismeasured Variables in Econometric Analysis: Problems from the Right and Problems from the Left, Journal of Economic Perspectives, 15/4, 57–67.
Imbens, Guido W. and Joshua Angrist (1994), Identification and Estimation of Local Average Treatment Effects, Econometrica, 62(2), 467–475.
Stock, J. H., J. H. Wright and M. Yogo (2002), A Survey of Weak Instruments and Weak Identification in Generalized Method of Moments, Journal of Business and Economic Statistics, 20(4), 518–29.
