19-Econometrics-Linear Regression
University of Milan-Bicocca
Course lecturer:
Maryam Ahmadi
maryam.ahmadi@unimib.it
Endogenous Regressors and
Instrumental Variables
Problem 17 & Answer.
Consider the data SCHOOLING. The purpose of this exercise is to explore the role of parents’ education as
instruments to estimate the returns to schooling.
a. Estimate a reduced form for schooling that includes mother’s and father’s education levels instead of the
lived-near-college dummy. What do these results indicate about the possibility of using parents’ education as
instruments?
The t-statistics on the two parents' education variables indicate that both are highly significant.
Father's and mother's education are therefore significantly correlated with schooling, after
controlling for the impact of experience, black, smsa and south.
Also note that the R² of this reduced form (0.5141) is higher than the one reported for the
specification in the example, which uses nearc4 as the instrument (0.4745). This is a good sign,
because instruments have to be relevant. However, it says nothing about the exogeneity of the
instruments.
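The reduced form in part a could be run along the following lines. This is only a sketch: the variable names (schooling, exper, black, smsa, south, momeduc, dadeduc) are hypothetical placeholders, not the actual column names in the SCHOOLING file, and any further controls from the example's specification would need to be added. The joint F test on the two parental education variables is a more direct check of relevance than comparing R² values.

```stata
* Reduced form for schooling (all variable names are hypothetical placeholders)
regress schooling exper black smsa south momeduc dadeduc

* Joint test that both parental education coefficients are zero;
* a large F statistic supports instrument relevance, not exogeneity
test momeduc dadeduc
```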
b. Estimate the returns to schooling, on the basis of the same specification as in the
example, using mother’s and father’s education as instruments.
c. Re-estimate the model using also the lived near college dummy.
d. Compare and interpret the different estimates of the returns to schooling from the
example and from parts b and c of this exercise.
(b) (c)
• The 2SLS estimator is less efficient than OLS when the explanatory variables are
exogenous: as we have seen, the 2SLS estimates can have very large standard
errors.
Testing for endogeneity of explanatory variables
Suppose that in the following model we have a single suspected endogenous variable ($x_1$)
and that the other variables ($x_2$ and $x_3$) are exogenous:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$ (Model 1)
Suppose we also have two other exogenous variables, $z_1$ and $z_2$, that are not in the model
and that we can use as instruments for $x_1$.
• If $x_1$ is uncorrelated with $u$, we should estimate this model by OLS; if $x_1$ is
endogenous, that is, correlated with $u$, we should estimate it by 2SLS.
• We can use the test proposed by Hausman (1978) to test for endogeneity.
• Hausman (1978) test for endogeneity
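Model 2, which this test refers to, is the first-stage (reduced-form) regression for $x_1$; its equation is not reproduced in this section. Under the setup of Model 1 with instruments $z_1$ and $z_2$, it would take the following form. The coefficient names $\pi_j$ are assumed notation, not taken from the slides.

```latex
% Stage one: regress x_1 on all exogenous variables, including the instruments
x_1 = \pi_0 + \pi_1 z_1 + \pi_2 z_2 + \pi_3 x_2 + \pi_4 x_3 + v \qquad \text{(Model 2)}

% Since z_1, z_2, x_2 and x_3 are uncorrelated with u,
% any correlation between x_1 and u must come through v
\operatorname{Cov}(x_1, u) = \operatorname{Cov}(v, u)
```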
Because all the regressors in model 2 (the first-stage regression for $x_1$) are uncorrelated with $u$,
$x_1$ is uncorrelated with $u$ if and only if $v$ is uncorrelated with $u$. So this is what we have to test:
$u = \delta_0 + \delta_1 v + e$
• If $\delta_1 = 0$ (checked with a t test), $u$ and $v$ are uncorrelated, so $x_1$ is uncorrelated with $u$ and is
exogenous; model 1 should be estimated by OLS.
• If $\delta_1 \neq 0$, $u$ and $v$ are correlated, so $x_1$ is correlated with $u$ and is endogenous; model 1 should be
estimated by 2SLS.
• The easiest way to test whether $\delta_1 = 0$ is to include $v$ in model 1 and do the t test, as shown in Stage
two. (We use $\hat{v}$, obtained from estimating model 2, as a proxy for $v$.)
Stage two. Regress $y$ on all the explanatory variables appearing in model 1 as well as $\hat{v}$:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \delta_1 \hat{v} + \text{error}$ (Model 3)
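Example. We have the following model, in which educ is the suspected endogenous variable and
motheduc and fatheduc are used as instruments:
$\text{lwage} = \beta_0 + \beta_1 \text{educ} + \beta_2 \text{exper} + \beta_3 \text{expersq} + u$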
OLS estimation results:

. reg lwage educ exper expersq

      Source |       SS         df       MS        Number of obs =     428
             |                                     F(3, 424)     =   26.29
       Model |  35.0222967       3   11.6740989    Prob > F      =  0.0000
    Residual |  188.305144     424   .444115906    R-squared     =  0.1568
             |                                     Adj R-squared =  0.1509
       Total |  223.327441     427   .523015084    Root MSE      =  .66642

       lwage |     Coef.   Std. Err.      t    P>|t|    [95% Conf. Interval]
        educ |  .1074896   .0141465     7.60   0.000    .0796837    .1352956
       exper |  .0415665   .0131752     3.15   0.002    .0156697    .0674633
     expersq | -.0008112   .0003932    -2.06   0.040   -.0015841   -.0000382
       _cons | -.5220406   .1986321    -2.63   0.009   -.9124667   -.1316144

2SLS estimation results:

. ivregress 2sls lwage (educ = motheduc fatheduc) exper expersq

Instrumental variables (2SLS) regression           Number of obs =     428
                                                   Wald chi2(3)  =   24.65
                                                   Prob > chi2   =  0.0000
                                                   R-squared     =  0.1357
                                                   Root MSE      =  .67155

       lwage |     Coef.   Std. Err.      z    P>|z|    [95% Conf. Interval]
        educ |  .0613966   .0312895     1.96   0.050    .0000704    .1227228
       exper |  .0441704   .0133696     3.30   0.001    .0179665    .0703742
     expersq |  -.000899   .0003998    -2.25   0.025   -.0016826   -.0001154
       _cons |  .0481003    .398453     0.12   0.904   -.7328532    .8290538

Instrumented: educ
Instruments: exper expersq motheduc fatheduc
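The two-stage Hausman procedure can be sketched by hand for this example, using the variable names from the commands above. The sketch assumes that MROZ.dta is in the working directory and that the file may contain observations with missing lwage, which is why the first stage is restricted to non-missing values. The name vhat is an illustrative choice.

```stata
* Regression-based endogeneity test (sketch; assumes MROZ.dta is available)
use MROZ.dta, clear

* Stage one: reduced form for educ on all exogenous variables and instruments,
* restricted to observations where lwage is observed
regress educ exper expersq motheduc fatheduc if !missing(lwage)
predict vhat if e(sample), residuals

* Stage two: add the first-stage residual to the structural equation
regress lwage educ exper expersq vhat

* Test on vhat: rejecting a zero coefficient suggests that educ is endogenous
test vhat
```

The coefficient on educ in the stage-two regression should coincide with the 2SLS point estimate, which is a useful check that the two stages were run on the same sample.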
• There is also a Stata command to test for the presence of endogeneity. It
should be run after the 2SLS regression.
. qui ivregress 2sls lwage (educ = motheduc fatheduc) exper expersq
Tests of endogeneity
Ho: variables are exogenous
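The header "Tests of endogeneity" matches what Stata's postestimation command estat endogenous prints, so the missing command line is presumably that one. A sketch of the full sequence, with the test statistics left for the reader to obtain:

```stata
* Fit the 2SLS model quietly, then request the endogeneity tests
quietly ivregress 2sls lwage (educ = motheduc fatheduc) exper expersq

* Reports the Durbin and Wu-Hausman statistics under Ho: variables are exogenous
estat endogenous
```

Small p-values lead to rejecting exogeneity of educ, in which case 2SLS is preferred to OLS.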
Testing overidentification restrictions
• Instrument exogeneity: all the instruments are uncorrelated with the error
term.
• If the instruments are correlated with the error term, the first stage of 2SLS
does not successfully isolate a component of the endogenous variable that is
uncorrelated with the error term.
• So the fitted values of the endogenous variable remain correlated with $u$,
and 2SLS is inconsistent.
Can we test if our instruments are valid?
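When there are more instruments than endogenous regressors, the usual answer is a test of the overidentifying restrictions. The steps are not reproduced in this section, so the following is a sketch of the standard procedure under homoskedasticity, written for Model 1 with instruments $z_1$ and $z_2$. The coefficient names $\gamma_j$ and the symbol $R_u^2$ are assumed notation.

```latex
% Step 1: estimate Model 1 by 2SLS and keep the residuals
\hat{u} = y - \hat{\beta}_0 - \hat{\beta}_1 x_1 - \hat{\beta}_2 x_2 - \hat{\beta}_3 x_3

% Step 2: regress the residuals on all exogenous variables and record R_u^2
\hat{u} = \gamma_0 + \gamma_1 x_2 + \gamma_2 x_3 + \gamma_3 z_1 + \gamma_4 z_2 + \text{error}

% Step 3: under H_0 (all instruments are uncorrelated with u),
% with q = (number of instruments) - (number of endogenous regressors)
n R_u^2 \overset{a}{\sim} \chi^2_q, \qquad q = 2 - 1 = 1
```

If $n R_u^2$ exceeds the chosen critical value of the $\chi^2_q$ distribution, we reject the null and conclude that at least one instrument is not exogenous. The test cannot be carried out when the model is exactly identified ($q = 0$).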
• Stata command for testing overidentifying restrictions
• Use MROZ.dta
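The command itself is not shown in this section. A likely candidate is the postestimation command estat overid, sketched here with the same specification as the earlier example and assuming MROZ.dta is in the working directory:

```stata
* Load the data and fit the overidentified 2SLS model
use MROZ.dta, clear
quietly ivregress 2sls lwage (educ = motheduc fatheduc) exper expersq

* Reports the Sargan and Basmann tests of the overidentifying restrictions
estat overid
```

With two instruments and one endogenous regressor there is one overidentifying restriction, so the statistics are compared with a chi-squared distribution with one degree of freedom.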
Problem 18
Data from 9000 households in a country contain information on whether or not a household has a
member who works abroad (a migrant). Migrants regularly send money to their households back home.
These transfers are called remittances. A researcher wants to use the household survey data to
estimate whether receiving remittances from migrants affects the income earned locally by
the household (receiving money from the migrant could either reduce or increase how much the
household earns at home):
$\text{lincome}_i = \beta_0 + \beta_1 \text{lrem}_i + u_i$
a) The researcher suspects that lrem is endogenous because some family background factors, such as the
family head's education and work experience, are omitted from the model. The
education level of the family head may be correlated with both lrem and lincome. What is the
consequence? And what is the solution?
b) To solve this problem, the researcher uses distance from the capital city as an instrument for
remittances in an IV estimation. What criteria must distance fulfil to be a valid instrument?
c) Write down the first-stage equation of the IV regression that uses distance from the capital city as an
instrument. What would you look for in this first stage to determine the validity of the instrument?
d) Someone suggests that another instrument could be used, namely the ownership of
non-agricultural land. Is it possible to use both instruments simultaneously? Explain your answer.
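For readers who want to try the estimation in Stata, the commands could look like the sketch below. The names lincome and lrem come from the problem statement; distance and land are hypothetical variable names for the two proposed instruments.

```stata
* IV estimation with a single instrument (distance is a hypothetical name)
ivregress 2sls lincome (lrem = distance)

* First-stage diagnostics: strength of the instrument in explaining lrem
estat firststage

* Using both instruments at once (land is a hypothetical name)
ivregress 2sls lincome (lrem = distance land)
estat firststage

* With two instruments for one endogenous regressor the model is
* overidentified, so the overidentifying restriction can be tested
estat overid
```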