19-Econometrics-Linear Regression

The document discusses the use of parents' education as instrumental variables to estimate returns to schooling, highlighting the significance of these variables in a reduced form model. It also explains the Hausman test for endogeneity, detailing the steps to test if an explanatory variable is endogenous and the implications for using OLS or 2SLS methods. Additionally, it addresses the validity of instruments in IV regression and presents a problem involving remittances and their effect on household income, exploring potential instruments and their requirements.

Uploaded by

Lorenzo Lucchesi

Econometrics

University of Milan-Bicocca

Course lecturer:
Maryam Ahmadi
maryam.ahmadi@unimib.it

Endogenous Regressors and Instrumental Variables

Problem 17 & Answer.
Consider the data SCHOOLING. The purpose of this exercise is to explore the role of parents' education as instruments to estimate the returns to schooling.

a. Estimate a reduced form for schooling that includes mother's and father's education levels instead of the lived-near-college dummy. What do these results indicate about the possibility of using parents' education as instruments?

The t-statistics on the two parents' education variables indicate that both variables are highly significant. Father's and mother's education are therefore significantly correlated with schooling after controlling for the impact of experience, black, smsa and south.

Also note that the R² of this reduced form (0.5141) is higher than the one reported for the specification in the example, which uses nearc4 as the instrument (0.4745). This is a good signal, because instruments have to be relevant. However, it does not say anything about the exogeneity of the instruments.

b. Estimate the returns to schooling on the basis of the same specification as in the example, using mother's and father's education as instruments.

c. Re-estimate the model, also using the lived-near-college dummy as an instrument.

d. Compare and interpret the different estimates of the returns to schooling from the example and from parts b and c of this exercise.

(b) (c)

(d) The estimates for the returns to schooling (standard errors in parentheses) are

• 0.133 (0.049) if we use lived near college,
• 0.091 (0.012) if we use father's and mother's education,
• 0.094 (0.012) if we use all three variables as instruments.

Note that using parents' education substantially improves the precision of the IV estimates. This is not surprising given the significance of the extra instruments in the reduced form. Statistically, we do not find any evidence that would lead us to reject the validity of parents' education as instruments. However, some may argue that parents' education is partly determined by the same unobservables that determine a child's schooling.

Testing for endogeneity of explanatory variables

• The 2SLS estimator is less efficient than OLS when the explanatory variables are exogenous: as we have seen, 2SLS estimates can have very large standard errors.

• Therefore, it is useful to have a test for endogeneity of an explanatory variable.

• Such a test shows whether using 2SLS is necessary.
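
To see where the efficiency loss comes from, consider the simplest case of one regressor $x$, one instrument $z$ and a homoskedastic error with variance $\sigma^2$. The comparison below is the standard textbook result for that case, not something derived in these slides; $\sigma_x^2$ (the variance of $x$) and $\rho_{x,z}$ (the correlation between $x$ and $z$) are notation introduced here for illustration.

```latex
\[
\operatorname{Avar}\bigl(\hat\beta_{1}^{\,OLS}\bigr) = \frac{\sigma^2}{n\,\sigma_x^2},
\qquad
\operatorname{Avar}\bigl(\hat\beta_{1}^{\,IV}\bigr) = \frac{\sigma^2}{n\,\sigma_x^2\,\rho_{x,z}^{2}}
\]
```

Because $\rho_{x,z}^2 \le 1$, the IV variance is never smaller than the OLS variance when $x$ is exogenous, and it grows quickly as the instrument becomes weaker.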

Suppose that in the following model we have a single suspected endogenous variable ($x_1$) and that the other variables ($x_2$ and $x_3$) are exogenous:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + u$   (Model 1)

Here $x_1$ is the variable that is suspected to be endogenous.

Suppose we have two other exogenous variables, $z_1$ and $z_2$, that are not in the model and that we can use as instruments for $x_1$.
• If $x_1$ is uncorrelated with $u$, we should estimate this model by OLS; if $x_1$ is endogenous, i.e. correlated with $u$, we should estimate it by 2SLS.
• We can use the test proposed by Hausman (1978) to test for endogeneity.

• Hausman (1978) test for endogeneity

Stage one. Use a reduced form of $x_1$, in which $x_1$ is regressed on all exogenous variables:

$x_1 = \pi_0 + \pi_1 x_2 + \pi_2 x_3 + \pi_3 z_1 + \pi_4 z_2 + v$   (Model 2)

As all the regressors here are uncorrelated with $u$, if $v$ is uncorrelated with $u$ we can conclude that $x_1$ is uncorrelated with $u$. So this is what we have to test:

$u = \delta_0 + \delta_1 v + e$

• If $\delta_1 = 0$ (checked with a t test), $u$ and $v$ are uncorrelated, so $x_1$ is uncorrelated with $u$ and is exogenous; Model 1 should be estimated by OLS.
• If $\delta_1 \neq 0$, $u$ and $v$ are correlated, so $x_1$ is correlated with $u$ and is endogenous; Model 1 should be estimated by 2SLS.
• The easiest way to test whether $\delta_1 = 0$ is to include $v$ in Model 1 and do the t test; we show this in Stage two. (We use $\hat{v}$, the residual obtained from estimating Model 2, as a proxy for $v$.)
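
The step from Model 2 to the test equation can be written out explicitly. The lines below are a short derivation under the slide's own assumptions (all of $x_2$, $x_3$, $z_1$, $z_2$ are uncorrelated with $u$, and $e$ is uncorrelated with $v$); nothing beyond those assumptions is used.

```latex
\begin{align*}
\operatorname{Cov}(x_1, u)
  &= \operatorname{Cov}(\pi_0 + \pi_1 x_2 + \pi_2 x_3 + \pi_3 z_1 + \pi_4 z_2 + v,\; u) \\
  &= \operatorname{Cov}(v, u) \\
  &= \operatorname{Cov}(v,\; \delta_0 + \delta_1 v + e)
   = \delta_1 \operatorname{Var}(v)
\end{align*}
```

So, as long as $\operatorname{Var}(v) > 0$, $x_1$ is uncorrelated with $u$ exactly when $\delta_1 = 0$.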

Stage two. Regress $y$ on all explanatory variables that appear in Model 1 as well as $\hat{v}$:

$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \delta_1 \hat{v} + \text{error}$   (Model 3)

• Estimate Model 3 by OLS and test $H_0: \delta_1 = 0$ using a t statistic.

• If we reject $H_0$, we conclude that $x_1$ is endogenous and we should use 2SLS to estimate Model 1.

Example. We have the following model:

$\text{lwage} = \beta_0 + \beta_1\,\text{educ} + \beta_2\,\text{exper} + \beta_3\,\text{exper}^2 + u$

• Suppose exper and $\text{exper}^2$ are exogenous.
• We think that educ is an endogenous variable, because it is related to ability, which is in the error term.
• The question is whether educ is really endogenous.
• We choose parents' education as the instruments for education, on the argument that parents' education is correlated with education but uncorrelated with the innate ability of the person.
• Using MROZ.dta, we go through stages one and two of the Hausman (1978) endogeneity test.
• We use only the data on working women (the observations for which wage data are not missing).

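Stage One: [output not reproduced]

Stage Two:

. reg lwage educ exper expersq vhat

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(4, 423)       =     20.50
       Model |  36.2573098         4  9.06432744   Prob > F        =    0.0000
    Residual |  187.070131       423  .442246173   R-squared       =    0.1624
-------------+----------------------------------   Adj R-squared   =    0.1544
       Total |  223.327441       427  .523015084   Root MSE        =    .66502

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0309849     1.98   0.048      .000493    .1223003
       exper |   .0441704   .0132394     3.34   0.001     .0181471    .0701937
     expersq |   -.000899   .0003959    -2.27   0.024    -.0016772   -.0001208
        vhat |   .0581666   .0348073     1.67   0.095    -.0102502    .1265834
       _cons |   .0481003   .3945753     0.12   0.903    -.7274721    .8236727
------------------------------------------------------------------------------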

vhat is only marginally significant: its coefficient is significant at the 10% level (p = 0.095) but not at the 5% level. We therefore cannot draw a strong conclusion that educ is endogenous, but we also cannot conclude that it is exogenous. We report the results of both the OLS and the 2SLS estimations.
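
The Stage One output is not reproduced in the text above. As a minimal sketch, assuming the first stage follows Model 2 with motheduc and fatheduc as the instruments and that the residual is stored under the name vhat used in Stage Two, the two stages could be run as follows. The file location and the `if !missing(lwage)` restriction are assumptions made here to match the statement that only working women are used.

```stata
* Load the data (the file location is an assumption)
use MROZ.dta, clear

* Stage one: reduced form for educ on all exogenous variables,
* restricted to observations with non-missing wage data
regress educ exper expersq motheduc fatheduc if !missing(lwage)

* Save the first-stage residuals as vhat for the estimation sample
predict vhat if e(sample), residuals

* Stage two: add vhat to the wage equation and inspect its t statistic
regress lwage educ exper expersq vhat
```

The t statistic on vhat in the last regression is the test of $H_0: \delta_1 = 0$.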

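OLS estimation results:

. reg lwage educ exper expersq

      Source |       SS           df       MS      Number of obs   =       428
-------------+----------------------------------   F(3, 424)       =     26.29
       Model |  35.0222967         3  11.6740989   Prob > F        =    0.0000
    Residual |  188.305144       424  .444115906   R-squared       =    0.1568
-------------+----------------------------------   Adj R-squared   =    0.1509
       Total |  223.327441       427  .523015084   Root MSE        =    .66642

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .1074896   .0141465     7.60   0.000     .0796837    .1352956
       exper |   .0415665   .0131752     3.15   0.002     .0156697    .0674633
     expersq |  -.0008112   .0003932    -2.06   0.040    -.0015841   -.0000382
       _cons |  -.5220406   .1986321    -2.63   0.009    -.9124667   -.1316144
------------------------------------------------------------------------------

2SLS estimation results:

. ivregress 2sls lwage (educ = motheduc fatheduc) exper expersq

Instrumental variables (2SLS) regression          Number of obs   =        428
                                                  Wald chi2(3)    =      24.65
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.1357
                                                  Root MSE        =     .67155

------------------------------------------------------------------------------
       lwage |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   .0613966   .0312895     1.96   0.050     .0000704    .1227228
       exper |   .0441704   .0133696     3.30   0.001     .0179665    .0703742
     expersq |   -.000899   .0003998    -2.25   0.025    -.0016826   -.0001154
       _cons |   .0481003    .398453     0.12   0.904    -.7328532    .8290538
------------------------------------------------------------------------------
Instrumented:  educ
Instruments:   exper expersq motheduc fatheduc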
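
The efficiency cost of instrumenting can be read directly off the OLS and 2SLS results. The ratio below uses only the reported standard errors of the educ coefficient; it is arithmetic on the printed numbers, not an additional estimation result.

```latex
\[
\frac{\operatorname{se}\bigl(\hat\beta_{educ}^{\,2SLS}\bigr)}{\operatorname{se}\bigl(\hat\beta_{educ}^{\,OLS}\bigr)}
= \frac{0.0312895}{0.0141465} \approx 2.21
\]
```

The 2SLS standard error is more than twice the OLS one, while the point estimate of the return to education falls from about 0.107 (OLS) to about 0.061 (2SLS).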

• Stata also has a command to test for the presence of endogeneity. This command should be used after the 2SLS regression.

. qui ivregress 2sls lwage (educ = motheduc fatheduc) exper expersq

. estat endogenous

(estat endogenous is the Stata command that performs the Hausman test of endogeneity.)

  Tests of endogeneity
  Ho: variables are exogenous

  Durbin (score) chi2(1)          =  2.80707  (p = 0.0938)
  Wu-Hausman F(1,423)             =  2.79259  (p = 0.0954)

The Hausman test of endogeneity rejects the null that educ is exogenous at the 10% significance level, but not at the 5% level.
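
With a single suspected endogenous regressor, the Wu-Hausman F statistic should equal the square of the t statistic on vhat from the Stage Two regression. The check below uses the coefficient and standard error of vhat printed in that regression; any gap in the last digits comes from rounding.

```latex
\[
t_{\hat v} = \frac{0.0581666}{0.0348073} \approx 1.6711,
\qquad
t_{\hat v}^{2} \approx 2.7926 \approx F(1,423) = 2.79259
\]
```

The two routes therefore give the same conclusion: evidence of endogeneity at the 10% level only.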

Testing overidentifying restrictions

• Instrument exogeneity: all the instruments are uncorrelated with the error term.
• If the instruments are correlated with the error term, the first stage of 2SLS does not successfully isolate a component of the endogenous variable that is uncorrelated with the error term.
• The fitted values of the endogenous variable then remain correlated with $u$, and 2SLS is inconsistent.

Can we test if our instruments are valid?

It is not possible to test whether the instruments are valid (exogenous) if the model is exactly identified (K = R), that is, if there are exactly as many instruments as endogenous regressors. We just have to believe them.

In the overidentified case, we can test the overidentifying restrictions.

• If there are more instruments than endogenous regressors, it is possible to test, partially, for instrument exogeneity.
• Hausman (1978) proposed a test of overidentifying restrictions.
• Stata command for testing overidentifying restrictions
• Use MROZ.dta

Overidentifying restrictions test command [output not reproduced]

We fail to reject the null, so both motheduc and fatheduc can be kept as the instrumental variables for educ.

In these two tests, the null hypothesis is that the overidentifying restrictions are valid, i.e. that the instrument set is valid and the model is correctly specified.
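
The command and its output did not survive in the text above. A minimal sketch, assuming the test is run with Stata's `estat overid` postestimation command after the same 2SLS regression used earlier (the slides do not confirm which command was shown):

```stata
* Re-estimate the 2SLS model quietly, then test the overidentifying restrictions
qui ivregress 2sls lwage (educ = motheduc fatheduc) exper expersq
estat overid
```

After `ivregress 2sls` with default standard errors, `estat overid` reports two statistics (Sargan and Basmann), which is consistent with the reference to two tests above.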

Problem 18
Data from 9000 households in a country contain information on whether or not a household has a member who works abroad (a migrant). Migrants regularly send money to their household back home. These transfers are called remittances. A researcher wants to use the household survey data to estimate whether receiving remittances from migrants affects the income earned locally by the household (receiving money from the migrant can either reduce or increase how much the household earns at home).

$\text{lincome}_i = \beta_0 + \beta_1\,\text{lrem}_i + u_i$

a) The researcher suspects that lrem is endogenous, as some family background factors, such as the education and working experience of the head of the family, are omitted from the model. We think that the education level of the family head can be correlated with both lrem and lincome. What is the consequence? And what is the solution?

b) To solve this problem, the researcher uses distance from the capital city as an instrument for remittances in an IV estimation. What criteria must distance fulfil to be a valid instrument?

c) Write down the first-stage equation of the IV regression that uses distance from the capital city as an instrument. What would you look for in this first stage to determine the validity of the instrument?

d) Someone suggests that another instrument could be used, namely the ownership of non-agricultural land. Is it possible to use both instruments simultaneously? Explain your answer.
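
How the parts of this problem map onto Stata commands can be sketched as follows. This only illustrates the workflow and is not a worked solution: the variable names `distance` and `nonagland` are hypothetical placeholders for the two proposed instruments, and no data set is supplied with the problem.

```stata
* distance and nonagland are hypothetical variable names

* First stage with one instrument: is distance correlated with lrem?
regress lrem distance

* IV estimate with distance as the only instrument (exactly identified)
ivregress 2sls lincome (lrem = distance)
estat firststage

* With both instruments the model is overidentified, so the
* overidentifying restrictions can be tested
ivregress 2sls lincome (lrem = distance nonagland)
estat overid
```

Note that `estat overid` is only available in the second case, because with a single instrument for a single endogenous regressor there are no overidentifying restrictions to test.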