Lecture 12 Instrumental Variables
Instrumental Variables
A necessary condition for consistency of OLS estimators is that the disturbance term is
distributed independently of the regressors (assumption B.7 is satisfied). It is said that there is
endogeneity in the regression model if some of the explanatory variables are correlated with the
disturbance term. Variables correlated with the disturbance term are defined as endogenous variables,
while variables uncorrelated with the disturbance term are called exogenous variables. There are
many sources of endogeneity. As we have studied earlier, endogeneity would occur if:
1) There are omitted variables that are correlated with at least one of the included explanatory
variables;
2) There is a measurement error in one of the explanatory variables.
In the next lecture we will cover one more source of endogeneity when models comprise two or more
simultaneous relationships. Violating assumption B.7 has the following main consequences: OLS estimators become inconsistent, standard errors are computed incorrectly, and statistical tests are invalid. We can deal with these problems by applying instrumental variables (IV) estimation.
Instrumental variables
A valid instrumental variable 𝑍 is a special proxy variable for which the following conditions
must be satisfied:
1) Instrument exogeneity: It is uncorrelated with the disturbance term, i.e. Cov(𝑍, 𝑢) = 0;
2) Instrument relevance: It is sufficiently strongly correlated with the corresponding
endogenous variable, i.e. Cov(𝑍, 𝑋) ≠ 0.
The main objective of their application is to obtain consistent estimates of the parameters once the
assumption B.7 is violated for the initial specification. Let’s consider a simple linear regression
model:
𝑌 = 𝛽1 + 𝛽2 𝑋 + 𝑢, where 𝑋 is correlated with 𝑢
OLS will give inconsistent estimates, i.e.:
b_2^{OLS} = \frac{\widehat{\mathrm{Cov}}(X, Y)}{\widehat{\mathrm{Var}}(X)} = \frac{\widehat{\mathrm{Cov}}(X, \beta_1 + \beta_2 X + u)}{\widehat{\mathrm{Var}}(X)} = \frac{\widehat{\mathrm{Cov}}(X, \beta_1)}{\widehat{\mathrm{Var}}(X)} + \frac{\widehat{\mathrm{Cov}}(X, \beta_2 X)}{\widehat{\mathrm{Var}}(X)} + \frac{\widehat{\mathrm{Cov}}(X, u)}{\widehat{\mathrm{Var}}(X)} = 0 + \beta_2 + \frac{\widehat{\mathrm{Cov}}(X, u)}{\widehat{\mathrm{Var}}(X)} = \beta_2 + \frac{\widehat{\mathrm{Cov}}(X, u)}{\widehat{\mathrm{Var}}(X)}
By taking the probability limit and using its properties, we can show that the resulting estimator is
inconsistent:
\mathrm{plim}(b_2^{OLS}) = \mathrm{plim}\left(\beta_2 + \frac{\widehat{\mathrm{Cov}}(X, u)}{\widehat{\mathrm{Var}}(X)}\right) = \beta_2 + \frac{\mathrm{plim}(\widehat{\mathrm{Cov}}(X, u))}{\mathrm{plim}(\widehat{\mathrm{Var}}(X))} = \beta_2 + \frac{\mathrm{Cov}(X, u)}{\mathrm{Var}(X)} \neq \beta_2,
as Cov(𝑋, 𝑢) ≠ 0
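This inconsistency is easy to check numerically. A minimal sketch (not part of the lecture; assumes NumPy is installed, and the design below is illustrative): with Cov(𝑋, 𝑢) = 1 and Var(𝑋) = 2, the OLS slope settles at 𝛽2 + 1/2 rather than 𝛽2, no matter how large the sample.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1_000_000
beta1, beta2 = 1.0, 2.0

u = rng.normal(size=n)
X = rng.normal(size=n) + u        # endogenous regressor: Cov(X, u) = 1, Var(X) = 2
Y = beta1 + beta2 * X + u

# OLS slope = sample Cov(X, Y) / sample Var(X)
b2_ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)

# plim(b2_OLS) = beta2 + Cov(X, u)/Var(X) = 2 + 1/2 = 2.5, not the true 2
print(round(b2_ols, 3))
```

Increasing `n` only tightens the estimate around 2.5; it never moves it toward the true value of 2.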
Let’s introduce an instrumental variable 𝑍, correlated with 𝑋 and uncorrelated with 𝑢. Let's show that
the estimator of the parameter 𝛽2 obtained with the use of the instrumental variable, defined as
b_2^{IV} = \frac{\widehat{\mathrm{Cov}}(Z, Y)}{\widehat{\mathrm{Cov}}(Z, X)} = \frac{\sum(Z_i - \bar{Z})(Y_i - \bar{Y})}{\sum(Z_i - \bar{Z})(X_i - \bar{X})}
is consistent:
b_2^{IV} = \frac{\widehat{\mathrm{Cov}}(Z, Y)}{\widehat{\mathrm{Cov}}(Z, X)} = \frac{\widehat{\mathrm{Cov}}(Z, \beta_1 + \beta_2 X + u)}{\widehat{\mathrm{Cov}}(Z, X)} = \frac{\widehat{\mathrm{Cov}}(Z, \beta_1)}{\widehat{\mathrm{Cov}}(Z, X)} + \frac{\widehat{\mathrm{Cov}}(Z, \beta_2 X)}{\widehat{\mathrm{Cov}}(Z, X)} + \frac{\widehat{\mathrm{Cov}}(Z, u)}{\widehat{\mathrm{Cov}}(Z, X)} = 0 + \beta_2 + \frac{\widehat{\mathrm{Cov}}(Z, u)}{\widehat{\mathrm{Cov}}(Z, X)}
By taking the probability limit and using its properties, we can show that IV estimator is consistent:
\mathrm{plim}(b_2^{IV}) = \mathrm{plim}\left(\beta_2 + \frac{\widehat{\mathrm{Cov}}(Z, u)}{\widehat{\mathrm{Cov}}(Z, X)}\right) = \beta_2 + \frac{\mathrm{plim}(\widehat{\mathrm{Cov}}(Z, u))}{\mathrm{plim}(\widehat{\mathrm{Cov}}(Z, X))} = \beta_2 + \frac{\mathrm{Cov}(Z, u)}{\mathrm{Cov}(Z, X)} = \beta_2 + \frac{0}{\mathrm{Cov}(Z, X)} = \beta_2,
as Cov(𝑍, 𝑢) = 0 and Cov(𝑍, 𝑋) ≠ 0 by construction
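The contrast between the two estimators can be seen in a short simulation. A sketch (not from the lecture; assumes NumPy, and the coefficients and instrument strength are illustrative): on the same endogenous data, OLS stays away from 𝛽2 while the IV ratio recovers it.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
beta1, beta2 = 1.0, 2.0

Z = rng.normal(size=n)            # instrument: drives X, independent of u
u = rng.normal(size=n)
X = 0.7 * Z + u                   # endogenous regressor: Cov(X, u) = 1
Y = beta1 + beta2 * X + u

# OLS slope is inconsistent: plim = 2 + 1/Var(X) = 2 + 1/1.49
b2_ols = np.cov(X, Y)[0, 1] / np.var(X, ddof=1)
# IV slope = sample Cov(Z, Y) / sample Cov(Z, X): consistent for beta2 = 2
b2_iv = np.cov(Z, Y)[0, 1] / np.cov(Z, X)[0, 1]
print(round(b2_ols, 2), round(b2_iv, 2))
```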
Note that it is not possible to demonstrate that the estimator is unbiased, because we are unable to take the expectation of the ratio: 𝑋 is stochastic and is not distributed independently of 𝑢, i.e.

E(b_2^{IV}) = E\left(\beta_2 + \frac{\widehat{\mathrm{Cov}}(Z, u)}{\widehat{\mathrm{Cov}}(Z, X)}\right) = \beta_2 + E\left(\frac{\widehat{\mathrm{Cov}}(Z, u)}{\widehat{\mathrm{Cov}}(Z, X)}\right) \neq \beta_2 + \frac{E(\widehat{\mathrm{Cov}}(Z, u))}{E(\widehat{\mathrm{Cov}}(Z, X))}
The population variance of the Instrumental Variables estimator of the slope coefficient of the
simple linear regression is given by the following expression (valid for large samples):
\mathrm{Var}(b_2^{IV}) = \sigma^2_{b_2^{IV}} = \frac{\sigma_u^2}{\sum(X_i - \bar{X})^2} \times \frac{1}{r_{X,Z}^2}
Compare this expression to that for the variance of the OLS estimator:
\mathrm{Var}(b_2^{OLS}) = \frac{\sigma_u^2}{\sum(X_i - \bar{X})^2}
The variance of 𝑏2IV can be obtained by multiplying the variance of 𝑏2OLS by the factor 1/r_{X,Z}^2. The higher the correlation between 𝑋 and 𝑍, the smaller this multiplier and, therefore, the smaller the variance of 𝑏2IV. Hence, facing a
choice between several potential instrumental variables, it is necessary to choose the one which is most
strongly correlated with 𝑋 because, other things being equal, it will yield the most efficient estimators.
At the same time, it would be undesirable to use an instrumental variable perfectly correlated with 𝑋,
even if one could be found, because it would automatically be correlated with 𝑢 as well and we would
still get inconsistent estimators. We need an instrumental variable that is as strongly correlated with 𝑋 as possible while remaining uncorrelated with 𝑢.
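The cost of a weak instrument can be illustrated directly from the variance formula. A small sketch (not part of the lecture; the correlation values are arbitrary): the multiplier 1/r_{X,Z}^2 blows up quickly as the correlation between 𝑋 and 𝑍 weakens.

```python
# Variance inflation of the IV estimator relative to OLS (large-sample formula):
# Var(b2_IV) = Var(b2_OLS) * 1 / r_XZ^2
for r in (0.9, 0.5, 0.2):
    inflation = 1.0 / r**2
    print(f"r_XZ = {r}: Var(b2_IV) = {inflation:.2f} x Var(b2_OLS)")
```

With r = 0.2 the IV variance is already 25 times the OLS variance, which is why the strongest valid instrument should be chosen.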
Example: Instrumental variables in Friedman’s model (Nissan Liviatan)
To illustrate the use of instrumental variables method, we will show how it can be applied to
solve the problem of inconsistency arising when Friedman’s consumption function is estimated (see
Lecture 10). Suppose, we have data on consumption and income of a sample of households in two
sequential years. We shall denote consumption and income in the first year by C1 and Y1 and those in the
second year by C2 and Y2.
If Friedman’s theory is correct, Y2 can act as an instrumental variable for Y1. Obviously, it is likely
to be closely correlated with Y1, so that one of the two requirements for a valid instrumental variable is
satisfied. Second, if the transitory components of the measured income in different years are
uncorrelated, as Friedman assumed, Y2 will be uncorrelated with the disturbance term in the regression of C1 on Y1; thus, the other condition is also satisfied.
It is also possible to use C2 as an instrumental variable for Y1. It is strongly correlated with Y2 (and,
therefore, with Y1 as well) while being uncorrelated with the disturbance term in the relationship between
C1 and Y1 (if, in accordance with Friedman’s hypothesis, the transitory components of consumption are
uncorrelated with one another).
Similarly, it is possible to estimate regressions for the second year consumption, using C1 and Y1
as instrumental variables for Y2.
The approach described above, where a lagged explanatory variable is used as an instrument, is quite
often employed in econometric modelling. In fact, economic variables are frequently connected by both
direct and inverse relationships. For example, in the model 𝑌 = 𝛽1 + 𝛽2 𝑋 + 𝑢 the value of 𝑋 can itself
depend on the corresponding value of 𝑌 (endogeneity problem). This leads to a violation of the 4-th
Gauss-Markov condition, i.e. to a correlation between the explanatory variable and the disturbance term,
and, consequently, to biased and inconsistent OLS estimators. If, however, a lagged value of 𝑋 is used as an explanatory
variable and if autocorrelation is absent, then consistent estimators can be obtained, which will also be
quite reliable, since the subsequent values of 𝑋 are strongly correlated (this is the case in most economic
models).
Asymptotic and finite-sample distributions of the IV estimator
The distribution of the IV estimator degenerates to a spike as 𝑛 → ∞. To see this, the expression for the variance may be rewritten as follows.
𝜎𝑢2 1 𝜎𝑢2 1 𝜎𝑢2 1
Var(𝑏2IV ) = 𝜎𝑏2IV = × = × = ×
2 ∑(𝑋𝑖 − 𝑋̄)2 𝑟𝑋,𝑍 𝑛 (1 ∑(𝑋 − 𝑋̄)2 ) 𝑟𝑋,𝑍 𝑛 MSD(𝑋) 𝑟𝑋,𝑍
2 2 2
𝑛 𝑖
MSD(𝑋) is the mean square deviation of 𝑋. By a law of large numbers (LLN), MSD(𝑋) tends to the population variance of 𝑋, which is non-zero. Therefore, as 𝑛 appears in the denominator, the variance of 𝑏2IV is inversely related to 𝑛: it tends to zero for large 𝑛, and the distribution of 𝑏2IV collapses to a spike at 𝛽2.
Next, let’s show asymptotic normality. Consider the distribution of √𝑛(𝑏2IV − 𝛽2). Now the problem of diminishing variance is eliminated: this quantity has a limiting distribution with zero mean and stable variance, i.e.
\mathrm{Var}\left(\sqrt{n}(b_2^{IV} - \beta_2)\right) = \frac{\sigma_u^2}{\mathrm{MSD}(X)} \times \frac{1}{r_{X,Z}^2}

E\left(\sqrt{n}(b_2^{IV} - \beta_2)\right) = \sqrt{n}\,E(b_2^{IV} - \beta_2) \to \sqrt{n}(\beta_2 - \beta_2) = 0 \text{ as } n \to \infty
It can be shown that a central limit theorem (CLT) can be applied to demonstrate that
√𝑛(𝑏2IV − 𝛽2 ) has the limiting normal distribution:
\sqrt{n}(b_2^{IV} - \beta_2) \xrightarrow{d} N\left(0, \frac{\sigma_u^2}{\sigma_X^2} \times \frac{1}{r_{X,Z}^2}\right)

(b_2^{IV} - \beta_2) \sim N\left(0, \frac{\sigma_u^2}{n\,\mathrm{MSD}(X)} \times \frac{1}{r_{X,Z}^2}\right)
Hence, as an approximation, for sufficiently large samples, 𝑏2IV is distributed as follows:
b_2^{IV} \sim N\left(\beta_2, \frac{\sigma_u^2}{n\,\mathrm{MSD}(X)} \times \frac{1}{r_{X,Z}^2}\right)
This result can be used for performing the usual tests. However, from the mathematical point of view, such concepts as “sufficiently large samples” and “an approximation” are not well defined. That is why the analysis is carried out by means of a Monte Carlo experiment. Suppose we set up the following model, where 𝑍, 𝑉, and 𝑢 are drawn independently from a normal distribution with mean zero and unit variance:
𝑌 = 𝛽1 + 𝛽2 𝑋 + 𝑢
𝑋 = 𝜆1 𝑍 + 𝜆2 𝑉 + 𝑢
We will treat 𝑍 and 𝑉 as variables and 𝑢 as the disturbance term in the model. 𝜆1 and 𝜆2 are constants.
Setting 𝛽1 = 10, 𝛽2 = 5, 𝜆1 = 0.5, 𝜆2 = 2, the model becomes:
𝑌 = 10 + 5𝑋 + 𝑢
𝑋 = 0.5𝑍 + 2.0𝑉 + 𝑢
By this construction, 𝑋 is not distributed independently of 𝑢, i.e.
Cov( 𝑋, 𝑢) = Cov( 0.5𝑍 + 2.0𝑉 + 𝑢, 𝑢) = Cov(𝑢, 𝑢) = Var( 𝑢) = 1
OLS will give inconsistent estimates. Let’s calculate the large-sample bias:

\mathrm{plim}(b_2^{OLS}) = \mathrm{plim}\left(\beta_2 + \frac{\widehat{\mathrm{Cov}}(X, u)}{\widehat{\mathrm{Var}}(X)}\right) = \beta_2 + \frac{\mathrm{Cov}(X, u)}{\mathrm{Var}(X)}

\mathrm{Var}(X) = \mathrm{Var}(0.5Z + 2.0V + u) = 0.25\,\mathrm{Var}(Z) + 4\,\mathrm{Var}(V) + \mathrm{Var}(u) = 0.25 + 4 + 1 = 5.25

\mathrm{plim}(b_2^{OLS}) = 5 + \frac{1}{5.25} \approx 5.19
At the same time, Z can serve as an instrument, correlated with X, but independent of u. Hence,
plim(𝑏2𝐼𝑉 ) = 5.
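The experiment can be reproduced with a short simulation. A sketch (not from the lecture itself; assumes NumPy and uses 2,000 replications rather than the lecture’s 10 million to keep the run fast):

```python
import numpy as np

rng = np.random.default_rng(42)
n, n_samples = 3200, 2000
beta1, beta2, lam1, lam2 = 10.0, 5.0, 0.5, 2.0

b2_ols = np.empty(n_samples)
b2_iv = np.empty(n_samples)
for s in range(n_samples):
    Z = rng.normal(size=n)
    V = rng.normal(size=n)
    u = rng.normal(size=n)
    X = lam1 * Z + lam2 * V + u              # X is correlated with u by construction
    Y = beta1 + beta2 * X + u
    Xc, Zc, Yc = X - X.mean(), Z - Z.mean(), Y - Y.mean()
    b2_ols[s] = (Xc @ Yc) / (Xc @ Xc)        # OLS slope
    b2_iv[s] = (Zc @ Yc) / (Zc @ Xc)         # IV slope with instrument Z

# OLS centers near 5 + 1/5.25 ≈ 5.19; IV centers near the true value 5
print(round(b2_ols.mean(), 2), round(b2_iv.mean(), 2))
```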
The diagrams below show the distributions of the OLS and IV estimators for different sample
sizes (𝑛 = 25, 𝑛 = 100, 𝑛 = 3200) for 10 million samples:
[Figure: distributions of the OLS and IV estimators for 𝑛 = 25, 𝑛 = 100, and 𝑛 = 3,200; plim(𝑏2OLS) ≈ 5.19, plim(𝑏2IV) = 5]

It is evident that the IV estimators have greater variances, while the OLS estimators are biased. Note that for a small sample size (𝑛 = 25) one might prefer the OLS estimator according to some criterion such as the mean square error:

\mathrm{MSE}(b_2) = (\mathrm{bias}(b_2))^2 + \mathrm{Var}(b_2)

For 𝑛 = 100 the IV estimator looks better. For a large sample size (𝑛 = 3200), both estimators tend to their predicted limits; the IV estimator is definitely better here.
Let’s consider the distribution of √𝑛(𝑏2IV − 𝛽2) for different sample sizes. The dashed red line shows the limiting normal distribution predicted by the CLT. As can be seen from the diagram below, for 𝑛 = 3200 the distribution is very close to the limiting normal one, while for small sample sizes (𝑛 = 25 and 𝑛 = 100) it is definitely non-normal. In fact, it has fat tails, which increase the probability of committing a Type I error. This distortion for small sample sizes is partly explained by the low correlation between 𝑋 and 𝑍, which equals 0.22 in this case. In other words, the chosen instrument is not strong. However, it is often difficult to find any credible instrument at all.
[Figure: distributions of √𝑛(𝑏2IV − 𝛽2) for 𝑛 = 25, 𝑛 = 100, and 𝑛 = 3,200, together with the limiting normal distribution]
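The fat-tail claim can be checked numerically. A sketch (not part of the lecture; assumes NumPy, and the replication count is illustrative): using the same design, where Corr(𝑋, 𝑍) ≈ 0.22, generate many samples of size 𝑛 = 25, standardize √𝑛(𝑏2IV − 𝛽2) by the CLT-predicted standard deviation, and compare the tail mass beyond ±3 with the normal benchmark of roughly 0.27%.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 25, 50_000

Z = rng.normal(size=(reps, n))
V = rng.normal(size=(reps, n))
u = rng.normal(size=(reps, n))
X = 0.5 * Z + 2.0 * V + u         # same design: Corr(X, Z) ≈ 0.22 (weak instrument)
Y = 10.0 + 5.0 * X + u

Zc = Z - Z.mean(axis=1, keepdims=True)
Xc = X - X.mean(axis=1, keepdims=True)
b2_iv = (Zc * Y).sum(axis=1) / (Zc * Xc).sum(axis=1)   # IV slope in each sample

# CLT-predicted variance of sqrt(n)*(b2_iv - beta2):
# sigma_u^2 / (sigma_X^2 * r^2) = 1 / (5.25 * 0.25/5.25) = 4, so sd = 2
stat = np.sqrt(n) * (b2_iv - 5.0) / 2.0

tail_frac = (np.abs(stat) > 3).mean()   # normal benchmark: about 0.0027
print(tail_frac)
```

The empirical tail mass comes out far above the normal benchmark, confirming that at 𝑛 = 25 the normal approximation understates the risk of a Type I error.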