Economics 620, Lecture 11: Generalized Least Squares (GLS)
Nicholas M. Kiefer
Cornell University
In this lecture, we will consider the model $y = X\beta + \varepsilon$, retaining the assumption $Ey = X\beta$.

However, we no longer have the assumption $V(y) = V(\varepsilon) = \sigma^2 I$. Instead we add the assumption $V(y) = V$, where $V$ is positive definite. Sometimes we take $V = \sigma^2\Omega$ with $\operatorname{tr}\Omega = N$.

As we know, $\hat\beta = (X'X)^{-1}X'y$. What is $E\hat\beta$?

Note that $V(\hat\beta) = (X'X)^{-1}X'VX(X'X)^{-1}$ in this case.

Is $\hat\beta$ BLUE? Does $\hat\beta$ minimize $e'e$?
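As a numerical illustration of the sandwich form of $V(\hat\beta)$ above, here is a minimal sketch, assuming numpy and an AR(1)-style correlation matrix as one example of a positive definite $V$; all names ($N$, $K$, rho) are illustrative, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 200, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])

# An AR(1)-style correlation matrix as one example of a positive definite V (sigma^2 = 1).
rho = 0.6
V = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

XtX_inv = np.linalg.inv(X.T @ X)
sandwich = XtX_inv @ X.T @ V @ X @ XtX_inv   # V(beta_hat) = (X'X)^{-1} X'VX (X'X)^{-1}
naive = XtX_inv                              # what the V = I formula would report
print(np.diag(sandwich) / np.diag(naive))    # ratios away from 1: the usual formula is off
```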
The basic idea behind GLS is to transform the observation matrix $[y\ X]$ so that the variance in the transformed model is $I$ (or $\sigma^2 I$).

Since $V$ is positive definite, $V^{-1}$ is positive definite too. Therefore, there exists a nonsingular matrix $P$ such that $V^{-1} = P'P$.

Transforming the model $y = X\beta + \varepsilon$ by $P$ yields $Py = PX\beta + P\varepsilon$.

Note that $EP\varepsilon = PE\varepsilon = 0$ and
$$V(P\varepsilon) = PE\varepsilon\varepsilon'P' = PVP' = P(P'P)^{-1}P' = I.$$
(We could have done this with $V = \sigma^2\Omega$ and imposed $\operatorname{tr}\Omega = N$ if useful.) That is, the transformed model $Py = PX\beta + P\varepsilon$ satisfies the conditions under which we developed least squares estimators.
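The factor $P$ is not unique. A minimal sketch of one common construction, assuming numpy and a made-up $V$: take the Cholesky factorization $V = LL'$ and set $P = L^{-1}$, so that $P'P = V^{-1}$ and $PVP' = I$.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 5
A = rng.normal(size=(N, N))
V = A @ A.T + N * np.eye(N)          # any positive definite V for the sketch

L = np.linalg.cholesky(V)            # V = L L'
P = np.linalg.inv(L)                 # then P'P = (L L')^{-1} = V^{-1}

print(np.allclose(P.T @ P, np.linalg.inv(V)))  # True
print(np.allclose(P @ V @ P.T, np.eye(N)))     # V(P eps) = P V P' = I
```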
Thus, the LS estimator is BLUE in the transformed model. The LS estimator for $\beta$ in the model $Py = PX\beta + P\varepsilon$ is referred to as the GLS estimator for $\beta$ in the model $y = X\beta + \varepsilon$.

Proposition: The GLS estimator for $\beta$ is
$$\hat\beta_G = (X'V^{-1}X)^{-1}X'V^{-1}y.$$

Proof: Apply LS to the transformed model. Thus,
$$\hat\beta_G = (X'P'PX)^{-1}X'P'Py = (X'V^{-1}X)^{-1}X'V^{-1}y.$$

Proposition: $V(\hat\beta_G) = (X'V^{-1}X)^{-1}$.

Proof: Note that $\hat\beta_G - \beta = (X'V^{-1}X)^{-1}X'V^{-1}\varepsilon$. Thus,
$$V(\hat\beta_G) = E(X'V^{-1}X)^{-1}X'V^{-1}\varepsilon\varepsilon'V^{-1}X(X'V^{-1}X)^{-1} = (X'V^{-1}X)^{-1}X'V^{-1}VV^{-1}X(X'V^{-1}X)^{-1} = (X'V^{-1}X)^{-1}.$$
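A short numerical check of the two propositions, with illustrative data (a heteroskedastic diagonal $V$ and arbitrary coefficients): LS applied to the transformed data $(Py, PX)$ reproduces the closed-form $\hat\beta_G$, and $(X'V^{-1}X)^{-1}$ is the corresponding variance matrix.

```python
import numpy as np

rng = np.random.default_rng(2)
N, K = 100, 3
X = np.column_stack([np.ones(N), rng.normal(size=(N, K - 1))])
V = np.diag(rng.uniform(0.5, 3.0, size=N))            # heteroskedastic V for the example
y = X @ np.array([1.0, 2.0, -0.5]) + rng.multivariate_normal(np.zeros(N), V)

Vinv = np.linalg.inv(V)
beta_G = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)   # (X'V^{-1}X)^{-1} X'V^{-1} y

P = np.linalg.inv(np.linalg.cholesky(V))                   # P'P = V^{-1}
beta_transformed, *_ = np.linalg.lstsq(P @ X, P @ y, rcond=None)

print(np.allclose(beta_G, beta_transformed))               # True: GLS is LS on the transformed model
print(np.linalg.inv(X.T @ Vinv @ X))                       # V(beta_G) = (X'V^{-1}X)^{-1}
```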
Aitken's Theorem: The GLS estimator is BLUE. (This really follows from the Gauss-Markov Theorem, but let's give a direct proof.)

Proof: Let $b$ be an alternative linear unbiased estimator such that
$$b = [(X'V^{-1}X)^{-1}X'V^{-1} + A]y.$$
Unbiasedness implies that $AX = 0$.
$$V(b) = [(X'V^{-1}X)^{-1}X'V^{-1} + A]\,V\,[(X'V^{-1}X)^{-1}X'V^{-1} + A]' = (X'V^{-1}X)^{-1} + AVA' + (X'V^{-1}X)^{-1}X'A' + AX(X'V^{-1}X)^{-1}.$$
The last two terms are zero. (Why?)
The second term is positive semi-definite, so $A = 0$ is best.
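To see Aitken's theorem at work, here is a small Monte Carlo sketch; the heteroskedastic design and every name in it are assumptions for illustration only. Across replications, the sampling variances of the GLS coefficients are no larger than those of LS.

```python
import numpy as np

rng = np.random.default_rng(3)
N, R = 200, 2000
x = rng.normal(size=N)
X = np.column_stack([np.ones(N), x])
sd = np.exp(np.abs(x))                             # heteroskedastic standard deviations
Vinv = np.diag(1.0 / sd**2)
beta = np.array([1.0, 2.0])

A = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv)    # rows of the GLS estimator
B = np.linalg.solve(X.T @ X, X.T)                  # rows of the LS estimator
ols, gls = [], []
for _ in range(R):
    y = X @ beta + sd * rng.normal(size=N)
    ols.append(B @ y)
    gls.append(A @ y)

print(np.var(np.array(ols), axis=0))               # LS sampling variances
print(np.var(np.array(gls), axis=0))               # smaller, as Aitken's theorem predicts
```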
What does GLS minimize?

Recall that $(y - Xb)'(y - Xb)$ is minimized by $b = \hat\beta$ (i.e., $(y - Xb)$ is minimized in length by $b = \hat\beta$).

Consider $P(y - Xb)$. The length of this vector is
$$(y - Xb)'P'P(y - Xb) = (y - Xb)'V^{-1}(y - Xb).$$
Thus, GLS minimizes $P(y - Xb)$ in length.

Let $\tilde e = (y - X\hat\beta_G)$. Note that $\tilde e$ satisfies
$$X'V^{-1}(y - X\hat\beta_G) = X'V^{-1}\tilde e = 0.$$
(Why?) Then
$$(y - Xb)'V^{-1}(y - Xb) = (y - X\hat\beta_G)'V^{-1}(y - X\hat\beta_G) + (b - \hat\beta_G)'X'V^{-1}X(b - \hat\beta_G).$$
Note that $X'\tilde e \neq 0$ in general.
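A quick numerical check of these orthogonality claims, with made-up data: the GLS residual is orthogonal to the columns of $X$ in the $V^{-1}$ inner product, but not in the ordinary one.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50
X = np.column_stack([np.ones(N), rng.normal(size=N)])
V = np.diag(rng.uniform(0.5, 2.0, size=N))
y = X @ np.array([1.0, -1.0]) + rng.multivariate_normal(np.zeros(N), V)

Vinv = np.linalg.inv(V)
beta_G = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
e_tilde = y - X @ beta_G

print(X.T @ Vinv @ e_tilde)   # essentially the zero vector
print(X.T @ e_tilde)          # generally nonzero
```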
Estimation of Variance

Let $V(y) = \sigma^2\Omega$ where $\operatorname{tr}\Omega = N$.

Choose $P$ so that $P'P = \Omega^{-1}$. Then the variance in the transformed model $Py = PX\beta + P\varepsilon$ is $\sigma^2 I$. Our standard formula gives $s^2 = \tilde e'\tilde e/(N - K)$, which is the unbiased estimator for $\sigma^2$ (here $\tilde e$ denotes the residual from the transformed model, $Py - PX\hat\beta_G$).
Now we add the assumption of normality: $y \sim N(X\beta, \sigma^2\Omega)$.

Consider the log likelihood:
$$\ell(\beta, \sigma^2) = c - \frac{N}{2}\ln\sigma^2 - \frac{1}{2}\ln|\Omega| - \frac{1}{2\sigma^2}(y - X\beta)'\Omega^{-1}(y - X\beta).$$

Proposition: The GLS estimator is the ML estimator for $\beta$. (Why?)
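A sketch of the variance estimate in the transformed model; the design, including the normalization $\operatorname{tr}\Omega = N$, is an illustrative assumption. It computes $s^2 = \tilde e'\tilde e/(N-K)$ from the transformed residuals, alongside the ML variant $\tilde e'\tilde e/N$ discussed below.

```python
import numpy as np

rng = np.random.default_rng(5)
N, K = 120, 2
X = np.column_stack([np.ones(N), rng.normal(size=N)])
Omega = np.diag(rng.uniform(0.5, 2.0, size=N))
Omega *= N / np.trace(Omega)                        # normalize so tr(Omega) = N
sigma2 = 4.0
y = X @ np.array([1.0, 2.0]) + rng.multivariate_normal(np.zeros(N), sigma2 * Omega)

P = np.linalg.inv(np.linalg.cholesky(Omega))        # P'P = Omega^{-1}
Py, PX = P @ y, P @ X
beta_G, *_ = np.linalg.lstsq(PX, Py, rcond=None)
e_t = Py - PX @ beta_G                              # residual in the transformed model
print((e_t @ e_t) / (N - K))                        # unbiased estimator of sigma^2
print((e_t @ e_t) / N)                              # ML version divides by N
```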
Proposition: $\sigma^2_{ML} = \tilde e'\tilde e/N$ (as expected).

Proposition: $\hat\beta_G$ and $\tilde e$ are independent. (How would you prove this?)

Testing:

Testing procedures are as in the ordinary model. Results we have developed under the standard set-up will be applied to the transformed model.
When does $\hat\beta_G = \hat\beta$?

1. $\hat\beta_G = \hat\beta$ holds trivially when $\sigma^2 I = V$.

2. $\hat\beta = (X'X)^{-1}X'y$ and $\hat\beta_G = (X'V^{-1}X)^{-1}X'V^{-1}y$, so
$$\hat\beta_G = \hat\beta \;\Rightarrow\; (X'X)^{-1}X' = (X'V^{-1}X)^{-1}X'V^{-1} \;\Rightarrow\; VX = X(X'V^{-1}X)^{-1}X'X = XR.$$
(What are the dimensions of these matrices?)

Interpretation: In the case where $K = 1$, $X$ is an eigenvector of $V$. In general, if the columns of $X$ are each linear combinations of the same $K$ eigenvectors of $V$, then $\hat\beta_G = \hat\beta$. This is hard to check and would usually be a bad assumption.
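The eigenvector condition can be checked in a constructed example. This sketch (entirely illustrative, not from the lecture) builds the columns of $X$ from $K$ eigenvectors of $V$ and confirms that LS and GLS coincide.

```python
import numpy as np

rng = np.random.default_rng(6)
N, K = 30, 2
Q = np.linalg.qr(rng.normal(size=(N, N)))[0]         # orthonormal eigenvectors
lam = rng.uniform(0.5, 3.0, size=N)                  # positive eigenvalues
V = Q @ np.diag(lam) @ Q.T

X = Q[:, :K] @ rng.normal(size=(K, K))               # columns are combinations of K eigenvectors of V
y = rng.normal(size=N)

Vinv = np.linalg.inv(V)
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(np.allclose(beta_ls, beta_gls))                # True
```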
Example: Equicorrelated case: $V(y) = V = I + c\,\mathbf{1}\mathbf{1}'$, where $\mathbf{1}$ is an $N$-vector of ones.

The LS estimator is the same as the GLS estimator if $X$ has a column of ones.

Case of unknown $\Omega$:

Note that there is no hope of estimating $\Omega$, since there are $N(N+1)/2$ parameters and only $N$ observations. Thus, we usually make some parametric restriction such as $\Omega = \Omega(\theta)$ with $\theta$ a fixed parameter. Then we can hope to estimate $\theta$ consistently, using squares and cross products of LS residuals, or we could use ML.

Note that it doesn't make sense to try to consistently estimate $\Omega$, since it grows with the sample size.
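A sketch of the equicorrelated example with illustrative values of $N$ and $c$: when $X$ contains a column of ones, LS and GLS give the same coefficients.

```python
import numpy as np

rng = np.random.default_rng(7)
N, c = 40, 0.8
ones = np.ones(N)
V = np.eye(N) + c * np.outer(ones, ones)             # equicorrelated V = I + c 11'

X = np.column_stack([ones, rng.normal(size=N)])      # includes the column of ones
y = rng.normal(size=N)

Vinv = np.linalg.inv(V)
beta_ls = np.linalg.solve(X.T @ X, X.T @ y)
beta_gls = np.linalg.solve(X.T @ Vinv @ X, X.T @ Vinv @ y)
print(np.allclose(beta_ls, beta_gls))                # True when X has a column of ones
```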
Thus, consistency refers to the estimate of $\theta$.

Definition: $\hat\Omega = \Omega(\hat\theta)$ is a consistent estimator of $\Omega$ if and only if $\hat\theta$ is a consistent estimator of $\theta$.

Feasible GLS (FGLS) is the estimation method used when $\Omega$ is unknown. FGLS is the same as GLS except that it uses an estimate of $\Omega$, say $\hat\Omega = \Omega(\hat\theta)$, instead of $\Omega$.

Proposition:
$$\hat\beta_{FG} = (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}y.$$

Note that $\hat\beta_{FG} = \beta + (X'\hat\Omega^{-1}X)^{-1}X'\hat\Omega^{-1}\varepsilon$. The following proposition follows easily from this decomposition of $\hat\beta_{FG}$.
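A minimal two-step FGLS sketch. The parametric skedastic model $\operatorname{Var}(\varepsilon_i) = \exp(\theta z_i)$, the regressor names, and the use of log squared LS residuals to estimate $\theta$ are illustrative assumptions, not the lecture's prescription.

```python
import numpy as np

rng = np.random.default_rng(8)
N = 300
z = rng.normal(size=N)
X = np.column_stack([np.ones(N), z])
theta_true = 1.0
eps = np.exp(0.5 * theta_true * z) * rng.normal(size=N)   # Var(eps_i) = exp(theta * z_i)
y = X @ np.array([1.0, 2.0]) + eps

# Step 1: LS residuals, then estimate theta from log squared residuals.
e_ls = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
gamma = np.linalg.solve(X.T @ X, X.T @ np.log(e_ls**2 + 1e-12))
theta_hat = gamma[1]

# Step 2: plug Omega(theta_hat) into the GLS formula.
omega_inv_diag = np.exp(-theta_hat * z)                    # diagonal of Omega_hat^{-1}
XtOi = X.T * omega_inv_diag                                # X' Omega_hat^{-1}
beta_fgls = np.linalg.solve(XtOi @ X, XtOi @ y)
print(theta_hat, beta_fgls)
```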
Proposition: Sufficient conditions for $\hat\beta_{FG}$ to be consistent are
$$\operatorname{plim}\frac{X'\hat\Omega^{-1}X}{N} = Q,$$
where $Q$ is positive definite and finite, and
$$\operatorname{plim}\frac{X'\hat\Omega^{-1}\varepsilon}{N} = 0.$$

It takes a little more to get a distribution theory. From our discussion of $\hat\beta_G$, it easily follows that
$$\sqrt{N}(\hat\beta_G - \beta) \to N\!\left(0,\ \sigma^2\left(\frac{X'\Omega^{-1}X}{N}\right)^{-1}\right).$$
What about the distribution of $\hat\beta_{FG}$ when $\Omega$ is unknown?

Proposition: Sufficient conditions for $\hat\beta_{FG}$ and $\hat\beta_G$ to have the same asymptotic distribution are that
$$\operatorname{plim}\frac{X'(\hat\Omega^{-1} - \Omega^{-1})X}{N} = 0 \quad\text{and}\quad \operatorname{plim}\frac{X'(\hat\Omega^{-1} - \Omega^{-1})\varepsilon}{\sqrt{N}} = 0.$$

Proof: Note that
$$\sqrt{N}(\hat\beta_G - \beta) = \left(\frac{X'\Omega^{-1}X}{N}\right)^{-1}\frac{X'\Omega^{-1}\varepsilon}{\sqrt{N}}
\quad\text{and}\quad
\sqrt{N}(\hat\beta_{FG} - \beta) = \left(\frac{X'\hat\Omega^{-1}X}{N}\right)^{-1}\frac{X'\hat\Omega^{-1}\varepsilon}{\sqrt{N}}.$$
Thus
$$\operatorname{plim}\sqrt{N}(\hat\beta_G - \hat\beta_{FG}) = 0$$
if
$$\operatorname{plim}\frac{X'\hat\Omega^{-1}X}{N} = \operatorname{plim}\frac{X'\Omega^{-1}X}{N}
\quad\text{and}\quad
\operatorname{plim}\frac{X'\hat\Omega^{-1}\varepsilon}{\sqrt{N}} = \operatorname{plim}\frac{X'\Omega^{-1}\varepsilon}{\sqrt{N}}.$$

We are done. (Recall that $\operatorname{plim}(x - y) = 0$ implies that the random variables $x$ and $y$ have the same asymptotic distribution.)
Summing up:
Consistency of $\hat\theta$ implies consistency of the FGLS estimator. A little more is required for the FGLS estimator to have the same asymptotic distribution as the GLS estimator. These conditions are usually met.
Small-sample properties of FGLS estimators:

Proposition: Suppose $\hat\theta$ is an even function of $\varepsilon$ (i.e., $\hat\theta(\varepsilon) = \hat\theta(-\varepsilon)$). (This holds if $\hat\theta$ depends on squares and cross products of residuals.) Suppose $\varepsilon$ has a symmetric distribution. Then $E\hat\beta_{FG} = \beta$ if the mean exists.

Proof: The sampling error
$$\hat\beta_{FG} - \beta = (X'\Omega(\hat\theta)^{-1}X)^{-1}X'\Omega(\hat\theta)^{-1}\varepsilon$$
has a symmetric distribution around zero, since $\varepsilon$ and $-\varepsilon$ give the same value of $\hat\Omega$. If the mean exists, it is zero.

Note that this property is weak. It is easily obtained, but it is not very useful.
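A Monte Carlo sketch of the symmetry argument, under an assumed skedastic design: the variance-model estimate below depends on $\varepsilon$ only through squared LS residuals, so it is even in $\varepsilon$, and with symmetric errors the average FGLS sampling error is close to zero.

```python
import numpy as np

rng = np.random.default_rng(9)
N, R = 100, 2000
z = rng.normal(size=N)
X = np.column_stack([np.ones(N), z])
beta = np.array([1.0, 2.0])

est = []
for _ in range(R):
    eps = np.exp(0.5 * z) * rng.normal(size=N)       # symmetric, heteroskedastic errors
    y = X @ beta + eps
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)    # LS residuals
    g = np.linalg.solve(X.T @ X, X.T @ np.log(e**2 + 1e-12))
    w = np.exp(-X @ g)                               # Omega_hat^{-1} diagonal; even in eps
    XtOi = X.T * w
    est.append(np.linalg.solve(XtOi @ X, XtOi @ y))

print(np.mean(est, axis=0) - beta)                   # close to the zero vector
```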
General advice:

- Do not use too many parameters in estimating the variance-covariance matrix, or the increase in sampling variance will outweigh the decrease in asymptotic variance.

- Always calculate LS as well as GLS estimators. What are the data telling you if these differ a lot?