
14.384 Time Series Analysis, Fall 2007
Professor Anna Mikusheva
Paul Schrimpf, scribe
September 11, 2007
revised September 9, 2013

Lecture 2

Limit Theorems, OLS, and HAC


Limit Theorems

What are limit theorems? They are laws describing the behavior of sums of many random variables. The most commonly used are the Law of Large Numbers and the Central Limit Theorem. In fact, these are two sets of theorems rather than just two theorems (different assumptions about moment conditions, dependence, and the way of summing can lead to similar statements). The most generic form is:

If $\{x_i\}$ is a sequence of independent identically distributed (iid) random variables with $E x_i = \mu$ and $\operatorname{Var}(x_i) = \sigma^2$, then

1. Law of Large Numbers (LLN): $\frac{1}{n}\sum_{i=1}^n x_i \to \mu$ (in $L_2$, a.s., in probability)

2. Central Limit Theorem (CLT): $\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^n x_i - \mu\right)\big/\sigma \Rightarrow N(0, 1)$
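As a quick numerical sanity check, here is a minimal simulation sketch of the CLT; the exponential distribution and the sample sizes are arbitrary choices of mine, not from the notes:

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 500, 10_000

# iid exponential(1) draws: mu = 1, sigma^2 = 1
x = rng.exponential(scale=1.0, size=(reps, n))

# normalized sums sqrt(n)*(xbar - mu)/sigma should be approximately N(0,1)
z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0
print(round(z.mean(), 3), round(z.std(), 3))  # close to 0 and 1
```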

We stated these while assuming independence. In time series, we usually don't have independence. Let us explore where independence may have been used.
First, let's start with the simplest proof of the LLN:

$$E\left(\frac{1}{n}\sum_{i=1}^n x_i - \mu\right)^2 = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^n x_i\right) \quad (1)$$

$$= \frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^n x_i\right) \quad (2)$$

$$= \frac{1}{n^2}\sum_{i=1}^n \operatorname{Var}(x_i) \quad (3)$$

$$= \frac{n\sigma^2}{n^2} \to 0 \quad (4)$$

We used independence to go from (2) to (3).


Without independence, we'd have

$$\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^n x_i\right) = \frac{1}{n^2}\sum_{i=1}^n\sum_{j=1}^n \operatorname{cov}(x_i, x_j)$$

$$= \frac{1}{n^2}\left(n\gamma_0 + 2(n-1)\gamma_1 + 2(n-2)\gamma_2 + \dots\right)$$

$$= \frac{1}{n}\left[2\sum_{k=1}^{n}\gamma_k\left(1 - \frac{k}{n}\right) + \gamma_0\right]$$



If we assume absolute summability, i.e. $\sum_{j=-\infty}^{\infty}|\gamma_j| < \infty$, then

$$\lim_{n\to\infty} \frac{1}{n}\left[2\sum_{k=1}^{n}\gamma_k\left(1 - \frac{k}{n}\right) + \gamma_0\right] = 0$$

Thus, we have:

Lemma 1. If $x_t$ is a weakly stationary time series (with mean $\mu$) with absolutely summable autocovariances, then a law of large numbers holds (in probability and $L_2$).

Remark 2. Stationarity is not enough. Let $z \sim N(0, \sigma^2)$ and suppose $x_t = z$ for all $t$. Then $\operatorname{cov}(x_t, x_s) = \sigma^2$ for all $t, s$, so we do not have absolute summability, and clearly we do not have an LLN for $\{x_t\}$, since the average $\frac{1}{n}\sum_{i=1}^n x_i$ equals $z$, which is random.

Remark 3. For an MA($\infty$), $x_t = c(L)e_t$, the condition $\sum_{j=0}^{\infty}|c_j| < \infty$ implies $\sum |\gamma_j| < \infty$.

The proof is easy. Last time we showed that

$$\gamma_k = \sigma^2 \sum_{j=0}^{\infty} c_j c_{j+k},$$

then

$$\sum_{k=0}^{\infty}|\gamma_k| = \sigma^2\sum_{k=0}^{\infty}\Big|\sum_{j=0}^{\infty} c_j c_{j+k}\Big| \le \sigma^2\sum_{k=0}^{\infty}\sum_{j=0}^{\infty}|c_j||c_{j+k}| \le \sigma^2\sum_{l=0}^{\infty}\sum_{j=0}^{\infty}|c_j||c_l| = \sigma^2\Big(\sum_{j=0}^{\infty}|c_j|\Big)^2 < \infty$$

From the new proof of the LLN one can guess that the variance in a central limit theorem should change. Remember that we wish to normalize the sum in such a way that the limiting variance is 1:

$$\operatorname{Var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^n x_i\right) = \gamma_0 + 2\sum_{k=1}^{n}\gamma_k\left(1 - \frac{k}{n}\right) \to \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k = J$$

$J$ is called the long-run variance and is the correct scale measure.


There are many central limit theorems for serially correlated observations. The simplest is for MA($\infty$).

Theorem 4. Let $y_t = \mu + \sum_{j=0}^{\infty} c_j e_{t-j}$, where $e_t$ is independent white noise and $\sum_{j=0}^{\infty}|c_j| < \infty$. Then

$$\sqrt{T}\left(\frac{1}{T}\sum_{t=1}^{T} y_t - \mu\right) \Rightarrow N(0, J)$$

For another version we have to introduce the following notation.



Let $I_t$ be the information available at time $t$, i.e., $I_t$ is the sigma-algebra generated by $\{y_j\}_{j=-\infty}^{t}$. Let

$$\xi_{t,k} = E[y_t|I_{t-k}] - E[y_t|I_{t-k-1}]$$

be the revision of the forecast of $y_t$ as new information arrives at time $t-k$.

Definition 5. A strictly stationary process $\{y_t\}$ is ergodic if for any $t, k, l$ and any bounded functions $g$ and $h$,

$$\lim_{n\to\infty} \operatorname{cov}\big(g(y_t, \dots, y_{t+k}),\, h(y_{t+k+n}, \dots, y_{t+k+n+l})\big) = 0$$

Theorem 6 (Gordin's CLT). Assume that we have a strictly stationary and ergodic series $\{y_t\}$ with $E y_t^2 < \infty$ satisfying:

1. $\sum_j \big(E\,\xi_{t,j}^2\big)^{1/2} < \infty$

2. $E[y_t|I_{t-j}] \to 0$ in $L_2$ as $j \to \infty$

Then

$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T} y_t \Rightarrow N(0, J),$$

where $J = \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k$ is the long-run variance.

Remark 7. Notice that $y_t = \sum_{j=0}^{\infty}\xi_{t,j}$. Condition 1 is intended to make the dependence between distant observations decrease to 0, and it can be checked (see the example below); I'm not sure how ergodicity can be easily checked. Condition 2 is aimed at the correct centering; in particular, it implies that $E[y_t] = 0$.
Example 8. AR(1): $y_t = \rho y_{t-1} + e_t$.

We can check condition 1. We have $\xi_{t,k} = E[y_t|I_{t-k}] - E[y_t|I_{t-k-1}] = \rho^k e_{t-k}$ and $E\,\xi_{t,k}^2 = \rho^{2k}\sigma^2$, so condition 1 is satisfied. More generally, if the MA representation has absolutely summable coefficients, then condition 1 will hold. One can notice that $E[y_t|I_{t-k}] = \rho^k y_{t-k}$, so condition 2 holds. Now let's calculate the long-run variance:

$$\gamma_k = \frac{\sigma^2 \rho^k}{1-\rho^2}$$

$$J = \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k = \frac{\sigma^2}{1-\rho^2}\Big(1 + 2\sum_{k=1}^{\infty}\rho^k\Big) = \frac{\sigma^2}{(1-\rho)^2}$$
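A small simulation confirms this formula: the variance of $\sqrt{T}\,\bar y$ across replications should approach $J = \sigma^2/(1-\rho)^2$ rather than $\gamma_0$. This is a minimal sketch with arbitrary parameter values:

```python
import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
rho, sigma = 0.5, 1.0
T, reps = 1_000, 5_000

# simulate `reps` independent AR(1) paths: y_t = rho*y_{t-1} + e_t
e = rng.normal(0.0, sigma, size=(reps, T))
y = lfilter([1.0], [1.0, -rho], e, axis=1)

print("Var(sqrt(T)*ybar):", round(T * y.mean(axis=1).var(), 3))
print("gamma_0 =", sigma**2 / (1 - rho**2))   # about 1.333
print("J =", sigma**2 / (1 - rho) ** 2)       # 4.0
```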

Remark 9.

$$J = \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k = \sum_{k=-\infty}^{\infty}\gamma_k = g(1)$$

where $g(1)$ is the autocovariance generating function from last lecture evaluated at 1. Recall:

$$g(z) = \sum_{i=-\infty}^{\infty}\gamma_i z^i$$

and if $a(L)y_t = b(L)e_t$, then

$$g(z) = \sigma^2\,\frac{b(z)b(z^{-1})}{a(z)a(z^{-1})},$$

so

$$J = \left(\frac{b(1)}{a(1)}\right)^2 \sigma^2$$


Remark 10. If $\{y_t\}$ is a vector, then let $\Gamma_k = \operatorname{cov}(y_t, y_{t+k})$ and $J = \sum_{k=-\infty}^{\infty}\Gamma_k$. The only thing that's different from the scalar case is that $\Gamma_k \neq \Gamma_{-k}$; instead, $\Gamma_{-k} = \Gamma_k'$. All the formulas above also hold, except in matrix notation. For example, for a VARMA $A(L)y_t = B(L)e_t$ with $\Sigma_e = \operatorname{Var}(e_t)$,

$$J = A(1)^{-1}B(1)\,\Sigma_e\,B(1)'\big(A(1)^{-1}\big)'$$

Remark 11. If $y_t$ is a martingale difference, $E[y_t|I_{t-1}] = 0$, then there is no serial correlation and $J = \sigma^2$.

OLS

Suppose $y_t = x_t'\beta + u_t$. In cross-section, $x_t$ is always independent from $u_s$ for $s \neq t$ due to the iid assumption, so the exclusion restriction is formulated as $E(u_t|x_t) = 0$. In time series, however, we have to describe the dependence between the error terms and all regressors.

Definition 12. $x_t$ is weakly exogenous if $E(u_t|x_t, x_{t-1}, \dots) = 0$.

Definition 13. $x_t$ is strictly exogenous if $E(u_t|\{x_t\}_{t=-\infty}^{\infty}) = 0$.

Usually, strict exogeneity is too strong an assumption; it is difficult to find a good empirical example for it. Weak exogeneity is much more workable (and we will mainly assume it).
The OLS estimator is $\hat\beta = (X'X)^{-1}(X'y)$. What is its asymptotic distribution?

$$\sqrt{T}(\hat\beta - \beta) = \Big(\frac{1}{T}X'X\Big)^{-1}\Big(\frac{1}{\sqrt{T}}X'u\Big) = \Big(\frac{1}{T}\sum_t x_t x_t'\Big)^{-1}\Big(\frac{1}{\sqrt{T}}\sum_t x_t u_t\Big)$$

Appropriate assumptions will give us an LLN for $\frac{1}{T}\sum_t x_t x_t' \to M$. Assume also Gordin's conditions for $z_t = x_t u_t$. If $x_t$ is weakly exogenous, then the centering is OK ($E[x_t u_t] = 0$). Gordin's CLT gives $\frac{1}{\sqrt{T}}\sum_t x_t u_t \Rightarrow N(0, J)$, which means that

$$\sqrt{T}(\hat\beta - \beta) \Rightarrow N(0, M^{-1}JM^{-1})$$

The only thing that is different from usual is $J$: here $J = \sum_{j=-\infty}^{\infty}\Gamma_j$ (where $\Gamma_j$ are the autocovariances of $z_t = x_t u_t$) is called the long-run variance. A nontrivial long-run variance usually arises from potentially auto-dependent error terms $u_t$: the errors usually contain everything that is not in the regression, which is arguably autocorrelated. It may also arise from $x_t$ being autocorrelated and from conditional heteroskedasticity of the error terms. We need to figure out how to estimate $J$. This leads to HAC (heteroskedasticity and autocorrelation consistent) standard errors.
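In practice HAC standard errors are available off the shelf; for instance, statsmodels computes them (by default a Bartlett-kernel, i.e. Newey-West, estimator of $J$) via the `cov_type='HAC'` option of `OLS.fit`. A usage sketch on simulated data; the persistence parameters and the lag choice `maxlags=8` are arbitrary:

```python
import numpy as np
import statsmodels.api as sm
from scipy.signal import lfilter

rng = np.random.default_rng(0)
T = 500

# persistent AR(1) regressor and AR(1) error, so z_t = x_t*u_t is
# autocorrelated and the classical OLS standard errors are unreliable
x = lfilter([1.0], [1.0, -0.7], rng.normal(size=T))
u = lfilter([1.0], [1.0, -0.7], rng.normal(size=T))
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
fit_iid = sm.OLS(y, X).fit()                                   # classical SEs
fit_hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 8})

print(fit_iid.bse)  # typically understates the uncertainty here
print(fit_hac.bse)  # HAC (Newey-West) standard errors
```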
Remark 14. A side note on GLS. If one believes in strict exogeneity, then the estimation can be done more efficiently using GLS. However, GLS is generally invalid if only weak exogeneity holds.

The logic here is the following. In many settings the error terms $u_t$ are arguably autocorrelated, so one may think that estimation is not fully efficient (the Gauss-Markov theorem assumes uncorrelated errors) and could be improved. Assume for a moment that

$$y_t = x_t'\beta + u_t \qquad\text{and}\qquad u_t = \rho u_{t-1} + e_t.$$

Assume also for a moment that $\rho$ is known and the $e_t$ are serially uncorrelated (white noise). You may think of transforming the system of observations, replacing the equation for time $t$ with the quasi-differenced one:

$$y_t - \rho y_{t-1} = (x_t - \rho x_{t-1})'\beta + e_t,$$


or $\tilde y_t = \tilde x_t'\beta + e_t$, where $\tilde y_t = y_t - \rho y_{t-1}$ and $\tilde x_t = x_t - \rho x_{t-1}$. The new system seems better, since its errors are not autocorrelated and have the same variance (with the exception of the first one). If we have strict exogeneity, then OLS on the new system (with the first equation corrected to have the same variance) is efficient (BLUE). What we have described is efficient GLS in this case. The problem, though, is that

$$E[e_t|x_t, x_{t-1}, \dots] = E[u_t|x_t, x_{t-1}, \dots] - \rho E[u_{t-1}|x_t, x_{t-1}, \dots].$$

If $u_t$ satisfies only the weak but not the strict exogeneity assumption, then the new error may not satisfy the exogeneity condition (the second term conditions $u_{t-1}$ on its "future" regressor $x_t$), and OLS in the transformed system will be biased. So, unless you believe in strict exogeneity (which is extremely rare), you should not use GLS.

HAC

Assume we have a series $\{z_t\}$ satisfying the assumptions of the CLT, and we want to estimate $J = \sum_{k=-\infty}^{\infty}\gamma_k$. There are two main approaches: parametric and non-parametric.

Parametric

Assume $z_t$ is AR($p$):

$$z_t = a_1 z_{t-1} + \dots + a_p z_{t-p} + e_t$$

Then $J = \frac{\sigma^2}{a(1)^2}$, where $a(L) = 1 - a_1 L - \dots - a_p L^p$. We can proceed in the following way: run an OLS regression of $z_t$ on $z_{t-1}, \dots, z_{t-p}$, get $\hat a_1, \dots, \hat a_p$ and $\hat\sigma^2$, then use $\hat a(L) = 1 - \hat a_1 L - \dots - \hat a_p L^p$ to construct $\hat J$:

$$\hat J = \frac{\hat\sigma^2}{\hat a(1)^2}$$

Two important practical questions (see the sketch after this list):

- What $p$ should we use? Model selection criteria, e.g. BIC (Bayesian information criterion).

- What if $z_t$ is not AR($p$)?

The second question is still open. Den Haan and Levin (1997) showed that if $z_t$ is AR($p$), then the parametric estimator converges faster than the kernel estimator described below.
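A minimal sketch of this parametric estimator; the function name, the lag grid, and the BIC formula (up to constants) are my own implementation choices:

```python
import numpy as np

def parametric_lrv(z, p_max=8):
    """AR(p)-based estimate of the long-run variance J = sigma^2 / a(1)^2."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    T = len(z)
    best = None
    for p in range(1, p_max + 1):
        # regress z_t on its lags 1..p by OLS
        Y = z[p:]
        X = np.column_stack([z[p - k:T - k] for k in range(1, p + 1)])
        a_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
        resid = Y - X @ a_hat
        sigma2 = resid @ resid / len(Y)
        bic = len(Y) * np.log(sigma2) + p * np.log(len(Y))  # BIC up to constants
        if best is None or bic < best[0]:
            best = (bic, a_hat, sigma2)
    _, a_hat, sigma2 = best
    return sigma2 / (1.0 - a_hat.sum()) ** 2  # J_hat = sigma_hat^2 / a_hat(1)^2
```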

Non-parametric

A naïve approach

$J$ is the sum of all autocovariances. We can estimate $T-1$ of them, but not all. What if we just use the ones we can estimate, i.e.

$$\hat J = \sum_{k=-(T-1)}^{T-1}\hat\gamma_k, \qquad \hat\gamma_k = \frac{1}{T}\sum_{j=1}^{T-|k|} z_j z_{j+|k|}$$


It turns out that this is a very bad idea:

$$\hat J = \sum_{k=-(T-1)}^{T-1}\frac{1}{T}\sum_{j=1}^{T-|k|} z_j z_{j+|k|} = \frac{1}{T}\Big(\sum_{t=1}^{T} z_t\Big)^2 = \Big(\frac{1}{\sqrt{T}}\sum_{t=1}^{T} z_t\Big)^2 \Rightarrow N(0, J)^2$$

so $\hat J$ is not consistent; it converges to a distribution instead of a point. The problem is that we are summing too many imprecisely estimated covariances, so the noise does not die out. For example, to estimate $\gamma_{T-1}$ we use only one observation; how good can that be?
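The collapse of $\hat J$ to the squared scaled sample mean is a finite-sample identity, so it is easy to verify numerically; a quick check of the algebra above:

```python
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(size=200)
T = len(z)

# all T sample autocovariances gamma_hat_k = (1/T) * sum_j z_j z_{j+k}
gammas = [z[: T - k] @ z[k:] / T for k in range(T)]
J_naive = gammas[0] + 2.0 * sum(gammas[1:])

# identical (up to rounding) to the squared scaled sample mean
print(J_naive, (z.sum() / np.sqrt(T)) ** 2)
```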
Truncated sum of sample covariances

What if we don't use all the covariances?

$$\hat J_2 = \sum_{k=-S_T}^{S_T}\hat\gamma_k$$

where $S_T < T$ and $S_T \to \infty$ as $T \to \infty$, but more slowly.

First, note that due to truncation there will be a finite-sample bias. As $S_T$ increases, the bias due to truncation becomes smaller and smaller. But we don't want to increase $S_T$ too fast, for the reason stated above (we don't want to sum up noise). Assume that we can choose $S_T$ in such a way that this estimator is consistent. Then we might face another bad small-sample property: the estimate of the long-run variance may be negative, $\hat J_2 < 0$ (or, in the vector case, $\hat J_2$ not positive definite).

Example 15. Take $S_T = 1$; then $\hat J_2 = \hat\gamma_0 + 2\hat\gamma_1$. In small samples we may find $\hat\gamma_1 < -\frac{1}{2}\hat\gamma_0$, in which case $\hat J_2$ will be negative.
Weighted, truncated sum of sample covariances

The refined suggestion is to use a weighted sum of sample autocovariances, with weights guaranteeing positive definiteness:

$$\hat J = \sum_{j=-S_T}^{S_T} k_T(j)\,\hat\gamma_j$$

Remark 16. $k_T(\cdot)$ is called a kernel.

We need conditions on $S_T$ and $k_T(\cdot)$ that give us consistency and positive definiteness: $S_T \to \infty$ as $T \to \infty$, but not too fast; $k_T(\cdot)$ needs to guarantee positive definiteness by down-weighting high-lag covariances; and we need $k_T(j) \to 1$ for consistency. A sketch with one standard kernel choice follows.
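Here is a minimal sketch using the Bartlett kernel $k_T(j) = 1 - |j|/(S_T + 1)$, the choice that yields the familiar Newey-West estimator; the function name and the bandwidth argument are mine:

```python
import numpy as np

def newey_west_lrv(z, S):
    """Bartlett-kernel (Newey-West) estimate of the long-run variance.

    The weights 1 - k/(S+1) down-weight high-lag covariances, guarantee
    a nonnegative estimate, and tend to 1 for each fixed k as S grows.
    """
    z = np.asarray(z, dtype=float) - np.mean(z)
    T = len(z)
    J = z @ z / T                          # gamma_hat_0
    for k in range(1, S + 1):
        gamma_k = z[: T - k] @ z[k:] / T   # gamma_hat_k
        J += 2.0 * (1.0 - k / (S + 1)) * gamma_k
    return J
```

One frequently cited rule of thumb (stated here as an assumption, not from the notes) sets the bandwidth to roughly $S_T = \lfloor 4(T/100)^{2/9} \rfloor$, e.g. `newey_west_lrv(z, S=int(4 * (len(z) / 100) ** (2 / 9)))`.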


MIT OpenCourseWare
http://ocw.mit.edu

14.384 Time Series Analysis


Fall 2013

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
