Limit Theorems, OLS, and HAC
14.384 Time Series Analysis, Fall 2007
Lecture 2
Professor Anna Mikusheva
Paul Schrimpf, scribe
September 11, 2007; revised September 9, 2013

Limit Theorems
Recall the CLT from the last lecture:
$$\sqrt{n}\left(\frac{1}{n}\sum_{i=1}^{n}x_i - \mu\right)\Big/\sigma \Rightarrow N(0,1).$$
We stated these while assuming independence. In time series we usually don't have independence. Let us explore where independence may have been used.
First, let's start with the simplest proof of the LLN:
$$E\left[\left(\frac{1}{n}\sum_{i=1}^{n}x_i - \mu\right)^2\right] = \operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right) \qquad (1)$$
$$= \frac{1}{n^2}\operatorname{Var}\left(\sum_{i=1}^{n}x_i\right) \qquad (2)$$
$$= \frac{1}{n^2}\sum_{i=1}^{n}\operatorname{Var}(x_i) \qquad (3)$$
$$= \frac{n\sigma^2}{n^2} \to 0 \qquad (4)$$
Independence was used in step (3), where the variance of the sum became the sum of the variances. Without independence, for a weakly stationary series with autocovariances $\gamma_k$,
$$\operatorname{Var}\left(\frac{1}{n}\sum_{i=1}^{n}x_i\right) = \frac{1}{n}\left(2\sum_{k=1}^{n-1}\gamma_k\left(1-\frac{k}{n}\right) + \gamma_0\right).$$
If $\sum_{j=-\infty}^{\infty}|\gamma_j| < \infty$, then
$$\lim_{n\to\infty}\frac{1}{n}\left(2\sum_{k=1}^{n-1}\gamma_k\left(1-\frac{k}{n}\right) + \gamma_0\right) = 0.$$
Thus, we have:
Lemma 1. If $x_t$ is a weakly stationary time series (with mean $\mu$) with absolutely summable autocovariances, then a law of large numbers holds (in probability and in $L^2$).
Remark 2. Stationarity is not enough. Let $z \sim N(0, \sigma^2)$. Suppose $x_t = z$ for all $t$. Then $\operatorname{cov}(x_t, x_s) = \sigma^2$ for all $t, s$, so we do not have absolute summability, and clearly we do not have a LLN for $\{x_t\}$, since the average $\frac{1}{n}\sum_{i=1}^{n}x_i$ equals $z$, which is random.
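As a small numerical illustration of Remark 2 (a sketch, assuming numpy is available; $\sigma$ and the sample sizes below are arbitrary choices): the sample mean of $x_t = z$ has variance $\sigma^2$ no matter how large $n$ gets.

```python
import numpy as np

# Remark 2: x_t = z for all t, with z ~ N(0, sigma^2).
# The sample mean equals z exactly, so its variance stays sigma^2 for every n -- no LLN.
rng = np.random.default_rng(0)
sigma = 2.0

for n in (10, 1_000, 100_000):
    means = []
    for _ in range(1_000):            # Monte Carlo replications
        z = rng.normal(0.0, sigma)
        x = np.full(n, z)             # the whole path is the single draw z
        means.append(x.mean())        # equals z exactly
    print(n, np.var(means))           # stays near sigma^2 = 4 as n grows
```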
Remark 3. For an MA process $x_t = c(L)e_t$, absolute summability of the coefficients, $\sum_{j=0}^{\infty}|c_j| < \infty$, implies $\sum_k |\gamma_k| < \infty$.
The proof is easy. Last time we showed that
$$\gamma_k = \sigma^2\sum_{j=0}^{\infty}c_j c_{j+k},$$
so
$$\sum_{k=0}^{\infty}|\gamma_k| = \sigma^2\sum_{k=0}^{\infty}\Big|\sum_{j=0}^{\infty}c_j c_{j+k}\Big| \le \sigma^2\sum_{k=0}^{\infty}\sum_{j=0}^{\infty}|c_j||c_{j+k}| \le \sigma^2\sum_{l=0}^{\infty}\sum_{j=0}^{\infty}|c_j||c_l| = \sigma^2\left(\sum_{j=0}^{\infty}|c_j|\right)^2 < \infty.$$
From the new proof of the LLN one can guess that the variance in a central limit theorem should also change. Remember that we wish to normalize the sum in such a way that the limiting variance is 1.
$$\operatorname{Var}\left(\frac{1}{\sqrt{n}}\sum_{i=1}^{n}x_i\right) = \gamma_0 + 2\sum_{k=1}^{n-1}\left(1-\frac{k}{n}\right)\gamma_k \longrightarrow \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k = J$$
Theorem 4. Let $y_t = \mu + \sum_{j=0}^{\infty}c_j e_{t-j}$, where $e_t$ is independent white noise and $\sum_{j=0}^{\infty}|c_j| < \infty$. Then
$$\sqrt{T}\left(\frac{1}{T}\sum_{t=1}^{T}y_t - \mu\right) \Rightarrow N(0, J).$$
Let $I_t$ be the information available at time $t$, i.e. $I_t$ is the sigma-algebra generated by $\{y_j\}_{j=-\infty}^{t}$. Let $\xi_{t,k} = E[y_t|I_{t-k}] - E[y_t|I_{t-k-1}]$ be the revision of the forecast of $y_t$ as the new information arrives at time $t-k$.
Definition 5. A strictly stationary process $\{y_t\}$ is ergodic if for any $t, k, l$ and any bounded functions $g$ and $h$,
$$\lim_{n\to\infty}\operatorname{cov}\big(g(y_t, \ldots, y_{t+k}),\, h(y_{t+k+n}, \ldots, y_{t+k+n+l})\big) = 0.$$
Theorem 6 (Gordin's CLT). Assume that we have a strictly stationary and ergodic series $\{y_t\}$ with $Ey_t^2 < \infty$ satisfying:
1. $\sum_{j}\left(E\,\xi_{t,j}^2\right)^{1/2} < \infty$
2. $E[y_t|I_{t-j}] \to 0$ in $L^2$ as $j \to \infty$
Then
$$\frac{1}{\sqrt{T}}\sum_{t=1}^{T}y_t \Rightarrow N(0, J),$$
where $J = \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k$ is the long-run variance.
Remark 7. Notice that $y_t = \sum_{j=0}^{\infty}\xi_{t,j}$. Condition 1 is intended to make the dependence between distant observations decrease to 0. Condition 1 can be checked (see the example below); I am not sure how ergodicity can be easily checked. Condition 2 is aimed at the correct centering; in particular, it implies that $E[y_t] = 0$.
Example 8. AR(1): $y_t = \rho y_{t-1} + e_t$ with $|\rho| < 1$.
We can check condition 1. We have $\xi_{t,k} = E[y_t|I_{t-k}] - E[y_t|I_{t-k-1}] = \rho^k e_{t-k}$ and $E\,\xi_{t,k}^2 = \rho^{2k}\sigma^2$, so condition 1 is satisfied. More generally, if the MA representation has absolutely summable coefficients, then condition 1 will hold. One can notice that $E[y_t|I_{t-k}] = \rho^k y_{t-k}$, so condition 2 holds as well. Now let's calculate the long-run variance:
$$\gamma_k = \frac{\sigma^2\rho^k}{1-\rho^2},$$
$$J = \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k = \frac{\sigma^2}{1-\rho^2}\left(1 + 2\sum_{k=1}^{\infty}\rho^k\right) = \frac{\sigma^2}{(1-\rho)^2}.$$
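As a quick check of Example 8 (a minimal simulation sketch, assuming numpy; $\rho$, $\sigma$, $T$, and the number of replications are arbitrary choices): the variance of $\sqrt{T}\,\bar y_T$ should be close to $J = \sigma^2/(1-\rho)^2$, not to $\gamma_0$.

```python
import numpy as np

# Example 8 check: for y_t = rho*y_{t-1} + e_t, Var(sqrt(T)*ybar) should approach
# J = sigma^2/(1-rho)^2 rather than gamma_0 = sigma^2/(1-rho^2).
rng = np.random.default_rng(1)
rho, sigma, T, reps = 0.5, 1.0, 1_000, 2_000

stats = []
for _ in range(reps):
    e = rng.normal(0.0, sigma, T)
    y = np.zeros(T)
    for t in range(1, T):              # start from y_0 = 0; negligible for large T
        y[t] = rho * y[t - 1] + e[t]
    stats.append(np.sqrt(T) * y.mean())

print("simulated Var(sqrt(T)*ybar):", np.var(stats))
print("J = sigma^2/(1-rho)^2:      ", sigma**2 / (1 - rho) ** 2)   # 4.0
print("gamma_0 = sigma^2/(1-rho^2):", sigma**2 / (1 - rho**2))     # about 1.33
```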
Remark 9.
$$J = \gamma_0 + 2\sum_{k=1}^{\infty}\gamma_k = \sum_{k=-\infty}^{\infty}\gamma_k = \gamma(1),$$
where $\gamma(\cdot)$ is the autocovariance generating function from last lecture evaluated at 1. Recall:
$$\gamma(z) = \sum_{i=-\infty}^{\infty}\gamma_i z^i.$$
In particular, for an ARMA process $a(L)y_t = b(L)e_t$,
$$\gamma(z) = \frac{b(z)b(z^{-1})}{a(z)a(z^{-1})}\,\sigma^2, \qquad \text{so} \qquad J = \left(\frac{b(1)}{a(1)}\right)^2\sigma^2.$$
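As a sanity check of Remark 9 (a minimal sketch; the ARMA(1,1) coefficients and the truncation lag below are arbitrary illustrations), one can compare $J = (b(1)/a(1))^2\sigma^2$ with the sum of autocovariances computed from the MA($\infty$) representation $\gamma_k = \sigma^2\sum_j c_j c_{j+k}$.

```python
import numpy as np

# Remark 9 check: for (1 - a1*L) y_t = (1 + b1*L) e_t,
# J = (b(1)/a(1))^2 * sigma^2 should equal sum_k gamma_k.
a1, b1, sigma = 0.6, 0.3, 1.5

J_formula = ((1 + b1) / (1 - a1)) ** 2 * sigma**2

# MA(infinity) coefficients c_j of y_t = c(L) e_t, truncated at a large lag.
K = 500
c = np.zeros(K)
c[0] = 1.0
c[1] = a1 + b1
for j in range(2, K):
    c[j] = a1 * c[j - 1]                      # c_j = a1^(j-1) * (a1 + b1) for j >= 1

# gamma_k = sigma^2 * sum_j c_j c_{j+k};  J = gamma_0 + 2 * sum_{k>=1} gamma_k
gammas = [sigma**2 * np.dot(c[: K - k], c[k:]) for k in range(K)]
J_sum = gammas[0] + 2 * sum(gammas[1:])

print(J_formula, J_sum)                        # agree up to truncation error
```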
Remark 10. If $\{y_t\}$ is a vector, then let $\Gamma_k = \operatorname{cov}(y_t, y_{t+k})$ and $J = \sum_{k=-\infty}^{\infty}\Gamma_k$. The only thing that is different from the scalar case is that $\Gamma_k \ne \Gamma_{-k}$; instead, $\Gamma_{-k} = \Gamma_k'$. All the formulas above also hold, except in matrix notation. For example, for a VARMA $A(L)y_t = B(L)e_t$ with $\operatorname{var}(e_t) = \Sigma$,
$$J = A(1)^{-1}B(1)\,\Sigma\,B(1)'\left(A(1)^{-1}\right)'.$$
Remark 11. If $y_t$ is a martingale difference sequence, $E[y_t|I_{t-1}] = 0$, then there is no serial correlation and $J = \gamma_0 = \sigma^2$.
OLS
Suppose $y_t = x_t'\beta + u_t$. In cross-sections, $x_t$ is independent of $u_s$ for $s \ne t$ due to the iid assumption, so the exclusion restriction is formulated as $E(u_t|x_t) = 0$. In time series, however, we have to describe the dependence between the error terms and all regressors.
Definition 12. $x_t$ is weakly exogenous if $E(u_t|x_t, x_{t-1}, \ldots) = 0$.
Definition 13. $x_t$ is strictly exogenous if $E(u_t|\{x_t\}_{t=-\infty}^{\infty}) = 0$.
Usually, strict exogeneity is too strong an assumption; it is difficult to find a good empirical example for it. Weak exogeneity is much more practical (and we will mainly assume it).
OLS estimator: $\hat\beta = (X'X)^{-1}X'y$.
What is the asymptotic distribution?
$$\sqrt{T}(\hat\beta - \beta) = \left(\frac{1}{T}X'X\right)^{-1}\left(\frac{1}{\sqrt{T}}X'u\right) = \left(\frac{1}{T}\sum_t x_t x_t'\right)^{-1}\left(\frac{1}{\sqrt{T}}\sum_t x_t u_t\right)$$
Appropriate assumptions give us a LLN for $\frac{1}{T}\sum_t x_t x_t' \to^p M$. Assume also that Gordin's conditions hold for $z_t = x_t u_t$; then
$$\sqrt{T}(\hat\beta - \beta) \Rightarrow N(0, M^{-1}JM^{-1}).$$
The only thing that is different from the usual case is $J$. Here $J = \sum_j \Gamma_j$ (where $\Gamma_j$ are the autocovariances of $z_t = x_t u_t$) is called the long-run variance. A non-trivial long-run variance usually arises from potentially auto-dependent error terms $u_t$: the errors contain everything that is not in the regression, which is arguably autocorrelated. It may also arise from $x_t$ being autocorrelated and from conditional heteroskedasticity of the error terms. We need to figure out how to estimate $J$. This is called HAC (heteroskedasticity and autocorrelation consistent) standard errors.
Remark 14. A side note on GLS. If one believes in strict exogeneity, then the estimation can be done more efficiently by using GLS. However, GLS is generally invalid if only weak exogeneity holds.
The logic here is the following. In many settings the error terms $u_t$ are arguably autocorrelated, so one may think that estimation is not fully efficient (as the Gauss-Markov theorem assumes that observations are uncorrelated) and could be improved. Assume for a moment that
$$y_t = x_t'\beta + u_t$$
and
$$u_t = \rho u_{t-1} + e_t.$$
Assume also for a moment that $\rho$ is known and the $e_t$ are serially uncorrelated (white noise). You may think of transforming the system of observations and replacing the $t$-th equation with the quasi-differenced one:
$$y_t - \rho y_{t-1} = (x_t - \rho x_{t-1})'\beta + e_t,$$
or $\tilde y_t = \tilde x_t'\beta + e_t$, where $\tilde y_t = y_t - \rho y_{t-1}$ and $\tilde x_t = x_t - \rho x_{t-1}$. The new system seems to be better, since the errors are not autocorrelated and have the same variance (with the exception of the first one). If we have strict exogeneity, then OLS on the new system (with the first equation rescaled to have the same variance) is efficient (BLUE). What we described is efficient GLS in this case. The problem, though, is that
$$E[e_t|x_t, x_{t-1}, \ldots] = E[u_t|x_t, x_{t-1}, \ldots] - \rho E[u_{t-1}|x_t, x_{t-1}, \ldots].$$
However, if $u_t$ satisfies only the weak but not the strict exogeneity assumption, then the new error may not satisfy the exogeneity condition, and OLS in the transformed system will be biased. So, unless you believe in strict exogeneity (which is extremely rare), you should not use GLS.
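A minimal sketch of the quasi-differencing transformation just described (assuming numpy and a known $\rho$; the $\sqrt{1-\rho^2}$ rescaling of the first observation is the variance correction mentioned above):

```python
import numpy as np

def quasi_difference(y, x, rho):
    """Transform (y, x) so the new errors e_t = u_t - rho*u_{t-1} are serially uncorrelated.

    The first observation is scaled by sqrt(1 - rho^2) so its error has the same variance.
    x can be a (T,) or (T, k) array with time along the first axis."""
    y = np.asarray(y, dtype=float)
    x = np.asarray(x, dtype=float)
    y_tilde = np.empty_like(y)
    x_tilde = np.empty_like(x)
    scale = np.sqrt(1.0 - rho**2)
    y_tilde[0] = scale * y[0]
    x_tilde[0] = scale * x[0]
    y_tilde[1:] = y[1:] - rho * y[:-1]
    x_tilde[1:] = x[1:] - rho * x[:-1]
    return y_tilde, x_tilde

# OLS on (y_tilde, x_tilde) is the (known-rho) GLS estimator: valid and efficient under
# strict exogeneity, but generally biased when only weak exogeneity holds.
```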
HAC
Assume we have a series $\{z_t\}$ satisfying the assumptions of the CLT, and we want to estimate $J = \sum_k \gamma_k$. There are two main ways: parametric and non-parametric.
Parametric
Assume $z_t$ is AR(p):
$$z_t = a_1 z_{t-1} + \ldots + a_p z_{t-p} + e_t,$$
then $J = \frac{\sigma^2}{a(1)^2}$, where $a(L) = 1 - a_1 L - \ldots - a_p L^p$. We can proceed in the following way: run an OLS regression of $z_t$ on $z_{t-1}, \ldots, z_{t-p}$, get $\hat a_1, \ldots, \hat a_p$ and $\hat\sigma^2$, then use $\hat a(L) = 1 - \hat a_1 L - \ldots - \hat a_p L^p$ to construct $\hat J$:
$$\hat J = \frac{\hat\sigma^2}{\hat a(1)^2}.$$
Two important practical questions:
- What $p$ should we use? Model selection criteria, e.g. BIC (Bayesian information criterion).
- What if $z_t$ is not AR(p)?
The second question is still open. Den Haan and Levin (1997) showed that if $z_t$ is AR(p), then the parametric estimator converges faster than the kernel estimator described below.
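A minimal sketch of this parametric procedure (assuming numpy, a scalar series $z$ with mean zero, and a lag order $p$ supplied by the user, e.g. chosen by BIC):

```python
import numpy as np

def parametric_long_run_variance(z, p):
    """Parametric AR(p) estimate of J = sigma^2 / a(1)^2 for a scalar, mean-zero series z."""
    z = np.asarray(z, dtype=float)
    T = len(z)
    # Regress z_t on (z_{t-1}, ..., z_{t-p}) for t = p, ..., T-1 (no intercept).
    Y = z[p:]
    X = np.column_stack([z[p - j : T - j] for j in range(1, p + 1)])
    a_hat, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ a_hat
    sigma2_hat = resid @ resid / len(Y)
    a_of_1 = 1.0 - a_hat.sum()          # a(1) = 1 - a_1 - ... - a_p
    return sigma2_hat / a_of_1**2
```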
Non-parametric
A naive approach
$J$ is the sum of all autocovariances. We can estimate $T-1$ of these, but not all. What if we just use the ones we can estimate, i.e.
$$\hat J = \sum_{k=-(T-1)}^{T-1}\hat\gamma_k, \qquad \hat\gamma_k = \frac{1}{T}\sum_{j=1}^{T-|k|}z_j z_{j+|k|}?$$
$$\hat J = \sum_{k=-(T-1)}^{T-1}\frac{1}{T}\sum_{j=1}^{T-|k|}z_j z_{j+|k|} = \frac{1}{T}\left(\sum_{t=1}^{T}z_t\right)^2 = \left(\frac{1}{\sqrt{T}}\sum_{t=1}^{T}z_t\right)^2 \Rightarrow N(0, J)^2,$$
so $\hat J$ is not consistent; it converges to a distribution instead of a point. The problem is that we are summing too many imprecisely estimated covariances, so the noise does not die out. For example, to estimate $\gamma_{T-1}$ we use only one observation; how good can that be?
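A small simulation sketch of this inconsistency (assuming numpy; the series below is iid $N(0,1)$, so the true $J$ is 1): the naive estimator coincides with $\frac{1}{T}(\sum_t z_t)^2$ exactly and keeps fluctuating as $T$ grows.

```python
import numpy as np

# Naive estimator: sum of all T-1 sample autocovariances.  It collapses to
# (1/T)*(sum_t z_t)^2, whose limit is the square of a normal, not the constant J.
rng = np.random.default_rng(2)

def naive_J(z):
    T = len(z)
    total = 0.0
    for k in range(-(T - 1), T):
        kk = abs(k)
        total += np.dot(z[: T - kk], z[kk:]) / T      # gamma_hat_k (and its mirror image)
    return total

for T in (200, 2_000, 10_000):
    z = rng.normal(size=T)                            # iid, so the true J = 1
    print(T, naive_J(z), (z.sum() ** 2) / T)          # the two columns coincide; neither settles at 1
```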
Truncated sum of sample covariances
What if we don't use all the covariances? Take only the first $S_T$ of them, possibly with weights:
$$\hat J_2 = \sum_{k=-S_T}^{S_T}\hat\gamma_k, \qquad \text{or more generally} \qquad \hat J_2 = \sum_{j=-S_T}^{S_T}k_T(j)\hat\gamma_j,$$
where $k_T(\cdot)$ is a kernel (weight) function and $S_T$ is a truncation (bandwidth) parameter.
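A minimal sketch of the kernel estimator with Bartlett weights $k_T(j) = 1 - |j|/(S_T+1)$, i.e. the Newey-West choice (assuming numpy; the bandwidth rule $S_T \approx T^{1/3}$ in the usage example is only an illustration, not a recommendation):

```python
import numpy as np

def kernel_long_run_variance(z, S_T):
    """Kernel (Newey-West / Bartlett) estimate of J = sum_k gamma_k for a scalar series z.

    Bartlett weights k_T(j) = 1 - |j|/(S_T + 1) also guarantee a non-negative estimate."""
    z = np.asarray(z, dtype=float)
    z = z - z.mean()                                   # demean before computing autocovariances
    T = len(z)
    J_hat = np.dot(z, z) / T                           # gamma_hat_0
    for j in range(1, S_T + 1):
        gamma_j = np.dot(z[: T - j], z[j:]) / T        # sample autocovariance at lag j
        weight = 1.0 - j / (S_T + 1)
        J_hat += 2.0 * weight * gamma_j
    return J_hat

# Usage: AR(1) series with rho = 0.5, so the true long-run variance is 1/(1-rho)^2 = 4.
rng = np.random.default_rng(3)
rho, T = 0.5, 5_000
e = rng.normal(size=T)
z = np.zeros(T)
for t in range(1, T):
    z[t] = rho * z[t - 1] + e[t]
print(kernel_long_run_variance(z, S_T=int(T ** (1 / 3))))   # should be near 4
```

In an OLS application one would apply this to $z_t = x_t \hat u_t$ (component by component, or with matrix outer products in the vector case) and plug $\hat J$ into the sandwich $M^{-1}\hat J M^{-1}$.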
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.