Auto/cross-Correlation: Generalized Regression Model

This lecture discusses autocorrelation and its implications for ordinary least squares (OLS) regression. It defines autocorrelation and presents examples of autoregressive (AR) and moving average (MA) models. Visual checks for autocorrelation involve plotting residuals to identify patterns. Autocorrelation violates OLS assumptions and results in inefficient estimates, though estimates remain unbiased. The Newey-West robust covariance matrix addresses this. Relative efficiency compares the variance of OLS and generalized least squares estimates, showing OLS can have much larger standard errors when autocorrelation is present.



Lecture 13
Auto/cross-correlation

Generalized Regression Model


• The generalized regression model's assumptions:
(A1) DGP: y = Xβ + ε is correctly specified.
(A2) E[ε|X] = 0
(A3') Var[ε|X] = Σ = σ²Ω.
(A4) X has full column rank – rank(X) = k –, where T ≥ k.

• We assume that the ε's in the sample are no longer generated independently of each other. Ignoring heteroscedasticity, we have a new Σ:
E[ε_i ε_j|X] = σ_ij  if i ≠ j
             = σ²   if i = j.


Auto-correlation
• In general, we find autocorrelation (or serial correlation) in time series because shocks are persistent over time: it takes time to absorb a shock.

• The shocks can also be correlated over the cross-section, causing


cross-correlation. For example, if an unexpected new tax is imposed
on the technology sector, all the companies in the sector are going to
share this shock.

• Usually, we model autocorrelation using two models: autoregressive (AR) and moving average (MA).

• In an AR model, the errors, ε_t, show correlation over time. In an MA model, the errors, ε_t, are a function (similar to a weighted average) of current and previous innovations, denoted u_t.

Auto-correlation
Examples:
- First-order autoregressive autocorrelation: AR(1)
  ε_t = ρ ε_{t−1} + u_t
- pth-order autoregressive autocorrelation: AR(p)
  ε_t = ρ_1 ε_{t−1} + ρ_2 ε_{t−2} + ⋯ + ρ_p ε_{t−p} + u_t
- Third-order moving average autocorrelation: MA(3)
  ε_t = u_t + λ_1 u_{t−1} + λ_2 u_{t−2} + λ_3 u_{t−3}

Note: The last example is described as third-order moving average


autocorrelation, denoted MA(3), because it depends on the three
previous innovations as well as the current one.
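A minimal R sketch of what an MA(3) error series looks like (the λ values are illustrative choices, not from the lecture):

set.seed(123)
T_sim <- 200
u <- rnorm(T_sim + 3) # innovations u_t (3 extra for start-up)
lambda <- c(0.6, 0.4, 0.2) # illustrative MA(3) coefficients λ1, λ2, λ3
eps <- rep(0, T_sim)
for (t in 1:T_sim) { # ε_t = u_t + λ1 u_{t-1} + λ2 u_{t-2} + λ3 u_{t-3}
eps[t] <- u[t + 3] + sum(lambda * u[(t + 2):t])
}
plot(eps, type="l", col="blue", ylab ="MA(3) errors", xlab ="Time")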


Auto-correlation – Visual Check


• Plot data, usually residuals from a regression, to see if there is a
pattern:

- Positive autocorrelation: A positive (negative) observation tends to


be followed by a positive (negative) observation. We tend to see
continuation in the series.

- Negative autocorrelation: A positive (negative) observation tends to


be followed by a negative (positive) observation. We tend to see
reversals.

- No autocorrelation: A positive (negative) observation has the same probability of being followed by a negative or positive (positive or negative) observation. We tend to see no pattern.

Auto-correlation – Visual Check


Example: I simulate a y_t series, with N(0,1) u_t errors:
  y_t = ρ y_{t−1} + u_t
Three cases:
(1) Positive autocorrelation: ρ > 0 (the code below uses ρ = 0.7)
(2) Negative autocorrelation: ρ < 0
(3) No correlation: ρ = 0

• R code for simulation:


T_sim <- 200
u <- rnorm(T_sim) # Draw T_sim normally distributed errors
y_sim <- matrix(0,T_sim,1)
rho <- .7 # Change to create different correlation patterns
a <- 2 # Time index for observations
while (a <= T_sim) {
y_sim[a] = rho * y_sim[a-1] + u[a] # y_sim simulated autocorrelated values
a <- a + 1
}
plot(y_sim, type="l", col="blue", ylab ="Simulated Series", xlab ="Time")
title("Visual Test: Autocorrelation?") 6


Auto-correlation – Visual Check

Example (continuation): plots of the simulated series for the three cases:
(1) Positive autocorrelation
(2) Negative autocorrelation
(3) No autocorrelation


Auto-correlation – Visual Check: IBM


Example: Residual plot for the 3 factor F-F model for IBM returns:

• It looks like a small ρ, but not a very clear pattern from the graph.

Auto-correlation – Visual Check: GE


Example: Residual plot for the 3 factor F-F model for GE returns:

• It looks like a small ρ, but not a very clear pattern from the graph.


Auto-correlation – Visual Check: GBP


Example: Residual plot for the encompassing model (IFE + PPP)
for changes in the USD/GBP:

• Again, it looks like a small ρ, but not a very clear pattern.

Implications for OLS


• Similar to the heteroscedasticity results:
- OLS is unbiased, consistent (we need additional assumptions), and asymptotically normal (we need additional assumptions and definitions), but inefficient.
- OLS standard errors are incorrect, often biased downwards.

• A very important exception: the lagged dependent variable.
  y_t = β x_t + γ y_{t−1} + ε_t;   ε_t = ρ ε_{t−1} + u_t.

Now, Cov[y_{t−1}, ε_t] ≠ 0 ⇒ IV Estimation

• Useful strategy: OLS estimates with the Newey-West (NW) robust


estimation of the covariance matrix. Recall NW’s HAC estimator of
Q*:
S_T = S_0 + (1/T) Σ_{l=1}^{L} k_L(l) Σ_{t=l+1}^{T} (x_{t−l} e_{t−l} e_t x_t′ + x_t e_t e_{t−l} x_{t−l}′)
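In R, the sandwich package implements this estimator; a minimal sketch (reg is an lm fit, as in the examples below; the lag choice of 4 is only for illustration):

library(sandwich)
library(lmtest)
V_nw <- NeweyWest(reg, lag = 4, prewhite = FALSE) # NW HAC covariance matrix of b
sqrt(diag(V_nw)) # NW standard errors
coeftest(reg, vcov = V_nw) # t-tests using NW SEs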


Implications for OLS: Relative Efficiency


• We define the relative efficiency of GLS against OLS for coefficient i as:

  RE_i = [σ² (X′Ω⁻¹X)⁻¹]_ii / [(X′X)⁻¹ X′(σ²Ω)X (X′X)⁻¹]_ii = Var[β̂_GLS,i] / Var[b_OLS,i]

• Let y_t = β x_t + ε_t, with ε_t = ρ ε_{t−1} + u_t, u_t ~ WN.
Also, let x_t follow an AR(1) process: x_t = θ x_{t−1} + ξ_t, ξ_t ~ WN.

Then, when T is large, it can be shown that

  RE = Var[β̂_GLS] / Var[b_OLS] = [(1 − ρθ)(1 − ρ²)] / [(1 + ρθ)(1 + ρ² − 2ρθ)]
• The relative efficiency can be very poor for large ρ for any given θ.

Example: Let ρ = θ = 0.7 ⇒ RE ≈ 0.3423.
Suppose SE[β̂_GLS] = 1 ⇒ SE[b_OLS] = 1.71 (= sqrt[1/0.3423])
⇒ The OLS SE is about 71% larger than the GLS SE.
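A quick R check of the formula above (re_gls_ols is just an illustrative helper name):

re_gls_ols <- function(rho, theta) { # large-T RE = Var[b_GLS]/Var[b_OLS]
((1 - rho*theta) * (1 - rho^2)) / ((1 + rho*theta) * (1 + rho^2 - 2*rho*theta))
}
re_gls_ols(0.7, 0.7) # ~ 0.3423
re_gls_ols(0.3, 0.9) # ~ 0.9510
1/sqrt(re_gls_ols(0.7, 0.7)) # ~ 1.71: OLS SE relative to GLS SE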

Implications for OLS: Relative Efficiency

  RE = Var[β̂_GLS] / Var[b_OLS] = [(1 − ρθ)(1 − ρ²)] / [(1 + ρθ)(1 + ρ² − 2ρθ)]
• The OLS estimators can be quite reasonable for a low degree of autocorrelation for any given θ; for example, when ρ = 0.3 and θ = 0.9, then RE ≈ 0.9510.

• The inefficiency of OLS is difficult to generalize. We tend to see increasing inefficiency with increasing values of the disturbance variances.

• In practice, it is worst in low-frequency –i.e., long period (year)–, slowly evolving data. It can be extremely bad: comparing GLS vs. OLS, the efficiency ratios can be 3 or more.

• Given the potential efficiency gain, it makes sense to test for


autocorrelation.


Newey-West estimator
• The performance of NW estimators depends on the choice of the
kernel function –i.e., kL– and truncation lag (L). These choices affect
the resulting test statistics and render testing results fragile.

• NW SEs perform poorly in Monte Carlo simulations: the finite-sample performance of tests using NW SEs is not well approximated by the asymptotic theory (big size problems), especially when x_t e_t shows moderate or high persistence:
- The kernel weighting scheme yields negative bias –i.e., NW SEs are downward biased–, which could be big in finite samples.
- The tests based on the NW SE usually over-reject H0.
- A relatively small L is needed to minimize MSE, which leads to considerable bias of the Q* estimator (and, then, distorts the size of the tests). Minimizing size distortions needs a larger L.

Newey-West estimator: Implementation


• To implement the HAC estimator, we need to determine: the lag order –i.e., truncation lag (L) or bandwidth– and the kernel choice (k_L(·)).

(1) Truncation lag (L)


No optimal formula; though selecting L to minimize MSE is popular.

To determine L, we use:
- Trial and error, informed guess.
- Rules of thumb. For example: L = 0.75·T^(1/3) − 1 (see the short R sketch after this list).
- Automatic selection rules, following Andrews (1991), Newey and West (1994) or Sun et al. (2008).

The choice of L matters. In general, for ARMA models we have:


- Shorter lags: Larger Bias, Smaller Variance
- Longer lags: Smaller Bias, Larger Variance
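A short R sketch of the rule of thumb and of automatic bandwidth selection (using the sandwich package; T_obs and reg are illustrative names for the sample size and an lm fit):

library(sandwich)
T_obs <- 557
floor(0.75 * T_obs^(1/3) - 1) # rule-of-thumb L (~ 5 here)
bwNeweyWest(reg) # automatic bandwidth, Newey & West (1994)
bwAndrews(reg) # automatic bandwidth, Andrews (1991)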


Newey-West estimator: Implementation


• Usual practical advice regarding L: choose L (lags) a little longer than you might otherwise.

• Sun et al. (2008) give some intuition for a longer L than the optimal
MSE L, by expanding the probability of a test. Simple example:

Let z ~ N(0, σ²) and let s² be an estimator of σ² (assumed independent of z). Then,

Pr[z²/s² ≤ c] = Pr[z² ≤ s²c] = E{ I[z² ≤ s²c] } = E[g(s²)]
  ≈ E[g(σ²)] + E[s² − σ²]·g′(σ²) + (1/2)·E[(s² − σ²)²]·g″(σ²)
  = F_{χ²₁}(c) + Bias(s²)·g′(σ²) + (1/2)·MSE(s²)·g″(σ²)

⇒ Both the Bias and the MSE of s² matter for the size of the test. A long L minimizes the Bias ⇒ better size!

Newey-West estimator: Implementation


(2) Kernel Choice
- In theory, the kernel choice matters.
- In practice, at least for psd kernels, it does not seem to matter.

[Figure: kernel weights k_L(x) plotted against x.]
• Based on the work of Andrews (1991), where he finds a HAC that
minimizes the AMSE of the LRV, the QS kernel tends to be the default
kernel in computations of HAC SE.


NW Estimator: Improvements
• Other than finding a good kernel and a (long) L, the performance of
HAC estimators may be improved by:

(1) Pre-whitening the data – Andrews and Monahan (1992). Regress x_t e_t on its lagged values. There is some arbitrary choice in the selection of the lag order for this regression.

(2) Computing sample autocovariances based on forecast errors, instead of OLS residuals – Kuan and Hsieh (2006). Replace e_t with one-step-ahead forecast errors: fe_t = y_t − x_t′ b_{t−1}, where b_{t−1} is the recursive OLS estimator based on the subsample of the first t − 1 observations.
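A minimal sketch of these recursive one-step-ahead forecast errors (X, y, T and the burn-in are illustrative names; this is not the lecture's code):

k <- ncol(X)
fe <- rep(NA, T)
for (t in (k + 10):T) { # start after a short burn-in sample
b_t1 <- lm.fit(X[1:(t-1), , drop = FALSE], y[1:(t-1)])$coefficients # recursive OLS b_{t-1}
fe[t] <- y[t] - sum(X[t, ] * b_t1) # one-step-ahead forecast error
}
# fe (where not NA) replaces e_t when computing the sample autocovariances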

NW Estimator: Improvements - Example


Example: We compute different NW SE for the 3 factor F-F model
for IBM returns, with bandwidth selected as in Andrews (1991):
> library(sandwich)
> reg <- lm(y ~ x -1)
> reg$coefficients
x xx1 xx2 xx3
-0.2331470817 0.0101872239 0.0009802843 -0.0044459013 ⟹ OLS b

> sqrt(diag(kernHAC(reg, prewhite = 0, bw = bwAndrews, kernel = "Quadratic Spectral", verbose = TRUE)))


x xx1 xx2 xx3
0.020959375 0.002848645 0.003983330 0.005310548 ⟹ & Bandwidth chosen: 3.035697

> sqrt(diag(kernHAC(reg, prewhite = 0, bw = bwAndrews, kernel = "Bartlett", verbose = TRUE)))


x xx1 xx2 xx3
0.020344074 0.002828663 0.003995942 0.005177482 ⟹ & Bandwidth chosen: 3.507051

> sqrt(diag(kernHAC(reg, prewhite = 0, bw = bwAndrews, kernel = "Parzen", verbose = TRUE)))


x xx1 xx2 xx3
0.022849506 0.002839034 0.003954436 0.005427730 ⟹ & Bandwidth chosen: 6.110888


NW Estimator: Improvements - Example


Example: Now, we also pre-white the data (prewhite = 1):
> sqrt(diag(kernHAC(reg, prewhite = 1, bw = bwAndrews, kernel = "Quadratic Spectral", verbose = TRUE)))
x xx1 xx2 xx3
0.043339699 0.002908898 0.004029606 0.005783013 ⟹ & Bandwidth chosen: 0.8118876

> sqrt(diag(kernHAC(reg, prewhite = 1, bw = bwAndrews, kernel = "Bartlett", verbose = TRUE)))


x xx1 xx2 xx3
0.042943572 0.002912273 0.004022336 0.005786720 ⟹ & Bandwidth chosen: 0.516233

> sqrt(diag(kernHAC(reg, prewhite = 1, bw = bwAndrews, kernel = "Parzen", verbose = TRUE)))


x xx1 xx2 xx3
0.040963950 0.002912789 0.004006919 0.005767432 ⟹ & Bandwidth chosen: 1.634337

• Note: Pre-whitening tends to increase the standard errors (&


decrease the bandwidth). Nice result, given that the usual NW SEs
tend to be downward biased.

Newey-West estimator: Inconsistency


• Recall that a key assumption in establishing consistency for ST is that
L → ∞ as T → ∞, but L/T → 0.

• In practice, L/T is never equal to 0, but some positive fraction, b (b є


(0,1]). Under this situation, the NW estimator is no longer consistent.

• Thus, t- and F-tests no longer converge in distribution to Normal and χ² RVs, but they do converge in distribution to RVs with non-standard distributions that do not depend on the unknown value of Ω. Tests are still possible.

• To get asymptotic distributions (& critical values) we use “fixed-b”


asymptotics. Under fixed-b asymptotics, the truncation parameter, L, is treated as proportional to T, so L = bT, where b is fixed – see Kiefer, Vogelsang & Bunzel (KVB, 2000), Kiefer & Vogelsang (2002, 2005).


Newey-West estimator: Inconsistency


• Under fixed-b asymptotics, typically S_T → Q*^½ Ξ Q*^½, where Ξ is a RV with E[Ξ] = I_p. Ξ has a non-standard distribution.

• Kiefer and Vogelsang (2005) derive the limiting distribution of S_T, which is complicated, but the 95% critical values (CV) for t-tests can be constructed using the following polynomial (b = L/T):

CV(b) = 1.96 + 2.9694 b + 0.416 b² − 0.05324 b³.

Note: As b → 0, the standard t critical values apply.
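A one-line R version of this polynomial (cv_fixed_b is an illustrative helper name):

cv_fixed_b <- function(b) 1.96 + 2.9694*b + 0.416*b^2 - 0.05324*b^3
cv_fixed_b(0) # 1.96: standard normal critical value
cv_fixed_b(1) # ~ 5.29: critical value with no truncation (L = T)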

• Since non-standard distributions are not popular, work has been devoted to finding simple and intuitive estimators of Q* that can be used in tests with traditional distributions (say, N(0, 1) and χ²).

Newey-West estimator: Inconsistency


• When the frequency domain kernel weights are equal and truncated after the first B/2 periodogram ordinates (an estimator of the spectrum at frequency 2πj/T), the limiting fixed-b distribution of S_T is χ²_B/B.

• This corresponds to the equal-weighted periodogram estimator of Q* (the Daniell window):

  S_T^EWP = (2/B) Σ_{j=1}^{B/2} I_{xe,ex}(2πj/T),
  where I_{xe,ex}(ω_j) = [ (1/√T) Σ_{t=1}^{T} (x_t e_t) e^{iω_j t} ] [ (1/√T) Σ_{t=1}^{T} (x_t e_t) e^{−iω_j t} ]′

Now, the usual t-test, t_EWP = √T (β̂ − β₀)/√(S_T^EWP), has a t_B asymptotic distribution under H0.

• The EWP estimator has the nice property that fixed-b asymptotic
inference can be conducted using standard t and F distributions.


Newey-West estimator: Inconsistency


• In addition, the EWP estimator is psd with probability 1.

• Müller (2007) and Sun (2013) note that other estimators of Q* can be derived by replacing the Fourier functions in S_T^EWP with other basis functions from a general orthonormal set of basis functions for L²[0,1].

• Then, we can see S_T^EWP as a special case of:

  S_T^BF = (1/B) Σ_{j=1}^{B} Ŝ_j,  where Ŝ_j = Λ̂_j Λ̂_j′  &  Λ̂_j = (1/√T) Σ_{t=1}^{T} (x_t e_t) φ_j(t/T)

• Different φ_j basis functions (say, cosine) give different estimators.

Note: Since S_T^BF is computed using an outer product, it is psd.

Newey-West estimator: KVB


• The (kernel) HAC estimation requires the choices of the kernel
function and L. Such choices are somewhat arbitrary in practice.

• To avoid these difficulties, Kiefer, Vogelsang, and Bunzel (2000),


KVB, proposed an approach that yields an asymptotically pivotal test
without consistent estimation of the asymptotic covariance matrix.

• Idea: Use a normalizing matrix to eliminate the nuisance parameters in Q*^½, the matrix square root of Q*_T, and impose no truncation (b = 1). Let

  φ_t = (1/√T) Σ_{j=1,...,t} x_j e_j

Normalizing matrix:
  C_T = (1/T) Σ_{t=1,...,T} φ_t φ_t′ = (1/T²) Σ_{t=1,...,T} (Σ_{j=1,...,t} x_j e_j)(Σ_{j=1,...,t} x_j e_j)′


Newey-West estimator: KVB


• Normalizing matrix:
  C_T = (1/T) Σ_{t=1,...,T} φ_t φ_t′ = (1/T²) Σ_{t=1,...,T} (Σ_{j=1,...,t} x_j e_j)(Σ_{j=1,...,t} x_j e_j)′
This normalizing matrix is inconsistent for Q*_T, but it is free from the choice of kernel and L. (Note: There is no truncation, L = T ⇒ good for the size of the test!)

• We use this C_T matrix to calculate tests. For example, to test r restrictions H0: Rβ − q = 0, we have the following statistic:

  W+_T = T (R b_T − q)′ [R (X′X)⁻¹ C_T (X′X)⁻¹ R′]⁻¹ (R b_T − q).

Although the asymptotic distribution of W+T is non-standard, it can


be simulated -Lobato (2001).
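A minimal R sketch of C_T and W+_T (X, e, b, R, q are illustrative names for the regressor matrix, OLS residuals, OLS coefficients, restriction matrix and restriction vector):

T_obs <- nrow(X)
xe <- X * e # row t is x_t' * e_t
phi <- apply(xe, 2, cumsum)/sqrt(T_obs) # phi_t = (1/sqrt(T)) sum_{j<=t} x_j e_j
C_T <- crossprod(phi)/T_obs # (1/T) sum_t phi_t phi_t'
XXinv <- solve(crossprod(X)) # (X'X)^{-1}
V <- R %*% XXinv %*% C_T %*% XXinv %*% t(R)
W_plus <- T_obs * t(R %*% b - q) %*% solve(V) %*% (R %*% b - q)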

Newey-West estimator: KVB


• KV (2002) showed that 2C_T is algebraically equivalent to Q*_T^B (where B stands for the Bartlett kernel) without truncation (b = 1) –i.e., L(T) = T. Then, the usual W based on Q*_T^B without truncation is the same as W+_T/2.

• KVB derive the (non-standard) asymptotic distribution of the conventional t-test of H0: β_i = r, but using their robust version, t+:

  t+ = √T (b_i − r)/√δ_i,

where δ_i is the i-th diagonal element of (X′X)⁻¹ C_T (X′X)⁻¹, W is a standard Wiener process, and B(r) is a Brownian Bridge –i.e., B_k(r) = W_k(r) − r W_k(1), 0 ≤ r ≤ 1. This distribution is symmetric, but more dispersed than the N(0,1).


Newey-West estimator: KVB


• KVB report the quantiles of the asymptotic distribution of the usual
t-test, using CT and using the NW SE, without truncation. (Notation:
Q*=Σkernel)

Remark: KV (2002) show that, under certain assumptions, the t-test with NW's SE without truncation is also asymptotically pivotal.

Newey-West estimator: KVB - Remarks


• An advantage of testing with KVB’s CT matrix is that its asymptotic
distribution usually provides good approximation to its finite-sample
counterpart. That is, the empirical size is close to the nominal size (α).

• This is not the case for the NW HAC SE: in finite samples, they are
downward biased. Tests are usually over-sized –i.e., not conservative.

• KV (2002b) show that, with the truncation lag equal to the sample size T, the Bartlett-based Q*_T^B compares favorably with the QS-based Q*_T^QS in terms of power. This is in contrast with the result in HAC estimation, where the latter is usually preferred to other kernels.

• Reference: Kiefer, N. M., T. J. Vogelsang and H. Bunzel (2000).


“Simple robust testing of regression hypothesis,” Econometrica, 68,
695–714.


Testing for Autocorrelation: LM Test


• There are several autocorrelation tests. Under the null hypothesis of no autocorrelation of order p, we have H0: ρ_1 = ρ_2 = ⋯ = ρ_p = 0.
Under H1, we consider: ε_t = ρ_1 ε_{t−1} + ρ_2 ε_{t−2} + ⋯ + ρ_p ε_{t−p} + u_t
Under H0, we can use OLS residuals.

• Breusch–Godfrey (1978) LM test. Similar to the BP test:
– Step 1. (Same as BP's Step 1). Run OLS on the DGP:
  y = Xβ + ε.  Keep the residuals, e_t.
– Step 2. (Auxiliary Regression). Run the regression of e_t on all the explanatory variables, X, and p lags of e_t:
  e_t = x_t′γ + α_1 e_{t−1} + ⋯ + α_p e_{t−p} + v_t.  Keep the R² (R²_e).
– Step 3. Calculate:
  LM = (T − p) · R²_e → χ²_p.

Testing for Autocorrelation: LM Test


Example: LM-AR Test for the 3 factor F-F model for IBM returns
(p=12 lags):
fit_ibm<- lm(ibm_x ~ Mkt_RF + SMB + HML) # OLS regression
e <- fit_ibm$residuals # OLS residuals
p_lag <- 12 # Select # of lags for test (set p)
e_lag <- matrix(0,T-p_lag,p_lag) # Matrix to collect lagged residuals
a <- 1
while (a<=p_lag) { # Do loop creates matrix (e_lag) with lagged e
za <- e[a:(T-p_lag+a-1)]
e_lag[,a] <- za
a <- a+1
}

Mkt_RF_p <- Mkt_RF[(p_lag+1):T] # Adjust for new sample size: T – p_lag


SMB_p <- SMB[(p_lag+1):T]
HML_p <- HML[(p_lag+1):T]
fit1 <- lm(e[(p_lag+1):T] ~ e_lag + Mkt_RF_p + SMB_p + HML_p) # Auxiliary Regression
r2_e1 <- summary(fit1)$r.squared # get R^2 from Auxiliary Regression
lm_t <- (T - p_lag) * r2_e1 # LM test with p lags


Testing for Autocorrelation: LM Test


Example (continuation):
lm_t # print lm_t
df <- ncol(e_lag) # degrees of freedom of test
1 - pchisq(lm_t,df) # p-value of lm_t
> r2_e1 <- summary(fit1)$r.squared
> r2_e1
[1] 0.0303721
> (T-p_lag)
[1] 557
> lm_t <- (T - p_lag) * r2_e1
> lm_t
[1] 16.91726
> df <- ncol(e_lag) # degrees of freedom for the LM Test
> 1-pchisq(lm_t,df)
[1] 0.1560063

LM-AR(12) Test: 16.91726 ⇒ cannot reject H0 at 5% level (p-value > .05).

If I run the test with p = 4 lags, I get:

LM-AR(4) Test: 2.9747 (p-value = 0.56) ⇒ cannot reject H0 at 5% level (p-value > .05).

Testing for Autocorrelation: LM Test


Example (continuation):
The package lmtest, performs this test, bgtest, (and many others, used in
this class, encompassing, jtest, waldtest, etc). You need to install it
first: install.packages(“lmtest”), then call the library(lmtest).
> library(lmtest)
> bgtest(ibm_x ~ Mkt_RF + SMB + HML, order=12)

Breusch-Godfrey test for serial correlation of order up to


12

data: lr_ibm ~ Mkt_RF + SMB + HML


LM test = 16.259, df = 12, p-value = 0.1797 (minor difference with the previous test, likely due to
multiplication by T. Results do not change much)

Note: If you do not include the original regressors (Mkt_RF, SMB, HML) in the Auxiliary Regression, the test does not change much. You get LM-AR(12) Test: 16.83253 ⇒ very similar. Not entirely correct, but it works well.


Testing for Autocorrelation: LM Test


Example (continuation):
Autocorrelation is very common. If I run the test for Disney, CNP,
or GE, instead, we get significant test results. For DIS:
lr_dis <- log(x_dis[-1]/x_dis[-T])
dis_x <- lr_dis - RF

> bgtest(dis_x ~ Mkt_RF + SMB + HML, order=4)


Breusch-Godfrey test for serial correlation of order up to 4

data: dis_x ~ Mkt_RF + SMB + HML


LM test = 8.6382, df = 4, p-value = 0.07081 ⇒ cannot reject H0 at 5% level (p-value > .05)

> bgtest(dis_x ~ Mkt_RF + SMB + HML, order=12)


Breusch-Godfrey test for serial correlation of order up to 12

data: dis_x ~ Mkt_RF + SMB + HML


LM test = 30.068, df = 12, p-value = 0.002728 ⇒ reject H0 at 5% level (p-value < .05)

Testing for Autocorrelation: LM Test


Example (continuation):
LM tests for autocorrelation (with 12 lags) for GE and CNP again
show significant test results:
lr_ge <- log(x_ge[-1]/x_ge[-T]); ge_x <- lr_ge - RF
lr_cnp <- log(x_cnp[-1]/x_cnp[-T]); cnp_x <- lr_cnp - RF

> bgtest(ge_x ~ Mkt_RF + SMB + HML, order=4)


Breusch-Godfrey test for serial correlation of order up to 4

data: ge_x ~ Mkt_RF + SMB + HML


LM test = 28.257, df = 4, p-value = 0.005073 ⇒ reject H0 at 5% level (p-value < .05)

> bgtest(cnp_x ~ Mkt_RF + SMB + HML, order=12)


Breusch-Godfrey test for serial correlation of order up to 12

data: cnp_x ~ Mkt_RF + SMB + HML


LM test = 31.718, df = 12, p-value = 0.00153 ⇒ reject H0 at 5% level (p-value < .05)


Testing for Autocorrelation: LM Test


• Q: How many lags are needed in the test? In general, enough to
make sure there is no auto-correlation left in the residuals. Using some
criteria for optimal (“automatic”) selection is possible.

• There are some popular rules of thumb: for daily data, 5 or 20 lags; for weekly data, 4 or 12 lags; for monthly data, 12 lags; for quarterly data, 4 lags.


Testing for Autocorrelation: Durbin-Watson


• The Durbin-Watson (1950) (DW) test for AR(1) autocorrelation:
H0: ρ = 0 against H1: ρ ≠ 0. Based on simple correlations of e.

  d = Σ_{t=2}^{T} (e_t − e_{t−1})² / Σ_{t=1}^{T} e_t²

• It is easy to show that when T → ∞, d ≈ 2(1 − ρ).
• ρ is estimated by the sample correlation r.
• Under H0, ρ = 0. Then, d should be distributed randomly around 2.
• Small values of d lead to rejection of H0. The distribution depends on X. Durbin and Watson derived bounds for the test.
• In the presence of lagged dependent variables, Durbin's (1970) h test should be used: h = r·sqrt{T/(1 − T·s²_γ)}, where s²_γ is the estimated variance of the coefficient on the lagged dependent variable.


Testing for Autocorrelation: DW Test


Example: DW Test for the 3 factor F-F model for IBM returns
fit_dw <- lm(ibm_x ~ Mkt_RF + SMB + HML) # OLS regression
e <- fit_dw$residuals # OLS residuals
> RSS <- t(e)%*%e # RSS
> DW <- sum((e[1:(T-1)]-e[2:T])^2)/RSS # DW stat
> DW
[1] 2.042728 ⇒ DW statistic ≈ 2 ⇒ No evidence for autocorrelation of order 1.
> 2*(1-cor(e[1:(T-1)],e[2:T])) # approximate DW stat
[1] 2.048281

• Similar finding for Disney returns:


> DW
[,1]
[1,] 2.1609 ⇒ DW statistic ≈ 2 ⇒ But, DIS suffers from autocorrelation!

⇒ This is why DW tests are not that informative. They only test for AR(1) in the residuals.

Note: The package lmtest performs this test too, dwtest:


> dwtest(y ~ Mkt_RF + SMB + HML)
DW = 2.0427, p-value = 0.7087

Testing for Autocorrelation: DW Test


Example: DW Test for the residuals of the encompassing model
(IFE + PPP) for changes in USD/GBP:
fit_gbp <- lm(lr_usdgbp ~ inf_dif + int_dif)
e_gbp <- fit_gbp$residuals
> dwtest(fit_gbp)

Durbin-Watson test

data: fit_gbp
DW = 1.8588, p-value = 0.08037 ⇒ not significant at 5% level.
alternative hypothesis: true autocorrelation is greater than 0



Testing for Autocorrelation: Portmanteau tests


• Portmanteau tests are tests with a well-defined H0, but no specific H1. We will present two: the Box-Pierce Q test and the Ljung-Box test.

• Box-Pierce (1970) test (Q test).
It tests H0: ρ_1 = ⋯ = ρ_p = 0 using the sample correlations, r_j = γ̂_j / γ̂_0,
where (using time series notation)

  γ̂_j = sample covariance between y_t & y_{t−j} = (1/T) Σ_{t=j+1}^{T} (y_t − ȳ)(y_{t−j} − ȳ)
  γ̂_0 = sample variance.

Then, under H0:

  Q = T Σ_{j=1}^{p} r_j² → χ²_p.

Testing for Autocorrelation: Portmanteau tests


• Ljung-Box (1978) test (LB test).
A variation of the Box-Pierce test. It has a small-sample correction:

  LB = T (T + 2) Σ_{j=1}^{p} r_j²/(T − j) → χ²_p.

• The asymptotic distribution of both tests is based on the fact that, under the null of independent data, √T r → N(0, I).

Note: When analyzing residuals, e_t, of a regression, we compute r_j as:

  r_j = Σ_{t=j+1}^{T} e_t e_{t−j} / Σ_{t=1}^{T} e_t²

• The LB statistic is widely used. But the BG (1978) LM test conditions on X. Thus, it is more powerful.


Testing for Autocorrelation: Portmanteau tests


Example: Q and LB tests with p=12 lags for the residuals in the 3-
factor FF model for IBM excess returns:
RSS <- sum(e_ibm^2)
r_sum <- 0
lb_sum <- 0
p_lag <- 12
a <- 1
while (a <= p_lag) {
za <- as.numeric(t(e_ibm[(p_lag+1):T]) %*% e_ibm[a:(T-p_lag+a-1)])
r_sum <- r_sum + (za/RSS)^2 #sum cor(e[(p_lag+1):T], e[a:(T-p_lag+a-1)])^2
lb_sum <- lb_sum + (za/RSS)^2/(T-(p_lag+1-a)) # divide by T - j, where j = p_lag+1-a is the lag
a <- a + 1
}
Q <- T*r_sum
LB <- T*(T+2)*lb_sum # Ljung-Box small-sample factor T(T+2)
>Q
[1] 16.39559 (p-value = 0.1737815) ⇒ cannot reject H0 at 5% level.
> LB
[1] 16.46854 (p-value = 0.1707059) ⇒ cannot reject H0 at 5% level.

Testing for Autocorrelation: Portmanteau tests


Example (continuation): The Box.test function computes Q & LB:
• Q test
> Box.test(e_ibm, lag = 12, type="Box-Pierce")

Box-Pierce test

data: e
X-squared = 16.304, df = 12, p-value = 0.1777

• LB test
> Box.test(e_ibm, lag = 12, type="Ljung-Box")

Box-Ljung test

data: e
X-squared = 16.61, df = 12, p-value = 0.1649

Note: There is a minor difference between the previous code and the code in Box.test. It comes from how the correlations of e are computed (centered around the mean, or assumed to have zero mean).


Testing for Autocorrelation: Portmanteau tests


Example (continuation): Same tests (p=12 lags) & same model:
• For DIS (dis_x), we get:
>Q
[1] 28.76842 (p-value = 0.004264043) ⇒ reject H0 at 5% level.
> LB
[1] 29.05072 (p-value = 0.003872236) ⇒ reject H0 at 5% level.

• For GE (ge_x), we get


>Q
[1] 24.20958 (p-value = 0.01904602) ⇒ reject H0 at 5% level.
> LB
[1] 24.33922 (p-value = 0.01828389) ⇒ reject H0 at 5% level.

• Autocorrelation in financial asset returns is a common finding in monthly, weekly and daily data.

Testing for Autocorrelation: Portmanteau tests


• Q & LB tests are widely use, but they have two main limitations:
(1) The test was developed under the independence assumption.
If 𝑦 shows dependence, such as heteroscedasticity, the asymptotic
variance of 𝑇 𝒓 is no longer I, but a non-diagonal matrix.

There are several proposals to "robustify" both Q & LB tests; see Diebold (1986), Robinson (1991), Lobato et al. (2001). The "robustified" Portmanteau statistic uses r̃_j instead of r_j, where r̃_j standardizes the sample autocovariance γ̂_j by a robust estimate of its variance (rather than by γ̂_0).

Thus, for Q we have:

  Q* = T Σ_{j=1}^{p} r̃_j² → χ²_p.


Testing for Autocorrelation: Portmanteau tests


(2) The selection of the number of autocorrelations p is arbitrary. The traditional approach is to try different p values, say 3, 6 & 12. Another popular approach is to let the data "select" p, for example, using AIC or BIC, an approach sometimes referred to as "automatic selection."

Escanciano and Lobato (2009) propose combining BIC's and AIC's penalties to select p in Q* (BIC when the sample autocorrelations are small and AIC when they are larger).

• It is common to reach different conclusions from Q and Q*.


Testing for Autocorrelation: Portmanteau tests


Example: Q* tests with automatic selection of p for the residuals in the 3-factor FF model for IBM & DIS excess returns. We use the Auto.Q function in the R package vrtest.
- For IBM (e_ibm), we get:
> library(vrtest)
> Auto.Q(e_ibm, 12) #Maximum potential lag = 12
> $Stat
[1] 0.2781782

$Pvalue
[1] 0.5978978

- For DIS (e_dis), we get:


> Auto.Q(e_dis, 12)
$Stat
[1] 2.649553

$Pvalue
[1] 0.103579 ⇒ Reversal for DIS (now we cannot reject H0).


GLS: The AR(1) Model


• (A1) holds: y = Xβ + ε
But, ε is no longer white noise:
  ε_t = ρ ε_{t−1} + u_t,  |ρ| < 1.  u_t is a white noise error ~ D(0, σ_u²)

Note: This characterizes the disturbances, not the regressors.

Notation: Let L be the lag operator, such that L^q z_t = z_{t−q}. Then,
  (1 − ρL) ε_t = u_t.

• After some algebra, we get
  ε_t = u_t + ρ u_{t−1} + ρ² u_{t−2} + ρ³ u_{t−3} + ...
      = Σ_{j=0}^{∞} ρ^j u_{t−j} = Σ_{j=0}^{∞} (ρL)^j u_t   (a moving average)

• Var[ε_t] = Σ_{j=0}^{∞} ρ^{2j} Var[u_{t−j}] = Σ_{j=0}^{∞} ρ^{2j} σ_u² = σ_u² / (1 − ρ²)
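A quick simulation check of this variance formula in R (values are illustrative):

set.seed(42)
rho <- 0.7; T_sim <- 100000
u <- rnorm(T_sim) # sigma_u = 1
eps <- filter(u, filter = rho, method = "recursive") # eps_t = rho*eps_{t-1} + u_t
var(eps) # simulated variance
1/(1 - rho^2) # theoretical value: sigma_u^2/(1 - rho^2) ~ 1.96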

GLS: AR(1) Case – Autocorrelation Matrix Σ


• Now, we get (A3′) Σ = σ_ε² Ω, with σ_ε² = σ_u²/(1 − ρ²) and

  Ω = [ 1         ρ         ρ²        ⋯   ρ^{T−1} ]
      [ ρ         1         ρ         ⋯   ρ^{T−2} ]
      [ ρ²        ρ         1         ⋯   ρ^{T−3} ]
      [ ⋮         ⋮         ⋮         ⋱   ⋮       ]
      [ ρ^{T−1}   ρ^{T−2}   ρ^{T−3}   ⋯   1       ]

1. Then, we can get the transformation matrix P = Ω^{−1/2}:

  P = Ω^{−1/2} = [ √(1−ρ²)   0    0   ...   0    0 ]
                 [ −ρ        1    0   ...   0    0 ]
                 [ 0        −ρ    1   ...   0    0 ]
                 [ ...      ...  ...  ...  ...  ... ]
                 [ 0         0    0   ...  −ρ    1 ]


GLS: AR(1) Case – Transformed y & X: y* & X*

2. With P = Ω^{−1/2}, we transform the data to do GLS:

  y* = P y = [ √(1−ρ²) y_1,  y_2 − ρ y_1,  y_3 − ρ y_2,  ...,  y_T − ρ y_{T−1} ]′   ⇒ GLS: Transformed y*.

GLS: AR(1) Case – Transformed y & X: y* & X*

2. The transformed x_k column (independent variable k) of matrix X is:

  x_k* = P x_k = [ √(1−ρ²) x_{k1},  x_{k2} − ρ x_{k1},  x_{k3} − ρ x_{k2},  ...,  x_{kT} − ρ x_{k,T−1} ]′   ⇒ GLS: Transformed X*.

3. GLS is done with the transformed data. In (A3′) we assume ρ is known.
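A minimal R sketch of this transformation when ρ is known (y, X and rho are illustrative names; X is assumed to already include the constant column):

T <- length(y)
y_star <- c(sqrt(1 - rho^2)*y[1], y[2:T] - rho*y[1:(T-1)]) # transformed y*
X_star <- rbind(sqrt(1 - rho^2)*X[1, ], X[2:T, ] - rho*X[1:(T-1), ]) # transformed X*
fit_gls <- lm(y_star ~ X_star - 1) # GLS = OLS on the transformed data
summary(fit_gls)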


GLS: The Autoregressive Transformation


• With AR models, sometimes it is easier to transform the data by
taking pseudo differences.

• For the AR(1) model, we multiply the lagged DGP by ρ and subtract it from the original. That is,
  y_t = x_t′β + ε_t,   ε_t = ρ ε_{t−1} + u_t
  ρ y_{t−1} = ρ x_{t−1}′β + ρ ε_{t−1}

  y_t − ρ y_{t−1} = (x_t − ρ x_{t−1})′β + ε_t − ρ ε_{t−1}
  y_t* = x_t*′β + u_t

Now the errors are u_t, which are uncorrelated. We can do OLS with the pseudo differences.

Note: y_t* = y_t − ρ y_{t−1} & x_t* = x_t − ρ x_{t−1} are pseudo differences.

FGLS: Unknown Ω
• The problem with GLS is that Ω is unknown. For example, in the AR(1) case, ρ is unknown.

• Solution: Estimate Ω ⇒ Feasible GLS (FGLS).

• In general, there are two approaches for GLS:

(1) Two-step, or feasible, estimation:
  - First, estimate Ω.
  - Second, do GLS.
Similar logic to HAC procedures: we do not need to estimate Ω, which is difficult with T observations; we estimate (1/T) X′Ω⁻¹X.
– Nice asymptotic properties for the FGLS estimator, but it is no longer BLUE.

(2) ML estimation of β, σ², and ρ at the same time (joint estimation of all parameters). With some exceptions, rare in practice.


FGLS: Specification of Ω
• Ω must be specified first.
• Ω is generally specified (modeled) in terms of a few parameters. Thus, Ω = Ω(θ) for some small parameter vector θ. Then, we need to estimate θ.

Example: ε_t with an AR(1) process. We have already derived σ²Ω as a function of ρ.

Technical note: To achieve full efficiency, we do not need an efficient estimate of the parameters in Ω, only a consistent one.

• For the AR(1) case, there is a simple estimation technique, the Cochrane-Orcutt method.

FGLS Estimation: Cochrane-Orcutt


• yt – yt-1 = (Xt – Xt-1)’  + t - t-1
 yt = yt-1 + Xt’  – Xt-1’  + ut

• We have a linear model, but it is nonlinear in parameters. This is not


a problem: Non-linear estimation is possible.

• Before today’s computer power, Cochrane–Orcutt’s (1949) iterative


procedure was an ingenious way to do NLLS. Steps:
(1) Do OLS. Get the residuals, e. Then estimate ρ with a regression of e_t against e_{t−1}. We use r to denote the estimator of ρ.
(2) FGLS Step. Using r, transform the model to get y* and X*. Do OLS ⇒ get b to estimate β. Get the residuals, e*. Go back to (1).
(3) Iterate until convergence.


FGLS Estimation: Cochrane-Orcutt in R


Example: Cochrane-Orcutt in R
# C.O. function requires Y, X (with constant), OLS b.
c.o.proc <- function(Y,X,b_0,tol){
T <- length(Y)
e <- Y - X%*%b_0 # OLS residuals
rss <- sum(e^2) # Initial RSS of model, RSS_0
rss_1 <- rss # RSS_1 will be used to reset RSS after each iteration
d_rss = rss # initialize d_rss: difference between RSSi & RSSi-1
e2 <- e[-1] # adjust sample size for et
e3 <- e[-T] # adjust sample size for et-1
ols_e0 <- lm(e2 ~ e3 - 1) # OLS to estimate rho
rho <- ols_e0$coeff[1] # initial value for rho, rho_0
i<-1
while (d_rss > tol) { # tolerance of do loop. Stop when diff in RSS < tol
rss <- rss_1 # RSS at iter (i-1)
YY <- Y[2:T] - rho * Y[1:(T-1)] # pseudo-diff Y
XX <- X[2:T, ] - rho * X[1:(T-1), ] # pseudo-diff X
ols_yx <- lm(YY ~ XX - 1) # adjust if constant included in X

FGLS Estimation: Cochrane-Orcutt in R


Example (continuation):
b <- ols_yx$coef # updated OLS b at iteration i
# b[1] <- b[1]/(1-rho) # If constant not pseudo-differenced remove tag #
e1 <- Y - X%*%b # updated residuals at iteration i
e2 <- e1[-1] # adjust sample size for updated et
e3 <- e1[-T] # adjust sample size for updated e_t-1 (lagged et)
ols_e1 <- lm(e2~e3-1) # updated regression to value for rho at iteration i
rho <- ols_e1$coeff[1] # updated value of rho at iteration i, rho_i
rss_1 <- sum(e1^2) # updated value of RSS at iteration i, RSSi
d_rss <- abs(rss_1 - rss) # diff in RSS (RSSi - RSSi-1)
i <- i+1
}

result <-list()
result$Cochrane.Orcutt.Proc <- summary(ols_yx)
result$rho.regression <- summary(ols_e1)
# result$Corrected.b_1 <- b[1]
result$Iterations <- i-1
return(result)
}


FGLS Estimation: Cochrane-Orcutt – iMX


Example: In the model for Mexican interest rates (iMX), we suspect
an AR(1) in the residuals:
  i_{MX,t} = β_0 + β_1 i_{US,t} + β_2 e_t + β_3 mx_I_t + β_4 mx_y_t + ε_t
  ε_t = ρ ε_{t−1} + u_t
• Cochrane-Orcutt estimation.
y <- mx_i_1
T_mx <- length(mx_i_1)
xx_i <- cbind(us_i_1, e_mx, mx_I, mx_y)
x0 <- matrix(1,T_mx,1)
X <- cbind(x0,xx_i) # X matrix
fit_i <- lm(mx_i_1 ~ us_i_1 + e_mx + mx_I + mx_y)
b_i <-fit_i$coefficients # extract coefficients from lm
> summary(fit_i)
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.04022 0.01506 2.671 0.00834 **
us_i_1 0.85886 0.31211 2.752 0.00661 **
e_mx -0.01064 0.02130 -0.499 0.61812
mx_I 3.34581 0.19439 17.212 < 2e-16 ***
mx_y -0.49851 0.73717 -0.676 0.49985

FGLS Estimation: Cochrane-Orcutt – iMX


Example (continuation):
> c.o.proc(y,X,b,.0001)
$Cochrane.Orcutt.Proc

Call:
lm(formula = YY ~ XX - 1)

Residuals:
Min 1Q Median 3Q Max
-0.69251 -0.02118 -0.01099 0.00538 0.49403

Coefficients:
Estimate Std. Error t value Pr(>|t|)
XX 0.16639 0.07289 2.283 0.0238 *
XXus_i_1 1.23038 0.76520 1.608 0.1098 ⇒ no longer significant at 5% level.
XXe_mx -0.00535 0.01073 -0.499 0.6187
XXmx_I 0.41608 0.27260 1.526 0.1289 ⇒ no longer significant at 5% level.
XXmx_y -0.44990 0.53096 -0.847 0.3981
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1


FGLS Estimation: Cochrane-Orcutt – iMX


Example (continuation):
Residual standard error: 0.09678 on 160 degrees of freedom
Multiple R-squared: 0.1082, Adjusted R-squared: 0.08038
F-statistic: 3.884 on 5 and 160 DF, p-value: 0.002381

$rho
e3
0.8830857 ⇒ very high autocorrelation.

$Corrected.b_1
XX
0.1663884 ⇒ Constant corrected if X does not include a constant

$Number.Iteractions
[1] 10 ⇒ algorithm converged in 10 iterations.

FGLS Estimation: Cochrane-Orcutt


• SE[b_CO] and SE[r_CO] are obtained from the regression in the last iteration. If the constant is not pseudo-differenced, the estimated b_CO,0 has to be adjusted by (1 − r_CO). A similar correction applies to SE[b_CO,0].

• If we do not want to lose the first observation, we can use the Prais-Winsten (1954) transformation of the first observation:
  sqrt{1 − ρ²} y_1 & sqrt{1 − ρ²} x_1

• A grid search over ρ can speed up the algorithm considerably. This is the Hildreth-Lu (1960) procedure.
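A minimal sketch of the Hildreth-Lu grid search (y, X and T as in the Cochrane-Orcutt example above; the grid is illustrative):

rho_grid <- seq(-0.95, 0.95, by = 0.01)
rss_grid <- sapply(rho_grid, function(r) {
yy <- y[2:T] - r*y[1:(T-1)] # pseudo-diff y
xx <- X[2:T, ] - r*X[1:(T-1), ] # pseudo-diff X
sum(lm(yy ~ xx - 1)$residuals^2) # RSS at this rho
})
rho_hl <- rho_grid[which.min(rss_grid)] # rho that minimizes RSS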

• The iterative two-step estimation procedure can be easily extended


to AR(p) models.


FGLS Estimation: Cochrane-Orcutt


• Note: Cochrane-Orcutt works well if the specified AR(p) structure is correct. Otherwise, we are in the presence of a misspecified model.

Example: For the 3 FF factor model for IBM returns we run C-O with an AR(1) process for ε_t: ε_t = ρ ε_{t−1} + u_t.

Then, after the final run, we do an LM-AR(3) test on the residuals, u_t. We do this by adding the following lines inside the C-O procedure (and adding to the result list the line result$LM.AR3.test_u <- lm_t_u):
## lm_t for AR(3) in u
ols_u <- lm(u[4:T] ~ u[1:(T-3)] + u[2:(T-2)] + u[3:(T-1)])
r2_u <- summary(ols_u)$r.squared
lm_t_u <- (T-3)*r2_u

$LM.AR3.test_u
[1] 56.29834 ⇒ Very significant. We need to use a higher-order AR(p) model.

FGLS & MLE Estimation


• We need to estimate Ω ⇒ We need a model for Ω = Ω(θ).
⇒ In the AR(1) model, we had Ω = Ω(ρ).

- FGLS estimation is done using Cochrane-Orcutt or NLLS.


- MLE can also be done, say assuming a normal distribution for u_t, to estimate β and ρ simultaneously. For the AR(1) problem, the MLE algorithm works like the Cochrane-Orcutt algorithm.

• For an AR(2) model, Beach-Mackinnon (1978) propose an MLE


algorithm that is very fast to converge.

• For AR(p) models, with p > 3, MLE becomes complicated. Two-step estimation is usually done.


MLE Estimation: Example in R


• Log likelihood of ARMA(1,1)-GARCH(1,1) Model:
log_lik_garch11 <- function(theta, data) {
mu <- theta[1]; delta <- theta[2]; gamma <- theta[3]; alpha0 <- abs(theta[4]);
alpha1 <- abs(theta[5]); beta1 <- abs(theta[6]); chk0 <- (1 - alpha1 - beta1)
r <- ts(data)
n <- length(r)

u <- vector(length=n);
u <- ts(u)
u[1] <- r[1] - mu # set initial value for u[t] series
for (t in 2:n)
{u[t] = r[t] - mu - delta*r[t-1] - gamma*u[t-1]}

h <- vector(length=n); h <- ts(h)


h[1] = alpha0/chk0 # set initial value for h[t] series
if (chk0==0) {h[1]=.00001} #check to avoid dividing by 0
for (t in 2:n)
{h[t] = abs(alpha0 + alpha1*(u[t-1]^2)+ beta1*h[t-1])
if (h[t]==0) {h[t]=.00001} } #check to avoid log(0)

return(-sum(-0.5*log(2*pi) - 0.5*log(abs(h[2:n])) - 0.5*(u[2:n]^2)/abs(h[2:n])))


}

MLE Estimation: Example in R


• To maximize the likelihood we use optim (nlm can also be used):
dat_xy <- read.csv("http://www.bauer.uh.edu/rsusmel/phd/datastream-K-DIS.csv", head=TRUE, sep=",")
summary(dat_xy)
names(dat_xy)

z <- dat_xy$SP500 # S&P 500 90-2016 monthly data

theta0 = c(0.01, -0.1, 0.01, -0.001, 0.2, 0.7) # initial values


ml_2 <- optim(theta0, log_lik_garch11, data=z, method="BFGS", hessian=TRUE)

ml_2$par # estimated parameters

I_Var_m2 <- ml_2$hessian


eigen(I_Var_m2) #check if Hessian is pd.
sqrt(diag(solve(I_Var_m2))) # parameters SE


Autocorrelation as a Common Factor


• From the first-order autocorrelated model:
  y_t = ρ y_{t−1} + x_t′β − ρ x_{t−1}′β + u_t    (*)

• We can generalize (*) using the lag operator L –i.e., L y_t = y_{t−1}:
  (1 − ρL) y_t = (1 − ρL) x_t′β + u_t
Then, dividing by (1 − ρL):
  y_t = x_t′β + u_t/(1 − ρL) = x_t′β + ε_t

• We can think of a model with autocorrelation as a misspecified model. The common factor (1 − ρL) is omitted. See Mizon (1977).

• We can generalize (*) even more by introducing more common lags:
  (1 − B(L)) y_t = (1 − B(L)) x_t′β + u_t,  where B(L) is a function of L, L², ..., L^q.

Common Factor Test


• From the AR(1) model:
(R)  y_t = ρ y_{t−1} + x_t′β − ρ x_{t−1}′β + u_t    (*)

• We can think of (*) as a special case of a more general specification:
(U)  y_t = λ_1 y_{t−1} + x_t′λ_2 + x_{t−1}′λ_3 + u_t

Restrictions needed to get (*): λ_3 = −λ_1 λ_2

• Hendry and Mizon (1980) propose testing the validity of the


restrictions using a LR test, which has an asymptotic χ² distribution, with degrees of freedom equal to the number of restrictions:
  LR = T log[RSS_R/RSS_U]

• The test is known as the common factor (COMFAC) test.


Common Factor Test


• We can use an F-test or Wald tests. See Mizon (1995) and McGuirk
and Spanos (2004).

Note: Since the H0 and H1 models involve lagged y_t's, the test statistics do not follow their asymptotic distributions well in finite samples. Bootstraps are a good idea.

Common Factor Test - Example


• Common Factor Test for 3 FF factor model for IBM returns:
(U) - We fit the unrestricted model: yt = λ1 yt-1 + Xt′ λ2 + Xt-1′ λ3 + ut
> x <- cbind(x0,x1,x2,x3)
> x_l <- cbind(x1,x2,x3)
> reg_u <- lm(y[2:T]~y[1:(T-1)]+x[2:T,]+x_l[1:(T-1),] -1)
> sum(residuals(reg_u)^2)
[1] 2.92264

(R) We fit the restricted model: y_t = ρ y_{t−1} + x_t′β − ρ x_{t−1}′β + u_t


sum2 <- function(theta, x,y) {
rho1 <- theta[1]; mu <- theta[2]; beta <- theta[3:5]; lambda3 <- (-1)*rho1%*%beta
r <- ts(y)
T <- length(r)
T1 <- T-1
u <- vector(length=T1);
u = r[2:T]- rho1*r[1:(T-1)] - x[2:T,1]*mu - x[2:T,2:4]%*%beta - x[1:(T-1),2:4]%*%t(lambda3)
return(sum(u^2))
}


Common Factor Test - Example


> theta0 = c(0.5, -.02, 0.01, -0.005, -0.003)# initial values
> cf_r <- optim(theta0, sum2, x=x, y=y, method="BFGS", hessian=TRUE)

> cf_r$par
[1] 0.875011230 -0.027804863 0.009997961 -0.002767329 -0.003927199
> sum2(cf_r$par, x,y)
[1] 2.927888
> T*log(sum2(cf_r$par, x,y)/sum(residuals(reg_u)^2)) # LR COMFAC TEST
[1] 0.5561482

• F-test = [(2.927888 − 2.92264)/3]/[2.92264/311] = 0.1861477
⇒ cannot reject H0 at 5% level.

Note: The restricted model seems OK. But we need to check that the model is well specified. In this case, is the AR(1) structure enough to remove the autocorrelation in the errors?

Common Factor Test - Example


• We do an LM-AR(5) test to check the errors in the U Model:
> fit_u <- lm(e_u[(5+1):T]~e_u[1:(T-5)]+e_u[2:(T-4)]+e_u[3:(T-3)]+e_u[4:(T-2)]+e_u[5:(T-1)])
> r2_e_u <- summary(fit_u)$r.squared
> lm_t_u <- (T-4)*r2_e_u
> lm_t_u
[1] 70.75767 ⟹ Very significant (p-value: 1.6e-14). An AR(1) structure is not
sufficient to remove AR in errors.

In general, if we allow for more dynamics in the U Model we do


better. For example, we use 4 lags in yt and 2 lags in Xt:

> reg_u4 <- lm(y[5:T]~y[1:(T-4)]+y[2:(T-3)]+y[3:(T-2)]+y[4:(T-1)] +x[5:T,]+x_l[4:(T-1),]+x_l[3:(T-2),] -1)


> e_u4 <- residuals(reg_u4)
> fit_u5 <- lm(e_u4[(5+1):T]~e_u4[1:(T-5)]+e_u4[2:(T-4)]+e_u4[3:(T-3)]+e_u4[4:(T-2)]+e_u4[5:(T-1)])
> r2_e_u5 <- summary(fit_u5)$r.squared
> lm_t_u5 <- (T-5)*r2_e_u5
> lm_t_u5
[1] 6.938392 ⟹ Not significant (p-value: .139).


Building the Model


• Old (pre-LSE school) view: A feature of the data
– “Account” for autocorrelation in the data.
– Different models, different estimators

• Contemporary view: Why is there autocorrelation?


– What is missing from the model?
– Build in appropriate dynamic structures
– Autocorrelation should be “built out” of the model
– Use robust procedures (OLS with Newey-West SE) instead of elaborate models specifically constructed for the AR errors.

