Auto/cross-Correlation: Generalized Regression Model
Lecture 13
Auto/cross-correlation
• We assume that the ε's in the sample are no longer generated independently of each other. Ignoring heteroscedasticity, we have a new Σ:
E[ε_i ε_j |X] = σ_ij  if i ≠ j
             = σ²     if i = j.
Auto-correlation
• In general, we find autocorrelation (or serial correlation) in time series: shocks are persistent over time, since it takes time to absorb a shock.
Auto-correlation
Examples:
- First-order autoregressive autocorrelation, AR(1):
  ε_t = ρ ε_{t−1} + u_t
- pth-order autoregressive autocorrelation, AR(p):
  ε_t = ρ_1 ε_{t−1} + ρ_2 ε_{t−2} + ⋯ + ρ_p ε_{t−p} + u_t
- Third-order moving average autocorrelation, MA(3):
  ε_t = u_t + λ_1 u_{t−1} + λ_2 u_{t−2} + λ_3 u_{t−3}
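A quick way to get a feel for these processes is to simulate them in R; the series names and parameter values below are just illustrative, not part of the lecture's code:

set.seed(123)
e_ar1 <- arima.sim(model = list(ar = 0.7), n = 500)               # AR(1) errors with rho = 0.7
e_ma3 <- arima.sim(model = list(ma = c(0.5, 0.3, 0.2)), n = 500)  # MA(3) errors
acf(e_ar1)   # sample autocorrelations decay geometrically for the AR(1)
acf(e_ma3)   # sample autocorrelations cut off after lag 3 for the MA(3)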
• It looks like a small ρ, but there is no very clear pattern in the graphs.
• Again, it looks like a small ρ, but no very clear pattern.
• Relative efficiency of GLS vs. OLS for coefficient i:

RE_i = Var[β̂_GLS,i] / Var[b_OLS,i] = [ (X′Ω⁻¹X)⁻¹ ]_ii / [ (X′X)⁻¹ X′ΩX (X′X)⁻¹ ]_ii

• Example: AR(1) errors with parameter ρ and an AR(1) regressor with parameter θ:

RE = Var[β̂_GLS] / Var[b_OLS] = (1 − ρ²)(1 − ρθ) / [ (1 + ρθ)(1 + ρ² − 2ρθ) ]
• The relative efficiency can be very poor for large ρ for any given θ.
• The OLS estimators can be quite reasonable for a low degree of autocorrelation for any given θ. For example, when ρ = 0.3 and θ = 0.9, RE ≈ 0.9510.
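A small R sketch to evaluate the RE formula above (the function name relative_eff is illustrative):

relative_eff <- function(rho, theta) {
  (1 - rho^2) * (1 - rho*theta) / ((1 + rho*theta) * (1 + rho^2 - 2*rho*theta))
}
relative_eff(0.3, 0.9)   # ~ 0.951: OLS is almost as efficient as GLS
relative_eff(0.9, 0.9)   # ~ 0.105: OLS is very inefficient for large rho
outer(c(.1, .3, .5, .7, .9), c(.1, .5, .9), relative_eff)   # RE over a grid of (rho, theta)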
Newey-West estimator
• The performance of NW estimators depends on the choice of the
kernel function –i.e., kL– and truncation lag (L). These choices affect
the resulting test statistics and render testing results fragile.
To determine L, we use:
- Trial and error, informed guess.
- Rules of thumb. For example: L = 0.75·T^(1/3) − 1.
- Automatic selection rules, following Andrews (1991), Newey
and West (1994) or Sun et al. (2008).
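In R, NW HAC SE can be computed with the sandwich and lmtest packages; a minimal sketch, assuming fit is the lm object from the original regression:

library(sandwich)   # NeweyWest(), kernHAC()
library(lmtest)     # coeftest()
L <- floor(0.75 * nobs(fit)^(1/3) - 1)           # rule-of-thumb truncation lag from above
coeftest(fit, vcov = NeweyWest(fit, lag = L))    # Bartlett-kernel NW HAC t-tests
coeftest(fit, vcov = NeweyWest(fit))             # automatic lag selection (Newey-West, 1994)
coeftest(fit, vcov = kernHAC(fit))               # QS kernel with Andrews (1991) bandwidth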
• Sun et al. (2008) give some intuition for choosing a longer L than the MSE-optimal L, by expanding the rejection probability of the test. Simple example:
[Figure: kernel weight function k_L(x) plotted against x.]
• Based on the work of Andrews (1991), who finds the HAC estimator that minimizes the AMSE of the LRV estimator, the QS kernel tends to be the default kernel in computations of HAC SE.
NW Estimator: Improvements
• Other than finding a good kernel and a (long) L, the performance of
HAC estimators may be improved by:
• The EWP estimator has the nice property that fixed-b asymptotic
inference can be conducted using standard t and F distributions.
• Müller (2007) and Sun (2013) note that other estimators of Q* can be derived by replacing the Fourier functions in S_EWP,T with other basis functions from a general orthonormal set of basis functions for L²[0,1].
• This is not the case for the NW HAC SE: in finite samples, they are downward biased, and tests are usually over-sized –i.e., not conservative.
• KV (2002b) show that, for Q*_k with the truncation lag equal to the sample size T, the Bartlett-based Q*_BT compares favorably with the QS-based Q*_QS in terms of power. This is in contrast with the result in HAC estimation, where the QS kernel is usually preferred to other kernels.
• There are some popular rules of thumb: for daily data, 5 or 20 lags; for weekly data, 4 or 12 lags; for monthly data, 12 lags; for quarterly data, 4 lags.
• The Durbin-Watson (DW) statistic:

d = Σ_{t=2}^{T} (e_t − e_{t−1})² / Σ_{t=1}^{T} e_t²
• Note: This is why the DW test is not that informative: it only tests for AR(1) in the residuals.

Example (R output):
Durbin-Watson test
data: fit_gbp
DW = 1.8588, p-value = 0.08037 ⇒ not significant at 5% level.
alternative hypothesis: true autocorrelation is greater than 0
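This output is the kind produced by lmtest::dwtest; a minimal sketch, assuming fit_gbp is the fitted lm object:

library(lmtest)
dwtest(fit_gbp, alternative = "greater")   # H0: no autocorrelation vs. H1: rho > 0
# d close to 2 => little evidence of AR(1); d well below 2 => positive autocorrelation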
• Box-Pierce test: Q = T Σ_{j=1}^{p} r_j² → χ²_p.

• Ljung-Box (LB) test: LB = T (T + 2) Σ_{j=1}^{p} r_j² / (T − j) → χ²_p.
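A minimal sketch of computing Q and LB directly from the sample autocorrelations (e is the residual series and p the number of lags; both names are assumptions):

p <- 12
T <- length(e)
r <- acf(e, lag.max = p, plot = FALSE)$acf[2:(p + 1)]   # r_1, ..., r_p (drop the lag-0 term)
Q  <- T * sum(r^2)                                      # Box-Pierce
LB <- T * (T + 2) * sum(r^2 / (T - (1:p)))              # Ljung-Box
pchisq(Q,  df = p, lower.tail = FALSE)                  # p-value, chi-squared with p df
pchisq(LB, df = p, lower.tail = FALSE)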
Box-Pierce test
data: e
X-squared = 16.304, df = 12, p-value = 0.1777
• LB test
> Box.test(e_ibm, lag = 12, type="Ljung-Box")
Box-Ljung test
data: e
X-squared = 16.61, df = 12, p-value = 0.1649
Note: There is a minor difference between the previous code and the code in Box.test. They differ in how the autocorrelations of e are computed (centered around the sample mean, or assumed to have a zero mean).
Q* = T Σ_{j=1}^{p} r̃_j² → χ²_p.
$Pvalue
[1] 0.5978978
$Pvalue
[1] 0.103579 Reversal for DIS
• Var[ε_t] = Σ_{j=0}^{∞} ρ^{2j} Var[u_{t−j}] = Σ_{j=0}^{∞} ρ^{2j} σ_u² = σ_u² / (1 − ρ²)

(A3′)  σ²Ω = [σ_u² / (1 − ρ²)] ×
[ 1         ρ         ρ²        ⋯   ρ^{T−1} ]
[ ρ         1         ρ         ⋯   ρ^{T−2} ]
[ ρ²        ρ         1         ⋯   ρ^{T−3} ]
[ ⋮         ⋮         ⋮         ⋱   ⋮       ]
[ ρ^{T−1}   ρ^{T−2}   ρ^{T−3}   ⋯   1       ]

Ω^{−1/2} =
[ √(1−ρ²)   0    0   ⋯   0    0 ]
[ −ρ        1    0   ⋯   0    0 ]
[ 0        −ρ    1   ⋯   0    0 ]
[ ⋮                  ⋱        ⋮ ]
[ 0         0    0   ⋯  −ρ    1 ]
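A small R sketch (with illustrative values for rho, sigma_u and T) that builds these matrices and checks that the transformation whitens the AR(1) errors:

rho <- 0.5; sigma_u <- 1; T <- 5
Omega <- toeplitz(rho^(0:(T - 1)))            # Toeplitz matrix with entries rho^|i-j|
Sigma <- (sigma_u^2 / (1 - rho^2)) * Omega    # sigma^2 * Omega from (A3')
P <- diag(T); P[1, 1] <- sqrt(1 - rho^2)      # Omega^(-1/2): sqrt(1 - rho^2) in the (1,1) cell
P[cbind(2:T, 1:(T - 1))] <- -rho              # and -rho on the first sub-diagonal
round(P %*% Sigma %*% t(P), 10)               # = sigma_u^2 * I: the transformed errors are white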
• GLS transformation with P = Ω^{−1/2}:

P = Ω^{−1/2} =
[ √(1−ρ²)   0    0   ⋯   0    0 ]
[ −ρ        1    0   ⋯   0    0 ]
[ 0        −ρ    1   ⋯   0    0 ]
[ ⋮                  ⋱        ⋮ ]
[ 0         0    0   ⋯  −ρ    1 ]

y* = P y = [ √(1−ρ²) y_1,  y_2 − ρ y_1,  y_3 − ρ y_2,  …,  y_T − ρ y_{T−1} ]′   GLS: Transformed y*.

x* = P x = [ √(1−ρ²) x_1,  x_2 − ρ x_1,  x_3 − ρ x_2,  …,  x_T − ρ x_{T−1} ]′   GLS: Transformed X*.
y_t − ρ y_{t−1} = (x_t − ρ x_{t−1})′ β + (ε_t − ρ ε_{t−1})

y*_t = x*_t′ β + u_t

Now the errors, u_t, are uncorrelated. We can do OLS with the pseudo-differences.
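A minimal sketch of this transformation in R for a known rho (y, X and rho are illustrative names; the first observation is kept via the sqrt(1 - rho^2) scaling discussed below):

T <- length(y)
y_star <- c(sqrt(1 - rho^2) * y[1], y[2:T] - rho * y[1:(T - 1)])
X_star <- rbind(sqrt(1 - rho^2) * X[1, , drop = FALSE],
                X[2:T, , drop = FALSE] - rho * X[1:(T - 1), , drop = FALSE])
gls_fit <- lm(y_star ~ X_star - 1)   # OLS on (y*, X*) gives the GLS estimate of beta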
FGLS: Unknown Ω
• The problem with GLS is that Ω is unknown. For example, in the AR(1) case, ρ is unknown.
FGLS: Specification of Ω
• Ω must be specified first.
• Ω is generally specified (modeled) in terms of a few parameters. Thus, Ω = Ω(θ) for some small parameter vector θ. Then, we need to estimate θ.
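In the AR(1) case, a natural first step is sketched below (the object names mirror those in the C-O fragment that follows: ols_yx, ols_e1; y and X are illustrative):

ols_yx <- lm(y ~ X)                       # Step 1: OLS on the original data
e <- residuals(ols_yx)
T <- length(e)
ols_e1 <- lm(e[2:T] ~ e[1:(T - 1)] - 1)   # Step 2: regress e_t on e_{t-1} to estimate rho
rho <- coef(ols_e1)[1]
# Step 3: quasi-difference y and X with this rho, re-run OLS, and iterate (Cochrane-Orcutt)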
result <- list()
result$Cochrane.Orc.Proc <- summary(ols_yx)   # final OLS on the transformed (quasi-differenced) data
result$rho.regression <- summary(ols_e1)      # regression of e_t on e_{t-1} used to estimate rho
# result$Corrected.b_1 <- b[1]
result$Iterations <- i - 1                    # number of iterations until convergence
return(result)
}
Call:
lm(formula = YY ~ XX - 1)
Residuals:
Min 1Q Median 3Q Max
-0.69251 -0.02118 -0.01099 0.00538 0.49403
Coefficients:
Estimate Std. Error t value Pr(>|t|)
XX 0.16639 0.07289 2.283 0.0238 *
XXus_i_1 1.23038 0.76520 1.608 0.1098 no longer significant at 5% level.
XXe_mx -0.00535 0.01073 -0.499 0.6187
XXmx_I 0.41608 0.27260 1.526 0.1289 no longer significant at 5% level.
XXmx_y -0.44990 0.53096 -0.847 0.3981
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
$rho
e3
0.8830857 very high autocorrelation.
$Corrected.b_1
XX
0.1663884 Constant corrected if X does not include a constant
$Number.Iteractions
[1] 10 algorithm converged in 10 iterations.
• If we do not want to lose the first observation, we can use the Prais-Winsten (1954) transformation of the first observation:
√(1 − ρ²) y_1  and  √(1 − ρ²) x_1
Example: For the Fama-French 3-factor model for IBM returns, we run C-O with an AR(1) process for ε_t: ε_t = ρ ε_{t−1} + u_t.
Then, after the final run, we do an LM-AR(3) test on the residuals, u_t.
We do this by adding the following code to the C-O procedure (and adding to the result list the last line: result$LM.AR3.test_u <- lm_t_u):
## lm_t for AR(3) in u
ols_u <- lm(u[4:T] ~ u[1:(T-3)] + u[2:(T-2)] + u[3:(T-1)])
r2_u <- summary(ols_u)$r.squared
lm_t_u <- (T-3)*r2_u
$LM.AR3.test_u
[1] 56.29834 Very significant. We need to use a higher AR(p) model.
u <- vector(length = n)
u <- ts(u)
u[1] <- r[1] - mu                    # set initial value for the u[t] series
for (t in 2:n) {
  u[t] <- r[t] - mu - delta*r[t-1] - gamma*u[t-1]   # recursion for the errors u[t]
}
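This recursion is typically embedded in a conditional sum-of-squares objective that is then minimized numerically; a hedged sketch (css and the starting values are assumptions, not the lecture's code):

css <- function(par, r) {                 # CSS for r_t = mu + delta*r_{t-1} + gamma*u_{t-1} + u_t
  mu <- par[1]; delta <- par[2]; gamma <- par[3]
  n <- length(r)
  u <- numeric(n)
  u[1] <- r[1] - mu
  for (t in 2:n) u[t] <- r[t] - mu - delta*r[t-1] - gamma*u[t-1]
  sum(u^2)
}
fit_css <- optim(c(mean(r), 0, 0), css, r = r)   # minimize over (mu, delta, gamma)
fit_css$par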
Note: Since the H0 and H1 models involve lagged y_t's, the test statistics do not follow their usual asymptotic distributions. A bootstrap is a good idea.
> cf_r$par
[1] 0.875011230 -0.027804863 0.009997961 -0.002767329 -0.003927199
> sum2(cf_r$par, x,y)
[1] 2.927888
> T*log(sum2(cf_r$par, x, y)/sum(residuals(reg_u)^2))  # LR COMFAC TEST
[1] 0.5561482
Note: The restricted model seems OK. But we need to check that the model is well specified. In this case, is the AR(1) structure enough to remove the autocorrelation in the errors?