Rahim Karim J 201410 PHD
to Nonstationary Data
by
Queen’s University
Kingston, Ontario, Canada
October, 2014
This thesis is concerned with changes in the spectrum over time observed in Holocene
climate data as recorded in the Burgundy grape harvest date series. These changes
represent nonstationarities. Spectral estimation techniques are relatively robust in
the presence of nonstationarity: they can detect significant contributions to power at
a given frequency even when that contribution is not constant over time. Nevertheless,
estimation and prediction can be improved by taking nonstationarity into account. We
propose improving spectral estimation by considering such changes. Specifically, we
propose estimating the level of change in frequency over time, detecting one or more
change-points, and sectioning the time series into stationary segments. We focus on
locating, in time, a change in the frequency domain, and propose a graphical technique
to detect spectral changes over time. We test the estimation technique in simulation
and then apply it to the Burgundy grape harvest date series. This series was selected
to demonstrate the introduced estimator and methodology because it is equally spaced
and has few missing values, and because a multitaper spectral analysis of it, the
approach on which the methodology proposed in this thesis is based, was recently
published. In addition, we propose a method using a test for goodness-of-fit of
autoregressive estimators to aid in assessing change in spectral properties over time.
This thesis has four components: (1) introduction and study of a level-of-change
estimator for use in frequency-domain change-point detection, (2) spectral analysis
of the Burgundy grape harvest date series, (3) goodness-of-fit estimates for
autoregressive processes, and (4) introduction of a statistical software package for
multitaper spectral analysis. We present four results. (1) We introduce and demonstrate
the feasibility of a level-of-change estimator. (2) We present a spectral analysis and
coherence study of the Burgundy grape harvest date series that includes locating a
change-point. (3) We present a study showing an advantage of using multitaper spectral
estimates when calculating autocorrelation coefficients. (4) We introduce an R
software package, available on the Comprehensive R Archive Network (CRAN), to
perform multitaper spectral estimation.
Acknowledgments
I would like to thank my advisor, David Thomson, for sharing his knowledge and
interest, and, perhaps most importantly, for his kindness, insight, encouragement, and
honesty in working with me in this endeavour. I would like to thank the following past
and current students of David Thomson who have provided helpful discussion along
the way: Wesley Burr, Charlotte Haley, Kyle Lepage, Ian Moore, Joshua Pohlkamp-Hartt,
David Riegert, and Aaron Springford. I would also like to thank Maja-Lisa Thomson,
Valdimar Tasnov, and Jim Diamond for helpful discussions, and Jennifer Reid for making
this department a comfortable place to be. Finally, and importantly, I would like to
thank my family for their patience, kindness, and encouragement along the way.
Co-authorship
Chapters 3, 4, and 5 are co-authored with David J. Thomson. Appendix A, which
discusses the multitaper R software package, is co-authored with Wesley S. Burr and
David J. Thomson. The multitaper R software package is co-authored with Wesley S. Burr
and David J. Thomson.
Table of Contents
Abstract
Acknowledgments
Co-authorship
Table of Contents
List of Tables
Chapter 1: Introduction
Chapter 2: Background
2.1 Time Series Analysis
2.2 Stochastic Processes
2.3 Stationary Time Series
2.4 Several Definitions
2.5 Spectral Density Function
2.6 Spectral Estimation
2.7 Spectral Representation of a Stationary Process
2.8 Nonstationary Harmonizable Process
2.9 Multitaper Spectral Estimation Overview
2.10 Aliasing
2.11 Zero Padding Spectral Estimates
2.12 Jackknife Estimates
2.13 Coherence
2.14 Spectrograms
Chapter 3: Frequency-domain Change-point Detection
3.1 Introduction
3.2 Change-points Problem Overview
3.3 Literature Review of Change-point Techniques
3.4 Additional Preliminaries
3.5 Level-of-change in Frequency-domain
3.6 Simulation Study of Estimator
3.7 Suggested Methodology
3.8 Summary and Comments
Chapter 4: Burgundy Grape Harvest Dates
4.1 Introduction
4.2 Initial Analysis
4.3 Spectrograms and Level-of-change
4.4 Summary and Concluding Remarks
Chapter 5: Goodness-of-fit in AR Processes
5.1 Introduction
5.2 Calculation of AR Coefficients
5.3 Cautionary Notes on Using AR Spectral Estimates
5.4 Comparison of Methods for Finding AR Coefficients
5.5 Goodness-of-fit Test for Autoregressive Processes
5.6 Simulations of Goodness-of-fit
5.7 Burgundy Grape Harvest Dates
5.8 Conclusions and Future Work
Chapter 6: Concluding Remarks
Bibliography
Appendix A: Multitaper R Package
A.1 Appendix Overview
A.2 Introduction
A.3 The Theory of Multitaper Spectral Estimation
A.4 Addressing Statistical Significance with Multitaper Tools
A.5 Bivariate Time Series: Magnitude-squared Coherence
A.6 Complex Demodulation
A.7 Additional Tools and Extending Functionality
A.8 Summary
List of Tables
3.1 Random samples of size 2048, 4096, and 8192 were generated, and multitaper
adaptively weighted block spectrograms were constructed using block lengths of 128,
256, and 512, respectively. The table gives sample spectral means found by simulation
for the periodogram and for nonadaptively weighted multitaper spectral estimates with
time-bandwidth parameters NW = 2, 3, 4, and 5.
3.2 Sample variances, with random samples generated and multitaper spectrograms
constructed as in Table 3.1. Observed sample variances of spectrograms constructed
using adaptive weighting are higher than both the theoretical variances and the
simulated variances of multitaper spectrograms constructed without adaptive weighting.
3.3 Sample means of the level-of-change estimator from an N (0, 1)3 distribution.
4000-run simulations were made, each 16 blocks long. The bottom row gives the
approximations derived in Section 3.5.2.
3.4 Sample variances of the simulated level-of-change estimator from an N (0, 1)3
distribution. 4000-run simulations were made, each 16 blocks long. The bottom row
gives the approximations derived in Section 3.5.2.
3.5 Average, across blocks and frequencies, of the standard error matrix of the
level-of-change estimator, using adaptive weights with NW = 5 and K = 9, from 4000
simulations of the autoregressive moving average (ARMA)(4,2) process.
3.6 Average, across blocks and frequencies, of the standard sample mean of the
level-of-change estimator, using adaptive weights with NW = 5 and K = 9, from 4000
simulations of the ARMA(4,2) process.
3.7 Cutoffs for controlling Type I error for the level-of-change estimator, based on
the maximum value in each level-of-change matrix, from 4000 simulations of an
N(0, 1) process.
3.8 A sample of potential block sizes, selected using the criterion that data at the
end points not be discarded. In general, when the offset is small, the options for
block size increase; the trade-off is that choosing block size and offset close
together minimizes the overlap.
5.2 Shape and rate parameters, with their respective standard errors (SE), for the
fitted gamma distributions shown in Figure 5.2. Both the shape and rate parameters
are considerably higher for the case where the simulated autoregressive (AR) model
did not match the theoretical model.
5.3 Maximum absolute deviation (max abs dist) of the observed grape harvest date
(GHD) standardized integrated spectrum from the theoretical standardized integrated
spectrum for the various models, and approximate p-values based on simulations
testing the null hypothesis that the maximum absolute deviation is small enough for
the model to be appropriate.
List of Figures
3.5 Multitaper adaptively weighted spectrogram of a realization of the ARMA(4,2)
process. Each block is 128 samples long, and the multitaper parameters used are
NW = 5 and K = 9.
3.6 Multitaper adaptively weighted spectrum estimate of all 2048 samples from the
same realization of the ARMA(4,2) process, using multitaper parameters NW = 5 and
K = 9.
3.7 Level-of-change estimator, constructed without adaptive weights, NW = 5, and
K = 9, for the ARMA(4,2) example shown in Figure 3.5. This plot has h
3.8 Level-of-change estimator, constructed using adaptive weights with NW = 5 and
K = 9, for the ARMA(4,2) example shown in Figure 3.5. This image is less noisy than
the one without adaptive weights. This plot has some high-valued false detections
which require further examination.
3.9 Level-of-change estimator plot showing only values above the 4.16 cutoff,
constructed using adaptive weights with NW = 5 and K = 9, for the ARMA(4,2) example
shown in Figure 3.5. The only detected values are in a region where false detections
are expected due to the low resolution of each block.
3.10 Multitaper spectrogram of simulated data containing two sinusoidal frequencies,
one of which is considerably damped at the halfway point. In this case the
nonstationarity is clearly visible in the spectrogram. The black line segment in the
upper left indicates the bandwidth, 2W. The first half of the data has a sinusoid of
amplitude $A_{1a} = 1$ at $f_{1a} = 0.09$ and a sinusoid of amplitude $A_2 = 0.6$ at
$f_2 = 0.2$. The second half has a sinusoid of amplitude $A_{1b} = 0.2$ at
$f_{1b} \approx 0.0526$. The background noise has constant variance of one. The
multitaper parameters used were NW = 5 and K = 9. The low-amplitude frequency at
$\approx 0.0526$ is not distinguishable at this block length.
3.11 We plot the level-of-change estimator between adjacent blocks, trimming the
blocks by W at the frequency edges (zero and Nyquist frequencies). Note that we
visually detect a change between blocks 8 and 9 at a frequency of approximately
0.091 (1/11).
3.12 Bartlett M-test for this change-point example. This test shows nonstationarity
at the frequency where there is a change in amplitude and a change in frequency. The
line segment below the legend indicates the bandwidth, 2W, and the two dashed lines
indicate the chi-squared expected value and the 95% value. The multitaper parameters
used were NW = 5 and K = 9.
3.13 Average eighth block-pair level-of-change column over 4000 simulations. This
figure shows that the average observed level-of-change over the 4000 simulations is
considerably higher in the frequency range where the change-point occurs. The
multitaper parameters used were NW = 5 and K = 9.
3.14 Densities of the level-of-change estimator for a model with a change-point and
a model without. These are based on 4000 simulations comparing maximum values of a
model with a change-point to one without. The densities intersect at 0.68.
4.1 (a) Burgundy GHD plotted as number of days after September 1st. Five additional
series are also shown: (b) Swiss GHD as days after September 1st; there are several
large gaps in the first part of this series. (c) Central England Temperature (CET)
annual temperature series. (d) Annual phase of the CET series in (angular) degrees.
(e) Estimated total solar irradiance (TSI) in watts per square metre. (f) Three
reconstructions of the El Niño–Southern Oscillation (ENSO) cycle, shown in normalized
degrees Celsius.
4.2 Multitaper spectra of the GHD series. Multitaper spectral estimates were made with
NW = 3, 4, 5, and 6, and with K = 5, 7, 9, and 10, respectively, starting at the top
left. The crosses at approximately 0.135 cycles/year indicate the passband bandwidth,
2W, and the height of the approximate theoretical 95% confidence interval based on
the $\chi^2_{2K}$ distribution. Note that the peak at a period of 3.9 years closely
agrees with Tourre et al. (2011).
4.3 This figure shows the harmonic F-test statistic for the harvest dates. The
parameter values used are NW = 3, 4, 5, and 6, with K = 5, 7, 9, and 10,
respectively. The red dashed line indicates a 1 − 1/N level of significance, where
N = 634, in keeping with the rule of thumb for the harmonic F-test (see Section 3.6.1).
We note that the most significant peak occurs at a period of 4.14 years, which is
close to the period of 3.9 years reported in Tourre et al. (2011, p. 247).
4.4 Overlapping section of the Swiss and Burgundy GHD series, consisting of years
1550 to 2003; no prewhitening has been applied, and the magnitude-squared coherence
(MSC) is presented in the next plot. We note that the Swiss harvest is on average
∼14 days after the Burgundy harvest.
4.5 MSC between Swiss and Burgundy GHDs. The coherence is constructed from the
overlapping years 1550 to 2003 and is based on multitaper spectral estimates with
parameters NW = 4 and K = 7. The y-axis indicates a normalized MSC; an inverse
hyperbolic tangent transform is known to make the MSC approximately normally
distributed (Thomson and Chave, 1991b). The dashed red line indicates the inherent
bias in the estimate; specifically, it shows that a coherence of about 0.14 is
expected for uncorrelated samples. The faint dashed line on the coherence plot
represents the lower bound of a one-standard-deviation jackknife confidence interval.
The two dashed blue lines indicate significance levels of 95% and 99%, corresponding
to MSC values of 0.39 and 0.54, respectively.
4.6 Phase coherence between the Burgundy and Swiss GHDs. Coherence is defined in
(2.67) and based on the multitaper cross-spectrum in (2.68). In these equations the
Burgundy series is represented by x and the Swiss series by y. Two-standard-deviation
confidence intervals are indicated on the plots; the green line represents multitaper
jackknife confidence intervals, and the blue line represents approximate theoretical
confidence intervals (Bendat and Piersol, 2011, p. 306). These agree well. The phase
is generally consistent with zero, excluding the low-frequency part, and no phase
unwrapping was required. Between periods of ∼208 and ∼90 years there is a sharp drop
to −69 degrees. Both edge frequencies are well known in the climate literature:
208 years is one of the main "Suess cycles" (Thomson, 1990b), and 90 years is very
close to the upper peak, 91.5 years, of the ∼88-year Gleissberg cycle triplet
(Peristyk and Damon, 2003). The linear regression line (in grey) has a negative
intercept and a positive slope. This indicates that the Swiss series leads the
Burgundy series by ∼9 days.
4.7 Plots of the Burgundy GHD and the CET annual series for overlapping years 1661
to 2003.
4.8 MSC between Central England average annual temperature and the Burgundy harvest
dates from 1661 to 2003. The parameters used are NW = 6.5 and K = 11. The dashed red
line indicates the bias value of 0.09, and the dashed blue lines indicate MSC values
of 0.173 and 0.201. The coherence is modest, particularly at low frequencies. The
association between GHD and April to August temperatures in Burgundy has been
established (Chuine et al., 2004; Krieger et al., 2011).
4.9 Phase coherence between the Burgundy GHD and the average annual temperature of
the Central England series for years 1661 to 2003. This figure is based on (2.67),
with the Burgundy series represented by x and the Central England series represented
by y. The multitaper parameters are NW = 6.5 and K = 11. The linear regression line
(in red) has a positive intercept and a positive slope. This indicates that the
Burgundy series leads the Central England series by ∼18 days.
4.10 Plot of the Burgundy GHD series and the Central England phase constructed from
three years of monthly data. The phase was first corrected for the three-day offset.
A discussion of obtaining the phase plot is given in Appendix A.6.1.
4.11 MSC between the annual phase of the Central England temperature series and the
Burgundy GHD for years 1661 to 2003. The multitaper parameters are NW = 6.5 and
K = 11. The coherence is modest at low frequencies. The annual phase of the Central
England temperature series was calculated with a zeroth-order Slepian complex
demodulation technique with a length of N = 36 (3 years of monthly data) and
NW = 4.5. The three-day offset for years 1661 to 1752, originally reported in
Thomson (1995) and discussed on page 174, was applied. The dashed red line indicates
the bias value of 0.091, and the dashed blue lines indicate MSC values of 0.17 and
0.21.
4.12 Phase coherence between the Burgundy GHD and the annual phase of the Central
England temperature series calculated over three years. This figure is based on
(2.67), with the Burgundy series represented by x and the annual phase of the Central
England temperature series represented by y. The intercept is positive and the slope
is ∼300 degrees per year, indicating that the Burgundy GHD leads the phase of the CET
series by ∼305 days. The multitaper parameters are NW = 6.5 and K = 11.
4.13 Multitaper spectrogram with considerable overlap. In this case the block length
is 74, there are 71 blocks, and the offset is 8 years. This gives an overlap of about
89%, but it allows for higher-frequency resolution. The vertical line segment on the
left indicates the bandwidth, 2W, and one can see the spectral estimates evolve over
time. The centre line indicates where our analysis sections the series.
4.14 Bartlett M-test for stationarity using blocks with 2.5% (little) overlap. The
expected value (green dashed line) and the 95% significance level (red dotted line)
are shown on the graph. The multitaper parameters used were NW = 3 and K = 5, with
8 blocks, each of length 81, with an offset of 79. The line segment in the top right
of the plot indicates the bandwidth. Nonstationary components lie approximately
between 0.1 and 0.18 cycles/year and between 0.2 and 0.24 cycles/year.
4.15 We plot the level-of-change between blocks in the spectrogram for the GHD. If
we restrict ourselves to the frequency band of interest, 0.10 to 0.18 cycles/year,
based on the Bartlett M-test, we see that considerable change occurs at approximately
the centre of the series.
4.16 Multitaper spectra of the GHD before (top) and after (bottom) the year 1675.5.
The crosses indicate 95% confidence levels and the width of the bandwidth parameter,
2W. On the upper plot, for the data up to the year 1675, the dashed lines indicate
periods of 10.6 years (0.094 cycles/year) and 7.5 years (0.133 cycles/year). On the
lower plot, the dashed line indicates a period of 3.9 years (0.278 cycles/year). It
appears that a change in the spectral properties of the GHD series occurs when the
data are sectioned at the year 1675.
5.1 Estimated fourth reflection coefficient based on a 100000-run simulation of an
AR(4) process with coefficients 2.7607, −3.8106, 2.6535, −0.9238. Levinson-Durbin
estimates using (a) the default estimate, i.e., using the autocovariance sequence
(acvs) from unwindowed Fourier transforms; (b) one discrete prolate spheroidal
sequence (DPSS) taper with NW = 5; and (c) an adaptive multitaper estimate with
k = 8. The dashed line indicates −0.9238, the true value. Mean estimates were −0.425,
−0.914, and −0.920, respectively. The distribution of the Burg estimator is very
similar to that of the multitaper spectral estimator and is not shown.
5.2 This figure shows the maximum absolute distance observed in 40000 simulations.
The top left plot compares a simulated AR(4) to the theoretical AR(4), the top right
plot compares a simulated AR(2) to the theoretical AR(2), the bottom left plot
compares a simulated AR(4) to the theoretical AR(2), and the bottom right plot
compares a simulated AR(2) to a theoretical AR(4). Note the changing y-axis scales.
5.3 We ran 40000 simulations, each comparing a simulated AR(4) to the theoretical
AR(4), top left, a simulated AR(2) to the theoretical AR(2), top right, a simulated
AR(2) to the theoretical AR(2), bottom left, and a simulated AR(4) to the theoretical
AR(2), bottom right. The top two plots indicate the worst fit of the 40000 runs when
the simulations were from the same model as the theoretical AR, and the bottom two
plots indicate the best fit of the 40000 runs when the simulations are from a model
different from the theoretical AR.
5.4 Adaptive multitaper spectrum of the GHD series. The parameters used are NW = 3
and k = 5. Plotted over the spectrum are the standard AR(1) spectrum in red, the
standard AR(8) spectrum in green, the DPSS-tapered AR(8) spectrum in blue, and the
multitaper AR(8) spectrum in cyan. The multitaper AR(8) in cyan and the standard
AR(8) follow each other closely except between the frequencies 0.2 and 0.3
cycles/year, where the multitaper estimate has slightly higher power and appears to
follow the spectral estimate more closely.
A.6 Temperature deviations time series in degrees Celsius with fitted trend lines.
A.7 MSC between monthly CO2 measurements from Mauna Loa and the global temperature
series during 1958–2007. The arctanh transform normalizes the MSC, and each integer
value on this scale represents approximately one standard deviation (Thomson and
Chave, 1991b).
A.8 Central England monthly temperature phase
List of Abbreviations
AR autoregressive
MA moving average
MSE mean-squared-error
List of R Package Function Calls
Chapter 1
Introduction
CHAPTER 1. INTRODUCTION 2
Chapter 2
Background
CHAPTER 2. BACKGROUND 4
The feature of time series analysis which distinguishes it from other statistical
analyses is the explicit recognition of the importance of the order in which the
observations are made. While in many problems the observations are statistically
independent, in time series successive observations may be dependent, and the
dependence may depend on the positions in the sequence. The nature of a series and
the structure of its generating process may also involve in other ways the sequence
in which the observations are taken.
We are interested in time series that generally come under the category of "stochastic
processes," which are described as statistical phenomena that evolve in time according
to probabilistic laws (Chatfield, 2004, p. 27). The theory of stochastic processes is
well developed and is beyond the scope of this work. An introductory book on the
subject that covers the frequency domain is Papoulis and Pillai (2001), a discussion
of stationary stochastic processes is presented in Grenander and Rosenblatt (1984),
and Cox and Miller (1965, ch. 7) provide suitable supplementary information.
Let the sample space be the ensemble of all possible realizations. At any fixed time t,
a real-valued random variable X(t) is a function from this sample space to the real
line that describes the outcome of the experiment at time t. A stochastic process
{X(t) : t ∈ T} is the family of random variables indexed by t, where t belongs to some
given set T.
Most statistical problems are concerned with estimating the properties of a population
based on a sample. An investigator typically determines the sample size and how
randomness is incorporated into the sample. In time series analysis, the observations
are determined by time, and it is rarely possible to take more than one sample at a
given time. While it may be possible to increase the sample size, i.e., the length of
the series, there will only be one sample at each time t. We can imagine an infinite
set of time series, an ensemble, where every member of the ensemble is a possible
realization of a stochastic process and the observed time series is one particular
realization from the ensemble. Ergodic theorems, discussed briefly in Section 2.4.1,
address this theoretically.
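As a minimal illustration of the ensemble idea, the sketch below assumes a hypothetical stationary Gaussian AR(1) process with φ = 0.5 and unit innovation variance; it is not a model used in this thesis, and the function name is illustrative. Each row of the array is one realization. Averaging down a column uses many realizations at one fixed time, which is rarely possible in practice, while averaging along a row uses the single realization actually observed. For this ergodic process both averages estimate the same mean, zero.

```python
import numpy as np

def simulate_ar1_ensemble(n_realizations, n_times, phi=0.5, seed=0):
    """Simulate an ensemble of stationary Gaussian AR(1) realizations.

    Row m is one realization X_m(t), t = 0, ..., n_times - 1 (hypothetical model).
    """
    rng = np.random.default_rng(seed)
    x = np.empty((n_realizations, n_times))
    # Start each realization in the stationary distribution N(0, 1 / (1 - phi^2)).
    x[:, 0] = rng.normal(0.0, 1.0 / np.sqrt(1.0 - phi**2), n_realizations)
    for t in range(1, n_times):
        x[:, t] = phi * x[:, t - 1] + rng.normal(0.0, 1.0, n_realizations)
    return x

ensemble = simulate_ar1_ensemble(n_realizations=2000, n_times=2000)

# Ensemble average: many realizations at one fixed time (rarely available).
ensemble_mean_at_t = ensemble[:, 1000].mean()
# Time average: one realization over many times (what a single series gives).
time_mean_of_one = ensemble[0, :].mean()
print(ensemble_mean_at_t, time_mean_of_one)   # both near the true mean, 0
```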
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-(x-\mu)^2/(2\sigma^2)}, \tag{2.1}$$
$$\mu(t) = \mu, \tag{2.2}$$
$$\sigma^2(t) = \sigma^2. \tag{2.3}$$
and it depends only on the lag τ. Restricting τ to discrete time steps, (2.4) becomes
the autocovariance sequence (acvs), denoted $R_\tau$. The autocorrelation sequence is
the standardized autocovariance sequence, $R_\tau / R_0$; this is discussed in Section 5.2.1.
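A minimal sketch of the sample acvs, assuming the common biased form $\hat R_\tau = N^{-1}\sum_{t=0}^{N-1-\tau}(x_t-\bar x)(x_{t+\tau}-\bar x)$; the function names are illustrative, not from the multitaper package.

```python
import numpy as np

def sample_acvs(x, max_lag):
    """Biased sample autocovariance sequence R_hat[tau], tau = 0..max_lag.

    Uses the 1/N normalization (not 1/(N - tau)), which guarantees a
    non-negative definite sequence.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    return np.array([np.dot(d[: n - tau], d[tau:]) / n for tau in range(max_lag + 1)])

def sample_acs(x, max_lag):
    """Sample autocorrelation sequence: the acvs standardized by R_hat[0]."""
    r = sample_acvs(x, max_lag)
    return r / r[0]

print(sample_acvs([1.0, 2.0, 3.0, 4.0], 2))   # [ 1.25    0.3125 -0.375 ]
```

The 1/N normalization biases $\hat R_\tau$ toward zero at large lags, but it is the version whose Fourier transform is the periodogram.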
Autocorrelations, that is, correlations between samples of the same series at different
times, were first used by Cave-Browne-Cave (1905) in a study of meteorological data.
She had earlier worked with Karl Pearson and computed correlations between
1
Here, in addition to the ordinary autocovariance $r(t-u) = E\{X(t)X^*(u)\}$, one must
have the "outer covariance" $q(t,u) = E\{X(t)X(u)\} = q(t-u)$. Note that X(u) is not conjugated.
This may not be immediately obvious, but it is possible to obtain a consistent estimate
of the properties of a stationary process from a single finite realization. Ergodic
theorems show that, for most stationary processes met in practice, sample moments of
The following terms are used in Section 3.5.2 and are defined here. The cumulants
$\kappa_n$ of a random variable X are defined through the cumulant-generating function
$$g(t) = \ln E\left[e^{tX}\right]. \tag{2.5}$$
The cumulants are the coefficients of the power series expansion of the
cumulant-generating function:
$$g(t) = \sum_{n=1}^{\infty} \kappa_n \frac{t^n}{n!}. \tag{2.6}$$
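In practice the first cumulants can be estimated from sample central moments through the standard relations κ₁ = μ, κ₂ = μ₂, κ₃ = μ₃ and κ₄ = μ₄ − 3μ₂². The sketch below (illustrative helper names) applies them to χ²₂ draws, the distribution that, up to scale, describes a direct spectral estimate, and compares them with the exact cumulants κₙ = 2ⁿ⁻¹(n − 1)! ν of a χ²_ν variable, which follow from expanding g(t) = −(ν/2) ln(1 − 2t) as in (2.6).

```python
import math
import numpy as np

def sample_cumulants(x):
    """First four plug-in sample cumulants, computed from central moments."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    m2, m3, m4 = (np.mean(d**k) for k in (2, 3, 4))
    return np.array([x.mean(), m2, m3, m4 - 3.0 * m2**2])

def chi2_cumulant(n, dof):
    """Exact n-th cumulant of chi^2_dof, from g(t) = -(dof/2) ln(1 - 2t)."""
    return 2 ** (n - 1) * math.factorial(n - 1) * dof

rng = np.random.default_rng(1)
draws = rng.chisquare(2, size=1_000_000)
print(sample_cumulants(draws))                      # close to [2, 4, 16, 96]
print([chi2_cumulant(n, 2) for n in range(1, 5)])   # [2, 4, 16, 96]
```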
The gamma function is defined for all complex numbers except zero and the negative
integers. For complex numbers with a positive real part, the gamma function is defined as
$$\Gamma(t) = \int_0^\infty x^{t-1} e^{-x}\, dx. \tag{2.7}$$
The polygamma function of order m, $\psi^{(m)}(z)$, of a complex value z is defined as
the (m + 1)th derivative of the logarithm of the gamma function:
$$\psi^{(m)}(z) = \frac{d^{m+1}}{dz^{m+1}} \ln \Gamma(z). \tag{2.8}$$
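One standard way these definitions fit together, given here as an illustration and not as the derivation of Section 3.5.2: if Y ~ Gamma(a, 1), then E[e^{t ln Y}] = Γ(a + t)/Γ(a), so by (2.5) and (2.6) the cumulants of ln Y are κₙ = ψ^{(n−1)}(a). Because a χ²_{2K} variable divided by 2 is Gamma(K, 1), this gives the cumulants of a log multitaper estimate up to an additive constant, under the usual approximation S̄(f) ≈ S(f)χ²_{2K}/(2K). The sketch uses SciPy's `scipy.special.polygamma`; the helper name is illustrative.

```python
import numpy as np
from scipy.special import polygamma

def log_gamma_cumulants(shape, n_max=4):
    """Exact cumulants kappa_1..kappa_{n_max} of ln(Y) for Y ~ Gamma(shape, 1).

    The cumulant-generating function of ln(Y) is ln Gamma(shape + t) - ln Gamma(shape),
    so by (2.6) and (2.8) kappa_n = psi^(n-1)(shape).
    """
    return np.array([polygamma(n - 1, shape) for n in range(1, n_max + 1)])

K = 5                                    # e.g. the number of tapers
exact = log_gamma_cumulants(K, n_max=2)  # [psi(5), psi'(5)]
rng = np.random.default_rng(2)
log_draws = np.log(rng.gamma(shape=K, scale=1.0, size=500_000))
print(exact)
print(log_draws.mean(), log_draws.var())  # close to the exact values
```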
S(f) is the spectral density function. If S(f) is square integrable and R(τ) is square
summable, then for a sampling interval of Δt = 1 the following holds in mean square:
$$S(f) = \sum_{\tau=-\infty}^{\infty} R(\tau)\, e^{-i2\pi f\tau}, \tag{2.9}$$
and conversely
$$R(\tau) = \int_{-1/2}^{1/2} S(f)\, e^{i2\pi f\tau}\, df. \tag{2.10}$$
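The pair (2.9) and (2.10) can be checked numerically. The sketch assumes a hypothetical AR(1) process X_t = φX_{t−1} + ε_t with φ = 0.6 and Var(ε_t) = 1, whose acvs R(τ) = φ^{|τ|}/(1 − φ²) and spectrum S(f) = 1/(1 − 2φ cos 2πf + φ²) are known in closed form.

```python
import numpy as np

phi, sigma2 = 0.6, 1.0   # hypothetical AR(1): X_t = phi X_{t-1} + e_t, Var(e_t) = sigma2

def ar1_acvs(tau):
    return sigma2 * phi ** np.abs(tau) / (1.0 - phi**2)

def ar1_sdf(f):
    return sigma2 / (1.0 - 2.0 * phi * np.cos(2.0 * np.pi * f) + phi**2)

# (2.9): S(f) as a truncated sum over lags; phi**200 is negligible.
taus = np.arange(-200, 201)
f = np.array([0.0, 0.1, 0.25, 0.4])
S_from_sum = np.array(
    [np.sum(ar1_acvs(taus) * np.exp(-2j * np.pi * fi * taus)).real for fi in f]
)

# (2.10): R(tau) as an integral over the Nyquist band, using the rectangle rule
# on a uniform grid (very accurate for smooth periodic integrands).
grid = -0.5 + np.arange(4096) / 4096
R_from_integral = np.array(
    [np.mean(ar1_sdf(grid) * np.exp(2j * np.pi * grid * t)).real for t in range(4)]
)

print(np.max(np.abs(S_from_sum - ar1_sdf(f))))
print(np.max(np.abs(R_from_integral - ar1_acvs(np.arange(4)))))
```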
This section introduces the spectrum. The paleoclimate datasets under study are
real-valued, and we restrict attention here to real-valued time series. We use
multitaper spectral estimates (Thomson, 1982, 2001; Thomson et al., 2007; Park
et al., 1987; Lindberg and Park, 1987). Other methods, such as the classical
Blackman-Tukey, classical periodogram, autoregressive, maximum likelihood, Prony, and
Pisarenko methods, are reviewed in Kay and Marple (1981). In general, the multitaper
method provides better bias and variance properties than earlier estimators, at some
additional computational cost. A comparison of multitaper spectral estimation and
Welch's (windowed) overlapped segment averaging (WOSA) indicates that the multitaper
method has a performance advantage (Bronez, 1992).
2.6.1 Periodogram
$$P(f) = \frac{1}{N}\left|\sum_{t=0}^{N-1} x(t)\, e^{-i2\pi f t}\right|^2. \tag{2.11}$$
The periodogram was suggested in Stokes (1879), then named and analyzed in Schuster
(1898). Einstein introduced the concept of the power spectrum without using the term
(Einstein, 1987; Yaglom, 1987a).
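A minimal sketch of (2.11) at the Fourier frequencies f_k = k/N, using NumPy's FFT, which computes exactly the sum inside the modulus. It assumes a unit sampling interval and applies no mean removal or taper; the test signal, a cosine at f = 0.125, is illustrative.

```python
import numpy as np

def periodogram(x):
    """Periodogram (2.11) at the Fourier frequencies f_k = k/N.

    Assumes unit sampling interval; no mean removal or tapering. Frequencies
    are returned in NumPy's FFT order, mapped into the Nyquist band [-1/2, 1/2).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.fft.fftfreq(n), np.abs(np.fft.fft(x)) ** 2 / n

t = np.arange(64)
x = np.cos(2 * np.pi * 0.125 * t)
freqs, P = periodogram(x)
print(abs(freqs[np.argmax(P)]))            # 0.125 (peaks at f = +/-0.125)
print(np.isclose(P.sum(), np.sum(x**2)))   # True (Parseval's theorem)
```

Because the ordinates at the Fourier frequencies sum to Σ x(t)², the periodogram can be read as a decomposition of the sample power over frequency.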
The periodogram is the Fourier transform of the sample autocovariance sequence,
$$P(f) = \sum_{\tau=-(N-1)}^{N-1} \hat{R}_\tau\, e^{-i2\pi f\tau}, \tag{2.12}$$
where $\hat{R}_\tau$ is the sample acvs. Such estimates are defined on the Nyquist band
−1/2 ≤ f < 1/2. This is a stationary version of the Einstein-Wiener-Khintchine theorem
(Einstein, 1987; Khintchine, 1934).
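The identity (2.12) holds exactly when $\hat R_\tau$ is the 1/N-normalized lag-product sum computed without mean removal, matching (2.11). The sketch below (illustrative names, random test data) evaluates both sides on a frequency grid and compares them.

```python
import numpy as np

def acvs_no_mean(x):
    """R_hat[tau] = (1/N) sum_t x_t x_{t+tau}, tau = 0..N-1, without mean removal."""
    n = x.size
    return np.array([np.dot(x[: n - tau], x[tau:]) / n for tau in range(n)])

rng = np.random.default_rng(3)
x = rng.normal(size=50)
n = x.size
r = acvs_no_mean(x)
taus = np.arange(-(n - 1), n)
r_full = np.concatenate([r[:0:-1], r])   # R_hat[-tau] = R_hat[tau] for real x

f = np.arange(-50, 50) / 100             # grid on the Nyquist band [-1/2, 1/2)
t = np.arange(n)
P_direct = np.array([np.abs(np.sum(x * np.exp(-2j * np.pi * fi * t))) ** 2 / n for fi in f])
P_from_acvs = np.array([np.sum(r_full * np.exp(-2j * np.pi * fi * taus)).real for fi in f])
print(np.max(np.abs(P_direct - P_from_acvs)))   # round-off level
```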
Thomson (1977a) has given an example where the periodogram was in error by a factor
of more than $10^{10}$ over most of the frequency range. The periodogram has two major
problems: variance and bias. It is an inconsistent estimator, as its variance does not
decrease with sample size; this was first pointed out by Rayleigh (1903). Brillinger
(2001) has shown that if x(t) is a strictly stationary process with an acvs such that
$$\sum_{\tau=-\infty}^{\infty} |\tau R_\tau| < \infty, \tag{2.13}$$
2
This can also be considered a single-tapered spectral estimate with a constant taper.
The regularity condition implies that the acvs decays to zero quickly. Alternatively,
it implies the true spectrum, S(f ), is a smooth function; S(f ) has a continuous first
derivative (P&W93).
Often the direct spectral estimator is convolved3 with a smoothing window. An example
of convolution with a smoothing window is taking a running mean of five adjacent
points4. The direct spectral estimate is the sum of the squares of the real and
imaginary parts of the discrete Fourier transform; at frequencies away from 0 and 1/2
it is approximately distributed as S(f)/2 times a chi-squared variable with
2 degrees of freedom (Blackman and Tukey, 1959).
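A small simulation makes both statements concrete. For Gaussian white noise (an assumed test case), the periodogram ordinates at the interior Fourier frequencies have a variance close to their squared mean whatever the sample size, which is the inconsistency noted above. A running mean over five adjacent ordinates reduces that ratio to about 1/5. The sketch assumes NumPy, and the sample sizes are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)


def interior_periodogram(x):
    """Periodogram at f_k = k/N, dropping f = 0 and f = 1/2 (N even)."""
    n = len(x)
    p = np.abs(np.fft.rfft(x)) ** 2 / n
    return p[1:-1]


ratios = {}
for n in (256, 4096, 65536):
    p = interior_periodogram(rng.standard_normal(n))
    smooth = np.convolve(p, np.ones(5) / 5.0, mode="valid")   # running mean of five
    # var/mean^2 is about 1 for a chi-squared(2) ordinate, about 1/5 after smoothing
    ratios[n] = (p.var() / p.mean() ** 2, smooth.var() / smooth.mean() ** 2)
    print(n, ratios[n])
```

The unsmoothed ratio does not shrink as n grows; only smoothing (or, as below, multiple tapers) reduces it.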
Multitaper spectral estimates, first introduced in Thomson (1982), make use of
multiple direct spectral estimators whose data tapers are discrete prolate spheroidal
sequences (DPSSs) (Slepian, 1978), also called Slepian sequences, which are described
in Section 2.9.1. In this procedure, one selects an analysis bandwidth W such that
0 < W ≤ 1/4, often with NW ≈ 4 to 6 (Thomson, 2001). One then selects K ≈ 2NW
Slepian sequences to use as tapers. For each taper one computes the eigencoefficients,
$$y_k(f) = \sum_{t=0}^{N-1} x(t)\, v_t^{(k)}(N,W)\, e^{-i2\pi f t}, \qquad (2.18)$$
where v_t^{(k)}(N, W) is the kth Slepian sequence for parameters N and W, and
k = 0, 1, ..., K − 1. The crudest multitaper spectrum estimator is an average of the
squared magnitudes of the eigencoefficients (the eigenspectra),
$$\bar{S}(f) = \frac{1}{K}\sum_{k=0}^{K-1} |y_k(f)|^2. \qquad (2.19)$$
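The sketch below computes (2.18) and (2.19), assuming NumPy. The helper `slepian_tapers` is a hypothetical name: it obtains the Slepian sequences as eigenvectors of the tridiagonal matrix given in Slepian (1978), using a dense `numpy.linalg.eigh` rather than the LAPACK tridiagonal routines used by the multitaper R package, and it only guarantees unit energy, not any particular sign convention. Zero padding to 2N frequencies is an arbitrary choice for this illustration.

```python
import numpy as np


def slepian_tapers(n, nw, k):
    """First k Slepian sequences of length n (rows), via Slepian's tridiagonal matrix."""
    w = nw / n
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    tri = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(tri)            # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T            # largest eigenvalues first: v^(0), ..., v^(k-1)


def crude_multitaper(x, nw=4.0, k=7, nfft=None):
    """Average of the K eigenspectra |y_k(f)|^2, i.e. (2.19), on f = j / nfft."""
    n = len(x)
    nfft = nfft or 2 * n
    tapers = slepian_tapers(n, nw, k)
    yk = np.fft.rfft(tapers * x[None, :], n=nfft, axis=1)   # eigencoefficients (2.18)
    freqs = np.arange(yk.shape[1]) / nfft
    return freqs, np.mean(np.abs(yk) ** 2, axis=0)


rng = np.random.default_rng(1)
freqs, s_bar = crude_multitaper(rng.standard_normal(1024))
tapers = slepian_tapers(128, 4.0, 7)
print(np.allclose(tapers @ tapers.T, np.eye(7)))   # the tapers are orthonormal
print(s_bar.mean())                                # near the white-noise level 1
```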
3
The convolution product, sometimes called the “resultant” or “Faltung” of two functions f and
g, is defined as $(f * g)(x) = \int_0^{2\pi} f(x - t)\, g(t)\, dt$ (Davis, 1963).
4
The terms data window and data taper refer to a window applied prior to a Fourier transform,
whereas a smoothing window is applied to a dataset or a spectral estimate.
The multitaper method provides a harmonic F-test for periodic components in coloured
noise (Thomson, 1982). To describe the F-test, we first define U_k(N, W; 0) as the
discrete prolate spheroidal wave function evaluated at f = 0, which is also the Fourier
transform of the Slepian sequence v_t^{(k)}(N, W) taken at f = 0. Then the harmonic
F-test is defined as
$$F(f) = \frac{(K-1)\,|\hat\mu(f)|^2 \sum_{k=0}^{K-1} U_k^2(N,W;0)}{\sum_{k=0}^{K-1} \left|y_k(f) - \hat\mu(f)\, U_k(N,W;0)\right|^2}, \qquad (2.21)$$
known to sum to zero. We can use this along with multitaper spectra to locate and
assess the significance of harmonic components found in a time series.
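The sketch below evaluates (2.21) on a zero-padded frequency grid, assuming NumPy. Because µ̂(f) is not written out in this section, the code uses the usual least-squares line amplitude µ̂(f) = Σ_k U_k(N, W; 0) y_k(f) / Σ_k U_k²(N, W; 0), and it takes U_k(N, W; 0) to be the sum of the kth taper, i.e. its Fourier transform at f = 0. The helper `slepian_tapers` is hypothetical (Slepian's tridiagonal matrix), and the test signal, a unit cosine at f_0 = 0.1234 in unit-variance white noise, is illustrative.

```python
import numpy as np


def slepian_tapers(n, nw, k):
    """First k Slepian sequences of length n (rows), via Slepian's tridiagonal matrix."""
    w = nw / n
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    _, vecs = np.linalg.eigh(np.diag(diag) + np.diag(off, 1) + np.diag(off, -1))
    return vecs[:, ::-1][:, :k].T


def harmonic_f_test(x, nw=4.0, k=7, nfft=4096):
    """Harmonic F statistic (2.21) on f = j / nfft, 0 <= f <= 1/2."""
    tapers = slepian_tapers(len(x), nw, k)
    yk = np.fft.rfft(tapers * x[None, :], n=nfft, axis=1)        # y_k(f)
    uk0 = tapers.sum(axis=1)                                      # U_k(N, W; 0)
    mu = uk0 @ yk / np.sum(uk0 ** 2)                              # line amplitude mu(f)
    resid = np.sum(np.abs(yk - np.outer(uk0, mu)) ** 2, axis=0)
    f_stat = (k - 1) * np.abs(mu) ** 2 * np.sum(uk0 ** 2) / resid
    return np.arange(yk.shape[1]) / nfft, f_stat


rng = np.random.default_rng(3)
n, nw, f0 = 512, 4.0, 0.1234
t = np.arange(n)
x = np.cos(2 * np.pi * f0 * t + 0.3) + rng.standard_normal(n)
freqs, f_stat = harmonic_f_test(x, nw=nw)
band = (freqs > nw / n) & (freqs < 0.5 - nw / n)     # stay W away from 0 and 1/2
f_peak = freqs[band][np.argmax(f_stat[band])]
print(f_peak)   # close to f0
```

Under the null hypothesis of no line component, F(f) is approximately F-distributed with 2 and 2K − 2 degrees of freedom (Thomson, 1982), which gives the significance level of a peak such as this one.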
Papers explaining the multitaper spectral estimate generally begin with the Cramér
Spectral Representation (Cramér, 1940; Thomson, 1990b, 1982; Park et al., 1987;
Thomson, 2001). In this section, we motivate the spectral representation theorem
following the procedure in P&W93.
Initially we consider the spectral representation theorem for a real-valued
discrete-time harmonic process
$$X_t = \sum_{l=1}^{L} D_l \cos(2\pi f_l t + \phi_l), \qquad t = 0, \pm 1, \pm 2, \ldots, \qquad (2.23)$$
where L ≥ 1, and D_l and f_l are real-valued constants. The f_l's are distinct, f_l > 0, and
the terms φ_l are independent random variables having a rectangular distribution on
[−π, π]. This is a zero-mean harmonic process. Assume the frequencies are ordered
such that 0 < f_l < f_{l+1} ≤ 1/2. Here the Nyquist frequency is 1/2 and ∆t = 1. Using
the definition
the definition
$$\cos(\theta) = \frac{e^{i\theta} + e^{-i\theta}}{2}, \qquad (2.24)$$
The variance of the stationary process can be decomposed into a sum of components
E{|C_l|²}. We can define a variance spectrum by
$$S^{(V)}(f) = \begin{cases} D_l^2/4, & \text{if } f = f_l,\ l = 0, \pm 1, \ldots, \pm L,\\ 0, & \text{otherwise}. \end{cases} \qquad (2.29)$$
where f_{L+1} = 1/2, and Z(0) = 0. Z(f) is a “jump” process on the interval [0, 1/2]
with a random complex-valued jump at each f_l. Then
$$Z(f) = \begin{cases} 0, & 0 \le f \le f_1,\\ C_1, & f_1 < f \le f_2,\\ C_1 + C_2, & f_2 < f \le f_3, \end{cases} \qquad (2.31)$$
and so forth.
5
Cov(Z_1, Z_2) = E{[Z_1 − E[Z_1]]^∗ [Z_2 − E[Z_2]]}. A superscripted ∗ denotes complex conjugate.
We now define an orthogonal increment process as
$$dZ(f) = \begin{cases} Z(f + df) - Z(f), & 0 \le f < 1/2,\\ 0, & f = 1/2,\\ dZ^*(-f), & -1/2 < f < 0. \end{cases} \qquad (2.32)$$
In this case, df is a small increment such that 0 < f + df < 1/2 when 0 < f < 1/2.
For l ≥ 0 we have
$$dZ(f_l) = Z(f_l + df) - Z(f_l) = C_l. \qquad (2.33)$$
For any f that differs from every f_l, dZ(f) = 0 for sufficiently small df. As E{C_l} = 0
and dZ(f) is either 0 or C_l, E{dZ(f)} = 0. Next, provided that f, f′, df and df′ are such
that the intervals [f, f + df] and [f′, f′ + df′] do not intersect, the random variables
dZ(f) and dZ(f′) are uncorrelated,
We can show that the random variables are uncorrelated if we consider the following
cases:
Based on (2.34), the process {Z(f)} has orthogonal increments and is called an
orthogonal process. Note that Var{dZ(f_l)} = D_l²/4.
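A Monte Carlo sketch of this variance decomposition follows, assuming NumPy and illustrative amplitudes and frequencies. Across independent draws of the phases φ_l, the variance of X_t at a fixed t should equal Σ_l D_l²/2, which is the total mass of S^(V)(f) in (2.29): each jump contributes D_l²/4 at +f_l and again at −f_l.

```python
import numpy as np

rng = np.random.default_rng(7)
amps = np.array([1.0, 0.5, 2.0])          # D_l (illustrative)
freqs = np.array([0.05, 0.17, 0.31])      # f_l: distinct, 0 < f_l <= 1/2
t = 5                                      # any fixed time; the process is stationary
m = 200_000                                # independent realizations of the phases

phases = rng.uniform(-np.pi, np.pi, size=(m, amps.size))   # phi_l ~ U[-pi, pi]
x_t = np.sum(amps * np.cos(2 * np.pi * freqs * t + phases), axis=1)

# S^(V) puts mass D_l^2 / 4 at each of -f_l and +f_l, so the total is sum D_l^2 / 2
total_from_spectrum = 2 * np.sum(amps ** 2 / 4)
print(x_t.var(), total_from_spectrum)      # both close to 2.625
```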
Let g(f) be a continuous function over the interval [−1/2, 1/2], and let H(f) be
a step function defined over the same interval with jumps at
Next, let g(f) = e^{i2πft} and H(f) = Z(f), and we can rewrite (2.25) as
$$X_t = \int_{-1/2}^{1/2} e^{i2\pi f t}\, dZ(f) \qquad (2.36)$$
for all t. Note the difference in the limits of integration between (2.37) and (2.36).
The process {Z(f )} has the following properties:
2. E{|dZ(f)|²} = dS^{(I)}(f) for all f, where the integrated spectrum S^{(I)}(f) is
bounded and nondecreasing; and
This means that, in the case of a stationary process, we are only interested in one
column of the variance-covariance matrix, as all other columns are shifted versions of
it; that is, the matrix is Toeplitz. In a harmonizable process, we must consider the
entire matrix.
Multitaper spectral estimation differs from WOSA estimates (Welch, 1967a) in that
instead of using a single Hamming window7 on overlapped segments, one will use
multiple orthogonal Slepian sequence tapers on the entire length of the time series, N .
One performs several multitaper spectral estimates on sections, or blocks, of the time
series. When these block estimators are plotted sequentially in colour, a spectrogram
is formed. In constructing a spectrogram, one must consider the appropriate length,
bandwidth, and allowable overlap in selecting block size.
Three important variables are used in this approach (Thomson, 2001).
3. The offset between blocks, ∆d. Often N_b/2 is used as the offset between blocks.
WOSA estimates use a 50% offset, which is equivalent to ∆d = N_b/2.
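The block layout implied by the block length N_b and the offset ∆d can be sketched in a few lines of standard-library Python. The function name `block_starts` is hypothetical, and discarding any incomplete final block is an assumption of this sketch rather than a rule from Thomson (2001).

```python
def block_starts(n, nb, offset):
    """Start indices of length-nb blocks whose starts are `offset` samples apart.

    Blocks that would run past the end of the series are discarded (an assumption).
    """
    if not (0 < nb <= n and offset > 0):
        raise ValueError("require 0 < nb <= n and offset > 0")
    return list(range(0, n - nb + 1, offset))


# 16 non-overlapping blocks of 128 from N = 2048, and the 50%-offset alternative
print(len(block_starts(2048, 128, 128)))   # 16
print(len(block_starts(2048, 128, 64)))    # 31
```

With ∆d = N_b/2, every sample away from the two ends of the series falls in exactly two blocks.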
Welch's estimator was timely: it was introduced just after the introduction of the
fast Fourier transform (FFT), although averaging periodograms over different times had
already been mentioned in Schuster (1898).
Details of the multitaper spectral estimator for full-length time series follow. These
details apply to a single section, or block, of the series if N is replaced with Nb .
Multitaper estimates of the spectrum are based on approximately solving the integral
equation that expresses the projection of dX(f ) onto the Fourier transform of the
data, y(f ) (Strang, 2005, pp. 204–206). If one takes the discrete Fourier transform
of the observed data,
$$y(f) = \sum_{t=0}^{N-1} x(t)\, e^{-i2\pi f t}, \qquad (2.41)$$
and uses the spectral representation (2.39) for x(t), one gets the fundamental equation
of spectrum estimation,
$$y(f) = \int_{-1/2}^{1/2} K_N(f - \xi)\, dX(\xi), \qquad (2.42)$$
where the kernel, the Dirichlet kernel multiplied by a phase factor, is defined as
$$K_N(f) = \frac{\sin(N\pi f)}{\sin(\pi f)} \exp\left(-i2\pi f\,\frac{N-1}{2}\right). \qquad (2.43)$$
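For a single complex exponential x(t) = e^{i2πf_0 t}, the finite Fourier transform (2.41) is exactly the kernel (2.43) shifted to f_0, the simplest instance of (2.42). The sketch below, assuming NumPy and illustrative values N = 32 and f_0 = 0.2, checks this on a grid that avoids the points where sin(π(f − f_0)) = 0.

```python
import numpy as np


def dirichlet_kernel(f, n):
    """K_N(f) of (2.43); valid where sin(pi f) != 0."""
    return np.sin(n * np.pi * f) / np.sin(np.pi * f) * np.exp(-2j * np.pi * f * (n - 1) / 2)


n, f0 = 32, 0.2
t = np.arange(n)
freqs = np.linspace(-0.5, 0.5, 1001) + 1e-4      # offset keeps f - f0 off the integers
y = np.exp(-2j * np.pi * np.outer(freqs, t)) @ np.exp(2j * np.pi * f0 * t)   # (2.41)
print(np.max(np.abs(y - dirichlet_kernel(freqs - f0, n))))   # rounding-level difference
```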
1. As one can take the inverse Fourier transform of y(f ) and recover x(t) for
0 ≤ t ≤ N − 1, y(f ) is a trivially sufficient statistic and completely equivalent
to the original data. In practice, when an FFT is used, the inverse FFT will
return the original data. The inverse can be performed given either: (1) the
complete FFT result including any redundant complex conjugate in the case of
real data, or (2) the FFT result without the redundant complex conjugate and
the original length (Frigo and Johnson, 2005). The latter can only be used in
the case of real data input to the FFT.
2. The finite Fourier transform y(f ) is not equivalent to dX(f ), because dX(f )
is assumed to generate the entire data sequence for all t, not just the observed
samples.
3. (1/N)|y(f)|², the periodogram, is not the spectrum; it is biased and inconsistent.
5. Multitaper spectral estimators refer to the class of estimators that use any set of
orthogonal data tapers, and this work focuses on using Slepian sequences as the
orthogonal data tapers.
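Point 1 can be illustrated with NumPy's FFT routines (an assumption: the text cites FFTW via Frigo and Johnson, 2005, while this sketch uses `numpy.fft`). The complete transform inverts without extra information, whereas the half-spectrum of a real series, which omits the redundant complex conjugates, also needs the original length; this matters when N is odd.

```python
import numpy as np

rng = np.random.default_rng(9)
x = rng.standard_normal(7)                       # odd length, real-valued

full = np.fft.fft(x)                             # all N coefficients
half = np.fft.rfft(x)                            # N // 2 + 1 coefficients (no conjugates)

x_full = np.fft.ifft(full).real                  # recovers x from the complete result
x_half = np.fft.irfft(half, n=len(x))            # recovers x only when N is supplied
x_guess = np.fft.irfft(half)                     # default length 2 * (len(half) - 1) = 6
print(np.allclose(x_full, x), np.allclose(x_half, x), len(x_guess))   # True True 6
```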
The multitaper estimates used in this thesis use the discrete prolate spheroidal wave
functions (dpswfs). These functions provide reasonable solutions to (2.42). The
Slepian sequences υ_n^{(k)}(N, W) are defined as real, unit-energy sequences on [0, N − 1]
having the greatest fraction of their energy in the band (−W, W), and are the solutions
to the symmetric Toeplitz matrix eigenvalue equation,
$$\lambda_k\, \upsilon_n^{(k)} = \sum_{m=0}^{N-1} \frac{\sin(2\pi W(n-m))}{\pi(n-m)}\, \upsilon_m^{(k)}, \qquad 0 \le n \le N-1. \qquad (2.44)$$
From this point we will write v_t^{(k)} to indicate v_t^{(k)}(N, W) for the Slepian sequences;
the arguments N and W are implied. To compute the Slepian sequences, we
use the tridiagonal form given in Slepian (1978). In practice, the LAPACK functions
dstebz and dstein (Anderson et al., 1999) are used as described in P&W93. These
LAPACK functions are called from the multitaper R package; see page 177 for details
on obtaining Slepian sequences using the R package. We use the normalization of
Thomson (1982) and Park et al. (1987), not those used in P&W93 or in the Signal
Processing Toolbox in Matlab. Thomson (1990a) defines
$$V_k(N, W; f) = \sum_{n=0}^{N-1} \upsilon_n^{(k)}\, e^{-i2\pi n f}, \qquad (2.45)$$
on L²(−W, W) × [0, N − 1] has better sidelobe leakage properties than other sets of
orthogonal windows (Thomson, 2001, p. 327). The sequences are orthonormal, and
the functions are orthonormal on [−1/2, 1/2) and are also orthogonal on (−W, W):
$$\int_{-W}^{W} V_j(f)\, V_k^*(f)\, df = \lambda_j\, \delta_{jk}. \qquad (2.48)$$
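The eigenvalues λ_k in (2.44) and (2.48) are the fractions of each sequence's energy inside (−W, W). The sketch below, assuming NumPy and illustrative values N = 256 and NW = 4, forms the symmetric Toeplitz matrix of (2.44) directly, with the diagonal set to its limiting value 2W, and computes its eigenvalues. Since the trace of the matrix is N · 2W = 2NW, the eigenvalues sum to 2NW, and roughly the first 2NW of them are close to one.

```python
import numpy as np


def concentration_matrix(n, nw):
    """Symmetric Toeplitz matrix of (2.44): sin(2 pi W (n - m)) / (pi (n - m))."""
    w = nw / n
    lag = np.subtract.outer(np.arange(n), np.arange(n)).astype(float)
    safe = np.where(lag == 0, 1.0, lag)                  # avoid 0/0 on the diagonal
    a = np.sin(2 * np.pi * w * lag) / (np.pi * safe)
    np.fill_diagonal(a, 2 * w)                           # limit as n - m -> 0
    return a


n, nw = 256, 4.0
lam = np.linalg.eigvalsh(concentration_matrix(n, nw))[::-1]   # lambda_0 >= lambda_1 >= ...
print(np.round(lam[:10], 4))
print(lam.sum())                                             # equals the trace, 2NW = 8
```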
In general, the multitaper estimate (2.19) is not used. Instead, adaptive weights
replace the simple average of the eigenspectra formed from the eigencoefficients (2.18).
The weights are calculated iteratively from
$$d_k(f) \approx \frac{\sqrt{\lambda_k}\, S(f)}{\lambda_k S(f) + B_k(f)}, \qquad (2.49)$$
$$B_k(f) \le \sigma^2 (1 - \lambda_k). \qquad (2.50)$$
The initial weights, calculated from the initial estimate, are applied to the
eigencoefficients, creating the weighted eigencoefficients, which are used to obtain a
new estimate; this becomes the initial estimate in the next iteration. In practice,
only a handful of iterations are required (Thomson, 1982). The canonical multitaper
spectrum estimate is
$$\hat{S}_x(f) = \frac{\frac{1}{K}\sum_{k=0}^{K-1} |d_k(f)\, y_k(f)|^2}{\frac{1}{K}\sum_{k=0}^{K-1} |d_k(f)|^2}. \qquad (2.51)$$
The equations used to calculate the weights, (2.49), and the canonical multitaper,
(2.51), follow the original definitions given in Thomson (1982); Thomson et al. (2007);
Park et al. (1987).
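A compact version of the iteration in (2.49) to (2.51) follows as a sketch under stated assumptions: NumPy is available; the hypothetical helper `tapers_and_concentrations` builds the Slepian sequences from Slepian's tridiagonal matrix and obtains λ_k as Rayleigh quotients of the matrix in (2.44); B_k(f) is replaced by its bound σ²(1 − λ_k), with σ² the sample variance; and the starting estimate is the average of the first two eigenspectra, a common choice that the text above does not specify.

```python
import numpy as np


def tapers_and_concentrations(n, nw, k):
    """Slepian sequences (rows) and their in-band energy fractions lambda_k."""
    w = nw / n
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    _, vecs = np.linalg.eigh(np.diag(diag) + np.diag(off, 1) + np.diag(off, -1))
    tapers = vecs[:, ::-1][:, :k].T
    lag = np.subtract.outer(t, t).astype(float)
    sinc = np.sin(2 * np.pi * w * lag) / (np.pi * np.where(lag == 0, 1.0, lag))
    np.fill_diagonal(sinc, 2 * w)
    lam = np.einsum("kn,nm,km->k", tapers, sinc, tapers)     # Rayleigh quotients
    return tapers, lam


def adaptive_multitaper(x, nw=4.0, k=7, nfft=None, max_iter=50, tol=1e-8):
    n = len(x)
    nfft = nfft or 2 * n
    tapers, lam = tapers_and_concentrations(n, nw, k)
    eig = np.abs(np.fft.rfft(tapers * x[None, :], n=nfft, axis=1)) ** 2   # |y_k(f)|^2
    b = np.var(x) * (1.0 - lam)[:, None]           # B_k(f) set to its bound (2.50)
    s = eig[:2].mean(axis=0)                       # starting estimate (an assumption)
    for _ in range(max_iter):
        d = np.sqrt(lam)[:, None] * s / (lam[:, None] * s + b)        # weights (2.49)
        s_new = np.sum(d ** 2 * eig, axis=0) / np.sum(d ** 2, axis=0)  # (2.51); 1/K cancels
        converged = np.max(np.abs(s_new - s) / s) < tol
        s = s_new
        if converged:
            break
    return np.arange(eig.shape[1]) / nfft, s, d, lam


rng = np.random.default_rng(2)
freqs, s_hat, d, lam = adaptive_multitaper(rng.standard_normal(512))
print(s_hat.mean())   # near the white-noise level 1
```

Since B_k(f) ≥ 0, each weight satisfies 0 < d_k(f) ≤ 1/√λ_k, so well-concentrated tapers keep weights near one while poorly concentrated ones are downweighted where the spectrum is low.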
The statistical properties of (2.51) are known, and some important points from Thom-
son et al. (2007) are listed below.
3. Under a locally white assumption, where we assume that the data is white
within the bandwidth W, multitaper estimates can be jackknifed by deleting
one window at a time (Thomson and Chave, 1991b). See Section 2.12.
2.10 Aliasing
time series X(t) has the Fourier transform G_c(f). If the series is sampled at discrete
time intervals, t = 0, 1, ..., N − 1, with equal spacing ∆t, then denote the Fourier
transform of the sampled series as G_d(f). It can be shown (Blackman and Tukey,
1959) that
$$G_d(f) = \sum_{k=-\infty}^{\infty} G_c\!\left(f + \frac{k}{\Delta t}\right). \qquad (2.53)$$
This indicates that G_d(f) depends on the countably infinite set of frequencies
f + k/∆t, for k = 0, ±1, ±2, .... The Nyquist or folding frequency is defined as
1/(2∆t) (Nyquist, 1928).
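Aliasing is easy to see numerically. The sketch below, assuming NumPy and an illustrative sampling interval ∆t = 0.5 (so the Nyquist frequency is 1), shows that cosines at f and at f + 1/∆t produce identical samples, and folds an arbitrary frequency into [0, 1/(2∆t)] with a hypothetical helper `folded_frequency`.

```python
import numpy as np

dt = 0.5                                  # sampling interval (illustrative)
nyquist = 1.0 / (2.0 * dt)                # folding frequency 1/(2 dt) = 1.0
t = np.arange(64) * dt                    # sample times t = n * dt


def folded_frequency(f, nyq):
    """Apparent frequency in [0, nyq] of a real sinusoid at frequency f >= 0."""
    return abs((f + nyq) % (2.0 * nyq) - nyq)


f = 0.3
same = np.allclose(np.cos(2 * np.pi * f * t), np.cos(2 * np.pi * (f + 1.0 / dt) * t))
print(same)                               # True: f and f + 1/dt are indistinguishable
print(folded_frequency(2.3, nyquist))     # about 0.3
print(folded_frequency(1.4, nyquist))     # about 0.6, i.e. 2 * nyquist - 1.4
```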
Zero padding is generally used in conjunction with FFT algorithms, and the default
option using the multitaper R package (introduced in Appendix A) zero pads the data
to twice the next power of two given the length of the data. Specifically, if n is the
length of the data, then zero padding is performed to a total length, nFFT , of
This section follows the development in Thomson and Chave (1991b). In jackknifing,
separate estimates are formed by deleting one sample at a time. This differs from
bootstrapping, in which one resamples from the original sample with replacement (Wu,
1986; Good, 2001). Let {x_i}, for i = 1, ..., N, be a sample of N independent
observations drawn from some distribution characterized by a parameter θ, which we
estimate by θ̂. We denote the estimate of θ using all N observations by θ̂_all. We
now create N additional estimates, θ̂_(i), each one leaving out the ith of the original
samples, so that each leave-one-out estimate is made from N − 1 samples.
This estimate was introduced as a lower-bias replacement for θ̂. Let the average of
the delete-one estimates be
$$\theta_{(\cdot)} = \frac{1}{N}\sum_{i=1}^{N} \hat\theta_{(i)}. \qquad (2.58)$$
Equation (2.59) was originally proposed as the variance estimate for θ̃, but simulations
with small samples indicate that it is more accurate as a variance estimate for θ̂_all
(Hinkley, 1978). This estimate of the variance, (2.59), has been shown to be
conservative (Efron and Stein, 1981). In practice, it is informative to investigate
further the datasets or frequencies where jackknife variances do not coincide with
theoretical variances. The idea of “jackknifing over tapers” was introduced in
Blackman and Tukey (1959) and then developed in Thomson (1984).
In the case of the multitaper estimate, the jackknife is computed by leaving out
one of the eigenspectra in computing each θ̂(i) . Each θ̂(i) is calculated using (2.51)
such that weights are calculated differently for each θ̂_(i). In this case θ_(·) ≠ θ̂_all. In
practice, a two-standard-deviation jackknife confidence interval can be plotted on a
multitaper spectral estimate, multitaper magnitude-squared coherence estimate and
phase coherence estimate; these can be used in determining significance and can
be used to draw attention to details in estimates in which theoretical and jackknife
confidence intervals do not match.
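The delete-one-taper idea can be sketched as follows, assuming NumPy. For brevity, the sketch jackknifes the crude average (2.19) rather than the adaptively weighted estimate (2.51) used in this thesis, and it works with the logarithm of the spectrum (a common choice for spectra, assumed here). It uses the standard delete-one jackknife variance ((K − 1)/K) Σ_i (θ̂_(i) − θ_(·))². The helper `slepian_tapers` is hypothetical and uses Slepian's tridiagonal matrix.

```python
import numpy as np


def slepian_tapers(n, nw, k):
    """First k Slepian sequences of length n (rows), via Slepian's tridiagonal matrix."""
    w = nw / n
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    _, vecs = np.linalg.eigh(np.diag(diag) + np.diag(off, 1) + np.diag(off, -1))
    return vecs[:, ::-1][:, :k].T


def jackknife_log_spectrum(x, nw=4.0, k=7, nfft=None):
    n = len(x)
    nfft = nfft or 2 * n
    eig = np.abs(np.fft.rfft(slepian_tapers(n, nw, k) * x[None, :], n=nfft, axis=1)) ** 2
    theta_all = np.log(eig.mean(axis=0))                      # all K eigenspectra
    theta_i = np.log((eig.sum(axis=0) - eig) / (k - 1))       # row i leaves out taper i
    theta_dot = theta_i.mean(axis=0)
    var_jk = (k - 1) / k * np.sum((theta_i - theta_dot) ** 2, axis=0)
    return np.arange(eig.shape[1]) / nfft, theta_all, var_jk


rng = np.random.default_rng(4)
freqs, log_s, var_jk = jackknife_log_spectrum(rng.standard_normal(1024))
lower = np.exp(log_s - 2 * np.sqrt(var_jk))    # two-standard-deviation interval
upper = np.exp(log_s + 2 * np.sqrt(var_jk))
print(var_jk.mean())   # same order as trigamma(K), about 0.15 for K = 7
```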
2.13 Coherence
We define the coherence as a measure of the degree to which both variables are jointly
influenced by cycles near frequency f (Jenkins and Watts, 1968; Koopmans, 1995).
We begin with the kth autocovariance matrix for a two-dimensional process. Let
$$\mathbf{y} = \begin{pmatrix} X_t \\ Y_t \end{pmatrix}, \qquad (2.60)$$
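The sample cross-spectrum can be written in terms of the sample cospectrum, ĉ_yx(f), and
the sample quadrature spectrum, q̂_yx(f) (Jenkins and Watts, 1968),
where
$$\hat{R}(f) = \sqrt{[\hat{c}_{yx}(f)]^2 + [\hat{q}_{yx}(f)]^2}, \quad\text{and} \qquad (2.64)$$
$$\hat\theta(f) = \tan^{-1}\!\left(\frac{\hat{q}_{yx}(f)}{\hat{c}_{yx}(f)}\right). \qquad (2.65)$$
$$|\hat\gamma(f)|^2 = \frac{|\hat{S}_{xy}(f)|^2}{\hat{S}_{xx}(f)\,\hat{S}_{yy}(f)}, \qquad (2.66)$$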
In calculating the actual angle, one must consider the quadrant unless a function such
as atan2 in R is used (Bloomfield, 2000).
Using the multitaper method, we calculate the cross-spectrum from the eigencoefficients,
or from the eigencoefficients weighted by the square root of the weights
determined in (2.49),
$$S_{xy}(f) = \frac{1}{K}\sum_{k=0}^{K-1} x_k(f)\, y_k^*(f), \qquad (2.68)$$
and then calculate the coherence as above (Kuo et al., 1990; Thomson, 1982).
The magnitude-squared coherence, when calculated using an untapered spectral
estimate, such as the periodogram, is known to be biased (Carter et al., 1973a). In
this thesis we use multitaper estimates to reduce this bias; specifically, we apply the
bias correction in Thomson and Chave (1991b, p. 87).
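A sketch of (2.68) and (2.66) with equal (unit) weights on the eigencoefficients follows, assuming NumPy. It does not apply the Thomson and Chave (1991b) bias correction, so for independent series the estimated magnitude-squared coherence stays near 1/K rather than zero. The phase uses `numpy.angle`, which, like `atan2`, resolves the quadrant. The helper `slepian_tapers` is hypothetical (Slepian's tridiagonal matrix), and the test series are illustrative.

```python
import numpy as np


def slepian_tapers(n, nw, k):
    """First k Slepian sequences of length n (rows), via Slepian's tridiagonal matrix."""
    w = nw / n
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    _, vecs = np.linalg.eigh(np.diag(diag) + np.diag(off, 1) + np.diag(off, -1))
    return vecs[:, ::-1][:, :k].T


def multitaper_coherence(x, y, nw=4.0, k=7, nfft=None):
    n = len(x)
    nfft = nfft or 2 * n
    tapers = slepian_tapers(n, nw, k)
    xk = np.fft.rfft(tapers * x[None, :], n=nfft, axis=1)
    yk = np.fft.rfft(tapers * y[None, :], n=nfft, axis=1)
    sxy = np.mean(xk * np.conj(yk), axis=0)            # cross-spectrum (2.68)
    sxx = np.mean(np.abs(xk) ** 2, axis=0)
    syy = np.mean(np.abs(yk) ** 2, axis=0)
    msc = np.abs(sxy) ** 2 / (sxx * syy)               # magnitude-squared coherence (2.66)
    return np.arange(len(msc)) / nfft, msc, np.angle(sxy)


rng = np.random.default_rng(8)
x = rng.standard_normal(512)
y_related = x + 0.3 * rng.standard_normal(512)         # shares a common component
y_indep = rng.standard_normal(512)
_, msc_rel, phase_rel = multitaper_coherence(x, y_related)
_, msc_ind, _ = multitaper_coherence(x, y_indep)
print(msc_rel.mean(), msc_ind.mean())                  # high versus roughly 1/K
```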
2.14 Spectrograms
Spectrograms were introduced in analog form in Koenig et al. (1946), and we plot
the multitaper spectrogram or the high-resolution spectrogram introduced in Thom-
son (1998). This estimator is considered similar to the evolutionary periodogram
estimator (Kayhan et al., 1994), which was introduced independently a few years
earlier (Moghtaderi, 2009, pp. 24–26). A spectrogram is a graphical representation of
multiple power spectral estimators in succession, each estimator based on a section,
or “block,” of the entire data series. The spectrogram can consist of blocks with both
CHAPTER 3. FREQUENCY-DOMAIN CHANGE-POINT DETECTION 33
3.1 Introduction
Section 3.5 provides properties of (3.1), which is related to the Fisher Z-distribution (Fisher,
1924).
This chapter is organized as follows: Section 3.2 introduces the change-point prob-
lem; Section 3.3 provides a literature review of change-point techniques; Section 3.4
gives additional preliminary information; Section 3.5 introduces the frequency-domain
change-point estimator and discusses some of its properties; Section 3.6 studies the
estimator using simulations on a stationary model and a model where there is change
in the frequency component(s); Section 3.7 puts forward a methodology for using the
estimator which incorporates existing tools; and Section 3.8 presents a summary and
concluding remarks.
The terms change-point and disorder (the latter in the Eastern European literature)
have been used to describe a change in the statistical distribution of samples. We are
interested specifically in changes over time that are observed in the spectral
(time × frequency) domain.
Chen and Gupta (2012, pp. 1–5) give an introduction to the change-point problem.
Their discussion, like the methodology presented here, focuses on offline change-point
analysis, or retrospective change-point analysis of finite samples, as opposed to on-
line1 change-point detection, which occurs in real-time with sequential data. Online
change-point problems are generally presented in statistical quality control, public
health surveillance, and signal processing (see Mei (2006) for an overview).
A broad selection of methods is used in change-point detection, including maximum
likelihood ratio tests, Bayesian tests, nonparametric tests, stochastic process
analysis and information theory approaches. Generally, parametric change-point
problems focus on detecting change in the mean, scale and shape parameters of a
probability distribution, whereas nonparametric methods focus on rank or order
statistics; for example, Pettitt (1979) studies Mann-Whitney-Wilcoxon-like (Mann and
Whitney, 1947) tests in change-point detection. In economics, one looks for evidence
of “structural change” related to externally influenced changes (Andrews, 1993).
Change-point problems originated in the statistical quality control context, but have
spread into areas such as stationarity of stochastic processes, estimation of the cur-
rent position of a time series, and testing and estimating changes in the patterns
1
Online change-point analysis occurs in statistical control systems where an observed change
requires an immediate correction, whereas offline change-point analysis occurs after the data has
been collected, and the analysis is to determine whether (and where) a change occurred.
Thomson (1977b, p. 1994) suggests using the Bartlett M-test for heteroscedasticity
(Bartlett, 1937) to test for nonstationarity as a function of frequency. This test
evaluates the logarithm of the ratio of the arithmetic mean of spectral estimates
across blocks to their geometric mean across blocks. If for each block j ∈ {1, ..., n_b},
where n_b is the number of blocks, we have a spectral estimate, Ŝ_j(f), then at
frequency f the Bartlett M-test statistic is constructed as:
$$M(f) = n_b\,\nu \ln\left(\frac{1}{n_b}\sum_{j=1}^{n_b} \hat{S}_j(f)\right) - \nu \sum_{j=1}^{n_b} \ln \hat{S}_j(f). \qquad (3.2)$$
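The statistic (3.2) can be sketched directly, assuming NumPy, with ν taken as the degrees of freedom of each block estimate (2K for a K-taper estimate). To isolate its behaviour, the block spectra are simulated as S(f) χ²_ν/ν rather than computed from data, which is an assumption of the sketch, and each column is treated as one frequency. The scaling by C = 1 + (n_b + 1)/(3 n_b ν), under which M/C is approximately chi-squared with n_b − 1 degrees of freedom for a stationary series, is the textbook form of Bartlett's correction rather than something stated in this section.

```python
import numpy as np


def bartlett_m(spectra, nu):
    """Bartlett M statistic (3.2); spectra has shape (n_blocks, n_freqs)."""
    nb = spectra.shape[0]
    return nb * nu * np.log(spectra.mean(axis=0)) - nu * np.log(spectra).sum(axis=0)


rng = np.random.default_rng(3)
nb, k = 16, 7
nu = 2 * k                                        # degrees of freedom per block estimate
null = rng.chisquare(nu, size=(nb, 20_000)) / nu  # stationary: S(f) = 1 in every block
m_null = bartlett_m(null, nu)
c = 1 + (nb + 1) / (3 * nb * nu)                  # Bartlett's small-sample correction

changed = null.copy()
changed[nb // 2:] *= 2.0                          # spectrum doubles in the second half
m_changed = bartlett_m(changed, nu)
print(m_null.mean() / c, m_changed.mean())        # about nb - 1 = 15, and clearly larger
```

By the inequality between arithmetic and geometric means, M(f) ≥ 0, with equality only when all block estimates agree.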
where B is the beta function, or Euler integral, and ν1 and ν2 are the degrees-of-
freedom. We restrict ourselves to the case where ν1 = ν2 = ν, and the Fisher
Z-distribution, which is then symmetric, becomes
$$f(z, \nu, \nu)\, dz = \frac{\operatorname{sech}^{\nu}(z)}{2^{\nu-1} B(\nu/2, \nu/2)}\, dz. \qquad (3.4)$$
A notable difference between the Fisher Z-distribution and the F-distribution is that
when ν2 > 2, the F-distribution has the expected value ν2/(ν2 − 2), whereas in the
symmetric case considered here E{Z} = 0.
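As a quick check of (3.4), the sketch below (standard-library Python only) integrates the symmetric density numerically with the midpoint rule and confirms that it has unit mass and, by symmetry, zero mean. The values of ν are illustrative, and the beta function is formed on the log scale with `math.lgamma` to avoid overflow.

```python
import math


def fisher_z_symmetric_pdf(z, nu):
    """Density (3.4): sech^nu(z) / (2^(nu - 1) B(nu/2, nu/2))."""
    log_beta = 2.0 * math.lgamma(nu / 2.0) - math.lgamma(nu)
    return math.exp(-nu * math.log(math.cosh(z)) - (nu - 1) * math.log(2.0) - log_beta)


def midpoint_integral(func, lo=-20.0, hi=20.0, steps=40_000):
    h = (hi - lo) / steps
    return h * sum(func(lo + (i + 0.5) * h) for i in range(steps))


checks = {}
for nu in (2, 6, 14):
    mass = midpoint_integral(lambda z: fisher_z_symmetric_pdf(z, nu))
    mean = midpoint_integral(lambda z: z * fisher_z_symmetric_pdf(z, nu))
    checks[nu] = (mass, mean)
    print(nu, round(mass, 6), round(mean, 6))
```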
If the spectra are independent, the proposed level-of-change estimator is the
square of an estimator with a Fisher-Z distribution, which is approximately
Gaussian (Aroian, 1941), and thus the square has approximately a scaled chi-squared
distribution with one degree of freedom. The use of the Fisher-Z distribution in this
work is currently limited to the expected value in Section 3.5.2; however, the
distribution may be of value in future work.
In this thesis, spectral estimates in (3.1) are computed using the multitaper procedure
described in Section 2.6.3. It is known that such estimates have approximately a scaled
chi-squared distribution with 2K degrees-of-freedom, where K is the number of tapers
used (P&W93, p. 222). We propose the following level-of-change estimator:
We are interested in the mean and variance of the level-of-change estimator (3.5)
at one frequency. In this section, we obtain the mean and variance of the
level-of-change estimator and express these in terms of polygamma functions. We use
simulations to check these values in Section 3.6.
In the following, the subscripts j and j + 1 are replaced with 1 and 2 for
convenience. We begin by assuming that Ŝ_1 and Ŝ_2 are independent spectrum
estimates at any frequency f. We omit the frequency f in this section for convenience.
Let
$$\hat{Z}_0 = \ln\frac{\hat{S}_1}{\hat{S}_2}. \qquad (3.6)$$
For examining the level-of-change in two spectral estimates, consider the two estimates
l̂_1 = ln Ŝ_1 and l̂_2 = ln Ŝ_2. Let l̄ = E{l̂_1} = E{l̂_2}; then Ẑ_0 = l̂_1 − l̂_2, and E{Ẑ_0} = 0 (see footnote 2).
$$\hat{Q}_0 = \hat{Z}_0^2 = (\hat{l}_1 - \hat{l}_2)^2$$
omitting the subscripts 1 and 2 in l̂. Equation (3.9) will be written in terms of
a trigamma function by (1) writing the second central moment of Ẑ0 in terms of
2
If we do not require l̄ = E{l̂_1} = E{l̂_2}, then E{Ẑ_0} = 0 as Ẑ_0 has a Fisher Z distribution.
cumulants of ln Ŝ (Stuart and Ord, 2010, pp. 88–89), and (2) equating cumulants
of ln Ŝ, the natural logarithm of an estimate with a chi-squared distribution, to a
trigamma function (Bartlett and Kendall, 1946, p. 128).
Next, the variance of the level-of-change estimator, Q̂0 , is considered:
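In order to examine (3.10) more closely, write Q̂_0² as a fourth power of l̂_1 and l̂_2:
$$\hat{Q}_0^2 = \left[(\hat{l}_1 - \bar{l}) - (\hat{l}_2 - \bar{l})\right]^4 = (\hat{l}_1 - \bar{l})^4 - 4(\hat{l}_1 - \bar{l})^3(\hat{l}_2 - \bar{l}) + 6(\hat{l}_1 - \bar{l})^2(\hat{l}_2 - \bar{l})^2 - 4(\hat{l}_1 - \bar{l})(\hat{l}_2 - \bar{l})^3 + (\hat{l}_2 - \bar{l})^4. \qquad (3.11)$$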
In order to examine the second and fourth terms in (3.11), we take their expected
value, noting E{l̂_1 − l̄} = E{l̂_2 − l̄} = 0, so by independence,
and the expectation of the second and fourth terms in (3.11) is zero. For the expected
value of the third term, we have
Taking the expected value and grouping terms of the level-of-change estimator,
we have
From (3.10) and (3.14), we obtain an expression for the variance of the level-of-
change estimator:
The expressions E{(l̂ − l̄)²} and E{(l̂ − l̄)⁴}, the second and fourth central moments, can
be expressed in terms of the second- and fourth-order cumulants of the distribution
of ln Ŝ (Stuart and Ord, 2010, pp. 88–89), represented as κ_2 and κ_4 as follows:
$$E\{\hat{Q}_0\} = 2\kappa_2 = 2\psi'(K) \qquad (3.19)$$
and
$$\operatorname{Var}\{\hat{Q}_0\} = 2\kappa_4 + 8\kappa_2^2$$
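The polygamma functions of order one and three (when r = 1 and 3 in (3.18))
have the following asymptotic expansions as α → ∞ (Abramowitz and Stegun,
1965, p. 260):
$$\psi'(\alpha) = \frac{1}{\alpha} + \frac{1}{2\alpha^2} + \frac{1}{6\alpha^3} - \frac{1}{30\alpha^5} + \frac{1}{42\alpha^7} - \frac{1}{30\alpha^9} + \cdots, \qquad (3.21)$$
and
$$\psi^{(3)}(\alpha) = \frac{2}{\alpha^3} + \frac{3}{\alpha^4} + \frac{2}{\alpha^5} - \frac{1}{\alpha^7} + \frac{4}{3\alpha^9} - \frac{3}{\alpha^{11}} + \frac{10}{\alpha^{13}} - \cdots. \qquad (3.22)$$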
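The mean 2κ_2 = 2ψ′(K) and the variance 2κ_4 + 8κ_2² can be evaluated exactly for integer K and compared both with the truncated expansions (3.21) and (3.22) and with a Monte Carlo simulation. The sketch assumes NumPy for the simulation. For integer K it uses the standard identities ψ′(K) = π²/6 − Σ_{j<K} j^{−2} and ψ^{(3)}(K) = π⁴/15 − 6 Σ_{j<K} j^{−4}, and it draws Ŝ_1 and Ŝ_2 as independent χ²_{2K}/(2K) variables, the idealized setting of the derivation above.

```python
import math

import numpy as np


def trigamma_int(k):
    """psi'(K) for integer K >= 1."""
    return math.pi ** 2 / 6 - sum(1.0 / j ** 2 for j in range(1, k))


def pentagamma_int(k):
    """psi^(3)(K) for integer K >= 1."""
    return math.pi ** 4 / 15 - 6 * sum(1.0 / j ** 4 for j in range(1, k))


def trigamma_series(a):
    """Truncated asymptotic expansion (3.21)."""
    return 1/a + 1/(2*a**2) + 1/(6*a**3) - 1/(30*a**5) + 1/(42*a**7) - 1/(30*a**9)


def pentagamma_series(a):
    """Truncated asymptotic expansion (3.22)."""
    return 2/a**3 + 3/a**4 + 2/a**5 - 1/a**7 + 4/(3*a**9) - 3/a**11 + 10/a**13


def level_of_change_moments(k):
    """Mean 2 kappa_2 and variance 2 kappa_4 + 8 kappa_2^2 of Q_0 for 2K dof spectra."""
    kappa2, kappa4 = trigamma_int(k), pentagamma_int(k)
    return 2 * kappa2, 2 * kappa4 + 8 * kappa2 ** 2


for nw in (2, 3, 4, 5):
    k = 2 * nw - 1
    print(nw, k, round(level_of_change_moments(k)[0], 4),
          abs(trigamma_series(k) - trigamma_int(k)))

rng = np.random.default_rng(11)
k, m = 5, 400_000
s1 = rng.chisquare(2 * k, m) / (2 * k)
s2 = rng.chisquare(2 * k, m) / (2 * k)
q0 = np.log(s1 / s2) ** 2                 # level-of-change between two independent estimates
mean_th, var_th = level_of_change_moments(k)
print(q0.mean(), mean_th, q0.var(), var_th)
```

With K = 2NW − 1, the exact means 2ψ′(K) for NW = 2, 3, 4 and 5 round to 0.7899, 0.4426, 0.3071 and 0.2350, the values in the bottom row of Table 3.3, and the truncated series are already accurate to about 10^−6 at K = 3.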
The discussion in Section 3.5.2 makes the assumption of independent spectrum
estimates, which does not hold exactly. It has been shown to hold asymptotically as the
distance between the sections on which the two spectral estimates are made grows
without bound (Brillinger, 2001, p. 130). However, this work deals with adjacent
sections or slightly overlapping sections.
As a partial justification of the independence assumption we present the following
observations. Correlations between spectrum estimates made on different blocks using
The first simulation studies the case of no change-points in independent data. This
example is designed to check the mean and variance values derived in Section 3.5.2.
Random samples from an N(0, 1) distribution are generated, the level-of-change
estimator is calculated, and the sample means and sample variances are recorded.
This is done using sample sizes of 2048, 4096, and 8192 and block sizes of 128, 256,
and 512, respectively. The number 128 is a power of two that is close to an
appropriate block size for analyzing the GHD series, and larger sample sizes will be
required to show convergence in some simulation examples. Ten thousand realizations
of each sample size are generated. For each realization, multitaper spectrograms3
with 16 time blocks are constructed, and then the level-of-change estimator between
adjacent blocks is formed. The set of 4000 simulations was run with four different
time-bandwidth parameters, NW = 2, 3, 4 and 5, each using 2NW − 1 tapers.
Figure 3.1 shows a sample 16-block multitaper spectrogram with block length 128,
constructed from an N(0, 1) sample of length 2048 using multitaper parameters NW = 5
and K = 9. This spectrogram represents a matrix that is the first step in obtaining the
level-of-change estimator. Figure 3.2 shows the associated level-of-change estimator
and provides a pictorial representation of the between-block-pair level-of-change.
Block pair 1 in Figure 3.2 represents the level-of-change between blocks 1 and 2 in
Figure 3.1, block pair 2 represents the level-of-change between blocks 2 and 3, and so
forth. The frequencies within W of zero and of the Nyquist frequency (0.5) are dropped
from the level-of-change estimator. All multitaper spectrograms are plotted on a
logarithmic colour scale. The values on the scale indicate power, in units²/frequency,
where “units” indicates the units of the original variable.
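The following sketch reproduces the structure of this simulation on a small scale, assuming NumPy: 16 non-overlapping blocks of length 128, NW = 5 and K = 9, zero padding of each block to 256 points (an assumption), and the level-of-change between adjacent blocks taken as Q̂_0 = (ln Ŝ_j − ln Ŝ_{j+1})² from Section 3.5.2. For brevity it uses the nonadaptive average (2.19) of the eigenspectra rather than adaptive weighting, and a hypothetical helper `slepian_tapers` based on Slepian's tridiagonal matrix; it is not the R code used for the thesis.

```python
import numpy as np


def slepian_tapers(n, nw, k):
    """First k Slepian sequences of length n (rows), via Slepian's tridiagonal matrix."""
    w = nw / n
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    _, vecs = np.linalg.eigh(np.diag(diag) + np.diag(off, 1) + np.diag(off, -1))
    return vecs[:, ::-1][:, :k].T


def block_spectrogram(x, nb, nw, k, nfft):
    """Crude multitaper estimate (2.19) on consecutive non-overlapping blocks."""
    tapers = slepian_tapers(nb, nw, k)
    blocks = x[: len(x) // nb * nb].reshape(-1, nb)
    yk = np.fft.rfft(tapers[None, :, :] * blocks[:, None, :], n=nfft, axis=2)
    return np.mean(np.abs(yk) ** 2, axis=1)          # shape (n_blocks, nfft // 2 + 1)


def level_of_change(spec, freqs, w):
    """(ln S_j - ln S_{j+1})^2; rows = frequency, columns = block pair."""
    keep = (freqs > w) & (freqs < 0.5 - w)          # drop frequencies within W of 0 and 1/2
    logs = np.log(spec[:, keep])
    return ((logs[:-1] - logs[1:]) ** 2).T


rng = np.random.default_rng(5)
nb, nw, k, nfft = 128, 5.0, 9, 256
freqs = np.arange(nfft // 2 + 1) / nfft
means, flagged = [], []
for _ in range(30):
    spec = block_spectrogram(rng.standard_normal(16 * nb), nb, nw, k, nfft)
    q = level_of_change(spec, freqs, nw / nb)
    means.append(q.mean())
    flagged.append(bool(np.any(q >= 4.16)))        # any exceedance of the cutoff
print(q.shape)                 # (107, 15): frequencies kept by block pairs
print(np.mean(means))          # compare with 2 * trigamma(9), about 0.235
print(np.mean(flagged))        # fraction of realizations with a false detection
```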
All simulations presented in this thesis use R code based on the multitaper
software package introduced in Appendix A, which includes optimized Fortran 90 code.
The simulations require modifications not included in the package and are
performed using the R “parallel” package, which allows multiprocessor use,
provided that the code is appropriately designed. The set of simulations presented in
this Normal Distribution subsection takes approximately 10 hours on an Intel “Core
2” 2.50 GHz quad core processor running a Linux Mint operating system when all
four cores are utilized.
3
One spectrogram constructed from periodogram estimates is included, and this estimate can be
considered a single-tapered spectral estimate with a constant taper.
[Figure 3.1 plot: Frequency (0 to 0.5) against Block (1 to 16), with a colour scale of power]
Figure 3.1: Multitaper spectrogram plot with adaptive weighting of white noise data
using 16 non-overlapped blocks of length 128; that is, the total length is N = 16 ×
128 = 2048. The multitaper spectral estimates use the parameters NW = 5 and
K = 9.
The primary object of the first set of simulations is to check the derived mean and
variance of the level-of-change estimator against simulated values, and Figures 3.1
and 3.2 are presented to provide an example of the procedure. The matrices represented
by the two figures come from one realization of the simulation. Such a matrix is
generated for each of the 4000 realizations, and the 4000 simulations are run for each
sample size and for each of the five spectral estimators (the four time-bandwidth
parameters and the constant taper, i.e. the periodogram). Figure 3.2 plots the
level-of-change estimator. The matrix contains a level-of-change value for each
frequency and each pair of adjacent blocks. We initially set a cutoff of 4.16
(level-of-change ≥ 4.16) as a preliminary criterion for detecting a change-point in this
example. Our simulations indicate that such a cutoff leads to a ∼5% false detection rate
(Type I error). Note that 4.16 represents a value above 12σ, where σ² is the variance
based on the chi-squared degrees-of-freedom presented in Table 3.2. This high cutoff is
the result of the multiple hypothesis tests in the matrix represented by Figure 3.2, which
has over 1600 values that are not independent. A discussion of selecting a cutoff is
presented in Section 3.6.3. We do not propose this as a standalone estimator but rather
as a tool to be used with other existing tools (see Section 3.6.2).
[Figure 3.2 plot: Frequency against Block Pair (1 to 15), with a colour scale of level-of-change]
Figure 3.2: Level-of-change estimator between block pairs based on the spectrogram
of white noise shown in Figure 3.1, using multitaper parameters NW = 5 and K = 9.
The frequency range is reduced as we omit frequencies within W of zero and of the
Nyquist frequency (0.5). In this example, we use a cutoff value of 4.16, giving a 5%
error rate for the complete matrix, and this matrix exhibits no change-points.
The simulation means of the level-of-change estimator over all blocks and frequencies
for each time-bandwidth parameter are shown in Table 3.1, together with the average
of the level-of-change estimator obtained from a periodogram. Table 3.2 shows
the average simulation variances of the level-of-change estimator. The tables show
close agreement between observed simulation means and variances and theoretical
values derived in Section 3.5.2. Table 3.2 also shows how using multitaper spec-
tral estimation considerably reduces the variance associated with the level-of-change
estimator as a result of the degrees-of-freedom increase.
Table 3.1: Random samples of size 2048, 4096, and 8192 were generated, and
multitaper adaptively weighted block spectrograms were then constructed using
block lengths of 128, 256, and 512, respectively. The table gives sample means found
using simulation for the periodogram and nonadaptively weighted multitaper spectral
estimates with time-bandwidth parameters NW = 2, 3, 4 and 5.
Table 3.2: Variances of the level-of-change estimator. Random samples were generated
and multitaper spectrograms constructed as in Table 3.1. Observed sample variances
constructed using adaptive weighting are higher than both theoretical variances and
simulated variances constructed from multitaper spectrograms without adaptive weighting.
Note that the level-of-change estimator was plotted with adaptive weighting, which
lowers the associated degrees of freedom; however, there is no visible difference in the
estimator with and without adaptive weighting in this example. Tables 3.1 and 3.2
were constructed using adaptive weighting, and while the associated degrees of freedom
are lower with adaptive weighting, in the Gaussian noise example the difference
between the estimator with and without adaptive weighting is confined to the third
and fourth decimal places. We do not find adaptive weighting useful in these first
simple examples; however, in Section 3.6.1.4, a stationary no-change-point example
in which adaptive weighting is of visible benefit to the level-of-change estimator will
be presented. This section demonstrates that random N(0, 1) simulations using block
sizes as small as 128 closely match the values derived in Section 3.5.2.
estimator. The multitaper spectral estimator assumes that the data is locally smooth
in order to justify the χ²_{2K} distribution of the spectral estimate, and the locally
smooth condition is increasingly satisfied with larger sample sizes when NW is unchanged.
Thomson (1982, pp. 1062–1065) suggests that a non-central chi-squared distribution
may be more appropriate when the locally smooth assumption is violated.
Tables 3.3 and 3.4, which are constructed in the same way as the tables in Section 3.6.1.1, present the observed sample means and sample variances averaged over 4000 realizations. From the perspective of the level-of-change estimator, the high variance of the cubed Gaussian distribution requires impractically large block sizes for the mean and variance to converge to the derived values. These tables indicate that the proposed level-of-change estimator can be effective with high-variance non-Gaussian data sets, but only with extremely large samples. The variances are high at the sample sizes we expect to see in climate data, and we cannot propose use of the level-of-change estimator on data with an $N(0,1)^3$ distribution unless the sample sizes are very large (at least 16384 samples), whereas even a long climate series such as the GHD series examined in Chapter 4 has only 634 samples.
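A short moment calculation suggests why $N(0,1)^3$ data are so demanding. For $Z \sim N(0,1)$, $E[Z^{2m}] = (2m-1)!!$, so $Z^3$ has variance $E[Z^6] = 15$ and kurtosis $E[Z^{12}]/15^2 = 10395/225 \approx 46.2$, against 3 for Gaussian data. The standard-library sketch below checks these values and compares them with a Monte Carlo estimate; the function names are illustrative and not from the thesis.

```python
import math
import random


def gaussian_even_moment(m):
    """E[Z**(2*m)] for Z ~ N(0, 1), i.e. the double factorial (2m - 1)!!."""
    return math.prod(range(1, 2 * m, 2))


# Z**3 has mean zero, so its variance is E[Z**6] and its fourth moment is E[Z**12].
var_cubed = gaussian_even_moment(3)                      # 15
kurtosis_cubed = gaussian_even_moment(6) / var_cubed**2  # 10395 / 225 = 46.2


def mc_variance_of_cube(n, seed=1):
    """Monte Carlo sample variance of Z**3 from n standard normal draws."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) ** 3 for _ in range(n)]
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)


print(var_cubed, round(kurtosis_cubed, 1))  # 15 46.2
```

The heavy tails (kurtosis near 46) mean that sample statistics of $Z^3$ converge slowly, which is consistent with the very large block sizes needed in Tables 3.3 and 3.4.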
Level-of-change Simulated Means
Block Size NW = 2 NW = 3 NW = 4 NW = 5
128 1.2460 0.9160 0.7884 0.7200
256 1.0342 0.6980 0.5672 0.4979
512 0.9192 0.5787 0.4462 0.3758
1024 0.8560 0.5127 0.3790 0.3081
2048 0.8242 0.4790 0.3444 0.2728
4096 0.8072 0.4610 0.3260 0.2542
8192 0.7986 0.4519 0.3166 0.2542
16384 0.7944 0.4474 0.3120 0.2400
Theory 0.7899 0.4426 0.3071 0.2350
Table 3.3: Sample means of the level-of-change estimator from an $N(0,1)^3$ distribution. Simulations of 4000 runs were made, each 16 blocks in length. The bottom row gives the approximations derived in Section 3.5.2.
Table 3.4: Sample variances of the simulated level-of-change estimator from an $N(0,1)^3$ distribution. Simulations of 4000 runs were made, each 16 blocks in length. The bottom row gives the approximations derived in Section 3.5.2.
Section 3.5.3 discussed relaxing the independence assumption; here we present two dependent data models without change-points. The first is an AR(2) model with coefficients $\boldsymbol{\phi} = (0.75, -0.5)^T$, a model that has been used in the literature as a simple dependent data model (P&W93, p. 45). Four thousand simulations indicate that the sample means and sample variances averaged across blocks and frequencies are close to those derived in Section 3.5.2. Admittedly, in this case the dependence between adjacent blocks is low.
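The location of the AR(2) peak follows from the theoretical spectrum $S(f) = \sigma^2 / |1 - \phi_1 e^{-i2\pi f} - \phi_2 e^{-i4\pi f}|^2$. Setting the derivative to zero gives $\cos(2\pi f_0) = -\phi_1(1-\phi_2)/(4\phi_2) = 0.5625$ for $\boldsymbol{\phi} = (0.75, -0.5)^T$, so $f_0 \approx 0.155$, inside the 0.1 to 0.2 band mentioned in the text. The NumPy sketch below evaluates the spectrum on a grid and simulates the process; the helper names and the burn-in length are illustrative choices, not taken from the thesis.

```python
import numpy as np


def ar2_spectrum(freqs, phi1=0.75, phi2=-0.5, sigma2=1.0):
    """Spectral density of x_t = phi1*x_{t-1} + phi2*x_{t-2} + w_t, Var(w_t) = sigma2
    (unit sampling interval; frequency in cycles per sample)."""
    z = np.exp(-2j * np.pi * np.asarray(freqs, float))
    return sigma2 / np.abs(1.0 - phi1 * z - phi2 * z**2) ** 2


def simulate_ar2(n, phi1=0.75, phi2=-0.5, burn=500, seed=None):
    """Simulate n values of the AR(2) process, discarding `burn` start-up values."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(2, n + burn):
        x[t] = phi1 * x[t - 1] + phi2 * x[t - 2] + w[t]
    return x[burn:]


phi1, phi2 = 0.75, -0.5
freqs = np.linspace(0.0, 0.5, 5001)
f_peak = freqs[np.argmax(ar2_spectrum(freqs, phi1, phi2))]             # grid maximum
f_closed_form = np.arccos(-phi1 * (1 - phi2) / (4 * phi2)) / (2 * np.pi)  # ~0.155
```

With blocks of 128 samples and NW = 5, the bandwidth 2W is about 0.078, comparable to the width of this peak, which is one way to see why the peak is poorly resolved block by block.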
As in Section 3.6.1.1, sample sizes of 2048, 4096, and 8192, with block sizes of 128, 256, and 512 respectively, are used in 4000-run simulations. Figure 3.3 shows a realization of the multitaper spectrogram of the process. In a sufficiently long data set, the dependence structure appears as higher power between frequencies 0.1 and 0.2 (P&W93, p. 309); however, in the short blocks the power in this frequency range varies from high to low, because each block is too short to capture the signal consistently. Figure 3.4 shows the level-of-change estimator for the same realization; differences between blocks resulting from the AR(2) structure are hard to observe in a single realization. If, as previously, a cutoff of 4.16 is selected, this realization is not significant.⁵ A plot of the average level-of-change estimator across all 4000 realizations, not presented here, shows a higher level-of-change between frequencies 0.18 and 0.26, on the down slope of the peak produced by the AR(2) process, which is not well resolved with smaller sample sizes (Thomson, 2001, p. 349). A similar pattern is observed in a plot of the standard errors over the 4000 realizations, also not presented here.
5
Simulations show that this AR(2) model has a false detect rate of approximately 7.6% with the cutoff of 4.16, which was selected from white-noise simulations.
Tables similar to Tables 3.1 and 3.2 (not shown here) indicate that the simulated sample means and sample variances are close to those derived in Section 3.5.2, indicating some robustness to dependence. We propose using a spectral estimate of the entire series alongside the level-of-change estimator, since the spectral estimate of the complete series helps explain any increased level-of-change.
[Plot: multitaper spectrogram; x-axis Block, y-axis Frequency.]
[Plot: level-of-change estimator; x-axis Block Pair, y-axis Frequency.]
Figure 3.4: Level-of-change estimator for the AR(2) example shown in Figure 3.3.
This realization indicates the potential for a false detect. This risk can be reduced
by recognizing a higher likelihood of false detect around the unresolved peak in the
spectrum.
figure with adaptive weights, as the false detect is associated with the frequency range where there is a power fluctuation in the spectrogram. The power fluctuation is a random pattern that arises because the short blocks resolve the spectral peaks inconsistently; however, a simple spectral estimate of the entire series will make it clear whether such a signal is present. Figure 3.6 plots the spectral estimate for all 2048 samples used in the spectrogram, and the two peaks are readily apparent. It is these two poorly resolved peaks that create the colour pattern in Figure 3.5.
[Plot: multitaper spectrogram; x-axis Block, y-axis Frequency.]
[Plot: spectrum (log scale) versus Frequency; NW = 5, K = 9.]
Figure 3.6: Multitaper adaptively weighted spectrum estimate of all 2048 samples from the same realization of the ARMA(4,2) process, using multitaper parameters NW = 5 and K = 9.
We see that in this realization all values above the cutoff are in the vicinity of the unresolved spectral peak and, as such, can be recognized as false detects. In general, AR and ARMA processes have lower spectral values in the higher frequencies than white noise with similar mean and variance. This artificial example can appear nonstationary on the Bartlett M-test and can produce false detects with the proposed level-of-change method for detecting change-points; however, we submit that if one plots a complete spectral estimate and recognizes that the spectral peaks are not resolved in short blocks, then inappropriately classifying such a process as nonstationary can be avoided. We also recommend fitting an ARMA model and plotting the residuals as a standard diagnostic. These techniques should help avoid misclassification.
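The residual diagnostic recommended in the text can be sketched with a Yule–Walker autoregressive fit. This is a simpler stand-in for the ARMA fit the text suggests (a pure AR model estimated from sample autocovariances), and all function names are hypothetical. If the fitted model captures the dependence, the one-step residuals should look white, for example with a lag-one autocorrelation near zero.

```python
import numpy as np


def simulate_ar(phi, n, burn=500, seed=None):
    """Simulate x_t = sum_j phi[j] * x_{t-1-j} + w_t with w_t ~ N(0, 1)."""
    phi = np.asarray(phi, float)
    p = phi.size
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        x[t] = np.dot(phi, x[t - p:t][::-1]) + w[t]   # x[t-1], ..., x[t-p]
    return x[burn:]


def sample_acov(x, max_lag):
    """Biased sample autocovariances r_0, ..., r_max_lag (divisor n)."""
    x = np.asarray(x, float) - np.mean(x)
    n = x.size
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])


def yule_walker(x, p):
    """AR(p) coefficients and innovation variance from the Yule-Walker equations."""
    r = sample_acov(x, p)
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    phi = np.linalg.solve(R, r[1:p + 1])
    return phi, r[0] - np.dot(phi, r[1:p + 1])


def ar_residuals(x, phi):
    """One-step prediction residuals e_t = x_t - sum_j phi_j x_{t-j}, for t >= p."""
    x = np.asarray(x, float) - np.mean(x)
    p = len(phi)
    return np.array([x[t] - np.dot(phi, x[t - p:t][::-1]) for t in range(p, x.size)])


def lag1_autocorr(e):
    """Lag-one sample autocorrelation of the residuals."""
    e = np.asarray(e, float) - np.mean(e)
    return np.dot(e[:-1], e[1:]) / np.dot(e, e)


x = simulate_ar([0.75, -0.5], 20000, seed=42)
phi_hat, sigma2_hat = yule_walker(x, 2)
rho1 = lag1_autocorr(ar_residuals(x, phi_hat))
```

In practice one would also plot the residuals and their spectrum; a flat residual spectrum supports treating apparent block-to-block variation as sampling noise rather than a change-point.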
[Plot: level-of-change estimator; x-axis Block Pair, y-axis Frequency.]
As in the previous examples, Tables 3.5 and 3.6 show the average over the sample mean matrix and the average over the sample variance matrix, respectively, for NW = 2, 3, 4 and 5, with K = 2NW − 1. The tables show that the sample means and standard errors are not close to the values derived in Section 3.5.2 at a block size of 128; however, they approach the derived values as the block size doubles. This indicates that with smaller block sizes, as are likely in the GHD series, the means and variances may not equal the derived values. The values presented in the tables are constructed with adaptive weighting. Without adaptive weighting, the means and variances are closer to the theoretical values when fewer tapers are used; however, when more tapers are used, the values are closer to the theoretical values with adaptive weighting. Tables of values without adaptive weighting are not presented.
[Plot: level-of-change estimator; x-axis Block Pair, y-axis Frequency.]
[Plot: level-of-change estimator values above the cutoff; x-axis Block Pair, y-axis Frequency.]
Figure 3.9: Level-of-change estimator plot showing only values above the 4.16 cutoff, constructed using adaptive weights with NW = 5 and K = 9, for the ARMA(4,2) example shown in Figure 3.5. The only detected values are in a region where false detects are expected due to the low resolution of each block.
Table 3.5: Average across blocks and frequencies of the standard error matrix of the level-of-change estimator, using adaptive weights with NW = 5 and K = 9, from 4000 simulations of the ARMA(4,2) process.
Table 3.6: Average across blocks and frequencies of the sample mean matrix of the level-of-change estimator, using adaptive weights with NW = 5 and K = 9, from 4000 simulations of the ARMA(4,2) process.
We next explore the practicality of this estimator using a simulation model that contains a change-point; specifically, we simulate a series in which a frequency component changes. This is the type of structural change we focus on detecting in the GHD data. We study a simplified version of the models described in Lees and Park (1995).
A single time series is constructed by concatenating the following two models:
and
where $w_t$ is random noise drawn from an $N(0,1)$ distribution. The model is:
$$x(n) = \begin{cases} x_1(n), & n \le n_0, \\ x_2(n), & n > n_0. \end{cases} \qquad (3.25)$$
We simulate $x_1$ and $x_2$, each of length 1024,⁶ and concatenate them so that $x(n)$ has 2048 points, indexed $n = 1, 2, \ldots, 2048$. This equals the smallest simulation size in the stationary examples, and each realization has 16 blocks of length 128.
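The two component models do not survive in this copy of the text. The NumPy sketch below reconstructs them only from the amplitudes and frequencies given in the caption of Figure 3.10 and the description of the spectrogram (the component at 0.2 persists throughout), with $n_0 = 1024$ so that each half has 1024 points. The zero-phase cosine form, the function names, and the seeding are assumptions for illustration, not the thesis's exact models.

```python
import numpy as np


def simulate_change_point_series(n_half=1024, seed=None):
    """Hypothetical reconstruction of model (3.25): x(n) = x1(n) for n <= n0 and
    x2(n) for n > n0, with n0 = n_half. Zero-phase cosines are an assumption."""
    rng = np.random.default_rng(seed)
    n = np.arange(1, 2 * n_half + 1)
    first_half = n <= n_half
    x = np.where(first_half,
                 1.0 * np.cos(2 * np.pi * 0.09 * n),     # A1a = 1 at f1a = 0.09
                 0.2 * np.cos(2 * np.pi * 0.0526 * n))   # A1b = 0.2 at f1b ~ 0.0526
    x = x + 0.6 * np.cos(2 * np.pi * 0.2 * n)            # A2 = 0.6 at f2 = 0.2 in both halves
    return x + rng.standard_normal(n.size)               # w_n ~ N(0, 1)


def split_blocks(x, block_len=128):
    """Non-overlapping blocks as rows; any incomplete final block is dropped."""
    n_blocks = len(x) // block_len
    return np.reshape(np.asarray(x)[: n_blocks * block_len], (n_blocks, block_len))


def strongest_frequency(segment):
    """Frequency (cycles/sample) of the largest periodogram ordinate, excluding f = 0."""
    power = np.abs(np.fft.rfft(segment - np.mean(segment))) ** 2
    return np.fft.rfftfreq(len(segment))[1 + np.argmax(power[1:])]


x = simulate_change_point_series(seed=0)
blocks = split_blocks(x)   # 16 blocks of 128; the change falls between blocks 8 and 9
```

Over a full half of 1024 points the periodogram picks out 0.09 in the first half and 0.2 in the second; the difficulty the chapter addresses is seeing this within 128-point blocks.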
Figure 3.10 plots one realization of the spectrogram from this example. In the figure, the parameters NW = 5 and K = 9 are used; changing the time-bandwidth parameter and the number of tapers affects the appearance of the spectrogram. In general, decreasing the time-bandwidth parameter narrows the bandwidth and so helps to resolve signals, which matters with short blocks; however, increasing the time-bandwidth parameter and the associated number of tapers increases the degrees of freedom of the spectral estimate and thus lowers the variance of the level-of-change estimator (Thomson, 1982). The vertical bar in the first block, at approximately f = 0.38, indicates the bandwidth parameter 2W used in the spectral estimate. In this figure, the harmonic component at 0.2 is visible throughout the spectrogram but is not well resolved in each block; the sinusoid at frequency 0.09 is much better resolved in the first half of the series, where it is present, and the lowest-amplitude sinusoid, at 0.0526, is barely distinguishable from noise in the second half of the series.
6
We set $n_0$ in (3.25) to 1024, so the change occurs between blocks 8 and 9.
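A block multitaper spectrogram of the kind shown in Figure 3.10 can be sketched as follows, assuming SciPy's `scipy.signal.windows.dpss` for the Slepian tapers. For simplicity the sketch averages the K eigenspectra with equal weights instead of using the adaptive weighting applied in the thesis, and it assumes a unit sampling interval. Each block of length N has bandwidth W = NW/N, so with N = 128 and NW = 5, 2W ≈ 0.078 cycles per sample. Function names are hypothetical.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_spectrum(x, nw=5.0, k=9, nfft=None):
    """Equal-weight multitaper estimate (no adaptive weighting), unit sampling interval."""
    x = np.asarray(x, float) - np.mean(x)
    n = x.size
    nfft = nfft or n
    tapers = dpss(n, nw, k)                           # shape (k, n), unit-energy tapers
    eigencoef = np.fft.rfft(tapers * x, n=nfft, axis=1)
    return np.fft.rfftfreq(nfft), np.mean(np.abs(eigencoef) ** 2, axis=0)


def multitaper_spectrogram(x, block_len=128, nw=5.0, k=9):
    """Rows are non-overlapping blocks, columns are frequencies 0 ... 0.5."""
    x = np.asarray(x, float)
    n_blocks = len(x) // block_len
    rows = [multitaper_spectrum(x[b * block_len:(b + 1) * block_len], nw, k)[1]
            for b in range(n_blocks)]
    return np.fft.rfftfreq(block_len), np.vstack(rows)


# White noise with unit variance: every entry should scatter around 1, with
# roughly 2K = 18 degrees of freedom away from f = 0 and the Nyquist frequency.
freqs, spec = multitaper_spectrogram(np.random.default_rng(0).standard_normal(2048))
```

Plotting `np.log(spec).T` against block index and frequency gives a picture comparable in layout to Figure 3.10; adjacent-block comparisons of such a matrix are the raw material for the level-of-change estimator.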
[Plot: multitaper spectrogram; x-axis Block, y-axis Frequency.]
Figure 3.10: Multitaper spectrogram of simulated data containing two sinusoidal frequencies, one of which is considerably damped at the halfway point. In this case the nonstationarity is clearly visible in the spectrogram. The black line segment in the upper left indicates the bandwidth, 2W. The first half of the data has a sinusoid of amplitude $A_{1a} = 1$ at $f_{1a} = 0.09$ and a sinusoid of amplitude $A_2 = 0.6$ at $f_2 = 0.2$. The second half has a sinusoid of amplitude $A_{1b} = 0.2$ at $f_{1b} \approx 0.0526$. The background noise has constant variance of one. The multitaper parameters used were NW = 5 and K = 9. The low-amplitude component at $f_{1b} \approx 0.0526$ is not distinguishable at this block length.
Figure 3.11 plots the level-of-change matrix from the spectrogram, using adaptive weighting and multitaper parameters NW = 5 and K = 9. Other multitaper parameters and block lengths were also tried; this plot demonstrates the level of change in this realization of the process. In this example, if we select a cutoff of 4.16, we can detect the change.
[Plot: level-of-change estimator; x-axis Block Pair, y-axis Frequency.]
Figure 3.11: Level-of-change estimator between adjacent blocks, with the frequency edges (near zero and the Nyquist frequency) trimmed by W. Note that we visually detect a high value of the level-of-change estimator between blocks 8 and 9 at a frequency of approximately 0.091 (1/11).
The above example is not randomly selected from the set of the 4000 simulations
containing the change-point; instead, it is randomly selected from the set of simula-
tions with values above the 4.16 cutoff in the correct frequency range and block. We
estimate that approximately 25% of the 4000 simulations containing a change-point
as specified in (3.25) have such a value. The selected cutoff gives low statistical power; however, simulations demonstrate that cutoffs selected to control Type I error across the entire frequency range for single hypothesis tests, such as the harmonic F-test, also have low statistical power. Thus, while such a cutoff can help in reading the level-of-change matrix when simulating simple examples, it is not feasible in practice. We do not propose the level-of-change estimator as a stand-alone tool with a strict cutoff set to control Type I error across the whole matrix; instead, we propose that it be incorporated with other existing tools to help detect change-points.
One tool that this estimator should be used with is the Bartlett M-test. Figure 3.12 plots the Bartlett M-test for this example, and it clearly shows nonstationarity at approximately f = 0.2. This tool can help identify which frequencies to examine when attempting to detect a change-point.
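A minimal sketch of Bartlett's M statistic, applied frequency by frequency across the blocks of a spectrogram, is given below. It assumes the textbook form of Bartlett's test for homogeneity of k variance estimates, each with ν degrees of freedom (ν ≈ 2K for a multitaper estimate without adaptive weighting); the thesis's own form of the test is not given in this section, so treat the details and the function names as assumptions. Under stationarity, M/C is approximately $\chi^2_{k-1}$, whose expected value k − 1 and 95% point play the role of the reference lines in a plot like Figure 3.12.

```python
import numpy as np
from scipy import stats


def bartlett_m(spectrogram, dof):
    """Bartlett's corrected statistic M / C at each frequency.

    spectrogram: array (k blocks, n frequencies) of variance (power) estimates,
    each with `dof` degrees of freedom. Approximately chi-squared with k - 1
    degrees of freedom when the power is constant across blocks."""
    s = np.asarray(spectrogram, float)
    k = s.shape[0]
    m = dof * (k * np.log(s.mean(axis=0)) - np.log(s).sum(axis=0))
    c = 1.0 + (k + 1.0) / (3.0 * k * dof)
    return m / c


def bartlett_reference_lines(k, level=0.95):
    """Expected value and upper `level` point of chi-squared with k - 1 df."""
    return float(k - 1), stats.chi2.ppf(level, k - 1)


# Consistency check against SciPy's sample-based Bartlett test (equal sample sizes).
samples = np.random.default_rng(1).normal(size=(5, 30)) * np.arange(1, 6)[:, None]
m_ours = bartlett_m(samples.var(axis=1, ddof=1)[:, None], dof=29)[0]
m_scipy = stats.bartlett(*samples)[0]
```

For a 16-block spectrogram with NW = 5 and K = 9 one would call `bartlett_m(spec, dof=18)` and compare the result with `bartlett_reference_lines(16)`.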
Figure 3.13 plots the average values of the level-of-change estimator for the eighth block-pair column (the block pair that contains the change-point) over the 4000 simulations, using NW = 5 and K = 9. The plot indicates that the mean of the level-of-change estimator is considerably higher in the appropriate frequency range, while frequencies with no change have values close to the expected values. This pattern is similar when other multitaper parameters are used. A plot of the standard errors, not shown, is similar.
[Plot: Bartlett M-test versus Frequency, with Expected Value and 95% Significance lines.]
Figure 3.12: Bartlett M-test for this change-point example. This test shows nonstationarity at the frequency where there is a change in amplitude and a change in frequency. The line segment below the legend indicates the bandwidth, 2W, and the two dashed lines indicate the chi-squared expected value and the 95% value. The multitaper parameters used were NW = 5 and K = 9.
[Plot: Level-of-change versus Frequency.]
Figure 3.13: Average eighth block pair level-of-change column over 4000 simulations.
This figure shows that the average observed level-of-change over the 4000 simulations
is considerably higher in the frequency range where the change-point occurs. The
multitaper parameters used were N W = 5 and K = 9.
The cutoff value of 4.16 was obtained to control Type I error over the entire level-of-change matrix at 5%. The cutoff was selected from simulations of the $N(0,1)$ process described in Section 3.6.1.1 for multitaper parameters NW = 5 and K = 9, and it has a high Type II error for the frequency change-point model introduced in Section 3.6.2. We do not propose general use of such cutoff values with this method; on the basis of simulations not presented here, we observed that a similarly obtained cutoff would lead to low power in tests such as the ubiquitous harmonic F-test. Table 3.7 shows the cutoffs obtained for the parameters NW = 2, 3, 4 and 5 and K = 3, 5, 7 and 9. The cutoffs increase as the degrees of freedom decrease.
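The cutoffs in Table 3.7 are based on the maximum of each simulated null level-of-change matrix. That procedure can be sketched generically: simulate many null matrices, record each matrix's maximum, and take the $(1-\alpha)$ quantile of those maxima, so that about a fraction $\alpha$ of null matrices contain any value above the cutoff. Because the level-of-change estimator itself is defined earlier in the thesis, the sketch uses a clearly hypothetical stand-in statistic (absolute log ratios of adjacent-block spectra, each distributed as $\chi^2_{2K}/2K$). Its numbers will not reproduce Table 3.7, but they show the same qualitative pattern of larger cutoffs with fewer degrees of freedom.

```python
import numpy as np


def stand_in_statistic(rng, n_blocks=16, n_freqs=64, k=9):
    """HYPOTHETICAL null 'level-of-change' matrix, NOT the thesis estimator:
    |log(S_{b+1}(f) / S_b(f))| for independent spectra S ~ chi2_{2k} / (2k)."""
    spec = rng.chisquare(2 * k, size=(n_blocks, n_freqs)) / (2 * k)
    return np.abs(np.diff(np.log(spec), axis=0))   # shape (n_blocks - 1, n_freqs)


def max_statistic_cutoff(statistic, n_sims=4000, alpha=0.05, seed=0, **kwargs):
    """(1 - alpha) quantile of per-matrix maxima, so that under the null about a
    fraction alpha of whole matrices contain any value above the cutoff."""
    rng = np.random.default_rng(seed)
    maxima = np.array([statistic(rng, **kwargs).max() for _ in range(n_sims)])
    return np.quantile(maxima, 1.0 - alpha), maxima


cut_k9, maxima_k9 = max_statistic_cutoff(stand_in_statistic, n_sims=500, k=9)
cut_k3, _ = max_statistic_cutoff(stand_in_statistic, n_sims=500, k=3)
```

Controlling the matrix-wide maximum is what makes such cutoffs conservative for any single cell, which is consistent with the low power reported for the change-point model.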
Level-of-change Cutoffs
NW = 2 NW = 3 NW = 4 NW = 5
24.50 9.46 5.82 4.16
Table 3.7: Cutoffs for controlling Type I error for the level-of-change estimator, based on the maximum value in each level-of-change matrix over 4000 simulations of an $N(0,1)$ process.
Figure 3.14 compares the densities of the maximum level-of-change values from the change-point model and from the model without a change-point. It shows that selecting a cutoff that controls both Type I and Type II error is not a trivial task for the change-point model discussed in Section 3.6.2. However, such problems are genuinely difficult and are generally not solved for other spectral analysis tests; for example, a similar problem exists for the harmonic F-test. Once again, we propose using this estimator as part of a suite of tools to help detect change-points.
[Plot: densities of maximum Level-of-change values, change versus no change.]
Figure 3.14: Plots of densities of the level-of-change estimator for a model with a
change-point and a model without. These are based on 4000 simulations comparing
maximum values of a model with a change-point to one without. The intersection
point is 0.68.
In practice, we are faced with a fixed sample size. For example, the grape harvest date (GHD) (Chuine et al., 2004) time series, examined in Chapter 4, has 634 annual samples, from 1370 to 2003, and obtaining more samples is not feasible, except perhaps for the most recent decade. While one can look to other proxy measures, such as the ¹⁴C Bristlecone Pine record (Suess and Linick, 1990), we chose to work with the existing data and consider overlapping blocks. In selecting the block size, we have to consider the power of the signal we are trying to resolve, the acceptable amount of overlap, and whether we are willing to discard data at the ends of the series or between blocks. As demonstrated earlier in this chapter, block size selection affects the spectrogram, the Bartlett M-test results, and the level-of-change estimator. When there are few samples, we prefer to use all the data. The procedure we adopt is to test all reasonable combinations of overlapping and omitted block lengths in an appropriate range and to find an appropriate compromise. Table 3.8 gives a sample of possible block size choices.
Block Length Block Offset Number of Blocks % Overlap
109 105 6 3.7
106 88 7 17.0
81 79 8 2.5
82 69 9 15.9
84 55 11 34.5
106 48 12 54.7
128 46 12 64.1
Table 3.8: A sample of potential block sizes, selected using the criterion that data at the end points not be discarded. In general, a small offset allows more choices of block size, while choosing block size and offset close together minimizes the overlap.
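Every row of Table 3.8 satisfies $(B-1)d + L = N$ with $N = 634$, where L is the block length, d the block offset and B the number of blocks, so that no data are discarded at either end; the overlap is $(L-d)/L$. For example, $5 \times 105 + 109 = 634$ and $4/109 \approx 3.7\%$. The standard-library sketch below (function name hypothetical) enumerates every such choice in a range of block lengths, from which a table like Table 3.8 can be drawn.

```python
def block_options(n, min_len, max_len, min_blocks=2):
    """Equal-length blocks starting every `offset` samples that exactly cover n samples:
    (n_blocks - 1) * offset + block_len == n, with 0 < offset <= block_len.
    Returns (block_len, offset, n_blocks, percent_overlap) tuples."""
    options = []
    for block_len in range(min_len, max_len + 1):
        remainder = n - block_len
        if remainder <= 0:
            continue  # a block this long cannot be combined with any other block
        for offset in range(1, block_len + 1):
            if remainder % offset == 0:
                n_blocks = remainder // offset + 1
                if n_blocks >= min_blocks:
                    overlap = 100.0 * (block_len - offset) / block_len
                    options.append((block_len, offset, n_blocks, round(overlap, 1)))
    return options


ghd_options = block_options(634, 80, 130)
```

The list is long, which is why a compromise between overlap, number of blocks, and block length still has to be chosen by hand.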
Exploratory data analysis is still partly [largely] an art so, for a given time
series, several approaches are possible. (Thomson, D. J., pers. comm.)
We propose the following general approach when using the proposed level-of-change
estimator to locate change-points:
1. Plot the spectrum with and without several prewhitening models and ensure
that a simple ARMA model is not sufficient.
2. Plot the spectrogram with various levels of overlap and time-bandwidth param-
eters, while being careful to recognize what an unresolved signal looks like and
ensure that no possible change-point can be explained that way.
5. Plot the spectrum before and after the change-point on the same scale and determine whether the nature of the change can be identified.
spectral estimates that are not independent, and it has been suggested that a non-
central chi-squared distribution is more appropriate in such matrices. Further work
is required to study the properties of the estimator as a part of a matrix containing
many (over 1600) non-independent points.
Chapter 4
CHAPTER 4. BURGUNDY GRAPE HARVEST DATES 77
4.1 Introduction
This chapter presents an analysis of the Burgundy grape harvest date (GHD) series first assembled by Chuine et al. (2004).¹ This series is of particular interest because, starting in 1370, it is the longest climate time series available that has known dates and continues into the present. Other similar time series, which are proxy data for climate, have uncertain dates; the time (or date) is estimated. Thus this series can be used both to calibrate dates of proxy series such as ice cores and to compare pre-industrial and industrialized European climate. Natural climate variability and its impact on ecosystems and plant phenology have been discussed in Jones and Goodrich (2007), and this series is considered to track climate variability accurately. We note that there are concerns about production practices and socioeconomic pressures (Chabin et al., 2007). The Burgundy series covers 18 regions, and the capital, Dijon, regularly mandated the harvest date for the entire region, resulting in artificially low within-year variability. We are specifically interested in the annual climate signal captured by the harvest date.
In addition to the Burgundy GHD series, the following long-term time series
have been produced for the region of interest: a Swiss GHD series from 1480 CE
to 2006 (Meier et al., 2007) and a 335-year Central England Temperature (CET) se-
ries (Manley, 1974). In the Burgundy GHD series, harvest dates correlate negatively
with April to August temperatures, September temperatures do not correlate signif-
icantly, and the overall relationship is dominated by interannual variability (Chuine
et al., 2004; Krieger et al., 2011). Two additional series are used in coherence esti-
mates: (1) an estimate of total solar irradiance (TSI) from Stocker et al. (2013) and
1
The series consists of harvest dates taken from multiple sites, but harvest dates were often
selected by the central authority and imposed on all sites.
based on Krivova et al. (2010) and Vieira et al. (2012), and (2) an El Niño–Southern Oscillation (ENSO) reconstruction based on Wilson et al. (2010). TSI can be thought of as a reconstruction of solar brightness, and ENSO represents a record of anomalous sea surface temperatures that are known to affect climate.
The primary GHD data set provides the longest series. The median of the standardized dates over the 18 regions is reported as days after September 1st. A multitaper spectral analysis of the Burgundy GHD series for the years 1678 to 2003 is presented in Tourre et al. (2011). This is the most recent published analysis of the series, but it does not consider the entire series.
A four-stage analysis is presented: (1) Compare the Burgundy GHD series with
other series to determine the magnitude-squared coherence (MSC) and phase coher-
ence, (2) perform analysis of the complete series, (3) use the methodology discussed in
Chapter 3 to locate one change-point, and (4) perform multitaper spectral analysis of
the (two) sections. The novel contributions are the coherence study, which indicates
that the series captures a climate signal, the location of the change-point using the
methodology introduced in Chapter 3, and the multitaper spectral analysis of the
complete and sectioned GHD time series.
We begin by plotting the Burgundy GHD series along with five other series for comparison. Figure 4.1a plots the original series, and Figure 4.1b plots the Swiss GHD as days after September 1st (Meier et al., 2007). Note that there are several large gaps in the first part of the Swiss series. Figure 4.1c plots the CET annual series (Manley, 1974), Figure 4.1d plots the annual phase of the CET series in (angular) degrees,
which was calculated using the monthly temperature series (see Appendix A.6.1), Figure 4.1e plots the estimated TSI in watts per square metre (Stocker et al., 2013), and Figure 4.1f plots three reconstructions of the ENSO cycle in normalized degrees Celsius. Note that the CPR (composite plus regression, which relies on simple averaging of the proxy series) and PCR (principal component regression) reconstructions of the ENSO cycle appear almost identical.
[Figure 4.1 panels: (a) days after Sept 1st; (b) Swiss GHD; (c) CET annual temp (°C); (d) CET annual phase (°); (e) solar irradiance (W/m²); (f) ENSO (°C), TEL, CPR and PCR reconstructions.]
Figure 4.1: (a) Burgundy GHD plotted as number of days after September 1st . Five
additional series are also shown: (b) Swiss GHD as days after September 1st . There
are several large gaps in the first part of this series. (c) CET annual temperature
series. (d) Annual phase of the CET series in (angular) degrees. (e) Estimated TSI
in watts per square metre. (f) Three reconstructions of the ENSO cycle shown in
normalized degrees Celsius.
Figure 4.2 gives multitaper spectral estimates of the GHD series. The crosses at
approximately 0.135 cycles/year indicate the passband bandwidth, 2W , and height
of the approximate theoretical 95% confidence interval based on the $\chi^2_{2K}$ distribution. The confidence interval uses the median number of degrees of freedom from the adaptively weighted spectral estimate, which is generally slightly below 2K. The mean and trend are removed from the GHD series before taking the multitaper spectral estimate. One can detrend by removing a simple mean and linear trend or with smoothing splines; however, we opt to detrend using an expansion into the Slepian sequences employed in Thomson (1982).² We use different time-bandwidth parameters and observe how the plots change. This gives us a sense of whether we are over- or under-smoothing; we control smoothing by selecting the bandwidth or time-bandwidth parameter. The spectra are plotted on a logarithmic scale, and the units of the y-axis are (days after Sept 1)²/(cycles/year). The observed pattern is consistent with that seen in the climate literature. Increasing NW widens the passband, which can smooth out frequency components of interest. For example, the top left plot has a small peak at approximately 1/11 cycles per year, corresponding to a solar cycle, but this component is not large enough to be statistically significant and is smoothed out as the passband, W, increases. To avoid missing lines, the series was zero-padded to 8192. Figure 4.3 presents a harmonic F-test of the entire 634-year series; we note that the peak at a period of 3.9 years is slightly shorter than the 4.14-year period found in Tourre et al. (2011), and we attribute this shift to including the entire GHD series.
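Thomson's (1982) harmonic F-test, used in Figure 4.3, compares the power explained by a single line component at frequency f with the residual power across the K eigencoefficients, giving an F statistic with 2 and 2K − 2 degrees of freedom. A minimal NumPy/SciPy sketch follows, assuming unit-energy DPSS tapers from `scipy.signal.windows.dpss` and zero padding as described in the text; it omits the Slepian detrending used in the thesis, and the function names are hypothetical. The `f_threshold` helper gives the 1 − 1/N rule-of-thumb level used in Figure 4.3.

```python
import numpy as np
from scipy import stats
from scipy.signal.windows import dpss


def harmonic_f_test(x, nw=4.0, k=7, nfft=None):
    """Harmonic F-test at each frequency of a zero-padded grid.
    Returns frequencies, F statistics (2 and 2k - 2 df) and line-amplitude estimates."""
    x = np.asarray(x, float) - np.mean(x)
    n = x.size
    nfft = nfft or 8 * n
    tapers = dpss(n, nw, k)                          # (k, n), unit energy
    y = np.fft.rfft(tapers * x, n=nfft, axis=1)      # eigencoefficients y_k(f)
    u0 = tapers.sum(axis=1)                          # U_k(0); near zero for odd tapers
    mu = (u0[:, None] * y).sum(axis=0) / np.sum(u0**2)   # regression on U_k(0)
    resid = y - mu[None, :] * u0[:, None]
    f_stat = ((k - 1) * np.abs(mu) ** 2 * np.sum(u0**2)
              / np.sum(np.abs(resid) ** 2, axis=0))
    return np.fft.rfftfreq(nfft), f_stat, mu


def f_threshold(n, k):
    """1 - 1/N significance level of F(2, 2k - 2), the rule of thumb used in Figure 4.3."""
    return stats.f.ppf(1.0 - 1.0 / n, 2, 2 * k - 2)


# Illustrative check: a cosine of amplitude 2 at 0.25 cycles/sample in unit white noise.
rng = np.random.default_rng(3)
t = np.arange(512)
x_demo = 2.0 * np.cos(2 * np.pi * 0.25 * t) + rng.standard_normal(t.size)
freqs, f_stat, mu = harmonic_f_test(x_demo, nw=4.0, k=7, nfft=4096)
f_line = freqs[np.argmax(f_stat)]
```

For a real cosine of amplitude A the estimated line coefficient $|\hat\mu|$ is close to A/2, so the demo should give a value near 1 at the detected frequency.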
In this section, we present a multitaper spectral analysis of the complete Burgundy GHD series, and we find a significant harmonic component at 3.9 years, which represents a slight shift from the 4.14-year period found in the multitaper analysis of
2
In our experience, there is little observable difference between different methods of detrending,
and this method is specifically adapted to multitaper spectral estimates.
[Figure 4.2 panels: Spectrum for NW = 3, 4, 5 and 6; x-axis Frequency (cycles/year).]
Figure 4.2: Multitaper spectra of the GHD series. Multitaper spectral estimates were made with NW = 3, 4, 5 and 6 and with K = 5, 7, 9 and 10, respectively, starting at the top left. The crosses at approximately 0.135 cycles/year indicate the passband bandwidth, 2W, and the height of the approximate theoretical 95% confidence interval based on the $\chi^2_{2K}$ distribution. Note that the peak at a period of 3.9 years is close to the 4.14-year period reported by Tourre et al. (2011).
years 1678–2003 in Tourre et al. (2011, p. 247). The plots also demonstrate the effect of increasing the parameters NW and K. The change-point detection method discussed in Section 3.7 is suited to higher NW and K, but a more descriptive plot can be seen with lower values of NW and K.
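The confidence cross in Figure 4.2 follows from treating $\nu \hat S(f)/S(f)$ as approximately $\chi^2_\nu$, with ν = 2K for an unweighted multitaper estimate, or the median adaptive degrees of freedom as described in the text. Because the interval is multiplicative, its height on a logarithmic axis is the same at every frequency, so a single cross suffices. A short SciPy sketch (function name hypothetical):

```python
from scipy import stats


def chi2_interval(spec_estimate, dof, level=0.95):
    """Approximate two-sided interval for S(f), assuming dof * S_hat / S ~ chi2(dof)."""
    alpha = 1.0 - level
    lower = dof * spec_estimate / stats.chi2.ppf(1.0 - alpha / 2.0, dof)
    upper = dof * spec_estimate / stats.chi2.ppf(alpha / 2.0, dof)
    return lower, upper


# NW = 3 with K = 5 tapers gives about 2K = 10 degrees of freedom.
lo_1, hi_1 = chi2_interval(1.0, 10)
lo_100, hi_100 = chi2_interval(100.0, 10)
```

With 10 degrees of freedom the interval runs from roughly 0.49 to 3.08 times the estimate; more tapers (higher NW and K) shrink this ratio, at the cost of a wider passband.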
[Figure 4.3 panels: F-test for NW = 3, 4, 5 and 6; x-axis Frequency (cycles/year).]
Figure 4.3: Harmonic F-test statistic for the harvest dates. The parameter values used are NW = 3, 4, 5 and 6, with K = 5, 7, 9 and 10, respectively. The red dashed line indicates a 1 − 1/N level of significance, where N = 634, in keeping with the rule of thumb for the harmonic F-test (see Section 3.6.1). We note that the most significant peak occurs at a period of 3.9 years, which is close to the period of 4.14 years reported in Tourre et al. (2011, p. 247).
In this section we present several spectral coherence comparisons obtained using the multitaper method to reduce bias (see Section 2.13). We do this to examine whether the GHD series has properties similar to those of other related data sets, using MSC and phase coherence. We do not prewhiten the series with AR filters before examining coherence, but we do remove the mean and trend, as is customary, before working with spectral estimates. We note that MSC has a slight but well-calibrated bias, and we use a multitaper estimate to reduce the bias (Thomson and Chave, 1991a). Expected values for the moments of the MSC are given in equation (5) of Carter et al. (1973b); from this, the expected value of the MSC for independent data is 1/K, where K is the number of independent tapers used in a multitaper MSC estimate.³
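A minimal multitaper MSC estimate can be sketched as below, assuming unit-energy DPSS tapers from `scipy.signal.windows.dpss` and equal weights across tapers (no adaptive weighting, jackknife, or transform, all of which the thesis uses). For independent Gaussian series and K tapers, the MSC is approximately Beta(1, K − 1) distributed, so its mean is 1/K (about 0.14 for K = 7, the bias line in Figure 4.5) and $P(\mathrm{MSC} > c) = (1-c)^{K-1}$. Solving for the 95% point with K = 7 gives 0.393, matching the lower blue line; the thesis obtains its significance levels from the Thomson and Chave normal transformation, so other values may differ slightly. Function names are hypothetical.

```python
import numpy as np
from scipy.signal.windows import dpss


def multitaper_msc(x, y, nw=4.0, k=7, nfft=None):
    """Equal-weight multitaper magnitude-squared coherence of two equal-length series."""
    x = np.asarray(x, float) - np.mean(x)
    y = np.asarray(y, float) - np.mean(y)
    nfft = nfft or x.size
    tapers = dpss(x.size, nw, k)                     # (k, n), unit energy
    ex = np.fft.rfft(tapers * x, n=nfft, axis=1)     # eigencoefficients of x
    ey = np.fft.rfft(tapers * y, n=nfft, axis=1)     # eigencoefficients of y
    cross = np.sum(ex * np.conj(ey), axis=0)
    power_x = np.sum(np.abs(ex) ** 2, axis=0)
    power_y = np.sum(np.abs(ey) ** 2, axis=0)
    return np.fft.rfftfreq(nfft), np.abs(cross) ** 2 / (power_x * power_y)


def msc_null_threshold(k, alpha):
    """c with P(MSC > c) = alpha under independence, from (1 - c)**(k - 1) = alpha."""
    return 1.0 - alpha ** (1.0 / (k - 1))


rng = np.random.default_rng(7)
a, b = rng.standard_normal(8192), rng.standard_normal(8192)
_, msc_indep = multitaper_msc(a, b)    # scatters around 1/7
_, msc_same = multitaper_msc(a, a)     # identically 1
```

The MSC cannot exceed 1 (Cauchy–Schwarz over the K tapers), and with few tapers the null distribution is wide, which is why the bias line and significance levels matter when reading Figures 4.5 and 4.8.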
Figure 4.4 plots the Burgundy and Swiss GHD series, presented in Meier et al. (2007), for the overlapping dates. On average, the Swiss harvest date lags the Burgundy harvest date by 14 days. Figure 4.5 shows the MSC between the Burgundy GHD series and the Swiss series, and the MSC indicates high coherence. The dashed red line in Figure 4.5 indicates the known bias in the estimate; specifically, in this case it shows that a coherence of 0.14 is expected for uncorrelated series. The dashed blue lines indicate MSC values of 0.393 and 0.534, which correspond to significance levels of 95% and 99%, respectively.⁴ The observed MSC is considerably above the expected value for uncorrelated series, and we consider
3
Before plotting coherence, we plot the two series, either together in a single plot or in two adjacent plots. It is customary to plot them on the same page where possible; we opt to plot them in sequence.
4
These significance values are based on the normal transformation from Thomson and Chave
(1991b), and normality was assessed prior to assigning significance.
this as evidence of high data quality (i.e., the records are relatively accurate) and that both data sets capture similar climate variability. The faint dashed line on the coherence plot represents the lower bound of a one-standard-deviation jackknife confidence interval, computed on the inverse hyperbolic tangent scale (the scale of the y-axis). Coherence estimates are jackknifed in the same way as other multitaper spectral estimates (see Section 2.12, page 27). Many of the details in Figure 4.4 track each other; however, at low frequencies the two series often depart from each other for decades. This agrees with the coherence plot, Figure 4.5, where the MSC is lower at low frequencies. We consider this an interesting result: both series have similar signals with periods between 2 and 11 years. Figure 4.6 plots the phase
[Plot: Swiss and Burgundy GHD, days after Sept 1st.]
Figure 4.4: Overlapping section of the Swiss and Burgundy GHD series consisting of
years 1550 to 2003; no prewhitening has been applied, and MSC is presented in the
next plot. We note that the Swiss harvest is on average ∼ 14 days after the Burgundy
harvest.
coherence; note that the slope is relatively flat, once again indicating that both series are affected by similar periodic fluctuations. The nine-day lag is less than the 14-day average lag between the series because the mean is subtracted before computing the spectral estimates; the delay captured here corresponds to changes in year-to-year fluctuations.
[Figure 4.5 plot: magnitude squared coherence versus frequency (cycles/year), with period (years) on the top axis]
Figure 4.5: MSC between Swiss and Burgundy GHDs. The coherence is constructed from overlapped years 1550 to 2003 and is based on the multitaper spectral estimates with parameters NW = 4 and K = 7. The y-axis indicates a normalized MSC; an inverse hyperbolic tangent transform is known to transform the coherence approximately to a standard normal distribution (Thomson and Chave, 1991b). The dashed red line indicates the inherent bias in the estimate; specifically, an MSC of about 0.14 is expected for uncorrelated samples. The faint dashed line on the coherence plot represents the lower limit of a one standard deviation jackknife confidence interval. The two dashed blue lines indicate a significance of 95% and 99%, corresponding to an MSC of 0.39 and 0.54 respectively.
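The bias and significance lines in the MSC figures can be checked with a short calculation. The sketch below assumes the textbook null distribution of a K-taper MSC estimate for two independent Gaussian series, P(MSC > c) = (1 − c)^(K−1), whose mean is 1/K. This is our assumption. It is not the normal-transform procedure of Thomson and Chave (1991b) used for the figures, and the function names are hypothetical. For K = 7 it gives a bias of 0.143 and levels of 0.393 (95%) and 0.536 (99%), matching the Figure 4.5 caption. For K = 11 it gives a bias of 0.091 and an 85% level of 0.173.

```python
def msc_null_mean(k: int) -> float:
    """Expected MSC of two independent Gaussian series estimated with k tapers."""
    return 1.0 / k


def msc_threshold(k: int, level: float) -> float:
    """MSC value c exceeded with probability 1 - level under the null,
    from P(MSC > c) = (1 - c) ** (k - 1)."""
    return 1.0 - (1.0 - level) ** (1.0 / (k - 1))


if __name__ == "__main__":
    for k, levels in [(7, (0.95, 0.99)), (11, (0.85,))]:
        cuts = ", ".join(f"{lv:.0%}: {msc_threshold(k, lv):.3f}" for lv in levels)
        print(f"K = {k}: bias {msc_null_mean(k):.3f}; {cuts}")
```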
The high coherence between the two series represents an incidental but significant result of this thesis. This result provides evidence that two separate GHD series in different countries were subject to similar cyclical climate effects between years 1550 and 2003.
The next coherence study compares the Burgundy GHD series with the annual value of the CET series. The CET monthly series is shorter, so we compare the years 1661 to 2003. Figure 4.7 plots the two series, and Figure 4.8 plots their MSC. Once again the dashed red line indicates the bias inherent in the estimate; uncorrelated series are expected to have an MSC of 0.09. The two dashed blue lines indicate an MSC of 0.173 and 0.211, corresponding to a significance of 85% and 95% respectively. The faint grey line indicates a one standard deviation jackknife confidence interval. Once again the MSC is unusually high, providing evidence of similar cyclical climate effects in both the Burgundy GHD and the CET series.
Next we examine the coherence between the Burgundy GHD and the annual phase of the Central England temperature series. This phase was originally presented in Thomson (1995) and is reproduced in Appendix A.6.1. Figure 4.10 plots the two series next to each other, and Figure 4.11 plots the MSC. The coherence between the two series is modest, especially at low frequencies.⁵ Unlike the other two comparisons, the MSC is not consistently and considerably above the bias value, yet there is still some evidence of coherence in certain frequency ranges. The MSC values of 0.17 and 0.21 represent significance values of 85% and 95% respectively. Finally, Figure 4.12 plots the phase of the coherence. In this case a slope is apparent; the phase is not flat. The slope corresponds to a delay of approximately 305 days: the annual phase of the Central England temperature series lags the Burgundy GHD by approximately 305 days. We cannot explain this observed delay, and further study is required.
⁵ The GHD series is annual. We compute the coherence estimate directly between the harvest date, in days after Sept. 1st, and the annual phase of the Central England temperature series calculated from three years of monthly temperatures.
In these coherence estimates three phenomena are evident. First, low and high frequencies often show distinct patterns. The “low frequencies”, up to ∼0.16 cycles/year (a period of approximately 6 years), often show long-term solar phenomena (Suess and Gleissberg cycles, in addition to the ordinary 11-year solar cycle). It may be coincidence, but the average decay time of a sunspot cycle, where most of the large flux occurs, is 6.3 years. Second, at high frequencies a linear phase trend is often visible, with typical slopes corresponding to a few days. This is about the time required for ordinary weather patterns to drift across the continent, and it illustrates one of the advantages of coherences: one may often reliably detect time delays of a few days in series with annual sampling. Third, the MSC often alternates between high values at periodic climate cycles and low values between them.
The trend in Figure 4.6 is not as obvious with a larger y-range, but a slight trend may be visible between 0.16 and 0.5 cycles/year. We note that a frequency of 0.16 cycles/year corresponds to a ∼6-year period, approximately the shortest cycle in the sunspot record. The trend corresponds to a delay of ∼3 days, possibly the propagation time for ordinary weather patterns between Burgundy and Switzerland.
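The delays quoted in this section and in the phase captions follow from the slope of the coherence phase. A pure delay of τ years adds a phase of 360·f·τ degrees at frequency f (cycles/year). A slope of about 300 degrees per cycle/year is therefore about 304 days, consistent with the ∼305 days reported for Figure 4.12.

The sketch below makes several assumptions:
- The fit is unweighted least squares.
- The phase is already unwrapped.
- Only the magnitude of the delay is reported, because which series leads depends on the sign convention of the cross-spectrum in (2.68).

The function names are hypothetical.

```python
import numpy as np

DAYS_PER_YEAR = 365.25


def slope_to_delay_days(slope_deg_per_cpy: float) -> float:
    """A pure delay of tau years gives phase(f) = 360 * f * tau degrees,
    so tau = slope / 360 years; convert the result to days."""
    return slope_deg_per_cpy / 360.0 * DAYS_PER_YEAR


def delay_from_phase(freqs_cpy, phase_deg):
    """Least-squares line through phase (degrees) versus frequency (cycles/year).
    Returns (|delay| in days, slope, intercept); the lead/lag direction depends
    on the cross-spectrum sign convention and is not interpreted here."""
    slope, intercept = np.polyfit(np.asarray(freqs_cpy, dtype=float),
                                  np.asarray(phase_deg, dtype=float), 1)
    return abs(slope_to_delay_days(slope)), slope, intercept


if __name__ == "__main__":
    f = np.linspace(0.16, 0.5, 40)             # high-frequency band, cycles/year
    phase = 360.0 * f * (3.0 / DAYS_PER_YEAR)  # synthetic 3-day delay, no noise
    print(delay_from_phase(f, phase)[0])       # recovers the 3-day delay
    print(slope_to_delay_days(300.0))          # about 304 days
```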
[Figure 4.6 plot: phase of the coherence (degrees) versus frequency (cycles/year), with period (years) on the top axis; legend: 2 SD approximation, 0 degrees]
Figure 4.6: Phase of the coherence between Burgundy and Swiss GHDs. Coherence is defined in (2.67) and based on the multitaper cross-spectrum in (2.68). In these equations the Burgundy series is represented by x and the Swiss series by y. Two standard deviation confidence intervals are indicated on the plot. The green line represents multitaper jackknife confidence intervals, and the blue line represents approximate theoretical confidence intervals (Bendat and Piersol, 2011, p. 306); the two agree well. The phase is generally consistent with zero, excluding the low-frequency part, and no phase unwrapping was required. Between periods of ∼208 and ∼90 years there is a sharp drop to −69 degrees. Both edge frequencies are well known in the climate literature: 208 years is one of the main “Suess cycles” (Thomson, 1990b), and 90 years is very close to the upper peak, 91.5 years, of the ∼88-year Gleissberg cycle triplet (Peristykh and Damon, 2003). The linear regression line (in grey) has a negative intercept and a positive slope. This indicates that the Swiss series leads the Burgundy series by ∼9 days.
[Figure 4.7 plot: Burgundy harvest days after Sept. 1 and CET annual series versus year]
Figure 4.7: Plots of the Burgundy GHD and the CET annual series for overlapping
years 1661 to 2003.
[Figure 4.8 plot: magnitude squared coherence versus frequency (cycles/year), with period (years) on the top axis]
Figure 4.8: MSC between Central England average annual temperature and the Burgundy harvest dates from 1661 to 2003. The parameters used are NW = 6.5 and K = 11. The dashed red line indicates the bias value of 0.09, and the dashed blue lines indicate MSC values of 0.173 and 0.201. The coherence is modest, particularly at low frequencies. The association between GHD and April to August temperatures in Burgundy has been established (Chuine et al., 2004; Krieger et al., 2011).
[Figure 4.9 plot: phase of the coherence (degrees) versus frequency (cycles/year), with period (years) on the top axis; legend: phase, 2 SD jackknife, 2 SD approximation, regression line]
Figure 4.9: Phase of the coherence between the Burgundy GHD and the average annual temperature of the Central England series for years 1661 to 2003. This figure is based on (2.67), with the Burgundy series represented by x and the Central England series represented by y. The multitaper parameters are NW = 6.5 and K = 11. The linear regression line (in red) has a positive intercept and a positive slope. This indicates that the Burgundy series leads the Central England series by ∼18 days.
[Figure 4.10 plot: Burgundy harvest days after Sept. 1 and corrected CET phase (degrees) versus year]
Figure 4.10: Plot of the Burgundy GHD series and the Central England phase constructed from three years of monthly data. The phase was first corrected for the three-day offset. A discussion of obtaining the phase plot is given in Appendix A.6.1.
[Figure 4.11 plot: magnitude squared coherence and its arctanh transform versus frequency (cycles/year), with period (years) on the top axis]
Figure 4.11: MSC between the annual phase of the Central England temperature series and the Burgundy GHD for years 1661 to 2003. The multitaper parameters are NW = 6.5 and K = 11. The coherence is modest at low frequencies. The annual phase of the Central England temperature series was calculated with the zeroth-order Slepian complex demodulation technique, using a length of N = 36 (3 years of monthly data) and NW = 4.5. The three-day offset for years 1661 to 1752, originally reported in Thomson (1995) and discussed on page 174, was applied. The dashed red line indicates the bias value of 0.091, and the dashed blue lines indicate MSC values of 0.17 and 0.21.
[Figure 4.12 plot: phase of the coherence (degrees) versus frequency (cycles/year), with period (years) on the top axis; legend: phase, 2 SD jackknife, 2 SD approximation, regression line]
Figure 4.12: Phase of the coherence between the Burgundy GHD and the annual phase of the Central England temperature series calculated over three years. This figure is based on (2.67), with the Burgundy series represented by x and the annual phase of the Central England temperature series represented by y. The intercept is positive and the slope is ∼300 degrees per cycle/year, indicating that the Burgundy GHD leads the phase of the CET series by ∼305 days. The multitaper parameters are NW = 6.5 and K = 11.
[Figure 4.13 plot: multitaper spectrogram, frequency (cycles/year) versus time (year)]
Figure 4.13: Multitaper spectrogram with considerable overlap. In this case the block length is 74, there are 71 blocks, and the offset is 8 years. This indicates an overlap of about 89%, but it allows for higher-frequency resolution. The vertical line segment on the left indicates the bandwidth, 2W, and one can see the spectral estimates evolve over time. The centre line indicates where our analysis selects to section the series.
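The block counts and overlaps in the captions of Figures 4.13 and 4.14 follow from simple arithmetic. The sketch below assumes that the GHD series has N = 634 annual values (1370 to 2003). That length is our assumption and is not stated in this section, but with it both block layouts cover the series exactly. `block_layout` is a hypothetical helper.

```python
def block_layout(n: int, block_len: int, offset: int):
    """Blocks of length block_len starting every `offset` samples.
    Returns (number of full blocks, fractional overlap, samples covered)."""
    n_blocks = (n - block_len) // offset + 1
    overlap = (block_len - offset) / block_len
    covered = (n_blocks - 1) * offset + block_len
    return n_blocks, overlap, covered


if __name__ == "__main__":
    n_years = 634                          # assumed: annual values 1370 to 2003
    print(block_layout(n_years, 74, 8))    # spectrogram blocks (Figure 4.13)
    print(block_layout(n_years, 81, 79))   # Bartlett M-test blocks (Figure 4.14)
```

Under that assumption, the first call gives 71 blocks with about 89% overlap. The second gives 8 blocks with about 2.5% overlap. Both cover all 634 values.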
[Figure 4.14 plot: Bartlett M test statistic versus frequency]
Figure 4.14: Bartlett M-test for stationarity using blocks with 2.5% (little) overlap. The expected value (green dashed line) and the 95% significance level (red dotted line) are on the graph. The multitaper parameters used were NW = 3 and K = 5, with 8 blocks, each of length 81, with an offset of 79. The line segment in the top right of the plot indicates the bandwidth. Nonstationary components are approximately between the frequencies of 0.1 and 0.18 cycles/year, and between 0.2 and 0.24 cycles/year.
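Chapter 3 gives the form of the test used in the thesis. As an illustration only, the sketch below applies the textbook Bartlett statistic for homogeneity of variances to block spectra, frequency by frequency. It rests on three assumptions:
- Each block estimate carries ν = 2K degrees of freedom (10 for K = 5), ignoring any reduction from adaptive weighting.
- The blocks are independent, which is reasonable for the 2.5% overlap used here.
- Under stationarity, the corrected statistic is approximately χ² with k − 1 degrees of freedom.

The name `bartlett_m` is hypothetical.

```python
import numpy as np


def bartlett_m(block_spectra, dof):
    """Bartlett's test for homogeneity of variances, applied at each frequency.

    block_spectra: array (k, n_freq) of spectral estimates from k blocks,
    each assumed to have `dof` degrees of freedom (e.g. 2K for K tapers).
    Returns the corrected statistic M / C, approximately chi-square with
    k - 1 degrees of freedom when all blocks share a common spectrum.
    """
    s = np.asarray(block_spectra, dtype=float)
    k = s.shape[0]
    # M = dof * (k * log(arithmetic mean) - sum of logs) >= 0 (AM-GM inequality)
    m = dof * (k * np.log(s.mean(axis=0)) - np.log(s).sum(axis=0))
    c = 1.0 + (k + 1.0) / (3.0 * k * dof)   # Bartlett's small-sample correction
    return m / c


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    k_blocks, nu = 8, 10                      # 8 blocks, K = 5 tapers -> 2K = 10
    # stationary null: every block estimate is S * chi2_nu / nu with a common S
    null = rng.chisquare(nu, size=(k_blocks, 2000)) / nu
    print(bartlett_m(null, nu).mean())        # close to k - 1 = 7
```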
[Figure 4.15 plot: level-of-change by block pair (1 to 7) and frequency]
Figure 4.15: We plot the level-of-change between blocks in the spectrogram for the GHD. If we restrict ourselves to the frequency band of interest identified by the Bartlett M-test, 0.10 to 0.18 cycles/year, we see that considerable change occurs at approximately the centre of the series.
[Figure 4.16 plots: spectrum versus frequency (cycles/year), NW = 3, K = 5; top panel before and bottom panel after the change-point]
Figure 4.16: Multitaper spectra of the GHD before (top) and after (bottom) the year 1675.5. The crosses indicate 95% confidence levels and the width of the bandwidth parameter, 2W. On the upper plot, for the data up to the year 1675, the dashed lines indicate periods of 10.6 years (0.094 cycles/year) and 7.5 years (0.133 cycles/year). On the lower plot, the dashed line indicates a period of 3.9 years (0.278 cycles/year). It appears that a change in the spectral properties of the GHD series occurs when the data are sectioned at the year 1675.
We have presented a coherence study comparing the Burgundy GHD series with similar series. We found considerable coherence between it and the Swiss GHD series and between it and the CET series. We consider this as evidence of data quality and as evidence that a similar climate signal is captured in these series. The phase delays observed in Figures 4.6 and 4.9 indicate that the Burgundy GHD series lags the Swiss GHD series by 9 days, and that the Burgundy GHD series lags the average annual temperature of the Central England series by 18 days.
We applied our spectral analysis change-point detection tool, discussed in Chapter 3, and found a change-point within the Maunder Minimum (1645 to 1715), a period of cooler climate coincident with decreased solar irradiation (Eddy, 1976).
The analysis indicates changing spectra across the time series, with the largest change-point located at 1676. The sectioned series (before and after the change-point) exhibit different spectra, as seen in Figure 4.16. In the future, a similar analysis can, and should, be carried out on the adjacent series. The tool and methodology presented here can be used to section the data into “quasi-stationary” sections, and more in-depth study of the potential causes of the coherence should be considered.
Chapter 5
Goodness-of-fit in Autoregressive Processes
CHAPTER 5. GOODNESS-OF-FIT IN AR PROCESSES 100
5.1 Introduction
This chapter contains a paper submitted to the proceedings of the 2013 Joint Statistical Meetings (Rahim and Thomson, 2013); the paper is presented here with improvements and changes.
In this paper we:
(a) use simulation to show that multitaper spectral estimation in conjunction with the Levinson-Durbin recursions provides accurate estimates of autoregressive (AR) coefficients;
(b) propose a practical test for assessing the goodness-of-fit of AR coefficients;
(c) use simulation to examine the proposed goodness-of-fit test; and
(d) fit several AR models to the Burgundy grape harvest date (GHD) time series.
Our tests find that several models are acceptable for the GHD series. This paper does not consider the problem of AR model order selection; in practical applications we use the Akaike information criterion (AIC).
AR models are used in many applications. For example, they are used to prewhiten
data in engineering applications (Thomson, 1977a) and to prewhiten data in climate
science prior to spectrum estimation (Mann and Lees, 1996). The choice of AR model
and the estimated AR coefficients used in prewhitening can affect the residuals and
the subsequent spectral analysis, masking or enhancing features of the spectrum.
Using simulation, we show that AR coefficients obtained by different methods do not have the same sampling distribution, and we find in our simulations that the Levinson-Durbin
recursions with multitaper spectral estimates and Burg’s algorithm produce unbiased
low-variance estimates. We fit and compare the goodness-of-fit of several AR models
to the Burgundy GHD series (Chuine et al., 2004; Tourre et al., 2011).
Goodness-of-fit tests for AR models have been proposed and discussed in the literature (Priestley, 1981, pp. 475–494; Anderson, 1997). Our test is based on the maximum absolute deviation of the integrated spectrum, originally proposed in Bartlett (1937). As a practical matter, our test uses simulation to determine approximate p-values.
There are multiple methods for fitting AR coefficients, and we review two popular
methods: (a) solving the Yule-Walker equations with Levinson-Durbin recursions,
and (b) using Burg’s recursions with forward and backward estimators.
Some authors argue that the Yule-Walker equations should not be used (De Hoon et al., 1996). We dispute this. We demonstrate that the Yule-Walker equations can be effective in finding coefficients for an AR process with roots close to the unit circle when they are used with an appropriate spectral estimate, such as the multitaper, and solved with the Levinson-Durbin recursions. The Levinson-Durbin recursions avoid the matrix inversion of a straightforward solution of the Yule-Walker equations, and have been found more effective on digital computers (P&W93, p. 403). We note that the Burg method is known to split spectral lines (Ulrych and Bishop, 1975),¹ and there are concerns about the Burg method producing unstable models (Burg et al., 1982).
This paper is organized as follows:
- Section 5.2 discusses the basic theory of AR coefficients and reviews two of the procedures used to calculate the coefficients.
- Section 5.3 reviews some general cautionary notes regarding the use of AR models.
- Section 5.4 compares methods used in obtaining AR coefficients.
- Section 5.5 discusses goodness-of-fit tests for AR coefficients.
- Section 5.6 presents a simulation analysis of goodness-of-fit tests.
- Section 5.7 compares various fitted AR models for the Burgundy GHD series.
- Section 5.8 gives concluding remarks and suggests future work.
¹ With line splitting, a single spectral line can appear as two closely spaced peaks in the power spectrum when the Burg method is used.
5.2.1 Preliminaries
If $\{Z_t\}$ is a purely random process with zero mean and variance $\sigma_Z^2$, indexed by $t = 1, 2, \ldots, N$, then the process $\{X_t\}$ is an AR process of order p (denoted as an AR(p) process) if
$$X_t = \sum_{j=1}^{p} \phi_j X_{t-j} + Z_t. \tag{5.1}$$
Remark 1. Equation (5.1) suggests an analogy between an AR(p) model and a regression problem; however, instead of independent variables, the right side of equation (5.1) has lagged copies of the dependent variable.
removed, or partialed out. The idea of partial correlation was introduced in Yule (1897). This was groundbreaking work because, as is common in modern work, it relies on regression. Yule did not invoke normality, as much of the data he was using was anything but normal.
Remark 3. An alternative notation for AR(p) processes, often used in engineering and in Priestley (1981), is
$$X_t + \sum_{j=1}^{p} \alpha_j X_{t-j} = Z_t,$$
where $\alpha_j = -\phi_j$.
$$\rho_\tau = \frac{\gamma_\tau}{\gamma_0}, \tag{5.4}$$
and $\rho_0 = 1$.
Remark 6. If one replaces $\bar{X}$ with $\mu$ and multiplies by $N/(N-|\tau|)$, one would have an unbiased estimate; however, simply making the second substitution while using an estimator of $\mu$ will not produce an unbiased estimator (Bartlett, 1978).
Theorem 1. The sequence formed by (5.5) is positive definite if and only if the
realizations of X1 , X2 , . . . XN are not all identical.
Remark 7. Generally, fast Fourier transforms (FFTs) are used instead of calculating
the acvs in equation (5.5) directly. (See remark 9.)
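Remarks 7 and 9 and Theorem 1 can be illustrated numerically. The sketch below assumes that (5.5) is the usual biased estimator, with the sample mean removed and a 1/N divisor, as Remark 6 implies. It computes that estimate directly and through the FFT of the zero-padded periodogram. It then checks Theorem 1 by confirming that the resulting symmetric Toeplitz matrix has positive eigenvalues. The function names are hypothetical.

```python
import numpy as np


def acvs_biased(x):
    """Biased acvs estimate (1/N divisor, mean removed), lags 0..N-1, computed directly."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    n = d.size
    return np.array([np.dot(d[: n - tau], d[tau:]) / n for tau in range(n)])


def acvs_biased_fft(x):
    """Same estimate via FFT: the inverse transform of the periodogram |DFT|^2 / N.
    Zero-padding to at least 2N - 1 points prevents circular wrap-around."""
    d = np.asarray(x, dtype=float)
    d = d - d.mean()
    n = d.size
    nfft = 1 << (2 * n - 1).bit_length()      # power of two >= 2N - 1
    pgram = np.abs(np.fft.rfft(d, nfft)) ** 2 / n
    return np.fft.irfft(pgram, nfft)[:n]


def toeplitz_from_acvs(acvs, p):
    """p x p symmetric Toeplitz matrix with entries acvs[|i - j|]."""
    idx = np.abs(np.subtract.outer(np.arange(p), np.arange(p)))
    return np.asarray(acvs)[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(128)
    g1, g2 = acvs_biased(x), acvs_biased_fft(x)
    print(np.max(np.abs(g1 - g2)))                              # rounding error only
    print(np.linalg.eigvalsh(toeplitz_from_acvs(g1, 20)).min() > 0)
```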
We will often examine and estimate the spectral density function (SDF), which is the Fourier transform of the acvs,
$$S(f) = \sum_{\tau=-\infty}^{\infty} \gamma_\tau e^{-i2\pi f\tau}. \tag{5.6}$$
In equation (5.6), we allow $\tau \in \mathbb{Z}$; thus we are considering a process with infinite past and future. Frequency, f, takes values in [−1/2, 1/2]. Because S(f) = S(−f) for a real-valued process, it suffices to consider [0, 1/2]. The above equality holds only in a mean-square sense, but it can be considered pointwise in all practical applications.² The fact that autocovariances and the spectral density function are a Fourier transform pair was first discovered in 1914 (Einstein, 1987) but was overlooked (see Yaglom, 1987b). It was rediscovered independently by Wiener and Khintchine in the 1930s.
² A sequence of functions $\{f_n\}$ converges pointwise to the function f if and only if $\lim_{n\to\infty} f_n(x) = f(x)$ for every x. In practice, one uses the acvs and spectrum as Fourier transform pairs.
$$S_{AR}(f) = \frac{\sigma_Z^2}{\left|1 - \sum_{j=1}^{p} \phi_j e^{-i2\pi f j}\right|^2}, \quad \text{for } |f| \le 1/2, \tag{5.7}$$
In the direct spectral estimator, equation (5.8), ∆t is the sampling interval and $h_t$ is a data taper. If we set $h_t = 1/\sqrt{N}$, the direct spectral estimator becomes the so-called periodogram, which we denote as $\hat{S}(f)$. The raw periodogram is asymptotically unbiased, but substantial bias can persist even with large sample sizes in practical applications (Thomson, 1982, p. 1058). Additionally, the periodogram is an inconsistent statistical estimator; that is, its variance does not decrease as the sample size increases (Rayleigh, 1903). It can be shown that, for real data with $h_t = 1/\sqrt{N}$, this estimator is (asymptotically) distributed as a scaled $\chi^2_2$ random variable at all frequencies except f = 0 and f = 1/2. These two frequencies contain only real values and thus have a scaled $\chi^2_1$ distribution (Blackman and Tukey, 1959).
Remark 9. The periodogram and the biased estimator of the acvs, equation (5.5), are
Fourier transform pairs,
We will use a set of orthonormal discrete prolate spheroidal sequences (DPSSs), also known as Slepian sequences, as data tapers. These sequences are defined as solutions to the system of equations (Slepian, 1978)
$$\sum_{t'=0}^{N-1} \frac{\sin[2\pi W (t - t')]}{\pi (t - t')}\, v_{t',k}(N, W) = \lambda_k(N, W)\, v_{t,k}(N, W). \tag{5.9}$$
The individual $\hat{S}_k(f)$ estimates are known as eigenspectra, and the averaged estimator, equation (5.10), is approximately distributed as a scaled $\chi^2_{2K}$ random variable for $f \neq 0$ and $f \neq 1/2$.
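The sketch below illustrates (5.9) and the simple average of eigenspectra in (5.10). It computes the Slepian sequences from the tridiagonal matrix that commutes with the sinc kernel in (5.9), the approach taken, for example, by SciPy's `scipy.signal.windows.dpss`. It then checks numerically that the sequences satisfy (5.9) and averages K eigenspectra with equal weights. The adaptive weighting of Remark 11 is not implemented. A unit sampling interval is assumed, and the function names are hypothetical.

```python
import numpy as np


def dpss_tapers(n, nw, k):
    """First k Slepian sequences for length n and time-bandwidth product nw.

    Uses the tridiagonal matrix that commutes with the sinc kernel of (5.9);
    its eigenvectors for the k largest eigenvalues are the DPSSs.
    Returns a (k, n) array with orthonormal rows."""
    w = nw / n
    t = np.arange(n)
    diag = ((n - 1 - 2 * t) / 2.0) ** 2 * np.cos(2 * np.pi * w)
    off = t[1:] * (n - t[1:]) / 2.0
    tri = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)
    _, vecs = np.linalg.eigh(tri)              # eigenvalues in ascending order
    return vecs[:, ::-1][:, :k].T


def sinc_kernel(n, w):
    """Matrix with entries sin(2 pi W (t - t')) / (pi (t - t')), 2W on the diagonal."""
    d = np.subtract.outer(np.arange(n), np.arange(n)).astype(float)
    safe = np.where(d == 0, 1.0, d)            # avoid 0/0 on the diagonal
    a = np.sin(2 * np.pi * w * safe) / (np.pi * safe)
    return np.where(d == 0, 2 * w, a)


def multitaper_spectrum(x, nw=4.0, k=7, nfft=None):
    """Simple (unweighted) multitaper estimate: the average of k eigenspectra.
    Returns (frequencies in cycles/sample from 0 to 1/2, spectrum)."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    nfft = nfft or 2 * n
    tapers = dpss_tapers(n, nw, k)
    eigenspectra = np.abs(np.fft.rfft(tapers * x, nfft, axis=1)) ** 2
    return np.fft.rfftfreq(nfft), eigenspectra.mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(2)
    freqs, spec = multitaper_spectrum(rng.standard_normal(512), nw=4.0, k=7)
    print(spec.mean())                          # near 1 for unit-variance white noise
```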
Remark 10. When using Slepian sequences, one selects the time-bandwidth parameter NW, which, for a given N, specifies W. Typically one sets the time-bandwidth parameter between 2 and 6, and noninteger values can be used (Thomson, 1982, p. 1086). Judicious selection of this parameter can allow, for example, the resolution of a lower-power harmonic that would otherwise be masked by an adjacent higher-power harmonic.
Remark 11. In practice, we will use the adaptive weighted multitaper spectral estimate, $\hat{S}^{(AMT)}(f)$, which uses a sophisticated weighted averaging scheme. This scheme generally down-weights higher-order eigenspectra, which have higher bias. See Thomson (1982, pp. 1065–1066) for more details. The weighted averaging scheme provides a non-integer degree-of-freedom estimate at each frequency. This estimate is typically slightly below 2K, but can also be significantly lower.
The Yule-Walker equations are the oldest method for estimating the parameters of a zero-mean stationary AR(p) process $\{X_t\}$. The method involves the following steps:
(a) Assume the process is stationary. [This step may seem a bit circular.]
(b) Multiply equation (5.1) by $X_{t-k}$ for k = 1, 2, ..., p.
(c) Take expected values,
$$\gamma_k = \sum_{j=1}^{p} \phi_j \gamma_{k-j} \quad \text{for all } k > 0. \tag{5.11}$$
Using the fact that $X_{t-k}$ is uncorrelated with noise that occurs after time t − k, we see that $E\{Z_t X_{t-k}\} = 0$.
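(d) As $\gamma_{-j} = \gamma_j$, we write the Yule-Walker equations as
$$\begin{aligned}
\gamma_1 &= \phi_1\gamma_0 + \phi_2\gamma_1 + \cdots + \phi_p\gamma_{p-1} \\
\gamma_2 &= \phi_1\gamma_1 + \phi_2\gamma_0 + \cdots + \phi_p\gamma_{p-2} \\
&\;\;\vdots \\
\gamma_p &= \phi_1\gamma_{p-1} + \phi_2\gamma_{p-2} + \cdots + \phi_p\gamma_0,
\end{aligned} \tag{5.12}$$
or, in matrix form,
$$\boldsymbol{\gamma}_p = \Gamma_p \boldsymbol{\phi}_p, \tag{5.13}$$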
The symmetric Toeplitz matrix $\Gamma_p$ [equation (5.14)] must be positive definite for the procedure to make sense. If it is not, one obtains nonsensical results such as negative prediction variances. This is the reason that one uses the biased form $\hat{\gamma}^B_\tau$ in (5.5). If one replaces the 1/N with $1/(N-|\tau|)$, the resulting unbiased estimates need not be positive definite. However, if the correlations are the Fourier transform of a positive spectrum, they are guaranteed to be positive definite by Bochner's theorem.
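(e) Thus we have
$$\boldsymbol{\phi}_p = \Gamma_p^{-1} \boldsymbol{\gamma}_p. \tag{5.15}$$
The variance of the white-noise term in equation (5.1) is then
$$\sigma_Z^2 = \gamma_0 - \boldsymbol{\phi}_p^T \boldsymbol{\gamma}_p. \tag{5.16}$$
(f) Finally, the method of moments is used: the estimator $\hat{\gamma}^B_\tau$ from equation (5.5) is used to form $\hat{\boldsymbol{\gamma}}_p$ and $\hat{\Gamma}_p$ in equations (5.15) and (5.16), and these equations become
$$\hat{\boldsymbol{\phi}}_p = \hat{\Gamma}_p^{-1} \hat{\boldsymbol{\gamma}}_p \quad \text{and} \quad \hat{\sigma}_Z^2 = \hat{\gamma}_0 - \hat{\boldsymbol{\phi}}_p^T \hat{\boldsymbol{\gamma}}_p. \tag{5.17}$$
Note that the vector $\boldsymbol{\gamma}_p$ contains the autocovariances $\gamma_1, \ldots, \gamma_p$, and $\boldsymbol{\phi}_p$ contains the coefficients $\phi_1, \ldots, \phi_p$ of an AR(p) process.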
The Yule-Walker equations can be solved by matrix inversion but are generally solved
using the Levinson-Durbin recursions, which are related to a modified Cholesky [lower
triangular matrix] decomposition. The code must be written carefully, and 64-bit
floating point arithmetic is required to avoid loss of precision.
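The recursions are short enough to state in full. The sketch below follows the standard Levinson-Durbin form, which may differ in detail from the implementation used in the thesis. It is checked against the AR(2) process φ = (0.75, −0.5)^T of P&W93 (p. 45), which is used later in this chapter, with σ_Z² = 1 assumed. Its exact autocovariances follow from the Yule-Walker equations, so the recursions should return φ and σ_Z² exactly, together with the partial autocorrelations φ_{k,k}. The function name is hypothetical.

```python
import numpy as np


def levinson_durbin(acvs, p):
    """Solve the Yule-Walker equations (5.13) with the Levinson-Durbin recursions.

    acvs: autocovariances gamma_0, ..., gamma_p (converted to 64-bit floats).
    Returns (phi, pacf, sigma2): the AR(p) coefficients phi_{p,1..p}, the
    partial autocorrelations phi_{k,k} for k = 1..p, and the innovation variance."""
    g = np.asarray(acvs, dtype=np.float64)
    phi = np.zeros(0)
    pacf = np.zeros(p)
    sigma2 = g[0]                                  # order-0 prediction variance
    for k in range(1, p + 1):
        # phi_{k,k} = (gamma_k - sum_j phi_{k-1,j} gamma_{k-j}) / sigma2_{k-1}
        phi_kk = (g[k] - np.dot(phi, g[k - 1:0:-1])) / sigma2
        # phi_{k,j} = phi_{k-1,j} - phi_{k,k} phi_{k-1,k-j}, j = 1..k-1
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        sigma2 *= 1.0 - phi_kk ** 2
        pacf[k - 1] = phi_kk
    return phi, pacf, sigma2


if __name__ == "__main__":
    # Exact acvs of the AR(2) process phi = (0.75, -0.5) with sigma_Z^2 = 1
    rho1 = 0.75 / (1 + 0.5)
    rho2 = 0.75 * rho1 - 0.5
    gamma0 = 1.0 / (1 - 0.75 * rho1 + 0.5 * rho2)
    acvs = gamma0 * np.array([1.0, rho1, rho2])
    print(levinson_durbin(acvs, 2))
```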
5.2.3.1 Preliminaries
where 1 ≤ i ≤ j ≤ N .
where $X_p = W_{N-(p-1),N} X$; that is, $X_p$ is the vector of the last p elements of $X = (X_1, X_2, \ldots, X_N)^T$. The mean-squared one-step-ahead prediction error is given by
$$P_{N+1} = E\{(\overrightarrow{X}_{N+1}(p) - X_{N+1})^2\} = \gamma_0 - \boldsymbol{\gamma}_p^T \Gamma_p^{-1} \boldsymbol{\gamma}_p. \tag{5.20}$$
It has been noted that there is no reason to restrict oneself to the acvs computed from untapered spectral estimates (Thomson, 1977a, p. 1773); one can use direct spectral estimators and multitaper estimates. It has been shown (e.g., P&W93, pp. 405–406) that a direct spectral estimator using a Slepian taper with NW = 2 accurately depicts the theoretical spectrum of a known AR(4) process. We present simulations comparing different estimates for a known AR(4) sequence.
5.2.4.1 Overview
of γ̂τ. It does this by focusing on minimizing the error in the one-step-ahead and one-step-back prediction estimates (Burg, 1968). In practice, the Burg algorithm has been shown to be more efficient than the Levinson-Durbin recursions with the standard biased estimator, $\hat{\gamma}^B_\tau$, for smaller sample sizes. Our simulations indicate that the Burg method is considerably more effective than the Levinson-Durbin recursions when the standard biased estimator $\hat{\gamma}^B_\tau$ is used. However, it is not significantly more effective when $\hat{\gamma}_\tau$ is computed from a multitaper spectral estimate. The Burg estimator is not without its own drawbacks, such as splitting lines (Ulrych and Bishop, 1975) (see Section 5.3).
Some authors consider Burg’s method more effective when roots of the characteristic polynomial are close to the unit circle (De Hoon et al., 1996). However, in Section 5.4, we show that the Yule-Walker equations, when used with the multitaper method, are as effective as the Burg method when applied to an AR(4) example with roots close to the unit circle. It was recognized in the 1970s that the Burg method both split lines and gave spectrum estimates with very high variance. A good example is given in Figure 2, Section VII of Burg et al. (1982). Various patches and corrections have been suggested, for example in Kaveh and Lippert (1983), but these destroy the elegance of Burg’s original proposal and are usually not included in code. Further, because the algorithm minimizes the sum of the forward and reverse prediction variances, it is very sensitive to the stationarity assumption.
One other practical concern about the Burg method due to missing lines is based
on the following quote:
5.2.4.2 Preliminaries
We write the prediction error associated with the one-step-ahead AR(p) predictor, equation (5.19), as
$$\overrightarrow{\epsilon}_t(p) = X_t - \overrightarrow{X}_t(p). \tag{5.25}$$
We write the prediction error associated with the one-step-back AR(p) predictor, equation (5.26), as
$$\overleftarrow{\epsilon}_t(p) = X_t - \overleftarrow{X}_t(p). \tag{5.27}$$
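$$L\mathbf{v} = (v_N, v_1, v_2, \ldots, v_{N-1})^T,$$
We define the variance $\tilde{\sigma}_0^2 = \hat{\gamma}^B_0$; then for k = 1, 2, ..., p we recursively compute the following:
$$\begin{aligned}
\tilde{\phi}_{k,k} &= \frac{2\langle M_{k+1,N}\overrightarrow{e}(k-1),\, M_{k+1,N}\overleftarrow{e}(k-1)\rangle}{\|M_{k+1,N}\overrightarrow{e}(k-1)\|^2 + \|M_{k+1,N}\overleftarrow{e}(k-1)\|^2} \\
\tilde{\sigma}_k^2 &= \tilde{\sigma}_{k-1}^2 \,(1 - \tilde{\phi}_{k,k}^2) \\
\overrightarrow{e}(k) &= \overrightarrow{e}(k-1) - \tilde{\phi}_{k,k}\, \overleftarrow{e}(k-1) \\
\overleftarrow{e}(k) &= L\big(\overleftarrow{e}(k-1) - \tilde{\phi}_{k,k}\, \overrightarrow{e}(k-1)\big),
\end{aligned} \tag{5.28}$$
where we use ⟨·, ·⟩ to denote the vector inner product and ‖·‖² to denote the squared norm.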
The Burg estimator φ̃k,k differs from the Yule-Walker estimator φ̂k,k . The key
point of the Burg estimator is that an estimator of acvs, typically γ̂τB for τ > 0, is no
longer required, whereas for the Yule-Walker equations, an estimator of the acvs is
required for integer values of τ ≤ p.
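For comparison with the vector form (5.28), the sketch below gives Burg's recursions in the usual scalar-index notation. The forward and backward errors start as the demeaned series, $\tilde{\sigma}_0^2$ is the biased $\hat{\gamma}_0$, and the coefficient update is the same Levinson step used for the Yule-Walker solution. It is an illustration under these assumptions, not the code used for the thesis simulations. The test process is the AR(2) model from P&W93 used later in this chapter, and `burg` and `simulate_ar` are hypothetical helpers (the latter uses unit innovation variance).

```python
import numpy as np


def burg(x, p):
    """Burg's recursions for an AR(p) model with scalar time indices.

    f holds forward prediction errors and b backward prediction errors; at
    stage k the backward errors are used one step earlier (b[t - 1]), which
    plays the role of the shift operator L in the vector form (5.28).
    Returns (phi, sigma2)."""
    x = np.asarray(x, dtype=np.float64)
    x = x - x.mean()
    f = x.copy()
    b = x.copy()
    sigma2 = np.dot(x, x) / x.size                # sigma_0^2 = biased gamma_0
    phi = np.zeros(0)
    for k in range(1, p + 1):
        ff = f[k:].copy()                         # f_{k-1}(t),   t = k..N-1
        bb = b[k - 1:-1].copy()                   # b_{k-1}(t-1), t = k..N-1
        phi_kk = 2.0 * np.dot(ff, bb) / (np.dot(ff, ff) + np.dot(bb, bb))
        f[k:] = ff - phi_kk * bb                  # forward errors of order k
        b[k:] = bb - phi_kk * ff                  # backward errors of order k
        phi = np.append(phi - phi_kk * phi[::-1], phi_kk)
        sigma2 *= 1.0 - phi_kk ** 2
    return phi, sigma2


def simulate_ar(phi, n, rng, burn=500):
    """Simulate X_t = sum_j phi_j X_{t-j} + Z_t with unit-variance Gaussian Z_t."""
    p = len(phi)
    z = rng.standard_normal(n + burn)
    x = np.zeros(n + burn)
    for t in range(p, n + burn):
        x[t] = np.dot(phi, x[t - p:t][::-1]) + z[t]
    return x[burn:]


if __name__ == "__main__":
    rng = np.random.default_rng(3)
    x = simulate_ar([0.75, -0.5], 4000, rng)
    print(burg(x, 2))                             # roughly ([0.75, -0.5], 1.0)
```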
mates
We generally consider AR models useful for prewhitening data, but we caution against their use in general spectral estimation in the physical sciences. Kaveh and Lippert (1983) believe that AR spectral estimation can be patched for use, but Tukey (1984) condemned the general use of parametric spectral estimation. For a general overview of the problems of parametric spectral estimation, see Kay and Marple (1981).
Two specific problems arise when AR spectral estimates are used for a sinusoid in additive noise:
(a) the location of the peak in the spectrum is found to depend on the phase of the sinusoid; and
(b) a single spectral peak can appear as two adjacent peaks (Ulrich, 1970).
The second problem is known in the literature as spectral line splitting.
Two proposed solutions are (a) replacing the real-valued signal with an analytic signal (Kay and Marple, 1981, p. 1396) and (b) using improved estimates of the autocorrelation function, equation (5.1). The first solution must consider taking an appropriate Hilbert transform that does not have the same bias properties as estimates based on the raw (biased) periodogram, and the process becomes complicated in the presence of multiple lines. We take the latter approach.
ficients
−0.9238 using the Levinson-Durbin recursions with the biased autocovariance estimator, to an autocovariance estimator based on a single Slepian taper with NW = 5, and to an autocovariance estimator constructed using the adaptive multitaper method with NW = 5 and k = 5. The acvs estimators were calculated from the estimated spectrum using the property in Remark 9. Figure 5.1 is a comparison of partial autocorrelation coefficients. In this simulation, the multitaper spectral estimate and the Burg estimate are preferred, and the use of a single Slepian taper is preferred to the standard biased acvs estimator. This table gives a non-trivial example where the Yule-Walker equations based on the multitaper spectral estimate and solved with the Levinson-Durbin recursions are as effective as the Burg method.
Table 5.1: Comparisons of estimates of φ4,4 from 100,000 simulation runs. The estimates use the Yule-Walker equations with the biased autocovariance estimator, an autocovariance estimator using one Slepian taper with NW = 5, and an adaptive weighted multitaper spectral estimate with NW = 5 and k = 8, together with the partial autocorrelation estimator made using Burg’s method.
cesses
[Figure plot: density of the estimates by method (default, mtm, taper)]
5.5.1 Preliminaries
Note that tests based on the integrated spectrum, standardized or not, are generally not considered to be affected by the bias properties of the raw periodogram (Priestley, 1981, p. 471). In the case of real-valued data, equation (5.29) can be adjusted to consider only positive frequencies (see Priestley, 1981, p. 473). The standardized integrated spectrum can be written as
$$F(f) = \frac{\int_{-1/2}^{f} S(\xi)\,d\xi}{\int_{-1/2}^{1/2} S(\xi)\,d\xi}, \tag{5.30}$$
which we estimate using the standard spectral estimator in equation (5.8). The
goodness-of-fit tests draw on the correspondence between the standardized integrated
spectrum and the empirical distribution function, and standardization provides the
advantage that asymptotic distributions are valid under more general conditions than
those without standardization (Anderson, 1997). As with the integrated spectrum,
equation (5.29), this estimator can be constructed from only positive frequencies when
restricted to real-valued data. Tests using the integrated spectrum are generally poor
because they are insensitive to lower-power parts of the spectrum.
An overview of goodness-of-fit tests for AR and moving average (MA) models is presented in Priestley (1981, pp. 475–494). We will use the maximum absolute deviation of the integrated spectrum as a measure of goodness-of-fit, and we will use simulations to estimate p-values for the observed maximum absolute deviation. We note that Anderson (1997) proposes the same test statistic, the maximum absolute deviation of the integrated spectrum. He uses it to test the null hypothesis that the observations are from an AR process of an order not greater than the specified one. In place of
$$\max_{0 \le f \le 1/2} \sqrt{N}\, \big| \hat{F}_+(f) - F_+(f) \big|, \tag{5.31}$$
to the Kolmogorov-Smirnov statistic, which has been used in testing the goodness-of-fit of empirical distributions (Priestley, 1981, p. 480). We use the subscript + to indicate that we are constructing the estimate solely from positive frequencies (see Section 5.5.1).
Limiting distributions for the goodness-of-fit tests have been studied by Anderson (1997),
but practical software implementations are not readily available, so we propose a simple
simulation-based statistical test. In addition, simulations do not constrain us to
a one-size-fits-all approach. We propose (a) careful fitting of AR coefficients, (b)
plotting the estimated spectrum against the theoretical spectrum (see equation (5.7))
for the selected AR model, and (c) comparing the estimated standardized integrated
spectrum to the theoretical standardized integrated spectrum of the AR model, using the
maximum absolute deviation as the test statistic. We then use simulations to assess the
significance of the observed distance.
In constructing the theoretical AR spectrum used in the standardized integrated
spectrum, we estimate $\sigma_Z^2$ in equation (5.7) from the data.
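The three steps above, together with the simulation-based p-value, can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the thesis's implementation. The periodogram stands in for the multitaper estimate. Each integrated spectrum is standardized by dividing it by its total over the positive Fourier frequencies, so $\sigma_Z^2$ cancels rather than being estimated. The helper names (`ar_spectrum`, `simulate_ar`, `std_int_periodogram`, `max_abs_dev`, `mc_pvalue`) are hypothetical. The statistic is $\sqrt{N}$ times the maximum absolute deviation, as in (5.31). The p-value is the proportion of simulated statistics at least as large as the observed one.

```python
import numpy as np

def ar_spectrum(phi, freqs, sigma2=1.0):
    """Theoretical AR(p) spectrum sigma2 / |1 - sum_j phi_j exp(-i 2 pi f j)|^2 (unit sampling)."""
    lags = np.arange(1, len(phi) + 1)
    transfer = 1 - np.exp(-2j * np.pi * np.outer(freqs, lags)) @ np.asarray(phi, float)
    return sigma2 / np.abs(transfer) ** 2

def simulate_ar(phi, n, n_sims, rng, burn=200):
    """n_sims independent AR(p) series of length n (one per row) with N(0, 1) innovations."""
    phi = np.asarray(phi, float)
    p = len(phi)
    x = np.zeros((n_sims, n + burn))
    z = rng.standard_normal((n_sims, n + burn))
    for t in range(p, n + burn):
        # x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + z_t, vectorized over series
        x[:, t] = x[:, t - p:t][:, ::-1] @ phi + z[:, t]
    return x[:, burn:]

def std_int_periodogram(x):
    """Cumulative periodogram over positive Fourier frequencies, normalized to end at 1."""
    x = x - x.mean(axis=-1, keepdims=True)
    pgram = np.abs(np.fft.rfft(x, axis=-1)[..., 1:]) ** 2   # drop f = 0
    cum = np.cumsum(pgram, axis=-1)
    return cum / cum[..., -1:]

def max_abs_dev(x, phi):
    """sqrt(N) * max_f |F_hat_+(f) - F_+(f)| over the positive Fourier frequencies."""
    n = x.shape[-1]
    s = ar_spectrum(phi, np.arange(1, n // 2 + 1) / n)
    f_model = np.cumsum(s) / s.sum()
    return np.sqrt(n) * np.max(np.abs(std_int_periodogram(x) - f_model), axis=-1)

def mc_pvalue(x_obs, phi, n_sims=500, seed=0):
    """Observed statistic and Monte Carlo p-value under the fitted AR model."""
    rng = np.random.default_rng(seed)
    t_obs = max_abs_dev(x_obs, phi)
    t_sim = max_abs_dev(simulate_ar(phi, len(x_obs), n_sims, rng), phi)
    return t_obs, (1 + np.sum(t_sim >= t_obs)) / (n_sims + 1)

# Usage: an AR(2) series tested against the correct model and against an AR(1) misfit.
rng = np.random.default_rng(1)
x = simulate_ar([0.75, -0.5], 256, 1, rng)[0]
stat_fit, p_fit = mc_pvalue(x, [0.75, -0.5], n_sims=200)
stat_bad, p_bad = mc_pvalue(x, [0.9], n_sims=200)
print(f"AR(2) model: stat = {stat_fit:.2f}, p = {p_fit:.3f}")
print(f"AR(1) model: stat = {stat_bad:.2f}, p = {p_bad:.3f}")
```

With 200 simulations the p-value is coarse: its smallest possible value is 1/201. The comparisons reported in this chapter use 40000 simulations.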
We assess goodness-of-fit for two AR models used in the literature: the AR(4) model
discussed in Figure 5.1 and the AR(2) model with $\phi = (0.75, -0.5)^T$ (P&W93, p. 45).
Figure 5.2 compares empirical distributions of the distance for four cases in our
simulations. In two cases the simulated AR process matches the theoretical model, and
in two cases it does not. The top two plots show the distributions of the distances
when the models fit, and the bottom two show the distributions for misfit models. The
x-axis ranges differ considerably between the top and bottom plots because the misfit
models generate larger distances. The red lines show fitted Gamma distributions, and
Table 5.2 gives the shape and rate parameters of the fitted Gamma distributions.
The probability density of the gamma distribution is
$$f(x; k, \theta) = \frac{x^{k-1} e^{-x/\theta}}{\theta^{k}\,\Gamma(k)} \quad \text{for } x > 0 \text{ and } k, \theta > 0. \qquad (5.32)$$
In (5.32), $\theta$ is the scale parameter and $k$ is the shape parameter. The inverse
scale parameter, $\beta = 1/\theta$, is called the rate parameter.
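A small Python sketch of (5.32) and of a simple Gamma fit follows. The excerpt does not state how the Gamma distributions in Table 5.2 were fitted. The method-of-moments fit below is therefore a stand-in assumption, with shape $k = \bar{x}^2/s^2$ and rate $\beta = \bar{x}/s^2$. The helper names `gamma_pdf` and `gamma_fit_moments` are hypothetical.

```python
import math
import numpy as np

def gamma_pdf(x, k, theta):
    """Gamma density (5.32): shape k, scale theta (rate beta = 1/theta); zero for x <= 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    pos = x > 0
    out[pos] = x[pos] ** (k - 1) * np.exp(-x[pos] / theta) / (theta ** k * math.gamma(k))
    return out

def gamma_fit_moments(samples):
    """Method-of-moments fit: shape = mean^2 / var, rate = mean / var."""
    m = np.mean(samples)
    v = np.var(samples, ddof=1)
    return m * m / v, m / v

# Usage: recover shape 3 and rate 2 (scale 0.5) from simulated gamma draws.
draws = np.random.default_rng(0).gamma(shape=3.0, scale=0.5, size=200_000)
shape_hat, rate_hat = gamma_fit_moments(draws)
print(f"shape = {shape_hat:.3f}, rate = {rate_hat:.3f}")
```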
To see how the simulated integrated spectra compare with the theoretical ones, the
top-left plot in Figure 5.3 shows the observed integrated spectrum from the AR(4)
simulation run with the largest maximum absolute deviation among the 40000 simulations,
plotted against the theoretical integrated spectrum for the AR(4) process. The top-right
plot shows the AR(2) simulation run with the largest maximum absolute deviation among
the 40000 simulations, plotted against the theoretical integrated spectrum for the AR(2)
process. Comparing the bottom-left and bottom-right plots raises the question of whether
a misfit to an AR(4) process is easier to detect than a misfit to an AR(2) process.
[Figure 5.2 panels: histograms of the maximum absolute distance (y-axis: Density) with fitted Gamma densities; top: AR(4) Fit and AR(2) Fit; bottom: the two misfit cases.]
Figure 5.2: Distributions of the maximum absolute distance observed in 40000
simulations. The top-left plot compares a simulated AR(4) to the theoretical AR(4),
the top-right plot compares a simulated AR(2) to the theoretical AR(2), the
bottom-left plot compares a simulated AR(4) to the theoretical AR(2), and the
bottom-right plot compares a simulated AR(2) to the theoretical AR(4). Note that the
axis scales differ between panels.
A link between Burgundy GHD and European climate fluctuations has been pro-
posed (Tourre et al., 2011), and the Burgundy Pinot Noir grape is considered to be
highly sensitive to climate variations; specifically, earlier harvest dates correspond
to higher April to August temperatures (Chuine et al., 2004; Krieger et al., 2011).
Table 5.2: Shape and rate parameters with their respective standard errors, abbrevi-
ated SE, for the fitted Gamma distributions shown in Figure 5.2. Both the shape and
rate parameters are considerably higher for the case where the simulated AR model
did not match the theoretical model.
Prewhitening the data reduces bias in spectral estimation (Thomson, 1990b), and AR models
are an efficient prewhitening tool (Thomson, 1990a). Mann and Lees (1996) propose
removing spectral lines, fitting an AR(1) process, and then assessing the significance
of harmonic components in the spectrum using confidence intervals from the fitted
AR(1) model. Figure 5.4 presents the raw spectrum of the GHD series and the
spectra of several AR models. These include models of the same order whose coefficients
were obtained with different techniques. The plot suggests that the choice of AR
prewhitening model can affect harmonic analysis of the residuals. We proceed to fit
several AR models to the Burgundy GHD series and test them for goodness-of-fit.
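AR prewhitening can be sketched in three steps. First, estimate the autocovariance sequence (acvs). Second, solve the Yule-Walker equations with the Levinson-Durbin recursions. Third, filter the series with the fitted AR polynomial. This Python sketch uses the ordinary (untapered, biased) sample acvs. The approach favoured at the end of this chapter would instead derive the acvs from a multitaper spectral estimate, which is not shown. The function names `sample_acvs`, `levinson_durbin` and `prewhiten` are hypothetical.

```python
import numpy as np

def sample_acvs(x, max_lag):
    """Biased sample autocovariance s_tau = (1/N) sum_t (x_t - xbar)(x_{t+tau} - xbar)."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    return np.array([np.dot(x[: n - tau], x[tau:]) / n for tau in range(max_lag + 1)])

def levinson_durbin(acvs, p):
    """Solve the order-p Yule-Walker equations; return (phi_1..phi_p, innovations variance)."""
    phi = np.zeros(p)
    err = acvs[0]
    for k in range(1, p + 1):
        # reflection coefficient (partial autocorrelation) at order k
        refl = (acvs[k] - np.dot(phi[: k - 1], acvs[k - 1:0:-1])) / err
        phi[: k - 1] = phi[: k - 1] - refl * phi[: k - 1][::-1]
        phi[k - 1] = refl
        err *= 1 - refl ** 2
    return phi, err

def prewhiten(x, phi):
    """AR residuals e_t = x_t - sum_j phi_j x_{t-j}, for t = p, ..., N - 1."""
    x = np.asarray(x, float) - np.mean(x)
    p = len(phi)
    return np.array([x[t] - np.dot(phi, x[t - p:t][::-1]) for t in range(p, len(x))])

# Usage: fit an AR(2) to a simulated AR(2) series and check that the residuals look white.
rng = np.random.default_rng(0)
z = rng.standard_normal(20_200)
x = np.zeros_like(z)
for t in range(2, len(z)):
    x[t] = 0.75 * x[t - 1] - 0.5 * x[t - 2] + z[t]
x = x[200:]                                       # drop burn-in
phi_hat, s2_hat = levinson_durbin(sample_acvs(x, 2), 2)
resid = prewhiten(x, phi_hat)
r1 = np.corrcoef(resid[:-1], resid[1:])[0, 1]     # lag-1 autocorrelation of residuals
print(phi_hat, s2_hat, r1)
```

If the fitted model is adequate, the residual spectrum is roughly flat. A flat background is what makes the residuals convenient for harmonic analysis.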
Table 5.3 shows the results of our goodness-of-fit method. For each model, it gives the
observed maximum absolute deviation of the sample standardized integrated spectrum from
the theoretical one, together with the simulated p-values. Based on this goodness-of-fit
criterion, we see little difference among the models, and certainly no statistically
significant difference. We conclude that each of the four models fits reasonably well.
The spectra in Figure 5.4 suggest that the choice of prewhitener can affect the
significance of the harmonic components; however, this test does not enable us to
distinguish between
[Figure 5.3 panels: standardized integrated spectra F(f) versus Frequency (0 to 0.5); the top panels compare AR(4) Model with AR(4) Observed and AR(2) Model with AR(2) Observed.]
Figure 5.3: We ran 40000 simulations for each of four comparisons: a simulated AR(4)
against the theoretical AR(4) (top left), a simulated AR(2) against the theoretical
AR(2) (top right), a simulated AR(2) against the theoretical AR(4) (bottom left), and a
simulated AR(4) against the theoretical AR(2) (bottom right). The top two plots show
the worst fit of the 40000 runs when the simulations were from the same model as the
theoretical AR. The bottom two plots show the best fit of the 40000 runs when the
simulations were from a model different from the theoretical AR.
This chapter demonstrates that an acvs derived from the multitaper spectral estimate,
used with the Levinson-Durbin recursions, is more effective than one derived from an
untapered spectral estimate. This holds for a high signal-to-noise-ratio AR(4) process
with roots close to the unit circle. It also demonstrates that, in this example, the
multitaper Levinson-Durbin versions are as effective as Burg’s method. We propose a
[Figure 5.4 plot: spectrum in (days)²/(cycles/year) versus Frequency (cycles/year), with a top axis in Period (years).]
Figure 5.4: Adaptive multitaper spectrum of the GHD series, computed with $NW = 3$ and
$K = 5$. Plotted over the spectrum are the standard AR(1) spectrum (red), the standard
AR(8) spectrum (green), the DPSS-tapered AR(8) spectrum (blue), and the multitaper
AR(8) spectrum (cyan). The multitaper AR(8) (cyan) and standard AR(8) (green) spectra
agree closely except between 0.2 and 0.3 cycles/year. In that band the multitaper
estimate has slightly higher power and appears to follow the spectral estimate more
closely.
Table 5.3: Maximum absolute deviation (max abs dist) of the observed GHD standardized
integrated spectrum from the theoretical standardized integrated spectrum for the
various models, and approximate p-values from simulations. The simulations test the
null hypothesis that the model is appropriate, that is, that the observed maximum
absolute deviation is no larger than expected under the model.
(1) Use simulations of different but closely related AR models to determine how well
the test performs, and test simulated mixed spectra that include discrete line
components.
(2) Consider other ways of comparing two spectra. For example, the $L^2$ distance
would be more sensitive to overall differences, whereas the maximum absolute deviation
may be more sensitive to high-power line components.
(3) Explore questions of stationarity and change-points in climate series such as the
Burgundy GHD series. One could section the series at a change-point and compare the
spectra before and after the change-point with each other and with AR models for the
entire series.
(4) Study the effect of AR model selection on hypothesis tests, such as Mann and
Lees (1996), for spectra exceeding the AR confidence intervals.
Chapter 6
structure over time. We derived the mean and variance of the estimator under an
independence assumption, confirmed these using simulations, and then used simulations
to study the effect of relaxing the independence assumption. We then tested our
procedure on a change-point model and found that the estimator was graphically
effective but not statistically powerful. We presented it as a graphical tool within a
methodology incorporating existing multitaper spectral tools. In Chapter 4 we presented
spectral and statistical analysis of the Burgundy GHD time series. The analysis
included a coherence study in which we found the Burgundy GHD series coherent with both
the Swiss GHD and the CET series. This provides new evidence that the three series,
Burgundy GHD, Swiss GHD and CET, capture similar climate signals. We then used the
level-of-change estimator as part of a graphical technique to detect a change-point at
1675 in the Burgundy GHD, a date consistent with the Maunder minimum. Finally, we
presented a spectral analysis of the sectioned series. In Chapter 5 we studied methods
of calculating AR coefficients and presented a method for assessing the goodness-of-fit
of AR estimators. We gave an example in which the coefficients were obtained from the
Yule-Walker equations, using the multitaper spectral estimate and solved with the
Levinson-Durbin recursions. These coefficients gave results similar to those of the Burg
method. The advantage of combining the multitaper method with the Yule-Walker equations
is that the combination has a tunable parameter. The Burg method is known to split
spectral lines. We did not find a statistical difference between the two in our
examples using our goodness-of-fit test. Even so, the Yule-Walker equations with the
multitaper method provide a tunable alternative which, as we have shown, can be
accurate for a process with roots close to the unit circle.
CHAPTER 6. CONCLUDING REMARKS 128
Geosciences, as we receive email comments and questions about the multitaper package
in R from members of the geophysics and climate science communities; the multitaper
technique is not a standard statistical tool.
Bibliography
D. E. Amos. Algorithm 610: A portable fortran subroutine for derivatives of the psi
function. ACM Transactions on Mathematical Software (TOMS), 9(4):494–502,
1983.
T. W. Anderson. The Statistical Analysis of Time Series. John Wiley & Sons, 1971.
BIBLIOGRAPHY 131
D. W. K. Andrews. Tests for parameter instability and structural change with un-
known change point. Econometrica: Journal of the Econometric Society, 61:821–
856, 1993.
P. Bloomfield. Fourier Analysis of Time Series. John Wiley & Sons, 2nd edition,
2000.
D. R. Brillinger. Time series: data analysis and theory, volume 36. SIAM, 2001.
E. N. Brown, R. E. Kass, and P. P. Mitra. Multiple neural spike train data analysis:
State-of-the-art and future challenges. Nature Neuroscience, 7(5):456–461, 2004.
J. P. Burg. A new analysis technique for time series data. NATO Advanced Study
Institute on Signal Processing with Emphasis on Underwater Acoustics (reprinted
in Childers, 1978), 1, 1968.
D. R. Cox and H. D. Miller. The Theory of Stochastic Processes. John Wiley & Sons,
1965.
H. F. Davis. Fourier Series and Orthogonal Functions. Boston Allyn and Bacon,
1963.
B. Efron and G. Gong. A leisurely look at the bootstrap, the jackknife, and cross-
validation. The American Statistician, 37(1):36–48, 1983.
B. Efron and C. Stein. The jackknife estimate of variance. The Annals of Statistics,
9(3):586–596, 1981.
F. J. Harris. On the use of windows for harmonic analysis with the discrete Fourier
transform. Proceedings of the IEEE, 66:51–83, 1978.
H. X. He and D. J. Thomson. The canonical bicoherence–part II: QPC test and its
application in geomagnetic data. Signal Processing, IEEE Transactions on, 57(4):
1285–1292, 2009.
M. Kaveh and G. A. Lippert. An optimum tapered Burg algorithm for linear predic-
tion and spectral analysis. IEEE Trans. on Acoustics, Speech, and Signal Process-
ing, ASSP–31:438–444, 1983.
L. H. Koopmans. The Spectral Analysis of Time Series, volume 22. Academic Press,
1995.
J. M. Lees. RSEIS: Seismic Time Series Analysis Tools, 2013. R package version
3.2-1.
M. Mudelsee. Climate Time Series Analysis: Classical Statistical and Bootstrap Meth-
ods, volume 42. Springer, 2010.
K. Pearson. On the criterion that a given system of deviations from the probable
in the case of a correlated system of variables is such that it can be reasonably
supposed to have arisen from random sampling. Philosophical Magazine Series 5,
50(302):157–175, 1900.
M. B. Priestley. Spectral Analysis and Time Series. Volume 1: Univariate Series. Vol-
ume 2: Multivariate Series, Prediction and Control. Probability and Mathematical
Statistics, 1981.
for detecting regime shifts in paleoclimatic time series: Application to $\delta^{18}$O time
series of the Plio-Pleistocene. Paleoceanography, 24(1), 2009.
R. H. Shumway and D. S. Stoffer. Time Series Analysis and its Applications: With
R Examples. Springer, 2nd edition, 2006.
R. H. Shumway and D. S. Stoffer. Time Series Analysis and its Applications: With
R examples. Springer, 3rd edition, 2010.
D. Slepian and H. O. Pollak. Prolate spheroidal wave functions, Fourier analysis and
uncertainty–I. Bell Syst. Tech. J, 40:43–64, 1961.
G. Strang. Linear Algebra and Its Applications Academic. Cengage Learning, 4th
edition, 2005.
H. E. Suess and T. W. Linick. The $^{14}$C record in Bristlecone pine wood of the past
8000 years based on the dendrochronology of the late C. W. Ferguson. Philosophical
Transactions of the Royal Society of London. Series A, Mathematical and Physical
Sciences, 330(1615):403–412, 1990.
A. R. Tomé and P. M. A. Miranda. Piecewise linear fitting and trend changing points
of climate parameters. Geophysical Research Letters, 31(2), 2004.
T. J. Ulrych and Thomas N. Bishop. Maximum entropy spectral analysis and autore-
gressive decomposition. Reviews of Geophysics, 13(1):183–200, 1975.
P. D. Welch. The use of the fast Fourier transform for estimation of spectra: A method
based on time averaging over short, modified periodogram. IEEE Trans. on Audio
and Acoustics, 15:70–74, 1967a.
P. D. Welch. The use of fast Fourier transform for the estimation of power spectra: a
method based on time averaging over short, modified periodograms. IEEE Trans.
on Audio and Acoustics, 15(2):70–73, 1967b.
B. Whitcher. waveslim: Basic Wavelet Routines for One-, Two- and Three-
dimensional Signal Processing, 2012. R package version 1.7.1.
Multitaper R Package
APPENDIX A. MULTITAPER R PACKAGE 150
A.2 Introduction
Spectral analysis is used by statisticians and researchers to analyze sequential data re-
ferred to as time series. Examples of a time series include digitized recorded speech,
a sequential record of stock prices, and an electrocardiogram, which is a record of
electrical signals from a patient’s heart. The term time series implies successive ob-
servations in time, creating a serial correlation, but time series analysis techniques
apply to observations related sequentially, even if the sequential relationship is not
time. Spectral analysis refers to techniques involving analysis of a representation of
the time series in terms of sinusoidal components. One can imagine projecting a
time series onto the space spanned by sinusoids of discrete frequencies from zero to a
cutoff frequency, and then analyzing the coefficients, or squared coefficients, at each
frequency to determine which frequencies contribute more to the variance of the original
series. This picture is accurate because in spectral analysis, as in linear regression,
equality is understood in the mean-square sense. Multitaper spectral analysis is a form
of spectral analysis that exploits certain optimal orthogonal sequences, the discrete
prolate spheroidal sequences (Slepian sequences). These produce statistically consistent
spectral estimates with lower bias and variance than the naïve estimator, the
periodogram.
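The projection picture can be made concrete with the periodogram. At each Fourier frequency it takes the squared magnitude of the discrete Fourier transform, scaled so that the ordinates, multiplied by the bin width, sum to the sample variance. The Python sketch below uses a hypothetical `periodogram` helper and an assumed test signal (a sinusoid at 0.125 cycles per sample plus noise). It checks that Parseval identity and locates the frequency contributing most to the variance.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Two-sided periodogram dt |sum_t x_t exp(-i 2 pi f t dt)|^2 / N at f = k / (N dt)."""
    x = np.asarray(x, float)
    x = x - x.mean()
    n = len(x)
    return np.fft.fftfreq(n, d=dt), dt * np.abs(np.fft.fft(x)) ** 2 / n

rng = np.random.default_rng(2)
n, dt = 256, 1.0
t = np.arange(n)
x = np.sin(2 * np.pi * 0.125 * t) + 0.5 * rng.standard_normal(n)
freqs, pgram = periodogram(x, dt)
# Parseval: periodogram ordinates times the bin width 1/(N dt) sum to the sample variance
print(np.sum(pgram) / (n * dt), np.var(x))
print(abs(freqs[np.argmax(pgram)]))    # frequency with the largest contribution
```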
We present a package for the R (R Core Team, 2013) statistical programming
language that performs multitaper spectral estimation. In addition, the package
implements techniques that exploit properties of multitaper spectral estimators using
Slepian sequences to provide: a jackknife (non-parametric) variance; a harmonic
F-test, a statistical technique for detecting single-frequency line components; a
magnitude-squared coherence (MSC) estimate, an improved technique for analyzing
a linear dependence in frequency of bivariate time series which includes a jackknifed
variance estimate; and a complex demodulate estimate, a technique for observing
phase drift, a slow change in frequency over time.
While this paper is self-contained, the authors recommend some familiarity with
time series analysis and spectral estimation. Two accessible reference texts that
include introductory discussions of spectral analysis are Chatfield (2004) and Diggle
(1990). Percival and Walden (1993), hereinafter abbreviated as P&W93, present
a thorough overview of multitaper spectral estimation theory with many examples.
Shumway and Stoffer (2010) present a comprehensive book covering spectral and time
series analysis using the R programming language.
Thomson (1982) introduced multitaper spectral estimates using Slepian sequences,
and in the interim this technique has been used in fields such as anesthesiology (Moore
et al., 2008), climate science (Tourre et al., 2011), geophysics (He and Thomson, 2009;
Lepage and Thomson, 2009), and neuroscience (Brown et al., 2004).
The multitaper spectral estimate is similar to direct spectrum estimates (Blackman
and Tukey, 1959), which reduce bias by applying a data taper. It improves on direct
spectral estimates in two ways. First, it uses Slepian sequences, which are maximally
concentrated in time and frequency (P&W93, pp. 75–81). Second, it averages the
estimates obtained with several orthogonal Slepian sequences. Typically, one uses an
adaptive weighted average to reduce variance while controlling bias. The cost of this
method is (1) a reduction in frequency resolution and (2) increased computational cost,
as multiple Fourier transforms are required in place of one. The computational burden
is typically measured in fractions of a second and should not be a primary concern. The
direct spectral estimator controls bias with one taper (Blackman and Tukey, 1959), thus
decreasing frequency resolution. It additionally requires smoothing or frequency
averaging to reduce variance, which decreases the resolution further. Without frequency
averaging, the direct spectral estimator is not statistically consistent, as its
variance does not decrease as the sample size increases. The periodogram can be shown
to be asymptotically unbiased, but examples exist where considerable bias is observed
even with a large number of data points (Thomson, 1982, p. 1058).
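Two claims from this paragraph can be checked numerically on white noise, where $S(f) = 1$: the periodogram is inconsistent, and averaging over orthogonal tapers reduces variance. To keep the sketch short it uses the sine tapers, which the multitaper package also implements, rather than Slepian sequences. It also uses a plain average rather than adaptive weights. The sample sizes, $K = 8$, and the helper name `sine_tapers` are arbitrary or hypothetical choices. The periodogram variance should stay near 1 as $N$ grows, while the average of $K$ eigenspectra should have variance near $1/K$.

```python
import numpy as np

def sine_tapers(n, k):
    """Orthonormal sine tapers: row j is sqrt(2/(N+1)) sin(pi (j+1)(t+1)/(N+1)), t = 0..N-1."""
    t = np.arange(1, n + 1)
    return np.sqrt(2 / (n + 1)) * np.sin(np.pi * np.outer(np.arange(1, k + 1), t) / (n + 1))

rng = np.random.default_rng(4)
K = 8
results = {}
for n in (4096, 16384):
    x = rng.standard_normal(n)                            # white noise, S(f) = 1
    inner = slice(n // 16, 7 * n // 16)                   # stay away from f = 0 and f = 1/2
    pgram = np.abs(np.fft.rfft(x)) ** 2 / n               # periodogram (rectangular taper)
    eigen = np.abs(np.fft.rfft(sine_tapers(n, K) * x, axis=-1)) ** 2
    mt = eigen.mean(axis=0)                               # equal-weight average over K tapers
    results[n] = (np.var(pgram[inner]), np.var(mt[inner]))
    print(n, results[n])
```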
There are several software packages and programs that implement the multitaper
method, and we present a brief review. The programming environment MATLAB im-
plements Thomson’s multitaper method, using the adaptive weights, with the “signal
processing toolbox.” Code written in C++, available in Press et al. (2007, pp. 662–
667), can be used to obtain a multitaper spectral estimate; however, adaptive weights
are not implemented. Pardo-Igúzquiza et al. (1994) introduce a Fortran program that
implements the multitaper method using adaptive weights. Lees and Park (1995)
present C code implementing the multitaper method with adaptive weighting and
the harmonic F-test; however, there is no option to zero-pad to refine the frequency
grid. Fortran 90 code implementing the adaptive weighted multitaper spectral
estimate and the harmonic F-test is provided in Prieto et al. (2009). LISP code
implementing the adaptive weighted multitaper spectral estimate and the harmonic
F-test accompanies P&W93, and some functionality in our multitaper package is based
on it. The following packages are available in the programming language
R. The package waveslim (Whitcher, 2012) obtains the Slepian sequences using the
accurate inverse iteration method (Bell et al., 1993). The package sapa calculates
the multitaper spectral estimate, but without using adaptive weights. The package
RSEIS implements the multitaper method using adaptive weights and it computes the
harmonic F-test (Lees, 2013). We present the multitaper package, which implements
the multitaper method, allows for adaptive weights, and implements the harmonic
F-test. This package adds the ability to obtain the nonparametric jackknife variance
of the spectral estimate, the bivariate MSC with a jackknife variance estimate, and
complex demodulation using the Slepian sequences. The programming environment S-Plus
provided a native function to calculate complex demodulation based on Bloomfield
(2000, pp. 97–130); however, R (as of the development version 3.1.0) does not provide
a similar function. To accommodate R users, function calls in the multitaper package
are designed to be similar to existing R spectral estimate calls (see Shumway and
Stoffer, 2010), and the functions return similar objects. We note that this package
makes use of well-tested Fortran routines developed in Thomson (1982, pp. 219–220).
A.3.1 Overview
Direct spectral estimates are estimates of the spectrum computed via a direct Fourier
transform, on tapered data, as compared to indirect estimates, which are obtained by
taking the Fourier transform of the autocovariance function via the Einstein-Wiener-
Khintchine theorem, also known as the Wiener-Khintchine theorem. Multitaper spec-
tral estimation is a technique that uses the weighted average of several direct spectral
estimates, each computed using a different member of a family of orthogonal tapers.
By default, we assume that multitaper spectral estimates use the Slepian sequences as
tapers. These sequences have been shown to be maximally concentrated in both time and
frequency (Slepian and Pollak, 1961; Slepian, 1964, 1976, 1978, 1983); that is, they
define the classical uncertainty principles when time, frequency, or both are limited.
Several other taper options, including the sine tapers, are also implemented in the
multitaper package.
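For the untapered case, the direct and indirect routes give the same estimate: the periodogram equals the Fourier transform of the biased sample acvs. The Python sketch below compares the two numerically on an assumed zero-padded grid of $2N$ frequencies, which is fine enough for the identity to hold exactly. With a non-trivial taper, the direct estimate instead corresponds to the transform of the acvs of the tapered series.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 64
x = rng.standard_normal(n)
x -= x.mean()
m = 2 * n                                       # grid f_j = j / (2N)

# Direct estimate with a rectangular taper (the periodogram), zero-padded to length 2N
direct = np.abs(np.fft.fft(x, m)) ** 2 / n

# Indirect estimate: Fourier transform of the biased acvs s_tau, |tau| <= N - 1
acvs = np.array([np.dot(x[: n - tau], x[tau:]) / n for tau in range(n)])
circ = np.zeros(m)
circ[:n] = acvs                                 # lags 0, ..., N - 1
circ[m - n + 1:] = acvs[1:][::-1]               # lags -(N - 1), ..., -1 wrapped to the end
indirect = np.fft.fft(circ).real

print(np.max(np.abs(direct - indirect)))        # agreement to rounding error
```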
The key advantages of multitaper spectral estimation are as follows: first the
availability of the Slepian tapers, as they are maximally concentrated in both time
and frequency; second, the higher degrees of freedom obtained by use of multiple
orthogonal tapers; and third, an optimal weighting scheme for combining the approx-
imately orthogonal spectrum estimates into an approximately maximum-likelihood
estimate of the spectrum. Two further advantages of using the Slepian tapers over
other choices are the existence of the harmonic F-test statistic and the jackknife
estimation of variance, also implemented in this package.
A.3.2 Parameters
If we begin with a time series $\{x_t\}_{t=0}^{N-1}$, in order to compute the multitaper
spectral estimate we must select two initial parameters: the time-bandwidth parameter,
denoted $NW$ (where $W$ is the bandwidth over which the Slepian tapers have
been concentrated), and the number of tapers, denoted $K$. Typically, one selects
process. Take the classic Cramér representation (Grenander and Rosenblatt, 1953;
Doob, 1952) for a discrete stationary stochastic process,
$$x_n = \int_{-1/2}^{1/2} e^{i2\pi n f}\, dX(f), \qquad (A.1)$$
Mullis and Scharf, 1991; Bronez, 1992; Stoica and Sundin, 1999), every quadratic
estimator of the power spectrum must have the form
$$\hat{S}(f) = \sum_{j,k=0}^{N-1} q_{j,k}\, e^{i2\pi (j-k) f} x_j x_k \qquad (A.2)$$
where $\upsilon_n^{(k)}(N, W)$, or just $\upsilon_n^{(k)}$, is Slepian’s notation for the DPSSs
with the time index shifted by 1. When $K = 1$, this becomes the familiar direct
estimate, and if $NW = 0$, so that $\upsilon_n^{(0)} = N^{-1/2}$, it is the periodogram.
Formally, the components of the spectral estimator, written
$$y_k(f) = \sum_{n=0}^{N-1} x_n \upsilon_n^{(k)} e^{-i2\pi f n}, \qquad (A.5)$$
are called the eigencoefficients. They are the discrete Fourier transform of the data
multiplied by the $k$th discrete taper. In the classic development, these tapers are
the discrete prolate spheroidal sequences. Each eigencoefficient is computed by
transforming the data multiplied by the $k$th data window $\upsilon_n^{(k)}$, so its
absolute square is itself a direct spectrum estimate, referred to as the $k$th
eigenspectrum.
The Fourier transforms of the eigenvectors (tapers) $\upsilon^{(0)}, \ldots, \upsilon^{(K-1)}$ alone are
written as $\{V_k(f)\}_{k=0}^{K-1}$. These functions are odd or even as $k$ is odd or even, and have $k$
which is complex-valued, and more useful as it directly represents the form taken by
an FFT implementation.
In the multitaper package, users can obtain these eigencoefficients by setting the
parameter returnInternals = TRUE in the spec.mtm call. We will show that the
eigencoefficients are central to the tools that can be developed in the multitaper
arena, and thus are critical for the extensibility of this package.
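The eigencoefficients (A.5) and eigenspectra can be computed directly with NumPy. The sketch below is an illustration under stated assumptions, not the package's Fortran code. It obtains Slepian tapers as the leading eigenvectors of the matrix $q_{nm} = \sin 2\pi W(n-m) / (\pi(n-m))$ of equation (A.8) with a dense eigensolver. A dense solver is adequate for moderate $N$; a tridiagonal formulation is usually preferred for large $N$. The sketch also zero-pads to an arbitrary 1024 points and averages the eigenspectra with equal weights rather than adaptive weights. The helper names `dpss` and `eigencoefficients` are hypothetical.

```python
import numpy as np

def dpss(n, nw, k):
    """First k Slepian tapers (rows) and eigenvalues from q_nm = sin(2 pi W (n-m)) / (pi (n-m))."""
    w = nw / n
    d = np.subtract.outer(np.arange(n), np.arange(n))
    q = np.where(d == 0, 2 * w, np.sin(2 * np.pi * w * d) / (np.pi * np.where(d == 0, 1, d)))
    lam, vec = np.linalg.eigh(q)                 # ascending eigenvalues
    top = np.argsort(lam)[::-1][:k]              # indices of the k largest
    return vec[:, top].T, lam[top]

def eigencoefficients(x, tapers, nfft):
    """y_k(f) = sum_n x_n v_n^(k) exp(-i 2 pi f n) at f = j / nfft, j = 0..nfft // 2 (A.5)."""
    return np.fft.rfft(tapers * x, n=nfft, axis=-1)

rng = np.random.default_rng(3)
n, nw, k = 256, 4.0, 7
x = rng.standard_normal(n)                       # white noise with S(f) = 1
tapers, lam = dpss(n, nw, k)
y = eigencoefficients(x - x.mean(), tapers, nfft=1024)
eigenspectra = np.abs(y) ** 2                    # row k is the k-th eigenspectrum
s_hat = eigenspectra.mean(axis=0)                # equal weights, not the adaptive weights
print(lam.round(6))
```

The signs of the eigenvectors returned by the eigensolver are arbitrary. This does not affect the eigenspectra.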
The $Q$ matrix from Equation (A.3) was not specified; the choice
$$q_{nm} = \frac{\sin 2\pi W (n - m)}{\pi (n - m)}, \qquad (A.8)$$
which gives the best joint concentration in time and frequency, is used by default.
The user should note that the default tapers (or windows) used in multitaper are
the Slepian sequences. Each Slepian taper has a corresponding eigenvalue $\lambda_k$ that
represents the concentration of that taper within the band $(-W, W)$. For low-order
tapers, $\lambda_k \approx 1$ (although always less than 1), with the concentration decreasing
as $k$ approaches $2NW$.
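The concentration eigenvalues can be inspected numerically. This Python sketch, with the arbitrary choices $N = 128$ and $NW = 4$, builds the matrix of (A.8), whose diagonal is the limiting value $2W$. It prints the leading eigenvalues, which stay close to 1 for the first few tapers and fall off around $k \approx 2NW$. The trace of the matrix is $N \cdot 2W = 2NW$, so the eigenvalues sum to $2NW$, which gives a useful check.

```python
import numpy as np

n, nw = 128, 4.0
w = nw / n
d = np.subtract.outer(np.arange(n), np.arange(n))
# q_nm from (A.8); the diagonal entries are the limit 2W as n - m -> 0
q = np.where(d == 0, 2 * w, np.sin(2 * np.pi * w * d) / (np.pi * np.where(d == 0, 1, d)))
lam = np.linalg.eigvalsh(q)[::-1]               # concentrations lambda_0 >= lambda_1 >= ...
for j in range(12):
    print(f"lambda_{j} = {lam[j]:.6f}")
print("sum of eigenvalues:", lam.sum())         # equals trace(q) = 2NW
```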
In Equation (A.4), the absolute squares of the eigencoefficients are combined ad-
ditively, each weighted by a corresponding µk . One version of the spectral estimator
2 (A.9)
k=0 λk Ŝ(f ) − B̂k (f )
as suggested in Percival and Walden (1993, p. 46), and analyzed throughout that
text. Using the above coefficients, we can calculate the theoretical spectrum as
$$S(f) = \frac{\sigma_w^2}{\left|1 - \sum_{j=1}^{4} \phi_j e^{-i2\pi f j}\right|^2}, \qquad (A.10)$$
where $\sigma_w^2$ is the innovations variance. The following R code generates a
realization of this AR(4) time series with standard normal innovations, loads the
multitaper library, and displays the multitaper spectral estimate of the series,
similar to that in Figure A.1. The R commands show only the estimated multitaper
spectrum, while Figure A.1 also includes the theoretical spectrum.
[Figure A.1 plot: Spectrum (log scale) versus Frequency in cycles/second.]
Figure A.1: Adaptive multitaper spectrum of the realization of an AR(4) time series
(thick lines) plotted on top of the theoretical spectrum (thin line).
Function 1. spec.mtm
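R> library("multitaper")
R> # AR(4) coefficients; the AR polynomial has roots close to the unit circle
R> ar4Coef <- c(2.7607, -3.8106, 2.6535, -0.9238)
R> set.seed(60)
R> # simulate 1024 points with standard normal innovations
R> ar4.ts <- arima.sim(list(order = c(4, 0, 0), ar = ar4Coef),
+ n = 1024)
R> # multitaper estimate with NW = 4 and K = 8 tapers
R> spec.mtm(ar4.ts, nw = 4, k = 8, dtUnits = "second")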
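The R listing shows only the estimate. The theoretical spectrum that Figure A.1 overlays follows from (A.10). A Python sketch of that calculation follows. It assumes a unit sampling interval and $\sigma_w^2 = 1$, matching the standard normal innovations used by arima.sim above. `ar_spectrum` is a hypothetical helper, not a multitaper function. Because the AR polynomial has roots close to the unit circle, the spectrum shows sharp peaks and a large dynamic range.

```python
import numpy as np

def ar_spectrum(phi, freqs, sigma2=1.0):
    """S(f) = sigma2 / |1 - sum_j phi_j exp(-i 2 pi f j)|^2, as in (A.10), unit sampling interval."""
    lags = np.arange(1, len(phi) + 1)
    transfer = 1 - np.exp(-2j * np.pi * np.outer(freqs, lags)) @ np.asarray(phi, float)
    return sigma2 / np.abs(transfer) ** 2

ar4_coef = [2.7607, -3.8106, 2.6535, -0.9238]    # same coefficients as ar4Coef above
freqs = np.linspace(0, 0.5, 5001)
s_theory = ar_spectrum(ar4_coef, freqs)
print("peak frequency:", freqs[np.argmax(s_theory)])
print("dynamic range (dB):", 10 * np.log10(s_theory.max() / s_theory.min()))
```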
taper Tools
The jackknife is a classic statistical tool, covered in Efron and Gong (1983) and fully
applied to the multitaper spectrum estimate in Thomson and Chave (1991b), with a
more approachable overview in Thomson (2007). This tool is fully implemented in
the multitaper package, and briefly reviewed here.
To jackknife multitaper spectrum estimates, begin with Equation (A.9), omit the
$j$th eigencoefficient from the weight, and take $\theta_{\setminus j} = \ln \hat{S}_{\setminus j}(f)$ at each frequency.
The subscript $\setminus j$ is read in the set-theoretic meaning of “without $j$”. In general,
$\hat{\theta}_{\setminus j}$ denotes the estimate of the parameter $\theta$ computed from
$\{x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_K\}$, that is, omitting the $j$th observation. This action
treats the eigencoefficients as exchangeable data and is called “jackknifing over tapers.”
We then compute the delete-one log-spectrum estimates as
$$\ln \hat{S}_{\setminus j}(f) = \ln\left[\frac{1}{K-1} \sum_{k=0,\, k \neq j}^{K-1} \hat{S}_k(f)\right]. \qquad (A.11)$$
This gives all that we need to compute arbitrary confidence intervals for the
Slepian-tapered multitaper spectrum estimate. An example is shown in Figure A.3.
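Jackknifing over tapers is straightforward once the eigenspectra are available. The Python sketch below applies (A.11) with the simple unweighted average of eigenspectra, not the adaptive weights of (A.9). It uses the standard delete-one jackknife variance $\frac{K-1}{K}\sum_j (\theta_{\setminus j} - \bar{\theta})^2$, and the tapers come from a dense eigendecomposition of (A.8). The normal quantile 1.96 used for the interval is an assumption; a Student-t quantile with $K - 1$ degrees of freedom is a more conservative alternative. The helper names are hypothetical.

```python
import numpy as np

def dpss(n, nw, k):
    """First k Slepian tapers (rows) from the matrix of (A.8), dense eigensolver."""
    w = nw / n
    d = np.subtract.outer(np.arange(n), np.arange(n))
    q = np.where(d == 0, 2 * w, np.sin(2 * np.pi * w * d) / (np.pi * np.where(d == 0, 1, d)))
    lam, vec = np.linalg.eigh(q)
    return vec[:, np.argsort(lam)[::-1][:k]].T, np.sort(lam)[::-1][:k]

def jackknife_log_spectrum(eigenspectra):
    """ln S_hat (equal weights) and its delete-one jackknife variance at each frequency."""
    k = eigenspectra.shape[0]
    total = eigenspectra.sum(axis=0)
    # (A.11): ln S_\j(f) = ln[(1/(K-1)) sum_{i != j} S_i(f)], one row per omitted taper j
    log_del = np.log((total - eigenspectra) / (k - 1))
    var_jk = (k - 1) / k * np.sum((log_del - log_del.mean(axis=0)) ** 2, axis=0)
    return np.log(total / k), var_jk

rng = np.random.default_rng(6)
n, nw, k = 512, 4.0, 7
x = rng.standard_normal(n)                          # white noise, S(f) = 1
tapers, _ = dpss(n, nw, k)
eigenspectra = np.abs(np.fft.rfft(tapers * (x - x.mean()), axis=-1)) ** 2
log_s, var_jk = jackknife_log_spectrum(eigenspectra)
se = np.sqrt(var_jk)
# approximate 95% interval on the spectrum via the normal quantile on the log scale
lower, upper = np.exp(log_s - 1.96 * se), np.exp(log_s + 1.96 * se)
print(se[10:-10].mean())
```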
Harmonic analysis (in the context of spectrum estimation) has come to mean the study
of line components in a spectrum, without regard to whether they are at multiples of
a common frequency or not. To make sense of this, it is essential to recognize that the
assumption of “pure” line components is a convenient fiction, and is rarely supported
over long time spans. Thus, we can divide time series into two types: short series,
in which our focus is primarily on detection and resolution of line components, and
long series, in which our focus is typically on the structure of any line components
present.
In addition to the basic multitaper approach, Thomson (1982) presented a new
approach to the problem of “mixed” spectra, i.e., where line components are em-
bedded in stationary background noise with a continuous spectrum. The process is
typically described as being a stationary random process plus a non-zero mean value
function, consisting of some number of sinusoidal terms at various frequencies, plus
perhaps a polynomial trend. In terms of the spectral representation, this amounts to
having the extended Munk-Hasselmann representation
$$\mathrm{E}\{dZ(f)\} = \sum_m \mu_m \delta(f - f_m) \qquad (A.14)$$
in place of the usual assumption that E {dZ(f )} = 0. Under this assumption, the
continuous part of the spectrum is the second absolute central moment of dZ(f ). As
implemented in the multitaper package, the harmonic F-test assumes the simplest
case of a single line component at frequency f0 . In this case, the eigencoefficients, as
defined in Equation (A.5), have non-zero expected value:
The assumption is made that the continuous component of the spectrum near f0 is
slowly varying (or locally white), resulting in the relationship
where S(f ) is the continuous spectrum and does not include the line power. One
then uses point regression at f = f0 , where the relation
holds, and, remembering that both the yk (f )s and µ(f ) are complex-valued, µ can be
estimated by standard regression techniques (Miller, 1974):
$$\hat{\mu}(f) = \frac{\sum_{k=0}^{K-1} U_k(0)\, y_k(f)}{\sum_{k=0}^{K-1} U_k^2(0)}. \qquad (A.18)$$
Subtracting this result from the eigencoefficients gives an estimate of the continuous
background spectrum, and comparing this value with the power in the line component
results in an F variance-ratio test (Fisher et al., 1990) with 2 and 2(K − 1) degrees
of freedom for the significance of the line component. Formally,
$$F(f) = \frac{(K-1)\, |\hat{\mu}(f)|^2 \sum_{k=0}^{K-1} U_k(0)^2}{\sum_{k=0}^{K-1} \left|y_k(f) - \hat{\mu}(f)\, U_k(0)\right|^2}. \qquad (A.19)$$
This is the ratio of the variance in the band $(f - W, f + W)$ explained by the sinusoid to
the residual, unexplained variance in the same band, scaled by the degrees-of-freedom.
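Equations (A.18) and (A.19) translate into a few array operations on the eigencoefficients. The Python sketch below evaluates the F statistic on a zero-padded grid for a simulated unit-amplitude cosine at $f_0 = 0.2$ in white noise. It takes the tapers from a dense eigendecomposition of (A.8) and uses $U_k(0) = \sum_n \upsilon_n^{(k)}$. The defaults ($NW = 4$, $K = 7$, four-fold zero-padding) and the helper names are assumptions rather than the package's choices. For a real cosine of amplitude $A$, $|\hat{\mu}(f_0)| \approx A/2$.

```python
import numpy as np

def dpss(n, nw, k):
    """First k Slepian tapers (rows) from the matrix of (A.8), dense eigensolver."""
    w = nw / n
    d = np.subtract.outer(np.arange(n), np.arange(n))
    q = np.where(d == 0, 2 * w, np.sin(2 * np.pi * w * d) / (np.pi * np.where(d == 0, 1, d)))
    lam, vec = np.linalg.eigh(q)
    return vec[:, np.argsort(lam)[::-1][:k]].T

def harmonic_ftest(x, nw=4.0, k=7, nfft=None):
    """Harmonic F-test (A.18)-(A.19) on the grid f = j / nfft; returns freqs, F, mu_hat."""
    x = np.asarray(x, float) - np.mean(x)
    n = len(x)
    nfft = nfft or 4 * n                                    # four-fold zero-padding
    tapers = dpss(n, nw, k)
    y = np.fft.rfft(tapers * x, n=nfft, axis=-1)            # eigencoefficients y_k(f), (A.5)
    u0 = tapers.sum(axis=1)                                 # U_k(0); essentially 0 for odd k
    mu = (u0 @ y) / np.sum(u0 ** 2)                         # (A.18)
    resid = np.sum(np.abs(y - np.outer(u0, mu)) ** 2, axis=0)
    f_stat = (k - 1) * np.abs(mu) ** 2 * np.sum(u0 ** 2) / resid   # (A.19)
    return np.arange(nfft // 2 + 1) / nfft, f_stat, mu

rng = np.random.default_rng(7)
n = 512
t = np.arange(n)
x = np.cos(2 * np.pi * 0.2 * t) + rng.standard_normal(n)   # line at f0 = 0.2 in white noise
freqs, f_stat, mu = harmonic_ftest(x)
f_peak = freqs[np.argmax(f_stat)]
print(f_peak, f_stat.max(), 2 * np.abs(mu[np.argmax(f_stat)]))   # last value estimates A = 1
```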
Thus, as implemented in the multitaper package, the test results in an array of
F statistics on the same frequency mesh as the spectrum. Significance levels can
APPENDIX A. MULTITAPER R PACKAGE 165
be computed using the qf() function. When plotting an mtm object, the F -test is
plotted in place of the spectrum by passing Ftest = TRUE, and significance lines
can be added to the plots by passing siglines = c(p1,p2,p3) with p1,p2,p3 user-
defined significance levels, typically chosen on the basis of sample size. A typical rule
of thumb is to set your minimum significance level at ∼ 1 − 1/N (Thomson, 1990b).
In general, one must be aware of possible false detects, as the F -statistic is highly
sensitive to violations of its underlying assumption of a locally white spectra. In
practice, one would plot both the spectrum and the F -test; a statistically significant
F -test statistic at a given frequency combined with a characteristic (approximately
rectangular) multitaper peak centred at the same frequency gives much more credence
to the detection.
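Converting a significance level into an F threshold is a call to qf() with 2 and $2(K-1)$ degrees of freedom. A small sketch with illustrative values of $K$ and $N$; since `siglines` takes the significance levels themselves, this is only needed when working with the array of F statistics directly:

```r
# Illustrative values: K tapers and series length N.
K <- 10
N <- 10000
nu <- 2 * (K - 1)                        # denominator degrees of freedom
siglevels <- c(0.99, 0.999, 1 - 1 / N)   # last entry: the 1 - 1/N rule of thumb
thresholds <- qf(siglevels, df1 = 2, df2 = nu)

# Cross-check: with 2 numerator degrees of freedom the quantile has a closed form.
closed_form <- (nu / 2) * ((1 - siglevels)^(-2 / nu) - 1)
rbind(level = siglevels, F = round(thresholds, 2))
```

The closed form follows because, with 2 numerator degrees of freedom, $P(F \le x) = 1 - (1 + 2x/\nu)^{-\nu/2}$.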
We also note here that, in practice, the choice of zero-padding amount (when computing the FFT) can have a significant impact upon the F-test. As is shown in Thomson (2001, pp. 364–365), the standard deviation of the frequency estimated from the F-test can often be very small. Thus, the amount of zero-padding should be chosen so that the spacing of the transform bins is approximately half this standard deviation, in order to accurately determine estimated frequencies. This technique can be applied as part of an iterative process whereby a pilot estimate of the spectrum and F-test are computed, and then a refinement is made based on the maximum F-test value observed (approximately the signal-to-noise ratio).
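One way to turn the half-standard-deviation guideline into an FFT length is sketched below. The helper `choose_nfft()` is hypothetical, the frequency standard deviation `sigma_f` must be supplied from elsewhere (for example, a pilot run), and rounding up to a power of two is a convenience rather than a requirement. The result could then be passed as the `nFFT` argument of spec.mtm, assuming that argument as in the package interface.

```r
# Hypothetical helper: smallest power-of-two FFT length whose bin spacing
# 1 / (nFFT * dT) is at most half of sigma_f, the standard deviation of the
# frequency estimate (in cycles per unit of dT).
choose_nfft <- function(N, sigma_f, dT = 1) {
  needed <- max(N, 1 / (dT * sigma_f / 2))   # never shorter than the series
  2^ceiling(log2(needed))
}

choose_nfft(N = 1000, sigma_f = 1e-4)   # 32768: bins of about 3.1e-5 cycles/sample
choose_nfft(N = 1000, sigma_f = 1)      # 1024: no extra padding needed
```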
To explore the use of the harmonic F -test as included in the multitaper package,
we use the Hadley Centre Central England Temperature daily series, available from
[Plot: CET daily temperature series against Date.]
We compute the multitaper spectrum of this series and display the portion around 1 cycle/year using the dropFreqs function, as detailed in Section A.7.1. As is well known, long temperature records typically exhibit an extremely strong response at 1 cycle/year, or 31.69 nHz. This series is no exception, as can be seen in Figures A.3 and A.4. Jackknifed confidence intervals at 5% and 95% are included in Figure A.3.
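The unit conversion behind these figures is straightforward: with `dT = 86400` seconds per sample, the frequency axis is in cycles per second, and multiplying by $10^9$ gives nHz. A sketch, assuming a 365.25-day year:

```r
seconds_per_year <- 365.25 * 86400   # assumes a 365.25-day year
f_year_hz <- 1 / seconds_per_year    # 1 cycle/year in cycles per second
f_year_nhz <- f_year_hz * 1e9
round(f_year_nhz, 2)                 # 31.69
# Nyquist frequency of a daily series (dT = 86400 s), also in nHz
f_nyq_nhz <- 1e9 / (2 * 86400)
```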
R> library("multitaper")
R> data("CETdaily")
R> cet.spec <- spec.mtm(CETdaily[, "Temp"], nw = 5, k = 10,
+ plot = FALSE,
+ Ftest = TRUE, jackknife = TRUE, dT = 86400, units = "second")
Function 3. dropFreqs
The F -test coefficients are contained in the mtm object, and can easily be extracted
and plotted separately. The frequency array is also available, and can be modified to
scale to user-selected units. As is shown in this example, the function dropFreqs acts
on the entire mtm object, focusing (in frequency) the spectrum estimate, the harmonic
F -test statistic (if computed), and possibly the coherence (see Section A.5.1). We
also show the use of the siglines parameter, placing a 0.999 significance line on the
plot.
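The idea behind dropFreqs, restricting every frequency-indexed component to the same band so that they stay aligned, can be sketched in a few lines of base R. The helper `truncate_band()` is hypothetical and works on plain vectors, not on mtm objects:

```r
# Hypothetical helper: keep only bins with minFreq <= freq <= maxFreq in the
# frequency vector and in every component indexed by it.
truncate_band <- function(freq, components, minFreq, maxFreq) {
  keep <- freq >= minFreq & freq <= maxFreq
  list(freq = freq[keep],
       components = lapply(components, function(v) v[keep]))
}

freq <- (0:10) / 20                  # 0, 0.05, ..., 0.5
spec <- freq^2 + 1                   # stand-in spectrum
Fstat <- rev(freq)                   # stand-in F statistics
band <- truncate_band(freq, list(spec = spec, F = Fstat), 0.1, 0.3)
band$freq                            # 0.10 0.15 0.20 0.25 0.30
```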
Coherence
Given two stationary stochastic processes $x(t)$ and $y(t)$, $t \in \mathbb{Z}$, the coherence between $x$ and $y$ is the complex-valued function of frequency defined as

$$C_{xy}(f) = \frac{S_{xy}(f)}{\sqrt{S_x(f)}\,\sqrt{S_y(f)}}, \tag{A.20}$$

where $S_x(f)$ and $S_y(f)$ are the spectra of $x$ and $y$ respectively, and $S_{xy}(f)\,df = \mathrm{E}\{dX(f)\,dY^*(f)\}$.
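A related quantity is the magnitude-squared coherence (MSC) between $x$ and $y$, denoted $\gamma_{xy}(f)$ and defined by

$$\gamma_{xy}(f) = \left|C_{xy}(f)\right|^2.$$

The auto- and cross-spectra are estimated from the eigencoefficients as

$$\hat{S}_x(f) = \frac{1}{K}\sum_{k=0}^{K-1} |x_k(f)|^2, \qquad \hat{S}_{xy}(f) = \frac{1}{K}\sum_{k=0}^{K-1} x_k(f)\,\overline{y_k(f)},$$

where $x_k(f)$ and $y_k(f)$ are the eigencoefficients of $x$ and $y$ respectively, and the overline indicates complex conjugation. Substituting these estimates of the auto- and cross-spectra into the definition of the coherence results in a multitaper estimate of coherence: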
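$$\hat{C}_{xy}(f) = \frac{\sum_{k=0}^{K-1} x_k(f)\,\overline{y_k(f)}}{\left(\sum_{k=0}^{K-1} |x_k(f)|^2 \cdot \sum_{k=0}^{K-1} |y_k(f)|^2\right)^{0.5}}, \tag{A.24}$$

$$\hat{\gamma}_{xy}(f) = \left|\hat{C}_{xy}(f)\right|^2. \tag{A.25}$$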
The coherence between two time series can be computed by using the mtm.coh
function, as demonstrated in Section A.5.1. The coherence can be computed between
weighted or unweighted spectrum estimates, depending on how the user has generated
the mtm objects.
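As an illustration of (A.24) and (A.25), the sketch below computes the multitaper MSC on the Fourier grid from unweighted eigencoefficients. The helpers `dpss_tridiag()`, `eigencoefs()` and `msc_mtm()` are hypothetical, not the package's mtm.coh, and the series are synthetic white noise: one pair nearly identical, one pair independent.

```r
# Hypothetical helper: unit-energy Slepian tapers (tridiagonal method).
dpss_tridiag <- function(N, NW, K) {
  W <- NW / N
  n <- 0:(N - 1)
  M <- diag(((N - 1 - 2 * n) / 2)^2 * cos(2 * pi * W))
  off <- n[-1] * (N - n[-1]) / 2
  M[cbind(1:(N - 1), 2:N)] <- off
  M[cbind(2:N, 1:(N - 1))] <- off
  eigen(M, symmetric = TRUE)$vectors[, 1:K, drop = FALSE]
}

# Eigencoefficients on the grid f = j/N: column k holds y_k(j/N).
eigencoefs <- function(x, V) mvfft(V * x)

# Multitaper MSC following (A.24) and (A.25).
msc_mtm <- function(xk, yk) {
  Sxy <- rowSums(xk * Conj(yk))
  Sxx <- rowSums(Mod(xk)^2)
  Syy <- rowSums(Mod(yk)^2)
  Mod(Sxy)^2 / (Sxx * Syy)
}

set.seed(7)
N <- 256
V <- dpss_tridiag(N, NW = 4, K = 7)
x <- rnorm(N)
y_related <- x + 0.1 * rnorm(N)   # nearly identical to x
y_indep <- rnorm(N)               # unrelated to x
xk <- eigencoefs(x, V)
msc_rel <- msc_mtm(xk, eigencoefs(y_related, V))
msc_ind <- msc_mtm(xk, eigencoefs(y_indep, V))
c(related = mean(msc_rel), independent = mean(msc_ind))
```

For independent series the MSC estimate from $K$ tapers has expectation $1/K$, so the second average should be small but not zero.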
In similar fashion to Kuo et al. (1990), we examine records of atmospheric CO2 from
the Mauna Loa observatory, Hawaii, USA, and monthly northern hemisphere temper-
ature anomalies from the Hadley Climate Research Unit, University of East Anglia,
UK. The records were obtained from NOAA (2011) and Hadley Climate Research Unit (2011) and were cleaned; the few missing points were linearly interpolated. Both records are of monthly data. As in the cited paper, we estimate the trend using a multitaper technique, with further details given in Section A.7.1. We prefer this method of trend estimation to the more traditional least-squares estimator because of the favourable frequency-domain properties of the result. The residuals after this trend estimate are
then passed through the spec.mtm function, and the resultant mtm objects are passed
to mtm.coh.
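Linear interpolation of a few isolated missing points, as done for these records, is a single call to approx() in base R. The values below are made up for illustration:

```r
# Made-up monthly values with isolated gaps (NA), for illustration only.
v <- c(10, 12, NA, 16, 15, NA, NA, 12)
idx <- seq_along(v)
ok <- !is.na(v)
# Linear interpolation between the nearest observed neighbours.
filled <- approx(x = idx[ok], y = v[ok], xout = idx)$y
filled   # 10 12 14 16 15 14 13 12
```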
Function 6. multitaperTrend
Function 7. mtm.coh
Function 8. plot.mtm.coh
The coherence plot shown here differs from Kuo et al. (1990) in that the first few
yearly harmonics have not been removed from the individual series before computing
the MSC. The two plots are similar in that their scales have been adjusted to be the
same, and both have been detrended using multitaperTrend. For more details on
the implications of this plot, see Kuo et al. (1990, pp. 711–713).
Complex demodulation is a tool to analyze both the phase and the amplitude of
a specific frequency component in a time series. A good reference on the general
application of this theory is given in Bloomfield (2000, pp. 97–131).
In general, one is interested in locating periodic phenomena that have a simple
representation in terms of cosine functions. However, even after the multitaper tech-
nique has been used to improve analysis, harmonic analysis has some limitations in
describing the signal component of the time series. To overcome this, the technique
of complex demodulation can be used to describe features of the data that could
be missed with standard multitaper harmonic analysis, or to confirm that no such
features exist.
The algorithm for complex demodulation involves two steps. First, a frequency
shift is applied to the data such that the frequency of interest is centred at zero, and
secondly, the centred frequency of interest is isolated using a low-pass filter. The
objective is to expose small changes in amplitude or phase of a specific approximately
periodic cycle.
Begin with an assumed model for $x_t$:

$$x_t = R_t \cos\big(2\pi(f_0 t + \phi_t)\big), \tag{A.26}$$

where $\{R_t\}$ is the amplitude, and $\{\phi_t\}$ represents the slowly varying phase of a harmonic component at frequency $f_0$. We will focus on isolating and graphing the slowly varying phase, $\phi_t$. To develop the method, consider the complex analog of (A.26):

$$\tilde{x}_t = R_t\, e^{2\pi i (f_0 t + \phi_t)}, \tag{A.27}$$

$$y_t = \tilde{x}_t\, e^{-2\pi i f_0 t} = R_t\, e^{2\pi i \phi_t}. \tag{A.28}$$

In this case, $R_t = |y_t|$ and $e^{2\pi i \phi_t} = y_t / |y_t|$. The new series $\{y_t\}$ is said to be obtained from $\{x_t\}$ by complex demodulation. Returning to Equation (A.26), the real form of $x_t$ can be written as

$$x_t = \frac{1}{2} R_t \left[ e^{2\pi i (f_0 t + \phi_t)} + e^{-2\pi i (f_0 t + \phi_t)} \right] \tag{A.29}$$

and is thus the sum of two complex terms, one similar to Equation (A.27) and the second its complex conjugate. We will use complex demodulation and filter the second (conjugate) component using convolution with a Slepian sequence. Multiplying (A.29) by $e^{-2\pi i f_0 t}$ gives

$$y_t = \frac{1}{2} R_t\, e^{2\pi i \phi_t} + \frac{1}{2} R_t\, e^{-2\pi i (2 f_0 t + \phi_t)}. \tag{A.30}$$
The first term is our desired component, from which we can easily extract $R_t$ and $\phi_t$, while the second term must be removed, which we will do using an appropriate low-pass filter. In this case we opt to convolve the data with a Slepian sequence with a relatively small bandwidth parameter, $W$, which acts as an effective low-pass filter. Using this method, the estimate of $y_t$ is formed by

$$y_t = \sum_{j=0}^{N_c-1} v_j^{(0)}(N_c, W)\, x_{t-j}\, e^{-i 2\pi f_0 (t-j)\Delta t}, \tag{A.31}$$

where $v_j^{(0)}(N_c, W)$ is a Slepian taper with an appropriately chosen time-bandwidth parameter, $N_c$ is the length of the convolution, representing the trade-off between time and frequency resolution, and $\Delta t$ is the time step. In previous equations where $\Delta t$ was omitted, it was assumed to be 1.
After passing the complex demodulate through a low-pass filter, isolating the
single component of interest, the result is smoothed to remove unwanted variation
due to noise. The choice of smoother is an open one, so for consistency we use a
short-length convolution Slepian filter. The parameter Nc for this filter should be
considerably less than the length of the series of interest and is typically chosen to be approximately $1/f_0$, i.e., the time-domain period of the frequency of interest.
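The two steps of complex demodulation, the frequency shift and the Slepian low-pass filter of (A.31), can be sketched in base R as follows. Assumptions: the helpers are hypothetical and are not the package's demod.dpss; the filter is the zeroth-order Slepian sequence rescaled to unit sum (unit gain at zero frequency); only outputs with a full window are kept; and the final smoothing step is omitted. The test series is a synthetic monthly cosine with amplitude 2 and constant phase 0.1 cycles at $f_0 = 1$ cycle/year.

```r
# Hypothetical helper: zeroth-order Slepian sequence of length Nc
# (tridiagonal method), rescaled to unit sum.
slepian0_lowpass <- function(Nc, NW) {
  W <- NW / Nc
  n <- 0:(Nc - 1)
  M <- diag(((Nc - 1 - 2 * n) / 2)^2 * cos(2 * pi * W))
  off <- n[-1] * (Nc - n[-1]) / 2
  M[cbind(1:(Nc - 1), 2:Nc)] <- off
  M[cbind(2:Nc, 1:(Nc - 1))] <- off
  v <- eigen(M, symmetric = TRUE)$vectors[, 1]
  v / sum(v)   # unit DC gain; also removes the arbitrary eigenvector sign
}

# Complex demodulation following (A.31): shift f0 to zero, then convolve.
complex_demod <- function(x, f0, dt, Nc, NW) {
  v <- slepian0_lowpass(Nc, NW)
  tt <- seq_along(x) - 1
  z <- x * exp(-2i * pi * f0 * tt * dt)            # frequency shift
  # y_t = sum_j v_j z_{t-j}, for t with a full window of Nc past samples
  y <- sapply(Nc:length(x), function(t) sum(v * z[t:(t - Nc + 1)]))
  # The retained term of (A.30) is (1/2) R_t exp(2 pi i phi_t)
  list(amplitude = 2 * Mod(y), phase_cycles = Arg(y) / (2 * pi))
}

dt <- 1 / 12
tt <- 0:599
x <- 2 * cos(2 * pi * (1 * tt * dt + 0.1))         # R = 2, phi = 0.1 cycles
dem <- complex_demod(x, f0 = 1, dt = dt, Nc = 120, NW = 3)
range(dem$amplitude)
range(dem$phase_cycles)
```

The factor of 2 applied to the amplitude reflects the $\tfrac{1}{2}R_t$ in the first term of (A.30).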
In this section, we examine the CET monthly means series originally compiled by Manley (1974), and updated in Parker et al. (1992) and Parker and Horton (2005). The analysis we follow was originally published in Thomson (1995), and examined the phase of the annual cycle in the monthly temperature series. The data consist of monthly mean temperatures for CET from 1659 to 2011 (the CET monthly series begins in 1659, whereas the CET daily series begins in 1772). We take t to represent the calendar year, with t = 1 representing January 1659, and take the time step, ∆t = 1/12. We also use 12-year blocks for time resolution. We note that in the original analysis, the author found no visible difference in the phase plot when correcting for month length.
Function 9. demod.dpss
R> library("multitaper")
R> data("CETmonthly")
R> nJulOff <- 1175
R> xd <- ts(CETmonthly[,"temp"],deltat=1/12)
R> demodYr <- demod.dpss(xd,centreFreq=1,NW=3,blockLen=120,
+ stepSize=1)
R> phase <- demodYr["phase"][["phase"]]
R> offsJul <- 3*360/365
R> phaseAdj <- phase
R> phaseAdj[1:nJulOff] <- phase[1:nJulOff] + offsJul
R> yr <- (time(xd)+1658)[1:length(phase)]
R> plot(yr, phaseAdj, type="l", lwd=2,
+ ylab="Phase of the Year in Degrees",
+ xlab="Gregorian calendar date")
R> lines(yr[1:nJulOff], phase[1:nJulOff], col="red", lty=3)
R> fit <- lm( phaseAdj ~ yr)
R> abline(fit, lty=2, col="blue")
The least-squares line fitted to the adjusted phase (Figure A.8) has a slope of 56.8 arcseconds per year, which is similar to the 51.1 arcseconds per year found in Thomson (1995) and slightly greater than the precession constant of 50.3 arcseconds per year. Note that Thomson's original paper used data only up to 1990, whereas this analysis uses data up to 2011. As noted in the paper, the phase begins to exhibit different characteristics after 1940.
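Since the phase is plotted in degrees, the fitted slope is converted to arcseconds per year by multiplying by 3600. A minimal sketch using a synthetic, noise-free phase series constructed to drift at 56.8 arcseconds per year:

```r
# Synthetic, noise-free phase (degrees) drifting at 56.8 arcseconds per year.
yr <- 1700:2000
phase_deg <- 140 + (56.8 / 3600) * (yr - 1700)
fit <- lm(phase_deg ~ yr)
slope_arcsec <- unname(coef(fit)["yr"]) * 3600  # 1 degree = 3600 arcseconds
round(slope_arcsec, 1)                          # 56.8
# Excess over the precession constant quoted above (50.3 arcseconds/year)
round(slope_arcsec - 50.3, 1)                   # 6.5
```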
There are four miscellaneous utility routines included in the multitaper package. Some
are referenced within other routines (including centre and dpss), while others are
not needed for the default operations of the package. The four routines are:
1. dpss: Generates Slepian tapers (discrete prolate spheroidal sequences) using the
tridiagonal method of Slepian (1978) and returns the eigenvectors and eigenval-
ues for the user-provided nw, k and N parameters.
2. centre: Takes a time series, and estimates the mean, using one of: the arith-
metic mean, the robust trimmed mean, or the Slepian taper-based mean method;
see Thomson (1982). The function returns the residuals after the computed
mean has been subtracted.
3. dropFreqs: Given an mtm or mtm.coh object, truncates all internal data objects to a frequency range specified by the user. Note that mtm.coh cannot act on objects that have first been passed through dropFreqs; it requires unmodified mtm objects.
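Two of these utilities can be sketched in base R to show what they compute. Both helpers below are hypothetical and are not the package's dpss or centre: `dpss_sketch()` uses the tridiagonal construction and then computes each taper's concentration $\lambda_k = \mathbf{v}_k^{\top} A\, \mathbf{v}_k$ in the band $(-W, W)$, and `centre_sketch()` offers the arithmetic and trimmed means alongside a Slepian-taper mean that we assume takes the form of the $f = 0$ case of (A.18).

```r
# Hypothetical sketch: Slepian tapers (tridiagonal method) and their
# concentrations lambda_k = v_k' A v_k, A[m, n] = sin(2 pi W (m-n)) / (pi (m-n)).
dpss_sketch <- function(N, NW, K) {
  W <- NW / N
  n <- 0:(N - 1)
  M <- diag(((N - 1 - 2 * n) / 2)^2 * cos(2 * pi * W))
  off <- n[-1] * (N - n[-1]) / 2
  M[cbind(1:(N - 1), 2:N)] <- off
  M[cbind(2:N, 1:(N - 1))] <- off
  V <- eigen(M, symmetric = TRUE)$vectors[, 1:K, drop = FALSE]
  d <- outer(n, n, "-")
  A <- ifelse(d == 0, 2 * W, sin(2 * pi * W * d) / (pi * d))  # sinc kernel
  list(v = V, eigen = colSums(V * (A %*% V)))
}

# Hypothetical sketch of centring: returns the estimated mean and residuals.
centre_sketch <- function(x, method = c("slepian", "mean", "trimmed"),
                          NW = 4, K = 5, trim = 0.1) {
  method <- match.arg(method)
  mu <- switch(method,
    mean = mean(x),
    trimmed = mean(x, trim = trim),
    slepian = {
      V <- dpss_sketch(length(x), NW, K)$v
      U0 <- colSums(V)          # U_k(0)
      yk0 <- colSums(V * x)     # eigencoefficients at f = 0
      sum(U0 * yk0) / sum(U0^2)
    })
  list(mu = mu, residuals = x - mu)
}

tp <- dpss_sketch(128, NW = 4, K = 5)
round(tp$eigen, 6)                               # concentrations near 1
x <- 5 + cos(2 * pi * 0.23 * (0:127))            # mean 5 plus an oscillation
c(slepian = centre_sketch(x)$mu, mean = centre_sketch(x, "mean")$mu)
```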
As detailed in Section A.3, the majority of the tools developed for multitaper spectral
analysis work on the raw (or weighted) eigencoefficients yk (f ). To provide functional-
ity for extending this package, an option is provided in the spec.mtm call that returns
the internal parameters of the spectrum estimation procedure. The option that pro-
duces these parameters is returnInternals = TRUE, and the parameters, for an mtm
object named test.mtm, are:
• test.mtm[["mtm"]][["eigenCoefs"]]
• test.mtm[["mtm"]][["eigenCoefWt"]]
and consist of the eigencoefficients and their associated weights. Using these
coefficients, and the related quantities (returned by default) of nw, k, nFFT and dpss
(the tapers), it is possible to extend the package to produce any desired multitaper-based tool. As numerous papers published since 1982 have suggested or developed such tools, the list of possible extensions is too long to give in full, but we do suggest several possibly useful options.
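As an example of the kind of extension meant here, the sketch below takes a matrix of eigencoefficients (frequencies in rows, tapers in columns) and returns the unweighted multitaper spectrum together with a delete-one jackknife variance of its logarithm. The function is hypothetical. To keep it runnable without the package, the eigencoefficients are generated from white noise with sine tapers; with the package one would presumably start from `test.mtm[["mtm"]][["eigenCoefs"]]`, after checking its layout.

```r
# Hypothetical extension: unweighted multitaper spectrum and delete-one
# jackknife variance of log S(f) from an eigencoefficient matrix ek.
jackknife_logspec <- function(ek) {
  P <- Mod(ek)^2                                # |y_k(f)|^2
  K <- ncol(P)
  S <- rowMeans(P)                              # unweighted estimate
  loo <- (rowSums(P) - P) / (K - 1)             # column k: taper k left out
  L <- log(loo)
  Lbar <- rowMeans(L)
  var_jk <- (K - 1) / K * rowSums((L - Lbar)^2) # jackknife variance
  list(spec = S, logvar = var_jk)
}

# Stand-in eigencoefficients: sine tapers applied to unit-variance white noise.
N <- 512
K <- 7
tt <- 1:N
V <- sapply(1:K, function(k) sqrt(2 / (N + 1)) * sin(pi * k * tt / (N + 1)))
set.seed(3)
x <- rnorm(N)
ek <- mvfft(V * x)[1:(N / 2 + 1), ]           # frequencies 0 to 1/2
out <- jackknife_logspec(ek)
summary(out$spec)
summary(out$logvar)
```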
A.8 Summary
The multitaper package implements the core functionality detailed (and implied)
by Thomson (1982), with refinements from Riedel and Sidorenko (1995) and Percival
and Walden (1993) among many others. Discrete prolate spheroidal sequences are
generated in an efficient and accurate fashion and are provided as the default tapers for
the multitaper spectrum estimation routine. Approximately unbiased adaptive sine
tapers are also available, and the spectrum associated with them is easily produced. A
number of core extensions, including the harmonic F -test, the magnitude-squared co-
herence, and jackknife estimates of significance, are also provided. Finally, high-level
utility routines designed to make working with data easier have been implemented
and are included in the package.
[Plot: spectrum on a logarithmic scale against Frequency in nHz, with 1 cycle/year marked.]
Figure A.3: Spectrum of CET series, zoomed to the region around 1 cycle/year (31.69 nHz = 31.69 × 10⁻⁹ Hz), with 95% jackknifed confidence intervals.
[Plot: Harmonic F-test Statistic against Frequency in nHz, with a 99.9% significance line.]
Figure A.4: Harmonic F-test statistic for the CET series, zoomed to low frequencies using the function dropFreqs.
[Plot: CO2 in ppm against Date.]
Figure A.5: CO2 concentration time series in parts per million (ppm) with trend lines fitted.
[Plot: Temp. in Celsius against Date.]
Figure A.6: Temperature deviations time series in degrees Celsius with trend lines fitted.
[Plot: Magnitude Squared Coherence (left axis) and Arctanh Transform of MSC (right axis) against Frequency in cycles/year.]
Figure A.7: MSC between monthly CO2 measurements from Mauna Loa and the global temperature series during 1958–2007. The arctanh transform normalizes the MSC, and each integer value on this scale represents approximately one standard deviation (Thomson and Chave, 1991b).
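The normalizing transform used for the right-hand axis of Figure A.7 can be sketched as follows. We assume the common form $q = \sqrt{2K-2}\,\tanh^{-1}|\hat{C}_{xy}|$, under which $q$ is approximately normal with unit variance. The exact scaling used for the figure is not stated above, and $K = 7$ is an illustrative value.

```r
# Assumed normalizing transform for MSC with K tapers: atanh(|C|) treated as
# approximately normal with variance 1/(2K - 2), so scaling by sqrt(2K - 2)
# gives approximate standard-deviation units.
msc_to_sd_units <- function(msc, K) sqrt(2 * K - 2) * atanh(sqrt(msc))
sd_units_to_msc <- function(q, K) tanh(q / sqrt(2 * K - 2))^2

K <- 7                               # illustrative number of tapers
msc <- c(0.05, 0.5, 0.7, 0.9)
q <- msc_to_sd_units(msc, K)
round(q, 2)
sd_units_to_msc(1:3, K)              # MSC values at one, two and three units
```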
[Plot: Phase of the Year in Degrees against Gregorian calendar date.]
Figure A.8: CET monthly phase, thick (black) line. The dotted red line indicates the phase before the calendar correction, and the dashed blue line shows the least-squares line with a slope of 56.8 arcseconds per year.