
Resenhas IME-USP 2000, Vol. 4, No. 4, 383-403.

Multivariate Analysis in Vector Time Series

Pedro Galeano and Daniel Peña

Abstract: This paper reviews the applications of classical multivariate techniques for discrimination, clustering and dimension reduction for time series data. It is shown that the discrimination problem can be seen as a model selection problem. Some of the results obtained in the time domain are reviewed. Clustering time series requires the definition of an adequate metric between univariate time series, and several possible metrics are analyzed. Dimension reduction has been a very active line of research in the time series literature, and the dynamic principal components or canonical analysis of Box and Tiao (1977) and the factor model as developed by Peña and Box (1987) and Peña and Poncela (1998) are analyzed. The relation between the nonstationary factor model and the cointegration literature is also reviewed.

Key words: Canonical Analysis, Cluster Analysis, Classification, Dynamic Factor Model, Discriminant Analysis, Principal Components.

1. Introduction

Standard multivariate analysis includes, among others, procedures for discrimination among several populations, classification (pattern recognition) of multivariate data into groups, either hierarchical or not, and dimension reduction. These problems are also important in multivariate time series. The discrimination problem appears as follows. Suppose that we know that a set of time series can be generated by one of several possible models, M_i, i = 1, ..., k, and we assume that these models are known. Now, we observe a new time series and the problem is to decide which of the models, M_i, has generated this time series. This problem is an important area of research in different disciplines. For instance, in seismology it is important to be able to discriminate between data from earthquakes and nuclear explosions (Dargahi-Noubary, 1992, Dargahi-Noubary and Laycock, 1981, Kakizawa et al., 1998, Shumway and Unger, 1974). In medicine the information from electroencephalographic time series (EEG) can be used for discriminating between different stages of sleep (Alagon, 1989, Gersch et al., 1979). In engineering it is important to discriminate between a pattern generated by a signal plus noise and a pattern generated by noise alone, for example, to detect a radar signal for determining the position of a moving target. In economics we are interested in classifying the economic situation as expansion or depression by considering the values of some time series economic indicators. Finally, in business a company can be classified as successful or in potential trouble by looking at some time series indicators of its economic activity.
The problem of making clusters of a set of time series also appears in many scientific fields, but most of the published examples of cluster analysis in time series have been made with environmental data. We have time series from different locations and we want to make groups of locations with the same behavior. See for instance Bohte et al. (1980), Cowpertwait and Cox (1992), Gantert (1994), Walden (1994) and Macchiato et al. (1995). There are several problems not completely solved in the application of cluster analysis to time series. The standard approach for splitting a sample of multivariate data into clusters is to assume that the multivariate observations have been generated by a mixture of multivariate normal distributions with different means and covariance matrices and unknown mixture probabilities. If the number of populations were known, the parameters could be estimated by the EM algorithm or by MC² (Markov chain Monte Carlo) Bayesian methods. As the number of populations is unknown, a model selection procedure, such as the BIC or AIC criterion, is applied to select the number of populations involved. The generalization of this approach to time series is to assume that the data have been generated by some set of possible multivariate time series models or data generating processes, M_1, ..., M_k, with unknown probabilities, and then the clustering problem is closely related to the discrimination problem. However, this approach has not yet been fully explored.
The problem of dimensionality reduction is very important for dynamic data since, for vector ARMA models as well as for simultaneous equations econometric models, the number of parameters to estimate grows rapidly with the number of observed variables. An interesting extension of the idea of principal components to time series is the canonical analysis of Box and Tiao (1977). Instead of finding linear combinations of maximum (or minimum) variability, these authors studied the problem of finding linear combinations of maximum (or minimum) predictability. They showed that the canonical variables are useful for understanding and simplifying the dynamic structure present in the vector of time series. Factor analysis of time series was studied by Geweke and Singleton (1981), Brillinger (1981), Engle and Watson (1981), Molenaar (1985), Peña and Box (1987), Molenaar et al. (1992) and Peña and Poncela (1998), among others. An alternative approach to dimension reduction is the reduced rank approach of Velu et al. (1986) and Ahn and Reinsel (1988). In the nonstationary case, estimating the nonstationary factors is equivalent to testing for cointegration in the econometrics field (whose vast literature we do not pretend to review here), since the number of cointegration relations among the components of a vector of time series is the dimension of the vector minus the number of nonstationary common factors (see Escribano and Peña, 1994). An alternative useful approach for model simplification is the scalar components approach of Tiao and Tsay (1989). Finally, the state space approach to time series includes procedures for dimension reduction (Hannan and Deistler, 1988, Aoki, 1990).
This paper describes some of the developments of these procedures in the time domain. The reader interested in developments in the frequency domain is advised to read chapter 5 of Shumway and Stoffer (2000), which contains a good review of this field. The article is organized as follows. In the next section the problem of discrimination in time series is presented. The standard discriminant analysis is seen as a model fitting exercise, and it is shown that in practice, when the parameters are unknown, discriminant analysis for time series is closely related to the model selection problem, which has been the subject of an important area of research in time series. In Section 3 we present the clustering problem and discuss some of the measures of distance among time series that have been proposed in the literature. Some suggestions for further research in this field are also included. As the literature on model simplification and dimension reduction is very large, we have decided to consider in Section 4 only the extensions of standard multivariate methods. Thus, in that section we present the extension of the principal component idea of Box and Tiao (1977) and the dynamic factor model. The relationship between both approaches is discussed and we also relate the nonstationary factor model to the cointegration literature. Section 5 presents some concluding remarks.

2. Discrimination in Time Series

2.1 Linear Discrimination


Discriminant analysis has been mainly studied for Gaussian processes. The classical approach is as follows. Suppose a series with T observations, denoted by x = (x_1, ..., x_T)', which follows a Gaussian process with vector of marginal means μ_j = (μ_{j1}, ..., μ_{jT})' for j = 1, 2. Assume that the process x − μ_j is a zero mean stationary process with covariance matrix Σ_j = {σ_j(s − t) : s, t = 1, ..., T}. Thus, under the hypothesis H_j, x ~ N_T(μ_j, Σ_j), for j = 1, 2. Then, the probability density function of this process is

  p(x | H_j) = (2π)^{−T/2} |Σ_j|^{−1/2} exp{−(1/2)(x − μ_j)' Σ_j^{−1} (x − μ_j)}.   (1)
The classical approach assumes that both covariance matrices are equal, Σ_1 = Σ_2 = Σ, but the means are unequal. Thus, we assume that the difference between the two marginal means is due to some deterministic function. For instance, if μ_{ji} = b_{0j} + b_{1j} i, the series have different deterministic trends, and if b_{1j} = 0 the series have different marginal means. The Neyman-Pearson lemma for the hypotheses H_1 : x ∈ M_1 versus H_2 : x ∈ M_2 leads to the following rule for accepting H_1:

  p(x | H_1) / p(x | H_2) > K,   (2)

for some value K that takes into account the probabilities of misclassifying the time series. Assuming that the costs of misclassification are the same and that the a priori probabilities of each model are also the same, we will classify the observation in the model that has the maximum likelihood. This is equivalent to accepting the hypothesis H_1 if
  (x − μ_2)' Σ^{−1} (x − μ_2) > (x − μ_1)' Σ^{−1} (x − μ_1),

that is, if we denote by D_j = (x − μ_j)' Σ^{−1} (x − μ_j) the Mahalanobis distance between the data and the vector of marginal means, x is classified in the first population if D_2 > D_1. An alternative interpretation of this rule can be obtained by writing this equation as

  a'x > (1/2) a'(μ_1 + μ_2),   (3)

which implies that the scalar measure v = a'x is built, where

  a = Σ^{−1}(μ_1 − μ_2).

Calling m_1 = a'μ_1 and m_2 = a'μ_2, the series is classified in M_1 if v > (m_1 + m_2)/2. If we denote by D_{12} the Mahalanobis distance between the means of both populations,

  D_{12} = (μ_1 − μ_2)' Σ^{−1} (μ_1 − μ_2),   (4)

then the linear discriminant function, v, is normally distributed with mean m_1 under H_1 and m_2 under H_2. The variance is D_{12} in both cases. Thus we classify in M_1 if the scalar variable v is closer to m_1 than to m_2.
Note that this rule, obtained by the likelihood ratio test, is equivalent to fitting the time series by both models and then choosing the model that leads to a smaller residual variance. This result is clear from (1): e_j = x − μ_j, j = 1, 2, are the residuals from the deterministic fit x = μ_j, and a_j = Σ^{−1/2} e_j corresponds to the residuals taking into account the stationary structure. Note that the errors a_j have an identity covariance matrix. Thus

  D_j = e_j' Σ^{−1} e_j = a_j' a_j,   (5)

and minimum Mahalanobis distance is equivalent to minimum residual sum of squares. Another way to look at this property is by noting that if e_j follows a zero mean linear process, the likelihood f(e_j) can be written, using the prediction error decomposition, as a product of the densities of the one-step ahead forecasting errors, so the likelihood will only depend on these one-step ahead forecasting errors, which are equal to the residuals a. Note that for linear time series we can write the zero mean process π(B) e_t = ε_t, where π(B) = 1 − π_1 B − π_2 B² − ..., as Πe = ε, where Π is the T × T lower triangular matrix with ones on the diagonal, element (i, j) equal to −π_{i−j} for i > j, and zeros above the diagonal, so that its last row is (−π_{T−1}, ..., −π_1, 1).
Suppose that the covariance matrix of ε is σ²I. Then, calling Σ the covariance matrix of e, we have

  Σ = σ² (Π'Π)^{−1},

and, therefore,

  Σ^{−1} = (1/σ²) Π'Π,

and

  e' Σ^{−1} e = (1/σ²) e' (Π'Π) e = (1/σ²) Σ_{t=1}^{T} ε_t² = Σ_{t=1}^{T} a_t²,

in agreement with (5). Thus, discriminant analysis can be viewed as assigning the observed time series x to the model (population) that, when fitted to the time series, produces the smallest one step ahead squared forecast error.
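As an illustration of this classification rule, the following Python sketch (a hypothetical example, not code from the paper) classifies a series between two known Gaussian models that share the covariance matrix of a stationary AR(1) process but differ in their deterministic trend; it simply compares the Mahalanobis distances D_1 and D_2, i.e., the whitened residual sums of squares.

```python
import numpy as np
from scipy.linalg import solve, toeplitz

# Hypothetical setup: both models share the covariance of a stationary AR(1)
# process but differ in their deterministic trend (marginal means).
T = 200
phi, sigma2 = 0.7, 1.0
acov = sigma2 / (1 - phi**2) * phi ** np.arange(T)   # autocovariances
Sigma = toeplitz(acov)                               # common covariance matrix

t = np.arange(T)
mu1 = 0.00 * t          # model M1: zero marginal mean
mu2 = 0.02 * t          # model M2: linear deterministic trend

def mahalanobis(x, mu, Sigma):
    """D = (x - mu)' Sigma^{-1} (x - mu), the whitened residual sum of squares."""
    e = x - mu
    return float(e @ solve(Sigma, e, assume_a="pos"))

# Simulate one series from M2 and classify it.
rng = np.random.default_rng(0)
x = mu2 + rng.multivariate_normal(np.zeros(T), Sigma)

D1, D2 = mahalanobis(x, mu1, Sigma), mahalanobis(x, mu2, Sigma)
print("classified in", "M1" if D1 < D2 else "M2")   # smaller distance wins
```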

2.2 Unequal Covariance Matrices


A more relevant case in time series discrimination is when the covariance matrices are unequal. Suppose that we want to discriminate between two time series models. Both imply Gaussian populations but with different covariance matrices and, for simplicity, we will assume that the marginal means are in both cases equal to 0. Then, the rule (2) says that we accept H_1 if

  Q(x) = x'(Σ_2^{−1} − Σ_1^{−1})x > K,

which is a quadratic form. This discriminant rule has a simple interpretation in terms of prediction errors because, as before, x'Σ_j^{−1}x is the residual sum of squares of the fitted model. Thus, the likelihood ratio test leads to fitting the observed time series with both models and choosing the one with the smallest one step ahead forecast error.
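A minimal numerical sketch of the quadratic rule follows (a hypothetical Python illustration; the AR(1) models and the threshold K = 0 are assumptions made for the example, since the exact K depends on the misclassification costs and the log-determinant terms of the likelihood ratio).

```python
import numpy as np
from scipy.linalg import toeplitz, solve

def ar1_cov(phi, sigma2, T):
    """Covariance matrix of a stationary AR(1) process with parameter phi."""
    return toeplitz(sigma2 / (1 - phi**2) * phi ** np.arange(T))

def quadratic_score(x, Sigma):
    """x' Sigma^{-1} x, the residual sum of squares under the model."""
    return float(x @ solve(Sigma, x, assume_a="pos"))

T = 300
Sigma1 = ar1_cov(phi=0.2, sigma2=1.0, T=T)   # model M1
Sigma2 = ar1_cov(phi=0.9, sigma2=1.0, T=T)   # model M2

rng = np.random.default_rng(1)
x = rng.multivariate_normal(np.zeros(T), Sigma2)   # series generated by M2

# Accept H1 when Q(x) = x'(Sigma2^{-1} - Sigma1^{-1})x exceeds the threshold K;
# K = 0 is used here purely for illustration.
Q = quadratic_score(x, Sigma2) - quadratic_score(x, Sigma1)
print("classified in", "M1" if Q > 0 else "M2")
```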
An alternative interesting interpretation of the discriminant rule is that it assigns the series to the model producing the smallest interpolation error. The best linear interpolator of a time series is given by (see for instance Peña and Maravall, 1991)

  x̂_s = E[x_s | x_t, t ≠ s] = − Σ_{i=1}^{∞} ρ_i^D (x_{s−i} + x_{s+i}),

where the ρ_i^D are the coefficients of the dual autocorrelation function of the model, given by

  ρ^D(B, F) = σ² π(B) π(F) / V_D,

with π(B) the autoregressive form of the model, V_D = σ² Σ_{i=0}^{∞} π_i², and F = B^{−1} the forward operator. Then, Galeano and Peña (2000) showed that

  x' Σ_j^{−1} x = (x − x̂)'(x − x̂) / MSE(x̂),

where MSE(x̂) denotes the mean square interpolation error. That is, the series x_t is assigned to the model that produces the smallest interpolation error or, in other words, the model that is better fitted to the data.
As the distribution of Q(x) is difficult to find, Shumway (1982) suggests that under H_j, j = 1, 2, and for large values of T, Q(x) can be approximated by a normal distribution with mean tr((Σ_2^{−1} − Σ_1^{−1})Σ_j) and variance 2·tr(((Σ_2^{−1} − Σ_1^{−1})Σ_j)²), where tr denotes the trace. The main drawback of this method is that the eigenvalues of the matrices (Σ_2^{−1} − Σ_1^{−1})Σ_j, which are very large, must be obtained numerically, which makes a numerical solution very difficult to obtain.
When the covariance matrices are different, the optimum discriminant rule is not linear. An alternative approach in these situations is to obtain a good linear discriminant rule according to some criteria. This is the idea of admissible linear procedures introduced by Anderson and Bahadur (1962). For Gaussian populations, under H_i, a linear discriminant rule, a'x, has a univariate normal distribution with mean a'μ_i and variance a'Σ_i a. Therefore, the probabilities of misclassifying an observation are given by

  Pr(a'x < K | x ∈ M_1) = Φ((K − a'μ_1) / √(a'Σ_1 a)),
  Pr(a'x > K | x ∈ M_2) = Φ((a'μ_2 − K) / √(a'Σ_2 a)),

where Φ(x) is the cdf of the N(0, 1) distribution. The objective is to make these values as small as possible, which is equivalent to making the values y_1 = (K − a'μ_1)/√(a'Σ_1 a) and y_2 = (a'μ_2 − K)/√(a'Σ_2 a) small. The set of desirable procedures are those that: (1) minimize the probability of one error when the other is specified, or (2) minimize the maximum probability of error, or (3) minimize the probability of error when a priori probabilities of the two populations are specified. The solutions to these problems are the set of admissible linear procedures. The set of solutions that minimizes y_1 for each given y_2 is characterized by

  a = (t_1 Σ_1 + t_2 Σ_2)^{−1} (μ_1 − μ_2),   (6)

where the values t_1 and t_2 verify K = a'μ_1 + t_1 a'Σ_1 a = a'μ_2 − t_2 a'Σ_2 a.
Information measures usually lead to admissible linear procedures. For instance, Kullback (1959) considered the Kullback-Leibler discrimination information for discriminating in favor of H_1 over H_2, I(1 : 2, a'x). Another useful measure is the divergence, which for discrimination is defined by

  J(1 : 2, a'x) = I(1 : 2, a'x) + I(2 : 1, a'x).

The values of a that maximize I(1 : 2, a'x), I(2 : 1, a'x) or J(1 : 2, a'x) are of the form Σ_1 a − λ Σ_2 a = γ δ, where δ = (μ_1 − μ_2), for some values of the scalars λ and γ. As a consequence of this, the procedures based on the Kullback-Leibler information and on the divergence are admissible linear procedures. Chaudhuri et al. (1991) obtain linear discriminant procedures through the maximization of the Bhattacharyya distance for Gaussian processes with unequal covariance matrices.
If (Ω, B, ν) is a measure space and P is the set of all the probability measures on B which are absolutely continuous with respect to ν, then the Bhattacharyya distance between two probability measures with density functions p_1 and p_2 belonging to P is defined by

  −ln ρ(p_1, p_2) = −ln ∫_Ω √(p_1 p_2) dν.

Under H_1 : x ~ N(μ_1, Σ_1) and H_2 : x ~ N(μ_2, Σ_2), the linear discriminant function obtained by maximizing −ln ρ(p_1, p_2) is

  a'x = (μ_1 − μ_2)' (Σ_1 − Σ_2)^{−1} x.

Chaudhuri (1992) considered the problem of classifying a complex normal time series through the maximization of the previous distance.
When the parameters of the models are unknown they must be estimated from the data. Although in principle we can plug in the estimates and use the same criteria as in the known parameter case, this is not a good solution when the numbers of parameters in both models are very different. For instance, suppose that one of the possible models is an AR(1) and the other is an AR(5) with four complex roots, that is, we are checking whether an observed time series presents pseudo-cycles. Then, if we use the plug-in procedure of obtaining the estimates and introducing them in the discriminant function, we will always find that the model with the larger number of parameters provides a better fit. Thus we have to take into account the difference between in-sample fit and out-of-sample forecasting.
Several criteria have been proposed for selecting time series models since the seminal work of Akaike (1969, 1974). Among them are the Bayesian information criterion BIC of Schwarz (1978) and Akaike (1979), the penalty methods of Hannan and Quinn (1979), the predictive least squares criterion of Rissanen (1986), extended by Lai and Lee (1997), and the modified AIC of Hurvich and Tsai (1989) and Cavanaugh and Shumway (1997). Surveys on the performance of these criteria for ARMA order selection can be found in Bhansali (1993) and Postcher and Srinivasan (1994).
These criteria have the general form

  C = −2 (log maximized likelihood) + f(number of parameters),   (7)

where the function f depends on the criterion. For instance, for ARMA models the AIC of Akaike is

  AIC = n log σ̂² + 2(p + q),

where (p + q) is the number of parameters in the model. The BIC criterion due to Schwarz (1978) is

  BIC = −2 (log maximized likelihood) + (log n)(number of parameters).   (8)

This last criterion has been shown to have a very good performance in many model selection problems.
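In practice, this model-selection view of discrimination can be sketched as follows (a hedged Python illustration using statsmodels; the candidate orders and the simulated series are hypothetical): fit each candidate model to the observed series and assign it to the model with the smallest BIC.

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical candidate classes: an AR(1) and an AR(5) model.
candidate_orders = [(1, 0, 0), (5, 0, 0)]

rng = np.random.default_rng(2)
# Simulate a series with pseudo-cyclical AR(2) behaviour as the "observed" data.
x = np.zeros(300)
for t in range(2, 300):
    x[t] = 1.4 * x[t - 1] - 0.8 * x[t - 2] + rng.normal()

# Fit every candidate and keep its BIC = -2 log L + (log n) * (number of parameters).
bics = []
for order in candidate_orders:
    fit = ARIMA(x, order=order, trend="n").fit()
    bics.append(fit.bic)

best = candidate_orders[int(np.argmin(bics))]
print("series assigned to the model with order", best)
```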
Some Bayesian approaches have been proposed that do not adopt a formal Bayes rule via a loss function. For example, Broemeling and Son (1987) consider how to assign an observed time series to one of several possible autoregressive sources with a common known order and unknown parameters and error variance. Using a vague prior density for the parameters and the variance, the observed time series data is assigned to one class by using the marginal posterior mass function of a classification vector,

  λ = (λ_1, ..., λ_k)',

with k mass points (1, 0, ..., 0)', ..., (0, ..., 0, 1)'. The realization is assigned to the process i if the posterior mass function of λ has its largest value at the i-th mass point. Marco et al. (1988) consider the case of different autoregressive classes. The data is assigned to the class k that has the greatest predictive probability.
Finally, there exist other approaches for discrimination in which discriminant functions are not used. For instance, Kedem and Slud (1982) proposed transforming a stationary time series into binary arrays that retain only the signs of the j-th difference series. These binary series are used for discriminating among different models. Li (1996) proposed a generalization of this method through the use of parametric filtering, i.e., using a family of filters indexed by a parameter. The series is filtered and the information provided by the autocorrelation function is used to discriminate the series into different models.

3. Clustering Time Series

Suppose that we have a large set of time series following different models. In a nonparametric approach each series is considered as a point in ℝ^T, where T is the length of the series. A straightforward generalization of the standard cluster methods is to obtain groups of series by looking at the distances between these points in the space. In order to identify groups we can work directly with a distance metric in ℝ^T, or we can try to work in a smaller space by projecting the points according to some optimality criterion. This criterion should be related to the possibility of identifying clusters in the projected cloud of points.
A parametric approach would proceed by first fitting time series models to the data and then representing the series by the vector of estimated parameters. If the dimension of the vector of parameters is p, these parameter vectors will be points in ℝ^p, and again we can try to find points that are close in this space. In both cases we can define a measure of distance and then use a standard k-means type algorithm. Thus, an important first step is obtaining an appropriate metric for measuring the similarity between points.
In the parametric ARIMA approach each time series is represented by its vector of model parameters. For instance, if the series are fitted by

  φ_i(B) x_{it} = θ_i(B) ε_{it},   i = 1, ..., k,

where φ_i(B) = 1 − φ_{i1}B − ... − φ_{ip}B^p and θ_i(B) = 1 − θ_{i1}B − ... − θ_{iq}B^q, we can represent each series by its autoregressive and moving-average parameters, including all of them in a vector

  β_i = (φ_{i1}, ..., φ_{ip}, θ_{i1}, ..., θ_{iq})',

and then defining a measure of distance by

  D(x_i, x_j) = (β_i − β_j)' Λ (β_i − β_j),

where Λ is an appropriate matrix defining the metric which, in particular, can be the identity. Bruce and Martin (1989) consider a similar measure of distance between ARIMA models. However, as indicated by Peña (1989), this measure has three main problems. The first is that it cannot compare ARIMA models with different degrees of differencing. The second is that it does not take into account the possibility of cancellation between the AR and MA parts. For instance, the model (1 − 0.9B)x_t = (1 − 0.89B)ε_t is almost exactly the same as the model x_t = ε_t, whereas with this metric both will seem very different. The third is that it does not allow for the duality between the AR and MA forms. A more convenient measure is defined through the comparison of the coefficients of the polynomial π(B), obtained from θ(B)π(B) = φ(B)(1 − B)^d.
Piccolo (1990) introduced a metric for ARIMA models that can be used for classifying and clustering time series. Let x_t be a zero mean stochastic process following an ARIMA(p, d, q) model in the usual notation, φ(B)x_t = θ(B)ε_t, where ε_t is Gaussian white noise. When x_t is invertible, it is possible to define the autoregressive operator π(B) = θ^{−1}(B)φ(B) = 1 − π_1 B − π_2 B² − .... The coefficients of π(B) convey all the usual information about the stochastic structure, given initial values and the order of the process. If L denotes the set of invertible processes, we can define a measure of structural diversity between processes in L by comparing their respective π sequences. The metric on L is defined by the distance

  d(x, y) = { Σ_{j=1}^{∞} (π_{j,x} − π_{j,y})² }^{1/2},
which always exists for every x, y ∈ L, the zero element being the sequence (0, 0, ...). We notice that a dual metric can be defined by

  d(x, y) = { Σ_{j=1}^{∞} (ψ_{j,x} − ψ_{j,y})² }^{1/2},

where the ψ sequence defines the MA(∞) operator ψ(B) = φ(B)^{−1}θ(B) = π^{−1}(B). However, this metric cannot be computed for integrated processes.
This definition of distance allows applications to clustering algorithms through the study of similarities between time series. Piccolo (1990) applies the following method to study a possible similarity in the behavior of industrial production series in different sectors. The algorithm starts by defining a model for each series considered and, based on these models, the distances between all the time series are computed. Then a dendrogram based on the similarities is built, which gives the different clusters formed by the models. An alternative procedure, also used in the paper, is the classical solution of multidimensional scaling applied to the distance matrix previously obtained, that is, obtaining a configuration of points in a convenient space where the interpoint distances reproduce the similarity matrix. The results found in the two ways are very similar.
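A minimal sketch of this clustering procedure in Python follows (an illustration under assumed settings; approximating the π weights by a fitted long autoregression and using average linkage are assumptions, not Piccolo's exact computations).

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def pi_weights(series, max_lag=10):
    """Truncated AR(infinity) coefficients, approximated here by a fitted AR(max_lag)."""
    fit = AutoReg(series, lags=max_lag, trend="n").fit()
    return fit.params  # (pi_1, ..., pi_max_lag)

def piccolo_distance(series_list, max_lag=10):
    """Pairwise distances d(x, y) = sqrt(sum_j (pi_{j,x} - pi_{j,y})^2), truncated at max_lag."""
    pis = np.array([pi_weights(s, max_lag) for s in series_list])
    k = len(series_list)
    d = np.zeros((k, k))
    for i in range(k):
        for j in range(i + 1, k):
            d[i, j] = d[j, i] = np.sqrt(np.sum((pis[i] - pis[j]) ** 2))
    return d

# Hypothetical data: three AR(1) series, two of them with similar dynamics.
rng = np.random.default_rng(3)
def ar1(phi, T=400):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

series = [ar1(0.8), ar1(0.75), ar1(-0.5)]
D = piccolo_distance(series)
labels = fcluster(linkage(squareform(D), method="average"), t=2, criterion="maxclust")
print(labels)  # series with similar pi sequences end up in the same cluster
```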
Nonparametric clustering techniques for time series have been less studied because of the difficulties of defining a general measure of distance between stationary time series sequences. In order to illustrate a possible procedure, suppose that we have n zero mean and unit variance stationary time series sequences, x_1, ..., x_n. We assume first that the data have been centered and scaled. A possible distance metric among the points x_i is the Euclidean metric. However, this metric is invariant to transformations which modify the order of the observations over time in the two series that are compared and, therefore, it does not take into account the correlation structure of the stationary data. That is, given the original set of time series x_i = (x_{i1}, ..., x_{iT}) for i = 1, ..., n, if we now build a new set of time series sequences x*_i = (x*_{i1}, ..., x*_{iT}) by using the same permutation of the time observations for all the series, the Euclidean distances between the elements in the second set are identical to those in the first, whereas the correlation structure of the second set can be arbitrarily distorted. Thus, the Euclidean distance does not take into account the autocorrelation structure.
The distance measure to be used depends on the kind of similarities we are interested in. We may be interested in (a) finding series with a similar correlation structure or (b) finding series with a similar noise structure. In case (a) a straightforward measure of distance is to compute the autocorrelation coefficients r_i = (r_i(1), ..., r_i(h)), for some h such that r_i(j) ≈ 0 for j > h, and then use

  D(x_i, x_j) = (r_i − r_j)' W_r (r_i − r_j),

for some weighting matrix W_r that can be used to give weights to the coefficients that decrease with the lag. This measure is related to the parametric approach, as the parameters of the autoregressive approximation are computed from the autocorrelation coefficients.
For (b), a model is fitted to each series and the residuals ε̂_i are obtained. Then a measure of distance between them is built by

  D(x_i, x_j) = (ε̂_i − ε̂_j)' W (ε̂_i − ε̂_j),

where, for instance, the matrix W can be used to give more weight to the recent values than to the oldest values in the time series. Both procedures can be combined to define a measure of distance that takes into account both sources of variability by

  D(x_i, x_j) = λ_1 (r_i − r_j)' W_r (r_i − r_j) + λ_2 (ε̂_i − ε̂_j)' W (ε̂_i − ε̂_j),

where λ_i, i = 1, 2, are normalizing constants. This idea does not seem to have been explored yet in the literature.
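The autocorrelation-based distance in case (a) is straightforward to compute; the following Python sketch (a hypothetical example, with W_r taken as the identity matrix) illustrates it.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def acf_distance(xi, xj, h=10, Wr=None):
    """D(x_i, x_j) = (r_i - r_j)' W_r (r_i - r_j) using autocorrelations up to lag h.
    Wr defaults to the identity; a diagonal Wr with decreasing entries would
    down-weight higher lags."""
    ri = acf(xi, nlags=h, fft=True)[1:]   # drop lag 0, which is always 1
    rj = acf(xj, nlags=h, fft=True)[1:]
    if Wr is None:
        Wr = np.eye(h)
    diff = ri - rj
    return float(diff @ Wr @ diff)

# Hypothetical example: two AR(1) series with similar dynamics and one that differs.
rng = np.random.default_rng(4)
def ar1(phi, T=500):
    x = np.zeros(T)
    for t in range(1, T):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

x1, x2, x3 = ar1(0.8), ar1(0.75), ar1(-0.5)
print(acf_distance(x1, x2), acf_distance(x1, x3))  # the first distance should be smaller
```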

4. Dimension Reduction

4.1 Canonical Analysis


The canonical analysis of time series was introduced by Box and Tiao (1977) and can be considered as a principal component analysis of time series. This work was very important because (1) it leads to a clear solution of the dimension reduction problem in terms of prediction, and (2) it introduces, for the first time, the idea that linear combinations of nonstationary time series can be stationary, that is, the idea of cointegration.
Suppose an m × 1 vector x_t following a stationary VAR(p) model

  φ(B) x_t = ε_t.

We can always write the orthogonal decomposition

  x_t = x̂_{t−1}(1) + ε_t,

where x̂_{t−1}(1) is the one step ahead prediction. Corresponding to this decomposition we can also split the covariance matrix, E[x_t x_t'] = Γ_x(0), as

  Γ_x(0) = Γ̂_x(0) + Σ,

where E(ε_t ε_t') = Σ and E[x̂_{t−1}(1) x̂_{t−1}(1)'] = Γ̂_x(0). We are interested in finding a linear combination of x_t,

  z_{1t} = m'x_t,

such that it has maximum predictability. The variance of this linear combination is m'Γ_x(0)m, and this variance is decomposed into an explained variability, m'Γ̂_x(0)m, and a residual variability, m'Σm. We want to maximize

  λ = m'Γ̂_x(0)m / m'Γ_x(0)m,
and the value of m that maximizes this ratio is the eigenvector associated with the largest eigenvalue of the matrix obtained as the product of the inverse of the matrix of total variability and the matrix of explained variability,

  Q = Γ_x(0)^{−1} Γ̂_x(0).

The procedure can be extended to find other linear combinations by choosing as m the ordered eigenvectors of the matrix Q. Thus, in practice the canonical decomposition consists in finding the eigenvalues and eigenvectors of the matrix Q,

  Q m_i = λ_i m_i;

the eigenvectors provide the required linear combinations and the eigenvalues the predictability of these linear combinations. Building the matrix M = [m_1 ··· m_m], where λ_1 ≥ λ_2 ≥ ... ≥ λ_m, and the transformation

  z_t = M'x_t,

a new vector of time series is obtained with components ordered from most to least predictable. The components are contemporaneously uncorrelated because it is easy to show that the matrices M'Γ_x(0)M and M'ΣM are both diagonal.
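A hedged numerical sketch of the Box-Tiao decomposition follows (illustrative Python; the VAR(1) fit with statsmodels and the simulated data are assumptions, not the authors' computations): estimate Γ_x(0) and the innovation covariance Σ, form Q = Γ_x(0)^{−1}(Γ_x(0) − Σ), and order its eigenvectors from most to least predictable.

```python
import numpy as np
from statsmodels.tsa.api import VAR

# Hypothetical bivariate data: one highly predictable and one nearly white component.
rng = np.random.default_rng(5)
T = 500
z = np.zeros((T, 2))
for t in range(1, T):
    z[t, 0] = 0.95 * z[t - 1, 0] + rng.normal()   # persistent component
    z[t, 1] = rng.normal()                        # white noise component
x = z @ np.array([[1.0, 0.5], [0.3, 1.0]])        # observed mixtures

fit = VAR(x).fit(1)                 # fit a VAR(1)
Gamma0 = np.cov(x, rowvar=False)    # total variability Gamma_x(0)
Sigma = fit.sigma_u                 # residual (unexplained) covariance
Q = np.linalg.solve(Gamma0, Gamma0 - Sigma)   # Q = Gamma_x(0)^{-1} Gamma_hat_x(0)

eigvals, eigvecs = np.linalg.eig(Q)
order = np.argsort(np.real(eigvals))[::-1]    # from most to least predictable
print("predictabilities:", np.real(eigvals[order]))
canonical_components = x @ np.real(eigvecs[:, order])   # z_t = M'x_t
```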

4.2 The Dynamic Factor Model


The factor model has a straightforward extension to the dynamic case. Consider the possibly nonstationary vector process x_t. The dynamic factor model assumes that this time series vector, which we assume has dimension m, has been generated by the equation

  x_t = P f_t + n_t,   (9)

where P is an m × r loading matrix that we assume is normalized in such a way that P'P = I. Thus, all the common dynamic structure comes through the common factors, f_t, and the vector n_t includes the independent idiosyncratic components. We suppose that the vector of common factors follows a VARIMA(p, q) model

  Φ(B) f_t = Θ(B) a_t,   (10)

where Φ(B) = I − Φ_1 B − ... − Φ_p B^p and Θ(B) = I − Θ_1 B − ... − Θ_q B^q are r × r polynomial matrices, the roots of |Φ(B)| are on or outside the unit circle, and those of |Θ(B)| are outside the unit circle. The sequence a_t is serially uncorrelated with zero mean and covariance matrix Σ_a. The components of the vector of common factors can be either stationary or nonstationary.
The specific dynamic structure associated with each of the observed series is included in the vector n_t of idiosyncratic components. Some components of this vector can be white noise, while others can have stationary dynamic structure. In general, we assume that n_t follows the vector ARMA model

  Φ_n(B) n_t = Θ_n(B) e_t,   (11)

where Φ_n(B) and Θ_n(B) are m × m diagonal polynomial matrices. The vectors e_t are normally distributed, with zero mean and diagonal covariance matrix Σ_e. Therefore, each component follows a univariate ARMA(p_i, q_i), i = 1, 2, ..., m, with p = max(p_i) and q = max(q_i), i = 1, 2, ..., m. We assume that the noises from the common factors and specific components are also uncorrelated for all lags, E(a_t e'_{t−h}) = 0, ∀h.
The model as stated is not identified because, for any r × r nonsingular matrix H, the observed series can be expressed in terms of a new set of factors,

  x_t = P* f*_t + n_t,

where P* = P H^{−1}, f*_t = H f_t, and

  Φ*(B) f*_t = Θ*(B) a*_t,

where a*_t = H a_t. With this transformation the old system matrices are related to the new system matrices by

  Φ*(B) = H Φ(B) H^{−1},
  Θ*(B) = H Θ(B) H^{−1},
  Σ*_a = H Σ_a H'.

To solve this identification problem, we can always choose either Σ_a = I or P'P = I. Note that as

  P*' P* = (H^{−1})' P'P H^{−1},

if P'P = I then P*'P* = (H^{−1})' H^{−1}, which will only be the identity matrix if H is orthogonal. Therefore the model is not yet identified under rotations, and we need to introduce a restriction to estimate the model. The standard restriction used to solve this problem in static factor analysis is that P'Σ_n^{−1}P should be diagonal, where Σ_n is the covariance matrix of the idiosyncratic components. Harvey (1989) imposes that p_{ij} = 0 for j > i, where P = [p_{ij}]. This condition is not restrictive, since the factor model can be rotated for a better interpretation when needed (see Harvey, 1989, for a brief discussion).
Peña and Poncela (1998) showed that the model presented is fairly general and also includes the case where lagged factors are present in equation (9). For instance, for ease of exposition, assume a stationary model with no specific components but with lagged factors in the observation equation, such as

  x_t = P v(B) F_t + n_t,

where v(B) = I + v_1 B + ... + v_l B^l, l < ∞, and F_t follows a VARMA model

  F_t = Ψ(B) a_t,   Ψ_0 = I.

This model can be rewritten in the standard form presented in (9) with

  f_t = F_t + v_1 F_{t−1} + ... + v_l F_{t−l},

following the VARMA model f_t = Ψ̃(B) a_t, where Ψ̃(B) = Σ_{i≥0} Ψ̃_i B^i and Ψ̃_i satisfies Ψ̃_i = Ψ_i + v_1 Ψ_{i−1} + ... + v_l Ψ_{i−l}, with Ψ_j = 0_{r×r} if j < 0 and Ψ_0 = I. Since the matrices v_i have constant coefficients, ||v_i|| < ∞ and the equation f_t = Ψ̃(B) a_t also represents a stationary VARMA process. Therefore the standard formulation presented in (9) can include important complex relationships between the series and the factors.
In the particular case where all the factors are stationary and the component n_t is white noise, ε_t, the dynamic factor model reduces to the model studied by Peña and Box (1987). In this case, assuming E(x_t) = 0 and calling Γ_x(k) = E[x_t x'_{t−k}] and Γ_f(k) = E[f_t f'_{t−k}], we have that

  Γ_x(0) = P Γ_f(0) P' + Σ_ε,
  Γ_x(k) = P Γ_f(k) P',   k ≥ 1,

which implies that the columns of P are eigenvectors of the matrix Γ_x(k) for all k ≥ 1. To show this, note that

  Γ_x(k) = E[(P f_t + ε_t)(P f_{t−k} + ε_{t−k})'] = E[P f_t f'_{t−k} P' + ε_t ε'_{t−k} + P f_t ε'_{t−k} + ε_t f'_{t−k} P'] = P E[f_t f'_{t−k}] P' = P Γ_f(k) P'.

Thus, the nonzero eigenvalues of Γ_x(k), k ≥ 1, are given by the covariances of the factors. Peña and Box (1987) proposed the following procedure to recover the factors:
(1) Compute the eigenvalues and eigenvectors of Γ_x(k) for k ≥ 1.
(2) Obtain the number of common factors from the rank of the matrices Γ_x(k). Assume that the common rank is r, the number of common factors.
(3) Use the eigenvectors associated with the nonzero eigenvalues of Γ_x(k), for k ≥ 1, in order to estimate the loading matrix P.
(4) Build the transformation M = [P V], where P'V = 0, that is, V belongs to the null space of P', and apply it to x_t in order to recover the factors. As P'P = I, we have

  P'x_t = f_t + P'ε_t

and

  V'x_t = V'ε_t.

Then, the transformation

  z_t = M'x_t

gives r linear combinations of the time series components that recover the factors and m − r combinations that are white noise.
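A hedged Python sketch of this identification idea follows (an illustration with simulated data, a single common factor and only the lag k = 1 autocovariance; not the authors' implementation): the eigenvectors of Γ_x(1) associated with its nonzero eigenvalues estimate the column space of P, and the remaining directions behave as white noise.

```python
import numpy as np

rng = np.random.default_rng(6)
T, m, r = 1000, 4, 1

# Simulate one stationary AR(1) common factor loaded on m = 4 series plus white noise.
f = np.zeros(T)
for t in range(1, T):
    f[t] = 0.9 * f[t - 1] + rng.normal()
P = np.array([[0.7], [0.5], [0.4], [0.3]])          # loading matrix (m x r)
x = f[:, None] @ P.T + rng.normal(size=(T, m))      # x_t = P f_t + eps_t

# Lag-1 autocovariance matrix Gamma_x(1) = P Gamma_f(1) P' (no noise term for k >= 1).
xc = x - x.mean(axis=0)
Gamma1 = xc[1:].T @ xc[:-1] / (T - 1)
Gamma1 = (Gamma1 + Gamma1.T) / 2                    # symmetrize (exact for r = 1) before eigenanalysis

eigvals, eigvecs = np.linalg.eigh(Gamma1)
order = np.argsort(np.abs(eigvals))[::-1]
P_hat = eigvecs[:, order[:r]]                       # estimated loading directions
factor_hat = x @ P_hat                              # P'x_t recovers the factor up to sign and scale
print(np.abs(eigvals[order]))                       # one dominant eigenvalue, the rest near zero
```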

This model has been studied in the nonstationary case by Peña and Poncela (1998). They showed that the identification of the nonstationary I(d) factors can be made through the common eigenstructure of some generalized covariance matrices, properly normalized. The number of common nonstationary factors is the number of nonzero eigenvalues. Thus, a similar identification procedure can be applied in the stationary and in the nonstationary cases. Once we have a preliminary estimate of the dimension of the system, we can estimate the factor loading matrix and the parameters of the VARIMA factor representation by writing the model in state space form and using the EM algorithm.

4.3 Relationship between Canonical Analysis and the Dynamic Factor Model

Let us show the relationship between the dynamic principal components or canonical analysis and the standard principal component approach that considers the eigenstructure of Γ_x(0). In the canonical analysis we obtain eigenvectors from Q = Γ_x(0)^{−1}(Γ_x(0) − Σ) = I − Γ_x(0)^{−1}Σ. Note that:
(1) Q, Γ_x(0)^{−1}Σ, and Σ^{−1}Γ_x(0) have the same eigenvectors;
(2) the largest eigenvalue of I − Γ_x(0)^{−1}Σ corresponds to the smallest eigenvalue of Γ_x(0)^{−1}Σ;
(3) the smallest eigenvalue of Γ_x(0)^{−1}Σ corresponds to the largest eigenvalue of Σ^{−1}Γ_x(0).
Then the canonical analysis can be interpreted as obtaining eigenvectors from Σ^{−1}Γ_x(0), whereas the standard principal component approach uses directly the matrix Γ_x(0).
To understand this difference better, suppose that the factor model holds and

  Γ_x(0) = P Γ_f(0) P' + Σ;

then

  Σ^{−1} Γ_x(0) = Σ^{−1} P Γ_f(0) P' + I,

and if V is such that P'V = 0 then

  Σ^{−1} Γ_x(0) V = V,

and therefore a transformation based on the eigenvectors of Σ^{−1}Γ_x(0) will also separate the factors from white noise.
Let us now consider the relationship between the canonical analysis and the identification procedure in the factor model as developed by Peña and Box (1987) and Peña and Poncela (1998). Consider the stationary case to simplify. Then the factors are initially estimated by computing eigenvalues and eigenvectors of Γ_x(k) for k ≥ 1. Thus this identification depends only on P, whereas in the canonical analysis the components obtained depend on both P and Σ.

4.4 Cointegration and the Factor Model


Suppose that x_t follows the nonstationary model φ(B)∇x_t = ε_t. Then we say that x_t is I(1). There will be cointegration among the components if we can find linear combinations that are stationary. That is, we will say that the components of x_t are cointegrated if there exists an m × p matrix β such that

  β'x_t is stationary,

that is, x_t ~ I(1) but β'x_t ~ I(0). The matrix β is called the cointegration matrix and there will be p linear combinations that lead to stationary processes. In order to see the implications of this property, suppose the simplest I(1) model, x_t = x_{t−1} + ε_t. We can write this model as

  ∇x_t = Π x_{t−1} + ε_t,   (12)

and note that if x_t follows the multivariate random walk the value of Π in this equation is zero, and this implies no cointegration. However, if the process is a stationary VAR(1) process, x_t = Φ x_{t−1} + ε_t, we can write the model as

  ∇x_t = (Φ − I) x_{t−1} + ε_t;

in this equation Π = Φ − I is a full rank matrix, because Φ is the AR matrix, which must have all eigenvalues smaller than one in modulus for the process to be stationary. Thus, saying that Π is a full rank matrix implies that x_t follows a stationary VAR(1) and all the components are I(0). A third intermediate possibility is that Π is neither a zero matrix nor a full rank matrix but has rank p. Let us show that this implies cointegration, that is, that in this case some linear combinations of the vector of time series are stationary whereas others are nonstationary. To show this property, note that if Π has rank p it can be written as

  Π = α β',

where α and β are m × p matrices of rank p < m. Now, if we multiply (12) by β', we have

  ∇β'x_t = (β'α) β'x_{t−1} + β'ε_t,

and calling z_t = β'x_t we have that

  ∇z_t = Π* z_{t−1} + β'ε_t,

where Π* = β'α is a full rank matrix and z_t is stationary. Thus, the p linear combinations β'x_t will be stationary, whereas if the m × (m − p) matrix α_⊥ belongs to the null space of α, that is, it verifies α'_⊥ α = 0, the m − p combinations α'_⊥ x_t are nonstationary.
There is a close connection between cointegration and the factor model. Escribano and Peña (1994) showed that the following two propositions are equivalent:
(1) The individual components of x_t are I(1) but there are p cointegration relationships, β'x_t, that are I(0).
(2) x_t can be written as generated by m − p common factors that are I(1).
Thus, cointegration implies common factors and common nonstationary factors imply cointegration. From the practical point of view, if the dimension m is large it is simpler to look for a few factors than for many cointegration relations.
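The equivalence can be checked numerically. The following hedged Python sketch (a hypothetical example) generates three series driven by a single I(1) common factor, so that any vector β orthogonal to the loading vector is a cointegration vector; an augmented Dickey-Fuller test from statsmodels is used to verify that β'x_t is stationary while each individual series is not.

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(7)
T, m = 1000, 3

f = np.cumsum(rng.normal(size=T))            # one I(1) common factor (random walk)
P = np.array([1.0, 0.5, -0.3])               # loading vector (m x 1)
x = np.outer(f, P) + rng.normal(size=(T, m)) # x_t = P f_t + n_t, so each series is I(1)

# Any beta with beta'P = 0 is a cointegration vector: beta'x_t = beta'n_t is stationary.
beta = np.array([0.5, -1.0, 0.0])            # beta'P = 0.5*1.0 - 1.0*0.5 + 0 = 0
z = x @ beta

print("ADF p-value for a single series:", adfuller(x[:, 0])[1])   # large: unit root
print("ADF p-value for beta'x_t       :", adfuller(z)[1])         # small: stationary
```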

5. Conclusions

As any stationary time series is a sample from some multivariate distribution, one could expect that classical multivariate methods would be widely applied in time series. However, in practice time series analysis is carried out without much reference to multivariate analysis, by using the special structure implied by the ordering of the observations in time. Some univariate time series identification methods have been based on canonical correlation analysis (see Tiao and Tsay, 1985), but in general the use of multivariate methods in univariate time series is limited. However, with vector time series, multivariate techniques are of key importance. Discrimination is related to the problem of model selection, clustering methods appear in a natural way when working with large sets of time series, and methods for dimension reduction are a clear need for practical model building. In fact, it was shown by Peña and Box (1987) that building a VARMA model ignoring possible common factors is a sure way to run into trouble: the MA and AR parameter matrices are not identified when common factors are present, and so we could end up building a very complicated multivariate VARMA model when in fact the data generating process is very simple. Also, Tiao and Tsay (1989) have shown the usefulness of linear combinations of the vector of observed time series for model simplification.
We have seen that the discrimination problem is closely related to the model selection problem, and the criteria used to choose models can be applied to select the data generating process in discriminant analysis. In time series cluster methods, more research is needed in order to have meaningful procedures that search for useful configurations taking into account the autocorrelation structure, and new algorithms need to be developed to implement them. Although research on model simplification and dimension reduction has been very extensive, still more research is needed in order to compare the advantages and drawbacks of the different procedures available. We expect that this review can stimulate further developments in this area in the future.

Acknowledgment

This work has been partially supported by DGES, grant PB96-0111, and Cátedra BBVA de Métodos para la Mejora de la Calidad.

References

AHN, S. K. AND REINSEL, G. C. (1988). Nested reduced-rank autoregressive models for multiple time series, J. Amer. Statist. Assoc., 83, 849-856.
AKAIKE, H. (1969). Fitting autoregressive models for prediction, Annals Inst. Stat. Math., 21, 343-347.
AKAIKE, H. (1974). A new look at the statistical model identification, I.E.E.E. Trans. Aut. Contr., AC-19, 203-217.
AKAIKE, H. (1979). A Bayesian extension of the minimum AIC procedure of autoregressive model fitting, Biometrika, 66, 2, 237-242.
ALAGON, J. (1989). Spectral discrimination for two groups of time series, J. Time Ser. Anal., 10, 3, 203-214.
ANDERSON, T. W. AND BAHADUR, R. R. (1962). Classification into two multivariate normal distributions with different covariance matrices, Ann. Math. Stat., 33, 420-431.
AOKI, M. (1990). State Space Modeling of Time Series. Springer: Berlin.
BHANSALI, R. J. (1993). Order selection for linear time series models: a review. In T. Subba Rao (ed.), Developments in Time Series Analysis. Chapman and Hall, London, 50-66.
BOHTE, Z., CEPAR, D. AND KOSMELJ, K. (1980). Clustering of time series, COMPSTAT 1980, Proceedings in Computational Statistics, 587-593.
BOX, G. AND TIAO, G. (1977). A canonical analysis of multiple time series, Biometrika, 64, 355-365.
BRILLINGER, D. R. (1981). Time Series: Data Analysis and Theory, expanded edition. San Francisco: Holden-Day.
BROEMELING, L. D. AND SON, M. S. (1987). The classification problem with autoregressive processes, Communications in Statistics (Theory and Methods), 16, 927-936.
BRUCE, A. G. AND MARTIN, R. D. (1989). Leave-k-out diagnostics for time series (with discussion), J. R. Statist. Soc. B, 51, 3, 363-424.
CAVANAUGH, J. E. AND SHUMWAY, R. H. (1997). A bootstrap variant of AIC for state space model selection, Statistica Sinica, 7, 473-496.
COWPERTWAIT, P. S. P. AND COX, T. F. (1992). Clustering population means under heterogeneity of variance with an application to a rainfall time series problem, The Statistician, 41, 113-121.
CHAUDHURI, G., BORWANKAR, J. D. AND RAO, P. R. K. (1991). Bhattacharyya distance based linear discriminant function for stationary time series, Communications in Statistics (Theory and Methods), 20, 2195-2205.
CHAUDHURI, G. (1992). Linear discriminant function for complex normal time series, Statist. Probab. Lett., 15, 277-279.
DARGAHI-NOUBARY, G. R. (1992). Discrimination between Gaussian time series based on their spectral differences, Communications in Statistics (Theory and Methods), 21, 2439-2458.
DARGAHI-NOUBARY, G. R. AND LAYCOCK, P. J. (1981). Spectral ratio discriminants and information theory, J. Time Ser. Anal., 2, 2, 71-86.
ENGLE, R. F. AND WATSON, M. W. (1981). A one-factor multivariate time series model of metropolitan wage rates, J. Am. Statist. Ass., 76, 774-781.
ESCRIBANO, A. AND PEÑA, D. (1994). Cointegration and common factors, J. Time Ser. Anal., 15, 577-586.
GALEANO, P. AND PEÑA, D. (2000). Discrimination in time series and the interpolation error. Working Paper, Universidad Carlos III de Madrid.
GANTERT, C. (1994). Classification of trends via the linear state space model, Biometrical Journal, 36, 825-839.
GERSCH, W., MARTINELLI, F., YONEMOTO, J., LOW, M. D. AND MCEWAN, J. A. (1979). Automatic classification of electroencephalograms: Kullback-Leibler nearest neighbor rules, Science, 205, 193-195.
GEWEKE, J. F. AND SINGLETON, K. J. (1981). Maximum likelihood confirmatory analysis of economic time series, International Economic Review, 22, 37-54.
HANNAN, E. J. AND QUINN, B. J. (1979). The determination of the order of an autoregression, J. R. Statist. Soc. B, 41, 190-195.
HANNAN, E. J. AND DEISTLER, M. (1988). The Statistical Theory of Linear Systems. New York: John Wiley.
HARVEY, A. (1989). Forecasting, Structural Time Series Models and the Kalman Filter. Cambridge: Cambridge University Press.
HURVICH, C. M. AND TSAI, C. L. (1989). Regression and time series model selection in small samples, Biometrika, 76, 297-307.
KAKIZAWA, Y., SHUMWAY, R. H. AND TANIGUCHI, M. (1998). Discrimination and clustering for multivariate time series, J. Am. Statist. Ass., 93, 328-340.
KEDEM, B. AND SLUD, E. (1982). Time series discrimination by higher order crossings, Annals of Statistics, 10, 786-794.
KULLBACK, S. (1959). Information Theory and Statistics. Smith, Gloucester, MA.
LAI, T. L. AND LEE, C. P. (1997). Information and prediction criteria for model selection in stochastic regression and ARMA models, Statistica Sinica, 7, 285-309.
LI, T. (1996). Discrimination of time series by parametric filtering, J. Am. Statist. Ass., 91, 284-293.
MACCHIATO, M. F., LA ROTONDA, L., LAPENNA, V. AND RAGOSTA, M. (1995). Time modelling and spatial clustering of daily ambient temperature: An application in Southern Italy, Environmetrics, 6, 31-53.
MARCO, V. R., YOUNG, D. M. AND TURNER, D. W. (1988). Predictive discrimination for autoregressive processes, Pattern Recognition Letters, 7, 145-149.
MOLENAAR, P. C. M. (1985). A dynamic factor model for the analysis of multivariate time series, Psychometrika, 50, 181-202.
MOLENAAR, P. C. M., DE GOOIJER, J. AND SCHMITZ, B. (1992). A dynamic factor analysis of nonstationary multivariate time series, Psychometrika, 57, 333-349.
PEÑA, D. (1989). Discussion of "Leave-k-out diagnostics for time series", J. R. Statist. Soc. B, 51, 3, 414-415.
PEÑA, D. AND BOX, G. (1987). Identifying a simplifying structure in time series, J. Am. Statist. Ass., 82, 836-843.
PEÑA, D. AND MARAVALL, A. (1991). Interpolation, outliers and inverse autocorrelations, Communications in Statistics (Theory and Methods), 20, 3175-3186.
PEÑA, D. AND PONCELA, P. (1998). Nonstationary dynamic factor analysis, Working Paper, Universidad Carlos III de Madrid.
PICCOLO, D. (1990). A distance measure for classifying ARIMA models, J. Time Ser. Anal., 11, 2, 153-164.
POSTCHER, B. M. AND SRINIVASAN, S. (1994). A comparison of order determination procedures for ARMA models, Statistica Sinica, 4, 29-50.
RISSANEN, J. (1986). Stochastic complexity and modelling, Annals of Statistics, 14, 3, 1080-1100.
SCHWARZ, G. (1978). Estimating the dimension of a model, Annals of Statistics, 6, 2, 461-464.
SHUMWAY, R. H. (1982). Discriminant analysis for time series, Handbook of Statistics, 2, 1-46.
SHUMWAY, R. H. AND UNGER, A. N. (1974). Linear discriminant functions for stationary time series, J. Amer. Statist. Assoc., 69, 948-956.
SHUMWAY, R. H. AND STOFFER, D. S. (2000). Time Series Analysis and Its Applications. New York: Springer.
TIAO, G. C. AND TSAY, R. S. (1989). Model specification in multivariate time series, J. R. Statist. Soc. B, 51, 157-213.
TIAO, G. C. AND TSAY, R. S. (1985). Use of canonical analysis in time series model identification, Biometrika, 72, 299-315.
VELU, R. P., REINSEL, G. C. AND WICHERN, D. W. (1986). Reduced rank models for multiple time series, Biometrika, 73, 105-118.
WALDEN, A. T. (1994). Spatial clustering: Using simple summaries of seismic data to find the edge of an oil-field, Applied Statistics, 43, 385-398.

Pedro Galeano
Teaching Assistant and Ph.D. Student
Departamento de Estadística y Econometría
Universidad Carlos III de Madrid, Spain
pgaleano@est-econ.uc3m.es

Daniel Peña
Professor of Statistics
Departamento de Estadística y Econometría
Universidad Carlos III de Madrid, Spain
dpena@est-econ.uc3m.es
