Combining Domain Knowledge and Statistical Models in Time Series Analysis
Abstract: This paper describes a new approach to time series modeling that
combines subject-matter knowledge of the system dynamics with statistical
techniques in time series analysis and regression. Applications to American
option pricing and the Canadian lynx data are given to illustrate this approach.
1. Introduction
In their Fisher Lectures at the Joint Statistical Meetings, Cox [11] and Lehmann
[31] mentioned two major types of stochastic models in statistical analysis, namely,
empirical and substantive (or mechanistic). Whereas substantive models are ex-
planatory and related to subject-matter theory on the mechanisms generating the
observed data, empirical models are interpolatory and aim to represent the observed
data as a realization of a statistical model chosen largely for its flexibility, tractabil-
ity and interpretability but not on the basis of subject-matter knowledge. Cox [11]
also mentioned a third type of stochastic models, called indirect models, that are
used to evaluate statistical procedures or to suggest methods for analyzing com-
plex data (such as hidden Markov models in image analysis). He noted, however,
that the distinctions between the different types of models are important mostly
when formulating and checking them but that these types are not rigidly defined,
since “quite often parts of the model, e.g., those representing systematic variation,
are based on substantive considerations with other parts more empirical.” In this
paper, we elaborate further the complementary roles of empirical and substantive
models in time series analysis and describe a basis function approach to combining
subject-matter (domain) knowledge with statistical modeling techniques.
This basis function approach was first developed in [29] for the valuation of
American options. In Sections 2 and 3 we review the statistical and subject-matter
models for option pricing in the literature as examples of empirical and substantive
models in time series analysis. Section 4 describes a combined substantive-empirical
approach via basis functions, in which the substantive component is associated with
basis functions of a certain form, and the empirical component uses flexible and
computationally convenient basis functions such as regression splines. The work
of Lai and Wong [29] on option pricing and recent related work in financial time
series are reviewed to illustrate this approach. Section 5 applies this approach to a
widely studied data set in the nonlinear time series literature, namely, the Canadian
1 Department of Statistics, Stanford University, Stanford, CA 94305, U.S.A., e-mail:
lait@stat.stanford.edu
2 Department of Statistics, The Chinese University of Hong Kong, Shatin, N.T., Hong Kong,
e-mail: samwong@sta.cuhk.edu.hk
AMS 2000 subject classifications: primary 62M10, 62M20; secondary 62P05, 62P10.
Keywords and phrases: time series analysis, domain knowledge, empirical models, mechanistic
models, combined substantive-empirical approach, basis function.
194 T. L. Lai and S. P.-S. Wong
lynx data set that records the annual numbers of Canadian lynx trapped in the
Mackenzie River district from 1821 to 1934. We use substantive models from the
ecology literature together with multivariate adaptive regression splines to come up
with a new time series model for these data. Some concluding remarks are given in
Section 6.
The development of statistical time series models in the past fifty years has wit-
nessed a remarkable confluence of basic ideas from various areas in statistics and
probability, coupled with the powerful influence from diverse fields of applications
ranging from economics and finance to signal processing and control systems. The
first phase of this development was concerned with stationary time series, leading to
MA (moving average), AR (autoregressive) and ARMA representations in the time
domain and transfer function representations in the frequency domain. This was
followed by extensions to nonstationary time series, either by fitting (not necessarily
stationary) ARMA models or by the Box-Jenkins approach involving the ARIMA
(autoregressive integrated moving average) models and their seasonal SARIMA
counterparts. More general fractional differencing then led to the ARFIMA mod-
els. The next phase of the development was concerned with nonlinear time series
models, beginning with bilinear models that add cross-product terms $y_{t-i}\epsilon_{t-j}$ to
the usual ARMA model $y_t = \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \epsilon_t + c_1 \epsilon_{t-1} + \cdots + c_q \epsilon_{t-q}$, and
threshold autoregressive and regime switching models that introduce nonlineari-
ties into the usual autoregressive models via state-dependent changes or Markov
jumps in the autoregressive parameters. The monograph by Tong [44] summarized
these and other nonlinear time series models in the previous literature. The appro-
priateness of the parametric forms assumed in these nonlinear time series models,
however, may be difficult to justify in real applications, as pointed out by Chen and
Tsay [9].
Whereas the AR model $y_t = \beta_1 y_{t-1} + \cdots + \beta_p y_{t-p} + \epsilon_t$ is related to linear regression
since $\beta^T x_t$ is the regression function $E(y_t \mid x_t)$ of $y_t$ given $x_t := (y_{t-1}, \ldots, y_{t-p})^T$,
and likewise its nonlinear parametric extensions $y_t = f(x_t, \beta) + \epsilon_t$ are related to
nonlinear regression, Chen and Tsay [9, 10] proposed to use nonparametric regression
for $E(y_t \mid x_t)$ instead. They started with functional-coefficient autoregressive
(FAR) models of the form $y_t = f_1(x_t^*) y_{t-1} + \cdots + f_p(x_t^*) y_{t-p} + \epsilon_t$, where
$f_1, \ldots, f_p$ are unspecified functions to be estimated by local linear regression and
$x_t^* = (y_{t-i_1}, \ldots, y_{t-i_d})^T$ with $i_1 < \cdots < i_d$ chosen from $\{1, \ldots, p\}$. Because of sparse
data in high dimensions, local linear regression typically requires $d$ to be 1 or 2. To
deal with nonparametric regression in higher dimensions, they considered additive
autoregressive models of the form $y_t = f_1(y_{t-i_1}) + \cdots + f_d(y_{t-i_d}) + \epsilon_t$, in which the
$f_i$ can be estimated nonparametrically via the generalized additive model (GAM)
of Hastie and Tibshirani [19]. Making use of Friedman's [15] multivariate adaptive
regression splines (MARS), Lewis and Stevens [34] and Lewis and Ray [32, 33] developed
spline models for empirical modeling of time series data. Weigend, Rumelhart
and Huberman [48] and Weigend and Gershenfeld [47] proposed to use neural networks
(NN) to model $E(y_t \mid x_t)$, while Lai and Wong [28] considered a variant called
stochastic neural networks, for which they could use the EM algorithm to develop
efficient estimation procedures that have much lower computational complexity
than those for conventional neural networks.
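The link between the AR model and linear regression noted above can be made concrete with a minimal Python sketch. It fits $\beta$ by ordinary least squares on the lagged regressors $x_t = (y_{t-1}, \ldots, y_{t-p})^T$. The function names (`solve_linear`, `fit_ar`, `simulate_ar`) and the simulated coefficients are illustrative assumptions and do not come from the references cited in this section.

```python
import random


def solve_linear(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ar(y, p):
    """Least-squares estimate of beta in y_t = beta' x_t + eps_t, x_t = (y_{t-1}, ..., y_{t-p})."""
    rows = [[y[t - i] for i in range(1, p + 1)] for t in range(p, len(y))]
    targets = [y[t] for t in range(p, len(y))]
    # Normal equations (X'X) beta = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(p)]
    return solve_linear(xtx, xty)


def simulate_ar(beta, n, sigma=1.0, seed=0):
    """Simulate n observations of an AR(p) series with Gaussian noise (illustrative only)."""
    rng = random.Random(seed)
    p = len(beta)
    y = [0.0] * p  # zero initial conditions
    for _ in range(n):
        lags = y[-1:-p - 1:-1]  # most recent value first
        y.append(sum(b * v for b, v in zip(beta, lags)) + rng.gauss(0.0, sigma))
    return y[p:]
```

Replacing the linear map $\beta^T x_t$ by a nonparametric estimate of $E(y_t \mid x_t)$ changes only the regression step; the construction of the lagged regressors stays the same.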
The preceding time series models are autonomous, relating the dynamics of yt to
the past states. In econometrics and engineering, the outputs yt are related not only
to the past outputs but also to the past inputs $u_{t-d}, \ldots, u_{t-k}$. Therefore the AR
model has been extended to the ARX model (where X stands for exogenous inputs)
$y_t = \beta^T x_t + \epsilon_t$ with $x_t = (y_{t-1}, \ldots, y_{t-p}, u_{t-d}, \ldots, u_{t-k})^T$. Instead of assuming a
linear or nonlinear parametric regression model, one can use nonparametric regres-
sion to estimate E(yt |xt ), as in the following financial application.
Example 1. As noted by Ross [40], option pricing theory is “the most successful
theory not only in finance, but in all of economics.” A call (put) option gives the
holder the right to buy (sell) the underlying asset (e.g. stock) by a certain date
T (known as the “expiration date” or “maturity”) at a certain price (known as
the “strike price” and denoted by K). European options can be exercised only on
the expiration date, whereas American options can be exercised at any time up to
the expiration date. The celebrated Black-Scholes theory, which will be reviewed in
Section 3, yields the following pricing formulas for the prices $c_t$ and $p_t$ of European
call and put options at time $t \in [0, T)$:
$$c_t = S_t e^{-d(T-t)} \Phi(d_1(S_t, K, T-t)) - K e^{-r(T-t)} \Phi(d_2(S_t, K, T-t)), \tag{2.1}$$
$$p_t = K e^{-r(T-t)} \Phi(-d_2(S_t, K, T-t)) - S_t e^{-d(T-t)} \Phi(-d_1(S_t, K, T-t)), \tag{2.2}$$
create sparsity of data in the space of $(S_t, K, T-t)$. Training the option pricing
formula in the form of $f(S_t, K, T-t)$ can only interpolate the data and can hardly
produce any good prediction because $(S_t, K)$ in the future can be very different
from the data used in estimating $f$. The proposed transformation makes use of
the fact that all observed and future $S_t/K$ are close to 1. Therefore, the proposed
transformation captures the stationary structure of the data and enables the
nonparametric models to predict well. Another point that Hutchinson, Lo and Poggio
[22] highlighted is the measure of performance of the estimated pricing formula.
According to their simulation study, even a linear $f(S_t/K, T-t)$ can give $R^2 \approx 90\%$
(Table I of Hutchinson, Lo and Poggio [22]). However, such a linear $f$ implies a
constant delta hedging scheme, which would provide poor hedging results. Since the
primary function of options is hedging the risk created by changes in the price of the
underlying asset, Hutchinson, Lo and Poggio [22] suggested using, instead of $R^2$, the
hedging error measures $\xi = e^{-rT} E[|V(T)|]$ and $\eta = e^{-rT} [E V^2(T)]^{1/2}$, where $V(T)$
is the value of the hedged portfolio at time $T$. In a perfect Black-Scholes world,
$V(T)$ should be 0 if the Black-Scholes formula is used. However, in the simulation
study, the Black-Scholes formulas still give $\xi > 0$ and $\eta > 0$ because time is discrete.
Hutchinson, Lo and Poggio [22] reported that RBF, NN and PPR all give hedging
measures comparable to those of the Black-Scholes formula in the simulation study. For
real data analysis of futures options, RBF, NN and PPR performed better than the
Black-Scholes formula in terms of hedging.
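The effect of discrete rebalancing on the hedging error measures $\xi$ and $\eta$ can be illustrated with a small Monte Carlo sketch. This is not the simulation design of [22]: the parameter values, the drift `mu`, the number of paths and the function names are assumptions chosen for illustration. The sketch considers a writer of a European call on a non-dividend-paying asset who rebalances a self-financing Black-Scholes delta hedge at `n_steps` equally spaced times, and it uses the standard Black-Scholes expression for $d_1$.

```python
import random
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def bs_call_and_delta(s, k, tau, r, sigma):
    """Black-Scholes call price and delta for time to maturity tau > 0 (no dividends)."""
    d1 = (log(s / k) + (r + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))
    d2 = d1 - sigma * sqrt(tau)
    return s * norm_cdf(d1) - k * exp(-r * tau) * norm_cdf(d2), norm_cdf(d1)


def hedging_error_measures(n_steps, n_paths=2000, s0=100.0, k=100.0, T=1.0,
                           r=0.05, mu=0.1, sigma=0.2, seed=1):
    """Estimate xi = e^{-rT} E|V(T)| and eta = e^{-rT} [E V(T)^2]^{1/2} by simulation."""
    rng = random.Random(seed)
    dt = T / n_steps
    abs_sum = 0.0
    sq_sum = 0.0
    for _ in range(n_paths):
        s = s0
        price, delta = bs_call_and_delta(s, k, T, r, sigma)
        cash = price - delta * s  # premium received minus cost of the initial stock position
        for step in range(1, n_steps + 1):
            z = rng.gauss(0.0, 1.0)
            s *= exp((mu - 0.5 * sigma ** 2) * dt + sigma * sqrt(dt) * z)  # GBM step
            cash *= exp(r * dt)  # interest on the cash account
            if step < n_steps:
                _, new_delta = bs_call_and_delta(s, k, T - step * dt, r, sigma)
                cash -= (new_delta - delta) * s  # self-financing rebalancing
                delta = new_delta
        v = delta * s + cash - max(s - k, 0.0)  # value of the hedged portfolio at T
        abs_sum += abs(v)
        sq_sum += v * v
    discount = exp(-r * T)
    return discount * abs_sum / n_paths, discount * sqrt(sq_sum / n_paths)
```

Under these assumptions both measures stay positive for any finite `n_steps` and shrink as rebalancing becomes more frequent, in line with the remark that the Black-Scholes formula gives $\xi > 0$ and $\eta > 0$ because time is discrete.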
For American options, instead of using these learning networks to approximate
the option price, Broadie et al. [5] used kernel smoothers to estimate the option
pricing formula of an American option. Using a training sample of daily closing
prices of American calls on the S&P100 Index that were traded on the Chicago
Board Options Exchange from 3 January 1984 to 30 March 1990, they compared the
nonparametric estimates of American call option prices at a set of (S/K, t∗ ) values
with corresponding parametric estimates obtained by using the approximations to
American option prices due to Broadie and Detemple [4], and found significant
differences between the parametric and nonparametric estimates.
In control engineering, the dynamics of linear input-output systems are often given
by ordinary differential equations, whose discrete-time approximations in the pres-
ence of noise have led to the ARX models (for white noise), and ARMAX models
(for colored noise) in the preceding section. The problem of choosing the inputs
sequentially so that the outputs are as close as possible to some target values when
the model parameters are unknown and have to be estimated on-line has a large
literature under the rubric of stochastic adaptive control; see Goodwin, Ramadge
and Caines [16], Lai and Wei [27], Lai and Ying [30] and Guo and Chen [17]. More
general dynamics in the presence of additive noise have led to stochastic differen-
tial equations (SDEs), whose discrete-time approximations are related to nonlinear
time series models described in the preceding section. One such SDE is geometric
Brownian motion (GBM) for the asset price process in the Black-Scholes option
pricing theory. In view of Ito’s formula, the GBM dynamics for the asset price St
translate into SDE dynamics for the option price f (t, St ). Such implied dynamics
from the mechanistic model can be combined with subject-matter theory to derive
the functional form or differential equation for f and other important corollaries of
the theory, as illustrated in the following.
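$$df(t, S_t) = \frac{\partial f}{\partial t}\,dt + \frac{\partial f}{\partial S}\,dS_t + \frac{1}{2}\frac{\partial^2 f}{\partial S^2}\,\sigma^2 S_t^2\,dt
= \Big(\frac{\partial f}{\partial t} + \mu S_t \frac{\partial f}{\partial S} + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 f}{\partial S^2}\Big)dt + \sigma S_t \frac{\partial f}{\partial S}\,dw_t.$$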
For simplicity assume that the asset does not pay dividends, i.e., $d = 0$. Consider
an option writer's portfolio at time $t$, consisting of $-1$ option and $y_t$ units of the
asset. The value of the portfolio $\pi_t$ is $-f(t, S_t) + y_t S_t$ and therefore
$$d\pi_t = -\Big(\frac{\partial f}{\partial t} + \mu S_t \frac{\partial f}{\partial S} + \frac{1}{2}\sigma^2 S_t^2 \frac{\partial^2 f}{\partial S^2} - \mu y_t S_t\Big)dt + \sigma S_t \Big(y_t - \frac{\partial f}{\partial S}\Big)dw_t.$$
Hence setting $y_t = \partial f/\partial S$ yields a risk-free portfolio. This is the basis of delta
hedging in the options theory of Black and Scholes [3], who denote $\partial f/\partial S$ by $\Delta$.
Besides GBM dynamics for the asset price, the Black-Scholes theory also assumes
that there are no transaction costs and no limits on short selling and that trading
can take place continuously so that delta hedging is feasible. Since economic theory
prescribes absence of arbitrage opportunities in equilibrium, $\pi_t$ that consists of $-1$
option and $\Delta$ units of the asset should have the same return as $r\pi_t\,dt = r(-f + S_t\Delta)\,dt$,
yielding the Black-Scholes PDE for $f$:
$$\frac{\partial f}{\partial t} + rS\frac{\partial f}{\partial S} + \frac{1}{2}\sigma^2 S^2 \frac{\partial^2 f}{\partial S^2} = rf, \qquad 0 \le t < T, \tag{3.2}$$
with the boundary condition $f(T, S) = g(S)$, where $g(S) = (K - S)^+$ for a put
option, and $g(S) = (S - K)^+$ for a call option, where $x^+ = \max(x, 0)$. This PDE
has the explicit solution (2.1) or (2.2) with $d = 0$. If the asset pays dividend at rate
$d$, then a modification of the preceding argument yields (3.2) in which $rS(\partial f/\partial S)$
is replaced by $(r - d)S(\partial f/\partial S)$.
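The closed-form prices (2.1) and (2.2) are straightforward to evaluate numerically. The sketch below is a minimal Python version with a continuous dividend rate `d`. The definitions of $d_1$ and $d_2$ do not appear in the text above, so the code uses the standard Black-Scholes expressions $d_1(x, y, v) = \{\log(x/y) + (r - d + \sigma^2/2)v\}/(\sigma\sqrt{v})$ and $d_2 = d_1 - \sigma\sqrt{v}$ as an assumption. The function names are illustrative.

```python
from math import erf, exp, log, sqrt


def norm_cdf(x):
    """Standard normal distribution function Phi."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def d1(s, k, tau, r, sigma, d=0.0):
    """Standard Black-Scholes d1 (assumed definition; tau = T - t > 0)."""
    return (log(s / k) + (r - d + 0.5 * sigma ** 2) * tau) / (sigma * sqrt(tau))


def d2(s, k, tau, r, sigma, d=0.0):
    """Standard Black-Scholes d2 = d1 - sigma * sqrt(tau)."""
    return d1(s, k, tau, r, sigma, d) - sigma * sqrt(tau)


def european_call(s, k, tau, r, sigma, d=0.0):
    """European call price, formula (2.1)."""
    return (s * exp(-d * tau) * norm_cdf(d1(s, k, tau, r, sigma, d))
            - k * exp(-r * tau) * norm_cdf(d2(s, k, tau, r, sigma, d)))


def european_put(s, k, tau, r, sigma, d=0.0):
    """European put price, formula (2.2)."""
    return (k * exp(-r * tau) * norm_cdf(-d2(s, k, tau, r, sigma, d))
            - s * exp(-d * tau) * norm_cdf(-d1(s, k, tau, r, sigma, d)))
```

A quick consistency check on any implementation of (2.1) and (2.2) is put-call parity, $c_t - p_t = S_t e^{-d(T-t)} - K e^{-r(T-t)}$, which follows by subtracting the two formulas and using $\Phi(x) + \Phi(-x) = 1$.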
Merton [37] extended the Black-Scholes theory for pricing European options to
American options that can be exercised at any time prior to the expiration date.
Optimal exercise of the option is shown to occur when the asset price exceeds (or
falls below) an exercise boundary ∂C for a call (or put) option. The Black-Scholes
PDE still holds in the continuation region C of (t, St ) before exercise, and ∂C is
determined by the free boundary condition ∂f /∂S = 1 (or −1) for a call (or put)
option. Unlike the explicit formula (2.1) or (2.2) for European options, there is
no closed-form solution of the free-boundary PDE and numerical methods such as
finite differences are needed to compute American option prices under this theory.
By the Feynman-Kac formula, the PDE (3.2) has a probabilistic representation
$f(t, S) = E[e^{-r(T-t)} g(S_T) \mid S_t = S]$, where the expectation $E$ is with respect to the
“equivalent martingale measure” under which $dS_t/S_t = (r - d)\,dt + \sigma\,dw_t$. This
representation generalizes to American options as the value function of the optimal
stopping problem
$$f(t, S) = \sup_{\tau \in \mathcal{T}_{t,T}} E[e^{-r(\tau - t)} g(S_\tau) \mid S_t = S], \tag{3.3}$$
where $\mathcal{T}_{t,T}$ denotes the set of stopping times $\tau$ taking values between $t$ and $T$.
Cox, Ross and Rubinstein [12] proposed to approximate GBM by a binomial tree,
with root node S0 at time 0, so that (3.3) can be approximated by a discrete-
time and discrete-state optimal stopping problem that can be solved by backward
induction. Denote f (t, S) by C(t, S) for an American call option, and by P (t, S)
for an American put option. Jacka [23] and Carr, Jarrow and Myneni [7] derived
the decomposition formula
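$$P(t, S) = p(t, S) + K\rho e^{\rho u} \int_u^0 \Big\{ e^{-\rho s}\,\Phi\Big(\frac{\bar z(s) - z}{\sqrt{s - u}}\Big) - \theta e^{-(\theta\rho s + u/2) + z}\,\Phi\Big(\frac{\bar z(s) - z}{\sqrt{s - u}} - \sqrt{s - u}\Big) \Big\}\,ds, \tag{3.4}$$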
and a similar formula relating C(t, S) to c(t, S), where z̄(u) is the early exercise
boundary ∂C under the transformation
Ju [24] found that the early exercise premium can be computed in closed form if
∂C is a piecewise exponential function which corresponds to a piecewise linear z̄(u).
Under this assumption, Ju [24] reported numerical studies showing that his method
with 3 equally spaced pieces substantially improves on previous approximations to
option prices in both accuracy and speed. AitSahlia and Lai [1] introduced the
transformation (3.5) to reduce GBM to Brownian motion and showed that z̄(u)
is indeed well approximated by a piecewise linear function with a few pieces. The
integral obtained by differentiating that in (3.4) with respect to S also has a closed-
form expression when z̄(·) is piecewise linear, and approximating z̄(·) by a linear
spline that uses a few unevenly spaced knots gives a fast and reasonably accurate
method for computing ∆ = ∂P/∂S.
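The backward induction on a binomial tree mentioned earlier in this section can be sketched in a few lines of Python. The sketch assumes the usual Cox-Ross-Rubinstein parametrization (up factor $e^{\sigma\sqrt{\Delta t}}$, down factor equal to its reciprocal, and risk-neutral up-probability matching the drift $r - d$). These details, together with the function name and arguments, are stated here as assumptions because the text above does not spell them out.

```python
from math import exp, sqrt


def crr_american(s0, k, T, r, sigma, n_steps, d=0.0, kind="put"):
    """Approximate an American option price by backward induction on a binomial tree."""
    dt = T / n_steps
    up = exp(sigma * sqrt(dt))
    down = 1.0 / up
    q = (exp((r - d) * dt) - down) / (up - down)  # risk-neutral up-probability
    discount = exp(-r * dt)

    def payoff(s):
        return max(k - s, 0.0) if kind == "put" else max(s - k, 0.0)

    # Option values at maturity; node j has had j up-moves.
    values = [payoff(s0 * up ** j * down ** (n_steps - j)) for j in range(n_steps + 1)]
    # Step backwards, comparing immediate exercise with the discounted continuation value.
    for step in range(n_steps - 1, -1, -1):
        values = [
            max(payoff(s0 * up ** j * down ** (step - j)),
                discount * (q * values[j + 1] + (1.0 - q) * values[j]))
            for j in range(step + 1)
        ]
    return values[0]
```

The tree is slower than the piecewise linear boundary approximations discussed above, but it is a convenient benchmark: for a call on an asset without dividends it should agree closely with the European price (2.1), whereas for a put the early exercise premium is visible.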
The Black-Scholes price involves the parameters r and σ, which need to be
estimated. The yield of a short-maturity Treasury bill is usually used for $r$. When
asset prices are observed at fixed intervals of time (e.g. daily), one can estimate
$\sigma$ by the standard deviation of historical (daily) asset returns, which are i.i.d.
normal under the GBM model for asset prices. However, there are issues due to
departures from this model (e.g., $\sigma$ can change over time and asset returns are
markedly non-normal) and due to violations of the Black-Scholes assumptions in the
financial market (e.g., there are actually transaction costs and
limits on short selling). Section 13.4 and Chapter 16 of Hull [21] discuss how the
parameter σ in the Black-Scholes option price is treated in current practice. In the
next section we describe an alternative approach that addresses the discrepancy
between the Black-Scholes-Merton theory and time series data on American options
and the underlying stock prices.
In this section we describe an approach to time series modeling that contains both
substantive and empirical components. We first came up with this approach when
we studied valuation of American options. Its basic idea is to use empirical modeling
to address the gap between the actual prices in the American options market and the
option prices given by the Black-Scholes-Merton theory in Example 2, as explained
below.
Example 3. For European options, instead of using the basis functions of Hutchinson,
Lo and Poggio [22], an alternative approach is to express the option price as
$c + K e^{-rt^*} f^*(S/K, t^*)$, where $c$ is the Black-Scholes price (2.1), because the
Black-Scholes formula has proved to be quite successful in explaining empirical data. This
is tantamount to including c(t, S) as one of the basis functions (with prescribed
weight 1) to come up with a more parsimonious approximation to the actual option
price.
The usefulness of this idea is even more apparent in the case of American options.
Focusing on puts for definiteness, the decomposition formula (3.4) expresses an
American put option price as the sum of a European put price p and the early
exercise premium which is typically small relative to p. This suggests that p should
be included as one of the basis functions (with prescribed weight 1). Lai and Wong
[29] propose to use additive regression splines after the change of variables
$u = -\sigma^2(T - t)$ and $z = \log(S/K)$. Specifically, for small $T - t$ (say within 5 trading days
prior to expiration, i.e. $T - t \le 5/253$ under the assumption of 253 trading days per
year), we approximate $P$ by $p$. For $T - t > 5/253$ (or equivalently, $u < -5\sigma^2/253$),
we approximate $P$ by
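$$P = p + K e^{\rho u}\Big\{\alpha + \alpha_1 u + \sum_{j=1}^{J_u} \alpha_{1+j}(u - u^{(j)})_+ + \beta_1 z + \beta_2 z^2 + \sum_{j=1}^{J_z} \beta_{2+j}(z - z^{(j)})_+^2 + \gamma_1 w + \gamma_2 w^2 + \sum_{j=1}^{J_w} \gamma_{2+j}(w - w^{(j)})_+^2\Big\}, \tag{4.1}$$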
where $w = |u|^{-1/2}\{z - (\rho - \theta\rho - 1/2)u\}$
is an “interaction” variable derived from $z$ and $u$. The motivation behind the centering
term $(\rho - \theta\rho - 1/2)u$ comes from (3.5) that transforms GBM into Brownian
motion, whereas that behind the normalization $|u|^{-1/2}$ comes from (3.4) and
the closely related $d_1(x, y, v)$ in (2.2). The knots $u^{(j)}$ (respectively $z^{(j)}$ or $w^{(j)}$)
of the linear (respectively quadratic) spline in (4.1) are the $100j/J_u$ (respectively
$100j/J_z$ and $100j/J_w$)-th percentiles of $\{u_1, \ldots, u_n\}$ (respectively $\{z_1, \ldots, z_n\}$ or
$\{w_1, \ldots, w_n\}$). The values of $J_u$, $J_z$ and $J_w$ are chosen among the integers between 1
and 10 to minimize the generalized cross validation (GCV) criterion, which can be
expressed in the following form (cf. [19, 46]):
$$\mathrm{GCV}(J_u, J_z, J_w) = \sum_{i=1}^{n} (P_i - \hat P_i)^2 \Big/ \Big\{ n \Big(1 - \frac{J_u + J_z + J_w + 6}{n}\Big)^2 \Big\},$$
where the $P_i$ are the observed American option prices in the past $n$ periods, and
the $\hat P_i$ are the corresponding fitted values given by (4.1) in which the regression
coefficients are estimated by least squares.
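The bookkeeping behind (4.1) and the GCV criterion can be sketched as follows. The Python fragment builds one row of the regression design (the terms inside the braces of (4.1), before multiplication by $Ke^{\rho u}$) and evaluates the GCV formula for given fitted values. The least-squares fit itself is omitted, and the function names and arguments are illustrative assumptions rather than the implementation used in [29].

```python
def truncated_power(x, knot, degree):
    """Truncated power function (x - knot)_+ ** degree used in regression splines."""
    return max(x - knot, 0.0) ** degree


def basis_row(u, z, w, u_knots, z_knots, w_knots):
    """Regressors multiplying the coefficients in (4.1) for one observation."""
    row = [1.0, u] + [truncated_power(u, c, 1) for c in u_knots]       # linear spline in u
    row += [z, z * z] + [truncated_power(z, c, 2) for c in z_knots]    # quadratic spline in z
    row += [w, w * w] + [truncated_power(w, c, 2) for c in w_knots]    # quadratic spline in w
    return row


def gcv(observed, fitted, j_u, j_z, j_w):
    """Generalized cross validation criterion for a fit with J_u + J_z + J_w + 6 coefficients."""
    n = len(observed)
    n_params = j_u + j_z + j_w + 6
    if n_params >= n:
        raise ValueError("GCV needs more observations than regression coefficients")
    rss = sum((p - f) ** 2 for p, f in zip(observed, fitted))
    return rss / (n * (1.0 - n_params / n) ** 2)
```

The constant 6 in the penalty counts the coefficients $\alpha, \alpha_1, \beta_1, \beta_2, \gamma_1, \gamma_2$ that are present regardless of the number of knots, which is why `basis_row` always returns $J_u + J_z + J_w + 6$ entries.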
In the preceding we have assumed prescribed constants $r$ and $\sigma$ as in the
Black-Scholes model; these parameters appear in (4.1) via the change of variables (3.5).
In practice $\sigma$ is unknown and may also vary with time. We can replace it in (4.1)
by the standard deviation $\hat\sigma_t$ of the most recent asset returns, say, during the past
60 trading days prior to $t$, as in [22], p. 881. Moreover, the risk-free rate $r$ may also
change with time, and can be replaced by the yield $\hat r_t$ of a short-maturity Treasury
bill on the close of the month before $t$. The same remark also applies to the dividend
rate.
The simulation study in Lai and Wong [29] shows the advantages of this combined
substantive-empirical approach. Not only is $P$ well approximated by $\hat P$, especially
over intervals of $S/K$ values that occur frequently in the sample, but $\hat\Delta - \Delta$ also reveals
a pattern similar to that of $\hat P - P$. Besides $\xi_{\hat P} = E\{e^{-r\tau}|V_{\hat P}(\tau)|\}$, where $\tau$ is the
time of exercise and $V_{\hat P}(t)$ is the value of the replicating portfolio at time $t$ that
rebalances (according to the pricing formula $\hat P$) between the risky and riskless assets
([22], p. 868-869), Lai and Wong [29] also consider the measure
$$\kappa_{\hat P} = E\Big\{\int_0^\tau (S_t/K)^2 (\hat\Delta(t) - \Delta(t))^2\,dt\Big\}, \tag{4.3}$$
The Canadian lynx data set consists of the annual record of the numbers of
Canadian lynx trapped in the Mackenzie River district of north-west Canada
for the period 1821-1934 inclusive. Let $X_t$ be $\log_{10}$(number recorded as trapped in
year $1820 + t$) ($t = 1, \ldots, 114$). Figure 1 shows the time series plot of $X_t$. According
to Tong [44], Moran [39] performed the first time series analysis on these data by
fitting an AR(2) model to $X_t$; moreover, the log transformation is used because it
(i) makes the marginal distribution of $X_t$ more symmetric about its mean and (ii)
reduces the approximation error in assuming the number of lynx trapped to be proportional
to the population. In view of the substantial nonlinearity of $E[X_t \mid X_{t-3}]$ found in
the scatterplot of $X_t$ versus $X_{t-3}$, Tong ([44], p. 361) critiques Moran's analysis and
its enhancements by Campbell and Walker [6], who added a harmonic component to
the AR(2) model, and by Tong [43], who used the AIC to select the order $p = 11$ for
AR($p$) models, as “uncritical acceptance of linearity” in $X_t$. He uses a self-excited
threshold autoregressive (SETAR) model of the form
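$$X_t - X_{t-1} = \begin{cases} 0.62 + 0.25X_{t-1} - 0.43X_{t-2} + \varepsilon_t & \text{if } X_{t-2} \le 3.25, \\ -(1.24X_{t-2} - 2.25) + 0.52X_{t-1} + \varepsilon_t & \text{if } X_{t-2} > 3.25 \end{cases} \tag{5.1}$$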
to fit these data, similar to Tong and Lim ([45], Section 9). The growth rate
$X_t - X_{t-1}$ in the first regime (i.e., $X_{t-2} \le 3.25$) tends to be positive but small, which
corresponds to a slow population growth. In the second regime (i.e., $X_{t-2} > 3.25$),
$X_t - X_{t-1}$ tends to be negative, corresponding to a decrease in population size.
[Figure 1: time series plot of X(t) against Time.]
Tong ([44], p. 377) interprets the fitted model as an “energy balance” between the
population expansion and the population contraction, yielding a stable limit cycle
with a 9-year period which is in good agreement with the observed asymmetric
cycles. Motivated by Van der Pol’s equation, Haggan and Ozaki [18] proposed to fit
another nonlinear time series model, namely, the exponential autoregressive model
$$X_t - \mu = \sum_{j=1}^{11} \big(\phi_j + \pi_j e^{-\gamma(X_{t-j} - \mu)^2}\big)(X_{t-j} - \mu) + \varepsilon_t, \tag{5.2}$$
which gives a limit cycle of period 9.45 years. Lim [35] compares the prediction per-
formance of these and other parametric models and concludes that Tong’s SETAR
model ranks the best among them.
Taking a more nonparametric approach, Fan and Yao [14] use a functional-coefficient
autoregressive model to fit the observed $X_t$ series and compare its prediction
with that of threshold autoregression. Specifically, they fit the FAR(2,2)
model
$$X_t = a_1(X_{t-2})X_{t-1} + a_2(X_{t-2})X_{t-2} + \varepsilon_t \tag{5.3}$$
to the first 102 observations, reserving the last 12 observations to evaluate the
prediction. The a1 (·) and a2 (·) in (5.3) are unknown functions which are estimated
by using locally linear smoothers. Fan and Yao ([14], p. 327) plot the estimates â1 (·)
and â2 (·), which are approximately constant for Xt−2 < 2.7 with â1 (Xt−2 ) ≈ 1.3
and â2 (Xt−2 ) ≈ −0.2, and which are approximately linear for Xt−2 ≥ 2.7. For
comparison, Fan and Yao [14] also fit the following SETAR(2) model to the same
set of data:
$$\hat X_t = \begin{cases} 0.424 + 1.255X_{t-1} - 0.348X_{t-2}, & X_{t-2} \le 2.981, \\ 1.882 + 1.516X_{t-1} - 1.126X_{t-2}, & X_{t-2} > 2.981. \end{cases} \tag{5.4}$$
Because of the close resemblance of the fitted SETAR(2) and FAR(2,2), they share
certain ecological interpretations. In particular, the difference of the fitted coeffi-
cients in each regime can be explained by using the phase dependence and the den-
sity dependence in the predator-prey structure. The phase dependence refers to the
difference in the behavior of preys (snowshoe hare) and predators (lynx) in hunting
and escaping at the decreasing and increasing phases of population dynamics, while
the density dependence is the relationship between the reproduction rates of the
animals and their abundance. More discussion on these ecological interpretations
can be found in [42].
To evaluate the predictions of FAR(2,2), Fan and Yao ([14], p. 324) use the
one-step-ahead forecast (denoted by $\hat X_t$) and the iterative two-step-ahead forecast
(denoted by $\tilde X_t$), which are defined by
$$\hat X_t := \hat a_1(X_{t-2})X_{t-1} + \hat a_2(X_{t-2})X_{t-2}, \qquad \tilde X_t := \hat a_1(X_{t-2})\hat X_{t-1} + \hat a_2(X_{t-2})X_{t-2}.$$
The predictions of SETAR(2) are similarly defined. The out-of-sample absolute
prediction errors ($|\hat X_t - X_t|$ and $|\tilde X_t - X_t|$) of the last 12 observations are reported
in Table 1. Based on the average of these 12 absolute prediction errors (AAPE),
FAR(2,2) performs slightly better than SETAR(2). Other nonparametric time se-
ries models for the Canadian lynx data include the projection pursuit regression
(PPR) model fitted by Lin and Pourahmadi [36] who found that SETAR outper-
forms PPR in terms of one-step-ahead forecasts, and neural network models which
Kajitani, Hipel and McLeod [25] found to be “just as good or better than SETAR
models for one-step out-of-sample forecasting of the lynx data.”
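The one-step-ahead and iterative two-step-ahead forecasts defined above can be written out for the fitted SETAR(2) model (5.4) as a short Python sketch. It reads "similarly defined" as follows: the regime is always selected by the observed $X_{t-2}$, and the two-step forecast replaces the unobserved $X_{t-1}$ by its own one-step forecast. That reading, and the function names, are assumptions made for illustration.

```python
def setar_coefficients(x_lag2):
    """Intercept and AR coefficients of (5.4) for the regime selected by X_{t-2}."""
    if x_lag2 <= 2.981:
        return 0.424, 1.255, -0.348
    return 1.882, 1.516, -1.126


def one_step(x_lag1, x_lag2):
    """One-step-ahead forecast of X_t from the observed X_{t-1} and X_{t-2}."""
    c, a1, a2 = setar_coefficients(x_lag2)
    return c + a1 * x_lag1 + a2 * x_lag2


def two_step(x_lag2, x_lag3):
    """Iterative two-step-ahead forecast of X_t from the observed X_{t-2} and X_{t-3}."""
    x_lag1_hat = one_step(x_lag2, x_lag3)  # forecast of the unobserved X_{t-1}
    c, a1, a2 = setar_coefficients(x_lag2)
    return c + a1 * x_lag1_hat + a2 * x_lag2


def average_absolute_error(forecasts, observed):
    """Average absolute prediction error (AAPE) over a hold-out sample."""
    return sum(abs(f - x) for f, x in zip(forecasts, observed)) / len(observed)
```

For instance, `one_step(3.386, 3.054)` forecasts the 1925 value from the rounded 1924 and 1923 values listed in Table 1. Because those values and the coefficients in (5.4) are rounded, errors computed this way need not match the entries of Table 1 exactly.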
A substantive approach is adopted by Royama ([41], Chapter 5). Instead of
building the statistical model first and using ecology to interpret the fitted model
later, Royama starts with ecological mechanisms and population dynamics. Letting
$R_t = X_{t+1} - X_t$ denote the log reproductive rate from year $t$ to $t + 1$, he considers
nonlinear dynamics of the form $R_t = f(X_t, \ldots, X_{t-h+1}) + u_t$, where $u_t$ is a
zero-mean random disturbance, and emphasizes that “our ultimate goal is to determine
the reproduction surface $f$ and to find an appropriate model which reasonably
approximates to it,” with $f$ satisfying the following two conditions in view of ecological
considerations: There exists $X^*$ such that $f(X^*, \ldots, X^*) = 0$, and $R_t$ has
to be bounded above because “no animal can produce infinite number of offspring”
Table 1
Absolute prediction errors of one-step-ahead (1 yr) and iterative two-step-ahead (2 yr) forecasts
and their 12-year average (AAPE).

               Model (5.3)     Model (5.4)     Model (5.6)     Model (5.8a)
               FAR(2,2)        SETAR(2)        Logistic        Logistic-MARS
Year   X_t     1 yr    2 yr    1 yr    2 yr    1 yr    2 yr    1 yr    2 yr
1923   3.054   0.157   0.156   0.187   0.090   0.178   0.075   0.188   0.082
1924   3.386   0.012   0.227   0.035   0.269   0.077   0.281   0.057   0.286
1925   3.553   0.021   0.035   0.014   0.038   0.057   0.153   0.073   0.120
1926   3.468   0.008   0.037   0.022   0.000   0.012   0.077   0.023   0.140
1927   3.187   0.085   0.101   0.059   0.092   0.020   0.018   0.122   0.168
1928   2.723   0.055   0.086   0.075   0.015   0.128   0.098   0.002   0.159
1929   2.686   0.135   0.061   0.273   0.160   0.179   0.004   0.009   0.012
1930   2.821   0.016   0.150   0.026   0.316   0.004   0.216   0.010   0.001
1931   3.000   0.017   0.037   0.030   0.062   0.005   0.010   0.013   0.025
1932   3.201   0.007   0.014   0.060   0.043   0.048   0.042   0.021   0.005
1933   3.424   0.089   0.098   0.076   0.067   0.124   0.184   0.066   0.091
1934   3.531   0.053   0.175   0.072   0.187   0.083   0.245   0.011   0.087
AAPE           0.055   0.095   0.073   0.112   0.075   0.117   0.050   0.098
(see [41], p. 50, 154, 178). In Chapter 4 of [42], Royama introduces the (first-order)
logistic model of $f(X_t) = r_m - \exp\{-a_0 - a_1 X_{t-1}\}$ to incorporate competition over
an available resource. Here $r_m$ is the maximum biologically realizable reproduction
rate, i.e. $R_t \le r_m$ for all $t$; see [42], Section 4.2.5. An implicit assumption of the
model is that the resource being depleted during a time step will be recovered to
the same level by the onset of the next time step. This assumption can be relaxed
if a linear combination of $X_{t-j}$ ($j = 1, \ldots, h$) with $h > 1$ is used in the exponential
term of $f$, yielding a higher-order logistic model; see [41], p. 153.
Chapter 5 of Royama [41] examines the autocorrelation function and the partial
autocorrelation function of the Canadian lynx series and concludes that h should
be set to 2, which corresponds to the model
which implies that the maximum logarithmic reproduction rate is 0.460, i.e., the
population can grow at most $10^{0.46} = 2.884$ times per year. Figure 2, top left
Fig 2. Contour plot of $\hat R_{t-1} = \widehat{X_t - X_{t-1}}$ of the logistic model (5.6). The observations are
marked by ∗. The dotted line is $X_{t-2} = X_{t-1}$. The intersection of this line and the contour
numbered 0 gives the equilibrium $X^*$. (Vertical axis: $x(t-2)$.)
corner, shows a negative contour of the response surface of the fitted model (5.6).
This implies that the population size can drop sharply in the region $X_{t-2} > 3.5$
and $X_{t-1} < 2.5$, leading to extinction in the upper left part of this region. Whereas
(5.6) does not rule out the possibility of $X_t$ diverging to $-\infty$, extinction occurs as
soon as $X_t$ falls below 0 (or equivalently, the population size $10^{X_t}$ falls below 1).
Note that one can also derive bounds on the logarithmic reproduction rates
from the empirical approach. Figure 3 is the plot of the limit cycle generated by the
skeleton of the fitted model (5.4). The limit cycle is of period 8 years. The maximum
and the minimum logarithmic reproduction rates, attained at years 1 and 5 in
Figure 3, are 0.212 and −0.269, respectively. That is, the population grows at most
$10^{0.212} = 1.629$ times per year and diminishes by at most a factor of $10^{-0.269} = 0.538$
per year. Moreover, the limit cycle of (5.4) implies an infinite loop of expansion and
contraction and rules out the possibility of extinction. These are consequences of
adopting an empirical approach because the data are distributed along the main
diagonal of Figure 2, but not its top left corner nor its lower right corner. In order to
deduce the behavior of the reproduction rates in these regions, mechanistic modeling
is essential. On the other hand, the empirical approach uses the observed data better
and gives more accurate forecasts. Table 1 compares the prediction performance of
FAR(2,2) and SETAR(2) with that of the logistic model (5.5). The fitted logistic
model provides the worst AAPE of one-step-ahead and iterative two-step-ahead
forecasts. Moreover, instead of characterizing the equilibrium with limit cycles,
the logistic model only gives two equilibrium points, with one corresponding to
extinction and the other equal to $X^* = \{a_0 + \log(r_m)\}/(a_1 + a_2) = 3.107$ (the
intersection of the line $X_{t-1} = X_{t-2}$ and the contour of $f = 0$ in Figure 2).
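The bounds on the logarithmic reproduction rates implied by the empirical model can be checked by iterating the skeleton of (5.4), that is, the fitted recursion with the noise switched off. The Python sketch below is a minimal illustration: the starting values, the number of iterations, the burn-in length and the function names are assumptions, and the printed coefficients of (5.4) are rounded.

```python
def setar_skeleton(x_lag1, x_lag2):
    """Noise-free one-step map of the fitted SETAR(2) model (5.4)."""
    if x_lag2 <= 2.981:
        return 0.424 + 1.255 * x_lag1 - 0.348 * x_lag2
    return 1.882 + 1.516 * x_lag1 - 1.126 * x_lag2


def iterate_skeleton(x_lag1, x_lag2, n_steps):
    """Iterate the skeleton n_steps times from the initial pair (X_{t-1}, X_{t-2})."""
    path = []
    for _ in range(n_steps):
        x_new = setar_skeleton(x_lag1, x_lag2)
        path.append(x_new)
        x_lag1, x_lag2 = x_new, x_lag1
    return path


def growth_rate_bounds(path, burn_in):
    """Largest and smallest yearly changes X_t - X_{t-1} after discarding a burn-in."""
    tail = path[burn_in:]
    rates = [b - a for a, b in zip(tail, tail[1:])]
    r_max, r_min = max(rates), min(rates)
    # On the original scale the population changes by the factors 10**r_max and 10**r_min.
    return r_max, r_min, 10 ** r_max, 10 ** r_min
```

Calling `growth_rate_bounds(iterate_skeleton(3.0, 3.0, 400), 300)` gives extreme rates that can be compared with the values 0.212 and −0.269 reported above. Because the skeleton only visits the region where data were observed, such bounds say nothing about the corners of Figure 2, which is where the mechanistic model is needed.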
We next apply the combined substantive-empirical approach of Section 4 to these
data, using the substantive model (5.5) to provide one of the basis functions in the
Fig 3. Limit cycle of the skeleton of the SETAR(2) model (5.4). The dotted line is $X_t = X_{t-1}$.
semiparametric model
We evaluate this fitted model by using the out-sample prediction criterion. Table 1
shows that (5.8a) gives the smallest AAPE for one-step-ahead forecasts among all
models considered, and that its AAPE for iterative two-step-ahead forecasts is
comparable to the smallest one, which is provided by FAR(2,2). The region S in
(5.8a) is chosen to be the oblique rectangle whose edges are defined by the sample
means ±3 standard deviations of the principal components of the bivariate sample
of (X_{t−1}, X_{t−2}); see Figure 4, which shows that this region contains not only the
in-sample data but also the out-sample data. Figure 5 gives the contour plot of
the fitted model (5.8a). The logarithmic growth rate at its top left corner is about
−2, which shows a strong possibility of extinction even though the magnitude is
less drastic than that in Figure 2 for (5.6). The inclusion of tensor products of
univariate splines in (5.8a) would have produced a positive probability of X_t
diverging to ∞ or to −∞ if (X_{t−1}, X_{t−2}) had not been confined to a compact
region. On the other hand, with an absorbing barrier at 0 and with (5.8b) only
applicable inside the compact set S, Markov chains of the type (5.8a) not only have
stationary distributions but are also geometrically ergodic under mild assumptions
on the random disturbances u_t (e.g., to ensure irreducibility); see [39].
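The construction of the oblique rectangle S described above can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the principal components are obtained from the sample covariance matrix of the lagged pairs, and the series `x` below is synthetic stand-in data rather than the lynx series. The names `fit_oblique_rectangle` and `in_rectangle` are hypothetical.

```python
import numpy as np


def fit_oblique_rectangle(points, n_sd=3.0):
    """Return centre, principal axes and half-widths of the oblique rectangle."""
    points = np.asarray(points, dtype=float)
    center = points.mean(axis=0)
    cov = np.cov(points, rowvar=False)        # 2x2 sample covariance matrix
    eigvals, axes = np.linalg.eigh(cov)       # columns of `axes` are principal directions
    half_widths = n_sd * np.sqrt(np.clip(eigvals, 0.0, None))
    return center, axes, half_widths


def in_rectangle(point, center, axes, half_widths):
    """True if the principal-component scores of `point` lie within the half-widths."""
    scores = (np.asarray(point, dtype=float) - center) @ axes
    return bool(np.all(np.abs(scores) <= half_widths))


# Synthetic stand-in for a log10 population series (assumption, not the lynx data).
rng = np.random.default_rng(0)
t = np.arange(120)
x = 3.0 + 0.5 * np.sin(2 * np.pi * t / 10) + 0.1 * rng.standard_normal(t.size)

# Rows are the lagged pairs (X_{t-1}, X_{t-2}).
pairs = np.column_stack([x[1:-1], x[:-2]])

center, axes, half_widths = fit_oblique_rectangle(pairs)
inside = [in_rectangle(p, center, axes, half_widths) for p in pairs]
print(sum(inside), "of", len(pairs), "pairs fall inside S")
```

Working in principal-component coordinates is what makes the rectangle oblique: its edges are aligned with the main diagonal along which the lagged pairs concentrate, rather than with the coordinate axes.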
Fig 4. The oblique rectangle S formed by ±3 standard deviations away from the sample means
of the principal components of (X_{t−1}, X_{t−2}). The in-sample and out-sample observations are
marked by ∗ and o, respectively.

Fig 5. Contour plot of R̂_{t−1} = X_t − X_{t−1} of the logistic-MARS model (5.7). The observations
are marked by ∗. The shaded region corresponds to extinction.

6. Conclusion
In his concluding remarks, Cox [11] noted that for successful use of statistical models
in particular applications, “large elements of subject-matter judgment and technical
statistical expertise are usually essential. Indeed, it is precisely the need for this
combination that makes our subject such an interesting and demanding one.” We
have followed up on his remarks here with a combined subject-matter and statistical
modeling approach to time series analysis, which we illustrate for the “particular
applications” of option pricing and population dynamics of the Canadian lynx. In
particular, for the Canadian lynx data, we have shown how statistical modeling for
data-rich regions of (X_{t−1}, X_{t−2}) can be combined effectively with “subject-matter
judgment” which is the only reliable guide for sparse-data regions.
Acknowledgments
Lai’s research was supported by the National Science Foundation grant
DMS-0305749. Wong’s research was supported by the Research Grants Council of Hong
Kong under grant CUHK6158/02E.
References
[1] AitSahlia, F. and Lai, T. L. (2001). Exercise boundaries and efficient
approximations to American option prices and hedge parameters. J. Comput.
Finance 4 85–103.
[2] Barron, A. R. (1993). Universal approximation bounds for superpositions of
a sigmoidal function. IEEE Trans. Information Theory 39 930–945. MR1237720
[3] Black, F. and Scholes, M. (1973). The pricing of options and corporate
liabilities. J. Political Economy 81 637–659.
[4] Broadie, M. and Detemple, J. (1996). American option valuation: New
bounds, approximations, and a comparison of existing methods. Rev.
Financial Studies 9 1121–1250.
[5] Broadie, M., Detemple, J., Ghysels, E. and Torres, O. (2000).
Nonparametric estimation of American options’ exercise boundaries and call prices.
J. Econ. Dynamics & Control 24 1829–1857. MR1784575
[6] Campbell, M. J. and Walker, A. M. (1977). A survey of statistical work
on the Mackenzie River series of annual Canadian lynx trappings for the years
1821–1934, and a new analysis. J. Roy. Statist. Soc. Ser. A 140 411–431.
[7] Carr, P., Jarrow, R. and Myneni, R. (1992). Alternative
characterizations of American put options. Math. Finance 2 87–106. MR1143390
[8] Chen, H. (1988). Convergence rates for parametric components in a partly
linear model. Ann. Statist. 16 136–146. MR0924861
[9] Chen, R. and Tsay, R. S. (1993). Functional-coefficient autoregressive
models. J. Amer. Statist. Assoc. 88 298–308. MR1212492
[10] Chen, R. and Tsay, R. S. (1993). Nonlinear additive ARX models. J. Amer.
Statist. Assoc. 88 955–967.
[11] Cox, D. R. (1990). Role of models in statistical analysis. Statist. Sci. 5
169–174. MR1062575
[12] Cox, J., Ross, S. and Rubinstein, M. (1979). Option pricing: A simplified
approach. J. Financial Econ. 7 229–263.
[13] Engle, R. F., Granger, C. W. J., Rice, J. and Weiss, A. (1986).
Semiparametric estimates of the relation between weather and electricity sales. J.
Amer. Statist. Assoc. 81 310–320.
[14] Fan, J. and Yao, Q. (2003). Nonlinear Time Series. Springer-Verlag, New
York. MR1964455
[15] Friedman, J. H. (1991). Multivariate adaptive regression splines. Ann.
Statist. 19 1–142. MR1091842
els and data mining in time series: a case-study on the Canadian lynx data.
Appl. Statist. 47 187–201.
[37] Merton, R. C. (1973). Theory of rational option pricing. Bell J. Econ. &
Management Sci. 4 141–181. MR0496534
[38] Meyn, S. P. and Tweedie, R. L. (1993). Markov Chains and Stochastic
Stability. Springer-Verlag, New York. MR1287609
[39] Moran, P. A. P. (1953). The statistical analysis of the Canadian lynx cycle,
I: Structure and prediction. Austral. J. Zoology 1 163–173.
[40] Ross, S. A. (1987). Finance. In The New Palgrave: A Dictionary of Economics
(J. Eatwell, M. Milgate and P. Newman, eds.), Vol. 2. Stockton Press, New
York, pp. 322–336.
[41] Royama, T. (1992). Analytical Population Dynamics. Chapman & Hall,
London.
[42] Stenseth, N. C., Chan, K. S., Tong, H., Boonstra, R., Boutin, S.,
Krebs, C. J., Post, E., O’Donoghue, M., Yoccoz, N. G.,
Forchhammer, M. C. and Hurrell, J. W. (1998). From patterns to processes: Phase
and density dependencies in the Canadian lynx cycle. Proc. Natl. Acad. Sci.
USA 95 15430–15435.
[43] Tong, H. (1977). Some comments on the Canadian lynx data. J. Roy. Statist.
Soc. Ser. A 140 432–435.
[44] Tong, H. (1990). Nonlinear Time Series. Oxford University Press, Oxford.
MR1079320
[45] Tong, H. and Lim, K. S. (1980). Threshold autoregression, limit cycles and
cyclical data (with Discussion). J. Roy. Statist. Soc. Ser. B 42 245–292.
[46] Wahba, G. (1990). Spline Models for Observational Data. SIAM Press,
Philadelphia. MR1045442
[47] Weigend, A. and Gershenfeld, N. (1993). Time Series Prediction:
Forecasting the Future and Understanding the Past. Addison-Wesley, Reading, MA.
[48] Weigend, A., Rumelhart, D. and Huberman, B. (1991). Predicting
Sunspots and Exchange Rates with Connectionist Networks. In Nonlinear
Modeling and Forecasting (Casdagli, M. and Eubank, S., eds.). Addison-Wesley,
Redwood City, CA, 395–432.