Rugarch Introduction
(Version 1.4-3)
Alexios Ghalanos

Contents

1 Introduction
2 Model Specification
   2.1 Univariate ARFIMAX Models
   2.2 Univariate GARCH Models
      2.2.1 The standard GARCH model ('sGARCH')
      2.2.2 The integrated GARCH model ('iGARCH')
      2.2.3 The exponential GARCH model ('eGARCH')
      2.2.4 The GJR-GARCH model ('gjrGARCH')
      2.2.5 The asymmetric power ARCH model ('apARCH')
      2.2.6 The family GARCH model ('fGARCH')
      2.2.7 The Component sGARCH model ('csGARCH')
      2.2.8 The Multiplicative Component sGARCH model ('mcsGARCH')
      2.2.9 The realized GARCH model ('realGARCH')
      2.2.10 The fractionally integrated GARCH model ('fiGARCH')
   2.3 Conditional Distributions
      2.3.1 The Normal Distribution
      2.3.2 The Student Distribution
      2.3.3 The Generalized Error Distribution
      2.3.4 Skewed Distributions by Inverse Scale Factors
      2.3.5 The Generalized Hyperbolic Distribution and Sub-Families
      2.3.6 The Generalized Hyperbolic Skew Student Distribution
      2.3.7 Johnson's Reparametrized SU Distribution
3 Fitting
   3.1 Fit Diagnostics
4 Filtering
5 Forecasting and the GARCH Bootstrap
6 Simulation
7 Rolling Estimation
8 Simulated Parameter Distribution and RMSE
9 The ARFIMAX Model with constant variance
10 Misspecification and Other Tests
11 Future Development
12 FAQs
1 Introduction
The pioneering work of Box et al. (1994) in the area of autoregressive moving average models
paved the way for related work in the area of volatility modelling with the introduction of ARCH
and then GARCH models by Engle (1982) and Bollerslev (1986), respectively. In terms of the
statistical framework, these models provide motion dynamics for the dependency in the condi-
tional time variation of the distributional parameters of the mean and variance, in an attempt
to capture such phenomena as autocorrelation in returns and squared returns. Extensions to
these models have included more sophisticated dynamics such as threshold models to capture
the asymmetry in the news impact, as well as distributions other than the normal to account
for the skewness and excess kurtosis observed in practice. In a further extension, Hansen (1994)
generalized the GARCH models to capture time variation in the full density parameters, with
the Autoregressive Conditional Density Model1 , relaxing the assumption that the conditional
distribution of the standardized innovations is independent of the conditioning information.
The rugarch package aims to provide a comprehensive set of methods for modelling univariate
GARCH processes, including fitting, filtering, forecasting and simulation, as well as diagnostic
tools including plots and various tests. Additional methods such as rolling estimation, bootstrap
forecasting and simulated parameter density for evaluating model uncertainty provide a rich
environment for the modelling of these processes. This document discusses the finer details of
the included models and conditional distributions and how they are implemented in the package
with numerous examples.
The rugarch package is available on CRAN (http://cran.r-project.org/web/packages/
rugarch/index.html) and the development version on bitbucket (https://bitbucket.org/
alexiosg). Some online examples and demos are available on my website (http://www.unstarched.
net).
The package is provided AS IS, without any implied warranty as to its accuracy or suitability.
A lot of time and effort has gone into the development of this package, and it is offered under the
GPL-3 license in the spirit of open knowledge sharing and dissemination. If you do use the model
in published work DO remember to cite the package and author (type citation("rugarch") for
the appropriate BibTeX entry), and if you have used it and found it useful, drop me a note and
let me know.
USE THE R-SIG-FINANCE MAILING LIST FOR QUESTIONS.
A section on FAQ is included at the end of this document.
2 Model Specification
This section discusses the key step in the modelling process, namely that of the specification.
This is defined via a call to the ugarchspec function,
> args(ugarchspec)
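For example, a minimal sketch of an ARMA(1,1)-GARCH(1,1) specification with Student conditional distribution might look as follows (the choice of orders and distribution here is purely illustrative):

> spec = ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)),
+ mean.model = list(armaOrder = c(1, 1), include.mean = TRUE),
+ distribution.model = "std")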
Thus a model, in the rugarch package, may be described by the dynamics of the conditional
mean and variance, and the distribution to which they belong, which determines any additional
parameters. The following sub-sections will outline the background and details of the dynamics
and distributions implemented in the package.

1 The racd package is now available from my bitbucket repository.
2.1 Univariate ARFIMAX Models

The univariate ARFIMAX model, in its most general form, may be written as,
$$\Phi(L)(1-L)^d\left(y_t - \mu_t\right) = \Theta(L)\varepsilon_t, \tag{1}$$
with the left hand side denoting the Fractional AR specification on the demeaned data and the
right hand side the MA specification on the residuals. $L$ is the lag operator, $(1-L)^d$ the long
memory fractional process with $0 < d < 1$, equivalent to the Hurst Exponent $H - 0.5$, and
$\mu_t$ defined as,
$$\mu_t = \mu + \sum_{i=1}^{m-n}\delta_i x_{i,t} + \sum_{i=m-n+1}^{m}\delta_i x_{i,t}\sigma_t + \xi\sigma_t^k, \tag{2}$$
where we allow for $m$ external regressors $x$, of which the last $n$ may optionally be multiplied
by the conditional standard deviation $\sigma_t$, and for ARCH-in-mean on either the conditional
standard deviation ($k = 1$) or the conditional variance ($k = 2$). These options can all be passed
via the arguments in the mean.model list in the ugarchspec function,
• armaOrder (default = (1,1). The order of the ARMA model.)
• include.mean (default = TRUE. Whether the mean is modelled.)
• archm (default = FALSE. The ARCH-in-mean parameter.)
• archpow (default = 1 for standard deviation, else 2 for variance.)
• arfima (default = FALSE. Whether to use fractional differencing.)
• external.regressors (default = NULL. A matrix of external regressors of the same length
as the data.)
• archex (default = FALSE. Either FALSE or integer denoting the number of external re-
gressors from the end of the matrix to multiply by the conditional standard deviation.).
Since the specification allows for both fixed and starting parameters to be passed, it is useful to
provide the naming convention for these here,
• AR parameters are ’ar1’, ’ar2’, ...,
• MA parameters are ’ma1’, ’ma2’, ...,
• mean parameter is ’mu’
• archm parameter is ’archm’
• the arfima parameter is ’arfima’
• the external regressor parameters are ’mxreg1’, ’mxreg2’, ...,
Note that estimation of the mean and variance equations in the maximization of the likelihood
is carried out jointly in a single step. While it is perfectly possible and consistent to perform
a 2-step estimation, the one step approach results in greater efficiency, particularly for smaller
datasets.
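To illustrate the options above, the following sketch specifies an ARFIMA(1,d,1) mean equation with ARCH-in-mean on the conditional variance and two external regressors, the last of which is multiplied by the conditional standard deviation (X is a hypothetical T x 2 regressor matrix of the same length as the data):

> # X: hypothetical T x 2 matrix of external regressors
> spec = ugarchspec(mean.model = list(armaOrder = c(1, 1), include.mean = TRUE,
+ archm = TRUE, archpow = 2, arfima = TRUE,
+ external.regressors = X, archex = 1))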
2.2 Univariate GARCH Models
In GARCH models, the density function is usually written in terms of the location and scale
parameters, normalized to give zero mean and unit variance, with $\omega = \omega(\theta, x_t)$ denoting the
remaining parameters of the distribution, perhaps a shape and skew parameter. The conditional
mean and variance are used to scale the innovations,
$$z_t(\theta) = \frac{y_t - \mu(\theta, x_t)}{\sigma(\theta, x_t)}, \tag{6}$$
The choice of GARCH flavor is passed via the variance.model list in the ugarchspec function:
• model (default = 'sGARCH' (vanilla GARCH). Valid models are 'iGARCH', 'gjrGARCH',
'eGARCH', 'apARCH', 'fGARCH', 'csGARCH', 'mcsGARCH', 'realGARCH' and 'fiGARCH').
• submodel (default = NULL. In the case of the ’fGARCH’ omnibus model, valid choices are
’GARCH’, ’TGARCH’, ’GJRGARCH’, ’AVGARCH’, ’NGARCH’, ’NAGARCH’, ’APARCH’
and ’ALLGARCH’)
The rest of this section discusses the various flavors of GARCH implemented in the package,
while Section 2.3 discusses the distributions implemented and their standardization for use in
GARCH processes.
2.2.1 The standard GARCH model (’sGARCH’)
The standard GARCH model (Bollerslev (1986)) may be written as:
$$\sigma_t^2 = \omega + \sum_{j=1}^{m}\zeta_j v_{jt} + \sum_{j=1}^{q}\alpha_j\varepsilon_{t-j}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2, \tag{9}$$
with $\sigma_t^2$ denoting the conditional variance, $\omega$ the intercept and $\varepsilon_t^2$ the squared residuals from
the mean filtration process discussed previously. The GARCH order is defined by $(q, p)$ (ARCH,
GARCH), with possibly $m$ external regressors $v_j$ which are passed pre-lagged. If variance
targeting is used, then $\omega$ is replaced by,
$$\bar\sigma^2\left(1 - \hat P\right) - \sum_{j=1}^{m}\zeta_j\bar v_j, \tag{10}$$
where $\bar\sigma^2$ is the unconditional variance of $\varepsilon^2$, consistently estimated by its sample counterpart
at every iteration of the solver following the mean equation filtration, $\bar v_j$ represents the sample
mean of the $j$th external regressor in the variance equation (assuming stationarity), and $\hat P$ is
the persistence, defined below. If a numeric value was provided to the variance.targeting option
in the specification (instead of a logical), this is used instead of $\bar\sigma^2$ for the calculation. One of
the key features of the observed behavior of financial data which GARCH models capture is
volatility clustering, which may be quantified in the persistence parameter $\hat P$. For the 'sGARCH'
model this may be calculated as,
$$\hat P = \sum_{j=1}^{q}\alpha_j + \sum_{j=1}^{p}\beta_j. \tag{11}$$
Related to the persistence is the 'half-life' of a volatility shock, defined as the number of periods
it takes for a shock to revert half way back towards the unconditional variance,
$$h2l = \frac{-\log_e 2}{\log_e \hat P}. \tag{12}$$
Finally, the unconditional variance of the model, $\hat\sigma^2$, related to its persistence, is,
$$\hat\sigma^2 = \frac{\hat\omega}{1 - \hat P}, \tag{13}$$
where $\hat\omega$ is the estimated value of the intercept from the GARCH model. The naming conventions
for passing fixed or starting parameters for this model are:
• ARCH parameters are 'alpha1', 'alpha2', ...,
• GARCH parameters are 'beta1', 'beta2', ...,
• the variance intercept parameter is 'omega',
• the external regressor parameters are 'vxreg1', 'vxreg2', ....
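The persistence, half-life and unconditional variance of Equations (11), (12) and (13) may be extracted from a fitted object via the persistence, halflife and uncvariance methods. A brief sketch, using the sp500ret dataset included with the package:

> data(sp500ret)
> spec = ugarchspec(variance.model = list(model = "sGARCH", garchOrder = c(1, 1)))
> fit = ugarchfit(spec, data = sp500ret)
> persistence(fit)  # equation (11)
> halflife(fit)     # equation (12)
> uncvariance(fit)  # equation (13)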
2.2.2 The integrated GARCH model (’iGARCH’)
The integrated GARCH model (see Engle and Bollerslev (1986)) assumes that the persistence
$\hat P = 1$, and imposes this during the estimation procedure. Because of unit persistence, none of
the other results can be calculated (i.e. unconditional variance, half-life etc.). The stationarity
of the model has been established in the literature, but one should investigate the possibility
of omitted structural breaks before adopting the iGARCH as the model of choice. The package
enforces the sum of the ARCH and GARCH parameters to equal 1 by setting the last GARCH
coefficient to $\beta_p = 1 - \sum_{i=1}^{q}\alpha_i - \sum_{i=1}^{p-1}\beta_i$, so that the last beta is never estimated but instead
calculated.
2.2.3 The exponential GARCH model ('eGARCH')

The exponential model of Nelson (1991) is defined as,
$$\log_e\left(\sigma_t^2\right) = \omega + \sum_{j=1}^{m}\zeta_j v_{jt} + \sum_{j=1}^{q}\left(\alpha_j z_{t-j} + \gamma_j\left(|z_{t-j}| - E|z_{t-j}|\right)\right) + \sum_{j=1}^{p}\beta_j\log_e\left(\sigma_{t-j}^2\right), \tag{14}$$
where the coefficient $\alpha_j$ captures the sign effect and $\gamma_j$ the size effect. The expected value of the
absolute standardized innovation $z_t$ is,
$$E|z_t| = \int_{-\infty}^{\infty}|z|f(z, 0, 1, \ldots)\,dz. \tag{15}$$
The unconditional variance and half-life follow from the persistence parameter and are calculated
as in Section 2.2.1.
2.2.4 The GJR-GARCH model ('gjrGARCH')

The GJR GARCH model of Glosten et al. (1993) models positive and negative shocks on the
conditional variance asymmetrically via the use of the indicator function $I$,
$$\sigma_t^2 = \omega + \sum_{j=1}^{m}\zeta_j v_{jt} + \sum_{j=1}^{q}\left(\alpha_j\varepsilon_{t-j}^2 + \gamma_j I_{t-j}\varepsilon_{t-j}^2\right) + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2, \tag{18}$$
where $\gamma_j$ now represents the 'leverage' term. The indicator function $I$ takes on the value of 1 for
$\varepsilon \le 0$ and 0 otherwise. Because of the presence of the indicator function, the persistence of
the model now crucially depends on the asymmetry of the conditional distribution used. The
persistence of the model, $\hat P$, is,
$$\hat P = \sum_{j=1}^{q}\alpha_j + \sum_{j=1}^{p}\beta_j + \sum_{j=1}^{q}\gamma_j\kappa, \tag{19}$$
where $\kappa$ is the expected value of the squared standardized residuals $z_{t-j}^2$ below zero (effectively,
for symmetric distributions, the probability of being below zero),
$$\kappa = E\left[I_{t-j}z_{t-j}^2\right] = \int_{-\infty}^{0}z^2 f(z, 0, 1, \ldots)\,dz, \tag{20}$$
where $f$ is the standardized conditional density with any additional skew and shape parameters
$(\ldots)$. In the case of symmetric distributions the value of $\kappa$ is simply equal to 0.5. The variance
targeting, half-life and unconditional variance follow from the persistence parameter and are
calculated as in Section 2.2.1. The naming conventions for passing fixed or starting parameters
for this model follow those of the sGARCH model, with the addition of the leverage parameters
'gamma1', 'gamma2', ....
Note that the Leverage parameter follows the order of the ARCH parameter.
2.2.5 The asymmetric power ARCH model ('apARCH')

The asymmetric power ARCH model of Ding et al. (1993) allows for both leverage and the
Taylor effect, and is defined as,
$$\sigma_t^\delta = \omega + \sum_{j=1}^{m}\zeta_j v_{jt} + \sum_{j=1}^{q}\alpha_j\left(|\varepsilon_{t-j}| - \gamma_j\varepsilon_{t-j}\right)^\delta + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^\delta, \tag{21}$$
where $\delta \in \mathbb{R}^+$, being a Box-Cox transformation of $\sigma_t$, and $\gamma_j$ the coefficient in the leverage term.
Various submodels arise from this model:
• The Absolute Value GARCH (AVGARCH) model of Taylor (1986) and Schwert (1990)
when $\delta = 1$ and $\gamma_j = 0$.
• The Log ARCH model of Geweke (1986) and Pantula (1986) when $\delta \to 0$.
The persistence of the model is given by,
$$\hat P = \sum_{j=1}^{p}\beta_j + \sum_{j=1}^{q}\alpha_j\kappa_j, \tag{22}$$
where $\kappa_j$ is the expected value of the standardized residuals $z_t$ under the Box-Cox transformation
of the term which includes the leverage coefficient $\gamma_j$,
$$\kappa_j = E\left[\left(|z| - \gamma_j z\right)^\delta\right] = \int_{-\infty}^{\infty}\left(|z| - \gamma_j z\right)^\delta f(z, 0, 1, \ldots)\,dz. \tag{23}$$
In particular, to obtain any of the submodels simply pass the appropriate parameters as fixed.
2.2.6 The family GARCH model ('fGARCH')

The family GARCH model of Hentschel (1995) is an omnibus model which subsumes some of
the most popular GARCH models, defined as,
$$\sigma_t^\lambda = \omega + \sum_{j=1}^{m}\zeta_j v_{jt} + \sum_{j=1}^{q}\alpha_j\sigma_{t-j}^\lambda\left(|z_{t-j} - \eta_{2j}| - \eta_{1j}\left(z_{t-j} - \eta_{2j}\right)\right)^\delta + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^\lambda, \tag{24}$$
which is a Box-Cox transformation for the conditional standard deviation whose shape is determined
by $\lambda$, while the parameter $\delta$ transforms the absolute value function, which is subject to
rotations and shifts through the $\eta_{1j}$ and $\eta_{2j}$ parameters respectively. Various submodels arise
from this model, and are passed to the ugarchspec 'variance.model' list via the submodel option,
• The Absolute Value GARCH (AVGARCH) model of Taylor (1986) and Schwert (1990)
when λ = δ = 1 and |η1j | ≤ 1 (submodel = ’AVGARCH’).
• The GJR GARCH (GJRGARCH) model of Glosten et al. (1993) when λ = δ = 2 and
η2j = 0 (submodel = ’GJRGARCH’).
• The Nonlinear Asymmetric GARCH model of Engle and Ng (1993) when $\delta = \lambda = 2$ and
$\eta_{1j} = 0$ (submodel = 'NAGARCH').
• The Asymmetric Power ARCH model of Ding et al. (1993) when δ = λ, η2j = 0 and
|η1j | ≤ 1 (submodel = ’APARCH’).
• The Exponential GARCH model of Nelson (1991) when δ = 1, λ = 0 and η2j = 0 (not
implemented as a submodel of fGARCH).
where $\kappa_j$ is the expected value of the standardized residuals $z_t$ under the Box-Cox transformation
of the absolute value asymmetry term,
$$\kappa_j = E\left[\left(|z_{t-j} - \eta_{2j}| - \eta_{1j}\left(z_{t-j} - \eta_{2j}\right)\right)^\delta\right] = \int_{-\infty}^{\infty}\left(|z - \eta_{2j}| - \eta_{1j}\left(z - \eta_{2j}\right)\right)^\delta f(z, 0, 1, \ldots)\,dz. \tag{28}$$
• the Conditional Sigma Power parameter is 'lambda'.
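As an illustration, a sketch of a specification for the fGARCH omnibus model with the TGARCH submodel follows (equivalent restrictions could instead be obtained by passing the relevant 'eta' parameters as fixed):

> spec = ugarchspec(variance.model = list(model = "fGARCH", garchOrder = c(1, 1),
+ submodel = "TGARCH"))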
2.2.7 The Component sGARCH model ('csGARCH')

The component model of Lee and Engle (1999) decomposes the conditional variance into a
permanent and transitory component so as to investigate the long- and short-run movements of
volatility,
$$\sigma_t^2 = q_t + \sum_{j=1}^{q}\alpha_j\left(\varepsilon_{t-j}^2 - q_{t-j}\right) + \sum_{j=1}^{p}\beta_j\left(\sigma_{t-j}^2 - q_{t-j}\right), \tag{29}$$
$$q_t = \omega + \rho q_{t-1} + \phi\left(\varepsilon_{t-1}^2 - \sigma_{t-1}^2\right), \tag{30}$$
where effectively the intercept of the GARCH model is now time-varying following first order
autoregressive type dynamics. The difference between the conditional variance and its trend,
$\sigma_{t-j}^2 - q_{t-j}$, is the transitory component of the conditional variance. The conditions for the non-negativity
of the conditional variance are given in Lee and Engle (1999) and imposed during
estimation by the stationarity option in the fit.control list of the ugarchfit method, and are
related to the stationarity conditions that the sum of the $(\alpha, \beta)$ coefficients be less than 1 and
that $\rho < 1$ (effectively the persistence of the transitory and permanent components).
The multistep, $n > 1$ ahead forecast of the conditional variance proceeds as follows:
$$E_{t-1}\left[\sigma_{t+n}^2\right] = E_{t-1}\left[q_{t+n}\right] + \sum_{j=1}^{q}\alpha_j E_{t-1}\left[\varepsilon_{t+n-j}^2 - q_{t+n-j}\right] + \sum_{j=1}^{p}\beta_j E_{t-1}\left[\sigma_{t+n-j}^2 - q_{t+n-j}\right]. \tag{32}$$
However, $E_{t-1}\left[\varepsilon_{t+n-j}^2\right] = E_{t-1}\left[\sigma_{t+n-j}^2\right]$, therefore:
$$\begin{aligned}
E_{t-1}\left[\sigma_{t+n}^2\right] &= E_{t-1}\left[q_{t+n}\right] + \sum_{j=1}^{q}\alpha_j E_{t-1}\left[\sigma_{t+n-j}^2 - q_{t+n-j}\right] + \sum_{j=1}^{p}\beta_j E_{t-1}\left[\sigma_{t+n-j}^2 - q_{t+n-j}\right]\\
E_{t-1}\left[\sigma_{t+n}^2\right] &= E_{t-1}\left[q_{t+n}\right] + \left(\sum_{j=1}^{\max(p,q)}\left(\alpha_j + \beta_j\right)\right)^n\left(\sigma_t^2 - q_t\right).
\end{aligned}\tag{33}$$
The permanent component forecast can be represented as:
$$\begin{aligned}
E_{t-1}\left[q_{t+n}\right] &= \omega + \rho E_{t-1}\left[q_{t+n-1}\right] + \phi E_{t-1}\left[\varepsilon_{t+n-1}^2 - \sigma_{t+n-1}^2\right]\\
&= \omega + \rho E_{t-1}\left[q_{t+n-1}\right]\\
&= \omega + \rho\left[\omega + \rho E_{t-1}\left[q_{t+n-2}\right]\right]\\
&= \ldots\\
&= \left(1 + \rho + \cdots + \rho^{n-1}\right)\omega + \rho^n q_t\\
&= \frac{1 - \rho^n}{1 - \rho}\omega + \rho^n q_t.
\end{aligned}\tag{34--39}$$
As $n \to \infty$ the unconditional variance is:
$$E_{t-1}\left[\sigma_{t+n}^2\right] = E_{t-1}\left[q_{t+n}\right] = \frac{\omega}{1 - \rho}. \tag{41}$$
In the rugarch package, the parameters ρ and φ are represented by η11 (’eta11’) and η21 (’eta21’)
respectively.
2.2.8 The Multiplicative Component sGARCH model ('mcsGARCH')

The multiplicative component GARCH model of Engle and Sokalska (2012) (henceforth ES2012)
decomposes the conditional variance of intraday returns into the product of daily, diurnal and
stochastic (intraday) components, with the stochastic intraday component following sGARCH
dynamics; see Section 2.2.1 for details. In the rugarch package, unlike the paper of ES2012, the conditional
mean and variance equations (and hence the diurnal component on the residuals from the con-
ditional mean filtration) are estimated jointly. Furthermore, and unlike ES2012, it is possible
to include ARMAX dynamics in the conditional mean, though because of the complexity of
the model and its use of time indices, ARCH-m is not currently allowed, but may be included
once the xts package is fully translated to Rcpp. Finally, as an additional point of departure
from ES2012, the diurnal component calculation uses the median instead of the mean which was
found to provide for a more robust alternative, particularly given the type and size of datasets
typically used [changed in version 1.2-3]. As it currently stands, the model has methods for esti-
mation (ugarchfit), filtering (ugarchfilter), forecast from fit (ugarchforecast) but not from spec
(secondary dispatch method), simulation from fit (ugarchsim) but not from spec (ugarchpath),
rolling estimation (ugarchroll) but not the bootstrap (ugarchboot). Some of the plots, which
depend on the xts package will not render nicely since plot.xts does not play well with intraday
data. Some plots however such as VaRplot have been amended to properly display intraday
data coming from an xts object, and more may be added as time allows. An extensive example
of the model may be found on http://www.unstarched.net. The paper by ES2012 is currently
freely available here: http://jfec.oxfordjournals.org/content/10/1/54.full.
2.2.9 The realized GARCH model ('realGARCH')

The realized GARCH model of Hansen, Huang and Shek (2012) (henceforth HHS2012) relates
the observed realized volatility measure to the latent volatility via a measurement equation,
which also includes asymmetric reaction to shocks, making for a very flexible and rich representation:
$$\begin{aligned}
y_t &= \mu_t + \sigma_t z_t, \quad z_t \sim i.i.d.(0, 1),\\
\log\sigma_t^2 &= \omega + \sum_{i=1}^{q}\alpha_i\log r_{t-i} + \sum_{i=1}^{p}\beta_i\log\sigma_{t-i}^2,\\
\log r_t &= \xi + \delta\log\sigma_t^2 + \tau(z_t) + u_t, \quad u_t \sim N\left(0, \lambda^2\right),
\end{aligned}$$
where we have defined the dynamics for the returns ($y_t$), the log of the conditional variance ($\sigma_t^2$)
and the log of the realized measure ($r_t$).3 The asymmetric reaction to shocks comes via the $\tau(\cdot)$
function which is based on the Hermite polynomials and truncated at the second level to give a
simple quadratic form:
$$\tau(z_t) = \eta_1 z_t + \eta_2\left(z_t^2 - 1\right), \tag{47}$$
3
In the original paper by HHS2012, the notation is slightly different as I have chosen to re-use some of the
symbols/variables already in the rugarch specification. For completeness, the differences are noted:
• yt (rugarch) = rt (HHS2012)
• α (rugarch) = γ (HHS2012)
• σt2 (rugarch) = ht (HHS2012)
• rt (rugarch) = xt (HHS2012)
• δ (rugarch) = ϕ (HHS2012)
• η1 (rugarch) = τ1 (HHS2012)
• η2 (rugarch) = τ2 (HHS2012)
• λ (rugarch) = σu (HHS2012)
which has the very convenient property that $E[\tau(z_t)] = 0$. The function also forms the basis for
the creation of a type of news impact curve $\nu(z)$, defined so that $100\times\nu(z)$ is the percent change
in volatility as a function of the standardized innovations.
A key feature of this model is that it preserves the ARMA structure which characterizes many
standard GARCH models, adapted here from Proposition 1 of the HHS2012 paper:
$$\begin{aligned}
\log\sigma_t^2 &= \mu_\sigma + \sum_{i=1}^{p\vee q}\left(\delta\alpha_i + \beta_i\right)\log\sigma_{t-i}^2 + \sum_{j=1}^{q}\alpha_j w_{t-j},\\
\log r_t &= \mu_r + \sum_{i=1}^{p\vee q}\left(\delta\alpha_i + \beta_i\right)\log r_{t-i} + w_t - \sum_{j=1}^{p\vee q}\beta_j w_{t-j},
\end{aligned}\tag{49}$$
where $w_t = \tau(z_t) + u_t$, $\mu_\sigma = \omega + \xi\sum_{i=1}^{q}\alpha_i$ and $\mu_r = \delta\omega + \left(1 - \sum_{i=1}^{p}\beta_i\right)\xi$, with the convention
$\beta_i = \alpha_j = 0$ for $i > p$ and $j > q$. It is therefore a simple matter to show that the persistence
($\hat P$) of the process is given by:
$$\hat P = \sum_{i=1}^{p}\beta_i + \delta\sum_{i=1}^{q}\alpha_i. \tag{50}$$
While not standard for MLE, the independence of $z_t$ and $u_t$ means that we can factorize the
joint density into:
$$\log f\left(y_t, r_t|\mathcal{F}_{t-1}\right) = \underbrace{\log f\left(y_t|\mathcal{F}_{t-1}\right)}_{\log l(y)} + \underbrace{\log f\left(r_t|y_t, \mathcal{F}_{t-1}\right)}_{\log l(r|y)}, \tag{53}$$
which makes comparison with other GARCH models possible (using $\log l(y)$). Finally, multi-period
forecasts have a nice VARMA type representation $Y_t = AY_{t-1} + b + \varepsilon_t$, where:
$$Y_t = \begin{bmatrix}\log\sigma_t^2\\ \vdots\\ \log\sigma_{t-p+1}^2\\ \log r_t\\ \vdots\\ \log r_{t-q+1}\end{bmatrix},\quad
A = \begin{bmatrix}(\beta_1, \ldots, \beta_p) & (\alpha_1, \ldots, \alpha_q)\\ \left(I_{p-1\times p-1}, 0_{p-1\times 1}\right) & 0_{p-1\times q}\\ \delta(\beta_1, \ldots, \beta_p) & \delta(\alpha_1, \ldots, \alpha_q)\\ 0_{q-1\times p} & \left(I_{q-1\times q-1}, 0_{q-1\times 1}\right)\end{bmatrix},\quad
b = \begin{bmatrix}\omega\\ 0_{p-1\times 1}\\ \xi + \delta\omega\\ 0_{q-1\times 1}\end{bmatrix},\quad
\varepsilon_t = \begin{bmatrix}0_{p\times 1}\\ \tau(z_t) + u_t\\ 0_{q\times 1}\end{bmatrix}, \tag{54}$$
so that $Y_{t+k} = A^k Y_t + \sum_{j=0}^{k-1}A^j\left(b + \varepsilon_{t+k-j}\right)$, where it is understood that the superscripts denote
matrix power, with $A^0$ the identity matrix.
In the rugarch package, all the methods, from estimation, to filtering, forecasting and simu-
lation have been included with key parts of the code written in C for speed (as elsewhere). For
the forecast routine, some additional arguments are made available as a result of the generation
of the conditional variance density (rather than point forecasts, although these are returned
based on the average of the simulated density values). Consult the documentation and online
examples for more details.
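As a sketch of the additional data required by this model, estimation might proceed along the following lines, where ret and rv are hypothetical xts objects holding the returns and a realized volatility measure of the same length:

> # ret: returns (xts); rv: realized volatility measure (xts), both hypothetical
> spec = ugarchspec(mean.model = list(armaOrder = c(0, 0)),
+ variance.model = list(model = "realGARCH", garchOrder = c(1, 1)))
> fit = ugarchfit(spec, data = ret, solver = "hybrid", realizedVol = rv)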
2.2.10 The fractionally integrated GARCH model ('fiGARCH')

The FIGARCH model of Baillie et al. (1996) allows the fractional differencing operator to act
on the squared errors, where
$$(1-L)^d = \sum_{i=0}^{\infty}\pi_i L^i, \quad \pi_i = \prod_{1\le k\le i}\frac{k-1-d}{k},$$
and the expansion5 is usually truncated to some large number, such as 1000. Rearranging, we
obtain the following representation of the FIGARCH model:
$$\begin{aligned}
\sigma_t^2 &= \omega\left[1 - \beta(L)\right]^{-1} + \left\{1 - \left[1 - \beta(L)\right]^{-1}\phi(L)(1-L)^d\right\}\varepsilon_t^2\\
&= \omega^* + \lambda(L)\varepsilon_t^2\\
&= \omega^* + \sum_{i=1}^{\infty}\lambda_i L^i\varepsilon_t^2.
\end{aligned}\tag{59}$$
Further rearranging terms yields:
$$\begin{aligned}
\sigma_t^2 &= \omega + \varepsilon_t^2 - \beta(L)\varepsilon_t^2 - \left(\varepsilon_t^2 + \bar\varepsilon_t^2\right) + \alpha(L)\left(\varepsilon_t^2 + \bar\varepsilon_t^2\right) + \beta(L)\sigma_t^2\\
&= \omega - \bar\varepsilon_t^2 - \beta(L)\varepsilon_t^2 + \alpha(L)\left(\varepsilon_t^2 + \bar\varepsilon_t^2\right) + \beta(L)\sigma_t^2\\
&= \omega - \bar\varepsilon_t^2 - \sum_{j=1}^{p}\beta_j\varepsilon_{t-j}^2 + \sum_{j=1}^{q}\alpha_j\varepsilon_{t-j}^2 + \sum_{j=1}^{q}\alpha_j\bar\varepsilon_{t-j}^2 + \sum_{j=1}^{p}\beta_j\sigma_{t-j}^2\\
&= \omega - \bar\varepsilon_t^2 + \sum_{j=1}^{q}\alpha_j\left(\varepsilon_{t-j}^2 + \bar\varepsilon_{t-j}^2\right) + \sum_{j=1}^{p}\beta_j\left(\sigma_{t-j}^2 - \varepsilon_{t-j}^2\right).
\end{aligned}\tag{61}$$
Contrary to the case of the ARFIMA model, the degree of persistence in the FIGARCH model
operates in the opposite direction, so that as the fractional differencing parameter $d$ gets closer
to one, the memory of the FIGARCH process increases, a direct result of the parameter acting
on the squared errors rather than the conditional variance. When $d = 0$ the FIGARCH collapses
to the vanilla GARCH model and when $d = 1$ to the integrated GARCH model. The question
of the stationarity of a FIGARCH(q,d,p) model is open and there is no general proof of this at
present. As such, the stationarity argument in the estimation function is used interchangeably
for positivity conditions. Baillie et al. (1996) provided a set of sufficient conditions for the
FIGARCH(1,d,1) case which may be restrictive in practice, which is why Conrad and Haag
(2006) provide a more general set of sufficient conditions for the FIGARCH(q,d,p). Equations
(7)-(9) and Corollary 1 of their paper provide the conditions for positivity in the FIGARCH(1,d,1)
case, which the rugarch package implements.6 Therefore, while it is possible
to estimate any order desired, only conditions for the (1,d,1) case are checked and imposed during
estimation when the stationarity7 flag is set to TRUE.
5
This is the hypergeometric function expansion.
6
At present, only the (1,d,1) case is allowed.
7
Which is used here to denote a positivity constraint
Numerous alternatives and extensions have been proposed in the literature since the FIGARCH
model was published. The model of Karanasos et al. (2004) models the squared residuals
as deviations from $\omega$ so that it specifies a covariance stationary process (although the question
of strict stationarity still remains open). Augmenting the EGARCH model with the fractional
operator appears to provide a natural way to deal with the positivity issue since the process
is always strictly positive (see Bollerslev and Mikkelsen (1996)). This may be included in a
future version.
2.3 Conditional Distributions

The rugarch package supports a number of conditional distributions, each with density (ddist),
distribution (pdist), quantile (qdist) and random number generation (rdist) functions,8 in addition to:
• fitdist(distribution = "norm", x, control = list()). A function for fitting data using any of
the included distributions.
This section provides a dry but comprehensive exposition of the required standardization of
these distributions for use in GARCH modelling.
The conditional distribution in GARCH processes should be self-decomposable, which is a
key requirement for any autoregressive type process, while possessing the linear transformation
property required to center ($x_t - \mu_t$) and scale ($\varepsilon_t/\sigma_t$) the innovations, after which the modelling
is carried out directly using the zero-mean, unit variance distribution of the standardized
variable $z_t$, which is a scaled version of the same conditional distribution of $x_t$, as described in
Equations 6, 7 and 8.
8
These were originally taken from the fBasics package but have been adapted and re-written in C for the
likelihood estimation.
9
Since version 1.0-8.
10
From the gamlss package.
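The standardized (zero mean, unit variance) distributions are directly accessible via the ddist, pdist, qdist and rdist functions; a quick sketch for the standardized Student distribution with an illustrative shape value of 5:

> ddist(distribution = "std", y = 0, mu = 0, sigma = 1, shape = 5)    # density
> qdist(distribution = "std", p = 0.05, mu = 0, sigma = 1, shape = 5) # quantile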
2.3.1 The Normal Distribution
The Normal Distribution is a spherical distribution described completely by its first two moments,
the mean and variance. Formally, the random variable $x$ is said to be normally distributed with
mean $\mu$ and variance $\sigma^2$ (both of which may be time varying), with density given by,
$$f(x) = \frac{e^{\frac{-0.5(x-\mu)^2}{\sigma^2}}}{\sigma\sqrt{2\pi}}. \tag{62}$$
Following a mean filtration or whitening process, the residuals $\varepsilon$, standardized by $\sigma$, yield the
standard normal density given by,
$$\frac{1}{\sigma}f\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{\sigma}f(z) = \frac{1}{\sigma}\frac{e^{-0.5z^2}}{\sqrt{2\pi}}. \tag{63}$$
To obtain the conditional likelihood of the GARCH process at each point in time ($LL_t$), the
conditional standard deviation $\sigma_t$ from the GARCH motion dynamics acts as a scaling factor
on the density, so that:
$$LL_t\left(z_t; \sigma_t\right) = \frac{1}{\sigma_t}f(z_t), \tag{64}$$
which illustrates the importance of the scaling property. Finally, the normal distribution has
zero skewness and zero excess kurtosis.
2.3.2 The Student Distribution

The density of the Student distribution is given by,
$$f(x) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\beta\nu\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{(x-\alpha)^2}{\beta\nu}\right)^{-\frac{\nu+1}{2}}, \tag{65}$$
where $\alpha$, $\beta$ and $\nu$ are the location, scale11 and shape parameters respectively, and $\Gamma$ is the
Gamma function. Similar to the GED distribution described later, this is a unimodal and symmetric
distribution where the location parameter $\alpha$ is the mean (and mode) of the distribution
while the variance is:
$$Var(x) = \frac{\beta\nu}{\nu - 2}. \tag{66}$$
For the purposes of standardization we require that:
$$Var(x) = \frac{\beta\nu}{\nu - 2} = 1, \quad \therefore \beta = \frac{\nu - 2}{\nu}. \tag{67}$$
Substituting $\frac{\nu-2}{\nu}$ into (65) we obtain the standardized Student's distribution:
$$\frac{1}{\sigma}f\left(\frac{x-\mu}{\sigma}\right) = \frac{1}{\sigma}f(z) = \frac{1}{\sigma}\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{(\nu-2)\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1 + \frac{z^2}{\nu - 2}\right)^{-\frac{\nu+1}{2}}. \tag{68}$$
11
In some representations, mostly Bayesian, this is represented in its inverse form to denote the precision.
In terms of R's standard implementation of the Student density ('dt'), and including a scaling
by the standard deviation, this can be represented as:
$$\frac{dt\left(\frac{\varepsilon_t}{\sigma\sqrt{(\nu-2)/\nu}}, \nu\right)}{\sigma\sqrt{(\nu-2)/\nu}}. \tag{69}$$
The Student distribution has zero skewness and excess kurtosis equal to $6/(\nu - 4)$ for $\nu > 4$.
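Equation (69) is easily verified against the package's ddist function; a numerical sketch with illustrative values (both lines should return the same density):

> nu = 5; z = 0.5; s = sqrt((nu - 2)/nu)
> dt(z/s, df = nu)/s  # equation (69) with sigma = 1
> ddist(distribution = "std", y = z, mu = 0, sigma = 1, shape = nu)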
2.3.3 The Generalized Error Distribution

The Generalized Error Distribution (GED) is a 3 parameter distribution belonging to the exponential
family, with conditional density given by,
$$f(x) = \frac{\kappa e^{-0.5\left|\frac{x-\alpha}{\beta}\right|^\kappa}}{2^{1+\kappa^{-1}}\beta\Gamma\left(\kappa^{-1}\right)}, \tag{70}$$
with $\alpha$, $\beta$ and $\kappa$ representing the location, scale and shape parameters. Since the distribution
is symmetric and unimodal, the location parameter is also the mode, median and mean of the
distribution (i.e. $\mu$). By symmetry, all odd moments beyond the mean are zero. The variance
and kurtosis are given by,
$$Var(x) = \beta^2 2^{2/\kappa}\frac{\Gamma\left(3\kappa^{-1}\right)}{\Gamma\left(\kappa^{-1}\right)}, \qquad Ku(x) = \frac{\Gamma\left(5\kappa^{-1}\right)\Gamma\left(\kappa^{-1}\right)}{\Gamma\left(3\kappa^{-1}\right)\Gamma\left(3\kappa^{-1}\right)}. \tag{71}$$
As $\kappa$ increases the density gets flatter and flatter, and in the limit as $\kappa \to \infty$ the distribution
tends towards the uniform. Special cases are the Normal when $\kappa = 2$ and the Laplace when
$\kappa = 1$. Standardization is simple and involves rescaling the density to have unit standard
deviation:
$$Var(x) = \beta^2 2^{2/\kappa}\frac{\Gamma\left(3\kappa^{-1}\right)}{\Gamma\left(\kappa^{-1}\right)} = 1, \quad \therefore \beta = \sqrt{2^{-2/\kappa}\frac{\Gamma\left(\kappa^{-1}\right)}{\Gamma\left(3\kappa^{-1}\right)}}. \tag{72}$$
2.3.4 Skewed Distributions by Inverse Scale Factors

Fernández and Steel (1998) proposed introducing skewness into unimodal and symmetric distributions
by using inverse scale factors in the positive and negative real half lines, giving the
skewed density,
$$f(z|\xi) = \frac{2}{\xi + \xi^{-1}}\left[f(\xi z)H(-z) + f\left(\frac{z}{\xi}\right)H(z)\right],$$
where $\xi \in \mathbb{R}^+$ and $H(\cdot)$ is the Heaviside function. The absolute moments, required for deriving
the central moments, are generated from the following function:
$$M_r = 2\int_0^{\infty}z^r f(z)\,dz. \tag{75}$$
The Normal, Student and GED distributions have skew variants which have been standardized
to zero mean, unit variance by making use of the moment conditions given above.
2.3.5 The Generalized Hyperbolic Distribution and Sub-Families

The standardization of the Generalized Hyperbolic (GH) distribution for use in GARCH processes
is derived below.13

Proof 1 The Standardized Generalized Hyperbolic Distribution. Let $\varepsilon_t$ be a r.v. with mean (0)
and variance ($\sigma^2$) distributed as $GHYP(\zeta, \rho)$, and let $z$ be a scaled version of the r.v. $\varepsilon$ with
variance (1) and also distributed as $GHYP(\zeta, \rho)$.14 The density $f(\cdot)$ of $z$ can be expressed as
$$f\left(\frac{\varepsilon_t}{\sigma}; \zeta, \rho\right) = \frac{1}{\sigma}f_t(z; \zeta, \rho) = \frac{1}{\sigma}f_t(z; \tilde\alpha, \tilde\beta, \tilde\delta, \tilde\mu), \tag{77}$$
where we make use of the $(\alpha, \beta, \delta, \mu)$ parametrization since we can only naturally express the
density in that parametrization. The steps to transforming from the $(\zeta, \rho)$ to the $(\alpha, \beta, \delta, \mu)$
parametrization, while at the same time standardizing for zero mean and unit variance, are given
henceforth.
Let
$$\zeta = \delta\sqrt{\alpha^2 - \beta^2}, \qquad \rho = \frac{\beta}{\alpha}, \tag{78, 79}$$
13
Credit is due to Diethelm Wurtz for his original implementation in the fBasics package of the transformation
and standardization function.
14
The parameters ζ and ρ do not change as a result of being location and scale invariant
which after some substitution may be also written in terms of $\alpha$ and $\beta$ as,
$$\alpha = \frac{\zeta}{\delta\sqrt{1-\rho^2}}, \qquad \beta = \alpha\rho. \tag{80, 81}$$
Noting that,
$$\frac{\beta^2}{\alpha^2 - \beta^2} = \frac{\alpha^2\rho^2}{\alpha^2 - \alpha^2\rho^2} = \frac{\alpha^2\rho^2}{\alpha^2\left(1-\rho^2\right)} = \frac{\rho^2}{1-\rho^2}, \tag{84}$$
then we can re-write the formula for $\delta$ in terms of the estimated parameters $\hat\zeta$ and $\hat\rho$ as,
$$\delta = \left[\frac{K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_\lambda(\hat\zeta)} + \frac{\hat\rho^2}{1-\hat\rho^2}\left(\frac{K_{\lambda+2}(\hat\zeta)}{K_\lambda(\hat\zeta)} - \frac{\left(K_{\lambda+1}(\hat\zeta)\right)^2}{\left(K_\lambda(\hat\zeta)\right)^2}\right)\right]^{-0.5}. \tag{85}$$
Transforming into the $(\tilde\alpha, \tilde\beta, \tilde\delta, \tilde\mu)$ parametrization proceeds by first substituting (85) into (80)
and simplifying,
$$\begin{aligned}
\tilde\alpha &= \frac{\hat\zeta\left[\frac{K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_\lambda(\hat\zeta)} + \frac{\hat\rho^2\left(\frac{K_{\lambda+2}(\hat\zeta)}{K_\lambda(\hat\zeta)} - \frac{(K_{\lambda+1}(\hat\zeta))^2}{(K_\lambda(\hat\zeta))^2}\right)}{1-\hat\rho^2}\right]^{0.5}}{\sqrt{1-\hat\rho^2}},\\
&= \left[\frac{\frac{\hat\zeta K_{\lambda+1}(\hat\zeta)}{K_\lambda(\hat\zeta)}}{1-\hat\rho^2} + \frac{\hat\zeta^2\hat\rho^2\left(\frac{K_{\lambda+2}(\hat\zeta)}{K_\lambda(\hat\zeta)} - \frac{(K_{\lambda+1}(\hat\zeta))^2}{(K_\lambda(\hat\zeta))^2}\right)}{\left(1-\hat\rho^2\right)^2}\right]^{0.5},\\
&= \left[\frac{\hat\zeta K_{\lambda+1}(\hat\zeta)}{K_\lambda(\hat\zeta)\left(1-\hat\rho^2\right)}\left(1 + \frac{\hat\zeta\hat\rho^2\left(\frac{K_{\lambda+2}(\hat\zeta)}{K_{\lambda+1}(\hat\zeta)} - \frac{K_{\lambda+1}(\hat\zeta)}{K_\lambda(\hat\zeta)}\right)}{1-\hat\rho^2}\right)\right]^{0.5}.
\end{aligned}\tag{86}$$
Finally, the rest of the parameters are derived recursively from $\tilde\alpha$ and the previous results,
$$\tilde\beta = \tilde\alpha\hat\rho, \tag{87}$$
$$\tilde\delta = \frac{\hat\zeta}{\tilde\alpha\sqrt{1-\hat\rho^2}}, \tag{88}$$
$$\tilde\mu = \frac{-\tilde\beta\tilde\delta^2 K_{\lambda+1}(\hat\zeta)}{\hat\zeta K_\lambda(\hat\zeta)}. \tag{89}$$
For the use of the $(\xi, \chi)$ parametrization in estimation, the additional preliminary steps of
converting to the $(\zeta, \rho)$ are,
$$\zeta = \frac{1}{\hat\xi^2} - 1, \tag{90}$$
$$\rho = \frac{\hat\chi}{\hat\xi}. \tag{91}$$
Particular care should be exercised when choosing the GH distribution in GARCH models, since
allowing the GIG $\lambda$ parameter to vary is quite troublesome in practice and may lead to identification
problems, since different combinations of the 2 shape ($\lambda$, $\zeta$) and 1 skew ($\rho$) parameters
may lead to the same or very close likelihood. In addition, large sections of the likelihood surface
for some combinations of the distribution parameters are quite flat. Figure 1 shows the skewness,
kurtosis and 2 quantile surfaces for different combinations of the ($\rho$, $\zeta$) parameters for two
popular choices of $\lambda$.

Figure 1: GH distribution skewness, kurtosis and quantile surfaces: (a) λ = 1 (HYP), (b) λ = −0.5 (NIG)
2.3.6 The Generalized Hyperbolic Skew Student Distribution

To standardize the GH Skew-Student distribution to have zero mean and unit variance, I make
use of the first two moment conditions for the distribution, which are:
$$\begin{aligned}
E(x) &= \mu + \frac{\beta\delta^2}{\nu - 2},\\
Var(x) &= \frac{2\beta^2\delta^4}{(\nu-2)^2(\nu-4)} + \frac{\delta^2}{\nu - 2},
\end{aligned}\tag{93}$$
where I have made use of the 4th parametrization of the GH distribution given in Prause (1999)
where $\hat\beta = \beta\delta$. The location parameter is then rescaled by substituting into the first moment
formula $\delta$ so that it has zero mean:
$$\bar\mu = -\frac{\beta\delta^2}{\nu - 2}. \tag{95}$$
Therefore, we model the GH Skew-Student using the location-scale invariant parametrization
$(\bar\beta, \nu)$ and then translate the parameters into the usual GH distribution's $(\alpha, \beta, \delta, \mu)$, setting
$\alpha = |\beta| + 1e{-}12$. As of version 1.2-8, the quantile function (via qdist) is calculated using
the SkewHyperbolic package of Scott and Grimson using the spline method (for speed), as is the
distribution function (via pdist).
3 Fitting
Once a uGARCHspec has been defined, the ugarchfit method takes the following arguments:
> args(ugarchfit)
function (spec, data, out.sample = 0, solver = "solnp", solver.control = list(),
fit.control = list(stationarity = 1, fixed.se = 0, scale = 0,
rec.init = "all"), ...)
The out.sample option controls how many data points from the end to keep for out of sample
forecasting, while solver.control and fit.control provide additional options to the fitting routine.
Importantly, the stationarity option controls whether to impose a stationarity constraint
during estimation, which is usually closely tied to the persistence of the process. The fixed.se
option controls whether, for those values which are fixed, numerical standard errors should be
calculated. The scale option controls whether the data should be scaled prior to estimation by its
standard deviation (scaling sometimes facilitates the estimation process). The option rec.init,
introduced in version 1.0-14, allows the user to set the type of method for the conditional recursion
initialization, with default value 'all' indicating that all the data is used to calculate the mean of
the squared residuals from the conditional mean filtration. To use the first 'n' points for the
calculation, a positive integer greater than or equal to one (and less than the total estimation
datapoints) can instead be provided. If instead a positive numeric value less than 1 is provided,
this is taken as the weighting in an exponential smoothing backcast method for calculating the
initial recursion value.
Currently, 5 solvers15 are supported, with the main one being the augmented Lagrange solver
solnp of Ye (1997) implemented in R by Ghalanos and Theussl (2011). The main functionality,
namely the GARCH dynamics and conditional likelihood calculations are done in C for speed.
For reference, there is a benchmark routine called ugarchbench which provides a comparison of
rugarch against 2 published GARCH models with analytic standard errors, and a small scale
comparison with a commercial GARCH implementation. The fitted object is of class uGARCHfit
which can be passed to a variety of other methods such as show (summary), plot, ugarchsim,
ugarchforecast etc. The following example illustrates its use, but the interested reader should
consult the documentation on the methods available for the returned class.
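The output shown below was generated along the following lines, using the default specification (a sketch; the exact call options originally used may have differed):

> data(sp500ret)
> spec = ugarchspec()
> fit = ugarchfit(spec = spec, data = sp500ret)
> show(fit)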
*---------------------------------*
* GARCH Model Fit *
*---------------------------------*
Optimal Parameters
------------------------------------
Estimate Std. Error t value Pr(>|t|)
mu 0.000522 0.000087 5.9873 0.00000
ar1 0.870609 0.071909 12.1070 0.00000
15 Since version 1.0-8 the 'nlopt' solver of Johnson (interfaced to R by Jelmer Ypma in the 'nloptr' package)
has been added, greatly expanding the range of possibilities available via its numerous subsolver options - see
documentation.
ma1 -0.897805 0.064324 -13.9576 0.00000
omega 0.000001 0.000001 1.3912 0.16418
alpha1 0.087714 0.013705 6.4001 0.00000
beta1 0.904955 0.013750 65.8136 0.00000
LogLikelihood : 17902.41
Information Criteria
------------------------------------
Akaike -6.4807
Bayes -6.4735
Shibata -6.4807
Hannan-Quinn -6.4782
Nyblom stability test
------------------------------------
Joint Statistic: 174.6662
Individual Statistics:
mu 0.2090
ar1 0.1488
ma1 0.1057
omega 21.3780
alpha1 0.1345
beta1 0.1130
where,
$$A = L''(\hat\theta), \qquad B = \sum_{i=1}^{n}g_i\left(x_i|\hat\theta\right)g_i\left(x_i|\hat\theta\right)^T, \tag{97}$$
which are the Hessian and the covariance of the scores at the optimum, with the robust covariance
given by the sandwich estimator $V = A^{-1}BA^{-1}$. The robust standard errors are the square
roots of the diagonal of $V$.
The infocriteria method on a fitted or filtered object returns the Akaike (AIC), Bayesian
(BIC), Hannan-Quinn (HQIC) and Shibata (SIC) information criteria to enable model selection
by penalizing overfitting at different rates. Formally, they may be defined as:
$$\begin{aligned}
AIC &= \frac{-2LL}{N} + \frac{2m}{N},\\
BIC &= \frac{-2LL}{N} + \frac{m\log_e(N)}{N},\\
HQIC &= \frac{-2LL}{N} + \frac{2m\log_e\left(\log_e(N)\right)}{N},\\
SIC &= \frac{-2LL}{N} + \log_e\left(\frac{N + 2m}{N}\right),
\end{aligned}\tag{98}$$
where any parameters fixed during estimation are excluded from the calculation. Since version
1.3-1, the Q-statistics and ARCH-LM test have been replaced with the Weighted Ljung-Box and
ARCH-LM statistics of Fisher and Gallagher (2012), which better account for the distribution
of the statistics given the values from the estimated models. The ARCH-LM test is now a weighted
portmanteau test for testing the null hypothesis of an adequately fitted ARCH process, whilst the
Ljung-Box is another portmanteau test with null the adequacy of the ARMA fit. The signbias
method calculates the Sign Bias Test of Engle and Ng (1993), and is also displayed in the summary.
This tests for the presence of leverage effects in the standardized residuals (to capture possible
misspecification of the GARCH model) by regressing the squared standardized residuals on
lagged negative and positive shocks as follows:
$$\hat z_t^2 = c_0 + c_1 I_{\hat\varepsilon_{t-1}<0} + c_2 I_{\hat\varepsilon_{t-1}<0}\hat\varepsilon_{t-1} + c_3 I_{\hat\varepsilon_{t-1}>0}\hat\varepsilon_{t-1} + u_t, \tag{99}$$
where $I$ is the indicator function and $\hat\varepsilon_t$ the estimated residuals from the GARCH process. The
Null Hypotheses are $H_0: c_i = 0$ (for $i = 1, 2, 3$), and jointly $H_0: c_1 = c_2 = c_3 = 0$. As
can be inferred from the summary of the previous fit, there is significant Negative and Positive
reaction to shocks. Using instead a model such as the apARCH would likely alleviate these
effects.
The gof calculates the chi-squared goodness of fit test, which compares the empirical distribution
of the standardized residuals with the theoretical ones from the chosen density. The
implementation is based on the test of Palm (1996), which adjusts the tests in the presence of
non-i.i.d. observations by reclassifying the standardized residuals not according to their value (as
in the standard test), but instead on their magnitude, calculating the probability of observing
a value smaller than the standardized residual, which should be identically standard uniform
distributed. The function takes 2 arguments, the fitted object as well as the number of
bins to classify the values. In the summary to the fit, a choice of (20, 30, 40, 50) bins is used,
and from the summary of the previous example it is clear that the Normal distribution does not
adequately capture the empirical distribution based on this test.
The nyblom test calculates the parameter stability test of Nyblom (1989), as well as the
joint test. Critical values against which to compare the results are displayed, but these are not
available for the joint test in the case of more than 20 parameters.
Finally, some informative plots can be drawn either interactively (which = 'ask'), individually
(which = 1:12), else all at once (which = 'all') as in Figure 2.
Figure 2: uGARCHfit Plots
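The diagnostics discussed in this section are also available as stand-alone methods on the fitted object from the previous example; a brief sketch:

> signbias(fit)               # Sign Bias Test of Engle and Ng (1993)
> gof(fit, c(20, 30, 40, 50)) # adjusted goodness-of-fit at different bin sizes
> nyblom(fit)                 # parameter stability test of Nyblom (1989)
> infocriteria(fit)           # information criteria of equation (98)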
4 Filtering
Sometimes it is desirable to simply filter a set of data with a predefined set of parameters. This
may for example be the case when new data has arrived and one might not wish to re-fit. The
ugarchfilter method does exactly that, taking a uGARCHspec object with fixed parameters.
Setting fixed or starting parameters on the GARCH spec object may be done either through
the ugarchspec function when it is called via the fixed.pars arguments to the function, else by
using the setfixed<- method on the spec object. The example which follows explains how:
> data(sp500ret)
> spec = ugarchspec(variance.model = list(model = "apARCH"), distribution.model = "std")
> setfixed(spec) <- list(mu = 0.01, ma1 = 0.2, ar1 = 0.5, omega = 1e-05,
+ alpha1 = 0.03, beta1 = 0.9, gamma1 = 0.01, delta = 1, shape = 5)
> filt = ugarchfilter(spec = spec, data = sp500ret)
> show(filt)
*------------------------------------*
* GARCH Model Filter *
*------------------------------------*
Filter Parameters
---------------------------------------
mu 1e-02
ar1 5e-01
ma1 2e-01
omega 1e-05
alpha1 3e-02
beta1 9e-01
gamma1 1e-02
delta 1e+00
shape 5e+00
LogLikelihood : 5627.392
Information Criteria
---------------------------------------
Akaike -2.0378
Bayes -2.0378
Shibata -2.0378
Hannan-Quinn -2.0378
Weighted Ljung-Box Test on Standardized Residuals
---------------------------------------
                        statistic p-value
Lag[1]                       1178       0
Lag[2*(p+q)+(p+q)-1][5]      1212       0
Lag[4*(p+q)+(p+q)-1][9]      1217       0
d.o.f=2
H0 : No serial correlation
The returned object is of class uGARCHfilter and shares many of the methods with the uGARCHfit
class. Additional arguments to the function are explained in the documentation. Note that the
information criteria shown here are based on zero estimated parameters (they are all fixed), and
the same applies to the infocriteria method on a uGARCHfilter object.
5 Forecasting and the GARCH Bootstrap

Forecasting may be carried out either directly on a fitted object, else on a specification with
fixed parameters (in both cases returning a rather complicated object). In the latter case, it is
also possible to make use of the GARCH
bootstrap, described in Pascual et al. (2006) and implemented in the function ugarchboot,
with the added innovation of an optional extra step of fitting either a kernel or semi-parametric
density (SPD) to the standardized residuals prior to sampling in order to provide for (possibly)
more robustness in the presence of limited data. To understand what the GARCH bootstrap
does, consider that there are two main sources of uncertainty about n.ahead forecasting from
GARCH models: that arising from the form of the predictive density and that due to parameter
uncertainty. The bootstrap method in the rugarch package is based on resampling standardized
residuals from the empirical distribution of the fitted model to generate future realizations of the
series and sigma. Two methods are implemented: one takes into account parameter uncertainty
by building a simulated distribution of the parameters through simulation and refitting, and
one which only considers distributional uncertainty and hence avoids the expensive and lengthy
parameter distribution estimation. In the latter case, prediction intervals for the 1-ahead sigma
forecast will not be available since only the parameter uncertainty is relevant in GARCH type
models in this case. The following example provides for a brief look at the partial method, but
the interested reader should consult the more comprehensive examples in the inst folder of the
package.
> data(sp500ret)
> spec = ugarchspec(variance.model=list(model="csGARCH"), distribution="std")
> fit = ugarchfit(spec, sp500ret)
> bootp = ugarchboot(fit, method = c("Partial", "Full")[1],
+ n.ahead = 500, n.bootpred = 500)
> show(bootp)
*-----------------------------------*
* GARCH Bootstrap Forecast *
*-----------------------------------*
Model : csGARCH
n.ahead : 500
Bootstrap method: partial
Date (T[0]): 2009-01-30
Series (summary):
min q.25 mean q.75 max forecast[analytic]
t+1 -0.10855 -0.013343 0.000668 0.016285 0.090454 0.001944
t+2 -0.11365 -0.010796 0.001632 0.015721 0.085783 0.001707
t+3 -0.28139 -0.013203 -0.000378 0.015496 0.082250 0.001512
t+4 -0.10459 -0.014830 0.000346 0.015602 0.109223 0.001352
t+5 -0.21915 -0.012494 0.001196 0.016627 0.098003 0.001220
t+6 -0.11029 -0.012119 0.001008 0.015000 0.083469 0.001112
t+7 -0.22818 -0.013280 0.000398 0.015250 0.094184 0.001023
t+8 -0.25722 -0.014854 -0.001401 0.016074 0.088067 0.000949
t+9 -0.34629 -0.017681 -0.004484 0.012847 0.154058 0.000889
t+10 -0.11328 -0.013566 0.000957 0.018291 0.140734 0.000840
.....................
Sigma (summary):
min q0.25 mean q0.75 max forecast[analytic]
t+1 0.026387 0.026387 0.026387 0.026387 0.026387 0.026387
t+2 0.025518 0.025564 0.026345 0.026493 0.038492 0.026577
Figure 3: GARCH Bootstrap Forecast Plots
1. Extract the standardized residuals from the estimated object, and if using the spd or
kernel methods, fit the chosen density to them.
2. Sample n.bootfit sets of size N (original dataset less any out of sample periods) either from
the raw standardized residuals, or using the spd or kernel based methods.
3. Simulate n.bootfit paths of size N, using as innovations the sampled standardized residuals.
The simulation is initiated with the last values from the dataset at point N (T0 in simulation
time).
4. The n.bootfit simulated series are then estimated with the same specification used by
the originally supplied object in order to generate a set of coefficients representing the
parameter uncertainty.
5. Filter the original dataset with the n.bootfit set of estimated coefficients.
6. Use the last values of the filtered conditional sigma (and, if it is the csGARCH model,
then also the permanent component q) and residuals from the previous step to initialize a
new simulation with horizon n.ahead and m.sim=n.bootpred, using again the standardized
residuals sampled as in step 2 and the new set of estimated coefficients. The simulation
now contains uncertainty about the conditional n-ahead density as well as parameter uncertainty.
6 Simulation
Simulation may be carried out either directly on a fitted object (ugarchsim) else on a GARCH
spec with fixed parameters (ugarchpath). The ugarchsim method takes the following argu-
ments:
> args(ugarchsim)
function (fit, n.sim = 1000, n.start = 0, m.sim = 1, startMethod = c("unconditional",
"sample"), presigma = NA, prereturns = NA, preresiduals = NA,
rseed = NA, custom.dist = list(name = NA, distfit = NA),
mexsimdata = NULL, vexsimdata = NULL, ...)
where n.sim indicates the length of the simulation while m.sim the number of independent
simulations. For reasons of speed, when n.sim is large relative to m.sim, the simulation code
is executed in C, while for large m.sim a special purpose C++ code (using Rcpp and RcppArmadillo)
is used, which was found to lead to significant speed increases. Key to replicating
results is the rseed argument, which is used to pass a user seed to initialize the random number
generator, else one will be assigned by the program. In any case, the returned object, of class
uGARCHsim (or uGARCHpath), contains a slot with the seed(s) used.
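As a brief sketch, the following simulates 25 independent paths of length 1000 from a previously fitted object, with fixed seeds for replication, and extracts the matrices of simulated values via the sigma and fitted extractors:

> sim = ugarchsim(fit, n.sim = 1000, m.sim = 25, rseed = 1:25)
> S = sigma(sim)   # 1000 x 25 matrix of simulated conditional sigma
> X = fitted(sim)  # 1000 x 25 matrix of simulated series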
7 Rolling Estimation
The ugarchroll method allows the user to perform a rolling estimation and forecasting of a model/dataset
combination, optionally returning the VaR at specified levels. More importantly, it returns the
distributional forecast parameters necessary to calculate any required measure on the forecasted
density. The following example illustrates the use of the method, where use is also made of the
parallel functionality, run on 10 cores.16 Figure 4 is generated by calling the plot function
on the returned uGARCHroll object. Additional methods, and more importantly extractor functions,
can be found in the documentation. Note that only n.ahead=1 is allowed at present (more
complicated rolling forecasts can be created by the user with the ugarchfit and ugarchforecast
functions). Finally, there is a new method called resume which allows resumption of estimation
of an object which had non-converged windows, optionally supplying a different solver and solver
control combination.
> data(sp500ret)
> library(parallel)
> cl = makePSOCKcluster(10)
> spec = ugarchspec(variance.model = list(model = "eGARCH"), distribution.model = "jsu")
> roll = ugarchroll(spec, sp500ret, n.start = 1000, refit.every = 100,
refit.window = "moving", solver = "hybrid", calculate.VaR = TRUE,
VaR.alpha = c(0.01, 0.05), cluster = cl, keep.coef = TRUE)
> show(roll)
> stopCluster(cl)
16 Since version 1.0-14 the parallel functionality is based on the parallel package and it is up to the user to initialize
a cluster object and pass it to the function, and then terminate it once it is no longer required. Eventually, this
approach to parallel usage will filter through to all the functions in rugarch and rmgarch.
34
*-------------------------------------*
* GARCH Roll *
*-------------------------------------*
No.Refits : 46
Refit Horizon : 100
No.Forecasts : 4523
GARCH Model : eGARCH(1,1)
Distribution : jsu
Forecast Density:
Mu Sigma Skew Shape Shape(GIG) Realized
1991-02-21 4e-04 0.0102 -0.2586 1.5065 0 -0.0005
1991-02-22 2e-04 0.0099 -0.2586 1.5065 0 0.0019
1991-02-25 4e-04 0.0095 -0.2586 1.5065 0 0.0044
1991-02-26 3e-04 0.0093 -0.2586 1.5065 0 -0.0122
1991-02-27 1e-04 0.0101 -0.2586 1.5065 0 0.0135
1991-02-28 7e-04 0.0099 -0.2586 1.5065 0 -0.0018
..........................
Mu Sigma Skew Shape Shape(GIG) Realized
2009-01-23 0.0015 0.0259 -0.87 2.133 0 0.0054
2009-01-26 0.0005 0.0243 -0.87 2.133 0 0.0055
2009-01-27 -0.0002 0.0228 -0.87 2.133 0 0.0109
2009-01-28 -0.0011 0.0212 -0.87 2.133 0 0.0330
2009-01-29 -0.0039 0.0191 -0.87 2.133 0 -0.0337
2009-01-30 0.0009 0.0220 -0.87 2.133 0 -0.0231
Figure 4: eGARCH Rolling Forecast Plots
LR.cc Statistic: 0
LR.cc Critical: 5.991
LR.cc p-value: 1
Reject Null: NO
8 Simulated Parameter Distribution and RMSE

It is sometimes instructive to be able to investigate the underlying density of the estimated
parameters under different models. The ugarchdistribution method performs a Monte Carlo
experiment by simulating and fitting a model multiple times and for different 'window' sizes.
This allows one to obtain some insight on the consistency of the parameter estimates as the data
window increases, by looking at the rate of decrease of the Root Mean Squared Error and whether
we have $\sqrt{N}$ consistency. This is a computationally expensive exercise and as such should only
be undertaken in the presence of ample computing power and RAM. As in other functions,
parallel functionality is enabled if available. The example which follows illustrates an instance
of this test on one model and one set of parameters. Figures 5 and 6 complete this example.
> spec = ugarchspec(variance.model = list(model = "gjrGARCH"), distribution.model = "ged")
> setfixed(spec) <- list(mu = 0.001, ar1 = 0.4, ma1 = -0.1, omega = 1e-06,
+ alpha1 = 0.05, beta1 = 0.9, gamma1 = 0.05, shape = 1.5)
> persistence(spec)
persistence
      0.975
> library(parallel)
> cl = makePSOCKcluster(10)
> dist = ugarchdistribution(fitORspec = spec, n.sim = 2000, n.start = 1,
+ m.sim = 100, recursive = TRUE, recursive.length = 6000, recursive.window = 1000,
+ rseed = 1066, solver = "solnp", solver.control = list(trace = 0),
+ cluster = cl)
> stopCluster(cl)
> show(dist)
*------------------------------------*
* GARCH Parameter Distribution *
*------------------------------------*
Model : gjrGARCH
No. Paths (m.sim) : 100
Length of Paths (n.sim) : 2000
Recursive : TRUE
Recursive Length : 6000
Recursive Window : 1000
Figure 5: Simulated Parameter Density
Figure 6: RMSE Rate of Change
(Figure: (a) Bivariate Parameter Plots, (b) GARCH Stat Plots)

10 Misspecification and Other Tests

The GMM orthogonality test of Hansen (1982), implemented as GMMTest, checks the following
moment conditions on the standardized residuals:
$$\begin{aligned}
&M_1: && E\left[z_t\right] = 0\\
&M_2: && E\left[z_t^2 - 1\right] = 0\\
&M_3: && E\left[z_t^3\right] = 0\\
&M_4: && E\left[z_t^4 - 3\right] = 0\\
&Q_2: && E\left[\left(z_t^2 - 1\right)\left(z_{t-j}^2 - 1\right)\right] = 0\\
&Q_3: && E\left[z_t^3 z_{t-j}^3\right] = 0\\
&Q_4: && E\left[\left(z_t^4 - 3\right)\left(z_{t-j}^4 - 3\right)\right] = 0
\end{aligned}\tag{100}$$
Several of the remaining tests are based on the probability integral transformation (PIT) of the
data,
$$x_t = \int_{-\infty}^{y_t}\hat f(u)\,du = \hat F(y_t), \tag{101}$$
which transforms the data yt , using the estimated distribution F̂ into i.i.d. U (0, 1) under the
correctly specified model. Based on this transformation, Tay et al. (1998) provide for a visual
assessment test, while Berkowitz (2001) provides a more formal test, implemented in the package
under the name BerkowitzTest. Because of the difficulty in testing a U (0, 1) sequence, the PIT
data is transformed into N (0, 1) by Berkowitz using the normal quantile function, and tested
using a Lagrange Multiplier (LM ) test for any residual autocorrelation given a specified number
of lags. In addition, a tail test based on the censored Normal is also provided, under the Null
that the standardized tail data has mean zero and unit variance. More recently, Hong and
Li (2005) introduced a nonparametric portmanteau test, building on the work of Ait-Sahalia
(1996), which tests the joint hypothesis of i.i.d AND U (0, 1) for the sequence xt . As noted by
the authors, testing xt using a standard goodness-of-fit test (such as the Kolmogorov-Smirnov)
would only check the U (0, 1) assumption under i.i.d. and not the joint assumption of U (0, 1) and
i.i.d. Their approach is to compare a kernel estimator ĝj (x1 , x2 ) for the joint density gj (x1 , x2 )
of the pair {xt , xt−j } (where j is the lag order) with unity, the product of two U (0, 1) densities.
Given a sample size $n$ and lag order $j > 0$, their joint density estimator is:
$$\hat g_j(x_1, x_2) \equiv (n-j)^{-1}\sum_{t=j+1}^{n}K_h\left(x_1, \hat X_t\right)K_h\left(x_2, \hat X_{t-j}\right), \tag{102}$$
where $\hat X_t = X_t(\hat\theta)$, and $\hat\theta$ is a $\sqrt{n}$ consistent estimator of $\theta_0$. The function $K_h$ is a boundary
modified kernel defined as:
$$K_h(x, y) \equiv \begin{cases}
h^{-1}k\left(\frac{x-y}{h}\right)\Big/\int_{-(x/h)}^{1}k(u)\,du, & \text{if } x\in[0, h),\\[1mm]
h^{-1}k\left(\frac{x-y}{h}\right), & \text{if } x\in[h, 1-h],\\[1mm]
h^{-1}k\left(\frac{x-y}{h}\right)\Big/\int_{-1}^{(1-x)/h}k(u)\,du, & \text{if } x\in(1-h, 1],
\end{cases} \tag{103}$$
where $h \equiv h(n)$ is a bandwidth such that $h \to 0$ as $n \to \infty$, and the kernel $k(\cdot)$ is a pre-specified
symmetric probability density, implemented as suggested by the authors using a quartic
kernel,
$$k(u) = \frac{15}{16}\left(1 - u^2\right)^2\mathbf{1}\left(|u| \le 1\right), \tag{104}$$
where $\mathbf{1}(\cdot)$ is the indicator function. Their portmanteau test statistic is defined as:
$$\hat W(p) \equiv p^{-1/2}\sum_{j=1}^{p}\hat Q(j), \tag{105}$$
where
$$\hat Q(j) \equiv \left[(n-j)h\hat M(j) - A_h^0\right]\Big/V_0^{1/2}, \tag{106}$$
and
$$\hat M(j) \equiv \int_0^1\!\!\int_0^1\left[\hat g_j(x_1, x_2) - 1\right]^2 dx_1\,dx_2. \tag{107}$$
The centering and scaling factors $A_h^0$ and $V_0$ are defined as:
$$\begin{aligned}
A_h^0 &\equiv \left[\left(h^{-1} - 2\right)\int_{-1}^{1}k^2(u)\,du + 2\int_0^1\!\!\int_{-1}^{b}k_b^2(u)\,du\,db\right]^2 - 1,\\
V_0 &\equiv 2\left[\int_{-1}^{1}\left[\int_{-1}^{1}k(u+v)k(v)\,dv\right]^2 du\right]^2,
\end{aligned}\tag{108}$$
where,
$$k_b(\cdot) \equiv k(\cdot)\Big/\int_{-1}^{b}k(v)\,dv. \tag{109}$$
Under the correct model specification, the authors show that Ŵ (p) → N (0, 1) in distribution.
Because negative values of the test statistic only occur under the Null Hypothesis of a correctly
specified model, the authors indicate that only upper tail critical values need be considered.
The test is quite robust to model misspecification as parameter uncertainty has no impact on
the asymptotic distribution of the test statistic as long as the parameters are $\sqrt{n}$ consistent.
Finally, in order to explore possible causes of misspecification when the statistic rejects a model,
the authors develop the following test statistic:
$$M(m, l) \equiv \left[\sum_{j=1}^{n-1}w^2(j/p)(n-j)\hat\rho_{ml}^2(j) - \sum_{j=1}^{n-1}w^2(j/p)\right]\Bigg/\left[2\sum_{j=1}^{n-2}w^4(j/p)\right]^{1/2}, \tag{110}$$
where $\hat\rho_{ml}(j)$ is the sample cross-correlation between $\hat X_t^m$ and $\hat X_{t-|j|}^l$, and $w(\cdot)$ is a weighting
function of lag order $j$, implemented as suggested by the authors using the Bartlett kernel.
As in the $\hat W(p)$ statistic, the asymptotic distribution of $M(m, l)$ is $N(0, 1)$ and upper critical
values should be considered. As an experiment, Table 1 considers the cost of fitting a GARCH-Normal
model when the true model is GARCH-Student, using the HLTest on data simulated
with the ugarchsim function. The results are clear: at low levels of the shape parameter
$\nu$, representing very high excess kurtosis, the model is overwhelmingly rejected by the test,
and as that parameter increases to the point where the Student approximates the Normal, the
rejections begin to reverse. Also of interest, but not surprising, the strength of the rejection is
somewhat weaker for smaller datasets ($N = 500, 1000$). For example, in the case of using only
500 data points and a shape parameter of 4.1 (representing an excess kurtosis of 60!), 5% of the
time, in this simulation, the test failed to reject the GARCH-Normal.
Table 1: Misspecification cost of the GARCH-Normal (Hong-Li test).

          ν[4.1] ν[4.5]  ν[5] ν[5.5]  ν[6] ν[6.5]  ν[7] ν[7.5]  ν[8] ν[10] ν[15] ν[20] ν[25] ν[30]
N500
stat       10.10   6.70  5.08   4.07  2.64   2.22  1.47   1.46  1.05  0.19 -0.34 -0.36 -0.54 -0.71
%reject       95     89    82     76    59     54    42     41    34    22    12    13     6     8
N1000
stat       18.54  13.46  9.46   7.64  6.16   5.14  4.17   2.95  3.03  1.31  0.28 -0.15 -0.48 -0.47
%reject      100    100    98     97    90     86    79     64    69    39    24    11     7    12
N2000
stat       32.99  26.46 19.41  15.53 12.41  10.35  7.76   6.79  5.79  3.20  0.87  0.09  0.03 -0.21
%reject      100    100   100    100   100     99    95     94    92    71    32    22    22    16
N3000
stat       47.87  37.03 27.38  21.67 17.85  14.22 11.46   9.73  7.99  5.12  1.60  0.35  0.10 -0.09
%reject      100    100   100    100   100    100   100     99    96    85    46    27    22    15

Note: The table presents the average test statistic of Hong and Li (2005) and the number of rejections at the
95% confidence level when fitting a GARCH(1,1)-Normal model to data generated from a GARCH(1,1)-Student
model, for different values of the shape parameter ν and sample size N. For each sample of size N, 250 simulated
series were created from a GARCH-Student model with parameters (µ, ω, α, β) = (5.2334e-04, 4.3655e-06,
5.898e-02, 9.2348e-01) and ν in the range [4.1, 30], and fitted using a GARCH(1,1)-Normal model. The
standardized residuals of the fitted model were then transformed via the normal distribution function into
U(0,1) series and evaluated using the test of Hong and Li (2005).
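A rough sketch of a single replication of this experiment follows, with parameter values taken from the table note and illustrative choices for the shape (5) and seed (10):

> spec.t = ugarchspec(mean.model = list(armaOrder = c(0, 0)),
+ distribution.model = "std",
+ fixed.pars = list(mu = 5.2334e-04, omega = 4.3655e-06,
+ alpha1 = 5.898e-02, beta1 = 9.2348e-01, shape = 5))
> path = ugarchpath(spec.t, n.sim = 2000, rseed = 10)
> spec.n = ugarchspec(mean.model = list(armaOrder = c(0, 0)))
> fit = ugarchfit(spec.n, data = fitted(path))
> x = pnorm(residuals(fit, standardize = TRUE))  # PIT under the Normal
> HLTest(as.numeric(x))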
with,
$$A_T = \frac{1}{T}\sum_t r_t, \qquad B_T = \left(\frac{1}{T}\sum_t \text{sgn}\left(\hat y_t\right)\right)\left(\frac{1}{T}\sum_t y_t\right), \tag{112}$$
with $\hat y$ being the forecast of $y$ and $r_t = \text{sgn}(\hat y_t)(y_t)$. According to the authors of the test, the
estimated variance of $EP$, $\hat V_{EP}$, may be estimated as:
$$\hat V_{EP} = \frac{4}{T^2}\hat p_{\hat y}\left(1 - \hat p_{\hat y}\right)\sum_t\left(y_t - \bar y\right)^2, \tag{113}$$
where $\hat p_{\hat y} = \frac{1}{2}\left(1 + \frac{1}{T}\sum_t\text{sgn}\left(\hat y_t\right)\right)$. The $EP$ statistic is asymptotically distributed as $N(0, 1)$.
For the DA test the interested reader can consult the relevant literature for more details.
The unconditional coverage test of Kupiec (1995) compares the observed number of VaR
exceedances against that expected, where $p$ is the probability of an exceedance for the chosen
confidence level and $N$ is the sample size. Under the Null the test statistic is asymptotically
distributed as a $\chi^2$ with 1 degree of freedom. The test does not consider any potential violation
of the assumption of the independence of the number of exceedances. The conditional coverage
test of Christoffersen et al. (2001) corrects this by jointly testing the frequency as well as the
independence of exceedances, assuming that the VaR violation is modelled with a first order
Markov chain. The test is a likelihood ratio, asymptotically distributed as $\chi^2$ with 2 degrees of
freedom, where the Null is that the conditional and unconditional coverage are equal to $\alpha$. The
test is implemented under the name VaRTest.
In a further paper, Christoffersen and Pelletier (2004) consider the duration between VaR violations
as a stronger test of the adequacy of a risk model. The duration of time between VaR
violations (no-hits) should ideally be independent and not cluster. Under the Null hypothesis
of a correctly specified risk model, the no-hit duration should have no memory. Since the only
continuous distribution which is memory free is the exponential, the test can be conducted on any
distribution which embeds the exponential as a restricted case, with a likelihood ratio test then
conducted to see whether the restriction holds. Following Christoffersen and Pelletier (2004),
the Weibull distribution is used, with parameter b = 1 representing the case of the exponential.
The test is implemented under the name VaRDurTest.
Because VaR tests deal with the occurrences of hits, they are by definition rather crude measures
with which to compare how well one model has done versus another, particularly with short data
sets. The expected shortfall test of McNeil and Frey (2000) measures the mean of the shortfall
violations, which should be zero under the Null of a correctly specified risk model. The test is
implemented in the function ESTest, which also provides for the option of bootstrapping the
distribution of the p-value, hence avoiding any strong assumptions about the underlying distribution
of the excess shortfall residuals.
Finally, it is understood that these tests are applied to out-of-sample forecasts and NOT in-sample,
for which no correction to the tests has been made to account for parameter uncertainty.
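A sketch of how these tests might be called, assuming hypothetical aligned vectors of realized returns (actual), VaR forecasts (VaR) and expected shortfall forecasts (ES) at the 1% coverage level:

> # actual, VaR, ES: hypothetical aligned vectors at the 1% coverage level
> VaRTest(alpha = 0.01, actual = actual, VaR = VaR)
> VaRDurTest(alpha = 0.01, actual = actual, VaR = VaR)
> ESTest(alpha = 0.01, actual = actual, ES = ES, VaR = VaR, boot = TRUE)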
The Model Confidence Set (MCS) procedure of Hansen, Lunde and Nason (henceforth HLN)
provides a method for eliminating inferior models from an initial set $M$, where $\hat d_{ij}$ measures the
relative performance between models $i$ and $j$, and $\hat d_i$ measures the relative
performance of model $i$ against the average of all the models in $M$; the variance of $\hat d_i$, $var(\hat d_i)$,
may be derived by use of the bootstrap. The statistic then used to eliminate inferior models is
the range statistic,17 defined as:
$$T_R = \max_{i,j\in M}\frac{\hat d_i}{\sqrt{var\left(\hat d_i\right)}}. \tag{120}$$
The asymptotic distribution of $T_R$, and hence the p-values reported, is obtained via the bootstrap
procedure, the validity of which is established in HLN.
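A hedged sketch using the package's mcsTest function; losses is a hypothetical T x k matrix of loss values (e.g. squared forecast errors), one column per competing model, and the bootstrap arguments shown are illustrative:

# Model confidence set of Hansen, Lunde and Nason (2011).
# 'losses' is a hypothetical T x k matrix, one column per model.
mcs <- mcsTest(losses, alpha = 0.05, nboot = 1000, nblock = 10,
               boot = "stationary")
mcs$includedR  # models retained under the range statistic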
11 Future Development
Any future extensions will likely be ’add-on’ packages released in the Bitbucket code repository
of the package.
Q: What data formats does the package support?
Since version 1.01-3, only xts data is supported, or data which can be coerced to it. This
is meant to simplify maintenance of the package whilst at the same time making use of what is a
very popular and widely adopted ’format/wrapper’. Some of the extractor functions will now also
return an xts formatted object.
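For example, a plain numeric vector of returns (the names ret and dates below are hypothetical) can be coerced before fitting:

library(xts)
# Coerce a plain numeric return series to xts before passing it to rugarch.
ret_xts <- xts(ret, order.by = as.Date(dates))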
Q: Does the package support parallel computation?
Yes. Since version 1.0-14, rugarch makes exclusive use of the parallel package for all par-
allel computations. Certain functions take as input a user-supplied cluster object (created by
calling parallel::makeCluster), which is then used for parallel computations; it is then up to
the user to terminate that cluster once it is no longer needed. Allowing a cluster object to be
provided in this way was deemed the most flexible approach to the parallel computation problem
across different architectures and resources.
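The pattern looks as follows, using ugarchroll as an example of a function that accepts a cluster argument (spec and returns are hypothetical, previously created objects):

library(rugarch)
library(parallel)
# Create a cluster, pass it to a cluster-aware function, then terminate it.
cl <- makeCluster(4)
roll <- ugarchroll(spec, data = returns, refit.every = 25,
                   refit.window = "moving", cluster = cl)
stopCluster(cl)  # the user is responsible for stopping the cluster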
Q: My model fails to converge. What can I do?
There are several avenues to consider here. The package offers 4 different solvers, namely ’solnp’,
’gosolnp’, ’nlminb’ and ’L-BFGS-B’ (from optim). Each solver has its own merits, and control
parameters which may, and should, be passed via the solver.control list in the fitting routines,
depending on your particular data. For problems where neither ’solnp’ nor ’nlminb’ seems to
work, try the ’gosolnp’ solver, which searches the parameter space based on a truncated normal
distribution for the parameters and then initializes multiple restarts of the ’solnp’ solver from
the best identified candidates. The numbers of randomly generated parameters (n.sim) and
solver restarts (n.restarts) can be passed via the solver.control list. Additionally, in the
fit.control list of the fitting routines, the option to perform scaling of the data prior to fitting
usually helps, although it is not available under some setups. Finally, consider the amount of
data you are using for modelling GARCH processes, which leads to another FAQ below.
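As a sketch of these suggestions (spec and returns are the hypothetical objects from the previous example):

# Multiple-restart estimation via the 'gosolnp' solver, with data scaling.
fit <- ugarchfit(spec, data = returns, solver = "gosolnp",
                 solver.control = list(n.sim = 2000, n.restarts = 3),
                 fit.control = list(scale = 1))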
Q: How much data should I use to model GARCH processes with confidence?
The distribution of the parameters varies by model, and the reader is left to consult the relevant
literature on this. However, using 100 data points to try and fit a model is unlikely to be a sound
approach, as you are unlikely to obtain very efficient parameter estimates. The rugarch package
does provide a method (ugarchdistribution) for simulating from a pre-specified model, data
of different sizes, fitting the model to each simulated set, and inferring the distribution of the
parameters as well as the rate of change of the RMSE as the data length increases. This is a very
computationally expensive way to examine the distribution of the parameters (but the only way
in the non-Bayesian world), and as such should be used with care and in the presence of ample
computing power.
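A hedged sketch of such a simulation study, where fit is a hypothetical estimated object returned by ugarchfit and the recursive arguments grow the sample length across windows:

# Simulated parameter density for increasing sample sizes (expensive).
fd <- ugarchdistribution(fit, n.sim = 2000, m.sim = 100, recursive = TRUE,
                         recursive.length = 6000, recursive.window = 1000)
plot(fd)  # parameter density and RMSE plots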
Q: Are there any examples or tests I can look at?
The package has a folder called ’rugarch.tests’ which contains many tests which I use for debug-
ging and checking. The files in the folder should be ’sourced’ by the user, and the ’runtests.R’
file contains some wrapper functions which describe what each test does and optionally run
chosen tests. The output will be a combination of text files (.txt) and figures (either .eps or
.png) in an output directory which the user can define in the arguments to the wrapper function
’rugarch.runtests’. It is quite instructive to read and understand what each test is doing prior
to running it. There are also online examples which you can find by typing rugarch in a search
engine.
Q: I have a question or think I have found a bug. How should I get in touch?
Please use the R-SIG-Finance mailing list to post your questions. If you do mail me directly,
consider carefully your email, the debug information you submit, and correct email etiquette (i.e.
do not send me a 1 MB .csv file of your data, and at no time send me an Excel file).
References
K. Aas and I.H. Haff. The generalized hyperbolic skew Student’s t-distribution. Journal of
Financial Econometrics, 4(2):275–309, 2006.
Y. Ait-Sahalia. Testing continuous-time models of the spot interest rate. Review of Financial
Studies, 9(2):385–426, 1996.
S. Anatolyev and A. Gerko. A trading approach to testing for predictability. Journal of Business
and Economic Statistics, 23(4):455–461, 2005.
Torben G Andersen and Tim Bollerslev. Intraday periodicity and volatility persistence in finan-
cial markets. Journal of Empirical Finance, 4(2):115–158, 1997.
Richard T Baillie, Tim Bollerslev, and Hans Ole Mikkelsen. Fractionally integrated generalized
autoregressive conditional heteroskedasticity. Journal of Econometrics, 74(1):3–30, 1996.
T. Bollerslev. A conditionally heteroskedastic time series model for speculative prices and rates
of return. The Review of Economics and Statistics, 69(3):542–547, 1987.
Tim Bollerslev and Hans Ole Mikkelsen. Modeling and pricing long memory in stock market
volatility. Journal of Econometrics, 73(1):151–184, 1996.
G.E.P. Box, G.M. Jenkins, and G.C. Reinsel. Time series analysis: Forecasting and control.
Prentice Hall, 1994.
P. Christoffersen, J. Hahn, and A. Inoue. Testing and comparing value-at-risk measures. Journal
of Empirical Finance, 8(3):325–342, 2001.
P. Christoffersen and D. Pelletier. Backtesting value-at-risk: A duration-based approach. Journal
of Financial Econometrics, 2(1):84–108, 2004.
P.F. Christoffersen and F.X. Diebold. Financial asset returns, direction-of-change forecasting,
and volatility dynamics. Management Science, 52(8):1273–1287, 2006.
Christian Conrad and Berthold R Haag. Inequality constraints in the fractionally integrated
GARCH model. Journal of Financial Econometrics, 4(3):413–449, 2006.
Z. Ding, C.W.J. Granger, and R.F. Engle. A long memory property of stock market returns and
a new model. Journal of Empirical Finance, 1(1):83–106, 1993.
R.F. Engle. Autoregressive conditional heteroscedasticity with estimates of the variance of United
Kingdom inflation. Econometrica, 50(4):987–1007, 1982.
R.F. Engle and T. Bollerslev. Modelling the persistence of conditional variances. Econometric
Reviews, 5(1):1–50, 1986.
R.F. Engle and J. Mezrich. Grappling with GARCH. Risk, 8(9):112–117, 1995.
R.F. Engle and V.K. Ng. Measuring and testing the impact of news on volatility. Journal of
Finance, 48(5):1749–1778, 1993.
R.F. Engle, D.M. Lilien, and R.P. Robins. Estimating time varying risk premia in the term
structure: The ARCH-M model. Econometrica: Journal of the Econometric Society, 55(2):
391–407, 1987.
Robert F. Engle and Magdalena E. Sokalska. Forecasting intraday volatility in the US equity
market: Multiplicative component GARCH. Journal of Financial Econometrics, 10(1):54–83,
2012. doi: 10.1093/jjfinec/nbr005. URL http://jfec.oxfordjournals.org/content/10/1/54.abstract.
C. Fernandez and M.F. Steel. On Bayesian modeling of fat tails and skewness. Journal of the
American Statistical Association, 93(441):359–371, 1998.
J.T. Ferreira and M.F. Steel. A constructive representation of univariate skewed distributions.
Journal of the American Statistical Association, 101(474):823–829, 2006.
Thomas J Fisher and Colin M Gallagher. New weighted portmanteau statistics for time series
goodness of fit testing. Journal of the American Statistical Association, 107(498):777–787,
2012.
A. Ghalanos and S. Theussl. Rsolnp: General non-linear optimization using augmented Lagrange
multiplier method, 1.11 edition, 2011.
L.R. Glosten, R. Jagannathan, and D.E. Runkle. On the relation between the expected value
and the volatility of the nominal excess return on stocks. Journal of Finance, 48(5):1779–1801,
1993.
L.P. Hansen. Large sample properties of generalized method of moments estimators. Economet-
rica, 50(4):1029–1054, 1982.
Peter R Hansen, Asger Lunde, and James M Nason. The model confidence set. Econometrica,
79(2):453–497, 2011.
Peter Reinhard Hansen, Zhuo Huang, and Howard Howan Shek. Realized GARCH: A joint model
for returns and realized measures of volatility. Journal of Applied Econometrics, 27(6):877–906,
2012.
L. Hentschel. All in the family: Nesting symmetric and asymmetric GARCH models. Journal of
Financial Economics, 39(1):71–104, 1995.
M.L. Higgins, A.K. Bera, et al. A class of nonlinear ARCH models. International Economic Review,
33(1):137–158, 1992.
Y. Hong and H. Li. Nonparametric specification testing for continuous-time models with appli-
cations to term structure of interest rates. Review of Financial Studies, 18(1):37–84, 2005.
Menelaos Karanasos, Zacharias Psaradakis, and Martin Sola. On the autocorrelation properties
of long-memory GARCH processes. Journal of Time Series Analysis, 25(2):265–282, 2004.
P.H. Kupiec. Techniques for verifying the accuracy of risk measurement models. The Journal
of Derivatives, 3(2):73–84, 1995.
G.J. Lee and R.F. Engle. A permanent and transitory component model of stock return volatility.
In Cointegration, Causality, and Forecasting: A Festschrift in Honour of Clive W.J. Granger,
pages 475–497. Oxford University Press, 1999.
A.J. McNeil and R. Frey. Estimation of tail-related risk measures for heteroscedastic financial
time series: an extreme value approach. Journal of Empirical Finance, 7(3-4):271–300, 2000.
J. Nyblom. Testing for the constancy of parameters over time. Journal of the American Statistical
Association, 84(405):223–230, 1989.
S.G. Pantula. Comment: Modelling the persistence of conditional variances. Econometric Re-
views, 5(1):71–74, 1986.
L. Pascual, J. Romo, and E. Ruiz. Bootstrap prediction for returns and volatilities in GARCH
models. Computational Statistics and Data Analysis, 50(9):2293–2312, 2006.
K. Prause. The generalized hyperbolic model: Estimation, financial derivatives, and risk mea-
sures. PhD thesis, University of Freiburg, 1999.
R.A. Rigby and D.M. Stasinopoulos. Generalized additive models for location, scale and shape.
Journal of the Royal Statistical Society: Series C (Applied Statistics), 54(3):507–554, 2005.
G.W. Schwert. Stock volatility and the crash of ’87. Review of Financial Studies, 3(1):77, 1990.
David Scott and Fiona Grimson. SkewHyperbolic: The Skew Hyperbolic Student t-Distribution,
version 0.3-3 edition. URL http://cran.r-project.org/web/packages/SkewHyperbolic/index.html.
D.M. Stasinopoulos, R.A. Rigby, and C. Akantziliotou. gamlss: Generalized additive models for
location, scale and shape, 1.11 edition, 2009.
A. Tay, F.X. Diebold, and T.A. Gunther. Evaluating density forecasts: With applications to
financial risk management. International Economic Review, 39(4):863–883, 1998.
Y. Ye. Interior point algorithms: Theory and analysis, volume 44. Wiley-Interscience, 1997.
J.M. Zakoian. Threshold heteroskedastic models. Journal of Economic Dynamics and Control,
18(5):931–955, 1994.