EQI Gappy ch2 20240430
The Questions
Draft (April 30, 2024). Please read the chapter carefully and send
comments and corrections to the author. Any contribution will be
acknowledged in the final copy.
Email: paleologo@gmail.com
Xtwitter: @__paleologo (DM me)
LinkedIn: https://www.linkedin.com/in/gappy/ (connect, then message)
We start with models of univariate returns for two reasons. First, because
single-asset returns are the basic constituents of portfolios. We cannot hope to
understand the behavior of portfolios if we do not have a solid understanding
of their building blocks, so it is necessary to summarize the salient empirical
properties of stock returns and the most common processes employed to model
them, and specifically to model volatility effectively. These models have general
applicability, and are even more useful when combined with other families of
models for multivariate returns. GARCH and exponential moving averages are
essential tools of the working modeler. In the process, I introduce models
that justify their use: exponential moving averages find their motivation in linear
state-space models, and GARCH is an instance of a nonlinear state-space model.
These models will be your friends for life.
There are five parts to the chapter. First, we lay out definitions of returns;
second, we summarize some “stylized facts” (empirical features of returns that
are ubiquitous and relevant to risk management); third, we skim GARCH
models and realized volatility models. Because both topics have been covered
extensively in textbooks, my goal here is to introduce the essentials and their
associated insights and give a jump-off point for the reader. Then, I touch
on state-space models for variance estimation. Lastly, I cover spherical and
elliptical distributions.
2.1 Returns
2.1.1 Definitions
We have a set of $n$ assets and a currency, also called the numeraire¹. We will
use dollars throughout as currency. It is customary to assume that each of these
assets is infinitely divisible. We buy the equivalent of a unit of currency of
asset $i$, and we denote the value of the asset tomorrow by $R_i$. An equivalent way to
define returns is from the closing prices of security $i$ on days 0 and 1, $P_i(0)$ and
$P_i(1)$, respectively. The return is defined as
$$r_i(1) := \frac{P_i(1) - P_i(0)}{P_i(0)}$$
We extend this definition to the case in which the security pays a dividend. If the
holder of the asset receives an amount $D_i(1)$, the dividend-adjusted return is defined as
$$r_i(1) := \frac{P_i(1) + D_i(1) - P_i(0)}{P_i(0)}$$
We denote the vector of daily returns at time $t$ as $(R_1(t), \dots, R_n(t))$. A great
deal of equity risk management deals with the properties of this vector. For
a portfolio $w \in \mathbb{R}^n$, where $w_i$ is an investment in asset $i$, the Profit and Loss
(PnL) in a single period is given by the change in the value of the portfolio:
$\sum_i w_i P_i(1) - \sum_i w_i P_i(0)$. In vector form, this equals $w'r$.
The cumulative return over the periods $1, \dots, T$ compounds multiplicatively:
$$r_i(1:T) := \frac{P_i(T)}{P_i(0)} - 1$$
$$= \frac{P_i(T)}{P_i(T-1)}\,\frac{P_i(T-1)}{P_i(T-2)}\cdots\frac{P_i(1)}{P_i(0)} - 1$$
$$= (r_i(T)+1)\times(r_i(T-1)+1)\times\cdots\times(r_i(1)+1) - 1$$
If the $r_i(t)$ are normally distributed, the cumulative total return is not normally
distributed, and its distribution rapidly diverges from the normal distribution.
The variance of the cumulative returns is not a simple function of the
single-period variances.
Log returns, on the other hand, compound under multiplication. Let $\tilde r_i(t) :=
\log(1 + r_i(t))$. Then the log of the compound return is equal to the sum of the
log returns in the same period, and if the log returns are normal, so is the log of
the compound return. If the returns are independent, the variance of the log
compound return is equal to the sum of the variances. We can reconcile the
two views of returns, raw and log, if the approximation $\log(x) = x - 1 + o(|x-1|)$
is sufficiently accurate, i.e., if net returns are small. In this case we can make
the approximation $\tilde r_i \simeq r_i$.
A common approximation for the compounded net return of an asset over
time is given by
$$\prod_t (r(t)+1) - 1 = \exp\left(\sum_t \tilde r(t)\right) - 1 \simeq 1 + \sum_t \tilde r(t) - 1 \simeq \sum_t r(t).$$
Always verify the accuracy of the approximation, for example by comparing the
estimates of models developed using $r$ and $\tilde r$. When the assets are equities, the
approximation is usually considered adequate for daily measurement intervals
or shorter.
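As a quick check of this approximation (a sketch with simulated numbers, not results from the chapter), compare the exact compounded net return with the sum of log returns and the sum of raw returns:

```python
import numpy as np

rng = np.random.default_rng(0)
r = rng.normal(0.0005, 0.01, size=252)       # one year of daily net returns (illustrative scale)

compounded = np.prod(1.0 + r) - 1.0          # exact compounded net return
sum_log = np.sum(np.log1p(r))                # sum of log returns = log(1 + compounded)
sum_raw = np.sum(r)                          # naive sum of net returns

print(f"compounded return     : {compounded: .6f}")
print(f"exp(sum of log r) - 1 : {np.expm1(sum_log): .6f}")   # identical to compounded, up to rounding
print(f"sum of raw returns    : {sum_raw: .6f}")             # approximation error grows with the horizon
```

The discrepancy between the last two lines grows with the horizon and with the size of the returns, which is why the check should be repeated at the frequency actually used in the model.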
rather than ignore them and face unintended consequences. Perhaps the simplest
such model is the Roll model (Roll, 1984) for asset prices. In this model, the
"true" price $m_t$ of an asset evolves as an arithmetic random walk, and we
imperfectly observe the price $p_t$. In formulas:
$$m_t = m_{t-1} + \sigma_\epsilon \epsilon_t$$
$$p_t = m_t + \sigma_\eta \eta_t$$
with $\epsilon_t, \eta_t$ independent random variables (serially and from each other) distributed
according to a standard normal.
Before we try to estimate prices, the model has an immediate and testable
consequence: consecutive price differences are negatively correlated. The price
difference is $\Delta p_{t+1} := \sigma_\epsilon \epsilon_{t+1} + \sigma_\eta(\eta_{t+1} - \eta_t)$, which is zero in expectation. However,
$$E(\Delta p_{t+1}\,\Delta p_t) = -\sigma_\eta^2$$
$$E(\Delta p_{t+1}\,\Delta p_s) = 0, \qquad s < t$$
The lag-one autocorrelation can also be used to estimate the measurement error.
The presence of large non-zero autocorrelations beyond lag one may point to
model inadequacy, in the sense that there are actual long-term dependencies in
the price process $m_t$. The model can be extended; see Section 2.4. An optimal
estimator for $m_t$ is provided by the Kalman filter. The filter is covered in the
Appendix, Section 14.3, and specifically in Example 1 of Subsection 14.3.1. The
estimator is given by
$$\hat m_t = (1-K)\,\hat m_{t-1} + K\,p_t$$
where the explicit formula for $K \in (0,1)$ is given in the Appendix. The smaller
the ratio $\sigma_\eta/\sigma_\epsilon$, the higher the $K$, which makes sense: we do not need to average
observations if the price observations are accurate. The gist of the model is that
an exponential moving average of prices is preferable to just taking the last price
in the measurement period. If we want the daily closing price, for example, we
may want to use a weighted average of 5-minute interval prices in the preceding
interval. There is a caveat, however. Suppose we have estimates $\hat m_t$, and we use
these estimates to compute returns at intervals $T$; i.e., $r_T := \hat m_{nT}/\hat m_{(n-1)T} - 1$.
Because we employ the same observed prices $p$ both in $\hat m_{(n-1)T}$ and $\hat m_{nT}$, the two
estimates are positively correlated. One should always check that $(1-K)^T \ll 1$
to alleviate this spurious correlation.
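The following sketch simulates this price-plus-noise model with hypothetical noise scales and an assumed gain K (not the optimal Kalman gain), to illustrate both the negative lag-one autocorrelation of price differences and the benefit of exponentially averaging prices:

```python
import numpy as np

rng = np.random.default_rng(1)
T, sigma_eps, sigma_eta = 10_000, 0.5, 1.0           # hypothetical noise scales

m = np.cumsum(sigma_eps * rng.standard_normal(T))    # "true" price: arithmetic random walk
p = m + sigma_eta * rng.standard_normal(T)           # observed price = true price + noise

dp = np.diff(p)
lag1 = np.corrcoef(dp[1:], dp[:-1])[0, 1]
# theory: -sigma_eta**2 / (sigma_eps**2 + 2 * sigma_eta**2)
print(f"lag-1 autocorrelation of price differences: {lag1:.3f}")

# exponentially weighted estimate of the true price
# (K assumed here; the optimal value comes from the Kalman formulas in the Appendix)
K = 0.3
m_hat = np.empty(T)
m_hat[0] = p[0]
for t in range(1, T):
    m_hat[t] = (1 - K) * m_hat[t - 1] + K * p[t]

print(f"RMSE of last observed price vs true price : {np.sqrt(np.mean((p - m) ** 2)):.3f}")
print(f"RMSE of EWMA price estimate vs true price : {np.sqrt(np.mean((m_hat - m) ** 2)):.3f}")
```

With measurement noise larger than the innovation noise, the averaged price tracks the true price more closely than the last observed price, as the model predicts.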
              Skewness                Kurtosis
Stock    Mean    Left   Right     Mean    Left   Right
AAPL     -0.2    -0.5     0.2      5.7     3.6     7.8
IBM       0.1    -0.2     0.5      7.1     5.4     8.7
NRG       0.4    -0.5     1.2     14.3     7.9    20.0
WAT      -2.0    -3.3    -0.6     29.8    12.8    48.1
SPY      -0.1    -0.7     0.6     11.4     6.5    16.0

Table 2.1: Sample skewness and kurtosis of daily log returns, with p = 0.01
confidence intervals (the Left and Right columns are the interval bounds)
estimated using a nonparametric bootstrap with replacement (5,000 variates).
Range: 1/3/2001-12/8/2017.
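The confidence intervals in Table 2.1 can be obtained with a nonparametric bootstrap along the following lines; the sketch below runs on simulated returns, so the numbers are purely illustrative, and the scipy-based estimators and the percentile method are my own choices:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(2)
r = rng.standard_t(df=4, size=4000) * 0.01            # stand-in for a daily log-return series

def bootstrap_ci(x, stat, n_boot=5000, p=0.01):
    """Percentile bootstrap confidence interval (resampling with replacement)."""
    stats = np.array([stat(rng.choice(x, size=x.size, replace=True)) for _ in range(n_boot)])
    return np.quantile(stats, [p / 2, 1 - p / 2])

print("skewness       :", skew(r), bootstrap_ci(r, skew))
print("excess kurtosis:", kurtosis(r), bootstrap_ci(r, kurtosis))
```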
[Figure: sample autocorrelation functions (ACF) versus lag (0-20), panels (a)-(f).]
Reality³ is in stark contrast with simple models of univariate price dynamics like
the geometric diffusion process at the core of simple derivative pricing models:
$$\frac{dP_t}{P_t} = \mu\,dt + \sigma\,dW_t$$
This model predicts Gaussian, independent log returns, which are inconsistent
with the empirical evidence. First, returns show little serial autocorrelation.
This does not mean that returns are independent, nor that returns are unpredictable
based on the history of returns or some additional explanatory variables.
Regarding the former point: zero autocorrelation does not imply independence.
Regarding the latter, returns are predictable. This is not only an article of
faith of active investors, who usually do a terrible job at it, but also a relatively
uncontroversial empirical finding among academics⁴. Nevertheless, even though
they are predictable, they are not trivially predictable.
Regarding heavy tails: for asset returns, we restrict our attention to power-tailed
distributions, i.e., distributions whose complement of the cumulative distribution
function follows a power law: $\bar F(x) := P(r > x) = Cx^{-\alpha}$, with $\alpha > 0$. Compare this to
Gaussian returns: if $r \sim N(0,1)$, then a common approximation (Wasserman, 2004)
for the tail probability is
$$\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\left(\frac{1}{x} - \frac{1}{x^3}\right) \;\le\; \bar F(x) \;\le\; \frac{1}{\sqrt{2\pi}}\,\frac{e^{-x^2/2}}{x} \qquad (2.2)$$
For the case $|x| \ge 1$, the right-side inequality can be used to bound quantiles,
together with the symmetric inequality for the left tail:
$$\bar F(x) \le \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \;\Rightarrow\; \bar F^{-1}(1-\delta) \le \sqrt{2\log\!\left[1/\!\left(\sqrt{2\pi}\,(1-\delta)\right)\right]} \qquad (2.3)$$
$$F(x) \le \frac{1}{\sqrt{2\pi}}e^{-x^2/2} \;\Rightarrow\; F^{-1}(\delta) \ge -\sqrt{2\log\!\left[1/\!\left(\sqrt{2\pi}\,\delta\right)\right]} \qquad (2.4)$$
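A quick numerical check of the bounds in Equation (2.2) against the exact Gaussian tail probability (the scipy survival function):

```python
import numpy as np
from scipy.stats import norm

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)      # standard normal density
lower = phi * (1 / x - 1 / x**3)                  # lower bound of Eq. (2.2)
upper = phi / x                                   # upper bound of Eq. (2.2)
exact = norm.sf(x)                                # exact tail probability P(r > x)

for xi, lo, ex, up in zip(x, lower, exact, upper):
    print(f"x={xi:.0f}: {lo:.2e} <= {ex:.2e} <= {up:.2e}")
```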
³Note, however, that I am not including the Leverage Effect among the stylized facts. In the words
of Cont (2001), "most measures of volatility of an asset are negatively correlated with the returns of
that asset". This effect is not sufficiently strong in recent data, as shown by Ratliff-Crain et al. (2023).
⁴John Cochrane has written extensively on this, e.g., Cochrane (2008) and the blog en-
large set of models in this family. The fundamental insight of the model is to
make the parameters in the model a part of the state of the stochastic process.
The laws for GARCH(1,1) are
$$r_t = h_t \epsilon_t \qquad (2.5)$$
$$h_t^2 = \alpha_0 + \alpha_1 r_{t-1}^2 + \beta_1 h_{t-1}^2 \qquad (2.6)$$
$$\epsilon_t \sim N(0,1) \qquad (2.7)$$
To gain some intuition, let us look at the second equation of the GARCH process
when we remove the term $\alpha_1 r_{t-1}^2$. The equation
$$h_t^2 = \alpha_0 + \beta_1 h_{t-1}^2 \qquad (2.8)$$
can be rewritten as
$$h_t^2 - h^2 = \beta_1\left(h_{t-1}^2 - h^2\right)$$
where $h^2 := \alpha_0/(1-\beta_1)$. The value of $h_t^2$ converges to $h^2$ at a geometric rate, so
long as $|\beta_1| < 1$. High values of the squared return $r_t^2$ shock the volatility upward,
provided that $\alpha_1 > 0$. This in turn increases the probability of large squared
returns in the following period, giving rise to a rich dynamic behavior. The
increase in volatility cannot continue unabated, because the term $\beta_1(h_{t-1}^2 - h^2)$
will dampen variances that are much greater than the "equilibrium level" $h^2$.
This can be seen through substitution in the second equation of the model:
$$h_t^2 = h^2 + \alpha_1\sum_{i=1}^{\infty}\beta_1^{i-1} r_{t-i}^2 \qquad (2.9)$$
One could replace the true values of $\alpha_0, \alpha_1, \beta_1$ with estimates, and interpret the
formula by saying that the variance estimate is an exponential moving average
of non-iid returns, since they are modulated by $h_t$, in light of Equation (2.5).
Set
$$a_t := \beta_1 + \alpha_1\epsilon_{t-1}^2$$
so that
$$h_t^2 = a_t h_{t-1}^2 + \alpha_0$$
This formulation shows that the process is Markovian, and that the variance
process is governed by an autoregressive equation with random coefficients. This
allows us to study the process using the toolkit of random recursive equations.
By recursion (Lindner, 2009), we can rewrite the equations as
$$h_t^2 = \left(\prod_{i=0}^{k} a_{t-i}\right) h_{t-k-1}^2 + \alpha_0 \sum_{i=0}^{k}\prod_{j=0}^{i-1} a_{t-j}$$
The product $x_t := \prod_{i=0}^{t} a_i$ plays an important role (Nelson, 1990). If we can
identify the conditions under which it converges to zero almost surely (a.s.)
and fast enough to guarantee that $\sum_{i=0}^{\infty}\prod_{j=0}^{i-1} a_{t-j}$ is finite a.s., then we have
proven the existence of an asymptotic limit for $h_t^2$. Let us consider the process
$\{x_t : t > 0\}$. First, assume $x_t \ge 0$; it diverges if and only if $\log x_t \to \infty$. We
then have to find the conditions under which
$$\sum_{i=0}^{t}\log\!\left(\beta_1 + \alpha_1\epsilon_{i-1}^2\right) \to \infty \quad \text{a.s.}$$
Since this is a sum of iid random variables, a necessary and sufficient condition
for this is that $\mu := E[\log(\beta_1 + \alpha_1\epsilon_0^2)] > 0$, provided that the variance of
$\log(\beta_1 + \alpha_1\epsilon_0^2)$ is finite. If that is the case, then we can apply the Strong Law of
Large Numbers:
$$\frac{1}{t}\sum_{i=0}^{t}\log\!\left(\beta_1 + \alpha_1\epsilon_{i-1}^2\right) \to \mu \quad \text{a.s.}$$
Conversely, assume that $E[\log(\beta_1 + \alpha_1\epsilon_0^2)] < 0$. Then $\log x_t \to -\infty$ a.s., and
$x_t \to 0$ a.s. Under this condition, the unconditional variance is
$$h_t^2 = \alpha_0\sum_{i=0}^{\infty}\prod_{j=0}^{i-1} a_{t-j} \qquad (2.10)$$
so the process is leptokurtic as long as $\alpha_1 > 0$. How about skewness? The
unconditional returns are not skewed, because
$$E(r_t^3) = E(h_t^3\epsilon_t^3) = E(h_t^3)\,E(\epsilon_t^3) = 0,$$
since $h_t$ depends only on past innovations and the odd moments of $\epsilon_t$ vanish.
Finally, we point out that not only are the unconditional returns leptokurtic, but
they do in fact have Pareto tails, provided the process is stationary: $P(r_t > x) \sim x^{-\alpha}$,
for some $\alpha > 0$; see Mikosch and Stărică (2000); Buraczewski et al. (2016).
Summing up, some but not all of the stylized facts about log returns are
captured by GARCH(1,1).
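Simulating Equations (2.5)-(2.7) directly makes the clustering and leptokurtosis visible; the parameter values below are hypothetical, chosen only to satisfy $\alpha_1 + \beta_1 < 1$:

```python
import numpy as np
from scipy.stats import kurtosis

rng = np.random.default_rng(3)
alpha0, alpha1, beta1 = 1e-6, 0.08, 0.90          # hypothetical, alpha1 + beta1 < 1
T = 100_000

h2 = np.empty(T)
r = np.empty(T)
h2[0] = alpha0 / (1 - alpha1 - beta1)             # start at the unconditional variance
eps = rng.standard_normal(T)
for t in range(T):
    r[t] = np.sqrt(h2[t]) * eps[t]
    if t + 1 < T:
        h2[t + 1] = alpha0 + alpha1 * r[t] ** 2 + beta1 * h2[t]

print("excess kurtosis of r  :", kurtosis(r))                              # > 0: heavier tails than Gaussian
print("lag-1 autocorr of r   :", np.corrcoef(r[1:], r[:-1])[0, 1])         # approximately zero
print("lag-1 autocorr of r^2 :", np.corrcoef(r[1:]**2, r[:-1]**2)[0, 1])   # positive: volatility clustering
```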
Figure 2.2: Quantile-Quantile plot for daily log returns (blue dots) and
GARCH(1,1) residuals (orange dots) of log returns against the theoretical
normal distribution for (a) AAPL, (b) IBM, (c) NRG, (d) WAT, (e) SPY, (f)
XLK. Return range: 1/3/2001-12/8/2017.
$$r_t = h_t\epsilon_t$$
$$h_t^2 = g\!\left(h_{t-1}^2, r_{t-1}^2, \theta\right)$$
$$L(\theta) = \sum_{t=1}^{T} f\!\left(\frac{r_t}{h_t(r_1, \dots, r_{t-1}, \theta)}\right)$$
We can then estimate the parameters $\theta$ of the model by maximizing the log-likelihood.
As an example, consider the GARCH(1,1) model. The recursive
equation for $h_t^2$ is given by Equation (2.9), so we solve
$$\min \;\sum_{t=1}^{T}\left(\log h_t^2 + \frac{r_t^2}{h_t^2}\right)$$
$$\text{s.t.}\quad h_t = \left(\alpha_0\,\frac{1-\beta_1^{t-1}}{1-\beta_1} + \alpha_1\sum_{i=1}^{t-1}\beta_1^{i-1} r_{t-i}^2\right)^{1/2}, \qquad t = 1, \dots, T$$
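A sketch of this fit in code; note that it uses the filtered recursion for $h_t^2$ rather than the truncated expansion above, and the optimizer, initialization, and constraint handling are my own choices rather than the chapter's:

```python
import numpy as np
from scipy.optimize import minimize

def garch_neg_loglik(params, r):
    """Gaussian quasi-likelihood of GARCH(1,1), up to an additive constant."""
    alpha0, alpha1, beta1 = params
    if alpha0 <= 0 or alpha1 < 0 or beta1 < 0 or alpha1 + beta1 >= 1:
        return np.inf                             # penalize parameters outside the admissible region
    h2 = np.empty_like(r)
    h2[0] = r.var()                               # initialize at the sample variance
    for t in range(1, r.size):
        h2[t] = alpha0 + alpha1 * r[t - 1] ** 2 + beta1 * h2[t - 1]
    return 0.5 * np.sum(np.log(h2) + r ** 2 / h2)

def fit_garch(r):
    """Fit (alpha0, alpha1, beta1) to a demeaned return series r (numpy array)."""
    x0 = np.array([0.1 * r.var(), 0.05, 0.90])    # rough starting point
    res = minimize(garch_neg_loglik, x0, args=(r,), method="Nelder-Mead")
    return res.x
```

Applied to the series simulated earlier, the estimates should land near the generating parameters; for production work a dedicated package with analytic gradients is preferable.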
$$dp = \alpha\,dt + \sigma\,dW$$
where $W(t)$ is a Brownian process, and $\alpha \in \mathbb{R}$ (the drift) and $\sigma > 0$ (the volatility) are
constants. In all applications of interest, the drift is much smaller than the
volatility: $|\alpha| \ll \sigma$. The quantity $\alpha/\sigma$ is termed the (daily) Sharpe Ratio and
will figure prominently in the rest of the book⁵. We observe the process in the
interval $[0,1]$ and measure the state variable $p$ at intervals of length $1/n$. The
measured return is $r(j) := p(j/n) - p((j-1)/n)$. Clearly, the $r(j)$ are iid random
variables, and $r(j) \sim N(\alpha/n, \sigma^2/n)$. The maximum likelihood estimators for
drift and variance are
$$\hat\alpha = \sum_j r(j) = p(1) - p(0)$$
$$\hat\sigma_1^2 = \sum_j \left[r(j) - \hat\alpha/n\right]^2$$
We will also consider the uncentered estimator $\hat\sigma_2^2 := \sum_j r^2(j)$.
The first remarkable phenomenon is that the MLE estimator for the drift
does not depend on the number of intervals $n$. Moreover, one can show that
$\mathrm{var}(\hat\alpha) = \mathrm{var}(p(1)-p(0))$, and $p(1)-p(0) \sim N(\alpha, \sigma^2)$, so that $\mathrm{var}(\hat\alpha) = \sigma^2$. The
estimation error does not depend on the number of intervals either. To estimate
the variances of $\hat\sigma_1^2$ and $\hat\sigma_2^2$ we need a few formulas. The moments of $r(j)$ are
those of a Gaussian random variable with mean $\alpha/n$ and variance $\sigma^2/n$:
$$E[r(j)] = \frac{\alpha}{n} \qquad (2.12)$$
$$E[r^2(j)] = \left(\frac{\alpha}{n}\right)^2 + \frac{\sigma^2}{n} \qquad (2.13)$$
$$E[r^4(j)] = \left(\frac{\alpha}{n}\right)^4 + 6\left(\frac{\alpha}{n}\right)^2\frac{\sigma^2}{n} + 3\left(\frac{\sigma^2}{n}\right)^2 \qquad (2.14)$$
so that
$$\mathrm{var}\!\left(r^2(j)\right) = 2\left(\frac{\sigma^2}{n}\right)^2 + 4\left(\frac{\alpha}{n}\right)^2\frac{\sigma^2}{n} \qquad (2.15)$$
and
$$E(\hat\sigma_2^2) = \sigma^2 + \frac{\alpha^2}{n} \qquad \text{from Equation (2.13)}$$
$$\mathrm{var}(\hat\sigma_2^2) = 2\,\frac{\sigma^4}{n} + 4\left(\frac{\alpha}{n}\right)^2\sigma^2 \qquad \text{from Equation (2.15)}$$
5 This
is the Sharpe Ratio of log returns, which is to a first approximation close to the daily Sharpe
Ratio computed on returns.
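The simulation below illustrates both facts under assumed values of $\alpha$ and $\sigma$: refining the sampling grid does not reduce the error of the drift estimate, while the error of the (uncentered) variance estimate shrinks like $1/\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(4)
alpha, sigma, n_paths = 0.05, 1.0, 10_000       # assumed drift and volatility over [0, 1]

for n in (10, 100, 1000):                       # number of observation intervals
    r = rng.normal(alpha / n, sigma / np.sqrt(n), size=(n_paths, n))   # r(j) ~ N(alpha/n, sigma^2/n)
    alpha_hat = r.sum(axis=1)                                          # = p(1) - p(0)
    sigma2_hat = (r ** 2).sum(axis=1)                                  # uncentered variance estimator
    print(f"n={n:5d}  std(alpha_hat)={alpha_hat.std():.3f}  std(sigma2_hat)={sigma2_hat.std():.4f}")
```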
For the rest of us, the question is: what to choose? Liu et al. (2015) compare a
broad set of estimators, with several choices of parameters, for assets in different
asset classes (equities, futures, indices). They use Romano and Wolf's procedure
for multiple comparisons (Romano and Wolf, 2005) and Hansen et al.'s "model
confidence set" (Hansen et al., 2011). They find that the vanilla RV at 5-minute
intervals performs competitively across various assets and asset classes. There
are a few cases where this is not true. When higher-frequency measurements
are available, this estimator is outperformed by a one-minute subsampled RV
and by 1- and 5-second realized kernels. In addition, at lower frequencies,
5- and 15-minute truncated RV (Mancini, 2009, 2011) also outperform vanilla
RV. However, where available, 5-minute nonoverlapping intervals seem to be a
reasonable choice.
$$r_t = h_t\epsilon_t$$
$$h_t^2 = \alpha_0 + \beta_1 h_{t-1}^2 + \gamma\, x_{t-1}$$
$$x_t = \xi + \varphi\, h_t^2 + u_t \qquad (2.16)$$
The first two equations are similar to the standard GARCH(1,1) model, with
one difference: the term proportional to $r_{t-1}^2$ has been replaced by a term
proportional to $x_{t-1}$. This variable is the observed estimate of the realized
variance at time $t$; when this estimator is more accurate than the rough estimate
of variance $r_{t-1}^2$, then the model will probably outperform GARCH(1,1). The
last Equation (2.16) models the dynamic behavior of the realized variance. It
posits a linear dependence on $h_t$ and on a stochastic term $u_t$. The random
variables $u_1, \dots, u_t$ are iid random variables, not necessarily with zero mean.
for some 0 < K < 1. We discount the past by giving its observations exponen-
tially decreasing weights, which makes sense, and even more so when we write
the estimate as a recursion:
A low value of K forgets the past faster. The formula is computationally efficient
both in terms of storage and computation. For uncentered variance estimation
of a return, this takes the form
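Since the original display is not reproduced above, the following sketch uses one common convention for the exponentially weighted variance, with the decay factor multiplying the previous estimate (RiskMetrics-style); the chapter's own symbols and sign conventions may differ:

```python
import numpy as np

def ewma_variance(r, lam=0.94):
    """Exponentially weighted (uncentered) variance of a return series.

    lam is the decay factor on the previous estimate: higher lam means longer memory.
    This is one common convention (e.g. the RiskMetrics daily setting lam = 0.94);
    it is shown here only to illustrate the O(1)-memory recursion.
    """
    v = np.empty_like(r)
    v[0] = r[0] ** 2
    for t in range(1, r.size):
        v[t] = lam * v[t - 1] + (1 - lam) * r[t] ** 2
    return v
```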
middle with a relevant example. As it happens, this example is also the simplest
non-trivial example of a state-space model. The model (Muth, 1960) posits that
there is a scalar state $x_t$ that evolves randomly over time with the addition of a
gaussian disturbance to its previous value. We observe the state imperfectly;
the observation $y_t$ is a noisy measurement of the value $x_t$. In formulas:
$$x_{t+1} = x_t + \tau_\epsilon\epsilon_{t+1}$$
$$y_t = x_t + \tau_\eta\eta_t$$
The innovations and the measurement noises are gaussian with mean zero, and
they are independent of each other: $\epsilon_s \perp \epsilon_t$, $\eta_s \perp \eta_t$ for all $s \ne t$, and $\epsilon_s \perp \eta_t$ for
all $t$ and $s$. I skip the derivation, which the interested reader can find in the
Appendix. Define the ratio of measurement to innovation noise $\kappa := \tau_\eta/\tau_\epsilon$.
The stationary standard deviation $\hat\sigma_{t+1|t}$ of the state estimate is given by
$$\hat\sigma_{t+1|t}^2 = \tau_\epsilon^2\,\frac{\sqrt{1 + (2\kappa)^2} + 1}{2}$$
and the optimal estimation recursion is
$$K := \frac{\hat\sigma_{t+1|t}^2}{\hat\sigma_{t+1|t}^2 + \tau_\eta^2}$$
$$\hat x_{t+1|t} = (1-K)\,\hat x_{t|t-1} + K\,y_t$$
variance more concentrated around its long-term mean. This means that we
discount the past less. The detailed derivation of these formulas is in the
Appendix, Section 14.3.1.
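As a numerical sanity check (not part of the text), one can iterate the scalar Riccati recursion for this random-walk-plus-noise model and compare the limit with the closed-form stationary variance above:

```python
import numpy as np

tau_eps, tau_eta = 0.5, 1.0                 # hypothetical innovation and measurement noise
kappa = tau_eta / tau_eps

# closed-form stationary prediction variance and gain
sig2_closed = tau_eps**2 * (np.sqrt(1 + (2 * kappa) ** 2) + 1) / 2
K_closed = sig2_closed / (sig2_closed + tau_eta**2)

# iterate the scalar Riccati recursion: sigma2 <- sigma2 - sigma2^2/(sigma2 + tau_eta^2) + tau_eps^2
sig2 = 1.0
for _ in range(1000):
    sig2 = sig2 - sig2**2 / (sig2 + tau_eta**2) + tau_eps**2

print("closed form:", sig2_closed, " iterated:", sig2)   # the two agree
print("gain K     :", K_closed)
```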
$$1 + r_t = e^{\mu + \exp(x_t/2)\xi_t}$$
where $\mu$ is a known constant, and $\xi_t \sim N(0,1)$; hence returns are, at any point
in time, lognormally distributed. Define
$$u_t := \log(1 + r_t) - \mu$$
$$\Rightarrow\; u_t = \exp(x_t/2)\,\xi_t$$
$$\Rightarrow\; \log u_t^2 = x_t + \log\xi_t^2 = x_t + \eta_t + \gamma$$
where $\gamma := E(\log\xi_t^2) \simeq -1.27$, and $\eta_t$ is a zero-mean random variable with
standard deviation $\mathrm{stdev}(\log\xi_t^2) \simeq 2.22$. Define
$$y_t := \log u_t^2 - \gamma = \log\!\left[(\log(1+r_t) - \mu)^2\right] - \gamma$$
so that
$$y_t = x_t + \eta_t$$
Now we posit an evolution equation for $x_t$:
$$x_{t+1} = b + a x_t + \epsilon_t$$
This is the same model as AR(1), from which we obtain an estimate $\hat x_t$. If $\mu = 0$,
then the formulas take a simple form: $u_t \simeq r_t$, and the state estimate is given by
the corresponding exponentially weighted average (see Example 3 in the Appendix).
A closely related specification is
$$r_t = \exp(x_t/2)\,\xi_t$$
Define the observation as before, where $\gamma$ and $\eta_t$ are defined as for the
Harvey-Shephard model above. The model is completed by the equations, also from
the original model,
$$x_{t+1} = b + a x_t + \epsilon_t$$
$$y_t = \log r_t^2$$
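A minimal illustration of this linearization, under the simplifying assumptions that $\mu = 0$ (so $u_t \simeq r_t$), that a fixed gain $K$ is used instead of the full Kalman recursion, and that $\gamma \simeq -1.27$:

```python
import numpy as np

def sv_filter_volatility(r, K=0.2, gamma=-1.27, floor=1e-12):
    """Crude stochastic-volatility filter via the log-squared-return linearization.

    y_t = log r_t^2 - gamma is treated as a noisy observation of the log-variance x_t,
    which is filtered with a fixed-gain (EWMA) recursion. The fixed gain K is a
    simplifying assumption, not the chapter's exact estimator.
    """
    y = np.log(np.maximum(r ** 2, floor)) - gamma   # floor avoids log(0) on zero returns
    x_hat = np.empty_like(y)
    x_hat[0] = y[0]
    for t in range(1, y.size):
        x_hat[t] = (1 - K) * x_hat[t - 1] + K * y[t]
    return np.exp(x_hat / 2)                        # volatility = exp(log-variance / 2)
```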
2.5 Exercises
Exercise 2.1 (Portfolio Covariances).
Exercise 2.3. (20) Provide an example of two random variables that are un-
correlated but dependent.
Exercise 2.5. (30) Let $X, Y$ be two random variables taking values in $\mathbb{R}_+$. Show
that $\mathrm{cor}(X^2, Y^2) > 0$ if and only if $\mathrm{cor}(X, Y) > 0$.
Exercise 2.6. (15) Derive the formula for $E(h_1^2)$ from Equation (2.10).
Exercise 2.7. (10) Prove that if $E(h_1^2)$ is finite, i.e. $\alpha_1 + \beta_1 < 1$, then a
stationary distribution exists, i.e. $E[\log(\beta_1 + \alpha_1\epsilon_0^2)] < 0$. (Hint: use Jensen's
inequality)
The Takeaways
and let $w(\hat\Omega_r)$ be its solution. Denote the realized variance of the portfolio
$\mathrm{var}(w(\hat\Omega_r), \Omega_r)$.
The realized volatility of portfolio $w(\hat\Omega_r)$ is greater than that of $w(\Omega_r)$,
and the two are identical if and only if $\Omega_r \propto \hat\Omega_r$:
$$\frac{\mathrm{var}(w(\hat\Omega_r), \Omega_r)}{\mathrm{var}(w(\Omega_r), \Omega_r)} = \frac{\left(b'\hat\Omega_r^{-1}\Omega_r\hat\Omega_r^{-1}b\right)\left(b'\Omega_r^{-1}b\right)}{\left(b'\hat\Omega_r^{-1}b\right)^2}$$
papers have been written on it, and there are several monographs covering the
Kalman Filter in detail from different perspectives: control (Simon, 2006),
statistical (Harvey, 1990), econometric (Hansen and Sargent, 2008). I cover
the KF for two reasons. First, because, for somewhat mysterious reasons, the
derivation of the KF is often more complicated than it should be. A rigorous yet,
I hope, intuitive proof essentially fits in half a page and should save the reader
a few hours. Secondly, I wanted to present the problem through two different lenses,
and show its close connection to the Linear Quadratic Regulator (LQR). Both
problems are essential tools in the arsenal of the quantitative finance researcher,
so there is value in catching two birds with one stone².
What is different is that factor returns are usually not modeled as being serially
dependent.
²However, should you catch birds, please don't use stones, but nets, or food.
Once we have the posterior distribution given the observation $y_t$, the conditional
distribution of $x_{t+1}$ follows from Equation (14.4). $x_{t+1}$ is Gaussian with the
following conditional mean and covariance matrix:
$$\hat\Sigma_{t+1|t} = A\hat\Sigma_{t|t}A' + \Sigma_\epsilon \qquad \text{(prediction step)} \qquad (14.8)$$
$$\hat x_{t+1|t} = A\hat x_{t|t-1} + A\hat\Sigma_{t|t}B'\!\left(B\hat\Sigma_{t|t}B' + \Sigma_\eta\right)^{-1}\!\left(y_t - B\hat x_{t|t-1}\right) \qquad (14.9)$$
The measurement and time update equations above are the whole of the Kalman
Filter. If we combine Equations (14.6) and (14.8), the covariance matrix evolves
according to the equation
$$\hat\Sigma_{t+1|t} = A\left(\hat\Sigma_{t|t-1} - \hat\Sigma_{t|t-1}B'\!\left(B\hat\Sigma_{t|t-1}B' + \Sigma_\eta\right)^{-1}\!B\hat\Sigma_{t|t-1}\right)A' + \Sigma_\epsilon$$
This is called a Riccati recursion. In steady state the covariance matrix does not
change in consecutive periods: $\hat\Sigma_{t+1|t} = \hat\Sigma_{t|t-1}$. We can solve for the stationary
matrix.
14.3.1 Examples
Example 1 (Muth, 1960):
Define:
$$u_t := x_t - \mu \qquad (14.19)$$
$$v_t := y_t - \mu \qquad (14.20)$$
We rewrite the equation as
$$x_{t+1} - \mu = x_t - \mu + (a-1)(x_t - \mu) + \tau_\epsilon\epsilon_{t+1}$$
$$u_{t+1} = u_t + (a-1)u_t + \tau_\epsilon\epsilon_{t+1}$$
$$u_{t+1} = a u_t + \tau_\epsilon\epsilon_{t+1}$$
The state space equations are
$$u_{t+1} = a u_t + \tau_\epsilon\epsilon_{t+1}$$
$$v_{t+1} = u_{t+1} + \tau_\eta\eta_{t+1}$$
$$(1 - a^2)\,\hat\sigma_{t+1|t}^2 + \frac{a^2\,\hat\sigma_{t+1|t}^4}{\hat\sigma_{t+1|t}^2 + \tau_\eta^2} = \tau_\epsilon^2$$
$$\Rightarrow\; \hat\sigma_{t+1|t}^2 = \frac{1}{2}\left[(a^2-1)\tau_\eta^2 + \tau_\epsilon^2 + \sqrt{(a^2-1)^2\tau_\eta^4 + \tau_\epsilon^4 + 2(a^2+1)\tau_\eta^2\tau_\epsilon^2}\right]$$
$$= \frac{1}{2}\left[(a^2-1)\tau_\eta^2 + \tau_\epsilon^2 + \sqrt{\left[(a^2-1)\tau_\eta^2 + \tau_\epsilon^2\right]^2 + 4\tau_\eta^2\tau_\epsilon^2}\right]$$
$$= \frac{1}{2}\,\tau_\epsilon^2\left[(a^2-1)\kappa^2 + 1\right]\left[1 + \sqrt{1 + \left(\frac{2\kappa}{(a^2-1)\kappa^2 + 1}\right)^2}\,\right]$$
$$K = \frac{\hat\sigma_{t+1|t}^2}{\hat\sigma_{t+1|t}^2 + \tau_\eta^2}$$
$$\hat u_{t+1|t} = (1-K)\,\hat u_{t|t-1} + K\,v_t$$
Now replace $u, v$ using Equations (14.19) and (14.20):
$$\Rightarrow\; \hat x_{t+1|t} = (1-K)\,\hat x_{t|t-1} + K\,y_t$$
For $a = 1$ the formula is identical to that of Example 1; and it is straightforward
to verify that $\hat\sigma_{t+1|t}^2$ is increasing in $a$, and consequently $K$ is also increasing in
$a$. There are two insights to be drawn from this:
1. The EWMA is still an optimal estimator for a mean-reverting model of
volatility.
2. In the presence of mean reversion, $K$ decreases, everything else being
equal. We discount the past less, because mean reversion causes volatility
to be more concentrated. When the volatility is changing less from period
to period, past observations become more informative.
Example 3 (Harvey and Shephard, 1996): The generating process for gross
returns $R_t = P_t/P_{t-1}$ is assumed to be
$$R_t = e^{\mu + \exp(h_t/2)\xi_t}$$
where $\mu$ is a known constant, and $\xi_t \sim N(0,1)$. Define $u_t = \log R_t - \mu$. Then
$u_t = \exp(h_t/2)\xi_t$. Square $u_t$ and take the logarithm to linearize the equation:
$$\log u_t^2 = h_t + \log\xi_t^2 = h_t + \eta_t + \gamma$$
where $\gamma := E(\log\xi_t^2) \simeq -1.2703$, and $\eta_t$ is a zero-mean random variable. Define
$$y_t = \log u_t^2 - \gamma = 2\log|\log R_t - \mu| - \gamma$$
so that we get an observation equation:
$$y_t = h_t + \eta_t$$
Now, we posit an evolution equation for $h_t$:
$$h_{t+1} = b + a h_t + \epsilon_t$$
This is the same model as AR(1), from which we obtain an estimate $\hat h_t$. If $\mu = 0$,
then the formulas take a simple form: $u_t \simeq r_t$ and the volatility estimate is
given by
"1 #
X
ˆt ' exp (1 K) s (log | log Rt 1| )
s=0
One basic result in statistics and in control theory is that, if $E(y^2) < \infty$, the
function that minimizes this expectation is the conditional expectation of $y$
given $x$. We introduce a new variable $\epsilon$:
$$y = E(y|x) + \epsilon \qquad (14.22)$$
It follows that $E(\epsilon) = E(y) - E(E(y|x)) = E(y) - E(y) = 0$. We then have the
following chain:
$$E[(\hat y(x) - y)^2\,|\,x] = E[(\hat y(x) - E(y|x) + E(y|x) - y)^2\,|\,x] \qquad (14.23)$$
$$= E[\epsilon^2|x] + E[(\hat y - E(y|x))^2\,|\,x] - 2E[\epsilon|x]\,(\hat y - E(y|x)) \qquad (14.24)$$
$$= E[\epsilon^2|x] + E[(\hat y - E(y|x))^2\,|\,x] \qquad (14.25)$$
$$\ge E[\epsilon^2|x] \qquad (14.26)$$
The equality holds only if $\hat y = E(y|x)$. The term $E(\epsilon^2)$ is finite, because
$$E(\epsilon^2) \le 2E(y^2) + 2E[E(y|x)^2] \qquad (14.27)$$
$$\le 2E(y^2) + 2E[E(y^2|x)] \qquad \text{(Jensen)} \qquad (14.28)$$
$$= 4E(y^2) \qquad \text{(Iterated Expectation)} \qquad (14.29)$$
$$< \infty \qquad (14.30)$$
In applications, we have $n$ samples $(y_i, x_i)$ and we choose a functional form for
$\hat y = g(x, \theta)$ where $\theta$ is a finite- or infinite-dimensional vector. We then minimize
the empirical squared loss $\frac{1}{n}\sum_i (y_i - g(x_i, \theta))^2$. The simplest form of $g$ is linear:
$g(x, \beta) = \sum_i \beta_i x_i$. In matrix form, Equation (14.22) becomes
$$y = X\beta + \epsilon \qquad (14.31)$$
where $y \in \mathbb{R}^n$, $X \in \mathbb{R}^{n\times m}$, $\beta \in \mathbb{R}^m$; $n$ is the number of observations, and $m$ the
number of "features". We want to estimate the parameters $\beta$, and obtain estimates
for $X\beta$. We then minimize the empirical loss
$$\min_\beta\; \|y - X\beta\|^2 \qquad (14.32)$$
A different way to arrive at the same problem is to posit that the true model is
Equation (14.31), and to further assume that $\epsilon \sim N(0, \sigma^2 I_n)$. If we fix $\beta$, we
have $\epsilon = y - X\beta$; and since we know the distribution of $\epsilon$, we can associate to
a choice of $\beta$ a likelihood $f(\epsilon|\beta)$. If we choose the parameter to maximize the
likelihood, we end up solving the same problem as Equation (14.32). The choice
of maximizing the likelihood is called the Maximum Likelihood Principle⁴.
⁴For a detailed discussion of the MLP, see Robert (2007).
$$\hat y = X\hat\beta = X(X'X)^{-1}X'y \qquad (14.34)$$
$$\hat\epsilon = (I - H)y \qquad (14.35)$$
Intuitively, the optimal estimates should not change if we change the basis of
the subspace. To see this rigorously, transform $X$ into $XQ$, where $Q \in \mathbb{R}^{m\times m}$ is
non-singular. The transformed set of predictors spans the same subspace as $X$.
Then
$$\mathrm{var}(\hat\beta) = \sigma^2 (X'X)^{-1} \qquad (14.39)$$
Similarly,
$$\mathrm{var}(\hat y) = \sigma^2 X(X'X)^{-1}X' \qquad (14.40)$$
$$\mathrm{var}(\hat\beta) = \sigma^2 V\Lambda^{-2}V' \qquad (14.41)$$
$$\mathrm{var}(\hat y) = \sigma^2 UU' \qquad (14.42)$$
The variance of the estimates $\mathrm{var}(\hat\beta)$ becomes larger as the columns of $X$ become
more collinear. In our interpretation of the matrix $X$, this occurs when we
include factors that overlap heavily with pre-existing factors.
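The effect of collinearity on $\mathrm{var}(\hat\beta)$ can be seen directly from the singular values. The sketch below builds a well-conditioned design and a nearly collinear one, and computes $\sigma^2(X'X)^{-1}$ through the SVD as in Equation (14.41); the data are simulated and purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma2 = 500, 1.0

x1 = rng.standard_normal(n)
X_ok = np.column_stack([x1, rng.standard_normal(n)])               # well-conditioned design
X_col = np.column_stack([x1, x1 + 0.01 * rng.standard_normal(n)])  # nearly collinear columns

for name, X in [("independent", X_ok), ("collinear", X_col)]:
    U, lam, Vt = np.linalg.svd(X, full_matrices=False)
    var_beta = sigma2 * Vt.T @ np.diag(lam ** -2) @ Vt             # = sigma2 * (X'X)^{-1}
    print(f"{name:12s} max var(beta_hat) = {var_beta.diagonal().max():.4f}")
```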
$$y = x_1\beta_1 + x_2\beta_2 + \epsilon \qquad (14.43)$$
where we have partitioned the predictors $x = (x_1\,|\,x_2)$. Equation (14.33) can
be rewritten by using block submatrices for $X'X$ and $X'y$, and the formula for
the inverse of block matrices, in order to obtain $\hat\beta_1, \hat\beta_2$. It can be shown that
the coefficient $\hat\beta_2$ can be estimated by a two-step process. First, regress the
columns of $x_2$ on $x_1$: $x_2 = x_1\gamma + u$, with $x_1 \perp u$. Second, regress $y$ on
$u$: $y = u\beta_3 + v$. The least-squares coefficient of this regression is the same as
$\hat\beta_2$, i.e. $\hat\beta_3 = \hat\beta_2$. The proof can be found in Hansen (2022).
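A numerical check of the two-step procedure just described, using plain least squares throughout (simulated data, hypothetical coefficients):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
x1 = rng.standard_normal((n, 2))
x2 = rng.standard_normal((n, 1)) + x1 @ np.array([[0.5], [-0.3]])   # x2 correlated with x1
y = x1 @ np.array([1.0, 2.0]) + x2[:, 0] * 0.7 + rng.standard_normal(n)

# full regression of y on (x1, x2)
X = np.column_stack([x1, x2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]

# two-step: residualize x2 on x1, then regress y on the residual u
u = x2[:, 0] - x1 @ np.linalg.lstsq(x1, x2[:, 0], rcond=None)[0]
beta3 = np.linalg.lstsq(u[:, None], y, rcond=None)[0]

print(beta_full[-1], beta3[0])   # the two coefficients on x2 coincide
```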
Exercise 14.1. If a matrix $X \in \mathbb{R}^{n\times m}$ has nearly collinear columns, then there
is a unit vector $u$ such that $\|Xu\|_2 = h$ for some small positive $h$.