
An introduction to Filtering

Samuel N. Cohen
Mathematical Institute
University of Oxford

©2019, Not for general distribution


Filtering
What are we doing here?

I In these lectures, we are going to look at the basic principles


of ‘stochastic filtering’.
I The key idea is that you have two processes, which are
correlated, and you use observations of one (which you can
see) to determine the behaviour of the other (which you can’t
see)
I Our aim is to give applicable theory, with numerical examples
(all implemented in the statistical environment R (available at
r-project.org))
I We will also give an example with data from high-frequency
trading.

Filtering: Introduction 2
A basic problem
Bayesian estimation of the mean

I We begin with a simple Bayesian estimation problem, which


will lead nicely to filtering.
I We have a hierarchical model for some observations
Y1 , Y2 , ..., YT with unknown mean X .
I For simplicity, suppose X ∼ N(µ0 , τ02 ) and Yi |X ∼ N(X , σ 2 ),
where the Yi are conditionally independent.
I We assume σ, τ0 , µ0 are all known.
I Our aim is to estimate X from the observations of Y ’s.

Filtering: A basic problem 3


The joint density
Bayesian estimation of the mean

We write out the joint density of X and Y1 , expand and complete


the square, to see that
\[
f(x, y) \propto \exp\Big( -\frac{(x-\mu_0)^2}{2\tau_0^2} - \frac{(y-x)^2}{2\sigma^2} \Big)
= \exp\Bigg( -\frac{1}{2}\,
\frac{\Big(x - \frac{\mu_0/\tau_0^2 + y/\sigma^2}{1/\tau_0^2 + 1/\sigma^2}\Big)^2}{1/(1/\tau_0^2 + 1/\sigma^2)}
- \frac{(y-\mu_0)^2}{2(\sigma^2 + \tau_0^2)} \Bigg).
\]

Using Bayes’ theorem, we conclude that


\[
X \mid Y_1 \sim N\Big( \frac{\mu_0/\tau_0^2 + Y_1/\sigma^2}{1/\tau_0^2 + 1/\sigma^2},\ \frac{1}{1/\tau_0^2 + 1/\sigma^2} \Big) =: N(\mu_1, \tau_1^2).
\]

Filtering: A basic problem 4


The correction equations
Bayesian estimation of the mean

I This gives us a way of ‘correcting’ our opinions of X given the


first observation
I We take a weighted average for the mean, and add the inverse
variances (‘precisions’).
I Of course, we can repeat this, to include the second
observation, then the third,..., and after some simplification
we find
\[
X \mid (Y_1, \dots, Y_t) \sim N(\mu_t, \tau_t^2),
\]
where
\[
\mu_t = \frac{\mu_{t-1}/\tau_{t-1}^2 + Y_t/\sigma^2}{1/\tau_{t-1}^2 + 1/\sigma^2}, \qquad
\tau_t^2 = \frac{1}{1/\tau_{t-1}^2 + 1/\sigma^2}.
\]

Filtering: A basic problem 5


Simplification
Bayesian estimation of the mean

I This simplifies, in this setting, to
\[
\mu_t = \frac{\sigma^2 \mu_0 + t\,\tau_0^2\, \bar{Y}_t}{\sigma^2 + t\,\tau_0^2}, \qquad
\tau_t^2 = \frac{1}{1/\tau_0^2 + t/\sigma^2},
\]
with \(\bar{Y}_t = \frac{1}{t}\sum_{s=1}^{t} Y_s\).
I This simplification is special to this particular setting.

Example 1
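
A minimal R sketch of this update rule (not the original Example 1 code; the parameter values below are illustrative):

# Simulate observations with an unknown mean X, then apply the correction rule
# recursively: add precisions, take a precision-weighted average of the means.
set.seed(1)
sigma <- 1; tau0 <- 2; mu0 <- 0        # assumed known parameters (illustrative)
X <- rnorm(1, mu0, tau0)               # the hidden mean
Y <- rnorm(50, X, sigma)               # conditionally independent observations

mu <- mu0; tau2 <- tau0^2
for (y in Y) {
  prec <- 1 / tau2 + 1 / sigma^2       # add the precisions
  mu   <- (mu / tau2 + y / sigma^2) / prec
  tau2 <- 1 / prec
}
c(posterior_mean = mu, posterior_sd = sqrt(tau2), true_X = X)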

Filtering: A basic problem 6


The correction dynamics
Bayesian estimation of the mean

I Let’s focus on the way the distribution changes.


I Whenever we get a new observation Yt , we correct our
estimate of X , by updating the conditional distribution with
the rule
\[
(\mu_{t-1}, \tau_{t-1}^2) \xrightarrow{\;Y_t\;} (\mu_t, \tau_t^2) \xrightarrow{\;Y_{t+1}\;} (\mu_{t+1}, \tau_{t+1}^2).
\]

I This is the basic idea of filtering: we have a hidden value X ,


and use our observations to update an estimate of X (in
particular, its conditional distribution).

Filtering: A basic problem 7


A simple filtering problem
Bayesian estimation of a changing mean

I Instead of X being constant, we will now assume that X is a


random process.
I In particular, we will take X0 ∼ N(µ0 , τ02 ) and

Xt |Xt−1 ∼ N(Xt−1 , γ 2 ), Yt ∼ N(Xt , σ 2 ).

I Equivalently,

X0 = µ0 + τ0 W0 , Xt = Xt−1 + γWt , Yt = Xt + σVt ,

where W , V are standard white noise (i.e. Wt , Vt are


independent N(0, 1)).
I Here γ, σ, µ0 and τ0 are all known.
I We call X the signal process and Y the observation process.

Filtering: The simplest filtering problem 8


Dependence diagram
Bayesian estimation of a changing mean

The dependence diagram for our model is the following:

X0 −→ X1 −→ X2 −→ · · · −→ XT
       ↓     ↓               ↓
       Y1    Y2     · · ·    YT

Our conclusion depends on learning from the observations Y , but


also takes account of the fact that X is changing through time.

Filtering: The simplest filtering problem 9


Solving the filter
Bayesian estimation of a changing mean

I We want to find the distribution of Xt given Y1 , ..., Yt .


I Write Ft = σ(Xs , Ys ; s ≤ t) for the ‘full information filtration’
and Yt = σ(Ys ; s ≤ t) for the ‘observation filtration’.
I We can now repeat calculations similar to those we did before:

I From the dynamics, we have the prediction:

X0 ∼ N(µ0 , τ02 ) ⇒ X1 ∼ N(µ0 , τ02 + γ 2 )


I Writing \(\tau_{1|0}^2 = \tau_0^2 + \gamma^2\), Bayes’ rule gives the correction:
\[
X_1 \mid Y_1 \sim N\Big( \frac{\mu_0/\tau_{1|0}^2 + Y_1/\sigma^2}{1/\tau_{1|0}^2 + 1/\sigma^2},\ \frac{1}{1/\tau_{1|0}^2 + 1/\sigma^2} \Big) =: N(\mu_{1|1}, \tau_{1|1}^2).
\]

Filtering: The simplest filtering problem 10


Solving the filter
Bayesian estimation of a changing mean

I In general, we write µt|t−1 for the mean of Xt given Yt−1 and


µt|t for the mean of Xt given Yt , similarly for the variances
τt|t−1² and τt|t².
I Our system can be described in two steps, prediction and
correction:
\[
(\mu_{t-1|t-1}, \tau_{t-1|t-1}^2) \xrightarrow{\text{Prediction}} (\mu_{t|t-1}, \tau_{t|t-1}^2) \xrightarrow{\text{Correction}} (\mu_{t|t}, \tau_{t|t}^2),
\]
where
\[
\mu_{t|t-1} = \mu_{t-1|t-1}, \qquad \tau_{t|t-1}^2 = \tau_{t-1|t-1}^2 + \gamma^2,
\]
\[
\mu_{t|t} = \frac{\mu_{t|t-1}/\tau_{t|t-1}^2 + Y_t/\sigma^2}{1/\tau_{t|t-1}^2 + 1/\sigma^2}, \qquad
\tau_{t|t}^2 = \frac{1}{1/\tau_{t|t-1}^2 + 1/\sigma^2}.
\]

Filtering: The simplest filtering problem 11


Solving the filter
Bayesian estimation of a changing mean

I By iterating these equations, we solve our filtering problem,


that is, we have a complete description of the distribution of
Xt given Yt for every t.
I These calculations are recursive, so including new observations
is simple (and fast!).

(µ0|0, τ0|0²) −→ (µ1|0, τ1|0²) −−Y1−→ (µ1|1, τ1|1²) −→ (µ2|1, τ2|1²) −−Y2−→ (µ2|2, τ2|2²) −→ · · ·

Filtering: The simplest filtering problem 12


Solving the filter
Bayesian estimation of a changing mean

Example 2

I These equations can be solved quickly, using only basic


methods.
I Updating only involves addition, multiplication and division.
I Division can be largely avoided by using precisions (τ −2 )
instead of variances (τ 2 ).
I Observe that τt|t² converges quickly to a stationary value; the
limit is found by solving the equation
\[
\tau^2 = \frac{1}{1/(\tau^2 + \gamma^2) + 1/\sigma^2}
\quad\Rightarrow\quad
\tau^2 = \frac{1}{2}\Big( \gamma\sqrt{4\sigma^2 + \gamma^2} - \gamma^2 \Big).
\]
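
A short R sketch of the prediction/correction recursion and the stationary variance (illustrative values; not the course's Example 2 code):

# Simulate the signal and observations, run the scalar filter, and compare the
# long-run filter variance with the stationary value above.
set.seed(2)
gamma <- 0.1; sigma <- 0.5; mu0 <- 0; tau0 <- 1
T <- 200
x0 <- rnorm(1, mu0, tau0)
X  <- x0 + cumsum(rnorm(T, 0, gamma))          # signal X_1, ..., X_T
Y  <- X + rnorm(T, 0, sigma)                   # observations

mu <- numeric(T); tau2 <- numeric(T)
m <- mu0; t2 <- tau0^2
for (t in 1:T) {
  t2 <- t2 + gamma^2                           # prediction
  prec <- 1 / t2 + 1 / sigma^2                 # correction
  m  <- (m / t2 + Y[t] / sigma^2) / prec
  t2 <- 1 / prec
  mu[t] <- m; tau2[t] <- t2
}
c(filter_mean = mu[T], true_signal = X[T], filter_var = tau2[T],
  stationary_var = (gamma * sqrt(4 * sigma^2 + gamma^2) - gamma^2) / 2)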

Filtering: The simplest filtering problem 13


The Kalman Filter
The general setup

I We have just solved a simple case of the famous Kalman


filtering problem.
I The general case has two differences: our processes are vector
valued and the relationship between X and Y is more general
(but still linear).
I These simple generalizations yield an extraordinarily powerful
technique.

Filtering: Discrete Time: The Kalman filter 14


The Kalman Filter
The general setup

Consider the following model:

Xt = AXt−1 + Wt , Yt = CXt + Vt

with starting distribution X0 ∼ N(µ0|0 , P0|0 ).


Here
I W , V are white noise processes in Rk and Rd with respective
(nonnegative definite) variances Γ and Σ
I In other words, Wt ∼ N(0, Γ) and Vt ∼ N(0, Σ) for all t, all
values independent.
I A and Γ are k × k-matrices, C is d × k, Σ is d × d.
I We know A, C , Γ, Σ.

Filtering: Discrete Time: The Kalman filter 15


The key equations
Conditioning normal distributions

The key fact we will need is that


if you have jointly (multivariate) normal random variables
Y , Z , then Y |Z is also normal.

Furthermore
E [Y |Z ] = E [Y ] + cov(Y , Z )var(Z )−1 (Z − E [Z ])
var(Y |Z ) = var(Y ) − cov(Y , Z )var(Z )−1 cov(Y , Z )>

These facts can be proven using the densities, and justify


everything that follows.
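
As an illustration (not from the slides), these formulas translate directly into code; the function name and the values below are purely illustrative:

# Conditional mean and variance of Y given Z = z for jointly normal (Y, Z).
cond_normal <- function(muY, muZ, SigmaYY, SigmaYZ, SigmaZZ, z) {
  W <- SigmaYZ %*% solve(SigmaZZ)              # cov(Y,Z) var(Z)^{-1}
  list(mean = muY + W %*% (z - muZ),
       var  = SigmaYY - W %*% t(SigmaYZ))
}
# Scalar example with correlation 0.8: gives mean 0.8 and variance 0.36.
cond_normal(muY = 0, muZ = 1,
            SigmaYY = matrix(1), SigmaYZ = matrix(0.8), SigmaZZ = matrix(1),
            z = 2)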

Filtering: Discrete Time: The Kalman filter 16


The Kalman Filter: Prediction
Step 1 of the filter

I We know that Xt |Yt = Xt |(Y1 , Y2 , ..., Yt ) is normal (and


similarly Xt |Yt−1 ),
I Using the dynamics of X and Y , we can easily calculate the
prediction equations:

µt|t−1 = E [Xt |Yt−1 ] = E [AXt−1 + Wt |Yt−1 ]


= AE [Xt−1 |Yt−1 ]
= Aµt−1|t−1
Pt|t−1 = var(Xt |Yt−1 ) = var(AXt−1 + Wt |Yt−1 )
= Avar(Xt−1 |Yt−1 )A> + var(Wt |Yt−1 )
= APt−1|t−1 A> + Γ

Filtering: Discrete Time: The Kalman filter 17


The Kalman Filter: Kalman Gain
Step 2a of the filter

I The correction equations are made simpler if we first calculate


the ‘innovation’ process η and its variance S
I η tells us what ‘new’ information we learn from Yt

ηt = Yt − E [Yt |Yt−1 ] = Yt − C µt|t−1 ,


St = var(ηt |Yt−1 ) = var(Yt |Yt−1 ) = CPt|t−1 C > + Σ.

I Using S, we can calculate the ‘Kalman gain’ process, which


allows us to optimally incorporate new information,

Kt = Pt|t−1 C > St^{−1} = (St^{−1} C Pt|t−1 )>

Filtering: Discrete Time: The Kalman filter 18


The Kalman Filter: Correction
Step 2b of the filter

Finally, it is easy to calculate the correction equations:

µt|t = µt|t−1 + Kt ηt ,
Pt|t = (I − Kt C )Pt|t−1 .

Given these equations, we are ready to calculate!

Example 3
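
A compact R sketch of the prediction/gain/correction recursion (illustrative only; the course's Example 3 code is not reproduced here). Y is a d x T matrix of column-wise observations, and the function also returns the one-step predictions, which are reused for smoothing below.

kalman_filter <- function(Y, A, C, Gamma, Sigma, mu0, P0) {
  k <- nrow(A); T <- ncol(Y)
  mu_f <- mu_p <- matrix(0, k, T)
  P_f <- P_p <- array(0, c(k, k, T))
  mu <- mu0; P <- P0
  for (t in 1:T) {
    # Prediction
    mu_p[, t]  <- A %*% mu
    P_p[, , t] <- A %*% P %*% t(A) + Gamma
    # Innovation, its variance, and the Kalman gain
    eta <- Y[, t] - C %*% mu_p[, t]
    S   <- C %*% P_p[, , t] %*% t(C) + Sigma
    K   <- P_p[, , t] %*% t(C) %*% solve(S)
    # Correction
    mu <- mu_p[, t] + K %*% eta
    P  <- (diag(k) - K %*% C) %*% P_p[, , t]
    mu_f[, t] <- mu; P_f[, , t] <- P
  }
  list(mean = mu_f, var = P_f, pred_mean = mu_p, pred_var = P_p)
}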

Filtering: Discrete Time: The Kalman filter 19


The Kalman Filter: Forecasting
Easy with Matrices!

I Using our equations, it is easy to see how to calculate the


forecasted values E [Xt |Ys ] for s < t.
I By direct recursion:

µt|s = E [Xt |Ys ] = A^{t−s} µs|s .

I Furthermore, the conditional variance Pt|s = var(Xt |Ys )


satisfies
Pt+1|s = APt|s A> + Γ
which is easy to calculate recursively.

Filtering: Discrete Time: The Kalman filter 20


The Kalman Filter: Smoothing
Harder, but useful!

I Calculating the ‘smoother’, that is, µt|N = E [Xt |YN ] for


t < N is also possible.
I First write Jt = Pt|t A> (Pt+1|t)^{−1} . Then, using our basic
properties of normal distributions (and plenty of algebra),

µt|N = µt|t + Jt (µt+1|N − µt+1|t ),


Pt|N = Pt|t + Jt (Pt+1|N − Pt+1|t )Jt> ,

I These can be calculated backwards, starting at time N.


I In effect, you first do a single forward pass through the
observations from 0 → N calculating the filter, then
backwards from N → 0 to calculate the smoother.
Example 3 (ctd)
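
An illustrative R sketch of the backward pass, reusing the output of the kalman_filter sketch given earlier (so `fit` is assumed to be its return value):

kalman_smoother <- function(fit, A) {
  mu_s <- fit$mean; P_s <- fit$var
  mu_p <- fit$pred_mean; P_p <- fit$pred_var
  T <- ncol(mu_s)
  for (t in (T - 1):1) {
    J <- fit$var[, , t] %*% t(A) %*% solve(P_p[, , t + 1])   # J_t
    mu_s[, t]  <- fit$mean[, t] + J %*% (mu_s[, t + 1] - mu_p[, t + 1])
    P_s[, , t] <- fit$var[, , t] + J %*% (P_s[, , t + 1] - P_p[, , t + 1]) %*% t(J)
  }
  list(mean = mu_s, var = P_s)
}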

Filtering: Discrete Time: The Kalman filter 21


The Kalman Filter: Smoothing
One-step correlations

I We shall see that, when trying to fit a filter in practice, it will


also be useful to know the values of

Pt−1,t|N := E [(Xt − µt|N )(Xt−1 − µt−1|N )> |YN ].

I Fortunately, there is a formula:

\[
P_{N-1,N|N} = (I - K_N C) A P_{N-1|N-1}, \qquad
P_{t-1,t|N} = P_{t|t} J_{t-1}^\top + J_t \big( P_{t,t+1|N} - A P_{t|t} \big) J_{t-1}^\top.
\]

I The derivation is even more algebra than before.


I It can also be calculated using a single sweep back through
the data.
Exercise: prove these formulae!

Filtering: Discrete Time: The Kalman filter 22


Example: An ARMA(1,1) process
A common time series model

To see how rich a theory this gives, consider an ARMA(1,1)


process, where
xt = φxt−1 + θzt−1 + zt
for constants φ, θ and white noise z.
I We only observe xt .
I It’s difficult to calculate E [xt |xt−1 , xt−2 , ...], which is usually
needed when fitting these models.
I This does not look like the models we’ve considered... until
we write it as a ‘state space’ model.

Filtering: Discrete Time: The Kalman filter 24


Example: An ARMA(1,1) process
Surprisingly a Kalman Filter model!

We can write
\[
X_t = \begin{pmatrix} x_t \\ \theta z_t \end{pmatrix}
    = \begin{pmatrix} \phi & 1 \\ 0 & 0 \end{pmatrix}
      \begin{pmatrix} x_{t-1} \\ \theta z_{t-1} \end{pmatrix}
    + \begin{pmatrix} 1 \\ \theta \end{pmatrix} z_t
    = A X_{t-1} + W_t
\]
and
\[
Y_t = x_t = \begin{pmatrix} 1 & 0 \end{pmatrix} X_t .
\]
I Hence we can apply the Kalman filter to X , and so efficiently
calculate
\[
\begin{pmatrix} 1 & 0 \end{pmatrix} \mu_{t|t-1}
= \begin{pmatrix} 1 & 0 \end{pmatrix} E[X_t \mid \mathcal{Y}_{t-1}]
= E[x_t \mid x_{t-1}, x_{t-2}, \dots].
\]

I In our earlier notation, we have Σ = 0, A and C as indicated, and
\[
\Gamma = \begin{pmatrix} 1 & \theta \\ \theta & \theta^2 \end{pmatrix}.
\]
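
An illustrative R sketch (phi, theta and the sample size are assumed values): simulate an ARMA(1,1) path, cast it into the state-space form above, and run the kalman_filter sketch from earlier to get the one-step-ahead predictions.

set.seed(3)
phi <- 0.7; theta <- 0.3; T <- 500
z <- rnorm(T); x <- numeric(T)
for (t in 2:T) x[t] <- phi * x[t - 1] + theta * z[t - 1] + z[t]

A     <- matrix(c(phi, 0, 1, 0), 2, 2)       # = [[phi, 1], [0, 0]]
C     <- matrix(c(1, 0), 1, 2)
Gamma <- matrix(c(1, theta, theta, theta^2), 2, 2)
Sigma <- matrix(0, 1, 1)                     # no observation noise in this model

fit  <- kalman_filter(matrix(x, nrow = 1), A, C, Gamma, Sigma,
                      mu0 = c(0, 0), P0 = diag(2))
pred <- (A %*% fit$mean)[1, ]                # E[x_{t+1} | x_t, x_{t-1}, ...]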
Filtering: Discrete Time: The Kalman filter 25
Hidden Markov Models
Another simple filter

I The equations we’ve seen have been fairly ‘nice’.


I The filters can be solved in closed form, recursively, and are
finite dimensional.
I This is because we have assumed throughout that all our
random variables are Gaussian, and all the relationships
between them are linear.
I Without this assumption, as we will see in continuous time,
we are in a much more difficult situation.
I One other case where a nice set of equations can be obtained
is when X is a finite-state Markov chain.

Filtering: Discrete Time: Hidden Markov Models 26


Hidden Markov Models
A general setup

I Suppose X is a finite-state Markov chain. We write X as a


process Xt = AXt−1 + Mt where X takes values in the basis
vectors in Rd , and M is a martingale difference process (so
E [Mt |Ft−1 ] = 0).
I The matrix A> is the familiar transition matrix of the Markov
chain.
I We just need to calculate the probability X takes values in
each state, or equivalently, the vector µt|t = E [Xt |Yt ] ∈ Rd
(as P(Xt = ei |Yt ) = E [ei> Xt |Yt ] = ei> µt|t ).
I We assume that Yt |Ft ∼ c(y ; Xt )m(dy ), where c is some
density function and m is some measure (no normality is
needed).

Filtering: Discrete Time: Hidden Markov Models 27


The Filter
Still easy to calculate

I We can directly calculate the prediction equation:

µt|t−1 = E [Xt |Yt−1 ] = E [AXt−1 + Mt |Yt−1 ] = Aµt−1|t−1 .

I To calculate the correction equation, we use Bayes’ theorem:

\[
P(X_t = e_i \mid Y_t, \mathcal{Y}_{t-1})
= \frac{c(Y_t; e_i)\, P(X_t = e_i \mid \mathcal{Y}_{t-1})}{\sum_j c(Y_t; e_j)\, P(X_t = e_j \mid \mathcal{Y}_{t-1})}
\propto c(Y_t; e_i)\, P(X_t = e_i \mid \mathcal{Y}_{t-1}).
\]
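
A hedged R sketch of a single filter step (not from the slides): `mu` is the current probability vector, `A` the matrix from Xt = AXt−1 + Mt (so its transpose is the usual transition matrix), and `dens(y)` returns the vector of observation densities (c(y; e1), ..., c(y; ed)).

hmm_filter_step <- function(mu, A, y, dens) {
  pred <- A %*% mu                    # prediction: mu_{t|t-1} = A mu_{t-1|t-1}
  corr <- dens(y) * as.vector(pred)   # correction: weight by observation densities
  corr / sum(corr)                    # normalize
}
# Illustrative two-state example with Gaussian observation densities:
A    <- matrix(c(0.95, 0.05, 0.10, 0.90), 2, 2)   # columns sum to 1
dens <- function(y) dnorm(y, mean = c(-1, 1), sd = 1)
hmm_filter_step(c(0.5, 0.5), A, y = 0.8, dens = dens)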

Filtering: Discrete Time: Hidden Markov Models 28


Forecasting and Smoothing
Simple algorithms

I Again, forecasting is easy: µt|s = A^{t−s} µs|s for s < t.


I Smoothing can be done with a backward pass, by looking at a
‘dual’ variable ν satisfying the equation (for N > t)

νt|N ∝ A> C (Yt+1 )νt+1|N , νN|N = 1,

and then calculating µt|N ∝ µt|t νt|N , where the product is


taken component by component.
I There are closed form equations for other quantities also (for
example, estimating occupation times, the number of
transitions, functions of X and Y , ... see Elliott, Aggoun and
Moore, Hidden Markov Models, Springer 1995)

Example 4
Filtering: Discrete Time: Hidden Markov Models 29
Continuous time
Much more technically difficult

I So now we change gear a little technically, as we want to see


what happens in continuous time.
I This is particularly useful as a model when observations occur
at very high frequency, as it allows us to find good
approximations to our problem.
I On the other hand, it becomes more difficult to find and solve
the filtering equations.

Filtering: Continuous Time: The key equations 30


The reference probability method
A nice version of Bayes’ theorem

I The approach we shall take is called the ‘reference probability


method’.
I It depends on the following result, which will serve as “Bayes’
theorem” in this context.

Theorem
Suppose we have a probability measure Q ∼ P. Write the
Radon–Nikodym density Z = dQ/dP, and suppose we have a
filtration {Ft }t≥0 . Then for any t ≥ 0 and any random variable ξ,
we know that
\[
E_Q[\xi \mid \mathcal{F}_t] = \frac{E_P[Z \xi \mid \mathcal{F}_t]}{E_P[Z \mid \mathcal{F}_t]}.
\]

Filtering: Continuous Time: The key equations 31


A continuous model
Common basic time series model

I We assume as before that we have processes X and Y , on an


interval [0, T ].
I These satisfy the SDEs

dXt = f (t, Xt )dt + κ(t, Xt )dBt


dYt = c(t, Xt )dt + dWt

where f , κ, c are known (Lipschitz continuous) functions, and


B and W are Brownian motions.
I We assume X and Y are scalar and B and W are independent
for simplicity.
I These assumptions can be relaxed, but the notation becomes
more difficult.

Filtering: Continuous Time: The key equations 32


Feynman–Kac
Connecting SDEs and PDEs

I From the Feynman–Kac theorem/Ito’s lemma, we know that


for any smooth bounded function φ,
\[
\phi(X_t) = \phi(X_0) + \int_{[0,t]} L\phi(X_u)\, du + \text{martingale},
\]
where L is the infinitesimal generator of X , that is,
\[
L\phi = f(t,x)\,\frac{\partial \phi}{\partial x} + \frac{1}{2}\,\kappa(t,x)^2\,\frac{\partial^2 \phi}{\partial x^2}.
\]
I We expect L to be part of the solution to our filtering
problem.

Filtering: Continuous Time: The key equations 33


Changing measure
Making Bayes’ theorem work for us

I We define a probability Q by dQ/dP = Z_T , where
\[
Z_t = \mathcal{E}\Big( -\int_0^{\cdot} c(s, X_s)\, dW_s \Big)_t
    = \exp\Big( -\int_0^t c(s, X_s)\, dW_s - \frac{1}{2}\int_0^t c(s, X_s)^2\, ds \Big).
\]

I We write Λ = 1/Z , and using Ito’s lemma we can see that


dΛt = Λt c(t, Xt ) dYt .
I Using Girsanov’s theorem, this change of measure has the
effect of changing the drift in Y , so under Q we have the
dynamics

dXt = f (t, Xt )dt + κ(t, Xt )dBt , dYt = dWtQ

where B and W Q are independent Q-Brownian motions.


I X and Y are independent under Q!
Filtering: Continuous Time: The key equations 34
Unnormalized expectations
Expanding with Ito

I We will now try to calculate the unnormalized expectations,


which we write:

σt (φ) := EQ [Λt φ(Xt )|Yt ].

I “Bayes’ theorem” tells us that EP [φ(Xt )|Yt ] = σt (φ)/σt (1).


I Now, we can write out Λs φ(Xs ) using Ito’s lemma. This gives

\[
d(\Lambda\phi(X))_t = \Lambda_t \frac{\partial\phi}{\partial x}\, dX_t + \frac{1}{2}\Lambda_t \frac{\partial^2\phi}{\partial x^2}\,\kappa(t,X_t)^2\, dt + \Lambda_t \phi(X_t) c(t,X_t)\, dY_t
= \Lambda_t L\phi(X_t)\, dt + \Lambda_t \frac{\partial\phi}{\partial x}\,\kappa(t,X_t)\, dB_t + \Lambda_t \phi(X_t) c(t,X_t)\, dY_t.
\]

Filtering: Continuous Time: The key equations 35


Unnormalized expectations
Using independence

I Taking an expectation, as (X , B) and Y are Q-independent,


we have the ‘Zakai equation’
\[
\begin{aligned}
\sigma_t(\phi) &= E_Q[\Lambda_t \phi(X_t) \mid \mathcal{Y}_t] \\
&= \sigma_0(\phi) + \int_0^t E_Q[\Lambda_s L\phi(X_s) \mid \mathcal{Y}_t]\, ds + \int_0^t E_Q[\Lambda_s \phi(X_s) c(s, X_s) \mid \mathcal{Y}_t]\, dY_s \\
&= \sigma_0(\phi) + \int_0^t E_Q[\Lambda_s L\phi(X_s) \mid \mathcal{Y}_s]\, ds + \int_0^t E_Q[\Lambda_s \phi(X_s) c(s, X_s) \mid \mathcal{Y}_s]\, dY_s \\
&= \sigma_0(\phi) + \int_0^t \sigma_s(L\phi)\, ds + \int_0^t \sigma_s(\phi c)\, dY_s.
\end{aligned}
\]

I This is a simple equation apart from one thing: the term


σs (φc) cannot be calculated recursively in terms of σs (φ).

Filtering: Continuous Time: The key equations 36


Normalized expectations
Simplifying with Ito

I Rearranging and applying Ito’s lemma, we can obtain an


equation for the normalized expectations

πs (φ) := σs (φ)/σs (1) = E [φ(Xs )|Ys ],

the ‘Fujisaki–Kallianpur–Kunita’ equation


\[
\pi_t(\phi) = \pi_0(\phi) + \int_{[0,t]} \pi_s(L\phi)\, ds + \int_{[0,t]} \big( \pi_s(\phi c) - \pi_s(\phi)\pi_s(c) \big)\, dV_s.
\]

I Here dVs = dYs − πs (c)ds is the (differential of the)


‘innovations process’ (and is a Y-Brownian motion under P).

Filtering: Continuous Time: The key equations 37


The Density equation
Finding an SPDE

I Let’s assume X has a smooth density given Yt , so
\(\sigma_t(\phi) = \int_{\mathbb{R}} \phi(x) q(t,x)\, dx\). Then we see that
\[
\int_{\mathbb{R}} \phi(x) q(t,x)\, dx = \int_{\mathbb{R}} \phi(x) q(0,x)\, dx
 + \int_0^t \int_{\mathbb{R}} L\phi(x)\, q(s,x)\, dx\, ds
 + \int_0^t \int_{\mathbb{R}} \phi(x) c(s,x)\, q(s,x)\, dx\, dY_s.
\]
I By integration by parts, if L∗ is the adjoint of L,
\[
L^* q = -\frac{\partial (f q)}{\partial x} + \frac{1}{2}\,\frac{\partial^2 (\kappa^2 q)}{\partial x^2},
\]
we calculate
\[
\int_{\mathbb{R}} \phi(x) q(t,x)\, dx = \int_{\mathbb{R}} \phi(x) \Big( q(0,x) + \int_0^t L^* q(s,x)\, ds + \int_0^t c(s,x)\, q(s,x)\, dY_s \Big)\, dx.
\]

Filtering: Continuous Time: The key equations 38


The Density equation
Finding an SPDE

I This should hold for every smooth and bounded φ, so we have


the linear SPDE
\[
q(t, x) = q(0, x) + \int_0^t L^* q(s, x)\, ds + \int_0^t c(s, x)\, q(s, x)\, dY_s.
\]

We can then calculate the density of Xt |Yt as

\[
p(t, x) = \frac{q(t, x)}{\int_{\mathbb{R}} q(t, x')\, dx'}.
\]

I One can also get a nonlinear SPDE for the normalized density.
I Solving SPDEs is hard, so this equation is not frequently
solved in practice in this general form – instead it suggests
good approximations, or allows special cases to be derived.

Filtering: Continuous Time: The key equations 39


The Kalman–Bucy filter
The Continuous-time Gaussian model

I Let’s see the continuous-time Gaussian case.


I Here we assume c(t, Xt ) = cXt , f (t, Xt ) = aXt and
κ(t, Xt ) = b. Then we have the dynamics

dXt = aXt dt + b dBt , dYt = cXt dt + dWt

I Here a, b, c are known.


I With Ỹ = Y /c and f = 1/c, this is the same as the model
for observations d Ỹt = Xt dt + f dWt .
I We know that these equations define a Gaussian process (i.e.
all marginals are jointly normal), so it’s enough to calculate
the mean and variance.

Filtering: Continuous Time: The Kalman–Bucy Filter 40


The Kalman–Bucy filter
Simplifying...

I Write X̂t = EP [Xt |Yt ].


I First observe that everything here is Gaussian, and X̂ − X is
uncorrelated with Ys for all s < t.
I In particular, this implies they are independent, and

E [(Xt − X̂t )2 |Yt ] = E [(Xt − X̂t )2 ] =: Pt

is deterministic.
I Also, E [(Xt − X̂t )3 |Yt ] = 0.

Filtering: Continuous Time: The Kalman–Bucy Filter 41


The Kalman–Bucy filter
Applying the FKK equation

I Taking φ(x) = x, so X̂t = πt (φ), we know Lφ(x) = ax (the second-derivative term vanishes), and so
\[
\hat{X}_t = \hat{X}_0 + \int_0^t a \hat{X}_s\, ds + c \int_0^t \big( \pi_s(X_s^2) - \hat{X}_s^2 \big)\, dV_s
 = \hat{X}_0 + \int_0^t a \hat{X}_s\, ds + c \int_0^t P_s\, dV_s.
\]

I Notice this is in terms of the innovations process V .

Filtering: Continuous Time: The Kalman–Bucy Filter 42


The Kalman–Bucy filter
Applying the FKK equation

I Taking φ(x) = x²,
\[
\pi_t(X^2) = \pi_0(X^2) + \int_{[0,t]} \big( 2a\,\pi_s(X^2) + b^2 \big)\, ds
 + c \int_{[0,t]} \big( \pi_s(X^3) - \hat{X}_s \pi_s(X^2) \big)\, dV_s,
\]
\[
\hat{X}_t^2 = \hat{X}_0^2 + \int_0^t \big( 2a \hat{X}_s^2 + c^2 P_s^2 \big)\, ds + 2c \int_0^t \hat{X}_s P_s\, dV_s.
\]
I Taking a difference and simplifying, we obtain a Riccati
equation for the variance P
\[
P_t = \pi_t(X^2) - \hat{X}_t^2 = P_0 + \int_0^t \big( 2a P_s + b^2 - c^2 P_s^2 \big)\, ds.
\]

Filtering: Continuous Time: The Kalman–Bucy Filter 43


The Kalman–Bucy filter
The filter

I Together, we have an SDE for the mean


\[
\hat{X}_t = \hat{X}_0 + \int_0^t a \hat{X}_s\, ds + c \int_0^t P_s\, dV_s,
\]
and a (deterministic) Riccati equation for the variance
\[
P_t = P_0 + \int_0^t \big( 2a P_s + b^2 - c^2 P_s^2 \big)\, ds.
\]
I This pair of equations is called the ‘Kalman–Bucy filter’.
I It can then be approximated using the usual methods for
SDEs/ODEs (eg Euler methods)
I It is possible to obtain a Kalman–Bucy smoother as well.
Example 5
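
An illustrative Euler discretisation of these equations in R (a sketch; a, b, c and the time grid are assumed values, and Example 5 is not reproduced):

set.seed(5)
a <- -0.5; b <- 1; c <- 2
dt <- 0.001; Tend <- 5; n <- Tend / dt
X <- numeric(n); Y <- numeric(n)                    # simulate signal and observations
for (i in 2:n) {
  X[i] <- X[i - 1] + a * X[i - 1] * dt + b * sqrt(dt) * rnorm(1)
  Y[i] <- Y[i - 1] + c * X[i - 1] * dt + sqrt(dt) * rnorm(1)
}
Xhat <- numeric(n); P <- numeric(n); P[1] <- 1      # filter: Euler steps
for (i in 2:n) {
  dV      <- (Y[i] - Y[i - 1]) - c * Xhat[i - 1] * dt        # innovation increment
  Xhat[i] <- Xhat[i - 1] + a * Xhat[i - 1] * dt + c * P[i - 1] * dV
  P[i]    <- P[i - 1] + (2 * a * P[i - 1] + b^2 - c^2 * P[i - 1]^2) * dt
}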

Filtering: Continuous Time: The Kalman–Bucy Filter 44


The Wonham filter
Continuous Markov Chains

I Just as in discrete time, there is a continuous time equation


for the filter based on a (continuous time) Markov chain.
I Here we have the dynamics

dXt = AXt dt + dMt


dYt = c > Xt dt + dWt

where A> is the Q-matrix of the Markov chain, M is a


martingale and c is a vector.

Filtering: Continuous Time: The Wonham Filter 45


The Wonham filter
Continuous Markov Chains

I As X is written using only basis vectors in RN , any function of


X can be written Φ(X ) = φ> X for some vector φ ∈ RN .
I While X is not of the form we considered earlier, we can still
find the generator of X is LΦ = A> φ, and the adjoint of L is
simply L∗ v = Av .
I We can calculate (from the Zakai equation) the unnormalized
probability vector for the state of X
\[
E[X_t \mid \mathcal{Y}_t] \propto q_t = q_0 + \int_0^t A q_u\, du + \int_0^t \mathrm{diag}(c)\, q_u\, dY_u.
\]

I This equation is just an N-dimensional linear SDE. Equations


for the smoother are also known.
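
A hedged R sketch of an Euler scheme for this linear SDE (illustrative; the arguments are assumed: dY is the vector of observation increments over steps of length dt, and cvec is the vector c):

wonham_filter <- function(dY, A, cvec, q0, dt) {
  q <- matrix(0, length(q0), length(dY) + 1)
  q[, 1] <- q0
  for (i in seq_along(dY)) {
    qi <- q[, i]
    q[, i + 1] <- qi + A %*% qi * dt + diag(cvec) %*% qi * dY[i]
  }
  sweep(q, 2, colSums(q), "/")     # normalize each column to get probabilities
}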

Filtering: Continuous Time: The Wonham Filter 46


Calibrating a filter
Trying to make things usable.

I What we have seen so far deals with the problem of how to


take our observations Y and obtain the behaviour of X .
I However, we have assumed throughout that we know the
probability model, that is, all the other parameters are fixed.
I The question of how to estimate those parameters is what we
consider next.
I This problem has a wide range of approaches, depending on
the details involved.
I We shall focus on a simple case, using the EM algorithm
(discussed in Elliott, van der Hoek and Malcolm (2005),
based on Shumway and Stoffer (1982)).

Filtering: Calibration 47
Calibrating a filter
A setup

I We focus on the following simple scalar version of the discrete


time Kalman filter:
Xt+1 = a + bXt + cWt+1
Yt = Xt + f Vt .

I If we could observe X and Y directly, then we could calculate


a, b, c and f easily using regression.
I If we cannot observe X , then we need to use a more advanced
method.

Filtering: Calibration 48
Calibrating a filter
The equations

I Our filtering equations simplify to
\[
\mu_{t+1|t} = a + b\,\mu_{t|t}, \qquad P_{t+1|t} = b^2 P_{t|t} + c^2,
\]
\[
K_{t+1} = \frac{P_{t+1|t}}{P_{t+1|t} + f^2},
\]
\[
\mu_{t+1|t+1} = \mu_{t+1|t} + K_{t+1}(y_{t+1} - \mu_{t+1|t}), \qquad
P_{t+1|t+1} = (1 - K_{t+1}) P_{t+1|t} = f^2 K_{t+1}.
\]
I The smoothing equations are (with \(J_t = b P_{t|t} / P_{t+1|t}\))
\[
\mu_{t|N} = \mu_{t|t} + J_t \big( \mu_{t+1|N} - (a + b\,\mu_{t|t}) \big), \qquad
P_{t|N} = P_{t|t} + J_t^2 \big( P_{t+1|N} - P_{t+1|t} \big),
\]
\[
P_{t-1,t|N} = J_{t-1} P_{t|t} + J_t J_{t-1} \big( P_{t,t+1|N} - b P_{t|t} \big), \qquad
P_{N-1,N|N} = b (1 - K_N) P_{N-1|N-1}.
\]

Filtering: Calibration 49
Calibrating a filter
The EM algorithm

I So, how to estimate the parameters?


I Simply regressing the smoothed values of X gives bias, as we
expect X will be ‘rougher’ than the smoothed values.
I The likelihood function is hard to compute, as it depends on
X and Y
I If we assumed we could calculate the expectation, then we
could instead try to maximize the expected log-likelihood
E [`(a, b, c, f ; {Xt , Yt }t≤T )|YT ]
I We can then iterate (calculate parameters) ↔ (calculate filter
estimates) until convergence. This is the
“Expectation-Maximization algorithm”, as we iterate between
(Maximum likelihood step) ↔ (Expectation step).

Filtering: Calibration 50
Calibrating a filter
The maximum estimates

I The maximization step can be solved! (all sums run from t = 1 to N):
\[
\hat{b} = \frac{E\big[\sum \big(X_{t-1} - \tfrac{1}{N}\sum X_{t-1}\big)\big(X_t - \tfrac{1}{N}\sum X_t\big) \,\big|\, \mathcal{Y}_N\big]}{E\big[\sum (X_t - \bar{X})^2 \,\big|\, \mathcal{Y}_N\big]}
= \frac{\sum P_{t-1,t|N} + \sum \mu_{t|N}\mu_{t-1|N} - \tfrac{1}{N}\sum \mu_{t|N} \sum \mu_{t-1|N}}{\sum P_{t|N} + \sum \mu_{t|N}^2 - \tfrac{1}{N}\big(\sum \mu_{t|N}\big)^2}
\]
\[
\hat{a} = \frac{1}{N}\sum \mu_{t|N} - \hat{b}\,\frac{1}{N}\sum \mu_{t-1|N}
\]
\[
\begin{aligned}
\hat{c}^2 &= \frac{1}{N}\, E\Big[ \sum \big(X_t - \hat{a} - \hat{b} X_{t-1}\big)^2 \,\Big|\, \mathcal{Y}_N \Big] \\
&= \frac{1}{N} \sum \Big( P_{t|N} + \mu_{t|N}^2 + \hat{a}^2 - 2\hat{a}\mu_{t|N} + 2\hat{a}\hat{b}\mu_{t-1|N}
 + \hat{b}^2\big(P_{t-1|N} + \mu_{t-1|N}^2\big) - 2\hat{b} P_{t-1,t|N} - 2\hat{b}\mu_{t|N}\mu_{t-1|N} \Big)
\end{aligned}
\]
\[
\hat{f}^2 = \frac{1}{N} \sum \Big( (Y_t - \mu_{t|N})^2 + P_{t|N} \Big)
\]
Filtering: Calibration 51
Calibrating a filter
Running the EM algorithm

I We start off with approximate estimates of a, b, c, f


I We iterate the EM algorithm to improve these estimates
I Convergence may be slow (or get stuck) given bad starting
points.
I Given a large amount of data, it may be worth starting with
only a small subsample, then increasing the amount of data
used as you go.

Example 6
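
An illustrative R sketch of the whole EM loop for this scalar model (the course's Example 6 code is not shown; y is the observation vector and a, b, c, f are rough starting guesses). For simplicity the M-step sums run over t = 2, ..., N rather than handling the initial state exactly, and the b-update uses the standard lagged regression denominator.

kalman_em <- function(y, a, b, c, f, mu0 = 0, P0 = 1, n_iter = 100) {
  N <- length(y)
  for (iter in 1:n_iter) {
    ## E-step: forward filtering pass
    mu_p <- P_p <- mu_f <- P_f <- K <- numeric(N)
    m <- mu0; P <- P0
    for (t in 1:N) {
      mu_p[t] <- a + b * m;  P_p[t] <- b^2 * P + c^2
      K[t] <- P_p[t] / (P_p[t] + f^2)
      m <- mu_p[t] + K[t] * (y[t] - mu_p[t])
      P <- (1 - K[t]) * P_p[t]
      mu_f[t] <- m; P_f[t] <- P
    }
    ## E-step: backward smoothing pass, with one-step cross-covariances
    mu_s <- mu_f; P_s <- P_f
    J <- c(b * P_f[-N] / P_p[-1], 0)               # J_t for t = 1, ..., N-1
    for (t in (N - 1):1) {
      mu_s[t] <- mu_f[t] + J[t] * (mu_s[t + 1] - mu_p[t + 1])
      P_s[t]  <- P_f[t] + J[t]^2 * (P_s[t + 1] - P_p[t + 1])
    }
    P_cross <- numeric(N)                          # P_cross[t] = P_{t-1,t|N}
    P_cross[N] <- b * (1 - K[N]) * P_f[N - 1]
    for (t in (N - 1):2) {
      P_cross[t] <- J[t - 1] * P_f[t] + J[t] * J[t - 1] * (P_cross[t + 1] - b * P_f[t])
    }
    ## M-step: regression-type updates from the smoothed moments
    i1 <- 2:N; i0 <- 1:(N - 1); n <- N - 1
    S11 <- sum(P_s[i1] + mu_s[i1]^2)
    S00 <- sum(P_s[i0] + mu_s[i0]^2)
    S10 <- sum(P_cross[i1] + mu_s[i1] * mu_s[i0])
    b <- (S10 - sum(mu_s[i1]) * sum(mu_s[i0]) / n) / (S00 - sum(mu_s[i0])^2 / n)
    a <- (sum(mu_s[i1]) - b * sum(mu_s[i0])) / n
    c <- sqrt((S11 + n * a^2 + b^2 * S00 - 2 * a * sum(mu_s[i1])
               + 2 * a * b * sum(mu_s[i0]) - 2 * b * S10) / n)
    f <- sqrt(mean((y - mu_s)^2 + P_s))
  }
  c(a = a, b = b, c = c, f = f)
}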

Filtering: Calibration 52
Pairs trading
A simple application

I We will look at using these methods to create a basic pairs


trading system, using a toy setup, with real data.
I We will build this using one-second mid-prices for Microsoft
(MSFT) and Intel Corp. (INTC), on individual days in the
week beginning 3 November 2014.
I Thanks to Álvaro Cartea and Sebastian Jaimungal for data.

Filtering: Application: Pairs Trading 53


Pairs trading
A model

I We will model this using the method in previous section, as


suggested by Elliott, van der Hoek and Malcolm (2005).
I We fit the filter using Y = log(INTC/MSFT) using the
Kalman–EM method described above.
I We will then create a trading signal depending on the value of
Yt − µt|t .
I If our model is reasonable, we expect this value will revert
quickly to zero, which suggests a profitable trade, either long
INTC and short MSFT (if Y < µt|t ) or vice versa.
I Effectively, we expect prices to oscillate around a short-term
mean.

Filtering: Application: Pairs Trading 54


Pairs trading
A model

I We choose to trade only when the difference is sufficiently


large, in such a way that we have a position 1% of the time.
I We reevaluate our position every second, and only
invest/short at most $1 in each stock.
I We ignore all transaction costs, microstructure issues, trading
constraints, etc.
Example 7
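
A toy R sketch of this rule (Example 7's code is not reproduced; the vectors intc and msft of one-second mid-prices and the filtered mean mu are assumed to exist):

Y <- log(intc / msft)
spread <- Y - mu
threshold <- quantile(abs(spread), 0.99)            # in a position about 1% of the time
position <- ifelse(spread >  threshold, -1,         # short INTC / long MSFT
            ifelse(spread < -threshold,  1, 0))     # long INTC / short MSFT
# Approximate P&L from one-second log-returns, at most $1 per leg, ignoring
# transaction costs and microstructure effects.
ret <- position[-length(Y)] * (diff(log(intc)) - diff(log(msft)))
pnl <- cumsum(ret)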

Filtering: Application: Pairs Trading 55


Pairs trading
A model

I This suggests that these methods can be used to build


profitable trading strategies.
I Of course, we would need to incorporate further effects into
our model of profits before implementing this in practice, as in
the real world we can only buy at the ask and sell at the bid,
which will likely eliminate most of our observed gains.
I Filtering is fast, which is important in this setting.

Filtering: Application: Pairs Trading 56


Conclusion
What have we done

I We have looked at the problem of filtering in a variety of


contexts.
I Discrete/Continuous time
I Gaussian/Finite state (or general with an SPDE solution)
I We have seen how you can implement these filters, and how
to estimate the coefficients in a simple setting
I The EM algorithm can be used more widely
I We have seen a toy application of these methods to financial
data

Filtering: Conclusion 57
