An Introduction To Stochastic Control
F. J. Silva
Dip. di Matematica Guido Castelnuovo
March 2012
Some disperse but useful results
This section is based on [24, Chapter 1]. Let Ω be a nonempty set, and let F ⊂ 2Ω.
We say that F is a π-system if A, B ∈ F ⇒ A ∩ B ∈ F.
If two probability measures P and Q coincide on a π-system A, then they coincide on σ(A). In fact, it is enough to define C := {C ∈ F ; P(C) = Q(C)} and to verify that it is a λ-system.
Corollary 1. [Measurable functions w.r.t. the σ-field of a π-system] Let A be a π-system.
Let H be a linear space of functions from Ω to R such that
with an analogous approximation for φ−. Therefore, by (iii), it is enough to show that
φn ∈ H. But each φn is a finite sum of indicators of elements of σ(A). Therefore it is natural to consider the set
F := {A ⊆ Ω ; I_A ∈ H}
and to prove that σ(A) ⊆ F. But this follows from lemma 1, since A is a π-system contained in F, and F is easily shown to be a λ-system.
Theorem 1. [Dynkin theorem] Let (Ω, F) and (Ω′, F′) be two measurable spaces, and let (U, d) be a Polish space. Let ξ : Ω → Ω′ and φ : Ω → U be r.v.'s. Then φ is σ(ξ)-measurable, i.e. φ⁻¹(B(U)) ⊆ ξ⁻¹(F′), iff there exists a measurable η : Ω′ → U such that
φ(ω) = η(ξ(ω)) for all ω ∈ Ω.
Proof: Consider the case U = R (the general case can be obtained using an isomorphism theorem, see [19]) and define the set
H := {η(ξ) ; η : Ω′ → R an F′-measurable map}.
We have to show that the set of σ(ξ)/B(R)-measurable maps is contained in H. This can be done by checking the assumptions of corollary 1 with A = σ(ξ) and proving that H satisfies (i), (ii) and (iii).
Lemma 2. [Borel-Cantelli] Let (Ω, F , P) be a probability space and consider a sequence
of events Ai ∈ F . We have
Σ_i P(A_i) < ∞ ⇒ P(∩_i ∪_{j≥i} A_j) = 0.
Proof: Straightforward. Note that P(∩i ∪j≥i Aj ) ≤ P(∪j≥iAj ) for all i and use the
convergence of the series.
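As a quick numerical illustration (a sketch, not part of the original notes): take independent events A_n with P(A_n) = 1/n², so that Σ_n P(A_n) < ∞; the lemma says that a.s. only finitely many A_n occur, and the expected number of occurrences is Σ_n 1/n² = π²/6 ≈ 1.64.

```python
import random

def occurrences(n_events, rng):
    """Draw independent events A_n with P(A_n) = 1/n^2 and count
    how many of them occur (the summable case of Borel-Cantelli)."""
    return sum(1 for n in range(1, n_events + 1) if rng.random() < 1.0 / n ** 2)

rng = random.Random(0)
trials = 500
counts = [occurrences(2000, rng) for _ in range(trials)]
# The number of occurrences is a.s. finite; its mean is close to pi^2/6.
avg = sum(counts) / trials
```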
Conditional expectation
This section is based on [24, Chapter 1]. Consider a probability space (Ω, F , P). Let
X ∈ L1(Ω) and G be a sub-σ -field of F . Define the signed measure µ : G → R as
µ(A) := ∫_A X(ω) dP(ω) for all A ∈ G.
By the Radon-Nikodym theorem there exists a r.v. f ∈ L¹_G(Ω) (i.e. in particular G-measurable), unique P|G-a.s., such that
∫_A f dP = ∫_A X dP for all A ∈ G. (2)
We then define
E(X|G) := f.
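When G is generated by a finite partition, (2) forces f to be constant on each atom and equal there to the average of X over the atom. A minimal sketch (the sample space and the choice of X below are illustrative, not from the notes):

```python
from fractions import Fraction

# Uniform P on Omega = {0,...,11}; G generated by the partition into
# atoms {0..3}, {4..7}, {8..11}.  On each atom, (2) forces E(X|G) to be
# the average of X over the atom.
omega = list(range(12))
atoms = [set(range(0, 4)), set(range(4, 8)), set(range(8, 12))]
X = {w: w * w for w in omega}          # X(w) = w^2

def cond_exp(X, atoms):
    """E(X|G) as a function on Omega, constant on each atom."""
    f = {}
    for A in atoms:
        avg = Fraction(sum(X[w] for w in A), len(A))
        for w in A:
            f[w] = avg
    return f

f = cond_exp(X, atoms)

# Defining property (2): the integrals of f and X agree on every atom
# (hence on every A in G, a union of atoms).
for A in atoms:
    assert sum(f[w] for w in A) == sum(X[w] for w in A)
```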
Fundamental properties of E(·|G)
All the properties below are simple consequences of (2) (Do them as an exercise!)
(iv) [Take out the measurable part] For X ∈ L2F and Y ∈ L2G , we have
E(Y X|G) = Y E(X|G).
(vii) [Jensen inequality] Let φ be convex and such that φ(X) ∈ L1(Ω); then
φ(E(X|G)) ≤ E(φ(X)|G).
[Conditioning one r.v. X w.r.t. another r.v. ξ ]
Let X ∈ L1F and ξ : (Ω, F) → (U, B(U)). Note that we can always define
E(X|ξ) := E(X|ξ⁻¹(B(U))) = η(ξ) for some B(U)/B(R)-measurable function η,
E(X|ξ = x) := η(x).
Another way to define this is by appealing to the Radon-Nikodym theorem. In fact, let us define on B(U) the measure
ν(B) := ∫_{ξ⁻¹(B)} X(ω) dP(ω),
which is absolutely continuous with respect to Pξ := P ∘ ξ⁻¹ (the image measure of P under ξ). Therefore, there exists a Radon-Nikodym derivative dν/dPξ (unique Pξ-a.s.) such that
∫_B (dν/dPξ)(x) dPξ(x) = ∫_{ξ⁻¹(B)} X(ω) dP(ω) for all B ∈ B(U).
We define
E(X|ξ = x) := (dν/dPξ)(x).
We claim that
η(ξ(ω)) = E(X|ξ⁻¹(B(U)))(ω) = (dν/dPξ)(ξ(ω)) P|_{ξ⁻¹(B(U))}-a.s. (3)
Integrating over a set of the form ξ⁻¹(A) and using the definition of Pξ, we obtain that
η(x) = (dν/dPξ)(x) Pξ-a.s.
Note that, incidentally, this gives another proof of Dynkin's theorem. Let us prove (3). We have
∫_{ξ⁻¹(B)} (dν/dPξ)(ξ(ω)) dP(ω) = ∫_B (dν/dPξ)(x) dPξ(x)
= ν(B)
= ∫_{ξ⁻¹(B)} X(ω) dP(ω)
= ∫_{ξ⁻¹(B)} E(X|ξ⁻¹(B(U)))(ω) dP(ω).
P(A|G) := E(I_A|G).
Note that for each B ∈ F, P(B|G) is only defined P|G-a.s. Thus, it is not guaranteed that we can find some A ∈ G with P(A) = 1 such that, fixing any ω ∈ A, B ↦ P(B|G)(ω) is a measure on all of F. However, we can give a sense to this using the concept of regular conditional probability (see [2] for more on this).
[Characterization of E(·|ξ) in terms of {g(ξ) ; g bounded continuous}] Let us now
prove that
The “only if” part is direct. To prove the “if” part, first consider the π-system
A := {ξ⁻¹([a, b]) ; a < b}.
If A = ξ⁻¹([a, b]) ∈ A we have
E(I_A X) = E(I_{[a,b]}(ξ)X).
Now, take a uniformly bounded sequence of continuous functions g_n → I_{[a,b]} pointwise. Since E(g_n(ξ)X) = 0, by passing to the limit we get that E(I_A X) = 0, and so assumption (ii) of corollary 1 is verified.
Note that the same proof yields that if G = σ(ξ1, ..., ξn) we have
The interesting fact is that the result is also valid when we condition on a countable
set of r.v.'s. More precisely, using the above result and the technique of its proof we get
the following result.
Consider a sequence of variables ξ1, ξ2, . . . and define G := σ({ξi ; i ∈ N}). Then
Stochastic processes: Basic definitions
(i) [Symmetry condition] For any permutation σ of (t1, · · · , tn) we have
Question: Given a family of functions Ft1,...,ti (·, . . . , ·) satisfying the symmetry and compatibility conditions, is there a stochastic process with the desired finite-dimensional distributions?
Theorem 2. [Kolmogorov’s existence theorem] Given a family Ft1,...,ti (·, . . . , ·)
satisfying (4) and (5), there exists a probability space (Ω, F , P) and a stochastic process
X whose finite dimensional distributions are the desired ones.
Given two process X and X 0 we say that X 0 is a modification of X if for all t ≥ 0
we have P(X(t) = X 0(t)) = 1. Note that the negligible sets depend on the time t.
We say that X 0 is a version of X if there exists A ∈ F with P(A) = 1 such that
X(ω, ·) = X 0(ω, ·) for all ω ∈ A.
Despite the above example, it should be noted that modifications are in fact very
useful. For example, a modification of a process preserves its finite-dimensional laws.
[Measurability of process] A stochastic process X in [0, T ] is:
(ii) adapted, if for all t ∈ [0, T ] we have that X(t) is Ft/B(R) measurable.
(iii) progressively measurable (to simplify we will say “progressive”) if for all t ∈ [0, T ]
we have that X : [0, t] × Ω → R is B([0, t]) × Ft/B(R)-measurable.
Evidently, every progressive process is adapted. What happens with the converse? In
general, it is not true. However,
Proposition 2. [Adapted continuous processes are progressive] If an adapted process X is
left or right continuous, then it is progressively measurable.
It is easy to see that the X^n are progressive and, by right continuity, they converge pointwise to X; hence X is progressive.
For the general case, we have the following difficult result (for the proof see [17]) :
Proposition 3. [Progressive modifications] Every adapted process has a progressive
modification.
Now, we construct two natural filtrations in the space W[0, T ] := C[0, T ]. Set
Remark 1. Note that by definition W[0, T ] ∉ B(Wt[0, T ]); this is one reason to consider σ(B(Wt[0, T ])), the σ-field in W[0, T ] generated by B(Wt[0, T ]), which by definition contains W[0, T ].
Note also that Bt(W[0, T ]) ≠ Bt+ (W[0, T ]). In fact, the event
We denote by
A(U ) the set of Bt+ (W[0, T ])-progressively measurable processes (6)
with values in a Polish space U, i.e. if f ∈ A(U ), the restriction of f to [0, t] × W[0, T ] is B([0, t]) × Bt+ (W[0, T ])/B(U )-measurable.
The following theorem is the extension to filtrations of Dynkin's theorem (theorem 1). For the proof see [24].
Theorem 3. [Dynkin theorem for filtrations generated by a continuous process] Consider
(Ω, F , P) and a Polish space U . Let ξ : [0, T ] × Ω → R be a continuous process and
denote its associated filtration by Ftξ := σ(ξ(s) ; 0 ≤ s ≤ t). Then a process
φ : [0, T ] × Ω → U is {Ftξ }t≥0 adapted iff there exists η ∈ A(U ) such that
Next we provide a sufficient condition for a.s. Hölder continuity (up to a modification of the process) of a general stochastic process.
Then X has a modification which is α-Hölder continuous, for every α ∈ (0, γ).
Stopping times
Let (Ω, F , {Ft}t≥0, P) be a filtered probability space satisfying the usual conditions.
Fτ := {A ∈ F ; A ∩ {τ ≤ t} ∈ Ft, ∀ t ≥ 0}.
Can we replace ≤ by < in the definitions of stopping time and of Fτ? The answer is yes, because the filtration is right-continuous (exercise).
We will only list some interesting properties of stopping times. See [14] for the proofs.
σ + τ ; sup_i σ_i ; inf_i σ_i ; lim sup_i σ_i ; lim inf_i σ_i.
Proposition 5. [Tower property along stopping times] Let τ and σ be two stopping
times. Then
Proposition 6. [Stochastic process and stopping times] Suppose that the filtration
satisfies the natural conditions. Let X be progressive and τ a stopping time. Then
X(τ ) is Fτ measurable and the process X(τ ∧ t) is progressive.
Martingales
Fundamental properties
[Optional sampling theorem, or martingale property over stopping times] Let σ ≤
τ be two bounded stopping times. Then, for any martingale (resp. submartingale,
supermartingale) X(t) we have
Multivariate Normal distribution
((2π)^d det Σ)^{−1/2} exp(−½ (x − µ)ᵀ Σ⁻¹ (x − µ)) for all x ∈ R^d.
The particular form of the density yields that if Σ is diagonal then the coordinates of X are independent one-dimensional variables with normal distribution (one can factorize the multiple integral in order to compute the distributions).
Proposition 7. [Characterization of a multivariate gaussian] X is a multivariate
gaussian iff for every b ∈ Rd we have that b>X is a univariate gaussian.
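The moments predicted by proposition 7 for bᵀX are easy to check by Monte Carlo: bᵀX should be N(bᵀµ, bᵀΣb). A sketch (the parameters µ, Σ, b and the Cholesky-based sampler are illustrative choices, not from the notes):

```python
import math, random

def chol2(S):
    """Cholesky factor of a 2x2 symmetric positive definite S = L L^T."""
    l11 = math.sqrt(S[0][0])
    l21 = S[1][0] / l11
    l22 = math.sqrt(S[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]

def sample_mvn(mu, L, rng):
    """X = mu + L Z with Z standard gaussian, so Cov(X) = L L^T."""
    z = [rng.gauss(0, 1), rng.gauss(0, 1)]
    return [mu[i] + L[i][0] * z[0] + L[i][1] * z[1] for i in range(2)]

rng = random.Random(1)
mu, Sigma = [1.0, -2.0], [[2.0, 0.6], [0.6, 1.0]]
L = chol2(Sigma)
b = [3.0, -1.0]

n = 20000
ys = [b[0] * x[0] + b[1] * x[1] for x in (sample_mvn(mu, L, rng) for _ in range(n))]
mean = sum(ys) / n
var = sum((y - mean) ** 2 for y in ys) / n
# Predicted: mean b.mu = 5.0 and variance b^T Sigma b = 15.4.
```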
Brownian motion
An adapted R^m-valued process X(t) is called a Brownian motion over [0, ∞) if for
all 0 ≤ s < t we have that:
(iii) X(t) − X(s) is normally distributed with mean 0 and covariance matrix (t − s)I.
Therefore,
In particular, it is equivalent to say that we have m independent one-dimensional Brownian
motions. In order to simplify the exposition we will work only with a one-dimensional
Brownian motion, but we will come back to the multidimensional case a bit before treating
stochastic differential equations.
It is seen that W, endowed with this metric, is a Polish space. We say that C ⊆ W is a cylinder in W if for some j ∈ N and 0 ≤ t1 < · · · < tj , there exists E ∈ B(R^j) such that
We call C the set of cylinders.
which belongs to σ(C). Since W is separable, open sets are countable unions of balls and
thus B(W ) ⊆ σ(C).
Let µ be a probability measure on (R, B(R)) and consider the density function of a N (0, t),
f(x, t) := (2πt)^{−1/2} e^{−|x|²/(2t)} for t > 0, x ∈ R.
We define Pµ as follows: for any Ei ∈ B(R) we let
Pµ({ξ ∈ W ; ξ(t1) ∈ E1, . . . , ξ(tj) ∈ Ej}) := ∫_R µ(dx0) ∫_{E1} f(x1 − x0, t1) dx1 · · · ∫_{Ej} f(xj − xj−1, tj − tj−1) dxj. (9)
It is seen that Pµ is additive. The difficult part consists in proving that it is σ-additive. This was shown by Wiener in 1923. Using lemma 4 we can extend Pµ to (W, B(W)). This extension (which of course is unique) is called the Wiener measure with initial distribution µ. Then, on the filtered probability space (W, B(W), {Bt(W)}t≥0, Pµ), we obtain a Brownian motion by defining
where E ∈ B(R^j). We denote by B(R^{[0,∞)}) the σ-field generated by the cylinders. We define a family of a posteriori finite-dimensional distributions by replacing in (9) each Ei by (−∞, xi]. It is easy to show that this family satisfies (4) and (5). Therefore, by applying theorem 2 we obtain a stochastic process X that has (9) as finite-dimensional distributions.
The only delicate point is that a priori there is no a.s. continuity. In fact, it is easy to see that W ∉ B(R^{[0,∞)}) (we only have countable information, which is not enough to establish continuity, since R^{[0,∞)} is too large) and that W contains only the empty set as a measurable set belonging to B(R^{[0,∞)}).
(iii) [Approximation by random walks] Given a family {Y (i) ; i = 1, . . . , n} of independent r.v.'s with P(Y (i) = 1) = P(Y (i) = −1) = 1/2, we define the symmetric random walk as
M(0) := 0 and M(k) := Σ_{j=1}^k Y (j) for k = 1, . . . , n.
Let us properly scale the process (recalling the Central Limit Theorem), defining
W^n(t) := (1/√n) M(nt) for all t ≥ 0.
Another proof of the existence of the Brownian motion consists in proving the convergence of this process to a process that satisfies the desired properties. This is the so-called Donsker invariance principle (see [5]).
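The diffusive scaling is easy to observe numerically: by the CLT, W^n(1) = M(n)/√n is approximately N(0, 1) for large n. A sketch (step and sample counts are illustrative):

```python
import random

def scaled_walk_endpoint(n, rng):
    """W^n(1) = M(n)/sqrt(n) for a symmetric random walk M with n steps."""
    return sum(rng.choice((-1, 1)) for _ in range(n)) / n ** 0.5

rng = random.Random(0)
n, reps = 400, 2000
finals = [scaled_walk_endpoint(n, rng) for _ in range(reps)]
mean = sum(finals) / reps
var = sum(f * f for f in finals) / reps
# CLT: W^n(1) -> N(0, 1) in distribution, so mean ~ 0 and var ~ 1.
```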
Set tk := k/n. Note that the following properties are clear (easy exercise!)
In particular we have that
(W^n(t_k))² − [W^n, W^n](t_k) is a discrete-time martingale.
Return to the Brownian motion W (t): We now prove the analogous properties for the
limit case, i.e. for W (t).
E(W (t)|Fs) = E(W (t)−W (s)+W (s)|Fs) = W (s)+E(W (t)−W (s)|Fs) = W (s).
(ii) Let π^n = (t^n_i)_{i≥0} (with t^n_0 = 0) be a sequence of partitions of R+. Let ∆t^n_i := t^n_{i+1} − t^n_i and suppose that
|π^n| := sup_{i≥0} ∆t^n_i → 0 as n ↑ ∞.
Define
QV^{π^n}(t) := Σ_{i≥1} |W(t^n_i ∧ t) − W(t^n_{i−1} ∧ t)|².
Proposition 8. We have that QV^{π^n}(t) → t in L²(Ω), for all t ≥ 0.
Proof: We have
E[(QV^{π^n}(t) − t)²] = E[(Σ_{i≥1} {|W(t^n_i ∧ t) − W(t^n_{i−1} ∧ t)|² − (t^n_i ∧ t − t^n_{i−1} ∧ t)})²].
Writing out the square, using the independence of the increments and the fact that
W(t^n_i ∧ t) − W(t^n_{i−1} ∧ t) ∼ N(0, t^n_i ∧ t − t^n_{i−1} ∧ t),
we can eliminate the cross-terms and we obtain that
E[(QV^{π^n}(t) − t)²] = Σ_{i≥1} E[(|W(t^n_i ∧ t) − W(t^n_{i−1} ∧ t)|² − (t^n_i ∧ t − t^n_{i−1} ∧ t))²].
If a r.v. Y ∼ N(0, t) then E(Y⁴) = 3t² (exercise!, which is also useful to prove the a.s. α-Hölder continuity of W(·) when α ∈ (0, 1/2)). Thus
E[(QV^{π^n}(t) − t)²] = 2 Σ_{i≥1} (t^n_i ∧ t − t^n_{i−1} ∧ t)² ≤ 2t|π^n| → 0.
In fact, if the sequence of partitions has a mesh converging fast enough to 0 (for example for the dyadic partitions t^n_i := i2^{−n}) it is possible to prove a.s. convergence for the whole trajectory QV^{π^n}(·) (see [7]).
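The convergence QV^{π^n}(t) → t is easy to observe in simulation; a sketch with a uniform partition (the step count and seed are illustrative):

```python
import random

def quadratic_variation(n, t, rng):
    """Sum of squared Brownian increments over a uniform partition of [0, t]."""
    dt = t / n
    return sum(rng.gauss(0.0, dt ** 0.5) ** 2 for _ in range(n))

rng = random.Random(42)
t = 2.0
qv = quadratic_variation(20000, t, rng)
# Proposition 8: E(QV - t)^2 = 2 t^2 / n, so qv should lie within a few
# hundredths of t = 2.0 here.
```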
Before passing to the third identity, let us first discuss some important consequences
of the above result: First, from the inequality
QV^{π^n}(t) ≤ max_{j≥1} |W(t^n_j ∧ t) − W(t^n_{j−1} ∧ t)| · Σ_{i≥1} |W(t^n_i ∧ t) − W(t^n_{i−1} ∧ t)|,
we see, by the continuity of W(·), that V^{π^n}(t) := Σ_{i≥1} |W(t^n_i ∧ t) − W(t^n_{i−1} ∧ t)| satisfies V^{π^n}(t) → ∞ a.s. This implies that a.s. the total variation of W(·) is +∞. Therefore, it is not possible to define (see [21])
∫_0^T f(t) dW(t, ω) for all f ∈ C[0, T ]
in the Lebesgue-Stieltjes sense.
[The augmented canonical filtration and the Blumenthal zero-one law] Clearly, by
the continuity of the trajectories, the natural filtration associated with W (·), given by
Ft = σ(W(s) ; s ≤ t), is left-continuous. However, it is easy to see that it is not right-continuous (think again of the event of having a local maximum). Therefore, it does not satisfy the natural conditions. Now we are going to complete the filtration and then we will show that the new filtration is right-continuous.
Proof: We will show that F0+ is independent of itself, which clearly implies the result. Let 0 < t1 < ... < tk and let g : R^k → R be bounded continuous. For A ∈ F0+, by continuity we get
For ε small enough, we obtain the independence between the above increments and Fε,
and so the increments are also independent of F0+. Thus
E(I_A g(W(t1), . . . , W(tk))) = lim_{ε→0} P(A) E[g(W(t1) − W(ε), . . . , W(tk) − W(ε))] = P(A) E(g(W(t1), . . . , W(tk))).
Using the usual argument based on the monotone class theorem, we obtain that F0+ is independent of σ(W(s) ; 0 < s ≤ t) = σ(W(s) ; 0 ≤ s ≤ t) (equality by continuity of W(·)), and so we get the independence of F0+ of itself.
We define
Proof: The left continuity is clear. Clearly F̄0 ⊆ F̄0+ and, by the above result, F̄0+ ⊆ σ(F0 ∪ N(F)) = F̄0, which shows the right-continuity at zero. At the other times the same argument applies, using the independence of the increments.
[Almost surely nowhere differentiable] As we have seen before, the Brownian motion has α-Hölder continuous paths for every α ∈ [0, 1/2). What happens for α ∈ [1/2, 1)?
Define G(α, c, ε) as the set of all ω ∈ Ω such that for some s ∈ [0, 1] we have
|W(s, ω) − W(t, ω)| ≤ c|s − t|^α for all t with |s − t| ≤ ε. (11)
The proof of the following theorem is taken from the nice book [22].
Theorem 7. If α > 1/2, then G(α, c, ε) has probability 0 for all 0 < c < ∞ and ε > 0.
Proof: First divide [0, 1] into intervals of length 1/n. At k/n consider m blocks to the right (where m will be fixed and chosen later!). For m/n ≤ ε, set
X_{n,k}(ω) := max{ |W((j+1)/n, ω) − W(j/n, ω)| ; k ≤ j < k + m }.
If ω ∈ G(α, c, ε), then at least one block B_{n,k̄} := [k̄/n, (k̄ + m)/n] is such that, for the neighborhood where (11) holds, we have B_{n,k̄} ⊆ [s − ε, s + ε]. By the triangle inequality and (11) we get that X_{n,k̄}(ω) ≤ 2c(m/n)^α. Therefore, we have
G(α, c, ε) ⊆ { ω ; min_{0≤k≤n−m} X_{n,k}(ω) ≤ 2c(m/n)^α }.
On the other hand, by the properties of the increments of the Brownian motion W(t) (independence and stationarity), we get:
P(X_{n,k} ≤ 2c(m/n)^α) = P(|W(1/n)| ≤ 2c(m/n)^α)^m = P(n^{−1/2}|W(1)| ≤ 2c(m/n)^α)^m.
Using the trivial bound P(|W(1)| ≤ x) ≤ 2x/√(2π), we easily obtain
P(X_{n,k} ≤ 2c(m/n)^α) ≤ (4c m^α n^{1/2−α}/√(2π))^m.
Thus,
P(G(α, c, ε)) ≤ n P(X_{n,k} ≤ 2c(m/n)^α) ≤ (4c m^α/√(2π))^m n^{1+m(1/2−α)}.
Since α > 1/2, choosing m such that 1 + m(1/2 − α) < 0 makes the right-hand side tend to 0 as n → ∞, which gives the result.
As a corollary, we get
Corollary 2. With probability one, the paths of the Brownian motion are nowhere differentiable on [0, 1].
[Anecdotal remark] Weierstrass in 1872 constructed the following function, which is continuous and nowhere differentiable:
f(t) := Σ_{n≥1} cos(3^n t)/2^n.
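The partial sums of this series are easy to evaluate; since the tail is dominated by Σ_{n>N} 2^{-n} = 2^{-N}, the convergence is uniform, which gives continuity. A small sketch (note that for large n the floating-point value of cos(3^n t) is not accurate, but it stays bounded by 1, which is all the convergence bound uses):

```python
import math

def weierstrass_partial(t, terms):
    """Partial sum of f(t) = sum_{n>=1} cos(3^n t) / 2^n."""
    return sum(math.cos(3 ** n * t) / 2 ** n for n in range(1, terms + 1))

# Uniform Cauchy bound: |f_M - f_N| <= sum_{n>N} 2^-n = 2^-N for M > N.
f20 = weierstrass_partial(1.0, 20)
f40 = weierstrass_partial(1.0, 40)
```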
Thus we have answered the question of the α-Hölder property for α ∈ (1/2, 1). What happens in the limit case α = 1/2? The following important theorem gives, as a corollary, a negative answer for α = 1/2.
Theorem 8. [Law of the iterated logarithm] For the Brownian motion W (t) we have,
with probability 1, that
lim sup_{t↓0} W(t)/√(2t log log(1/t)) = 1 and lim inf_{t↓0} W(t)/√(2t log log(1/t)) = −1.
[The strong Markov property] We recall that by definition of W (t) the process
W (t + ·) − W (t) is independent of Ft. Our aim now is to generalize this property
to the case of stopping times.
Theorem 9. Let τ be a stopping time. Consider the process
W^{(τ)}(t) := W(τ + t) − W(τ).
Proof: Note that, by the monotone class argument, it is enough to prove that
E[I_A g(W^{(τ)}(t1), . . . , W^{(τ)}(tp))] = P(A) E[g(W(t1), . . . , W(tp))]
for all 0 ≤ t1 < . . . < tp, A ∈ Fτ and any continuous bounded g. Let [τ]n be the smallest real number of the form k2^{−n} which is greater than τ. Clearly
E[I_A g(W^{(τ)}(t1), . . . , W^{(τ)}(tp))] = lim_{n→∞} E[I_A g(W^{([τ]n)}(t1), . . . , W^{([τ]n)}(tp))].
But
E[I_A g(W^{([τ]n)}(t1), . . . , W^{([τ]n)}(tp))] = Σ_{k≥0} E[I_{A_k} g(W(k2^{−n} + t1) − W(k2^{−n}), . . . , W(k2^{−n} + tp) − W(k2^{−n}))],
where A_k := A ∩ {(k − 1)2^{−n} < τ ≤ k2^{−n}}. Since τ is a stopping time, we have that A_k is F_{k2^{−n}}-measurable. Thus, by the independence of the increments of W, we get
E[I_A g(W^{([τ]n)}(t1), . . . , W^{([τ]n)}(tp))] = E[g(W(t1), . . . , W(tp))] Σ_{k≥0} P(A_k) = P(A) E[g(W(t1), . . . , W(tp))],
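The strong Markov property can be observed in simulation: run a discretized Brownian path up to the stopping time τ = first time the path exceeds a level (capped at a fixed horizon so τ is bounded), then look at W(τ + h) − W(τ), which theorem 9 predicts to be N(0, h). All numerical parameters below are illustrative:

```python
import random

def post_tau_increment(rng, dt=0.01, level=1.0, h_steps=50, max_steps=5000):
    """Simulate a discretized Brownian path until the stopping time
    tau = first time the path exceeds `level` (capped at max_steps),
    then return W(tau + h) - W(tau) with h = h_steps * dt."""
    w, step = 0.0, 0
    while w < level and step < max_steps:
        w += rng.gauss(0.0, dt ** 0.5)
        step += 1
    w_tau = w
    for _ in range(h_steps):
        w += rng.gauss(0.0, dt ** 0.5)
    return w - w_tau

rng = random.Random(7)
samples = [post_tau_increment(rng) for _ in range(2000)]
h = 50 * 0.01
mean = sum(samples) / len(samples)
var = sum((s - mean) ** 2 for s in samples) / len(samples)
# Theorem 9 predicts W(tau + h) - W(tau) ~ N(0, h) = N(0, 0.5).
```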
Stochastic Integral
Let (Ω, F , {Ft}t≥0, P) be a filtered probability space satisfying the usual conditions.
We have that L2F ([0, T ]; R) is a Hilbert space. For simplicity, when the context is clear,
we write only L2F .
M2([0, T ]; R) is the set of Ft-square integrable (i.e. in L2F ([0, T ]; R)) martingales
with right-continuous paths starting from 0. We set
Theorem 10. M2([0, T ]; R) endowed with (12) is a Hilbert space and M2c ([0, T ]; R)
is closed in M2([0, T ]; R).
Using this theorem, in particular the closedness of M2c ([0, T ]; R), we will define
the stochastic integral w.r.t. W (t).
Proof of theorem 10: The fact that ‖·‖_{M²} is a norm is easy. We now prove that the space is complete. Let Xn ∈ M² be a Cauchy sequence. Choosing a subsequence nk such that ‖X_{n_{k+1}} − X_{n_k}‖_{M²} ≤ 2^{−3k}, Doob inequality (I) yields
P( sup_{t∈[0,T]} |X_{n_{k+1}}(t) − X_{n_k}(t)| > 2^{−k} ) ≤ 2^{2k} E(|X_{n_{k+1}}(T) − X_{n_k}(T)|²) ≤ 2^{−k}.
By the Borel-Cantelli lemma (if Σ_i P(A_i) < +∞ then P(∩_i ∪_{j≥i} A_j) = 0) we get the existence of Ω0 ∈ F with P(Ω0) = 1 and of a right-continuous X, such that for all ω ∈ Ω0 we have sup_{t∈[0,T]} |X_{n_k}(t) − X(t)| → 0.
By dominated convergence and Doob inequality (II), we get that Xnk (t) → X(t) in L2 (Ω) for all
t ∈ [0, T ]. By passing to the limit in the martingale equality we get that X ∈ M2 . Finally, if Xnk is
continuous we have that X is continuous, because of the a.s. uniform convergence of the subsequence.
Define the stochastic integral If (ω, ·) : [0, T ] → R as
If(ω, ·) := Σ_{i≥0} f_i(ω) [W(ω, · ∧ t_{i+1}) − W(ω, · ∧ t_i)].
(i) For all ω ∈ Ω, the continuity of W (ω, ·) yields the continuity of If (ω, t).
(iii) [Ito isometry] ‖If(·)‖²_{M²} = E(If(T)²) = E[∫_0^T |f(t)|² dt] = ‖f‖²_{L²}.
Proof:
E(If(T)²) = Σ_{i≥0} E[|f_i|² (W(t_{i+1}) − W(t_i))²] + 2 Σ_{i<j} E[f_i f_j (W(t_{i+1}) − W(t_i))(W(t_{j+1}) − W(t_j))].
Conditioning by F_{t_i} each term of the first sum and by F_{t_j} each term of the second sum, we get the result.
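The isometry for a (deterministic) simple process is easy to check by Monte Carlo, drawing the independent increments W(t_{i+1}) − W(t_i) directly. The coefficients and the partition below are illustrative choices:

```python
import random

def ito_integral_simple(coeffs, times, rng):
    """I_f(T) = sum_i f_i (W(t_{i+1}) - W(t_i)) for a deterministic simple
    process f = sum_i f_i 1_[t_i, t_{i+1}); increments are independent
    N(0, t_{i+1} - t_i) and are drawn directly."""
    total = 0.0
    for f_i, (a, b) in zip(coeffs, zip(times, times[1:])):
        total += f_i * rng.gauss(0.0, (b - a) ** 0.5)
    return total

rng = random.Random(3)
coeffs = [2.0, -1.0, 0.5]
times = [0.0, 0.5, 1.5, 2.0]
n = 20000
vals = [ito_integral_simple(coeffs, times, rng) for _ in range(n)]
mean = sum(vals) / n
second_moment = sum(v * v for v in vals) / n
# Ito isometry: E(I_f(T)^2) = int_0^T |f|^2 dt
#             = 4*0.5 + 1*1.0 + 0.25*0.5 = 3.125, and E(I_f(T)) = 0.
```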
Properties (i)-(iii) imply that If ∈ M²c and that the linear map L0 ∋ f ↦ If ∈ M²c is an isometry. Therefore we can extend the integral to f ∈ clos(L0) (the closure of L0 in L²F). But... what is clos(L0)? Note that if f ∈ L²F is bounded and a.s. continuous, we have that
f_n(·) := Σ_{i=0}^{2^n−1} f(iT/2^n) I_{[iT/2^n, (i+1)T/2^n)}(·) → f(·) in L²F.
Therefore, for every f ∈ L²F we have defined its stochastic integral If(·) ∈ M²c, which is usually denoted by
If(·) =: ∫_0^· f(s) dW(s).
Proof of lemma 5:
(ii) If f is bounded and progressively measurable we can approximate it first by the following continuous adapted (and thus progressively measurable) process
F^k(t) := k ∫_{0∨(t−1/k)}^t f(s) ds, (13)
(iii) If f is bounded and adapted, it has a progressively measurable modification, and the process analogous to (13) is progressively measurable. Using the argument based on Fubini's theorem (see [14]) we can prove that F^k(t) is adapted, and then we can proceed as in (ii).
(iv) If f is only measurable and adapted we truncate.
(i) One might think of approximating a progressive process by sums as in the usual Lebesgue theory (see for example the sum in (1)). However, it is clear that this does not work, since an event like
{(t, ω) ; f(t, ω) ∈ [j2^{−n}, (j + 1)2^{−n})}
belongs only to B([0, T ]) × FT, and thus we cannot construct elements of L0 in this way, which implies the loss of the isometry property, the martingale property, etc.
(ii) Note that in the construction of the integral we can avoid the use of the normal distribution of the increments, as follows: we have used it in the Itô isometry for simple processes, but the same result can be obtained using that W(t)² − t = W(t)² − [W, W](t) is a martingale (exercise!). This remark allows one to define the stochastic integral for arbitrary continuous square-integrable martingales M(t). In fact, there is a deep result called the Doob-Meyer decomposition which says that for such a martingale M there exists a process [M, M](t) (the quadratic variation of M) such that
M(t)² − [M, M](t) is a martingale.
where
‖X‖²_{M,2} := E[∫_0^T |X(t)|² d[M, M](t)].
It can be shown that if the measure induced by [M, M](·) is absolutely continuous w.r.t. the Lebesgue measure (which is the case for the Brownian motion), then we can approximate any element of L²M by simple functions. However, in the general case we can only approximate the progressive elements of L²M. For more information on this see the books [12, 14].
Fundamental properties of the stochastic integral
Consider f, g ∈ L²F and s < t.
(i) [Conditional Itô isometry]
E[(∫_s^t f(r) dW(r))² | Fs] = E[∫_s^t |f(r)|² dr | Fs],
and
(∫_0^· f(r) dW(r))² − ∫_0^· |f(r)|² dr is a martingale.
Proof: The first identity follows analogously to the derivation of the unconditioned Itô isometry. In fact, the same proof yields the identity for simple processes, and then we can pass to the limit in the conditional expectation. The second identity comes from a direct computation using the first one. In fact, using that ∫_0^· f(r) dW(r) is a martingale we get that
E[(∫_s^t f(r) dW(r))² | Fs] = E[(∫_0^t f(r) dW(r))² | Fs] − (∫_0^s f(r) dW(r))²,
(ii) [Optional sampling] For stopping times σ ≤ τ:
E[∫_{s∧σ}^{t∧τ} f(r) dW(r) | Fσ] = 0,
E[(∫_{s∧σ}^{t∧τ} f(r) dW(r))² | Fσ] = E[∫_{s∧σ}^{t∧τ} |f(r)|² dr | Fσ],
E[(∫_{s∧σ}^{t∧τ} f(r) dW(r)) (∫_{s∧σ}^{t∧τ} g(r) dW(r)) | Fσ] = E[∫_{s∧σ}^{t∧τ} f(r) g(r) dr | Fσ].
Proof: The first two identities are a direct consequence of the optional sampling theorem and (i). For the third identity, note that by the second identity we have
E[(∫_{s∧σ}^{t∧τ} [f(r) + g(r)] dW(r))² | Fσ] = E[∫_{s∧σ}^{t∧τ} [f(r) + g(r)]² dr | Fσ];
developing the squares and using the second identity again, we get the result.
(iii) [Consistency] For any stopping time σ and f ∈ L²F, let f̂(t, ω) := f(t, ω) I_{σ(ω)≥t}. Then
∫_0^{t∧σ} f(s) dW(s) = ∫_0^t f̂(s) dW(s).
Proof: The proof of this fact follows easily from our construction (Exercise!).
Now, we sketch the construction of the integral for a more general kind of process.
Define
L^{2,loc}_F := {X : [0, T ] × Ω → R ; X is Ft-adapted and ∫_0^T |X(t)|² dt < ∞, P-a.s.},
M^{2,loc}_c := {X ∈ M^{2,loc} ; X is continuous}.
argument. In fact, for every j ≥ 1 define the stopping time
σ_j(ω) := inf{ t ∈ [0, T ] ; ∫_0^t |f(s, ω)|² ds ≥ j }.
Define fj (t) := f (t)It≤σj . By definition of the stopping times we have that fj ∈ L2F .
We define the stochastic integral of f as
∫_0^t f(s) dW(s) := ∫_0^t f_j(s) dW(s) if t ∈ [0, σ_j].
We have to check that it is well defined: if t ≤ σ_i(ω) ≤ σ_j(ω) we have to verify that
∫_0^t f_j(s) dW(s) = ∫_0^t f_i(s) dW(s).
To see this, note that by the consistency property we have
∫_0^{t∧σ_i} f_j(s) dW(s) = ∫_0^t f_j(s) I_{s≤σ_i} dW(s) = ∫_0^t f(s) I_{s≤σ_i} dW(s) = ∫_0^t f_i(s) dW(s).
It can be proved that ∫_0^· f(s) dW(s) is a local martingale, but in general it is not a martingale.
Note that for r = 1 the first inequality is trivial and the second inequality is Doob inequality (II) with p = 2.
Ito’s formula
Note that if “standard” differential calculus were valid for the stochastic integral, one would expect that
W(t)² = 2 ∫_0^t W(s) dW(s).
Let us check that this formula is wrong! In fact, evaluating at t ≠ 0 and taking expectations we would obtain t = 0! Let us find the correct formula from the definition of the stochastic integral. Consider a sequence of partitions 0 = t^n_0 < t^n_1 < . . . < t^n_n = t such that max_{i≥1} |t^n_i − t^n_{i−1}| → 0 as n ↑ ∞. Since W(·) has continuous trajectories, we can approximate 2 ∫_0^t W(s) dW(s) as the limit of
Σ_{i=1}^n 2W(t_{i−1}) [W(t_i) − W(t_{i−1})] = Σ_{i=1}^n [W(t_i)² − W(t_{i−1})²] − Σ_{i=1}^n [W(t_i) − W(t_{i−1})]²
= W(t)² − Σ_{i=1}^n [W(t_i) − W(t_{i−1})]².
By taking the L² limit we get
2 ∫_0^t W(s) dW(s) = W(t)² − t.
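This identity is easy to check numerically: for the left-point Riemann sums above, W(t)² − t minus the sum equals Σ(∆W_i)² − t, whose mean squared value is 2t²/n. A sketch (discretization parameters illustrative):

```python
import random

def riemann_ito_sum(n, t, rng):
    """Left-point Riemann sum  sum_i 2 W(t_{i-1}) (W(t_i) - W(t_{i-1}))
    on a uniform partition of [0, t] with n steps; also returns W(t)."""
    dt = t / n
    w, s = 0.0, 0.0
    for _ in range(n):
        dw = rng.gauss(0.0, dt ** 0.5)
        s += 2.0 * w * dw
        w += dw
    return s, w

rng = random.Random(5)
t, n, reps = 1.0, 1000, 2000
errs = [(w * w - t) - s for s, w in (riemann_ito_sum(n, t, rng) for _ in range(reps))]
mse = sum(e * e for e in errs) / reps
# err = sum (dW_i)^2 - t, with E(err^2) = 2 t^2 / n = 0.002 here.
```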
Now we provide the corresponding change of variables formula, better known as Itô's formula. From now on we come back to the setting where the Brownian motion is m-dimensional.
It is important to fix the following notation, which will be used throughout the notes. For a smooth function f : R+ × R^{m′} → R (m′ ∈ N), we set
∂_t f(t, x) := D_t f(t, x), Df(t, x) := D_x f(t, x), D²f(t, x) := D²_{xx} f(t, x).
For the proof of this important result see [12]. It is important to note that, since the result holds with probability one, we can replace t by any stopping time τ.
Now we extend the result to so-called Itô processes. Consider adapted measurable processes b(t) and σ(t) with values in R^n and R^{n×m} respectively, satisfying
∫_0^T |b(s)| ds + ∫_0^T |σ(s)|² ds < ∞.
A process of this type is called an Itô process. For these processes we have the following Itô formula:
Theorem 13. Let f : R+ × R^n → R be of class C^{1,2}([0, T ] × R^n). Then, with probability one, we have
f(t, X(t)) = f(0, X(0)) + ∫_0^t Df(s, X(s))ᵀ dX(s) + ∫_0^t [∂_t f(s, X(s)) + ½ Tr(σ(s)σ(s)ᵀ D²f(s, X(s)))] ds.
Stochastic Differential Equations (SDEs)
We say that an adapted, continuous process X(t) is a solution of (14) if a.s. we have
(Note that by lemma 6 below the processes b(t, X(·)) and σ(t, X(·)) are adapted.)
Lemma 6. Let b ∈ A(R). Let (Ω, F, {Ft}t≥0, P) be given satisfying the usual conditions and let X be continuous and progressive. Then the process (t, ω) → b(t, X(·, ω)) is progressive.
Proof: We consider the map (t, ω) → Φ(t, ω) = (t, X(·, ω)) ∈ [0, T ] × W. Therefore,
b(t, X(·, ω)) = b ∘ Φ(t, ω).
Since b ∈ A(R) we have that b⁻¹(B(R)) ⊆ B([0, t]) × Bt+(W). Thus, since X⁻¹(Bt+(W)) ⊆ Ft+ = Ft (note that it is enough to verify this on the cylinders defined in (8)), we obtain the result immediately.
(H) [Lipschitz conditions for the coefficients] There exists a constant L > 0 such that
for all t ∈ [0, ∞), and x, y in W.
‖x(· ∧ j̄) − y(· ∧ j̄)‖_{C[0,j]} ≤ ‖x(· ∧ j̄) − y(· ∧ j̄)‖_{C[0,j̄]} for all j ∈ N.
where
‖X‖^ℓ_{ℓ,∞} := E[ sup_{t∈[0,T]} |X(t)|^ℓ ].
It is easy to show that L`F (Ω, C([0, T ]; Rn)) is a Banach space (exercise!).
Theorem 14. [Existence and uniqueness] Under the assumption (H), for all ` ≥ 1 and
ξ0 ∈ L`(Ω, Rn), there exists a unique solution X(t) of (14). Moreover, for all T > 0
we have that X ∈ L`F (Ω, C([0, T ]; Rn)).
Proof: We will use a fixed point technique. First, let us fix a deterministic time τ, to be chosen later, and consider the space Sτ := L^ℓ_F(Ω, C([0, τ]; R^n)). We define the map T : Sτ → Sτ by
T(x)(t) := ξ0 + ∫_0^t b(s, x(·)) ds + ∫_0^t σ(s, x(·)) dW(s) for all t ∈ [0, τ].
If this map is well defined and contractive, then we get our solution in [0, τ] and, of course, using the same argument we can extend it to [τ, 2τ] by taking the appropriate initial condition in order to preserve continuity, etc. Therefore, it is enough to prove that
(i) T is well defined. We easily obtain the existence of a constant L1 > 0 such that
|T(x)(t)|^ℓ ≤ L1( |ξ0|^ℓ + (∫_0^τ |b(s, x(·))| ds)^ℓ + sup_{t∈[0,τ]} |∫_0^t σ(s, x(·)) dW(s)|^ℓ ). (15)
The BDG inequality gives
E[ sup_{t∈[0,τ]} |∫_0^t σ(s, x(·)) dW(s)|^ℓ ] ≤ K_ℓ E[ (∫_0^τ |σ(s, x(·))|² ds)^{ℓ/2} ],
and thus, using (H), there exists L2 > 0 such that
E[ sup_{t∈[0,τ]} |∫_0^t σ(s, x(·)) dW(s)|^ℓ ] ≤ L2( (∫_0^τ |σ(s, 0)|² ds)^{ℓ/2} + τ^{ℓ/2} ‖x‖^ℓ_{ℓ,∞} ) < ∞.
Similarly, there exists L3 > 0 such that
E[ (∫_0^τ |b(s, x(·))| ds)^ℓ ] ≤ L3( (∫_0^τ |b(s, 0)| ds)^ℓ + τ^ℓ ‖x‖^ℓ_{ℓ,∞} ) < ∞.
Thus, by taking the supremum and then the expectation in (15), we obtain that T is well defined.
By (H) we obtain the existence of L6 > 0 such that
E[ sup_{t∈[0,τ]} |∫_0^t [σ(s, x(·)) − σ(s, y(·))] dW(s)|^ℓ ] ≤ L6 τ^{ℓ/2} ‖x − y‖^ℓ_{ℓ,∞}.
Therefore, by taking the supremum in (16), we finally obtain the existence of L8 > 0 such that
‖T(y) − T(x)‖^ℓ_{ℓ,∞} ≤ L8 τ^{ℓ/2} ‖x − y‖^ℓ_{ℓ,∞}.
Letting τ < min{1, (1/L8)^{2/ℓ}}, we get the result.
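In practice one rarely iterates the fixed point map directly; the standard numerical counterpart is the Euler-Maruyama scheme. A minimal sketch (not from the notes) on a test equation with linear, hence Lipschitz, coefficients, dX = µX dt + sX dW, for which E X(t) = x₀e^{µt}; all numerical parameters are illustrative:

```python
import math, random

def euler_maruyama(x0, b, sigma, t, n, rng):
    """Euler-Maruyama scheme for dX = b(X)dt + sigma(X)dW on [0, t]."""
    dt = t / n
    x = x0
    for _ in range(n):
        x = x + b(x) * dt + sigma(x) * rng.gauss(0.0, dt ** 0.5)
    return x

# Geometric Brownian motion: exact mean E X(t) = x0 * exp(mu * t).
mu, s, x0, t = 0.2, 0.3, 1.0, 1.0
rng = random.Random(11)
reps, n = 5000, 100
mean = sum(euler_maruyama(x0, lambda x: mu * x, lambda x: s * x, t, n, rng)
           for _ in range(reps)) / reps
# mean should be close to exp(0.2) ~ 1.2214 (up to Monte Carlo and
# discretization error).
```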
[Markov solutions] Suppose that we are given b : [0, ∞) × Rn → Rn and
σ : [0, ∞) × Rn → Rn×m such that
(H') [Lipschitz conditions for the coefficients (bis)] There exists a constant L > 0 such that for all t ∈ [0, ∞) and x, y in R^n:
Clearly, this is a special case of the one studied before. Moreover, it is clear that (H’)
implies (H) when b and σ are viewed as elements in A(Rn) and A(Rn×m) respectively.
This particular structure is very important since the solutions are Markovian, i.e. satisfy
the Markov property.
Theorem 15. Under (H’) there exists a unique solution X of (17). Moreover, X(t)
is a Markov process, i.e.
Moreover, X(t) is a strong Markov process, i.e. for every stopping time τ , we have
Idea of the proof: We only need to prove the Markov and strong Markov property. We
will argue in a formal way (see [14] for a rigorous proof). Note that
X(t + h) = X(t) + ∫_t^{t+h} b(s, X(s)) ds + ∫_t^{t+h} σ(s, X(s)) dW(s).
form
X′(h) = X′(0) + ∫_0^h b′(h′, X′(h′)) dh′ + ∫_0^h σ′(h′, X′(h′)) dW′(h′).
Since W′(h) is a Brownian motion w.r.t. the filtration F′_h (and independent of Ft), the above SDE is well posed and its solution depends only on X′(0). By pathwise uniqueness of solutions this implies the result. For the strong Markov property the previous ideas work in the same way if we have autonomous coefficients, noting that the new W′(h) := W(τ + h) − W(τ) is a Brownian motion independent of Fτ, by the strong Markov property of the Brownian motion. Otherwise, we can add an artificial variable (dX_{n+1} = 1 dt) in order to treat time as a part of X.
Proposition 9. [Important stability estimates] Suppose that (H) holds true. Then, for all T > 0 there exists a constant K_T such that the unique solution X(t) of (14) satisfies:
(i)
\[
\mathbb{E}\Big[\sup_{0\le s\le T}|X(s)|^{\ell}\Big] \le K_T\big(1+\mathbb{E}(|\xi|^{\ell})\big).
\]
(ii)
\[
\mathbb{E}\big(|X(t)-X(s)|^{\ell}\big) \le K_T\big(1+\mathbb{E}(|\xi|^{\ell})\big)\,|t-s|^{\ell/2}.
\]
(iii) If \hat\xi \in L^{\ell}_{\mathcal{F}_0}(\Omega;\mathbb{R}^n) is another r.v. and \hat X(t) is the corresponding solution, then
\[
\mathbb{E}\Big(\sup_{0\le s\le T}|X(s)-\hat X(s)|^{\ell}\Big) \le K_T\,\mathbb{E}\big(|\xi-\hat\xi|^{\ell}\big).
\]
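Estimate (iii) — continuity of the solution in the initial condition — can be observed numerically. The sketch below (illustrative coefficients, not from the notes) runs an Euler-Maruyama scheme twice with the same Brownian increments and two nearby initial conditions, and measures the sup-norm gap between the paths.

```python
import numpy as np

rng = np.random.default_rng(1)
T_, n = 1.0, 1000
dt = T_ / n
dW = rng.normal(0.0, np.sqrt(dt), n)   # shared Brownian increments

b = lambda x: np.sin(x)                # Lipschitz drift (illustrative)
s = lambda x: 0.4 * np.cos(x)          # Lipschitz diffusion (illustrative)

def euler_path(x0):
    x = np.empty(n + 1)
    x[0] = x0
    for k in range(n):
        x[k + 1] = x[k] + b(x[k]) * dt + s(x[k]) * dW[k]
    return x

x_path = euler_path(1.0)               # initial condition xi
y_path = euler_path(1.0 + 1e-3)        # perturbed initial condition xi_hat

gap = float(np.max(np.abs(x_path - y_path)))
print(gap)  # stays of the order of the initial perturbation
```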
Lemma 7. [Gronwall lemma] Let f : [a,b] \to \mathbb{R} be piecewise continuous and satisfy, for some constants \alpha, \beta \ge 0,
\[
f(t) \le \alpha + \beta\int_a^t f(s)\,ds, \quad \text{for all } t \in [a,b]. \tag{18}
\]
Then
\[
f(t) \le \alpha e^{\beta(t-a)}.
\]
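As a quick numerical sanity check of the lemma (a sketch with an arbitrary test function, not from the notes), one can verify both the integral hypothesis (18) and the exponential bound on a grid:

```python
import numpy as np

alpha, beta = 1.0, 0.7
t = np.linspace(0.0, 2.0, 4001)            # interval [a, b] = [0, 2]
dt = t[1] - t[0]

f = 0.5 + np.sin(t) ** 2                   # an arbitrary continuous test function

# cumulative integral int_a^t f(s) ds via the left-endpoint rule
F = np.concatenate(([0.0], np.cumsum(f[:-1] * dt)))

hypothesis = bool(np.all(f <= alpha + beta * F + 1e-9))     # f satisfies (18)
bound = bool(np.all(f <= alpha * np.exp(beta * t) + 1e-9))  # Gronwall conclusion
print(hypothesis, bound)  # True True
```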
Proof of proposition 9: We will prove the result for ` ≥ 2. The proof for the general case can be
found in [18].
Proof of (i): Given T > 0 and t \in [0,T], arguing as in the first part of the proof of theorem 14 we easily obtain
\[
|X(t)|^{\ell} \le L_T\Big(|\xi|^{\ell} + \Big(\int_0^t |b(s,X(s))|\,ds\Big)^{\ell} + \sup_{0\le s\le t}\Big|\int_0^s \sigma(r,X(r))\,dW(r)\Big|^{\ell}\Big).
\]
By the Burkholder-Davis-Gundy inequality,
\[
\mathbb{E}\Big[\sup_{0\le s\le t}\Big|\int_0^s \sigma(r,X(r))\,dW(r)\Big|^{\ell}\Big] \le C\Big(\Big(\int_0^T |\sigma(s,0)|^2\,ds\Big)^{\ell/2} + \int_0^t \mathbb{E}\Big[\sup_{0\le r\le s}|X(r)|^{\ell}\Big]\,ds\Big),
\]
for some C > 0. In what follows C will always denote a generic constant. Clearly,
\[
\mathbb{E}\Big[\Big(\int_0^t |b(s,X(s))|\,ds\Big)^{\ell}\Big] \le C\Big(\Big(\int_0^T |b(s,0)|\,ds\Big)^{\ell} + \int_0^t \mathbb{E}\Big[\sup_{0\le r\le s}|X(r)|^{\ell}\Big]\,ds\Big).
\]
Therefore, we finally obtain
\[
\mathbb{E}\Big[\sup_{0\le s\le t}|X(s)|^{\ell}\Big] \le C\Big[1+\mathbb{E}\big(|\xi|^{\ell}\big)+\int_0^t \mathbb{E}\Big(\sup_{0\le r\le s}|X(r)|^{\ell}\Big)\,ds\Big],
\]
and (i) follows from Gronwall's lemma. Using this fact, majorizing |X(r)| by \sup_{0\le r\le T}|X(r)| in (19) and using (i), we easily obtain the result.
Stochastic equations with random coefficients
(i) For every \omega \in \Omega, we have that b(\cdot,\cdot,\omega) \in A(\mathbb{R}^n) and \sigma(\cdot,\cdot,\omega) \in A(\mathbb{R}^{n\times m}).
(iii) There exists a constant L > 0 such that the Lipschitz condition holds for all t \in [0,\infty), \omega \in \Omega, and x, y.
Theorem 16. [Existence and uniqueness for SDEs with random coefficients] Under the above assumptions there exists a unique solution of (20). Moreover, the obvious analogues of the estimates of proposition 9 (uniform in \omega) hold true.
Proof: It is a direct adaptation of the proof for the non-random coefficient case (Exercise!).
Connections with PDEs
Itô's formula implies that A is well defined for every C^{1,2}-function with bounded derivatives, and
\[
Af = b(t,x)^{\top} Df(t,x) + \tfrac12\,\mathrm{Tr}\big[\sigma\sigma^{\top}(t,x)\,D^2 f(t,x)\big]. \tag{21}
\]
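Formula (21) is straightforward to evaluate numerically. The sketch below (illustrative coefficients and test function, all assumptions, not from the notes) computes Af at a point using central finite differences for the derivatives and compares the result with hand-computed derivatives:

```python
import numpy as np

# Illustrative autonomous coefficients in R^2 (not from the notes)
b = lambda x: -x                                # drift b(t, x) = -x
sigma = np.array([[1.0, 0.0], [0.5, 1.0]])      # constant diffusion matrix
a = sigma @ sigma.T                             # a = sigma sigma^T

f = lambda x: x[0] ** 2 * x[1] + np.sin(x[1])   # a smooth test function

def generator(f, x, h=1e-4):
    # A f = b . Df + (1/2) Tr(sigma sigma^T D^2 f), derivatives by central differences
    n = len(x)
    Df = np.zeros(n)
    D2f = np.zeros((n, n))
    for i in range(n):
        ei = np.zeros(n); ei[i] = h
        Df[i] = (f(x + ei) - f(x - ei)) / (2 * h)
        for j in range(n):
            ej = np.zeros(n); ej[j] = h
            D2f[i, j] = (f(x + ei + ej) - f(x + ei - ej)
                         - f(x - ei + ej) + f(x - ei - ej)) / (4 * h * h)
    return b(x) @ Df + 0.5 * np.trace(a @ D2f)

x = np.array([0.7, -0.3])
# Hand-computed derivatives of f at x:
Df_exact = np.array([2 * x[0] * x[1], x[0] ** 2 + np.cos(x[1])])
D2f_exact = np.array([[2 * x[1], 2 * x[0]], [2 * x[0], -np.sin(x[1])]])
exact = b(x) @ Df_exact + 0.5 * np.trace(a @ D2f_exact)

print(abs(generator(f, x) - exact))  # small finite-difference error
```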
Proposition 10. Assume that (t,x) \mapsto v(t,x) := \mathbb{E}\big[g(X^{t,x}(T))\big] is C^{1,2}([0,T]\times\mathbb{R}^n). Then it solves
\[
\partial_t v + Av = 0, \qquad v(T,\cdot) = g(\cdot).
\]
Proof: If the function v has bounded derivatives the proof below simplifies considerably (exercise). To treat the case when only v \in C^{1,2}, we use a typical localization in space technique in order to be able to eliminate the diffusion term by taking the expectation. In fact, define the stopping time \tau_1 := T \wedge \inf\{s > t \,;\, |X^{t,x}(s)-x| \ge 1\}. Itô's formula gives
\[
v\big(s\wedge\tau_1, X^{t,x}(s\wedge\tau_1)\big) = v(t,x) + \int_t^{s\wedge\tau_1}(\partial_t v + Av)(r,X^{t,x}(r))\,dr + \int_t^{s\wedge\tau_1} Dv(r,X^{t,x}(r))^{\top}\sigma(r,X^{t,x}(r))\,dW(r).
\]
But, by the definition of v and the strong Markov property we obtain
\[
v\big(s\wedge\tau_1(\omega), X^{t,x}(s\wedge\tau_1(\omega),\omega)\big) = \mathbb{E}\big[g\big(X^{s\wedge\tau_1,\,X^{t,x}(s\wedge\tau_1)}(T)\big)\,\big|\,\mathcal{F}_{\tau_1}\big](\omega).
\]
Therefore,
\[
v\big(s\wedge\tau_1(\omega), X^{t,x}(s\wedge\tau_1(\omega),\omega)\big) = \mathbb{E}\big[g\big(X^{t,x}(T)\big)\,\big|\,\mathcal{F}_{\tau_1}\big](\omega).
\]
Taking expectations, the stochastic integral in Itô's formula vanishes; dividing by s-t and letting s \downarrow t yields \partial_t v + Av = 0 at (t,x).
We suppose that: (i) b and \sigma are continuous, Lipschitz in x uniformly in t, and |b(\cdot,x)|, |\sigma(\cdot,x)| belong to L^2([0,T]); (ii) the function k is uniformly bounded from below; (iii) the function f has quadratic growth in x uniformly in t.
Then the smooth solution v of (22) admits the representation
\[
v(t,x) = \mathbb{E}\Big[\int_t^T \beta^{t,x}(s)\,f\big(s,X^{t,x}(s)\big)\,ds + \beta^{t,x}(T)\,g\big(X^{t,x}(T)\big)\Big],
\]
where \beta^{t,x}(s) := \exp\big\{-\int_t^s k(r,X^{t,x}(r))\,dr\big\}, with X^{t,x}(s) being the solution of the corresponding SDE.
Proof: Define the sequence of stopping times
\[
\tau_n := T \wedge \inf\big\{s > t \,;\, |X^{t,x}(s)-x| \ge n\big\}.
\]
By the continuity of the paths of X^{t,x}(s) it is clear that a.s. we have \tau_n \uparrow T as n \uparrow \infty. Using that v is smooth, we can apply Itô's formula to \beta^{t,x}(s)v(s,X^{t,x}(s)), obtaining that
\[
d\big[\beta^{t,x}(s)v(s,X^{t,x}(s))\big] = \beta^{t,x}(s)\big[-kv+\partial_t v+Av\big]\big(s,X^{t,x}(s)\big)\,ds + \beta^{t,x}(s)\,Dv\big(s,X^{t,x}(s)\big)^{\top}\sigma\big(s,X^{t,x}(s)\big)\,dW(s).
\]
Using that v solves (22) and taking the expected value, we obtain
\[
\mathbb{E}\big[\beta^{t,x}(\tau_n)v(\tau_n,X^{t,x}(\tau_n))\big] - v(t,x) = \mathbb{E}\Big[-\int_t^{\tau_n}\beta^{t,x}(s)f\big(s,X^{t,x}(s)\big)\,ds\Big] + \mathbb{E}\Big[\int_t^{\tau_n}\beta^{t,x}(s)\,Dv\big(s,X^{t,x}(s)\big)^{\top}\sigma\big(s,X^{t,x}(s)\big)\,dW(s)\Big].
\]
Since X^{t,x}(s) is bounded before \tau_n, the last term in the above expression is zero. Therefore, we have
\[
v(t,x) = \mathbb{E}\Big[\int_t^{\tau_n}\beta^{t,x}(s)f\big(s,X^{t,x}(s)\big)\,ds + \beta^{t,x}(\tau_n)v\big(\tau_n,X^{t,x}(\tau_n)\big)\Big].
\]
The result easily follows by letting n \uparrow \infty and using the Lebesgue dominated convergence theorem. In order to verify that the integrand is dominated, we use the quadratic growth property of f, the estimates for the second moment of X^{t,x}(s), and the fact that k is bounded from below.
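For a case with a closed-form solution, the representation can be checked by Monte Carlo. The sketch below (illustrative data, not from the notes) takes b = 0, sigma = 1, f = 0, constant k and g(x) = x^2; then v(t,x) = e^{-k(T-t)}(x^2 + (T-t)) solves the corresponding PDE, and X^{t,x}(T) = x + (W(T)-W(t)) can be sampled exactly.

```python
import numpy as np

rng = np.random.default_rng(2)

k, T = 0.3, 1.0          # constant discount rate and horizon
t, x = 0.25, 0.8         # evaluation point

N = 200_000
X_T = x + np.sqrt(T - t) * rng.normal(size=N)   # exact samples of X^{t,x}(T)
beta = np.exp(-k * (T - t))                     # beta^{t,x}(T) is deterministic here
mc = beta * np.mean(X_T ** 2)                   # Monte Carlo value of the representation

exact = np.exp(-k * (T - t)) * (x ** 2 + (T - t))   # closed-form PDE solution
print(abs(mc - exact))  # Monte Carlo error, roughly of order 1/sqrt(N)
```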
Stochastic control: Problem formulation
Excellent references for what follows are the books [11, 20, 24] and the lecture notes
[23].
Let (\Omega,\mathcal{F},\{\mathcal{F}_t\}_{t\ge0},\mathbb{P}) be a given probability space satisfying the usual conditions. Let W be a Brownian motion defined on this space. We suppose that the filtration is given by the canonical augmentation of the one generated by W(t). For (t,x) \in [0,T]\times\mathbb{R}^n, let us consider the following controlled SDE
where U \subseteq \mathbb{R}^d is closed. For u \in \mathcal{U} we denote by X^{t,x}[u] the unique solution (if it exists!) of (23). Given functions \ell : [0,T]\times\mathbb{R}^n\times\mathbb{R}^d \to \mathbb{R} and g : \mathbb{R}^n \to \mathbb{R}, we consider the cost (which is well defined under the assumptions below) J : [0,T]\times\mathbb{R}^n\times\mathcal{U} \to \mathbb{R},
\[
J(t,x,u) := \mathbb{E}\Big[\int_t^T \ell\big(s,X^{t,x}[u](s),u(s)\big)\,ds + g\big(X^{t,x}[u](T)\big)\Big].
\]
The stochastic optimal control problem at (t, x) is to calculate the following value function
(i) [Lipschitz assumption] There exists a constant L > 0 such that for \varphi(t,x,u) = b(t,x,u), \sigma(t,x,u), \ell(t,x,u), g(x), we have
Under these assumptions, for each u \in \mathcal{U}, equation (23) admits a unique solution X^{t,x}[u] and the function J is well defined. For simplicity, we have considered these assumptions. However, most of the results presented later can be extended to more general settings such as:
(iii') Instead of a uniform bound on b(t,0,u) and \sigma(t,0,u), one only asks for linear growth of b(t,x,u) and \sigma(t,x,u).
(iv') Instead of a uniform bound on \ell(t,0,u), one only asks for quadratic growth of \ell(t,x,u) and g(x).
For these assumptions see the lecture notes of N. Touzi [23] and the books [11, 20].
[An interesting reduction of the set of admissible controls] Let us consider the set
\[
\mathcal{U}_t := \big\{u \in \mathcal{U} \,;\, u|_{[t,T]} \text{ is independent of } \mathcal{F}_t\big\}.
\]
We have:
Proposition 11. [A restriction of the set of admissible controls] The value function can be calculated as
\[
v(t,x) = \inf_{u\in\mathcal{U}_t} J(t,x,u).
\]
Proof: Clearly, v(t, x) ≤ inf u∈Ut J(t, x, u). Now, let u ∈ U and for simplicity suppose that
` = 0. We know by Dynkin theorem that u can be written as
Therefore, for s > t we have
\[
u(s) = h\big(s, W(\cdot\wedge s)\big) = h\big(s, W(\cdot\wedge t) + W'(\cdot\wedge s)\big),
\]
where W'(r) := W(r\vee t) - W(t). Hence
\[
J(t,x,u) = \mathbb{E}\big[g\big(X^{t,x}[u](T)\big)\big] = \int_{C([0,T])} g\big(X^{t,x}[u](T,w)\big)\,d\mu(w)
= \int_{C([t,T])}\int_{C([0,t])} g\big(X^{t,x}[u](T,w,w')\big)\,d\mu(w)\,d\mu(w')
\]
\[
= \int_{C([0,t])}\int_{C([t,T])} g\big(X^{t,x}[u](T,w,w')\big)\,d\mu(w')\,d\mu(w)
\ge \int_{C([0,t])}\inf_{u\in\mathcal{U}_t} J(t,x,u)\,d\mu(w) = \inf_{u\in\mathcal{U}_t} J(t,x,u),
\]
which yields the result.
Some properties of the value function: As we will see later, v is the unique solution (in a weak sense to be defined) of a second order HJB equation. However, we can deduce some interesting properties of v without appealing to this PDE (which in turn a posteriori proves these properties for the solution of the PDE).
Theorem 18. [Linear growth of v; Lipschitz property of v w.r.t. x and local 1/2-Hölder property w.r.t. t] There exists K > 0 such that the value function satisfies
\[
|v(t,x)| \le K(1+|x|), \quad \forall\,(t,x) \in [0,T]\times\mathbb{R}^n,
\]
\[
|v(t,x)-v(\hat t,\hat x)| \le K\big(|x-\hat x| + (1+|x|\vee|\hat x|)\,|t-\hat t|^{1/2}\big), \quad \forall\, t,\hat t \in [0,T],\ x,\hat x \in \mathbb{R}^n.
\]
Proof: Since our assumptions are uniform in u, the proof is a direct consequence of our estimates for solutions of SDEs (with \ell = 1).
where X_\varepsilon^{t,x}[u](s) is the solution of (24) associated with u, and the function v_\varepsilon is defined as the corresponding value function. We suppose, uniformly in \varepsilon, that the data of the family of problems satisfy the same kind of assumptions as the data of the original problem. For \varepsilon = 0, we recover the original data.
Proposition 12. [A stability result] Suppose that uniformly in (t, u) ∈ [0, T ] × U and
x ∈ K , for some compact K , we have
where φε = bε, σε, `ε, gε. Then vε(t, x) → v(t, x) uniformly over compact sets.
Proof: Fix (t,x) \in [0,T]\times\mathbb{R}^n and u \in \mathcal{U}_t. For notational convenience, we do not write the dependence on time for the data of the problems. Let us set X_\varepsilon(\cdot) := X_\varepsilon^{t,x}[u](\cdot), X(\cdot) := X^{t,x}[u](\cdot), \delta_\varepsilon b(X) := b_\varepsilon(X_\varepsilon)-b(X), \delta_\varepsilon\sigma(X) := \sigma_\varepsilon(X_\varepsilon)-\sigma(X). By Itô's formula, we have for every s \in [t,T],
\[
|X_\varepsilon(s)-X(s)|^2 = \int_t^s\Big\{2(X_\varepsilon-X)^{\top}\delta_\varepsilon b(X) + \mathrm{Tr}\big[\delta_\varepsilon\sigma(X)\delta_\varepsilon\sigma(X)^{\top}\big]\Big\}\,dr + 2\int_t^s (X_\varepsilon-X)^{\top}\delta_\varepsilon\sigma(X)\,dW(r).
\]
From now on we denote by K > 0 a generic constant. Thus,
\[
\mathbb{E}\Big[\sup_{t\le r\le s}|X_\varepsilon(r)-X(r)|^2\Big] \le K(I_1+I_2),
\]
where
\[
I_1 = \mathbb{E}\Big[\int_t^s |X_\varepsilon(r)-X(r)|\,|\delta_\varepsilon b(X)|\,dr + \int_t^s \mathrm{Tr}\big[\delta_\varepsilon\sigma(X)\delta_\varepsilon\sigma(X)^{\top}\big]\,dr\Big],
\]
\[
I_2 = \mathbb{E}\Big[\sup_{t\le r\le s}\Big|\int_t^r (X_\varepsilon-X)^{\top}\delta_\varepsilon\sigma(X)\,dW(r')\Big|\Big].
\]
Let us first estimate I_1. Since |\delta_\varepsilon b(X)| \le K\big(|X_\varepsilon-X| + |b_\varepsilon(X)-b(X)|\big), with a similar expression for \delta_\varepsilon\sigma, we easily obtain
\[
I_1 \le K\Big\{\mathbb{E}\Big[\int_t^s \sup_{t\le r'\le r}|X_\varepsilon(r')-X(r')|^2\,dr\Big] + \mathbb{E}\Big[\int_t^s |b_\varepsilon(X)-b(X)|^2 + |\sigma_\varepsilon(X)-\sigma(X)|^2\,dr\Big]\Big\}.
\]
By the Burkholder-Davis-Gundy and Young inequalities, we obtain that
\[
K I_2 \le \tfrac12\,\mathbb{E}\Big[\sup_{t\le r\le s}|X_\varepsilon(r)-X(r)|^2\Big] + 2K\,\mathbb{E}\Big[\int_t^s |\delta_\varepsilon\sigma(X)|^2\,dr\Big].
\]
Therefore we get
\[
\mathbb{E}\Big[\sup_{t\le r\le s}|X_\varepsilon(r)-X(r)|^2\Big] \le K\Big\{\mathbb{E}\Big[\int_t^s \sup_{t\le r'\le r}|X_\varepsilon(r')-X(r')|^2\,dr\Big] + \mathbb{E}\Big[\int_t^s |b_\varepsilon(X)-b(X)|^2 + |\sigma_\varepsilon(X)-\sigma(X)|^2\,dr\Big]\Big\}.
\]
By the uniform convergence of the parameters we can find η : [0, ∞) × [0, ∞) → [0, ∞), continuous,
non-decreasing, with η(0, R) = 0 for all R ≥ 0, such that
\[
|\varphi_\varepsilon(t,x,u)-\varphi_0(t,x,u)| \le \eta(\varepsilon,|x|) \quad \forall\,(t,x,u) \in [0,T]\times\mathbb{R}^n\times U.
\]
Thus, by Gronwall's lemma, we get that
\[
\mathbb{E}\Big[\sup_{t\le s\le T}|X_\varepsilon(s)-X(s)|^2\Big] \le K\int_t^T \mathbb{E}\big[\eta(\varepsilon,|X(s)|)^2\big]\,ds.
\]
Now, we estimate the r.h.s. of the above equation. By our assumptions, we obtain
Therefore,
\[
\sup_{t\in[0,T]}\mathbb{E}\big[\eta(\varepsilon,|X(t)|)^2\big] \le \eta(\varepsilon,R)^2 + K\,\frac{1+|x|^4}{R^2},
\]
which gives
\[
\sup_{t\in[0,T]}\mathbb{E}\big[\eta(\varepsilon,|X(t)|)\big] \le K\Big(\eta(\varepsilon,R) + \frac{1+|x|^2}{R}\Big). \tag{25}
\]
In particular, letting first \varepsilon \downarrow 0 and then R \uparrow \infty, we get
\[
\mathbb{E}\Big[\sup_{t\le s\le T}|X_\varepsilon(s)-X(s)|\Big] \to 0.
\]
Thus,
\[
|J_\varepsilon(t,x,u)-J(t,x,u)| \le K\,\mathbb{E}\Big[\sup_{t\le s\le T}|X_\varepsilon(s)-X(s)| + |g_\varepsilon(X(T))-g(X(T))| + \int_t^T |\ell_\varepsilon(s,X(s),u(s))-\ell(s,X(s),u(s))|\,ds\Big].
\]
This implies that J_\varepsilon(t,x,u) \to J(t,x,u) uniformly in t \in [0,T], u \in \mathcal{U}_t and x in a compact set, and the result follows directly from the above expression for |J_\varepsilon(t,x,u)-J(t,x,u)|.
[Semiconcavity of the value function under stronger assumptions] Let us recall that a function \varphi is said to be semiconcave (with constant K) if \forall\, x,y \in \mathbb{R}^n, \lambda \in [0,1],
\[
\lambda\varphi(x) + (1-\lambda)\varphi(y) - \varphi(\lambda x + (1-\lambda)y) \le K\lambda(1-\lambda)|x-y|^2.
\]
The trivial and useful example of a semiconcave, non-concave function is |x|^2, thanks to the identity
\[
\lambda|x|^2 + (1-\lambda)|y|^2 = |\lambda x + (1-\lambda)y|^2 + \lambda(1-\lambda)|x-y|^2.
\]
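The identity can be verified by expanding the squares; a quick numerical confirmation (a trivial sketch, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
max_err = 0.0
for _ in range(100):
    x, y = rng.normal(size=2), rng.normal(size=2)   # random points in R^2
    lam = rng.uniform()
    z = lam * x + (1 - lam) * y
    lhs = lam * (x @ x) + (1 - lam) * (y @ y)
    rhs = z @ z + lam * (1 - lam) * ((x - y) @ (x - y))
    max_err = max(max_err, abs(lhs - rhs))
print(max_err)  # numerically zero
```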
Let us assume that g(\cdot) and \ell(t,\cdot,u) are semiconcave uniformly in (t,u). Moreover, let us suppose that b(t,\cdot,u) and \sigma(t,\cdot,u) are differentiable and their derivatives are Lipschitz uniformly in (t,u).
Proposition 14. [Semiconcavity of v ] Under the above assumptions, the function v(t, ·)
is semiconcave, uniformly in t ∈ [0, T ].
Sketch of the proof: Take two points x_1, x_2 \in \mathbb{R}^n and set x_\lambda := \lambda x_1 + (1-\lambda)x_2. Fix u \in \mathcal{U}_t such that
\[
J(t,x_\lambda,u) \le v(t,x_\lambda) + \varepsilon.
\]
Given this fact, it is enough to prove the semiconcavity of J(t,\cdot,u) with an associated constant independent of u. Let us denote by X_1, X_2 and X_\lambda the states associated with the initial conditions x_1, x_2 and x_\lambda. Due to the nonlinearity of b and \sigma we have in general X_\lambda \neq \lambda X_1 + (1-\lambda)X_2, and we cannot apply directly the semiconcavity of \ell(t,\cdot,u) and g(\cdot). So we have to work a little... In order to simplify the notation, suppose that \ell \equiv 0.
Let us define X^\lambda(t) := \lambda X_1 + (1-\lambda)X_2. In order to use the semiconcavity of g we need to estimate
\[
\mathbb{E}\Big[\sup_{s\in[t,T]}|X^\lambda(s)-X_\lambda(s)|\Big].
\]
As usual this will depend on the difference of the coefficients. For the coefficient of the dt part we have
\[
\big|\lambda b(s,X_1(s)) + (1-\lambda)b(s,X_2(s)) - b(s,X^\lambda(s))\big| \le K\lambda(1-\lambda)|X_1(s)-X_2(s)|^2.
\]
This is easily checked using that b(s,\cdot) has a Lipschitz derivative. Analogously, we obtain
\[
\big|\lambda\sigma(s,X_1(s)) + (1-\lambda)\sigma(s,X_2(s)) - \sigma(s,X^\lambda(s))\big| \le K\lambda(1-\lambda)|X_1(s)-X_2(s)|^2.
\]
Then use the standard procedure: write |X^\lambda(s)-X_\lambda(s)|^2, apply the BDG inequality and then use the Gronwall inequality to get the result (Exercise!).
The Hamilton-Jacobi-Bellman equation
Now, we make the link with PDEs. In fact, for the moment, we prove that if v \in C^{1,2}([0,T]\times\mathbb{R}^n) then it satisfies a second order Hamilton-Jacobi-Bellman (HJB) equation (compare with proposition 10). Let us define the functions
\[
\hat H : [0,T]\times\mathbb{R}^n\times U\times\mathbb{R}^n\times\mathbb{R}^{n\times n} \to \mathbb{R}, \qquad H : [0,T]\times\mathbb{R}^n\times\mathbb{R}^n\times\mathbb{R}^{n\times n} \to \mathbb{R}
\]
as
where
\[
A[u]f(t,x) = b(t,x,u)^{\top}Df(t,x) + \tfrac12\,\mathrm{Tr}\big[\sigma\sigma^{\top}(t,x,u)\,D^2 f(t,x)\big].
\]
“An optimal policy has the property that whatever the initial
state and initial decision are, the remaining decisions must
constitute an optimal policy with regard to the state
resulting from the first decision.”
R. Bellman [4]
The following property will be crucial in order to characterize the value function as the
solution of a PDE.
Theorem 19. [Dynamic programming principle] For any (t,x) \in [0,T]\times\mathbb{R}^n and \bar t \in [t,T] we have
\[
v(t,x) = \inf_{u\in\mathcal{U}_t}\mathbb{E}\Big[\int_t^{\bar t}\ell\big(s,X^{t,x}[u](s),u(s)\big)\,ds + v\big(\bar t,X^{t,x}[u](\bar t)\big)\Big].
\]
Proof: Dynkin theorem implies that each time we freeze W(\cdot) up to time \bar t (i.e. we condition w.r.t. \mathcal{F}_{\bar t} and evaluate at \omega), there exists an admissible control \hat u_\omega \in \mathcal{U}_{\bar t} such that \hat u_\omega(s) = u(s,\omega) for all s \in [\bar t,T]. Conditioning w.r.t. \mathcal{F}_{\bar t} we get
\[
J(t,x,u) = \mathbb{E}\Big[\int_t^{\bar t}\ell\big(s,X^{t,x}[u](s),u(s)\big)\,ds + J\big(\bar t,X^{t,x}[u](\bar t),\hat u\big)\Big] \ge \mathbb{E}\Big[\int_t^{\bar t}\ell\big(s,X^{t,x}[u](s),u(s)\big)\,ds + v\big(\bar t,X^{t,x}[u](\bar t)\big)\Big],
\]
which implies that
\[
v(t,x) \ge \inf_{u\in\mathcal{U}_t}\mathbb{E}\Big[\int_t^{\bar t}\ell\big(s,X^{t,x}[u](s),u(s)\big)\,ds + v\big(\bar t,X^{t,x}[u](\bar t)\big)\Big].
\]
The other inequality is rather difficult to prove, especially for more general problems where the value function v is not a priori continuous. There are different types of proofs: El Karoui [9] used delicate measurable selection theorems. Yong and Zhou [24] consider a weak formulation of stochastic optimal control problems (i.e. the probability space is a part of the control). B. Bouchard and N. Touzi [6] established a general "weak form" of the dynamic programming principle, which allows one to establish the HJB equation (26) in a very general framework.
Proof: For simplicity we will suppose that the derivatives \partial_t v, Dv and D^2 v are bounded. Otherwise, we have to use a localization argument with stopping times (as in the proof of proposition 10). Given u \in U define a control u(\cdot) \in \mathcal{U}_t as u(s,\omega) \equiv u. By the dynamic programming principle we have
\[
v(t,x) \le \mathbb{E}\Big[\int_t^{\bar t}\ell\big(s,X^{t,x}[u](s),u\big)\,ds + v\big(\bar t,X^{t,x}[u](\bar t)\big)\Big]. \tag{27}
\]
Applying Itô's formula to v, dividing by \bar t - t and letting \bar t \downarrow t, we obtain
\[
\partial_t v(t,x) + \hat H\big(t,x,u,Dv(t,x),D^2 v(t,x)\big) \ge 0.
\]
By taking the infimum over u \in U we get
\[
\partial_t v(t,x) + H\big(t,x,Dv(t,x),D^2 v(t,x)\big) \ge 0.
\]
For the converse inequality, let u_{\varepsilon,\bar t} \in \mathcal{U}_t be an \varepsilon(\bar t - t)-optimal control in the dynamic programming principle. Again, using Itô's formula, we get (we simplify the notation since the context is clear)
\[
\varepsilon(t-\bar t) \ge \mathbb{E}\Big[\int_t^{\bar t}\Big\{\partial_t v\big(s,X[u_{\varepsilon,\bar t}](s)\big) + \hat H\big(s,X[u_{\varepsilon,\bar t}],u_{\varepsilon,\bar t},Dv(s,X[u_{\varepsilon,\bar t}]),D^2 v(s,X[u_{\varepsilon,\bar t}])\big)\Big\}\,ds\Big]
\]
\[
\ge \mathbb{E}\Big[\int_t^{\bar t}\Big\{\partial_t v\big(s,X[u_{\varepsilon,\bar t}](s)\big) + H\big(s,X[u_{\varepsilon,\bar t}](s),Dv(s,X[u_{\varepsilon,\bar t}]),D^2 v(s,X[u_{\varepsilon,\bar t}])\big)\Big\}\,ds\Big].
\]
Using the uniform continuity of the functions, we can divide by \bar t - t and pass to the limit to get the result.
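To make the HJB equation concrete, the sketch below solves it by an explicit finite-difference scheme for an illustrative scalar problem (all data here are assumptions, not from the notes): dX = u ds + sigma dW with running cost X^2 + u^2 and g = 0. Minimizing over u gives u* = -v_x/2 and the HJB equation dv/dt + (sigma^2/2) v_xx + x^2 - v_x^2/4 = 0 with v(T, .) = 0, whose closed-form solution v(t,x) = tanh(T-t) x^2 + sigma^2 log cosh(T-t) is used both for the Dirichlet boundary data and for validation.

```python
import numpy as np

sigma, T = 1.0, 1.0
L, nx = 4.0, 161
x = np.linspace(-L, L, nx)
dx = x[1] - x[0]
dt = 0.2 * dx ** 2 / sigma ** 2          # CFL-type restriction for the explicit scheme
nt = int(np.ceil(T / dt))
dt = T / nt

def exact(t):
    # closed-form solution of this LQ-type problem (used as reference)
    return np.tanh(T - t) * x ** 2 + sigma ** 2 * np.log(np.cosh(T - t))

v = np.zeros(nx)                          # terminal condition v(T, .) = 0
for m in range(1, nt + 1):                # march backwards in time from T to 0
    vx = (v[2:] - v[:-2]) / (2 * dx)
    vxx = (v[2:] - 2 * v[1:-1] + v[:-2]) / dx ** 2
    v_new = v.copy()
    v_new[1:-1] = v[1:-1] + dt * (0.5 * sigma ** 2 * vxx + x[1:-1] ** 2 - vx ** 2 / 4)
    tm = T - m * dt
    v_new[0], v_new[-1] = exact(tm)[0], exact(tm)[-1]   # exact Dirichlet data
    v = v_new

i = int(np.argmin(np.abs(x - 1.0)))       # compare at the interior point x = 1
err = abs(v[i] - exact(0.0)[i])
print(err)  # discretization error
```

Note that the value function of this problem is quadratic in x, so the spatial differences are exact and the error comes from the explicit time stepping only.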
(ii) We have the following verification argument for open-loop controls. An admissible pair (\bar x(\cdot),\bar u(\cdot)) is optimal for v(t,x) iff, for a.e. s \in [t,T],
\[
\partial_t v(s,\bar x(s)) + \hat H\big(s,\bar x(s),\bar u(s),Dv(s,\bar x(s)),D^2 v(s,\bar x(s))\big) = 0.
\]
(iii) We have the following verification argument for Markov or feedback controls. Suppose that for each (t,x) there exists \bar u(t,x) \in U such that
\[
\hat H\big(t,x,\bar u(t,x),Dv(t,x),D^2 v(t,x)\big) = H\big(t,x,Dv(t,x),D^2 v(t,x)\big),
\]
and that the SDE
\[
dX(s) = b\big(s,X(s),\bar u(s,X(s))\big)\,ds + \sigma\big(s,X(s),\bar u(s,X(s))\big)\,dW(s), \quad X(t) = x, \tag{28}
\]
admits a unique solution for all (t,x). If s \in [t,T] \mapsto \bar u(s,X(s)) \in U is admissible, then it is optimal.
Proof: Let u \in \mathcal{U}_t with associated trajectory x(\cdot), x(t) = x. Then we have
\[
\mathbb{E}\big[g(x(T))\big] = v(t,x) + \mathbb{E}\Big[\int_t^T\big\{\partial_t v(s,x(s)) + A[u(s)]v(s,x(s))\big\}\,ds\Big]
\ge v(t,x) + \mathbb{E}\Big[\int_t^T\big\{\partial_t v(s,x(s)) + H\big(s,x(s),Dv(s,x(s)),D^2 v(s,x(s))\big) - \ell(s,x(s),u(s))\big\}\,ds\Big],
\]
which, using equation (26), proves (i). For (ii), the inequality becomes an equality, from which the result follows easily. Finally, (iii) is a direct consequence of (ii).
Viscosity solutions
As we have seen, if the value function is regular enough, then it solves equation (26). This equation can be written in the abstract form
\[
F\big(t,x,v(t,x),\partial_t v(t,x),Dv(t,x),D^2 v(t,x)\big) = 0, \quad \text{plus limit conditions.} \tag{29}
\]
In general this equation does not admit a classical solution. However, we can expect to define a weak type of solution such that (29) is well posed, i.e. it has a unique solution. The correct notion is that of viscosity solutions (introduced in this context by Lions in [15, 16]).
However, we need to pay attention to the following fact: one of the crucial assumptions in the theory of viscosity solutions is that F(t,x,v,q,p,P) is monotone w.r.t. P. The usual convention for the type of monotonicity is that F is non-increasing w.r.t. P. This is also called the ellipticity condition. In order to follow this convention, we have to write (26) as
\[
-\partial_t v(t,x) + G\big(t,x,Dv(t,x),D^2 v(t,x)\big) = 0, \quad \text{where} \quad G(t,x,p,P) := \sup_{u\in U}\,-\hat H(t,x,u,p,P).
\]
[Definition of viscosity solutions] (i) We say that v \in C([0,T]\times\mathbb{R}^n) is a viscosity subsolution of (30) if
\[
v(T,x) \le g(x),
\]
and for any \varphi \in C^{1,2}([0,T]\times\mathbb{R}^n), whenever v-\varphi attains a local maximum at (t,x) we have
\[
-\partial_t\varphi(t,x) + G\big(t,x,D\varphi(t,x),D^2\varphi(t,x)\big) \le 0.
\]
(ii) We say that v is a viscosity supersolution of (30) if
\[
v(T,x) \ge g(x),
\]
and for any \varphi \in C^{1,2}([0,T]\times\mathbb{R}^n), whenever v-\varphi attains a local minimum at (t,x) we have
\[
-\partial_t\varphi(t,x) + G\big(t,x,D\varphi(t,x),D^2\varphi(t,x)\big) \ge 0.
\]
A viscosity solution is a function which is both a viscosity subsolution and a viscosity supersolution.
[Some important remarks] The following properties are proposed as an exercise (in increasing order of difficulty):
(i) We can suppose that at the test point (t,x) we have v(t,x) = \varphi(t,x).
(ii) We can replace "local maximum" and "local minimum" by "strict local maximum" and "strict local minimum".
(iii) We can replace "strict local maximum" and "strict local minimum" by "strict global maximum" and "strict global minimum".
In our stochastic framework, it will be convenient to work with the last notion in order to avoid localization arguments with stopping times (see the proof of theorem 22).
Proof: We essentially repeat the proof for the regular case, but taking into account the test functions in order to be able to differentiate. Let \varphi \in C^{1,2}([0,T]\times\mathbb{R}^n) and (t,x) be such that v-\varphi has a global maximum at (t,x). Let u \in U and define u(\cdot) \in \mathcal{U}_t as u(s,\omega) \equiv u. Let us denote by
x(\cdot) the associated state, with x(t) = x. We have, for any \bar t > t,
\[
\mathbb{E}\Big[\int_t^{\bar t}\big\{v(s,x(s)) - \varphi(s,x(s)) - [v(t,x)-\varphi(t,x)]\big\}\,ds\Big] \le 0.
\]
Therefore,
\[
\mathbb{E}\Big[\int_t^{\bar t}\big\{-\partial_t\varphi(s,x(s)) - \ell(s,x(s),u) - A[u]\varphi(s,x(s))\big\}\,ds\Big] \le 0.
\]
By taking the supremum w.r.t. u \in U we obtain
\[
\mathbb{E}\Big[\int_t^{\bar t}\big\{-\partial_t\varphi(s,x(s)) + G\big(s,x(s),D\varphi(s,x(s)),D^2\varphi(s,x(s))\big)\big\}\,ds\Big] \le 0.
\]
Dividing by \bar t - t and taking the limit \bar t \downarrow t yields the subsolution property.
For the supersolution property, let \varphi \in C^{1,2}([0,T]\times\mathbb{R}^n) and (t,x) be such that v-\varphi has a global minimum at (t,x). Thus, for \bar t > t, we have for every adapted x(\cdot) starting at x,
\[
\mathbb{E}\Big[\int_t^{\bar t}\big\{v(s,x(s)) - \varphi(s,x(s)) - [v(t,x)-\varphi(t,x)]\big\}\,ds\Big] \ge 0. \tag{31}
\]
Using the dynamic programming principle, choose u_{\varepsilon,\bar t} \in \mathcal{U}_t, with associated state x_{\varepsilon,\bar t}, such that
\[
\mathbb{E}\Big[\int_t^{\bar t}\big\{v(t,x) - v(s,x_{\varepsilon,\bar t}(s))\big\}\,ds\Big] \ge \mathbb{E}\Big[\int_t^{\bar t}\ell\big(s,x_{\varepsilon,\bar t}(s),u_{\varepsilon,\bar t}(s)\big)\,ds\Big] - \varepsilon(\bar t - t).
\]
which implies that
\[
\mathbb{E}\Big[\int_t^{\bar t}\big\{-\partial_t\varphi(s,x_{\varepsilon,\bar t}(s)) + G\big(s,x_{\varepsilon,\bar t}(s),D\varphi(s,x_{\varepsilon,\bar t}(s)),D^2\varphi(s,x_{\varepsilon,\bar t}(s))\big)\big\}\,ds\Big] \ge -\varepsilon(\bar t - t).
\]
Dividing by \bar t - t and letting \bar t \downarrow t and \varepsilon \downarrow 0 yields the supersolution property.
[Basic results] We first provide a proposition that implies that viscosity solutions
qualify as generalized solutions:
Proposition 15. [Equivalence of notions under regularity] Let v \in C^{1,2}([0,T]\times\mathbb{R}^n). Then v is a viscosity solution of (30) iff it is a classical solution.
Proof: The proof for the subsolution (supersolution) property follows very easily from the first and second order optimality conditions for a maximum (minimum) of v-\varphi at some (t,x), together with the ellipticity of G.
In fact, in order for v to satisfy the equation pointwise, we can ask, instead of v \in C^{1,2}, that v admits a "first order expansion in t" and a second order expansion in x.
Lemma 8. [Relation with "second order expansions in x"] Suppose that for all (t,x) there exists X \in S^n (for notational convenience we denote D^2 v(t,x) = X) such that
\[
v(s,y) = v(t,x) + \partial_t v(t,x)(s-t) + Dv(t,x)(y-x) + \tfrac12\langle X(y-x),y-x\rangle + o\big(|s-t|+|y-x|^2\big).
\]
Then, if v is a viscosity subsolution (supersolution) of (30), at (t,x) we have
\[
-\partial_t v(t,x) + G\big(t,x,Dv(t,x),D^2 v(t,x)\big) \le (\ge)\ 0.
\]
Proof: For the subsolution case, it is enough to test with the quadratic function
\[
\varphi_{\varepsilon,\delta}(s,y) := v(t,x) + (\partial_t v(t,x)+\varepsilon)(s-t) + Dv(t,x)(y-x) + \tfrac12\big\langle[D^2 v(t,x)+\delta I](y-x),y-x\big\rangle.
\]
In fact, v-\varphi_{\varepsilon,\delta} has a local maximum at (t,x). We thus find that
\[
-(\partial_t v(t,x)+\varepsilon) + G\big(t,x,Dv(t,x),D^2 v(t,x)+\delta I\big) \le 0,
\]
and the result follows by letting \varepsilon,\delta \to 0. An analogous argument applies for the supersolution property.
Now we turn our attention to an important and rather surprising stability result.
Proposition 16. [Stability result] Let v_\varepsilon be a viscosity solution of (32), where, for every \varepsilon > 0, the functions G_\varepsilon and g_\varepsilon are continuous. Suppose that, as \varepsilon \downarrow 0, we have G_\varepsilon \to G, g_\varepsilon \to g and v_\varepsilon \to v uniformly over any compact set. Then v is a viscosity solution of (33).
Proof: The proof, as well as several other arguments in this theory, is based on the basic lemma 9 below. Let us prove that v is a subsolution of (33). Let \varphi \in C^{1,2} and (t,x) be such that v-\varphi has a strict maximum at (t,x). Using the lemma below, we obtain the existence of (t_n,x_n) \to (t,x) such that v_n-\varphi has a local maximum at (t_n,x_n). Using that v_n is a subsolution of (32), we have
\[
-\partial_t\varphi(t_n,x_n) + G_{\varepsilon_n}\big(t_n,x_n,D\varphi(t_n,x_n),D^2\varphi(t_n,x_n)\big) \le 0.
\]
By passing to the limit, we obtain the result. The supersolution property follows by the same procedure.
Proof of the lemma: There exists \delta > 0 such that v(x_0) > v(x) for x \in B_\delta(x_0)\setminus\{x_0\}. Therefore, since v_n \to v uniformly on \bar B_\delta(x_0), for n large enough any maximum x_n of v_n in \bar B_\delta(x_0) belongs to B_\delta(x_0). In order to prove that x_n \to x_0, let \bar x be a limit point of x_n. Then, because of the uniform convergence, v_n(x_n) \to v(\bar x) = v(x_0) (uniform convergence implies convergence of the maxima of the functions; to see this note that v_n(x_n) \ge v_n(z) for all z \in \bar B_\delta(x_0) and then pass to the limit). Therefore, because x_0 is the unique maximum of v in B_\delta(x_0), we get \bar x = x_0, and the proof is complete.
Now we address the important issue of uniqueness of the viscosity solution of (30).
Theorem 23. [Uniqueness theorem] The value function is the unique viscosity solution of (30).
As an important corollary we get an error estimate for the vanishing viscosity approximation of (30). In fact, equation (30) can be degenerate and in general does not have regular solutions. A natural way of regularizing the equation is to add a small multiple of the Laplacian (the vanishing viscosity approximation).
We will not prove the uniqueness theorem exactly as stated. However, we will prove a comparison principle for a model second order HJB equation. Up to important technical matters (see [24]), the same procedure allows one to prove the uniqueness of the viscosity solution of (30).
Let O \subseteq \mathbb{R}^n be an open and bounded set. Let us consider the equation
\[
H\big(v(x),Dv(x),D^2 v(x)\big) = 0, \quad x \in O, \tag{35}
\]
where H : \mathbb{R}\times\mathbb{R}^n\times S^n \to \mathbb{R}. We assume:
(i) H is continuous.
(ii) [Ellipticity condition] H is non-increasing w.r.t. the last variable, i.e. for every A, B \in S^n with A \le B we have
\[
H(v,p,A) \ge H(v,p,B) \quad \text{for all } (v,p) \in \mathbb{R}\times\mathbb{R}^n.
\]
The definition of subsolutions and supersolutions is of course analogous to the one given for the parabolic case. The result that we want to prove is the following:
Theorem 24. Let v and \hat v be a subsolution and a supersolution, respectively, of (35). If v \le \hat v on \partial O, then v \le \hat v in O.
Proof for the regular case: Let us assume that v,\hat v \in C^2(\bar O) (or, more generally, that they admit a second order expansion). We argue by contradiction. Suppose that M := \sup_{x\in\bar O}[v(x)-\hat v(x)] = v(x_0)-\hat v(x_0) > 0 for some x_0 \in O. Then
\[
Dv(x_0) = D\hat v(x_0) \quad \text{and} \quad D^2 v(x_0) \le D^2\hat v(x_0). \tag{36}
\]
By combining (36) with the subsolution and supersolution inequalities and using the ellipticity property, we obtain
\[
H\big(v(x_0),D\hat v(x_0),D^2\hat v(x_0)\big) - H\big(\hat v(x_0),D\hat v(x_0),D^2\hat v(x_0)\big) \le 0,
\]
which, when H is strictly increasing w.r.t. its first argument (the standard assumption in this setting), contradicts v(x_0) > \hat v(x_0).
For the proof of the general case, we will try to "mimic" the above proof. We will need to approximate v and \hat v in such a way that they become almost twice differentiable. The good approximation is by semiconvex functions, and one way to achieve it is to use the so-called inf-convolutions and sup-convolutions.
The following arguments and the proof of the uniqueness result are based on the notes [8].
Why are semiconvex functions the good regularization? We provide below two theorems that answer this question:
Theorem 25. [Alexandrov theorem] Let w be a semiconvex function with constant M over the open set O. Then, for a.a. x \in O, the function w admits a second order expansion, i.e. there exists X \in S^n (which depends on x) such that
\[
w(y) = w(x) + Dw(x)(y-x) + \tfrac12\langle X(y-x),y-x\rangle + o\big(|y-x|^2\big).
\]
Theorem 26. [Jensen maximum principle] Let w : O \to \mathbb{R} be a semiconvex function with a strict local maximum at some x_0 \in O. More precisely, set \alpha := w(x_0) - \max_{\partial B_r(x_0)} w > 0, where r > 0. For \delta > 0, define
\[
E_\delta := \big\{x \in B_r(x_0) \,;\, \exists\, p \in \mathbb{R}^n,\ |p| \le \delta,\ w(y) - \langle p,y-x\rangle \le w(x)\ \forall\, y \in B_r(x_0)\big\}.
\]
Then E_\delta has positive Lebesgue measure for every small \delta > 0.
We will use both theorems in the following way: if a point x \in E_\delta admits a second order expansion, then
\[
|Dw(x)| \le \delta, \quad D^2 w(x) \le 0, \quad \text{and} \quad w(y) \le w(x) + \langle Dw(x),y-x\rangle\ \ \forall\, y \in B_r(x_0).
\]
Proposition 18. Consider a semiconvex function w with a strict local maximum at x_0. Then, there exists a sequence x_n \to x_0 such that w has a second order expansion at each x_n and
\[
Dw(x_n) \to 0 \quad \text{and} \quad D^2 w(x_n) \le 0.
\]
Proof: It suffices to consider points x_n \in E_{1/n} at which w has a second order expansion. The announced properties then follow from the above remark, except for the convergence of x_n to x_0. But w(y) \le w(x_n) + \langle Dw(x_n),y-x_n\rangle for all y \in B_r(x_0), which, by passing to the limit and using that x_0 is a strict local maximum, implies that every limit point of x_n is equal to x_0. The result follows.
Now we state a version of the above proposition that does not ask for a strict local maximum.
Proposition 19. Consider a semiconvex function w with a local maximum at x_0. Then there exist a matrix X \in S^n and a sequence x_n \to x_0 such that w has a second order expansion at each x_n and
\[
Dw(x_n) \to 0 \quad \text{and} \quad D^2 w(x_n) \to X \le 0.
\]
Proof: Apply proposition 18 to the semiconvex functions w_k(x) := w(x) - \tfrac1k|x-x_0|^2 and use a diagonal procedure (exercise!).
Given a continuous function v on a compact set K, for \alpha > 0 the sup-convolution v^\alpha and the inf-convolution v_\alpha of v are defined as
\[
v^\alpha(x) := \max_{y\in K}\Big[v(y) - \tfrac1\alpha|x-y|^2\Big], \qquad v_\alpha(x) := \min_{y\in K}\Big[v(y) + \tfrac1\alpha|x-y|^2\Big].
\]
Note that v_\alpha = -(-v)^\alpha, which allows us to extend properties of the sup-convolution to the inf-convolution.
Lemma 11. [A convergence result for v^\alpha] For all \alpha > 0, we have that v^\alpha \ge v. Also, we have that v^\alpha(x) \to v(x) for all x \in K. Moreover,
\[
\lim_{\alpha\downarrow 0,\ x_\alpha\to x} v^\alpha(x_\alpha) = v(x). \tag{38}
\]
Proof: Taking y = x in the definition of v^\alpha we get that v^\alpha(x) \ge v(x). Now, we prove that \limsup_{\alpha\downarrow 0,\ x_\alpha\to x} v^\alpha(x_\alpha) \le v(x). In fact, by compactness we have the existence of y_\alpha such that
\[
v^\alpha(x_\alpha) = v(y_\alpha) - \tfrac1\alpha|x_\alpha-y_\alpha|^2 \le M - \tfrac1\alpha|x_\alpha-y_\alpha|^2,
\]
where M bounds |v| on K. Hence
\[
\tfrac1\alpha|x_\alpha-y_\alpha|^2 \le M - v^\alpha(x_\alpha) \le M - v(x_\alpha) \le 2M,
\]
so that y_\alpha \to x. Consider a subsequence such that \lim v^{\alpha_n}(x_{\alpha_n}) = \limsup_{\alpha\downarrow 0,\ x_\alpha\to x} v^\alpha(x_\alpha). We have
\[
\lim v^{\alpha_n}(x_{\alpha_n}) \le \lim v(y_{\alpha_n}) = v(x).
\]
Therefore,
\[
v(x) = \lim v(x_\alpha) \le \liminf v^\alpha(x_\alpha) \le \limsup v^\alpha(x_\alpha) \le v(x).
\]
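The sup-convolution is easy to compute on a grid. The sketch below (illustrative data, not from the notes) takes v(x) = -|x| on K = [-1, 1], checks v^alpha >= v, and shows that the uniform gap max(v^alpha - v) shrinks as alpha decreases (for this particular v the gap is roughly alpha/4).

```python
import numpy as np

x = np.linspace(-1.0, 1.0, 801)
v = -np.abs(x)                          # a nonsmooth concave test function on K = [-1, 1]

def sup_conv(v, x, alpha):
    # v^alpha(x_i) = max_j [ v(y_j) - |x_i - y_j|^2 / alpha ], grid version
    diff = x[:, None] - x[None, :]
    return np.max(v[None, :] - diff ** 2 / alpha, axis=1)

gaps = []
for alpha in (0.5, 0.1, 0.02):
    va = sup_conv(v, x, alpha)
    assert np.all(va >= v - 1e-12)      # v^alpha >= v (take y = x in the max)
    gaps.append(float(np.max(va - v)))
print(gaps)  # uniform gaps shrink as alpha -> 0
```

The computed v^alpha is also smooth at the kink of v, which is exactly the regularizing effect the comparison-principle proof exploits.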
If v is a subsolution of (35) in O, then v^\alpha is a subsolution of the same equation but on the open set
\[
O^{\alpha,k} := \Big\{x \in O \,;\, v^\alpha > -k,\ d(x,\partial O) > \big[\alpha\big(\sup_O v + k\big)\big]^{1/2}\Big\}.
\]
Finally, we will need another simple lemma, whose easy proof is left to the reader: if w^\alpha \to w uniformly on the compact set K, then
\[
\lim_{\alpha\downarrow 0}\max_K w^\alpha = \max_K w.
\]
Proof of the comparison principle: We argue by contradiction. If the result is not true, since we have v \le \hat v on \partial O, there exists \hat x \in O such that v(\hat x) > \hat v(\hat x). Let us denote M := \max_{x\in\bar O}\{v(x)-\hat v(x)\} > 0. As in the proof of the comparison principle for first order equations (see e.g. [3]), we double variables by introducing, for \varepsilon > 0,
\[
w_\varepsilon(x,y) := v(x) - \hat v(y) - \tfrac1{\varepsilon^2}|x-y|^2,
\]
which we know "approximates well" the function v-\hat v (see e.g. [3]). By continuity and convergence we have \max_{\bar O\times\bar O} w_\varepsilon > 0 for \varepsilon small enough. Also, there exists \theta > 0 (independent of \varepsilon) such that for every maximum point (x_\varepsilon,y_\varepsilon) of w_\varepsilon we have
\[
d(x_\varepsilon,\partial O) > \theta \quad \text{and} \quad d(y_\varepsilon,\partial O) > \theta, \quad \text{for all } \varepsilon \text{ small enough.}
\]
Now fix such an \varepsilon. To complete the proof we follow these steps:
(i) We modify the function w_\varepsilon to
\[
w_{\varepsilon,\alpha}(x,y) := v^\alpha(x) - \hat v_\alpha(y) - \tfrac1{\varepsilon^2}|x-y|^2 \quad \text{for all } \alpha > 0. \tag{39}
\]
This allows us to obtain functions that admit a.e. a second order expansion and to apply optimality conditions at a maximum point. In this way we try to proceed as if the functions were differentiable from the start. To use v^\alpha is natural because we know that it is a subsolution over a smaller domain O^\alpha. Analogously, to use \hat v_\alpha is natural since it is a supersolution over a smaller domain O_\alpha.
(ii) We construct O^\alpha and O_\alpha in such a way that any maximum (x_\alpha,y_\alpha) of w_{\varepsilon,\alpha} satisfies x_\alpha \in O^\alpha and y_\alpha \in O_\alpha. In fact, to do this we need the following lemma (whose easy proof is left to the reader):
Lemma 14. Let f^\alpha and f be u.s.c. over a compact set K. Suppose that
\[
f^\alpha \ge f \quad \text{and} \quad \lim_{\alpha\downarrow 0,\ x_\alpha\to x} f^\alpha(x_\alpha) = f(x).
\]
Then, for every \varepsilon > 0, there exists \alpha_0 > 0 such that: for all \alpha \in (0,\alpha_0) and x_\alpha \in \operatorname{argmax} f^\alpha there exists x \in \operatorname{argmax} f such that |x-x_\alpha| \le \varepsilon. Moreover,
\[
\lim_{\alpha\downarrow 0}\max_K f^\alpha = \max_K f.
\]
Using the above lemma we see that, for \alpha small enough, (x_\alpha,y_\alpha) is uniformly close to some (x_\varepsilon,y_\varepsilon) \in \operatorname{argmax} w_\varepsilon. In particular, d(x_\alpha,\partial O) > \theta/2 and d(y_\alpha,\partial O) > \theta/2. The same lemma implies that
\[
\lim_{\alpha\downarrow 0}\max_{\bar O\times\bar O} w_{\varepsilon,\alpha} = \max_{\bar O\times\bar O} w_\varepsilon.
\]
From the definition in terms of v and \hat v and the fact that
\[
v^\alpha(x_\alpha) - \hat v_\alpha(y_\alpha) - \tfrac1{\varepsilon^2}|x_\alpha-y_\alpha|^2 = \max w_{\varepsilon,\alpha} > 0,
\]
we readily obtain that v^\alpha(x_\alpha) and \hat v_\alpha(y_\alpha) are bounded by a constant independent of (\varepsilon,\alpha). Therefore there exists a constant k such that
\[
x_\alpha \in O^\alpha := \Big\{x \in O \,;\, v^\alpha(x) > -k,\ d(x,\partial O) > \big[\alpha\big(\sup_O v + k\big)\big]^{1/2}\Big\},
\]
\[
y_\alpha \in O_\alpha := \Big\{x \in O \,;\, \hat v_\alpha(x) < k,\ d(x,\partial O) > \big[\alpha\big(\inf_O \hat v + k\big)\big]^{1/2}\Big\}.
\]
(iii) We fix \alpha and a maximum (\bar x,\bar y) \in O^\alpha\times O_\alpha. We obtain information from this optimality thanks to the semiconvexity and proposition 19. In fact, there exist a sequence (x_n,y_n) \to (\bar x,\bar y) and a matrix A \in S^{2n}, with A \le 0, such that
\[
Dw_{\varepsilon,\alpha}(x_n,y_n) \to 0 \quad \text{and} \quad D^2 w_{\varepsilon,\alpha}(x_n,y_n) \to A. \tag{40}
\]
(iv) In view of the particular structure of w_{\varepsilon,\alpha} (a decoupled function plus a C^\infty function), we have an important piece of information: the fact that w_{\varepsilon,\alpha} admits a second order expansion at (x_n,y_n) implies that v^\alpha admits a second order expansion at x_n and \hat v_\alpha admits a second order expansion at y_n.
Moreover, we have
\[
Dw_{\varepsilon,\alpha}(x_n,y_n) = \Big(Dv^\alpha(x_n) - \tfrac{2}{\varepsilon^2}(x_n-y_n),\ -D\hat v_\alpha(y_n) + \tfrac{2}{\varepsilon^2}(x_n-y_n)\Big), \tag{41}
\]
\[
D^2 w_{\varepsilon,\alpha}(x_n,y_n) = \begin{pmatrix} D^2 v^\alpha(x_n) & 0 \\ 0 & -D^2\hat v_\alpha(y_n) \end{pmatrix} - \frac{2}{\varepsilon^2}\begin{pmatrix} I_n & -I_n \\ -I_n & I_n \end{pmatrix}.
\]
Therefore, writing X := \lim D^2 v^\alpha(x_n) and Y := \lim D^2\hat v_\alpha(y_n), (40) gives
\[
Dv^\alpha(x_n) \to \tfrac{2}{\varepsilon^2}(\bar x-\bar y), \qquad D\hat v_\alpha(y_n) \to \tfrac{2}{\varepsilon^2}(\bar x-\bar y), \tag{42}
\]
and
\[
A = \begin{pmatrix} X & 0 \\ 0 & -Y \end{pmatrix} - \frac{2}{\varepsilon^2}\begin{pmatrix} I_n & -I_n \\ -I_n & I_n \end{pmatrix} \le 0. \tag{43}
\]
Testing with vectors of the form (z,z) \in \mathbb{R}^n\times\mathbb{R}^n we get that X \le Y.
(v) Intuitively, we are almost done, because if we make the analogy with the regular case (recall that \bar x should play the role of \bar y), (42) is like Dv(\bar x) = D\hat v(\bar x) and (43) is like D^2 v(\bar x) \le D^2\hat v(\bar x). Let us finish the proof in the correct manner. Since v^\alpha is a subsolution of (35) in O^\alpha and it has a second order expansion at every x_n, we get
\[
H\big(v^\alpha(x_n),Dv^\alpha(x_n),D^2 v^\alpha(x_n)\big) \le 0.
\]
The continuity of H implies
\[
H\Big(v^\alpha(\bar x),\tfrac{2}{\varepsilon^2}(\bar x-\bar y),X\Big) \le 0.
\]
Analogously, since \hat v_\alpha is a supersolution of (35) in O_\alpha and it has a second order expansion at every y_n, we get
\[
H\Big(\hat v_\alpha(\bar y),\tfrac{2}{\varepsilon^2}(\bar x-\bar y),Y\Big) \ge 0.
\]
Subtracting the inequalities we get
\[
H\Big(v^\alpha(\bar x),\tfrac{2}{\varepsilon^2}(\bar x-\bar y),X\Big) - H\Big(\hat v_\alpha(\bar y),\tfrac{2}{\varepsilon^2}(\bar x-\bar y),Y\Big) \le 0,
\]
and, letting \alpha \downarrow 0 and then \varepsilon \downarrow 0 and arguing as in the regular case, this leads to a contradiction.
References
[1] R.B. Ash. Basic probability theory. Wiley, NY, 1970.
[2] R.B. Ash. Real Analysis and Probability. Academic Press, NY, 1972.
[3] M. Bardi and I. Capuzzo Dolcetta. Optimal control and viscosity solutions of Hamilton-Jacobi-Bellman equations. Birkhäuser, 1996.
[4] R. Bellman. Dynamic Programming. Princeton Univ. Press, Princeton, New Jersey, 1957.
[5] P. Billingsley. Convergence of Probability Measures. Wiley, NY, 1968.
[6] B. Bouchard and N. Touzi. Weak dynamic programming principle for viscosity solutions. SIAM J.
Control Optim., 49-3:948–962, 2011.
[7] L. Breiman. Probability. Addison-Wesley Publishing Company, Reading, MA, 1968.
[8] P. Cardaliaguet. Solutions de viscosité d'équations elliptiques et paraboliques non linéaires. Lecture Notes for the DEA program at Rennes, 2004.
[9] N. El Karoui. Les aspects probabilistes du contrôle stochastique. Lecture Notes in Math. 876, 1981.
[10] L.C. Evans and R.F. Gariepy. Measure theory and fine properties of functions. CRC Press, Boca Raton, FL, 1992. Studies in Advanced Mathematics.
[11] W.H. Fleming and H.M. Soner. Controlled Markov processes and viscosity solutions. Springer, New
York, 1993.
[12] N. Ikeda and S. Watanabe. Stochastic differential equations and diffusion processes. Second Edition,
North-Holland Publishing Co., Amsterdam, 1989.
[13] R. Jensen. The maximum principle for viscosity solutions of fully nonlinear second order partial
differential equations. Arch. Ration. Mech. Anal., 101-1:1–27, 1988.
[14] I. Karatzas and S.E. Shreve. Brownian Motion and Stochastic Calculus. Second edition. Springer-Verlag, New York, 1991.
[15] P.-L. Lions. Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations. I. The
dynamic programming principle and applications. Comm. Partial Differential Equations, 8(10):1101–
1174, 1983.
[16] P.-L. Lions. Optimal control of diffusion processes and Hamilton-Jacobi-Bellman equations. Part 2:
viscosity solutions and uniqueness. Comm. Partial Differential Equations, 8:1229–1276, 1983.
[17] P.A. Meyer. Probability and Potentials. Blaisdell Publishing Company, Waltham, Mass., 1966.
[18] L. Mou and J. Yong. A variational formula for stochastic controls and some applications. Pure and
Applied Mathematics Quarterly, 3:539–567, 2007.
[19] K.R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, New York, 1967.
[20] H. Pham. Optimisation et contrôle stochastique appliqués à la finance, volume 61 of Mathématiques
& Applications. Springer, Berlin, 2007.
[21] P. Protter. Stochastic integration and differential equations. Springer-Verlag, Berlin, 2nd edition, 2004.
[22] J. Steele. Stochastic calculus and financial applications. Springer-Verlag, New York, 2001.
[23] N. Touzi. Optimal Stochastic Control, Stochastic Target Problems, and Backward SDEs. Lecture
Notes at the Fields Institute, 2010.
[24] J. Yong and X.Y. Zhou. Stochastic controls, Hamiltonian systems and HJB equations. Springer-Verlag,
New York, Berlin, 2000.