Chapter 3
for all $t > 0$, $s > s_n > \cdots > s_1 \ge 0$ and $i, j, i_k \in S$. This is the obvious analogue of the Markov property when the discrete time variable $l$ is replaced by a continuous parameter $t$. We refer to equation (1.1.1) as the Markov property and to the quantities $P[X_{s+t} = j \mid X_s = i]$ as transition probabilities or matrices.
We represent the transition probabilities $P[X_{s+t} = j \mid X_s = i]$ by a possibly infinite matrix $P^s_{s+t}$. Making the time homogeneity assumption, as in the case of Markov chains, we deduce that the matrix $P^s_{s+t}$ depends only on the difference $(s+t) - s = t$, and therefore we simply write $P_t$ instead of $P^s_{s+t}$. Thus for a continuous time Markov chain, the family of matrices $P_t$ (generally infinite matrices) replaces the single transition matrix $P$ of a Markov chain.
In the case of Markov chains the matrix of transition probabilities after $l$ units of time is given by $P^l$. The analogous statement for a continuous time Markov chain is
$$P_{s+t} = P_t P_s. \qquad (1.1.2)$$
This equation is known as the semi-group property. As usual we write $P^{(t)}_{ij}$ for the $(i,j)^{th}$ entry of the matrix $P_t$. The proof of (1.1.2) is similar to that of the analogous statement for Markov chains, viz., that the matrix of transition probabilities after $l$ units of time is given by $P^l$. Here the transition probability from state $i$ to state $j$ after $t+s$ units of time is given by
$$\sum_k P^{(t)}_{ik} P^{(s)}_{kj} = P^{(t+s)}_{ij},$$
where the sum is over all states $k$ in the state space $S$; in matrix form,
$$P_s P_t = P_{s+t} = P_t P_s.$$
We also impose the additional requirement of right continuity on the paths $\omega \in \Omega$, in the form
$$\lim_{t \to 0^+} P_t = I. \qquad (1.1.3)$$
The limit here is meant entrywise for the matrix $P_t$. While no requirement of uniformity relative to the different entries of the matrix $P_t$ is imposed, we use this limit also in the sense that for any vector $v$ (in the appropriate function space) we have $\lim_{t\to 0^+} vP_t = v$. We define the infinitesimal generator of the continuous time Markov chain as the one-sided derivative
$$A = \lim_{h\to 0^+} \frac{P_h - I}{h}.$$
$A$ is a real matrix independent of $t$. For the time being, in a rather cavalier manner, we ignore the problem of the existence of this limit and proceed as if the matrix $A$ exists and has finite entries. Thus we define the derivative of $P_t$ at time $t$ as
$$\frac{dP_t}{dt} = \lim_{h\to 0^+} \frac{P_{t+h} - P_t}{h},$$
where the derivative is taken entrywise. The semi-group property implies that we can factor $P_t$ out of the right hand side of the equation. We have two choices, namely factoring $P_t$ out on the left or on the right. Therefore we get the equations
$$\frac{dP_t}{dt} = AP_t, \qquad \frac{dP_t}{dt} = P_t A. \qquad (1.1.4)$$
These differential equations are known as the Kolmogorov backward and for-
ward equations respectively. They have remarkable consequences some of
which we will gradually investigate.
The (possibly infinite) matrices $P_t$ are Markov or stochastic in the sense that their entries are non-negative and their row sums are 1. Similarly the matrix $A$ is not arbitrary. In fact, differentiating the relation that the row sums of $P_t$ equal 1 shows that the row sums of $A$ are 0; its off-diagonal entries are non-negative and its diagonal entries are non-positive. Given the infinitesimal generator $A$, we can in principle construct $P_t$ easily. The idea is to explicitly solve the Kolmogorov (forward or backward) equation. In fact if we replace the matrices $P_t$ and $A$ by scalars, we get the differential equation $\frac{dp}{dt} = ap$, which is easily solved by $p(t) = Ce^{at}$. Therefore we surmise the solution $P_t = Ce^{tA}$ for the Kolmogorov equations (with $C = P_0 = I$), where we have defined the exponential of a matrix $B$ as the infinite series
$$e^B = \sum_j \frac{B^j}{j!}. \qquad (1.1.5)$$
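For a finite state space this recipe is easy to carry out numerically. The following sketch (our own illustration; the two-state rates are arbitrary choices) computes $P_t = e^{tA}$ and checks the semi-group property (1.1.2) and both Kolmogorov equations (1.1.4) by finite differences:

```python
# A minimal numerical sketch: P_t = e^{tA} for a two-state chain.
# The rates (alpha, beta) are illustrative choices, not from the text.
import numpy as np
from scipy.linalg import expm

alpha, beta = 2.0, 3.0                       # rates 0 -> 1 and 1 -> 0
A = np.array([[-alpha, alpha],
              [beta, -beta]])                # rows sum to 0

def P(t):
    return expm(t * A)                       # matrix exponential e^{tA}

s, t = 0.4, 0.7
assert np.allclose(P(s) @ P(t), P(s + t))    # semi-group property (1.1.2)
assert np.allclose(P(s + t).sum(axis=1), 1)  # rows of P_t sum to 1

# Kolmogorov equations (1.1.4), checked with a finite difference derivative.
h = 1e-6
dP = (P(t + h) - P(t)) / h
assert np.allclose(dP, A @ P(t), atol=1e-4)  # backward equation
assert np.allclose(dP, P(t) @ A, atol=1e-4)  # forward equation
```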
Nothing in the definition of a continuous time Markov chain ensures the existence of the infinitesimal generator $A$. In fact it is possible to construct continuous time Markov chains with diagonal entries of $A$ being $-\infty$. Intuitively this means the transition out of a state may be instantaneous. Many Markov chains appearing in the analysis of problems of interest do not allow instantaneous transitions. We eliminate this possibility by the requirement
$$P[X_s = i \ \text{for} \ 0 \le s \le t \mid X_0 = i] = e^{-\lambda_i t}, \quad \text{with } 0 \le \lambda_i < \infty. \qquad (1.1.8)$$
Since the expected value of an exponential random variable with parameter $\lambda$ is $\frac{1}{\lambda}$, it is intuitively clear that if the $\lambda_i$'s increase sufficiently fast we should expect infinitely many transitions in
a finite interval. In order to analyze this issue more closely we consider a
family $T_1, T_2, \ldots$ of independent exponential random variables with $T_k$ having parameter $\lambda_k$. Then we consider the infinite sum $\sum_k T_k$ and the events
$$\Big[\sum_k T_k < \infty\Big] \quad \text{and} \quad \Big[\sum_k T_k = \infty\Big].$$
The first event means there are infinitely many transitions in a finite interval of time, and the second is its complement. It is intuitively clear that if the rates $\lambda_k$ increase sufficiently rapidly we should expect infinitely many transitions in a finite interval, and conversely, if the rates do not increase too fast then only finitely many transitions are possible in finite time. More precisely, we have the following lemma: $\sum_k T_k = \infty$ with probability 1 if and only if $\sum_k \frac{1}{\lambda_k} = \infty$.
Proof - We have
$$E\Big[\sum_k T_k\Big] = \sum_k \frac{1}{\lambda_k},$$
so that if $\sum_k \frac{1}{\lambda_k} < \infty$ then $\sum_k T_k < \infty$ with probability 1. Now assume $\sum_k \frac{1}{\lambda_k} = \infty$. We have
$$E[e^{-T_k}] = \int_0^1 P[e^{-T_k} > s]\, ds = \int_0^1 P[T_k < -\log s]\, ds = \frac{\lambda_k}{1 + \lambda_k}.$$
Therefore, by a standard theorem on infinite products$^1$,
$$E\big[e^{-\sum_k T_k}\big] = \prod_k \frac{1}{1 + \frac{1}{\lambda_k}} = 0.$$
Since $e^{-\sum_k T_k}$ is a non-negative random variable, its expectation can be 0 only if $\sum_k T_k = \infty$ with probability 1. ♣
Remark 1.1.1 It may appear that the Kolmogorov forward and backward
equations are one and the same equation. This is not the case. While A and
Pt formally commute, the domains of definition of the operators APt and
Pt A are not necessarily identical. The difference between the forward and
backward equations becomes significant, for instance, when dealing with cer-
tain boundary conditions where there is instantaneous return from boundary
points (or points at infinity) to another state. However if the infinitesimal
generator A has the property that the absolute values of the diagonal entries
satisfy a uniform bound |Aii | < c, then the forward and backward equations
have the same solution Pt with P◦ = I. In general, the backward equation
has more solutions than the forward equation and its minimal solution is also
the solution of the forward equation. Roughly speaking, this is due to the
fact that A can be an unbounded operator, while Pt has a smoothing effect.
An analysis of such matters demands more technical sophistication than we
are ready to invoke in this context. ♥
Remark 1.1.2 The fact that the series (1.1.5) converges is easy to show for finite matrices or under some boundedness assumption on the entries of the matrix $A$. If the entries $A_{jk}$ grow rapidly with $k, j$, then there will be convergence problems. In manipulating exponentials of (even finite) matrices one should be cognizant of the fact that if $AB \ne BA$ then in general $e^{A+B} \ne e^A e^B$. On the other hand if $AB = BA$ then $e^{A+B} = e^A e^B$ as in the scalar case. ♥
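This caveat is easy to witness numerically. A minimal check, with two arbitrarily chosen non-commuting matrices (a nilpotent upper and a nilpotent lower triangular one), is:

```python
# Illustrative check that e^{A+B} differs from e^A e^B when AB != BA.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0, 0.0], [1.0, 0.0]])
print(np.allclose(A @ B, B @ A))                   # False: no commuting
print(np.allclose(expm(A + B), expm(A) @ expm(B))) # False: exponentials differ

# For commuting matrices the identity holds, as in the scalar case.
C = np.diag([1.0, 2.0])
D = np.diag([3.0, -1.0])
print(np.allclose(expm(C + D), expm(C) @ expm(D))) # True
```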
Recall that the stationary distribution played an important role in the theory of Markov chains. For a continuous time Markov chain we similarly define the stationary distribution as a row vector $\pi = (\pi_1, \pi_2, \cdots)$ satisfying
$$\pi P_t = \pi \ \text{for all } t \ge 0, \qquad \sum_j \pi_j = 1, \qquad \pi_j \ge 0. \qquad (1.1.9)$$
$^1$Let $a_k$ be a sequence of positive numbers; then the infinite product $\prod (1 + a_k)^{-1}$ diverges to 0 if and only if $\sum a_k = \infty$. The proof is by taking logarithms and expanding the log, and can be found in many books treating infinite series and products, e.g., Titchmarsh, Theory of Functions, Chapter 1.
The following lemma re-interprets $\pi P_t = \pi$ in terms of the infinitesimal generator $A$: a distribution $\pi$ is stationary if and only if $\pi A = 0$.
This system is easily solved to yield
$$\pi_i = \Big(\frac{\lambda}{\mu}\Big)^i \pi_0.$$
For $\lambda < \mu$ we obtain
$$\pi_i = \Big(1 - \frac{\lambda}{\mu}\Big)\Big(\frac{\lambda}{\mu}\Big)^i$$
as the stationary distribution. ♠
The semi-group property (1.1.2) implies
$$P^{(t)}_{ii} \ge \Big(P^{(t/n)}_{ii}\Big)^n.$$
The continuity assumption (1.1.3) implies that for sufficiently large $n$, $P^{(t/n)}_{ii} > 0$, and consequently
$$P^{(t)}_{ii} > 0, \quad \text{for } t > 0.$$
More generally, we have

Lemma 1.1.3 The diagonal entries $P^{(t)}_{ii}$ are positive, and the off-diagonal entries $P^{(t)}_{ij}$, $i \ne j$, are either positive for all $t > 0$ or vanish identically. The entries of the matrix $P_t$ are right continuous as functions of $t$.
Proof - We already know that $P^{(t)}_{jj} > 0$ for all $t$. Now assume $P^{(t)}_{ij} = 0$ where $i \ne j$. Then for $\alpha, \beta > 0$, $\alpha + \beta = 1$ we have
$$P^{(t)}_{ij} \ge P^{(\alpha t)}_{ii} P^{(\beta t)}_{ij}.$$
Consequently $P^{(\beta t)}_{ij} = 0$ for all $0 < \beta < 1$. This means that if $P^{(t)}_{ij} = 0$, then $P^{(s)}_{ij} = 0$ for all $s \le t$. The conclusion that $P^{(s)}_{ij} = 0$ for all $s$ is proven later (see corollary 1.4.1). The continuity property (1.1.3) and
$$\lim_{h\to 0^+} P_{t+h} = P_t \lim_{h\to 0^+} P_h = P_t$$
imply right continuity of $P^{(t)}_{ij}$. ♣
Note that in the case of a finite state Markov chain $P_t$ has the convergent series representation $P_t = e^{tA}$, and consequently the entries $P^{(t)}_{ij}$ are analytic functions of $t$. An immediate consequence is
Corollary 1.1.1 If all the states of a continuous time Markov chain communicate, then the Markov chain has the property that $P^{(t)}_{ij} > 0$ for all $i, j \in S$ (and all $t > 0$). In particular, if $S$ is finite then all states are aperiodic and recurrent.
In view of the existence of periodic states in the discrete time case, this corollary stands in sharp contrast to the discrete time situation. The existence of the limiting value of $\lim_{l\to\infty} P^l$ for a finite state Markov chain and its implication regarding the long term behavior of the Markov chain was discussed in §1.4. The same result is valid here as well, and the absence of periodic states for continuous time Markov chains results in a stronger proposition. In fact, we have the following: for a finite state continuous time Markov chain all of whose states communicate, $\lim_{t\to\infty} P_t$ exists and is the rank one matrix each of whose rows is the stationary distribution.
Proof - It follows from the hypotheses that for some $t > 0$ all entries of $P_t$ are positive, and consequently for all $t > 0$ all entries of $P_t$ are positive. Fix $t > 0$ and let $Q = P_t$ be the transition matrix of a finite state (discrete time) Markov chain. $\lim_l Q^l$ is the rank one matrix each row of which is the stationary distribution of the Markov chain. This limit is independent of the choice of $t > 0$ since the matrices $P_s$ and $P_t$ commute and the stationary distribution $\pi$ satisfies $\pi P_s = \pi$ for every $s > 0$.
EXERCISES
Exercise 1.1.1 A hospital owns two identical and independent power generators. The time to breakdown for each is exponential with parameter $\lambda$ and the time for repair of a malfunctioning one is exponential with parameter $\mu$. Let $X(t)$ be the Markov process which is the number of operational generators at time $t \ge 0$. Assume $X(0) = 2$. Prove that the probability that both generators are functional at time $t > 0$ is
$$\frac{\mu^2}{(\lambda+\mu)^2} + \frac{\lambda^2 e^{-2(\lambda+\mu)t}}{(\lambda+\mu)^2} + \frac{2\lambda\mu e^{-(\lambda+\mu)t}}{(\lambda+\mu)^2}.$$
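A quick numerical sanity check of this formula is possible (a sketch with arbitrarily chosen rates; we assume each generator has its own repair facility, so that with both generators down repairs proceed at rate $2\mu$): lump the two-generator system into the three-state chain whose states count the operational generators.

```python
# Sketch: check the closed form against P_t = e^{tA} for the lumped chain.
import numpy as np
from scipy.linalg import expm

lam, mu, t = 1.3, 2.1, 0.9                  # illustrative rates and time

A = np.array([[-2 * mu, 2 * mu, 0.0],       # state 0: both down
              [lam, -(lam + mu), mu],       # state 1: one up, one down
              [0.0, 2 * lam, -2 * lam]])    # state 2: both up

p22 = expm(t * A)[2, 2]                     # P[both up at t | both up at 0]
closed = (mu**2 + lam**2 * np.exp(-2 * (lam + mu) * t)
          + 2 * lam * mu * np.exp(-(lam + mu) * t)) / (lam + mu)**2
print(np.isclose(p22, closed))              # True
```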
Exercise 1.1.2 Let $\alpha > 0$ and consider the random walk $X_n$ on the non-negative integers with a reflecting barrier at 0 defined by
$$p_{i\,i+1} = \frac{\alpha}{1+\alpha}, \qquad p_{i\,i-1} = \frac{1}{1+\alpha}, \quad \text{for } i \ge 1.$$
1. Find the stationary distribution of this Markov chain for $\alpha < 1$.
1.2 Inter-arrival Times and Poisson Processes
Poisson processes are perhaps the most basic examples of continuous time
Markov chains. In this subsection we establish their basic properties. To construct a Poisson process we consider a sequence $W_1, W_2, \ldots$ of iid exponential random variables with parameter $\mu$. The $W_j$'s are called inter-arrival times. Set $T_1 = W_1$, $T_2 = W_1 + W_2$, and $T_n = T_{n-1} + W_n$. The $T_j$'s are called arrival times. Now define the Poisson process $N_t$ with parameter $\mu$ as
$$N_t = \max\{n \mid W_1 + W_2 + \cdots + W_n \le t\}. \qquad (1.2.1)$$
Intuitively we can think of certain events taking place and every time the event occurs the counter $N_t$ is incremented by 1. We assume $N_0 = 0$ and that the times between consecutive events, i.e., the $W_j$'s, are iid exponentials with the same parameter $\mu$. Thus $N_t$ is the number of events that have taken place until time $t$. The validity of the Markov property follows from the construction of $N_t$ and the exponential nature of the inter-arrival times, so that the Poisson process is a continuous time Markov chain. It is clear that $N_t$ is stationary in the sense that $N_{s+t} - N_s$ has the same distribution as $N_t$.
The arrival and inter-arrival times can be recovered from $N_t$ by
$$T_n = \inf\{t > 0 \mid N_t = n\}, \qquad W_n = T_n - T_{n-1}, \qquad (1.2.2)$$
reflecting the fact that from state $n$ only a transition to state $n + 1$ is possible.
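The construction translates directly into a simulation recipe. The following sketch (rate and horizon are arbitrary illustrative choices) generates the arrival times and evaluates $N_t$:

```python
# Sketch: simulate a Poisson process path from exponential inter-arrivals.
import numpy as np

rng = np.random.default_rng(0)
mu, t_max = 2.0, 10.0

arrivals = []
T = rng.exponential(1.0 / mu)          # T_1 = W_1
while T <= t_max:
    arrivals.append(T)                 # record T_n
    T += rng.exponential(1.0 / mu)     # T_{n+1} = T_n + W_{n+1}

def N(t):
    # N_t = max{n : T_n <= t}, the number of arrivals up to time t
    return np.searchsorted(arrivals, t, side='right')

print(N(t_max), "arrivals; E[N_t] =", mu * t_max)
```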
To analyze Poisson processes we begin by calculating the density func-
tion for Tn . Recall that the distribution of a sum of independent exponential
random variables is computed by convolving the corresponding density func-
tions (or using Fourier transforms to convert convolution to multiplication.)
Thus it is a straightforward calculation to show that Tn = W1 + · · · + Wn has
density function
$$f_{(n,\mu)}(x) = \begin{cases} \frac{\mu e^{-\mu x}(\mu x)^{n-1}}{(n-1)!} & \text{for } x \ge 0; \\ 0 & \text{for } x < 0. \end{cases} \qquad (1.2.3)$$
One commonly refers to f(n,µ) as Γ density with parameters (n, µ), so that
Tn has Γ distribution with parameters (n, µ). From this we can calculate the
density function for Nt , for given t > 0. Clearly {Tn+1 ≤ t} ⊂ {Tn ≤ t} and
the event {Nt = n} is the complement of {Tn+1 ≤ t} in {Tn ≤ t}. Therefore
by (1.2.3) we have
$$P[N_t = n] = \int_0^t f_{(n,\mu)}(x)\,dx - \int_0^t f_{(n+1,\mu)}(x)\,dx = \frac{e^{-\mu t}(\mu t)^n}{n!}. \qquad (1.2.4)$$
Hence Nt is a Z+ -valued random variable whose distribution is Poisson with
parameter µt, hence the terminology Poisson process. This suggests that we
can interpret the Poisson process Nt as the number of arrivals at a server in
the interval of time [0, t] where the assumption is made that the number of
arrivals is a random variable whose distribution is Poisson with parameter
µt.
In addition to stationarity Poisson processes have another remarkable property. Let $0 \le t_1 < t_2 \le t_3 < t_4$; then the random variables $N_{t_2} - N_{t_1}$ and $N_{t_4} - N_{t_3}$ are independent. This property is called independence of increments of Poisson processes. The validity of this property can be understood intuitively without a formal argument. The essential point is that the inter-arrival times have the same exponential distribution and therefore the number of arrivals in the interval $(t_3, t_4)$ is independent of how many transitions have occurred up to time $t_3$, and in particular independent of the number of transitions in the interval $(t_1, t_2)$. A more formal proof will also follow from our analysis of Poisson processes.
To compute the infinitesimal generator of the Poisson process we note that in view of (1.2.4) for $h > 0$ small we have
$$P[N_{t+h} = n \mid N_t = n] = 1 - \mu h + o(h), \qquad P[N_{t+h} = n+1 \mid N_t = n] = \mu h + o(h),$$
and $P[N_{t+h} \ge n+2 \mid N_t = n] = o(h)$.
It follows that the infinitesimal generator of the Poisson process $N_t$ is
$$A = \begin{pmatrix} -\mu & \mu & 0 & 0 & 0 & \cdots \\ 0 & -\mu & \mu & 0 & 0 & \cdots \\ 0 & 0 & -\mu & \mu & 0 & \cdots \\ 0 & 0 & 0 & -\mu & \mu & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix} \qquad (1.2.5)$$
To compute the joint distribution of the arrival times, recall the change of variables formula for densities: if $(y_1, \cdots, y_m)$ is an invertible differentiable function of $(x_1, \cdots, x_m)$ and $f$ is the density of $(x_1, \cdots, x_m)$, then the density of $(y_1, \cdots, y_m)$ is
$$h(y_1, \cdots, y_m) = f(x_1(y_1, \cdots, y_m), \cdots, x_m(y_1, \cdots, y_m))\,\Big|\frac{\partial(x_1, \cdots, x_m)}{\partial(y_1, \cdots, y_m)}\Big|.$$
In particular, for a linear change of variables $x = Ay$ we obtain
$$h(y_1, \cdots, y_m) = |\det A|\, f\Big(\sum_i A_{1i} y_i, \cdots, \sum_i A_{mi} y_i\Big).$$
We apply this to the transformation
$$t_1 = w_1, \quad t_2 = w_1 + w_2, \quad \cdots, \quad t_m = w_1 + \cdots + w_m,$$
whose determinant is 1.
Therefore to calculate
$$P[A_m \mid N_t = m] = \frac{P[A_m,\ N_t = m]}{P[N_t = m]},$$
where $A_m$ denotes the event
$$A_m = \{0 < T_1 < t_1 < T_2 < t_2 < \cdots < t_{m-1} < T_m < t_m < t < T_{m+1}\},$$
we evaluate the numerator of the right hand side by noting that the condition $N_t = m$ is implied by the requirement $T_m < t_m < t < T_{m+1}$. Now
$$P[A_m] = \int_U \mu^{m+1} e^{-\mu s_{m+1}}\, ds_1 \cdots ds_{m+1},$$
$$U: (s_1, \cdots, s_{m+1}) \ \text{such that} \ 0 < s_1 < t_1 < s_2 < t_2 < \cdots < s_m < t_m < t < s_{m+1}.$$
Therefore
$$P[A_m \mid N_t = m] = \frac{m!}{t^m}\, t_1 (t_2 - t_1) \cdots (t_m - t_{m-1}). \qquad (1.2.6)$$
To obtain the conditional joint density of $T_1, \cdots, T_m$ given $N_t = m$ we apply the differential operator $\frac{\partial^m}{\partial t_1 \cdots \partial t_m}$ to (1.2.6) to obtain
$$f_{T|N}(t_1, \cdots, t_m) = \frac{m!}{t^m}, \quad 0 \le t_1 < t_2 < \cdots < t_m \le t. \qquad (1.2.7)$$
We deduce the following remarkable fact:
Proposition 1.2.1 With the above notation and hypotheses, the conditional
joint density of T1 , · · · , Tm given Nt = m is identical with that of the order
statistics of m uniform random variables from [0, t].
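Proposition 1.2.1 is easy to test by simulation. In the sketch below (all parameters are illustrative choices) we generate Poisson arrival times, condition on $N_t = m$ by rejection, and compare with sorted uniform samples:

```python
# Sketch: conditioned on N_t = m, arrival times behave like order
# statistics of m uniforms on [0, t].
import numpy as np

rng = np.random.default_rng(1)
mu, t, m = 1.0, 5.0, 4

samples = []
while len(samples) < 2000:
    T = np.cumsum(rng.exponential(1.0 / mu, size=m + 1))  # arrival times
    if T[m - 1] <= t < T[m]:           # rejection step: keep only N_t = m
        samples.append(T[:m])
samples = np.array(samples)

uniforms = np.sort(rng.uniform(0, t, size=(2000, m)), axis=1)
print(samples.mean(axis=0))            # ~ [1, 2, 3, 4] = k*t/(m+1)
print(uniforms.mean(axis=0))           # same means for uniform order stats
```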
Proposition 1.2.2 The Poisson process $N_t$ with parameter $\mu$ has the following properties:
1. For fixed $t > 0$, $N_t$ is a Poisson random variable with parameter $\mu t$;
2. $N_t$ is stationary (i.e., $N_{s+t} - N_s$ has the same distribution as $N_t$) and has independent increments;
3. The infinitesimal generator of $N_t$ is given by (1.2.5).
Property (3) of proposition 1.2.2 follows from the first two, which in fact characterize Poisson processes. From the infinitesimal generator (1.2.5) one can construct the transition probabilities $P_t = e^{tA}$.
There is a general procedure for constructing continuous time Markov
chains out of a Poisson process and a (discrete time) Markov chain. The
resulting Markov chains are often considerably easier to analyze and behave
somewhat like the finite state continuous time Markov chains. It is cus-
tomary to refer to these processes as Markov chains subordinated to Poisson
processes. Let Zn be a (discrete time) Markov chain with transition ma-
trix K, and Nt be a Poisson process with parameter µ. Let S be the state
space of Zn . We construct the continuous time Markov chain with state
space S by postulating that the number of transitions in an interval [s, s + t)
is given by Nt+s − Ns which has the same distribution as Nt . Given that
there are n transitions in the interval [s, s + t), we require the probability
P [Xs+t = j | Xs = i, Nt+s − Ns = n] to be
(n)
P [Xs+t = j | Xs = i, Nt+s − Ns = n] = Kij .
Let $K^{(0)} = I$; then the transition probability $P^{(t)}_{ij}$ for $X_t$ is given by
$$P^{(t)}_{ij} = \sum_{n=0}^{\infty} \frac{e^{-\mu t}(\mu t)^n}{n!}\, K^{(n)}_{ij}. \qquad (1.2.9)$$
The infinitesimal generator is easily computed by differentiating (1.2.9) at $t = 0$:
$$A = \mu(-I + K). \qquad (1.2.10)$$
Since $K$ is a Markov matrix, it follows easily that the infinite series expansion of $e^{tA}$ converges and therefore $P_t = e^{tA}$ is rigorously defined. The matrix $Q$ of lemma 1.4.1 can also be expressed in terms of the Markov matrix $K$. Assuming no state is absorbing we get (see corollary 1.4.2)
$$Q_{ij} = \begin{cases} 0, & \text{if } i = j; \\ \frac{K_{ij}}{1 - K_{ii}}, & \text{otherwise.} \end{cases} \qquad (1.2.11)$$
Note that, conversely, if (1.2.10) is satisfied, then from $A$ we obtain a continuous time Markov chain subordinated to a Poisson process.
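The series (1.2.9) and the matrix exponential of (1.2.10) can be checked against each other numerically; in the following sketch $K$ and $\mu$ are arbitrary illustrative choices:

```python
# Sketch: transition matrix of a subordinated chain in two ways.
import numpy as np
from scipy.linalg import expm
from math import factorial

K = np.array([[0.1, 0.9],
              [0.6, 0.4]])                 # Markov matrix of Z_n
mu, t = 1.5, 0.8

# Series (1.2.9): sum over the number of Poisson-driven transitions.
series = sum(np.exp(-mu * t) * (mu * t)**n / factorial(n)
             * np.linalg.matrix_power(K, n) for n in range(60))
direct = expm(t * mu * (K - np.eye(2)))    # A = mu(-I + K), (1.2.10)
print(np.allclose(series, direct))         # True
```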
EXERCISES

Exercise 1.2.1 For the two state Markov chain with transition matrix
$$K = \begin{pmatrix} p & q \\ p & q \end{pmatrix},$$
show that the continuous time Markov chain subordinated to the Poisson process of rate $\mu$ has transition matrix
$$P_t = \begin{pmatrix} p + q e^{-\mu t} & q - q e^{-\mu t} \\ p - p e^{-\mu t} & q + p e^{-\mu t} \end{pmatrix}.$$
1.3 Birth and Death Processes

The simplest birth process describes a population of organisms each of which, independently of the others, splits into two with probability $\lambda h + o(h)$ in any interval of time of length $h$, for $h > 0$ a small real number. This implies that the probability that a single organism splits more than once in $h$ units of time is $o(h)$. Now suppose that we have $n$ organisms; let $A_j$ denote the event that organism number $j$ splits (at least once) and $A$ be the event that in $h$ units of time there is exactly one split. Then
$$P[A] = \sum_j P[A_j] - \sum_{i<j} P[A_i \cap A_j] + \sum_{i<j<k} P[A_i \cap A_j \cap A_k] - \cdots = n\lambda h + o(h). \qquad (1.3.1)$$
Note that the exact value of the terms incorporated into $o(h)$ is quite complicated. We shall see that in spite of ignoring these complicated terms, we can recover exact information about our continuous time Markov chain by using the Kolmogorov forward equation. Let $B$ be the event that in $h$ units of time there are at least two splits, and $C$ the event that there are no splits among the $n$ organisms. Then
$$P[B] = o(h), \qquad P[C] = 1 - \lambda n h + o(h). \qquad (1.3.2)$$
Equations (1.3.1) and (1.3.2) imply that the infinitesimal generator $A$ of the continuous time Markov chain $X_t$ is
$$A = \begin{pmatrix} -\lambda & \lambda & 0 & 0 & 0 & \cdots \\ 0 & -2\lambda & 2\lambda & 0 & 0 & \cdots \\ 0 & 0 & -3\lambda & 3\lambda & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
For any initial distribution $\pi^{\circ} = (\pi^{\circ}_1, \pi^{\circ}_2, \cdots)$ let $q(t) = (q_1(t), q_2(t), \cdots)$ be the row vector $q(t) = \pi^{\circ} P_t$ which describes the distribution of states at time $t$. In fact, $q_k(t) = P[X_t = k]$ when $X_0$ is distributed according to $\pi^{\circ}$. Thus a basic problem about the Markov chain $X_t$ is the calculation of $E[X_t]$ or more generally of the generating function
$$F_X(t, \xi) = E[\xi^{X_t}] = \sum_{k=1}^{\infty} P[X_t = k]\, \xi^k.$$
The Kolmogorov forward equation $\frac{dq(t)}{dt} = q(t)A$ reads, componentwise,
$$\frac{dq_k(t)}{dt} = (k-1)\lambda\, q_{k-1}(t) - k\lambda\, q_k(t). \qquad (1.3.3)$$
Multiplying (1.3.3) by $\xi^k$ and summing over $k$, we obtain the linear first order partial differential equation
$$\frac{\partial F_X}{\partial t} = \lambda \xi (\xi - 1)\, \frac{\partial F_X}{\partial \xi}. \qquad (1.3.4)$$
The fact that $F_X$ satisfies a linear partial differential equation makes the calculation of $E[X_t]$ very simple. In fact, since $E[X_t] = \frac{\partial F_X}{\partial \xi}$ evaluated at $\xi = 1^-$, we differentiate both sides of (1.3.4) with respect to $\xi$, change the order of differentiation relative to $\xi$ and $t$ on the left side, and set $\xi = 1$ to obtain
$$\frac{dE[X_t]}{dt} = \lambda E[X_t]. \qquad (1.3.5)$$
The solution to this ordinary differential equation is $Ce^{\lambda t}$ and the constant $C$ is determined by the initial condition $X_0 = 1$ to yield
$$E[X_t] = e^{\lambda t}.$$
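This conclusion can be checked by simulating the birth process directly: since the diagonal entry of the generator in state $n$ is $-n\lambda$, the chain waits an exponential time of rate $n\lambda$ in state $n$ before jumping to $n+1$. A minimal sketch (parameters are illustrative):

```python
# Sketch: sample mean of X_t for the pure birth process vs. e^{lambda*t}.
import numpy as np

rng = np.random.default_rng(2)
lam, t, trials = 1.0, 1.5, 20_000

def birth(t_end):
    # X_0 = 1; holding time in state n is exponential with rate n*lam
    n, clock = 1, 0.0
    while True:
        clock += rng.exponential(1.0 / (n * lam))
        if clock > t_end:
            return n
        n += 1

mean = np.mean([birth(t) for _ in range(trials)])
print(mean, np.exp(lam * t))   # both approximately 4.48
```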
The partial differential equation (1.3.4) tells us considerably more than
just the expectation of Xt . The basic theory of a single linear first order
partial differential equation is well understood. Recall that the solution to a
first order ordinary differential equation is uniquely determined by specifying
one initial condition. Roughly speaking, the solution to a linear first order
partial differential equation in two variables is uniquely determined by spec-
ifying a function of one variable. Let us see how this works for our equation
(1.3.4). For a function $g(s)$ of a real variable $s$ we want to substitute for $s$ a function of $t$ and $\xi$ such that (1.3.4) is necessarily valid regardless of the choice of $g$. If for $s$ we substitute $\lambda t + \phi(\xi)$, then by the chain rule
$$\frac{\partial g(\lambda t + \phi(\xi))}{\partial t} = \lambda g'(\lambda t + \phi(\xi)), \qquad \frac{\partial g(\lambda t + \phi(\xi))}{\partial \xi} = \phi'(\xi)\, g'(\lambda t + \phi(\xi)),$$
where $g'$ denotes the derivative of the function $g$. Therefore if $\phi$ is such that $\phi'(\xi) = \frac{1}{\xi^2 - \xi}$, then, regardless of what function we take for $g$, equation (1.3.4) is satisfied by $g(\lambda t + \phi(\xi))$. There is an obvious choice for $\phi$, namely the function
$$\phi(\xi) = \int_{\frac{1}{2}}^{\xi} \frac{du}{u^2 - u} = \log \frac{1 - \xi}{\xi},$$
for $0 < \xi < 1$. (The lower limit of the integral is immaterial and $\frac{1}{2}$ is fixed for convenience.) Now we incorporate the initial condition $X_0 = 1$, which in terms of the generating function $F_X$ means $F_X(0, \xi) = \xi$. In terms of $g$ this translates into
$$g\Big(\log \frac{1-\xi}{\xi}\Big) = \xi.$$
That is, $g$ should be the inverse to the mapping $\xi \to \log \frac{1-\xi}{\xi}$. It is easy to see that
$$g(s) = \frac{e^{-s}}{1 + e^{-s}} = \frac{1}{1 + e^{s}}$$
is the required function. Thus we obtain the expression
$$F_X(t, \xi) = \frac{\xi}{\xi + (1 - \xi)e^{\lambda t}} \qquad (1.3.6)$$
for the probability generating function of $X_t$. If we change the initial condition to $X_0 = N$, then the generating function becomes
$$F_X(\xi, t) = \frac{\xi^N}{[\xi + (1 - \xi)e^{\lambda t}]^N}.$$
From this we deduce
$$P^{(t)}_{Nj} = \binom{j-1}{N-1}\, e^{-N\lambda t}\,(1 - e^{-\lambda t})^{j-N}$$
for the transition probabilities. The method of derivation of (1.3.6) is remarkable and instructive. We made essential use of the Kolmogorov forward
equation in obtaining a linear first order partial differential equation for the
probability generating function. This was possible because we have an infi-
nite number of states and the coefficients of $q_k(t)$ and $q_{k-1}(t)$ in (1.3.3) were linear in $k$. Had the dependence been quadratic in $k$, we would have obtained a partial differential equation of order two relative to the variable $\xi$, and the situation would have been more complex. The fact that
we have an explicit differential equation for the generating function gave us
a fundamental new tool for understanding it. In the exercises this method
will be further demonstrated.
Example 1.3.1 The birth process described above can be easily generalized to a birth-death process by introducing a positive parameter $\mu > 0$ and replacing equations (1.3.1) and (1.3.2) with the requirement
$$P[X_{t+h} = n + a \mid X_t = n] = \begin{cases} n\mu h + o(h), & \text{if } a = -1; \\ n\lambda h + o(h), & \text{if } a = 1; \\ o(h), & \text{if } |a| > 1. \end{cases} \qquad (1.3.7)$$
The probability generating function for $X_t$ can be calculated by an argument similar to that for the pure birth process given above and is relegated to exercise 1.3.5. It is shown there that for $\lambda \ne \mu$
$$F_X(\xi, t) = \left[\frac{\mu(1-\xi) - (\mu - \lambda\xi)e^{-t(\lambda-\mu)}}{\lambda(1-\xi) - (\mu - \lambda\xi)e^{-t(\lambda-\mu)}}\right]^N. \qquad (1.3.8)$$
EXERCISES
Exercise 1.3.1 (M/M/1 queue) A server can service only one customer at a time and the arriving customers form a queue according to order of arrival. Consider the continuous time Markov chain where the length of the queue is the state space, the time between consecutive arrivals is exponential with parameter $\mu$ and the time of service is exponential with parameter $\lambda$. Show that the matrix $Q = (Q_{ij})$ of lemma 1.4.1 is
$$Q_{ij} = \begin{cases} \frac{\mu}{\mu+\lambda}, & \text{if } j = i + 1; \\ \frac{\lambda}{\mu+\lambda}, & \text{if } j = i - 1; \\ 0, & \text{otherwise.} \end{cases}$$
Exercise 1.3.2 Let $X(t)$ be the continuous time Markov chain with $X(0) = m$ and infinitesimal generator
$$A = \begin{pmatrix} -\lambda & \lambda & 0 & 0 & 0 & \cdots \\ \mu & -(\lambda+\mu) & \lambda & 0 & 0 & \cdots \\ 0 & 2\mu & -(\lambda+2\mu) & \lambda & 0 & \cdots \\ \vdots & \vdots & \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$
Let $F_X(\xi, t) = E(\xi^{X(t)})$ denote the generating function of the Markov process. Show that $F_X$ satisfies the differential equation
$$\frac{\partial F_X}{\partial t} = (1 - \xi)\Big[-\lambda F_X + \mu\, \frac{\partial F_X}{\partial \xi}\Big],$$
and deduce that
$$E[X(t)] = \frac{\lambda}{\mu}(1 - e^{-\mu t}) + m e^{-\mu t}.$$
Exercise 1.3.3 (Continuation of exercise 1.3.2) - With the same notation as exercise 1.3.2, show that the substitution
$$F_X(\xi, t) = e^{-\frac{\lambda(1-\xi)}{\mu}}\, G(\xi, t)$$
gets rid of the term involving $F_X$ on the right hand side of the differential equation for $F_X$. More precisely, it transforms the differential equation for $F_X$ into
$$\frac{\partial G}{\partial t} = \mu(1 - \xi)\, \frac{\partial G}{\partial \xi}.$$
Can you give a general approach for solving this differential equation? Verify that
$$F_X(\xi, t) = e^{-\lambda(1-\xi)(1-e^{-\mu t})/\mu}\, [1 - (1 - \xi)e^{-\mu t}]^m$$
satisfies the differential equation and the initial condition.
Exercise 1.3.5 Consider the birth-death process of example 1.3.1.
1. Show that the generating function $F_X(\xi, t) = E[\xi^{X_t}]$ satisfies the partial differential equation
$$\frac{\partial F_X}{\partial t} = (\lambda\xi - \mu)(\xi - 1)\, \frac{\partial F_X}{\partial \xi}.$$
3. Show that
$$\mathrm{Var}[X_t] = N\, \frac{\lambda + \mu}{\lambda - \mu}\, e^{(\lambda-\mu)t}\big(e^{(\lambda-\mu)t} - 1\big).$$
4. Let $N = 1$ and $Z$ denote the time of the extinction of the process. Show that for $\lambda = \mu$, $E[Z] = \infty$.
1.4 Discrete vs. Continuous Time Markov Chains
In this subsection we show how to assign a discrete time Markov chain to one
with continuous time, and how to construct continuous time Markov chains
from a discrete time one. We have already introduced the notions of Markov and stopping times for Markov chains, and we can easily extend them to continuous time Markov chains. Intuitively a Markov time for the (possibly continuous time) Markov chain is a random variable $T$ such that the event $[T \le t]$ does not depend on $X_s$ for $s > t$. Thus a Markov time $T$ has the property that if $T(\omega) = t$ then $T(\omega') = t$ for all paths $\omega'$ which are identical with $\omega$ up to time $t$. For instance, for a Markov chain $X_l$ with state space $\mathbb{Z}$ and $X_0 = 0$, let $T$ be the first hitting time of state $1 \in \mathbb{Z}$. Then $T$ is a Markov time. If $T$ is a Markov time for the continuous time Markov chain $X_t$, the fundamental property of Markov times, generally called the Strong Markov Property, is
$$P[X_{T+s} = j \mid X_t,\ t \le T] = P^{(s)}_{X_T\, j}. \qquad (1.4.1)$$
This reduces to the Markov property if we take T to be a constant. To
understand the meaning of equation (1.4.1), consider Ωu = {ω | T (ω) = u}
where u ∈ R+ is any fixed positive real number. Then the left hand side of
(1.4.1) is the conditional probability of the set of paths ω that after s units
of time are in state j given Ωu and Xt for t ≤ u = T (ω). The right hand
side states that the information $X_t$ for $t \le u$ is not relevant as long as we know
the states for which T (ω) = u, and this probability is the probability of the
paths which after s units of time are in state j assuming at time 0 they were
in a state determined by T = u. One can also loosely think of the strong
Markov property as allowing one to reparametrize paths so that all the paths
will satisfy T (ω) = u at the same constant time T and then the standard
Markov property will be applicable. Examples that we encounter will clarify
the meaning and significance of this concept. The validity of (1.4.1) is quite
intuitive, and one can be convinced of its validity by looking at the set of
paths with the required properties and using the Markov property. It is
sometimes useful to make use of a slightly more general version of the strong
Markov property where a function of the Markov time is introduced. Rather
than stating a general theorem, its validity in the context where it is used
will be clear.
The notation $E_i[Z]$, where the random variable $Z$ is a function of the continuous time Markov chain $X_t$, means that we are calculating the conditional expectation conditioned on $X_0 = i$. Naturally, one may replace the subscript
i by a random variable to accommodate a different conditional expectation.
Of course, instead of a subscript one may write the conditioning in the usual
manner $E[\cdot \mid \cdot]$. The strong Markov property in the context of conditional expectations implies
$$E[g(X_{T+s}) \mid X_u,\ u \le T] = E_{X_T}[g] = \sum_{j \in S} P^{(s)}_{X_T\, j}\, g(j). \qquad (1.4.2)$$
Let $Y = \inf\{t > 0 \mid X_t \ne X_0\}$ be the time of the first transition out of the initial state. Then $Y$ is a Markov time. The assumption (1.1.8) implies that except for a set of paths of probability 0, $Y(\omega) > 0$, and by the right continuity assumption, the infimum is actually achieved. The strong Markov property implies that the random variable $Y$ is memoryless in the sense that
$$P[Y \ge t + s \mid Y > s] = P[Y \ge t \mid X_0 = i].$$
This equation is compatible with (1.1.8). Note that for an absorbing state $i$ we have $\lambda_i = 0$.
From a continuous time Markov chain one can construct a (discrete time) Markov chain. Let us assume $X_0 = i \in S$. A simple and not so useful way is to define the transition matrix $P$ of the Markov chain as $P^{(1)}_{ij}$. A more useful approach is to let $T_n$ be the time of the $n^{th}$ transition. Thus $T_1(\omega) = s > 0$ means that there is $j \in S$, $j \ne i$, such that
$$\omega(t) = \begin{cases} i & \text{for } t < s; \\ j & \text{for } t = s. \end{cases}$$
$T_1$ is a stopping time if we assume that $i$ is not an absorbing state. We define $Q_{ij}$ to be the probability of the set of paths that at time 0 are in state $i$ and at the time of the first transition move to state $j$. Therefore
$$Q_{kk} = 0, \quad \text{and} \quad \sum_{j \ne i} Q_{ij} = 1.$$
Let $W_n = T_{n+1} - T_n$ denote the time elapsed between the $n^{th}$ and $(n+1)^{st}$ transitions. We define a Markov chain $Z_0 = X_0, Z_1, Z_2, \cdots$ by setting $Z_n = X_{T_n}$. Note that the strong Markov property for $X_t$ is used in ensuring that $Z_0, Z_1, Z_2, \cdots$ is a Markov chain, since transitions occur at different times on different paths. The following lemma clarifies the transition matrix of the Markov chain $Z_n$ and sheds light on the transition matrices $P_t$.
Lemma 1.4.1 With the above notation and hypotheses,
$$P[Z_{n+1} = j,\ W_n \ge u \mid Z_n = k] = e^{-\lambda_k u}\, Q_{kj}.$$
Proof - Clearly the left hand side of the equation can be written in the form
We have
$$P[X_{W_0} = j \mid W_0 > u,\ X_0 = k] = P[X_{W_0} = j \mid X_s = k \ \text{for } s \le u] = P[X_{u+W_0} = j \mid X_u = k] = P[X_{W_0} = j \mid X_0 = k].$$
The quantity $P[X_{W_0} = j \mid X_0 = k]$ is independent of $u$ and we denote it by $Q_{kj}$. Combining this with (1.4.3) (the exponential character of the elapsed time $W_n$ between consecutive transitions) we obtain the desired formula. The validity of the stated properties of $Q_{kj}$ is immediate. ♣
An immediate corollary of lemma 1.4.1 is that it allows us to fill in the gap in the proof of lemma 1.1.3.

Corollary 1.4.1 Let $i \ne j \in S$; then either $P^{(t)}_{ij} > 0$ for all $t > 0$ or it vanishes identically.

Proof - If $P^{(t)}_{ij} > 0$ for some $t$, then $Q_{ij} > 0$, and it follows that $P^{(t)}_{ij} > 0$ for all $t > 0$. ♣
This process of assigning a Markov chain to a continuous time Markov chain can be reversed to obtain (infinitely many) continuous time Markov chains from a discrete time one. In fact, for every $j \in S$ let $\lambda_j > 0$ be a positive real number. Now given a Markov chain $Z_n$ with state space $S$ and transition matrix $Q$, let $W_j$ be an exponential random variable with parameter $\lambda_j$. If $j$ is not an absorbing state, then the first transition out of $j$ happens at time $s > t$ with probability $e^{-\lambda_j t}$, and once the transition occurs the probability of hitting state $k$ is
$$\frac{Q_{jk}}{\sum_{i \ne j} Q_{ji}}.$$
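This recipe for building a continuous time chain from the pair $(Q, \{\lambda_j\})$ is straightforward to implement. The following sketch (with an arbitrary three-state $Q$ and arbitrary rates) generates sample paths by alternating exponential holding times and jumps of the chain $Z_n$:

```python
# Sketch: jump-chain / holding-time construction of a CTMC.
import numpy as np

rng = np.random.default_rng(3)
Q = np.array([[0.0, 0.7, 0.3],
              [0.5, 0.0, 0.5],
              [0.2, 0.8, 0.0]])            # zero diagonal, rows sum to 1
lam = np.array([1.0, 2.0, 0.5])            # holding rates lambda_j

def sample_path(i, t_max):
    """Return the jump times and visited states of X_t up to t_max."""
    times, states, t = [0.0], [i], 0.0
    while True:
        t += rng.exponential(1.0 / lam[states[-1]])  # exponential holding
        if t > t_max:
            return times, states
        nxt = rng.choice(3, p=Q[states[-1]])         # jump via Q
        times.append(t)
        states.append(nxt)

print(sample_path(0, 5.0))
```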
Lemma 1.4.1 does not give direct and adequate information about the
behavior of transition probabilities. However, combining it with the strong
Markov property yields an important integral equation satisfied by the tran-
sition probabilities.
Lemma 1.4.2 The transition probabilities $P^{(t)}_{ij}$ satisfy the integral equation
$$P^{(t)}_{ij} = e^{-\lambda_i t}\,\delta_{ij} + \int_0^t \lambda_i e^{-\lambda_i s} \sum_k Q_{ik}\, P^{(t-s)}_{kj}\, ds.$$
Proof - We may assume $i$ is not an absorbing state. Let $T_1$ be the time of the first transition. Then trivially
$$P^{(t)}_{ij} = P[X_t = j,\ T_1 > t \mid X_0 = i] + P[X_t = j,\ T_1 \le t \mid X_0 = i].$$
Corollary 1.4.2 With the notation of lemma 1.4.2, the infinitesimal generator of the continuous time Markov chain $X_t$ is
$$A_{ij} = \begin{cases} -\lambda_i, & \text{if } i = j; \\ \lambda_i Q_{ij}, & \text{if } i \ne j. \end{cases}$$
Proof - Making the change of variable $s = t - u$ in the integral in lemma 1.4.2 we obtain
$$P^{(t)}_{ij} = e^{-\lambda_i t}\,\delta_{ij} + \lambda_i e^{-\lambda_i t}\int_0^t e^{\lambda_i u} \sum_k Q_{ik}\, P^{(u)}_{kj}\, du.$$
$^3$This is like the connection between the density function and the distribution function of a random variable.
Differentiating with respect to $t$ yields
$$\frac{dP^{(t)}_{ij}}{dt} = -\lambda_i P^{(t)}_{ij} + \lambda_i \sum_k Q_{ik}\, P^{(t)}_{kj},$$
and setting $t = 0$ gives the asserted form of $A$. ♣
1.5 Brownian Motion

Brownian motion may be thought of as the limit of appropriately rescaled random walks. Let $S_l$ denote the simple symmetric random walk on $\mathbb{Z}$, and for $n = 1, 2, 3, \cdots$ consider the rescaled process $Z^{(n)}_t = \frac{1}{\sqrt{n}} S_{[nt]}$. If we follow the walk for $n$ units of time and take for instance $t = 1$, then a path $\omega$ between 0 and $n$ will be squeezed
in the horizontal (time) direction to the interval $[0, 1]$ and the values will be multiplied by $\frac{1}{\sqrt{n}}$. The resulting path will still consist of broken line segments where the points of nonlinearity (or non-differentiability) occur at $\frac{k}{n}$, $k = 1, 2, 3, \cdots$. At any rate since all the paths are continuous, we may surmise
that the path space for the limit as $n \to \infty$ is the space $\Omega = C_{x_0}[0, \infty)$ of continuous functions on $[0, \infty)$, and we may require $\omega(0) = x_0$, some fixed number. Since in the simple symmetric random walk a path is just as likely to move up as down, we expect the same to be true of the paths of Brownian motion. A differentiable
path on the other hand has a definite preference at each point, namely,
the direction of the tangent. Therefore it is reasonable to expect that with
probability 1 the paths of Brownian motion are nowhere differentiable, in spite of the
fact that we have not yet said anything about how probabilities should be
assigned to the appropriate subsets of Ω. The assignment of probabilities is
the key issue in defining Brownian motion.
Let $0 < t_1 < t_2 < \cdots < t_m$; we want to see what we can say about the joint distribution of $(Z^{(n)}_{t_1}, Z^{(n)}_{t_2} - Z^{(n)}_{t_1}, \cdots, Z^{(n)}_{t_m} - Z^{(n)}_{t_{m-1}})$. Note that these random variables are independent while $Z^{(n)}_{t_1}, Z^{(n)}_{t_2}, \cdots$ are not. By the central limit theorem, for $n$ sufficiently large,
$$Z^{(n)}_{t_k} - Z^{(n)}_{t_{k-1}} = \frac{1}{\sqrt{n}}\big(S_{[nt_k]} - S_{[nt_{k-1}]}\big)$$
is approximately normal with mean 0 and variance $(t_k - t_{k-1})\sigma^2$. We assume that in the limit $n \to \infty$ the processes $Z^{(n)}_t$ tend to a limit $Z_t$. Of course this requires specifying the sense in which convergence takes place, and a proof; but because of the applicability of the central limit theorem we assign probabilities to sets of paths accordingly, without going through a convergence argument. More precisely, to the set of paths, starting at 0 at time 0, which are in the open subset $B \subset \mathbb{R}$ at time $t$, it is natural to assign the probability
$$P[Z_t \in B] = \frac{1}{\sqrt{2\pi t}\,\sigma} \int_B e^{-\frac{u^2}{2t\sigma^2}}\, du. \qquad (1.5.2)$$
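The rescaling is easy to visualize numerically. A minimal sketch (with $n = 1000$ and $\sigma = 1$) producing one rescaled path is:

```python
# Sketch: one rescaled random walk path Z^{(n)}_t = S_{[nt]}/sqrt(n), t in [0,1].
import numpy as np

rng = np.random.default_rng(4)
n = 1000
steps = rng.choice([-1, 1], size=n)        # simple symmetric random walk
S = np.concatenate([[0], np.cumsum(steps)])

t = np.arange(n + 1) / n                   # time rescaled by 1/n
Z = S / np.sqrt(n)                         # space rescaled by 1/sqrt(n)
print(Z[-1])                               # by (1.5.2), roughly N(0, 1)
```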
In view of the independence of $Z_{t_1}$ and $Z_{t_2} - Z_{t_1}$, the probability that $Z_{t_1} \in (a_1, b_1)$ and $Z_{t_2} - Z_{t_1} \in (a_2, b_2)$ is
$$\frac{1}{2\pi\sigma^2\sqrt{t_1(t_2 - t_1)}} \int_{a_1}^{b_1}\int_{a_2}^{b_2} e^{-\frac{u_1^2}{2\sigma^2 t_1}}\, e^{-\frac{u_2^2}{2\sigma^2 (t_2 - t_1)}}\, du_1\, du_2.$$
Note that we are evaluating the probability of the event [Zt1 ∈ (a1 , b1 ), Zt2 −
Zt1 ∈ (a2 , b2 )] and not [Zt1 ∈ (a1 , b1 ), Zt2 ∈ (a2 , b2 )] since the random vari-
ables Zt1 and Zt2 are not independent. This formula extends to probability
of any finite number of increments. In fact, for 0 < t1 < t2 < · · · < tk the
joint density function for (Zt1 , Zt2 − Zt1 , · · · , Ztk − Ztk−1 ) is the product
$$\frac{1}{\sigma^k \sqrt{(2\pi)^k\, t_1 (t_2 - t_1) \cdots (t_k - t_{k-1})}}\; e^{-\frac{u_1^2}{2\sigma^2 t_1}}\, e^{-\frac{u_2^2}{2\sigma^2 (t_2 - t_1)}} \cdots e^{-\frac{u_k^2}{2\sigma^2 (t_k - t_{k-1})}}.$$
One refers to the property of independence of $(Z_{t_1}, Z_{t_2} - Z_{t_1}, \cdots, Z_{t_k} - Z_{t_{k-1}})$ as independence of increments. For future reference and economy of notation we introduce
$$p_t(x; \sigma) = \frac{1}{\sqrt{2\pi t}\,\sigma}\, e^{-\frac{x^2}{2t\sigma^2}}. \qquad (1.5.3)$$
For $\sigma = 1$ we simply write $p_t(x)$ for $p_t(x; \sigma)$.
For both discrete and continuous time Markov chains the transition probabilities were given by matrices $P_t$. Here the transition probabilities are encoded in the Gaussian density function $p_t(x; \sigma)$. It is easier to introduce the analogue of $P_t$ for Brownian motion if we look at the dual picture where the action of the semi-group $P_t$ on functions on the state space is described. Just as in the case of continuous time Markov chains we set
$$(P_t \psi)(x) = E[\psi(Z_t) \mid Z_0 = x] = \int_{-\infty}^{\infty} \psi(y)\, p_t(x - y; \sigma)\, dy. \qquad (1.5.4)$$
The semi-group property $P_s P_t = P_{s+t}$ now takes the form of the convolution identity
$$p_s(\cdot\,; \sigma) * p_t(\cdot\,; \sigma) = p_{s+t}(\cdot\,; \sigma). \qquad (1.5.5)$$
Perhaps the simplest way to see the validity of (1.5.5) is by making use of Fourier analysis, which transforms convolutions into products as explained earlier in subsection (XXXX). It is a straightforward calculation that
$$\int_{-\infty}^{\infty} e^{-i\lambda x}\, \frac{e^{-\frac{x^2}{2t\sigma^2}}}{\sqrt{2\pi t}\,\sigma}\, dx = e^{-\frac{\lambda^2 \sigma^2 t}{2}}. \qquad (1.5.6)$$
From (1.5.6) the desired relation (1.5.5) and the semi-group property follow. An important feature of continuous time Markov chains is that $P_t$ satisfies the Kolmogorov forward and backward equations. In view of the semi-group property the same is true for Brownian motion, and we will explain in example 1.5.2 below what the infinitesimal generator of Brownian motion is. With some of the fundamental definitions of Brownian motion in place we now calculate some quantities of interest.
Example 1.5.3 The notion of hitting time of a state played an important role in our discussion of Markov chains. In this example we calculate the density function for the hitting time $T_a$ of a state $a \in \mathbb{R}$ in Brownian motion. The trick is to look at the identity
$$P[Z_t > a] = P[Z_t > a \mid T_a \le t]\, P[T_a \le t] + P[Z_t > a \mid T_a > t]\, P[T_a > t].$$
Clearly the second term on the right hand side vanishes, and by symmetry
$$P[Z_t > a \mid T_a < t] = \frac{1}{2}.$$
Therefore
$$P[T_a < t] = 2 P[Z_t > a]. \qquad (1.5.9)$$
The right hand side is easily computable and we obtain
$$P[T_a < t] = \frac{2}{\sqrt{2\pi t}\,\sigma} \int_a^{\infty} e^{-\frac{x^2}{2t\sigma^2}}\, dx = \frac{2}{\sqrt{2\pi}} \int_{\frac{a}{\sqrt{t}\sigma}}^{\infty} e^{-\frac{u^2}{2}}\, du.$$
In particular one deduces that the hitting time has infinite expectation:
$$E[T_a] = \infty. \qquad (1.5.11)$$
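Identity (1.5.9) lends itself to a Monte Carlo check on discretized Brownian paths. In the sketch below ($\sigma = 1$; $a$, $t$, step count and trial count are arbitrary choices) the discretization slightly underestimates the hitting probability:

```python
# Sketch: P[T_a < t] = 2 P[Z_t > a] on discretized Brownian paths.
import numpy as np

rng = np.random.default_rng(5)
a, t, n, trials = 1.0, 1.0, 1000, 5000

dZ = rng.normal(0.0, np.sqrt(t / n), size=(trials, n))
Z = np.cumsum(dZ, axis=1)                # Brownian paths on [0, t], Z_0 = 0

hit = (Z.max(axis=1) >= a).mean()        # estimate of P[T_a < t]
tail = 2 * (Z[:, -1] > a).mean()         # estimate of 2 P[Z_t > a]
print(hit, tail)                         # both near 0.317 for a = t = 1
```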
Example 1.5.4 For Brownian motion with $Z_0 = 0$ the event that it crosses the line $-a$, where $a > 0$, between times 0 and $t$ is identical with the event $[T_{-a} < t]$, and by symmetry it has the same probability as $P[T_a < t]$. We calculated this latter quantity in example 1.5.3. Therefore the probability $P$ that the Brownian motion has at least one 0 in the interval $(t_1, t_2)$ can be written, by conditioning on $Z_{t_1} = a$, as an integral of $P[T_a < t_2 - t_1]$ against the density of $Z_{t_1}$.
Since a sum of independent Gaussian random variables is Gaussian, the $Y_{t;j}$'s are also Gaussian. Furthermore,
$$\mathrm{Cov}(Y_{t;j}, Y_{t;k}) = \sum_{l=1}^{m} A_{jl} A_{kl} = \delta_{jk}.$$
Proposition 1.5.1 Let $D \subset \mathbb{R}^m$ be a region whose boundary consists of the pieces $M$ and $N$, and for $x \in D$ let $p(x)$ denote the probability that Brownian motion starting at $x$ hits $N$ before $M$. Then $p$ satisfies the mean value property, with boundary values
$$p \equiv 1 \ \text{on } N; \qquad p \equiv 0 \ \text{on } M.$$
Proof - Let $\rho > 0$ be sufficiently small so that the sphere $S^{m-1}_\rho(x)$ of radius $\rho > 0$ centered at $x \in D$ is contained entirely in $D$. Let $T$ be the first hitting time of the sphere $S^{m-1}_\rho(x)$ given $Z_0 = x$. Then the distribution of the points $y$ defined by $Z_T = y$ is uniform on the sphere. Let $B_x$ be the event that starting at $x$ the Brownian motion hits $N$ before $M$. Consequently, in view of the Markov property (see remark 1.5.2 below), we have
$$P[B_x] = \int_{S^{m-1}_\rho(x)} P[B_y \mid Z_T = y]\, \frac{dv_S(y)}{\mathrm{vol}(S^{m-1}_\rho(x))},$$
where $dv_S(y)$ denotes the standard volume element on the sphere $S^{m-1}_\rho(x)$. Therefore we have
$$p(x) = \int_{S^{m-1}_\rho(x)} \frac{p(y)}{\mathrm{vol}(S^{m-1}_\rho(x))}\, dv_S(y), \qquad (1.5.14)$$
Remark 1.5.2 We are in fact using the strong Markov property of Brownian motion. This application of the property is sufficiently intuitive that we do not give any further justification. ♥
In polar and spherical coordinates the relevant Laplacians are
$$\Delta_2 = \frac{1}{r}\frac{\partial}{\partial r}\Big(r\frac{\partial}{\partial r}\Big) + \frac{1}{r^2}\frac{\partial^2}{\partial \theta^2},$$
and
$$\Delta_3 = \frac{1}{r^2}\frac{\partial}{\partial r}\Big(r^2\frac{\partial}{\partial r}\Big) + \frac{1}{r^2\sin\theta}\frac{\partial}{\partial \theta}\Big(\sin\theta\,\frac{\partial}{\partial \theta}\Big) + \frac{1}{r^2\sin^2\theta}\frac{\partial^2}{\partial \phi^2}.$$
Looking for spherically symmetric solutions $p_m$ (i.e., depending only on the variable $r$) the partial differential equations reduce to ordinary differential equations which we easily solve to obtain the solutions
$$p_2(x) = \frac{\log r - \log R}{\log 1 - \log R}, \quad \text{for } x = (r, \theta), \qquad (1.5.15)$$
and
$$p_3(x) = \frac{\frac{1}{r} - \frac{1}{R}}{1 - \frac{1}{R}}, \quad \text{for } x = (r, \theta, \phi), \qquad (1.5.16)$$
for the given boundary conditions. Now notice that
$$\lim_{R\to\infty} p_2(x) = 1, \qquad \lim_{R\to\infty} p_3(x) = \frac{1}{r}. \qquad (1.5.17)$$
The difference between the two cases is naturally interpreted as Brownian
motion being recurrent in dimension two but transient in dimensions ≥ 3. ♠
Remark 1.5.3 The function $u = \frac{1}{r^{m-2}}$ satisfies Laplace's equation in $\mathbb{R}^m$ for $m \ge 3$, and can be used to establish the analogue of example 1.5.5 in dimensions $\ge 4$. ♥
Brownian motion has the special property that the transition from a starting point $x$ to a set $A$ is determined by integrating the function $p_t(x - y; \sigma)$ with respect to $y$ over $A$. The fact that the integrand is a function of $y - x$ reflects a space homogeneity property which Brownian motion shares with random walks. On the other hand, Markov chains do not in general enjoy such a space homogeneity property. Naturally there are many processes that are Markovian in nature but do not have the space homogeneity property. We present some such examples constructed from Brownian motion.
Example 1.5.6 Consider Brownian motion $Z_t$ starting at $x > 0$, and let $\tilde{Z}_t$ denote the process absorbed at 0, i.e., stopped at the first hitting time of the origin. For $y \ge 0$ let $B_t(y)$ be the event that $Z_t > y$ and the path has not hit 0 up to time $t$, and $C_t(y)$ the event that $Z_t > y$ and the path has hit 0. We have
$$P[Z_t > y] = P[B_t(y)] + P[C_t(y)]. \qquad (1.5.18)$$
By the reflection principle
$$P[C_t(y)] = P[Z_t < -y].$$
Therefore
$$P[B_t(y)] = P[Z_t > y] - P[Z_t < -y] = P[Z_t > y - x \mid Z_0 = 0] - P[Z_t > x + y \mid Z_0 = 0] = \frac{1}{\sqrt{2\pi t}\,\sigma}\int_{y-x}^{y+x} e^{-\frac{u^2}{2t\sigma^2}}\, du = \int_y^{y+2x} p_t(u - x; \sigma)\, du.$$
Therefore for $\tilde{Z}_t$ we have
$$P[\tilde{Z}_t = 0 \mid \tilde{Z}_0 = x] = 1 - P[B_t(0)] = 1 - \int_{-x}^{x} p_t(u; \sigma)\, du = 2\int_0^{\infty} p_t(x + u; \sigma)\, du.$$
Similarly, for $0 < a < b$,
$$P[a < \tilde{Z}_t < b] = P[B_t(a)] - P[B_t(b)] = \int_a^b [p_t(u - x; \sigma) - p_t(u + x; \sigma)]\, du.$$
Thus we see that the transition probability has a discrete part $P[\tilde{Z}_t = 0 \mid \tilde{Z}_0 = x]$ and a continuous part $P[a < \tilde{Z}_t < b]$, and is not a function of $y - x$. ♠
Example 1.5.7 Let $Z_t = (Z_{t;1}, Z_{t;2})$ denote two dimensional Brownian motion with $Z_{0;i} = 0$, and define
$$R_t = \sqrt{Z_{t;1}^2 + Z_{t;2}^2}.$$
This process means that for every Brownian path ω we consider its dis-
tance from the origin. This is a one dimensional process on R+ , called radial
Brownian motion or a Bessel process and its Markovian property is intuitively
reasonable and will analytically follow from the calculation of transition prob-
abilities presented below. Let us compute the transition probabilities. We
have
$$P[R_t \le b \mid Z_0 = (x_1, x_2)] = \iint_{y_1^2 + y_2^2 \le b^2} \frac{1}{2\pi t\sigma^2}\, e^{-\frac{(y_1 - x_1)^2 + (y_2 - x_2)^2}{2t\sigma^2}}\, dy_1\, dy_2 = \frac{1}{2\pi t\sigma^2}\int_0^b\!\int_0^{2\pi} e^{-\frac{(r\cos\theta - x_1)^2 + (r\sin\theta - x_2)^2}{2t\sigma^2}}\, d\theta\, r\, dr = \int_0^b \frac{r}{2\pi t\sigma^2}\, e^{-\frac{r^2 + \rho^2}{2t\sigma^2}}\, I(r, x)\, dr,$$
where $(r, \theta)$ are polar coordinates in the $y_1 y_2$-plane, $\rho = \|x\|$ and
$$I(r, x) = \int_0^{2\pi} e^{\frac{r}{t\sigma^2}[x_1\cos\theta + x_2\sin\theta]}\, d\theta.$$
Setting $\cos\varphi = \frac{x_1}{\rho}$ and $\sin\varphi = \frac{x_2}{\rho}$, we obtain
$$I(r, x) = \int_0^{2\pi} e^{\frac{r\rho}{t\sigma^2}\cos\theta}\, d\theta.$$
The Bessel function $I_0$ is defined as
$$I_0(\alpha) = \frac{1}{2\pi}\int_0^{2\pi} e^{\alpha\cos\theta}\, d\theta.$$
Therefore the desired transition probability is
$$P[R_t \le b \mid Z_0 = (x_1, x_2)] = \int_0^b \tilde{p}_t(\rho, r; \sigma)\, dr, \qquad (1.5.19)$$
where
$$\tilde{p}_t(\rho, r; \sigma) = \frac{r}{t\sigma^2}\, e^{-\frac{r^2 + \rho^2}{2t\sigma^2}}\, I_0\Big(\frac{r\rho}{t\sigma^2}\Big).$$
The Markovian property of radial Brownian motion is a consequence of the expression for the transition probabilities since they depend only on $(\rho, r)$. From the fact that $I_0$ is a solution of the differential equation
$$\frac{d^2 u}{dz^2} + \frac{1}{z}\frac{du}{dz} - u = 0,$$
we obtain the partial differential equation satisfied by $\tilde{p}$:
$$\frac{\partial \tilde{p}}{\partial t} = \frac{\sigma^2}{2}\frac{\partial^2 \tilde{p}}{\partial r^2} + \frac{\sigma^2}{2r}\frac{\partial \tilde{p}}{\partial r},$$
which is the radial heat equation. ♠
Brownian motion with drift is the process
$$Z^{\mu}_t = Z_t + \mu t,$$
where $Z_t$ is Brownian motion and $\mu$ is a constant.
Example 1.5.8 Let $-b < 0 < a$, $x \in (-b, a)$ and let $p(x)$ be the probability that $Z^{\mu}_t$ hits $a$ before it hits $-b$. This is similar to proposition 1.5.1. Instead of using the mean value property of harmonic functions (which is no longer valid here) we directly use our knowledge of calculus to derive a differential equation for $p(x)$ which allows us to calculate it. The method of proof has other applications (see exercise 1.5.5). Let $h$ be a small number and $B$ denote the event that $Z^{\mu}_t$ hits $a$ before it hits $-b$. By conditioning on $Z^{\mu}_h$, and setting $Z^{\mu}_h = x + y$, we obtain
$$p(x) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi h}\,\sigma}\, e^{-\frac{(y - \mu h)^2}{2h\sigma^2}}\, p(x + y)\, dy. \qquad (1.5.20)$$
The Taylor expansion of $p(x + y)$ gives
$$p(x + y) = p(x) + y\, p'(x) + \frac{1}{2} y^2 p''(x) + \cdots$$
Now $y = Z^{\mu}_h - Z^{\mu}_0$ and therefore
$$\int_{-\infty}^{\infty} y\, \frac{e^{-\frac{(y - \mu h)^2}{2h\sigma^2}}}{\sqrt{2\pi h}\,\sigma}\, dy = \mu h, \qquad \int_{-\infty}^{\infty} y^2\, \frac{e^{-\frac{(y - \mu h)^2}{2h\sigma^2}}}{\sqrt{2\pi h}\,\sigma}\, dy = \sigma^2 h + h^2\mu^2.$$
It is straightforward to check that the contribution of the terms of the Taylor expansion containing $y^k$, for $k \ge 3$, is $O(h^2)$. Substituting in (1.5.20), dividing by $h$ and letting $h \to 0$ we obtain
$$\frac{\sigma^2}{2}\frac{d^2 p}{dx^2} + \mu\frac{dp}{dx} = 0.$$
The solution with the required boundary conditions is
$$p(x) = \frac{e^{\frac{2\mu b}{\sigma^2}} - e^{-\frac{2\mu x}{\sigma^2}}}{e^{\frac{2\mu b}{\sigma^2}} - e^{-\frac{2\mu a}{\sigma^2}}}.$$
The method of solution is applicable to other problems. ♠
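The formula for $p(x)$ can also be compared against a direct Monte Carlo simulation of the drifted path (a sketch; all parameters are illustrative and the time discretization introduces a small bias):

```python
# Sketch: Monte Carlo estimate of P[hit a before -b] vs. the closed form.
import numpy as np

rng = np.random.default_rng(6)
mu, sigma, a, b, x = 0.5, 1.0, 1.0, 1.0, 0.2
dt, trials = 1e-3, 4000

hits_a = 0
for _ in range(trials):
    z = x
    while -b < z < a:                    # run until the boundary is hit
        z += mu * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    hits_a += z >= a

c = 2 * mu / sigma**2
closed = (np.exp(c * b) - np.exp(-c * x)) / (np.exp(c * b) - np.exp(-c * a))
print(hits_a / trials, closed)           # closed form is about 0.81
```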
EXERCISES
Exercise 1.5.1 Formulate the analogue of the reflection principle for Brow-
nian motion and use it to give an alternative proof of (1.5.9).
Exercise 1.5.3 Generate ten paths for the simple symmetric random walk on $\mathbb{Z}$ for $n \le 1000$. Rescale the paths in the time direction by $\frac{1}{1000}$ and in the space direction by $\frac{1}{\sqrt{1000}}$, and display them as graphs.
Exercise 1.5.4 Display ten paths for two dimensional Brownian motion by
repeating the computer simulation of exercise 1.5.3 for each component. The
paths so generated are one dimensional curves in three dimensional space
(time + space). Display only their projections on the space variables.
Exercise 1.5.5 Consider Brownian motion with drift $Z^{\mu}_t$ and assume $\mu > 0$. Let $-b < 0 < a$, let $T$ be the first hitting time of the boundary of the interval $[-b, a]$, and assume $Z^{\mu}_0 = x \in (-b, a)$. Show that $E[T] < \infty$. Derive a differential equation for $E[T]$ and solve it for $\sigma = 1$.
Exercise 1.5.6 Consider Brownian motion with drift $Z^{\mu}_t$ and assume $\mu > 0$ and $a > 0$. Let $T^{\mu}_a$ be the first hitting time of the point $a$, and let $F_t(x) = P[T^{\mu}_a \le t \mid Z^{\mu}_0 = x]$. Using the method of example 1.5.8, derive a differential equation for $F_t(x)$.