0% found this document useful (0 votes)

11 views177 pages

Lecture PDF

Uploaded by

Ivo Tamara

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

11 views177 pages

Lecture PDF

Uploaded by

Ivo Tamara

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 177

(Higher) Probability and Stochastic Processes “Probability is common sense reduced to calculation”

Pierre-Simon Laplace (1749-1827)

MATH3801-MATH3901
Semester 1, 2017
MATH3801-MATH3901 Stochastic Processes 1
“When I hear you give your reasons,’ I remarked, ‘the thing Outline
always appears to me to be so ridiculously simple that I could
easily do it myself, though at each successive instance of your 1 Introduction to Probability Theory
reasoning I am baffled until you explain your process.” 2 Random variables
Dr. Watson to Sherlock Holmes
3 Conditional Probability and Conditional Expectation
A Scandal in Bohemia
Interlude Stochastic processes
4 Markov Chains
5 The Exponential Distribution and the Poisson Process
6 Continuous-Time Markov Chains
8 Queueing theory
10 Brownian Motion and Stationary Processes
11 Introduction to Martingales
MATH3801-MATH3901 Stochastic Processes 2 MATH3801-MATH3901 Stochastic Processes 3
1. Introduction to Probability Theory 1.1 Introduction
Probability theory
Any realistic model of a real-world phenomenon must take into account
1 Introduction to Probability the possibility of randomness
; probabilistic model: no attempt to deterministically predict the
Theory outcome of an experiment, but rather a set of possible outcomes
weighted by their likelihoods
Any real-world situation is concerned with probability ideas
; probability theory is a central field of mathematics
MATH3801-MATH3901 Stochastic Processes 4
1. Introduction to Probability Theory 1.1 Introduction 1. Introduction to Probability Theory 1.1 Introduction
A brief history of probability A brief history of probability
Probability theory has an infamous birthplace: gambling rooms 1654: correspondence between Pierre de Fermat (French lawyer,
Earliest publication on probability: “Liber de Ludo Aleae” (i.e. the book 1601-1665) and Blaise Pascal (French mathematician and
on games of chance) by the controversial Gerolamo Cardano philosopher, 1623-1662) on the questions of the fair division of the
(1501-1576), an Italian mathematician, astrologer and gambler of the stake in an interrupted game of chance (Chevalier de Mere’s problem)
time 1657: the Dutch mathematician and astronomer Christiaan Huygens
(1629-1695) gave the first comprehensive mathematical treatment of
the subject
Trying to put the odds in his favour in games of dice and cards, he
studied methods for evaluating how likely were certain random events
; he made some important discoveries and it is believed that the word
“probability” was first coined and used by him
MATH3801-MATH3901 Stochastic Processes 5 MATH3801-MATH3901 Stochastic Processes 6
1. Introduction to Probability Theory 1.2 Sample Space and Events 1. Introduction to Probability Theory 1.2 Sample Space and Events
Random experiment, sample space and events Random experiment, sample space and events
random experiment: whose outcome is not predictable in advance random experiment: whose outcome is not predictable in advance
sample space S = set of all possible outcomes sample space S = set of all possible outcomes
one possible outcome ! = an elementary event one possible outcome ! = an elementary event
any subset E of S (i.e. a set of elementary events) = an event any subset E of S (i.e. a set of elementary events) = an event
we say that an event E occurs when the observed ! is in E we say that an event E occurs when the observed ! is in E
Example 1 Example 2
experiment: flipping of a coin experiment: rolling a die
S = {H, T } S = {1, 2, 3, 4, 5, 6}
!1 = H, !2 = T !1 = 1, !2 = 2, . . ., !6 = 6
E = {H} = “the coin comes up heads” E = {2, 4, 6} = “an even number appears on the roll”
MATH3801-MATH3901 Stochastic Processes 7 MATH3801-MATH3901 Stochastic Processes 8
1. Introduction to Probability Theory 1.2 Sample Space and Events 1. Introduction to Probability Theory 1.2 Sample Space and Events
Random experiment, sample space and events Random experiment, sample space and events
random experiment: whose outcome is not predictable in advance random experiment: whose outcome is not predictable in advance
sample space S = set of all possible outcomes sample space S = set of all possible outcomes
one possible outcome ! = an elementary event one possible outcome ! = an elementary event
any subset E of S (i.e. a set of elementary events) = an event any subset E of S (i.e. a set of elementary events) = an event
we say that an event E occurs when the observed ! is in E we say that an event E occurs when the observed ! is in E
Example 3 Example 4
experiment: flipping two coins experiment: rolling two dice
S = {(H, H), (H, T ), (T , H), (T , T )} S = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), . . . , (6, 6)}
!1 = (H, H), . . ., !4 = (T , T ) !1 = (1, 1), . . ., !36 = (6, 6)
E = {(H, H), (H, T ), (T , H)} = “at least one coin comes up heads” E = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)}
= “the sum of the dice equals seven”
MATH3801-MATH3901 Stochastic Processes 9 MATH3801-MATH3901 Stochastic Processes 10
1. Introduction to Probability Theory 1.2 Sample Space and Events 1. Introduction to Probability Theory 1.2 Sample Space and Events
Random experiment, sample space and events Random experiment, sample space and events
random experiment: whose outcome is not predictable in advance random experiment: whose outcome is not predictable in advance
sample space S = set of all possible outcomes sample space S = set of all possible outcomes
one possible outcome ! = an elementary event one possible outcome ! = an elementary event
any subset E of S (i.e. a set of elementary events) = an event any subset E of S (i.e. a set of elementary events) = an event
we say that an event E occurs when the observed ! is in E we say that an event E occurs when the observed ! is in E
Example 5
Example 6
experiment: repeatedly flipping a coin until the first Head is
experiment: measuring the lifetime of a car (in years)
observed
S = [0, 1)
S = {H, TH, TTH, TTTH, . . .}
each ! is a non-negative real number (but impossible to cite all !’s
!1 = H, !2 = TH, !3 = TTH, . . .
as they are uncountably many)
E = {H, TH, TTH}
E = [2, 6] = “the car lasts between 2 and 6 years”
= “at most three tosses are needed to observe the first Head”
MATH3801-MATH3901 Stochastic Processes 11 MATH3801-MATH3901 Stochastic Processes 12
1. Introduction to Probability Theory 1.2 Sample Space and Events 1. Introduction to Probability Theory 1.3 Probabilities
Events “The probable is what usually happens”
The elements of interest are the events, which are sets
Aristotle (384-322 BCE)
) basic concepts of set theory are useful
Set notation
• union E1 [ E2 = event “either E1 or E2 occurs”
• intersection E1 \ E2 = event “both E1 and E2 occur”
• complement E1c = event “E1 does not occur”
S is an event ; certain event
.
S c = ; impossible event
E1 \ E2 = ; mutually exclusive events
E1 ✓ E2 ; E1 implies E2
(E1 [ E2 )c = E1c \ E2c Augustus De Morgan
De Morgan’s laws:
(E1 \ E2 )c = E1c [ E2c (1806-1871), British
mathematician
MATH3801-MATH3901 Stochastic Processes 13 MATH3801-MATH3901 Stochastic Processes 14
1. Introduction to Probability Theory 1.3 Probabilities 1. Introduction to Probability Theory 1.3 Probabilities
Probability Kolmogorov’s axioms of probability theory
Intuitively, the probability P(E) of an event E is a real number which
should measure
how likely E is to occur
Kolmogorov’s axioms
Naturally, such a number should satisfy some conditions The probability measure P(·) satisfies:
i) 0  P(E)  1 for any event E
Firm mathematical footing ; Kolmogorov’s axioms (1933)
ii) P(S) = 1
iii) for any (infinite) sequence of mutually exclusive events E1 , E2 , . . .,
! 1 1
[ X
P Ei = P(Ei )
i=1 i=1
Surprisingly enough, everything to know about probabilities follows
from these simple conditions, and only from them
Andrey Kolmogorov (1903-1987), Russian mathematician
MATH3801-MATH3901 Stochastic Processes 15 MATH3801-MATH3901 Stochastic Processes 16
1. Introduction to Probability Theory 1.3 Probabilities 1. Introduction to Probability Theory 1.3 Probabilities
Useful implications of the axioms Assigning probabilities
P( ) = 0 (; the impossible event has probability 0) Note
for any finite sequence of mutually exclusive events E1 , E2 , . . . , En , The axioms only state the conditions a valid assignment of
! n n probabilities must satisfy, but they do not tell how to assign specific
[ X
P Ei = P(Ei ) probabilities to events
i=1 i=1
; different approaches can be used, the most widely held being the
for any event E, P(E c ) = 1 P(E) frequentist approach
E1 ✓ E2 ) P(E1 )  P(E2 ) (increasing measure) Frequentist definition of probability
P(E1 [ E2 ) = P(E1 ) + P(E2 ) P(E1 \ E2 ), and by induction: If the experiment is repeated independently over and over again
Additive Law of Probability (or inclusion/exclusion principle) (infinitely many times), the proportion of time that the event E occurs is
n
! n its probability P(E)
[ X X X
P Ei = P(Ei ) P(Ei \ Ej ) + P(Ei \ Ej \ Ek ) Suppose the experiment is repeated n times. Then, the probability of
i=1 i=1 i<j i<j<k
! the event E is
n
\ number of times E occurs
+ . . . + ( 1)n 1
P Ei P(E) = lim
i=1
n!1 n
MATH3801-MATH3901 Stochastic Processes 17 MATH3801-MATH3901 Stochastic Processes 18
1. Introduction to Probability Theory 1.3 Probabilities 1. Introduction to Probability Theory 1.3 Probabilities
Assigning probabilities A simple example
Interpretation Example 1 (ctd.)
probability ' proportion of chances of occurrence of the event experiment: flipping a coin ; S = {H, T }
8
< i) 0  P(H)  1, 0  P(T )  1
It is straightforward to check that the so-defined probability measure Axioms: ii) P(S) = P(H [ T ) = 1
satisfies the axioms :
iii) P(H [ T ) = P(H) + P(T )
Of course, this definition remains theoretical, as assigning probabilities
would require infinitely many repetitions of the experiment The coin is tossed n times, we observe the proportion of H and T :
n=1000
Besides, in many situations, the experiment cannot be faithfully

1.0
proportion of Heads
proportion of Tails
replicated (What is the probability that it will rain tomorrow? What is

0.8
the probability of finding oil in that region?)
1

0.6
; P(H) = P(T ) =
; essentially, assigning probabilities in practice relies on prior 2

prop
knowledge of the experimenter (belief and/or model)

0.4
(fair coin, in this case)

0.2
A simple model is to assume that all the outcomes are equally likely,
other more elaborated models define probability distributions

0.0
0 200 400 600 800 1000
MATH3801-MATH3901 Stochastic Processes 19 MATH3801-MATH3901
repetition
Stochastic Processes 20
1. Introduction to Probability Theory 1.4 Conditional Probabilities 1. Introduction to Probability Theory 1.4 Conditional Probabilities
Conditional probabilities: definition Conditional probabilities: properties
Sometimes probabilities need to be re-evaluated as additional P( • |E2 ) = an alternative probability assignment
information becomes available ; satisfies Kolmogorov’s axioms (check!) and their implications
; conditional probability: based on the some partial information, e.g. P(S|E2 ) = 1, or P(E1c |E2 ) = 1 P(E1 |E2 )
effectively or allegedly observed P(E1 |S) = P(E1 )
Definition P(E1 |E1 ) = 1, P(E1 |E2 ) = 1 if E2 ✓ E1
The conditional probability of E1 , conditional on E2 , is defined as P(E1 |E2 ) ⇥ P(E2 ) = P(E1 \ E2 ) = P(E2 |E1 ) ⇥ P(E1 )
P(E1 \ E2 ) ; Bayes’ first rule: if P(E1 ), P(E2 ) > 0,
P(E1 |E2 ) = (if P(E2 ) > 0)
P(E2 ) P(E1 )
P(E1 |E2 ) = P(E2 |E1 ) ⇥
P(E2 )
= probability of E1 , given that E2 has occurred
; as we know that E2 has occurred, E2 becomes the new sample Multiplicative Law of Probability:
!n n\1
!
space in the place of S \
P Ei = P(E1 )⇥P(E2 |E1 )⇥P(E3 |E1 \E2 )⇥. . .⇥P En Ei
; the probability of E1 has to be calculated within E2 and relatively
i=1 i=1
to P(E2 )
MATH3801-MATH3901 Stochastic Processes 21 MATH3801-MATH3901 Stochastic Processes 22
1. Introduction to Probability Theory 1.4 Conditional Probabilities 1. Introduction to Probability Theory 1.4 Conditional Probabilities
Example 1.8 in the textbook: men mixing their hats at a party... From the Multiplicative Law of Probability,
Example 1.8 bis: the Matching Problem P(Ei \ Ej ) = P(Ej |Ei ) ⇥ P(Ei ) for any i 6= j
A computer system has 3 users, each with a unique name and password. Now, given Ei , that is knowing that the ith user has got their own password,
Due to a software error, the 3 passwords have been randomly permuted there remain two passwords that the jth user may get, one of these two being
internally. Only the users lucky enough to have had their passwords their own. So
unchanged in the permutation are able to continue using the system. What is P(Ej |Ei ) = 1/2
the probability that none of the three users has their password unchanged? and
P(Ei \ Ej ) = 1/6.
Denote A =“none has their password unchanged”, and Ei = “the ith user has
Likewise, given E1 \ E2 , that is knowing that the first two users have kept their
their password unchanged” (i = 1, 2, 3). See that
own passwords, there is only one password left, the one of the third user, and
Ac = E1 [ E2 [ E3 ,
P(E3 |E1 \ E2 ) = 1
for Ac = at least one has their password unchanged. By the Additive Law of
so that (again Multiplicative Law of Probability)
Probability,
P(E1 \ E2 \ E3 ) = P(E3 |E1 \ E2 ) ⇥ P(E2 |E1 ) ⇥ P(E1 ) = 1/6.
P(E1 [ E2 [ E3 ) = P(E1 ) + P(E2 ) + P(E3 ) P(E1 \ E2 )
Finally,
P(E1 \ E3 ) P(E2 \ E3 ) + P(E1 \ E2 \ E3 ).
P(E1 [ E2 [ E3 ) = 3 ⇥ 1/3 3 ⇥ 1/6 + 1/6 = 2/3
Clearly, as each user gets a password at random out of 3, including their own, and
P(Ei ) = 1/3, i = 1, 2, 3 P(A) = 1 P(E1 [ E2 [ E3 ) = 1/3.
MATH3801-MATH3901 Stochastic Processes 23 MATH3801-MATH3901 Stochastic Processes 24
1. Introduction to Probability Theory 1.5 Independent Events 1. Introduction to Probability Theory 1.5 Independent Events
Independence of two events
Example 1.9
We toss two fair dice, denote E1 =“the sum of the dice is six”, E2 =“the sum
Definition of the dice is seven” and F =“the first die shows four”. Are E1 and F
independent? Are E2 and F independent?
Two events E1 and E2 are said to be independent if and only if
Recall that S = {(1, 1), (1, 2), (1, 3), . . . , (6, 5), (6, 6)} (there are thus 36
P(E1 \ E2 ) = P(E1 ) ⇥ P(E2 ) possible outcomes). See that, for the dice are fair,
E1 = {(1, 5), (2, 4), (3, 3), (4, 2), (5, 1)} P(E1 ) = 5/36
Note that independence implies
E2 = {(1, 6), (2, 5), (3, 4), (4, 3), (5, 2), (6, 1)} P(E2 ) = 6/36
P(E1 |E2 ) = P(E1 ) and P(E2 |E1 ) = P(E2 ) F = {(4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6)} P(F ) = 6/36
i.e. the probability of the occurrence of one of the event is unaffected E1 \ F = {(4, 2)} P(E1 \ F ) = 1/36
by the occurrence or the non-occurrence of the other E2 \ F = {(4, 3)}, P(E2 \ F ) = 1/36
; in agreement with our everyday usage of the word “independent” Hence, P(E1 \ F ) 6= P(E1 ) ⇥ P(F ) and P(E2 \ F ) = P(E2 ) ⇥ P(F )
; E2 and F are independent, but E1 and F are not
MATH3801-MATH3901 Stochastic Processes 25 MATH3801-MATH3901 Stochastic Processes 26
1. Introduction to Probability Theory 1.5 Independent Events 1. Introduction to Probability Theory 1.5 Independent Events
Independence of more than two events Remark
Pairwise independent events need not be jointly independent !
Definition
The events E1 , E2 , . . . , En are said to be jointly independent if and only Example 1.10
if for every subset {i1 , i2 , . . . , ir : r  n} of {1, 2, . . . , n}, Let a ball be drawn totally at random from an urn containing four balls
0 1 numbered 1,2,3,4. Let E = {1, 2}, F = {1, 3} and G = {1, 4}
r
\ r
Y Because the ball is selected at random, P(E) = P(F ) = P(G) = 1/2, and
P@ Eij A = P(Eij )
j=1 j=1 P(E \ F ) = P(E \ G) = P(F \ G) = P(E \ F \ G) = P({1}) = 1/4
So, P(E \ F ) = P(E) ⇥ P(F ), P(E \ G) = P(E) ⇥ P(G) and
For instance, E1 , E2 and E3 are independent iff P(F \ G) = P(F ) ⇥ P(G), but
P(E1 \ E2 ) = P(E1 ) ⇥ P(E2 ) and P(E \ F \ G) 6= P(E) ⇥ P(F ) ⇥ P(G)
P(E1 \ E3 ) = P(E1 ) ⇥ P(E3 ) and The events E, F , G are pairwise independent, but they are not jointly
independent
P(E2 \ E3 ) = P(E2 ) ⇥ P(E3 ) and
; knowing that one event happened does not affect the probability of the
P(E1 \ E2 \ E3 ) = P(E1 ) ⇥ P(E2 ) ⇥ P(E3 ) others, but knowing that 2 events simultaneously happened does affect
the probability of the third one
MATH3801-MATH3901 Stochastic Processes 27 MATH3801-MATH3901 Stochastic Processes 28
1. Introduction to Probability Theory 1.5 Independent Events 1. Introduction to Probability Theory 1.6 Bayes’ Formula
Partition
Example 1.10 bis Definition
Let a ball be drawn totally at random from an urn containing 8 balls numbered A sequence of events E1 , E2 , . . ., En such that
Sn
1,2,3,. . . ,8. Let E = {1, 2, 3, 4}, F = {1, 3, 5, 7} and G = {1, 4, 6, 8} 1. S = i=1 Ei and
It is clear that P(E) = P(F ) = P(G) = 1/2, and 2. Ei \ Ej = for all i 6= j (mutually exclusive),
is called a partition of S
P(E \ F \ G) = P({1}) = 1/8 = P(E) ⇥ P(F ) ⇥ P(G),
Some examples:
but
P(F \ G) = P({1}) = 1/8 6= P(F ) ⇥ P(G).
Hence, the events E, F , G are not independent, although
P(E \ F \ G) = P(E) ⇥ P(F ) ⇥ P(G)
Simplest partition: {E, E c }, for any event E
MATH3801-MATH3901 Stochastic Processes 29 MATH3801-MATH3901 Stochastic Processes 30
1. Introduction to Probability Theory 1.6 Bayes’ Formula 1. Introduction to Probability Theory 1.6 Bayes’ Formula
Law of Total Probability Bayes’ second rule
From a partition {E1 , E2 , . . . , En }, any event A can be written
Now, put the Law of Total Probability in Bayes’ first rule and get
A = (A \ E1 ) [ (A \ E2 ) [ . . . [ (A \ En )
Bayes’ second rule
Given a partition {E1 , E2 , . . . , En } of S such that P(Ei ) > 0 for all i, we
have, for any event A such that P(A) > 0,
P(A|Ei )P(Ei )
P(Ei |A) = Pn
Pn j=1 P(A|Ej )P(Ej )
; P(A) = P(A \ E1 ) + P(A \ E2 ) + . . . + P(A \ En ) = i=1 P(A \ Ei )
Thomas Bayes (1702-1761), English
Law of Total Probability
mathematician and Presbyterian minister
Given a partition {E1 , E2 , . . . , En } of S such that P(Ei ) > 0 for all i, the
probability of any event A can be written (Bayes gave his name to the Bayesian definition
of probability, the main ‘competitor’ of the
n
X frequentist definition given on Slide 18)
P(A) = P(A|Ei ) ⇥ P(Ei )
i=1
; MATH3871 - Bayesian Inference
MATH3801-MATH3901 Stochastic Processes 31 MATH3801-MATH3901 Stochastic Processes 32
1. Introduction to Probability Theory 1.6 Bayes’ Formula
Example 1.13
Suppose a multiple choice test, with m multiple-choice alternatives for each
question. A student knows the answer of a given question with probability p.
If she does not know, she guesses. Given that the student correctly answered
a question, what is the probability that she effectively knew the answer?
Let C =“she answers the question correctly” and K =“she knows the
answer”. Then, we desire P(K |C). We have 2 Random Variables
P(K )
P(K |C) = P(C|K ) ⇥
P(C)
P(C|K ) ⇥ P(K )
=
P(C|K ) ⇥ P(K ) + P(C|K c ) ⇥ P(K c )
1⇥p
=
1 ⇥ p + (1/m) ⇥ (1 p)
mp
=
1 + (m 1)p
MATH3801-MATH3901 Stochastic Processes 33
2. Random Variables 2.1 Random Variables 2. Random Variables 2.1 Random Variables
Random variables Random variables and events
Events = set of possible outcomes of a random experiment Define SX the domain of variation of X , that is the set of possible
values taken by X
Often, we are only interested in what the outcome implies, but not in
the outcome itself Example: tossing two dice when playing a board game
Example: tossing two dice when playing a board game X = sum of the pips, SX = {2, 3, 4, . . . , 12}
S = {(1, 1), (1, 2), . . . , (6, 5), (6, 6)} Then, for any x 2 SX , assertions like “X = x” or “X  x” correspond to
... but usually only the sum of the pips matters a set of possible outcomes
; each possible outcome ! is characterised by a real value (X = x) = {! 2 S : X (!) = x} ⇢ S
Definition (X  x) = {! 2 S : X (!)  x} ⇢ S
A random variable is a real-valued function defined over the sample ; they are events ! ; meaningful to talk about their probability
space: Example (ctd.) - If the dice are fair
X :S!R
(X = 2) = {(1, 1)} ; P(X = 2) = 1/36
! ! X (!) (X 11) = {(5, 6), (6, 5), (6, 6)} ; P(X 11) = 3/36 = 1/12
MATH3801-MATH3901 Stochastic Processes 34 MATH3801-MATH3901 Stochastic Processes 35
2. Random Variables 2.1 Random Variables 2. Random Variables 2.1 Random Variables
Cumulative distribution function
Note 1 A random variable is essentially described by its cumulative
distribution function (cdf) (or just distribution)
It is important not to confuse:
X , the name of the random variable (capital letter) Definition
X (!), the numerical value that the random variable X associates The cdf of the random variable X is defined for any real number x, by
with an elementary event !
FX (x) = P(X  x)
x, a generic numerical value (lower case letter)
Some properties:
Note 2 for any a  b, P(a < X  b) = FX (b) FX (a)
Most interesting problems can be stated, often naturally, in terms of FX is a nondecreasing function
random variables
FX is right continuous
; many inessential details about the sample space can be left limx!+1 FX (x) = FX (+1) = 1
unspecified, and one can still solve the problem
limx! 1 FX (x) = FX ( 1) = 0
The last 4 properties essentially characterise a proper cumulative
distribution function
MATH3801-MATH3901 Stochastic Processes 36 MATH3801-MATH3901 Stochastic Processes 37
2. Random Variables 2.1 Random Variables 2. Random Variables 2.2 Discrete Random Variables
Cumulative distribution function Discrete random variables
Definition
A random variable is said to be discrete if it can only take on at most a
1.0
1.0

1.0

● ● ● ● ●
● ●
● ●
countable number of values
0.8
0.8

0.8

● ●
●
0.6
0.6

0.6

Suppose that those values are SX = {x1 , x2 , . . .}

FX(x)
FX(x)

FX(x)

0.4
0.4

0.4

● ●
●
Definition
0.2
0.2

0.2

The probability mass function (pmf) of a discrete random variable X is

0.0
0.0

0.0

● ●
x x x
defined for any real number x, by
Continuous distribution Discrete distribution Hybrid distribution
; continuous r.v. ; discrete r.v. ; hybrid r.v. pX (x) = P(X = x)
; pX (x) > 0 for x = x1 , x2 , . . ., and pX (x) = 0 for any other value of x
(Hybrid distributions will not be studied in this course) X
Obviously: pX (x) = 1
x2SX
MATH3801-MATH3901 Stochastic Processes 38 MATH3801-MATH3901 Stochastic Processes 39
2. Random Variables 2.2 Discrete Random Variables 2. Random Variables 2.2 Discrete Random Variables
Some discrete distributions of interest
probability mass function
1.0
0.8

Probability mass function:

0.6
pX(x)

“spikes” at x1 , x2 , . . . Some particular discrete distributions have been exhaustively studied

0.4

height of spike at xi = pX (xi )

●
●
● as they arise in situations very common in practice, e.g.
0.2

● ●
the Bernoulli distribution ;
0.0

●
x
the Binomial distribution ;
cumulative distribution function the Geometric distribution ;
the Poisson distribution ;
1.0

● ●
●
● ●
Cumulative distribution function: etc.
0.8

P
FX (x) =
i:xi x pX (xi )
● ●
0.6
FX(x)

step function
0.4 jumps at x1 , x2 , . . .
● ●
0.2
● ● magnitude of jump at xi = pX (xi )
0.0
●
MATH3801-MATH3901 Stochastic Processes 40 MATH3801-MATH3901 Stochastic Processes 41
x
2. Random Variables 2.2 Discrete Random Variables 2. Random Variables 2.2 Discrete Random Variables
The Binomial distribution The Binomial distribution
Assume: Note: the coefficients xn = n!/(x!(n x)!) are the binomial
the outcome of the random experiment can be classified as either coefficients, that is the coefficients arising in the famous Newton’s
a “Success” or a “Failure” (; S = {Success, Failure}) binomial expansion
n ✓ ◆
X
we observe Success with probability p n x n
(a + b)n = a b x
(hence Failure with probability 1 p) x
x=0
n independent repetitions of this experiment are performed
These coefficients are often represented in the Pascal’s triangle,
Define X = number of successes observed over the n repetitions. We named after Blaise Pascal (again, see Slide 6)
say that X has the binomial distribution with parameters n and p:
X ⇠ Bin(n, p)
See that SX = {0, 1, 2, . . . , n} and the binomial probability mass is
given by ✓ ◆
n x
pX (x) = p (1 p)n x
for x 2 SX .
x
MATH3801-MATH3901 Stochastic Processes 42 MATH3801-MATH3901 Stochastic Processes 43
2. Random Variables 2.2 Discrete Random Variables 2. Random Variables 2.2 Discrete Random Variables
The Binomial distribution The Bernoulli distribution
p = 0.1 p = 0.2 p = 0.5 p = 0.8 p = 0.9
Particular case: if n = 1 ; the Bernoulli distribution
1.0

1.0

1.0
Named after the Swiss scientist Jakob Bernoulli (1654-1705)
0.8

0.8

0.8
0.6

0.6

0.6
● ●
pX(x)

pX(x)

pX(x)
X ⇠ Bern(p)
0.4

0.4

0.4
● ●
● ● ● ●
● ●
0.2

0.2

0.2
● ●
pmf:
● ●
● ●
8
● ●
● ●
0.0

0.0

0.0
● ● ● ● ● ● ● ● ● ●
<1 p if x = 0
−1 0 1 2 3 4 5 6 −1 0 1 2 3 4 5 6 −1 0 1 2 3 4 5 6 −1 0 1 2 3 4 5 6 −1 0 1 2 3 4 5 6
x x x x x
pX (x) = p if x = 1
:
1.0

1.0

1.0
● ● ● ● ● ● ● ●
0 otherwise
● ● ● ●
● ●
● ●
● ●
0.8

0.8

0.8
● ●
● ●
SX = {0, 1} ; this random variable is often used to characterise the
● ●
0.6

0.6

0.6
● ●
FX(x)

FX(x)

FX(x)
● ●
occurrence/non-occurrence of a given event, or the presence/absence
0.4

0.4

0.4
● ●
● ●
of a given feature
● ●
0.2

0.2

0.2
● ●
● ●
● ●
● ●
n
0.0

0.0

0.0
● ● ● ● ●
● ● ● ● ●
● ●
−1 0 1 2 3 4 5 6 −1 0 1 2 3 4 5 6 −1 0 1 2 3 4 5 6 −1 0 1 2 3 4 5 6 −1 0 1 2 3 4 5 6 X
Note: if X ⇠ Bin(n, ⇡), we can represent it as X = Xi , where Xi ’s
x x x x x
i=1
Binomial pmf and cdf, for n = 5 and p = {0.1, 0.2, 0.5, 0.8, 0.9} are n independent Bernoulli r.v. with parameters ⇡
MATH3801-MATH3901 Stochastic Processes 44 MATH3801-MATH3901 Stochastic Processes 45
2. Random Variables 2.2 Discrete Random Variables 2. Random Variables 2.2 Discrete Random Variables
Random variables: remark The Geometric distribution
We defined random variables to be functions from a sample space S to R Assume:
However, we did not mention the sample space when introducing Binomial (or the outcome of the random experiment can be classified as either
Bernoulli) random variables
a “Success” or a “Failure” (; S = {Success, Failure})
Actually, in practice, the sample space often “disappears” but it is really there
we observe Success with probability p
in the background
(hence Failure with probability 1 p)
For instance, let’s construct a sample space explicitly for a Bernoulli r.v.:
independent repetitions of this experiment are performed until the
Imagine any random experiment whose sample space is S = [0, 1]
first success occurs
Define the probability P as P({! : ! 2 [a, b]}) = b a for 0  a  b  1. Fix
Define X = number of repetitions required until the first success. We
p 2 [0, 1] and define a random variable X : S = [0, 1] ! R as
⇢ say that X has the geometric distribution with parameter p:
1 if 0  !  p
X (!) =
0 if p < !  1 X ⇠ Geo(p)
Then see that P(X = 1) = P({! : ! 2 [0, p]}) = p and P(X = 0) = 1 p
See that SX = {1, 2, . . .} and the geometric probability mass is given
Thus, X ⇠ Bern(p). We could do something similar for any other distribution
by
So even if we often think of a random variable like a random number, bear in pX (x) = (1 p)x 1
p for x 2 SX
mind that is basically a function defined on some sample space
MATH3801-MATH3901 Stochastic Processes 46 MATH3801-MATH3901 Stochastic Processes 47
2. Random Variables 2.2 Discrete Random Variables 2. Random Variables 2.2 Discrete Random Variables
The Geometric distribution The Poisson distribution
p = 0.1 p = 0.2 p = 0.5 p = 0.8 p = 0.9
Assume you are interested in the number of occurrences of some
1.0

1.0

1.0
●
random phenomenon in a fixed period of time
0.8

0.8

0.8
●
0.6

0.6

0.6
Define X = number of occurrences. We say that X has the Poisson
pX(x)

pX(x)

pX(x)
●
0.4

0.4

0.4
● distribution with parameter , i.e.
0.2

0.2

0.2
●
● ●
● ●
X ⇠ P( ),
● ●
● ● ● ●
● ● ●
● ● ●
● ● ● ●
● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
0.0

0.0

0.0
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
if
x x x x x
x
pX (x) = e for x 2 SX = {0, 1, 2, . . .}
1.0

1.0

1.0
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ●
● ● ●
● ●
● ●
● ●
● ●
● ●
● ● ● ●
● ● ● ●
● ●
x!
● ●
● ● ● ●
● ● ●
● ● ● ●
● ●
● ● ● ●
● ●
0.8

0.8

0.8
● ● ● ● ● ●
● ●
● ● ● ●
● ●
● ●
● ●
● ●
● ●
Simeon-Denis Poisson
0.6

0.6

0.6
● ●
● ●
● ●
FX(x)

FX(x)

FX(x)
● ●
● ●
● ●
● ●
(1781-1840), French
0.4

0.4

0.4
● ●
● ●
● ●
● ●
mathematician
0.2

0.2

0.2
● ● ● ●
● ●
0.0

0.0

0.0
● ● ● ● ●
“Life is good for only two things:
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
x x x x x
discovering mathematics and
Geometric pmf and cdf, for p = {0.1, 0.2, 0.5, 0.8, 0.9} teaching it”
MATH3801-MATH3901 Stochastic Processes 48 MATH3801-MATH3901 Stochastic Processes 49
2. Random Variables 2.2 Discrete Random Variables 2. Random Variables 2.2 Discrete Random Variables
The Poisson distribution: how does it arise? The Poisson distribution: how does it arise?
think of the time period of interest as being split up into a large number, finally, as
say n, of subperiods ✓ ◆n
n!
assume that the phenomenon could occur at most one time in each of !1 and 1 !e
nx (n x)!(1 /n)x n
those subperiods, with some common probability p
as n ! 1, it remains
if what happens within one interval is independent of others,
x
X ⇠ Bin(n, p) P(X = x) = e for x 2 {0, 1, . . .}
x!
now, as n increases, p should decrease (the shorter the period, the less which is the Poisson distribution as defined on Slide 49
likely the occurrence of the phenomenon) ; let p = /n for some the Poisson distribution is thus suitable for modelling the number of
then, for any x 2 {0, 1, . . . , n}, the binomial pmf yields occurrences of a phenomenon satisfying some assumptions of
✓ ✓ ◆x ◆n x continuity, stationarity, independence and non-simultaneity
n!
P(X = x) = 1 is called the intensity of the phenomenon
x!(n x)! n n
x
✓ ◆n the Poisson distribution with = np is a good approximation for the
n!
= 1 binomial distribution when n is large and p is small. Specifically, if
nx (n x)!(1 /n)x x! n X ⇠ Bin(n, p), then P(X = x) ' P(Y = x) with Y ⇠ P(np)
MATH3801-MATH3901 Stochastic Processes 50 MATH3801-MATH3901 Stochastic Processes 51
2. Random Variables 2.2 Discrete Random Variables 2. Random Variables 2.3 Continuous Random Variables
The Poisson distribution Continuous random variables
As opposed to discrete r.v., a continuous random variable X is allowed
lambda = 0.1 lambda = 0.5 lambda = 1 lambda = 2 lambda = 10
1.0

1.0

1.0
to take on an uncountable number of values. SX is therefore an
●
0.8

0.8

0.8
uncountable set of real numbers, and can even be R itself. However,
0.6

0.6

0.6
●
this is not enough as a definition.
pX(x)

pX(x)

pX(x)
0.4

0.4

0.4
● ●
●
● ●
Definition
0.2

0.2

0.2
● ●
●
● ●
● ●
● ● ● ●
● ●
● ●
●
● ● ●
● ● ● ●
0.0

0.0

0.0
● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
A random variable X is said to be continuous if there exists a
● ● ● ● ● ● ● ● ● ● ● ● ●
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
x x x x x
nonnegative function fX (x) defined for all real x 2 R such that for any
set B of real numbers,
1.0

1.0

1.0
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ●
● ● ● ● ● ● ● ●
● ●
● ● ● ●
● ● ● ● ● ●
● ●
● ● ● ●
0.8

0.8

0.8
Z
● ●
● ●
● ●
● ●
0.6

0.6

0.6
● ●
P(X 2 B) = fX (x)dx
● ●
FX(x)

FX(x)

FX(x)
● ●
0.4

0.4

0.4
B
● ●
● ●
● ●
● ●
0.2

0.2

0.2
Consequence: P(X = x) = 0 for any x ! (intuitively understandable)
● ● ● ●
● ●
● ●
0.0

0.0

0.0
● ●
● ● ● ● ● ● ● ●
0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20 0 5 10 15 20
x x x x x
; the probability mass function is useless
Poisson pmf and cdf, for = {0.1, 0.5, 1, 2, 10} ; the probability density function (pdf) fX (x) will play the central
role for continuous distributions
MATH3801-MATH3901 Stochastic Processes 52 MATH3801-MATH3901 Stochastic Processes 53
2. Random Variables 2.3 Continuous Random Variables 2. Random Variables 2.3 Continuous Random Variables
Probability density function: properties Some continuous distributions of interest
Z x
FX (x) = P(X  x) = fX (y ) dy , that is
1
dFX (x)
fX (x) = = FX0 (x) Some particular continuous distributions have been exhaustively
dx
studied as they arise in situations very common in practice, e.g.
wherever FX is differentiable the Uniform distribution ;
fX (x) 0 8x 2 R (as FX (x) is nondecreasing)
Z the Exponential distribution ;
b
P(a  X  b) = fX (x)dx = FX (b) FX (a) the Gamma distribution ;
a
Z +1 the Normal distribution ;
fX (x)dx = 1 etc.
1
Z x+"/2
P(x "/2  X  x + "/2) = fX (y )dy ' "fX (x)
x "/2
; SX = {x 2 R : fX (x) > 0}
Note: as P(X = x) = 0, P(X < x) = P(X  x) (for continuous r.v. only!)
MATH3801-MATH3901 Stochastic Processes 54 MATH3801-MATH3901 Stochastic Processes 55
2. Random Variables 2.3 Continuous Random Variables 2. Random Variables 2.3 Continuous Random Variables
The Uniform distribution The Uniform distribution
A random variable is said to have the Uniform distribution over an
interval [↵, ], i.e.
X ⇠ U[↵, ]

1.0
if its probability density function is given by

0.8
1
⇢ 1 β−α
if x 2 [↵, ]
fX (x) = ↵ (; SX = [↵, ])

0.6
0 otherwise

FX(x)

fX(x)
0.4
Constant density ; X is just as likely to be “close” to any value in [↵, ]

0.2
By integration, it is easy to show that

0.0
8
< 0 if x < ↵ α β α β
x ↵
FX (x) = ↵ if ↵  x  x x
:
1 if x > cdf FX (x) pdf fX (x) = FX0 (x)
MATH3801-MATH3901 Stochastic Processes 56 MATH3801-MATH3901 Stochastic Processes 57
2. Random Variables 2.3 Continuous Random Variables 2. Random Variables 2.3 Continuous Random Variables
The Exponential distribution The Exponential distribution
A random variable is said to have the Exponential distribution with
parameter ( > 0), i.e.
X ⇠ Exp( ),
1
if its probability density function is given by
⇢ x
e if x 0
λ
fX (x) = (; SX = R+ )
0 otherwise
FX(x)

fX(x)
1/2 ●
By integration, it is easy to show that
⇢
0 if x < 0
FX (x) = x
1 e if x 0 0 ln 2
0 λ 0
This distribution is often useful for representing random amounts of x x
time, like the amount of time required to complete a specified task, the
waiting time at a counter, the amount of time until you receive a phone cdf FX (x) pdf fX (x) = FX0 (x)
call, the lifetime of an electrical item, etc.
MATH3801-MATH3901 Stochastic Processes 58 MATH3801-MATH3901 Stochastic Processes 59
2. Random Variables 2.3 Continuous Random Variables 2. Random Variables 2.3 Continuous Random Variables
The Gamma distribution The Gamma distribution
A random variable is said to have the Gamma distribution with α=1 β=1 2 α=2 β=1 2 α=3 β=1 2 α=5 β=1 α=9 β=2
parameters (↵, ) (↵ > 0, > 0), i.e.

0.6

0.6
0.5

0.5

0.5
X ⇠ (↵, ),

0.4

0.4
fX(x)

fX(x)

fX(x)
0.3

0.3

0.3
if its probability density function is given by

0.2

0.2
0.1

0.1

0.1
( x ( x)↵ 1
e

0.0

0.0
if x 0 0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
fX (x) = (↵) (; SX = R+ ) x x x x x
0 otherwise

1.0
1.0

1.0
Note: the function (·) is defined as

0.8
0.8

0.8
0.6
0.6

0.6
0.6

0.6
Z

FX(x)

FX(x)
+1
x ↵ 1

0.4
0.4
0.4

0.4

0.4
(↵) = e x dx

0.2
0

0.2
0.2

0.2

0.2
0.0

0.0

0.0
0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15 0 5 10 15
x x x x x
generally, no closed form exists for FX (x)
Gamma pdf and cdf, for (↵, ) = {(1, 1/2), (2, 1/2), (3, 1/2), (5, 1), (9, 2)}
useful to represent skewed random variables
MATH3801-MATH3901 Stochastic Processes 60 MATH3801-MATH3901 Stochastic Processes 61
2. Random Variables 2.3 Continuous Random Variables 2. Random Variables 2.3 Continuous Random Variables
The Gamma distribution The Normal distribution
Note 1 A random variable is said to have the Normal distribution with
parameters µ and 2 , i.e.
↵ = shape parameter, = scale parameter
2
X ⇠ N (µ, ),
Note 2
For an integral value of ↵, say ↵ = n, it can be shown that if its probability density function is given by
1 (x µ)2
(n) = (n 1)! fX (x) = p e 2 2 (; SX = R)
2⇡ 2
In that case, the distribution is also known as the Erlang-n distribution,
and we have Unfortunately, no closed form exists for
n 1
X ( x)i Z x µ)2
FX (x) = 1 e x 1 (y
i! FX (x) = p e 2 2 dy
i=0 2⇡ 2 1
Note 3
Particular case, if ↵ = 1: “natural” distribution (“bell-shaped”)
(1, ) = Exp( ) most widely used continuous probability distribution in statistics
MATH3801-MATH3901 Stochastic Processes 62 MATH3801-MATH3901 Stochastic Processes 63
2. Random Variables 2.3 Continuous Random Variables 2. Random Variables 2.3 Continuous Random Variables
The Normal distribution The Standard Normal distribution
The Standard Normal distribution is the Normal distribution with µ = 0
1 and 2 = 1. This yields
1
1 x2
2πσ2
fX (x) = p e 2
2⇡
. .
Usually, in this particular situation, fX (x) = (x) and FX (x) = (x).
FX(x)

fX(x)
1/2
Property: Standardisation
If X ⇠ N (µ, 2 ), then
X
0
µ − 2σ µ−σ µ µ+σ µ + 2σ µ
Z = ⇠ N (0, 1)
µ − 2σ µ−σ µ µ+σ µ + 2σ
x x
cdf FX (x) pdf fX (x) = FX0 (x)
MATH3801-MATH3901 Stochastic Processes 64 MATH3801-MATH3901 Stochastic Processes 65
2. Random Variables 2.3 Continuous Random Variables 2. Random Variables 2.4 Expectation of a Random Variable
The Standard Normal distribution Parameters of a distribution
Many random variables have complicated distribution functions and it
is sometimes difficult to get an intuitive understanding of their
1 behaviour by simply glancing at their cdf
1 Fact
2π
Some quantities characterise a random variable more usefully
(although incompletely) than the whole cumulative distribution function
Φ(x)

φ(x)

1/2
; focus on certain general properties of the distribution
The two most important such quantities are:
the expectation and
−2 −1 0 1 2 −2 −1 0 1 2
the variance
x x
of the distribution
cdf (x) pdf (x) = 0 (x)
MATH3801-MATH3901 Stochastic Processes 66 MATH3801-MATH3901 Stochastic Processes 67
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
Expectation Expectation
Definition Expectation = expected value, mean value, average value
= “central” value, around which the r.v. takes its values
The expectation of a random variable X whose cdf is FX (x) is given by
= “centre of gravity” of the distribution
Z +1
E(X ) = x dFX (x)
1
dFX (x) is the differential of the function FX (x)
In the discrete case: dFX (x) = pX (x) for x 2 SX and 0 otherwise
In the continuous case: dFX (x) = fX (x)dx for x 2 SX and 0 otherwise
Thus,
Discrete r.v. Continuous r.v. =
X Z E(X )
E(X ) = x pX (x) E(X ) = x fX (x)dx ; location parameter
x2SX SX
Note: unit of E(X ) = unit of X
MATH3801-MATH3901 Stochastic Processes 68 MATH3801-MATH3901 Stochastic Processes 69
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
Expectation: examples Expectation: examples
Example 2.15 Example 2.17
What is E(X ) when X ⇠ Bin(n, p)?
What is the expected outcome when a fair die is rolled?
n ✓ ◆ n
X = outcome, SX = {1, 2, 3, 4, 5, 6} with pX (x) = 1/6 for any x 2 SX X n x X xn!
E(X ) = x
p (1 p)n x
= px (1 p)n x
E(X ) = 1 ⇥ 1/6 + 2 ⇥ 1/6 + 3 ⇥ 1/6 + 4 ⇥ 1/6 + 5 ⇥ 1/6 + 6 ⇥ 1/6 x (n x)!x!
x=0 x=1
= 3.5 n
X n!
= px (1 p)n x
; E(X ) need not be a possible outcome ! (n x)!(x 1)!
x=1
n
X
; E(X ) is not the most likely outcome (this is called the mode) (n 1)!
= np px 1 (1 p)n x
(n x)!(x 1)!
Example 2.15 (bis) x=1
Xn ✓ ◆ n 1✓
X ◆
What is the expected sum when two fair dice are rolled? n 1 n 1 x
= np px 1 (1 p)n x = np p (1 p)n 1 x
x 1 x
x=1 x=0
X = sum, SX = {2, 3, . . . , 12} with pX (x) = (6 |7 x|)/36 for any x 2 SX
= np(p + (1 p))n 1
= np
; E(X ) = 2 ⇥ 1/36 + 3 ⇥ 2/36 + . . . + 12 ⇥ 1/36 = 7
MATH3801-MATH3901 Stochastic Processes 70 MATH3801-MATH3901 Stochastic Processes 71
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
Expectation: examples Expectation: examples
Corollary: X ⇠ Bern(p) ; E(X ) = p Example 2.19
What is E(X ) when X ⇠ P( )?
For any event A, define the indicator r.v. 1I{·} as
⇢
1 if the event A occurs +1
X x
1I{A} = E(X ) = xe
0 if the event A does not occur x!
x=0
+1
This is a Bernoulli r.v., with p = P(A). Hence, X x 1
= e
(x 1)!
x=1
E(1I{A} ) = P(A) +1
X x
= e
; the operators P and E are somewhat equivalent, and some authors x=0
x!
actually base probability theory upon the expectation operator E
= e e
rather than upon the probability measure P
=
(not investigated here)
MATH3801-MATH3901 Stochastic Processes 72 MATH3801-MATH3901 Stochastic Processes 73
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
Expectation: examples
Example 2.20
What is E(X ) when X ⇠ U[↵, ] ? Example 2.21
What is E(X ) when X ⇠ Exp( )?
Z
1 Z
E(X ) = x dx +1
↵ ↵ E(X ) = x e x
dx

1 x2 0
Z
= ⇥ ⇤ +1
↵ 2 ↵ = xe x +1
+ e x
dx (by parts)
0
2 0
↵2  +1
= e x
2( ↵) =0+
↵+ 0
= 1
2 =
MATH3801-MATH3901 Stochastic Processes 74 MATH3801-MATH3901 Stochastic Processes 75
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
The Cauchy distribution The Cauchy distribution
A random variable is said to have the Cauchy distribution, i.e.
X ⇠ Cauchy

1.0
if its probability density function is given by

0.30
1

0.8

0.25
fX (x) = (; SX = R)
⇡(1 + x 2 )

0.20
0.6
By integration, it can be found that

FX(x)

fX(x)

0.15
1 1

0.4
1
FX (x) = + tan

0.10
(x)
2 ⇡

0.2

0.05
0.00
0.0
Augustin-Louis Cauchy −10 −5 0
x
5 10 −10 −5 0
x
5 10
(1789-1857), French
cdf FX (x) pdf fX (x) = FX0 (x)
mathematician
MATH3801-MATH3901 Stochastic Processes 76 MATH3801-MATH3901 Stochastic Processes 77
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
The Cauchy distribution: non-existence of the Expectation of a function of a random variable
expectation Sometimes we are interested, not in the expected value of X , but in the
It turns out that Z 1
expected value of a function of X , say g(X ). There is actually no need
1 for explicitly deriving the distribution of g(X ).
x dx
1 ⇡(1 + x 2 )
is not defined Expectation of a function of a random variable
Indeed, If X is r.v. with distribution FX , then for any function g, we have
Z 1 Z 0 Z 1 Z +1
1 1 1
x dx = x dx + x dx E(g(X )) = g(x)dFX (x)
1 ⇡(1 + x 2 ) 1 ⇡(1 + x 2 ) 0 ⇡(1 + x 2 ) 1
= ( 1) + (1) = ? R +1
In particular, E(aX + b) = 1 (ax + b)dFX (x)
; the expectation of the Cauchy distribution does not exist! R +1 R +1
=a 1 xdFX (x) + b 1 dFX (x)
This explains why the Cauchy distribution makes frequent
appearances in many ‘counterexamples’ of any kind = aE(X ) + b
Bottom line: the expectation seems to be a basic and natural quantity, With a = 0 ; E(b) = b (degenerate random variable)
but it is not defined for all distributions
MATH3801-MATH3901 Stochastic Processes 78 MATH3801-MATH3901 Stochastic Processes 79
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
Moments of a random variable Variance of a random variable
Definition
The variance of a random variable X is defined as
Also, define the k th moment of a random variable as
⇣ ⌘
Z +1 Var(X ) = E (X E(X ))2
E(X k ) = x k dFX (x)
1
Z +1
Explicitly, Var(X ) = (x E(X ))2 dFX (x)
In particular, the expectation E(X ) is the first moment of X , and 1
Z ; expected squared deviation of X from its expected value
+1
E(X 2 ) = x 2 dFX (x) ; the variance cannot be negative
1
; the variance quantifies the dispersion of the possible values of X
is the second moment of X around the “central” value E(X )
Note: unit of Var(X ) = (unit of X )2
p
; the standard deviation of X is defined as Var(X ) (same unit as X )
MATH3801-MATH3901 Stochastic Processes 80 MATH3801-MATH3901 Stochastic Processes 81
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
Variance: illustration Variance: properties
Two random variables X1 and X2 , with E(X1 ) = E(X2 ) Property 1
pdf of X1 pdf of X2 Var(X ) = E(X 2 ) (E(X ))2
1.0

1.0

variability of X2
Proof: Z +1
2
0.8

0.8

Var(X ) = (x E(X )) dFX (x)

1
Z Z Z
0.6

0.6

+1 +1 +1
= x 2 dFX (x) 2E(X ) xdFX (x) + (E(X ))2 dFX (x)
fX(x)

variability of X1
fX(x)

1 1 1
0.4

0.4

= E(X 2 ) 2E(X )E(X ) + (E(X ))2

= E(X 2 ) (E(X ))2
0.2

0.2

E(X1) = 0 E(X2) = 0
0.0

0.0

−3 −2 −1 0 1 2 3 −3 −2 −1 0 1 2 3
As Var(X ) 0, this shows that E(X 2 ) (E(X ))2 for all random variable X
x x
This is a particular case of Jensen’s inequality: for all convex function h,
; Var(X1 ) > Var(X2 ) E(h(X )) h(E(X ))
MATH3801-MATH3901 Stochastic Processes 82 MATH3801-MATH3901 Stochastic Processes 83
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.4 Expectation of a Random Variable
Variance: properties Variance: example
Example
Property 2 What is Var(X ) when X ⇠ P( )?
Var(aX + b) = a2 Var(X )
+1
X x
Proof:
E(X 2 ) = x2 e
x!
x=0
Var(aX + b) = E((aX + b)2 ) (E(aX + b))2
+1
X x +1
X x
= E(a2 X 2 + 2abX + b2 ) (aE(X ) + b)2 = x(x 1) e + xe
x! x!
= a2 E(X 2 ) + 2abE(X ) + b2 (a2 (E(X ))2 + 2abE(X ) + b2 ) x=0 x=0
+1
X x 2
= a2 (E(X 2 ) (E(X ))2 ) = 2
e + E(X )
(x 2)!
= a2 Var(X ) x=2
2
= e e +
; dispersion not affected by translations 2
= +
With a = 0 ; Var(b) = 0 (degenerate random variable)
; Var(X ) = E(X 2 ) (E(X ))2 = 2
+ 2
=
MATH3801-MATH3901 Stochastic Processes 84 MATH3801-MATH3901 Stochastic Processes 85
2. Random Variables 2.4 Expectation of a Random Variable 2. Random Variables 2.5 Jointly distributed Random Variables
Variance: example Joint distribution function
Example Often, probability statements concerning two or more random
What is Var(X ) when X ⇠ U[↵, ] ? variables defined on the same sample space are of interest
; these two variables are most certainly related
Z  ; they should be jointly analysed
1 1 x3 3
↵3 2
+ ↵ + ↵2
E(X 2 ) = x2 dx = = = Let X and Y be two random variables defined on S
↵ ↵ ↵ 3 ↵ 3( ↵) 3
Definition
✓ ◆2 The joint cumulative distribution function of X and Y is given by
2
+ ↵ + ↵2 ↵+
; Var(X ) = E(X 2 ) (E(X ))2 =
3 2 FXY (x, y ) = P(X  x, Y  y ) 8(x, y ) 2 R ⇥ R
2
✓ ◆
+ ↵ + ↵2 ↵2 + 2↵ + 2
=
3 4 Note: (X  x, Y  y ) is the usual notation for (X  x) \ (Y  y )
( ↵)2 The so-called marginal cdf’s of X and Y are obtained as
=
12
FX (x) = FXY (x, +1) and FY (y ) = FXY (+1, y )
MATH3801-MATH3901 Stochastic Processes 86 MATH3801-MATH3901 Stochastic Processes 87
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Joint distribution: discrete case Joint distribution: continuous case
Definition
X and Y are said to be jointly continuous if there exists a function
fXY (x, y ) : R ⇥ R ! R+ such that for any sets A and B of real numbers
If X and Y are both discrete, the joint probability mass function is Z Z
defined by
P(X 2 A, Y 2 B) = fXY (x, y )dy dx
pXY (x, y ) = P(X = x, Y = y ) A B
The marginal pmf of X and Y can be obtained by The function fXY (x, y ) is the joint probability density of X and Y
X X The marginal densities follows from
pX (x) = pXY (x, y ) and pY (y ) = pXY (x, y ) Z Z Z
y 2SY x2SX fX (x)dx = P(X 2 A) = P(X 2 A, Y 2 SY ) = fXY (x, y )dy dx
A A SY
R R
Thus, fX (x) = SY fXY (x, y )dy and fY (y ) = SX fXY (x, y )dx
Rx Ry
Note that, by definition, FXY (x, y ) = 1 1 fXY (u, v ) dv du, so that
@ 2 FXY
(x, y ) = fXY (x, y )
@x@y
MATH3801-MATH3901 Stochastic Processes 88 MATH3801-MATH3901 Stochastic Processes 89
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Expectation of a function of two random variables Expectation of a function of two random variables
For instance, in the continuous case (also true in the discrete case)
Z Z
E(aX + bY ) = (ax + by )fXY (x, y )dy dx
SX SY
Z Z Z Z
For any function g : R ⇥ R ! R, the expectation of g(X , Y ) is given by
= ax fXY (x, y )dy dx + by fXY (x, y )dy dx
SX SY SX SY
X X Z Z Z Z
E(g(X , Y )) = g(x, y )pXY (x, y ) in the discrete case =a x fXY (x, y )dy dx + b y fXY (x, y )dx dy
SX SY SY SX
x 2S y 2SY
X Z Z
Z Z
=a xfX (x)dx + b yfY (y )dy
= g(x, y )fXY (x, y )dy dx in the continuous case SX SY
SX SY
= aE(X ) + bE(Y )
Example 2.29 bis
What is the expected sum obtained when two fair dice are rolled?
Let X be the sum and Xi the value shown on the ith die. Then, X = X1 + X2 ,
and
E(X ) = E(X1 ) + E(X2 ) = 2 ⇥ 3.5 = 7
MATH3801-MATH3901 Stochastic Processes 90 MATH3801-MATH3901 Stochastic Processes 91
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Joint distribution: more than 2 random variables Expectation of a function of several random variables
Example 2.31 bis: the Matching problem
A computer system has n users, each with a unique name and password.
The generalisation to more than 2 variables is “straightforward” Due to a software error, the n passwords have been randomly permuted
The joint cdf of n random variables X1 , X2 , . . . , Xn is given by internally. Only the users lucky enough to have had their passwords
unchanged in the permutation are able to continue using the system. What is
FX1 X2 ...Xn (x1 , x2 , . . . , xn ) = P(X1  x1 , X2  x2 , . . . , Xn  xn ), the expected number of users who have their password unchanged?
Let X = number of users who keep their own password. We have
from which we can define a joint pmf (if all Xi ’s are discrete) or a joint X = X1 + X2 + . . . + Xn where
pdf (if they satisfy a definition as on Slide 89) ⇢
1 if the ith user keeps their own password
Xi =
It also follows ! 0 otherwise
n
X n
X
E ai Xi = ai E(Xi ) Now, as the ith user is equally likely to get any of the n passwords,
i=1 i=1 P(Xi = 1) = 1/n
As any Xi is Bernoulli distributed, it follows E(Xi ) = 1/n 8i so that
E(X ) = E(X1 ) + E(X2 ) + . . . + E(Xn ) = n ⇥ 1/n = 1
MATH3801-MATH3901 Stochastic Processes 92 MATH3801-MATH3901 Stochastic Processes 93
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Independent random variables Independent random variables
Definition
Property
The random variables X and Y are said to be independent if, for all
If X and Y are independent, then for any functions h and g,
(x, y ) 2 R ⇥ R,
P(X  x, Y  y ) = P(X  x) ⇥ P(Y  y ) E(h(X )g(Y )) = E(h(X )) ⇥ E(g(Y ))
In other words, X and Y are independent if all couples of events Proof (in the continuous case):
(X  x) and (Y  y ) are independent
Z Z
Characterisation: For all (x, y ) 2 R ⇥ R, E(h(X )g(Y )) = h(x)g(y )fXY (x, y )dy dx
SX SY
FXY (x, y ) = FX (x) ⇥ FY (y ), Z Z
which is equivalent to = h(x)g(y )fX (x)fY (y )dy dx
SX SY
Z Z
pXY (x, y ) = pX (x) ⇥ pY (y ) in the discrete case = h(x)fX (x)dx ⇥ g(y )fY (y )dy
SX SY
or
fXY (x, y ) = fX (x) ⇥ fY (y ) in the continuous case = E(h(X )) ⇥ E(g(Y ))
(whenever FXY is differentiable at (x, y ))
MATH3801-MATH3901 Stochastic Processes 94 MATH3801-MATH3901 Stochastic Processes 95
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Covariance of two random variables Covariance: interpretation
Definition Suppose X and Y are two Bernoulli random variables, and see that
The covariance of two random variables X and Y is defined by XY is then also a Bernoulli. It follows:
Cov(X , Y ) = E (X E(X ))(Y E(Y )) Cov(X , Y ) = E(XY ) E(X )E(Y ) = P(X = 1, Y = 1) P(X = 1)P(Y = 1)
Properties: Then,
Cov(X , Y ) = Cov(Y , X ) Cov(X , Y ) > 0 , P(X = 1, Y = 1) > P(X = 1)P(Y = 1)
Cov(X , X ) = Var(X ) P(X = 1, Y = 1)
, > P(Y = 1)
Cov(X , Y ) = E(XY ) E(X )E(Y ) P(X = 1)
Cov(aX + b, cY + d) = ac Cov(X , Y ) , P(Y = 1|X = 1) > P(Y = 1)
0 1
Xn m
X Xn Xm ; the outcome X = 1 makes it more likely that Y = 1
Cov @ Xi , Yj A = Cov(Xi , Yj )
i=1 j=1 i=1 j=1
; Y tends to increase when X does, and vice-versa
This interpretation holds for any r.v. X and Y (not only for Bernoulli r.v.)
Note: unit of Cov(X , Y ) = (unit of X ) ⇥ (unit of Y ) ; correlation
MATH3801-MATH3901 Stochastic Processes 96 MATH3801-MATH3901 Stochastic Processes 97
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Covariance: interpretation Covariance: examples
Example
Cov(X , Y ) > 0 ; X and Y ● ●
Let the pmf of a r.v. X be pX (1) = pX ( 1) = p and pX (0) = 1 2p
tend to increase or decrease
●
●
● (p 2 (0, 1/2)). Define Y = X 2 . Find Cov(X , Y )
together ● ●
Cov(X , Y ) < 0 ; X tends to
● ●
We have Cov(X , Y ) = E(XY ) E(X )E(Y ) = E(X 3 ) E(X )E(X 2 ), but as X
● ●
only takes values 1, 0 and 1, X 3 = X . It remains
increase as Y decreases and
Y

●
● ● ●
vice-versa ●
●
●

●
● Cov(X , Y ) = E(X )(1 E(X 2 ))
Cov(X , Y ) = 0 ; no linear (X(ω), Y(ω))
Also, E(X ) = ( 1) ⇥ p + 1 ⇥ p = 0, so that
association between X and Y ● ●

●
(doesn’t mean no association
●
● Cov(X , Y ) = 0
at all!)
X ; X and Y are uncorrelated
Fact However, there is a direct functional dependence between X and Y !
X and Y independent ) Cov(X , Y ) = 0 In particular, P(Y = 0|X = 0) = 1, but P(Y = 0) = 1 2p 6= 0 ; X and Y are
: (X and Y are uncorrelated) not independent!
MATH3801-MATH3901 Stochastic Processes 98 MATH3801-MATH3901 Stochastic Processes 99
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Covariance: examples Variance of a sum of random variables
From the properties of the covariance, it follows:
Example ! 0 1
Xn Xn n
X Xn n
X
Let U and V be two r.v. with a common distribution. Find the covariance Var = Cov @
Xi Xi , Xj A = Cov(Xi , Xj )
between X = U V and Y = U + V . i=1 i=1 j=1 i=1 j=1
n n X
We have X X
= Cov(Xi , Xi ) + Cov(Xi , Xj )
Cov(X , Y ) = Cov(U V,U + V) i=1 i=1 j6=i
= Cov(U, U) Cov(V , U) + Cov(U, V ) Cov(V , V ) n
X X
= Var(U) Var(V ) Cov(V , U) + Cov(U, V ) = Var(Xi ) + 2 Cov(Xi , Xj )
i=1 i<j
Now, as U and V have the same distribution, Var(U) = Var(V ), and for
Cov(V , U) = Cov(U, V ), it remains Now, if X1 , X2 , . . . , Xn are uncorrelated random variables,
n
! n
Cov(X , Y ) = 0
X X
Var Xi = Var(Xi )
i=1 i=1
; X and Y are uncorrelated but they are not independent
Obviously, this is also true if X1 , X2 , . . . , Xn are independent
MATH3801-MATH3901 Stochastic Processes 100 MATH3801-MATH3901 Stochastic Processes 101
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Variance of a sum of r.v.: examples Variance of a sum of r.v.: examples
Example 2.34 Example 2.31 bis bis: the Matching problem
What is the variance of X if X ⇠ Bin(n, p) ? Find Var(X ) when X is the number of users who keep their password in
Example 2.31 bis
X = number of successes in n independent repetitions of a certain
experiment, thus X = X1 + X2 + . . . + Xn where the Xi are independent Remind that X = X1 + X2 + . . . + Xn , where Xi = 1 if the ith user keeps their
Bernoulli(p) r.v. such that own password and 0 otherwise. So Xi ⇠ Bern(1/n), but they are not
⇢ independent! Hence,
1 if the ith repetition is a success
Xi = n
X X
0 otherwise
Var(X ) = Var(Xi ) + 2 Cov(Xi , Xj ).
For any i, E(Xi2 ) = E(Xi ) = p, so that i=1 i<j
As Xi ⇠ Bern(1/n) for all i, it directly follows
Var(Xi ) = E(Xi2 ) (E(Xi ))2 = p p2 = p(1 p) ✓ ◆
1 1 n 1
Var(Xi ) = 1 =
Finally, as Var(X ) = Var(X1 ) + Var(X2 ) . . . + Var(Xn ), we have n n n2
Also, for i 6= j,
Var(X ) = np(1 p)
Cov(Xi , Xj ) = E(Xi Xj ) E(Xi )E(Xj )
MATH3801-MATH3901 Stochastic Processes 102 MATH3801-MATH3901 Stochastic Processes 103
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Variance of a sum of r.v.: examples Sum of independent random variables
Example 2.31 bis bis: the Matching problem
Find Var(X ) when X is the number of users who keep their password in Let X and Y be two random variables. We know
Example 2.31 bis
We have (Bernoulli r.v. with multiplicative law of probability):
E(X + Y ) = E(X ) + E(Y )
Var(X + Y ) = Var(X ) + Var(Y ) + 2 Cov(X , Y )
1 1
E(Xi Xj ) = P(Xi = 1, Xj = 1) = P(Xi = 1|Xj = 1) P(Xj = 1) = ⇥
n 1 n
and if X and Y are independent,
so that ✓ ◆2
1 1 1 1 Var(X + Y ) = Var(X ) + Var(Y )
Cov(Xi , Xj ) = = 2
n 1 n n n (n 1)
Finally, What about the full distribution of X + Y ?
✓ ◆
n 1 n 1 n 1 1 Nothing can be said in the general case
Var(X ) = n +2 = +
n2 2 n2 (n 1) n n
=1
MATH3801-MATH3901 Stochastic Processes 104 MATH3801-MATH3901 Stochastic Processes 105
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Sum of independent random variables Sum of independent random variables: examples
However if X and Y are independent, we have (in the continuous case)
ZZ Example 2.36: sum of two independent Uniform r.v.
FX +Y (z) = P(X + Y  z) = fXY (x, y ) dx dy Find the probability density of X + Y if X and Y are two independent U[0,1] r.v.
(x,y ):x+y z
Z +1 Z z y
= fX (x)fY (y ) dx dy We have fX (z) = fY (z) = 1 if 0  z  1, and 0 otherwise. Thus,
1 1 Z
Z +1 ✓Z z y ◆ 1
= fX (x) dx fY (y ) dy fX +Y (z) = fX (z y ) dy (= convolution of fX and fY ⌘ 1)
1 1 0
Z +1 Z
= FX (z y )fY (y ) dy = FX (z y )fY (y ) dy If 0  z  1, this reduces to
1 SY
Z z Hence,
By differentiating with respect to z, it remains (if some mild conditions hold), fX +Y (z) = dy = z 8
Z +1 0 < z if 0  z  1
fX +Y (z) = 2 z if 1  z  2
fX +Y (z) = fX (z y )fY (y ) dy and if 1  z  2, :
1 0 otherwise
Z 1
= convolution of fX (x) and fY (y )
P fX +Y (z) = dy = 2 z
In the discrete case: P(X + Y = z) = y 2SY P(X = z y )P(Y = y ) z 1
MATH3801-MATH3901 Stochastic Processes 106 MATH3801-MATH3901 Stochastic Processes 107
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.5 Jointly distributed Random Variables
Sum of independent random variables: examples Example 2.38
Let X1 , X2 , . . ., Xn be i.i.d. continuous r.v. with cdf F and pdf F 0 = f . Let X(i)
Example 2.37: sum of two independent Poisson r.v. denote the ith smallest of these random variables. What is the density of X(i) ?
Find the probability mass function of X + Y if X and Y are independent
Poisson r.v. with respective means 1 and 2 Let’s first derive the cdf:
the probability of any of the n r.v. X1 , . . ., Xn being less or equal to some
We have, for any z 2 {0, 1, . . .}, x is F (x) (same distribution)
z
X z k k
1 2 ; as they are independent, the number of them being effectively less or
P(X + Y = z) = e 1
e 2
(z k )! k! equal to x is Binomially distributed with parameters (n, F (x))
k =0
z
X z k k X(i) will be smaller or equal to x iff at least i are less or equal to x
=e ( 1+ 2) 1 2 Pn n k
k !(z k )! ; It follows P(X(i)  x) = k =i k (F (x)) (1 F (x))n k
k =0
X z Differentiating yields the density of X(i) :
e ( 1+ 2) z! z k k
= 1 2 n
z! k !(z k )! X n!
k =0
fX(i) (x) = k (F (x))k 1
f (x)(1 F (x))n k
e ( 1+ 2) (n k )!k !
z k =i
= ( 1 + 2)
z! + (F (x))k (n k )(1 F (x))n k 1
( f (x))
; X + Y ⇠ P( 1 + 2)
MATH3801-MATH3901 Stochastic Processes 108 MATH3801-MATH3901 Stochastic Processes 109
2. Random Variables 2.5 Jointly distributed Random Variables 2. Random Variables 2.6 Moment Generating Functions
Example 2.38 Moment Generating Function
Let X1 , X2 , . . ., Xn be i.i.d. continuous r.v. with cdf F and pdf F 0 = f . Let X(i)
Definition
denote the ith smallest of these random variables. What is the density of X(i) ?
The moment generating function (mgf) of a random variable X is
(ctd.) defined as
⇣ ⌘ Z +1
n
X n! 'X (t) = E etX = etx dFX (x)
fX(i) (x) = f (x) (F (x))k 1
(1 F (x))n k
(n k )!(k 1)! 1
k =i
n 1
X n! For any r.v. X , 'X is thus a positive function such that 'X (0) = 1
f (x) (F (x))k (1 F (x))n k 1
However, the integral need not exist for all t:
(n k 1)!k !
k =i
n for t > 0, 'X (t) only exists if 1 FX (x) approaches 0 faster than
X n!
= f (x) (F (x))k 1
(1 F (x))n k e tx as x ! +1
(n k )!(k 1)!
k =i for t < 0, 'X (t) only exists if FX (x) approaches 0 faster than e tx
n
X n! as x ! 1
f (x) (F (x))k 1
(1 F (x))n k
(n k )!(k 1)!
k =i+1 Fact:
n! if 'X exists (is finite) over a set of real numbers t around 0, then its
fX(i) (x)= f (x)(F (x))i 1
(1 F (x))n i
(intuitive)
(n i)!(i 1)! derivatives of all order exist at 0
MATH3801-MATH3901 Stochastic Processes 110 MATH3801-MATH3901 Stochastic Processes 111
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Moment Generating Function Moment Generating Function: examples
Property Example 2.40
If 'X (t) exists around 0, then for any positive integer n, Find the mgf of X ⇠ Bin(n, p) and deduce its expectation and variance
n ✓ ◆
d n 'X (t) X n x
= E(X n ), etx p (1 p)n x
(n)
'X (0) = 'X (t) =
dt n x
t=0 x=0
Xn ✓ ◆
that is the nth derivative of 'X evaluated at 0 is the nth moment of X n
= (pet )x (1 p)n x
x
x=0
; finding the mgf of a random variable provides a convenient way to = (pet + 1 p)n for all t
calculate its moments
Then, 'X0 (t) = n(pet + 1 p)n 1
pet so that
Note: the characteristic function of a random variable X is defined as
⇣ ⌘ Z 1 p E(X ) = 'X0 (0) = np,
= E eitX =
X (t) eitx where i = dFX (x),
1
1 and 'X00 (t) = n(n 1)(pet + 1 p)n 2
(pet )2 + n(pet + 1 p)n 1
pet , so that
This one enjoys the same properties as the mgf, and it always exists E(X 2 ) = 'X00 (0) = n(n 1)p2 + np
In this course, however, we will consider the moment generating
function 'X (t) only to avoid complex analysis It follows Var(X ) = n(n 1)p2 np (np)2 = np(1 p)
MATH3801-MATH3901 Stochastic Processes 112 MATH3801-MATH3901 Stochastic Processes 113
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Moment Generating Function: examples Moment Generating Function: examples
Example 2.42
Example 2.41
Find the mgf of X ⇠ Exp( ) and deduce its expectation and variance
Find the mgf of X ⇠ P( ) and deduce its expectation and variance
Z +1
+1 +1 'X (t) = etx e x
dx
X x X ( e t )x 0
'X (t) = etx e =e Z 
x! x! +1
e ( t)x +1
x=0 x=0 t)x
= e (
dx =
=e e et
=e (et 1)
for all t 0 ( t) 0
2
= if t <
Then, 'X0 (t) = et e (et 1)
and 'X00 (t) = et e (et 1)
+ et e (et 1)
, so that t
2
Then, 'X0 (t) = and 'X00 (t) = , so that
E(X ) = 'X0 (0) = , ( t)2 ( t)3
and E(X ) = 'X0 (0) = 1/ ,
E(X 2 ) = 'X00 (0) = 2
+ and
It follows Var(X ) = 2
+ 2
= E(X 2 ) = 'X00 (0) = 2/ 2
2 2 2
It follows Var(X ) = 2/ 1/ = 1/
MATH3801-MATH3901 Stochastic Processes 114 MATH3801-MATH3901 Stochastic Processes 115
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Moment Generating Function: examples Mgf: properties
Example 2.43
Property 1
Find the mgf of X ⇠ N (0, 1) and deduce its expectation and variance
The moment generating function uniquely determines the distribution
Z +1
1 x2 of a random variable
'X (t) = etx p e 2 dx
2⇡
Z +1
1
; there is a one-to-one correspondence between mgf and distribution
1 x 2 2tx
= p e 2 dx
1 2⇡ Example
Z +1
t2 1 (x t)2 t2
If the mgf of a random variable X is 'X (t) = e3(e
t 1) , find P(X = 0)
=e2 p e 2 dx = e 2
2⇡ 1
t2 t2
We see that 'X is the mgf of a Poisson random variable with mean
Then, 'X0 (t) = te 2 and 'X00 (t) = (t 2 + 1)e 2 , so that = 3. Therefore, the distribution of X must be P(3), and
E(X ) = 'X0 (0) = 0, 3
P(X = 0) = e ' 0.05
and
E(X 2 ) = 'X00 (0) = 1.
It follows Var(X ) = 1.
MATH3801-MATH3901 Stochastic Processes 116 MATH3801-MATH3901 Stochastic Processes 117
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Mgf: properties Mgf: properties
Property 2
Example 2.43
For any real numbers a and b, we have
2
Find the mgf of X ⇠ N (µ, ) and deduce E(X ) and Var(X )
'aX +b (t) = ebt 'X (at) t2 X µ
We know 'Z (t) = e 2 for Z = ⇠ N (0, 1). Hence, for X = µ + Z , it
Proof: 'aX +b (t) = E e(aX +b)t = E ebt eatX = ebt E e(at)X = ebt 'X (at) directly follows
2t2 2t2
'X (t) = eµt e 2 = eµt+ 2 .
Corollary: E(aX + b) = aE(X ) + b and Var(aX + b) = a2 Var(X )
Hence,
Proof: 2
2t2
0
= bebt aebt 'X0 (at) 'X0 (t) = (µ + t) eµt+ 2 ; E(X ) = 'X0 (0) = µ
'aX +b (t) 'X (at) +
and and
00 2 bt bt 0 2 bt 00
'aX +b (t) = b e 'X (at) + 2abe 'X (at) + a e 'X (at)
2t2
2 2
'X00 (t) = ( + (µ + t)2 ) eµt+ 2 ; E(X 2 ) = 'X00 (0) = 2
+ µ2
The result follows from E(aX + b) = 'aX
0
+b (0) and
so that
2
Var(X ) = E(X 2 ) (E(X ))2 = ( 2
+ µ2 ) µ2 = 2
00 0
Var(aX + b) = 'aX +b (0) ('aX +b (0))
MATH3801-MATH3901 Stochastic Processes 118 MATH3801-MATH3901 Stochastic Processes 119
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Mgf: properties Example 2.45
Find the distribution of X + Y if X ⇠ P( 1) and Y ⇠ P( 2 ), X and Y
Property 3 being independent
If X and Y are two independent random variables,
We have
'X +Y (t) = 'X (t) ⇥ 'Y (t) 1 (e
t 1) 2 (e
t 1) t 1)
'X +Y (t) = 'X (t)'Y (t) = e e = e( 1 + 2 )(e
Proof:
ind. ; X + Y ⇠ P( 1 + 2) (see Example 2.37)
'X +Y (t) = E et(X +Y ) = E etX etY = E etX E etY = 'X (t)'Y (t)
Example 2.44 Example 2.46
Find the distribution of X + Y if X ⇠ Bin(n1 , p) and Y ⇠ Bin(n2 , p), X and Y Find the distribution of X + Y if X ⇠ N (µ1 , 2 and Y ⇠ N (µ2 , 2 X
1) 2 ),
being independent
and Y being independent
We have
'X +Y (t) = 'X (t)'Y (t) = (pet + 1 p)n1 (pet + 1 p)n2 = (pet + 1 p)n1 +n2 We have 2 2 2 2 2 + 2 )t 2
1t 2t ( 1 2
'X +Y (t) = 'X (t)'Y (t) = eµ1 t+ 2 eµ2 t+ 2 = e(µ1 +µ2 )t+ 2
; X + Y ⇠ Bin(n1 + n2 , p)
; X + Y ⇠ N µ1 + µ2 , 2 + 2
1 2
MATH3801-MATH3901 Stochastic Processes 120 MATH3801-MATH3901 Stochastic Processes 121
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Joint Moment Generating Function Joint Moment Generating Function
Definition It can be shown that 'X1 X2 ···Xn uniquely determines the joint distribution
of (X1 , X2 , . . . , Xn ) (one-to-one correspondence between joint mgf and
The joint moment generating function of a collection of random
joint distribution)
variables X , X , . . . , X is defined for all real values t , t , . . . , tn by
1 2 n 1 2
⇣ ⌘ In addition, if 'X1 X2 ···Xn (t1 , t2 , . . . , tn ) exists in a nonempty rectangle
'X1 X2 ···Xn (t1 , t2 , . . . , tn ) = E et1 X1 +t2 X2 +...+tn Xn containing the origin, then its partial derivatives with respect to each ti
of every order exist at (0, 0, . . . , 0), and
(provided this expectation exists) ⇣ ⌘ @ k1 +k2 +...+kn ' (t , t , . . . , t )
X 1 2 n
E X1k1 X2k2 . . . Xnkn =
This joint mgf always exists at the origin (0, 0, . . . , 0), and @t1k1 @t2k2 . . . @tnkn t1 =0,t2 =0,...,tn =0
'X1 X2 ···Xn (0, 0, . . . , 0) = 1. At other points, it may or may not exist For instance, if (X , Y ) is a random vector with a joint mgf 'XY in some
Where it exists, the joint mgf provides much information about the joint rectangle around (0, 0), then
distribution of X , X , . . . , X :
1 2 n @ 2 'XY (t1 , t2 )
Cov(X , Y ) =
⇣ ⌘ @t1 @t2 t1 =0,t2 =0
'X1 X2 ···Xn (t, t, . . . , t) = E et(X1 +X2 +...+Xn ) = 'X1 +X2 +...+Xn (t) @' (t1 , t2 ) @' (t1 , t2 )
XY XY
⇣ ⌘ ⇥
@t1 @t2
' X1 X2 ···Xn (t, 0, . . . , 0) = E etX1 = ' (t) X1
t1 =0,t2 =0 t1 =0,t2 =0
MATH3801-MATH3901 Stochastic Processes 122 MATH3801-MATH3901 Stochastic Processes 123
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Multivariate Normal Distribution Cramér-Wold theorem
Definition
The random variables X1 , X2 , . . ., Xn are said to have a multivariate normal
distribution if there exist m independent standard normal random variables
Z1 , . . ., Zm and some constants aij , 1  i  n, 1  j  m and µi , 1  i  n,
An alternative definition is the following
such that 8
> X1
>
>
= µ1 + a11 Z1 + a12 Z2 + . . . + a1m Zm
Definition - ‘Cramér-Wold theorem’
< X2 = µ2 + a21 Z1 + a22 Z2 + . . . + a2m Zm
. The random variables X1 , X2 , . . ., Xn are said to have a multivariate
>
> ..
>
: normal distribution if and only if, for any vector (b1 , b2 , . . . , bn ) 2 Rn ,
Xn = µn + an1 Z1 + an2 Z2 + . . . + anm Zm
the linear combination b1 X1 + b2 X2 + . . . + bn Xn has a (univariate)
From the properties of the expectation, the variance and the covariance, it normal distribution
follows that for any i and j in {1, . . . , n},
m
X m
X
2
E(Xi ) = µi Var(Xi ) = aik Cov(Xi , Xj ) = aik ajk .
k =1 k =1
Besides, as a linear combination of independent normal rv, each Xi is
normally distributed
MATH3801-MATH3901 Stochastic Processes 124 MATH3801-MATH3901 Stochastic Processes 125
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Multivariate Normal Distribution - warning Copula
This can be understood through the concept of copula
Careful! The fact that Xi ⇠ N (µi , i2 ) for each i = 1, . . . , n does not It can be shown that any joint distribution FXY can be written as a
imply that (X1 , X2 , . . . , Xn ) together have a multivariate normal particular function of its marginals FX and FY
distribution
Copula
Here is a simple counterexample:
There always exists a function CXY such that, 8x, y ,
Let X ⇠ N (0, 1) and a > 0. Define
⇢ FXY (x, y ) = CXY (FX (x), FY (y ))
X if |X | < a
Y =
X if |X | a This function CXY is called the copula of the distribution, and is unique
if FXY is continuous (this generalises to more dimensions than 2)
Exercise: show that Y ⇠ N (0, 1) also, but (X , Y ) is not bivariate
normal Note 1: copula is Latin for ‘link’
Note 2: the previous result is known as Sklar’s theorem
MATH3801-MATH3901 Stochastic Processes 126 MATH3801-MATH3901 Stochastic Processes 127
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Copula Gaussian Copula
Intuitively, CXY describes how the marginal FX and FY ‘interact’ to
produce the joint FXY
In fact, two marginal normal distributions will interact so as to produce
a bivariate normal distribution with correlation ⇢ if and only if their
Example: Independence copula copula takes the following very particular form:
If CXY (u, v ) = uv , then FXY (x, y ) = FX (x) ⇥ FY (y ) ✓ ◆
2⇢ 1 (⇠) 1
1 (⌘)}2
)
(⌘) ⇢ 2 {
( 1 (⇠)}2 +{
; X and Y and independent Z u Z v exp 2(1 ⇢2 )
CXY (u, v ) = p d⇠ d⌘
0 0 1 ⇢2
In general, the copula fully accounts for the dependence structure
between random variables ; this is called the Gaussian copula with correlation ⇢
Importantly, it is disjoint from the marginals But any other copula function gives a joint distribution with normal
; with the same marginals, we can produce many different joint marginals, which is not bivariate normal!
distributions using different copulas, accounting for many different
dependence structures
MATH3801-MATH3901 Stochastic Processes 128 MATH3801-MATH3901 Stochastic Processes 129
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Normal marginals + Gaussian copula = Bivariate Normal marginals + other copula 6= Bivariate Normal
Normal
MATH3801-MATH3901 Stochastic Processes 130 MATH3801-MATH3901 Stochastic Processes 131
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Normal marginals + other copula 6= Bivariate Normal Joint Moment Generating Function: example
Example 2.48: the Multivariate Normal Distribution
Find the joint mgf of a multivariate normal vector (X1 , X2 , . . ., Xn )
⇣ Pn ⌘ Pn
By definition, 'X1 X2 ···Xn (t1 , t2 , . . . , tn ) = E e i=1 ti Xi . Note that i=1 ti Xi is
itself normally distributed, with
n
! n n
X X X
E ti Xi = ti E(Xi ) = ti µ i
i=1 i=1 i=1
and
n
! n
X X X
Var ti Xi = ti2 Var(Xi ) + 2 ti tj Cov(Xi , Xj )
i=1 i=1 i<j
n
X m
X X m
X
= ti2 2
aik +2 ti tj aik ajk
i=1 k =1 i<j k =1
MATH3801-MATH3901 Stochastic Processes 132 MATH3801-MATH3901 Stochastic Processes 133
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.6 Moment Generating Functions
Joint Moment Generating Function: example Joint Moment Generating Function: example
Example 2.48: the Multivariate Normal Distribution
Now, let X1⇤ , X2⇤ , . . ., Xn⇤ be independent normal r.v., with E(Xi⇤ ) = µi
Find the joint mgf of a multivariate normal vector (X1 , X2 , . . ., Xn )
and Var(Xi⇤ ) = Var(Xi )
; 'X1 X2 ···Xn (t1 , t2 , . . . , tn ) = 'Y (1) where Y is a normally distributed r.v., with n n
⇣ P ⌘ Y ⇣ ⌘ Y
the above parameters, i.e., from Example 2.43 Slide 119, n
; 'X1⇤ X2⇤ ···Xn⇤ (t1 , t2 , . . . , tn ) = E e i=1 ti Xi = E eti Xi =
⇤ ⇤
'Xi⇤ (ti )
0 1
n n m m i=1 i=1
X 1 X 2X 2 X X ✓ ◆
'X1 X2 ···Xn (t1 , t2 , . . . , tn ) = exp @ ti µ i + ti aik + ti tj aik ajk A n
Y
2 1
i=1 i=1 k =1 i<j k =1 = exp µi ti + ti2 Var(Xi )
2
i=1
If Cov(Xi , Xj ) = 0 for any i 6= j n n
!
X 1X 2
Pn Pn = exp ti µ i + ti Var(Xi )
; 'X1 X2 ···Xn (t1 , t2 , . . . , tn ) = exp 1 2 2
i=1 ti µi + 2 i=1 ti Var(Xi ) i=1 i=1
; identical to 'X1 X2 ···Xn (t1 , t2 , . . . , tn )
MATH3801-MATH3901 Stochastic Processes 134 MATH3801-MATH3901 Stochastic Processes 135
2. Random Variables 2.6 Moment Generating Functions 2. Random Variables 2.8 Limit Theorems
Joint Moment Generating Function: example Markov’s Inequality
Named after the Russian mathematician Andrey Markov (1856-1922)
The joint mgf of a series of independent normal r.v. is identical to the Proposition
joint mgf of a series of uncorrelated normal r.v.
If X is a random variable that takes only nonnegative values, then for
; both must have the same joint distribution! any value a > 0,
E(X )
P(X a) 
Consequence a
If X and Y are normally distributed: Proof:
Z +1 Z a Z +1
X and Y independent , Cov(X , Y ) = 0 E(X ) = x dFX (x) = x dFX (x) + x dFX (x)
0 0 a
Z +1
Note: take ⇢ = 0 in the expression of the Gaussian copula (Slide 129), x dFX (x)
it reduces to CXY (u, v ) = uv , i.e. the independence copula! Z
a
Z
+1 +1
a dFX (x) = a dFX (x) = aP(X a)
a a
MATH3801-MATH3901 Stochastic Processes 136 MATH3801-MATH3901 Stochastic Processes 137
2. Random Variables 2.8 Limit Theorems 2. Random Variables 2.8 Limit Theorems
Chebyshev’s inequality Strong Law of Large Numbers
Named after another Russian mathematician (who happened to be the Theorem
teacher of Markov): Pafnuty Chebyshev (1821-1894)
Let X1 , X2 , . . ., Xn be a sequence of independent random variables
Proposition having a common distribution, and let E(Xi ) = µ < 1. Then, with
Let X be a random variable with mean µ and variance 2. Then, for probability 1,
any value k > 0, . X1 + X2 + . . . + Xn
2 X̄n = !µ as n ! 1
P(|X µ| k)  n
k2
Proof: apply Markov’s inequality to the nonnegative r.v. (X µ)2 : For instance, suppose that a sequence of independent trials is
performed, and let E be a fixed event. Let Xi = 1 if E occurs on the ith
E((X µ)2 ) 2
trial and 0 otherwise. Then, the “long-run frequency”
P(|X µ| k ) = P((X µ)2 k 2)  =
k2 k2
X1 + X2 + . . . + Xn
! “sthg”,
Markov’s and Chebyshev’s inequalities provide bounds on some n
probabilities of interest which are valid for any random variable and this number is defined as the probability of the event E (frequentist
; herein lies their power definition of the probability - Slide 18)
MATH3801-MATH3901 Stochastic Processes 138 MATH3801-MATH3901 Stochastic Processes 139
2. Random Variables 2.8 Limit Theorems 2. Random Variables 2.8 Limit Theorems
Strong Law of Large Numbers: example Central Limit Theorem: motivation
.
The SLLN does not say that Sn = X1 + X2 + . . . + Xn is close to
average dice value over number of rolls E(Sn ) = nµ with high probability as n ! +1
The variance of Sn can easily be seen to equal n 2 ! 1
6

(with 2 = Var(Xi ))
; little idea on where about is Sn when n is large!
5

However, E(Sn /n) = µ and Var(Sn /n) = n 2 /n2 = 2 /n !0

4
mean value

; Sn /n converges to the degenerate rv µ, in the sense that

2
3

P (|Sn /n µ| k )  !0 for all k > 0, as n ! 1

nk 2
p p
However, E((Sn nµ)/ n) = 0 and Var((Sn nµ)/ n) = n 2 /n = 2
2
p
; (Sn nµ)/ n is a proper r.v. for any n
1
0 1000 2000 3000 4000 5000
The Central Limit Theorem states that the distribution of that random
trials variable as n ! 1 is normal, whatever the distribution of the Xi ’s
MATH3801-MATH3901 Stochastic Processes 140 MATH3801-MATH3901 Stochastic Processes 141
2. Random Variables 2.8 Limit Theorems 2. Random Variables 2.8 Limit Theorems
The Central Limit Theorem Central Limit Theorem
The Central Limit Theorem (CLT) is certainly one of the most Theorem
remarkable results in probability (“the unofficial sovereign of probability Let X1 , X2 , . . ., Xn be a sequence of i.i.d. random variables, with
theory”) E(Xi ) = µ < 1 and Var(Xi ) = 2 < 1. Then, the distribution of Sn pnµ n
tends to the standard normal as n ! 1, that is
It was first postulated by Abraham de Moivre (French mathematician,
✓ ◆ Z
1667-1754) who used the normal curve to approximate the distribution Sn nµ 1 z x2
of the number of heads resulting from many tosses of a fair coin P p z ! (z) = p e 2 dx
n 2⇡ 1
However, this received little attention until the French mathematician
This holds for any distribution of Xi ! (subject to the above regularity
Pierre-Simon Laplace (1749-1827) rescued it from obscurity in his
conditions) ; very powerful result!
monumental work “Théorie Analytique des Probabilités”, which was
published in 1812 This is usually denoted Sn pnµ L
! N (0, 1), or, dividing numerator and
n
But it was not before 1901 that it was defined in general terms and denominator by n,
formally proved by the Russian mathematician Aleksandr Lyapunov p X̄n µ L
n ! N (0, 1)
(1857-1918)
where X̄n = Sn /n is the average of X1 , X2 , . . ., Xn
MATH3801-MATH3901 Stochastic Processes 142 MATH3801-MATH3901 Stochastic Processes 143
2. Random Variables 2.8 Limit Theorems 2. Random Variables 2.8 Limit Theorems
Central Limit Theorem: a heuristic proof Central Limit Theorem: a heuristic proof (ctd.)
Suppose that Xi have mean 0 and variance 1, and let E etX be their p
p
common mgf. Then, the mgf of S / n is
n ; the mgf of Sn / n converges to the mgf of a standard normal distribution
p ⇣ p p p ⌘ p
; it can be shown that this implies that the distribution Sn / n converges to
E exp(t(X1 + X2 + . . . + Xn )/ n) = E etX1 / n etX2 / n · · · etXn / n
the standard normal distribution (using the one-to-one
⇣ p ⌘n
= E etX / n (i.i.d.) correspondence between mgf and distribution, plus some technical
arguments)
Now, when n is large, we have
p tX t 2X 2 When Xi has mean µ and variance 2
, the r.v. Xi µ
has mean 0 and variance
etX / n
'1+ p +
n 2n 1, and the general result follows.
Sn pnµ
by a Taylor expansion, so that Remark: the CLT states that n
⇠ N (0, 1) only when n is infinitely large
⇣ p ⌘ tE(X ) t 2
E(X 2
) t2
E etX / n ' 1 + p + =1+ ; never!
n 2n 2n
p ⇣ ⌘n ; the standard normal distribution provides a reasonable approximation to
t2
Therefore, E exp(tSn / n) ' 1 + 2n , and it follows the distribution of Sn pnµ
n
when “n is large”, and herein is its utility
p t2 The Central Limit Theorem mostly explains the prevalence of the normal
lim E exp(tSn / n) = e 2
n!1 distribution over the whole statistical theory
MATH3801-MATH3901 Stochastic Processes 144 MATH3801-MATH3901 Stochastic Processes 145
2. Random Variables 2.8 Limit Theorems 2. Random Variables 2.8 Limit Theorems
Central Limit Theorem: applications Central Limit Theorem: applications
Let X ⇠ Bin(n, p). Then, X = X1 + X2 + . . . + Xn where the Xi are Example 2.50
independent Bernoulli r.v. with parameter p, i.e. E(Xi ) = p and Let X be the number of times that a fair coin, flipped 40 times, lands Heads.
Var(Xi ) = p(1 p). Hence, Find the probability that X = 20
X np d Obviously, X ⇠ Bin(40, 1/2) so that E(X ) = 20, Var(X ) = 10 and
p ! N (0, 1) as n ! 1 ✓ ◆
np(1 p) 40
(1/2)40 = 0.1268
P(X = 20) =
20
Note: usually, the approximation is quite good for values of n satisfying
But imagine you miss this. It does not (really) matter, as the CLT provides you
with an approximate distribution, that is
np(1 p) 10
P(X = 20) = P(19.5 < X < 20.5) (continuity of N )
✓ ◆
19.5 20 X 20 20.5 20
=P p < p < p
10 10 10
' (0.16) ( 0.16)
= 0.1272 (from tables, or software)
; very good approximation
MATH3801-MATH3901 Stochastic Processes 146 MATH3801-MATH3901 Stochastic Processes 147
3. Conditional Probability and Conditional Expectation 3.1 Introduction
Conditional Probability and Conditional Expectation:
Introduction
In practice, we are often interested in calculating probabilities and
3 Conditional Probability and expectations when some partial information is available
Conditional Expectation ; conditional probabilities or expectations (see Slide 21)
Besides, first conditioning on some appropriate events can also prove
extremely useful in calculating probabilities or expectations in complex
problems
; recall the Multiplicative Law of Probability (Slide 22) or the
Law of Total Probability (Slide 31)
MATH3801-MATH3901 Stochastic Processes 148
3. Conditional Probability and Conditional Expectation 3.2 The Discrete Case 3. Conditional Probability and Conditional Expectation 3.2 The Discrete Case
Conditional probability mass function Conditional expectation (discrete case)
Recall that for any two events E1 and E2 , the conditional probability of Definition
E1 given E2 is defined, as long as P(E2 ) > 0, by The conditional expectation of X given that Y = y is defined by
X
P(E1 \ E2 ) E(X |Y = y ) =
x pX |Y (x|y )
P(E1 |E2 ) =
P(E2 ) x2SX
Therefore, for two discrete r.v. X and Y , it is natural to define the = same definition as the (unconditional) expectation, but everything
conditional probability mass function of X given Y = y by is here conditional on Y = y ; use pX |Y (x|y ) instead of pX (x)
P(X = x, Y = y ) pXY (x, y ) ; E(X |Y = y ) is the average value taken by X when Y = y
pX |Y (x|y ) = P(X = x|Y = y ) = = ,
P(Y = y ) pY (y )
Corollary 1: if X and Y are independent, E(X |Y = y ) = E(X ) (8y 2 SY )
for all values y such that P(Y = y ) > 0. Corollary 2: all the properties of the expectation remain valid, e.g.
Note: if X and Y are independent r.v., pX |Y (x|y ) = pX (x) 8y 2 SY E(X1 + X2 |Y = y ) = E(X1 |Y = y ) + E(X2 |Y = y )
E(aX + b|Y = y ) = aE(X |Y = y ) + b
E(XY |Y = y ) = y E(X |Y = y )
MATH3801-MATH3901 Stochastic Processes 149 MATH3801-MATH3901 Stochastic Processes 150
3. Conditional Probability and Conditional Expectation 3.2 The Discrete Case 3. Conditional Probability and Conditional Expectation 3.2 The Discrete Case
Conditional pmf: examples Conditional pmf: examples
Example 3.2 Example 3.2
Find the conditional pmf of X1 given X1 + X2 = m if X1 ⇠ Bin(n1 , p) and Find the conditional pmf of X1 given X1 + X2 = m if X1 ⇠ Bin(n1 , p) and
X2 ⇠ Bin(n2 , p) (X1 and X2 independent) X2 ⇠ Bin(n2 , p) (X1 and X2 independent)
We want P(X1 = x|X1 + X2 = m) Now,
By Example 2.44 (Slide 120), X1 + X2 ⇠ Bin(n1 + n2 , p), that is, n1 n2
✓ ◆ P(X1 = x, X1 + X2 = m) x m x
n1 + n2 m P(X1 = x|X1 + X2 = m) = = n1 +n2 ,
P(X1 + X2 = m) = p (1 p)n1 +n2 m P(X1 + X2 = m) m
m
for m = 0, 1, . . . , n1 + n2 for m = 0, 1, . . . , n1 + n2 and x = 0, 1, . . . , m
Also,
; this is the hypergeometric distribution
P(X1 = x, X1 + X2 = m) = P(X1 = x, X2 = m x)
✓ ◆ ✓ ◆ = distribution of the r.v. “number of blue balls that are chosen when a
n1 n2 sample of m balls are drawn from an urn that contains n1 blue balls and
= px (1 p)n1 x pm x (1 p)n2 m+x
x m x n2 red balls”
✓ ◆✓ ◆
n1 n2
= pm (1 p)n1 +n2 m (0  x  m) ; intuitive
x m x
MATH3801-MATH3901 Stochastic Processes 151 MATH3801-MATH3901 Stochastic Processes 152
3. Conditional Probability and Conditional Expectation 3.2 The Discrete Case 3. Conditional Probability and Conditional Expectation 3.2 The Discrete Case
Conditional pmf: examples Conditional pmf: examples
Example 3.3 Example 3.3
Find the conditional pmf and expected value of X1 given X1 + X2 = m if Find the conditional pmf and expected value of X1 given X1 + X2 = m if
X1 ⇠ P( 1 ) and X2 ⇠ P( 2 ) (X1 and X2 independent) X1 ⇠ P( 1 ) and X2 ⇠ P( 2 ) (X1 and X2 independent)
We want P(X1 = x|X1 + X2 = m) Then,
By Example 2.37 or 2.45 (Slide 121), X1 + X2 ⇠ P( 1 + 2 ), that is, x m x
P(X1 = x, X1 + X2 = m) m! 1 2
( 1 + 2 )m P(X1 = x|X1 + X2 = m) = = m
P(X1 + X2 = m) = e ( 1+ 2) P(X1 + X2 = m) x!(m x)! ( 1 + 2)
m! ✓ ◆✓ ◆x ✓ ◆m x
m 1 1
for m = 0, 1, . . . = 1
x 1 + 2 1 + 2
Also,
P(X1 = x, X1 + X2 = m) = P(X1 = x, X2 = m x) for m = 0, 1, . . . and x = 0, 1, . . . , m
x m x ⇣ ⌘
1 2
=e 1
e 2
; X1 |(X1 + X2 = m) ⇠ Bin m, 1
; intuitive
x! (m x)! 1+ 2
x m x
1 2 ; E(X1 |X1 + X2 = m) = m 1
=e ( 1+ 2)
(0  x  m) 1+ 2
x!(m x)!
MATH3801-MATH3901 Stochastic Processes 153 MATH3801-MATH3901 Stochastic Processes 154
3. Conditional Probability and Conditional Expectation 3.2 The Discrete Case 3. Conditional Probability and Conditional Expectation 3.2 The Discrete Case
Conditional pmf: examples Conditional pmf: examples
Example 3.5 Example 3.5 bis
There are n components. On a rainy day, component i will work with There are n components. On a rainy day, component i will work with
probability pi , and with probability qi on a nonrainy day. It will rain tomorrow probability pi , and with probability qi on a nonrainy day. It will rain tomorrow
with probability ↵. Find the conditional expected number of components that with probability ↵. Find the conditional expected number of components that
function tomorrow given that it rains. function tomorrow (no information about the weather).
P P P
Let ⇢ Then, we are searching for E( i Xi ) = i E(Xi ) = i P(Xi = 1). By the Law
1 if component i works tomorrow of Total Probability, we have
Xi = (Bernoulli r.v.)
0 otherwise
P(Xi = 1) = P(Xi = 1|Y = 1)P(Y = 1) + P(Xi = 1|Y = 0)P(Y = 0)
and Y a r.v. taking the value 1 if it rainsP
and 0 if it doesn’t. Then, the total (the events (Y = 0) and (Y = 1) form a partition of the sample space).
number of components that function is i Xi , and the desired conditional Hence,
expectation is P(Xi = 1) = ↵pi + (1 ↵)qi ,
!
n
X n
X n
X n
X and
E Xi Y =1 = E(Xi |Y = 1) = P(Xi = 1|Y = 1) = pi n
! n n n
X X X X
i=1 i=1 i=1 i=1 E Xi = {↵pi + (1 ↵)qi } = ↵ pi + (1 ↵) qi
i=1 i=1 i=1 i=1
; independent of ↵ and qi , as we know that it rains
MATH3801-MATH3901 Stochastic Processes 155 MATH3801-MATH3901 Stochastic Processes 156
3. Conditional Probability and Conditional Expectation 3.3 The Continuous Case 3. Conditional Probability and Conditional Expectation 3.3 The Continuous Case
Conditional probability density function Conditional probability density function: heuristics
Recall that the marginal density of Y were defined so as to have
If X and Y have a joint pdf fXY (x, y ), then we have the following
P(y /2  Y  y + /2) ' fY (y ). Similarly, the joint density of X
definition
and Y is such that
Definition
The conditional probability density function of X given that Y = y is P ((x "/2  X  x + "/2), (y /2  Y  y + /2)) ' " fXY (x, y )
defined, for all values y such that fY (y ) > 0, by Similarly, the conditional density should be such that
fXY (x, y ) P (x "/2  X  x + "/2) (y /2  Y  y + /2) ' "fX |Y (x|y )
fX |Y (x|y ) =
fY (y )
This probability equals
Note: if X and Y are independent, then fX |Y (x|y ) = fX (x) 8y 2 SY P ((x "/2  X  x + "/2), (y /2  Y  y + /2)) " fXY (x, y )
' ,
P(y /2  Y  y + /2) fY (y )
Careful! The link between fX |Y (x|y ) and the elementary conditional
probability definition (Slide 149) is not so obvious, as here so that
P(Y = y ) = 0 fXY (x, y )
fX |Y (x|y ) =
fY (y )
MATH3801-MATH3901 Stochastic Processes 157 MATH3801-MATH3901 Stochastic Processes 158
3. Conditional Probability and Conditional Expectation 3.3 The Continuous Case 3. Conditional Probability and Conditional Expectation 3.3 The Continuous Case
Conditional expectation (continuous case) Conditional pdf: example
Definition Example
The conditional expectation of X given that Y = y is defined by A soft-drink machine has a random amount Y in supply at the beginning of a
Z given day and dispenses a random amount X during the day. It is not
resupplied during the day, so X  Y always. It has been observed that X and
E(X |Y = y ) = x fX |Y (x|y ) dx
x2SX
Y has a joint density
⇢
1/2 if 0  x  y  2
= same definition as the (unconditional) expectation, but everything fXY (x, y ) =
0 elsewhere
is here conditional on Y = y ; use fX |Y (x|y ) instead of fX (x)
; E(X |Y = y ) is the average value taken by X when Y = y Find the conditional expectation of the dispensed amount X , given that Y = y
for some y 2 [0, 2]
Corollary 1: if X and Y are independent, E(X |Y = y ) = E(X ) (8y 2 SY )
We first need fX |Y (x|y ), and therefore fY (y ). We have
Corollary 2: all the properties of the expectation remain valid, e.g.
Z Z y
E(X1 + X2 |Y = y) = E(X1 |Y = y) + E(X2 |Y = y) y
fY (y ) = fXY (x, y ) dx = 1/2 dx = (if 0  y  2)
SX 0 2
E(aX + b|Y = y ) = aE(X |Y = y ) + b
E(XY |Y = y ) = y E(X |Y = y )
MATH3801-MATH3901 Stochastic Processes 159 MATH3801-MATH3901 Stochastic Processes 160
3. Conditional Probability and Conditional Expectation 3.3 The Continuous Case 3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning
Conditional pdf: example Conditional expectation as a random variable
Example For each given value y assumed by the random variable Y , the
It has been observed that X and Y has a joint density conditional expectation of X may take a different value, it is therefore a
⇢ function of y :
1/2 if 0  x  y  2 .
fXY (x, y ) = E(X |Y = y ) = (y )
0 elsewhere
Because we do not know in advance which value the random variable
Find the conditional expectation of the dispensed amount X , given that Y = y Y will take, we need to consider all possibilities
for some y 2 [0, 2]
A convenient way of doing this is to construct a new random variable
Then, for 0 < y < 2,
.
fXY (x, y ) 1/2 1 E(X |Y ) = (Y )
fX |Y (x|y ) = = =
fY (y )
y /2 y
if 0  x  y and 0 elsewhere ; X |(Y = y ) ⇠ U[0,y ] which takes the value E(X |Y = y ) whenever Y = y
Finally, ; the ‘conditional expectation’ is actually a random variable
Z Z y  y
1 1 x2 y As a function of Y , E(X |Y ) is a discrete random variable if Y is
E(X |Y = y ) = xfX |Y (x|y ) dx = x dx = =
SX 0 y y 2 0 2 discrete, and may be a continuous random variable if Y is continuous
(expectation of the U[0.y ] -distribution)
MATH3801-MATH3901 Stochastic Processes 161 MATH3801-MATH3901 Stochastic Processes 162
3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning 3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning
Law of Iterated Expectations Law of Iterated Expectations
Let us denote by E(X |Y ) the random variable, function of Y , whose
value at Y = y is E(X |Y = y ). Then, we have the following useful Theorem: Law of Iterated Expectations
result Let X and Y be two random variables. Then,
Theorem: Law of Iterated Expectations E(X ) = E [E(X |Y )]
Let X and Y be two random variables. Then,
Explicitly, in the discrete case, the last line of the proof is:
E(X ) = E [E(X |Y )] X
E(X |Y = y )P(Y = y )
E(X ) =
Proof (continuous case):
Z Z Z y 2SY
E(X ) = x fX (x) dx = x fXY (x, y ) dy dx Interpretation:
S SX SY
Z X Z E(X ) is the weighted average of the conditional expected values of X
= x fX |Y (x|y ) fY (y ) dy dx given that Y = y , each of these terms E(X |Y = y ) being weighted by
S SX
⇢Z Y Z the probability of the event on which it is conditioned
= x fX |Y (x|y ) dx fY (y ) dy See that this is a kind of ‘Law of Total Probability’ (Slide 31) for
SY SX
Z expectations, which is not surprising: recall the equivalence between
= E(X |Y = y )fY (y ) dy = E [E(X |Y )] expectations and probabilities (Slide 72)
SY
MATH3801-MATH3901 Stochastic Processes 163 MATH3801-MATH3901 Stochastic Processes 164
3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning 3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning
Law of Iterated Expectations Examples
There is a more general version of the previous theorem: Example 3.10
Theorem Sam will read either one chapter of his probability book or one chapter of his
history book, with equal probability of choosing either book. The number of
The conditional expectation E(X |Y ) satisfies misprints in a chapter in his probability book is Poisson distributed with mean
2, while the number of misprints in a chapter in his history book is Poisson
E [E(X |Y ) g(Y )] = E(X g(Y )) distributed with mean 5. Find the expected number of misprints that Sam will
come across.
for any function g (for which both expectations exist)
Let X be the number of misprints and
Proof (continuous case): ⇢
Z
0 if he chooses the history book
E [E(X |Y )g(Y )] = E(X |Y = y )g(y )fY (y ) dy Y =
SY 1 if he chooses the probability book
Z Z
= xfX |Y (x|y ) dx g(y )fY (y ) dy Then,
SY SX
Z Z E(X ) = E [E(X |Y )] = E(X |Y = 0)P(Y = 0) + E(X |Y = 1)P(Y = 1)
= xg(y )fXY (x, y ) dy dx = E(X g(Y ))
SX SY = 5 ⇥ 1/2 + 2 ⇥ 1/2 = 3.5 (misprints)
Besides, E(X |Y ) is the only function of Y satisfying this property
MATH3801-MATH3901 Stochastic Processes 165 MATH3801-MATH3901 Stochastic Processes 166
3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning 3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning
Example 3.11: expectation of a compound random variable
Example Suppose the expected number of accidents per week at an industrial plant is
four. Supposed also that the number of workers injured in each accident are
A quality control plan for an assembly line involves sampling n = 10 finished independent r.v. with a common mean 2. The number of workers injured in
items per day and counting X the number of defectives. Assume that P, the each accident is independent of the number of accidents that occur. What is
proportion of defective items on a day, actually varies from day to day the expected number of injuries during a week?
according to an uniform distribution on [0, 1/4]. Find the expected value of X .
Let N be the number of accidents and Xi the number of injured workers in the
If P = p was fixed, we would have X ⇠ Bin(n, p) so that E(X |P = p) = np, ith accident (i = 1, 2, . . . , N). Then, the total number of injuries is
that is, E(X |P) = nP. It follows that PN
X = i=1 Xi , and
E(X ) = E [E(X |P)] = E(nP) = nE(P). N
! " !# N
" # N
X X X ⇣ ⌘
E(X ) = E Xi =E E Xi N =E E Xi N
Now, as P ⇠ U[0,1/4] , we know that E(P) = 1/8, and finally, with n = 10, i=1 i=1 i=1
" N
#
E(X ) = 10 ⇥ 1/8 = 1.25 (defective item) X
=E E (Xi ) = E [N E (Xi )] = E (Xi ) E [N] = 2 ⇥ 4 = 8
i=1
PN
Note: a random variable like i=1 Xi , i.e. the sum of a random number N of
i.i.d. r.v. independent of N, is called a compound random variable
MATH3801-MATH3901 Stochastic Processes 167 MATH3801-MATH3901 Stochastic Processes 168
3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning 3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning
Example 3.12: expectation of the Geometric distribution Example 3.13
A miner is trapped in a mine and is facing three tunnels. The first one takes
A coin, having probability p of coming up Heads, is to be successfully flipped
him to safety after two hours, the second tunnel returns him to the same
until the first Heads appears. What is the expected number of flips required?
place after 3 hours and the third one returns him to the same place after 5
Let N be the number of flips required and hours. Assuming that the miner is at all times equally likely to choose any one
of the tunnels, what is the expected length of time until he reaches safety?
⇢
1 if the first flip results in Heads
Y = Let X denote the time until he reaches safety, and let Y be the tunnel he
0 if the first flip results in Tails
initially chooses (SY = {1, 2, 3}). We have:
Now, E(N) = E [E(N|Y )], that is
E(X ) = E(X |Y = 1)P(Y = 1) + E(X |Y = 2)P(Y = 2) + E(X |Y = 3)P(Y = 3)
E(N) = E(N|Y = 1)P(Y = 1) + E(N|Y = 0)P(Y = 0) = 1/3 ⇥ E(X |Y = 1) + E(X |Y = 2) + E(X |Y = 3)
= 1 ⇥ p + (1 + E(N)) ⇥ (1 p)
Now, E(X |Y = 1) = 2, E(X |Y = 2) = 3 + E(X ) and E(X |Y = 3) = 5 + E(X ),
Solve for E(N) and get so that
1 E(X ) = 1/3 ⇥ (2 + 3 + E(X ) + 5 + E(X )).
E(N) =
p Solve for E(X ) to get
Note: N is obviously a Geometric random variable with parameter p E(X ) = 10 (hours)
MATH3801-MATH3901 Stochastic Processes 169 MATH3801-MATH3901 Stochastic Processes 170
3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning 3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning
Example 3.14: the Matching problem Example 3.14: the Matching problem
Suppose in Example 2.31 bis (Slide 93) that the users getting their own Find E(Rn ) where Rn is the number of rounds that are necessary when n men
passwords are put aside, while the others are randomly assigned a new are initially present.
password from the remaining passwords. The process continues until each
Then, with R0 = 0,
user has their own password. Find E(Rn ) where Rn is the number of rounds
n
that are necessary when n users are initially concerned. X
E(Rn ) = (1 + E(Rn i ))P(Xn = i)
We know that there is on average one match per round, no matter how many i=0
n n
people are concerned, so that one might suggest E(Rn ) = n. Let us show X X
that by induction. It is obvious that R1 = 1, so that E(R1 ) = 1. Suppose now = P(Xn = i) + E(Rn )P(Xn = 0) + E(Rn i )P(Xn = i)
that E(Rk ) = k for k = 1, 2, . . . , (n 1). If Xn is the number of matches in the i=0 i=1
n
X
first round, we have
n
= 1 + E(Rn )P(Xn = 0) + (n i)P(Xn = i)
X i=1
E(Rn ) = E [E(Rn |Xn )] = E(Rn |Xn = i)P(Xn = i)
i=0 =1+ E(Rn )P(Xn = 0) + n(1 P(Xn = 0)) E(Xn )
= 1 + E(Rn )P(Xn = 0) + n(1 P(Xn = 0)) 1
Now, given a total of i matches in the first round, the number of rounds
E(Rn ) = E(Rn )P(Xn = 0) + n(1 P(Xn = 0)) , E(Rn ) = n
needed is 1 (the first round) plus the number of rounds required when n i
persons are concerned
MATH3801-MATH3901 Stochastic Processes 171 MATH3801-MATH3901 Stochastic Processes 172
3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning 3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning
Conditional variance The conditional variance formula
Definition Then, we have the following useful result
The conditional variance of a random variable X given that Y = y is Theorem: the conditional variance formula
given by Let X and Y be two random variables. Then,
h i
Var(X |Y = y ) = E (X E(X |Y = y ))2 Y = y Var(X ) = E [Var(X |Y )] + Var [E(X |Y )]
h i
Proof: E [Var(X |Y )] = E E(X 2 |Y ) (E(X |Y ))
2
; same expression as the usual (unconditional) variance except that
h i
all the expectations are taken conditional on Y = y = E(X 2 )
E (E(X |Y ))
2
; all properties of the variance hold, e.g.
and
Var(X |Y = y ) = E(X 2 |Y = y ) (E(X |Y = y ))2 h
2
i
2
Var [E(X |Y )] = E (E(X |Y )) {E [E(X |Y )]}
Var(aX + b|Y = y ) = a2 Var(X |Y = y ) h
2
i
2
= E (E(X |Y )) {E(X )}
Similarly to what we did for the expectation (Slide 162), we can define
the random variable Var(X |Y ), function of Y , whose value at Y = y is Sum these two quantities, the result follows, as Var(X ) = E(X 2 ) {E(X )}2
Var(X |Y = y )
MATH3801-MATH3901 Stochastic Processes 173 MATH3801-MATH3901 Stochastic Processes 174
3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning 3. Conditional Probability and Conditional Expectation 3.4 Computing Expectations by Conditioning
Example Example 3.19: variance of a compound random variable
A quality control plan for an assembly line involves sampling n = 10 finished Let X1 , X2 , . . . be i.i.d. r.v. with mean µ and variance 2 , and independent of
items per day and counting X the number of defectives. Assume that P, the the nonnegative integer-valued r.v. N. Find the variance of the compound r.v.
proportion of defective items on a day, actually varies from day to day XN
according to an uniform distribution on [0, 1/4]. Find the variance of X . Xi S=
i=1
For a given P = p, X has a binomial distribution so that E(X |P) = nP and
Var(X |P) = nP(1 P). Thus, We have N
! " N
!# " N
!#
X X X
Var(X ) = E [Var(X |P)] + Var [E(X |P)] = nE [P(1 P)] + n2 Var [P] Var(S) = Var Xi = E Var Xi N + Var E Xi N
i=1 i=1 i=1
Now, as P ⇠ U[0,1/4] , we have E(P) = 1/8 and Var(P) = 1/192, so that ⇣ PN ⌘
E(P 2 ) = 1/48. Hence, From Example 3.11, E i=1 Xi N = N E(Xi ) = µN. Similarly,
!
5n n2 N
X
Var(X ) = n(E(P) E(P 2 )) + n2 Var(P) = + , Var Xi N = N Var(Xi ) = 2
N,
48 192
i=1
which is, for n = 10,
so that
Var(X ) = 1.5625 (defective item2 ) Var(S) = E( 2
N) + Var(µN) = 2
E(N) + µ2 Var(N)
For instance, if N ⇠ P( ) (compound Poisson r.v.), Var(S) = E(X 2 )
MATH3801-MATH3901 Stochastic Processes 175 MATH3801-MATH3901 Stochastic Processes 176
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Computing Probabilities by Conditioning Computing Probabilities by Conditioning: examples
With the Law of Iterated Expectations we easily obtain expectations of Example 3.22
interest by first conditioning on an appropriate random variable
An insurance company supposes that the number of accidents that each of
; this allows us to treat each source of randomness individually, its policyholders will have this year is Poisson distributed, with a mean
which makes things simpler depending on the policyholder: the Poisson mean ⇤ of a randomly chosen
person has a Gamma distribution with density function f⇤ ( ) = e ( > 0).
; same approach possible to compute probabilities
Find the probability that a randomly chosen person has exactly n accidents
We know that, for any event E, P(E) can be written as the expectation this year.
of the indicator rv 1I{E} : E(1I{E} ) = P(E), see Slide 72
Let X denote the number of accidents that a randomly chosen person has
Then, it is also clear that E(1I{E} |Y = y ) = P(E|Y = y ) for any r.v. Y this year. Then, conditioning on ⇤ yields, for n = 0, 1, . . .,
From Slides 163-164, it follows Z +1
X P(X = n) = P(X = n|⇤ = )f⇤ ( ) d
P(E) = P(E|Y = y )P(Y = y ) if Y is discrete 0
Z +1 n
y 2SY
Z = e e d
0 n!
P(E) = P(E|Y = y )fY (y ) dy if Y is continuous Z +1
SY 1 n+1 2
= e d
Note: {(Y = y ), y 2 SY } is a partition of S ; law of total probability n! 0
MATH3801-MATH3901 Stochastic Processes 177 MATH3801-MATH3901 Stochastic Processes 178
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Computing Probabilities by Conditioning: examples Example 3.23
Example 3.22 The number of people visiting a yoga academy each day is Poisson with
mean . Each person who visits is, independently, female with probability p
An insurance company supposes that the number of accidents that each of and male with probability 1 p. Find the probability that exactly n women and
its policyholders will have this year is Poisson distributed, with a mean m men visit the academy today
depending on the policyholder: the Poisson mean ⇤ of a randomly chosen
person has a Gamma distribution with density function f⇤ ( ) = e ( > 0). Let N1 be the number of women, and N2 be the number of men visiting the
Find the probability that a randomly chosen person has exactly n accidents academy today, and let N = N1 + N2 . Conditioning on N yields
this year. +1
X
P(N1 = n, N2 = m) = P(N1 = n, N2 = m|N = i)P(N = i),
However,
i=0
2e 2 (2 )n+1
h( ) = >0 but as P(N1 = n, N2 = m|N = i) = 0 when i 6= n + m, this reduces to
(n + 1)!
is the density of a Gamma (n + 2, 2) r.v., so that it integrates to 1 P(N1 = n, N2 = m) = P(N1 = n, N2 = m|N = n + m)P(N = n + m)
n+m
Z +1 = P(N1 = n, N2 = m|N = n + m) e
n+1 2 (n + 1)! (n + m)!
; e d =
0 2n+2 n+m
It follows = P(N1 = n|N = n + m) e
n+1 (n + m)!
P(X = n) = (n = 0, 1, 2, . . .)
2n+2
MATH3801-MATH3901 Stochastic Processes 179 MATH3801-MATH3901 Stochastic Processes 180
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Example 3.23 Example 3.23
The number of people visiting a yoga academy each day is Poisson with The number of people visiting a yoga academy each day is Poisson with
mean . Each person who visits is, independently, female with probability p mean . Each person who visits is, independently, female with probability p
and male with probability 1 p. Find the probability that exactly n women and and male with probability 1 p. Find the probability that exactly n women and
m men visit the academy today m men visit the academy today
Now, given N = n + m, each of these n + m persons is independently a Remark:
woman with prob p and a man with prob 1 p
( p)n ( (1 p))m
n+m P(N1 = n, N2 = m) = e p ⇥ e (1 p)
; N1 |(N = n + m) ⇠ Bin(n + m, p) ; P(N1 = n|N = n + m) = n pn (1 p)m n! m!
Therefore, = P(N1 = n) ⇥ P(N2 = m)
✓ ◆
n+m n n+m ; N1 and N2 are independent
P(N1 = n, N2 = m) = p (1 p)m e n
n (n + m)! Moreover, as P(N1 = n) = e p ( p)
n! ,
n+m
(n + m)! n
= p (1 p)m e p
e (1 p)
N1 ⇠ P( p)
n!m! (n + m)!
p)n p( (1 p) ( (1 p))m and similarly
e =e N2 ⇠ P( (1 p))
n! m!
MATH3801-MATH3901 Stochastic Processes 181 MATH3801-MATH3901 Stochastic Processes 182
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Example Example
A total of n people have been invited to a party honoring an important official. A total of n people have been invited to a party honoring an important official.
It is starting at time 0. The arrival times of the n guests are i.i.d. exponential It is starting at time 0. The arrival times of the n guests are i.i.d. exponential
r.v. with mean 1, and the arrival time of the official is uniformly distributed r.v. with mean 1, and the arrival time of the official is uniformly distributed
between 0 and 1. between 0 and 1.
a) Find the probability that exactly k guests arrive before the official. a) Find the probability that exactly k guests arrive before the official.
Let us call T the arrival time of the official, and X the number of guests It follows
arriving before him. We know that T has density fT (t) = 1 on [0, 1] and 0 Z ✓ ◆
1
elsewhere. Conditioning on T therefore yields n
Z 1 P(X = k ) = (1 e t )k (e t )n k
dt
0 k
P(X = k ) = P(X = k |T = t) dt ✓ ◆Z 1
n
0 = (1 y )k y n k 1
dy
Now, given T = t, each guest i has the same probability of arriving earlier, k e 1
independently of the others: Ti ⇠ Exp(1) ; P(Ti  t) = 1 e t t
with the change of variable y = e
; X |(T = t) ⇠ Bin(n, 1 e t)
(n people arriving independently, each of them with common probability
1 e t to be before the official)
MATH3801-MATH3901 Stochastic Processes 183 MATH3801-MATH3901 Stochastic Processes 184
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Example Example 3.25: the best prize problem
A total of n people have been invited to a party honoring an important official. We are to be presented with n distinct prizes in sequence. When being
It is starting at time 0. The arrival times of the n guests are i.i.d. exponential presented with a prize, we must immediately decide whether to accept it or
r.v. with mean 1, and the arrival time of the official is uniformly distributed reject it and consider the next prize. When deciding, we only know the prizes
between 0 and 1. already seen, and a prize which is rejected is lost. The objective is to develop
a strategy maximizing the probability of obtaining the best prize.
b) Find the expected number of guests who arrive before the official.
T Fix a value k (0  k < n) and consider the strategy that rejects the first k
We are searching for E(X ). We know X |T ⇠ Bin(n, 1 e ), so that
prizes and then accept the first one that is better than all of those first k .
T
E(X |T ) = n(1 e ) Denote Pk (best) the probability that the best prize is selected using this
strategy, and X the position of the best prize. Conditioning on X yields
Hence,
n
X n
Z 1 Z 1 1X
Pk (best) = Pk (best|X = i)P(X = i) = Pk (best|X = i)
E(X ) = E [E(X |T )] = E(X |T = t)fT (t) dt = n (1 e t ) dt = ne 1
n
0 0 i=1 i=1
e 1
' 0.37 ; a little more than one third of the guests will be present Now, Pk (best|X = i) = 0 if i  k (no prize is then ever selected).
If i > k , the best prize is selected if the best of the first k prizes is also the
(on average)
best of the first i 1 ; = i) = k /(i 1)
Pk (best|X
MATH3801-MATH3901 Stochastic Processes 185 MATH3801-MATH3901 Stochastic Processes 186
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Example 3.25: the best prize problem Example 3.25: the best prize problem
We are to be presented with n distinct prizes in sequence. When being We are to be presented with n distinct prizes in sequence. When being
presented with a prize, we must immediately decide whether to accept it or presented with a prize, we must immediately decide whether to accept it or
reject it and consider the next prize. When deciding, we only know the prizes reject it and consider the next prize. When deciding, we only know the prizes
already seen, and a prize which is rejected is lost. The objective is to develop already seen, and a prize which is rejected is lost. The objective is to develop
a strategy maximizing the probability of obtaining the best prize. a strategy maximizing the probability of obtaining the best prize.
It follows 1 1
Now, the function g(x) = x log x is maximized at x = e , so that
n n 1
k X 1 k X1
Pk (best) = = , Pk (best) is maximized for k = ne 1
' 0.37 ⇥ n
n i 1 n i
i=k +1 i=k
which should be maximized with respect to k . For large n,
Z ; the best strategy is to let the first [n/e] prizes go by and then accept the
k n 11
Pk (best) ' dx first one that is better than all of those
n k x
✓ ◆ Besides, as g(e 1
)=e 1
, the probability that we select the best prize is
k n 1
= log 1
n k P[n/e] (best) ' e ' 0.37
k ⇣n⌘
log ' This problem is also known as the secretary problem or the marriage problem
n k
See www.science.unsw.edu.au/news/new-fiancee-formula-marriage
MATH3801-MATH3901 Stochastic Processes 187 MATH3801-MATH3901 Stochastic Processes 188
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Iterated Conditional Expectations Iterated Conditional Expectations: example
From Slides 150, 159 and 163, we know:
the conditional expectation satisfies all the properties of ordinary Example 3.32
expectations; An automobile insurance company classifies the policyholders as being one
of the types i = 1, . . . , k . The number of accidents that a type i driver has in
one of them being the Law of Iterated Expectation
successive years are assumed to be independent Poisson r.v. with mean i
E(X ) = E [E(X |Y )] (i = 1, . . . , k ). The probability that a newly insured person is type i is pi .
; this suggests: a) Given that a driver had n accidents in her first year, what is the expected
number of accidents that she has in her second year?
Iterated conditional expectations
E(X |Y ) = E[E(X |W , Y )|Y ] with probability 1 let Ni = number of accidents she has in year i (i = 1, 2)
we desire E(N2 |N1 = n)
where E(X |W , Y ) is the r.v. function of Y and W ,
equal to E(X |W = w, Y = y ) when W = w and Y = y see that N1 and N2 depend on the type of the driver
(this result can be properly proved) ; condition on the risk type T
Note: This result states the equality of two random variables, while the
Law of Iterated Expectations was about the equality of two numbers
MATH3801-MATH3901 Stochastic Processes 189 MATH3801-MATH3901 Stochastic Processes 190
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Iterated Conditional Expectations: example Iterated Conditional Expectations: example
Example 3.32 Example 3.32
An automobile insurance company classifies the policyholders as being one An automobile insurance company classifies the policyholders as being one
of the types i = 1, . . . , k . The number of accidents that a type i driver has in of the types i = 1, . . . , k . The number of accidents that a type i driver has in
successive years are assumed to be independent Poisson r.v. with mean i successive years are assumed to be independent Poisson r.v. with mean i
(i = 1, . . . , k ). The probability that a newly insured person is type i is pi . (i = 1, . . . , k ). The probability that a newly insured person is type i is pi .
a) Given that a driver had n accidents in her first year, what is the expected a) Given that a driver had n accidents in her first year, what is the expected
number of accidents that she has in her second year? number of accidents that she has in her second year?
Now, as {(T = i), i = 1, . . . , k } is a partition, Bayes’ second rule yields
k
X P(N1 = n|T = i)P(T = i)
E(N2 |N1 = n) = E(N2 |N1 = n, T = i)P(T = i|N1 = n) P(T = i|N1 = n) = Pk
i=1 j=1 P(N1 = n|T = j)P(T = j)
n
k
X pi e i i /n!
= E(N2 |T = i)P(T = i|N1 = n) = Pk
n
j=1 pj e j /n!
j
i=1
n
k
X pi e i i
= Pk
= i P(T = i|N1 = n) j=1 pj e
j n
j
i=1
MATH3801-MATH3901 Stochastic Processes 191 MATH3801-MATH3901 Stochastic Processes 192
3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning 3. Conditional Probability and Conditional Expectation 3.5 Computing Probabilities by Conditioning
Iterated Conditional Expectations: example Iterated Conditional Expectations: example
Example 3.32 Example 3.32
An automobile insurance company classifies the policyholders as being one An automobile insurance company classifies the policyholders as being one
of the types i = 1, . . . , k . The number of accidents that a type i driver has in of the types i = 1, . . . , k . The number of accidents that a type i driver has in
successive years are assumed to be independent Poisson r.v. with mean i successive years are assumed to be independent Poisson r.v. with mean i
(i = 1, . . . , k ). The probability that a newly insured person is type i is pi . (i = 1, . . . , k ). The probability that a newly insured person is type i is pi .
a) Given that a driver had n accidents in her first year, what is the expected b) Given that a driver had n accidents in her first year, what is the conditional
number of accidents that she has in her second year? probability that she has m accidents in her second year?
It follows:
k
X
Pk n+1 k
i=1 pi e
i
i
X pi e i n
i
P(N2 = m|N1 = n) = P(N2 = m|N1 = n, T = i)P(T = i|N1 = n)
E(N2 |N1 = n) = Pk = Pk i i=1
n n
j=1 pj e j=1 pj e
j j
j i=1 j k
X m
pi e i n
i i
n = e i
Pk
; weighted average of the i ’s, with weights proportional to pi e i
m! n
i j=1 pj e
j
i=1 j
Pk 2 i m+n
i=1 pi e i
= Pk
m! n
i=1 pi e
i
i
MATH3801-MATH3901 Stochastic Processes 193 MATH3801-MATH3901 Stochastic Processes 194
Interlude - Stochastic processes 2.9 Stochastic Processes
Stochastic Processes
Definition
A stochastic process X = {X (t) : t 2 T } is an infinite collection of
random variables defined on the same sample space, indexed by an
integer or a real number
Interlude - Stochastic processes ; for each fixed t 2 T , X (t) is a random variable
Similarly to the definition of a random variable, we have
X :S !R⇥T
! ! (X (t)) (!)
i.e. a whole function is associated to each possible outcome ! of a
random experiment. This function is called a sample path
Note: stochastic and random are synonyms, but for some reason
“random” is more popular for variables and “stochastic” for processes
(‘ ⌧ o ↵ ⌧ ◆o&’, ancient Greek for ‘proceeding by guesswork’)
MATH3801-MATH3901 Stochastic Processes 195
Interlude - Stochastic processes 2.9 Stochastic Processes Interlude - Stochastic processes 2.9 Stochastic Processes
Discrete- and Continuous-time Stochastic Processes Discrete- and Continuous-state Stochastic Processes
The index t is usually interpreted as time The state space SX is defined as the set of all possible values that Xn
; X (t) is the state of the process at time t or X (t) can assume at any time
The set T is called the index set of the process ; it is the common domain of variation of the random variables {Xn }
when T is a countable set, X is said to be a discrete-time process, or {X (t)}
and we use the notation It might be discrete or continuous: discrete-state or continuous-state
{Xn , n = 0, 1, 2, . . .} stochastic processes
when T is an interval on the real line, X is said to be a Hence there are essentially four main types of stochastic processes:
continuous-time process, denoted discrete-time discrete-state stochastic processes (e.g.
{X (t), t 0} discrete-time Markov Chains, Ch.4)
continuous-time discrete-state stochastic processes (e.g. Poisson
; whether T is discrete or continuous is important in determining processes, Ch.5)
how the process will be mathematically studied discrete-time continuous-state stochastic processes
(just as discrete random variables are distinguishable from continuous-time continuous-state stochastic processes (e.g.
continuous ones) Brownian motion, Ch.10)
MATH3801-MATH3901 Stochastic Processes 196 MATH3801-MATH3901 Stochastic Processes 197
Interlude - Stochastic processes 2.9 Stochastic Processes Interlude - Stochastic processes 2.9 Stochastic Processes
Stochastic processes Stochastic Processes: examples
Some examples
The sequences of random variables {Xn , n = 0, 1, 2, . . .} or X (t) might be
{X (t), t 0} may be independent, but then it is of little interest to study the total number of customers in a supermarket at time t
them together
the position of a particle at time t
Rather, stochastic processes are mainly concerned with the study of the amount of fish in a lake at time t
the ‘temporal’ dependence present in those sequences
the number of demands on a store’s stock at time t
; more realistic models for random evolution your fortune at time t when repeatedly betting in a game of chance
...
Interpretation
A stochastic process is a family of r.v. that describes the evolution over This course studies models for such “systems” that evolve in a random
time of some physical process fashion and for which we would like to quantify the likelihood of various
outcomes
; the simplest class of such models is probably the
discrete-time Markov Chains
MATH3801-MATH3901 Stochastic Processes 198 MATH3801-MATH3901 Stochastic Processes 199
4. Markov Chains 4.1 Introduction
Markov Chains: introduction
In 1913, Andrey Markov was reading a book entitled ‘Eugene Onegin’
by Alexander Pushkin
He studied the first 20,000 characters. The respective proportions of
consonants and vowels were 0.568 and 0.432, however he noted that
4 Markov Chains if one was reading a vowel, then the probability that the next letter
would be a consonant was 0.872
if one was reading a consonant, then the probability that the next
letter would be a consonant was only 0.337
; the sequence could not be understood as a sequence of
independent random variables
But he also noted that knowing the previous two letters did not
change drastically these probabilities
MATH3801-MATH3901 Stochastic Processes 200
4. Markov Chains 4.1 Introduction 4. Markov Chains 4.1 Introduction
Markov Chains: introduction Markov Chains: definition
Definition
A discrete-time Markov Chain is
1 a discrete-time stochastic process {Xn , n = 0, 1, 2, . . .},
That was the first example of what became known as a Markov Chain,
whose central idea could be summarised as
2 that takes on a finite or countable number of possible values
(discrete state space SX ) called “states” and (usually) denoted by
“while successive outcomes may not be independent, only the most the integers {0, 1, 2, . . .} (wlog) , and
recent outcome is helpful in predicting the next one” 3 satisfying the “Markov property”:
or loosely
for any i, j, i0 , i1 , . . . , in 1 2 SX = {0, 1, . . .},
it only matters where you are, not where you have been P (Xn+1 = j|Xn = i, Xn = in
1 1 , . . . , X2 = i2 , X1 = i1 , X0 = i0 )
= P (Xn+1 = j|Xn = i)
; given the present state (time n), further information about the past
(times 0,1,. . .,n 1) is irrelevant for comprehending the future
(time n + 1) (; the future is independent of the past)
MATH3801-MATH3901 Stochastic Processes 201 MATH3801-MATH3901 Stochastic Processes 202
4. Markov Chains 4.1 Introduction 4. Markov Chains 4.1 Introduction
Markov Chains: description Markov Chains: description
Most of the time, the transition probabilities do not depend on n, and
A Markov Chain is also often represented as a directed graph, with
we denote
one node for each state and a directed arc for each non-zero transition
P (Xn+1 = j|Xn = i) = Pij for i, j 2 {0, 1, . . .} probability, e.g.
; homogeneous Markov Chain (always assumed in this course) P12
P01 * P52
; Pij represents the probability that the process will, when in state i, P00
5 0^
/ 1 j 2 o 5
next make the transition to state j. Thus, Pij 0 8i, j 0, and P21
X P13 P24
P30
Pij = 1, 8i 0 ✏ ✏
j2SX 3 P34
/ 4
i P44
as the process must make a transition into some state
These transition probabilities are conveniently formed into a matrix:
0 1 Interpretation
P00 P01 P02 ...
BP P11 C We can think of the development of the chain as the motion of a
B 10 C
P=B
B
. .. C
C notional particle which randomly jumps between the states at each
@P20 A epoch of time
. .
.. ..
MATH3801-MATH3901 Stochastic Processes 203 MATH3801-MATH3901 Stochastic Processes 204
4. Markov Chains 4.1 Introduction 4. Markov Chains 4.1 Introduction
Markov Chains: examples Markov Chains: examples
Example 4.1: forecasting the weather Example 4.3: Gary’s mood
Suppose the chance of rain tomorrow depends on previous weather On any given day, Gary is either cheerful (C), so-so (S), or glum (G). If he is
conditions only through whether or not it is raining today and not on other cheerful today, then he will be C, S, or G tomorrow with respective
past weather conditions. If it rains today, it will rain tomorrow with probability probabilities 0.5, 0.4, 0.1. If he is feeling so-so today, then he will be C, S, or
↵, and if it does not rain today, it will rain tomorrow with probability G tomorrow with probabilities 0.3, 0.4, 0.3. If he is glum today, then he will be
C, S, or G tomorrow with probabilities 0.2, 0.3, 0.5
Letting Xn be the ‘weather’ on day n, the process {Xn , n 0} is a two-state
Markov chain (it is raining (state 0) and it is not raining (state 1)), whose Letting Xn denote Gary’s mood on the nth day, then {Xn , n 0} is a
transition probabilities are given by three-state Markov chain (state 0=C, state 1=S, state 2=G) with transition
✓ ◆ probabilities matrix 0 1
↵ 1
↵ 0.5 0.4 0.1
P= ,
1 P = @0.3 0.4 0.3A ,
0.2 0.3 0.5
which is represented by 0.1
which is represented by
1 ↵
* 0.4 0.4 0.3 ⌫
↵
6 0 j 1 h 1 * *
0.5
6 0W j 1 j 2 h 0.5
0.3 0.3
0.2
MATH3801-MATH3901 Stochastic Processes 205 MATH3801-MATH3901 Stochastic Processes 206
4. Markov Chains 4.1 Introduction 4. Markov Chains 4.1 Introduction
Markov Chains: examples Markov Chains: examples
Example 4.4 Example 4.4
Suppose whether it rains today or not depends on previous weather Suppose whether it rains today or not depends on previous weather
conditions through the last two days. Specifically, suppose that if it has rained conditions through the last two days. Specifically, suppose that if it has rained
for the past two days, then it will rain tomorrow with probability 0.7; if it rained for the past two days, then it will rain tomorrow with probability 0.7; it it rained
today but not yesterday, then it will rain tomorrow with probability 0.5; it it today but not yesterday, then it will rain tomorrow with probability 0.5; it it
rained yesterday but not today, then it will rain tomorrow with probability 0.4; if rained yesterday but not today, then it will rain tomorrow with probability 0.4; if
it has not rained in the past two days, then it will rain tomorrow with probability it has not rained in the past two days, then it will rain tomorrow with probability
0.2 0.2
Basically (state = today’s weather): not a Markov chain. However, we can 0.5
0 1 t
redefine the states as the weather conditions over the past two days: 0.7 0 0.3 0 0.7
6 0 8 1 T
B C 0.4
state 0: it rained both today and yesterday B0.5
P=@
0 0.5 0 C
0.3 0.2
state 1: it rained today but not yesterday 0 0.4 0 0.6A  0.5
state 2: it rained yesterday but not today 0 0.2 0 0.8 x
2 4 3 h 0.8
state 3: it did not rain either today or yesterday 0.6
; Markov Chain
MATH3801-MATH3901 Stochastic Processes 207 MATH3801-MATH3901 Stochastic Processes 208
4. Markov Chains 4.1 Introduction 4. Markov Chains 4.1 Introduction
Markov Chains: examples Markov Chains: examples
Definition: Random Walk (Example 4.5) Illustration: ten realisations of a symmetric (p = 1/2) random walk
A Markov Chain whose state space is given by the integers i = 0, ±1, ±2, . . .
is said to be a random walk if, for some number 0 < p < 1,

10
Pi,i+1 = p, Pi,i 1 =1 p i = 0, ±1, ±2, . . .

5
The associated graph looks like
p p p p p p
+ + * * *
... j –2 k –1 k 0 j 1 j 2 j .( . .

0
state
1 p 1 p 1 p 1 p 1 p 1 p

−5
Colourful application: wanderings of a drunken man
as he walks along a straight line

−10
; this process is also called the drunkard’s walk
(Salvador Dali - The Drunkard, 1922)
0 10 20 30 40 50
time
MATH3801-MATH3901 Stochastic Processes 209 MATH3801-MATH3901 Stochastic Processes 210
4. Markov Chains 4.1 Introduction 4. Markov Chains 4.1 Introduction
Markov Chains: examples Markov Chains: examples
Imagine now a drunkard walking randomly in an idealised city,
arranged in a square grid
At every intersection, the drunkard chooses one of the four possible
routes (including the one he came from) with equal probability
; formally, this is a random walk in two dimensions, on the set of all
points in the plane with integer coordinates
Three realisations of a random walk in Antony Gormley’s Quantum Cloud
three dimensions sculpture in London was designed by
a computer using a random walk
algorithm
MATH3801-MATH3901 Stochastic Processes 211 MATH3801-MATH3901 Stochastic Processes 212
4. Markov Chains 4.1 Introduction 4. Markov Chains 4.2 Chapman-Kolmogorov Equations
Markov Chains: examples Two-step transition probabilities
Example 4.6 : Gambler’s ruin problem Pij = P(Xn+1 = j|Xn = i) = one-step transition probabilities
Consider a gambler who, at each play of the game, either wins $1 with [2]
; what is the probability Pij that a process in state i will be in state j
probability p or loses $1 with probability 1 p. Suppose that he quits playing
[2]
either when he goes broke or attains a fortune of $N after 2 transitions, that is Pij = P(Xn+2 = j|Xn = i)?
The gambler’s fortune (state i = the gambler has $i) is a Markov chain, with M ●
transition probabilities ●
⇢ j ● ●
Pi,i+1 = p = 1 Pi,i 1 i = 1, 2, . . . , N 1 ●

state
P00 = PNN = 1
i ● ●
●
which is represented by
1 ●
0 ●
n n+1 n+2
p p p p step
* , * X
1
6 0 j 1 j 2 j .( . . j N–1 N j 1 [2]
1 p 1 p 1 p 1 p
; Pij = P(Xn+2 = j|Xn+1 = k , Xn = i)P(Xn+1 = k |Xn = i)
k 2SX
X
Note: the states 0 and N are called absorbing. This is a finite-state random = Pik Pkj = [PP]ij = [P2 ]ij (Law of Total Probability)
walk with absorbing barriers k 2SX
MATH3801-MATH3901 Stochastic Processes 213 MATH3801-MATH3901 Stochastic Processes 214
4. Markov Chains 4.2 Chapman-Kolmogorov Equations 4. Markov Chains 4.2 Chapman-Kolmogorov Equations
Chapman-Kolmogorov Equations Chapman-Kolmogorov Equations: example
Similarly,
Example 4.8
Chapman-Kolmogorov Equations
Consider Example 4.1 (weather), with ↵ = 0.7 and = 0.4. Calculate the
For all n, m 0, i, j 0, we have probability that it will rain four days from today if it is raining today
[n+m]
X [n] [m]
Pij = Pik Pkj We have ✓ ◆
0.7 0.3
k 2SX P= ,
0.4 0.6
[n]
With P[n] the matrix of n-step transition probabilities Pij , this is so that ✓ ◆✓ ◆ ✓ ◆
0.7 0.3 0.7 0.3 0.61 0.39
P[n+m] = P[n] P[m] P[2] = P2 = =
0.4 0.6 0.4 0.6 0.52 0.48
In particular, P[2] = P[1] P[1] = PP = P2 , and by induction and
✓ ◆✓ ◆ ✓ ◆
P[n] = Pn 0.61 0.39 0.61 0.39 0.5749 0.4251
P[4] = P4 = P2 P2 = =
; the n-step transition matrix is the nth power of the one-step 0.52 0.48 0.52 0.48 0.5668 0.4332
transition matrix [4]
; the desired probability is P00 = 0.5749
Note: P[0] = P0 = I (the identity matrix)
MATH3801-MATH3901 Stochastic Processes 215 MATH3801-MATH3901 Stochastic Processes 216
4. Markov Chains 4.2 Chapman-Kolmogorov Equations 4. Markov Chains 4.2 Chapman-Kolmogorov Equations
Transformed transition matrix Example 4.12
In some situations it can be useful to work with a transformed transition
A pensioner receives k$2 at the beginning of each month. The amount of
matrix, for instance:
money he spends during a month is independent of and at most the amount
Suppose you want to determine the probability that a Markov Chain he has, and is equal to k$ 1, 2, 3, 4 with probability 1/4. If he has more than
has ever entered some set of states A by time m k$3 at the end of a month, he gives the surplus to his son. If, after receiving
his payment at the beginning of a month, he has k$5, what is the probability
; reset the transition matrix to that his capital is ever 1 or less at any time within the following four months?
8
< 1 if i 2 A, j = i The capital of the pensioner is a Markov Chain with
Qij = P⇤ (Xn+1 = j|Xn = i) = 0 if i 2 A, j =
6 i
: state i: he has k$i at the end of the month (i = 0, 1, 2, 3)
Pij if i 2
/A
and the transition matrix
(transform all states in A into absorbing states) 1/4
*
6 J0 j
0 1 3/4 1/2 8 1 1/4
; the probability the original Markov Chain (with matrix P) enters a 3/4 1/4 0 0 1/4 T h
B1/2 1/4 1/4 C
0 C 1/4 1/4
state of A by time m is the probability the transformed Markov P=B
@ 1/4 1/4 1/4 1/4A 1/4
Chain (with matrix Q) is in one of the states of A at time m x *
3 h
0 1/4 1/4 1/2 1/4
6 2 j 1/4 1/2
; analyse Q[m] 1/4
MATH3801-MATH3901 Stochastic Processes 217 MATH3801-MATH3901 Stochastic Processes 218
4. Markov Chains 4.2 Chapman-Kolmogorov Equations 4. Markov Chains 4.3 Classification of States
As we are interested in whether his capital ever falls as low as 1, we will use Accessible states
the transformed transition matrix
0 1
1
6 J0 8 1T h 1 Definition
1 0 0 0
B 0 C 1/4 1/4 State j is said to be accessible from state i, denoted by i ! j, if
1 0 0 C 1/4
Q=B
@ 1/4 1/4 1/4 1/4A *
[Pn ]ij = Pij > 0
[n]
0 1/4 1/4 1/2
1/4
6 2 j 1/4 3 h 1/2
1/4
Then, 0 1 for some n 0
1 0 0 0
B C ; starting in i, it is possible that the process will ever enter state j
Q[4] B 0
= Q4 = @
1 0 0 C
0.36 0.50 0.05 0.08A
0.14 0.64 0.08 0.13 Properties:
Because the pensioner’s initial end of month capital was 3, the desired for all state i, i ! i
probability is for all states i, j and k , if i ! j and j ! k , then i ! k
[4] [4]
Q3,0 + Q3,1 = 0.14 + 0.64 = 0.78
MATH3801-MATH3901 Stochastic Processes 219 MATH3801-MATH3901 Stochastic Processes 220
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Communicating states Communication classes
Definition
Two states i and j communicate, denoted i $ j, if i is accessible from j
and j is accessible from i (i ! j and j ! i)
Definition
Communication ($) is an equivalence relationship: A class C is a non-empty set of states such that for each state i 2 C, i
1 reflexivity: i $ i communicates with all j 2 C and does not communicate with any j 2 /C
2 symmetry: i $ j () j $i
Property: any two classes are either identical or disjoint
3 transitivity: if i $ j and j $ k , then i $ k
; the state space is partitioned into one or more disjoint classes
; the state space can be uniquely broken down into communication
classes, such that within each class all states communicate with
each other, but there is no communication between classes
MATH3801-MATH3901 Stochastic Processes 221 MATH3801-MATH3901 Stochastic Processes 222
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Communications classes Irreducible Chain
Example
Definition
P12 A Markov Chain is said to be irreducible if there is only one class, that
P01 * P52
P00
5 0^
/ 1 j 2 o 5 is, if all states communicate with each other
P21 ; three classes:
P13 P24
P30 ✏ ✏ {0, 1, 2, 3}, {4} and {5} Example
3 P34
/ 4
i P44
0.5
t
0.7 5 0 9 1
T
0.4
Example: gambler’s ruin problem 0.3 0.2 ; one class {0, 1, 2, 3}
 0.5
p p p p y
*
.( . . j
, + 2 4 3 i 0.8
1 5 0 j 1 j 2 j N–1 N k 1
0.6
1 p 1 p 1 p 1 p
; three classes: {0}, {1, 2, . . . , N 1}, and {N}
MATH3801-MATH3901 Stochastic Processes 223 MATH3801-MATH3901 Stochastic Processes 224
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Irreducible Chain Classification of states
For any state let pi the probability that, starting in state i, the process
will ever re-enter state i
Definition Definition
A Markov Chain is said to be irreducible if there is only one class, that State i is said
is, if all states communicate with each other 1 recurrent if pi = 1, or
2 transient if pi < 1
Example: random walk
p p p p p p Suppose the process is in state i and i is recurrent
+ + * * *
... j –2 k –1 k 0 j 1 j 2 j .( . . ; with probability 1, the process will eventually re-enter state i
1 p 1 p 1 p 1 p 1 p 1 p ; Markov property ; the process will be starting over again
one class: {0, ±1, ±2, . . .} ; with probability 1, the process will eventually re-enter state i
(again)
; ...
If state i in recurrent, then, starting in i, the process will re-enter it
again and again and again, i.e. infinitely often
MATH3801-MATH3901 Stochastic Processes 225 MATH3801-MATH3901 Stochastic Processes 226
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Classification of states Classification of states
+1
X
For any state let pi the probability that, starting in state i, the process
The number of visits to state i can be written Ni = 1I{Xn =i}
will ever re-enter state i
n=0
Definition Hence
State i is said +1
X +1
X +1
X [n]
1 recurrent if pi = 1, or E (Ni |X0 = i) = E(1I{Xn =i} |X0 = i) = P(Xn = i|X0 = i) = Pii
2 transient if pi < 1. n=0 n=0 n=0
Suppose the process is in state i but i is now transient Proposition
; with probability (1 pi ) > 0, the process will never re-enter state i +1
X +1
X
[Pn ]ii = +1
[n]
State i is recurrent if Pii =
; probability it will visit state i exactly n times = pin 1 (1 pi )
n=0 n=0
; the number of times state i is visited Ni ⇠ Geo(1 pi ) +1
X +1
X
[Pn ]ii < +1
[n]
; E(Ni ) = 1 1pi < 1, Var(Ni ) = (1 ppi )2 < 1 and transient if Pii =
i n=0 n=0
If state i in transient, then it will only be visited a finite number of times
[n]
(with probability 1) Corollary: if i is transient, then Pii ! 0 as n ! 1
MATH3801-MATH3901 Stochastic Processes 227 MATH3801-MATH3901 Stochastic Processes 228
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Classification of classes Classification of classes
Theorem Theorem
In a Markov Chain, either all states in a given class are recurrent or all In a Markov Chain, either all states in a given class are recurrent or all
are transient are transient
Proof: Let i be a recurrent state is class C, and j any other state in C. As i $ j, ; recurrence/transience is a class property
[k ] [m]
9 integers k and m such that Pij > 0 and Pji > 0. Now, for any integer n, ; we refer to the class itself as being recurrent or transient
X X [m] [n] [k ]
[m+n+k ]
Pjj Pjr Prs Psj
= Example
r s
[m] [n] [k ]
Pji Pii Pij P12
P01 * P52 ; three classes:
+1
X +1
X +1
X +1
X
P00
5 0^
/ 1 j 2 o 5
;
[n]
Pjj
[m+n+k ]
Pjj
[m] [n] [k ]
Pji Pii Pij = Pji Pij
[m] [k ] [n]
Pii = +1 P21 {4}: recurrent
P13 P24
n=0 n=0 n=0 n=0 P30 ✏ ✏ {5}: transient
as i is recurrent. It follows that j is also recurrent. Finally, as all states in a 3 / 4 P44 {0, 1, 2, 3}: transient
P34 i
class containing one recurrent state are recurrent, any state in a class
containing one transient state must be transient
MATH3801-MATH3901 Stochastic Processes 229 MATH3801-MATH3901 Stochastic Processes 230
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Classification of classes Classification of classes
Theorem Example
In a Markov Chain, either all states in a given class are recurrent or all 0.5
t
are transient 0.7 5 0 9 1
T
0.4 ; one class {0, 1, 2, 3}:
; recurrence/transience is a class property 0.3 0.2
 0.5 recurrent
; we refer to the class itself as being recurrent or transient 2
y
4 3 i 0.8
0.6
Example: gambler’s ruin
p p p p
* , + Fact
1 5 0 j 1 j 2 j .( . . j N–1 N k 1
1 p 1 p 1 p 1 p In a finite-state Markov Chain, not all the states can be transient, i.e.
there is at least one recurrent class (and possibly arbitrarily many other
; three classes: {0}: recurrent recurrent and transient classes)
{N}: recurrent
{1, 2, . . . , N 1}: transient Corollary: an irreducible finite-state Markov Chain is a recurrent class,
that is, all states are recurrent in such a Markov Chain
MATH3801-MATH3901 Stochastic Processes 231 MATH3801-MATH3901 Stochastic Processes 232
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Classification of classes Period of states
Example 4.18: random walk Definition
p p p p p p
The period of a state i, denoted by d(i), is the greatest common divisor
[n]
... j
+
–2 k
+
–1 k
*
0 j
*
1 j
*
2 j .( . . of those values of n for which Pii > 0. If the period of a state is 1, the
1 p 1 p 1 p 1 p 1 p 1 p state is aperiodic
one class C = {0, ±1, ±2, . . .} 7
*
8 1
J F
*
all states are either transient or recurrent 0 j 1
T T ⌦ ⌦
P1 [n]
let’s consider state 0 and determine n=0 P00 6 0 f 2
  J
[2n 1]
P00 = 0 for any n = 1, 2, . . . 2 j
*
3
n
p))
P00 = 2n
[2n] n ⌦ ⌦
n p (1 p)n = (2n)!
n!n! (p(1 p))n ⇠ (4p(1
p
n⇡
(Stirling’s approx.)
[n]
; P00 > 0 only for n = 2, 4, 6, . . . 5 j 4 3
P [n]
; n P00 , converging if p 6= 1/2 and diverging if p = 1/2 ; d(0) = 2 [n]
; P00 > 0 only for n = 4, 6, 8, 10, . . .
; the chain is recurrent if p = 1/2 (symmetric random walk) and transient
otherwise ; d(0) = 2
MATH3801-MATH3901 Stochastic Processes 233 MATH3801-MATH3901 Stochastic Processes 234
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Period of classes Positive-recurrent and null-recurrent states
Let Tij be the number of transitions until a chain starting in state i
Theorem
enters state j for the first time
In a Markov Chain, all states in a given class have the same period
; Tii is the number of transitions until a chain starting in state i returns
Proof: Let i and j be two distinct states in a class. As i $ j, there are integers to state i
[m] [k ] [m+k ]
m and k such that Pij > 0 and Pji > 0. So Pii > 0, and m + k is
[n] [m+n+k ]
if i is transient, E(Tii ) = +1, as the chain will never be back (i.e.,
divisible by d(i). Let n be any integer such that Pjj > 0. Then, Pii > 0, Tii = +1) with positive probability
and m + n + k is divisible by d(i), and thus n is divisible by d(i). Since this is if i is recurrent, E(Tii ) < +1 or E(Tii ) = +1
[n]
true for any n such that Pjj > 0, d(j) itself is divisible by d(i). Reversing the
roles of i and j, d(i) is divisible by d(j), so d(i) = d(j). Definition
; periodicity is a class property A state i of a Markov Chain is positive-recurrent if it is recurrent and
E(Tii ) < +1. It is null-recurrent if it is recurrent and E(Tii ) = +1
; we refer to the class itself as having the period of its states
Notes:
; a class with period 1 is said to be aperiodic 1 positive- and null-recurrence are class properties
2 in a finite-state Markov Chain, all recurrent states are
positive-recurrent
MATH3801-MATH3901 Stochastic Processes 235 MATH3801-MATH3901 Stochastic Processes 236
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Null-recurrence: an example Null-recurrence: an example
Random walk Random walk
p p p p p p 1/2 1/2 1/2 1/2 1/2 1/2
+ + * * *
... j –2 k –1 k 0 j 1 j 2 j .( . . ... j
+
–2 k
+
–1 k
*
0 j
*
1 j
*
2 j .( . .
1 p 1 p 1 p 1 p 1 p 1 p
1/2 1/2 1/2 1/2 1/2 1/2
We showed: Similarly,
the chain is transient if p 6= 1/2
E(T00 ) = (1 + E(T10 )) ⇥ 1/2 + (1 + E(T 10 )) ⇥ 1/2
the chain is recurrent if p = 1/2
Let’s compute E(T10 ) in the latter case. Conditioning on the first transition, By symmetry again,
E(T10 ) = E(T 10 )
E(T10 ) = E(T10 |left)P(left) + E(T10 |right)P(right)
= 1 ⇥ 1/2 + E(1 + T21 + T10 ) ⇥ 1/2 so that
E(T00 ) = 1 + E(T10 ) = +1
By symmetry, E(T21 ) = E(T10 ), so that
; 0 is a null-recurrent state
E(T10 ) = 1/2 + (1 + 2E(T10 )) ⇥ 1/2 = 1 + E(T10 )
; the whole chain is null-recurrent
; E(T10 ) = +1
MATH3801-MATH3901 Stochastic Processes 237 MATH3801-MATH3901 Stochastic Processes 238
4. Markov Chains 4.3 Classification of States 4. Markov Chains 4.3 Classification of States
Ergodic states, classes and chains Classification of states: summary
A state can be:
Type of state Definition
recurrent eventual return to state certain
transient eventual return to state uncertain
Definition
positive-recurrent recurrent, finite mean recurrence time
A state is said to be ergodic if it is positive-recurrent and aperiodic. A
class of ergodic states is an ergodic class. A irreducible Markov Chain null-recurrent recurrent, infinite mean recurrence time
consisting of one ergodic class is an ergodic chain periodic return to state possible only at times k , 2k ,
3k , . . ., for k > 1 integer
; ergodic chains have many desirable properties (see later) aperiodic not periodic
ergodic aperiodic and positive-recurrent
A Markov Chain is ergodic if it is: 1 irreducible
2 positive-recurrent
3 aperiodic
MATH3801-MATH3901 Stochastic Processes 239 MATH3801-MATH3901 Stochastic Processes 240
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Limiting probabilities: heuristics Limiting probabilities: heuristics
Example
Intuitively: in the long run, the starting state does not really matter
Consider Example 4.1 (weather), with ↵ = 0.7 and = 0.4
Mathematically: it seems reasonable to assume that there exists some
We know ✓ ◆ probability ⇡j such that for any i
0.7 0.3
P= ,
0.4 0.6 [n]
lim Pij = ⇡j ,
so that n!1
✓ ◆ ✓ ◆
0.61 0.39 0.5749 0.4251 that is 0 1
P[2] = P2 = and P[4] = P4 = ⇡
0.52 0.48 0.5668 0.4332
B ⇡ C
Further, lim Pn = ⇧ = @
B C
A
n!1 ...
✓ ◆ ✓ ◆ ⇡
0.5715 0.4285 0.5714 0.4286
P[8] = P8 = and P[16] = P16 =
0.5714 0.4286 0.5714 0.4286 with ⇡ = (⇡0 , ⇡1 , ⇡2 , . . .)
; ‘long-ago’ weather does no more influence today’s weather
MATH3801-MATH3901 Stochastic Processes 241 MATH3801-MATH3901 Stochastic Processes 242
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Limiting probabilities: heuristics Theorem 4.1
If this is true, then we have For an ergodic Markov Chain,
[n]
⇣ ⌘ lim Pij exists for all j
lim Pn = lim Pn+1 = lim (Pn P) = lim Pn P, n!1
n!1 n!1 n!1 n!1
and is independent of i. Furthermore, letting
so that ⇧ = ⇧P or [n]
⇡ = ⇡P, ⇡j = lim Pij ,
n!1
that is, for any j 0, X then ⇡ = (⇡0 , ⇡1 , ⇡2 , . . .) is the unique nonnegative solution of the
⇡j = ⇡i Pij system of equations
i2SX
8 X
; the limiting probabilities {⇡j } are solutions of the system ⇡ = ⇡P > ⇡j =
> ⇡i Pij (8j = 0, 1, 2, . . .)
<
Questions: i2S
XX
: 1 =
>
> ⇡j
1 Does Pn effectively converge? To ⇧?
i2SX
2 Does ⇡ = ⇡P always have a solution?
3 If so, is this solution unique? Remark: the last equation is a normalisation constraint making the
solution ⇡ a proper probability distribution
MATH3801-MATH3901 Stochastic Processes 243 MATH3801-MATH3901 Stochastic Processes 244
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Stationary distribution Stationary distribution
See that we have, by the Law of Total Probability,
X ⇡j ' P(Xn = j) for n “large enough” ; unconditional probability
P(Xn+1 = j) = = P(Xn+1
= j|Xn i)P(Xn = i) ; ⇡j is also the long-run proportion of time that the process will be in
i2SX state j
Now suppose that the distribution of Xn is ⇡. Then, now, remind that E(Tjj ) is the expected number of transitions until
X X a chain starting in state j returns to state j
P(Xn+1 = j) = P(Xn+1 = j|Xn = i)⇡i = Pij ⇡i = ⇡j
i2SX i2SX the chain is ergodic, so it is positive-recurrent, so E(Tjj ) < 1 for
any j
and by induction P(Xn+k = j) = ⇡j for all k 0
; on average, the chain will spend 1 unit of time in state j for every
; once attained, this distribution is maintained for ever E(Tjj ) units of time. It follows
; the vector ⇡ is called the stationary distribution (or sometimes
the steady-state distribution) 1
⇡j =
E(Tjj )
Fact
In the short term the random evolution of the chain is described by P, ; the proportion of time in state j is the inverse of the mean time
whilst long-term changes are described by ⇡ between visits to state j
MATH3801-MATH3901 Stochastic Processes 245 MATH3801-MATH3901 Stochastic Processes 246
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Stationary distribution: examples Stationary distribution: examples
Example 4.20 Example 4.21
Consider Example 4.1 (weather), with ↵ = 0.7 and = 0.4. Calculate the Consider Example 4.3 (Gary’s mood). In the long run, what proportion of
probability that it is raining on a given day. days is Gary in each of the three moods?
We have ✓ ◆ We have 0 1
↵ 1 ↵ 0.5 0.4 0.1
P= P = @0.3 0.4 0.3A ,
1
0.2 0.3 0.5
We have to solve ⇡ = ⇡P, i.e.
8 so that the limiting probabilities equations are
> ⇣ ⇡0 =
< ⇡0 ↵ + ⇡1 ⌘ 8
< ⇡0 = 0.5⇡0 + 0.3⇡1 + 0.2⇡2
⇡1 = ⇡0 (1 ↵) + ⇡1 (1 ) ⇡1 = 0.4⇡0 + 0.4⇡1 + 0.3⇡2
>
: :
1 = ⇡0 + ⇡1 1 = ⇡0 + ⇡1 + ⇡2
We find: Solving yields
1 ↵
⇡0 = ⇡1 = 21 23 18
1+ ↵ 1+ ↵ ⇡0 = = 0.34 ⇡1 = = 0.37 ⇡2 = = 0.29
62 62 62
4
With ↵ = 0.7 and = 0.4, ⇡0 = 7 = 0.5714
MATH3801-MATH3901 Stochastic Processes 247 MATH3801-MATH3901 Stochastic Processes 248
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Limiting probabilities: example Limiting probabilities: example
Example 4.24 Example 4.24
A production process changes states in accordance with an irreducible, A production process changes states in accordance with an irreducible,
positive recurrent Markov Chain having transition probabilities Pij , positive recurrent Markov Chain having transition probabilities Pij ,
i, j = 1, . . . , n. Certain of the states (states A) are considered “acceptable” i, j = 1, . . . , n. Certain of the states (states A) are considered “acceptable”
and the remaining (Ac ) “unacceptable”. The production process is said to be and the remaining (Ac ) “unacceptable”. The production process is said to be
“up” when in an acceptable state and “down” when in an unacceptable state. “up” when in an acceptable state and “down” when in an unacceptable state.
a) Find the rate at which the production process goes from up to down (that b) Find the average number of transitions the process remains down when
is, the rate of breakdowns). it goes down, and the average number of transitions it remains up when
it goes up.
let ⇡k , k = 1, . . . , n be the stationary distribution (that is, the long run
denote U the number of transitions the process remains up when it goes
proportions in each state)
up and D the number of transitions it remains down when it goes down
rate at which the process enters state j from state i = ⇡i Pij
among those U + D transitions, only one is a breakdown (from up to
rate at which the process enters state j from an acceptable state
P down)
= i2A ⇡i Pij
on average, one breakdown every E(U + D) = E(U) + E(D) transitions
rate at which the process
P enters
P an unacceptable state from an P P 1
acceptable state = j2Ac i2A ⇡i Pij the rate at which breakdowns occur is E(U)+E(D) = j2Ac i2A ⇡i Pij
MATH3801-MATH3901 Stochastic Processes 249 MATH3801-MATH3901 Stochastic Processes 250
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Limiting probabilities: example Limiting probabilities: example
Example 4.24 Example 4.24
A production process changes states in accordance with an irreducible, Suppose we have:
positive recurrent Markov Chain having transition probabilities Pij , 1/4
*
i, j = 1, . . . , n. Certain of the states (states A) are considered “acceptable” 0 1 1/4
6 0T f 8 1T h 1/4
and the remaining (Ac ) “unacceptable”. The production process is said to be 1/4 1/4 1/2 0
B C
“up” when in an acceptable state and “down” when in an unacceptable state. B 0
P=@
1/4 1/2 1/4C 1/2 1/4
1/4 1/4 1/4 1/4A  x 
b) Find the average number of transitions the process remains down when 4 3 h
1/4 1/4 0 1/2 1/4
6 2 1/2
it goes down, and the average number of transitions it remains up when 1/4
it goes up. where 0 and 1 are the acceptable states and 2 and 3 are the unacceptable
P states
In the long run, the proportion of time the process is up is i2A ⇡i , but it is
E(U)
also E(U)+E(D) , so we have to solve Then, the limiting probabilities satisfy
8 8
( P P 8 P > ⇡0 = 14 ⇡0 + 41 ⇡2 + 41 ⇡3
> > ⇡0
> = 3/16
1 < E(U) ⇡i < <
⇡1 = 14 ⇡0 + 41 ⇡1 + 41 ⇡2 + 41 ⇡3
i2A
= c ⇡i Pij = P P
i2A ⇡i Pij
⇡1 = 1/4
E(U)+E(D)
Pj2A i2A
) Pc
j2A )
E(U)
= ⇡i : E(D) = P j2A
P
c ⇡j > ⇡2 = 21 ⇡0 + 21 ⇡1 + 41 ⇡2
> > ⇡2
> = 14/48
E(U)+E(D) i2A ⇡i Pij : :
j2Ac i2A
1 = ⇡0 + ⇡1 + ⇡2 + ⇡3 ⇡3 = 13/48
MATH3801-MATH3901 Stochastic Processes 251 MATH3801-MATH3901 Stochastic Processes 252
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Limiting probabilities: example Non-ergodic cases
Example 4.24
Suppose we have: If the chain is ergodic, then Pn converges as n ! +1 and there is one
1/4
* single stationary distribution ⇡, which is the unique solution of ⇡ = ⇡P
0
1/4 1/4 1/2 0
1 1/4
6 0T f 8 1T h 1/4
; what if the chain is not ergodic?
B 0 1/4 1/2 C
1/4C 1/2 1/4
P=B
@ ‘not ergodic’ means:
1/4 1/4 1/4 1/4A  x 
1/4 1/4 0 1/2 1/4
6 2 4 3 h 1/2
1 not irreducible
1/4
where 0 and 1 are the acceptable states and 2 and 3 are the unacceptable
I one recurrent class and some transient classes
states
I several recurrent classes (and possibly some transient classes)
Then, 2 periodic
9 3 null-recurrent
rate of breakdowns = ⇡0 (P02 + P03 ) + ⇡1 (P12 + P13 ) = ' 0.28
32
27/48
The down periods last, on average, E(D) = 9/32 = 2 time units, while we
14
have E(U) = 9 ' 1.56 time units.
MATH3801-MATH3901 Stochastic Processes 253 MATH3801-MATH3901 Stochastic Processes 254
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Non-ergodic cases Non-ergodic cases
Non-irreducible chain: one recurrent class and some transient classes Non-irreducible chain: several recurrent classes
stationary distribution = long-term (n ! 1) behaviour of the chain Suppose we have the chain
0.5 0.5
transient classes play a role only for a finite number of transitions t *
1 5 0 1 2 i 1
; the possible initial transient behaviour of the chain cannot disturb
its long term behaviour {0} and {2} are recurrent classes, {1} is a transient
0 1 0 1 class
after a certain number of transitions, the chain enters the recurrent 1 0 0 1 0 0
class and never leaves it P = @0.5 0 0.5A ) Pn = @0.5 0 0.5A 8n 0
; everything happens as if there was only one irreducible chain 0 0 1 0 0 1
; Pn converges, but its rows are not equal
; check if it is ergodic or not
; sensitivity to the initial state
; if it is, Pn converges and there is a unique stationary distribution there are actually infinitely many stationary distributions: any
over the recurrent class (let ⇡k = 0 for any transient state k ) vector of the form ⇡ = (p 0 1 p) (with 0  p  1) satisfies the
equation ⇡ = ⇡P
the effective stationary distribution will only depend on the initial
distribution (distribution of X0 )
MATH3801-MATH3901 Stochastic Processes 255 MATH3801-MATH3901 Stochastic Processes 256
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Non-ergodic cases Non-ergodic cases
Periodic chain Null-recurrent chain
Suppose we have the chain Consider the symmetric random walk
1
* 1/2 1/2 1/2 1/2 1/2 1/2
0 j 1 + + * * *
1
... j –2 k –1 k 0 j 1 j 2 j .( . .
1/2 1/2 1/2 1/2 1/2 1/2
one class {0, 1} with period 2
long-term behaviour sensitive to the initial step: if X0 = 0, the that we know to be null-recurrent (note that it is also periodic)
0 1
chain will be in state 0 every even times and in state 1 every odd ..
B ... . C
times✓(reverse
◆ if X0 ✓ = 1) ◆ ✓ ◆ ✓ ◆ B 1/2 0 1/2 0 C
B. . . ... C
0 1 1 0 0 1 1 0 B. . .
P=B 0 1/2 0 1/2 0 C
P= , P2 = , P3 = , P4 = , ... ... C
1 0 0 1 1 0 0 1 B
@ ... 0 1/2 0 1/2 0 C
. . .A
; Pn does not converge ..
. ...
But: ⇡ = ⇡P has a unique solution ⇡ = (0.5 0.5)
; the only solution of ⇡ = ⇡P is ⇡ = (. . . , 0, 0, 0, 0, . . .)
; this makes sense: on the long run, the chain spends half the time P
in state 0 and half the time in state 1 ; not a proper distribution ( ⇡i 6= 1) ; no stationary distribution
i
MATH3801-MATH3901 Stochastic Processes 257 MATH3801-MATH3901 Stochastic Processes 258
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Ergodic Theorem Ergodic theorem: example
Ergodic Theorem (or Law of Large Numbers for Markov Chains) Fact
Most of the limit theorems for sums of independent random variables
Let {Xn , n 0} be an ergodic Markov Chain with stationary
(LLN, CLT, and their generalisations) essentially hold true for sums of
probabilities ⇡j , j 0. Let r be a bounded function: SX ! R ⇢ R.Then,
dependent r.v., provided the dependence is not too strong
with probability 1,
N 1
1 X X
lim r (Xn ) = r (j)⇡j The ergodic theorem provides a straightforward alternative proof that
N!1 N
n=0 j2SX ⇡j represents the long-term proportion of time that the chain is in state j
Take r (X ) = 1I{X =j} (obviously bounded)
for any starting value X0
PN 1
Then, n=0 1I{Xn =j} is the number of passages by state j after N time
units
; time-averaging over one particular path is equivalent to averaging
against the stationary distribution (at a particular fixed time) The theorem states
N 1
Loosely speaking: time average = space average 1 X
lim 1I{Xn =j} = ⇡j
; very powerful (and useful) result! N!1 N
n=0
MATH3801-MATH3901 Stochastic Processes 259 MATH3801-MATH3901 Stochastic Processes 260
4. Markov Chains 4.4 Limiting Probabilities 4. Markov Chains 4.4 Limiting Probabilities
Ergodic theorem: example Ergodic theorem: example
Markov Chain with reward
Markov Chain with reward
Suppose we have the Markov Chain
0.01
*
0.01
*
0.99
6 0 j 1 h 0.99
6 0 j
0.99 1 h 0.99 0.01
0.01
and suppose that we receive 100$ each time the chain is in state 1. If we let the chain tends to persist in whatever state it is for a long time
the process run from a given initial state, what is our average reward by ; starting in state 1, you receive an immediate reward plus some
transition in the long term? additional gain with high probability
But: the Ergodic Theorem states that
the chain is ergodic
N 1
the stationary distribution is easily seen to be ⇡ = (0.5 0.5) (symmetry) 1 X
⇢ lim r (Xn ) = r (0)⇡0 + r (1)⇡1 = 50$,
0 if Xn = 0 N!1 N
the function “reward” r is defined by r (Xn ) = n=0
100 if Xn = 1
1
PN 1 whatever the initial state and whatever the path taken by the chain
; we desire limN!1 N n=0 r (Xn )
MATH3801-MATH3901 Stochastic Processes 261 MATH3801-MATH3901 Stochastic Processes 262
4. Markov Chains 4.5 Some applications 4. Markov Chains 4.5 Some applications
An application: the Gambler’s Ruin Problem An application: the Gambler’s Ruin Problem
Gambler’s ruin problem Conditioning on the outcome of the initial play, we obtain
Consider a gambler who at each play of the game has probability p of Pi = pPi+1 + qPi 1
winning $1 and probability q = 1 p of losing $1. Assuming that or pPi + qPi = pPi+1 + qPi 1
successive plays are independent, what is the probability that, starting q
with $i, the gambler’s fortune will reach $N before reaching 0? that is, Pi+1 Pi = (Pi Pi 1 ) (i = 1, 2, . . . , N 1)
p
With state i = the gambler’s fortune is $i, we have: It follows, as P0 = 0,
p p p p
* , + ✓ ◆i
1 5 0 j 1 j 2 j .( . . j N–1 N k 1 q
q q q q
Pi+1 Pi = P1
p
Three classes: {0} and {N} recurrent, {1, 2, . . . , N 1} transient Now,
; after some amount of time, the gambler must either attain his goal of ✓ ◆ ✓ ◆2 ✓ ◆i !
i 1
X 1
$N or go broke q q q
Pj+1 Pj = Pi P1 = P1 + + ... +
Let Pi (i = 0, 1, . . . , N) be the probability that, starting with $i, he will p p p
j=1
eventually reach $N
MATH3801-MATH3901 Stochastic Processes 263 MATH3801-MATH3901 Stochastic Processes 264
4. Markov Chains 4.5 Some applications 4. Markov Chains 4.5 Some applications
An application: the Gambler’s Ruin Problem An application : the Gambler’s Ruin Problem
Solving the last equation for Pi in function of P1 yields, for
i = 1, 2, . . . , N, (
1 (q/p)i q Note that, as N ! +1,
Pi = 1 (q/p) P1 if p 6= 1
q ( ⇣ ⌘i
iP1 if =1 q 1
p 1 if p >
Pi ! p 2
Now, as PN = 1, it follows 0 if p  1
2
(
1 (q/p) q
1 (q/p)N
if p 6= 1 If p > 1/2, there is a positive probability (but < 1!) that the gambler’s
P1 = q
1/N if p =1 fortune will increase indefinitely
If p  1/2, the gambler will, with probability 1, go broke against an
Hence, (
1 (q/p)i 1 infinitely rich adversary (like a casino)
if p 6= 2
Pi = 1 (q/p)N
1
i/N if p = 2
MATH3801-MATH3901 Stochastic Processes 265 MATH3801-MATH3901 Stochastic Processes 266
4. Markov Chains 4.5 Some applications 4. Markov Chains 4.5 Some applications
Gambler’s ruin problem: examples Gambler’s ruin problem: examples
Example 4.28
Max and Patty decide to flip pennies; the one coming closest to the wall wins. Drug testing
Patty has a probability 0.6 of winning on each flip. a) If Patty starts with five Two drugs have been developed for treating a certain disease. Drug i has a
pennies and Max with ten, what is the probability that Patty will wipe Max out? cure rate Pi (i = 1, 2), obviously unknown. We would like a method for
b) What if Patty starts with 10 and Max with 20? deciding whether P1 > P2 or P2 > P1
; Gambler’s ruin problem, with a) i = 5, N = 15 and p = 0.6, so that the Consider the following test:
desired probability is
1 (2/3)5 pairs of patients are treated sequentially with one member of the pair
' 0.87, receiving drug 1 and the other drug 2, and the results are determined;
1 (2/3)15
and b) i = 10, N = 30 and p = 0.6, so that the desired probability is the testing stops when the cumulative number of cures using one of the
drugs exceeds the other number by some fixed predetermined number
1 (2/3)10 M
' 0.98.
1 (2/3)30
MATH3801-MATH3901 Stochastic Processes 267 MATH3801-MATH3901 Stochastic Processes 268
4. Markov Chains 4.5 Some applications 4. Markov Chains 4.5 Some applications
Gambler’s ruin problem : examples Gambler’s ruin problem : examples
What is the probability that the test will incorrectly (i.e. if P1 > P2 ) assert that
Formally, P2 > P1 ?
denote after each pair, the difference of cures goes up by 1 with probability
⇢ P1 (1 P2 ), goes down by 1 with probability P2 (1 P1 ), or remains the
1 if the patient in the jth pair to receive drug 1 is cured
Xj = same with probability P1 P2 + (1 P1 )(1 P2 )
0 otherwise
⇢ if we only consider those pairs in which the difference of cures changes,
1 if the patient in the jth pair to receive drug 2 is cured
Yj = then this difference goes up 1 with (conditional) probability
0 otherwise
P1 (1 P2 )
p=
P1 (1 P2 ) + P2 (1 P1 )
the test stops after N pairs of patients, where N is the first value of n
and down 1 with (conditional) probability
such that
X1 + . . . + Xn = Y1 + . . . + Yn + M P2 (1 P1 )
q=1 p=
P1 (1 P2 ) + P2 (1 P1 )
or
X1 + . . . + Xn = Y1 + . . . + Yn M
; Gambler’s ruin problem, the probability that the test will assert P2 > P1 is
In the former case, we assert that P1 > P2 and in the latter, P2 > P1 equal to the probability that the gambler’s fortune will go down M before
going up M
MATH3801-MATH3901 Stochastic Processes 269 MATH3801-MATH3901 Stochastic Processes 270
4. Markov Chains 4.5 Some applications 4. Markov Chains 4.5 Some applications
Gambler’s ruin problem : examples A www application: PageRankTM of Google
We have:
p p p p p
* , + Have you ever wondered how did Google displace all the other search
1
0 –M k –M+1 m .) . . j 0 j .( . . j M–1 M l 1
q q q q q
engines about ten years ago? (Altavista, Yahoo, Webcrawler, etc)
They simply had more efficient algorithms for defining the relevance of a
which is equivalent to:
particular web page, given a user’s search request
p p p p p
.( . . j
*
.( . . j
- , Ranking of webpages generated by Google is defined via a ‘random surfer’
1 5 0 j 1 j M j 2M–1 2M r 1
algorithm
q q q q q
So, the probability of asserting P2 > P1 is, with i = M and N = 2M in the ; that one is based on nothing else but a Markov chain!
gambler’s ruin problem, Let N be the total number of webpages, these representing the states of a
1 (q/p)M 1 1 Markov Chain (N !)
1 = = ⇣ ⌘M
1 (q/p)2M 1 + (p/q)M P1 (1 P2 ) For each page (state) i, let L(i) be the set of pages it links to, through its
1+ P2 (1 P1 )
hyperlinks; if L(i) = then i is a ‘dangling page’
For instance, if P1 = 0.6 and P2 = 0.4 then the test is mistaken with
probability 0.017 when M = 5 and this error rate even reduces to 0.0003
when M = 10
MATH3801-MATH3901 Stochastic Processes 271 MATH3801-MATH3901 Stochastic Processes 272
4. Markov Chains 4.5 Some applications 4. Markov Chains 4.5 Some applications
A www application: PageRankTM of Google A www application: PageRankTM of Google
Define transition probabilities Now this is how Google assigns ranks to pages:
⇢
1/|L(i)| if j 2 L(i), 0 otherwise It computes the stationary distribution ⇡ for the Markov Chain with
Pij⇤ =
1/N for all j if L(i) = transition matrix P such that [P]ij = Pij :
; if Xn = i is the currently visited webpage, the next visited page Xn+1 ⇡ = ⇡P, (?)
is picked uniformly at random amongst all pages that i links to,
and then it says
unless i is a dangling page, in such a case the chain moves
to any random page on the Internet page i has higher rank than page j if ⇡i > ⇡j
However, it is not clear that the chain is ergodic (irreducible?) ; ⇡i is the long-term proportion of time that a random surfer will be
; a slight modification is needed: select ↵ between 0 and 1 (in practice on page i, so the higher ⇡i , the more ‘popular’ page i is
Google uses ↵ ' 0.2) and let the new transition probabilities be The idea is simple, but the implementation is very difficult: solving (?)
1 is one of the largest matrix computations in the world, because of the
Pij = (1 ↵)Pij⇤ + ↵ 8(i, j)
N size of P (N ⇥ N, with N ⇠ 45 ⇥ 109 !)
“Bored surfer” interpretation: the surfer moves according to the original chain, PageRankTM is precisely the algorithm used to solve this problem. It is
unless he suddenly gets bored (this happens with probability ↵) in which case based in advanced techniques from linear algebra to speed up matrix
he clicks on a random page computations
MATH3801-MATH3901 Stochastic Processes 273 MATH3801-MATH3901 Stochastic Processes 274
4. Markov Chains 4.6 Mean Time Spent in Transient States 4. Markov Chains 4.6 Mean Time Spent in Transient States
Absorption problems Canonical form
Consider a finite-state Markov Chain, with T transient states (T > 1) You can always rearrange the states of the chain to have
finite-state chain ; at least one recurrent class ✓ ◆
Pt Ptr
if the chain enters a recurrent class, it remains in it for ever (see P⇤ =
0 Pr
tutorial exercise 4.5)
; the chain is absorbed by the class (cf. “absorbing state”) If R is the number of recurrent states, then
Pt is the (T ⇥ T )-matrix of transitions from a transient state to a transient
Possible questions:
state
1 what is the probability that the chain will end up in a recurrent
Ptr the (T ⇥ R)-matrix of transitions from a transient state to a recurrent
class, i.e. will eventually be absorbed? state
2 on the average, how long will it take for the chain to be absorbed?
0 a (R ⇥ T )-matrix of 0’s, as it is impossible to get out of a recurrent class
3 on the average, how long will the chain be in each transient states,
Pr is the (R ⇥ R)-matrix of transitions from a recurrent state to a
before being absorbed?
recurrent state, which is block diagonal (one block for each recurrent
4 if there are several recurrent classes, what are the respective class)
probabilities that the chain will be absorbed by each of them?
P⇤ is called the canonical form of P
MATH3801-MATH3901 Stochastic Processes 275 MATH3801-MATH3901 Stochastic Processes 276
4. Markov Chains 4.6 Mean Time Spent in Transient States 4. Markov Chains 4.6 Mean Time Spent in Transient States
Probability of Absorption Time to absorption
3 On the average, how long will the chain be in each transient
states, before being absorbed?
1 What is the probability that the chain will end up in a recurrent
class, i.e. will eventually be absorbed? denote T = {1, 2, . . . , T } the set of transient states
define Sij as the expected number of time periods that the chain
Answer: this probability is 1 will be in the transient state j, given that it is now in the transient
state i
Intuition:
; conditioning on the first transition from i, we get
the transient states are visited only a finite number of times, so the
probability that the chain leaves them and enters a recurrent class TX
+R T
X
must be 1 Sij = ij + Pik⇤ Skj = ij + [Pt ]ik Skj
k =1 k =1
Formal proof: show that Ptn ! 0 as n ! 1
with ij the Kronecker delta (= 1 if i = j and 0 otherwise)
MATH3801-MATH3901 Stochastic Processes 277 MATH3801-MATH3901 Stochastic Processes 278
4. Markov Chains 4.6 Mean Time Spent in Transient States 4. Markov Chains 4.6 Mean Time Spent in Transient States
Time to absorption Time to absorption
3 On the average, how long will the chain be in each transient 2 On the average, how long will it take for the chain to be absorbed?
states, before being absorbed?
let i be the number of time periods before the chain is absorbed,
In matrix notation, given that the chain is now in the transient state i
S = I + Pt S ; we have:
T
X
where 0 1
S11 S12 . . . S1T i = Sij
S = @ ... . C
B .. j=1
. .. A ,
(summing all the expected times of being in each transient state j
ST 1 . . . STT
gives the expected time in any of transient states, that is, the
and I is the identity matrix expected time before the chain is absorbed)
; It follows if = ( 1 2 . . . T )t , we have
1
S = (I Pt )
1
= S e = (I Pt ) e
(this inverse exists, as (I Pt ) is a diagonally dominant matrix)
This matrix S is often called the fundamental matrix with e a column vector all of whose entries are 1
MATH3801-MATH3901 Stochastic Processes 279 MATH3801-MATH3901 Stochastic Processes 280
4. Markov Chains 4.6 Mean Time Spent in Transient States 4. Markov Chains 4.6 Mean Time Spent in Transient States
Absorption probabilities Absorption probabilities
4 If there are several recurrent classes, what are the respective 4 If there are several recurrent classes, what are the respective
probabilities that the chain will be absorbed by each of them? probabilities that the chain will be absorbed by each of them?
PT
; Bij = k =1 Sik Pkj , that is,
⇤
let Bij be the probability that the first recurrent state to be entered
is state j if the chain is now in the transient state i B = S Ptr = (I Pt ) 1
Ptr
; we have: T
+1 X
X with B the (T ⇥ R)-matrix
⇤[n]
Bij = Pik Pkj⇤ 0 1
n=0 k =1
B1,T +1 B1,T +2 . . . B1,T +R
B = @ ... .. . C
B
T X
X +1 . .. A
⇤[n]
= Pik Pkj⇤ BT ,T +1 ... BT ,T +R
k =1 n=0
if C is a recurrent class of states, the probability that it absorbs the
P
now, 1I{X0 =i,Xn =k } = number of visits in state k chain, given that the chain is now in state i, is
n
X
if the chain is now in state i BiC = Bij
P P P ⇤[n]
; Sik = E n 1I{X0 =i,Xn =k } = E 1I{X0 =i,Xn =k } = n Pik
n j2C
MATH3801-MATH3901 Stochastic Processes 281 MATH3801-MATH3901 Stochastic Processes 282
4. Markov Chains 4.6 Mean Time Spent in Transient States 4. Markov Chains 4.6 Mean Time Spent in Transient States
Absorption: examples Absorption: examples
; 3 classes: {1, 2, 3} is transient, {0} and {4} are recurrent
Example: the Drunkard’s walk 0 1
0 1/2 0 1/2 0
A man walks along a four-block stretch of George Street. If he is at corner 1, B 1/2 0 1/2 0 0 C
2 or 3, then he walks to the left or right with equal probability. He continues B C
B
; canonical form P⇤ = B 0 1/2 0 0 1/2 C
until he reaches corner 4, which is a bar, or corner 0, which is his home. If he C,
@ 0 0 0 1 0 A
reaches either home or the bar, he stays there. 0 0 0 0 1
0 1 0 1
0 1/2 0 1/2 0
1/2 1/2 1/2
* * * Pt = @1/2 0 1/2A , Ptr = @ 0 0 A
1
6 0 j 1 j 2 j 3 4 h 1 0 1/2 0 0 1/2
1/2 1/2 1/2
0 1 he is guaranteed to end up either at home or at the bar (chain absorbed
1 0 0 0 0
B C by state 0 or state 4)
B1/2 0 1/2 0 0 C
B 0
P=B 1/2 0 1/2 C
0 C we have
@ 0 0 1/2 0 1/2A 0 1 1 0 1
1 1/2 0 3/2 1 1/2
0 0 0 0 1 S = (I Pt ) 1
= @ 1/2 1 1/2A =@ 1 2 1 A
0 1/2 1 1/2 1 3/2
MATH3801-MATH3901 Stochastic Processes 283 MATH3801-MATH3901 Stochastic Processes 284
4. Markov Chains 4.6 Mean Time Spent in Transient States 4. Markov Chains 4.6 Mean Time Spent in Transient States
Absorption: examples Absorption: examples
so 0 1 finally,
3/2 1 1/2 0 10 1 0 1
S=@ 1 2 1 A 3/2 1 1/2 1/2 0 3/4 1/4
1/2 1 3/2 B = S Ptr = @ 1 2 1 A@ 0 0 A = @1/2 1/2A
1/2 1 3/2 0 1/2 1/4 3/4
; if the man is currently e.g. at corner 2, on average he will go once by
corner 1, once by corner 3 and twice by corner 2 (including the current ; if he is at corner 2, he is equally likely to end up at home or at the bar.
situation) before getting home or the bar Otherwise, he’ll be absorbed by the bar with probability 1/4 if he’s at
0 10 1 0 1 corner 1 and probability 3/4 if he’s at corner 3
further, 3/2 1 1/2 1 3
= Se = @ 1 2 1 A @1A = @4A
1/2 1 3/2 1 3
; if he is at corner 1 or 3, he will walk on average 3 blocks, while if he is at
corner 2, he will walk on average 4 blocks
MATH3801-MATH3901 Stochastic Processes 285 MATH3801-MATH3901 Stochastic Processes 286
4. Markov Chains 4.6 Mean Time Spent in Transient States 4. Markov Chains 4.6 Mean Time Spent in Transient States
Absorption: examples Absorption: examples
Example 4.30 Example 4.30
Consider the gambler’s ruin problem with p = 0.4 and N = 7. Starting with 3 Consider the gambler’s ruin problem with p = 0.4 and N = 7. Starting with 3
dollars, determine (a) the expected amount of time the gambler has 5 dollars; dollars, determine (a) the expected amount of time the gambler has 5 dollars;
(b) the expected amount of time the gambler has 2 dollars (b) the expected amount of time the gambler has 2 dollars
; it follows
0 1
0.4
*
0.4
*
0.4
*
0.4
*
0.4
*
0.4
* 1.6149 1.0248 0.6314 0.3691 0.1943 0.0777
5 j 7 h B C
1
6 0 j 1 j 2 j 3 j 4 j 6 1
B1.5372 2.5619 1.5784 0.9228 0.4857 0.1943C
B C
0.6 0.6 0.6 0.6 0.6 0.6
S = (I Pt ) 1 B1.4206
=B
2.3677 2.9990 1.7533 0.9228 0.3691C
C
0 1 0 1 B1.2458 2.0763 2.6299 2.9990 1.5784 0.6314C
0 0.4 0 0 0 0 0.6 0 @0.9835 1.6391 2.0763 2.3677 2.5619 1.0248A
B 0 C B C
B0.6 0 0.4 0 0 C B 0 0 C 0.5901 0.9835 1.2458 1.4206 1.5372 1.6149
B 0 C B C
B 0
Pt = B
0.6 0 0.4 0 C B 0 0 C
C , Ptr = B 0 C ; (a) S35 = 0.9228, (b) S32 = 2.3677
B 0 0 0.6 0 0.4 0 C B 0 C
@ 0 0 0 0.6 0 0.4A @ 0 0 A
0 0 0 0 0.6 0 0 0.4
MATH3801-MATH3901 Stochastic Processes 287 MATH3801-MATH3901 Stochastic Processes 288
4. Markov Chains 4.6 Mean Time Spent in Transient States 4. Markov Chains 4.7 Branching Processes
Absorption: examples Probability generating function
Example 4.30 We know about the moment generating function of a r.v. X ,
Consider the gambler’s ruin problem with p = 0.4 and N = 7. Starting with 3 'X (t) = E etX
dollars, determine (a) the expected amount of time the gambler has 5 dollars;
(b) the expected amount of time the gambler has 2 dollars Definition
The probability generating function (pgf) of a non-negative
; further,
0 1 0 1 integer-valued r.v. X is defined by
3.91 0.97 0.03
B C B C
B 7.28 C B0.92 0.08C ⇣ ⌘ X +1
B
B 9.83 C
C B C
B0.85 0.15C GX (s) = E sX = sj P(X = j)
= Se = B C and B = S Ptr = B C
B11.16C B0.75 0.25C j=0
@ 10.69 A @0.59 0.41A
7.39 0.35 0.65 Under reasonable conditions, the pgf uniquely determines the
distribution of X
; if he starts with 3 dollars, the game will last for 9.83 rounds on average,
he will reach 7 dollars with probability 0.15 and go broke with probability ; complete alternative representation of the probability distribution
0.85
; compare with Pi = (1 (q/p)i )/(1 (q/p)N ) as found in Section 4.5
MATH3801-MATH3901 Stochastic Processes 289 MATH3801-MATH3901 Stochastic Processes 290
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Probability generating function: properties Probability generating function: properties
P+1 j
The pgf GX (s) = E sX = j=0 s P(X = j) is such that:
P Pn
1 it exists for all value s 2 [0, 1], as P(X = j) = 1 Suppose that X = i=1 Xi , with {Xi } i.i.d. random variables.
j 5
2 it is continuous, non-decreasing and convex over [0, 1] Then,
3 GX (0) = P(X = 0), and GX (1) = 1 n
! n
⇣ P ⌘ Y Y ⇣ ⌘
i.
4 it is infinitely many times differentiable over [0, 1], with GX (s) = E s i Xi = E sXi = E sXi
i=1 i=1
1 (j) 1 d j GX (s) n
Y
G (0) = = P(X = j) i.d. n
j! X j! dsj s=0
= GXi (s) = GXi (s)
i=1
(j) d j GX (s)
and GX (1) = = E (X (X 1) . . . (X j + 1))
dsj s=1 n
; GX (s) = GXi (s)
In particular: GX0 (1) = E(X )
MATH3801-MATH3901 Stochastic Processes 291 MATH3801-MATH3901 Stochastic Processes 292
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Probability generating function: properties Branching process: introduction
Consider the following model for studying the population of various
PN types of individuals from one generation to the next:
6 Suppose that X = i=1 Xi , with N a positive integer-valued
random variable and {Xi } i.i.d. random variables independent of N by the end of its lifetime, each individual has given rise to a certain
(compound random variable, see Slides 168 and 176). Then, number of offspring, according to some known probability
distribution
N
! !! N
⇣ PN ⌘ Y Y the number of offspring is independent from one individual to
GX (s) = E s i=1 Xi = E sXi =E E sXi N another
i=1 i=1
!
N
Y ⇣ ⌘ ⇣ ⌘ The evolution of the total population as time proceeds is termed a
N
=E E sXi =E GXi (s) = GN GXi (s)
i=1 branching process
; GX (s) = GN GXi (s) = GN GXi (s) “Individuals” can be insects, bacteria, particles, family names, etc.
; very useful tool in epidemiologic and social studies, particularly in
modelling disease spread or population growth
MATH3801-MATH3901 Stochastic Processes 293 MATH3801-MATH3901 Stochastic Processes 294
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Branching process: introduction Branching process: introduction
Such a branching process is also known as the Galton-Watson Let Xn = number of individuals in generation n of the population
process
In 1873, Sir Francis Galton (1822-1911) had been interested for some
years in the decay of some aristocratic surnames. He originally posed
the following question in a review called the Educational Times:
“A large nation, of whom we will only concern ourselves with the adult
males, N in number, and who each bear separate surnames, colonise a
district. Their law of population is such that, in each generation, a
per cent of the adult males have no male children who reach adult life;
b have one such male child; c have two; and so on up to f who have
five. Find what proportion of the surnames will have become extinct
after r generations.”
; the evolution of the population is described by the sequence of r.v.
The Reverend Henry Watson (1827-1903), clergyman and
X0 , X1 , X2 , . . .
mathematician, replied with a solution, which was to be the basis of the
modern treatment of such a process ; discrete-time stochastic process
MATH3801-MATH3901 Stochastic Processes 295 MATH3801-MATH3901 Stochastic Processes 296
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Branching process: Markov property Branching process: Markov property
Let Yn,k be the number of offspring of the k th individual of the nth See that 0 (extinction) is an absorbing (and thus recurrent) state:
generation P00 = P(Xn+1 = 0|Xn = 0) = 1
Galton-Watson definition of the process Also, Pi0 = p0i for all i 1
By assumption, {Yn,k } are i.i.d. random variables, with common if p0 > 0, Pi0 > 0 so all states are transient (except 0)
probability distribution P(Y = j) = for j = 0, 1, 2, . . . and
pj if p0 = 0, Pi0 = 0 = Pij for any j < i, so all states are transient
(except 0)
Xn
X
Xn+1 = Yn,k
Consequence
k =1
The population will either grow indefinitely or go extinct
Thus, P
P(Xn+1 = i2 |Xn = i1 , Xn 1 = i0 ) = P( kX=1
n
Yn,k = i2 |Xn = i1 , Xn 1 = i0 ) Natural question: what is the probability of extinction?
P i1 Pi1
= P( k =1 Yn,k = i2 |Xn = i1 , Xn 1 = i0 ) = P( k =1 Yn,k = i2 ) = Pi1 ,i2 If p0 = 0 and X0 6= 0, this probability is 0; if p0 = 1, this probability is 1
; {Xn , n 0} is a Markov Chain, with states {0, 1, 2, . . .} ; below, we assume 0 < p0 < 1
MATH3801-MATH3901 Stochastic Processes 297 MATH3801-MATH3901 Stochastic Processes 298
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Probability of extinction Probability of extinction
Suppose we have X0 = k > 0 (initial size of the population) Now, conditioning on the size of the population in the 1st generation,
+1
[ we obtain
Extinction if Xn = 0 for some n 0, thus ‘extinction’ = (Xn = 0) !
+1
[
n=1
S+1 ⇡0 = P (Xn = 0) X0 = 1
Denote ⇡0 (k ) = P n=1 (Xn = 0)|X0 = k , and ⇡0 = ⇡0 (1). Then, n=1
+1 +1
!
! X [
k +1
\ [ = P (Xn = 0) X1 = j, X0 = 1 P(X1 = j|X0 = 1)
⇡0 (k ) = P (Xn = 0|X0 = 1) j=0 n=1
i=1 n=1 +1 +1
!
k +1
! X [
Y [ = P (Xn = 0) X1 = j P(X1 = j|X0 = 1)
= P (Xn = 0) X0 = 1 j=0 n=1
i=1 n=1
+1
X +1
X
k
Y = ⇡0 (j)P1j = ⇡0j pj = E(⇡0Y ) = G(⇡0 )
= ⇡0 = ⇡0k
j=0 j=0
i=1
with G(s) the probability generating function of the offspring number Y
; enough to know ⇡0 (and k )
MATH3801-MATH3901 Stochastic Processes 299 MATH3801-MATH3901 Stochastic Processes 300
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Probability of extinction Probability of extinction
Theorem
⇡0 is the smallest non-negative root of the equation s = G(s)
; ⇡0 is a root of the equation s = G(s) Proof: Let G (s) be the pgf of Xn , and see that G1 (s) = G(s) (if X0 = 1). Also,
n
P X1
the pgf of X2 = k =1 Yn,k is G2 (s) = G1 (G(s)) = G(G(s)), and by induction
We know that G(1) = 1, so 1 is a root. Does that mean that ⇡0 = 1?
Gn (s) = Gn 1 (G(s)) = . . . = G(Gn 1 (s))
No, s = G(s) may have other roots Denote ⇡0,n = P(Xn = 0) and see that ⇡0 = limn!1 ⇡0,n . We have:
⇡0,n = Gn (0) = G(Gn 1 (0)) = G(⇡0,n 1) for n = 1, 2, . . .
Theorem with ⇡0,0 = 0. Now, let ⌘ be any non-negative root of s = G(s). Because G is
⇡0 is the smallest non-negative root of the equation s = G(s) non-decreasing on [0, 1], we have
⇡0,1 = G(⇡0,0 ) = G(0)  G(⌘) = ⌘,
⇡0,2 = G(⇡0,1 )  G(⌘) = ⌘
.
..
⇡0,n  ⌘ 8n ; ⇡0 = lim ⇡0,n  ⌘
n!1
MATH3801-MATH3901 Stochastic Processes 301 MATH3801-MATH3901 Stochastic Processes 302
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Probability of extinction Probability of extinction: conclusion
We know that G(s) is continuous, non-decreasing, convex and such
that G(0) = p0 > 0, G(1) = 1 and G0 (1) = E(Y ), denoted by µ
Essentially, there are only two possibilities:
Conclusion
We have thus shown that, for any initial number k of individuals:
if µ  1, the population will eventually die out with probability 1
if µ > 1, the population will eventually die out with probability ⇡0k
strictly between 0 and 1
where µ = E(Y ) is the average number of offspring of an individual
(a) µ > 1 ; 2 roots on [0, 1] ; ⇡0 < 1
(b) µ  1 ; 1 root s = 1 on [0, 1] ; ⇡0 = 1
MATH3801-MATH3901 Stochastic Processes 303 MATH3801-MATH3901 Stochastic Processes 304
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Expected number of individuals in a given generation Expected number of individuals in a given generation
We could also be interested in the expected number of individuals in There are thus three possibilities:
generation n. We know that E(Xn ) = µE(Xn 1 ) (expectation of a (a) if µ < 1, then E(Xn ) ! 0 and Var(Xn ) ! 0;
compound random variable). This could also have been shown using (b) if µ = 1, then E(Xn ) ⌘ k and Var(Xn ) ! 1 (linearly);
the pgf: Gn (s) = G1 (Gn 1 (s)), so that (c) if µ > 1, then E(Xn ) ! 1 and Var(Xn ) ! 1 (exponentially).
Gn0 (s) = G10 (Gn 0
1 (s))Gn 1 (s) Obviously, case (a) inexorably leads to extinction (⇡0 (k ) = 1). Cases
and, as Gn = 1, (b) and (c) are trickier:
1 (1)
(b) E(Xn ) > 0 for all n, but the probability of extinction is 1.
E(Xn ) = Gn0 (1) = G10 (1)Gn0 1 (1) = µE(Xn 1)
; when n gets large, the nth generation contains a large number
Iterating, we get of individuals with a very small probability and contains no
E(Xn ) = µn E(X0 ) = µn k individuals (extinction) with a very large probability
Using either the properties of a compound r.v. or the pgf, it follows in (c) there is a positive probability of extinction, even though the
the same way ( ⇣ ⌘ expected number of individuals is growing exponentially to 1
1 µn
k 2 µn 1 if µ 6= 1
1 µ (reason similar to above)
Var(Xn ) =
kn 2 if µ = 1 ; because of the infinite variance, E(Xn ) is not a good
with 2 = Var(Y ) approximation to the actual number of individuals in the population
MATH3801-MATH3901 Stochastic Processes 305 MATH3801-MATH3901 Stochastic Processes 306
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Branching processes: examples Branching processes: examples
Example
Example 4.32 Consider a branching process with Y following the distribution
Consider a branching process with Y such that P(Y = 0) = 1/2,
P(Y = 1) = 1/4 and P(Y = 2) = 1/4. Determine ⇡0 P(Y = j) = (1 p)j p j = 0, 1, 2, . . .
Since µ = E(Y ) = 3/4 < 1, ⇡0 = 1 for some p 2 (0, 1). Find the probability of extinction of the process
Example 4.33 The pgf of Y is
Consider a branching process with Y such that P(Y = 0) = 1/4, +1
X +1
X p
P(Y = 1) = 1/4 and P(Y = 2) = 1/2. Determine ⇡0 G(s) = sj P(Y = j) = sj (1 p)j p =
1 (1 p)s
j=0 j=0
1
Here µ = E(Y ) = 5/4 > 1, thus ⇡0 < 1. The pgf of Y is G(s) = 4 + 41 s + 21 s2 , p
so ⇡0 satisfies Hence, ⇡0 is the smallest non-negative solution of s = 1 (1 p)s , whose roots
1 1 1 are1±(2p 1) p
⇡0 = + ⇡0 + ⇡02 2(1 p) , that is, 1 p and 1
4 4 2 p
; if p < 1/2, then the smallest solution is ⇡0 =
The smallest positive solution of this equation is ⇡0 = 1/2 1 p
; if p 1/2, then the smallest solution is ⇡0 = 1
MATH3801-MATH3901 Stochastic Processes 307 MATH3801-MATH3901 Stochastic Processes 308
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.7 Branching Processes
Branching processes: examples Branching processes: examples
The distribution of Xn for n ‘large’ (here: n = 10) is typically
The following table shows the results from the first 10 simulations (out of
5,000) of a branching process with offspring distribution as in the previous “boom-or-bust”:
example, with p = 0.3 (; µ = 2.333) either the population take off (boom), and the population size
grows quickly,
or the population fail altogether (bust)
Simul. X0 X1 X2 X3 X4 X5 X6 X7 X8 X9
; if the population goes extinct, it is likely to do so very quickly, within
1 1 1 0 0 0 0 0 0 0 0
2 1 1 1 2 1 16 35 109 310 690
the first few generations
3 1 1 6 8 20 36 80 196 457 1075 Out of 5,000 simulations, we (empirically) observe a proportion of
4 1 0 0 0 0 0 0 0 0 0 42.64% of samples extinct by generation 10
5 1 5 13 29 87 167 392 983 2311 5072
6 1 0 0 0 0 0 0 0 0 0 ; very close to the theoretical probability of extinction for this model
7 1 0 0 0 0 0 0 0 0 0 ⇡0 = 1 p p = 3/7 ' 0.4288
8 1 1 4 10 12 36 137 357 966 2260 Even more interestingly, the mean and the variance of the observed
9 1 1 4 3 20 40 100 269 697 1706
X10 are 2,083 and 10,977,693, respectively
10 1 1 2 1 1 0 0 0 0 0
; the distribution of X10 has 42.64% of zeros, but (when it is non-zero)
takes values up to 45,532 ; very large mean and variance
MATH3801-MATH3901 Stochastic Processes 309 MATH3801-MATH3901 Stochastic Processes 310
4. Markov Chains 4.7 Branching Processes 4. Markov Chains 4.10 Markov Decision Processes
Branching processes: examples Markov Decision Processes: introduction
The following table shows the results from the first 10 simulations (out of We introduced on Slide 261 a Markov Chain with reward, that is, with
5,000) of a branching process with offspring distribution as in the previous some reward ri associated with any visit to state i
example, with p = 0.5 (; µ = 1)
A Markov Decision Process is a more elaborated structure in which a
decision maker can select between various possible alternatives for
Simul. X0 X1 X2 X3 X4 X5 X6 X7 X8 X9
rewards and transition probabilities, with a view to maximising the
1 1 1 0 0 0 0 0 0 0 0
2 1 1 2 6 1 2 1 0 0 0 expected reward
3 1 0 0 0 0 0 0 0 0 0
Example
4 1 1 0 0 0 0 0 0 0 0
5 1 3 3 2 4 8 9 6 8 10 Suppose you can choose between two possible alternatives in State 2:
6 1 0 0 0 0 0 0 0 0 0
0.01
7 1 0 0 0 0 0 0 0 0 0 0.01 *
* 0.99 5 0 j 1
8 1 3 6 7 19 23 27 29 26 28 0.99 5 0 j 1 i 0.99
1
9 1 1 1 0 0 0 0 0 0 0 0.01 r0 = 0 (2)
r1 = 50
r0 = 0 (1)
r1 =1
10 1 1 0 0 0 0 0 0 0 0
Which alternative would you choose?
Out of 5,000 simulations, by generation 10 we find 89.68% of samples
extinct, with a mean population size of 1.019 and a variance of 18.64 ; you have to make a decision
MATH3801-MATH3901 Stochastic Processes 311 MATH3801-MATH3901 Stochastic Processes 312
4. Markov Chains 4.10 Markov Decision Processes 4. Markov Chains 4.10 Markov Decision Processes
Markov Decision Processes: definition Markov Decision Processes: finite horizon
Suppose we have a Markov Chain with states denoted 0,1, . . ., M Suppose there are n transitions remaining and the chain is currently in
At each state i there is a choice between Ki alternatives, that is Ki state i. Then, the optimal policy value, say vi (n), satisfies the recursive
different transition probabilities sets relation 8 9
< XM =
(k ) (k )
(1) (2)
{Pij , j = 0, . . . , M}, {Pij , j = 0, . . . , M}, . . . , {Pij
(Ki )
, j = 0, . . . , M} vi (n) = max ri + Pij vj (n 1)
1k Ki : ;
j=0
(1) (2) (Ki )
and the corresponding immediate rewards, ri , ri , . . ., ri This is known as the Bellman equation. It follows from the so-called
You have to choose a policy which is the set of rules that prescribe the “principle of optimality”
alternatives to be selected through the remaining transitions Principle of optimality
; a policy associates with any state i an alternative ki to be used An optimal policy has the property that whatever the initial state and
when the chain enters state i the initial decision are, the remaining decisions must constitute an
The total expected reward of a policy will be called its value optimal policy with regard to the state resulting from the first decision
; a policy that delivers the maximum value is called an optimal policy Given the current state, an optimal policy for the remaining transitions
It is clear from the previous example that the optimal policy depends on is independent of the policy used in the previous transitions
whether we are working over a finite horizon or over an infinite horizon ; of course, this is essentially the Markov property
MATH3801-MATH3901 Stochastic Processes 313 MATH3801-MATH3901 Stochastic Processes 314
4. Markov Chains 4.10 Markov Decision Processes 4. Markov Chains 4.10 Markov Decision Processes
Markov Decision Processes: finite horizon Markov Decision Processes: finite horizon
Consider again the example
The recursive relation is the base of the dynamic programming
algorithm, which will yield the so-called optimal dynamic policy Alternative 1 Alternative 2
0.01 0.01
The recursion is complete by considering the boundary condition * *
0.99
6 0 j 1 h 0.99 0.99
6 0 j 1
0.01 1
vi (0) = ui r0 = 0 (1)
r1 =1 r0 = 0 (2)
r1 = 50
with 2 transitions remaining, and suppose u0 = u1 = 0
This value ui is usually called the terminal value of state i, defined to
be the additional gain received if the system terminates in state i As there is no choice in State 0, we directly have
(k )
Note: ui may be maxk ri , or 0, or something totally different, v0 (1) = r0 + P00 u0 + P01 u1 = 0
depending on the particular circumstances On the other hand, if in State 1, we must make a decision. This one will be
In fact, the dynamic programming algorithm is conceptually very motivated by
n o
simple, as it is just the successive calculation of the recursive equation (1) (1) (1) (2) (2) (2)
v1 (1) = max r1 + P10 u0 + P11 u1 , r1 + P10 u0 + P11 u1 = max{1, 50} = 50
; the best choice if in State 1 and one transition left is Alternative 2
MATH3801-MATH3901 Stochastic Processes 315 MATH3801-MATH3901 Stochastic Processes 316
4. Markov Chains 4.10 Markov Decision Processes 4. Markov Chains 4.10 Markov Decision Processes
Markov Decision Processes: finite horizon Markov Decision Processes: infinite horizon
Suppose now that there is no planned termination for the process
Now, if we go on with two transitions remaining, we have
; the optimal policy will now be called the optimal stationary policy
v0 (2) = r0 + P00 v0 (1) + P01 v1 (1) = 0.01 ⇥ 50 = 0.5 Obviously, the use of the word stationary is not innocent: we are
working on the long-run, with a final transition “too far away” to play a
n o role (similar to the stationary distribution of a Markov Chain, with an
(1) (1) (1) (2) (2) (2)
v1 (2) = max r1 + P10 v0 (1) + P11 v1 (1), r1 + P10 v0 (1) + P11 v1 (1) initial behaviour of the process not important any more)
= max{1 + 0.99 ⇥ 50, 50} = max{50.5, 50} = 50.5 At first sight, we might think that the optimal stationary policy could be
determined by the previous algorithm, letting n growing to 1
; when in State 1 and 2 transitions are left, the best choice is Alternative 1
However, it is not that simple:
; the optimal dynamic policy is thus Alternative 1 for the first transition and
we would have to solve an “infinite” recursion (without proper
Alternative 2 for the second (and last) transition
boundary conditions)
Note: continuing this computation for larger n, one finds v0 (n) = n/2 and it is not clear that the optimal policy becomes fixed when n
v1 (n) = 50 + n/2 for all n. The optimal dynamic policy is Alternative 2 for the becomes large (that is, that such an optimal policy exists)
last transition and Alternative 1 for all previous transitions usually, limn!1 vi (n) diverges for each state
; the dynamic programming approach cannot be directly applied
MATH3801-MATH3901 Stochastic Processes 317 MATH3801-MATH3901 Stochastic Processes 318
4. Markov Chains 4.10 Markov Decision Processes 4. Markov Chains 4.10 Markov Decision Processes
Markov Decision Processes: infinite horizon Markov Decision Processes: infinite horizon
Consider a policy k = (k0 , k1 , . . . , kM ) which defines a Markov Chain
with transition matrix P(k) , that is
h i
The problem of existence of an optimal stationary policy is similar to P(k) = Pij
(ki )
that of existence of a stationary distribution for a regular Markov Chain ij
Assume this Markov Chain is ergodic with stationary distribution ⇡ (k)
Recall that a sufficient condition (but not necessary!) for the existence
of a stationary distribution is the ergodicity of the Markov Chain Then, the ergodic theorem (Slide 259) guarantees that the average
long-term reward per transition, using this policy k, is
Similar ideas arise when deriving conditions for the existence of an
N 1 M
optimal stationary policy in a Markov Decision Process 1 X (ki ) X (k ) (k)
g (k) = lim ri 1I{Xn =i} = ri i ⇡i
However, the study of those conditions is well beyond the scope of this N!1 N
n=0 i=0
brief introduction to Markov Decision Processes (k)
Denote vi (n) the total expected reward using policy k when n
; in this course, we will assume that such an optimal stationary exists transitions are remaining and the chain is currently in state i
⇣ ⌘
(k)
Then, it can be understood that limn!1 vi (n) ng (k) exists:
⇣ ⌘
(k) . (k)
lim ng (k) = wi
vi (n)
n!1
MATH3801-MATH3901 Stochastic Processes 319 MATH3801-MATH3901 Stochastic Processes 320
4. Markov Chains 4.10 Markov Decision Processes 4. Markov Chains 4.10 Markov Decision Processes
Markov Decision Processes: infinite horizon Markov Decision Processes: infinite horizon
(k)
; wi tells us how profitable it is, in the long-term, to start in state i So we have
8 PM
compared to other states, it is called the relative gain of state i > (k)
w
(k0 ) (k0 ) (k)
0 +g r j=0 P0j wj
(k) = +
>
> 0
(under policy k) >
< (k) (k ) P M (k ) (k)
w1 + g (k) = r1 1 + j=0 P1j 1 wj
Now, we know that >
>
> ... ...
M
X >
: (k) (k ) P M (k ) (k)
(k) (k ) (k ) (k) wM + g (k) = rM M + j=0 PMjM wj
vi (n) = ri i + Pij i vj (n 1),
j=0 ; system of M + 1 equations with M + 2 unknowns (w0(k) , . . . , wM(k) , g (k) )
(k) (k)
hence Note that wj ’s are relative gains ; fix one, say w0 , to 0, and solve
(k) (ki )
M
X (ki )
⇣
(k)
⌘ the system to find the value of policy k
vi (n) ng (k) = ri + Pij vj (n 1) (n 1)g (k) g (k) (k)
j=0 See that wi + g (k) is actually what you wish to maximise with respect
⇣ ⌘ M
X ⇣ ⌘ to k, for each state i
(k) (k ) (k ) (k)
lim vi (n) ng (k) = ri i + Pij i lim vj (n 1) (n 1)g (k) g (k) ; improve on the current policy by finding, for each state i, the
n!1 n!1
j=0
alternative that maximises
ki0
M
X M
X
(k) (ki ) (k ) (k) (ki0 ) (ki0 ) (k)
wi = ri g (k) + Pij i wj (for all i) ri + Pij wj
j=0 j=0
MATH3801-MATH3901 Stochastic Processes 321 MATH3801-MATH3901 Stochastic Processes 322
4. Markov Chains 4.10 Markov Decision Processes 4. Markov Chains 4.10 Markov Decision Processes
Policy-iteration Method Policy-iteration method: example
Example
This iterative optimisation procedure is known as the policy-iteration Consider a Markov Decision Process with two states and the following
method alternatives (2 alternatives by state):
(1) (1) (1)
It can be initialised with any policy k, for instance, take in State 0, we have P00 = 0.4, P01 = 0.6 and r0 = 6.4, while
(2) (2) (2)
ki = arg maxk ri
(k ) P00 = 0.8, P01 = 0.2 and r0 = 4.4
(1) (1) (1)
It is clear that each succeeding policy has a higher value than the in State 1, we have P10 = 0.2, P11 = 0.8 and r1 = 0.4, while
(2) (2) (2)
previous one P10 = 0.5, P11 = 0.5 and r1 = 1
; once a policy is considered, it cannot reappear in the process Find the optimal stationary policy
If it does, that is, if two successive iterations yield identical policies, Given the values of the rewards at both states, it seems reasonable to start
then this one is the optimal stationary policy with the policy (1, 1), that is, Alternative 1 for both states
(k)
Thus, we have to solve (we write g and wi for the values of g (k) and wi
; stop the procedure
under the running policy): (
(1) (1) (1)
g+ w0 (1 P00 ) P01 w1 = r0
(1) (1) (1)
g P10 w0 + w1 (1 P11 ) = r1
MATH3801-MATH3901 Stochastic Processes 323 MATH3801-MATH3901 Stochastic Processes 324
4. Markov Chains 4.10 Markov Decision Processes 4. Markov Chains 4.10 Markov Decision Processes
Policy-iteration method: example Policy-iteration method: example
Because there are 3 unknowns (g, w0 and w1 ) for 2 equations, we set w0 = 0 Then, repeat with the new policy (2, 2). We have first to solve
Then, we have ⇢ ⇢
g 0.6 ⇥ w1 = 6.4 g 0.2 ⇥ w1 = 4.4
g + 0.2 ⇥ w1 = 0.4 g + 0.5 ⇥ w1 = 1
that is, w1 =8.5 and g = 1.3 that is, w1 = 7.71 and g = 2.85
From the policy improvement routine, we obtain, for State 0,
n o From the policy improvement routine, we obtain, for State 0,
(1) (1) (2) (2)
k00 = arg max r0 + P01 w1 , r0 + P01 w1
k00 = arg max {1.77, 2.86} ; k00 = 2
= arg max {6.4 + 0.6 ⇥ w1 , 4.4 + 0.2 ⇥ w1 }
= arg max {1.3, 2.7} For State 1, we have
; k00 = 2 (replace Alternative 1 by Alternative 2 for State 0) k10 = arg max { 6.57, 4.86} ; k10 = 2
For State 1, we have
n o The resulting policy is again (2, 2), so that (2, 2) is the optimal stationary
(1) (1) (2) (2)
k10 = arg max r1 + P11 w1 , r1 + P11 w1 policy
= arg max { 0.4 + 0.8 ⇥ w1 , 1 + 0.5 ⇥ w1 } With that policy, we are rewarded $2.85 per transition (on average), and the
= arg max { 7.2, 5.25} ; k10 =2 relative gain when starting in state 0 instead of in state 1 is $7.71
MATH3801-MATH3901 Stochastic Processes 325 MATH3801-MATH3901 Stochastic Processes 326
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Exponential distribution: definition
A random variable is said to have the Exponential distribution with
parameter ( > 0), i.e.
X ⇠ Exp( ),
5 The Exponential Distribution if its probability density function is given by
⇢
and the Poisson Process fX (x) =
x e if x 0
(; SX = R+ )
0 otherwise
By integration, it is easy to show that
⇢
0 if x < 0
FX (x) = P(X  x) = x
1 e if x 0
MATH3801-MATH3901 Stochastic Processes 327
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Exponential distribution: representation Exponential distribution: moments
We have also (Example 2.42, Slide 115)
⇣ ⌘
'X (t) = E etX =
1
(for t < )
t
λ
(n) n!
Differentiating 'X (t) n times gives 'X (t) = ( t)n+1
, so that
FX(x)

fX(x)
1/2 ●
n!
E(X n ) = 'X (0) =
(n)
n
In particular, this yields
0
1 2 1
ln 2
E(X 2 ) =
0
E(X ) = and Var(X ) =
0 λ
, 2 2
x x
cdf FX (x) pdf fX (x) = FX0 (x)
MATH3801-MATH3901 Stochastic Processes 328 MATH3801-MATH3901 Stochastic Processes 329
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Memoryless property Memoryless property: example
Definition Example
A non-negative random variable X is said to be without memory, or Suppose X is the time between arrivals of two successive buses.
memoryless, if Imagine you have been waiting at the bus stop for 15 minutes
P(X > s + t|X > t) = P(X > s) for all s, t 0
if X is memoryless, then after you wait for 15 minutes, you are no
Think of X as the waiting time until some given arrival better off than you were originally in terms of how long you can
; given that the arrival has not occurred by time t, the distribution of still expect to wait
the remaining waiting time is the same as the original waiting time if the bus is known to arrive regularly every 16 minutes, then you
distribution know that it will arrive within a minute ; not memoryless
; the remaining time “has no memory” of previous waiting if the bus is supposed to arrive regularly, but frequently breaks
down, then a 15 minute wait can indicate that the remaining time
Another equivalent formulation:
is probably very long ; not memoryless
P(X > s + t) = P(X > t) ⇥ P(X > s) for all s, t 0
This must be understood in terms of the distribution of the
Note: you should not miss that this property bears a strong remaining waiting time, i.e. the probability of waiting for a given
resemblance to the Markov property amount of time! (NOT in terms of the remaining waiting time itself)
MATH3801-MATH3901 Stochastic Processes 330 MATH3801-MATH3901 Stochastic Processes 331
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Memoryless property of the Exponential distribution Memoryless property
If X ⇠ Exp( ), Proposition
P(X > s + t, X > t) P(X > s + t) The Exponential distribution is the only continuous distribution that
P(X > s + t|X > t) = =
P(X > t) P(X > t) has the memoryless property
e (s+t)
s
= t
=e = P(X > s) Proof: Let X be a continuous r.v. with FX (x) = 1 g(x), i.e. g(x) = P(X > x)
e
The memoryless property states that g(s + t) = g(s)g(t) for all s, t 0
for all s, t 0
It follows that log g(s + t) = log g(s) + log g(t), therefore
d log g(t + ) log g(t) log g( ) .
log g(t) = lim = lim =c
dt
λ
!0 !0
where c does not depend on t
Integrating the previous expression, we have log g(t) = log g(0) + ct, and
thus g(t) = g(0)ect
Since g(0) = 1 and g is decreasing, it follows that
λexp(− λs)
λexp(− λt)
g(t) = e t
x
s t s+t for some positive , hence FX (x) = 1 e , which is the Exp( )-cdf
MATH3801-MATH3901 Stochastic Processes 332 MATH3801-MATH3901 Stochastic Processes 333
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Hazard rate function Hazard rate function: Exponential distribution
Definition
For a continuous random variable X having cdf FX and density fX , the Suppose now that the lifetime distribution is Exponential. We have
hazard rate function (or failure rate function) is defined by t
fX (t) e
rX (t) = = t
=
fX (t) 1 FX (t) e
rX (t) =
1 FX (t)
; constant over time
Interpretation: Imagine that X is the lifetime of an item (i.e., its time
before failure). For a small " > 0, ; another manifestation of the memoryless nature of the
Exponential distribution
P(X 2 (t, t + ")) "fX (t)
P(X 2 (t, t + ")|X > t) = ' = " rX (t)
P(X > t) 1 FX (t) ; “lack of ageing”
; rX (t) is the conditional probability density of the failure time ; is often referred to as the rate of the distribution
for a t-year-old item
; sometimes called the instantaneous hazard or failure rate
MATH3801-MATH3901 Stochastic Processes 334 MATH3801-MATH3901 Stochastic Processes 335
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Link to the Geometric distribution Link to the Geometric distribution
Imagine you are waiting for the occurrence of a certain random suppose you can check the occurrence of the phenomenon in
phenomenon. You know that the phenomenon can occur continuous time (i.e., at each ‘instant’ of time). Let T4 be your
independently on each trial, with equal probability p waiting time
suppose you can only check the occurrence of the phenomenon ; T4 ⇠ “Geo(limn!1 p/n)”,
once an hour. Let T1 be your waiting time (in hours) and E(T4 ) = limn!1 pn “instants” = limn!1 1n
np = 1
p hours
1
; T1 ⇠ Geo(p), and E(T1 ) = p hours ; in fact, T4 ⇠ Exp(p)
suppose you can check the occurrence of the phenomenon once
; the Exponential distribution characterises a kind of limiting
a minute. Let T2 be your waiting time (in minutes)
60 1
(continuous) behaviour of the Geometric distribution
; T2 ⇠ Geo(p/60), and E(T2 ) = p minutes = p hours
; what happens at each instant of time is independent to what
suppose you can check the occurrence of the phenomenon once happened at the previous instants, even very close in time
a second. Let T3
be your waiting time (in seconds)
3600 1
; T3 ⇠ Geo(p/3600), and E(T3 ) = p seconds = p hours Note: the Geometric distribution is the only discrete distribution that
has the memoryless property
MATH3801-MATH3901 Stochastic Processes 336 MATH3801-MATH3901 Stochastic Processes 337
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Examples Examples
Example Example 5.4
Suppose that the amount of time that a light bulb works before burning itself The amount of damage involved in a car accident is an Exponential random
out is exponentially distributed with mean ten hours. Suppose that a person variable with mean $1000. Of this, the insurance company only pays that
enters a room in which a light bulb is burning. If this person desires to work amount exceeding $400. Find the expected value of the amount the
for five hours, then what is the probability that she will able to complete her insurance company pays per accident
work without the bulb burning out?
Let X be the amount of damage resulting from an accident, and Y the
By the memoryless property of the exponential distribution, no need to know amount paid by the insurance company. Then, conditioning on whether X
the amount of time t the bulb had been in use before the person enters exceeds 400, we have
the room. The remaining time T has the same distribution as the initial time,
that is, the Exponential distribution with mean 10. Hence the desired E(Y ) = E(Y |X  400)P(X  400) + E(Y |X > 400)P(X > 400)
probability is
5/10 1/2
P(T > 5) = 1 e =e ' 0.61
if X  400, the company doesn’t pay anything, i.e., Y = 0
Clearly, if the distribution of the bulb’s lifetime was not assumed exponential,
we would have if X > 400, then by the memoryless property, the amount by which it
1 FT (t + 5) exceeds is itself Exponential with mean 1000
P(T > t + 5|T > t) =
1 FT (t) 400/1000
; E(Y |X > 400) = 1000, and E(Y ) = 1000 e = 670.32 ($)
; additional information (amount of time t) would be needed
MATH3801-MATH3901 Stochastic Processes 338 MATH3801-MATH3901 Stochastic Processes 339
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Further properties of the Exponential distribution Further properties of the Exponential distribution
Property 2
Let X1 and X2 be independent exponential random variables with
Property 1 respective rates 1 and 2 . Then,
Let X1 , X2 , . . . , Xn be i.i.d. Exponential random variables with 1
parameter . Then P(X1 < X2 ) =
n 1 + 2
X
Xi ⇠ (n, ) Proof:
i=1 Z +1
P(X1 < X2 ) = P(X1 < X2 |X1 = x)fX1 (x) dx
Proof: straightforward using mgf’s (Exercise 2.16) Z
0
+1
1x
= P(x < X2 ) 1e dx
Here, as n is obviously an integer, (n, ) is the Erlang-n distribution, 0
Z
with parameter (see Slide 62) +1
2x 1x
= e 1e dx
0
Z +1  1
e ( 1 + 2 )x
1
= 1 e ( 1 + 2 )x
dx = 1 =
0 ( 1 + 2) 0 1 + 2
MATH3801-MATH3901 Stochastic Processes 340 MATH3801-MATH3901 Stochastic Processes 341
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Further properties of the Exponential distribution Further properties of the Exponential distribution
Property 3
Let X , X , . . ., X be independent exponential random variables with
1 2 n
respective rates 1 , 2 , . . ., n . Then the distribution
Pn of X(1) = mini Xi is
Property 2 bis Exponentially distributed with rate equal to i=1 i , i.e.,
Let X1 , X2 , . . . , Xn be independent exponential random variables with n
!
X
Xi having rate i . Then, X(1) ⇠ Exp i
i=1
i
P(Xi is the minimum) = Pn
j=1 j Proof: For t > 0,
P(X(1) > t) = P(X1 > t, X2 > t, . . . , Xn > t)
Proof: direct generalisation of the preceding
= P(X1 > t) ⇥ P(X2 > t) ⇥ . . . ⇥ P(Xn > t)
1t 2t nt
=e e ...e
Pn
=e ( i=1 i )t
Pn Pn
; for t > 0, F X(1) (t) = 1 e ( i=1 i )t ; Exp( i=1 i )-cdf
MATH3801-MATH3901 Stochastic Processes 342 MATH3801-MATH3901 Stochastic Processes 343
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Example Example
Example 5.8 Example 5.8
Suppose you arrive at a post office having two clerks at a moment when both Suppose you arrive at a post office having two clerks at a moment when both
are busy but there is no one else waiting in line. You will enter service when are busy but there is no one else waiting in line. You will enter service when
either clerk becomes free. If service times for clerk i are Exponential with rate either clerk becomes free. If service times for clerk i are Exponential with rate
i , i = 1, 2, find E(T ), where T is the amount of time that you spend in the i , i = 1, 2, find E(T ), where T is the amount of time that you spend in the
post office post office.
Let Ri denote the remaining service time of the customer with clerk i. By the Besides, conditioning on which of R1 and R2 is smallest, we get
memoryless property, R1 and R2 are independent exponential r.v. with rates E(S) = E(S|R1 < R2 )P(R1 < R2 ) + E(S|R2 < R1 )P(R2 < R1 )
1 and 2 . Now, with S denoting your service time, we have 1 1 1 2
T = min(R1 , R2 ) + S, so that = +
1 1 + 2 2 1 + 2
E(T ) = E(min(R1 , R2 )) + E(S) 2
= .
1 + 2
1
We know that min(R1 , R2 ) ⇠ Exp( 1 + 2 ), so that E(min(R1 , R2 )) = 1+ 2
Finally,
1 2 3
E(T ) = + =
1 + 2 1 + 2 1 + 2
MATH3801-MATH3901 Stochastic Processes 344 MATH3801-MATH3901 Stochastic Processes 345
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution
Sum of Exponential random variables Sum of Exponential random variables
If X1 and X2 are independent Exp( ) random variables, then In the same way, we have the general result:
X1 + X2 ⇠ (2, ) Property 4
What if X1 ⇠ Exp( 1 ) and X2 ⇠ Exp( 2 ), with 1 6= 2?
If X1 , X2 , . . . , Xn are independent Exponentially distributed random
; convolution (see Slide 106) variables with respective distinct parameters i , then
Z t n
X
f (t) = f (s) f (t s) ds fX1 +X2 +...+Xn (t) = Ci,n it
X1 +X2
0
X1 X2
ie ,
Z t i=1
1s s)
1e 2e ds
2 (t
=
0 where Y j
Z t Ci,n =
2t j i
1 2e e ds
( 1 2 )s
= j6=i
0
⇣ ⌘
=
1 2
e 2t
1 e ( 1 2 )t (Proof: see textbook, Section 5.2.4)
1 2 This is known as the Hypoexponential distribution
1 2t 2 1t
!
2e 1e
= + n
X n
X
it
1 2 2 1 Direct integration also shows that P Xi > t = Ci,n e
i=1 i=1
MATH3801-MATH3901 Stochastic Processes 346 MATH3801-MATH3901 Stochastic Processes 347
5. The Exponential Distribution and the Poisson Process 5.2 The Exponential Distribution 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Geometric sum of Exponential random variables The Poisson Process: introduction
Property 5 - Geometric compound random variable The Poisson distribution models the number of occurrences of a
given phenomenon during a fixed period of time (Slide 49)
Let X1 , X2 , . . . be i.i.d. Exponential random variables with parameter ,
and N a Geometric random variable with parameter p. Then ; the Poisson process is the analogue when recording occurrences
N of the phenomenon over time
X
X =
Xi ⇠ Exp(p ) (; from random variable to stochastic process)
i=1 This is a simple and widely used process for modelling the times at
Proof: Conditioning on N = n, we have from Property 1 that which arrivals enter a system. Those can be:
customers at a cashier;
xn 1 n
e x
fX |N (x|n) = , for x 0, n = 1, 2, . . . (Gamma pdf) raindrops at a patch of ground;
(n 1)!
babies born in some hospital;
Hence, requests at a web server;
1
X xn 1 n x 1
X n 1 n 1
e x x phone calls at a cell tower;
fX (x) = ⇥ P(N = n) = p e (1 p)n 1
n=1
(n 1)! (n
n=1
1)! goals scored by a given soccer team;
1
X (x (1 p)) n ...
x x x(1 p) p x
=p e =p e e =p e for x 0 But: every arrival/counting process is not necessarily a Poisson
n!
n=0 process!
MATH3801-MATH3901 Stochastic Processes 348 MATH3801-MATH3901 Stochastic Processes 349
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Arrival process Interarrival times
Definition
Definition The interarrival times X1 , X2 , . . . are positive random variables defined
An arrival process is a sequence of increasing random variables in terms of the arrival times by X1 = S1 and Xi = Si Si 1 for i > 1.
0 < S1 < S2 < . . . called arrival times (or arrival epochs) and Similarly, we can write
representing the times at which some repeating phenomenon occurs n
X
Sn = Xi
Remarks: i=1
Si > 0 8i ; the process starts at time 0
Si > Si 1 ; multiple arrivals cannot occur simultaneously ; the joint distribution of X1 , . . . , Xn for all n 1 is sufficient (in
principle) to specify the arrival process
Any arrival process can also be specified by two other stochastic
processes: Often, the sequence of interarrival times is a sequence of i.i.d.
the sequence of interarrival times X1 , X2 , . . . random variables. Then, the process is called a renewal process
the counting process N(t) (See Chapter 7 in the textbook for a general treatment of Renewal
Processes - not addressed in this class)
MATH3801-MATH3901 Stochastic Processes 350 MATH3801-MATH3901 Stochastic Processes 351
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Counting process Arrival - interarrival - counting processes
Definition The whole arrival process of interest can be specified by the joint
A stochastic process {N(t), t 0} is said to be a counting process if distribution of the arrival times S1 , S2 , . . ., that of the interarrival
N(t) represents the total number of occurrences of a certain intervals X1 , X2 , . . . or that of the counting random variables N(t)
phenomenon by (and including) time t ; in principle, specifying any one of these also specifies the others
; the three descriptions are equivalent, and the arrival process can
Some properties:
be viewed in terms of whichever description is more convenient at
1 N(t) 0 for all t 0 the time
2 N(t) is integer valued for all t 0
The arrival process can be represented as:
3 N(0) = 0
4 for all t2 > t1 , N(t2 ) N(t1 )
; {N(t), t 0} is a continuous-time, discrete-state stochastic process
A counting process N(t) can obviously be related to any arrival
process, and we have, for any integer n 1 and any time t 0,
{Sn  t} () {N(t) n}
MATH3801-MATH3901 Stochastic Processes 352 MATH3801-MATH3901 Stochastic Processes 353
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Increments Poisson process: Definition 1
Poisson process: Definition 1
A Poisson process is an arrival process in which the interarrival
intervals are i.i.d. and exponentially distributed, i.e. there exists some
Definition parameter > 0 such that
The increment of the counting process between time t1 and time t2
(t1 < t2 ) is the number of occurrences of the phenomenon in the i.i.d.
X1 , X2 , . . . ⇠ Exp( )
interval (t1 , t2 ], i.e. N(t2 ) N(t1 )
X1 , X2 , . . . i.i.d. ; it is a renewal process
An increment of the process is thus a difference in the state of the 1/ = E(Xi ) = expected time between two successive
process at two distinct times occurrences of the phenomenon
Note: Increments will be fundamental quantities in the developments ; = average number of occurrences of the phenomenon over one
which follow unit of time ; rate or intensity of the process
Question: what are the distributions of Sn and N(t) with that
specification of the interarrival times?
; the memoryless property of the Exponential distribution will have
important implications for the process
MATH3801-MATH3901 Stochastic Processes 354 MATH3801-MATH3901 Stochastic Processes 355
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
First arrival after time t First arrival after time t
Let Zt be the duration of the interval from t until the first arrival after t. Along the same lines: P (Zt > z|{N(⌧ ), 0  ⌧  t}) = e z
Consider first the case N(t) = 0, that is, the first arrival after t is the
first arrival of the process. For any z > 0, we have: ; Zt |{N(⌧ ), 0  ⌧  t} ⇠ Exp( )
P (Zt > z|N(t) = 0) = P (X1 > z + t|N(t) = 0)
= P (X1 > z + t|X1 > t) = P(X1 > z) = e z Proposition
For a Poisson process of rate and any time t > 0, the amount of time
Hence, Zt |{N(t) = 0} ⇠ Exp( )
Zt between t and the first arrival after t is Exp( ), independent of all
Now suppose that N(t) = n and the nth arrival occurs at time arrival times before t and independent of N(⌧ ) for all ⌧  t
Sn = ⌧  t. Then, for any z > 0,
In fact, Zt , conditional on the time ⌧ of the last arrival before t, is simply
P (Zt > z|N(t) = n, Sn = ⌧ ) = P (Xn+1 > z + t ⌧ |N(t) = n, Sn = ⌧ )
the remaining time until the next arrival
= P (Xn+1 > z + t ⌧ |Xn+1 > t ⌧, Sn = ⌧ )
Memoryless property: Zt is independent of ⌧  t, hence independent
= P (Xn+1 >z +t ⌧ |Xn+1 > t ⌧)
of everything before t
z
= P (Xn+1 > z) = e
; the portion of the Poisson process starting at some time t > 0 is a
; Zt |{N(t) = n, Sn = ⌧ } ⇠ Exp( ), for all n, ⌧ probabilistic replica of the process starting at 0
MATH3801-MATH3901 Stochastic Processes 356 MATH3801-MATH3901 Stochastic Processes 357
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Stationary increment property Independent increment property
Definition
Definition A counting process {N(t), t 0} has the independent increment
A counting process {N(t), t 0} has the stationary increment property if the numbers of occurrences of the phenomenon in disjoint
property if for every 0 < t1 < t2 , N(t2 ) N(t1 ) has the same intervals (i.e., increments) are independent
distribution function as N(t2 t1 )
; if (t1 , t2 ] and (t3 , t4 ] are disjoints intervals, (N(t2 ) N(t1 )) and
(N(t4 ) N(t3 )) are independent
; the number of occurrences of the phenomenon in a given interval
; e.g., the number of occurrences of the phenomenon between
depends only on the length of the interval
times 6 and 8 is independent of the number of occurrences of the
; the distribution of the number of occurrences over (t, t + z] is the phenomenon between times 10 and 15
same for all t, and depends only on z ; e.g., the number of occurrences before time 10 (i.e., between
; the previous results clearly show that times 0 and 10) is independent of the number of occurrences of
the phenomenon between times 10 and 15
the Poisson process has the stationary increment property
; the previous results clearly show that
the Poisson process has the independent increment property
MATH3801-MATH3901 Stochastic Processes 358 MATH3801-MATH3901 Stochastic Processes 359
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Distribution of Sn Example
Another quantity of main interest is Sn , the arrival time of the nth
P n Example 5.13
replication of the phenomenon. We have that Sn = i=1 Xi , with Xi
i.i.d. Exponential( ) random variables. Therefore we know Suppose that people immigrate into a territory at a Poisson rate = 1 per
day. (a) What is the expected time until the tenth immigrant arrives? (b) What
is the probability that the elapsed time between the tenth and the eleventh
Sn ⇠ (n, ),
arrival exceeds two days? (c) Ten days have passed without seeing any
immigrant arriving. What is the probability that the time until the next arrival
that is,
n 1 exceeds two more days?
s ( s)
fSn (s) = e for s 0,
(n 1)!
10
(a) E(S10 ) = = 10 days
and it directly follows
2 2
(b) P(X11 > 2) = e =e ' 0.133, as any Xi ⇠ Exp(1)
n n
E(Sn ) = and Var(Sn ) = 2 (c) P(X > 12|X > 10) = P(X > 2) = e 2 = e 2 ' 0.133
(memoryless property of the Exp-distribution)
MATH3801-MATH3901 Stochastic Processes 360 MATH3801-MATH3901 Stochastic Processes 361
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Joint distribution of S1 , S2 , . . ., Sn Distribution of N(t)
The joint density of S1 , S2 , . . ., Sn will also be helpful later. See that, The Exponential distribution plays the major role in the previous
for any n-uple (s1 , s2 , . . . , sn )
such that 0 < < < ... < s1 s2 sn , development ; why is this called the Poisson process?
Theorem
def.
fS1 S2 ...Sn (s1 , s2 , . . . , sn ) = fX1 X2 ...Xn (s1 , s2 s1 , . . . , sn sn 1) For a Poisson process of rate , and for any t > 0, we have
ind.
= fX1 (s1 ) fX2 (s2 s1 ) . . . fXn (sn sn 1)
N(t) ⇠ P( t),
i.d. s1 (s2 s1 ) (sn sn 1)
= e e ... e n
that is, P (N(t) = n) = e t ( t) , for n = 0, 1, 2, . . .
n (s1 +s2 s1 +s3 s2 +...+sn sn 1)
= e n!
n sn Proof: Fixing S0 = 0, we have {N(t) n} = {Sn  t} for any non-negative
= e
integer n and any t > 0, hence
; this does not depend on any arrival time other than sn (except in P(N(t) = n) = P(N(t) n) P(N(t) n + 1)
the constraint 0 < s1 < s2 < . . . < sn ) = P(Sn  t) P(Sn+1  t)
Z t Z t
; for fixed sn , the joint density, and thus the conditional density of
= fSn (s) ds fSn+1 (s) ds
S1 , S2 , . . ., Sn 1 given Sn = sn , is (multivariate) uniform 0 0
MATH3801-MATH3901 Stochastic Processes 362 MATH3801-MATH3901 Stochastic Processes 363
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Distribution of N(t) Poisson process: Definition 2
Now, Z Z
t t
s ( s)n 1
fSn (s) ds = e ds, Poisson process: Definition 2
0 0 (n 1)!
and, integrating by parts, A Poisson process is a counting process that satisfies
Z t n Z t n
t ( t) s ( s) ( t)n
fSn (s) ds =e + e ds t
0 n! 0 n! P(N(t) = n) = e
n Z t
n!
t ( t)
=e + fSn+1 (s) ds for any non-negative integer n and any t > 0, and has the independent
n! 0
It follows and stationary increment properties
n
t ( t)
P(N(t) = n) = e So far we have shown that Definition 1 implies Definition 2. Starting with
n!
for any non-negative integer n and any t > 0 Definition 2, we have, for any t > 0,
Note: it follows that, for any t 0, P(X1 > t) = P(N(t) = 0) = e t
E(N(t)) = t and Var(N(t)) = t
so that X1 ⇠ Exp( )
; as already noted, = E(N(1)) is the expected number of arrivals in unit
time, and is therefore called the (arrival) rate of the process
MATH3801-MATH3901 Stochastic Processes 364 MATH3801-MATH3901 Stochastic Processes 365
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Poisson process: Definition 2 Revision (?): Landau’s small o notation
Moreover, for any integer n 0, Definition
A function f (·) is said to be o( ) (read: ‘small oh of ’) if
P(Xn+1 > t |Sn = s) = P (N(s + t) N(s) = 0|Sn = s)
= P(N(s + t) N(s) = 0) (independent increments) f( )
lim =0
= P(N(t) = 0) (stationary increments) !0
t
=e ; f ( ) is negligible compared with , when is small
; Xn+1 is independent of Sn and has distribution Exp( ) ; o(·) is actually for “of smaller order than”
Similarly, Xn+1 is independent of any Sn k, k = 0, . . . , n For instance, from a Taylor expansion,
1
; independent of any previous Xk , k = 1, . . . , n X ( 1)k ( )k
e =1 +
; {Xi } are i.i.d. Exp( ), which is Definition 1 k!
k =2
Note: the Poisson distribution of N(t) is not sufficient on its own for the 1 ( 1)k ( )k ( 1)k k
k 1
but as for any k 2, lim !0 k! = k! lim !0 = 0, we
process to be Poisson! The independent and stationary increment properties can write
are necessary! e =1 + o( ) as !0
MATH3801-MATH3901 Stochastic Processes 366 MATH3801-MATH3901 Stochastic Processes 367
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Revision (?): Landau’s small o notation Revision (?): Landau’s small o notation
Note that o( ) does not refer to any specific function An example from statistical inference:
2
It denotes any quantity that goes to 0 at a faster rate than , as !0 Suppose that we have a i.i.d. sample X1 , X2 , . . . , Xn ⇠ N (µ, ) and the
parameter of interest is exp(µ)
. . . but it is just a notation ! The Maximum Likelihood Estimator of this parameter is exp(X̄ ), where
Pn
For instance, since the sum of two negligible quantities (compared to X̄ = n1 i=1 Xi is the sample mean
) is again negligible compared to , we may write It can be shown that
⇣ 2 2
⌘
o( ) + o( ) = o( ), MSE{exp(X̄ )} = e2µ e2 /n
2e /(2n)
+1
which might be a bit disconcerting at first sight ; it is difficult to appreciate the effect of increasing n
Similarly, we have However, from the power series expansion of the exponential function, we
have
MSE{exp(X̄ )} = e2µ 2 n 1 + 7 4 n 2 /4 + . . .
o( ) ⇥ o( ) = o( )
and c ⇥ o( ) = o( ) or
MSE{exp(X̄ )} = e2µ 2
n 1
+ o(n 1
) as n ! 1,
where c is any constant ; the effect of n increasing is much easier to understand through this
MATH3801-MATH3901 Stochastic Processes 368 MATH3801-MATH3901 Stochastic Processes 369
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Poisson process: Definition 3 Poisson process: 3 equivalent definitions
Poisson process: Definition 3 Poisson process: Definition 3
A Poisson process is a counting process that satisfies A Poisson process is a counting process that satisfies
P N(t + ) N(t) = 0 = 1 + o( )
P N(t + ) N(t) = 0 = 1 + o( ) P N(t + ) N(t) = 1 = + o( )
Poisson process: Definition 2
P N(t + ) N(t) = 1 = + o( ) P N(t + ) N(t) 2 = o( )
A Poisson process is a counting process that satisfies
P N(t + ) N(t) 2 = o( ) for any t > 0 and any small > 0, and has the independent
( t)n
and stationary increment properties P(N(t) = n) = e t
for any t > 0 and any small > 0, and has the independent and n!
stationary increment properties Poisson process: Definition
for any non-negative 1 n and any t > 0, and has the
integer
independent
A Poisson process and stationary
is an arrival process in increment
which the properties
From Definition 2, we have interarrival intervals are i.i.d. and exponentially distributed,
i.e. there exists some parameter > 0 such that
( )k ( )k
P N(t + ) N(t) = k = P(N( ) = k ) = e = (1 + o( )) i.i.d.
k! k! X1 , X2 , . . . ⇠ Exp( )
and Definition 3 follows. Definition 3 also implies Definition 2: proof based on ; use the definition you need, at your convenience!
solving differential equations ; we’ll have a glimpse at this in Chapter 6 ; very flexible when showing facts about the Poisson process
MATH3801-MATH3901 Stochastic Processes 370 MATH3801-MATH3901 Stochastic Processes 371
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Independent Poisson processes Combining Poisson processes
First, N(t) has the stationary and independent increment properties,
since N1 (t) and N2 (t) have them and are independent (direct
Definition consequence of the definitions). Moreover,
We say that two Poisson processes {N1 (t), t 0} and {N2 (t), t 0}
are independent if their interarrival time random variables are P N(t + ) N(t) = 0
independent. Equivalently, the random variables N1 (t1 ) and N2 (t2 ) = P N1 (t + ) N1 (t) = 0, N2 (t + ) N2 (t) = 0
should be independent for all t1 , t2 0 =P N1 (t + ) N1 (t) = 0 P N2 (t + ) N2 (t) = 0
= (1 1 + o( ))(1 2 + o( ))
=1 ( 1 + 2) + o( )
Consider two independent Poisson processes {N1 (t), t 0} and
{N2 (t), t 0} of rates 1 and 2 respectively Similarly, P N(t + ) N(t) = 1 = ( 1 + 2 ) + o( ), and
Define N(t) = N1 (t) + N2 (t) for all t 0 P N(t + ) N(t) 2 = o( )
Is {N(t), t 0} a Poisson process? ; Definition 3 ; {N(t), t 0} is a Poisson process with rate 1 + 2
Note that this easily generalises to the case where more than 2
independent Poisson processes are combined
MATH3801-MATH3901 Stochastic Processes 372 MATH3801-MATH3901 Stochastic Processes 373
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Combining Poisson processes: example Splitting a Poisson process
Example Consider a Poisson process {N(t), t 0}
Suppose the buses 391, 392, 393, 394, 395, 396, 397, 398 and 399, Suppose that each arrival is classified as a Type I arrival with
going to the City, arrive at the bus stop in front of UNSW according to probability p or a Type II arrival with probability 1 p, independently of
independent Poisson processes {Nk (t), t 0} with respective rate k , all other events
k = 391, . . . , 399. You arrive at the bus stop at some time and you Let N1 (t) and N2 (t) denote the number of Type I and Type II arrivals by
would like to go to the City. How long do you expect to wait for the next time t
bus to arrive? Are {N1 (t), t 0} and {N2 (t), t 0} Poisson processes?
You wait for any kind of bus, as they are all going to the City. Define the Yes, with respective rates p and (1 p)
process {N(t), t 0} counting the number of buses of any kind,
P Furthermore, they are independent
arriving at UNSW, that is, N(t) = k399=391 Nk (t). We have that
P (recall Example 3.23, Slides 180-182)
{N(t), t 0} is a Poisson process, with rate k399 =391 k . At the time
you arrive, the process probabilistically restarts itself (memoryless Note: conditionally on {N(t), t 0}, {N1 (t), t 0} and {N2 (t), t 0}
property). Your expected waiting time for the next bus is thus are not independent, as one completely determines the other
1 ; the (unconditional) independence of {N1 (t), t 0} and
E(X1 ) = P399
k
{N2 (t), t
0} might be a little surprising (but it is not!)
k =391
MATH3801-MATH3901 Stochastic Processes 374 MATH3801-MATH3901 Stochastic Processes 375
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Splitting a Poisson process Splitting a Poisson process
Indeed,
{N1 (t), t 0} and {N2 (t), t 0} clearly inherits the stationary and
independent increment properties of {N(t), t 0}
for any t 0, N1 (t) and N2 (t) are independent random variables
for {N1 (t), t 0}, we have
(see Example 3.23)
P N1 (t + ) N1 (t) = 0 = P N1 ( ) = 0 also, for t1 < t2 , the independence of N1 (t1 ) and
= P N1 ( ) = 0|N( ) = 0 P(N( ) = 0) N2 (t2 ) = N2 (t1 ) + (N2 (t2 ) N2 (t1 )) follows from the independent
+ P N1 ( ) = 0|N( ) = 1 P(N( ) = 1) increment property
+ P N1 ( ) = 0|N( ) 2 P(N( ) 2) ; {N1 (t), t 0} and {N2 (t), t 0} are independent Poisson
= 1 ⇥ (1 + o( )) + (1 p)( + o( )) processes
+ o( )
Again, this readily generalises to the case where the initial process is
=1 p + o( )
split in more than 2 sub-processes
; similar for P N1 ( ) = 1 and P N1 ( ) 2 , and for {N2 (t), t 0}
with rate (1 p)
MATH3801-MATH3901 Stochastic Processes 376 MATH3801-MATH3901 Stochastic Processes 377
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Splitting a Poisson process: Example Poisson process: examples
Example 5.15
Example 5.14
Nonnegative offers to buy an item you want to sell arrive according to a
If immigrants to area A arrive at a Poisson rate of ten per week, and if Poisson process with rate . Assume that each offer is the value of a
each immigrant is of English descent with probability 1/12, then what is continuous random variable having density f (x). Once the offer is presented
the probability that no people of English descent will emigrate to area to you, you must either accept it or reject it and wait for the next offer. You
A during the month of February? incur costs at a rate c per unit of time until the item is sold, and you’d like to
maximise your expected return. Your policy is to accept the first offer that is
The initial immigration process {N(t), t 0} is split into English greater than some specified value y . What is the optimal value of y ?
descent immigration (process {N1 (t), t 0}) and non-English descent
immigration (process {N2 (t), t 0}). The former is a Poisson process let X denote the value of the offer, and let F̄ (x) = P(X > x) (sometimes
with rate 10 ⇥ 1/12 = 0.83 immigrant per week. Thus, in February (4 called the survival function)
weeks), offers arrive according to a Poisson process of rate , and will be greater
0.83⇥4
P(N1 (4) = 0) = e ' 0.26 than y with probability F̄ (y )
; splitting of a Poisson process ; offers greater than y arrive according to
a Poisson process of rate F̄ (y )
MATH3801-MATH3901 Stochastic Processes 378 MATH3801-MATH3901 Stochastic Processes 379
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Poisson process: examples Poisson process: examples
; the time Ty until the first offer greater than y (i.e., the offer that you will the condition can be written as
accept) is Exponentially distributed, with parameter F̄ (y ) c
'(y ) = (?)
let Ry = X |(X > y ) cTy be your return, whose expectation you’d like to
maximise with respect to y Z +1
; we have: E(Ry ) = E(X |X > y ) cE(Ty ) with '(y ) = (x y )f (x) dx
y
Z +1
c see that '(y ) = E(max(X y , 0)), and that max(X y , 0) is a
= x fX |X >y (x) dx
0 F̄ (y ) non-increasing function in y
Z +1
f (x) c ; '(y ) is a non-increasing function in y
= x dx
y F̄ (y ) F̄ (y )
✓Z +1 ◆ moreover, '(0) = E(X )
1 c
= x f (x) dx ; if E(X ) < c/ , (?) has no solution ; take y = 0 and accept the first offer
F̄ (y ) y
if E(X ) c/ , the optimal value of y is the solution of (?)
differentiation yields: suppose that = 2 offers/hour, c = 1$/hour and X ⇠ U[10,110]
✓Z +1 ◆
d c R 110 2
E(Ry ) = 0 () y F̄ (y ) + x f (x) dx =0 ; '(y ) = y
1
(x y ) 100 dx = (110200y ) , and '(y ) = 21 if y = 100$
dy y
MATH3801-MATH3901 Stochastic Processes 380 MATH3801-MATH3901 Stochastic Processes 381
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Poisson process: examples Poisson process: examples
(j)
let S1 denote the time of the first arrival in process j (that is, the first
Example 5.17 - The Coupon collecting problem
time one obtains a coupon of type j)
There are r different types of coupons. Each time a person collects a coupon (j) (j)
it is, independently of ones previously obtained, a type j coupon with ; S1 ⇠ Exp(pj ), and let S = max1jr S1
probability pj . Let N denote the number of coupons one needs to collect in ; S denotes the time at which the collection is complete
order to have a complete collection of at least one of each type. Find E(N). as the different Poisson processes are independent, the S1 are
(j)
independent! ✓ ◆
first way of thinking: with Nj the number one must collect to obtain a type (j)
it follows P(S  t) = P max S1 t
j coupon, we have j
N = max Nj ⇣ ⌘
(1) (2) (r )
1jr = P S1  t, S1  t, . . . , S1  t
each Nj ⇠ Geo(pj ), but they are dependent r
Y ⇣ ⌘ r
Y
(j) pj t
= P S1  t = 1 e
; not easy to find the distribution of their maximum
j=1 i=1
suppose now that the coupons are collected at times chosen according
to a Poisson process with rate = 1 (why not?) therefore (see Exercises 2.5 and 2.18):
Z +1 Z +1 r
!
different types of coupons ; splitting of this Poisson process in r Y
pj t
E(S) =P(S > t) dt = 1 1 e dt
independent Poisson processes, with respective rates p1 , . . . , pr 0 0 i=1
MATH3801-MATH3901 Stochastic Processes 382 MATH3801-MATH3901 Stochastic Processes 383
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Poisson process: examples Splitting an artificial Poisson process
PN If arrivals of a Poisson process are split into two new arrival processes,
obviously, S = i=1 Xi , where Xi are the interarrival times in the general
Poisson process with probability p and 1 p independently from one to another, then
i.i.d.
the new processes are Poisson, have respective rates p and (1 p)
; Xi ⇠ Exp(1), and N is independent of the Xi ’s and are independent
; S is a compound random variable
Useful consequence
! N
X Any two independent Poisson processes can be viewed as being
E(S) = E Xi = E(N)E(Xi ) = E(N)
generated from an artificial single process in this way
i=1
if the two independent processes have respective rates 1 and 2,
Z !
+1 r
Y
pj t
the rate of the “initial” process must be 1 + 2
; E(N) = 1 1 e dt
0
; the splitting parameter p must satisfies 1 = p, that is,
i=1
1
Note: The ‘trick’ of assuming that things happen following a fictitious Poisson p=
1 + 2
process whereas it is not (necessarily) the case is called the
‘Poissonisation’ of the process
; this is sometimes a useful working assumption
MATH3801-MATH3901 Stochastic Processes 384 MATH3801-MATH3901 Stochastic Processes 385
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
n arrivals of type I before m arrivals of type II Example
Let {N1 (t), t 0} and {N2 (t), t 0} be two independent Poisson Example
processes with rates 1 and 2
Customers arrive at the post office according to a Poisson process of rate 0.6
; what is the probability that n arrivals in the first one occur before m per minute. There is a single counter, serving the customers in order with
arrivals in the second one? service time exponentially distributed with parameter 1. You are the first
(1) (2) person in the queue. What is the probability that at least one person
Let Sn denote the time of the nth arrival in the first process, and Sm subsequently arrives at the post office before you leave?
the time of the mth arrival in the second process
⇣ ⌘
(1)
; P Sn < Sm ?
(2) Process 1: the customers arrive ; Poisson with rate 1 = 0.6
Process 2: the customers leave after being served ; Poisson with rate
In terms of an artificial split arrival process, this amounts to: 2= 1, as the interarrival times are i.i.d. Exp(1)
out of the first n + m 1 arrivals, what is the probability that n or more independent processes
of them are switched to the first process? we have
2 ✓ ◆✓ ◆i ✓ ◆2
X1✓ ◆✓ ◆i ✓ ◆n+m ⇣ ⌘ X
⇣ ⌘ n+m i
n+m 1
1 i
(1) (2) 2 0.6 1
; P Sn(1) < Sm(2) = 1 2 P S1 < S2 =
i 1 + 2 1 + 2
i 1 + 0.6 1 + 0.6
i=n i=1
(Binomial distribution, see Example 3.3, Slide 153) = 2 ⇥ 0.375 ⇥ 0.625 + 0.3752 = 0.61
MATH3801-MATH3901 Stochastic Processes 386 MATH3801-MATH3901 Stochastic Processes 387
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Conditional distribution of the arrival times Conditional distribution of the arrival times
Suppose you are told that exactly one arrival has taken place by time t,
i.e., N(t) = 1. What is the distribution of the time at which it occurred? Theorem 5.2
Intuition: the process has stationary and independent increments, so it Given that N(t) = n, the joint density of the n arrival times
seems reasonable that each short subinterval in [0, t] of equal length S1 , S2 , . . . , Sn is constant over the region 0 < s1 < s2 < . . . < sn  t,
should have the same probability of containing the arrival ; Uniform? with
n!
Indeed, for any 0 < s < t, fS1 S2 ...Sn |N(t) (s1 , s2 , . . . , sn |n) =
tn
P(X < s, N(t) = 1)
1
P(S1 < s|N(t) = 1) = if 0 < s1 < s2 < . . . < sn  t and 0 otherwise
P(N(t) = 1)
P(N(s) = 1, N(t) N(s) = 0) Proof: For any 0 < s1 < . . . < sn < sn+1 , write, using Bayes’ first rule,
=
P(N(t) = 1) fS1 ...Sn Sn+1 |N(t) (s1 , . . . , sn , sn+1 |n)
P(N(s) = 1) ⇥ P(N(t s) = 0) fS1 ...Sn Sn+1 (s1 , . . . , sn , sn+1 )
= = pN(t)|S1 ...Sn Sn+1 (n|s1 , . . . , sn , sn+1 )
P(N(t) = 1) P(N(t) = n)
se se (t s) s and see that pN(t)|S1 ...Sn Sn+1 (n|s1 , . . . , sn , sn+1 ) = 1 if sn  t and sn+1 > t and 0
= t
= ; U[0,t] -cdf otherwise
te t
; what if N(t) = n?
MATH3801-MATH3901 Stochastic Processes 388 MATH3801-MATH3901 Stochastic Processes 389
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Conditional distribution of the arrival times Conditional distribution of the arrival times
Thus,
fS1 ...Sn Sn+1 (s1 , . . . , sn , sn+1 ) Given that N(t) = n, Sn+1 has the same distribution as t + Zt , where Zt is the
fS1 ...Sn Sn+1 |N(t) (s1 , . . . , sn , sn+1 |n) =
P(N(t) = n) amount of time from t until the first arrival after t (see Slides 356-357)
for 0 < s1 < . . . < sn  t < sn+1 (and 0 otherwise). We know that Zt is known to be Exp( ), independently of the rest, hence:
(sn+1 t)
fS1 S2 ...Sn (s1 , s2 , . . . , sn ) = n
e sn ; fSn+1 |S1 ...Sn ,N(t) (sn+1 |s1 , . . . , sn , n) = fSn+1 |N(t) (sn+1 |n) = e
for 0 < s1 < . . . < sn (Slide 362), so that for sn+1 > t (‘shifted’ Exponential distribution - see Exercise 3.1)
n+1 sn+1 (sn+1 t) It follows,
e n! e n! e (sn+1 t)
n!
fS1 ...Sn Sn+1 |N(t) (s1 , . . . , sn , sn+1 |n) = t(
= fS1 ...Sn |N(t) (s1 , . . . , sn |n) = =
e t)n /n! tn tn e (sn+1 t) tn
for 0 < s1 < . . . < sn  t < sn+1 . But, by the multiplicative law of probability, for 0 < s1 < . . . < sn  t, as announced
fS1 ...Sn Sn+1 |N(t) (s1 , . . . , sn , sn+1 |n)
= fSn+1 |S1 ...Sn ,N(t) (sn+1 |s1 , . . . , sn , n)fS1 ...Sn |N(t) (s1 , . . . , sn |n)
MATH3801-MATH3901 Stochastic Processes 390 MATH3801-MATH3901 Stochastic Processes 391
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Order statistics Order statistics
If the Yi ’s are i.i.d. with probability density f , then, for any
Let Y1 , Y2 , . . ., Yn be n random variables, and denote y1 < y2 < . . . < yn ,
Y(1) = the smallest value among Y1 , Y2 , . . . , Yn 1 Y(1) , Y(2) , . . . , Y(n) will equal (y1 , y2 , . . . , yn ) if (Y1 , Y2 , . . . , Yn ) is
equal to any of the n! permutations of (y1 , y2 , . . . , yn );
Y(2) = the 2nd smallest value among Y1 , Y2 , . . . , Yn 2 for any of these permutations, the joint density of (Y1 , Y2 , . . . , Yn )
Q
. n
.. is i=1 f (yi ) (independence)
Y(n) = the nth smallest value among Y1 , Y2 , . . . , Yn Hence, the joint density of the order statistics is
n
Y
= the largest value among Y1 , Y2 , . . . , Yn
fY(1) ,Y(2) ,...,Y(n) (y1 , y2 , . . . , yn ) = n! f (yi )
i=1
for y1 < y2 < . . . < yn
Definition
1
The vector Y(1) , Y(2) , . . . , Y(n) is called the order statistics of the Now, if the Yi are uniformly distributed over [0, t], that is, if f (y ) = t for
vector (Y1 , Y2 , . . . , Yn ) 0  y  t, then
n!
f
Y(1) ,Y(2) ,...,Y(n) (y1 , y2 , . . . , yn ) = n (0  y1 < y2 < . . . < yn  t)
t
MATH3801-MATH3901 Stochastic Processes 392 MATH3801-MATH3901 Stochastic Processes 393
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Conditional distribution of the arrival times Conditional distribution of the arrival times
Consequence
Given that N(t) = n, the n arrival times S1 , S2 , . . ., Sn have the same This result can be directly used for Si , as, given N(t) = n, it can be
distribution as the order statistics corresponding to n independent thought of as the ith smallest value among n i.i.d. U[0,t] random
random variables uniformly distributed on the interval [0, t] variables, that is, with cdf F (s) = st and density f (s) = 1t for 0  s  t
This observation can be very helpful For 0  s  t, we obtain:
For instance, in Example 2.38 (Slides 109-110), we showed that the n! si 1 (t s)n i
density of the ith smallest value X(i) of n i.i.d. random variables fSi |N(t) (s|n) =
(n i)!(i 1)! tn
X1 , X2 , . . . , Xn
with cdf F and density f is given by
This distribution is a (scaled version of a) Beta distribution
n!
fX(i) (x) = f (x)(F (x))i 1
(1 F (x))n i
(n i)!(i 1)!
MATH3801-MATH3901 Stochastic Processes 394 MATH3801-MATH3901 Stochastic Processes 395
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Beta distribution: definition Beta distribution: cdf and pdf
A random variable is said to have the Beta distribution with parameters
↵ and (↵, > 0), i.e.

1.0
X ⇠ B(↵, ),

3.5
α = 8, β = 2 α = 8, β = 2
α = 4, β = 3 α = 4, β = 3
α = 3, β = 3 α = 3, β = 3
if its probability density function is given by α = 4, β = 8 α = 4, β = 8

3.0
α = 2, β = 3 α = 2, β = 3

0.8
( ↵ 1 1 (1 x)

2.5
x
if x 2 [0, 1]
fX (x) = B(↵, ) (; SX = [0, 1])

0.6
0 otherwise

2.0
dbeta(x, 8, 2)
pbeta(x, 8, 2)
where B(↵, ) is a normalisation constant

1.5
0.4
By integration, it is easy to find FX (x) and

1.0
0.2
↵ ↵

0.5
E(X ) = and Var(X ) =
↵+ (↵ + )2 (↵ + + 1)

0.0
0.0
0.0 0.2 0.4 0.6 0.8 1.0
The Beta distribution is a common model for the behaviour of random
0.0 0.2 0.4 0.6 0.8 1.0
x x
variables limited to intervals of finite length
cdf FX (x) pdf fX (x) = FX0 (x)
MATH3801-MATH3901 Stochastic Processes 396 MATH3801-MATH3901 Stochastic Processes 397
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Conditional distribution of the arrival times Sampling a Poisson process
Imagine that t = 1 (not restrictive, time can always be rescaled If arrivals of a Poisson process of rate are split into two new arrival
accordingly) processes, with probability p and 1 p independently from one to
According to the previous fact, another, then the new processes are Poisson, have respective rates
p and (1 p) and are independent
Si |{N(1) = n} ⇠ B(i, n i + 1)
Suppose now that an arrival occurring at time s will be classified as a
and type I arrival, independently of anything in the past, with probability
i
E(Si |N(1) = n) = p(s), and as type II with probability 1 p(s)
n+1
Also, as Xi = Si Si Let N1 (t) and N2 (t) represent the number of type I and type II arrivals
1, we have also
arriving by time t. Then,
1
E(Xi |N(1) = n) = Proposition
n+1
for all i = 2, 3, . . . , n, as well as For any t, N1 (t) and N2 (t) are independent Poisson random variables
having means
1
E(X1 |N(1) = n) = E(S1 |N(1) = n) = Z Z
n+1 t t
it t E(N1 (t)) = p(s) ds and E(N2 (t)) = (1 p(s)) ds
(In the general case: E(Si |N(t) = n) = n+1 , E(Xi |N(t) = n) = n+1 ) 0 0
MATH3801-MATH3901 Stochastic Processes 398 MATH3801-MATH3901 Stochastic Processes 399
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Sampling a Poisson process Sampling a Poisson process
Proof: With the P( t) distribution of N(t), we have
Let us compute the joint probability P(N1 (t) = n, N2 (t) = m). Conditioning on ✓ ◆ n+m
n+m n t ( t)
the total number of arrivals by time t, we have P(N1 (t) = n, N2 (t) = m) = p (1 p)m e
n (n + m)!
P(N1 (t) = n, N2 (t) = m) = P(N1 (t) = n, N2 (t) = m|N(t) = n + m) pt ( pt)n (1 p)t ( (1 p)t)m
⇥P(N(t) = n + m) =e ⇥e
n! m!
Now, consider an arbitrary arrival. Given that N(t) = n + m, the time S at
which it occurs is uniformly distributed over [0, t], so the probability of the ; N1 (t) and N2 (t) are independent Poisson random variables, with
arrival being of type I is respective parameters
Z t
Z t Z t pt = p(s) ds
1 1
p = E(p(S)) = p(s) ds = p(s) ds, 0
0 t t 0 and
Z ! Z
t t
independently of the other arrivals. Therefore, N1 (t) is binomially distributed,
with parameters n + m and p, so that (1 p)t = t p(s) ds = (1 p(s)) ds
0 0
✓ ◆
n+m n
P(N1 (t) = n, N2 (t) = m|N(t) = n + m) = p (1 p)m Remark: {N1 (t), t 0} and {N2 (t), t 0} are not Poisson processes!
n
They do not have the stationary increment property!
MATH3801-MATH3901 Stochastic Processes 400 MATH3801-MATH3901 Stochastic Processes 401
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Sampling a Poisson process: examples Sampling a Poisson process: examples
Example 5.18: An Infinite Server Queue
Customers arrive at a service station in accordance with a Poisson process
with rate . Upon arrival the customer is immediately served by one of an
infinite number of possible servers, and the service times are assumed to be ; X (t) and Y (t) are two independent Poisson random variables, with
independent with a common cdf G. What is the distribution of X (t), the respective parameters
number of customers that have completed service by time t? What is the Z t Z t
distribution of Y (t), the number of customers that are being served at time t? E(X (t)) = G(t s) ds = G(s) ds
0 0
Define a Type I customer as a customer who completes his service by time t
and Z Z
and a type II customer as a a customer who does not complete his service by t t
time t. A customer enters at time s. Then, E(Y (t)) = (1 G(t s)) ds = (1 G(s)) ds
0 0
he will be of Type I if his service time is less than t s, that is, with
probability G(t s)
he will be of Type II if his service time is more than t s, that is, with
probability 1 G(t s)
MATH3801-MATH3901 Stochastic Processes 402 MATH3801-MATH3901 Stochastic Processes 403
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process
Sampling a Poisson process: examples Sampling a Poisson process: examples
Example 5.19: Minimising the number of encounters define an ‘arrival’ at time s as a car entering the highway at time s
Cars enter a one-way highway in accordance with a Poisson process with say that this arrival is of type I if it results in an encounter with your car
rate . The cars enter at point a and depart at point b. Each car travels at a ; it will encounter you car if its travel time T is such that
constant speed that is randomly determined, independently from car to car,
from the distribution G. When a faster car encounters a slower one, it passes s + T > t + t0 , if s < t
,
it with no time being lost. If your car enters the highway at time t, what speed s + T < t + t0 , if t < s < t + t0
should you adopt to minimise the expected number of encounters you will and this happens with probability
have? 8
< P(s + T > t + t0 ) = 1 F (t + t0 s), if s < t
Suppose that you adopt the speed x. Let d = b a be the length of the road. p(s) = P(s + T < t + t0 ) = F (t + t0 s), if t < s < t + t0
:
As you enter the road at time t, you will leave it at time t + t0 , where t0 = d/x 0 if s > t + t0
is the travel time.
Each other car travels at a speed X , selected according to the cdf G. This ; as cars enter the highway according to a Poisson process of rate , it
results in a travel time T = d/X . Let F denote the distribution of the travel follows that the total number N1 (t + t0 ) of encounters is Poisson with
time, i.e., mean
Z Z !
t t+t0
F (s) = P(T  s) = P(d/X  s) = P(X d/s) = 1 G(d/s) E(N1 (t + t0 )) = (1 F (t + t0 s)) ds + F (t + t0 s) ds
0 t
MATH3801-MATH3901 Stochastic Processes 404 MATH3801-MATH3901 Stochastic Processes 405
5. The Exponential Distribution and the Poisson Process 5.3 The Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Sampling a Poisson process: examples Generalisations of the Poisson Process
that is, The assumptions on which the Poisson process is built state that
Z Z ! 1 the rate is constant (stationary increment property) and
t+t0 t0
E(N1 (t + t0 )) = (1 F (t)) dt + F (t) dt 2 only single arrivals are possible at a time,
t0 0
Differentiation yields among other things
d However, in some cases, these assumptions poorly fit the real situation
E(N1 (t + t0 )) = (1 F (t + t0 ) 1+ F (t0 ) + F (t0 ))
dt0
Setting this equal to 0, we have Example
2F (t0 ) = F (t + t0 ) Let {N(t), t 0} represent the number of customers entering a given
store on a given day
Now, if t is large enough (when did the process start?), F (t + t0 ) ' 1,
; there are probably ‘peak hours’ during the day
and we have
1 ; there are probably customers coming in couple, in family, etc.
F (t0 ) = ,
2
that is, your travel time t0 should be the median of the travel time Hence two generalisations of the Poisson process may be useful:
distribution, or similarly, 1 the non-homogeneous Poisson process
your speed x should be the median of the speed distribution G 2 the compound Poisson process
MATH3801-MATH3901 Stochastic Processes 406 MATH3801-MATH3901 Stochastic Processes 407
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Non-homogeneous Poisson process Sampling a (homogeneous) Poisson process
Definition
A counting process {N(t), t t} is said to be a non-homogeneous
Poisson process with rate function (t) if
Let {N(t), t 0} be a Poisson process with rate , and suppose that
P N(t + ) N(t) = 0 = 1 (t) + o( ) an arrival occurring at time t is, independently of what has occurred
prior to s, counted with probability p(s)
P N(t + ) N(t) = 1 = (t) + o( )
P N(t + ) N(t) 2 = o( ) Proposition
for any t > 0 and any small > 0, and has the independent If Nc (t) denotes the number of counted arrivals by time t, the counting
increment property process {Nc (t), t 0} is a non-homogeneous Poisson process with
rate function (t) = p(t)
Essentially Definition 3 of the Poisson process, but:
; the arrival rate is allowed to vary as a function of time (arrivals
more likely to occur at certain times than at others)
; the process does not have the stationary increment property (for
obvious reasons)
MATH3801-MATH3901 Stochastic Processes 408 MATH3801-MATH3901 Stochastic Processes 409
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Sampling a (homogeneous) Poisson process Combining independent non-homogeneous Poisson
“Proof”: processes
1 for t1 < t2 , Nc (t2 ) Nc (t1 ) depends only on N(t2 ) N(t1 ) and
{p(s) : s 2 (t1 , t2 )}, which are both independent of what has
occurred before time t1 Proposition
; {Nc (t), t 0} has the independent increment property Let {N1 (t), t 0} and {N2 (t), t 0} be independent
non-homogeneous Poisson processes, with respective rate functions
2 For some small > 0,
(t) and µ(t), and let N(t) = N1 (t) + N2 (t) for any t. Then,
P Nc (t + ) Nc (t) 2  P N(t + ) N(t) 2 = o( ); (a) {N(t), t 0} is a non-homogeneous Poisson process with rate
P Nc (t + ) Nc (t) = 1 function (t) + µ(t)
= P Nc (t + ) Nc (t) = 1 N(t + ) N(t) = 1) (b) given that a arrival occurs at time t in the {N(t), t 0} process, it
⇥ P(N(t + ) N(t) = 1) comes from the {N1 (t), t 0} process with probability
+ P Nc (t + ) Nc (t) = 1 N(t + ) N(t) 2) (t)
⇥ P(N(t + ) N(t) 2) (t) + µ(t)
= p(t) + o( ) = (t) + o( )
and similar for showing P Nc (t + ) Nc (t) = 0 = 1 (t) + o( )
MATH3801-MATH3901 Stochastic Processes 410 MATH3801-MATH3901 Stochastic Processes 411
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Combining independent non-homogeneous Poisson Combining independent non-homogeneous Poisson
processes processes
Proof: (part (a))
Proof: (part (b))
1 let I1 , I2 , . . . , In be non-overlapping intervals, and let N1 (I) and N2 (I)
denote the number of arrivals in the interval I for each process. 4 which process caused the arrival at time t is independent of what
N1 and N2 have each the independent increment property and are happened before t (independent increment property). Hence,
independent of each other P N1 (t + ) N1 (t) = 1 N(t + ) N(t) = 1)
; N1 (I1 ), . . . , N1 (In ), N2 (I1 ), . . . , N2 (In ) are all independent
; so are N(I1 ) = N1 (I1 ) + N2 (I1 ), . . . , N(In ) = N1 (In ) + N2 (In ) P N1 (t + ) N1 (t) = 1, N2 (t + ) N2 (t) = 0
=
; N has also the independent increment property P(N(t + ) N(t) = 1)
2 P N(t + ) N(t) 2 (t) + o( )
= P N1 (t + ) N1 (t) = 1, N2 (t + ) N2 (t) = 1 + o( ) =
( (t) + µ(t)) + o( )
= ( (t) + o( )) ⇥ (µ(t) + o( )) + o( ) = o( ) o( )
(t) +
3 P N(t + ) N(t) = 1 = P N1 (t + ) N1 (t) = 1, N2 (t + ) N2 (t) = 0 = o( )
+P N1 (t + ) N1 (t) = 0, N2 (t + ) N2 (t) = 1 ( (t) + µ(t)) +
= ( (t) + o( ))(1 µ(t) + o( )) Letting ! 0 gives the result.
+(µ(t) + o( ))(1 (t) + o( ))
= ( (t) + µ(t)) + o( )
MATH3801-MATH3901 Stochastic Processes 412 MATH3801-MATH3901 Stochastic Processes 413
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Non-homogeneous Poisson process as a sampled Non-homogeneous Poisson process as a sampled
Poisson process Poisson process
Let {N(t), t 0} be a non-homogeneous Poisson process with rate
function (t), and suppose that there exists > 0 such that (t) < ,
for all t 0 Consequence
Imagine another non-homogeneous Poisson process {N ⇤ (t), t 0} Every non-homogeneous Poisson process with a bounded rate
with rate function ⇤ (t) = (t), that is independent of {N(t), t 0} function can be thought of as being a time sampling of a Poisson
process
By the previous proposition, {N(t) + N ⇤ (t), t 0} is a
“non-homogeneous” Poisson process with rate function
; this observation can be very helpful
(t) + (t) =
; constant rate ; usual homogeneous Poisson process! For instance, we know that a sampled version of a Poisson process is
characterised by a counting random variable, namely N(t), that is
(t) Rt
An arrival at time t in this process has probability
of coming from Poisson distributed with mean 0 p(s) ds (Slide 401)
{N(t), t 0} ; this applies to any non-homogeneous Poisson process
; {N(t), t 0} is a sampled version of a Poisson process, with
p(t) = (t)/
MATH3801-MATH3901 Stochastic Processes 414 MATH3801-MATH3901 Stochastic Processes 415
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Mean value function Mean value function
Proposition
If {N(t), t 0} is a non-homogeneous Poisson process, then, for any
Proposition
t 0, N(t) is a Poisson random variable with mean
If {N(t), t 0} is a non-homogeneous Poisson process, then, for any
Z t
t, ⌧ 0, the increment N(t + ⌧ ) N(t) is a Poisson random variable
E(N(t)) = (s) ds
0 with mean Z t+⌧
(s) ds
Proof: apply the above mentioned result with p(s) = (s)/ . t
Definition Proof: the number of arrivals between t and t + ⌧ is the number of arrivals
The function Z between 0 and ⌧ if the process was starting at time t (independence of the
t
past). Thus, N(t + ⌧ ) N(t) is Poisson distributed with mean
m(t) = (s) ds R⌧ R t+⌧
0 0
(t + s) ds = t (s) ds
is called the mean value function of the non-homogeneous Poisson
process
MATH3801-MATH3901 Stochastic Processes 416 MATH3801-MATH3901 Stochastic Processes 417
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Non-homogeneous Poisson process: examples Non-homogeneous Poisson process: examples
Example 5.24 ; the rate function is
Siegbert runs a hot dog stand that opens at 8am. From 8 until 11am, 8
customers seem to arrive, on the average, at a steadily increasing rate that < 5 + 5t if 0  t  3
starts with an initial rate of 5 customers per hour at 8am and reaches a (t) = 20 if 3  t  5
:
maximum of 20 customers per hour at 11. From 11 until 1pm the rate seems 20 2(t 5) if 5  t  9
to remain constant at 20 customers per hour. However, the arrival rate then
with 8am = time 0 and 5pm = time 9
drops steadily from 1pm until closing time 5pm at which time it has the value
of 12 customers per hour. If we assume that the number of customers the number of customers between 8.30am and 9.30am is thus
arriving at Siegbert’s stand during disjoint time periods are independent, then N(1.5) N(0.5), and is Poisson distributed with parameter
what is the probability that no customers arrive between 8.30am and 9.30am Z 1.5
on Monday morning? What is the expected number of arrivals in this period?
(t) dt = 10
0.5
a good model would be to assume that arrivals constitute a Poisson
process ; P(0 customer) = e 10
' 0.00004
however, the arrival rate is not constant ; non-homogeneous Poisson
; the average number of customers between 8.30am and 9.30am is 10
process
MATH3801-MATH3901 Stochastic Processes 418 MATH3801-MATH3901 Stochastic Processes 419
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Distribution of Sn in a non-homogeneous Poisson Compound Poisson process: definition
process Another limitation of the Poisson process is that it only allows single
The distribution of Sn , the time at which the nth arrival occurs, is also arrivals. The compound Poisson process relaxes this restriction
of interest. We have:
Definition
P(t < Sn  t + ) = P(N(t) = n 1, N(t + ) N(t) 1)
A stochastic process {X (t), t 0} is said to be a compound Poisson
= P(N(t) = n 1, N(t + ) N(t) = 1) + o( ) process if it can be represented as
= P(N(t) = n 1)P(N(t + ) N(t) = 1) + o( )
N(t)
X
(m(t))n 1 X (t) = Yi , t 0
= e m(t) ( (t) + o( )) + o( )
(n 1)! i=1
(m(t))n 1
= e m(t) (t) + o( ) where {N(t), t 0} is a Poisson process and {Yi , i = 1, 2, . . .} is a
(n 1)!
sequence of i.i.d. random variables independent of {N(t), t 0}
Hence,
P(t < Sn t+ ) m(t) (m(t))
n 1 Note: for each t, N(t) is a Poisson random variable, so that X (t) is a
fSn (t) = lim =e (t) compound Poisson random variable
!0 (n 1)!
MATH3801-MATH3901 Stochastic Processes 420 MATH3801-MATH3901 Stochastic Processes 421
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Compound Poisson process: examples Compound Poisson process: properties
1 if Yi = 1 for all i, then X (t) = N(t) ; usual Poisson process
2 suppose that buses arrive at a sporting event in accordance with a Because X (t) is a compound Poisson random variable for any t, with
Poisson process, and the number of fans in each bus are E(N(t)) = t, it directly follows from Examples 3.11 and 3.19 (Slides
assumed to be i.i.d.. Then, if X (t) denotes the number of fans who 168 & 176) that
have arrived by t, E(X (t))) = t E(Yi )
N(t)
X
X (t) = Yi and
i=1
Var(X (t)) = t E(Yi2 ),
where N(t) is the number of buses arrived by time t and Yi is the for all t 0. Besides, the compound Poisson process has the
number of fans in bus i ; compound Poisson process
independent and stationary increment properties,
3 suppose that customers leave a supermarket in accordance to a
Poisson process. If Yi , the amount spent by the ith customer, are i.e., X (t + s) X (t) has the same distribution as X (s) and is
i.i.d., then {X (t), t 0} is a compound Poisson process where independent of what happened before time t
PN(t)
X (t) = i=1 Yi is the total amount of money spent by the
customers by time t
MATH3801-MATH3901 Stochastic Processes 422 MATH3801-MATH3901 Stochastic Processes 423
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Compound Poisson process: examples Compound Poisson process: examples
Example 5.26 Example 5.27
Suppose that families migrate to an area at a Poisson rate = 2 per week. If
Consider a single-server service station in which customers arrive according
the number of people in each family is i.i.d. and takes on the values 1, 2, 3
to a Poisson process having rate . An arriving customer is immediately
and 4 with probabilities 1/6, 1/3, 1/3 and 1/6, then what is the expected value
served if the server is free; if not, he waits in line. The successive service
and variance of the number of individuals migrating to this area during a fixed
times are independent with a common distribution. A period when there are
five-week period?
no customers in the system is said to be idle, while periods when there are
Letting Yi denote the number of people in the ith family, we have that customers in the system are said to be busy. A busy period starts when an
arrival finds the system empty, and because of the memoryless property of
5 43 the Poisson arrivals it follows that the distribution of the length of a busy
E(Yi ) = and E(Yi2 ) = .
2 6 period will be the same for each such period. Let B denote the length of a
Hence, letting X (5) denote the number of immigrants during a five-week busy period. Find E(B) and Var(B).
period, we have
5 Let S denote the service time of the first customer in the busy period and let
E(X (5)) = 2 ⇥ 5 ⇥ = 25 people
2 N(S) be the number of subsequent arrivals during that time
and
43 215 if N(S) = 0, B = S (the busy period ends when the initial customer
Var(X (5)) = 2 ⇥ 5 ⇥ = ' 72 people2 leaves the system)
6 3
MATH3801-MATH3901 Stochastic Processes 424 MATH3801-MATH3901 Stochastic Processes 425
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process 5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Compound Poisson process: examples Compound Poisson process: examples
if N(S) = 1: at time S there is a single customer in the system, starting conditioning on S, we have:
his service ; same situation as the initial one ; additional time from S 0 1
N(S)
until the system becomes empty has the same distribution as B, and we X
E(B|S) = S + E @ Bi S A
have i=1
B = S + B1
and 0 1
where B1 is independent of S and has the same distribution as B N(S)
X
if N(S) = n: n customers are waiting in line at time S. As the order in Var(B|S) = Var @ Bi S A
which customers are served does not affect the total service time, each i=1
of these n customers will cause a busy period of length identically
N(S)
distributed as the initial one (independently of the others) X
given S, Bi is a compound Poisson random variable, so that
; B = S + B1 + . . . + Bn i=1
0 1 0 1
N(S) N(S)
; we can express B as X X
N(S) E@ Bi S A = SE(B) and Var @ Bi S A = SE(B 2 )
X
i=1 i=1
B=S+ Bi
i=1 and
where B1 , B2 , . . . are i.i.d. random variables, all distributed as B E(B|S) = (1 + E(B))S and Var(B|S) = SE(B 2 )
MATH3801-MATH3901 Stochastic Processes 426 MATH3801-MATH3901 Stochastic Processes 427
5. The Exponential Distribution and the Poisson Process 5.4 Generalisations of the Poisson Process
Compound Poisson process: examples
Hence,
E(B) = E (E(B|S)) = (1 + E(B))E (S)
implying
E(S)
E(B) =
provided that E(S) < 1
1 E(S)
6 Continuous-Time Markov
(if E(S) 1
; explosion of the system) Chains
Also,
Var(B) = Var(E(B|S)) + E(Var(B|S))
= (1 + E(B))2 Var(S) + E(B 2 )E(S)
= (1 + E(B))2 Var(S) + Var(B) + (E(B))2 E(S)
yielding
3
(1 + E(B))2 Var(S) + (E(B))2 E(S) Var(S) + (E(S))
Var(B) = = 3
1 E(S) (1 E(S))
MATH3801-MATH3901 Stochastic Processes 428
6. Continuous-Time Markov Chains 6.1 Introduction 6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains
Continuous-time Markov Chains: introduction Continuous-time Markov Chain: definition
Consider a Poisson process {N(t), t 0}. Then, for any t, s 0, and
any nonnegative integers i and j, Definition
A continuous-time stochastic process {X (t), t 0} taking on values in
P (N(t + s) = j|N(s) = i, {N(⌧ ), 0  ⌧ < s})
the set of nonnegative integers is a continuous-time Markov Chain if,
= P (N(t + s) N(s) = j i|N(s) = i, {N(⌧ ), 0  ⌧ < s}) for all s, t 0, and nonnegative integers i, j, x(⌧ ), 0  ⌧ < s,
But, due to the independent and stationary properties, we have
P (X (t + s) = j|X (s) = i, {X (⌧ ) = x(⌧ ), 0  ⌧ < s})
P N(t + s) = j|N(s) = i, {N(⌧ ), 0  ⌧ < s}
= P(X (t + s) = j|X (s) = i)
= P (N(t + s) N(s) = j i) = P(N(t) = j i)
; given the present (time s), the future (time t + s) is independent of If, in addition, P(X (t + s) = j|X (s) = i) is independent of s, then the
the past (times 0  ⌧ < s) continuous-time Markov Chain is said to be homogeneous (always
; Markov property, but in continuous time assumed in this course)
A Poisson process is a particular case of a ; P (X (t + s) = j|X (s) = i, {X (⌧ ) = x(⌧ ), 0  ⌧ < s}) = Pij (t)
continuous-time Markov Chain
The functions Pij (t) are called the transition probability functions
It is actually the simplest example of a continuous-time Markov Chain
MATH3801-MATH3901 Stochastic Processes 429 MATH3801-MATH3901 Stochastic Processes 430
6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains 6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains
Continuous-time Markov Chain: memoryless property Continuous-time Markov Chain: Exponential
Suppose that the continuous-time Markov Chain enters state i at some occupation time
time (say, time 0) and we know that it does not leave it during the next
It follows directly that the occupation time Ti must be exponentially
s units of time
distributed (only memoryless continuous distribution)
What is the probability that the process will not leave state i during the
following t units of time? In fact, the Markov property is a kind of ‘forgetting’ property, hence it is
natural that it is strongly related to the memoryless Exponential
The process is in state i at time s, this is the present. So, by the
distribution
Markov property, what will happen next must be independent of the
past This gives another way of defining a continuous-time Markov Chain.
In particular, we must have: Namely, it is a stochastic process with the property that each time it
enters state i
P(Ti > s + t|Ti > s) = P(Ti > t) 1 the amount of time it spends in that state before making a
where Ti is the amount of time that the process stays in state i before transition into another state is exponentially distributed, with some
making a transition into another state (occupation time) parameter vi
This is exactly the ‘memoryless’ property! 2 when the process leaves state i, it next enters state j with some
probability Pij
; in order for the Markov property to hold, Ti must be memoryless
MATH3801-MATH3901 Stochastic Processes 431 MATH3801-MATH3901 Stochastic Processes 432
6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains 6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains
Continuous-time Markov Chain: second definition Continuous-time Markov Chain: rate-based
representation
Suppose there are M + 1 states (SX = {0, 1, 2, . . . , M}) in the Markov
Alternative definition Chain (M may be infinite, though)
A continuous-time Markov chain is a stochastic process which moves
Imagine that each time the process enters state i, M independent
from state to state in accordance with a (discrete-time) Markov Chain,
Exponential random variables are drawn, with respective rates qij 0,
but is such that the amounts of time it spends in each state, before
for all j 6= i. Call those M random variables Tij (all j 6= i)
proceeding to the next state, are independent exponentially distributed
random variables The process then stays in state i for an amount of time equal to
Tij ⇤ = minj6=i Tij and moves to state j ⇤
The discrete-time Markov Chain which governs transitions between
states is called the embedded Markov Chain Now in this new state, say state j, this is repeated, from M independent
exponential r.v. with respective rates qjk , all k 6= j, etc.
This is known as the transition-probability-based representation of the
chain Notice that qij > 0 corresponds to a non-zero transition probability from
i to j, while some qij might be 0 (infeasible transitions)
This is known as the rate-based representation of the chain
MATH3801-MATH3901 Stochastic Processes 433 MATH3801-MATH3901 Stochastic Processes 434
6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains 6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains
Continuous-time Markov Chain: rate-based Continuous-time Markov Chain: rate-based
representation representation
From this rate-based representation, it is easy to retrieve the transition Besides, after this time Ti , we know that the process will move to state
probability-based representation of the Markov Chain j if Tij ⇠ Exp(qij ) is the minimum among the Exp r.v., and this happens
Again, define Ti as the amount of time spent in state i before making a with probability (Slide 342)
transition into any other state. We have qij
Pij = P
j qij
Ti = min Tij
j2SX Inversely, we also see that
where T ⇠ Exp(q ) and are independent. Then, we know (Slide 343)
ij ij qij = vi Pij
that 0 1
X ; the set of qij are known as the instantaneous transition rates
Ti ⇠ Exp @ qij A The instantaneous rates qij contain more information than the
j2SX transition probabilities and are therefore the primitive parameters
Thus, we have X
vi = qij , characterising a continuous-time Markov Chain
j2SX ; the rate-based representation is more common and will preferably
the rate at which the chain leaves state i be used in this class (unlike in the textbook)
MATH3801-MATH3901 Stochastic Processes 435 MATH3801-MATH3901 Stochastic Processes 436
6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains 6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains
A first example A first example
Example 6.1 ; Markov Chain with three states:
Consider a shoe-shine establishment consisting of two chairs, chair 1 and
State 0 system empty
chair 2. A customer upon arrival goes initially to chair 1 where his shoes are
State 1 a customer in chair 1
cleaned and polish applied. After this is done he moves on to chair 2 where
State 2 a customer in chair 2
the polish is buffed. The service time at the two chairs are assumed to be
independent random variables that are exponentially distributed with Clearly, P01 = P12 = P20 = 1, so that we have
respective rates µ1 and µ2 . Suppose that the potential customers arrive in
* *
accordance with a Poisson process having rate , and that a potential 0 c 1 2
customer will enter the system only if both chairs are empty
; the process can be analysed as a continuous-time Markov Chain We must also specify the instantaneous rates associated to those
(Exponential service times, and Exponential interarrival times for possible transitions: *
µ1
*
customers (Poisson process)) 0 c
1 2
µ2
a potential customer will enter the system only if there are no other
customers present ; always either 0 or 1 customer in system Here, because of the particular values of the transitions probabilities (all
equal to 0 or 1), we also have directly
if there is 1 customer in the system, we also need to know which chair
he is presently in v0 = v1 = µ1 v2 = µ2
MATH3801-MATH3901 Stochastic Processes 437 MATH3801-MATH3901 Stochastic Processes 438
6. Continuous-Time Markov Chains 6.2 Continuous-time Markov Chains 6. Continuous-Time Markov Chains 6.3 Birth and Death Processes
Some remarks Birth and Death Processes
Remark 1: A major particular case of continuous-time Markov Chains is the Birth
The quantities shown on the diagram are rates, not probabilities and Death process
They are all positive (if qij = 0, no arc is shown on the diagram), but Consider a system whose state X (t) at any time t is the number of
they are not necessarily bounded by 1 people in the system at that time. Whenever there are i people in the
system, then
; this is sometimes a bit surprising, especially when used to handle i) new arrivals enter the system at an exponential rate i , that is, the
discrete-time Markov Chains (and so diagrams with transition time until the next arrival is exponentially distributed with mean
probabilities) 1/ i , and
It should always be clear from the context whether we are working with ii) people leave the system at an exponential rate µi , that is, the time
rates or with transition probabilities until the next departure is also exponentially distributed with mean
Remark 2: 1/µi
’Self-transitions’ are not allowed in a continuous-time Markov Chain, at ; the parameters i and µi (i 0) are called, respectively, the arrival
least not explicitly (or birth) and departure (or death) rates
In fact, the time the chain will stay in a given state i is driven by the rate ; clearly, this birth and death process is a continuous-time Markov
vi ; this rate makes it up for the persistence of the chain in state i Chain (Exponential inter-transition times)
MATH3801-MATH3901 Stochastic Processes 439 MATH3801-MATH3901 Stochastic Processes 440
6. Continuous-Time Markov Chains 6.3 Birth and Death Processes 6. Continuous-Time Markov Chains 6.3 Birth and Death Processes
Birth and Death Processes Birth and Death Processes: examples
the states of the Markov Chain are {0, 1, 2, . . .}
the transitions from state i > 0 may go only to either state i 1 (if
the next event is a death, hence with rate qi,i 1 = µi ) or state i + 1 Suppose i = for all i 0, µi = 0 for all i 0
(if the next event is a birth, hence with rate qi,i+1 = i )
; we have: ; process in which departures never occur (µi ⌘ 0), this is known as a
0 1 2 pure birth process
* *
0 j 1 j 2 j .( . .
µ1 µ2 µ3 * *
0 1 2 .( . .
Clearly:
v0 = 0, vi = i + µi (i > 0)
The birth rate is constant, so the times between successive arrivals
whence are i.i.d. exponential with mean 1/
i µi
Pi,i+1 = , Pi,i 1 = (i > 0) ; this is the Poisson process!
i + µi i + µi
(i.e., the probabilities that an Exp( i ) occurs before or after an
independent Exp(µi ))
Note: Of course, P01 = 1
MATH3801-MATH3901 Stochastic Processes 441 MATH3801-MATH3901 Stochastic Processes 442
6. Continuous-Time Markov Chains 6.3 Birth and Death Processes 6. Continuous-Time Markov Chains 6.3 Birth and Death Processes
Birth and Death Processes: examples Birth and Death Processes: examples
Consider a population whose members can give birth to new members Example 6.5: M/M/1 queue
but cannot die ; pure birth process
Suppose that customers arrive at a single-server station in accordance with a
Suppose each member acts independently of the others, and gives Poisson process with rate . Upon arrival, each customer goes directly into
birth after an exponentially distributed time with mean 1/ service if the server is free; if not, he joins the queue. When the server
finishes serving a customer, the customer leaves the system and the next
; the birth rate is customer in line, enters the service. The service times are assumed to be
i =i , i 0 independent exponential random variables with mean 1/µ.
This process is known as the Yule process, after George Udny Yule If X (t) denote the number of customers in the system at time t, then
(British statistician, 1871-1951), who used it in his mathematical theory {X (t), t 0} is a birth and death process with parameters
of evolution
qi,i 1 = µi = µ (i 1), qi,i+1 = i = (i 0)
2 3
* *
0 1 2 3 .( . .
Note: see that v0 = 0, which makes 0 an absorbing state 0 j
*
1 j
*
2 j .( . .
µ µ µ
MATH3801-MATH3901 Stochastic Processes 443 MATH3801-MATH3901 Stochastic Processes 444
6. Continuous-Time Markov Chains 6.3 Birth and Death Processes 6. Continuous-Time Markov Chains 6.3 Birth and Death Processes
Birth and Death Processes: examples Birth and Death Processes: examples
Example 6.6: M/M/s queue Example 6.4: A Linear Growth Model with Immigration
Consider a birth and death process in which each individual is assumed to
Consider an exponential queueing system in which there are s servers
independently give birth at an exponential rate . In addition, there is an
available, each serving at rate µ. An entering customer first waits in line and
exponential rate of increase ✓ of the population due to an external source
then goes to the first free server.
such as immigration. Finally, deaths are assumed to occur at an exponential
Birth and death process, with parameters rate µ for each member of the population. Let X (t) denote the population size
at time t. Suppose that X (0) = i. Find an expression for M(t) = E(X (t)).
⇢
iµ, 1i s
qi,i 1 = µi = We have:
sµ, i >s
and qi,i 1 = µi = iµ (i 1), qi,i+1 = i =i +✓ (i 0)
qi,i+1 = i = , i 0 Now, given X (t), we have that, for any small ,
8
* *
.( . . j
, * ,
.( . . < X (t) + 1, with probability (X (t) + ✓) + o( )
0 j 1 j 2 j s 1 s j s+1
µ 2µ 3µ
l
sµ sµ
l
sµ
X (t + ) = X (t) 1, with probability X (t)µ + o( )
(s 1)µ :
X (t), with probability 1 (✓ + X (t) + X (t)µ) + o( )
Note: These queueing systems will be studied in more detail in Chapter 8
(Exponential inter-transition times; ignoring events whose probability is o( ))
MATH3801-MATH3901 Stochastic Processes 445 MATH3801-MATH3901 Stochastic Processes 446
6. Continuous-Time Markov Chains 6.3 Birth and Death Processes 6. Continuous-Time Markov Chains 6.3 Birth and Death Processes
Birth and Death Processes: examples Birth and Death Processes: examples
Hence, E(X (t + )|X (t)) = X (t) + (✓ + X (t) X (t)µ) + o( ), so that Here, if µ 6= , we get, for some constant c,
E(X (t + )) = E(E(X (t + )|X (t)) = M(t) + ( µ)M(t) + ✓ + o( ) ✓
M(t) = ce( µ)t
µ
or
M(t + ) M(t) o( )
=( µ)M(t) + ✓ + To determine the constant c, we use the fact that M(0) = i:
; taking the limit as ! 0, ✓
+ i = c,
µ
M 0 (t) = ( µ)M(t) + ✓
so that
; we have to solve a differential equation, as it will often be the case with ✓ ◆
✓ ✓ ✓
continuous-time Markov Chains M(t) = e( µ)t
i+ = ie( µ)t
+ (e( µ)t
1)
µ µ µ
Fact
If µ = , the solution is
The solution of the differential equation y 0 (x) = a + by (x) is, if b 6= 0,
M(t) = ✓t + i
y (x) = cebx a/b, for some constant c
MATH3801-MATH3901 Stochastic Processes 447 MATH3801-MATH3901 Stochastic Processes 448
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Generator of a Continuous-Time Markov Chain Generator: remarks
For discrete-time Markov Chains, we wrote the n-step transition
Remark 1:
probabilities matrix P[n] in terms of the one-step matrix P, which played
The diagonal elements of G are vi . As vi is the rate at which the
the central role in the analysis
process leaves state i for any other state, vi may be regarded as
On the other hand, in continuous time there is no obvious analogue of quantifying the process persistence in state i
P, since there is no implicit unit length of time
; it makes sense to have it as Gii !
However, the instantaneous transition rates qij (Slide 436) give
Remark 2:
information about what happens instantaneously
G can effectively be seen as the counterpart of the one-step transition
Consider the matrix G, such that Gij = qij for i 6= j, and Gii = vi : matrix P in the discrete case
0 1 P
v0 q01 ... q0M But: the probabilities Pij 2 [0, 1], Pij ⌘ 1,
j
B C
B q10 v1 q1M C P P
G=B . .. . C while the rates Gij 0, j Gij = j6=i qij vi = 0
@ .. . .. A
qM0 qM1 ... vM Remark 3:
In the textbook the generator matrix is quite anonymously introduced in
This matrix is called the (infinitesimal) generator of the Markov Chain Section 6.8, and is denoted R there
and is a fundamental quantity associated with it
MATH3801-MATH3901 Stochastic Processes 449 MATH3801-MATH3901 Stochastic Processes 450
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Generator: examples Generator: examples
For the Yule process, we have
For the shoe-shine shop Markov Chain (Example 6.1), we have 0 1
0 0 0 ...
0 1 B .. C
0 B . C
B0 C
G=@ 0 µ1 A µ1 G=B
B .. C
. 2 2 C
µ2 µ2 0 B
B
C
C
@ 3 3 A
For the Poisson process, we have .. ..
. .
0 1
0 ... For a general birth and death process, we have
B C
B 0 0 C 0 1
G=B 0 0 C 0 0 0 ...
@ A
. .. .. .. .. B ( + µ1 ) 0 C
.. . . . . B µ1 1 1 C
G=B 0 µ2 ( 2 + µ2 ) 0 C
@ 2 A
. .. .. .. ..
.. . . . .
MATH3801-MATH3901 Stochastic Processes 451 MATH3801-MATH3901 Stochastic Processes 452
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Transition probability functions Transition probability functions
In a homogeneous continuous-time Markov Chain {X (t), t 0} , we For each t, we can form the functions Pij (t) into a matrix P(t) whose
know that the probability that the process presently (say, at time s) in (i, j)th entry is Pij (t)
state i will be in state j a time t later is a function of i, j and t only, that ; the matrix 0 1
is, P00 (t) P01 (t) . . .
B C
P X (s + t) = j|X (s) = i, {X (⌧ ), 0  ⌧ < s} = Pij (t), P(t) = @P10 (t) P11 (t) A
. ..
for any i, j nonnegative integers and t, s 0 (Markov property) .. .
; these quantities are called the transition probability functions is thus a matrix-valued function of t
P
Obviously, we have Pij (t) 2 [0, 1] for all i, j and t 0 and j Pij (t) ⌘ 1 Similarly to P[n] = Pn , we expect a strong relationship between P(t)
and G ; true
Just as G is the continuous-time counterpart of P, the functions Pij (t)
can be thought of as the continuous-time counterpart of the n-step Unfortunately, in continuous-time, this relationship takes the form of a
[n]
transition probabilities Pij : system of differential equations, which can rarely be solved in a nice
closed form (see later)
(very) short-term behaviour of the chain: P ⇡ G
mid-term behaviour of the chain: P[n] ⇡ P(t) In simple cases we can avoid this and determine the functions Pij (t)
from what we know
MATH3801-MATH3901 Stochastic Processes 453 MATH3801-MATH3901 Stochastic Processes 454
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Pij (t) for the Poisson process Pij (t) for the Poisson process ( = 1)
Consider a Poisson process with rate
; particular case of a pure birth process

1.0
P00(t)
For any i  j, Pij (t) is the probability of observing j i arrivals within a P01(t)
P02(t)
time t, that is, P(N(t) = j i)
P03(t)
P04(t)
P05(t)

0.8
For the Poisson process of rate , we know that N(t) ⇠ P( t)
Hence,

0.6
( t)j i

P0j(t)
t
Pij (t) = P(N(t) = j i) = e (t 0)
(j i)!

0.4
for any i  j

0.2
For i > j, Pij (t) ⌘ 0
Note: deriving these expressions as solutions of differential equations

0.0
is essentially the proof (that we skipped) that Definition 3 of a Poisson
process implies Definition 2 (Slide 370)
0 2 4 6 8 10
t
MATH3801-MATH3901 Stochastic Processes 455 MATH3801-MATH3901 Stochastic Processes 456
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Pij (t) for the pure birth process Pij (t) for the pure birth process
Consider now a more general pure birth process (µi ⌘ 0), having Similarly,
distinct birth rates 0 i, i 0 1
j
X j
X j
Y
Let Tk denote the time the process spends in state k before making a P(X (t) < j + 1|X (0) = i) = P @ Tk > t A = e kt r
transition into state k + 1 (; Tk ⇠ Exp( k )) k =i k =i r =i
r k
P r 6=k
; kj =i1 Tk is the time it takes until the process enters state j if it is
currently in state i, and follows the Hypoexponential distribution ; P(X (t) = j|X (0) = i) = P(X (t) < j + 1|X (0) = i) P(X (t) < j|X (0) = i),
(Slide 347) so that it follows:
We have: Proposition 6.1
j 1
X For a pure birth process having i 6= j when i 6= j,
(X (t) < j) | (X (0) = i) () Tk > t (if i < j)
k =i j j j 1 j 1
X Y r
X Y r
Therefore, for i < j, kt kt
Pij (t) = e e , for i < j
0 1 k =i r =i
r k
k =i r =i
r k
Xj 1 j 1
X j 1
Y r 6=k r 6=k
kt r
P(X (t) < j|X (0) = i) = P @ Tk > tA = e
r k it
k =i k =i r =i Also, Pii (t) = P(Ti > t) = e (exponential distribution of Ti )
r 6=k
MATH3801-MATH3901 Stochastic Processes 457 MATH3801-MATH3901 Stochastic Processes 458
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Pij (t) for the Yule process Pij (t) for the Yule process ( = 1)
Example 6.8

1.0
P11(t)
Consider the Yule process, which is a pure birth process for which P12(t)
P13(t)
i = i ,i 1. Find Pij (t)
P14(t)
P15(t)

0.8
Take i = 1 and apply the previous result with k = k to find
j j j 1 j 1

0.6
X Y r X Y r
k t k t
P1j (t) = e e
r k r k

P1j(t)
k =1 r =1 k =1 r =1
r 6=k r 6=k

0.4
= ...
t t j 1
=e (1 e )

0.2
t t j 1
; P(X (t) = j|X (0) = 1) = e (1 e )
t
; X (t)|(X (0) = 1) ⇠ Geo(e )

0.0
0 2 4 6 8 10
t
MATH3801-MATH3901 Stochastic Processes 459 MATH3801-MATH3901 Stochastic Processes 460
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
The Negative Binomial Distribution The Negative Binomial Distribution
From the memoryless property of the Geometric distribution (Slide
Geo(p) is the distribution of the number of trials required to observe 337), it can be understood that
the first success in a series of Bernoulli trials
X = Y1 + Y2 + . . . + Yr
What is the distribution of the number X of trials required to observe
the r th success, where r is any positive integer? where Y1 , Y2 , Yr are r independent Geo(p)-rv This representation
Of course, SX = {r , r + 1, . . .}, and gives easy access to the properties of the Negative Binomial
distribution:
✓ ◆ Xr
x
1 r r
P(X = x) = p (1 p)x r
for x 2 SX E(X ) = E(Yi ) =
1 r p
i=1
r
This is known as the Negative Binomial distribution, denoted X r (1 p)
Var(X ) = Var(Yi ) =
p2
i=1
X ⇠ NegBin(r , p)
r
Y ✓ ◆r
p
'X (t) = 'Yi (t) = t
e 1+p
i=1
MATH3801-MATH3901 Stochastic Processes 461 MATH3801-MATH3901 Stochastic Processes 462
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Pij (t) for the Yule process Chapman-Kolmogorov equations
Example 6.8 Lemma 6.3: Chapman-Kolmogorov equations
Consider the Yule process, which is a pure birth process for which In a continuous-time Markov Chain, we have, for all s, t 0,
i = i ,i 1. Find Pij (t) X
Pij (t + s) =
Pik (t)Pkj (s)
In the Yule process, X (t)|(X (0) = i) is the sum of i independent
k 2SX
Geo(e t )-rv, and has therefore the Negative Binomial distribution:
X (t)|(X (0) = i) ⇠ NegBin(i, e t
) Proof: (essentially same as the proof in the discrete case)
Hence Pij (t + s) = P (X (t + s) = j|X (0) = i)
X
✓ ◆ = P (X (t + s) = j|X (t) = k , X (0) = i) P (X (t) = k |X (0) = i)
j 1 i t t j i
Pij (t) = P(X (t) = j|X (0) = i) = e (1 e ) , k 2SX
i 1 X
= P (X (t + s) = j|X (t) = k ) P (X (t) = k |X (0) = i)
for 1  i  j
k 2SX
X
In these ‘simple’ cases, we have been able to directly derive the functions Pij ; = Pik (s)Pkj (t)
k 2SX
In less classical situations, though, we will have to solve the system of
differential equations ; in matrix form, this can be concisely written P(t + s) = P(t) P(s)
MATH3801-MATH3901 Stochastic Processes 463 MATH3801-MATH3901 Stochastic Processes 464
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Chapman-Kolmogorov equations: remark Derivatives of Pij (t) at 0
It seems clear that we have Pii (0) = 1 and Pij (0) = 0 if i 6= j. Besides,
On Slide 333, we showed that if a real-valued function g was such that Lemma 6.2
g(t + s) = g(t)g(s) for all t, s 0, then g must be the exponential In a continuous-time Markov Chain, we have
function 1 Pii ( )
(a) Pii0 (0) = lim = vi
Here, P(t) satisfies P(t + s) = P(t) P(s) for all t, s 0, but is !0
matrix-valued Pij ( )
(b) Pij0 (0) = lim = qij when i 6= j
!0
; things are not as simple
Proof:
However we can expect the exponential function to play a role in the
functions Pij (t) (a) Ti ⇠ Exp(vi ) for all i
(look at Slides 455, 458 or 463 to get convinced) ; one transition out of state i in a short time has probability vi + o( )
(a second transition back to i has probability o( ))
A first step in that direction is given by the following result 1 Pii ( ) o( ) !0
; Pii ( ) = 1 vi + o( ) ; vi = + ; Pii0 (0) = vi
Pij ( ) o( )
(b) Pij ( ) = vi Pij + o( ) = qij + o( ) ; qij = +
MATH3801-MATH3901 Stochastic Processes 465 MATH3801-MATH3901 Stochastic Processes 466
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Kolmogorov’s Backward Equations Kolmogorov’s Backward Equations: examples
Theorem 6.1: Kolmogorov’s Backward Equations
Example 6.9
In a continuous-time Markov Chain, for all states i, j and time t 0,
Write the Kolmogorov’s backward equations for the pure birth process
X
Pij0 (t) = qik Pkj (t) vi Pij (t) For a pure birth process, we know that qij = i j,i+1 (Kronecker delta) and
k 6=i vi = i . Hence, the equations are:
Proof: from the Chapman-Kolmogorov equations, we have Pij0 (t) = i (Pi+1,j (t) Pij (t))
X
Pij (t + ) Pij (t) =
Pik ( )Pkj (t) Pij (t) for all i, j 0
k 2SX
X ; solving this system of differential equations, with the initial conditions
= Pik ( )Pkj (t) (1 Pii ( ))Pij (t) Pij (0) = i,j , yields Pij (t) for all i, j
k 6=i it
0 1 This could be achieved recursively, using the fact that Pii (t) = e
Pij (t + ) Pij (t) X Pik ( ) 1 Pii ( ) (Exponential distribution of Ti )
; lim = lim @ Pkj (t) Pij (t)A
!0 !0
k 6=i It can be checked that the functions Pij (t) we derived on Slide 458 satisfy
these equations, indeed
The limit and the summation can be interchanged, and the result follows
MATH3801-MATH3901 Stochastic Processes 467 MATH3801-MATH3901 Stochastic Processes 468
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Kolmogorov’s Backward Equations: examples Kolmogorov’s Backward Equations: examples
Example 6.11 (in which we are able to solve the equations)
Example 6.10 Consider a machine that works for an exponential amount of time having
Write the Kolmogorov’s backward equations for a general birth and death mean 1/ before breaking down; and suppose that it takes an exponential
process amount of time having mean 1/µ to repair the machine. If the machine is in
working condition at time 0, what is the probability that it will be working at
For a general birth and death process, we know qi,i+1 = i (i 0), qi,i 1 = µi time t?
(i 1) and qij = 0 for any other value j, with v0 = 0 and for i 1 vi = i + µi
We have continuous-time Markov Chain with two states: working (state 0)
The equations become: for all j, and being repaired (state 1)
*
P0j0 (t) = 0 (P1j (t) P0j (t)), 0 j 1
µ
Pij0 (t) = i Pi+1,j (t) + µi Pi 1,j (t) ( i + µi )Pij (t), i >0
with q01 = v0 = and q10 = v1 = µ. We desire P00 (t).
; again, solving this system of differential equations, with the initial
; solve Kolmogorov’s backward equations
conditions Pij (0) = i,j , would give the functions Pij (t) for all i, j
; system of differential equations: 4 differential equations for 4 functions
; really complicated P00 (t), P01 (t), P10 (t), P11 (t)
; in fact: 2 differential equations for 2 functions (as Pi0 (t) =1 Pi1 (t) 8t)
MATH3801-MATH3901 Stochastic Processes 469 MATH3801-MATH3901 Stochastic Processes 470
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Kolmogorov’s Backward Equations: examples Summary
The relevant equations are: A continuous-time Markov Chain is characterised by the instantaneous
P q
⇢ transition rates qij ,
which also define and ; = Pij qij , Pij =vi vi j
P ij
P00
0
(t) = (P10 (t) P00 (t)), (1) j qij
P10
0
(t) = µ(P00 (t) P10 (t)) (2) (Note: qii = 0)
Then, µ⇥(1) + ⇥(2): Lemma 6.3: Chapman-Kolmogorov equations
0 0
µP00 (t) + P10 (t) = 0 In a continuous-time Markov Chain, we have, for all s, t 0,
Integrating: X
Pij (t + s) =
Pik (t)Pkj (s)
µP00 (t) + P10 (t) = c
k 2SX
for some constant c. But we know P00 (0) = 1 and P10 (0) = 0, so that c = µ
; P10 (t) = µ(1 P00 (t)) Theorem 6.1: Kolmogorov’s Backward Equations
In a continuous-time Markov Chain, for all states i, j and time t 0,
In (1): P00
0
(t) = µ (µ + )P00 (t), and solving using the result on Slide 447,
X
Pij0 (t) =
qik Pkj (t) vi Pij (t)
µ µ µ
; P00 (t) = e (µ+ )t
+ , P10 (t) = e (µ+ )t
+ k 6=i
µ+ µ+ µ+ µ+
P
; P00 (t) is the probability that the machine is working at time t Proof: Pij (t + ) Pij (t) = k 2SX Pik ( )Pkj (t) Pij (t) = . . .
MATH3801-MATH3901 Stochastic Processes 471 MATH3801-MATH3901 Stochastic Processes 472
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Kolmogorov’s Forward Equations Kolmogorov’s Forward Equations: examples
Theorem 6.2: Kolmogorov’s Forward Equations Example
In a continuous-time Markov Chain, for all states i, j and times t 0, Find the Kolmogorov’s Forward Equations in the pure birth process
under suitable regularity conditions, We know qij = and vi =
i j,i+1 i. The equations thus reduce to
X
Pij0 (t) =
Pik (t)qkj Pij (t)vj Pij0 (t) = j 1 Pi,j 1 (t) j Pij (t), for all i, j
k 6=j However, note that Pij (t) ⌘ 0 if j < i (no deaths), so that the non-trivial
equations are ⇢
Proof: (similar to that for the Backward Equations) Pii0 (t) = i Pii (t)
0 1 Pij0 (t) = j 1 Pi,j 1 (t) j Pij (t) for j i +1
Pij (t + ) Pij (t) X Pkj ( ) 1 Pjj ( )
lim = lim @ Pik (t) Pij (t)A From the first equation and Pii (0) = 1, it follows
!0 !0
k 6=j it
Pii (t) = e (t 0)
; Exponential distribution of Ti , and further, recursively,
; the assumed regularity conditions allow us to interchange the limit and
Z t
the summation, and the result follows jt js
Pij (t) = j 1e e Pi,j 1 (s) ds j i +1
Note: the regularity conditions hold in most models, including all birth and 0
death processes and all finite state models (again, one can check that this yields the functions Pij (t) on Slide 458)
MATH3801-MATH3901 Stochastic Processes 473 MATH3801-MATH3901 Stochastic Processes 474
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Kolmogorov’s Forward Equations: examples Kolmogorov’s equations: remark
Kolmogorov’s backward and forward equations are indeed two different
Example 6.12 sets of differential equations with the same solutions {Pij (t)} (with the
Find the Kolmogorov’s Forward Equations in the birth and death process boundary conditions Pij (0) = i,j )
We know v0 = 0 , q0j = 0 j,1 and for any i 1, vi = i + µi , qi,i+1 = i,
The backward equations express Pij0 (t) in terms of what might have
P
qi,i 1 = µi and qij = 0 for other values of j. The equations are: happened before reaching state j ⇠ k 6=i qik Pkj (t) ,
⇢ while the forward equations do it in terms of what might happen when
Pi0
0
(t) = µ1 Pi1 (t) 0 Pi0 (t) P
Pij0 (t) = j 1 Pi,j 1 (t) + µj+1 Pi,j+1 (t) ( i + µi )Pij (t) leaving state i ⇠ k 6=j Pik (t)qkj
for all states i, j In some situations, it is easier to solve the backward equations, in
some others forward equations are more convenient to work with
; virtually impossible to solve analytically
In many situations, though, none can be solved in closed form
MATH3801-MATH3901 Stochastic Processes 475 MATH3801-MATH3901 Stochastic Processes 476
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Kolmogorov’s equations Kolmogorov’s equations
With the matrices G and P(t), the Kolmogorov’s backward and forward Proposition
equations
The solution of both equations P0 (t) = GP(t) and P0 (t) = P(t)G, with
X X
Pij0 (t) = qik Pkj (t) vi Pij (t) and Pij0 (t) = qkj Pik (t) vj Pij (t) the boundary conditions P(0) = I, is given by
k 6=i k 6=j
t 2 G2 t 3 G3
P(t) = I + tG + + + . . ., for t 0
just become 2! 3!
P0 (t) = GP(t) and P0 (t) = P(t)G, Proof: differentiating element-wise the suggested P(t) with respect to t yields
t 2 G3 t 3 G4
where we have defined P0 (t) = G + tG2 + + + ...
2! 3!
0 0 1 ✓ ◆
P00 (t) P 0 t 2 G2 t 3 G3
01 (t) . . . = G I + tG + + + ...
B 0 C 2! 3!
P0 (t) = P (t) P (t)
0
@ 10 11 A
. .. = GP(t)
.. .
Besides, P(0) = I. Similarly,
To solve those matrix differential equations, we should add the ✓ ◆
t 2 G2 t 3 G3
boundary condition P(0) = I (identity matrix) P0 (t) = I + tG + + + . . . G = P(t)G
2! 3!
MATH3801-MATH3901 Stochastic Processes 477 MATH3801-MATH3901 Stochastic Processes 478
6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t) 6. Continuous-Time Markov Chains 6.4 The Transition Probability Function Pij (t)
Kolmogorov’s equations Kolmogorov’s equations
By analogy with It might look like this easily provides an explicit expression for all the
1 transition probabilities Pij (t) (without solving differential equations)
X xk x2 x3
ex = =1+x + + + ..., However, this is not really the case: the sum is infinite, and we often
k! 2! 3!
k =0 have to compute many of its terms to arrive at a good approximation
t 2 G2 t 3 G3 as t grows
the matrix P(t) = I + tG + 2! + 3! + . . . is often denoted
The idea still suggests some approximation methods for P(t). For
P(t) = etG (matrix exponential) n
instance, we know that ex = limn!1 1 + xn , so that
Careful! The exponential of a matrix is not just the matrix whose ✓ ◆n
t
elements have been exponentiated ! P(t) ' I+G , for n large enough
n
In fact, ‘etG ’ is just a shorthand for the infinite sum above
Note: the solution P(t) = etG clearly shows how fundamental G is to a We see that finding P(t) is not an easy task and requires a lot of work
continuous-time Markov Chain ; is it really worth it ?
P(t) = etG is the continuous-time analogue of P[n] = Pn in In discrete-time, we rarely worked with P[n] , but we tried to identify a
discrete-time stationary distribution which was usually the feature of most interest
MATH3801-MATH3901 Stochastic Processes 479 MATH3801-MATH3901 Stochastic Processes 480
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Limiting Probabilities: heuristics Limiting probabilities
In Example 6.11 (continuous-time Markov Chain with 2 states 0 and 1,
Slide 470), we found that: From what we know about discrete-time Markov Chain, we can expect
µ µ µ that, in ‘nice’ situations, there exists some probability ⇡j such that, for
P00 (t) = e (µ+ )t + , P10 (t) = e (µ+ )t + any initial state i,
µ+ µ+ µ+ µ+
For instance, with = 2 and µ = 3: lim Pij (t) = ⇡j
t!1
After a long time, the initial state does no longer matter, and the chain
enters a stationary behaviour, that is, if P(X (s) = j) = ⇡j for all j at
1.0

Poo (t) ; when t ! 1, P00 (t) and P10 (t) some time s, then P(X (s + t) = j) = ⇡j for all j for all subsequent time
converge to the same limit
0.8

s+t
; long-term behaviour of the
Besides, we know that ⇡j can also be regarded as the long run
0.6

chain apparently independent

Pij(t)

proportion of time that the process is in state j

of the initial state
0.4

; the initial-state-dependent Consequently, ⇡j is also the unconditional probability of finding the

process in state j at any given time
0.2

P10(t) exponential term in P00 (t) and

P10 (t) is a transient effect
0.0

MATH3801-MATH3901 Stochastic Processes 481 MATH3801-MATH3901 Stochastic Processes 482

t
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Limiting probabilities Limiting probabilities: Balance Equations
If the limit ⇡j exists for all j, then, from the Chapman-Kolmogorov We have the following equivalences:
equations, we can write, for all t 0: 1
X (tG)k
X ⇡ = ⇡P(t) for all t 0 () ⇡ = ⇡ for all t 0
lim Pij (s + t) = lim Pik (s)Pkj (t) k!
s!1 s!1 k =0
k 2SX 1
X tk
X
⇡j = ⇡k Pkj (t) () 0 = ⇡ Gk for all t 0
k!
k =1
k 2SX
() 0 = ⇡Gk for all k 1
Defining the vector ⇡ = (⇡0 , ⇡1 , ⇡2 , . . .), we have, in matrix form,
() 0 = ⇡G
⇡ = ⇡P(t) for all t 0, Explicitly, this is
X
which is the essential condition for ⇡ to be a stationary distribution for ⇡k qkj = ⇡j vj (for all state j)
the chain k 6=j
; solving this system of equations for ⇡ would require knowing P(t)
; these equations are called the balance equations, as they can
However, writing etG for P(t) in this expression would yield a much be interpreted as
simpler condition long run rate into state j = long run rate out of state j
MATH3801-MATH3901 Stochastic Processes 483 MATH3801-MATH3901 Stochastic Processes 484
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Limiting probabilities: Balance Equations Balance equations: example
The balance equations Example
⇡G = 0, Consider the machine of Example 6.11. In the long-term, what is the
together with the normalisation constraint proportion of time that the machine is working?
X The problem was represented by a 2-state continuous time Markov Chain
⇡j = 1, (state 0: the machine is working, state 1: it is being repaired)
j2SX
*
0 j 1
determine the stationary distribution ⇡ (if one exists) µ
(; analogue to ⇡ = ⇡P in the discrete-time case) with generator ✓ ◆
G=
Note: like in the discrete-time case, one of the equations in ⇡G = 0 will µ µ
be redundant, as we know that each row in G sum to 0 The balance equations ⇡G = 0 are thus
⇢
In fact, if ⇡ is solution to ⇡G = 0, then for any c, c⇡ is also solution ⇡0 + µ⇡1 =0
P ⇡0 µ⇡1 =0
; need for the normalisation j ⇡j , to make ⇡ into a genuine The second equation is obviously useless, and the constraint ⇡0 + ⇡1 = 1 is
probability distribution required to unequivocally determine ⇡
MATH3801-MATH3901 Stochastic Processes 485 MATH3801-MATH3901 Stochastic Processes 486
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Balance equations: example Stationary distribution of the embedded Markov Chain
Example
Consider the machine of Example 6.11. In the long-term, what is the
On Slide 433, we defined the embedded Markov Chain as the
proportion of time that the machine is working? discrete-time (DT) Markov Chain which dictates the transition
probabilities for the continuous-time (CT) Markov Chain
We find:
µ Let the
P CT Markov Chain have transition rates qij and, as usual, let
⇡0 = and ⇡1 =
µ+ µ+ vi = j6=i qij be the rate out of state i
µ
; the long-term proportion of time that the machine is working is µ+ The one-step transition probabilities of the embedded Markov Chain
See that this is indeed the limit of both P00 (t) and P10 (t) on Slide 481 are (Slide 436) qij
Pij =
; If µ > , the machine is repaired faster than it breaks down, and it works vi
most of the time Denote G and ⇡ the generator and the stationary distribution of the CT
If < µ, the machine fails faster than it is repaired, and it is under repair Markov Chain, and P and the transition matrix and the stationary
most of the time distribution of the embedded Markov Chain
If µ = , then ⇡0 = ⇡1 = 0.5
MATH3801-MATH3901 Stochastic Processes 487 MATH3801-MATH3901 Stochastic Processes 488
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Stationary distribution of the embedded Markov Chain Stationary distribution of the embedded Markov Chain
If ⇡ and both exist, they must satisfy Furthermore, if that stationary distribution is unique (that is, if the
embedded Markov Chain is irreducible and positive recurrent), we may
⇡G = 0 and = P conclude that ⇡ j vj
j =P
; one can wonder whether ⇡ and are the same. The answer is no i2SX ⇡i vi
Writing out the jth equation in each set of equations, we have and thus,
j /vj
⇡j = P
X X i2SX i /vi
⇡j vj = and⇡i qij = i Pij j
i6=j i2SX
Intuitively:
j is the long run proportion of transitions that the CT Markov
Since qij = vi Pij and qjj = 0, we may rewrite the first one as Chain makes into state j
X each time it makes a transition into state j, it stays there for an
⇡j vj = ⇡i vi Pij average of 1/vj units of time (Exponential distribution of Tj )
i2SX
; the product j /vj should be proportional to the long run proportion
; ⇡j vj satisfies the equation for the stationary distribution of the of time that the CT Markov Chain spends in state j, and this is how
embedded Markov Chain ! we interpret ⇡j !
MATH3801-MATH3901 Stochastic Processes 489 MATH3801-MATH3901 Stochastic Processes 490
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Stationary distribution of the embedded Markov Chain Existence of the stationary distribution
Remark: if and ⇡ both exist, the uniqueness of implies the Proposition
uniqueness of ⇡, and ⇡ can be determined from Let {X (t), t 0} be a continuous-time Markov Chain with generator G,
whose embedded Markov Chain is positive recurrent and irreducible
However, the existence of does not guarantee the existence of ⇡!
with (unique) stationary distribution . Then, provided
P
The existence of ⇡ actually all depends on the rates vj : j2SX j /vj < 1, for any i, j we have
for any discrete-time Markov Chain with a stationary distribution , we
lim Pij (t) = ⇡j
may construct infinitely many CT Markov Chains with different sets of t!1
rates {vj }
P where the vector ⇡ = (⇡0 , ⇡1 , . . .) is the unique solution of
Depending on these rates, ⇡ may or may not exist: if i2SX i /vi < 1,
⇡ exists, it does not otherwise ⇡G = 0
Those observations allow us to state the following result P
satisfying the constraint j2SX ⇡j = 1
MATH3801-MATH3901 Stochastic Processes 491 MATH3801-MATH3901 Stochastic Processes 492
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Existence of the stationary distribution: remark Lévy dichotomy
Proposition
In discrete-time, what ensures the existence of the limiting distribution
is the ergodic nature of the chain. Here, we only require the embedded In a continuous-time Markov Chain, for any two states i and j, exactly
Markov Chain to be one of the following is true:
irreducible and Pij (t) = 0 for all t 0,
positive-recurrent Pij (t) > 0 for all t 0
What about the aperiodicity ? We do not need it! Why ? Proof: no proof provided
1 a periodic discrete-time Markov Chain does have a unique ; if the embedded Markov Chain of a continuous-time Markov Chain
stationary distribution (but it is not the limit value of P[n] , as this is irreducible, then starting in state i we could possibly be in any
does not converge, see Slide 257) other state j (including i itself) at any time t > 0
2 continuous-time Markov Chains do not have a period! (no “step” in This leads to a definition of irreducibility for continuous-time Markov
continuous time) Chains: a continuous-time Markov Chain is irreducible if, for any pair of
states (i, j), Pij (t) > 0 for some t > 0
This last point is made explicit by the following result
For instance, a pure birth process is not irreducible, as Pij (t) ⌘ 0 for
j <i
MATH3801-MATH3901 Stochastic Processes 493 MATH3801-MATH3901 Stochastic Processes 494
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Stationary distribution: example Stationary distribution: example
Solving in terms in ⇡0 yields
Example 6.15: the Shoe-shine shop
⇡2 = ⇡0 , ⇡1 = ⇡0
Reconsider the shoe-shine shop of Example 6.1. Determine the proportion of µ2 µ1
time the process is in each of the states 0, 1 and 2
and with the constraint ⇡0 + ⇡1 + ⇡2 = 1, it follows
We have the chain ✓ ◆
µ1 ⇡0 1 + + =1
* * µ2 µ1
0 c 1 2
that is,
µ2 µ1 µ2
⇡0 =
µ1 µ2 + (µ1 + µ2 )
; write the balance equations
and thus
⇡ 0 = µ 2 ⇡2 (balance in state 0) µ1 µ2
⇡2 = , ⇡1 =
µ 1 ⇡1 = ⇡0 (balance in state 1) µ1 µ2 + (µ1 + µ2 ) µ1 µ2 + (µ1 + µ2 )
µ 2 ⇡2 = µ 1 ⇡1 (balance in state 2) Note: the embedded Markov Chain is periodic (period 3), but this does not
prevent the continuous-time Markov Chain from admitting a valid stationary
distribution ⇡ with ⇡j = limt!1 Pij (t) for all i, j = 0, 1, 2
MATH3801-MATH3901 Stochastic Processes 495 MATH3801-MATH3901 Stochastic Processes 496
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Stationary distribution for the birth and death process Stationary distribution for the birth and death process
Consider again our favorite particular case of continuous-time Markov Solving in terms of ⇡0 yields
Chains: the birth and death process
0
From the generator G of the birth and death process, the balance ⇡1 = ⇡0
µ1
equations ⇡G = 0 are seen to be
1 1 0
⇡2 = ⇡1 = ⇡0
µ1 ⇡1 = 0 ⇡0 µ2 µ2 µ1
. .
0 ⇡0 + µ2 ⇡2 = ( 1 + µ1 )⇡1 .. = ..
. .
.. = .. ⇡j =
j 1 j 2... 1 0
⇡0 for any j 1
+ µj+1 ⇡j+1 = ( + µj )⇡j for any j 1 µj µj 1 . . . µ2 µ1
j 1 ⇡j 1 j
P P1
Adding to each equation the one preceding it, we find As j2SX ⇡j = j=1 ⇡j = 1, we need
1
µ j ⇡j = j 1 ⇡j 1 for any j 1 X j 1 j 2... 1 0
⇡0 + ⇡0 =1
µj µj 1 . . . µ2 µ1
Note: these ‘simpler’ equations are sometimes called the local balance j=1
equations, as they express a balance of flow between any pair of states
MATH3801-MATH3901 Stochastic Processes 497 MATH3801-MATH3901 Stochastic Processes 498
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Stationary distribution for the birth and death process Stationary distribution for the birth and death process
so that: 0 1 Example 6.14
1
1
X j 1 j 2... 1 0A Find the stationary distribution (if it exists) for the M/M/1 queueing system
⇡0 = @1 +
µj µj 1 . . . µ2 µ1
j=1 Recall that the M/M/1 queue is a birth and death process, with i ⌘ and
and, for all j 1, µi ⌘ µ. The condition of existence of the stationary distribution becomes:
0 1 1 1 ✓ ◆j
X
1
X <1
j 1 j 2... 1 0 @1 + j 1 j 2... 1 0A µ
⇡j = j=1
µj µj 1 . . . µ2 µ1 µj µj 1 . . . µ2 µ1
j=1
This is a geometric series, which converges iff
; the stationary distribution exists if and only if < 1, that is, <µ
µ
1
X j 1 j 2... 1 0
<1 ; the arrival rate of the customers must be less than the service rate
µj µj 1 . . . µ2 µ1 (intuitive), otherwise the number of customers in the system explodes
j=1
(the chain is then transient, the system is said unstable)
; otherwise, ⇡j ⌘ 0 and the limiting probabilities do not form a Note that µ = also leads to instability (the embedded Markov Chain is the
proper probability distribution ; transient (or null-recurrent) chain symmetric random walk, which is null-recurrent (Slides 237-238)
MATH3801-MATH3901 Stochastic Processes 499 MATH3801-MATH3901 Stochastic Processes 500
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Stationary distribution for the birth and death process Stationary distribution for the birth and death process
If < µ, then Notes:
0 1 1 0 1 1
in the M/M/s queueing system (Example 6.6), the condition of
1 ✓ ◆j 1 ✓ ◆j ✓ ◆ 1
X X 1 existence of a stationary distribution becomes
⇡0 = @1 + A =@ A = =1 ,
µ µ 1 ( /µ) µ s j 1 j
j=1 j=0 X X
+ < 1,
and j!µj (sµ)j
✓ ◆j ✓ ◆ j=1 j=s+1
⇡j = 1 , for all j 1
µ µ which reduces to <1
; if N denotes the number of customers in the system, we see that
sµ
in the linear growth model with immigration (Example 6.4), the
N + 1 ⇠ Geo(1 /µ) condition becomes
1
X ✓(✓ + ) . . . (✓ + (n
(in the long term), so that 1) )
< 1,
n!µn
1 n=1
E(N) = 1=
1 /µ µ
shown (see textbook p.393) to be equivalent to <1
µ
MATH3801-MATH3901 Stochastic Processes 501 MATH3801-MATH3901 Stochastic Processes 502
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities 6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Stationary distribution for the birth and death process Stationary distribution for the birth and death process
Example 6.13: A machine repair model
Consider a job shop that consists of M machines and one serviceman. From the general results, we have directly
Suppose that the amount of time each machine runs before breaking down is ! 1
M
X M (M
exponentially distributed with mean 1/ , and suppose that the amount of time 1) . . . (M n + 1)
that it takes for the serviceman to fix a machine is exponentially distributed ⇡0 = 1+
µn
n=1
with mean 1/µ. (a) What is the average number of machines not in use? (b) ! 1
What proportion of time is each machine in use? M ✓ ◆n
X M!
= 1+
µ (M n)!
Define the state X (t) of the system at time t as the number of machines not n=1
in use at time t. Then, we have a birth and death process
and for n 1,
M (M 1) (M 2) 2 !
✓ ◆n 1
* * , + M ✓ ◆n
X
0 j 1 j 2 j .( . . j M 1
l M M+1 ... M! M!
⇡n = 1+
µ µ µ µ µ
µ (M n)! µ (M n)!
n=1
Note that the states M + 1 and higher have no physical reality, but are defined
so that the process complies to the usual theory of birth and death processes
( i = µi+1 = 0 for i M)
MATH3801-MATH3901 Stochastic Processes 503 MATH3801-MATH3901 Stochastic Processes 504
6. Continuous-Time Markov Chains 6.5 Limiting Probabilities
Stationary distribution for the birth and death process
Hence, (a) the average number of machines not in use is
M M ✓ ◆n M ✓ ◆n
! 1
X X M! X M!
n⇡n = n 1+
µ (M n)! µ (M n)!
n=0 n=1 n=1
and (b) the long-run proportion of time that a given machine is working is
given by
8 Queueing theory
M
X
P(machine working) = P(machine working|n machines not working)⇡n
n=0
M
XM n
= ⇡n
M
n=0
M
X n⇡n
=1
M
n=0
MATH3801-MATH3901 Stochastic Processes 505
8. Queueing theory 8.1 Introduction 8. Queueing theory 8.1 Introduction
Queueing theory: introduction Queueing theory: introduction
As evidenced by the previous chapter’s examples, a major application Queueing theory is the mathematical theory of waiting lines: it is
of continuous-time Markov Chain is to modelling queues concerned with the mathematical modelling and analysis of systems
A queue is a waiting line, which arises in many of our daily activities, that provide service to random demands
like customers waiting at a supermarket checkout counter or tasks ; a queueing model is an abstract description of such a system
waiting to be processed by a computer system
MATH3801-MATH3901 Stochastic Processes 506 MATH3801-MATH3901 Stochastic Processes 507
8. Queueing theory 8.1 Introduction 8. Queueing theory 8.1 Introduction
Historical remarks Queueing theory: introduction
Queueing theory’s history goes back more than 100 years, Nowadays, queueing theory is attractive and popular, as
Johannsen’s “Waiting Times and Number of Calls” (1907) seems to be
the first paper on the subject
1 queues draw strongly from our intuitions about our everyday
activities
But the method used in this paper was not mathematically exact and 2 the understanding of those models uses much of the machinery
therefore, from the point of view of exact treatment, the paper that has
which we have developed in the previous chapters
historic importance is that of Agner Krarup Erlang (1878-1929), a
Danish engineer who worked for the Copenhagen Telephone ; it is mainly seen as a branch of applied probability theory, as its
Exchange, “The Theory of Probabilities and Telephone Conversations” essence is precisely that it takes into account the randomness of
in 1909 the arrival process and the randomness of the service process
Despite its apparent simplicity, this subject is actually one of some
depth and subtlety
In particular, queues are in general non-Markovian (and so quite
difficult to study); Only under certain conditions, their analysis uses
ideas related to Markov Chains
MATH3801-MATH3901 Stochastic Processes 508 MATH3801-MATH3901 Stochastic Processes 509
8. Queueing theory 8.2 Preliminaries 8. Queueing theory 8.2 Preliminaries
Description of a Queueing System Description of a Queueing System
In a queueing system, customers from a population arrive at a service facility The following notation is used to represent the random variables
to receive service associated with a queuing system:
If customers arrive when all the servers are busy attending to other Nq is the number of customers in the queue and Ns is the number
customers, they join the queue until a server is free of customers receiving service
After a customer has been served, he/she leaves the system and will not join N is the total number of customers in the system, thus
the queue again
A server cannot be idle when there are customers to be served: this is the N = Nq + Ns
work conservation rule
W is the time a customer spends in the queue before going to
service (i.e. the waiting time)
S is the time a customer spends in actual service
T is the total time a customer spends in the system, that is,
T =W +S
X is the interarrival time, that is, the time between two successive
arrivals of customers in the system
MATH3801-MATH3901 Stochastic Processes 510 MATH3801-MATH3901 Stochastic Processes 511
8. Queueing theory 8.2 Preliminaries 8. Queueing theory 8.2 Preliminaries
Description of a Queueing System Description of a Queueing System
We must specify a number of details about the queueing system Queueing discipline: the rules that governs how the next customer
before we are able to model it: to receive service is selected, once a server becomes available.
Population: the source of the customers, it can be finite or infinite Several disciplines can be used, like:
I first-in first-out (FIFO): the customers are served in the order they
Arrival process: defines in what manner the customers arrive in
arrive
the system (i.e., the distribution of X ) I last-in first-out (LIFO): the last customer to arrive receives service
Service time distribution: defines the time taken to serve each before those that arrived earlier
customer (i.e., the distribution of S) I service in random order (SIRO): the next customer to receive
service is selected at random from the queue
Capacity: can be finite or infinite. If it is finite, customers that I priority (PNPN): customers are divided into ordered priority classes,
arrive when the system is full are lost higher classes customers receive service before lower classes,
Number of servers: can be one (single-server system) or more even if they arrive later
than one (multi-server system). A special case of a multi-server I etc.
system is the infinite-server system, where each arriving customer It requires only little imagination to think of various other systems
is served immediately. In a multi-server system, some servers can
Obviously, the time a customer spends in the system is a function of
provide faster service than others
the preceding parameters and queueing policies
MATH3801-MATH3901 Stochastic Processes 512 MATH3801-MATH3901 Stochastic Processes 513
8. Queueing theory 8.2 Preliminaries 8. Queueing theory 8.2 Preliminaries
The Kendall Notation The Kendall Notation
The Kendall notation is a classical shorthand notation used to describe
queueing systems. It is written in the form Symbols traditionally used for A and B are:
M, which stands for Markovian interarrival time or service time,
A/B/c/D/E/F that is the Exponential distribution. This also means that
customers arrive according to a Poisson process
where: G which stands for a general (interarrival or service) time
“A” describes the arrival process distribution
“B” describes the service time distribution D which stands for deterministic (non-random)
“c” is the number of servers For instance, we have already introduced the queueing systems:
“D” is the system capacity (the default value is infinity) M/M/1: exponentially distributed interarrival time, exponentially
“E” describes the size of the population (the default value is distributed service time, a single server
infinity) M/M/s: exponentially distributed interarrival time, exponentially
“F ” describes the queueing discipline (the default is FIFO) distributed service time, s servers
When default values of D, E and F are used, we use the simple (see Slides 444-445)
notation A/B/c
MATH3801-MATH3901 Stochastic Processes 514 MATH3801-MATH3901 Stochastic Processes 515
8. Queueing theory 8.2 Preliminaries 8. Queueing theory 8.2 Preliminaries
Little’s formula Little’s formula
Little’s formula is a statement on the relationship between the mean “Heuristic proof”:
number of customers in the system, the mean time spent in the Consider a time ⌧ large enough so that the system has attained its
system, and the average rate at which customers arrive in the system stationary behaviour (if any)
This average arrival rate is given by By time ⌧ , the total number of customers to have entered the system is
A(t) A(⌧ ) = a ⌧ ,
a = lim
t!1 t and the average number to have departed is a (⌧ E(T ))
where A(t) denotes the number of customers arrivals by time t ; the average net number remaining in the system is the difference:
Little’s formula
E(N) = a⌧ a (⌧ E(T )) = a E(T )
E(N) = a E(T )
As such, the formula can also be stated in terms of the number of
Notably, this result is valid for all queueing models, regardless of the customers in queue or in service:
arrival process, the number of servers or queue discipline
E(Nq ) = a E(W ), E(Ns ) = a E(S)
In fact, the validity of this result does not rest on details of any
particular system, but just depends upon long run mass flow
MATH3801-MATH3901 Stochastic Processes 516 MATH3801-MATH3901 Stochastic Processes 517
8. Queueing theory 8.2 Preliminaries 8. Queueing theory 8.2 Preliminaries
Steady-state probabilities Steady-state probabilities
Let N(t) denote the number of customers in the system at time t and As an illustration, consider a one server queueing model in which all
define (assuming the limit exists) customers have service times equal to 1 (deterministic service time),
and where the times between successive customers are always
⇡n = lim P(N(t) = n) greater than 1 (like U[1,2] for instance)
t!1
; it is a G/D/1 queueing model
which is the long-run proportion of time that there will be exactly n Every arrival finds the system empty and every departure leaves it
customers in the system empty, so that
Two other sets of limiting probabilities are {↵n , n 0} and { n, n 0}, ↵0 = 0 =1
where However,
⇡0 6= 1
↵n = proportion of customers that find n in the system when they arrive
as the system is not always empty of customers
n = proportion of customers leaving behind n in the system when they depart On the other hand, it is no coincidence that ↵n = n : arrivals and
departures always see the same number of customers in any system
In a general queueing system, these quantities ⇡n , ↵n and n need not
in which customers arrive and depart one at a time
be equal
(see textbook for a proof of this result)
MATH3801-MATH3901 Stochastic Processes 518 MATH3801-MATH3901 Stochastic Processes 519
8. Queueing theory 8.2 Preliminaries 8. Queueing theory 8.3 Exponential Models
PASTA principle The M/M/1 Queueing system
Hence, on the average, arrivals and departures always see the same M/M/1 is the simplest queueing system:
number of customers. However, they do not, in general, see time customers arrive according to a Poisson process of rate
averages. One important exception where they do is in the case of the service time is exponentially distributed with rate µ
Poisson arrivals (i.e., with Exponential inter-arrival times) this is a single-server service facility
Proposition (the model also assumes the various default values: infinite capacity,
infinite population, FIFO discipline)
Poisson Arrivals See Time Averages, that is, ⇡n = ↵n = n if the arrival
process is Markovian The number of customers in the system can increase or decrease by
at most one at a time. Besides, the Markovian nature of both the
This is known as the PASTA property arrival process and the service time makes {N(t), t 0} a
The reason for that is the independent increment property of the Poisson continuous-time Markov Chain
process. Consider an arbitrary arrival. Knowing that it occurred at time t gives
; it is a birth and death process with constant birth rate and
us no information about what occurred prior to t. Hence, an arrival would just
see the system according to the limiting probabilities ⇡n constant death rate µ
By contrast, in the previous G/D/1 system, knowing that an arrival occurred * *
0 j 1 j 2 j .( . .
at t tells us that there have been no arrivals in [t 1, t) ; what this arrival µ µ µ
observes is not the same as the distribution of the system state at any time t
MATH3801-MATH3901 Stochastic Processes 520 MATH3801-MATH3901 Stochastic Processes 521
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
The M/M/1 Queueing system The M/M/1 Queueing system
Define the traffic intensity of a queueing system (sometimes also
called the load or the utilisation of the system) by the ratio of the mean
service time over the mean inter-arrival time, i.e. On this condition, the local balance equations (see Slide 497)
E(S) ⇡j = µ⇡j+1 , j = 0, 1, 2, . . .
⇢=
E(X )
give ⇡j+1 = ⇢⇡j , j = 0, 1, 2, . . . which recursively lead to
; it can be understood that this is a fundamental quantity in any
queueing system ⇡j = ⇢j ⇡0
In the case of a M/M/1 system, we have S ⇠ Exp(µ) and X ⇠ Exp( ), P1 Pn j ⇡0
hence the traffic intensity is just As j=0 ⇡j must equal 1, we have 1 = ⇡0 j=0 ⇢ = 1 ⇢, that is,
1/µ
⇢= = ⇡0 = 1 ⇢ and ⇡j = (1 ⇢)⇢j , j = 1, 2, . . .
1/ µ
On Slide 500, we showed, from the general theory of birth and death
processes, that {N(t), t 0} admits a stationary distribution
⇡ = (⇡0 , ⇡1 , . . .) if and only if < µ, that is, ⇢ < 1
MATH3801-MATH3901 Stochastic Processes 522 MATH3801-MATH3901 Stochastic Processes 523
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
The M/M/1 Queueing system The M/M/1 Queueing system
What about E(N), E(Nq ), E(Ns ), E(W ) and E(T ) in the long run?
Further,
Using the ergodic theorem (Slide 259), the long-term average number ⇢ ⇢2
of customers in the system is E(Nq ) = E(W ) = =
µ(1 ⇢) 1 ⇢
1
X 1
X ⇢ ⇢
E(N) = j⇡j = (1 ⇢) j⇢j = (1 ⇢) = ⇢ ⇢2 ⇢
(1 ⇢)2 1 ⇢ E(Ns ) = E(N) E(Nq ) = = (1 ⇢) = ⇢
j=0 j=0 1 ⇢ 1 ⇢ 1 ⇢
(compare Slide 501) ; the mean number of customers in service is ⇢, which also equals
The other quantities can be obtained with the help of Little’s formula 1 ⇡0 , that is, the probability that the server is busy (at least 1
Here, as the arrival process is Markovian, the arrival process is a customer in the system)
Poisson process with constant rate: a = , so that See that we have also
E(W )
E(N) 1 1 ⇢=
E(T ) = = = E(T )
µ µ(1 ⇢)
1 1 ⇢ ; ⇢ also gives the proportion the waiting time represents relatively to
E(W ) = E(T ) E(S) = = the total time spent in the system
µ µ µ(1 ⇢)
MATH3801-MATH3901 Stochastic Processes 524 MATH3801-MATH3901 Stochastic Processes 525
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
The M/M/1 Queueing system The M/M/1 Queueing system
Note that the waiting time, the mean time in the system, the mean
number of customers in the system become extremely large as the
traffic intensity approaches one Logically enough, the queue behaves in qualitatively different manners
Imagine that µ = 1, then E(N) = E(W ) = ⇢ depending on whether ⇢ < 1 or ⇢ > 1
1 ⇢. The figure below
illustrates this phenomenon ⇢ > 1: service times exceed interarrival times, on average, so the
server is not able to cope with the rate of arrivals and the queue
50

length increases without limit (the system explodes, unstable

40 system)
; no stationary distribution
30
⇢ < 1: the queue settles down into equilibrium (stable system)

E(N)=E(W)
⇢ = 1: the queue length experiences wild oscillations with no

20
reasonable bound on their magnitudes (unstable system)

10
0
0.0 0.2 0.4 0.6 0.8 1.0
ρ
MATH3801-MATH3901 Stochastic Processes 526 MATH3801-MATH3901 Stochastic Processes 527
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
The M/M/1 Queueing system: example M/M/1 Queueing system: some further properties
Example 8.2 Suppose that a customer that leaves the system is known to have
spent a total of t time units in it (t > 0)
Students arrive at the campus post office according to a Poisson process with
an average rate of one per every 12 minutes, and the service time is What is the (conditional) distribution of the number of customers that
exponential at a rate of one service per 8 minutes. There is only one postal were present in the system when that customer arrived?
worker at the counter, and any arriving student that finds the worker busy
joins the queue. a) What is the probability that an arriving student has to
Bayes’ first rule: for any nonnegative integer n,
wait? b) What is the mean time spent at the post office by an arbitrary P(N = n)
student? c) What is the mean number of students in the post office ? What P(N = n|T = t) = fT |N (t|n)
fT (t)
about a), b) and c) if one student arrives per every 9 minutes on average?
Now, given that N = n, the time spent in the system is distributed as
Clearly we have here a M/M/1 queue with = 1/12 ( = 1/9) and µ = 1/8, the sum of n + 1 independent Exponential r.v. with common rate µ, that
so that
8 8 is, as the Gamma distribution with parameters n + 1 and µ (Slide 340).
⇢= = 0.667 (⇢ = = 0.889) Hence, as P(N = n) = ⇡n (PASTA property),
12 9
so that a) P(arriving student waits) = P(server busy) = ⇢ = 0.667 (⇢ = 0.889) (µt)n ( /µ)n (1 /µ) ( t)n
µt
1 1 P(N = n|T = t) = µe =K
b) E(T ) = µ(1 ⇢) = 1/8⇥1/3 = 24 min (E(T ) = 72 min) n! fT (t) n!
⇢
c) E(N) = 1 ⇢ = 2 students (E(N) = 8) where K is some constant which does not depend on n
MATH3801-MATH3901 Stochastic Processes 528 MATH3801-MATH3901 Stochastic Processes 529
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
M/M/1 Queueing system: some further properties M/M/1/K: single server finite capacity queue
Summing over n yields In the previous model, it is assumed that there is no limit on the
1 1 number of customers that could be in the system at the same time
X X ( t)n t
1= P(N = n|T = t) = K = Ke However, in reality there is always a finite system capacity, say K (if
n!
n=0 n=0
only a physical limit)
Thus, K = e t, showing that So, the queueing system should be represented by
n
t ( t)
P(N = n|T = t) = e , n = 0, 1, 2 . . . * * ,
n! 0 j 1 j 2 j .( . . j K 1
l
*
K
µ µ µ µ µ
; N|T ⇠ P( T )
The local balance equations remain ⇡j = µ⇡j+1 , except that now this
As a by-product, we also have
holds true for j = 0, 1, . . . , K 1 only (⇡j ⌘ 0 if j K + 1)
1
fT (t) = (1 /µ)µe µt
= (µ )e (µ )t
, t 0 It follows that
K ⇡j = ( /µ)j ⇡0 , j = 0, 1, 2, . . . , K
PK
; T ⇠ Exp(µ ) By using the fact that j=0 ⇡j = 1, we obtain
The total amount of time a customer spends in the system is an 1 /µ ( /µ)j (1 /µ)
exponential random variable with rate µ ⇡0 = and ⇡j = , j = 1, 2, . . . , K
1 ( /µ)K +1 1 ( /µ)K +1
MATH3801-MATH3901 Stochastic Processes 530 MATH3801-MATH3901 Stochastic Processes 531
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
M/M/1/K: single server finite capacity queue M/M/1/K: single server finite capacity queue
Now, if ⇢ 6= 1,
Note that in this case there is no need to impose /µ < 1, as the K K K
X X ⇢j (1 ⇢) (1 ⇢) X j
queue length is bounded, by definition E(N) = j⇡j = j = j⇢
1 ⇢K +1 1 ⇢K +1
j=0 j=0 j=0
It might be tempting to think that the average arrival rate should be 0 1
1 1
equal to as previously, but this is not true, as arrivals are blocked (1 ⇢) @X j X
= j⇢ j⇢j A = . . .
when N = K : 1 ⇢K +1
j=0 j=K +1
in the long-term, the arrival rate is when N < K (that is, with ⇢ ⇢K +1
probability 1 ⇡K ) and 0 when N = K (that is, with probability ⇡K ). = (K + 1)
1 ⇢ 1 ⇢K +1
Hence,
(1 ( /µ)K ) (1 ⇢K ) We can then apply Little’s formula to obtain, after some algebra,
a = (1 ⇡K ) = = ✓ ◆
1 ( /µ)K +1 1 ⇢K +1 1 1 ⇢K
E(T ) = K
if we again define ⇢ = µ
µ 1 ⇢ 1 ⇢K
Note: the previous results only hold if ⇢ 6= 1. If ⇢ = 1, it can be shown
1 K
that ⇡j = for j = 0, 1, . . . , K and E(N) =
K +1 2
MATH3801-MATH3901 Stochastic Processes 532 MATH3801-MATH3901 Stochastic Processes 533
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
M/M/1/K: example
Example b) customers will leave without getting their car fixed if they find the garage
Each morning people arrive at Ed’s garage to have their cars fixed. Ed’s full, this happens with probability
garage can only accommodate 4 cars. Anyone arriving when there are
already four cars has to go away without leaving his car for Ed to fix. Ed’s ⇡4 = ( /µ)4 ⇡0 = (45/60)4 ⇥ 0.328 = 0.1037
customers arrive according to a Poisson process with a rate of one customer
; that is also the long-run proportion of customers leaving the garage
per hour, and the time it takes Ed to service a car is Exponentially distributed
without getting their car fixed
with a mean of 45 minutes. a) What is the probability than an arriving
customer finds Ed idle? b) What proportion of customers are not allowed to c) for those who left their car at Ed’s, the expected time they spend there is
leave their car? c) What is the expected time spent at Ed’s garage to have ✓ ◆ ✓ ◆
one’s car fixed? 1 1 ⇢K 1 0.754
E(T ) = K = 45 4⇥
µ 1 ⇢ 1 ⇢K 1 0.75 1 0.754
This is a M/M/1/4 queue with parameters (time expressed in min) = 1/60
and µ = 1/45, so that ⇢ = 0.75 = 51.68 min
Hence, a) the probability to find Ed idle is
1 /µ 1 45/60
⇡0 = = = 0.328
1 ( /µ)K +1 1 (45/60)5
MATH3801-MATH3901 Stochastic Processes 534 MATH3801-MATH3901 Stochastic Processes 535
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
The M/M/s queueing system: the s-server queue The M/M/s queueing system: the s-server queue
Suppose now that there are s identical servers. When a customer The balance equations then reduce to
arrives he is randomly assigned to one of the idle servers until all
⇡j = min(j + 1, s)µ⇡j+1 , j = 0, 1, 2, . . .
servers are busy when a single queue is formed
When there are j  s customers in the system, they are all receiving Iterating on this we obtain
8 ⇣ ⌘ ⇣ ⌘
service so the time until a departure is the minimum of j independent < j
1
exponentials of rate µ ; departure rate is jµ µ j! ⇡0 j = 0, 1, 2, . . . , s
⇡j =
: ( /µ)j
If there are j > s customers in the system, then only s will be in ⇡
s! sj s 0
j >s
P
service, and so the departure rate is sµ With ⇡j = 1, it follows
j
Customers are always arriving at rate 1
⇡0 = ⇣ ⌘j ⇣ ⌘ ⇣ ⌘s ⇣ ⌘
Thus, the system can be represented by Ps 1 1 1 sµ
j=0 µ j! + µ s! sµ
0 j
* *
.( . . j
, *
s j
,
.) . . if < sµ
1 j 2 j s 1
l s+1
k
µ 2µ 3µ (s 1)µ sµ sµ sµ Here define ⇢ = sµ . The condition ⇢ < 1 remains valid for the queue to
admit a stationary distribution ; intuitive
MATH3801-MATH3901 Stochastic Processes 536 MATH3801-MATH3901 Stochastic Processes 537
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
The M/M/s queueing system: the s-server queue M/M/s queueing system: example
A queue can only form when the process is in state s or higher Example
; arriving customers who see the process in a state lower than s do Students arrive at a checkout counter in the cafeteria according to a Poisson
not have to wait process with an average of 15 students per hour. There are two cashiers at
The probability ⇡W that a customer has to wait is the counter, and they provide identical service to students. The time to serve
a student by either cashier is exponentially distributed with a mean of 3
X1 X1
⇡s minutes. Students that find both cashiers busy on their arrival join a single
⇡W = ⇡s + ⇡s+1 + . . . = ⇡j = ⇡s ⇢j = queue. What is the probability that an arriving student does not have to wait?
1 ⇢
j=s j=0 What is the average number of students queueing? What is the average
with ⇡s deduced from the previous slide waiting for a student before being served?
The mean queue length is This is a M/M/2 queueing system with the following parameters (time in
1
X 1
X minutes):
⇢⇡s
E(Nq ) = (j s)⇡j = ⇡s j⇢j = 1/4
(1 ⇢)2 = 15/60 = 1/4, µ = 1/3, ;⇢= = 3/8 (< 1)
j=s+1 j=1 2 ⇥ 1/3
Finally, from Little’s formula with a = , we obtain the mean waiting Hence, ✓ ◆
1 5 3 5
time E(Nq ) ⇡s ⇡0 = ⇣ ⌘2 ⇣ ⌘= and ⇡1 = ⇥⇡0 = ⇥ = 0.34
E(W ) = = 2µ 11 µ 4 11
sµ(1 ⇢)2 1+ µ + µ 2(2µ )
MATH3801-MATH3901 Stochastic Processes 538 MATH3801-MATH3901 Stochastic Processes 539
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
M/M/s queueing system: example Comparison of different queueing systems
It is worth comparing three ‘apparently’ similar queueing systems in
; a student does not have to wait with probability ⇡0 + ⇡1 = 0.795 terms of the mean time a customer spends in the system (E(T )):
1 a single M/M/1 queue with service rate mµ
Similarly, we find
2 a single M/M/m queue with service rate µ
✓ ◆2
1 1 3 m M/M/1 queues in parallel with service rate µ
⇡2 = ⇥ ⇡0 = ⇥ (3/4)2 ⇥ 5/11 = 0.128 Denote the overall arrival rate. Then,
2 µ 2
1 system 1 is a M/M/1 with parameters 1 = and µ1 = mµ
It follows: 2 system 2 is a M/M/m with parameters 2 = and µ2 = µ
⇢⇡2 (3/8) ⇥ 0.128 3 system 3 is, from the view of a single customer, a M/M/1 system
E(Nq ) = = = 0.122 (student queueing) with parameters 3 = /m and µ3 = µ
(1 ⇢)2 (1 3/8)2
Interestingly, all systems have the same traffic intensity
and
E(Nq ) 0.122 ⇢=
E(W ) = = = 0.49 (min) mµ
1/4
(In particular, they all require < mµ to be able to accommodate all
the arriving customers)
MATH3801-MATH3901 Stochastic Processes 540 MATH3801-MATH3901 Stochastic Processes 541
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
Comparison of different queueing systems Comparison of different queueing systems
Now, in terms of the mean time E(T ) spent in the system by a given For µ = 2 and m = 10, the figure below shows E(T ) as a function of ⇢
customer, we find: (thus varying ) for the three systems:
1 for system 1,
1 1

2.0
E(T1 ) = = system 1
µ1 (1 ⇢) mµ(1 ⇢) system 2
system 3
for system 2,

1.5
2
⇡m 1 ⇡m 1
E(T2 ) = + = +
mµ2 (1 ⇢)2 µ2 mµ(1 ⇢)2 µ

E(T)

1.0
3 for system 3,
1 1
E(T3 ) =

0.5
=
µ3 (1 ⇢) µ(1 ⇢)
Clearly, system 3 is the worst system: for instance E(T3 ) is m times

0.0
longer than E(T1 )!
0.0 0.2 0.4 0.6 0.8 1.0
Comparison between system 1 and system 2 is less obvious ρ
MATH3801-MATH3901 Stochastic Processes 542 MATH3801-MATH3901 Stochastic Processes 543
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
Comparison of different queueing systems Comparison of different queueing systems
We see that E(T1 ) is always smaller than E(T2 ) (same holds true for Note that things are a bit different if you are only interested in the time
other values of m and µ) ; system 1 is better than system 2 (in terms customers are waiting in queue E(W )
of the mean total time)
We can understand these observations intuitively in terms of the

2.0
efficiency of resource allocation:
system 1
system 2
system 3
In the case of m parallel M/M/1 queues, there is always a positive

1.5
probability that some servers have many customers in their queues
while other servers are idle ; not efficient at all

E(W)

1.0
In the M/M/m case this cannot happen (work conservation rule)
However, when there are k < m customers in the system, the global
service rate is only k µ

0.5
The M/M/1 queue with service rate mµ always provides service with
rate mµ ; more efficient

0.0
(especially when ⇢ is small, that is when few customers are expected 0.0 0.2 0.4 0.6 0.8 1.0
in the system) ρ
MATH3801-MATH3901 Stochastic Processes 544 MATH3801-MATH3901 Stochastic Processes 545
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
Queueing system with Bulk service Queueing system with bulk service
So far we have assumed that the servers were able to serve only one Actually, it does not matter to know how many customers are
customer at a time effectively being served, what matters is that the server is busy. So the
process can be represented by
However, in many applications, the server accept for service several
customers at the same time * * * * *
0⇤ k 0 jc 1 c 2 c 3 d 4 c .( . . ...
; think of a lift for instance µ µ
µ µ µ µ µ
Such a situation is called bulk service
Suppose, to fix ideas, that the server is able serve up to 2 customers where State 0⇤ represents “no one in service” and State 0 “Server
simultaneously busy, no one waiting”
Now, the balance equations become
Whenever it completes the service, it then serves the next two
customers. However, if there is only one customer in line, then he is ⇡0⇤ = µ⇡0
served by himself ( + µ)⇡0 = ⇡0⇤ + µ⇡1 + µ⇡2
Assume that the service time is Exp(µ) whether it is serving one or two ( + µ)⇡j = ⇡j 1 + µ⇡j+2 , for j = 1, 2, . . .
customers
MATH3801-MATH3901 Stochastic Processes 546 MATH3801-MATH3901 Stochastic Processes 547
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
Queueing system with bulk service Queueing system with bulk service P1
Now the set of equations To obtain ⇡0 , we use ⇡0⇤ + j=0 ⇡j = 1, that is
( + µ)⇡j = ⇡j 1 + µ⇡j+2 0 1
1
X ✓ ◆
has a solution of the form = ↵j µ 1 µ
⇡j ⇡0 1 = ⇡0 @ + ↵ j A = ⇡0 +
1 ↵
Indeed, j=0
( + µ)↵j ⇡0 = ↵j 1
⇡0 + µ↵j+2 ⇡0 ,
Thus,
or
( + µ)↵ = + µ↵3 (1 ↵)
⇡0 =
yields + µ(1 ↵)
p p ↵j (1 ↵)
1 1 + 4 /µ 1 + 1 + 4 /µ ⇡j = for j = 1, 2, . . .
↵ = 1, ↵= or ↵= + µ(1 ↵)
2 2
p µ(1 ↵)
1+ 1+4 /µ ⇡0⇤ =
The only admissible value is the last one ↵ = 2 + µ(1 ↵)
Hence,
µ Note that for this to be valid we need ↵ < 1, that is, < 2µ
⇡j = ↵j ⇡0 (j = 1, 2, . . .) and ⇡0⇤ = ⇡0 (from the first equation) ; intuitive !
MATH3801-MATH3901 Stochastic Processes 548 MATH3801-MATH3901 Stochastic Processes 549
8. Queueing theory 8.3 Exponential Models 8. Queueing theory 8.3 Exponential Models
Queueing system with bulk service Other M/M Queueing systems
All the relevant quantities can now be determined The preceding illustrates that the basic M/M/1 queue can be extended
We have in many directions to model different behaviours, like
1
X 1 defection: when a customer has spent too much time in queue
(1 ↵) X ↵
E(Nq ) = j⇡j = j↵j = and leaves out of frustration without receiving service
+ µ(1 ↵) (1 ↵)( + µ(1 ↵))
j=0 j=0 jockeying: when several servers providing the same service have
Little’s formula with a = then gives each their own queue and some customer leaves his queue and
moves into another supposedly faster queue
E(Nq ) ↵
E(W ) = = balking: when some arriving customers perceive the queue to be
(1 ↵)( + µ(1 ↵))
too long and choose not to join it at all
Then,
1 batch arrivals: when customers arrive in batch (groups of friends,
E(T ) = E(W ) + families, etc.) (more than one arrival at a time)
µ
so that etc.
↵ ; queueing theory is a very powerful technique that can be applied to
E(N) = E(T ) = E(Nq ) + = + many human activities, but also in computer systems,
µ (1 ↵)( + µ(1 ↵)) µ
telecommunications networks, or manufacturing systems
MATH3801-MATH3901 Stochastic Processes 550 MATH3801-MATH3901 Stochastic Processes 551
8. Queueing theory 8.4 Network of queues 8. Queueing theory 8.4 Network of queues
Network of queues Sequential queue
To analyze this system we need to keep track of the number of
customers both at server 1 and at server 2
Consider a two-server system in which customers arrive at a Poisson
So define the states by the pairs (n, m), meaning n customers at
rate at Server 1. After being served by Server 1, they join the queue
Server 1 and m customers at Server 2 (including the one being served)
in front of Server 2
The system can be represented by the ‘two-dimensional’ transition
Each server serves one customer at a time with Server i taking an
diagram
exponential time with rate µi for a service (i = 1, 2)
, ,
Such a system is called a sequential queueing system 0, 0 1, 0 2, 0 .) . .
K µ1 K µ1 K µ1
µ2 µ2 µ2
⇤ ⇤ ⇧
, ,
0, 1 1, 1 2, 1 .) . .
K µ1 K µ1 K µ1
µ2 µ2 µ2
⇤ ⇤ ⇧
, ,
0, 2 1, 2 2, 2 .) . .
K µ1 K µ1 K µ1
µ2 µ2 µ2
MATH3801-MATH3901 Stochastic Processes 552 MATH3801-MATH3901 Stochastic Processes 553
. .⌅ . . .⌅ . . .⇧ .
8. Queueing theory 8.4 Network of queues 8. Queueing theory 8.4 Network of queues
Sequential queue Sequential queue
Hence, we have (stationary distribution of M/M/1, Slide 523):
✓ ◆ ✓n ◆
Then, the balance equations are: P(n at Server 1) = 1 and
µ1 µ1
⇡0,0 = µ2 ⇡0,1 ✓ ◆m ✓ ◆
( + µ1 )⇡n,0 = µ2 ⇡n,1 + ⇡n 1,0 for n = 1, 2, . . . P(m at Server 2) = 1
( + µ2 )⇡0,m = µ2 ⇡0,m+1 + µ1 ⇡1,m 1 for m = 1, 2, . . . µ2 µ2
( + µ1 + µ2 )⇡n,m = µ2 ⇡n,m+1 + µ1 ⇡n+1,m 1 + ⇡n 1,m for n, m = 1, 2, . . . if < min(µ1 , µ2 ). Now, if the numbers of customers at Servers 1 and
We can guess at a solution: 2 were independent, then it would follow that
✓ ✓ ◆✓ ◆n ✓ ◆ ◆m
We first note that the situation at Server 1 is just as in an M/M/1 queue ⇡n,m = 1 1 (?)
µ1 µ1 µ2 µ2
It can be understood that departures from Server 1, that is arrivals at
Server 2, also occur according to a Poisson process with rate As the continuous-time Markov Chain defined by the previous diagram
is irreducible and positive-recurrent (provided < min(µ1 , µ2 )), it has a
; similarly, what happens at Server 2 is also a M/M/1 queue unique stationary distribution
; if (?) satisfies the balance equations on the previous slide, then it
is the stationary distribution
MATH3801-MATH3901 Stochastic Processes 554 MATH3801-MATH3901 Stochastic Processes 555
8. Queueing theory 8.4 Network of queues 8. Queueing theory 8.4 Network of queues
Sequential queue Sequential queue
From this, we see that N, the total number of customers in the system,
is such that
Among others, it is easy to see that X
✓ ◆✓ ◆ ✓ ◆✓ ◆✓ ◆ E(N) = (n + m)⇡n,m
n,m
⇡0,0 = 1 1 = µ2 1 1 = µ2 ⇡0,1 ,
µ1 µ2 µ1 µ2 µ2 X ✓ ◆n ✓ ◆ X ✓ ◆m ✓ ◆
= n 1 + m 1
which is the first balance equation n
µ1 µ1 m
µ2 µ2
It can be checked the same way that all other equations are verified as = +
well µ1 µ2
; (?) is the stationary distribution, and moreover the number of Besides, by Little’s formula, the average time spent in the system is
customers at Server 1 and the number of customers at Server 2 E(N) 1 1
are independent random variables E(T ) = = + ,
µ1 µ2
which is essentially the average time spent at Server 1 plus the
average time spent at Server 2
MATH3801-MATH3901 Stochastic Processes 556 MATH3801-MATH3901 Stochastic Processes 557
8. Queueing theory 8.4 Network of queues 8. Queueing theory 8.4 Network of queues
Sequential queue: remark Network of queues
We said that the number of customers at the two servers at any given
time are independent random variables In many queueing situations, a customer has to wait in a number of
different queues before completing the desired transaction and leaving
However, that does not imply that the waiting times of a given
the system
customer at the two servers are independent
Suppose indeed that is very small with respect to µ1 = µ2 For example, when we go to the registry of motor vehicles to get a
driver’s license, we must wait in one queue to have the application
; almost all customers have zero wait in queue processed, in another queue to pay for the license, and in yet a third
But given that the waiting time in queue 1 is positive for a given queue to obtain a photograph for the license
customer, then it will positive as well in queue 2 with probability
In a multiprocessor computer facility, a job can be queued waiting for
P(S1 < S2 ), where S1 ⇠ Exp(µ1 ) is the service time for this customer
service at one processor, then go to wait for another processor, and so
at Server 1, and S2 ⇠ Exp(µ2 ) is the service time for the previous
forth; frequently the same processor is visited several times before the
customer at Server 2
µ1 job is completed
P(S1 < S2 ) = = 1/2,
µ1 + µ2 Such systems are modeled by a network of queues, and Jackson
as S1 and S2 are independent networks are perhaps the simplest models of such networks
; the waiting times are certainly not independent
MATH3801-MATH3901 Stochastic Processes 558 MATH3801-MATH3901 Stochastic Processes 559
8. Queueing theory 8.4 Network of queues 8. Queueing theory 8.4 Network of queues
Jackson networks Jackson networks: example
In such a model, we have a network of k interconnected queueing A Jackson network with three servers can be represented by
systems (servers)
Each of the k servers receives customers both from outside the r1 r2
network (exogenous inputs) and from other servers within the network P21
s
5 1 e 93 2 i
(endogenous inputs) P11 P12 P22
P31 P32
Customers arrive from outside the system to Server i (i = 1, . . . , k ) P10 P20
according to a Poisson process of rate ri and join the queue at i (if any) P13 P23
⇥ % y ⌧
until their turn at service comes = 3T
r3 P30
Once a customer is served by Server i, he joins the queue (if any) a P33
!
Server j, j = 1, . . . , k , with probability Pij
Pk
Note that here we will assume that j=1 Pij < 1 Arrows like ; are for exogenous arrival rates, the plain ones are for
Pk transition probabilities
;1 j=1 Pij is the probability that the customer departs the system . Pk
(exogenous departure) after being served at Server i We have denoted Pi0 = 1 j=1 Pij the probability of exogenous
departure from Server i
MATH3801-MATH3901 Stochastic Processes 560 MATH3801-MATH3901 Stochastic Processes 561
8. Queueing theory 8.4 Network of queues 8. Queueing theory 8.4 Network of queues
Jackson networks Jackson networks
If we let j denote the total arrival rate to Server j, then The average number of customers in the system is
k
X
X N= (n1 + n2 + . . . + nk )⇡n1 ,n2 ,...,nk
j = rj + i Pij , j = 1, 2, . . . , k
n1 ,n2 ,...,nk
i=1
k
X
It turns out that, similarly to what we observed for sequential queues, the
numbers of customers at each of the servers are independent and are such = average number of customers at Server j
that j=1
✓ ◆ n✓ ◆
j j k
X
P(n customers at Server j) = 1 , n 0, j
µj µj =
µj j
where µj is the exponential rate of service at Server j and j is the solution of j=1
the above equation From Little’s formula, we also have the average time a customer
This can be proved by showing that a stationary distribution of the form spends in the system
◆nj ✓ k ✓ ◆ Pk
Y j j Here, a = j=1 rj . This yields
⇡n1 ,n2 ,...,nk = 1
j=1
µj µj Pk j
j=1 µj j
satisfies the balance equations for this model ; this is sometimes called the E(T ) = Pk
Jackson’s theorem (of course, it is necessary that j < µj for all j) j=1 rj
MATH3801-MATH3901 Stochastic Processes 562 MATH3801-MATH3901 Stochastic Processes 563
8. Queueing theory 8.4 Network of queues 8. Queueing theory 8.4 Network of queues
Jackson networks: remarks Jackson networks: example
The fact that Example 8.6
✓ ◆n ✓ ◆
j j Consider a system of two servers where customers from outside arrive at a
P(n customers at Server j) = 1 , n 0,
µj µj Poisson rate 9 customers per hour. An arriving customer will queue at Server
indicates that the distribution of the number of customers at Server i is 1 with probability 4/9 and at Server 2 with probability 5/9. The service rates of
1 and 2 are respectively 8 and 10. A customer upon completion of service at
the same as in an M/M/1 queue, with rates i and µi
Server 1 is equally likely to go to 2 or to leave the system (i.e., P11 = 0,
However, here, the effective arrival process at Server i is not Poisson P12 = 0.5); whereas a departure from 2 will go 25 percent of the time to 1 and
(in general) will depart the system otherwise (i.e., P21 = 0.25, P22 = 0). Determine the
stationary distribution, the average time spent in the system by a customer,
Besides, it must be understood that the above independence
and the average number of customers in the system
statement is true only for a snapshot: at a given time, the number of
customers at Server i and the number of customers at Server j are As a split Poisson process gives rise to two Poisson processes (Slide 375),
independent we have arrivals at Server 1 and Server 2 according to independent Poisson
processes with respective rates r1 = 4 and r2 = 5
However, if there are many customers currently queueing at Server i, 0.25
4 / 1 t 5
and if Pij is large, we can expect that, some time later, there will be 4 2 o
many customers queueing at Server j 0.5
0.5
0.75
; the processes are not independent
MATH3801-MATH3901 Stochastic Processes 564 MATH3801-MATH3901 Stochastic Processes 565
8. Queueing theory 8.4 Network of queues 8. Queueing theory 8.4 Network of queues
Jackson networks: example M/G/., G/M/. and G/G/. queues
The total arrival rates to both servers, call them 1 and 2 are obtained
M/M/. queues are the only queuing systems which are Markov
from the system of equations
Chains ; the analysis of other queues requires greater ingenuity
⇢ 1
1 =4+ 4 2 If either interarrival times or service times are still exponentially
1
2 =5+ 2 1 distributed (M/G/. or G/M/.), then the general theory of Markov Chains
implying that 1 = 6 and 2 =8 are still insightful for studying the queue
Hence, with µ1 = 8 and µ2 = 10, The reason is that, for each of these two cases, we may find an
✓ ◆ n ✓ ◆ m embedded Markov Chain in the continuous-time queueing system
3 1 4 1
P (n customers at 1, m customers at 2) = ⇥ ⇥ ⇥ If neither interarrival times nor service times are exponentially
4 4 5 5
distributed, then the methods inspired by the Markov property fail
and
6 8 On the other hand, those queueing problems are still intimately related
E(N) = + = 7 customers
8 6 10 8 to random walk problems
E(N) 7 However, studying M/G/., G/M/. or G/G/. in details is well beyond the
E(T ) = = hour (= 46.7 min)
9 scope of the short introduction to queueing theory
MATH3801-MATH3901 Stochastic Processes 566 MATH3801-MATH3901 Stochastic Processes 567
10. Brownian Motion and Stationary Processes 10.1 Brownian Motion
Introduction
Basically, Brownian Motion is the seemingly random movement of
particles suspended in a fluid (i.e. dust motes in water or air) and by
extension the mathematical model used to describe this movement
; it is one of the simplest and most fundamental continuous-time
10 Brownian Motion and stochastic processes, finding applications in numerous situations
A brief history
Stationary Processes 1827 the Scottish botanist Robert Brown observed pollen particles floating in
water executing a jittery motion (Brown was also part of the Matthew
Flinders expedition which first circumnavigated Australia)
1900 the French mathematician Louis Bachelier was the first to model the
return process for the French Stock Market
1905 Albert Einstein gave the first explanation of Brownian motion: particles
are continuously being subject to bombardment by the molecules of the
surrounding medium
1918 the American mathematician Norbert Wiener gave the first precise
mathematical definition of Brownian motion ; Wiener process
MATH3801-MATH3901 Stochastic Processes 568
10. Brownian Motion and Stationary Processes 10.1 Brownian Motion 10. Brownian Motion and Stationary Processes 10.1 Brownian Motion
Introduction Introduction
Imagine a particle in a liquid medium, and let X (t) be the x-component Now, imagine a symmetric random walk in discrete time (Slide 237)
(as a function of time) of its position. Assume that X (0) = x0 1/2 1/2 1/2 1/2 1/2 1/2
+ + * * *
What is the (conditional) density of X (t) given that X (0) = x0 ?
R +1 ... j –2 k –1 k 0 j 1 j 2 j .( . .
it must be fX (t)|X (0) (x|x0 ) 0 and 1 fX (t)|X (0) (x|x0 ) dx = 1 1/2 1/2 1/2 1/2 1/2 1/2
in addition, it can be assumed that the movement is continuous, Suppose that the transitions take place every unit of time and that the
that is limt!0 fX (t)|X (0) (x|x0 ) = 0 for x 6= x0 states represent the position of the particle on the x-axis, expressed in
from the physical principles, Einstein showed that fX (t)|X (0) (x|x0 ) unit of length
must satisfy the diffusion equation
Now speed up the process by taking smaller and smaller steps in
@f @2f smaller and smaller time intervals
=D 2
@t @x
; each t time unit, the process makes a step of size x either to the
where D is a constant called the diffusion coefficient
left or the right with equal probabilities
; if D = 1/2, the only solution of this equation satisfying the above
boundary conditions is ; if X (t) represents the position at time t and X (0) = 0, then
✓ ◆
1 (x x0 ) 2 X (t) = x(⇠1 + ⇠2 + . . . + ⇠[t/ t] )
fX (t)|X (0) (x|x0 ) = p exp ; N (x0 , t)
2⇡t 2t where the ⇠i ’s are i.i.d. with P(⇠i = 1) = P(⇠i = 1) = 1/2
MATH3801-MATH3901 Stochastic Processes 569 MATH3801-MATH3901 Stochastic Processes 570
10. Brownian Motion and Stationary Processes 10.1 Brownian Motion 10. Brownian Motion and Stationary Processes 10.1 Brownian Motion
Introduction Introduction
Now, as E(⇠i ) = 0 and Var(⇠i ) = 1, it follows The changes of value of the random walk in non-overlapping time
intervals are independent
t
E(X (t)) = 0 and Var(X (t)) = ( x)2 ; {X (t), t 0} has the independent increment property
t
p The distribution of the change in position of the random walk over any
Let x and t tend to 0, such that x= t
time interval depends only on the length of that interval
2
; Var(X (t)) = t ; {X (t), t 0} has the stationary increment property
As X (t) can be written as a sum of i.i.d. random variables, it follows The random walk is a Markov Chain, so the Brownian motion
from the Central Limit Theorem that, for any time t, possesses the Markov property:
2
X (t) ⇠ N (0, t) X (t + s)|X (t), {X (u), 0  0 < t} ⇠ X (t + s)|X (t)
; this is the Brownian motion, which can therefore be regarded as a Note: in fact, the independent increment property always implies the
continuous ‘limit’ version of a random walk Markov property ; intuitive
The Brownian motion is thus a continuous-time continuous-space (however, the Markov property does not imply the independent
stochastic process increment property)
MATH3801-MATH3901 Stochastic Processes 571 MATH3801-MATH3901 Stochastic Processes 572
10. Brownian Motion and Stationary Processes 10.1 Brownian Motion 10. Brownian Motion and Stationary Processes 10.1 Brownian Motion
Definition Illustration
Definition Four simulated realisations of a standard Brownian motion
A stochastic process {X (t), t 0} is said to be a Brownian motion if
(i) X (0) = 0; (arbitrary choice)

0.4

1.5
0.2

1.0
(ii) {X (t), t 0} has stationary and independent increments;

0.0
B(t)

B(t)

0.5
(iii) for every t > 0, X (t) is normally distributed with mean 0 and

−0.2

0.0
variance 2 t

−0.4

−0.5
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
With the stationary increments property, it also follows, for any t, s > 0, t t
2
X (s + t)|(X (s) = x) ⇠ N (x, t)

0.0

0.0
−1.2 −1.0 −0.8 −0.6 −0.4 −0.2
Note: when = 1 the process is called the standard Brownian motion

−0.2
−0.4
Any Brownian motion X (t) can be converted to the standard process

B(t)

−0.6
by letting X (t)

−0.8
B(t) =

−1.0
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
; in the sequel we will most often consider the standard Brownian t t
motion {B(t), t 0} only
MATH3801-MATH3901 Stochastic Processes 573 MATH3801-MATH3901 Stochastic Processes 574
10. Brownian Motion and Stationary Processes 10.1 Brownian Motion 10. Brownian Motion and Stationary Processes 10.1 Brownian Motion
Illustration Properties
The physical origins of a Brownian motion suggest that the sample
Microsoft stock, Feb 2002 - Feb 2007 paths should be continuous functions of t
; this is the case
(no proof given - well beyond the scope of this course!)
On the other hand, the sample paths of B(t) are no-where
differentiable, i.e., for every t
B(t + ) B(t)
lim does not exist (it is ±1 for all t)
!0
; Brownian motion is an important framework for modelling financial p
(In the notation of the previous slides: x= t)
markets
Hence the very rough appearance of the Brownian motion sample
paths!
MATH3801-MATH3901 Stochastic Processes 575 MATH3801-MATH3901 Stochastic Processes 576
10. Brownian Motion and Stationary Processes 10.1 Brownian Motion 10. Brownian Motion and Stationary Processes 10.1 Brownian Motion
Joint density Joint density
We have said that, for any fixed t, B(t) is N (0, t)-distributed: It follows
1 x2
fB(t1 ),B(t2 ),...,B(tn ) (x1 , x2 , . . . , xn )
fB(t) (x) =p e 2t
2⇡t
= fB(t1 ),B(t2 ) B(t1 ),...,B(tn ) B(tn 1)
(x1 , x2 x1 , . . . , xn xn 1)
Question: What is the joint density of (B(t1 ), B(t2 ), . . . , B(tn ))? = fB(t1 ) (x1 ) fB(t2 ) x1 ) . . . fB(tn ) xn
B(t1 ) (x2 B(tn 1)
(xn 1)
Assume that t1 < t2 < . . . < tn . First note that = fB(t1 ) (x1 ) fB(t2 x1 ) . . . fB(tn xn
t1 ) (x2 tn 1)
(xn 1)
8 8
>
> B(t1 ) = x1 >
> B(t1 ) = x1
<
B(t2 ) = x2
<
B(t2 ) B(t1 ) = x2 x1 which is,
()
>
> ... >
> ...
:
B(tn ) = xn
:
B(tn ) B(tn 1 ) = xn xn fB(t1 ),B(t2 ),...,B(tn ) (x1 , x2 , . . . , xn )
1 ⇣ ⇣ ⌘⌘
1 x2 (x2 x1 ) 2 (xn xn 1)
2
exp 2
1
t1 + t2 t1 + . . . + tn tn 1
; by the independent increment property, B(t1 ), B(t2 ) B(t1 ), . . . = p
(2⇡)n/2 t1 (t2 t1 ) . . . (tn tn 1)
B(tn ) B(tn 1 ) are all independent
; by the stationary increment property, B(tk ) B(tk 1 ) is normally ; from this density, we can compute any desired probabilities
distributed with mean 0 and variance tk tk 1 (in principle!)
MATH3801-MATH3901 Stochastic Processes 577 MATH3801-MATH3901 Stochastic Processes 578
10. Brownian Motion and Stationary Processes 10.1 Brownian Motion 10. Brownian Motion and Stationary Processes 10.1 Brownian Motion
Conditional distribution Example
For instance, the conditional distribution of B(t1 ) given that B(t2 ) = a Example 10.1
where t < t :
1 2
fB(t1 ) (x) fB(t2 )|B(t1 ) (a|x) fB(t1 ) (x) fB(t2 ) B(t1 ) (a x) In a bicycle race between two competitors, let Y (t) denote the amount of time
f B(t1 )|B(t2 )
(x|a) = = (in seconds) by which the racer that started in the inside position is ahead
fB(t ) (a)
2
fB(t2 ) (a) when 100t percent of the race has been completed, and suppose that
✓ ◆
x2 (a x)2 {Y (t), t 0} can be effectively modelled as a Brownian motion process with
= K exp
1 variance parameter 2
2t1 2(t2 t1 )
✓ ✓ ◆ ◆ (a) if the inside racer is leading by seconds at the midpoint of the race, what
1 1 ax is the probability that she is the winner?
= K2 exp x2 + +
2t1 2(t2 t1 ) t2 t1
✓ ✓ ◆◆
t2 t a
1
= K2 exp x2 2 x P(Y (1) > 0|Y (1/2) = ) = P(Y (1) Y (1/2) > |Y (1/2) = )
2t1 (t2 t1 ) t2
✓ ◆ = P(Y (1) Y (1/2) > ) (independent increments)
(x at1 /t2 )2
= K3 exp = P(Y (1/2) > ) (stationary increments)
2t1 (t2 t1 )/t2 ✓ ◆
Y (1/2) p
; normal density with =P p > 2
/ 2
at1 t1 (t2 t1 ) p
E(X (t1 )|X (t2 )) = and Var(X (t1 )|X (t2 )) = = ( 2) ' 0.9213
t2 t2
MATH3801-MATH3901 Stochastic Processes 579 MATH3801-MATH3901 Stochastic Processes 580
10. Brownian Motion and Stationary Processes 10.1 Brownian Motion 10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable
Example Hitting times and reflection principle
Example 10.1 Let Ta denote the first time the Brownian motion hits a given value a,
and without loss of generality, suppose a > 0
In a bicycle race between two competitors, let Y (t) denote the amount of time
(in seconds) by which the racer that started in the inside position is ahead Ta is often called a hitting time or a first passage time
when 100t percent of the race has been completed, and suppose that What is P(Ta  t)? We have (law of total probability):
{Y (t), t 0} can be effectively modelled as a Brownian motion process with
variance parameter 2 . P(B(t) a) =P(B(t) a|Ta > t)P(Ta > t)
(b) if the inside racer wins the race by a margin of seconds, what is the + P(B(t) a|Ta  t)P(Ta  t)
probability that she was ahead at the midpoint?
; if Ta  t, the process hits a at some time in [0, t], and by symmetry,
We must compute P(Y (1/2) > 0|Y (1) = )
1
Y (t) P(B(t) a|Ta  t) =
We know that B(t) = is the standard Brownian motion 2
; the conditional distribution of B(t1 ) given that B(t2 ) = C
is normal with This fact is often called the reflection principle
mean Ct1 /t2 and variance t1 (t2 t1 )/t2 ; if Ta > t, the process value cannot be greater than a without having
; the conditional distribution of Y (t1 ) given that Y (t2 ) = C is normal with yet hit a (by continuity of the sample paths), so that
mean Ct1 /t2 and variance 2 t1 (t2 t1 )/t2
P(B(t) a|Ta > t) = 0
2
; P(Y (1/2) > 0|Y (1) = ) = P(N ( /2, /4) > 0) = (1) ' 0.8413
MATH3801-MATH3901 Stochastic Processes 581 MATH3801-MATH3901 Stochastic Processes 582
10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable 10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable
Hitting times Hitting times
Hence,
Density of Ta for a=1 Density of Ta for a=2
P(Ta  t) = 2P(B(t) a)

0.12
Z 1
2 x2
=p e 2t dx

0.10
0.4

2⇡t a
Z 1 ✓ ✓ ◆◆
2 y2 a
p e dy = 2 1

0.08
=p 2 p
0.3
2⇡ a/ t t

0.06
By symmetry, for a < 0, we obtain

f(t)

f(t)
✓ ◆

0.2
a

0.04
P(Ta  t) = 2 p
t

0.1

0.02
Consequence: for any a, Ta is finite with probability 1
Differentiating with respect to t, we also obtain the pdf of Ta

0.00
0.0
✓ ◆
a2
0 2 4 6 8 10 0 2 4 6 8 10
|a|
fTa (t) = p exp
t >0
t t
,
2t
2⇡t 3
MATH3801-MATH3901 Stochastic Processes 583 MATH3801-MATH3901 Stochastic Processes 584
10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable 10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable
Maximum variable Maximum variable
Another variable of interest is the maximum value the process attains
in [0, t], say M(t) = max0st B(s)
Density of M(t) for t=1 Density of M(t) for t=10

0.8

0.8
As B(0) = 0, M(t) must be non-negative
For a > 0,

0.6

0.6
P (M(t) a) = P(Ta  t)

f(a)

f(a)
0.4

0.4
so that ✓ ◆
a
P (M(t)  a) = 2 p 1
t

0.2

0.2
Differentiating with respect to a, we have
r ✓ ◆

0.0

0.0
2 a2
fM(t) (a) = exp , a>0 0 2 4 6 8 10 0 2 4 6 8 10
⇡t 2t a a
MATH3801-MATH3901 Stochastic Processes 585 MATH3801-MATH3901 Stochastic Processes 586
10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable 10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable
Hitting times: example Gambler’s ruin problem
Let us now consider the probability that Brownian motion hits a before
Example b, where a, b > 0
The accuracy of the measurements from a certain tensor has been found to
Make use of the interpretation of Brownian motion as being a limit of
be deteriorating. Studies indicate that the daily evolution of the deterioration
can be modelled by a standard Brownian motion. What is the probability that the symmetric random walk
the tensor reading will ever deviate from the true value by -8 units in a) 6 ; Gambler’s ruin problem (Slides 263-266), with
months (180 days); b) 1 year (365 days)? c) What is the probability that the
maximum deviation will be at most +8 units in 6 months? 1/2 1/2 1/2 1/2 1/2
, * + -
1 b/ x m ... j x l 0 j x k .( . . a/ x ⇥ 1
a) We are required to find P(T 8  t), with t = 180, for a standard Brownian -
1/2 1/2 1/2 1/2 1/2
motion, that is ⇣ ⌘
P(T 8  180) = 2 p 8 = 2 ( 0.6) = 0.5486 a+b b
180 that is, with N = x and i = x, in the notation of Chapter 4
⌘ ⇣
b) P(T 8  365) = 2 p 8
= 2 ( 0.42) = 0.6744 ; as the random walk is symmetric, we have
365
c) We are required to find P(M(180)  8), that is i b
Pi = =
⇣ ⌘ N a+b
P(M(180) < 8) = 2 p8
1 = 0.4514 (= 1 P(T 8  180))
180
which is the desired probability (even as x ! 0)
MATH3801-MATH3901 Stochastic Processes 587 MATH3801-MATH3901 Stochastic Processes 588
10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable 10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable
Zeros of Brownian Motion Zeros of Brownian Motion
Let E(t0 , t1 ) denote the event
Previously we found the distribution of Ta , the first passage time of the E(t0 , t1 ) = “B(t) = 0 for some t 2 (t0 , t1 )”
Brownian Motion at level a 6= 0
Condition on B(t0 ) to obtain
We may also be interested in the return time to its starting point, i.e. 0 Z 1
Define T0 = inf{t > 0 : B(t) = 0} P(E(t0 , t1 )) = P(E(t0 , t1 )|B(t0 ) = w) fB(t0 ) (w) dw
Z11
Interesting (and surprising!) properties of this random variable T0 can
be understood through the number of zeros of {B(t)} over periods of =2 P(E(t0 , t1 )|B(t0 ) = w) fB(t0 ) (w) dw (by symmetry)
0
time
However, for w > 0,
We say that ‘B(t) has a zero at time t’ if B(t) = 0
P(E(t0 , t1 )|B(t0 ) = w) = P(Tw < t1 t0 |B(0) = 0) = P(Tw < t1 t0 ),
by symmetry, and the independent and stationary increments property
MATH3801-MATH3901 Stochastic Processes 589 MATH3801-MATH3901 Stochastic Processes 590
10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable 10. Brownian Motion and Stationary Processes 10.2 Hitting times and Maximum Variable
Zeros of Brownian Motion ⇣ ⌘ Zeros of Brownian Motion
pw w2
For w > 0, we know that fTw (t) = exp 2t and
2⇡t 3
B(t0 ) ⇠ N (0, t0 ), hence This result indicates some remarkable properties of the sample paths
of {B(t)}
Z Z 1 t 1 t0
P(E(t0 , t1 )) = 2 fT (t) dt f (w) dw Set t0 = 0 to obtain
w B(t0 )
0 0
Z t1 t0
Z 1 ✓ ✓ ◆◆ P(E(0, t)) = P(there is some zero in (0, t)) = 1 for all t > 0
1 1 t + t0
= p t 3/2 w exp w2 dw dt
⇡ t0 0 0 2 tt0
p Z So, it follows that, with probability 1,
t0 t1 t0 1
= p dt
⇡ 0 (t + t0 ) t T0 = 0
s
2 1 t1 In fact, a deeper analysis shows that, with probability 1, B(t) has
= tan 1 (by the susbtitution t = t0 s2 )
⇡ t0 infinitely many zeros in any non-empty time interval (0, t)!
i.e., ; no wonder that a Brownian Motion process has non-differentiable
2 1
p paths!
P(E(t0 , t1 )) = ⇡ cos t0 /t1
MATH3801-MATH3901 Stochastic Processes 591 MATH3801-MATH3901 Stochastic Processes 592
10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion 10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion
Brownian Motion with Drift Illustration: Brownian motion with Drift
As we said, Brownian motion is often used to model stock prices Four simulated realisations of a Brownian motion with drift, with µ = 1
However, stock prices do not generally have a zero (or even constant) and = 1
mean

1.0
; it is sometimes necessary to include a drift rate µ in the process

0.8
0.8
0.6

0.6
Definition

0.4
X(t)

X(t)

0.4
0.2
We say that {X (t), t 0} is a Brownian motion process with drift

0.0

0.2
−0.2
coefficient µ and variance parameter 2 if

0.0
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
(i) X (0) = 0;
t t
(ii) {X (t), t 0} has stationary and independent increments;

2.5

0.5
(iii) for any t, X (t) is normally distributed with mean µt and variance

2.0
2t

1.5

0.0
X(t)

X(t)
1.0

−0.5
0.5
; another formulation is to let {B(t), t 0} be standard Brownian

0.0
motion and then define 0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
X (t) = µt + B(t)
t t
MATH3801-MATH3901 Stochastic Processes 593 MATH3801-MATH3901 Stochastic Processes 594
10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion 10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion
Brownian Motion with Drift Brownian Motion with Drift: Gambler’s ruin problem
Like the regular Brownian Motion (Slides 570-571), the Brownian
Given that X (0) = 0, we can write, for any t, Motion with Drift can be shown to be a limit version of a discrete-time
random walk where the probability of making a transition of size x to
P(X (t)  x) = P(µt + B(t)  x) the right is (see tutorial exercise 10.14)
x µt p
= P(B(t)  ) 1 µ t
p = (1 + )
✓ ◆ 2
x µt
= p This observation is useful to derive the probability that the Brownian
t
motion with drift hits a before b, with a, b > 0
as B(t) ⇠ N (0, t) The gambler’s ruin problem results state that, when p 6= 1/2, this
probability is (compare Slide 588)
; for any fixed value x, the above probability:
1 (q/p)b/ x
decreases with t when µ is positive
1 (q/p)(a+b)/ x
increases with t when µ is negative p
p p µ t
t t q 1
When p = 21 (1 + µ
), q = 1 p = 21 (1 µ
) and p = p
t
1+ µ
MATH3801-MATH3901 Stochastic Processes 595 MATH3801-MATH3901 Stochastic Processes 596
10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion 10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion
Brownian Motion
p with Drift: Gambler’s ruin problem Maximum of a Brownian Motion with Negative Drift
Given that x= t (Slide 571), the desired probability is Consider {X (t), t 0} a Brownian Motion with drift µ, µ < 0
p
✓ µ
p
t
◆b/( t) Over time, such a process will tend toward ever lower values, and its
1
1 p
t
maximum
1+ µ
✓ ◆(a+b)/( p M = max{X (t) : t 0}
µ
p
t t)
1
1 p will be a well-defined and finite random variable
t
1+ µ
Then, for any y > 0, the event M y is equivalent to the event
As t ! 0, we get “{X (t)} reaches y before 1”
p !1/p t ✓ ◆n Thus, with a = y and b = 1 in the previous expression:
µ t
1 1 µ/( n) e µ/ 2µ/
lim p = lim = µ/ = e 1 2
t!0 1+ µ t n!1 1 + µ/( n) e P(M > y ) = 2
= e2µy /
e 2µy /
Hence the limiting value of the above probability is As µ is negative, we can write
2µb/ 2 2
e2µb/
2
1 e 1 P(M  y ) = 1 e 2|µ|y /
, for y > 0
2µ(a+b)/ 2
= 2µb/ 2 2µa/ 2
1 e e e 2 )-cdf 2)
which is the Exp(2|µ|/ ; M ⇠ Exp(2|µ|/
MATH3801-MATH3901 Stochastic Processes 597 MATH3801-MATH3901 Stochastic Processes 598
10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion 10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion
Geometric Brownian Motion Geometric Brownian Motion
Nowadays, mathematical economists usually prefer geometric
Brownian motion over Brownian motion as a model for prices of assets
in a perfect market:
such prices are nonnegative and exhibit fluctuations about a long-term
Definition exponential decay or growth curve
If {Y (t), t 0} is a Brownian motion with drift coefficient µ and
More importantly, for {X (t), t 0} a Geometric Brownian Motion and
variance parameter 2 , then the process {X (t), t 0} defined by
some times t0 < t1 < t2 < . . . < tn , the successive ratios
X (t) = eY (t) X (t ) X (t )
1 2 X (tn )
, ,...,
X (t0 ) X (t1 ) X (tn 1 )
is called Geometric Brownian motion
are independent random variables (by the independent increment
property of the Brownian motion)
; percentage changes over non-overlapping intervals are
independent
(more in Section 10.4 of your textbook)
MATH3801-MATH3901 Stochastic Processes 599 MATH3801-MATH3901 Stochastic Processes 600
10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion 10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion
Illustration: Geometric Brownian motion Geometric Brownian Motion: properties
Four simulated realisations of a Geometric Brownian motion with Let us compute the expected value of the process at time t given the
µ = 1/2 and = 1 history of the process up to time s < t:
⇣ ⌘
E(X (t)|{X (u), 0  u  s}) = E eY (t) |{Y (u), 0  u  s}
6

⇣ ⌘

1.4
5

= E eY (s)+Y (t) Y (s) |{Y (u), 0  u  s}

1.2
4

⇣ ⌘
X(t)

X(t)

1.0
3

= eY (s) E eY (t) Y (s) |{Y (u), 0  u  s}

0.8
2

0.6
⇣ ⌘
1

0.0 0.2 0.4

0.6 0.8 1.0 0.0 0.2 0.4

0.6 0.8 1.0

= X (s) E eY (t) Y (s)
Recall (Slide 119) that the moment generating function of a

14
N (µ, 2 )-random variable W is
1.0

12
10
0.8

8
X(t)

X(t)
2
'W (z) = E(ezW ) = eµz+z
0.6

6
4
0.4
and Y (t) Y (s) ⇠ Y (t s) is a random variable normally distributed

2
with mean µ(t s) and variance 2 (t s)
0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0
t t
MATH3801-MATH3901 Stochastic Processes 601 MATH3801-MATH3901 Stochastic Processes 602
10. Brownian Motion and Stationary Processes 10.3 Variations on Brownian Motion 10. Brownian Motion and Stationary Processes 10.5 White Noise
Geometric Brownian Motion: properties Introduction to Stochastic Calculus
It follows Imagine a particle whose movement along an axis is given by a
⇣ ⌘ 2 standard Brownian motion {B(t), t 0}
E eY (t) Y (s)
= 'Y (t) Y (s) (1) = eµ(t s)+(t s) 2 ,
Its velocity along the axis should be the derivative of B(t), but we said
hence 2
(Slide 576) that B(t) is nowhere differentiable! ; need something else
s)+(t s)
E(X (t)|{X (u), 0  u  s}) = X (s)eµ(t 2
At time t, the position of the particle is B(t), while at time t + dt, it is
Also, B(t + dt). The differential dB(t) is defined as
2 2
E(X (t)) = eµt+t 2 = et(µ+ 2
)
dB(t) = B(t + dt) B(t),
(take s = 0 in the previous expression; Y (0) = 0 so that X (0) = 1) and is the difference in the particle position after an infinitesimal
Similarly, increment of time ; random variable !
Var(X (t)) = E(X 2 (t)) (E(X (t)))2 The position of the particle after a time s is thus a forward sum of
uncountable and random increments over time, given by the integral
= E(e2Y (t) ) (E(eY (t) ))2 Z s
2t 2t 2 2t dB(t) B(s) =
= e2µt+2 e2µt+ = e2t(µ+ 2
)
(e 1) 0
This motivates the introduction of a stochastic integral
MATH3801-MATH3901 Stochastic Processes 603 MATH3801-MATH3901 Stochastic Processes 604
10. Brownian Motion and Stationary Processes 10.5 White Noise 10. Brownian Motion and Stationary Processes 10.5 White Noise
Stochastic Integral Stochastic IntegralRb
More generally, let f be a function having a continuous derivative in the See that a f (t) dB(t) is a normally distributed random variable, with
!
region [a, b] Z b Z b
Z b E f (t) dB(t) = f (b)E(B(b)) f (a)E(B(a)) E(B(t)) df (t) = 0
a a
Then define the stochastic integral f (t) dB(t) as follows:
a (if we can interchange expectation and integral), as E(B(t)) ⌘ 0
Z b n
X Also, from the independent increment property of {B(t), t 0},
f (t) dB(t) = lim f (ti 1 )(B(ti ) B(ti 1 )) n
! n
a n!1 X X
max(ti ti 1 )!0 i=1 Var f (ti 1 )(B(ti ) B(ti 1 )) = f 2 (ti 1 ) Var(B(ti ) B(ti 1 ))
i=1 i=1
where a = t0 < t1 < . . . < tn = b is a partition of the region [a, b] n
X
; integration by parts yields = f 2 (ti 1 )(ti ti 1)
i=1
Z b Z b Z ! Z
b b
f (t) dB(t) = f (b)B(b) f (a)B(a) B(t) df (t) ; Var f (t) dB(t) = f 2 (t) dt
a a a a
Rb ⇣ Rb ⌘
; convenient definition of the stochastic integral ; a f (t) dB(t) ⇠ N 0, a f 2 (t) dt
MATH3801-MATH3901 Stochastic Processes 605 MATH3801-MATH3901 Stochastic Processes 606
10. Brownian Motion and Stationary Processes 10.5 White Noise 10. Brownian Motion and Stationary Processes 10.5 White Noise
Stochastic Integral Illustration: Gaussian White Noise
Similarly, it can be shown that, for two functions f and g differentiable Four simulated realisations of a Gaussian white noise
Rb Rb
on [a, b], then a f (t) dB(t) and a g(t) dB(t) are jointly normal with
Z Z ! Z

4
3

b b b
2

Cov f (t) dB(t), g(t) dB(t) = f (t)g(t) dt

2
1
a a a
dB(t)

dB(t)
0

0
−1
R

−2
−2
Note 1: the stochastic integral f (t) dB(t) is also called the Ito integral,

−3
after the Japanese mathematician Kiyoshi Ito (1915-2008).
0 200 400 600 800 1000 0 200 400 600 800 1000
t t
The whole extension of the methods of calculus to stochastic

3
processes is named Ito calculus, whose main applications are in

2
3
mathematical finance and stochastic differential equations

1
1
(; MATH5975 - Introduction to Stochastic Analysis)

dB(t)

0
0

−1
−1
Note 2: the stochastic process {dB(t), t 0} is called a

−2
−2

−3
−3
Gaussian white noise 0 200 400
t

600 800 1000 0 200 400

600 800 1000

as it can be shown that such a signal does not carry any information
MATH3801-MATH3901 Stochastic Processes 607 MATH3801-MATH3901 Stochastic Processes 608
10. Brownian Motion and Stationary Processes 10.5 White Noise 10. Brownian Motion and Stationary Processes 10.5 White Noise
Stochastic calculus vs. regular calculus Stochastic calculus vs. regular calculus
Careful! Stochastic calculus is not just ‘regular’ calculus applied to
By a Taylor expansion for f , we have
stochastic processes or random functions
Integration rules and the like may be different, and sometimes lead to f (B(t + dt)) = f (B(t)) + f 0 (B(t))(B(t + dt) B(t))
unexpected results 1
+ f 00 (B(t))(B(t + dt) B(t))2 + . . .
A simple example of such an unexpected result is given here: 2
Consider a deterministic function of time x(t), and set y (t) = f (x(t)) for We know that B(t + dt) B(t) ⇠ N (0, dt), so the second-term will be
some differentiable function f . Then, the usual chain rule states that of order dt and cannot be neglected (unlike in the deterministic case)
dy (t) = f 0 (x(t)) ⇥ dx(t) It turns out that dt is indeed an acceptable approximation for
(B(t + dt) B(t))2 , hence we get
; classical result
1
dY (t) = f 0 (B(t))dB(t) + f 00 (B(t))dt
Now, instead of x(t), take the standard Brownian motion B(t), and let 2
Y (t) = f (B(t)) ; {Y (t), t 0} is a stochastic process
; the ‘stochastic’ chain rule is different!
Is the chain rule affected?
MATH3801-MATH3901 Stochastic Processes 609 MATH3801-MATH3901 Stochastic Processes 610
10. Brownian Motion and Stationary Processes 10.5 White Noise 10. Brownian Motion and Stationary Processes 10.5 White Noise
Stochastic calculus vs. regular calculus An example: the Ornstein-Uhlenbeck process
Example 10.4
Consider a particle of unit mass that is suspended in a liquid and
Note: A broad range of continuous-time continuous-space stochastic suppose that, due to the liquid, there is a viscous force that retards the
processes can be defined as the solution of a stochastic differential velocity of the particle at a rate proportional to its present velocity. In
equation like addition, let us suppose that the velocity instantaneously changes
according to a constant multiple of white noise. That is,
dX (t) = µ(t, X (t))dt + (t, X (t))dB(t),
dV (t) = V (t) dt + ↵dB(t)
for some functions µ and
where V (t) is the particle’s velocity at time t and {B(t), t 0} is
Those processes are known as diffusion processes standard Brownian motion. Find an expression for V (t)
; V (t) is indeed defined through a stochastic differential equation
Note: This formulation has been proposed as an improved model for
the velocity of a particle (that a basic model based on the Brownian
motion for its position is not able to account for)
MATH3801-MATH3901 Stochastic Processes 611 MATH3801-MATH3901 Stochastic Processes 612
10. Brownian Motion and Stationary Processes 10.5 White Noise 10. Brownian Motion and Stationary Processes 10.5 White Noise
An example: the Ornstein-Uhlenbeck process An example: the Ornstein-Uhlenbeck process
We can write the previous expression as
e t (dV (t) + V (t) dt) = ↵e t dB(t)
This process is known as the Ornstein-Uhlenbeck process or ⇣ ⌘
It not only describes the velocity of a particle in a viscous surrounding d e t V (t) = ↵e t dB(t)
medium, it has become a popular tool for modelling interest rates Upon integration, we obtain
(“Vasicek model”) Z t
The above equation is also called the Langevin equation in statistical e t V (t) e 0 V (0) = ↵ e s dB(s)
0
physics
or Z t
Leonard Ornstein (1880-1941) and George Uhlenbeck (1900-1988) V (t) = V (0)e t
+↵ e (t s)
dB(s) (?)
were two Dutch physicists 0
that is,
✓ Z t ◆
t (t s)
V (t) = V (0)e + ↵ B(t) B(s)e ds
0
MATH3801-MATH3901 Stochastic Processes 613 MATH3801-MATH3901 Stochastic Processes 614
10. Brownian Motion and Stationary Processes 10.5 White Noise 10. Brownian Motion and Stationary Processes 10.5 White Noise
An example: the Ornstein-Uhlenbeck process An example: the Ornstein-Uhlenbeck process
Now see that for any fixed t, V (t) is normally distributed, as it is a Further, we have, for 0  t1  t2 ,
linear transformation of a normally-distributed stochastic integral ✓Z t1 Z t2 ◆
Cov (V (t1 ), V (t2 )) = ↵2 Cov e (t1 u)
dB(u), e (t2 u)
dB(u)
Also, 0 0
✓Z t1 Z t1 ◆
✓Z t ◆
t (t s) t = ↵2 Cov e (t1 u)
dB(u), e (t2 u)
dB(u)
E(V (t)) = V (0)e + ↵E e dB(s) = V (0)e 0 0
0 Z t1
and = ↵2 e (t1 +t2 2u)
du
0
✓Z ◆ Z
t t ↵2 ↵2
Var(V (t)) = ↵2 Var e (t s)
dB(s) = ↵2 e 2 (t s)
ds = e (t2 t1 ) e (t1 +t2 )
0 0 2 2
↵2 2 t Note that, in the long term (t1 and t2 large, i.e. t1 + t2 large):
= (1 e ) ✓ ◆
2 ↵2
V (t) ! N 0,
⇣ ⌘ 2
; V (t) ⇠ N V (0)e t , ↵2 (1 e 2 t)
2 and
↵2 (t2 t1 )
Cov (V (t1 ), V (t2 )) ! e
2
MATH3801-MATH3901 Stochastic Processes 615 MATH3801-MATH3901 Stochastic Processes 616
10. Brownian Motion and Stationary Processes 10.5 White Noise 10. Brownian Motion and Stationary Processes 10.5 White Noise
An example: the Ornstein-Uhlenbeck process An example: the Ornstein-Uhlenbeck process
Further,
If V (t) is the velocity of the particle, then its position at time t Var(X (t)) = E(X 2 (t))
(assuming X (0) = 0) would be Z tZ s
=2 E(V (s)V (u)) du ds
Z t 0 0
X (t) = V (s) ds = ...
0 ✓ ◆
⇣ ⌘ ↵2 2 t 1 2 t
2
2 t) = t (1 e )+ (1 e )
To simplify, assume that V (0) = 0. Then, V (t) ⇠ N 0, 2↵ (1 e 2 2
It follows ; when t is ‘large’, we see that
✓Z t ◆ Z t
↵2
E(X (t)) = E V (s) ds = E(V (s)) ds ⌘ 0 Var(X (t)) ⇠ t
2
0 0
; observed over a long time span, the particle’s position as modelled
by an Ornstein-Uhlenbeck velocity behaves much like a Brownian
motion
MATH3801-MATH3901 Stochastic Processes 617 MATH3801-MATH3901 Stochastic Processes 618
10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes 10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes
Gaussian processes: definition Covariance structure
We know that a multivariate normal distribution is completely
Definition determined by the marginal mean values and the covariance values
A stochastic process {X (t), t 0} is called Gaussian, if ; every Gaussian process is uniquely described by its parameters:
(X (t1 ), X (t2 ), . . . , X (tn )) has a multivariate normal distribution for all n the mean function and the covariance function
and all times t1 , t2 , . . . , tn
; a standard Brownian motion could also be defined as the unique
Gaussian process having E(B(t)) = 0 and, for t1  t2 ,
Proposition
A Brownian motion {X (t), t 0} is a Gaussian process Cov(B(t1 ), B(t2 )) = Cov(B(t1 ), B(t1 ) + B(t2 ) B(t1 ))
= Cov(B(t1 ), B(t1 )) + Cov(B(t1 ), B(t2 ) B(t1 ))
Proof: each of X (t1 ), X (t2 ), . . . , X (tn ) can be expressed as a linear = Cov(B(t1 ), B(t1 )) = Var(B(t1 ))
combination of the independent normal random variables
= t1
X (t1 ), X (t2 ) X (t1 ), . . . , X (tn ) X (tn 1 ), which ensures the vector
(X (t1 ), X (t2 ), . . . , X (tn )) is multivariate normal (see Slide 124) If t2  t1 , Cov(B(t1 ), B(t2 )) = t2 , so that in general, for any times t1 and
t2 ,
; the Brownian motion is one of the most important case of Gaussian
Cov(B(t1 ), B(t2 )) = min(t1 , t2 )
processes
MATH3801-MATH3901 Stochastic Processes 619 MATH3801-MATH3901 Stochastic Processes 620
10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes 10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes
Example Brownian bridge
Let {B(t), t 0} be a standard Brownian motion process and consider
Example the process values between 0 and 1 conditional on B(1) = 0, i.e., the
conditional stochastic process
Let {B(t), t 0} be a standard Brownian motion. What type of process is
{Y (t), t 0}, with {B(t), 0  t  1|B(1) = 0}
Y (t) = cB(t/c 2 ),
; this is called the Brownian bridge, as it is tied down at 0 and at 1
for any fixed c > 0?
This process is useful for modelling a system that starts at some given
First, {Y (t), t 0} is clearly a Gaussian process level and is expected to return to that level at some specified future
For any t, B(t) is normally distributed with mean 0 and variance t time
B(t/c 2 ) is normally distributed with mean 0 and variance t/c 2 In a multivariate normal vector (X (t1 ), X (t2 ), . . . , X (tn )), any conditional
Y (t) = cB(t/c 2 ) is normally distributed with mean 0 and variance t distribution of the type
Cov(Y (t1 ), Y (t2 )) = c 2 Cov(B(t1 /c 2 ), B(t2 /c 2 )) = c 2 min( ct12 , ct22 ) = min(t1 , t2 ) (X (t1 ), X (t2 ), . . . , X (tn 1 ))|X (tn )
; Y (t) is also standard Brownian motion is again multivariate normal
; the Brownian bridge is a Gaussian process
MATH3801-MATH3901 Stochastic Processes 621 MATH3801-MATH3901 Stochastic Processes 622
10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes 10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes
Illustration: Brownian Bridge Brownian bridge
Four simulated realisations of the Brownian Bridge: Recall (Slide 579) that, for the standard Brownian motion and
0  t1  t2 ,
bt1 t1 (t2 t1 )
E(B(t1 )|B(t2 ) = b) = and Var(B(t1 )|B(t2 ) = b) =
t2 t2
0.6
0.0

0.4
B(t)|B(1)=0

B(t)|B(1)=0

; E(B(t)|B(1) = 0) = 0, for any 0  t  1

−0.5

0.2

For 0  t1  t2  1, we have
0.0
−1.0

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 Cov(B(t1 ), B(t2 )|B(1) = 0) = E(B(t1 )B(t2 )|B(1) = 0)
t t
= E E(B(t1 )B(t2 )|B(t2 ), B(1) = 0)|B(1) = 0
= E B(t2 ) E(B(t1 )|B(t2 ), B(1) = 0)|B(1) = 0
0.8
0.6

0.6
0.4

= E B(t2 )E(B(t1 )|B(t2 ))|B(1) = 0

0.4
0.2
B(t)|B(1)=0

B(t)|B(1)=0

0.2

t t1
0.0

1
= E B 2 (t2 )|B(1) = 0 = E B 2 (t2 )|B(1) = 0
0.0
−0.6 −0.4 −0.2

−0.2

t2 t2
t1
−0.6

0.0 0.2 0.4 0.6 0.8 1.0 0.0 0.2 0.4 0.6 0.8 1.0 = t2 (1 t2 ) = t1 (1 t2 )
t t t2
MATH3801-MATH3901 Stochastic Processes 623 MATH3801-MATH3901 Stochastic Processes 624
10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes 10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes
Brownian bridge Application: the Empirical Distribution Function
; the Brownian bridge can be defined as the unique Gaussian Let X1 , X2 , . . ., Xn be an i.i.d. sample drawn from an unknown
process with mean value 0 and covariance t1 (1 t2 ), distribution F
0  t1  t2  1 The empirical distribution function corresponding to this sample is
n n
Proposition 10.1 #{Xi  t; i = 1, . . . , n} 1X . 1X
Fn (t) = = 1I{Xi t} = ⇠i (t)
If {B(t), t 0} is standard Brownian motion, then {Z (t), 0  t  1} n n n
i=1 i=1
where The empirical distribution function is an estimate of the true distribution
Z (t) = B(t) tB(1) F (t) = P(X  t)
is the Brownian bridge An important observation is that, for all i,
Proof: {Z (t), 0  t  1} is clearly a Gaussian process, and ⇠i (t) ⇠ Bern(F (t))
E(Z (t)) = E(B(t)) tE(B(1)) = 0. Moreover, for 0  t1  t2  1,
For simplicity, suppose that F (t) = t for 0 < t < 1, i.e. X ⇠ U[0,1]
Cov(Z (t1 ), Z (t2 )) = Cov(B(t1 ) t1 B(1), B(t2 ) t2 B(1))
Then,
= Cov(B(t1 ), B(t2 )) t2 Cov(B(t1 ), B(1)) !
n n
t1 Cov(B(1), B(t2 )) + t1 t2 Cov(B(1), B(1)) 1X 1X
E(Fn (t)) = E ⇠i (t) = E(⇠i (t)) = F (t) = t
= t1 t1 t2 t1 t2 + t1 t2 = t1 (1 t2 ) n n
i=1 i=1
MATH3801-MATH3901 Stochastic Processes 625 MATH3801-MATH3901 Stochastic Processes 626
10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes 10. Brownian Motion and Stationary Processes 10.6 Gaussian Processes
Application: the Empirical Distribution Function Application: the Empirical Distribution Function
In addition, for all i and 0 < s < t < 1, Specifically, the process {Un (t), t 0}, where
p
E(⇠i (s)⇠i (t)) = E 1I{Xi s} 1I{Xi t} = E 1I{Xi s} = F (s) = s, Un (t) =
n(Fn (t) t),
so that would converge, in an appropriate sense, to a Gaussian process with
zero mean and covariance function Cov(Xn (s), Xn (t)) = s(1 t)
Cov(⇠i (s), ⇠i (t)) = E(⇠i (s), ⇠i (t)) E(⇠i (s))E(⇠i (t)) = s(1 t) (0  s < t  1)
Pn As we have seen, this process is a Brownian Bridge {B(t), t 0}
As Fn (t) = n1 i=1 ⇠i (t) is a sum of i.i.d. bounded random variables,
the Central Limit Theorem (Slide 143) applies as n ! 1 Therefore, we would expect the approximation
p p
; any vector ( n(Fn (t1 ) t1 ), . . . , n(Fn (tk ) tk )) has a multivariate B(t)
Normal distribution, for any k and any t1 < . . . < tk Fn (t) ' t + p , 0t 1
n
; the process {Fn (t), t 0} approaches a Gaussian process
Such approximations are heavily used in the theory of statistical
as n ! 1
inference
MATH3801-MATH3901 Stochastic Processes 627 MATH3801-MATH3901 Stochastic Processes 628
10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes 10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes
Stationary processes: definition Stationary processes: examples
Two basic examples:
1 a (continuous-time) Markov Chain {X (t), t 0} with stationary
Definition
distribution {⇡j }, and
A stochastic process {X (t), t 0} is said to be stationary if for all n, s,
t1 , . . ., tn the random vectors P(X (0) = j) = ⇡j
We have shown that once a Markov Chain enters its stationary
(X (t1 ), X (t2 ), . . . , X (tn ))
distribution, it never leaves it
and ; the probabilities P(X (s) = j) = ⇡j for all s > 0
(X (s + t1 ), X (s + t2 ), . . . , X (s + tn )) 2 the process {X (t), t 0} with
have the same joint distribution X (t) = N(t + L) N(t),
; in choosing any fixed point s as the origin, the ensuing process where L > 0 is a fixed constant and {N(t), t 0} is a Poisson
has the same probabilistic behaviour process having rate ; stationary increment property
; N(t + L) N(t) ⇠ N(L) ⇠ N(t + s + L) N(t + s)
; X (t + s) ⇠ X (t) for all s > 0
MATH3801-MATH3901 Stochastic Processes 629 MATH3801-MATH3901 Stochastic Processes 630
10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes 10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes
Stationary processes: examples Stationary processes:⇣ examples ⌘
The mean is E(X (t)) = E X0 ( 1)N(t)
Example 10.5: the Random Telegraph Signal Process ⇣ ⌘
Let {N(t), t 0} denote a Poisson process, and let X0 be independent of this = E(X0 )E ( 1)N(t) = 0 8t 0
process and be such that
The covariance function is, for 0  t1 < t2 ,
P(X0 = 1) = P(X0 = 1) = 1/2
Cov(X (t1 ), X (t2 )) = E(X (t1 )X (t2 ))
Defining ⇣ ⌘
X (t) = X0 ( 1)N(t) = E X02 ( 1)N(t1 )+N(t2 )
⇣ ⌘
then {X (t), t 0} is called random telegraph signal process. Show that this = E ( 1)2N(t1 ) ( 1)N(t2 ) N(t1 )
process is stationary and compute its mean and covariance function ⇣ ⌘
= E ( 1)N(t2 t1 )
Because X0 is equally likely to be 0 or 1, at any time s X (s) is equally likely to
1
X
be 0 and 1 as well (no matter the value of N(s)) (t2 t1 ) ( (t2 t1 ))k
= ( 1)k e =e 2 (t2 t1 )
N(t) will switch from even to odd (or vice-versa) with constant intensity , so k!
k =0
( 1)N(t) will change sign with constant intensity
2 t
; or Cov(X (s), X (s + t)) = e , for all t, s 0
Starting at time s instead of time 0 does not change the distribution of the
process ; {X (t), t 0} is stationary ; does not depend on s, as expected for a stationary process
MATH3801-MATH3901 Stochastic Processes 631 MATH3801-MATH3901 Stochastic Processes 632
10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes 10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes
Weakly stationary process Weakly stationary process: Example
In fact, the condition for a process to be stationary is rather stringent Example 10.6
and excludes many processes. However, what matters in most of the
Let {B(t), t 0} be a standard Brownian motion process, and define, for
situations is that the mean function E(X (t)) is constant and the some ↵ and > 0,
covariance function Cov(X (s), X (s + t)) does not depend on s ◆ ✓
t ↵2 2 t
V (t) = e B e
Definition 2
A stochastic process {X (t), t 0} is said to be weakly stationary (or Is the process {V (t), t 0} stationary, weakly stationary?
second-order stationary) if the mean function E(X (s)) and the Let us compute its mean and covariance functions:
covariance function Cov(X (s), X (s + t)) do not depend on s ✓ ✓ ◆◆
↵2 2 s s
E(V (s)) = e
E B e =0
The covariance between X (t1 ) and X (t2 ) must depend only on |t2 t1 |: 2
Cov(X (t1 ), X (t2 )) = r (|t2 t1 |) and for 0  t1 < t2 ,
✓ ✓ 2 ◆ ✓ ◆◆
↵ 2 t1 ↵2 2 t2
for some function r Cov(V (t1 ), V (t2 )) = e (t1 +t2 )
Cov B e ,B e
2 2
Obviously, stationary process ) weakly stationary process, but the ↵2 2 ↵2
t1 (t2 t1 )
reverse is not true =e (t1 +t2 )
e = e
2 2
For Gaussian processes, weakly stationary ) stationary process
MATH3801-MATH3901 Stochastic Processes 633 MATH3801-MATH3901 Stochastic Processes 634
10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes 10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes
Weakly stationary process: Example Weakly stationary processes: Example
; depends only on |t2 t1 | ; weakly stationary
; but Gaussian process! ; stationary Example 10.8
Compare with the mean and covariance functions of the Ornstein-Uhlenbeck Suppose in the random telegraph signal process, we drop the requirement
process (Slides 615-616) that P(X0 = 1) = P(X0 = 1) = 1/2, and only require that E(X0 ) = 0. Is the
process stationary, weakly stationary?
; same expressions if t1 + t2 ! 1 !
; hence, the process It will only remain stationary if X0 has a symmetric distribution ( X0 has the
same distribution as X0 ). However, the process will in any case be weakly
⇢ ✓ ◆
t ↵2 2 t
stationary since ⇣ ⌘
e B e ,t 0
2 E(X (t)) = E(X0 )E ( 1)N(t) = 0
is the stationary form of the Ornstein-Uhlenbeck process and
In other words, this is how the Ornstein-Uhlenbeck process behaves in the Cov(X (t), X (t + s)) = E(X (t)X (t + s)) = E(X02 )e 2 s
long term
Note: in the textbook, this process V (t) is directly defined as the O-U process
MATH3801-MATH3901 Stochastic Processes 635 MATH3801-MATH3901 Stochastic Processes 636
10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes 10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes
Weakly stationary processes: Example Weakly stationary processes: Example
The idea of ‘stationarity’ easily adapts to discrete-time processes as It follows:
n
well. Here is an example: X
n i
E(Xn ) = E(Zi ) = 0 8n 0
Example 10.7: an autoregressive process i=0
and
Let Z , Z , Z , . . . be uncorrelated random variables with E(Zn ) = 0, n
0 1 2 0 and
n n+m
!
⇢ 2 2
X X
n=0 n i n+m i
/(1 ), Cov(Xn , Xn+m ) = Cov Zi , Zi
Var(Zn ) = 2
, n 1 i=0 i=0
n
X
2 n i n+m i
where < 1. Define X0 = Z0 and = Cov(Zi , Zi )
i=0
Xn = Xn + Zn , n 1. n
!
1 X
2 2n+m 1 2i
= 2
+
The process {Xn , n 0} is called a first-order autoregressive process. Is it 1
i=1
weakly stationary? 2 m
= 2
Iterating yields Xn = ( Xn 2+ Zn 1) + Zn = 2
Xn 2 + Zn 1 + Zn 1
Pn n i
= i=0 Zi ; depends only on m ; weakly stationary (in discrete time)
MATH3801-MATH3901 Stochastic Processes 637 MATH3801-MATH3901 Stochastic Processes 638
10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes 10. Brownian Motion and Stationary Processes 10.7 Stationary and Weakly Stationary Processes
(Weakly) stationary processes: properties (Weakly) stationary processes: properties
Why are stationary processes important?
stationary processes are easier to handle than non-stationary Law of Large Numbers for discrete-time weakly stationary
processes, as their statistical properties remain the same for the processes - Ergodic Theorem
entire random process Let {Xn , n 1} be a weakly stationary process having mean µ and
under some conditions, stationary processes behave “like” an i.i.d. covariance function r (i) = Cov(Xn , Xn+i ). Then,
sequence of random variables
For instance, recall the Law of Large Numbers (Slide 139): X1 + X2 + . . . + Xn
!µ as n ! 1
n
Law of Large Numbers for i.i.d. sequences
with probability 1, if and only if
Let X1 , X2 , . . ., Xn be a sequence of independent random variables
having a common distribution, and let E(Xi ) = µ < 1. Then, with n
X r (i)
probability 1, lim !0
n!1 n
i=1
X1 + X2 + . . . + Xn
!µ as n ! 1
n ; the covariance has to decrease “fast enough” for this result to hold
Then, we have the analogue for discrete-time weakly stationary (Recall the Ergodic Theorem for Markov Chains, Slide 259)
processes
MATH3801-MATH3901 Stochastic Processes 639 MATH3801-MATH3901 Stochastic Processes 640
“11”. Introduction to Martingales 11.1 Introduction
Introduction
The word Martingale comes from the French city of Martigues, whose
inhabitants were said to know about
a betting strategy that guaranteed a profit
11 Introduction to Martingales The short story about martingales is that they were developed to prove
that this is not possible (under reasonable assumptions)
Martingales are a general class of stochastic process, for which a nice
theory exists
They turn out to be useful in the general study of stochastic processes,
especially in financial mathematics
; if you show that a particular process is a martingale, then you can
immediately assume a lot of theory and prove results quickly and
easily
MATH3801-MATH3901 Stochastic Processes 641
“11”. Introduction to Martingales 11.1 Introduction “11”. Introduction to Martingales 11.1 Introduction
A model for betting
Consider a gambler who is playing a sequence of games in each of “He [Bond] was playing a progressive system on red at table five.
which he wins with probability p and he loses with probability 1 p (...) It seems that he is persevering and plays in maximums.”
Let {Yn , n 1} be a sequence of i.i.d. random variables indicating the Ian Fleming, Casino Royale
outcome of the nth game:
P(Yn = 1) = p = 1 P(Yn = 1), n 1
If the gambler bets $N on a game, he gets $N if he wins and must pay
$N if he loses
Let Xn (n 1) denote the gambler’s gain after the nth game, and
assume X0 = 0
Suppose that the gambler bets bn on the nth game, then ; as an expert in stochastic processes, James Bond was betting
according to the ‘Martingale strategy’
Xn = Xn 1 + bn Yn n 1
MATH3801-MATH3901 Stochastic Processes 642 MATH3801-MATH3901 Stochastic Processes 643
“11”. Introduction to Martingales 11.1 Introduction “11”. Introduction to Martingales 11.1 Introduction
The “Martingale” strategy (... or James Bond’s strategy)
The “Martingale” strategy
Consider that the game is fair, that is, p = 1/2 This strategy can be represented by a Markov Chain {Xn , n 0}
⌦
Strategy: 8 1 m kg
1/2 J 1/2
bet $1 on the first game; stop if you win
1/2 1/2 1/2
if not, bet $2 on the second game; stop if you win
... 0 2 1 2 3 2 7 4 ...
1/2 1/2 1/2 1/2
stop betting if you win, if not double your bet
...
keep doubling your bet until you eventually win ; Xn can take only two values:
⇢ Pn k 1
In mathematical terms: k =1 2 =1 2n with prob 1/2n
Xn =
⇢ 1 with prob 1 1/2n
2n 1 if 1  n  T
bn =
0 if n > T
where T = min{i 1 : Yi = 1}
MATH3801-MATH3901 Stochastic Processes 644 MATH3801-MATH3901 Stochastic Processes 645
“11”. Introduction to Martingales 11.1 Introduction “11”. Introduction to Martingales 11.1 Introduction
The “Martingale” strategy The “Martingale” strategy
If you stop the game at any fixed n,
1 1
E(Xn ) = (1 2n ) ⇥ + 1 ⇥ (1 )=0 However, consider the amount S lost before you win:
2n 2n
1
X
; fair game...
E(S) = E(S|T = n)P(T = n)
But if you stop after your first win, i.e., after the T th game: n=1
1
X 1 ✓
X ◆
1 1 1
XT = 1 = (2n 1
1) n = =1
2 2 2n
n=1 n=1
; you are sure to win!
; on average, you need an infinite capital to win $1
See also that T ⇠ Geo(1/2):
; useless strategy
1
P(T = n) = (n = 1, 2, . . .)
2n
; P(T = 1) = 0 ; you are sure to eventually win in your lifetime
MATH3801-MATH3901 Stochastic Processes 646 MATH3801-MATH3901 Stochastic Processes 647
“11”. Introduction to Martingales 11.1 Introduction “11”. Introduction to Martingales 11.2 Conditional expectations and filtrations
The “Martingale” theory Measurability and Conditional expectation
One might think that by gaining experience as the game proceeds, and
by quitting at a cleverly chosen opportune time based on the gambler’s
experience, a fair game could be turned into a favorable game
The main results of the martingale theory demonstrate that this is not Definition
the case under ‘reasonable’ assumptions (i.e., when the gambler has A random variable Y that can be written as a function of the set of r.v.
limited time and limited resources) X0 , X1 , . . . , Xn is called measurable with respect to X0 , X1 , . . . , Xn
Essentially:
Central to the definition of a martingale is the idea of conditional
you cannot change a fair game in a favourable one, even with a expectation with respect to a set of r.v. X0 , X1 , . . . , Xn
well-chosen betting strategy
Note: the basic definitions are inspired by crude notions of gambling
(back to Cardano, Slide 5), but the theory has become a sophisticated
tool of modern abstract mathematics
MATH3801-MATH3901 Stochastic Processes 648 MATH3801-MATH3901 Stochastic Processes 649
“11”. Introduction to Martingales 11.2 Conditional expectations and filtrations “11”. Introduction to Martingales 11.2 Conditional expectations and filtrations
Measurability and Conditional expectation Filtration
We know that the conditional expectation E(Y |X0 , X1 , . . . , Xn ) is a If X0 , X1 , X2 , . . . is a sequence of random variables, we will denote Fn
random variable, characterised by two properties: the “information” contained in X0 , X1 , . . . , Xn and we will write
(1) E(Y |X0 , X1 , . . . , Xn ) is measurable with respect to X0 , X1 , . . . , Xn , E(Y |Fn ) = E(Y |X0 , X1 , . . . , Xn )
i.e., there exists some function such that
; note that, for n  m, Fn ✓ Fm , as there is more information in
E(Y |X0 , X1 , . . . , Xn ) = (X0 , X1 , . . . , Xn )
(X0 , X1 , . . . , Xn ) [ (Xn+1 , . . . , Xm ) than in (X0 , X1 , . . . , Xn ) only
(2) if A is an event that depends only on X0 , X1 , . . . , Xn , then Such a collection {Fn , n 0} is called a filtration, and can be
considered as the potential information that is being revealed as time
E(Y 1I{A} ) = E(E(Y |X0 , X1 , . . . , Xn )1I{A} ), progresses
where 1I{A} is the indicator function of A Note 1: rigorously speaking, Fn is the -field generated by
The first one was expounded on Slide 162, the second one is a (X0 , X1 , . . . , Xn ), that is, the set of all events determined by the random
particular case of the result on Slide 165, with g being 1I{A} (there we variables (X0 , X1 , . . . , Xn )
were conditioning on one r.v., here we have n r.v. to condition on – but
Note 2: Y measurable with respect to X0 , X1 , . . . , Xn
the idea is the same)
, Y measurable with respect to Fn
MATH3801-MATH3901 Stochastic Processes 650 MATH3801-MATH3901 Stochastic Processes 651
“11”. Introduction to Martingales 11.2 Conditional expectations and filtrations “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Properties of the conditional expectation Definition
1 E(E(Y |Fn )) = E(Y ) (law of iterated expectation - Slide 163) Definition
2 if Y is measurable with respect to Fn , then A discrete-time stochastic process {Mn , n 0} is a martingale with
E(Y |Fn ) =Y respect to the filtration {Fn , n
0} if
1 E(|Mn |) < 1, for all n 0;
3 if Y is independent of X0 , X1 , . . . , Xn , then 2 E(Mm |Fn ) = Mn , for all m n
E(Y |Fn ) = E(Y )
the first condition guarantees that the conditional expectations are
well defined
4 for any Y , if n  m, then (iterated conditional expectations - Slide the second condition implies that Mn is measurable wrt Fn
189)
E(E(Y |Fm )|Fn ) = E(Y |Fn ) {Fn , n 0} can be the filtration associated to any sequence
X0 , X1 , . . ., not necessarily M0 , M1 , . . .
if we say that {Mn , n 0} is a martingale without reference to the
5 if Y is any random variable and Z is measurable w.r.t. Fn ,
filtration, we understand that {Fn , n 0} is the filtration
E(Y Z |Fn ) = Z E(Y |Fn ) associated to the sequence M0 , M1 , . . . itself
MATH3801-MATH3901 Stochastic Processes 652 MATH3801-MATH3901 Stochastic Processes 653
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Martingale Martingale: example
The second condition amounts to Martingales are very common stochastic processes. They have also
become an important tool in modern financial mathematics as they
E(Mn+1 |Fn ) = Mn , for all n 0 provide one idea of fair value in financial markets
We have also Let Xn be the closing price at the end of day n of a share of stock
In a perfect market, it should not be possible to predict whether a
E(Mn+1 ) = E(E(Mn+1 |Fn )) = E(Mn ) n 0,
future price Xn+1 will be higher or lower than the current Xn :
that is, if the future price could be expected to be higher, then a number
of buyers would enter the market and their demand would raise
E(Mn+1 ) = E(Mn ) = E(Mn 1) = . . . = E(M1 ) = E(M0 ) the current price Xn
if the future price could be expected to be lower, then a number of
; a martingale is a process with constant mean
sellers would appear and that would depress the current price
; a martingale is a model for a fair game: regardless of a player’s ; equilibrium obtains where the future price cannot be predicted, on
current and past fortunes, his expected fortune at any time in the average, as higher or lower
future is the same as his current fortune
(it is not more/less likely to win than to lose) ; where the price sequence {Xn , n 0} is a martingale
MATH3801-MATH3901 Stochastic Processes 654 MATH3801-MATH3901 Stochastic Processes 655
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Sub- and supermartingale Martingale, submartingale and supermartingale
Definition Consider the betting model introduced on Slide 642
A discrete-time stochastic process {Xn , n 0} is a submartingale Xn = Xn + bn Yn , n 1,
1
(supermartingale) with respect to the filtration {Fn , n 0} if
1 E(|Xn |) < 1, for all n 0; and the filtration {Fn , n 0} associated to X0 , X1 , . . .
2 E(Xm |Fn ) () Xn , for all m n; Assume that P(Yn = 1) = p, so that E(Yn ) = 2p 1, for some value p,
3 Xn is measurable wrt Fn and that the bet bn is deterministic, or measurable wrt Fn 1
A submartingale is a model for a favourable game, because the Then, for any n 1,
expected fortune tends to increase in the future
A supermartingale is a model for an unfavourable game, because the E(Xn |Fn 1) = Xn 1 + bn E(Yn |Fn 1) = Xn 1 + bn (2p 1)
expected fortune tends to decrease in the future
Super- and submartingales have many of the properties of ; if p = 1/2, E(Xn |Fn = Xn ; martingale
1) 1
martingales, and are often of interest
; if p > 1/2, E(Xn |Fn 1) Xn 1 ; submartingale
See also that any martingale is automatically both a submartingale
; if p < 1/2, E(Xn |Fn 1 )  Xn 1 ; supermartingale
and a supermartingale
MATH3801-MATH3901 Stochastic Processes 656 MATH3801-MATH3901 Stochastic Processes 657
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Importance of martingales Importance of martingales
Actually, for a martingale {Mn , n 0}, we can always write
n
X
Mn = M0 + ⇠j
j=1
The importance of martingales in modern probability theory is that
most of the essential properties of sums of i.i.d. random variables are
where ⇠j = Mj Mj 1. The process {⇠n , n 0} is called the martingale
inherited (with minor modifications) by martingales, despite they may
difference process
not be sums of i.i.d. random variables!
Now, it is clear that
; there are versions of the Strong Law of Large Numbers, the Central
⇠j is measurable with respect to Fj for all j 0
Limit Theorem, etc. for martingales
E(⇠j+1 |Fj ) = E(Mj+1 Mj |Fj ) = Mj Mj = 0 for all j 0
E(⇠j ⇠k +1 ) = E E(⇠j ⇠k +1 )|Fk = E ⇠j E(⇠k +1 |Fk ) = E(⇠j ⇥ 0) = 0
for all j  k ; the random variables ⇠j , j 0, are uncorrelated
; a martingale can always be written as a sum of uncorrelated
random variables, and this gives some appreciation of why the idea
on the previous slide might be true
MATH3801-MATH3901 Stochastic Processes 658 MATH3801-MATH3901 Stochastic Processes 659
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Markov’s inequality for nonnegative martingale Maximal inequality for nonnegative martingale
Markov’s inequality (Slide 137) states that for any nonnegative random Theorem
variable X and for any a > 0, Let {Mn , n 0} be a martingale process with nonnegative values.
Then, for any a > 0,
E(X ) ✓ ◆
P(X a)  E(M0 )
a P max Mn a 
n 0 a
Because a martingale has constant mean, this applied to any
nonnegative martingale process immediately yields that for any a > 0, Proof: take m 0 and see that
m
X
E(M0 ) E(Mm ) = E(Mm 1I{M0 <a,M1 <a,...,Mn 1 <a,Mn a} ) + E(Mm 1I{M0 <a,M1 <a,...,Mm <a} )
P(Mn a) 
a n=0
m
X
for all n = 0, 1, 2, . . . E(Mm 1I{M0 <a,M1 <a,...,Mn 1 <a,Mn a} )
n=0
; this limits the probability of a large value for a single observation Mn m
X
= E E(Mm |M0 , M1 , . . . , Mn ) 1I{M0 <a,M1 <a,...,Mn 1 <a,Mn a}
We can actually state a result much stronger than this n=0
(property (2) Slide 650)
MATH3801-MATH3901 Stochastic Processes 660 MATH3801-MATH3901 Stochastic Processes 661
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
(ctd.) But {Mn , n 0} is a martingale, so E(Mm |M0 , M1 , . . . , Mn ) = Mn and
E(Mm ) = E(M0 ) Maximal inequality for nonnegative martingale
m
X
; E(M0 ) E Mn 1I{M0 <a,M1 <a,...,Mn 1 <a,Mn a}
n=0 ; this maximal inequality limits the probability of observing a large
m
X value at any time in the infinite future of the martingale!
a E 1I{M0 <a,M1 <a,...,Mn 1 <a,Mn a}
n=0 As an illustration, consider again the betting model as on Slide 657
✓ ◆
with p = 1/2 (fair game ; martingale) and wlog suppose X0 = 1
aP max Mn a
0nm
The maximal inequality with e.g. a = 2 asserts that the probability that
whence we can conclude that, for all m 0, the gambler ever doubles his money must be less than 1/2, and this
✓ ◆ holds no matter what the game is and no matter what is the betting
E(M0 )
P max Mn a  strategy (the way of setting the stake bn )
0nm a
; the maximal inequality is a very strong statement
As the right-hand side does not depend on m, the announced result follows:
✓ ◆
E(M0 )
P max Mn a 
0n a
MATH3801-MATH3901 Stochastic Processes 662 MATH3801-MATH3901 Stochastic Processes 663
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Martingales: examples Martingales: examples
Symmetric random walk Symmetric random walk
Pn Pn
Let Xn = i=1 Yi , with Yi , i 1 i.i.d. random variables such that Let Xn = Yi , with Yi , i 1 i.i.d. random variables such that
i=1
P(Yi = 1) = P(Yi = 1) = 1/2, P(Yi = 1) = P(Yi = 1) = 1/2,
and let X0 = 0. (a) Is {Xn , n 0} a martingale? and let X0 = 0. (b) Is {Wn , n 0}, where
Let {Fn , n 0} be the filtration associated to Y0 , Y1 , . . ., and see that Xn is Wn = Xn2 n,
measurable w.r.t. Fn
We then verify the two conditions: a martingale?
Pn
1 E(|X0 |) = 0 and for all n 1, E(|Xn |)  i=1 E(|Yi |) = n < 1;
2 for all n 0, E(Xn+1 |Fn ) = E(Xn + Yn+1 |Fn ) = Xn + E(Yn+1 ) = Xn
1 E(|Wn |)  E(Xn2 ) + n = Var(Xn ) + n = 2n < 1;
; {Xn , n 0} is a martingale 2 E(W |Fn ) = E((Xn + Yn+1 )2 (n + 1)|Fn )
n+1
= E(Xn2 |Fn ) + 1 + 2Xn E(Yn+1 |Fn ) (n + 1)
Note: any series of independent random variables Y1 , Y2 , . . . , with E(Yi ) = 0 = Xn2 + 1 + 0 (n + 1) = Wn + 1 1 = Wn
(and E(|Yi |) < 1) for all i (not necessarily identically distributed!) would
define a martingale in this framework ; {Wn , n 0} is a martingale with respect to {Fn , n 0}
MATH3801-MATH3901 Stochastic Processes 664 MATH3801-MATH3901 Stochastic Processes 665
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Martingales: examples Martingales: examples
Random walk
Pn
Example Let Xn = i=1 Yi , with Yi , i 1, i.i.d. random variables such that
Qn
Let Xn = i=1 Yi , where Yi , i 1, are independent random variables with P(Yi = 1) = p = 1 P(Yi = 1),
E(Yi ) = 1 and E(|Yi |) < 1 for all i. Is {Xn , n 1} a martingale?
and let X0 = 0
Let {Fn , n 1} be the filtration associated to Y1 , . . .. Then,
Qn Let {Fn , n 1} be the filtration associated to Y1 , Y2 , . . .. Then, similarly to
1 E(|Xn |) = i=1 E(|Yi |) < 1; above, we have that
2 E(X |F ) = E(Xn ⇥ Yn+1 |Fn )
n+1 n
= Xn E(Yn+1 |Fn ) E(Xn+1 |Fn ) = Xn + E(Yn+1 ) = Xn + (2p 1) 6= Xn
= Xn E(Yn+1 ) = Xn in general. However, define the process {Xn⇤ , n 0}, with
; {Xn , n 0} is a martingale with respect to {Fn , n 0}
Xn⇤ = Xn n(2p 1)
Note: if E(Yi ) = µ < 1 (resp. µ > 1), then {Xn , n 1} is a supermartingale
; E(Xn+1
⇤
|Fn ) = Xn⇤ , making {Xn⇤ , n 0} into a martingale wrt {Fn , n 1}
(resp. submartingale)
; most often, only basic operations are enough to reveal a martingale from
a given process
MATH3801-MATH3901 Stochastic Processes 666 MATH3801-MATH3901 Stochastic Processes 667
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Martingales: examples Martingales: examples
Consider again Example 3.14, Slide 171
The Matching Problem
The Matching Problem
Suppose in Example 2.31 bis (Slide 93) that the users getting their own
Suppose in Example 2.31 bis (Slide 93) that the users getting their own
passwords are put aside, while the others are randomly assigned a
passwords are put aside, while the others are randomly assigned a
new password from the remaining passwords. The process continues
new password from the remaining passwords. The process continues
until each user has their own password. Let Xn denote the number of
until each user has their own password. Let Xn denote the number of
users yet to recover their own passwords after the nth round.
users yet to recover their own passwords after the nth round.
This implies that
Let {Fn , n 1} be the filtration associated to X0 , X1 , X2 , . . .. At each
round, we expect one user to get their own password back (Slide 93), E(Xn+1 + (n + 1)|Fn ) = 1 + (n + 1) + Xn = Xn + n
i.e. E(Xn Xn+1 |Fn ) = 1 for all n 0
In fact, we even have ; defining Xn⇤ = Xn + n, we have E(Xn+1
⇤ |F ) = X ⇤ for all n
n n
Of course, E(|Xn⇤ |) = E(Xn ) + n < 2n < 1
E(Xn+1 |Fn ) = E(Xn+1 Xn + Xn |Fn ) = 1 + Xn
; the process {Xn⇤ , n 0} is a martingale
MATH3801-MATH3901 Stochastic Processes 668 MATH3801-MATH3901 Stochastic Processes 669
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Martingales: examples Martingales: examples
The previous {Wn , n 0} is not the only martingale associated to a
Branching process branching process
Let {Xn , n 0} be a branching process with mean offspring number equal to Define Vn = ⇡0Xn for n = 0, 1, 2, . . ., where ⇡0 is the probability of eventual
µ extinction of the branching process
Recall that ⇡0 is a root of G(s) = s, with G being the pgf of the offspring
Let {Fn , n 1} be the filtration associated to X0 , X1 , X2 , . . .. From the results distribution
on branching processes that we showed on Slide 305, we can write that P Xn
Then, as Xn+1 = k =1 Yn,k (Slide 297), for all n 0,
E(Xn+1 |Fn ) = µXn ✓ P ◆ Xn ⇣ ⌘
X n Y Y Y
E(Vn+1 |Fn ) = E ⇡0 k =1 n,k |Fn = E ⇡0 n,k |Fn
; unless µ = 1, this is not a martingale k =1
Xn
However, defining Wn = µn for all n 0, we get Xn
Y ⇣
Y
⌘ YXn
= E ⇡0 n,k = G(⇡0 ) = G(⇡0 )Xn = ⇡0Xn = Vn
E(Xn+1 |Fn ) µXn Xn k =1 k =1
E(Wn+1 |Fn ) = E(Xn+1 /µn+1 |Fn ) = = n+1 = n = Wn
µn+1 µ µ ; {Vn , n 0} is a martingale wrt {Fn , n 0}
; {Wn , n 0} is a martingale with respect to Fn These facts are very important in the study of the long-term behaviour of
branching processes
MATH3801-MATH3901 Stochastic Processes 670 MATH3801-MATH3901 Stochastic Processes 671
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Martingales: examples Martingales: examples
Example: Markov Chains Wald’s martingale
Let {Xn , n 0} be a discrete-time Markov Chain, taking values in the finite
Let Yi , i 1 be i.i.d. random variables with E(Yi ) < 1 and Var(Y
Pni ) < 1 and
set SX , with transition matrix P. Suppose that there exists a function
let 'Y (t) = E(etY ) be the common mgf of the Yi . Define Xn = i=1 Yi , X0 = 0
: S ! S such that
X X
and
X exp(tXn )
Pij (j) = (i) for all i 2 SX Mn (t) = , n 0
('Y (t))n
j2SX
Is {Mn (t), n 0} a martingale?
Such a function is called a harmonic function for P. Is { (Xn ), n 0} a
martingale? E(exp(tX ))
n ('Y (t))n
1 E(|Mn (t)|) = = =1<1
('Y (t))n ('Y (t))n
Obviously, E(| (Xn )|) is bounded for every n, as SX is a finite set. Moreover,
with {Fn , n 0} be the filtration associated to X1 , X2 , . . ., we have E(exp(tX ) exp(tY )|F )
n n+1 n exp(tX ) E(exp(tYn+1 ))
n
2 E(Mn+1 (t)|Fn ) = =
X ('Y (t))n+1 ('Y (t))n 'Y (t)
E( (Xn+1 )|Fn ) = E( (Xn+1 )|Xn ) = P (j) = (Xn ) Xn ,j exp(tXn )
= = Mn (t)
j2SX ('Y (t))n
for all n 0 Note: see that there is a separate martingale for every real value t
; { (Xn ), n 0} is a martingale w.r.t. {Fn , n 0} (for which 'Y (t) exists)
MATH3801-MATH3901 Stochastic Processes 672 MATH3801-MATH3901 Stochastic Processes 673
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Martingales: examples Martingales: examples
So far we have only considered the martingale property (that is, the
Doob’s martingale
‘constant mean’ property) in discrete time. However, the same ideas
Let X be any random variable such that E(|X |) < 1 and let {Yn , n 0} be easily carries over to the continuous-time case
any stochastic process. Consider {Fn , n 0} the filtration associated to
Y0 , Y1 , . . .. Define Consider a Brownian motion {X (t), t 0}
Xn = E(X |Fn ), n 0.
Due to the independent and stationary increment properties, we know
Is {Xn , n 0} a martingale? (Slide 573) that, for all s, t 0,
2
1 E(|Xn |) = E(|E(X |Fn )|)  E(E(|X | |Fn )) = E(|X |) < 1 X (t + s)|X (t) ⇠ N (X (t), s)
2 for all n 0, E(Xn+1 |Fn ) = E(E(X |Fn+1 )|Fn ) = E(X |Fn ) = Xn Hence, for all s, t 0,
; {Xn , n 0} is a martingale wrt {Fn , n 0} E(X (t + s)|X (t)) = X (t)
In a sense, Xn approximates X (“projection”) and as n increases the ; constant mean process
approximation becomes better and better
; a Brownian motion is a martingale in continuous-time!
; important result in the theory of convergence of random variables
MATH3801-MATH3901 Stochastic Processes 674 MATH3801-MATH3901 Stochastic Processes 675
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.3 Martingales: definition and examples
Martingales: examples Martingales: examples
A long time before the martingale theory was set up, Abraham de Moivre If on the other hand 0 < Xn < N, then
made use of a martingale argument to answer the ‘gambler’s ruin’ problem ⇣ ⌘
X +Y
E(Dn+1 |Y1 , . . . , Yn ) = E(Dn+1 |X1 , . . . , Xn ) = E (q/p) n n+1 |X1 , . . . , Xn
p p p p
*
.( . . j
, * = (q/p)Xn E((q/p)Yn+1 )
1
6 0 j 1 j 2 j N–1 N j 1
1
q q q q = Dn (p(q/p) + q(q/p) ) = Dn
Then, as a martingale, E(Dn ) = E(D0 ) for all n, so that
De Moivre’s martingale
E(Dn ) = (p/q)i if X0 = i
Write Y1 , Y2 , . . . for the steps (±1) of the random walk, and Xn for the position
after n steps, where X0 = i. Now define Now, let T be the number of steps before the absorption in 0 or N. de Moivre
✓ ◆Xn argued as follows: as E(Dn ) = (q/p)i for all n, it must also be true at time T
q
Dn = , for n 0 E(DT ) = (q/p)i
p
(look back at Slide 646 to see that this is not necessarily true!)
de Moivre claimed that E(Dn+1 |Y1 , . . . , Yn ) = Dn for all n With Pi the probability to reach N before 0 when starting in state i, we also
have
If Xn equals 0 or N then the process has stopped by time n, so Xn+1 = Xn
E(DT ) = (q/p)0 (1 Pi ) + (q/p)N Pi
; Dn+1 = Dn , hence E(Dn+1 ) = E(Dn )
MATH3801-MATH3901 Stochastic Processes 676 MATH3801-MATH3901 Stochastic Processes 677
“11”. Introduction to Martingales 11.3 Martingales: definition and examples “11”. Introduction to Martingales 11.4 Optional Stopping Theorem
Martingales: examples Stopping time: definition
Solving for Pi yields Definition
1 (q/p)i A random variable T is said to be a stopping time with respect to the
Pi =
1 (q/p)N filtration {Fn , n 0} if
as long as p 6= q, in agreement with the result on Slide 265 1 T takes values in {0, 1, 2, . . .}
We see that the argument holds water, as long as 2 for any n 0, 1I{T =n} is measurable with respect to Fn
E(DT ) = E(D0 )
; if you know Fn , you know whether T = n or not
for the random variable T ; at any time n, you can tell if it is time to stop or not
; a major part of the martingale theory is to determine conditions on such ; the random variable T does not ‘look into the future’
random variables T and on the martingale process which ensure that the
above equality is true Properties:
Here, most of those conditions (see later) are trivially satisfied because the (i) if T is a stopping time, then ⌧n = min(T , n) where n is a fixed
chain is finite integer, is a stopping time
(ii) if T and V are two stopping times, then so are min(T , V ) and
; de Moivre’s argument was right, by serendipity
max(T , V )
MATH3801-MATH3901 Stochastic Processes 678 MATH3801-MATH3901 Stochastic Processes 679
“11”. Introduction to Martingales 11.4 Optional Stopping Theorem “11”. Introduction to Martingales 11.4 Optional Stopping Theorem
Stopping time: examples Stopping time: examples
Example
Pn
Let Xn = i=1 Yi , with Yi , i 1 be i.i.d. random variables such that
P(Yi = 1) = p = 1 P(Yi = 1),
Example
and let X0 = 0. Let Fn the information contained in X0 , X1 , . . . , Xn . We
Let T be such that P(T = ⌧ ) = 1 (degenerate random variable). Is T a
consider different stopping rules. Are they stopping times?
stopping time?
Clearly, T is a stopping time. T is fixed so that we know whether T = n or not (a) Let Tj = min{n 0 : Xn = j}. At time n, Tj is determined by the value of
for any n, even without knowing Fn Xn , known from Fn ; for any j, Tj is a stopping time
(r )
; deterministic stopping time (b) Let Tj be the r th passage time of the process in j ; for any j and r ,
(r )
Tj is a stopping time
(c) Let ⌧j = Tj 1. At time n, ⌧j = n if Tj = n + 1, so that it is not measurable
wrt ; is not a stopping time
Fn ⌧j
(d) ⌧ = max{n 0 : Sn = j}, the last time the process visits j ; ⌧ is not a
stopping time
MATH3801-MATH3901 Stochastic Processes 680 MATH3801-MATH3901 Stochastic Processes 681
“11”. Introduction to Martingales 11.4 Optional Stopping Theorem “11”. Introduction to Martingales 11.4 Optional Stopping Theorem
Stopped martingale Stopped martingale
Proposition 2 Note that, for all n 0,
Let {Mn , n 0} be a martingale and T be a stopping time with respect Mmin(T ,n+1) = Mn+1 and Mmin(T ,n) = Mn if T > n
to the filtration {Fn , n 0} associated to M0 , M1 , . . .. Then,
and
E(Mmin(T ,n) ) = E(M0 ) 8n 0 Mmin(T ,n+1) = Mmin(T ,n) if T  n
Hence,
; using any bounded stopping time as a betting strategy yields no Mmin(T ,n+1) = Mmin(T ,n) + (Mn+1 Mn )1I{T >n}
benefit
Proof: we will show that {Mmin(T ,n) , n 0} is a martingale. Check the two
and thus
conditions: E(Mmin(T ,n+1) |Fn ) = Mmin(T ,n) + E (Mn+1 Mn )1I{T >n} |Fn
1 we have = Mmin(T ,n) + 1I{T >n} E ((Mn+1 Mn )|Fn )
|Mmin(T ,n) | < max |Mk |  |M0 | + |M1 | + . . . + |Mn | < 1 = Mmin(T ,n) + 1I{T >n} ⇥ 0 = Mmin(T ,n)
0k n
; you cannot ‘beat’ a fair game if you are only ready to play for (at
because {Mn , n 0} is a martingale ; |Mn | < 1 for all n by definition
most) a fixed number of times
MATH3801-MATH3901 Stochastic Processes 682 MATH3801-MATH3901 Stochastic Processes 683
“11”. Introduction to Martingales 11.4 Optional Stopping Theorem “11”. Introduction to Martingales 11.4 Optional Stopping Theorem
Unbounded stopping times Optional Stopping Theorem
Optional Stopping Theorem
If we do not limit T , what can we say about E(MT )?
Let {Mn , n 0} be a martingale and T be a stopping time with respect
We have, for any given n, to the filtration {Fn , n 0} associated to M0 , M1 , . . .. If
MT = Mmin(T ,n) + MT 1I{(T >n)} Mn 1I{(T >n)} P(T < 1) = 1,
Therefore, E(|MT |) < 1
E(MT ) = E(M0 ) + E(MT 1I{(T >n)} ) E(Mn 1I{(T >n)} ) (?) and
lim E(|Mn |1I{(T >n)} ) = 0,
n!1
This leads to the following theorem, which turns out to be the
cornerstone of martingale theory then
E(MT ) = E(M0 )
MATH3801-MATH3901 Stochastic Processes 684 MATH3801-MATH3901 Stochastic Processes 685
“11”. Introduction to Martingales 11.4 Optional Stopping Theorem “11”. Introduction to Martingales 11.4 Optional Stopping Theorem
Optional Stopping Theorem Optional Stopping Theorem: note
Proof: it follows from (?) that we have only to show
lim E(MT 1I{(T >n)} ) = 0
n!1
We have The version of the theorem stated here is not the minimal one. The
n
X condition
1 > E(|MT |) = E(|MT |1I{(T =k )} ) + E(|MT |1I{(T >n)} ) lim E(|Mn |1I{(T >n)} ) = 0
n!1
k =0
can be weakened. The necessary and sufficient condition is “the
Take the limit as n ! 1, and see that
sequence {Mn , n 0} is uniformly integrable”, that is, 8" > 0, 9K > 0
lim |E(MT 1I{(T >n)} )|  lim E(|MT |1I{(T >n)} ) = 0 such that
n!1 n!1
sup E(|Mn |1I{(|Mn |>K )} ) < "
as the latter limit is the tail of a convergent series n 0
Concretely, the theorem states that no reasonable strategy can help to
earn money in a fair game
; “reasonable” strategy means a strategy limited either in price or in
time, that is, suitable for a gambler with finite lifetime and a house
limit on bets
MATH3801-MATH3901 Stochastic Processes 686 MATH3801-MATH3901 Stochastic Processes 687
“11”. Introduction to Martingales 11.4 Optional Stopping Theorem “11”. Introduction to Martingales 11.4 Optional Stopping Theorem
Back to the “Martingale” strategy Optional Stopping Theorem: sufficient conditions
For the “Martingale” strategy, we have shown that
E(Xn ) = 0 = E(X0 ) 8n 1,
The assumptions of the Optional Stopping Theorem are often hard to
making {Xn , n 1} a martingale. We have also defined a stopping check in practice
time T by the time of the first win, so that
What we need are simple sufficient conditions that we can verify
1 relatively easily
P(T > n) = !0
2n The next theorem provides such a set of aesthetically attractive
and thus P(T < 1) = 1. Also, MT = 1 so that E(|MT |) < 1. sufficient conditions
However, The proof is not provided, but can be found in Fristedt and Gray (1997,
Chapter 24)
1
E(|Xn |1I{(T >n)} ) = |1 2n |P(T > n) = |1 2n | ! 1,
2n
which makes the strategy not “reasonable”, and the guaranteed profit
is actually just a trick
MATH3801-MATH3901 Stochastic Processes 688 MATH3801-MATH3901 Stochastic Processes 689
“11”. Introduction to Martingales 11.4 Optional Stopping Theorem “11”. Introduction to Martingales 11.4 Optional Stopping Theorem
Optional Stopping Theorem: sufficient conditions Optional Stopping Theorem: examples
Example: Rademacher series
Optional Stopping Theorem: sufficient conditions Let Xn =
Pn Yi
i=1 i ↵ , with Yi , i 1 i.i.d. random variables such that
Let {Mn , n 0} be a martingale and T be a stopping time with respect
to the filtration {Fn , n 0} associated to M0 , M1 , . . .. Suppose any one P(Yi = 1) = P(Yi = 1) = 1/2,
of the following conditions holds: ↵ > 1/2, and let X0 = 0.
For some n < 1, P(⌧ < n) = 1;
Because E( iY↵i ) = 0 for all i, {Xn , n 0} is a martingale, see Slide 664
For some nonnegative random variable Z with E(Z ) < 1,
|Mn | < Z for all n 0; On the other hand,
n
X n 1
For some positive and finite c, |Mn+1 Mn | < c for all n 0, and Var(Y ) X 1
i
X 1
E(Xn2 ) = Var(Xn ) = = < = ⇣(2↵) < 1,
E(⌧ ) < 1; i 2↵ i 2↵ i 2↵
i=1 i=1 i=1
P1
For some finite constant c, E(Mn2 )  c for all n 0, where ⇣ is the Riemann zeta function ⇣(z) = n=1 n z
,z>1
then ; the fourth sufficient condition in the theorem above is verified, hence
E(MT ) = E(M0 )
E(X⌧ ) = E(X0 ) = 0
for any stopping time ⌧
MATH3801-MATH3901 Stochastic Processes 690 MATH3801-MATH3901 Stochastic Processes 691
“11”. Introduction to Martingales 11.4 Optional Stopping Theorem “11”. Introduction to Martingales
Optional Stopping Theorem: examples Epilogue
Example: Random walk
Pn (from the memoirs of Casanova, recalling his stay in Venice in 1754)
Let Xn = i=1 Yi , with Yi , i 1, i.i.d. random variables such that
“Playing the martingale, continually doubling my stake, I won every day
P(Yi = 1) = p = 1 P(Yi = 1),
during the rest of the carnival. I was fortunate enough never to lose the
and let X0 = 0 sixth card, and if I had lost it, I should have been without money to
play, for I had 2000 sequins on that card. I congratulated myself on
We saw that {Xn⇤ , n 0}, with Xn⇤ = Xn n(2p 1), is a martingale having increased the fortune of my dear mistress”
We also have that, for any n 0,
Some days later:
⇤
|Xn+1 Xn⇤ | = |Xn+1 (n + 1)(2p 1) Xn + n(2p 1)| = |Yn+1 (2p 1)|  2
“I still played the martingale, but with such a bad luck that I was soon
; the third sufficient condition in the theorem above is verified, hence left without a sequin. As I shared my property with my mistress, I was
E(X⌧ ⌧ (2p 1)) = E(X⌧⇤ ) = E(X0⇤ ) = 0, obliged to tell her of my losses, and at her request sold all her
diamonds, losing what I got for them. She had now 500 sequins, we
i.e. had nothing more to live on”
E(X⌧ ) = E(⌧ ) ⇥ (2p 1)
for any stopping time ⌧ with finite expectation (E(⌧ ) < 1)
MATH3801-MATH3901 Stochastic Processes 692 MATH3801-MATH3901 Stochastic Processes 693
MATH3801-MATH3901 Stochastic Processes 694

MATH3801 Chapters 1 To 10 PDF
No ratings yet
MATH3801 Chapters 1 To 10 PDF
659 pages
Lecture Slides - 230809 - 154641
No ratings yet
Lecture Slides - 230809 - 154641
249 pages
Probablity Mit Removed
No ratings yet
Probablity Mit Removed
31 pages
B.Tech Ece Ii Year I Semester: Lecture Notes
No ratings yet
B.Tech Ece Ii Year I Semester: Lecture Notes
57 pages
Introduction To Probability Theory-Houghton Mifflin
No ratings yet
Introduction To Probability Theory-Houghton Mifflin
271 pages
ProbabilityTheory Slides
No ratings yet
ProbabilityTheory Slides
33 pages
Figueiredo LxMLS2019 PDF
No ratings yet
Figueiredo LxMLS2019 PDF
204 pages
Chapter 1
No ratings yet
Chapter 1
21 pages
Week 0 Part 2 (1) - Experiment, Outcome, Sample Space, Events
No ratings yet
Week 0 Part 2 (1) - Experiment, Outcome, Sample Space, Events
46 pages
STAT3201 Module 2. Basic Concepts of Probability
No ratings yet
STAT3201 Module 2. Basic Concepts of Probability
29 pages
Chapter 2
No ratings yet
Chapter 2
40 pages
4b ProbabilityNotes
No ratings yet
4b ProbabilityNotes
79 pages
Week 1: Learning Objectives: Introduction To Probability
No ratings yet
Week 1: Learning Objectives: Introduction To Probability
30 pages
LectureSTS 1A
No ratings yet
LectureSTS 1A
35 pages
Leure ( - ) : Probability Calculations
No ratings yet
Leure ( - ) : Probability Calculations
56 pages
US-Introduction To Probability and Stochastic Processes
No ratings yet
US-Introduction To Probability and Stochastic Processes
137 pages
Dips Academy Statistics Notes
No ratings yet
Dips Academy Statistics Notes
80 pages
Hoel, Port, Stone
100% (1)
Hoel, Port, Stone
270 pages
Probability & Statistics MTH 2401 STV V54
No ratings yet
Probability & Statistics MTH 2401 STV V54
273 pages
Notes EC636
No ratings yet
Notes EC636
89 pages
Stochbasics Handout
No ratings yet
Stochbasics Handout
36 pages
CH 1
No ratings yet
CH 1
107 pages
Introduction To Probability and Statistics Course ID:MA2203: Course Teacher: Dr. Manas Ranjan Tripathy
No ratings yet
Introduction To Probability and Statistics Course ID:MA2203: Course Teacher: Dr. Manas Ranjan Tripathy
184 pages
Chapter One
No ratings yet
Chapter One
34 pages
Chapters1 4
No ratings yet
Chapters1 4
69 pages
Lect 1
No ratings yet
Lect 1
57 pages
(Courant Lecture Notes) S. R. S. Varadhan - Probability Theory-Amer Mathematical Society (2001) PDF
100% (1)
(Courant Lecture Notes) S. R. S. Varadhan - Probability Theory-Amer Mathematical Society (2001) PDF
178 pages
Stochastic Modelling Notes Discrete
No ratings yet
Stochastic Modelling Notes Discrete
130 pages
Probablity
No ratings yet
Probablity
310 pages
Probablity
100% (1)
Probablity
312 pages
Introduction and Some Basics
No ratings yet
Introduction and Some Basics
26 pages
Chapter 1
No ratings yet
Chapter 1
39 pages
Elementary Probability and Statistics
No ratings yet
Elementary Probability and Statistics
25 pages
Lecture Sheet F
No ratings yet
Lecture Sheet F
17 pages
Chapter 1
No ratings yet
Chapter 1
57 pages
Stat 333
No ratings yet
Stat 333
128 pages
7 - PDFOptim 1
No ratings yet
7 - PDFOptim 1
244 pages
GSM 199 Prev
No ratings yet
GSM 199 Prev
25 pages
Stochastic Dynamics
No ratings yet
Stochastic Dynamics
78 pages
Math 170A
No ratings yet
Math 170A
34 pages
SIC - AI - Chapter 4. Probability and Statistics - Rev2.0
No ratings yet
SIC - AI - Chapter 4. Probability and Statistics - Rev2.0
219 pages
Stats ch1
No ratings yet
Stats ch1
22 pages
Probability Theory
No ratings yet
Probability Theory
9 pages
Probability Notes
No ratings yet
Probability Notes
73 pages
Turn in Recitation and Tutorial Scheduling Form Policy: Text
No ratings yet
Turn in Recitation and Tutorial Scheduling Form Policy: Text
52 pages
4 Probability
No ratings yet
4 Probability
133 pages
MS C1 PDF
No ratings yet
MS C1 PDF
13 pages
Probability & Stochastic Processes
No ratings yet
Probability & Stochastic Processes
55 pages
Chapter-1 - STCP INTRO
No ratings yet
Chapter-1 - STCP INTRO
16 pages
Tutorial 01
No ratings yet
Tutorial 01
131 pages
STA301 Quiz-2 File by Vu Topper RM
No ratings yet
STA301 Quiz-2 File by Vu Topper RM
109 pages
Probability and Statistics
No ratings yet
Probability and Statistics
80 pages
Grade 10 Statistics and Probability
No ratings yet
Grade 10 Statistics and Probability
58 pages
Q3 - Independent and Dependent Event - DLP
50% (2)
Q3 - Independent and Dependent Event - DLP
5 pages
Lecture 2 Review of Probabilty Theory
No ratings yet
Lecture 2 Review of Probabilty Theory
52 pages
01 - Probability Spaces
No ratings yet
01 - Probability Spaces
15 pages
18 Probability
No ratings yet
18 Probability
67 pages
1 Probability
No ratings yet
1 Probability
31 pages
Probability Theory AND Random Processess (18B11MA314) : Unit-1 Basic Probability (Co-1)
No ratings yet
Probability Theory AND Random Processess (18B11MA314) : Unit-1 Basic Probability (Co-1)
22 pages
Chapter 03
No ratings yet
Chapter 03
18 pages
1 Basic Ideas 1
No ratings yet
1 Basic Ideas 1
75 pages
Axioms of Probability
No ratings yet
Axioms of Probability
40 pages
Chapter 4
No ratings yet
Chapter 4
52 pages
MATH 301 Lecture 1
No ratings yet
MATH 301 Lecture 1
65 pages
REVIEW
No ratings yet
REVIEW
184 pages
Probability LectureNotes
No ratings yet
Probability LectureNotes
16 pages
DLL For 2nd Co Math8
No ratings yet
DLL For 2nd Co Math8
3 pages
Lesson 2 Theory
No ratings yet
Lesson 2 Theory
8 pages
Stat I Chapter 4
No ratings yet
Stat I Chapter 4
29 pages
Segment 1 - PPD
No ratings yet
Segment 1 - PPD
32 pages
Ma113-Chapter 3
100% (1)
Ma113-Chapter 3
5 pages
Topic 3 - Probability Handouts
No ratings yet
Topic 3 - Probability Handouts
26 pages
CHP 10 Solutions
No ratings yet
CHP 10 Solutions
6 pages
Handout 2-Axiomatic Probability
No ratings yet
Handout 2-Axiomatic Probability
17 pages
Grade 8 DLP LeccioFinal
No ratings yet
Grade 8 DLP LeccioFinal
11 pages
Math Module PDF
No ratings yet
Math Module PDF
45 pages
02 Random Variables
No ratings yet
02 Random Variables
14 pages
Chapter-6-Random Variables & Probability Distributions
No ratings yet
Chapter-6-Random Variables & Probability Distributions
15 pages
Formula To Calculate Probability
No ratings yet
Formula To Calculate Probability
3 pages
Maths Year 12 Int'l t2 WK 1
No ratings yet
Maths Year 12 Int'l t2 WK 1
7 pages
Notes 1
No ratings yet
Notes 1
4 pages
Reviewer For Math 10
No ratings yet
Reviewer For Math 10
3 pages
Department of Education: Republic of The Philippines
No ratings yet
Department of Education: Republic of The Philippines
13 pages
AP and GP Important Questions For SSC
No ratings yet
AP and GP Important Questions For SSC
8 pages
3.06.R Probability Review
No ratings yet
3.06.R Probability Review
5 pages
HW1 Sol
No ratings yet
HW1 Sol
3 pages
Basics of Probability
No ratings yet
Basics of Probability
6 pages
Probability 101
No ratings yet
Probability 101
1 page
Unit - 1: 1. Explain About The Partition of A Sampling Space Theorem? Ans
No ratings yet
Unit - 1: 1. Explain About The Partition of A Sampling Space Theorem? Ans
14 pages
Mathematical Foundations of Information Theory
From Everand
Mathematical Foundations of Information Theory
A. Ya. Khinchin
3.5/5 (9)

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

Lecture PDF

Uploaded by

Lecture PDF

Uploaded by

(Higher) Probability and Stochastic Processes “Probability is common sense reduced to calculation”

Pierre-Simon Laplace (1749-1827)

Suppose that those values are SX = {x1 , x2 , . . .}

The probability mass function (pmf) of a discrete random variable X is

Probability mass function:

“spikes” at x1 , x2 , . . . Some particular discrete distributions have been exhaustively studied

height of spike at xi = pX (xi )

Var(X ) = (x E(X )) dFX (x)

= E(X 2 ) 2E(X )E(X ) + (E(X ))2

However, E(Sn /n) = µ and Var(Sn /n) = n 2 /n2 = 2 /n !0

; Sn /n converges to the degenerate rv µ, in the sense that

P (|Sn /n µ| k )  !0 for all k > 0, as n ! 1

chain apparently independent

proportion of time that the process is in state j

; the initial-state-dependent Consequently, ⇡j is also the unconditional probability of finding the

P10(t) exponential term in P00 (t) and

MATH3801-MATH3901 Stochastic Processes 481 MATH3801-MATH3901 Stochastic Processes 482

length increases without limit (the system explodes, unstable

= E eY (s)+Y (t) Y (s) |{Y (u), 0  u  s}

= eY (s) E eY (t) Y (s) |{Y (u), 0  u  s}

0.0 0.2 0.4

0.6 0.8 1.0 0.0 0.2 0.4

0.6 0.8 1.0

Cov f (t) dB(t), g(t) dB(t) = f (t)g(t) dt

600 800 1000 0 200 400

600 800 1000

; E(B(t)|B(1) = 0) = 0, for any 0  t  1

= E B(t2 )E(B(t1 )|B(t2 ))|B(1) = 0

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.