
PROBABILITY THEORY – LECTURE 3 – BHATT & GRIFFITHS – FALL 2023

Markov Chains

In many applications, successive observations of a process, say, X1, X2, … have an inherent time
component associated with them. For example, the Xi could be the state of the weather at a particular
location on the i-th day, counting from some fixed day. In a simplistic model, the state of the weather
could be “dry” or “wet”, quantified as, say, 0 and 1. It is hard to believe that in such an example, the
sequence X1, X2, … could be mutually independent. The question then arises how to model the
dependence among the Xi. One particular model that has earned a very special status is called a Markov
chain. In a Markov chain model, we assume that the future depends only on the present state. In the
weather example, suppose we want to assign a probability that tomorrow, say March 10, will be dry, and
suppose that we have available to us the precipitation history for each of January 1 to March 9. The
Markov chain model would entail that our probability that March 10 will be dry will depend only on the
state of the weather on March 9, even though the entire past precipitation history was available to us. As
simple as it sounds, Markov chains are enormously useful in applications, perhaps more than any other
specific dependency model. Familiarity with basic Markov chain terminology and theory is often
considered essential for anyone interested in studying statistics and probability.

Notation and Basic Definitions

Definition: A sequence of random variables {Xn}, n ≥ 0, is said to be a Markov chain if for some countable set S ⊆ ℝ, and any n ≥ 1 and xn+1, xn, …, x0 ∈ S,

$$P(X_{n+1} = x_{n+1} \mid X_0 = x_0, X_1 = x_1, \ldots, X_n = x_n) = P(X_{n+1} = x_{n+1} \mid X_n = x_n).$$

The set S is called the state space of the chain. If S is a finite set, the chain is called a finite state Markov
chain. X0 is called the initial state.

Definition: A Markov chain {Xn} is called homogeneous or stationary if P(Xn+1 = y | Xn = x) is independent of n for any x, y.

Definition: Let {Xn} be a stationary Markov chain. Then the probabilities pij = P(Xn+1 = j | Xn = i) are called
the one-step transition probabilities, or simply transition probabilities. The matrix P = ((pij)) is called the
transition probability matrix.

Definition: Let {Xn} be a stationary Markov chain. Then the probabilities pij(n) = P(Xn+m = j | Xm = i) = P(Xn = j | X0 = i) (which, by stationarity, do not depend on m) are called the n-step transition probabilities, and the matrix P(n) = ((pij(n))) is called the n-step transition probability matrix.

Definition: If the state space of the chain is finite and has, say, t elements, then the transition probability matrix P is a t × t matrix, and the rows of the matrix will always add to 1. A matrix with this property is called a stochastic matrix. If, in addition, the columns also add to 1, then we call the matrix doubly stochastic, or bistochastic.

Markov chains are widely used as models for discrete time sequences that exhibit local dependence. Part
of the reason for this popularity of Markov chains as a model is that a coherent, complete, and elegant
theory exists for how a chain evolves. We describe below examples from numerous fields where a
Markov chain is either the correct model or is chosen as a model.
Example: Suppose that in a particular city, any day is either dry or wet. If it is dry one day, it remains dry the next day with probability α, and will be wet with probability 1 − α. On the other hand, if it is wet one day, it remains wet the next day with probability β, and becomes dry with probability 1 − β. Let X0, X1, … be the sequence of states of the weather, with X0 being the state on the initial day (on which observation starts). Then {Xn} is a two-state stationary Markov chain with the transition probability matrix

$$P = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}.$$
Example: Suppose that in a presidential election, voters can vote for the Labour party, the Conservative party, or the Independent party. Someone who has voted for the Labour candidate in this
election will vote Labour again with 80% probability, will switch to Conservative with 5% probability, and
vote Independent with 15% probability. Someone who has voted for the Conservative candidate in this
election will vote Conservative again with 90% probability, switch to Labour with 3% probability, and vote
Independent with 7% probability. Someone who has voted for the Independent candidate in this election
will vote Independent again with 80% probability, or switch to one of the other parties with 10%
probability each. This is a three-state stationary Markov chain with state space S = {1, 2, 3} ↔ {Labour, Conservative, Independent} and the transition matrix

$$P = \begin{pmatrix} 0.8 & 0.05 & 0.15 \\ 0.03 & 0.9 & 0.07 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$

Chapman–Kolmogorov Equation

The Chapman–Kolmogorov equation provides a simple method for obtaining the higher-order transition
probabilities of a Markov chain in terms of lower-order transition probabilities. Carried to its most
convenient form, the equation describes how to calculate by a simple and explicit method all higher-
order transition probabilities in terms of the one-step transition probabilities. Because we always start
analyzing a chain with the one-step probabilities, it is evidently very useful to know how to calculate all
higher-order transition probabilities using just the knowledge of the one-step transition probabilities.

Theorem: Let {Xn} be a stationary Markov chain with the state space S. Let n, m ≥ 1. Then,

$$p_{ij}(m+n) = P(X_{m+n} = j \mid X_0 = i) = \sum_{k \in S} p_{ik}(m)\, p_{kj}(n).$$

Note: A verbal proof is actually the most easily understood. In order to get to state j from state i in m + n steps, the chain must go to some state k ∈ S in m steps, and then travel from that k to the state j in the next n steps. By adding over all possible k ∈ S, we get the Chapman–Kolmogorov equation.
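To make the equation concrete, here is a minimal numerical check in Python/NumPy (our own illustration, using the three-state voting matrix introduced above): the two-step probability from state i to state j is exactly the sum over intermediate states, which is what matrix multiplication computes.

```python
import numpy as np

# One-step transition matrix of the three-state voting chain above
# (states 0 = Labour, 1 = Conservative, 2 = Independent).
P = np.array([[0.80, 0.05, 0.15],
              [0.03, 0.90, 0.07],
              [0.10, 0.10, 0.80]])

i, j = 0, 1  # from Labour to Conservative

# Chapman-Kolmogorov with m = n = 1: sum over all intermediate states k.
p_ij_2 = sum(P[i, k] * P[k, j] for k in range(3))
print(p_ij_2)         # 0.1
print((P @ P)[i, j])  # same value: matrix multiplication performs this sum
```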

An extremely important corollary is the following result.

Corollary: Let P(n) denote the n-step transition probability matrix. Then, for all n ≥ 2, P(n) = Pⁿ, where Pⁿ denotes the usual n-th power of P.

So to calculate the n-step transition matrix we just raise the one-step transition matrix to the power of n.
Example: Consider again the weather pattern example with the one-step transition probability matrix

$$P = \begin{pmatrix} \alpha & 1-\alpha \\ 1-\beta & \beta \end{pmatrix}$$
We let the states be 1, 2 (1 = dry, 2 = wet). We use the Chapman–Kolmogorov equation to answer two questions. First, suppose it is Wednesday today, and it is dry. We want to know what the probability is that Saturday will be dry if we assume that α = β = 0.8.

Next, suppose that we want to know what the probability is that Saturday and Sunday will both be dry if
Wednesday is dry.
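Both questions reduce to matrix powers: Wednesday to Saturday is three steps, so P(Saturday dry | Wednesday dry) = (P³)₁₁ = 0.608, and by the Markov property P(Saturday and Sunday both dry | Wednesday dry) = (P³)₁₁ · p₁₁ = 0.608 × 0.8 ≈ 0.486. A minimal NumPy sketch (states indexed from 0, with 0 = dry):

```python
import numpy as np

P = np.array([[0.8, 0.2],   # dry -> dry, dry -> wet
              [0.2, 0.8]])  # wet -> dry, wet -> wet

# Wednesday to Saturday is 3 steps, so we need the 3-step transition matrix.
P3 = np.linalg.matrix_power(P, 3)
print(P3[0, 0])            # P(Saturday dry | Wednesday dry) = 0.608

# Saturday AND Sunday dry: condition on Saturday and use the Markov property.
print(P3[0, 0] * P[0, 0])  # 0.608 * 0.8 = 0.4864
```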

Note: If we calculate the 7-step transition matrix, we get

$$P^7 = \begin{pmatrix} 0.51 & 0.49 \\ 0.49 & 0.51 \end{pmatrix}$$
We see that convergence to 0.5 is occurring, so eventually you will have a 50-50 chance that a day far into the future will be dry or wet. Is this always the case? The answer is no. In this case, convergence to 0.5 occurred because the one-step transition matrix P is doubly stochastic: each row as well as each column of P adds to one.
Example: Consider the example previously given on voting preferences. Suppose we want to know what the probabilities are that a Labour voter in this election will vote, respectively, Labour, Conservative, or Independent two elections from now. Denoting the states as 1, 2, 3, in notation, we want to find P(X2 = i | X0 = 1), i = 1, 2, 3. By the corollary, these are the entries of the first row of

$$P^2 = \begin{pmatrix} 0.6565 & 0.1000 & 0.2435 \\ 0.0580 & 0.8185 & 0.1235 \\ 0.1630 & 0.1750 & 0.6620 \end{pmatrix}.$$

Hence, the probabilities that a Labour voter in this election will vote Labour, Conservative, or
Independent two elections from now are 66%, 10%, and 24%. We also see from P2 that a Conservative
voter will vote Conservative two elections from now with 82% probability and has a chance of just 6% to
switch to Labour.
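A quick check of these figures (a sketch; states 0 = Labour, 1 = Conservative, 2 = Independent):

```python
import numpy as np

P = np.array([[0.80, 0.05, 0.15],
              [0.03, 0.90, 0.07],
              [0.10, 0.10, 0.80]])

P2 = np.linalg.matrix_power(P, 2)
print(P2[0])     # [0.6565, 0.1, 0.2435]: roughly 66%, 10%, 24% for a Labour voter
print(P2[1, 1])  # 0.8185: a Conservative voter stays Conservative with ~82%
print(P2[1, 0])  # 0.058: and switches to Labour with ~6%
```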

Example: A problem of interest to sociologists is to determine the proportion of society that has an
upper-class, middle-class, and lower-class occupation. One mathematical model is to assume that
transitions between classes of successive generations can be regarded as a Markov chain, that is, we
assume the future occupation of a child depends only on their father’s occupation. Suppose the transition
probability matrix is given by

$$P = \begin{pmatrix} 0.45 & 0.48 & 0.07 \\ 0.05 & 0.70 & 0.25 \\ 0.01 & 0.50 & 0.49 \end{pmatrix}$$

We can look at the long-term percentage of people in each job class by raising P to a high power and
inferring the limiting values.

$$P^7 = \begin{pmatrix} 0.0649 & 0.6232 & 0.3118 \\ 0.0623 & 0.6234 & 0.3142 \\ 0.0619 & 0.6235 & 0.3147 \end{pmatrix}$$
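Raising P to a high power is a one-line computation; the sketch below (rows and columns ordered upper, middle, lower class) shows the rows agreeing ever more closely, which is what lets us infer the limiting values.

```python
import numpy as np

P = np.array([[0.45, 0.48, 0.07],
              [0.05, 0.70, 0.25],
              [0.01, 0.50, 0.49]])

# At moderate powers the rows are already close to the limiting distribution...
print(np.linalg.matrix_power(P, 7))
# ...and at high powers they agree to many decimal places.
print(np.linalg.matrix_power(P, 50))  # every row ~ (0.0624, 0.6234, 0.3142)
```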
Limiting Probabilities

Rather than raising the transition probability matrices to high powers and then guessing the limiting
values, it is preferable to have mathematical theory to assist us. Unfortunately, the full treatment of this
topic would take a long time to develop. However, we will look at the definitions needed and the main
results below.

Definition: Consider a Markov chain {X0, X1, X2, …} with transition probability matrix P. We say that a probability distribution π = (πi) is stationary if it satisfies π = πP.

Note: We usually drop the subscript when using the formula.

The question then becomes whether every Markov chain has a stationary distribution that is also its limiting distribution. Unfortunately, the answer is no.

Definition: We say that a transition probability matrix is regular if some power of it only has positive
values.

Example: The matrix

$$P = \begin{pmatrix} 0.25 & 0.75 & 0 \\ 0.5 & 0.5 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

is not regular, because every power of P leaves the bottom row unchanged (and hence contains zeros): the third state is absorbing, and once state 3 is entered, the process remains there.

If we look at a high power of P, for example P²⁰, we get

$$P^{20} = \begin{pmatrix} 0.4 & 0.6 & 0 \\ 0.4 & 0.6 & 0 \\ 0 & 0 & 1 \end{pmatrix}$$

This indicates that if the chain starts in state 1 or 2, it will in the long run be found in state 1 with probability 0.4 and in state 2 with probability 0.6. However, if it starts in state 3, it will remain in state 3 with probability 1.
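The structural zeros are easy to confirm in code (a sketch): no matter how high the power, some entries remain exactly zero, which is precisely why P fails to be regular.

```python
import numpy as np

P = np.array([[0.25, 0.75, 0.0],
              [0.50, 0.50, 0.0],
              [0.00, 0.00, 1.0]])

for k in (1, 5, 20):
    Pk = np.linalg.matrix_power(P, k)
    # The bottom row and the third column above it never change from (0, 0, 1) / 0.
    print(k, (Pk == 0).any())  # True for every k: P is not regular
```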

Definition: We say that a transition probability matrix is irreducible if every state can be reached from
every other state (not necessarily in one step).

It is clear that in the example above, P is not irreducible since it is not possible to reach state 3 from
states 1 and 2, or vice-versa.

There is just one more definition needed before we can state the theorem which allows us to calculate
the limiting probabilities of Markov chains.

Definition: We say that a state has period k if any return to that state must occur in multiples of k time steps. A Markov chain is called periodic if it has a state with period k > 1, and aperiodic otherwise.

A simple example of this is the transition probability matrix


$$P = \begin{pmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix}$$

In this case, the process will cycle between states 1, 2, and 3 (and so each state has period 3).
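A two-line check (a sketch) makes the period visible: the third power of P is the identity matrix, so a return to any state can only happen after a multiple of 3 steps.

```python
import numpy as np

P = np.array([[0, 1, 0],
              [0, 0, 1],
              [1, 0, 0]])

print(np.linalg.matrix_power(P, 3))  # identity: every state returns in exactly 3 steps
print(np.linalg.matrix_power(P, 4))  # equals P again: the powers cycle forever
```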

Theorem: An irreducible, regular, aperiodic Markov chain {X0, X1, X2, …} with transition probability matrix P will have a unique stationary probability distribution π satisfying π = πP and $\sum_i \pi_i = 1$. For such a chain, the rows of Pⁿ converge to π, so π also gives the limiting probabilities.
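In practice, the system π = πP together with the normalization constraint is just a linear system and can be solved mechanically. Below is a minimal sketch; the helper name `stationary` is ours, not a standard library function.

```python
import numpy as np

def stationary(P: np.ndarray) -> np.ndarray:
    """Solve pi = pi P together with sum(pi) = 1."""
    n = P.shape[0]
    # pi = pi P  <=>  (P - I)^T pi^T = 0.  Stack the normalization row
    # sum(pi) = 1 below this to get a solvable (n+1) x n system.
    A = np.vstack([(P - np.eye(n)).T, np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

# The weather chain solved by hand below: returns [0.5, 0.5].
print(stationary(np.array([[0.8, 0.2], [0.2, 0.8]])))
```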

Example: Earlier, we looked at the (weather related) transition probability matrix

$$P = \begin{pmatrix} 0.8 & 0.2 \\ 0.2 & 0.8 \end{pmatrix}$$
This is clearly irreducible, regular, and aperiodic, so we solve the system of equations:

$$\begin{cases} \pi_1 = 0.8\pi_1 + 0.2\pi_2 \\ \pi_2 = 0.2\pi_1 + 0.8\pi_2 \\ \pi_1 + \pi_2 = 1 \end{cases}$$

We quickly get that π1 = π2 = 0.5, as was inferred earlier.

Example: Earlier, we looked at the (election related) transition probability matrix

$$P = \begin{pmatrix} 0.8 & 0.05 & 0.15 \\ 0.03 & 0.9 & 0.07 \\ 0.1 & 0.1 & 0.8 \end{pmatrix}$$

Again, this is irreducible, regular, and aperiodic, so we solve the system of equations:

$$\begin{cases} \pi_1 = 0.8\pi_1 + 0.03\pi_2 + 0.1\pi_3 \\ \pi_2 = 0.05\pi_1 + 0.9\pi_2 + 0.1\pi_3 \\ \pi_3 = 0.15\pi_1 + 0.07\pi_2 + 0.8\pi_3 \\ \pi_1 + \pi_2 + \pi_3 = 1 \end{cases}$$

The solution is $\pi_1 = \frac{26}{113} \approx 0.23$, $\pi_2 = \frac{50}{113} \approx 0.44$, $\pi_3 = \frac{37}{113} \approx 0.33$.
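These fractions are easy to verify numerically (a sketch):

```python
import numpy as np

P = np.array([[0.80, 0.05, 0.15],
              [0.03, 0.90, 0.07],
              [0.10, 0.10, 0.80]])

pi = np.array([26, 50, 37]) / 113
print(np.allclose(pi @ P, pi))  # True: pi = pi P, so pi is stationary
print(pi.round(4))              # [0.2301, 0.4425, 0.3274]
```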

Example: Earlier, we looked at the (social mobility) transition probability matrix

$$P = \begin{pmatrix} 0.45 & 0.48 & 0.07 \\ 0.05 & 0.7 & 0.25 \\ 0.01 & 0.5 & 0.49 \end{pmatrix}$$
Once more, this is irreducible, regular, and aperiodic, so we solve the system of equations:

$$\begin{cases} \pi_1 = 0.45\pi_1 + 0.05\pi_2 + 0.01\pi_3 \\ \pi_2 = 0.48\pi_1 + 0.7\pi_2 + 0.5\pi_3 \\ \pi_3 = 0.07\pi_1 + 0.25\pi_2 + 0.49\pi_3 \\ \pi_1 + \pi_2 + \pi_3 = 1 \end{cases}$$

The solution is $\pi_1 = \frac{140}{2244} \approx 0.06$, $\pi_2 = \frac{1399}{2244} \approx 0.62$, $\pi_3 = \frac{705}{2244} \approx 0.31$.
