Chapter 3: Expectation and Variance
In this chapter, we look at the same themes for expectation and variance.
The expectation of a random variable is the long-term average of the random
variable.
We will repeat the three themes of the previous chapter, but in a different order.
3.1 Expectation
For a discrete random variable X with probability function f_X(x),

E(X) = Σ_x x P(X = x) = Σ_x x f_X(x).

Expectation of g(X): for any function g,

E{g(X)} = Σ_x g(x) f_X(x).
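For example, E(X) and E{g(X)} for a simple discrete distribution can be computed directly from the probability function in R (a sketch, using an arbitrary illustrative distribution):

> # Sketch: X takes values 1, 2, 3 with probabilities 0.2, 0.5, 0.3
> x <- c(1, 2, 3)
> fx <- c(0.2, 0.5, 0.3)
> sum(x * fx)       # E(X)
> sum(x^2 * fx)     # E{g(X)} with g(x) = x^2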
Properties of Expectation
i) Let g and h be functions, and let a and b be constants. For any random
variable X (discrete or continuous),
E{a g(X) + b h(X)} = a E{g(X)} + b E{h(X)}.

In particular,

E(aX + b) = aE(X) + b.
ii) Let X and Y be independent random variables. Then

E(XY) = E(X)E(Y),

and, more generally, for any functions g and h,

E{g(X) h(Y)} = E{g(X)} E{h(Y)}.
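These properties are easy to confirm numerically, for instance with the following R sketch (the distributions chosen for X and Y are arbitrary illustrations):

> # Check E(aX + b) = aE(X) + b, and E(XY) = E(X)E(Y) for independent X, Y
> x <- rpois(1e6, lambda=4)        # arbitrary distribution for X
> y <- rexp(1e6, rate=2)           # generated independently of X
> mean(3*x + 5); 3*mean(x) + 5     # both close to 3*4 + 5 = 17
> mean(x*y); mean(x)*mean(y)       # both close to 4*0.5 = 2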
Probability as an Expectation
Let I_A be the indicator random variable of event A: I_A = 1 if A occurs, and I_A = 0 otherwise. Then

E(I_A) = 0 × P(I_A = 0) + 1 × P(I_A = 1)
       = P(I_A = 1)
       = P(A).

Thus

P(A) = E(I_A) for any event A.
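For example, in R (a simulation sketch, taking A to be the event that a fair die shows a six):

> # The sample mean of the indicator I_A estimates P(A) = 1/6
> rolls <- sample(1:6, 1e6, replace=TRUE)
> I.A <- as.numeric(rolls == 6)
> mean(I.A)     # close to 1/6 = 0.167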
Variance

The variance of a random variable X is a measure of how spread out it is.
Are the values of X clustered tightly around their mean, or can we
commonly observe values of X a long way from the mean value? The
variance measures how far the values of X are from their mean, on
average.
The variance is the mean squared deviation of a random variable from its own mean: Var(X) = E{(X − µ_X)²}, where µ_X = E(X).
If X has high variance, we can observe values of X a long way from the mean.
If X has low variance, the values of X tend to be clustered tightly around the
mean value.
Example: Let X be a continuous random variable with probability density function f_X(x) = 2/x² for 1 < x < 2. Then

E(X) = ∫_1^2 x f_X(x) dx = ∫_1^2 (2/x) dx = [2 log(x)]_1^2
     = 2 log(2) − 2 log(1)
     = 2 log(2).

E(X²) = ∫_1^2 x² f_X(x) dx = ∫_1^2 2 dx = [2x]_1^2
      = 2 × 2 − 2 × 1
      = 2.

Thus

Var(X) = E(X²) − {E(X)}²
       = 2 − {2 log(2)}²
       = 0.0782.
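This working can be checked numerically in R (a sketch, using the density f_X(x) = 2/x² on (1, 2) from the integrals above):

> # Numerical check of E(X), E(X^2) and Var(X) for f(x) = 2/x^2 on (1, 2)
> f <- function(x) 2/x^2
> EX  <- integrate(function(x) x   * f(x), 1, 2)$value   # 2*log(2) = 1.386
> EX2 <- integrate(function(x) x^2 * f(x), 1, 2)$value   # 2
> EX2 - EX^2                                             # 0.0782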
Covariance
Covariance is a measure of the association or dependence between two
random variables X and Y . Covariance can be either positive or negative.
(Variance is always non-negative.)
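Formally, the covariance of X and Y is

cov(X, Y) = E{(X − µ_X)(Y − µ_Y)} = E(XY) − E(X)E(Y),

where µ_X = E(X) and µ_Y = E(Y).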
1. cov(X, Y ) will be positive if large values of X tend to occur with large values
of Y , and small values of X tend to occur with small values of Y . For
example, if X is height and Y is weight of a randomly selected person, we
would expect cov(X, Y ) to be positive.
2. cov(X, Y ) will be negative if large values of X tend to occur with small values
of Y , and small values of X tend to occur with large values of Y . For
example, if X is age of a randomly selected person, and Y is heart rate, we
would expect X and Y to be negatively correlated (older people have slower
heart rates).
3. If X and Y are independent, then there is no pattern between large values of X and large values of Y, so cov(X, Y) = 0. However, cov(X, Y) = 0 does NOT imply that X and Y are independent, unless X and Y are jointly Normally distributed.
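A quick simulation sketch of points 1 to 3 in R, using illustrative Normal random variables:

> # Positively related, negatively related, and independent pairs
> x <- rnorm(1e5)
> cov(x,  x + rnorm(1e5))     # positive
> cov(x, -x + rnorm(1e5))     # negative
> cov(x,      rnorm(1e5))     # close to 0 (independent)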
Properties of Variance
i) Let g be a function, and let a and b be constants. For any random variable X
(discrete or continuous),
Var{a g(X) + b} = a² Var{g(X)}.

In particular, Var(aX + b) = a² Var(X).

The correlation coefficient of X and Y is

corr(X, Y) = cov(X, Y) / √(Var(X) Var(Y)).
Throughout this section, we will assume for simplicity that X and Y are discrete random variables. However, exactly the same results hold for continuous random variables too.
Suppose that X and Y are discrete random variables, possibly dependent
on each other. Suppose that we fix Y at the value y. This gives us a set of
conditional probabilities P(X = x | Y = y) for all possible values x of X. This is
called the conditional distribution of X, given that Y = y.
P(X = x | Y = y) = P(X = x AND Y = y) / P(Y = y).
We write the conditional probability function as:

f_{X|Y}(x | y) = P(X = x | Y = y).
Note that, for each fixed y, Σ_x f_{X|Y}(x | y) = 1, so f_{X|Y} is a genuine probability function.
We can also find the expectation and variance of X with respect to this
conditional distribution. That is, if we know that the value of Y is fixed at y,
then we can find the mean value of X given that Y takes the value y, and
also the variance of X given that Y = y.
The conditional expectation of X, given that Y = y, is

E(X | Y = y) = Σ_x x f_{X|Y}(x | y) = Σ_x x P(X = x | Y = y).

E(X | Y = y) is the mean value of X, when Y is fixed at y.
We can therefore view E(X | Y = y) as a function of y, say E(X | Y = y) = h(y). Applying this function to the random variable Y itself gives the random variable E(X | Y) = h(Y).
Example: Suppose

Y = 1 with probability 1/8,
    2 with probability 7/8,

and

X | Y = 2Y with probability 3/4,
        3Y with probability 1/4.

Then

X | (Y = 2) = 4 with probability 3/4,
              6 with probability 1/4,

so E(X | Y = 2) = 4 × 3/4 + 6 × 1/4 = 18/4.

Similarly,

X | (Y = 1) = 2 with probability 3/4,
              3 with probability 1/4,

so E(X | Y = 1) = 2 × 3/4 + 3 × 1/4 = 9/4.

Thus

E(X | Y = y) = 9/4  if y = 1,
               18/4 if y = 2.
Conditional variance
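The conditional variance of X, given that Y = y, is defined in the same way, using the conditional distribution of X given Y = y:

Var(X | Y = y) = E(X² | Y = y) − {E(X | Y = y)}².

As with conditional expectation, Var(X | Y) can be viewed as a function of the random variable Y.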
If all the expectations below are finite, then for ANY random variables X and Y, we have:

i) E(X) = E_Y{E(X | Y)}.   (Law of Total Expectation)

Note that we can pick any r.v. Y, to make the conditional expectation as easy to calculate as we can.

ii) E{g(X)} = E_Y{E(g(X) | Y)} for any function g.

iii) Var(X) = E_Y{Var(X | Y)} + Var_Y{E(X | Y)}.   (Law of Total Variance)
The Law of Total Expectation says that the total average is the average of case-by-case averages.
Example: In the example above, we had:

E(X | Y) = 9/4  with probability 1/8,
           18/4 with probability 7/8.

The total average is:

E(X) = E_Y{E(X | Y)} = 9/4 × 1/8 + 18/4 × 7/8 = 4.22.
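This can be confirmed by simulation in R (a sketch of the two-stage experiment above):

> # Simulate Y, then X given Y, and compare mean(x) with 135/32 = 4.22
> n <- 1e6
> y <- sample(c(1, 2), n, replace=TRUE, prob=c(1/8, 7/8))
> mult <- sample(c(2, 3), n, replace=TRUE, prob=c(3/4, 1/4))
> x <- mult * y
> mean(x)     # close to 4.22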
Proof of (ii): by the Partition Theorem,

E_Y{E(g(X) | Y)} = Σ_y E(g(X) | Y = y) P(Y = y)
                 = Σ_y Σ_x g(x) P(X = x | Y = y) P(Y = y)
                 = Σ_x g(x) P(X = x)
                 = E(g(X)) = LHS.

Proof of (iii):

RHS = E_Y{Var(X | Y)} + Var_Y{E(X | Y)}
    = E_Y{ E(X² | Y) − [E(X | Y)]² } + ( E_Y{[E(X | Y)]²} − [E_Y{E(X | Y)}]² )
    = E_Y{E(X² | Y)} − E_Y{[E(X | Y)]²} + E_Y{[E(X | Y)]²} − [E_Y{E(X | Y)}]²
    = E(X²) − {E(X)}²        (using part (ii) for the first term and part (i) for the last)
    = Var(X) = LHS.
Let Y be the number of consecutive days Fraser has to work between bad
weather days. Let X be the total number of customers who go on Fraser’s
trip in this period of Y days. Conditional on Y , the distribution of X is
(X | Y ) ∼ Poisson(µY ).
Y has a Geometric(p) distribution, so

E(Y) = (1 − p)/p,    Var(Y) = (1 − p)/p².
∴ E(X) = E_Y{E(X | Y)} = E_Y(µY) = µ E_Y(Y) = µ(1 − p)/p.
Variance: By the Law of Total Variance,

Var(X) = E_Y{Var(X | Y)} + Var_Y{E(X | Y)}
       = E_Y(µY) + Var_Y(µY)
       = µ E_Y(Y) + µ² Var_Y(Y)
       = µ(1 − p)/p + µ²(1 − p)/p²
       = µ(1 − p)(p + µ)/p².
Checking your answer in R:
If you know how to use a statistical package like R, you can check your
answer to the question above as follows.
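For illustration, a sketch of such a check, taking Y ~ Geometric(p) and (X | Y) ~ Poisson(µY) as above, with µ = 25 and p = 0.2 (values consistent with the figures E(X) = 100 and Var(X) = 12600 quoted below):

> # Simulate the randomly stopped Poisson example
> mu <- 25; p <- 0.2
> y <- rgeom(1e6, prob=p)          # rgeom counts failures before the first success
> x <- rpois(1e6, lambda=mu*y)
> mean(x)    # compare with mu*(1-p)/p = 100
> var(x)     # compare with mu*(1-p)*(p+mu)/p^2 = 12600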
The formulas we obtained by working give E(X) = 100 and Var(X) = 12600.
The sample mean was x = 100.6606 (close to 100), and the sample
variance was 12624.47 (close to 12600). Thus our working seems to have
been correct.
2. Randomly stopped sum
Let T_N = X_1 + X_2 + . . . + X_N be the total amount of money withdrawn in a day, where each X_i has the probability function above, and X_1, X_2, . . . are independent of each other and of N.

T_N is a randomly stopped sum, stopped by the random number of customers, N.
Solution
(a) Exercise.
(b) Let T_N = Σ_{i=1}^{N} X_i. If we knew how many terms were in the sum, we could easily find E(T_N) and Var(T_N) as the mean and variance of a sum of independent r.v.s. So 'pretend' we know how many terms are in the sum: i.e. condition on N.
E(T_N | N) = E(X_1 + X_2 + . . . + X_N | N)
           = E(X_1 + X_2 + . . . + X_N), where N is now considered constant
             (because all X_i's are independent of N)
           = E(X_1) + E(X_2) + . . . + E(X_N)
             (we do NOT need independence of the X_i's for this)
           = N × E(X)   (because all X_i's have the same mean, E(X))
           = 105N.
Similarly,

Var(T_N | N) = Var(X_1 + X_2 + . . . + X_N | N)
            = Var(X_1 + X_2 + . . . + X_N), where N is now considered constant
              (because all X_i's are independent of N)
            = Var(X_1) + Var(X_2) + . . . + Var(X_N)
              (we DO need independence of the X_i's for this)
            = N × Var(X)   (because all X_i's have the same variance, Var(X))
            = 2725N.
So

E(T_N) = E_N{E(T_N | N)}
       = E_N(105N)
       = 105 E_N(N)
       = 105λ,

and

Var(T_N) = E_N{Var(T_N | N)} + Var_N{E(T_N | N)}
         = E_N(2725N) + Var_N(105N)
         = 2725 E_N(N) + 105² Var_N(N)
         = 2725λ + 11025λ
         = 13750λ,

using E_N(N) = Var_N(N) = λ.
Check in R (advanced)
> # Create a function tn.func to calculate a single value of T_N
> # for a given value N=n:
> tn.func <- function(n){
    sum(sample(c(50, 100, 200), n, replace=T,
               prob=c(0.3, 0.5, 0.2)))
  }
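A sketch of how the check might continue, taking N ~ Poisson(lambda) with an illustrative value of lambda:

> # Simulate many days and compare the sample mean and variance of T_N
> # with 105*lambda and 13750*lambda:
> lambda <- 20                     # illustrative value only
> N <- rpois(10000, lambda)        # random number of customers each day
> tn <- sapply(N, tn.func)         # one value of T_N per simulated day
> mean(tn)                         # compare with 105*lambda
> var(tn)                          # compare with 13750*lambda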
Suppose X_1, X_2, . . . each have the same mean µ and variance σ², and X_1, X_2, . . ., and N are mutually independent. Let T_N = X_1 + . . . + X_N be the randomly stopped sum. By following similar working to that above:

E(T_N) = E( Σ_{i=1}^{N} X_i ) = µ E(N),

Var(T_N) = Var( Σ_{i=1}^{N} X_i ) = σ² E(N) + µ² Var(N).
Remember from Section 2.6 that we use First-Step Analysis for finding the
probability of eventually reaching a particular state in a stochastic process.
First-step analysis for probabilities uses conditional probability and the
Partition Theorem (Law of Total Probability).
In the same way, we can use first-step analysis for finding the expected
reaching time for a state.
This is because the first-step options form a partition of the sample space.
This follows immediately from the law of total expectation:
E(X) = E_Y{E(X | Y)} = Σ_y E(X | Y = y) P(Y = y).

Let X be the reaching time, and let Y be the label for possible options: i.e. Y = 1, 2, 3, . . . for options 1, 2, 3, . . .

We then obtain:

E(X) = Σ_y E(X | Y = y) P(Y = y),

i.e. E(reaching time) = Σ over first-step options of E(reaching time | option) × P(option).
Then:

E(X) = E(X | Y = 1) × 1/3 + E(X | Y = 2) × 1/3 + E(X | Y = 3) × 1/3.

But:

E(X | Y = 1) = 3 minutes
E(X | Y = 2) = 5 + E(X)   (after 5 mins, back in Room, time E(X) to get out)
E(X | Y = 3) = 7 + E(X)   (after 7 mins, back in Room)

So

E(X) = 3 × 1/3 + (5 + E(X)) × 1/3 + (7 + E(X)) × 1/3
     = 15 × 1/3 + 2 E(X) × 1/3

⇒ (1/3) E(X) = 15 × 1/3

⇒ E(X) = 15 minutes.
First-step analysis:

m_R = 1/3 × 3 + 1/3 × (5 + m_R) + 1/3 × (7 + m_R)

⇒ 3 m_R = (3 + 5 + 7) + 2 m_R

⇒ m_R = 15 minutes.
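The answer can be confirmed by simulation in R (a sketch, using the three equally likely options implied by the conditional expectations above):

> # Monte Carlo estimate of the expected escape time; should be close to 15
> escape.time <- function(){
    t <- 0
    repeat{
      option <- sample(1:3, 1)
      if(option == 1) return(t + 3)    # option 1: out in 3 minutes
      if(option == 2) t <- t + 5       # option 2: 5 minutes, back in Room
      if(option == 3) t <- t + 7       # option 3: 7 minutes, back in Room
    }
  }
> mean(replicate(10000, escape.time()))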
You must remember to add on whatever we are counting, to every step taken.
Example: The mouse is put in a new maze with two rooms, pictured here. Starting from Room 1, what is the expected number of steps the mouse takes before it reaches the exit?

[Maze diagram: Room 1, Room 2, and the EXIT, with each transition probability equal to 1/3.]

1. Define notation: let m_1 be the expected number of steps to reach the exit, starting from Room 1 (and m_2 the same, starting from Room 2).
(a) ⇒ 3 m_1 = 3 + 2 m_1
⇒ m_1 = 3 steps.
In each case, we will get the same answer (check). This is because the option probabilities sum to 1, so in Method 1 we are adding (1/3 + 1/3 + 1/3) × 1 = 1 × 1 = 1, just as we are in Method 2.
Recall from Section 3.1 that for any event A, we can write P(A) as an expectation as follows. Let I_A be the indicator random variable of event A: then E(I_A) = P(I_A = 1) = P(A).
But

E(I_A | Y) = Σ_{r=0}^{1} r P(I_A = r | Y)
           = 0 × P(I_A = 0 | Y) + 1 × P(I_A = 1 | Y)
           = P(I_A = 1 | Y)
           = P(A | Y).

Thus

P(A) = E_Y{E(I_A | Y)} = E_Y{P(A | Y)}.
This means that for any random variable X (discrete or continuous), and for any set of values S (a discrete set or a continuous set), we can write:

P(X ∈ S) = E_Y{P(X ∈ S | Y)}.
You watch the lottery draw on TV and your numbers match the winners!!
You had a one-in-a-million chance, and there were a million players, so it
must be YOU, right?
Not so fast. Before you rush to claim your prize, let’s calculate the
probability that you really will win. You definitely win if you are the only
person with matching numbers, but you can also win if there are
multiple matching tickets and yours is the one selected at random from the
matches.
Define Y to be the number of OTHER matching tickets out of the OTHER 1
million tickets sold. (If you are lucky, Y = 0 so you have definitely won.)
If there are 1 million tickets and each ticket has a one-in-a-million chance of
having the winning numbers, then
Y ∼ Poisson(1) approximately.
The relationship Y ∼ Poisson(1) arises because of the Poisson
approximation to the Binomial distribution.
f_Y(y) = P(Y = y) = (1^y / y!) e^{−1} = 1/(e × y!)   for y = 0, 1, 2, . . .
(b) What is the probability that yours is the only matching ticket?
P(only one matching ticket) = P(Y = 0) = 1/e = 0.368.
(c) The prize is chosen at random from all those who have matching
tickets. What is the probability that you win if there are Y = y OTHER
matching tickets?
Let W be the event that I win.
P(W | Y = y) = 1/(y + 1).
(d) Overall, what is the probability that you win, given that you have a matching ticket?

P(W) = E_Y{P(W | Y)}
     = Σ_{y=0}^{∞} P(W | Y = y) P(Y = y)
     = Σ_{y=0}^{∞} (1/(y + 1)) × (1/(e × y!))
     = (1/e) Σ_{y=0}^{∞} 1/((y + 1) y!)
     = (1/e) Σ_{y=0}^{∞} 1/(y + 1)!
     = (1/e) { Σ_{y=0}^{∞} 1/y!  −  1/0! }
     = (1/e)(e − 1)
     = 1 − 1/e
     = 0.632.
Disappointing?
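A quick numerical check of this sum in R:

> # Partial sums of (1/e) * 1/((y+1)*y!) converge to 1 - 1/e = 0.632
> y <- 0:20
> sum(1/(exp(1) * (y + 1) * factorial(y)))
> 1 - exp(-1)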
3.7 Special process: a model for gene spread
Stochastic process:
The state of the process at time t is Xt = the number of animals with allele A
at generation t.
[ Xt+1 | Xt = x ] ∼ Binomial(N, x/N).
If [ Xt+1 | Xt = x ] ∼ Binomial(N, x/N), then

P(Xt+1 = y | Xt = x) = (N choose y) (x/N)^y (1 − x/N)^{N−y}   (Binomial formula).

Example with N = 3:
Transition diagram:

[Diagram: states 0, 1, 2, 3, with transition probabilities labelled a, b, c and d.]
Probability the harmful allele A dies out
Suppose the process starts at generation 0. One of the three animals has
the harmful allele A. Define a suitable notation, and find the probability that
the harmful allele A eventually dies out.
Suppose again that the process starts at generation 0, and one of the three
animals has the harmful allele A. Eventually all animals will have the same
allele, whether it is allele A or B. When this happens, the population is said
to have reached fixation: it is fixed for a single allele and no further changes
are possible.
Things get more interesting for large N. When N = 100, and x = 10 animals
have the harmful allele at generation 0, there is a 90% chance that the
harmful allele will die out and a 10% chance that the harmful allele will take
over the whole population. The expected number of generations taken to
reach fixation is 63.5. If the process starts with just x = 1 animal with the
harmful allele, there is a 99% chance the harmful allele will die out, but the
expected number of generations to fixation is 10.5. Despite the allele being
rare, the average number of generations for it to either die out or saturate
the population is quite large.
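These figures can be checked by simulation in R (a sketch, using the Binomial(N, x/N) model above):

> # Simulate the gene-spread chain until fixation (all 0 or all N),
> # starting from x animals with allele A out of N:
> run.to.fixation <- function(N, x){
    gens <- 0
    while(x > 0 && x < N){
      x <- rbinom(1, size=N, prob=x/N)
      gens <- gens + 1
    }
    c(died.out = (x == 0), generations = gens)
  }
> res <- replicate(10000, run.to.fixation(N=100, x=10))
> mean(res["died.out", ])       # should be close to 0.9
> mean(res["generations", ])    # should be close to 63.5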
Note: The model above is also an example of a process called the Voter
Process. The N individuals correspond to N people who each support one
of two political candidates, A or B. Every day they make a new decision
about whom to support, based on the amount of current support for each
candidate. Fixation in the genetic model corresponds to consensus in the Voter
Voter Process.