Chapter 3: Expectation and Variance
In this chapter, we look at the same themes for expectation and variance.
The expectation of a random variable is the long-term average of the random
variable.
We will repeat the three themes of the previous chapter, but in a different order.
3.1 Expectation
For a discrete random variable X with probability function f_X(x),

E(X) = Σ_x x P(X = x) = Σ_x x f_X(x).

Expectation of g(X): for any function g,

E{g(X)} = Σ_x g(x) f_X(x).
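For example, E(X) and E{g(X)} for a simple discrete distribution can be computed directly from the probability function in R (a sketch, using an arbitrary illustrative distribution):

> # Sketch: X takes values 1, 2, 3 with probabilities 0.2, 0.5, 0.3
> x <- c(1, 2, 3)
> fx <- c(0.2, 0.5, 0.3)
> sum(x * fx)       # E(X)
> sum(x^2 * fx)     # E{g(X)} with g(x) = x^2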
Properties of Expectation
i) Let g and h be functions, and let a and b be constants. For any random
variable X (discrete or continuous),
E{a g(X) + b h(X)} = a E{g(X)} + b E{h(X)}.

In particular,

E(aX + b) = aE(X) + b.
ii) Let X and Y be independent random variables. Then

E(XY) = E(X)E(Y),

and, more generally, for any functions g and h,

E{g(X) h(Y)} = E{g(X)} E{h(Y)}.
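These properties are easy to confirm numerically, for instance with the following R sketch (the distributions chosen for X and Y are arbitrary illustrations):

> # Check E(aX + b) = aE(X) + b, and E(XY) = E(X)E(Y) for independent X, Y
> x <- rpois(1e6, lambda=4)        # arbitrary distribution for X
> y <- rexp(1e6, rate=2)           # generated independently of X
> mean(3*x + 5); 3*mean(x) + 5     # both close to 3*4 + 5 = 17
> mean(x*y); mean(x)*mean(y)       # both close to 4*0.5 = 2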
Probability as an Expectation
Let I_A be the indicator random variable of event A: I_A = 1 if A occurs, and I_A = 0 otherwise. Then

E(I_A) = 0 × P(I_A = 0) + 1 × P(I_A = 1)
       = P(I_A = 1)
       = P(A).

Thus

P(A) = E(I_A) for any event A.
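For example, in R (a simulation sketch, taking A to be the event that a fair die shows a six):

> # The sample mean of the indicator I_A estimates P(A) = 1/6
> rolls <- sample(1:6, 1e6, replace=TRUE)
> I.A <- as.numeric(rolls == 6)
> mean(I.A)     # close to 1/6 = 0.167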
Variance

The variance of a random variable X is a measure of how spread out it is.
Are the values of X clustered tightly around their mean, or can we
commonly observe values of X a long way from the mean value? The
variance measures how far the values of X are from their mean, on
average.
The variance is the mean squared deviation of a random variable from its own mean: Var(X) = E{(X − µ_X)²}, where µ_X = E(X).
If X has high variance, we can observe values of X a long way from the mean.
If X has low variance, the values of X tend to be clustered tightly around the
mean value.
Example: Let X be a continuous random variable with probability density function f_X(x) = 2/x² for 1 < x < 2. Then

E(X) = ∫_1^2 x f_X(x) dx = ∫_1^2 (2/x) dx = [2 log(x)]_1^2
     = 2 log(2) − 2 log(1)
     = 2 log(2).

E(X²) = ∫_1^2 x² f_X(x) dx = ∫_1^2 2 dx = [2x]_1^2
      = 2 × 2 − 2 × 1
      = 2.

Thus

Var(X) = E(X²) − {E(X)}²
       = 2 − {2 log(2)}²
       = 0.0782.
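This working can be checked numerically in R (a sketch, using the density f_X(x) = 2/x² on (1, 2) from the integrals above):

> # Numerical check of E(X), E(X^2) and Var(X) for f(x) = 2/x^2 on (1, 2)
> f <- function(x) 2/x^2
> EX  <- integrate(function(x) x   * f(x), 1, 2)$value   # 2*log(2) = 1.386
> EX2 <- integrate(function(x) x^2 * f(x), 1, 2)$value   # 2
> EX2 - EX^2                                             # 0.0782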
Covariance
Covariance is a measure of the association or dependence between two
random variables X and Y . Covariance can be either positive or negative.
(Variance is always non-negative.)
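Formally, the covariance of X and Y is

cov(X, Y) = E{(X − µ_X)(Y − µ_Y)} = E(XY) − E(X)E(Y),

where µ_X = E(X) and µ_Y = E(Y).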
1. cov(X, Y ) will be positive if large values of X tend to occur with large values
of Y , and small values of X tend to occur with small values of Y . For
example, if X is height and Y is weight of a randomly selected person, we
would expect cov(X, Y ) to be positive.
2. cov(X, Y ) will be negative if large values of X tend to occur with small values
of Y , and small values of X tend to occur with large values of Y . For
example, if X is age of a randomly selected person, and Y is heart rate, we
would expect X and Y to be negatively correlated (older people have slower
heart rates).
3. If X and Y are independent, then there is no pattern between large values of X and large values of Y, so cov(X, Y) = 0. However, cov(X, Y) = 0 does NOT imply that X and Y are independent, unless X and Y are jointly Normally distributed.
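A quick simulation sketch of points 1 to 3 in R, using illustrative Normal random variables:

> # Positively related, negatively related, and independent pairs
> x <- rnorm(1e5)
> cov(x,  x + rnorm(1e5))     # positive
> cov(x, -x + rnorm(1e5))     # negative
> cov(x,      rnorm(1e5))     # close to 0 (independent)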
Properties of Variance
i) Let g be a function, and let a and b be constants. For any random variable X
(discrete or continuous),
Var{a g(X) + b} = a² Var{g(X)}.

In particular, Var(aX + b) = a² Var(X).

The correlation coefficient of X and Y is

corr(X, Y) = cov(X, Y) / √(Var(X) Var(Y)).
Throughout this section, we will assume for simplicity that X and Y are discrete random variables. However, exactly the same results hold for continuous random variables too.
Suppose that X and Y are discrete random variables, possibly dependent
on each other. Suppose that we fix Y at the value y. This gives us a set of
conditional probabilities P(X = x | Y = y) for all possible values x of X. This is
called the conditional distribution of X, given that Y = y.
P(X = x | Y = y) = P(X = x AND Y = y) / P(Y = y).
We write the conditional probability function as:

f_{X|Y}(x | y) = P(X = x | Y = y).
Note that, for each fixed y, Σ_x f_{X|Y}(x | y) = 1, so f_{X|Y} is a genuine probability function.
We can also find the expectation and variance of X with respect to this
conditional distribution. That is, if we know that the value of Y is fixed at y,
then we can find the mean value of X given that Y takes the value y, and
also the variance of X given that Y = y.
The conditional expectation of X, given that Y = y, is

E(X | Y = y) = Σ_x x f_{X|Y}(x | y) = Σ_x x P(X = x | Y = y).

E(X | Y = y) is the mean value of X, when Y is fixed at y.
We can therefore view E(X | Y = y) as a function of y, say E(X | Y = y) = h(y). Applying this function to the random variable Y itself gives the random variable E(X | Y) = h(Y).
Example: Suppose

Y = 1 with probability 1/8,
    2 with probability 7/8,

and

X | Y = 2Y with probability 3/4,
        3Y with probability 1/4.

Then

X | (Y = 2) = 4 with probability 3/4,
              6 with probability 1/4,

so E(X | Y = 2) = 4 × 3/4 + 6 × 1/4 = 18/4.

Similarly,

X | (Y = 1) = 2 with probability 3/4,
              3 with probability 1/4,

so E(X | Y = 1) = 2 × 3/4 + 3 × 1/4 = 9/4.

Thus

E(X | Y = y) = 9/4  if y = 1,
               18/4 if y = 2.
Conditional variance
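The conditional variance of X, given that Y = y, is defined in the same way, using the conditional distribution of X given Y = y:

Var(X | Y = y) = E(X² | Y = y) − {E(X | Y = y)}².

As with conditional expectation, Var(X | Y) can be viewed as a function of the random variable Y.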
If all the expectations below are finite, then for ANY random variables X and Y, we have:

i) E(X) = E_Y{E(X | Y)}.   (Law of Total Expectation)

Note that we can pick any r.v. Y, to make the conditional expectation as easy to calculate as we can.

ii) E{g(X)} = E_Y{E(g(X) | Y)} for any function g.

iii) Var(X) = E_Y{Var(X | Y)} + Var_Y{E(X | Y)}.   (Law of Total Variance)
The Law of Total Expectation says that the total average is the average of case-by-case averages.
Example: In the example above, we had:

E(X | Y) = 9/4  with probability 1/8,
           18/4 with probability 7/8.

The total average is:

E(X) = E_Y{E(X | Y)} = 9/4 × 1/8 + 18/4 × 7/8 = 4.22.
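This can be confirmed by simulation in R (a sketch of the two-stage experiment above):

> # Simulate Y, then X given Y, and compare mean(x) with 135/32 = 4.22
> n <- 1e6
> y <- sample(c(1, 2), n, replace=TRUE, prob=c(1/8, 7/8))
> mult <- sample(c(2, 3), n, replace=TRUE, prob=c(3/4, 1/4))
> x <- mult * y
> mean(x)     # close to 4.22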
Proof of (ii): by the Partition Theorem,

E_Y{E(g(X) | Y)} = Σ_y E(g(X) | Y = y) P(Y = y)
                 = Σ_y Σ_x g(x) P(X = x | Y = y) P(Y = y)
                 = Σ_x g(x) P(X = x)
                 = E(g(X)) = LHS.

Proof of (iii):

RHS = E_Y{Var(X | Y)} + Var_Y{E(X | Y)}
    = E_Y{ E(X² | Y) − [E(X | Y)]² } + ( E_Y{[E(X | Y)]²} − [E_Y{E(X | Y)}]² )
    = E_Y{E(X² | Y)} − E_Y{[E(X | Y)]²} + E_Y{[E(X | Y)]²} − [E_Y{E(X | Y)}]²
    = E(X²) − {E(X)}²        (using part (ii) for the first term and part (i) for the last)
    = Var(X) = LHS.
Let Y be the number of consecutive days Fraser has to work between bad
weather days. Let X be the total number of customers who go on Fraser’s
trip in this period of Y days. Conditional on Y , the distribution of X is
(X | Y ) ∼ Poisson(µY ).
Y has a Geometric(p) distribution, so

E(Y) = (1 − p)/p,    Var(Y) = (1 − p)/p².
∴ E(X) = E_Y{E(X | Y)} = E_Y(µY) = µ E_Y(Y) = µ(1 − p)/p.
Variance: By the Law of Total Variance,

Var(X) = E_Y{Var(X | Y)} + Var_Y{E(X | Y)}
       = E_Y(µY) + Var_Y(µY)
       = µ E_Y(Y) + µ² Var_Y(Y)
       = µ(1 − p)/p + µ²(1 − p)/p²
       = µ(1 − p)(p + µ)/p².
Checking your answer in R:
If you know how to use a statistical package like R, you can check your
answer to the question above as follows.
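For illustration, a sketch of such a check, taking Y ~ Geometric(p) and (X | Y) ~ Poisson(µY) as above, with µ = 25 and p = 0.2 (values consistent with the figures E(X) = 100 and Var(X) = 12600 quoted below):

> # Simulate the randomly stopped Poisson example
> mu <- 25; p <- 0.2
> y <- rgeom(1e6, prob=p)          # rgeom counts failures before the first success
> x <- rpois(1e6, lambda=mu*y)
> mean(x)    # compare with mu*(1-p)/p = 100
> var(x)     # compare with mu*(1-p)*(p+mu)/p^2 = 12600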
The formulas we obtained by working give E(X) = 100 and Var(X) = 12600.
The sample mean was x = 100.6606 (close to 100), and the sample
variance was 12624.47 (close to 12600). Thus our working seems to have
been correct.
2. Randomly stopped sum
Let T_N = X_1 + X_2 + . . . + X_N be the total amount of money withdrawn in a day, where each X_i has the probability function above, and X_1, X_2, . . . are independent of each other and of N.

T_N is a randomly stopped sum, stopped by the random number of customers, N.
Solution
(a) Exercise.
(b) Let T_N = Σ_{i=1}^{N} X_i. If we knew how many terms were in the sum, we could easily find E(T_N) and Var(T_N) as the mean and variance of a sum of independent r.v.s. So 'pretend' we know how many terms are in the sum: i.e. condition on N.
E(T_N | N) = E(X_1 + X_2 + . . . + X_N | N)
           = E(X_1 + X_2 + . . . + X_N), where N is now considered constant
             (because all X_i's are independent of N)
           = E(X_1) + E(X_2) + . . . + E(X_N)
             (we do NOT need independence of the X_i's for this)
           = N × E(X)   (because all X_i's have the same mean, E(X))
           = 105N.
Similarly,

Var(T_N | N) = Var(X_1 + X_2 + . . . + X_N | N)
            = Var(X_1 + X_2 + . . . + X_N), where N is now considered constant
              (because all X_i's are independent of N)
            = Var(X_1) + Var(X_2) + . . . + Var(X_N)
              (we DO need independence of the X_i's for this)
            = N × Var(X)   (because all X_i's have the same variance, Var(X))
            = 2725N.
So

E(T_N) = E_N{E(T_N | N)}
       = E_N(105N)
       = 105 E_N(N)
       = 105λ,

and

Var(T_N) = E_N{Var(T_N | N)} + Var_N{E(T_N | N)}
         = E_N(2725N) + Var_N(105N)
         = 2725 E_N(N) + 105² Var_N(N)
         = 2725λ + 11025λ
         = 13750λ,

using E_N(N) = Var_N(N) = λ.
Check in R (advanced)
> # Create a function tn.func to calculate a single value of T_N
> # for a given value N=n:
> tn.func <- function(n){
    sum(sample(c(50, 100, 200), n, replace=T,
               prob=c(0.3, 0.5, 0.2)))
  }
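A sketch of how the check might continue, taking N ~ Poisson(lambda) with an illustrative value of lambda:

> # Simulate many days and compare the sample mean and variance of T_N
> # with 105*lambda and 13750*lambda:
> lambda <- 20                     # illustrative value only
> N <- rpois(10000, lambda)        # random number of customers each day
> tn <- sapply(N, tn.func)         # one value of T_N per simulated day
> mean(tn)                         # compare with 105*lambda
> var(tn)                          # compare with 13750*lambda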
Suppose X_1, X_2, . . . each have the same mean µ and variance σ², and X_1, X_2, . . ., and N are mutually independent. Let T_N = X_1 + . . . + X_N be the randomly stopped sum. By following similar working to that above:

E(T_N) = E( Σ_{i=1}^{N} X_i ) = µ E(N),

Var(T_N) = Var( Σ_{i=1}^{N} X_i ) = σ² E(N) + µ² Var(N).
Remember from Section 2.6 that we use First-Step Analysis for finding the
probability of eventually reaching a particular state in a stochastic process.
First-step analysis for probabilities uses conditional probability and the
Partition Theorem (Law of Total Probability).
In the same way, we can use first-step analysis for finding the expected
reaching time for a state.
This is because the first-step options form a partition of the sample space.
This follows immediately from the law of total expectation:
E(X) = E_Y{E(X | Y)} = Σ_y E(X | Y = y) P(Y = y).

Let X be the reaching time, and let Y be the label for possible options: i.e. Y = 1, 2, 3, . . . for options 1, 2, 3, . . .

We then obtain:

E(X) = Σ_y E(X | Y = y) P(Y = y),

i.e. E(reaching time) = Σ over first-step options of E(reaching time | option) × P(option).
Then:

E(X) = E(X | Y = 1) × 1/3 + E(X | Y = 2) × 1/3 + E(X | Y = 3) × 1/3.

But:

E(X | Y = 1) = 3 minutes
E(X | Y = 2) = 5 + E(X)   (after 5 mins, back in Room, time E(X) to get out)
E(X | Y = 3) = 7 + E(X)   (after 7 mins, back in Room)

So

E(X) = 3 × 1/3 + (5 + E(X)) × 1/3 + (7 + E(X)) × 1/3
     = 15 × 1/3 + 2 E(X) × 1/3

⇒ (1/3) E(X) = 15 × 1/3

⇒ E(X) = 15 minutes.
First-step analysis:

m_R = 1/3 × 3 + 1/3 × (5 + m_R) + 1/3 × (7 + m_R)

⇒ 3 m_R = (3 + 5 + 7) + 2 m_R

⇒ m_R = 15 minutes.
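The answer can be confirmed by simulation in R (a sketch, using the three equally likely options implied by the conditional expectations above):

> # Monte Carlo estimate of the expected escape time; should be close to 15
> escape.time <- function(){
    t <- 0
    repeat{
      option <- sample(1:3, 1)
      if(option == 1) return(t + 3)    # option 1: out in 3 minutes
      if(option == 2) t <- t + 5       # option 2: 5 minutes, back in Room
      if(option == 3) t <- t + 7       # option 3: 7 minutes, back in Room
    }
  }
> mean(replicate(10000, escape.time()))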
You must remember to add on whatever we are counting, to every step taken.
Example: The mouse is put in a new maze with two rooms, pictured here. Starting from Room 1, what is the expected number of steps the mouse takes before it reaches the exit?

[Maze diagram: Room 1, Room 2, and the EXIT, with each transition probability equal to 1/3.]

1. Define notation: let m_1 be the expected number of steps to reach the exit, starting from Room 1 (and m_2 the same, starting from Room 2).
(a) ⇒ 3 m_1 = 3 + 2 m_1
⇒ m_1 = 3 steps.
In each case, we will get the same answer (check). This is because the option probabilities sum to 1, so in Method 1 we are adding (1/3 + 1/3 + 1/3) × 1 = 1 × 1 = 1, just as we are in Method 2.
Recall from Section 3.1 that for any event A, we can write P(A) as an expectation as follows. Let I_A be the indicator random variable of event A: then E(I_A) = P(I_A = 1) = P(A).
But

E(I_A | Y) = Σ_{r=0}^{1} r P(I_A = r | Y)
           = 0 × P(I_A = 0 | Y) + 1 × P(I_A = 1 | Y)
           = P(I_A = 1 | Y)
           = P(A | Y).

Thus

P(A) = E_Y{E(I_A | Y)} = E_Y{P(A | Y)}.
This means that for any random variable X (discrete or continuous), and for any set of values S (a discrete set or a continuous set), we can write:

P(X ∈ S) = E_Y{P(X ∈ S | Y)}.
You watch the lottery draw on TV and your numbers match the winners!!
You had a one-in-a-million chance, and there were a million players, so it
must be YOU, right?
Not so fast. Before you rush to claim your prize, let’s calculate the
probability that you really will win. You definitely win if you are the only
person with matching numbers, but you can also win if there are
multiple matching tickets and yours is the one selected at random from the
matches.
Define Y to be the number of OTHER matching tickets out of the OTHER 1
million tickets sold. (If you are lucky, Y = 0 so you have definitely won.)
If there are 1 million tickets and each ticket has a one-in-a-million chance of
having the winning numbers, then
Y ∼ Poisson(1) approximately.
The relationship Y ∼ Poisson(1) arises because of the Poisson
approximation to the Binomial distribution.
f_Y(y) = P(Y = y) = (1^y / y!) e^{−1} = 1/(e × y!)   for y = 0, 1, 2, . . .
(b) What is the probability that yours is the only matching ticket?
P(only one matching ticket) = P(Y = 0) = 1/e = 0.368.
(c) The prize is chosen at random from all those who have matching
tickets. What is the probability that you win if there are Y = y OTHER
matching tickets?
Let W be the event that I win.
P(W | Y = y) = 1/(y + 1).
(d) Overall, what is the probability that you win, given that you have a matching ticket?

P(W) = E_Y{P(W | Y)}
     = Σ_{y=0}^{∞} P(W | Y = y) P(Y = y)
     = Σ_{y=0}^{∞} (1/(y + 1)) × (1/(e × y!))
     = (1/e) Σ_{y=0}^{∞} 1/((y + 1) y!)
     = (1/e) Σ_{y=0}^{∞} 1/(y + 1)!
     = (1/e) { Σ_{y=0}^{∞} 1/y!  −  1/0! }
     = (1/e)(e − 1)
     = 1 − 1/e
     = 0.632.
Disappointing?
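A quick numerical check of this sum in R:

> # Partial sums of (1/e) * 1/((y+1)*y!) converge to 1 - 1/e = 0.632
> y <- 0:20
> sum(1/(exp(1) * (y + 1) * factorial(y)))
> 1 - exp(-1)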
3.7 Special process: a model for gene spread
Stochastic process:
The state of the process at time t is Xt = the number of animals with allele A
at generation t.
[ Xt+1 | Xt = x ] ∼ Binomial(N, x/N).
If [ Xt+1 | Xt = x ] ∼ Binomial(N, x/N), then

P(Xt+1 = y | Xt = x) = (N choose y) (x/N)^y (1 − x/N)^{N−y}   (Binomial formula).

Example with N = 3:
Transition diagram:

[Diagram: states 0, 1, 2, 3, with transition probabilities labelled a, b, c and d.]
Probability the harmful allele A dies out
Suppose the process starts at generation 0. One of the three animals has
the harmful allele A. Define a suitable notation, and find the probability that
the harmful allele A eventually dies out.
Suppose again that the process starts at generation 0, and one of the three
animals has the harmful allele A. Eventually all animals will have the same
allele, whether it is allele A or B. When this happens, the population is said
to have reached fixation: it is fixed for a single allele and no further changes
are possible.
Things get more interesting for large N. When N = 100, and x = 10 animals
have the harmful allele at generation 0, there is a 90% chance that the
harmful allele will die out and a 10% chance that the harmful allele will take
over the whole population. The expected number of generations taken to
reach fixation is 63.5. If the process starts with just x = 1 animal with the
harmful allele, there is a 99% chance the harmful allele will die out, but the
expected number of generations to fixation is 10.5. Despite the allele being
rare, the average number of generations for it to either die out or saturate
the population is quite large.
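These figures can be checked by simulation in R (a sketch, using the Binomial(N, x/N) model above):

> # Simulate the gene-spread chain until fixation (all 0 or all N),
> # starting from x animals with allele A out of N:
> run.to.fixation <- function(N, x){
    gens <- 0
    while(x > 0 && x < N){
      x <- rbinom(1, size=N, prob=x/N)
      gens <- gens + 1
    }
    c(died.out = (x == 0), generations = gens)
  }
> res <- replicate(10000, run.to.fixation(N=100, x=10))
> mean(res["died.out", ])       # should be close to 0.9
> mean(res["generations", ])    # should be close to 63.5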
Note: The model above is also an example of a process called the Voter
Process. The N individuals correspond to N people who each support one
of two political candidates, A or B. Every day they make a new decision
about whom to support, based on the amount of current support for each
candidate. Fixation in the genetic model corresponds to consensus in the Voter
Voter Process.