Topic 4 - Sequences of Random Variables

This document provides a summary of key concepts from Chapter 7 of Papoulis on sequences (vectors) of random variables. It defines a vector random variable X as assigning a vector of real numbers to each outcome in a sample space S. A vector can describe a sequence of random variables. Expectation of a random vector is the vector of individual expectations. The covariance matrix describes the covariances between all pairs of random variables in the vector. The law of iterated expectations and conditional expectations are also summarized.


Topic 4_Sequences of Random Variables

Read chapter 7 of Papoulis


(1 week)

Sequences (Vector) of Random Variables


Papoulis Chapter 7

A vector random variable, X, is a function that assigns a vector of
real numbers to each outcome ζ in S: X(ζ) = x, with x ∈ R^n. This is the basis for random processes.
The vector can be used to describe a sequence of random variables.
Example: Sample a signal X(t) every T sec.: Xk = X(kT).
X = (X1, X2, ..., Xn) is a vector RV (or a random process).
An event involving an n-dimensional random variable X = (X1, X2,
..., Xn) has a corresponding region in n-dimensional real space.
The expectation of a random vector is the vector of the expectations.
If X = (X1, X2, ..., Xn), then
E[X] = (E[X1], E[X2], ..., E[Xn])
(4-1)
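As a rough illustration of the sampling example and of (4-1), the Python sketch below (the sinusoid-plus-noise signal model, the sampling interval T, and all parameter values are assumptions chosen only for the demo) builds repeated realizations of the vector RV X = (X1, ..., Xn) by sampling a signal every T seconds and estimates E[X] componentwise:

```python
import numpy as np

# Hypothetical signal model (not from the slides): X(t) = A*sin(2*pi*f0*t) + noise,
# with a random amplitude A for each outcome. Sampling every T sec gives Xk = X(kT).
rng = np.random.default_rng(0)
T, n, trials, f0 = 0.01, 8, 50_000, 5.0
t = np.arange(n) * T                                   # sample times kT, k = 0..n-1
A = rng.normal(loc=1.0, scale=0.2, size=(trials, 1))   # random amplitude per outcome
X = A * np.sin(2 * np.pi * f0 * t) + rng.normal(scale=0.1, size=(trials, n))

# (4-1): the expectation of the random vector is the vector of expectations,
# estimated here componentwise over the trials.
print(X.mean(axis=0))
print(1.0 * np.sin(2 * np.pi * f0 * t))   # theoretical E[Xk] = E[A]*sin(2*pi*f0*k*T)
```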

Conditional Expected Values


(very useful in estimation)
We will show that
E_X[X | Y] = E_Y[E_X[X | Y] | Y]
(4-2a)
which implies the useful result
E_X[X] = E_Y[E_X[X | Y]]
(4-2b)
The above is known as the Law of Iterated Expectations. It is also
known as the Law of Total Expectation, since it implies that

E[X] = Σ_y E[X | Y = y] P(Y = y)        if Y is discrete
E[X] = ∫ E[X | Y = y] f_Y(y) dy          if Y is continuous.
(4-2c)
If X and Y are independent, then
E_X[X | Y] = E_X[X]
(4-3a)
If a is a constant, then
E[a | Y] = a
(4-3b)

Law of Iterated Expectations


Proof (continuous case):
E[E[X | Y]] = ∫ E[X | Y = y] f_Y(y) dy
            = ∫_y [ ∫_x x f_{X|Y}(x | y) dx ] f_Y(y) dy
            = ∫_x x [ ∫_y f_{X,Y}(x, y) dy ] dx
            = ∫_x x f_X(x) dx
            = E[X]
(4-3c)
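A quick Monte Carlo sanity check of (4-2b)/(4-3c) in Python; the particular model (Y exponential, X | Y normal with mean Y) is an assumption chosen only so that E_X[X | Y] has a simple closed form:

```python
import numpy as np

# Assumed model for illustration: Y ~ Exponential(mean 2), X | Y = y ~ N(y, 1),
# so E_X[X | Y] = Y and the law of iterated expectations gives E[X] = E[Y] = 2.
rng = np.random.default_rng(0)
n = 1_000_000
Y = rng.exponential(scale=2.0, size=n)
X = rng.normal(loc=Y, scale=1.0)

print(X.mean())   # direct estimate of E[X]
print(Y.mean())   # estimate of E_Y[E_X[X | Y]] = E[Y]; both should be close to 2
```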

Example: Expectation of the Sum of a Random


Number of Random Variables
If N is a positive integer-valued R.V. and Xi, i = 1, 2, ..., are
identically distributed R.V.s with mean E[X] that are independent of
N, then
Σ_{i=1}^{N} Xi
is a random variable and
E[Σ_{i=1}^{N} Xi] = E[N] E[X].
(4-4)

Expectation of the Sum of a Random Number of


Random Variables
Proof:

E_X[Σ_{i=1}^{N} Xi | N = n] = E_X[Σ_{i=1}^{n} Xi | N = n]
(4-5a)
= E_X[Σ_{i=1}^{n} Xi] = Σ_{i=1}^{n} E_X[Xi] = n E_X[X]
so that
E_X[Σ_{i=1}^{N} Xi | N] = N · E[X]
(4-5b)
and finally we get the desired result
E[Σ_{i=1}^{N} Xi] = E_N[E_X[Σ_{i=1}^{N} Xi | N]] = E_N[N · E[X]] = E[N] E[X].
(4-4)
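A short simulation of (4-4); the choice N ~ Poisson(4) and Xi ~ Exponential(mean 3), independent of N, is an assumption made only for the demo:

```python
import numpy as np

rng = np.random.default_rng(1)
trials = 200_000
N = rng.poisson(lam=4.0, size=trials)
sums = np.array([rng.exponential(scale=3.0, size=k).sum() for k in N])

print(sums.mean())   # Monte Carlo estimate of E[ sum_{i=1}^{N} Xi ]
print(4.0 * 3.0)     # E[N] * E[X] = 12, as predicted by (4-4)
```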

Conditional Variance

The conditional variance of X given Y is the R.V. given by

var(X | Y) = E[(X − E[X | Y])² | Y] = E[X² | Y] − E[X | Y]².

(4-6a)

The Conditional Variance Formula (or Law of Total Variance)

var(X) = E[var(X | Y)] + var(E[X | Y])
(4-6b)

Conditional Variance
Proof:
E[var(X | Y)] = E[E[X² | Y] − E[X | Y]²]
= E[E[X² | Y]] − E[E[X | Y]²]
= E[X²] − E[E[X | Y]²]
= E[X²] − E[X]² − E[E[X | Y]²] + E[X]²
= var(X) − E[E[X | Y]²] + E[E[X | Y]]²     (using E[X] = E[E[X | Y]])
= var(X) − var(E[X | Y]).
(4-7)
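The conditional variance formula (4-6b) can also be verified numerically; in the sketch below the two-component Gaussian mixture is an assumed model chosen so that E[X | Y] and var(X | Y) are known exactly:

```python
import numpy as np

# Assumed model: Y ~ Bernoulli(0.3); X | Y=0 ~ N(0, 1), X | Y=1 ~ N(5, 4).
rng = np.random.default_rng(2)
n = 1_000_000
Y = rng.binomial(1, 0.3, size=n)
X = np.where(Y == 1, rng.normal(5.0, 2.0, n), rng.normal(0.0, 1.0, n))

cond_mean = np.where(Y == 1, 5.0, 0.0)   # E[X | Y]
cond_var  = np.where(Y == 1, 4.0, 1.0)   # var(X | Y)

print(X.var())                             # var(X)
print(cond_var.mean() + cond_mean.var())   # E[var(X|Y)] + var(E[X|Y]), per (4-6b)
```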

Generalization to Many RVs


Let X1, X2, ..., Xn be n random variables defined on a sample space S.
Let the row vector X^T = (X1, X2, ..., Xn) be the transpose of a random
(column) vector.
Let u = (u1, u2, ..., un) be a real vector.
The notation { X ≤ u } denotes the event
{X1 ≤ u1, X2 ≤ u2, ..., Xn ≤ un}, where, as before, the commas denote
intersections, that is,
{ X ≤ u } = {X1 ≤ u1}{X2 ≤ u2} ... {Xn ≤ un}
(4-8a)
The joint CDF of X1, X2, ..., Xn, or the CDF of the random vector X, is
defined as
FX(u) = P{ X ≤ u } = P{X1 ≤ u1, X2 ≤ u2, ..., Xn ≤ un}
(4-8b)
FX(u) is a real-valued function of n real variables (or of the
vector u) with values between 0 and 1.
The expectation of a random vector is the vector of the expectations:
E[X^T] = [E[X1], E[X2], ..., E[Xn]] = μ_X^T
(4-8c)

The Covariance Matrix I


There are n² pairs of random variables Xi and Xj, giving n²
covariances.
n of these are cov(Xi, Xi) = var(Xi).
The covariance matrix R is a symmetric n×n matrix with i-jth
entry r_ij = cov(Xi, Xj).
The variances of the Xi's appear along the diagonal of the matrix.
Uncorrelated RVs ⇒ R is a diagonal matrix.
Expectation of a matrix with RVs as entries = matrix of
expectations.
Note: sometimes we refer to the covariance matrix as R and
sometimes as Σ.

The Covariance Matrix II


X = [X1, X2, ..., Xn]^T and μ are n×1 column vectors.
(X − μ)^T is a 1×n row vector.
(X − μ)(X − μ)^T is an n×n matrix whose i-jth entry is (Xi − μi)(Xj − μj).
(4-9a)
E[matrix] = matrix of expectations.
R = E[(X − μ)(X − μ)^T] is an n×n matrix with i-jth entry r_ij = cov(Xi, Xj).
R is a symmetric positive semi-definite (also known as symmetric
nonnegative definite) matrix, since the quadratic form
var(X^T a) = a^T R a ≥ 0.
(4-9b)
R is used to find variances and covariances of linear combinations of the Xi.
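A minimal numpy sketch of (4-9a)-(4-9b): it estimates R = E[(X − μ)(X − μ)^T] from correlated samples and checks that the quadratic form a^T R a reproduces the variance of the linear combination a^T X (the 3-dimensional test data and the vector a are assumptions):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 3, 200_000
A = rng.normal(size=(n, n))                       # mixing matrix to induce correlation
samples = rng.normal(size=(trials, n)) @ A.T + np.array([1.0, -2.0, 0.5])

mu = samples.mean(axis=0)
centered = samples - mu
R = centered.T @ centered / (trials - 1)          # estimate of E[(X - mu)(X - mu)^T]
print(np.allclose(R, np.cov(samples, rowvar=False)))   # agrees with numpy's estimator

a = np.array([2.0, -1.0, 3.0])
print(a @ R @ a)                   # quadratic form a^T R a ...
print((samples @ a).var(ddof=1))   # ... equals var(a^T X), and is >= 0 (4-9b)
```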

Multivariate (Sequence) Parameters


Let X^T denote the transposed (column) data vector (x1, x2, ..., xn).
Mean: E[X^T] = [μ1, ..., μn]
(4-10a)
Covariance: σ_ij ≡ Cov(Xi, Xj)
(4-10b)
Correlation: Corr(Xi, Xj) ≡ ρ_ij = σ_ij / (σ_i σ_j)
(4-10c)
The covariance matrix has elements σ_ij, is denoted by Σ, and is
defined below:
Cov(X) = E[(X − μ)(X − μ)^T]

    | σ1²   σ12   ...   σ1d |
Σ = | σ21   σ2²   ...   σ2d |
    | ...   ...   ...   ... |
    | σd1   σd2   ...   σd² |

(4-10d)

Multivariate Normal Distribution

x ~ N_d(μ, Σ)

p(x) = [1 / ((2π)^{d/2} |Σ|^{1/2})] exp[ −(1/2)(x − μ)^T Σ^{-1} (x − μ) ]

(4-11)
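A small check of (4-11) against scipy's implementation; the particular μ, Σ, and evaluation point x below are assumptions for the demo:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
x = np.array([0.5, 0.0])

# Density written out directly from (4-11)
d = len(mu)
diff = x - mu
p = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / \
    ((2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma)))

print(p)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should agree
```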

Parameter Estimation (Predicting the value of Y)


Suppose Y is a RV with known PMF or PDF
Problem: Predict (or estimate) what value of Y will be
observed on the next trial
Questions:
What value should we predict?
What is a good prediction?
We need to specify some criterion that determines what is a
good/reasonable estimate.
Note that for continuous random variables, it doesn't make
sense to predict the exact value of Y, since any particular value occurs with
zero probability.
A common choice is the mean-square estimate.

The Mean Square Error (MSE) Estimate


We will let ŷ be the mean-square estimate of the random variable Y.
Let E[Y] = μ.
The mean-squared error (MSE) is defined as:
e = E[(Y − ŷ)²]
We proceed by completing the square:
E[(Y − ŷ)²] = E[(Y − μ + μ − ŷ)²]
= E[(Y − μ)² + 2(Y − μ)(μ − ŷ) + (μ − ŷ)²]
= var(Y) + 2(μ − ŷ)E[Y − μ] + (μ − ŷ)²
= var(Y) + (μ − ŷ)² > var(Y) if ŷ ≠ μ
(4-12)
Clearly, choosing ŷ = μ minimizes the MSE of the estimate.
ŷ = μ is called the minimum- (or least-) mean-square error (MMSE
or LMSE) estimate.
The minimum mean-square error is var(Y).
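The decomposition in (4-12), e(ŷ) = var(Y) + (μ − ŷ)², can be seen numerically; the exponential Y below is an arbitrary assumed distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
Y = rng.exponential(scale=2.0, size=1_000_000)   # mu = 2, var(Y) = 4

for y_hat in [0.0, 1.0, 2.0, 3.0]:
    mse = np.mean((Y - y_hat) ** 2)
    print(y_hat, mse, Y.var() + (Y.mean() - y_hat) ** 2)   # the two columns agree

# The MSE is smallest at y_hat = mu = 2, where it equals var(Y) = 4, per (4-12).
```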

The MSE of a RV Based Upon Observing another RV


Let X and Y denote random variables with known joint distribution.
Suppose that the value of X becomes known to us, but not the value
of Y. How can we find the MMSE estimate, Ŷ?
Can the MMSE estimate, Ŷ, which is a function of X, do better than
ignoring X and estimating the value of Y as ŷ = μ_Y = E[Y]?
Denoting the MMSE estimate by c(X), the MSE is given by
e = E{[Y − c(X)]²} = ∫∫ [y − c(x)]² f_{X,Y}(x, y) dx dy
  = ∫ f_X(x) ∫ [y − c(x)]² f_{Y|X}(y | x) dy dx
(4-13)
(all integrals run from −∞ to ∞)
Note that the integrands above are nonnegative, so e will be minimized
if the inner integral is minimized for every value of x.
Note that for a fixed value of x, c(x) is a variable [not a function].

The MSE of a RV Based Upon Observing another RV-2


Since for a fixed value of x, c(x) is a variable [not a function], we can
minimize the MSE by setting the derivative of the inner integral, with
respect to c, to zero:
(d/dc) ∫ [y − c(x)]² f_{Y|X}(y | x) dy = −∫ 2(y − c) f_{Y|X}(y | x) dy = 0
(4-14a)
Solving for c, after noting that
∫ c(x) f_{Y|X}(y | x) dy = c(x) ∫ f_{Y|X}(y | x) dy = c(x), since the conditional PDF integrates to one,
gives
Ŷ = c(X) = ∫ y f_{Y|X}(y | X) dy = E[Y | X]
(4-14b)
Thus the MMSE estimate, Ŷ, is the conditional mean of Y given X.
The MMSE estimate is in general nonlinear, and both it and its MSE are RVs that are
functions of X.

MMSE Example
Let the random point (X, Y) be uniformly distributed on a semicircle
(the upper half of the unit disk).
[Figure: unit semicircle of radius 1; for X = α, Y ranges over [0, (1 − α²)^{1/2}].]
The joint PDF has value 2/π on the semicircle.
The conditional PDF of Y given that X = α is a uniform density on [0, (1 − α²)^{1/2}].
So, ŷ = E[Y | X = α] = (1/2)(1 − α²)^{1/2}, and this estimate achieves the
least possible MSE of var(Y | X = α) = (1 − α²)/12.
Intuitively reasonable, since:
If |α| is nearly 1, the MSE is small (since the range of Y is small).
If |α| is nearly 0, the MSE is large (since the range of Y is large).
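A simulation sketch of the semicircle example, estimating E[Y | X ≈ α] and var(Y | X ≈ α) by binning samples with X close to α (the rejection-sampling construction and the bin width are implementation assumptions):

```python
import numpy as np

rng = np.random.default_rng(5)
# Rejection-sample points uniform on the upper half of the unit disk.
X = rng.uniform(-1, 1, size=2_000_000)
Y = rng.uniform(0, 1, size=2_000_000)
keep = X**2 + Y**2 <= 1.0
X, Y = X[keep], Y[keep]

for alpha in [0.0, 0.5, 0.9]:
    near = np.abs(X - alpha) < 0.01                       # samples with X close to alpha
    print(Y[near].mean(), 0.5 * np.sqrt(1 - alpha**2))    # regression curve E[Y | X = alpha]
    print(Y[near].var(), (1 - alpha**2) / 12)             # conditional variance
```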

The Regression Curve of Y on X


ŷ(α) = E[Y | X = α], as a function of α, is a curve called the regression
curve of Y on X.
For the semicircle example, the graph of (1/2)(1 − α²)^{1/2} is a half-ellipse.
Given the X value, the MMSE estimate of Y can be read off
from the regression curve.

Linear MMSE Estimation I


Suppose that we wish to estimate Y as a linear function of the
observation X.
The linear MMSE estimate of Y is aX + b, where a and b are chosen
to minimize the mean-square error E[(Y − aX − b)²].
Let Z = Y − aX − b be the error; then
E[(Y − aX − b)²] = E[Z²] = var(Z) + (E[Z])²
(4-15a)
= var(Y) + a²var(X) − 2a·cov(X, Y) + (E[Z])²
The above is quadratic in a and b.
By differentiation, it can be shown that the minimum occurs when
a = ρ σ_Y / σ_X
b = μ_Y − a μ_X
(4-15b)

Linear MMSE Estimation II


As before, let Z = Y − aX − b be the error; then the MSE is
e = E[(Y − aX − b)²] = E[Z²]
(4-16a)
Setting the derivative of the MSE with respect to a to zero gives
∂e/∂a = E[2Z(−X)] = 0, i.e., E[ZX] = 0
(4-16b)
This says that the estimation error, Z, is orthogonal (that is,
uncorrelated) to the observed data X.
This is referred to as the orthogonality principle of linear
estimation. That is, the error is uncorrelated with the observation
(data), X, and, intuitively, the estimate has done all it can to extract
correlated information from the data.
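A sketch that fits the linear MMSE coefficients (4-15b) from data and then checks the orthogonality principle (4-16b); the observation model Y = 2X + noise and all parameter values are assumptions used only for the demo:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 500_000
X = rng.normal(1.0, 2.0, n)
Y = 2.0 * X + rng.normal(0.0, 3.0, n)   # assumed observation model

rho = np.corrcoef(X, Y)[0, 1]
a = rho * Y.std() / X.std()             # a = rho * sigma_Y / sigma_X   (4-15b)
b = Y.mean() - a * X.mean()             # b = mu_Y - a * mu_X

Z = Y - a * X - b                       # estimation error
print(a, b)                             # close to 2 and 0 for this model
print(np.mean(Z * X))                   # ~0: the error is orthogonal to the data (4-16b)
```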

Gaussian MMSE = Linear MMSE


In general, the linear MMSE estimate has a higher MSE than the
(usually nonlinear) MMSE estimate E[Y | X].
If X and Y are jointly Gaussian RVs, it can be shown that the
conditional PDF of Y given X = α is a Gaussian PDF with mean
μ_Y + ρ(σ_Y/σ_X)(α − μ_X)
(4-17a)
and variance σ_Y²(1 − ρ²)
(4-17b)
Hence, E[Y | X = α] = μ_Y + ρ(σ_Y/σ_X)(α − μ_X)
(4-17c)
is the same as the linear MMSE estimate.
For jointly Gaussian RVs, MMSE estimate = linear MMSE estimate.
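For jointly Gaussian X and Y, (4-17c) can be checked by comparing a binned estimate of E[Y | X ≈ α] with the linear formula; the means, standard deviations, ρ, and α below are assumed values:

```python
import numpy as np

rng = np.random.default_rng(7)
mu_x, mu_y, sig_x, sig_y, rho = 1.0, -1.0, 2.0, 1.5, 0.7
cov = [[sig_x**2, rho * sig_x * sig_y],
       [rho * sig_x * sig_y, sig_y**2]]
X, Y = rng.multivariate_normal([mu_x, mu_y], cov, size=2_000_000).T

alpha = 2.0
near = np.abs(X - alpha) < 0.02
print(Y[near].mean())                                   # empirical E[Y | X ~ alpha]
print(mu_y + rho * (sig_y / sig_x) * (alpha - mu_x))    # conditional mean, (4-17c)
print(Y[near].var(), sig_y**2 * (1 - rho**2))           # conditional variance, (4-17b)
```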

Limit Theorems
Limit theorems specify the probabilistic behavior of n random
variables as n → ∞.
Possible restrictions on the RVs:
Independent random variables
Uncorrelated random variables
Have identical marginal CDFs/PDFs/PMFs
Have identical means and/or variances

The Average of n RVs


n random variables X1, X2, ..., Xn have finite expectations μ1, μ2, ...,
μn.
Let the average be
Z = (X1 + X2 + ... + Xn)/n
(4-18a)
What is E[Z]?
Expectation is a linear operator, so that
E[Z] = (E[X1] + E[X2] + ... + E[Xn])/n
(4-18b)
Expected value of the average of n RVs = numerical average of their
expectations.
An important practical case is when the RVs are independent and
identically distributed (i.i.d.); then the average is called the sample
mean.

Variance of the Sample Mean


Sample mean: Z = (X1 + X2 + ... + Xn)/n
E[Z] = E[X] = μ
(4-18c)
It is easy to show that
var(Z) = var(X1 + X2 + ... + Xn)/n² = var(X)/n
(4-18d)
This is because the RVs are independent.
The variance decreases as n increases.
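A quick demonstration of (4-18d), var(Z) = var(X)/n, with uniform Xi (an arbitrary assumed distribution); as n grows the sample mean concentrates around μ, which previews the law of large numbers on the next slides:

```python
import numpy as np

rng = np.random.default_rng(8)
var_x = 1.0 / 12.0                     # var of Uniform(0, 1)

for n in [10, 100, 1000]:
    Z = rng.uniform(0, 1, size=(100_000, n)).mean(axis=1)   # 100k sample means of size n
    print(n, Z.var(), var_x / n)       # empirical var(Z) vs var(X)/n
```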

Weak Law of Large Numbers (WLLN)


Weak Law of Large Numbers:
If X1, X2, ..., Xn, ... are i.i.d. RVs with finite mean μ, then for
every ε > 0,
P{|(X1 + X2 + ... + Xn)/n − μ| ≥ ε} → 0 as n → ∞
(4-19a)
Equivalently,
P{|(X1 + X2 + ... + Xn)/n − μ| < ε} → 1 as n → ∞
(4-19b)
Note that it is not necessary for the RVs to have finite variance,
but the proof is easier if the variance is finite.
Note: the WLLN says lim_{n→∞} P{something} = 1

Strong Law of Large Numbers (SLLN)


Strong Law of Large Numbers:
If X1, X2, ..., Xn, ... are i.i.d. RVs with finite mean μ, then
P{ lim_{n→∞} (X1 + X2 + ... + Xn)/n = μ } = 1
(4-20a)
Suppose the experiment is repeated infinitely often and the RV X takes on
values x1, x2, ..., xn, ... on these trials.
What can be said about (x1 + x2 + ... + xn)/n?
There are three possibilities:
the sequence converges to μ,
or it converges to some other number,
or it does not converge at all.
The Strong Law of Large Numbers says that
P{(x1 + x2 + ... + xn)/n converges to μ} = 1
(4-21b)
Note: the SLLN says P{lim_{n→∞} something} = 1

Strong Law of Large Numbers II


If the Strong Law of Large Numbers holds, then so does the Weak
Law
In fact, both require only that the RVs be i.i.d. with finite mean
But, the Weak Law of Large Numbers might be applicable in
cases when the Strong Law does not hold
Example: Weak Law of Large Numbers still applies if the RVs are
uncorrelated but not independent


Strong Law and Relative frequencies


The Strong Law of Large Numbers justifies the estimation of
probabilities in terms of relative frequencies
If the Xi are i.i.d. Bernoulli RVs with parameter p (and hence
finite mean p), then the sample mean Zn converges to p with
probability 1 as n → ∞.
The observed relative frequency of an event of probability p
converges to p with probability 1.

The Central Limit Theorem


(previously discussed)

Let Yn = (X1 + X2 + ... + Xn − nμ)/(σ√n) be a RV with mean 0
and variance 1.
The Central Limit Theorem asserts, for large values of n, that
the CDF of Yn is well-approximated by the unit Gaussian CDF.
Formally, the Central Limit Theorem states that the CDF of Yn
converges to the unit Gaussian CDF.
In practical use of the Central Limit Theorem, we hardly ever
use the RV
Yn = (X1 + X2 + ... + Xn − nμ)/(σ√n)
Instead, X1 + X2 + ... + Xn is treated as if its CDF is
approximately that of a N(nμ, nσ²) RV.
Thus, we compute
P{X1 + X2 + ... + Xn ≤ u} ≈ Φ((u − nμ)/(σ√n))
which is effectively the same computation.
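A sketch of the practical CLT computation above, comparing the normal approximation Φ((u − nμ)/(σ√n)) with a simulated sum; the exponential summands and the values of n and u are assumptions for the demo:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(9)
n, mu, sigma = 50, 2.0, 2.0            # Xi ~ Exponential(mean 2): mu = sigma = 2
u = 110.0

sums = rng.exponential(scale=mu, size=(200_000, n)).sum(axis=1)
print(np.mean(sums <= u))                               # simulated P{X1 + ... + Xn <= u}
print(norm.cdf((u - n * mu) / (sigma * np.sqrt(n))))    # CLT approximation
```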

Final Remarks on Probability


We see that the theory of probability is at bottom only
common sense reduced to calculation; it makes us appreciate
with exactitude what reasonable minds feel by a sort of
instinct, often without being able to account for it. It is
remarkable that this science, which originated in the
consideration of games of chance, should become the most
important object of human knowledge. The most
important questions of life are, for the most part, really only
problems of probability.
Pierre-Simon, Marquis de Laplace, Analytical Theory of
Probability

Additional Backup/Reference Slides

