Slide MI2036 Chap5
Nguyễn Thị Thu Thủy (SAMI-HUST)
Email: thuy.nguyenthithu2@hust.edu.vn
HANOI – 2023
CHAPTER OUTLINE
- The techniques in Chapter 4 use the outcomes of experiments to make inferences about probability models.
- In this chapter, we use observations to calculate an approximate value of a sample value of a random variable that has not been observed.
- If X is the random variable to be estimated, we adopt the notation \hat{X} (also a random variable) for the estimate.
- In most of the chapter, we use the mean square error

e = E[(X - \hat{X})^2]    (1)
Content
Blind Estimation of X
- An experiment produces a random variable X. Prior to performing the experiment, what is the best estimate
of X? This is the blind estimation problem because it requires us to make an inference about X in the absence
of any observations. Although it is unlikely that we will guess the correct value of X, we can derive a number
that comes as close as possible in the sense that it minimizes the mean square error.
Theorem 1
In the absence of observations, the minimum mean square error estimate of random variable X is

\hat{x}_B = E(X)    (2)
Blind Estimation of X
Proof. Setting \hat{X} = \hat{x}_B and expanding the square in Equation (1) gives

e = E(X^2) - 2\hat{x}_B E(X) + \hat{x}_B^2.

To minimize e, we solve

\frac{de}{d\hat{x}_B} = -2E(X) + 2\hat{x}_B = 0,

yielding \hat{x}_B = E(X).
- Remark. In the absence of observations, the minimum mean square error estimate of X is the expected value E(X). The minimum error is e^*_B = Var(X). In introducing the idea of expected value, Chapter 2 describes E(X) as a "typical value" of X. Theorem 1 gives this description a mathematical meaning.
Blind Estimation of X
Example 1
Prior to rolling a six-sided die, what is the minimum mean square error estimate of the number of spots X
that will appear?
Solution. The probability model is P_X(x) = 1/6 for x = 1, 2, \dots, 6, and P_X(x) = 0 otherwise. For this model E(X) = 3.5. Even though \hat{x}_B = 3.5 is not in the range of X, it is the estimate that minimizes the mean square estimation error.
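A quick numerical check of this example (an illustrative Python sketch; the candidate grid is an arbitrary choice, not part of the slides): evaluating e(\hat{x}) = E[(X - \hat{x})^2] under the die PMF shows that the minimum occurs at \hat{x} = 3.5 and equals Var(X) ≈ 2.92.

```python
# Sketch: check that the blind estimate x_hat = E(X) = 3.5 minimizes the
# mean square error for a fair six-sided die.
import numpy as np

faces = np.arange(1, 7)            # support of X
pmf = np.full(6, 1 / 6)            # P_X(x) = 1/6 for x = 1, ..., 6

def mse(x_hat):
    """Mean square error E[(X - x_hat)^2] under the die PMF."""
    return np.sum(pmf * (faces - x_hat) ** 2)

candidates = np.linspace(1, 6, 501)
best = candidates[np.argmin([mse(c) for c in candidates])]
print(f"best guess ~ {best:.2f}, e(3.5) = {mse(3.5):.4f}")   # ~3.50 and ~2.9167
```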
- Problem.
Suppose that we perform an experiment. Instead of observing X directly, we learn only that X ∈ A. Given
this information, what is the minimum mean square error estimate of X?
Given A, X has a conditional PDF f_{X|A}(x) or a conditional PMF P_{X|A}(x). Our task is to minimize the conditional mean square error e_{X|A} = E[(X - \hat{x})^2 | A].
We see that this is essentially the same as the blind estimation problem with the conditional PDF f_{X|A}(x) or the conditional PMF P_{X|A}(x) replacing f_X(x) or P_X(x).
Theorem 2
Given the information that X ∈ A, the minimum mean square error estimate of X is

\hat{x}_A = E(X | A)    (3)
Example 2
The duration T minutes of a phone call is an exponential random variable with expected value E(T ) = 3
minutes. If we observe that a call has already lasted 2 minutes, what is the minimum mean square error
estimate of the call duration?
Solution. If the call is still in progress after 2 minutes, we know that T ∈ A = {T > 2}. Therefore, the minimum mean square error estimate of T is \hat{t}_A = E(T | T > 2).
We have the conditional PDF

f_{T|T>2}(t) = \frac{f_T(t)}{P(T > 2)} = \begin{cases} \frac{1}{3} e^{-(t-2)/3}, & t \ge 2, \\ 0, & \text{otherwise}. \end{cases}
Therefore,

E(T | T > 2) = \int_2^{+\infty} t \cdot \frac{1}{3} e^{-(t-2)/3} \, dt = 5 \text{ minutes}.
- Remark. Prior to the phone call, the minimum mean square error (blind) estimate of T is E(T ) = 3 minutes.
After the call is in progress for 2 minutes, the best estimate of the duration becomes E(T |T > 2) = 5 minutes.
This result is an example of the memoryless property of an exponential random variable. At any time t0 during a
call, the expected time remaining is just the expected value of the call duration, E(T ).
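The conditional expectation E(T | T > 2) = 5 can also be checked by simulation. The sketch below is illustrative only; the sample size and seed are arbitrary choices, not part of the slides.

```python
# Sketch: Monte Carlo check that E(T | T > 2) = 2 + E(T) = 5 minutes for an
# exponential call duration with expected value E(T) = 3.
import numpy as np

rng = np.random.default_rng(0)
t = rng.exponential(scale=3.0, size=1_000_000)    # E(T) = 3
print(f"E(T | T > 2) ~ {t[t > 2].mean():.3f}")    # close to 5
```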
- Problem. Consider an experiment that produces two random variables, X and Y. We can observe Y but we really want to know X. Therefore, the estimation task is to assign to every y ∈ S_Y a number, \hat{x}_M(y), that is near X. As in the other techniques presented in this section, our accuracy measure is the mean square error

e_M = E[(X - \hat{x}_M(y))^2 | Y = y]    (4)
- Remark.
Because each y ∈ S_Y produces a specific \hat{x}_M(y), \hat{x}_M(y) is a sample value of the random variable \hat{X}_M(Y).
The fact that \hat{x}_M(y) is a sample value of a random variable is in contrast to blind estimation and estimation given an event.
In those situations, \hat{x}_B and \hat{x}_A are parameters of the probability model of X.
In common with \hat{x}_B in Theorem 1 and \hat{x}_A in Theorem 2, the estimate of X given Y is an expected value of X based on available information. In this case, the available information is the value of Y.
Theorem 3
The minimum mean square error estimate of X given the observation Y = y is

\hat{x}_M(y) = E(X | Y = y)    (5)
Example 3
Suppose X and Y are independent random variables with PDFs fX (x) and fY (y). What is the minimum
mean square error estimate of X given Y ?
Solution.
In this case, f_{X|Y}(x) = f_X(x) and the minimum mean square error estimate is

\hat{x}_M(y) = \int_{-\infty}^{+\infty} x f_{X|Y}(x) \, dx = \int_{-\infty}^{+\infty} x f_X(x) \, dx = E(X) = \hat{x}_B.

That is, when X and Y are independent, the observation Y is useless and the best estimate of X is simply the blind estimate.
Example 4
Suppose that R is a uniform U(0, 1) random variable and that, given R = r, X is a uniform U(0, r) random variable. Find \hat{x}_M(r), the minimum mean square error estimate of X given R.
Solution.
From Theorem 3, we know \hat{x}_M(r) = E(X | R = r).
To calculate the estimator, we need the conditional PDF f_{X|R}(x). The problem statement implies that

f_{X|R}(x) = \begin{cases} \frac{1}{r}, & 0 < x < r < 1, \\ 0, & \text{otherwise}, \end{cases}

permitting us to write

\hat{x}_M(r) = \int_0^r x \cdot \frac{1}{r} \, dx = \frac{r}{2}.
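Example 4 can be illustrated with a short simulation (a sketch; the sample size, seed, test point r0, and slice width are arbitrary choices): averaging X over samples whose R lies in a narrow slice around r0 reproduces r0/2.

```python
# Sketch: simulate R ~ U(0,1) and X | R = r ~ U(0, r), then compare the
# sample conditional mean of X near a fixed r with the formula r/2.
import numpy as np

rng = np.random.default_rng(1)
r = rng.uniform(0, 1, size=2_000_000)
x = rng.uniform(0, r)                       # X | R = r ~ U(0, r)

r0 = 0.6                                    # arbitrary point to check
near_r0 = np.abs(r - r0) < 0.01             # narrow slice around R = r0
print(f"E(X | R ~ {r0}) ~ {x[near_r0].mean():.3f}, theory {r0 / 2:.3f}")
```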
Example 5
Suppose that R is a uniform U(0, 1) random variable and that, given R = r, X is a uniform U(0, r) random variable. Find \hat{r}_M(x), the minimum mean square error estimate of R given X = x.
Solution.
From Theorem 3, we know \hat{r}_M(x) = E(R | X = x).
To perform this calculation, we need to find the conditional PDF f_{R|X}(r). We have

f_{R,X}(r, x) = f_R(r) f_{X|R}(x) = \begin{cases} \frac{1}{r}, & 0 < x < r < 1, \\ 0, & \text{otherwise}. \end{cases}

So,

f_X(x) = \int_{-\infty}^{+\infty} f_{R,X}(r, x) \, dr = \begin{cases} \int_x^1 \frac{1}{r} \, dr = -\ln x, & 0 < x < 1, \\ 0, & \text{otherwise}. \end{cases}
Solution.
Hence,

f_{R|X}(r) = \frac{f_{R,X}(r, x)}{f_X(x)} = \begin{cases} \dfrac{-1}{r \ln x}, & 0 < x < r < 1, \\ 0, & \text{otherwise}. \end{cases}

The corresponding estimator is, therefore,

\hat{r}_M(x) = \int_x^1 r \cdot \frac{-1}{r \ln x} \, dr = \frac{x - 1}{\ln x}.
- Remark. While the solution of Example 4 is a simple function of r that can easily be obtained with a
microprocessor or an analog electronic circuit, the solution of Example 5 is considerably more complex. In many
applications, the cost of calculating this estimate could be significant. In these applications, engineers would
look for a simpler estimate. Even though the simpler estimate produces a higher mean square error than the
estimate in Example 5, the complexity savings might justify the simpler approach. For this reason, there are
many applications of estimation theory that employ linear estimates, the subject of Section 2.
Problems
Exercise 1
Generalizing the solution of Example 2, let the call duration T be an exponential (λ) random variable. For t_0 > 0, show that the minimum mean square error estimate of T, given that T > t_0, is \hat{t} = t_0 + E(T).
Solution.
\hat{t} = E(T | T > t_0) = \int_{t_0}^{\infty} t f_{T|T>t_0}(t) \, dt.

Since f_{T|T>t_0}(t) = \lambda e^{-\lambda(t - t_0)} for t \ge t_0, the substitution s = t - t_0 gives

\hat{t} = \int_0^{\infty} (t_0 + s) \lambda e^{-\lambda s} \, ds = t_0 + \frac{1}{\lambda} = t_0 + E(T).
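A numerical sanity check of this claim (a sketch with an arbitrary rate and threshold, not part of the exercise): the sample mean of simulated durations exceeding t_0 is close to t_0 + E(T).

```python
# Sketch: check E(T | T > t0) = t0 + E(T) for an exponential (lambda) duration.
import numpy as np

rng = np.random.default_rng(2)
lam, t0 = 0.5, 4.0                              # arbitrary rate and threshold
t = rng.exponential(scale=1 / lam, size=1_000_000)
print(t[t > t0].mean(), t0 + 1 / lam)           # both close to 6.0
```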
Problems
Exercise 2
Solution.
(b) \hat{x}_B = E(X).
(c) \hat{x}_A = E(X | X < 0.5).
(e) \hat{y}_B = E(Y).
(f) \hat{y}_A = E(Y | Y > 0.5).
Problems
Exercise 3
Exercise 4
Solution.
(b) \hat{x}_M(y) = E(X | Y = y) = y/3.
(d) \hat{y}_M(x) = E(Y | X = x) = x/3 + 2/3.
Problems
Exercise 5
Solution.
(a) f_{X|Y}(x) = 1/y, 0 \le x \le y.
(b) \hat{x}_M(y) = E(X | Y = y) = y/2.
Content
Introduction
- In this section, we again use an observation, y, of random variable Y to produce an estimate, \hat{x}, of random variable X.
- Again, our accuracy measure is the mean square error in Equation (1).
- Section 1 derives \hat{x}_M(y), the optimum estimate for each possible observation Y = y.
- By contrast, in this section, the estimate is a single function that applies to all Y. The notation for this function is

\hat{x}_L(y) = ay + b    (6)
Introduction
- To present the mathematics of minimum mean square error linear estimation, we introduce the subscript L to denote the mean square error of a linear estimate:

e_L = E[(X - \hat{X}_L(Y))^2]    (7)

- We also use the correlation coefficient of X and Y,

\rho_{X,Y} = \frac{\mathrm{cov}(X, Y)}{\sigma_X \sigma_Y}    (8)
Theorem 4
Random variables X and Y have expected values \mu_X and \mu_Y, standard deviations \sigma_X and \sigma_Y, and correlation coefficient \rho_{X,Y}. The optimal linear mean square error (LMSE) estimator of X given Y is \hat{X}_L(Y) = a^* Y + b^*, and it has the following properties:
(a)

a^* = \frac{\mathrm{cov}(X, Y)}{\mathrm{Var}(Y)} = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y}, \qquad b^* = \mu_X - a^* \mu_Y    (9)

(b) The minimum mean square estimation error for a linear estimate is

e^*_L = E[(X - \hat{X}_L(Y))^2] = \sigma_X^2 (1 - \rho_{X,Y}^2)    (10)
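The formulas of Theorem 4 translate directly into a few lines of Python. The sketch below is illustrative; the helper name lmse_coefficients and the synthetic test data are assumptions, not part of the slides.

```python
# Sketch: compute the LMSE coefficients a*, b* and the minimum error e*_L of
# Theorem 4 from sample moments, then try them on synthetic data.
import numpy as np

def lmse_coefficients(x, y):
    """Return (a_star, b_star, e_star) for the linear estimate a*y + b*."""
    cov_xy = np.cov(x, y, bias=True)[0, 1]
    a = cov_xy / y.var()                       # a* = cov(X,Y) / Var(Y)
    b = x.mean() - a * y.mean()                # b* = mu_X - a* mu_Y
    rho = cov_xy / np.sqrt(x.var() * y.var())
    e = x.var() * (1 - rho ** 2)               # e*_L = Var(X)(1 - rho^2)
    return a, b, e

rng = np.random.default_rng(3)
y = rng.normal(size=100_000)
x = 2 * y + 1 + 0.5 * rng.normal(size=100_000)  # X depends linearly on Y plus noise
print(lmse_coefficients(x, y))                  # roughly (2.0, 1.0, 0.25)
```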
Definition 1 (Bivariate Gaussian Random Variables)
Random variables X and Y have a bivariate Gaussian PDF with parameters \mu_1, \mu_2, \sigma_1, \sigma_2, and \rho if

f_{X,Y}(x, y) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\!\left( -\frac{ \left(\frac{x-\mu_1}{\sigma_1}\right)^2 - \frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2} + \left(\frac{y-\mu_2}{\sigma_2}\right)^2 }{2(1-\rho^2)} \right)    (11)

where \mu_1 and \mu_2 can be any real numbers, \sigma_1 > 0, \sigma_2 > 0, and -1 < \rho < 1.
Theorem 5
If X and Y are the bivariate Gaussian random variables in Definition 1, the optimum estimator of X given
Y is the optimum linear estimator in Theorem 4.
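Theorem 5 can be illustrated by simulation: for jointly Gaussian X and Y, the conditional mean estimated from samples near a fixed y agrees with the linear estimate of Theorem 4. The parameters, sample size, and slice width in the sketch below are arbitrary choices, not from the slides.

```python
# Sketch: for bivariate Gaussian (X, Y), compare the empirical conditional
# mean E(X | Y ~ y0) with the linear estimate a* y0 + b*.
import numpy as np

rng = np.random.default_rng(5)
mu = [1.0, 2.0]
cov = [[4.0, 1.5], [1.5, 1.0]]                   # Var(X)=4, Var(Y)=1, cov=1.5
x, y = rng.multivariate_normal(mu, cov, size=1_000_000).T

y0 = 2.5
near_y0 = np.abs(y - y0) < 0.02                  # narrow slice around Y = y0
a = cov[0][1] / cov[1][1]                        # a* = cov(X,Y) / Var(Y)
b = mu[0] - a * mu[1]                            # b* = mu_X - a* mu_Y
print(x[near_y0].mean(), a * y0 + b)             # both ~ 1.75
```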
Example 6
As in Examples 4 and 5, R is a uniform U (0, 1) random variable and given R = r, X is a uniform U (0, r)
random variable. Derive the optimum linear estimator of R given X.
Solution.
The estimate we have to derive is given by Theorem 4:

\hat{r}_L(x) = \rho_{R,X} \frac{\sigma_R}{\sigma_X} (x - E(X)) + E(R).

Since R is uniform U(0, 1), E(R) = 1/2 and \sigma_R = 1/\sqrt{12}.
Solution.
Using f_X(x) and f_{R,X}(r, x) in Example 5, we can calculate

E(X) = \int_{-\infty}^{+\infty} x f_X(x) \, dx = 1/4, \qquad \sigma_X = \sqrt{7}/12,

and

E(RX) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} r x f_{R,X}(r, x) \, dr \, dx = 1/6.

Hence, cov(R, X) = E(RX) - E(R)E(X) = 1/24 and \rho_{R,X} = \sqrt{3/7}.
So

\hat{r}_L(x) = \frac{6}{7} x + \frac{2}{7}.
Figure 1: The minimum mean square error (MMSE) estimate \hat{r}_M(x) from Example 5 and the optimum linear (LMSE) estimate \hat{r}_L(x) from Example 6 of R given X = x.
- Remark.
Figure 1 compares the optimum (MMSE) estimator and the optimum linear (LMSE) estimator.
We see that the two estimators are reasonably close for all but extreme values of x (near 0 and 1).
Note that for x > 5/6, the linear estimate is greater than 1, the largest possible value of R.
By contrast, the optimum estimate \hat{r}_M(x) is confined to the range of R for all x.
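For readers without the figure, the comparison can be reproduced by evaluating both estimators on a grid; this sketch simply tabulates \hat{r}_M(x) = (x - 1)/\ln x from Example 5 and \hat{r}_L(x) = 6x/7 + 2/7 from Example 6.

```python
# Sketch: tabulate the MMSE estimate (x - 1)/ln(x) and the linear estimate
# 6x/7 + 2/7 of R given X = x on a small grid.
import numpy as np

for x in np.linspace(0.05, 0.95, 10):
    r_mmse = (x - 1) / np.log(x)
    r_lmse = 6 * x / 7 + 2 / 7
    print(f"x = {x:.2f}   r_M = {r_mmse:.3f}   r_L = {r_lmse:.3f}")
# For x > 5/6 the linear estimate exceeds 1, while r_M(x) stays inside (0, 1).
```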
Problems
Exercise 6
The following table gives P_{X,Y}(x, y), the joint probability mass function of random variables X and Y.

          y = -3    y = -1    y = 1     y = 3
x = -1     1/6       1/8      1/24       0
x = 0      1/12      1/12     1/12      1/12
x = 1       0        1/24     1/8       1/6
Problems
Exercise 6 (continued)
(a) Find the marginal probability mass functions P_X(x) and P_Y(y).
(b) Are X and Y independent?
(c) Find E(X), E(Y), Var(X), Var(Y), and cov(X, Y).
(d) Let \hat{X}(Y) = aY + b be a linear estimator of X. Find a^* and b^*, the values of a and b that minimize the mean square error e_L.
(e) What is e^*_L, the minimum mean square error of the optimum linear estimate?
(f) Find P_{X|Y=-3}(x), the conditional PMF of X given Y = -3.
(g) Find \hat{x}_M(-3), the optimum (nonlinear) mean square estimator of X given Y = -3.
(h) What is

e^*(-3) = E[(X - \hat{x}_M(-3))^2 | Y = -3],

the mean square error of this estimate?
Problems
Solution.
(d) From Theorem 4, the optimal linear estimate of X given Y is

\hat{X}_L(Y) = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y} (Y - \mu_Y) + \mu_X = \frac{7}{30} Y + 0.

Therefore, a^* = 7/30 and b^* = 0.
(g) \hat{x}_M(-3) = E(X | Y = -3) = -2/3.
(h) \hat{e}_M(-3) = E[(X - \hat{x}_M(-3))^2 | Y = -3] = 2/9.
Problems
Exercise 7
A telemetry voltage V, transmitted from a position sensor on a ship's rudder, is a random variable with PDF

f_V(v) = \begin{cases} 1/12, & -6 \le v \le 6, \\ 0, & \text{otherwise}. \end{cases}

A receiver in the ship's control room receives R = V + X. The random variable X is a Gaussian (0, 3) noise voltage that is independent of V. The receiver uses R to calculate a linear estimate of the telemetry voltage: \hat{V} = aR + b. Find
(a) the expected received voltage E(R),
(b) the variance Var(R) of the received voltage,
(c) the covariance cov(V, R) of the transmitted and received voltages,
(d) a∗ and b∗ , the optimum coefficients in the linear estimate,
(e) e∗ , the minimum mean square error of the estimate
Problems
Solution.
(a) E(R) = E(V + X) = 0.
(b) Var(R) = Var(V) + Var(X) = 15.
(c) cov(V, R) = Var(V) = 12.
(d) \rho_{V,R} = \sigma_V / \sigma_R and \hat{V}(R) = 12R/15, so a^* = 4/5 and b^* = 0.
(e) e^* = Var(V)(1 - \rho_{V,R}^2) = 12/5.
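A Monte Carlo sketch of this exercise (sample size and seed are arbitrary; the noise variance is taken as 3, the value used in the solution so that Var(R) = 15) recovers a^* ≈ 4/5, b^* ≈ 0, and e^* ≈ 12/5.

```python
# Sketch: simulate R = V + X and fit the linear estimate of V from R.
import numpy as np

rng = np.random.default_rng(4)
v = rng.uniform(-6, 6, size=1_000_000)           # telemetry voltage, Var(V) = 12
x = rng.normal(0, np.sqrt(3), size=v.size)       # independent Gaussian noise, Var = 3
r = v + x

a = np.cov(v, r, bias=True)[0, 1] / r.var()      # a* = cov(V,R) / Var(R)
b = v.mean() - a * r.mean()                      # b* = E(V) - a* E(R)
e = np.mean((v - (a * r + b)) ** 2)              # empirical mean square error
print(a, b, e)                                   # ~0.8, ~0, ~2.4
```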
Problems
Exercise 8
The random variables X and Y have the joint probability density function

f_{X,Y}(x, y) = \begin{cases} 2(x + y), & 0 \le x \le y \le 1, \\ 0, & \text{otherwise}. \end{cases}

What is \hat{X}_L(Y), the linear minimum mean square error estimate of X given Y?
Solution.
E(X) = 5/12, E(X^2) = 7/30, E(Y) = 3/4, E(Y^2) = 3/5, Var(X) = 129/2160, Var(Y) = 3/80, E(XY) = 1/3, \rho_{X,Y} = 5/\sqrt{129}, and

\hat{X}_L(Y) = \rho_{X,Y} \frac{\sigma_X}{\sigma_Y} (Y - E(Y)) + E(X) = \frac{5Y}{9}.
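The moments quoted above can be double-checked numerically; the sketch below (the grid size is an arbitrary choice) approximates each integral over the triangle 0 \le x \le y \le 1 with a midpoint Riemann sum.

```python
# Sketch: approximate the moments of f(x, y) = 2(x + y) on 0 <= x <= y <= 1
# with a midpoint Riemann sum.
import numpy as np

n = 1000
grid = (np.arange(n) + 0.5) / n                  # cell midpoints on (0, 1)
x, y = np.meshgrid(grid, grid, indexing="ij")
dA = (1 / n) ** 2
f = np.where(x <= y, 2 * (x + y), 0.0)           # joint PDF on the triangle

print(np.sum(x * f) * dA)                        # E(X)   ~ 5/12
print(np.sum(x ** 2 * f) * dA)                   # E(X^2) ~ 7/30
print(np.sum(y * f) * dA)                        # E(Y)   ~ 3/4
print(np.sum(x * y * f) * dA)                    # E(XY)  ~ 1/3
```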
Problems
Exercise 9
Problems
Exercise 10
Random variable R has an exponential PDF with expected value 1. Given R = r, X has an exponential PDF with expected value 1/r. Find
(a) the MMSE estimate of R given X = x, \hat{r}_M(x),
(b) the MMSE estimate of X given R = r, \hat{x}_M(r),
(c) the LMSE estimate of X given R, \hat{X}_L(R),
(d) the LMSE estimate of R given X, \hat{R}_L(X).
Solution.
(a) \hat{r}_M(x) = E(R | X = x) = 2/(x + 1).
(b) \hat{x}_M(r) = E(X | R = r) = 1/r.
(c) Because E(X) does not exist, the LMSE estimate of X given R does not exist.
(d) Because E(X) does not exist, the LMSE estimate of R given X does not exist.
Content
Introduction
- Sections 1 and 2 describe methods for minimizing the mean square error in estimating a random variable X given a sample value of another random variable Y.
- In this section, we present the maximum a posteriori probability (MAP) estimator and the maximum likelihood (ML) estimator.
- Although neither of these estimates produces the minimum mean square error, they are convenient to obtain in some applications, and they often produce estimates with errors that are not much higher than the minimum mean square error.
- As you might expect, MAP and ML estimation are closely related to MAP and ML hypothesis testing. We will describe these methods in the context of continuous random variables X and Y.
MAP Estimation
Definition 2 (Maximum A Posteriori Probability (MAP) Estimate)
The maximum a posteriori probability estimate of X given Y = y is

\hat{x}_{MAP}(y) = \arg\max_x f_{X|Y}(x)    (12)

- In this definition, the notation \arg\max_x g(x) denotes a value of x that maximizes g(x), where g(x) is any function of a variable x.
- We recall from Chapter 3 that

f_{X|Y}(x) = \frac{f_{Y|X}(y) f_X(x)}{f_Y(y)}    (13)

- Because the denominator f_Y(y) does not depend on x, maximizing f_{X|Y}(x) over all x is equivalent to maximizing the numerator f_{Y|X}(y) f_X(x).
MAP Estimation
Theorem 6
x
bMAP (y) = arg max fY |X (y)fX (x) = arg max fX,Y (x, y) (14)
x x
- Remark.
From Theorem 6, we see that the MAP estimation procedure requires that we know the PDF f_X(x). That is, the MAP procedure needs the a priori probability model for random variable X. This is analogous to the requirement of the MAP hypothesis test that we know the a priori probabilities P(H_i).
In the absence of this a priori information, we can instead implement a maximum likelihood estimator.
ML Estimation
The maximum likelihood (ML) estimate of X given Y = y is

\hat{x}_{ML}(y) = \arg\max_x f_{Y|X}(y)    (15)
Examples
Example 7
Consider an experiment that produces a Bernoulli random variable with probability of success q. In this experiment, q is a sample value of a random variable, Q, with PDF

f_Q(q) = \begin{cases} 6q(1 - q), & 0 \le q \le 1, \\ 0, & \text{otherwise}. \end{cases}
To estimate Q we perform n independent trials of the Bernoulli experiment. The number of successes in
the n trials is a random variable K. Given an observation K = k, derive the following estimates of Q:
(a) The blind estimate qbB .
(b) The maximum likelihood estimate qbML (k).
(c) The maximum a posteriori probability estimate qbMAP (k).
Examples
Solution.
(a) To derive the blind estimate, we find

\hat{q}_B = E(Q) = \int_0^1 q \cdot 6q(1 - q) \, dq = 1/2.

(b) To find the other estimates, we observe in the problem statement that for any Q = q, K is a binomial random variable B(n, q). Therefore, the conditional PMF of K is P_{K|Q}(k) = C_n^k q^k (1 - q)^{n-k}, k = 0, 1, \dots, n. The maximum likelihood estimate is the value of q that maximizes P_{K|Q}(k). The derivative of P_{K|Q}(k) with respect to q is

\frac{d P_{K|Q}(k)}{dq} = C_n^k q^{k-1} (1 - q)^{n-k-1} \left( k(1 - q) - (n - k)q \right).

Setting dP_{K|Q}(k)/dq = 0 and solving for q yields

\hat{q}_{ML}(k) = \frac{k}{n}.
Examples
Solution.
(c) For the MAP estimator, we need to maximize

f_{Q|K}(q) = \frac{f_Q(q) P_{K|Q}(k)}{P_K(k)}.

Since the denominator of this equation is constant with respect to q, we can obtain the maximum value by setting the derivative of the numerator to 0:

\frac{d}{dq}\left[ f_Q(q) P_{K|Q}(k) \right] = 6 C_n^k q^k (1 - q)^{n-k} \left[ (k + 1)(1 - q) - (n - k + 1)q \right] = 0.

Solving for q yields

\hat{q}_{MAP}(k) = \frac{k + 1}{n + 2}.
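A brief numerical check of this example (n, k, and the grid are arbitrary choices, not from the slides): maximizing the likelihood and the posterior numerator over a grid of q recovers k/n and (k + 1)/(n + 2).

```python
# Sketch: confirm q_ML(k) = k/n and q_MAP(k) = (k+1)/(n+2) numerically.
import numpy as np

n, k = 10, 4
q = np.linspace(1e-6, 1 - 1e-6, 100_001)

likelihood = q ** k * (1 - q) ** (n - k)         # proportional to P_{K|Q}(k)
posterior = 6 * q * (1 - q) * likelihood         # f_Q(q) * P_{K|Q}(k), up to a constant

print(q[np.argmax(likelihood)], k / n)              # ~0.4000 vs 0.4
print(q[np.argmax(posterior)], (k + 1) / (n + 2))   # ~0.4167 vs 5/12
```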
Problems
Exercise 11
For a certain coin, Q is a uniform U(0, 1) random variable. Given Q = q, each flip is heads with probability q, independent of any other flip. Suppose this coin is flipped n times. Let K denote the number of heads in n flips.
(a) What is the ML estimator of Q given K?
(b) What is the MAP estimator of Q given K?
Problems
Solution.
(a) The maximum likelihood estimate of Q given K = k selects the value of q that maximizes P_{K|Q}(k), the conditional PMF of K given Q. Since P_{K|Q}(k) = C_n^k q^k (1 - q)^{n-k}, the same maximization as in Example 7 gives \hat{q}_{ML}(k) = k/n.
(b) The maximum a posteriori estimate of Q given K = k is the value of q that maximizes f_{Q|K}(q). Because f_Q(q) = 1 for 0 \le q \le 1, f_{Q|K}(q) is proportional to P_{K|Q}(k), and therefore \hat{q}_{MAP}(k) = \hat{q}_{ML}(k) = k/n.