Lecture 1.3
Note: These notes are for your personal educational use only. Please do not distribute them.
1 Introduction
We now turn to the study of estimation theory. But before delving in, it is important to understand how
this topic fits into the broader scope of statistical inference. So, we briefly situate the topics we cover in this
class within the four main paradigms of statistical decision theory. Each paradigm focuses on a particular set
of assumptions imposed on a common abstract problem model. The common abstract setting is as follows.
We have a hidden variable X ∈ 𝒳 that we cannot observe, and another random variable Y ∈ 𝒴 that we can observe, where 𝒳 and 𝒴 are standard measurable spaces, e.g., {0, 1}, R, etc. The two variables are related by a known “observation model” or “statistical experiment,” which is a collection of likelihoods of Y given X, i.e., a Markov kernel {PY|X(·|x) : x ∈ 𝒳}, where each PY|X(·|x) is a probability distribution over 𝒴.
Our goal is to infer X based on an observation of a realization of the random variable Y generated by the
above model. Clearly, the variable X may be deterministic or random, and its alphabet 𝒳 may be a finite or an infinite set. The precise nature of X and 𝒳 determines which of the aforementioned four paradigms the inference problem falls into.
In the early 1900s, the radar community was interested in models where |𝒳| < +∞ (often with |𝒳| = 2).
For instance, radar engineers would observe some measurement and have to detect if there was a signal
in the measurement, or if the measurement was just random noise. This problem could be set up as a
binary hypothesis testing problem where 𝒳 = {0 = no signal, 1 = signal} (which we studied earlier). Such inference problems with finite |𝒳| are classified under the category of detection theory. In contrast, the branch of statistics that deals with inference problems where 𝒳 is a countably or uncountably infinite set, e.g., {. . . , −2, −1, 0, 1, 2, . . . }, R, etc., is known as estimation theory. In the radar context, after detecting
an analog signal, engineers would have to approximate its value from noisy measurements. This would
correspond to a parameter estimation problem.
In the early statistics community, there was another divide among inference problems. Bayesian statis-
ticians believed that the underlying variable X was random and had a prior distribution PX . This prior
represented the statistician’s belief about X. So, the “right” way to proceed after observing Y was to com-
pute the posterior distribution PX|Y using Bayes’ rule in order to update the belief about X. In contrast,
non-Bayesian (or frequentist) statisticians did not impose such a prior over X. They argued that because X
could not be observed enough times for probabilities of X to have philosophical meaning, imposing a prior
over X was also meaningless. They assumed instead that X was just an unknown deterministic parameter.
The “matrix” below classifies the topics we cover in this class within the four paradigms of statistical decision
theory.
             | Bayesian                    | Non-Bayesian
  Detection  | Bayesian hypothesis testing | Neyman-Pearson theory
  Estimation | Bayesian least squares      | Minimax estimation
2 Bayesian Estimation
In this section, we are concerned with Bayesian estimation theory. For simplicity, we assume in the sequel
that 𝒳 = 𝒴 = R. As discussed above, in the Bayesian framework, we assume that we are given a prior probability density function (PDF) PX > 0 and the likelihoods {PY|X(·|x) > 0 : x ∈ R}, where each PY|X(·|x) is a conditional PDF of Y given X = x. Moreover, we assume that X and Y have finite second moments. Our objective is to construct a “good” estimator X̂ : R → R, so that X̂(Y) provides an estimate of
X given an observation of Y . (Note that although we work with scalar random variables X and Y here, the
analysis can be generalized for vector-valued X and Y .)
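To make this setup concrete, here is a small simulation sketch (an illustration of the framework, not part of the original notes): it assumes a hypothetical Gaussian prior and additive Gaussian noise model, and evaluates two naive baseline estimators by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical instance of the Bayesian setup (assumed for illustration):
# X ~ N(0, sigma_x^2) is the hidden variable and Y = X + N is the observation,
# with independent noise N ~ N(0, sigma_n^2).
sigma_x, sigma_n = 1.0, 0.5
n_samples = 100_000

x = rng.normal(0.0, sigma_x, size=n_samples)      # draws from the prior P_X
y = x + rng.normal(0.0, sigma_n, size=n_samples)  # draws from P_{Y|X}(.|x)

# Any estimator is a function of y alone; two naive baselines for later comparison:
print("MSE of constant estimator X_hat = 0:", np.mean(x ** 2))
print("MSE of identity estimator X_hat = Y:", np.mean((x - y) ** 2))
```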
To formally set up the problem of finding a good estimator, as in our discussion of binary hypothesis
testing, consider a loss function L : R × R → R, where L(x, x̂) represents the “loss” in inferring x̂ when the
actual realization of X is x. We would like to construct the Bayes optimal estimator X̂B : R → R that solves
the following minimization problem:
RB ≜ inf_{X̂:R→R} E[L(X, X̂(Y))] ,   (1)
where the infimum is over all estimators X̂ of X based on Y , E[L(X, X̂(Y ))] is known as the risk (where
the expectation is with respect to the joint PDF PX,Y ), and RB is known as the Bayes risk. Note that
the infimum is over all deterministic estimators, because Bayes risk is achieved by deterministic estimators
as we argued during our analysis of Bayesian hypothesis testing. So, there is no advantage in considering
randomized estimators in the minimization above.
In analogy with our analysis of binary hypothesis testing, the next proposition characterizes the Bayes
optimal estimator, cf. [1, Section 6].
Proposition 1 (Bayes Optimal Estimator). Under the aforementioned framework, suppose there exists an estimator X̂B : R → R such that:

∀y ∈ R, X̂B(y) = arg min_{u∈R} E[L(X, u) | Y = y] .

Then, X̂B is the Bayes optimal estimator (as the notation suggests).
Proof. By the law of total expectation, the risk of any estimator X̂ can be written as

E[L(X, X̂(Y))] = ∫_{−∞}^{+∞} PY(y) ∫_{−∞}^{+∞} L(x, X̂(y)) PX|Y(x|y) dx dy ,

where PX|Y(x|y) = PY|X(y|x)PX(x)/PY(y) by Bayes’ rule. Hence, we can find the optimal X̂ by minimizing the above expression pointwise for each y ∈ R. Therefore, for any y ∈ R, the Bayes optimal estimator is given by

X̂B(y) = arg min_{u∈R} ∫_{−∞}^{+∞} L(x, u) PX|Y(x|y) dx = arg min_{u∈R} E[L(X, u) | Y = y] .
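To make this pointwise minimization concrete, the following sketch (assuming a hypothetical discretized posterior, not taken from the notes) minimizes E[L(X, u) | Y = y] over a grid of candidates u for two different losses; squared loss recovers the posterior mean and absolute loss recovers the posterior median.

```python
import numpy as np

# Hypothetical discretized posterior P_{X|Y}(.|y) for one fixed y, chosen only
# for illustration: a skewed distribution supported on a grid.
x_grid = np.linspace(-3.0, 3.0, 601)
weights = np.exp(-0.5 * (x_grid - 0.7) ** 2) * (1.0 + 0.8 * (x_grid > 0.7))
posterior = weights / weights.sum()  # normalize to sum to one on the grid

def bayes_estimate(loss):
    """Minimize u -> E[L(X, u) | Y = y] over the same grid of candidate values u."""
    expected_loss = [np.sum(loss(x_grid, u) * posterior) for u in x_grid]
    return x_grid[int(np.argmin(expected_loss))]

# Squared loss recovers (approximately) the posterior mean ...
print(bayes_estimate(lambda x, u: (x - u) ** 2), np.sum(x_grid * posterior))
# ... while absolute loss recovers (approximately) the posterior median.
cdf = np.cumsum(posterior)
print(bayes_estimate(lambda x, u: np.abs(x - u)), x_grid[int(np.searchsorted(cdf, 0.5))])
```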
It is evident that different choices of loss functions lead to different Bayes optimal estimators X̂B . Perhaps
the most popular and commonly used loss function in the literature is the squared loss:

L(x, x̂) = (x − x̂)² .
The risk corresponding to the squared loss is known as the mean squared error (MSE), and the minimum
MSE (MMSE) risk is given by:

RMMSE = min_{X̂:R→R} E[(X − X̂(Y))²] .   (3)
The Bayes optimal estimator that achieves RMMSE , denoted X̂BLS : R → R, is known as the Bayes least
squares (BLS) estimator (or sometimes the MMSE estimator). The next theorem presents the well-known
result that the BLS estimator is the mean of the posterior distribution PX|Y , cf. [1, Section 6.3].
Theorem 1 (BLS Estimator). The BLS estimator is given by the conditional expectation:

∀y ∈ R, X̂BLS(y) = E[X | Y = y] ,

and the corresponding MMSE risk is RMMSE = E[var(X|Y)].
Proof. Applying Proposition 1 with the squared loss, for any y ∈ R,

X̂B(y) = arg min_{u∈R} E[(X − u)² | Y = y] = arg min_{u∈R} u(u − 2E[X|Y = y]) = E[X|Y = y] .

Moreover, the resulting MMSE risk is

RMMSE = E[(X − E[X|Y])²] = E[E[(X − E[X|Y])² | Y]] = E[var(X|Y)] .
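As a numerical sanity check of Theorem 1 (a Monte Carlo sketch under an assumed jointly Gaussian model where E[X|Y] has a simple closed form), the conditional mean attains a smaller empirical MSE than other estimators of Y:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed jointly Gaussian model (illustration only): X ~ N(0, 1) and
# Y = X + N with N ~ N(0, 0.25), so E[X|Y] = Y / 1.25 in closed form and
# E[var(X|Y)] = 0.25 / 1.25 = 0.2.
n = 200_000
x = rng.normal(0.0, 1.0, n)
y = x + rng.normal(0.0, 0.5, n)

mse = lambda x_hat: np.mean((x - x_hat) ** 2)
print("conditional mean E[X|Y]:", mse(y / 1.25))  # ~0.2, the smallest
print("identity estimator Y   :", mse(y))         # ~0.25
print("shrunken estimator Y/2 :", mse(0.5 * y))   # ~0.3125
```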
To develop a more geometric view of least squares estimation, we regard functions of (X, Y) as elements of the Hilbert space L2(R × R, PX,Y) of functions f : R × R → R with E[f(X, Y)²] < +∞, equipped with the inner product ⟨f, g⟩ = E[f(X, Y)g(X, Y)] and the induced norm ‖f‖ = ⟨f, f⟩^{1/2} = E[f(X, Y)²]^{1/2}, which is the square root of the second moment of f. The finite second moment (or finite norm) constraint in the definition of a Hilbert space is an analytical condition that ensures that the space is complete, i.e., every Cauchy sequence converges in the space. Informally, the vector space structure of a Hilbert space
allows linear combinations of functions in the space to belong to the space, the inner product permits us
to measure distances and angles between functions in the space, and the completeness allows us to take
well-defined limits in the space. Finally, it is worth mentioning that various well-known inequalities for
Euclidean spaces carry over to this infinite dimensional Hilbert space setting. For instance, for any two
functions f, g ∈ L2 (R × R, PX,Y ), we have the well-known Cauchy-Schwarz-Bunyakovsky inequality:
|⟨f, g⟩|² = E[f(X, Y)g(X, Y)]² ≤ E[f(X, Y)²] E[g(X, Y)²] = ‖f‖² ‖g‖² .
We refer readers to [2] for a comprehensive introduction to the theory of infinite dimensional Hilbert spaces.
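As a quick numerical illustration of the inequality (not in the notes), one can replace the expectations by sample averages for two arbitrary square-integrable test functions:

```python
import numpy as np

rng = np.random.default_rng(2)

# Arbitrary correlated (X, Y) pair and two test functions f, g of (X, Y),
# chosen only to check the inequality with Monte Carlo second moments.
x = rng.normal(size=50_000)
y = 0.6 * x + 0.8 * rng.normal(size=50_000)
f, g = x + y, x * y

lhs = np.mean(f * g) ** 2                # |<f, g>|^2
rhs = np.mean(f ** 2) * np.mean(g ** 2)  # ||f||^2 ||g||^2
print(lhs <= rhs, lhs, rhs)
```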
Let S be a closed linear subspace of L2 (R × R, PX,Y ) (i.e., a sub-Hilbert space of L2 (R × R, PX,Y )). This
means that S is a non-empty subset of L2 (R × R, PX,Y ) that is itself a Hilbert space with the same inherited
inner product. Then, we have the following orthogonality principle, cf. [2, Lemma 4.1], which can be shown to follow from the Hilbert projection theorem in convex analysis.

Theorem 2 (Orthogonality Principle). Let g ∈ L2(R × R, PX,Y) and let S be a closed linear subspace of L2(R × R, PX,Y). Then, there exists a unique solution h = arg min_{f∈S} ‖g − f‖², and h ∈ S solves this minimization if and only if

∀f ∈ S, ⟨g − h, f⟩ = 0 .
Proof. Since you are not required to know functional analysis, we omit the analytical details that guarantee the existence and uniqueness of h as the solution to the extremization min_{f∈S} ‖g − f‖²; see, e.g., [2, Lemma 4.1] if interested.
To prove the forward direction, consider the function h − εf ∈ S for any fixed f ∈ S and ε ∈ R\{0}, and observe using h = arg min_{f∈S} ‖g − f‖² that:

‖g − h‖² ≤ ‖g − h + εf‖² = ‖g − h‖² + 2ε⟨g − h, f⟩ + ε²‖f‖² .

Rearranging gives 0 ≤ 2ε⟨g − h, f⟩ + ε²‖f‖² for every ε ≠ 0; dividing by ε and letting ε → 0 from above and from below yields ⟨g − h, f⟩ ≥ 0 and ⟨g − h, f⟩ ≤ 0, respectively, so that ⟨g − h, f⟩ = 0 for every f ∈ S.

To prove the converse direction, suppose that ⟨g − h, f⟩ = 0 for all f ∈ S. Then, for any f ∈ S,

‖g − f‖² = ‖(g − h) + (h − f)‖² = ‖g − h‖² + 2⟨g − h, h − f⟩ + ‖h − f‖² = ‖g − h‖² + ‖h − f‖² ≥ ‖g − h‖² ,

where the third equality follows from ⟨g − h, h − f⟩ = 0 since h − f ∈ S. This completes the proof.
Geometrically, this principle states that given a function g ∈ L2 (R × R, PX,Y ), the closest function to g
in a closed linear subspace S is a function h ∈ S such that the error h − g is orthogonal to the subspace S.
As we will see, this principle provides a unified framework to characterize the BLS and related constrained
MMSE estimators, such as the linear least squares estimator.
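A minimal finite-dimensional analogue of this principle (an illustration, not from the notes): projecting a vector g onto the column space of a matrix A via least squares produces a residual that is orthogonal to every element of that subspace.

```python
import numpy as np

rng = np.random.default_rng(3)

# The subspace S is the column space of A; g is an arbitrary vector to be
# approximated by an element of S.
A = rng.normal(size=(100, 3))
g = rng.normal(size=100)

# h = arg min_{f in S} ||g - f||^2 via least squares (the projection onto S).
coeffs, *_ = np.linalg.lstsq(A, g, rcond=None)
h = A @ coeffs

# Orthogonality principle: the error g - h is orthogonal to every column of A,
# hence to all of S (all entries printed below are ~0 up to roundoff).
print(A.T @ (g - h))
```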
Now let S be a closed linear subspace of L2(R × R, PX,Y) whose elements serve as a class of admissible estimators, and consider the constrained MMSE problem of minimizing E[(X − X̂(Y))²] over X̂ ∈ S, where each X̂ ∈ S is a function of Y only, and the definition of the subspace S imposes further constraints on the estimators. The solution to this constrained MMSE problem is the unique estimator

X̂S = arg min_{X̂∈S} E[(X − X̂(Y))²] .   (6)

Applying Theorem 2 with g(X, Y) = X shows that X̂S is the unique element of S satisfying

∀f ∈ S, E[(X̂S(Y) − X)f(Y)] = 0 .
Intuitively, the optimal estimator X̂S is characterized by the property that its error X̂S(Y) − X is orthogonal to (or uncorrelated with) every function of Y in S. Equivalently, X̂S(Y) is the projection of X onto the subspace
S. Let us use this key idea to establish orthogonality characterizations of BLS and related estimators as
corollaries of Theorem 2.
Recall that the BLS estimator is defined as the solution to the unconstrained MSE minimization in (3), i.e.,

X̂BLS = arg min_{X̂:R→R} E[(X − X̂(Y))²] .   (8)

Since this MSE minimization is over all estimators or functions of Y, consider the sub-Hilbert space

S = L2(R, PY) ≜ { f : R → R : E[f(Y)²] < +∞ } ,   (9)
which contains all real-valued functions f (y) that only depend on Y and have finite second moment. It is
straightforward to verify that S is a closed subspace of L2 (R × R, PX,Y ). (For example, linear combinations
of functions of Y are also functions of Y .) Then, applying Theorem 2 yields the following orthogonality
characterization of BLS estimators, cf. [1, Section 6.3.2].
Proposition 2 (Orthogonality Characterization of BLS Estimator). An estimator X̂ : R → R satisfies the
orthogonality property:

∀f ∈ L2(R, PY), E[(X̂(Y) − X)f(Y)] = 0 ,
if and only if it is the BLS estimator X̂(y) = X̂BLS (y) = E[X|Y = y] for all y ∈ R.
Proof. As before, let g(X, Y ) = X be the function we seek to estimate. Then, Theorem 2 provides the fol-
lowing orthogonality characterization of BLS estimators: An estimator X̂ : R → R satisfies the orthogonality
property:

∀f ∈ L2(R, PY), E[(X̂(Y) − X)f(Y)] = 0 ,   (10)
if and only if it is the BLS estimator, X̂(y) = X̂BLS (y) for all y ∈ R, defined in (8). Furthermore, we can
derive the explicit conditional expectation form of the BLS estimator from this orthogonality characterization.
Indeed, for every f : R → R with finite second moment, we have using (10) and the tower property that

E[X̂BLS(Y)f(Y)] = E[Xf(Y)] = E[E[X|Y] f(Y)] ,

and choosing f(Y) = X̂BLS(Y) − E[X|Y] then gives E[(X̂BLS(Y) − E[X|Y])²] = 0, so that X̂BLS(Y) = E[X|Y] almost surely.
Intuitively, Proposition 2 conveys that the BLS estimator is characterized by having an error X̂BLS (Y )−X
that is orthogonal to the subspace of all functions of Y . Equivalently, the BLS estimator is the projection
of X onto the subspace of all functions of Y .
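The orthogonality property in Proposition 2 can be spot-checked numerically (a sketch under an assumed Gaussian model where E[X|Y] = 0.8 Y is known in closed form; the test functions are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(4)

# Assumed Gaussian model (illustration only): X ~ N(0, 1), Y = X + N with
# N ~ N(0, 0.25), so the BLS estimator is E[X|Y] = 0.8 * Y.
n = 500_000
x = rng.normal(0.0, 1.0, n)
y = x + rng.normal(0.0, 0.5, n)
error = 0.8 * y - x  # X_BLS(Y) - X

# E[(X_BLS(Y) - X) f(Y)] should be ~0 for every square-integrable f(Y).
for f in (y, y ** 2, np.sin(y), np.exp(-(y ** 2))):
    print(np.mean(error * f))
```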
Next, fix the sub-Hilbert space

S = { f : R → R : f(y) = ay + b for some a, b ∈ R }   (11)

of all affine functions f(y) that only depend on Y. It is straightforward to verify that S is a closed subspace of L2(R × R, PX,Y) using the fact that Y has finite second moment E[Y²] < +∞. Fixing S to be our class
of admissible estimators, we define the linear least squares (LLS) estimation problem as:
min_{X̂∈S} E[(X − X̂(Y))²] = min_{a,b∈R} E[(X − aY − b)²] ,   (12)

whose solution is the LLS estimator

X̂LLS = arg min_{X̂∈S} E[(X − X̂(Y))²] .   (13)
Note that we are essentially performing MMSE estimation over a smaller, more constrained, subspace here.
Then, applying Theorem 2 yields the following orthogonality characterization of LLS estimators, cf. [1,
Section 7].
Proposition 3 (Orthogonality Characterization of LLS Estimator). An affine estimator X̂ ∈ S satisfies the orthogonality property

∀f ∈ S, E[(X̂(Y) − X)f(Y)] = 0 ,

if and only if it is the LLS estimator, which is given by

∀y ∈ R, X̂LLS(y) = E[X] + (cov(X, Y)/var(Y)) (y − E[Y]) ,
where cov(X, Y ) = E[(X − E[X])(Y − E[Y ])] is the covariance of X and Y , and var(Y ) = cov(Y, Y ) is the
variance of Y . Moreover, the LLS estimator achieves an MMSE risk of
E[(X − X̂LLS(Y))²] = var(X) − cov(X, Y)²/var(Y) .
Proof. As before, let g(X, Y ) = X be the function we seek to estimate. Then, Theorem 2 provides the
following orthogonality characterization of LLS estimators: An affine estimator X̂ : R → R satisfies the
orthogonality property:

∀f ∈ S, E[(X̂(Y) − X)f(Y)] = E[(X̂(Y) − X)(aY + b)] = 0 ,   (14)
where a, b ∈ R are the parameters that define f (y) = ay + b for y ∈ R, if and only if it is the LLS estimator,
X̂(y) = X̂LLS (y) for all y ∈ R, defined in (13). It remains to prove the explicit form of the LLS estimator
from this orthogonality characterization. (Note that the explicit form could also be derived through direct
optimization of (12).) Let X̂LLS(y) = cy + d for some c, d ∈ R. Then, letting (a, b) = (0, 1) and (a, b) = (1, 0) in (14), we respectively get

cE[Y] + d = E[X] ,
cE[Y²] + dE[Y] = E[XY] .
Solving these equations produces c = cov(X, Y )/var(Y ) and d = E[X] − (cov(X, Y )/var(Y ))E[Y ], which
establish the form of X̂LLS in the proposition statement. Finally, the MMSE risk of X̂LLS follows from direct
calculation:

E[(X − X̂LLS(Y))²] = E[((X − E[X]) − (cov(X, Y)/var(Y)) (Y − E[Y]))²]
                  = var(X) + cov(X, Y)²/var(Y) − 2 cov(X, Y)²/var(Y)
                  = var(X) − cov(X, Y)²/var(Y) .
Intuitively, Proposition 3 conveys that the LLS estimator is characterized by having an error X̂LLS (Y )−X
that is orthogonal to the subspace of all affine (or linear) functions of Y . Equivalently, the LLS estimator is the
projection of X onto the subspace of all affine functions of Y . Finally, we mention two more remarks. Firstly,
the LLS estimator can be computed using only first and second order moments of the joint distribution PX,Y ;
these quantities can often be much easier to obtain than the full joint distribution PX,Y itself. Secondly,
when X, Y are jointly Gaussian random variables, the BLS and LLS estimators coincide; we leave this as an
exercise for the reader.
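To illustrate these two closing remarks, here is a sketch under an assumed non-Gaussian model in which the BLS estimator is nonlinear: the LLS estimator is built from sample first- and second-order moments only, and its MSE is compared against that of the conditional mean (in a jointly Gaussian model the two would coincide).

```python
import numpy as np

rng = np.random.default_rng(5)

# Assumed non-Gaussian example (illustration only): X ~ N(0, 1) and Y = X^3,
# so the BLS estimator is the nonlinear map E[X|Y] = Y^(1/3), while the LLS
# estimator must remain affine in Y.
n = 500_000
x = rng.normal(0.0, 1.0, n)
y = x ** 3

# The LLS estimator needs only first- and second-order moments of (X, Y).
C = np.cov(x, y)                 # sample covariance matrix of (X, Y)
a = C[0, 1] / C[1, 1]            # cov(X, Y) / var(Y)
b = np.mean(x) - a * np.mean(y)  # E[X] - a * E[Y]

mse = lambda x_hat: np.mean((x - x_hat) ** 2)
print("LLS MSE:", mse(a * y + b))   # strictly positive (best affine fit only)
print("BLS MSE:", mse(np.cbrt(y)))  # ~0 here, since E[X|Y] = Y^(1/3) exactly
```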
References
[1] G. W. Wornell, “Inference and information,” May 2017, Department of Electrical Engineering and Com-
puter Science, MIT, Cambridge, MA, USA, Lecture Notes 6.437.
[2] E. M. Stein and R. Shakarchi, Real Analysis: Measure Theory, Integration, and Hilbert Spaces, ser.
Princeton Lectures in Analysis. Princeton, NJ, USA: Princeton University Press, 2005, vol. 3.