
Lectures 18-21

4 Criteria for estimators


[CB7.3, BD3.4]

Definition 1 The bias of an estimate T(X) of a parameter q(θ) in a model
(non-empty set of pdfs/pmfs) P = {P_θ : θ ∈ Θ} is defined as Bias_θ(T) = E_θ(T(X)) − q(θ).
An estimate such that Bias_θ(T) = 0 for all θ is called unbiased. Any function q(θ)
for which an unbiased estimate T exists is called an estimable function.
This notion has intuitive appeal, ruling out, for instance, estimates that ignore
the data, such as T(X) = q(θ_0), which cannot be beaten at θ = θ_0 but can obviously
be arbitrarily bad elsewhere.
Eg: In the normal model, X̄ and S² are unbiased for µ and σ². However,
note that S is not an unbiased estimate of σ. Eg: (Unbiased estimates may be
absurd) Let X ∼ Poisson(λ) and let q(λ) = e^{−2λ}. Consider T(X) = (−1)^X
as an estimate. It is unbiased, but since T alternates between −1 and 1 while
q(λ) > 0, it is not a good estimate.
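Unbiasedness follows from E[(−1)^X] = Σ_{k≥0} (−1)^k e^{−λ} λ^k/k! = e^{−λ} e^{−λ} = e^{−2λ}.
A quick Monte Carlo check (a minimal sketch; the value of λ and the sample size are
arbitrary choices, not from the notes) illustrates this, even though every single value
of T is ±1:

    import numpy as np

    rng = np.random.default_rng(0)
    lam = 1.3                                   # arbitrary rate (assumption)
    x = rng.poisson(lam, size=10**6)            # large Poisson sample
    t = np.where(x % 2 == 0, 1.0, -1.0)         # T(X) = (-1)^X, the "absurd" unbiased estimate
    print(t.mean(), np.exp(-2 * lam))           # both numbers are close to e^{-2*lam}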
Eg: (Unbiased Estimates in Survey Sampling) Suppose we wish to sample from
a finite population, for instance, a census unit, to determine the average value of
a variable (say) monthly family income during a time between two censuses and
suppose that we have available a list of families in the unit with family incomes
at the last census. Write xl , · · · , xN for the unknown current family incomes
and correspondingly u1 , · · · , uN for the known last census incomes. We ignore
difficulties such as families moving. We let Xl , · · · , Xn denote the incomes of a
sample of n families
PN drawn at random without replacement. The parameter of
interest is N1 i=1 xi . The model is
 −1
N
P (X1 = a1 , · · · Xn = an ) = if {a1 , · · · , an } ⊆ {x1 , · · · , xn }
n
2 (n−1) PN
Ex: X̄ is unbiased and has variance σn (1− (N −1) ) where σ = N
2 1 2
i=1 (xi − x̄) .
This method of sampling does not use the information contained in u_1, · · · , u_N.
One way to use it, reflecting the probable correlation between (u_1, · · · , u_N)
and (x_1, · · · , x_N), is to estimate by a regression estimate

X̄_R = X̄ − b(Ū − ū)

where Ū is the mean last-census income of the sampled families and ū = (1/N) Σ_{i=1}^N u_i.
Ex: For each fixed b this is unbiased.
Ex: If the correlation between U_i and X_i is positive (in the population) and
0 < b < 2Cov(Ū, X̄)/Var(Ū), this has smaller variance than X̄.
Ex: The optimal choice of b is Cov(Ū, X̄)/Var(Ū). This value is unknown and
can be estimated by

b_opt = [ (1/n) Σ_{i=1}^n (X_i − X̄)(U_i − Ū) ] / [ (1/N) Σ_{i=1}^N (u_i − ū)² ]

Ex: This estimator (with b replaced by its estimate) is biased.
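A small simulation (a sketch only; the synthetic population below is an illustrative
assumption, not from the notes) shows the regression estimate with estimated b having a
slight bias but a smaller variance than X̄ when the u's and x's are positively correlated:

    import numpy as np

    rng = np.random.default_rng(1)
    N, n = 1000, 50
    u = rng.gamma(shape=4.0, scale=500.0, size=N)      # known last-census incomes (synthetic)
    x = 1.1 * u + rng.normal(0.0, 300.0, size=N)       # unknown current incomes, correlated with u
    u_bar = u.mean()

    est_plain, est_reg = [], []
    for _ in range(5000):
        idx = rng.choice(N, size=n, replace=False)     # simple random sample without replacement
        X, U = x[idx], u[idx]
        b = (np.sum((X - X.mean()) * (U - U.mean())) / n) / (np.sum((u - u_bar) ** 2) / N)
        est_plain.append(X.mean())
        est_reg.append(X.mean() - b * (U.mean() - u_bar))

    print(x.mean())                                    # population mean (the target)
    print(np.mean(est_plain), np.var(est_plain))       # X-bar: unbiased, larger variance
    print(np.mean(est_reg), np.var(est_reg))           # regression estimate: slight bias, smaller variance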

4.1 Uniform Minimum Variance Unbiased (UMVU)


Note: If there exist two unbiased estimates T_1 and T_2 of θ, then any estimate of
the form αT_1 + (1 − α)T_2 for 0 ≤ α ≤ 1 will also be an unbiased estimate of θ.
Which one should we choose?
For unbiased estimates, mean squared error and variance coincide.

Definition 2 An unbiased estimate T*(X) of q(θ) that has minimum MSE
among all unbiased estimates for all θ is called UMVU (uniformly minimum
variance unbiased). If this happens only at a single parameter value θ_0, then it is
locally minimum variance unbiased.

Theorem 1 Let U be the class of all unbiased estimates T of θ ∈ Θ with
E_θ(T²) < ∞ for all θ, and suppose that U is non-empty. Let U_0 be the set of all
unbiased estimates of 0, i.e., U_0 = {ν : E_θ(ν) = 0, E_θ(ν²) < ∞ for all θ ∈ Θ}. Then
T_0 ∈ U is UMVUE iff E_θ(νT_0) = 0 for all θ ∈ Θ and all ν ∈ U_0.

Eg: Let X be unif(θ, θ + 1). Then T = X − 1/2 is unbiased for θ. An unbiased
estimator ν(X) of zero has to satisfy ∫_θ^{θ+1} ν(x) dx = 0 for all θ. One such
function is ν(x) = sin(2πx).

Cov(X − 1/2, sin(2πX)) = −cos(2πθ)/(2π).

This is non-zero, so T is not UMVU.
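This covariance can be verified numerically (a sketch; the value of θ is an arbitrary
choice; since the density is 1 on (θ, θ+1), expectations are plain integrals over that
interval):

    import numpy as np
    from scipy.integrate import quad

    theta = 0.7                                                         # arbitrary value (assumption)
    mean_T = quad(lambda x: x - 0.5, theta, theta + 1)[0]               # E(X - 1/2) = theta
    mean_nu = quad(lambda x: np.sin(2 * np.pi * x), theta, theta + 1)[0]  # E(nu(X)) = 0
    cov = quad(lambda x: (x - 0.5) * np.sin(2 * np.pi * x),
               theta, theta + 1)[0] - mean_T * mean_nu
    print(cov, -np.cos(2 * np.pi * theta) / (2 * np.pi))                # the two numbers agree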


Eg: X_1, · · · , X_n iid unif(0, θ). Here Y = (n + 1)T/n is unbiased, with T = X_(n)
the sample maximum. Note that T is a sufficient statistic. We need to check that Y is
uncorrelated with all unbiased estimators of zero. Suppose W is an unbiased
estimator of zero with cov(W, Y) > 0. Then cov(E(W|Y), Y) = E(Y E(W|Y)) −
E(W)E(Y) = E(E(WY|Y)) − E(W)E(Y) = cov(W, Y) > 0. So, wlog, W can be consid-
ered a function of Y, equivalently a function of T. But T is complete sufficient,
implying W = 0 (a.s.). Since Y is then uncorrelated with every such W, Y is UMVU.
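A simulation (a sketch; θ, n and the replication count are arbitrary choices) comparing Y
with another unbiased estimate, 2X̄, which ignores the sufficient statistic, illustrates
the variance reduction:

    import numpy as np

    rng = np.random.default_rng(2)
    theta, n, reps = 3.0, 10, 100_000        # arbitrary choices (assumption)
    X = rng.uniform(0.0, theta, size=(reps, n))
    Y = (n + 1) / n * X.max(axis=1)          # UMVU estimate based on the sufficient statistic
    Z = 2 * X.mean(axis=1)                   # another unbiased estimate, ignoring sufficiency
    print(Y.mean(), Z.mean())                # both are close to theta
    print(Y.var(), Z.var())                  # Y has much smaller variance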
Theorem 2 Let U be the non–empty class of unbiased estimates of θ ∈ Θ as
defined in Theorem 1. Then there exists at most one UMVUE T ∈ U for θ.

Theorem 3 (Rao-Blackwell) Let W be any unbiased estimator of τ(θ) and T
be a sufficient statistic for θ. Define φ(T) = E(W | T). Then φ(T) is an
estimator with E(φ(T)) = τ(θ) and Var(φ(T)) ≤ Var(W).

Pf: see CB, p. 342.
This process of conditioning an unbiased estimator on a sufficient statistic is
called Rao-Blackwellization and leads to another unbiased estimator whose variance
is uniformly no larger. In other words, it is enough to consider the class of
unbiased estimators that are functions of sufficient statistics, as any other unbiased
estimator has variance at least as large as that of one of them (the corresponding
conditional expectation).

Eg: Suppose that X_1, · · · , X_n are iid with density λ exp(−λx). Suppose that
we want an estimate of θ = exp(−10λ). This corresponds to the probability
P[X_i > 10]. The maximum likelihood estimate of λ is 1/X̄, so we could certainly
claim T = exp(−10/X̄) is the MLE of θ. This, however, is not unbiased.
Use the statistic u(X) = I(X_1 > 10). This statistic takes only the values 0 and 1,
and it depends only on the first observation, so it is certainly a poor estimate. It
is, however, unbiased.
The Rao-Blackwell theorem says that we can get a better unbiased estimate by
using u*(X) = E[u(X)|V], where V = Σ_i X_i is a sufficient statistic.
The conditional distribution of X_1/V given V is Beta(1, n − 1).
u*(X) = P[X_1 > 10 | V] = P(Beta(1, n − 1) > 10/V) = (1 − 10/V)^{n−1} for V > 10, and 0 otherwise.
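A simulation (a sketch; λ and n are arbitrary choices, not from the notes) confirms that
both u and u* are unbiased for exp(−10λ) and that Rao-Blackwellization sharply reduces
the variance:

    import numpy as np

    rng = np.random.default_rng(3)
    lam, n, reps = 0.05, 20, 200_000                 # arbitrary rate and sample size (assumption)
    X = rng.exponential(scale=1.0 / lam, size=(reps, n))
    u = (X[:, 0] > 10).astype(float)                 # crude unbiased estimate I(X1 > 10)
    V = X.sum(axis=1)                                # sufficient statistic
    u_star = np.where(V > 10, (1 - 10 / V) ** (n - 1), 0.0)   # Rao-Blackwellized estimate
    print(np.exp(-10 * lam))                         # the target theta = exp(-10*lam)
    print(u.mean(), u.var())                         # unbiased, high variance
    print(u_star.mean(), u_star.var())               # unbiased, much lower variance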
Eg (conditioning on an insufficient statistic): X_1, X_2 iid N(θ, 1). Then X̄ is
unbiased for θ. Let φ(X_1) = E(X̄|X_1) = (X_1 + θ)/2. This is unbiased and has lower
variance, but it is not an estimator (it depends on θ).

4.2 Mean Squared Error


Definition 3 The Mean Squared Error (MSE) of an estimator W of a param-
eter θ is the function of θ defined by Eθ (W − θ)2 .

Alternatively, mean absolute error or the expectation of any other increasing
function of |W − θ| can be used as a measure of performance of an estimator.
The advantage of MSE is its easy tractability and the interpretation
MSE = Var + Bias². (prove). For an unbiased estimator MSE = Var. But a biased
estimator may have lower MSE, in which case it is often preferred.
In the iid normal case, (n − 1)S²/σ² ∼ χ²_{n−1}. Here E(S²) = σ²,
Var[(n − 1)S²/σ²] = 2(n − 1), so Var(S²) = 2σ⁴/(n − 1) = MSE. Now consider
σ̂²_MLE = (n − 1)S²/n. Its bias is −σ²/n, its variance is 2(n − 1)σ⁴/n², and its
MSE is (2n − 1)σ⁴/n². This is smaller than the MSE of the unbiased estimator S².
Thus, by trading variance for bias, the MSE is improved.
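A simulation (a sketch; σ, n and the replication count are arbitrary choices) reproduces
both MSE formulas:

    import numpy as np

    rng = np.random.default_rng(4)
    n, sigma, reps = 10, 2.0, 200_000                  # arbitrary choices (assumption)
    X = rng.normal(0.0, sigma, size=(reps, n))
    S2 = X.var(axis=1, ddof=1)                         # unbiased sample variance S^2
    S2_mle = X.var(axis=1, ddof=0)                     # MLE: (n-1)*S^2/n
    print(np.mean((S2 - sigma**2) ** 2), 2 * sigma**4 / (n - 1))              # MSE of S^2
    print(np.mean((S2_mle - sigma**2) ** 2), (2 * n - 1) * sigma**4 / n**2)   # MSE of MLE, smaller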
Eg: Let X_1, · · · , X_n be iid Ber(p). The MLE of p is X̄, with MSE = Var = p(1 − p)/n.
Consider the Bayes estimator with a Beta(α, β) prior. The estimator equals
p̂_B = (Σ_i X_i + α)/(n + α + β). Taking α = β = √n/2 makes MSE(p̂_B) constant
as a function of p. With this prior, for small n, p̂_B has lower MSE than X̄
unless p is close to zero or one; for large n, X̄ has lower MSE than p̂_B unless
p is close to one half.
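Both claims follow by comparing MSE(X̄) = p(1 − p)/n with the constant
MSE(p̂_B) = n/(4(n + √n)²). A short check (a sketch; the two sample sizes are arbitrary
choices) locates the region of p where the Bayes estimate wins:

    import numpy as np

    def mse_xbar(p, n):
        return p * (1 - p) / n                       # MSE of the MLE

    def mse_bayes(n):
        return n / (4 * (n + np.sqrt(n)) ** 2)       # constant MSE of p_B with alpha = beta = sqrt(n)/2

    p = np.linspace(0.0, 1.0, 101)
    for n in (4, 400):                               # a small and a large sample size (assumption)
        better = p[mse_bayes(n) < mse_xbar(p, n)]    # p-values where the Bayes estimate wins
        print(n, better.min(), better.max())         # the winning region shrinks toward 1/2 as n grows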

4.3 Information Inequality
Assumptions I. The set A = {x : p(x, θ) > 0} does not depend on θ. For all
x ∈ A, θ ∈ Θ, (∂/∂θ) log p(x, θ) exists and is finite.
II. If T is any statistic such that E(|T|) < ∞ for all θ ∈ Θ, then the operations of
integration and differentiation can be interchanged in (∂/∂θ) ∫ T(x) p(x, θ) dx.

Theorem 4 If p(x, θ) = h(x) exp{η(θ)T(x) − B(θ)} is an exponential family
and η(θ) has a nonvanishing continuous derivative on Θ, then I and II hold.
The Fisher information is defined as I(θ) = E_θ[((∂/∂θ) log p(X, θ))²].

Theorem 5 Suppose that I and II hold and that E_θ|(∂/∂θ) log p(X, θ)| < ∞. Then
E_θ[(∂/∂θ) log p(X, θ)] = 0 and I(θ) = Var_θ[(∂/∂θ) log p(X, θ)].
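For instance (a sketch; the Poisson model and the value of λ are illustrative choices, not
from the notes), for X ∼ Poisson(λ) the score is (∂/∂λ) log p(X, λ) = X/λ − 1, which has
mean 0 and variance 1/λ = I(λ):

    import numpy as np

    rng = np.random.default_rng(5)
    lam = 2.5                                    # arbitrary parameter (assumption)
    x = rng.poisson(lam, size=10**6)
    score = x / lam - 1.0                        # d/d(lam) log p(x, lam) for the Poisson pmf
    print(score.mean())                          # close to 0
    print(score.var(), 1.0 / lam)                # both close to I(lam) = 1/lam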

Theorem 6 (Information Inequality / Cramer-Rao Lower Bound) Let T(X) be
any statistic such that Var(T(X)) < ∞ for all θ. Denote E(T(X)) by ψ(θ). Sup-
pose that I and II hold and 0 < I(θ) < ∞. Then for all θ, ψ(θ) is differentiable
and

Var(T(X)) ≥ (ψ′(θ))² / I(θ)
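As an illustration (a sketch; the Poisson model and the numbers are arbitrary choices),
for an iid Poisson(λ) sample and T = X̄ we have ψ(λ) = λ, the information in n
observations is n/λ, and Var(X̄) = λ/n attains the bound:

    import numpy as np

    rng = np.random.default_rng(6)
    lam, n, reps = 2.5, 30, 100_000              # arbitrary choices (assumption)
    X = rng.poisson(lam, size=(reps, n))
    xbar = X.mean(axis=1)                        # unbiased estimate of psi(lam) = lam
    crlb = lam / n                               # (psi'(lam))^2 / I_n(lam) = 1 / (n / lam)
    print(xbar.var(), crlb)                      # the variance attains the Cramer-Rao bound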
