
4 Comparison of Estimators

4.1 Optimality theory

(1) Mean Squared Error (MSE)

R(θ, T) = E(T(X) − θ)²

(2) Root Mean Squared Error (RMSE): √R(θ, T)

A related criterion is the mean absolute error E|T(X) − θ|; if T ∼ N(θ, σ²),

E|T(X) − θ| = √(2/π) √R(θ, T).

Relationship between MSE and bias and variance of an estimator:

R(θ, T) = var(T(X)) + b²(θ, T),

where b(θ, T ) = E(T (X)) − θ.

Ex. (X1, . . . , Xn) ∼ N(µ, σ²)

µ̂ = X̄,  σ̂² = (1/n) Σ_{i=1}^n (Xi − X̄)²

R(θ, X̄) = var(X̄) = σ²/n

Since nσ̂²/σ² ∼ χ²_{n−1}, which has mean n − 1 and variance 2(n − 1),

b(θ, σ̂²) = (σ²/n) E(nσ̂²/σ²) − σ²
          = (n − 1)σ²/n − σ²
          = −σ²/n

R(θ, σ̂²) = (σ⁴/n²) var(nσ̂²/σ²) + σ⁴/n²
          = (σ⁴/n²)[2(n − 1) + 1]
          = σ⁴(2n − 1)/n².  □
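These risk formulas are easy to verify by Monte Carlo. Below is a minimal sketch (assuming NumPy is available; the values of µ, σ, and n are arbitrary illustrations, not taken from the notes):

```python
import numpy as np

# Monte Carlo check of R(theta, Xbar) = sigma^2/n and R(theta, sigma2_hat) = sigma^4 (2n-1)/n^2.
rng = np.random.default_rng(0)
mu, sigma, n, reps = 2.0, 3.0, 10, 200_000   # illustrative choices

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
sigma2_hat = ((x - xbar[:, None]) ** 2).mean(axis=1)   # divides by n, not n - 1

print(np.mean((xbar - mu) ** 2), sigma**2 / n)              # MSE of Xbar vs sigma^2/n
print(np.mean((sigma2_hat - sigma**2) ** 2),
      sigma**4 * (2 * n - 1) / n**2)                        # MSE of sigma2_hat vs the formula
```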

Ex. Consider aX̄, 0 < a < 1.

R(θ, aX̄) = a²σ²/n + (a − 1)²µ²

Thus aX̄ has smaller risk than X̄ when µ ≈ 0, and larger risk when |µ| is large.  □
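To see the trade-off concretely, here is a small sketch (NumPy assumed; a, σ², and n are illustrative choices) that evaluates both exact risk functions on a grid of µ:

```python
import numpy as np

# Exact risks: R(theta, a*Xbar) = a^2 sigma^2/n + (a-1)^2 mu^2 and R(theta, Xbar) = sigma^2/n.
a, sigma2, n = 0.8, 1.0, 20                   # illustrative choices
mu_grid = np.linspace(-1.0, 1.0, 9)

risk_shrunk = a**2 * sigma2 / n + (a - 1) ** 2 * mu_grid**2
risk_xbar = sigma2 / n

for mu, r in zip(mu_grid, risk_shrunk):
    print(f"mu = {mu:+.2f}   R(aXbar) = {r:.4f}   R(Xbar) = {risk_xbar:.4f}")
# a*Xbar wins near mu = 0 and loses for large |mu|: neither estimator dominates the other.
```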

Inadmissible: we say the estimator S is inadmissible if there exists another estimator T such that

R(θ, T) ≤ R(θ, S), ∀θ,

with strict inequality for some θ.

* In general, no estimator improves upon all others in all respects.

* UMVUE (uniformly minimum variance unbiased estimate): an unbiased T such that

R(θ, T) = var(T) ≤ var(S) = R(θ, S), ∀θ and every unbiased S.

• Difficulty :

– (1) unbiased estimates may not exist.

– (2) even when UMVUEs exist, they may be inadmissible.

– (3) unbiasedness is not invariant under functional transformation.

Other criteria for comparing estimators:

* Bayes risk:

∫ R(θ, T) π(θ) dθ

* Worst-case risk:

max_θ R(θ, T)

the estimator with minimum worst-case risk is called minimax.
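To make the two criteria concrete, here is a sketch (NumPy assumed) that revisits the aX̄ example above, putting an arbitrary N(0, 1) prior on µ and approximating the Bayes risk by a Riemann sum and the worst-case risk by a maximum over a grid:

```python
import numpy as np

# Risk of a*Xbar for N(mu, sigma^2) data: R(mu, a*Xbar) = a^2 sigma^2/n + (a-1)^2 mu^2.
sigma2, n = 1.0, 20                                        # illustrative choices
risk = lambda mu, a: a**2 * sigma2 / n + (a - 1) ** 2 * mu**2

mu_grid = np.linspace(-5.0, 5.0, 2001)
dmu = mu_grid[1] - mu_grid[0]
prior = np.exp(-mu_grid**2 / 2) / np.sqrt(2 * np.pi)       # N(0, 1) prior density on mu

for a in (0.7, 0.9, 1.0):
    bayes = np.sum(risk(mu_grid, a) * prior) * dmu         # ~ integral of R(mu, a*Xbar) pi(mu) dmu
    worst = risk(mu_grid, a).max()                         # max over the mu grid only
    print(f"a = {a:.1f}   Bayes risk ~ {bayes:.4f}   worst-case risk ~ {worst:.3f}")
# Shrinkage (a < 1) can lower the Bayes risk, but its worst-case risk grows with |mu|;
# among these three choices, a = 1 (i.e., Xbar itself) is the minimax one.
```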

4.2 UMVUEs

Ex. Consider two estimators of the standard deviation σ when the data are ∼ N(µ, σ²):

σ̂ = √[ (1/n) Σi (Xi − X̄)² ]

σ̃ = (1/n) Σi |Xi − X̄|

• aσ̂ is UMVUE with

a = [ √(2/n) Γ(n/2) / Γ((n − 1)/2) ]⁻¹

• cσ̃ is unbiased (but NOT UMVUE) with

c = [ √( 2(n − 1)/(πn) ) ]⁻¹.  □
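A quick Monte Carlo check (NumPy and SciPy assumed; µ, σ, and n are illustrative) shows that both rescaled estimators are essentially unbiased for σ, with aσ̂ having the smaller variance, as the UMVUE property predicts:

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(1)
mu, sigma, n, reps = 0.0, 2.0, 8, 200_000                   # illustrative choices

x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1, keepdims=True)
sig_hat = np.sqrt(((x - xbar) ** 2).mean(axis=1))           # sqrt[(1/n) sum (Xi - Xbar)^2]
sig_til = np.abs(x - xbar).mean(axis=1)                     # (1/n) sum |Xi - Xbar|

# Unbiasing constants a and c from the text (log-gamma used for numerical stability).
a = 1.0 / (np.sqrt(2.0 / n) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2)))
c = 1.0 / np.sqrt(2.0 * (n - 1) / (np.pi * n))

for name, est in [("a*sigma_hat", a * sig_hat), ("c*sigma_tilde", c * sig_til)]:
    print(name, " mean ~", round(float(est.mean()), 4), " var ~", round(float(est.var()), 5))
# Both means are close to sigma = 2; a*sigma_hat has the smaller variance.
```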

Theorem. (Rao-Blackwell) Suppose T(X) is sufficient for θ and Eθ(|S(X)|) < ∞. Then the estimator defined by

T*(X) = E(S(X)|T(X))

has the nice property

Eθ(T*(X) − q(θ))² ≤ Eθ(S(X) − q(θ))², ∀θ.

If varθ(S(X)) < ∞, the inequality is strict unless T*(X) = S(X) with probability one.

p.f. Since

b(θ, T*(X)) = b(θ, S(X)),

it suffices to show that

var[E(S(X)|T(X))] ≤ var[S(X)],

with equality iff E(S(X)|T(X)) = S(X); this is a known result (Chapter 1).  □

Complete: A statistic T is complete iff the only function g, defined on the range of T, satisfying

Eθ[g(T)] = 0, ∀θ,

is g ≡ 0.
Ex. (X1, . . . , Xn) ∼ Poisson(θ) ⇒ T = Σi Xi is sufficient for θ and T ∼ Poisson(nθ):

E(g(T)) = e^{−nθ} Σ_{i=0}^{∞} g(i)(nθ)^i / i! = 0, ∀θ > 0

⇔ g(i) = 0, ∀i = 0, 1, 2, . . .

(a power series that vanishes for all θ > 0 must have all coefficients equal to zero). So T is complete.  □

Theorem. (Lehmann-Scheffé) If T (X) is a complete sufficient statistic and S(X) is an

unbiased estimate of q(θ), then T ∗ = E[S(X)|T (X)] is UMVUE of q(θ). If varθ (T∗ (X)) < ∞,

T ∗ is the unique UMVUE.

p.f. Since b(θ, T*) = b(θ, S), T* is also unbiased. By Rao-Blackwell,

var(T*) ≤ var(S)

(strictly smaller unless S equals T*). We need to show that T* does not depend on which unbiased estimate S we start with. Suppose

T1* = g1(T) and T2* = g2(T)

are both unbiased and obtained by Rao-Blackwell. Then

E(g1(T) − g2(T)) = 0.

By completeness g1 = g2, so T* = E(S|T) cannot depend on S, and uniqueness follows from the Rao-Blackwell theorem.  □

How to find UMVUE: first find a complete sufficient statistic T , then either

(1) find h(T (X)) such that h(T ) is unbiased, or

(2) find an unbiased S, then E(S|T ) is UMVUE.

Ex. hypergeometric ⇒ X is sufficient and complete:

Eθ(g(X)) = Σ_{k=0}^{n} g(k) (Nθ choose k)(N − Nθ choose n − k) / (N choose n) = 0, ∀θ = 0, 1/N, . . . , 1.

When θ = 0,

E0(g(X)) = g(0) = 0 ⇒ g(0) = 0.

When θ = 1/N,

Eθ(g(X)) = [(1 choose 0)(N − 1 choose n) / (N choose n)] g(0) + [(1 choose 1)(N − 1 choose n − 1) / (N choose n)] g(1)

⇒ g(1) = 0.

So, by induction,

g(0) = g(1) = · · · = g(n) = 0,

hence g ≡ 0. Since Eθ(X/n) = θ, X/n is UMVUE.  □

Theorem. Suppose {Pθ} belongs to a k-parameter exponential family, and suppose the range of C(θ) = (C1(θ), · · · , Ck(θ)) has a nonempty interior. Then

T(X) = (T1(X), . . . , Tk(X))

is complete and sufficient.

Ex. (X1, . . . , Xn) ∼ N(µ, σ²),

s² = Σi (Xi − X̄)² / (n − 1)

is unbiased for σ² and is a function of the complete sufficient statistic, so it is UMVUE for σ².  □

Ex. (X1, . . . , Xn) ∼ U(0, θ)

p(X, θ) = 1/θ^n if X(n) < θ, X(1) > 0.

Recall that the sufficient statistic for θ is X(n), and the density of X(n) is

fX(n)(t) = n t^(n−1) / θ^n, 0 < t < θ.

Note that, if

E[g(X(n))] = (n/θ^n) ∫₀^θ g(t) t^(n−1) dt = 0, ∀θ > 0,

⇒ g(t) = 0.

So X(n) is also complete. Since

E(X(n)) = (n/θ^n) ∫₀^θ t^n dt = nθ/(n + 1),

we have that ((n + 1)/n) X(n) is UMVUE for θ.  □
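A simulation sketch (NumPy assumed; θ and n are illustrative) confirms that ((n + 1)/n) X(n) is unbiased and compares its variance with that of 2X̄, another unbiased estimator of θ:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, n, reps = 3.0, 10, 200_000              # illustrative choices

x = rng.uniform(0.0, theta, size=(reps, n))
umvue = (n + 1) / n * x.max(axis=1)            # (n+1)/n * X_(n)
mom = 2.0 * x.mean(axis=1)                     # 2 * Xbar, also unbiased for theta

print("UMVUE  : mean ~", round(float(umvue.mean()), 4), " var ~", round(float(umvue.var()), 5))
print("2*Xbar : mean ~", round(float(mom.mean()), 4), " var ~", round(float(mom.var()), 5))
print("theory :", theta**2 / (n * (n + 2)), "vs", theta**2 / (3 * n))
# Both are unbiased; the UMVUE's variance theta^2/(n(n+2)) is far below theta^2/(3n).
```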

Ex. X = (X1, . . . , Xn) ∼ Exp(λ),

f(x) = λe^{−λx}, x > 0.

⇒ T = Σ_{i=1}^n Xi is sufficient (and complete) for λ. We want to estimate the quantity

Pλ(X1 ≤ x) = 1 − e^{−λx}.

Note that I(X1 ≤ x) is an unbiased estimate of 1 − e^{−λx}, and

E[I(X1 ≤ x) | T = t] = P(X1 ≤ x | Σ_{i=1}^n Xi = t)
                     = P( X1/Σi Xi ≤ x/t | Σi Xi = t ).

Since X1 ∼ Γ(1, λ) and Σ_{i=1}^n Xi ∼ Γ(n, λ), the ratio X1/Σi Xi ∼ β(1, n−1) and is independent of Σi Xi, so

E[I(X1 ≤ x) | Σ_{i=1}^n Xi = t] = ∫₀^{x/t} β(1, n−1)(u) du
                                = ∫₀^{x/t} (n − 1)(1 − u)^(n−2) du
                                = 1 − (1 − x/t)^(n−1), if t > x,
                                = 1, otherwise,

which is UMVUE for Pλ(X1 ≤ x).
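The result can be checked by simulation; the sketch below (NumPy assumed; λ, n, and x are illustrative) verifies that the Rao-Blackwellized estimator is unbiased for 1 − e^{−λx} and has smaller variance than the naive indicator I(X1 ≤ x):

```python
import numpy as np

rng = np.random.default_rng(3)
lam, n, x, reps = 0.5, 6, 1.2, 300_000             # illustrative choices

data = rng.exponential(scale=1.0 / lam, size=(reps, n))
t = data.sum(axis=1)

naive = (data[:, 0] <= x).astype(float)                        # I(X1 <= x)
umvue = np.where(t > x, 1.0 - (1.0 - x / t) ** (n - 1), 1.0)   # E[I(X1 <= x) | T]

print("target 1 - exp(-lam*x) :", round(1.0 - np.exp(-lam * x), 4))
print("naive  mean / var      :", round(float(naive.mean()), 4), round(float(naive.var()), 4))
print("UMVUE  mean / var      :", round(float(umvue.mean()), 4), round(float(umvue.var()), 4))
# Both estimators are unbiased; conditioning on T = sum(Xi) reduces the variance.
```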

4.3 Information Theory

* Regularity Assumptions:

(1) The set A = {x : p(x, θ) > 0} (the support) does not depend on θ.

(2) ∀x ∈ A and θ ∈ Θ,

∂/∂θ log p(x, θ)

exists and is finite.

(3) Eθ(|T|) < ∞, and differentiation and integration may be interchanged:

∂/∂θ ∫ T(x) p(x, θ) dx = ∫ T(x) [∂/∂θ p(x, θ)] dx.

* Exponential family satisfies the regularity conditions above.

* Score Function (for a single observation):

S(X, θ) = ∂/∂θ log p(X, θ)

* Properties of the score function (under the regularity assumptions):

(1) Eθ(S(X, θ)) = 0.

p.f.

Eθ[∂/∂θ log p(x, θ)] = ∫ [∂/∂θ log p(x, θ)] p(x, θ) dx
                     = ∫ ∂/∂θ p(x, θ) dx
                     = ∂/∂θ ∫ p(x, θ) dx
                     = ∂/∂θ 1
                     = 0.  □

(2) varθ(S(X, θ)) = −E[∂²/∂θ² log p(x, θ)].

p.f. Since Eθ(S(X, θ)) = 0, varθ(S(X, θ)) = E[S(X, θ)²] = E[(∂/∂θ log p(X, θ))²]. The result follows by noting that

−E[∂²/∂θ² log p(x, θ)]
= −∫ [∂²/∂θ² log p(x, θ)] p(x, θ) dx
= −∫ ∂/∂θ{ [∂/∂θ log p(x, θ)] p(x, θ) } dx + ∫ [∂/∂θ log p(x, θ)] [∂/∂θ p(x, θ)] dx
= 0 + ∫ [∂/∂θ log p(x, θ)]² p(x, θ) dx
= E[(∂/∂θ log p(x, θ))²].  □

(3) covθ(S(X, θ), T(X)) = ψ′(θ), where ψ(θ) = Eθ(T(X)).

p.f. Since Eθ(S(X, θ)) = 0, covθ(S(X, θ), T(X)) = E[S(X, θ) T(X)], and

E[S(X, θ) T(X)] = ∫ [∂/∂θ log p(x, θ)] T(x) p(x, θ) dx
                = ∫ [∂/∂θ p(x, θ)] T(x) dx
                = ∂/∂θ ∫ p(x, θ) T(x) dx
                = ∂/∂θ Eθ(T(X)) = ψ′(θ).  □

* Fisher Information (for a single observation):

I(θ) = E[(∂/∂θ log p(x, θ))²]
     = −E[∂²/∂θ² log p(x, θ)]   (from Property (2))

Note that the Fisher information satisfies 0 ≤ I(θ) ≤ ∞.

Ex. (X1, . . . , Xn) ∼ Poisson(θ)

∂/∂θ log p(x, θ) = (Σ_{i=1}^n xi)/θ − n

I(θ) = E[ (Σ_{i=1}^n xi)/θ − n ]²
     = var[ (Σ_{i=1}^n xi)/θ ]
     = (1/θ²) nθ
     = n/θ.
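A numerical check of I(θ) = n/θ (NumPy assumed; θ and n are illustrative): simulate the score Σxi/θ − n for the whole sample and compare its sample mean and variance with 0 and n/θ.

```python
import numpy as np

rng = np.random.default_rng(4)
theta, n, reps = 2.5, 15, 200_000              # illustrative choices

x = rng.poisson(theta, size=(reps, n))
score = x.sum(axis=1) / theta - n              # d/dtheta log p(x, theta) for the full sample

print("mean of score ~", round(float(score.mean()), 4))                 # should be ~ 0
print("var  of score ~", round(float(score.var()), 4), "  n/theta =", n / theta)
```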

Theorem. (Information Inequality) Let T(X) be any statistic such that varθ(T(X)) < ∞, ∀θ, and let Eθ(T(X)) = ψ(θ) with ψ differentiable. Under the regularity conditions and 0 < I(θ) < ∞:

varθ(T(X)) ≥ [ψ′(θ)]² / I(θ).

p.f. Let S = ∂/∂θ log p(X, θ) (the score). Then var(S) = I(θ), and ψ′(θ) = cov(T, S). The result follows from the Cauchy-Schwarz inequality for covariances, [cov(T, S)]² ≤ var(T) var(S).  □

Corollary. For unbiased T(X):

var(T(X)) ≥ 1/I(θ).

1/I(θ) : the Cramér-Rao lower bound.

Corollary. For unbiased T*, if

var(T*) = 1/I(θ) for all θ,

then T* is UMVUE.

In general we have n observations, and the Fisher information I(θ) is then defined for all n observations. If I1(θ) is the Fisher information for a single observation, then

I(θ) = n I1(θ).

Ex. (X1, . . . , Xn) ∼ N(µ, σ²) with σ² known, θ = µ:

var(X̄) = σ²/n = 1/I(θ) = 1/(n I1(θ)),

where I1(θ) = 1/σ².

Theorem. If {Pθ : θ ∈ Θ} satisfies the regularity conditions and there exists an unbiased estimate T* of ψ(θ) achieving the information bound for every θ, then Pθ is a one-parameter exponential family with density p(x, θ) = exp[c(θ)T*(x) + d(θ) + s(x)] IA(x). Conversely, if Pθ is a one-parameter exponential family and c(θ) has a continuous non-vanishing derivative on Θ, then T(X) achieves the information bound and is a UMVUE of Eθ(T(X)).

4.4 Large Sample Theory

* Consistency: Tn(X1, . . . , Xn) is with high probability close to θ.

Mathematically, Tn is consistent for θ (Tn → θ in probability) iff ∀ε > 0, as n → ∞,

P[ |Tn(X1, . . . , Xn) − θ| ≥ ε ] → 0.

Ex.

m̂j = (1/n) Σ_{i=1}^n Xi^j

is consistent for E(X^j).

Theorem. The method of moments estimator is consistent:

Tn = g(m̂1, . . . , m̂r) → g(m1(θ), . . . , mr(θ)) = q(θ) in probability.

* UMVUEs are consistent; MLEs usually are.

* Asymptotic Normality:

Tn is approximately normally distributed with mean µn(θ) and variance σn²(θ) iff

P(Tn(X) ≤ t) ≈ Φ( (t − µn(θ)) / σn(θ) ),

i.e., ∀z,

lim_{n→∞} P[ (Tn(X1, . . . , Xn) − µn(θ)) / σn(θ) ≤ z ] = Φ(z).

* √n-consistent:

√n (µn(θ) − q(θ)) → 0,

n σn²(θ) → σ²(θ) > 0.

Ex. (X1, . . . , Xn) ∼ Bin(θ)

√n ( q(X̄n) − q(θ) ) →d Z ∼ N(0, [q′(θ)]² θ(1 − θ)).

So we can take

µn = q(θ),  σn² = [q′(θ)]² θ(1 − θ) / n.
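As a concrete check of this delta-method statement, the sketch below (NumPy assumed) takes q(θ) = θ², an arbitrary smooth choice, and compares the simulated variance of √n(q(X̄n) − q(θ)) with the predicted [q′(θ)]²θ(1 − θ):

```python
import numpy as np

rng = np.random.default_rng(5)
theta, n, reps = 0.3, 400, 100_000             # illustrative choices
q = lambda t: t**2                             # q(theta) = theta^2, so q'(theta) = 2*theta

x = rng.binomial(1, theta, size=(reps, n))
xbar = x.mean(axis=1)
z = np.sqrt(n) * (q(xbar) - q(theta))

pred_var = (2 * theta) ** 2 * theta * (1 - theta)
print("simulated mean ~", round(float(z.mean()), 4))       # should be ~ 0
print("simulated var  ~", round(float(z.var()), 4), "  delta-method var =", round(pred_var, 4))
```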

Theorem. (a) Suppose that P = (P1, . . . , Pk) are the population frequencies for k categories. Let Tn = h(P̂1, . . . , P̂k) with P̂i (i = 1, . . . , k) the sample frequencies. Then

√n (Tn − h(P1, . . . , Pk)) →d N(0, σh²),

σh² = Σ_{i=1}^k Pi [∂h(P)/∂Pi]² − [ Σ_{i=1}^k Pi ∂h(P)/∂Pi ]².

(b) Suppose that m = (m1, . . . , mr) are the population moments. Let Tn = g(m̂1, . . . , m̂r) with m̂i (i = 1, . . . , r) the sample moments. Then

√n (Tn − g(m1, . . . , mr)) →d N(0, σg²),

σg² = Σ_{i=2}^{2r} bi mi − [ Σ_{i=1}^r mi ∂g(m)/∂mi ]²
    = var( Σ_{i=1}^r [∂g(m)/∂mi] X^i ),

bi = Σ_{j+k=i, 1≤j,k≤r} [∂g(m)/∂mj] [∂g(m)/∂mk].

Ex. (Hardy-Weinberg Equilibrium) N1 = #(AA), N2 = #(Aa), N3 = #(aa), θ = P(A), P1 = P(AA) = θ², P2 = P(Aa) = 2θ(1 − θ), P3 = P(aa) = (1 − θ)².

T1(X) = √(N1/n)

T2(X) = 1 − √(N3/n)

T3(X) = N1/n + N2/(2n)   (MLE)

h1(P1, P2, P3) = √P1

h2(P1, P2, P3) = 1 − √P3

h3(P1, P2, P3) = P1 + P2/2

With X1, X2, X3 the indicators of the categories AA, Aa, aa for a single observation,

σ1² = [1/(2√P1)]² var(X1) = P1(1 − P1)/(4P1) = (1 − P1)/4 = (1 − θ²)/4

σ2² = [−1/(2√P3)]² var(X3) = P3(1 − P3)/(4P3) = (1 − P3)/4 = (1 − (1 − θ)²)/4

σ3² = 1² var(X1) + (1/2)² var(X2) + 2 · 1 · (1/2) cov(X1, X2)
    = P1(1 − P1) + (1/4) P2(1 − P2) − P1P2
    = θ²(1 − θ²) + (1/4) · 2θ(1 − θ)[1 − 2θ(1 − θ)] − θ² · 2θ(1 − θ)
    = −θ²/2 + θ/2
    = θ(1 − θ)/2.
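These three asymptotic variances can be verified by simulation; the sketch below (NumPy assumed; θ and n are illustrative) draws multinomial counts (N1, N2, N3), forms T1, T2, T3, and compares n · var(Ti) with the formulas above:

```python
import numpy as np

rng = np.random.default_rng(6)
theta, n, reps = 0.3, 2000, 50_000             # illustrative choices
p = [theta**2, 2 * theta * (1 - theta), (1 - theta) ** 2]   # (P1, P2, P3)

counts = rng.multinomial(n, p, size=reps)      # columns: N1, N2, N3
n1, n2, n3 = counts[:, 0], counts[:, 1], counts[:, 2]

t1 = np.sqrt(n1 / n)
t2 = 1.0 - np.sqrt(n3 / n)
t3 = n1 / n + n2 / (2 * n)                     # the MLE

theory = [(1 - theta**2) / 4, (1 - (1 - theta) ** 2) / 4, theta * (1 - theta) / 2]
for name, t, s2 in zip(["T1", "T2", "T3"], [t1, t2, t3], theory):
    print(name, " n*var ~", round(float(n * t.var()), 4), "  theory =", round(s2, 4))
# T3 (the MLE) has the smallest asymptotic variance of the three.
```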
Ex. σ̂² = m̂2 − m̂1².

g(m1, m2) = m2 − m1²

∂g/∂m1 = −2m1,   ∂g/∂m2 = 1

∴ n var(σ̂²) → σg² = var(−2m1 · X + 1 · X²)
                  = var((X − m1)²)
                  = E(X − m1)⁴ − σ⁴
                  = m4 − 4m1m3 + 8m1²m2 − 4m1⁴ − m2².  □

* Asymptotic Relative Efficiency (ARE):

Tn(1) : n σn1² = σ1²

Tn(2) : n σn2² = σ2²

The asymptotic relative efficiency of Tn(1) to Tn(2) is

e(θ, T(1), T(2)) = σ2² / σ1².

Ex. (HWE)

σ1² = (1 − θ²)/4

σ2² = (1 − (1 − θ)²)/4

e(T1, T2) = [1 − (1 − θ)²] / (1 − θ²)

T1 is better when θ > 1/2, and T1 ≈ T2 when θ = 1/2.

Does the asymptotic variance σ²(θ) satisfy

σ²(θ) ≥ [ψ′(θ)]² / I(θ) ?

This generally holds.

Ex. Sample variance when X ∼ Poisson(θ):

σ²(θ) = E(X − θ)⁴ − θ² = θ + 3θ² − θ² = θ(1 + 2θ)

q′(θ) = ∂θ/∂θ = 1

I1(θ) = E[ ∂/∂θ log(θ^X e^{−θ}/X!) ]² = E[X/θ − 1]² = var(X)/θ² = 1/θ

∴ Information bound = θ ≤ θ(1 + 2θ).
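A quick numerical comparison (NumPy assumed; θ and n are illustrative): the sample variance has asymptotic variance θ(1 + 2θ), while X̄ (which also estimates θ for Poisson data) attains the bound θ.

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, reps = 1.5, 200, 100_000             # illustrative choices

x = rng.poisson(theta, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1)                             # sample variance, 1/n convention

print("n*var(sample variance) ~", round(float(n * s2.var()), 3),
      "  theory theta(1+2theta) =", round(theta * (1 + 2 * theta), 3))
print("n*var(Xbar)            ~", round(float(n * xbar.var()), 3),
      "  information bound =", theta)
# Xbar attains the bound; the sample variance does not.
```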

* Asymptotically efficient: the asymptotic variance σ²(θ) attains the information bound.

* MLEs are asymptotically efficient.

Ex. X ∼ Poisson(θ)

The MLE for θ is X̄, and

var(X̄) = θ/n,

achieving the information bound.  □

Ex. X ∼ Bin(θ)

I(θ) = E[ ∂/∂θ log( θ^X (1 − θ)^(1−X) ) ]² = E[ X/θ − (1 − X)/(1 − θ) ]² = E[ (X − θ)² / (θ(1 − θ))² ] = 1/(θ(1 − θ))

∴ q(X̄) is asymptotically efficient.  □

* Proof of the MLE's asymptotic efficiency

Let ℓi(θ) = ∂/∂θ log p(Xi, θ), and let θ̂ be the MLE. Then

0 = Σ ℓi(θ̂) ≈ Σ ℓi(θ) + [ Σ ℓi′(θ*) ] (θ̂ − θ),

so

√n (θ̂ − θ) ≈ [ −(1/√n) Σ ℓi(θ) ] / [ (1/n) Σ ℓi′(θ*) ].

By the CLT the numerator →d N(0, I(θ)); by the law of large numbers the denominator → −I(θ); hence, by Slutsky's theorem,

√n (θ̂ − θ) →d N(0, 1/I(θ)).

The MLE is asymptotically normal and efficient.

(The MLE is not the only method guaranteeing asymptotic optimality: UMVUEs, first-order Newton's-method approximations, and Bayes estimates can also be asymptotically efficient.)

Ex. q(θ) = θ(1 − θ)

UMVUE = (n/(n − 1)) X̄(1 − X̄)

√n { Tn − θ(1 − θ) } = (n/(n − 1)) √n { X̄n(1 − X̄n) − ((n − 1)/n) θ(1 − θ) }

has the same limit as

√n { X̄n(1 − X̄n) − θ(1 − θ) },

∴ the UMVUE has the same asymptotic distribution as the MLE.  □
