Chapter 4
Introduction
This Unit deals with the various requirements, or properties, that estimators should possess. Different estimation procedures yield different estimators for the same population parameter, and we use these properties to determine which estimator is best in some sense. The properties include error, mean square error, unbiasedness, consistency, efficiency and sufficiency.
Error
The error of an estimator 𝜃̂ for a particular sample is defined as
error = 𝜃̂(𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛) − 𝜃,
where θ is the parameter being estimated. Note that the error depends not only on the estimator
but also on the sample. Good estimators tend to have low error values whilst poor ones have
large error values.
Bias
Bias is defined as the difference between the average of the collection of estimates and the population parameter being estimated, that is
Bias = E(𝜃̂) − 𝜃.
It is used to determine how far, on average, the collection of sample estimates is from the population parameter being estimated.
Mean Square Error
The mean square error (MSE) combines the bias and the variance of an estimator into a single measure of accuracy,
MSE = E[(𝜃̂ − 𝜃)²] = Var(𝜃̂) + Bias².
High values of the MSE indicate a poor estimator and low MSE values indicate a good estimator.
Exercise
Variance
The variance of an estimator, Var(𝜃̂) = E[(𝜃̂ − E(𝜃̂))²], is used to determine how far, on average, the collection of sample estimates is from the expected value of the estimates. High values of the variance indicate a poor estimator and low values usually indicate a good estimator.
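The following Python sketch, which is not part of the original notes, shows how the bias, variance and MSE of an estimator can be approximated by Monte Carlo simulation. It uses the plug-in variance estimator with divisor 𝑛, whose bias is −𝜎²/𝑛; the normal population and the constants below are illustrative assumptions.

# Monte Carlo sketch (illustrative assumptions): approximate the bias,
# variance and MSE of sigma2_hat = (1/n) * sum((X_i - X_bar)^2).
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 5.0, 2.0, 10, 200_000          # assumed values

samples = rng.normal(mu, sigma, size=(reps, n))     # reps samples of size n
estimates = samples.var(axis=1)                     # plug-in estimator (divisor n)

bias = estimates.mean() - sigma**2                  # Bias = E(theta_hat) - theta; theory: -sigma^2/n
variance = estimates.var()                          # Var = E[(theta_hat - E(theta_hat))^2]
mse = np.mean((estimates - sigma**2) ** 2)          # MSE = Var + Bias^2
print(bias, variance, mse)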
Unbiasedness
An estimator 𝜃̂ is said to be unbiased for 𝜃 if E(𝜃̂) = 𝜃, that is, if its bias is zero.
Example
Let 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛 be a random sample from a population with density function 𝑓(𝑥), with mean E(𝑋) = 𝜇 and variance Var(𝑋) = 𝜎². Let 𝑋̅ = (1/𝑛) ∑ 𝑋𝑖 be the sample mean. Show that 𝑋̅ is an unbiased estimator of 𝜇.
Solution
E(𝑋̅) = E((1/𝑛) ∑ 𝑋𝑖)
     = E((𝑋1 + 𝑋2 + 𝑋3 + ⋯ + 𝑋𝑛)/𝑛)
     = (1/𝑛) E(𝑋1 + 𝑋2 + 𝑋3 + ⋯ + 𝑋𝑛)
     = (1/𝑛) {E(𝑋1) + E(𝑋2) + E(𝑋3) + ⋯ + E(𝑋𝑛)}
     = (1/𝑛) {𝜇 + 𝜇 + 𝜇 + ⋯ + 𝜇} = 𝑛𝜇/𝑛
     = 𝜇
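As a quick numerical check (not part of the original notes), E(𝑋̅) can be approximated by simulation; the exponential population, its mean and the other constants in this Python sketch are arbitrary assumptions.

# Sketch: the average of many sample means is close to mu, in line with E(X_bar) = mu.
import numpy as np

rng = np.random.default_rng(1)
mu, n, reps = 2.0, 25, 200_000                        # assumed values
sample_means = rng.exponential(mu, size=(reps, n)).mean(axis=1)
print(sample_means.mean())                            # approximately mu = 2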
Exercise
𝑓(𝑥, 𝜃) = 1/𝜃, 0 < 𝑥 < 𝜃, 𝜃 > 0.
Consistency
Consistency is another way of assessing the accuracy of an estimator. This property requires that, as the sample size increases, the estimator 𝜃̂ gets closer to the true value of the parameter.
Definition
An estimator 𝜃̂ is a consistent estimator of 𝜃 if
lim𝑛→∞ E[(𝜃̂ − 𝜃)²] = 0.
In particular, an unbiased estimator whose variance tends to zero as 𝑛 → ∞ is consistent.
Example
Let 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛 be a random sample from the Bernoulli distribution with parameter 𝑝. Show that 𝑋̅ = (1/𝑛) ∑ 𝑋𝑖 is an unbiased and consistent estimator of 𝑝.
Solution
E(𝑋̅) = E((1/𝑛) ∑ 𝑋𝑖) = (1/𝑛) ∑ E(𝑋𝑖) = (1/𝑛) ∑ 𝑝 = 𝑛𝑝/𝑛 = 𝑝.
Var(𝑋̅) = Var((1/𝑛) ∑ 𝑋𝑖) = (1/𝑛²) ∑ Var(𝑋𝑖) = 𝑛𝑝(1 − 𝑝)/𝑛² = 𝑝(1 − 𝑝)/𝑛.
Since E(𝑋̅) = 𝑝 and Var(𝑋̅) = 𝑝(1 − 𝑝)/𝑛 → 0 as 𝑛 → ∞, 𝑋̅ is an unbiased and consistent estimator of 𝑝.
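The Python sketch below (not part of the original notes; the value of 𝑝, the sample sizes and the replication count are assumptions) illustrates the consistency result: the empirical variance of 𝑋̅ shrinks like 𝑝(1 − 𝑝)/𝑛 as 𝑛 grows.

# Sketch: empirical variance of the Bernoulli sample mean versus p(1 - p)/n.
import numpy as np

rng = np.random.default_rng(2)
p, reps = 0.3, 100_000                               # assumed values
for n in (10, 100, 1000, 10_000):
    xbar = rng.binomial(n, p, size=reps) / n         # X_bar = (number of successes)/n
    print(n, xbar.var(), p * (1 - p) / n)            # empirical vs theoretical variance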
Exercise
1. Let 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛 be a random sample from the exponential distribution
𝑓(𝑥) = 𝑒^(𝛼−𝑥), 𝑥 > 𝛼.
(a) Determine whether the method of moments estimator for 𝛼 is a consistent estimator.
(b) Show that the maximum likelihood estimator for 𝛼 is a consistent estimator.
Efficiency
Efficiency is a term used in statistics when comparing statistical procedures; it refers to a measure of the optimality of an estimator. A more efficient estimator requires fewer observations than a less efficient one to achieve a desired level of performance. For two estimators 𝜃̂1 and 𝜃̂2 of the same parameter, the relative efficiency is defined as
Relative Efficiency = Var(𝜃̂1)/Var(𝜃̂2).
Example
Consider two estimators for the parameter 𝜃 of a uniform distribution 𝑈(0, 𝜃): 𝜃̂1 = 2𝑋̅ and 𝜃̂2 = ((𝑛 + 1)/𝑛) 𝑋(𝑛), where 𝑋(𝑛) is the maximum observation in the data 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛. Find
(a) the expected values of 𝜃̂1 and 𝜃̂2, and
(b) the relative efficiency of 𝜃̂1 and 𝜃̂2. Comment on your result.
Solution
(a) E(𝜃̂1) = E(2𝑋̅) = 2E(𝑋̅) = 2(𝜃/2) = 𝜃, so 𝜃̂1 is an unbiased estimator of 𝜃.
From order statistics, the pdf of 𝑋(𝑛) is given by 𝑓𝑋(𝑛)(𝑦) = 𝑛𝑦^(𝑛−1)/𝜃^𝑛, 0 < 𝑦 < 𝜃. Therefore the expectation of 𝜃̂2 is given by
E(𝜃̂2) = ((𝑛 + 1)/𝑛)(𝑛/𝜃^𝑛) ∫0^𝜃 𝑦^𝑛 𝑑𝑦 = ((𝑛 + 1)/𝑛)(𝑛/(𝑛 + 1)) 𝜃 = 𝜃.
Since E(𝜃̂2) = 𝜃 we conclude that 𝜃̂2 is an unbiased estimator of 𝜃. Therefore both 𝜃̂1 and 𝜃̂2 are unbiased estimators of 𝜃.
(b) Var(𝜃̂1) = Var(2𝑋̅) = 4Var(𝑋̅) = (4/𝑛)(𝜃²/12) = 𝜃²/(3𝑛).
To find the variance of 𝜃̂2 we first find E(𝑋(𝑛)²), that is
E(𝑋(𝑛)²) = (𝑛/𝜃^𝑛) ∫0^𝜃 𝑦^(𝑛+1) 𝑑𝑦 = (𝑛/(𝑛 + 2)) 𝜃².
Therefore the variance of 𝜃̂2 is given by
Var(𝜃̂2) = ((𝑛 + 1)²/𝑛²) Var(𝑋(𝑛)) = ((𝑛 + 1)²/𝑛²) {E(𝑋(𝑛)²) − (E(𝑋(𝑛)))²} = 𝜃²/(𝑛(𝑛 + 2)).
Relative efficiency = Var(𝜃̂1)/Var(𝜃̂2) = (𝜃²/3𝑛)/(𝜃²/(𝑛(𝑛 + 2))) = (𝑛 + 2)/3.
The relative efficiency exceeds 1 whenever 𝑛 > 1, which indicates that 𝜃̂2 has the lower variance and is therefore the more efficient estimator.
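The following Python sketch (not from the original notes; the values of 𝜃, 𝑛 and the replication count are assumptions) compares the two estimators by simulation; the ratio of their empirical variances should be close to (𝑛 + 2)/3.

# Sketch: compare theta1_hat = 2*X_bar with theta2_hat = ((n + 1)/n)*X_(n) for U(0, theta).
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 1.0, 10, 200_000                    # assumed values
samples = rng.uniform(0, theta, size=(reps, n))

theta1 = 2 * samples.mean(axis=1)                    # estimator based on the sample mean
theta2 = (n + 1) / n * samples.max(axis=1)           # estimator based on the sample maximum

print(theta1.mean(), theta2.mean())                  # both close to theta (unbiased)
print(theta1.var() / theta2.var(), (n + 2) / 3)      # empirical vs theoretical relative efficiency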
We previously discussed how to compare at least two unbiased estimators for the same
parameter. The one with least variance is considered the better one. The question that we want
to address is, “Is there a best estimator in the sense of possessing a minimum variance? How
do we know if the estimator is the best?”
In the next section we shall see that the variance of an unbiased estimator cannot be smaller
than a certain bound called the Cramer-Rao bound.
For an unbiased estimator 𝜃̂ of 𝜃 based on a random sample of size 𝑛 from 𝑓𝑋(𝑥, 𝜃), the bound states that
Var(𝜃̂) ≥ 1 / (𝑛E[(∂ ln 𝑓𝑋(𝑥, 𝜃)/∂𝜃)²])
and, equivalently,
Var(𝜃̂) ≥ 1 / (−𝑛E[∂² ln 𝑓𝑋(𝑥, 𝜃)/∂𝜃²]).
The Cramer-Rao Lower Bound (CRLB) sets a lower bound on the variance of an unbiased estimator. Its uses are:
(a) If we find an estimator that achieves the CRLB, then we know that we have found a Uniformly Minimum Variance Unbiased Estimator (UMVUE),
(b) The CRLB provides a benchmark against which we can compare the performance of an
estimator,
(c) The CRLB can be used to rule out impossible estimators, and
(d) The theory behind the CRLB can tell us if an estimator exists that achieves the lower
bound.
Example
Let 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛 record the outcomes of 𝑛 independent trials, where 𝑋𝑖 = 1 if the 𝑖th trial is a success and 𝑋𝑖 = 0 otherwise. Let 𝑝, the probability of success in any given trial, be an unknown parameter. The distribution of each 𝑋𝑖 is
𝑝𝑋(𝑘, 𝑝) = 𝑝^𝑘 (1 − 𝑝)^(1−𝑘), 𝑘 = 0, 1.
Let 𝑋 = 𝑋1 + 𝑋2 + 𝑋3 + ⋯ + 𝑋𝑛 be the total number of successes and define 𝑝̂ = 𝑋/𝑛.
(a) Show that 𝑝̂ is an unbiased estimator of 𝑝.
(b) Show that Var(𝑝̂) attains the CRLB.
Solution
(a) E(𝑝̂) = E(𝑋/𝑛) = 𝑛𝑝/𝑛 = 𝑝, therefore 𝑝̂ = 𝑋/𝑛 is unbiased.
(b) We have
Var(𝑝̂) = Var(𝑋/𝑛) = (1/𝑛²) Var(𝑋) = (1/𝑛²) Var(𝑋1 + 𝑋2 + 𝑋3 + ⋯ + 𝑋𝑛) = 𝑛𝑝(1 − 𝑝)/𝑛² = 𝑝(1 − 𝑝)/𝑛.
For the CRLB, ln 𝑝𝑋(𝑘, 𝑝) = 𝑘 ln 𝑝 + (1 − 𝑘) ln(1 − 𝑝), so that
∂² ln 𝑝𝑋(𝑘, 𝑝)/∂𝑝² = −𝑘/𝑝² − (1 − 𝑘)/(1 − 𝑝)².
Taking the expected value of the above equation and substituting into the CRLB inequality, we get
1 / (−𝑛(−1/(𝑝(1 − 𝑝)))) = 𝑝(1 − 𝑝)/𝑛.
Conclusion: Var(𝑝̂) attains the CRLB, therefore 𝑝̂ = 𝑋/𝑛 is a Uniformly Minimum Variance Unbiased Estimator (UMVUE).
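A short Python simulation sketch (not in the original notes; the values of 𝑝, 𝑛 and the replication count are assumptions) confirms numerically that the variance of 𝑝̂ = 𝑋/𝑛 matches the CRLB 𝑝(1 − 𝑝)/𝑛.

# Sketch: empirical Var(p_hat) versus the CRLB p(1 - p)/n for the Bernoulli model.
import numpy as np

rng = np.random.default_rng(4)
p, n, reps = 0.4, 50, 500_000                 # assumed values
p_hat = rng.binomial(n, p, size=reps) / n     # p_hat = X/n with X ~ Binomial(n, p)
print(p_hat.var(), p * (1 - p) / n)           # the two values should be close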
Sufficiency
A sufficient statistic with respect to a population parameter 𝜃 is a statistic 𝜃̂ = 𝜃̂(𝑋1, . . . , 𝑋𝑛) that contains all the information in the sample that is useful for the estimation of 𝜃. It is a very useful data reduction tool, and studying its properties leads to other useful results.
The intuition behind the sufficient statistic concept is that it contains all the information necessary for estimating 𝜃. Therefore, if one is interested in estimating 𝜃, one may discard the original data and keep only the value of the sufficient statistic without losing any information about 𝜃.
The definition of a sufficient statistic is very hard to verify directly. A much easier way to find sufficient statistics is through the factorization theorem.
Definition: Let 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛 be independent and identically distributed random variables whose distribution is given by the pdf 𝑓𝑋𝑖(𝑥𝑖; 𝜃) or the pmf 𝑝𝑋𝑖(𝑥𝑖; 𝜃). The likelihood function is the product of the pdfs or pmfs, that is
𝐿(𝜃) = ∏ 𝑓𝑋𝑖(𝑥𝑖; 𝜃), the product running over 𝑖 = 1, ⋯, 𝑛.
The statistic 𝜃̂ = 𝜃̂(𝑋1, . . . , 𝑋𝑛) is sufficient for 𝜃 if and only if the likelihood can be factorised as
𝐿(𝜃) = 𝑔(𝜃, 𝜃̂(𝑥1, . . . , 𝑥𝑛)) ℎ(𝑥1, . . . , 𝑥𝑛),
where ℎ(𝑥1, . . . , 𝑥𝑛) does not depend on 𝜃 and 𝑔(𝜃, 𝜃̂(𝑥1, . . . , 𝑥𝑛)) depends on the data only through the statistic 𝜃̂.
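As an added illustration (this example is not in the original notes), consider a random sample 𝑋1, ⋯, 𝑋𝑛 from a Poisson distribution with mean 𝜆. The likelihood is
𝐿(𝜆) = ∏ 𝑒^(−𝜆) 𝜆^(𝑥𝑖) / 𝑥𝑖! = {𝑒^(−𝑛𝜆) 𝜆^(∑ 𝑥𝑖)} × {1 / ∏ 𝑥𝑖!},
where the first factor depends on the data only through ∑ 𝑥𝑖 and the second factor does not depend on 𝜆. By the factorization theorem, ∑ 𝑋𝑖 is a sufficient statistic for 𝜆.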
Exercises
1. Let 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛 be a random sample from the Bernoulli distribution with unknown parameter 𝑝. The pmf of the 𝑋𝑖 ′𝑠 is 𝑝𝑋𝑖(𝑘, 𝑝) = 𝑝^𝑘 (1 − 𝑝)^(1−𝑘), 𝑘 = 0, 1; 0 ≤ 𝑝 ≤ 1.
Determine whether 𝑝̂ = ∑ 𝑋𝑖 is sufficient for 𝑝.
2. Let 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛 be a random sample from the uniform distribution over the range (0, 𝜃). Consider the statistic 𝜃̂(𝑋1, . . . , 𝑋𝑛) = max(𝑋1, . . . , 𝑋𝑛) and determine whether the statistic is
(a) unbiased, and
(b) sufficient
3. Let 𝑋1, 𝑋2, 𝑋3, ⋯, 𝑋𝑛 be a random sample from a normal distribution for which the mean 𝜇 is unknown but the variance 𝜎² is known.
(a) Find an unbiased estimator for 𝜇.
(b) Determine whether 𝑋̅ = (1/𝑛) ∑ 𝑋𝑖 is a sufficient estimator for 𝜇.