Introduction
In this chapter we briefly describe the types of problems with which we will
be concerned. Then we define some notation and review some basic concepts
from probability theory and statistical inference.
Symbol                                Definition
$x_n = o(a_n)$                        $\lim_{n\to\infty} x_n/a_n = 0$
$x_n = O(a_n)$                        $|x_n/a_n|$ is bounded for all large $n$
$a_n \sim b_n$                        $a_n/b_n \to 1$ as $n \to \infty$
$a_n \asymp b_n$                      $a_n/b_n$ and $b_n/a_n$ are bounded for all large $n$
$X_n \rightsquigarrow X$              convergence in distribution
$X_n \xrightarrow{P} X$               convergence in probability
$X_n \xrightarrow{\mathrm{a.s.}} X$   almost sure convergence
$\widehat{\theta}_n$                  estimator of parameter $\theta$
$\mathrm{bias}$                       $\mathbb{E}(\widehat{\theta}_n) - \theta$
$\mathrm{se}$                         $\sqrt{\mathbb{V}(\widehat{\theta}_n)}$ (standard error)
$\widehat{\mathrm{se}}$               estimated standard error
$\mathrm{mse}$                        $\mathbb{E}(\widehat{\theta}_n - \theta)^2$ (mean squared error)
$\Phi$                                cdf of a standard Normal random variable
$z_\alpha$                            $\Phi^{-1}(1 - \alpha)$
The log-likelihood function is $\ell_n(\theta) = \log L_n(\theta)$. The maximum likelihood estimator, or mle $\widehat{\theta}_n$, is the value of $\theta$ that maximizes the likelihood. The score function is $s(X;\theta) = \partial \log f(X;\theta)/\partial\theta$. Under appropriate regularity conditions, the score function satisfies
$$\mathbb{E}_\theta\bigl(s(X;\theta)\bigr) = \int s(x;\theta)\, f(x;\theta)\, dx = 0.$$
Also,
$$\sqrt{n}\,(\widehat{\theta}_n - \theta) \rightsquigarrow N(0, \tau^2(\theta))$$
where $\tau^2(\theta) = 1/I(\theta)$ and
$$I(\theta) = \mathbb{V}_\theta\bigl(s(X;\theta)\bigr) = \mathbb{E}_\theta\bigl(s^2(X;\theta)\bigr) = -\mathbb{E}_\theta\left(\frac{\partial^2 \log f(X;\theta)}{\partial\theta^2}\right)$$
is the Fisher information. Also,
$$\frac{\widehat{\theta}_n - \theta}{\widehat{\mathrm{se}}} \rightsquigarrow N(0,1)$$
where $\widehat{\mathrm{se}}^2 = 1/(n I(\widehat{\theta}_n))$. The Fisher information $I_n$ from $n$ observations satisfies $I_n(\theta) = n I(\theta)$; hence we may also write $\widehat{\mathrm{se}}^2 = 1/I_n(\widehat{\theta}_n)$.
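To make this concrete, here is a small simulation sketch (assuming NumPy; the Exponential($\theta$) model is an arbitrary choice for illustration) comparing the Monte Carlo spread of the mle with the Fisher-information prediction $1/\sqrt{n I(\theta)}$:

    import numpy as np

    rng = np.random.default_rng(0)
    theta, n, reps = 2.0, 500, 5000

    # For the Exponential(theta) density f(x; theta) = theta * exp(-theta x),
    # the Fisher information is I(theta) = 1/theta^2, so the asymptotic
    # standard deviation of the mle theta_hat = 1/xbar is theta/sqrt(n).
    mles = np.array([1.0 / rng.exponential(scale=1.0 / theta, size=n).mean()
                     for _ in range(reps)])

    print("Monte Carlo sd of mle:", mles.std())
    print("1/sqrt(n I(theta))   :", theta / np.sqrt(n))

The two numbers should agree closely for large $n$.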
The bias of an estimator $\widehat{\theta}_n$ is $\mathbb{E}(\widehat{\theta}_n) - \theta$ and the mean squared error is $\mathrm{mse} = \mathbb{E}(\widehat{\theta}_n - \theta)^2$. The bias–variance decomposition for the mse of an estimator $\widehat{\theta}_n$ is
$$\mathrm{mse} = \mathrm{bias}^2(\widehat{\theta}_n) + \mathbb{V}(\widehat{\theta}_n). \qquad (1.10)$$
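The decomposition (1.10) is easy to check by simulation. The following sketch (NumPy assumed; the biased divide-by-$n$ variance estimator under a Normal model is chosen only as a convenient test case) estimates both sides by Monte Carlo:

    import numpy as np

    rng = np.random.default_rng(1)
    sigma2, n, reps = 1.0, 20, 50_000

    # Biased variance estimator: divides by n rather than n - 1.
    est = np.array([rng.normal(0.0, np.sqrt(sigma2), size=n).var(ddof=0)
                    for _ in range(reps)])

    bias = est.mean() - sigma2
    mse = ((est - sigma2) ** 2).mean()
    print("mse              :", mse)
    print("bias^2 + variance:", bias**2 + est.var())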
Confidence balls and bands can be finite sample, pointwise asymptotic, or uniform asymptotic, as above. When estimating a real-valued quantity instead of a function, $C_n$ is just an interval and we call $C_n$ a confidence interval. Ideally, we would like to find finite sample confidence sets. When this is not possible, we try to construct uniform asymptotic confidence sets. The last resort is a pointwise asymptotic confidence interval. If $C_n$ is a uniform asymptotic confidence set, then the following is true: for any $\delta > 0$ there exists an $n(\delta)$ such that the coverage of $C_n$ is at least $1 - \alpha - \delta$ for all $n > n(\delta)$. With a pointwise asymptotic confidence set, there may not exist a finite $n(\delta)$. In this case, the sample size at which the confidence set has coverage close to $1 - \alpha$ will depend on $f$ (which we don't know).
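The practical consequence is easy to see by estimating coverage directly. A minimal sketch (assuming NumPy and SciPy; the Exponential data, sample size, and level are arbitrary choices) estimates the finite-sample coverage of the usual Normal-based interval for a mean:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(2)
    n, alpha, reps = 30, 0.05, 20_000
    z = stats.norm.ppf(1 - alpha / 2)

    # Coverage of xbar +/- z * s / sqrt(n) for the mean of a skewed
    # distribution (Exponential, true mean 1). For a pointwise asymptotic
    # interval, how close this is to 1 - alpha at fixed n depends on f.
    covered = 0
    for _ in range(reps):
        x = rng.exponential(1.0, size=n)
        half = z * x.std(ddof=1) / np.sqrt(n)
        covered += (x.mean() - half <= 1.0 <= x.mean() + half)
    print("estimated coverage:", covered / reps)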
Let
$$\mathcal{F} = \{ f(x;\theta) : \theta \in \Theta \}$$
be a parametric model with scalar parameter $\theta$, and let $\widehat{\theta}_n$ be the maximum likelihood estimator, the value of $\theta$ that maximizes the likelihood function
$$L_n(\theta) = \prod_{i=1}^n f(X_i;\theta).$$
Under appropriate regularity conditions,
$$\widehat{\theta}_n \approx N(\theta, \widehat{\mathrm{se}}^2),$$
where
$$\widehat{\mathrm{se}} = \bigl(I_n(\widehat{\theta}_n)\bigr)^{-1/2}$$
is the estimated standard error of $\widehat{\theta}_n$ and $I_n(\theta)$ is the Fisher information. Then
$$\widehat{\theta}_n \pm z_{\alpha/2}\, \widehat{\mathrm{se}}$$
is a pointwise asymptotic $1 - \alpha$ confidence interval for $\theta$.
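As a concrete instance, here is a minimal sketch of this interval (NumPy and SciPy assumed; the Poisson model is an arbitrary example, for which the mle is $\overline{X}_n$ and $I_n(\lambda) = n/\lambda$):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    lam, n, alpha = 4.0, 200, 0.05
    x = rng.poisson(lam, size=n)

    # Poisson(lambda): mle is xbar and I_n(lambda) = n / lambda, so
    # se_hat = (I_n(lambda_hat))^(-1/2) = sqrt(lambda_hat / n).
    lam_hat = x.mean()
    se_hat = np.sqrt(lam_hat / n)
    z = stats.norm.ppf(1 - alpha / 2)
    print("Wald interval:", (lam_hat - z * se_hat, lam_hat + z * se_hat))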
1.4 Useful Inequalities

Let $X$ be a non-negative random variable and let $t > 0$. Markov's inequality states that
$$\mathbb{P}(X > t) \le \frac{\mathbb{E}(X)}{t}. \qquad (1.21)$$
If $X$ has mean $\mu$ and variance $\sigma^2$, then Chebyshev's inequality states that
$$\mathbb{P}(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}. \qquad (1.22)$$
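Both bounds are simple to check numerically. A quick sketch (NumPy assumed; Exponential data with mean 1 and variance 1 is an arbitrary choice):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.exponential(1.0, size=1_000_000)  # E(X) = 1, V(X) = 1
    t = 3.0

    print("P(X > t) estimate    :", (x > t).mean())
    print("Markov bound E(X)/t  :", 1.0 / t)
    print("P(|X - 1| >= t) est  :", (np.abs(x - 1.0) >= t).mean())
    print("Chebyshev bound 1/t^2:", 1.0 / t**2)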
If $Z \sim N(0,1)$, then Mill's inequality states that
$$\mathbb{P}(|Z| > t) \le \frac{2\phi(t)}{t} \qquad (1.25)$$
where $\phi$ is the standard Normal density. In fact, for any $t > 0$,
$$\left(\frac{1}{t} - \frac{1}{t^3}\right)\phi(t) < \mathbb{P}(Z > t) < \frac{1}{t}\,\phi(t) \qquad (1.26)$$
and
$$\mathbb{P}(Z > t) < \frac{1}{2}\, e^{-t^2/2}. \qquad (1.27)$$
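These tail bounds can be verified directly against the exact Normal tail (SciPy assumed; $t = 2$ is an arbitrary test point):

    import numpy as np
    from scipy import stats

    t = 2.0
    phi = stats.norm.pdf
    tail = stats.norm.sf(t)  # exact P(Z > t)

    print("lower (1/t - 1/t^3) phi(t):", (1/t - 1/t**3) * phi(t))
    print("exact P(Z > t)            :", tail)
    print("upper phi(t)/t            :", phi(t) / t)
    print("bound exp(-t^2/2)/2       :", 0.5 * np.exp(-t**2 / 2))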
Recall that a function $g$ is convex if for each $x, y$ and each $\alpha \in [0,1]$,
$$g(\alpha x + (1-\alpha) y) \le \alpha g(x) + (1-\alpha) g(y).$$
If $g$ is convex, then Jensen's inequality states that $\mathbb{E} g(X) \ge g(\mathbb{E} X)$. If $g$ is concave then
$$\mathbb{E} g(X) \le g(\mathbb{E} X). \qquad (1.33)$$
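A quick Monte Carlo check of both directions (NumPy assumed; the Exponential distribution and the functions $x^2$ and $\log x$ are arbitrary convex and concave examples):

    import numpy as np

    rng = np.random.default_rng(5)
    x = rng.exponential(2.0, size=1_000_000)

    # g(x) = x^2 is convex; g(x) = log(x) is concave for x > 0.
    print("E[X^2]  :", (x**2).mean(), ">= (E[X])^2:", x.mean() ** 2)
    print("E[log X]:", np.log(x).mean(), "<= log E[X]:", np.log(x.mean()))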
1.6 Exercises
1. Consider Example 1.17. Prove that (1.18) is a pointwise asymptotic
confidence interval. Prove that (1.19) is a uniform confidence interval.
where $\widehat{\mathrm{se}}^2 = S_n^2/n$ and
$$S_n^2 = \frac{1}{n} \sum_{i=1}^n (X_i - \overline{X}_n)^2.$$