Sampling MND MLE AED 2021
Santiago Alférez
February 2021
Análisis Estadístico de Datos
MACC
The Multivariate Normal Likelihood
• When the numerical values of the observations become available, they may be substituted for the x_j in the joint density of the sample.
• The resulting expression, now considered as a function of µ and Σ for the fixed set of observations x_1, x_2, ..., x_n, is called the likelihood.
• Many good statistical procedures employ values for the population parameters that "best" explain the observed data.
• One meaning of best is to select the parameter values that maximize the joint density evaluated at the observations. This technique is called maximum likelihood estimation, and the maximizing parameter values are called maximum likelihood estimates.
Joint density of $X_1, X_2, \ldots, X_n$:
$$\frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}}\; e^{-\sum_{j=1}^{n}(x_j-\mu)'\Sigma^{-1}(x_j-\mu)/2}$$
Since each quadratic form is a scalar (and therefore equals its own trace), since $\operatorname{tr}(AB) = \operatorname{tr}(BA)$, and since the trace of a sum of matrices is equal to the sum of the traces of the matrices:
$$\begin{aligned}
\sum_{j=1}^{n}(x_j-\mu)'\Sigma^{-1}(x_j-\mu) &= \sum_{j=1}^{n}\operatorname{tr}\left[(x_j-\mu)'\Sigma^{-1}(x_j-\mu)\right]\\
&= \sum_{j=1}^{n}\operatorname{tr}\left[\Sigma^{-1}(x_j-\mu)(x_j-\mu)'\right]\\
&= \operatorname{tr}\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(x_j-\mu)(x_j-\mu)'\right)\right]
\end{aligned}$$
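The trace identity above is easy to verify numerically. A minimal sketch with NumPy (the array shapes, seed, and choice of Σ are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 10, 3
X = rng.normal(size=(n, p))                # rows x_j play the role of observations
mu = np.zeros(p)
Sigma_inv = np.linalg.inv(2.0 * np.eye(p)) # any positive-definite Sigma works

# Left side: sum of scalar quadratic forms (x_j - mu)' Sigma^{-1} (x_j - mu)
lhs = sum((x - mu) @ Sigma_inv @ (x - mu) for x in X)

# Right side: trace of Sigma^{-1} times the summed outer products
outer_sum = sum(np.outer(x - mu, x - mu) for x in X)
rhs = np.trace(Sigma_inv @ outer_sum)
```

The two sides agree up to floating-point error for any data, mean vector, and positive-definite Σ.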
Next, write $x_j - \mu = (x_j - \bar{x}) + (\bar{x} - \mu)$. The cross-product terms vanish because $\sum_{j=1}^{n}(x_j - \bar{x}) = 0$, so
$$\begin{aligned}
\sum_{j=1}^{n}(x_j-\mu)(x_j-\mu)' &= \sum_{j=1}^{n}(x_j-\bar{x})(x_j-\bar{x})' + \sum_{j=1}^{n}(\bar{x}-\mu)(\bar{x}-\mu)'\\
&= \sum_{j=1}^{n}(x_j-\bar{x})(x_j-\bar{x})' + n(\bar{x}-\mu)(\bar{x}-\mu)'
\end{aligned}$$
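This sum-of-squares decomposition can also be checked numerically; the sketch below (illustrative shapes and seed) compares both sides entrywise:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 12, 3
X = rng.normal(size=(n, p))
mu = rng.normal(size=p)          # an arbitrary mean vector
xbar = X.mean(axis=0)

# Left side: outer products about mu
lhs = sum(np.outer(x - mu, x - mu) for x in X)

# Right side: outer products about the sample mean, plus the mean-shift term
within = sum(np.outer(x - xbar, x - xbar) for x in X)
rhs = within + n * np.outer(xbar - mu, xbar - mu)
```

Both sides are p × p matrices and agree entry by entry, which is exactly what the algebra above asserts.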
Combining the two previous results:
$$\sum_{j=1}^{n}(x_j-\mu)'\Sigma^{-1}(x_j-\mu) = \operatorname{tr}\!\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(x_j-\bar{x})(x_j-\bar{x})' + n(\bar{x}-\mu)(\bar{x}-\mu)'\right)\right]$$
Substituting into the joint density gives the likelihood:
$$L(\mu,\Sigma) = \frac{1}{(2\pi)^{np/2}\,|\Sigma|^{n/2}}\; e^{-\operatorname{tr}\left[\Sigma^{-1}\left(\sum_{j=1}^{n}(x_j-\bar{x})(x_j-\bar{x})' + n(\bar{x}-\mu)(\bar{x}-\mu)'\right)\right]/2}$$

Equivalently, the exponent can be written as
$$\sum_{j=1}^{n}(x_j-\mu)'\Sigma^{-1}(x_j-\mu) = \operatorname{tr}\!\left[\Sigma^{-1}\sum_{j=1}^{n}(x_j-\bar{x})(x_j-\bar{x})'\right] + n(\bar{x}-\mu)'\Sigma^{-1}(\bar{x}-\mu)
$$
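One consequence of the second form of the exponent is that, for any fixed Σ, the likelihood is maximized over µ at µ = x̄ (the mean-shift term is a nonnegative quadratic form that vanishes there). A small numerical illustration, assuming NumPy (the data, seed, and perturbation are arbitrary):

```python
import numpy as np

def log_likelihood(X, mu, Sigma):
    """Multivariate normal log-likelihood of the sample in the rows of X."""
    n, p = X.shape
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    quad = sum((x - mu) @ Sigma_inv @ (x - mu) for x in X)
    return -0.5 * (n * p * np.log(2 * np.pi) + n * logdet + quad)

rng = np.random.default_rng(2)
X = rng.normal(size=(20, 3))
xbar = X.mean(axis=0)
Sigma_hat = (X - xbar).T @ (X - xbar) / len(X)

ll_at_mean = log_likelihood(X, xbar, Sigma_hat)          # mu = sample mean
ll_shifted = log_likelihood(X, xbar + 0.5, Sigma_hat)    # any other mu does worse
```

For any fixed positive-definite Σ, `ll_at_mean` exceeds the log-likelihood at every other µ.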
Maximum Likelihood Estimation of µ and Σ
MLE of µ and Σ
Let X_1, X_2, ..., X_n be a random sample from a normal population with mean µ and covariance Σ. Then
$$\hat{\mu} = \bar{X} \qquad \text{and} \qquad \hat{\Sigma} = \frac{1}{n}\sum_{j=1}^{n}\left(X_j-\bar{X}\right)\left(X_j-\bar{X}\right)' = \frac{(n-1)}{n}S$$
are the maximum likelihood estimators of µ and Σ, respectively.
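In code, the MLE of Σ is the unbiased sample covariance S rescaled by (n − 1)/n. A minimal sketch, assuming NumPy (data and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(size=(50, 4))      # 50 observations of a 4-dimensional vector
n = len(X)

mu_hat = X.mean(axis=0)                               # MLE of mu: the sample mean
Sigma_hat = (X - mu_hat).T @ (X - mu_hat) / n         # MLE of Sigma (divides by n)
S = np.cov(X, rowvar=False)                           # unbiased S (divides by n - 1)
```

Note that `np.cov` divides by n − 1 by default, so `Sigma_hat` equals `((n - 1) / n) * S`, matching the result above.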
By the invariance property of maximum likelihood estimators:

1. The maximum likelihood estimator of $\mu'\Sigma^{-1}\mu$ is $\hat{\mu}'\hat{\Sigma}^{-1}\hat{\mu}$, where $\hat{\mu} = \bar{X}$ and $\hat{\Sigma} = ((n-1)/n)S$ are the maximum likelihood estimators of µ and Σ, respectively.
2. The maximum likelihood estimator of $\sqrt{\sigma_{ii}}$ is $\sqrt{\hat{\sigma}_{ii}}$, where
$$\hat{\sigma}_{ii} = \frac{1}{n}\sum_{j=1}^{n}\left(X_{ij}-\bar{X}_i\right)^2$$
is the maximum likelihood estimator of $\sigma_{ii} = \operatorname{Var}(X_i)$.
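The second item can be illustrated directly: the per-coordinate MLE variances are the diagonal of Σ̂, and the MLE of each standard deviation is just their square root. A sketch assuming NumPy (shapes and seed are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(30, 3))
n = len(X)
xbar = X.mean(axis=0)

# MLE of each variance sigma_ii (divide by n, not n - 1)
sigma_hat = ((X - xbar) ** 2).sum(axis=0) / n

# By invariance, the MLE of sqrt(sigma_ii) is sqrt of the MLE of sigma_ii
sd_hat = np.sqrt(sigma_hat)

# The same variances sit on the diagonal of the MLE covariance matrix
Sigma_hat = (X - xbar).T @ (X - xbar) / n
```

The vector `sigma_hat` coincides with `np.diag(Sigma_hat)`, so the univariate and multivariate estimators are consistent with each other.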
Sufficient Statistics
• The sample estimates X̄ and S are sufficient statistics.
• This means that all of the information contained in the data can be summarized by these two statistics alone.
• This is only true if the data follow a multivariate normal distribution; if they do not, other terms are needed (e.g., a skewness array, a kurtosis array).
• Some statistical methods use only one or both of these matrices in their analysis procedures, rather than the actual data.
The Sampling Distribution of X̄ and S
Some considerations
• With p = 1, we know that X̄ is normal with mean µ (the population mean) and variance $\frac{1}{n}\sigma^2$ (the population variance divided by the sample size).
• The result for the multivariate case (p ≥ 2) is analogous in that X̄ has a normal distribution with mean µ and covariance matrix (1/n)Σ.
• For the sample variance, recall that $(n-1)s^2 = \sum_{j=1}^{n}\left(X_j-\bar{X}\right)^2$ is distributed as σ² times a chi-square variable with n − 1 degrees of freedom.
Sampling distribution of µ̂
The estimator is a linear combination of independent normal random vectors, each from Np(µ, Σ):
$$\hat{\mu} = \bar{X} = \frac{1}{n}X_1 + \frac{1}{n}X_2 + \cdots + \frac{1}{n}X_n$$
So µ̂ = X̄ also has a normal distribution, $N_p(\mu, (1/n)\Sigma)$.
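The covariance (1/n)Σ of X̄ can be seen in simulation: drawing many samples, computing each sample mean, and estimating the covariance of those means. A sketch assuming NumPy (Σ, n, and the number of replications are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 40, 5000
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# reps independent samples of size n; shape (reps, n, 2)
draws = rng.multivariate_normal(mu, Sigma, size=(reps, n))

# One sample mean per replication; shape (reps, 2)
means = draws.mean(axis=1)

# The empirical covariance of the sample means should approximate Sigma / n
emp_cov = np.cov(means, rowvar=False)
```

With a few thousand replications, `emp_cov` is close to `Sigma / n` entrywise, as the result above predicts.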
Recall that $\hat{\Sigma} = \frac{n-1}{n}S$.

Sampling distribution of Σ̂
The matrix
$$(n-1)S = \sum_{j=1}^{n}(x_j-\bar{x})(x_j-\bar{x})'$$
is distributed as a Wishart random matrix with n − 1 degrees of freedom.

Wishart distribution
• A multivariate analogue of the chi-square distribution.
• It is defined as
$$W_m(\cdot \mid \Sigma) = \text{Wishart distribution with } m \text{ degrees of freedom} = \text{the distribution of } \sum_{j=1}^{m} Z_j Z_j'$$
where the $Z_j \sim N_p(0, \Sigma)$ are independent.
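Because a Wishart matrix is a sum of m outer products Z_j Z_j' with E[Z_j Z_j'] = Σ, its expectation is mΣ. This can be checked by simulating draws directly from the definition; a sketch assuming NumPy (p, m, Σ, and the replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(6)
p, m, reps = 2, 5, 4000
Sigma = np.array([[1.0, 0.3],
                  [0.3, 0.5]])

# reps x m independent N_p(0, Sigma) vectors; shape (reps, m, p)
Z = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, m))

# Each Wishart draw is the sum of m outer products Z_j Z_j'
W = np.einsum("rmi,rmj->rij", Z, Z)       # shape (reps, p, p)

# Monte Carlo estimate of E[W_m(. | Sigma)], which should be close to m * Sigma
W_mean = W.mean(axis=0)
```

Each `W[r]` is symmetric positive semidefinite, and the average `W_mean` approaches `m * Sigma` as the number of replications grows.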
Central Limit Theorem
What if Σ is unknown?
If n is large "enough", S will be close to Σ, so
$$\sqrt{n}\,(\bar{X}-\mu) \;\approx\; N_p(0, S) \qquad \text{or} \qquad \bar{X} \;\approx\; N_p\!\left(\mu, \tfrac{1}{n}S\right)$$
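One practical consequence of this approximation is that the standardized statistic $n(\bar{X}-\mu)'S^{-1}(\bar{X}-\mu)$ is approximately chi-square with p degrees of freedom for large n, even when the population is not normal. A Monte Carlo sketch assuming NumPy, using an exponential population as a deliberately non-normal example (the distribution, n, and replication count are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 3, 200, 2000
mu = np.ones(p)    # each Exp(1) coordinate has mean 1

stats = []
for _ in range(reps):
    # A clearly non-normal population: independent Exp(1) coordinates
    X = rng.exponential(scale=1.0, size=(n, p))
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)
    stats.append(n * (xbar - mu) @ np.linalg.inv(S) @ (xbar - mu))

# By the CLT approximation, the statistics behave like chi-square(p),
# so their average should be near p
mean_stat = np.mean(stats)
```

For n = 200 the average of the simulated statistics lands near p = 3, illustrating that the large-sample approximation does not depend on normality of the population.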
Large-Sample Behavior of X̄ and S
Comparison of Probability Contours
Below are contours for 99%, 95%, 90%, 75%, 50% and 20% for an
example with n = 20:
[Figure: probability contour plots omitted]