AE - Tema 3 - The Multivariate Gaussian Distribution
Among its applications lies the detection of anomalies. Some examples are the detection of defective products, of anomalous behavior in computers, or of fraud. Before applying this important distribution, we will briefly review its univariate version in order to better understand the coordinates of the multivariate Gaussian distribution. A random variable X \sim N(\mu, \sigma^2) has density
f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}},
where µ represents the mean and σ the standard deviation. The following
figure shows four random variables with different parameters.
[Figure 1: densities of N(0,1), N(0,4), N(2,1) and N(2,4).]
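As a quick check, the density above can be coded directly. A minimal sketch (the helper name `normal_pdf` is illustrative, not from the notes):

```python
import math

# Univariate Gaussian density f(x) = 1/(sqrt(2*pi)*sigma) * exp(-(x-mu)^2 / (2*sigma^2))
def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

# At its mean, the N(0, 1) density equals 1/sqrt(2*pi) ≈ 0.3989
print(normal_pdf(0.0, 0.0, 1.0))
```

Note that in the panel titles N(µ, σ²) the second parameter is the variance, so N(0,4) has standard deviation σ = 2.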
Given a sample X_1, \dots, X_n, the maximum likelihood estimators of \mu and \sigma^2 are

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2.
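These two estimators can be sketched in a few lines (the function name `mle_mean_var` is my own, for illustration):

```python
# Maximum-likelihood estimates of mu and sigma^2 for a univariate sample.
# Note the 1/n factor: this is the (biased) MLE, not the 1/(n-1) sample variance.
def mle_mean_var(xs):
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    return mean, var

xs = [1.0, 2.0, 3.0, 4.0]
print(mle_mean_var(xs))  # mean 2.5, variance (2.25 + 0.25 + 0.25 + 2.25)/4 = 1.25
```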
Let S = \{X_1, \dots, X_n\} be independent and identically distributed observations of a p-dimensional random variable X \in N_p(\mu, \Sigma), with n > p. Its mean and covariance matrix can be obtained through the maximum likelihood estimators:
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i, \qquad \hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat{\mu})(X_i - \hat{\mu})^T.
[Figure 2: scatter plot of X1 against X2 with an anomalous observation; both axes range from −5 to 5.]
One option would be to build a confidence interval for each variable. However, as can be seen in Figure 2, none of these intervals detects the anomaly. A possible solution is to use the multivariate normal distribution.
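This failure of per-variable intervals can be sketched numerically. All data and names below are illustrative (synthetic correlated sample, not the data behind Figure 2): a point whose coordinates are individually unremarkable can still be a clear multivariate anomaly.

```python
import numpy as np

rng = np.random.default_rng(1)
# Strongly correlated bivariate sample (illustrative).
Sigma = np.array([[1.0, 0.9], [0.9, 1.0]])
X = rng.multivariate_normal([0.0, 0.0], Sigma, size=1000)

anomaly = np.array([2.0, -2.0])  # each coordinate is unremarkable on its own

# Per-variable check: the point lies inside both marginal 3-sigma intervals ...
inside_marginals = bool(np.all(np.abs(anomaly) <= 3 * X.std(axis=0)))

# ... but its squared Mahalanobis distance is huge, because it violates the correlation.
mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False, bias=True)
d = anomaly - mu_hat
m2 = d @ np.linalg.inv(Sigma_hat) @ d

print(inside_marginals, m2)  # True, and m2 far beyond any chi^2_2 quantile
```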
Theorem. Let X ∈ Np (µ, Σp×p ) and rg(Σ)=p. Then X has density function
f(X) = \frac{1}{(\sqrt{2\pi})^p \sqrt{\det(\Sigma)}}\, e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)}.
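The density in the theorem can be evaluated directly from its formula; a minimal NumPy sketch (the helper `mvn_pdf` is an illustrative name):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_p(mu, Sigma) at x, straight from the theorem's formula."""
    p = len(mu)
    d = x - mu
    quad = d @ np.linalg.solve(Sigma, d)  # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

# At the mean of a standard bivariate normal the density is 1/(2*pi) ≈ 0.1592
print(mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2)))
```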
Proof. Let Z \in N_p(0, I). Then, since Z has independent components, it has density

f(Z) = \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{(Z_i - 0)^2}{2 \cdot 1}} = \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2} Z_i^2} = \frac{1}{(\sqrt{2\pi})^p}\, e^{-\frac{1}{2} Z^T Z}.
Proof. Suppose first that X_1 and X_2 are independent; then \Sigma_{12} = \mathrm{Cov}(X_1, X_2) = 0. For the case of \Sigma_{21}, note that \Sigma_{12} = \Sigma_{21}^T, because otherwise the covariance matrix would not be symmetric.
On the other hand, let us now suppose that \Sigma_{12} = \Sigma_{21}^T = 0. Then
f(X) = \frac{1}{(\sqrt{2\pi})^p \sqrt{\det(\Sigma)}}\, e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)}.
Furthermore,

\frac{1}{(\sqrt{2\pi})^p \sqrt{\det(\Sigma)}} = \frac{1}{(\sqrt{2\pi})^{p_1+p_2} \sqrt{\det(\Sigma_{11})\det(\Sigma_{22})}} = \frac{1}{(\sqrt{2\pi})^{p_1} \sqrt{\det(\Sigma_{11})}} \cdot \frac{1}{(\sqrt{2\pi})^{p_2} \sqrt{\det(\Sigma_{22})}}.
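The determinant identity used in this factorisation can be checked numerically; a small sketch assuming a block-diagonal Σ built from illustrative blocks:

```python
import numpy as np

# For Sigma = diag(Sigma11, Sigma22) (i.e. Sigma12 = 0),
# det(Sigma) = det(Sigma11) * det(Sigma22), which the factorisation above relies on.
Sigma11 = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma22 = np.array([[3.0]])
Sigma = np.block([
    [Sigma11, np.zeros((2, 1))],
    [np.zeros((1, 2)), Sigma22],
])
assert np.isclose(np.linalg.det(Sigma),
                  np.linalg.det(Sigma11) * np.linalg.det(Sigma22))
```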
Theorem. Let A be positive definite. Then the set of solutions of the equation X^T A X = c, c > 0, is an ellipsoid with principal axes in the directions of the eigenvectors of A.
Proof. Let P = [p_1 \cdots p_n] be the matrix whose columns are the coordinates of orthonormal eigenvectors of A, that is, A = P \Lambda P^T with P^T P = I. Setting Y = P^T X, the following holds:
X^T A X = X^T P \Lambda P^T X = (P^T X)^T \Lambda (P^T X) = Y^T \Lambda Y = \lambda_1 Y_1^2 + \cdots + \lambda_n Y_n^2 = \frac{Y_1^2}{(1/\sqrt{\lambda_1})^2} + \cdots + \frac{Y_n^2}{(1/\sqrt{\lambda_n})^2}.
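The diagonalisation step in this proof is easy to verify numerically; a sketch with an illustrative positive definite matrix:

```python
import numpy as np

# Verify x^T A x = sum_i lambda_i * y_i^2 with y = P^T x,
# where A = P Lambda P^T is the eigendecomposition of a positive definite A.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
lam, P = np.linalg.eigh(A)        # columns of P are orthonormal eigenvectors

x = np.array([1.0, -2.0])
y = P.T @ x
assert np.isclose(x @ A @ x, np.sum(lam * y ** 2))

# Semi-axis lengths of the ellipse {x : x^T A x = c} are sqrt(c) / sqrt(lambda_i).
```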
Corollary. Let X \in N_p(\mu, \Sigma). Then the contour lines f(X) = c of the joint density function are ellipsoids.
Proof.

f(X) = k_1 \, e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)} = k \iff (X-\mu)^T \Sigma^{-1} (X-\mu) = c > 0.

Since \Sigma^{-1} is positive definite, by the previous theorem the sets

E_c = \{X : f(X) = k\}

are ellipsoids.
Theorem. Let X \in N_p(\mu, \Sigma). If \Sigma has full rank (rg(\Sigma) = p), then (X - \mu)^T \Sigma^{-1} (X - \mu) \sim \chi^2_p.
Proof. From the first part of the class we know that X \in N_p(\mu, \Sigma) if and only if there exist \mu \in R^p and A \in R^{p \times n} such that X = AZ + \mu with Z_i \in N(0,1) i.i.d. Since \Sigma has full rank we can take A = P \Lambda^{1/2} \in R^{p \times p}, which is invertible, so Z = A^{-1}(X - \mu). Let us see what the previous expression looks like:

(X - \mu)^T \Sigma^{-1} (X - \mu) = Z^T A^T (A A^T)^{-1} A Z = Z^T Z = \sum_{i=1}^{p} Z_i^2,
with Zi ∈ N (0, 1), and the chi-square distribution with p degrees of freedom is
the distribution of a sum of the squares of p independent standard normal random
variables.
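This chi-square result can be checked empirically, which is also the basis of Mahalanobis-distance anomaly detection. A sketch on synthetic data (the covariance below is an arbitrary illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(2)
p = 3
mu = np.array([1.0, -1.0, 0.5])
A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)   # an arbitrary positive definite covariance

X = rng.multivariate_normal(mu, Sigma, size=50_000)
D = X - mu
# Squared Mahalanobis distance of every observation:
m2 = np.einsum('ij,jk,ik->i', D, np.linalg.inv(Sigma), D)

# If the theorem holds, m2 ~ chi^2_p, whose mean is p and variance 2p.
print(m2.mean(), m2.var())  # ≈ 3 and ≈ 6
```

In practice one flags as anomalous any observation whose m2 exceeds a high χ²_p quantile.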