AE - Tema 3 - The Multivariate Gaussian Distribution

In this topic we study the multivariate Normal distribution, one of whose main applications is anomaly detection. Some examples are the detection of defective products, of anomalous behavior in computers, and of fraud. Before applying this important distribution, we briefly review its univariate version in order to better understand the components of the multivariate Gaussian distribution.

Univariate Gaussian Distribution


We say that a random variable X follows a univariate Normal or Gaussian
distribution if its density function is given by:

\[
f(X) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(X-\mu)^2}{2\sigma^2}},
\]
where µ represents the mean and σ the standard deviation. The following
figure shows the densities of four normal random variables with different parameters.

Figure 1: Plots of different univariate normal distributions: N(0,1), N(0,4), N(2,1), and N(2,4).

Let S = {X_1, ..., X_n} be independent and identically distributed observations from a random variable X that follows a Normal distribution. Its mean and variance can be obtained through maximum likelihood estimation using:

\[
\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2.
\]

When the sample is small, the quasi-variance is usually used as an estimator of the variance because it is unbiased (E(ŝ²) = σ²). The quasi-variance is given by:
\[
\hat{s}^2 = \frac{1}{n-1}\sum_{i=1}^{n} (X_i - \bar{X})^2.
\]
However, we will assume that the sample is large enough so that both formulas
can be assumed to be identical.
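As a quick numerical illustration of the two estimators (a minimal sketch in Python; the data below are simulated, not from the course), both variances can be computed with NumPy by switching the `ddof` argument:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=1000)  # simulated sample with mu = 2, sigma = 3

x_bar = x.mean()              # maximum likelihood estimate of mu
sigma2_hat = x.var(ddof=0)    # ML estimate of the variance: divides by n
s2_hat = x.var(ddof=1)        # quasi-variance: divides by n - 1 (unbiased)

# for n = 1000 the two variance estimates are nearly identical
print(x_bar, sigma2_hat, s2_hat)
```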
Example. As an application of the one-dimensional Gaussian, let us look at an example of anomaly detection. Imagine a professional gambler who wants to identify the matches for which a bookmaker's odds are unusually advantageous (or disadvantageous), so that he can bet on them (or avoid them).
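A minimal sketch of this idea (the odds data and the threshold `eps` below are invented for illustration): fit µ and σ to the historical odds and flag any quote whose density under the fitted Gaussian is very low.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
odds = rng.normal(loc=1.90, scale=0.05, size=500)   # hypothetical historical odds
odds = np.append(odds, [2.40, 1.55])                # two suspicious quotes

mu, sigma = odds.mean(), odds.std()                 # ML fit of the univariate Gaussian
density = norm.pdf(odds, loc=mu, scale=sigma)       # f(x) for every observed quote

eps = 1e-3                                          # anomaly threshold (chosen by hand)
print(odds[density < eps])                          # quotes worth a closer look
```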

Multivariate Gaussian Distribution


Definition. Let Z_1, ..., Z_p be independent random variables, each distributed according to a normal with mean zero and variance one, N(0,1). Then we say that Z = [Z_1, ..., Z_p]^T follows a standard multivariate normal distribution with mean 0_p and identity dispersion matrix I_p, that is,
\[
E(Z) = \begin{bmatrix} E(Z_1) \\ \vdots \\ E(Z_p) \end{bmatrix}
     = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix} = 0_p,
\qquad
D(Z) = \begin{bmatrix}
  \operatorname{var}(Z_1) & \cdots & \operatorname{cov}(Z_1, Z_p) \\
  \vdots & \ddots & \vdots \\
  \operatorname{cov}(Z_p, Z_1) & \cdots & \operatorname{var}(Z_p)
\end{bmatrix} = I_p.
\]
Definition. We say that the p-dimensional random variable X follows a normal distribution with parameters µ and Σ if X has the same distribution as µ + AZ, where A satisfies AA^T = Σ and Z ∈ N_p(0, I). It is denoted X ∈ N_p(µ, Σ). Consequently,
\[
E(X) = E(\mu + AZ) = \mu + A\,E(Z) = \mu,
\qquad
D(X) = D(\mu + AZ) = A\,D(Z)\,A^T = \Sigma.
\]
Remark. Since Σ is a symmetric positive semidefinite matrix, it is orthogonally diagonalizable, Σ = PΛP^T, where P is the matrix whose columns are the eigenvectors of Σ and Λ is the diagonal matrix formed by the eigenvalues of Σ. Then
\[
\Sigma = P\Lambda^{1/2}\Lambda^{1/2}P^T = P\Lambda^{1/2}(P\Lambda^{1/2})^T,
\]
so we can take A = PΛ^{1/2}.
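This remark gives a direct recipe for simulating from N_p(µ, Σ): draw Z with i.i.d. N(0,1) entries and return µ + AZ with A = PΛ^{1/2}. A minimal NumPy sketch (the µ and Σ below are arbitrary choices for illustration):

```python
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])

lam, P = np.linalg.eigh(Sigma)        # Sigma = P diag(lam) P^T (symmetric PSD)
A = P @ np.diag(np.sqrt(lam))         # A = P Lambda^{1/2}, so A A^T = Sigma

rng = np.random.default_rng(0)
Z = rng.standard_normal((2, 10_000))  # columns are i.i.d. N_p(0, I) vectors
X = mu[:, None] + A @ Z               # columns are draws from N_p(mu, Sigma)

print(np.cov(X))                      # should be close to Sigma
```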

Let S = {X_1, ..., X_n} be independent and identically distributed observations from a p-dimensional random variable X ∈ N_p(µ, Σ), with n > p. Its mean and covariance matrix can be obtained through maximum likelihood estimation using:
\[
\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i,
\qquad
\hat{\Sigma} = \frac{1}{n}\sum_{i=1}^{n} (X_i - \hat{\mu})(X_i - \hat{\mu})^T.
\]

Note that now X_i and µ are vectors. If we denote by X = [X_1 · · · X_n] the matrix containing the data in its columns and by Y = X − µ the centered data, then
\[
\hat{\Sigma} = \frac{1}{n} Y Y^T
= \frac{1}{n}\, [Y_1 \cdots Y_n]
\begin{bmatrix} Y_1^T \\ \vdots \\ Y_n^T \end{bmatrix}
= \frac{1}{n}\left( Y_1 Y_1^T + \cdots + Y_n Y_n^T \right).
\]
This is the formula we know from the first class except for the order of the transposes; the difference arises because real data matrices store observations as rows, not columns.
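A minimal sketch of these estimators in NumPy, with the data matrix laid out the usual way (observations as rows, which is exactly why the transposes swap as noted above):

```python
import numpy as np

def gaussian_mle(X):
    """ML estimates for N_p(mu, Sigma); X has one observation per row."""
    n = X.shape[0]
    mu_hat = X.mean(axis=0)     # (1/n) times the sum of the observations
    Y = X - mu_hat              # centered data, one row per Y_i^T
    Sigma_hat = (Y.T @ Y) / n   # (1/n) times the sum of Y_i Y_i^T
    return mu_hat, Sigma_hat

# quick check on simulated data with the same layout
rng = np.random.default_rng(0)
X = rng.multivariate_normal(mean=[0, 0], cov=[[2, 1], [1, 2]], size=435)
mu_hat, Sigma_hat = gaussian_mle(X)
print(mu_hat, Sigma_hat, sep="\n")
```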
Example. Let us now imagine that we want to build an application that detects anomalous behavior of our computer based on the computational load and the RAM used. The following graph shows a scatter plot of the 435 samples taken.

Figure 2: Detection of outliers. Scatter plot of X1 versus X2, with each point labeled as outlier or regular.
One option would be to build a confidence interval for each variable separately. However, as can be seen in Figure 2, neither of these intervals detects the anomaly. A possible solution is to use the multivariate normal distribution.

Theorem. Let X ∈ N_p(µ, Σ_{p×p}) and rg(Σ) = p. Then X has density function
\[
f(X) = \frac{1}{(\sqrt{2\pi})^p \sqrt{\det(\Sigma)}}\,
e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)}.
\]
Proof. Let Z ∈ N_p(0, I); then, since Z has independent components, its density is
\[
f(Z) = \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi}} e^{-\frac{(Z_i-0)^2}{2\cdot 1}}
     = \prod_{i=1}^{p} \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2} Z_i^2}
     = \frac{1}{(\sqrt{2\pi})^p} e^{-\frac{1}{2} Z^T Z}.
\]

Consider the transformation from R^p → R^p given by
\[
X = \mu + AZ,
\]
where AA^T = Σ. In class 1 we saw that Σ having full rank implies that A is invertible, so
\[
Z = h(X) = A^{-1}(X - \mu).
\]
Then, by the change of variable theorem, we have
\[
f(X) = f(h(X))\,\lvert\det(h'(X))\rvert.
\]
This gives
\[
Z^T Z = (X-\mu)^T (A^{-1})^T A^{-1} (X-\mu)
      = (X-\mu)^T \Sigma^{-1} (X-\mu).
\]
Furthermore,
\[
\det(\Sigma) = \det(AA^T) = \det(A)^2
\;\Rightarrow\;
\lvert\det(A^{-1})\rvert = \frac{1}{\sqrt{\det(\Sigma)}},
\]
and therefore
\[
f(X) = \frac{1}{(\sqrt{2\pi})^p \sqrt{\det(\Sigma)}}\,
e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)}.
\]
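As a quick sanity check of this density formula (a sketch; µ, Σ, and the evaluation point are arbitrary), the expression evaluated by hand should agree with scipy.stats.multivariate_normal:

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.3, 0.7])

d = x - mu
quad = d @ np.linalg.inv(Sigma) @ d     # (X - mu)^T Sigma^{-1} (X - mu)
manual = np.exp(-0.5 * quad) / np.sqrt((2 * np.pi) ** 2 * np.linalg.det(Sigma))

print(manual)                                          # same value from both lines
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))
```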

Theorem. Let X ∈ N(µ, Σ); then MX + b ∈ N(Mµ + b, MΣM^T).

Proof. Direct from the above: MX + b = (MA)Z + (Mµ + b), with (MA)(MA)^T = MΣM^T.
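A quick empirical check of this theorem by simulation (a sketch; M, b, µ, and Σ are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0])
Sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
M = np.array([[2.0, 0.0],
              [1.0, -1.0]])
b = np.array([5.0, -5.0])

X = rng.multivariate_normal(mu, Sigma, size=100_000)  # rows are draws of X
W = X @ M.T + b                                       # rows are draws of M X + b

print(W.mean(axis=0))   # close to M mu + b = [7, -6]
print(np.cov(W.T))      # close to M Sigma M^T
```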
Theorem. Let X be formed by two multivariate Gaussian random vectors,
\[
X = \begin{bmatrix} X_1 \\ X_2 \end{bmatrix}
\in N\!\left(
\begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix},
\begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}
\right).
\]
Then X_1 and X_2 are independent if and only if Σ_12 = Σ_21^T = 0.

Proof. Suppose that X_1 and X_2 are independent. Then
\[
\begin{aligned}
\Sigma_{12} = \operatorname{cov}(X_1, X_2)
&= E[(X_1 - \mu_1)(X_2 - \mu_2)^T] \\
&= E[X_1 X_2^T - X_1 \mu_2^T - \mu_1 X_2^T + \mu_1 \mu_2^T] \\
&= E(X_1 X_2^T) - \mu_1 \mu_2^T - \mu_1 \mu_2^T + \mu_1 \mu_2^T \\
&\overset{\text{independence}}{=} \mu_1 \mu_2^T - \mu_1 \mu_2^T = 0,
\end{aligned}
\]
since independence implies E(X_1 X_2^T) = E(X_1)E(X_2)^T = µ_1 µ_2^T. For the case of Σ_21, note that Σ_12 = Σ_21^T, because otherwise the covariance matrix would not be symmetric.

On the other hand, let us suppose now that Σ_12 = Σ_21^T = 0. Then
\[
f(X) = \frac{1}{(\sqrt{2\pi})^p \sqrt{\det(\Sigma)}}\,
e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)}.
\]

Let us work with the quadratic form:
\[
\begin{aligned}
(X-\mu)^T \Sigma^{-1} (X-\mu)
&= \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix}^T
   \begin{bmatrix} \Sigma_{11} & 0 \\ 0 & \Sigma_{22} \end{bmatrix}^{-1}
   \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix} \\
&= \begin{bmatrix} (X_1-\mu_1)^T & (X_2-\mu_2)^T \end{bmatrix}
   \begin{bmatrix} \Sigma_{11}^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{bmatrix}
   \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix} \\
&= \begin{bmatrix} (X_1-\mu_1)^T \Sigma_{11}^{-1} & (X_2-\mu_2)^T \Sigma_{22}^{-1} \end{bmatrix}
   \begin{bmatrix} X_1 - \mu_1 \\ X_2 - \mu_2 \end{bmatrix} \\
&= (X_1-\mu_1)^T \Sigma_{11}^{-1} (X_1-\mu_1) + (X_2-\mu_2)^T \Sigma_{22}^{-1} (X_2-\mu_2).
\end{aligned}
\]

Furthermore,
\[
\frac{1}{(\sqrt{2\pi})^p \sqrt{\det(\Sigma)}}
= \frac{1}{(\sqrt{2\pi})^{p_1+p_2} \sqrt{\det(\Sigma_{11})\det(\Sigma_{22})}}
= \frac{1}{(\sqrt{2\pi})^{p_1} \sqrt{\det(\Sigma_{11})}}
  \cdot
  \frac{1}{(\sqrt{2\pi})^{p_2} \sqrt{\det(\Sigma_{22})}},
\]
where p_1 and p_2 are the dimensions of X_1 and X_2, respectively.


Finally, by substituting the previous expressions into f(X), we obtain
\[
f(X) = f(X_1) f(X_2),
\]
so X_1 and X_2 are independent.
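A small numerical illustration of this factorization (a sketch with two one-dimensional blocks, p_1 = p_2 = 1, and arbitrary parameters):

```python
import numpy as np
from scipy.stats import multivariate_normal, norm

mu1, mu2 = 1.0, -2.0
s11, s22 = 2.0, 0.5                  # Sigma_11 and Sigma_22, with Sigma_12 = 0
Sigma = np.diag([s11, s22])          # block-diagonal covariance matrix

x = np.array([1.3, -1.6])
joint = multivariate_normal(mean=[mu1, mu2], cov=Sigma).pdf(x)
product = norm.pdf(x[0], mu1, np.sqrt(s11)) * norm.pdf(x[1], mu2, np.sqrt(s22))

print(joint, product)                # equal up to rounding: f(X) = f(X1) f(X2)
```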

Theorem. Let A be positive definite. Then the set of solutions of the equation X^T AX = c, c > 0, is an ellipsoid whose principal axes point in the directions of the eigenvectors of A.

Proof. Let P = [p_1 · · · p_n] be the matrix whose columns are orthonormal eigenvectors of A, so that A = PΛP^T with P^T P = I. Setting Y = P^T X, the following holds:
\[
\begin{aligned}
X^T A X &= X^T P \Lambda P^T X \\
&= (P^T X)^T \Lambda\, P^T X \\
&= Y^T \Lambda Y \\
&= \lambda_1 Y_1^2 + \cdots + \lambda_n Y_n^2 \\
&= \frac{Y_1^2}{(1/\sqrt{\lambda_1})^2} + \cdots + \frac{Y_n^2}{(1/\sqrt{\lambda_n})^2},
\end{aligned}
\]
which, set equal to c, is the equation of an ellipsoid in the rotated coordinates Y, with the semi-axis along p_i of length √(c/λ_i).
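In code this reads off directly (a sketch for an arbitrary 2×2 positive definite A and level c): the eigenvectors of A give the directions of the principal axes, and the semi-axis along p_i has length √(c/λ_i).

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [1.0, 2.0]])    # positive definite
c = 4.0                       # level of the ellipse x^T A x = c

lam, P = np.linalg.eigh(A)    # A = P diag(lam) P^T
semi_axes = np.sqrt(c / lam)  # semi-axis length along each eigenvector

for length, direction in zip(semi_axes, P.T):
    print(length, direction)  # axis length and its (unit) direction
```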

Corollary. Let X ∈ N_p(µ, Σ); then the contour lines of the joint density function, f(X) = k, are ellipsoids.

Proof.
\[
f(X) = k_1 \, e^{-\frac{1}{2}(X-\mu)^T \Sigma^{-1} (X-\mu)} = k
\iff
(X-\mu)^T \Sigma^{-1} (X-\mu) = c > 0.
\]
Since Σ^{-1} is positive definite, the contour lines
\[
E_c = \{X : f(X) = k\}
\]
are ellipsoids.

Theorem. Let X ∈ N_p(µ, Σ). If Σ has full rank (rg(Σ) = p), then
\[
(X-\mu)^T \Sigma^{-1} (X-\mu) \in \chi^2(p).
\]

Proof. From the first part of the class we know that X ∈ N_p(µ, Σ) if and only if there exist µ ∈ R^p and A ∈ R^{p×p} such that X = AZ + µ, with Z_i ∈ N(0,1) i.i.d. Then, using A = PΛ^{1/2}, we have Z = A^{-1}(X − µ). Let us see what the expression above looks like:
\[
\begin{aligned}
(X-\mu)^T \Sigma^{-1} (X-\mu)
&= (X-\mu)^T (P\Lambda P^T)^{-1} (X-\mu) \\
&= (X-\mu)^T P \Lambda^{-1} P^T (X-\mu) \\
&= (X-\mu)^T P \Lambda^{-1/2} \Lambda^{-1/2} P^T (X-\mu) \\
&= (X-\mu)^T (A^{-1})^T A^{-1} (X-\mu) \\
&= Z^T Z = \sum_{i=1}^{p} Z_i^2,
\end{aligned}
\]
with Z_i ∈ N(0,1); and the chi-square distribution with p degrees of freedom is precisely the distribution of a sum of the squares of p independent standard normal random variables.
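This theorem gives a principled cutoff for the computer-monitoring example above: flag an observation as anomalous when its squared Mahalanobis distance exceeds a high quantile of χ²(p). A minimal sketch (the data here are simulated stand-ins for the 435 load/RAM samples):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
X = rng.multivariate_normal([0, 0], [[3, 2.5], [2.5, 3]], size=435)  # stand-in data

mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X, rowvar=False)        # observations as rows
Sinv = np.linalg.inv(Sigma_hat)

D = X - mu_hat
d2 = np.einsum("ij,jk,ik->i", D, Sinv, D)  # (X_i - mu)^T Sigma^{-1} (X_i - mu)

threshold = chi2.ppf(0.99, df=2)           # 99% quantile of chi-square, p = 2
print(np.where(d2 > threshold)[0])         # indices flagged as outliers
```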
