CS 215

Data Analysis and Interpretation


Multivariate Statistics: Multivariate Gaussian
Suyash P. Awate
Multivariate Gaussian – Definition
• Consider a vector random variable X := [X1; X2; …; XD]
• Column vector of length D
Multivariate Gaussian – Identity A
• What are the level sets of the PDF?
Multivariate Gaussian – Identity A
• Isotropic / spherical multivariate Gaussian
• Level sets
Multivariate Gaussian – Diagonal A
• X = A W + µ
• What is the PDF q(X) for a non-singular square diagonal matrix A, and some µ?
• X1 = A11 W1 + µ1 : Gaussian with mean µ1, standard deviation σ1 = |A11|
• X2 = A22 W2 + µ2 : Gaussian with mean µ2, standard deviation σ2 = |A22|
• …
• XD = ADD WD + µD : Gaussian with mean µD, standard deviation σD = |ADD|
• P(X) = P(X1, X2, …, XD) = G(X1; µ1, σ1²) G(X2; µ2, σ2²) … G(XD; µD, σD²)
• Any level set of the PDF q(X) is a hyper-ellipsoid with:
• Center at µ
• Axes aligned with the cardinal axes
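• A minimal MATLAB/Octave sketch of this case (the diagonal entries of A and the vector µ below are assumed example values, not from the slides): sample X = A W + µ and check that each component's sample standard deviation matches |Add|.

% Sketch (assumed example values): a diagonal A gives independent Gaussian components
rng(0);
N  = 1e5;
A  = diag([2, 0.5]);       % non-singular diagonal matrix (assumed)
mu = [1; -3];              % assumed mean vector
W  = randn(2, N);          % independent standard-normal components
X  = A*W + mu;             % samples of X
disp(std(X, 0, 2)')        % approx [2, 0.5], the |Add| values
disp(mean(X, 2)')          % approx mu'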
Multivariate Gaussian – Diagonal A
• X = A W + µ
• What is the PDF q(X) for a non-singular square diagonal matrix A, and some µ?
• P(X) = P(X1, X2, …, XD) = G(X1; µ1, σ1²) G(X2; µ2, σ2²) … G(XD; µD, σD²)
• Examples 1–3 (left to right):
means (µ1, µ2) are (0,0), (0,0), (0,0);
variances (σ1², σ2²) are (4,4), (6,2), (2,6)
Multivariate Gaussian – Non-Singular A
• X = A W + µ
• What is the PDF q(X) for a non-singular square matrix A and µ = 0?
• Transformation of random variables (multivariate case)
• Transformation is X := g(W) := A W
• Inverse transformation is W = g-1(X) = A-1X
• Univariate case
• We wanted magnitude of derivative of g-1(.)
• Measured local scaling in lengths caused by g-1(.)
• Multivariate case
• Measure local scaling in volumes caused by g-1(.)
• We want the magnitude of the volume-scaling given by Jacobian of g-1(.)
• Magnitude of determinant of Jacobian of g-1(.), where the Jacobian is A-1
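• A short MATLAB/Octave sketch of this change of variables (the matrix A and the test point below are assumed examples): evaluate q(x) = p(A-1 x) |det(A-1)| and compare it with the closed-form Gaussian density with covariance AAT.

% Sketch: change of variables for X = A*W with mu = 0 (assumed example matrix)
A = [2 1; 0 1];                         % assumed non-singular square matrix
D = size(A, 1);
x = [0.7; -1.2];                        % an arbitrary test point
w = A \ x;                              % inverse transformation w = A^-1 * x
q_transform = exp(-0.5*(w'*w)) / (2*pi)^(D/2) * abs(det(inv(A)));
C = A*A';                               % covariance of X (established later in the slides)
q_gaussian  = exp(-0.5 * x' * (C \ x)) / sqrt((2*pi)^D * det(C));
disp([q_transform, q_gaussian])         % the two values agree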
Multivariate Gaussian – Non-Singular A
• Linear transformation W := A-1 X
Multivariate Gaussian – Non-Singular A
• Linear transformation W := A-1 X
• Transformation A-1 maps
an infinitesimal hyper-cube (dX) of side lengths δ × δ × … × δ (D times) →
an infinitesimal hyper-parallelepiped (dW)
• If the axes of the hyper-cube were unit vectors along the cardinal axes,
then the axes of the hyper-parallelepiped are the columns of A-1
• If the volume of the hyper-cube (dX) is δ^D,
then the volume of the hyper-parallelepiped (dW) is δ^D |det(A-1)| = δ^D / |det(A)|
Multivariate Gaussian – Non-Singular A
• Volume of a parallelepiped (in 3D)
• Scalar triple product
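• A quick MATLAB/Octave check of the 3D case (the side vectors below are assumed examples): the scalar triple product |a · (b × c)| equals |det([a b c])|.

% Sketch: volume of a 3D parallelepiped via scalar triple product vs. determinant
a = [1; 0; 0];  b = [1; 2; 0];  c = [0; 1; 3];   % assumed side vectors
vol_triple = abs(dot(a, cross(b, c)));
vol_det    = abs(det([a b c]));
disp([vol_triple, vol_det])                      % both equal 6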
Multivariate Gaussian – Non-Singular A
• Why is volume of hyper-parallelepiped given by
determinant of matrix with columns as sides of hyper-parallelepiped ?
• An inductive proof exists. We consider the following non-inductive reasoning.
• Two important properties from linear algebra and geometry:
a) Adding multiples of one column to another
doesn’t change the determinant, because the determinant function is multi-linear and alternating
b) Adding multiples of one side vector to another
doesn’t change volume, because it causes a skew translation of hyper-parallelepiped
1. Using Gram-Schmidt orthogonalization,
transform matrix A-1 to a matrix, say, Borthog with orthogonal columns
(NOT orthonormal columns; that would force the determinant magnitude to be 1)
• This doesn’t change determinant or volume
Multivariate Gaussian – Non-Singular A
1. Gram–Schmidt
orthogonalization
• {v1,v2, …} to {u1,u2, …}
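• A compact MATLAB/Octave sketch of classical Gram–Schmidt with orthogonal (not orthonormal) columns, as used in step 1 (the matrix below is an assumed stand-in for A-1): each update only subtracts multiples of earlier columns, so the determinant is unchanged.

% Sketch: Gram-Schmidt orthogonalization keeping non-normalized orthogonal columns
Ainv = [2 1 0; 0 1 1; 1 0 1];      % assumed example matrix playing the role of A^-1
B = Ainv;
for k = 2:size(B, 2)
    for j = 1:k-1
        % subtract the projection of column k onto (already processed) column j
        B(:,k) = B(:,k) - (B(:,j)' * B(:,k)) / (B(:,j)' * B(:,j)) * B(:,j);
    end
end
disp(B' * B)                       % off-diagonal entries ~0: columns are orthogonal
disp([det(Ainv), det(B)])          % determinants agree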
Multivariate Gaussian – Non-Singular A
• Why is volume of hyper-parallelepiped given by
determinant of matrix with columns as sides of hyper-parallelepiped ?
• 2 important properties from linear algebra and geometry:
Adding multiples of one column, or side vector, to another:
a) doesn’t change the determinant, because the determinant function is multi-linear and alternating
b) doesn’t change volume, because it causes a skew translation of hyper-parallelepiped
1. Using Gram-Schmidt orthogonalization,
transform matrix A-1 to a matrix, say, Borthog with orthogonal columns
(NOT orthonormal columns; that would force the determinant magnitude to be 1)
2. Rotate Borthog to diagonal form (align its columns with the cardinal axes)
• This doesn’t change determinant or volume
Multivariate Gaussian – Non-Singular A
2. Rotate {u1,u2, …} to align it to cardinal axes
Multivariate Gaussian – Non-Singular A
• Why is volume of hyper-parallelepiped given by
determinant of matrix with columns as sides of hyper-parallelepiped ?
• 2 important properties from linear algebra and geometry:
Adding multiples of one column, or side vector, to another:
a) doesn’t change the determinant, because the determinant function is multi-linear and alternating
b) doesn’t change volume, because it causes a skew translation of hyper-parallelepiped
1. Using Gram-Schmidt orthogonalization,
transform matrix A-1 to a matrix, say, Borthog with orthogonal columns
(NOT orthonormal columns; that would force the determinant magnitude to be 1)
2. Rotate Borthog to diagonal form (align its columns with the cardinal axes)
3. For this diagonal matrix (aligned hyper-rectangle),
determinant magnitude (= product of diagonal-entries' magnitudes) =
volume of a hyper-rectangle (= product of side lengths)
4. Now trace back all operations
Multivariate Gaussian – Non-Singular A
• X = A W + µ
• What is the PDF q(X) for a non-singular square matrix A and µ = 0?
• Transformation of random variables (multivariate case)
• Transformation is X := g(W) := A W
• Inverse transformation is W = g-1(X) = A-1X
• Multivariate case
• Measure local scaling in volumes caused by g-1(.)
• We want the magnitude of the determinant of the Jacobian of g-1(.)
Multivariate Gaussian – Non-Singular A, Non-Zero µ
• If X := A W is a multivariate Gaussian,
then Y := X + µ is a multivariate Gaussian with

• Proof:
• Follows from the transformation X := Y – µ := g-1(Y)
Multivariate Gaussian – Composite Transformations
• If Y is multivariate Gaussian,
then Z := BY + c is multivariate Gaussian,
where matrix B is square invertible
• Proof:
• Because Y is multivariate Gaussian, we have Y = AW + µ, where A is invertible
• Thus, Z = B(AW + µ) + c = (BA)W + (Bµ + c), where the matrix BA is invertible
Multivariate Statistics – Mean and Covariance
Multivariate Statistics – Mean
• For a general random (column) vector X,
the mean vector is
EP(X)[X]
= a (column) vector with the i-th component as EP(X)[Xi] = EP(Xi)[Xi]
Multivariate Statistics – Covariance
• Covariance matrix for a general random (column) vector Y is defined
as:
C := EP(Y) [ (Y – E[Y]) (Y – E[Y])T ]
• So,
Cij = ?
= EP(Y) [ (Yi – E[Yi]) (Yj – E[Yj]) ]
= EP(Yi,Yj) [ (Yi – E[Yi]) (Yj – E[Yj]) ]
= Cov (Yi, Yj)
Multivariate Statistics – Covariance
• More properties of covariance matrix C (for a general random vector X)

• If C = AAT with A invertible, then C is invertible and positive definite (PD), i.e.,
aTCa > 0 for any non-zero vector a
Multivariate Gaussian – Mean and Covariance
Multivariate Gaussian – Mean
• The mean vector of X := AW+µ is µ
• Proof:
• When X = AW + µ,
EP(X)[X] = EP(W)[AW+µ] = µ + EP(W)[AW] = µ + A EP(W)[W] = µ

• Notes:
• Take the expectation of first component of AW, i.e.,
EP(W) [ A11W1 + A12W2 + … A1DWD ]
= A11 EP(W) [W1] + A12 EP(W) [W2] + … + A1D EP(W) [WD]
• So, for the whole vector: EP(W) [AW] = A EP(W) [W]
Multivariate Gaussian – Covariance
• The covariance matrix of X := AW + µ is AAT
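• A quick MATLAB/Octave Monte-Carlo check of both results (A and µ below are assumed example values): the sample mean approaches µ and the sample covariance approaches AAT.

% Sketch: empirical check that E[X] = mu and Cov(X) = A*A' for X = A*W + mu
rng(0);
N  = 2e5;
A  = [1 0.5; -0.3 2];      % assumed non-singular matrix
mu = [2; -1];              % assumed mean
W  = randn(2, N);
X  = A*W + mu;
disp(mean(X, 2)')          % approx mu'
disp(cov(X'))              % approx A*A'
disp(A*A')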
Multivariate Gaussian – Different Cases
Multivariate Gaussian – Special Cases
• Diagonal matrix
• Orthogonal matrix
• Definition: A real square matrix Q whose columns (and rows) are orthonormal vectors,
i.e., Q QT = QT Q = Identity matrix
• Determinant det(Q) is either +1 or -1
• “orthogonal” is an over-used term in mathematics
• Rotation matrix
• When det(Q) = +1, then Q is a rotation matrix
• When det(Q) = -1, then Q models either a reflection (called an improper rotation) or a
combination of rotation and reflection
• “Rotation” is an over-used term (sometimes includes improper rotations) in
mathematics
• Reflection matrix
• An orthogonal matrix that is also symmetric
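• A small MATLAB/Octave illustration of these definitions (the angle below is an arbitrary assumption): a 2D rotation matrix has determinant +1, and composing it with a reflection flips the sign of the determinant while keeping Q QT = I.

% Sketch: orthogonal matrices - rotation (det = +1) vs. improper rotation (det = -1)
theta = pi/6;                                          % assumed rotation angle
R = [cos(theta) -sin(theta); sin(theta) cos(theta)];   % rotation matrix
F = [1 0; 0 -1];                    % reflection about the x-axis (symmetric orthogonal)
Q = R * F;                          % improper rotation (rotation composed with reflection)
disp(R'*R)                          % identity matrix
disp([det(R), det(F), det(Q)])      % +1, -1, -1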
Multivariate Gaussian – Special Cases
• Property (Rotation and/or Reflection):
If µ = 0; and A = R where R is orthogonal;
then Y := RW has PDF:

• Proof:
• Transformation of random vectors
• |det(R)| = 1
• Inverse transformation is
W = RT Y
Multivariate Gaussian – Special Cases
• Property (Scaling):
If µ = 0; and A = S square diagonal with positive entries on diagonal;
then Y := SW has PDF:
Multivariate Gaussian – Special Cases
• Property (first Scaling, and then Rotation and/or Reflection):
If µ = 0; A = RS,
then Y := RSW has the PDF:
Multivariate Gaussian – General Case
• If X := A W is a multivariate Gaussian,
then Y := X + µ is a multivariate Gaussian with

• What are the level sets of this PDF ?


• We need some linear algebra
• Analyze properties of covariance matrix C that is:
• In general: real symmetric positive semi-definite
• When C = AAT, and A is invertible, then C is positive definite
Probability and Statistics
• Reference books specifically for
multivariate Gaussian
• Basic Probability Theory, by Robert Ash
• faculty.math.illinois.edu/~r-ash/BPT.html
Linear Algebra
• Reference books
Linear Algebra – Eigen Decomposition
• Eigenvalue and Eigenvector
• For any square NxN matrix A,
an eigenvector is a non-zero vector ‘v’ s.t. Av = λv.
Then, λ is the associated eigenvalue

• Square matrix A is diagonalizable if it is “similar” to a diagonal matrix,


i.e., if there exists an invertible matrix P and a diagonal matrix D
such that P-1AP = D
Linear Algebra – Eigen Decomposition
• If A is diagonalizable,
then it has
N linearly-independent eigenvectors
• The eigenvectors needn’t be
orthogonal to each other
Linear Algebra – Eigen Decomposition
• Invertible doesn’t imply diagonalizable
• A non-diagonalizable matrix is called a defective matrix
• e.g., the 2x2 matrix A = [1 1; 0 1] (as shown); B = inv (A); [V D] = eig (A)
• Doesn’t have a complete basis of eigenvectors
• Intuition: Action of matrix is to map vector (x,y) to (x+y,y)
So, any eigenvalue must be 1, any eigenvector must have y=0
Linear Algebra – Eigen Decomposition
• Diagonalizable doesn’t imply invertible
• e.g., some eigenvalues can be zero
Linear Algebra – Eigen Decomposition
• Eigenvalue and Eigenvector
• For any square NxN matrix A,
an eigenvector is a non-zero vector ‘v’ s.t. Av = λv.
Then, λ is the associated eigenvalue
• Theorem:
Every real symmetric matrix (e.g., covariance C) is diagonalizable
• There exists an invertible matrix Q such that Q-1 C Q is diagonal
• This implies C has N linearly-independent eigenvectors
• Theorem:
Every real symmetric matrix (e.g., covariance C) is diagonalizable by
an orthogonal matrix
• There exists an orthogonal matrix Q such that QT C Q is diagonal
Linear Algebra – Eigen Decomposition
• Spectral Theorem: If A is a real symmetric NxN matrix, then
A has N real eigenvalues with N real-valued orthogonal eigenvectors
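• A MATLAB/Octave sketch of the spectral theorem on a small example (the symmetric matrix below is an assumed example): eig returns real eigenvalues and orthonormal eigenvectors, so Q D QT reconstructs A.

% Sketch: eigen-decomposition of a real symmetric matrix
A = [4 1 0; 1 3 1; 0 1 2];     % assumed real symmetric matrix
[Q, D] = eig(A);               % columns of Q are eigenvectors, D is diagonal
disp(diag(D)')                 % N real eigenvalues
disp(Q'*Q)                     % identity: the eigenvectors are orthonormal
disp(norm(Q*D*Q' - A))         % ~0, i.e., A = Q D Q'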
Linear Algebra – Eigen Decomposition
• In general:
• If {u,v} are distinct/non-parallel eigenvectors associated with same eigenvalue
λ, then any vector within span of {u,v} is also an eigenvector with eigenvalue λ.
• Because A(au + bv) = aAu + bAv = aλu + bλv = λ(au + bv)
• The above equation holds for real/complex-valued a,b, u,v, λ, A
• If {u,v} are distinct eigenvectors with distinct eigenvalues {λ1,λ2}, then action of
A on any vector within span of {u,v} outputs a vector within the span of {u,v}.
• Because A(au + bv) = aAu + bAv = aλ1u + bλ2v = (aλ1)u + (bλ2)v
• Notation: (.)* = conjugate and (.)T = transpose (e.g., converts col to row)
• For a real-symmetric matrix A
• Inner product of Ax and y = inner product of x and Ay
because (Ax)T y* = xT AT y* = xT A y* = xT A* y* = xT (Ay)* (A is "self-adjoint")
• If v is an eigenvector and if u is orthogonal to v (i.e., if uTv* = 0), then
action of A on u produces a vector also orthogonal to v (A maintains orthogonality)
• Because (Au)Tv* = uT(Av*) = uT(Av)* = uTλ*v* = 0
Linear Algebra – Eigen Decomposition
• Spectral Theorem: If A is a real symmetric NxN matrix, then
A has N real eigenvalues with N real-valued orthogonal eigenvectors
• First, we show that A has all real eigenvalues
(i.e., A cannot have a complex-valued eigenvalue)
Linear Algebra – Eigen Decomposition
• Spectral Theorem: If A is a real symmetric NxN matrix, then
A has N real eigenvalues with N real-valued orthogonal eigenvectors
• Let us analyze the special/simple case where all real eigenvalues = λ
• Then, for all v in RN, linear operator A’s action on v simply scales v by factor λ
• We can exactly model such an operator A simply by the diagonal matrix λI (of size N×N)
• Or model A as A = Q (λI) QT,
where Q is any orthogonal basis for RN with real-valued column vectors
• Thus, QT A Q is a diagonal matrix
• So, the N columns of Q are N real-valued eigenvectors of A
• Of course, Q isn’t unique
Linear Algebra – Eigen Decomposition
• Spectral Theorem: If A is a real symmetric NxN matrix, then
A has N real eigenvalues with N real-valued orthogonal eigenvectors
• For real-symmetric A,
eigenvectors corresponding to distinct (real) eigenvalues are orthogonal
• Notation: (.)T = transpose, (.)* = conjugate
• Proof:
• Let A have eigenvector v1 with real eigenvalue λ1
• Let A have eigenvector v2 with real eigenvalue λ2 ≠ λ1
• Then, λ1 v1T v2* = (λ1 v1)T v2* = (A v1)T v2*
• = v1T (A v2*) (because A is symmetric)
• = v1T (A v2)* (because A is real)
• = v1T (λ2 v2)* = v1T λ2 v2* (because λ2 is real)
• = λ2 v1T v2*
• Because λ2 ≠ λ1, we get v1T v2* = 0, i.e., v1 is orthogonal to v2
Linear Algebra – Eigen Decomposition
• Spectral Theorem: If A is a real symmetric NxN matrix, then
A has N real eigenvalues with N real-valued orthogonal eigenvectors
• How do we know that real-valued eigenvectors exist for A?
• Let the real-positive scalar b := max of ||Ay||2 over { y ∈ RN : ||y||2 = 1 }
• Let the real-valued vector x := arg max of ||Ay||2 over { y ∈ RN : ||y||2 = 1 }
• If there are multiple such unit-norm vectors y, then we pick one of them as x
• Thus, the action of A scales the norm of any vector by at most a factor of b
• Then b² = (Ax)T (Ax) = xT (AAx)
• The RHS can take the value b² only if x is parallel to AAx (and then AAx = b²x)
• Then we can claim that:
1. Either Ax is parallel to x; so v1 := x is a real-valued eigenvector, with real eigenvalue ±b
2. Or v1 := Ax + bx (non-zero) is a real-valued eigenvector, with real eigenvalue b,
because A(Ax + bx) = AAx + bAx = b²x + bAx = b(bx + Ax)
Linear Algebra – Eigen Decomposition
• Spectral Theorem: If A is a real symmetric NxN matrix, then
A has N real eigenvalues with N real-valued orthogonal eigenvectors
• How do we know that real-valued eigenvectors exist for A ?
• Repeat the following:
• Let the real-positive scalar c := max of ||Ay||2 over { y ∈ RN : yT v1 = 0, ||y||2 = 1 }
• Let the real-valued vector x := arg max of ||Ay||2 over { y ∈ RN : yT v1 = 0, ||y||2 = 1 }
• Then c² = (Ax)T (Ax) = xT (AAx)
• The RHS can take the value c² only if x is parallel to AAx (and then AAx = c²x)
• Then we can claim that:
1. Either Ax is parallel to x; so x is a real-valued eigenvector (orthogonal to v1), with real eigenvalue ±c
2. Or (Ax + cx) is a real-valued eigenvector of A (orthogonal to v1, as both Ax and x are),
because A(Ax + cx) = AAx + cAx = c²x + cAx = c(cx + Ax)
Linear Algebra – Eigen Decomposition
• Every NxN real symmetric positive definite (SPD) matrix M
(e.g., covariance matrix C) has an eigen-decomposition with
all eigenvalues as positive
• Proof:
• Let eigen decomposition for real symmetric matrix M be: M = Q D QT
• Where Q is real orthogonal and D is real diagonal
• Then, vT M v = vT Q D QT v = uT D u, where u := QT v (simply “rotated” v)
• For a PD matrix M, vTMv must be positive for every non-zero v
• So, uTDu must be positive for every non-zero u
• So, all values on diagonal of D must be positive
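• A small MATLAB/Octave check of this (the matrix A below is an assumed invertible example): forming C = AAT gives an SPD matrix, and all eigenvalues returned by eig are positive.

% Sketch: the eigenvalues of an SPD matrix C = A*A' are all positive
rng(0);
A = randn(3) + 3*eye(3);       % assumed (invertible) matrix
C = A*A';                      % symmetric positive definite when A is invertible
disp(eig(C)')                  % all entries are > 0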
Multivariate Gaussian – Level Sets
• If X = A W is a multivariate Gaussian,
then Y = X + µ is a multivariate Gaussian with

• What are the level sets of this PDF ?


• Let C = Q D QT. Then, C-1 = Q D-1 QT, which is also SPD
• Each level set satisfies (y-μ)T C-1 (y-μ) = a, where a >= 0
• Because C-1 is SPD, 'a' becomes zero iff y = μ
• So, (y-μ)T Q D-1 QT (y-μ) = a
• Change to the roto-reflected coordinate system represented by the orthogonal basis Q
• Where y maps to y' = QT y, and μ maps to μ' = QT μ
• Then, (y'-μ')T D-1 (y'-μ') = a, which is a hyper-ellipsoid:
• In the roto-reflected coordinate system, the center is at μ' and the axes are along the cardinal axes
• The half-length of the axis along dimension d is sqrt(a Ddd), i.e., proportional to the square root of the d-th eigenvalue of C
Multivariate Gaussian – Level Sets
• If X = A W is a multivariate Gaussian, then Y = X + µ is a multivariate
Gaussian with

• What are the level sets of this PDF ?

peterroelants.github.io/posts/multivariate-normal-primer/
Multivariate Gaussian – Marginals and Conditionals
Multivariate Gaussian – Marginals
• Marginal PDFs
• Property: The 1D marginal PDF of
multivariate Gaussian X,
for any single variable,
is (univariate) Gaussian
• Proof:
• From the definition, we know that:
• (1) each component Xi = Ai1 W1 + Ai2 W2 + … + AiD WD + µi, where the Wj are independent standard-normal RVs
• (2) transformations of scaling and/or translation on a univariate Gaussian RV
lead to another univariate Gaussian RV
• (3) sum of 2 independent univariate Gaussian RVs leads to another univariate
Gaussian RV
Multivariate Gaussian – Marginals
• Marginal PDFs
• Property: Marginal PDFs of multivariate Gaussian X in N dimensions,
over any chosen subset of the variables (subset size M < N),
are (multivariate) Gaussian
• Proof:
• Choose transformation B as a projection matrix of size MxN, where M < N
• Each row has all zeros except a 1 at one position
• e.g., row [1 0 … 0 ] will select the first component of X
• If we consider multivariate Gaussian X := AW + µ, where A is invertible,
then BX = (BA)W + (Bµ)
• Note: Because A is invertible (full rank), BA has rank M
• By definition, BX is also multivariate Gaussian
• Mean = Bµ, Covariance = (BA)(BA)T = BAATBT = BCBT = C’,
where C’ is a square sub-matrix of C corresponding to the chosen M variables
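• A MATLAB/Octave sketch of this argument (A, µ, and the selected components below are assumed examples): selecting components with a 0/1 projection matrix B gives samples whose covariance matches the corresponding sub-matrix of C = AAT.

% Sketch: marginal of a multivariate Gaussian over a chosen subset of variables
rng(0);
N  = 2e5;
A  = [2 0 0; 1 1 0; 0.5 -1 3];     % assumed invertible 3x3 matrix
mu = [1; 2; 3];
X  = A*randn(3, N) + mu;           % samples of the 3D Gaussian
B  = [1 0 0; 0 0 1];               % select components 1 and 3 (M = 2 < N = 3)
Y  = B*X;                          % samples of the 2D marginal
C  = A*A';
disp(cov(Y'))                      % approx B*C*B' = C([1 3],[1 3])
disp(B*C*B')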
Multivariate Gaussian – Marginals
• Marginal PDFs being Gaussian
doesn’t imply
joint PDF is multivariate Gaussian
• Example
• Let X be a standard Normal
• Let Y = X (2B – 1)
where B is Bernoulli with parameter 0.5

• More examples
• https://en.wikipedia.org/wiki/Normally_distributed_and_uncorrelated_does_not_imply_independent
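• A MATLAB/Octave sketch of this example: Y = X(2B - 1) is itself standard normal and uncorrelated with X, yet (X, Y) is not jointly Gaussian (e.g., X + Y equals exactly 0 about half the time).

% Sketch: marginally Gaussian and uncorrelated, but not jointly Gaussian
rng(0);
N = 1e5;
X = randn(1, N);                 % standard normal
B = rand(1, N) < 0.5;            % Bernoulli with parameter 0.5
Y = X .* (2*B - 1);              % flips the sign of X with probability 0.5
disp([mean(Y), std(Y)])          % ~0 and ~1: Y is (marginally) standard normal
disp((X*Y') / N)                 % ~0: X and Y are uncorrelated
disp(mean(abs(X + Y) < 1e-12))   % ~0.5: X+Y has an atom at 0, so (X,Y) is not jointly Gaussian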
Multivariate Gaussian – Marginals
• Marginal PDFs being Gaussian doesn’t imply joint PDF is multivariate
Gaussian
• Only top-row left, top-row middle, bottom-row left are bivariate Gaussian
• All marginals are Gaussian
Multivariate Gaussian – Conditionals
• Conditional PDFs
• If multivariate Gaussian X
is partitioned into X1 and X2,
then conditional PDF P(X1|X2=x2)
is also a multivariate Gaussian
• P(X1|X2=x2) = P(X1, X2=x2) / P(X2=x2)
Multivariate Gaussian – Conditionals
• Conditional PDFs
• If multivariate Gaussian X is partitioned into X1 and X2,
then the conditional PDF P(X1|X2=x2) is also a multivariate Gaussian
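• For reference, the standard closed-form result: if µ = [µ1; µ2] and C = [C11 C12; C21 C22], then P(X1|X2=x2) is Gaussian with mean µ1 + C12 C22-1 (x2 − µ2) and covariance C11 − C12 C22-1 C21. A MATLAB/Octave sketch with assumed example values:

% Sketch: conditional mean and covariance of a partitioned multivariate Gaussian
mu  = [1; 2; 0];                           % assumed mean, partitioned as [mu1; mu2]
C   = [4 1 0.5; 1 3 1; 0.5 1 2];           % assumed SPD covariance
i1  = 1:2;  i2 = 3;                        % X1 = components 1-2, X2 = component 3
x2  = 1.5;                                 % conditioning value
mu_cond = mu(i1) + C(i1,i2) * (C(i2,i2) \ (x2 - mu(i2)));
C_cond  = C(i1,i1) - C(i1,i2) * (C(i2,i2) \ C(i2,i1));
disp(mu_cond')
disp(C_cond)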
Multivariate Gaussian – Conditionals
• “Conditional” PDFs
• What about this way of slicing ?
• Yes, profile on the line has a Gaussian shape
Multivariate Gaussian – ML Estimation
Multivariate Gaussian – ML Estimation
• Data: {y1, …, yN}
• Take log-likelihood function
• ML estimate (MLE) for mean vector (= sample mean)
• Take derivative with respect to (w.r.t.) µ, and assign to zero. Solve.
• Quadratic form aTBa = Σi Σj ai aj Bij
• Partial derivative w.r.t. ak
= Σj aj Bkj + Σi ai Bik
= 2 Σj Bkj aj (because B is symmetric)
= 2 (k-th row of B * column-vector a)
• Hence, ∂/∂µ [ (µ − x)T C-1 (µ − x) ] = 2 C-1 (µ − x)
• Scalar function, say f(a), of multiple scalar variables in column-vector ‘a’
• Jacobian df/da will be a row vector of the same length as ‘a’
• Change in function value (df) = derivative (df/da) * change in variable (da)
• Can be reshaped/rearranged into a column vector of the same shape as ‘a’
Multivariate Gaussian – ML Estimation
• Data: {y1, …, yN}
• Take log-likelihood function
• MLE for covariance matrix (= sample covariance; uncorrected/biased)
• Take derivative w.r.t. C, and assign to zero. Solve.
• Need partial derivatives w.r.t. Cij
• Scalar function, say f(C), of multiple scalar variables in C
• Consider a (column)-vectorized form of C
• Jacobian df/dC will be a row vector of the same length as (column)-vectorized C
• Can be reshaped/rearranged into a matrix of the same shape as C
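• A MATLAB/Octave sketch of the resulting estimators (the closed forms are the sample mean and the uncorrected/biased sample covariance; the data below is simulated under assumed parameters):

% Sketch: ML estimates for a multivariate Gaussian - sample mean and biased sample covariance
rng(0);
N  = 5e4;
A  = [1.5 0; 0.7 1];                         % assumed ground-truth A
mu = [2; -1];                                % assumed ground-truth mean
Y  = A*randn(2, N) + mu;                     % data {y1, ..., yN}, one column per sample
mu_hat = mean(Y, 2);                         % MLE of the mean vector
R      = Y - mu_hat;                         % centered data
C_hat  = (R * R') / N;                       % MLE of the covariance (divides by N, not N-1)
disp(mu_hat')
disp(C_hat)
disp(A*A')                                   % compare with the true covariance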
Multivariate Gaussian – ML Estimation
• “Matrix Calculus”
• http://www.ee.ic.ac.uk/hp/staff/dmb/matrix/calculus.html
• https://en.wikipedia.org/wiki/Matrix_calculus
• http://www.matrixcalculus.org/
Principal Component Analysis (PCA)
Principal Component Analysis (PCA)
• Principal Component Analysis (PCA)
• What is it about ?
• What does it tell us about the distribution underlying the data ?
• What does it tell us about the distribution underlying the data,
when the data is known to have a multivariate Gaussian distribution ?
• Applications
Principal Component Analysis (PCA)
• “Modes of variation”
• Set of vectors (directions and magnitudes) that are used to depict the
variation in a population or sample, around the mean

https://statistics.laerd.com/spss-tutorials/pearsons-product-moment-correlation-using-spss-statistics.php
Principal Component Analysis (PCA)
• “Modes of variation”
• Set of vectors (directions and magnitudes) that are used to depict the
variation in a population or sample, around the mean
• Can we do it for the
distribution of images of each digit ?
Principal Component Analysis (PCA)
• Directions of maximal variance
• Consider a general multivariate random variable X with PDF P(X)
• We aren’t assuming it to be a Gaussian yet
Principal Component Analysis (PCA)
• Directions of maximal variance
• When covariance matrix C is diagonal (sample mean at origin)
• Let d-th element on diagonal of C be Cdd
• Let d-th element in vector ‘v’ be vd

• The objective function is a convex combination of { Cdd } with weights { (vd)² }


Principal Component Analysis (PCA)
• Directions of maximal variance
• When covariance matrix C is diagonal (and sample mean at origin):
• The minor axis corresponds to the dimension d with the
smallest 1/Cdd, i.e., the largest Cdd
• The point on the hypersphere that
maximizes the objective function
lies at the end of the minor axis
of one of the hyper-ellipsoids
• These level sets being ellipsoids isn't because of any Gaussian assumption.
(Figure: level sets plotted over dimensions d = 1 and d = 2.)
Principal Component Analysis (PCA)
• Directions of maximal variance
• When covariance matrix C is diagonal (and sample mean at origin):

• The second mode of variation is the second cardinal axis (another eigenvector).
Variance along that mode = second-largest eigenvalue = C22
• Similar arguments hold for 3rd, 4th, ... directions
• Thus, for any P(X) with a diagonal covariance matrix C, modes of variation are
cardinal directions that maximize variance of projected data
Principal Component Analysis (PCA)
• Directions of maximal variance
• For a general SPD covariance matrix C (and sample mean at origin):
Principal Component Analysis (PCA)
• Directions of maximal variance
• Data

• What do the eigenvectors look like ? Compute them and see for yourself.
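• A MATLAB/Octave sketch in the spirit of the examples on the next slides (the data-generating R and S below are assumed): estimate the covariance and inspect its eigenvectors (principal directions) and eigenvalues (variances along them).

% Sketch: PCA via eigen-decomposition of the sample covariance
rng(0);
N = 1e5;
R = [[1 -1]; [1 1]] / sqrt(2);          % rotation by 45 degrees (as in Examples 3 and 5)
S = diag([2, 0.5]);                     % assumed scalings
data = R * S * randn(2, N);             % samples with covariance R*S^2*R'
C_hat = cov(data');                     % sample covariance (rows of data' are samples)
[V, D] = eig(C_hat);                    % eigenvectors = principal directions
[evals, idx] = sort(diag(D), 'descend');
disp(evals')                            % approx [4, 0.25] = diag(S).^2
disp(V(:, idx))                         % columns approx +/- the columns of R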
Principal Component Analysis (PCA)
• Example 1: rng(0); N = 1e5; data = randn (2,N)
Principal Component Analysis (PCA)
• Example 2: rng(0); N = 1e5; data = rand (2,N)
Principal Component Analysis (PCA)
• Example 3: rng(0); N = 1e5; data = [[1 -1];[1 1]]/sqrt(2) * (rand (2,N))
Principal Component Analysis (PCA)
• Example 4: rng(0); N = 1e5; data = exprnd (1,[2,N])
Principal Component Analysis (PCA)
• Example 5: rng(0); N=1e5; data = [[1 -1];[1 1]]/sqrt(2) * exprnd (1,[2,N])
Principal Component Analysis (PCA)
• What happens to covariance matrix when we rotate the data ?
• In general:
• Let CX := E[ (X − µX)(X − µX)T ] = E[XXT] − µX µXT
• Let Y := RX (for any invertible R)
• Then, µY := E[Y] = E[RX] = R E[X] = R µX
• Then, CY := E[ (Y − µY)(Y − µY)T ]
• = E[YYT] − µY µYT
• = E[ R X XT RT ] − R µX µXT RT
• = R ( E[XXT] − µX µXT ) RT
• = R CX RT
• If CX = aI and R is orthogonal, then CY = R (aI) RT = aI = CX
• Thus, the eigenvalues don't change, and the eigenvectors can be any orthogonal basis
• Thus, the directions of maximal variance aren't unique
Principal Component Analysis (PCA)
• Spaces of maximal variance
• What if we want to find multi-D lower-dimensional spaces that maximize
“total dispersion/variance” ?
• Total dispersion/variance is empirical average of squared distance from mean
Principal Component Analysis (PCA)
• Spaces of maximal variance
• What if we want to find multi-D lower-dimensional spaces that maximize
“total dispersion/variance” ?
• Total dispersion/variance is empirical average of squared distance from mean
• When covariance matrix C is diagonal (and sample mean is at origin):
Principal Component Analysis (PCA)
• Spaces of maximal variance
• What if we want to find multi-D lower-dimensional spaces that maximize
“total dispersion/variance” ?
• When covariance matrix C is diagonal (and sample mean at origin):

• Now this problem is similar to what we had solved before


• Like before, we have a “convex” combination of {Cdd} with weights {0 ≤ ad ≤ 1}
• So, increase a1 to its limit (i.e., 1) and then increase a2 to its limit (i.e., 1)
Principal Component Analysis (PCA)
• Spaces of maximal variance
• What if we want to find multi-D lower-dimensional spaces that maximize
“total dispersion/variance” ?
• When covariance matrix C is diagonal (and sample mean at origin):

• Any orthogonal basis spanning the space spanned by first 2 cardinal axes will be a solution
• Similar arguments will hold for lower-dimensional spaces of dimensions 3, 4, …, D-1
• Similar arguments will also hold for a general SPD covariance matrix C
Principal Component Analysis (PCA)
• PCA applied to data from a multivariate Gaussian distribution
• Consider X is multivariate Gaussian
• If X := AW + b, then:
• Principal modes of variation are directions given by
eigenvectors of covariance matrix C := AAT
• Principal modes of variation are along
axes of hyper-ellipsoids that are level sets of P(X)
• Variances along principal modes of variation are
the eigenvalues of C
• If X := RSW + b, then:
• Principal modes of variation are
column vectors of orthogonal matrix R
i.e., eigenvectors of C = RS2RT
• Variances along principal modes of variation are
the eigenvalues of C, i.e., diagonal elements in S2
Principal Component Analysis (PCA)
• Applications: Dimensionality reduction
• Intrinsic dimension: Minimum number of variables (degrees of freedom)
required to represent the signal
• Consider a multivariate random vector X of N scalar variables: x = (x1, …, xN)
• Consider a function g(.), and M<N scalar variables a1, …, aM such that
every x~P(X) can be written as x = g (a1, …, aM) for some a1, …, aM,
then signal X needs only M variables for representation
• Here, “intrinsic dimension” of X is M, instead of the “representation dimension” = N
Principal Component Analysis (PCA)
• Applications: Dimensionality reduction

https://medium.com/analytics-vidhya/dimensionality-reduction-using-principal-component-analysis-pca-41e364615766
Principal Component Analysis (PCA)
• Applications: Dimensionality reduction
• Acquired data is corrupted with errors
• e.g., measurement errors
• Such errors make the signal representation seem to be of a dimension higher than
intrinsic dimension
• Dimensionality reduction:
Transformation of data
from a higher-dimensional space into a lower-dimensional space
so that
the lower-D representation (ideally close to its intrinsic dimension)
retains some meaningful properties of the original data
• PCA can perform linear dimensionality reduction
Principal Component Analysis (PCA)
• Applications:
Dimensionality reduction
• Using PCA
• X may be N dimensional
• PCA finds an
M-dimensional space
that captures most of the
variability (total dispersion)
in the data

http://bennymachinelearning.blogspot.com/2017/08/machine-learning-principal-component.html
Principal Component Analysis (PCA)
• Applications: Dimensionality reduction
• Using PCA
• X may be N dimensional
• PCA can find an
M-dimensional space
(often when M << N)
that captures most of
the variability
(total dispersion)
in the data
Multivariate Gaussian – Mahalanobis Distance
Multivariate Gaussian – Mahalanobis Distance

• The term (y-µ)T C-1 (y-µ) appearing in the exponent
= squared Mahalanobis distance
of the point y from the mean µ
• d(y,µ; C)² := (y-µ)T C-1 (y-µ) (where C is SPD)
• Prasanta Chandra Mahalanobis
founded
Indian Statistical Institute (ISI) in Kolkata
Multivariate Gaussian – Mahalanobis Distance
• d(y,µ; C)2 := (y-µ)T C-1 (y-µ) (where C is SPD)
• Generalizes Euclidean distance in a multidimensional space
• When C is Identity:
• Mahalanobis distance = Euclidean distance
• When C is diagonal:
• Mahalanobis distance rescales “units” along each dimension
based on standard deviation of the marginal along that dimension
• A level set of a Multivariate Gaussian PDF is
the locus of points with equal Mahalanobis distance from the mean
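• A MATLAB/Octave sketch (µ, C, and y below are assumed example values): the Mahalanobis distance of a point from the mean, and a check that it reduces to the Euclidean distance when C is the identity.

% Sketch: squared Mahalanobis distance d(y, mu; C)^2 = (y - mu)' * inv(C) * (y - mu)
mu = [1; 2];
C  = [4 1; 1 2];                      % assumed SPD covariance
y  = [3; 1];
d2_mahal  = (y - mu)' * (C \ (y - mu));
d2_euclid = (y - mu)' * (y - mu);     % the same formula with C = eye(2)
disp([d2_mahal, d2_euclid])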
Multivariate Gaussian – Mahalanobis Distance
• d(y,µ; C)2 := (y-µ)T C-1 (y-µ) (where C is SPD)
• Property: The Mahalanobis distance is a “distance metric”
• Proof:
• A distance metric is a function d(·,·) → R that needs to satisfy 3 properties:
• (1) identity of indiscernibles: d(x,y) = 0 iff x = y
• (2) symmetry: d(x,y) = d(y,x)
• (3) triangle inequality: d(x,y) <= d(x,z) + d(z,y)
• These imply non-negativity (i.e., d(x,y) >= 0, for all x,y):
0 = d(x,x) <= d(x,y) + d(y,x) = 2 d(x,y)
• In our case of SPD matrix C:
• C being SPD implies: d(x,y; C) >= 0 for all x,y
• C being SPD implies: d(x,y; C) = 0 iff x=y
• Because of the specific quadratic form of d(.,.; C)2 : d(x,y; C) = d(y,x; C)
Multivariate Gaussian – Mahalanobis Distance
• Property: The Mahalanobis distance is a true distance metric
• Proof (when covariance matrix C is diagonal):

• Showing LHS <= RHS is equivalent to showing LHS² <= RHS² (since both sides are non-negative)


Multivariate Gaussian – Mahalanobis Distance
• Property: The Mahalanobis distance is a true distance metric
• Proof (when covariance matrix C is diagonal):
Multivariate Gaussian – Mahalanobis Distance
• Property: The Mahalanobis distance is a true distance metric
• Proof (for a general covariance matrix C):
Multivariate Gaussian – Mahalanobis Distance
• A level set of a Multivariate Gaussian is the locus of points with the
same Mahalanobis distance from the mean
• Scaling the coordinate frame: X := SW
• How does the Mahalanobis distance change (w.r.t. the case when C = Identity)?
• How do the level sets change ?
Multivariate Gaussian – Mahalanobis Distance
• A level set of a Multivariate Gaussian is the locus of points with the
same Mahalanobis distance from the mean
• Scaling + “Rotating” (proper + improper) coordinate frame: Y := USW
• How does the Mahalanobis distance change (w.r.t. the case when C = Identity)?
• How do the level sets change ?
Multivariate Gaussian – Applications
Multivariate Gaussian – Applications
• Multivariate Gaussian (Mahalanobis distance) for anomaly detection

(Two figures, each with Blue = normal and Red = anomalous:
one uses the Euclidean distance from the mean of the normal data;
the other uses the Mahalanobis distance from the mean of the normal data,
with the covariance of the normal data.)
Multivariate Gaussian – Applications
• Multivariate Gaussian for maximum-likelihood classification (in 1D)

http://sar.kangwon.ac.kr/etc/rs_note/rsnote/cp11/cp11-7.htm
Multivariate Gaussian – Applications
• Multivariate Gaussian for maximum-likelihood classification (in 2D)
• If det(C1) = det(C2),
then likelihood-based classification is equivalent to
Mahalanobis-distance-based classification

https://onlinelibrary.wiley.com/doi/full/10.1111/maps.13314
Multivariate Gaussian – Applications
• Multivariate Gaussian for maximum-likelihood classification
• What do the decision boundaries look like?
• P(x|Class1) = G (x; m1, C1)
• P(x|Class2) = G (x; m2, C2)
• Decision surface comprises all points ‘x’ at which likelihoods are equal
• { x : P(x|Class1) = P(x|Class2) }
• { x : 0 = log ( P(x|Class1) / P(x|Class2) ) }
• At any point in the domain ‘x’, the log likelihood-ratio is:
log (P(x|Class1) / P(x|Class2))
=
- 0.5 (x-m1)T C1-1 (x-m1) - 0.5 log (det (C1))
+ 0.5 (x-m2)T C2-1 (x-m2) + 0.5 log (det (C2))
• In general, decision surface is a hyper-quadric
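• A MATLAB/Octave sketch of this decision rule (the class parameters and the test point below are assumed examples): classify a point by the sign of the log likelihood-ratio between the two Gaussian class-conditional densities.

% Sketch: maximum-likelihood classification with two Gaussian class models
m1 = [0; 0];   C1 = [2 0.5; 0.5 1];        % assumed class-1 parameters
m2 = [3; 2];   C2 = [1 0; 0 1.5];          % assumed class-2 parameters
x  = [1.5; 1.0];                           % point to classify
llr = -0.5*(x - m1)'*(C1\(x - m1)) - 0.5*log(det(C1)) ...
      +0.5*(x - m2)'*(C2\(x - m2)) + 0.5*log(det(C2));
if llr > 0
    disp('assign x to Class 1')            % P(x|Class1) > P(x|Class2)
else
    disp('assign x to Class 2')
end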
Multivariate Gaussian – Applications
• Multivariate Gaussian for maximum-likelihood classification
• Decision boundaries

http://mi.eng.cam.ac.uk/~mjfg/local/4F10/lect2.pdf
Multivariate Gaussian – Applications
• Multivariate Gaussian for maximum-likelihood classification
• Decision boundaries
• When C1 = C2 = C, then decision boundary is:
• 0 = log( P(x|Class1) / P(x|Class2) )
= - 0.5 (x-m1)T C-1 (x-m1) + 0.5 (x-m2)T C-1 (x-m2)
• Equivalently, 0 = (m2-m1)T C-1 x + 0.5 m1T C-1 m1 - 0.5 m2T C-1 m2
• Decision surface is a hyper-plane
Multivariate Gaussian – Applications
• Multivariate Gaussian for maximum-likelihood classification
• Example (Data taken from R. A. Fisher's classic 1936 paper)
• UCI ML repository Iris dataset http://archive.ics.uci.edu/ml/datasets/Iris/

http://mi.eng.cam.ac.uk/~mjfg/local/4F10/lect2.pdf
Datasets
• UCI Machine Learning Repository
• https://archive.ics.uci.edu/ml/
Singular Value Decomposition (SVD)
• Singular Value Decomposition (SVD)
• What is it about?
• What can we say about existence?
• What can we say about uniqueness?
• How does it help us understand the multivariate Gaussian?
Singular Value Decomposition (SVD)
Singular Value Decomposition (SVD)
• Matrix factorization
• Let matrix A be size MxN
• When A is real valued, then SVD of A = U S VT, where:
• V is orthogonal of size NxN
• When A is complex: V is unitary
• U is orthogonal of size MxM
• When A is complex: U is unitary
• S is (rectangular) diagonal with size MxN
• Values on diagonal = singular values
• Singular values are non-negative real (even when A, U, V are complex-valued)
• If the m-th columns of U and V are um and vm, respectively, then A = Σm Smm um vmT (a sum of rank-1 terms)
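• A MATLAB/Octave sketch (the rectangular matrix below is an assumed example): svd returns orthogonal U and V and non-negative singular values, and the rank-1 sum reconstructs A.

% Sketch: SVD of a rectangular real matrix, A = U*S*V'
A = [3 1 0; 1 2 1];                      % assumed 2x3 example (M = 2, N = 3)
[U, S, V] = svd(A);                      % U: 2x2, S: 2x3, V: 3x3
disp(diag(S)')                           % non-negative singular values
disp(norm(U*S*V' - A))                   % ~0
% rank-1 reconstruction: A = sum over m of S(m,m) * u_m * v_m'
A_sum = zeros(size(A));
for m = 1:min(size(A))
    A_sum = A_sum + S(m,m) * U(:,m) * V(:,m)';
end
disp(norm(A_sum - A))                    % ~0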
Singular Value Decomposition (SVD)
• A = U S VT
• An example, in pictures:

A = U S VT
Singular Value Decomposition (SVD)
• Geometric interpretation of the action of a matrix A on a vector
• In this example, A is square
• (Figure: the action of A decomposed as VT first, then S, then U.)
Singular Value Decomposition (SVD)
Singular Value Decomposition (SVD)
• Matrix norm
• Induced by a vector norm

• Geometric interpretation related to 2-norm


• Apply “linear operator” A to all unit-norm vectors x (starting at origin)
• Let y := A x, for all such x
• Then, among all vectors y, pick the norm of the vector y’ that has the largest norm
Singular Value Decomposition (SVD)
• Matrix norm
• Induced by a vector norm
Singular Value Decomposition (SVD)
• Existence, for any real matrix A
Singular Value Decomposition (SVD)
• Existence
• How to analyze S further ? Induction on size of A, i.e., MxN
Singular Value Decomposition (SVD)
• Properties of singular values, vectors
• What does A = U S VT imply ?
• Some insights via algebra and geometry
• Let i-th column of V be vi
• Let j-th column of U be uj
• What is Avi ? For example, take i = 2.
(assume S is at least of size 2x2)
• Av2 = USVT v2
= U S [0 1 0 … 0]T
= U [0 S22 0 … 0]T
= S22 u2
• Thus, Av1 is along u1, and, hence, orthogonal to all other columns of U
• Also, Av2 is along u2, and, hence, orthogonal to all other columns of U, …
• Also, if any vector v is orthogonal to v1, then Av is orthogonal to Av1, i.e., to u1
Singular Value Decomposition (SVD)
• Uniqueness analysis of singular values and singular vectors
• (Figure-based argument involving the vectors v1, w, x and the angle θ.)
Singular Value Decomposition (SVD)
• Uniqueness analysis
• Why is norm(B) <= norm(A) ?
• We know that A = USVT , where U and V are orthogonal, and S is as shown above
• Let β := norm(B)
• By definition of norm(B),
there exists a unit-norm column-vector y such that norm(By) = β
• Use that y to construct a longer (but still) unit-norm column vector x := V [0,yT]T
• norm (A x)
= norm ( USVT V[0,yT]T )
= norm ( S [0,yT]T )
= norm ( [ 0, (By)T ]T )
= norm (By)

• Thus, there exists a vector x such that norm(Ax) = β,
which implies that norm(A) cannot be less than β, i.e., norm(A) >= β = norm(B)
Singular Value Decomposition (SVD)
• Uniqueness analysis

• Properties of other singular values and singular vectors follows by induction


• Thus, if all singular values are distinct, then all singular vectors are unique
(upto sign)
Singular Value Decomposition (SVD)
• How does SVD help us in understanding the multivariate Gaussian ?
• Consider X := AW, where:
• Components of W are independent standard-normal. A is of size MxN, where M < N.
• We use A := USVT , where:
• S is MxN (rectangular) diagonal. U is MxM orthogonal. V is NxN orthogonal.
• AW
= USVT W
= U S W’ (where components of W’ are also independent standard-normal)
= U S’ W’’ (where S’ is square with columns as the first M columns of S,
W’’ is first M components of W’)
= A’ W’’ (where A’=US’ is MxM, and W’’ is Mx1)
• Covariance(X) = C = AAT = U SST UT = A’A’T, where:
• SST is square diagonal of size MxM
• For matrix C to be SPD, the rank of S needs to be M (M non-zero singular values)
