4.5 Principal Component Analysis
A face image can be expressed as a combination of a few basis images: $\boldsymbol{x}_n \approx z_{n1}\boldsymbol{w}_1 + z_{n2}\boldsymbol{w}_2 + z_{n3}\boldsymbol{w}_3 + z_{n4}\boldsymbol{w}_4$.

[Figure: a 2-D point cloud with a new co-ordinate axis $(\boldsymbol{e}_1, \boldsymbol{e}_2)$ placed at the data mean; $\boldsymbol{e}_1$ points along the direction of largest variance and $\boldsymbol{e}_2$ along the direction of smallest variance.]

Each input will still have 2 co-ordinates in the new co-ordinate system, equal to the distances measured from the new origin. To reduce dimension, we can keep only the co-ordinates along those directions that have the largest variances (e.g., in this example, if we want to reduce to one dimension, we can keep the co-ordinate of each point along $\boldsymbol{e}_1$ and throw away the co-ordinate along $\boldsymbol{e}_2$). We won't lose much information.
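As a rough illustration of this idea, here is a minimal sketch (assuming NumPy and synthetic 2-D data made up for this example) that finds the largest-variance direction and keeps a single co-ordinate per point:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical 2-D data, stretched along one direction so the variances differ
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
Xc = X - X.mean(axis=0)                  # move the origin to the data mean

S = Xc.T @ Xc / Xc.shape[0]              # 2x2 covariance of the centered data
eigvals, eigvecs = np.linalg.eigh(S)     # eigenvalues in ascending order
e1 = eigvecs[:, -1]                      # direction of largest variance

z = Xc @ e1                              # 1-D co-ordinate of each point along e1
print(z.var(), eigvals[-1])              # variance of the kept co-ordinate = largest eigenvalue
```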
Learning: find the projection directions $\boldsymbol{W}$ and embeddings $\boldsymbol{Z}$ that result in the smallest reconstruction error,
$$\{\hat{\boldsymbol{W}}, \hat{\boldsymbol{Z}}\} = \arg\min_{\boldsymbol{W}, \boldsymbol{Z}} \sum_{n=1}^{N} \|\boldsymbol{x}_n - \boldsymbol{W}\boldsymbol{z}_n\|^2 = \arg\min_{\boldsymbol{W}, \boldsymbol{Z}} \|\boldsymbol{X} - \boldsymbol{Z}\boldsymbol{W}^\top\|^2,$$
subject to orthonormality constraints on the projection directions: $\boldsymbol{w}_k^\top\boldsymbol{w}_k = 1$ for all $k$ and $\boldsymbol{w}_k^\top\boldsymbol{w}_{k'} = 0$ for $k \neq k'$.
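As a small sketch of this objective (assuming NumPy; `reconstruction_error` is a hypothetical helper name, and `W` is assumed to already have orthonormal columns), the optimal $\boldsymbol{z}_n$ for a fixed $\boldsymbol{W}$ is $\boldsymbol{W}^\top\boldsymbol{x}_n$, and the per-point errors sum to the squared Frobenius norm:

```python
import numpy as np

def reconstruction_error(X, W):
    """X: N x D centered data; W: D x K with (assumed) orthonormal columns."""
    Z = X @ W                                          # optimal co-ordinates z_n = W^T x_n
    per_point = np.sum((X - Z @ W.T) ** 2, axis=1)     # ||x_n - W z_n||^2 for each n
    frobenius = np.linalg.norm(X - Z @ W.T) ** 2       # ||X - Z W^T||_F^2
    assert np.isclose(per_point.sum(), frobenius)      # the two forms of the objective agree
    return frobenius
```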
Principal Component Analysis: the algorithm

1. Center the data (subtract the mean from each data point).
2. Compute the covariance matrix using the centered data matrix as $\mathbf{S} = \frac{1}{N}\mathbf{X}^\top\mathbf{X}$ (assuming $\mathbf{X}$ is arranged as $N \times D$).
3. Do an eigendecomposition of the covariance matrix (many methods exist).
4. Take the top $K$ leading eigenvectors $\{\boldsymbol{w}_1, \ldots, \boldsymbol{w}_K\}$ with eigenvalues $\{\lambda_1, \ldots, \lambda_K\}$.
5. The $K$-dimensional projection/embedding of each input $\boldsymbol{x}_n$ is $\boldsymbol{z}_n \approx \mathbf{W}_K^\top \boldsymbol{x}_n$, where $\mathbf{W}_K = [\boldsymbol{w}_1, \ldots, \boldsymbol{w}_K]$ is the "projection matrix" of size $D \times K$.

Note: We can decide how many eigenvectors to use based on how much variance we want to capture (recall that each $\lambda_k$ gives the variance in the direction $\boldsymbol{w}_k$, and their sum is the total variance).
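A minimal sketch of these steps (assuming NumPy; `pca` is a hypothetical helper name):

```python
import numpy as np

def pca(X, K):
    """X: N x D data matrix; returns the N x K embeddings, the D x K projection
    matrix W_K, and the fraction of total variance captured by the top-K directions."""
    Xc = X - X.mean(axis=0)                   # step 1: center the data
    S = Xc.T @ Xc / Xc.shape[0]               # step 2: covariance, S = X^T X / N
    eigvals, eigvecs = np.linalg.eigh(S)      # step 3: eigendecomposition (ascending order)
    order = np.argsort(eigvals)[::-1]         # sort directions by decreasing variance
    W_K = eigvecs[:, order[:K]]               # step 4: top-K eigenvectors as columns, D x K
    Z = Xc @ W_K                              # step 5: z_n = W_K^T x_n for every input
    captured = eigvals[order[:K]].sum() / eigvals.sum()
    return Z, W_K, captured
```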
The variance of the projections along a direction $\boldsymbol{w}_1$ is $\boldsymbol{w}_1^\top \mathbf{S}\,\boldsymbol{w}_1$, where $\mathbf{S} = \frac{1}{N}\sum_{n=1}^{N} \boldsymbol{x}_n \boldsymbol{x}_n^\top$ is the cov matrix of the (centered) data; the leading eigenvectors of $\mathbf{S}$ are exactly the directions that maximize this variance. PCA can also be done by doing SVD on the covariance matrix (left and right singular vectors are the same and become the eigenvectors, and the singular values become the eigenvalues).
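A small sketch (assuming NumPy and random data made up for this example) checking that the SVD of the covariance matrix recovers the same quantities as its eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                 # hypothetical N x D data
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / Xc.shape[0]                   # symmetric PSD covariance matrix

U, svals, Vt = np.linalg.svd(S)               # SVD of S
evals, evecs = np.linalg.eigh(S)              # eigendecomposition of S for comparison

# Singular values equal the eigenvalues (just ordered differently), and the
# left/right singular vectors coincide and match the eigenvectors up to sign.
print(np.allclose(np.sort(svals), np.sort(evals)))   # True
print(np.allclose(np.abs(U), np.abs(Vt.T)))          # True: left == right singular vectors
```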
Dim-Red as Matrix Factorization
If we don’t care about the orthonormality constraints, then dim-red
can also be achieved by solving a matrix factorization problem on
the data matrix
$$\underbrace{\mathbf{X}}_{N \times D} \approx \underbrace{\mathbf{Z}}_{N \times K}\ \underbrace{\mathbf{W}}_{K \times D}$$
where $\mathbf{Z}$ is the matrix containing the low-dim rep of each input:
$$\{\hat{\mathbf{Z}}, \hat{\mathbf{W}}\} = \arg\min_{\mathbf{Z}, \mathbf{W}} \|\mathbf{X} - \mathbf{Z}\mathbf{W}\|^2$$
If $K < \min(N, D)$, such a factorization gives a low-rank approximation of the data matrix $\mathbf{X}$.
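A minimal sketch of this unconstrained factorization (assuming NumPy and hypothetical random data; by the Eckart-Young theorem the optimal rank-$K$ factors can be read off a truncated SVD of $\mathbf{X}$, which is one convenient way to compute them):

```python
import numpy as np

def low_rank_factorization(X, K):
    """Return Z (N x K) and W (K x D) such that Z @ W is the best rank-K
    approximation of X in squared (Frobenius) error."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Z = U[:, :K] * s[:K]                      # low-dim representation of each row of X
    W = Vt[:K, :]                             # K basis vectors spanning the approximation
    return Z, W

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))                 # hypothetical N x D data matrix
Z, W = low_rank_factorization(X, K=3)
print(np.linalg.norm(X - Z @ W) ** 2)         # reconstruction error of the rank-3 factorization
```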