4.5 Principal Component Analysis

Topic: Principal Component Analysis

Prof. Bhabesh Deka


Dept. of ECE
Tezpur University
Dimensionality Reduction

 A broad class of techniques
 Goal is to compress the original representation of the inputs
 Example: Approximate each input $\boldsymbol{x}_n \in \mathbb{R}^D$ as a linear combination of $K$ "basis" vectors $\boldsymbol{w}_1, \ldots, \boldsymbol{w}_K$, each also in $\mathbb{R}^D$:
$$\boldsymbol{x}_n \approx \sum_{k=1}^{K} z_{nk}\,\boldsymbol{w}_k = \mathbf{W}\boldsymbol{z}_n$$
 Can think of $\mathbf{W}$ as a linear mapping that transforms the low-dim $\boldsymbol{z}_n$ into the high-dim $\boldsymbol{x}_n$. Some dim-red techniques instead assume a nonlinear mapping $f$ such that $\boldsymbol{x}_n \approx f(\boldsymbol{z}_n)$; for example, $f$ can be modeled by a kernel or a deep neural net
 Note: These "basis" vectors need not necessarily be linearly independent. But for some dim-red techniques, e.g., classic principal component analysis (PCA), they are
 We have represented each $\boldsymbol{x}_n$ by a $K$-dim vector $\boldsymbol{z}_n$ (a new feature representation)
 To store $N$ such inputs, we need to keep $\mathbf{Z}$ and $\mathbf{W}$
 Originally we required $O(ND)$ storage; now we need $O(NK + KD)$ storage
 If $K \ll D$, this yields a substantial storage saving, hence good compression (a small sketch of this comparison follows below)
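Below is a minimal NumPy sketch of the compression argument above. The shapes, the random data, and the placeholder basis are made up purely for illustration; any dim-red method would supply its own $\mathbf{W}$ and $\mathbf{Z}$.

```python
import numpy as np

# Made-up sizes for illustration: N inputs of dimension D, compressed with K basis vectors
N, D, K = 10_000, 1_024, 32

rng = np.random.default_rng(0)
X = rng.standard_normal((N, D))                    # original data: N*D numbers

# A dim-red method produces a basis W (D x K) and codes Z (N x K).
# Here W is just an orthonormal placeholder basis, not a learned one.
W = np.linalg.qr(rng.standard_normal((D, K)))[0]
Z = X @ W                                          # codes z_n = W^T x_n

X_hat = Z @ W.T                                    # reconstructions x_n ~ W z_n

print("original storage  :", X.size)               # N*D
print("compressed storage:", Z.size + W.size)      # N*K + K*D, much smaller when K << D
```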
Dimensionality Reduction

 Dim-red for face images: approximate each face image as a combination of $K=4$ "basis" face images
$$\text{A face image } \boldsymbol{x}_n \approx z_{n1}\boldsymbol{w}_1 + z_{n2}\boldsymbol{w}_2 + z_{n3}\boldsymbol{w}_3 + z_{n4}\boldsymbol{w}_4$$
 Each "basis" image $\boldsymbol{w}_k$ is like a "template" that captures the common properties of the face images in the dataset
 In this example, $\boldsymbol{z}_n = [z_{n1}, z_{n2}, z_{n3}, z_{n4}]$ is a low-dim feature representation for $\boldsymbol{x}_n$, like 4 new features
 Essentially, each face image in the dataset is now represented by just 4 real numbers
 Different dim-red algorithms differ in terms of how the basis vectors are defined/learned
Principal Component Analysis (PCA)

 A classic linear dim-reduction method (Pearson, 1901; Hotelling, 1933)
 Can be seen as
 Learning directions (co-ordinate axes) that capture the maximum variance in the data. PCA is essentially doing a change of the axes in which we are representing the data
 (Figure: data plotted in the standard co-ordinate axes $(\boldsymbol{e}_1, \boldsymbol{e}_2)$ and in the new co-ordinate axes $(\boldsymbol{w}_1, \boldsymbol{w}_2)$; the variance along $\boldsymbol{w}_1$ is large and the variance along $\boldsymbol{w}_2$ is small)
 Each input will still have 2 co-ordinates in the new co-ordinate system, equal to the distances measured from the new origin
 To reduce the dimension, we can keep only the co-ordinates along those directions that have the largest variances (e.g., in this example, if we want to reduce to one dim, we can keep the co-ordinate of each point along $\boldsymbol{w}_1$ and throw away the one along $\boldsymbol{w}_2$). We won't lose much information
 Learning projection directions that result in the smallest reconstruction error
$$\underset{\mathbf{W},\mathbf{Z}}{\arg\min}\;\sum_{n=1}^{N}\|\boldsymbol{x}_n - \mathbf{W}\boldsymbol{z}_n\|^2 \;=\; \underset{\mathbf{W},\mathbf{Z}}{\arg\min}\;\|\mathbf{X} - \mathbf{Z}\mathbf{W}^\top\|_F^2$$
subject to orthonormality constraints: $\boldsymbol{w}_k^\top\boldsymbol{w}_k = 1$ for all $k$, and $\boldsymbol{w}_k^\top\boldsymbol{w}_{k'} = 0$ for $k \neq k'$
Principal Component Analysis: the algorithm

 Center the data (subtract the mean from each data point)
 Compute the covariance matrix $\mathbf{S}$ using the centered data matrix $\mathbf{X}$ as
$$\mathbf{S} = \frac{1}{N}\mathbf{X}^\top\mathbf{X} \qquad (\text{assuming } \mathbf{X} \text{ is arranged as } N \times D)$$
 Do an eigendecomposition of the covariance matrix (many methods exist)
 Take the top $K$ leading eigenvectors $\{\boldsymbol{w}_1, \ldots, \boldsymbol{w}_K\}$ with eigenvalues $\{\lambda_1, \ldots, \lambda_K\}$
 The $K$-dimensional projection/embedding of each input $\boldsymbol{x}_n$ is
$$\boldsymbol{z}_n \approx \mathbf{W}_K^\top\boldsymbol{x}_n$$
where $\mathbf{W}_K = [\boldsymbol{w}_1, \ldots, \boldsymbol{w}_K]$ is the "projection matrix" of size $D \times K$
 Note: Can decide how many eigenvectors to use based on how much variance we want to capture (recall that each $\lambda_k$ gives the variance in the direction $\boldsymbol{w}_k$, and their sum is the total variance). A code sketch of these steps follows below
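The following is a minimal NumPy sketch of the steps listed above (the function name `pca`, the synthetic data, and the choice of `np.linalg.eigh` are illustrative, not from the slides):

```python
import numpy as np

def pca(X, K):
    """Sketch of PCA via eigendecomposition of the covariance matrix."""
    mu = X.mean(axis=0)                      # center the data
    Xc = X - mu
    S = (Xc.T @ Xc) / X.shape[0]             # covariance S = (1/N) X^T X, with X arranged as N x D
    eigvals, eigvecs = np.linalg.eigh(S)     # eigendecomposition (ascending eigenvalues)
    order = np.argsort(eigvals)[::-1]        # sort directions by decreasing variance
    W_K = eigvecs[:, order[:K]]              # projection matrix of size D x K
    lambdas = eigvals[order[:K]]             # variances captured by the kept directions
    Z = Xc @ W_K                             # embeddings z_n = W_K^T (x_n - mu)
    return Z, W_K, lambdas, mu

# Usage on synthetic data
rng = np.random.default_rng(0)
X = rng.standard_normal((500, 10)) @ rng.standard_normal((10, 10))
Z, W_K, lambdas, mu = pca(X, K=2)
print(Z.shape, W_K.shape, lambdas)           # (500, 2) (10, 2) [top-2 variances]
```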

Understanding PCA: The variance perspective


Solving PCA by Finding Max. Variance Directions

 Consider projecting an input $\boldsymbol{x}_n$ along a direction $\boldsymbol{w}_1$ (a unit-norm vector)
 The projection/embedding of $\boldsymbol{x}_n$ (red points in the figure) will be $\boldsymbol{w}_1^\top\boldsymbol{x}_n$ (green points)
 (Figure: inputs $\boldsymbol{x}_n$ and their projections onto the direction $\boldsymbol{w}_1$)
 Mean of the projections of all inputs: $\frac{1}{N}\sum_{n=1}^{N}\boldsymbol{w}_1^\top\boldsymbol{x}_n = \boldsymbol{w}_1^\top\boldsymbol{\mu}$
 Variance of the projections:
$$\frac{1}{N}\sum_{n=1}^{N}\left(\boldsymbol{w}_1^\top\boldsymbol{x}_n - \boldsymbol{w}_1^\top\boldsymbol{\mu}\right)^2 = \boldsymbol{w}_1^\top\mathbf{S}\,\boldsymbol{w}_1$$
where $\mathbf{S}$ is the covariance matrix of the data. For already centered data, $\boldsymbol{\mu} = \mathbf{0}$ and $\mathbf{S} = \frac{1}{N}\mathbf{X}^\top\mathbf{X}$
 Want $\boldsymbol{w}_1$ such that the variance is maximized and $\|\boldsymbol{w}_1\| = 1$:
$$\underset{\boldsymbol{w}_1}{\arg\max}\;\boldsymbol{w}_1^\top\mathbf{S}\,\boldsymbol{w}_1 \quad \text{s.t.} \quad \boldsymbol{w}_1^\top\boldsymbol{w}_1 = 1$$
Need this constraint, otherwise the objective's maximum is unbounded (the magnitude of $\boldsymbol{w}_1$ could be blown up arbitrarily)
Max. Variance Direction

 Our objective function was $\arg\max_{\boldsymbol{w}_1}\,\boldsymbol{w}_1^\top\mathbf{S}\,\boldsymbol{w}_1$ s.t. $\boldsymbol{w}_1^\top\boldsymbol{w}_1 = 1$ (the variance along the direction $\boldsymbol{w}_1$)
 Can construct a Lagrangian for this problem:
$$\mathcal{L}(\boldsymbol{w}_1, \lambda_1) = \boldsymbol{w}_1^\top\mathbf{S}\,\boldsymbol{w}_1 + \lambda_1\left(1 - \boldsymbol{w}_1^\top\boldsymbol{w}_1\right)$$
 Taking the derivative w.r.t. $\boldsymbol{w}_1$ and setting it to zero gives $\mathbf{S}\boldsymbol{w}_1 = \lambda_1\boldsymbol{w}_1$
 Therefore $\boldsymbol{w}_1$ is an eigenvector of the covariance matrix $\mathbf{S}$ with eigenvalue $\lambda_1$ (note: in general, $\mathbf{S}$ will have $D$ eigenvectors)
 Claim: $\boldsymbol{w}_1$ is the eigenvector of $\mathbf{S}$ with the largest eigenvalue $\lambda_1$. Note that $\boldsymbol{w}_1^\top\mathbf{S}\,\boldsymbol{w}_1 = \lambda_1\boldsymbol{w}_1^\top\boldsymbol{w}_1 = \lambda_1$
 Thus the variance will be maximized if $\lambda_1$ is the largest eigenvalue (and $\boldsymbol{w}_1$ is the corresponding top eigenvector, also known as the first Principal Component)
 Other large-variance directions can also be found likewise (with each new direction required to be orthogonal to the previously found ones); PCA would keep the top $K$ such directions of largest variances
 Note: The total variance of the data is equal to the sum of the eigenvalues of $\mathbf{S}$, i.e., $\sum_{d=1}^{D}\lambda_d$ (a small numerical check of these facts follows below)
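A small numerical check of the claims above (the synthetic data and tolerance are illustrative): the top eigenvector of $\mathbf{S}$ attains a larger projected variance $\boldsymbol{w}^\top\mathbf{S}\boldsymbol{w}$ than random unit-norm directions, and the total variance equals the sum of the eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 5)) @ rng.standard_normal((5, 5))   # correlated synthetic data
Xc = X - X.mean(axis=0)
S = Xc.T @ Xc / Xc.shape[0]                     # covariance matrix

eigvals, eigvecs = np.linalg.eigh(S)
w1 = eigvecs[:, -1]                             # eigenvector with the largest eigenvalue
print("variance along w1 :", w1 @ S @ w1)       # equals the largest eigenvalue
print("largest eigenvalue:", eigvals[-1])

for _ in range(5):                              # random unit directions never beat w1
    w = rng.standard_normal(5)
    w /= np.linalg.norm(w)
    assert w @ S @ w <= w1 @ S @ w1 + 1e-9

print("total variance == sum of eigenvalues:", np.isclose(np.trace(S), eigvals.sum()))
```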

Understanding PCA: The reconstruction perspective


Alternate Basis and Reconstruction

 Representing a data point $\boldsymbol{x}_n$ in the standard orthonormal basis $\{\boldsymbol{e}_1, \ldots, \boldsymbol{e}_D\}$:
$$\boldsymbol{x}_n = \sum_{d=1}^{D} x_{nd}\,\boldsymbol{e}_d$$
where $\boldsymbol{e}_d$ is a vector of all zeros except a single 1 at the $d$-th position. Also, $\boldsymbol{e}_d^\top\boldsymbol{e}_{d'} = 0$ for $d \neq d'$
 Let's represent the same data point in a new orthonormal basis $\{\boldsymbol{w}_1, \ldots, \boldsymbol{w}_D\}$:
$$\boldsymbol{x}_n = \sum_{d=1}^{D} z_{nd}\,\boldsymbol{w}_d$$
where $\boldsymbol{z}_n$ denotes the co-ordinates of $\boldsymbol{x}_n$ in the new basis, and $z_{nd} = \boldsymbol{x}_n^\top\boldsymbol{w}_d$ is the projection of $\boldsymbol{x}_n$ along the direction $\boldsymbol{w}_d$ since $\boldsymbol{w}_d^\top\boldsymbol{w}_d = 1$ (verify)
 Ignoring the directions along which the projection is small, we can approximate $\boldsymbol{x}_n$ as
$$\boldsymbol{x}_n \approx \hat{\boldsymbol{x}}_n = \sum_{d=1}^{K} z_{nd}\,\boldsymbol{w}_d = \sum_{d=1}^{K}\left(\boldsymbol{x}_n^\top\boldsymbol{w}_d\right)\boldsymbol{w}_d = \left(\sum_{d=1}^{K}\boldsymbol{w}_d\boldsymbol{w}_d^\top\right)\boldsymbol{x}_n$$
Note that $\|\boldsymbol{x}_n - \hat{\boldsymbol{x}}_n\|^2$ is the reconstruction error on $\boldsymbol{x}_n$; we would like to minimize it w.r.t. the $\boldsymbol{w}_d$'s
 Now $\boldsymbol{x}_n$ is represented by the $K$-dim rep. $\boldsymbol{z}_n \approx \mathbf{W}_K^\top\boldsymbol{x}_n$, where $\mathbf{W}_K = [\boldsymbol{w}_1, \ldots, \boldsymbol{w}_K]$ is the "projection matrix" of size $D \times K$. Also, $\hat{\boldsymbol{x}}_n = \mathbf{W}_K\boldsymbol{z}_n$ (verify; a short NumPy check follows below)
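A short NumPy check of the basis-change identities above (the basis here is an arbitrary orthonormal matrix obtained by QR, purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(2)
D, K = 6, 2
W, _ = np.linalg.qr(rng.standard_normal((D, D)))   # an orthonormal basis, columns w_1, ..., w_D
x = rng.standard_normal(D)

z = W.T @ x                          # co-ordinates of x in the new basis, z_d = w_d^T x
assert np.allclose(W @ z, x)         # exact recovery when all D directions are kept

W_K = W[:, :K]                       # keep only K directions
x_hat = W_K @ (W_K.T @ x)            # approximation (sum_d w_d w_d^T) x
print("reconstruction error:", np.linalg.norm(x - x_hat) ** 2)
```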
Minimizing Reconstruction Error

 We plan to use only $K < D$ directions, so we would like them to be such that the total reconstruction error is minimized:
$$\mathcal{L}(\boldsymbol{w}_1, \boldsymbol{w}_2, \ldots, \boldsymbol{w}_K) = \sum_{n=1}^{N}\|\boldsymbol{x}_n - \hat{\boldsymbol{x}}_n\|^2 = C - N\sum_{d=1}^{K}\boldsymbol{w}_d^\top\mathbf{S}\,\boldsymbol{w}_d \quad \text{(verify)}$$
where $C$ is a constant that doesn't depend on the $\boldsymbol{w}_d$'s, and each $\boldsymbol{w}_d^\top\mathbf{S}\,\boldsymbol{w}_d$ is the variance along the direction $\boldsymbol{w}_d$
 Each optimal $\boldsymbol{w}_d$ can be found by solving $\arg\max_{\boldsymbol{w}_d}\,\boldsymbol{w}_d^\top\mathbf{S}\,\boldsymbol{w}_d$ s.t. $\boldsymbol{w}_d^\top\boldsymbol{w}_d = 1$
 Thus minimizing the reconstruction error is equivalent to maximizing variance
 The directions can be found by solving the eigendecomposition of $\mathbf{S}$
 Note: $\sum_{n=1}^{N}\|\boldsymbol{x}_n - \hat{\boldsymbol{x}}_n\|^2 = \|\mathbf{X} - \mathbf{Z}\mathbf{W}_K^\top\|_F^2$. Thus $\arg\min_{\mathbf{W},\mathbf{Z}}\|\mathbf{X} - \mathbf{Z}\mathbf{W}^\top\|_F^2$ s.t. orthonormality on the columns of $\mathbf{W}$ is the same as solving the eigendecomposition problem above (a numerical check follows below)
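As an illustrative check of this equivalence (synthetic data, not from the slides): for centered data, the total reconstruction error with the top-$K$ eigenvectors equals $N$ times the sum of the discarded eigenvalues, i.e., minimizing the error means keeping the largest-variance directions.

```python
import numpy as np

rng = np.random.default_rng(3)
N, D, K = 400, 8, 3
X = rng.standard_normal((N, D)) @ rng.standard_normal((D, D))
Xc = X - X.mean(axis=0)                          # centered data

S = Xc.T @ Xc / N
eigvals, eigvecs = np.linalg.eigh(S)             # ascending eigenvalues
W_K = eigvecs[:, -K:]                            # top-K directions

X_hat = Xc @ W_K @ W_K.T                         # reconstructions
err = np.sum((Xc - X_hat) ** 2)                  # total reconstruction error
print(err, N * eigvals[:-K].sum())               # the two numbers match
```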
Principal Component Analysis

 Center the data (subtract the mean from each data point)
 Compute the covariance matrix $\mathbf{S}$ using the centered data matrix $\mathbf{X}$ as
$$\mathbf{S} = \frac{1}{N}\mathbf{X}^\top\mathbf{X} \qquad (\text{assuming } \mathbf{X} \text{ is arranged as } N \times D)$$
 Do an eigendecomposition of the covariance matrix (many methods exist)
 Take the top $K$ leading eigenvectors $\{\boldsymbol{w}_1, \ldots, \boldsymbol{w}_K\}$ with eigenvalues $\{\lambda_1, \ldots, \lambda_K\}$
 The $K$-dimensional projection/embedding of each input $\boldsymbol{x}_n$ is
$$\boldsymbol{z}_n \approx \mathbf{W}_K^\top\boldsymbol{x}_n$$
where $\mathbf{W}_K = [\boldsymbol{w}_1, \ldots, \boldsymbol{w}_K]$ is the "projection matrix" of size $D \times K$
 Note: Can decide how many eigenvectors to use based on how much variance we want to capture (recall that each $\lambda_k$ gives the variance in the direction $\boldsymbol{w}_k$, and their sum is the total variance)
Singular Value Decomposition (SVD)

 Any matrix $\mathbf{X}$ of size $N \times D$ can be represented as the following decomposition
$$\mathbf{X} = \mathbf{U}\boldsymbol{\Lambda}\mathbf{V}^\top = \sum_{k=1}^{\min\{N,D\}}\lambda_k\,\boldsymbol{u}_k\boldsymbol{v}_k^\top$$
 $\mathbf{U} = [\boldsymbol{u}_1, \ldots, \boldsymbol{u}_N]$ is the $N \times N$ matrix of left singular vectors, each $\boldsymbol{u}_k \in \mathbb{R}^N$; $\mathbf{U}$ is also orthonormal
 $\mathbf{V} = [\boldsymbol{v}_1, \ldots, \boldsymbol{v}_D]$ is the $D \times D$ matrix of right singular vectors, each $\boldsymbol{v}_k \in \mathbb{R}^D$; $\mathbf{V}$ is also orthonormal
 $\boldsymbol{\Lambda}$ is $N \times D$ with only $\min\{N,D\}$ diagonal entries, the singular values. It is a diagonal matrix: if $N > D$, the last $N - D$ rows are all zeros; if $D > N$, the last $D - N$ columns are all zeros
 Note: If $\mathbf{X}$ is symmetric, then this is known as the eigenvalue decomposition (a small NumPy sketch of the SVD follows below)
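A small NumPy sketch of the decomposition above (the shapes and the random matrix are made up; `np.linalg.svd` is the standard NumPy routine):

```python
import numpy as np

rng = np.random.default_rng(4)
N, D = 7, 4
X = rng.standard_normal((N, D))

U, s, Vt = np.linalg.svd(X, full_matrices=True)   # U: N x N, s: min(N,D) singular values, Vt: D x D
Lam = np.zeros((N, D))                            # N x D "diagonal" matrix of singular values
Lam[:len(s), :len(s)] = np.diag(s)

assert np.allclose(U @ Lam @ Vt, X)               # X = U Lambda V^T
assert np.allclose(U.T @ U, np.eye(N))            # U is orthonormal
assert np.allclose(Vt @ Vt.T, np.eye(D))          # V is orthonormal
```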
Low-Rank Approximation via SVD

 If we just use the top $K$ singular values, we get a rank-$K$ SVD approximation
$$\mathbf{X} \approx \mathbf{X}_K = \sum_{k=1}^{K}\lambda_k\,\boldsymbol{u}_k\boldsymbol{v}_k^\top$$
 The above SVD approximation can be shown to minimize the reconstruction error $\|\mathbf{X} - \mathbf{X}_K\|_F^2$
 Fact: SVD gives the best rank-$K$ approximation of a matrix
 PCA is done by doing SVD on the covariance matrix (the left and right singular vectors are the same and become the eigenvectors, and the singular values become the eigenvalues); a short truncation sketch follows below
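An illustrative sketch of the rank-$K$ truncation on random data; the reconstruction error equals the sum of the squared discarded singular values.

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((50, 20))
K = 5

U, s, Vt = np.linalg.svd(X, full_matrices=False)
X_K = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]         # best rank-K approximation in Frobenius norm

print(np.sum((X - X_K) ** 2), np.sum(s[K:] ** 2))   # the two numbers match
```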
Dim-Red as Matrix Factorization

 If we don't care about the orthonormality constraints, then dim-red can also be achieved by solving a matrix factorization problem on the data matrix: approximate the $N \times D$ matrix $\mathbf{X}$ by the product of an $N \times K$ matrix $\mathbf{Z}$ (containing the low-dim reps of the inputs) and a $K \times D$ matrix $\mathbf{W}$
$$\mathbf{X}_{N \times D} \approx \mathbf{Z}_{N \times K}\,\mathbf{W}_{K \times D}$$
$$\{\hat{\mathbf{Z}}, \hat{\mathbf{W}}\} = \underset{\mathbf{Z},\mathbf{W}}{\arg\min}\;\|\mathbf{X} - \mathbf{Z}\mathbf{W}\|^2$$
 If $K < \min\{N, D\}$, such a factorization gives a low-rank approximation of the data matrix $\mathbf{X}$
 Can solve such problems using ALT-OPT (alternating optimization); a sketch follows below
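A minimal ALT-OPT (alternating least squares) sketch for the unconstrained factorization above. The sizes, the synthetic low-rank data, and the fixed iteration count are made up for illustration; this is a sketch, not a tuned implementation.

```python
import numpy as np

rng = np.random.default_rng(6)
N, D, K = 200, 30, 5
X = rng.standard_normal((N, K)) @ rng.standard_normal((K, D))   # synthetic low-rank data

Z = rng.standard_normal((N, K))                  # random initialization
W = rng.standard_normal((K, D))
for _ in range(50):
    # Fix W, solve the least-squares problem for Z (min_Z ||X - Z W||^2)
    Z = np.linalg.lstsq(W.T, X.T, rcond=None)[0].T
    # Fix Z, solve the least-squares problem for W (min_W ||X - Z W||^2)
    W = np.linalg.lstsq(Z, X, rcond=None)[0]

print("reconstruction error:", np.linalg.norm(X - Z @ W) ** 2)
```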
