
CSC 411: Lecture 14: Principal Components Analysis & Autoencoders

Raquel Urtasun & Rich Zemel

University of Toronto

Nov 4, 2015

Today

Dimensionality Reduction
PCA
Autoencoders

Mixture models and Distributed Representations

One problem with mixture models: each observation is assumed to come from one of K prototypes
The constraint that only one component is active (responsibilities sum to one) limits representational power
Alternative: a distributed representation, with several latent variables relevant to each observation
These can be several binary/discrete variables, or continuous

Example: continuous underlying variables

What are the intrinsic latent dimensions in these two datasets?

How can we find these dimensions from the data?

Principal Components Analysis

PCA: most popular instance of the second main class of unsupervised learning methods, projection methods, aka dimensionality-reduction methods
Aim: find a small number of "directions" in input space that explain variation in the input data; re-represent data by projecting along those directions
Important assumption: variation contains information
Data is assumed to be continuous:
- linear relationship between data and learned representation

PCA: Common tool

Handles high-dimensional data
- if data has thousands of dimensions, it can be difficult for a classifier to deal with
Often the data can be described by a much lower dimensional representation
Useful for:
- Visualization
- Preprocessing
- Modeling: prior for new data
- Compression

PCA: Intuition
Assume we start with N data vectors of dimensionality D
Aim to reduce dimensionality:
- linearly project (multiply by a matrix) to a much lower dimensional space, M << D
Search for orthogonal directions in space with highest variance
- project data onto this subspace
Structure of the data vectors is encoded in the sample covariance

Finding principal components

To find the principal component directions, we center the data (subtract the sample mean from each variable)
Calculate the empirical covariance matrix:

$$C = \frac{1}{N}\sum_{n=1}^{N} (x^{(n)} - \bar{x})(x^{(n)} - \bar{x})^T$$

with $\bar{x}$ the mean

What's the dimensionality of x?
Find the M eigenvectors with the largest eigenvalues of C: these are the principal components
Assemble these eigenvectors into a D × M matrix U
We can now express D-dimensional vectors x by projecting them to the M-dimensional z

$$z = U^T x$$
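A minimal numpy sketch of this procedure (the names pca_fit, pca_project, X, and M are illustrative, not from the lecture): center the data, form the sample covariance, keep the eigenvectors with the largest eigenvalues, and project onto them.

```python
import numpy as np

def pca_fit(X, M):
    """Fit PCA by eigendecomposition of the sample covariance.

    X: (N, D) data matrix, one D-dimensional example per row.
    M: number of principal components to keep (M << D).
    Returns the sample mean (D,), the D x M matrix U of top eigenvectors,
    and the corresponding eigenvalues (M,).
    """
    x_bar = X.mean(axis=0)                 # sample mean
    Xc = X - x_bar                         # center the data
    C = Xc.T @ Xc / X.shape[0]             # D x D covariance: (1/N) sum (x - x_bar)(x - x_bar)^T
    eigvals, eigvecs = np.linalg.eigh(C)   # eigh: C is symmetric; eigenvalues come back ascending
    order = np.argsort(eigvals)[::-1][:M]  # indices of the M largest eigenvalues
    return x_bar, eigvecs[:, order], eigvals[order]

def pca_project(X, x_bar, U):
    """Project D-dimensional rows of X to M-dimensional codes z = U^T (x - x_bar)."""
    return (X - x_bar) @ U

# Toy usage: 500 points in D = 5 dimensions, keep M = 2 components.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
x_bar, U, lam = pca_fit(X, M=2)
Z = pca_project(X, x_bar, U)               # (500, 2) latent representation
print(Z.shape, lam)
```
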
Standard PCA

Algorithm: to find M components underlying D-dimensional data

1. Select the top M eigenvectors of C (the data covariance matrix):

$$C = \frac{1}{N}\sum_{n=1}^{N} (x^{(n)} - \bar{x})(x^{(n)} - \bar{x})^T = U \Sigma U^T \approx U_{1:M} \Sigma_{1:M} U_{1:M}^T$$

where U is orthogonal, its columns being the unit-length eigenvectors,

$$U^T U = U U^T = I$$

and Σ is the diagonal matrix of eigenvalues, each giving the variance in the direction of its eigenvector

2. Project each input vector x into this subspace, e.g.,

$$z_j = u_j^T x; \qquad z = U_{1:M}^T x$$

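As a sanity check (an illustrative sketch, not part of the lecture, reusing the hypothetical pca_fit from the previous sketch): the same directions can be obtained from an SVD of the centered data matrix. They agree up to a sign flip per component, and the eigenvalues of C equal the squared singular values divided by N.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
x_bar, U, lam = pca_fit(X, M=2)                    # eigendecomposition-based components

Xc = X - x_bar
_, S, Vt = np.linalg.svd(Xc, full_matrices=False)  # rows of Vt: unit-length principal directions
U_svd = Vt[:2].T                                   # top M = 2 directions as a D x M matrix

signs = np.sign(np.sum(U * U_svd, axis=0))         # resolve the per-component sign ambiguity
assert np.allclose(U, U_svd * signs)
assert np.allclose(lam, S[:2] ** 2 / Xc.shape[0])  # eigenvalues of C = squared singular values / N
```
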
Two Derivations of PCA

Two views/derivations (illustrated by a figure on the original slide):
- Maximize variance (the scatter of the green projected points)
- Minimize error (the red-green distance per datapoint)

PCA: Minimizing Reconstruction Error

We can think of PCA as projecting the data onto a lower-dimensional subspace
One derivation is that we want to find the projection such that the best linear reconstruction of the data is as close as possible to the original data

$$J = \sum_n \|x^{(n)} - \tilde{x}^{(n)}\|^2$$

where

$$\tilde{x}^{(n)} = \sum_{j=1}^{M} z_j^{(n)} u_j + \sum_{j=M+1}^{D} b_j u_j$$

The objective is minimized when the first M components are the eigenvectors with the maximal eigenvalues

$$z_j^{(n)} = (x^{(n)})^T u_j; \qquad b_j = \bar{x}^T u_j$$

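A short sketch of this reconstruction (reusing the illustrative pca_fit/pca_project helpers from above; the toy data and names are not from the lecture). With C defined with a 1/N factor as above, the minimized error J equals N times the sum of the discarded eigenvalues, which the final check confirms numerically.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5)) @ rng.normal(size=(5, 5))
M = 2
x_bar, U, lam = pca_fit(X, M)
Z = pca_project(X, x_bar, U)

# x_tilde = U z + x_bar, which equals the slide's sum over kept components
# plus the mean's contribution b_j = x_bar^T u_j along the discarded ones.
X_tilde = Z @ U.T + x_bar
J = np.sum((X - X_tilde) ** 2)            # J = sum_n ||x^(n) - x_tilde^(n)||^2

# The error equals N times the sum of the D - M discarded eigenvalues of C.
C = (X - x_bar).T @ (X - x_bar) / X.shape[0]
all_lam = np.sort(np.linalg.eigvalsh(C))[::-1]
assert np.allclose(J, X.shape[0] * all_lam[M:].sum())
print(J)
```
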
Applying PCA to faces

Run PCA on 2429 19x19 grayscale images (CBCL data)
Compresses the data: can get good reconstructions with only 3 components
PCA for pre-processing: can apply a classifier to the latent representation (see the sketch below)
- PCA with 3 components obtains 79% accuracy on face/non-face discrimination in test data vs. 76.8% for a mixture of Gaussians with 84 states
Can also be good for visualization

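A sketch of that pre-processing pipeline, assuming the face/non-face images are already loaded as flattened rows of X_train/X_test with labels y_train/y_test. These names, the random placeholder arrays, and the choice of scikit-learn's PCA and LogisticRegression are illustrative; the lecture does not specify which classifier produced the numbers above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

# Hypothetical setup: X_train/X_test hold flattened 19x19 grayscale images
# (shape (n_images, 361)), y_train/y_test hold face / non-face labels.
# Random placeholder arrays are used here just so the sketch runs end to end.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 361)), rng.integers(0, 2, size=200)
X_test, y_test = rng.normal(size=(100, 361)), rng.integers(0, 2, size=100)

# Reduce each image to a 3-dimensional latent code, then classify in that space.
pca = PCA(n_components=3).fit(X_train)
Z_train = pca.transform(X_train)
Z_test = pca.transform(X_test)

clf = LogisticRegression(max_iter=1000).fit(Z_train, y_train)
print("test accuracy:", clf.score(Z_test, y_test))
```
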
Applying PCA to faces: Learned basis

Applying PCA to digits

Relation to Neural Networks

PCA is closely related to a particular form of neural network
An autoencoder is a neural network trained to reproduce its own inputs at its outputs
The goal is to minimize reconstruction error

Autoencoders

Define

$$z = f(Wx); \qquad \hat{x} = g(Vz)$$

Goal:

$$\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \|x^{(n)} - \hat{x}^{(n)}\|^2$$

If g and f are linear,

$$\min_{W,V} \frac{1}{2N} \sum_{n=1}^{N} \|x^{(n)} - V W x^{(n)}\|^2$$

In other words, the optimal solution is PCA.

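A minimal gradient-descent sketch of the linear case (all names, the toy data, and the learning-rate/step choices are illustrative, not from the lecture). With f and g the identity, minimizing the objective above drives the columns of V to span the same subspace as the top-M principal components, though V itself need not equal U.

```python
import numpy as np

# Linear autoencoder z = W x, x_hat = V z, trained by gradient descent on
# (1/2N) sum_n ||x^(n) - V W x^(n)||^2. Illustrative sketch on toy data.
rng = np.random.default_rng(0)
N, D, M = 500, 5, 2
X = rng.normal(size=(N, D)) * np.array([3.0, 2.0, 1.0, 0.5, 0.1])  # distinct variances per dimension
X = X - X.mean(axis=0)                         # centered data, as in PCA

W = 0.01 * rng.normal(size=(M, D))             # encoder weights
V = 0.01 * rng.normal(size=(D, M))             # decoder weights
lr = 0.02
for step in range(10000):
    Z = X @ W.T                                # codes z^(n), shape (N, M)
    X_hat = Z @ V.T                            # reconstructions, shape (N, D)
    R = X_hat - X                              # residuals
    grad_V = R.T @ Z / N                       # gradient of the objective w.r.t. V
    grad_W = V.T @ R.T @ X / N                 # gradient of the objective w.r.t. W
    V -= lr * grad_V
    W -= lr * grad_W

# Compare the learned subspace (column space of V) with the top-M principal subspace.
C = X.T @ X / N
U = np.linalg.eigh(C)[1][:, ::-1][:, :M]       # top-M eigenvectors of the covariance
P_pca = U @ U.T                                # projector onto the principal subspace
Q, _ = np.linalg.qr(V)                         # orthonormal basis for the autoencoder's subspace
print("subspace difference:", np.linalg.norm(P_pca - Q @ Q.T))  # should be close to zero
```
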
Autoencoders: Nonlinear PCA

What if g(·) is not linear?
Then we are basically doing nonlinear PCA
There are some subtleties, but in general this is an accurate description

Comparing Reconstructions

