CSC 411: 14 - PCA & Autoencoders
Urtasun & Zemel
University of Toronto
Nov 4, 2015
Today
Dimensionality Reduction
PCA
Autoencoders
Mixture models and Distributed Representations
One problem with mixture models: each observation is assumed to come from just one of K prototypes
The constraint that only one prototype is active (the responsibilities sum to one) limits representational power
Alternative: a distributed representation, with several latent variables relevant to each observation
These can be several binary/discrete variables, or continuous ones (see the sketch below)
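To make the contrast concrete, here is a minimal sketch; the numbers are invented purely for illustration:

import numpy as np

# Mixture model: each observation is explained by exactly one of K prototypes,
# so its latent code is (softly) one-hot -- the responsibilities sum to one.
responsibilities = np.array([0.02, 0.95, 0.03])  # K = 3; essentially "prototype 2"
assert np.isclose(responsibilities.sum(), 1.0)

# Distributed representation: several latent variables are active at once,
# each capturing a different factor of the same observation.
distributed_code = np.array([0.7, -1.2, 0.3])    # M = 3 continuous latents, unconstrained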
Example: continuous underlying variables
Principal Components Analysis
PCA: Common tool
PCA: Intuition
Assume we start with N data vectors, each of dimensionality D
Aim to reduce dimensionality:
  - linearly project (multiply by a matrix) to a much lower dimensional space, M << D
Search for the orthogonal directions in the space with the highest variance
  - project the data onto this subspace
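A minimal sketch of what "reduce dimensionality" means mechanically (all sizes here are invented for illustration): the projection is just a matrix multiply, and PCA's job is to choose a particular M x D matrix.

import numpy as np

N, D, M = 1000, 100, 5
X = np.random.randn(N, D)   # N data vectors of dimensionality D
W = np.random.randn(M, D)   # some linear map; PCA picks a special one
Z = X @ W.T                 # projected data, now N x M
print(Z.shape)              # (1000, 5)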
Finding principal components
To find the principal component directions, we center the data (subtract the
sample mean from each variable)
Calculate the empirical covariance matrix:
$$C = \frac{1}{N} \sum_{n=1}^{N} \left(x^{(n)} - \bar{x}\right)\left(x^{(n)} - \bar{x}\right)^T$$
The principal directions are the eigenvectors of C; stacking them as the columns of a matrix U gives an orthonormal basis:
$$U^T U = U U^T = I$$
Each code coordinate is the projection onto one eigenvector, and keeping only the top M eigenvectors gives the low-dimensional code:
$$z_j = u_j^T x, \qquad z = U_{1:M}^T x$$
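A minimal NumPy sketch of these steps (function and variable names are my own; X is an N x D data matrix with one example per row):

import numpy as np

def principal_components(X):
    x_bar = X.mean(axis=0)              # sample mean
    Xc = X - x_bar                      # center the data
    C = (Xc.T @ Xc) / X.shape[0]        # empirical covariance matrix
    eigvals, U = np.linalg.eigh(C)      # C is symmetric; eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]   # sort directions by decreasing variance
    return U[:, order], eigvals[order], x_bar

def project(X, U, M):
    return X @ U[:, :M]                 # z = U_{1:M}^T x, as on the slide

# Usage: U, lam, x_bar = principal_components(X); Z = project(X, U, M=2)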
Two Derivations of PCA
Two views/derivations:
  - Maximize variance (scatter of green points)
  - Minimize error (red-green distance per datapoint)
PCA: Minimizing Reconstruction Error
The second view minimizes the average reconstruction error
$$J = \frac{1}{N} \sum_{n=1}^{N} \left\| x^{(n)} - \tilde{x}^{(n)} \right\|^2$$
where
$$\tilde{x}^{(n)} = \sum_{j=1}^{M} z_j^{(n)} u_j + \sum_{j=M+1}^{D} b_j u_j$$
The objective is minimized when the first M components are the eigenvectors with the maximal eigenvalues, with coefficients
$$z_j^{(n)} = \left(x^{(n)}\right)^T u_j, \qquad b_j = \bar{x}^T u_j$$
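Continuing the NumPy sketch from above, reconstruction with the top M components plus the bias terms could look like this (again a sketch, not the course's reference code):

def reconstruct(X, U, x_bar, M):
    Z = X @ U[:, :M]                        # z_j = (x^(n))^T u_j for j <= M
    b = x_bar @ U[:, M:]                    # b_j = x_bar^T u_j for j > M
    return Z @ U[:, :M].T + b @ U[:, M:].T  # x_tilde = sum of both terms

With these choices, the average squared error equals the sum of the discarded eigenvalues, which is why keeping the largest-eigenvalue directions is optimal.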
Applying PCA to faces
Applying PCA to faces: Learned basis
Applying PCA to digits
Relation to Neural Networks
Autoencoders
Define
$$z = f(Wx), \qquad \hat{x} = g(Vz)$$
Goal:
$$\min_{W,V} \; \frac{1}{2N} \sum_{n=1}^{N} \left\| x^{(n)} - \hat{x}^{(n)} \right\|^2$$
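A minimal sketch of this objective in PyTorch (the layer sizes and the choice of a sigmoid for f with a linear g are my own illustrative choices; the slide leaves f and g unspecified):

import torch
import torch.nn as nn

D, M = 784, 30                          # e.g. flattened 28x28 images -> 30-d code

class Autoencoder(nn.Module):
    def __init__(self):
        super().__init__()
        self.W = nn.Linear(D, M)        # encoder
        self.V = nn.Linear(M, D)        # decoder

    def forward(self, x):
        z = torch.sigmoid(self.W(x))    # z = f(Wx)
        return self.V(z)                # x_hat = g(Vz), with g linear here

model = Autoencoder()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()                  # matches the objective up to a constant factor

def train_step(x_batch):
    opt.zero_grad()
    loss = loss_fn(model(x_batch), x_batch)  # the target is the input itself
    loss.backward()
    opt.step()
    return loss.item()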
Autoencoders: Nonlinear PCA
Comparing Reconstructions