Lecture 27: Dimensionality Reduction
Principal Component Analysis (PCA)
Teacher: Gianni A. Di Caro
Data are (usually) high-dimensional
- A lot of features describing the inputs
  → High-dimensional spaces to represent, store, and manipulate the data!
  → Corrupted, noisy, missing, and latent data, ...
Curse of dimensionality
- Computational and memory challenges: it's hard to store and process the data
- Statistical / learning challenges: the complexity of decision rules tends to grow with the number of features
  → Complex rules are harder to learn, since learning them requires more data.
Dimensionality reduction can help
- One approach: regularization (MAP formulation): integrate feature selection into the learning objective by penalizing solutions in which all (or most) feature weights are non-zero or large. An L1 example is sketched below.
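As an illustration of this approach (my own sketch, not from the slides), L1 regularization is one common instance: in the MAP view, an L1 penalty corresponds to a Laplace prior on the weights and drives most of them to exactly zero, performing implicit feature selection. A minimal example with scikit-learn's Lasso:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)

# 100 samples, 50 features, but only 3 features actually matter
X = rng.normal(size=(100, 50))
w_true = np.zeros(50)
w_true[:3] = [3.0, -2.0, 1.5]
y = X @ w_true + 0.1 * rng.normal(size=100)

# L1-penalized least squares (MAP estimate under a Laplace prior)
model = Lasso(alpha=0.1).fit(X, y)

# Most learned weights are exactly zero: implicit feature selection
print("non-zero weights:", np.sum(model.coef_ != 0))
```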
Latent features
- Linear/nonlinear combinations of observed features provide a more efficient representation, and capture the underlying relations that govern the data better than the directly observed features
  o Ego, personality, and intelligence are hidden attributes that characterize human behavior better than the attributes obtained from survey questions
  o Topics (sports, science, news) are more efficient descriptors of documents than the individual words
  o Often a physical meaning is not obvious
A simple model for dimensionality reduction / compression
Subspace spanned by a set of vectors
Principal Component Analysis (PCA)
Principal Component Analysis (PCA): key idea
- When the data lies on or near a low 𝑑-dimensional linear subspace, the axes of this subspace are an effective representation of the data
- Identifying these axes is known as Principal Component Analysis; they can be obtained by eigendecomposition or singular value decomposition (SVD)
- We can change the basis in which we represent the data (and get a new coordinate system)
- If, in the new basis, the data has low variance along some dimensions, we can ignore those dimensions (see the sketch after this list)
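A minimal numerical sketch of this idea (my own illustration, not from the slides): generate data lying near a 1-dimensional subspace of R^2, change basis using the SVD, and observe that almost all the variance falls along the first new axis:

```python
import numpy as np

rng = np.random.default_rng(0)

# Data near a 1-D subspace of R^2: large spread along one direction, tiny noise off it
N = 500
t = rng.normal(size=N)
X = np.column_stack([t, 0.05 * rng.normal(size=N)])
theta = 0.6  # tilt the subspace so neither original axis is aligned with it
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = X @ R.T

Xc = X - X.mean(axis=0)        # center the data
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
Z = Xc @ Vt.T                  # change of basis: coordinates along the principal axes

print("variance per original axis:", Xc.var(axis=0))  # spread over both axes
print("variance per new axis:     ", Z.var(axis=0))   # concentrated in the first axis
```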
Basis Representation of Data
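The slide's derivation is lost in the extraction; the standard fact the title refers to is that any point x in R^D can be written exactly in an orthonormal basis {b_1, ..., b_D}, with coefficients obtained by projection:

```latex
x = \sum_{i=1}^{D} z_i \, b_i, \qquad z_i = b_i^\top x
```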
Selecting directions, keeping only a few of them
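Continuing in the same notation (my reconstruction of the slide's point): keeping only the first d < D directions gives an approximation whose squared error is exactly the energy of the discarded coefficients,

```latex
\hat{x} = \sum_{i=1}^{d} z_i \, b_i,
\qquad
\|x - \hat{x}\|^2 = \sum_{i=d+1}^{D} z_i^2
```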
Variance captured by projections (assuming the data are centered)
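The slide's formula is not recoverable here; the standard result behind the title: for centered data x_1, ..., x_N and a unit vector w, the variance of the projections w^T x_i is a quadratic form in the sample covariance matrix Σ,

```latex
\frac{1}{N} \sum_{i=1}^{N} \bigl(w^\top x_i\bigr)^2
  = w^\top \Bigl(\frac{1}{N} \sum_{i=1}^{N} x_i x_i^\top\Bigr) w
  = w^\top \Sigma \, w
```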
Direction of maximum variance
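The derivation itself is lost in the extraction; the standard argument this title refers to: maximize the captured variance over unit vectors, and the stationarity condition of the Lagrangian shows the optimum is an eigenvector of Σ,

```latex
\max_{\|w\| = 1} \; w^\top \Sigma \, w,
\qquad
\mathcal{L}(w, \lambda) = w^\top \Sigma w - \lambda\,(w^\top w - 1),
\qquad
\nabla_w \mathcal{L} = 0 \;\Rightarrow\; \Sigma w = \lambda w
```

At such a w the captured variance equals the eigenvalue λ, so the direction of maximum variance is the eigenvector of Σ with the largest eigenvalue.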
PCA
PCA algorithm
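The algorithm's steps are not legible in this extraction; a minimal sketch of the standard eigendecomposition-based PCA procedure, consistent with the derivation above (function and variable names are mine):

```python
import numpy as np

def pca(X, d):
    """Project an N x D data matrix X onto its top-d principal components."""
    mu = X.mean(axis=0)
    Xc = X - mu                               # 1. center the data
    Sigma = (Xc.T @ Xc) / len(X)              # 2. sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # 3. eigendecomposition (Sigma is symmetric)
    order = np.argsort(eigvals)[::-1]         # 4. sort directions by decreasing variance
    W = eigvecs[:, order[:d]]                 # 5. top-d eigenvectors as columns
    Z = Xc @ W                                # 6. new coordinates along the principal axes
    return Z, W, mu, eigvals[order]
```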
Dimensionality reduction
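In the notation of the sketch above (mine, not the slides'): the d-dimensional code and its reconstruction are

```latex
z = W^\top (x - \mu), \qquad \hat{x} = \mu + W z,
\qquad
\frac{1}{N} \sum_{i=1}^{N} \|x_i - \hat{x}_i\|^2 \;=\; \sum_{j=d+1}^{D} \lambda_j
```

so the average reconstruction error equals the sum of the eigenvalues of the discarded directions, which is why dropping low-variance directions loses little.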
Example of PCA
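The example's figures are not recoverable from this extraction; as a stand-in, a small runnable demo (mine, using scikit-learn) that mirrors the usual worked example of projecting correlated 2-D data onto its first principal component:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)

# Correlated 2-D Gaussian data
X = rng.multivariate_normal(mean=[0, 0],
                            cov=[[3.0, 1.4], [1.4, 1.0]],
                            size=300)

pca = PCA(n_components=1)
Z = pca.fit_transform(X)           # 1-D codes along the top principal axis
X_hat = pca.inverse_transform(Z)   # reconstructions back in R^2

print("principal direction:", pca.components_[0])
print("variance explained: %.1f%%" % (100 * pca.explained_variance_ratio_[0]))
print("mean reconstruction error:", np.mean(np.sum((X - X_hat) ** 2, axis=1)))
```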
Kernel PCA
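Only the title survives for this slide. Briefly: kernel PCA applies PCA in a feature space induced by a kernel k(x, x') rather than on the raw coordinates, which lets it capture nonlinear structure. A minimal sketch with scikit-learn (my choice of data and kernel, as an assumption about the intended illustration):

```python
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Concentric circles: structure that linear PCA directions cannot capture
X, y = make_circles(n_samples=300, factor=0.3, noise=0.05, random_state=0)

# RBF-kernel PCA "unfolds" the circles in feature space
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
Z = kpca.fit_transform(X)
print(Z.shape)  # (300, 2): coordinates along the top kernel principal components
```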