
Disclaimer: These slides can include material from different sources. I’ll be happy to
explicitly acknowledge a source if required. Contact me for requests.

Introduction to Machine Learning


10-315 Fall ‘19

Lecture 27:
Dimensionality Reduction
Principal Component Analysis (PCA)

Teacher: Gianni A. Di Caro
Data are (usually) high-dimensional
• A lot of features describing the inputs
  → High-dimensional spaces to represent, store, and manipulate the data!
  → Corrupted, noisy, missing and latent data, …

• The Internet is an ever-increasing source of high-dimensional data to learn from
  o Document classification: features per document = thousands of words/unigrams,
    millions of bigrams, contextual information, …
  o Surveys (for learning customers’ needs/wishes): …
Data are (usually) high-dimensional

v Some (basic) data types are inherently high-dimensional

o High-resolution images: millions of


multi-channel pixels

o Medical imaging : E.g., diffusion scans


of brain with ~ 300,000 brain fibers

3
Curse of dimensionality

v Having a large number of features is potentially bad:

q If not precisely selected, many features can be redundant


(e.g., not all words are really useful to classify a document)
o Large noise added to the main (useful) signal

q Difficult to interpret and visualize

q Computational and memory challenges: it’s hard to store and process the data

q Statistical / learning challenges: Complexity of decision rules tend to grow with #features
→ Hard to learn complex rules as it needs more data.

4
Dimensionality reduction can help

• Feature selection: Select the features that are truly relevant for the task
  (figure annotation: one of the features is irrelevant!)

• Latent features: Some linear/nonlinear combination of features provides a more efficient
  representation than the directly observed features

  Data is actually embedded in lower-dimensional subspaces or manifolds


Feature selection

• One approach: Regularization (MAP): integrate feature selection into the learning objective
  by penalizing solutions where all (or most of the) feature weights take non-zero or large values
  (MAP formulation)
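As a concrete illustration (not from the slides), the sketch below uses L1 (Lasso) regularization, one common way to realize this idea: the L1 penalty corresponds to a Laplace prior in the MAP view and drives the weights of irrelevant features to zero. The synthetic data, the number of features, and the value of alpha are arbitrary choices for the example.

import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))               # 10 candidate features
w_true = np.array([3.0, -2.0] + [0.0] * 8)   # only the first two features actually matter
y = X @ w_true + 0.1 * rng.normal(size=200)

model = Lasso(alpha=0.1).fit(X, y)           # L1 penalty zeroes out weights of irrelevant features
print(model.coef_)                           # most of the 10 learned weights come out (near) zero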
Latent features
• Linear/nonlinear combinations of the observed features provide a more efficient representation
  and capture the underlying relations that govern the data better than the directly observed features
  o Ego, personality, and intelligence are hidden attributes that characterize human behavior better
    than the attributes from survey questions
  o Topics (sports, science, news) are more efficient descriptors than individual document words
  o Often a physical meaning is not obvious

• Linear (for data that can be approximated by a linear subspace):
  ✓ Principal Component Analysis (PCA)
  ✓ Factor Analysis
  ✓ Independent Component Analysis (ICA)
• Nonlinear:
  ✓ Kernel PCA
  ✓ Laplacian Eigenmaps
  ✓ ISOMAP, Locally Linear Embedding (LLE)
A simple model for dimensionality reduction / compression

o 𝑧_nk quantifies the amount of vector 𝒘_k present in observation 𝒙_n
o 𝑾 is a matrix of reference prototype vectors
o The 𝐾 vectors in 𝑾 identify a reference system, a 𝐾-dimensional vector subspace where the
  higher-dimensional vectors 𝒙_n can be effectively represented (i.e., approximated)
o The 𝑧_nk are the components of the 𝒙_n vectors in the subspace identified by 𝑾
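The equation that these bullets annotate is not reproduced in this extract; a standard way to write the model they describe (a reconstruction, using the same notation) is

\[
\mathbf{x}_n \;\approx\; \sum_{k=1}^{K} z_{nk}\,\mathbf{w}_k \;=\; \mathbf{W}\mathbf{z}_n,
\qquad
\mathbf{W} = [\,\mathbf{w}_1 \cdots \mathbf{w}_K\,] \in \mathbb{R}^{D\times K},\;\;
\mathbf{z}_n \in \mathbb{R}^{K},\;\; K \ll D .
\]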
Subspace spanned by a set of vectors

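The slide’s figure/derivation is not included here; for reference, the subspace in question is

\[
\operatorname{span}\{\mathbf{w}_1,\dots,\mathbf{w}_K\}
= \Big\{ \sum_{k=1}^{K} \alpha_k \mathbf{w}_k \;:\; \alpha_1,\dots,\alpha_K \in \mathbb{R} \Big\},
\]

a 𝐾-dimensional linear subspace of ℝ^D when the 𝒘_k are linearly independent.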
Principal Component Analysis (PCA)

Principal Component Analysis (PCA): key idea

o When data lie on or near a low 𝑑-dimensional linear subspace, the axes of this subspace
  are an effective representation of the data
o Identifying these axes is known as Principal Component Analysis; they can be obtained
  by eigendecomposition or singular value decomposition
o We can change the basis in which we represent the data (and get a new coordinate system)
o If, in the new basis, the data have low variance along some dimensions, we can ignore those

o In the example we can represent each point using just the first coordinate
  (with very little information loss)
o This helps in reducing dimensionality:
  from 𝑥 = [𝑥_1, 𝑥_2] to 𝑧 = [𝑧_1] (i.e., from 2D to 1D)
Principal Component Analysis (PCA): key idea

• PCA finds a new basis such that the information loss is minimal if we only keep some of the dimensions

  o PCA works (well) only if the data can be projected onto a linear subspace
  o The basis needs to be orthonormal
Basis Representation of Data

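The slide body is missing from this extract; the standard fact it refers to (stated here for completeness) is that for any orthonormal basis {𝒖_1, …, 𝒖_D} of ℝ^D every data point can be written as

\[
\mathbf{x}_n = \sum_{d=1}^{D} z_{nd}\,\mathbf{u}_d,
\qquad z_{nd} = \mathbf{u}_d^{\top}\mathbf{x}_n ,
\]

so changing the basis simply re-expresses the same data in the new coordinates z_nd.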
Selecting directions, keeping only a few of them

Variance captured by projections

(The slide’s formula assumes the data are centered; a reconstruction follows.)
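A standard statement of what this slide derives (a reconstruction, using the notation of the previous slides): the variance of the data projected onto a unit vector 𝒖 is

\[
\frac{1}{N}\sum_{n=1}^{N}\big(\mathbf{u}^{\top}\mathbf{x}_n - \mathbf{u}^{\top}\bar{\mathbf{x}}\big)^2
= \mathbf{u}^{\top}\mathbf{S}\,\mathbf{u},
\qquad
\mathbf{S} = \frac{1}{N}\sum_{n=1}^{N}(\mathbf{x}_n-\bar{\mathbf{x}})(\mathbf{x}_n-\bar{\mathbf{x}})^{\top},
\]

and if the data are centered (x̄ = 0) this is simply (1/N) Σ_n (𝒖^⊤𝒙_n)² = 𝒖^⊤𝑺𝒖 with 𝑺 = (1/N) Σ_n 𝒙_n 𝒙_n^⊤.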
Direction of maximum variance
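The derivation on these slides is not reproduced in the extract; the standard argument is to maximize the captured variance over unit vectors,

\[
\max_{\mathbf{u}}\; \mathbf{u}^{\top}\mathbf{S}\,\mathbf{u}
\quad\text{subject to}\quad \mathbf{u}^{\top}\mathbf{u}=1 .
\]

Setting the gradient of the Lagrangian 𝒖^⊤𝑺𝒖 − λ(𝒖^⊤𝒖 − 1) to zero gives 𝑺𝒖 = λ𝒖, so 𝒖 must be an eigenvector of 𝑺; the captured variance then equals 𝒖^⊤𝑺𝒖 = λ, so the maximizer is the eigenvector with the largest eigenvalue.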
PCAs

• The top eigenvector, 𝒖_1, is called the first Principal Component (PC)
  ✓ The other directions / principal components can be found likewise, each direction
    being orthogonal to all previous ones, using the eigendecomposition of 𝑺
  → We need to compute the first 𝐾 orthonormal eigenvectors of 𝑺, 𝒖_1, 𝒖_2, ⋯, 𝒖_K, where
    the associated eigenvalues are such that 𝜆_1 ≥ 𝜆_2 ≥ ⋯ ≥ 𝜆_K
PCA

(Figure: this dimension can be dropped with minimal loss for representing the data.)
PCA algorithm

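The algorithm box itself is not reproduced in this extract; the sketch below is a minimal NumPy implementation of the standard PCA recipe implied by the previous slides (center the data, form the covariance matrix 𝑺, take its top-𝐾 eigenvectors, project). The function and variable names are illustrative, not from the slides.

import numpy as np

def pca(X, K):
    """Return the mean, the top-K principal directions, and the K-dim codes of X (N x D)."""
    mean = X.mean(axis=0)
    Xc = X - mean                          # 1. center the data
    S = (Xc.T @ Xc) / Xc.shape[0]          # 2. sample covariance matrix S (D x D)
    eigvals, eigvecs = np.linalg.eigh(S)   # 3. eigh: ascending eigenvalues, orthonormal eigenvectors
    order = np.argsort(eigvals)[::-1]      #    reorder so that lambda_1 >= lambda_2 >= ...
    U = eigvecs[:, order[:K]]              # 4. keep the top-K principal components u_1 ... u_K (D x K)
    Z = Xc @ U                             # 5. project: z_nk = u_k^T (x_n - mean)
    return mean, U, Z

A point is reconstructed from its code as x̂_n = mean + U z_n.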
Dimensionality reduction

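The slide body is again missing; the projection/reconstruction step it refers to is, in standard form,

\[
\mathbf{z}_n = \mathbf{U}_K^{\top}(\mathbf{x}_n - \bar{\mathbf{x}}),
\qquad
\hat{\mathbf{x}}_n = \bar{\mathbf{x}} + \mathbf{U}_K \mathbf{z}_n,
\qquad
\mathbf{U}_K = [\,\mathbf{u}_1 \cdots \mathbf{u}_K\,],
\]

with 𝐾 often chosen so that the retained fraction of variance, Σ_{k≤K} λ_k / Σ_{d≤D} λ_d, exceeds some threshold (e.g., 95%; the threshold is an illustrative convention, not from the slides).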
Example of PCA

(Slides 23–27 present a worked example of PCA; the figures are not reproduced in this extract.)
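Since the example figures are missing, here is a small self-contained substitute (synthetic 2-D data, an illustrative construction) showing PCA compressing the data to one coordinate:

import numpy as np

rng = np.random.default_rng(1)
A = np.array([[3.0, 0.0], [1.0, 0.5]])
X = rng.normal(size=(500, 2)) @ A            # correlated 2-D data, stretched along one direction

Xc = X - X.mean(axis=0)                      # center
S = (Xc.T @ Xc) / Xc.shape[0]                # 2 x 2 covariance matrix
eigvals, eigvecs = np.linalg.eigh(S)
u1 = eigvecs[:, np.argmax(eigvals)]          # first principal component
z = Xc @ u1                                  # 1-D representation of each point
X_hat = X.mean(axis=0) + np.outer(z, u1)     # reconstruction from a single coordinate
print(eigvals / eigvals.sum())               # fraction of total variance captured by each direction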
Kernel PCA

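The Kernel PCA slide content is not reproduced here; as a brief illustration of the idea (PCA in a nonlinear feature space induced by a kernel), the sketch below applies scikit-learn’s KernelPCA to the classic concentric-circles dataset. The RBF kernel and the gamma value are arbitrary choices for the example.

from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

X, _ = make_circles(n_samples=400, factor=0.3, noise=0.05, random_state=0)
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)   # PCA on RBF-kernel features
Z = kpca.fit_transform(X)    # nonlinear coordinates in which the two circles become separable

Linear PCA cannot capture this structure because the two circles do not lie near a linear subspace; the kernel trick lets PCA pick up the nonlinear structure instead.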

You might also like

pFad - Phonifier reborn

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.


Alternative Proxies:

Alternative Proxy

pFad Proxy

pFad v3 Proxy

pFad v4 Proxy