0% found this document useful (0 votes)

13 views28 pages

315 F19 27 Pca1

Principal component analysis is a dimensionality reduction

Uploaded by

Tigabu Yaya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

13 views28 pages

315 F19 27 Pca1

Principal component analysis is a dimensionality reduction

Uploaded by

Tigabu Yaya

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 28

Disclaimer: These slides can include material from

different sources. I’ll happy to explicitly acknowledge

a source if required. Contact me for requests.

Introduction to Machine Learning

10-315 Fall ‘19

Lecture 27:
Dimensionality Reduction
Principal Component Analysis (PCA) Teacher:
Gianni A. Di Caro
Data are (usually) high-dimensional
v A lot of features describing the inputs
Ø → High-dimensional spaces to represent, store, and manipulate the data!
Ø Corrupted, noisy, missing and latent data, …

v The Internet is an ever increasing source of high-dimensional data to learn from

o Document classification: Features per document = thousands of

words/unigrams, millions of bigrams, contextual information, …

o Surveys (for learning customers’ needs/wishes):

2
Data are (usually) high-dimensional

v Some (basic) data types are inherently high-dimensional

o High-resolution images: millions of

multi-channel pixels

o Medical imaging : E.g., diffusion scans

of brain with ~ 300,000 brain fibers

3
Curse of dimensionality

v Having a large number of features is potentially bad:

q If not precisely selected, many features can be redundant

(e.g., not all words are really useful to classify a document)
o Large noise added to the main (useful) signal

q Difficult to interpret and visualize

q Computational and memory challenges: it’s hard to store and process the data

q Statistical / learning challenges: Complexity of decision rules tend to grow with #features
→ Hard to learn complex rules as it needs more data.

4
Dimensionality reduction can help

v Feature selection: Select the features that are

truly relevant for the task 𝑋$ is irrelevant!

v Latent features: Some linear/nonlinear combination of features provides a more

efficient representation than directly observed features

Data is actually embedded in lower dimensional subspaces or manifolds 5

Feature selection

v One approach: Regularization (MAP): Integrate feature selection into learning objective
by penalizing solutions where all (or most of) feature weights get a non-zero or a large

(MAP formulation)

6
Latent features
v Linear/nonlinear combinations of observed features provide a more efficient representation
and capture underlying relations that govern data better than directly observed features
o Ego, personality, and intelligence are hidden attributes that characterize human behavior better
that attributes from survey questions
o Topics (sports, science, news) are more efficient descriptors instead of document words
o Often a physical meaning is not obvious

q Linear (for data that can be approximated by a linear subspace):

ü Principal Component Analysis (PCA)
ü Factor Analysis
ü Independent Component Analysis (ICA)
q NonLinear:
ü Kernel PCA
ü Laplacian Eigenmaps
ü ISOMAP, Local Linear Embedding (LLE) 7
A simple model for dimensionality reduction / compression

8
A simple model for dimensionality reduction / compression

o 𝑧&' quantifies the amount of vector

𝒘' present in observation 𝒙&
o 𝑾 is a matrix of reference prototype vectors
o The 𝐾 vectors 𝑾 identify a reference system, a 𝐾-dimensional vector subspace where the
higher dimensional vectors 𝒙& can be effectively represented (i.e., approximated)
o 𝑧&' are the components of the 𝒙& vectors in the subspace identified by 𝑾

9
Subspace spanned by a set of vectors

10
Principal Component Analysis (PCA)

11
Principal Component Analysis (PCA): key idea

o When data lies on or near a low 𝑑-dimensional linear subspace, axes of this subspace
are an effective representation of the data
o Identifying the axes is known as Principal Components Analysis, and can be obtained
by Eigen or Singular value decomposition
o We can change the basis in which we represent
the data (and get a new co-ordinate system)
o If, in the new basis, data has low variance along
some dimension, we can ignore those

o In the example we can represent each point using just

the first co-ordinate (with very little information loss)
o This helps in reducing dimensionality:
From 𝑥 = 𝑥/ , 𝑥1 to 𝑧 = [𝑧/ , 𝑧1 ] (i.e., from 2D to 1D)

12
Principal Component Analysis (PCA): key idea

v PCA finds a new basis such that information loss

is minimum if we only keep some dimensions

o PCA works (well) only if data can be projected on a linear subspaces

o Basis need to be orthonormal

13
Basis Representation of Data

14
Selecting directions, keeping only a few of them

15
Variance captured by projections

If data centered 16
Direction of maximum variance

17
Direction of Maximum Variance

18
PCAs

v The top eigenvector, 𝒖/ , is called the first Principal Component (PC)

ü Other directions / Principal components, can be found likewise, with each direction
being orthogonal to all previous ones using the eigendecomposition of 𝑺
Ø Need to compute the first 𝐾 orthonormal eigenvectors of 𝑺, 𝒖/ , 𝒖1 , ⋯ 𝒖7 where
the associated eigenvalues are such that 𝜆/ ≥ 𝜆1 ≥ ⋯ ≥ 𝜆7

19
PCA

This dimension can be

dropped with minimal loss
for representing the data

20
PCA algorithm

21
Dimensionality reduction

22
Example of PCA

23
Example of PCA

24
Example of PCA

25
Example of PCA

26
Example of PCA

27
Kernel PCA

Chapter 10. Dimensionality Reduction With PCA
No ratings yet
Chapter 10. Dimensionality Reduction With PCA
23 pages
GHH1
100% (1)
GHH1
8 pages
Raspberry Pi For Beginners & Advanced Users The Comprehensive Raspberry Pi Mastery Guide
No ratings yet
Raspberry Pi For Beginners & Advanced Users The Comprehensive Raspberry Pi Mastery Guide
94 pages
08 HighDimensional PDF
No ratings yet
08 HighDimensional PDF
88 pages
Serai Catalogue ENG 2017-18
No ratings yet
Serai Catalogue ENG 2017-18
44 pages
FR Pca Lda
No ratings yet
FR Pca Lda
52 pages
08 HighDimensional PDF
No ratings yet
08 HighDimensional PDF
88 pages
Kertas 1
No ratings yet
Kertas 1
41 pages
ECU Part Number Vehicle Application List
100% (3)
ECU Part Number Vehicle Application List
2 pages
Class8-9 DataPreprocessing DataReduction 30Sept-05Oct2020
No ratings yet
Class8-9 DataPreprocessing DataReduction 30Sept-05Oct2020
22 pages
MODULE T4 - DCC50242 BIM Terbaru
No ratings yet
MODULE T4 - DCC50242 BIM Terbaru
147 pages
Dimensionality Reduction: Principal Component Analysis (PCA)
No ratings yet
Dimensionality Reduction: Principal Component Analysis (PCA)
11 pages
Data Reduction Techniques
No ratings yet
Data Reduction Techniques
41 pages
SE 458 - Data Mining (DM) : Spring 2019 Section W1
No ratings yet
SE 458 - Data Mining (DM) : Spring 2019 Section W1
10 pages
Shortcut Keys of Tally .ERP 9
94% (33)
Shortcut Keys of Tally .ERP 9
6 pages
Principal Component Analysis: Jianxin Wu
No ratings yet
Principal Component Analysis: Jianxin Wu
24 pages
Microprocessor Chapter 2
No ratings yet
Microprocessor Chapter 2
111 pages
Dimensionality Reduction Using Principal Component Analysis
No ratings yet
Dimensionality Reduction Using Principal Component Analysis
32 pages
Principal Component Analysis: Atent Ariables
No ratings yet
Principal Component Analysis: Atent Ariables
13 pages
Chapter 7 INT 21H
No ratings yet
Chapter 7 INT 21H
14 pages
Nanohub U Pen Alam l3.12
No ratings yet
Nanohub U Pen Alam l3.12
15 pages
Unit - IV - DIMENSIONALITY REDUCTION AND GRAPHICAL MODELS
No ratings yet
Unit - IV - DIMENSIONALITY REDUCTION AND GRAPHICAL MODELS
59 pages
CS6551 Chapter 7
No ratings yet
CS6551 Chapter 7
57 pages
Bhel Report
No ratings yet
Bhel Report
17 pages
14: Dimensionality Reduction (PCA) : Motivation 1: Data Compression
No ratings yet
14: Dimensionality Reduction (PCA) : Motivation 1: Data Compression
7 pages
Sprpackagereport 2434200900
No ratings yet
Sprpackagereport 2434200900
135 pages
8086 Hardware Specification: Segment 5
No ratings yet
8086 Hardware Specification: Segment 5
19 pages
Machine Learning: Unsupervised Learning Dimensionality Reduction K-Means Clustering
No ratings yet
Machine Learning: Unsupervised Learning Dimensionality Reduction K-Means Clustering
28 pages
Anova One Way & Two Way Classified Data: Dr. Mukta Datta Mazumder Associate Professor Department of Statistics
No ratings yet
Anova One Way & Two Way Classified Data: Dr. Mukta Datta Mazumder Associate Professor Department of Statistics
32 pages
Overview Network
No ratings yet
Overview Network
42 pages
9 ML
No ratings yet
9 ML
39 pages
Artificial Neural Network
No ratings yet
Artificial Neural Network
8 pages
Chapter 4
No ratings yet
Chapter 4
80 pages
P-3.1.4 - Pca
No ratings yet
P-3.1.4 - Pca
44 pages
PCA
100% (1)
PCA
33 pages
Dimensionality Reduction
No ratings yet
Dimensionality Reduction
60 pages
Dim Reduction & Pattern Recognition
No ratings yet
Dim Reduction & Pattern Recognition
63 pages
Lecture 7: Unsupervised Learning: C19 Machine Learning Hilary 2013 A. Zisserman
No ratings yet
Lecture 7: Unsupervised Learning: C19 Machine Learning Hilary 2013 A. Zisserman
20 pages
EX-Featured Excipient Capsule and Tablet Diluents
No ratings yet
EX-Featured Excipient Capsule and Tablet Diluents
8 pages
Team POP Quiz 1 Template: About Module 1 Prepared By: BOIL BABY BOIL
No ratings yet
Team POP Quiz 1 Template: About Module 1 Prepared By: BOIL BABY BOIL
13 pages
Inclined Bedding - Fold (Lab 2A)
No ratings yet
Inclined Bedding - Fold (Lab 2A)
14 pages
Ruiz Modified I2ml3e Chap6
No ratings yet
Ruiz Modified I2ml3e Chap6
38 pages
Solve Linear Programming Problems - MATLAB Linprog - MathWorks India
No ratings yet
Solve Linear Programming Problems - MATLAB Linprog - MathWorks India
24 pages
Face Recognition PAC
No ratings yet
Face Recognition PAC
24 pages
BD52XXG/FVE Series BD53XXG/FVE Series: Voltage Detector Ic With Adjustable Delay Time
No ratings yet
BD52XXG/FVE Series BD53XXG/FVE Series: Voltage Detector Ic With Adjustable Delay Time
5 pages
1.variable Reduction 2.principal Component Analysis: Topic UNIT-4
No ratings yet
1.variable Reduction 2.principal Component Analysis: Topic UNIT-4
19 pages
Operater Overloading B
No ratings yet
Operater Overloading B
17 pages
University of Gondar Institute of Technology: C++ Programming-2 Assignment
No ratings yet
University of Gondar Institute of Technology: C++ Programming-2 Assignment
7 pages
Further Properties of Splines and B-Splines
No ratings yet
Further Properties of Splines and B-Splines
17 pages
Chapter Five Principal Comonent Analysis (PCA)
No ratings yet
Chapter Five Principal Comonent Analysis (PCA)
33 pages
Cell Structures and Their Functions
No ratings yet
Cell Structures and Their Functions
1 page
3.2 Pca
No ratings yet
3.2 Pca
27 pages
Pitriani Rajab Mangasi - 201830112
No ratings yet
Pitriani Rajab Mangasi - 201830112
14 pages
Unit 4 Dimenstionality Reduction
No ratings yet
Unit 4 Dimenstionality Reduction
104 pages
PCA - Feb 8
No ratings yet
PCA - Feb 8
28 pages
Arc-Free Atmospheric Pressure Cold Plasma Jets - A Review
No ratings yet
Arc-Free Atmospheric Pressure Cold Plasma Jets - A Review
12 pages
Dimensionality Reduction
No ratings yet
Dimensionality Reduction
19 pages
Linear Regression: Dimensionality Reduction
No ratings yet
Linear Regression: Dimensionality Reduction
7 pages
Measures of Location (Solutions) : I I I I J J I I I I I I I
No ratings yet
Measures of Location (Solutions) : I I I I J J I I I I I I I
3 pages
AI Assignment 4
No ratings yet
AI Assignment 4
5 pages
Week12 PCA BayesianInference Before Lecture
No ratings yet
Week12 PCA BayesianInference Before Lecture
82 pages
CHBE413CDS Lecture 12 Unsupervised DimRed
No ratings yet
CHBE413CDS Lecture 12 Unsupervised DimRed
30 pages
Dimension Reduction
No ratings yet
Dimension Reduction
23 pages
Quiz 3 Key
No ratings yet
Quiz 3 Key
6 pages
SPAD
No ratings yet
SPAD
34 pages
FALLSEM2024-25 SWE1015 ETH VL2024250103260 2024-09-18 Reference-Material-I
No ratings yet
FALLSEM2024-25 SWE1015 ETH VL2024250103260 2024-09-18 Reference-Material-I
62 pages
Unit V Foml
No ratings yet
Unit V Foml
18 pages
Module 3
No ratings yet
Module 3
41 pages
Unit 4
No ratings yet
Unit 4
79 pages
Lecture 3
No ratings yet
Lecture 3
14 pages
Lecture 6 (B) PCA-II
No ratings yet
Lecture 6 (B) PCA-II
90 pages
CS464 Ch6 FeatureExtraction
No ratings yet
CS464 Ch6 FeatureExtraction
46 pages
W4.2 DataPreProcessing-PCA
No ratings yet
W4.2 DataPreProcessing-PCA
22 pages
03 Dimensionality Reduction
No ratings yet
03 Dimensionality Reduction
38 pages
Lecture 9 - PCA
No ratings yet
Lecture 9 - PCA
44 pages
Love Report 1
No ratings yet
Love Report 1
10 pages
Noc19-Ae05 Assignment Week 05
No ratings yet
Noc19-Ae05 Assignment Week 05
4 pages
22AIP3101A Session 7
No ratings yet
22AIP3101A Session 7
28 pages
Security Processor Architecture 1
No ratings yet
Security Processor Architecture 1
29 pages
Visualization 9 Dim Reduction
No ratings yet
Visualization 9 Dim Reduction
73 pages
A Study On Machine Learning Algorithms and Its Applications
No ratings yet
A Study On Machine Learning Algorithms and Its Applications
13 pages
Machine Learning Is A Computer Vision
No ratings yet
Machine Learning Is A Computer Vision
7 pages
Feature Selection
No ratings yet
Feature Selection
6 pages
کتاب نهم بارگزاری شده
No ratings yet
کتاب نهم بارگزاری شده
55 pages
Article SlotStress 2006
No ratings yet
Article SlotStress 2006
7 pages
315 F19 15 SVM 2
No ratings yet
315 F19 15 SVM 2
35 pages
MTE 02 English January 2022 December 2022
No ratings yet
MTE 02 English January 2022 December 2022
36 pages
U5@-Data Reduction
No ratings yet
U5@-Data Reduction
22 pages
08 Training
No ratings yet
08 Training
18 pages
Machine Learning (CSO851) - Lecture 03
No ratings yet
Machine Learning (CSO851) - Lecture 03
71 pages
Principal Component Analysis (PCA) : Gundimeda Venugopal
No ratings yet
Principal Component Analysis (PCA) : Gundimeda Venugopal
17 pages
ITS323Y12S1E02 Final Exam Answers
No ratings yet
ITS323Y12S1E02 Final Exam Answers
21 pages
3-Data Fundamentals For BI - Part2
No ratings yet
3-Data Fundamentals For BI - Part2
44 pages
Subnetting and Supernetting Classful Addressing
No ratings yet
Subnetting and Supernetting Classful Addressing
34 pages
Ajst 9 1 150 154
No ratings yet
Ajst 9 1 150 154
5 pages
Dimensionality Reduction
No ratings yet
Dimensionality Reduction
27 pages
Hematocritc Determination
No ratings yet
Hematocritc Determination
3 pages
Unit 3
No ratings yet
Unit 3
102 pages
10 Autoencoders
No ratings yet
10 Autoencoders
42 pages
IDS 4 (Week 14)
No ratings yet
IDS 4 (Week 14)
66 pages
Lecture 9 - Data Prep - Reduction - PCA-M
No ratings yet
Lecture 9 - Data Prep - Reduction - PCA-M
44 pages
ASTRO 3: Introductory Astronomy 3rd Edition Michael A. Seeds - The Ebook in PDF Format Is Available For Download
No ratings yet
ASTRO 3: Introductory Astronomy 3rd Edition Michael A. Seeds - The Ebook in PDF Format Is Available For Download
52 pages
7SX80003BA501BA0-Z+P10 Datasheet en
No ratings yet
7SX80003BA501BA0-Z+P10 Datasheet en
3 pages
20 Pca
No ratings yet
20 Pca
50 pages
ML Mod 4 Part 2
No ratings yet
ML Mod 4 Part 2
32 pages
Computer Vision and Image Processing - Fundamentals and Applications
No ratings yet
Computer Vision and Image Processing - Fundamentals and Applications
34 pages
AI Algorithms: Foundations, Applications, and Advancements
From Everand
AI Algorithms: Foundations, Applications, and Advancements
Anand Vemula
No ratings yet
Illuminating Data: A hands on guide to data visualization in R
From Everand
Illuminating Data: A hands on guide to data visualization in R
Eman Ahmad
No ratings yet

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.

315 F19 27 Pca1

Uploaded by

315 F19 27 Pca1

Uploaded by

Disclaimer: These slides can include material from

different sources. I’ll happy to explicitly acknowledge

Introduction to Machine Learning

v The Internet is an ever increasing source of high-dimensional data to learn from

o Document classification: Features per document = thousands of

o Surveys (for learning customers’ needs/wishes):

v Some (basic) data types are inherently high-dimensional

o High-resolution images: millions of

o Medical imaging : E.g., diffusion scans

v Having a large number of features is potentially bad:

q If not precisely selected, many features can be redundant

q Difficult to interpret and visualize

v Feature selection: Select the features that are

v Latent features: Some linear/nonlinear combination of features provides a more

Data is actually embedded in lower dimensional subspaces or manifolds 5

q Linear (for data that can be approximated by a linear subspace):

o 𝑧&' quantifies the amount of vector

o In the example we can represent each point using just

v PCA finds a new basis such that information loss

o PCA works (well) only if data can be projected on a linear subspaces

v The top eigenvector, 𝒖/ , is called the first Principal Component (PC)

This dimension can be

You might also like

Pfad - The Proxy pFad of © 2024 Garber Painting. All rights reserved.