
Lecture 11: Dimensionality Reduction and Subspace Methods

The document discusses dimensionality reduction and subspace methods. Principal Component Analysis (PCA) is used to reduce the dimensionality of data while retaining variation. Discriminant functions classify new data based on distance to class prototypes. Subspace methods represent each class in a low-dimensional subspace and classify new data based on which subspace best approximates it. The dimensionality of each class subspace is important and can be determined by the cumulative contributions of eigenvalues of the data's autocorrelation matrix.


• Principal Component Analysis (PCA)
• Discriminant function
• Subspace Methods
• Fisher's method

Lecture 11: Dimensionality Reduction and Subspace Methods
DD2421

Atsuto Maki

Autumn, 2020



Our keywords today:

• Dimensionality reduction
– Principal Component Analysis (PCA)

• Discriminant function
– Similarity measures: angle, projection length

• Subspace Methods
Principal Component Analysis (PCA)
1. Maximizing variance

[Figure: data distribution in the (x1, x2) plane; the first principal direction u1 passes through the centroid E(x).]

Mean vector of x: E(x) = (1/r) Σ x   (r: number of samples)
Covariance matrix: Σ = E((x − E(x))(x − E(x))^T)

1. Maximum variance criterion
Reduce the effective number of variables (only dealing with the components with larger variances):

E((x^T u_i − E(x^T u_i))^2) → maximize, i = 1, …, p
  = E((u_i^T (x − E(x)))^2)
  = u_i^T E((x − E(x))(x − E(x))^T) u_i = u_i^T Σ u_i
Condition: u_i^T u_j = δ_ij

max tr(U^T Σ U)
The transformation matrix U consists of p columns: the eigenvectors of the covariance matrix Σ (corresponding to its p largest eigenvalues).
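As a minimal sketch of this criterion (assuming a data matrix `X` with one sample per row; the function name and data layout are illustrative, not from the lecture):

```python
import numpy as np

def pca_max_variance(X, p):
    """Eigenvectors of the sample covariance matrix for its p largest eigenvalues."""
    mean = X.mean(axis=0)                     # E(x), estimated over r samples
    Xc = X - mean                             # centre the data
    Sigma = Xc.T @ Xc / X.shape[0]            # covariance Σ = E((x-E(x))(x-E(x))^T)
    eigvals, eigvecs = np.linalg.eigh(Sigma)  # symmetric matrix, eigenvalues ascending
    U = eigvecs[:, np.argsort(eigvals)[::-1][:p]]   # transformation matrix U (n x p)
    return U, mean

# Usage: Z = (X - mean) @ U projects each sample onto the p principal directions.
```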
Example 3-d to 2-d: Ninety observations simulated in 3-d

The first 2 principal component directions span the plane that best fits the data.
It minimizes the sum of squared distances from each point to the plane.

Figure from An Introduction to Statistical Learning (James et al.)
Principal Component Analysis (PCA)
2. Minimum approximation error

[Figure: the distribution is viewed from the origin, not from the centroid; u1 is the leading direction in the (x1, x2) plane.]

Autocorrelation matrix: Q = E(xx^T)


2. Minimum squared distance criterion
The averaged squared error between x and its approximation x' is to be minimized by a set {u_1, …, u_p}:

E(||x − x'||^2) → minimize

Approximation: x' = Σ_{i=1}^{p} (x^T u_i) u_i,   residual: x̃ = x − x'

||x'||^2 = ||x||^2 − ||x̃||^2 → maximize

The basis consists of p eigenvectors of the autocorrelation matrix Q (corresponding to its p largest eigenvalues).
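A corresponding sketch of this second criterion, assuming the same row-wise data matrix `X` (left uncentred, since the distribution is viewed from the origin); names are illustrative:

```python
import numpy as np

def pca_min_error(X, p):
    """Basis of p eigenvectors of the autocorrelation matrix Q = E(xx^T)."""
    Q = X.T @ X / X.shape[0]                  # autocorrelation matrix (no centering)
    eigvals, eigvecs = np.linalg.eigh(Q)
    return eigvecs[:, np.argsort(eigvals)[::-1][:p]]

def approximate(x, U):
    """x' = sum_i (x^T u_i) u_i: projection of x onto the subspace spanned by U."""
    return U @ (U.T @ x)

# Because the residual x - x' is orthogonal to the subspace,
# ||x'||^2 = ||x||^2 - ||x - x'||^2, so minimising the error
# maximises the projection length.
```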
PCA example 1: Hand-written digits

Feature extraction
Pattern vectors: normalized & blurred patterns

(figure credit: Y. Kurosawa)


Numeral characters (0-4) (figure credit: Y. Kurosawa)
Example 2: Human face classification
Basis vectors of a person: someone’s dictionary

(Eigenvectors from a large collection of his/her face images)

(figure credit: K. Fukui)


Example 3: Ship classification (profiles)
Profile vectors → Principal Component Analysis (PCA) → eigenvectors for the greatest eigenvalues.

What is the set of images of an object under all possible lighting conditions?
(Harvard Database)
Concept of subspace
A subspace L is a collection of n-d vectors spanned by a basis, a set of linearly independent vectors:

L(b_1, …, b_p) = { z | z = Σ_{i=1}^{p} ξ_i b_i }   (ξ_i ∈ R, b_i ∈ R^n)

Dimension of a subspace: the number of basis vectors, p = dim(L) << n

[Figure: a 2-d subspace L(u1, u2) in R^3, with origin O and basis vectors u1, u2.]

A subspace is conveniently represented by an orthonormal basis {u_1, …, u_p}.
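A small numerical illustration of these definitions (the vectors are hypothetical):

```python
import numpy as np

# Orthonormal basis {u1, u2} of a 2-d subspace L(u1, u2) in R^3
u1 = np.array([1.0, 0.0, 0.0])
u2 = np.array([0.0, 1.0, 0.0])
U = np.column_stack([u1, u2])         # p = dim(L) = 2, n = 3

z = 3.0 * u1 - 2.0 * u2               # z = sum_i xi_i b_i lies in L
x = np.array([1.0, 1.0, 1.0])         # a vector outside L

# A vector lies in L exactly when it equals its own projection onto L
print(np.allclose(U @ (U.T @ z), z))  # True
print(np.allclose(U @ (U.T @ x), x))  # False
```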


• Variations of “9” covered by a 2-d subspace

[Figure: the two basis vectors u0 and u1 of the subspace.]

(figure credit: Y. Kurosawa)


Background: Schematic of classification

Training: samples (labeled) → feature extraction → training → model (dictionary)
Testing: new inputs (test data) → feature extraction → testing against the model → output class
Training phase
• Given: a limited number of labeled data (samples whose classes are known)
• The dimensionality is often too high for the limited number of samples

One approach to this is to find redundant variables and discard them, i.e. dimensionality reduction (without losing essential information).

Information compression: extract the class characteristics and throw away the rest!
Testing phase

• Various ways to measure the distance


– Euclidean / Mahalanobis distance
– Angle between vectors
– Projection length on subspaces

• Classification methods
– Discriminant function
– Subspace method

Nearest Neighbor methods (revisiting)
• Binary classification in an n-dim feature space
  – N1 samples of class C1
  – N2 samples of class C2
  – Unseen data x
  → Compute distances to all N1 + N2 samples
• Find the nearest neighbour → classify x to the same class (a minimal sketch follows below)
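A minimal 1-nearest-neighbour sketch of this procedure (array names are assumptions for illustration):

```python
import numpy as np

def nearest_neighbour(x, X_train, y_train):
    """Classify x by the label of its closest training sample,
    after computing Euclidean distances to all N1 + N2 samples."""
    dists = np.linalg.norm(X_train - x, axis=1)   # distance to every stored sample
    return y_train[np.argmin(dists)]              # label of the nearest neighbour
```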
Discriminant function
• Need to remember all the samples?
  – In k-NN we simply used all the training data
  – They still cover only a small portion of possible patterns
• Define a class by a few representative patterns
  – e.g. the centroid of the class distribution
  – Extreme case: one vector per class

[Figure: classes C1, C2, C3, each represented by a single prototype vector.]
Formulation: one prototype per class
– K classes: ω^(1), …, ω^(K)
– K prototypes, one per class

Consider the Euclidean distances between the new input x and the prototypes:
→ Choose the class that minimises the distance (a minimal sketch follows below).

[Diagram: input x → K discriminant functions → max → output class]
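A hedged sketch of the nearest-prototype classifier described above, one centroid per class, including the optional "don't know" rejection introduced just below (names are illustrative):

```python
import numpy as np

def fit_prototypes(X, y):
    """One prototype per class: the centroid of each class distribution."""
    classes = np.unique(y)
    prototypes = np.stack([X[y == c].mean(axis=0) for c in classes])
    return classes, prototypes

def classify(x, classes, prototypes, reject_threshold=None):
    """Choose the class whose prototype minimises the Euclidean distance;
    optionally return None ("don't know") if that distance is too large."""
    dists = np.linalg.norm(prototypes - x, axis=1)
    best = np.argmin(dists)
    if reject_threshold is not None and dists[best] > reject_threshold:
        return None
    return classes[best]
```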
Setting the "don't know" category
• Reject if the distance is above a threshold

[Figure: acceptance regions around the prototypes of classes C1 and C2.]
Direction cosine as similarity
Think of the new input and the prototype as vectors.
Compute the cosine between the input vector x and the prototype vector: the "simple similarity".
(The closer it is to 1, the more likely x is to belong to that class.)
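As a sketch, the "simple similarity" can be computed as the direction cosine (the function name is illustrative):

```python
import numpy as np

def simple_similarity(x, prototype):
    """Direction cosine between the input x and a class prototype;
    values close to 1 suggest x belongs to that class."""
    return float(x @ prototype / (np.linalg.norm(x) * np.linalg.norm(prototype)))
```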

Now let's extend the class representative to a set of basis vectors that spans a subspace.
Subspace Methods
• Exploit localization of pattern distributions
Samples in the same class, such as a digit (or face images of a person), are similar to each other.
They are localized in a subspace spanned by a set of basis vectors u_i.
u_i: reference vectors (orthonormal basis)

a.k.a. CLAFIC (CLAss-Featuring Information Compression)
Framework of Subspace Method
1. Training: for each class ω^(1), …, ω^(K), compute a low-dimensional subspace that represents the distribution in the class.
2. Testing: determine the class of a new unknown input by comparing which subspace best approximates the input.

[Diagram: training yields subspace 1, …, subspace K; at test time the input vector is projected onto each subspace, similarities 1, …, K are computed, and max decides the class.]
Similarity in Subspace Method
Projection length onto the subspace:

S = Σ_{i=1}^{p} (x, u_i)^2

p: dimension of the subspace
u_i: reference vectors (orthonormal basis)

[Figure: the input vector and its projection onto the subspace.]
Similarity in Subspace Method (example)
Projection length onto the subspace.
p: the dimensionality of the class subspace (can be determined for each class, but how?)
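A minimal sketch of the subspace (CLAFIC) method under these definitions: per-class bases from the autocorrelation matrix, and classification by projection length (names and data layout are assumptions):

```python
import numpy as np

def train_class_subspace(X_class, p):
    """Basis of a class subspace: p leading eigenvectors of Q = E(xx^T)."""
    Q = X_class.T @ X_class / X_class.shape[0]
    eigvals, eigvecs = np.linalg.eigh(Q)
    return eigvecs[:, np.argsort(eigvals)[::-1][:p]]   # n x p, orthonormal columns

def similarity(x, U):
    """Projection length S = sum_i (x, u_i)^2 onto the class subspace."""
    return float(np.sum((U.T @ x) ** 2))

def classify(x, subspaces):
    """Assign x to the class whose subspace best approximates it."""
    return int(np.argmax([similarity(x, U) for U in subspaces]))
```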
Dimensionality of a class subspace
Eigenvalues of the autocorrelation matrix Q: λ_1 ≥ … ≥ λ_j ≥ … ≥ λ_p ≥ 0
The number of dimensions to be used for each class:
– Too low → low capability to represent the class
– Too high → issue of overlapping across classes

• Cumulative contributions
Choose a dimension p^(i) for each class ω^(i):

a(p^(i)) = Σ_{j=1}^{p^(i)} λ_j / Σ_{j=1}^{p} λ_j,   with a(p^(i)) ≤ κ ≤ a(p^(i) + 1)   (κ: common value)

This way the projection length onto the subspace is made uniform across classes.

Experiments are still needed to find a good dimensionality (a selection sketch follows below).
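One common way to pick p^(i) from the cumulative contribution, sketched under the assumption that κ is a common value shared by all classes (e.g. 0.95):

```python
import numpy as np

def choose_dimension(eigvals, kappa=0.95):
    """Smallest p whose cumulative contribution a(p) reaches kappa."""
    eigvals = np.sort(eigvals)[::-1]            # lambda_1 >= ... >= lambda_p >= 0
    a = np.cumsum(eigvals) / np.sum(eigvals)    # a(p) = sum_{j<=p} lambda_j / sum_j lambda_j
    return int(np.searchsorted(a, kappa) + 1)   # 1-indexed dimension
```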


What is a good dimension (direction) for classification, given labels?
Ideal distributions of input pattern vectors:
• Patterns from an identical class should be close
• Patterns from different classes should be apart

[Figure: distributions of classes ω^(1) and ω^(2) in the (x1, x2) plane, once well separated and once overlapping.]

→ Overlapping distributions are harmful for classification


Ratio of between-class variance to within-class variance

Within-class variance (E^(i)(x): average in class ω^(i), r: total number of samples):

s_W^2 = (1/r) Σ_{i=1}^{K} Σ_{x ∈ ω^(i)} (x − E^(i)(x))^T (x − E^(i)(x))

Between-class variance (r^(i): number of samples in class ω^(i), E(x): overall average):

s_B^2 = (1/r) Σ_{i=1}^{K} r^(i) (E^(i)(x) − E(x))^T (E^(i)(x) − E(x))

Between-class / within-class variance ratio:

J_s = s_B^2 / s_W^2

In short: the distance between classes normalized by the distance within a class → the larger the better! (A sketch of this computation follows below.)
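A sketch of the ratio J_s computed from labeled data, following the definitions above (names are illustrative):

```python
import numpy as np

def variance_ratio(X, y):
    """J_s = between-class variance / within-class variance (larger is better)."""
    mean_all = X.mean(axis=0)
    r = X.shape[0]                                   # total number of samples
    s_w, s_b = 0.0, 0.0
    for c in np.unique(y):
        Xc = X[y == c]
        mean_c = Xc.mean(axis=0)                     # average in class
        s_w += np.sum((Xc - mean_c) ** 2) / r        # within-class variance
        s_b += Xc.shape[0] * np.sum((mean_c - mean_all) ** 2) / r   # between-class variance
    return s_b / s_w
```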
Fisher's method
Find a subspace most suitable for classification (discriminant analysis).
Given pattern distributions in 2 classes
⇒ find the optimal axis direction where J is maximized.

Scatter matrix (represents the variation within a class):

S_i ≡ Σ_{x ∈ ω^(i)} (x − E^(i)(x))(x − E^(i)(x))^T

Within-class: S_W ≡ S_1 + S_2
Between-class: S_B ≡ Σ_{i=1,2} r^(i) (E^(i)(x) − E(x))(E^(i)(x) − E(x))^T
             = (r^(1) r^(2) / r) (E^(1)(x) − E^(2)(x))(E^(1)(x) − E^(2)(x))^T
From the n-d feature space to a 1-d space by a matrix A:
A is an n × 1 matrix, i.e. an n-dim vector a in practice
→ each pattern becomes a scalar y = A^T x.

Scatter matrix in the space after the transformation:

Ŝ_i ≡ Σ_{y ∈ ω^(i)} (y − E^(i)(y))(y − E^(i)(y))^T
    = A^T Σ_{x ∈ ω^(i)} (x − E^(i)(x))(x − E^(i)(x))^T A = A^T S_i A

Within-class: Ŝ_W ≡ Ŝ_1 + Ŝ_2 = A^T S_1 A + A^T S_2 A = A^T S_W A
Between-class: Ŝ_B ≡ Σ_{i=1,2} r^(i) (E^(i)(y) − E(y))^2   (a scalar)
             = (r^(1) r^(2) / r) A^T (E^(1)(x) − E^(2)(x))(E^(1)(x) − E^(2)(x))^T A = A^T S_B A
Fisher's criterion: maximizing the ratio of between-class variance to within-class variance,

J_S(A) ≡ Ŝ_B / Ŝ_W = (A^T S_B A) / (A^T S_W A)

Lagrange multiplier (with the condition Ŝ_W = I):

J(a) ≡ a^T S_B a − λ (a^T S_W a − I) → maximize
S_B a = λ S_W a
⇔ S_W^{-1} S_B a = λ a
⇔ max{J_S(a)} = λ_1, the greatest eigenvalue of S_W^{-1} S_B

→ The eigenvector for the greatest eigenvalue of S_W^{-1} S_B gives the A that maximises Fisher's criterion.
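A minimal two-class sketch of this result: for K = 2 the leading eigenvector of S_W^{-1} S_B is proportional to S_W^{-1}(E^(1)(x) − E^(2)(x)), so the direction can be obtained with one linear solve (names are illustrative):

```python
import numpy as np

def fisher_direction(X1, X2):
    """Direction a maximising J_S(a) = (a^T S_B a) / (a^T S_W a) for two classes."""
    m1, m2 = X1.mean(axis=0), X2.mean(axis=0)
    S1 = (X1 - m1).T @ (X1 - m1)          # scatter matrix of class 1
    S2 = (X2 - m2).T @ (X2 - m2)          # scatter matrix of class 2
    S_W = S1 + S2                         # within-class scatter
    a = np.linalg.solve(S_W, m1 - m2)     # proportional to the optimal direction
    return a / np.linalg.norm(a)

# Projection to 1-d: y = a^T x for each pattern x.
```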
