
CS303: Mathematical Foundations for AI

Nonlinear Dimensionality Reduction


23 Jan 2025
Recap

• Recap
▶ Principal Component Analysis (PCA)
▶ Linear Discriminant Analysis
• kernel PCA
• Multidimensional Scaling (MDS)
• Isometric Mapping (ISOMAP)

1 / 28
References

• kernel PCA
▶ PCA and Fisher’s Discriminant Analysis – Bishop, Christopher M., and Nasser M. Nasrabadi. Pattern Recognition and Machine Learning. Vol. 4, No. 4. New York: Springer, 2006.
▶ kernel PCA
▶ kernel Matrix
• MDS
▶ Video Lecture
▶ Slides
• ISOMAP
▶ Original Paper
▶ Video Lecture
▶ Slides

2 / 28
Manifold

3 / 28
Manifold

Locally Euclidean (flat)

4 / 28
PCA on Swiss Roll Dataset

PCA Fails

4 / 28
Recall PCA

• X is the n × d data matrix


• Center the data (i.e., mean normalize) to obtain X̄
• X̄ T X̄ is the covariance matrix (d × d)
• Compute the top k eigenvectors v1 , . . . , vk of X̄ T X̄
• The new data X′ = X̄ [v1 . . . vk], which is n × k (see the sketch below)
5 / 28
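A minimal numpy sketch of the recipe above (function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def pca(X, k):
    """Project the n x d data matrix X onto its top-k principal components."""
    X_bar = X - X.mean(axis=0)              # center the data (mean-normalize)
    C = X_bar.T @ X_bar                     # d x d scatter / covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]             # top-k eigenvectors v1 ... vk
    return X_bar @ V                        # new data X' = X_bar [v1 ... vk], n x k
```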
Recall PCA for Wide Matrices
We have X̄ with n << d
• We compute X̄X̄ᵀ, which is n × n; let K = X̄X̄ᵀ
• Compute the top k eigenvectors of X̄X̄ᵀ, denoted u1 , . . . , uk
• The normalized eigenvectors vi of X̄ᵀX̄ are

$$v_i = \frac{1}{\sqrt{\lambda_i}}\, \bar{X}^T u_i$$

Proof.

$$\bar{X}\bar{X}^T u_i = \lambda_i u_i$$

$$\bar{X}^T\bar{X}\,(\bar{X}^T u_i) = \lambda_i\,(\bar{X}^T u_i) \quad \text{(multiply both sides by } \bar{X}^T\text{)}$$

$$v_i = \frac{\bar{X}^T u_i}{\|\bar{X}^T u_i\|}, \qquad \|\bar{X}^T u_i\|^2 = u_i^T \bar{X}\bar{X}^T u_i = \lambda_i$$

6 / 28
Recall PCA for Wide Matrices
We have X̄ with n << d
• We compute X̄X̄ᵀ; let K = X̄X̄ᵀ
• Compute the top k eigenvectors of X̄X̄ᵀ, denoted u1 , . . . , uk
• The eigenvectors vi of X̄ᵀX̄ are

$$v_i = \frac{1}{\sqrt{\lambda_i}}\, \bar{X}^T u_i$$

• The new data X′, i.e., n × k, is given by

$$\bar{X}\,[v_1 \ \ldots \ v_k] = \bar{X}\bar{X}^T [u_1 \ \ldots \ u_k]\,\Lambda^{-1/2} = \bar{K}\,[u_1 \ \ldots \ u_k]\,\Lambda^{-1/2}$$

6 / 28
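A sketch of this wide-matrix route (an illustrative helper, assuming the data is centered inside the function); it recovers the same n × k projection from the n × n matrix K = X̄X̄ᵀ without ever forming the d × d covariance:

```python
import numpy as np

def pca_wide(X, k):
    """PCA for n << d via the n x n matrix K = X_bar X_bar^T (no d x d covariance needed)."""
    X_bar = X - X.mean(axis=0)
    K = X_bar @ X_bar.T                       # n x n Gram matrix of the centered data
    eigvals, U = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:k]       # indices of the top-k eigenpairs
    U_k, lam = U[:, idx], eigvals[idx]
    # X' = X_bar [v1 ... vk] = K [u1 ... uk] Lambda^{-1/2}
    return K @ U_k / np.sqrt(lam)
```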
kernel PCA

The projected data X′, i.e., n × k, is given by

$$\bar{K}\,[u_1 \ \ldots \ u_k]\,\Lambda^{-1/2}$$

where the ui's are the eigenvectors of K̄.

Note: we do not need to know X at all; all we need is the kernel matrix K (the inner products between all pairs of data points).

“kernel trick”

7 / 28
kernel PCA

8 / 28
kernel PCA

• Project data x to a higher-dimensional space ϕ(x)

• To reduce dimensions we are going to “higher dimensions”?
• Note that we will never need ϕ(x) (a row vector), just like we did not need X but only K̄
• All we need is the following n × n matrix

$$K = \phi(X)\phi(X)^T = \begin{bmatrix} \phi(x_1)\phi(x_1)^T & \cdots & \phi(x_1)\phi(x_n)^T \\ \vdots & \ddots & \vdots \\ \phi(x_n)\phi(x_1)^T & \cdots & \phi(x_n)\phi(x_n)^T \end{bmatrix}$$

• Note that the above is symmetric and positive semi-definite

9 / 28
kernel PCA

Given X,
• We need K̄
• Compute the top k eigenvectors of K̄, u1 , . . . , uk
• Given an input xi, the output with reduced dimensions is yi = (yi1 , . . . , yik ), where

$$y_{i1} = \sum_{j=1}^{n} \bar{K}(x_i, x_j)\,\frac{u_{1j}}{\sqrt{\lambda_1}}, \quad \ldots, \quad y_{ik} = \sum_{j=1}^{n} \bar{K}(x_i, x_j)\,\frac{u_{kj}}{\sqrt{\lambda_k}}$$

$$Y = \bar{K}\,[u_1 \ \ldots \ u_k]\,\Lambda^{-1/2}$$

10 / 28
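A sketch of this projection step, assuming the centered kernel matrix K̄ is already available (its computation from K is derived on the following slides):

```python
import numpy as np

def kernel_pca_project(K_bar, k):
    """Given the centered n x n kernel matrix K_bar, return the n x k embedding Y."""
    eigvals, U = np.linalg.eigh(K_bar)
    idx = np.argsort(eigvals)[::-1][:k]       # top-k eigenpairs of K_bar
    U_k, lam = U[:, idx], eigvals[idx]
    return K_bar @ U_k / np.sqrt(lam)         # Y = K_bar [u1 ... uk] Lambda^{-1/2}
```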
kernel PCA

• How to get K̄ from K? That is, the kernel matrix of the mean-centered feature space
• How to choose K? Which higher-dimensional space?

11 / 28
Obtaining K̄
Centering in Feature Space

$$\mu_\phi = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i); \qquad \phi'(x_i) = \phi(x_i) - \mu_\phi, \quad \forall i \in \{1, 2, \ldots, n\}.$$

The centered kernel matrix K̄ is computed as:

K̄ij = ⟨ϕ′ ( xi ), ϕ′ ( x j )⟩.

Expanding ϕ′(xi) and ϕ′(xj): substitute ϕ′(xi) = ϕ(xi) − µϕ:

K̄ij = ⟨ϕ( xi ) − µϕ , ϕ( x j ) − µϕ ⟩.

12 / 28
Obtaining K̄

Expand the inner product:

K̄ij = ⟨ϕ( xi ), ϕ( x j )⟩ − ⟨ϕ( xi ), µϕ ⟩ − ⟨µϕ , ϕ( x j )⟩ + ⟨µϕ , µϕ ⟩.

• First term is $\langle\phi(x_i), \phi(x_j)\rangle = K_{ij}$
• Second term is $\langle\phi(x_i), \mu_\phi\rangle = \frac{1}{n}\sum_{k=1}^{n}\langle\phi(x_i), \phi(x_k)\rangle = \frac{1}{n}\sum_{k=1}^{n} K_{ik}$
• Third term is $\langle\mu_\phi, \phi(x_j)\rangle = \frac{1}{n}\sum_{k=1}^{n}\langle\phi(x_k), \phi(x_j)\rangle = \frac{1}{n}\sum_{k=1}^{n} K_{kj}$
• Fourth term is $\langle\mu_\phi, \mu_\phi\rangle = \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n}\langle\phi(x_k), \phi(x_l)\rangle = \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} K_{kl}$

12 / 28
Obtaining K̄
Combine these results:
$$\bar{K}_{ij} = K_{ij} - \frac{1}{n}\sum_{k=1}^{n} K_{ik} - \frac{1}{n}\sum_{k=1}^{n} K_{kj} + \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} K_{kl}.$$

$$\bar{K} = K - \frac{1}{n} K \mathbf{1}_n - \frac{1}{n} \mathbf{1}_n K + \frac{1}{n^2} \mathbf{1}_n K \mathbf{1}_n,$$

where $\mathbf{1}_n$ is an n × n matrix with all entries equal to 1.

Simplify further using $H = I_n - \frac{1}{n}\mathbf{1}_n$ (the centering matrix):

K̄ = HKH.

12 / 28
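The centering step K̄ = HKH in a few lines of numpy (a sketch; `center_kernel` is an illustrative name):

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: K_bar = H K H with H = I - (1/n) 1_n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix H
    return H @ K @ H
```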
Choice of K

• ϕ( X )ϕ( X )T leads to a symmetric and positive semidefinite K


• Any K which is symmetric and positive semidefinite will have a corresponding ϕ(X)
(Mercer’s Theorem)

The Radial Basis Function (RBF) kernel:

$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

13 / 28
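A sketch of building the RBF kernel matrix for a data matrix X (the vectorized pairwise-distance computation below is one of several possible implementations, not the lecture's code):

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2)) for all pairs of rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # n x n squared distances
    return np.exp(-sq_dists / (2 * sigma ** 2))
```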
Choice of K

What ϕ leads to the RBF kernel?

13 / 28
Implementation

kernelPCA code

Summary: Given a Gram matrix, i.e., K or XXᵀ (a linear kernel), we can compute the data in lower dimensions without access to X.

14 / 28
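The linked notebook is not reproduced here; as a usage illustration, scikit-learn's `KernelPCA` carries out the same computation (the dataset and the `gamma` value below are arbitrary choices, with `gamma` playing the role of 1/(2σ²)):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

X, _ = make_swiss_roll(n_samples=1000, noise=0.05)
Z = KernelPCA(n_components=2, kernel="rbf", gamma=0.01).fit_transform(X)   # n x 2 embedding
```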
Unrolling Swiss Roll

15 / 28
Multidimensional Scaling

16 / 28
Multidimensional Scaling (MDS)

• We have X in n × d (High dimension)


• We need to go to Y in n × k (Low dimension)

$$\min_{Y,\ \mathrm{rank}(Y) \le k} \|D_X - D_Y\|_F^2$$

• DX (ij): distance between xi and x j


• DY (ij): distance between yi and y j
“Preserving the distances after transformation”

17 / 28
Classical Multidimensional Scaling

The given distances are Euclidean


• $D_X(ij) = d_{ij}^2 = \|x_i - x_j\|_2^2$
• $d_{ij}^2 = \|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\, x_i x_j^T$ (Eq. 1)
• Can we get the kernel (Gram matrix) from the distances?
▶ Non-unique
▶ Assume translational invariance ∑i xi = 0
• From Gram matrix we can then go to optimal low rank representation

18 / 28
Classical MDS
Solve the following 3 equations for $x_i x_j^T$ (using $\sum_i x_i = 0$); they are obtained by summing Eq. 1 over $i$, over $j$, and over both:

$$\|x_j\|^2 = \frac{1}{n}\left(\sum_i d_{ij}^2 - \sum_i \|x_i\|^2\right) = \frac{1}{n}\left(\sum_i d_{ij}^2 - \frac{1}{2n}\sum_{ij} d_{ij}^2\right)$$

$$\|x_i\|^2 = \frac{1}{n}\left(\sum_j d_{ij}^2 - \sum_j \|x_j\|^2\right) = \frac{1}{n}\left(\sum_j d_{ij}^2 - \frac{1}{2n}\sum_{ij} d_{ij}^2\right)$$

$$\sum_i \sum_j d_{ij}^2 = 2n \sum_k \|x_k\|^2$$

Substituting the above in Eq. 1, we have

$$x_i x_j^T = \frac{1}{2}\left(\|x_i\|^2 + \|x_j\|^2 - d_{ij}^2\right)$$

19 / 28
Classical MDS

Using these, the centered Gram matrix K is:

$$K_{ij} = x_i x_j^T = -\frac{1}{2}\left(d_{ij}^2 - \frac{1}{n}\sum_i d_{ij}^2 - \frac{1}{n}\sum_j d_{ij}^2 + \frac{1}{n^2}\sum_{ij} d_{ij}^2\right).$$

Let $\mathbf{1}_n$ be an n × n matrix of ones:

$$K = -\frac{1}{2} H D_X H,$$

where $H = I_n - \frac{1}{n}\mathbf{1}_n$ is the centering matrix

20 / 28
Classical MDS

Complete Algorithm
• We are given DX
• Compute $K = -\frac{1}{2} H D_X H$
• Find the top k eigenvectors $V = [v_1 \ \ldots \ v_k]$ of K
• The new data Y would be

$$Y = K V \Lambda^{-1/2} = V \Lambda^{1/2} \qquad \left(\text{since } K \frac{v_i}{\sqrt{\lambda_i}} = \lambda_i \frac{v_i}{\sqrt{\lambda_i}}\right)$$

21 / 28
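A numpy sketch of the complete algorithm, assuming the input is the n × n matrix of squared Euclidean distances as defined earlier (the function name is illustrative):

```python
import numpy as np

def classical_mds(D_sq, k):
    """Embed points in R^k given the n x n matrix D_sq of squared Euclidean distances."""
    n = D_sq.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    K = -0.5 * H @ D_sq @ H                   # Gram matrix K = -1/2 H D_X H
    eigvals, V = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:k]       # top-k eigenpairs
    lam, V_k = eigvals[idx], V[:, idx]
    return V_k * np.sqrt(np.maximum(lam, 0))  # Y = V Lambda^{1/2} (clip tiny negatives)
```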
Metric MDS

For Swiss roll data, Euclidean distance is not good!


• D is any general distance metric
▶ d( x, x ) = 0
▶ Non-negative: d( x, y) ≥ 0
▶ Symmetry: d( x, y) = d(y, x )
▶ Triangle Inequality: d( x, z) ≤ d( x, y) + d(y, z)
• Examples?

22 / 28
Metric MDS

• A non-Euclidean D will not recover the exact embedding :(


• How good is the embedding found?

$$\text{stress}(y) = \frac{\sum_{ij}\left(d_{ij} - \|y_i - y_j\|\right)^2}{\sum_{ij} d_{ij}^2}$$

• Find y's that “minimize stress” :)

23 / 28
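A small sketch of the stress computation, assuming Y holds the embedded points as rows and D the target pairwise distances (names are illustrative):

```python
import numpy as np

def stress(Y, D):
    """Normalized stress between target distances D (n x n) and the embedding Y (n x k)."""
    D_Y = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)   # pairwise embedding distances
    return np.sum((D - D_Y) ** 2) / np.sum(D ** 2)
```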
ISOMAP

What D should we use for a Swiss-roll-like dataset?


Geodesic distance

24 / 28
ISOMAP

Given the data X, which lies in some high-dimensional space on a low-dimensional manifold.
• Is the manifold known?
• Then how to compute geodesic distance?

25 / 28
ISOMAP

Exploit “locally Euclidean”


• Given a point x
• Find the neighbors of x (ϵ-ball)
• Build the nearest-neighbor graph, with edge weights given by the Euclidean distances

26 / 28
ISOMAP

How to find the distance DX(ij)?


• DX(ij) is the shortest-path distance between i and j on the nearest-neighbor graph (Dijkstra)

27 / 28
ISOMAP

ISOMAP Algorithm
• Given: pairwise distances dij between high-dimensional input points xi , xj
• Compute the nearest-neighbor graph G using an ϵ-ball
• Compute DX from G
• Apply MDS on DX to obtain the low-dimensional Y (sketched below)

27 / 28
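A sketch of the full pipeline: an ϵ-ball neighbor graph, Dijkstra shortest paths for DX, then classical MDS on the geodesic distances. The helper name and the assumption that `eps` is large enough to make the graph connected are ours, not the lecture's:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import radius_neighbors_graph

def isomap(X, eps, k):
    """ISOMAP sketch: eps-ball neighbor graph -> geodesic distances -> classical MDS."""
    G = radius_neighbors_graph(X, radius=eps, mode="distance")   # edges weighted by Euclidean distance
    D_geo = shortest_path(G, method="D", directed=False)         # Dijkstra; assumes a connected graph
    n = D_geo.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ (D_geo ** 2) @ H                              # classical MDS on geodesic distances
    eigvals, V = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:k]
    return V[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```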
Implementation ISOMAP

Notebook

28 / 28
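The notebook itself is not reproduced here; for comparison, scikit-learn's `Isomap` provides the same pipeline, though it builds a k-nearest-neighbor graph rather than the ϵ-ball graph described above (parameter values are illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, noise=0.05)
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)   # unrolled 2-D embedding
```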
