
CS303: Mathematical Foundations for AI

Nonlinear Dimensionality Reduction


23 Jan 2025
Recap

• Recap
▶ Principal Component Analysis (PCA)
▶ Linear Discriminant Analysis
• kernel PCA
• Multidimensional Scaling (MDS)
• Isometric Mapping (ISOMAP)

1 / 28
References

• kernel PCA
▶ PCA and Fisher’s Discriminant Analysis – Bishop, Christopher M., and Nasser M. Nasrabadi. Pattern Recognition and Machine Learning. Vol. 4, No. 4. New York: Springer, 2006.
▶ kernel PCA
▶ kernel Matrix
• MDS
▶ Video Lecture
▶ Slides
• ISOMAP
▶ Original Paper
▶ Video Lecture
▶ Slides

2 / 28
Manifold

3 / 28
Manifold

Locally Euclidean (flat)

4 / 28
PCA on Swiss Roll Dataset

PCA Fails

4 / 28
Recall PCA

• X is the n × d data matrix


• Center the data (i.e., mean normalize) to obtain X̄
• X̄ T X̄ is the covariance matrix (d × d)
• Compute the top k eigenvectors v1 , . . . , vk of X̄ T X̄
• The new data X′ = X̄ [v1 . . . vk], which is n × k (see the sketch below)
5 / 28
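A minimal numpy sketch of the recipe above (function and variable names are illustrative, not from the lecture):

```python
import numpy as np

def pca(X, k):
    """Project the n x d data matrix X onto its top-k principal components."""
    X_bar = X - X.mean(axis=0)              # center the data (mean-normalize)
    C = X_bar.T @ X_bar                     # d x d scatter / covariance matrix
    eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
    V = eigvecs[:, ::-1][:, :k]             # top-k eigenvectors v1 ... vk
    return X_bar @ V                        # new data X' = X_bar [v1 ... vk], n x k
```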
Recall PCA for Wide Matrices
We have X̄ with n << d
• We compute X̄X̄ᵀ, which is n × n; let K = X̄X̄ᵀ
• Compute the top k eigenvectors of X̄X̄ᵀ, denoted u1 , . . . , uk
• The normalized eigenvectors vi of X̄ᵀX̄ are

$$v_i = \frac{1}{\sqrt{\lambda_i}}\, \bar{X}^T u_i$$

Proof.

$$\bar{X}\bar{X}^T u_i = \lambda_i u_i$$

$$\bar{X}^T\bar{X}\,(\bar{X}^T u_i) = \lambda_i\,(\bar{X}^T u_i) \quad \text{(multiply both sides by } \bar{X}^T\text{)}$$

$$v_i = \frac{\bar{X}^T u_i}{\|\bar{X}^T u_i\|}, \qquad \|\bar{X}^T u_i\|^2 = u_i^T \bar{X}\bar{X}^T u_i = \lambda_i$$

6 / 28
Recall PCA for Wide Matrices
We have X̄ with n << d
• We compute X̄X̄ᵀ; let K = X̄X̄ᵀ
• Compute the top k eigenvectors of X̄X̄ᵀ, denoted u1 , . . . , uk
• The eigenvectors vi of X̄ᵀX̄ are

$$v_i = \frac{1}{\sqrt{\lambda_i}}\, \bar{X}^T u_i$$

• The new data X′, i.e., n × k, is given by

$$\bar{X}\,[v_1 \ \ldots \ v_k] = \bar{X}\bar{X}^T [u_1 \ \ldots \ u_k]\,\Lambda^{-1/2} = \bar{K}\,[u_1 \ \ldots \ u_k]\,\Lambda^{-1/2}$$

6 / 28
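A sketch of this wide-matrix route (an illustrative helper, assuming the data is centered inside the function); it recovers the same n × k projection from the n × n matrix K = X̄X̄ᵀ without ever forming the d × d covariance:

```python
import numpy as np

def pca_wide(X, k):
    """PCA for n << d via the n x n matrix K = X_bar X_bar^T (no d x d covariance needed)."""
    X_bar = X - X.mean(axis=0)
    K = X_bar @ X_bar.T                       # n x n Gram matrix of the centered data
    eigvals, U = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:k]       # indices of the top-k eigenpairs
    U_k, lam = U[:, idx], eigvals[idx]
    # X' = X_bar [v1 ... vk] = K [u1 ... uk] Lambda^{-1/2}
    return K @ U_k / np.sqrt(lam)
```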
kernel PCA

The projected data X′, i.e., n × k, is given by

$$\bar{K}\,[u_1 \ \ldots \ u_k]\,\Lambda^{-1/2}$$

where the ui's are the eigenvectors of K̄.

Note: we do not need to know X at all; all we need is the kernel matrix K (the inner products between all pairs of data points).

“kernel trick”

7 / 28
kernel PCA

8 / 28
kernel PCA

• Project data x to a higher-dimensional space ϕ(x)

• To reduce dimensions we are going to “higher dimensions”?
• Note that we will never need ϕ(x) (a row vector), just like we did not need X but only K̄
• All we need is the following n × n matrix

$$K = \phi(X)\phi(X)^T = \begin{bmatrix} \phi(x_1)\phi(x_1)^T & \cdots & \phi(x_1)\phi(x_n)^T \\ \vdots & \ddots & \vdots \\ \phi(x_n)\phi(x_1)^T & \cdots & \phi(x_n)\phi(x_n)^T \end{bmatrix}$$

• Note that the above is symmetric and positive semi-definite

9 / 28
kernel PCA

Given X,
• We need K̄
• Compute the top k eigenvectors of K̄, u1 , . . . , uk
• Given an input xi, the output with reduced dimensions is yi = (yi1 , . . . , yik ), where

$$y_{i1} = \sum_{j=1}^{n} \bar{K}(x_i, x_j)\,\frac{u_{1j}}{\sqrt{\lambda_1}}, \quad \ldots, \quad y_{ik} = \sum_{j=1}^{n} \bar{K}(x_i, x_j)\,\frac{u_{kj}}{\sqrt{\lambda_k}}$$

$$Y = \bar{K}\,[u_1 \ \ldots \ u_k]\,\Lambda^{-1/2}$$

10 / 28
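A sketch of this projection step, assuming the centered kernel matrix K̄ is already available (its computation from K is derived on the following slides):

```python
import numpy as np

def kernel_pca_project(K_bar, k):
    """Given the centered n x n kernel matrix K_bar, return the n x k embedding Y."""
    eigvals, U = np.linalg.eigh(K_bar)
    idx = np.argsort(eigvals)[::-1][:k]       # top-k eigenpairs of K_bar
    U_k, lam = U[:, idx], eigvals[idx]
    return K_bar @ U_k / np.sqrt(lam)         # Y = K_bar [u1 ... uk] Lambda^{-1/2}
```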
kernel PCA

• How to get K̄ from K? That is, the kernel matrix of the mean-centered feature space
• How to choose K? Which higher-dimensional space?

11 / 28
Obtaining K̄
Centering in Feature Space

$$\mu_\phi = \frac{1}{n}\sum_{i=1}^{n} \phi(x_i); \qquad \phi'(x_i) = \phi(x_i) - \mu_\phi, \quad \forall i \in \{1, 2, \ldots, n\}.$$

The centered kernel matrix K̄ is computed as:

K̄ij = ⟨ϕ′ ( xi ), ϕ′ ( x j )⟩.

Expanding ϕ′(xi) and ϕ′(xj): substitute ϕ′(xi) = ϕ(xi) − µϕ:

K̄ij = ⟨ϕ( xi ) − µϕ , ϕ( x j ) − µϕ ⟩.

12 / 28
Obtaining K̄

Expand the inner product:

K̄ij = ⟨ϕ( xi ), ϕ( x j )⟩ − ⟨ϕ( xi ), µϕ ⟩ − ⟨µϕ , ϕ( x j )⟩ + ⟨µϕ , µϕ ⟩.

• First term is $\langle\phi(x_i), \phi(x_j)\rangle = K_{ij}$
• Second term is $\langle\phi(x_i), \mu_\phi\rangle = \frac{1}{n}\sum_{k=1}^{n}\langle\phi(x_i), \phi(x_k)\rangle = \frac{1}{n}\sum_{k=1}^{n} K_{ik}$
• Third term is $\langle\mu_\phi, \phi(x_j)\rangle = \frac{1}{n}\sum_{k=1}^{n}\langle\phi(x_k), \phi(x_j)\rangle = \frac{1}{n}\sum_{k=1}^{n} K_{kj}$
• Fourth term is $\langle\mu_\phi, \mu_\phi\rangle = \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n}\langle\phi(x_k), \phi(x_l)\rangle = \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} K_{kl}$

12 / 28
Obtaining K̄
Combine these results:
$$\bar{K}_{ij} = K_{ij} - \frac{1}{n}\sum_{k=1}^{n} K_{ik} - \frac{1}{n}\sum_{k=1}^{n} K_{kj} + \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} K_{kl}.$$

$$\bar{K} = K - \frac{1}{n} K \mathbf{1}_n - \frac{1}{n} \mathbf{1}_n K + \frac{1}{n^2} \mathbf{1}_n K \mathbf{1}_n,$$

where $\mathbf{1}_n$ is an n × n matrix with all entries equal to 1.

Simplify further using $H = I_n - \frac{1}{n}\mathbf{1}_n$ (the centering matrix):

K̄ = HKH.

12 / 28
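The centering step K̄ = HKH in a few lines of numpy (a sketch; `center_kernel` is an illustrative name):

```python
import numpy as np

def center_kernel(K):
    """Center a kernel matrix in feature space: K_bar = H K H with H = I - (1/n) 1_n."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix H
    return H @ K @ H
```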
Choice of K

• ϕ( X )ϕ( X )T leads to a symmetric and positive semidefinite K


• Any K which is symmetric and positive semidefinite will have a corresponding ϕ(X)
(Mercer’s Theorem)

The Radial Basis Function (RBF) kernel:

$$K(x_i, x_j) = \exp\!\left(-\frac{\|x_i - x_j\|^2}{2\sigma^2}\right)$$

13 / 28
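A sketch of building the RBF kernel matrix for a data matrix X (the vectorized pairwise-distance computation below is one of several possible implementations, not the lecture's code):

```python
import numpy as np

def rbf_kernel(X, sigma=1.0):
    """K(xi, xj) = exp(-||xi - xj||^2 / (2 sigma^2)) for all pairs of rows of X."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)   # n x n squared distances
    return np.exp(-sq_dists / (2 * sigma ** 2))
```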
Choice of K

What ϕ leads to the RBF kernel?

13 / 28
Implementation

kernelPCA code

Summary: Given a Gram matrix, i.e., K or XXᵀ (a linear kernel), we can compute the data in lower dimensions without access to X.

14 / 28
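The linked notebook is not reproduced here; as a usage illustration, scikit-learn's `KernelPCA` carries out the same computation (the dataset and the `gamma` value below are arbitrary choices, with `gamma` playing the role of 1/(2σ²)):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

X, _ = make_swiss_roll(n_samples=1000, noise=0.05)
Z = KernelPCA(n_components=2, kernel="rbf", gamma=0.01).fit_transform(X)   # n x 2 embedding
```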
Unrolling Swiss Roll

15 / 28
Multidimensional Scaling

16 / 28
Multidimensional Scaling (MDS)

• We have X in n × d (High dimension)


• We need to go to Y in n × k (Low dimension)

$$\min_{Y,\ \mathrm{rank}(Y) \le k} \|D_X - D_Y\|_F^2$$

• DX (ij): distance between xi and x j


• DY (ij): distance between yi and y j
“Preserving the distances after transformation”

17 / 28
Classical Multidimensional Scaling

The given distances are Euclidean


• $D_X(ij) = d_{ij}^2 = \|x_i - x_j\|_2^2$
• $d_{ij}^2 = \|x_i - x_j\|^2 = \|x_i\|^2 + \|x_j\|^2 - 2\, x_i x_j^T$ (Eq. 1)
• Can we get the kernel (Gram matrix) from the distances?
▶ Non-unique
▶ Assume translational invariance ∑i xi = 0
• From Gram matrix we can then go to optimal low rank representation

18 / 28
Classical MDS
Solve the following 3 equations for $x_i x_j^T$ (using $\sum_i x_i = 0$); they are obtained by summing Eq. 1 over $i$, over $j$, and over both:

$$\|x_j\|^2 = \frac{1}{n}\left(\sum_i d_{ij}^2 - \sum_i \|x_i\|^2\right) = \frac{1}{n}\left(\sum_i d_{ij}^2 - \frac{1}{2n}\sum_{ij} d_{ij}^2\right)$$

$$\|x_i\|^2 = \frac{1}{n}\left(\sum_j d_{ij}^2 - \sum_j \|x_j\|^2\right) = \frac{1}{n}\left(\sum_j d_{ij}^2 - \frac{1}{2n}\sum_{ij} d_{ij}^2\right)$$

$$\sum_i \sum_j d_{ij}^2 = 2n \sum_k \|x_k\|^2$$

Substituting the above in Eq. 1, we have

$$x_i x_j^T = \frac{1}{2}\left(\|x_i\|^2 + \|x_j\|^2 - d_{ij}^2\right)$$

19 / 28
Classical MDS

Using these, the centered Gram matrix K is:

$$K_{ij} = x_i x_j^T = -\frac{1}{2}\left(d_{ij}^2 - \frac{1}{n}\sum_i d_{ij}^2 - \frac{1}{n}\sum_j d_{ij}^2 + \frac{1}{n^2}\sum_{ij} d_{ij}^2\right).$$

Let $\mathbf{1}_n$ be an n × n matrix of ones:

$$K = -\frac{1}{2} H D_X H,$$

where $H = I_n - \frac{1}{n}\mathbf{1}_n$ is the centering matrix

20 / 28
Classical MDS

Complete Algorithm
• We are given DX
• Compute $K = -\frac{1}{2} H D_X H$
• Find the top k eigenvectors $V = [v_1 \ \ldots \ v_k]$ of K
• The new data Y would be

$$Y = K V \Lambda^{-1/2} = V \Lambda^{1/2} \qquad \left(\text{since } K \frac{v_i}{\sqrt{\lambda_i}} = \lambda_i \frac{v_i}{\sqrt{\lambda_i}}\right)$$

21 / 28
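A numpy sketch of the complete algorithm, assuming the input is the n × n matrix of squared Euclidean distances as defined earlier (the function name is illustrative):

```python
import numpy as np

def classical_mds(D_sq, k):
    """Embed points in R^k given the n x n matrix D_sq of squared Euclidean distances."""
    n = D_sq.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    K = -0.5 * H @ D_sq @ H                   # Gram matrix K = -1/2 H D_X H
    eigvals, V = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:k]       # top-k eigenpairs
    lam, V_k = eigvals[idx], V[:, idx]
    return V_k * np.sqrt(np.maximum(lam, 0))  # Y = V Lambda^{1/2} (clip tiny negatives)
```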
Metric MDS

For Swiss roll data, Euclidean distance is not good!


• D is any general distance metric
▶ d( x, x ) = 0
▶ Non-negative: d( x, y) ≥ 0
▶ Symmetry: d( x, y) = d(y, x )
▶ Triangle Inequality: d( x, z) ≤ d( x, y) + d(y, z)
• Examples?

22 / 28
Metric MDS

• A non-Euclidean D will not recover the exact embedding :(


• How good is the embedding found?

$$\text{stress}(y) = \frac{\sum_{ij}\left(d_{ij} - \|y_i - y_j\|\right)^2}{\sum_{ij} d_{ij}^2}$$

• Find y's that “minimize stress” :)

23 / 28
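A small sketch of the stress computation, assuming Y holds the embedded points as rows and D the target pairwise distances (names are illustrative):

```python
import numpy as np

def stress(Y, D):
    """Normalized stress between target distances D (n x n) and the embedding Y (n x k)."""
    D_Y = np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=-1)   # pairwise embedding distances
    return np.sum((D - D_Y) ** 2) / np.sum(D ** 2)
```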
ISOMAP

What D should we use for a Swiss-roll-like dataset?


Geodesic distance

24 / 28
ISOMAP

Given the data X, which lies in some high-dimensional space on a low-dimensional manifold.
• Is the manifold known?
• Then how to compute geodesic distance?

25 / 28
ISOMAP

Exploit “locally Euclidean”


• Given a point x
• Find the neighbors of x (ϵ-ball)
• Build the nearest-neighbor graph, with edge weights given by the Euclidean distances

26 / 28
ISOMAP

How to find the distance DX(ij)?


• DX(ij) is the shortest-path distance between i and j on the nearest-neighbor graph (Dijkstra)

27 / 28
ISOMAP

ISOMAP Algorithm
• Given: pairwise distances dij between high-dimensional input points xi , xj
• Compute the nearest-neighbor graph G using an ϵ-ball
• Compute DX from G
• Apply MDS on DX to obtain the low-dimensional Y (sketched below)

27 / 28
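A sketch of the full pipeline: an ϵ-ball neighbor graph, Dijkstra shortest paths for DX, then classical MDS on the geodesic distances. The helper name and the assumption that `eps` is large enough to make the graph connected are ours, not the lecture's:

```python
import numpy as np
from scipy.sparse.csgraph import shortest_path
from sklearn.neighbors import radius_neighbors_graph

def isomap(X, eps, k):
    """ISOMAP sketch: eps-ball neighbor graph -> geodesic distances -> classical MDS."""
    G = radius_neighbors_graph(X, radius=eps, mode="distance")   # edges weighted by Euclidean distance
    D_geo = shortest_path(G, method="D", directed=False)         # Dijkstra; assumes a connected graph
    n = D_geo.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = -0.5 * H @ (D_geo ** 2) @ H                              # classical MDS on geodesic distances
    eigvals, V = np.linalg.eigh(K)
    idx = np.argsort(eigvals)[::-1][:k]
    return V[:, idx] * np.sqrt(np.maximum(eigvals[idx], 0))
```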
Implementation ISOMAP

Notebook

28 / 28
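The notebook itself is not reproduced here; for comparison, scikit-learn's `Isomap` provides the same pipeline, though it builds a k-nearest-neighbor graph rather than the ϵ-ball graph described above (parameter values are illustrative):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, _ = make_swiss_roll(n_samples=1000, noise=0.05)
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)   # unrolled 2-D embedding
```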
