Lecture 6
• Recap
▶ Principal Component Analysis (PCA)
▶ Linear Discriminant Analysis
• kernel PCA
• Multidimensional Scaling (MDS)
• Isometric Mapping (ISOMAP)
References
• kernel PCA
▶ PCA and Fisher’s Discriminant Analysis – Bishop, Christopher M. Pattern Recognition and Machine Learning. New York: Springer, 2006.
▶ kernel PCA
▶ kernel Matrix
• MDS
▶ Video Lecture
▶ Slides
• ISOMAP
▶ Original Paper
▶ Video Lecture
▶ Slides
Manifold
PCA on Swiss Roll Dataset
PCA fails here: a linear projection cannot unroll the Swiss roll, so points that are far apart along the manifold can land close together after projection.
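A minimal sketch that reproduces this effect, assuming scikit-learn's `make_swiss_roll` and `PCA` (the sample size and noise level are illustrative choices, not from the slides):

```python
import numpy as np
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import PCA

# Sample points from the Swiss roll; t is the position along the roll (the manifold coordinate).
X, t = make_swiss_roll(n_samples=1500, noise=0.05, random_state=0)

# Project to 2 dimensions with (linear) PCA.
Y = PCA(n_components=2).fit_transform(X)

# Points with very different manifold coordinates t can land close together in Y,
# because PCA applies a single linear projection to the whole cloud.
print(Y.shape)  # (1500, 2)
```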
Recall PCA
Recall PCA for Wide Matrices
We have X̄ with n ≪ d.
• We compute $\bar{X}\bar{X}^T$, which is n × n; let $K = \bar{X}\bar{X}^T$.
• Compute the top k eigenvectors of $\bar{X}\bar{X}^T$, denoted $u_1, \ldots, u_k$. Then

$$v_i = \frac{1}{\sqrt{\lambda_i}}\,\bar{X}^T u_i$$

Proof.

$$\bar{X}\bar{X}^T u_i = \lambda_i u_i$$

$$\bar{X}^T\bar{X}\,(\bar{X}^T u_i) = \lambda_i\,(\bar{X}^T u_i) \qquad \text{(multiply both sides by } \bar{X}^T \text{ on the left)}$$

$$v_i = \frac{\bar{X}^T u_i}{\|\bar{X}^T u_i\|} \qquad \text{(and } \|\bar{X}^T u_i\|^2 = u_i^T\bar{X}\bar{X}^T u_i = \lambda_i\text{)}$$
Stacking the $v_i$ as columns, the projections of the data are

$$\bar{X}\begin{bmatrix} v_1 & \cdots & v_k \end{bmatrix} = \bar{X}\bar{X}^T\begin{bmatrix} u_1 & \cdots & u_k \end{bmatrix}\Lambda^{-1/2} = \bar{K}\begin{bmatrix} u_1 & \cdots & u_k \end{bmatrix}\Lambda^{-1/2}$$
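A minimal NumPy sketch of this computation (the names, sizes, and random data are illustrative, not from the slides); the projections follow $Y = \bar{K}\,U\,\Lambda^{-1/2}$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 50, 1000, 2                        # n << d (wide matrix)
X = rng.normal(size=(n, d))
Xbar = X - X.mean(axis=0)                    # mean-centered data, one row per point

K = Xbar @ Xbar.T                            # n x n Gram matrix
lam, U = np.linalg.eigh(K)                   # eigenvalues in ascending order
lam, U = lam[::-1][:k], U[:, ::-1][:, :k]    # keep the top-k eigenpairs

V = Xbar.T @ U / np.sqrt(lam)                # v_i = Xbar^T u_i / sqrt(lambda_i)
Y = K @ U / np.sqrt(lam)                     # projections: Y = K U Lambda^{-1/2}

print(np.allclose(Y, Xbar @ V))              # True: both routes give the same projections
```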
kernel PCA
$$Y = \bar{K}\begin{bmatrix} u_1 & \cdots & u_k \end{bmatrix}\Lambda^{-1/2}$$

Note: we do not need to know X at all; all we need is the kernel matrix K (the inner products between all pairs of data points). This is the “kernel trick”.
kernel PCA
$$K = \phi(X)\phi(X)^T = \begin{bmatrix} \phi(x_1)\phi(x_1)^T & \cdots & \phi(x_1)\phi(x_n)^T \\ \vdots & \ddots & \vdots \\ \phi(x_n)\phi(x_1)^T & \cdots & \phi(x_n)\phi(x_n)^T \end{bmatrix}$$
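For concreteness, a small sketch of building such a kernel matrix without ever forming ϕ(x) explicitly, using the RBF kernel as one common choice (the function name and bandwidth are illustrative assumptions, not fixed by the slides):

```python
import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2): inner products in an (implicit) feature space."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))   # clamp tiny negatives from round-off

X = np.random.default_rng(0).normal(size=(5, 3))
K = rbf_kernel_matrix(X)
print(K.shape, np.allclose(K, K.T))  # (5, 5) True -- symmetric Gram matrix
```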
kernel PCA
Given X:
• We need K̄.
• Compute the top k eigenvectors of K̄: $u_1, \ldots, u_k$.
• Given an input $x_i$, the output with reduced dimensions is $y_i = (y_{i1}, \ldots, y_{ik})$, where

$$y_{i1} = \sum_{j=1}^{n} \bar{K}(x_i, x_j)\,\frac{u_{1j}}{\sqrt{\lambda_1}}, \qquad \ldots, \qquad y_{ik} = \sum_{j=1}^{n} \bar{K}(x_i, x_j)\,\frac{u_{kj}}{\sqrt{\lambda_k}}$$

In matrix form,

$$Y = \bar{K}\begin{bmatrix} u_1 & \cdots & u_k \end{bmatrix}\Lambda^{-1/2}$$
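A minimal NumPy sketch of these steps, assuming the centered kernel matrix `Kbar` is already available (how to obtain it from K is covered on the next slides):

```python
import numpy as np

def kernel_pca_embed(Kbar, k):
    """Return Y = Kbar [u_1 ... u_k] Lambda^{-1/2} for the top-k eigenpairs of Kbar."""
    lam, U = np.linalg.eigh(Kbar)                 # eigenvalues in ascending order
    lam, U = lam[::-1][:k], U[:, ::-1][:, :k]     # top-k eigenpairs
    return Kbar @ U / np.sqrt(lam)                # y_il = sum_j Kbar[i, j] * u_lj / sqrt(lambda_l)

# Toy example: a symmetric PSD matrix standing in for a centered kernel (5 points, k = 2).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
Y = kernel_pca_embed(A @ A.T, k=2)
print(Y.shape)  # (5, 2)
```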
kernel PCA
• How do we get K̄ from K, i.e., the kernel matrix of the mean-centered feature space?
• How do we choose K? Which higher-dimensional space should we map to?
Obtaining K̄
Centering in Feature Space
$$\mu_\phi = \frac{1}{n}\sum_{i=1}^{n}\phi(x_i); \qquad \phi'(x_i) = \phi(x_i) - \mu_\phi, \;\; \forall i \in \{1, 2, \ldots, n\}.$$

$$\bar{K}_{ij} = \langle \phi(x_i) - \mu_\phi,\; \phi(x_j) - \mu_\phi \rangle.$$
Combine these results:

$$\bar{K}_{ij} = K_{ij} - \frac{1}{n}\sum_{k=1}^{n} K_{ik} - \frac{1}{n}\sum_{k=1}^{n} K_{kj} + \frac{1}{n^2}\sum_{k=1}^{n}\sum_{l=1}^{n} K_{kl}.$$

In matrix form,

$$\bar{K} = K - \frac{1}{n} K \mathbf{1}_n - \frac{1}{n}\mathbf{1}_n K + \frac{1}{n^2}\mathbf{1}_n K \mathbf{1}_n,$$

where $\mathbf{1}_n$ is the n × n matrix with all entries equal to 1. Equivalently, with the centering matrix $H = I - \frac{1}{n}\mathbf{1}_n$,

$$\bar{K} = HKH.$$
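A small NumPy sketch of this double-centering step, transcribing the formula above:

```python
import numpy as np

def center_kernel(K):
    """Return Kbar = H K H with H = I - (1/n) * ones((n, n))."""
    n = K.shape[0]
    ones = np.full((n, n), 1.0 / n)
    return K - ones @ K - K @ ones + ones @ K @ ones

# Sanity check: centering a linear kernel equals the kernel of mean-centered data.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 4))
Kbar = center_kernel(X @ X.T)
Xc = X - X.mean(axis=0)
print(np.allclose(Kbar, Xc @ Xc.T))  # True
```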
Choice of K
Implementation
kernelPCA code
Summary: given a Gram matrix, i.e., K or $XX^T$ (linear kernel), we can compute a lower-dimensional representation of the data without access to X. A short scikit-learn sketch follows.
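The linked notebook is not reproduced here; as a stand-in, a short scikit-learn sketch of the same idea (the RBF kernel and its parameters are illustrative choices):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.decomposition import KernelPCA

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

# Kernel PCA with an RBF kernel; gamma controls the kernel width.
Y = KernelPCA(n_components=2, kernel="rbf", gamma=0.01).fit_transform(X)
print(Y.shape)  # (1000, 2)
```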
Unrolling Swiss Roll
Multidimensional Scaling (MDS)
Classical Multidimensional Scaling
Classical MDS
Solve the following three equations for $x_i x_j^T$, using $\sum_i x_i = 0$ and Eq. 1, $d_{ij}^2 = \|x_i\|^2 + \|x_j\|^2 - 2\,x_i x_j^T$:

$$\sum_i d_{ij}^2 = \sum_i \|x_i\|^2 + n\|x_j\|^2 \;\;\Longrightarrow\;\; \|x_j\|^2 = \frac{1}{n}\sum_i d_{ij}^2 - \frac{1}{2n^2}\sum_{i,j} d_{ij}^2$$

$$\sum_j d_{ij}^2 = n\|x_i\|^2 + \sum_j \|x_j\|^2 \;\;\Longrightarrow\;\; \|x_i\|^2 = \frac{1}{n}\sum_j d_{ij}^2 - \frac{1}{2n^2}\sum_{i,j} d_{ij}^2$$

$$\sum_i \sum_j d_{ij}^2 = 2n\sum_k \|x_k\|^2$$

Rearranging Eq. 1,

$$x_i x_j^T = \frac{1}{2}\left(\|x_i\|^2 + \|x_j\|^2 - d_{ij}^2\right),$$

and substituting the expressions above for $\|x_i\|^2$ and $\|x_j\|^2$ expresses every entry $x_i x_j^T$ purely in terms of the pairwise distances.
Classical MDS
Complete Algorithm
• We are given $D_X$, the matrix of squared pairwise distances $d_{ij}^2$
• Compute $K = -\frac{1}{2} H D_X H$
• Find the top k eigenvectors $V = \begin{bmatrix} v_1 & \cdots & v_k \end{bmatrix}$ of K, with eigenvalues $\Lambda = \mathrm{diag}(\lambda_1, \ldots, \lambda_k)$
• The low-dimensional embedding is $Y = V\Lambda^{1/2}$ (sketched below)
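A compact NumPy sketch of this algorithm, assuming `D` holds squared pairwise distances:

```python
import numpy as np

def classical_mds(D, k):
    """Classical MDS: D is the n x n matrix of *squared* pairwise distances."""
    n = D.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)      # centering matrix
    K = -0.5 * H @ D @ H                          # recovered Gram matrix
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]     # top-k eigenpairs
    return V * np.sqrt(np.maximum(lam, 0.0))      # Y = V Lambda^{1/2}

# Round-trip check: the distances of a known point set are reproduced (up to rotation).
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 3))
D = np.sum((X[:, None, :] - X[None, :, :])**2, axis=-1)
Y = classical_mds(D, k=3)
D_hat = np.sum((Y[:, None, :] - Y[None, :, :])**2, axis=-1)
print(np.allclose(D, D_hat))  # True
```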
Metric MDS
ISOMAP
Given data X that lies in a high-dimensional space but on a low-dimensional manifold:
• Is the manifold known?
• If not, how do we compute geodesic distances along it?
ISOMAP
ISOMAP Algorithm
• Given: pairwise distances $d_{ij}$ between high-dimensional input points $x_i, x_j$
• Compute the nearest-neighbour graph G using an ϵ-ball
• Compute $D_X$ from G: shortest-path distances in G approximate geodesic distances along the manifold
• Apply MDS to $D_X$ to obtain the low-dimensional embedding Y (see the sketch below)
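A sketch of these steps, using an ϵ-ball graph from scikit-learn and shortest paths from SciPy (the parameter names and the error handling are illustrative):

```python
import numpy as np
from sklearn.neighbors import radius_neighbors_graph
from scipy.sparse.csgraph import shortest_path

def isomap(X, eps, k):
    """ISOMAP sketch: eps-ball graph -> graph shortest paths -> classical MDS."""
    # 1. Neighbourhood graph G: edge (i, j) weighted by ||x_i - x_j|| when the distance is < eps.
    G = radius_neighbors_graph(X, radius=eps, mode="distance")
    # 2. D_X: shortest-path distances in G approximate geodesic distances along the manifold.
    D = shortest_path(G, directed=False)
    if np.isinf(D).any():
        raise ValueError("neighbourhood graph is disconnected; increase eps")
    # 3. Classical MDS on the squared geodesic distances (as in the earlier classical_mds sketch).
    n = D.shape[0]
    H = np.eye(n) - np.full((n, n), 1.0 / n)
    K = -0.5 * H @ (D**2) @ H
    lam, V = np.linalg.eigh(K)
    lam, V = lam[::-1][:k], V[:, ::-1][:, :k]
    return V * np.sqrt(np.maximum(lam, 0.0))
```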
Implementation ISOMAP
Notebook
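The notebook itself is not reproduced here; for reference, a one-line alternative with scikit-learn's implementation (which builds a k-nearest-neighbour graph rather than an ϵ-ball):

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import Isomap

X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)
Y = Isomap(n_neighbors=10, n_components=2).fit_transform(X)  # unrolled 2-D embedding
print(Y.shape)  # (1000, 2)
```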