Compiled by Alvin Wan from Professor Benjamin Recht's lecture and Samaneh's discussion.
1 Overview
With clustering, we have several key motivations. It is not trivial to choose an objective to minimize: in PCA, the algorithm was fixed regardless of the objective, but with clustering, different objectives can result in different algorithms. There is no single preferred way to do clustering, but we will explore several popular methods in this note. Here are three approaches to consider:
k-means (quantization)
agglomeration (hierarchy)
spectral (segmentation)
2 K-Means Clustering
In k-means clustering, we segment our data by describing each data point using a centroid $\mu_j$. In other words, $x_i$ is in cluster $j$ if $x_i$ is closer to centroid $j$ than to any other centroid: $\|x_i - \mu_j\| < \|x_i - \mu_{j'}\|$ for all $j' \neq j$. Given the centroids, this is how we assign points to clusters.
The question is now: how do we pick centroids? We have the following optimization problem:
$$\min_{\mu_1, \mu_2, \dots, \mu_k} \; \sum_{i=1}^{n} \min_{1 \le j_i \le k} \|x_i - \mu_{j_i}\|^2$$
($j_i$ is an index.) This is effectively like an SVM, where we are fitting parameters to some loss function. As it turns out, minimizing this cost is NP-hard.
2.1 Lloyd's Algorithm
The following is an instance of alternating minimization. If we fix the cluster assignments, the problem becomes easy: the objective is then a convex function of the means. Conversely, if we fix the means, we can easily assign points to clusters. In the following algorithm, we alternately fix either the cluster assignments or the means and minimize over the other.
1. Initialize $\mu_1, \dots, \mu_k$.
2. Assign each point $x_i$ to its nearest centroid.
3. Recompute each $\mu_j$ as the mean of the points assigned to cluster $j$.
4. If the assignments changed, go back to step 2.
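As a concrete illustration, here is a minimal NumPy sketch of Lloyd's algorithm; the function name, the random initialization, and the stopping rule (stop when the assignments no longer change) are my own choices for the example.

```python
import numpy as np

def lloyds_algorithm(X, k, n_iters=100, seed=0):
    """Minimal sketch of Lloyd's algorithm on data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    # Step 1: initialize centroids by picking k distinct data points at random.
    mu = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    assignments = None
    for _ in range(n_iters):
        # Step 2: assign each point to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # (n, k)
        new_assignments = dists.argmin(axis=1)
        if assignments is not None and np.array_equal(new_assignments, assignments):
            break  # assignments stopped changing, so the centroids are fixed too
        assignments = new_assignments
        # Step 3: recompute each centroid as the mean of the points assigned to it.
        for j in range(k):
            if np.any(assignments == j):
                mu[j] = X[assignments == j].mean(axis=0)
    return mu, assignments
```

Each of the two alternating steps can only decrease the objective $\sum_i \min_j \|x_i - \mu_j\|^2$, so the procedure terminates, though in general only at a local minimum.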
The number of clusters $k$ is in fact a hyper-parameter for this algorithm. How do we initialize the $\mu_i$? We have a few options:
Pick $\mu_1, \mu_2, \dots, \mu_k$ at random.
Initialize using k-means++, as sketched below. (See stronger results by Schulman, Rabani, Swamy, Ostrovsky.)
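For reference, here is a rough sketch of the k-means++ seeding rule, which picks each new centroid with probability proportional to its squared distance from the centroids chosen so far; the function name is a placeholder of mine.

```python
import numpy as np

def kmeans_pp_init(X, k, seed=0):
    """Sketch of k-means++ seeding on data X of shape (n, d)."""
    rng = np.random.default_rng(seed)
    centers = [X[rng.integers(len(X))]]  # first center chosen uniformly at random
    for _ in range(k - 1):
        # Squared distance from each point to its nearest already-chosen center.
        d2 = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        # Sample the next center proportionally to these squared distances.
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers, dtype=float)
```

The resulting centers can then be passed to Lloyd's algorithm in place of the purely random initialization.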
As it turns out, if there exists a good clustering, and we know the number of clusters, this
algorithm is guaranteed to find that clustering.
3 Hierarchical Clustering
Previously, we had a top-down approach, where we chose clusters and then assigned samples to them. Here, we take a bottom-up approach; we form clusters incrementally. We start with clusters of two points, merge the pairs, then the quadruples, etc. This inherently gives us a hierarchy. Let us define one possible distance metric between clusters, called average linkage:
$$d(A, B) = \frac{1}{|A||B|} \sum_{a \in A} \sum_{b \in B} \text{Dist}(a, b)$$
We can also define centroid linkage, where $\mu_A = \frac{1}{|A|} \sum_{a \in A} a$:
$$d(A, B) = \text{Dist}(\mu_A, \mu_B)$$
and max (complete) linkage:
$$d(A, B) = \max\{\text{Dist}(a, b) : a \in A, b \in B\}$$
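A minimal sketch of these three linkage rules, using Euclidean distance for Dist (my own choice for the example); each cluster is a NumPy array of points.

```python
import numpy as np

def pairwise_dists(A, B):
    # Dist(a, b) for every a in A, b in B, as an |A| x |B| matrix.
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def average_linkage(A, B):
    # d(A, B) = (1 / (|A||B|)) * sum of Dist(a, b) over all pairs.
    return pairwise_dists(A, B).mean()

def centroid_linkage(A, B):
    # d(A, B) = Dist(mu_A, mu_B), the distance between the cluster means.
    return np.linalg.norm(A.mean(axis=0) - B.mean(axis=0))

def max_linkage(A, B):
    # d(A, B) = max of Dist(a, b) over all pairs (complete linkage).
    return pairwise_dists(A, B).max()
```

Agglomerative clustering repeatedly merges the two clusters with the smallest linkage distance, whichever linkage rule is used.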
4 Spectral Clustering
View the data as a graph, where the nodes are the data points $x_1, \dots, x_n$ and the edge weights $w_{ij}$ denote the similarity of two data points, $\text{Sim}(x_i, x_j)$. One example similarity function is cosine similarity:
$$\text{Sim}(x_i, x_j) = \frac{x_i^T x_j}{\|x_i\| \, \|x_j\|}$$
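As a sketch, one might build the weight matrix $W$ from cosine similarities as follows; zeroing the diagonal (no self-loops) matches the affinity-matrix convention used later, while everything else here is an illustrative choice of mine.

```python
import numpy as np

def cosine_similarity_graph(X):
    """Weight matrix with w_ij = cosine similarity of x_i and x_j, no self-loops."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalize rows to unit length
    W = Xn @ Xn.T                                      # w_ij = x_i^T x_j / (||x_i|| ||x_j||)
    np.fill_diagonal(W, 0.0)
    return W
```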
4.1 Cuts
As it turns out, we can convert clustering into a graph partitioning problem. Let us formalize the problem parameters. Our goal is to find a cut of our graph. Let $V$ be the set of all nodes; then our partitions $V_1, V_2$ must satisfy the following:
$$V_1 \cup V_2 = V, \qquad V_1 \cap V_2 = \emptyset$$
The value of the cut is $\text{Cut}(V_1, V_2) = \sum_{i \in V_1} \sum_{j \in V_2} w_{ij}$. However, there is a trivial solution that minimizes the cut value, namely $V_1 = V$, $V_2 = \emptyset$. So, we introduce a balance requirement to force a balanced cut:
$$\min_{V_1, V_2} \; \text{Cut}(V_1, V_2) \quad \text{subject to } |V_1| = |V_2| = \tfrac{n}{2}$$
(We ignore the odd case for now.) This problem is also NP-hard.
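To make the definition concrete, here is a tiny sketch that evaluates $\text{Cut}(V_1, V_2)$ directly from the weight matrix; the function name and the index-list representation of the partition are my own.

```python
import numpy as np

def cut_value(W, V1, V2):
    """Cut(V1, V2) = sum of w_ij over i in V1, j in V2 (V1, V2 are index lists)."""
    return W[np.ix_(V1, V2)].sum()
```

For the trivial partition $V_1 = V$, $V_2 = \emptyset$ this returns $0$, which is exactly why the balance constraint is needed.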
We are now going to transform a discrete problem into a continuous problem.
4.2 Graph Laplacian
affinity matrix ($W$): entries are $s(i, j)$ if $i, j$ are connected and $0$ otherwise (no self-loops, so the diagonal entries are $0$)
degree matrix ($D$): in the derivation below, $D$ is a diagonal matrix whose entries are the row sums of $W$, i.e. $D_{ii} = \sum_j w_{ij}$
Let us call $\text{Mass}(G_1)$ the number of nodes in $G_1$, i.e. $|V_1|$. We wish to find two or more partitions of similar sizes, where we cut edges with low weight. Our problem can be formally expressed as the following:
$$\min \; \frac{\text{Cut}(G_1, G_2)}{\text{Mass}(G_1)\,\text{Mass}(G_2)}$$
Define the label vector $v$ by
$$v_i = \begin{cases} 1 & i \in V_1 \\ -1 & i \in V_2 \end{cases}$$
Then, summing over edges (each unordered pair counted once),
$$\text{Cut}(V_1, V_2) = \frac{1}{4} \sum_{(i,j) \in E} w_{ij} (v_i - v_j)^2$$
If the weight is high, we want nodes to be closer together, and if the weight is low, nodes
are repelled. As it turns out, we can simplify this expression.
$$\begin{aligned}
\text{Cut}(V_1, V_2) &= \sum_{i \in V_1} \sum_{j \in V_2} w_{ij} \\
&= \frac{1}{4} \sum_{(i,j) \in E} w_{ij} (v_i - v_j)^2 \\
&= \frac{1}{4} \sum_{(i,j) \in E} \left( w_{ij} v_i^2 - 2 w_{ij} v_i v_j + w_{ij} v_j^2 \right) \\
&= \frac{1}{4} \sum_{(i,j) \in E} \left( -2 w_{ij} v_i v_j \right) + \frac{1}{4} \sum_{(i,j) \in E} \left( w_{ij} v_i^2 + w_{ij} v_j^2 \right)
\end{aligned}$$
In the second summation, each edge $(i, j)$ contributes its weight $w_{ij}$ once for vertex $i$ and once for vertex $j$. This is equivalent to summing over all vertices and, for each vertex, adding the weights of all edges incident to it.
$$\begin{aligned}
\text{Cut}(V_1, V_2) &= \frac{1}{4} \sum_{(i,j) \in E} \left( -2 w_{ij} v_i v_j \right) + \frac{1}{4} \sum_{i=1}^{n} v_i^2 \sum_{k=1}^{n} w_{ik} \\
&= \frac{1}{4} v^T (D - W) v \\
&= \frac{1}{4} v^T L v
\end{aligned}$$
where
$$L_{ij} = \begin{cases} -w_{ij} & i \neq j \\ \sum_k w_{ik} & i = j \end{cases}$$
$L$ is known as the Graph Laplacian. This, like the adjacency matrix, uniquely identifies a graph. We know a few properties of this matrix $L$:
$L$ is symmetric.
$L$ is positive semidefinite if $w_{ij} \geq 0$: since all terms in $\sum_{(i,j) \in E} w_{ij} (v_i - v_j)^2$ are squared and non-negative, $v^T L v \geq 0$ for all $v$.
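The following sketch builds $L = D - W$ and numerically checks the identity $\text{Cut}(V_1, V_2) = \frac{1}{4} v^T L v$ on a small random graph; the helper name and the toy example are my own.

```python
import numpy as np

def graph_laplacian(W):
    """L = D - W, where D is the diagonal matrix of row sums of W."""
    return np.diag(W.sum(axis=1)) - W

# Toy check of Cut(V1, V2) = (1/4) v^T L v on a random symmetric weight matrix.
rng = np.random.default_rng(0)
W = rng.random((6, 6))
W = (W + W.T) / 2          # make the weights symmetric
np.fill_diagonal(W, 0.0)   # no self-loops
L = graph_laplacian(W)

v = np.array([1, 1, 1, -1, -1, -1])              # V1 = {0, 1, 2}, V2 = {3, 4, 5}
cut = W[np.ix_([0, 1, 2], [3, 4, 5])].sum()
assert np.isclose(cut, v @ L @ v / 4)
```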
We thus have a new objective:
$$\min_v \; \frac{1}{4} v^T L v$$
such that $v_i \in \{-1, 1\}$ and $\mathbf{1}^T v = 0$. To make this more explicit, note that along the diagonal of $L$ we have $\sum_j w_{ij}$. Since $w_{ii} = 0$, this diagonal entry equals the negative of the sum of all other terms in that row, so every row of $L$ sums to zero. Thus $L\mathbf{1} = 0$: since $\mathbf{1} \neq 0$, $\lambda = 0$ is an eigenvalue of $L$ with eigenvector $\mathbf{1}$.
Note that this minimization problem is exactly the same as the one proposed earlier. Now, we make an approximation that turns the discrete problem into a continuous one: instead of $v_i \in \{-1, 1\}$, we will subject our problem to $\|v\|_2^2 = n$ and $\mathbf{1}^T v = 0$. As it turns out, the solution to this relaxed problem is the eigenvector corresponding to the second-smallest eigenvalue of $L$. If the constraint $\mathbf{1}^T v = 0$ were not added, the solution would be the first eigenvector, $\mathbf{1}$, with eigenvalue $0$.
There are a variety of other cut problems related to the Graph Laplacian: the normalized cut, the maximum cut, etc. All of these are NP-hard.
Now, let us consider the denominator. We need to additionally constrain the sizes of the partitions to be similar. How can we ensure that $|V_1| = |V_2| = \frac{n}{2}$? We want the sum of all entries in $v$ to be $0$, i.e. $\mathbf{1}^T v = 0$. The problem is formally
$$\min_v \; v^T L v$$
subject to $v_i \in \{-1, 1\}$ and $\mathbf{1}^T v = 0$. The constraint $v_i \in \{-1, 1\}$ places $v$ at a corner of a hypercube; in two dimensions, we can relax this to the circle passing through all corners of the square. This is a circle of radius $\sqrt{2} = \sqrt{n}$. Generalizing to $n$ dimensions, we can relax this constraint to $\|v\|_2^2 = n$, i.e. $v^T v = n$. Without any constraints, note that
$$\min_v \; \frac{v^T L v}{v^T v} = \lambda_{\min}(L) = 0$$
Consider now the ellipsoid $\{x : x^T A x = 1\}$. The semi-axis lengths are given by $\frac{1}{\sqrt{\lambda_i}}$, and the principal directions are given by the eigenvectors $v_i$. When $A = L$, we have an eigenvalue $\lambda_1 = 0$, so we have one axis with infinite length. Seen geometrically, this set is a cylinder whose length runs along $v_1$. Since we want $v_1^T v = 0$, we want $v$ to be orthogonal to $v_1$; this is a hyperplane orthogonal to $v_1$. As before, we also want $\|v\|_2^2 = n$; in three-dimensional space this constraint is a sphere. Thus, we are looking for the point in the intersection of the hyperplane with the sphere that minimizes $v^T L v$, and this is precisely the direction of $v_2$, the eigenvector of the second-smallest eigenvalue.
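Putting the pieces together, here is a hedged sketch of spectral bisection: build $L$, take the eigenvector of the second-smallest eigenvalue, and split the nodes on its sign. Rounding the continuous solution back to $\{-1, 1\}$ by taking signs is a common convention rather than something derived in these notes.

```python
import numpy as np

def spectral_bisection(W):
    """Partition nodes by the sign of the second eigenvector of L = D - W."""
    L = np.diag(W.sum(axis=1)) - W
    eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues returned in ascending order
    v2 = eigvecs[:, 1]                     # eigenvector of the second-smallest eigenvalue
    V1 = np.where(v2 >= 0)[0]
    V2 = np.where(v2 < 0)[0]
    return V1, V2
```

Calling `spectral_bisection(W)` on a weight matrix such as the cosine-similarity graph above recovers the relaxed balanced cut described in this section.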