
Data Clustering (Contd)

CS771: Introduction to Machine Learning


Piyush Rai
Plan
 K-means extensions
   Soft clustering
   Kernel K-means
 A few other popular clustering algorithms
   Hierarchical Clustering
     Agglomerative Clustering
     Divisive Clustering
   Graph Clustering
     Spectral Clustering
   Density-based clustering
     DBSCAN
 Basic idea of probabilistic clustering methods, such as Gaussian mixture models (details when we talk about latent variable models)
K-means: Hard vs Soft Clustering
 K-means makes hard assignments of points to clusters
 Hard assignment: A point either completely belongs to a cluster or doesn’t belong at all
 When clusters overlap, soft assignment is preferable: each point $\boldsymbol{x}_n$ gets a probability $\gamma_{nk}$ of being assigned to cluster $k$, with $\sum_{k=1}^{K} \gamma_{nk} = 1$
 A heuristic to get soft assignments: transform the distances of a point from the cluster means into probabilities that sum to one over the $K$ clusters (see the sketch below)
 A more principled extension of K-means for doing soft clustering is via probabilistic mixture models such as the Gaussian Mixture Model
4
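A minimal sketch of such a distance-to-probability heuristic (not from the slides; the softmax-of-negative-squared-distances form and the sharpness parameter beta are illustrative assumptions):

import numpy as np

def soft_assignments(X, means, beta=1.0):
    """Heuristic soft K-means assignments: turn squared Euclidean distances
    to each cluster mean into probabilities via a softmax over clusters.
    X: (N, D) data, means: (K, D) cluster means, beta: sharpness."""
    # Squared distances of every point to every cluster mean, shape (N, K)
    sq_dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    # Softmax over clusters (subtract the row max for numerical stability)
    logits = -beta * sq_dists
    logits -= logits.max(axis=1, keepdims=True)
    gamma = np.exp(logits)
    gamma /= gamma.sum(axis=1, keepdims=True)   # each row sums to 1
    return gamma  # gamma[n, k] = soft membership of point n in cluster k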
K-means: Decision Boundaries and Cluster Sizes/Shapes
 K-means assumes that the decision boundary between any two clusters is linear
 Reason: the use of Euclidean distances (a short derivation is given below)
 The K-means loss function also implicitly assumes equal-sized, spherical clusters
 May do badly if the clusters are not roughly equi-sized and convex-shaped
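To see why Euclidean distances give linear boundaries, here is a short standard derivation (not from the slides): for any point $\boldsymbol{x}$ and two cluster means $\boldsymbol{\mu}_j$ and $\boldsymbol{\mu}_k$,

$\|\boldsymbol{x} - \boldsymbol{\mu}_j\|^2 \le \|\boldsymbol{x} - \boldsymbol{\mu}_k\|^2 \iff 2(\boldsymbol{\mu}_k - \boldsymbol{\mu}_j)^\top \boldsymbol{x} \le \|\boldsymbol{\mu}_k\|^2 - \|\boldsymbol{\mu}_j\|^2$

which is a halfspace condition in $\boldsymbol{x}$, so the boundary between the two clusters is the hyperplane where equality holds.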
Kernel K-means
 Helps learn non-spherical clusters and nonlinear cluster boundaries
 Basic idea: replace the Euclidean distances in K-means by their kernelized versions (a code sketch follows below)
 Kernelized distance between input $\boldsymbol{x}_n$ and the mean of cluster $k$: $\|\phi(\boldsymbol{x}_n) - \phi(\boldsymbol{\mu}_k)\|^2$
 Here $k(\cdot,\cdot)$ denotes the kernel function and $\phi$ is its (implicit) feature map
 Note: $\phi(\boldsymbol{\mu}_k)$ denotes the mean of the mappings of the data points assigned to cluster $k$, i.e., $\phi(\boldsymbol{\mu}_k) = \frac{1}{|\mathcal{C}_k|} \sum_{n: z_n = k} \phi(\boldsymbol{x}_n)$, which is not the same as the mapping of the mean of the data points assigned to cluster $k$
 Can also use landmarks or the kernel random features idea to get new features and run standard K-means on those
 Note: Apart from kernels, it is also possible to use other distance functions in K-means. Bregman Divergence* is one such family of distances (Euclidean and Mahalanobis are special cases)
*Clustering with Bregman Divergences (Banerjee et al, 2005)
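A rough sketch of kernel K-means under these definitions (an illustrative assumption, not the slides' exact algorithm; the RBF kernel and the function names are my choices). It uses the standard expansion $\|\phi(\boldsymbol{x}_n) - \phi(\boldsymbol{\mu}_k)\|^2 = k(\boldsymbol{x}_n,\boldsymbol{x}_n) - \frac{2}{|\mathcal{C}_k|}\sum_{m \in \mathcal{C}_k} k(\boldsymbol{x}_n,\boldsymbol{x}_m) + \frac{1}{|\mathcal{C}_k|^2}\sum_{m,m' \in \mathcal{C}_k} k(\boldsymbol{x}_m,\boldsymbol{x}_{m'})$, so distances need only kernel evaluations:

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # k(x, y) = exp(-gamma * ||x - y||^2)
    sq = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq)

def kernel_kmeans(X, K, n_iters=50, gamma=1.0, seed=0):
    """Assignments updated using distances computed entirely via the kernel
    matrix (the kernel trick); cluster means in feature space are implicit."""
    N = X.shape[0]
    Kmat = rbf_kernel(X, X, gamma)                 # (N, N) Gram matrix
    rng = np.random.default_rng(seed)
    z = rng.integers(0, K, size=N)                 # random initial assignments
    for _ in range(n_iters):
        dists = np.zeros((N, K))
        for k in range(K):
            idx = np.where(z == k)[0]
            if len(idx) == 0:
                dists[:, k] = np.inf               # empty cluster: never chosen
                continue
            # ||phi(x_n) - phi(mu_k)||^2 via the kernel expansion above
            dists[:, k] = (np.diag(Kmat)
                           - 2.0 * Kmat[:, idx].mean(axis=1)
                           + Kmat[np.ix_(idx, idx)].mean())
        z_new = dists.argmin(axis=1)
        if np.array_equal(z_new, z):
            break
        z = z_new
    return z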
Hierarchical Clustering
 A notion of similarity between two clusters (or two sets of points) is needed in HC algos (e.g., this can be the average pairwise similarity between the inputs in the two clusters)
 Can be done in two ways: Agglomerative or Divisive
 Agglomerative: Start with each point being in a singleton cluster
   At each step, greedily merge the two most "similar" sub-clusters
   Stop when there is a single cluster containing all the points
   Learns a dendrogram-like structure with the inputs at the leaf nodes; can then choose how many clusters we want
 Divisive: Start with all points being in a single cluster
   At each step, break a cluster into (at least) two smaller homogeneous sub-clusters
   Keep recursing until the desired number of clusters is found
   Tricky because there are no labels (unlike Decision Trees)
 Agglomerative is more popular and simpler than divisive (the latter usually needs complicated heuristics to decide cluster splitting); a small sketch of it follows below
 Neither uses any loss function
7
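A naive sketch of agglomerative clustering with average linkage (the linkage choice and the simple O(N^3)-style implementation are illustrative assumptions, not the slides' prescription):

import numpy as np

def agglomerative(X, num_clusters):
    """Start with singleton clusters and repeatedly merge the two closest
    (on average) clusters until num_clusters remain."""
    clusters = [[i] for i in range(len(X))]        # each point is its own cluster
    # Pairwise squared Euclidean distances between points
    D = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
    while len(clusters) > num_clusters:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Average pairwise distance between the two clusters (average linkage)
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] = clusters[a] + clusters[b]    # merge the two closest clusters
        del clusters[b]
    return clusters                                # list of lists of point indices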
Graph Clustering
 Often the data is given in the form of a graph, not feature vectors
 Usually in the form of a pairwise similarity matrix $\boldsymbol{A}$ of size $N \times N$
 $A_{ij}$ is assumed to be the similarity between the two nodes/inputs with indices $i$ and $j$
 Examples: social networks and various interaction networks
 Goal is to cluster the nodes/inputs into $K$ clusters (flat partitioning)
 One scheme is to somehow get an embedding of the graph nodes (various graph embedding algorithms exist, e.g., node2vec) to obtain a feature vector for each node, and then run K-means or kernel K-means or any other clustering algo
 Another way is to perform direct graph clustering
 Spectral clustering is one such popular graph clustering algorithm

Spectral Clustering
 Spectral clustering has a beautiful theory behind it (we won't get into it in this course; you may refer to the very nice tutorial article listed below, if interested)
 We are given the node-node similarity matrix $\boldsymbol{A}$ of size $N \times N$
 Compute the graph Laplacian $\boldsymbol{L} = \boldsymbol{D} - \boldsymbol{A}$
   $\boldsymbol{D}$ is a diagonal matrix s.t. $D_{ii} = \sum_{j} A_{ij}$ (sum of similarities of node $i$ with all other nodes)
   Note: Often, we work with a normalized graph Laplacian, e.g., $\boldsymbol{L} = \boldsymbol{D}^{-1/2}(\boldsymbol{D} - \boldsymbol{A})\boldsymbol{D}^{-1/2}$
 Given the graph Laplacian, solve the spectral decomposition problem: stack the $K$ eigenvectors of $\boldsymbol{L}$ with the smallest eigenvalues as the columns of a matrix $\boldsymbol{U} \in \mathbb{R}^{N \times K}$, s.t. $\boldsymbol{U}^\top \boldsymbol{U} = \boldsymbol{I}$ (meaning $\boldsymbol{U}$ has orthonormal columns)
 Now run K-means on $\boldsymbol{U}$ as the feature matrix of the nodes (row $n$ of $\boldsymbol{U}$ is the feature vector of node $n$); see the sketch below
 Note: Spectral clustering* is also closely related to kernel K-means (but is more general since $\boldsymbol{A}$ can represent any graph) and to "normalized cuts" for graphs
*Kernel k-means, Spectral Clustering and Normalized Cuts (Dhillon et al, 2004); A Tutorial on Spectral Clustering (Ulrike von Luxburg, 2007)
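A minimal sketch of this recipe using the unnormalized Laplacian (the unnormalized-vs-normalized choice and the simple K-means helper are illustrative assumptions):

import numpy as np

def spectral_clustering(A, K, n_iters=100, seed=0):
    """Build the graph Laplacian from the similarity matrix A (N x N), take the
    K eigenvectors with the smallest eigenvalues as node features, then run
    K-means on those features."""
    D = np.diag(A.sum(axis=1))                 # degree matrix, D_ii = sum_j A_ij
    L = D - A                                  # (unnormalized) graph Laplacian
    eigvals, eigvecs = np.linalg.eigh(L)       # eigh: L is symmetric, eigenvalues ascending
    U = eigvecs[:, :K]                         # N x K, orthonormal columns
    return simple_kmeans(U, K, n_iters, seed)  # cluster the rows of U

def simple_kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(n_iters):
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        z = d.argmin(axis=1)
        for k in range(K):
            if np.any(z == k):
                means[k] = X[z == k].mean(axis=0)
    return z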
Density-based Clustering - DBSCAN
 DBSCAN: Density-Based Spatial Clustering of Applications with Noise
 Uses the notion of density of points around a point (not in the sense of probability density)
 DBSCAN treats densely connected points as a cluster, regardless of the shape of the cluster; points left unclustered are most likely outliers
 Has some very nice properties
   Does not require specifying the number of clusters
   Can learn arbitrary-shaped clusters (since it only considers the density of points)
   Robust against outliers (leaves them unclustered!), unlike other clustering algos like K-means
 Note: The accuracy of DBSCAN depends crucially on the $\epsilon$ and minPoints hyperparameters
 The basic idea in DBSCAN is as follows
   Want the points within a cluster to be densely packed: each point should be within $\epsilon$ distance of other points of the cluster
   Want at least minPoints points within $\epsilon$ distance of a point (such a point is called a "core" point)
   Points that are within $\epsilon$ of a core point but don't themselves have minPoints neighbors within $\epsilon$ are called "border" points
   Points that are neither core nor border points are outliers
DBSCAN (Contd)
 The animation credited below shows DBSCAN in action
 The basic algorithm is as follows (see the sketch after this list)
   A point is chosen at random
   If it has at least minPoints neighbors within $\epsilon$ distance, call it a core point
   Check if more points fall within $\epsilon$ distance of the core point or its neighbors
   If yes, include them too in the same cluster
   Once done with this cluster, pick another unvisited point randomly and repeat
 In an example clustering obtained by DBSCAN (see the animation), green points are core points, blue points are border points, and red points are outliers
 DBSCAN is mostly a heuristic-based algorithm: there is no loss function, unlike K-means
Animation credit: https://dashee87.github.io/
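A basic, unoptimized sketch of this procedure (the function and variable names are illustrative; real implementations such as sklearn.cluster.DBSCAN use spatial indexing for the neighbor queries):

import numpy as np

def dbscan(X, eps=0.5, min_points=5):
    """Return a cluster id per point; -1 marks unclustered points (outliers)."""
    N = len(X)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))  # pairwise distances
    neighbors = [np.where(D[i] <= eps)[0] for i in range(N)]
    core = np.array([len(nb) >= min_points for nb in neighbors])
    labels = np.full(N, -1)                    # -1 = not assigned yet / outlier
    cluster_id = 0
    for i in range(N):
        if labels[i] != -1 or not core[i]:
            continue                           # new clusters only start from core points
        # Grow the cluster from this core point by expanding through core neighbors
        labels[i] = cluster_id
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if labels[j] == -1:
                labels[j] = cluster_id         # border or core point joins the cluster
                if core[j]:
                    frontier.extend(neighbors[j])   # only core points keep expanding
        cluster_id += 1
    return labels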


Going the Probabilistic Way..
 Assume a generative model $p(\boldsymbol{x}|\Theta)$ for the inputs, where $\Theta$ denotes all the unknown parameters
 Clustering then boils down to computing the posterior cluster probability $p(z_n = k \mid \boldsymbol{x}_n, \Theta)$, where $z_n$ denotes the cluster assignment of $\boldsymbol{x}_n$
 From Bayes rule (a small code sketch follows below): $p(z_n = k \mid \boldsymbol{x}_n, \Theta) = \frac{p(z_n = k \mid \Theta)\, p(\boldsymbol{x}_n \mid z_n = k, \Theta)}{\sum_{\ell=1}^{K} p(z_n = \ell \mid \Theta)\, p(\boldsymbol{x}_n \mid z_n = \ell, \Theta)}$
 Assume the prior $p(z_n \mid \Theta)$ to be multinoulli with probability vector $\boldsymbol{\pi} = [\pi_1, \ldots, \pi_K]$, and each of the class-conditionals to be a Gaussian, $p(\boldsymbol{x}_n \mid z_n = k, \Theta) = \mathcal{N}(\boldsymbol{x}_n \mid \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k)$ (here $\Theta = \{\pi_k, \boldsymbol{\mu}_k, \boldsymbol{\Sigma}_k\}_{k=1}^{K}$)
 The posterior probability of a cluster assignment thus also depends on the prior probability $\pi_k$ (the fraction of points in that cluster, if using MLE)
 Different clusters can have different covariances (hence different shapes)
 We know how to estimate $\Theta$ if the $z_n$'s were known (recall generative classification)
 But since we don't know the $z_n$'s, we need to estimate both, just like in K-means (and ALT-OPT can be used)
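A small sketch of this Bayes-rule computation for a Gaussian mixture with known parameters (the use of scipy.stats.multivariate_normal and the argument layout are my choices, not from the slides):

import numpy as np
from scipy.stats import multivariate_normal

def cluster_posteriors(X, pis, means, covs):
    """p(z_n = k | x_n, Theta) via Bayes rule, for mixing proportions pis (K,),
    means (K, D), and covariances (K, D, D)."""
    K = len(pis)
    # Unnormalized posterior: prior * class-conditional Gaussian likelihood
    post = np.stack([pis[k] * multivariate_normal(means[k], covs[k]).pdf(X)
                     for k in range(K)], axis=1)           # shape (N, K)
    post /= post.sum(axis=1, keepdims=True)                # normalize over clusters
    return post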
Going the Probabilistic Way..
 At a high level, a probabilistic clustering algorithm would look somewhat like this (a sketch in code follows this list)
   Initialize the parameters $\Theta$ (akin to initializing the cluster means in K-means)
   Compute the posterior cluster probabilities $p(z_n = k \mid \boldsymbol{x}_n, \Theta)$ for each point (akin to computing cluster assignments in K-means)
   Re-estimate the parameters $\Theta$ using these posterior probabilities (akin to updating the cluster means in K-means), and repeat until convergence
 The above algorithm is an instance of a more general algorithm called Expectation Maximization (EM)
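A minimal sketch of these three steps for a Gaussian mixture, with soft (posterior-weighted) updates; this is essentially EM for a GMM written as an alternating loop, and the initialization and covariance regularizer are illustrative choices (numerical safeguards are omitted for brevity):

import numpy as np
from scipy.stats import multivariate_normal

def prob_clustering(X, K, n_iters=100, seed=0):
    """Initialize Theta, compute posterior cluster probabilities, re-estimate
    Theta from them, and repeat."""
    N, Dim = X.shape
    rng = np.random.default_rng(seed)
    # Step 1: initialize parameters (akin to initializing cluster means in K-means)
    means = X[rng.choice(N, size=K, replace=False)].copy()
    covs = np.array([np.eye(Dim) for _ in range(K)])
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        # Step 2: posterior cluster probabilities (akin to cluster assignments)
        gamma = np.stack([pis[k] * multivariate_normal(means[k], covs[k]).pdf(X)
                          for k in range(K)], axis=1)
        gamma /= gamma.sum(axis=1, keepdims=True)          # (N, K)
        # Step 3: re-estimate parameters (akin to updating cluster means)
        Nk = gamma.sum(axis=0)                             # effective cluster sizes
        pis = Nk / N
        means = (gamma.T @ X) / Nk[:, None]
        for k in range(K):
            diff = X - means[k]
            covs[k] = (gamma[:, k, None] * diff).T @ diff / Nk[k] + 1e-6 * np.eye(Dim)
    return gamma, (pis, means, covs)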
Clustering vs Classification
 Any clustering model (prob/non-prob) typically learns two types of quantities
   Parameters $\Theta$ of the clustering model (e.g., the cluster means in K-means)
   Cluster assignments $\boldsymbol{Z}$ for the points
 If the cluster assignments were known, learning the parameters is just like learning the parameters of a classification model (typically a generative classification model) using labeled data
 Thus it helps to think of clustering as (generative) classification with unknown labels
 Therefore many clustering problems are typically solved in the following fashion
1. Initialize $\Theta$ somehow
2. Predict $\boldsymbol{Z}$ given the current estimate of $\Theta$
3. Use the predicted $\boldsymbol{Z}$ to improve the estimate of $\Theta$ (like learning a generative classification model)
4. Go to step 2 if not converged yet
Clustering can help supervised learning, too
 Often "difficult" supervised learning problems can be seen as mixtures of simpler models
 Example: nonlinear regression or nonlinear classification as a mixture of linear models
 We don't know which point should be modeled by which linear model ⇒ clustering
 Such an approach is also an example of divide and conquer, and is also known as "mixture of experts" (we will see it more formally when we discuss latent variable models)
 Can therefore solve such problems as follows (see the sketch after this list)
1. Initialize each linear model somehow (maybe randomly)
2. Cluster the data by assigning each point to its "closest" linear model (the one that gives the lowest error)
3. (Re-)Learn a linear model for each cluster's data. Go to step 2 if not converged.
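A small sketch of this alternating procedure for nonlinear regression as a mixture of linear regression models (the least-squares refit and the bias-feature handling are illustrative assumptions):

import numpy as np

def mixture_of_linear_regressions(X, y, K, n_iters=50, seed=0):
    """Alternate between assigning each point to the linear model with the
    lowest error, and refitting each linear model on its assigned points."""
    N = X.shape[0]
    Xb = np.hstack([X, np.ones((N, 1))])           # add a bias feature
    rng = np.random.default_rng(seed)
    # Step 1: initialize the K linear models randomly
    W = rng.normal(size=(K, Xb.shape[1]))
    for _ in range(n_iters):
        # Step 2: assign each point to its "closest" model (smallest squared error)
        errors = (Xb @ W.T - y[:, None]) ** 2      # (N, K)
        z = errors.argmin(axis=1)
        # Step 3: refit each model on its cluster's data by least squares
        for k in range(K):
            idx = z == k
            if idx.sum() >= Xb.shape[1]:           # need enough points to fit
                W[k] = np.linalg.lstsq(Xb[idx], y[idx], rcond=None)[0]
    return W, z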
Coming up next
 Latent Variable Models
 Mixture models using latent variables
 Expectation Maximization algorithm

