Clustering and K-Means Algorithm
Outline of Presentation
❑Clustering
❑Types of Clustering
❑Examples and explanation of Clustering
❑K-Means Clustering
❑Working of K-Means Algorithm
❑K-Means Algorithm Examples
❑Hierarchical Clustering
❑Distance Based Clustering
Machine Learning
❑ Supervised Learning — task driven, defined labels (Classification, Regression: predict future values)
❑ Unsupervised Learning — no defined labels (Clustering)
Examples of Clustering
➢ Clustering quality
▪ Inter-cluster distance maximized
▪ Intra-cluster distance minimized
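These two quality criteria can be checked numerically. Below is a minimal NumPy sketch (the helper names intra_cluster_distance and inter_cluster_distance and the toy data are illustrative, not from the slides): a good clustering keeps the intra-cluster value small and the inter-cluster value large.

```python
import numpy as np

def intra_cluster_distance(points):
    """Mean distance of each point to its own cluster centroid (want this small)."""
    centroid = points.mean(axis=0)
    return np.linalg.norm(points - centroid, axis=1).mean()

def inter_cluster_distance(points_a, points_b):
    """Distance between the two cluster centroids (want this large)."""
    return np.linalg.norm(points_a.mean(axis=0) - points_b.mean(axis=0))

# Two toy 2-D clusters
cluster_a = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1]])
cluster_b = np.array([[5.0, 5.0], [5.3, 4.8], [4.7, 5.2]])

print("intra A:", intra_cluster_distance(cluster_a))
print("intra B:", intra_cluster_distance(cluster_b))
print("inter A-B:", inter_cluster_distance(cluster_a, cluster_b))
```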
• In the field of cluster analysis, this notion of similarity plays an important part.
• Now, we shall learn how the similarity (alternatively judged as "dissimilarity") between any two data points can be measured.
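As a concrete illustration of measuring similarity/dissimilarity between two data points, here is a minimal NumPy sketch of three common measures (Euclidean and Manhattan distance as dissimilarity, cosine as similarity); the specific vectors are made up for the example.

```python
import numpy as np

x = np.array([2.0, 4.0, 3.0])
y = np.array([5.0, 1.0, 7.0])

# Dissimilarity: larger value = less alike
euclidean = np.linalg.norm(x - y)   # square root of the sum of squared differences
manhattan = np.abs(x - y).sum()     # sum of absolute differences

# Similarity: larger value = more alike
cosine = x.dot(y) / (np.linalg.norm(x) * np.linalg.norm(y))

print(euclidean, manhattan, cosine)
```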
Applications of clustering
Exclusive Clustering
• Hard Clustering
• Data Point/Item belongs exclusively to one cluster
• For example: K-Means Clustering
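A minimal sketch of exclusive (hard) clustering, assuming scikit-learn is available: KMeans assigns every point to exactly one cluster, so labels_ holds a single cluster id per point. The toy data is made up for the example.

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 1], [1.5, 2], [1, 0.5],
              [8, 8], [8.5, 9], [9, 8]])

# Hard (exclusive) clustering: each row gets exactly one label
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)            # e.g. [0 0 0 1 1 1] -- one cluster per point
print(km.cluster_centers_)   # the centre of each cluster
```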
Types of CLUSTERING
❑ Exclusive Clustering
❑ Overlapping Clustering
❑ Hierarchical Clustering
Overlapping Clustering
• Soft Clustering
• Data Points/Items belong to multiple clusters
• For example: Fuzzy C-Means Clustering
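Fuzzy C-Means is not part of scikit-learn, so here is a minimal NumPy sketch of the fuzzy membership computation it rests on (the fuzzifier m = 2, the helper name fuzzy_memberships, and the toy data are assumptions for the example): each point receives a degree of membership in every cluster, and the degrees sum to 1.

```python
import numpy as np

def fuzzy_memberships(X, centers, m=2.0):
    """Soft memberships: each row sums to 1 across clusters (fuzzy c-means style)."""
    # Distances from every point to every centre, shape (n_points, n_clusters)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    d = np.fmax(d, 1e-12)                  # avoid division by zero
    power = 2.0 / (m - 1.0)
    # u[j, i] = 1 / sum_k (d[j, i] / d[j, k]) ** power
    ratio = d[:, :, None] / d[:, None, :]  # shape (n_points, c, c)
    u = 1.0 / (ratio ** power).sum(axis=2)
    return u

X = np.array([[1.0, 1.0], [1.5, 2.0], [5.0, 8.0], [8.0, 8.0], [4.5, 4.5]])
centers = np.array([[1.2, 1.5], [7.0, 8.0]])

u = fuzzy_memberships(X, centers)
print(u)              # point [4.5, 4.5] gets partial membership in both clusters
print(u.sum(axis=1))  # each row sums to 1
```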
Hierarchical Clustering
Methods of CLUSTERING
❑ Partitioning Method
❑ Hierarchical Method
❑ Density-Based Methods
Partitioning Methods
Partitioning a database D of n objects into a set of k clusters, such that the sum of
squared distances is minimized (where c_i is the centroid or medoid of cluster C_i):
E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - c_i)^2
k-means (MacQueen’67, Lloyd’57/’82): Each cluster is represented by the
center of the cluster
k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87):
Each cluster is represented by one of the objects in the cluster
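A minimal sketch of the SSE objective E defined above, computed directly with NumPy (the function name sse and the toy data are illustrative):

```python
import numpy as np

def sse(X, labels, centers):
    """E = sum over clusters i, sum over points p in C_i, of ||p - c_i||^2."""
    total = 0.0
    for i, c in enumerate(centers):
        members = X[labels == i]
        total += ((members - c) ** 2).sum()
    return total

X = np.array([[1.0, 1.0], [2.0, 1.0], [8.0, 8.0], [9.0, 9.0]])
labels = np.array([0, 0, 1, 1])
centers = np.array([X[labels == 0].mean(axis=0), X[labels == 1].mean(axis=0)])
print(sse(X, labels, centers))   # 0.5 + 1.0 -> 1.5
```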
K-means Clustering
Examples:
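As a worked example, here is a minimal from-scratch sketch of the k-means iteration (Lloyd's algorithm): pick k initial centroids, repeatedly assign each point to its nearest centroid and recompute the centroids, and stop when they no longer move. The function name and toy data are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: pick k initial centroids at random from the data
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 2: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 3: recompute each centroid as the mean of its assigned points
        new_centers = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centers[i]
            for i in range(k)
        ])
        # Step 4: stop when the centroids no longer move
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers

X = np.array([[1, 1], [1.5, 2], [1, 0.5], [8, 8], [8.5, 9], [9, 8]], dtype=float)
labels, centers = kmeans(X, k=2)
print(labels, centers, sep="\n")
```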
Hierarchical CLUSTERING
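The hierarchical (agglomerative) approach can be sketched with SciPy, assuming it is available: linkage records the sequence of merges (the dendrogram), and fcluster cuts the tree into flat clusters. The toy data and the choice of Ward linkage are assumptions for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[1, 1], [1.5, 1.2], [5, 5], [5.5, 5.3], [9, 9]])

# Agglomerative (bottom-up) clustering: repeatedly merge the two closest clusters
Z = linkage(X, method="ward")                    # Z records each merge (the dendrogram)
labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 flat clusters
print(labels)
```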
Density-Based Clustering: Basic Concepts
• Density-reachable: A point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, …, pn, with p1 = q and pn = p, such that pi+1 is directly density-reachable from pi.
• Density-connected: A point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts.
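These definitions rest on the Eps-neighborhood and the core-point test. A minimal NumPy sketch (the function names region_query and is_core_point and the toy data are illustrative):

```python
import numpy as np

def region_query(X, idx, eps):
    """Indices of all points within distance eps of point idx (its Eps-neighborhood)."""
    dists = np.linalg.norm(X - X[idx], axis=1)
    return np.where(dists <= eps)[0]

def is_core_point(X, idx, eps, min_pts):
    """A core point has at least MinPts points (itself included) in its Eps-neighborhood."""
    return len(region_query(X, idx, eps)) >= min_pts

X = np.array([[0, 0], [0.2, 0.1], [0.1, 0.3], [0.3, 0.2], [5, 5]])
print(is_core_point(X, 0, eps=0.5, min_pts=4))   # True: dense neighborhood
print(is_core_point(X, 4, eps=0.5, min_pts=4))   # False: isolated point
```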
DBSCAN: Density-Based Spatial Clustering of Applications with Noise
• Relies on a density-based notion of cluster: A cluster is defined as a
maximal set of density-connected points
• Discovers clusters of arbitrary shape in spatial databases with noise
[Figure: core, border, and outlier points for DBSCAN, with Eps = 1 cm and MinPts = 5]
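A minimal usage sketch, assuming scikit-learn is available: DBSCAN's eps and min_samples parameters correspond to Eps and MinPts above; the values and toy data here are illustrative rather than the 1 cm / 5 from the figure. Points labelled -1 are noise.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus one isolated point (noise)
X = np.array([[1, 1], [1.1, 1.2], [0.9, 1.0], [1.2, 0.9],
              [8, 8], [8.1, 8.2], [7.9, 8.0], [8.2, 7.9],
              [4.5, 20.0]])

db = DBSCAN(eps=0.5, min_samples=3).fit(X)
print(db.labels_)   # cluster id per point; -1 marks noise/outliers
```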
DBSCAN: The Algorithm
• Arbitrarily select a point p
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts
• If p is a core point, a cluster is formed
• If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database
• Continue the process until all of the points have been processed
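For completeness, here is a minimal from-scratch sketch of that process (the function name dbscan and the toy data are illustrative): visit each unprocessed point, and if it is a core point, grow a cluster by following density-reachable points; points that never join a cluster stay labelled -1 (noise).

```python
import numpy as np

def dbscan(X, eps, min_pts):
    """Minimal DBSCAN sketch: labels >= 0 are clusters, -1 is noise."""
    n = len(X)
    labels = np.full(n, -1)        # -1 = noise / not yet assigned
    visited = np.zeros(n, dtype=bool)
    cluster_id = 0

    def neighbors(i):
        return np.where(np.linalg.norm(X - X[i], axis=1) <= eps)[0]

    for i in range(n):
        if visited[i]:
            continue
        visited[i] = True
        seeds = neighbors(i)
        if len(seeds) < min_pts:   # not a core point -> leave as noise for now
            continue
        labels[i] = cluster_id     # start a new cluster from this core point
        queue = list(seeds)
        while queue:               # expand through density-reachable points
            j = queue.pop()
            if labels[j] == -1:    # noise becomes a border point of this cluster
                labels[j] = cluster_id
            if visited[j]:
                continue
            visited[j] = True
            labels[j] = cluster_id
            j_neighbors = neighbors(j)
            if len(j_neighbors) >= min_pts:   # j is also a core point -> keep expanding
                queue.extend(j_neighbors)
        cluster_id += 1
    return labels

X = np.array([[1, 1], [1.1, 1.2], [0.9, 1.0], [1.2, 0.9],
              [8, 8], [8.1, 8.2], [7.9, 8.0], [4.5, 20.0]], dtype=float)
print(dbscan(X, eps=0.5, min_pts=3))   # two clusters plus one noise point (-1)
```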