Kmeans Notes
Table Of Contents
Clustering
What is a Cluster?
What is Clustering?
What are the Different Types of Clustering?
K Means
What is the Basic Idea behind K-means?
What is the 1-of-K Coding Scheme?
What is the Objective Function in K-Means Algorithm?
What is Distortion Measure?
What is the K-Means algorithm? (More Formalised Version)
Convergence in K Means
What is Convergence in K-Means?
Is Convergence Guaranteed in K-Means?
Minima Issues in K-Means
What are some reasons why a cluster may have just One Point?
K Medoids
Implementation of K Means
Supervised versus Unsupervised Learning
What is the Key Difference Between Unsupervised Learning versus Supervised Learning?
Supervised Learning: learning under supervision; we have a fully labeled data set to learn from.
Unsupervised Learning: algorithms analyze and cluster unlabeled data sets, discovering hidden patterns in the data without the need for human intervention; the data is unlabeled.
How can we view the task of discrete binning/grouping in supervised and unsupervised learning setups?
In a supervised setup, grouping data points is classification (the groups/labels are known in advance); in an unsupervised setup, it is clustering (the groups must be discovered from the data itself).
Clustering
What is a Cluster?
A Cluster can be thought of as comprising a group of data points whose inter-point distances are
small compared with the distances to points outside of the cluster.
What is Clustering?
The grouping of objects such that objects in the same cluster are more similar to each other than
they are to objects in another cluster.[1]
Or
The task of finding an assignment of data points to clusters, as well as a set of vectors {μk}, such that the sum of the squares of the distances of each data point to its closest vector μk (or another similarity measure) is a minimum.
Clean and simple explanation with diagrams at Google Developers Machine Learning Course -
Clustering Algorithms
K Means
For each data point xn, we introduce a corresponding set of binary indicator variables rnk ∈ {0, 1}, where k = 1, . . . , K, describing which of the K clusters the data point xn is assigned to: if xn is assigned to cluster k, then rnk = 1 and rnj = 0 for j ≠ k. For example, with K = 3 and xn assigned to cluster 2, (rn1, rn2, rn3) = (0, 1, 0). This is the 1-of-K coding scheme.
Also called the Distortion Measure, the objective J represents the sum of the squares of the distances of each data point to its assigned vector μk. Here N is the number of data points and K is the number of clusters (topics/classes).
Our goal is to find values for the {rnk} and the {μk} so as to minimize J.
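Written out, the distortion measure described above (the standard K-means objective) is:

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2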
Convergence in K Means
Convergence means that there is no further change in the cluster centroids, or in the cluster assignments of the points, from one iteration to the next.
Yes, convergence is guaranteed: each phase of the algorithm either decreases the Objective Function J or leaves it unchanged, so J eventually reaches a steady state. (Note that this steady state may be a local rather than a global minimum.)
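Why J never increases: each phase minimizes J with respect to one set of variables while the other is held fixed. In the assignment step, for fixed centroids, r_{nk} = 1 if k = \arg\min_j \lVert x_n - \mu_j \rVert^2 and r_{nk} = 0 otherwise. In the update step, for fixed assignments,

\mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}

i.e. each centroid becomes the mean of the points currently assigned to its cluster.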
An example plot of the objective decreasing over iterations: Source of Image
However, sometimes we may not observe a strict inflection point, as in the figure below.
3. KMeans++ Algorithm
K-Means++ changes only the first step of K-means: it initializes the centroids in a smarter way, picking each new centroid with probability proportional to its squared distance from the centroids already chosen, which spreads the initial centroids out and improves the quality of the final clustering.
From GeeksForGeeks:
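The GeeksForGeeks listing itself is not reproduced here; the following is a minimal NumPy sketch of the k-means++ seeding step (the function name kmeans_pp_init and its parameters are illustrative, not taken from GeeksForGeeks):

import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """k-means++ seeding: pick the first centroid uniformly at random,
    then pick each further centroid with probability proportional to its
    squared distance from the nearest centroid chosen so far."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    centroids = [X[rng.integers(n_samples)]]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen centroid
        d2 = np.min(
            ((X[:, None, :] - np.asarray(centroids)[None, :, :]) ** 2).sum(axis=-1),
            axis=1,
        )
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n_samples, p=probs)])
    return np.asarray(centroids)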
K Medoids
K-means may not be the most suitable algorithm in some cases, since it is very sensitive to noise and outliers. K-means attempts to minimize the total squared error, while k-medoids minimizes the sum of dissimilarities between each data point and its assigned medoid (an actual data point chosen as the cluster center).
1. Initialize: select k random points out of the n data points as the medoids.
2. Associate each data point with its closest medoid, using any common distance metric.
3. While the cost decreases (see the sketch after this list):
For each medoid m and for each non-medoid data point o:
1. Swap m and o, associate each data point with its closest medoid, and recompute the cost (the total distance of points to their medoids).
2. If the total cost is more than that in the previous step, undo the swap.
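A minimal sketch of this swap-based (PAM-style) procedure, assuming NumPy arrays and Euclidean distance as the dissimilarity (all names here are illustrative):

import numpy as np

def k_medoids(X, k, max_iter=100, seed=None):
    """PAM-style k-medoids: greedily swap a medoid with a non-medoid point
    whenever the swap lowers the total distance of points to their nearest medoid."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    medoids = rng.choice(n, size=k, replace=False)

    def total_cost(medoid_idx):
        # each point pays the distance to its nearest medoid
        d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=-1)
        return d.min(axis=1).sum()

    cost = total_cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):                  # for each medoid m ...
            for o in range(n):              # ... and each non-medoid o
                if o in medoids:
                    continue
                candidate = medoids.copy()
                candidate[i] = o            # try swapping m and o
                new_cost = total_cost(candidate)
                if new_cost < cost:         # keep the swap only if the cost drops
                    medoids, cost = candidate, new_cost
                    improved = True
        if not improved:                    # stop once no swap helps
            break
    labels = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=-1).argmin(axis=1)
    return medoids, labels, cost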
Summary: K-means implicitly assumes that:
1. All clusters are the same size.
2. Clusters have the same extent in every direction.
3. Clusters have similar numbers of points assigned to them.
Find a demonstration at Demonstration of k-means assumptions — scikit-learn 1.0.1
documentation
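A short sketch of the kind of experiment in that demonstration (assuming scikit-learn is installed): applying KMeans to anisotropically stretched blobs violates the "same extent in every direction" assumption and yields poor clusters.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Isotropic Gaussian blobs satisfy K-means' assumptions reasonably well
X, y_true = make_blobs(n_samples=500, centers=3, random_state=170)

# Stretch the data so clusters become elongated (anisotropic),
# violating the "same extent in every direction" assumption
transformation = np.array([[0.6, -0.6], [-0.4, 0.8]])
X_aniso = X @ transformation

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_aniso)
# Plotting X_aniso colored by `labels` shows cluster boundaries that cut
# across the true elongated groups.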
Implementation of K Means
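A minimal NumPy sketch of the algorithm described in these notes (Lloyd's algorithm, i.e. alternating assignment and update steps); the function name and parameters are illustrative:

import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=None):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points,
    stopping once the distortion J stops decreasing."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    prev_J = np.inf
    for _ in range(n_iters):
        # assignment step: r_nk = 1 for the nearest centroid, 0 otherwise
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        J = d2[np.arange(X.shape[0]), labels].sum()   # distortion measure
        # update step: each centroid becomes the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):                   # guard against empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
        if prev_J - J < tol:                          # J no longer decreasing -> converged
            break
        prev_J = J
    return centroids, labels, J

In practice one would usually rely on sklearn.cluster.KMeans, which offers init='k-means++' and multiple restarts (n_init) to mitigate the local-minima issues mentioned above.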