Unit 3 - KmeansClustering
K-means clustering minimizes the sum of squared distances between the
data points and the centroids of their corresponding clusters.
The algorithm takes an unlabeled dataset as input, divides the dataset
into k clusters, and repeats the process until it finds the best
clusters, that is, until the cluster assignments stop changing.
The value of k should be predetermined in this algorithm.
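The quantity being minimized can be written as a short function. This is a minimal sketch, assuming squared Euclidean distance (the usual choice for k-means); the function name within_cluster_cost and the sample values are illustrative and not part of the original notes.

```python
import math

def within_cluster_cost(points, labels, centroids):
    """Sum of squared Euclidean distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[c]) ** 2 for p, c in zip(points, labels))

# Illustrative values: four points, already split into two clusters.
points = [(1.0, 1.0), (1.0, 3.0), (8.0, 8.0), (10.0, 8.0)]
labels = [0, 0, 1, 1]                   # cluster index of each point
centroids = [(1.0, 2.0), (9.0, 8.0)]    # mean of each cluster

cost = within_cluster_cost(points, labels, centroids)
print(cost)  # each point is 1 unit from its centroid, so the cost is 4.0
```

A lower cost means the points sit closer to their centroids; k-means searches for the assignment and centroids that make this value small.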
K-Means Clustering Algorithm
Step 1 − Specify the number of clusters, K, to be generated by
this algorithm.
Step 2 − Next, randomly select K data points as the initial centroids
and assign each data point to the cluster of its nearest centroid.
Step 3 − Now compute the centroid of each cluster as the mean of the
data points assigned to it.
Step 4 − Keep repeating the assignment and the centroid computation
until the centroids are optimal, that is, until the assignment of data
points to the clusters no longer changes.
Pseudocode of K-Means Clustering Algorithm
Initialize k means with random values
--> For a given number of iterations:
    --> Iterate through items:
        --> Find the mean closest to the item by calculating
            the Euclidean distance of the item with each of the means
        --> Assign item to mean
        --> Update mean by shifting it to the average of the items
            in that cluster
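The pseudocode can be turned into a small runnable sketch. The version below rests on two assumptions that are not in the notes: the initial means are drawn from the data points themselves rather than from arbitrary random values, and the means are recomputed once per pass over all items (the batch form, often called Lloyd's algorithm) instead of after every single item. The function name k_means and its parameters are illustrative.

```python
import math
import random

def k_means(items, k, iterations=100, seed=0):
    """Batch k-means sketch: returns (means, labels) for a list of numeric tuples."""
    rng = random.Random(seed)
    # Assumption: initialize the means from k distinct data points.
    means = [list(p) for p in rng.sample(items, k)]
    labels = [None] * len(items)
    for _ in range(iterations):
        # Assign every item to the closest mean (Euclidean distance).
        new_labels = [
            min(range(k), key=lambda j: math.dist(item, means[j]))
            for item in items
        ]
        if new_labels == labels:
            break  # assignments no longer change, so stop early
        labels = new_labels
        # Shift each mean to the average of the items assigned to it.
        for j in range(k):
            members = [item for item, lab in zip(items, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mean
                means[j] = [sum(col) / len(members) for col in zip(*members)]
    return means, labels

# Two visibly separate groups of points.
items = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
means, labels = k_means(items, k=2)
print(labels)
print(means)
```

Which group receives label 0 depends on the random starting points, so only the grouping itself is meaningful, not the label numbers.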
Example
We have a list of cricket players from all over the world, which
gives the runs scored and the wickets taken by each player in the
last ten matches.
Based on this information, we need to group the data into two
clusters, namely batsmen and bowlers.
Step 1: Let us solve the problem by taking K = 2.
Step 2: Two points are assigned as centroids.
The points can be anywhere, as they are random points. They
are called centroids, but initially they are not the central points
of the data set.
Step 3: Determine the distance between each data point and each of
the two randomly assigned centroids.
For every point, the distance is measured from both
centroids, and the point is assigned to the centroid that is
nearer.
Then determine the actual centroid of each of these two clusters.
Step 4: This process of calculating the distances and
repositioning the centroids continues until we obtain
our final clusters.
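One pass of Step 3 can be sketched in a few lines. The notes do not list the actual player statistics, so the (runs, wickets) values and the two starting centroids below are made up purely for illustration, and the player names are placeholders.

```python
import math

# Hypothetical (runs, wickets) totals, made up purely for illustration.
players = {
    "player_1": (420, 1),
    "player_2": (35, 18),
    "player_3": (380, 3),
    "player_4": (50, 15),
}

# Two arbitrary starting centroids (Step 2); also made-up values.
centroids = [(400.0, 0.0), (0.0, 20.0)]

# Step 3, part 1: assign each player to the nearer of the two centroids.
assignment = {
    name: min(range(len(centroids)), key=lambda j: math.dist(stats, centroids[j]))
    for name, stats in players.items()
}

# Step 3, part 2: move each centroid to the mean of its assigned players.
# (Both clusters are non-empty for this data, so the division is safe.)
new_centroids = []
for j in range(len(centroids)):
    members = [players[n] for n, c in assignment.items() if c == j]
    new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))

print(assignment)
print(new_centroids)  # [(400.0, 2.0), (42.5, 16.5)]
```

With these values, cluster 0 collects the high-run players (the batsmen) and cluster 1 the high-wicket players (the bowlers). Step 4 simply repeats both parts until the assignment stops changing.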
Height Weight
185 72
170 56
168 60
179 68
182 72
188 77
180 71
180 70
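The height and weight table can be clustered with K = 2 as a worked example. This is a sketch under stated assumptions: the first two rows are used as the starting centroids (instead of random points) so that the run is reproducible, and the helper name run_k_means is illustrative. The notes do not state the units of the two columns, so none are assumed here.

```python
import math

# (height, weight) rows copied from the table; units are not stated in the notes.
data = [(185, 72), (170, 56), (168, 60), (179, 68),
        (182, 72), (188, 77), (180, 71), (180, 70)]

def run_k_means(points, start_centroids, max_iterations=100):
    """Alternate assignment and centroid update until the labels stop changing."""
    centroids = [list(c) for c in start_centroids]
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

# Assumption: start from the first two rows rather than random points.
centroids, labels = run_k_means(data, data[:2])
print(labels)     # [0, 1, 1, 0, 0, 0, 0, 0]
print(centroids)  # second centroid is [169.0, 58.0]
```

From this starting point, the rows (170, 56) and (168, 60) form one cluster with centroid (169, 58), and the remaining six rows form the other cluster with a centroid of roughly (182.33, 71.67). A different choice of starting centroids may need more iterations to reach a stable grouping.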
Applications of K-Means Clustering
Academic performance - Based on their scores, students are
categorized into grades like A, B, or C.
Diagnostic systems - K-means is used in creating smarter medical
decision support systems, especially in the
treatment of liver ailments.
Search engines - Clustering forms a backbone of search engines.
When a search is performed, the search results need to be grouped,
and search engines very often use clustering to do this.
Link: https://www.youtube.com/watch?v=CLKW6uWJtTc&t=340s