Machine Learning Notes Anna University

K-Means Clustering Algorithm

K-Means Clustering is an unsupervised learning algorithm used to solve clustering problems in machine learning and data science. In this
topic, we will learn what the K-Means clustering algorithm is, how it works, and how to implement it in Python.
What is the K-Means Algorithm?
K-Means Clustering is an unsupervised learning algorithm that groups an unlabeled dataset into different clusters. Here K defines the number
of pre-defined clusters to be created in the process: if K=2, there will be two clusters; for K=3, there will be three clusters; and so on.
It is an iterative algorithm that divides the unlabeled dataset into K different clusters in such a way that each data point belongs to only one group
of points with similar properties.
It lets us cluster the data into different groups and is a convenient way to discover the categories of groups in an unlabeled dataset on its own,
without the need for any training.
It is a centroid-based algorithm, where each cluster is associated with a centroid. The main aim of the algorithm is to minimize the sum of
distances between the data points and their corresponding cluster centroids.
The algorithm takes the unlabeled dataset as input, divides it into K clusters, and repeats the process until it can no longer improve the
clusters. The value of K must be chosen in advance.

The k-means clustering algorithm mainly performs two tasks:


Determines the best value for the K center points (centroids) through an iterative process.
Assigns each data point to its closest K-center. The data points that are near a particular K-center form a cluster, so each cluster contains
data points with some commonalities and is well separated from the other clusters.
How does the K-Means Algorithm Work? The working of the K-Means algorithm is explained in the steps below (a minimal Python sketch follows them):
Step-1: Select the number K to decide the number of clusters.
Step-2: Select K random points as the initial centroids. (They need not come from the input dataset.)
Step-3: Assign each data point to its closest centroid, which forms the predefined K clusters.
Step-4: Calculate the variance and place a new centroid for each cluster (i.e., move each centroid to the mean of its cluster).
Step-5: Repeat the third step, i.e., reassign each data point to the new closest centroid.
Step-6: If any reassignment occurs, go back to step 4; otherwise, go to FINISH.
Step-7: The model is ready.
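A minimal NumPy sketch of these steps (the toy data, variable names, and stopping test are illustrative assumptions, not values from the notes):

import numpy as np

def kmeans(X, k, max_iter=100, seed=0):
    """Minimal K-Means on an (n_samples, n_features) array X."""
    rng = np.random.default_rng(seed)
    # Step 2: pick K random data points as the initial centroids
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iter):
        # Step 3: assign every point to its closest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its cluster
        # (keep the old centroid if a cluster happens to be empty)
        new_centroids = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centroids[c]
            for c in range(k)
        ])
        # Step 6: stop once no centroid moves any more
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return centroids, labels

# Two variables M1 and M2, K=2 (toy data)
X = np.array([[1.0, 2.0], [1.5, 1.8], [1.0, 0.6],
              [5.0, 8.0], [8.0, 8.0], [9.0, 11.0]])
centroids, labels = kmeans(X, k=2)
print(centroids, labels)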
Let's understand the above steps by considering the visual plots:
Suppose we have two variables, M1 and M2. The x-y scatter plot of these two variables is given below. Let's take the number of clusters as
K=2, i.e., we will try to group this dataset into two different clusters.
We need to choose K random points or centroids to form the clusters. These points can either be points from the dataset or any other points;
here we select two points that are not part of our dataset as the K points. Consider the image below:
Now we assign each data point of the scatter plot to its closest K-point or centroid. We compute this with the usual formula for the distance
between two points, and draw a median line between the two centroids. Consider the image below:

From the image it is clear that the points on the left side of the line are nearer to the K1 (blue) centroid, and the points on the right side of the
line are closer to the yellow centroid. Let's color them blue and yellow for clear visualization. As we need to find the closest clusters, we repeat
the process by choosing new centroids: we compute the center of gravity of each current cluster and place the new centroid there.
Next, we reassign each data point to its new closest centroid by repeating the same process of drawing a median line.
Hierarchical clustering
Hierarchical clustering is a connectivity-based clustering model that groups together data points that are close to each other, based on a
measure of similarity or distance. The assumption is that data points that are close to each other are more similar or related than data points that are
farther apart.
A dendrogram, a tree-like figure produced by hierarchical clustering, depicts the hierarchical relationships between groups. Individual data points
are located at the bottom of the dendrogram, while the largest clusters, which include all the data points, are located at the top. In order to generate
different numbers of clusters, the dendrogram can be sliced at various heights.
The dendrogram is created by iteratively merging or splitting clusters based on a measure of similarity or distance between data points. Clusters
are divided or merged repeatedly until all data points are contained within a single cluster, or until the predetermined number of clusters is
attained.
To estimate the ideal number of clusters, we can look at the dendrogram and find the height at which its branches separate into distinct
clusters. Cutting the dendrogram at this height then gives that number of clusters.
Types of Hierarchical Clustering
Basically, there are two types of hierarchical clustering:
1. Agglomerative clustering
2. Divisive clustering
Hierarchical Agglomerative Clustering
It is also known as the bottom-up approach or hierarchical agglomerative clustering (HAC). It produces a structure that is more informative than the
unstructured set of clusters returned by flat clustering, and it does not require us to prespecify the number of clusters. Bottom-up
algorithms treat each data point as a singleton cluster at the outset and then successively agglomerate pairs of clusters until all clusters have been
merged into a single cluster that contains all the data.
Steps:
• Consider each letter (data point) as a single cluster and calculate the distance from each cluster to all the other clusters.
• In the second step, comparable clusters are merged into a single cluster. Say cluster (B) and cluster (C) are very similar to each
other, so we merge them; similarly for clusters (D) and (E). We are left with the clusters [(A), (BC), (DE), (F)].
• We recalculate the proximities according to the algorithm and merge the two nearest clusters, (DE) and (F), to form the new clusters [(A),
(BC), (DEF)].
• Repeating the same process, the clusters (DEF) and (BC) are comparable and are merged into a new cluster. We are now left with the clusters
[(A), (BCDEF)].
• At last, the two remaining clusters are merged into a single cluster [(ABCDEF)].
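The merge sequence described above can be reproduced with SciPy. The six 2-D coordinates for A-F below are invented purely so that the merges happen in roughly that order:

import numpy as np
from scipy.cluster.hierarchy import linkage

# Six made-up points labelled A..F
points = {"A": [-5.0, -5.0], "B": [4.0, 4.0], "C": [4.2, 4.1],
          "D": [9.0, 9.0], "E": [9.1, 9.2], "F": [8.0, 8.5]}
X = np.array(list(points.values()))

# Bottom-up (agglomerative) clustering with single linkage
Z = linkage(X, method="single")

# Each row of Z is one merge: (cluster i, cluster j, distance, new cluster size).
# Indices 0..5 are the original points A..F; indices 6, 7, ... are merged clusters.
for step, (i, j, dist, size) in enumerate(Z, start=1):
    print(f"step {step}: merge {int(i)} and {int(j)} at distance {dist:.2f} (size {int(size)})")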
Hierarchical Divisive Clustering
It is also known as the top-down approach. This algorithm also does not require us to prespecify the number of clusters. Top-down clustering requires a
method for splitting a cluster that contains the whole data, and it proceeds by splitting clusters recursively until individual data points have been
separated into singleton clusters.
Computing the Distance Matrix
While merging two clusters, we check the distance between every pair of clusters and merge the pair with the least distance (greatest similarity).
But how is that distance determined? There are different ways of defining the inter-cluster distance/similarity. Some of them are:
1. Min distance: the minimum distance between any two points of the two clusters.
2. Max distance: the maximum distance between any two points of the two clusters.
3. Group average: the average distance between every pair of points from the two clusters.
4. Ward's method: the similarity of two clusters is based on the increase in squared error when the two clusters are merged.
For example, if we group a given dataset using different methods, we may get different results:
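The four criteria above correspond to SciPy's "single", "complete", "average", and "ward" linkage methods. A small sketch (toy data, purely illustrative) cuts each hierarchy into two clusters so the assignments can be compared; depending on the data, the methods may or may not agree:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1], [2.0, 0.5],
              [3.4, 0], [3.4, 1], [4.4, 0], [4.4, 1]])

for method in ("single", "complete", "average", "ward"):
    Z = linkage(X, method=method)                     # build the hierarchy with this criterion
    labels = fcluster(Z, t=2, criterion="maxclust")   # cut it into 2 flat clusters
    print(f"{method:>8}: {labels}")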
Hierarchical Agglomerative vs Divisive Clustering
• Divisive clustering is more complex than agglomerative clustering: in divisive clustering we need a flat clustering
method as a "subroutine" to split each cluster until every data point has its own singleton cluster.
• Divisive clustering can be more efficient if we do not generate a complete hierarchy all the way down to individual data leaves. The time complexity
of naive agglomerative clustering is O(n³), because we exhaustively scan the N x N distance matrix (dist_mat) for the lowest distance in each of the N-1
iterations. Using a priority queue data structure, this can be reduced to O(n² log n), and with some further optimizations it can be brought
down to O(n²). For divisive clustering, given a fixed number of top levels and using an efficient flat algorithm such as K-Means, divisive
algorithms are linear in the number of patterns and clusters.
• A divisive algorithm can also be more accurate. Agglomerative clustering makes decisions by considering local patterns or neighbouring points
without initially taking the global distribution of the data into account, and these early decisions cannot be undone. Divisive clustering, in
contrast, takes the global distribution of the data into consideration when making top-level partitioning decisions.
Mean-Shift Clustering
Mean shift is a clustering algorithm in unsupervised learning that assigns data points to clusters iteratively by shifting points towards the mode
(in the context of mean shift, the mode is the region with the highest density of data points). As such, it is also known as the mode-seeking algorithm.
The mean-shift algorithm has applications in image processing and computer vision. Unlike the popular K-Means clustering algorithm, mean
shift does not require the number of clusters to be specified in advance; the number of clusters is determined by the algorithm from the data.
The mean-shift clustering algorithm can be summarized as follows (a minimal sketch is given after the steps):
1. Initialize the data points as cluster centroids.
2. Repeat the following until convergence or until a maximum number of iterations is reached:
   • For each data point, calculate the mean of all points within a certain radius (i.e., the "kernel") centered at the data point.
   • Shift the data point to that mean.
3. Identify the cluster centroids as the points that no longer move after convergence.
4. Return the final cluster centroids and the assignment of data points to clusters.
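A from-scratch sketch of these steps using a flat kernel (the radius value and the toy data are assumptions; scikit-learn's MeanShift provides a more complete implementation):

import numpy as np

def mean_shift(X, radius=2.0, max_iter=100, tol=1e-4):
    """Flat-kernel mean shift: each point is shifted to the mean of its neighbours."""
    shifted = X.copy()
    for _ in range(max_iter):
        new = np.empty_like(shifted)
        for i, p in enumerate(shifted):
            # all original points within `radius` of the current position (the "kernel")
            neighbours = X[np.linalg.norm(X - p, axis=1) <= radius]
            new[i] = neighbours.mean(axis=0)
        converged = np.max(np.linalg.norm(new - shifted, axis=1)) < tol
        shifted = new
        if converged:
            break
    # points that converged to (nearly) the same location form one cluster
    centroids = np.unique(np.round(shifted, 3), axis=0)
    labels = np.array([np.argmin(np.linalg.norm(centroids - p, axis=1)) for p in shifted])
    return centroids, labels

X = np.array([[1.0, 1.0], [1.2, 0.8], [0.8, 1.1],
              [8.0, 8.0], [8.3, 7.9], [7.8, 8.2]])
centroids, labels = mean_shift(X, radius=2.0)
print(centroids, labels)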
One of the main advantages of mean-shift clustering is that it does not require the number of clusters to be specified beforehand.
It also does not make any assumptions about the distribution of the data, and can handle arbitrary shapes and sizes of clusters. However, it can be
sensitive to the choice of kernel and the radius of the kernel.
Mean-Shift clustering can be applied to various types of data, including image and video processing, object tracking and bioinformatics.
Kernel Density Estimation
The first step when applying the mean-shift clustering algorithm is to represent your data mathematically, i.e., as a set of points such as the
one shown below.

Mean shift builds upon the concept of kernel density estimation (KDE). Imagine that the above data was sampled from a probability
distribution. KDE is a method for estimating the underlying distribution (the probability density function) of a set of data. It works by
placing a kernel on each point in the data set.
A kernel is a fancy mathematical word for a weighting function generally used in convolution. There are many different types of kernels, but the
most popular one is the Gaussian kernel. Adding up all of the individual kernels generates a probability surface, i.e., a density function.
Depending on the kernel bandwidth parameter used, the resulting density function will vary.
Below is the KDE surface for our points above using a Gaussian kernel with a kernel bandwidth of 2.

[Figure: surface plot and contour plot of the KDE]
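A minimal sketch of how such a density surface can be computed, assuming scikit-learn's KernelDensity and made-up sample points (the Gaussian kernel and bandwidth of 2 follow the text):

import numpy as np
from sklearn.neighbors import KernelDensity

# Made-up 2-D points standing in for the data set shown above
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [6.0, 6.5], [6.5, 6.0], [7.0, 7.0]])

# Place a Gaussian kernel with bandwidth 2 on every point and sum them
kde = KernelDensity(kernel="gaussian", bandwidth=2.0).fit(X)

# Evaluate the estimated density on a grid: this is the KDE "surface"
xs, ys = np.meshgrid(np.linspace(-2, 10, 50), np.linspace(-2, 10, 50))
grid = np.column_stack([xs.ravel(), ys.ravel()])
density = np.exp(kde.score_samples(grid)).reshape(xs.shape)
print(density.shape)  # 50 x 50 grid of density values; plot it for the surface/contour view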
