
K-Means Clustering

Uploaded by Aryan Panchal
Copyright © All Rights Reserved

Introduction
• K-Means clustering is an unsupervised machine learning algorithm that groups an unlabeled dataset into different clusters.
• Unsupervised machine learning is the process of letting a computer work with unlabeled, unclassified data, so that the algorithm operates on that data without supervision. With no prior training on labeled examples, the machine’s job is to organize unsorted data according to similarities, patterns, and differences.
• K-Means clustering assigns each data point to one of K clusters based on its distance from the cluster centers. It starts by placing the K cluster centroids at random positions in the space. Each data point is then assigned to the cluster whose centroid is closest. After every point has been assigned, each centroid is recomputed as the mean of the points in its cluster. These two steps repeat until the centroids stop moving (or a maximum number of iterations is reached). The analysis assumes that the number of clusters K is given in advance and that the task is to place each point in one of those K groups.
• A centroid is the point that represents the center of a cluster (the mean of its points); it is not necessarily a member of the dataset.
• In some cases K is not clearly defined, and we have to decide on a suitable value for K. K-Means performs best when the data is well separated; when clusters overlap, it is not a good choice. K-Means is fast compared with many other clustering techniques, and the points within each resulting cluster tend to be tightly grouped. However, it does not by itself give clear information about the quality of the clusters it finds. Different initial placements of the centroids can lead to different final clusters. K-Means is also sensitive to noise and outliers, because a single distant point can pull a cluster mean far away from the rest of the cluster. Finally, it can get stuck in a local minimum.
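To make the loop and the initialization sensitivity concrete, here is a minimal Python sketch (Python 3.8+ for `math.dist`) of the batch procedure described above, often called Lloyd's algorithm. `kmeans` and `wcss` are hypothetical helper names, the initial means are passed in explicitly so their effect is visible, and the six points are a toy data set chosen only for illustration.

```python
import math


def kmeans(points, init_means, max_iter=100):
    """Batch K-Means (Lloyd's algorithm) starting from the given initial means.

    points and init_means are lists of equal-length tuples.
    Returns (means, clusters), where clusters[j] lists the points assigned to means[j].
    """
    means = [tuple(m) for m in init_means]
    for _ in range(max_iter):
        # Assignment step: put each point in the cluster of its nearest mean
        # (Euclidean distance; ties go to the lower cluster index).
        clusters = [[] for _ in means]
        for p in points:
            j = min(range(len(means)), key=lambda c: math.dist(p, means[c]))
            clusters[j].append(p)
        # Update step: move each mean to the average of its points
        # (an empty cluster keeps its previous mean).
        new_means = [
            tuple(sum(coord) / len(cl) for coord in zip(*cl)) if cl else m
            for m, cl in zip(means, clusters)
        ]
        if new_means == means:
            break  # the means stopped moving, so the assignment is stable
        means = new_means
    return means, clusters


def wcss(means, clusters):
    """Within-cluster sum of squared Euclidean distances (lower means tighter clusters)."""
    return sum(
        sum((a - b) ** 2 for a, b in zip(p, m))
        for m, cl in zip(means, clusters)
        for p in cl
    )


points = [(0, 0), (1, 0), (10, 0), (11, 0), (20, 0), (21, 0)]
good_means, good_clusters = kmeans(points, [(0, 0), (10, 0), (20, 0)])
bad_means, bad_clusters = kmeans(points, [(0, 0), (1, 0), (15, 0)])
print(good_means, wcss(good_means, good_clusters))  # [(0.5, 0.0), (10.5, 0.0), (20.5, 0.0)] 1.5
print(bad_means, wcss(bad_means, bad_clusters))     # [(0.0, 0.0), (1.0, 0.0), (15.5, 0.0)] 101.0
```

Both runs converge, but the second one stops in a local minimum: its within-cluster sum of squares is 101.0 instead of 1.5. Note also that a final centroid such as (0.5, 0.0) is not one of the data points.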
Objective of K-Means Clustering

• The goal of clustering is to divide a population or set of data points into a number of groups so that the points within each group are more similar to one another than to the points in other groups. In essence, it groups things according to how similar or different they are from one another.
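For K-Means specifically, "more similar within a group" is usually made precise as the within-cluster sum of squared Euclidean distances, which the algorithm tries to minimize. In this sketch, C_k denotes the set of points in cluster k and μ_k its centroid (symbols introduced here for illustration):

```latex
J = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2,
\qquad
\mu_k = \frac{1}{|C_k|} \sum_{x_i \in C_k} x_i
```

Assigning each point to its nearest centroid cannot increase J, and neither can moving each centroid to the mean of its cluster, so J never increases from one iteration to the next. This is why the procedure converges, although possibly to a local rather than a global minimum.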
How does K-Means clustering work?
• We are given a data set of items, each described by certain features, with values for those features (like a vector). The task is to categorize those items into groups. To do this, we use the K-Means algorithm, an unsupervised learning algorithm. The ‘K’ in the algorithm’s name is the number of groups (clusters) we want to sort our items into.
• (It helps to think of the items as points in an n-dimensional space.) The algorithm sorts the items into k clusters of similar items. To measure that similarity, we use the Euclidean distance: the closer two points are, the more similar they are.
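The Euclidean distance between two items a and b with n features is d(a, b) = sqrt((a1 − b1)² + … + (an − bn)²). A small Python sketch follows; `euclidean_distance` is a hypothetical helper name, and Python 3.8+ also provides the same computation as `math.dist`.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of features."""
    if len(a) != len(b):
        raise ValueError("points must have the same number of features")
    # Square the per-feature differences, add them up, and take the square root.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance((0, 0), (3, 4)))        # 5.0
print(euclidean_distance((1, 2, 3), (4, 6, 3)))  # 5.0
```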
The algorithm works as follows:
• First, we randomly initialize k points, called
means or cluster centroids.
• We assign each item to its closest mean and update that mean’s coordinates, which are the averages of the items assigned to that cluster so far.
• We repeat the process for a given number of
iterations and at the end, we have our clusters.
• The “points” mentioned above are called means because they are the mean values of the items assigned to them. There are many options for initializing these means. An intuitive method is to initialize the means at random items in the data set. Another method is to initialize the means at random values within the boundaries of the data set (for example, if the items’ values for a feature x lie in [0, 3], each mean’s x-coordinate is drawn at random from [0, 3]).
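Both initialization strategies can be sketched in a few lines of Python. The function names are hypothetical, items are assumed to be equal-length tuples of numbers, and a seeded `random.Random` is used only so that runs are reproducible.

```python
import random


def init_from_items(items, k, rng):
    """Use k items drawn at random (without replacement) from the data set as the initial means."""
    return [tuple(p) for p in rng.sample(items, k)]


def init_within_bounds(items, k, rng):
    """Draw k means uniformly at random inside each feature's [min, max] range."""
    dims = len(items[0])
    lows = [min(p[d] for p in items) for d in range(dims)]
    highs = [max(p[d] for p in items) for d in range(dims)]
    return [tuple(rng.uniform(lo, hi) for lo, hi in zip(lows, highs)) for _ in range(k)]


rng = random.Random(42)  # fixed seed so the run is reproducible
data = [(0.0, 1.0), (3.0, 2.0), (1.5, 0.5), (2.5, 3.0)]
print(init_from_items(data, 2, rng))     # two of the data points
print(init_within_bounds(data, 2, rng))  # x in [0.0, 3.0], y in [0.5, 3.0]
```

The first strategy guarantees every mean starts on a real data point; the second can place a mean in an empty region of the feature space, which may leave that cluster with no items.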
Pseudocode
• Initialize k means with random values
--> For a given number of iterations:
    --> Iterate through items:
        --> Find the mean closest to the item by calculating the Euclidean distance of the item with each of the means
        --> Assign item to mean
        --> Update mean by shifting it to the average of the items in that cluster
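Unlike the batch description in the Introduction, this pseudocode moves a mean immediately after each item is assigned. The Python sketch below follows that sequential reading (similar to MacQueen's sequential K-Means), which is an assumption about the intended semantics: each mean is kept as the running average mean + (item − mean) / count of every assignment it has received, with counts accumulating across passes. `online_kmeans`, `n_iter`, and `seed` are hypothetical names; initialization uses random items, and Python 3.8+ is needed for `math.dist`.

```python
import math
import random


def online_kmeans(items, k, n_iter=10, seed=0):
    """Sequential K-Means that moves a mean as soon as an item is assigned to it.

    items: list of equal-length tuples. Returns (means, labels), where labels[i]
    is the index of the mean that items[i] was assigned to in the last pass.
    Counts start at 0, so the first item assigned to a mean replaces its start value.
    """
    rng = random.Random(seed)
    # Initialize the k means at k randomly chosen items.
    means = [list(p) for p in rng.sample(items, k)]
    counts = [0] * k  # how many assignments each mean has absorbed so far
    labels = [0] * len(items)
    for _ in range(n_iter):               # for a given number of iterations
        for i, item in enumerate(items):  # iterate through items
            # Find the mean closest to the item (Euclidean distance).
            j = min(range(k), key=lambda c: math.dist(item, means[c]))
            labels[i] = j  # assign item to mean
            # Shift the mean to the running average of everything assigned to it:
            # new_mean = old_mean + (item - old_mean) / count.
            counts[j] += 1
            means[j] = [m + (x - m) / counts[j] for m, x in zip(means[j], item)]
    return [tuple(m) for m in means], labels


data = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.0)]
means, labels = online_kmeans(data, 2)
print(means, labels)
```

Because a mean moves after every single item, the result can depend on the order in which items are visited; the batch version recomputes all means only after a full pass, so it does not have this order dependence.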
