A Quick Introduction To Machine Learning (K-Means Clustering)

This document provides an introduction to K-means clustering, an unsupervised machine learning algorithm. It explains that K-means clustering partitions a set of points into k clusters by minimizing the distance between points and their assigned cluster centroid. The K-means algorithm works by randomly selecting initial centroids and then iteratively reassigning points to centroids and recomputing centroids until the clusters stabilize. Issues with K-means include its dependence on initial centroid selection and choosing the correct number of clusters k.

Uploaded by Moamen

A Quick Introduction to

Machine Learning
(K-means Clustering)
Lecturer: John Guttag

6.00.2x Machine Learning


K-means Clustering
Given a set of points X, and a positive integer k,
partition X into k clusters such that it approximately
minimizes the objective function
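\[ \sum_{c=1}^{k} \sum_{x \in c} \lVert x - \mu_c \rVert^2 \]

The outer sum runs over the clusters, the inner sum runs over the points in each cluster, and \lVert x - \mu_c \rVert is the distance from point x to the centroid \mu_c of its cluster. That is, we minimize the sum of the squared differences between each point and the mean of its cluster.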
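As a rough illustration of the objective function, the sketch below evaluates it for a given partition. The helper names (`centroid`, `squared_distance`, `objective`) and the sample points are hypothetical, chosen only for this example.

```python
def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of equal-length tuples."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))


def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def objective(clusters):
    """Sum, over clusters, of squared distances from each point to its centroid."""
    total = 0.0
    for cluster in clusters:
        mu = centroid(cluster)
        total += sum(squared_distance(x, mu) for x in cluster)
    return total


# Hypothetical 2-D data: the same four points as two clusters, then as one.
two_clusters = [[(1, 1), (3, 1)], [(10, 10), (10, 14)]]
one_cluster = [[(1, 1), (3, 1), (10, 10), (10, 14)]]
print(objective(two_clusters))  # 10.0
print(objective(one_cluster))   # 195.0
```

Splitting the points into two tight groups gives a far smaller value than lumping them together, which is the sense in which the partition "minimizes" the objective.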
K-means Algorithm
randomly choose k examples as centroids
while true:
    create k clusters by assigning each
        example to closest centroid
    compute k new centroids by averaging
        examples in each cluster
    if centroids don’t change:
        break
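The pseudocode above could be turned into runnable Python roughly as follows. This is a minimal sketch, not the course's implementation: the function names, the `max_iters` safety cap, and the rule that an empty cluster keeps its old centroid are all assumptions made here.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(points):
    """Coordinate-wise average of a non-empty list of points."""
    return tuple(sum(coords) / len(points) for coords in zip(*points))


def k_means(points, k, rng=None, max_iters=100):
    """Return (clusters, centroids) once the assign/recompute loop settles."""
    if rng is None:
        rng = random.Random()
    # randomly choose k examples as centroids
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iters):  # safety cap (assumption); the slide loops forever
        # create k clusters by assigning each example to its closest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # compute k new centroids by averaging the examples in each cluster;
        # an empty cluster keeps its old centroid (assumption)
        new_centroids = [mean(c) if c else centroids[i]
                         for i, c in enumerate(clusters)]
        if new_centroids == centroids:  # centroids don't change
            break
        centroids = new_centroids
    return clusters, centroids


# Hypothetical data: two loose groups of 2-D points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
clusters, centroids = k_means(points, k=2, rng=random.Random(0))
for centroid, cluster in zip(centroids, clusters):
    print(centroid, cluster)
```

Passing a seeded `random.Random` makes a run repeatable; without it, each call may start from different centroids.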



Example

[Slide figures omitted. The plots step through the algorithm in this order:]
Choose Initial Centroids (k = 4)
Assign Points to Clusters
Compute New Centroids
Reassign Points to Clusters
Compute New Centroids
Reassign Points
Compute New Centroids
Reassign Points
Compute New Centroids
No Points Move


Issues with K-means
Final result can depend upon initial centroids
Greedy algorithm can find different local optima

Choosing the “wrong” k can lead to nonsense
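To make the dependence on initial centroids concrete, here is a small sketch that runs the assign/recompute loop from two explicitly chosen starting points. The function names (`lloyd`, `sse`), the iteration cap, and the four sample points are assumptions for illustration only.

```python
def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    return tuple(sum(coords) / len(pts) for coords in zip(*pts))


def lloyd(points, centroids, max_iters=100):
    """Run the assign/recompute loop from explicitly given starting centroids."""
    k = len(centroids)
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        # An empty cluster keeps its old centroid (assumption).
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def sse(points, centroids):
    """Sum of squared distances from each point to its nearest centroid."""
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


# Hypothetical data: the corners of a 4-by-1 rectangle, k = 2.
points = [(0, 0), (0, 1), (4, 0), (4, 1)]
a = lloyd(points, [(0, 0), (4, 0)])  # start with one centroid on each short side
b = lloyd(points, [(0, 0), (0, 1)])  # start with both centroids on the same side
print(a, sse(points, a))  # [(0.0, 0.5), (4.0, 0.5)] 1.0
print(b, sse(points, b))  # [(2.0, 0.0), (2.0, 1.0)] 16.0
```

Both runs stop because no point moves, yet the second settles on a much worse local optimum than the first.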



Choosing K
A priori knowledge about application domain
There are five different kinds of bacteria: k = 5
There are two kinds of people in the world: k = 2

Search for a good k


Try different values of k, and evaluate quality of results
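One way to search for a good k might look like the sketch below. The slides do not say how quality is evaluated; this example assumes the summed squared distance to each cluster mean as the quality measure, and `search_k`, `trials`, and `seed` are hypothetical names.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(pts):
    return tuple(sum(coords) / len(pts) for coords in zip(*pts))


def k_means(points, k, rng, max_iters=100):
    """One randomly initialised run; returns the non-empty clusters."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(max_iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]


def badness(clusters):
    """Assumed quality measure: total squared distance to each cluster's mean."""
    total = 0.0
    for c in clusters:
        mu = mean(c)
        total += sum(sq_dist(x, mu) for x in c)
    return total


def search_k(points, candidates, trials=10, seed=0):
    """Map each candidate k to the lowest badness seen over several runs."""
    rng = random.Random(seed)
    return {k: min(badness(k_means(points, k, rng)) for _ in range(trials))
            for k in candidates}


# Hypothetical data: three loose groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
for k, score in search_k(points, range(1, 6)).items():
    print(k, round(score, 2))
```

Under this measure the score reaches zero when every point is its own cluster, so the smallest score alone does not pick k; a common heuristic is to look for the value of k after which the improvement levels off.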



Choosing Centroids
Try multiple random choices and choose best



Finding the “Best” Solution

best = kMeans(points)
for t in range(numTrials):
    C = kMeans(points)
    if badness(C) < badness(best):
        best = C

\[ V(c) = \sum_{x \in c} (\mathrm{mean}(c) - x)^2 \qquad \mathrm{badness}(C) = \sum_{c \in C} V(c) \]
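The loop and the two formulas above can be combined into a self-contained sketch. The names `kMeans`, `badness`, `V`, and `numTrials` follow the slide; the extra parameters (`k`, `rng`, `maxIters`, `seed`), the wrapper `bestOfTrials`, and the empty-cluster handling are assumptions added so the example runs on its own.

```python
import random


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean(c):
    return tuple(sum(coords) / len(c) for coords in zip(*c))


def kMeans(points, k, rng, maxIters=100):
    """One randomly initialised run; returns the list of non-empty clusters."""
    centroids = rng.sample(points, k)
    clusters = []
    for _ in range(maxIters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centroids[i]))
            clusters[nearest].append(p)
        updated = [mean(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    return [c for c in clusters if c]


def V(c):
    """Variability of one cluster: sum of (mean(c) - x)^2 over x in c."""
    m = mean(c)
    return sum(sq_dist(m, x) for x in c)


def badness(C):
    """Badness of a clustering: sum of V(c) over its clusters."""
    return sum(V(c) for c in C)


def bestOfTrials(points, k, numTrials, seed=0):
    """Keep the clustering with the lowest badness over 1 + numTrials runs."""
    rng = random.Random(seed)
    best = kMeans(points, k, rng)
    for t in range(numTrials):
        C = kMeans(points, k, rng)
        if badness(C) < badness(best):
            best = C
    return best


# Hypothetical data: three loose groups of 2-D points.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]
best = bestOfTrials(points, k=3, numTrials=20)
print(badness(best))
for cluster in best:
    print(cluster)
```

Because each run starts from different random centroids, keeping the lowest-badness result reduces the chance of reporting a poor local optimum, at the cost of running the algorithm several times.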



Hierarchical vs. K-means
Hierarchical looks at different numbers of clusters
From 1 to n

K-means looks at many ways of creating k clusters

Hierarchical is slow

K-means is fast

Hierarchical is deterministic

K-means is non-deterministic