Kmeans Notes
Table Of Contents
Clustering
What is a Cluster?
What is Clustering?
What are the Different Types of Clustering?
K Means
What is the Basic Idea behind K-means?
What is the 1-of-K Coding Scheme?
What is the Objective Function in K-Means Algorithm?
What is Distortion Measure?
What is the K-Means algorithm? (More Formalised Version)
Convergence in K Means
What is Convergence in K-Means?
Is Convergence Guaranteed in K-Means?
Minima Issues in K-Means
What are some reasons why a cluster may have just One Point?
K Medoids
Implementation of K Means
Supervised versus Unsupervised Learning
What is the Key Difference Between Unsupervised Learning versus Supervised Learning?
Supervised Learning: learning under supervision; we have a fully labeled data set to learn from.
Unsupervised Learning: algorithms analyze and cluster unlabeled data sets, discovering hidden patterns in the data without the need for human intervention; the data is unlabeled.
How can we view the task of discrete binning/grouping in supervised and unsupervised learning setups?
In a supervised setup, grouping data points is classification (the groups/labels are known in advance); in an unsupervised setup, it is clustering (the groups must be discovered from the data itself).
Clustering
What is a Cluster?
A Cluster can be thought of as comprising a group of data points whose inter-point distances are
small compared with the distances to points outside of the cluster.
What is Clustering?
The grouping of objects such that objects in the same cluster are more similar to each other than
they are to objects in another cluster.[1]
Or
The task of finding an assignment of data points to clusters, as well as a set of vectors {μk}, such that the sum of the squares of the distances of each data point to its closest vector μk (or another similarity measure) is a minimum.
Clean and simple explanation with diagrams at Google Developers Machine Learning Course -
Clustering Algorithms
K Means
For each data point xn, we introduce a corresponding set of binary indicator variables rnk ∈ {0, 1}, where k = 1, . . . , K, describing which of the K clusters the data point xn is assigned to: if xn is assigned to cluster k, then rnk = 1 and rnj = 0 for j ≠ k. For example, with K = 3 and xn assigned to cluster 2, (rn1, rn2, rn3) = (0, 1, 0). This is the 1-of-K coding scheme.
Also called the Distortion Measure, the objective J represents the sum of the squares of the distances of each data point to its assigned vector μk. Here N is the number of data points and K is the number of clusters (topics/classes).
Our goal is to find values for the {rnk} and the {μk} so as to minimize J.
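Written out, the distortion measure described above (the standard K-means objective) is:

J = \sum_{n=1}^{N} \sum_{k=1}^{K} r_{nk} \, \lVert x_n - \mu_k \rVert^2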
Convergence in K Means
Convergence means that there is no further change in the cluster centroids, or in the cluster assignments of the points, from one iteration to the next.
Yes, convergence is guaranteed: each phase of the algorithm either decreases the Objective Function J or leaves it unchanged, so J eventually reaches a steady state. (Note that this steady state may be a local rather than a global minimum.)
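Why J never increases: each phase minimizes J with respect to one set of variables while the other is held fixed. In the assignment step, for fixed centroids, r_{nk} = 1 if k = \arg\min_j \lVert x_n - \mu_j \rVert^2 and r_{nk} = 0 otherwise. In the update step, for fixed assignments,

\mu_k = \frac{\sum_n r_{nk} x_n}{\sum_n r_{nk}}

i.e. each centroid becomes the mean of the points currently assigned to its cluster.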
An example plot of the objective decreasing over iterations: Source of Image
However, sometimes we may not observe a strict inflection point, as in the figure below.
3. KMeans++ Algorithm
K-Means++ changes only the first step of K-means: it initializes the centroids in a smarter way, picking each new centroid with probability proportional to its squared distance from the centroids already chosen, which spreads the initial centroids out and improves the quality of the final clustering.
From GeeksForGeeks:
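The GeeksForGeeks listing itself is not reproduced here; the following is a minimal NumPy sketch of the k-means++ seeding step (the function name kmeans_pp_init and its parameters are illustrative, not taken from GeeksForGeeks):

import numpy as np

def kmeans_pp_init(X, k, seed=None):
    """k-means++ seeding: pick the first centroid uniformly at random,
    then pick each further centroid with probability proportional to its
    squared distance from the nearest centroid chosen so far."""
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    centroids = [X[rng.integers(n_samples)]]
    for _ in range(1, k):
        # squared distance from every point to its nearest chosen centroid
        d2 = np.min(
            ((X[:, None, :] - np.asarray(centroids)[None, :, :]) ** 2).sum(axis=-1),
            axis=1,
        )
        probs = d2 / d2.sum()
        centroids.append(X[rng.choice(n_samples, p=probs)])
    return np.asarray(centroids)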
K Medoids
K-means may not be the most suitable algorithm in some cases, since it is very sensitive to noise and outliers. K-means attempts to minimize the total squared error, while k-medoids minimizes the sum of dissimilarities between each data point and its assigned medoid (an actual data point chosen as the cluster center).
1. Initialize: select k random points out of the n data points as the medoids.
2. Associate each data point with its closest medoid, using any common distance metric.
3. While the cost decreases (see the sketch after this list):
For each medoid m and for each non-medoid data point o:
1. Swap m and o, associate each data point with its closest medoid, and recompute the cost (the total distance of points to their medoids).
2. If the total cost is more than that in the previous step, undo the swap.
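A minimal sketch of this swap-based (PAM-style) procedure, assuming NumPy arrays and Euclidean distance as the dissimilarity (all names here are illustrative):

import numpy as np

def k_medoids(X, k, max_iter=100, seed=None):
    """PAM-style k-medoids: greedily swap a medoid with a non-medoid point
    whenever the swap lowers the total distance of points to their nearest medoid."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    medoids = rng.choice(n, size=k, replace=False)

    def total_cost(medoid_idx):
        # each point pays the distance to its nearest medoid
        d = np.linalg.norm(X[:, None, :] - X[medoid_idx][None, :, :], axis=-1)
        return d.min(axis=1).sum()

    cost = total_cost(medoids)
    for _ in range(max_iter):
        improved = False
        for i in range(k):                  # for each medoid m ...
            for o in range(n):              # ... and each non-medoid o
                if o in medoids:
                    continue
                candidate = medoids.copy()
                candidate[i] = o            # try swapping m and o
                new_cost = total_cost(candidate)
                if new_cost < cost:         # keep the swap only if the cost drops
                    medoids, cost = candidate, new_cost
                    improved = True
        if not improved:                    # stop once no swap helps
            break
    labels = np.linalg.norm(X[:, None, :] - X[medoids][None, :, :], axis=-1).argmin(axis=1)
    return medoids, labels, cost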
Summary: K-means implicitly assumes that:
1. All clusters are the same size.
2. Clusters have the same extent in every direction.
3. Clusters have similar numbers of points assigned to them.
Find a demonstration at Demonstration of k-means assumptions — scikit-learn 1.0.1
documentation
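A short sketch of the kind of experiment in that demonstration (assuming scikit-learn is installed): applying KMeans to anisotropically stretched blobs violates the "same extent in every direction" assumption and yields poor clusters.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Isotropic Gaussian blobs satisfy K-means' assumptions reasonably well
X, y_true = make_blobs(n_samples=500, centers=3, random_state=170)

# Stretch the data so clusters become elongated (anisotropic),
# violating the "same extent in every direction" assumption
transformation = np.array([[0.6, -0.6], [-0.4, 0.8]])
X_aniso = X @ transformation

labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_aniso)
# Plotting X_aniso colored by `labels` shows cluster boundaries that cut
# across the true elongated groups.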
Implementation of K Means
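A minimal NumPy sketch of the algorithm described in these notes (Lloyd's algorithm, i.e. alternating assignment and update steps); the function name and parameters are illustrative:

import numpy as np

def kmeans(X, k, n_iters=100, tol=1e-6, seed=None):
    """Lloyd's algorithm: alternate between assigning each point to its
    nearest centroid and moving each centroid to the mean of its points,
    stopping once the distortion J stops decreasing."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(X.shape[0], size=k, replace=False)].astype(float)
    prev_J = np.inf
    for _ in range(n_iters):
        # assignment step: r_nk = 1 for the nearest centroid, 0 otherwise
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        J = d2[np.arange(X.shape[0]), labels].sum()   # distortion measure
        # update step: each centroid becomes the mean of its assigned points
        for j in range(k):
            if np.any(labels == j):                   # guard against empty clusters
                centroids[j] = X[labels == j].mean(axis=0)
        if prev_J - J < tol:                          # J no longer decreasing -> converged
            break
        prev_J = J
    return centroids, labels, J

In practice one would usually rely on sklearn.cluster.KMeans, which offers init='k-means++' and multiple restarts (n_init) to mitigate the local-minima issues mentioned above.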