Clustering and K-Means Algorithm

The document provides an overview of unsupervised learning, focusing on clustering techniques, particularly the K-Means algorithm. It explains the differences between supervised and unsupervised learning, types of clustering, and various clustering methods, including hierarchical and density-based clustering. Additionally, it discusses applications of clustering and the challenges associated with the K-Means method.

UNSUPERVISED LEARNING
(Clustering and K-Means Algorithm)
Outline of Presentation

❑Clustering
❑Types of Clustering
❑Examples and explanation of Clustering
❑K-Means Clustering
❑Working of K-Means Algorithm
❑K-Means Algorithm Examples
❑Hierarchical Clustering
❑Density-Based Clustering
Machine Learning

➢ Supervised Learning: task driven (defined labels); predicts future values. Tasks: Classification and Regression.
➢ Unsupervised Learning: data driven (no defined labels). Task: Clustering.
➢ Reinforcement Learning: learning from mistakes.
Supervised learning vs. Unsupervised learning

➢ Supervised learning: discover patterns in the data that relate data attributes with a target (class) attribute.
• These patterns are then utilized to predict the values of the target attribute in future data instances.
➢ Unsupervised learning: the data have no target attribute.
• We want to explore the data to find some intrinsic structures in them.
Clustering

❑ Clustering is a technique for finding similarity groups in data, called clusters. I.e.,
❖ it groups data instances that are similar to (near) each other into one cluster, and data instances that are very different (far away) from each other into different clusters.
❑ Clustering is often called an unsupervised learning task because no class values denoting an a priori grouping of the data instances are given, as would be the case in supervised learning.
❑ Due to historical reasons, clustering is often considered synonymous with unsupervised learning.
❖ In fact, association rule mining is also unsupervised.
What Is Clustering?

"Clustering is the process of dividing a dataset into groups consisting of similar data points."
• Points in the same group are as similar as possible.
• Points in different groups are as dissimilar as possible.
Examples of Clustering

• Items arranged in a mall
• Groups of diners in a restaurant
Illustration of Clustering
The data set has three natural groups of data points, i.e., 3 natural clusters.

➢ Example 1: group people of similar sizes together to make "small", "medium" and "large" T-shirts.
• Tailor-made for each person: too expensive.
• One-size-fits-all: does not fit all.
➢ Example 2: in marketing, segment customers according to their similarities, to do targeted marketing.
➢ Clustering has a long history and is used in almost every field, e.g., medicine, psychology, botany, sociology, biology, archeology, marketing, insurance, libraries, etc.
➢ In recent years, due to the rapid increase of online documents, text clustering has become important.
Aspects of Clustering
➢ A clustering algorithm
▪ Partitional clustering
▪ Hierarchical clustering
▪ …
➢ A distance (similarity, or dissimilarity) function
➢ Clustering quality
▪ Inter-cluster distance: maximized
▪ Intra-cluster distance: minimized
➢ The quality of a clustering result depends on the algorithm, the distance function, and the application.
What is Similarity?

• In the field of cluster analysis, similarity plays an important part.
• We shall now look at how similarity (alternatively judged as "dissimilarity") between any two data points can be measured.
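Two of the most common dissimilarity measures are the Euclidean (L2) and Manhattan (L1) distances. A minimal NumPy sketch (the function names are illustrative, not from the slides):

import numpy as np

def euclidean_distance(x, y):
    # Straight-line (L2) distance between two feature vectors.
    return np.sqrt(np.sum((x - y) ** 2))

def manhattan_distance(x, y):
    # City-block (L1) distance between two feature vectors.
    return np.sum(np.abs(x - y))

a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])
print(euclidean_distance(a, b))  # 5.0
print(manhattan_distance(a, b))  # 7.0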
Applications of clustering

• Recommendation systems
• Search results
• Banking / finance / insurance
• Grouping of photos
• Movie recommendation
• Retail stores
Applications of clustering (Face Clustering)
Types of CLUSTERING
❑ Exclusive Clustering
❑ Overlapping Clustering
❑ Hierarchical Clustering

Exclusive Clustering
• Hard clustering
• A data point/item belongs exclusively to one cluster
• For example: K-Means clustering

Overlapping Clustering
• Soft clustering
• A data point/item may belong to multiple clusters
• For example: Fuzzy C-Means clustering

Hierarchical Clustering
Methods of CLUSTERING
❑ Partitioning Method
❑ Hierarchical Method
❑ Density-Based Methods

Partitioning Methods
Partition a database D of n objects into a set of k clusters such that the sum of squared distances is minimized (where c_i is the centroid or medoid of cluster C_i):

E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - c_i)^2

k-means (MacQueen '67, Lloyd '57/'82): each cluster is represented by the center of the cluster.
k-medoids or PAM (Partitioning Around Medoids) (Kaufman & Rousseeuw '87): each cluster is represented by one of the objects in the cluster.
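As a toy check of this objective (numbers invented for illustration), take k = 2 with 1-D clusters C_1 = {1, 3} and C_2 = {10, 12}; the centroids are c_1 = 2 and c_2 = 11, so E = (1-2)^2 + (3-2)^2 + (10-11)^2 + (12-11)^2 = 4. The same computation in NumPy:

import numpy as np

# Toy 1-D example: two clusters and their mean centroids.
clusters = [np.array([1.0, 3.0]), np.array([10.0, 12.0])]
centroids = [c.mean() for c in clusters]  # [2.0, 11.0]

# E = sum over clusters i of sum over points p in C_i of (p - c_i)^2
E = sum(((c - m) ** 2).sum() for c, m in zip(clusters, centroids))
print(E)  # 4.0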
K-means Clustering

➢ K-Means is a clustering algorithm whose main goal is to group similar elements or data points into a cluster.
➢ "K" represents the number of clusters.

Examples:
• A pile of dirty clothes
• A document classifier
K-means Clustering: Working of the Algorithm
(The original slides illustrate the algorithm step by step with figures, which are not reproduced here.)
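Since the figures are not reproduced, here is a minimal NumPy sketch of the standard K-means loop (Lloyd's algorithm): pick initial centroids, assign each point to its nearest centroid, recompute each centroid as the mean of its assigned points, and repeat until the assignments stop changing. Function and variable names are illustrative.

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    # Minimal K-means (Lloyd's algorithm). X: (n, d) array of points.
    rng = np.random.default_rng(seed)
    # Step 1: choose k distinct data points as the initial centroids.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = None
    for _ in range(n_iters):
        # Step 2: assign each point to the nearest centroid (Euclidean distance).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = dists.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments unchanged: the algorithm has converged
        labels = new_labels
        # Step 3: recompute each centroid as the mean of its assigned points.
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids, labels

# Usage (toy data): two obvious groups in 2-D.
X = np.array([[1, 1], [1.5, 2], [1, 1.5], [8, 8], [8.5, 9], [9, 8]], dtype=float)
centroids, labels = kmeans(X, k=2)
print(labels)  # e.g. [0 0 0 1 1 1] (cluster numbering may vary)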
How will you find the value of k?
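The slides do not spell out the procedure here, but a common heuristic is the elbow method: run K-means for a range of k values, plot the within-cluster sum of squared errors (the objective E above) against k, and pick the k at the "elbow" where the curve flattens. A short sketch using scikit-learn (KMeans and its inertia_ attribute are standard scikit-learn API; the toy data are invented):

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Toy data: three well-separated blobs in 2-D.
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2))
               for c in ([0, 0], [5, 5], [10, 0])])

for k in range(1, 7):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))  # inertia_ = within-cluster sum of squared distances
# The drop in inertia slows sharply after k = 3: the "elbow".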
Summary of K-means algorithm
Application of K-means algorithm
Colour-Based Image Segmentation Using K-means
Step 1: Load a colour image of tissue stained with hematoxylin and eosin (H&E).
Step 2: Convert the image from RGB colour space to L*a*b* colour space.
• Unlike the RGB colour model, L*a*b* colour is designed to approximate human vision.
• There is a complicated transformation between RGB and L*a*b*:
(L*, a*, b*) = T(R, G, B)
(R, G, B) = T'(L*, a*, b*)
Step 3: Undertake clustering analysis in the (a*, b*) colour space with the K-means algorithm.
• In the L*a*b* colour space, each pixel has a feature vector (L*, a*, b*).
• As in feature selection, the L* feature is discarded; as a result, each pixel has a feature vector (a*, b*).
• Apply the K-means algorithm to the image in the (a*, b*) feature space, with K = 3 chosen using domain knowledge.
Step 4: Label every pixel in the image using the results from K-means clustering (indicated by three different grey levels).
Step 5: Create images that segment the H&E image by colour.
• Apply the label and the colour information of each pixel to obtain separate colour images corresponding to the three clusters: "blue" pixels, "white" pixels, and "pink" pixels.
Step 6: Segment the nuclei into a separate image with the L* feature.
• In cluster 1, there are dark and light blue objects (pixels). The dark blue objects (pixels) correspond to nuclei (from domain knowledge).
• The L* feature specifies the brightness value of each colour.
• With a threshold on L*, we obtain an image containing the nuclei only.
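A compact sketch of steps 1-4 using scikit-image and scikit-learn (rgb2lab and KMeans are standard APIs in those libraries; the file names are placeholders):

import numpy as np
from skimage import io, color
from sklearn.cluster import KMeans

# Step 1: load the H&E image (file name is a placeholder).
rgb = io.imread("he_tissue.png")[:, :, :3]

# Step 2: convert RGB -> L*a*b*.
lab = color.rgb2lab(rgb)

# Step 3: cluster on the (a*, b*) features only, with K = 3.
ab = lab[:, :, 1:3].reshape(-1, 2)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(ab)

# Step 4: label every pixel; save the labels as three grey levels.
label_image = labels.reshape(lab.shape[:2])
io.imsave("labels.png", (label_image * 127).astype(np.uint8))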
Problem of the K-Means Method

• The algorithm is only applicable if the mean is defined.
• For categorical data, use k-modes: the centroid is represented by the most frequent values.
• The user needs to specify k.
• The algorithm is sensitive to outliers.
• Outliers are data points that are very far away from other data points.
• Outliers could be errors in the data recording, or some special data points with very different values.
Problems with outliers
Problem of the K-Means Method

• The algorithm is sensitive to initial seeds.
• If we use different seeds, we may get good results; there are some methods to help choose good seeds (one is sketched below).
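One widely used seeding method (named here as an example; the slides do not specify one) is k-means++, which spreads the initial centroids out by sampling each new seed with probability proportional to its squared distance from the nearest seed already chosen. scikit-learn uses it by default:

from sklearn.cluster import KMeans

# init="k-means++" is scikit-learn's default; shown explicitly for emphasis.
km = KMeans(n_clusters=3, init="k-means++", n_init=10, random_state=0)

Well-spread seeds make it much less likely that two initial centroids land inside the same natural cluster.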
Problem of the K-Means Method

• The k-means algorithm is not suitable for discovering clusters that are not hyper-ellipsoids (or hyper-spheres).
Methods of CLUSTERING
❑ Partitioning Method
❑ Hierarchical Method
❑ Density-Based Methods

Hierarchical CLUSTERING

• Uses a distance matrix as the clustering criterion. This method does not require the number of clusters k as an input, but it needs a termination condition.
(Figure: agglomerative nesting (AGNES) merges bottom-up over steps 0-4, combining a and b into ab, d and e into de, then cde, and finally abcde; divisive analysis (DIANA) traverses the same hierarchy top-down, from step 4 back to step 0.)
Types of hierarchical clustering

➢ Agglomerative (bottom-up) clustering: builds the dendrogram (tree) from the bottom level, and
▪ merges the most similar (or nearest) pair of clusters at each step;
▪ stops when all the data points are merged into a single cluster (i.e., the root cluster).
➢ Divisive (top-down) clustering: starts with all data points in one cluster, the root.
• Splits the root into a set of child clusters; each child cluster is recursively divided further.
• Stops when only singleton clusters of individual data points remain, i.e., each cluster contains only a single point.
An example: working of the algorithm
Hierarchical CLUSTERING

• Decompose the data objects into several levels of nested partitioning (a tree of clusters), called a dendrogram.
• A clustering of the data objects is obtained by cutting the dendrogram at the desired level; each connected component then forms a cluster.
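A minimal sketch of agglomerative clustering with SciPy (linkage builds the dendrogram bottom-up and fcluster cuts it at the desired level; both are standard scipy.cluster.hierarchy functions; the toy data are invented):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two obvious groups in 2-D.
X = np.array([[1, 1], [1.2, 1.1], [0.9, 1.3],
              [8, 8], [8.2, 8.1], [7.9, 8.3]])

# Agglomerative (bottom-up): repeatedly merge the two nearest clusters.
Z = linkage(X, method="single")  # Z encodes the full dendrogram

# "Cut" the dendrogram so that exactly 2 clusters remain.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)  # e.g. [1 1 1 2 2 2]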
Density-Based CLUSTERING

• Clustering based on density (a local cluster criterion), such as density-connected points.
• Major features:
➢ Discovers clusters of arbitrary shape
➢ Handles noise
➢ One scan
➢ Needs density parameters as a termination condition
• Several interesting studies:
DBSCAN: Ester, et al. (KDD'96)
OPTICS: Ankerst, et al. (SIGMOD'99)
DENCLUE: Hinneburg & Keim (KDD'98)
CLIQUE: Agrawal, et al. (SIGMOD'98) (more grid-based)
Density-Based CLUSTERING

• Two parameters: Eps and MinPts.
• Eps: maximum radius of the neighbourhood.
• MinPts: minimum number of points in an Eps-neighbourhood of that point.
• N_Eps(p) = {q ∈ D | dist(p, q) ≤ Eps}
• Directly density-reachable: a point p is directly density-reachable from a point q w.r.t. Eps, MinPts if
▪ p belongs to N_Eps(q), and
▪ q satisfies the core point condition: |N_Eps(q)| ≥ MinPts.
(Figure: example with MinPts = 5, Eps = 1 cm.)
Density-Reachable and Density-Connected

• Density-reachable: a point p is density-reachable from a point q w.r.t. Eps, MinPts if there is a chain of points p1, …, pn, with p1 = q and pn = p, such that p_{i+1} is directly density-reachable from p_i.
• Density-connected: a point p is density-connected to a point q w.r.t. Eps, MinPts if there is a point o such that both p and q are density-reachable from o w.r.t. Eps and MinPts.
DBSCAN: Density-Based Spatial Clustering of Applications with Noise

• Relies on a density-based notion of cluster: a cluster is defined as a maximal set of density-connected points.
• Discovers clusters of arbitrary shape in spatial databases with noise.
(Figure: core, border, and outlier points, with Eps = 1 cm, MinPts = 5.)
DBSCAN: The Algorithm

• Arbitrarily select a point p.
• Retrieve all points density-reachable from p w.r.t. Eps and MinPts.
• If p is a core point, a cluster is formed.
• If p is a border point, no points are density-reachable from p, and DBSCAN visits the next point of the database.
• Continue the process until all of the points have been processed.
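A minimal NumPy sketch of this procedure (illustrative, not production code; a real implementation such as scikit-learn's DBSCAN uses spatial indexes rather than a full distance matrix):

import numpy as np

def dbscan(X, eps, min_pts):
    # Minimal DBSCAN sketch. Returns labels: -1 = noise, 0..k-1 = cluster ids.
    n = len(X)
    labels = np.full(n, -1)           # -1 means "noise" (or not yet assigned)
    visited = np.zeros(n, dtype=bool)
    # Precompute Eps-neighbourhoods: N_Eps(p) = {q | dist(p, q) <= eps}.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbours = [np.flatnonzero(dists[i] <= eps) for i in range(n)]

    cluster = 0
    for p in range(n):
        if visited[p]:
            continue
        visited[p] = True
        if len(neighbours[p]) < min_pts:
            continue                  # border or noise point: visit the next point
        # p is a core point: a new cluster is formed and expanded from it.
        labels[p] = cluster
        seeds = list(neighbours[p])
        while seeds:
            q = seeds.pop()
            if labels[q] == -1:
                labels[q] = cluster   # q is density-reachable from p
            if not visited[q]:
                visited[q] = True
                if len(neighbours[q]) >= min_pts:
                    seeds.extend(neighbours[q])  # q is also core: keep expanding
        cluster += 1
    return labels

# Usage: two dense blobs plus one far-away noise point.
X = np.array([[1, 1], [1.1, 1], [1, 1.2],
              [5, 5], [5.1, 5], [5, 5.2], [20, 20]], dtype=float)
print(dbscan(X, eps=0.5, min_pts=3))  # e.g. [0 0 0 1 1 1 -1]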
