Clustering
[Figure: data points grouped into clusters; intra-cluster distances are minimized, inter-cluster distances are maximized]
What is Cluster Analysis?
• Cluster: a collection of data objects
– Similar to one another within the same cluster
– Dissimilar to the objects in other clusters
• Cluster analysis
– Grouping a set of data objects into clusters
• Clustering is unsupervised classification: no predefined classes
• Clustering is used:
– As a stand-alone tool to get insight into data distribution
• Visualization of clusters may unveil important information
– As a preprocessing step for other algorithms
• Efficient indexing or compression often relies on clustering
Some Applications of Clustering
• Pattern Recognition
• Image Processing
– cluster images based on their visual content
• Bio-informatics
• WWW and IR
– document classification
– cluster Weblog data to discover groups of similar access patterns
Applications of Cluster Analysis
• Understanding
– Discovered clusters of stocks vs. the industry group they match, e.g.:
Cluster 3: MBNA-Corp-DOWN, Morgan-Stanley-DOWN → Financial-DOWN
Cluster 4: Baker-Hughes-UP, Dresser-Inds-UP, Halliburton-HLD-UP, Louisiana-Land-UP, Phillips-Petro-UP, Unocal-UP, Schlumberger-UP → Oil-UP
(another cluster: Applied-Matl-DOWN, Bay-Network-Down, 3-COM-DOWN, …)
• Summarization
– Reduce the size of large data sets
[Figure: clustering of precipitation in Australia]
What is not Cluster Analysis?
• Supervised classification
– Have class label information
• Simple segmentation
– Dividing students into different registration groups alphabetically,
by last name
• Results of a query
– Groupings are a result of an external specification
Notion of a Cluster can be Ambiguous
K-means Clustering – Details
• Initial centroids are often chosen randomly.
– Clusters produced vary from one run to another.
• The centroid is (typically) the mean of the points in the cluster.
• ‘Closeness’ is measured by Euclidean distance, cosine similarity,
correlation, etc.
• Most of the convergence happens in the first few iterations.
– Often the stopping condition is changed to ‘Until relatively few points
change clusters’
• Complexity is O( n * K * I * d )
– n = number of points, K = number of clusters,
I = number of iterations, d = number of attributes
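As a concrete reference, the details above can be rendered as a short NumPy sketch. This is a minimal illustration, not code from the slides; the function name, random initialization, and Euclidean distance are assumptions consistent with the bullets above.

```python
import numpy as np

def kmeans(X, k, max_iters=100, seed=None):
    """Minimal K-means: X is an (n, d) array, k the number of clusters."""
    rng = np.random.default_rng(seed)
    # Initial centroids chosen randomly from the data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(max_iters):
        # Assignment step: each point goes to the closest centroid (Euclidean).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([
            X[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving: converged
        centroids = new_centroids
    return centroids, labels
```

Each iteration computes n * K distances over d attributes, which is exactly where the O(n * K * I * d) complexity quoted above comes from.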
K-Means Example
• Given: {2,4,10,12,3,20,30,11,25}, k=2
• Randomly assign means: m1=3,m2=4
• K1={2,3}, K2={4,10,12,20,30,11,25},
m1=2.5,m2=16
• K1={2,3,4},K2={10,12,20,30,11,25}, m1=3,m2=18
• K1={2,3,4,10},K2={12,20,30,11,25},
m1=4.75,m2=19.6
• K1={2,3,4,10,11,12},K2={20,30,25}, m1=7,m2=25
• Stop: reassigning points with these means leaves both clusters (and
hence the means) unchanged.
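The trace above can be checked with a few lines of plain Python (1-D data, the same initial means; ties are broken toward the first cluster):

```python
data = [2, 4, 10, 12, 3, 20, 30, 11, 25]
m1, m2 = 3.0, 4.0                        # initial means from the example
while True:
    K1 = [x for x in data if abs(x - m1) <= abs(x - m2)]
    K2 = [x for x in data if abs(x - m1) > abs(x - m2)]
    new1, new2 = sum(K1) / len(K1), sum(K2) / len(K2)
    if (new1, new2) == (m1, m2):         # means unchanged: converged
        break
    m1, m2 = new1, new2
print(sorted(K1), sorted(K2), m1, m2)
# [2, 3, 4, 10, 11, 12] [20, 25, 30] 7.0 25.0
```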
Evaluating K-means Clusters
• The most common measure is the Sum of Squared Error (SSE)
– For each point, the error is the distance to the nearest cluster centroid
– To get the SSE, we square these errors and sum them:
SSE = \sum_{i=1}^{K} \sum_{x \in C_i} \mathrm{dist}^2(m_i, x)
where m_i is the centroid of cluster C_i.
[Figure: scatter plots (x vs. y) of alternative K-means clusterings of the same data set]
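In code, the SSE is a direct transcription of this formula. A minimal NumPy sketch, assuming X, labels, and centroids as returned by a K-means run like the one sketched earlier:

```python
import numpy as np

def sse(X, labels, centroids):
    """Sum of squared errors: the squared distance of each point to
    the centroid of its own cluster, summed over all points."""
    diffs = X - centroids[labels]     # x - m_i for each point's cluster i
    return float(np.sum(diffs ** 2))
```

Given two clusterings of the same data, the one with the lower SSE is preferred; this is the criterion used below to pick among multiple runs.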
Importance of Choosing Initial Centroids …
[Figure: scatter plots showing K-means assignments at Iteration 1 and Iteration 2 for different choices of initial centroids]
Solutions to Initial Centroids Problem
• Multiple runs
– For each run, compute the SSE and keep the clustering with the
minimum SSE (see the sketch after this list).
• Sample the data and use hierarchical clustering to determine initial
centroids.
• Take the centroid of all the points, then select the point farthest
from this initial centroid as another centroid.
• Select more than k initial centroids and then choose among them
– Select the most widely separated ones.
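A sketch of the multiple-runs strategy, reusing the kmeans() and sse() sketches defined earlier in this section (scikit-learn's KMeans does essentially the same thing internally via its n_init parameter):

```python
def best_of_runs(X, k, n_runs=10):
    """Run K-means several times from different random initial
    centroids and keep the clustering with the minimum SSE."""
    best_err, best_result = float("inf"), None
    for seed in range(n_runs):
        centroids, labels = kmeans(X, k, seed=seed)
        err = sse(X, labels, centroids)
        if err < best_err:
            best_err, best_result = err, (centroids, labels)
    return best_result
```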
The K-Medoids Clustering Method
• Find representative objects, called medoids, in clusters
• PAM (Kaufman & Rousseeuw, 1987) [Partitioning Around Medoids]
– starts from an initial set of medoids and iteratively replaces one of the
medoids by one of the non-medoids if it improves the total distance of
the resulting clustering
– PAM works effectively for small data sets, but does not scale well for
large data sets
• CLARA (Kaufman & Rousseeuw, 1990) [Clustering LARge Applications]
• CLARANS (Ng & Han, 1994): Randomized sampling [Clustering
Large Applications based upon RANdomized Search]
The K-Medoids Clustering Method
[Figure: four cases of the cost change for an item j when medoid i is swapped with non-medoid h (t is another medoid); j may stay with t, move to h, or move between i, t, and h]
PAM Cost Calculation
• At each step of the algorithm, a medoid is swapped with a
non-medoid if the overall cost is improved.
• C_jih is the cost change for an item t_j associated with swapping
medoid t_i with non-medoid t_h; the total cost of a swap is the sum
of C_jih over all items t_j.
PAM Algorithm
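A minimal sketch of a PAM-style swap loop, assuming Euclidean distance; this is a greedy variant that accepts any cost-improving swap rather than the best one, and the function names are illustrative, not from the slides. The cost change tested for each candidate swap is exactly the sum of the per-item costs C_jih from the previous slide.

```python
import numpy as np

def total_cost(D, medoids):
    """Sum over all items of the distance to the nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def pam(X, k, seed=None):
    """PAM sketch: swap a medoid with a non-medoid whenever the swap
    lowers the total cost; stop when no swap improves it."""
    rng = np.random.default_rng(seed)
    n = len(X)
    # Precompute all pairwise Euclidean distances.
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    medoids = list(rng.choice(n, size=k, replace=False))
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                # Try swapping medoid i with non-medoid h.
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                # Accept the swap if the total cost decreases
                # (equivalently, if the summed C_jih is negative).
                if total_cost(D, candidate) < total_cost(D, medoids):
                    medoids, improved = candidate, True
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels
```

Each pass examines K(n - K) candidate swaps and every cost evaluation touches all n items, which is why PAM does not scale to large data sets and why CLARA and CLARANS resort to sampling.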