Clustering Methods

Cluster analysis is a method of grouping similar data objects into clusters based on their characteristics, using unsupervised learning. It has various applications across fields such as biology, marketing, and city planning, and serves as a preprocessing tool for other algorithms. The quality of clustering depends on factors like intra-class similarity, inter-class similarity, and the similarity measures used.

Uploaded by

pobocow192


What is Cluster Analysis?

• Cluster: A collection of data objects
  • similar (or related) to one another within the same group
  • dissimilar (or unrelated) to the objects in other groups
• Cluster analysis (or clustering, data segmentation, …)
  • Finding similarities between data according to the characteristics found in the data and grouping similar data objects into clusters
• Unsupervised learning: no predefined classes (i.e., learning by observations vs. learning by examples: supervised)
• Typical applications
  • As a stand-alone tool to get insight into data distribution
  • As a preprocessing step for other algorithms

Clustering for Data Understanding and Applications

• Biology: taxonomy of living things: kingdom, phylum, class, order, family, genus and species
• Information retrieval: document clustering
• Land use: identification of areas of similar land use in an earth observation database
• Marketing: help marketers discover distinct groups in their customer bases, and then use this knowledge to develop targeted marketing programs
• City planning: identifying groups of houses according to their house type, value, and geographical location
• Earthquake studies: observed earthquake epicenters should be clustered along continental faults
• Climate: understanding the earth's climate, finding patterns in the atmosphere and ocean
• Economic science: market research
Clustering as a Preprocessing Tool (Utility)

• Summarization:
  • Preprocessing for regression, PCA, classification, and association analysis
• Compression:
  • Image processing: vector quantization
• Finding K-nearest neighbors
  • Localizing search to one or a small number of clusters
• Outlier detection
  • Outliers are often viewed as those “far away” from any cluster
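
The outlier-detection idea above can be sketched in a few lines of Python. This is a minimal illustration, not a definitive method: the function names, the sample points, and the fixed distance `threshold` are all assumptions made for the example, and Euclidean distance is only one possible choice.

```python
import math

def nearest_centroid_distance(point, centroids):
    """Return the Euclidean distance from point to its closest centroid."""
    return min(math.dist(point, c) for c in centroids)

def flag_outliers(points, centroids, threshold):
    """Return the points whose nearest centroid is farther away than threshold.

    threshold is a hypothetical, application-specific cut-off.
    """
    return [p for p in points
            if nearest_centroid_distance(p, centroids) > threshold]

# Illustrative data: two cluster centers and three observations.
centroids = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, -0.2), (9.6, 10.3), (5.0, 30.0)]

outliers = flag_outliers(points, centroids, threshold=3.0)
print(outliers)  # the point far from both centers
```

In practice the threshold would depend on the spread of each cluster; a single global value is used here only to keep the sketch short.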

Quality: What Is Good Clustering?

• A good clustering method will produce high quality clusters
  • high intra-class similarity: cohesive within clusters
  • low inter-class similarity: distinctive between clusters
• The quality of a clustering method depends on
  • the similarity measure used by the method,
  • its implementation, and
  • its ability to discover some or all of the hidden patterns
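
One rough way to make "cohesive within clusters" and "distinctive between clusters" concrete is to compare average pairwise distances. The sketch below assumes Euclidean distance and uses hypothetical helper names; it is one of many possible quality indicators, not the measure prescribed by the slides.

```python
import math
from itertools import combinations

def mean_intra_distance(cluster):
    """Average distance between all pairs inside one cluster.

    Assumes the cluster holds at least two points.
    """
    pairs = list(combinations(cluster, 2))
    return sum(math.dist(a, b) for a, b in pairs) / len(pairs)

def mean_inter_distance(cluster_a, cluster_b):
    """Average distance between points taken from two different clusters."""
    total = sum(math.dist(a, b) for a in cluster_a for b in cluster_b)
    return total / (len(cluster_a) * len(cluster_b))

# Illustrative clusters (made-up coordinates).
c1 = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)]
c2 = [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]

cohesion = mean_intra_distance(c1)        # small value = cohesive cluster
separation = mean_inter_distance(c1, c2)  # large value = distinct clusters
print(cohesion, separation)
```

A small intra-cluster distance together with a large inter-cluster distance corresponds to high intra-class similarity and low inter-class similarity.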

Measure the Quality of Clustering

• Dissimilarity/similarity metric
  • Similarity is expressed in terms of a distance function, typically a metric: d(i, j)
  • The definitions of distance functions are usually rather different for interval-scaled, boolean, categorical, ordinal, ratio, and vector variables
  • Weights should be associated with different variables based on applications and data semantics
• Quality of clustering:
  • There is usually a separate “quality” function that measures the “goodness” of a cluster.
  • It is hard to define “similar enough” or “good enough”
  • The answer is typically highly subjective
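
As an illustration of a distance function d(i, j) with per-variable weights, here is a minimal sketch for numeric (interval-scaled) variables. The weighted Euclidean form, the function name, and the sample values are assumptions chosen for the example; other variable types would need different distance definitions.

```python
import math

def weighted_euclidean(x, y, weights):
    """Weighted Euclidean distance d(i, j) between two numeric objects.

    weights holds one non-negative weight per variable; larger weights
    make differences in that variable count more.
    """
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y and weights must have the same length")
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(x, y, weights)))

# Two hypothetical customers described by (age, income in thousands).
customer_i = (35.0, 52.0)
customer_j = (38.0, 48.0)

d_equal = weighted_euclidean(customer_i, customer_j, (1.0, 1.0))
d_age_heavy = weighted_euclidean(customer_i, customer_j, (4.0, 1.0))
print(d_equal, d_age_heavy)
```

With equal weights the distance is the ordinary Euclidean distance; raising the weight on age makes the same age gap contribute more to d(i, j).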
Considerations for Cluster Analysis

• Partitioning criteria
  • Single level vs. hierarchical partitioning (often, multi-level hierarchical partitioning is desirable)
• Separation of clusters
  • Exclusive (e.g., one customer belongs to only one region) vs. non-exclusive (e.g., one document may belong to more than one class)
• Similarity measure
  • Distance-based (e.g., Euclidean, road network, vector) vs. connectivity-based (e.g., density or contiguity)
• Clustering space
  • Full space (often when low dimensional) vs. subspaces (often in high-dimensional clustering)
Requirements and Challenges

• Scalability
  • Clustering all the data instead of only on samples
• Ability to deal with different types of attributes
  • Numerical, binary, categorical, ordinal, linked, and mixture of these
• Constraint-based clustering
  • User may give inputs on constraints
  • Use domain knowledge to determine input parameters
• Interpretability and usability
• Others
  • Discovery of clusters with arbitrary shape
  • Ability to deal with noisy data
  • Incremental clustering and insensitivity to input order
  • High dimensionality

Major Clustering Approaches (I)

• Partitioning approach:
  • Construct various partitions and then evaluate them by some criterion, e.g., minimizing the sum of squared errors
  • Typical methods: k-means, k-medoids, CLARANS
• Hierarchical approach:
  • Create a hierarchical decomposition of the set of data (or objects) using some criterion
  • Typical methods: DIANA, AGNES, BIRCH, CHAMELEON
• Density-based approach:
  • Based on connectivity and density functions
  • Typical methods: DBSCAN, OPTICS, DenClue
• Grid-based approach:
  • Based on a multiple-level granularity structure
  • Typical methods: STING, WaveCluster, CLIQUE

Major Clustering Approaches (II)

• Model-based:
  • A model is hypothesized for each of the clusters, and the method tries to find the best fit of the data to that model
  • Typical methods: EM, SOM, COBWEB
• Frequent pattern-based:
  • Based on the analysis of frequent patterns
  • Typical methods: p-Cluster
• User-guided or constraint-based:
  • Clustering by considering user-specified or application-specific constraints
  • Typical methods: COD (obstacles), constrained clustering
• Link-based clustering:
  • Objects are often linked together in various ways
  • Massive links can be used to cluster objects: SimRank, LinkClus

Chapter 10. Cluster Analysis: Basic Concepts and Methods

• Cluster Analysis: Basic Concepts
• Partitioning Methods
• Hierarchical Methods
• Density-Based Methods
• Grid-Based Methods
• Evaluation of Clustering
• Summary
Partitioning Algorithms: Basic Concept

• Partitioning method: Partitioning a database D of n objects into a set of k clusters, such that the sum of squared distances is minimized (where c_i is the centroid or medoid of cluster C_i)

$$E = \sum_{i=1}^{k} \sum_{p \in C_i} (p - c_i)^2$$

• Given k, find a partition of k clusters that optimizes the chosen partitioning criterion
  • Global optimum: exhaustively enumerate all partitions
  • Heuristic methods: k-means and k-medoids algorithms
  • k-means (MacQueen’67, Lloyd’57/’82): Each cluster is represented by the center of the cluster
  • k-medoids or PAM (Partition around medoids) (Kaufman & Rousseeuw’87): Each cluster is represented by one of the objects in the cluster
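
The criterion E can be computed directly for a given partition. The sketch below assumes the representative c_i is the centroid (mean point) of each cluster and that objects are numeric tuples; the function names and sample data are illustrative only.

```python
def centroid(cluster):
    """Mean point of a cluster of equal-length numeric tuples."""
    n = len(cluster)
    return tuple(sum(coords) / n for coords in zip(*cluster))

def sum_squared_error(clusters):
    """E = sum over clusters C_i, and over points p in C_i, of (p - c_i)^2."""
    total = 0.0
    for cluster in clusters:
        c = centroid(cluster)
        for p in cluster:
            # Squared Euclidean distance between p and the centroid c.
            total += sum((pi - ci) ** 2 for pi, ci in zip(p, c))
    return total

# A made-up partition with k = 2 clusters.
clusters = [
    [(1.0, 1.0), (3.0, 1.0)],   # centroid (2, 1)
    [(8.0, 8.0), (8.0, 10.0)],  # centroid (8, 9)
]
E = sum_squared_error(clusters)
print(E)  # 4.0: each of the four points lies at squared distance 1
```

A partitioning method searches for the assignment of objects to clusters that makes this value as small as possible.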
The K-Means Clustering Method

• Given k, the k-means algorithm is implemented in four steps:
  1. Partition objects into k nonempty subsets
  2. Compute seed points as the centroids of the clusters of the current partitioning (the centroid is the center, i.e., mean point, of the cluster)
  3. Assign each object to the cluster with the nearest seed point
  4. Go back to Step 2; stop when the assignment no longer changes
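
The four steps can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions rather than a reference implementation: the function names, the `seed` and `max_iter` parameters, the round-robin initial partition, and the way an empty cluster is re-seeded are all choices made for this example.

```python
import math
import random

def mean_point(points):
    """Centroid (mean point) of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def k_means(points, k, seed=0, max_iter=100):
    """Return (labels, centroids) following the four steps in the text."""
    if k < 1 or k > len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)

    # Step 1: arbitrary partition into k nonempty subsets
    # (shuffle the indices, then deal them out round-robin).
    order = list(range(len(points)))
    rng.shuffle(order)
    labels = [0] * len(points)
    for rank, idx in enumerate(order):
        labels[idx] = rank % k

    centroids = []
    for _ in range(max_iter):
        # Step 2: compute the centroid of each current cluster.
        centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Assumption: an empty cluster is re-seeded with a random point.
            centroids.append(mean_point(members) if members else rng.choice(points))

        # Step 3: assign each object to the cluster with the nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]

        # Step 4: stop when the assignment does not change, else repeat.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Illustrative data with two visually separate groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
labels, centroids = k_means(points, k=2)
print(labels)
```

The cluster numbers themselves are arbitrary; what matters is which objects share a label. On other data a different seed may lead to a different local optimum.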

An Example of K-Means Clustering

[Figure: k-means with K=2. Panels: the initial data set; arbitrarily partition objects into k groups; update the cluster centroids; reassign objects; loop if needed.]

• Partition objects into k nonempty subsets
• Repeat
  • Compute centroid (i.e., mean point) for each partition
  • Assign each object to the cluster of its nearest centroid
• Until no change
Comments on the K-Means Method

• Strength: Efficient: O(tkn), where n is # objects, k is # clusters, and t is # iterations. Normally, k, t << n.
  • Comparing: PAM: O(k(n-k)^2), CLARA: O(ks^2 + k(n-k))
• Comment: Often terminates at a local optimum.
• Weakness
  • Applicable only to objects in a continuous n-dimensional space
    • Use the k-modes method for categorical data
    • In comparison, k-medoids can be applied to a wide range of data
  • Need to specify k, the number of clusters, in advance (there are ways to automatically determine the best k; see Hastie et al., 2009)
  • Sensitive to noisy data and outliers
  • Not suitable to discover clusters with non-convex shapes
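
The sensitivity to outliers can be seen by comparing the mean point (used by k-means) with a medoid (used by k-medoids) on a cluster that contains one extreme value. The data and function names below are made up for illustration, and Euclidean distance is assumed.

```python
import math

def mean_point(points):
    """Centroid (mean point) of a list of equal-length numeric tuples."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def medoid(points):
    """The actual object with the smallest total distance to all others."""
    return min(points, key=lambda c: sum(math.dist(c, p) for p in points))

# Four nearby objects plus one outlier.
cluster = [(1.0, 1.0), (2.0, 1.0), (1.0, 2.0), (2.0, 2.0), (100.0, 100.0)]

center_mean = mean_point(cluster)   # dragged toward the outlier
center_medoid = medoid(cluster)     # stays among the four nearby objects
print(center_mean, center_medoid)
```

The mean lands far from every real object in the cluster, while the medoid remains one of the four nearby points, which is why k-medoids is usually described as more robust to noise.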