FIGURE 8.11 Dendrogram for single-linkage cluster solution for data in Figure 8.10. (Vertical axis: observations A, B, C, D; horizontal axis: minimum distance between clusters, 0 to 6.)
structures, where each object is connected to only a fraction of the other objects in the set). Here the computational effort is on the order of nA, where A is the average number of connections for each object in the set. Furthermore, single-linkage clustering does not require metric data. The implementation of the algorithm described above works just as well for ordinal measures of dissimilarity.
One drawback of single linkage is that it tends to be extremely myopic. An object will be added to a cluster so long as it is close to any one of the other objects in the cluster, even if it is relatively far from all the others. Thus, single linkage has a tendency to produce long, stringy clusters and nonconvex cluster shapes. If the true underlying clusters are nonconvex, then this property is not necessarily a bad thing; however, in most cases the naturally occurring modes in our data will tend to be convex and compact and a better reflection of internal homogeneity. As a direct result, the approach has not performed well in Monte Carlo studies (see, e.g., Milligan, 1980).
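To make this chaining tendency concrete, here is a minimal Python sketch (my own illustration, not from the text; the data, the random seed, and the SciPy calls are assumptions for illustration). It builds two compact groups joined by a thin bridge of points and runs SciPy's single-linkage routine.

import numpy as np
from scipy.cluster.hierarchy import linkage

rng = np.random.default_rng(0)
# Two compact groups about six units apart, connected by a thin "bridge".
group_a = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(20, 2))
group_b = rng.normal(loc=[6.0, 0.0], scale=0.3, size=(20, 2))
bridge = np.column_stack([np.linspace(1.0, 5.0, 8), np.zeros(8)])
X = np.vstack([group_a, group_b, bridge])

Z = linkage(X, method="single")   # single-linkage agglomeration

# The last row of the linkage matrix records the final merge; its third entry
# is the distance at which the last two clusters join.  Because single linkage
# needs only one close pair, the bridge chains the two groups together at a
# distance far smaller than the ~6-unit separation of the group centers.
print("final single-linkage merge distance:", round(float(Z[-1, 2]), 2))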
Complete Linkage. Instead of defining the distance (or dissimilarity) between clusters as the distance between the closest pair of objects (as in single linkage), we use the distance between the farthest pair of objects. This ensures that each object added to a cluster is close to all objects in the cluster and not just one. The only change required to go from single-linkage to complete-linkage clustering is to rewrite step 3 as follows:
d_{C_{n+1}, C_k} = \max\{ d_{C_i, C_k},\; d_{C_j, C_k} \} \qquad (8.6)
Compared to single linkage, complete linkage is much more likely to produce convex clusters that tend to be of comparable diameter. Although complete linkage has a tendency to produce convenient and homogeneous groupings, these are not necessarily driven by the natural modality of the data. Milligan (1980) found that complete linkage can be highly sensitive to outliers in the data. When a tie occurs at step 3 (i.e., more than two clusters can be joined together), the choice can affect the subsequent shape of the cluster solution (a problem that does not occur in the case of single linkage).
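To see how the update in equation (8.6) fits into the agglomerative loop, here is a small Python sketch (my own illustration, not from the text; the function name and the toy dissimilarity matrix are made up). It repeatedly joins the two closest active clusters and then applies the max rule when computing distances from the merged cluster.

import numpy as np

def complete_linkage_merges(D):
    """Agglomerate a full dissimilarity matrix; return the merge history."""
    D = np.asarray(D, dtype=float).copy()
    n = D.shape[0]
    clusters = {k: (k,) for k in range(n)}      # active cluster id -> members
    history, next_id = [], n
    while len(clusters) > 1:
        ids = sorted(clusters)
        # Steps 1-2: find and join the pair of active clusters that are closest.
        dist, a, b = min(((D[p, q], p, q)
                          for i, p in enumerate(ids) for q in ids[i + 1:]),
                         key=lambda t: t[0])
        # Step 3, equation (8.6): the distance from the merged cluster to any
        # other cluster is the larger of its distances to the two components.
        D = np.pad(D, ((0, 1), (0, 1)))
        for c in ids:
            if c not in (a, b):
                D[next_id, c] = D[c, next_id] = max(D[a, c], D[b, c])
        history.append((clusters[a], clusters[b], dist))
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)
        next_id += 1
    return history

# Toy symmetric dissimilarities among four objects labeled 0 through 3.
D = np.array([[0.0, 2.0, 6.0, 10.0],
              [2.0, 0.0, 5.0,  9.0],
              [6.0, 5.0, 0.0,  4.0],
              [10.0, 9.0, 4.0,  0.0]])
for left, right, d in complete_linkage_merges(D):
    print("merge", left, "+", right, "at distance", d)

For reference, SciPy's linkage applied to the condensed form of D with method="complete" should reproduce the same merge distances.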
Average Linkage. Instead of using the closest or farthest pair of objects, we can define the distance between two clusters as the average distance between all pairs of objects (one object from each cluster). The distance between any cluster C_k and the new cluster C_{n+1} (formed by joining together clusters C_i and C_j) can then be computed as a weighted average of the distances from C_k to the two component clusters. Thus, we rewrite step 3 as follows:
d_{C_{n+1}, C_k} = \frac{n_i d_{C_i, C_k} + n_j d_{C_j, C_k}}{n_i + n_j} \qquad (8.7)
where n_i + n_j is the number of objects in the newly formed cluster C_{n+1}. Note that if the data are nonmetric, the average can be replaced by the median (in which case the method is called median linkage).
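As a small sketch of the weighted-average update in equation (8.7), the following Python function (my own illustration, not from the text; the name and the numbers are made up) computes the distance from the merged cluster to another cluster:

def average_linkage_update(d_ik, d_jk, n_i, n_j):
    """Equation (8.7): distance from C_{n+1} (C_i joined with C_j) to C_k."""
    return (n_i * d_ik + n_j * d_jk) / (n_i + n_j)

# Example: C_i holds 3 objects at (average) distance 2.0 from C_k, and C_j
# holds 1 object at distance 6.0, so the merged cluster sits at
# (3 * 2.0 + 1 * 6.0) / 4 = 3.0 from C_k.
print(average_linkage_update(2.0, 6.0, 3, 1))

Weighting by cluster size is what makes this recursion equal to the plain average over all pairs of objects, one object from each cluster.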
Centroid Method. Instead of defining the distance between two clusters as the average distance between all pairs of objects, it is also possible to first "average" the objects in each cluster (in effect, calculating the cluster centroids) and then define the distance between the two centroids. For this method, it simplifies things if we work with squared distances. Let d_{ij}^2 represent the squared Euclidean distance between objects i and j. If cluster C = {i, j}, then the squared distance between object k and the centroid of cluster C can be written as
d_{kC}^2 = \frac{d_{ki}^2 + d_{kj}^2}{2} - \frac{d_{ij}^2}{4} \qquad (8.8)
In general, the squared distance between any cluster C_k and the new cluster C_{n+1} created by joining clusters C_i and C_j can be written as

d_{C_k, C_{n+1}}^2 = \frac{n_{C_i} d_{C_k, C_i}^2 + n_{C_j} d_{C_k, C_j}^2}{n_{C_i} + n_{C_j}} - \frac{n_{C_i} n_{C_j} d_{C_i, C_j}^2}{(n_{C_i} + n_{C_j})^2} \qquad (8.9)
By writing the rule in step 3 as a function of squared Euclidean distance (rather than the attribute measures X), the centroid method can be used with directly assessed proximity measures as well as derived distance measures (e.g., squared distances calculated from attribute data). According to Milligan (1980), the centroid method is robust to outliers but may be outperformed by average linkage.
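As a quick numerical check of equation (8.8), here is a minimal Python sketch (my own illustration, not from the text; the random points and the helper names are arbitrary) comparing the formula with the squared distance to the centroid computed directly from coordinates:

import numpy as np

def sq_dist_to_pair_centroid(d2_ki, d2_kj, d2_ij):
    """Equation (8.8): squared distance from object k to the centroid of {i, j}."""
    return (d2_ki + d2_kj) / 2.0 - d2_ij / 4.0

rng = np.random.default_rng(2)
i, j, k = rng.normal(size=(3, 4))              # three arbitrary points in 4-D

sq = lambda a, b: float(np.sum((a - b) ** 2))  # squared Euclidean distance
via_formula = sq_dist_to_pair_centroid(sq(k, i), sq(k, j), sq(i, j))
via_centroid = sq(k, (i + j) / 2.0)            # distance to the actual centroid

print(np.isclose(via_formula, via_centroid))   # True: the two values agree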
Ward's Method. The three methods just described (complete linkage, average linkage, and the centroid method) are all variants of a general agglomerative approach called the pair group method, differing only in terms of the distance relation specified in step 3. By contrast, Ward's method (sometimes referred to as the minimum variance method) adopts a slightly different strategy at step 1. Instead of joining the two closest clusters, Ward's method seeks to join the two clusters whose merger leads to the smallest within-cluster sum of squares (i.e., minimum within-group variance).
Ward's method has a tendency to produce equal-sized clusters (i.e., clusters with approximately the same number of observations in each) that are convex and compact. Because the approach is based on the minimization of within-cluster distances, it often produces a clustering solution (if the tree is "cut" in the right place) that is similar to those of the partitioning methods described in section 8.5 below (which also focus on minimizing the within-group sum of squares).
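As a closing sketch (my own illustration, not from the text; the data, the seed, and the choice of three clusters are made up), Ward's method can be run with SciPy, the tree "cut" at three clusters, and the within-cluster sum of squares reported for the resulting partition:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
X = np.vstack([c + rng.normal(scale=0.8, size=(30, 2)) for c in centers])

Z = linkage(X, method="ward")                     # minimum-variance merging
labels = fcluster(Z, t=3, criterion="maxclust")   # "cut" the tree at 3 clusters

# Within-cluster sum of squares for the resulting partition.
wss = sum(((X[labels == g] - X[labels == g].mean(axis=0)) ** 2).sum()
          for g in np.unique(labels))
print("cluster sizes:", np.bincount(labels)[1:])
print("within-cluster sum of squares:", round(float(wss), 2))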