Evaluation of the performance of clustering algorithms in kernel-induced feature space
Abstract
By using a kernel function, data that are not easily separable in the original space can be clustered into homogeneous
groups in the implicitly transformed high-dimensional feature space. Kernel k-means algorithms have recently been shown
to perform better than conventional k-means algorithms in unsupervised classification. However, few reports have examined
the benefits of using a kernel function and the relative merits of the various kernel clustering algorithms with regard to the
data distribution. In this study, we reformulated four representative clustering algorithms based on a kernel function and
evaluated their performances for various data sets. The results indicate that each kernel clustering algorithm gives markedly
better performance than its conventional counterpart for almost all data sets. Of the kernel clustering algorithms studied in
the present work, the kernel average linkage algorithm gives the most accurate clustering results.
© 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Clustering; Kernel; k-means; Fuzzy c-means; Average linkage; Mountain algorithm
doi:10.1016/j.patcog.2004.09.006
D.-W. Kim et al. / Pattern Recognition 38 (2005) 607–611
2.1. Kernel k-means algorithm

Given an unlabeled data set X and a mapping \Phi : R^d \to H, the k-means algorithm in the high-dimensional feature space iteratively searches for k clusters by minimizing the function J [2,3]:

J(X) = \sum_{i=1}^{k} \sum_{j=1}^{n} z_{ij} \, \|\Phi(x_j) - v_i\|^2,   (1)

where the ith cluster centroid is v_i = n_i^{-1} \sum_{j=1}^{n} z_{ij} \Phi(x_j) and n_i = \sum_{j=1}^{n} z_{ij}. Here z_{ij} indicates whether data point x_j belongs to the ith cluster; specifically, z_{ij} = 1 if it belongs to the ith cluster and 0 otherwise. The key notion in the kernel k-means algorithm lies in the calculation of the distance in the feature space. The distance between \Phi(x_j) and v_i in the feature space is calculated through the kernel K in the input space:

\|\Phi(x_j) - v_i\|^2 = K(x_j, x_j) - \frac{2}{n_i} \sum_{l=1}^{n} z_{il} K(x_j, x_l) + \frac{1}{n_i^2} \sum_{l=1}^{n} \sum_{m=1}^{n} z_{il} z_{im} K(x_l, x_m).   (2)

2.2. Kernel fuzzy c-means algorithm

The kernel fuzzy c-means algorithm in the feature space induced by a mapping \Phi minimizes the function J_m [4]:

J_m(X) = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^m \, \|\Phi(x_j) - v_i\|^2,   (4)

where \mu_{ij} is the membership degree of data point x_j to the ith fuzzy cluster, and m is a fuzziness coefficient. The ith cluster centroid is v_i = \sum_{j=1}^{n} (\mu_{ij})^m \Phi(x_j) / \sum_{j=1}^{n} (\mu_{ij})^m. The k-means algorithm repeatedly updates the k clusters at each successive iteration, whereas the fuzzy c-means algorithm iteratively updates the membership degrees \mu_{ij} at each iteration. The update of \mu_{ij} in the feature space is defined through the kernel in the input space as follows:

\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\|\Phi(x_j) - v_i\|^2}{\|\Phi(x_j) - v_k\|^2} \right)^{1/(m-1)} \right]^{-1}.   (5)
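As a concrete illustration, the distance of Eq. (2) can be evaluated entirely from the kernel matrix, without ever forming the feature-space centroids. The following Python sketch is not the authors' code; the RBF kernel width `gamma`, the initialization scheme, and the iteration cap are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)
    sq = np.sum(X ** 2, axis=1)
    return np.exp(-gamma * (sq[:, None] + sq[None, :] - 2.0 * X @ X.T))

def kernel_kmeans(K, k, init=None, n_iter=100, seed=0):
    """Kernel k-means: assignments are updated from the feature-space
    distances of Eq. (2), computed via the kernel trick."""
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(k, size=n) if init is None else np.asarray(init)
    for _ in range(n_iter):
        dist = np.zeros((n, k))
        for i in range(k):
            z = labels == i
            n_i = max(z.sum(), 1)
            # ||phi(x_j) - v_i||^2 = K_jj - (2/n_i) sum_l z_il K_jl
            #                        + (1/n_i^2) sum_{l,m} z_il z_im K_lm
            dist[:, i] = (np.diag(K)
                          - 2.0 * K[:, z].sum(axis=1) / n_i
                          + K[np.ix_(z, z)].sum() / n_i ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels
```

Each assignment pass costs O(n^2) kernel-matrix operations; the centroids v_i exist only implicitly, which is what allows the algorithm to operate in the kernel-induced space.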
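Similarly, one iteration of the kernel fuzzy c-means update — feature-space distances through the kernel, then the membership rule of Eq. (5) — can be sketched as below. This is an illustrative reading of Eqs. (4)–(5), not the authors' implementation; the small floor on the distances is an added numerical guard.

```python
import numpy as np

def kernel_fcm_step(K, U, m=2.0):
    """One kernel fuzzy c-means iteration.

    K : (n, n) kernel matrix; U : (c, n) membership matrix.
    Returns the memberships updated according to Eq. (5)."""
    W = U ** m
    W = W / W.sum(axis=1, keepdims=True)          # normalized fuzzy centroid weights
    # ||phi(x_j) - v_i||^2 expressed through the kernel, one row per cluster
    d2 = np.array([np.diag(K) - 2.0 * (K @ w) + w @ K @ w for w in W])
    d2 = np.maximum(d2, 1e-12)                    # guard against zero distances
    inv = d2 ** (-1.0 / (m - 1.0))
    return inv / inv.sum(axis=0, keepdims=True)   # columns sum to 1
```

In use, the step would be repeated until the change in U falls below the termination criterion, mirroring the stopping rule described in Section 3.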
2.3. Kernel average linkage algorithm

The average linkage algorithm merges, at each step, the two clusters with the smallest average pairwise distance. Given two clusters F_n = \{\Phi(x_1), \ldots, \Phi(x_n)\} and F_m = \{\Phi(y_1), \ldots, \Phi(y_m)\} in the feature space, their average distance is

\text{Avg}(F_n, F_m) = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \|\Phi(x_i) - \Phi(y_j)\|^2,   (7)

where

\|\Phi(x_i) - \Phi(x_j)\|^2 = \Phi(x_i) \cdot \Phi(x_i) - 2\,\Phi(x_i) \cdot \Phi(x_j) + \Phi(x_j) \cdot \Phi(x_j) = K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j).   (8)

The iterative merging procedure in the feature space continues until all data points have been merged into a single cluster or the number of merged groups reaches a prespecified number of clusters k.

2.4. Kernel mountain algorithm

The mountain algorithm estimates the cluster centroids by constructing and destroying the mountain function on a grid space. The mountain function indicates the potential of each grid point to be a cluster centroid. To reduce the computational complexity of the original algorithm, we employed the subtractive mountain algorithm, in which the mountain function is calculated on data points rather than grid points.

Given a mapping \Phi, the mountain function at a data point \Phi(x_i) in the feature space is defined as

M(\Phi(x_i)) = \sum_{j=1}^{n} e^{-\alpha \|\Phi(x_i) - \Phi(x_j)\|^2},   (9)

where \|\Phi(x_i) - \Phi(x_j)\|^2 = K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j). A higher value of M(\Phi(x_i)) indicates that \Phi(x_i) has more data points \Phi(x_j) near to it in the feature space. After calculating the mountain values, the data point whose mountain value is M_1^* = \max_i [M(\Phi(x_i))] is selected as the first cluster centroid. Subsequent centroids are found using the following modified mountain function:

\hat{M}(\Phi(x_i)) = M(\Phi(x_i)) - M_1^* \, e^{-\beta \|\Phi(x_i) - \Phi(c_1^*)\|^2},   (10)

where \Phi(c_1^*) is the most recently selected centroid and \beta is a positive constant.

3. Experimental results

To test the various kernel clustering algorithms, we applied the four conventional clustering algorithms as well as their kernel versions to 10 widely used data sets and compared the performances of the algorithms. The data sets employed were the BENSAID (49 data/3 clusters) [5], DUNN (90 data/2 clusters) [5], IRIS (150 data/3 clusters) [5], ECOLI (336 data/7 clusters), CIRCLE, BLE-3, BLE-2, UE-4, UE-3, and ULE-4 data sets. This selection of data sets includes various types of clusters, such as hyperspherical and hyperellipsoidal, and balanced and unbalanced types (Fig. 1). The parameters used in the k-means and fuzzy c-means algorithms were a termination criterion of \epsilon = 0.001 and a weighting exponent of m = 2.0 [5]. The initial centroids were uniformly distributed across the data set [5]. The parameters used for the mountain algorithm were \alpha = 5.4 and \beta = 1.5. The RBF kernel function was used in all four kernel clustering algorithms due to its superiority over other kernel functions [2].

The clustering accuracy achieved by each clustering algorithm for each of the 10 data sets is listed in Table 1. On the whole, the conventional k-means and fuzzy c-means algorithms showed similar performances for each data set, with average accuracies of 73.60% and 74.39%, respectively. In comparison to these two algorithms, the average linkage algorithm showed better clustering performance (average 83.33%); in particular, it achieved 100.0% accuracy for four of the ten data sets. In agreement with previous works, the present results, particularly those for the unbalanced and ellipsoidal data sets (e.g., BENSAID or BLE-2), show that the average linkage algorithm can handle a greater range of cluster shapes than the k-means-type algorithms. However, it showed substantially different behavior when applied to similar data sets; for example, it achieved accuracies of 56.0% and 100.0% for the BLE-3 and BLE-2 data sets, respectively, and accuracies of 71.45% and 100.0% for the UE-4 and UE-3 data sets. Such discrepancies arise because the average linkage algorithm is sensitive to the order in which data are presented.
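For illustration, the feature-space average-linkage merging based on Eqs. (7) and (8) can be sketched in Python. This naive O(n^3) greedy version is an assumption-laden sketch, not the authors' implementation; the function name and merge loop are ours.

```python
import numpy as np

def kernel_avg_linkage(K, n_clusters):
    """Agglomerative clustering with the feature-space average-linkage
    distance of Eq. (7); all distances come from the kernel via Eq. (8)."""
    n = K.shape[0]
    clusters = [[i] for i in range(n)]
    # pairwise feature-space squared distances, Eq. (8)
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2.0 * K
    while len(clusters) > n_clusters:
        best, pair = np.inf, None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average pairwise distance between clusters a and b, Eq. (7)
                avg = d2[np.ix_(clusters[a], clusters[b])].mean()
                if avg < best:
                    best, pair = avg, (a, b)
        a, b = pair
        clusters[a] += clusters[b]   # merge the closest pair
        del clusters[b]
    labels = np.empty(n, dtype=int)
    for ci, members in enumerate(clusters):
        labels[members] = ci
    return labels
```

A production version would update a linkage matrix incrementally rather than rescanning all pairs, but the sketch makes the role of the kernel explicit: the input-space coordinates are never used after K is formed.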
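The subtractive kernel mountain procedure of Eq. (9), with the destruction step used to pick subsequent centroids, can likewise be sketched. The parameter names `alpha` and `beta` correspond to the constants set to 5.4 and 1.5 in Section 3; the code itself is an illustrative sketch rather than the paper's implementation.

```python
import numpy as np

def kernel_mountain(K, n_centroids, alpha=5.4, beta=1.5):
    """Subtractive mountain method in feature space: mountain values per
    Eq. (9) are computed on the data points themselves, and each chosen
    centroid's mountain is subtracted before selecting the next one."""
    # pairwise feature-space squared distances, Eq. (8)
    d2 = np.diag(K)[:, None] + np.diag(K)[None, :] - 2.0 * K
    M = np.exp(-alpha * d2).sum(axis=1)      # mountain value at each point
    centroids = []
    for _ in range(n_centroids):
        c = int(np.argmax(M))                # highest remaining potential
        centroids.append(c)
        # destroy the mountain around the chosen centroid
        M = M - M[c] * np.exp(-beta * d2[:, c])
    return centroids
```

Because the mountain is evaluated on data points instead of a grid, the cost is O(n^2) rather than exponential in the dimensionality of the grid, which is the motivation for the subtractive variant.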
Fig. 1. Two- and three-dimensional data sets used in our evaluation: (a) CIRCLE, (b) BLE-3, (c) BLE-2, (d) UE-4, (e) UE-3, (f) ULE-4.
Table 1
Clustering accuracy (%) achieved by each clustering algorithm for 10 data sets (per-data-set rows lost in extraction; column averages shown)

Avg. (%): k-means 73.60; fuzzy c-means 74.39; average linkage 83.33; mountain 70.52; kernel k-means 89.27; kernel fuzzy c-means 89.73; kernel average linkage 94.95; kernel mountain 87.37
Compared to the other algorithms, the mountain algorithm exhibited more unstable performance; the average accuracy of the conventional mountain algorithm was 70.52%.

The clustering results obtained using the kernel clustering algorithms are listed on the right side of Table 1. It is evident that the kernel clustering algorithms give markedly better performance than the conventional algorithms. On average, the kernel k-means and kernel fuzzy c-means algorithms were about 15% more accurate than their conventional counterparts, and the kernel average linkage and kernel mountain algorithms were approximately 12% and 17% more accurate than the conventional algorithms, respectively. The kernel k-means and kernel fuzzy c-means algorithms gave significantly better clustering performance for data sets on which low performance has previously been reported, such as CIRCLE, BLE-2, UE-4, and ULE-4 [5]. In addition, for the CIRCLE and IRIS data sets, which contain ring-shaped clusters and overlapping clusters, respectively, these algorithms gave better clustering results than the kernel average linkage algorithm.
The kernel average linkage algorithm successfully classified seven of the 10 data sets, giving accuracies of 100.0%. Of particular note are the results for the BLE-3, UE-4, and ULE-4 data sets, for which the conventional average linkage algorithm gave accuracies of 50–70% but the kernel version classified with 100% accuracy. In terms of the total accuracy, the kernel average linkage algorithm was the most accurate clustering algorithm (94.95%). The kernel mountain algorithm gave the greatest enhancement of clustering performance relative to its conventional counterpart (17% improvement), with accuracies of more than 90% for six data sets. Notably, the ranking of the conventional clustering algorithms in terms of overall accuracy was preserved in their kernel versions: the kernel average linkage algorithm was the most accurate and the kernel mountain algorithm the least accurate.

4. Conclusions

Compared to the corresponding conventional clustering algorithms, the kernel clustering algorithms showed better clustering results for almost all data sets. The kernel k-means algorithm was significantly more accurate than its conventional counterpart, particularly when applied to data sets on which low performance has been reported to date. The kernel fuzzy c-means algorithm achieved >90% accuracy for eight of the 10 data sets. Overall, the kernel average linkage algorithm gave the most accurate clustering results. The kernelization of the mountain algorithm improved the clustering accuracy to a degree similar to that achieved for the other algorithms. The kernel average linkage algorithm is the most appropriate to use when no prior knowledge of the characteristics of the data set is available.

Acknowledgements

This work was supported by the Korean Systems Biology Research Grant (M1-0309-02-0002) from the Ministry of Science and Technology. We would like to thank the Chung Moon Soul Center for BioInformation and BioElectronics and the IBM SUR program for providing research and computing facilities.

References

[1] K.-R. Müller, et al., An introduction to kernel-based learning algorithms, IEEE Trans. Neural Networks 12 (2) (2001) 181–202.
[2] M. Girolami, Mercer kernel-based clustering in feature space, IEEE Trans. Neural Networks 13 (3) (2002) 780–784.
[3] R. Zhang, A.I. Rudnicky, A large scale clustering scheme for kernel k-means, in: The 16th International Conference on Pattern Recognition, 2002, pp. 289–292.
[4] Z.-d. Wu, W.-x. Xie, Fuzzy c-means clustering algorithm based on kernel method, in: The Fifth International Conference on Computational Intelligence and Multimedia Applications, 2003, pp. 1–6.
[5] J.C. Bezdek, et al., Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer Academic Publishers, Boston, 1999.