
Pattern Recognition 38 (2005) 607 – 611

www.elsevier.com/locate/patcog

Rapid and brief communication


Evaluation of the performance of clustering algorithms in
kernel-induced feature space
Dae-Won Kim^a,∗, Ki Young Lee^b, Doheon Lee^a, Kwang H. Lee^a,b
^a Department of BioSystems and Advanced Information Technology Research Center, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, 305-701, Daejeon, Republic of Korea
^b Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology, Guseong-dong, Yuseong-gu, 305-701, Daejeon, Republic of Korea

Received 7 September 2004; accepted 15 September 2004

Abstract
By using a kernel function, data that are not easily separable in the original space can be clustered into homogeneous
groups in the implicitly transformed high-dimensional feature space. Kernel k-means algorithms have recently been shown
to perform better than conventional k-means algorithms in unsupervised classification. However, few reports have examined
the benefits of using a kernel function and the relative merits of the various kernel clustering algorithms with regard to the
data distribution. In this study, we reformulated four representative clustering algorithms based on a kernel function and
evaluated their performances for various data sets. The results indicate that each kernel clustering algorithm gives markedly
better performance than its conventional counterpart for almost all data sets. Of the kernel clustering algorithms studied in
the present work, the kernel average linkage algorithm gives the most accurate clustering results.
© 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
Keywords: Clustering; Kernel; k-means; Fuzzy c-means; Average linkage; Mountain algorithm

1. Introduction

Clustering has emerged as a popular technique for pattern recognition, image processing, and data mining. Kernel-based classification in the feature space not only preserves the inherent structure of groups in the input space, but also simplifies the associated structure of the data [1]. Since Girolami first developed the kernel k-means clustering algorithm for unsupervised classification [2], several studies have demonstrated the superiority of kernel clustering algorithms over other approaches to clustering [3,4].

Users of kernel clustering methods are often left wondering to what extent kernel clustering algorithms are superior to conventional algorithms with regard to the data distribution, and which clustering algorithm is the most improved by reformulation in the kernel-induced feature space. In this paper, we evaluate the performance of kernel clustering algorithms with a view to providing answers to these questions. To our knowledge, this is the first such comparison of kernel clustering algorithms for general-purpose clustering. We consider four well-known clustering algorithms: the k-means, fuzzy c-means, average linkage, and mountain algorithms. We compare the performances of these four algorithms with those of the kernel k-means algorithm, the kernel fuzzy c-means algorithm, and formulations of the average linkage and mountain algorithms based on a kernel function. These comparisons are made over a variety of data sets.

∗ Corresponding author. Tel.: +82 42 869 4353; fax: +82 42 869 8680.
E-mail address: dwkim@bisl.kaist.ac.kr (D.-W. Kim).

0031-3203/$30.00 © 2004 Pattern Recognition Society. Published by Elsevier Ltd. All rights reserved.
doi:10.1016/j.patcog.2004.09.006

2. Kernel clustering algorithms

Given an unlabeled data set X = {x_1, ..., x_n} in the d-dimensional space R^d, let Φ : R^d → H be a non-linear mapping function from this input space to a high-dimensional feature space H. By applying the non-linear mapping function Φ, the dot product x_i · x_j in the input space is mapped to Φ(x_i) · Φ(x_j) in the feature space. The key notion in kernel-based learning is that the mapping function Φ need not be explicitly specified: the dot product Φ(x_i) · Φ(x_j) in the high-dimensional feature space can be calculated through the kernel function K(x_i, x_j) in the input space R^d.
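For illustration (this sketch is ours, not part of the original paper), the kernel trick reduces every feature-space computation below to lookups in an n × n Gram matrix. A minimal NumPy sketch, assuming the common RBF form K(x_i, x_j) = exp(−γ‖x_i − x_j‖²) with an assumed width parameter γ:

import numpy as np

def rbf_kernel_matrix(X, gamma=1.0):
    # Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2). Every
    # feature-space quantity below is expressed through this matrix,
    # so the mapping Phi never has to be computed explicitly.
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] - 2.0 * X @ X.T + sq[None, :]
    return np.exp(-gamma * np.maximum(d2, 0.0))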

2.1. Kernel k-means algorithm

Given an unlabeled data set X and a mapping Φ : R^d → H, the k-means algorithm in the high-dimensional feature space iteratively searches for k clusters by minimizing the function J [2,3]:

J(X) = \sum_{i=1}^{k} \sum_{j=1}^{n} \|\Phi(x_j) - v_i\|^2,    (1)

where the ith cluster centroid is v_i = n_i^{-1} \sum_{j=1}^{n} z_{ij} \Phi(x_j) and n_i = \sum_{j=1}^{n} z_{ij}. Here z_{ij} indicates whether data point x_j belongs to the ith cluster: z_{ij} = 1 if it belongs to the ith cluster and 0 otherwise. The key notion in the kernel k-means algorithm lies in the calculation of the distance in the feature space. The distance between Φ(x_j) and v_i in the feature space is calculated through the kernel in the input space:

\|\Phi(x_j) - v_i\|^2 = \Phi(x_j) \cdot \Phi(x_j) - \frac{2}{n_i} \sum_{k=1}^{n} z_{ik} \, \Phi(x_j) \cdot \Phi(x_k) + \frac{1}{n_i^2} \sum_{k=1}^{n} \sum_{l=1}^{n} z_{ik} z_{il} \, \Phi(x_k) \cdot \Phi(x_l)
                      = K(x_j, x_j) - \frac{2}{n_i} \sum_{k=1}^{n} z_{ik} K(x_k, x_j) + \frac{1}{n_i^2} \sum_{k=1}^{n} \sum_{l=1}^{n} z_{ik} z_{il} K(x_k, x_l).    (2)

Therefore, the objective function can be rewritten as

J(X) = \sum_{i=1}^{k} \sum_{j=1}^{n} \left[ K(x_j, x_j) - \frac{2}{n_i} \sum_{k=1}^{n} z_{ik} K(x_k, x_j) + \frac{1}{n_i^2} \sum_{k=1}^{n} \sum_{l=1}^{n} z_{ik} z_{il} K(x_k, x_l) \right],    (3)

and the process of updating clusters is repeated until there is no significant improvement in J between consecutive iterations. The kernel k-means algorithm lacks the step in which cluster centroids are explicitly updated before each data point is reassigned to its closest cluster: the reassignment can be made without calculating the centroids, owing to the implicit mapping via the kernel function in Eq. (2).
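As a concrete illustration of Eqs. (2) and (3), one assignment sweep of kernel k-means can be written entirely in terms of the Gram matrix. This is a minimal sketch of ours, not the authors' implementation; the random initialization and the empty-cluster guard are assumptions:

import numpy as np

def kernel_kmeans(K, k, n_iter=100, seed=0):
    # Kernel k-means: reassign points using the distances of Eq. (2),
    # computed from the Gram matrix K alone (no explicit centroids).
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, k, size=n)  # random initial partition (assumed)
    for _ in range(n_iter):
        dist = np.empty((n, k))
        for i in range(k):
            members = labels == i
            n_i = max(int(members.sum()), 1)  # guard against empty clusters
            # Eq. (2): ||Phi(x_j) - v_i||^2 through K alone
            dist[:, i] = (np.diag(K)
                          - 2.0 * K[:, members].sum(axis=1) / n_i
                          + K[np.ix_(members, members)].sum() / n_i ** 2)
        new_labels = dist.argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # no change between consecutive iterations
        labels = new_labels
    return labels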

2.2. Kernel fuzzy c-means algorithm

The kernel fuzzy c-means algorithm in the feature space induced by a mapping Φ minimizes the function J_m [4]:

J_m(X) = \sum_{i=1}^{c} \sum_{j=1}^{n} (\mu_{ij})^m \|\Phi(x_j) - v_i\|^2,    (4)

where μ_ij is the membership degree of data point x_j to the ith fuzzy cluster, and m is a fuzziness coefficient. The ith cluster centroid is v_i = \sum_{j=1}^{n} (\mu_{ij})^m \Phi(x_j) / \sum_{j=1}^{n} (\mu_{ij})^m. The k-means algorithm repeatedly updates the k clusters at each successive iteration, whereas the fuzzy c-means algorithm iteratively updates the membership degrees μ_ij at each iteration. The update of μ_ij in the feature space is defined through the kernel in the input space as follows:

\mu_{ij} = \left[ \sum_{k=1}^{c} \left( \frac{\|\Phi(x_j) - v_i\|^2}{\|\Phi(x_j) - v_k\|^2} \right)^{1/(m-1)} \right]^{-1},    (5)

where

\|\Phi(x_j) - v_i\|^2 = K(x_j, x_j) - \frac{2 \sum_{k=1}^{n} (\mu_{ik})^m K(x_k, x_j)}{\sum_{k=1}^{n} (\mu_{ik})^m} + \frac{\sum_{k=1}^{n} \sum_{l=1}^{n} (\mu_{ik})^m (\mu_{il})^m K(x_k, x_l)}{\left( \sum_{k=1}^{n} (\mu_{ik})^m \right)^2}.    (6)

Similar to the kernel k-means algorithm, the kernel fuzzy c-means algorithm does not need to calculate the cluster centroids, because the centroid information is already incorporated in the update of the membership degrees μ_ij.
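A vectorized sketch of the update in Eqs. (5) and (6) follows. It is ours, not the authors' code; the random membership initialization and the 1e-3 stopping threshold are assumptions, not the paper's settings:

import numpy as np

def kernel_fcm(K, c, m=2.0, n_iter=100, eps=1e-9, seed=0):
    # Kernel fuzzy c-means: iterate the membership update of Eqs. (5)-(6).
    n = K.shape[0]
    rng = np.random.default_rng(seed)
    U = rng.random((c, n))
    U /= U.sum(axis=0)                  # memberships of each point sum to 1
    for _ in range(n_iter):
        W = U ** m                      # (mu_ik)^m, shape (c, n)
        s = W.sum(axis=1)               # per-cluster sum of (mu_ik)^m
        # Eq. (6): ||Phi(x_j) - v_i||^2 through K alone, shape (c, n)
        d2 = (np.diag(K)[None, :]
              - 2.0 * (W @ K) / s[:, None]
              + np.einsum('ik,kl,il->i', W, K, W)[:, None] / (s ** 2)[:, None])
        d2 = np.maximum(d2, eps)
        # Eq. (5): membership update from the distance ratios
        U_new = 1.0 / d2 ** (1.0 / (m - 1.0))
        U_new /= U_new.sum(axis=0)
        if np.abs(U_new - U).max() < 1e-3:  # assumed termination threshold
            U = U_new
            break
        U = U_new
    return U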

2.3. Kernel average linkage algorithm

Compared to k-means-type algorithms, the average linkage algorithm is more flexible with regard to cluster shape, but it has much greater time and space complexities. Based on the average distance between two clusters F_n = {x_1, ..., x_n} and F_m = {y_1, ..., y_m}, the algorithm iteratively merges the two closest clusters until a single cluster is obtained. The kernelization of the average linkage algorithm is simpler and more intuitive than those of the k-means-type algorithms. Given a mapping Φ, the distance between two clusters F_n = {Φ(x_1), ..., Φ(x_n)} and F_m = {Φ(y_1), ..., Φ(y_m)} in the feature space is calculated as

\mathrm{Avg}(F_n, F_m) = \frac{1}{nm} \sum_{i=1}^{n} \sum_{j=1}^{m} \|\Phi(x_i) - \Phi(y_j)\|^2,    (7)

where

\|\Phi(x_i) - \Phi(x_j)\|^2 = \Phi(x_i) \cdot \Phi(x_i) - 2\,\Phi(x_i) \cdot \Phi(x_j) + \Phi(x_j) \cdot \Phi(x_j)
                            = K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j).    (8)

The iterative merging procedure in the feature space continues until all data points have been merged into a single cluster or the number of merged groups reaches a prespecified number of clusters k.
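A sketch of the kernel average linkage procedure (ours; it delegates the merging to SciPy's standard average-linkage routine, fed with the squared feature-space distances of Eq. (8) so that the cluster-to-cluster distances it averages are exactly those of Eq. (7)):

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

def kernel_average_linkage(K, k):
    # Average linkage in feature space: merge on the mean of the
    # squared kernel-induced distances of Eq. (8), per Eq. (7).
    diag = np.diag(K)
    d2 = diag[:, None] - 2.0 * K + diag[None, :]  # Eq. (8)
    np.fill_diagonal(d2, 0.0)
    condensed = squareform(np.maximum(d2, 0.0), checks=False)
    Z = linkage(condensed, method='average')      # merges closest pair
    return fcluster(Z, t=k, criterion='maxclust') # stop at k groups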
2.4. Kernel mountain algorithm

The mountain algorithm estimates the cluster centroids by constructing and destroying a mountain function on a grid space. The mountain function indicates the potential of each grid point to be a cluster centroid. To reduce the computational complexity of the original algorithm, we employed the subtractive mountain algorithm, in which the mountain function is calculated on the data points rather than on grid points.

Given a mapping Φ, the mountain function at a data point Φ(x_i) in the feature space is defined as

M(\Phi(x_i)) = \sum_{j=1}^{n} e^{-\alpha \|\Phi(x_i) - \Phi(x_j)\|^2},    (9)

where \|\Phi(x_i) - \Phi(x_j)\|^2 = K(x_i, x_i) - 2K(x_i, x_j) + K(x_j, x_j). A higher value of M(Φ(x_i)) indicates that Φ(x_i) has more data points Φ(x_j) near to it in the feature space. After calculating the mountain values, the data point whose mountain value is M_1^* = \max_i [M(\Phi(x_i))] is selected as the first cluster centroid. Subsequent centroids are found using the following modified mountain function:

M^j(\Phi(x_i)) = M^{j-1}(\Phi(x_i)) - M_{j-1}^* \, e^{-\beta \|\Phi(x_i) - \Phi(x_{j-1}^*)\|^2}    (10)
              = M^{j-1}(\Phi(x_i)) - M_{j-1}^* \, e^{-\beta \left( K(x_i, x_i) - 2K(x_i, x_{j-1}^*) + K(x_{j-1}^*, x_{j-1}^*) \right)},    (11)

where M^j is the new mountain function, M^{j-1} is the old mountain function, M_{j-1}^* is the maximum value of M^{j-1}, and x_{j-1}^* is the newly found centroid.
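A minimal sketch of Eqs. (9)-(11) (ours, not the authors' code; it returns the indices of the data points selected as centroids, and the defaults mirror the α and β values reported in Section 3):

import numpy as np

def kernel_mountain(K, k, alpha=5.4, beta=1.5):
    # Subtractive mountain algorithm in feature space (Eqs. (9)-(11)).
    diag = np.diag(K)
    d2 = diag[:, None] - 2.0 * K + diag[None, :]  # squared distances, Eq. (8)
    M = np.exp(-alpha * d2).sum(axis=1)           # Eq. (9): mountain values
    centroids = []
    for _ in range(k):
        c = int(M.argmax())                       # current peak M*
        centroids.append(c)
        # Eqs. (10)-(11): subtract the discovered peak's influence,
        # which zeroes the peak itself and prevents its reselection.
        M = M - M[c] * np.exp(-beta * d2[:, c])
    return centroids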
3. Experimental results

To test the various kernel clustering algorithms, we applied the four conventional clustering algorithms as well as their kernel versions to 10 widely used data sets and compared the performances of the algorithms. The data sets employed were the BENSAID (49 data/3 clusters) [5], DUNN (90 data/2 clusters) [5], IRIS (150 data/3 clusters) [5], ECOLI (336 data/7 clusters), CIRCLE, BLE-3, BLE-2, UE-4, UE-3, and ULE-4 data sets. This selection includes various types of clusters, such as hyperspherical and hyperellipsoidal, and balanced and unbalanced types (Fig. 1). The parameters used in the k-means and fuzzy c-means algorithms were a termination criterion of ε = 0.001 and a weighting exponent of m = 2.0 [5]. The initial centroids were uniformly distributed across the data set [5]. The parameters used for the mountain algorithm were α = 5.4 and β = 1.5. The RBF kernel function was used in all four kernel clustering algorithms because of its superiority over other kernel functions [2].
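To make the setup concrete, a hypothetical driver (ours) might wire the sketches above together. The two-ring data below are a synthetic stand-in for a CIRCLE-like set, not the actual data files, and γ = 1.0 is an assumed kernel width:

import numpy as np

# Synthetic stand-in for a CIRCLE-like data set: two concentric rings.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
radius = np.repeat([1.0, 3.0], 100)
X = np.c_[radius * np.cos(theta), radius * np.sin(theta)]
X += rng.normal(scale=0.1, size=X.shape)

K = rbf_kernel_matrix(X, gamma=1.0)   # RBF kernel, as in our experiments
labels = kernel_kmeans(K, k=2)        # hard partition in feature space
U = kernel_fcm(K, c=2, m=2.0)         # fuzzy memberships, m = 2.0 as above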

Fig. 1. Two- and three-dimensional data sets used in our evaluation: (a) CIRCLE, (b) BLE-3, (c) BLE-2, (d) UE-4, (e) UE-3, (f) ULE-4.

Table 1
Clustering accuracy (%) achieved by each clustering algorithm for the 10 data sets

            Conventional                            Kernel
Data set    k-means   FCM     Average   Mountain    k-means   FCM     Average   Mountain

BENSAID     79.59     73.47   100.0     85.71       83.67     93.88   100.0     100.0
DUNN        70.00     70.00   100.0     83.33       71.11     95.56   100.0     100.0
IRIS        89.33     89.33   90.67     52.67       96.00     93.33   89.33     93.33
ECOLI       42.86     49.11   76.49     51.19       68.75     61.01   77.38     69.05
CIRCLE      50.76     52.79   62.44     55.84       100.0     93.40   82.74     62.94
BLE-3       65.67     65.67   56.00     70.33       76.33     74.67   100.0     71.67
BLE-2       88.50     87.75   100.0     85.25       100.0     94.00   100.0     100.0
UE-4        77.25     66.00   71.45     73.50       100.0     98.50   100.0     84.75
UE-3        95.83     95.00   100.0     51.17       98.83     96.67   100.0     95.67
ULE-4       76.25     94.75   76.25     96.25       98.00     96.25   100.0     96.25

Avg. (%)    73.60     74.39   83.33     70.52       89.27     89.73   94.95     87.37

The clustering accuracy achieved by each clustering algorithm for each of the 10 data sets is listed in Table 1. On the whole, the conventional k-means and fuzzy c-means algorithms showed similar performances for each data set, with average accuracies of 73.60% and 74.39%, respectively. In comparison to these two algorithms, the average linkage algorithm showed better clustering performance (average 83.33%); in particular, it achieved 100.0% accuracy for four of the ten data sets. In agreement with previous works, the present results, particularly those for the unbalanced and ellipsoidal data sets (e.g., BENSAID and BLE-2), show that the average linkage algorithm can handle a greater range of cluster shapes than the k-means-type algorithms. However, it showed substantially different behavior when applied to similar data sets; for example, it achieved accuracies of 56.0% and 100.0% for the BLE-3 and BLE-2 data sets, respectively, and accuracies of 71.45% and 100.0% for the UE-4 and UE-3 data sets. Such discrepancies arise because the average linkage algorithm is sensitive to the order in which data are presented. Compared to the other algorithms, the mountain algorithm exhibited more unstable performance; the average accuracy of the conventional mountain algorithm was 70.52%.

The clustering results obtained using the kernel clustering algorithms are listed on the right side of Table 1. It is evident that the kernel clustering algorithms give markedly better performance than the conventional algorithms. On average, the kernel k-means and kernel fuzzy c-means algorithms were about 15% more accurate than their conventional counterparts, and the kernel average linkage and kernel mountain algorithms were approximately 12% and 17% more accurate than their conventional counterparts, respectively. The kernel k-means and kernel fuzzy c-means algorithms performed significantly better on data sets for which low accuracies had previously been reported, such as CIRCLE, BLE-2, UE-4, and ULE-4 [5]. In addition, for the CIRCLE and IRIS data sets, which contain ring-shaped clusters and overlapping clusters, respectively, these two algorithms gave better clustering results than the kernel average linkage algorithm. The kernel average linkage algorithm successfully classified seven of the 10 data sets, giving accuracies of 100.0%. Of particular note are the results for the BLE-3, UE-4, and ULE-4 data sets, for which the conventional average linkage algorithm gave accuracies of 50-70% but the kernel version classified with 100% accuracy. In terms of total accuracy, the kernel average linkage algorithm was the most accurate clustering algorithm (94.95%). The kernelization of the mountain algorithm gave the greatest enhancement of clustering performance (a 17% improvement), with accuracies of more than 90% for six data sets. Notably, the ranking of the conventional clustering algorithms in terms of overall accuracy was preserved in their kernel versions: the kernel average linkage algorithm was the most accurate and the kernel mountain algorithm the least accurate.

4. Conclusions

Compared to the corresponding conventional clustering algorithms, the kernel clustering algorithms showed better clustering results for almost all data sets. The kernel k-means algorithm was significantly more accurate than its conventional counterpart, particularly when applied to data sets for which low accuracies had previously been reported. The kernel fuzzy c-means algorithm achieved >90% accuracy for eight of the 10 data sets. Overall, the kernel average linkage algorithm gave the most accurate clustering results. The kernelization of the mountain algorithm improved the clustering accuracy to a degree similar to that achieved for the other algorithms. The kernel average linkage algorithm is the most appropriate to use when no prior knowledge of the characteristics of the data set is available.

Acknowledgements

This work was supported by the Korean Systems Biology Research Grant (M1-0309-02-0002) from the Ministry of Science and Technology. We would like to thank the Chung Moon Soul Center for BioInformation and BioElectronics and the IBM SUR program for providing research and computing facilities.

References

[1] K.-R. Müller, et al., An introduction to kernel-based learning algorithms, IEEE Trans. Neural Networks 12 (2) (2001) 181-202.
[2] M. Girolami, Mercer kernel-based clustering in feature space, IEEE Trans. Neural Networks 13 (3) (2002) 780-784.
[3] R. Zhang, A.I. Rudnicky, A large scale clustering scheme for kernel k-means, in: The 16th International Conference on Pattern Recognition, 2002, pp. 289-292.
[4] Z.-d. Wu, W.-x. Xie, Fuzzy c-means clustering algorithm based on kernel method, in: The Fifth International Conference on Computational Intelligence and Multimedia Applications, 2003, pp. 1-6.
[5] J.C. Bezdek, et al., Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, Kluwer Academic Publishers, Boston, 1999.
