A Fuzzy K-Means Clustering Algorithm Using Cluster Center Displacement
In this paper, we present a fuzzy k-means clustering algorithm that uses the cluster center displacement between successive iterations to reduce the computational complexity of the conventional fuzzy k-means clustering algorithm. The proposed method, referred to as CDFKM, first classifies cluster centers into active and stable groups, and skips the distance calculations for stable clusters in the iterative process. To speed up the convergence of CDFKM, we also present an algorithm to determine its initial cluster centers. Compared to the conventional fuzzy k-means clustering algorithm, our proposed method can reduce the computing time by a factor of 3.2 to 6.5 using data sets generated from the Gauss Markov sequence, and it can reduce the number of distance calculations by 38.9% to 86.5% on the same data sets.
1. INTRODUCTION
have few candidates to be selected as their closest centers. Lai et al. [15] exploited this
characteristic to develop a fast k-means clustering algorithm to reduce the computational
complexity of k-means clustering.
To reduce the computational complexity of FKM, Shankar and Pal used multistage
random sampling to reduce the data size [16]. This method reduced the computational
complexity by a factor of 2 to 4. Cannon, Dave, and Bezdek used look-up tables for storing distances to approximate fuzzy k-means clustering and reduced the computing time by a factor of about 6 [17]. It is noted that this method is applicable only to integer-valued data in the range of 0 to 255, with cluster center coordinates accurate to only 0.1.
Höppner developed an approximate FKM to reduce the computational complexity of
conventional FKM [18]. This method gave the same membership as that of conventional
FKM within a given precision and reduced the computing time of conventional FKM by
a factor of 2 to 4. It is noted that none of the above methods can obtain the same clustering result as that of conventional FKM. After some iterations of FKM, it is expected that many of the centers have converged to their final positions and many distance calculations
can be avoided at each partition step. This characteristic is exploited to reduce the com-
putational complexity of fuzzy k-means clustering.
In this paper, two algorithms are presented to reduce the computing time of fuzzy
k-means clustering. These two algorithms classify cluster centers (representatives) into
stable and active groups and the distance calculations are executed only for those active
cluster representatives during the iterative process. This paper is organized as follows.
Section 2 describes the fuzzy k-means clustering algorithm. Section 3 presents the algo-
rithms developed in this paper. Some theoretical analyses of the presented algorithms are
also shown in section 3. Experimental results are presented in section 4 and concluding
remarks are given in section 5.
2. FUZZY K-MEANS CLUSTERING

The fuzzy k-means clustering algorithm partitions data points into k clusters Sl (l = 1, 2, …, k), and each cluster Sl is associated with a representative (cluster center) Cl. The relationship between a data point and a cluster representative is fuzzy: a membership ui,j ∈ [0, 1] is used to represent the degree of belongingness of data point Xi to cluster center Cj. Denote the set of data points as S = {Xi}. The FKM algorithm is based on minimizing the following distortion:
J = \sum_{j=1}^{k} \sum_{i=1}^{N} (u_{i,j})^m d_{ij}    (1)
with respect to the cluster representatives Cj and memberships ui,j, where N is the number
of data points; m is the fuzzifier parameter; k is the number of clusters; and dij is the
squared Euclidean distance between data point Xi and cluster representative Cj. It is
noted that ui,j should satisfy the following constraint:
\sum_{j=1}^{k} u_{i,j} = 1,  for i = 1 to N.    (2)
The major process of FKM is mapping a given set of representative vectors into an
improved one through partitioning data points. It begins with a set of initial cluster cen-
ters and repeats this mapping process until a stopping criterion is satisfied. It is supposed
that no two clusters have the same cluster representative. In the case that two cluster cen-
ters coincide, a cluster center should be perturbed to avoid coincidence in the iterative
process. If dij < η, then ui,j = 1 and ui,l = 0 for l ≠ j, where η is a very small positive num-
ber. The fuzzy k-means clustering algorithm is now presented as follows.
(1) Input a set of initial cluster centers SC0 = {Cj(0)} and the value of ε. Set p = 1.
(2) Given the set of cluster centers SCp, compute dij for i = 1 to N and j = 1 to k. Update
memberships ui,j using the following equation:
u_{i,j} = \left[ (d_{ij})^{1/(m-1)} \sum_{l=1}^{k} (1/d_{il})^{1/(m-1)} \right]^{-1}.    (3)

(3) Compute the new cluster centers Cj(p) from the updated memberships:

C_j(p) = \sum_{i=1}^{N} (u_{i,j})^m X_i \Big/ \sum_{i=1}^{N} (u_{i,j})^m.    (4)
(4) If ||Cj(p) − Cj(p − 1)|| < ε for j = 1 to k, then stop, where ε > 0 is a very small positive
number. Otherwise set p + 1 → p and go to step 2.
The major computational complexity of FKM is from steps 2 and 3. However, the
computational complexity of step 3 is much less than that of step 2. Therefore the com-
putational complexity, in terms of the number of distance calculations, of FKM is O(Nkt),
where t is the number of iterations.
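To make steps 1 to 4 concrete, the following is a minimal NumPy sketch of FKM; the function name, the random initialization, and the default parameter values are illustrative assumptions of this sketch, not part of the paper.

```python
import numpy as np

def fkm(X, k, m=2.0, eps=1e-5, eta=1e-5, max_iter=100, seed=None):
    """Conventional fuzzy k-means on X of shape (N, d); returns centers and memberships."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    C = X[rng.choice(N, size=k, replace=False)]              # step 1: initial centers SC0
    U = np.zeros((N, k))
    for _ in range(max_iter):
        C_prev = C.copy()
        D = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # step 2: squared distances d_ij
        U[:] = 0.0
        near = D.min(axis=1) < eta                           # d_ij < eta: crisp membership u_ij = 1
        U[near, D[near].argmin(axis=1)] = 1.0
        inv = D[~near] ** (-1.0 / (m - 1.0))                 # Eq. (3): u_ij proportional to d_ij^(-1/(m-1))
        U[~near] = inv / inv.sum(axis=1, keepdims=True)
        W = U ** m                                           # step 3: center update, Eq. (4)
        C = (W.T @ X) / W.sum(axis=0)[:, None]
        if np.linalg.norm(C - C_prev, axis=1).max() < eps:   # step 4: all displacements below eps
            break
    return C, U
```

With m = 2, one pass of step 2 performs the Nk distance calculations that dominate the O(Nkt) cost discussed above.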
3. PROPOSED METHODS
In the iterative process of fuzzy k-means clustering, one may expect that the displacements of some cluster centers will fall below the threshold ε after a few iterations, while others will need many more iterations to stabilize, where ε > 0 is a very small positive number. Let the jth cluster centers used in the current and pre-
vious partitions be denoted as Cj and C′j, respectively. Denote the displacement between
Cj and C′j as Dj. That is, Dj = ||Cj − C′j||. If Dj < ε, then the vector Cj is defined as a stable
cluster center; otherwise it is called an active cluster center. The cluster associated with
an active center is called an active cluster. Similarly the cluster having a stable center is
defined as a stable cluster. The number of stable cluster centers increases as the iteration
proceeds [19].
Denote the subsets, which consist of active cluster centers and stable cluster centers
as SCa and SCs, respectively. Let ka,i be the number of clusters in SCa at the ith iteration
of fuzzy k-means clustering. The value of ka,i decreases as the iteration proceeds in gen-
eral. The performance, in terms of computing time, of the proposed method is better, if
ka,i decreases more quickly during the iterative process. The value and decreasing rate of
ka,i depend on data distribution. For a data set with good data separation, ka,i will decrease
quickly. For an evenly distributed data set, ka,i will decrease slowly. For a real data set, a
good data separation is usually obtained. In the worst case, ka,i equals k, which is the
number of clusters. It is noted that centers of clusters in the previous iteration will be
used to partition the set of data points {Xi} in the current iteration. The FKM algorithm
stops, if the displacements of all cluster centers are less than ε. That is, if Dj < ε, then
cluster Sj is a stable cluster and dij (i = 1 to N) will not be recalculated to update ui,j in the
iterative process. The proposed algorithm will use this property to speed up fuzzy k-
means clustering. Now, the fuzzy k-means clustering algorithm using cluster displace-
ment (CDFKM) is presented below.
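The sketch below illustrates the idea in NumPy; the loop structure, parameter defaults, and names are assumptions of this sketch rather than the paper's exact step-by-step listing. Distances are recomputed only for the columns that belong to active clusters.

```python
import numpy as np

def cdfkm(X, C0, m=2.0, eps=1e-5, eta=1e-5, max_iter=100):
    """CDFKM sketch: recompute d_ij only for active clusters (D_j >= eps)."""
    N, _ = X.shape
    C = C0.copy()
    D = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)  # one full distance pass
    for _ in range(max_iter):
        Dc = np.maximum(D, eta)             # guard: the paper assigns u_ij = 1 crisply when d_ij < eta
        inv = Dc ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=1, keepdims=True)             # membership update, Eq. (3)
        W = U ** m
        C_new = (W.T @ X) / W.sum(axis=0)[:, None]           # center update
        disp = np.linalg.norm(C_new - C, axis=1)             # displacements D_j = ||C_j - C'_j||
        C = C_new
        active = disp >= eps                                 # stable centers are skipped
        if not active.any():                                 # every D_j < eps: converged
            break
        Ca = C[active]                                       # centers that are still moving
        D[:, active] = ((X[:, None, :] - Ca[None, :, :]) ** 2).sum(axis=2)
    return C, U
```

Only the distance-table columns of active clusters are refreshed, which is where the O(Nkat) complexity derived below comes from.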
In this subsection, the effect of ε on ui,j will be investigated. Eq. (3) can be rewritten
as
u_{i,j} = \frac{1}{1 + \sum_{l=1, l \neq j}^{k} (d_{ij}/d_{il})^{1/(m-1)}}.    (5a)

That is,

\sum_{l=1, l \neq j}^{k} (d_{ij}/d_{il})^{1/(m-1)} = \frac{1}{u_{i,j}} - 1.    (5b)
Let the squared Euclidean distances between data point Xi and cluster centers Cj(p −
1) and Cj(p) be d'ij and dij, respectively. In the case of ||Cj(p) − Cj(p − 1)|| < ε, d'ij is used
to calculate memberships ui,j. Denote u'i,j as the membership of Xi with respect to Cj(p), if
dij is replaced by d'ij in Eq. (3). Similarly, let ui,j be the membership of Xi with respect to
Cj(p) for the case that dij is used to calculate memberships. For many applications, m = 2
is used [10] and is adopted in this paper. In the case of ||Cj(p) − Cj(p − 1)|| < ε, d'ij can be
estimated by

d'_{ij} \approx d_{ij} \pm O(\varepsilon)(d_{ij})^{1/2}.    (6)
Eq. (6) implies that if ||Cj(p) − Cj(p − 1)|| < ε, then |d'ij − dij|/(dij)^{1/2} ≈ O(||Cj(p) − Cj(p − 1)||). That is, |d'ij − dij|/(dij)^{1/2} is approximately of the order of the displacement of the jth cluster's center. u'i,j is obtained by replacing dij in Eq. (5a) by d'ij given in Eq. (6). That is, for m = 2,

u'_{i,j} \approx \frac{1}{1 + \sum_{l=1, l \neq j}^{k} (d_{ij}/d_{il}) \pm O(\varepsilon)(d_{ij})^{1/2} \sum_{l=1, l \neq j}^{k} (1/d_{il})}
        = \frac{1}{1 + \sum_{l=1, l \neq j}^{k} (d_{ij}/d_{il}) \pm O(\varepsilon)(d_{ij})^{-1/2} \sum_{l=1, l \neq j}^{k} (d_{ij}/d_{il})},    (7a)

\sum_{l=1, l \neq j}^{k} (d_{ij}/d_{il}) = \frac{1}{u_{i,j}} - 1.    (7b)

Substituting the term \sum_{l=1, l \neq j}^{k} (d_{ij}/d_{il}) in Eq. (7a) by (1/u_{i,j} - 1) as shown in Eq. (7b) gives

u'_{i,j} \approx \frac{1}{\frac{1}{u_{i,j}} \pm O(\varepsilon)(d_{ij})^{-1/2} \left( \frac{1}{u_{i,j}} - 1 \right)} \approx u_{i,j} \pm O(\varepsilon)(d_{ij})^{-1/2} \left( u_{i,j} - (u_{i,j})^2 \right).    (8)
That is, |u'i,j − ui,j| ≈ O(ε)(dij)^{−1/2}(ui,j − (ui,j)^2) ≤ O(ε)(dij)^{−1/2}. In the case of dij < η, ui,j = u'i,j = 1 and |u'i,j − ui,j| = 0. Since η is a very small positive number, η << η^{1/2}. In the case of dij ≥ η, one can obtain |u'i,j − ui,j| ≤ O(ε)/η^{1/2}. If ε = η is chosen, |u'i,j − ui,j| << 1 will be obtained due to η << η^{1/2}. In this paper, ε = η = 0.00001
is used.
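As a quick numerical check of this bound (a sketch: the two-center configuration and the data point are arbitrary assumptions), displacing every center by exactly ε changes the memberships by about ε as well:

```python
import numpy as np

def memberships(x, C):
    D = ((x - C) ** 2).sum(axis=1)       # squared Euclidean distances d_ij
    inv = 1.0 / D
    return inv / inv.sum()               # Eq. (5a) with m = 2

eps = 1e-5
x = np.array([0.3, 0.4])                             # one data point
C = np.array([[0.0, 0.0], [1.0, 1.0]])               # two cluster centers
u = memberships(x, C)
u_shift = memberships(x, C + eps / np.sqrt(2))       # every center displaced by norm eps
print(np.abs(u_shift - u).max())                     # about 1e-5, i.e. O(eps) as Eq. (8) predicts
```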
The major computational complexity of CDFKM is from steps 4, 6, and 7. To compute u_{i,j}, the sum \sum_{l=1}^{k} (1/d_{il})^{1/(m-1)} for each data point Xi is first calculated and stored. That is, Nk multiplications and additions are needed to update ui,j at step 6. To calculate distances
between all data points and cluster centers, Nk distance calculations are required. Each
distance calculation requires d multiplications and (2d − 1) additions, where d is the data
dimension. That is, Nkd multiplications and Nk(2d − 1) additions are needed to calculate
distances at step 4. Therefore, it can be concluded that the computational complexities of
steps 6 and 7 are much less than that of step 4. Let ka be the average number of active cluster centers at each stage of iteration for CDFKM, where k_a = (\sum_{i=0}^{t-1} k_{a,i})/t, t is the number of iterations, and ka,i is the number of active clusters at the ith iteration. For a data set with
good data separation, a small value of ka is expected. For an evenly distributed data set,
ka may equal k. For the worst case, ka equals k and the proposed method will make no
improvement over the original FKM. The probability of performing distance calculations at
step 4 is ka/k, where k is the number of clusters. That is, the computational complexity, in
terms of the number of distance calculations, of CDFKM is O(Nkt) × O(ka/k) = O(Nkat),
where t is the number of iterations. Since 1 ≤ ka ≤ k, the computational complexity, in
terms of the number of distance calculations, of CDFKM is upper bounded by O(Nkt).
The computing time and number of iterations for CDFKM increase as the data size increases. The proposed method first uses CDFKM to generate a codebook from subsets of S and then adopts this codebook as the initial codebook for CDFKM to partition the whole data set S into k clusters. This initial approximation helps to reduce the number of iterations for CDFKM. To speed up the convergence of CDFKM, M subsets of the data set S are used to estimate the initial cluster centers. Denote these M subsets as SBl (l = 1 to M). The data size of each SBl is fN, where f < 1 and N is the number of data points in S. It is noted that the data points in SBl are selected randomly from S and SBi ∩ SBj = ∅ (i ≠
j), where ∅ is an empty set. The cluster center estimation algorithm (CCEA) first gener-
ates an initial set of cluster centers SC0, which is obtained by selecting randomly k data
points from SB1, for CDFKM to partition SU, where SU = SB1. This partition process
will generate a set of cluster centers SC1. Setting SU = SU ∪ SBp, where p = 2 to M,
CCEA uses SCp-1 as the initial set of cluster centers for CDFKM to generate SCp using
the data set SU. This process is repeated until the set of cluster centers SCM is obtained.
Finally, CDFKM uses SCM as the initial set of cluster centers to partition the whole data
set S. Now, the cluster center estimation algorithm (CCEA) is presented as follows.
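A minimal Python sketch of CCEA is given below; it builds on the cdfkm sketch above, and the subset bookkeeping and function names are illustrative assumptions rather than the paper's exact listing.

```python
import numpy as np

def ccea(X, k, M, f, seed=None):
    """Estimate initial cluster centers from M disjoint random subsets of size f*N each."""
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    perm = rng.permutation(N)                            # random, disjoint subsets SB_1..SB_M
    size = int(f * N)
    SB = [X[perm[p * size:(p + 1) * size]] for p in range(M)]
    SU = SB[0]                                           # SU = SB_1
    C = SU[rng.choice(len(SU), size=k, replace=False)]   # SC0: k random points of SB_1
    C, _ = cdfkm(SU, C)                                  # partition SU to obtain SC_1
    for p in range(1, M):
        SU = np.vstack([SU, SB[p]])                      # SU = SU union SB_(p+1)
        C, _ = cdfkm(SU, C)                              # SC_(p-1) -> SC_p on the grown SU
    return C                                             # SC_M: initial centers for CDFKM

# MCDFKM: run CDFKM on the whole data set S, starting from the CCEA centers, e.g.
# C0 = ccea(X, k, M=1, f=0.05); C, U = cdfkm(X, C0)
```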
The set SC0 is obtained by selecting randomly k data points from the subset SB1. A
subset of size sN is used by CCEA to obtain an initial set of cluster centers for CDFKM,
where 0 < s < 1. Note that f is a small positive real number. The value of M is chosen to be less than s/f, so that the subsets SB1, …, SBM together contain at most sN points. Using the initial cluster centers determined by CCEA for
CDFKM, the corresponding algorithm is denoted as modified CDFKM (MCDFKM).
4. EXPERIMENTAL RESULTS
Example 2: Data sets are generated with cluster centers evenly distributed over the hypercube [−1, 1]^d.
In this example, each data set consists of 20,000 data points. Fig. 1 shows the aver-
age computing time for FKM, CDFKM, and MCDFKM with M = 1, whereas Fig. 2 gives
Table 1. The average computing time (in seconds) for FKM, CDFKM, and MCDFKM
using a data set generated from three real images.
Method           k = 16    k = 32     k = 64     k = 128
FKM              509.50    1328.92    2925.56    15355.94
CDFKM            468.86    1212.73    2329.64    7920.42
MCDFKM (M = 1)   338.67    894.00     3192.59    7623.59
MCDFKM (M = 2)   349.08    911.80     3434.36    7791.73
MCDFKM (M = 3)   394.97    983.69     3678.59    8404.13
MCDFKM (M = 4)   412.97    1072.31    4069.98    10147.78
MCDFKM (M = 5)   475.51    1138.83    4390.61    10205.95
MCDFKM (M = 6)   530.64    1294.08    5385.81    14342.30
Table 2. The average number of distance calculations per data point for FKM, CDFKM,
and MCDFKM using a data set generated from three real images.
Method           k = 16      k = 32       k = 64       k = 128
FKM              82577024    210768032    462431360    2415968128
CDFKM            65275360    173313824    225074624    398747520
MCDFKM (M = 1)   41237844    98798730     289202130    462964383
MCDFKM (M = 2)   42805470    106209666    301687848    503016639
MCDFKM (M = 3)   47960496    112179816    309157617    525194391
MCDFKM (M = 4)   49906440    118076676    341583657    555601119
MCDFKM (M = 5)   56393328    123518739    360170358    623380608
MCDFKM (M = 6)   62447400    136084641    387006372    669698964
Table 3. The average distortion per data point for FKM, CDFKM, and MCDFKM using
a data set generated from three real images.
Method           k = 16    k = 32    k = 64    k = 128
FKM              828.68    404.38    200.13    99.76
CDFKM            828.68    404.38    200.13    99.76
MCDFKM (M = 1)   828.68    404.38    200.13    99.76
MCDFKM (M = 2)   828.68    404.38    200.13    99.76
MCDFKM (M = 3)   828.69    404.38    200.13    99.77
MCDFKM (M = 4)   828.69    404.38    200.13    99.77
MCDFKM (M = 5)   828.69    404.38    200.13    99.77
MCDFKM (M = 6)   828.69    404.38    200.13    99.77
the average number of distance calculations per data point for FKM, CDFKM, and MCD-
FKM. Fig. 3 shows the average distortion per data point for these three methods. From
Figs. 1 and 2, it can be found that the proposed method MCDFKM with M = 1 outper-
forms FKM in terms of the computing time and number of distance calculations. From
Fig. 1, it can also be found that the computing time of the proposed approach MCD-
FKM with M = 1 grows linearly with data dimension. Compared to FKM, MCDFKM
with M = 1 can reduce the computing time by a factor of 2.6 to 3.1. Fig. 3 shows that
FKM, CDFKM, and MCDFKM with M = 1 can obtain the same clustering result.
Fig. 1. The average computing time for data sets from example 2 with N = 20,000 and k = 40 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Fig. 2. The average number of distance calculations per data point for data sets from example 2 with N = 20,000 and k = 40 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Fig. 3. The average distortion per data point for data sets from example 2 with N = 20,000 and k = 40 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Example 3: Data sets are generated from the Gauss Markov sequence.
In this example, each data set is obtained from the Gauss Markov source. Figs. 4
and 7 present the average computing time with k = 128 and 256, respectively. Figs. 5 and
8 show the average number of distance calculations per data point with k = 128 and 256,
respectively. Figs. 6 and 9 present the average distortion per data point. From these fig-
ures, it can be found that the computing time of MCDFKM with M = 1 increases linearly
with the data dimension d. From Figs. 4, 5, 7, and 8, it can be found that MCDFKM with
M = 1 has the best performance in terms of the computing time and number of distance
calculations for all cases. Compared with FKM, the proposed method MCDFKM with M
= 1 can reduce the computing time by a factor of 3.0 to 6.5. From Figs. 5 and 8, it can be
found that the number of distance calculations of the proposed approach MCDFKM with
M = 1 is independent of data dimension. Figs. 6 and 9 also confirm that the proposed
approaches and FKM can obtain the same clustering result.
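For reference, a Gauss Markov data set of this kind can be generated along the following lines (a sketch: the correlation coefficient a = 0.9 and the unit-variance noise are assumptions, since the source parameters are not listed in this section):

```python
import numpy as np

def gauss_markov_data(N, d, a=0.9, seed=None):
    """First-order Gauss Markov sequence x_t = a*x_(t-1) + w_t, w_t ~ N(0, 1),
    blocked into N vectors of dimension d."""
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(N * d)
    x = np.empty(N * d)
    x[0] = w[0]
    for t in range(1, N * d):
        x[t] = a * x[t - 1] + w[t]      # correlated scalar source
    return x.reshape(N, d)              # consecutive samples form d-dimensional points

X = gauss_markov_data(10000, 16)        # e.g. N = 10,000 and d = 16, as in Fig. 4
```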
Fig. 4. The average computing time (in seconds) versus data dimension (16 to 40) for data sets from example 3 with N = 10,000 and k = 128 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Fig. 5. The average number of distance calculations per data point for data sets from example 3 with N = 10,000 and k = 128 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Fig. 6. The average distortion per data point for data sets from example 3 with N = 10,000 and k = 128 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Fig. 7. The average computing time for data sets from example 3 with N = 10,000 and k = 256 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Fig. 8. The average number of distance calculations per data point for data sets from example 3 with N = 10,000 and k = 256 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Fig. 9. The average distortion per data point for data sets from example 3 with N = 10,000 and k = 256 (curves for FKM, CDFKM, and MCDFKM with M = 1).
Table 4. The average computing time (in seconds) for FKM, CDFKM, and MCDFKM
using the Statlog data set.
Method           k = 16    k = 32    k = 64
FKM              297.64    916.06    1838.70
CDFKM            187.81    371.50    802.89
MCDFKM (M = 1)   143.64    314.49    724.46
To visualize the clustering result, a data set with N = 1000 and d = 2 is generated.
This data set has ten cluster centers distributed over the hypercube [−1, 1]^2. A Gaussian
distribution with standard deviation = 0.15 along each coordinate is used to generate data
points around each center. This data set is divided into 10 clusters using FKM and
MCDFKM with M = 1. Fig. 10 presents the clustering results. From Fig. 10, it can be
found that FKM and MCDFKM with M = 1 can obtain the same clustering result.
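The synthetic data set just described can be generated along the following lines (a sketch; the uniform placement of the centers within the hypercube and the random seed are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)                     # seed chosen arbitrarily
centers = rng.uniform(-1.0, 1.0, size=(10, 2))     # ten centers over [-1, 1]^2
X = np.vstack([c + 0.15 * rng.standard_normal((100, 2))   # std 0.15 per coordinate
               for c in centers])                  # N = 1000 points in total
C0 = X[rng.choice(len(X), size=10, replace=False)]
C, U = cdfkm(X, C0)                                # divide into 10 clusters, as in Fig. 10
```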
Table 5. The average number of distance calculations per data point for FKM, CDFKM,
and MCDFKM using the Statlog data set.
Method           k = 16      k = 32      k = 64
FKM              25743984    79291488    158171072
CDFKM            9706652     17309091    29728478
MCDFKM (M = 1)   9855873     18970822    40250065
Table 6. The average distortion per data point for FKM, CDFKM, and MCDFKM using
the Statlog data set.
Method           k = 16     k = 32    k = 64
FKM              1368.78    684.39    342.19
CDFKM            1368.78    684.39    342.19
MCDFKM (M = 1)   1368.78    684.39    342.19
Fig. 10. The clustering results for FKM and MCDFKM with M = 1 using a synthetic data set of size 1000 (markers show the cluster centers found by FKM and by MCDFKM).
5. CONCLUSIONS
In this paper, two novel algorithms are developed to speed up fuzzy k-means clus-
tering through using the information of center displacement between two successive par-
tition processes. A cluster center estimation algorithm is also presented to determine the
initial cluster centers for the proposed algorithm CDFKM. The computing time of the
proposed algorithm MCDFKM with M = 1 grows linearly with the data dimension. Compared to FKM, it can effectively reduce the
computing time and number of distance calculations. Compared with FKM, the proposed
method MCDFKM with M = 1 can reduce the computing time by a factor of 2.6 to 3.1
for the data sets generated with cluster centers evenly distributed over a hypercube. Ex-
perimental results show that the proposed approaches and FKM can obtain the same
clustering result. The proposed methods are used to reduce the computational complexity
of conventional fuzzy k-means clustering. Therefore the Euclidean distance is used as the
distortion measure. However, the proposed method can be extended to other distortion
measures, such as the Hamming distance.
REFERENCES