
2019 IEEE 6th International Conference on Industrial Engineering and Applications (ICIEA)

Genetic Algorithm Based Fuzzy c-Ordered-Means to Cluster Analysis

R. J. Kuo, Jun-Yu Lin, Thi Phuong Quyen Nguyen


Department of Industrial Management
National Taiwan University of Science and Technology
Taipei, Taiwan
e-mail: rjkuo@mail.ntust.edu.tw, m10601107@mail.ntust.edu.tw, quyen.ntp@gmail.com

Abstract—Clustering is an important technique used to discover the structure of data. It is applied in many areas, such as customer segmentation, image recognition, and social science. However, most existing clustering methods suffer from two major drawbacks: 1) the susceptibility of the clustering result to the randomly initialized centers, and 2) sensitivity to outliers and noisy data. To solve these two problems, this study proposes a new algorithm named the genetic algorithm-based fuzzy c-ordered-means algorithm (GA-FCOM). Herein, the fuzzy c-ordered-means algorithm (FCOM) can deal with noisy data and outliers, while the genetic algorithm is employed to obtain good initial centroids efficiently during the clustering process. An experiment is conducted using benchmark datasets collected from the UCI machine learning repository to validate the proposed algorithm. The computational results indicate that the proposed GA-FCOM outperforms the fuzzy c-means algorithm (FCM) and FCOM in terms of both accuracy and objective function value.

Keywords—clustering; meta-heuristics; genetic algorithm; outlier; fuzzy c-means algorithm; fuzzy c-ordered-means algorithm

I. INTRODUCTION

Clustering is a very important field in data mining. It mainly explores the degree of similarity between data, such that data instances grouped in the same cluster are similar, while instances in different clusters are highly dissimilar [1]. Clustering is also known as an unsupervised learning method, which is usually performed without pre-determined labels [2]. The clustering process evaluates candidate partitions by means of an objective function.

Many clustering algorithms exist, and they are mainly divided into two groups: hierarchical clustering and partitional clustering. In hierarchical clustering, the number of clusters can be changed from large to small or from small to large by merging or splitting. Partitional clustering is a division of a dataset into non-overlapping subsets [3]. The k-means algorithm [4] and the fuzzy c-means algorithm (FCM) [5] are two popular partitional clustering algorithms. Regarding fuzzy clustering, the FCM is one of the most famous clustering methods based on fuzzy set theory. Herein, a data point no longer belongs to exactly one cluster; it is associated with every cluster by a degree of membership, which can be illustrated by a fuzzy membership matrix. However, the results of FCM are susceptible to outliers and noise. Hence, the fuzzy c-ordered-means algorithm (FCOM) [6] was developed mainly to overcome this shortcoming. The FCOM uses a robust loss function as the distance measure and Yager's ordered weighted averaging (OWA) operator [7] to reduce sensitivity to outliers. However, the results of the FCOM are still very susceptible to the randomly initialized centers. Moreover, the FCOM may terminate at a locally optimal solution. Therefore, this study proposes a method named the genetic algorithm-based fuzzy c-ordered-means algorithm (GA-FCOM), which integrates the FCOM with a genetic algorithm (GA) for two purposes:

1) To find better initial centroids.
2) To search for the globally optimal solution.

The rest of this paper is organized as follows. Section 1 gives the introduction of this research. Section 2 provides the literature review, including FCM, FCOM, and GA for clustering. Section 3 presents the methodology of the proposed GA-FCOM. Section 4 shows the computational results in detail. Finally, Section 5 presents the conclusions and future recommendations of this study.

II. LITERATURE REVIEW

A. Fuzzy c-Means Algorithm

The fuzzy c-means algorithm (FCM), one of the most famous algorithms for fuzzy clustering, was proposed in 1984 based on the concept of fuzzy set theory [5]. Suppose that there is a set of data containing $N$ points ($p$ dimensions), and it is expected to divide the data into $c$ clusters. The matrix $U = [u_{ij}]_{N \times c}$ represents the membership degree of data instance $x_i$ to cluster $j$. Let $V = \{v_1, v_2, \dots, v_c\}$ denote the centers of the $c$ clusters. The FCM partitions a set of objects into $c$ clusters by minimizing the following objective function:

$$\min \Big\{ J(U, V, X) = \sum_{i=1}^{N} \sum_{j=1}^{c} (u_{ij})^m \, \mathcal{D}(x_i, v_j) \Big\},$$  (1)

subject to

$$u_{ij} \in [0, 1]; \quad \sum_{j=1}^{c} u_{ij} = 1; \quad 1 \le j \le c; \quad 1 \le i \le N.$$  (2)

Eq. (1) is a distance-based objective function, where $\mathcal{D}(x_i, v_j)$ measures the distance between $x_i$ and the cluster center $v_j$, and $m \in (1, \infty)$ is a weighting exponent which is usually set to 2 [8]. The Euclidean distance is commonly used, so $\mathcal{D}(x_i, v_j)$ is defined as:

$$\mathcal{D}(x_i, v_j) = \sqrt{\sum_{l=1}^{p} (x_{il} - v_{jl})^2}.$$  (3)

The constraint in Eq. (2) limits the membership degrees: for each data instance $x_i$, the degrees to which it belongs to the clusters must sum to exactly 1. To minimize $J$, the partition matrix and the cluster centers are updated as follows:

978-1-7281-0851-3/19/$31.00 ©2019 IEEE 


$$u_{ij} = \frac{1}{\sum_{k=1}^{c} \left( \dfrac{d(x_i, v_j)}{d(x_i, v_k)} \right)^{\frac{2}{m-1}}}; \quad 1 \le i \le N; \; 1 \le j \le c,$$  (4)

$$v_j = \frac{\sum_{i=1}^{N} u_{ij}^m \, x_i}{\sum_{i=1}^{N} u_{ij}^m}; \quad 1 \le j \le c.$$  (5)

The FCM iteratively updates $U$ and $V$ through Eq. (4) and Eq. (5) to drive $J$ toward a local extremum until the termination condition is reached.

B. Fuzzy c-Ordered-Means Algorithm

The FCM is one of the most popular clustering methods since it is simple and has a low computation cost. However, it is very sensitive to noise and outliers in the data, which degrades the clustering performance. To overcome this problem, Leski developed the fuzzy c-ordered-means algorithm (FCOM) [6], which uses Huber's M-estimators and Yager's OWA operators to improve robustness.

The FCOM's objective function takes the form:

$$\min \Big\{ J(U, V) = \sum_{i=1}^{c} \sum_{k=1}^{N} \beta_{ik} (u_{ik})^m \, \mathcal{D}(x_k, v_i) \Big\},$$  (6)

where $\beta_{ik}$ is an additional weight and

$$\mathcal{D}(x_k, v_i) = \sum_{l=1}^{p} \mathcal{D}(x_{kl}, v_{il}).$$  (7)

The set of all possible fuzzy partitions of a set of $N$ data points ($p$ dimensions) into $c$ clusters is defined by:

$$\mathcal{J}_{gfc} = \Big\{ U \in \mathbb{R}^{c \times N} \,\Big|\, u_{ik} \in [0,1]; \; \sum_{i=1}^{c} \beta_{ik} u_{ik} = f_k; \; 0 < \sum_{k=1}^{N} u_{ik} < N; \; 1 \le i \le c; \; 1 \le k \le N \Big\},$$  (8)

where $\beta_{ik} \in [0,1]$ denotes the typicality of the $k$-th point with respect to the $i$-th cluster; a larger $\beta_{ik}$ value indicates more typical data. The $f_k$ parameter can be interpreted as an overall typicality of the point $x_k$, which depends on the typicality of $x_k$ with respect to all clusters. An s-norm can be used to obtain this overall assessment, and the maximum is chosen as the s-norm in this paper. The formula of $f_k$ is:

$$f_k = \max_i \beta_{ik}; \quad i = 1, 2, 3, \dots, c.$$  (9)

The necessary conditions for minimizing the objective function under its constraints, obtained by the Lagrange multiplier theorem, are as follows:

$$u_{ik} = \frac{f_k \, \mathcal{D}(x_k, v_i)^{\frac{1}{1-m}}}{\sum_{j=1}^{c} \beta_{jk} \, \mathcal{D}(x_k, v_j)^{\frac{1}{1-m}}}; \quad 1 \le k \le N; \; 1 \le i \le c,$$  (10)

$$v_{il}^{[r]} = \frac{\sum_{k=1}^{N} \beta_{ik} (u_{ik})^m h_{ikl}^{[r]} x_{kl}}{\sum_{k=1}^{N} \beta_{ik} (u_{ik})^m h_{ikl}^{[r]}}; \quad 1 \le i \le c; \; 1 \le l \le p,$$  (11)

$$h_{ikl}^{[r]} = \begin{cases} 0, & x_{kl} - v_{il}^{[r-1]} = 0 \\ \mathcal{L}\big(x_{kl} - v_{il}^{[r-1]}\big) \,\big/\, \big(x_{kl} - v_{il}^{[r-1]}\big)^2, & x_{kl} - v_{il}^{[r-1]} \ne 0 \end{cases},$$  (12)

where $\mathcal{L}(\cdot)$ is the robust loss function. The $u_{ik}$ represents the membership degree of the $k$-th point to the center of the $i$-th cluster, and $v_{il}^{[r]}$ denotes the cluster centers at the $r$-th iteration. The $v_{il}$ also depends on the $h_{ikl}$ parameters, which are based on the loss function and the residuals. However, the values of $h_{ikl}$ depend on the residuals $e_{ikl} = x_{kl} - v_{il}$. Therefore, to minimize the objective function, it is necessary to reweight the ordering of the residuals iteratively.

The residual indicates the distance between the data point and the cluster center. Let $\pi: \{1, 2, \dots, k, \dots, N\} \to \{1, 2, \dots, k, \dots, N\}$ be a permutation function. Each attribute of the data points is rank-ordered by its residuals, so that:

$$\big| e_{i\pi(1)l}^{[r-1]} \big| \le \big| e_{i\pi(2)l}^{[r-1]} \big| \le \dots \le \big| e_{i\pi(k)l}^{[r-1]} \big| \le \dots \le \big| e_{i\pi(N)l}^{[r-1]} \big|.$$  (13)

The weight $\alpha_{i\pi(k)l}$ denotes the typicality of the $l$-th attribute of the $k$-th point with respect to the $i$-th cluster. It satisfies $\alpha_{i\pi(1)l} \ge \alpha_{i\pi(2)l} \ge \alpha_{i\pi(3)l} \ge \dots \ge \alpha_{i\pi(k)l}$, which means that the value of $\alpha_{i\pi(k)l}$ decreases as the residual increases. This effectively reduces the impact of outliers on the results. The parameter $\alpha_{i\pi(k)l}$ can be computed by the piecewise-linearly weighted OWA (PLOWA) or the sigmoidally weighted OWA (SOWA). PLOWA is defined as:

$$\alpha_{i\pi(k)l} = \Big\{ \Big[ \frac{p_c N - k}{2 p_l N} + 0.5 \Big] \wedge 1 \Big\} \vee 0; \quad k \in \{1, 2, \dots, N\},$$  (14)

where $\wedge$ denotes the minimum and $\vee$ denotes the maximum. In addition, SOWA is represented as:

$$\alpha_{i\pi(k)l} = 1 \Big/ \Big\{ 1 + \exp\Big[ \frac{2.944}{p_a N} (k - p_c N) \Big] \Big\}; \quad k \in \{1, 2, \dots, N\}.$$  (15)

Both weighting functions are decreasing functions. The parameters $p_l > 0$ and $p_a > 0$ control how quickly the functions decline. The typicality $\beta_{ik}$ of the $k$-th point with respect to the $i$-th cluster is then calculated as:

$$\beta_{ik} = \prod_{l=1}^{p} \alpha_{i\pi(k)l}.$$  (16)

The FCOM iteratively updates the membership degrees and cluster centroids through Eq. (10) and Eq. (11) to drive $J$ toward a local extremum. The stopping condition is $\| V^{[r]} - V^{[r-1]} \|_F \le \varepsilon$, where $\| \cdot \|_F$ is the Frobenius norm, $r$ is the iteration step, and $\varepsilon$ is a termination criterion, most often set to $\varepsilon = 10^{-5}$. The process of the FCOM is described as follows:

Step 1: Determine the number of clusters $c$ ($1 < c < N$) and $m > 1$, and set $\beta_{ik} = 1$, $f_k = 1$, and $r = 1$. Initialize $V^{[0]}$ for the $c$ cluster centers.
Step 2: Calculate the membership matrix $u_{ik}^{[r]}$ by using Eq. (10) for the $r$-th iteration.
Step 3: Update the cluster centers $V^{[r]} = \{v_1, v_2, \dots, v_c\}$ by using $u_{ik}$ and the following sub-steps for the $r$-th iteration.
Step 3-1: Initialize $V^{[0]} = 0$. Set $t = 1$ for the inner iteration.
Step 3-2: Calculate the residuals $e_{ikl}^{[r]}$ and the coefficients $h_{ikl}^{[r]}$ by using Eq. (12).
Step 3-3: Rank-order the residuals by using Eq. (13) to obtain the permutation function.
Step 3-4: Calculate the $\alpha_{i\pi(k)l}$ by using Eq. (14) or Eq. (15).
Step 3-5: Calculate the typicality $\beta_{ik}$ by using Eq. (16).
Step 3-6: Update the cluster centers for the $r$-th iteration by using Eq. (11).


Step 3-7: If $\| v_i^{[t]} - v_i^{[t-1]} \|_2^2 > \xi$, then set $t = t + 1$ and go to Step 3-2; otherwise stop. $\xi$ is usually set to $10^{-5}$.
Step 4: Update the parameters $f_k$ by using Eq. (9).
Step 5: If $\| V^{[r]} - V^{[r-1]} \|_F \ge \varepsilon$, then set $r = r + 1$ and go to Step 2; otherwise stop.

C. The Genetic Algorithm for Clustering

The GA was proposed by Holland in 1975 [9]. The GA, which simulates the phenomenon of natural evolution, is a search algorithm for solving optimization problems. Its main concept is based on Darwin's theory of evolution.

The GA begins by encoding the problem into chromosomes. A set of chromosomes forms the population, which produces better offspring through selection, crossover, mutation, and evaluation. The result of this continuous evolution process is an optimal solution to the problem. Several studies have applied the GA to clustering problems [10-11]. Murthy and Chowdhury proposed a GA-based approach and obtained better results in clustering problems [12]. Krishna and Murty combined the GA and k-means to develop a new clustering approach named GKA and proved that globally optimal results could be obtained [13]. The GA has also been used to optimize initial centroids and parameters for clustering methods [14-15].

III. A GENETIC ALGORITHM-BASED FUZZY C-ORDERED-MEANS ALGORITHM

The FCOM may terminate at a locally optimal solution since it is sensitive to the initial centroids. Therefore, the proposed GA-FCOM aims to reduce the impact of the initial centroids by combining the GA with the FCOM. This study uses a real-coded GA for the proposed GA-FCOM. Each chromosome represents a set of candidate centroids. The process of GA-FCOM contains initialization, fitness calculation, selection, crossover, and mutation. The procedures of the GA are as follows:

A. The Control of GA

1) Initialization
The initial population is generated by randomly selecting data instances from the dataset to become the initial chromosomes.

2) Fitness calculation
The fitness function is used to evaluate the quality of each chromosome. This study uses the following formula as the fitness:

$$\text{Fitness} = \frac{1}{1 + J(U, V)},$$  (17)

where $J(U, V)$ is the objective function of the FCOM. A higher fitness value indicates better initial centroids.

3) Selection
Roulette wheel selection is a well-known selection method in the GA. In this method, the probability of each chromosome being selected into the mating pool for further genetic operations is proportional to its fitness:

$$P_i = \frac{f_i}{\sum_{j=1}^{n} f_j},$$  (18)

where $P_i$ is the selection probability of chromosome $i$ ($1 \le i \le n$), and $f_i$ and $f_j$ are the fitness values of chromosomes $i$ and $j$ ($1 \le j \le n$), respectively.

4) Crossover
According to the crossover rate, we determine whether the chromosomes in the mating pool are mated. If mating is implemented, two selected chromosomes mate to produce offspring. The formulas are as follows [16]:

$$X^{new} = \alpha X + (1 - \alpha) Y,$$  (19)
$$Y^{new} = \beta Y + (1 - \beta) X,$$  (20)

where $X^{new}$ and $Y^{new}$ are the genes of the offspring after the crossover process, $\alpha$ and $\beta$ are random numbers between 0 and 1, and $X$ and $Y$ are the genes of the parents selected from the mating pool.

5) Mutation
Mutation is used to avoid premature convergence to a locally optimal solution. The probability of mutation is much lower than that of crossover. If mutation is implemented, this study applies the following function to change a gene of the chromosome [17]:

$$X^{new} = X + s \times r \times a,$$  (21)
$$a = 2^{-uk},$$  (22)

where $X^{new}$ is the gene of the offspring after the mutation process, $s \in \{-1, 1\}$ is chosen uniformly at random, $r \in [10^{-6}, 0.1]$ is a specified proportion, $u$ is a random number between 0 and 1, and $k \in \{4, 5, \dots, 20\}$ is the mutation precision.

B. The Steps of GA-FCOM

The GA is a well-known method that can search for optimal solutions efficiently. During the iteration process, the fitness value is used to evaluate the goodness of the chromosomes. Through the selection, crossover, and mutation processes, the chromosomes of the next generation are generated. Finally, after a certain number of generations, the best solution is obtained. The steps of GA-FCOM are as follows:

Step 1: Set the parameters of the GA, including population size, crossover rate, mutation rate, and number of generations, and determine the number of clusters $c$. In addition, the parameters of the FCOM algorithm should also be pre-specified.
Step 2: Generate the initial chromosomes by selecting data instances from the dataset.
Step 3: Run the FCOM with the generated initial centers.
Step 4: Calculate the fitness values using Eq. (6) and Eq. (17).
Step 5: Update the best chromosome.
Step 6: Selection: select chromosomes into the crossover pool by Eq. (18).
Step 7: Crossover: use Eq. (19) and Eq. (20) to produce new offspring.
Step 8: Mutation: use Eq. (21) and Eq. (22) to produce new offspring.
Step 9: Stop if the termination condition is reached; otherwise, go back to Step 3.
Step 10: Choose the best chromosome as the initial cluster centroids and run the FCOM.
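As a concrete illustration of Section II.A, the FCM update rules in Eqs. (4) and (5) can be sketched with NumPy. This is a minimal sketch for exposition, not the authors' implementation; the function name `fcm`, the toy data in the test, and the optional `V0` argument for user-supplied initial centroids (the quantity GA-FCOM optimizes) are assumptions of this example.

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, V0=None, seed=0):
    """Minimal FCM: alternate the Eq. (4) membership update and the
    Eq. (5) center update until the centers stop moving."""
    X = np.asarray(X, dtype=float)
    N = X.shape[0]
    if V0 is None:
        rng = np.random.default_rng(seed)
        V = X[rng.choice(N, size=c, replace=False)]  # random initial centers
    else:
        V = np.asarray(V0, dtype=float)
    for _ in range(max_iter):
        # Euclidean distances of every point to every center, shape (N, c)
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.maximum(d, 1e-12)  # guard against division by zero
        # Eq. (4): u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        U = 1.0 / ((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1))).sum(axis=2)
        # Eq. (5): centers as weighted means with weights u_ij^m
        W = U ** m
        V_new = (W.T @ X) / W.T.sum(axis=1, keepdims=True)
        converged = np.linalg.norm(V_new - V) <= eps  # Frobenius-norm stop rule
        V = V_new
        if converged:
            break
    return U, V
```

Each row of `U` sums to 1 by construction, which is exactly the constraint of Eq. (2).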

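The OWA weighting functions of Section II.B, Eqs. (14)-(16), can be checked numerically in a few lines. This is a minimal sketch, assuming `k` is the 1-based rank produced by the ordering in Eq. (13); the function names are invented for this example.

```python
import math

def plowa(k, N, p_c, p_l):
    """Eq. (14): piecewise-linear OWA weight for rank k,
    clipped into [0, 1] by the min (^) and max (v) operators."""
    return max(min((p_c * N - k) / (2.0 * p_l * N) + 0.5, 1.0), 0.0)

def sowa(k, N, p_c, p_a):
    """Eq. (15): sigmoidal OWA weight for rank k."""
    return 1.0 / (1.0 + math.exp(2.944 / (p_a * N) * (k - p_c * N)))

def typicality(alphas):
    """Eq. (16): beta_ik as the product of the per-attribute weights."""
    beta = 1.0
    for a in alphas:
        beta *= a
    return beta
```

Both weights equal 0.5 at rank k = p_c N and decrease as the rank (and hence the residual) grows, which is how large-residual outliers lose influence on the center update of Eq. (11).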

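The GA operators of Section III.A, Eqs. (18)-(22), also translate directly into code. The sketch below assumes a chromosome is a flat list of real-valued genes (the concatenated candidate centroids); it is illustrative only, not the authors' implementation.

```python
import random

def roulette_select(fitness):
    """Eq. (18): pick index i with probability f_i / sum_j f_j."""
    total = sum(fitness)
    pick, acc = random.random() * total, 0.0
    for i, f in enumerate(fitness):
        acc += f
        if pick <= acc:
            return i
    return len(fitness) - 1  # numerical safety net

def crossover(X, Y):
    """Eqs. (19)-(20): arithmetic crossover of two real-coded parents."""
    a, b = random.random(), random.random()
    X_new = [a * x + (1 - a) * y for x, y in zip(X, Y)]
    Y_new = [b * y + (1 - b) * x for x, y in zip(X, Y)]
    return X_new, Y_new

def mutate(X, r=0.1):
    """Eqs. (21)-(22): shift each gene by s * r * 2^(-u*k)."""
    out = []
    for x in X:
        s = random.choice([-1, 1])  # s in {-1, 1}, uniform at random
        u = random.random()         # u in [0, 1)
        k = random.randint(4, 20)   # mutation precision
        out.append(x + s * r * 2.0 ** (-u * k))
    return out
```

Because the random numbers in Eqs. (19)-(20) lie in [0, 1], each offspring gene is a convex combination of the parent genes, so offspring centroids stay inside the region spanned by their parents; mutation then adds a small perturbation bounded by r.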
IV. EXPERIMENTAL RESULTS

In this section, the experimental results of the clustering algorithms are presented. The clustering algorithms were coded in Python 3.6.5 and run on a PC with an Intel Core i7-6700 processor. The detailed experimental results are described as follows.

A. Datasets

This study uses three datasets, Glass, Vertebral, and Breast Tissue, from the UCI machine learning repository to evaluate the performance of FCM, FCOM, and GA-FCOM. The Glass and Vertebral datasets contain outliers. The characteristics of the datasets are illustrated in Table 1.

TABLE I. THE CHARACTERISTICS OF DATASETS

Dataset        Number of instances  Number of attributes  Number of clusters
Glass          214                  9                     6
Vertebral      310                  6                     3
Breast Tissue  106                  9                     6

B. Performance Measurement

This study uses accuracy to evaluate the performance of the proposed algorithm. The accuracy is defined as [18]:

$$\text{Accuracy} = \frac{\text{predicted labels}}{\text{true labels}},$$  (23)

where the predicted labels represent the results after clustering.

C. Parameter Setting

Parameters are set for the GA and the clustering algorithms. For the GA, after several trial experiments, the number of chromosomes is set to 20, the crossover rate and mutation rate are set to 0.85 and 0.01, respectively, and the number of generations is set to 30. The parameter settings for the clustering algorithms are described in Table 2.

TABLE II. PARAMETER SETTINGS FOR THE ALGORITHMS

Methods   Dataset        m    p_a  p_c
FCM       Glass          2    -    -
          Vertebral      2    -    -
          Breast Tissue  2    -    -
FCOM      Glass          1.2  0.2  2
          Vertebral      2    0.4  0.5
          Breast Tissue  3    0.3  0.6
GA-FCOM   Glass          1.2  0.2  2
          Vertebral      2    0.4  0.5
          Breast Tissue  3    0.3  0.6

D. Computational Results

In this study, the experimental results are obtained by running each algorithm 30 times. All data are normalized to the range between 0 and 1. Table 3 shows the computational results for the three algorithms, including the accuracy and the corresponding standard deviation.

TABLE III. THE COMPUTATIONAL RESULTS OF ACCURACY

Datasets       Accuracy    FCM    FCOM   GA-FCOM
Glass          Average(%)  55.09  55.23  56.00
               SD          0.005  0.012  0.011
Vertebral      Average(%)  66.13  68.91  69.03
               SD          0.000  0.002  0.000
Breast Tissue  Average(%)  58.14  62.55  63.20
               SD          0.018  0.019  0.008

According to Table 3, the proposed GA-FCOM outperforms the FCM and FCOM for all datasets. The standard deviation of GA-FCOM is also lower than that of FCOM. Moreover, in order to verify the impact of the initial centers, the objective function values of FCOM and GA-FCOM are compared. Table 4 shows the objective function values for FCOM and GA-FCOM, and Fig. 1 shows the evolution of the objective function on the Glass dataset.

TABLE IV. THE COMPUTATIONAL RESULTS OF THE OBJECTIVE FUNCTION

Datasets       Objective function  FCOM   GA-FCOM
Glass          Average             16.53  15.83
               SD                  0.791  0.175
Vertebral      Average             1.42   1.42
               SD                  0.001  0.000
Breast Tissue  Average             1.36   1.24
               SD                  0.133  0.103

Figure 1. Evolution of the objective function on the Glass dataset.

Table 4 shows the computational results of the objective function. For Glass and Breast Tissue, GA-FCOM obtains a lower objective function value than FCOM, indicating the good performance of GA-FCOM. For Vertebral, FCOM and GA-FCOM have the same performance. From Fig. 1, GA-FCOM achieves a better objective value in the first iteration and converges faster than FCOM.

V. CONCLUSIONS

This study has proposed a new clustering method, the genetic algorithm-based fuzzy c-ordered-means algorithm. Since the clustering performance is susceptible to the initial centroids, this study used the GA to overcome this problem. Because of the good initial centroids, GA-FCOM can converge faster and obtain better results more efficiently. GA-FCOM also enhances robustness for datasets with outliers; two datasets with outliers were used in this experiment. The experimental results are compared for three algorithms


including FCM, FCOM, and GA-FCOM. GA-FCOM obtains the best clustering performance, in terms of both accuracy and objective function value, for all datasets. In the future, since the parameter setting of the FCOM is still based on previous research, it is hoped that meta-heuristics can be used to find good parameter values and initial centroids simultaneously.

ACKNOWLEDGMENT

This research was partially supported by the Ministry of Science and Technology of the Taiwan Government under grant MOST105-2221-E-011-103-MY3. This support is gratefully appreciated.

REFERENCES

[1] Chen, M.-S., Han, J., & Yu, P. S. (1996). Data mining: An overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering, 8(6), 866-883.
[2] Grira, N., Crucianu, M., & Boujemaa, N. (2004). Unsupervised and semi-supervised clustering: A brief survey. In A Review of Machine Learning Techniques for Processing Multimedia Content, Report of the MUSCLE European Network of Excellence, 1001-1030.
[3] Tan, P.-N., Steinbach, M., & Kumar, V. (2013). Cluster analysis: Basic concepts and algorithms. In Introduction to Data Mining.
[4] MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability.
[5] Bezdek, J. C. (1981). Objective function clustering. In Pattern Recognition with Fuzzy Objective Function Algorithms (pp. 43-93). Springer.
[6] Leski, J. M. (2016). Fuzzy c-ordered-means clustering. Fuzzy Sets and Systems, 286, 114-133.
[7] Yager, R. R. (1988). On ordered weighted averaging aggregation operators in multicriteria decisionmaking. IEEE Transactions on Systems, Man, and Cybernetics, 18(1), 183-190.
[8] Fan, J., Han, M., & Wang, J. (2009). Single point iterative weighted fuzzy C-means clustering algorithm for remote sensing image segmentation. Pattern Recognition, 42(11), 2527-2540.
[9] Holland, J. H. (1975). Adaptation in Natural and Artificial Systems. University of Michigan Press.
[10] Bezdek, J. C., Boggavarapu, S., Hall, L. O., & Bensaid, A. (1994). Genetic algorithm guided clustering. In Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence.
[11] Maulik, U., & Bandyopadhyay, S. (2000). Genetic algorithm-based clustering technique. Pattern Recognition, 33(9), 1455-1465.
[12] Murthy, C. A., & Chowdhury, N. (1996). In search of optimal clusters using genetic algorithms. Pattern Recognition Letters.
[13] Krishna, K., & Murty, M. N. (1999). Genetic K-means algorithm. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 29(3), 433-439.
[14] Jimenez, J., Cuevas, F., & Carpio, J. (2007). Genetic algorithms applied to clustering problem and data mining. In Proceedings of the 7th WSEAS International Conference on Simulation, Modelling and Optimization.
[15] Khotimah, B. K., Irhamni, F., & Sundarwati, T. (2016). A genetic algorithm for optimized initial centers K-means clustering in SMEs. Journal of Theoretical and Applied Information Technology, 90(1), 23.
[16] Michielssen, E., Ranjithan, S., & Mittra, R. (1992). Optimal multilayer filter design using real coded genetic algorithms. IEE Proceedings J (Optoelectronics), 139(6), 413-420.
[17] Sumathi, S., Hamsapriya, T., & Surekha, P. (2008). Evolutionary Intelligence: An Introduction to Theory and Applications with Matlab. Springer Science & Business Media.
[18] Graves, D., & Pedrycz, W. (2010). Kernel-based fuzzy clustering and fuzzy clustering: A comparative experimental study. Fuzzy Sets and Systems, 161(4), 522-543.


