Abstract Data mining is a key area for many fields of science and engineering. In this context, a statistical learning method known as Support Vector Machines (SVM) has emerged as an alternative for solving data classification problems. Usually, the SVM problem is formulated as a nonlinear optimization problem subject to constraints, and conventional optimization techniques based on the Lagrangian approach are used to solve it. When classifying noisy data, these conventional techniques show performance deterioration, since the resulting optimization problem is multidimensional and may present many local minima. In this work, we propose a Differential Evolution (DE) algorithm combined with a local search technique to find the optimal parameters of SVM classifiers applied to noisy data.
Rodrigo de C. Cosme
Universidade Federal do Espírito Santo, e-mail: rdccosmo@gmail.com
Renato A. Krohling
Universidade Federal do Espírito Santo, e-mail: krohling.renato@gmail.com

1 Introduction
To handle multiclass problems [5], several methods have been proposed. SVM Ensemble is one of them: it consists of applying SVM to the m classes of the problem in groups of two, and the outputs of these m binary SVMs are then combined to solve the original multiclass problem. Ensemble methods for aggregating binary SVMs into a multiclass solution include majority voting and weighting [5]. The main concern in multiclass classification is the growth in computational cost. In addition, there are cases in which the classes are not linearly separable. In such cases, a nonlinear transformation is applied to the data in order to reach a space where the data can be separated. This transformation is called a kernel function. Several kernel functions have been studied to improve performance or to incorporate other properties such as feature selection [7].
When using SVMs for classification, several parameters must be configured so as to minimize the classification error on the validation instances. Since this is a multidimensional optimization problem, getting trapped in local minima must be avoided. Among the alternatives for solving the optimization problem are gradient-based deterministic methods. These methods are usually computationally efficient, but can converge to local minima and stagnate. On the other hand, population-based methods (genetic algorithms, differential evolution, particle swarm optimization) have been shown to be effective for this kind of problem. In general, search or optimization methods working with a population are computationally more costly than gradient-based methods, but may provide more robust solutions. Hence, the automation of the SVM training process using a biologically inspired evolutionary algorithm is desirable.
There are many ways to approach this problem; one of them is to use an evolutionary algorithm to tune the SVM parameters encoded as an individual [13]. In [13], an approach is presented to perform feature selection and determine the parameters of an SVM using the Particle Swarm Optimization algorithm. In [7], an extension of the Gaussian kernel that can perform feature extraction is presented. In [4], the impact of introducing noise in different places on classification accuracy is studied for different supervised learning techniques, such as neural networks and the naive Bayes probabilistic classifier. In [8], noisy variables are introduced and analyzed.
Other works focus on data clustering in order to produce meaningful classification of noisy data [1], with the disadvantage that the number of clusters must be known a priori. In order to obtain good classification results and reduce the number of parameters to be set by the designer, in this paper an SVM+DE approach is proposed. This hybrid technique benefits from the statistical power of the SVM, while DE is used to find the optimal values of the SVM parameters.
In Section 2 we briefly describe SVMs. Section 3 presents the DE algorithm. Section 4 describes the Tabu Search and Nelder-Mead methods used for local search. Section 5 illustrates the hybrid method. Section 6 presents the results and discussion. In Section 7 the conclusions are given.
Given a training set Z = {(x_1, y_1), (x_2, y_2), ..., (x_l, y_l)} of l instances, where each instance is composed of n attributes (features), x_i = (x_{1i}, ..., x_{ni})^T ∈ R^n, and a class label y_i ∈ {1, −1}, the task of classification consists of separating the two classes with a hyperplane.
min_{w,b,ξ}  (1/2)‖w‖² + C ∑_{i=1}^{l} ξ_i
s.t.  y_i [w · x_i + b] ≥ 1 − ξ_i,   ξ_i ≥ 0,   i = 1, ..., l        (2)
Or, in the dual form,

max_α  ∑_{i=1}^{l} α_i − (1/2) ∑_{i=1}^{l} ∑_{j=1}^{l} α_i α_j y_i y_j K(x_i, x_j)
s.t.  0 ≤ α_i ≤ C,   ∑_{i=1}^{l} α_i y_i = 0        (3)

where the α_i are Lagrange multipliers and K(·, ·) denotes the kernel function.
In [7], an extension of the Gaussian kernel was given, with the property of feature extraction as well as feature mapping. Feature extraction is achieved through the parameter vector β: the lower the value of β_k is, the less relevant feature k is to the classification. Therefore, removing the feature in question would not affect the classifier.
K(x_i, x_j) = exp( − ∑_{k=1}^{N} β_k (x_{ik} − x_{jk})² / (2σ_k²) )        (9)
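As an illustration (not part of the original formulation), the kernel of Equation 9 could be computed as in the following Python sketch; the function and variable names are ours, and the per-feature placement of σ_k is one possible reading of the equation:

```python
import numpy as np

def weighted_gaussian_kernel(xi, xj, beta, sigma):
    """Modified Gaussian kernel of Equation 9 (illustrative sketch).

    xi, xj : 1-D feature vectors of length N
    beta   : per-feature relevance weights; a small beta[k] means that feature k
             contributes little to the classification
    sigma  : per-feature width parameters (a scalar also works by broadcasting)
    """
    diff2 = (np.asarray(xi) - np.asarray(xj)) ** 2
    return float(np.exp(-np.sum(beta * diff2 / (2.0 * np.asarray(sigma) ** 2))))

def gram_matrix(XA, XB, beta, sigma):
    """Kernel matrix between two sample sets, e.g. for an SVM solver that accepts
    a precomputed kernel (such as sklearn.svm.SVC(kernel='precomputed'))."""
    return np.array([[weighted_gaussian_kernel(a, b, beta, sigma) for b in XB]
                     for a in XA])
```

With such a precomputed Gram matrix, an off-the-shelf SVM solver can be retrained while C, σ and β are varied by the optimizer.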
Apart from the choice of kernel function, the selection of samples for training and validation greatly affects the performance of the classifier. It is well established in the literature that a proportion of 70% of the data set is used for training and 30% for validation [5], but which samples to reserve for each task remains an open issue. A few algorithms have been proposed to address this issue; in [15], Boosting and Bagging techniques are compared.
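For reference, the 70%/30% split mentioned above can be realized, for instance, with scikit-learn; this is only an illustrative snippet and the toy data are placeholders:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: X is (n_samples, n_features), y holds labels in {-1, +1}.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)

# 70% for training, 30% for validation; stratify keeps the per-class
# proportions the same in both subsets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0)
```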
As the dimension of the data set increases, the task of classification becomes increasingly complex, since the number of parameters to be optimized grows proportionally. Next, we describe the algorithms used to optimize the SVM parameters.
Much research has been done in the field of noisy SVM to define methods that remove the noise or accommodate the learning model to the perturbed data. Topics of research include data clustering to remove the noise [1], Differential Evolution classifiers [8] and the extraction of relevant features [3]. As far as we know, little has been done with SVM+DE; hence, this is a first approach using SVM+DE to learn the classification model for noisy data.
3 Differential Evolution
To better balance exploration and exploitation, the use of neighborhood-based mutation was proposed in [11]. In this method, a local neighborhood model, where the best individual within a small neighborhood is used, and a global neighborhood model, where the best individual of the whole population at the current generation is used, are combined. The combination of the local and global models is controlled by a new parameter w, the weight factor, resulting in the neighborhood-based mutation vector. To create the local vector, the best vector in the neighborhood and two other vectors are chosen, as given by Equation 10.
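The following sketch illustrates the neighborhood-based mutation as we understand the scheme of [11]; the ring topology and the names F1, F2 and w are illustrative assumptions:

```python
import numpy as np

def neighborhood_mutation(pop, fitness, i, k, F1, F2, w, rng):
    """Neighborhood-based donor vector for individual i (illustrative sketch).

    pop     : (NP, D) array with the current population
    fitness : (NP,) array of fitness values (higher is better)
    k       : neighborhood radius on a ring topology
    F1, F2  : scale factors; w is the weight factor in [0, 1]
    """
    NP = len(pop)
    # the 2k+1 ring neighbors of individual i
    neigh = [(i + j) % NP for j in range(-k, k + 1)]
    n_best = max(neigh, key=lambda j: fitness[j])
    p, q = rng.choice([j for j in neigh if j != i], size=2, replace=False)
    local = pop[i] + F1 * (pop[n_best] - pop[i]) + F2 * (pop[p] - pop[q])

    g_best = int(np.argmax(fitness))
    r1, r2 = rng.choice([j for j in range(NP) if j != i], size=2, replace=False)
    global_ = pop[i] + F1 * (pop[g_best] - pop[i]) + F2 * (pop[r1] - pop[r2])

    # the weight factor w blends the global and local donor vectors
    return w * global_ + (1.0 - w) * local
```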
In Tabu Search (TS), a number R of random neighbors is generated and the best point evaluated is chosen as the starting point, serving both as current solution (CS) and best solution (S). The list used to keep recent points, the Short-Term Tabu List (STTL), is updated with CS. Then, N neighbors are generated randomly around CS according to a search strategy and ranked by their performance. The best neighbor is copied to CS if it is not a member of the STTL. A neighbor is chosen as the next move if it outperforms S. For further exploitation, S is added to a Long-Term Tabu List (LTTL). Thus, if CS is better than S, then S is replaced by CS. The continuous process of neighbor generation and selection of the best is stopped only when a maximum number of iterations is reached or when there is no improvement after a pre-specified number of iterations.
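The condensed sketch below illustrates this Tabu Search loop; the neighbor-generation strategy, the numeric constants and the omission of the long-term list are our simplifications:

```python
import numpy as np

def tabu_search(f, start, step=0.1, n_neighbors=20, max_iter=100,
                max_stall=15, tabu_len=25, seed=0):
    """Condensed Tabu Search in the spirit of the description above: generate
    random neighbors around the current solution CS, move to the best one not
    in the short-term tabu list, and keep the best solution S found so far.
    f is maximized; constants and the neighbor strategy are placeholders."""
    rng = np.random.default_rng(seed)
    current = np.asarray(start, dtype=float)
    best, f_best = current.copy(), f(current)
    sttl, stall = [tuple(current)], 0
    for _ in range(max_iter):
        cand = current + rng.normal(scale=step, size=(n_neighbors, current.size))
        cand = sorted(cand, key=f, reverse=True)            # rank the neighbors
        move = next((c for c in cand if tuple(c) not in sttl), None)
        if move is None:
            break
        current = move
        sttl = (sttl + [tuple(current)])[-tabu_len:]        # update the STTL
        if f(current) > f_best:
            best, f_best, stall = current.copy(), f(current), 0
        else:
            stall += 1
        if stall >= max_stall:   # stop after a stretch without improvement
            break
    return best, f_best
```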
In order to find the minimum of a function, the Nelder-Mead (NM) method randomly generates D points, P_1, ..., P_D, in a D-dimensional search space around a starting point P_0. The starting point can be chosen randomly or be the result of a previously run algorithm. Each point P_i evaluated has its value denoted by Y_i, and the highest and lowest values are Y_h and Y_l, respectively.
The operations used in the NM search are reflection, expansion and contraction, generating the points P_r, P_e and P_c, respectively. Another point used in these transformations is the centroid, denoted by P̄. These operations are defined as follows:
P̄ = (1/n) ∑_{i=1}^{n} P_i        (14)
P_r = (1 + α_NM) P̄ − α_NM P_h        (15)
P_e = γ_NM P_r + (1 − γ_NM) P̄        (16)
P_c = β_NM P_h + (1 − β_NM) P̄        (17)
where α_NM, β_NM and γ_NM are values in the interval [0, 1].
The NM search algorithm proceeds as follows:
1. Perform reflection for the point P_h.
The search is stopped when the following criterion is satisfied:

(1/D) ∑_{i=1}^{D} ‖P_i^k − P_i^{k+1}‖² < ε        (18)

where P_i^k and P_i^{k+1} are the points at iterations k and k+1, respectively, and ε is a small real number.
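For concreteness, the operations of Equations 14-17 and the stopping rule of Equation 18 could be organized as in the simplified sketch below (parameter names follow the text; this is not the full NM bookkeeping):

```python
import numpy as np

def nelder_mead_step(points, f, alpha=1.0, gamma=0.5, beta=0.5):
    """One simplified NM iteration for minimization using Equations 14-17.
    points is a list of vertices (NumPy vectors)."""
    points = sorted(points, key=f)                 # best first, worst (P_h) last
    P_h = points[-1]
    P_bar = np.mean(points[:-1], axis=0)           # centroid, Equation 14
    P_r = (1 + alpha) * P_bar - alpha * P_h        # reflection, Equation 15
    if f(P_r) < f(points[0]):
        P_e = gamma * P_r + (1 - gamma) * P_bar    # expansion, Equation 16
        points[-1] = P_e if f(P_e) < f(P_r) else P_r
    elif f(P_r) < f(P_h):
        points[-1] = P_r
    else:
        points[-1] = beta * P_h + (1 - beta) * P_bar   # contraction, Equation 17
    return points

def converged(old_points, new_points, eps=1e-6):
    """Stopping rule in the spirit of Equation 18: mean squared displacement
    of the vertices between two consecutive iterations."""
    old, new = np.asarray(old_points), np.asarray(new_points)
    return float(np.mean(np.sum((old - new) ** 2, axis=1))) < eps
```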
Our method consists of finding the kernel parameters and selecting the minimum number of features while maximizing the classification accuracy of the SVM, that is, minimizing the classification errors made by the SVM.
The individual representation of the SVM parameters is made up of the parameters 0 ≤ C ≤ 1 (from Equations 2 and 3), 0 ≤ σ ≤ 1, and 0 ≤ β ≤ 1. The σ and β components of the individual are multidimensional variables related to the modified Gaussian kernel function [7] shown in Equation 9.
According to [7], after training the SVM with N features, one can validate the SVM with fewer features and obtain about the same result, because a small value of an element of β means that the corresponding feature does not contribute much to the classification; consequently, it can be removed from the data set without significantly affecting the classification accuracy. This procedure, called feature extraction, is responsible for increasing the classifier performance [16]. In our configuration, all features whose β values satisfy β_i ≤ 0.5, i = 1, ..., N, are removed from the feature set, as done in [7].
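A possible decoding of a DE individual into the SVM parameters, together with the β_i ≤ 0.5 pruning rule, is sketched below; the flat layout of the individual is our assumption:

```python
import numpy as np

def decode_individual(ind, n_features):
    """Split a flat DE individual into the SVM parameters C, sigma and beta.
    The layout [C, sigma_1..sigma_N, beta_1..beta_N], all in [0, 1], is our
    assumption about how the components are concatenated."""
    ind = np.asarray(ind, dtype=float)
    C = ind[0]
    sigma = ind[1:1 + n_features]
    beta = ind[1 + n_features:1 + 2 * n_features]
    return C, sigma, beta

def prune_features(X, beta, threshold=0.5):
    """Feature extraction rule used in the text: drop feature k when beta_k <= 0.5."""
    keep = beta > threshold
    return X[:, keep], keep
```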
During the evolution process, the objective is the maximization of fitness. When the best fitness stagnates for MAXstagnation generations (set to 10), a local search algorithm is applied to further exploit the best solution found so far, taking as starting point either the best individual or the local vector, each with probability 50%. This algorithm is a combination of the Tabu Search and Nelder-Mead methods [6] and is described in Section 4.
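The trigger logic could look like the following outline, where step_de and local_search are placeholder names for one DE generation and for the combined TS+NM routine:

```python
MAX_STAGNATION = 10   # generations without improvement before local search is applied

def evolve_with_local_search(step_de, local_search, fitness_fn, pop, rng, n_gen=100):
    """Outline of the hybrid loop described above; step_de and local_search are
    placeholders for one DE generation and for the combined TS+NM routine."""
    best, best_fit, stagnation = None, float("-inf"), 0
    for _ in range(n_gen):
        pop, gen_best, gen_fit, local_vec = step_de(pop, fitness_fn)
        if gen_fit > best_fit:
            best, best_fit, stagnation = gen_best, gen_fit, 0
        else:
            stagnation += 1
        if stagnation >= MAX_STAGNATION:
            # start from the best individual or the local vector, 50% chance each
            start = best if rng.random() < 0.5 else local_vec
            cand, cand_fit = local_search(fitness_fn, start)
            if cand_fit > best_fit:
                best, best_fit = cand, cand_fit
            stagnation = 0
    return best, best_fit
```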
6 Simulation results
To create the test suite, three data sets were selected from the UCI repository [2] and noise was then added to them. Data sets containing m classes (with m > 2) were transformed into m two-class data sets, and their results are shown for each class separately. This transformation consists of assigning a single alternative label to all classes other than the currently selected class, and the procedure is repeated for each of the m classes. In addition, the data sets were perturbed with different degrees of noise.
The generation of noise can be classified in different ways [4]. In this study, two categories were taken into account: distribution and location. The distribution chosen was Gaussian, and the locations selected to introduce noise were the output class, the training data, the validation data, or a combination of both. Another possibility is to introduce noise variables in the training data, as done in [8]; here, they were also introduced in the validation data and in both the training and validation data sets. The percentage of data to be perturbed was set to 10% and 50%. Once the percentage η of noise is determined, ηLs feature vectors are randomly selected according to a normal distribution to be perturbed, where Ls is the size of the data set to be perturbed, which can be the training set, the validation set or both.
The selection of the individual vectors to be perturbed is done as follows. For each feature, N random numbers are generated from a normal distribution, where N is the number of vectors in the data set multiplied by the noise percentage to be introduced. Once the vectors to be altered are selected, the perturbation is introduced in the following manner: the perturbed variables are replaced with random numbers sampled from a Gaussian distribution with the original value as mean and a standard deviation of 1.
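One possible reading of this perturbation procedure, in NumPy, is sketched below (the uniform selection of rows is a simplification of the selection step described above):

```python
import numpy as np

def add_gaussian_noise(X, eta, rng=None):
    """Perturb a fraction eta of the rows of X (one reading of the procedure above):
    each selected value is replaced by a draw from a Gaussian with the original
    value as mean and standard deviation 1."""
    rng = rng or np.random.default_rng()
    X_noisy = X.copy()
    n_perturb = int(round(eta * len(X)))                    # eta * Ls vectors
    rows = rng.choice(len(X), size=n_perturb, replace=False)
    X_noisy[rows] = rng.normal(loc=X[rows], scale=1.0)      # mean = original value
    return X_noisy
```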
The types of noise are separated into Gaussian noise and noisy variables. The addition of Gaussian noise works as described earlier. As for the addition of noisy variables, after the new uniform random variables are added, the same principle as for Gaussian noise applies: the noise can be added to the training data, to the validation data, or to both training and validation (training/validation) data. For ease of reference, we name Gaussian noise in the training data, validation data, training/validation data and training labels as noise A, B, C and D, respectively. Likewise, for the noise variables, we name them noise E, F and G, for noise variables added to the training, validation and training/validation data, respectively. Noise types A, B, C and D have been studied in [4]. All the noise types are added to the input data, except for noise D, which sets the output data (the label vector) to a value chosen uniformly at random from the two possible values {−1, 1}. In [8], noise E adds noisy variables to the training data. The remaining noise types follow the same principle, but add noise in different parts: the validation data (noise F) and the training/validation data (noise G). The number of noise variables added was set to twice the number of features of the data set being classified [8].
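A minimal sketch of the noisy-variable addition, under the assumption that the appended variables are uniform random columns, is:

```python
import numpy as np

def add_noise_variables(X, factor=2, rng=None):
    """Append factor * n_features noisy variables to X, as in noise types E-G;
    the use of uniform random columns is our assumption."""
    rng = rng or np.random.default_rng()
    n_samples, n_features = X.shape
    noise_cols = rng.uniform(size=(n_samples, factor * n_features))
    return np.hstack([X, noise_cols])
```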
During the evolutionary process, the individuals were ranked based on N-fold cross-validation values, with the number of folds set to three. To validate the optimized classifiers, a new validation was performed with the data set selected randomly and divided into 70% for training and 30% for validation. The data pertaining to each class was divided in the same proportions: 70% of each class for training and 30% for validation. Finally, to validate the method statistically, we performed a bootstrap with N = 100 samples using the trained classifiers.
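The bootstrap estimate with N = 100 resamples could be computed roughly as follows (sketch; predict stands for any trained classifier's prediction function):

```python
import numpy as np

def bootstrap_accuracy(predict, X_val, y_val, n_boot=100, rng=None):
    """Accuracy estimate with n_boot bootstrap resamples of the validation set."""
    rng = rng or np.random.default_rng()
    accs = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X_val), size=len(X_val))   # sample with replacement
        accs.append(np.mean(predict(X_val[idx]) == y_val[idx]))
    return float(np.mean(accs)), float(np.std(accs))
```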
The effectiveness of the proposed method was evaluated on three different classification problems from the UCI repository, namely the heart, breast-cancer and iris data sets. The results are summarized in Table 1 for the heart data set, in Table 2 for the breast-cancer data set and in Table 3 for the iris data set. The values are averaged cross-validation results obtained in 10 runs of the evolutionary process. Since each run consists of the evolutionary process and the bootstrap for three problems with the different noise setups, it is computationally expensive, and thus 10 runs were considered adequate. The fitness used to measure the classifiers' performance was the classification accuracy.
For the three data sets analysed, only the results obtained for the iris data set were compared to other works, since the noise setup used for it matches the one used in [4]. A complete comparison for the heart and breast-cancer data sets is left for future work.
Table 1 Cross-validation values achieved with SVM+DE+LS for the heart scale problem (noise levels 10% and 50%).

               no noise   A 10%   A 50%   B 10%   B 50%   C 10%   C 50%   D 10%   D 50%   E 10%   E 50%   F 10%   F 50%   G 10%   G 50%
heart scale0   83.2       83.7    78.9    82.9    81.9    82.6    76.7    86.1    87.3    83.0    84.6    82.7    80.5    83.2    84.6
Table 2 Cross-validation values achieved with SVM+DE+LS for the breast-cancer scale problem (noise levels 10% and 50%).

                       no noise   A 10%   A 50%   B 10%   B 50%   C 10%   C 50%   D 10%   D 50%   E 10%   E 50%   F 10%   F 50%   G 10%   G 50%
breast-cancer scale0   96.7       95.8    93.6    95.7    93.3    94.8    91.1    95.7    95.4    96.8    95.7    95.9    96.3    96.7    96.6
In [8], the result obtained for the heart problem without noise is 83.21, which is in agreement with our results. Table 1 shows that, despite the introduction of noise, the accuracy of the classifier increased in some cases. One possible reason is that the noise contributed to increasing the distance between the samples and the margins of the separating hyperplane, making the classification task more precise. A 10% noise level increased or maintained the accuracy for all noise types. A 50% noise level decreased the performance for all noise types except types D and G.
Table 3 Cross-validation values achieved with SVM+DE+LS for the iris scale problem (noise levels 10% and 50%; the Naive Bayes values are those reported in [4], with missing entries marked "-").

                     no noise   A 10%   A 50%   B 10%   B 50%   C 10%   C 50%   D 10%   D 50%   E 10%   E 50%   F 10%   F 50%   G 10%   G 50%
iris scale0          100.0      99.7    93.0    99.7    98.3    99.4    88.1    99.7    100.0   100.0   100.0   100.0   100.0   100.0   99.7
Naive Bayes c0 [4]   92.42      91.85   54.97   88.30   88.63   89.52   63.12   -       -       -       -       -       -       -       -
iris scale1          94.4       93.3    79.6    94.3    89.7    93.9    74.0    94.9    94.8    94.8    90.0    95.2    93.3    94.6    93.4
Naive Bayes c1 [4]   100        95.78   44.21   100     100     97.89   80.70   -       -       -       -       -       -       -       -
iris scale2          93.5       94.0    85.7    94.6    90.7    94.2    85.3    95.6    95.4    96.5    95.3    95.0    95.4    95.0    96.8
Naive Bayes c2 [4]   90.7       88.19   36.48   87.88   83.57   86.11   52.33   -       -       -       -       -       -       -       -
In [8], when 20 noise variables were added with 100% probability, the mean classification accuracy was 80.99, while we obtained 86.
Table 2 shows the results for the breast-cancer problem. All the noise types decreased or maintained the accuracy level. The noise types with the highest accuracy decrease were A, B and C, in partial agreement with Nettleton et al. [4], who observed that noise in the training data affects the accuracy of classifiers the most. The agreement is only partial because: 1) noise B, which affects the testing data, deteriorated the performance slightly more than noise A, which affects the training data, while noise C, affecting both training and testing, had the highest accuracy decrease; 2) on the other hand, for the noise variable types E, F and G, the property that noise in the training data affects performance more than noise in the testing data was preserved, that is, noise E deteriorated the performance more than noises F and G.
For the iris problem, as discussed in Section 6.1, the original 3-class decision problem was transformed into 3 decision problems: irisc0, irisc1 and irisc2. As shown in Table 3, the introduction of 10% noise did not affect the accuracy for any of the noise types. The introduction of 50% noise had a major impact on noise types A, B and C, deteriorating the performance, while the others maintained it. The greatest performance deterioration was seen for noise type C at the 50% noise level. In comparison to the Naive Bayes method of [4], our method yielded good results. In irisc1, our method lost in all cases by a difference of no more than 10%, except for the experiment with noise type A at 50%, where it won by a difference of 44%.
To further validate our approach, the trained classifiers were submitted to a bootstrap method to show statistically that the method achieves good results despite the randomness. The points plotted are values obtained by averaging the bootstrap values of 10 runs, in order to estimate the accuracy without the bias that might be introduced by the random selection of the training and validation sets. In Figures 1 to 5, the fitness achieved for increasing noise levels is shown for each type of noise inserted, for the heart, breast-cancer, irisc0, irisc1 and irisc2 data classification problems, respectively.
The analysis of the resulting plots shows that, for the heart problem, noise types A and C caused the largest performance deterioration as the noise level increased, with accuracies falling below 80 percent. For the breast problem, noise type C caused the largest performance deterioration of the final classifiers as noise increased. For the iris problem, noise types A and C had the most negative impact on classification. As stated in [4], noise in the training data has a greater impact on classification, which explains the higher influence of these two noise types.
[Figure: four panels of bootstrap accuracy (70-90) versus noise percentage (0-50).]
Fig. 1 Fitness bootstrap values as error increases for the Heart problem.
7 Conclusions
In this paper we proposed the use of Differential Evolution hybridized with local search to optimize the parameters of Support Vector Machines for data classification. The behavior of the classifiers was evaluated while noise was introduced in different parts of the data.
Analyzing the results for the heart problem in terms of noise type, deterioration was already noticed with 10% noise when the test data set was altered; that is, all noise types that affected the test set caused deterioration: B, C and F. The only exception was type G, which maintained the same level of accuracy. Although the performance deterioration was higher with 10% noise when the test data set was perturbed, the same behaviour did not occur at the higher noise percentage.
Fig. 2 Fitness bootstrap values as error increases for the Breast problem.
As the noise percentage increased to 50%, the noise types affecting the training data set caused the most deterioration: A and C. Even though noise types D, E and G affect the training set, as noise types A and C do, they did not show the same deterioration as these types of Gaussian noise. On the contrary, with 10% noise they showed little or no deterioration, and with 50% noise they presented even better accuracy levels than on the original, noise-free data set. Noise type D did not present any deterioration; it increased the accuracy with both 10% and 50% noise.
The results for the breast-cancer problem were different. Both the training and test data sets, when perturbed with noise, presented performance deterioration. This time, type D was not an exception and presented a decrease in accuracy. The deterioration for types A and B was the same for both noise percentages. For this data set, type D showed performance deterioration, but it was still smaller than for any of the other types, while type C had the highest deterioration. Noise type E only showed a decrease in performance with 50% noise. As for noise types F and G, they had no noticeable decrease in performance, maintaining their accuracies. Noise type F had a small decrease with 10% noise, but with 50% noise its accuracy level increased again, staying slightly below the level of the original data set.
Fig. 3 Fitness bootstrap values as error increases for the Iris problem Class 0.
The iris problem, divided into 3 sub-problems (iris0, iris1 and iris2), followed the same pattern observed for the breast-cancer problem. Noise types A and B caused performance deterioration, but noise type C caused the largest one. Noise types D, E, F and G maintained the same performance. With 10% noise, the deterioration, when observed, was small, with almost no change in performance, but as the percentage increased to 50% it grew higher.
In a real-world classification problem, the data may be corrupted and subject to noise. This work gives a perspective of what to expect when adopting SVM+DE to classify noisy data. Preliminary analysis indicates that the performance of SVM+DE depends on the type and percentage of noise and on the characteristics of the data. For future work, we plan to compare our method with other approaches studied thoroughly in [4].
Fig. 4 Fitness bootstrap values as error increases for the Iris problem Class 1.
Fig. 5 Fitness bootstrap values as error increases for the Iris problem Class 2.

References