Abstract-Support Vector Machines are considered excellent pattern classification techniques. Classifying a pattern with high accuracy depends mainly on tuning the Support Vector Machine parameters, namely the generalization error parameter and the kernel function parameter. Tuning these parameters is a complex process that is often done experimentally through time-consuming human effort. To overcome this difficulty, an approach such as Ant Colony Optimization can tune the Support Vector Machine parameters. Ant Colony Optimization was originally designed for discrete optimization problems, so applying it to the continuous Support Vector Machine parameters requires discretizing the continuous values. This discretization loses information and therefore degrades classification accuracy and search time. This study proposes an algorithm that optimizes the Support Vector Machine parameters using continuous Ant Colony Optimization, without the need to discretize the continuous parameter values. Seven datasets from UCI were used to evaluate the performance of the proposed hybrid algorithm. The proposed algorithm is credible in terms of classification accuracy when compared to grid search techniques, and the experimental results also show promising performance in terms of computational speed.

Keywords-Support Vector Machine; continuous Ant Colony Optimization; parameters optimization
I. INTRODUCTION

Many decision-making processes, e.g., prognosis, diagnosis, and pattern recognition, can readily be transformed into classification problems [1]. The majority of recent research centers on enhancing classification accuracy by utilizing statistical approaches [2]. Pattern classification aims to assign input features to predetermined groups, or classes, of patterns [3]. The Support Vector Machine (SVM) is a modern pattern classification approach. SVM originates from statistical learning theory and uses the principle of structural risk minimization [4] and [5]; it maps the data into a high-dimensional domain via a kernel function, using the kernel trick [4] and [6]. The polynomial, Radial Basis Function (RBF), and sigmoid kernels are three examples of kernel functions. RBF is the most popular kernel function because of its capability to manage high-dimensional data [7], its good performance in most cases [8], and the fact that it needs only one parameter, the kernel parameter gamma (γ). Two problems in the SVM classifier influence the classification accuracy: tuning the SVM parameters, and selecting an optimal feature subset to present to the SVM classifier. These two problems affect each other [9]. This study focuses on tuning the SVM parameters, also known as model selection.

There is no standard methodology for estimating optimal values of the SVM parameters in advance. In current classification work, obtaining good values for these parameters is not easy: it requires either an exhaustive search through the space of hyperparameters or an optimization approach that searches only a bounded subset of the potential values. Currently, almost all SVM research chooses these parameters experimentally, by searching a bounded number of values and keeping those that yield the lowest error. This approach needs a grid search through the space of parameter values and requires identifying the range of feasible values and the best sampling step. This is a difficult task, because the best sampling step changes from kernel to kernel and the grid ranges may not be simple to identify without prior knowledge of the problem. Furthermore, when more than two hyperparameters must be chosen manually, the search may become intractable [10]. Approaches such as trial and error, grid search, cross-validation, generalization error estimation, and gradient descent can be used to find optimal parameter values for SVM. Evolutionary approaches such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO), and Ant Colony Optimization (ACO) may also be utilized [11].
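The paper gives no code for the grid search baseline it compares against; the sketch below illustrates how C and γ are typically tuned by an exhaustive grid search with cross-validation. The parameter ranges, the example dataset, and the use of scikit-learn's GridSearchCV are assumptions for illustration, not part of the original work.

```python
# Illustrative grid search over the SVM hyperparameters C and gamma.
# Ranges and step sizes are assumed; the paper does not specify them.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

param_grid = {
    "C": np.logspace(-2, 6, 9),       # candidate values for the penalty parameter C
    "gamma": np.logspace(-6, 2, 9),   # candidate values for the RBF kernel parameter gamma
}

# Exhaustive search with k-fold cross-validation on every (C, gamma) pair.
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

The cost of this baseline grows with the grid resolution and the number of parameters, which is the intractability problem noted above.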
ACO algorithms have been applied to tune SVM parameters. These algorithms work through repeated construction procedures in which each procedure directs a subordinate heuristic by intelligently combining different ideas for exploring and exploiting the search space. Learning mechanisms are used to build up information that helps to efficiently obtain near-optimal solutions. Solutions built with ACO imitate ants seeking the shortest path to a food source via pheromones [11]-[13]. ACO algorithms deal with discrete and continuous variables; however, ACO for continuous variables is considered a recent research field [14]-[17].

Ant Colony Optimization for continuous variables (ACOR) uses a Probability Density Function (PDF), instead of a discrete probability distribution, to determine the direction that an ant should follow. The Gaussian function is one of the most popular PDFs because it offers a very simple way of sampling data. For each constructed solution, a density function is generated from a set of solutions that the technique maintains at all times. To maintain this set, it is filled with random solutions at the beginning.
This is similar to initializing the pheromone values in a discrete ACO approach. Then, at each iteration, the group of newly created solutions is appended to the set and an equal number of the worst solutions is removed from it; this step is analogous to the pheromone update in discrete ACO. The goal is to bias the search procedure toward the best solutions. When ACO is used for discrete combinatorial optimization, pheromone information is kept in a table. During each iteration, when selecting a component to append to the current partial solution, an ant uses the values in that table as a discrete probability distribution. In continuous optimization, by contrast, the choices an ant makes are not limited to a finite set, so it is difficult to express the pheromone in a table structure. Instead of a table, ACOR uses a solution archive to preserve a number of solutions; the solution archive contains the values of the solution variables and their objective functions, and these values are used to dynamically create the PDF [16] and [17].
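To make the archive-based sampling concrete, the following minimal sketch shows how a Gaussian PDF can be built from a solution archive and used to draw one new candidate value for a single continuous variable, in the spirit of the ACOR scheme described above. The function name, the default values of q and xi, and the NumPy implementation are illustrative assumptions, not the authors' code.

```python
# Minimal sketch of ACOR-style sampling from a solution archive (one variable).
import numpy as np

def sample_from_archive(archive, q=0.1, xi=0.85, rng=None):
    """Draw one new value for a continuous variable from a solution archive.

    archive : 1-D array of this variable's values in the k stored solutions,
              ordered from best to worst.
    q       : weighting parameter (small q strongly prefers the best solutions).
    xi      : width parameter controlling exploration around the chosen solution.
    """
    rng = np.random.default_rng() if rng is None else rng
    archive = np.asarray(archive, dtype=float)
    k = len(archive)
    ranks = np.arange(1, k + 1)
    # Gaussian weight of each archived solution as a function of its rank, cf. Eq. (2).
    w = np.exp(-((ranks - 1) ** 2) / (2.0 * q**2 * k**2)) / (q * k * np.sqrt(2 * np.pi))
    p = w / w.sum()
    # Pick one archived solution to serve as the mean of the sampling Gaussian.
    chosen = rng.choice(k, p=p)
    mu = archive[chosen]
    # Standard deviation: average distance from the chosen solution to the others.
    sigma = xi * np.abs(archive - mu).sum() / (k - 1)
    return rng.normal(mu, sigma)
```

Appending the newly sampled (and evaluated) solutions to the archive and dropping the same number of worst entries plays the role of the pheromone update described above.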
In this study, ACOR is used to solve the SVM model selection problem. The rest of the paper is organized as follows. Section II reviews the literature on tuning SVM parameters and Section III describes the proposed algorithm. Section IV presents the experimental results, and concluding remarks and future work are presented in Section V.

II. TUNING SUPPORT VECTOR MACHINE PARAMETERS

Imbault & Lebart [18] suggested the use of global minimization approaches, namely GA and Simulated Annealing (SA), to solve the model selection problem. They evaluated GA and SA with modified cooling schedules that automatically select the value at each step. Their experiments show that a global minimization approach reliably lands in a good region and thus prevents very large misclassification ratios. They also observed that GA tends to be faster, while SA needs fewer parameter settings than GA. The primary disadvantage of these approaches is their computation time. Frohlich & Zell [19] proposed the use of an online Gaussian Process (GP) built from the locations in parameter space that have already been visited. From their experiments, they found that the online GP can be applied at a lower cost; new locations in parameter space are sampled based on an expected-improvement criterion. Adankan, Cheriet, & Ayat [20] suggested a fast method for tuning SVM parameters based on an approximation of the gradient of the empirical error combined with incremental learning, which reduces the resources required both in processing time and in storage space. They tested their method on many benchmark datasets and obtained promising results confirming their approach.

The use of GA to optimize C and the bandwidth σ of the SVM kernel function was suggested by Abbas & Arif [12]. They proposed seven support vector machines, one for each day of the week, trained on historical data and then used for long-range prediction of daily peak load demand. From their results, they concluded that their work gave better outcomes than the best paper of the competition. Dong, Xia, Tu, & Xing [21] expressed the choice of the cost parameter and kernel parameter as a two-level optimization problem, in which the parameter values change continuously and optimization approaches can therefore be applied to choose optimal values; the candidate values are evaluated through cross-validation, and the parameters are searched continuously instead of through a discrete approach. Their prototype involves two phases: first, an SVM classifier is built on the basis of the training data; second, GA is used to search for optimal values. From their results, they concluded that their proposed method often produces better results than pre-selected cost methods, although simple pre-selected cost methods work well on some datasets. Zhang [22] suggested an automatic and effective model selection approach. His work builds on evolutionary computation and uses recall, accuracy, and error rate as optimization objectives. The idea is to construct a kernel prototype that is then adapted to the dataset with the help of evolutionary computation; the adaptation procedure is guided by feedback obtained from running the SVM. Both GA and PSO are used as the evolutionary computation approaches to solve the resulting optimization problem, owing to their robustness and global search capability. Saini, Aggarwal & Kumar [13] suggested using GA to optimize the SVM parameters: the regularization parameter C and the kernel parameters are dynamically optimized through GA. In their work, they used a separate time series for each trading interval instead of a single time series to model each day's price profile. From their experiments, they concluded that their model provides better forecasting with reasonable levels of accuracy and stability.

A grid-based ACO technique was introduced by Zhang, Chen, Zhang, & He [23] to select the parameter C and the RBF kernel parameter σ automatically for SVM, instead of choosing them by hand through human skill, so that the generalization error is minimized and the generalization performance is enhanced at the same time. Their work provides higher accuracy and less computation time compared with other methods such as the grid algorithm and the cross-validation approach; the RBF kernel is used to enhance the accuracy of the SVM. However, only one dataset was used to evaluate the performance of the proposed technique. ACO was also used by Fang & Bai [24] to optimize both SVM parameters, C and the kernel parameter σ, in continuous domains. Both C and σ are divided into a number of sub-intervals, and in each sub-interval one point is chosen at random as the location of the artificial ants. Before each iteration starts, the prior knowledge and heuristic information are updated, and in every iteration the transition probability of each ant is computed. The ant moves to the next interval if the state transition rule is met; otherwise, the ant searches for optimal values within its local interval. Their results showed a very promising hybrid SVM model for forecasting share prices in terms of accuracy and generalization ability. Lu, Zhou, He, & Liu [27] proposed using PSO, which is well suited to global optimization, for SVM parameter optimization. They treated the parameters as particles, and PSO is applied to obtain optimal values for them. Their work shows that both accuracy and efficiency are enhanced.
III. THE PROPOSED ALGORITHM

This study constructs ACOR to optimize the SVM classifier parameters. An ant's solution represents a combination of the classifier parameters, C and γ, based on the Radial Basis Function (RBF) kernel of the SVM classifier. The classification accuracy of the built SVM classifier is used to direct the updating of the solution archives, and based on the solution archive the transition probability is computed to choose a solution path for an ant. In implementing the proposed scheme, this study uses the RBF kernel function for the SVM classifier because of its capability to manage high-dimensional data [7], its good performance in most cases [8], and the fact that it requires only one parameter, the kernel parameter gamma (γ) [9]. The overall process of hybridizing ACOR and SVM (ACOR-SVM) is depicted in Fig. 1.

The main steps are: (1) selecting the feature subset, (2) initializing the solution archives and algorithm parameters, (3) constructing solutions for C and γ, (4) building the SVM classifier model, and (5) updating the solution archives. In the feature subset selection step, the F-score is used as a measure of the importance of each feature; it judges the discrimination capability of a feature, and a high F-score indicates a more discriminative feature. The F-score is calculated as follows [28]:

F\text{-}Score_i = \frac{\sum_{c=1}^{v} \left(\bar{x}_i^{(c)} - \bar{x}_i\right)^2}{\sum_{c=1}^{v} \frac{1}{N_i^{(c)} - 1} \sum_{j=1}^{N_i^{(c)}} \left(x_{i,j}^{(c)} - \bar{x}_i^{(c)}\right)^2}, \quad i = 1, 2, \ldots, N_f    (1)

where v is the number of categories of the target variable, N_f is the number of features, N_i^{(c)} is the number of samples of the ith feature with categorical value c, c ∈ {1, 2, …, v}, x_{i,j}^{(c)} is the jth training sample of the ith feature with categorical value c, j ∈ {1, 2, …, N_i^{(c)}}, \bar{x}_i is the average of the ith feature over the whole dataset, and \bar{x}_i^{(c)} is the average of the ith feature over the samples with categorical value c.

After computing the F-score for each feature in the dataset, the average F-score is computed and taken as the threshold for choosing features: features with F-scores equal to or greater than the threshold are placed in the feature subset, and this subset is presented to the SVM.
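A minimal sketch of this feature subset selection step, implementing Eq. (1) and the average-F-score threshold described above, is shown below; the function names and the NumPy implementation are illustrative assumptions.

```python
# Sketch of F-score based feature selection (Eq. 1) with the mean F-score as threshold.
import numpy as np

def f_scores(X, y):
    """F-score of each feature (Eq. 1): columns of X are features, y holds class labels."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    overall_mean = X.mean(axis=0)
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]                       # samples whose target value is c
        class_mean = Xc.mean(axis=0)
        num += (class_mean - overall_mean) ** 2
        den += ((Xc - class_mean) ** 2).sum(axis=0) / (len(Xc) - 1)
    return num / den

def select_features(X, y):
    """Indices of features whose F-score is at least the average F-score."""
    scores = f_scores(X, y)
    return np.where(scores >= scores.mean())[0]
```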
In the initialization step, two solution archives are needed, one for parameter C and one for parameter γ, so that each ant can establish a solution path and the transition probabilities for C and for γ can be designed. The value ranges of C and γ are sampled at random according to the parameter k, which is the size of the solution archives. The weight w_l is then computed for each sample of C and γ as follows:

w_l = \frac{1}{q k \sqrt{2\pi}} \, e^{-\frac{(l-1)^2}{2 q^2 k^2}}    (2)

where k is the size of the solution archive and q is an algorithm parameter that controls the diversification of the search process. These values are stored in the solution archives.
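As an illustration of this initialization step, the sketch below fills the two solution archives with k random samples of C and γ and computes the rank weights of Eq. (2). The search ranges and the values of k and q are assumptions for illustration; this excerpt of the paper does not state them.

```python
# Sketch of the initialization step: two solution archives (for C and gamma)
# filled with k random samples, plus the Gaussian rank weights of Eq. (2).
import numpy as np

rng = np.random.default_rng(0)
k, q = 50, 0.1                    # archive size and diversification parameter (assumed)

# Assumed search ranges for C and gamma.
archive_C = rng.uniform(2.0**-5, 2.0**15, size=k)
archive_gamma = rng.uniform(2.0**-15, 2.0**3, size=k)

# Gaussian rank weights of Eq. (2); rank l = 1 denotes the best archived solution.
l = np.arange(1, k + 1)
w = np.exp(-((l - 1) ** 2) / (2.0 * q**2 * k**2)) / (q * k * np.sqrt(2.0 * np.pi))
```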
[Fig. 1. Flowchart of the ACOR-SVM process: calculate the F-score for each feature, build the SVM classifier, train the classifier via k-fold cross-validation, calculate the classification accuracy, and repeat until the termination condition is satisfied.]
V. CONCLUSIONS AND FUTURE WORK

This study investigated a hybrid ACOR and SVM technique to obtain optimal model parameters. Experimental results on seven public UCI datasets showed promising performance in terms of test accuracy and training time. Possible extensions include allowing ACOR-SVM to simultaneously optimize both the SVM parameters and the feature subset using mixed-variable ACO (ACOR-MV). Incremental Continuous ACO (IACOR) may also be a good alternative for optimizing the classifier parameter values. Other kernel functions besides RBF, application to other SVM variants, and multiclass data are considered possible future work in this area.

ACKNOWLEDGEMENTS

The authors wish to thank the Ministry of Higher Education Malaysia for funding this study under the Fundamental Research Grant Scheme, S/O code 12377, and RIMC, Universiti Utara Malaysia, Kedah, for the administration of this study.

REFERENCES

[1] H. Orkcu and H. Bal, "Comparing performance of back propagation and genetic algorithms in the data classification," Expert Systems with Applications, vol. 38, no. 4, pp. 3703-3709, Apr. 2011.
[2] V. Tseng and C. Lee, "Effective temporal data classification by integrating sequential pattern mining and probabilistic induction," Expert Systems with Applications, vol. 36, no. 5, pp. 9524-9532, Jul. 2009.
[3] R. Sivagaminathan and S. Ramakrishnan, "A hybrid approach for feature subset selection using neural networks and ant colony optimization," Expert Systems with Applications, vol. 33, no. 1, pp. 49-60, Jul. 2007.
[4] W. Liu and D. Zhang, "Feature subset selection based on improved discrete particle swarm and support vector machine algorithm," Information Engineering and Computer Science, Wuhan, China, 2009, pp. 586-589.
[5] H. Zhang and H. Mao, "Feature selection for the stored-grain insects based on PSO and SVM," Knowledge Discovery and Data Mining, Moscow, Russia, 2009, pp. 586-589.
[6] Y. Ye, L. Chen, D. Wang, T. Li, Q. Jiang, and M. Zhao, "SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging," Computer Virology, vol. 5, no. 4, pp. 283-293, Nov. 2009.
[7] S. Moustakidis and J. Theocharis, "SVM-FuzCoC: A novel SVM-based feature selection method using a fuzzy complementary criterion," Pattern Recognition, vol. 43, no. 11, pp. 3712-3729, Nov. 2010.
[8] H. Zhang, M. Xiang, C. Ma, Q. Huang, W. Li, W. Xie, Y. Wei, and S. Yang, "Three-class classification models of LogS and LogP derived by using GA-CG-SVM approach," Molecular Diversity, vol. 13, no. 2, pp. 261-268, May 2009.
[9] C. Huang and C. Wang, "A GA-based feature selection and parameters optimization for support vector machines," Expert Systems with Applications, vol. 31, no. 2, pp. 231-240, Aug. 2006.
[10] N. Ayat, M. Cheriet, and C. Suen, "Automatic model selection for the optimization of SVM kernels," Pattern Recognition, vol. 38, no. 10, pp. 1733-1745, Oct. 2005.
[11] X. Zhang, X. Chen, and Z. He, "An ACO-based algorithm for parameter optimization of support vector machines," Expert Systems with Applications, vol. 37, no. 9, pp. 6618-6628, Sep. 2010.
[12] S. Abbas and M. Arif, "Electric load forecasting using support vector machines optimized by genetic algorithm," INMIC, Islamabad, Pakistan, 2006, pp. 395-399.
[13] L. Saini, S. Aggarwal, and A. Kumar, "Parameter optimization using genetic algorithm for support vector machine-based price-forecasting model in national electricity market," IET Generation, Transmission & Distribution, vol. 4, no. 1, pp. 36-49, 2010.
[14] K. Socha and C. Blum, "Ant colony optimization," in Metaheuristic Procedures for Training Neural Networks, vol. 36, E. Alba and R. Martí, Eds. US: Springer, 2006, pp. 153-180.
[15] C. Blum, "Ant colony optimization: introduction and recent trends," Physics of Life Reviews, vol. 2, no. 4, pp. 353-373, Dec. 2005.
[16] K. Socha and M. Dorigo, "Ant colony optimization for continuous domains," European Journal of Operational Research, vol. 185, no. 3, pp. 1155-1173, Mar. 2008.
[17] K. Socha, "Ant colony optimization for continuous and mixed-variable domains," Ph.D. dissertation, Université Libre de Bruxelles, 2008.
[18] F. Imbault and K. Lebart, "A stochastic optimization approach for parameter tuning of support vector machines," Proc. of the IEEE International Conference on Pattern Recognition (ICPR '04), 2004, pp. 597-600, Washington, USA.
[19] H. Frohlich and A. Zell, "Efficient parameter selection for support vector machines in classification and regression via model-based optimization," Proc. of the IEEE International Joint Conference on Neural Networks (IJCNN '05), 2005, pp. 1431-1436, Montreal, Canada.
[20] M. Adankan, M. Cheriet, and N. Ayat, "Optimizing resources in model selection for support vector machines," Proc. of the IEEE International Joint Conference on Neural Networks (IJCNN '05), 2005, pp. 925-930, Montreal, Canada.
[21] Y. Dong, Z. Xia, M. Tu, and G. Xing, "An optimization method for selecting parameters in support vector machines," IEEE International Conference on Machine Learning and Applications (ICMLA 2007), 2007, pp. 1-6, Cincinnati, OH, USA.
[22] Y. Zhang, "Evolutionary computation based automatic SVM model selection," IEEE International Conference on Natural Computation (ICNC '08), 2008, pp. 66-70, Jinan, China.
[23] X. Zhang, X. Chen, Z. Zhang, and Z. He, "A grid-based ACO algorithm for parameters optimization in support vector machines," IEEE International Conference on Granular Computing (GrC 2008), 2008, pp. 805-808, Hangzhou, China.
[24] X. Fang and T. Bai, "Share price prediction using wavelet transform and ant colony algorithm for parameters optimization in SVM," IEEE Global Congress on Intelligent Systems (GCIS '09), 2009, pp. 288-292, Xiamen, China.
[25] Y. Dong, Z. Xia, M. Tu, and G. Xing, "An optimization method for selecting parameters in support vector machines," IEEE International Conference on Machine Learning and Applications (ICMLA 2007), 2007, pp. 1-6, Cincinnati, OH.
[26] H. Frohlich and A. Zell, "Efficient parameter selection for support vector machines in classification and regression via model-based optimization," Proc. of the IEEE International Joint Conference on Neural Networks (IJCNN '05), 2005, pp. 1431-1436, Montreal, Canada.
[27] N. Lu, J. Zhou, Y. He, and Y. Liu, "Particle swarm optimization for support vector machine model," IEEE International Conference on Intelligent Computation Technology and Automation (ICICTA 2009), 2009, pp. 283-286, Changsha, China.
[28] C. Huang, "ACO-based hybrid classification system with feature subset selection and model parameters optimization," Neurocomputing, vol. 73, no. 1-3, pp. 438-448, Dec. 2009.
[29] UCI Repository of Machine Learning Databases, Department of Information and Computer Science, University of California, Irvine, CA, <http://www.ics.uci.edu/mlearn/MLRepository>, 2012.
[30] S. Ding and S. Li, "Clonal selection algorithm for feature selection and parameters optimization of support vector machines," IEEE Symposium on Knowledge Acquisition and Modeling (KAM '09), 2009, pp. 17-20, Wuhan, China.
[31] C. Huang and J. Dun, "A distributed PSO-SVM hybrid system with feature selection and parameter optimization," Applied Soft Computing, vol. 8, no. 4, pp. 1381-1391, Sep. 2008.