Parameter Control Mechanisms in Differential Evolution: A Tutorial Review and Taxonomy
Abstract: Differential evolution (DE) is a promising algorithm for continuous optimization. Its two parameters, CR and F, have a great effect on algorithm performance. In recent years, many DE algorithms with parameter control mechanisms have been proposed. In this paper we propose a taxonomy to classify these algorithms according to the number of candidate parameter values, the number of parameter values used in a single generation, and the source of considered information. We classify twenty-three recent studies into nine categories and review their design features. Two types of relationships between these algorithms and several research directions are also summarized.

Index Terms: differential evolution, parameter control, adaptive, self-adaptive, classification, taxonomy

I. INTRODUCTION
Differential evolution (DE) [1][2] has been recognized as a promising algorithm for continuous optimization in the last decade. It is characterized by the use of the difference between individuals in the mutation operator and by a local selection that compares one parent with its offspring to determine the survivor.

Table I gives the pseudo code of a typical DE algorithm. In each generation G, every individual serves as the target vector. In the mutation step, several non-identical individuals are chosen randomly. One individual serves as the base vector, and an amplified difference vector is added to it to form a donor vector. Next, a trial vector is produced by taking the gene values from the target vector or the donor vector probabilistically. Finally, the trial vector replaces the target vector if the former is not worse than the latter. The DE in Table I is denoted by rnd/1/bin. Common variants include best/1/bin, rnd/2/bin, and so on [2].

DE has three parameters, NP, CR, and F. Experimental results have shown that their values have a great effect on the convergence speed and solution quality. Setting parameter values for evolutionary algorithms is carried out in two ways: the parameter tuning method tests different values and runs the algorithm with the best (and fixed) value; the parameter control method adjusts the parameter values during the execution of the algorithm. Although much advice has been given for parameter tuning, finding proper parameter values is still a time-consuming process. Moreover, we may need different parameter values for different stages in the evolution process, different individuals (search regions), and even different objectives (in the case of multiobjective optimization).

Eiben et al. [3] classified parameter control methods into three groups: deterministic, adaptive, and self-adaptive. The difference between the first two methods is that the adaptive method considers feedback information during the evolution process. The self-adaptive method is characterized by encoding parameters on the chromosomes and evolving the parameters in the same way as the decision variables. After reviewing recent DE algorithms with parameter control, we found that most algorithms fall into the same group (the adaptive group) according to the classification scheme in [3]. This motivates us to propose a new taxonomy and notation to identify the features of these parameter control methods and to clarify the similarities and differences between them.

The rest of this paper is organized as follows. In Section II we describe the proposed taxonomy and classification criteria. Section III reviews nine categories of parameter control mechanisms in twenty-three DE studies. Section IV summarizes relationships of algorithm design and performance comparison among these algorithms. Conclusions and research directions are given in Section V.

TABLE I. PSEUDO CODE OF DIFFERENTIAL EVOLUTION
NP: population size    G: generation number    D: problem dimension
CR: crossover rate     F: scaling factor

Initialize the population. G = 1.
while the stopping criterion is not met
    for i = 1 to NP    // for each target vector X_{i,G} = {x_{1,i,G}, x_{2,i,G}, …, x_{D,i,G}}
        // mutation: generate a donor vector V_{i,G} = {v_{1,i,G}, v_{2,i,G}, …, v_{D,i,G}}
        V_{i,G} = X_{r1^i,G} + F⋅(X_{r2^i,G} - X_{r3^i,G})
        // crossover: generate a trial vector U_{i,G} = {u_{1,i,G}, u_{2,i,G}, …, u_{D,i,G}}
        for j = 1 to D
            u_{j,i,G} = v_{j,i,G},  if U_j(0,1) ≤ CR ∨ j = j_rnd
                        x_{j,i,G},  otherwise
        end for
        // selection: accept the trial vector if not worse than the target vector
        X_{i,G+1} = U_{i,G},  if f(U_{i,G}) ≤ f(X_{i,G})
                    X_{i,G},  otherwise
    end for
    G = G + 1
end while
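The pseudo code in Table I can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the reference implementation of any cited study: the function name `de_rnd_1_bin`, the default values of NP, CR, and F, the fixed generation budget used as the stopping criterion, and the requirement that the three random indices differ from the target index i are all choices made here for the example. Trial vectors are not repaired when they leave the initial box, because Table I does not specify a repair rule.

```python
import random


def de_rnd_1_bin(f, bounds, NP=20, CR=0.9, F=0.5, max_gen=200, seed=0):
    """Minimize f with DE rnd/1/bin; returns (best vector, best objective value)."""
    if NP < 4:
        raise ValueError("NP must be at least 4 to pick three distinct partners")
    rng = random.Random(seed)
    D = len(bounds)
    # Initialize the population uniformly inside the given box.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(NP)]
    fit = [f(x) for x in pop]

    for _ in range(max_gen):  # assumed stopping criterion: generation budget
        next_pop, next_fit = [], []
        for i in range(NP):
            # Mutation: three mutually distinct indices, here also different from i.
            r1, r2, r3 = rng.sample([k for k in range(NP) if k != i], 3)
            donor = [pop[r1][j] + F * (pop[r2][j] - pop[r3][j]) for j in range(D)]

            # Binomial crossover: gene j_rnd always comes from the donor.
            j_rnd = rng.randrange(D)
            trial = [
                donor[j] if (rng.random() <= CR or j == j_rnd) else pop[i][j]
                for j in range(D)
            ]

            # Selection: keep the trial vector if it is not worse than the target.
            f_trial = f(trial)
            if f_trial <= fit[i]:
                next_pop.append(trial)
                next_fit.append(f_trial)
            else:
                next_pop.append(pop[i])
                next_fit.append(fit[i])
        pop, fit = next_pop, next_fit  # generation G + 1 replaces generation G

    best = min(range(NP), key=fit.__getitem__)
    return pop[best], fit[best]
```

For instance, `de_rnd_1_bin(lambda x: sum(v * v for v in x), [(-5.0, 5.0)] * 3)` minimizes a three-dimensional sphere function. CR and F stay fixed for the whole run, which is exactly the setting that the parameter control mechanisms reviewed in this paper replace.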
978-1-4673-5873-6/13/$31.00 ©2013 IEEE
II. PROPOSED TAXONOMY
Although some studies addressed the dynamic control of the population size, most studies fixed the population size and focused on the control of the other two parameters, CR and F. In this paper we only consider the studies of DE that control the values of CR and F. We propose to distinguish the parameter control mechanisms by three aspects: (1) the number of candidate parameter values, (2) the number of parameter values used in a single generation, and (3) the source of considered information. The different designs and corresponding notations in the proposed taxonomy are detailed in the following.

1) The number of candidate parameter values: Almost all existing parameter control mechanisms allow any value in a predefined range, e.g. [0, 1], for CR. In our survey, only one study selected parameter values from a finite set. We denote these two kinds of strategies by con (continuous) and dis (discrete).

2) The number of parameter values used in a single generation: In this aspect, we identify four kinds of strategies in the literature.
a) 1: This is the simplest strategy. All offspring are produced by the same parameter value in a generation.
b) mul (multiple): This kind of strategy draws a random value from a specified distribution every time an offspring is produced. For example, it may draw a value for the parameter F from a normal distribution to generate one offspring and draw another value for another offspring.
c) idv (individual): This could be the most popular strategy. It associates one parameter value with each individual. When an individual i serves as the target vector, its Fi and CRi will be used to generate the donor and trial vectors.
d) var (variable): It is like the idv strategy, but the parameter values are associated with the decision variables, not with individuals. We found one study taking this kind of strategy.

3) The source of considered information: When the parameter value is adjusted, information can be collected from different sources. We classify them into four groups.
a) rnd (random): This kind of strategy selects parameter values from random distributions such as the uniform distribution, normal distribution, and Cauchy distribution.
b) pop (population): It considers the statistics collected from the entire population. Common statistics include the population diversity and the success rate of generating better offspring.
c) par (parent): This strategy is used together with the idv strategy in the second aspect. It adjusts the parameter values according to the parameter values of the parents selected for generating the donor and trial vectors.
d) idv (individual): This strategy is also used together with the idv strategy in the second aspect. It adjusts the parameter values based on the records of historical values of the target vector.

In the literature of DE, a standard three-field notation has been commonly adopted to describe the mutation strategy. For example, the rnd/1/bin strategy means that (1) the base vector is selected randomly, (2) one difference vector is used in generating a donor vector, and (3) the binomial crossover is used to produce the trial vector. Similarly, we propose a three-field notation to give a simple and pertinent tag to the parameter control mechanisms. For example, the con/mul/pop strategy means that (1) parameter values are from a continuous range, (2) multiple values are used in a single generation, and (3) parameter values are adjusted based on the statistics collected from the entire population.

III. LITERATURE REVIEW
We classify the literature on the parameter control of DE into nine groups. For each group, we will give some examples and describe their core design ideas.

A. con/1/pop
Ali and Törn [4] proposed the DEPD, in which the value of CR is fixed and the value of F is adjusted by (1). Fmin denotes the minimum value of F, and fmax/fmin denotes the maximum/minimum fitness value in the population. When the difference between the fitness values of the best and the worst individuals decreases, the value of F increases. It follows the common idea that a larger perturbation is made when the population diversity gets lower.

$$F = \begin{cases} \max\{F_{\min},\ 1 - |f_{\max}/f_{\min}|\}, & \text{if } |f_{\max}/f_{\min}| < 1, \\ \max\{F_{\min},\ 1 - |f_{\min}/f_{\max}|\}, & \text{otherwise.} \end{cases} \tag{1}$$

Liu and Lampinen proposed to use two fuzzy logic controllers (FLCs) to adjust CR and F in their FADE [5]. The inputs of the FLC are the change of values of decision variables (d1) and the change of objective values (d2) between two generations. When d1 is small, CR and F increase as d2 increases. When d2 is medium or large, CR and F increase as d1 increases.

The ADEA [6] was proposed by Qian and Li to deal with multiobjective optimization problems. It follows the well-known NSGA-II [7] to separate the individuals into different fronts and calculates a similar crowding measure. The value of F is adjusted according to how evenly the individuals are distributed on the fronts and how many individuals are non-dominated solutions. The detailed equation is expressed in (2). Assume that there are k fronts and there are mj individuals in the jth front. The symbol dij is the crowding measure of an individual i in the jth front, dj is the average crowding measure in the jth front, d is the average crowding measure of all individuals, and df is the Euclidean distance between the two boundary solutions. |P| and |Q| denote the number of non-dominated solutions and the population size, respectively.

Generally speaking, the value of F increases when the individuals are not evenly distributed and when the number of non-dominated solutions is small.
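Rule (1) for DEPD, given earlier in this subsection, can be evaluated with a few lines of Python. The sketch below is only an illustration: the function name `depd_scaling_factor` and the default `F_min = 0.4` are assumptions made here, the ratios are taken in absolute value as in the table entry "abs(fmax/fmin)", and the behavior for a zero fitness value (raising an error) is a choice of this example because the rule itself is undefined there.

```python
def depd_scaling_factor(f_max, f_min, F_min=0.4):
    """Scaling factor F computed from the population's extreme fitness values.

    f_max and f_min are the maximum and minimum fitness values in the
    population; F_min is the lower bound on F (illustrative default).
    """
    if f_max == 0 or f_min == 0:
        raise ValueError("rule (1) is undefined when a fitness value is zero")
    ratio = abs(f_max / f_min)
    if ratio < 1:
        return max(F_min, 1 - ratio)
    return max(F_min, 1 - abs(f_min / f_max))
```

Because both branches subtract a ratio whose magnitude is at most one, the returned value always lies in [F_min, 1] whenever F_min is at most one.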
probability of producing larger values. Thus, their NSDE adjusts the value of F by the two distributions in equal

$$CR_m = \sum_{j=1} w_j \cdot CR_{rec}(j) \tag{7}$$
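The weighted mean in (7) can be sketched as below. The weight definition is an assumption taken from the summary table of this paper, which describes the weights as the portion of fitness improvement achieved with each recorded CR value; the names `weighted_cr_mean`, `cr_rec`, and `df_rec` are hypothetical and chosen for this example only.

```python
def weighted_cr_mean(cr_rec, df_rec):
    """Weighted mean of recorded successful CR values.

    cr_rec: successful CR values recorded during the learning period.
    df_rec: fitness improvement obtained with each recorded CR value.
    Assumption: w_j is the share of df_rec[j] in the total improvement.
    """
    if len(cr_rec) != len(df_rec):
        raise ValueError("cr_rec and df_rec must have the same length")
    total = sum(df_rec)
    if total <= 0:
        raise ValueError("total fitness improvement must be positive")
    weights = [df / total for df in df_rec]
    return sum(w * cr for w, cr in zip(weights, cr_rec))
```

With `cr_rec = [0.2, 0.8]` and `df_rec = [1.0, 3.0]`, the weights are 0.25 and 0.75, so the CR value that produced the larger improvement dominates the mean.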
Later, Brest et al. proposed jDE-2 [16] by integrating the idea of multiple mutation strategies in SaDE into jDE. It records the values of F and CR for each of the three adopted mutation strategies. In addition, jDE-2 replaces the k worst individuals every l generations with parameter values randomly selected from the feasible range. This may speed up the adaptation of parameter values.

Soliman and Bui [17] proposed a control strategy like that in jDE but added more randomness in the control of F. Instead of controlling the value of F directly, their strategy samples the F value from the Cauchy distribution and adjusts the scale parameter of the Cauchy distribution with a small probability (τ1). The value of CR is controlled in the same way as in jDE. Besides τ1 and τ2, three more parameters, μ, δl, and δu, are required. The authors did not name their strategy, and in the following we call it CSDE.

$$F_{i,g+1} = \begin{cases} C(\mu, \delta_{i,g+1}), & \text{if } U_1(0,1) < \tau_1, \\ C(\mu, \delta_{i,g}), & \text{otherwise.} \end{cases} \tag{20}$$

$$\delta_{i,g+1} = \delta_l + \delta_u \cdot U_2(0,1) \tag{21}$$

The strategy in MOSADE [18] can be viewed as a special case of that in jDE with τ1 and τ2 set to 1. In other words, the values of F and CR are re-sampled at every generation.

E. con/idv/pop
The con/idv/pop strategy records parameter values on the individuals and adjusts the values using the information on the target vector as well as the whole population. The RADE [19]

F. con/idv/par
The con/idv/par strategy evolves the parameters on the individuals in the same way as it evolves the decision variables. In other words, the parameter values are adjusted based on the information on the parents. This category matches the "self-adaptive" category in Eiben et al. [3]. (In fact, most of the so-called self-adaptive DE algorithms in the literature fall into the "adaptive" category according to their taxonomy. Our taxonomy helps to further identify their features.)

The SPDE by Abbass [21] adjusts the value of CR of a target vector i according to the values of CR of the three randomly selected parents r1, r2, and r3, as (24) defines.

$$CR_i = CR_{r1} + N(0,1) \cdot (CR_{r2} - CR_{r3}) \tag{24}$$

Omran et al. [22] proposed the SDE, which adjusts the value of F by (25). Note that the individuals used to adjust the F value are different from the individuals used to adjust the decision variables.

$$F_i = F_{r4} + N(0,0.5) \cdot (F_{r5} - F_{r6}) \tag{25}$$

Instead of using the normal distribution as in the SPDE and SDE, the DESAP [23] uses the value of the scaling factor F in the adjustment of CR in (26). In the experiments, the value of F was fixed at one.

$$CR_i = CR_{r1} + F \cdot (CR_{r2} - CR_{r3}) \tag{26}$$
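The three con/idv/par update rules (24) to (26) share one shape: a base parent's parameter value plus a scaled difference of two other parents' values. The sketch below writes them side by side under stated assumptions: the function names are hypothetical, the second argument of N(0, 0.5) is read as a standard deviation, and no repair is applied when a result leaves the feasible range, since the text above does not give a repair rule.

```python
import random


def spde_cr(cr, r1, r2, r3, rng):
    """Rule (24): CR of the target from three parents, scaled by N(0, 1)."""
    return cr[r1] + rng.gauss(0.0, 1.0) * (cr[r2] - cr[r3])


def sde_f(f, r4, r5, r6, rng):
    """Rule (25): F of the target from three parents, scaled by N(0, 0.5).

    Assumption: 0.5 is treated as the standard deviation.
    """
    return f[r4] + rng.gauss(0.0, 0.5) * (f[r5] - f[r6])


def desap_cr(cr, r1, r2, r3, F=1.0):
    """Rule (26): same shape as (24), with the fixed scaling factor F as the scale."""
    return cr[r1] + F * (cr[r2] - cr[r3])


# Usage: cr and f hold one parameter value per individual.
rng = random.Random(0)
cr = [0.1, 0.5, 0.9, 0.3]
f = [0.4, 0.6, 0.8, 0.5]
new_cr_spde = spde_cr(cr, 0, 1, 2, rng)
new_f_sde = sde_f(f, 1, 2, 3, rng)
new_cr_desap = desap_cr(cr, 0, 1, 2)
```

Writing the rules this way makes the parallel with the rnd/1 mutation of the decision variables visible: the parameter is perturbed by a difference between parents, so its step size shrinks as the parents' parameter values converge.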
| Category* | Algorithm (year) | Adjusted parameters | Additional parameters** | #Obj | Brief descriptions |
|---|---|---|---|---|---|
| con/1/pop | DEPD 2004 | F | | SO | (1) Fixed CR; (2) Increase F when abs(fmax/fmin) decreases. |
| | FADE 2005 | CR, F | membership functions | SO | (1) Two fuzzy logic controllers for CR and F, respectively; (2) Inputs of the controllers are based on the average genotypic and phenotypic distances between two generations. |
| | ADEA 2008 | F | | MO | (1) Fixed CR; (2) Increase F when the individuals are not evenly distributed and when the number of non-dominated solutions is small. |
| con/mul/rnd | NSDE 2008 | F | | SO | (1) CR~U(0, 1); (2) F~N(0.5, 0.5)/Cauchy with equal probability. |
| con/mul/pop | SaDE 2005/2009 | CR, s | LP, ε | SO | (1) CR~N(CRm, 0.1), CRm as the median of successful CR values; (2) F~N(0.5, 0.3); (3) Selection probability of mutation strategies is proportional to the success rate. |
| | SaNSDE 2008 | CR, F, s | LP | SO | (1) Descendant of NSDE and SaDE; (2) CR~N(CRm, 0.1), CRm as the weighted average of successful CR values, where weights are the portion of improvement on fitness; (3) F~N(0.5, 0.5)/Cauchy with probability depending on the success rate. |
| | JADE 2009 | CR, F | c | SO | (1) CR~N(μ, 0.1), μ as the weighted sum of current μ and arithmetic mean of successful CR values; (2) F~C(μ, 0.1), μ as the weighted sum of current μ and Lehmer mean of successful F values. |
| | JADE2 2008 | CR, F | c | MO | (1) Descendant of JADE, using the same parameter control mechanism. |
| | SaJADE 2011 | CR, F, s | c | SO | (1) Descendant of JADE; (2) Selection of mutation strategies by the same mechanism as in JADE. |
| con/idv/rnd | jDE 2006 | CR, F | τ1, τ2 | SO | (1) Change CR by U(0, 1) with probability τ1; (2) Change F by U(Fmin, Fmax) with probability τ2. |
| | jDE-2 2006 | CR, F, s | τ1, τ2, k, l | SO | (1) Descendant of jDE, adding multiple mutation strategies; (2) Re-initialize the parameter values of the worst k individuals every l generations. |
| | CSDE 2008 | CR, F | τ1, τ2, μ, δl, δu | SO | (1) Change CR by U(0, 1) with probability τ1; (2) Change F by C(μ, δ) with probability τ2, δ~U(δl, δu). |
| | MOSADE 2010 | CR, F | | MO | (1) Descendant of jDE; (2) CR~U(0.0, 0.5); (3) F~U(0.1, 0.9). |
| con/idv/pop | RADE 2008 | F | α, β | SO | (1) For the individuals whose accumulated fitness improvement is among the top 1/β% in the population, keep their F values; for the remaining individuals, set random values; (2) Update F values every α generations. |
| | ISADE 2009 | CR, F | τ1, τ2 | SO | (1) Descendant of jDE; (2) Change the values of CR and F toward the lower bound if the individual's fitness is better than the average fitness over the population. |
| con/idv/par | SPDE 2002 | CR | | MO | (1) CRi = CRr1 + N(0, 1)⋅(CRr2 - CRr3); (2) F~N(0, 1). |
| | SDE 2005 | F | | SO | (1) CR~N(0.5, 0.15); (2) Fi = Fr4 + N(0, 0.5)⋅(Fr5 - Fr6). |
| | DESAP 2006 | CR | | SO | (1) CRi = CRr1 + F⋅(CRr2 - CRr3); (2) Fixed F value. |
| | DEMOwSA 2007 | CR, F | τ | MO | (1) CRi = (1/4)⋅(CRi + CRr1 + CRr2 + CRr3)⋅e^(τ⋅N(0, 1)); (2) Fi = (1/4)⋅(Fi + Fr1 + Fr2 + Fr3)⋅e^(τ⋅N(0, 1)). |
| con/idv/idv | SFLSDE 2009 | CR, F | τ1, τ2, τ3, τ4 | SO | (1) Descendant of jDE; (2) Use the golden section search and hill climbing search probabilistically to adjust the F value of the best individual. |
| | SspDE 2011 | CR, F, s | LP, RP | SO | (1) Each individual has its own lists, CRLi, FLi, and SLi; (2) Successful CR, F, and strategy are recorded in wCRLi, wFLi, and wSLi; (3) Every LP generations, CRLi, FLi, and SLi are refilled by wCRLi, wFLi, and wSLi, respectively; (4) Refilled values are taken from the winning lists with probability RP. |
| con/var/pop | APDE 2004 | CR, F | γ | MO | (1) Adjust the parameter values to maintain the variances of decision variables and average variance of objective values between two generations. |
| dis/mul/pop | DEBR 2009 | CR, F, s | n0, δ | SO | (1) Use prespecified combinations of CR, F, and mutation strategies; (2) Select one combination with probability proportional to the success rate. |

* Compared with the classification scheme in [3], the ⋅/⋅/rnd category matches the deterministic parameter control category, the con/idv/par category matches the self-adaptive parameter control category, and the remaining categories match the adaptive parameter control category.
** The upper and lower bounds of CR and F values are not listed as additional parameters here.
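The three-field tags used in the table, together with the correspondence to the scheme in [3] stated in the first footnote, can be captured in a small data structure. The sketch below is illustrative only: the names `ControlTag`, `parse_tag`, and `eiben_category` are hypothetical, and the allowed field values are the ones listed in Section II.

```python
from typing import NamedTuple

# Field values defined by the taxonomy in Section II.
CANDIDATES = {"con", "dis"}
PER_GENERATION = {"1", "mul", "idv", "var"}
SOURCES = {"rnd", "pop", "par", "idv"}


class ControlTag(NamedTuple):
    candidates: str      # number of candidate parameter values
    per_generation: str  # number of parameter values used in one generation
    source: str          # source of considered information


def parse_tag(tag: str) -> ControlTag:
    """Parse a tag such as 'con/mul/pop' and validate each field."""
    parts = tag.split("/")
    if len(parts) != 3:
        raise ValueError(f"expected three fields separated by '/': {tag!r}")
    candidates, per_generation, source = parts
    if candidates not in CANDIDATES:
        raise ValueError(f"unknown first field: {candidates!r}")
    if per_generation not in PER_GENERATION:
        raise ValueError(f"unknown second field: {per_generation!r}")
    if source not in SOURCES:
        raise ValueError(f"unknown third field: {source!r}")
    return ControlTag(candidates, per_generation, source)


def eiben_category(tag: ControlTag) -> str:
    """Map a tag to the category of [3] as described in the table footnote."""
    if tag.source == "rnd":
        return "deterministic"
    if tag == ControlTag("con", "idv", "par"):
        return "self-adaptive"
    return "adaptive"
```

For example, `eiben_category(parse_tag("con/idv/rnd"))` returns `"deterministic"`, which is how jDE is placed by the footnote even though it is often called self-adaptive in the literature.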