Robust Optimization With Simulated Annealing
All content following this page was uploaded by Dimitris Bertsimas on 11 August 2014.
Abstract Complex systems can be optimized to improve the performance with respect
to desired functionalities. An optimized solution, however, can become suboptimal or even
infeasible when errors in implementation or input data are encountered. We report on a robust
simulated annealing algorithm that does not require any knowledge of the problem's structure.
This is necessary in many engineering applications where solutions are often not explicitly
known and have to be obtained by numerical simulations. While this nonconvex and global
optimization method improves the performance as well as the robustness, it also guarantees
a global optimum that is robust against data and implementation uncertainties. We demonstrate
it on a polynomial optimization problem and on a high-dimensional and complex
nanophotonic engineering problem and show significant improvements in efficiency as well
as in actual optimality.
1 Introduction
Optimization has had a distinguished history in engineering and industrial design. Most
approaches, however, assume that the input parameters are precisely known and that the
implementation does not suffer any errors. To accommodate these sources of errors, sensi-
tivity analysis techniques were developed which find solutions that are least sensitive among
a larger set of optima. However, these methods do not provide designs that are intrinsically
robust against errors.
123
J Glob Optim
There has been evidence illustrating that if errors (in implementation or estimation of
parameters) are not taken into account during the design process, the actual phenomenon can
completely disappear. A prime example is optimizing the truss design for suspension bridges.
The Tacoma Narrows bridge was the first of its kind to be optimized to divert the wind above
and below the roadbed [1]. Only a few months after its opening in 1940, it collapsed due to
moderate winds, which excited twisting vibrational modes. In another example, Ben-Tal and
Nemirovski demonstrated that errors of only 5% can entirely destroy the radiation characteristics
of an otherwise optimized phase-locked and impedance-matched antenna array [2].
Therefore, taking errors into account during the optimization process is of first-order importance.
These considerations have motivated the field of robust optimization. Recent works have
been devoted to problems with convex objectives and constraints (e.g. linear) [3–5]. These
works have shown that a convex optimization problem with parameter uncertainty can be
transformed to another convex optimization problem. This transformation can be either exact
or through a relaxation. However, the final problem can be more complex or can have a sig-
nificantly larger number of constraints. Despite significant advancements, all these results
are limited to convex problems.
Modern engineering design has objectives and constraints that are not explicitly given.
Moreover, solutions are often obtained through numerical simulations. This means that no
internal structure can be exploited. The challenge, thus, lies in robustly designing engineered
and engineering systems which are described through simulations and are exposed to errors.
More recently, local search methods have been extended to robust optimization of
simulation-based problems [6–8]. They entail finding descent directions and iteratively taking
steps along these directions to optimize the robustness, i.e., to reduce the worst-case cost. A
second-order cone program was solved to determine a direction that makes the largest angle
with all worst-case neighbors, i.e., the neighbors found to deviate the most from the desired solution.
The number of considered worst-case scenarios and the dimensionality determined the
computational efficiency of these algorithms [8]. These approaches, however, provided only local
robust optima.
Here, we present a more generic method that is independent of the problem structure and,
thus, is not restricted by dimensionality or other topological features. In fact, it only requires
a black box that provides the function evaluation and the derivative for a given design.
Moreover, we show that this technique can provide a global robust optimum at computationally
reasonable costs. We showcase its performance on a 100-dimensional optimization problem
in nanophotonic design, which has a highly nonconvex cost function as well as a nonconvex
search space.
2 Method
In general, to measure the deviations between the desired and the current performance of a design
x, a cost function f is defined. For the purpose of generality, we do not impose any assumption
on the structure of f, i.e., f can be nonconvex or simply given through a numerical
simulation. The nominal optimization aims to minimize f. There are numerous techniques
to achieve this goal, for which we refer to the standard literature [9–11]. In real-world
problems, the design variables x are subject to errors Δx. Therefore, possible designs are regarded
to reside within an uncertainty set
$$U = \{\, \Delta x \;:\; \|\Delta x\|_2 \le \Gamma \,\}, \tag{1}$$
Fig. 1 2D-Sketch: Robustness is determined by worst-case costs within the uncertainty set (discs) of a design
(point in x–y plane). Robust optimization seeks to place the lowest disc (red), even lower than the nominal
optimum (yellow)
where Γ denotes the size of U, determined by empirical error bars. While our approach
applies to other norms ‖Δx‖_p ≤ Γ in (1) (p being a positive integer, including p = ∞),
we present the case of p = 2. To find a design that is robust against all possible errors in U,
the most conservative approach is to find the worst-case scenarios g(x) = max_{Δx∈U} f(x + Δx)
and to minimize them by modifying the design appropriately. This robust optimization
problem can be expressed as
$$\min_{x} g(x) \equiv \min_{x} \max_{\Delta x \in U} f(x + \Delta x). \tag{2}$$
Since f is nonconvex and possibly not given in an analytically closed form, the inner max-
imization problem will be nonconvex and without a closed form as well. In our method, its
solution is found using local searches within the uncertainty set of the design (see Fig. 1).
These searches, such as gradient ascent sequences, identify a potentially discrete set M(x)
of “bad neighbors” x̂ which have the highest costs in U . Once M(x) is assembled for a given
design x, the robust optimization method seeks to iteratively update x in order to exclude
the elements of M(x) and consequently finds designs that have lower worst-case neighbors,
thereby improving robustness. In earlier works, the robust local search method was used to find
a local robust optimum [6, 8].
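The neighborhood search described above can be sketched as multi-start projected gradient ascent inside the ball ‖Δx‖₂ ≤ Γ. This is a minimal sketch under stated assumptions, not the authors' implementation: the helper names `bad_neighbors` and `worst_case_cost`, the step size, the number of starts and iterations, and the availability of an analytic gradient `grad_f` are all hypothetical choices made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def project_to_ball(delta: Sequence[float], gamma: float) -> Vector:
    """Project a perturbation onto the L2 ball ||delta||_2 <= gamma."""
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= gamma:
        return list(delta)
    return [d * gamma / norm for d in delta]


def bad_neighbors(
    f: Callable[[Vector], float],
    grad_f: Callable[[Vector], Vector],
    x: Sequence[float],
    gamma: float,
    n_starts: int = 10,
    steps: int = 50,
    step_size: float = 0.1,
    seed: int = 0,
) -> List[Vector]:
    """Hypothetical helper: one local maximizer of f near x per random start."""
    rng = random.Random(seed)
    neighbors = []
    for _ in range(n_starts):
        # Random starting perturbation, projected into the uncertainty set.
        delta = project_to_ball([rng.uniform(-gamma, gamma) for _ in x], gamma)
        for _ in range(steps):
            grad = grad_f([xi + di for xi, di in zip(x, delta)])
            # Ascent step on f, then project back onto ||delta||_2 <= gamma.
            delta = project_to_ball(
                [di + step_size * gi for di, gi in zip(delta, grad)], gamma
            )
        neighbors.append([xi + di for xi, di in zip(x, delta)])
    return neighbors


def worst_case_cost(f: Callable[[Vector], float], neighbors: List[Vector]) -> float:
    """Estimate g(x) as the largest cost found among the bad neighbors."""
    return max(f(n) for n in neighbors)


# Toy usage: f(x) = ||x||^2 at x = (1, 0) with gamma = 0.5.
# The exact worst case is f(1.5, 0) = 2.25.
def sphere(v: Vector) -> float:
    return sum(t * t for t in v)


def sphere_grad(v: Vector) -> Vector:
    return [2.0 * t for t in v]


M = bad_neighbors(sphere, sphere_grad, [1.0, 0.0], gamma=0.5)
g_estimate = worst_case_cost(sphere, M)
```

Because each ascent only finds a local maximizer, the estimate of g(x) is a lower bound on the true worst case; using several random starts is the assumed remedy here.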
To find the global robust minimum, we introduce the method of robust simulated annealing
(RSA). In a pioneering work, Kirkpatrick et al. introduced the method of simulated anneal-
ing for discrete optimization problems [12]. Relying on one of the early works in scientific
computing by Metropolis et al. [13], they replaced the energy of an ensemble by the cost
We interpret the parameter β as the inverse temperature. We set β0 to the inverse of the highest cost in the
bad-neighbor set M(x) (see Eq. 4 below). In the RSA algorithm, gβ(x) is computed as a sum
over the members of M(x), and the limit β → ∞ in Eq. 3 is approximated by the largest available
β. The RSA algorithm maximizes the robustness by iteratively minimizing the worst-case
scenario of the current design xk as follows:
RSA Algorithm:
Step 0. Initialization:
(i) Set iterate k = 0, acceptance number h = 0, and annealing index n = 0.
(ii) Let x0 be the initial decision vector that is arbitrarily chosen.
(iii) Assemble M(x0 ) through a series of gradient ascent sequences within
U (x0 ) [8].
(iv) Compute the initial inverse temperature β0 according to
$$\beta_0 = \frac{1}{\max_{\hat{x} \in M(x_0)} f(\hat{x})}. \tag{4}$$
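Step 0 (iv), together with the Metropolis weighting referred to in the convergence proof, can be sketched as follows. This assumes the standard Metropolis acceptance rule with weights exp(−β g); the remaining steps of the RSA algorithm are not reproduced here, and the helper names and the three neighbor costs are hypothetical.

```python
import math
import random
from typing import Sequence


def initial_inverse_temperature(neighbor_costs: Sequence[float]) -> float:
    """Eq. (4): beta_0 = 1 / max over x_hat in M(x_0) of f(x_hat).

    Assumes the largest neighbor cost is positive (e.g. a squared-error cost);
    otherwise beta_0 would not be a meaningful inverse temperature.
    """
    worst = max(neighbor_costs)
    if worst <= 0:
        raise ValueError("Eq. (4) requires a positive worst-case cost")
    return 1.0 / worst


def metropolis_accept(g_current: float, g_trial: float, beta: float,
                      rng: random.Random) -> bool:
    """Standard Metropolis rule on the worst-case cost g at inverse temperature beta."""
    delta = g_trial - g_current
    if delta <= 0:
        return True  # improvements in robustness are always accepted
    return rng.random() < math.exp(-beta * delta)


# Hypothetical costs of three bad neighbors of x_0.
beta0 = initial_inverse_temperature([0.004, 0.0062, 0.0051])
rng = random.Random(1)
```

A trial design that lowers the worst-case cost is always kept, while a worse one survives only with probability exp(−βΔg), which shrinks as β grows during annealing.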
Theorem 1 Suppose that an arbitrary cost function f (x) with a bounded set of minimum
points has a global robust optimum. Then the RSA Algorithm converges to the global optimum
of the robust optimization problem (2), if all designs x are accessible with equal probability
as n → ∞.
Proof Geman and Geman have shown that a generic simulated annealing algorithm converges
to a global optimum if β increases no faster than βn = ln(n)/β0 and if all
accessible states are equally probable for n → ∞ [14].
The first condition is satisfied by the choice of the annealing schedule in Eq. 7. For the
second condition, let H(Δx) be the probability density of the randomly generated designs x_trial.
In order to show that any design x in the D-dimensional configuration space can be sampled
infinitely often during the annealing time, we have to show that the probability of never visiting
x during the annealing time vanishes. In other words, we have to show that
$$\prod_{n=0}^{\infty} \left(1 - H_n(\Delta x)\right) = 0, \tag{8}$$
which, for 0 ≤ H_n(Δx) < 1, is equivalent to
$$\sum_{n=0}^{\infty} H_n(\Delta x) = \infty. \tag{9}$$
Ingber et al. [15] have shown that the functional form of random generators for Gaussian-Markovian
systems, like the one we employed to generate x_trial, is given by
$$H(\Delta x) = \left(\frac{\beta}{2\pi}\right)^{D/2} \cdot e^{-\frac{\beta}{2}\|\Delta x\|^2}, \tag{10}$$
where 1/β is a measure of the fluctuations (it is the variance of each component, so the standard
deviation is 1/√β) and Δx is the difference between the current and the previous design.
As the algorithm progresses, β is updated. Thus, Eq. 10 can be expressed as
$$H_n(\Delta x) = \left(\frac{\beta_0 e^{cn}}{2\pi}\right)^{D/2} \cdot e^{-\frac{\beta_0 e^{cn}}{2}\|\Delta x\|^2} \ge e^{-\ln(n)}. \tag{11}$$
Now, we can insert Eq. 11 into Eq. 9 and obtain
$$\sum_{n=0}^{\infty} H_n(\Delta x) \ge \sum_{n=1}^{\infty} e^{-\ln(n)} = \sum_{n=1}^{\infty} \frac{1}{n} = \infty. \tag{12}$$
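The divergence argument of Eqs. (8), (9) and (12) can be checked numerically in the extreme case permitted by the bound, H_n = 1/n. The product is started at n = 2 to skip the trivial factor 1 − 1/1 = 0; the telescoping identity ∏_{n=2}^{N} (1 − 1/n) = 1/N shows that the probability of never visiting a design tends to zero, while the harmonic partial sums grow without bound. This is an illustrative check only, and the function names are hypothetical.

```python
import math
from fractions import Fraction


def never_visited_probability(N: int) -> Fraction:
    """Exact value of prod_{n=2}^{N} (1 - 1/n), i.e. H_n = 1/n at every step."""
    p = Fraction(1)
    for n in range(2, N + 1):
        p *= 1 - Fraction(1, n)
    return p


def harmonic_partial_sum(N: int) -> float:
    """sum_{n=1}^{N} 1/n, which grows like ln(N) and therefore diverges."""
    return sum(1.0 / n for n in range(1, N + 1))


for N in (10, 1000, 10000):
    print(N, float(never_visited_probability(N)), harmonic_partial_sum(N))
```

Exact rational arithmetic is used for the product so that the 1/N result is not blurred by rounding.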
Therefore, as n → ∞, all designs can be generated infinitely often. Moreover, the Metropolis
weights are chosen such that the detailed balance condition
$$P(x \to x') \cdot W(x) = P(x' \to x) \cdot W(x'), \tag{13}$$
is satisfied, which ensures that all states in the search space are equally accessible.
Therefore, we can conclude that the local-search-based RSA algorithm converges to
a global optimum as n → ∞.
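The detailed-balance condition (13) can be verified on a small discrete example. The sketch assumes a symmetric proposal (uniform over the other states) and the standard Metropolis acceptance probability min(1, exp(−βΔg)), with W taken as normalized Boltzmann weights exp(−β g); the three costs and the value of β are arbitrary illustrative numbers, not data from the paper.

```python
import math
from typing import List, Sequence


def metropolis_matrix(costs: Sequence[float], beta: float) -> List[List[float]]:
    """Transition matrix P[i][j] = P(x_i -> x_j) for Metropolis with a uniform proposal."""
    m = len(costs)
    q = 1.0 / (m - 1)  # symmetric proposal: pick one of the other m - 1 states
    P = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            if i != j:
                P[i][j] = q * min(1.0, math.exp(-beta * (costs[j] - costs[i])))
        P[i][i] = 1.0 - sum(P[i])  # probability of staying after a rejection
    return P


def boltzmann_weights(costs: Sequence[float], beta: float) -> List[float]:
    """Normalized weights W(x) proportional to exp(-beta * cost(x))."""
    w = [math.exp(-beta * c) for c in costs]
    z = sum(w)
    return [wi / z for wi in w]


costs = [0.3, 0.1, 0.7]  # illustrative worst-case costs of three designs
beta = 5.0
P = metropolis_matrix(costs, beta)
W = boltzmann_weights(costs, beta)
# Largest violation of P(x -> x') W(x) = P(x' -> x) W(x') over all pairs.
max_violation = max(
    abs(W[i] * P[i][j] - W[j] * P[j][i]) for i in range(3) for j in range(3)
)
```

Detailed balance holds pairwise because W(x) min(1, W(x')/W(x)) = min(W(x), W(x')) is symmetric in x and x'; as a consequence, W is a stationary distribution of P.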
Note that the definition of the robustness in Eq. 3 contains β which varies throughout the
annealing process. However, this does not affect the probability density H (x) of the designs.
Moreover, within each iteration of the RSA algorithm (and in fact over 100 acceptances),
β is constant and acts only as a local renormalization factor. Even when β is updated in
Step 3-ii-B, g(xk ) and W (xk ) are recomputed so that the current and the next trial design are
compared at the same constant β ensuring a constant local probability density. Therefore,
this enhancement to the weighting in g(x) does not affect the convergence criteria of the
RSA algorithm. Furthermore, the convergence behavior of RSA is similar to that of adaptive
simulated annealing, since their cores are comparable. However, since the convergence rate
is highly dependent on the underlying problem and our aim is to provide the general foundation
of RSA, we consider this discussion outside the scope of this work.
We demonstrate the performance of the RSA algorithm by applying it to two different
optimization problems. The first application is a robust polynomial optimization problem that
is intended to serve as a demonstration of the main features of RSA as well as its capability
of providing a robust global optimum. This is a simple enough problem in which we can
calculate the robust global optimum. The second application concerns robust optimization in
engineering and is of direct relevance to nanophotonics and nanotechnology design, where
small uncertainties may result in complete failure of an otherwise optimized design. This
problem is high-dimensional and highly nonlinear, and its solution is not known explicitly; it is
thus useful as a prototype for modern engineering design problems. This application is intended
to demonstrate the performance of the RSA algorithm in a high-dimensional search space as
well as its efficiency in providing the robust solution.
To better demonstrate the RSA algorithm, we employ a two-dimensional
polynomial problem. We define a nonconvex polynomial function
The implementation errors Δ = (Δx, Δy) are bounded by ‖Δ‖₂ ≤ 0.5. Therefore, the
robust optimization problem can be posed as
$$\min_{x,y} g(x,y) \equiv \min_{x,y} \max_{\|\Delta\|_2 \le 0.5} f(x + \Delta x,\, y + \Delta y). \tag{15}$$
We need to stress that even though this problem is only two-dimensional, it is already a
difficult problem. Henrion and Lasserre successfully introduced relaxation methods to solve
polynomial optimization problems [16, 17]. Applying the same technique to Problem (15),
however, leads to polynomial semidefinite programs (SDPs) of the form
$$\min_{x,y} h(x,y) \quad \text{s.t.} \quad A(x,y) \succeq 0. \tag{16}$$
Note that the entries A_ij of the semidefinite constraint are multivariate polynomials.
In order to solve this problem approximately, we need to convert it into a significantly
larger SDP. More importantly, the size of the SDP grows with the size of the original problem,
the maximum degree of the polynomials, and the number of variables. For this reason,
polynomial SDPs are not widely used in practice [18].
The nominal cost f(x, y) has multiple local minima and a global minimum at (x*, y*) =
(2.8, 4.0), where f(x*, y*) = −20.8. We determined the global minimum by using the
GloptiPoly software as discussed in Reference [16] and verified it using multiple gradient
descents. We evaluated the worst-case cost function g(x, y) by enumerating the nominal
costs of neighbors on a fine discrete mesh. The contour plot in Fig. 2 illustrates the
worst-case cost surface and suggests that g(x, y) has multiple local minima.
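The mesh-based evaluation of g(x, y) can be sketched as below. The polynomial of Eq. (14) is not reproduced here, so the sketch takes f as an arbitrary callable and is exercised on a hypothetical quadratic stand-in; the mesh resolution `n` and the helper names are likewise assumptions.

```python
from typing import Callable


def worst_case_on_mesh(f: Callable[[float, float], float], x: float, y: float,
                       radius: float = 0.5, n: int = 100) -> float:
    """Approximate g(x, y) = max over ||(dx, dy)||_2 <= radius of f(x + dx, y + dy).

    Enumerates a (2n + 1) x (2n + 1) square grid of offsets with spacing radius / n
    and keeps only offsets inside the disc (with a small tolerance for rounding).
    """
    h = radius / n
    best = float("-inf")
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            dx, dy = i * h, j * h
            if dx * dx + dy * dy <= radius * radius + 1e-12:
                best = max(best, f(x + dx, y + dy))
    return best


def toy_f(x: float, y: float) -> float:
    """Hypothetical stand-in for the polynomial of Eq. (14)."""
    return x * x + y * y


g_at_point = worst_case_on_mesh(toy_f, 1.0, 0.0)
```

Repeating this at every node of an (x, y) grid yields the data for a contour plot of g; the cost is (2n + 1)² evaluations of f per node, which is affordable only in low dimensions such as this two-dimensional example.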
Fig. 2 2D-Sketch: Contour plot of the worst-case cost function g(x, y). The path of the Robust
Simulated Annealing algorithm is shown with red lines as it starts at an arbitrary initial point and converges
towards the robust global minimum
We applied the Robust Simulated Annealing algorithm to this polynomial problem using
arbitrary initial designs (x, y). Figure 2 shows one exemplary run and demonstrates the
convergence towards the robust global minimum. The same global optimum is found when starting
from other arbitrary initial points. The algorithm decreases the worst-case cost significantly
while increasing the nominal cost slightly, which is the “price of robustness”. A much lower
number of iterations is required when starting from a point that is not in the immediate
proximity of a local (robust or nominal) minimum. At the termination point, the last configuration
is surrounded by neighbors that have higher worst-case costs. In fact, a detailed analysis of the
distribution of bad neighbors at termination showed that they lie on the boundary of the
uncertainty set. Any additional trial configuration will lead to a vanishing update probability
and, thus, will be rejected. The surface plot also confirms that the solution is indeed the true
robust global minimum.
In the second application, we demonstrate the efficiency and the performance of the RSA
algorithm by applying it to a prototype electromagnetic problem, as it occurs in nanophotonic
engineering. A plethora of unique characteristics in photonic crystals identified them as prime
candidates for unconventional materials in controlling and manipulating electromagnetic
field propagation [19]. Their peculiar functionalities are based on diffraction phenomena,
which require periodic structures. Upon breaking the spatial symmetry, additional degrees
of freedom are revealed which allow for additional functionality and higher levels of con-
trol. Geremia et al. broke this symmetry by diluting sites and optimizing the location of the
missing scattering sites [20]. However, due to the periodicity requirement of these crystals,
additional degrees of freedom and, hence, their benefit remained restricted. More recently,
Seliger et al. performed unbiased gradient-based optimizations on the spatial distribution of a
large number of dielectric cylinders [21]. The reported aperiodic structure matched a desired
target function up to 95%.
When implemented in the real world, however, the performance of many of these designs
substantially deviates from the theoretical prediction. A key source of this deviation lies
in the presence of uncontrollable implementation errors, such as incorrect positioning and
erroneous shape of scatterers. Since nanophotonic structures exploit nonlinear features, they
are highly susceptible to small perturbations. Therefore, a robustly optimized design is needed
to ensure the desired performance in the presence of errors.
We demonstrate this by applying the RSA method to an inverse design problem, which seeks
to match the performance of 50 dielectric cylinders to a desired target function by varying
the positions of these scattering centers. In the following, we summarize the essentials of the
physical model for the sake of completeness and refer to References [6, 21] for more details.
The model is based on a two-dimensional Helmholtz equation for dielectric scatterers, which
are lossless and non-magnetic. Therefore, this approach scales with frequency and allows
nanophotonic designs to be probed. The wave propagation is strictly two-dimensional, since
the domain is bounded by two metallic plates separated by less than half the incoming
wavelength. Figure 3 illustrates the setup of the domain along with the corresponding electric field
strength. To conform with the laboratory experiment of Seliger et al. [21], we describe the
electric field Ez through the partial differential equation (PDE)
$$\left(\partial_x\left(\mu_{ry}^{-1}\,\partial_x\right) + \partial_y\left(\mu_{rx}^{-1}\,\partial_y\right)\right)E_z - \omega_0^2\,\mu_0\,\varepsilon_0\,\varepsilon_{rz}\,E_z = 0, \tag{17}$$
where µr is the relative and µ0 the vacuum permeability, and εr denotes the relative and
ε0 the vacuum permittivity. Note that in the experimental measurements, the frequency is
fixed to f = 37.5 GHz and only stationary solutions of the Maxwell equations are sought,
taking the boundary conditions into account. We compute the field values using the
finite-difference representation of the PDE and solve a linear equation system. The power along
the target surface for an incident angle θ is interpolated using the nearest four mesh points
and their standard Gaussian weights W(θ) as
$$s_{\mathrm{mod}}(\theta) = \frac{W(\theta)}{2} \cdot \mathrm{diag}(E_z) \cdot E_z.$$
The accuracy of this numerical model has been verified by actual laboratory experiments of identical
configurations [6, 21].
Fig. 3 The electric field of 50 dielectric scattering cylinders (white circles). The incoming radio-frequency
source couples to the waveguide. For this configuration, the power is constant between 30° ≤ θ ≤ 60° and
amounts to 98% of the overall power
The nominal optimization problem seeks to match the power profile along the detection
surface to a top-hat target function sobj, which has a constant maximum between 30° ≤ θ ≤ 60°
and is zero everywhere else. For any given discretized angle θk and configuration p,
a cost functional J measures the deviation of smod from sobj through
$$J(p) = \sum_{k=1}^{m} \left(s_{\mathrm{mod}}(\theta_k) - s_{\mathrm{obj}}(\theta_k)\right)^2. \tag{18}$$
Note that J (p) is not convex in p, and depends on p only through the PDE. Furthermore, the
feasible set of all possible configurations is nonconvex, as cylinders should not overlap.
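The cost functional of Eq. (18) can be sketched for a top-hat target as below. In the paper, s_mod comes from the PDE solve; here it is any callable of the angle. The normalized maximum `level = 1.0` and the detection angles from 0° to 90° in 5° steps are hypothetical choices, not values taken from the experiment.

```python
from typing import Callable, Sequence


def top_hat(theta_deg: float, lo: float = 30.0, hi: float = 60.0,
            level: float = 1.0) -> float:
    """Target s_obj: constant `level` for lo <= theta <= hi, zero elsewhere."""
    return level if lo <= theta_deg <= hi else 0.0


def cost_J(s_mod: Callable[[float], float], thetas: Sequence[float]) -> float:
    """Eq. (18): sum of squared deviations of s_mod from s_obj at the angles theta_k."""
    return sum((s_mod(t) - top_hat(t)) ** 2 for t in thetas)


thetas = [float(t) for t in range(0, 91, 5)]  # hypothetical detection angles (19 values)
perfect = cost_J(top_hat, thetas)             # a profile equal to the target
flat = cost_J(lambda t: 0.5, thetas)          # a hypothetical flat profile
```

A profile that reproduces the target exactly gives J = 0, whereas the flat profile at half the maximum contributes 0.25 at each of the 19 angles. In the robust problem, the same J is evaluated at the perturbed configurations p + Δp.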
In laboratory experiments, implementation errors Δp are encountered when physically
placing the cylinders. To include most errors, we define the uncertainty set U such that the
probability P(Δp ∈ U) is 99%. Each component Δpi is measured to be independently and
normally distributed with a standard deviation of 40 µm [21]. Hence, we chose Γ = 550 µm
for the size of the uncertainty set in Eq. 1.
Recently, a solution to this nanophotonic model was reported to be more robust by ∼8%
[6]. Since the underlying method relied on local searches, the reported robust optimum was a
local optimum. However, it has been shown that if the configurations are randomly generated
through a Gaussian distribution, a Markovian system will converge to the global minimum
when a Boltzmann or Cauchy cooling scheme is applied [13, 14]. In other words, the probability
that the system never generates the state of the global minimum vanishes. Therefore,
applying the proposed RSA to this model guarantees, asymptotically, a global robust optimum.
[Figure 4 plot: (a) worst-case cost and (b) nominal cost versus iterations (0–1000); curves: "Re-annealing" and "Re-annealing, Re-heating"]
Fig. 4 Performance of different annealing schemes for RSA on the 100-dimensional nanophotonic problem,
starting from a regular arrangement given by the underlying photonic crystal and using re-annealing at every
100 acceptances and re-heating at every 50 rejections: a worst-case cost performance and b nominal cost
performance
To increase the efficiency of the search in this high-dimensional problem, we generate
random states adapted to the annealing scheme. Starting from a uniform random number
rⁱ ∈ (0, 1) for the dimension i, we generate new random numbers
$$s^i = \operatorname{sign}\left(r^i - \tfrac{1}{2}\right) \frac{1}{\beta}\left[(1 + \beta)^{|2r^i - 1|} - 1\right], \tag{19}$$
reflecting the cooling schedule in Eq. 7. This adaptation creates random numbers sⁱ ∈ (−1, 1).
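The generator of Eq. (19) can be sketched as follows. The cooling schedule of Eq. 7 is not reproduced, so β is simply passed in; the helper names, the sample size, and the threshold `eps` used to measure how concentrated the steps are, are illustrative assumptions.

```python
import random
from typing import List


def adaptive_step(r: float, beta: float) -> float:
    """Eq. (19): s = sign(r - 1/2) * (1/beta) * ((1 + beta)**|2r - 1| - 1).

    For r in (0, 1) the result lies in (-1, 1); a larger beta (colder system)
    concentrates the steps near zero.
    """
    sign = 1.0 if r >= 0.5 else -1.0  # at r = 1/2 the magnitude is zero anyway
    return sign * ((1.0 + beta) ** abs(2.0 * r - 1.0) - 1.0) / beta


def sample_steps(beta: float, dim: int, rng: random.Random) -> List[float]:
    """One independent s^i per dimension i, drawn from a uniform r^i."""
    return [adaptive_step(rng.random(), beta) for _ in range(dim)]


def fraction_small(beta: float, n: int = 10000, eps: float = 0.1, seed: int = 0) -> float:
    """Share of generated steps with |s| < eps at inverse temperature beta."""
    rng = random.Random(seed)
    steps = sample_steps(beta, n, rng)
    return sum(abs(s) < eps for s in steps) / n
```

Solving |s| < ε for u = |2r − 1| gives u < ln(1 + εβ)/ln(1 + β), so roughly 14% of steps are small at β = 1 but about two thirds at β = 1000. The generator therefore still proposes occasional long jumps late in the annealing while favoring local moves.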
Consequently, we generate trial configurations p according to
It has been shown that by successive application of this adaptive sampling of random states
(for iteration number k → ∞), the probability of not generating a state which corresponds to
the global minimum vanishes [22]. To increase the efficiency of the algorithm, we extended
the cooling scheme with a re-annealing technique, which resets the iteration index every
100 acceptances and increases β0 by 5%. Additionally, we allowed a re-heating every 50
rejections, where the iteration index is reset and β0 is decreased by 5%. The re-annealing
pushes the system faster towards the minimum while avoiding quenching effects (local
minima of small extent). The re-heating allows the search to escape local minima of larger
extent (bathtub shapes). Figure 4 illustrates the difference between these two techniques.
While the re-annealing technique improves the nominal and the worst-case cost sufficiently
fast, it gets stuck in many small local minima of the robust cost, as shown in Fig. 4a.
With the additional re-heating, the algorithm improves both costs at comparable efficiency,
but it also manages to escape local minima and, thus, achieves high-quality results
faster overall. Note that the choice of these parameters depends on the actual problem and,
thus, cannot be generalized in this scope. Furthermore, since the algorithm minimizes only
the worst-case cost, the improvement of the nominal cost is merely a useful “side-product”
of RSA.
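The re-annealing and re-heating rules described above can be sketched as simple bookkeeping around the annealing index and β0. The class and field names are hypothetical, the cooling schedule that consumes `index` (Eq. 7) is not reproduced, and the acceptance and rejection counts are kept cumulatively here, which is an assumption: the text does not state whether the 50 rejections must be consecutive.

```python
from dataclasses import dataclass


@dataclass
class AnnealingBookkeeping:
    """Hypothetical bookkeeping for re-annealing and re-heating of beta_0."""
    beta0: float
    index: int = 0          # iteration index fed to the cooling schedule (Eq. 7)
    acceptances: int = 0    # cumulative count (assumption)
    rejections: int = 0     # cumulative count (assumption)
    accept_period: int = 100
    reject_period: int = 50
    factor: float = 0.05

    def record(self, accepted: bool) -> None:
        """Update the counters after one trial; rescale beta0 and reset the index when due."""
        self.index += 1
        if accepted:
            self.acceptances += 1
            if self.acceptances % self.accept_period == 0:
                # Re-annealing: reset the index and raise beta0 by 5% (cooler).
                self.index = 0
                self.beta0 *= 1.0 + self.factor
        else:
            self.rejections += 1
            if self.rejections % self.reject_period == 0:
                # Re-heating: reset the index and lower beta0 by 5% (hotter).
                self.index = 0
                self.beta0 *= 1.0 - self.factor


book = AnnealingBookkeeping(beta0=1.0)
for _ in range(100):
    book.record(accepted=True)
beta_after_reanneal = book.beta0
for _ in range(50):
    book.record(accepted=False)
```

Keeping this logic separate from the acceptance test makes it easy to switch re-heating off, which corresponds to the "Re-annealing" curve of Fig. 4, without touching the rest of the loop.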
[Figure 5 plot: Cost versus Iterations (0–100); the worst-case cost of RLS is marked, with annotations of 8% and 40%]
Fig. 5 Performance of the Robust Simulated Annealing on the 100-dimensional nanophotonic problem
starting from a nominally optimized configuration. The nominal and worst-case costs of each configuration are
plotted over the iteration steps. The cross marks the previously reported robust solution obtained with the robust
local search algorithm [6]. The nominal cost increases slightly as the “price” of improved robustness
To allow a comparison to previous methods, our starting configuration corresponds to the
initial arrangement used in the robust local search (RLS), as reported in Ref. [6]. Since RSA
conducts a global search, the initial configuration will neither affect the accessibility of the
search space nor alter the quality of the final solution. Figure 5 illustrates the performance of
RSA for the nanophotonic problem, starting from a nominally optimized design, i.e., without
taking robustness into account. The nominal and worst-case costs of the configuration at any
given iteration step are plotted as the algorithm progresses. Starting from the initial
configuration, RSA improves the robustness significantly within a few iterations, i.e., the
worst-case cost decreases. Since RSA updates the states according to the Boltzmann weights, it rapidly
scans large portions of the search space before a new updated state is accepted. As Fig. 5
shows, RSA outperforms previous robust optimization techniques. While both algorithms
require comparable times for the neighborhood search, RLS requires additional
computation (approximately 20 min) to find a deterministic descent direction. Moreover, even
though RSA is not deterministic and can yield different solutions for different runs and
initial configurations, it converges to the lowest possible optimum, i.e., the global
robust minimum, as k → ∞. We terminated the simulations in Fig. 5 after the number of trials for the
same acceptance step exceeded an empirical threshold.
Note that while RSA is running, only a few “surprises” are discovered, as
opposed to RLS. This is because in RSA a step is taken only if no worse neighbors
reside around the new update, whereas in RLS the neighborhood search of the
updated configuration is conducted after the step has been taken.
These results in Fig. 5 illustrate that the presented method of robust simulated annealing
increases the robustness of these prototype nanophotonic structures significantly. In fact, a
design that is more robust against larger errors allows for alternative choices of manufac-
turing, i.e., costly high-precision implementation might become redundant, if the design is
intrinsically robust against larger errors. In addition to the reduced production costs and higher
manufacturing yield, these robust solutions are also protected against errors whose sources
were unknown at the time of the optimization. This feature extends the advantage of
robust design far beyond its conventional notion, which was limited to protection against known
uncertainties, and thus provides enhanced performance reliability of novel nanostructures.
5 Conclusion
We have introduced the generic algorithm of robust simulated annealing, which provides
the global robust optimum of complex systems. Since it uses the problem's solver
as a black box, this method is generic, does not require any detailed knowledge of the
structure of the problem, and is thus applicable to most engineering design problems that
employ numerical simulations to compute relevant quantities. Furthermore, the method and
its efficiency are independent of the definition of the cost function or the dimensionality of
the problem. We have applied the method to a robust polynomial optimization problem and
shown that it efficiently finds the robust global optimum. Moreover, we have demonstrated
its performance by applying it to an actual nanophotonic design problem and shown that
the solution outperforms all previously available methods in terms of both efficiency and
absolute robustness.
References