Grover’s Quantum Algorithm Applied to Global Optimization
W. P. Baritompa, D. W. Bulger, and G. R. Wood
Abstract. Grover’s quantum computational search procedure can provide the basis for implementing
adaptive global optimization algorithms. A brief overview of the procedure is given and a
framework called Grover adaptive search is set up. A method of Dürr and Høyer and one introduced
by the authors fit into this framework and are compared.
Key words. discrete optimization, global optimization, Grover iterations, Markov chains, quantum computers, random search
DOI. 10.1137/040605072
∗ Received by the editors March 11, 2004; accepted for publication (in revised form) November 22,
2004; published electronically August 3, 2005. This research was supported by the Marsden Fund of
the Royal Society of New Zealand.
http://www.siam.org/journals/siopt/15-4/60507.html
† Department of Mathematics and Statistics, University of Canterbury, Christchurch, New Zealand
(b.baritompa@math.canterbury.ac.nz).
‡ Department of Statistics, Macquarie University, NSW 2109, Australia (dbulger@efs.mq.edu.au,
gwood@efs.mq.edu.au).
In section 6 we describe Dürr and Høyer’s algorithm, extending and correcting the theoretical analysis in [8]. In
section 7, we present a refined version of GAS, and in section 8 this version is com-
pared to that of Dürr and Høyer by numerical simulation. Section 9 concludes the
paper.
2. Optimization problem. We consider the following finite global optimization
problem:
minimize f (x)
subject to x ∈ S,
Note that in the special case of r = 0, Grover search observes only the prepared equal
amplitude superposition of states and so reduces to choosing a point uniformly from
the domain.
Most of the work in implementing the Grover rotation operator is in the oracle
query, so the cost of a Grover search of r rotations is taken as the cost of r oracle
queries. The output is a point in S, and since one would usually want to know whether it lies in
M, a further oracle query (acting on the point) gives its value under h.
Grover search is sometimes portrayed as a method for the database table lookup
problem. This is only one elementary application, however. Other interesting appli-
cations concern “marking functions” h which are more than simple tests of indexed
data. Examples relating to data encryption and the satisfiability problem are given
in [2, 9].
From searching to optimizing. Grover search solves a special global opti-
mization problem: it finds a global maximum of h. For the more general problem
introduced in section 2, our intention is to use Grover search repeatedly within a
global optimization method of the adaptive search variety. Adaptive search methods
produce, or attempt to produce, an improving sequence of samples, each uniformly
distributed in the improving region of the previous sample (see [16, 15, 5]).
Given an objective function f : S → R and a point X ∈ S with f (X) = Y , we
use Grover’s algorithm to seek a point in the improving region {w ∈ S : f (w) < Y }.
\[
w \longrightarrow f \longrightarrow f(w) \xrightarrow{\;<\,y\;} 0 \text{ or } 1.
\]
The additional comparison logic circuitry < to construct h is minimal, and we will
take the cost of h and f to be the same.
As far as Grover’s algorithm is concerned, h is simply a black box quantum
circuit, inputting a point w in S (or a superposition of such points) and outputting
\[
\begin{cases}
1, & f(w) < y,\\
0, & f(w) \geq y.
\end{cases}
\]
where p = |{w ∈ S : f (w) < Y }|/|S|. The procedure also outputs y = f (x). The
cost of the procedure is r + 1 objective function evaluations.
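For numerical experiments on a conventional computer, this procedure can be mimicked by its outcome distribution alone. The following Python sketch is an illustrative classical stand-in (not the quantum procedure itself), assuming the standard Grover success probability g_r(p) = sin²((2r+1) arcsin √p); the name grover_threshold_search is ours.

```python
import math
import random

def grover_threshold_search(f, domain, y, r, rng=random):
    """Classical stand-in for one Grover search of r rotations on f with threshold y.

    Returns (x, f(x), cost) with cost = r + 1 oracle queries, matching the accounting
    in the text.  The quantum step is modelled only by its outcome distribution:
    an improving point (one with h(w) = 1, i.e. f(w) < y) is returned with
    probability g_r(p) = sin^2((2r + 1) * arcsin(sqrt(p))), where p is the
    improving fraction; otherwise a non-improving point is returned.
    """
    improving = [w for w in domain if f(w) < y]     # points h marks with 1
    rest = [w for w in domain if f(w) >= y]         # points h marks with 0
    p = len(improving) / len(domain)
    g_r = math.sin((2 * r + 1) * math.asin(math.sqrt(p))) ** 2
    if improving and rng.random() < g_r:
        x = rng.choice(improving)
    else:
        x = rng.choice(rest if rest else list(domain))
    return x, f(x), r + 1
```

Note that with r = 0 this reduces to uniform sampling from the domain, since g_0(p) = p.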
4. Grover adaptive search. This section presents the GAS algorithm intro-
duced in [4]. The algorithm requires as a parameter a sequence (rn : n = 1, 2, . . . ) of
rotation counts. Initially, the algorithm chooses a sample uniformly from the domain
and evaluates the objective function at that point. At each subsequent iteration, the
algorithm samples the objective function at a point determined by a Grover search.
The Grover search uses the best function value yet seen as a threshold. Here is the
algorithm in pseudocode form:
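(The published listing is not reproduced in this extraction. The following Python sketch follows the textual description above; it reuses the grover_threshold_search stand-in sketched in section 3, and the names grover_adaptive_search and rotation_counts are ours.)

```python
import random

def grover_adaptive_search(f, domain, rotation_counts, rng=random):
    """Sketch of the GAS loop described above (not the authors' published listing).

    rotation_counts plays the role of the parameter sequence (r_n); the number of
    iterations here is simply len(rotation_counts).
    """
    domain = list(domain)
    x = rng.choice(domain)                 # initial uniform sample from the domain
    y = f(x)
    cost = 1
    for r in rotation_counts:              # one Grover search per iteration
        x_new, y_new, step_cost = grover_threshold_search(f, domain, y, r, rng)
        cost += step_cost
        if y_new < y:                      # best value seen so far is the threshold
            x, y = x_new, y_new
    return x, y, cost
```

Passing, say, rotation_counts = [3] * 50 gives the fixed-rotation-count variant discussed next.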
Why the rotation count should vary. In [4] we considered the possibility
of using the same rotation count at each iteration. Although it is easy to construct
objective functions for which this method works well, they are exceptional, and in
general it is preferable to vary the rotation count as the algorithm progresses.
To see why, suppose that at a certain point in the execution of the GAS algorithm,
the best value seen so far is Y , and the improving fraction is p = |{w : f (w) < Y }|/N .
For any given rotation count r, the probability of success of each single iteration of the
algorithm is given by gr (p). Although the rationale for using Grover’s algorithm is to
Fig. 1. The probability of a step of three Grover rotations finding an improvement, as a function
of the improving fraction p.
increase the probability of finding improving points, there are combinations of values
of r and p where the opposite effect occurs. For instance, Figure 1 plots g3 (p) versus
p. If p = 0.2, then the step is almost guaranteed not to find an improvement. If the
rotation count varies from each iteration to the next, then this is only an occasional
nuisance. But if it is fixed at r, and if the algorithm should happen to sample a point
x such that the improving fraction p for Y = f (x) has gr (p) zero or very small, then
the algorithm will become trapped.
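The trap described above is easy to check numerically, again assuming the standard success probability g_r(p) = sin²((2r+1) arcsin √p); the figures in the comments are approximate.

```python
import math

def g(r, p):
    """Probability that a Grover search of r rotations returns an improving point,
    assuming the standard form sin^2((2r + 1) * arcsin(sqrt(p)))."""
    return math.sin((2 * r + 1) * math.asin(math.sqrt(p))) ** 2

print(g(3, 0.2))    # about 0.011: three rotations almost never improve at p = 0.2
print(g(3, 0.05))   # about 1.0: the same count is near-optimal at p = 0.05
```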
How the rotation count should vary. In fact, at each iteration during the
execution of the algorithm, some optimal rotation count r is associated with the
improving fraction p of the domain (assuming p > 0). If it is used for the next Grover
search, then an improving point will almost certainly be found. This r is the first
positive solution to the equation gr (p) = 1. (Actually of course we must round this to
the nearest integer, and therefore success is not absolutely guaranteed, but this would
contribute little to the expected cost of the algorithm.)
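Under the same assumption on g_r, the first positive solution of g_r(p) = 1 satisfies (2r+1) arcsin √p = π/2, i.e. r = π/(4 arcsin √p) − 1/2, which is then rounded. A minimal sketch:

```python
import math

def optimal_rotation_count(p):
    """First positive r with g_r(p) = 1, i.e. (2r + 1) * arcsin(sqrt(p)) = pi / 2,
    rounded to the nearest integer (so success is near-certain rather than certain)."""
    theta = math.asin(math.sqrt(p))
    return max(1, round(math.pi / (4 * theta) - 0.5))

print(optimal_rotation_count(0.05))    # 3
print(optimal_rotation_count(1e-4))    # 78
```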
Unfortunately, in the general case the improving fraction p is unknown, so we
are somewhat in the dark as to the choice of rotation counts. In order to make
the most use of all the information available to us at each iteration, we could take
a Bayesian approach and keep track of a sequence of posterior distributions of the
improving fraction at each iteration and choose each rotation count to optimize the
change in some statistic of this posterior distribution. As might be expected, this
kind of approach appears to be very complex and unwieldy. The methods outlined in
the following two sections, however, strike a happy balance between implementability
and optimality of rotation count selection.
6. Dürr and Høyer’s random method. In this section we outline a method
due to Dürr and Høyer for randomly choosing rotation counts and correct two key
arguments in its originators’ analysis.
Grover’s search algorithm provides a method of finding a point within a subset
of a domain. If the size of the target subset is known, the algorithm’s rotation count
parameter can easily be tuned to give a negligible failure probability. The case of a
target subset of unknown size is considered in [2], where the following algorithm is
presented:
Boyer et al. search algorithm.
1. Initialize m = 1.
2. Choose a value for the parameter λ (8/7 is suggested in [2]).
3. Repeat:
(a) Choose an integer j uniformly at random such that 0 ≤ j < m.
(b) Apply Grover’s algorithm with j rotations, giving outcome i.
(c) If i is a target point, terminate.
(d) Set m = λm.
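A classical simulation of this loop, with the Grover step again modelled by its success probability, might look as follows (boyer_et_al_search and is_target are illustrative names; the √N cap on m discussed below is omitted):

```python
import math
import random

def boyer_et_al_search(is_target, domain, lam=8/7, rng=random):
    """Classical simulation sketch of the Boyer et al. loop above.

    A Grover search of j rotations is modelled by returning a target point with
    probability sin^2((2j + 1) * theta), where theta = arcsin(sqrt(t / N)).
    Returns (point, oracle_queries).
    """
    domain = list(domain)
    targets = [w for w in domain if is_target(w)]
    if not targets:
        raise ValueError("the target subset must be nonempty")
    theta = math.asin(math.sqrt(len(targets) / len(domain)))
    m, queries = 1.0, 0
    while True:
        j = rng.randrange(int(math.ceil(m)))            # step 3(a): 0 <= j < m
        queries += j + 1                                # j rotations plus membership test
        if rng.random() < math.sin((2 * j + 1) * theta) ** 2:
            return rng.choice(targets), queries         # step 3(c): outcome is a target
        m *= lam                                        # step 3(d)
```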
Actually, in [2], the final step updates m to min{λm, √N}. It is pointless to
allow m to exceed √N, because for a target set of any size, it is known [2] that the
optimal rotation count will be no more than π√N/4. In the global optimization
context, however, this point will usually be immaterial, since the target region, though
comprising a small proportion of the domain, will normally be large in absolute terms.
For instance, suppose the domain contains 10^20 elements and suppose finding one
of the smallest 10,000 points is required. The optimal rotation count to find a target
set of this size is 10^8 π/4, substantially less than π√N/4. The actual target size
will be unknown, and therefore the actual optimal rotation count will be unknown.
But when m reaches this magnitude, if not before, each step will have a substantial
probability (on the order of 1/2) of finding a target point. Therefore, unless λ is very
large, there will be negligible probability of m reaching √N = 10^10 before a target
point is produced. For simplicity, therefore, in this article we ignore the √N ceiling
on the growth of m.
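The arithmetic in this example can be checked directly (a small sketch, reusing the optimal-count formula discussed in section 5):

```python
import math

N, t = 10**20, 10**4
theta = math.asin(math.sqrt(t / N))          # improving fraction 1e-16
print(round(math.pi / (4 * theta) - 0.5))    # about 7.9e7, i.e. roughly 1e8 * pi / 4
print(math.pi * math.sqrt(N) / 4)            # about 7.9e9, the pi * sqrt(N) / 4 ceiling
```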
In the quant-ph internet archive, Dürr and Høyer [8] propose using the Boyer
et al. algorithm as the nucleus of a minimization algorithm. Their paper gives the
impression that the algorithm is just for the database problem. They begin with “an
unsorted table of N items each holding a value from an ordered set. The minimum
searching problem is to find the index y such that T [y] is minimum.” Again we
stress their algorithm fits in the GAS framework and is thus applicable to the general
optimization problem.
In their paper, they indicate that every item that is improving is explicitly marked.
However, this is a mistake, as it is incompatible with their complexity analysis later
in the paper. We describe a corrected version of their method using the terminology
of this paper.
Dürr and Høyer’s algorithm.
1. Generate X1 uniformly in S, and set Y1 = f (X1 ).
2. Set m = 1.
3. Choose a value for the parameter λ (as in the previous algorithm).
4. For n = 1, 2, . . . until a termination condition is met, do:
(a) Choose a random rotation count rn uniformly distributed on {0, . . . , m − 1}.
(b) Perform a Grover search of rn rotations on f with threshold Yn ,
and denote the outputs by x and y.
(c) If y < Yn , set Xn+1 = x, Yn+1 = y, and m = 1; otherwise, set
Xn+1 = Xn , Yn+1 = Yn , and m = λm.
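A classical sketch of this method, reusing the grover_threshold_search stand-in from section 3 (the fixed iteration budget used as the termination condition, and the names, are illustrative):

```python
import math
import random

def durr_hoyer_minimize(f, domain, num_iterations, lam=1.34, rng=random):
    """Sketch of Dürr and Høyer's method as a special case of GAS."""
    domain = list(domain)
    x = rng.choice(domain)                       # step 1: uniform initial sample
    y = f(x)
    m, cost = 1.0, 1                             # step 2
    for _ in range(num_iterations):              # step 4
        r = rng.randrange(int(math.ceil(m)))     # rotation count uniform on {0, ..., m - 1}
        x_new, y_new, step_cost = grover_threshold_search(f, domain, y, r, rng)
        cost += step_cost
        if y_new < y:                            # improvement: accept and reset m
            x, y, m = x_new, y_new, 1.0
        else:                                    # no improvement: grow m
            m *= lam
    return x, y, cost
```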
This is the special case of GAS arising when the rotation count rn is chosen
randomly from an integer interval which is initialized to {0} at each improvement but
which grows exponentially to a maximum of {0, . . . , √N − 1} between improvements.
The analysis of the algorithm reported in the archive [8] uses incorrect constants
from a preprint of [2]. In the following analysis, we correct this by using the published
version of [2]. Because the Boyer et al. algorithm underpins that of Dürr and Høyer,
we begin with an analysis of the former algorithm. Theorem 3 in [2] is an order of
magnitude result, but inspection of the proof implies that the expected time required
by the Boyer et al. algorithm to find one of t marked items among a total of N items
is bounded by 8√(N/t). This constant can be improved upon, though, as we shall see
after the following theorem.
Theorem 6.1. The expected number of oracle queries required by the Boyer et
al. algorithm with parameter λ to find and verify a point from a target subset of size
t from a domain of size N is
\[
\sum_{j=0}^{\infty}\frac{\lambda^{j}}{2}\prod_{i=0}^{j-1}\left(\frac{1}{2}+\frac{\sin\bigl(4\theta\lambda^{i}\bigr)}{4\lambda^{i}\sin(2\theta)}\right),
\tag{6.1}
\]
where θ = arcsin(√(t/N)).
Proof. Conditioned on reaching iteration j, the expected number of oracle queries
required at that iteration is λ^j/2 (including the test of the output of Grover’s
algorithm for target subset membership). The probability of reaching iteration j is
a product of failure rates; the probability of the algorithm failing to terminate at
iteration i, having reached that iteration, is
\[
\frac{1}{2}+\frac{\sin\bigl(4\theta\lambda^{i}\bigr)}{4\lambda^{i}\sin(2\theta)}
\]
(this is [2, Lemma 2]). Thus the expected number of oracle queries required at itera-
tion j, not conditioned on whether the iteration is reached, is
\[
\frac{\lambda^{j}}{2}\prod_{i=0}^{j-1}\left(\frac{1}{2}+\frac{\sin\bigl(4\theta\lambda^{i}\bigr)}{4\lambda^{i}\sin(2\theta)}\right),
\]
and summing over j gives (6.1).
Fig. 2. The ratio between the partial sums of the geometrically convergent series (6.1) and
√(N/t) when λ = 1.34, plotted against t/N. Note that 1.32 appears to be an upper bound.
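The series (6.1) converges geometrically and is easy to sum numerically; the following sketch computes it and the ratio to √(N/t) underlying Figure 2 (the tolerance, term cap, and example sizes are arbitrary choices of ours):

```python
import math

def expected_queries(lam, t, N, tol=1e-12, max_terms=10000):
    """Partial sum of the series (6.1): expected oracle queries of the Boyer et al.
    algorithm with parameter lam for a target of size t in a domain of size N."""
    theta = math.asin(math.sqrt(t / N))
    total, reach_prob = 0.0, 1.0               # reach_prob = P(iteration j is reached)
    for j in range(max_terms):
        term = (lam ** j / 2.0) * reach_prob
        total += term
        if term < tol * total:
            break
        m = lam ** j                           # failure probability at iteration j ([2, Lemma 2])
        reach_prob *= 0.5 + math.sin(4 * theta * m) / (4 * m * math.sin(2 * theta))
    return total

N, t = 10**6, 50
print(expected_queries(1.34, t, N) / math.sqrt(N / t))   # appears to stay below about 1.32 (cf. Figure 2)
```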
above by
\[
1.32\sqrt{N}\sum_{r=s+1}^{N}\frac{1}{r\sqrt{r-1}}.
\]
Note that if s is small compared to N, then the above bound approximately equals 2.46√(N/s).
Proof. Assign the domain points ranks from 1 to N , giving the best point rank
1 and so forth. Where several points have equal objective function value, break ties
arbitrarily, but let l(r) be the least rank and h(r) the greatest rank among the points
with the same value as the rank r point. (In the distinct values case we will have
l(r) = h(r) = r for each r ∈ {1, . . . , N }.)
Since Dürr and Høyer’s algorithm will move through a succession of threshold
values with rank above s before finding the desired target point, the bound on the
expectation in question is given by
\[
\sum_{r=s+1}^{N} p(N,r)\,B(N,l(r)-1),
\tag{6.2}
\]
where p(N, r) is the probability of the rank r point ever being chosen and B(N, l(r)−1)
is the expected number of iterations required by the Boyer et al. algorithm to find
and verify a point from a target subset of size l(r) − 1.
\[
\begin{aligned}
\sum_{r=l(\hat r)}^{h(\hat r)} p(N,r)\,B(N,l(r)-1)
&\le 1.32\sqrt{N}\sum_{r=l(\hat r)}^{h(\hat r)}\frac{1}{h(r)\sqrt{l(r)-1}}\\
&= 1.32\sqrt{N}\,\sqrt{l(\hat r)-1}\sum_{r=l(\hat r)}^{h(\hat r)}\frac{1}{h(\hat r)\bigl(l(\hat r)-1\bigr)}\\
&= 1.32\sqrt{N}\,\sqrt{l(\hat r)-1}\sum_{r=l(\hat r)}^{h(\hat r)}\frac{1}{r(r-1)}\\
&\le 1.32\sqrt{N}\sum_{r=l(\hat r)}^{h(\hat r)}\frac{1}{r\sqrt{r-1}}.
\end{aligned}
\]
(7.1)    0, 0, 0, 1, 1, 0, 1, 1, 2, 1, 2, 3, 1, 4, 5, 1, 6, 2, 7, 9, 11, 13, 16, 5, 20, 24, 28, 34, 2, 41, 49, 4, 60, . . . .
Fig. 3. Performance graphs for Dürr and Høyer’s algorithm for various values of the parameter
λ and two domain sizes. The third graph repeats the second with a finer mesh of λ values.
Fig. 4. Performance graphs comparing Dürr and Høyer’s method to the method of section 7,
for a uniform range distribution.
and including the values 8/7 and 1.34 suggested by [2] and Figure 2. Performance
deteriorates slowly outside of the range from 1.34 to 1.44, but within that range
there is no visible performance gradient. The value of λ may become more important
for smaller values of α, but for the remainder of this section we shall use the value
λ = 1.34.
Comparing the new method to Dürr and Høyer. Having settled on the
parameter value λ = 1.34 for Dürr and Høyer’s method, we can compare it to the
method of section 7. Figure 4 shows that, in the two cases studied, the new method
dominates that of Dürr and Høyer. For instance, to sample a target comprising 0.2%
of the domain with probability 90% or more, Dürr and Høyer’s method requires more
than 100 units of effort, whereas the new method requires only 79 (and in fact it then
samples the target with probability 96%).
Note also, in the two situations depicted in Figure 4, the estimated bound of
2.46√(N/s) on the expected time required by Dürr and Høyer’s algorithm, mentioned
following Theorem 6.2, amounts to 24.6 and 55.0. While the true expectations cannot
be computed from any finite portion of the performance graphs, these figures do
appear visually to be in approximate agreement with the numerical results.
Nonuniform range distributions. Until now in this section we have assumed
a uniform range distribution. This corresponds to the assumption of injectivity of the
objective function, that is, that different points in the domain map to different values
in the range. In many cases, however, for instance in combinatorial optimization,
there may be a unique optimum, or a small number of optimal domain points, but
large sets of the domain sharing values in the middle of the range; this results in a
nonuniform range distribution.
Experimentation indicates that nonuniformity of the range distribution improves
the performance of both methods under study. To produce Figure 5, we randomly
created five stochastic vectors of length 20 with first element 0.002 (the remainder
of each vector was a point uniformly distributed in [0, 1]^19 and then scaled to sum
to 0.998) and simulated the performance of both methods. Compare this with the
last plot of Figure 4. Nonuniformity has improved the performance of the method of
section 7 somewhat. However, a greater improvement in Dürr and Høyer’s method
has allowed it to overtake the method of section 7. Here, for most of the five sample
Fig. 5. Performance graphs comparing Dürr and Høyer’s method to the method of section 7,
for a nonuniform range distribution.
range distributions, Dürr and Høyer’s method reaches the target with probability 90%
or more after 61 or fewer units of effort, whereas the new method now requires 67.
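For reference, the nonuniform range distributions described above can be generated as in the following sketch (assuming numpy; the seed and the helper name random_range_distribution are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_range_distribution(k=20, first=0.002):
    """One stochastic vector as described above: the first element is 0.002 and the
    remaining k - 1 elements are uniform on [0, 1], rescaled to sum to 1 - first."""
    rest = rng.uniform(size=k - 1)
    return np.concatenate(([first], rest * (1 - first) / rest.sum()))

vectors = [random_range_distribution() for _ in range(5)]   # five sample range distributions
```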
9. Conclusion. This paper outlines the significance of Grover’s quantum search
algorithm (with its performance characteristics, implying O(√(N/t)) performance, taken
as an axiom) for global optimization. Grover search can provide the basis for imple-
menting adaptive global optimization algorithms. One example is an algorithm of
Dürr and Høyer, introduced as a method for finding minimum values in a database.
An improved analysis of Dürr and Høyer’s algorithm suggests increasing its parameter
λ from 8/7 to 1.34. Also, that algorithm fits the Grover adaptive search framework,
and thus is applicable to the more general global optimization problem. A new algo-
rithm within the same framework is proposed in section 7. Our numerical experiments
in section 8 show that the algorithms have similar performance. The method proposed
in section 7 had its parameters tuned for the case of distinct objective function values
and shows performance superior to that of Dürr and Høyer in that case. On the
other hand, Dürr and Høyer’s method (with λ = 1.34) overtakes the new method if
there is a great deal of repetition in objective function values.
We close with a comment concerning implementation on a quantum computer. This is
work mainly for computer engineers of the future, but some indications are known
at the present time. A fully functional quantum computer would be able to evaluate
an objective function in just the same way as a conventional computer, by executing
compiled code. A technical requirement to control quantum coherence, which we have
not mentioned previously, is that the gates must implement reversible operations. The
code implementing the objective function must be run in the forward direction and
then in the reverse direction. This obviously at most doubles the computational effort
for a function evaluation compared to a conventional computer.