Mathematical Foundation of Quantum Annealing: Satoshi Morita and Hidetoshi Nishimori
Abstract
Quantum annealing is the generic name for quantum algorithms that use quantum-mechanical
fluctuations to search for the solution of an optimization problem. It shares the basic idea
with quantum adiabatic evolution studied actively in quantum computation. The present
paper reviews the mathematical and theoretical foundation of quantum annealing. In par-
ticular, theorems are presented for convergence conditions of quantum annealing to the
target optimal state after an infinite-time evolution following the Schrödinger or stochas-
tic (Monte Carlo) dynamics. It is proved that the same asymptotic behavior of the
control parameter guarantees convergence both for the Schrödinger dynamics and the
stochastic dynamics in spite of the essential difference of these two types of dynamics.
Also described are the prescriptions to reduce errors in the final approximate solution
obtained after a long but finite dynamical evolution of quantum annealing. It is shown
there that we can reduce errors significantly by an ingenious choice of annealing schedule
(time dependence of the control parameter) without compromising computational com-
plexity qualitatively. A review is given on the derivation of the convergence condition
for classical simulated annealing from the viewpoint of quantum adiabaticity using a
classical-quantum mapping.
1 Introduction
An optimization problem is the problem of minimizing or maximizing a real single-valued
function of many variables, called the cost function [1, 2]. If the problem is to maximize
the cost function f, it suffices to minimize −f. It thus loses no generality to consider
minimization only. In the present paper we consider combinatorial optimization, in which
variables take discrete values. Well-known examples are satisfiability problems (SAT),
Exact Cover, Max Cut, the Hamiltonian cycle problem, and the Traveling Salesman Problem.
In physics, the search for the ground state of a spin system is a typical example, in
particular for systems with quenched randomness like spin glasses.
Optimization problems are classified roughly into two types, easy and hard. Loosely
speaking, easy problems are those for which we have algorithms that solve them in a number
of steps (i.e. time) polynomial in the system size (polynomial complexity). In contrast, for
hard problems, all known algorithms take exponentially many steps to reach the exact
solution (exponential complexity). For these latter problems it is virtually impossible
to find the exact solution if the problem size exceeds a moderate value. Most of the
interesting cases as exemplified above belong to the latter hard class.
It is therefore important in practice to devise algorithms which give approximate but
accurate solutions efficiently, i.e. with polynomial complexity. Many instances of
combinatorial optimization problems have such approximate algorithms. For example, the
∗ International School for Advanced Studies (SISSA), Via Beirut 2-4, I-34014 Trieste, Italy
† Department of Physics, Tokyo Institute of Technology, Oh-okayama, Meguro-ku, Tokyo 152-8551, Japan
Lin-Kernighan algorithm is often used to solve the traveling salesman problem within a
reasonable time [3].
In the present paper we will instead discuss generic algorithms, simulated annealing
(SA) and quantum annealing (QA). The former was developed from the analogy be-
tween optimization problems and statistical physics [4, 5]. In SA, the cost function to be
minimized is identified with the energy of a statistical-mechanical system. The system
is then given a temperature, an artificially-introduced control parameter; by reducing it
slowly from a high value to zero, we hope to drive the system to the state with
the lowest value of the energy (cost function), thus reaching the solution of the optimization
problem. The idea is that the system is expected to stay close to thermal equilibrium
during the time evolution if the rate of decrease of temperature is sufficiently slow, and is
thus led in the end to the zero-temperature equilibrium state, the lowest-energy state.
In practical applications SA is immensely popular due to its general applicability, reason-
able performance, and relatively easy implementation in most cases. SA is usually used
as a method to obtain an approximate solution within a finite computation time since it
needs an infinitely long time to reach the exact solution by keeping the system close to
thermal equilibrium.
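To make the procedure concrete, here is a minimal Metropolis-type sketch of SA; the chain instance, the cooling constant, and the step count are illustrative choices, not taken from the literature reviewed here:

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, steps=20000, t0=2.0):
    """Minimize `cost` by Metropolis moves while the temperature decreases."""
    x, fx = x0, cost(x0)
    best, fbest = x, fx
    for k in range(1, steps + 1):
        # Logarithmic-type cooling in the spirit of the convergence condition
        # quoted later in this paper (T ~ N / log t); practical runs often cool faster.
        t = t0 / math.log(k + 1)
        y = neighbor(x)
        fy = cost(y)
        if fy <= fx or random.random() < math.exp(-(fy - fx) / t):
            x, fx = y, fy
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest

# Toy instance: an unfrustrated Ising chain, whose ground-state energy is -(n-1).
random.seed(0)
n = 16
J = [random.choice([-1, 1]) for _ in range(n - 1)]
cost = lambda s: -sum(J[i] * s[i] * s[i + 1] for i in range(n - 1))

def neighbor(s):
    t = list(s)
    i = random.randrange(n)
    t[i] = -t[i]          # flip one randomly chosen spin
    return t

s0 = [random.choice([-1, 1]) for _ in range(n)]
best, e = simulated_annealing(cost, neighbor, s0)
print("best energy found:", e, "(optimum is", -(n - 1), ")")
```

Keeping the best-so-far configuration mimics the common practical usage of SA as a finite-time approximate solver, as described above.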
Let us now turn our attention to quantum annealing [6, 7, 8, 9, 10, 11].¹ In SA, we
make use of thermal (classical) fluctuations to let the system hop from state to state over
intermediate energy barriers to search for the desired lowest-energy state. Why then not
try quantum-mechanical fluctuations (quantum tunneling) for state transitions if such
may lead to better performance? In QA we introduce artificial degrees of freedom of
quantum nature, non-commutative operators, which induce quantum fluctuations. We
then ingeniously control the strength of these quantum fluctuations so that the system
finally reaches the ground state, just like SA in which we slowly reduce the temperature.
More precisely, the strength of quantum fluctuations is first set to a very large value for
the system to search for the global structure of the phase space, corresponding to the
high-temperature situation in SA. Then the strength is gradually decreased to finally
vanish to recover the original system hopefully in the lowest-energy state. Quantum
tunneling between different classical states replaces thermal hopping in SA. The physical
idea behind such a procedure is to keep the system close to the instantaneous ground state
of the quantum system, analogously to the quasi-equilibrium state to be kept during the
time evolution of SA. Similarly to SA, QA is a generic algorithm applicable, in principle, to
any combinatorial optimization problem and is used as a method to reach an approximate
solution within a given finite amount of time.
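The QA protocol just described can be illustrated by integrating the Schrödinger equation exactly for a small system; the linear schedule, Γ₀ = 10, τ = 50, and the toy cost function are illustrative choices:

```python
import numpy as np

def qa_transverse_field(h_diag, tau=50.0, steps=2000, gamma0=10.0):
    """Anneal H(t) = diag(h_diag) - Gamma(t) * sum_i sigma_i^x with a linear
    schedule taking Gamma from gamma0 to 0, using exact piecewise-constant steps."""
    dim = len(h_diag)
    n = int(np.log2(dim))
    sx = np.array([[0.0, 1.0], [1.0, 0.0]])
    kin = np.zeros((dim, dim))
    for i in range(n):
        op = np.array([[1.0]])
        for j in range(n):
            op = np.kron(op, sx if j == i else np.eye(2))
        kin -= op                                     # -sum_i sigma_i^x
    dt = tau / steps
    psi = np.ones(dim, dtype=complex) / np.sqrt(dim)  # ground state of the kinetic term
    for k in range(steps):
        g = gamma0 * (1.0 - (k + 0.5) / steps)        # linear annealing schedule
        h = np.diag(h_diag) + g * kin
        vals, vecs = np.linalg.eigh(h)
        # Exact unitary step exp(-i h dt) via the spectral decomposition.
        psi = vecs @ (np.exp(-1j * vals * dt) * (vecs.conj().T @ psi))
    return psi

rng = np.random.default_rng(1)
diag = rng.permutation(16).astype(float)   # toy cost function on 4 bits, unique optimum
psi = qa_transverse_field(diag)
probs = np.abs(psi) ** 2
opt = int(np.argmin(diag))
print("success probability:", round(float(probs[opt]), 3))
```

For such a small, well-gapped instance the final measurement concentrates on the optimal basis state; shrinking τ degrades the success probability, in line with the adiabatic picture.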
The reader may wonder why one should invent yet another generic algorithm when we
already have powerful SA. A short answer is that QA outperforms SA in most cases, at
least theoretically. Analytical and numerical results indicate that the computation time
needed to achieve a given precision of the answer is shorter in QA than in SA. Also, the
magnitude of error is smaller for QA than SA if we run the algorithm for a fixed finite
amount of time. We shall show some theoretical bases for these conclusions in this paper.
Numerical evidence is found in [9, 10, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22].
A drawback of QA is that a full practical implementation should rely on a quantum
computer, because we need to solve the time-dependent Schrödinger equation on a very
large scale. Existing numerical studies have been carried out either for small prototype
examples or for large problems by Monte Carlo simulations using the quantum-classical
mapping by adding an extra (Trotter or imaginary-time) dimension [23, 24, 25]. The
1 The term quantum annealing first appeared in [12, 13], in which the authors used quantum transitions for
state search and the dynamical evolution of the control parameters was set by hand as an algorithm. Quantum
annealing in the present sense, using natural Schrödinger dynamics, was proposed later independently in [6]
and [7].
latter mapping involves approximations, which inevitably introduces additional errors as
well as the overhead caused by the extra dimension. Nevertheless, it is worthwhile to
clarify the usefulness and limitations of QA as a theoretical step towards a new paradigm
of computation. This aspect is shared by quantum computation in general whose practical
significance will be fully exploited on the quantum computer.
The idea of QA is essentially the same as quantum adiabatic evolution (QAE), which
is now actively investigated as an alternative paradigm of quantum computation [26]. It
has been proved that QAE is equivalent to the conventional circuit model of quantum
computation [27], but QAE is sometimes considered more useful than the circuit model
for several reasons including robustness against external disturbance. In the literature
of quantum computation, one is often interested in the computational complexity of the
QAE-based algorithm for a given specific problem under a fixed value of acceptable error.
QAE can also be used to find the final quantum state when the problem is not a classical
optimization.
In some contrast to these situations on QAE, studies of QA are often focused not on
computational complexity but on the theoretical convergence conditions for infinite-time
evolution and on the amount of errors in the final state within a fixed evolution time.
Such a difference may have led some researchers to think that QA and QAE are to be
distinguished from each other. We would emphasize that they are essentially the same
and worth investigation by various communities of researchers.
The structure of the present paper is as follows. Section 2 discusses the convergence
condition of QA, in particular the rate of decrease of the control parameter representing
quantum fluctuations. It will be shown there that a qualitatively faster decrease of the
control parameter is allowed in QA than in SA to reach the solution. This is one of the
explicit statements of the claim more vaguely stated above that QA outperforms SA.
In Sec. 3 we review the performance analysis of SA using quantum-mechanical tools.
The well-known convergence condition for SA will be rederived from the perspective
of quantum adiabaticity. The methods and results in this section help us strengthen
the interrelation between QA, SA and QAE. The error rate of QA after a finite-time
dynamical evolution is analyzed in Sec. 4. There we explain how to reduce the final
residual error after evolution of a given amount of time. This point of view is unique in
the sense that most studies of QAE consider the time needed to reach a given amount of
tolerable error, i.e. computational complexity. The results given in this section can be
used to qualitatively reduce residual errors for a given algorithm without compromising
computational complexity. Convergence conditions for stochastic implementation of QA
are discussed in Sec. 5. The results are surprising in that the rate of decrease of the
control parameter for the system to reach the solution coincides with that found in Sec. 2
for the pure quantum-mechanical Schrödinger dynamics. The stochastic (and therefore
classical) dynamics shares the same convergence conditions as fully quantum dynamics.
Summary and outlook are described in the final section.
The main parts of this paper (Secs. 2, 4 and 5) are based on the PhD Thesis of one of
the authors (S.M.) [28] as well as several original papers of the present and other authors
as will be referred to appropriately. The present paper is not a comprehensive review
of QA since an emphasis is given almost exclusively to the theoretical and mathematical
aspects. There exists an extensive body of numerical studies and the reader is referred
to [9, 10, 11] for reviews.
2 Convergence condition of QA – Real-time Schrödinger
evolution
The convergence condition of QA with the real-time Schrödinger dynamics is investigated
in this section, following [29]. We first review the proof of the adiabatic theorem [30] to
be used to derive the convergence condition. We then introduce the Ising model with a
transverse field as a simple but versatile implementation of QA. The convergence condition
is derived by solving the condition for adiabatic transition with respect to the strength
of the transverse field.
The dynamics of the state vector is governed by the real-time Schrödinger equation

i (d/dt)|ψ(t)⟩ = H(t)|ψ(t)⟩,   (2)

or, in terms of the dimensionless time s = t/τ,

i (d/ds)|ψ̃(s)⟩ = τ H̃(s)|ψ̃(s)⟩,   (3)

where we set ℏ = 1. We assume that the initial state is chosen to be the ground state
of the initial Hamiltonian H(0) and that the ground state of H̃(s) is not degenerate for
s ≥ 0. We show in the next section that the transverse-field Ising model, to be used as
H(t) in most parts of this paper, has no degeneracy in the ground state (except possibly in
the limit of t → ∞). If τ is large, the Hamiltonian changes slowly and it is expected that
the state vector keeps track of the instantaneous ground state. The adiabatic theorem
provides the condition for adiabatic evolution. To see this, we derive the asymptotic form
of the state vector with respect to the parameter τ .
Since we wish to estimate how close the state vector is to the ground state, it is natural
to expand the state vector by the instantaneous eigenstates of H̃(s). Before doing so, we
derive useful formulas for the eigenstates. The kth instantaneous eigenstate of H̃(s) with
the eigenvalue ε_k(s) is denoted as |k(s)⟩,

H̃(s)|k(s)⟩ = ε_k(s)|k(s)⟩.   (4)

We assume that |0(s)⟩ is the ground state of H̃(s) and that the eigenstates are orthonormal,
⟨j(s)|k(s)⟩ = δ_jk. From differentiation of (4) with respect to s, we obtain

⟨j(s)| (d/ds) |k(s)⟩ = [−1/(ε_j(s) − ε_k(s))] ⟨j(s)| (dH̃(s)/ds) |k(s)⟩,   (5)

where j ≠ k. In the case of j = k, the same calculation does not provide any meaningful
result. We can, however, impose the following condition:

⟨k(s)| (d/ds) |k(s)⟩ = 0.   (6)
This condition is achievable by a time-dependent phase shift: if |k̃(s)⟩ = e^{iθ(s)}|k(s)⟩,
we find

⟨k̃(s)| (d/ds) |k̃(s)⟩ = i (dθ/ds) + ⟨k(s)| (d/ds) |k(s)⟩.   (7)

The second term on the right-hand side is purely imaginary because

[⟨k(s)| (d/ds) |k(s)⟩]* + ⟨k(s)| (d/ds) |k(s)⟩ = (d/ds) ⟨k(s)|k(s)⟩ = 0.   (8)
Thus, the condition (6) can be satisfied by tuning the phase factor θ(s) even if the original
eigenstate does not satisfy it.
Theorem 2.1. If the instantaneous ground state of the Hamiltonian H̃(s) is not degenerate
for s ≥ 0 and the initial state is the ground state at s = 0, i.e. |ψ̃(0)⟩ = |0(0)⟩, the
state vector |ψ̃(s)⟩ has the asymptotic form in the limit of large τ as

|ψ̃(s)⟩ = Σ_j c_j(s) e^{−iτφ_j(s)} |j(s)⟩,   (9)

A_j(s) ≡ [1/Δ_j(s)²] ⟨j(s)| (dH̃(s)/ds) |0(s)⟩.   (12)
Proof. Substitution of (9) into the Schrödinger equation (3) yields the equation for the
coefficient cj (s) as
Since the initial state is chosen to be the ground state of H(0), c_0(0) = 1 and c_{j≠0}(0) = 0.
The second term on the right-hand side is of the order of τ⁻¹ because its integrand rapidly
oscillates for large τ. In fact, integration by parts yields the τ⁻¹ factor. Thus, c_{j≠0}(s)
is of order τ⁻¹ at most. Hence only the k = 0 term in the summation remains up to the
order of τ⁻¹,

c_{j≠0}(s) ≈ ∫₀ˢ ds̃ [e^{iτ[φ_j(s̃) − φ_0(s̃)]} / Δ_j(s̃)] ⟨j(s̃)| (dH̃(s̃)/ds̃) |0(s̃)⟩ + O(τ⁻²),   (15)
Remark. The condition for the adiabatic evolution is given by the smallness of the excita-
tion probability. That is, the right-hand side of (11) should be much smaller than unity.
This condition is consistent with the criterion of the validity of the above asymptotic
expansion. It is represented by
τ ≫ |A_j(s)|.   (16)
Using the original time variable t, this adiabaticity condition is written as
[1/Δ_j(t)²] |⟨j(t)| (dH(t)/dt) |0(t)⟩| = δ ≪ 1.   (17)

The optimization problem to be solved is expressed as the ground-state search of an Ising
model of the generic form

H_Ising = −Σ_i J_i σ_i^z − Σ_{ij} J_ij σ_i^z σ_j^z − Σ_{ijk} J_ijk σ_i^z σ_j^z σ_k^z − ⋯,   (18)

where the σ_i^α (α = x, y, z) are the Pauli matrices, components of the spin-1/2 operator at
site i. The eigenvalue of σ_i^z is +1 or −1, which corresponds to a classical Ising spin. Most
combinatorial optimization problems can be written in this form by, for example, mapping
binary variables (0 and 1) to spin variables (±1). Another important assumption is that
the Hamiltonian (18) is extensive, i.e. proportional to the number of spins N for large
N.
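As a sketch of the mapping from binary variables to spins, one can rewrite a quadratic binary (QUBO-type) cost exactly as an Ising energy via σ_i = 1 − 2x_i and verify the identity by enumeration; the instance below is an arbitrary illustrative one:

```python
from itertools import product

def qubo_to_ising(Q):
    """Rewrite f(x) = sum_{i<=j} Q[i][j] x_i x_j (x_i in {0,1}) as
    E(sigma) = const + sum_i h[i] sigma_i + sum_{i<j} J[i][j] sigma_i sigma_j
    with sigma_i = 1 - 2 x_i, i.e. x_i = (1 - sigma_i) / 2."""
    n = len(Q)
    h = [0.0] * n
    J = [[0.0] * n for _ in range(n)]
    const = 0.0
    for i in range(n):
        for j in range(i, n):
            q = Q[i][j]
            if i == j:
                # x_i^2 = x_i = (1 - sigma_i)/2 for binary x_i
                const += q / 2.0
                h[i] -= q / 2.0
            else:
                # x_i x_j = (1 - sigma_i - sigma_j + sigma_i sigma_j)/4
                const += q / 4.0
                h[i] -= q / 4.0
                h[j] -= q / 4.0
                J[i][j] += q / 4.0
    return const, h, J

# Check the identity exhaustively on a small arbitrary instance.
Q = [[2, -1, 3],
     [0,  1, -2],
     [0,  0,  4]]
const, h, J = qubo_to_ising(Q)
for x in product([0, 1], repeat=3):
    s = [1 - 2 * xi for xi in x]
    f = sum(Q[i][j] * x[i] * x[j] for i in range(3) for j in range(i, 3))
    e = const + sum(h[i] * s[i] for i in range(3)) \
        + sum(J[i][j] * s[i] * s[j] for i in range(3) for j in range(i + 1, 3))
    assert abs(f - e) < 1e-12
print("QUBO and Ising energies agree on all 8 assignments")
```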
To realize QA, a fictitious kinetic energy is introduced typically by the time-dependent
transverse field
H_TF(t) ≡ −Γ(t) Σ_{i=1}^N σ_i^x,   (19)
which induces spin flips, quantum fluctuations or quantum tunneling, between the two
states σiz = 1 and σiz = −1, thus allowing a quantum search of the phase space. Ini-
tially the strength of the transverse field Γ(t) is chosen to be very large, and the total
Hamiltonian
H(t) = H_Ising + H_TF(t)   (20)
is dominated by the second kinetic term. This corresponds to the high-temperature
limit of SA. The coefficient Γ(t) is then gradually and monotonically decreased toward
0, leaving eventually only the potential term HIsing . Accordingly the state vector |ψ(t)i,
which follows the real-time Schrödinger equation, is expected to evolve from the trivial
initial ground state of the transverse-field term (19) to the non-trivial ground state of
(18), which is the solution of the optimization problem. An important issue is how slowly
we should decrease Γ(t) to keep the state vector arbitrarily close to the instantaneous
ground state of the total Hamiltonian (20). The following Theorem provides a solution
to this problem as a sufficient condition.
Theorem 2.2. The adiabaticity (17) for the transverse-field Ising model (20) yields the
time dependence of Γ(t) as

Γ(t) = a (δt + c)^{−1/(2N−1)}   (21)

for t > t_0 (for a given positive t_0) as a sufficient condition of convergence of QA. Here a
and c are constants of O(N⁰) and δ is a small parameter to control adiabaticity appearing
in (17).
The following Theorem, proved by Hopf [32], will be useful to prove Theorem 2.2. See
Appendix A for the proof.
Theorem 2.3. If all the elements of a square matrix M are strictly positive, Mij > 0,
its maximum eigenvalue λ0 and any other eigenvalues λ satisfy
|λ| ≤ [(κ − 1)/(κ + 1)] λ_0,   (22)

where κ is defined by

κ ≡ max_{i,j,k} (M_ik / M_jk).   (23)
Proof of Theorem 2.2. We show that the power decay (21) satisfies the adiabaticity
condition (17), which guarantees convergence to the ground state of H_Ising as t → ∞.
For this purpose we estimate the energy gap and the time derivative of the Hamiltonian.
As for the latter, it is straightforward to see
|⟨j(t)| (dH(t)/dt) |0(t)⟩| ≤ −N (dΓ(t)/dt),   (24)
since the time dependence of H(t) lies only in the kinetic term HTF (t), which has N
terms. Note that dΓ/dt is negative.
To estimate a lower bound for the energy gap, we apply Theorem 2.3 to the operator
M ≡ (E_+ − H(t))^N. We assume that the constant E_+ satisfies E_+ > E_max + Γ_0, where
Γ_0 ≡ Γ(t_0) and E_max is the maximum eigenvalue of the potential term H_Ising. All the
elements of the matrix M are strictly positive in the representation that diagonalizes {σ_i^z}
because E_+ − H(t) is non-negative and irreducible, that is, any state can be reached from
any other state within at most N steps.
For t > t_0, where Γ(t) < Γ_0, all the diagonal elements of E_+ − H(t) are larger than any
non-zero off-diagonal element Γ(t). Thus, the minimum element of M, which is between
two states having all the spins in mutually opposite directions, is equal to N!Γ(t)^N, where
N! comes from the number of permutations in which to flip the spins. Replacement of
H_TF(t) by −NΓ_0 shows that the maximum matrix element of M has the upper bound
(E_+ − E_min + NΓ_0)^N, where E_min is the lowest eigenvalue of H_Ising. Thus, we have
κ ≤ (E_+ − E_min + NΓ_0)^N / (N! Γ(t)^N).   (25)
If we denote the eigenvalues of H(t) by ε_j(t), (22) is rewritten as

[E_+ − ε_j(t)]^N ≤ [(κ − 1)/(κ + 1)] [E_+ − ε_0(t)]^N.   (26)
Substitution of (25) into the above inequality yields
Δ_j(t) ≥ {2[E_+ − ε_0(t)] N! / [N (E_+ − E_min + NΓ_0)^N]} Γ(t)^N ≡ A Γ(t)^N,   (27)

where we used 1 − [(κ − 1)/(κ + 1)]^{1/N} ≥ 2/[N(κ + 1)] for κ ≥ 1 and N ≥ 1. The coefficient
A is estimated using the Stirling formula as

A ≈ [2√(2πN) (E_+ − ε_0^max) / N] [N / (e(E_+ − E_min + NΓ_0))]^N,   (28)

where ε_0^max ≡ max_{t>t_0} ε_0(t). This expression implies that A is exponentially small for
large N.
Now, by combination of the above estimates (24) and (27), we find that the sufficient
condition for convergence for t > t0 is
−[N / (A² Γ(t)^{2N})] (dΓ(t)/dt) = δ ≪ 1,   (29)
where δ is an arbitrarily small constant. By integrating this differential equation, we
obtain (21).
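The integration step can be checked numerically. Assuming the power-law form Γ(t) = a(δt + c)^{−1/(2N−1)} with a^{2N−1} = N/((2N−1)A²), the left-hand side of (29) stays exactly at δ; the values of N, A, δ and c below are arbitrary illustrative choices:

```python
# Check that Gamma(t) = a (delta t + c)^(-1/(2N-1)), with
# a^(2N-1) = N / ((2N-1) A^2), satisfies the adiabaticity condition (29):
#   -(N / (A^2 Gamma^(2N))) dGamma/dt = delta,  independently of t.
N, A, delta, c = 5, 0.01, 1e-3, 1.0
a = (N / ((2 * N - 1) * A ** 2)) ** (1.0 / (2 * N - 1))

def gamma(t):
    return a * (delta * t + c) ** (-1.0 / (2 * N - 1))

for t in (1.0, 10.0, 1000.0):
    eps = 1e-4
    dgdt = (gamma(t + eps) - gamma(t - eps)) / (2 * eps)  # numerical dGamma/dt
    lhs = -N * dgdt / (A ** 2 * gamma(t) ** (2 * N))
    assert abs(lhs - delta) / delta < 1e-4
print("the schedule keeps the adiabaticity measure constant at delta")
```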
Remark. The asymptotic power decay of the transverse field guarantees that the excitation
probability is bounded by the arbitrarily small constant δ² at each instant. This
annealing schedule is not valid when Γ(t) is not sufficiently small because we evaluated
the energy gap for Γ(t) < Γ0 (t > t0 ). If we take the limit t0 → 0, Γ0 increases indefinitely
and the coefficient a in (21) diverges. Then the result (21) does not make sense. This is
the reason why a finite positive time t0 should be introduced in the statement of Theorem
2.2.
The second summation runs over appropriate pairs of sites that satisfy extensiveness of
the Hamiltonian. A recent numerical study shows the effectiveness of this type of quantum
kinetic energy [18]. The additional transverse interaction widens the instantaneous energy
gap between the ground state and the first excited state. Thus, it is expected that an
annealing schedule faster than (21) satisfies the adiabaticity condition. The following
Theorem supports this expectation.
Theorem 2.4. The adiabaticity for the quantum system HIsing + HTI (t) yields the time
dependence of Γ(t) for t > t0 as
The above result implies that additional non-zero off-diagonal elements of the Hamiltonian
accelerate the convergence of QA. It is thus interesting to consider the many-body
transverse interaction of the form
H_MTI(t) = −Γ_MTI(t) Π_{i=1}^N (1 + σ_i^x).   (32)
All the elements of H_MTI are equal to −Γ_MTI(t) in the representation that diagonalizes
σ_i^z. In this system, the following Theorem holds.
Theorem 2.5. The adiabaticity for the quantum system H_Ising + H_MTI(t) yields the time
dependence of Γ_MTI(t) for t > t_0 as

Γ_MTI(t) ∝ 2^{N−2}/(δt).   (33)
Proof. We define the strictly positive operator M = E_+ − H_Ising − H_MTI(t). The
maximum and minimum matrix elements of M are E_+ − E_min + Γ_MTI(t) and Γ_MTI(t),
respectively. Thus we have

κ = [E_+ − E_min + Γ_MTI(t)] / Γ_MTI(t),   (34)

(κ − 1)/(κ + 1) = (E_+ − E_min) / [E_+ − E_min + 2Γ_MTI(t)] ≥ 1 − 2Γ_MTI(t)/(E_+ − E_min),   (35)
where Ã is O(N⁰). Since the matrix element of the derivative of the Hamiltonian is
bounded as

|⟨j(t)| (dH(t)/dt) |0(t)⟩| ≤ −2^N (dΓ_MTI/dt),   (37)

we find that the sufficient condition for convergence with the many-body transverse
interaction is

−[2^N / (Ã² Γ_MTI(t)²)] (dΓ_MTI/dt) = δ ≪ 1.   (38)
Integrating this differential equation yields the annealing schedule (33).
t_MTI ≈ 2^{N−2}/(δǫ),   (40)

which again shows exponential dependence on N.
These exponential computational complexities do not come as a surprise because
Theorems 2.2, 2.4 and 2.5 all apply to any optimization problem written in the generic
form (18), which includes the worst cases of the most difficult problems. Similar arguments
apply to SA [34].
Another remark is on the comparison of Γ(t) (∝ t^{−1/(2N−1)}) in QA with T(t)
(∝ N/log(αt + 1)) in SA, which leads to the conclusion that the former schedule is faster than the latter.
The transverse-field coefficient Γ in a quantum system plays the same role qualitatively
and quantitatively as the temperature T does in a corresponding classical system at least
in the Hopfield model in a transverse field [35]. When the phase diagram is written in
terms of Γ and α (the number of embedded patterns divided by the number of neurons)
for the ground state of the model, the result has precisely the same structure as the T -α
phase diagram of the finite-temperature version of the Hopfield model without transverse
field. This example serves as a justification of the direct comparison of Γ and T at least
as long as the theoretical analyses of QA and SA are concerned.
where the sum runs over all configurations of Ising spins, i.e. over the values σ_i^z = σ_i (= ±1)
(∀i) taken by the z-components of the Pauli matrices. The symbol {σ_i} stands for
the set {σ_1, σ_2, ⋯, σ_N}.
An important element is the following Theorem.
Theorem 3.1. The thermal expectation value (42) is equal to the expectation value of Q
by the quantum wave function
|ψ(T)⟩ = Σ_{σ} e^{−βH/2} |{σ_i}⟩,   (43)

where |{σ_i}⟩ is the basis state diagonalizing each σ_i^z as σ_i. The sum runs over all such
possible assignments.
Assume T > 0. The wave function (43) is the ground state of the quantum Hamiltonian

H_q(T) = −χ Σ_j H_qj(T) ≡ −χ Σ_j (σ_j^x − e^{βH_j}),   (44)

where H_j is the sum of the terms of the Hamiltonian (41) involving site j,

H_j = −J_j σ_j^z − Σ_k J_jk σ_j^z σ_k^z − Σ_{kl} J_jkl σ_j^z σ_k^z σ_l^z − ⋯.   (45)
k kl
since the operator σ_j^x just changes the order of the above summation. It is also easy to
see that

σ_j^x e^{−βH/2} = e^{βH_j} e^{−βH/2} σ_j^x   (48)

because

σ_j^x e^{−βH/2} σ_j^x = e^{−β(H−H_j)/2} σ_j^x e^{−βH_j/2} σ_j^x = e^{−β(H−H_j)/2} e^{βH_j/2} = e^{βH_j} e^{−βH/2},   (49)

as both H and H_j are diagonal in the present representation and H − H_j does not include
σ_j^z, so [H − H_j, σ_j^x] = 0. We therefore have

H_qj(T)|ψ(T)⟩ = (σ_j^x − e^{βH_j}) Σ_{σ} e^{−βH/2} |{σ_i}⟩ = 0.   (50)
A few remarks are in order. In the high-temperature limit, the quantum Hamiltonian
is composed just of the transverse-field term,

H_q(T → ∞) = −Σ_j (σ_j^x − 1).   (51)
Correspondingly the ground-state wave function |ψ(T → ∞)⟩ is the simple summation
over all possible states with equal weight. In this way the thermal fluctuations in the
original classical system are mapped to the quantum fluctuations. The low-temperature
limit has, in contrast, the purely classical Hamiltonian

H_q(T ≈ 0) → χ Σ_j e^{βH_j},   (52)
and the ground state of Hq (T ≈ 0) is also the ground state of H as is apparent from
the definition (43). Hence the decrease of thermal fluctuations in SA is mapped to the
decrease of quantum fluctuations. As explained below, this correspondence allows us
to analyze the condition for quasi-equilibrium in the classical SA using the adiabaticity
condition for the quantum system.
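Theorem 3.1 and the identity (50) can be verified directly on a toy instance; the sketch below uses a 3-spin chain with arbitrary couplings and temperature (the overall factor −χ is dropped, as it does not affect the zero eigenvalue):

```python
import numpy as np
from itertools import product

# Verify on a 3-spin Ising chain that the Boltzmann-weighted state
# |psi(T)> = sum_sigma e^{-beta H/2} |sigma> is annihilated by each
# (sigma_j^x - e^{beta H_j}), i.e. by H_qj up to the factor -chi.
n, beta = 3, 0.7
J = [0.8, -1.3]                        # couplings J_{01}, J_{12} (arbitrary)
states = list(product([1, -1], repeat=n))

def H(s):                              # classical energy of configuration s
    return -sum(J[k] * s[k] * s[k + 1] for k in range(n - 1))

def Hj(s, j):                          # the terms of H involving site j
    return -sum(J[k] * s[k] * s[k + 1]
                for k in range(n - 1) if k == j or k + 1 == j)

psi = np.array([np.exp(-beta * H(s) / 2) for s in states])

for j in range(n):
    out = np.zeros_like(psi)
    for idx, s in enumerate(states):
        # sigma_j^x part: amplitude flows in from the spin-flipped state
        t = list(s); t[j] = -t[j]
        out[idx] += psi[states.index(tuple(t))]
        # diagonal part: -e^{beta H_j(s)} psi(s)
        out[idx] -= np.exp(beta * Hj(s, j)) * psi[idx]
    assert np.allclose(out, 0.0)
print("(sigma_j^x - e^{beta H_j}) |psi(T)> = 0 for every site j")
```

The cancellation rests on the relation ψ(flip_j s) = e^{βH_j(s)} ψ(s), which is exactly the operator identity (48).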
where a and c are N -independent positive constants, in the asymptotic limit of large N .
Proof. The analysis of Sec. 2.2.1 applies with the replacement of Γ(t) by χ = e^{−βp}
and with ε_0(t) = 0. This latter condition comes from H_q(T)|ψ(T)⟩ = 0. The condition
Γ(t) < Γ_0 (t > t_0) is unnecessary here because the off-diagonal element χ can always be
chosen smaller than the diagonal elements by adding a positive constant to the diagonal.
Equation (27) gives

Δ_j(t) ≥ A e^{−βpN}   (55)

and A satisfies, according to (28),

A ≈ b √(2πN) e^{−cN}.   (56)
Lemma 3.4. The matrix element of the derivative of Hq (T ), relevant to the adiabaticity
condition, satisfies
Proof. By differentiating the identity

H_q(T)|ψ(T)⟩ = 0   (58)

we find

[∂H_q(T)/∂T] |ψ(T)⟩ = −H_q(T) (∂/∂T)|ψ(T)⟩ = −[1/(2k_B T²)] H_q(T) H |ψ(T)⟩.   (59)

This relation immediately proves the Lemma if we notice that the ground-state energy of
H_q(T) is zero and therefore H_q(T)|ψ_1(T)⟩ = Δ(T)|ψ_1(T)⟩.
Proof of Theorem 3.2. The condition of adiabaticity for the quantum system H_q(T)
reads

[1 / (Δ(T)² √Z(T))] |⟨ψ_1(T)| ∂_T H_q(T) |ψ(T)⟩| |dT/dt| = δ   (61)

with sufficiently small δ. If we rewrite the matrix element by Lemma 3.4, the left-hand
side is

[|⟨ψ_1(T)| H |ψ(T)⟩| / (2k_B T² Δ(T) √Z(T))] |dT/dt|.   (62)

By replacing the numerator by its bound in Lemma 3.5 we have

[pN / (2k_B T² Δ(T))] |dT/dt| = δ̃ ≪ 1   (63)

as a sufficient condition for adiabaticity. Using the bound of Lemma 3.3 and integrating
the above differential equation for T(t), noticing dT/dt < 0, we reach the statement of
Theorem 3.2.
3.3 Remarks
Equation (53) reproduces the Geman-Geman condition for convergence of SA [33]. Their
method of proof is to use the theory of classical inhomogeneous (i.e. time-dependent)
Markov chain representing non-equilibrium processes. It may thus be naively expected
that the classical system under consideration may not stay close to equilibrium during
the process of SA since the temperature always changes. It therefore comes as a surprise
that the adiabaticity condition, which is equivalent to the quasi-equilibrium condition
according to Theorem 3.1, leads to Theorem 3.2. The rate of temperature change in this
latter Theorem is slow enough to guarantee the quasi-equilibrium condition even when
the temperature keeps changing.
Also, Theorem 3.2 is quite general, covering the worst cases, as it applies to any
system written as the Ising model of (41). This fact means that one may apply a faster
rate of temperature decrease to solve a given specific problem with small errors. The
same comment applies to the QA situation in Sec. 2.
Another remark is on the relation of QA and QAE. Mathematical analyses of QA
often focus on the generic convergence conditions in the infinite-time limit, as seen in
Secs. 2 and 5 as well as in the early paper [7], although the residual energy after
finite-time evolution has also been extensively investigated mainly in numerical studies.
This aspect may have led some researchers to think that QA is different from QAE,
since the studies using the latter mostly concern the computational complexity of finite-
time evolution for a given specific optimization problem using adiabaticity to construct
an algorithm of QAE. As has been shown in the present and the previous sections, the
adiabaticity condition also leads to the convergence condition in the infinite-time limit for
QA and SA. In this sense QA, QAE and even SA share essentially the same mathematical
background.
Figure 1: The annealing schedules f(s) = f_1(s), f_2(s), f_3(s), f_4(s) plotted against s ∈ [0, 1].
bound of the next order for the excitation probability under this assumption is obtained
as

|⟨j(1)|ψ̃(1)⟩|² ≲ (1/τ⁴) [A_j^{(2)}(0) + A_j^{(2)}(1)]² + O(τ⁻⁵).   (67)
It is easy to see that the τ⁻⁴ term also vanishes when H̃″(0) = H̃″(1) = 0. It is
straightforward to generalize these results to prove the following Theorem.
Theorem 4.1. If the kth derivative of H̃(s) is equal to zero at s = 0 and 1 for all
k = 1, 2, ⋯, m − 1, the excitation probability has the upper bound

|⟨j(1)|ψ̃(1)⟩|² ≲ (1/τ^{2m}) [A_j^{(m)}(0) + A_j^{(m)}(1)]² + O(τ^{−2m−1}).   (68)
Examples of the annealing schedules f_m(s) with the τ^{−2m} error rate are the following
polynomials:

f_1(s) = s,   (70)
f_2(s) = s²(3 − 2s),   (71)
f_3(s) = s³(10 − 15s + 6s²),   (72)
f_4(s) = s⁴(35 − 84s + 70s² − 20s³).   (73)

The linear annealing schedule f_1(s), which shows the τ⁻² error, has been used in past
studies. Although we here list only polynomials symmetric with respect to the point
s = 1/2, this is not essential. For example, f(s) = (1 − cos(πs²))/2 also has the τ⁻⁴ error
rate because f′(0) = f′(1) = f″(0) = 0 but f″(1) = −2π².
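The endpoint conditions of Theorem 4.1 for the polynomials (70)-(73) can be checked mechanically:

```python
import numpy as np

# The polynomial schedules (70)-(73), highest-degree coefficient first.
schedules = {
    1: np.poly1d([1, 0]),                           # s
    2: np.poly1d([-2, 3, 0, 0]),                    # s^2 (3 - 2s)
    3: np.poly1d([6, -15, 10, 0, 0, 0]),            # s^3 (10 - 15s + 6s^2)
    4: np.poly1d([-20, 70, -84, 35, 0, 0, 0, 0]),   # s^4 (35 - 84s + 70s^2 - 20s^3)
}
for m, f in schedules.items():
    assert np.isclose(f(0), 0) and np.isclose(f(1), 1)
    d = f
    for k in range(1, m):
        d = d.deriv()
        # f^(k) vanishes at both endpoints, the hypothesis of Theorem 4.1
        assert np.isclose(d(0), 0) and np.isclose(d(1), 0)
print("all f_m satisfy the endpoint conditions for the tau^(-2m) error rate")
```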
Figure 2: The annealing-time dependence of the excitation probability for the two-level
system (74) using schedules (70) to (73). The curved and straight lines show (75) and (76)
for each annealing schedule, respectively. The parameters in (74) are chosen to be h = 2 and
α = 0.2.
The initial state, namely the ground state of H_kin, is the all-up state along the x axis.
The difference between the obtained approximate energy and the true ground-state
energy (the exact solution) is the residual energy E_res. It is a useful measure of the error
rate of QA. It has the same behavior as the excitation probability because it is rewritten as
Figure 3: An instance of the two-dimensional spin glass model, with random couplings shown on the bonds.
Figure 4: The annealing-time dependence of the residual energy for the two-dimensional spin
glass model with improved annealing schedules. The solid lines denote functions proportional
to τ −2m (m = 1, 2, 3, 4). The parameter values are h = 0.1 and Γ = 1.
In the sense that the excitation probability by the adiabatic transition is equal to a small
constant at each time, it has the τ⁻² error rate. We show that annealing schedules with
the τ^{−2m} error rate can be constructed by a slight modification of their optimal schedule.
Let us consider the Hilbert space which has the basis states |i⟩ (i = 1, 2, ⋯, N), and
denote the marked state by |m⟩. Suppose that we can construct the Hamiltonian (69)
with the two terms

H_pot = 1 − |m⟩⟨m|,   (81)

H_kin = 1 − (1/N) Σ_{i=1}^N Σ_{j=1}^N |i⟩⟨j|.   (82)
The Hamiltonian H_pot can be applied without explicit knowledge of |m⟩, the same
assumption as in Grover's algorithm. The initial state is a superposition of all basis
states,

|ψ(0)⟩ = (1/√N) Σ_{i=1}^N |i⟩,   (83)
which does not depend on the marked state. The energy gap between the ground state
and the first excited state,

Δ_1(s) = √(1 − 4[(N − 1)/N] f(s)[1 − f(s)]),   (84)

has a minimum at f(s) = 1/2. The highest eigenvalue ε_2(s) = 1 is (N − 2)-fold degenerate.
To derive the optimal annealing schedule, we briefly review the results reported by
Roland and Cerf [43]. When the energy gap is small (i.e. for f (s) ≈ 1/2), non-adiabatic
transitions are likely to occur. Thus we need to change the Hamiltonian carefully. On
the other hand, when the energy gap is not very small, too slow a change wastes time.
Thus the speed of parameter change should be adjusted adaptively to the instantaneous
energy gap. This is realized by tuning the annealing schedule to satisfy the adiabaticity
condition (16) in each infinitesimal time interval, that is,
|A₁(s)|/τ = δ,  (85)
where δ is a small constant. In the database search problem, this condition is rewritten as

[√(N−1) / (τ N Δ₁(s)³)] (df/ds) = δ.  (86)
After integration under the boundary conditions f(0) = 0 and f(1) = 1, we obtain

f_opt(s) = 1/2 + (2s − 1) / ( 2√(N − (N−1)(2s−1)²) ).  (87)
As plotted by a solid line in Fig. 5, this function changes most slowly when the energy
gap takes the minimum value. It is noted that the annealing time is determined by the
small constant δ as

τ = √(N−1) / δ,  (88)

which means that the computation time is of order √N, similarly to Grover's algorithm.
The optimal annealing schedule (87) shows the τ^{−2} error rate because its derivative is non-vanishing at s = 0 and 1. It is easy to see from (87) that the simple replacement of s with f_m(s) fulfils the condition for the τ^{−2m} error rate. We carried out numerical
Figure 5: The optimal annealing schedules for the database search problem (N = 64). The
solid line denotes the original optimal schedule (87) and the dashed lines are for the modified
schedules.
Figure 6: The annealing-time dependence of the residual energy for the database search problem (N = 64) with the optimal annealing schedules described in Fig. 5. The solid lines represent functions proportional to τ^{−2m} (m = 1, 2, 3, 4).
simulations for N = 64 with such annealing schedules, f_opt^{(m)}(s) ≡ f_opt(f_m(s)), as plotted by dashed lines in Fig. 5. As shown in Fig. 6, the residual energy with f_opt^{(m)}(s) is proportional to τ^{−2m}. The characteristic time τ_c for the τ^{−2m} error rate to show up increases with m: Since the modified optimal schedule f_opt^{(m)}(s) has a steeper slope at s = 1/2 than f_opt(s), a longer annealing time is necessary to satisfy the adiabaticity condition (86). Nevertheless, the difference in slopes of f_opt^{(m)}(s) is only a factor of O(1), and therefore τ_c still scales as √N. A significant qualitative reduction of errors has been achieved without compromising computational complexity apart from a numerical factor.
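As a quick numerical sanity check, the closed form (87) and the gap (84) can be evaluated directly. The minimal Python sketch below (with N = 64, as in the figures; the finite-difference slope is an illustration, not part of the original analysis) verifies the boundary conditions imposed when integrating (86) and confirms that the schedule moves most slowly where the gap is smallest:

```python
import math

def f_opt(s, N):
    """Optimal annealing schedule (87) for the database-search problem."""
    u = 2.0 * s - 1.0
    return 0.5 + u / (2.0 * math.sqrt(N - (N - 1) * u * u))

def gap(f, N):
    """Energy gap (84) between the ground and first excited states."""
    return math.sqrt(1.0 - 4.0 * (N - 1) / N * f * (1.0 - f))

N = 64
# Boundary conditions f(0) = 0 and f(1) = 1 used in integrating (86):
assert abs(f_opt(0.0, N)) < 1e-12 and abs(f_opt(1.0, N) - 1.0) < 1e-12

# The gap is minimal at f = 1/2, where it equals 1/sqrt(N):
assert abs(gap(f_opt(0.5, N), N) - 1.0 / math.sqrt(N)) < 1e-12

# The schedule is slowest exactly at that point: df/ds(1/2) = 1/sqrt(N).
ds = 1e-6
slope_mid = (f_opt(0.5 + ds, N) - f_opt(0.5 - ds, N)) / (2 * ds)
assert abs(slope_mid - 1.0 / math.sqrt(N)) < 1e-4
```

Near the endpoints the slope grows to order N, which is why the schedule traverses the large-gap region quickly without violating adiabaticity.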
However, it is not obvious that this feature survives in the time-dependent situation.
An important aspect of the IT Schrödinger equation is non-unitarity. The norm of
the wave function is not conserved. Thus, we consider the normalized state vector
|ψ(t)⟩ ≡ ( 1 / √(⟨Ψ(t)|Ψ(t)⟩) ) |Ψ(t)⟩.  (91)
The above equation is not linear but norm-conserving, which makes the asymptotic expansion easy. In terms of the dimensionless time s = t/τ, the norm-conserving IT Schrödinger equation is written as

−(d/ds) |ψ̃(s)⟩ = τ [ H̃(s) − ⟨H̃(s)⟩ ] |ψ̃(s)⟩.  (94)
c₀(s) ≈ 1 + O(τ^{−2}),  (96)

c_{j≠0}(s) ≈ A_j(s)/τ + O(τ^{−2}).  (97)
Proof. The norm-conserving IT Schrödinger equation (94) is rewritten as the equation of motion for c_j(s),

dc_j/ds = Σ_{k≠j} [ c_k(s) / (ε_j(s) − ε_k(s)) ] ⟨j(s)| dH̃(s)/ds |k(s)⟩ − τ c_j(s) [ ε_j(s) − Σ_l ε_l(s) |c_l(s)|² ].  (98)
Since the norm of the wave function is conserved, Σ_l |c_l(s)|² = 1 and therefore

Σ_l ε_l(s) |c_l(s)|² = ε₀(s) + Σ_{l≠0} [ε_l(s) − ε₀(s)] |c_l(s)|².  (102)
Finally we obtain the integral equations for c_j(s):

c₀(s) = e^{τδ(s)} + e^{τδ(s)} ∫₀^s ds̃ e^{−τδ(s̃)} Σ_{l≠0} c_l(s̃) ⟨0(s̃)| dH̃/ds̃ |l(s̃)⟩ / ( ε₀(s̃) − ε_l(s̃) ),  (104)

c_{j≠0}(s) = e^{τδ(s)} e^{−τ[φ_j(s)−φ₀(s)]} ∫₀^s ds̃ e^{−τδ(s̃)} e^{τ[φ_j(s̃)−φ₀(s̃)]} Σ_{k≠j} c_k(s̃) ⟨j(s̃)| dH̃/ds̃ |k(s̃)⟩ / ( ε_j(s̃) − ε_k(s̃) ).  (105)
Strictly speaking, the right-hand sides of the above equations give an upper bound on the error rate for RT-QA, not the error rate itself. In some systems, for example the two-level system, the error rate oscillates because A_j(0) and A_j(1) may cancel in (11), and becomes smaller than that of IT-QA at some τ. However, QA for ordinary optimization problems has different energy levels at the initial and final times, and thus such a cancellation seldom occurs.
Figure 7: The annealing schedules defined in (111). fsq1 (s) and fsq2 (s) have a vanishing slope
at the initial time s = 0 and the final time s = 1, respectively.
Figure 8: The annealing-time dependence of the residual energy for IT- and RT-QA with annealing schedules fsq1(s) and fsq2(s). The system is the spin-glass model presented in Sec. 4.3.2. The solid lines stand for functions proportional to τ^{−2} and τ^{−4}. The parameters are h = 0.1 and Γ = 1.
The transition probability from state x to state y at time t is given by

G(y, x; t) = P(y, x) A(y, x; t)                  (x ≠ y),
G(y, x; t) = 1 − Σ_{z∈S} P(z, x) A(z, x; t)      (x = y),  (112)
where P(y, x) and A(y, x; t) are called the generation probability and the acceptance probability, respectively. The former is the probability to generate the next candidate state y from the present state x. We assume that this probability does not depend on time and satisfies the following conditions:

∀x, y ∈ S : P(y, x) = P(x, y) ≥ 0,  (113)

∀x ∈ S : P(x, x) = 0,  (114)

∀x ∈ S : Σ_{y∈S} P(y, x) = 1,  (115)

∀x, y ∈ S, ∃n > 0, ∃z₁, …, z_{n−1} ∈ S : Π_{k=0}^{n−1} P(z_{k+1}, z_k) > 0,  z₀ = x, z_n = y.  (116)
The last condition represents irreducibility of S, that is, any state in S can be reached
from any other state in S.
We define Sx as the neighborhood of x, i.e., the set of states that can be reached by
a single step from x:
Sx = {y | y ∈ S, P (y, x) > 0}. (117)
The acceptance probability A(y, x; t) is the probability to accept the candidate y gener-
ated from state x. The matrix G(t), whose (y, x) component is given by (112), [G(t)]y,x =
G(y, x; t), is called the transition matrix.
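To make the construction concrete, the sketch below builds the transition matrix (112) from a symmetric generation probability P and an acceptance probability A for a three-state toy chain (the numerical values are hypothetical illustrations, not from the paper), and checks that each column of G sums to one:

```python
import numpy as np

def transition_matrix(P, A):
    """Assemble the transition matrix (112): off-diagonal entries are
    generate-then-accept probabilities; the diagonal collects rejections."""
    G = P * A                                   # [G]_{y,x} = P(y,x) A(y,x;t), x != y
    np.fill_diagonal(G, 0.0)                    # P(x,x) = 0 by (114) anyway
    np.fill_diagonal(G, 1.0 - G.sum(axis=0))    # G(x,x;t) = 1 - sum_z P(z,x) A(z,x;t)
    return G

# Toy generation probability satisfying (113)-(116): symmetric, zero diagonal,
# columns summing to one, irreducible.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
# Arbitrary acceptance probabilities in [0, 1] for illustration:
A = np.array([[1.0, 0.3, 0.9],
              [1.0, 1.0, 0.2],
              [0.4, 1.0, 1.0]])
G = transition_matrix(P, A)
assert np.allclose(G.sum(axis=0), 1.0)   # each column of G is a probability vector
assert (G >= 0).all()
```

Here distributions are column vectors and G acts from the left, matching the convention [G(t)]_{y,x} = G(y, x; t) of the text.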
Let P denote the set of probability distributions on S. We regard a probability distribution p (∈ P) as the column vector with the component [p]_x = p(x). The probability distribution at time t, started from an initial distribution p₀ (∈ P) at time t₀, is written as

p(t, t₀) = G^{t,t₀} p₀ ≡ G(t − 1) G(t − 2) ··· G(t₀) p₀.  (118)
A Markov chain is called inhomogeneous when the transition probability depends
on time. In the following sections, we will prove that inhomogeneous Markov chains
associated with QA are ergodic under appropriate conditions. There are two kinds of
ergodicity, weak and strong. Weak ergodicity means that the probability distribution
becomes independent of the initial conditions after a sufficiently long time:

∀t₀ ≥ 0 : lim_{t→∞} sup{ ‖p(t, t₀) − p′(t, t₀)‖ | p₀, p′₀ ∈ P } = 0,  (119)

where p(t, t₀) and p′(t, t₀) are the probability distributions with different initial distributions p₀ and p′₀. The norm is defined by

‖p‖ = Σ_{x∈S} |p(x)|.  (120)

Strong ergodicity is the property that the probability distribution converges to a unique distribution irrespective of the initial state:

∃r ∈ P, ∀t₀ ≥ 0 : lim_{t→∞} sup{ ‖p(t, t₀) − r‖ | p₀ ∈ P } = 0.  (121)
The following two Theorems provide conditions for weak and strong ergodicity of an
inhomogeneous Markov chain [5]. For proofs see Appendix B.
Theorem 5.1 (Condition for weak ergodicity). An inhomogeneous Markov chain is weakly ergodic if and only if there exists a strictly increasing sequence of positive numbers {t_i} (i = 0, 1, 2, …) such that

Σ_{i=0}^{∞} [ 1 − α(G^{t_{i+1}, t_i}) ] → ∞.  (122)
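The coefficient of ergodicity α is defined in Appendix B, which is not reproduced here; the sketch below assumes the standard Dobrushin form for a column-stochastic matrix (an assumption on my part), under which 1 − α(G) measures how strongly one application of G contracts the distance between two distributions — the quantity whose divergent sum appears in (122):

```python
import numpy as np

def alpha(G):
    """Dobrushin coefficient of ergodicity of a column-stochastic matrix
    (assumed consistent with the paper's alpha): the largest total-variation
    distance between any two columns."""
    n = G.shape[1]
    return max(0.5 * np.abs(G[:, i] - G[:, j]).sum()
               for i in range(n) for j in range(n))

# Toy 3-state column-stochastic matrix: one step contracts the L1 distance
# between any two distributions by at least the factor alpha(G).
G = np.array([[0.6, 0.2, 0.1],
              [0.3, 0.5, 0.3],
              [0.1, 0.3, 0.6]])
p = np.array([1.0, 0.0, 0.0])
q = np.array([0.0, 0.0, 1.0])
d0 = np.abs(p - q).sum()
d1 = np.abs(G @ p - G @ q).sum()
assert d1 <= alpha(G) * d0 + 1e-12   # contraction property behind (119) and (122)
```

When the factors 1 − α(G^{t_{i+1},t_i}) stay large enough for their sum to diverge, the contraction is repeated often enough that the memory of the initial distribution dies out, which is exactly weak ergodicity.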
5.2 Path-integral Monte Carlo method
Let us first discuss convergence conditions for the implementation of quantum annealing
by the path-integral Monte Carlo (PIMC) method [24, 25]. The basic idea of PIMC
is to apply the Monte Carlo method to the classical system obtained from the original
quantum system by the path-integral formula. We first consider the example of ground
state search of the Ising spin system whose quantum fluctuations are introduced by adding
a transverse field. The total Hamiltonian is defined in (20). Although we only treat
the two-body interaction for simplicity in this section, the existence of arbitrary many-body interactions between the z components of Pauli matrices and of a longitudinal random magnetic field Σ_i h_i σ_i^z, in addition to the above Hamiltonian, would not change the following argument.
In the path-integral method, the d-dimensional transverse-field Ising model (TFIM)
is mapped to a (d + 1)-dimensional classical Ising system so that the quantum system
can be simulated on the classical computer. In numerical simulations, the Suzuki-Trotter
formula [23, 24] is usually employed to express the partition function of the resulting
classical system,
Z(t) ≈ Σ_{{σ_i^{(k)}}} exp( (β/M) Σ_{k=1}^{M} Σ_{⟨ij⟩} J_{ij} σ_i^{(k)} σ_j^{(k)} + γ(t) Σ_{k=1}^{M} Σ_{i=1}^{N} σ_i^{(k)} σ_i^{(k+1)} ),  (126)
where M is the length along the extra dimension (the Trotter number) and σ_i^{(k)} (= ±1) denotes a classical Ising spin at site i on the kth Trotter slice. The nearest-neighbour interaction between adjacent Trotter slices,

γ(t) = (1/2) log coth( βΓ(t)/M ),  (127)
is ferromagnetic. This approximation (126) becomes exact in the limit M → ∞ for a fixed
β = 1/kB T . The magnitude of this interaction (127) increases with time t and tends to
infinity as t → ∞, reflecting the decrease of Γ(t). We fix M and β to arbitrarily large values, which corresponds to the actual situation in numerical simulations. Therefore the
Theorem presented below does not directly guarantee the convergence of the system to
the true ground state, which is realized only after taking the limits M → ∞ and β → ∞.
We will rather show that the system converges to the thermal equilibrium represented
by the right-hand side of (126), which can be chosen arbitrarily close to the true ground
state by taking M and β large enough.
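A minimal sketch of the inter-slice coupling (127): as the transverse field Γ shrinks, γ(t) grows without bound and the Trotter replicas are forced to align. The parameter values below are arbitrary illustrations:

```python
import math

def gamma_coupling(Gamma, beta, M):
    """Ferromagnetic inter-slice coupling (127): (1/2) log coth(beta*Gamma/M)."""
    x = beta * Gamma / M
    return 0.5 * math.log(1.0 / math.tanh(x))   # coth(x) = 1/tanh(x)

beta, M = 10.0, 20                # illustrative values only
gs = [gamma_coupling(G, beta, M) for G in (1.0, 0.1, 0.01)]
# gamma increases monotonically as Gamma decreases toward zero:
assert gs[0] < gs[1] < gs[2]
```

In the limit Γ → 0 the coupling diverges logarithmically, so the classical replicas lock together and the (d + 1)-dimensional system reduces to the original d-dimensional classical problem.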
With the above example of TFIM in mind, it will be convenient to treat a more general
expression than (126),
Z(t) = Σ_{x∈S} exp( −F₀(x)/T₀ − F₁(x)/T₁(t) ).  (128)
Here F0 (x) is the cost function whose global minimum is the desired solution of the
combinatorial optimization problem. The temperature T0 is chosen to be sufficiently
small. The term F1 (x) derives from the kinetic energy, which is the transverse field in
the TFIM. Quantum fluctuations are tuned by the extra temperature factor T1 (t), which
decreases with time. The first term −F0 (x)/T0 corresponds to the interaction term in
the exponent of (126), and the second term −F1 (x)/T1 (t) generalizes the transverse-field
term in (126).
For the partition function (128), we define the acceptance probability of PIMC as

A(y, x; t) = g( q(y; t)/q(x; t) ),  (129)

q(x; t) = (1/Z(t)) exp( −F₀(x)/T₀ − F₁(x)/T₁(t) ).  (130)
This q(x; t) is the equilibrium Boltzmann factor at a given fixed T1 (t). The function g(u)
is the acceptance function, a monotone increasing function satisfying 0 ≤ g(u) ≤ 1 and
g(1/u) = g(u)/u for u ≥ 0. For instance, for the heat bath and the Metropolis methods,
we have
g(u) = u/(1 + u),  (131)

g(u) = min{1, u},  (132)
respectively. The conditions mentioned above for g(u) guarantee that q(x; t) satisfies
the detailed balance condition, G(y, x; t)q(x; t) = G(x, y; t)q(y; t). Thus, q(x; t) is the
stationary distribution of the homogeneous Markov chain defined by the transition matrix
G(t) with a fixed t. In other words, q(x; t) is the right eigenvector of G(t) with eigenvalue
1.
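The defining properties of the acceptance function — 0 ≤ g(u) ≤ 1, monotonicity, and g(1/u) = g(u)/u — are easy to check numerically for the heat-bath (131) and Metropolis (132) choices; the last property is what makes q(x; t) satisfy detailed balance:

```python
def g_heat_bath(u):
    """Heat-bath acceptance function (131)."""
    return u / (1.0 + u)

def g_metropolis(u):
    """Metropolis acceptance function (132)."""
    return min(1.0, u)

grid = [0.1, 0.3, 1.0, 2.5, 8.0]      # sample ratios q(y;t)/q(x;t)
for g in (g_heat_bath, g_metropolis):
    assert all(0.0 <= g(u) <= 1.0 for u in grid)            # bounded in [0, 1]
    assert all(g(a) <= g(b) for a, b in zip(grid, grid[1:]))  # monotone increasing
    assert all(abs(g(1.0 / u) - g(u) / u) < 1e-12 for u in grid)  # g(1/u) = g(u)/u
```

With g(1/u) = g(u)/u, the detailed balance G(y, x; t) q(x; t) = G(x, y; t) q(y; t) follows immediately from the symmetry (113) of the generation probability.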
We denote by d(y, x) the minimum number of steps necessary to make a transition from x to y. Using this notation we define the minimum, over the states in S \ S_m, of the maximum number of steps needed to reach any other state:

R = min{ max{ d(y, x) | y ∈ S } | x ∈ S \ S_m }.  (134)

Also, L₀ and L₁ stand for the maximum changes of F₀(x) and F₁(x), respectively, in a single step,

L₀ = max{ |F₀(x) − F₀(y)| : P(y, x) > 0, x, y ∈ S },  (135)

L₁ = max{ |F₁(x) − F₁(y)| : P(y, x) > 0, x, y ∈ S }.  (136)
Our main results are summarized in the following Theorem and Corollary.
Theorem 5.3 (Strong ergodicity of the system (128)). The inhomogeneous Markov chain generated by (129) and (130) is strongly ergodic and converges to the equilibrium state corresponding to the first term on the right-hand side of (130), exp(−F₀(x)/T₀), if

T₁(t) ≥ RL₁ / log(t + 2).  (137)
Corollary 5.4 (Strong ergodicity of QA-PIMC for TFIM). The inhomogeneous Markov
chain generated by the Boltzmann factor on the right-hand side of (126) is strongly ergodic
and converges to the equilibrium state corresponding to the first term on the right-hand
side of (126) if
Γ(t) ≥ (M/β) tanh^{−1}[ (t + 2)^{−2/RL₁} ].  (138)
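The bounds (137) and (138) are straightforward to evaluate; the sketch below uses hypothetical parameter values chosen purely for illustration. For large t, tanh⁻¹(x) ≈ x, so (138) reduces to a power-law decrease of the transverse field:

```python
import math

def T1_bound(t, R, L1):
    """Right-hand side of (137): T1(t) may decrease no faster than this."""
    return R * L1 / math.log(t + 2)

def Gamma_bound(t, M, beta, R, L1):
    """Right-hand side of (138) for the TFIM."""
    return (M / beta) * math.atanh((t + 2.0) ** (-2.0 / (R * L1)))

# Hypothetical parameters for illustration only:
M, beta, R, L1 = 20, 10.0, 4, 2.0
assert T1_bound(1000, R, L1) < T1_bound(0, R, L1)           # inverse-log decay
assert Gamma_bound(1000, M, beta, R, L1) < Gamma_bound(0, M, beta, R, L1)
# Large-t check: atanh(x) ~ x, so (138) approaches (M/beta)(t+2)^(-2/(R L1)).
t = 10**6
approx = (M / beta) * (t + 2.0) ** (-2.0 / (R * L1))
assert abs(Gamma_bound(t, M, beta, R, L1) - approx) / approx < 1e-3
```

The contrast with simulated annealing is visible here: the temperature-like parameter T₁(t) must decrease only logarithmically, while the equivalent transverse-field schedule Γ(t) follows a power law.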
Proof of Lemma 5.5. The first part of Lemma 5.5 is proved straightforwardly. Equation (140) follows directly from the definition of the transition probability and the property of the acceptance function g. When q(y; t)/q(x; t) < 1, we have

G(y, x; t) ≥ w [q(y; t)/q(x; t)] g( q(x; t)/q(y; t) ) ≥ w g(1) exp( −L₀/T₀ − L₁/T₁(t) ).  (143)

On the other hand, if q(y; t)/q(x; t) ≥ 1,

G(y, x; t) ≥ w g(1) ≥ w g(1) exp( −L₀/T₀ − L₁/T₁(t) ),  (144)

where we used the fact that both L₀ and L₁ are positive.
Next, we prove (141). Since x is not a member of S_m, there exists a state y ∈ S_x such that F₁(y) − F₁(x) > 0. For such a state y,

lim_{t→∞} g( exp( −[F₀(y) − F₀(x)]/T₀ − [F₁(y) − F₁(x)]/T₁(t) ) ) = 0,  (145)

because T₁(t) tends to zero as t → ∞ and 0 ≤ g(u) ≤ u. Thus, for all ε > 0, there exists t₁ > 0 such that

∀t > t₁ : g( exp( −[F₀(y) − F₀(x)]/T₀ − [F₁(y) − F₁(x)]/T₁(t) ) ) < ε.  (146)
We therefore have

Σ_{z∈S} P(z, x) A(z, x; t) = P(y, x) A(y, x; t) + Σ_{z∈S\{y}} P(z, x) A(z, x; t)
  < P(y, x) ε + Σ_{z∈S\{y}} P(z, x),

and consequently,

G(x, x; t) > (1 − ε) P(y, x) > 0.  (148)
Since the right-hand side of (141) can be arbitrarily small for sufficiently large t, we obtain
the second part of Lemma 5.5.
Proof of weak ergodicity implied in Theorem 5.3. Let us introduce the state

x* = arg min{ max{ d(y, x) | y ∈ S } | x ∈ S \ S_m }.  (149)

Comparison with the definition of R in (134) shows that the state x* is reachable by at most R transitions from any state.
Now, consider the transition probability from an arbitrary state x to x*. From the definitions of R and x*, there exists at least one transition route within R steps:

x ≡ x₀ ≠ x₁ ≠ x₂ ≠ ··· ≠ x_l = x_{l+1} = ··· = x_R ≡ x*.
Then Lemma 5.5 yields that, for sufficiently large t, the transition probability at each time step has the following lower bound:

G(x_{i+1}, x_i; t − R + i) ≥ w g(1) exp( −L₀/T₀ − L₁/T₁(t − R + i) ),  (150)

where we eliminate the sum over z in (123) by replacing it with a single term for z = x*.
We now substitute the annealing schedule (137). Then weak ergodicity is immediately proved from Theorem 5.1 because we obtain

Σ_{k=1}^{∞} [ 1 − α(G^{kR, kR−R}) ] ≥ w^R g(1)^R exp(−RL₀/T₀) Σ_{k=k₀}^{∞} 1/(kR + 1) → ∞.  (153)
Proof of Theorem 5.3. To prove strong ergodicity, we refer to Theorem 5.2. The condition 1 has already been proved. As has been mentioned, the Boltzmann factor (130) satisfies q(t) = G(t)q(t), which is the condition 2. Thus the proof will be complete if we prove the condition 3 by setting p_t = q(t). For this purpose, we first prove that q(x; t) is monotonic for large t:

q(x; t) = A(x) / [ B + Σ_{y∈S\S₁^{min}} exp(−Δ(y)/T₁(t)) A(y) ].  (158)

Since Δ(y) > 0 by definition, the denominator decreases with time. Thus, we obtain (154).
To prove (155), we consider the derivative of q(x; t) with respect to T₁(t),

∂q(x; t)/∂T₁(t) = A(x) [ B Δ(x) + Σ_{y∈S\S₁^{min}} (F₁(x) − F₁(y)) exp(−Δ(y)/T₁(t)) A(y) ] / { T₁(t)² exp(Δ(x)/T₁(t)) [ B + Σ_{y∈S\S₁^{min}} exp(−Δ(y)/T₁(t)) A(y) ]² }.  (159)

Only F₁(x) − F₁(y) in the numerator has the possibility of being negative. However, the first term BΔ(x) is larger than the second one for sufficiently large t because exp(−Δ(y)/T₁(t)) tends to zero as T₁(t) → 0. Thus there exists t₁ > 0 such that ∂q(x; t)/∂T₁(t) > 0 for all t > t₁. Since T₁(t) is a decreasing function of t, we have (155).
Consequently, for all t > t₁, we have

‖q(t+1) − q(t)‖ = Σ_{x∈S₁^{min}} [q(x; t+1) − q(x; t)] − Σ_{x∉S₁^{min}} [q(x; t+1) − q(x; t)]
  = 2 Σ_{x∈S₁^{min}} [q(x; t+1) − q(x; t)],  (160)

where we used ‖q(t)‖ = Σ_{x∈S₁^{min}} q(x; t) + Σ_{x∉S₁^{min}} q(x; t) = 1. We then obtain

Σ_{t=t₁}^{∞} ‖q(t+1) − q(t)‖ = 2 Σ_{x∈S₁^{min}} [q(x; ∞) − q(x; t₁)] ≤ 2‖q(∞)‖ = 2.  (161)
Therefore q(t) satisfies the condition 3:

Σ_{t=0}^{∞} ‖q(t+1) − q(t)‖ = Σ_{t=0}^{t₁−1} ‖q(t+1) − q(t)‖ + Σ_{t=t₁}^{∞} ‖q(t+1) − q(t)‖ < ∞,

since the first sum contains a finite number of bounded terms and the second is bounded by (161).
where b is a positive constant. We have to restrict ourselves to the case q > 1 for a technical reason, as was the case previously [34]. We do not reproduce the proof here because it is quite straightforward to generalize the discussion for Theorem 5.3 in combination with the argument of [34]. The result (165) applied to the TFIM is that, if the annealing schedule asymptotically satisfies

Γ(t) ≥ (M/β) exp( −2(t + 2)^c / b ),  (166)
the inhomogeneous Markov chain is weakly ergodic. Notice that this annealing schedule
is faster than the power law of (139). We have been unable to prove strong ergodicity
because we could not identify the stationary distribution for a fixed T1 (t) in the present
case.
5.2.3 Continuous systems
In the above analyses we treated systems with discrete degrees of freedom. Theorem 5.3
does not apply directly to a continuous system. Nevertheless, by discretization of the
continuous space we obtain the following result.
Let us consider a system of N distinguishable particles in a continuous space of finite
volume with the Hamiltonian
H = [1/(2m(t))] Σ_{i=1}^{N} p_i² + V({r_i}).  (167)
The mass m(t) controls the magnitude of quantum fluctuations. The goal is to find the
minimum of the potential term, which is achieved by a gradual increase of m(t) to infinity
according to the prescription of QA. After discretization of the continuous space (which
is necessary anyway in any computer simulations with finite precision) and an application
of the Suzuki-Trotter formula, the equilibrium partition function acquires the following
expression in the representation that diagonalizes the spatial coordinates,

Z(t) ≈ Tr exp( −(β/M) Σ_{k=1}^{M} V({r_i^{(k)}}) − [M m(t)/(2β)] Σ_{k=1}^{M} Σ_{i=1}^{N} (r_i^{(k+1)} − r_i^{(k)})² ),  (168)

with the unit ħ = 1. Theorem 5.3 is applicable to this system under the identification
of T1 (t) with m(t)−1 . We therefore conclude that a logarithmic increase of the mass
suffices to guarantee strong ergodicity of the potential-minimization problem under spatial
discretization.
The coefficient corresponding to the numerator of the right-hand side of (137) is estimated as

RL₁ ≈ M²NL²/β,  (169)

where L denotes the maximum value of |r_i^{(k+1)} − r_i^{(k)}|. To obtain this coefficient, let us consider two extremes. One is that any state is reachable in one step. By definition, R = 1 and L₁ ≈ M²NL²/β, which yields (169). The other case is that only one particle can move to the nearest-neighbor point at one time step. With a (≪ L) denoting the lattice spacing, we have

L₁ ≈ (M/(2β)) [ L² − (L − a)² ] ≈ MLa/β.  (170)

Since the number of steps needed to reach any configuration is estimated as R ≈ NML/a, we again obtain (169).
where T is the time-ordering operator. The right-hand side can be decomposed into a product of small-time evolutions,

|ψ(t)⟩ = lim_{n→∞} Ĝ₀(t_{n−1}) Ĝ₀(t_{n−2}) ··· Ĝ₀(t₁) Ĝ₀(t₀) |ψ₀⟩,  (172)
where tk = k∆t, ∆t = t/n and Ĝ0 (t) = 1−∆t·H(t). In the GFMC, one approximates the
right-hand side of this equation by a product with large but finite n and replaces Ĝ0 (t)
with Ĝ1 (t) = 1 − ∆t(H(t) − ET ), where ET is called the reference energy to be taken
approximately close to the final ground-state energy. This subtraction of the reference
energy simply adjusts the standard of energy and changes nothing physically. However,
practically, this term is important to keep the matrix elements positive and to accelerate
convergence to the ground state as will be explained shortly.
To realize the process of (172) by a stochastic method, we rewrite this equation in a recursive form,

ψ_{k+1}(y) = Σ_x Ĝ₁(y, x; t_k) ψ_k(x),  (173)

where ψ_k(x) = ⟨x|ψ_k⟩ and |x⟩ denotes a basis state. The matrix element of the Green's function is given by

Ĝ₁(y, x; t) = ⟨y| ( 1 − Δt [H(t) − E_T] ) |x⟩.  (174)
Equation (173) looks similar to a Markov process but is significantly different in several ways. An important difference is that the Green's function is not normalized, Σ_y Ĝ₁(y, x; t) ≠ 1. In order to avoid this problem, one decomposes the Green's function into a normalized probability G₁ and a weight w,

Ĝ₁(y, x; t) = w(x; t) G₁(y, x; t),  (175)

where

G₁(y, x; t) ≡ Ĝ₁(y, x; t) / Σ_y Ĝ₁(y, x; t),   w(x; t) ≡ Ĝ₁(y, x; t) / G₁(y, x; t).  (176)
Thus, using (173), the wave function at time t is written as

ψ_n(y) = Σ_{{x_k}} δ_{y,x_n} w(x_{n−1}; t_{n−1}) w(x_{n−2}; t_{n−2}) ··· w(x₀; t₀)
  × G₁(x_n, x_{n−1}; t_{n−1}) G₁(x_{n−1}, x_{n−2}; t_{n−2}) ··· G₁(x₁, x₀; t₀) ψ₀(x₀).  (177)
The algorithm of GFMC is based on this formula and is defined by a weighted random walk in the following sense. One first prepares an arbitrary initial wave function ψ₀(x₀), all elements of which are non-negative. A random walker is generated, which sits initially (t = t₀) at the position x₀ with a probability proportional to ψ₀(x₀). Then the walker moves to a new position x₁ following the transition probability G₁(x₁, x₀; t₀). This probability should therefore be chosen non-negative by choosing parameters appropriately, as described later. Simultaneously, the weight of this walker is updated by the rule W₁ = w(x₀; t₀)W₀ with W₀ = 1. This stochastic process is repeated up to t = t_{n−1}. One actually prepares M independent walkers and lets them follow the above process. Then, according to (177), the wave function ψ_n(y) is approximated by the distribution of walkers at the final step weighted by W_n,

ψ_n(y) = lim_{M→∞} (1/M) Σ_{i=1}^{M} W_n^{(i)} δ_{y, x_n^{(i)}},  (178)
As noted above, G1 (y, x; t) should be non-negative, which is achieved by choosing
sufficiently small ∆t (i.e. sufficiently large n) and selecting ET within the instantaneous
spectrum of the Hamiltonian H(t). In particular, when ET is close to the instantaneous
ground-state energy of H(t) for large t (i.e. the final target energy), Ĝ1 (x, x; t) is close
to unity whereas other matrix components of Ĝ1 (t) are small. Thus, by choosing ET this
way, one can accelerate convergence of GFMC to the optimal state in the last steps of
the process.
If we apply this general framework to the TFIM in the σ^z-diagonal basis, the matrix elements of the Green's function are immediately calculated as

Ĝ₁(y, x; t) = 1 − Δt [E₀(x) − E_T]   (x = y),
Ĝ₁(y, x; t) = Δt Γ(t)                (x and y differ by a single-spin flip),  (179)
Ĝ₁(y, x; t) = 0                      (otherwise),

where E₀(x) = ⟨x| −Σ_{ij} J_{ij} σ_i^z σ_j^z |x⟩. One should choose Δt and E_T such that 1 − Δt(E₀(x) − E_T) ≥ 0 for all x. Since w(x; t) = Σ_y Ĝ₁(y, x; t), the weight is given by

w(x; t) = 1 − Δt [E₀(x) − E_T] + N Δt Γ(t).  (180)
One can decompose this transition probability into the generation probability and the acceptance probability as in (112):

P(y, x) = 1/N   (single-spin flip),
P(y, x) = 0     (otherwise),  (181)

A(y, x; t) = N Δt Γ(t) / ( 1 − Δt [E₀(x) − E_T] + N Δt Γ(t) ).  (182)
We shall analyze the convergence properties of stochastic processes under these probabilities for the TFIM.
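A minimal single-walker sketch of this stochastic process for a small, hypothetical TFIM instance, using the generation probability (181), the acceptance probability (182), and the weight (180); the couplings, Δt, E_T and the decreasing Γ(t) below are illustrative choices only, picked so that all matrix elements (179) stay non-negative:

```python
import random

def gfmc_step(x, J, Gamma_t, dt, E_T):
    """One step of the weighted random walk for the TFIM.
    x: list of +/-1 spins; J: dict {(i, j): J_ij}. Returns (new x, weight)."""
    N = len(x)
    E0 = -sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())
    w = 1.0 - dt * (E0 - E_T) + N * dt * Gamma_t   # weight (180)
    A = N * dt * Gamma_t / w                       # acceptance (182)
    y = list(x)
    i = random.randrange(N)                        # generation (181): uniform spin choice
    if random.random() < A:
        y[i] = -y[i]                               # accepted single-spin flip
    return y, w

random.seed(1)
# Hypothetical 3-spin ferromagnetic chain for illustration:
J = {(0, 1): 1.0, (1, 2): 1.0}
x, W = [1, -1, 1], 1.0
for k in range(200):
    Gamma_t = 1.0 / (1.0 + 0.1 * k)   # some decreasing transverse field
    x, w = gfmc_step(x, J, Gamma_t, dt=0.05, E_T=-2.0)
    W *= w                            # accumulated walker weight, W_n in (178)
```

In a full simulation many such walkers are run in parallel, and their weighted final positions approximate ψ_n(y) through (178).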
Proof of Lemma 5.7. The first part of Lemma 5.7 is trivial because the transition
probability is an increasing function with respect to E0 (x) when P (y, x) > 0 as seen in
(182). Next, we prove the second part of Lemma 5.7. According to (179) and (180),
G₁(x, x; t) is written as

G₁(x, x; t) = 1 − N Δt Γ(t) / ( 1 − Δt [E₀(x) − E_T] + N Δt Γ(t) ).  (187)
Since the transverse field Γ(t) decreases to zero with time, the second term on the right-hand side tends to zero as t → ∞. Thus, for any ε > 0 there exists t₁ > 0 such that G₁(x, x; t) > 1 − ε for all t > t₁. On the other hand, the right-hand side of (185) converges to zero as t → ∞. We therefore have (185).
Proof of Theorem 5.6. We show that the condition (183) is sufficient to satisfy the three conditions of Theorem 5.2.
1. From Lemma 5.7, we obtain a bound on the coefficient of ergodicity for sufficiently large k as

1 − α(G₁^{kN, kN−N}) ≥ [ Δt Γ(kN − 1) / ( 1 − Δt (E_min − E_T) + N Δt Γ(kN − 1) ) ]^N,  (188)

in the same manner as we derived (152), where we used R = N. Substituting the annealing schedule (183), we can prove weak ergodicity from Theorem 5.1 because

Σ_{k=1}^{∞} [ 1 − α(G₁^{kN, kN−N}) ] ≥ Σ_{k=k₀}^{∞} b′^N / (kN)^{cN} → ∞.  (189)
Thus, we have

Σ_{x∈S} G₁(y, x; t) q(x; t) = [ 1 − N Δt Γ(t)/w(y; t) ] (w(y; t)/A) + Σ_{x∈S_y} [ Δt Γ(t)/w(x; t) ] (w(x; t)/A)
  = q(y; t) − N Δt Γ(t)/A + (Δt Γ(t)/A) Σ_{x∈S_y} 1,  (192)

where we used Tr Σ_{ij} J_{ij} σ_i^z σ_j^z = 0. Since the volume of S_y is N, (192) indicates that q(x; t) is the stationary distribution of G₁(y, x; t). The right-hand side of (190) is easily derived from the above equation.
3. Since the transverse field Γ(t) decreases monotonically with t, the above stationary distribution q(x; t) is an increasing function of t if E₀(x) < 0 and a decreasing one if E₀(x) ≥ 0. Consequently, using the same procedure as in (160), we have

‖q(t+1) − q(t)‖ = 2 Σ_{E₀(x)<0} [ q(x; t+1) − q(x; t) ],  (194)

and thus

Σ_{t=0}^{∞} ‖q(t+1) − q(t)‖ = 2 Σ_{E₀(x)<0} [ q(x; ∞) − q(x; 0) ] ≤ 2.  (195)

Therefore the sum Σ_{t=0}^{∞} ‖q(t+1) − q(t)‖ is finite, which completes the proof of the condition 3.
Remark. Theorem 5.6 asserts convergence of the distribution of random walkers to the
equilibrium distribution (190) with Γ(t) → 0. This implies that the final distribution
is not delta-peaked at the ground state with minimum E0 (x) but is a relatively mild
function of this energy. The optimality of the solution is achieved after one takes the
weight factor w(x; t) into account: The repeated multiplication of weight factors as in
(177), in conjunction with the relatively mild distribution coming from the product of
G1 as mentioned above, leads to the asymptotically delta-peaked wave function ψn (y)
because w(x; t) is larger for smaller E0 (x) as seen in (180).
which is equal to Ĝ₀(t) to the order Δt. The matrix element of Ĝ₂(t) in the σ^z-diagonal basis is

Ĝ₂(y, x; t) = cosh^N(Δt Γ(t)) tanh^δ(Δt Γ(t)) e^{−Δt E₀(x)},  (197)

where δ is the number of spins that differ between x and y. According to the scheme of GFMC, we decompose Ĝ₂(y, x; t) into the normalized transition probability and the weight:

G₂(y, x; t) = [ cosh(Δt Γ(t)) / e^{Δt Γ(t)} ]^N tanh^δ(Δt Γ(t)),  (198)

w₂(x; t) = e^{Δt N Γ(t)} e^{−Δt E₀(x)}.  (199)

It is remarkable that the transition probability G₂ is independent of E₀(x), although it depends on x through δ. Thus, the stationary distribution of the random walk is uniform. This property is lost if one interchanges the order of the two factors in (196).
The property of strong ergodicity can be shown to hold in this case as well:
Theorem 5.8 (Strong ergodicity of QA-GFMC 2). The inhomogeneous Markov chain generated by (198) is strongly ergodic if

Γ(t) ≥ −(1/(2Δt)) log( 1 − 2b(t + 1)^{−1/N} ).  (200)

Remark. For sufficiently large t, the above annealing schedule reduces to

Γ(t) ≥ b / ( Δt (t + 1)^{1/N} ).  (201)
Since the proof is quite similar to the previous cases, we just outline the idea of the proof. The transition probability G₂(y, x; t) becomes smallest when δ = N. Consequently, the coefficient of ergodicity is estimated as

1 − α(G₂^{t+1,t}) ≥ [ ( 1 − e^{−2Δt Γ(t)} ) / 2 ]^N.

We note that R is equal to 1 in the present case because any state is reachable from an arbitrary state in a single step. From Theorem 5.1, the condition

[ ( 1 − e^{−2Δt Γ(t)} ) / 2 ]^N ≥ b′ / (t + 1)  (202)
is sufficient for weak ergodicity. From this, one obtains (200). Since the stationary
distribution of G2 (y, x; t) is uniform as mentioned above, strong ergodicity readily follows
from Theorem 5.2.
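The normalization of (198) and the uniformity of its stationary distribution can be verified by brute force on a small system: summing over Hamming distances gives Σ_δ C(N, δ) tanh^δ(x) = (1 + tanh x)^N, and cosh(x)(1 + tanh x) = e^x, so the prefactor cancels exactly. A sketch (system size and parameters are arbitrary illustrations):

```python
import itertools
import math

def G2(y, x, Gamma_t, dt):
    """Transition probability (198); depends on (x, y) only through the
    number delta of differing spins."""
    N = len(x)
    delta = sum(a != b for a, b in zip(x, y))
    pref = (math.cosh(dt * Gamma_t) / math.exp(dt * Gamma_t)) ** N
    return pref * math.tanh(dt * Gamma_t) ** delta

N, dt, Gamma_t = 4, 0.1, 0.7
states = list(itertools.product([1, -1], repeat=N))

# Each column is exactly normalized:
total = sum(G2(y, states[0], Gamma_t, dt) for y in states)
assert abs(total - 1.0) < 1e-12

# Every column contains the same multiset of entries (only delta matters),
# so the uniform distribution over the 2^N states is stationary.
col_a = sorted(G2(y, states[0], Gamma_t, dt) for y in states)
col_b = sorted(G2(y, states[7], Gamma_t, dt) for y in states)
assert col_a == col_b
```

This independence of G₂ from E₀(x) is what pushes all the energy dependence into the weight w₂ of (199), as emphasized in the text.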
Similarly to the case of PIMC, we can discuss the convergence condition of QA-GFMC in systems with continuous degrees of freedom. The resulting sufficient condition is a logarithmic increase of the mass, as will be shown now. The operator Ĝ₂ generated by the Hamiltonian (167) is written as

Ĝ₂(t) = exp( −[Δt/(2m(t))] Σ_{i=1}^{N} p_i² ) e^{−Δt V({r_i})},  (203)
where x and y represent {r_i} and {r′_i}, respectively. Summation over y, i.e. integration over {r′_i}, yields the weight w(x; t), from which the transition probability is obtained:

w(x; t) ∝ e^{−Δt V({r_i})},  (205)

G₂(y, x; t) ∝ exp( −[m(t)/(2Δt)] Σ_{i=1}^{N} |r′_i − r_i|² ).  (206)
The lower bound for the transition probability depends exponentially on the mass, G₂(y, x; t) ≥ e^{−Cm(t)}. Since 1 − α(G₂^{t+1,t}) has the same lower bound, the sufficient condition for weak ergodicity is e^{−Cm(t)} ≥ (t + 1)^{−1}, which is rewritten as

m(t) ≤ C^{−1} log(t + 1).  (207)
The constant C is proportional to NL²/Δt, where L denotes the maximum value of |r′ − r|. The derivation of C is similar to that of (169), because G₂(t) allows a transition to an arbitrary state in a single time step.
6 Summary and perspective
In this paper we have studied the mathematical foundation of quantum annealing, in
particular the convergence conditions and the reduction of residual errors. In Sec. 2, we
have seen that the adiabaticity condition of the quantum system representing quantum
annealing leads to the convergence condition, i.e. the condition for the system to reach
the solution of the classical optimization problem as t → ∞ following the real-time
Schrödinger equation. The result shows an asymptotic power-law decrease of the transverse field as the condition for convergence. This rate of decrease of the control parameter is faster than the logarithmic rate of temperature decrease required for convergence of SA. It nevertheless does not imply a qualitative reduction of computational complexity from classical SA to QA. Our method deals with a very generic system that covers most of the interesting problems, including worst instances of difficult problems, for which a drastic reduction of computational complexity is hard to expect.
Section 3 reviews the quantum-mechanical derivation of the convergence condition of SA using the classical-quantum mapping without an extra dimension in the quantum system. The adiabaticity condition for the quantum system has been shown to be equivalent to the quasi-equilibrium condition for the classical system at finite temperature, reproducing the well-known convergence condition of SA. The adiabaticity condition thus leads to the convergence condition of both QA and SA. Since studies of QAE often exploit the adiabaticity condition to derive the computational complexity of a given problem, adiabaticity may be seen as a versatile tool traversing QA, SA and QAE.
Section 4 is for the reduction of residual errors after finite-time quantum evolution of
real- and imaginary-time Schrödinger equations. This is a different point of view from
the usual context of QAE, where the issue is to reduce the evolution time (computational
complexity) with the residual error fixed to a given small value. It has been shown that the residual error can become significantly smaller through an ingenious choice of the time dependence of coefficients in the quantum Hamiltonian. This idea allows us to reduce the residual error of any given QAE-based algorithm without compromising the computational complexity apart from a possibly moderate numerical factor.
In Sec. 5 we have derived the convergence condition of QA implemented by quantum Monte Carlo simulations of the path-integral and Green's function methods. These approaches bear important practical significance because only stochastic methods allow us to treat large problems of practical size on the classical computer. A highly non-trivial result in
this section is that the convergence condition for the stochastic methods is essentially
the same power-law decrease of the transverse-field term as in the Schrödinger dynamics
of Sec. 2. This is surprising since the Monte Carlo (stochastic) dynamics is completely
different from the Schrödinger dynamics. Something deep may lie behind this coincidence
and it should be an interesting target of future studies.
The results presented and/or reviewed in this paper serve as the mathematical foun-
dation of QA. We have also stressed the similarity/equivalence of QA and QAE. Even
the classical SA can be viewed within the same framework of quantum adiabaticity as far
as the convergence conditions are concerned. Since the studies of very generic properties
of QA seem to have been almost completed, fruitful future developments would lie in the
investigations of problems specific to each case of optimization task by analytical and
numerical methods.
Acknowledgement
We thank G. E. Santoro, E. Tosatti and S. Suzuki for discussions and G. Ortiz for
useful comments and correspondence on the details of the proofs of Theorems in Sec. 3.
Financial support by CREST (JST), DEX-SMI and JSPS is gratefully acknowledged.
A Hopf's inequality
In this Appendix, we prove the inequality (22). Although Hopf [32] originally proved this
inequality for positive linear integral operators, we concentrate on a square matrix for
simplicity.
Let M be a strictly positive m × m matrix. The strict positivity means that all the
elements of M are positive, namely, Mij > 0 for all i, j, which will be denoted by M > 0.
Similarly, M ≥ 0 means that Mij ≥ 0 for all i, j. We use the same notation for a vector,
that is, v > 0 means that all the elements vi are positive.
The product of the matrix M and an m-element column vector v is denoted as usual
by M v and its ith element is
(M v)_i = \sum_{j=1}^{m} M_{ij} v_j . \qquad (208)
M v > 0 \quad \text{if } v \ge 0,\ v \ne 0. \qquad (209)
For any p > 0 and any real-valued v, this condition implies
\min_i \frac{v_i}{p_i} \le \min_i \frac{(M v)_i}{(M p)_i} \le \max_i \frac{(M v)_i}{(M p)_i} \le \max_i \frac{v_i}{p_i}, \qquad (210)
because
(M v)_i - \min_i \frac{v_i}{p_i} (M p)_i = \sum_{j=1}^{m} M_{ij} \left( v_j - \min_i \frac{v_i}{p_i}\, p_j \right) \ge 0, \qquad (211)
\max_i \frac{v_i}{p_i} (M p)_i - (M v)_i = \sum_{j=1}^{m} M_{ij} \left( \max_i \frac{v_i}{p_i}\, p_j - v_j \right) \ge 0. \qquad (212)
The above inequality implies that the difference between maximum and minimum of
(M v)i /(M p)i is smaller than that of vi /pi . Following [32], we use the notation,
\operatorname{osc}_i \frac{v_i}{p_i} \equiv \max_i \frac{v_i}{p_i} - \min_i \frac{v_i}{p_i}. \qquad (213)
This assumption can be rewritten in the product form as
\frac{(M v)_i}{(M v)_j} \le \kappa, \qquad v \ge 0,\ v \ne 0, \qquad (217)
for all i, j and such v. The following Theorem states that the inequality (210) is sharpened
under the above additional assumption (217).
Theorem A.1. If M satisfies the conditions (209) and (217), for any p > 0 and any
complex-valued v,
\operatorname{osc}_i \frac{(M v)_i}{(M p)_i} \le \frac{\kappa - 1}{\kappa + 1} \operatorname{osc}_i \frac{v_i}{p_i}. \qquad (218)
Proof. We first consider a real-valued vector v. For fixed i, j and fixed p > 0, we
define X_k by
\frac{(M v)_i}{(M p)_i} - \frac{(M v)_j}{(M p)_j} = \sum_{k=1}^{m} X_k v_k. \qquad (219)
We do not have to know the exact form of X_k = X_k(i, j, p). When v = ap, the left-hand
side of the above equation vanishes, which implies \sum_k X_k p_k = 0. Thus, we have
\frac{(M v)_i}{(M p)_i} - \frac{(M v)_j}{(M p)_j} = \sum_{k=1}^{m} X_k (v_k - a p_k). \qquad (220)
Now we choose
a = \min_i \frac{v_i}{p_i}, \qquad b = \max_i \frac{v_i}{p_i}. \qquad (221)
Since v_k - a p_k = (b - a) p_k - (b p_k - v_k), the quantity v_k - a p_k takes its minimum 0
at v_k = a p_k and its maximum (b - a) p_k at v_k = b p_k. Therefore, the right-hand side of
(220) with p given attains its maximum for
v_k = a p_k^- + b p_k^+, \qquad (222)
where we defined
p_i^- = \begin{cases} p_i & (X_i \le 0) \\ 0 & (X_i > 0) \end{cases}, \qquad p_i^+ = \begin{cases} 0 & (X_i \le 0) \\ p_i & (X_i > 0) \end{cases}. \qquad (223)
Consequently, we have
\frac{(M v)_i}{(M p)_i} - \frac{(M v)_j}{(M p)_j} \le \left[ \frac{(M p^+)_i}{(M p)_i} - \frac{(M p^+)_j}{(M p)_j} \right] (b - a). \qquad (224)
M p^- \ge 0, \quad M p^+ \ge 0, \quad M p = M p^- + M p^+ > 0. \qquad (225)
\frac{(M p^+)_i}{(M p)_i} - \frac{(M p^+)_j}{(M p)_j} \le \frac{1}{1 + t} - \frac{1}{1 + t'}, \qquad t \equiv \frac{(M p^-)_i}{(M p^+)_i}, \quad t' \equiv \frac{(M p^-)_j}{(M p^+)_j}. \qquad (226)
Since, from the assumption (217), the ratios entering t and t' are bounded between
\kappa^{-1} and \kappa, we find t' \le \kappa^2 t, which yields
\frac{(M p^+)_i}{(M p)_i} - \frac{(M p^+)_j}{(M p)_j} \le \frac{1}{1 + t} - \frac{1}{1 + \kappa^2 t}. \qquad (227)
For t > 0, the right-hand side of the above inequality takes its maximum value
(\kappa - 1)/(\kappa + 1) at t = \kappa^{-1}. Finally, we obtain
\frac{(M v)_i}{(M p)_i} - \frac{(M v)_j}{(M p)_j} \le \frac{\kappa - 1}{\kappa + 1} \operatorname{osc}_i \frac{v_i}{p_i} \qquad (228)
for any i, j. Hence it holds for the sup of the left-hand side, which yields (218).
For a complex-valued vector v, we replace v_i by \operatorname{Re}(\eta v_i). Since M \operatorname{Re}(\eta v) = \operatorname{Re}(\eta M v),
the same argument as in the real-vector case yields
\operatorname{osc}_i \operatorname{Re}\!\left( \eta \frac{(M v)_i}{(M p)_i} \right) \le \frac{\kappa - 1}{\kappa + 1} \operatorname{osc}_i \operatorname{Re}\!\left( \eta \frac{v_i}{p_i} \right). \qquad (229)
Taking the sup with respect to η, |η| = 1, on both sides, we obtain (218).
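As an illustration (not part of the original proof), the contraction (218) can be checked numerically. The random matrix, the vectors, and the crude bound used for κ below are our own choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def osc(v, p):
    """osc_i v_i/p_i = max_i v_i/p_i - min_i v_i/p_i, cf. Eq. (213)."""
    r = v / p
    return r.max() - r.min()

# A strictly positive matrix; for v >= 0 one has
# (Mv)_i/(Mv)_j <= (max_ij M_ij)/(min_ij M_ij), a crude kappa for Eq. (217).
M = rng.uniform(1.0, 3.0, size=(5, 5))
kappa = M.max() / M.min()

p = rng.uniform(0.5, 2.0, size=5)   # p > 0
v = rng.normal(size=5)              # arbitrary real v

lhs = osc(M @ v, M @ p)
rhs = (kappa - 1.0) / (kappa + 1.0) * osc(v, p)
assert lhs <= rhs + 1e-12           # Theorem A.1, Eq. (218)
```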
Proof of Theorem A.2. Let us consider two vectors p, p̄ which are non-negative and
unequal to 0, and define
p_{n+1} = M p_n, \qquad \bar p_{n+1} = M \bar p_n, \qquad p_0 = p, \quad \bar p_0 = \bar p. \qquad (234)
From the hypothesis (209), both p_n and \bar p_n are strictly positive for n > 0. We find by
repeated applications of Theorem A.1 that, for n > 1,
\operatorname{osc}_i \frac{\bar p_{n,i}}{p_{n,i}} \le \left( \frac{\kappa - 1}{\kappa + 1} \right)^{n-1} \operatorname{osc}_i \frac{\bar p_{1,i}}{p_{1,i}}, \qquad (235)
where we used the notation p_{n,i} = (p_n)_i. Consequently, there exists a finite constant
\lambda > 0 such that
\frac{\bar p_{n,i}}{p_{n,i}} \longrightarrow \lambda \quad (n \to \infty) \qquad (236)
for every i. We normalize the vectors p_n, \bar p_n as
q_n = \frac{p_n}{p_{n,k}}, \qquad \bar q_n = \frac{\bar p_n}{\bar p_{n,k}}, \qquad (237)
Using (235) and (236), we estimate (239) for q_{n+1,i} - q_{n,i}, which implies that the sequence
q_n converges to a limit vector q. Because of (238), we have q > 0. Now (236) reads
\frac{p_{n+1,i}}{p_{n,i}} = \frac{(M p_n)_i}{(p_n)_i} = \frac{(M q_n)_i}{(q_n)_i} \longrightarrow \lambda_0. \qquad (241)
Consequently, M q = \lambda_0 q. For any other initial vector \bar p, the sequence \bar q_n converges to the
same limit as q_n because of (235), (236) and (239). Theorem A.2 is thereby proved.
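Theorem A.2 is the power-method statement behind Eqs. (234)-(241): repeated application of a strictly positive M drives any two non-negative starting vectors to the same Perron eigenvector, at the geometric rate (κ − 1)/(κ + 1) of Eq. (235). A numerical sketch in Python (the matrix and starting vectors are our own choices):

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.uniform(1.0, 2.0, size=(6, 6))   # strictly positive matrix

# Two different non-negative starting vectors.
p = rng.uniform(size=6)
pbar = rng.uniform(size=6)
for _ in range(50):
    p, pbar = M @ p, M @ pbar
    p, pbar = p / p[0], pbar / pbar[0]   # normalize as in Eq. (237)

# The ratios pbar_i / p_i approach a common constant (Eq. (236)), and the
# normalized iterates approach the Perron eigenvector q with M q = lambda_0 q.
ratios = pbar / p
assert np.allclose(ratios, ratios[0], rtol=1e-10)
lam0 = (M @ p)[0] / p[0]
assert np.allclose(M @ p, lam0 * p, rtol=1e-8)   # Eq. (241)
```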
where we used (215). If \lambda = \lambda_0, the above inequality implies that \operatorname{osc}_i v_i / q_i = 0 or v = cq,
which provides the second part of Theorem A.3.
B.1 Coefficient of ergodicity
Let us recall the definition of the coefficient of ergodicity
\alpha(G) = 1 - \min_{x, y \in S} \left\{ \sum_{z \in S} \min\{ G(z, x), G(z, y) \} \right\}. \qquad (244)
Thus, we find
\frac{1}{2} \sum_{z \in S} |G(z, x) - G(z, y)| = \sum_{z \in S_G^+} [G(z, x) - G(z, y)]
= \sum_{z \in S} \max\{0,\ G(z, x) - G(z, y)\}
= \sum_{z \in S} [G(z, x) - \min\{G(z, x), G(z, y)\}]
= 1 - \sum_{z \in S} \min\{G(z, x), G(z, y)\}, \qquad (248)
for any x, y. Hence taking the max with respect to x, y on both sides, we obtain (245).
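The equality just derived, i.e. the equivalence of the overlap form (244) and the half-maximal-L1-distance form (245), is easy to confirm numerically; the random column-stochastic test matrix below is our own illustration:

```python
import numpy as np

rng = np.random.default_rng(2)

n = 4
G = rng.uniform(size=(n, n))
G /= G.sum(axis=0)  # column-stochastic: sum_z G(z, x) = 1 for each x

# Eq. (244): 1 minus the minimal overlap between two columns.
a_overlap = 1.0 - min(
    np.minimum(G[:, x], G[:, y]).sum() for x in range(n) for y in range(n)
)
# Eq. (245): half the maximal L1 distance between two columns.
a_l1 = 0.5 * max(
    np.abs(G[:, x] - G[:, y]).sum() for x in range(n) for y in range(n)
)
assert np.isclose(a_overlap, a_l1)   # identity proved through Eq. (248)
```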
To derive the conditions for weak and strong ergodicity, the following Lemmas are
useful.
Lemma B.1. Let G be a transition matrix. Then the coefficient of ergodicity satisfies
0 ≤ α(G) ≤ 1. (249)
Lemma B.2. Let G and H be transition matrices on S. Then the coefficient of ergodicity
satisfies
α(GH) ≤ α(G)α(H). (250)
Lemma B.3. Let G be a transition matrix and H be a square matrix on S such that
\sum_{z \in S} H(z, x) = 0, \qquad (251)
for any x ∈ S. Then we have
kGHk ≤ α(G)kHk, (252)
where the norm of a square matrix is defined by
\|A\| \equiv \max_{x \in S} \left\{ \sum_{z \in S} |A(z, x)| \right\}. \qquad (253)
Proof of Lemma B.1. The definition of α(G) implies α(G) ≤ 1 because G(y, x) ≥ 0.
From (245), α(G) ≥ 0 is straightforward.
Proof of Lemma B.2. Let us consider a transition matrix G, a column vector a such
that \sum_{z \in S} a(z) = 0, and their product b = Ga. We note that the vector b satisfies
\sum_{z \in S} b(z) = 0 because
\sum_{z \in S} b(z) = \sum_{z \in S} \sum_{y \in S} G(z, y) a(y) = \sum_{y \in S} a(y) \left[ \sum_{z \in S} G(z, y) \right] = \sum_{y \in S} a(y) = 0. \qquad (254)
We define subsets of S by S_b^+ = \{ z \in S \mid b(z) > 0 \}, S_a^+ = \{ z \in S \mid a(z) > 0 \} and S_a^- = \{ z \in S \mid a(z) < 0 \}.
Therefore, we obtain
\sum_{z \in S} |b(z)| = 2 \sum_{z \in S_b^+} \sum_{u \in S} G(z, u) a(u)
= 2 \sum_{u \in S_a^+} \left[ \sum_{z \in S_b^+} G(z, u) \right] a(u) + 2 \sum_{u \in S_a^-} \left[ \sum_{z \in S_b^+} G(z, u) \right] a(u)
\le 2 \max_{v \in S} \left[ \sum_{z \in S_b^+} G(z, v) \right] \sum_{u \in S_a^+} a(u) + 2 \min_{w \in S} \left[ \sum_{z \in S_b^+} G(z, w) \right] \sum_{u \in S_a^-} a(u)
= \max_{v, w \in S} \left[ \sum_{z \in S_b^+} [G(z, v) - G(z, w)] \right] \sum_{u \in S} |a(u)|
\le \max_{v, w \in S} \left\{ \sum_{z \in S} \max\{0,\ G(z, v) - G(z, w)\} \right\} \sum_{u \in S} |a(u)|
= \frac{1}{2} \max_{v, w \in S} \left\{ \sum_{z \in S} |G(z, v) - G(z, w)| \right\} \sum_{u \in S} |a(u)|
= \alpha(G) \sum_{u \in S} |a(u)|, \qquad (258)
where we used (248) and (245).
Next, we consider transition matrices G, H and F = GH. We take a(z) = H(z, x) −
H(z, y), and then (258) is rewritten as
\sum_{z \in S} |F(z, x) - F(z, y)| \le \alpha(G) \sum_{u \in S} |H(u, x) - H(u, y)|, \qquad (259)
for any x, y. Hence this inequality holds for the max of both sides with respect to x, y,
which yields Lemma B.2.
Proof of Lemma B.3. Let us consider F = GH. We can take a(z) = H(z, x) in (258)
because of the assumption \sum_{y \in S} H(y, x) = 0. Thus we have
\sum_{z \in S} |F(z, x)| \le \alpha(G) \sum_{u \in S} |H(u, x)|, \qquad (260)
for any x. Hence this inequality holds for the max of both sides with respect to x, which
provides Lemma B.3.
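Lemmas B.1-B.3 admit a quick numerical sanity check for random matrices (our own illustration, not from the paper; G is column-stochastic and the columns of H sum to zero as required by (251)):

```python
import numpy as np

rng = np.random.default_rng(3)

def alpha(G):
    """Coefficient of ergodicity, Eq. (244)."""
    n = G.shape[1]
    return 1.0 - min(
        np.minimum(G[:, x], G[:, y]).sum() for x in range(n) for y in range(n)
    )

def norm(A):
    """Matrix norm (253): maximal column sum of absolute values."""
    return np.abs(A).sum(axis=0).max()

n = 5
G = rng.uniform(size=(n, n)); G /= G.sum(axis=0)     # transition matrix
Gp = rng.uniform(size=(n, n)); Gp /= Gp.sum(axis=0)  # second transition matrix
H = rng.normal(size=(n, n)); H -= H.mean(axis=0)     # columns sum to 0, Eq. (251)

assert 0.0 <= alpha(G) <= 1.0                         # Lemma B.1
assert alpha(G @ Gp) <= alpha(G) * alpha(Gp) + 1e-12  # Lemma B.2
assert norm(G @ H) <= alpha(G) * norm(H) + 1e-12      # Lemma B.3
```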
Proof of Theorem B.4. We assume that the inhomogeneous Markov chain generated by G(t) is weakly
ergodic. For fixed x, y ∈ S, we define probability distributions by
p_x(z) = \begin{cases} 1 & (z = x) \\ 0 & (\text{otherwise}) \end{cases}, \qquad p_y(z) = \begin{cases} 1 & (z = y) \\ 0 & (\text{otherwise}) \end{cases}. \qquad (262)
Since p_x(t, s; z) = \sum_{u \in S} G^{t,s}(z, u) p_x(u) = G^{t,s}(z, x) and p_y(t, s; z) = G^{t,s}(z, y), we have
\sum_{z \in S} |G^{t,s}(z, x) - G^{t,s}(z, y)| = \sum_{z \in S} |p_x(t, s; z) - p_y(t, s; z)|
\le \sup\{ \|p(t, s) - p'(t, s)\| \mid p_0, p'_0 \in \mathcal{P} \}. \qquad (263)
H = (p_0, q_0, \cdots, q_0), \qquad (265)
F = G^{t,s} H = (p(t, s), q(t, s), \cdots, q(t, s)), \qquad (266)
where p(t, s) = G^{t,s} p_0 and q(t, s) = G^{t,s} q_0. From (245), the coefficient of ergodicity for F is
rewritten as
\alpha(F) = \frac{1}{2} \sum_{z \in S} |p(t, s; z) - q(t, s; z)| = \frac{1}{2} \|p(t, s) - q(t, s)\|. \qquad (267)
Thus Lemmas B.1 and B.2 yield
\|p(t, s) - q(t, s)\| \le 2 \alpha(G^{t,s}) \alpha(H) \le 2 \alpha(G^{t,s}). \qquad (268)
Taking the sup with respect to p_0, q_0 \in \mathcal{P} and the limit t \to \infty, we obtain
\lim_{t \to \infty} \sup\{ \|p(t, s) - q(t, s)\| \mid p_0, q_0 \in \mathcal{P} \} \le 2 \lim_{t \to \infty} \alpha(G^{t,s}) = 0, \qquad (269)
for any s > 0. Therefore the inhomogeneous Markov chain generated by G(t) is weakly
ergodic.
Next, we prove Theorem 5.1. For this purpose, the following Lemma is useful.
Lemma B.5. Let a_0, a_1, \cdots be a sequence such that 0 \le a_i < 1 for any i. Then
\sum_{i=0}^{\infty} a_i = \infty \ \Longrightarrow \ \prod_{i=n}^{\infty} (1 - a_i) = 0 \qquad (270)
for any n.
In the limit m \to \infty, the right-hand side converges to zero because of the assumption
\sum_{i=0}^{\infty} a_i = \infty. Therefore we obtain (270).
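A concrete instance of Lemma B.5 (our own illustration): for a_i = 1/(i + 2) the sum diverges like the harmonic series, so the partial products of (1 − a_i) must collapse to zero; here they even telescope explicitly.

```python
import numpy as np

# a_i = 1/(i + 2): sum_i a_i diverges, so prod_i (1 - a_i) must vanish.
# Here 1 - a_i = (i + 1)/(i + 2), and the product telescopes to 1/(N + 2).
a = 1.0 / np.arange(2, 100002)
partial_products = np.cumprod(1.0 - a)

assert partial_products[-1] < 1e-4            # tends to zero, Eq. (270)
assert np.all(np.diff(partial_products) < 0)  # strictly decreasing
```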
Proof of Theorem 5.1. We assume that the inhomogeneous Markov chain generated
by G(t) is weakly ergodic. Theorem B.4 yields
\lim_{t \to \infty} \left[ 1 - \alpha(G^{t,s}) \right] = 1 \qquad (272)
for any s > 0. Thus, there exists t_1 such that 1 - \alpha(G^{t_1, t_0}) > 1/2 with t_0 = s. Similarly,
there exists t_{n+1} such that 1 - \alpha(G^{t_{n+1}, t_n}) > 1/2 for any t_n > 0. Therefore,
\sum_{i=0}^{n} \left[ 1 - \alpha(G^{t_{i+1}, t_i}) \right] > \frac{1}{2}(n + 1). \qquad (273)
For fixed s and t such that t > s \ge 0, we define n and m by t_{n-1} \le s < t_n and t_m < t \le t_{m+1}.
Thus, from Lemma B.2, we obtain
\alpha(G^{t,s}) \le \alpha(G^{t, t_m}) \alpha(G^{t_m, t_{m-1}}) \cdots \alpha(G^{t_{n+1}, t_n}) \alpha(G^{t_n, s})
= \alpha(G^{t, t_m}) \left[ \prod_{i=n}^{m-1} \alpha(G^{t_{i+1}, t_i}) \right] \alpha(G^{t_n, s}). \qquad (275)
In the limit t → ∞, m goes to infinity and then the right-hand side converges to zero
because of (274). Thus we have
\lim_{t \to \infty} \alpha(G^{t,s}) = 0, \qquad (276)
for any s. Therefore, from Theorem B.4, the inhomogeneous Markov chain generated by
G(t) is weakly ergodic.
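Theorem 5.1 (weak ergodicity from a divergent sum of 1 − α) can be illustrated on a toy inhomogeneous chain. The specific mixing schedule ε_t = 1/(t + 2), for which α(G(t)) = 1 − ε_t, is our own example, not one from the paper:

```python
import numpy as np

# Inhomogeneous chain: at step t the matrix mixes toward uniform with a
# slowly vanishing strength eps_t, so sum_t (1 - alpha(G(t))) diverges.
n = 4
uniform = np.full((n, n), 1.0 / n)

def G(t):
    eps = 1.0 / (t + 2)
    return (1 - eps) * np.eye(n) + eps * uniform  # column-stochastic

p = np.eye(n)[0]   # chain started in state 0
q = np.eye(n)[3]   # chain started in state 3
for t in range(20000):
    p, q = G(t) @ p, G(t) @ q

# Weak ergodicity: the chain forgets its initial condition.
assert np.abs(p - q).sum() < 1e-3
```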
B.3 Conditions for strong ergodicity
The goal of this section is to give the proof of Theorem 5.2. Before that, we prove the
following Theorem, which also provides the sufficient condition for strong ergodicity.
Theorem B.6. An inhomogeneous Markov chain generated by G(t) is strongly ergodic if
there exists a transition matrix H on S such that H(z, x) = H(z, y) for any x, y, z \in S
and
\lim_{t \to \infty} \| G^{t,s} - H \| = 0 \qquad (277)
for any s > 0.
\|p(t, s) - r\| = \sum_{z \in S} \left| \sum_{x \in S} G^{t,s}(z, x) p_0(x) - H(z, u) \right|
= \sum_{z \in S} \left| \sum_{x \in S} \left[ G^{t,s}(z, x) - H(z, u) \right] p_0(x) \right|
\le \sum_{z \in S} \sum_{x \in S} \left| G^{t,s}(z, x) - H(z, u) \right| = \sum_{z \in S} \sum_{x \in S} \left| G^{t,s}(z, x) - H(z, x) \right|
\le \sum_{x \in S} \| G^{t,s} - H \| = |S| \, \| G^{t,s} - H \|. \qquad (278)
Taking the sup with respect to p0 ∈ P and using the assumption (277), we obtain
Proof of Theorem 5.2. We assume that the three conditions in Theorem 5.2 hold.
Since condition 3 is rewritten as
\sum_{t=0}^{\infty} \sum_{x \in S} |p_t(x) - p_{t+1}(x)| = \sum_{t=0}^{\infty} \| p_t - p_{t+1} \| < \infty, \qquad (280)
we have
\sum_{t=0}^{\infty} |p_t(x) - p_{t+1}(x)| < \infty \qquad (281)
for any x \in S. Thus, the stationary state p_t converges to p = \lim_{t \to \infty} p_t. Now, let
us define transition matrices H and H(t) by H(z, x) = p(z) and H(z, x; t) = p_t(z),
respectively. For t > u > s \ge 0,
\| G^{t,s} - H \| \le \| G^{t,u} G^{u,s} - G^{t,u} H(u) \| + \| G^{t,u} H(u) - H(t-1) \| + \| H(t-1) - H \|. \qquad (282)
Thus, we evaluate each term on the right-hand side and show that (277) holds.
[1st term] Lemma B.3 yields
\| G^{t,u} G^{u,s} - G^{t,u} H(u) \| \le \alpha(G^{t,u}) \| G^{u,s} - H(u) \| \le 2 \alpha(G^{t,u}), \qquad (283)
where we used \| G^{u,s} - H(u) \| \le 2. Since the Markov chain is weakly ergodic (condition
1), Theorem B.4 implies that
\forall \varepsilon > 0,\ \exists t_1 > 0,\ \forall t > t_1 : \quad \| G^{t,u} G^{u,s} - G^{t,u} H(u) \| < \frac{\varepsilon}{3}. \qquad (284)
[2nd term] Since p_t = G(t) p_t (condition 2), we find G(u) H(u) = H(u), and then
G^{t,u} H(u) = G^{t,u+1} H(u) = G^{t,u+1} [H(u) - H(u+1)] + G^{t,u+1} H(u+1). \qquad (286)
The last term on the right-hand side of the above equation is similarly rewritten as
G^{t,u+1} H(u+1) = G^{t,u+2} [H(u+1) - H(u+2)] + G^{t,u+2} H(u+2). \qquad (287)
\| G^{t,v+1} [H(v) - H(v+1)] \| \le \| H(v) - H(v+1) \| = \| p_v - p_{v+1} \|, \qquad (290)
Therefore
\forall \varepsilon > 0,\ \exists t_2 > 0,\ \forall t > u \ge t_2 : \quad \| G^{t,u} H(u) - H(t-1) \| < \frac{\varepsilon}{3}. \qquad (293)
[3rd term] From the definitions of H and H(t), they clearly satisfy
\| H(t-1) - H \| = \| p_{t-1} - p \| \longrightarrow 0 \quad (t \to \infty), \qquad (294)
which implies that
\forall \varepsilon > 0,\ \exists t_3 > 0,\ \forall t > t_3 : \quad \| H(t-1) - H \| < \frac{\varepsilon}{3}. \qquad (295)
Consequently, substitution of (284), (293) and (295) into (282) yields that
\| G^{t,s} - H \| < \frac{\varepsilon}{3} + \frac{\varepsilon}{3} + \frac{\varepsilon}{3} = \varepsilon, \qquad (296)
for all t > max{t1 , t2 , t3 }. Since ε is arbitrarily small, (277) holds for any s > 0 and then
the given Markov chain is strongly ergodic from Theorem B.6, which completes the proof
of the first part of Theorem 5.2.
Thus, we obtain
\| q(t, t_0) - p \| = \| (G^{t,t_0} - H) q_0 \| \le \| G^{t,t_0} - H \|. \qquad (298)
Hence it holds for the sup with respect to q0 ∈ P, which yields (121) in the limit of
t → ∞. Theorem 5.2 is thereby proved.
References
[1] M. R. Garey and D. S. Johnson: Computers and Intractability: A Guide to the
Theory of NP-Completeness (Freeman, San Francisco, 1979)
[2] A. K. Hartmann and M. Weigt: Phase Transitions in Combinatorial Optimization
Problems: Basics, Algorithms and Statistical Mechanics (Wiley-VCH, Weinheim,
2005)
[3] K. Helsgaun: Euro. J. Op. Res. 126 (2000) 106.
[4] S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi: Science 220 (1983) 671
[5] E. Aarts and J. Korst: Simulated Annealing and Boltzmann Machines: A Stochastic
Approach to Combinatorial Optimization and Neural Computing (Wiley, New York,
1984)
[6] A. B. Finnila, M. A. Gomez, C. Sebenik, S. Stenson, and J. D. Doll: Chem. Phys.
Lett. 219 (1994) 343
[7] T. Kadowaki and H. Nishimori: Phys. Rev. E 58 (1998) 5355
[8] T. Kadowaki: Study of Optimization Problems by Quantum Annealing (Thesis,
Tokyo Institute of Technology, 1999); quant-ph/0205020
[9] A. Das and B. K. Chakrabarti: Quantum Annealing and Related Optimization Meth-
ods (Springer, Berlin, Heidelberg, 2005) Lecture Notes in Physics, Vol. 679
[10] G. E. Santoro and E. Tosatti: J. Phys. A 39 (2006) R393
[11] A. Das and B. K. Chakrabarti: arXiv:0801.2193 (to be published in Rev. Mod.
Phys.).
[12] B. Apolloni, C. Carvalho and D. de Falco: Stoch. Proc. Appl. 33 (1989) 233
[13] B. Apolloni, N. Cesa-Bianchi and D. de Falco: in Stochastic Processes, Physics and
Geometry, eds. S. Albeverio et al. (World Scientific, Singapore, 1990) 97
[14] G. E. Santoro, R. Martoňák, E. Tosatti and R. Car: Science 295 (2002) 2427
[15] R. Martoňák, G. E. Santoro and E. Tosatti: Phys. Rev. B 66 (2002) 094203
[16] S. Suzuki and M. Okada: J. Phys. Soc. Jpn. 74 (2005) 1649
[17] M. Sarjala, V. Petäjä and M. Alava: J. Stat. Mech. (2006) P01008
[18] S. Suzuki, H. Nishimori, and M. Suzuki: Phys. Rev. E 75 (2007) 051112
[19] R. Martoňák, G. E. Santoro and E. Tosatti: Phys. Rev. E 70 (2004) 057701
[20] L. Stella, G. E. Santoro and E. Tosatti: Phys. Rev. B 72 (2005) 014303
[21] L. Stella, G. E. Santoro and E. Tosatti: Phys. Rev. B 73 (2006) 144302
[22] A. Das, B. K. Chakrabarti and R. B. Stinchcombe: Phys. Rev. E 72 (2005) 026701
[23] H. F. Trotter: Proc. Am. Math. Soc. 10 (1959) 545
[24] M. Suzuki: Prog. Theor. Phys. 46 (1971) 1337
[25] D. P. Landau and K. Binder: A Guide to Monte Carlo Simulations in Statistical
Physics (Cambridge, Cambridge University Press, 2000) Chap. 8
[26] E. Farhi, J. Goldstone, S. Gutmann and M. Sipser: quant-ph/0001106
[27] A. Mizel, D. A. Lidar and M. Mitchell: Phys. Rev. Lett. 99 (2007) 070502.
[28] S. Morita: Analytic Study of Quantum Annealing (Thesis, Tokyo Institute of Tech-
nology, 2008).
[29] S. Morita and H. Nishimori: J. Phys. Soc. Jpn. 76 (2007) 064002.
[30] A. Messiah: Quantum Mechanics (Wiley, New York, 1976)
[31] R. D. Somma, C. D. Batista, and G. Ortiz: Phys. Rev. Lett. 99 (2007) 030603
[32] E. Hopf: J. Math. Mech. 12 (1963) 683
[33] S. Geman and D. Geman: IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6 (1984)
721
[34] H. Nishimori and J. Inoue: J. Phys. A: Math. Gen. 31 (1998) 5661
[35] H. Nishimori and Y. Nonomura: J. Phys. Soc. Jpn. 65 (1996) 3780
[36] E. Seneta: Non-negative Matrices and Markov Chains (Springer, New York, 2006)
[37] S. Morita, J. Phys. Soc. Jpn. 76 (2007) 104001
[38] L. D. Landau and E. M. Lifshitz: Quantum Mechanics: Non-Relativistic Theory
(Pergamon Press, Oxford, 1965)
[39] C. Zener: Proc. R. Soc. London Ser. A 137 (1932) 696
[40] S. Morita and H. Nishimori, J. Phys. A: Math. and Gen. 39 (2006) 13903
[41] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery: Numerical
Recipes in C (Cambridge University Press, Cambridge, 1992) 2nd ed.
[42] L. K. Grover: Phys. Rev. Lett. 79 (1997) 325
[43] J. Roland and N. J. Cerf: Phys. Rev. A 65 (2002) 042308
[44] C. Tsallis and D. A. Stariolo: Physica A 233 (1996) 395
[45] D. M. Ceperley and B. J. Alder: Phys. Rev. Lett. 45 (1980) 566
[46] N. Trivedi and D. M. Ceperley: Phys. Rev. B 41 (1990) 4552
[47] L. Stella and G. E. Santoro: Phys. Rev. E 75 (2007) 036703