
Mathematical Foundation of Quantum Annealing

Satoshi Morita∗ and Hidetoshi Nishimori†


arXiv:0806.1859v1 [quant-ph] 11 Jun 2008

Abstract
Quantum annealing is a generic name for quantum algorithms that use quantum-mechanical
fluctuations to search for the solution of an optimization problem. It shares its basic idea
with quantum adiabatic evolution, studied actively in quantum computation. The present
paper reviews the mathematical and theoretical foundations of quantum annealing. In
particular, theorems are presented on convergence conditions of quantum annealing to the
target optimal state after an infinite-time evolution following the Schrödinger or stochastic
(Monte Carlo) dynamics. It is proved that the same asymptotic behavior of the
control parameter guarantees convergence for both the Schrödinger dynamics and the
stochastic dynamics, in spite of the essential difference between these two types of dynamics.
Also described are prescriptions to reduce errors in the final approximate solution
obtained after a long but finite dynamical evolution of quantum annealing. It is shown
there that errors can be reduced significantly by an ingenious choice of the annealing schedule
(the time dependence of the control parameter) without qualitatively compromising
computational complexity. A review is given of the derivation of the convergence condition
for classical simulated annealing from the viewpoint of quantum adiabaticity, using a
classical-quantum mapping.

1 Introduction
An optimization problem is the task of minimizing or maximizing a real single-valued
function of many variables, called the cost function [1, 2]. If the problem is to maximize
the cost function f, it suffices to minimize −f; no generality is therefore lost by considering
minimization only. In the present paper we consider combinatorial optimization, in which
the variables take discrete values. Well-known examples are satisfiability problems (SAT),
Exact Cover, Max Cut, Hamiltonian cycle, and the Traveling Salesman Problem. In physics,
the search for the ground state of a spin system is a typical example, in particular for systems
with quenched randomness like spin glasses.
Optimization problems are classified roughly into two types, easy and hard.
Loosely speaking, easy problems are those for which we have algorithms that solve them in
a number of steps (time) polynomial in the system size (polynomial complexity). In contrast,
for hard problems, all known algorithms take exponentially many steps to reach the exact
solution (exponential complexity). For these latter problems it is virtually impossible
to find the exact solution once the problem size exceeds a moderate value. Most of the
interesting cases exemplified above belong to the hard class.
It is therefore of practical importance to devise algorithms which give approximate but
accurate solutions efficiently, i.e. with polynomial complexity. Many instances of
combinatorial optimization problems have such approximate algorithms. For example, the
Lin-Kernighan algorithm is often used to solve the traveling salesman problem within a
reasonable time [3].

∗ International School for Advanced Studies (SISSA), Via Beirut 2-4, I-34014 Trieste, Italy
† Department of Physics, Tokyo Institute of Technology, Oh-okayama, Meguro-ku, Tokyo 152-8551, Japan
In the present paper we will instead discuss generic algorithms: simulated annealing
(SA) and quantum annealing (QA). The former was developed from the analogy between
optimization problems and statistical physics [4, 5]. In SA, the cost function to be
minimized is identified with the energy of a statistical-mechanical system. The system
is then given a temperature, an artificially introduced control parameter; by reducing
it slowly from a high value to zero, we hope to drive the system to the state with
the lowest value of the energy (cost function), thus reaching the solution of the optimization
problem. The idea is that the system is expected to stay close to thermal equilibrium
during the time evolution if the rate of decrease of the temperature is sufficiently slow, and is
thus led in the end to the zero-temperature equilibrium state, the lowest-energy state.
In practical applications SA is immensely popular owing to its general applicability, reasonable
performance, and relatively easy implementation in most cases. Since it needs an infinitely
long time to reach the exact solution while keeping the system close to thermal equilibrium,
SA is usually used as a method to obtain an approximate solution within a finite computation time.
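The loop just described (propose a local move, always accept downhill, accept uphill with the Boltzmann probability, and cool slowly) can be sketched in a few lines. This is an illustrative toy, not the algorithm of [4, 5]: the instance (a ferromagnetic Ising chain), the schedule constants, and all names are assumptions made for the example.

```python
import math
import random

def simulated_annealing(cost, neighbor, x0, t_max=20000, T0=5.0):
    """Metropolis search under a slowly decreasing temperature T(t)."""
    x, e = x0, cost(x0)
    best_x, best_e = x, e
    for t in range(t_max):
        T = T0 / math.log(t + 2)           # logarithmic schedule (cf. Sec. 3)
        y = neighbor(x)
        de = cost(y) - e
        # Accept downhill moves always, uphill moves with probability e^{-dE/T}.
        if de <= 0 or random.random() < math.exp(-de / T):
            x, e = y, e + de
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

# Toy instance: ferromagnetic Ising chain; the two fully aligned
# configurations are the ground states, with energy -(N - 1).
N = 12
def energy(s):
    return -sum(s[i] * s[i + 1] for i in range(N - 1))

def flip_one(s):
    i = random.randrange(N)
    return s[:i] + (-s[i],) + s[i + 1:]

random.seed(0)
s0 = tuple(random.choice((-1, 1)) for _ in range(N))
s_best, e_best = simulated_annealing(energy, flip_one, s0)
```

Tracking the best-so-far configuration is a common practical addition; the convergence theory discussed later concerns the distribution of the current state, not this bookkeeping.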
Let us now turn our attention to quantum annealing [6, 7, 8, 9, 10, 11].^1 In SA, we
make use of thermal (classical) fluctuations to let the system hop from state to state over
intermediate energy barriers in search of the desired lowest-energy state. Why not, then,
try quantum-mechanical fluctuations (quantum tunneling) for state transitions, if they
may lead to better performance? In QA we introduce artificial degrees of freedom of a
quantum nature, non-commutative operators, which induce quantum fluctuations. We
then ingeniously control the strength of these quantum fluctuations so that the system
finally reaches the ground state, just as in SA where we slowly reduce the temperature.
More precisely, the strength of quantum fluctuations is first set to a very large value so that
the system searches the global structure of the phase space, corresponding to the
high-temperature situation in SA. The strength is then gradually decreased and finally
vanishes, recovering the original system, hopefully in its lowest-energy state. Quantum
tunneling between different classical states replaces thermal hopping in SA. The physical
idea behind such a procedure is to keep the system close to the instantaneous ground state
of the quantum system, analogously to the quasi-equilibrium state maintained during the
time evolution of SA. Like SA, QA is a generic algorithm applicable, in principle, to
any combinatorial optimization problem, and it is used as a method to reach an approximate
solution within a given finite amount of time.
The reader may wonder why one should invent yet another generic algorithm when we
already have powerful SA. A short answer is that QA outperforms SA in most cases, at
least theoretically. Analytical and numerical results indicate that the computation time
needed to achieve a given precision of the answer is shorter in QA than in SA. Also, the
magnitude of error is smaller for QA than SA if we run the algorithm for a fixed finite
amount of time. We shall show some theoretical bases for these conclusions in this paper.
Numerical evidence is found in [9, 10, 11, 14, 15, 16, 17, 18, 19, 20, 21, 22].
A drawback of QA is that a full practical implementation would have to rely on a quantum
computer, because one needs to solve the time-dependent Schrödinger equation at a very
large scale. Existing numerical studies have been carried out either for small prototype
examples or for large problems by Monte Carlo simulations using the quantum-classical
mapping obtained by adding an extra (Trotter or imaginary-time) dimension [23, 24, 25]. The
latter mapping involves approximations, which inevitably introduce additional errors as
well as the overhead caused by the extra dimension. Nevertheless, it is worthwhile to
clarify the usefulness and limitations of QA as a theoretical step towards a new paradigm
of computation. This aspect is shared by quantum computation in general, whose practical
significance will be fully exploited only on a quantum computer.

^1 The term quantum annealing first appeared in [12, 13], in which the authors used quantum transitions for
state search and the dynamical evolution of the control parameters was set by hand as an algorithm. Quantum
annealing in the present sense, using natural Schrödinger dynamics, was proposed later independently in [6]
and [7].
The idea of QA is essentially the same as quantum adiabatic evolution (QAE), which
is now actively investigated as an alternative paradigm of quantum computation [26]. It
has been proved that QAE is equivalent to the conventional circuit model of quantum
computation [27], but QAE is sometimes considered more useful than the circuit model
for several reasons including robustness against external disturbance. In the literature
of quantum computation, one is often interested in the computational complexity of the
QAE-based algorithm for a given specific problem under a fixed value of acceptable error.
QAE can also be used to find the final quantum state when the problem is not a classical
optimization.
In some contrast to these situations for QAE, studies of QA often focus not on
computational complexity but on theoretical convergence conditions for infinite-time
evolution and on the amount of error in the final state within a fixed evolution time.
Such a difference may have led some researchers to think that QA and QAE are to be
distinguished from each other. We would emphasize that they are essentially the same
and worth investigation by various communities of researchers.
The structure of the present paper is as follows. Section 2 discusses the convergence
condition of QA, in particular the rate of decrease of the control parameter representing
quantum fluctuations. It will be shown there that a qualitatively faster decrease of the
control parameter is allowed in QA than in SA to reach the solution. This is one of the
explicit statements of the claim more vaguely stated above that QA outperforms SA.
In Sec. 3 we review the performance analysis of SA using quantum-mechanical tools.
The well-known convergence condition for SA will be rederived from the perspective
of quantum adiabaticity. The methods and results in this section help us strengthen
the interrelation between QA, SA and QAE. The error rate of QA after a finite-time
dynamical evolution is analyzed in Sec. 4. There we explain how to reduce the final
residual error after evolution for a given amount of time. This point of view is unique in
the sense that most references on QAE study the time needed to reach a given amount of
tolerable error, i.e. the computational complexity. The results given in this section can be
used to qualitatively reduce residual errors for a given algorithm without compromising
computational complexity. Convergence conditions for stochastic implementation of QA
are discussed in Sec. 5. The results are surprising in that the rate of decrease of the
control parameter for the system to reach the solution coincides with that found in Sec. 2
for the pure quantum-mechanical Schrödinger dynamics. The stochastic (and therefore
classical) dynamics shares the same convergence conditions as fully quantum dynamics.
Summary and outlook are described in the final section.
The main parts of this paper (Secs. 2, 4 and 5) are based on the PhD Thesis of one of
the authors (S.M.) [28] as well as several original papers of the present and other authors
as will be referred to appropriately. The present paper is not a comprehensive review
of QA since an emphasis is given almost exclusively to the theoretical and mathematical
aspects. There exists an extensive body of numerical studies and the reader is referred
to [9, 10, 11] for reviews.

2 Convergence condition of QA – Real-time Schrödinger evolution
The convergence condition of QA with the real-time Schrödinger dynamics is investigated
in this section, following [29]. We first review the proof of the adiabatic theorem [30] to
be used to derive the convergence condition. Then introduced is the Ising model with
transverse field as a simple but versatile implementation of QA. The convergence condition
is derived by solving the condition for adiabatic transition with respect to the strength
of the transverse field.

2.1 Adiabatic theorem


Let us consider the general Hamiltonian which depends on time t only through the
dimensionless time s = t/τ,

H(t) = H̃(t/τ) ≡ H̃(s).    (1)
The parameter τ is introduced to control the rate of change of the Hamiltonian. In natural
quantum systems, the state vector |ψ(t)⟩ follows the real-time Schrödinger equation,

i (d/dt)|ψ(t)⟩ = H(t)|ψ(t)⟩,    (2)

or, in terms of the dimensionless time,

i (d/ds)|ψ̃(s)⟩ = τH̃(s)|ψ̃(s)⟩,    (3)

where we set ℏ = 1. We assume that the initial state is chosen to be the ground state
of the initial Hamiltonian H(0) and that the ground state of H̃(s) is not degenerate for
s ≥ 0. We show in the next section that the transverse-field Ising model, to be used as
H(t) in most parts of this paper, has no degeneracy in the ground state (except possibly in
the limit of t → ∞). If τ is large, the Hamiltonian changes slowly and it is expected that
the state vector keeps track of the instantaneous ground state. The adiabatic theorem
provides the condition for adiabatic evolution. To see this, we derive the asymptotic form
of the state vector with respect to the parameter τ .
Since we wish to estimate how close the state vector is to the ground state, it is natural
to expand the state vector in the instantaneous eigenstates of H̃(s). Before doing so, we
derive useful formulas for the eigenstates. The kth instantaneous eigenstate of H̃(s) with
eigenvalue εk(s) is denoted by |k(s)⟩,

H̃(s)|k(s)⟩ = εk(s)|k(s)⟩.    (4)

We assume that |0(s)⟩ is the ground state of H̃(s) and that the eigenstates are orthonormal,
⟨j(s)|k(s)⟩ = δjk. Differentiation of (4) with respect to s gives

⟨j(s)| (d/ds) |k(s)⟩ = −(1/(εj(s) − εk(s))) ⟨j(s)| (dH̃(s)/ds) |k(s)⟩,    (5)

where j ≠ k. In the case j = k, the same calculation does not provide any meaningful
result. We can, however, impose the following condition,

⟨k(s)| (d/ds) |k(s)⟩ = 0.    (6)

This condition is achievable by a time-dependent phase shift: if |k̃(s)⟩ = e^{iθ(s)}|k(s)⟩,
we find

⟨k̃(s)| (d/ds) |k̃(s)⟩ = i (dθ/ds) + ⟨k(s)| (d/ds) |k(s)⟩.    (7)

The second term on the right-hand side is purely imaginary because

⟨k(s)| (d/ds) |k(s)⟩ + (⟨k(s)| (d/ds) |k(s)⟩)* = (d/ds)⟨k(s)|k(s)⟩ = 0.    (8)

Thus, the condition (6) can be satisfied by tuning the phase factor θ(s) even if the original
eigenstate does not satisfy it.
Theorem 2.1. If the instantaneous ground state of the Hamiltonian H̃(s) is not degenerate
for s ≥ 0 and the initial state is the ground state at s = 0, i.e. |ψ̃(0)⟩ = |0(0)⟩, the
state vector |ψ̃(s)⟩ has the asymptotic form in the limit of large τ

|ψ̃(s)⟩ = Σ_j cj(s) e^{−iτφj(s)} |j(s)⟩,    (9)

c0(s) ≈ 1 + O(τ^−2),    (10)

cj≠0(s) ≈ (i/τ) [Aj(0) − e^{iτ[φj(s)−φ0(s)]} Aj(s)] + O(τ^−2),    (11)

where φj(s) ≡ ∫_0^s ds′ εj(s′), Δj(s) ≡ εj(s) − ε0(s) and

Aj(s) ≡ (1/Δj(s)^2) ⟨j(s)| (dH̃(s)/ds) |0(s)⟩.    (12)

Proof. Substitution of (9) into the Schrödinger equation (3) yields the equation for the
coefficient cj(s),

dcj/ds = Σ_{k≠j} ck(s) (e^{iτ[φj(s)−φk(s)]}/(εj(s) − εk(s))) ⟨j(s)| (dH̃(s)/ds) |k(s)⟩,    (13)

where we used (5) and (6). Integration of this equation yields

cj(s) = cj(0) + Σ_{k≠j} ∫_0^s ds̃ ck(s̃) (e^{iτ[φj(s̃)−φk(s̃)]}/(εj(s̃) − εk(s̃))) ⟨j(s̃)| (dH̃(s̃)/ds̃) |k(s̃)⟩.    (14)

Since the initial state is chosen to be the ground state of H(0), c0(0) = 1 and cj≠0(0) = 0.
The second term on the right-hand side is of the order of τ^−1 because its integrand rapidly
oscillates for large τ; indeed, integration by parts yields the factor τ^−1. Thus, cj≠0(s)
is of order τ^−1 at most. Hence only the k = 0 term in the summation survives up to
order τ^−1,

cj≠0(s) ≈ ∫_0^s ds̃ (e^{iτ[φj(s̃)−φ0(s̃)]}/Δj(s̃)) ⟨j(s̃)| (dH̃(s̃)/ds̃) |0(s̃)⟩ + O(τ^−2),    (15)

and integration by parts yields (11).

Remark. The condition for adiabatic evolution is given by the smallness of the excitation
probability; that is, the right-hand side of (11) should be much smaller than unity.
This condition is consistent with the criterion of validity of the above asymptotic
expansion. It is represented by

τ ≫ |Aj(s)|.    (16)

Using the original time variable t, this adiabaticity condition is written as

(1/Δj(t)^2) |⟨j(t)| (dH(t)/dt) |0(t)⟩| = δ ≪ 1.    (17)

This is the usual expression of the adiabaticity condition.
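The content of condition (17) is easy to see numerically: integrate the Schrödinger equation (3) for a two-level toy Hamiltonian and watch the adiabatic limit emerge as τ grows. The Hamiltonian and all parameter values below are assumptions made for illustration (the minimum gap here is of order one, so a modest τ already suffices).

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)

def H(s):
    # Toy H~(s): the transverse term fades out while the target term grows.
    return -(1 - s) * sx - s * sz

def evolve(tau, steps=2000):
    """Integrate i d|psi>/ds = tau H~(s) |psi> from s = 0 to s = 1."""
    psi = np.array([1, 1], dtype=complex) / np.sqrt(2)   # ground state of H(0) = -sx
    ds = 1.0 / steps
    for n in range(steps):
        s = (n + 0.5) * ds
        w, V = np.linalg.eigh(H(s))
        # Exact propagator over the slice: V exp(-i tau w ds) V^dagger
        psi = V @ (np.exp(-1j * tau * w * ds) * (V.conj().T @ psi))
    # Probability of ending in the ground state of H(1) = -sz, i.e. (1, 0).
    return abs(psi[0]) ** 2

p_slow = evolve(tau=50.0)   # slow sweep: stays near the instantaneous ground state
p_fast = evolve(tau=0.1)    # fast sweep: the state is essentially frozen
```

In the fast limit the final ground-state probability is close to the overlap of the initial state with the final ground state (here about 1/2), while a large τ drives it toward unity, as (10)-(11) predict.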

2.2 Convergence conditions of quantum annealing


In this section, we derive a condition which guarantees the convergence of QA. The
problem is what annealing schedule (time dependence of the control parameter) satisfies
the adiabaticity condition (17). We solve this problem on the basis of the idea of
Somma et al. [31], developed for the analysis of SA in terms of quantum adiabaticity, as
reviewed in Sec. 3.

2.2.1 Transverse field Ising model


Let us suppose that the optimization problem we wish to solve can be represented as the
ground-state search of an Ising model of the general form

HIsing ≡ − Σ_{i=1}^N Ji σiz − Σ_{ij} Jij σiz σjz − Σ_{ijk} Jijk σiz σjz σkz − · · · ,    (18)

where the σiα (α = x, y, z) are the Pauli matrices, the components of the spin-1/2 operator at
site i. The eigenvalue of σiz is +1 or −1, which corresponds to the classical Ising spin. Most
combinatorial optimization problems can be written in this form by, for example, mapping
binary variables (0 and 1) to spin variables (±1). Another important assumption is that
the Hamiltonian (18) is extensive, i.e. proportional to the number of spins N for large
N.
To realize QA, a fictitious kinetic energy is introduced, typically by the time-dependent
transverse field

HTF(t) ≡ −Γ(t) Σ_{i=1}^N σix,    (19)

which induces spin flips (quantum fluctuations or quantum tunneling) between the two
states σiz = 1 and σiz = −1, thus allowing a quantum search of the phase space.
Initially the strength of the transverse field Γ(t) is chosen to be very large, and the total
Hamiltonian

H(t) = HIsing + HTF(t)    (20)

is dominated by the second, kinetic term. This corresponds to the high-temperature
limit of SA. The coefficient Γ(t) is then gradually and monotonically decreased toward
0, eventually leaving only the potential term HIsing. Accordingly the state vector |ψ(t)⟩,
which follows the real-time Schrödinger equation, is expected to evolve from the trivial
initial ground state of the transverse-field term (19) to the non-trivial ground state of
(18), which is the solution of the optimization problem. An important issue is how slowly
we should decrease Γ(t) to keep the state vector arbitrarily close to the instantaneous
ground state of the total Hamiltonian (20). The following Theorem provides a solution
to this problem in the form of a sufficient condition.
Theorem 2.2. The adiabaticity condition (17) for the transverse-field Ising model (20)
yields the time dependence of Γ(t),

Γ(t) = a(δt + c)^{−1/(2N−1)}    (21)

for t > t0 (for a given positive t0), as a sufficient condition for convergence of QA. Here a
and c are constants of O(N^0) and δ is the small parameter controlling adiabaticity that
appears in (17).
The following Theorem proved by Hopf [32] will be useful to prove this Theorem. See
Appendix A for the proof.
Theorem 2.3. If all the elements of a square matrix M are strictly positive, Mij > 0,
its maximum eigenvalue λ0 and any other eigenvalue λ satisfy

|λ| ≤ λ0 (κ − 1)/(κ + 1),    (22)

where κ is defined by

κ ≡ max_{i,j,k} (Mik / Mjk).    (23)
Proof of Theorem 2.2. We show that the power decay (21) satisfies the adiabaticity
condition (17), which guarantees convergence to the ground state of HIsing as t → ∞.
For this purpose we estimate the energy gap and the time derivative of the Hamiltonian.
As for the latter, it is straightforward to see that

|⟨j(t)| (dH(t)/dt) |0(t)⟩| ≤ −N (dΓ(t)/dt),    (24)

since the time dependence of H(t) lies only in the kinetic term HTF(t), which has N
terms. Note that dΓ/dt is negative.

To estimate a lower bound for the energy gap, we apply Theorem 2.3 to the operator
M ≡ (E+ − H(t))^N. We assume that the constant E+ satisfies E+ > Emax + Γ0, where
Γ0 ≡ Γ(t0) and Emax is the maximum eigenvalue of the potential term HIsing. All the
elements of the matrix M are strictly positive in the representation that diagonalizes {σiz}
because E+ − H(t) is non-negative and irreducible, that is, any state can be reached from
any other state within at most N steps.

For t > t0, where Γ(t) < Γ0, all the diagonal elements of E+ − H(t) are larger than any
non-zero off-diagonal element Γ(t). Thus, the minimum element of M, which connects
two states having all the spins in mutually opposite directions, is equal to N!Γ(t)^N, where
N! counts the orders in which the spins can be flipped. Replacement of HTF(t) by −NΓ0
shows that the maximum matrix element of M has the upper bound (E+ − Emin + NΓ0)^N,
where Emin is the lowest eigenvalue of HIsing. Thus, we have

κ ≤ (E+ − Emin + NΓ0)^N / (N! Γ(t)^N).    (25)

If we denote the eigenvalues of H(t) by εj(t), (22) is rewritten as

[E+ − εj(t)]^N ≤ ((κ − 1)/(κ + 1)) [E+ − ε0(t)]^N.    (26)

Substitution of (25) into the above inequality yields

Δj(t) ≥ (2[E+ − ε0(t)] N! / (N(E+ − Emin + NΓ0)^N)) Γ(t)^N ≡ A Γ(t)^N,    (27)

where we used 1 − ((κ − 1)/(κ + 1))^{1/N} ≥ 2/(N(κ + 1)) for κ ≥ 1 and N ≥ 1. The coefficient
A is estimated using the Stirling formula as

A ≈ (2√(2πN) [E+ − ε0max] / (N e^N)) (N/(E+ − Emin + NΓ0))^N,    (28)

where ε0max ≡ max_{t>t0} {ε0(t)}. This expression implies that A is exponentially small for
large N.

Now, by combining the above estimates (24) and (27), we find that a sufficient
condition for convergence for t > t0 is

−(N / (A^2 Γ(t)^{2N})) (dΓ(t)/dt) = δ ≪ 1,    (29)

where δ is an arbitrarily small constant. Integrating this differential equation, we
obtain (21).

Remark. The asymptotic power-law decay of the transverse field guarantees that the
excitation probability is bounded by the arbitrarily small constant δ^2 at each instant. This
annealing schedule is not valid when Γ(t) is not sufficiently small, because we evaluated
the energy gap only for Γ(t) < Γ0 (t > t0). If we take the limit t0 → 0, Γ0 increases indefinitely
and the coefficient a in (21) diverges, so the result (21) does not make sense. This is
the reason why a finite positive time t0 must be introduced in the statement of Theorem
2.2.
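The sufficient schedule (21) is far too slow to integrate numerically, but the mechanism behind Theorem 2.2, namely that a slower sweep keeps the state near the instantaneous ground state of (20), can already be seen for N = 4 spins. The instance (a random-field ferromagnetic chain), the linear schedule used in place of (21), and all parameters below are assumptions for the demonstration.

```python
import numpy as np
from functools import reduce

N = 4
I2 = np.eye(2)
sx = np.array([[0., 1.], [1., 0.]])
sz = np.diag([1., -1.])

def site(op, i):
    """Operator `op` acting on spin i of the N-spin system."""
    return reduce(np.kron, [op if j == i else I2 for j in range(N)])

rng = np.random.default_rng(1)
# Stand-in for H_Ising of (18): ferromagnetic chain plus weak random fields
# (the fields remove the ground-state degeneracy).
H_ising = -sum(site(sz, i) @ site(sz, i + 1) for i in range(N - 1))
H_ising = H_ising - sum(rng.uniform(0.1, 0.5) * site(sz, i) for i in range(N))
H_kin = -sum(site(sx, i) for i in range(N))     # transverse-field term (19) at Gamma = 1

def anneal(tau, steps, Gamma0=5.0):
    """Evolve under H(t) = H_ising + Gamma(t) H_kin, Gamma decreasing linearly to 0."""
    w, V = np.linalg.eigh(H_ising + Gamma0 * H_kin)
    psi = V[:, 0].astype(complex)               # instantaneous ground state at t = 0
    dt = tau / steps
    for n in range(steps):
        G = Gamma0 * (1.0 - (n + 0.5) / steps)
        w, V = np.linalg.eigh(H_ising + G * H_kin)
        psi = V @ (np.exp(-1j * w * dt) * (V.conj().T @ psi))
    wI, VI = np.linalg.eigh(H_ising)
    return abs(VI[:, 0].conj() @ psi) ** 2      # final ground-state probability

p_sudden = anneal(tau=0.5, steps=100)
p_adiabatic = anneal(tau=100.0, steps=800)
```

A fast sweep leaves the system close to the uniform superposition, with only a small overlap on the classical ground state, while a slow sweep brings the success probability near unity.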

2.2.2 Transverse ferromagnetic interactions


The same discussion as above applies to QA using transverse ferromagnetic interactions
in addition to a transverse field,

HTI(t) ≡ −ΓTI(t) [ Σ_{i=1}^N σix + Σ_{ij} σix σjx ].    (30)

The second summation runs over appropriate pairs of sites that preserve the extensiveness of
the Hamiltonian. A recent numerical study shows the effectiveness of this type of quantum
kinetic energy [18]. The additional transverse interaction widens the instantaneous energy
gap between the ground state and the first excited state. Thus, it is expected that an
annealing schedule faster than (21) satisfies the adiabaticity condition. The following
Theorem supports this expectation.
Theorem 2.4. The adiabaticity condition for the quantum system HIsing + HTI(t) yields
the time dependence of ΓTI(t) for t > t0 as

ΓTI(t) ∝ t^{−1/(N−1)}.    (31)

Proof. The transverse interactions introduce additional non-zero off-diagonal elements into
the Hamiltonian in the representation that diagonalizes σiz. Consequently, any state can
be reached from any other state within at most N/2 steps. Thus, the strictly positive
operator is modified to (E+ − HIsing − HTI(t))^{N/2}, which leads to a lower bound for the
energy gap proportional to ΓTI(t)^{N/2}. The rest of the proof is the same as for Theorem
2.2.
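The structural fact used in these proofs, that single-spin flips connect any two configurations in at most N steps while pair flips reduce this to N/2, can be checked directly on the matrices: with the transverse field alone, (E+ − H)^{N/2} still has vanishing entries, whereas adding σx σx terms already makes the N/2-th power strictly positive. Everything below (N = 4, a chain of pairs, the numerical values) is an assumed toy setup.

```python
import numpy as np
from functools import reduce

N = 4
I2 = np.eye(2)
sx = np.array([[0., 1.], [1., 0.]])

def site(op, i):
    return reduce(np.kron, [op if j == i else I2 for j in range(N)])

rng = np.random.default_rng(7)
H_pot = np.diag(rng.uniform(-1.0, 1.0, 2 ** N))   # diagonal potential, as in (18)
Gamma = 0.3
H_tf = -Gamma * sum(site(sx, i) for i in range(N))                 # single flips, (19)
H_ti = H_tf - Gamma * sum(site(sx, i) @ site(sx, i + 1)
                          for i in range(N - 1))                   # plus pair flips, (30)

E_plus = 5.0    # exceeds every diagonal entry, so E+ - H has a positive diagonal
M_tf = E_plus * np.eye(2 ** N) - H_pot - H_tf
M_ti = E_plus * np.eye(2 ** N) - H_pot - H_ti

half_tf = np.linalg.matrix_power(M_tf, N // 2)   # N/2 single flips are not enough...
full_tf = np.linalg.matrix_power(M_tf, N)        # ...but N of them are
half_ti = np.linalg.matrix_power(M_ti, N // 2)   # with pair flips, N/2 already suffices
```

Since all entries of E+ − H are non-negative, a power of the matrix has a strictly positive entry exactly when a corresponding chain of flips connects the two configurations; the earlier onset of strict positivity is what shortens the exponent in (31) relative to (21).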

The above result implies that additional non-zero off-diagonal elements of the Hamiltonian
accelerate the convergence of QA. It is thus interesting to consider the many-body
transverse interaction of the form

HMTI(t) = −ΓMTI(t) Π_{i=1}^N (1 + σix).    (32)

All the elements of HMTI are equal to −ΓMTI(t) in the representation that diagonalizes
σiz. In this system, the following Theorem holds.

Theorem 2.5. The adiabaticity condition for the quantum system HIsing + HMTI(t) yields
the time dependence of ΓMTI(t) for t > t0 as

ΓMTI(t) ∝ 2^{N−2}/(δt).    (33)

Proof. We define the strictly positive operator as M = E+ − HIsing − HMTI(t). The
maximum and minimum matrix elements of M are E+ − Emin + ΓMTI(t) and ΓMTI(t),
respectively. Thus we have

κ = (E+ − Emin + ΓMTI(t))/ΓMTI(t),    (34)

(κ − 1)/(κ + 1) = (E+ − Emin)/(E+ − Emin + 2ΓMTI(t)) ≥ 1 − 2ΓMTI(t)/(E+ − Emin).    (35)

The inequality (22) for the strictly positive operator yields

Δj(t) ≥ 2ΓMTI(t)(E+ − ε0max)/(E+ − Emin) ≡ Ã ΓMTI(t),    (36)

where Ã is of O(N^0). Since the matrix element of the derivative of the Hamiltonian is
bounded as

|⟨j(t)| (dH(t)/dt) |0(t)⟩| ≤ −2^N (dΓMTI/dt),    (37)

we find that the sufficient condition for convergence with the many-body transverse
interaction is

−(2^N / (Ã^2 ΓMTI(t)^2)) (dΓMTI/dt) = δ ≪ 1.    (38)

Integrating this differential equation yields the annealing schedule (33).

2.2.3 Computational complexity


The asymptotic power-law annealing schedules guarantee adiabatic evolution during
the annealing process. The power-law dependence on t is much faster than the log-inverse
law for the control parameter in SA, T(t) = pN/log(αt + 1), to be discussed in the next
section, first proved by Geman and Geman [33]. However, this does not mean that QA
provides an algorithm to solve NP problems in polynomial time. In the case with the
transverse field only, the time for Γ(t) to reach a sufficiently small value ε (which implies
that the system is sufficiently close to the final ground state of HIsing, HTF being a
small perturbation) is estimated from (21) as

tTF ≈ (1/δ)(1/ε)^{2N−1}.    (39)

This relation clearly shows that QA needs a time exponential in N to converge.

For QA with many-body transverse interactions, the exponent of t in the annealing
schedule (33) does not depend on the system size N. Nevertheless, this also does not
mean that QA provides a polynomial-time algorithm, because of the factor 2^N. The
characteristic time for ΓMTI to reach a sufficiently small value ε is estimated as

tMTI ≈ 2^{N−2}/(δε),    (40)

which again shows exponential dependence on N.
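Estimates (39) and (40) are plain arithmetic and easy to tabulate; the point is that both grow exponentially with N, through the exponent 2N − 1 in the first case and the prefactor 2^{N−2} in the second. The values of δ and ε below are arbitrary illustrative choices.

```python
delta, eps = 0.1, 0.1

def t_tf(N):
    """Time for Gamma(t) of (21) to reach eps, per estimate (39)."""
    return (1.0 / delta) * (1.0 / eps) ** (2 * N - 1)

def t_mti(N):
    """Time for Gamma_MTI(t) of (33) to reach eps, per estimate (40)."""
    return 2 ** (N - 2) / (delta * eps)

# Growth factor per added spin: (1/eps)^2 = 100 for the transverse field,
# 2 for the many-body interaction -- exponential in N either way.
growth_tf = [t_tf(n + 1) / t_tf(n) for n in range(2, 8)]
growth_mti = [t_mti(n + 1) / t_mti(n) for n in range(2, 8)]
```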

These exponential computational complexities do not come as a surprise because
Theorems 2.2, 2.4 and 2.5 all apply to any optimization problem written in the generic
form (18), which includes the worst cases of the most difficult problems. Similar arguments
apply to SA [34].

Another remark concerns the comparison of Γ(t) (∝ t^{−1/(2N−1)}) in QA with T(t) (∝
N/log(αt + 1)) in SA, leading to the conclusion that the former schedule is faster than the
latter. The transverse-field coefficient Γ in a quantum system plays the same role,
qualitatively and even quantitatively, as the temperature T does in a corresponding classical
system, at least in the Hopfield model in a transverse field [35]. When the phase diagram is
drawn in terms of Γ and α (the number of embedded patterns divided by the number of
neurons) for the ground state of the model, the result has precisely the same structure as the
T-α phase diagram of the finite-temperature version of the Hopfield model without a
transverse field. This example serves as a justification for the direct comparison of Γ and T,
at least as far as the theoretical analyses of QA and SA are concerned.

3 Convergence condition of SA and quantum adiabaticity
We next study the convergence condition of SA, to be compared with that of QA. This
problem was originally solved by Geman and Geman [33] using the theory of inhomogeneous
Markov chains, as described in the quantum Monte Carlo context in Sec. 5. It is quite
surprising that their result is reproduced using the quantum adiabaticity condition
applied after a classical-quantum mapping [31]. This approach is reviewed in this section,
following [31], to clarify the correspondence between the quasi-equilibrium condition for
SA in a classical system and the adiabaticity condition in the corresponding quantum
system. The analysis will also reveal an aspect related to the equivalence of QA and
QAE.

3.1 Classical-quantum mapping


The starting point is an expression of a classical thermal expectation value in terms of a
quantum ground-state expectation value. A well-known mapping between quantum and
classical systems is to rewrite the former in terms of the latter with an extra imaginary-time
(or Trotter) dimension [24]. The mapping discussed in the present section is a
different one, which allows us to express the thermal expectation value of a classical
system in terms of the ground-state expectation value of a corresponding quantum system
without an extra dimension.

Suppose that the classical Hamiltonian, whose value we want to minimize, is written
as an Ising spin system as in (18):

H = − Σ_{i=1}^N Ji σiz − Σ_{ij} Jij σiz σjz − Σ_{ijk} Jijk σiz σjz σkz − · · · .    (41)

The thermal expectation value of a classical physical quantity Q({σiz}) is

⟨Q⟩T = (1/Z(T)) Σ_{σ} e^{−βH} Q({σi}),    (42)

where the sum runs over all configurations of Ising spins, i.e. over the values taken by
the z-components of the Pauli matrices, σiz = σi (= ±1) (∀i). The symbol {σi} stands for
the set {σ1, σ2, · · · , σN}.
An important element is the following Theorem.

Theorem 3.1. The thermal expectation value (42) is equal to the expectation value of Q
in the quantum wave function

|ψ(T)⟩ = Σ_{σ} e^{−βH/2} |{σi}⟩,    (43)

where |{σi}⟩ is the basis state diagonalizing each σiz with eigenvalue σi. The sum runs over
all such possible assignments. Assume T > 0. The wave function (43) is the ground state
of the quantum Hamiltonian

Hq(T) = −χ Σ_j Hqj(T) ≡ −χ Σ_j (σxj − e^{βHj}),    (44)

where Hj is the sum of the terms of the Hamiltonian (41) involving site j,

Hj = −Jj σjz − Σ_k Jjk σjz σkz − Σ_{kl} Jjkl σjz σkz σlz − · · · .    (45)

The coefficient χ is defined by χ = e^{−βp} with p = maxj |Hj|.

Proof. The first half is trivial:

⟨ψ(T)|Q|ψ(T)⟩/⟨ψ(T)|ψ(T)⟩ = (1/Z(T)) Σ_{σ} e^{−βH} ⟨{σi}|Q|{σi}⟩ = ⟨Q⟩T.    (46)

To show the second half, we first note that

σxj Σ_{σ} |{σi}⟩ = Σ_{σ} |{σi}⟩,    (47)

since the operator σxj just changes the order of the above summation. It is also easy to
see that

σxj e^{−βH/2} = e^{βHj} e^{−βH/2} σxj    (48)

because

σxj e^{−βH/2} σxj = e^{−β(H−Hj)/2} σxj e^{−βHj/2} σxj = e^{−β(H−Hj)/2} e^{βHj/2} = e^{βHj} e^{−βH/2},    (49)

as both H and Hj are diagonal in the present representation and H − Hj does not include
σjz, so [H − Hj, σjx] = 0. We therefore have

Hqj(T)|ψ(T)⟩ = (σxj − e^{βHj}) Σ_{σ} e^{−βH/2} |{σi}⟩ = 0.    (50)

Thus |ψ(T)⟩ is an eigenstate of Hq(T) with eigenvalue 0. In the present representation,
the non-vanishing off-diagonal elements of −Hq(T) are all positive and the coefficients of
|ψ(T)⟩ are also all positive, as one sees in (43). Then |ψ(T)⟩ is the unique ground state
of Hq(T) according to the Perron-Frobenius Theorem [36].
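Theorem 3.1 can be verified exhaustively for a small chain: build Hj as in (45), assemble Hq(T) of (44) in the σz basis, and check that the vector with coefficients e^{−βH/2} is annihilated by Hq(T) and is its unique ground state. The chain couplings, fields, β and the random seed are assumptions of this toy check.

```python
import numpy as np
from functools import reduce

N, beta = 3, 0.7
I2 = np.eye(2)
sx = np.array([[0., 1.], [1., 0.]])
sz = np.diag([1., -1.])

def site(op, i):
    return reduce(np.kron, [op if j == i else I2 for j in range(N)])

rng = np.random.default_rng(3)
J = rng.uniform(0.5, 1.0, N - 1)   # nearest-neighbour couplings
h = rng.uniform(0.1, 0.5, N)       # local fields

z = [np.diag(site(sz, i)) for i in range(N)]   # value of each sigma_i^z per configuration

# H_j of (45): every term of the classical chain Hamiltonian touching site j.
Hj = []
for j in range(N):
    t = -h[j] * z[j]
    if j > 0:
        t = t - J[j - 1] * z[j - 1] * z[j]
    if j < N - 1:
        t = t - J[j] * z[j] * z[j + 1]
    Hj.append(t)

# Classical H of (41); each pair term counted once.
Hcl = -sum(h[j] * z[j] for j in range(N)) \
      - sum(J[j] * z[j] * z[j + 1] for j in range(N - 1))

p = max(np.abs(hj).max() for hj in Hj)
chi = np.exp(-beta * p)
Hq = -chi * sum(site(sx, j) - np.diag(np.exp(beta * Hj[j])) for j in range(N))

psi = np.exp(-beta * Hcl / 2)                  # coefficients of (43)
residual = np.abs(Hq @ psi).max()              # should vanish, per (50)
eigs = np.linalg.eigvalsh(Hq)
```

The smallest eigenvalue is zero with the rest strictly above it, in accordance with the Perron-Frobenius argument in the proof.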

A few remarks are in order. In the high-temperature limit, the quantum Hamiltonian
is composed of just the transverse-field term,

Hq(T → ∞) = − Σ_j (σxj − 1).    (51)

Correspondingly the ground-state wave function |ψ(T → ∞)⟩ is the simple summation
over all possible states with equal weight. In this way the thermal fluctuations in the
original classical system are mapped to quantum fluctuations. The low-temperature
limit has, in contrast, the purely classical Hamiltonian

Hq(T ≈ 0) → χ Σ_j e^{βHj},    (52)

and the ground state of Hq(T ≈ 0) is also the ground state of H, as is apparent from
the definition (43). Hence the decrease of thermal fluctuations in SA is mapped to the
decrease of quantum fluctuations. As explained below, this correspondence allows us
to analyze the condition for quasi-equilibrium in classical SA using the adiabaticity
condition for the quantum system.

3.2 Adiabaticity and convergence condition of SA


The adiabaticity condition applied to the quantum system introduced above leads to the
condition of convergence of SA. Suppose that we monotonically decrease the temperature
as a function of time, T (t), to realize SA.
Theorem 3.2. The adiabaticity condition for the quantum system of Hq (T ) yields the
time dependence of T (t) as
T(t) = \frac{pN}{\log(\alpha t + 1)}   (53)
in the limit of large N . The coefficient α is exponentially small in N .
A few Lemmas will be useful to prove this Theorem.
Lemma 3.3. The energy gap ∆(T ) of Hq (T ) between the ground state and the first
excited state is bounded below as

\Delta(T) \ge a\sqrt{N}\, e^{-(\beta p + c)N},   (54)

where a and c are N -independent positive constants, in the asymptotic limit of large N .

Proof. The analysis of Sec. 2.2.1 applies with the replacement of Γ(t) by χ = e−βp
and ε0 (t) = 0. This latter condition comes from Hq (T )|ψ(T )i = 0. The condition
Γ(t) < Γ0 (t > t0 ) is unnecessary here because the off-diagonal element χ can always be
chosen smaller than the diagonal elements by adding a positive constant to the diagonal.
Equation (27) gives
∆j (t) ≥ Ae−βpN (55)
and A satisfies, according to (28),

A \approx b\sqrt{2\pi N}\, e^{-cN}   (56)

with b and c positive constants of O(N^0).

Lemma 3.4. The matrix element of the derivative of Hq (T ), relevant to the adiabaticity
condition, satisfies

\langle\psi_1(T)|\partial_T H_q(T)|\psi(T)\rangle = -\frac{\Delta(T)\,\langle\psi_1(T)|H|\psi(T)\rangle}{2 k_B T^2},   (57)
where ψ1 (T ) is the normalized first excited state of Hq (T ).

Proof. By differentiating the identity

Hq (T )|ψ(T )i = 0 (58)

we find
   
\left(\frac{\partial}{\partial T} H_q(T)\right)|\psi(T)\rangle = -H_q(T)\, \frac{\partial}{\partial T}|\psi(T)\rangle = -\frac{1}{2 k_B T^2}\, H_q(T)\, H\, |\psi(T)\rangle.   (59)

This relation immediately proves the Lemma if we notice that the ground state energy of
Hq (T ) is zero and therefore Hq (T )|ψ1 (T )i = ∆(T )|ψ1 (T )i.

Lemma 3.5. The matrix element of H satisfies


|\langle\psi_1(T)|H|\psi(T)\rangle| \le pN \sqrt{Z(T)}.   (60)

Proof. There are N terms in H = \sum_j H_j, each of which has norm at most p. The factor \sqrt{Z(T)} appears from the normalization of |\psi(T)\rangle.

Proof of Theorem 3.2. The condition of adiabaticity for the quantum system Hq (T )
reads
\frac{1}{\Delta(T)^2 \sqrt{Z(T)}}\, \left|\langle\psi_1(T)|\partial_T H_q(T)|\psi(T)\rangle\right| \left|\frac{dT}{dt}\right| = \delta   (61)
with sufficiently small \delta. If we rewrite the matrix element by Lemma 3.4, the left-hand side is

\frac{|\langle\psi_1(T)|H|\psi(T)\rangle|}{2 k_B T^2\, \Delta(T) \sqrt{Z(T)}}\, \left|\frac{dT}{dt}\right|.   (62)
By replacing the numerator by its bound in Lemma 3.5 we have

\frac{pN}{2 k_B T^2\, \Delta(T)}\, \left|\frac{dT}{dt}\right| = \tilde\delta \ll 1   (63)

as a sufficient condition for adiabaticity. Using the bound of Lemma 3.3 and integrating
the above differential equation for T (t) noticing dT /dt < 0, we reach the statement of
Theorem 3.2.
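To spell out this last integration step (taking k_B = 1 for brevity), one can substitute the gap bound of Lemma 3.3 into (63) and insert the ansatz T(t) = pN/\log(\alpha t + 1); the time dependence then cancels. The following is a sketch of that computation, not part of the original proof:

```latex
% (63) with \Delta(T) \ge a\sqrt{N}\, e^{-(\beta p + c)N} and \beta = 1/T gives the
% sufficient condition
\frac{pN}{2 a \sqrt{N}\, T^2}\, e^{(p/T + c)N} \left|\frac{dT}{dt}\right| = \tilde\delta .
% For T(t) = pN/\log(\alpha t + 1) one has e^{pN/T} = \alpha t + 1, hence
\left|\frac{dT}{dt}\right| = \frac{pN \alpha}{(\alpha t + 1)\,\log^2(\alpha t + 1)}
                           = \frac{\alpha}{pN}\, T^2\, e^{-pN/T} ,
% and substitution turns the left-hand side into the t-independent constant
\frac{\alpha\, e^{cN}}{2 a \sqrt{N}} = \tilde\delta
\quad\Longrightarrow\quad
\alpha = 2 a \sqrt{N}\, \tilde\delta\, e^{-cN} ,
% which is exponentially small in N, as stated in Theorem 3.2.
```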

3.3 Remarks
Equation (53) reproduces the Geman-Geman condition for convergence of SA [33]. Their
method of proof is to use the theory of classical inhomogeneous (i.e. time-dependent)
Markov chain representing non-equilibrium processes. It may thus be naively expected
that the classical system under consideration may not stay close to equilibrium during
the process of SA since the temperature always changes. It therefore comes as a surprise
that the adiabaticity condition, which is equivalent to the quasi-equilibrium condition
according to Theorem 3.1, leads to Theorem 3.2. The rate of temperature change in this
latter Theorem is slow enough to guarantee the quasi-equilibrium condition even when
the temperature keeps changing.
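For illustration, the schedule (53) can be evaluated directly; the values of p, N and alpha below are placeholders chosen for display only (in Theorem 3.2 the coefficient alpha is exponentially small in N):

```python
import math

def sa_schedule(t, p=1.0, N=10, alpha=1e-3):
    """SA schedule (53): T(t) = p*N / log(alpha*t + 1).
    p, N, alpha are illustrative placeholder values, not from the text."""
    return p * N / math.log(alpha * t + 1.0)

# The temperature halves only when log(alpha*t + 1) doubles, i.e. when
# (alpha*t + 1) is squared -- logarithmic cooling is extremely slow.
for t in (10**3, 10**6, 10**12):
    print(t, round(sa_schedule(t), 3))
```

This makes concrete why the Geman-Geman-type bound, while guaranteeing convergence, is impractical as an algorithm.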
Also, Theorem 3.2 is quite general, covering the worst cases, as it applies to any
system written as the Ising model of (41). This fact means that one may apply a faster
rate of temperature decrease to solve a given specific problem with small errors. The
same comment applies to the QA situation in Sec. 2.
Another remark is on the relation of QA and QAE. Mathematical analyses of QA
often focus their attention to the generic convergence conditions in the infinite-time limit
as seen in Secs. 2 and 5 as well as in the early paper [7], although the residual energy after

finite-time evolution has also been extensively investigated mainly in numerical studies.
This aspect may have led some researchers to think that QA is different from QAE,
since the studies using the latter mostly concern the computational complexity of finite-
time evolution for a given specific optimization problem using adiabaticity to construct
an algorithm of QAE. As has been shown in the present and the previous sections, the
adiabaticity condition also leads to the convergence condition in the infinite-time limit for
QA and SA. In this sense QA, QAE and even SA share essentially the same mathematical
background.

4 Reduction of errors for finite-time evolution


In Sec. 2, we discussed the convergence condition of QA implemented for the transverse-field Ising model. The power-law decrease of the transverse field guarantees the adiabatic evolution. This annealing schedule, however, does not provide a practically useful algorithm because an infinitely long time is necessary to reach the exact solution. An approximate algorithm for finite annealing time \tau should be used in practice. Since such a finite-time algorithm does not satisfy the generic convergence condition, the answer includes a certain amount of error. An important question is how the error depends on the annealing time \tau.
Suzuki and Okada showed that the error after adiabatic evolution for time \tau is generally proportional to \tau^{-2} in the limit of large \tau with the system size N kept finite [16]. In this section, we analyze their results in detail and propose new annealing schedules which show smaller errors proportional to \tau^{-2m} (m > 1) [37]. This method allows us to reduce errors by orders of magnitude without compromising the computational complexity apart from a possibly moderate numerical factor.

4.1 Upper bound for excitation probability


Let us consider the general time-dependent Hamiltonian (1). The goal of this section is to evaluate the excitation probability (closely related to the error probability) at the final time s = 1 under the adiabaticity condition (16).
This task is easy because we have already obtained the asymptotic form of the excitation amplitude (11). The upper bound for the excitation probability is derived as

\left|\langle j(1)|\tilde\psi(1)\rangle\right|^2 = |c_{j\neq 0}(1)|^2 \lesssim \frac{1}{\tau^2}\left[\,|A_j(0)| + |A_j(1)|\,\right]^2 + O(\tau^{-3}).   (64)
τ
This formula indicates that the coefficient of the τ −2 term is determined only by the state
of the system at s = 0 and 1 and vanishes if Aj (s) is zero at s = 0 and 1.
When the \tau^{-2} term vanishes, a similar calculation yields the next-order term of the excitation probability. If \tilde H'(0) = \tilde H'(1) = 0, the excitation amplitude c_{j\neq 0}(1) is at most of order \tau^{-2} and then c_0(1) \approx 1 + O(\tau^{-3}). Therefore we have

c_{j\neq 0}(1) \approx \int_0^1 ds\; \frac{e^{i\tau[\phi_j(s)-\phi_0(s)]}}{\Delta_j(s)}\, \langle j(s)|\frac{d\tilde H(s)}{ds}|0(s)\rangle + O(\tau^{-3})
\approx \frac{1}{\tau^2}\left[A_j^{(2)}(0) - e^{i\tau[\phi_j(1)-\phi_0(1)]}\, A_j^{(2)}(1)\right] + O(\tau^{-3}),   (65)
where we defined

A_j^{(m)}(s) \equiv \frac{1}{\Delta_j(s)^{m+1}}\, \langle j(s)|\frac{d^m \tilde H(s)}{ds^m}|0(s)\rangle.   (66)
To derive the second line of (65), we used integration by parts twice, and (5) and (6). The
other τ −2 terms vanish because of the assumption H̃ ′ (0) = H̃ ′ (1) = 0. Thus the upper


Figure 1: Examples of annealing schedules with reduced errors listed in (70)-(73).

bound of the next order for the excitation probability under this assumption is obtained as

\left|\langle j(1)|\tilde\psi(1)\rangle\right|^2 \lesssim \frac{1}{\tau^4}\left[A_j^{(2)}(0) + A_j^{(2)}(1)\right]^2 + O(\tau^{-5}).   (67)
It is easy to see that the τ −4 -term also vanishes when H̃ ′′ (0) = H̃ ′′ (1) = 0. It is straight-
forward to generalize these results to prove the following Theorem.
Theorem 4.1. If the kth derivative of \tilde H(s) is equal to zero at s = 0 and 1 for all k = 1, 2, \cdots, m - 1, the excitation probability has the upper bound

\left|\langle j(1)|\tilde\psi(1)\rangle\right|^2 \lesssim \frac{1}{\tau^{2m}}\left[A_j^{(m)}(0) + A_j^{(m)}(1)\right]^2 + O(\tau^{-2m-1}).   (68)

4.2 Annealing schedules with reduced errors


Although we have so far considered the general time-dependent Hamiltonian, the ordinary
Hamiltonian for QA with finite annealing time is composed of the potential term and the
kinetic energy term,
\tilde H(s) = f(s)\, H_{\rm pot} + [1 - f(s)]\, H_{\rm kin},   (69)

where H_{\rm pot} and H_{\rm kin} generalize H_{\rm Ising} and H_{\rm TF} in Sec. 2, respectively. The function
f (s), representing the annealing schedule, satisfies f (0) = 0 and f (1) = 1. Thus H̃(0) =
Hkin and H̃(1) = Hpot . The ground state of Hpot corresponds to the solution of the
optimization problem. The kinetic energy is chosen so that its ground state is trivial. The
above Hamiltonian connects the trivial initial state and the non-trivial desired solution
after evolution time τ .
The condition for the error to decay as \tau^{-2m} is obtained straightforwardly from the results of the previous section, because the Hamiltonian (69) depends on time only through the annealing schedule f(s). It is sufficient that the kth derivative of f(s) is zero at s = 0 and 1 for k = 1, 2, \cdots, m - 1. We note that f(s) should belong to C^m, that is, f(s) should be an m-times differentiable function whose mth derivative is continuous.

Examples of the annealing schedules f_m(s) with the \tau^{-2m} error rate are the following polynomials:

f_1(s) = s,   (70)
f_2(s) = s^2 (3 - 2s),   (71)
f_3(s) = s^3 (10 - 15s + 6s^2),   (72)
f_4(s) = s^4 (35 - 84s + 70s^2 - 20s^3).   (73)
The linear annealing schedule f_1(s), which shows the \tau^{-2} error, has been used in past studies. Although we list here only polynomials symmetric with respect to the point s = 1/2, this is not essential. For example, f(s) = (1 - \cos(\pi s^2))/2 also has the \tau^{-4} error rate because f'(0) = f'(1) = f''(0) = 0 but f''(1) = -2\pi^2.
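The boundary conditions behind (70)-(73) are easy to verify symbolically. A short sketch using polynomial coefficient lists (ascending powers; the schedules themselves are those of the text):

```python
from numpy.polynomial import polynomial as P

# Coefficient lists for the schedules (70)-(73); f_m must have vanishing
# derivatives of order 1, ..., m-1 at both s = 0 and s = 1.
schedules = {
    1: [0, 1],                          # f1 = s
    2: [0, 0, 3, -2],                   # f2 = 3s^2 - 2s^3
    3: [0, 0, 0, 10, -15, 6],           # f3 = 10s^3 - 15s^4 + 6s^5
    4: [0, 0, 0, 0, 35, -84, 70, -20],  # f4 = 35s^4 - 84s^5 + 70s^6 - 20s^7
}

for m, c in schedules.items():
    # endpoint values f(0) = 0 and f(1) = 1
    assert abs(P.polyval(0.0, c)) < 1e-12
    assert abs(P.polyval(1.0, c) - 1.0) < 1e-12
    d = list(c)
    for k in range(1, m):
        d = P.polyder(d)
        # kth derivative vanishes at both endpoints for k = 1, ..., m-1
        assert abs(P.polyval(0.0, d)) < 1e-12
        assert abs(P.polyval(1.0, d)) < 1e-12
print("all boundary-derivative conditions verified")
```

These are the standard "smoothstep" polynomials; higher m flattens the schedule more strongly at both ends at the cost of a steeper slope near s = 1/2.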

4.3 Numerical results


4.3.1 Two-level system
To confirm the upper bound for the excitation probability discussed above, it is instructive to study the two-level system, the Landau-Zener problem, with the Hamiltonian

H_{\rm LZ}(t) = -\left[\frac{1}{2} - f\!\left(\frac{t}{\tau}\right)\right] h\, \sigma^z - \alpha\, \sigma^x.   (74)
The energy gap of H_{\rm LZ}(t) has the minimum 2\alpha at f(s) = 1/2. If the annealing time \tau is not large enough to satisfy (16), non-adiabatic transitions occur. The Landau-Zener theorem [38, 39] provides the excitation probability P_{\rm ex}(\tau) = |\langle 1(1)|\tilde\psi(1)\rangle|^2 as

P_{\rm ex}(\tau) = \exp\left(-\frac{\pi\alpha^2 \tau}{f'(s^*)\, h}\right),   (75)
where s^* denotes the solution of f(s^*) = 1/2. On the other hand, if \tau is sufficiently large, the system evolves adiabatically. Then the excitation probability has the upper bound (68), which is estimated as

P_{\rm ex}(\tau) \lesssim \frac{4 h^2 \alpha^2}{\tau^{2m}\, (h^2 + 4\alpha^2)^{m+2}} \left[\frac{d^m f}{ds^m}(0) + \frac{d^m f}{ds^m}(1)\right]^2.   (76)
We numerically solved the Schrödinger equation (2) for this system (74) with the
Runge-Kutta method [41]. Figure 2 shows the result for the excitation probability with
annealing schedules (70)-(73). The initial state is the ground state of HLZ (0). The
parameters are chosen to be h = 2 and \alpha = 0.2. The curved and straight lines show (75) and (76), respectively. In the small- and large-\tau regions, the excitation probability fits these two expressions almost perfectly.
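A minimal sketch of this numerical experiment, assuming the reconstructed form of (74) above (which reproduces the minimum gap 2\alpha at f = 1/2) and a plain fixed-step fourth-order Runge-Kutta integrator rather than the adaptive scheme of [41]; h = 2 and alpha = 0.2 follow the text:

```python
import numpy as np

# Two-level Landau-Zener sweep, assuming
#   H_LZ(t) = -(1/2 - f(t/tau)) * h * sigma_z - alpha * sigma_x,
# a reconstruction of (74); parameters h = 2, alpha = 0.2 as in the text.
h, alpha = 2.0, 0.2
SZ = np.array([[1.0, 0.0], [0.0, -1.0]], dtype=complex)
SX = np.array([[0.0, 1.0], [1.0, 0.0]], dtype=complex)

def f2(s):                               # improved schedule (71)
    return s * s * (3.0 - 2.0 * s)

def H_LZ(s, f):
    return -(0.5 - f(s)) * h * SZ - alpha * SX

def excitation_probability(tau, f, steps_per_unit=200):
    """Integrate i d|psi>/dt = H|psi> by fixed-step RK4 and return the
    final overlap with the instantaneous excited state."""
    n = int(steps_per_unit * tau)
    dt = tau / n
    def rhs(t, y):
        return -1j * (H_LZ(t / tau, f) @ y)
    psi = np.linalg.eigh(H_LZ(0.0, f))[1][:, 0].astype(complex)  # ground state
    for k in range(n):
        t = k * dt
        k1 = rhs(t, psi)
        k2 = rhs(t + dt / 2, psi + dt / 2 * k1)
        k3 = rhs(t + dt / 2, psi + dt / 2 * k2)
        k4 = rhs(t + dt, psi + dt * k3)
        psi = psi + dt / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    excited = np.linalg.eigh(H_LZ(1.0, f))[1][:, 1]
    return abs(excited.conj() @ psi) ** 2

# longer annealing suppresses the non-adiabatic transition
print(excitation_probability(30.0, f2), excitation_probability(100.0, f2))
```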

4.3.2 Spin glass model


We next carried out simulations of a rather large system, the Ising spin system with
random interactions. The quantum fluctuations are introduced by the uniform transverse
field. Thus, the potential and kinetic energy terms are defined by
H_{\rm pot} = -\sum_{\langle ij\rangle} J_{ij}\, \sigma_i^z \sigma_j^z - h \sum_{i=1}^{N} \sigma_i^z,   (77)

H_{\rm kin} = -\Gamma \sum_{i=1}^{N} \sigma_i^x.   (78)





Figure 2: The annealing-time dependence of the excitation probability for the two-level
system (74) using schedules (70) to (73). The curved and straight lines show (75) and (76)
for each annealing schedule, respectively. The parameters in (74) are chosen to be h = 2 and
α = 0.2.

The initial state, namely the ground state of Hkin , is the all-up state along the x axis.
The difference between the obtained approximate energy and the true ground state
energy (exact solution) is the residual energy Eres . It is a useful measure of the error rate
of QA. It has the same behavior as the excitation probability because it is rewritten as

E_{\rm res} \equiv \langle\tilde\psi(1)|H_{\rm pot}|\tilde\psi(1)\rangle - \varepsilon_0(1)   (79)
= \sum_{j>0} \Delta_j(1)\, \left|\langle j(1)|\tilde\psi(1)\rangle\right|^2.   (80)

Therefore E_{\rm res} is expected to be asymptotically proportional to \tau^{-2m} using the improved annealing schedules.
We investigated the two-dimensional square lattice of size 3×3. The quenched random
coupling constants {Jij } are chosen from the uniform distribution between −1 and +1, as
shown in Fig. 3. The parameters are h = 0.1 and Γ = 1. Figure 4 shows the τ dependence
of the residual energy using the annealing schedules (70)-(73). Straight lines representing
τ −2m (m = 1, 2, 3, 4) are also shown for comparison. The data clearly indicates the
τ −2m -law for large τ . The irregular behavior around Eres ≈ 10−25 comes from numerical
rounding errors.

4.3.3 Database search problem


As another example, we apply the improved annealing schedules to the problem of searching for an item in an unsorted database. Consider N items, among which one is marked. The goal of this problem is to find the marked item in a minimum time. The pioneering quantum algorithm proposed by Grover [42] solves this task in a time of order \sqrt{N}, whereas the classical algorithm tests N/2 items on average. Farhi et al. [26] proposed
a QAE algorithm and Roland and Cerf [43] found a QAE-based algorithm with the same
computational complexity as Grover’s algorithm. Although their schedule is optimal in

Figure 3: Configuration of random interactions \{J_{ij}\} on the 3 \times 3 square lattice which we investigated, and spin configuration of the target state. The solid and dashed lines indicate ferromagnetic and antiferromagnetic interactions, respectively.





Figure 4: The annealing-time dependence of the residual energy for the two-dimensional spin
glass model with improved annealing schedules. The solid lines denote functions proportional
to τ −2m (m = 1, 2, 3, 4). The parameter values are h = 0.1 and Γ = 1.

the sense that the excitation probability due to non-adiabatic transitions is kept equal to a small constant at each time, it has the \tau^{-2} error rate. We show that annealing schedules with
the τ −2m error rate can be constructed by a slight modification of their optimal schedule.
Let us consider the Hilbert space which has the basis states |ii (i = 1, 2, · · · , N ), and
the marked state is denoted by |mi. Suppose that we can construct the Hamiltonian (69)
with two terms,
H_{\rm pot} = 1 - |m\rangle\langle m|,   (81)

H_{\rm kin} = 1 - \frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} |i\rangle\langle j|.   (82)

The Hamiltonian Hpot can be applied without the explicit knowledge of |mi, the same
assumption as in Grover’s algorithm. The initial state is a superposition of all basis
states,
|\psi(0)\rangle = \frac{1}{\sqrt{N}} \sum_{i=1}^{N} |i\rangle,   (83)
which does not depend on the marked state. The energy gap between the ground state
and the first excited state,
\Delta_1(s) = \sqrt{1 - 4\, \frac{N-1}{N}\, f(s)[1 - f(s)]},   (84)
has a minimum at f (s) = 1/2. The highest eigenvalue ε2 (s) = 1 is (N −2)-fold degenerate.
To derive the optimal annealing schedule, we briefly review the results reported by
Roland and Cerf [43]. When the energy gap is small (i.e. for f (s) ≈ 1/2), non-adiabatic
transitions are likely to occur. Thus we need to change the Hamiltonian carefully. On
the other hand, when the energy gap is not very small, too slow a change wastes time.
Thus the speed of parameter change should be adjusted adaptively to the instantaneous
energy gap. This is realized by tuning the annealing schedule to satisfy the adiabaticity
condition (16) in each infinitesimal time interval, that is,

\frac{|A_1(s)|}{\tau} = \delta,   (85)
where \delta is a small constant. In the database search problem, this condition is rewritten as

\frac{\sqrt{N-1}}{\tau N\, \Delta_1(s)^3}\, \frac{df}{ds} = \delta.   (86)
After integration under the boundary conditions f(0) = 0 and f(1) = 1, we obtain

f_{\rm opt}(s) = \frac{1}{2} + \frac{2s - 1}{2\sqrt{N - (N-1)(2s-1)^2}}.   (87)

As plotted by a solid line in Fig. 5, this function changes most slowly when the energy
gap takes the minimum value. It is noted that the annealing time is determined by the
small constant \delta as

\tau = \frac{\sqrt{N-1}}{\delta},   (88)

which means that the computation time is of order \sqrt{N}, similarly to Grover's algorithm.
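A small numerical sketch, assuming nothing beyond (84)-(88): by construction, the combination appearing on the left of (86), multiplied by \tau, should equal the constant \tau\delta = \sqrt{N-1} at every s.

```python
import numpy as np

N = 64  # database size used in the text's simulations

def gap(f):
    """Energy gap (84) as a function of f."""
    return np.sqrt(1.0 - 4.0 * (N - 1) / N * f * (1.0 - f))

def f_opt(s):
    """Optimal annealing schedule (87)."""
    u = 2.0 * s - 1.0
    return 0.5 + u / (2.0 * np.sqrt(N - (N - 1) * u * u))

s = np.linspace(0.0, 1.0, 10001)
f = f_opt(s)
df = np.gradient(f, s)
# tau * (left-hand side of (86)); should be sqrt(N-1) everywhere
local = np.sqrt(N - 1) / N * df / gap(f) ** 3

assert abs(f[0]) < 1e-12 and abs(f[-1] - 1.0) < 1e-12   # boundary conditions
assert abs(gap(0.5) - 1.0 / np.sqrt(N)) < 1e-12         # minimum gap 1/sqrt(N)
interior = local[5:-5]   # drop endpoints, where the numerical gradient is noisy
assert np.allclose(interior, np.sqrt(N - 1), rtol=1e-3)
print(interior.min(), interior.max())   # both approximately sqrt(63)
```

The schedule indeed spends its time where the gap is smallest, keeping the local adiabaticity measure uniform in s.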
The optimal annealing schedule (87) shows the τ −2 error rate because its derivative
is non-vanishing at s = 0 and 1. It is easy to see from (87) that the simple replacement
of s with fm (s) fulfils the condition for the τ −2m error rate. We carried out numerical

Figure 5: The optimal annealing schedules for the database search problem (N = 64). The
solid line denotes the original optimal schedule (87) and the dashed lines are for the modified
schedules.





Figure 6: The annealing-time dependence of the residual energy for the database search
problem (N = 64) with the optimal annealing schedules described in Fig. 5. The solid lines
represent functions proportional to τ −2m (m = 1, 2, 3, 4).

simulations for N = 64 with such annealing schedules, f_{\rm opt}^{(m)}(s) \equiv f_{\rm opt}(f_m(s)), as plotted by dashed lines in Fig. 5. As shown in Fig. 6, the residual energy with f_{\rm opt}^{(m)}(s) is proportional to \tau^{-2m}. The characteristic time \tau_c for the \tau^{-2m} error rate to show up increases with m: since the modified optimal schedule f_{\rm opt}^{(m)}(s) has a steeper slope at s = 1/2 than f_{\rm opt}(s), a longer annealing time is necessary to satisfy the adiabaticity condition (86). Nevertheless, the difference in slopes of f_{\rm opt}^{(m)}(s) is only a factor of O(1), and therefore \tau_c still scales as \sqrt{N}. A significant reduction of errors has thus been achieved without compromising computational complexity apart from a numerical factor.

4.4 Imaginary-time Schrödinger Dynamics


So far, we have concentrated on QA following the real-time (RT) Schrödinger dynamics.
From the point of view of physics, it is natural that the time evolution of a quantum
system obeys the real-time Schrödinger equation. Since our goal is to find the solution
of optimization problems, however, we need not stick to physical reality. We therefore
investigate QA following the imaginary-time (IT) Schrödinger dynamics here to further
reduce errors.
The IT evolution tends to filter out the excited states. Thus, it is expected that QA
with the IT dynamics can find the optimal solution more efficiently than RT-QA. Stella
et al. [20] have investigated numerically the performance of IT-QA and conjectured that
(i) the IT error rate is not larger than in the RT, and that (ii) the asymptotic behavior of
the error rate for τ → ∞ is identical for IT-QA and RT-QA. We prove their conjectures
through the IT version of the adiabatic theorem.

4.4.1 Imaginary-time Schrödinger equation


The IT Schrödinger equation is obtained by the transformation t → −it in the time
derivative of the original RT Schrödinger equation:
-\frac{d}{dt}|\Psi(t)\rangle = H(t)\, |\Psi(t)\rangle.   (89)
The time dependence of the Hamiltonian does not change. If the Hamiltonian is time-
independent, we easily see that the excitation amplitude decreases exponentially relative
to the ground state,
|\Psi(t)\rangle = \sum_j c_j\, e^{-it\varepsilon_j}|j\rangle \;\longrightarrow\; \sum_j c_j\, e^{-t\varepsilon_j}|j\rangle = e^{-t\varepsilon_0} \sum_j c_j\, e^{-t(\varepsilon_j - \varepsilon_0)}|j\rangle.   (90)

However, it is not obvious that this feature survives in the time-dependent situation.
An important aspect of the IT Schrödinger equation is non-unitarity. The norm of
the wave function is not conserved. Thus, we consider the normalized state vector
|\psi(t)\rangle \equiv \frac{1}{\sqrt{\langle\Psi(t)|\Psi(t)\rangle}}\, |\Psi(t)\rangle.   (91)

The equation of motion for this normalized state vector is


-\frac{d}{dt}|\psi(t)\rangle = \left[H(t) - \langle H(t)\rangle\right]|\psi(t)\rangle,   (92)
where we defined the expectation value of the Hamiltonian

hH(t)i ≡ hψ(t)| H(t) |ψ(t)i . (93)

The above equation is not linear, but it is norm-conserving, which makes the asymptotic expansion easy. In terms of the dimensionless time s = t/\tau, the norm-conserving IT Schrödinger equation is written as
-\frac{d}{ds}\left|\tilde\psi(s)\right\rangle = \tau\left[\tilde H(s) - \langle\tilde H(s)\rangle\right]\left|\tilde\psi(s)\right\rangle.   (94)
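A minimal numerical sketch of the dynamics (92) for a time-independent Hamiltonian, with an illustrative 4x4 matrix (not a model from the text), showing the filtering property: the state relaxes to the ground state while the norm stays equal to one.

```python
import numpy as np

# Norm-conserving imaginary-time dynamics (92): -d|psi>/dt = [H - <H>]|psi>.
# H below is purely illustrative (nearly diagonal with one weak coupling).
H = np.diag([0.0, 1.0, 2.0, 3.0])
H[0, 1] = H[1, 0] = 0.2

w, v = np.linalg.eigh(H)
psi = np.ones(4) / 2.0           # normalized initial state
dt = 1e-3
for _ in range(20000):           # total imaginary time t = 20
    mean_H = psi @ H @ psi
    psi = psi - dt * (H @ psi - mean_H * psi)   # explicit Euler step of (92)
    psi /= np.linalg.norm(psi)   # curb the O(dt^2) norm drift of the step

overlap = abs(v[:, 0] @ psi)     # overlap with the exact ground state
print(round(overlap, 6))         # -> 1.0 (excited components filtered out)
```

Excited components are suppressed like e^{-t(eps_j - eps_0)}, as in (90), which is exactly the mechanism exploited by IT-QA below.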

4.4.2 Asymptotic expansion of the excitation probability


To prove the conjecture by Stella et al., we derive the asymptotic expansion of the excitation probability. The following Theorem provides us with the imaginary-time version of the adiabatic theorem.
Theorem 4.2. Under the same hypothesis as in Theorem 2.1, the state vector following the norm-conserving IT Schrödinger equation (94) has the asymptotic form in the limit of large \tau as

\left|\tilde\psi(s)\right\rangle = \sum_j c_j(s)\, |j(s)\rangle,   (95)

c_0(s) \approx 1 + O(\tau^{-2}),   (96)

c_{j\neq 0}(s) \approx \frac{A_j(s)}{\tau} + O(\tau^{-2}).   (97)
Proof. The norm-conserving IT Schrödinger equation (94) is rewritten as the equation of motion for c_j(s) as

\frac{dc_j}{ds} = \sum_{k\neq j} \frac{c_k(s)}{\varepsilon_j(s) - \varepsilon_k(s)}\, \langle j(s)|\frac{d\tilde H(s)}{ds}|k(s)\rangle - \tau\, c_j(s)\left[\varepsilon_j(s) - \sum_l \varepsilon_l(s)\, |c_l(s)|^2\right].   (98)

To remove the second term on the right-hand side, we define

\tilde c_j(s) \equiv \exp\left(\tau \int_0^s d\tilde s\left[\varepsilon_j(\tilde s) - \sum_l \varepsilon_l(\tilde s)\, |c_l(\tilde s)|^2\right]\right) c_j(s),   (99)

and obtain the equation of motion for \tilde c_j(s) as

\frac{d\tilde c_j}{ds} = \sum_{k\neq j} \tilde c_k(s)\, \frac{e^{\tau[\phi_j(s) - \phi_k(s)]}}{\varepsilon_j(s) - \varepsilon_k(s)}\, \langle j(s)|\frac{d\tilde H(s)}{ds}|k(s)\rangle,   (100)

where we defined \phi_j(s) \equiv \int_0^s ds'\, \varepsilon_j(s') for convenience.
Integration of this equation yields the integral equation for c̃j (s). It is useful to
introduce the following quantity,
\delta(s) \equiv \int_0^s d\tilde s \sum_{l\neq 0} \left[\varepsilon_l(\tilde s) - \varepsilon_0(\tilde s)\right] |c_l(\tilde s)|^2.   (101)

Since the norm of the wave function is conserved, \sum_l |c_l(s)|^2 = 1 and therefore

\sum_l \varepsilon_l(s)\, |c_l(s)|^2 = \varepsilon_0(s) + \sum_{l\neq 0}\left[\varepsilon_l(s) - \varepsilon_0(s)\right] |c_l(s)|^2.   (102)

Thus, the definition of \tilde c_j(s) is written as

\tilde c_j(s) = e^{-\tau\delta(s)}\, e^{\tau[\phi_j(s) - \phi_0(s)]}\, c_j(s).   (103)

Finally we obtain the integral equations for c_j(s):

c_0(s) = e^{\tau\delta(s)} + e^{\tau\delta(s)} \int_0^s d\tilde s\; e^{-\tau\delta(\tilde s)} \sum_{l\neq 0} c_l(\tilde s)\, \frac{\langle 0(\tilde s)|\frac{d\tilde H(\tilde s)}{d\tilde s}|l(\tilde s)\rangle}{\varepsilon_0(\tilde s) - \varepsilon_l(\tilde s)},   (104)

c_{j\neq 0}(s) = e^{\tau\delta(s)}\, e^{-\tau[\phi_j(s) - \phi_0(s)]} \int_0^s d\tilde s\; e^{-\tau\delta(\tilde s)}\, e^{\tau[\phi_j(\tilde s) - \phi_0(\tilde s)]} \sum_{k\neq j} c_k(\tilde s)\, \frac{\langle j(\tilde s)|\frac{d\tilde H(\tilde s)}{d\tilde s}|k(\tilde s)\rangle}{\varepsilon_j(\tilde s) - \varepsilon_k(\tilde s)},   (105)

where we used the initial conditions c_0(0) = 1 and c_{j\neq 0}(0) = 0.


The next step is the asymptotic expansion of these integral equations for large \tau. It is expected that c_0(s) = 1 and c_{j\neq 0}(s) = 0 for \tau \to \infty because of the following argument: Since the coefficient c_0(s) is less than unity, \delta(s) should be O(\tau^{-1}) at most and e^{\tau\delta(s)} = O(1). The second factor on the right-hand side of (105) is exponentially small in \tau because \phi_j(s) - \phi_0(s) is positive and an increasing function of s. Thus, c_{j\neq 0}(s) \to 0 and then c_0(s) \to 1 owing to the norm conservation law.
Therefore we estimate the next term of order τ −1 under the assumption that c0 (s) ≫
cj6=0 (s). Since δ(s) is proportional to the square of cj6=0 (s), we have eτ δ(s) ≈ 1. Thus,
the e±τ δ(s) factors can be ignored in the τ −1 term estimation of (105). Consequently,
evaluation of the integral equations yields
c_{j\neq 0}(s) \approx e^{-\tau[\phi_j(s) - \phi_0(s)]} \int_0^s d\tilde s\; e^{\tau[\phi_j(\tilde s) - \phi_0(\tilde s)]}\, \frac{\langle j(\tilde s)|\frac{d\tilde H(\tilde s)}{d\tilde s}|0(\tilde s)\rangle}{\Delta_j(\tilde s)} + O(\tau^{-2}).   (106)

The excitation amplitude is estimated by integration by parts as

c_{j\neq 0}(s) \approx \frac{1}{\tau}\left[A_j(s) - e^{-\tau[\phi_j(s) - \phi_0(s)]}\, A_j(0)\right] + O(\tau^{-2}),   (107)
where Aj (s) is defined by (12). The second term in the square brackets is vanishingly
small, which is a different point from the RT dynamics. From the above expression, we
find δ(s) = O(τ −2 ), that is eτ δ(s) ≈ 1 + O(τ −1 ). Therefore, we obtain (96) and (97),
which is consistent with the assumption c0 (s) ≫ cj6=0 (s).

Remark. The excitation probability at the end of a QA process is proportional to τ −2 in


the large-\tau limit:

\left|\langle j(1)|\tilde\psi(1)\rangle\right|^2 \approx \frac{1}{\tau^2}\, |A_j(1)|^2 + O(\tau^{-3}).   (108)
Its difference from the upper bound for the RT dynamics (64) is only in the absence
of Aj (0). In the IT dynamics, this term decreases exponentially because of the factor
e−τ (φj (s)−φ0 (s)) . This result proves the conjecture proposed by Stella et al. [20], that is,

\epsilon_{\rm IT}(\tau) \le \epsilon_{\rm RT}(\tau),   (109)

\epsilon_{\rm IT}(\tau) \approx \epsilon_{\rm RT}(\tau) \quad (\tau \to \infty).   (110)

Strictly speaking, the right-hand sides in the above equations denote the upper bound for
the error rate for RT-QA, not the error rate itself. In some systems, for example the two-level system, the error rate oscillates because A_j(0) and A_j(1) may cancel in (11), and becomes smaller than that of IT-QA at some \tau. However, QA for ordinary optimization
problems has different energy levels at initial and final times, and thus such a cancellation
seldom occurs.


Figure 7: The annealing schedules defined in (111). fsq1 (s) and fsq2 (s) have a vanishing slope
at the initial time s = 0 and the final time s = 1, respectively.

4.4.3 Numerical verification


We demonstrate a numerical verification of the above results by simulations of the IT-
and RT-Schrödinger equations. For this purpose, we consider the following annealing
schedules (Fig. 7):
f_{\rm sq1}(s) = s^2, \qquad f_{\rm sq2}(s) = s(2 - s).   (111)
The former has a zero slope at the initial time s = 0 and the latter at s = 1. Thus, the
Aj (0) and Aj (1) terms vanish with fsq1 (s) and fsq2 (s), respectively. Since the error rate
for IT-QA depends only on Aj (1), IT-QA with fsq2 (s) should show the τ −4 error rate,
while RT-QA with fsq2 (s) exhibits the τ −2 -law. On the other hand, RT-QA and IT-QA
with fsq1 (s) should have the same error rate for large τ . Figure 8 shows the residual
energy with two annealing schedules for the spin-glass model presented in Sec. 4.3.2,
which explicitly supports our results.

5 Convergence condition of QA – Quantum Monte Carlo evolution
So far, we have discussed QA with the Schrödinger dynamics. When we solve the Schrödinger equation on a classical computer, the computation time and memory increase exponentially with the system size. Therefore, some approximations are necessary to simulate QA processes for large-size problems. In most numerical studies, stochastic methods are used. In this section, we investigate two types of quantum Monte Carlo methods and prove their convergence theorems, following [40].

5.1 Inhomogeneous Markov chain


Since we prove the convergence of stochastic processes, it is useful to recall various definitions and theorems for inhomogeneous Markov processes [5]. We denote the space of discrete states by S and assume that the size of S is finite. A Monte Carlo step is characterized by the transition probability from state x (\in S) to state y (\in S) at time step




Figure 8: The annealing-time dependence of the residual energy for IT- and RT-QA with
annealing schedules fsq1 (s) and fsq2 (s). The system is the spin-glass model presented in
Sec. 4.3.2. The solid lines stand for functions proportional to τ −2 and τ −4 . The parameters
are h = 0.1 and Γ = 1.

t:

G(y, x; t) = \begin{cases} P(y, x)\, A(y, x; t) & (x \neq y) \\ 1 - \sum_{z \in S} P(z, x)\, A(z, x; t) & (x = y), \end{cases}   (112)
where P(y, x) and A(y, x; t) are called the generation probability and the acceptance probability, respectively. The former is the probability to generate the next candidate state y from the present state x. We assume that this probability does not depend on time and satisfies the following conditions:
∀x, y ∈ S : P (y, x) = P (x, y) ≥ 0, (113)
∀x ∈ S : P (x, x) = 0, (114)
\forall x \in S : \sum_{y \in S} P(y, x) = 1,   (115)

\forall x, y \in S,\ \exists n > 0,\ \exists z_1, \cdots, z_{n-1} \in S : \prod_{k=0}^{n-1} P(z_{k+1}, z_k) > 0, \quad z_0 = x,\ z_n = y.   (116)

The last condition represents irreducibility of S, that is, any state in S can be reached
from any other state in S.
We define Sx as the neighborhood of x, i.e., the set of states that can be reached by
a single step from x:
Sx = {y | y ∈ S, P (y, x) > 0}. (117)
The acceptance probability A(y, x; t) is the probability to accept the candidate y gener-
ated from state x. The matrix G(t), whose (y, x) component is given by (112), [G(t)]y,x =
G(y, x; t), is called the transition matrix.

Let P denote the set of probability distributions on S. We regard a probability dis-
tribution p (∈ P) as the column vector with the component [p]x = p(x). The probability
distribution at time t, started from an initial distribution p0 (∈ P) at time t0 , is written
as
p(t, t0 ) = Gt,t0 p0 ≡ G(t − 1)G(t − 2) · · · G(t0 )p0 . (118)
A Markov chain is called inhomogeneous when the transition probability depends
on time. In the following sections, we will prove that inhomogeneous Markov chains
associated with QA are ergodic under appropriate conditions. There are two kinds of
ergodicity, weak and strong. Weak ergodicity means that the probability distribution
becomes independent of the initial conditions after a sufficiently long time:
\forall t_0 \ge 0 : \lim_{t\to\infty} \sup\{\, \|p(t, t_0) - p'(t, t_0)\| \mid p_0, p'_0 \in \mathcal{P} \,\} = 0,   (119)

where p(t, t_0) and p'(t, t_0) are the probability distributions with different initial distributions p_0 and p'_0. The norm is defined by

\|p\| = \sum_{x \in S} |p(x)|.   (120)

Strong ergodicity is the property that the probability distribution converges to a unique
distribution irrespective of the initial state:
\exists r \in \mathcal{P},\ \forall t_0 \ge 0 : \lim_{t\to\infty} \sup\{\, \|p(t, t_0) - r\| \mid p_0 \in \mathcal{P} \,\} = 0.   (121)

The following two Theorems provide conditions for weak and strong ergodicity of an
inhomogeneous Markov chain [5]. For proofs see Appendix B.
Theorem 5.1 (Condition for weak ergodicity). An inhomogeneous Markov chain is weakly ergodic if and only if there exists a strictly increasing sequence of positive numbers \{t_i\}, (i = 0, 1, 2, \ldots), such that

\sum_{i=0}^{\infty}\left[1 - \alpha(G^{t_{i+1}, t_i})\right] \longrightarrow \infty,   (122)

where \alpha(G^{t_{i+1}, t_i}) is the coefficient of ergodicity defined by

\alpha(G^{t_{i+1}, t_i}) = 1 - \min\left\{ \sum_{z \in S} \min\{G(z, x),\, G(z, y)\} \;\middle|\; x, y \in S \right\}   (123)

with the notation G(z, x) = [G^{t_{i+1}, t_i}]_{z,x}.


The coefficient of ergodicity measures how strongly the transition probability depends on the initial state. If G(z, x) is independent of x, \alpha(G) is equal to zero.
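The definition (123) is easy to evaluate directly. A minimal sketch, with G stored as a column-stochastic matrix ([G]_{z,x} being the probability of moving from x to z):

```python
import numpy as np

def ergodicity_coefficient(G):
    """Coefficient of ergodicity (123): one minus the worst-case overlap
    between any two columns of the transition matrix G."""
    n = G.shape[1]
    overlap = min(np.minimum(G[:, x], G[:, y]).sum()
                  for x in range(n) for y in range(n))
    return 1.0 - overlap

# Identical columns: the chain forgets its starting state in one step.
G_uniform = np.full((3, 3), 1.0 / 3.0)
# Identity chain: it never mixes at all.
G_frozen = np.eye(3)
print(ergodicity_coefficient(G_uniform),   # -> approximately 0
      ergodicity_coefficient(G_frozen))    # -> 1
```

The two extreme cases bracket the behavior used in Theorem 5.1: weak ergodicity requires that \alpha stays bounded away from 1 often enough for the sum (122) to diverge.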
Theorem 5.2 (Condition for strong ergodicity). An inhomogeneous Markov chain is
strongly ergodic if the following three conditions hold:
1. the Markov chain is weakly ergodic,
2. for all t there exists a stationary state pt ∈ P such that pt = G(t)pt ,
3. p_t satisfies

\sum_{t=0}^{\infty} \|p_t - p_{t+1}\| < \infty.   (124)
Moreover, if p = \lim_{t\to\infty} p_t, then p is equal to the probability distribution r in (121).
We note that the existence of the limit is guaranteed by (124). This inequality implies
that the probability distribution pt (x) is a Cauchy sequence:
∀ε > 0, ∃t0 > 0, ∀t, t′ > t0 : |pt (x) − pt′ (x)| < ε. (125)

5.2 Path-integral Monte Carlo method
Let us first discuss convergence conditions for the implementation of quantum annealing
by the path-integral Monte Carlo (PIMC) method [24, 25]. The basic idea of PIMC
is to apply the Monte Carlo method to the classical system obtained from the original
quantum system by the path-integral formula. We first consider the example of ground
state search of the Ising spin system whose quantum fluctuations are introduced by adding
a transverse field. The total Hamiltonian is defined in (20). Although we only treat
the two-body interaction for simplicity in this section, the existence of arbitrary many-body interactions between the z components of the Pauli matrices, as well as a longitudinal random magnetic field \sum_i h_i \sigma_i^z, in addition to the above Hamiltonian, would not change the following argument.
In the path-integral method, the d-dimensional transverse-field Ising model (TFIM)
is mapped to a (d + 1)-dimensional classical Ising system so that the quantum system
can be simulated on the classical computer. In numerical simulations, the Suzuki-Trotter
formula [23, 24] is usually employed to express the partition function of the resulting
classical system,
 
Z(t) \approx \sum_{\{\sigma_i^{(k)}\}} \exp\left[ \frac{\beta}{M} \sum_{k=1}^{M} \sum_{\langle ij\rangle} J_{ij}\, \sigma_i^{(k)} \sigma_j^{(k)} + \gamma(t) \sum_{k=1}^{M} \sum_{i=1}^{N} \sigma_i^{(k)} \sigma_i^{(k+1)} \right],   (126)

where M is the length along the extra dimension (the Trotter number) and \sigma_i^{(k)} (= \pm 1) denotes a classical Ising spin at site i on the kth Trotter slice. The nearest-neighbour interaction between adjacent Trotter slices,
 
\gamma(t) = \frac{1}{2} \log \coth\left( \frac{\beta\Gamma(t)}{M} \right),   (127)

is ferromagnetic. This approximation (126) becomes exact in the limit M → ∞ for a fixed
β = 1/kB T . The magnitude of this interaction (127) increases with time t and tends to
infinity as t → ∞, reflecting the decrease of Γ(t). We fix M and β to arbitrarily large
values, which corresponds to the actual situation in numerical simulations. Therefore the
Theorem presented below does not directly guarantee the convergence of the system to
the true ground state, which is realized only after taking the limits M → ∞ and β → ∞.
We will rather show that the system converges to the thermal equilibrium represented
by the right-hand side of (126), which can be chosen arbitrarily close to the true ground
state by taking M and β large enough.
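The divergence of the inter-slice coupling (127) as the transverse field is switched off can be checked numerically. The following is a minimal sketch (the function name and the parameter values are our own, purely illustrative):

```python
import math

def trotter_coupling(field, beta, M):
    """Inter-slice coupling of Eq. (127): (1/2) log coth(beta*Gamma/M)."""
    x = beta * field / M
    return 0.5 * math.log(1.0 / math.tanh(x))

# As Gamma(t) decreases toward zero, gamma(t) grows without bound,
# progressively locking the Trotter replicas into identical configurations.
beta, M = 10.0, 32
fields = [2.0, 1.0, 0.1, 0.01]          # a decreasing annealing sequence
couplings = [trotter_coupling(g, beta, M) for g in fields]
assert all(c2 > c1 for c1, c2 in zip(couplings, couplings[1:]))
```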
With the above example of TFIM in mind, it will be convenient to treat a more general
expression than (126),
Z(t) = \sum_{x \in S} \exp\left( -\frac{F_0(x)}{T_0} - \frac{F_1(x)}{T_1(t)} \right).  (128)
Here F0 (x) is the cost function whose global minimum is the desired solution of the
combinatorial optimization problem. The temperature T0 is chosen to be sufficiently
small. The term F1 (x) derives from the kinetic energy, which is the transverse field in
the TFIM. Quantum fluctuations are tuned by the extra temperature factor T1 (t), which
decreases with time. The first term −F0 (x)/T0 corresponds to the interaction term in
the exponent of (126), and the second term −F1 (x)/T1 (t) generalizes the transverse-field
term in (126).
For the partition function (128), we define the acceptance probability of PIMC as
A(y, x; t) = g\left( \frac{q(y; t)}{q(x; t)} \right),  (129)

q(x; t) = \frac{1}{Z(t)} \exp\left( -\frac{F_0(x)}{T_0} - \frac{F_1(x)}{T_1(t)} \right).  (130)
This q(x; t) is the equilibrium Boltzmann factor at a given fixed T1 (t). The function g(u)
is the acceptance function, a monotone increasing function satisfying 0 ≤ g(u) ≤ 1 and
g(1/u) = g(u)/u for u ≥ 0. For instance, for the heat bath and the Metropolis methods,
we have
g(u) = \frac{u}{1+u},  (131)

g(u) = \min\{1, u\},  (132)
respectively. The conditions mentioned above for g(u) guarantee that q(x; t) satisfies
the detailed balance condition, G(y, x; t)q(x; t) = G(x, y; t)q(y; t). Thus, q(x; t) is the
stationary distribution of the homogeneous Markov chain defined by the transition matrix
G(t) with a fixed t. In other words, q(x; t) is the right eigenvector of G(t) with eigenvalue
1.
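The property g(1/u) = g(u)/u that underlies detailed balance can be verified directly for the two standard acceptance functions. A small illustrative check (not part of the original text):

```python
def g_heat_bath(u):
    """Heat-bath acceptance function, Eq. (131)."""
    return u / (1.0 + u)

def g_metropolis(u):
    """Metropolis acceptance function, Eq. (132)."""
    return min(1.0, u)

# Both are bounded by 1 and satisfy g(1/u) = g(u)/u, which is exactly
# what makes q(x;t) satisfy detailed balance under the transition matrix.
for g in (g_heat_bath, g_metropolis):
    for u in (0.3, 1.0, 2.5):
        assert 0.0 <= g(u) <= 1.0
        assert abs(g(1.0 / u) - g(u) / u) < 1e-12
```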

5.2.1 Convergence theorem for PIMC-QA


We first define a few quantities. The set of local maximum states of F1 is written as Sm ,

Sm = {x | x ∈ S, ∀y ∈ Sx , F1 (y) ≤ F1 (x)} . (133)

We denote by d(y, x) the minimum number of steps necessary to make a transition from x to y. Using this notation, we define R as the minimum over starting states x \in S \setminus S_m of the maximum number of steps needed to reach any other state y,

R = \min\left\{ \max\{ d(y, x) \mid y \in S \} \,:\, x \in S \setminus S_m \right\}.  (134)

Also, L0 and L1 stand for the maximum changes of F0 (x) and F1 (x), respectively, in a
single step,
L_0 = \max\left\{ |F_0(x) - F_0(y)| \,:\, P(y, x) > 0,\; x, y \in S \right\},  (135)

L_1 = \max\left\{ |F_1(x) - F_1(y)| \,:\, P(y, x) > 0,\; x, y \in S \right\}.  (136)

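On a small discrete state space these quantities can be computed by brute force. The sketch below uses 3-bit strings with single-bit-flip neighbourhoods and illustrative cost functions of our own choosing (not taken from the paper):

```python
from itertools import product

states = list(product([0, 1], repeat=3))

def neighbours(x):
    """Single-bit-flip neighbourhood S_x."""
    return [tuple(b ^ (i == j) for j, b in enumerate(x)) for i in range(3)]

def F0(x):          # cost function (illustrative): number of set bits
    return sum(x)

def F1(x):          # kinetic term (illustrative): number of domain walls
    return sum(a != b for a, b in zip(x, x[1:]))

def d(y, x):        # minimum number of single-flip steps = Hamming distance
    return sum(a != b for a, b in zip(y, x))

# S_m (Eq. 133): local maxima of F1; R (Eq. 134); L0, L1 (Eqs. 135-136).
S_m = {x for x in states if all(F1(y) <= F1(x) for y in neighbours(x))}
R = min(max(d(y, x) for y in states) for x in states if x not in S_m)
L0 = max(abs(F0(x) - F0(y)) for x in states for y in neighbours(x))
L1 = max(abs(F1(x) - F1(y)) for x in states for y in neighbours(x))
```

For this toy example one finds S_m = {(0,1,0), (1,0,1)}, R = 3, L_0 = 1 and L_1 = 2, so the schedule (137) below would read T_1(t) \geq 6/\log(t+2).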
Our main results are summarized in the following Theorem and Corollary.
Theorem 5.3 (Strong ergodicity of the system (128)). The inhomogeneous Markov chain
generated by (129) and (130) is strongly ergodic and converges to the equilibrium state
corresponding to the first term of the right-hand side of (130), exp(−F0 (x)/T0 ), if

T_1(t) \geq \frac{R L_1}{\log(t + 2)}.  (137)

Application of this Theorem to the PIMC implementation of QA represented by (126) immediately yields the following Corollary.
Corollary 5.4 (Strong ergodicity of QA-PIMC for TFIM). The inhomogeneous Markov
chain generated by the Boltzmann factor on the right-hand side of (126) is strongly ergodic
and converges to the equilibrium state corresponding to the first term on the right-hand
side of (126) if
\Gamma(t) \geq \frac{M}{\beta} \tanh^{-1}\left[ \frac{1}{(t+2)^{2/RL_1}} \right].  (138)
Remark. For sufficiently large t, the above inequality reduces to

\Gamma(t) \geq \frac{M}{\beta} (t+2)^{-2/RL_1}.  (139)
This result implies that a power decay of the transverse field is sufficient to guarantee
the convergence of quantum annealing of TFIM by the PIMC. Notice that R is of O(N 0 )
and L1 is of O(N ). Thus (139) is qualitatively similar to (21).
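The passage from (138) to (139) rests on \tanh^{-1}(x) \to x for small x. A quick numeric check (the parameter values are arbitrary):

```python
import math

def exact_bound(t, M, beta, RL1):
    """Right-hand side of Eq. (138)."""
    return (M / beta) * math.atanh((t + 2.0) ** (-2.0 / RL1))

def power_law_bound(t, M, beta, RL1):
    """Asymptotic power-law form, Eq. (139)."""
    return (M / beta) * (t + 2.0) ** (-2.0 / RL1)

M, beta, RL1 = 16, 8.0, 8
ratios = [exact_bound(t, M, beta, RL1) / power_law_bound(t, M, beta, RL1)
          for t in (10, 10**3, 10**6)]
# atanh(x)/x -> 1 from above as x -> 0, so the exact bound approaches
# the power law from above as t grows.
assert ratios[0] > ratios[1] > ratios[2] > 1.0
```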
To prove strong ergodicity it is necessary to prove weak ergodicity first. The following
Lemma is useful for this purpose.
Lemma 5.5 (Lower bound on the transition probability). The elements of the transition
matrix defined by (112), (129) and (130) have the following lower bound:
P(y, x) > 0 \;\Rightarrow\; \forall t > 0 : G(y, x; t) \geq w\, g(1) \exp\left( -\frac{L_0}{T_0} - \frac{L_1}{T_1(t)} \right),  (140)

and

\exists t_1 > 0,\; \forall x \in S \setminus S_m,\; \forall t \geq t_1 : G(x, x; t) \geq w\, g(1) \exp\left( -\frac{L_0}{T_0} - \frac{L_1}{T_1(t)} \right).  (141)
Here, w stands for the minimum non-vanishing value of P (y, x),

w = min {P (y, x) | P (y, x) > 0, x, y ∈ S} . (142)

Proof of Lemma 5.5. The first part of Lemma 5.5 is proved straightforwardly. Equa-
tion (140) follows directly from the definition of the transition probability and the prop-
erty of the acceptance function g. When q(y; t)/q(x; t) < 1, we have
G(y, x; t) \geq w\, g\left( \frac{q(x; t)}{q(y; t)} \right) \frac{q(y; t)}{q(x; t)} \geq w\, g(1) \exp\left( -\frac{L_0}{T_0} - \frac{L_1}{T_1(t)} \right).  (143)

On the other hand, if q(y; t)/q(x; t) \geq 1,

G(y, x; t) \geq w\, g(1) \geq w\, g(1) \exp\left( -\frac{L_0}{T_0} - \frac{L_1}{T_1(t)} \right),  (144)
where we used the fact that both L0 and L1 are positive.
Next, we prove (141). Since x is not a member of Sm , there exists a state y ∈ Sx such
that F1 (y) − F1 (x) > 0. For such a state y,
\lim_{t \to \infty} g\left( \exp\left[ -\frac{F_0(y) - F_0(x)}{T_0} - \frac{F_1(y) - F_1(x)}{T_1(t)} \right] \right) = 0,  (145)
because T1 (t) tends to zero as t → ∞ and 0 ≤ g(u) ≤ u. Thus, for all ε > 0, there exists
t1 > 0 such that
\forall t > t_1 : g\left( \exp\left[ -\frac{F_0(y) - F_0(x)}{T_0} - \frac{F_1(y) - F_1(x)}{T_1(t)} \right] \right) < \varepsilon.  (146)
We therefore have
\sum_{z \in S} P(z, x) A(z, x; t) = P(y, x) A(y, x; t) + \sum_{z \in S \setminus \{y\}} P(z, x) A(z, x; t)
 < P(y, x)\, \varepsilon + \sum_{z \in S \setminus \{y\}} P(z, x) = 1 - (1 - \varepsilon) P(y, x),  (147)

and consequently,

G(x, x; t) > (1 - \varepsilon) P(y, x) > 0.  (148)
Since the right-hand side of (141) can be arbitrarily small for sufficiently large t, we obtain
the second part of Lemma 5.5.

Proof of weak ergodicity implied in Theorem 5.3. Let us introduce the state

x^* = \arg\min\left\{ \max\{ d(y, x) \mid y \in S \} \,:\, x \in S \setminus S_m \right\}.  (149)

Comparison with the definition of R in (134) shows that the state x^* is reachable in at most R transitions from any state.
Now, consider the transition probability from an arbitrary state x to x^*. From the definitions of R and x^*, there exists at least one transition route within R steps:

x \equiv x_0 \neq x_1 \neq x_2 \neq \cdots \neq x_l = x_{l+1} = \cdots = x_R \equiv x^*.

Then Lemma 5.5 yields that, for sufficiently large t, the transition probability at each
time step has the following lower bound:
G(x_{i+1}, x_i; t - R + i) \geq w\, g(1) \exp\left( -\frac{L_0}{T_0} - \frac{L_1}{T_1(t - R + i)} \right).  (150)

Thus, by taking the product of (150) from i = 0 to i = R − 1, we have

G^{t, t-R}(x^*, x) \geq G(x^*, x_{R-1}; t-1)\, G(x_{R-1}, x_{R-2}; t-2) \cdots G(x_1, x; t-R)
 \geq \prod_{i=0}^{R-1} w\, g(1) \exp\left( -\frac{L_0}{T_0} - \frac{L_1}{T_1(t - R + i)} \right)
 \geq w^R g(1)^R \exp\left( -\frac{R L_0}{T_0} - \frac{R L_1}{T_1(t-1)} \right),  (151)

where we have used the monotonicity of T_1(t). Consequently, it is possible to find an integer k_0 \geq 0 such that, for all k > k_0, the coefficient of ergodicity satisfies

1 - \alpha\left( G^{kR, kR-R} \right) \geq w^R g(1)^R \exp\left( -\frac{R L_0}{T_0} - \frac{R L_1}{T_1(kR - 1)} \right),  (152)

where we eliminate the sum over z in (123) by replacing it with a single term for z = x∗ .
We now substitute the annealing schedule (137). Then weak ergodicity is immediately
proved from Theorem 5.1 because we obtain
\sum_{k=k_0}^{\infty} \left( 1 - \alpha(G^{kR, kR-R}) \right) \geq w^R g(1)^R \exp\left( -\frac{R L_0}{T_0} \right) \sum_{k=k_0}^{\infty} \frac{1}{kR + 1} \longrightarrow \infty.  (153)
Proof of Theorem 5.3. To prove strong ergodicity, we refer to Theorem 5.2. Condition 1 has already been proved. As has been mentioned, the Boltzmann factor (130) satisfies q(t) = G(t)q(t), which is condition 2. Thus the proof will be complete if we prove condition 3 by setting p_t = q(t). For this purpose, we first prove that q(x; t) is monotonic for large t:

\forall t \geq 0,\; \forall x \in S_1^{\min} : q(x; t+1) \geq q(x; t),  (154)

\exists t_1 > 0,\; \forall t \geq t_1,\; \forall x \in S \setminus S_1^{\min} : q(x; t+1) \leq q(x; t),  (155)

where S_1^{\min} denotes the set of global minimum states of F_1.
To prove this monotonicity, we use the following notations for simplicity:
A(x) = \exp\left( -\frac{F_0(x)}{T_0} \right), \qquad B = \sum_{x \in S_1^{\min}} A(x),  (156)

\Delta(x) = F_1(x) - F_1^{\min}.  (157)
If x \in S_1^{\min}, the Boltzmann distribution can be rewritten as

q(x; t) = \frac{A(x)}{B + \sum_{y \in S \setminus S_1^{\min}} \exp\left( -\Delta(y)/T_1(t) \right) A(y)}.  (158)
Since ∆(y) > 0 by definition, the denominator decreases with time. Thus, we obtain
(154).
To prove (155), we consider the derivative of q(x; t) with respect to T_1(t),

\frac{\partial q(x; t)}{\partial T_1(t)} = \frac{A(x) \left[ B \Delta(x) + \sum_{y \in S \setminus S_1^{\min}} \left( F_1(x) - F_1(y) \right) \exp\left( -\Delta(y)/T_1(t) \right) A(y) \right]}{T_1(t)^2 \exp\left( \Delta(x)/T_1(t) \right) \left[ B + \sum_{y \in S \setminus S_1^{\min}} \exp\left( -\Delta(y)/T_1(t) \right) A(y) \right]^2}.  (159)

Only the factor F_1(x) - F_1(y) in the numerator can be negative. However, the first term B\Delta(x) dominates the second one for sufficiently large t because \exp(-\Delta(y)/T_1(t)) tends to zero as T_1(t) \to 0. Thus there exists t_1 > 0 such that \partial q(x; t)/\partial T_1(t) > 0 for all t > t_1. Since T_1(t) is a decreasing function of t, we have (155).
Consequently, for all t > t_1, we have

\| q(t+1) - q(t) \| = \sum_{x \in S_1^{\min}} \left[ q(x; t+1) - q(x; t) \right] - \sum_{x \notin S_1^{\min}} \left[ q(x; t+1) - q(x; t) \right]
 = 2 \sum_{x \in S_1^{\min}} \left[ q(x; t+1) - q(x; t) \right],  (160)

where we used \| q(t) \| = \sum_{x \in S_1^{\min}} q(x; t) + \sum_{x \notin S_1^{\min}} q(x; t) = 1. We then obtain

\sum_{t=t_1}^{\infty} \| q(t+1) - q(t) \| = 2 \sum_{x \in S_1^{\min}} \left[ q(x; \infty) - q(x; t_1) \right] \leq 2 \| q(\infty) \| = 2.  (161)
Therefore q(t) satisfies condition 3:

\sum_{t=0}^{\infty} \| q(t+1) - q(t) \| = \sum_{t=0}^{t_1 - 1} \| q(t+1) - q(t) \| + \sum_{t=t_1}^{\infty} \| q(t+1) - q(t) \|
 \leq \sum_{t=0}^{t_1 - 1} \left[ \| q(t+1) \| + \| q(t) \| \right] + 2
 = 2 t_1 + 2 < \infty,  (162)

which completes the proof of strong ergodicity.

5.2.2 Generalized transition probability


In Theorem 5.3, the acceptance probability is defined by the conventional Boltzmann
form, (129) and (130). However, we have the freedom to choose any transition (accep-
tance) probability as long as it is useful to achieve our objective since our goal is not
to find finite-temperature equilibrium states but to identify the optimal state. There
have been attempts to accelerate the annealing schedule in SA by modifying the transi-
tion probability. In particular Nishimori and Inoue [34] have proved weak ergodicity of
the inhomogeneous Markov chain for classical simulated annealing using the probability
of Tsallis and Stariolo [44]. There the property of weak ergodicity was shown to hold
under the annealing schedule of temperature inversely proportional to a power of time
steps. This annealing rate is much faster than the log-inverse law for the conventional
Boltzmann factor.
A similar generalization is possible for QA-PIMC by using the following modified
acceptance probability
A(y, x; t) = g\left( u(y, x; t) \right),  (163)

u(y, x; t) = e^{-[F_0(y) - F_0(x)]/T_0} \left[ 1 + (q-1)\, \frac{F_1(y) - F_1(x)}{T_1(t)} \right]^{1/(1-q)},  (164)
where q is a real number. In the limit q → 1, this acceptance probability reduces to the
Boltzmann form. Similarly to the discussions leading to Theorem 5.3, we can prove that
the inhomogeneous Markov chain with this acceptance probability is weakly ergodic if
T_1(t) \geq \frac{b}{(t+2)^c}, \qquad 0 < c \leq \frac{q-1}{R},  (165)
where b is a positive constant. We have to restrict ourselves to the case q > 1 for a technical reason, as was the case previously [34]. We do not reproduce the proof here because it is quite straightforward to generalize the discussion for Theorem 5.3 in combination with the argument of [34]. The result (165) applied to the TFIM is that, if the annealing schedule asymptotically satisfies
\Gamma(t) \geq \frac{M}{\beta} \exp\left( -\frac{2(t+2)^c}{b} \right),  (166)
the inhomogeneous Markov chain is weakly ergodic. Notice that this annealing schedule
is faster than the power law of (139). We have been unable to prove strong ergodicity
because we could not identify the stationary distribution for a fixed T1 (t) in the present
case.
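The modified acceptance probability (163)-(164) can be sketched as follows; in the limit q → 1 the bracketed factor reduces to an exponential, recovering the Boltzmann form (the numeric values below are arbitrary):

```python
import math

def u_generalized(dF0, dF1, T0, T1, q):
    """Acceptance argument of Eq. (164) for changes dF0, dF1.
    Assumes the bracket 1 + (q-1)*dF1/T1 is positive."""
    return math.exp(-dF0 / T0) * (1.0 + (q - 1.0) * dF1 / T1) ** (1.0 / (1.0 - q))

def u_boltzmann(dF0, dF1, T0, T1):
    """The q -> 1 limit: exp(-dF0/T0 - dF1/T1)."""
    return math.exp(-dF0 / T0 - dF1 / T1)

dF0, dF1, T0, T1 = 0.4, 1.2, 0.5, 2.0
gaps = [abs(u_generalized(dF0, dF1, T0, T1, q) - u_boltzmann(dF0, dF1, T0, T1))
        for q in (1.5, 1.1, 1.001)]
assert gaps[0] > gaps[1] > gaps[2]   # converges to the Boltzmann form as q -> 1
```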
5.2.3 Continuous systems
In the above analyses we treated systems with discrete degrees of freedom. Theorem 5.3
does not apply directly to a continuous system. Nevertheless, by discretization of the
continuous space we obtain the following result.
Let us consider a system of N distinguishable particles in a continuous space of finite
volume with the Hamiltonian
H = \frac{1}{2m(t)} \sum_{i=1}^{N} p_i^2 + V(\{r_i\}).  (167)

The mass m(t) controls the magnitude of quantum fluctuations. The goal is to find the
minimum of the potential term, which is achieved by a gradual increase of m(t) to infinity
according to the prescription of QA. After discretization of the continuous space (which
is necessary anyway in any computer simulations with finite precision) and an application
of the Suzuki-Trotter formula, the equilibrium partition function acquires the following
expression in the representation that diagonalizes the spatial coordinates,

Z(t) \approx \mathrm{Tr}\, \exp\left( -\frac{\beta}{M} \sum_{k=1}^{M} V\left( \{ r_i^{(k)} \} \right) - \frac{M m(t)}{2\beta} \sum_{k=1}^{M} \sum_{i=1}^{N} \left( r_i^{(k+1)} - r_i^{(k)} \right)^2 \right)  (168)

in units with \hbar = 1. Theorem 5.3 is applicable to this system under the identification
of T_1(t) with m(t)^{-1}. We therefore conclude that a logarithmic increase of the mass
suffices to guarantee strong ergodicity of the potential-minimization problem under spatial
discretization.
The coefficient corresponding to the numerator of the right-hand side of (137) is estimated as

R L_1 \approx M^2 N L^2 / \beta,  (169)

where L denotes the maximum value of |r_i^{(k+1)} - r_i^{(k)}|. To obtain this coefficient, let us consider two extremes. One is that any state is reachable in a single step. By definition, R = 1 and L_1 \approx M^2 N L^2 / \beta, which yields (169). The other case is that only one particle can move to the nearest-neighbour point at one time step. With a (\ll L) denoting the lattice spacing, we have

L_1 \approx \frac{M}{2\beta} \left[ L^2 - (L - a)^2 \right] \approx \frac{M L a}{\beta}.  (170)

Since the number of steps needed to reach any configuration is estimated as R \approx N M L / a, we again obtain (169).

5.3 Green’s function Monte Carlo method


The path-integral Monte Carlo simulates only the equilibrium behavior at finite temper-
ature because its starting point is the equilibrium partition function. Moreover, it follows
an artificial time evolution of Monte Carlo dynamics, not the natural Schrödinger dy-
namics. An alternative approach to improve these points is the Green’s function Monte
Carlo (GFMC) method [25, 45, 46, 47]. The basic idea is to solve the imaginary-time
Schrödinger equation by stochastic processes. In the present section we derive sufficient
conditions for strong ergodicity in GFMC.
The evolution of states by the imaginary-time Schrödinger equation starting from an
initial state |ψ0 i is expressed as
|\psi(t)\rangle = \mathrm{T} \exp\left( -\int_0^t dt'\, H(t') \right) |\psi_0\rangle,  (171)
where T is the time-ordering operator. The right-hand side can be decomposed into a
product of small-time evolutions,

|\psi(t)\rangle = \lim_{n \to \infty} \hat{G}_0(t_{n-1})\, \hat{G}_0(t_{n-2}) \cdots \hat{G}_0(t_1)\, \hat{G}_0(t_0)\, |\psi_0\rangle,  (172)

where tk = k∆t, ∆t = t/n and Ĝ0 (t) = 1−∆t·H(t). In the GFMC, one approximates the
right-hand side of this equation by a product with large but finite n and replaces Ĝ0 (t)
with Ĝ1 (t) = 1 − ∆t(H(t) − ET ), where ET is called the reference energy to be taken
approximately close to the final ground-state energy. This subtraction of the reference
energy simply adjusts the standard of energy and changes nothing physically. However,
practically, this term is important to keep the matrix elements positive and to accelerate
convergence to the ground state as will be explained shortly.
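The filtering effect of repeated multiplication by Ĝ1 = 1 − Δt(H − E_T) can be illustrated deterministically on a two-level system. This is a sketch with an arbitrary fixed Hamiltonian H = −Γσ^x and our own choice of E_T; the stochastic walker machinery is deliberately omitted:

```python
import math

gamma, dt, E_T = 1.0, 0.05, -1.0   # E_T chosen near the ground-state energy -gamma
psi = [1.0, 0.0]                    # arbitrary non-negative initial state
for _ in range(2000):
    # (H psi) with H = -gamma * sigma_x, which swaps the two components
    h_psi = [-gamma * psi[1], -gamma * psi[0]]
    # one application of G1 = 1 - dt*(H - E_T)
    psi = [p - dt * (hp - E_T * p) for p, hp in zip(psi, h_psi)]
    norm = math.hypot(*psi)
    psi = [p / norm for p in psi]

# The ground state of -gamma*sigma_x is (1, 1)/sqrt(2); the excited
# component is suppressed by a factor < 1 at every step.
assert abs(psi[0] - 1 / math.sqrt(2)) < 1e-9
assert abs(psi[1] - 1 / math.sqrt(2)) < 1e-9
```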
To realize the process of (172) by a stochastic method, we rewrite this equation in a recursive form,

\psi_{k+1}(y) = \sum_x \hat{G}_1(y, x; t_k)\, \psi_k(x),  (173)

where ψk (x) = hx|ψk i and |xi denotes a basis state. The matrix element of Green’s
function is given by
Ĝ1 (y, x; t) = hy| 1 − ∆t [H(t) − ET ] |xi. (174)
Equation (173) looks similar to a Markov process but is significantly different in several ways. An important difference is that the Green's function is not normalized, \sum_y \hat{G}_1(y, x; t) \neq 1. In order to avoid this problem, one decomposes the Green's function into a normalized probability G_1 and a weight w:

Ĝ1 (y, x; t) = G1 (y, x; t)w(x; t), (175)

where

G_1(y, x; t) \equiv \frac{\hat{G}_1(y, x; t)}{\sum_y \hat{G}_1(y, x; t)}, \qquad w(x; t) \equiv \frac{\hat{G}_1(y, x; t)}{G_1(y, x; t)}.  (176)
Thus, using (173), the wave function at time t is written as

\psi_n(y) = \sum_{\{x_k\}} \delta_{y, x_n}\, w(x_{n-1}; t_{n-1})\, w(x_{n-2}; t_{n-2}) \cdots w(x_0; t_0)\, G_1(x_n, x_{n-1}; t_{n-1})\, G_1(x_{n-1}, x_{n-2}; t_{n-2}) \cdots G_1(x_1, x_0; t_0)\, \psi_0(x_0).  (177)

The algorithm of GFMC is based on this formula and is defined by a weighted random
walk in the following sense. One first prepares an arbitrary initial wave function ψ0 (x0 ),
all elements of which are non-negative. A random walker is generated, which sits initially
(t = t0 ) at the position x0 with a probability proportional to ψ0 (x0 ). Then the walker
moves to a new position x1 following the transition probability G1 (x1 , x0 ; t0 ). Thus
this probability should be chosen non-negative by choosing parameters appropriately
as described later. Simultaneously, the weight of this walker is updated by the rule
W_1 = w(x_0; t_0) W_0 with W_0 = 1. This stochastic process is repeated up to t = t_{n-1}. One actually prepares M independent walkers and lets them follow the above process.
Then, according to (177), the wave function ψn (y) is approximated by the distribution
of walkers at the final step weighted by Wn ,
\psi_n(y) = \lim_{M \to \infty} \frac{1}{M} \sum_{i=1}^{M} W_n^{(i)}\, \delta_{y, x_n^{(i)}},  (178)

where i is the index of a walker.
As noted above, G1 (y, x; t) should be non-negative, which is achieved by choosing
sufficiently small ∆t (i.e. sufficiently large n) and selecting ET within the instantaneous
spectrum of the Hamiltonian H(t). In particular, when ET is close to the instantaneous
ground-state energy of H(t) for large t (i.e. the final target energy), Ĝ1 (x, x; t) is close
to unity whereas other matrix components of Ĝ1 (t) are small. Thus, by choosing ET this
way, one can accelerate convergence of GFMC to the optimal state in the last steps of
the process.
If we apply this general framework to the TFIM with the σ z -diagonal basis, the matrix
elements of Green’s function are immediately calculated as
\hat{G}_1(y, x; t) = \begin{cases} 1 - \Delta t \left[ E_0(x) - E_T \right] & (x = y) \\ \Delta t\, \Gamma(t) & (x \text{ and } y \text{ differ by a single-spin flip}) \\ 0 & (\text{otherwise}), \end{cases}  (179)

where E_0(x) = \langle x |\, {-\sum_{ij} J_{ij} \sigma_i^z \sigma_j^z}\, | x \rangle. One should choose \Delta t and E_T such that 1 - \Delta t\, (E_0(x) - E_T) \geq 0 for all x. Since w(x; t) = \sum_y \hat{G}_1(y, x; t), the weight is given by
w(x; t) = 1 − ∆t [E0 (x) − ET ] + N ∆t Γ(t). (180)
One can decompose this transition probability into the generation probability and the
acceptance probability as in (112):
P(y, x) = \begin{cases} 1/N & (\text{single-spin flip}) \\ 0 & (\text{otherwise}) \end{cases}  (181)

A(y, x; t) = \frac{N \Delta t\, \Gamma(t)}{1 - \Delta t \left[ E_0(x) - E_T \right] + N \Delta t\, \Gamma(t)}.  (182)

We shall analyze the convergence properties of stochastic processes under these probabilities for the TFIM.
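As a consistency sketch for (179)-(182) on a tiny instance (the couplings and parameters are our own, purely illustrative), the weight (180) must equal the column sum of the Green's function:

```python
import itertools

N = 3
J = {(0, 1): 1.0, (1, 2): 1.0}     # illustrative ferromagnetic chain

def E0(x):
    """Diagonal energy <x| -sum_ij J_ij s_i s_j |x> with s = +/-1."""
    return -sum(Jij * x[i] * x[j] for (i, j), Jij in J.items())

def weight(x, gamma, dt, E_T):
    """Eq. (180): w(x;t) = 1 - dt*(E0(x) - E_T) + N*dt*Gamma(t)."""
    return 1.0 - dt * (E0(x) - E_T) + N * dt * gamma

def G1_hat(y, x, gamma, dt, E_T):
    """Matrix elements of Eq. (179)."""
    flips = sum(a != b for a, b in zip(x, y))
    if flips == 0:
        return 1.0 - dt * (E0(x) - E_T)
    if flips == 1:
        return dt * gamma
    return 0.0

gamma, dt, E_T = 0.5, 0.01, -2.0
states = list(itertools.product([-1, 1], repeat=N))
for x in states:
    col = sum(G1_hat(y, x, gamma, dt, E_T) for y in states)
    assert abs(col - weight(x, gamma, dt, E_T)) < 1e-12
```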

5.3.1 Convergence theorem for GFMC-QA


Similarly to the QA by PIMC, it is necessary to reduce the strength of quantum fluc-
tuations slowly enough in order to find the ground state in the GFMC. The following
Theorem provides a sufficient condition in this regard.
Theorem 5.6 (Strong ergodicity of QA-GFMC). The inhomogeneous Markov process
of the random walker for the QA-GFMC of TFIM, (112), (181) and (182), is strongly
ergodic if
\Gamma(t) \geq \frac{b}{(t+1)^c}, \qquad 0 < c \leq \frac{1}{N}.  (183)
The lower bound of the transition probability given in the following Lemma will be
used in the proof of Theorem 5.6.
Lemma 5.7. The transition probability of random walk in the GFMC defined by (112),
(181) and (182) has the lower bound:
P(y, x) > 0 \;\Rightarrow\; \forall t > 0 : G_1(y, x; t) \geq \frac{\Delta t\, \Gamma(t)}{1 - \Delta t\, (E_{\min} - E_T) + N \Delta t\, \Gamma(t)},  (184)

\exists t_1 > 0,\; \forall t > t_1 : G_1(x, x; t) \geq \frac{\Delta t\, \Gamma(t)}{1 - \Delta t\, (E_{\min} - E_T) + N \Delta t\, \Gamma(t)},  (185)

where E_{\min} is the minimum value of E_0(x),

E_{\min} = \min\{ E_0(x) \mid x \in S \}.  (186)
Proof of Lemma 5.7. The first part of Lemma 5.7 is trivial because the transition
probability is an increasing function with respect to E0 (x) when P (y, x) > 0 as seen in
(182). Next, we prove the second part of Lemma 5.7. According to (179) and (180),
G1 (x, x; t) is written as
G_1(x, x; t) = 1 - \frac{N \Delta t\, \Gamma(t)}{1 - \Delta t \left[ E_0(x) - E_T \right] + N \Delta t\, \Gamma(t)}.  (187)
Since the transverse field Γ(t) decreases to zero with time, the second term on the right-
hand side tends to zero as t → ∞. Thus, there exists t1 > 0 such that G1 (x, x; t) > 1 − ε
for ∀ε > 0 and ∀t > t1 . On the other hand, the right-hand side of (185) converges to zero
as t → ∞. We therefore have (185).

Proof of Theorem 5.6. We show that the condition (183) is sufficient to satisfy the
three conditions of Theorem 5.2.
1. From Lemma 5.7, we obtain a bound on the coefficient of ergodicity for sufficiently
large k as
1 - \alpha\left( G_1^{kN, kN-N} \right) \geq \left[ \frac{\Delta t\, \Gamma(kN - 1)}{1 - \Delta t\, (E_{\min} - E_T) + N \Delta t\, \Gamma(kN - 1)} \right]^N,  (188)
in the same manner as we derived (152), where we used R = N . Substituting the
annealing schedule (183), we can prove weak ergodicity from Theorem 5.1 because
\sum_{k=k_0}^{\infty} \left[ 1 - \alpha\left( G_1^{kN, kN-N} \right) \right] \geq \sum_{k=k_0}^{\infty} \frac{b'^N}{(kN)^{cN}},  (189)

which diverges when 0 < c \leq 1/N.


2. The stationary distribution of the instantaneous transition probability G1 (y, x; t)
is
q(x; t) \equiv \frac{w(x; t)}{\sum_{x \in S} w(x; t)} = \frac{1}{2^N} - \frac{\Delta t\, E_0(x)}{2^N \left[ 1 + \Delta t\, E_T + N \Delta t\, \Gamma(t) \right]},  (190)
which is derived as follows. The transition probability defined by (112), (181) and (182) is rewritten in terms of the weight (180) as

G_1(y, x; t) = \begin{cases} 1 - \dfrac{N \Delta t\, \Gamma(t)}{w(x; t)} & (x = y) \\ \dfrac{\Delta t\, \Gamma(t)}{w(x; t)} & (x \in S_y;\ \text{single-spin flip}) \\ 0 & (\text{otherwise}). \end{cases}  (191)

Thus, we have

\sum_{x \in S} G_1(y, x; t)\, q(x; t) = \left( 1 - \frac{N \Delta t\, \Gamma(t)}{w(y; t)} \right) \frac{w(y; t)}{A} + \sum_{x \in S_y} \frac{\Delta t\, \Gamma(t)}{w(x; t)} \cdot \frac{w(x; t)}{A}
 = q(y; t) - \frac{N \Delta t\, \Gamma(t)}{A} + \frac{\Delta t\, \Gamma(t)}{A} \sum_{x \in S_y} 1,  (192)

where A denotes the normalization factor,

A = \sum_{x \in S} w(x; t) = \mathrm{Tr} \left[ 1 - \Delta t \left( -\sum_{\langle ij \rangle} J_{ij} \sigma_i^z \sigma_j^z - E_T \right) + N \Delta t\, \Gamma(t) \right] = 2^N \left[ 1 + \Delta t\, E_T + N \Delta t\, \Gamma(t) \right],  (193)
where we used \mathrm{Tr} \sum_{\langle ij \rangle} J_{ij} \sigma_i^z \sigma_j^z = 0. Since the volume of S_y is N, (192) indicates that q(x; t) is the stationary distribution of G_1(y, x; t). The right-hand side of (190) is easily derived from the above equation.
3. Since the transverse field Γ(t) decreases monotonically with t, the above stationary
distribution q(x; t) is an increasing function of t if E0 (x) < 0 and is decreasing if E0 ≥ 0.
Consequently, using the same procedure as in (160), we have
\| q(t+1) - q(t) \| = 2 \sum_{E_0(x) < 0} \left[ q(x; t+1) - q(x; t) \right],  (194)

and thus

\sum_{t=0}^{\infty} \| q(t+1) - q(t) \| = 2 \sum_{E_0(x) < 0} \left[ q(x; \infty) - q(x; 0) \right] \leq 2.  (195)
Therefore the sum \sum_{t=0}^{\infty} \| q(t+1) - q(t) \| is finite, which completes the proof of condition 3.

Remark. Theorem 5.6 asserts convergence of the distribution of random walkers to the
equilibrium distribution (190) with Γ(t) → 0. This implies that the final distribution
is not delta-peaked at the ground state with minimum E0 (x) but is a relatively mild
function of this energy. The optimality of the solution is achieved after one takes the
weight factor w(x; t) into account: The repeated multiplication of weight factors as in
(177), in conjunction with the relatively mild distribution coming from the product of
G1 as mentioned above, leads to the asymptotically delta-peaked wave function ψn (y)
because w(x; t) is larger for smaller E0 (x) as seen in (180).

5.3.2 Alternative choice of Green’s function


So far we have used the Green’s function defined in (174), which is linear in the transverse
field, allowing single-spin flips only. It may be useful to consider another type of Green’s
function which accommodates multi-spin flips. Let us try the following form of Green’s
function,

\hat{G}_2(t) = \exp\left( \Delta t\, \Gamma(t) \sum_i \sigma_i^x \right) \exp\left( \Delta t \sum_{ij} J_{ij} \sigma_i^z \sigma_j^z \right),  (196)

which is equal to Ĝ0 (t) to the order ∆t. The matrix element of Ĝ2 (t) in the σ z -diagonal
basis is
\hat{G}_2(y, x; t) = \cosh^N(\Delta t\, \Gamma(t)) \tanh^\delta(\Delta t\, \Gamma(t))\, e^{-\Delta t\, E_0(x)},  (197)
where δ is the number of spins whose states differ between x and y. According to the scheme
of GFMC, we decompose Ĝ2 (y, x; t) into the normalized transition probability and the
weight:
G_2(y, x; t) = \left( \frac{\cosh(\Delta t\, \Gamma(t))}{e^{\Delta t\, \Gamma(t)}} \right)^N \tanh^\delta(\Delta t\, \Gamma(t)),  (198)

w_2(x; t) = e^{\Delta t\, N \Gamma(t)}\, e^{-\Delta t\, E_0(x)}.  (199)
It is remarkable that the transition probability G2 is independent of E0 (x), although it
depends on x through δ. Thus, the stationary distribution of random walk is uniform.
This property is lost if one interchanges the order of the two factors in (196).
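These properties — G_2 normalized and independent of E_0(x), and w_2 equal to the column sum of Ĝ_2 — can be verified numerically on a small example (the energies and parameters below are illustrative choices of our own):

```python
import itertools, math

N = 3
def E0(x):
    """Illustrative diagonal energy of a ferromagnetic chain."""
    return -sum(x[i] * x[i + 1] for i in range(N - 1))

dt, gamma = 0.05, 0.8
ch, th = math.cosh(dt * gamma), math.tanh(dt * gamma)

def G2_hat(y, x):
    """Eq. (197): cosh^N * tanh^delta * exp(-dt*E0(x))."""
    delta = sum(a != b for a, b in zip(x, y))
    return ch**N * th**delta * math.exp(-dt * E0(x))

def G2(y, x):
    """Eq. (198): normalized transition probability, independent of E0(x)."""
    delta = sum(a != b for a, b in zip(x, y))
    return (ch / math.exp(dt * gamma))**N * th**delta

def w2(x):
    """Eq. (199)."""
    return math.exp(dt * N * gamma - dt * E0(x))

states = list(itertools.product([-1, 1], repeat=N))
for x in states:
    assert abs(sum(G2(y, x) for y in states) - 1.0) < 1e-12       # normalized
    assert abs(sum(G2_hat(y, x) for y in states) - w2(x)) < 1e-10  # column sum
```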
The property of strong ergodicity can be shown to hold in this case as well:
Theorem 5.8 (Strong ergodicity of QA-GFMC 2). The inhomogeneous Markov chain
generated by (198) is strongly ergodic if
\Gamma(t) \geq -\frac{1}{2\Delta t} \log\left( 1 - 2b\,(t+1)^{-1/N} \right).  (200)
Remark. For sufficiently large t, the above annealing schedule is reduced to
\Gamma(t) \geq \frac{b}{\Delta t\,(t+1)^{1/N}}.  (201)
Since the proof is quite similar to the previous cases, we just outline the idea of the
proof. The transition probability G2 (y, x; t) becomes smallest when δ = N . Consequently,
the coefficient of ergodicity is estimated as

1 - \alpha\left( G_2^{t+1, t} \right) \geq \left( \frac{1 - e^{-2\Delta t\, \Gamma(t)}}{2} \right)^N.
We note that R is equal to 1 in the present case because any state is reachable from an arbitrary state in a single step. From Theorem 5.1, the condition
\left( \frac{1 - e^{-2\Delta t\, \Gamma(t)}}{2} \right)^N \geq \frac{b'}{t+1}  (202)
is sufficient for weak ergodicity. From this, one obtains (200). Since the stationary
distribution of G2 (y, x; t) is uniform as mentioned above, strong ergodicity readily follows
from Theorem 5.2.
Similarly to the case of PIMC, we can discuss the convergence condition of QA-
GFMC in systems with continuous degrees of freedom. The resulting sufficient condition
is a logarithmic increase of the mass as will be shown now. The operator Ĝ2 generated
by the Hamiltonian (167) is written as
\hat{G}_2(t) = \exp\left( -\frac{\Delta t}{2m(t)} \sum_{i=1}^{N} p_i^2 \right) e^{-\Delta t\, V(\{r_i\})}.  (203)

Thus, the Green’s function is calculated in a discretized space as


N
!
m(t) X ′ 2
Ĝ2 (y, x; t) ∝ exp − |r − ri | − ∆tV ({ri }) , (204)
2∆t i=1 i

where x and y represent {ri } and {r′i }, respectively. Summation over y, i.e. integration
over {r ′i }, yields the weight w(x; t), from which the transition probability is obtained:
w(x; t) \propto e^{-\Delta t\, V(\{r_i\})},  (205)

G_2(y, x; t) \propto \exp\left( -\frac{m(t)}{2\Delta t} \sum_{i=1}^{N} |r_i' - r_i|^2 \right).  (206)
The lower bound for the transition probability depends exponentially on the mass: G_2(y, x; t) \geq e^{-C m(t)}. Since 1 - \alpha(G_2^{t+1, t}) has the same lower bound, a sufficient condition for weak ergodicity is e^{-C m(t)} \geq (t+1)^{-1}, which is rewritten as
m(t) ≤ C −1 log(t + 1). (207)
The constant C is proportional to N L^2 / \Delta t, where L denotes the maximum value of |r' - r|. The derivation of C is similar to that of (169), because G_2(t) allows a transition to arbitrary states in one time step.
6 Summary and perspective
In this paper we have studied the mathematical foundation of quantum annealing, in
particular the convergence conditions and the reduction of residual errors. In Sec. 2, we
have seen that the adiabaticity condition of the quantum system representing quantum
annealing leads to the convergence condition, i.e. the condition for the system to reach
the solution of the classical optimization problem as t → ∞ following the real-time
Schrödinger equation. The result shows that an asymptotic power-law decrease of the transverse field suffices for convergence. This rate of decrease of the control parameter is faster than the logarithmic rate of temperature decrease required for convergence of SA. It nevertheless does not imply a qualitative reduction of computational complexity from classical SA to QA. Our method deals with a very generic system that represents most of
the interesting problems including worst instances of difficult problems, for which drastic
reduction of computational complexity is hard to expect.
Section 3 reviews the quantum-mechanical derivation of the convergence condition of
SA using the classical-quantum mapping without an extra dimension in the quantum sys-
tem. The adiabaticity condition for the quantum system has been shown to be equivalent
to the quasi-equilibrium condition for the classical system at finite temperature, repro-
ducing the well-known convergence condition of SA. The adiabaticity condition thus leads
to the convergence condition of both QA and SA. Since the studies of QAE often exploit the adiabaticity condition to derive the computational complexity of a given problem, adiabaticity may be seen as a versatile tool traversing QA, SA and QAE.
Section 4 is for the reduction of residual errors after finite-time quantum evolution of
real- and imaginary-time Schrödinger equations. This is a different point of view from
the usual context of QAE, where the issue is to reduce the evolution time (computational
complexity) with the residual error fixed to a given small value. It has been shown
that the residual error can become significantly smaller by an ingenious choice of the
time dependence of coefficients in the quantum Hamiltonian. This idea allows us to
reduce the residual error for any given QAE-based algorithm without compromising the
computational complexity apart from a possibly moderate numerical factor.
In Sec. 5 we have derived the convergence condition of QA implemented by Quantum
Monte Carlo simulations of path-integral and Green function methods. These approaches
bear important practical significance because only stochastic methods allow us to treat
practical large-size problems on the classical computer. A highly non-trivial result in
this section is that the convergence condition for the stochastic methods is essentially
the same power-law decrease of the transverse-field term as in the Schrödinger dynamics
of Sec. 2. This is surprising since the Monte Carlo (stochastic) dynamics is completely
different from the Schrödinger dynamics. Something deep may lie behind this coincidence
and it should be an interesting target of future studies.
The results presented and/or reviewed in this paper serve as the mathematical foun-
dation of QA. We have also stressed the similarity/equivalence of QA and QAE. Even
the classical SA can be viewed from the same framework of quantum adiabaticity as far as the convergence conditions are concerned. Since the studies of very generic properties
of QA seem to have been almost completed, fruitful future developments would lie in the
investigations of problems specific to each case of optimization task by analytical and
numerical methods.

Acknowledgement
We thank G. E. Santoro, E. Tosatti and S. Suzuki for discussions and G. Ortiz for
useful comments and correspondence on the details of the proofs of Theorems in Sec. 3.
Financial support by CREST (JST), DEX-SMI and JSPS is gratefully acknowledged.

A Hopf's inequality
In this Appendix, we prove the inequality (22). Although Hopf [32] originally proved this
inequality for positive linear integral operators, we concentrate on a square matrix for
simplicity.
Let M be a strictly positive m × m matrix. The strict positivity means that all the
elements of M are positive, namely, Mij > 0 for all i, j, which will be denoted by M > 0.
Similarly, M ≥ 0 means that Mij ≥ 0 for all i, j. We use the same notation for a vector,
that is, v > 0 means that all the elements vi are positive.
The product of the matrix M and an m-element column vector v is denoted as usual by M v, and its ith element is

(M v)_i = \sum_{j=1}^{m} M_{ij} v_j.  (208)

The strict positivity for M is equivalent to

M v > 0 \quad \text{if } v \geq 0,\; v \neq 0,  (209)

where 0 denotes the zero-vector. Of course, if v = 0, then M v = 0.


Any real-valued vector v and any strictly positive vector p > 0 satisfy

\min_i \frac{v_i}{p_i} \leq \min_i \frac{(M v)_i}{(M p)_i} \leq \max_i \frac{(M v)_i}{(M p)_i} \leq \max_i \frac{v_i}{p_i},  (210)

because

(M v)_i - \left( \min_j \frac{v_j}{p_j} \right) (M p)_i = \sum_{j=1}^{m} M_{ij} \left( v_j - \left( \min_k \frac{v_k}{p_k} \right) p_j \right) \geq 0,  (211)

\left( \max_j \frac{v_j}{p_j} \right) (M p)_i - (M v)_i = \sum_{j=1}^{m} M_{ij} \left( \left( \max_k \frac{v_k}{p_k} \right) p_j - v_j \right) \geq 0.  (212)

The above inequality implies that the difference between maximum and minimum of
(M v)i /(M p)i is smaller than that of vi /pi . Following [32], we use the notation,
\mathop{\mathrm{osc}}_i \frac{v_i}{p_i} \equiv \max_i \frac{v_i}{p_i} - \min_i \frac{v_i}{p_i},  (213)

which is called the oscillation. For a complex-valued vector, we define

\mathop{\mathrm{osc}}_i v_i = \sup_{|\eta|=1} \mathop{\mathrm{osc}}_i \mathrm{Re}(\eta v_i).  (214)

It is easy to derive, for any complex c,

\mathop{\mathrm{osc}}_i (c v_i) = |c| \mathop{\mathrm{osc}}_i v_i.  (215)

We can also easily prove that, if \mathop{\mathrm{osc}}_i v_i = 0, then v_i does not depend on i.


We suppose that the ratio of matrix elements is bounded,

\frac{M_{ik}}{M_{jk}} \leq \kappa \qquad \text{for all } i, j, k.  (216)
This assumption can be rewritten in the product form as

\frac{(M v)_i}{(M v)_j} \leq \kappa, \qquad v \geq 0,\; v \neq 0,  (217)

for all i, j and such v. The following Theorem states that the inequality (210) is sharpened
under the above additional assumption (217).
Theorem A.1. If M satisfies the conditions (209) and (217), for any p > 0 and any
complex-valued v,
osc_i ((M v)_i/(M p)_i) ≤ [(κ−1)/(κ+1)] osc_i (v_i/p_i).    (218)
Proof. We first consider a real-valued vector v. For fixed i, j and fixed p > 0, we define X_k by

(M v)_i/(M p)_i − (M v)_j/(M p)_j = Σ_{k=1}^{m} X_k v_k.    (219)

We do not need to know the exact form of X_k = X_k(i, j, p). When v = ap, the left-hand side of the above equation vanishes, which implies Σ_k X_k p_k = 0. Thus, we have

(M v)_i/(M p)_i − (M v)_j/(M p)_j = Σ_{k=1}^{m} X_k (v_k − a p_k).    (220)

Now we choose

a = min_i (v_i/p_i),    b = max_i (v_i/p_i).    (221)
Since v_k − a p_k = (b − a) p_k − (b p_k − v_k), the quantity v_k − a p_k takes its minimum 0 at v_k = a p_k and its maximum (b − a) p_k at v_k = b p_k. Therefore, the right-hand side of (220) with p given attains its maximum for

v = a p^− + b p^+ = a p + (b − a) p^+,    (222)

where we defined

p^−_i = p_i if X_i ≤ 0 and 0 if X_i > 0,    p^+_i = 0 if X_i ≤ 0 and p_i if X_i > 0.    (223)

Consequently, we have

(M v)_i/(M p)_i − (M v)_j/(M p)_j ≤ [ (M p^+)_i/(M p)_i − (M p^+)_j/(M p)_j ] (b − a).    (224)

Since, by assumptions, M > 0 and p > 0, we have

M p^− ≥ 0,    M p^+ ≥ 0,    M p = M p^− + M p^+ > 0.    (225)

Moreover, M p^− > 0 if p^− ≠ 0 and M p^+ > 0 if p^+ ≠ 0. If either p^− = 0 or p^+ = 0, the expression in the square brackets of (224) vanishes because p^+ is then equal to either 0 or p. Thus, we may assume that both M p^− > 0 and M p^+ > 0. Therefore the expression in inequality (224) is rewritten as

(M p^+)_i/(M p)_i − (M p^+)_j/(M p)_j = 1/(1+t) − 1/(1+t′),    t ≡ (M p^−)_i/(M p^+)_i,    t′ ≡ (M p^−)_j/(M p^+)_j.    (226)

From the assumption (217) we have (M p^−)_j ≤ κ (M p^−)_i and (M p^+)_i ≤ κ (M p^+)_j, so that t′ ≤ κ² t, which yields
(M p^+)_i/(M p)_i − (M p^+)_j/(M p)_j ≤ 1/(1+t) − 1/(1+κ²t).    (227)
For t > 0, the right-hand side of the above inequality takes its maximum value (κ−1)/(κ+1) at t = κ^{−1}. Finally, we obtain

(M v)_i/(M p)_i − (M v)_j/(M p)_j ≤ [(κ−1)/(κ+1)] osc_i (v_i/p_i)    (228)
for any i, j. Hence the same bound holds for the maximum of the left-hand side over i and j, which yields (218).
For a complex-valued vector v, we replace v_i by Re(η v_i). Since M Re(ηv) = Re(η M v), the same argument as in the real-vector case yields

osc_i Re(η (M v)_i/(M p)_i) ≤ [(κ−1)/(κ+1)] osc_i Re(η v_i/p_i).    (229)
Taking the sup with respect to η, |η| = 1, on both sides, we obtain (218).
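As a quick numerical illustration (our own sketch, not part of the original proof), the contraction (218) can be checked for a random strictly positive matrix; the function name `osc` and the NumPy-based setup below are our own choices.

```python
import numpy as np

def osc(v, p):
    """Oscillation (213) of v relative to p > 0: max_i v_i/p_i - min_i v_i/p_i."""
    r = v / p
    return r.max() - r.min()

rng = np.random.default_rng(0)
M = rng.uniform(1.0, 3.0, size=(5, 5))           # strictly positive matrix, (209) holds
kappa = (M.max(axis=0) / M.min(axis=0)).max()    # smallest kappa satisfying (216)
contraction = (kappa - 1.0) / (kappa + 1.0)

p = rng.uniform(0.5, 2.0, size=5)   # strictly positive reference vector
v = rng.normal(size=5)              # arbitrary real vector

lhs = osc(M @ v, M @ p)
rhs = contraction * osc(v, p)
assert lhs <= rhs + 1e-12           # Hopf's inequality (218)
```

Any strictly positive matrix works here; the oscillation of Mv relative to Mp is always contracted by at least the factor (κ−1)/(κ+1).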

We apply this Theorem to the eigenvalue problem

M v = λv.    (230)

The Perron-Frobenius theorem states that a non-negative square matrix, M ≥ 0, has a real eigenvalue λ_0 satisfying |λ| ≤ λ_0 for any other eigenvalue λ. This result is sharpened for a strictly positive matrix, M > 0, in the following Theorems.
Theorem A.2. Under the hypotheses (209) and (217), the eigenvalue equation (230) has
a positive solution λ = λ_0 > 0, v = q > 0. Moreover, for any vector p (p ≥ 0, p ≠ 0), the sequence

q_n = M^n p / (M^n p)_k    (231)

with k fixed converges to such a q.
Theorem A.3. Under the same hypotheses, (230) has no other non-negative solutions
than λ = λ0 , v = cq. For λ = λ0 , (230) has no other solutions than v = cq.
Theorem A.4. Under the same hypotheses, any (complex) eigenvalue λ ≠ λ_0 of (230) satisfies

|λ| ≤ [(κ−1)/(κ+1)] λ_0.    (232)
Remark. We note that the factor (κ−1)/(κ+1) is the best possible if there is no further condition. For example,

M = [ κ 1 ; 1 κ ],    κ > 0,    (233)

has eigenvalues λ_0 = κ + 1 and λ = κ − 1.
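The Remark's 2×2 example and the power iteration of Theorem A.2 are easy to verify numerically. The following sketch is our own (not from the paper); it takes κ = 3, for which λ_0 = 4 and λ = 2, so that the bound (232) is saturated, and iterates (231) with k = 0.

```python
import numpy as np

kappa = 3.0
M = np.array([[kappa, 1.0], [1.0, kappa]])

# Eigenvalues are kappa+1 and kappa-1; their ratio saturates the bound (232).
eigvals = np.sort(np.abs(np.linalg.eigvals(M)))[::-1]
lam0, lam1 = eigvals

# Power iteration (231): q_n = M^n p / (M^n p)_k converges to the Perron vector.
p = np.array([1.0, 0.0])
q = p.copy()
for _ in range(60):
    q = M @ q
    q = q / q[0]          # normalize by the k = 0 component
# The Perron vector of this symmetric M is proportional to (1, 1).
assert np.allclose(q, [1.0, 1.0])
```

The convergence rate of the iteration is governed by |λ|/λ_0 = (κ−1)/(κ+1) = 1/2 here, so 60 steps is far more than enough.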

Proof of Theorem A.2. Let us consider two vectors p, p̄ which are non-negative and
unequal to 0, and define
pn+1 = M pn , p̄n+1 = M p̄n , p0 = p, p̄0 = p̄. (234)
From the hypothesis (209), both pn and p̄n are strictly positive for n > 0. We find by
repeated applications of Theorem A.1 that, for n > 1,
osc_i (p̄_{n,i}/p_{n,i}) ≤ [(κ−1)/(κ+1)]^{n−1} osc_i (p̄_{1,i}/p_{1,i}),    (235)

where we used the notation pn,i = (pn )i . Consequently, there exists a finite constant
λ > 0, such that
p̄_{n,i}/p_{n,i} → λ    (n → ∞)    (236)
for every i. We normalize the vectors pn , p̄n as
q_n = p_n/p_{n,k},    q̄_n = p̄_n/p̄_{n,k},    (237)

with k fixed. The hypothesis (217) implies that

κ^{−1} ≤ q_{n,i} ≤ κ,    κ^{−1} ≤ q̄_{n,i} ≤ κ.    (238)

Thus, we find that

|q̄_{n,i} − q_{n,i}| = q_{n,i} (p_{n,k}/p̄_{n,k}) | p̄_{n,i}/p_{n,i} − p̄_{n,k}/p_{n,k} | ≤ κ (p_{n,k}/p̄_{n,k}) osc_i (p̄_{n,i}/p_{n,i}).    (239)
Now we specialize to the case that p̄ = M p = p1 , namely,

p̄n = M pn = pn+1 , q̄ n = q n+1 . (240)

Using (235) and (236), we estimate (239) for qn+1,i − qn,i , which implies that the sequence
q n converges to a limit vector q. Because of (238), we have q > 0. Now (236) reads

p_{n+1,i}/p_{n,i} = (M p_n)_i/(p_n)_i = (M q_n)_i/(q_n)_i → λ_0.    (241)
Consequently, M q = λ0 q. For any other initial vector p̄, the sequence q̄ n converges to the
same limit as q n because of (235), (236) and (239). Theorem A.2 is thereby proved.

Proof of Theorem A.3. We assume that v ≥ 0, v ≠ 0 is a solution of the eigenvalue equation (230). Since the hypothesis (209) implies M v > 0, we have λ > 0 and v > 0. We use this v as an initial vector p in Theorem A.2 and apply the last part of this Theorem to

M^n v / (M^n v)_k = λ^n v / (λ^n v_k) = v / v_k.    (242)
Hence, the limit q is equal to v/vk , that is, v = cq, and λ = λ0 . Therefore the first part
of Theorem A.3 is proved.
Next, we take λ0 > 0 and q > 0 from Theorem A.2 and consider a solution of
M v = λv. The application of Theorem A.1 to q and v yields
(|λ|/λ_0) osc_i (v_i/q_i) = osc_i (λ v_i/(λ_0 q_i)) = osc_i ((M v)_i/(M q)_i) ≤ [(κ−1)/(κ+1)] osc_i (v_i/q_i),    (243)

where we used (215). If λ = λ_0, the above inequality implies that osc_i (v_i/q_i) = 0, that is, v = cq, which provides the second part of Theorem A.3.

Proof of Theorem A.4. We consider (243). If λ ≠ λ_0 and v ≠ 0, v cannot be equal to cq. Therefore osc_i (v_i/q_i) > 0, and then (243) yields (232).

B Conditions for ergodicity


In this Appendix, we prove Theorems 5.1 and 5.2 which provide conditions for weak and
strong ergodicity of an inhomogeneous Markov chain [5].

B.1 Coefficient of ergodicity
Let us recall the definition of the coefficient of ergodicity,

α(G) = 1 − min_{x,y∈S} Σ_{z∈S} min{G(z, x), G(z, y)}.    (244)

First, we prove that this coefficient can be rewritten as

α(G) = (1/2) max_{x,y∈S} Σ_{z∈S} |G(z, x) − G(z, y)|.    (245)

Proof of (245). For fixed x, y ∈ S, we define two subsets of S by

S_G^+ = {z ∈ S | G(z, x) − G(z, y) > 0},
S_G^− = {z ∈ S | G(z, x) − G(z, y) ≤ 0}.    (246)
Since the transition matrix satisfies Σ_{y∈S} G(y, x) = 1, we have

Σ_{z∈S_G^+} [G(z, x) − G(z, y)] = [1 − Σ_{z∈S_G^−} G(z, x)] − [1 − Σ_{z∈S_G^−} G(z, y)]
    = − Σ_{z∈S_G^−} [G(z, x) − G(z, y)].    (247)

Thus, we find

(1/2) Σ_{z∈S} |G(z, x) − G(z, y)| = Σ_{z∈S_G^+} [G(z, x) − G(z, y)]
    = Σ_{z∈S} max{0, G(z, x) − G(z, y)}
    = Σ_{z∈S} [G(z, x) − min{G(z, x), G(z, y)}]
    = 1 − Σ_{z∈S} min{G(z, x), G(z, y)},    (248)

for any x, y. Hence, taking the max with respect to x, y on both sides, we obtain (245).
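The equivalence of (244) and (245) can be checked directly. The sketch below is our own construction (not from the paper); it uses column-stochastic matrices, i.e. Σ_z G(z, x) = 1 as in the text, and the helper names `alpha_min` and `alpha_abs` are our own.

```python
import numpy as np

def alpha_min(G):
    # Definition (244): 1 - min over column pairs of sum_z min(G[z,x], G[z,y]).
    m = G.shape[1]
    return 1.0 - min(np.minimum(G[:, x], G[:, y]).sum()
                     for x in range(m) for y in range(m))

def alpha_abs(G):
    # Equivalent form (245): (1/2) max over column pairs of sum_z |G[z,x] - G[z,y]|.
    m = G.shape[1]
    return 0.5 * max(np.abs(G[:, x] - G[:, y]).sum()
                     for x in range(m) for y in range(m))

rng = np.random.default_rng(1)
A = rng.uniform(size=(4, 4))
G = A / A.sum(axis=0)        # columns sum to 1: a transition matrix G(z, x)
assert abs(alpha_min(G) - alpha_abs(G)) < 1e-12
```

Both formulas give the same value for any transition matrix, as the proof above shows term by term.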

To derive the conditions for weak and strong ergodicity, the following Lemmas are
useful.
Lemma B.1. Let G be a transition matrix. Then the coefficient of ergodicity satisfies

0 ≤ α(G) ≤ 1. (249)

Lemma B.2. Let G and H be transition matrices on S. Then the coefficient of ergodicity
satisfies
α(GH) ≤ α(G)α(H). (250)
Lemma B.3. Let G be a transition matrix and H be a square matrix on S such that

Σ_{z∈S} H(z, x) = 0    (251)

for any x ∈ S. Then we have
‖GH‖ ≤ α(G) ‖H‖,    (252)

where the norm of a square matrix is defined by

‖A‖ ≡ max_{x∈S} Σ_{z∈S} |A(z, x)|.    (253)

Proof of Lemma B.1. The definition of α(G) implies α(G) ≤ 1 because G(y, x) ≥ 0.
From (245), α(G) ≥ 0 is straightforward.

Proof of Lemma B.2. Let us consider a transition matrix G, a column vector a such that Σ_{z∈S} a(z) = 0, and their product b = Ga. We note that the vector b satisfies Σ_{z∈S} b(z) = 0 because

Σ_{z∈S} b(z) = Σ_{z∈S} Σ_{y∈S} G(z, y) a(y) = Σ_{y∈S} a(y) [Σ_{z∈S} G(z, y)] = Σ_{y∈S} a(y) = 0.    (254)

We define subsets of S by

S_a^+ = {z ∈ S | a(z) > 0},    S_a^− = {z ∈ S | a(z) ≤ 0},
S_b^+ = {z ∈ S | b(z) > 0},    S_b^− = {z ∈ S | b(z) ≤ 0}.    (255)
Since Σ_{z∈S} a(z) = Σ_{z∈S} b(z) = 0, we find

Σ_{z∈S} |a(z)| = Σ_{z∈S_a^+} a(z) − Σ_{z∈S_a^−} a(z) = 2 Σ_{z∈S_a^+} a(z) = −2 Σ_{z∈S_a^−} a(z),    (256)
Σ_{z∈S} |b(z)| = 2 Σ_{z∈S_b^+} b(z) = −2 Σ_{z∈S_b^−} b(z).    (257)

Therefore, we obtain

Σ_{z∈S} |b(z)| = 2 Σ_{z∈S_b^+} Σ_{u∈S} G(z, u) a(u)
    = 2 Σ_{u∈S_a^+} [Σ_{z∈S_b^+} G(z, u)] a(u) + 2 Σ_{u∈S_a^−} [Σ_{z∈S_b^+} G(z, u)] a(u)
    ≤ 2 max_{v∈S} {Σ_{z∈S_b^+} G(z, v)} Σ_{u∈S_a^+} a(u) + 2 min_{w∈S} {Σ_{z∈S_b^+} G(z, w)} Σ_{u∈S_a^−} a(u)
    = max_{v,w∈S} {Σ_{z∈S_b^+} [G(z, v) − G(z, w)]} Σ_{u∈S} |a(u)|
    ≤ max_{v,w∈S} {Σ_{z∈S} max{0, G(z, v) − G(z, w)}} Σ_{u∈S} |a(u)|
    = (1/2) max_{v,w∈S} {Σ_{z∈S} |G(z, v) − G(z, w)|} Σ_{u∈S} |a(u)|
    = α(G) Σ_{u∈S} |a(u)|,    (258)
u∈S

where we used (248) and (245).
Next, we consider transition matrices G, H and F = GH. We take a(z) = H(z, x) − H(z, y), and then (258) is rewritten as

Σ_{z∈S} |F(z, x) − F(z, y)| ≤ α(G) Σ_{u∈S} |H(u, x) − H(u, y)|,    (259)

for any x, y. Hence this inequality holds for the max of both sides with respect to x, y,
which yields Lemma B.2.

Proof of Lemma B.3. Let us consider F = GH. We can take a(z) = H(z, x) in (258) because of the assumption Σ_{y∈S} H(y, x) = 0. Thus we have

Σ_{z∈S} |F(z, x)| ≤ α(G) Σ_{u∈S} |H(u, x)|,    (260)

for any x. Hence this inequality holds for the max of both sides with respect to x, which
provides Lemma B.3.
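Lemmas B.1-B.3 are likewise easy to test numerically. In the sketch below (our own, not from the paper), `colnorm` implements the norm (253), and the zero-column-sum matrix required by Lemma B.3 is built as a difference of two transition matrices.

```python
import numpy as np

def alpha(G):
    # Coefficient of ergodicity in the form (245).
    m = G.shape[1]
    return 0.5 * max(np.abs(G[:, x] - G[:, y]).sum()
                     for x in range(m) for y in range(m))

def colnorm(A):
    # Norm (253): max over columns x of sum_z |A(z, x)|.
    return np.abs(A).sum(axis=0).max()

rng = np.random.default_rng(2)
G = rng.uniform(size=(4, 4)); G /= G.sum(axis=0)   # transition matrices:
H = rng.uniform(size=(4, 4)); H /= H.sum(axis=0)   # columns sum to 1

assert 0.0 <= alpha(G) <= 1.0                          # Lemma B.1
assert alpha(G @ H) <= alpha(G) * alpha(H) + 1e-12     # Lemma B.2

H0 = H - G                                             # columns sum to zero, (251)
assert colnorm(G @ H0) <= alpha(G) * colnorm(H0) + 1e-12   # Lemma B.3
```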

B.2 Conditions for weak ergodicity


The following Theorem provides the reason why α(G) is called the coefficient of ergodicity.
Theorem B.4. An inhomogeneous Markov chain is weakly ergodic if and only if the
transition matrix satisfies
lim_{t→∞} α(G^{t,s}) = 0    (261)

for any s > 0.

Proof. We assume that the inhomogeneous Markov chain generated by G(t) is weakly
ergodic. For fixed x, y ∈ S, we define probability distributions by
p_x(z) = 1 if z = x and 0 otherwise,    p_y(z) = 1 if z = y and 0 otherwise.    (262)

Since p_x(t, s; z) = Σ_{u∈S} G^{t,s}(z, u) p_x(u) = G^{t,s}(z, x) and p_y(t, s; z) = G^{t,s}(z, y), we have

Σ_{z∈S} |G^{t,s}(z, x) − G^{t,s}(z, y)| = Σ_{z∈S} |p_x(t, s; z) − p_y(t, s; z)|
    ≤ sup{‖p(t, s) − p′(t, s)‖ | p_0, p′_0 ∈ P}.    (263)

Taking the max with respect to x, y on the left-hand side, we obtain

2 α(G^{t,s}) ≤ sup{‖p(t, s) − p′(t, s)‖ | p_0, p′_0 ∈ P}.    (264)

Therefore the definition of weak ergodicity (119) yields (261).


Conversely, we assume (261). For fixed p_0, q_0 ∈ P, we define the transition matrices

H = (p_0, q_0, · · · , q_0),    (265)
F = G^{t,s} H = (p(t, s), q(t, s), · · · , q(t, s)),    (266)

where p(t, s) = G^{t,s} p_0 and q(t, s) = G^{t,s} q_0. From (245), the coefficient of ergodicity for F is rewritten as

α(F) = (1/2) Σ_{z∈S} |p(t, s; z) − q(t, s; z)| = (1/2) ‖p(t, s) − q(t, s)‖.    (267)

Thus Lemmas B.1 and B.2 yield

‖p(t, s) − q(t, s)‖ ≤ 2 α(G^{t,s}) α(H) ≤ 2 α(G^{t,s}).    (268)

Taking the sup with respect to p_0, q_0 ∈ P and the limit t → ∞, we obtain

lim_{t→∞} sup{‖p(t, s) − q(t, s)‖ | p_0, q_0 ∈ P} ≤ 2 lim_{t→∞} α(G^{t,s}) = 0    (269)

for any s > 0. Therefore the inhomogeneous Markov chain generated by G(t) is weakly ergodic.

Next, we prove Theorem 5.1. For this purpose, the following Lemma is useful.
Lemma B.5. Let a_0, a_1, · · · , a_n, · · · be a sequence such that 0 ≤ a_i < 1 for any i. Then

Σ_{i=0}^{∞} a_i = ∞  ⟹  Π_{i=n}^{∞} (1 − a_i) = 0    for any n.    (270)

Proof. Since 0 ≤ 1 − a_i ≤ e^{−a_i}, we have

0 ≤ Π_{i=n}^{m} (1 − a_i) ≤ Π_{i=n}^{m} e^{−a_i} = exp( − Σ_{i=n}^{m} a_i ).    (271)

In the limit m → ∞, the right-hand side converges to zero because of the assumption Σ_{i=0}^{∞} a_i = ∞. Therefore we obtain (270).
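A concrete instance of Lemma B.5 (our own illustration, not from the paper): with a_i = 1/(i+2) the series diverges like the harmonic series, the tail product telescopes to (n+1)/(m+2), and the bound (271) holds along the way.

```python
import math

n = 5
prod, s = 1.0, 0.0
for i in range(n, 200000):
    a = 1.0 / (i + 2)          # sum_i a_i diverges (harmonic series)
    prod *= 1.0 - a            # partial product of (1 - a_i)
    s += a                     # partial sum of a_i

# 1 - 1/(i+2) = (i+1)/(i+2), so the product telescopes to (n+1)/(m+2) -> 0,
# and the bound (271), prod <= exp(-sum), is respected throughout.
assert prod <= math.exp(-s)
assert prod < 1e-4
```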

Proof of Theorem 5.1. We assume that the inhomogeneous Markov chain generated by G(t) is weakly ergodic. Theorem B.4 yields

lim_{t→∞} [1 − α(G^{t,s})] = 1    (272)

for any s > 0. Thus, there exists t_1 such that 1 − α(G^{t_1,t_0}) > 1/2 with t_0 = s. Similarly, there exists t_{n+1} such that 1 − α(G^{t_{n+1},t_n}) > 1/2 for any t_n > 0. Therefore,

Σ_{i=0}^{n} [1 − α(G^{t_{i+1},t_i})] > (n+1)/2.    (273)

Taking the limit n → ∞, we obtain (122).


Conversely, we assume (122). Lemma B.5 yields

Π_{i=n}^{∞} [1 − (1 − α(G^{t_{i+1},t_i}))] = Π_{i=n}^{∞} α(G^{t_{i+1},t_i}) = 0.    (274)

For fixed s and t such that t > s ≥ 0, we define n and m by t_{n−1} ≤ s < t_n and t_m < t ≤ t_{m+1}. Thus, from Lemma B.2, we obtain

α(G^{t,s}) ≤ α(G^{t,t_m}) α(G^{t_m,t_{m−1}}) · · · α(G^{t_{n+1},t_n}) α(G^{t_n,s})
    = α(G^{t,t_m}) [ Π_{i=n}^{m−1} α(G^{t_{i+1},t_i}) ] α(G^{t_n,s}).    (275)

In the limit t → ∞, m goes to infinity and then the right-hand side converges to zero because of (274). Thus we have

lim_{t→∞} α(G^{t,s}) = 0    (276)

for any s. Therefore, from Theorem B.4, the inhomogeneous Markov chain generated by
G(t) is weakly ergodic.

B.3 Conditions for strong ergodicity
The goal of this section is to give the proof of Theorem 5.2. Before that, we prove the
following Theorem, which also provides the sufficient condition for strong ergodicity.
Theorem B.6. An inhomogeneous Markov chain generated by G(t) is strongly ergodic if there exists a transition matrix H on S such that H(z, x) = H(z, y) for any x, y, z ∈ S and

lim_{t→∞} ‖G^{t,s} − H‖ = 0    (277)

for any s > 0.

Proof. We consider p_0 ∈ P and p(t, s) = G^{t,s} p_0. For fixed u ∈ S, we define a probability distribution r by r(z) = H(z, u). We find

‖p(t, s) − r‖ = Σ_{z∈S} | Σ_{x∈S} G^{t,s}(z, x) p_0(x) − H(z, u) |
    = Σ_{z∈S} | Σ_{x∈S} [G^{t,s}(z, x) − H(z, u)] p_0(x) |
    ≤ Σ_{z∈S} Σ_{x∈S} |G^{t,s}(z, x) − H(z, u)| = Σ_{z∈S} Σ_{x∈S} |G^{t,s}(z, x) − H(z, x)|
    ≤ Σ_{x∈S} ‖G^{t,s} − H‖ = |S| ‖G^{t,s} − H‖.    (278)

Taking the sup with respect to p_0 ∈ P and using the assumption (277), we obtain

lim_{t→∞} sup{‖p(t, s) − r‖ | p_0 ∈ P} = 0.    (279)

Therefore, the inhomogeneous Markov chain generated by G(t) is strongly ergodic.

Proof of Theorem 5.2. We assume that the three conditions in Theorem 5.2 hold. Since condition 3 is rewritten as

Σ_{t=0}^{∞} Σ_{x∈S} |p_t(x) − p_{t+1}(x)| = Σ_{t=0}^{∞} ‖p_t − p_{t+1}‖ < ∞,    (280)

we have

Σ_{t=0}^{∞} |p_t(x) − p_{t+1}(x)| < ∞    (281)

for any x ∈ S. Thus, the stationary state p_t converges to p = lim_{t→∞} p_t. Now, let us define transition matrices H and H(t) by H(z, x) = p(z) and H(z, x; t) = p_t(z), respectively. For t > u > s ≥ 0,

‖G^{t,s} − H‖ ≤ ‖G^{t,u} G^{u,s} − G^{t,u} H(u)‖ + ‖G^{t,u} H(u) − H(t−1)‖ + ‖H(t−1) − H‖.    (282)

Thus, we evaluate each term on the right-hand side and show that (277) holds.
[1st term] Lemma B.3 yields

‖G^{t,u} G^{u,s} − G^{t,u} H(u)‖ ≤ α(G^{t,u}) ‖G^{u,s} − H(u)‖ ≤ 2 α(G^{t,u}),    (283)

where we used ‖G^{u,s} − H(u)‖ ≤ 2. Since the Markov chain is weakly ergodic (condition 1), Theorem B.4 implies that

∀ε > 0, ∃t_1 > 0, ∀t > t_1 : ‖G^{t,u} G^{u,s} − G^{t,u} H(u)‖ < ε/3.    (284)
[2nd term] Since p_t = G(t) p_t (condition 2), we find

H(u) = G(u) H(u) = G^{u+1,u} H(u)    (285)

and then

G^{t,u} H(u) = G^{t,u+1} H(u) = G^{t,u+1} [H(u) − H(u+1)] + G^{t,u+1} H(u+1).    (286)

The last term on the right-hand side of the above equation is similarly rewritten as

G^{t,u+1} H(u+1) = G^{t,u+2} [H(u+1) − H(u+2)] + G^{t,u+2} H(u+2).    (287)

We recursively apply these relations and obtain

G^{t,u} H(u) = Σ_{v=u}^{t−2} G^{t,v+1} [H(v) − H(v+1)] + G^{t,t−1} H(t−1)
    = Σ_{v=u}^{t−2} G^{t,v+1} [H(v) − H(v+1)] + H(t−1).    (288)

Thus the second term in (282) is bounded as

‖G^{t,u} H(u) − H(t−1)‖ = ‖ Σ_{v=u}^{t−2} G^{t,v+1} [H(v) − H(v+1)] ‖
    ≤ Σ_{v=u}^{t−2} ‖G^{t,v+1} [H(v) − H(v+1)]‖.    (289)

Lemmas B.1 and B.3 yield

‖G^{t,v+1} [H(v) − H(v+1)]‖ ≤ ‖H(v) − H(v+1)‖ = ‖p_v − p_{v+1}‖,    (290)

where we used the definition of H(t). Thus we obtain

‖G^{t,u} H(u) − H(t−1)‖ ≤ Σ_{v=u}^{t−2} ‖p_v − p_{v+1}‖.    (291)
Since Σ_{t=0}^{∞} ‖p_t − p_{t+1}‖ < ∞ (condition 3), for all ε > 0 there exists t_2 > 0 such that

Σ_{v=u}^{t−2} ‖p_v − p_{v+1}‖ < ε/3    for all t > u ≥ t_2.    (292)

Therefore

∀ε > 0, ∃t_2 > 0, ∀t > u ≥ t_2 : ‖G^{t,u} H(u) − H(t−1)‖ < ε/3.    (293)
[3rd term] From the definitions of H and H(t), they clearly satisfy

lim_{t→∞} ‖H(t) − H‖ = 0,    (294)

which implies that

∀ε > 0, ∃t_3 > 0, ∀t > t_3 : ‖H(t−1) − H‖ < ε/3.    (295)

Consequently, substitution of (284), (293) and (295) into (282) yields

‖G^{t,s} − H‖ < ε/3 + ε/3 + ε/3 = ε    (296)
for all t > max{t1 , t2 , t3 }. Since ε is arbitrarily small, (277) holds for any s > 0 and then
the given Markov chain is strongly ergodic from Theorem B.6, which completes the proof
of the first part of Theorem 5.2.

Next, we assume p = lim_{t→∞} p_t. For any distribution q_0, we have H q_0 = p because

Σ_{x∈S} H(z, x) q_0(x) = p(z) Σ_{x∈S} q_0(x) = p(z).    (297)

Thus, we obtain

‖q(t, t_0) − p‖ = ‖(G^{t,t_0} − H) q_0‖ ≤ ‖G^{t,t_0} − H‖.    (298)

Hence the bound holds after taking the sup with respect to q_0 ∈ P, which yields (121) in the limit t → ∞. Theorem 5.2 is thereby proved.
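To make Theorem 5.2 concrete, here is a small two-state sketch of strong ergodicity (our own construction, not from the paper). Each G(t) is built so that its stationary distribution p_t = (1−ε_t, ε_t), with ε_t = 1/(t+3), slowly concentrates on state 0; the decay is slow enough that Σ_t ‖p_t − p_{t+1}‖ < ∞ (condition 3) and the chain stays weakly ergodic (condition 1). Two different initial distributions then converge to the same limit p = (1, 0).

```python
import numpy as np

def G_of(t):
    """Column-stochastic G(t) with stationary state p_t = (1-eps, eps);
    eps = 1/(t+3) decays slowly, c = 0.5 is a holding rate (our choices)."""
    eps, c = 1.0 / (t + 3), 0.5
    return np.array([[1 - c * eps, c * (1 - eps)],
                     [c * eps,     1 - c * (1 - eps)]])

p = np.array([0.0, 1.0])   # two different initial distributions
q = np.array([1.0, 0.0])
for t in range(20000):
    G = G_of(t)
    p, q = G @ p, G @ q

# Strong ergodicity: both trajectories forget their initial condition and
# approach p = lim p_t = (1, 0).
assert np.abs(p - q).sum() < 1e-6
assert p[0] > 0.999
```

One can check stationarity directly: G(t) applied to (1−ε_t, ε_t) returns (1−ε_t, ε_t), so condition 2 of Theorem 5.2 holds by construction.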

References
[1] M. R. Garey and D. S. Johnson: Computers and Intractability: A Guide to the
Theory of NP-Completeness (Freeman, San Francisco, 1979)
[2] A. K. Hartmann and M. Weigt: Phase Transitions in Combinatorial Optimization
Problems: Basics, Algorithms and Statistical Mechanics (Wiley-VCH, Weinheim,
2005)
[3] K. Helsgaun: Euro. J. Op. Res. 126 (2000) 106.
[4] S. Kirkpatrick, C. D. Gelatt and M. P. Vecchi: Science 220 (1983) 671
[5] E. Aarts and J. Korst: Simulated Annealing and Boltzmann Machines: A Stochastic
Approach to Combinatorial Optimization and Neural Computing (Wiley, New York,
1984)
[6] A. B. Finnila, M. A. Gomez, C. Sebenik, S. Stenson, and J. D. Doll: Chem. Phys.
Lett. 219 (1994) 343
[7] T. Kadowaki and H. Nishimori: Phys. Rev. E 58 (1998) 5355
[8] T. Kadowaki: Study of Optimization Problems by Quantum Annealing (Thesis,
Tokyo Institute of Technology, 1999); quant-ph/0205020
[9] A. Das and B. K. Chakrabarti: Quantum Annealing and Related Optimization Methods (Springer, Berlin, Heidelberg, 2005) Lecture Notes in Physics, Vol. 679
[10] G. E. Santoro and E. Tosatti: J. Phys. A 39 (2006) R393
[11] A. Das and B. K. Chakrabarti: arXiv:0801.2193 (to be published in Rev. Mod.
Phys.).
[12] B. Apolloni, C. Carvalho and D. de Falco: Stoch. Proc. Appl. 33 (1989) 233

[13] B. Apolloni, N. Cesa-Bianchi and D. de Falco: in Stochastic Processes, Physics and
Geometry, eds. S. Albeverio et al. (World Scientific, Singapore, 1990) 97
[14] G. E. Santoro, R. Martoňák, E. Tosatti and R. Car: Science 295 (2002) 2427
[15] R. Martoňák, G. E. Santoro and E. Tosatti: Phys. Rev. B 66 (2002) 094203
[16] S. Suzuki and M. Okada: J. Phys. Soc. Jpn. 74 (2005) 1649
[17] M. Sarjala, V. Petäjä and M. Alava: J. Stat. Mech. (2006) P01008
[18] S. Suzuki, H. Nishimori, and M. Suzuki: Phys. Rev. E 75 (2007) 051112
[19] R. Martoňák, G. E. Santoro and E. Tosatti: Phys. Rev. E 70 (2004) 057701
[20] L. Stella, G. E. Santoro and E. Tosatti: Phys. Rev. B 72 (2005) 014303
[21] L. Stella, G. E. Santoro and E. Tosatti: Phys. Rev. B 73 (2006) 144302
[22] A. Das, B. K. Chakrabarti and R. B. Stinchcombe: Phys. Rev. E 72 (2005) 026701
[23] H. F. Trotter: Proc. Am. Math. Soc. 10 (1959) 545
[24] M. Suzuki: Prog. Theor. Phys. 46 (1971) 1337
[25] D. P. Landau and K. Binder: A Guide to Monte Carlo Simulations in Statistical
Physics (Cambridge, Cambridge University Press, 2000) Chap. 8
[26] E. Farhi, J. Goldstone, S. Gutmann and M. Sipser: quant-ph/0001106
[27] A. Mizel, D. A. Lidar and M. Mitchell: Phys. Rev. Lett. 99 (2007) 070502
[28] S. Morita: Analytic Study of Quantum Annealing (Thesis, Tokyo Institute of Tech-
nology, 2008).
[29] S. Morita and H. Nishimori: J. Phys. Soc. Jpn. 76 (2007) 064002.
[30] A. Messiah: Quantum Mechanics (Wiley, New York, 1976)
[31] R. D. Somma, C. D. Batista, and G. Ortiz: Phys. Rev. Lett. 99 (2007) 030603
[32] E. Hopf: J. Math. Mech. 12 (1963) 683
[33] S. Geman and D. Geman: IEEE Trans. Pattern Anal. Mach. Intell. PAMI-6 (1984)
721
[34] H. Nishimori and J. Inoue: J. Phys. A: Math. Gen. 31 (1998) 5661
[35] H. Nishimori and Y. Nonomura: J. Phys. Soc. Jpn. 65 (1996) 3780
[36] E. Seneta: Non-negative Matrices and Markov Chains (Springer, New York, 2006)
[37] S. Morita, J. Phys. Soc. Jpn. 76 (2007) 104001
[38] L. D. Landau and E. M. Lifshitz: Quantum Mechanics: Non-Relativistic Theory
(Pergamon Press, Oxford, 1965)
[39] C. Zener: Proc. R. Soc. London Ser. A 137 (1932) 696
[40] S. Morita and H. Nishimori, J. Phys. A: Math. and Gen. 39 (2006) 13903
[41] W. H. Press, S. A. Teukolsky, W. T. Vetterling and B. P. Flannery: Numerical Recipes in C (Cambridge University Press, Cambridge, 1992) 2nd ed.
[42] L. K. Grover: Phys. Rev. Lett. 79 (1997) 325
[43] J. Roland and N. J. Cerf: Phys. Rev. A 65 (2002) 042308
[44] C. Tsallis and D. A. Stariolo: Physica A 233 (1996) 395
[45] D. M. Ceperley and B. J. Alder: Phys. Rev. Lett. 45 (1980) 566
[46] N. Trivedi and D. M. Ceperley: Phys. Rev. B 41 (1990) 4552
[47] L. Stella and G. E. Santoro: Phys. Rev. E 75 (2007) 036703
