Inability of a graph neural network heuristic to outperform greedy algorithms in solving combinatorial optimization problems like Max-Cut

Stefan Boettcher
Department of Physics, Emory University, Atlanta, GA 30322, USA

Matters Arising from: Martin J. A. Schuetz et al., Nature Machine Intelligence 4, 367 (2022), https://doi.org/10.1038/s42256-022-00468-6.

arXiv:2210.00623v1 [cond-mat.dis-nn] 2 Oct 2022

In Ref. [1], Schuetz et al. provide a scheme to employ graph neural networks (GNN) as a heuristic to solve a variety of classical, NP-hard combinatorial optimization problems. It describes how the network is trained on sample instances and how the resulting GNN heuristic is evaluated by applying widely used techniques to determine its ability to succeed. Clearly, the idea of harnessing the powerful abilities of such networks to "learn" the intricacies of complex, multimodal energy landscapes in such a hands-off approach seems enticing. And based on the observed performance, the heuristic promises to be highly scalable, with a computational cost linear in the input size n, although there is likely a significant overhead in the pre-factor due to the GNN itself. However, closer inspection shows that the reported results for this GNN are only minutely better than those for gradient descent and are outperformed by a greedy algorithm, for example, for Max-Cut. The discussion also highlights what I believe are some common misconceptions in the evaluation of heuristics.
[Figure 1: Results discussed in the text for various heuristics and bounds for the Max-Cut problem on a 3-regular random graph ensemble, (a) plotted as the raw cut size as a function of the number of nodes n (GNN, GD, EO, and the upper bound cut_ub), and (b) as an extrapolation plot of ⟨e_3⟩/3^{1/2} over 1/n according to Eq. (1), comparing gradient descent, GNN, GraphSAGE, greedy search, the EO heuristic, the 1-RSB value, and −P∗ (Parisi energy). Note that in (b), a fit (red-dashed line) to the EO data (circles) suggests a non-linear asymptotic correction ∼ 1/n^{2/3} [4].]

Among the variety of QUBO problems Ref. [1] considers in the numerical evaluation of their GNN, I want to focus the discussion here on Max-Cut. As explained in the context of their Eq. (7), it is derived from an Ising spin-glass Hamiltonian on a d-regular random graph [2] for d = 3. (In the physics literature, for historical reasons such a graph is often referred to as a Bethe lattice [3, 4].) Minimizing the energy of the Hamiltonian, H, maximizes the cut size, cut = −H. The cut results for the GNN (for both d = 3 and 5) are presented in Fig. 4 of Ref. [1], where they find cut ∼ γ_3 n with γ_3 ≈ 1.28 via an asymptotic fit to the GNN data, obtained from averaging over randomly generated instances of the problem for a progression of different problem sizes n. In Fig. 1(a) here, I have recreated their Fig. 4, based on the value of γ_3 reported for the GNN (blue line). As in Ref. [1], I have also included what they describe as a rigorous upper bound, cut_ub (black-dashed line), which derives from an exact result obtained for d = ∞ [5]. While the GNN results appear impressively close to that upper bound, including two other sets of data puts these results in a different perspective. The first set I obtained at significant computational cost (∼ n^3) with another heuristic ("extremal optimization", EO) long ago in Ref. [4] (black circles). The second set is achieved by a simple gradient descent (GD, maroon squares). GD sequentially looks at randomly selected (Boolean) variables x_i among those whose flip (x_i ↦ ¬x_i) will improve the cost function. (Such "unstable" variables are easy to track.) After only ∼ 0.4n such flips, typically no further improvements were possible and GD converged; very scalable and fast (done overnight on a laptop, averaging over 10^3 to 10^5 instances at each n, up to n = 10^5). Presented in the form of Fig. 1(a), the results all look rather good, although it is already noticeable that the results for GD are barely distinguishable from those of the elaborate GNN heuristic.
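Since GD serves as the central baseline here, a minimal sketch may help make it concrete. This is not the implementation used for Fig. 1, only an illustration of the scheme described above; it assumes the networkx package for graph generation, and the function name and parameters are my own:

```python
import random
import networkx as nx

def gradient_descent_maxcut(n, d=3, seed=0):
    """Sketch of the GD baseline: flip randomly chosen unstable
    spins on a random d-regular graph until none are left."""
    rng = random.Random(seed)
    G = nx.random_regular_graph(d, n, seed=seed)
    spin = {v: rng.choice((-1, 1)) for v in G}   # random initial partition

    # Flipping spin i changes the cut by spin[i] * h_i, with local field
    # h_i = sum of neighboring spins; i is "unstable" if that gain is > 0.
    def gain(v):
        return spin[v] * sum(spin[u] for u in G[v])

    unstable = {v for v in G if gain(v) > 0}
    flips = 0
    while unstable:
        v = rng.choice(tuple(unstable))          # random unstable variable
        spin[v] = -spin[v]
        flips += 1
        for w in (v, *G[v]):                     # only local stabilities change
            if gain(w) > 0:
                unstable.add(w)
            else:
                unstable.discard(w)

    cut = sum(1 for u, v in G.edges if spin[u] != spin[v])
    return cut, flips
```

Averaged over many instances, cut/n from such a descent should already land close to the GNN's fitted γ_3, converging after a number of flips of order n, consistent with the ∼ 0.4n quoted above.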
To discern further details, it is essential to present the data in a form that, at least, eliminates some of its trivial aspects. For example, as Schuetz et al. reference themselves, the ratio cut/n ∼ γ converges to a stable limit with γ ∼ d/4 + P∗ √(d/4) + o(√d) + o(n^0) for n, d → ∞ [6], where P∗ = 0.7632... [5]. In fact, for better comparison with Refs. [3, 4], we focus on the average ground-state energy density of the Hamiltonian in their Eq. (7) at n = ∞, which is related to γ via ⟨e_d⟩/√d = √(d/4) − γ √(4/d). (The awkward denominator is owed to the fact that P∗ = lim_{d→∞} ⟨e_d⟩/√d. Also, the energy provides a fair reference point to assess relative error, because a purely random assignment of variables results in an energy of zero, the ultimate null model. Such a reference point is lacking for the errors quoted in Tab. 1 of Ref. [1], for example.)
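As a concrete check of this relation (a back-of-the-envelope script, not taken from Ref. [1]; the 1-RSB energy ⟨e_3⟩ ≈ −1.2717 is the replica value of Refs. [3, 10], and the helper name is mine), one can convert the GNN's fitted γ_3 ≈ 1.28 into this energy scale:

```python
import math

# <e_d>/sqrt(d) = sqrt(d/4) - gamma * sqrt(4/d), rearranged: e = d/2 - 2*gamma
def energy_density(gamma, d=3):
    return (d / 2 - 2 * gamma) / math.sqrt(d)

e_gnn = energy_density(1.28)            # GNN fit from Ref. [1]: ~ -0.612
e_1rsb = -1.2717 / math.sqrt(3)         # 1-RSB replica value [3, 10]: ~ -0.734

print(f"GNN:   <e_3>/sqrt(3) ~ {e_gnn:.4f}")
print(f"1-RSB: <e_3>/sqrt(3) ~ {e_1rsb:.4f}")
print(f"relative error: {100 * (1 - e_gnn / e_1rsb):.1f}%")   # about 17%
```

This simple conversion already anticipates the systematic gap of more than 15% discussed below.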
More revealing than merely dividing by n is the transformation of the data into an extrapolation plot [4, 7]: Since we care about the scalability of the algorithm in the asymptotic limit of large problem sizes n → ∞, which in the form of Fig. 1(a) is out of view, it is expedient to visualize the data plotted over an inverse of the problem size (i.e., 1/n or some power thereof [4, 8, 9]). Independent of the largest sizes n achieved in the data, this conveniently condenses the asymptotic behavior arbitrarily close to the y-intercept, where 1/n → 0, albeit at the cost of sacrificing some data for smaller n. To this end, I propose to plot the data in the finite-size corrections form

    ⟨e_3⟩_n ∼ ⟨e_3⟩_{n=∞} + const/n + ...,   (n → ∞).   (1)
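For illustration, the extrapolation amounts to nothing more than a linear fit in the variable 1/n; the data below are hypothetical placeholders (in practice the averages come from simulations like those described above), and numpy is assumed:

```python
import numpy as np

# Hypothetical finite-size averages <e_3>_n (illustrative values only)
n = np.array([1_000, 2_000, 5_000, 10_000, 20_000, 50_000])
e3 = np.array([-0.7295, -0.7318, -0.7333, -0.7338, -0.7340, -0.7342])

# Eq. (1): <e_3>_n ~ <e_3>_{n=inf} + const/n is a line in x = 1/n,
# so the fitted y-intercept estimates the n -> infinity limit.
slope, intercept = np.polyfit(1.0 / n, e3, 1)
print(f"extrapolated <e_3>(n=inf) ~ {intercept:.4f}")
```

Plotted this way, each heuristic's data converges to its own intercept at 1/n = 0, which is exactly the comparison made in Fig. 1(b).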
In Fig. 1(b) we have plotted the same data from Fig. 1(a) according to Eq. (1) for d = 3 (modulo a trivial factor of 1/√3 for better comparison with P∗). Stark differences between the sets of data appear, since each set converges asymptotically to a stable but distinct limit at 1/n = 0. First, we note the addition of a well-known result from replica theory, a one-step replica symmetry-breaking (1-RSB) calculation [3, 10] that is expected to yield the actual value of ⟨e_3⟩_{n=∞} (and thus γ_3) to a precision of 10^{-4} (green line), a superior reference value to −P∗ (black-dashed line), which is valid only at d = ∞, although it seems sensible in the form of Fig. 1(a). The 1-RSB value is further emphasized by the fact that the EO data (black circles) from Ref. [4] smoothly extrapolate to the same limit within statistical errors. Finally, in the form of Fig. 1(b) it becomes apparent that the claimed GNN results (blue line) are systematically far (> 15% at any n) from optimal (1-RSB, green line) and hardly provide any improvement over pure gradient descent (GD, maroon squares). It appears that the GNN learns what is indeed most typical about the energy landscape: the vast prevalence of high-energy, poor-quality metastable solutions that gradient descent gets trapped in, missing the faint signature of exceedingly rare low-energy minima. In fact, extending GD by a subsequent 5n spin flips, say, each flip adjusting one among the least-stable spins (even if not always unstable), allows this greedy local search to explore several local minima, still at linear cost. The results of that simple algorithm, also shown in Fig. 1(b) (diamonds), already reduce the error to ≈ 6% across all sizes n, a considerable improvement on the GNN results in Ref. [1] and still better than an improved version, GraphSAGE, that the authors mention in their response (orange line).
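A sketch of that greedy extension follows (again an illustration, not the code behind Fig. 1: networkx is assumed, the 5n budget and the least-stable-spin rule are from the text, while the random-sample shortcut used to locate a near-least-stable spin is my own simplification):

```python
import random
import networkx as nx

def greedy_search_maxcut(n, d=3, budget=5, seed=0):
    """GD followed by ~budget*n flips of a (near-)least-stable spin,
    keeping the best cut seen while hopping between local minima."""
    rng = random.Random(seed)
    G = nx.random_regular_graph(d, n, seed=seed)
    spin = {v: rng.choice((-1, 1)) for v in G}

    def gain(v):                 # change of the cut if v were flipped
        return spin[v] * sum(spin[u] for u in G[v])

    # Phase 1: plain gradient descent, as in the previous sketch.
    unstable = {v for v in G if gain(v) > 0}
    while unstable:
        v = unstable.pop()
        if gain(v) <= 0:         # entry may have become stale
            continue
        spin[v] = -spin[v]
        unstable.update(w for w in G[v] if gain(w) > 0)

    cut = sum(1 for u, w in G.edges if spin[u] != spin[w])
    best, nodes = cut, list(G)

    # Phase 2: budget*n more flips of the least-stable spin found in a
    # small random sample (even if its gain is negative), still O(n) cost.
    for _ in range(budget * n):
        v = max(rng.sample(nodes, min(32, n)), key=gain)
        cut += gain(v)           # gain(v) is exactly the cut change
        spin[v] = -spin[v]
        best = max(best, cut)
    return best
```

The point of the sketch is only that such a search remains linear in n while visiting several local minima, which, per the results above, is already enough to cut the GNN's relative error by more than half.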
In conclusion, the study in Ref. [1] exemplifies a number of common shortcomings found in the analysis of optimization heuristics (see also Ref. [7]): (1) reliance on rigorous but rather poor and often meaningless bounds, as provided by the Goemans-Williamson algorithm in this case, instead of using the much more relevant (albeit as-of-yet unproven) results from statistical physics; (2) use of an obscure presentation of the data; (3) lack of state-of-the-art comparisons across different areas of science; and (4) lack of benchmarking against trivial baseline models such as the gradient descent or greedy search presented here. On such closer inspection, the proposed GNN heuristic does not provide much algorithmic advantage over that baseline. It is likely that these conclusions are not isolated to this specific example but would also hold for Max-Cut at d = 5 and for the other QUBO applications discussed in Ref. [1], as the concurrent comment by Angelini and Ricci-Tersenghi (arXiv:2206.13211) indicates.

[1] M. J. A. Schuetz, J. K. Brubaker, and H. G. Katzgraber, Nature Machine Intelligence 4, 367 (2022).
[2] Technically, their Hamiltonian in Eq. (7) pertains to an antiferromagnet instead of a spin glass, but on such random graphs both are equivalent [11].
[3] M. Mezard and G. Parisi, J. Stat. Phys. 111, 1 (2003).
[4] S. Boettcher, The European Physical Journal B - Condensed Matter 31, 29 (2003).
[5] G. Parisi, J. Phys. A 13, L115 (1980).
[6] A. Dembo, A. Montanari, and S. Sen, The Annals of Probability 45, 1190 (2017).
[7] S. Boettcher, Physical Review Research 1, 033142 (2019).
[8] S. Boettcher, Journal of Statistical Mechanics: Theory and Experiment 2010, P07002 (2010).
[9] S. Boettcher, Physical Review Letters 124, 177202 (2020).
[10] M. Mezard and G. Parisi, Europhys. Lett. 3, 1067 (1987).
[11] L. Zdeborová and S. Boettcher, Journal of Statistical Mechanics: Theory and Experiment 2010, P02020 (2010).
