Computational Complexity of Counting and Sampling
István Miklós
Rényi Institute, Budapest, Hungary
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742
This book contains information obtained from authentic and highly regarded sources. Reason-
able efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.
Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.
For permission to photocopy or use material electronically from this work, please access www.
copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organiza-
tion that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.
Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.
Preface
very briefly presented at the beginning of the book. However, Turing machines and other models of computation are not explained in this book.
We wanted to give a thorough overview of the field. Still, several topics are
omitted or not discussed in detail in this book. As the book focuses on classify-
ing easy and hard computational problems, very little is presented on improved
running times and asymptotic optimality of algorithms. For example, divide
and conquer algorithms, like the celebrated “four Russians speed-up”, can-
not be found in this book. Similarly, the logarithmic Sobolev inequalities are
not discussed in detail in the chapter on the mixing of Markov chains. Many
beautiful topics, like stochastic computing of the volume of convex bodies,
monotone circuit complexity, #BIS-complete counting problems, Fibonacci
gates, path coupling, and coupling from the past are mentioned only very
briefly due to limited space.
Writing this book was great fun. This work could not have been accomplished without the help of my colleagues. I would like to thank Miklós Bóna for suggesting that I write this book. I also thank the whole team at CRC Press for their support. Special thanks go to Jin-Yi Cai, Catherine Greenhill, and Zoltán Király for drawing my attention to several papers I had not been aware of. I would like to thank Kálmán Cziszter, Mátyás Domokos, Péter Erdős, Jotun Hein, Péter Pál Pálfy, Lajos Rónyai, and Miklós Simonovits for fruitful discussions. András Rácz volunteered to read the first two chapters of the book and to comment on them, for which I would like to warmly thank him. Last but not least, I will always remember my beloved wife, Ágnes Nyúl, who supported the writing of this book till the end of her last days, and who, unfortunately, passed away before the publication of this book.
Chapter 1
Background on computational
complexity
does not exist for a computational problem. However, we can prove that no
polynomial running time algorithm exists for certain counting problems if no
polynomial running time algorithm exists for certain hard decision problems.
This fact also underlines why discussing decision problems is inevitable in a
book about computational complexity of counting and sampling.
When exact counting is hard, approximate counting might be easy or hard. Surprisingly, hard counting problems might be easy to approximate stochastically; however, there are counting problems that we cannot approximate well. We conjecture that they are hard to approximate, and this is a point where stochastic approximations are also related to random approaches to decision problems. In particular, if no random algorithm exists for certain hard decision problems that runs in polynomial time and is any better than random guessing, then there is no efficient good approximation for certain counting problems.
In this chapter, we give a brief introduction to computational complexity
and show how computational complexity of counting and sampling is related
to computational complexity of decision and optimization problems.
It is easy to see that these problems are indeed in NP. If somebody selects vertices $v_1, v_2, \ldots, v_k$, it is easy to verify that for all $i, j \in \{1, 2, \ldots, k\}$, $i \ne j$, there is an edge between $v_i$ and $v_j$. If somebody provides a partitioning of a set of numbers, it is easy to calculate the sums of the subsets and check if the two sums are the same. Also, it is easy to verify that assignments to the variables $x_1, x_2, \ldots, x_n$ satisfy any inequality under (1.1).
In many cases, finding a solution seems to be harder than verifying a
solution. There are problems in NP for which no polynomial running time
algorithm is known. We cannot prove that such an algorithm does not exist; however, we can prove that these hard computational problems are as hard as any problem in NP. To state this precisely, we first need the following
definitions.
Proof. The proof is based on the fact that both the sum and the composition of two polynomials are polynomials.
Stephen Cook proved in 1971 that SAT is NP-complete [44], and Richard
Karp demonstrated in 1972 that many natural computational problems are
NP-complete by reducing SAT to them [108]. These problems, famously known as Karp's 21 NP-complete problems, drew attention to NP-completeness and initiated the study of the P versus NP problem. The question whether or not P equals NP has
Problem 2.
Name: H-Cycle.
Input: a directed graph $\vec{G} = (V, E)$.
Output: "yes" if $\vec{G}$ has a Hamiltonian cycle, "no" otherwise.
The central question here is how many oracle calls are needed to approximate
the volume of K. We are going to show a negative result. It says that there
is no deterministic, polynomial running time algorithm that can reasonably
approximate the volume of a convex body in this computational model. The
extremely surprising fact is that in the same computational model, approxi-
mating the volume with a random algorithm is possible in polynomial time.
Before we state and prove the theorem, we need the following lemma.
Lemma 1. Let $v_1, v_2, \ldots, v_n$ be points on the surface of the d ($d \le n + 1$) dimensional Euclidean unit ball B around the center. Let K be the convex hull of the given points. Then
$$\mathrm{vol}(K) \le n \left(\frac{1}{2}\right)^d \mathrm{vol}(B) \qquad (1.5)$$
where vol(K) and vol(B) are the volumes of the convex hull and the unit ball, respectively.
Proof. Consider the balls $B_1, B_2, \ldots, B_n$ whose radii are $\frac{1}{2}$ and whose centers are $\frac{v_1}{2}, \frac{v_2}{2}, \ldots, \frac{v_n}{2}$. Here each point $v_i$ is considered as a d-dimensional vector. We claim that
$$K \subseteq \cup_{i=1}^{n} B_i. \qquad (1.6)$$
Indeed, assume that there is a point $x \in K$, but $x \notin \cup_{i=1}^{n} B_i$. Then for each i, the angle $Oxv_i$ is strictly smaller than $\pi/2$. Now consider the hyperplane P whose normal vector is Ox and which contains x, and let H be the open halfspace determined by P, containing O. Since each $Oxv_i$ angle is strictly smaller than $\pi/2$, each $v_i$ is in H. But then K, the convex hull of the $v_i$s, cannot contain x, a contradiction.
is in K. Then any deterministic algorithm that makes only poly(d) oracle calls cannot give an estimation f of the volume of K satisfying
$$\frac{\mathrm{vol}(K)}{1.999^{\frac{d}{2}}} \le f \le 1.999^{\frac{d}{2}}\, \mathrm{vol}(K) \qquad (1.7)$$
$$\frac{\mathrm{vol}(K_1)}{\mathrm{vol}(K_2)} \ge \frac{2^d}{n + d + 1}. \qquad (1.8)$$
In both cases, the oracle tells us that all points z1 , z2 , . . . , zn are inside the
convex body. Let f be the value that the algorithm generates, based on the
answers of the oracle. Then either
$$f > 1.999^{\frac{d}{2}}\, \mathrm{vol}(K_2) \qquad (1.9)$$
or
$$f < \frac{\mathrm{vol}(K_1)}{1.999^{\frac{d}{2}}}. \qquad (1.10)$$
If both inequalities failed, then it would be true that
$$1.999^d \ge \frac{\mathrm{vol}(K_1)/f}{\mathrm{vol}(K_2)/f} = \frac{\mathrm{vol}(K_1)}{\mathrm{vol}(K_2)} \ge \frac{2^d}{n + d + 1}, \qquad (1.11)$$
$$P = RP \subset NP \qquad (1.16)$$
$$\text{NP-complete} \cap P = \emptyset \qquad (1.17)$$
and
$$\text{\#P-complete} \cap FP = \emptyset. \qquad (1.18)$$
The conjecture that RP ≠ NP also implies that RP ∩ NP-complete = ∅. This comes from the following theorem and from the easy observation that RP ⊆ BPP.
Theorem 14. (Papadimitriou’s theorem) If the intersection of NP-complete
and BPP is not empty, then RP = NP.
Proof. We prove this theorem in three steps.
1. First we prove that a BPP algorithm for any NP-complete problem would prove that NP ⊆ BPP.
2. In particular, SAT would be in BPP. We show that a BPP algorithm
for SAT would provide an RP algorithm for SAT.
3. Finally, we show that an RP algorithm for SAT would mean that RP =
NP.
Concerning the first point, assume that there is an NP-complete problem A for which a BPP algorithm exists. Let B be an arbitrary problem in NP. Since A is NP-complete, for any problem instance x in B, there is a series of problem instances $y_1, y_2, \ldots, y_k \in A$ such that these problem instances can be constructed from x in time polynomial in the size of x. In particular, $k = O(\mathrm{poly}(|x|))$ and for each $i = 1, 2, \ldots, k$, $|y_i| = O(\mathrm{poly}(|x|))$. Furthermore, from the solutions of these problems, the solution to x can be obtained in polynomial time. If only random answers are available for $y_1, y_2, \ldots, y_k$, then only a random answer for x can be generated. One wrong solution for any $y_i$
might result in a wrong answer for x. Therefore, to get a BPP algorithm solving x, we need the probability that all $y_i$ are answered correctly to be at least 2/3. For this, each $y_i$ must be answered correctly with probability at least $(2/3)^{1/k}$. We can approximate this with
$$1 - \frac{1}{\frac{2k}{\log\frac{3}{2}}} \qquad (1.19)$$
since
$$\left(1 - \frac{1}{\frac{2k}{\log\frac{3}{2}}}\right)^{k} = \left(\left(1 - \frac{1}{\frac{2k}{\log\frac{3}{2}}}\right)^{\frac{2k}{\log\frac{3}{2}}}\right)^{\frac{\log\frac{3}{2}}{2}} \approx \left(\frac{1}{e}\right)^{\frac{\log\frac{3}{2}}{2}} = \sqrt{\frac{2}{3}} > \frac{2}{3}. \qquad (1.20)$$
namely,
$$\exp\left(-\frac{m}{48}\right) \le \frac{1}{\frac{2k}{\log\frac{3}{2}}}. \qquad (1.23)$$
We get that
$$m \ge 48 \log\left(\frac{2k}{\log\frac{3}{2}}\right). \qquad (1.24)$$
The first and the last steps can be done in polynomial time due to the definition of NP-completeness. Since $k = O(\mathrm{poly}(|x|))$ and for each i, $|y_i| = O(\mathrm{poly}(|x|))$, the second step also runs in polynomial time. Therefore, the overall running time is polynomial in the size of x. It is also a BPP algorithm, since the probability that all $y_i$ are answered correctly is at least 2/3.
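As an illustration of the amplification used above (our sketch, not from the book), the following shows how a two-sided-error algorithm is boosted by majority voting; `bpp_algorithm` is a hypothetical placeholder for any BPP decision procedure.

```python
# Sketch: majority-vote amplification of a BPP algorithm. Running m
# independent trials and taking the majority drives the error probability
# down exponentially in m (Chernoff's inequality), which is exactly what
# the choice of m in Equation (1.24) exploits. `bpp_algorithm` is assumed
# to answer correctly with probability at least 2/3 on every instance.
def amplified(bpp_algorithm, instance, m):
    yes_votes = sum(1 for _ in range(m) if bpp_algorithm(instance))
    return 2 * yes_votes > m
```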
Next, we prove that a BPP algorithm for SAT provides an RP algorithm for SAT. Let Φ be a conjunctive normal form. Consider the conjunctive normal form $\Phi_1$ that is obtained from Φ by removing all clauses that contain the literal $x_1$ and removing all occurrences of the literal $\overline{x}_1$. Clearly, $\Phi_1$ is satisfiable if and only if Φ has a satisfying assignment in which $x_1$ is TRUE. Consider also the conjunctive normal form $\Phi'_1$ that is obtained from Φ by removing all clauses that contain the literal $\overline{x}_1$ and removing all occurrences of the literal $x_1$. $\Phi'_1$ is satisfiable if and only if Φ has a satisfying assignment in which $x_1$ is FALSE. We can decide whether $\Phi_1$ is satisfiable with a very high probability by repeating the BPP algorithm an appropriate number of times, and we can also do the same with $\Phi'_1$. If one of them is satisfiable, then we can continue with the variable $x_2$, and decide if there is a satisfying assignment of $\Phi_1$ or $\Phi'_1$ in which $x_2$ is TRUE or FALSE. Iterating this procedure, we can build up a candidate for a satisfying assignment. By repeating the BPP algorithm sufficiently many times, we can achieve that the probability that all calculations are correct is at least 1/2. After building up the candidate assignment, we can deterministically check if this candidate is a satisfying assignment, and we answer "yes" to the decision question whether Φ is satisfiable only if we verified that the candidate is indeed a satisfying assignment.
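The construction of $\Phi_1$ and $\Phi'_1$ is simple enough to state in a few lines of code. The following sketch is our illustration, with clauses represented as sets of signed integers (a common convention, not the book's notation):

```python
# Conditioning a CNF formula on a literal: clauses containing the literal
# are satisfied and removed; occurrences of the negated literal are deleted.
# Literal v stands for x_v, and -v stands for its negation.
def condition(cnf, literal):
    out = []
    for clause in cnf:
        if literal in clause:
            continue                      # clause already satisfied
        out.append(clause - {-literal})   # the negated literal cannot help
    return out

phi = [{1, 2}, {-1, 3}, {-2, -3}]
phi1 = condition(phi, 1)   # satisfiable iff phi has a model with x1 = TRUE
```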
We claim that this procedure is an RP algorithm for the SAT problem. If there are n logical variables, then the candidate assignment is built up in n iterations. We can again use Chernoff's inequality to show that O(n) repetitions of the BPP algorithm in each iterative step are sufficient for having at least 1/2 probability that all computations are correct. (Detailed computations are skipped here; the computation is very similar to the previous one.) Therefore, if Φ is satisfiable, then we can construct a candidate assignment which is indeed a satisfying assignment with probability at least 1/2. We can verify that the candidate assignment is indeed a satisfying assignment in polynomial time, and thus we answer "yes" with probability at least 1/2. If Φ is not satisfiable, then either we discover this somewhere during the iteration or we construct a candidate assignment which is actually not a satisfying one. However, we can deterministically verify this, and therefore, if Φ is not satisfiable, then with probability 1 we answer "no".
Finally, we claim that an RP algorithm for SAT provides an RP algorithm
for any problem in NP. This is the direct consequence of Theorem 1. Indeed,
let A be in NP, and let x be a problem instance in A. Then construct the
conjunctive normal form Φ which is satisfiable if and only if the answer for x
is “yes”. By solving the satisfiability problem for Φ with an RP algorithm, we
also solve the decision problem for x.
and the running time of the algorithm is polynomial in |x|, 1/ε and −log(δ). An algorithm with these prescribed properties is itself also called an FPRAS.
Valiant and Vazirani already observed in 1986 that it is hard to count the number of cycles in a directed graph [103]. Below we present this hardness result.
Problem 6.
Name: Cycle.
Input: a directed graph $\vec{G}$.
Output: "yes" if $\vec{G}$ contains a cycle, "no" otherwise.
Theorem 16. The Cycle problem is in P. On the other hand, the following also holds. If there is a deterministic algorithm such that
(a) its input is a directed graph $\vec{G} = (V, E)$,
(b) it runs in polynomial time in the size of $\vec{G}$,
(c) it outputs an estimation $\hat{f}$ of the number of cycles f in $\vec{G}$ satisfying
$$\frac{f}{\mathrm{poly}(n)} \le \hat{f} \le f \times \mathrm{poly}(n),$$
then P = NP.
FIGURE 1.1: The gadget graph replacing a directed edge in the proof of
Theorem 16. See text for details.
on each diamond motif, and any combination of them provides a path from s
to t.
Construct the directed graph $\vec{G}'$ by replacing each edge (u, v) with the gadget graph H. For any cycle in $\vec{G}$ with k edges, there are $2^{n^2 k}$ corresponding cycles in $\vec{G}'$. Therefore, if there is no Hamiltonian cycle in $\vec{G}$, then there are at most
$$\sum_{k=2}^{n-1} \binom{n}{k} k!\, 2^{n^2 k} < n (n-1)!\, 2^{n^2 (n-1)} = n!\, 2^{n^2 (n-1)} \qquad (1.27)$$
cycles in $\vec{G}'$. On the other hand, if there is a Hamiltonian cycle in $\vec{G}$, then there are at least $2^{n^2 n}$ cycles in $\vec{G}'$. The ratio of this latter lower bound and the upper bound in the case of no Hamiltonian cycle is
$$\frac{2^{n^2 n}}{n!\, 2^{n^2 (n-1)}} = \frac{2^{n^2}}{n!} > 2^{\frac{n^2}{2}} \qquad (1.29)$$
which grows faster than any exponential function, and thus faster than any polynomial function. Therefore, any polynomial-factor approximation for the number of cycles in $\vec{G}'$ would provide an answer to the question whether there is a Hamiltonian cycle in $\vec{G}$. Furthermore, the size of $\vec{G}'$ is only a polynomial function of the size of $\vec{G}$, thus H-Cycle has a polynomial reduction to the polynomial approximation of #Cycle. That is, it is NP-hard to give a polynomial-factor approximation for the number of cycles in a directed graph.
An FPRAS approximation for the number of cycles would provide a BPP algorithm for the H-Cycle problem. Indeed, an FPRAS approximating the number of cycles in $\vec{G}'$ with parameters ε = 1 and δ = 1/3 would separate the cases when $\vec{G}$ has and does not have a Hamiltonian cycle with probability at least 2/3. Since H-Cycle is NP-complete, the intersection of NP-complete and BPP would not be empty, which would imply that RP = NP, according to Theorem 14.
The situation remains the same if we can only generate roughly uniformly distributed cycles from a directed graph. Below we state this precisely after defining the necessary ingredients of the statement.
Definition 16. The total variation distance of two discrete distributions p and π over the same (countable) domain X is defined as
$$d_{TV}(p, \pi) := \frac{1}{2} \sum_{x \in X} |p(x) - \pi(x)|. \qquad (1.30)$$
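For concreteness, Definition 16 can be computed directly when the distributions are given explicitly; the following small sketch (ours) does so for distributions stored as dictionaries:

```python
# Total variation distance of two discrete distributions, Equation (1.30).
def total_variation(p, pi):
    support = set(p) | set(pi)
    return 0.5 * sum(abs(p.get(x, 0.0) - pi.get(x, 0.0)) for x in support)

# Example: a fair coin versus a biased coin has d_TV = 0.3.
assert abs(total_variation({'H': 0.5, 'T': 0.5}, {'H': 0.8, 'T': 0.2}) - 0.3) < 1e-12
```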
$$d_{TV}(p, U) \le \varepsilon \qquad (1.31)$$
where U is the uniform distribution of the witnesses. The algorithm must run in polynomial time both in the size of the problem instance and in −log(ε). This algorithm itself is also called an FPAUS.
There is a strong relationship between the complexity classes FPRAS and
FPAUS. In Chapter 7 we will show that for a large class of counting prob-
lems, called self-reducible counting problems, there is an FPRAS algorithm
for a particular counting problem if and only if there is an FPAUS algorithm
for it. Here we state and prove that an FPAUS for sampling cycles from a di-
rected graph would have the same consequence as the existence of an FPRAS
algorithm for counting the cycles in a directed graph.
Theorem 17. If there is an FPAUS algorithm for #Cycle, then RP = NP.
Proof. According to Theorem 14, it is sufficient to show that an FPAUS would provide a BPP algorithm for H-Cycle. Assume that there is an FPAUS for sampling cycles from a directed graph. For any directed graph $\vec{G}$, construct the same graph $\vec{G}'$ that we constructed in the proof of Theorem 16. Apply the FPAUS algorithm on $\vec{G}'$ using ε = 1/10, and generate one cycle. Map this cycle back to $\vec{G}$. If it is a Hamiltonian cycle, then answer "yes", otherwise answer "no".
We claim that this is a BPP algorithm (actually, an RP algorithm). If there is no Hamiltonian cycle in $\vec{G}$, then the algorithm answers "no" with probability 1.
assuming that $\vec{G}$ has at least 2 vertices. Note that the distribution p has at least 2/3 probability on the cycles in $\vec{G}'$ that correspond to Hamiltonian cycles in $\vec{G}$, otherwise we would have
$$\frac{1}{10} \ge d_{TV}(p, U) \ge \frac{1}{2} \sum_{x \in H} |U(x) - p(x)| \ge \frac{1}{2} \left( \sum_{x \in H} U(x) - \sum_{x \in H} p(x) \right) \ge \frac{1}{2} \left( \frac{16}{18} - \frac{2}{3} \right) = \frac{1}{9} \qquad (1.33)$$
(a) Chapter 2 describes the easiest counting problems. These are the prob-
lems whose decision, optimization and counting versions can be univer-
sally solved with dynamic programming algorithms. The computations
in these dynamic programming algorithms use only additions and multi-
plications, which we call monotone computations. We are going to show
that from an algebraic point of view, the logical OR and AND opera-
tions can be considered as additions and multiplications. Similarly, addi-
tion and multiplication can be replaced with minimization and addition
without changing the algebraic properties of the computations. We are
going to show that a large class of dynamic programming algorithms
have some universal algebraic properties, therefore essentially the same
algorithms can solve the decision, optimization and counting versions of
a given problem. If the universal algorithm has polynomial running time,
then the problem it solves has a decision version in P and a counting
version in FP, and furthermore, optimal solutions can be found also in
polynomial time.
(b) Chapter 3 introduces counting problems solvable in polynomial time us-
ing subtraction. Particularly, the number of spanning trees in a graph
as well as the number of Eulerian circuits in a directed Eulerian graph
are related to the determinant of certain matrices, and the number of
perfect matchings in a planar graph is the Pfaffian of an appropriately
oriented adjacency matrix of the graph. We are also going to show that
both the determinant and the Pfaffian can be computed in polynomial
time using only additions, subtractions and multiplications, and there-
fore computations can be generalized to arbitrary commutative rings.
These algorithms can also be viewed as monotone computations on certain combinatorial objects. The signed and weighted sums of these combinatorial objects coincide with the determinants and Pfaffians of matrices via cancellations.
(c) We give a comprehensive list of #P-complete problems in Chapter 4. We highlight those problems for which approximation-preserving #P-completeness proofs exist, and therefore, there is no FPRAS approximation for these problems unless RP = NP.
(d) Chapter 5 is about a relatively new topic, holographic algorithms. A
holographic reduction is a linear algebraic many-to-many mapping be-
tween two sets. Such a mapping can prove that the sizes of the two
sets are the same without explicitly giving a one-to-one correspondence
between the two sets. Therefore, if the cardinality of one of the sets
can be obtained in polynomial time, it can be done for the other set.
Usually the holographic reduction maps a set to the (weighted) perfect
matchings of planar graphs. Computing the sum of the weighted perfect
matchings provides a tractable way to obtain the cardinality of the set.
Holographic reductions are also used to obtain equalities of the cardinalities of two sets where finding the cardinality of one of the sets is known to be #P-hard. This provides a proof that finding the cardinality of the other set is also #P-hard.
(e) We turn to sampling methods in Chapter 6. We show how to sample uniformly those combinatorial objects that can be counted with algebraic dynamic programming. This is followed by an introduction to methods of random generation providing techniques for almost uniform generation. Here Markov chains are the most useful technique.
(f) Chapter 7 is devoted to the theory of the mixing of Markov chains. It
turns out that in many cases, rapidly mixing Markov chains provide
almost uniform generations of certain combinatorial objects. Markov
chains are used to build a dichotomy theory for self-reducible problems:
a self-reducible counting problem either has an FPRAS or cannot be
approximated within a polynomial approximation factor. We also show
that any self-reducible counting problem has an FPRAS if and only if
it has an FPAUS. The consequence is that in many cases, we can prove
that a counting problem is in FPRAS by showing that there exists a
Markov chain which converges rapidly to the uniform distribution of
the witnesses.
(g) Finally, Chapter 8 provides a comprehensive list of counting problems
for which FPRAS exists.
1.8 Exercises
1. Algorithm A has running time 100000n, and algorithm B has running
time 0.1n3 , where n is the input size. Which algorithm is faster if the
input size is typically between 30 and 80?
2. Algorithm A has running time n3 , and algorithm B has a running time
1.01n , where n is the input size. Which algorithm is faster if the input
size is 1000?
3. An algorithm has running time n81 , where n is the input size. What is the
largest input size for which the algorithm finishes in a year running on a
supercomputer achieving 100 exaflops? An exaflop means 1018 floating
point operations per second. Assume that one computational step needs
one floating point operation. How many years does it take to run this
algorithm on the mentioned supercomputer if the input size is n = 3?
4. * Let B be the unit Euclidean ball in a d-dimensional space. Let S be
6. ◦ Assume that the decision problem A has a random algorithm with the
following properties:
(a) Its running time grows polynomially with the size of the input.
(b) If the correct answer is “yes”, it answers “yes” with probability
1/2.
(c) If the correct answer is "no", it answers "no" with probability $\frac{1}{2} + \frac{1}{n^3}$, where n is the size of the problem instance.
(a) Its running time grows polynomially with the size of the input.
(b) If the correct answer is "yes", it answers "yes" with probability $\frac{1}{n^3}$, where n is the size of the problem instance.
(c) If the correct answer is “no”, it answers “no” with probability 1.
11. ◦ Assume that problem A is in RP. Show that there is a random algorithm that, for any problem instance x in A and ε > 0,
12. Prove that the total variation distance is indeed a distance on the space
of distributions over the same countable domain. For any such domain
X, let F be the set of distributions over X. Let p1 , p2 , p3 ∈ F. Then the
following relations hold.
$$d_{TV}(p_1, p_1) = 0$$
$$d_{TV}(p_1, p_2) \ge 0$$
$$d_{TV}(p_1, p_2) = d_{TV}(p_2, p_1)$$
$$d_{TV}(p_1, p_2) + d_{TV}(p_2, p_3) \ge d_{TV}(p_1, p_3).$$
13. * Let π and p be two arbitrary distributions over the same countable domain X. Prove that
$$d_{TV}(p, \pi) = \max_{A \subseteq X} |p(A) - \pi(A)|,$$
where
$$p(A) := \sum_{x \in A} p(x).$$
14. Prove that the inequality in Equation (1.32) holds for any n ≥ 2 and
c ≥ 1.
15. ◦ Assume that the correct answer f of a problem instance x in the counting problem #A is an integer, naturally bounded by 1 and $c^{\mathrm{poly}(|x|)}$ for some c > 1. Furthermore, assume that #A is in FPRAS. Show that a random estimation $\hat{f}$ of the solution can be given with the following properties:
$$\frac{|\hat{f} - f|}{f} \le \frac{1}{\mathrm{poly}(|x|)}.$$
$$\frac{\sigma^2_{\hat{f}}}{f} \le \frac{1}{\mathrm{poly}(|x|)}.$$
1.9 Solutions
Exercise 4. It is sufficient to show that for any regular d-simplex, the ratio of the radii of the inscribed and circumscribed hyperspheres is $\frac{1}{d}$. Let the vertices of the regular d-simplex S be the unit vectors of the coordinate system of a (d+1)-dimensional space (that is, (1, 0, 0, . . . , 0), (0, 1, 0, 0, . . . , 0), etc.). The inscribed hypersphere hits the surface of the simplex in the middle of the facets. The coordinates of these points are $\left(0, \frac{1}{d}, \frac{1}{d}, \ldots, \frac{1}{d}\right)$, $\left(\frac{1}{d}, 0, \frac{1}{d}, \ldots, \frac{1}{d}\right)$, etc. Observe that these points are also vertices of a regular d-simplex S′, whose circumscribed hypersphere is the inscribed hypersphere of S. Thus the ratio of the radii of the inscribed and circumscribed hyperspheres of S is the ratio of the edge lengths of S′ and S. Since the edge length of S′ is $\frac{\sqrt{2}}{d}$ and the edge length of S is $\sqrt{2}$, the ratio in question is indeed $\frac{1}{d}$.
Exercise 5. Let $D = \{d_1, d_2, \ldots, d_n\}$. The convex polytope is in $\mathbb{R}^{\binom{n}{2}}$. Let us denote the coordinates by the index pairs (i, j), where i < j. The linear inequalities defining the polytope are
$$0 \le x_{(i,j)} \le 1$$
The bijection between the realizations G = (V, E) and the integer points in
$$B := \{x \mid p(x) \ge \pi(x)\}.$$
Observe that
$$\sum_{x \in B} |p(x) - \pi(x)| = \sum_{x \in \overline{B}} |p(x) - \pi(x)|.$$
Indeed,
$$\sum_{x \in B} (p(x) - \pi(x)) + \sum_{x \in \overline{B}} (p(x) - \pi(x)) = \sum_{x \in B \cup \overline{B}} p(x) - \sum_{x \in B \cup \overline{B}} \pi(x) = 0,$$
therefore
$$\sum_{x \in B} (p(x) - \pi(x)) = -\sum_{x \in \overline{B}} (p(x) - \pi(x)).$$
$$p(C) - \pi(C) = p(B) - \pi(B) - \left(p(B \setminus C) - \pi(B \setminus C)\right) + \left(p(C \setminus B) - \pi(C \setminus B)\right)$$
holds. However, $p(B \setminus C) - \pi(B \setminus C)$ cannot be negative and $p(C \setminus B) - \pi(C \setminus B)$ cannot be positive due to the definition of B. Therefore,
A product is 1 if and only if for all i, $a_{i,\sigma(i)} = 1$, namely, $(i, \sigma(i))$ is a directed edge. We claim that there is a bijection between the permutations for which the product is 1 and the cycle covers in $\vec{G}$. The bijection is given by the two representations of σ: the function representation, namely, for each i, σ(i) is given, and the cycle representation of σ. Indeed, if a product is 1, then each $a_{i,\sigma(i)} = 1$, and these edges form a cycle cover in $\vec{G}$. On the other hand, each cycle cover indicates a permutation σ such that each $a_{i,\sigma(i)} = 1$.
Part I
Computational Complexity
of Counting
Chapter 2
Algebraic dynamic programming and
monotone computations
In this book, we extend this theorem to essentially arbitrary dynamic programming recursions. The main idea is to separate the recursions building up combinatorial objects from the computations evaluating them. To do this, we are going to intro-
duce yield algebras that build the combinatorial objects (see Definition 19)
and evaluation algebras that do the computations on these combinatorial ob-
jects (see Definition 21). Before giving the formal definitions, we introduce the
concept via several well-known examples.
$$A_{i+1} = A_i + N_i \qquad (2.1)$$
and
$$N_{i+1} = A_i. \qquad (2.2)$$
Using these recursions, it is easy to find the number of couples in month i. Indeed, let $F_i$ denote the total number of couples in the ith month. Then
$$F_{i+1} = A_{i+1} + N_{i+1} = (A_i + N_i) + A_i = F_i + F_{i-1}, \qquad (2.3)$$
since $A_i = A_{i-1} + N_{i-1} = F_{i-1}$; this is the well-known recursion for the Fibonacci numbers. Assume also that we would like to count the children, grandchildren and great-grandchildren of the founding couple, and in general, the rabbit couples of the kth generation in
month i. Then we can assign the following characteristic polynomial to each
subpopulation:
$$g_{A_i} := \sum_{k=1}^{i} a_{i,k} x^k \qquad (2.4)$$
$$g_{N_i} := \sum_{k=1}^{i} n_{i,k} x^k \qquad (2.5)$$
where $a_{i,k}$ is the number of adult couples of the kth generation in the ith month and $n_{i,k}$ is the number of newborn couples of the kth generation in the ith month. The first generation is the founding couple. The recursions for the characteristic polynomials are:
$$g_{A_{i+1}} = g_{A_i} + g_{N_i} \qquad (2.6)$$
$$g_{N_{i+1}} = x\, g_{A_i}. \qquad (2.7)$$
The number of adult (newborn) rabbit couples of the kth generation in the ith month is the coefficient of $x^k$ in $g_{A_i}$ ($g_{N_i}$). Let $r_{i,k}$ denote the total number of rabbit couples of the kth generation in the ith month. Then
$$F_i = \sum_{k=1}^{\lfloor\frac{i+1}{2}\rfloor} r_{i,k} = \sum_{k=1}^{\lfloor\frac{i+1}{2}\rfloor} \binom{i-k}{k-1} = \sum_{k=0}^{\lfloor\frac{i-1}{2}\rfloor} \binom{i-k-1}{k}, \qquad (2.11)$$
which is the well-known equality saying that the Fibonacci numbers are the
sums of the “shallow” diagonals in Pascal’s triangle.
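The recursions (2.6) and (2.7) are easy to run directly. The following sketch (our illustration) represents the characteristic polynomials as dictionaries mapping the generation k to its coefficient; summing all coefficients recovers the Fibonacci numbers:

```python
# Characteristic polynomials g_{A_i}, g_{N_i} of the rabbit population,
# following Equations (2.6) and (2.7); keys are generations, values are counts.
def rabbit_polynomials(months):
    gA, gN = {}, {1: 1}                  # month 1: one newborn founding couple
    for _ in range(months - 1):
        new_gA = {k: gA.get(k, 0) + gN.get(k, 0) for k in set(gA) | set(gN)}
        new_gN = {k + 1: v for k, v in gA.items()}  # children are one generation later
        gA, gN = new_gA, new_gN
    return gA, gN

gA, gN = rabbit_polynomials(6)
assert sum(gA.values()) + sum(gN.values()) == 8     # F_6 = 8
```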
There is an obvious similarity between Equations (2.1) and (2.2) and Equations (2.6) and (2.7). Both pairs of recursions calculate sums over sets. Indeed, let $\mathcal{A}_i$ denote the set of adult rabbit couples in the ith month. Let $f_1$ assign the constant 1 to each rabbit couple, and let $f_2$ assign $x^k$ to a rabbit couple in the kth generation. Then
$$A_i = \sum_{r \in \mathcal{A}_i} f_1(r) \qquad (2.12)$$
and
$$g_{A_i} = \sum_{r \in \mathcal{A}_i} f_2(r), \qquad (2.13)$$
and similar equations hold for $\mathcal{N}_i$ and $g_{N_i}$. We can also obtain recursions directly on the sets. Define two operations on rabbits, the o: "get one month older!" and the b: "give birth!" operators. Also define the action of these operators on sets of rabbit couples as
$$o * \mathcal{S} := \{o * r \mid r \in \mathcal{S}\} \qquad (2.14)$$
and similarly for b. Then the recursions for the subsets of populations are
$$\mathcal{A}_{i+1} = o * \mathcal{A}_i \cup o * \mathcal{N}_i \qquad (2.15)$$
$$\mathcal{N}_{i+1} = b * \mathcal{A}_i. \qquad (2.16)$$
careful reader can also observe that the obtained sequences from the alphabet
{1, 2} decode the possible patterns of short and long syllables of a given total
duration, thus the numbers of such patterns are also indeed the Fibonacci
numbers.
As we can see, the Fibonacci numbers also appear as the numbers of certain legitimate code words or regular expressions. We can count the words matching any regular expression using recursions. Consider the following example.
$$L_{i+1} = (L_i \circ 0) \cup (L_i \circ 1) \cup (L_i \circ 3) \cup (I_i \circ 2) \qquad (2.18)$$
and
$$I_{i+1} = (I_i \circ 0) \cup (I_i \circ 1) \cup (I_i \circ 3) \cup (L_i \circ 2) \qquad (2.19)$$
with the initial sets $L_1 = \{2\}$ and $I_1 = \{0, 1, 3\}$. To count the legitimate and illegitimate code words, we simply have to find the sizes of the sets appearing in Equations (2.18) and (2.19). Observe that each code word appears exactly once during the recursions, therefore
$$n_{L_{i+1}} = 3 n_{L_i} + n_{I_i} \qquad (2.20)$$
and
$$n_{I_{i+1}} = 3 n_{I_i} + n_{L_i} \qquad (2.21)$$
where $n_{L_i}$ ($n_{I_i}$) is the number of legitimate (illegitimate) code words of length i. The initial values are $n_{L_1} = 1$ and $n_{I_1} = 3$.
It is possible to find a closed form for such linear recurrences; such solutions
can be found in any standard combinatorics textbook. From a computational
complexity point of view, the given recursions are sufficient to calculate nLi
in a time proportional to some polynomial of i, therefore we are not going to
provide closed forms here.
We also obtain recursions for the sum of the numbers in legitimate and illegitimate code words of a given length from the recursions in Equations (2.18) and (2.19). Extending each code word in a set of size m with a number k increases the sum of the numbers by mk. Therefore the recursions are
$$s_{L_{i+1}} = 3s_{L_i} + 4n_{L_i} + s_{I_i} + 2n_{I_i} \qquad (2.22)$$
and
$$s_{I_{i+1}} = 3s_{I_i} + 4n_{I_i} + s_{L_i} + 2n_{L_i} \qquad (2.23)$$
where $s_{L_i}$ ($s_{I_i}$) is the sum of the numbers in the legitimate (illegitimate) code words of length i. These recursions are sufficient to find the sum of the numbers in legitimate code words of length n using O(poly(n)) arithmetic operations.
The careful reader might observe some similarity between Equation (2.28)
and Equations (2.22) and (2.23). In both cases, the recursion finds the sum of
an additive function over combinatorial objects, and the recursions use both
these sums and the sizes of these smaller sets. We are going to introduce a
commutative ring and will show that calculations involving these recursions
can be naturally described in that ring.
In many textbooks, the introductory example for dynamic programming
algorithms is the money change problem. We too introduce this problem to-
gether with its solution and show how it also fits into the algebraic dynamic
programming approach.
Definition 18. Let C = {c1 , c2 , . . . , ck } be a finite set of positive integers
called the coin system, and let x be a non-negative integer. The money change
problem is to find the minimum number of coins necessary to change x. Each
coin type has an infinite supply.
To solve the money change problem, a function m is defined that maps the natural numbers $\mathbb{N}$ to $\mathbb{N} \cup \{\infty\}$; for each $x \in \mathbb{N}$, the value m(x) is the minimum number of coins necessary to change x if changing x is possible with the coin set C, and ∞ otherwise. The following theorem is true for the function m.
Theorem 18. For x = 0, the equation
$$m(0) = 0 \qquad (2.29)$$
holds, and for any x > 0,
$$m(x) = \min_{\substack{i \in [1,k] \\ x - c_i \ge 0}} \{m(x - c_i) + 1\}. \qquad (2.30)$$
Proof. If a minimal change for x > 0 exists, removing its last coin $c_i$ yields a change for $x - c_i$, hence $m(x - c_i) \le m(x) - 1$. This means that for any x, the left-hand side of Equation (2.30) is greater than or equal to the right-hand side. If the set from which the minimum is
taken on the right-hand side is empty for some x, then it is defined as ∞ and
naturally the inequality $m(x) \le \infty$ holds. If the set is not empty, then let $c_0$ be the coin value at which the minimum is taken, and let $c_{i_1}, c_{i_2}, \ldots, c_{i_{m(x-c_0)}}$ be a minimal change for $x - c_0$. Then the set of coins $c_{i_1}, c_{i_2}, \ldots, c_{i_{m(x-c_0)}}, c_0$ is a change for x, it contains $m(x - c_0) + 1$ coins, and this number cannot be less than m(x). Therefore
$$m(x) \le m(x - c_0) + 1 = \min_{\substack{i \in [1,k] \\ x - c_i \ge 0}} \{m(x - c_i) + 1\}. \qquad (2.34)$$
Since for all x > 0 inequalities in both directions hold, Equation (2.30) also holds.
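The recursion of Theorem 18 translates directly into a dynamic programming algorithm. The following minimal sketch is ours; the coin system in the test is an arbitrary choice:

```python
# Money change by Equations (2.29)-(2.30); float('inf') encodes "x cannot
# be changed with this coin set".
def min_coins(coins, x):
    m = [0] + [float('inf')] * x
    for v in range(1, x + 1):
        for c in coins:
            if v - c >= 0:
                m[v] = min(m[v], m[v - c] + 1)
    return m[x]

assert min_coins([1, 5, 7], 24) == 4      # 24 = 5 + 5 + 7 + 7
```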
We can also build recursions on the possible coin sequences. Such recur-
sions hold without stating what we want to calculate. After proving the cor-
rectness of the recursion on the possible coin sequences, we can solve different
optimization and counting problems by replacing the operations in the given
recursions. The construction is stated in the following theorem.
Theorem 19. Let S(x) denote the set of all coin sequences whose values sum to x. Then
$$S(0) = \{\varepsilon\} \qquad (2.35)$$
and for all x > 0,
$$S(x) = \bigsqcup_{\substack{i \in [1,k] \\ x - c_i \ge 0}} \{s \circ c_i \mid s \in S(x - c_i)\} \qquad (2.36)$$
where ε is the empty string, ◦ denotes string concatenation, and the use of disjoint union is to emphasize that each string in S(x) is generated exactly once. Furthermore, the empty disjoint union (which is the case when for all $c_i$, $x - c_i < 0$) is defined as the empty set.
Proof. Equation (2.35) is trivial: only the empty sum of the coin values has value 0. To prove Equation (2.36), it is sufficient to prove that
$$S(x) \subseteq \bigsqcup_{\substack{i \in [1,k] \\ x - c_i \ge 0}} \{s \circ c_i \mid s \in S(x - c_i)\} \quad \forall x > 0 \qquad (2.37)$$
and
$$S(x) \supseteq \bigsqcup_{\substack{i \in [1,k] \\ x - c_i \ge 0}} \{s \circ c_i \mid s \in S(x - c_i)\} \quad \forall x > 0. \qquad (2.38)$$
To prove Equation (2.37), observe that the possible last coins define a partition on S(x), and for each sequence in the partition, if the last coin $c_i$ is removed, the remaining sequence will be in $S(x - c_i)$. Thus any sequence in S(x) is in $\bigsqcup_{i \in [1,k],\, x - c_i \ge 0} \{s \circ c_i \mid s \in S(x - c_i)\}$.
To prove Equation (2.38), observe that each sequence in $\bigsqcup_{i \in [1,k],\, x - c_i \ge 0} \{s \circ c_i \mid s \in S(x - c_i)\}$ is in S(x), since the sum of the coin values in each sequence is x. Furthermore, there is no multiplicity in $\bigsqcup_{i \in [1,k],\, x - c_i \ge 0} \{s \circ c_i \mid s \in S(x - c_i)\}$, since there is no multiplicity in any of the $S(x - c_i)$ sets and two sequences cannot be the same if their last coin values are different.
$$|S(0)| = 1 \qquad (2.40)$$
can be read out from Equation (2.35). This calculation can be formalized in the following way. Let $f(s) := 1$ for any coin sequence s, and let
$$F(S(x)) := \sum_{s \in S(x)} f(s). \qquad (2.41)$$
Then F(S(x)) will be the size of the set S(x), and it holds that
$$F(S(x)) = \sum_{\substack{i \in [1,k] \\ x - c_i \ge 0}} F(S(x - c_i)). \qquad (2.42)$$
This is the evaluation algebra, which can be changed without changing the underlying yield algebra. For example, let $g(s) := z^{|s|}$ (here z is an indeterminate) and let
$$G(S(x)) := \sum_{s \in S(x)} g(s). \qquad (2.43)$$
Then
$$g(s \circ c_i) = g(s)\, z, \qquad (2.44)$$
thus
$$G(S(x)) = \sum_{\substack{i \in [1,k] \\ x - c_i \ge 0}} G(S(x - c_i))\, z. \qquad (2.45)$$
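Replacing minimization by summation in the same table-filling scheme counts the coin sequences, as in Equation (2.42). A short sketch (ours):

```python
# Number of (ordered) coin sequences summing to x, Equation (2.42).
def count_sequences(coins, x):
    F = [1] + [0] * x                      # F(S(0)) = 1: the empty sequence
    for v in range(1, x + 1):
        F[v] = sum(F[v - c] for c in coins if v >= c)
    return F[x]

assert count_sequences([1, 2], 4) == 5     # 1111, 112, 121, 211, 22
```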
$$a_1 \circ a_2 \circ \ldots \circ a_m$$
is also written as
$$\circ\, (a_j)_{j=1}^{m}.$$
where
$$S(\theta) := \{a \in A \mid p(a) = \theta\} \qquad (2.47)$$
and
$$\circ_i \left(S(\theta_{i,j})\right)_{j=1}^{m_i} := \left\{\circ_i (a_j)_{j=1}^{m_i} \mid a_j \in S(\theta_{i,j})\right\} \qquad (2.48)$$
$$a \odot (b \oplus c) = (a \odot b) \oplus (a \odot c) \qquad (2.49)$$
and
$$(b \oplus c) \odot a = (b \odot a) \oplus (c \odot a) \qquad (2.50)$$
hold. The additive unit is usually denoted by 0.
We would like to emphasize that any ring is also a semiring in which the
addition can be inverted (that is, there is also a subtraction). Readers not
familiar with abstract algebra might consider the integer ring as an example
of semirings with the additional rule that subtraction is forbidden. Later on,
we will see that subtraction has very high computational power. Some com-
putational problems can be solved in polynomial time when subtraction is
allowed and otherwise those computational problems have exponential lower
bound for their running time when only addition and multiplication are al-
lowed. In this chapter, we would like to study the computational problems
efficiently solvable using only additions and multiplications. This is why we
restrict evaluation algebras to semiring computations.
Definition 21. An evaluation algebra is a tuple $\{Y, R, f, T\}$, where Y is a yield algebra, and f is a function mapping A (the set of objects in the yield algebra) to some semiring R. T is a set of functions $T_i$ mapping $R^{m_i} \times \Theta^{m_i}$ to R, where $\circ_i$ is an $m_i$-ary operator, with the property that for any operands of $\circ_i$
$$f\left(\circ_i (a_j)_{j=1}^{m_i}\right) = T_i\left(f(a_1), \ldots, f(a_{m_i});\ p(a_1), \ldots, p(a_{m_i})\right). \qquad (2.51)$$
In many cases, the operator $T_i$ does not depend on the parameters (i.e., $p(a_1), \ldots, p(a_{m_i})$). In those cases, the parameters will be omitted. We also require that each $T_i$ should be expressed with operations in the algebraic structure R. When $T_i$ depends on the parameters, the expression is given via a hidden function h mapping $\Theta^{m_i}$ to some $R^{n_i}$, and then $T_i$ is rendered as an algebraic expression of $m_i + n_i$ indeterminates (the $m_i$ values of the $f(a_j)$s and the $n_i$ values coming from the image of h). Each operation $T_i \in T$ must satisfy the distributive rule with respect to the addition in the semiring, that is, for any $\theta_{i,1}, \ldots, \theta_{i,m_i}$
$$T_i\left(\sum_{a_{i,1} \in S(\theta_{i,1})} f(a_{i,1}), \ldots, \sum_{a_{i,m_i} \in S(\theta_{i,m_i})} f(a_{i,m_i});\ \theta_{i,1}, \ldots, \theta_{i,m_i}\right) = \sum_{a_{i,1} \in S(\theta_{i,1})} \cdots \sum_{a_{i,m_i} \in S(\theta_{i,m_i})} f\left(\circ_i (a_{i,j})_{j=1}^{m_i}\right). \qquad (2.52)$$
Thus, the evaluation algebra tells us what to calculate in the recursions in the
yield algebra given in Equation (2.46). Due to the properties of the evaluation
algebra, the following theorem holds.
Theorem 20. Let $Y = \{A, (\Theta, \le), p, O, R\}$ be a yield algebra, and let $E = \{Y, R, f, T\}$ be an evaluation algebra. If for some parameter θ, the recursion
$$S(\theta) = \bigsqcup_{i=1}^{n} \circ_i \left(S(\theta_{i,j})\right)_{j=1}^{m_i} \qquad (2.54)$$
holds, then
$$F(S(\theta)) = \bigoplus_{i=1}^{n} T_i\left(F(S(\theta_{i,1})), \ldots, F(S(\theta_{i,m_i}));\ \theta_{i,1}, \ldots, \theta_{i,m_i}\right) \qquad (2.55)$$
also holds. Namely, it is sufficient to know the values $F(S(\theta_{i,j}))$ for each $S(\theta_{i,j})$ to be able to calculate $F(S(\theta))$.
We can write back the definition of the F function (that is, Equation (2.53))
into the arguments of the Ti function, so we get Equation (2.55).
The amount of computation necessary to calculate F(S(θ)) depends on how many θ′ < θ exist in Θ, how big n is in Equation (2.55), and how much time it takes to calculate each $T_i$. This is stated in the theorem below.
Theorem 21. A computational problem X (X might be a decision problem,
optimization problem or a counting problem) can be solved with algebraic dy-
namic programming in polynomial running time, if a pair of yield algebra,
Y = {A, (Θ, ≤), p, O, R} and evaluation algebra E = {Y, R, f, T } exists such
that for any problem instance x ∈ X, the following holds:
usual ordering, and p maps the sequences to their summed coin values. The
unary operator ◦i extends a coin sequence with the coin ci . The recursions are
given in Equations (2.35) and (2.36). For this yield algebra, we give several
evaluation algebras that solve the following computational problems.
$$a \odot \infty = \infty \odot a = \infty \quad \forall a \in R \qquad (2.58)$$
and $a \oplus b$ is the minimum of a and b, extended by the symbol of infinity with the identity relations
$$a \oplus \infty = \infty \oplus a = a \quad \forall a \in R. \qquad (2.59)$$
which is exactly Equation (2.30) with the notation of the operators in the
tropical semiring and m(x) is denoted by F (S(x)).
We are going to introduce three variants of the tropical semiring, which are also used in optimization problems.
Definition 23. The dual tropical semiring (R, ⊕, ⊙) is a semiring where $R = \mathbb{R} \cup \{-\infty\}$, a ⊕ b is the maximum of a and b, and ⊙ is the usual addition. The dual tropical semiring can be obtained from the tropical semiring by multiplying each element by −1.
The exponentiated tropical semiring (R, ⊕, ⊙) is a semiring where $R = \mathbb{R}^+ \cup \{\infty\}$, a ⊕ b is the minimum of a and b, and ⊙ is the usual multiplication. The exponentiated tropical semiring can be obtained from the tropical semiring by taking the exponent of each element.
The dual exponentiated tropical semiring (R, ⊕, ⊙) is a semiring where $R = \mathbb{R}^+ \cup \{0\}$, a ⊕ b is the maximum of a and b, and ⊙ is the usual multiplication. We will denote $\mathbb{R}^+ \cup \{0\}$ by $\mathbb{R}_{\ge 0}$. The dual exponentiated tropical semiring can be obtained from the tropical semiring by first multiplying each element by −1 and then taking the exponent of it.
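To make the role of the semiring concrete, the following sketch (our illustration, not the book's notation) runs one and the same money-change recursion with pluggable semiring operations; instantiated with (+, ×) it counts coin sequences, instantiated with (min, +), the tropical semiring, it computes the minimum number of coins:

```python
INF = float('inf')

def change_dp(coins, x, zero, unit, add, mul, coin_weight):
    # F[v] accumulates the semiring value of S(v); `coin_weight` is the
    # multiplicative contribution of appending one coin to a sequence.
    F = [unit] + [zero] * x
    for v in range(1, x + 1):
        for c in coins:
            if v >= c:
                F[v] = add(F[v], mul(F[v - c], coin_weight))
    return F[x]

coins = [1, 5, 7]
n_seq = change_dp(coins, 12, 0, 1, lambda a, b: a + b, lambda a, b: a * b, 1)
m_min = change_dp(coins, 12, INF, 0, min, lambda a, b: a + b, 1)   # tropical
assert m_min == 2                                                  # 12 = 5 + 7
```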
holds for any $r_1, r_2, r_3 \in \mathbb{N} \times R$. Here the first coordinate is the number of coin sequences and the second coordinate is the total sum of weights. For any two sets of coin sequences, both the number of sequences and the sum of the weights are added in the union of the two sets. For the operation $\circ_{c_i}$,
The empty sum in this ring is (0, 0), since (0, 0) is the additive unit.
2.1.3.5 Counting the coin sequences when the order does not count
When the order of the coins does not count, the base set A must be changed
(and so the yield algebra), since several coin sequences contain the same coins,
just in different order. In this case, the base set A must contain only the non-
decreasing coin sequences. (Θ, ≤) contains the pairs N × C, where C is the set
of coin values, and the partial ordering is coordinatewise, that is, for any two
members of Θ, (n1 , c1 ) ≤ (n2 , c2 ) if and only if n1 ≤ n2 and c1 ≤ c2 . The p
function is
$$p(c_1 c_2 \ldots c_n) := \left(\sum_{i=1}^{n} c_i,\ c_n\right). \qquad (2.66)$$
The operator set O contains the unary operators $\circ_{c_i}$, which still concatenate $c_i$ to the end of a coin sequence. However, this operator can be applied only to the sequences that end with a coin whose value is at most $c_i$. This is guaranteed in the recursion of the yield algebra, since the recursion is
$$S((x, c_i)) = \bigsqcup_{\substack{j \in [1,k] \\ c_j \le c_i,\ x - c_i \ge 0}} \circ_{c_i}\left(S((x - c_i, c_j))\right). \qquad (2.67)$$
Once the yield algebra is obtained, several evaluation algebras can be associated with it. For example, to count the number of sets of coins that sum up to a given value, the f function is the constant 1 in the evaluation algebra and each $T_i$ operator is the identity function.
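When the order does not count, the classical implementation processes the coin types one by one, which enforces a non-decreasing coin order exactly as the restricted yield algebra does. A sketch (ours) of this idea:

```python
# Number of coin multisets summing to x: iterating coins in a fixed order
# plays the role of the (value, last coin) parameters in the yield algebra.
def count_multisets(coins, x):
    ways = [1] + [0] * x
    for c in sorted(coins):
        for v in range(c, x + 1):
            ways[v] += ways[v - c]
    return ways[x]

assert count_multisets([1, 2], 4) == 3     # {1,1,1,1}, {1,1,2}, {2,2}
```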
where only those members of the summation are indicated for which $h(g) \ne 0$.
Both notations (mappings and formal summations) are used below.
An example for a monoid semiring is the integer polynomial ring Z[x].
Here the monoid is the one variable free monoid generated by x. The semiring
is the integer numbers. Although the integer numbers with the usual addition
and multiplication form a ring, we do not use the subtractions of this ring in
algebraic dynamic programming algorithms. Another example for a monoid
semiring is the natural number polynomial semiring, N[x], that is, the integer
polynomials with non-negative coefficients. 0 is considered to be a natural
number to get a semiring for the usual addition and multiplication. In fact,
N[x] is a sub-semiring of the Z[x] semiring, and that is the semiring that we use in algebraic dynamic programming algorithms, even if Z[x] is given as the algebraic structure in the evaluation algebra.
Monoid semirings over the natural number semiring can be used to build
evaluation algebras if the combinatorial objects are scored by a multiplica-
tive function based on some monoid. This is precisely stated in the following
theorem.
Theorem 22. Let $Y = (A, \Theta, O, R)$ be a yield algebra, G a (commutative) monoid, and let $f: A \to G$ be a function such that for any m-ary operator $\circ \in O$, a function $h_\circ: \Theta^m \to G$ exists for which the equality
$$f\left(\circ\,(a_i)_{i=1}^{m}\right) = h_\circ(\theta_1, \theta_2, \ldots, \theta_m) \prod_{i=1}^{m} f(a_i) \qquad (2.69)$$
Proof. It is the direct consequence of the fact that N[G] is a semiring and the distributive rule holds. Indeed,
$$T_i\left(\sum_{a_{i1} \in S(\theta_1)} f'(a_{i1}), \ldots, \sum_{a_{im_i} \in S(\theta_{m_i})} f'(a_{im_i});\ \theta_1, \ldots, \theta_{m_i}\right) = h_{\circ_i}(\theta_1, \ldots, \theta_{m_i}) \left(\sum_{a_{i1} \in S(\theta_1)} f'(a_{i1})\right) \cdots \left(\sum_{a_{im_i} \in S(\theta_{m_i})} f'(a_{im_i})\right) = \sum_{a_{i1} \in S(\theta_1)} \cdots \sum_{a_{im_i} \in S(\theta_{m_i})} h_{\circ_i}(\theta_1, \ldots, \theta_{m_i})\, f'(a_{i1}) \cdots f'(a_{im_i}) = \sum_{a_{i1} \in S(\theta_1)} \cdots \sum_{a_{im_i} \in S(\theta_{m_i})} f'\left(\circ_i \left(a_{ij}\right)_{j=1}^{m_i}\right). \qquad (2.72)$$
and
$$\varphi(h_1 h_2) = \sum_{g \in G} (h_1 h_2)(g) = \sum_{g \in G} \sum_{g_1 g_2 = g} h_1(g_1) h_2(g_2) = \left(\sum_{g_1 \in G} h_1(g_1)\right) \left(\sum_{g_2 \in G} h_2(g_2)\right) = \varphi(h_1)\varphi(h_2). \qquad (2.77)$$
The homomorph image calculates ϕ(F (S(θ))), that is, the size of S(θ).
(b) $(Y, N[\mathbb{R}_+], f, T)$ is a statistics evaluation algebra, $\varphi: N[\mathbb{R}_+] \to R$, where R is the tropical semiring, and
$$\varphi(h) = \min\{\mathrm{supp}(h)\}$$
where
$$\mathrm{supp}(h) := \{g \in \mathbb{R} \mid h(g) \ne 0\}.$$
(Recall that $h \in N[\mathbb{R}_+]$ is a mapping from $\mathbb{R}_+$ to $\mathbb{N}$.) When the support of h is the empty set, $\varphi(h) = +\infty$, the additive unit of R. It is indeed a semiring homomorphism, since
$$\varphi(h_1 + h_2) = \min\{\mathrm{supp}(h_1 + h_2)\} = \min\{\mathrm{supp}(h_1)\} \oplus \min\{\mathrm{supp}(h_2)\} = \varphi(h_1) \oplus \varphi(h_2) \qquad (2.78)$$
and
$$\varphi(h_1 h_2) = \min\{\mathrm{supp}(h_1 h_2)\} = \min\{\mathrm{supp}(h_1)\} \odot \min\{\mathrm{supp}(h_2)\} = \varphi(h_1) \odot \varphi(h_2). \qquad (2.79)$$
This homomorph image calculates $\varphi(F(S(\theta)))$, that is, the minimal value in S(θ).
A similar construction exists with the evaluation algebra $\{Y, N[\mathbb{R}_\times], f, T\}$.
and
$$\varphi(h_1 h_2) = \sum_{g \in \mathbb{R}} g\, (h_1 h_2)(g) = \sum_{g \in \mathbb{R}} g \sum_{g_1 g_2 = g} h_1(g_1) h_2(g_2) = \left(\sum_{g_1 \in \mathbb{R}} g_1 h_1(g_1)\right) \left(\sum_{g_2 \in \mathbb{R}} g_2 h_2(g_2)\right) = \varphi(h_1)\varphi(h_2) \qquad (2.81)$$
The homomorph image calculates ϕ(F (S(θ))), which is indeed the weighted
sum of the objects.
However, if the monoid semiring is $N[\mathbb{R}_+]$, then
$$\varphi(h) = \sum_{g \in \mathbb{R}} g\, h(g)$$
is not multiplicative on its own. Instead, one can map h to the pair
$$\varphi(h) := \left(\sum_{g \in \mathbb{R}} h(g),\ \sum_{g \in \mathbb{R}} g\, h(g)\right),$$
and
$$\varphi(h_1 h_2) = \left(\sum_{g \in \mathbb{R}} (h_1 h_2)(g),\ \sum_{g \in \mathbb{R}} g\,(h_1 h_2)(g)\right) = \left(\sum_{g \in \mathbb{R}} \sum_{g_1 + g_2 = g} h_1(g_1) h_2(g_2),\ \sum_{g \in \mathbb{R}} \sum_{g_1 + g_2 = g} g\, h_1(g_1) h_2(g_2)\right) = \left(\left(\sum_{g_1 \in \mathbb{R}} h_1(g_1)\right) \left(\sum_{g_2 \in \mathbb{R}} h_2(g_2)\right),\ \sum_{g_1 \in \mathbb{R}} \sum_{g_2 \in \mathbb{R}} (g_1 + g_2)\, h_1(g_1) h_2(g_2)\right) = \left(\sum_{g_1 \in \mathbb{R}} h_1(g_1),\ \sum_{g_1 \in \mathbb{R}} g_1 h_1(g_1)\right) \left(\sum_{g_2 \in \mathbb{R}} h_2(g_2),\ \sum_{g_2 \in \mathbb{R}} g_2 h_2(g_2)\right) = \varphi(h_1)\varphi(h_2),$$
thus, weighted sums can also be calculated when the function f is additive on
the objects.
This idea can be extended to an arbitrary commutative ring, and even
It is easy to see that the distributive rule also holds. This commutative ring
can be used to calculate the moments of an additive function over an ensemble
of combinatorial objects, A, on which a yield algebra exists where all opera-
tions are binary. An additive function is a function g such that for all binary
operations ◦i ,
$$g(a \circ_i b) = g(a) + g(b) \qquad (2.86)$$
where the addition is in the ring R. If $x = (p_0, p_1, \ldots, p_{m-1})$ and $y = (q_0, q_1, \ldots, q_{m-1})$, then indeed
powering, and tropical powering also satisfies the distributive rule. Indeed, if for an operator $\circ_i$, the f function satisfies the equation
$$f\left(\circ_i (a_j)_{j=1}^{m_i}\right) = c_{i,0} + \sum_{j=1}^{m_i} c_{i,j} f(a_j) \quad \left[= T_i\left(f(a_1), \ldots, f(a_{m_i}),\ p(a_1), \ldots, p(a_{m_i})\right)\right] \qquad (2.90)$$
where each $c_{i,j}$ is non-negative (and might depend on the parameters $p(a_j)$), then it also holds that
$$T_i\left(\min_{a_1 \in S(\theta_1)} f(a_1), \ldots, \min_{a_{m_i} \in S(\theta_{m_i})} f(a_{m_i}),\ \theta_1, \ldots, \theta_{m_i}\right) = \min_{a_1 \in S(\theta_1)} \cdots \min_{a_{m_i} \in S(\theta_{m_i})} f\left(\circ_i (a_j)_{j=1}^{m_i}\right). \qquad (2.91)$$
Another interesting case is when R is a distributive lattice, and for each operator $\circ_i$, $f\left(\circ_i (a_j)_{j=1}^{m_i}\right)$ can be described with operations in R. An evaluation algebra can be built in this case, since the distributive rules
$$\bigvee_{k,l} \left(f(a_k) \vee f(a_l)\right) = \bigvee_{k} f(a_k) \vee \bigvee_{l} f(a_l) \qquad (2.92)$$
and
$$\bigvee_{k,l} \left(f(a_k) \wedge f(a_l)\right) = \bigvee_{k} f(a_k) \wedge \bigvee_{l} f(a_l) \qquad (2.93)$$
as well as the dual equalities hold. A special case is when R is the set of real numbers, ∨ is the maximum and ∧ is the minimum. The so-obtained (min, max)-semiring can also be used in optimization problems, for example, finding the highest vehicle that can travel from some point A to point B on a network of roads containing several bridges with different heights (see Example 6).
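As a quick illustration of this structure (our example, not the book's), a Bellman–Ford-style relaxation in the (max, min) semiring computes the highest-vehicle value:

```python
# "Highest vehicle" value: maximize over routes the minimum clearance.
# Edges are undirected triples (u, v, clearance); the values are made up.
def highest_vehicle(n, edges, src, dst):
    best = [0.0] * n                # semiring zero for max
    best[src] = float('inf')        # the empty route has unbounded clearance
    for _ in range(n - 1):          # Bellman-Ford-style rounds
        for u, v, clearance in edges:
            best[v] = max(best[v], min(best[u], clearance))
            best[u] = max(best[u], min(best[v], clearance))
    return best[dst]

edges = [(0, 1, 3.2), (1, 2, 2.7), (0, 2, 2.1)]
assert highest_vehicle(3, edges, 0, 2) == 2.7   # via the route 0-1-2
```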
are in the lowest level of the Chomsky hierarchy [42]. Stochastic versions of
regular grammars are related to Hidden Markov Models [163, 15].
$$W \to xW' \qquad (2.94)$$
$$W \to x \qquad (2.95)$$
$$W \to \varepsilon \qquad (2.96)$$
$$S = X_0 \to X_1 \to \ldots \to X_k \in T^* \qquad (2.97)$$
$$\sum_{G_i} \prod_{j=0}^{k_i - 1} w(W_{i,j} \to \beta_{i,j})$$
where the rewriting rule $W_{i,j} \to \beta_{i,j}$ is applied in the jth step of the generation $G_i$ generating X in $k_i$ steps.
These questions can be answered using the same yield algebra and dif-
ferent evaluation algebras. The yield algebra builds the possible generations
of intermediate sequences. Note that in any generation of any regular gram-
mar, each intermediate sequence appearing in the sequence of generations is
of the form YW, where $Y \in T^*$ and $W \in N$. The yield algebra is the following.
The set A contains the possible generations of intermediate sequences Xi W ,
where Xi denotes the prefix of X of length i. The parameters are pairs (i, W )
denoting the length of the prefix and the current non-terminal character. For
i = |X|, a parameter (i, ε) is also considered. This parameter describes the
set of possible generations of X. In the partial ordering of the parameters,
(i1 , W1 ) ≤ (i2 , W2 ) if i1 ≤ i2 . For each rewriting rule W → β, there is a unary
operation ◦W →β extending the rewriting with a new rewriting W → β. The
recursions are
with the initial condition $S((0, S)) = \{S\}$, namely, the set contains S as the rewriting sequence containing no rewriting steps, and $S((0, W)) = \emptyset$ for all $W \ne S$.
For the given problems, the following evaluation algebras can be con-
structed.
(a) The semiring is the Boolean semiring $(\{0, 1\}, \vee, \wedge)$. The function f is the constant 1. Each function $T_{\alpha \to \beta}$ is the identity. The answer to the decision question is "yes" if $F(S((|X|, \varepsilon))) = 1$, and "no" if $F(S((|X|, \varepsilon))) = 0$. The latter can happen if $S((|X|, \varepsilon)) = \emptyset$, since the empty sum in the Boolean semiring is 0, the additive unit.
(b) The semiring is Z. The f function is the constant 1. Each function $T_{\alpha \to \beta}$ is the identity function. The number of possible generations is $F(S((|X|, \varepsilon)))$.
(c) The semiring is the dual exponentiated tropical semiring, $(\mathbb{R}_{\ge 0}, \max, \cdot)$.
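The three evaluation algebras above differ only in the semiring plugged into one and the same prefix recursion. The toy sketch below is ours (the grammar and its weights are made up); it evaluates a regular grammar with rules of the form W → xW′ and W → ε in the Boolean, counting, and (max, ·) semirings:

```python
# rules: (W, character, W', weight); ends: W -> eps with its weight.
rules = [('S', 'a', 'S', 0.3), ('S', 'b', 'W', 0.5), ('W', 'b', 'W', 0.6)]
ends  = {'S': 0.2, 'W': 0.4}

def evaluate(X, zero, one, add, mul, weight):
    val = {'S': one, 'W': zero}          # generations of the empty prefix
    for x in X:
        new = {W: zero for W in val}
        for (W, c, W2, w) in rules:
            if c == x:                   # extend a generation by W -> xW2
                new[W2] = add(new[W2], mul(val[W], weight(w)))
        val = new
    out = zero
    for W, w in ends.items():            # close the generation by W -> eps
        out = add(out, mul(val[W], weight(w)))
    return out

X = 'abb'
exists = evaluate(X, False, True, lambda a, b: a or b, lambda a, b: a and b, lambda w: True)
count  = evaluate(X, 0, 1, lambda a, b: a + b, lambda a, b: a * b, lambda w: 1)
best   = evaluate(X, 0.0, 1.0, max, lambda a, b: a * b, lambda w: w)  # most likely generation
```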
Indeed, here e stands for even, and o stands for odd, and the two characters in the index of the non-terminals tell the parity of the numbers of a's and b's generated so far. For example, $W_{oe}$ denotes that so far an odd number of a's and an even number of b's have been generated, etc. $W_{ee}$ is indeed the starting non-terminal, since at the beginning, zero characters have been generated and 0 is an even number. The generation can be stopped when an even number of a's and an odd number of b's have been generated, as indicated by the $W_{eo} \to \varepsilon$ rule.
holds. (Recall that β is any of the sequences that might appear in the right-hand
side of a rewriting rule of a regular grammar, including the empty sequence.)
A stochastic regular grammar makes random generations
S = X1 → X2 → . . .
The vertices of $\vec{G}$ are called states. A random walk on the states is defined by $\vec{G}$ and the transition probabilities. The random walk starts in the state START and ends in the state END. During such a walk, states emit characters according to the emission distribution e. In case of loops, the random
walk might stay in one state for several steps. A random character is emitted
in each step. The process is hidden in the sense that an observer can see only
the emitted characters and cannot observe the random walk itself. An emission
path is a random walk together with the emitted characters. The probability of
an emission path is the product of its transition and emission probabilities.
If (u, v) is an edge, then the notation T (v|u) is also used, emphasizing
that T is a conditional distribution. Indeed, T(v|u) is the probability that the
Markov process will be in state v in the next step given that it is in the state
u in the current step.
One can ask, for an X ∈ Γ∗ , what the most likely emission path and the
total emission probability are. This latter is the sum of the emission path
probabilities that emit X. These questions are equivalent with those that can
be asked for the most likely generation and the total generation probability of
a sequence in a stochastic regular grammar, stated by the following theorem.
Theorem 24. For any Hidden Markov Model $H = (\vec{G}, START, END, \Gamma, T, e)$, there exists a stochastic regular grammar $G = (T, N, S, R, \pi)$ such that $\Gamma = T$, $L_G$ is exactly the set of sequences that H can emit, and for any $X \in L_G$, the probability of the most likely generation in G is the probability of the most likely emission path in H, and the total generation probability of X in G is the total emission probability of X in H. Furthermore, the running time needed to construct G is a polynomial function of the size of $\vec{G}$.
Proof. For $\vec{G} = (V, E)$, let the non-terminals of the regular grammar correspond to V. That is, for any $v \in V$, there is a non-terminal $W_v$. The starting non-terminal is $S = W_{START}$. For each $(u, v) \in E$, $v \ne END$, and $x \in \Gamma$ such that $e(v, x) \ne 0$, construct a rewriting rule
$$W_u \to x W_v$$
with probability $T(v|u)\, e(v, x)$, and for each $(u, END) \in E$ construct a rewriting rule
$$W_u \to \varepsilon$$
This yield algebra can be used to solve the following optimization problems
with appropriate evaluation algebras:
and
A-BC
AD-C
both contain the deletion of B and the insertion of D. The problem can be eliminated if insertions are not allowed after deletions while deletions are allowed after insertions. This needs a modification of the yield algebra, since
it must build only these substitution-free alignments. The parameter set also
should be modified. The triplet (i, j, t), with t ∈ {I, D, M } indicates the length
of the prefixes and whether the last alignment column is an insertion, deletion
or a match. The operators are the same, and the recursions are
and the number of ways that X can be transformed into Y with a minimum number of insertion and deletion operations is $c_k k!$ where k is the smallest index such that $c_k \ne 0$.
When substitutions are allowed, the yield algebra building all possible alignments might be given. The parameters (i, j) and the operators $\circ_{\binom{x}{y}}$, $\circ_{\binom{x}{-}}$ and $\circ_{\binom{-}{y}}$ are as above. The recursions are simply
$$S((i, j)) = \circ_{\binom{x_i}{y_j}}\left(S((i-1, j-1))\right) \sqcup \circ_{\binom{x_i}{-}}\left(S((i-1, j))\right) \sqcup \circ_{\binom{-}{y_j}}\left(S((i, j-1))\right).$$
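In the same spirit, this alignment recursion can be evaluated in two semirings simultaneously, computing the minimum edit cost together with the number of co-optimal alignments. A compact sketch (ours; unit substitution and indel costs are an arbitrary choice):

```python
def align(X, Y, sub=1, indel=1):
    n, m = len(X), len(Y)
    cost  = [[0] * (m + 1) for _ in range(n + 1)]   # tropical semiring values
    count = [[1] * (m + 1) for _ in range(n + 1)]   # counting semiring values
    for i in range(1, n + 1): cost[i][0] = i * indel
    for j in range(1, m + 1): cost[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c_m = cost[i-1][j-1] + (0 if X[i-1] == Y[j-1] else sub)
            c_d = cost[i-1][j] + indel
            c_i = cost[i][j-1] + indel
            best = min(c_m, c_d, c_i)
            cost[i][j] = best
            count[i][j] = ((c_m == best) * count[i-1][j-1]
                           + (c_d == best) * count[i-1][j]
                           + (c_i == best) * count[i][j-1])
    return cost[n][m], count[n][m]

assert align("AACT", "ACT") == (1, 2)   # delete either of the two leading As
```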
AACTAT
A-C-CT
under the affine gap penalty scoring scheme, although the two alignments
are alignments of the same sequences containing the same type of alignment
columns, just in different order.
Under this scoring scheme, the scoring of an alignment is no longer additive in the strict sense that the score depends only on the individual alignment columns. However, the score of an insertion or deletion only depends on
whether or not the previous alignment column is of the same type. If the yield
algebra is built up separating the different types of alignments by extending
the parameters with the different indicator variables, then the scoring can
be done appropriately in the evaluation algebra. That is, the yield algebra is
exactly the same as introduced above with recursions in Equations (2.111)–
(2.114). Recall that in this yield algebra, only those alignments are built that
do not contain insertions after deletions, however, deletions might occur after
insertions.
If the aim is to find the smallest possible score, then the semiring in the
evaluation algebra is the tropical one. The f function assigns the score to each
alignment. The Tx/y function for operator ◦x/y is the tropical multiplication with the weight of the alignment column (x / y). The functions Tx/− and T−/y depend on the parameters. Tx/− is the tropical multiplication with go if the parameter is (i, j, M) or (i, j, I), and the tropical multiplication with ge if the parameter is (i, j, D). Similarly, T−/y is the tropical multiplication with go if the parameter is (i, j, M), and the tropical multiplication with ge if the parameter is (i, j, I) (recall that insertions cannot follow deletions).
Counting problems can be solved with the appropriate modification of the
evaluation algebra, see for example, Exercise 20.
The parameters (i, j, x), where i and j denote the length of the prefixes and x is a member of a finite set, are very similar to the parameters (i, W)
used in regular grammars and Hidden Markov Models. It is natural to define
pair Hidden Markov Models and to observe that the introduced yield algebras
for sequence alignments (for example, the ones defined by recursions in Equa-
tions (2.111)–(2.114)) are special cases of the class of yield algebras that can
be constructed based on pair Hidden Markov Models.
Definition 31. A pair Hidden Markov Model is a tuple (G, START, END, Γ, T, e), where G = (V, E) is a directed graph with two distinguished vertices, START and END. Vertices are called states. Loops are allowed, however, parallel edges are not. The in-degree of the START and the out-degree of the END states is 0. Γ is a finite set of symbols, called an alphabet, and T : E → R+ is the transition probability function satisfying, for all v ≠ END,

∑_{u | (v,u)=e∈E} T(e) = 1.
Depending on which emission probabilities are not 0, the states are called
insertion, deletion and match states. A random walk on the states is defined by G and the transition probabilities. The random walk starts in the state START and ends in the state END. During such a walk, states emit characters or pairs of characters according to the emission distribution e. In case of loops, the
random walk might stay in one state for several consecutive steps. In each step,
a random character or a pair of characters is emitted. The emitted characters
generate two strings, X and Y . The process is hidden in the sense that an
observer can see only the emitted sequences and cannot observe the random
walk itself. The observer cannot even observe which characters are emitted
together and which ones individually, that is, the observer cannot see the so-
called co-emission pattern. An emission path is a random walk together with
the two emitted sequences. The probability of an emission path is the product
of its transition and emission probabilities.
Given two sequences, X and Y , and a pair Hidden Markov Model H, a
yield algebra can be constructed whose base set, A, is the partial emission
paths that emit prefixes of X and Y . The parameter set is (i, j, W ), where i
and j are the length of the prefixes and W is the current state of the emission
path, assuming that W has already emitted a character or pair of characters.
In the partial ordering of the parameters, (i, j, W) ≤ (i′, j′, W′) if i ≤ i′ and j ≤ j′. The operators ◦WM, ◦WI and ◦WD extend the emission path with one
step and the emission of the new state. Here, the indices M , I and D denote
the type of the states. The recursions are
S((i, j, WM)) = ⊔_{W |(W,WM)∈E} ◦WM (S((i − 1, j − 1, W))) if e((xi, yj), WM) ≠ 0, and S((i, j, WM)) = ∅ if e((xi, yj), WM) = 0;

S((i, j, WI)) = ⊔_{W |(W,WI)∈E} ◦WI (S((i, j − 1, W))) if e((−, yj), WI) ≠ 0, and S((i, j, WI)) = ∅ if e((−, yj), WI) = 0;

S((i, j, WD)) = ⊔_{W |(W,WD)∈E} ◦WD (S((i − 1, j, W))) if e((xi, −), WD) ≠ 0, and S((i, j, WD)) = ∅ if e((xi, −), WD) = 0.
Similar evaluation algebras can be combined with this yield algebra like the
ones prescribed earlier in this section. The method can be extended to triple-
wise and multiple alignments as well as triple and multiple Hidden Markov
Models. However, the size of the parameter set grows exponentially with the
number of sequences. Indeed, the number of parameters below a parameter (i1, i2, . . . , ik, W) in the partial ordering of the set of the parameters is Ω(∏_{j=1}^{k} ij). Finding the minimum scored multiple alignment is proven to be NP-hard [104, 183].
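As a sketch of how these recursions run in practice, the following Python function evaluates them in the summation semiring, giving the total emission probability of a pair of sequences. The dictionary-based encoding of trans and emit is an assumption made for this example, not a convention of the text.

    def pair_hmm_forward(X, Y, states, trans, emit, start, end):
        # F[i][j][W]: total probability of partial emission paths with
        # parameter (i, j, W). emit[W] maps (a, b), (None, b) or (a, None)
        # to probabilities, implicitly classifying W as a match, insertion
        # or deletion state; trans[(U, W)] is the transition probability.
        n, m = len(X), len(Y)
        F = [[{W: 0.0 for W in states} for _ in range(m + 1)]
             for _ in range(n + 1)]

        def prev(i, j, W):
            # sum of trans(U -> W) * F[i][j][U]; START acts as a virtual
            # predecessor with value 1 at (0, 0)
            total = trans.get((start, W), 0.0) if i == 0 and j == 0 else 0.0
            for U in states:
                total += trans.get((U, W), 0.0) * F[i][j][U]
            return total

        for i in range(n + 1):
            for j in range(m + 1):
                for W in states:
                    e = emit[W]
                    if i and j and (X[i-1], Y[j-1]) in e:   # match state
                        F[i][j][W] += e[(X[i-1], Y[j-1])] * prev(i-1, j-1, W)
                    if j and (None, Y[j-1]) in e:           # insertion state
                        F[i][j][W] += e[(None, Y[j-1])] * prev(i, j-1, W)
                    if i and (X[i-1], None) in e:           # deletion state
                        F[i][j][W] += e[(X[i-1], None)] * prev(i-1, j, W)
        return sum(trans.get((W, end), 0.0) * F[n][m][W] for W in states)

Replacing the sums with maxima yields the probability of the most likely joint emission path instead.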
Recall that in a context-free grammar, the rewriting rules are of the form W → β, where W ∈ N and β ∈ (N ∪ T)∗, and a generation of a sequence is a series of rewritings

S = X1 → X2 → . . . → Xk ∈ T ∗. (2.118)

A context-free grammar is in Chomsky normal form if each rewriting rule is of the form

W → W1W2 | a

where W, W1, W2 ∈ N and a ∈ T.
It is a well-known theorem that any context-free grammar G can be rewrit-
ten into a context-free grammar G0 such that G0 is in Chomsky normal form
and LG = LG0 [159]. Here an equivalent theorem is proved for stochastic
context-free grammars.
Theorem 25. For any stochastic context-free grammar G = (T, N, S, R, π) there exists a stochastic context-free grammar G′ = (T, N′, S′, R′, π′) such that G′ is in Chomsky normal form, and the following holds:
where |R| is defined as the total sum of the lengths of the β sequences in
the rewriting rules in R.
(c) There is a polynomial running time algorithm that constructs G0 from
G.
Proof. The proof is constructive, and G′ is constructed in the following three phases. Each phase consists of several steps, so G is transformed into G′ via a series of steps, G = G1, G2, . . . , Gk = G′ such that for all i = 1, . . . , k − 1, it is proved that there is a surjective function from TGi to TGi+1 that keeps the probability. Finally, it will be shown that the entire construction can be done in polynomial time and indeed |R′| ≤ 128|R|³ + 4|R|².
In the first phase, those rewriting rules W → β are considered for which
|β| > 2. If the current grammar is Gi in which there is a rewriting rule W → β such that β = b1b2 . . . bk, then the new non-terminals W1 and W2 are added to the non-terminals, and thus we get Ni+1 = Ni ∪ {W1, W2}, together with the following rewriting rules and probabilities:

πi+1(W → W1W2) = πi(W → β)
πi+1(W1 → b1b2) = 1
πi+1(W2 → b3 . . . bk) = 1.
The rule W → β is removed from Ri and the above rules with the given
probabilities are added to get Ri+1 and πi+1 . The gi function replaces each
rule W → β with the above three rules. It is easy to see that gi is a bijection
between TGi and TGi+1 that keeps the probabilities.
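A minimal Python sketch of this first phase, representing a stochastic grammar as a list of (left-hand side, right-hand side, probability) triples; the fresh non-terminal naming scheme is an assumption of the sketch.

    def split_long_rules(rules):
        # Phase 1: replace each W -> b1 b2 ... bk (k > 2) by W -> W1 W2,
        # W1 -> b1 b2 and W2 -> b3 ... bk, keeping the probabilities;
        # W2 is split further while its right-hand side is too long.
        out, counter, work = [], 0, list(rules)
        while work:
            lhs, rhs, p = work.pop()
            if len(rhs) <= 2:
                out.append((lhs, rhs, p))
                continue
            counter += 1
            w1, w2 = f"{lhs}'{counter}a", f"{lhs}'{counter}b"
            out.append((lhs, (w1, w2), p))         # W -> W1 W2, probability p
            out.append((w1, tuple(rhs[:2]), 1.0))  # W1 -> b1 b2, probability 1
            work.append((w2, tuple(rhs[2:]), 1.0)) # W2 -> b3 ... bk
        return out

Each popped rule is either kept or replaced by the three rules of the construction, and the total right-hand side length strictly decreases, so the loop terminates after O(|R|) steps.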
The first phase finishes in O(|R|) time, since the integer value

∑_{β | ∃W, W→β∈Ri ∧ |β|>2} |β|

decreases in each step.
In the second phase, those rewriting rules are considered whose right-hand side has length 2 and contains terminal characters. If the current grammar Gi contains a rule W → a1a2 with a1, a2 ∈ T, then the new non-terminals W1 and W2 are added together with the rules

πi+1(W → W1W2) = πi(W → a1a2)
πi+1(W1 → a1) = 1
πi+1(W2 → a2) = 1.
The rule W → a1 a2 is removed from Ri and the above rules are added with
the prescribed probabilities to get Ri+1 and πi+1 . The gi function replaces
each rule W → a1 a2 with the above three rules. It is easy to see that gi is a
bijection between TGi and TGi+1 that keeps the probabilities. Rewriting rules
W → a1 W2 and W → W1 a2 are handled in a similar way. However, observe
that it is sufficient to introduce only one new non-terminal.
The second phase finishes in O(|RI|) time, since each rewriting rule is modified at most once, and the new rules are not modified further. At the end of the second phase, a grammar GII is constructed in which each rewriting rule is either in Chomsky normal form or it is W → W′ for some W, W′ ∈ NII. Furthermore, it is clear that |RII| ≤ 2|RI|, thus |RII| ≤ 4|R|, and the construction runs in polynomial time with |R|.
In the third phase, those rules W → W′ are considered, which are the only rules not in Chomsky normal form in GII. If Gi is a grammar in which a W′ exists so that for some W, W → W′ ∈ R, then we do the following:

πi+1(W → β) := πi+1(W → β) / (1 − πi+1(W → W)). (2.121)
and the definition of the new probabilities in Equation (2.121) together provide
that gi keeps the probabilities.
The third phase finishes in O(|NII|) number of steps, since in each step one non-terminal is eliminated on the right-hand side, and no rule is added in which a non-terminal would appear on the right-hand side that was eliminated earlier. At the end of the third phase, the grammar G′ is in Chomsky normal form.
In any context-free grammar with non-terminals N and terminals T in Chomsky normal form, the number of rewriting rules cannot exceed 2|N|³ + |N||T|. Notice that |N′| ≤ |NII| and T′ = T. Furthermore, a very rough upper bound is |NII| ≤ |RII|. Since the number of terminal characters cannot be more than the sum of the lengths of the rewriting rules, it follows that
possible parse trees and not just the possible sequences that can be generated
with the given condition. These two numbers are equal only if the grammar is
unambiguous. For ambiguous grammars it is also #P-complete to count the
number of sequences that can be generated with a given constraint since any
regular grammar is also context-free, and the counting problem in question is
already #P-complete for regular grammars.
The reason why a context-free grammar must be rewritten into Chom-
sky normal form is that the binary operation ◦W →W1 W2 indicates only O(n)
disjoint union operations in Equation (2.123), where n is the length of the
generated sequence. If there were a rewriting rule W → W1 W2 W3 , it would
require a ternary operator ◦W →W1 W2 W3 , and the recursion in the yield algebra
would require Ω(n²) disjoint union operations in the form

S((i, j, W)) = ⊔_{k2=i+1}^{j−1} S((i, k2, W′)) ◦_{W→W′W3} S((k2 + 1, j, W3)) ⊔ . . .
clearly highlights that the new non-terminals introduced in the first phase of
rewriting a grammar into Chomsky normal form provide additional “memory”
with which the calculations might be sped up. It is also clear that efficient algorithms are available for all context-free grammars in which the rewriting rules are not necessarily in Chomsky normal form, provided that there are at most 2 non-terminals on the right-hand side of each rewriting rule and there are no rewriting rules of the form W → W′.
One of the best-known families of combinatorial structures that can be described with context-free grammars is that of the Catalan structures. Catalan structures with parameter k are combinatorial structures whose number is the Catalan number Ck. An example is the set of Dyck words.
Definition 35. A Dyck word is a finite sequence from the alphabet {x, y},
such that in any prefix, the number of x characters is greater than or equal to
the number of y characters, and the number of x and y characters is the same
in the whole sequence.
[Figure 2.2: The three possible relative positions of two base pairs (i, j) and (i′, j′): a) nested (i < i′ < j′ < j), b) separated (i < j < i′ < j′), c) crossing (i < i′ < j < j′).]
A Dyck word can be represented as a lattice path from the bottom left corner of a square grid to the top right corner not stepping above the diagonal. Each x is represented by a horizontal step, and each y is represented by a vertical step. The area of a Dyck word is the area below the lattice path representing it. Give a dynamic programming algorithm that calculates the average area of a Dyck word of length n.
Solution. The yield algebra is the same as defined above. Since area is an additive function, the algebraic structure in the evaluation algebra is the ring R = (R², ⊕, ⊙), where the addition is coordinatewise and the multiplication is defined by

(x1, y1) ⊙ (x2, y2) = (x1x2, x1y2 + y1x2).

The f function assigns (1, a) to each Dyck word D, where a is the area of D. The T functions for the operators depend on the parameters. The TxyS function is the multiplication with (1, k) when the unary operator ◦xyS is applied on a Dyck word with parameter k. The function TxSy is the identity. Finally, the TxSyS(a, b, k, l) function is a ⊙ b ⊙ (1, (k+1)l), where a, b ∈ R and k and l are the parameters. Indeed, adding xy at the beginning of a Dyck word of length 2k (thus, with parameter k) increases its area by k. Adding an x to the beginning and a y to the end of a Dyck word does not change its area. Finally, the area of a Dyck word xDyD′ is the area of D plus the area of D′ plus (k + 1)l if the parameters of the Dyck words are p(D) = k and p(D′) = l.
If F(S(n)) = (X, Y), then the average area is Y/X.
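In the homomorphic image, each set is represented only by the pair (number of Dyck words, total area). The following Python sketch computes the average area using the unique first-return decomposition D = xD1yD2; working with this decomposition instead of the three operators above is a simplification made for the sketch, with the same (k + 1)l area bookkeeping.

    def average_dyck_area(n):
        # P[m] = (count, total area) over Dyck words of semilength m
        P = [(1, 0)] + [(0, 0)] * n
        for m in range(1, n + 1):
            c_m, a_m = 0, 0
            for k in range(m):          # D = x D1 y D2 with p(D1) = k
                l = m - 1 - k           # and p(D2) = l
                c1, a1 = P[k]
                c2, a2 = P[l]
                # ring product (c1, a1) (c2, a2) (1, (k + 1) l)
                c_m += c1 * c2
                a_m += a1 * c2 + c1 * a2 + c1 * c2 * (k + 1) * l
            P[m] = (c_m, a_m)
        count, total = P[n]
        return total / count

For instance, average_dyck_area(2) returns 0.5: the Dyck words xxyy and xyxy of length 4 have areas 0 and 1, respectively.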
Context-free grammars are also used in bioinformatics, since the pseudo-
knot-free RNA structures can be described with these grammars.
Definition 36. An RNA sequence is a finite string from the alphabet {a, u, c, g}. A secondary structure is a set of pairs of indexes (i, j), i + 2 < j, such that each index is in at most one pair. For each pair of indexes, the pair of characters might only be (a, u), (c, g), (g, c), (u, a), (g, u) or (u, g). These pairs are called base pairs. A secondary structure is pseudo-knot-free if for all pairs of indexes, (i, j) and (i′, j′), i < i′, it holds that either j′ < j or j < i′. Namely, any two pairs of indexes are either nested or separated, and there are no crossing base pairs, see also Figure 2.2.
The four characters represent the possible nucleic acids building the RNA molecules. An RNA molecule is a single-stranded polymer; the string can be folded, and the nucleotides can form hydrogen bonds stabilizing the structure of the RNA. The pairs of indexes represent the nucleic acids making hydrogen bonds. Due to steric constraints, it is required that i + 2 < j. From now on, any RNA secondary structure is considered to be pseudo-knot-free, and the adjective “pseudo-knot-free” will be omitted.
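Counting the secondary structures of a given RNA sequence is a straightforward dynamic programming task. The following Python sketch uses the unambiguous case analysis on whether the first position of a substring is base-paired; it is a direct recursion given for illustration, equivalent in spirit to the grammar-based construction discussed below.

    from functools import lru_cache

    PAIRS = {('a','u'), ('u','a'), ('c','g'), ('g','c'), ('g','u'), ('u','g')}

    def count_structures(x):
        n = len(x)
        @lru_cache(maxsize=None)
        def N(i, j):
            # number of secondary structures of the substring x[i..j]
            if j - i < 3:               # too short for any pair (i + 2 < j)
                return 1
            total = N(i + 1, j)         # position i is unpaired
            for k in range(i + 3, j + 1):   # i pairs with k, k - i > 2
                if (x[i], x[k]) in PAIRS:
                    total += N(i + 1, k - 1) * N(k + 1, j)
            return total
        return N(0, n - 1) if n else 1

Since position i is either unpaired or paired with a unique k, each structure is generated exactly once, mirroring the unambiguity of the grammar in Theorem 27.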
Theorem 27. The following grammar (also known as the Knudsen-Hein
grammar, [115]) can generate all possible RNA secondary structures. T =
{a, c, g, u}, N = {S, L, F }, and the rewriting rules are
S → LS | L (2.127)
L → a | c | g | u | aF u | cF g | gF c | uF a | gF u | uF g (2.128)
F → aF u | cF g | gF c | uF a | gF u | uF g | LS (2.129)
The base pairs are the indexes of those pairs of characters that are generated in one rewriting step. Furthermore, each pseudo-knot-free RNA structure that a given RNA sequence might have has exactly one parse tree in the grammar.
Proof. It is clear that the grammar generates RNA structures, so it is sufficient
to show that each possible RNA structure that a given RNA sequence might
have can be generated with exactly one parse tree.
In a given RNA secondary structure of a sequence X, let those base pairs
(i, j) be called outermost, for which no base pair (i0 , j 0 ) exists such that i0 < i
and j < j 0 . If (i, j) is an outermost base pair, then the base pairs inside the
substring X 0 from index i + 1 till index j − 1 form also an RNA structure on
the substring. Furthermore, either there is an outermost base pair (i + 1, j − 1)
on the substring X 0 or at least one of the following is true:
(a) there are at least two outermost base pairs or
(b) there is an outermost base pair (i0 , j 0 ) and a character which is not base-
paired and outside of the base pair (i0 , j 0 ), that is, if it has index k, then
k < i0 or k > j 0 or
(c) there are at least two characters in X 0 which are not base-paired and
are outside of any outermost base pair.
This comes from the fact that the length of X 0 is at least 2, according to
the definition of RNA secondary structure, i.e., i + 2 < j. Having said this,
a given RNA secondary structure can be generated in the following way. If
the outermost base pairs have indexes (i1, j1), (i2, j2), . . . , (ik, jk), then first an intermediate sequence containing

i1 + ∑_{l=2}^{k} (il − jl−1) + n − jk

L non-terminals is generated. The characters that are not covered by outermost base pairs are generated using the appropriate rule from the possibilities

L → a | c | g | u

and the outermost base pairs are generated using the appropriate rewriting rule from the possibilities

L → aFu | cFg | gFc | uFa | gFu | uFg.
Then each intermediate string from the index il + 1 till the index jl − 1 must
be generated. If (il + 1, jl − 1) is an outermost base pair, then it is generated
by the appropriate rule from the possibilities

F → aFu | cFg | gFc | uFa | gFu | uFg,
It is known that the stacking of consecutive pairs of base pairs rather than just base pairs stabilizes an RNA secondary structure [168, 169]. Two base pairs (i, j) and (i′, j′) are consecutive if i′ = i + 1 and j′ = j − 1. This can be emphasized via introducing a new non-terminal F′. The non-terminal F represents the first base pair, and F′ represents the fact that there was a base pair in the previous rewriting step in the parse tree.
That is, the rewriting rules are modified as

L → a | c | g | u | aFu | cFg | gFc | uFa | gFu | uFg (2.130)
F → aF′u | cF′g | gF′c | uF′a | gF′u | uF′g (2.131)
F′ → aF′u | cF′g | gF′c | uF′a | gF′u | uF′g | LS. (2.132)
where S is the set of all possible RNA secondary structures that an RNA sequence might have. To be able to calculate the probability of the minimum free energy structure, the partition function must be calculated. Since the free energies are additive in this model, the values e^{−ΔG(S)/RT} are multiplicative, and the partition function

Z = ∑_{S∈S} e^{−ΔG(S)/RT},

as well as the sums

∑_{S∈S} ΔG(S) e^{−ΔG(S)/RT} and ∑_{S∈S} ΔG²(S) e^{−ΔG(S)/RT},

can be calculated with appropriate evaluation algebras.
Yield algebra can be built for walks in the following way. Given a directed graph G = (V, E), let A be the set of walks on G. The parameters are triplets (i, j, k), and the walks from vertex vi to vertex vj containing k edges are assigned to the parameter (i, j, k). The partial ordering of the parameters is the natural ordering of their last index. The unary operator ◦(vl,vj) concatenates the edge e = (vl, vj) and vertex vj to the end of a walk. The recursions are

S((i, j, k)) = ⊔_{(vl,vj)∈E} ◦(vl,vj) (S((i, l, k − 1))). (2.135)

The base cases are S((i, i, 0)) = {vi} and S((i, j, 0)) = ∅ for all i ≠ j.
Given a weight function w : E → R, define

f(v0, e1, . . . , ek, vk) := ∑_{i=1}^{k} w(ei).
In the tropical semiring, the minimum of F(S((i, j, k))) over k = 0, . . . , |V| − 1 is the weight of the shortest path from vi to vj. Indeed, a path cannot contain more edges than |V| − 1, and the shortest walk must be a shortest path in case of no negative cycles.
On the other hand, finding the longest path is an NP-hard optimization
problem. If each weight is the constant 1, then the longest path from vi to
vj is exactly |V | − 1 if and only if there is a Hamiltonian path from vi to
vj . Equivalently, it is NP-hard to find the shortest path in a given number of
steps in case of negative cycles. To see this, set all weights to −1, and ask for
the shortest path from vi to vj in |V | − 1 steps.
Since the shortest walks are all shortest paths when there are no negative
cycles, it is also possible to count the number of shortest paths in polyno-
mial time on graphs without negative cycles. To do this, first build up the
statistics evaluation algebra using the monoid semiring N[R+ ], then take the
homomorph image
ϕ(h) := gminsupp h(gminsupp )
where gminsupp is the smallest element in the support of h. It is easy to see that
this is indeed a semiring homomorphism. Then ϕ(F (S((i, j, k)))) calculates the
number of the shortest walks from vi to vj in k steps. Summing this for all
k = 1, . . . , |V | − 1 in the homomorph image gives the number of shortest walks
from vi to vj which coincides with the number of shortest paths.
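In this homomorphic image every value is just a (minimum weight, multiplicity) pair. A Python sketch, under the assumption that the graph contains no cycles of non-positive total weight:

    import math

    def count_shortest_paths(n, edges, i, j):
        # Semiring of (min weight, multiplicity) pairs: the image of the
        # N[R+] statistics algebra keeping only the minimal-support term.
        def mul(a, b):
            return (a[0] + b[0], a[1] * b[1])
        def add(a, b):
            if a[0] < b[0]: return a
            if b[0] < a[0]: return b
            return (a[0], a[1] + b[1])
        ZERO = (math.inf, 0)
        d = [ZERO] * n              # d[v] corresponds to F(S((i, v, k)))
        d[i] = (0.0, 1)
        best = d[j]
        for _ in range(1, n):       # k = 1, ..., |V| - 1
            nd = [ZERO] * n
            for u, v, w in edges:   # extend walks by the edge (u, v)
                nd[v] = add(nd[v], mul(d[u], (w, 1)))
            d = nd
            best = add(best, d[j])
        return best                 # (weight, number) of the shortest paths

Here edges is a list of (u, v, w) triples on vertices 0, . . . , n − 1; the k-th round handles walks of exactly k edges, mirroring the parameter (i, j, k).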
The (max,min)-semiring can be utilized in the following example.
For any walk, define

f(v0, e1, . . . , ek, vk) := min_i {w(ei)}.

Then F(S((i, j, k))) calculates

max_{π | p(π)=(i,j,k)} f(π),

namely, the maximum value of the walks from vi to vj in k steps. The value of the maximum value path from vi to vj is the maximum of these values for k = 1, . . . , |V| − 1. Indeed, if a walk visits a vertex vl twice,

π = vi, . . . , el, vl, el+1, . . . , vl, el′, . . . , vj,

then removing the cycle between the two visits yields a walk

π′ = vi, . . . , el, vl, el′, . . . , vj

with f(π′) ≥ f(π), hence the maximum is attained on a path.
It is clear that the number of walks that maximize the prescribed f function in Example 6 can be calculated with algebraic dynamic programming in polynomial time. Some of them might not be paths, and counting the number of such paths is a hard computational problem. Indeed, deciding whether there is a Hamiltonian path in a graph is NP-complete. Observe that the same blowing-up technique used in the proof of Theorem 16 can be applied to decide whether or not there is a path with n − 1 edges between two vertices in a graph, where n is the number of vertices.
However, the number of shortest (= having a minimum number of edges) paths that maximize the f function in Example 6 can be calculated in polynomial time. Indeed, the shortest walks that maximize f are shortest paths.
The set of walks that minimize the function f in Example 6 might not contain any path. Interestingly, this optimization problem is NP-hard, since the edge-disjoint pair of paths problem can be easily reduced to it [70].
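For completeness, a sketch of the (max, min)-semiring evaluation, computing the maximum over walks of the minimum edge weight; by the cycle-removal argument above, the optimum is attained on a path.

    import math

    def max_bottleneck(n, edges, i, j):
        # (max, min)-semiring: "addition" is max, "multiplication" is min;
        # d[v] is the best value over walks from i to v of exactly k edges.
        d = [-math.inf] * n
        d[i] = math.inf             # the empty walk: min over no edges
        best = d[j]
        for _ in range(1, n):       # at most |V| - 1 edges are needed
            nd = [-math.inf] * n
            for u, v, w in edges:
                nd[v] = max(nd[v], min(d[u], w))
            d = nd
            best = max(best, d[j])
        return best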
2.3.5 Trees
Removing an internal node from a tree splits the tree into several subtrees.
This gives the possibility to build a yield algebra on trees and to solve vertex
coloring and related problems on trees. To do this, the trees should be rooted
as defined below.
Definition 38. Let G = (V, E) be a tree, and v ∈ V an arbitrary vertex.
Rooting the tree at vertex v means an orientation of the edges such that the
orientation of an edge (vi , vj ) is from vi to vj if vj is closer to v than vi . In
such a case, vertex vi is a child of vj and vj is the parent of vi . Vertex v
is called the root of the tree. The subtree rooted in vertex vj is the tree that
contains vj and all of its descendants (its children, the children of its children,
etc.).
Every node in a rooted tree except its root has exactly one parent. Any
internal node has at least one child. The central task here is to solve coloring
problems on trees. First, the definition of r-proper coloring is given.
Definition 39. Let G = (V, E) be a rooted tree, C is a finite set of colors,
and r ⊆ C × C is an arbitrary relation. A coloring c : V → C is an r-proper
coloring if for all oriented edges (vi , vj ) ∈ E, (c(vi ), c(vj )) ∈ r.
The r-proper coloring is an extension of the usual proper coloring defini-
tion. Indeed, if
r = (C × C) \ (∪c∈C {(c, c)})
then the r-proper coloring coincides with the usual definition of proper color-
ing.
The yield algebra of the r-proper coloring of a tree rooted in v can be
constructed based on the subtrees of the tree and its proper colorings. Let
G = (V, E) be a tree rooted in vertex v, let C be a finite set of colors, and
let r ⊆ C × C be a relation. Let A be the set of r-proper colorings of all the
subtrees of G. The parameters (u, c) describe the root of the subtree, u, and
its color, c. The partial ordering of the parameters is based on the root of the
subtrees, (u1 , c1 ) ≤ (u2 , c2 ) if the path from u1 to v contains u2 . The arity
of the operator ◦u,c is the number of children of vertex u. It takes a subtree
for each child, and connects them together with their common parent node u
colored by color c. The recursions are
S((u, c)) = ◦u,c (⊔_{c1|(c1,c)∈r} S((w1, c1)), . . . , ⊔_{ck|(ck,c)∈r} S((wk, ck))) (2.136)
where the wi nodes are the children of u. The initial condition for a leaf u
is that S((u, c)) is the set containing the trivial tree consisting only of the
vertex u, colored by color c. There are problems where the colors are given
for the leaves of the tree and the task is to give r-proper coloring of the entire
tree. For those problems, the initial conditions prescribe that S((u, c)) be the
empty set if leaf u is not colored by c.
Combining this yield algebra with different evaluation algebras provides
algebraic dynamic programming algorithms to find, for example,
(a) the number of r-proper colorings of a tree with a given number of colors,
(b) the number of independent vertex sets of a tree (see the sketch below),
(c) the size of the largest independent set of a tree,
(d) the coloring that minimizes

∑_{(u1,u2)∈E} w(c(u1), c(u2))

for a given weight function w : C × C → R.
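For instance, for item (b), the two colors are “in the set” and “not in the set”, and r forbids two neighboring “in the set” vertices. The counting evaluation algebra then collapses to the following Python sketch, where children is assumed to map each vertex to the list of its children in the rooted tree.

    def count_independent_sets(children, root):
        # (incl, excl): the number of independent sets of the subtree in
        # which the subtree root is / is not in the set
        def rec(u):
            incl, excl = 1, 1
            for w in children[u]:
                ci, ce = rec(w)
                incl *= ce          # children of an included vertex are excluded
                excl *= ci + ce     # children of an excluded vertex are free
            return incl, excl
        i, e = rec(root)
        return i + e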
with the initial condition S(0) = {ε}, where ε is the empty string. To count
the number of subsequences, we simply have to replace each disjoint union
with the addition and each ◦a operation with multiplication by 1. Since the
number of indices between g(i) and i − 1 is comparable with the length of X,
this algorithm runs in O(n2 ) time.
However, if subtraction is allowed, the number of different subsequences
can be counted in linear time. Let m(i) denote the number of different subse-
quences of the prefix Xi , and let g(i) be defined as above. Then
m(i) = 2m(i − 1) − m(g(i)). (2.138)
Indeed, we can extend any subsequence of Xi−1 with the character xi. This yields some overcounting, as some of the subsequences are obtained twice in this way; however, exactly those subsequences are counted twice that are also subsequences of Xg(i). Clearly, the recursion in Equation (2.138) takes constant
time for each index, and therefore, the number of different subsequences can
be computed in linear time if subtractions are allowed.
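A Python sketch of the linear-time recursion; the dictionary stores, for each character, the value to be subtracted at its next occurrence, and the count includes the empty subsequence.

    def count_distinct_subsequences(X):
        last = {}     # character -> m-value at its previous occurrence
        m = 1         # m(0): only the empty subsequence
        for c in X:
            m, last[c] = 2 * m - last.get(c, 0), m
        return m

For example, count_distinct_subsequences("aba") returns 7, counting ε, a, b, aa, ab, ba and aba.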
Schnorr showed that matrix multiplication of two n × n matrices requires
Ω(n3 ) arithmetic operations if only multiplications and additions are allowed
[151]. However, matrix multiplication can be done in O(n^{log₂ 7}) time if subtractions are allowed [162], or even faster [47, 51]. Subtractions might have
even more computational power. We are going to discuss it further at the end
of Chapter 3.
Branching (test and branch instructions in an algorithm) might also speed
up calculations. It is well known that a greedy algorithm can find a minimum
spanning tree in an edge-weighted graph in polynomial time [116]. On the
other hand, Mark Jerrum and Marc Snir proved that computing the spanning
tree polynomial needs an exponential number of additions and multiplications
[101]. Let G = (V, E) be a graph, and assign a formal variable xi to each edge
ei of G. The spanning tree polynomial is defined as

∑_{T∈T(G)} ∏_{xi∈g(T)} xi (2.139)

where T(G) denotes the set of spanning trees of G, and for each spanning tree T, g(T) denotes the set of formal variables assigned to the edges of T.
It is easy to see that the evaluation of the spanning tree polynomial in the
tropical semiring is the score (that is, the sum of the weights of the edges)
of the minimum spanning tree. This score can be calculated with a simple
greedy algorithm but not with algebraic dynamic programming, at least not in
polynomial time. Also, the number of spanning trees, as well as the number of
minimum spanning trees can be computed in polynomial time, however, such
a computation needs subtraction. See also Chapter 3 for more details. Thus,
spanning trees are combinatorial objects for which optimization and counting
problems both can be computed in polynomial time, however, these algorithms
cannot be described in the algebraic dynamic programming framework, and
they are quite different.
2.5 Exercises
1. Show that the variants of the tropical semiring given in Definition 23
are all isomorphic to the tropical semiring.
2. Let ai,k and ni,k be the coefficients of the polynomials defined in Equations (2.4) and (2.5). Show that

(b) calculates

∑_{S⊆A, |S|=k} ∏_{ai∈S} ai.
14. Prove that two parse trees, both in Chomsky normal form, generating
the same sequence contain the same number of internal nodes, thus the
same number of rewriting rules.
15. Give a dynamic programming algorithm that counts all possible genera-
tions of a sequence by a context-free grammar in Chomsky normal form.
Here two generations
S = A1 → A2 → . . . → An = A
and
S = B1 → B2 → . . . → Bn = A
are considered to be different even if they have the same parse tree.
Note that although all parse trees generating the same sequence contain
the same number of rewriting rules, different parse trees might represent
different numbers of generations.
16. Suppose that in a Dyck word, each x is replaced by (123) and each y
is replaced with (12), and the Dyck word is evaluated as the product
of the defined cycles in the permutation group S3 . For example, xxyy
becomes
(123)(123)(12)(12) = (123)(123) = (132)
on the other hand, xyxy becomes
Notice that the operation defined by the table is neither associative nor
commutative. Give a dynamic programming algorithm that counts how
many parenthesizations of a given sequence of symbols {a, b, c} there are
that yield a.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.6.
18. Let B be a Boolean expression containing words from V = {“TRUE”, “FALSE”} and from O = {“and”, “or”, “xor”}. The expression is legitimate, that is, it starts and ends with a word from V and contains words from V and O alternately. Give a dynamic programming algorithm that counts how many ways there are to parenthesize the expression such that it will evaluate to “TRUE”.
19. An algebraic expression contains positive integer numbers, and addition
and multiplication operations. Give a dynamic programming algorithm
that
(a) calculates the maximum value that a parenthesization might take
and
(b) calculates the average value of a random parenthesization.
20. * Give a dynamic programming algorithm that counts how many se-
quence alignments of two sequences, X and Y , there are which contain
s substitutions, i insertions and d deletions.
21. * Assume that breaking a string of length l costs l units of running time in some string-processing programming language. Notice that different breaking scenarios using the same breaking positions might have different running times. For example, if a 30-character string is broken at positions 5 and 20, then making the first cut at position 5 has a total running time of 30 + 25 = 55; on the other hand, making the first break at position 20 has a total running time of 30 + 20 = 50.
Give a dynamic programming algorithm that counts
(a) how many ways there are to break a string into m + 1 pieces at
m prescribed points such that the total running time spent on
breaking the substrings into smaller pieces is exactly w units and
(b) how many ways there are to break a string of length n into m +
1 pieces such that the total running time spent on breaking the
substrings into smaller pieces is exactly w units.
Two breaking scenarios are not distinguished if they use the same breaks
of the same substrings just in different order. For example, breaking a
string at position 20 and then at position 10 and then at position 30 is
not distinguished from the scenario which breaks a string at position 20
then at position 30 and then at position 10.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.6.
23. There are n biased coins, and the coin with index i has probability pi
for a head. Give a dynamic programming algorithm that computes the
probability that there will be exactly k heads when all the coins are
tossed once.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.10.
24. * Given a convex polygon P of n vertices in the Euclidean plane (by the coordinates of the vertices). A triangulation of P is a collection of
n − 3 diagonals such that no two diagonals cross each other. Notice that
a triangulation breaks a polygon into n − 2 triangles. Give a dynamic
programming algorithm that calculates the maximum and average score
of a triangulation if it is defined by
(a) the sum of the edge lengths, and
(b) the product of the edge lengths.
(a) two breaking scenarios are equivalent if they contain the same
breakings in different order, and
(b) the order of the breakings counts.
Question for fun: What is the minimum number of breakings necessary
to get nm number of 1 × 1 pieces?
26. There is a rectangular piece of cloth with dimensions n × m, where n and m are positive integers. Also given is a list of k products; for each product, a triplet (ai, bi, ci) is given, such that the product needs a rectangle of cloth of dimensions ai × bi and can be sold at a price ci. Assume that ai, bi and ci are all positive integers. Any rectangular piece of cloth can be cut either horizontally or vertically into two pieces with integer dimensions. Give a dynamic programming algorithm that counts how many ways there are to cut the cloth into smaller pieces maximizing the total sum of prices. The order of the cuts does not count, but on the other hand, it does count where the smaller pieces are located on the n × m rectangle.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.14.
30. Give a dynamic programming algorithm that calculates for each k how
many increasing subsequences of length k are in a given permutation.
31. A sequence of numbers is called zig-zag if the differences of the consecutive numbers are alternately positive and negative. Give a dynamic programming algorithm that computes for each k how many zig-zag subsequences of length k there are in a given permutation.
32. * Give a dynamic programming algorithm that for an input text, an
upper bound t, and for each k and x, calculates how many ways there
are to break the text into k lines such that each line contains at most t
characters, and the squared number of spaces at the end of the lines for
each line has a total sum x.
33. A sequence is palindromic if it is equal to its reverse. Give a dynamic
programming algorithm that in a given sequence
(a) finds the longest palindromic subsequence,
(b) calculates the number of longest palindromic subsequences, and
(c) calculates the average length of palindromic subsequences.
36. * Give a dynamic programming algorithm that for a given rooted binary
tree
(a) finds the number of subsets of vertices not containing any neigh-
boring vertices,
(b) computes the number of proper colorings of vertices with k ≥ 3
colors (recall that no neighboring vertices should have the same
color in a proper coloring of the vertices of a graph), and
(c) computes the number of ways that the vertices can be partially
colored with k ≥ 2 colors such that no neighboring vertices have
the same colors, however, one or both might be uncolored.
39. ◦ Generalize Exercise 38 such that diagonal steps are possible on both
paths.
40. * Let A = {a1 , a2 , . . . an } be a set of positive integers. Give a dynamic
programming algorithm that calculates
(a) the number of ways A can be split into 3 disjoint subsets, X, Y and Z, such that

∑_{ai∈X} ai = ∑_{aj∈Y} aj = ∑_{ak∈Z} ak
of a random tripartition X t Y t Z = A.
41. Given a directed graph G = (V, E) and a weight function w : E → R, for any path π, define

f(π) := min_{e∈π} w(e).
2.6 Solutions
Exercise 3. Construct the following yield algebra. Let A be the subsequences
of prefixes of the given string. Technically, these subsequences can be repre-
sented as sequences from {0, 1}, where 0 means that the corresponding char-
acter is not part of the subsequence and 1 means that the corresponding char-
acter is part of the subsequence. For example, if A = 1, −5, 2, 4, −3, 2, 1, 1, 3,
then 010110 denotes −5, 4, −3. Any subsequence is parameterized with the
length of the prefix and the parity indicator of the length of the subsequence.
That is, Θ contains the (i, p) pairs, i ∈ [0, . . . n], p ∈ {0, 1}. The parame-
ters are partially ordered based on their first coordinate. The operation ◦1
concatenates 1 to the end of the sequence representing the subsequence, and
the operation ◦0 concatenates 0 to the end of the sequence representing the
subsequence. The recursion of the yield algebra is

S((i, p)) = ◦1 (S((i − 1, 1 − p))) ⊔ ◦0 (S((i − 1, p)))

with initial values S((0, 0)) = {ε} and S((0, 1)) = ∅, where ε denotes the empty sequence.
When the largest product is to be calculated, the evaluation algebra is the following. R = ((R≥0 ∪ {−∞}) × (R≥0 ∪ {−∞}), ⊕, ⊙) is a semiring, where ⊕ is the coordinatewise maximum with the rule

max{−∞, a} = a

and

(x1, y1) ⊙ (x2, y2) = (max{x1x2, y1y2}, max{x1y2, y1x2}),

where the multiplication of −∞ with itself is defined as −∞. The rationale is that the first coordinate stores the largest available maximum and the second coordinate stores the absolute value of the largest possible negative value, if it exists. Here −∞ stands for “not defined”. If X is a subsequence
{x1, x2, . . . , xk}, then

f(X) = (∏_{i|xi=1} (−1)^i ai, −∞) if ∏_{i|xi=1} (−1)^i ai > 0,

f(X) = (−∞, |∏_{i|xi=1} (−1)^i ai|) if ∏_{i|xi=1} (−1)^i ai < 0, and

f(X) = (0, 0) if ∏_{i|xi=1} (−1)^i ai = 0.
The T1 function for the operator ◦1 depends on the parameter (i, p). If (−1)^i ai > 0, then T1 is the multiplication with ((−1)^i ai, −∞), and if (−1)^i ai < 0, then T1 is the multiplication with (−∞, |(−1)^i ai|). If ai = 0, then T1 is the multiplication with (0, 0). Finally, the function T0 for the operator ◦0 is the identity function.
When the sum of the score of all possible subsequences is to be calculated,
the evaluation algebra is the following. R is the real field; if X is a subset, then

f(X) = ∏_{ai∈X} (−1)^i ai.
The function T1 for operation ◦1 depends on the parameter (i, p); it is the multiplication with (−1)^i ai. The function T0 for the operation ◦0 is the identity function.
Exercise 6. The following unambiguous regular grammar generates the pos-
sible sequences. T = {a, b}, N = {S, A, B, X} and the rewriting rules are
S → aA | bB | a | b
A → aA | bX | a | b
B → aA | bB | a | b
X → bB | b.
The nonterminal A means that the last generated character was a, B means
that the last generated character was b and the next to last character was not
a, and X means that the last generated character was b and the next to last
character was a. The usual yield algebra and the corresponding evaluation
algebra as described in Subsection 2.3.1 gives the recursion on the sets of
possible generations and counts them. Since the grammar is unambiguous,
the number of possible generations is the number of possible sequences.
Exercise 7. The following unambiguous grammar generates the possible se-
quences. T = {a, b, c, d, e}, N = {S, W1 , W2 , W3 }, and the rewriting rules are
The non-terminal S means that in the so-far generated sequence, the number
of generated a’s is even, and the sum of the number of c’s and d’s is even.
Similarly, W1 stands for even-odd, W2 stands for odd-even, and W3 stands for
odd-odd. The usual yield algebra and the corresponding evaluation algebra
as described in Subsection 2.3.1 gives the recursion on the sets of possible
generations and counts them. Since the grammar is unambiguous, the number
of possible generations is the number of possible sequences.
Exercise 10. Let a sequence from the alphabet {0, 1} represent a subset of
indexes of locations, such that the sequence X = x1 x2 . . . xk represents
{i | xi = 1} .
with initial condition S((0, 0)) = {ε}, where ε denotes the empty sequence.
Define d0 to be −∞, thus dk − d0 > d for any k.
If the task is to calculate the maximum expected profit, then the evalua-
tion algebra can be constructed in the following way. K is the dual tropical
semiring, that is, the tropical addition is taking the maximum instead of the
minimum, and the additive unit is −∞ instead of ∞. The f function is defined
as
f(x1x2 . . . xk) := ∑_{i|xi=1} pi.
The T1 function for the operator ◦1 depends on parameter (k, k); it is the
tropical multiplication (that is, the usual addition) with pk . The T0 function
for operator ◦0 is the identity function. The maximum expected profit is
If the task is to calculate the variance of the expected profit of the uniform distribution of legal opening plans, then R is set to (R³, ⊕, ⊙), where ⊕ is the coordinate-wise addition, and
The function T1 for the operator ◦1 depends on the parameter (k, k); it is the multiplication with (1, pk, pk²) in K. T0 for the operation ◦0 is the identity function.
Define (N, M, Z) as the number of the opening plans, the sum of their expected profits, and the sum of their squared expected profits, respectively; the variance is then Z/N − (M/N)².
[Figure: the six admissible pebble column patterns, read top to bottom: a) 1000, b) 0100, c) 0010, d) 0001, e) 1010, f) 0101.]
S → aA | bB | cC | dD | eE | f F | a | b | c | d | e | f
A → bB | cC | dD | f F | b | c | d | f
B → aA | cC | dD | eE | a | c | d | e
C → aA | bB | dD | f F | a | b | d | f
D → aA | bB | cC | eE | a | b | c | e
E → bB | dD | f F | b | d | f
F → aA | cC | eE | a | c | e
It is easy to see that this grammar defines the possible sequences of columns
such that no horizontal or vertical neighbors have pebbles. Define the score
of a generation as the sum of the numbers on the squares that have pebbles.
The yield algebra is the standard one for generations in a regular grammar, and the scores are handled in the evaluation algebra. The semiring in the evaluation algebra should be appropriately chosen according to the given computational task.
Exercise 13. The index i is the number of columns in the alignment without
gap symbols, that is, the alignment columns with a match or a mismatch.
Their number might vary between 0 and min{n, m}. If there are i matches or
mismatches in an alignment, then there are n−i deletions and m−i insertions.
The total length of such an alignment is n + m − i. The number of alignments
with these properties is indeed

(n + m − i choose n − i, m − i, i) = (n + m − i)! / ((n − i)!(m − i)! i!).
Exercise 17. Define the following yield algebra. Let A be the possible paren-
thesizations of the possible substrings. Recall that a substring is a consecutive
part of a sequence. The possible parenthesizations of a sequence X are se-
quences from the alphabet {(, ), a, b, c} satisfying the following rules.
• The subsequence obtained by removing the “(” and “)” symbols is X.
• The subsequence obtained by removing characters a, b and c is a Dyck
word.
• There are at least two characters between an opening bracket and its
corresponding closing bracket.
The binary operator ◦ concatenates two parenthesizations and puts the result into brackets: X1 ◦ X2 := (X1X2).
where δxi,yj is the Kronecker delta function. The initial condition is S((0, 0, 0, 0)) = {ε}, where ε denotes the empty alignment. The evaluation algebra simply counts the size of the sets, that is, R is the integer ring, f is the constant 1 function, and all T functions corresponding to operators ◦i,j, ◦−,j and ◦i,− are the identity functions.
An alternative solution is also possible. In this solution, the yield algebra
is the following. A is the possible alignments of the possible prefixes. The
parameters are pairs (i, j) denoting the length of the prefixes, (i1 , j1 ) ≤ (i2 , j2 )
if i1 ≤ i2 and j1 ≤ j2 . The operators ◦i,j , ◦−,j and ◦i,− are the same as defined
above. The recursions are
S((i, j)) = ◦i,j (S((i − 1, j − 1))) t ◦−,j (S((i, j − 1))) t ◦i,− (S((i − 1, j)))
with initial condition S((0, 0)) = {ε}. In the evaluation algebra, R = Z[z1, z2], the two-variable polynomial ring over Z. The function f maps an alignment with s substitutions and d deletions to the monomial z1^s z2^d. The Ti,j function for operator ◦i,j is the multiplication with z1^{1−δxi,yj}, T−,j is the identity function and Ti,− is the multiplication with z2. The number of alignments with s substitutions and d deletions is the coefficient of z1^s z2^d of the polynomial F(S((n, m))), where n and m are the lengths of X and Y, respectively.
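A small Python sketch of this second solution, replacing the polynomial ring Z[z1, z2] by an explicit dictionary of coefficients indexed by the exponent pair (s, d):

    from collections import Counter

    def alignment_counts(X, Y):
        # F[i][j][(s, d)] = number of alignments of X[:i] and Y[:j] with
        # s substitutions (mismatches) and d deletions
        n, m = len(X), len(Y)
        F = [[Counter() for _ in range(m + 1)] for _ in range(n + 1)]
        F[0][0][(0, 0)] = 1
        for i in range(n + 1):
            for j in range(m + 1):
                cur = F[i][j]
                if i and j:                        # match/mismatch column
                    s_inc = 0 if X[i-1] == Y[j-1] else 1
                    for (s, d), c in F[i-1][j-1].items():
                        cur[(s + s_inc, d)] += c
                if j:                              # insertion column
                    for (s, d), c in F[i][j-1].items():
                        cur[(s, d)] += c
                if i:                              # deletion column
                    for (s, d), c in F[i-1][j].items():
                        cur[(s, d + 1)] += c
        return F[n][m]

The number of insertions need not be tracked: an alignment with d deletions has n − d match/mismatch columns and hence m − n + d insertions.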
Exercise 21. To solve the first subexercise, consider a sequence X whose
characters are the m + 1 segments of the original sequence. Each character in
x has a weight w(x) defined as the number of characters in the corresponding
segment. A breaking scenario is a series of breaks, bi1 bi2 . . . bik , where bi is the
break at the border of segments xi and xi+1 . A breaking scenario bi1 bi2 . . . bik
is canonical if for any j < l either ij < il or the break bij acts on a substring
that includes xil (or both conditions hold).
In the yield algebra, A is the possible canonical breaking scenarios of the
possible substrings of X. The parameter (i, j) denotes the first and the last
with the initial conditions S((i, i)) = {ε}, where ε is the empty sequence (of breaking steps).
In the evaluation algebra, R = Z[z], the polynomial ring over the integers. The function f maps a breaking scenario with total running time W to z^W. The function T for the operator ◦ depends on the parameters; if p(Q) = (i, k) and p(R) = (k + 1, j), then
T(Q, R) = z^{∑_{l=i}^{j} w(xl)} f(Q)f(R)

and similarly, for two values k1, k2 ∈ K,

T(k1, k2, (i, k), (k + 1, j)) = z^{∑_{l=i}^{j} w(xl)} k1 k2.
The number of breaking scenarios with total running time w is the coefficient of z^w in F(S((1, m + 1))).
In the second subexercise, A, the base set of the yield algebra contains the
possible canonical breaking scenarios of the possible substrings of the original
string into smaller substrings (not necessarily into single characters). The
canonical breaking scenarios are defined similarly as above. The parameter
(i, j, l) denotes the beginning of the substring, the end of the substring and
the number of breaks, respectively. In the partial ordering of the parameters,
(i1 , j1 , l1 ) ≤ (i2 , j2 , l2 ) if i2 ≤ i1 , j1 ≤ j2 and l1 ≤ l2 . The operation ◦ operates
on a pair of breaking scenarios operating on consecutive substrings; if p(Q) =
(i, k, l1 ) and p(R) = (k + 1, j, l2 ), then
Q ◦ R = bk , Q, R.
The recursions are

S((i, j, l)) = ⊔_{k=i}^{j−1} ⊔_{l1=0}^{l−1} S((i, k, l1)) ◦ S((k + 1, j, l − l1 − 1))

with initial conditions S((i, j, 0)) = {ε}, where ε denotes the empty sequence
(of breaking steps).
In the evaluation algebra, R = Z[z], the polynomial ring over the integers. The function f maps a breaking scenario with total running time W to z^W. The function T for the operator ◦ depends on the parameters; if p(Q) = (i, k, l1) and p(W) = (k + 1, j, l2), then

T(Q, W) = z^{j−i+1} f(Q)f(W)
Define the following yield algebra. The base set A contains the triangulations of sub-polygons (vj, vj+1, . . . , vi), described as the canonical ordering of the diagonals participating in the triangulations, where i > j. The parameters are the pairs (i, j) describing the former diagonal (vi, vj) of the sub-polygon. In the partial ordering of the parameters, (i1, j1) ≤ (i2, j2) if i1 ≤ i2 and j2 ≤ j1 (observe that i > j). If p(Q) = (i, k) and p(R) = (k, j), then the operator ◦ acts on them as

Q ◦ R = (vi, vk), Q, (vk, vj), R.

The unary operator ◦l is defined as

◦l(Q) = (vi, vj), Q.
Here is the tropical multiplication, that is, the usual addition, and |(vi , vj )|
denotes the length of the edge (vi , vj ). The function Tl for the operator ◦l also
depends on the parameter; if p(Q) = (i, j), then

Tl(Q) = |(vi, vj)| ⊙ f(Q).
The function T for operator ◦ depends on the parameters. If p(Q) = (i, k) and
p(W) = (k, j), then

T(r1, r2, (i, k), (k, j)) = (1, |(vi, vk)|) ⊙ r1 ⊙ (1, |(vk, vj)|) ⊙ r2.
The function Tl for the operator ◦l depends on the parameter; if p(Q) = (i, j), then

Tl(Q) = (1, |(vi, vj)|) ⊙ f(Q)

and similarly, for any r ∈ R,
If F(S((n, 1))) = (x, y), then x is the number of possible triangulations (it is easy to check that x is the (n − 2)nd Catalan number) and y is the total sum of the scores of the triangulations. The average score is simply y/x.
When the score is multiplicative and the task is to calculate the minimum score of the triangulations, then R is the exponentiated tropical semiring, that is, R = (R+ ∪ {∞}, ⊕, ⊙), where ⊕ is the minimum, and ⊙ is the usual multiplication. The function f is defined as the product of the diagonal lengths
in the triangulation. The function T for operator ◦ depends on the parameters;
if p(Q) = (i, k) and p(W ) = (k, j), then
The function Tl for operator ◦l depends on the parameters; if p(Q) = (i, j),
then
Tl(Q) = |(vi, vj)| ⊙ f(Q),
and similarly, for any r ∈ R
The function T for operator ◦ depends on the parameters; if p(Q) = (i, k) and p(W) = (k, j), then

T(r1, r2, (i, k), (k, j)) = (1, |(vi, vk)|) ⊙ r1 ⊙ (1, |(vk, vj)|) ⊙ r2.

The function Tl for operator ◦l depends on the parameters; if p(Q) = (i, j), then

Tl(Q) = (1, |(vi, vj)|) ⊙ f(Q),

and similarly, for any r ∈ R,
If F (S((n, 1))) = (x, y), then x is the number of triangulations, and y is the
sum of the scores. The average score is y/x.
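A Python sketch for the additive score: the count and the total score evolve together in the pair ring, and pts is assumed to list the vertex coordinates of the convex polygon in order.

    import math

    def triangulation_stats(pts):
        # C[i][j], S[i][j]: number / total score of the triangulations of
        # the sub-polygon v_i, ..., v_j, the score of a triangulation being
        # the sum of the lengths of its diagonals
        n = len(pts)
        C = [[0] * n for _ in range(n)]
        S = [[0.0] * n for _ in range(n)]
        for i in range(n - 1):
            C[i][i + 1] = 1             # a single edge: empty triangulation
        for span in range(2, n):
            for i in range(n - span):
                j = i + span
                for k in range(i + 1, j):   # triangle (v_i, v_k, v_j)
                    w = 0.0
                    if k - i > 1:           # (v_i, v_k) is a diagonal
                        w += math.dist(pts[i], pts[k])
                    if j - k > 1:           # (v_k, v_j) is a diagonal
                        w += math.dist(pts[k], pts[j])
                    C[i][j] += C[i][k] * C[k][j]
                    S[i][j] += (S[i][k] * C[k][j] + C[i][k] * S[k][j]
                                + C[i][k] * C[k][j] * w)
        count, total = C[0][n - 1], S[0][n - 1]
        return count, total / count

As noted in the remark below, the count equals the (n − 2)nd Catalan number for a convex polygon.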
Remark. If P is a convex polygon, then x, the number of triangulations,
is the n − 2nd Catalan number, and thus, in the last subexercise, R could be
chosen as the real number field, with the usual addition and multiplication.
However, if P is concave, then the recursion in Equation (2.142) in the yield algebra can be modified in such a way that only those operations are considered for which the emerging diagonals are inside the polygon. Then the appropriate evaluation algebra still counts the number of triangulations, which will no longer be the (n − 2)nd Catalan number.
Exercise 25. The two sub-exercises need different yield algebras and eval-
uation algebras. If the order of the breaking counts, then the yield algebra
is simple, however, the evaluation algebra is tricky. When the order of the
breakings does not count, the yield algebra is complicated and the evaluation
algebra is simple.
When the order of the breaks does not count, a canonical ordering of the
breakings must be defined. For any chocolate bar larger than 1 × 1, there is
either at least one vertical break running through the entire chocolate bar or
there is at least one horizontal break running through the entire chocolate bar,
however it is impossible to have both horizontal and vertical breaks running
through the entire chocolate bar. Call these breaks long breaks. If there are one
or more vertical long breaks, the canonical order starts with the leftmost break
breaking the bar into pieces B1 and B2 (left and right pieces, respectively),
then followed by the canonical order of breaks of B1 which must start with
Q ◦h R := bi1,h, Q, R

where bi1,h denotes the horizontal break breaking the chocolate bar at the horizontal line after the first i1 rows. If p(Q) = (i, j1, h) and p(R) = (i, j2, x), then the ◦v operator is defined as

Q ◦v R := bj1,v, Q, R

where bj1,v denotes the vertical break breaking the chocolate bar at the vertical line after the first j1 columns. The recursions are
with the initial condition S((1, 1)) = {ε}, where ε is the empty string (of
breaking steps). The evaluation algebra is the standard one for computing the
size of the sets, that is, R is the integer ring, f is the constant 1 function, and
both functions for the operators ◦h and ◦v are the multiplication.
When the order does count, the possible breaking scenarios are clus-
tered based on a canonical ordering defined below. The first breaking of
the canonical ordering of a breaking scenario is its first break b breaking
the bar into pieces B1 and B2 (the left and right or the top and bottom
pieces, respectively), then followed by the canonical ordering of the breaks in
B1 , b1,1 , b1,2 , . . . , b1,k1 and then the canonical ordering of the breaks in B2 ,
b2,1, b2,2, . . . , b2,k2. If there are g1 breaking scenarios of B1 whose canonical ordering is b1,1, b1,2, . . . , b1,k1 and there are g2 breaking scenarios of B2 whose canonical ordering is b2,1, b2,2, . . . , b2,k2, then there are

(k1 + k2 choose k1) g1 g2

breaking scenarios of the whole bar with this canonical ordering, since the breaks of B1 and B2 can be interleaved arbitrarily.
The idea is to define a yield algebra building the canonical ordering of the
breaking scenarios, then cz k will be assigned to each canonical ordering, where
c is the number of breaking scenarios that the canonical ordering represents,
and k is the number of breaks in it.
Having said this, define the following yield algebra. The base set A contains
the canonical ordering of the possible breaking scenarios of the chocolate bars
with dimensions i×j. The parameters (i, j) denote these dimensions, (i1 , j1 ) ≤
(i2 , j2 ) if i1 ≤ i2 and j1 ≤ j2 . If p(Q) = (i1 , j) and p(R) = (i2 , j), then the
operator ◦h is defined as
Q ◦h R := bi1 ,h , Q, R
where bi1 ,h denotes the horizontal break breaking the chocolate bar at the
horizontal line after the first i1 rows. If p(Q) = (i, j1 ) and p(R) = (i, j2 ), then
the ◦v operator is defined as
Q ◦v R := bj1 ,v , Q, R
where bj1 ,v denotes the vertical break breaking the chocolate bar at the vertical
line after the first j1 columns. The recursions are
In the evaluation algebra, R = Z[z], the polynomial ring over the integers. The function f maps a canonical ordering to cz^k, where c is the number of breaking scenarios that the canonical ordering represents, and k is the number of breaks in it. Both functions Th and Tv for the operators ◦h and ◦v are the following convolution of polynomials:
(∑_{i=0}^{k1} ai z^i) ⊙ (∑_{j=0}^{k2} bj z^j) = ∑_{l=1}^{k1+k2+1} cl z^l

where

c_{k+1} = ∑_{i=0}^{k} (k choose i) ai b_{k−i}.
The total number of breaking scenarios is the sum of the coefficients in
F (S((n, m))).
The answer to the question for fun is that any break takes one piece of chocolate bar and results in two pieces, therefore any breaking scenario of an n × m chocolate bar needs nm − 1 breakings (the number of pieces should be increased from 1 to nm, and each break increases the number of pieces by 1). Thus, F(S((n, m))) is only one monomial, cz^{nm−1}. Therefore, the evaluation algebra can be simplified in the following way. R is the integer ring. The function f maps each canonical ordering to the number of breaking scenarios it represents. Both Th and Tv depend on the parameters; if p(Q) = (i1, j) and p(W) = (i2, j), then

Th(Q, W) = (i1 j + i2 j − 2 choose i1 j − 1) f(Q)f(W),
⊔ ◦l (S((i, j − 1))).
For each i, the initial value S((i, i)) is the set containing the tree with a single
vertex labeled by li .
If the task is to find the minimum possible cost, then the algebraic structure
R in the evaluation algebra is the tropical semiring. The f function assigns
the score to each binary search tree. The function T for operator ◦
depends on the parameters; if p(Q) = (i, k − 1) and p(W) = (k + 1, j), then

T(Q, W) = f(Q) ⊙ f(W) ⊙ ⨀_{m=i}^{j} π(lm)
where is the tropical multiplication, that is, the usual addition. The expla-
nation for this definition is that each distance in Q and W is increased by
1, furthermore, the new root is labeled with lk , and this new vertex also has
distance 1. Similarly, for any r1 , r2 ∈ R,
T(r1, r2, (i, k − 1), (k + 1, j)) = r1 ⊙ r2 ⊙ ⨀_{m=i}^{j} π(lm).
The functions Tl and Tr also depend on the parameter. If p(Q) = (i, j), then

Tr(Q) = f(Q) ⊙ ⨀_{m=i−1}^{j} π(lm),
If the task is to calculate the average score of the possible binary search
trees, then R = (R², ⊕, ⊙), where the addition is coordinatewise and the multiplication is defined as

(x1, y1) ⊙ (x2, y2) = (x1x2, x1y2 + y1x2).
For any binary search tree with score x, the f function assigns the value (1, x).
The function T for operator ◦ depends on the parameters; if p(Q) = (i, k − 1) and p(W) = (k + 1, j), then

T(Q, W) = f(Q) ⊙ f(W) ⊙ (1, ∑_{m=i}^{j} π(lm)).
The functions Tr and Tl for operators ◦r and ◦l also depend on the parameter. If p(Q) = (i, j), then

Tr(Q) = f(Q) ⊙ (1, ∑_{m=i−1}^{j} π(lm))

and

Tl(Q) = f(Q) ⊙ (1, ∑_{m=i}^{j+1} π(lm)),

and for any r1 ∈ R,

Tr(r1, (i, j)) = r1 ⊙ (1, ∑_{m=i−1}^{j} π(lm)),

and

Tl(r1, (i, j)) = r1 ⊙ (1, ∑_{m=i}^{j+1} π(lm)).

If F(S((1, n))) = (x, y), then the average score is y/x.
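The tropical (minimization) variant of this solution is the classical optimal binary search tree recursion; a compact Python sketch, with the access probabilities assumed to be given as a list p:

    from functools import lru_cache

    def optimal_bst_cost(p):
        # Minimum of sum_m p[m] * depth(l_m), the root having depth 1:
        # choosing the root l_k of the subtree on l_i, ..., l_j adds
        # sum_{m=i..j} p[m] to the optima of the two sub-intervals.
        n = len(p)
        prefix = [0.0] * (n + 1)
        for i, x in enumerate(p):
            prefix[i + 1] = prefix[i] + x
        @lru_cache(maxsize=None)
        def c(i, j):
            if i > j:
                return 0.0
            w = prefix[j + 1] - prefix[i]   # tropical factor over π(l_m)
            return w + min(c(i, k - 1) + c(k + 1, j)
                           for k in range(i, j + 1))
        return c(0, n - 1)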
Exercise 28. Let v be an arbitrary leaf of T = (V, E) and define a partial
ordering ≤ on V such that v1 ≤ v2 if the unique path from v1 to v contains
v2 . Let Tvi denote the subtree which contains the vertices which are smaller
than or equal to vi. A matching of a subtree Tvi = (Vvi, Evi) is a mapping M :
Evi → {0, 1} such that for any two edges e1 , e2 ∈ Evi , if M (e1 ) = M (e2 ) = 1,
then e1 and e2 are disjoint. The size of a matching is |{e|M (e) = 1}|.
Define the following yield algebra. The base set A contains all matchings
of all subtrees Tvi. The parameter set Θ is V × {0, 1}. On this parameter set, (vi, xi) ≤ (vj, xj) if vi < vj in the above defined partial ordering of the vertices of T, or vi = vj and xi = xj. If M ∈ A is a matching on Tvi such that one of
the edges covers vi , then p(M ) = (vi , 1), otherwise p(M ) = (vi , 0).
The following operators are defined in the yield algebra. Let u be an inter-
nal node, let w1 and w2 be its children (the vertices that are smaller than u in
the partial ordering). Let e1 = (u, w1 ) and e2 = (u, w2 ). If M1 is a matching
of Tw1 and M2 is a matching of Tw2 , then M1 ◦ M2 is a matching M of Tu
such that M (e1 ) = M (e2 ) = 0 and other assignments follow the mappings in
M1 and M2 .
If M1 is a matching with parameter (w1 , 0) and M2 is a matching of Tw2 ,
then M1 ◦l M2 is a matching M of Tu such that M (e1 ) = 1, M (e2 ) = 0 and
other assignments follow the mappings in M1 and M2 .
If M1 is a matching of Tw1 and M2 is a matching with parameter (w2 , 0),
then M1 ◦r M2 is a matching M of Tu such that M (e1 ) = 0, M (e2 ) = 1 and
other assignments follow the mappings in M1 and M2 .
If the child of v is w, e = (v, w), and M is a matching with parameter (w, 0), then M1 = ◦1(M) is a matching such that M1(e) = 1 and all other mappings follow M. Finally, if M is a matching of Tw, then M0 = ◦0(M) is a matching such that M0(e) = 0 and all other assignments follow M.
The recursions are
S((u, 0)) = ⊔_{x1∈{0,1}, x2∈{0,1}} S((w1, x1)) ◦ S((w2, x2)) (2.143)

S((u, 1)) = ⊔_{x2∈{0,1}} S((w1, 0)) ◦l S((w2, x2)) ⊔ ⊔_{x1∈{0,1}} S((w1, x1)) ◦r S((w2, 0)) (2.144)
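Combining these recursions with the counting evaluation algebra gives the number of matchings of a tree. A Python sketch, generalized from binary trees to arbitrary branching, with children an assumed adjacency structure of the rooted tree:

    def count_matchings(children, root):
        # (m0, m1): matchings of the subtree in which the subtree root is
        # uncovered / covered by a matching edge; children are folded in
        # one by one, which plays the role of the operators above.
        def rec(u):
            m0, m1 = 1, 0
            for w in children[u]:
                w0, w1 = rec(w)
                # either the edge (u, w) is unused, or it is used and both
                # u and w were uncovered before
                m0, m1 = m0 * (w0 + w1), m1 * (w0 + w1) + m0 * w0
            return m0, m1
        m0, m1 = rec(root)
        return m0 + m1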
g(i, j) := (t − (∑_{k=i}^{j} |wk| + j − i))².
Define the following yield algebra. The base set A contains the possible wrap-
pings of the prefix text w1 w2 . . . wi into lines. The parameter i denotes the
number of words in the prefix, and the parameters are naturally ordered. The
operator ◦i,j adds a new line to the wrapped text containing the words from
wi till wj . The recursions are
Note that for an X and one of its palindromic supersequences Y , there might
be more than one injection that certifies that Y is indeed a supersequence
of X. However, each such injection indicates a possible (and different!) way
to make X palindromic. Any injection can be represented as an alignment
of X and Y that does not contain mismatches (substitutions) and deletions,
only matches and insertions. Therefore the algebraic dynamic programming approach counts these alignments.
Based on the above, define the following yield algebra. The base set A is the set of possible alignments of substrings of X to their possible palindromic supersequences. The parameters (i, j) indicate the first and the last indexes of the substrings. In the partial ordering of the parameters, (i1 , j1 ) ≤ (i2 , j2 ) if i2 ≤ i1 and j1 ≤ j2 . The unary operator ◦ takes an alignment with parameters (i, j), adds the alignment column \(\binom{x_{i-1}}{x_{i-1}}\) at the beginning of the alignment, and adds the alignment column \(\binom{x_{j+1}}{x_{j+1}}\) at the end of the alignment. The unary operator ◦l takes an alignment with parameters (i, j), adds the alignment column \(\binom{x_{i-1}}{x_{i-1}}\) at the beginning of the alignment, and adds the alignment column \(\binom{-}{x_{i-1}}\) at the end of the alignment. The unary operator ◦r takes an alignment with parameters (i, j), adds the alignment column \(\binom{-}{x_{j+1}}\) at the beginning of the alignment, and adds the alignment column \(\binom{x_{j+1}}{x_{j+1}}\) at the end of the alignment. The recursions are
parent with color c. In the following recursions, the children of v are u1 and
u2 , w denotes white, b denotes black, r denotes red, n denotes “no color”. The
recursions are
(a)
\[
S((v, w)) = (S((u_1, b)) \sqcup S((u_1, w))) \circ_w (S((u_2, b)) \sqcup S((u_2, w)))
\]
\[
S((v, b)) = S((u_1, w)) \circ_b S((u_2, w))
\]
(b)
\[
S((v, w)) = (S((u_1, b)) \sqcup S((u_1, r))) \circ_w (S((u_2, b)) \sqcup S((u_2, r)))
\]
\[
S((v, b)) = (S((u_1, w)) \sqcup S((u_1, r))) \circ_b (S((u_2, w)) \sqcup S((u_2, r)))
\]
\[
S((v, r)) = (S((u_1, w)) \sqcup S((u_1, b))) \circ_r (S((u_2, w)) \sqcup S((u_2, b)))
\]
(c)
\[
S((v, c_i)) = \Bigl(\bigl(\bigsqcup_{c_j \ne c_i} S((u_1, c_j))\bigr) \sqcup S((u_1, n))\Bigr) \circ_{c_i} \Bigl(\bigl(\bigsqcup_{c_j \ne c_i} S((u_2, c_j))\bigr) \sqcup S((u_2, n))\Bigr)
\]
\[
S((v, n)) = \Bigl(\bigl(\bigsqcup_{c_i} S((u_1, c_i))\bigr) \sqcup S((u_1, n))\Bigr) \circ_n \Bigl(\bigl(\bigsqcup_{c_i} S((u_2, c_i))\bigr) \sqcup S((u_2, n))\Bigr). \tag{2.147}
\]
The evaluation algebra is the standard one counting the size of sets, that is,
R is the integer ring, f is the constant 1 function, and each T operator is the
multiplication.
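As a concrete instance of case (a) together with this counting evaluation algebra, the following Python sketch counts the white/black colorings of a rooted binary tree in which no black vertex has a black child (equivalently, the independent sets of the tree). The encoding of a leaf as None and of an internal vertex as a pair of subtrees is again an assumption of the illustration.

def count_colorings(tree):
    # returns (ways_white, ways_black) for the subtree rooted here
    if tree is None:  # a leaf
        return (1, 1)
    left, right = tree
    lw, lb = count_colorings(left)
    rw, rb = count_colorings(right)
    white = (lw + lb) * (rw + rb)  # S((v, w)) in the recursions (a)
    black = lw * rw                # S((v, b)) in the recursions (a)
    return (white, black)

w, b = count_colorings(((None, None), (None, None)))
print(w + b)  # 41 colorings of the 7-vertex complete binary tree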
Exercise 38. Note that inverting the path from the bottom right corner to the
top left makes it also a path from the top left corner to the bottom right. This
inverted path will step to a shared square in the same number of steps as the
top-down path. Having said this, construct the following yield algebra. The
base set A contains the pair of prefixes of the top-down path and the inverted
down-top path with the same length. The parameter quadruple (i1 , j1 , i2 , j2 )
gives the indexes of the last squares of the prefixes, (i1 , j1 ) and (i2 , j2 ). In the
partial ordering of the parameters, (i1 , j1 , i2 , j2 ) ≤ (l1 , m1 , l2 , m2 ) if ik ≤ lk and jk ≤ mk for both k = 1, 2. The four unary operators, ◦hh , ◦hv , ◦vh and
◦vv extend the prefixes with horizontal or vertical steps. The recursions are
\[
S((i_1, j_1, i_2, j_2)) = \circ_{hh}(S((i_1, j_1{-}1, i_2, j_2{-}1))) \sqcup \circ_{hv}(S((i_1, j_1{-}1, i_2{-}1, j_2))) \sqcup \circ_{vh}(S((i_1{-}1, j_1, i_2, j_2{-}1))) \sqcup \circ_{vv}(S((i_1{-}1, j_1, i_2{-}1, j_2)))
\]
with the initial condition S((1, 1, 1, 1)) containing a pair of paths both containing the top left square.
If the task is to count the pairs of paths, then the evaluation algebra is the usual one with R = Z, the f function is the constant 1, and each function Txy , x, y ∈ {h, v}, is the identity. The number of pairs of paths is F (S((n, m, n, m))).
If the task is to calculate the maximum score of a tour, then the semiring in the evaluation algebra is the dual tropical semiring (with maximum instead of minimum). The f function is the sum (tropical product) of the coins along the pair of paths, counted without multiplicity. The T functions depend on the parameter. If the parameter of the resulting pair of paths is (i1 , j1 , i2 , j2 ) and i1 = i2 (and then j1 = j2 !), then the function is the tropical multiplication (usual addition) with the number of coins on square mi1 ,j1 . Otherwise, the function is the tropical multiplication (usual addition) with the sum of the coins on the two different squares. The maximum possible sum of coin values is F (S((n, m, n, m))).
Finally, if the task is to calculate the average score of the tours, then R = (ℝ², ⊕, ⊙), where ⊕ is the coordinatewise addition and
\[
(x_1, y_1) \odot (x_2, y_2) = (x_1 x_2,\; x_1 y_2 + y_1 x_2).
\]
The f function assigns (1, w) to a pair of paths, where w is the sum of the coins on the pair of paths, counted without multiplicity. The T functions depend on the parameters. If the resulting pair of paths has parameters (i1 , j1 , i2 , j2 ) and i1 = i2 , then it is the multiplication with (1, wi1 ,j1 ), where wi1 ,j1 is the number of coins on square mi1 ,j1 . Otherwise, it is the multiplication with (1, wi1 ,j1 + wi2 ,j2 ). If F (S((n, m, n, m))) = (x, y), then the average score is y/x.
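The pure counting case is compact enough to spell out. The following Python sketch evaluates the recursion above with R = Z and f = 1; since the two paths are independent in this exercise, the result must be the square of the single-path count, which is a convenient sanity check.

from functools import lru_cache

def count_path_pairs(n, m):
    @lru_cache(maxsize=None)
    def S(i1, j1, i2, j2):
        if (i1, j1, i2, j2) == (1, 1, 1, 1):
            return 1  # the initial condition
        if i1 + j1 != i2 + j2:  # the two prefixes must have equal length
            return 0
        total = 0
        for p1 in ((i1 - 1, j1), (i1, j1 - 1)):      # vertical / horizontal step
            for p2 in ((i2 - 1, j2), (i2, j2 - 1)):  # of the two paths
                if min(p1 + p2) >= 1:
                    total += S(p1[0], p1[1], p2[0], p2[1])
        return total

    return S(n, m, n, m)

print(count_path_pairs(2, 3))  # 9 = 3^2: three single paths on a 2 x 3 board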
Exercise 39. The trick of inverting the down-top path still works. However,
the lengths of the paths might differ, therefore the prefix of the top-down
path and the prefix of the inverted path might step to the same square after a
different number of steps. Therefore the pair of prefixes must be in the same
column to be able to check which squares are shared. An extension of both
prefixes contains a diagonal or horizontal step and possibly a few down steps.
Exercise 40. Construct the following yield algebra. The base set A contains the possible tripartitions X ⊔ Y ⊔ Z of the possible prefixes of a1 , a2 , . . . , an . The parameter (l, d1 , d2 ) describes the length of the prefix, l, and the differences in the sums of the sets,
\[
d_1 := \sum_{a_i \in X} a_i - \sum_{a_j \in Y} a_j, \qquad d_2 := \sum_{a_i \in X} a_i - \sum_{a_k \in Z} a_k.
\]
The partial ordering is based on the length of the prefixes, and a parameter
with shorter prefix length is smaller. The unary operators ◦ai ,X , ◦ai ,Y , and
◦ai ,Z add ai to the sets X, Y and Z, respectively. The recursions are
\[
S((l, d_1, d_2)) = \circ_{a_l,X}(S((l-1, d_1 - a_l, d_2 - a_l))) \sqcup \circ_{a_l,Y}(S((l-1, d_1 + a_l, d_2))) \sqcup \circ_{a_l,Z}(S((l-1, d_1, d_2 + a_l)))
\]
with the initial condition S((0, 0, 0)) containing the empty tripartition, in which X, Y and Z are all empty.
If the task is to count the tripartitions with equal sums, then the evaluation algebra contains R = Z, f is the constant 1 function, and all T functions are the identity functions. The number of tripartitions with equal sums is F (S((n, 0, 0))).
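The recursion runs in time polynomial in n and in the total sum of the numbers. A minimal Python sketch of the counting evaluation, where the dictionary keys are the difference pairs (d1, d2) and the prefix length l is implicit in the loop:

from collections import defaultdict

def count_equal_tripartitions(a):
    S = defaultdict(int)
    S[(0, 0)] = 1  # initial condition: X, Y and Z are all empty
    for x in a:
        new = defaultdict(int)
        for (d1, d2), cnt in S.items():
            new[(d1 + x, d2 + x)] += cnt  # put x into X
            new[(d1 - x, d2)] += cnt      # put x into Y
            new[(d1, d2 - x)] += cnt      # put x into Z
        S = new
    return S[(0, 0)]

print(count_equal_tripartitions([1, 2, 3, 4, 5, 6]))  # 6 ordered tripartitions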
The average of the sum of squared differences of the subset sums can be
calculated with the same evaluation algebra. The solution is
\[
\frac{\sum_{d_1}\sum_{d_2} F(S((n, d_1, d_2)))\,\bigl(d_1^2 + d_2^2 + (d_1 - d_2)^2\bigr)}{\sum_{d_1}\sum_{d_2} F(S((n, d_1, d_2)))}.
\]
and
\[
\sum_{i=1}^{n} b_i \le \sum_{j=1}^{n} c^*_j
\]
is equivalent with
\[
\sum_{i=1}^{n} b_i = \sum_{j=1}^{n} c^*_j
\]
and c1 ≤ n (and thus c∗n+1 = 0). Therefore the number of graphical bidegree sequences {b, c} with lengths |b| = n and |c| = m is the number of decreasing non-negative sequences {b, c∗ } with lengths |b| = n and |c∗ | = n such that they satisfy, for all k = 1, . . . , n,
\[
\sum_{i=1}^{k} b_i \le \sum_{j=1}^{k} c^*_j \tag{2.148}
\]
with initial conditions S((s, t, t − s, 1)) = {{s, t}} and S((s, t, δ, 1)) = ∅ for all δ ≠ t − s.
The evaluation algebra is the standard one counting the size of the sets, that is, R = Z, f is the constant 1 function and each Tx,y function is the identity function. The number of graphical bidegree sequences with lengths |b| = n and |c| = m is
\[
\sum_{s=0}^{m} \sum_{t=0}^{m} F(S((s, t, 0, n))).
\]
Chapter 3
Linear algebraic algorithms. The power of subtracting
In this chapter, we are going to introduce algorithms that count discrete mathematical objects using three operations: addition, subtraction and multiplication. The determinant and the Pfaffian of certain matrices will be the center of interest. The minors of the Laplacians of graphs count the spanning trees or, in the case of directed graphs, the in-trees. The number of in-trees appears in counting the Eulerian circuits in directed Eulerian graphs. The Pfaffians of appropriately oriented adjacency matrices of planar graphs give the number of perfect matchings in those graphs.
The above-mentioned discrete mathematical objects might have weights,
and these weights might come from arbitrary commutative rings. We might
also want to calculate the sum of these weights. Divisions are not available
in commutative rings, therefore it is necessary to give division-free polyno-
mial running time algorithms for calculating the determinant and Pfaffian of
matrices in an arbitrary commutative ring. Surprisingly, such algorithms are
available, and actually, they are dynamic programming algorithms. These dy-
namic programming algorithms are on some combinatorial objects called clow
sequences. Clow sequences are generalizations of permutations. The sum of the signed weights of clow sequences coincides with the determinants and Pfaffians, which, by definition, are sums of signed weights of permutations. The coincidence is due to the cancellation of terms in the summation, which is not possible without subtractions. We are going to show that subtractions are inevitable in efficient algorithms: there are theorems stating that no algorithm can count the above-mentioned discrete mathematical objects in polynomial time without subtraction.
description of walks. When the last vertex is omitted, the series of vertices
are put into parenthesis, indicating that the walk is closed. Vertex vi0 is called
the head of the walk, and is denoted by head(C). The length of a walk is the
number of edges in it, and is denoted by l(C).
A clow sequence is a series of clows, C = C1 , C2 , . . . Cm such that for all
i = 1, . . . , m − 1, head(Ci ) < head(Ci+1 ). The length of a clow sequence, l(C),
is the sum of the length of the clows in it.
If G⃗ is edge-weighted, where w : E → R denotes the weight function, the score of a clow sequence is defined. The score of a clow C = vi0 , ej1 , vi1 , . . . , ejk , vik is defined as
\[
W(C) := \prod_{i=1}^{k} w(e_{j_i}). \tag{3.2}
\]
When G⃗ is the complete directed graph with loops, all permutations appear amongst the clow sequences. Indeed, define the canonical cycle representation of a permutation as
\[
(\sigma_{1,1}, \sigma_{1,2}, \ldots, \sigma_{1,l_1})(\sigma_{2,1}, \ldots, \sigma_{2,l_2}) \cdots (\sigma_{m,1}, \ldots, \sigma_{m,l_m})
\]
such that for all i, j, σi,1 < σi,j and for all i = 1, . . . , m − 1, σi,1 < σi+1,1 . This canonical ordering represents a clow sequence containing m clows, and the ith clow is
\[
(v_{\sigma_{i,1}}, v_{\sigma_{i,2}}, \ldots, v_{\sigma_{i,l_i}}).
\]
On the other hand, there are clow sequences that are not permutations.
Indeed a clow might visit a vertex several times and two different clows might
visit the same vertex if both heads of the clows are smaller than the visited
vertex. However, these are the only exceptions when a clow sequence is not a
permutation, as stated in the lemma below.
Lemma 3. Let K⃗ n be the complete directed graph on n vertices, in which loops are allowed. Let Pn denote the set of clow sequences on K⃗ n for which each clow is a cycle and each vertex is in at most one cycle. Then the mapping ϕ that maps each permutation σ to its canonical cycle representation is a bijection between the permutations of length n and Pn , and for each permutation σ,
\[
W(\sigma) = W(\varphi(\sigma)).
\]
To show this, an involution g is given on Cn \Pn such that for any C ∈ Cn \Pn ,
it holds that
W (C) = −W (g(C)).
Let C be a clow sequence in Cn \ Pn containing m clows. Let i be the smallest index such that Ci+1 , Ci+2 , . . . , Cm are vertex disjoint cycles. Since C is not in Pn , i cannot be 0. Either Ci = (vi,1 , vi,2 , . . . , vi,li ) is not a cycle, or Ci intersects some Ck , k > i, or both. Let vi,p be called a witness if vi,p ∈ Ck for some k > i or there exists vi,q ∈ Ci such that q < p and vi,p = vi,q . Take the smallest index j such that vi,j is a witness. Observe that either vi,j ∈ Ck or there is a j′ < j such that vi,j = vi,j′ , but the two cases cannot hold at the same time. Indeed, otherwise vi,j′ would be a witness since it is in Ck , contradicting the minimality of j. If vi,j ∈ Ck , and it has an index pair k, j′ in Ck , then define
\[
C'_i := (v_{i,1}, v_{i,2}, \ldots, v_{i,j}, v_{k,j'+1}, v_{k,j'+2}, \ldots, v_{k,l_k}, v_{k,1}, \ldots, v_{k,j'}, v_{i,j+1}, \ldots, v_{i,l_i}).
\]
Observe that the smallest index i′ such that Ci′+1 , Ci′+2 , . . . are disjoint cycles in g(C) is still i, and the smallest index witness in Ci is still vertex vi,j (just now it has a larger index, j + l(Ck )). Indeed, the cycles Ci+1 , . . . , Ck−1 , Ck+1 , . . . , Cm are still disjoint. The vertices vi,1 , . . . , vi,j−1 were not witnesses in C, so they cannot be witnesses in g(C). Furthermore, the new vertices in C′i coming from Ck cannot be witnesses since Ck was a cycle disjoint from the cycles Ci+1 , . . . , Ck−1 , Ck+1 , . . . , Cm , and there were no witnesses in Ci with a smaller index than j, so Ck does not contain any vertex from vi,1 , . . . , vi,j−1 .
Now consider a clow sequence C whose smallest index witness vi,j is a witness because a vi,j′ exists such that j′ < j and vi,j = vi,j′ . Then define
\[
C'_i := (v_{i,1}, \ldots, v_{i,j'}, v_{i,j+1}, \ldots, v_{i,l_i}),
\]
and let C′ be the closed walk (vi,j′ , vi,j′+1 , . . . , vi,j−1 ). C′ might have to be rotated to start with the smallest vertex, and thus to be a clow. Define g(C) as the clow sequence obtained from C by replacing Ci with C′i and inserting the clow C′.
Observe that the head of C′ is larger than vi,1 , so it will appear after Ci in the clow sequence. Furthermore, C′ is vertex disjoint from the cycles Ci+1 , . . . , Cm , since its vertices were not witnesses in Ci , except vi,j′ = vi,j . Then the smallest index i′ such that Ci′+1 , Ci′+2 , . . . are disjoint cycles is still i, and the smallest index witness in Ci is still vertex vi,j , since it is in C′.
Based on the above, observe that whatever is the reason that vi,j is the
smallest index witness, the g function is defined such that
g(g(C)) = C,
therefore g is indeed an involution. Furthermore, the lengths of C and g(C) are the same, they contain exactly the same edges, however, their numbers of cycles differ by 1. Thus
\[
W(C) = -W(g(C)),
\]
and thus
\[
\sum_{C \in \mathcal{C}_n \setminus \mathcal{P}_n} W(C) = 0.
\]
It is possible to build a yield algebra for the clow sequences. More precisely, the base set A contains partial clow sequences on a given directed graph, G⃗ = (V, E). A partial clow sequence contains some clows and an ordered walk that might not be closed yet. The parameters are (vh , v, k), where vh is the head of the current clow, v is the actual vertex, and k is the length of the partial clow sequence, that is, the number of edges in the clow sequence. In the partial ordering of the parameters, (vh , v, k) ≤ (v′h , v′, k′) if vh ≤ v′h and k ≤ k′. Obviously, only those parameters are valid in which vh ≤ v. The unary operator ◦(u,w) adds the new edge (u, w) to the partial clow sequence. If (vh , v) is an edge in G⃗, then u might be vh and w might be v, and the edge (u, w) starts a new partial clow. Otherwise, u must be the actual vertex of the operand, and the current
(partial) clow is extended with an edge. Therefore, the recursions are
\[
S((v_h, v, k)) = \bigsqcup_{u > v_h \,\wedge\, (u,v) \in E} \circ_{(u,v)}(S((v_h, u, k-1))) \qquad \text{if } (v_h, v) \notin E \tag{3.7}
\]
\[
S((v_h, v, k)) = \Bigl(\bigsqcup_{u > v_h \,\wedge\, (u,v) \in E} \circ_{(u,v)}(S((v_h, u, k-1)))\Bigr) \sqcup \Bigl(\bigsqcup_{v'_h < v_h} \circ_{(v_h,v)}(S((v'_h, v'_h, k-1)))\Bigr) \qquad \text{if } (v_h, v) \in E. \tag{3.8}
\]
The initial conditions are the following. S((vh , v, 1)) is the empty set if (vh , v) ∉ E, and otherwise it contains the single partial clow sequence having only the edge (vh , v) in its first clow.
In the evaluation algebra, the algebraic structure R is the real number
field. The function f is defined as
\[
f(C) := (-1)^{l(C)+m} \prod_{e \in C} w(e) \tag{3.9}
\]
where l(C) is the number of edges in the partial clow sequence C with multiplicity, m is the number of ordered walks in C including the last, possibly not yet closed ordered walk, and the product also considers the multiplicity of the edges in C. It is easy to see that
f (C) = W (C)
for any clow sequence C. The T(u,v) function for the operator ◦(u,v) depends
on the parameter. If in the parameter (vh , v, k), vh = v, then T(u,v) is the mul-
tiplication with w((u, v)), otherwise it is the multiplication with −w((u, v)).
The determinant of M can be calculated as
\[
\det(M) = \sum_{v_h} F(S((v_h, v_h, n))).
\]
Observe that the real numbers can be replaced with any commutative ring in
the evaluation algebra. Therefore the following theorem holds.
Theorem 30. Let M be an n × n matrix over an arbitrary commutative ring
R. Then the determinant of M can be calculated in O(n4 ) time.
Finally,
\[
\det(M) = \sum_{i=1}^{n} w(i, i, n).
\]
Since the number of entries is O(n3 ) and each entry can be computed in O(n)
time, the overall running time is O(n4 ).
Observe that this running time is an order of magnitude larger than that of standard Gaussian elimination, which runs in O(n3 ) time. On the other hand, this algorithm does not need any division, thus it can be used for matrices over arbitrary commutative rings. In particular, if the matrix is over the integer ring, then the calculations in the recursion remain in the integer ring.
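The whole algorithm of Theorem 30 fits in a few lines. The following Python sketch implements the clow-sequence dynamic programming described above; the state is the pair (head of the current clow, current vertex), closing a clow flips the sign (accumulating (−1)^m), and the final factor (−1)^n turns this into the required sign (−1)^{l(C)+m}.

from collections import defaultdict

def division_free_det(M):
    n = len(M)
    dp = {(h, h): 1 for h in range(n)}  # a clow may start at any head
    for _ in range(n - 1):
        new = defaultdict(int)
        for (h, u), val in dp.items():
            for v in range(h + 1, n):    # extend the current clow by the edge (u, v)
                new[(h, v)] += val * M[u][v]
            for h2 in range(h + 1, n):   # close the clow by (u, h), open a new head
                new[(h2, h2)] -= val * M[u][h]
        dp = new
    result = 0
    for (h, u), val in dp.items():       # close the very last clow
        result -= val * M[u][h]
    return result if n % 2 == 0 else -result

print(division_free_det([[2, 1], [7, 4]]))  # 1

Only additions, subtractions and multiplications occur, so the entries may come from any commutative ring, for example Python integers or polynomials.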
Also observe that the yield algebra builds the clow sequences and not
the permutations. Therefore, the evaluation algebra will calculate the sum of
scores of clow sequences with some given parameters. For some special scores,
this might coincide with the determinant of a matrix. However, if P ≠ NP,
It is easy to see that the normalizing constant 1/(2^n n!) in the definition of the Pfaffian is for cancelling the 2^n n! cases of σ ∈ S2n providing the same score,
\[
\mathrm{sign}(\sigma) \prod_{i=1}^{n} a_{\sigma(2i-1), \sigma(2i)}. \tag{3.12}
\]
Since the signed product is the same for any σ coming from the same equivalence class, this definition is well defined.
Similar to the determinant, the Pfaffian can be calculated in polynomial
time in an arbitrary commutative ring using the so-called alternating clow
sequences. First, we define them.
Note that both Mσ and M are perfect matchings of the complete graph on 2n vertices, therefore the multiset union Mσ ⊎ M consists of even-length alternating cycles, where along a walk on each cycle, the edges come alternately from Mσ and from M . Edges in Mσ ∩ M are represented twice in Mσ ⊎ M , and they are considered as cycles with two edges. The clow sequence assigned to σ consists of the oriented versions of these cycles, such that each clow starts with the smallest index vertex, and the first edge in each clow comes from Mσ . For example, if σ is
As we can see, each clow has even length, and every second edge is either
(v2i−1 , v2i ) or (v2i , v2i−1 ) for some i, namely, the images are alternating clow
sequences.
We have to define the weights of the edges in the clow sequence to define
the score of the clow sequence. If an edge (vi , vj ) comes from Mσ , then its
weight is ai,j , and if an edge (vi , vj ) comes from M , its weight is j − i (recall
that it is either 1 or −1).
We claim if a clow sequence C is the image of σ, then the score of C is
the score of σ as defined in Equation (3.12) if n is even, and the score of C is
the additive inverse of the score of σ if n is odd. It is clear that the score of
the clow sequence contains the product of the appropriate ai,j terms; we only
have to check the signs. Since the score of σ is well defined on S2n /∼ , without
loss of generality, we might assume that σ is
2, 1, 4, 3, . . . , 2n, 2n − 1.
Since this latter permutation contains n cycles and its length is 2n, its sign is
(−1)n . Therefore we get that
\[
\mathrm{sign}(\sigma) \prod_{i=1}^{n} a_{\sigma(2i-1),\sigma(2i)} = (-1)^n W(C), \tag{3.14}
\]
since the sign of a clow sequence is the same as the sign of its corresponding
permutation.
The mapping is clearly an injection, since two permutations σ and σ 0 from
different equivalence classes contain different edges in Mσ and Mσ0 , and there-
fore their images will be different, too. We claim that the mapping is actually
a bijection between S2n /∼ and those alternating clow sequences that are permutations and contain only even-length cycles. Indeed, if there is a permutation
π corresponding to an alternating clow sequence C, then the permutation
The involution g2 is on clow sequences that are not permutations and in which each revisited vertex is revisited after an even number of steps. The involution given in the proof of Theorem 29 suffices for g2 . We already showed that it is an
involution satisfying Equation (3.15); we only have to show that the image of
an alternating clow sequence with the given property also has this property.
The involution operates with cutting and merging cycles. These cycles must
have even length, since each vertex is revisited after an even number of steps,
therefore their images also satisfy the prescribed properties.
We get that
\[
\mathrm{pf}(A) = \sum_{C \in \mathcal{AC}} (-1)^n W(C) \tag{3.17}
\]
where AC is the set of alternating clow sequences. It is easy to see that the
dynamic programming algorithm given in the proof of Theorem 30 can calcu-
late this sum if a restriction is added that in each clow, in each even step, the
edge must be (v2i−1 , v2i ) or (v2i , v2i−1 ).
In the following sections, such counting problems are introduced that can
be solved via calculating the determinant or the Pfaffian of a matrix.
tree or it is not) are complements, so the “only if” parts of both statements
follow from the “if” part of the other statement, and thus it is sufficient to
prove the “if” parts.
If the edges in F form the edges of a spanning tree T , we first set up a
partial ordering of the vertices such that u ≤ w if w is on the way from u to v.
Extend this partial ordering to an arbitrary total ordering and rearrange the
rows of the matrix based on this total ordering. There is a bijection ϕ between
the vertices in V \ {v} and the edges in F : u is mapped to e if e connects u
to its parent in the spanning tree rooted in v. The bijection indicates a total
ordering of the edges in F , e1 < e2 if ϕ−1 (e1 ) < ϕ−1 (e2 ). Let the columns of
C −v [F ] be ordered according to this total ordering.
Observe that the rearranged matrix is a lower triangular matrix whose diagonal entries are all 1 or −1. Therefore its determinant is either 1 or −1. However, the determinant of this matrix and that of the original C −v [F ] are the same in absolute value.
If the edges in F do not form a spanning tree, then F contains a cycle. Indeed, |F | = |V | − 1, and any cycle-free graph with |V | − 1 edges is a tree. Since F is not the set of edges of a tree, it must contain a cycle. Let F ′ denote the edges in this cycle. The corresponding columns in C −v [F ] are linearly dependent since an appropriate linear combination of them is the 0 vector. Indeed, fix an arbitrary walk on the edges around the cycle, v0 , e1 , v1 , e2 , . . . , v|F ′|−1 , e|F ′| , v0 . Let the linear coefficient of the column representing ei be 1 if the matrix entry for the pair vi−1 , ei in C is 1, otherwise let the coefficient be −1. Then this linear combination of the column vectors is indeed the 0 vector, since for each row representing vi , there is a 1 and a −1 in the appropriately weighted column vectors. Since the columns of C −v [F ] are not linearly independent, det(C −v [F ]) = 0.
where A[F ] denotes the n × n submatrix of A whose column’s indexes are the
indexes in F .
Proof. Consider the determinant of the matrix
\[
D = \begin{pmatrix} 0 & A \\ B^T & I \end{pmatrix}
\]
where 0 denotes the all-zero matrix and I denotes the identity matrix. We are
going to calculate the determinant in two different ways.
The first way is based on the Laplace expansion on the first n rows of D. Define the set of indexes C = {n + 1, . . . , n + m}, and index the columns of A by these indexes. It is sufficient to consider subsets of these columns, since for any other subset of indexes F , the matrix (0|A)[F ] contains an all-zero column, and thus its determinant is 0. Therefore
\[
\det(D) = \sum_{F \subseteq C,\ |F|=n} (-1)^{\sum_{i=1}^{n} i + \sum_{f \in F} f}\, \det(A[F])\, \det((B^T|I)[\bar F]). \tag{3.19}
\]
Expanding the second factor on its first n columns,
\[
\det((B^T|I)[\bar F]) = \sum_{F' \subseteq C,\ |F'|=n} (-1)^{\sum_{i=1}^{n} i + \sum_{f' \in F'} (f'-n)}\, \det([F']B^T)\, \det([\bar{F'}]I[\bar F]). \tag{3.20}
\]
It is easy to see that all terms in this sum are 0 except the one in which F ′ = F . Indeed, if F ′ ≠ F , then F̄ ′ ≠ F̄ . In that case [F̄ ′]I[F̄ ] contains an all-0 row (and also an all-0 column), and thus det([F̄ ′]I[F̄ ]) = 0. If F ′ = F , then det([F̄ ]I[F̄ ]) = 1. Therefore,
\[
\det((B^T|I)[\bar F]) = (-1)^{\sum_{i=1}^{n} i + \sum_{f \in F} (f-n)} \det([F]B^T) = (-1)^{\sum_{i=1}^{n} i + \sum_{f \in F} (f-n)} \det(B[F]). \tag{3.21}
\]
We get that
\[
\det(D) = \sum_{F \subseteq \{1,2,\ldots,m\},\ |F|=n} (-1)^{2\sum_{i=1}^{n} i + 2\sum_{f \in F} f - n^2} \det(A[F]) \det(B[F]) = (-1)^n \sum_{F \subseteq \{1,2,\ldots,m\},\ |F|=n} \det(A[F]) \det(B[F]), \tag{3.22}
\]
since
\[
-n^2 \equiv n \pmod 2. \tag{3.23}
\]
The second way to calculate det(D) is based on Gaussian elimination. For each i = 1, . . . , n, we add to row i the following linear combination: −ai,1 times the (n + 1)st row, plus −ai,2 times the (n + 2)nd row, etc., plus −ai,m times the (n + m)th row. We get a matrix
\[
D' = \begin{pmatrix} C & 0 \\ B^T & I \end{pmatrix}
\]
where
\[
c_{i,j} = -\sum_{k=1}^{m} a_{i,k} b_{k,j}, \tag{3.24}
\]
therefore
\[
\det(AB^T) = \sum_{F \subseteq \{1,2,\ldots,m\},\ |F|=n} \det(A[F]) \det(B[F]). \tag{3.26}
\]
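The Cauchy-Binet formula (3.26) is easy to check numerically. A quick Python sketch with small random integer matrices, summing over all n-element column subsets F by brute force:

from itertools import combinations
import numpy as np

rng = np.random.default_rng(0)
n, m = 2, 3
A = rng.integers(-3, 4, size=(n, m))
B = rng.integers(-3, 4, size=(n, m))

lhs = round(np.linalg.det(A @ B.T))
rhs = sum(round(np.linalg.det(A[:, F]) * np.linalg.det(B[:, F]))
          for F in map(list, combinations(range(m), n)))
print(lhs, rhs)  # the two values agree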
The Kirchhoff theorem implies that the number of spanning trees can be calculated in polynomial time for any graph. We can use Theorem 34 to count the leaf-labeled trees.
Example 7. Count the leaf-labeled trees on n vertices.
Solution. The leaf-labeled trees on n vertices are exactly the spanning trees of the complete graph Kn . By Theorem 34, their number is the determinant of the matrix
\[
\begin{pmatrix}
n-1 & -1 & \cdots & -1 & -1 \\
-1 & n-1 & \cdots & -1 & -1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
-1 & -1 & \cdots & n-1 & -1 \\
-1 & -1 & \cdots & -1 & n-1
\end{pmatrix}
\]
where the matrix has n − 1 rows and n − 1 columns. We can add linear combinations of rows to other rows of this matrix without changing the determinant. First, add all rows starting from the second row to the first row. We get the following matrix:
\[
\begin{pmatrix}
+1 & +1 & \cdots & +1 & +1 \\
-1 & n-1 & \cdots & -1 & -1 \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
-1 & -1 & \cdots & n-1 & -1 \\
-1 & -1 & \cdots & -1 & n-1
\end{pmatrix}
\]
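The determinant of this minor is n^{n−2}, in accordance with Cayley's formula for the number of labeled trees. A short numerical check in Python:

import numpy as np

for n in range(2, 8):
    # (n-1) x (n-1) matrix with n-1 on the diagonal and -1 elsewhere:
    M = n * np.eye(n - 1) - np.ones((n - 1, n - 1))
    print(n, round(np.linalg.det(M)), n ** (n - 2))  # the two counts agree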
where I(vi ) is the set of edges incident to vi . Define the weighted adjacency
matrix A such that ai,j = w(e) if e is the edge (vi , vj ). Let v be an arbitrary
vertex, and define A−v and D−v by deleting the row and column corresponding
to vertex v from A and D. Then
\[
\sum_{T \in \mathcal{T}} W(T) = \det(D^{-v} - A^{-v}) \tag{3.33}
\]
−w(e) if e is an edge between vi and vj , and 0 otherwise. If i = j, then the scalar product is \(\sum_{e \in I(v_i)} w(e)\). Therefore Equation (3.34) indeed holds. We have to use the Cauchy-Binet theorem:
\[
\det\bigl(C^{-v} (C_w^{-v})^T\bigr) = \sum_{F \subseteq \{1,2,\ldots,m\},\ |F|=n-1} \det(C[F]) \det(C_w[F]). \tag{3.35}
\]
Count the spanning trees with a given weight k. The running time must be a
polynomial function of both n and m.
Solution. Define a new weight function, w′(e) := x^{w(e)}. This new weight function takes its values among the monomials of the Laurent polynomial ring Z[x, x⁻¹]. Furthermore, for any spanning tree T it holds that
\[
W'(T) = x^{W(T)}. \tag{3.37}
\]
Since Z[x, x⁻¹] is a commutative ring, the summation over the spanning trees T of the weights W ′(T ) can be calculated using only a polynomial number of ring operations. Furthermore, the degrees of the monomials can vary between −mn and mn, where n is the number of vertices in G, therefore any ring operation can be done in polynomial time in mn. The coefficient of the monomial x^k in det(D−v − A−v ) is the number of spanning trees of weight k.
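A concrete miniature of this trick, with sympy providing the ring arithmetic (the triangle graph and its integer weights below are hypothetical inputs chosen for illustration):

import sympy as sp

x = sp.symbols('x')
# Triangle on vertices 0, 1, 2 with edge weights w(0,1)=1, w(0,2)=2, w(1,2)=2:
W = {(0, 1): 1, (0, 2): 2, (1, 2): 2}
L = sp.zeros(3, 3)
for (u, v), w in W.items():
    wp = x ** w                      # the new weight w'(e) = x^w(e)
    L[u, u] += wp; L[v, v] += wp     # weighted degree matrix D
    L[u, v] -= wp; L[v, u] -= wp     # minus the weighted adjacency matrix A
minor = L[1:, 1:]                    # delete the row and column of vertex 0
print(sp.expand(minor.det()))        # x**4 + 2*x**3: one tree of weight 4, two of weight 3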
The number of minimum (or dually, the maximum) spanning trees can be
counted in polynomial time with both the size of the graph and the logarithm
of the weights, see Exercise 13. On the other hand, it is NP-complete to decide
if there is a spanning tree with a given sum of weights, see Exercise 14.
It is easy to see that Theorem 34 is a special case of Theorem 36. Indeed, let G be an arbitrary undirected graph. Create a directed graph G⃗ such that each edge e in G is replaced with a pair of directed edges going in both directions between the vertices incident to e. Then, on one hand, the matrix D−v − A−v constructed from G is the same as the matrix D_out^{−v} − A^{−v} constructed from G⃗. On the other hand, there is a bijection between the spanning trees in G and the in-trees rooted into v in G⃗: take any spanning tree T in G, and for any edge e ∈ T , select the edge in G⃗ corresponding to e with the appropriate direction.
Theorem 36 also holds for directed multigraphs in which parallel edges are allowed but loops are not. Two in-trees are considered different if they connect the same pairs of vertices in the same directions but, for at least one pair of vertices, the directed edges connecting them are different parallel edges of the multigraph. For such graphs, the out-degree is defined as the number of outgoing edges, and the entry in position (i, j) of the adjacency matrix is k if there are k parallel edges going from vertex vi to vertex vj . Indeed, it is easy to see that the base cases hold, and the induction also holds since the linearity of the determinant holds for any matrices.
The number of in-trees appears in the formula counting the directed Eu-
lerian circuits in directed Eulerian graphs. First, we define them.
Definition 44. A directed Eulerian graph is a directed, connected graph in which for each vertex v, its in-degree equals its out-degree. A directed Eulerian circuit (or shortly, an Eulerian circuit) is a directed circuit that traverses each edge exactly once.
It is easy to see that each directed Eulerian graph has at least one directed Eulerian circuit. Their number can be calculated in polynomial time, as stated by the following theorem.
Theorem 37. Let G⃗ = (V, E) be a directed Eulerian graph, and let v∗ be an arbitrary vertex in it. Then its number of directed Eulerian circuits is
\[
|\mathcal{T}^{in}_{v^*}| \prod_{v \in V} (d_{out}(v) - 1)! \tag{3.40}
\]
where \(\mathcal{T}^{in}_{v^*}\) is the set of in-trees rooted into v∗.
Proof. The Eulerian circuit might start with an arbitrary edge, therefore, fix
an outgoing edge e of v ∗ . Start a walk on it, and due to the pigeonhole rule,
this walk can be continued until v ∗ is hit dout (v ∗ ) times. The last hit closes
the walk, thus obtaining a circuit. This circuit might not be a Eulerian circuit,
since there is no guarantee that it used all the edges in G. ~ However, assume
that the circuit is a Eulerian circuit. Let E 0 be the set of last outgoing edges
along this circuit for each vertex in V \ {v ∗ }. We claim that these edges are
the edges of an in-tree rooted into v ∗ .
Indeed, there are n − 1 edges. We prove that from each particular vertex
u0 , there is a directed path to v ∗ . Let e1 be the last edge going out from u0 in
the Eulerian circuit. It goes to some u1 , and thus from u1 , the last outgoing
edge is later in the Eulerian circuit than e1 . Let it be denoted by e2 . This edge
goes into u2 , etc. We claim that the edges e1 , e2 , . . . eventually go to v ∗ due to
the pigeonhole rule. Indeed, it is impossible that some ei goes to uj for some
j < i since ei is later in the Eulerian circuit than ej . However, if ei goes to uj
then the Eulerian path is continued by going out from uj , and thus, the last
outgoing edge of uj is after ei , a contradiction.
Therefore the n − 1 edges form a connected graph in the weak sense, and thus the undirected version of the graph is a tree. Furthermore, all edges are directed towards v∗, therefore the directed version of the graph is an in-tree rooted into v∗.
Furthermore, for each vertex v, give an arbitrary but fixed ordering of the
outgoing edges. If v 6= v ∗ and the last outgoing edge is the k th in this list, then
decrease the indexes of each larger indexed edge by 1. Thus, the indexes of the
outgoing edges except the last one form a permutation of length dout (v) − 1.
For v∗, decrease the index of each edge which has a larger index than the index of e. Therefore, the indexes of the last dout (v∗ ) − 1 edges form a permutation of length dout (v∗ ) − 1.
In this way, we can define a mapping of the Eulerian circuits onto
\[
\mathcal{T}^{in}_{v^*} \times \prod_{v \in V} S_{d_{out}(v)-1} \tag{3.41}
\]
where Sn denotes the set of permutations of length n, in the following way: the direct product of the in-tree of the last outgoing edges and the aforementioned permutations for each vertex is the image of a Eulerian circuit.
It is clear that this mapping is an injection. Indeed, consider the first
step where two Eulerian circuits, C1 and C2 deviate. If these edges go out
from v ∗ , then their image will be different on the permutation in Sdout (v∗ )−1 .
Otherwise, they have different edges going out from some v 6= v ∗ . If the last
outgoing edges from v are different in the two circuits, then the images of the
two circuits have different in-trees. If the last outgoing edges are the same,
then the permutations in Sdout (v)−1 are different in the images.
This mapping is also a surjection. Indeed, take any in-tree T ∈ \(\mathcal{T}^{in}_{v^*}\) and for each v ≠ v∗, take a permutation πv ∈ Sdout (v)−1 . We are going to construct a Eulerian circuit whose image is exactly
\[
T \times \prod_{v \in V \setminus \{v^*\}} \pi_v. \tag{3.42}
\]
For each v, if the index of the edge which is in T and going out from v is k,
then increase by 1 all the indexes in permutation πv which are greater than or
equal to k. This is now a list of indexes. Extend this list by k; this extended
list Lv will be the order of the outgoing edges from v. That is, start a circuit
with the edge e going out from v ∗ , and whenever the circuit arrives at v, go
Linear algebraic algorithms. The power of subtracting 143
out on the next edge in the list Lv . Continue till all the outgoing edges from
v ∗ are used. We claim that the so obtained circuit is a Eulerian circuit whose
image is exactly the one in Equation (3.42).
Indeed, assume that some edge going out from v0 is not used. Then, in particular, the last edge e1 going out from v0 is not used. It goes to some v1 , and since it is not used, there are also outgoing edges of v1 which are not used. In particular, its last edge e2 is not used, which goes to some v2 . However, these last edges e1 , e2 , . . . lead to v∗, namely, there are ingoing edges of v∗ which are not used in the circuit. But in that case not all outgoing edges from v∗ are used, contradicting that the walk generating the circuit is finished.
Therefore, the circuit is a Eulerian circuit. Its image is indeed the one in
Equation (3.42), due to construction.
Since the mapping is injective and surjective, it is a bijection. Thus, the number of Eulerian circuits is
\[
\Bigl|\mathcal{T}^{in}_{v^*} \times \prod_{v \in V} S_{d_{out}(v)-1}\Bigr| = |\mathcal{T}^{in}_{v^*}| \prod_{v \in V} (d_{out}(v) - 1)! \tag{3.43}
\]
times, where m(u, v) is the number of parallel edges going from u to v. (Here
m(v, v) is the number of loops on v.) Having said this, the following theorem
holds.
\[
\frac{|\mathcal{T}^{in}_{v^*}| \prod_{v \in V} (d_{out}(v) - 1)!}{\prod_{u,v \in V} m(u, v)!} \tag{3.46}
\]
where \(\mathcal{T}^{in}_{v^*}\) is the set of in-trees rooted into v∗.
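Both ingredients of the formula are directly computable: the in-tree count is a determinant (Theorem 36) and the rest is a product of factorials. A Python sketch on a small example, the complete bidirected triangle (a hypothetical input for illustration):

import numpy as np
from math import factorial

n = 3
Adj = np.ones((n, n)) - np.eye(n)   # every ordered pair is an edge; d_in = d_out = 2
L = np.diag(Adj.sum(axis=1)) - Adj  # D_out - A
v_star = 0
minor = np.delete(np.delete(L, v_star, axis=0), v_star, axis=1)
in_trees = round(np.linalg.det(minor))
circuits = in_trees * np.prod([factorial(int(d) - 1) for d in Adj.sum(axis=1)])
print(in_trees, circuits)  # 3 in-trees and 3 Eulerian circuits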
a Pfaffian orientation of a planar graph, thus also proving that each planar
graph has a Pfaffian orientation.
Given a planar embedding of a planar graph G = (V, E), construct its dual graph G∗ = (V ∗ , E ∗ ) in the following way. V ∗ is the set of faces of G, including one vertex for the external face. Two vertices in V ∗ are connected with an edge if the corresponding faces are neighbors. Take any spanning tree T ∗ of G∗ , and root it into the vertex corresponding to the outer face of G. Let this vertex be denoted by v ∗ . The edges of T ∗ correspond to edges of G separating the neighboring faces. Let E ′ be the subset of these edges in G. Give an arbitrary
orientation of the edges in E \ E ′. We claim that this orientation can be extended to a Pfaffian orientation by visiting and removing the edges of T ∗ and giving an appropriate orientation to the corresponding edges in E ′. While there is an edge in T ∗ , take any edge e∗ ∈ E ∗ connecting a leaf which is not v ∗ to the rest of T ∗ . The corresponding edge e′ ∈ E ′ is the last edge of a face F not having an orientation yet. Give e′ an orientation such that F has an odd number of clockwise-oriented edges. The edge e′ separates two faces, F and F ′. Observe that due to the construction, F ′ is either the outer face or a face which still has unoriented edges. If F ′ is not the outer face, the orientation of e′ does not violate the property that F ′ will also have an odd number of clockwise-oriented edges. Remove e∗ from T ∗ , and continue this procedure. In the last step, T ∗ contains one edge, which connects a leaf to v ∗ . Orient the corresponding edge in E ′ such that the last face in G also has an odd number of clockwise-oriented edges.
This procedure indeed generates a Pfaffian orientation, since T ∗ at the beginning contains a vertex for each face in G, and once the last edge of a face is oriented in a Pfaffian manner, the orientation of the edges of that face is not changed.
Pfaffian orientations have a very important property, stated in the follow-
ing lemma.
Lemma 6. Let G = (V, E) be a planar embedding of a planar graph and let its
edges be in a Pfaffian orientation. Let C be a cycle containing an even number
of edges, surrounding an even number of vertices in the planar embedding.
Then C contains an odd number of clockwise-oriented edges.
Proof. Consider the subgraph G′ that contains C and the vertices and edges surrounded by C. G′ also has a Pfaffian orientation since all of its internal faces are also internal faces of G. Let F ′ denote the number of internal faces of G′, let E ′ denote the number of edges of G′, and let V ′ denote the number of vertices of G′. From Euler's theorem, we know that
\[
F' - E' + V' = 1. \tag{3.47}
\]
(Note that the outer face is not counted by F ′.) We know that V ′ is even, since there are an even number of vertices in the cycle C and the cycle surrounds an even number of vertices. Therefore we get that
\[
F' + E' \equiv 1 \pmod 2. \tag{3.48}
\]
Each internal edge separates two faces, and the orientation of an internal edge is clockwise in one of the faces and anticlockwise in the other. Namely, if we put into Ec those edges which have clockwise orientation in some face, then Ec contains each internal edge and those edges of C which have clockwise orientation. Since there are an odd number of clockwise-oriented edges in each face, |Ec | has the same parity as F ′. (Note that each internal edge is clockwise only in one of the faces!)
If F ′ is odd, then there are an even number of edges in G′, due to Equation (3.48). Since there are an even number of edges in C, there are an even number of internal edges. The parity of |Ec | is odd, and since there are an even number of internal edges, the number of clockwise-oriented edges in C is odd.
On the other hand, if F ′ is even, there are an odd number of edges in G′, due to Equation (3.48). Since there are an even number of edges in C, the number of internal edges is odd. The parity of |Ec | is even, so removing the odd number of internal edges from Ec , we get that the number of clockwise-oriented edges in C is still odd.
Even-length cycles surrounding an even number of vertices in a planar embedding are important since any cycle appearing in the union of two perfect matchings is such a cycle. More specifically, there is a bijection between oriented even cycle coverings and ordered pairs of perfect matchings, stated and proved below. First, we have to define oriented even cycle coverings.
Definition 46. Let G = (V, E) be a graph. An oriented even cycle covering
of G is a set of cycles with the following properties.
1. Each cycle is oriented, and removing the orientations, the edges are all
in E.
2. Each cycle has even length. A cycle length of 2 is allowed; in that case,
the unoriented versions of the edges are the same, however, they are still
in E.
3. Each vertex in V is in exactly one cycle.
Theorem 39. Let G = (V, E) be an arbitrary graph. The number of oriented
even cycle coverings of G is the square of the number of perfect matchings of
G.
Proof. We give a bijection between the oriented even cycle coverings and or-
dered pair of perfect matchings. Since the number of ordered pairs of perfect
matchings is the number of perfect matchings squared, it proves the theorem.
Fix an arbitrary total ordering of the vertices. We define two injective functions, one from the oriented even cycle coverings to the ordered pairs of perfect matchings, and one in the other direction, and prove that they are inverses of each other.
Let C = C1 , C2 , . . . Ck be a cycle covering. For each cycle Ci , consider
the smallest vertex vi in it. Take a walk on Ci starting at vi in the given
orientation. Along the walk, put the edges into the sets M1 and M2 in an
alternating manner, the first edge into M1 , the second into M2 , etc. Remove
the orientations of all edges, both in M1 and M2 . The so-constructed sets will
be perfect matchings, since each of them is a matching and all vertices are covered. In
this way, we constructed a mapping from the oriented even cycle coverings to
the ordered pairs of perfect matchings.
We claim that this mapping is an injection. Indeed, if the unoriented ver-
sions of two coverings, C1 and C2 , differ in some edges, then they have different
images. If the edges are the same, and just some of the cycles have different
orientation, then different edges of that cycle will go to the first and the second
perfect matchings, and thus the images are still different.
The inverse mapping is the following. Let M1 and M2 be two perfect matchings, and take their union. The union consists of disjoint even cycles and separated edges. Make an oriented cycle of length 2 from each separated edge. Orient each cycle Ci such that the edge which is incident to the smallest vertex vi and comes from M1 goes out from vi , and the edge which is from M2 and incident to vi comes into vi . We have constructed an oriented even cycle covering.
We claim that this mapping is an injection. Indeed, consider two ordered pairs of perfect matchings, (M1 , M2 ) and (M1′ , M2′ ). If the set of edges of M1 ∪ M2 is not the set of edges of M1′ ∪ M2′ , then the images of the two pairs of perfect matchings are clearly different. If the two sets of edges in the unions are the same, but M1 ≠ M1′ , then consider an edge e which is in M1 and not in M1′ . This edge is in a cycle Ci in the union of the two perfect matchings. We claim that Ci is oriented in a different way in the two images. Indeed, e is an edge from M2′ , and since the edges are alternating in Ci , any edge in Ci which comes from M1 is an edge from M2′ . Similarly, any edge from M2 is an edge from M1′ . In particular, the two edges incident to the smallest vertex vi in Ci come from M1 and M1′ , therefore Ci is oriented differently in the two images, and thus the two images are different. Finally, it is easy to see that if the two sets of edges in the unions are the same and M1 = M1′ , then also M2 = M2′ , thus the two ordered pairs of perfect matchings are the same.
It is also easy to see that the two injections are indeed the inverses of each
other.
We are ready to prove the main theorem.
Theorem 40. Let G = (V, E) be a planar embedding of a planar graph, with edges having a Pfaffian orientation. Define the oriented adjacency matrix A in the following way. Let the entry ai,j be 1 if there is an edge going from vi to vj , and let ai,j be −1 if there is an edge going from vj to vi . All other entries are 0. Then the number of perfect matchings of G is
\[
\sqrt{\det(A)}. \tag{3.49}
\]
Proof. Based on Theorem 39, it is sufficient to prove that det(A) is the number of oriented even cycle coverings of G.
Let On denote the set of those permutations that contain at least one odd cycle. We prove that
\[
\sum_{\pi \in O_n} \mathrm{sign}(\pi) \prod_{i=1}^{n} a_{i,\pi(i)} = 0. \tag{3.51}
\]
even cycle covering must contain an odd number of clockwise and an odd
number of anti-clockwise edges, since G has a Pfaffian orientation. Therefore,
the contribution of each cycle in the product in Equation (3.55) is −1. Thus
the product is −1 if the number of cycles is odd and it is 1 if the number of
cycles is even. Since the same is true for sign(π), Equation (3.55) holds if π
is an oriented even cycle covering.
Therefore det(A) is indeed the number of oriented even cycle coverings.
Due to Theorem 39, det(A) is the number of perfect matchings in G squared,
so its square root is indeed the number of perfect matchings.
Example 10. Count the 2 × 1 domino tilings on a 3 × 4 square.
Solution. Consider the planar graph whose vertices are the unit squares and
two vertices are connected if the corresponding squares are neighbors. A pos-
sible Pfaffian orientation is
Number the vertices from top to bottom, and from left to right, row by row.
Then the oriented adjacency matrix is
\[
A = \begin{pmatrix}
0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
-1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & -1 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
-1 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & -1 & 0 & 0 & 1 & 0 & -1 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & -1 & 0 & 0 & 1 & 0 & -1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & -1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & -1 & 0 & 0 & 0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & -1 & 0 & 0 & -1 & 0
\end{pmatrix}
\]
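Feeding this matrix to any determinant routine yields the tiling count; by Theorem 40 the determinant should be the squared number of tilings. A Python check:

import numpy as np

A = np.array([
    [ 0,  1,  0,  0,  1,  0,  0,  0,  0,  0,  0,  0],
    [-1,  0,  1,  0,  0,  1,  0,  0,  0,  0,  0,  0],
    [ 0, -1,  0,  1,  0,  0,  1,  0,  0,  0,  0,  0],
    [ 0,  0, -1,  0,  0,  0,  0,  1,  0,  0,  0,  0],
    [-1,  0,  0,  0,  0, -1,  0,  0,  1,  0,  0,  0],
    [ 0, -1,  0,  0,  1,  0, -1,  0,  0,  1,  0,  0],
    [ 0,  0, -1,  0,  0,  1,  0, -1,  0,  0,  1,  0],
    [ 0,  0,  0, -1,  0,  0,  1,  0,  0,  0,  0,  1],
    [ 0,  0,  0,  0, -1,  0,  0,  0,  0,  1,  0,  0],
    [ 0,  0,  0,  0,  0, -1,  0,  0, -1,  0,  1,  0],
    [ 0,  0,  0,  0,  0,  0, -1,  0,  0, -1,  0,  1],
    [ 0,  0,  0,  0,  0,  0,  0, -1,  0,  0, -1,  0],
])
d = round(np.linalg.det(A))
print(d, round(d ** 0.5))  # 121 and 11: the 3 x 4 board has 11 domino tilings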
Note that there were two places where we used the planarity of G. First,
G can have a Pfaffian orientation, and second, any even cycle appearing in an
oriented even cycle covering has an odd number of clockwise edges. Indeed,
Theorem 39 holds for any graph. This suggests a weaker definition of Pfaffian
orientation.
Definition 47. Let G = (V, E) be a graph. A Pfaffian orientation of the edges of G in the weak sense is an orientation such that, for the corresponding oriented adjacency matrix A of G,
\[
\sqrt{\det(A)} \tag{3.56}
\]
is the number of perfect matchings of G.
Using Z[x], assign the weight x to each horizontal edge, and the weight 1 to each vertical edge. Number the vertices from top to bottom, and from left to right, row by row. Then the weighted oriented adjacency matrix is
\[
A = \begin{pmatrix}
0 & x & 0 & 0 & 1 & 0 & 0 & 0 \\
-x & 0 & x & 0 & 0 & 1 & 0 & 0 \\
0 & -x & 0 & x & 0 & 0 & 1 & 0 \\
0 & 0 & -x & 0 & 0 & 0 & 0 & 1 \\
-1 & 0 & 0 & 0 & 0 & -x & 0 & 0 \\
0 & -1 & 0 & 0 & x & 0 & -x & 0 \\
0 & 0 & -1 & 0 & 0 & x & 0 & -x \\
0 & 0 & 0 & -1 & 0 & 0 & x & 0
\end{pmatrix}
\]
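With symbolic entries the same determinant yields the generating polynomial of the tilings by the number of horizontal dominoes. A sympy sketch; the comparison polynomial x^4 + 3x^2 + 1 is the expected matching polynomial of the 2 × 4 grid (one all-vertical tiling, three tilings with one horizontal pair, one with two), stated here as an assumption to check against:

import sympy as sp

x = sp.symbols('x')
A = sp.Matrix([
    [ 0,  x,  0,  0,  1,  0,  0,  0],
    [-x,  0,  x,  0,  0,  1,  0,  0],
    [ 0, -x,  0,  x,  0,  0,  1,  0],
    [ 0,  0, -x,  0,  0,  0,  0,  1],
    [-1,  0,  0,  0,  0, -x,  0,  0],
    [ 0, -1,  0,  0,  x,  0, -x,  0],
    [ 0,  0, -1,  0,  0,  x,  0, -x],
    [ 0,  0,  0, -1,  0,  0,  x,  0],
])
d = sp.expand(A.det())
print(d)
print(sp.expand(d - (x**4 + 3*x**2 + 1)**2) == 0)  # True: det(A) = pf(A)^2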
number. There is no yield algebra (A, (Θ, ≤), p, O, R) satisfying the following properties.
1. For each n, there is a parameter θn such that S(θn ) contains the spanning trees of the complete graph K[n] .
2. There is a function g : A → \(2^{\binom{\mathbb{Z}^+}{2}}\) with the following properties:
(a) For any ◦i ∈ O, and for any of its operands,
\[
g\bigl(\circ_i\bigl((a_j)_{j=1}^{m_i}\bigr)\bigr) = \bigsqcup_{j=1}^{m_i} g(a_j) \sqcup h_{\circ_i}(p(a_1), \ldots, p(a_{m_i}))
\]
where h◦i is a function mapping from Θmi to \(2^{\binom{\mathbb{Z}^+}{2}}\).
\[
T_i(r_1, \ldots, r_{m_i}, \theta_1, \ldots, \theta_{m_i}) := \prod_{(k,l) \in h_{\circ_i}(\theta_1, \ldots, \theta_{m_i})} x_{k,l} \prod_{j=1}^{m_i} r_j. \tag{3.64}
\]
Jerrum and Snir also proved an exponential lower bound on the number of
additions and multiplications to calculate the permanent polynomial defined
as
\[
\mathrm{per}(M) := \sum_{\sigma \in S_n} \prod_{i=1}^{n} x_{i,\sigma(i)} \tag{3.65}
\]
where h◦i is a function mapping from Θmi to \(2^{\mathbb{Z}^+ \times \mathbb{Z}^+}\).
(b) If a ∈ A is a permutation of length n, then g(a) is the set of edges in the cycle cover of K⃗ [n] that the permutation indicates.
3. It holds that |θn↓ | = O(poly(n)).
Proof. The proof is similar to the proof of Theorem 42. If such a yield algebra existed, then we could build up a corresponding evaluation algebra that could compute the permanent polynomial using only a polynomial number of additions and multiplications.
Contrary to the spanning tree polynomial, no algorithm is known to compute the permanent polynomial using only a polynomial number of arithmetic operations. In fact, computing the permanent is #P-hard, so no polynomial algorithm exists to compute the permanent of a matrix, assuming that P is not NP. From this angle, it looks really accidental that the determinant
can be calculated with a dynamic programming algorithm due to cancellation
of terms. What that dynamic programming algorithm really calculates is the
sum of the weights of clow sequences which coincides with the determinant
of a matrix. A large set of similar “accidental” cases are known where we
can compute the number of combinatorial objects in polynomial time due to
cancellations. These cases are briefly described in Chapter 5.
Leslie Valiant proved that computing the perfect matching polynomial
needs exponentially many arithmetic operations if subtractions are forbidden
[177]. His theorem holds also if the problem is restricted to planar graphs.
On the other hand, the perfect matching polynomial can be computed with
a polynomial number of arithmetic operations if subtractions are allowed. In-
deed, it is easy to see that the formal computations in Theorem 41 build up
the perfect matching polynomial using only a polynomial number of arith-
metic operations in the multivariate polynomial ring. This again shows the
computational power of subtractions: subtractions might have exponential
computational power.
\[
\prod_{i=1}^{\lceil n/2 \rceil} \prod_{j=1}^{\lceil m/2 \rceil} \Bigl( 4\cos^2 \frac{i\pi}{n+1} + 4\cos^2 \frac{j\pi}{m+1} \Bigr). \tag{3.66}
\]
• György Pólya asked the following question [142]. Let A be a square 0-1 matrix. Is it possible to transform it into a matrix B by changing some of its 1s to −1, such that the permanent of A is the determinant of B?
Neil Robertson, Paul Seymour and Robin Thomas solved this question
[146]. Roughly speaking, their theorem says that a matrix A has the
above-mentioned property if and only if it is the adjacency matrix of a
bipartite graph that can be obtained by piecing together planar graphs
and one sporadic non-planar bipartite graph. Their theorem provides a
polynomial running time algorithm to decide if A has such a property
and if so, it also generates a corresponding matrix B.
• Simon Straub, Thomas Thierauf and Fabian Wagner gave a polynomial time algorithm to count the perfect matchings in K5 -free graphs [164]. It is interesting to mention that some K5 -free graphs cannot be given a Pfaffian orientation even in the weak sense; an example is K3,3 , see also Exercise 23. Their algorithm decomposes a K5 -free graph into components and applies some matchgate techniques to get a polynomial time algorithm. Matchgates are introduced in this book in Chapter 5.
3.7 Exercises
1. List all clow sequences of length 3.
2. ◦ Let Cln denote the number of clow sequences of length n on the
numbers {1, 2, . . . , n}. Give a dynamic programming recursion that finds
Cln .
pf 2 (A) = det(A).
23. * The complete bipartite graph K3,3 is not planar. Prove that it does not have a Pfaffian orientation in the weak sense (Definition 47).
24. ◦ Let G = (V, E) be a planar graph, and let w : E → R be a weight function. Show that
\[
Z(G) := \sum_{M \in PM(G)} \sum_{e \in M} w(e)
\]
3.8 Solutions
Exercise 2. Apply the algebraic dynamic programming approach on Equa-
tions (3.7) and (3.8).
Exercise 3. Those clow sequences that contain only one clow with head 1 are already exponentially more numerous than the permutations of the same length. Indeed, there are (n − 1)^{n−1} clows of length n with head 1 over the indexes 1, 2, . . . , n. Therefore the ratio of the number of clow sequences to the number of permutations is at least
\[
\frac{(n-1)^{n-1}}{n!}.
\]
Applying the Stirling formula, this fraction is approximately
\[
\frac{(n-1)^{n-1}}{\sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}} = \frac{e^{n}}{n\sqrt{2\pi n}} \left(\frac{n-1}{n}\right)^{n-1},
\]
which grows exponentially with n.
Using standard calculations, we get that det(D−v − A−v ) = 384, therefore the octahedron has 384 spanning trees.
Exercise 10. Let G−e be the graph obtained from G by removing edge e.
Observe that the difference in the number of spanning trees of G and G−e is
the number of spanning trees containing e. If T e is the number of spanning
trees containing e, then the value to be calculated is
\[
\sum_{e \in E} T^e w(e).
\]
and π2 is
\[
\begin{pmatrix}
1 & 2 & \cdots & k_1 & k_1+1 & \cdots & k_1+k_2 & \cdots & 2n-k_m & \cdots & 2n \\
x_{1,2} & x_{1,3} & \cdots & x_{1,1} & x_{2,2} & \cdots & x_{2,1} & \cdots & x_{m,2} & \cdots & x_{m,1}
\end{pmatrix}.
\]
Observe that
\[
\mathrm{sign}(\pi_1)\,\mathrm{sign}(\pi_2) = \mathrm{sign}(\sigma)
\]
since
\[
\pi_1^{-1} \pi_2 = \sigma,
\]
and the sign of any permutation is the sign of its inverse. Therefore we get that for any so-constructed σ, π1 and π2 ,
\[
\Bigl(\mathrm{sign}(\pi_1) \prod_{i=1}^{n} a_{\pi_1(2i-1), \pi_1(2i)}\Bigr) \Bigl(\mathrm{sign}(\pi_2) \prod_{i=1}^{n} a_{\pi_2(2i-1), \pi_2(2i)}\Bigr) = \mathrm{sign}(\sigma) \prod_{i=1}^{2n} a_{i,\sigma(i)}. \tag{3.67}
\]
Summing this for all permutations σ which are even cycle coverings, we get that
\[
\Bigl( \sum_{\pi_1 \in S_{2n}/\sim} \mathrm{sign}(\pi_1) \prod_{i=1}^{n} a_{\pi_1(2i-1),\pi_1(2i)} \Bigr) \Bigl( \sum_{\pi_2 \in S_{2n}/\sim} \mathrm{sign}(\pi_2) \prod_{i=1}^{n} a_{\pi_2(2i-1),\pi_2(2i)} \Bigr) = \sum_{\sigma \in S_{2n} \setminus O_{2n}} \mathrm{sign}(\sigma) \prod_{i=1}^{2n} a_{i,\sigma(i)}.
\]
Observe that on the left-hand side we have pf 2 (A) by definition, and on the
right-hand side, we have det(A) due to Equation (3.54).
Exercise 20. Index the vertices of the octahedron such that the north pole is the first vertex, the south pole is the last one, and the vertices on the equator are ordered along the walk on the equator. With an appropriate Pfaffian orientation, the oriented adjacency matrix is
\[
A = \begin{pmatrix}
0 & 1 & -1 & -1 & -1 & 0 \\
-1 & 0 & 1 & 0 & -1 & -1 \\
1 & -1 & 0 & -1 & 0 & -1 \\
1 & 0 & 1 & 0 & -1 & 1 \\
1 & 1 & 0 & 1 & 0 & -1 \\
0 & 1 & 1 & -1 & 1 & 0
\end{pmatrix}.
\]
Using standard matrix calculations, we get that det(A) = 64. Therefore, the
number of perfect matchings in an octahedron is 8.
Exercise 23. From Exercise 22, we get that the Pfaffian of the oriented adjacency matrix of the complete bipartite graph K3,3 is the determinant of the matrix
\[
A = \begin{pmatrix} x_1 & x_2 & x_3 \\ x_4 & x_5 & x_6 \\ x_7 & x_8 & x_9 \end{pmatrix}
\]
where each xi is either 1 or −1. The number of perfect matchings in K3,3 is 6. However, the determinant of the matrix A has only 6 terms, so the determinant can be 6 only if each term is 1. This means that det(A) could be 6 only if each of the sets {x1 , x5 , x9 }, {x2 , x6 , x7 }, {x3 , x4 , x8 } contains an even number of −1s, and each of the sets {x1 , x6 , x8 }, {x2 , x4 , x9 }, {x3 , x5 , x7 } contains an odd number of −1s. From the first 3 sets, we get that there must be an even number of −1s in A. However, from the second 3 sets, we get that there must be an odd number of −1s in A. This is impossible.
Exercise 24. Similar to Exercise 10, for each edge (vi , vj ), we can remove
vertices vi and vj to count the perfect matchings containing edge (vi , vj ).
Chapter 4
#P-complete counting problems
2. Polynomial reductions that do not keep the relative error. Such reduc-
tions do not exclude the possibility for an FPRAS approximation; on
the other hand, they do not guarantee the existence of efficient approx-
imations.
(a) Reductions using only one subtraction. For example, this is the way
to reduce #SAT to #DNF.
(b) Modulo prime number calculations. This is applied in reducing the
permanent computation of matrices of −1, 0, 1, 2 and 3 to matrices
containing only small non-negative integers.
(c) Polynomial interpolation. This reduction is based on the following
fact. If the value of a polynomial of order n is computed in n +
1 different points, then the coefficients of the polynomial can be
determined in polynomial time by solving a linear equation system.
In the polynomial reduction, for a problem instance a of the #P-complete problem A, m + 1 problem instances bj of problem B are constructed such that the number of solutions for bj is the evaluation of some polynomial \(\sum_{i=0}^{m} c_i x^i\) at distinct points x = xj (see the sketch after this list). For some k, ck is the number of solutions of problem instance
a. This is a very powerful approach that is used in many #P-
completeness proofs of counting problems including counting the
not necessarily perfect matchings in planar graphs and counting
the subtrees of a graph.
(d) Reductions using other linear equation systems. An example in this
chapter is the reduction of the number of perfect matchings to the
number of Eulerian orientations of a graph.
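The linear algebra behind reduction type (c) is nothing more than solving a Vandermonde system. A minimal Python sketch, in which the "oracle" standing in for the m + 1 constructed problem instances and the hidden polynomial are both hypothetical:

import numpy as np

def recover_coefficients(points, values, m):
    # solve sum_i c_i x^i = value at each evaluation point
    V = np.vander(np.array(points, dtype=float), m + 1, increasing=True)
    return np.linalg.solve(V, np.array(values, dtype=float))

hidden = lambda x: 3 + 5 * x ** 2   # plays the role of the counting oracle
xs = [0, 1, 2]
print(np.round(recover_coefficients(xs, [hidden(x) for x in xs], 2)))  # [3. 0. 5.]

In a reduction, the recovered coefficient c_k is the number of solutions of the original instance a.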
Finally, a class of #P-complete computational problems has become a
center of interest recently. These problems are known to be equally hard, and
it is conjectured that they do not have an FPRAS approximation. We will
discuss them in Subsection 4.3.2.
(Figure: a directed acyclic graph of ∨ gates over the inputs x̄1 , x̄2 , x3 , x4 .)
Similar 3CNFs can be constructed when one or both incoming values are
negated. The conjunction of the 3CNFs obtained for the internal nodes de-
scribes the computation on the directed acyclic graph, namely, the satisfying
assignments represent the possible computations on the directed acyclic graph.
To get the satisfying assignments of the initial CNF Φ, we require that the value propagated to O must be a logical TRUE value. We can represent this with a 3CNF by adding two auxiliary logical variables, O′ and O″.
If the CNF Φ has n variables and m logical operations ∨ and ∧, then Φ′ has n + m + 2 logical variables and 4m + 7 clauses. Since Φ′ can clearly be constructed in polynomial time, we get the following theorem:
Theorem 44. The counting problem #3SAT is #P-complete.
the sum of the weights of the cycle covers in G⃗, that is, the permanent of the corresponding weighted adjacency matrix. Furthermore, G⃗ can be constructed from Φ in polynomial time. This construction proves that calculating the permanent is #P-hard, since calculating t(Φ) as well as dividing an integer by 4^{t(Φ)} are both easy.
The construction is such that each cycle cover corresponding to a satisfying assignment has weight 4^{t(Φ)} and all other “spurious” cycle covers cancel each other out. Let Φ = C1 ∧ C2 ∧ · · · ∧ Cm , where each Ci = (yi,1 ∨ yi,2 ∨ yi,3 ) with yi,j ∈ {x1 , x̄1 , x2 , x̄2 , . . . , xn , x̄n }. The graph is built up using the following
gadgets:
(a) A track Tk for each variable xk ;
(b) an interchange Ri for each clause Ci ;
(c) for each literal yi,j such that yi,j is either xk or x̄k , a junction Ji,k at
which Ri and Tk meet. Interchanges also have internal junctions with
the same structure.
Each junction is a 4-vertex, weighted, directed graph with the following
weighted adjacency matrix X:
\[ X := \begin{pmatrix} 0 & 1 & -1 & -1 \\ 1 & -1 & 1 & 1 \\ 0 & 1 & 1 & 2 \\ 0 & 1 & 3 & 0 \end{pmatrix}. \tag{4.8} \]
Each junction has external connections via the 1st and 4th vertices, and not via the other two vertices. Let $X[\gamma; \delta]$ denote the submatrix obtained from X by deleting the rows in γ and the columns in δ. The following properties are easy to verify:
(a) per(X) = 0,
[Figure: the track meets the interchanges $R_2$, $R_3$, $R_5$ at the junctions $J_{2,5}$, $J_{3,5}$, $J_{5,5}$.]
FIGURE 4.2: The track $T_5$ for the variable $x_5$ when $x_5$ is a literal in $C_2$ and $C_5$ and $\overline{x_5}$ is a literal in $C_3$.
Interchange $R_i$ contains two vertices $w_{i,1}$ and $w_{i,2}$, 3 junctions for the 3 literals, and 2 internal junctions. They are wired as shown in Figure 4.3. The interchanges do not distinguish the literals $x_k$ and $\overline{x_k}$; the edges connecting the junctions are always the same.
We define a route in $\vec{G}$ as a set of cycle covers that contains the same edges outside the junctions.
outside the junctions. A route is good if every junction and internal junction is
entered exactly once and left exactly once at the opposite end. A route might
not be good for several reasons:
1. some junction and/or internal junction is not entered and left, or
2. it is entered and left on the same end, or
3. it is entered and left twice.
Due to the properties of the permanents of the submatrices of X, any route which is not good contributes 0 to the permanent. Indeed, if a junction is not entered and left in a route, then the sum of the weights of the cycle covers in that route will be 0 due to property (a). Similarly, if a junction is entered and left on the same end or it is entered and left twice, the sum of the weights of cycle covers will be 0 due to properties (b)–(d). On the other hand, property (e) ensures that any good route contributes $4^{t(\Phi)}$ to the permanent: the number of junctions and internal junctions is indeed $t(\Phi)$.
We have to show that the good routes are in a bijection with the satisfying
assignments of Φ. Observe the following:
1. Any good route in any track $T_k$ either "picks up" the junctions corresponding to the $x_k$ literal or picks up the junctions corresponding to the $\overline{x_k}$ literal. Assign the logical TRUE value to $x_k$ in the first case and the logical FALSE value in the second case.
[Figure 4.3: The interchange $R_3$, with vertices $w_{3,1}$ and $w_{3,2}$ and junctions $J_{3,1}$, $J_{3,5}$, $J_{3,8}$ at which it meets the tracks $T_1$, $T_5$, $T_8$.]
2. The interchanges are designed so that a good route can pick up the two
internal junctions and any subset of the junctions except all of them.
Furthermore, it can pick up these subsets in exactly one way. Therefore,
in any good route, each interchange picks up the junctions corresponding
to the literals that do not satisfy the clause corresponding to the inter-
change. Since it cannot pick up all the literals, some of them will satisfy
the clause. This is true for each interchange, namely, the assignment we
defined based on the good route is a satisfying assignment.
The proof that computing the permanent remains #P-hard for non-negative entries is based on a polynomial reduction that contains some computational steps that do not preserve the relative error. Later we will see that a polynomial reduction containing only computational steps that preserve the relative error does not exist unless RP = NP.
where the index of $x_{i+1}$ is taken modulo n, that is, $x_{n+1}$ is defined as $x_1$. The assignment of the $x_i$ variables in any satisfying assignment of Φ′ also satisfies Φ; therefore, it is sufficient to show that any satisfying assignment of Φ can be extended to exactly one satisfying assignment of Φ′. But this is obvious: the conjunctive form in Equation (4.11) forces $y_i$ to take the value of $\overline{x_i}$. This also ensures that in any satisfying assignment of Φ′, exactly half of the values are TRUE.
To prove that computing the quantity in Equation (4.10) is #P-complete, we give a polynomial running time algorithm which, for any 3CNF formula Φ, constructs a problem instance p of #SPS-Tree with the following property: the number of solutions of p can be written as $a + by$, where y is the number of satisfying assignments of Φ, b is an easy-to-calculate positive integer, and $0 < a < b$. Thus, if s is the number of solutions of p, then $\lfloor s/b \rfloor$ is the number of satisfying assignments of Φ.
Let Φ be a 3CNF, and let Φ′ be a 3CNF that has as many satisfying assignments as Φ and in which, in every satisfying assignment, the number of TRUE and FALSE values is the same. Let n denote the number of logical variables in Φ′ and let k denote the number of clauses in Φ′. We are going to construct a tree denoted by $T_{\Phi'}$, and to label its leaves with sequences over the alphabet {0, 1}. The first n characters of each sequence correspond to the logical variables $x_i$, and there are further, auxiliary characters. The number of auxiliary characters is
\[ 148k\left(\left\lceil \frac{k\log(n!) + n\log 2}{\log\left(2^{20}/3^{12}\right)} \right\rceil + 1\right). \]
The construction is such that there will be $2^n$ most parsimonious labelings of the internal nodes, one for each possible logical assignment. Each labeling is such that the labeling of the root completely determines the labelings at the other internal nodes. The corresponding assignment is such that the value of the logical variable $x_i$ is TRUE if there is a character 1 in the sequence at the root of the tree in position i. The characters in the auxiliary positions are 0 in all the most parsimonious labelings.
If an assignment is a satisfying assignment, then the corresponding labeling
has many more scenarios than the labelings corresponding to non-satisfying
assignments. Furthermore, for each satisfying assignment, the corresponding
labelings have the same, easy-to-compute number of scenarios.
[Figure: the structure of the tree built for one clause: the unit subtrees are connected into the blown-up subtree, which is then amended.]
solutions for the non-satisfying assignment is smaller than for any of the sat-
isfying assignments. Such combinations can be found by some linear algebraic
considerations not presented here; below we just show one possible solution.
For each elementary tree, we give the characters at positions of the three
literals. The elementary trees which are cherries are the following:
• There are four cherries on which the left leaf contains 1 in an extra po-
sition, and the characters in the positions of the three literals on the left
and right leaf are given by
011, 100
101, 010
110, 001
000, 111.
• There is one cherry motif without any extra adjacency, and the charac-
ters in the positions corresponding to the literals are
000, 111.
There are 8 most parsimonious labelings at the root, and each needs 3
substitutions. If the labeling at the root corresponds to a non-satisfying
assignment, the number of scenarios on this cherry is 6; if all logical
values are true, the number of scenarios is still 6; in any other case, the
number of scenarios is 2.
This elementary subtree is repeated 3 times.
• Finally, there are 3 types of cherry motifs with a character 1 at one extra position on each of the two leaves, and the characters in the positions corresponding to the literals are given by
011, 100
101, 010
110, 001.
There are 8 possible labelings of the root which are most parsimonious in $T_{\Phi'}$, and each needs 5 substitutions. If all substitutions at the positions corresponding to the 3 literals fall onto one edge, then the number of scenarios is 24; otherwise the number of scenarios is 12.
Each of these elementary subtrees is repeated 15 times.
The remaining elementary subtrees contain 3 cherry motifs connected with a comb, that is, a completely unbalanced tree; see also Figure 4.5. For the cherry at the right end of this elementary subtree, there are one or more auxiliary positions that have character 1 at one of the leaves and 0 everywhere else in $T_{\Phi'}$.
There are 3 elementary subtrees of this type which have only one auxiliary
position. On these trees, the sequence at the right leaf of the rightmost cherry
is all 0, and the sequence at the left leaf of the rightmost cherry motif is
all 0 except at the auxiliary position and exactly 2 positions amongst the 3
positions corresponding to the literals.
The remaining leaves of these elementary subtrees are constructed in such a
way that there are 8 most parsimonious labelings, each needing 7 substitutions,
see the example in Figure 4.5. The number of substitutions is 0 or 1 at each
edge except the two edges of the rightmost cherry motif. Here the number of
substitutions might be 3 and 0, 2 and 1, or 1 and 2, yielding 6 or 2 scenarios,
see also Table 4.1.
Each of these elementary subtrees is repeated 3 times.
Finally, there are 3 elementary subtrees of this type which have one aux-
iliary position for the left leaf of the rightmost cherry motif, and there are
2 auxiliary positions for the right leaf of the rightmost cherry motif. The se-
quence at the right leaf of the rightmost cherry is all 0 except at the 2 auxiliary
positions, and the sequence at the left leaf of the rightmost cherry motif is
all 0 except at the auxiliary positions and exactly 2 positions amongst the 3
positions corresponding to the literals.
The remaining leaves of these elementary subtrees are constructed in such a
way that there are 8 most parsimonious labelings, each needing 9 substitutions,
see the example in Figure 4.5. The number of substitutions is 0 or 1 on each
edge except the two edges of the rightmost cherry motif. Here the number of
substitutions might be 1 and 4, 2 and 3, or 3 and 2, yielding 24 or 12 scenarios,
see also Table 4.1.
Each of these elementary subtrees is repeated 5 times.
[Figure 4.5, panels a)-c): the leaf assignments for the adjacencies $\alpha_1$, $\alpha_2$, $\alpha_3$ and $\alpha_x$ are shown below each leaf.]
FIGURE 4.5: a) A cherry motif, i.e., two leaves connected with an internal
node. b) A comb, i.e., a fully unbalanced tree. c) A tree with 3 cherry motifs
connected with a comb. The assignments for 4 adjacencies, α1 , α2 , α3 and
αx are shown at the bottom for each leaf. αi , i = 1, 2, 3 are the adjacencies
related to the logical variables bi , and αx is an extra adjacency. Note that
Fitch’s algorithm gives ambiguity for all adjacencies αi at the root of this
subtree.
      011  101  110  000  011  101  110  000  011   101   110   011    101    110
#       1    1    1    1    3    3    3    3    5     5     5     15     15     15
000     6    6    6    6  6^3  6^3  6^3  6^3  12^5  12^5  12^5  12^15  12^15  12^15
100    24    4    4    4  6^3  2^3  2^3  2^3  12^5  12^5  12^5  24^15  12^15  12^15
010     4   24    4    4  2^3  6^3  2^3  2^3  12^5  12^5  12^5  12^15  24^15  12^15
110     6    6    6    6  2^3  2^3  2^3  2^3  12^5  12^5  24^5  12^15  12^15  24^15
001     4    4   24    4  2^3  2^3  6^3  2^3  12^5  12^5  12^5  12^15  12^15  24^15
101     6    6    6    6  2^3  2^3  2^3  2^3  12^5  24^5  12^5  12^15  24^15  12^15
011     6    6    6    6  2^3  2^3  2^3  2^3  24^5  12^5  12^5  24^15  12^15  12^15
111     4    4    4   24  2^3  2^3  2^3  6^3  24^5  24^5  24^5  12^15  12^15  12^15
TABLE 4.1: The number of scenarios on each type of elementary subtree (columns; the row marked # gives how many times the subtree is repeated) for each labeling of the root at the three literal positions (rows).
In this way, the roots of all 76 elementary subtrees have 8 most parsimo-
nious labelings corresponding to the 8 possible assignments of the literals in
the clause. We connect the 76 elementary subtrees with a comb, and thus,
there are still 8 most parsimonious labelings at the root of the entire subtree,
which is the unit subtree. If the labeling at the root corresponds to a satisfying assignment of the clause, the number of scenarios is $2^{156} \times 3^{64}$; if the clause is not satisfied, the number of scenarios is $2^{136} \times 3^{76}$, as can be checked in Table 4.1. The ratio of them is indeed $2^{20}/3^{12} = \gamma$. The number of leaves on this unit subtree is 248, and 148 auxiliary positions are introduced.
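As a quick sanity check of the two scenario counts and their ratio, the following Python snippet (ours, not from the book) verifies the arithmetic exactly:

from fractions import Fraction

satisfied     = 2**156 * 3**64   # scenarios when the clause is satisfied
not_satisfied = 2**136 * 3**76   # scenarios when the clause is not satisfied
# The ratio is exactly gamma = 2^20 / 3^12, as stated in the text.
assert Fraction(satisfied, not_satisfied) == Fraction(2**20, 3**12)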
This was the construction of the constant size unit subtree. In the next step, we "blow up" the system. A similar blowing up can be found in the seminal paper by Jerrum, Valiant and Vazirani [103], in the proof of Theorem 5.1. We repeat the above described unit subtree $\lceil (k\log(n!) + n\log 2)/\log(\gamma) \rceil + 1$ times, and connect all of them with a comb (completely unbalanced tree). It is easy to see that there are still 8 most parsimonious labelings. For a solution satisfying the clause, the number of scenarios on this blown-up subtree is
\[ X = \left(2^{156} \times 3^{64}\right)^{\left\lceil \frac{k\log(n!) + n\log 2}{\log(\gamma)} \right\rceil + 1}. \tag{4.12} \]
Let all adjacencies not participating in the clause be 0 on this blown-up sub-
tree.
We are close to the final subtree $T_{c_j}$ for one clause, $c_j$. In the third phase, we amend the so-far obtained tree with a constant-size subtree. The amending is slightly different for clauses coming from Φ and for those that are in Φ′ \ Φ. We detail the amending for both cases.
If the clause contains only x-type logical variables, say, the clause is $x_1 \vee x_2 \vee x_3$, then construct two copies of a fully balanced, depth 6 binary tree, on which the root has 64 most parsimonious labelings corresponding to the 64 possible assignments of the literals participating in the clause and their corresponding logical variables of the y type (namely, $y_1$, $y_2$ and $y_3$). This can be done with a construction similar to the left part of the tree in Figure 4.5.c).
In one of the copies, all other characters corresponding to logical variables not participating in the clause are 1 on all leaves, and thus in each most parsimonious labeling of the root. In the other copy, those characters should be all 0.
In the copy where all other characters are 0, the construction should be done in such a way that, going from the root of the tree, first the y logical variables are separated, then the x ones. Namely, characters at the position corresponding to $y_1$ should be the same (say, 0) on each leaf of the left subtree of the root and should be the other value on each leaf of the right subtree of the root. Similarly, for each of the four grandchildren of the root, the leaves must take the same value at the position corresponding to $y_2$, and these values must be different for the siblings. The same rule must be applied for the great-grandchildren of the root. There is an internal node of this subtree such that on all of its leaves, each character at each position corresponding to y variables is 0. Replace the subtree at this position with the blown-up subtree. Connect the two copies with a common root. The so-obtained tree is $T_{c_j}$.
Observe that there are $2^n$ possible most parsimonious labelings of $T_{c_j}$. We have the following lemma on them.
Lemma 8. For any most parsimonious labeling, if Φ′, and thus particularly the clause $c_j$, is satisfied, then the number of scenarios on $T_{c_j}$ is
\[ X \times \left(\left(\frac{n-6}{2}\right)!\right)^2 \geq Y \times (n!)^k \times 2^n \times \left(\left(\frac{n-6}{2}\right)!\right)^2. \tag{4.17} \]
If the clause $c_j$ is not satisfied, then the number of scenarios is at most $Y \times (n-6)!$. If the clause $c_j$ is satisfied but Φ′ is not satisfied, then the number of scenarios is at most $X \times (n-6)!$.
Proof. There are 3 logical x variables in the clause and there are 3 corresponding y variables. For the remaining $n-6$ variables, there are $n-6$ substitutions on the two edges of the root. If Φ′ is satisfied, then for each i, exactly one in the couple $(x_i, y_i)$ has the TRUE value and the other has the FALSE value. Therefore, there are $(n-6)/2$ substitutions on both edges of the root. On all remaining edges of the amending, there is either 0 or 1 substitution. Finally, the number of scenarios on the blown-up tree is X. Therefore, the number of scenarios is indeed $X \times \left(\left(\frac{n-6}{2}\right)!\right)^2$ if Φ′ is satisfied. The inequality in Equation (4.17) comes from Equation (4.14).
If the clause is not satisfied, then the number of scenarios on the blown-up tree is Y. The substitutions on the two edges of the root might be arbitrarily distributed; however, in any case, the number of scenarios is at most $(n-6)!$. This extremity is attained when all the substitutions fall onto the same edge.
If $c_j$ is satisfied but Φ′ is not, then the number of scenarios on the blown-up tree is X, and the number of scenarios on the two edges of the root is at most $(n-6)!$.
Proof. The proof is similar to the proof of Lemma 8, except that now there are $n-4$ substitutions that must be distributed on the two edges of the root.
For all k clauses, construct such a subtree and connect all of them with a comb. This is the final tree $T_{\Phi'}$ for the 3CNF Φ′. It is easy to see that $T_{\Phi'}$ has $2^n$ most parsimonious labelings corresponding to the $2^n$ possible assignments of the logical variables. For these labelings, we have the following theorem.
Theorem 45. If a labeling corresponds to a satisfying assignment, then the number of scenarios is
\[ X^k \times \left(\left(\frac{n-4}{2}\right)!\right)^{2n} \times \left(\left(\frac{n-6}{2}\right)!\right)^{k-2n} \geq Y^k \times (n!)^k \times 2^{nk} \times \left(\left(\frac{n-4}{2}\right)!\right)^{2n} \times \left(\left(\frac{n-6}{2}\right)!\right)^{k-2n}. \tag{4.19} \]
where $I_m(G)$ is the set of independent sets in G of size m, and b is the number of independent sets in G′ whose inverse image has size at most $m-1$. Since there are at most $2^n$ independent sets in G, we get that

Therefore
\[ \left\lfloor \frac{|I(G')|}{(2^{n+2}-1)^m} \right\rfloor = \left\lfloor \frac{(2^{n+2}-1)^m |I_m(G)| + b}{(2^{n+2}-1)^m} \right\rfloor = |I_m(G)|, \tag{4.26} \]
\[ S \to 1W_{i,1}. \tag{4.29} \]
\[ S \to 0W_{i,1}. \tag{4.30} \]
If $x_{j+1}$ is a literal in the ith clause, then add the rewriting rule
If $\overline{x_{j+1}}$ is a literal in the ith clause, then add the rewriting rule
Finally, for each i, if neither $x_n$ nor $\overline{x_n}$ is a literal in the ith clause, then add the rewriting rules
\[ W_{i,n-1} \to 0 \mid 1. \tag{4.34} \]
If $x_n$ is a literal in the ith clause, then add the rewriting rule
\[ W_{i,n-1} \to 1. \tag{4.35} \]
If $\overline{x_n}$ is a literal in the ith clause, then add the rewriting rule
\[ W_{i,n-1} \to 0. \tag{4.36} \]
We claim that there is a bijection between the language that the grammar G generates and the satisfying assignments of Φ. Specifically, G generates 0-1 sequences $A = a_1 a_2 \ldots a_n$ of length n, and the image of such a sequence is the assignment in which the logical variable $x_i$ is TRUE if $a_i = 1$ and FALSE otherwise.
If A is in the language, then there is a generation of it. Take any generation of A, and consider the first rewriting. It is either
\[ S \to 0W_{i,1} \]
or
\[ S \to 1W_{i,1} \]
for some i. For the literals in the ith clause of the DNF, the rewriting rules in the selected generation of A will be such that the corresponding assignment of the logical variables in the image of A satisfies the DNF. Therefore, every image is a satisfying assignment. Clearly, different sequences have different images; thus, the mapping is an injection of the language into the satisfying assignments.
We can also inject the satisfying assignments into the language. Let X be a satisfying assignment, and assume that the ith clause of the DNF is satisfied by X. Let $a_j$ be 1 if $x_j$ is TRUE in the assignment X and let $a_j$ be 0 otherwise. It is easy to see that
\[ S \to a_1 W_{i,1} \to a_1 a_2 W_{i,2} \to \ldots \to a_1 a_2 \ldots a_n \]
is a possible generation in G, and it is easy to see that the image of the so-generated sequence is indeed X.
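The construction can be played with in a few lines of Python. The sketch below is ours, not from the book; the encoding of a DNF clause as a dictionary mapping a 0-based variable index to the sign of its literal is an assumption made for illustration. Each derivation commits to one clause and forces the characters at the positions of that clause's literals, so the size of the generated language equals the number of satisfying assignments.

from itertools import product

def language(clauses, n):
    """The set of 0-1 strings of length n generated by the grammar: position
    j is forced whenever variable j occurs in the chosen clause."""
    words = set()
    for clause in clauses:
        choices = [['1'] if clause.get(j) is True else
                   ['0'] if clause.get(j) is False else ['0', '1']
                   for j in range(n)]
        words.update(''.join(w) for w in product(*choices))
    return words

def count_dnf(clauses, n):
    """Brute-force count of the satisfying assignments of the DNF."""
    return sum(any(all(assignment[j] == sign for j, sign in clause.items())
                   for clause in clauses)
               for assignment in product([False, True], repeat=n))

# (x0 AND NOT x2) OR (x1 AND x2) over 3 variables: 4 satisfying assignments.
clauses = [{0: True, 2: False}, {1: True, 2: True}]
assert len(language(clauses, 3)) == count_dnf(clauses, 3) == 4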
This lemma comes from the well-known number theoretical result that the first Chebyshev function, defined as
\[ \vartheta(n) := \sum_{p \leq n} \log(p), \tag{4.39} \]
is asymptotically n.
We can find the list of prime numbers up to $\lceil n \log_2(3n) \rceil$ in polynomial time using elementary methods (for example, the sieve of Eratosthenes). Note that the running time has to be polynomial in the value of n and not in the number of digits necessary to write n. Let A be a matrix containing values −1, 0, 1, 2 and 3. Let p be a prime, and let A′ be the matrix obtained from A by replacing each −1 with $p-1$. Observe that
FIGURE 4.6: The unweighted subgraph replacing the edge (v, w) with a
weight of 3. See text for details.
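A minimal sketch (ours) of the elementary prime listing mentioned above, the sieve of Eratosthenes, run up to the bound $\lceil n \log_2(3n) \rceil$ used in the reduction; the function name is ours:

from math import ceil, log2

def primes_up_to(limit):
    """Return all primes <= limit by the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if is_prime[i]:
            is_prime[i * i::i] = [False] * len(is_prime[i * i::i])
    return [i for i, p in enumerate(is_prime) if p]

n = 10
bound = ceil(n * log2(3 * n))   # the bound used in the reduction
print(primes_up_to(bound))      # runs in time polynomial in the value of n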
Ac = v (4.41)
where c is the vector of coefficients and v is the vector of the values. Moreover, A can be inverted since it is a Vandermonde matrix, and thus
\[ c = A^{-1} v. \tag{4.42} \]
Observe that for different values of k, the number of matchings in $G_k$ is the value of f at $k^2 + 2k + 1$. Hence, if we could calculate the number of matchings in $G_k$ for $k = 0, 1, \ldots, n$, then we could calculate the coefficients of f. In particular, we could calculate $m_0$, that is, the number of perfect matchings in G. Therefore, we get
Theorem 51. Computing the number of matchings in a bipartite graph is
#P-complete.
Lemma 12. For any $n \geq 4$, the product of the prime numbers strictly between n and $n^2$ is at least $n! \, 2^n$.
The proof of this lemma can be found in [26] and is based on the properties
of the first and second Chebyshev functions.
The outline of proving the #P-completeness of #LE is the following. For
FIGURE 4.7: The Hasse diagram of a clause poset, with literal vertices $x_{i_1}$, $x_{i_2}$, $x_{i_3}$. See text for details.
FIGURE 4.8: The poset $P_{\Phi,p}$, with the vertex b and the sets $V_0, \ldots, V_j, \ldots, V_k$. Ovals represent an antichain of size $p-1$. For the sake of clarity, only the literal and some of the clause vertices for the clause $c_j = (x_{i_1} \vee x_{i_2} \vee x_{i_3})$ are presented here. See also the text for details.
$x_{i_3}$, then each clause vertex $c_{j,l}$ is above a different triple of literal vertices, one chosen from each of the pairs $\{x_{i_1}, \overline{x_{i_1}}\}$, $\{x_{i_2}, \overline{x_{i_2}}\}$ and $\{x_{i_3}, \overline{x_{i_3}}\}$. The clause vertex which is above the triple that actually constitutes $c_j$ is above b, and all other clause vertices are above $V_j$. There are no more comparabilities in $P_{\Phi,p}$. The poset contains
vertices.
To count the linear extensions of $P_{\Phi,p}$, we partition them based on configurations that we define below. A configuration λ is a partition of the literal and clause vertices into 3 sets, $B^\lambda$, $M^\lambda$ and $T^\lambda$, called the base, middle and top sets, respectively. We say that a linear extension respects a configuration $\lambda = \{B^\lambda, M^\lambda, T^\lambda\}$ if $B^\lambda \leq a \leq M^\lambda \leq b \leq T^\lambda$ in the linear extension. The
set of linear extensions respecting a configuration λ is denoted by Lλ . A con-
figuration λ is consistent if Lλ is not empty. It is easy to see that for such
cases |Lλ | is the product of the number of linear extensions of three posets,
restricting PΦ,p to the base, middle and top sets. We denote these posets by
PΦ,p |B λ , PΦ,p |M λ , and PΦ,p |T λ . The number of linear extensions respecting
a configuration can be divided by p if and only if the number of linear ex-
tensions of any of these three posets can be divided by p. Therefore, in the
following, we infer the cases when p does not divide any of these numbers of
linear extensions.
\[ \frac{(p(n+1)-1)!}{p^n}, \tag{4.45} \]
way. Then for each i, the vertices in $U_i$ can be put into an arbitrary order. This can be done in $(p-1)!$ ways for each i; thus the number of linear extensions is indeed
\[ \binom{p(n+1)-1}{p, p, \ldots, p, p-1} \, (p-1)!^{\,n+1} = \frac{(p(n+1)-1)!}{p^n}. \tag{4.47} \]
It is trivial to see that this number cannot be divided by p, since p is greater than $n+1$.
A similar analysis can be done for $P_{\Phi,p}|M^\lambda$. It contains an antichain of size $(p-1)(k+1)$ and some literal and clause vertices. It is easy to see that its number of linear extensions cannot be divided by p if and only if it contains none of the literal vertices and exactly one of the clause vertices for each $V_j$. In such a case, the number of linear extensions of $P_{\Phi,p}|M^\lambda$ is
\[ \frac{(p(k+1)-1)!}{p^k}, \tag{4.48} \]
which cannot be divided by p since $p > k$.
If the number of linear extensions cannot be divided by p for $P_{\Phi,p}|B^\lambda$ or for $P_{\Phi,p}|M^\lambda$, then $P_{\Phi,p}|T^\lambda$ contains one literal vertex for each logical variable and seven clause vertices for each clause. We are going to show that these
where $\mathcal{M}$ is the set of sequences that minimize the sum of Hamming distances from the sequences in S. Namely, for any $M \in \mathcal{M}$,
\[ \sum_{i=1}^{n} H(A_i, M) \tag{4.51} \]
satisfies Φ.
It is easy to see that these properties are nested, namely, if a median
sequence has Property i, then it also has Property j for each j < i. We prove
the following on the median sequences.
If a median sequence M does not have Property 1, then the number of corresponding scenarios can be divided by p. Indeed, in such a case, either $H(M, A) \geq p$ or $H(M, \overline{A}) \geq p$, and thus either $H(M, A)!$ or $H(M, \overline{A})!$ can be divided by p.
If a median sequence M has Property 1 but does not have Property 2, then the number of corresponding scenarios can be divided by p. Indeed, let i be such that $a_i + b_i = 0$ or $a_i + b_i = 2$. Then either $H(M, A_i) = p$ or $H(M, \overline{A_i}) = p$, making the corresponding factorial divisible by p.
If a median sequence M has Properties 1 and 2 but does not have Property 3, then the number of corresponding scenarios can be divided by p. Assume that $c_j = (x_{i_1} \vee x_{i_2} \vee x_{i_3})$ is a clause that is not satisfied by the assignment defined in Equation (4.53). Then $a_{i_1} = a_{i_2} = a_{i_3} = 0$ and $b_{i_1} = b_{i_2} = b_{i_3} = 1$. In that case, the Hamming distance between M and the sequence that is defined for clause $c_j$ in the last row of Table 4.2 is p. It follows that the number of corresponding scenarios can be divided by p. If some of the literals are negated in a clause not satisfied by the assignment defined in Equation (4.53), the same argument holds, since both in the constructed sequences and in M, some of the a and b values are swapped.
A B C M1 M2 M3 M4 M5 M6 M7 M8
111 110 101 011 100 010 001 000
01 00 00 0 +3 p−1 p−1 p−1 p−3 p−1 p−3 p−3 p−3
00 01 00 0 +3 p−1 p−1 p−3 p−1 p−3 p−1 p−3 p−3
00 00 01 0 +3 p−1 p−3 p−1 p−1 p−3 p−3 p−1 p−3
10 11 11 1 +0 p−6 p−6 p−6 p−4 p−6 p−4 p−4 p−4
11 10 11 1 +0 p−6 p−6 p−4 p−6 p−4 p−6 p−4 p−4
11 11 10 1 +0 p−6 p−4 p−6 p−6 p−4 p−4 p−6 p−4
10 10 00 0 +2 p−5 p−5 p−3 p−3 p−3 p−3 p−1 p−1
10 00 10 0 +2 p−5 p−3 p−5 p−3 p−3 p−1 p−3 p−1
00 10 10 0 +2 p−5 p−3 p−3 p−5 p−1 p−3 p−3 p−1
10 10 00 0 +2 p−5 p−5 p−3 p−3 p−3 p−3 p−1 p−1
10 00 01 0 +2 p−3 p−5 p−3 p−1 p−5 p−3 p−1 p−3
00 10 01 0 +2 p−3 p−5 p−1 p−3 p−3 p−5 p−1 p−3
10 01 00 0 +2 p−3 p−3 p−5 p−1 p−5 p−1 p−3 p−3
10 00 10 0 +2 p−5 p−3 p−5 p−3 p−3 p−1 p−3 p−1
00 01 10 0 +2 p−5 p−1 p−5 p−3 p−3 p−1 p−5 p−3
01 10 00 0 +2 p−3 p−3 p−1 p−5 p−1 p−5 p−3 p−3
01 00 10 0 +2 p−3 p−1 p−3 p−5 p−1 p−3 p−5 p−3
00 10 10 0 +2 p−5 p−3 p−3 p−5 p−1 p−3 p−3 p−1
10 01 00 0 +2 p−3 p−3 p−5 p−1 p−5 p−1 p−3 p−3
10 00 01 0 +2 p−3 p−5 p−3 p−1 p−5 p−3 p−1 p−3
00 01 01 0 +2 p−1 p−1 p−3 p−1 p−5 p−3 p−3 p−5
01 10 00 0 +2 p−3 p−3 p−1 p−5 p−1 p−5 p−3 p−3
01 00 01 0 +2 p−1 p−3 p−1 p−3 p−3 p−5 p−3 p−5
00 10 01 0 +2 p−3 p−5 p−1 p−3 p−3 p−5 p−1 p−3
01 01 00 0 +2 p−1 p−1 p−3 p−3 p−3 p−3 p−5 p−5
01 00 10 0 +2 p−3 p−1 p−3 p−5 p−1 p−3 p−5 p−3
00 01 10 0 +2 p−3 p−1 p−5 p−3 p−3 p−1 p−5 p−3
10 10 11 1 +1 p−6 p−6 p−4 p−4 p−4 p−4 p−2 p−2
10 11 01 1 +1 p−4 p−6 p−4 p−2 p−6 p−4 p−2 p−4
11 10 01 1 +1 p−4 p−6 p−2 p−4 p−4 p−6 p−2 p−4
10 01 11 1 +1 p−4 p−4 p−6 p−2 p−6 p−2 p−4 p−4
10 11 10 1 +1 p−6 p−4 p−6 p−4 p−4 p−2 p−4 p−2
11 01 10 1 +1 p−4 p−2 p−6 p−4 p−4 p−2 p−6 p−4
01 10 11 1 +1 p−4 p−4 p−2 p−6 p−2 p−6 p−4 p−4
01 11 10 1 +1 p−4 p−2 p−4 p−6 p−2 p−4 p−6 p−4
11 10 10 1 +1 p−6 p−4 p−4 p−6 p−2 p−4 p−4 p−2
10 01 11 1 +1 p−4 p−4 p−6 p−2 p−6 p−2 p−4 p−4
10 11 01 1 +1 p−4 p−6 p−4 p−2 p−6 p−4 p−2 p−4
11 01 01 1 +1 p−2 p−4 p−4 p−2 p−6 p−4 p−4 p−6
01 10 11 1 +1 p−4 p−4 p−2 p−6 p−2 p−6 p−4 p−4
01 11 01 1 +1 p−2 p−4 p−2 p−4 p−4 p−6 p−4 p−6
11 10 01 1 +1 p−4 p−6 p−2 p−4 p−4 p−6 p−2 p−4
01 01 11 1 +1 p−2 p−2 p−4 p−4 p−4 p−4 p−6 p−6
01 11 10 1 +1 p−4 p−2 p−4 p−6 p−2 p−4 p−6 p−4
11 01 10 1 +1 p−4 p−2 p−6 p−4 p−4 p−2 p−6 p−4
01 01 11 1 +1 p−2 p−2 p−4 p−4 p−4 p−4 p−6 p−6
01 11 01 1 +1 p−2 p−4 p−2 p−4 p−4 p−6 p−4 p−6
11 01 01 1 +1 p−2 p−4 p−4 p−2 p−6 p−4 p−4 p−6
01 01 01 0 +1 p−2 p−3 p−3 p−1 p−5 p−5 p−5 p−7
10 10 10 1 +2 p−6 p−4 p−4 p−4 p−2 p−2 p−2 p
TABLE 4.2: Constructing the 50 sequences for a clause. See text for expla-
nation.
where s(Φ) is the number of satisfying assignments of Φ. Since all the calcu-
lations in the reduction can be done in polynomial time, we get the following
theorem.
Theorem 53. The counting problem #SPS-STAR is #P-complete.
where M(G) is the set of the (not necessarily perfect) matchings of G. Later
on, we are going to introduce weighted graphs where the weight function maps
to the multivariate polynomial ring Z[x1 , x2 , . . . , xk ]. In such cases, the sum
in Equation (4.56) will be called the matching polynomial.
Clearly, #W-Matching is a #P-complete problem, since #PerfectMatching reduces to it by choosing the weight function to be the constant 0 function. Indeed, then only the perfect matchings contribute to the sum in Equation (4.56), and each of them contributes 1 (recall that the empty product is defined as 1). It is also clear that the #W-Matching problem remains #P-complete if only weights 1 and 0 are used.
First we show that #Pl-W-Matching is also #P-complete by reducing #W-Matching to it. Let G = (V, E) be a simple graph with weights $w : V \to \mathbb{Z}$. We can draw G in the plane in polynomial time such that there are no points where more than 2 edges cross each other. The number of crossings is $O(n^4)$, where $n = |V|$. We are going to replace each crossing with a constant size planar gadget such that for the so-obtained planar graph $G' = (V', E')$ with weights $w' : V' \to \mathbb{Z}$, the equality
\[ \sum_{M' \in \mathcal{M}(G')} \prod_{v' \notin M'} w'(v') = 8^c \sum_{M \in \mathcal{M}(G)} \prod_{v \notin M} w(v) \tag{4.57} \]
holds, where c denotes the number of crossings.
[Figure: the constant size planar gadget replacing a crossing of two edges with endpoints x, y, z and w; the internal vertices carry weights −1, 0 and 1.]
To see this, observe that $a_i$ is the sum of the weighted matchings over those matchings that avoid exactly i vertices having weight x in G. Any such matching can be extended at each avoided vertex in j ways by adding an edge, and in one way by not adding any edge. Thus the sum of the weighted matchings in $G_j$ is the matching polynomial of G evaluated at $x = j + 1$. Evaluating $MP(G)$ at $k+1$ different points, we can compute its coefficients in polynomial time. In particular, we can compute $a_0$, that is, the sum of the weighted matchings of G.
A similar reduction can be done in the second step. Let G be a planar graph with vertex weights from the set {−1, 1}. Replace each −1 with x. Then the sum of the weighted matchings is the matching polynomial
\[ MP(G) := \sum_{i=0}^{k} a_i x^i \tag{4.65} \]
[Figure: the gadget replacing a vertex v, with internal vertices $v_0$, $v_1$, $v_2$, $v_3$ and external vertices $e_{v_1}$, $e_{v_2}$, $e_{v_3}$.]
vertices. The edges of G′ corresponding to the edges of G are called external edges. These edges connect the external vertices of the gadgets.
We ask for the number of trees in G′ with exactly $k = 4n-3$ edges containing the set $S = \{v_0 \mid v \in V\}$. We show that this is exactly four times the number of Hamiltonian paths in G. Let p be a Hamiltonian path in G, and consider the corresponding external edges in G′. This set can be extended to a tree containing k edges and the set S in exactly 4 different ways. For each pair of adjacent edges in p, there is a unique way to connect $e_i$ and $e_{i+1}$ with 3 edges, $(e_{v_i}, v_i)$, $(v_i, v_0)$, and $(v_i, e_{v_{i+1}})$, involving vertex $v_0$. (The indexes are modulo 3.) At the two ends of the Hamiltonian path, which correspond to vertices s and t in G, there are 2 possible ways each to connect $s_0$ and $t_0$ to the tree in G′ using exactly 2 edges. It is easy to see that the number of edges is indeed $4n-3$: there are $n-1$ external edges, and there are 3 internal edges in each gadget except at the two ends of the Hamiltonian path, where the number of internal edges is 2. Thus, the number of edges is $n - 1 + 3(n-2) + 4 = 4n - 3$.
We are going to show that these are the only subtrees with k edges covering the set S, by proving that any minimal subtree covering S contains exactly k edges if and only if its external edges form a Hamiltonian path in G. Let T be a minimal subtree covering S. Then for each gadget, the number of external vertices in the subtree is either 1, 2 or 3. It is easy to show that the number of internal edges for these three cases is 2, 3 or 5, respectively. The external edges form a spanning tree in G; thus, the number of external edges is $n-1$, and the sum of the numbers of external vertices in the gadgets is $2n-2$. Then the number of edges is k only if the external edges form a Hamiltonian path in G. Indeed, if there are m gadgets with 3 external vertices, then there are $m+2$ gadgets with 1 external vertex and $n-2m-2$ gadgets with 2 external vertices. Then the total number of edges is $(n-1) + 5m + 2(m+2) + 3(n-2m-2) = 4n - 3 + m$, which equals k if and only if $m = 0$.
Problem 11.
Name: #EulerianOrientation.
Input: an Eulerian graph G = (V, E), with possible parallel edges.
Output: the number of Eulerian orientations of G, that is, the number of orientations of the edges such that for every vertex $v \in V$, the number of incoming and outgoing edges of v is the same.
Milena Mihail and Peter Winkler showed that counting the Eulerian orientations of a graph is #P-complete [129]. They reduce the number of perfect matchings to the number of Eulerian orientations. The reduction is the following. Let G = (U, V, E) be a bipartite graph. Since we are interested in the number of perfect matchings in G, we can assume, without loss of generality, that each degree in G is greater than 1. Let n denote $|U| = |V|$, let m denote $\sum_{u \in U} d(u) = \sum_{v \in V} d(v)$, and let m′ be $m - 2n$. We construct the following graphs. Let G′ be the graph that amends G by adding two vertices s and t. Each $u \in U$ is connected to s with $d(u)-2$ parallel edges, and each $v \in V$ is connected to t with $d(v)-2$ parallel edges. Finally, let $G_k$ denote the graph that amends G′ by connecting s and t with k parallel edges.
It is clear that in all Eulerian orientations of $G_{m'}$ in which all edges between s and t are oriented from t to s, all edges connecting U with s must be oriented toward U, and all edges connecting V to t must be oriented toward t. It follows that all edges between U and V are oriented from U to V except exactly one edge for each $u \in U$ and each $v \in V$. These exceptional edges form a perfect matching between U and V. It is also easy to see that there is a bijection between these Eulerian orientations of $G_{m'}$ and the perfect matchings in G.
Let $R_j$ denote the set of (not necessarily balanced) orientations of G′ in which each vertex in U and V is balanced, exactly j of the m′ edges connecting s with U are oriented toward U, and similarly, exactly j of the m′ edges connecting t with V are oriented toward t. It is clear that $|R_{m'}|$ is the number of perfect matchings in G. Observe that due to symmetry,
\[ |R_j| = |R_{m'-j}|, \tag{4.71} \]
since there is a bijection between $R_j$ and $R_{m'-j}$ obtained by reversing the orientation of each edge.
Now, let $P(G_k)$ denote the number of Eulerian orientations of $G_k$. Then we have that the equation
\[ P(G_k) = \sum_{i=0}^{k} \binom{k}{i} \left| R_{\frac{m'+k}{2} - i} \right| \tag{4.72} \]
holds. Indeed, if there are i edges oriented from s to t, then there are $k-i$ edges oriented from t to s. To get an Eulerian orientation, there must be $\frac{m'+k}{2} - i$ edges oriented from s to U, and also, there must be the same number
holds for Y = X′ but does not hold for any Y ⊂ X′. The proof is based on reducing the number of perfect matchings to this problem using polynomial interpolation.
– Number of minimal vertex covers. The input is a simple graph G = (V, E), and the output is the number of subsets $V' \subseteq V$ such that V′ covers the edge set and is minimal. That is, the implication
\[ (u, v) \in E \Rightarrow u \in A \vee v \in A \tag{4.74} \]
holds for A = V′ but does not hold for any A ⊂ V′. The proof is
\[ u, v \in A \Rightarrow (u, v) \in E \tag{4.75} \]
holds for A = V′ but does not hold for any A ⊃ V′. The proof is based on the fact that the vertex set of any maximal clique is the complement of a minimal vertex cover in the complement graph and vice versa.
– Directed trees in a directed graph. The input is a directed graph $\vec{G} = (V, E)$, and the output is the number of subsets $E' \subseteq E$ that form a rooted tree. Compare this with the fact that the number of directed spanning trees rooted at a given vertex is easy to compute! The proof is based on a series of reductions of intermediate problems using polynomial interpolation.
– Number of s−t paths. The input is a directed graph $\vec{G} = (V, E)$ and two vertices $s, t \in V$, and the output is the number of paths from s to t. The proof is similar to proving the #P-completeness of counting cycles in a directed graph. Moreover, the number of s−t paths cannot be approximated with an FPRAS unless RP = NP.
• J. Scott Provan and Michael O. Ball proved [144] that the following problems are #P-complete:
– Counting the independent sets in a bipartite graph.
– Counting vertex covers in a bipartite graph.
– Counting antichains in a partial order.
– Counting the minimum cardinality s−t cuts. The input is a directed graph $\vec{G} = (V, E)$ and two vertices $s, t \in V$, and the output is the number of subsets $V' \subseteq V \setminus \{s, t\}$ such that the size of V′ is minimal and there is no path from s to t in $\vec{G} \setminus V'$.
– Computing the probability that a graph is connected.
• Nathan Linial also gave a list of hard counting problems in geometry
and combinatorics [118]. His list is the following:
– Number of vertices of a polytope. The input is a set of linear inequalities in the form $Ax \leq b$. Each linear inequality defines a half space, and the intersection of them defines a polytope $P \subset \mathbb{R}^n$. The output is the number of vertices of P. The proof is based on reducing the number of antichains to this problem using a bijection between the sets of solutions.
• Leslie Ann Goldberg and Mark Jerrum showed that counting the non-isomorphic subtrees of a tree is #P-complete. The proof is based on reducing the number of matchings in bipartite graphs to this problem using polynomial interpolation [78].
• We learned that counting the Eulerian circuits in an Eulerian directed graph is easy. The problem becomes hard for undirected graphs. The proof was given by Graham Brightwell and Peter Winkler [27]. The proof is based on reducing the number of Eulerian orientations to the number of Eulerian circuits using modulo prime number calculations. Patric John Creed proved #P-completeness for planar graphs [48], and Qi Ge and Daniel Štefankovič proved that counting Eulerian circuits remains #P-complete for 4-regular planar graphs [75].
4.4 Exercises
1. Generate a 3CNF Φ for each of these logical expressions such that Φ has the same number of satisfying assignments as the logical expression.
10. ◦ The Steiner tree problem is to find the smallest subtree of a graph
G = (V, E) that covers a subset of vertices S ⊆ V . Show that the
problem is NP-hard even if the problem is restricted to planar, 3-regular
graphs.
11. ◦ Prove that counting the Eulerian orientations remains #P-complete even if the problem is restricted to simple Eulerian graphs.
12. Prove that if V′ is a minimal vertex cover in G = (V, E), then $V \setminus V'$ forms a maximal clique in $\overline{G}$, the complement of G.
13. * Prove the #P-completeness of the number of 3-colorings of a planar
graph by reducing the number of independent sets of a planar graph to
it.
14. The Erdős-Gallai theorem is the following. Let $D = d_1 \geq d_2 \geq \ldots \geq d_n$ be a degree sequence. Then D is graphical (has at least one simple graph G whose degrees are exactly D) if and only if the sum of the degrees is even and for all $k = 1, 2, \ldots, n$,
\[ \sum_{i=1}^{k} d_i \leq k(k-1) + \sum_{j=k+1}^{n} \min\{k, d_j\}. \tag{4.77} \]
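The test in Equation (4.77) transcribes directly into a few lines of Python; the sketch below is ours, added for illustration, and assumes the input is already sorted in non-increasing order:

def is_graphical(degrees):
    """Erdos-Gallai test; degrees must be sorted in non-increasing order."""
    n = len(degrees)
    if sum(degrees) % 2 != 0:
        return False
    for k in range(1, n + 1):
        lhs = sum(degrees[:k])
        rhs = k * (k - 1) + sum(min(k, d) for d in degrees[k:])
        if lhs > rhs:
            return False
    return True

assert is_graphical([3, 3, 3, 3])       # realized by K4
assert not is_graphical([3, 3, 1, 1])   # fails the k = 2 inequality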
4.5 Solutions
Exercise 1.
(a) The Boolean circuit of the function is
[Figure: a Boolean circuit of ∧ gates over the inputs $x_1$, $x_2$, $x_3$, $x_4$, one of them negated.]
Introduce new variables for the internal nodes of the circuit. The 3CNF
we are looking for is:
Let A be the adjacency matrix of the bipartite graph G. Then per(A) is the number of perfect matchings, and it is congruent to det(A) modulo 2. Since the determinant can be calculated in polynomial time, it can be decided in polynomial time whether G contains an even or an odd number of perfect matchings.
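A small Python sketch (ours) illustrating this solution: the permanent and the determinant differ only in the signs of the expansion terms, which vanish modulo 2; the permanent below is computed by brute force just for checking, while the determinant over GF(2) is computed by Gaussian elimination in polynomial time.

from itertools import permutations
from math import prod

def permanent(a):
    """Brute-force permanent (exponential; for checking only)."""
    n = len(a)
    return sum(prod(a[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def det_mod2(a):
    """Determinant over GF(2) by Gaussian elimination."""
    a = [[x % 2 for x in row] for row in a]
    n = len(a)
    for col in range(n):
        pivot = next((r for r in range(col, n) if a[r][col]), None)
        if pivot is None:
            return 0
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            if a[r][col]:
                a[r] = [x ^ y for x, y in zip(a[r], a[col])]
    return 1

# Biadjacency matrix of a 3 x 3 bipartite graph; per(A) counts its perfect matchings.
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
assert permanent(A) % 2 == det_mod2(A)   # here per(A) = 2, so both sides are 0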
Exercise 7. Observe that monotone satisfiability is #P-complete, and in fact, restricted to #Mon-2SAT, the counting problem is still #P-complete. The complement of a monotone 2CNF can be expressed as a monotone 2DNF.
Exercise 8. Let G = (U, V, E) be a bipartite graph with n vertices on both
vertex classes. The 2CNF Φ that has as many satisfying assignments as the
number of matchings in G contains a logical variable xi,j for each (ui , vj ) ∈ E,
and it is defined as
\[ \Phi := \left( \bigwedge_{i=1}^{n} \bigwedge_{j_1 \neq j_2} \left( \overline{x_{i,j_1}} \vee \overline{x_{i,j_2}} \right) \right) \wedge \left( \bigwedge_{j=1}^{n} \bigwedge_{i_1 \neq i_2} \left( \overline{x_{i_1,j}} \vee \overline{x_{i_2,j}} \right) \right). \tag{4.78} \]
Indeed, the 2CNF forces there to be at most one chosen edge at each vertex. It should be clear that Φ in Equation (4.78) is a monotone function of the logical variables $y_{i,j} = \overline{x_{i,j}}$. Therefore, we get that #Mon-2SAT is #P-complete.
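A small Python check (ours, for illustration) of Equation (4.78) on $K_{2,2}$: the clauses forbid two chosen edges from sharing a vertex, so the satisfying assignments are exactly the matchings.

from itertools import combinations, product

def clauses_for(edges):
    """Clauses of Phi: (NOT x_e1 OR NOT x_e2) for every pair of edges
    sharing an endpoint (both kinds of sharing in Equation (4.78))."""
    return [(e1, e2) for e1, e2 in combinations(edges, 2)
            if e1[0] == e2[0] or e1[1] == e2[1]]

def count_satisfying(edges):
    cls = clauses_for(edges)
    count = 0
    for bits in product([False, True], repeat=len(edges)):
        val = dict(zip(edges, bits))
        if all(not val[e1] or not val[e2] for e1, e2 in cls):
            count += 1
    return count

def count_matchings(edges):
    """Brute-force matching count for comparison."""
    count = 0
    for bits in product([False, True], repeat=len(edges)):
        chosen = [e for e, b in zip(edges, bits) if b]
        if (len({u for u, _ in chosen}) == len(chosen)
                and len({v for _, v in chosen}) == len(chosen)):
            count += 1
    return count

edges = [(0, 0), (0, 1), (1, 0), (1, 1)]   # K_{2,2}
assert count_satisfying(edges) == count_matchings(edges) == 7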
Exercise 10. Let G be a planar, 3-regular graph. Replace each vertex with
the gadget in Figure 4.12. Let this graph be denoted by G0 , and let S be the
set of vertices containing the v0 vertices of the gadgets. Show that from the
solution of the Steiner tree problem of G0 and S, it could be decided if G
contains a Hamiltonian path.
Exercise 11. Replace each edge with a path of length 3 (containing two
internal vertices).
Only the edges in the matchgates have non-trivial weights, and the weight
of any edge in C is 1. The weights come from an arbitrary field, or even more
generally, from an arbitrary commutative ring. Let G denote the weighted
graph. We define
\[ Z(G) := \sum_{M \in PM(G)} \prod_{e \in M} w(e), \tag{5.1} \]
where P M (G) denotes the set of perfect matchings, and w is the weight func-
tion on the edges. Recall that computing this number, that is, the sum of
weights of perfect matchings in a planar graph, needs only a polynomial num-
ber of ring operations for any commutative ring, see Theorems 41 and 31.
We define the standard signature of a matchgrid. Let X be a (generator
or recognizer) matchgate. Let ei1 , ei2 , . . . , eil be the edges that are incident to
the input or output vertices of X. For any subset F of these edges, let X \ F
denote the graph obtained from X in which the vertices incident to any edge
in F are omitted. We define the index of F as
\[ \mathrm{ind}(F) := 1 + \sum_{e_{i_j} \in F} 2^{j-1}. \]
Observe that we can look at Z(G) as a tensor of the standard signature vectors
of the matchgates. Indeed, it is easy to see that Z(G) is a linear function for
each standard signature vector. The holographic reduction is a base change of
this tensor.
For readers not familiar with tensor algebra, we derive a linear algebraic
description which is more tedious, but does not need background knowledge
of base changes in tensor algebra. Observe that the right-hand side of Equa-
tion (5.2) is the dot product of two vectors of dimension 2|C| . We can explicitly
define these vectors in the following way. We are looking for the vectors in the
forms
g := (g1 , g2 , . . . , g2|C| )
and
rT := (r1 , r2 , . . . , r2|C| ).
That is, we are looking for a row vector g and a column vector r. We give
an indexing of the edges in C in such a way that first we consider the edges
of the last generator matchgate, then the edges of the next to last generator
matchgate, etc. This implies an indexing of the subsets of C. Any subset F
of C has a membership vector s = (s1 , s2 , . . . s|C| ), in which sm is 1 if the
member of C with index m participates in F and otherwise 0. Then the index
of F is defined as
|C|
X
1+ sm 2m−1 .
m=1
and Y
rk := Z(Yj \ F ). (5.4)
Yj ∈Y
\[ v \otimes w = (v_1 w_1, v_1 w_2, \ldots, v_1 w_m, v_2 w_1, \ldots, v_2 w_m, \ldots, v_n w_m). \]
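For readers who want to experiment, the tensor product above is the Kronecker product of vectors; the following minimal numpy sketch (ours, not from the book) checks it against the generator computation appearing in the #X-matchings example below:

import numpy as np

g1 = np.array([-1, 0, 0, 1])
g = np.kron(g1, g1)   # coordinates ordered (v1*w1, v1*w2, ..., vn*wm)
assert list(g) == [1, 0, 0, -1, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 1]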
Then we define
\[ ValG(X_i, b_m) := w^i_m. \tag{5.11} \]
For these values the following theorem holds.
Theorem 58. Let
\[ k = 1 + \sum_{m=1}^{|C|} s_m 2^{m-1}. \]

\[ g = u_1 \otimes u_2 \otimes \ldots \otimes u_{|\mathcal{X}|}. \]

\[ u_i = w^i B^i. \]

\[ B = B^1 \otimes B^2 \otimes \ldots \otimes B^{|\mathcal{X}|}. \]
Then the remainder of the proof is simply applying the basic properties
of the tensor product. Indeed,
Observe that this is exactly our claim, that is, the kth coordinate of g′ is the appropriate product of the $w^i_{m_i}$ terms.
[Figure 5.1: an edge-weighted bipartite graph with vertex classes $V_1$ and $V_2$ and edge weights 5, −1, 3 and −4.]
5.2 Examples
5.2.1 #X-matchings
Recall that counting the not necessarily perfect matchings even in planar
graphs is #P-complete, see Subsection 4.2.4. On the other hand, the following
problem is in FP.
Problem 12.
Name: #X-matchings.
Input: a planar bipartite graph $G = (V_1, V_2, E)$, where each vertex in $V_1$ has degree 2, and a weight function $w : E \to \mathbb{R}$.
Output: the sum of the weights of the matchings, where the weight of a matching is the product of the weights of the edges participating in the matching, multiplied, for each vertex in $V_2$ not covered by any edge of the matching, by −1 times the sum of the weights of the edges incident to that vertex.
We give an illustrative example in Figure 5.1. There are two vertices in the
class V1 and also two vertices in V2 . There are two perfect matchings with the
scores −3 and −20. There are five imperfect matchings, the empty matching
and the matchings containing only one edge. The empty matching has the
score (−(−1 + 5))(−(−4 + 3)) = −4. The other four matchings have weights
5, −1, −12 and +16. Thus the value to be computed for this problem instance
is −3 − 20 − 4 + 5 − 1 − 12 + 16 = −19.
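The example can be verified by brute force. The Python sketch below is ours: it enumerates all matchings and applies the scoring rule of Problem 12, with the edge weights read off Figure 5.1 (X1-Y1 = −1, X1-Y2 = −4, X2-Y1 = 5, X2-Y2 = 3).

from itertools import combinations

edges = {('X1', 'Y1'): -1, ('X1', 'Y2'): -4, ('X2', 'Y1'): 5, ('X2', 'Y2'): 3}
V2 = ['Y1', 'Y2']

def is_matching(m):
    ends = [v for e in m for v in e]
    return len(ends) == len(set(ends))

total = 0
for k in range(len(edges) + 1):
    for m in combinations(edges, k):
        if is_matching(m):
            score = 1
            for e in m:                         # product of edge weights
                score *= edges[e]
            for v in V2:                        # uncovered V2 vertices
                if all(v not in e for e in m):
                    score *= -sum(w for e, w in edges.items() if v in e)
            total += score
print(total)   # -19, matching the value computed in the text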
The holographic algorithm transforms any edge-weighted, bipartite, planar
graph G into a matchgrid. In the matchgrid, edges of G are represented with
the edges in C. Each vertex in V1 is replaced with a generator matchgate.
Each vertex in V2 is replaced with a recognizer matchgate. The weights for
the edges in G will appear in the recognizer matchgates. The matchgates are
\[ n \otimes n \otimes \ldots \otimes n \]
The following two observations are also easy to see. If z is the all-0 vector except that its value is 1 in position i, then
\[ ValR(Y, z) = w_i, \]
and if z contains more than one 1, then
\[ ValR(Y, z) = 0. \]
These are exactly the values that we would like to get. Indeed, if a vertex v is not covered by an edge in the matching, then its contribution to the score of the matching is $-\sum_i w_i$. If it is incident to an edge in the matching, and the edge has weight $w_i$, then its contribution is $w_i$. We do not consider configurations where v is incident to more than one edge in the matching.
We get that our example problem instance can be solved with the match-
grid in Figure 5.2. As an edge-weighted graph G, it is an even cycle, thus
FIGURE 5.2: The matchgrid solving the #X-matching problem for the graph in Figure 5.1, with generators $X_1$, $X_2$, recognizers $Y_1$, $Y_2$, and connecting edges $e_1, \ldots, e_4$. The edges labeled by $e_i$ belong to the set of edges C, and are considered to have weight 1. See the text for details.
it has two perfect matchings with weights −15 and −4. That is, its parti-
tion function Z(G) is −19, just the solution of the #X-matchings problem
instance. Observe that the relationship between the 7 matchings in the #X-
matching problem and the 2 perfect matchings of the underlying matchgrid
can no longer be identified.
To give a detailed computation of the holographic reduction, we also compute g, r, g′ and Br for this particular problem instance. Both generators have standard signature (−1, 0, 0, 1). Vector g is their tensor product, that is,
g = (−1, 0, 0, 1) ⊗ (−1, 0, 0, 1) =
(1, 0, 0, −1, 0, 0, 0, 0, 0, 0, 0, 0, −1, 0, 0, 1).
The recognizer Y1 has standard signature (0, 3, −4, 0), the recognizer Y2 has
standard signature (0, 5, −1, 0). However, edges e1 and e4 are incident to the
input vertices of Y1 and edges e2 and e3 are incident to the input vertices of
Y2 . Therefore, the coordinates of the tensor product of the standard signatures
have to be permuted to follow the order of different subsets of C. Thus, the
recognizer vector is
The scalar product gr is indeed −19. Matrix B representing the base trans-
formation is:
1 −1 −1 1 −1 1 1 −1 −1 1 1 −1 1 −1 −1 1
−1 1 1 −1 1 −1 −1 1 0 0 0 0 0 0 0 0
−1 1 1 −1 0 0 0 0 1 −1 −1 1 0 0 0 0
1 −1 −1 1 0 0 0 0 0 0 0 0 0 0 0 0
−1 1 0 0 1 −1 0 0 1 −1 0 0 −1 1 0 0
1 −1 0 0 −1 1 0 0 0 0 0 0 0 0 0 0
1 −1 0 0 0 0 0 0 −1 1 0 0 0 0 0 0
1 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
−1 0 1 0 1 0 −1 0 1 0 −1 0 −1 0 1 0
1 0 −1 0 −1 0 1 0 0 0 0 0 0 0 0 0
1 0 −1 0 0 0 0 0 −1 0 1 0 0 0 0 0
−1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
1 0 0 0 −1 0 0 0 −1 0 0 0 1 0 0 0
−1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
−1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
It is easy to verify that

The signature of the generators in the new base is (1, 1, 1, 0). The vector g′ is their tensor product, that is,
\[ g' = (1, 1, 1, 0) \otimes (1, 1, 1, 0) = (1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0). \]
It is easy to see that g′B is indeed g. Finally, the non-zero terms in the scalar product of g′ and Br are
5.2.2 #Pl-3-(1,1)-Cyclechain
A cycle-chain cover of a graph G = (V, E) is a subgraph $G' = (V, E')$, $E' \subseteq E$, such that all components of G′ are cycles or paths. Let CC denote a cycle-chain cover, let c(CC) denote the number of cycles in CC, let p(CC) denote the number of paths in CC, and let $\mathcal{C}(G)$ denote the set of cycle-chain covers of G. Then the (x, y)-cycle-chain sum is defined as
\[ \sum_{CC \in \mathcal{C}(G)} x^{c(CC)} y^{p(CC)}. \]
[Figure: the arity 3 recognizer matchgate; its edges carry weights 1, 2, 2, −1 and −1/4.]
where the outer vertices are the 3 input vertices. It is easy to see that its standard signature is $\left(\frac{3}{4}, 0, 0, -\frac{1}{4}, 0, -\frac{1}{4}, -\frac{1}{4}, 0\right)^T$ and its signature in the given base, with coordinates indexed by $n \otimes n \otimes n$, $p \otimes n \otimes n$, $n \otimes p \otimes n$, $p \otimes p \otimes n$, $n \otimes n \otimes p$, $p \otimes n \otimes p$, $n \otimes p \otimes p$, $p \otimes p \otimes p$, is
\[ \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\ 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\ 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\ 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \end{pmatrix} \begin{pmatrix} 3/4 \\ 0 \\ 0 \\ -1/4 \\ 0 \\ -1/4 \\ -1/4 \\ 0 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 1 \\ 0 \end{pmatrix}. \]
[Figure: the matchgrid built on $K_4$, with generators $X_1, \ldots, X_6$ and recognizers $Y_1, \ldots, Y_4$; the highlighted configuration is one of those described in the text.]
or the other two of them symmetric to it. (We can say that they are related to the perfect matchings of $K_4$.) Each of them has a score $\frac{1}{4}$.
When exactly 3 of the generators have edges in C incident to both of their output vertices, they must be in this configuration:
[Figure: the matchgrid on $K_4$ with a configuration in which exactly 3 generators have both of their output edges in C, corresponding to a triangle of $K_4$.]
or the other 3 of them symmetric to this, in order to have perfect matchings. (We can say that they are related to the triangles in $K_4$.) In each of these 4 cases, there are 3 perfect matchings, each of them having a score $-\frac{1}{4}$.
There are no other subsets of the set C for which there are perfect match-
ings, since any generator matchgate must have either 0 or 2 edges in C incident
to its output vertices and any recognizer matchgate must have either 0 or 2
edges in C incident to its input vertices.
To summarize, there are 96 perfect matchings in the matchgrid, 84 having score $\frac{1}{4}$ and 12 having score $-\frac{1}{4}$. Thus, the partition function of the matchgrid is indeed $84 \times \frac{1}{4} + 12 \times \left(-\frac{1}{4}\right) = 18$, the number of cycle-chain covers of $K_4$.
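This count can also be verified by brute force. The sketch below is ours and uses the natural reading that a cycle-chain cover is a spanning subgraph in which every vertex has degree 1 or 2, so that the components are cycles and paths with at least one edge:

from itertools import combinations

vertices = range(4)
edges = [(u, v) for u in vertices for v in vertices if u < v]   # K4

count = 0
for k in range(len(edges) + 1):
    for sub in combinations(edges, k):
        degree = {v: 0 for v in vertices}
        for u, v in sub:
            degree[u] += 1
            degree[v] += 1
        if all(1 <= d <= 2 for d in degree.values()):
            count += 1
print(count)   # 18: 3 perfect matchings + 12 Hamiltonian paths + 3 four-cycles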
5.2.3 #Pl-3-NAE-ICE
The so-called "ice" problems are considered in statistical physics. They are orientation problems: the input is an unoriented graph G, and the solutions are assignments of a direction to each of its edges satisfying some constraint. We are interested in the number of such orientations. In his initial work, Linus Pauling proposed counting the orientations of a planar square lattice where each vertex has two incoming and two outgoing edges [141]. We know that counting the Eulerian orientations remains #P-complete even restricted to planar graphs [49]. On the other hand, the following problem can be solved in polynomial time using holographic reduction.
Problem 14.
Name: #Pl-3-NAE-ICE.
Input: a planar graph G with maximum degree 3.
Output: the number of orientations of the edges of G such that no vertex has
all incoming or all outgoing edges.
Here NAE stands for "not all equal". Let G be a planar graph with maximum degree 3. We would like to generate a matchgrid such that for its corresponding weighted graph G′, it holds that Z(G′) is the number of not-all-equal orientations of G. First, we add a vertex to the middle of each edge of G to get a planar bipartite graph. We would like to transform these new vertices into generator matchgates such that, in some base, they have signature (0, 1, 1, 0).
That is, the presence of one of the new edges in the set C will correspond to
an orientation of an edge in G.
We would like to replace each original vertex of G with a recognizer matchgate. If the original vertex has degree one, then the signature of the recognizer matchgate in the given base is (1, 1); for degree 2 vertices the signature must be (0, 1, 1, 0), and for degree 3 vertices it must be (0, 1, 1, 1, 1, 1, 1, 0).
We construct such matchgates in the base n = (1, 1), p = (1, −1). Then
A possible generator is a path on four vertices with edge weights 1, −2 and 2, where the first and the last vertices are the output vertices. Indeed, this graph has one perfect matching, with score 2. If only one of the output vertices is removed, then the number of vertices is odd; therefore, there is no perfect matching. Finally, if both output vertices are removed, it has one perfect matching, with score −2.
A possible recognizer matchgate representing a degree 1 vertex is a single edge with weight 1, where either of the two vertices is the only input vertex. Its standard signature is $(1, 0)^T$, and indeed
\[ \begin{pmatrix} n \\ p \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix} \begin{pmatrix} 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}. \]
A possible recognizer matchgate for a degree 2 vertex is a path on four vertices with edge weights 1, −0.5 and 0.5, where the first and the last vertices are the input vertices. Indeed, it has standard signature $(0.5, 0, 0, -0.5)^T$ and
\[ \begin{pmatrix} n \otimes n \\ p \otimes n \\ n \otimes p \\ p \otimes p \end{pmatrix} \begin{pmatrix} 0.5 \\ 0 \\ 0 \\ -0.5 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 0.5 \\ 0 \\ 0 \\ -0.5 \end{pmatrix} = \begin{pmatrix} 0 \\ 1 \\ 1 \\ 0 \end{pmatrix}. \]
FIGURE 5.4: The matchgrid solving the #Pl-3-NAE-ICE problem for the
problem instance in Figure 5.3. The edges belonging to the edge set C are
dotted. The recognizer matchgates are put into dashed circles.
triangle can be oriented in two different ways, and the edge incident to the degree 1 vertex can be oriented arbitrarily, independently of the orientation of the triangle.
The corresponding matchgrid that computes the number of orientations is in Figure 5.4. There are three perfect matchings of the arity 3 recognizer matchgate, and each of them can be extended in a unique way to a perfect matching of the entire graph. Each of these perfect matchings has score 1. There is no perfect matching of the graph containing the edge in C incident to the upper input vertex of the arity 3 matchgate. There is one perfect matching in which the bottom 2 input vertices of the arity 3 matchgate are incident to edges in C. This perfect matching also has score 1. Therefore the partition function of the weighted graph forming the matchgrid is indeed 4, which is the number of not-all-equal orientations of the graph in Figure 5.3.
Although in this example there are 4 orientations of the input graph and 4 perfect matchings of the matchgrid graph, there is no natural one-to-one correspondence between these solutions.
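The count of 4 can be reproduced by brute force. Two assumptions are made in the sketch below (ours), since Figure 5.3 is not reproduced here: the graph is taken, following the description above, to be a triangle {a, b, c} with a pendant edge (a, d), and, in line with the degree 1 recognizer signature (1, 1), degree 1 vertices are left unconstrained.

from itertools import product

edges = [('a', 'b'), ('b', 'c'), ('c', 'a'), ('a', 'd')]
vertices = {v for e in edges for v in e}
degree = {v: sum(v in e for e in edges) for v in vertices}

count = 0
for bits in product([0, 1], repeat=len(edges)):
    indeg = {v: 0 for v in vertices}
    for (u, v), b in zip(edges, bits):
        head = v if b == 0 else u          # the bit selects the edge direction
        indeg[head] += 1
    # NAE at every vertex of degree >= 2: not all incoming, not all outgoing.
    if all(0 < indeg[v] < degree[v] for v in vertices if degree[v] >= 2):
        count += 1
print(count)   # 4, as computed in the text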
5.2.4 #Pl-3-NAE-SAT
For any logical formula,
c1 ∧ c2 ∧ . . . ∧ ck
we can assign a bipartite graph G = (U, V, E), where U represents the clauses
c1 , c2 , . . . , ck , V represents the logical variables x1 , x2 , . . . , xn , and there is an
edge connecting ui with vj if xj participates in clause ci . A logical formula is
called planar if G is planar. A clause is a not-all-equal clause if it is TRUE
when there are two literals with different values. A not-all-equal formula is a
logical formula in which all clauses are not-all-equal clauses.
We know that Pl-3SAT (where the problem instances are planar 3CNFs) is an NP-complete decision problem, and #Pl-3SAT is #P-complete [96]. The existence problem of Pl-Mon-NAE-SAT (where the problem instances are planar, monotone not-all-equal formulae) is reducible to the Four Color Theorem, and therefore always has a solution [13]. However, counting the 4-colorings of a planar graph is #P-complete [179]. On the other hand, the following problem is solvable in polynomial time.
Problem 15.
Name: #Pl-3-NAE-SAT.
Input: a planar, not-all-equal formula Φ in which each clause has 2 or 3 literals.
Output: the number of satisfying assignments of Φ.
We are going to construct a matchgrid using again the base n = (1, 1), p = (1, −1), which has proved to be a very useful base in designing holographic reductions.
Let Φ be a planar not-all-equal formula, and let G = (U, V, E) be its corresponding planar graph. Each clause vertex in U will be represented with a recognizer matchgate. For clauses with 2 literals, the recognizer can be a path on four vertices with edge weights 1, 0.5 and 0.5, where the first and the last vertices are the input vertices. Indeed, it has standard signature $(0.5, 0, 0, 0.5)^T$ and
\[ \begin{pmatrix} n \otimes n \\ p \otimes n \\ n \otimes p \\ p \otimes p \end{pmatrix} \begin{pmatrix} 0.5 \\ 0 \\ 0 \\ 0.5 \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -1 & 1 & -1 \\ 1 & 1 & -1 & -1 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 0.5 \\ 0 \\ 0 \\ 0.5 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 1 \end{pmatrix}. \]
[Figure: the arity 3 recognizer matchgate, with edge weights 1/4 and 1/3.]
where the outer vertices are the 3 input vertices. It is easy to see that its standard signature is $\left(\frac{1}{4}, 0, 0, \frac{1}{4}, 0, \frac{1}{4}, \frac{1}{4}, 0\right)^T$ and its signature in the given base is
\[ \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\ 1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\ 1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\ 1 & -1 & 1 & -1 & 1 & -1 & 1 & -1 \\ 1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\ 1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\ 1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \end{pmatrix} \begin{pmatrix} 1/4 \\ 0 \\ 0 \\ 1/4 \\ 0 \\ 1/4 \\ 1/4 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{pmatrix}, \]
with the coordinates indexed by $n \otimes n \otimes n$, $p \otimes n \otimes n$, $n \otimes p \otimes n$, $p \otimes p \otimes n$, $n \otimes n \otimes p$, $p \otimes n \otimes p$, $n \otimes p \otimes p$, $p \otimes p \otimes p$.
5.2.5 #7 Pl-Rtw-Mon-3SAT
Our last example is one of the most curious problems solvable in polynomial time using holographic reduction. A logical formula is called read-twice if each variable appears in exactly 2 clauses. The Pl-Rtw-Mon-3SAT problem asks the satisfiability of a planar, read-twice, monotone 3CNF. It is trivially in P, since the all-TRUE assignment naturally satisfies it. Surprisingly, #Pl-Rtw-Mon-3SAT is still a #P-complete problem; furthermore, deciding if there is an even number of satisfying assignments is ⊕P-complete, and thus NP-hard [174] (problems in ⊕P ask the parity of the number of witnesses of problems in NP; the notation ⊕P is pronounced "parity-P"). On the other hand, the following problem can be solved in polynomial time.
Problem 16.
Name: #7 Pl-Rtw-Mon-3SAT.
Input: a planar, read twice, monotone 3CNF.
Output: the number of satisfying assignments modulo 7.
We would like to design a matchgate in which each clause is replaced with
an arity 3 recognizer matchgate with a signature
Br = (0, 1, 1, 1, 1, 1, 1, 1)T
in some base (modulo 7), and each variable is replaced with an arity 2 gener-
ator matchgate with a signature
g 0 = (1, 0, 0, 1)
in some base, also modulo 7. That is, in the matchgrid, each F ⊆ C having a
value 1 represents a satisfying assignment.
We will work in the base n = (5, 4), p = (1, 1), and all computations are
in the finite field F7 . Each clause is replaced with a recognizer matchgate
1 1
2
where the 3 vertices of the triangle are the input vertices. It is easy to see that
it has standard signature
r = (0, 2, 2, 0, 2, 0, 0, 2)T
238 Computational complexity of counting and sampling
5 3 1
where the first and the last vertices are the output vertices. It has standard
signature
g = (5, 0, 0, 3)
therefore, its signature in the given base is indeed g 0 = (1, 0, 0, 1) as
Its planar drawing can be obtained from K4 by adding a vertex in the middle
of each edge. The vertices of K4 are the clause vertices, and the additional
vertices are the variable vertices. The number of satisfying assignments of Φ
are those subgraphs of (the vertex-labeled) K4 that do not contain an isolated
vertex. It is easy to see that there are 42 such subgraphs of K4 , thus there
are 42 satisfying assignments of Φ. Indeed, these subgraphs of K4 are the
following:
respectively.
That is, the number of satisfying assignments of Φ is 0 modulo 7.
The perfect matchings in the matchgrid correspond to those subgraphs of
K4 in which every degree is either 1 or 3. Indeed, each generator matchgate
must have either 0 or 2 edges in C incident to its output edges to have a
perfect matching. The two cases correspond to the presence or absence of the
corresponding edge in K4 . Similarly, each recognizer matchgate must have ei-
ther 1 or 3 edges in C incident to its input edges to have a perfect matching.
These two cases correspond to having degree 1 or degree 3 on the correspond-
ing vertex of K4 . It is easy to see that each subgraph of K4 with the prescribed
constraint corresponds to exactly one prefect matching of the matchgrid, and
the score of that perfect matching is the product of the score of the perfect
matchings of the matchgates. The subgraphs with all degree 1 or 3 are the
following:
1. There are 4 perfect matchings in K4 . It is easy to see that each corre-
sponding perfect matching has score 26 32 54 (mod 7) = 4.
2. There are 4 star trees. Each of the corresponding perfect matchings has
a score 26 33 53 (mod 7) = 1.
3. The K4 itself. Its corresponding perfect matching has score
26 36 (mod 7) = 1.
Therefore, the partition function of the matchgrid is indeed
4 × 4 + 4 × 1 + 1 × 1 ( mod 7) = 0.
• Jin-Yi Cai, Pinyan Lu and Mingji Xia introduced the Fibonacci gates
[37]. Fibonacci gates have symmetric signatures [f0 , f1 , . . . , fk ] where
for each fi it holds that fi = fi−1 + fi−2 . The authors showed that the
Holant problems with Fibonacci gates are solvable for arbitrary graphs
in polynomial time not only for planar graphs.
• Mingji Xia, Peng Zhang and Wenbo Zhao used holographic reductions
to prove #P-completeness of several counting problems [184]. First they
used polynomial interpolation (see Subsection 4.2.4) to prove that count-
ing the vertex covers in planar, bipartite 3-regular graphs is in #P-
Holographic algorithms 241
5.4 Exercises
1. ◦ Show that for any positive integer k, there exists a planar, 3-regular
graph with 2k + 2 vertices.
2. * Professor F. Lake tells his class that matchgates having weights on the
edges in the set C have more computational power. Should they believe
him?
3. Show that for any row vectors u1 and u2 and matrices A1 and A2 , it
holds that
(u1 A1 ) ⊗ (u2 A2 ) = (u1 ⊗ u2 )(A1 ⊗ A2 ).
4. Show that for n = (−1, 1) and p = (1, 0), it indeed holds that
(1, 1, 1, 0) = n ⊗ n + n ⊗ p + p ⊗ n.
ω 2 = −ω − 1.
ω3 = 1
ω 2 = −ω − 1,
thus, it is possible to construct an arity 3 recognizer matchgate with
signature (0, 1, 1, 1, 1, 1, 1, 1) in base n = (1 + ω, 1 − ω) = (5, −3) =
(5, 4) (mod 7), p = (1, 1) over field F7 by simply copying the solution of
Exercise 9.
Holographic algorithms 243
5.5 Solutions
Exercise 1. Construct recursively an infinite series starting with K4 .
Exercise 2. No, he is not right. Any such matchgrid might be mimicked by
inserting a path of length 2 between the edge in C with weight w and (for
example) the output node of the incident recognizer matchgate. The path will
belong to the recognizer matchgate, and the two edges will have weights 1 and
w.
Exercise 6. Observe that the number to be computed is the score of the
X-matching when each weight is 1.
Exercise 7. The reduction is based on polynomial interpolation. Let G be a
planar, 3-regular graph, and consider the polynomial
3c
bX
n
fG (x) = ai xi
i=1
1 1
1
4
1
Exercise 10. It is trivial to check the inequalities. The weight 4 should be
replaced with 2 since 2 × 4 = 1 (mod 7).
Part II
Computational Complexity
of Sampling
245
Chapter 6
Methods of random generations
247
248 Computational complexity of counting and sampling
Sometimes the cumulative density function does not have an analytic, easy-
to-compute form, even its inverse. Interestingly, there is still a simple method
that generates normally distributed random variables first proved by George
Edward Pelham Box and Mervin Edgar Muller [22].
Theorem 60. Let u1 and u2 be uniformly distributed values on [0, 1]. Then
p
z1 := −2 ln(u1 ) cos(2πu2 ) (6.8)
and p
z2 := −2 ln(u1 ) sin(2πu2 ) (6.9)
are independent random variables, both of them following the standard normal
distribution.
In many cases, more sophisticated methods are needed, see also the next
section.
Theorem 61. The accepted samples in the rejection sampling follow the dis-
tribution π.
Proof. We apply the Bayes theorem on conditional probabilities. Let A denote
the event that a sample is generated. First, we calculate P (A). If X is a discrete
space, then the probability that A happens is the sum of the probability that
x is drawn multiplied by the probability that x is accepted, summing over all
members x ∈ X. That is,
X f (x) X n1 π(x) n1 X n1
P (A) = p(x) = p(x) = π(x) = . (6.14)
cg(x) cn2 p(x) cn2 cn2
x∈X x∈X x∈X
(For a continuous space, similar calculation holds, just the summation can
be replaced with integration.) Now we can use the Bayes theorem. For an
arbitrary x, the equation
f (x) n1 π(x)
P (A|x)P (x) cg(x) p(x) cn2 p(x) p(x)
P (x|A) = = n1 = n1 = π(x) (6.15)
P (A) cn2 cn2
If all proposals are rejected, then return with the last proposal. Then the
generated satisfying assignment follows the distribution
The inequality αdT V (U, p) < α comes from the fact that the total variation
distance between any two distributions is at most 1.
Since one rejection sampling step can be done in polynomial time, the
running time of the algorithm is clearly polynomial in both the size of the
problem and − log(ε), and thus, the described procedure is indeed an FPAUS.
where SΦ1 denotes the set of satisfying assignments of Φ1 and SΦ denotes the
set of satisfying assignments of Φ.
If f (x1 ) < 0.5, then let Φ1 be the DNF obtained from Φ such that all
clauses are removed in which x1 is a literal, and each x1 is removed from all
clauses in which x1 is a literal. The number of satisfying assignments of Φ1 is
the number of satisfying assignments of Φ in which x1 is FALSE. Particularly,
where SΦ1 denotes the set of satisfying assignments of Φ1 and SΦ denotes the
set of satisfying assignments of Φ.
Methods of random generations 253
Assume that we can sample only from the distribution p. We would like to
find function g satisfying that
Ep [g] = Eπ [f ] (6.29)
namely, the expected value of g under the distribution p is the expected num-
ber of f under the distribution p. It is easy to see that the following theorem
holds.
Theorem 62. If
π(x)
g(x) := f (x) (6.30)
p(x)
254 Computational complexity of counting and sampling
then
Ep [g] = Eπ [f ]. (6.31)
Proof. Indeed,
X X π(x) X
Ep [g] = g(x)p(x) = f (x)p(x) = f (x)π(x) = Eπ [f ]. (6.32)
p(x)
x∈X x∈X x∈X
Example 15. Consider the biased cube C with which throwing value k has
probability proportional to k. Work out the importance sampling method to
estimate the expected value of the unbiased cube using samples from the biased
cube C.
Solution. The sum of the possible values is 21, therefore if ξ denotes the
thrown value, then
k
P (ξ = k) = .
21
This is the sampling distribution p. The desired distribution π is the uniform
distribution, where each outcome has probability 16 . The function f assigns a
value k to the event “a value k has been thrown”. Then the g function we are
looking for is
1
π(k) 21
g(k) = k = k6 k = = 3.5.
p(k) 21
6
Namely, g is the constant 3.5 function, which is the expectation of the thrown
value on an unbiased die. What follows is that this importance sampling has
0 variance! Namely, one sample from the sampling distribution provides an
exact estimation of the expectation in the target distribution.
256 Computational complexity of counting and sampling
Therefore, the variance of the g function might be extremely big, making the
method computationally intractable. In many of those cases, Markov chain
Monte Carlo methods can help, see Section 6.6.
For any problem instance x, let θ denote the parameter for which the solution
of x is F (S(θ)). We further assume that for any θ0 ∈ B ∩ θ↓ , sampling from
the distribution
f (a)
π(a) := (6.39)
F (S(θ0 ))
can be done in polynomial time. Then it is possible to sample from S(θ) fol-
lowing the distribution
f (a)
π(a) := (6.40)
F (S(θ))
in polynomial time.
Proof. We exhibit a recursive algorithm that generates samples from the pre-
scribed distribution. Let i denote the indexes in the computation
X
F (S(θ)) = Ti (F (S(θi,1 , . . . , θi,mi ; θi,1 , . . . , θi,mi ). (6.41)
i
3. Return with ◦i m
j=1 aj .
i
mi
Y f (aj )
p(i) =
j=1
F (S(θ j ))
Qmi mi
cθ1 ,...,θmi j=1 F (S(θj )) Y f (aj )
=
F (S(θ)) j=1
F (S(θj ))
Qmi
cθ1 ,...,θmi j=1 f (aj )
= π(◦i m
j=1 aj ).
i
(6.42)
F (S(θj ))
Sampling from each πj can be done in the same way. Therefore, the follow-
ing recursive algorithm samples a random a from the prescribed distribution:
sampler(θ)
if θ ∈
/B
Generate a random i following the distribution
258 Computational complexity of counting and sampling
else
f (a)
return a following distribution F (S(θ)) .
and recursions
X
d(i, W ) = d(i − 1, W 0 )π(W 0 → xi W )
W 0 ∈N
Methods of random generations 259
X X
d(i, ε) = d(i − 1, W )π(W → xi ) + d(i, W )π(W → ε).
W ∈N W ∈N
d(i − 1, W )π(W → xi W 0 )
.
d(i, W 0 )
where m(T , r) is the multiplicity of the rewriting rule r in the parse tree T
generating X, and ∝ stands to “proportional to”.
Solution. Recall that the sum
X Y
π(r)m(T ,r)
T |T generates X r∈T
d(i, i, W ) = π(W → xi )
and recursion
X X X
d(i, j, W ) = d(i, k, W1 )d(k + 1, j, W2 )π(W → W1 W2 ).
i≥k<j W1 ∈N W2 ∈N
260 Computational complexity of counting and sampling
Phase II. The random tree is obtained by the following recursive function
calling it with parameters (1, n, S).
TreeSampler(i, j, W )
if i < j
Generate a random k, W1 and W2 following the distribution
Let T1 :=TreeSampler(i, k, W1 )
Let T2 :=TreeSampler(k + 1, j, W2 )
Generate a tree T by merging T∞ and T∈ with rule W → W1 W2
return T
else
return T = W → xi
z = “(v3 , v4 ), (v5 , v6 )”
“4 ), (v5 , v6 )”
trees in G−e in which two spanning trees are distinguished if they contain
different edges between w and x.
Also observe that the number of spanning trees in G that do not contain
edge e is the number of spanning trees in G\{e}. Therefore the spanning trees
of G can be described with the following count tree. Each vertex v is labeled
by a (possibly multi)graph G0 . If G0 has edges, then v has two children labeled
0
by G0−e and G0 \ {e0 }, where e0 is the smallest edge in G0 . Further, the two
edges connecting v to its children are labeled by 1 and 0.
Since the number of spanning trees of multigraphs can be computed in
polynomial time, this count tree can be used to uniformly generate random
spanning trees of G in polynomial time.
hold. T is also called transition probabilities. The Markov chain can be de-
~ = (V, E) is the Markov
scribed with a directed graph called a Markov graph. G
graph of the Markov chain M = (X, T ), if V represents the states of the
Markov chain, and there is an edge from vx to vy if the corresponding transi-
tion probability T (y|x) is greater than 0.
The Markov chain is irreducible if its Markov graph is strongly connected.
264 Computational complexity of counting and sampling
x1 = x0 T (6.46)
holds.
There are two types of convergence. The local convergence means conv-
erence from some given starting distribution. Global convergence means con-
vergence from an arbitrary starting distribution.
Definition 53. A Markov chain converges to distribution π from a distribu-
tion x0 if
lim dT V (x0 Tt , π) = 0. (6.49)
t→∞
x
| .{z
. . x} y . . . y .
| {z }
n n
Any Dyck word can be easily transformed into D0 . Indeed, let D be a Dyck
word, and let i be the smallest index such that D contains a y in position i.
If i = n + 1, then D = D0 , and we are ready. Otherwise there is a position
i0 such that the character in position i0 is x, and all characters in position
i, i + 1, . . . , i0 − 1 are y. Then we can swap characters in positions i0 − 1 and
i0 , then in positions i0 − 2 and i0 − 1, and so on, finally in positions i and
i + 1. Then now the smallest index where there is a y in the Dyck word is
i + 1. Therefore in a finite number of steps, the smallest index where there
is a y will be n + 1, and then we transform the Dyck word into D0 . Then
any Dyck word D1 can be transformed into D2 , since both D1 and D2 can
be transformed into D0 , and the reverse way of transformations from D0 to
D2 are also possible transformation steps in the Markov chain. Then we can
transform D1 into D0 and then D0 into D2 .
To see that the Markov chain is aperiodic, it is sufficient to show that there
are loops in the Markov graph. Indeed, any Dyck word ends with a character
y, therefore whenever the random i is 2n − 1, the Markov chain remains in the
same state. Therefore, in the Markov graph, there is a loop on each vertex,
266 Computational complexity of counting and sampling
that is, there are cycles of length 1. Then the greatest common divisor of the
cycle lengths is 1.
We are going to prove that the Markov chain is reversible with respect to
the uniform distribution. We only have to show that for any Dyck words D1
and D2 ,
T (D2 |D1 ) = T (D1 |D2 ) (6.50)
since in the uniform distribution
It turns out that essentially any Markov chain can be transformed into a
Markov chain that converges to a prescribed distribution π. The technique is
the Metropolis-Hastings algorithm described in the following theorem.
Theorem 66. Let M = (X, T ) be an irreducible and aperiodic Markov chain.
Furthermore, we require that for any x, y ∈ X, the property
π(y)T (x|y)
u≤ (6.53)
π(x)T (y|x)
and it is xt otherwise.
Methods of random generations 267
Proof. The algorithm generates a Markov chain, since the Markov property
holds, that is, the next state depends on only the current state and not on the
previous states. Let this defined Markov chain be denoted by M 0 . The ratio
in Equation (6.53) is positive, since the distribution π is non-vanishing and
T (x|y) cannot be 0 when y is proposed from x due to the required property
in Equation (6.52). What follows is that M and M 0 have the same Markov
graph. However, in that case, M 0 is irreducible and aperiodic, since M was
also irreducible and aperiodic. We have to show that M 0 is reversible with
respect to the distribution π. First we calculate the transition probabilities in
M 0 . The way to jump from state x to y in M 0 is to first propose y when the
current state is x. This has probability T (y|x). Then the proposed state y has
to be accepted. The acceptance probability is 1 if the ratio in Equation (6.53)
is greater than or equal to 1, and the acceptance probability is the ratio itself
if it is smaller than 1. Therefore, we get that
π(y)T (x|y)
T 0 (y|x) = T (y|x) min 1, . (6.54)
π(x)T (y|x)
6.7 Exercises
1. Generate a random variable following the normal distribution with mean
µ and variance σ 2 .
2. Generate a random variable following the Pareto distribution with pa-
rameters xm and α. The support of the Pareto distribution is [xm , ∞)
and its cumulative density function is
x α
m
1− .
x
holds.
13. Show that the total variation distance is indeed a distance.
14. * Develop a Markov chain Monte Carlo method that converges to the
uniform distribution of the possible (not necessarily perfect) matchings
of a graph.
15. Develop a Markov chain Monte Carlo method that converges to the
uniform distribution of the spanning trees of a graph.
16. ◦ Develop a Markov chain Monte Carlo method that converges to the
uniform distribution of permutations of length n.
Methods of random generations 269
6.8 Solutions
Exercise 3. Let v1 , v2 , . . . vn denote the vertices of the polygon. A triangula-
tion can be expressed with the list of edges participating in the triangulation.
We give a recursive function that generates a random triangulation of a
polygon defined with a set of edges. For this, we need to define the following
family of distributions. The domain of the distribution pi is {0, 1, . . . , i − 1}
and
Cj Ci−j
pi (j) := ,
Ci
Pi−1
where Ci is the ith Catalan number. (Since Ci = j=0 Cj Ci−j , this is indeed
a distribution). The recursive function is the following:
triangulator(V = {v1 , v2 , . . . , vn })
if |V | ≥ 4
Generate a random j from the distribution p|V |−3
E0 := ∅
if j 6= n
E0 := E0 ∪ {(v1 , vj )}
if j 6= 3
E0 := E0 ∪ {(v2 , vj )}
E1 :=triangulator(v1 , vj , vj+1 , . . . , vn )
E2 :=triangulator(v2 , v3 , . . . , vj )
return E0 ∪ E1 ∪ E2
else
return ∅.
Exercise 5. Generate a directed acyclic graph whose vertices are the entries
of the dynamic programming table computing the most similar alignment
between the two sequences, and there is an edge from d(i1 , j1 ) to d(i2 , j2 )
if d(i2 , j2 ) sends an optimal value to d(i1 , j1 ) in the dynamic programming
recursion (a similarity value or a gap penalty is added to d(i2 , j2 ) to get
d(i1 , j1 )). The optimal alignments are the paths from d(n, m) to d(0, 0), where
n and m are the length of the two sequences. Thus, the task is to sample
uniformly a path between two vertices in an acyclic graph.
Exercise 6. Let G = (V, E) be a planar graph. Fix an arbitrary total ordering
on E, and let (vi , vj ) be the smallest edge. Observe that the number of perfect
matchings containing (vi , vj ) is the number of perfect matchings in G\{vi , vj }
(that is, we remove vertices vi and vj from G, together with all the edges
incident to vi or vj ), and the number of perfect matchings not containing
(vi , vj ) is the number of perfect matchings in G \ {(vi , vj )} (that is, we remove
the edge (vi , vj ) from G).
270 Computational complexity of counting and sampling
g = ae−a(x−a)
and
−a2
e 2
c= √
2πaΦ(−a)
then
cg(x) ≥ f (x)
for all x ≥ a. Therefore, we can use g and f in a rejection sampling. When
the generated random number is x, the acceptance probability is
−x2
f (x) e 2 (x−a)2
= −a2 = e− 2 .
cg(x) e 2 e−a(x−a)
The expected acceptance probability is
Z ∞ r
(x−a)2 π a2 a
ae−a(x−a) e− 2 = ae 2 erf c √ ,
x=a 2 2
where erf c is the complementary error function of the normal distribution.
It can be shown that the expected acceptance probability grows strictly
monotonously with a, and it is ≈ 0.65567 when a = 1.
Exercise 12. Let B be the set of points for which π(x) − p(x) ≥ 0. Observe
the following.
1. It holds that
X X
(π(x) − p(x)) = max (π(x) − p(x)).
A⊂X
x∈B x∈A
Since for any x ∈ B, −(π(x) − p(x)) = |π(x) − p(x)|, and for any x ∈ B,
(π(x) − p(x)) = |π(x) − p(x)|, it holds that
X X X
|π(x) − p(x)| = 2 (π(x) − p(x)) = 2 max (π(x) − p(x)).
A⊂X
x x∈B x∈A
273
274 Computational complexity of counting and sampling
for some matrix W and diagonal, real matrix Λ. In that case, T can also be
diagonalized and has all real eigenvalues, since
1 1
T = Π 2 W ΛW −1 Π− 2 . (7.5)
where the vector 1i contains 0 in each coordinate except in the ith coordinate,
which is 1.
Theorem 68. For a Markov chain, it holds that
1 1 1
τi (ε) ≤ log + log (7.8)
1−ρ π(xi ) ε
and
ρ 1
max{τi (ε)} ≥ log . (7.9)
i 2(1 − ρ) 2ε
The proof of the first inequality can be found in [54], while the proof of the
second inequality can be found in [5]. Theorem 68 says that the relaxation time
is proportional to the inverse of the difference between the largest eigenvalue
(that is, 1) and the SLEM. The following theorem says that it is sufficient to
consider the second-largest eigenvalue of a Markov chain.
Indeed, this process is a Markov chain, since the series of random states
satisfies the Markov property: where we are going depends on the current state
and not where we came from. Its transition matrix is T +I2 , so we can apply
Theorem 69. The largest eigenvalue and its corresponding eigenvector do not
change, so the Markov chain still converges to the same distribution.
We are ready to state and prove the main theorem on the mixing time of
Markov chains and FPAUS algorithms.
(c) there is a random algorithm that for any solution y, draws an entry from
the conditional distribution T (·|y) in polynomial time.
Then #A is in FPAUS.
Proof. Let x be a problem instance of #A, and let ε > 0. Since #A is in
#P, there is a constant c > 1 and a polynomial poly1 such that the number
of solutions of x is less than or equal to cpoly1 (n) . Indeed, any witness can
be verified in polynomial time, and the witnesses are described using a fixed
alphabet (the alphabet depends on only the problem and not the problem
instance). To verify a solution, it must be read. What follows is that the
number of solutions cannot be more than |Σ|poly(n) , where Σ is the alphabet
used to describe the solutions and poly() is the natural or given polynomial
upper bound on the running time to verify a solution.
Having said these, it is easy to show that the following algorithm is an
FPAUS:
1. Construct a solution y of x.
2. Using y as the starting point of the Markov chain, do
2 1
poly1 (n) log(c) + log (7.13)
1 − λ2 ε
Indeed, the state that the algorithm returns follows a distribution that
satisfies Equation (1.31), since
2 1 1
τy (ε) ≤ log + log ≤
1 − λ2 π(y) ε
2 1
poly1 (n) log(c) + log . (7.14)
1 − λ2 ε
The first inequality comes from Theorem 68 and from Corollary 13. The second
1
inequality comes from the observation that π(y) is the size of the solution
space, since π is the uniform distribution, and we showed that cpoly1 (n) is an
upper bound of it. The running time of the algorithm is O(poly(n, − log(ε))),
since the initial state y can be constructed in polynomial time, there are
poly(n, − log(ε)) number of steps in the Markov chain, and each of them can
be performed in O(poly(n)) time.
This theorem justifies the following definition.
Definition 56. Let #A be a counting problem in #P. Let M be a class of
Markov chains, such that for each problem instance x of #A, it contains a
Markov chain converging to the uniform distribution of witnesses of x. Let Mx
denote this Markov chain, and let λ2,x denote its second-largest eigenvalue. We
say that M is rapidly mixing if
1
= O(poly(|x|)). (7.15)
1 − λ2,x
Φ2
1 − 2Φ ≤ λ2 ≤ 1 − . (7.20)
2
The proof can be found in [25], Chapter 6. The two inequalities in Equa-
tion (7.20) will be referred to as the left and right Cheeger’s inequality. This
theorem says that a Markov chain is rapidly mixing if and only if its conduc-
tance is large. We can prove the torpid mixing of a Markov chain by finding
a subset whose ergodic flow is negligible compared to its capacity.
Mixing of Markov chains and their applications 279
Example 21. Let M be a set of Markov chains; the Markov chains in it are
defined on the same state spaces as in Example 20. However, in this case, let
the jumping probabilities be the following:
1
P (vk+1,2j−1 |vk,j ) = P (vk+1,2j |vk,j ) = k = 0, . . . n − 1 (7.28)
4
1
P (vk−1,d j c |vk,j ) = k = 1, . . . n − 1 (7.29)
2 2
1
P (v0,1 |v0,1 ) = (7.30)
2
1
P (vn−1,d j c |vn,j ) = P (vn,j |vn,j ) = . (7.31)
2 2
Show that the Markov chain is rapidly mixing, that is, λ2,n converges to 1 only
polynomially quick.
1
Solution. Let Si := {xi,1 , xi,2 , . . . , xi,2i }. It is easy to see that π(Si ) = n+1 ,
π is the uniform distribution restricted to any Si , and the Markov chain is
reversible with respect to π. It follows that for any xi,j ,
1
π(xi,j ) = . (7.32)
(n + 1)2i
We can also observe the following. Let S be an arbitrary subset of vertices. S
can be decomposed into connected components, S = tk Ck . Then
F (S) F (Ck )
≥ min . (7.33)
π(S) k π(Ck )
Indeed, P
F (S) F (Ck )
= Pk (7.34)
π(S) k π(Ck )
therefore, it is sufficient to show that for any a1 , a2 , b1 , b2 > 0, the inequality
a1 + a2 a1 a2
≥ min , (7.35)
b1 + b2 b1 b2
a1 a2
holds. Without loss of generality, we can say that b1 ≤ b2 . Then we have to
show that
a1 + a2 a1
≥ . (7.36)
b1 + b2 b1
Rearranging this, we get that
a1 b1 + a2 b1 ≥ a1 b1 + a1 b2 . (7.37)
That is,
a2 b1 ≥ a1 b2 , (7.38)
a1 a2
which holds, since b1 ≤ b2 .
Mixing of Markov chains and their applications 281
Then
P
k π vi T vik ,jk vi
l m l m
jk jk
F (S) k −1, 2 k −1, 2
Φ= ≥ 1 ≥
π(S) 2
1 1
n4 1
1 = . (7.43)
2
2n
1
Therefore, the conductance is at least 2n . Putting this into the right Cheeger
inequality, we get that
Φ2 1
λ2,n ≤ 1 − ≤ 1 − 2. (7.44)
2 8n
That is, λ2,n tends to 1 only polynomially quick with n.
We are going to use the same example Markov chain to demonstrate other
proving methods. In many cases, the state space of a Markov chain has a
more complex structure than in this example. However, if the states can be
embedded into a convex body, then we might be able to prove rapid mixing,
since a convex body does not have a bottleneck. This might be stated formally
with the following theorem.
282 Computational complexity of counting and sampling
The proof can be found in [109]. Below we show an example of how to use
this theorem to prove rapid mixing. In Chapter 8, we will prove that #LE is
in FPAUS applying this theorem.
Example 22. Let M be a set of Markov chains which contains a Markov
chain Mn for each positive integer n. The state space of Mn includes the 0 − 1
vectors of length n. There is a transition between two states if they differ in
exactly one coordinate. Each transition probability is n1 . Prove that the Markov
chain is rapidly mixing.
Solution. It is easy to see that the Markov chain n converges to the uniform
distribution, and each state has probability 12 . We will denote this by π.
Let s = {s1 , s2 , . . . , sn } be a state in Mn . Assign a convex body to s defined
by the following inequalities:
si si 1
≤ xi ≤ + . (7.46)
2 2 2
It is easy to see that the unions of these convex bodies are hypercubes, and
their union is the unit hypercube. A transition of the Markov chain can happen
between two states whose hypercubes have a common surface which is an
n−1
n − 1-dimensional hypercube. The area of this surface is 12 . The volume
n √
of each small hypercube is 12 . The diameter of the unit hypercube is n.
Let S be the subset of the state space of the Markov chain that defines the
conductance. Let U denote the body corresponding to S and let W denote
the body corresponding to S. Let C denote the surface separating U and W .
Observe that C is a union of n − 1-dimensional hypercubes that corresponds
to the possible transitions between U and W . Then on one hand, we know
that
P
π(x)T (y|x) A(C) 1
n−1 π n
F (S) x∈S,y∈S ( 12 ) A(C)
Φ= = P = V (U ) = . (7.47)
π(S) x∈S π(S) 1 nπ
2nV (U )
(2)
A(C) 1
≥√ , (7.48)
V (U ) n
Mixing of Markov chains and their applications 283
Proof. The right-hand side of Equation (7.51) is symmetric due to the re-
versibility of the chain. Thus, if π(S) > 12 , then S and S can be switched. If
π(S) ≤ 12 , the inequality is simply a rearrangement of the Cheeger inequality
(the left inequality in Theorem 71.). Indeed,
P
x∈S,y∈S π(x)T (y|x)
1−2 ≤ 1 − 2Φ ≤ λ2 . (7.52)
π(S)
Rearranging the two ends of the inequality in Equation (7.52), we get the
inequality in Equation (7.51).
Now we are ready to state and prove a general theorem on rapidly mixing
Markov chains on factorized state spaces.
Theorem 73. Let M be a class of reversible, irreducible and aperiodic
Markov chains whose state space Y can be partitioned into disjoint classes
Y = ∪x∈X Yx by the elements of some set X. The problem size of a particular
chain is denoted by n. For notational convenience we also denote the element
y ∈ Yx via the pair (x, y) to indicate the partition it belongs to. Let T be the
transition matrix of M ∈ M, and let π denote the stationary distribution of
284 Computational complexity of counting and sampling
M. Moreover, let πX denote the marginal of π on the first coordinate that is,
πX (x) = π(Yx ) for all x. Also, for arbitrary but fixed x let us denote by πYx the
stationary probability distribution restricted to Yx , i.e., π(y)/π(Yx ), ∀y ∈ Yx .
Assume that the following properties hold:
i For all x, the transitions with x fixed form an aperiodic, irreducible and
reversible Markov chain denoted by Mx with stationary distribution πYx .
This Markov chain Mx has transition probabilities as Markov chain M
for all transitions fixing x, except loops, which have increased probabili-
ties such that the transition probabilities sum up to 1. All transitions that
would change x have 0 probabilities. Furthermore, this Markov chain is
rapidly mixing, i.e., for its second-largest eigenvalue λMx ,2 it holds that
1
≤ poly1 (n).
1 − λMx ,2
ii There exists a Markov chain M 0 with state space X and with transition
matrix T 0 which is aperiodic, irreducible and reversible w.r.t. πX , and
for all x1 , y1 , x2 it holds that
X
T ((x2 , y2 )|(x1 , y1 )) ≥ T 0 (x2 |x1 ). (7.53)
y2 ∈Yx2
1
≤ poly2 (n).
1 − λM 0 ,2
We are going to prove that the ergodic flow F (S) (see Equation (7.18)) from
any S ⊂ Y with 0 < π(S) ≤ 1/2 cannot be too small and therefore, neither
the conductance of the Markov chain will be small. We cut the state space
Mixing of Markov chains and their applications 285
into two parts Y = Y l ∪ Y u , namely the lower and upper parts using the
following definitions (see also Fig. 7.1): the partition X = L t U is defined as
√
π(Yx (S))
L := x∈X ≤ 1/ 2 ,
π(Yx )
√
π(Yx (S))
U := x∈X > 1/ 2 .
π(Yx )
Furthermore, we introduce:
[ [
Y l := Yx and Y u := Yx ,
x∈L x∈U
Yl Yu
Our plan is the following: the ergodic flow F (S) is positive on any non-empty
subset and it obeys:
π(Sl ) π(Su )
F (S) = F 0 (Sl ) + F 0 (Su ) ,
π(S) π(S)
where
1 X
F 0 (Sl ) := π(x)T (y|x)
π(Sl )
x∈Sl ,y∈S̄
and
1 X
F 0 (Su ) := π(x)T (y|x).
π(Su )
x∈Su ,y∈S̄
In other words, F 0 (Sl ) and F 0 (Su ) are defined as the flow going from Sl and
Su and leaving S.
The value F (S) cannot be too small, if at least one of F 0 (Sl ) or F 0 (Su ) is
big enough (and the associated fraction π(Sl )/π(S) or π(Su )/π(S)). In Case
1 we will show that F 0 (Sl ) itself is big enough. To that end it will be sufficient
to consider the part which leaves Sl but not Y l (this guarantees that it goes
out of S, see also Fig. 7.2). For Case 2 we will consider F 0 (Su ), particularly
that part of it which goes from Su to Y l \ Sl (and then going out of S, not
only Su , see also Fig. 7.3).
In Case 1, the flow going out from Sl within Y l is sufficient to prove that
the conditional flow going out from S is not negligible. We know that for
any particular x, we have a rapidly mixing Markov chain Mx over the second
coordinate y. Let their smallest conductance be denoted by ΦX . Since all these
Markov chains are rapidly mixing, we have that
1
max λMx ,2 ≤ 1 −
x poly1 (n)
or, equivalently:
1
ΦX ≥ .
2poly1 (n)
However, in the lower part, for any particular x one has:
π(Yx (S)) 1
πYx (Yx (S)) = ≤√ ,
π(Yx ) 2
Mixing of Markov chains and their applications 287
Yl Yu
Note that the flow on the right-hand side of Equation 7.56 is not only going
out from Sl but also from the entire S. Therefore, we have that
π(Sl ) 1 1
F (S) ≥ × 1− √ .
π(S) 2poly1 (n) 2
Either π(Sl ) ≤ π(Su ), which then yields
π(Sl ) π(Sl ) π(Sl ) 1 1
= ≥ ≥ √ 1− √
π(S) π(Sl ) + π(Su ) 2π(Su ) 8 2poly2 (n) 2
after using Equation 7.54, or π(Sl ) > π(Su ), in which case we have
π(Sl ) 1 1 1
> ≥ √ 1− √ .
π(S) 2 8 2poly2 (n) 2
(Note that poly2 (n) > 1.) Thus in both cases the following inequality holds:
1 1 1 1
F (S) ≥ √ 1− √ × 1− √ .
8 2poly2 (n) 2 2poly1 (n) 2
Yl Yu
1
πX (X(Su )) ≤ √
2
otherwise π(Su ) > 1/2 would happen (due to the definition of the upper part),
and then π(S) > 1/2, a contradiction.
Mixing of Markov chains and their applications 289
Hence in the Markov chain M 0 , based on the Lemma 14, we obtain for
X(Su ) that
1 1 X
min πX (X(Su )), 1 − √ ≤ πX (x)T 0 (x0 |x). (7.56)
2poly2 (n) 2 0 x ∈X(Su )
x∈X(Su )
For all y for which (x, y) ∈ Su , due to Equation (7.53), we can write:
X
T 0 (x0 |x) ≤ T ((x0 , y 0 )|(x, y)) .
y0
πX (x) √
Recall that πX (x) = π(Yx ), and thus π(Y x (S)) ≤ 2 for all x ∈ X(Su ).
Therefore we can write that
1 1
min πX (X(Su )), 1 − √ ≤
2poly2 (n) 2
√ X X
2 π((x, y))T ((x0 , y 0 )|(x, y)) .
(x,y)∈Su (x0 ,y 0 )|x0 ∈X(Su )
Note that π(Su ) ≤ πX (X(Su )) < 1, and since both items in the minimum
taken in the LHS are smaller than 1, their product will be smaller than any
of them. Therefore we have
1 1
√ π(Su ) 1 − √ ≤
2 2poly2 (n) 2
X X
≤ π((x, y))T ((x0 , y 0 )|(x, y)) .
(x,y)∈Su (x0 ,y 0 )|x0 ∈X(Su )
290 Computational complexity of counting and sampling
This flow is going out from Su , and it is so large that at most half of it
can be picked up by the lower part of S (due to reversibility and due to
Equation 7.55), and thus the remaining part, i.e., at least half of the flow,
must go out of S. Therefore:
π(Su ) 1 1
× √ 1− √ ≤ F (S) .
π(S) 4 2poly2 (n) 2
Theorem 74. Let Γ be a path system of a Markov chain. Then the inequality
1
Φ≥ (7.60)
ϑΓ
holds.
We can combine this theorem with the right Cheeger’s inequality to get
the following corollary:
Corollary 15. Let Γ be a path system of a Markov chain. Then the inequality
1
λ2 ≤ 1 − (7.61)
2ϑ2Γ
holds.
Example 23. Let M be the set of Markov chains as in Example 21. Prove
its rapid mixing using Theorem 74.
Mixing of Markov chains and their applications 293
Solution. Since the Markov graph is a tree, there is only one possible path
system: the path connecting the pair of vertices in the graph. We have to show
that for any edge e going from vi,j to vi−1,d j e ,
2
1 X
π(vx )π(vy ) (7.65)
Q(e) γ 3e
x,y
is polynomially bounded, and the same is true for the antiparallel edge e0
going from vi−1,d j e to vi,j .
2
Let S denote the set of vertices containing vi,j and the vertices below vi,j .
Then
1 X 1 X X
π(vx )π(vy ) = 1 π(vx ) π(vy ). (7.66)
Q(e) γ 3e π(vi,j ) 2 v ∈S
x,y x vy ∈S
We know that
1
π(vi,j ) ≥ π(S), (7.67)
n
therefore
1 X X X
1 π(vx ) π(vy ) ≤ 2n π(vy ) ≤ 2n. (7.68)
π(vi,j ) 2 v ∈S
x vy ∈S vy ∈S
where |γx,y | is the length of the path, that is, the number of edges in it. Then
the inequality
1
λ2 ≤ 1 − (7.74)
KΓ
holds.
This theorem is related to the maximal load. Indeed, the following corollary
holds.
Corollary 16. For any path system Γ, the inequality
1
λ2 ≤ 1 − (7.75)
l Γ ϑΓ
holds, where lΓ is the length of the longest path in the path system.
When lΓ < ϑΓ , then this inequality provides better bounds. Obviously, if
we would like to prove only polynomial mixing time, any of these theorems
might be applicable. On the other hand, when we try to get sharp estimates,
it is important to carefully select which theorem to use.
When the Markov graph has a complicated structure, or just the opposite,
a very symmetric structure, it might be hard to design a path system such that
none of the edges are overloaded. An overloaded edge might cause weak upper
bound on λ2 and thus fail to prove rapid mixing. In such cases, a random path
system might help. First we define the flow functions.
Definition 60. Let Πi,j denote the set of paths between vi and vj in a Markov
graph, and let Π = ∪i,j Πi,j . A multicommodity flow in G is a function f :
Π → R+ ∪ {0} satisfying
X
f (γ) = π(vi )π(vj ). (7.76)
γ∈Πi,j
holds.
Corollary 17. Let f be a multicommodity flow. Then the inequality
1
λ2 ≤ 1 − (7.80)
8ϑ2f
holds.
We give an example of when the canonical path method does not give a
good upper bound, however, the multicommodity flow approach will help find
a good upper bound on the second-largest eigenvalue.
Example 24. Consider the Markov chain whose state space contains the 2n-
long sequences containing n a characters and n b characters in an arbitrary
order. The transitions are defined with the following algorithm. We choose an
index i between 1 and 2n − 1 uniformly. If the characters in position i and
i + 1 are different, then we swap them, otherwise the Markov chain remains
in the current state. Prove that the mixing time grows only polynomially with
n.
Solution. One might think that the following canonical path system gives a
good upper bound on the second-largest eigenvalue.
Let x = x1 x2 . . . x2n and y = y1 y2 . . . y2n . Construct a path γxy in the
following way. If x1 6= y1 , find the smallest index i such that xi = y1 . “Bubble
down” xi , that is, swap xi−1 and xi , then xi−2 with xi−1 (which is the prior
xi ), etc. Then compare x2 and y2 ; if they are different, then find the smallest
i such that xi = y2 , bubble down xi , and so on.
However, this path does not provide a good upper bound. To see this,
consider the edge in the Markov graph going from state
aa . . . a} ba |bb {z
| {z . . . }b aa . . . a}
| {z
n/2-1 n-1 n/2
to state
aa . . . a} bb
| {z . . . }b aa
| {z . . . a} .
| {z
n/2 n n/2
How many γxy paths are going through this edge? The first state x of the path
γxy might be an arbitrary sequence that has a suffix of all a’s of length n/2.
There are n b characters and
n/2 a characters in its corresponding prefix, that
might be arranged in 1.5nn different ways. Similarly, the last state y of the
296 Computational complexity of counting and sampling
path γxy might be an arbitrary string with a prefix of all a’s of length n/2.
Again, there are 1.5
n possibilities. Therefore, there are
!2
2 1.5n 1.5n
r
1.5n 3πn e
≈ =
n πn2πn 0.5n 0.5n n n
e e
r 1.5 2n r
3 1.5 3
= 6.75n (7.81)
2πn 0.50.5 2πn
paths that go through this edge. Here we used the Stirling formula to approx-
imate factorials. On the other hand, there are only
r 2n 2n
2n 4πn 1
≈ n
n n n = √ 4 n
e
(7.82)
n 2πn2πn e e
πn
states in the Markov chain. It is easy to see that the stationary distribution of
the Markov chain is the uniform distribution. In the definition of max load in
Equation (7.59), the stationary distribution probability in Q(e) and one of the
stationary probabilities in the summation cancel each other. There is another
stationary distribution probability in the summation which is (neglecting the
non-exponential terms) in order 41n . However, the summation is on an order
of 6.75n paths (again, neglecting the non-exponential terms). That is, the
maximum load is clearly exponential for this canonical path system.
Consider now a multicommodity flow such that there are at most (n!)2
paths between any pair of states with non-zero measure. First, we construct a
multiset of paths of cardinality (n!)4 . Each path in this multiset has measure
π(vi )π(vj )
f (γ) = . (7.83)
(n!)4
Then the final flow between the two states is obtained by assigning a measure
to each path that is proportional to its multiplicity in the multiset.
The paths with non-zero measures are defined in the following way. Let
x = x1 x2 . . . x2n and y = y1 y2 . . . y2n , and let σx,a , σx,b , σy,a and σy,b be
four permutations. Index both the a characters and the b characters by the
permutations in both x and y. Now transform the indexed x into the index y
in the same way as we did in the canonical path reconstruction. The important
difference is that there is exactly one ai and bj character in both x and y for
each of the indexes i and j. Thus, if x1 is not y1 , then find the unique character
xi in x which is y1 , and bubble it down into the first position. Do this procedure
for each character in x to thus transform x into y. The path assigned to this
transformation can be constructed by removing the indexes and also removing
the void steps (when some neighbors ai and aj are swapped in the indexed
version, but they are two neighbor a’s when indexes are removed, there is
nothing to swap there!). We are going to estimate the number of paths that
can get through a given edge. For this, we are going to count the indexed
Mixing of Markov chains and their applications 297
versions of the paths keeping in mind that each of them has a measure given
in Equation (7.83).
Surprisingly, this multicommodity flow provides an upper bound on the
mixing time which is only polynomial with n. To see this, consider any edge
e swapping characters zi and zi+1 in a sequence z. It is easy to see that any
indexing of the characters in z might appear on indexed paths getting through
e. So fix an arbitrary indexing. We are bubbling down character zi+1 , so there
is some 0 ≤ k < i for which the indexes of the first k characters and the index
of character zi+1 are in order as they will appear in y. The other indexes
appear as they were in x. Then there are at most
2n
(k + 1)! (7.84)
k+1
There are (n!)2 possible indexes and at most 2n − 1 possible ks. Therefore,
the maximum flow getting through an edge is
(2n − 1)(n!)2
f (e) ≤ . (7.86)
(2n)!
This should be multiplied by the inverse of Q(e), so we get that the maximum
load is
1 (2n − 1)(n!)2
ϑf ≤ 1 1 = (2n − 1)2 . (7.87)
2n 2n−1 (2n)!
(n)
Applying Theorem 77, we can set a lower bound on the conductance that
proves the rapid mixing of the Markov chain.
K
where ⊗ denotes the usual tensor product from linear algebra, Mi denotes the
transition matrix of the Markov chain on the ith coordinate, and Ij denotes
the identical matrix with the same size as Mj . Since all pairs of terms in the
sum above commute, the eigenvalues of M are
( K
)
1 X
λj ,i : 1 ≤ ji ≤ |Ωi |
K i=1 i
where Ωi is the state space of the Markov chain Mi on the ith coordinate. The
second-largest eigenvalue of M is then obtained by combining the maximal
second-largest eigenvalue (maximal among all the second-largest eigenvalues
of the component transition matrices) with the other largest eigenvalues, i.e.,
with all others being 1s:
K − 1 + maxi {λ2,i }
.
K
If g denotes the smallest spectral gap, ie., g = 1 − maxi {λ2,i }, then from the
above, the second-largest eigenvalue of M is
K −g g
=1−
K K
namely, the second-largest eigenvalue of M is only K times closer to 1 than
the maximal second-largest eigenvalue of the individual Markov chains.
2. Given that a solution is accepted, with less than 2ε probability, the al-
gorithm does not work properly; it generates a solution of x from an
unknown distribution. On the other hand, with at least 1 − 2ε probabil-
ity, the solution is from exactly the uniform distribution of solutions.
Mixing of Markov chains and their applications 303
3. The running time of the algorithm is polynomial with both the size of
x and − log(ε).
rejected is
4e2 log( 2ε ) log( 2ε )
1 1 ε
1− 2 < = . (7.107)
4e e 2
In that case, generate a solution from an arbitrary distribution, for example,
accept the last solution, whatever is it. The running time of this procedure
is polynomial with both the size of x and − log(ε). It generates a solution
following a distribution which is a convex combination of three distributions:
1. With at most 2ε probability, it is an unknown distribution due to reject-
ing all proposals in the rejection sampling.
The left-hand side inequality in Equation (7.111) is trivial. To see the right-
hand side inequality, observe the following. If exact calculations were given
instead of FPRAS in all steps during the construction of y, then p would be
the uniform distribution, and thus p(y) was Y1 . However, in case of FPRAS
approximation, it could happen that the number of solutions below the node
that was selected was overestimated by a 1 + d1 factor, while all other nodes
were underestimated by the same factor. Still, the error introduced at that
point cannot be larger than the square of this factor. There are at most n
iterations, and although the errors here are multiplicative, it still cannot be
larger than
2d
1
1+ . (7.112)
2d
What is the probability of acceptance? With similar considerations, it is easy
to see that
2e 4e
c := ≤ , (7.113)
Ŷ Y
and
1
1 Y
≤ 2d ≤ p(y). (7.114)
eY 1+ 1 2d
Thus the ratio of p(y) and c is smaller than 4e2 , thus the acceptance ratio is
at least 4e12 . All these calculations hold only if all the FPRAS approximations
are in the prescribed boundaries. The probability that any of the FPRAS
approximations fall out of the boundaries is at most
d+1
ε ε
1− 1− ≤ , (7.115)
2(d + 1) 2
due to the Bernulli ’s inequality.
Mixing of Markov chains and their applications 305
Therefore the prescribed properties hold for the rejection sampler, and
thus, the overall procedure is indeed an FPAUS algorithm.
The global picture on how to construct an FPRAS algorithm from an
FPAUS algorithm is the following. We sample solutions using the FPAUS
algorithm and estimate which fraction of the solutions fall below the different
children of the root. If there are m children and the number of samples falling
below child vi is ni , then the estimated fraction is
ni
fˆi = Pm .
j=1 nj
We select the child with the largest fraction, and iterate this procedure. We
will arrive at a particular solution after k steps. Let fˆ1 , fˆ2 , . . ., fˆk denote the
estimations of the largest fractions during this process. We give an estimation
of the number of solutions as
k
Y 1
.
fˆi
i=1
We claim that this is an FPRAS algorithm when the parameters in this pro-
cedure are chosen appropriately.
Let ε0 and δ be the given parameters of the required FPRAS algorithm.
For each internal node we visit during the procedure, we prescribe a total vari-
ation distance for the FPAUS algorithm and prescribe a number of samples.
Generating the prescribed number of samples with the prescribed FPAUS, we
give an estimation of the largest fraction of solutions that fall below a par-
0
ticular child u with the property that the relative error is at most εd with at
least 1 − dδ probability, where d is still the depth of the tree. We construct the
problem instance x1 labeling u, and sample solutions of x1 to estimate the
largest fraction of the solutions that fall below a child. We iterate this proce-
dure until we arrive at a leaf. Then the solution represented by the leaf is a
fraction of the solution space, and the inverse of this fraction is the number
of solutions. We have an estimation for this fraction. It is the product of the
fractions estimated during the iterations. Since we have at most d estimations
of fractions, this indeed leads to an FPRAS algorithm.
Due to the definition of self-reducibility, there are at most |Σ|σ(x) number
of children of an internal node labeled by the problem instance x. For sake of
simplicity, let the largest number of children be denoted by N . It is easy to
see that N = O(poly(n)). Due to the pigeonhole rule, there is a child below
which at least N1 fraction of the solutions reside. Let u denote this child. If the
samples came from the uniform distribution, the fraction of samples below u
would be an unbiased estimator of the probability that a solution is below u.
However, the samples are only almost uniform, and this causes a systematic
0
bias. Fortunately, we can require that ε in the FPAUS algorithm is 2d|σ|εpoly(n)
and then the running time of the FPAUS algorithm is still polynomial in
ε0
both n and ε10 . Furthermore, the systematic error is smaller than 2d . The
number of samples must be set such that the probability that the measurement
306 Computational complexity of counting and sampling
0
ε
error is larger than 2d is smaller than dδ . Let the probability that a solution
sampled by the FPAUS algorithm is below u be denoted by p. Then the
number of solutions below u follows a binomial distribution with parameter p
and expectation mp, where m is the number of samples. We can use Chernoff’s
inequality saying that for the actual number of samples Y below u, it holds
that
!
ε0
mp
P Y < ε0
≤ P Y < mp 1 − ≤
1 + 2d 3d
2
ε0
1 mp − mp 1 − 3d
exp − . (7.116)
2p m
δ
The right-hand side should be bounded by 2d (the other half of the probability
will go to the other tail):
2
ε0
1 mp − mp 1 − 3d δ
exp − ≤ . (7.117)
2p m 2d
w((u, v))
T (u|v) = P 0
(7.121)
u0 |(u0 ,v)∈E w((u , v))
and
w((u, v))
T (v|u) = P 0
. (7.122)
v 0 |(u,v 0 )∈E w((v , u))
since
P
X v”|(u,v”)∈E T (v”|u)w((v, u)) w((v, u))
w(v”|u) = = , (7.126)
T (v|u) T (v|u)
v”|(u,v”)∈E
308 Computational complexity of counting and sampling
and thus
T (v 0 |u)w((v,u))
0 w((u, v 0 )) T (v|u)
T (v |u) = P = w((v,u))
. (7.127)
v”|(u,v”)∈E w(v”|u) T (v|u)
Now for any neighbor v 0 of u, it holds that either v 0 is a leaf or for one of the
edges of v 0 , the weight of the edge is defined and for all other edges, the weight
is not defined. If v 0 is a leaf, then for its edge weight and transition probability,
Equation (7.121) naturally holds. For any internal nodes, the weights of the
other edges can be defined similarly to u.
We can iterate this procedure until all vertices are reached. Due to the tree
structure, it is impossible that a vertex is visited twice.
It is easy to see that the measure π defined in Equation (7.123) is indeed
a distribution. The detailed balance also holds, since
0
P
u0 |(u0 ,v)∈E w((u , v)) w((u, v))
π(v)T (u|v) = P P 0
=
2 e∈E w(e) u |(u ,v)∈E w((u , v))
0 0
0
P
w((u, v)) v 0 |(v 0 ,u)∈E w((v , u)) w((u, v))
P = P P 0
=
2 e∈E w(e) 2 e∈E w(e) v |(v ,u)∈E w((v , u))
0 0
The idea of Sinclair and Jerrum was to reverse the construction: we can
define weights for the edges that will define a corresponding Markov chain. If
all weights of the edges connecting leaves to the remaining tree are the same,
then the stationary distribution is the uniform one on the leaves. We need two
further properties: i) the probability of the leaves in the stationary distribution
must be non-negligible, ii) the Markov chain must be rapidly mixing. Both
properties hold if the weights come from a very rough estimation of the number
of solutions, as stated and proved in the following theorem.
Theorem 82. Let #A be a self-reducible counting problem. Let C be a poly-
nomial time computable function such that for any problem instance x of #A,
C(x) gives an approximation for the number of solutions of x with an ap-
proximation factor F(x) = poly(|x|). Let M be a set of Markov chains such
that for each problem instance x in #A, it contains a Markov chain M . The
state space of M is the vertices of the tree representing the solution space of
x, and the transition probabilities are defined in the following way. For each
edge (u, v), where u is a child of v, define w((u, v)) := C(xu ), where xu is the
problem instance labeling u, if u is an internal edge, and let w((u, v)) be 1 if
u is a leaf. The transition probability T (u|v) 6= 0 if and only if u and v are
neighbors. In that case, it is defined as
w((u, v))
T (u|v) := P 0
. (7.129)
u0 |(u0 ,v)∈E w((u , v))
Mixing of Markov chains and their applications 309
where g(x) is the function from the definition of self-reducibility measuring the
length of the solutions, and L is the set of the leaves in the tree. If C measured
exactly the number of solutions, then the following inequality would be true:
X X
C(xu ) ≤ 2|{x|xRy}| (7.131)
v∈Vd u|(u,v)∈E
We get that
X |{y|xRy}|
π(v) = P =
2 e∈E w(e)
v∈L
|{y|xRy}|
P P ≥
|{y|xRy}| + v∈V \L u|(u,v)∈E C(xu )
|{y|xRy}| 1
= . (7.134)
|{y|xRy}| + 2g(x)(x)|{y|xRy}| 1 + 2g(x)(1 + F(x))
To prove the rapid mixing, we can use an idea similar to that in Example 21.
The conductance of the Markov chain is taken on a connected subtree S.
There are two cases: the root of S is the root of the whole tree or not. If the
root of S is not the root of the whole tree, then we show that the ergodic flow
310 Computational complexity of counting and sampling
on the root is already comparable with π(S). Let v denote the root of S, let
xv denote the problem instance labeling v, let u denote the parent of v, and
let Vd denote the set of vertices in S in depth d. Observe that
1
T (u|v) ≥ . (7.135)
1 + F(x)2
w((u, v))
T (u|v) = P =
w((u0 , v))
u0 |(u0 ,v)∈E
|{y|xv Ry}|
1+F (x) 1
|{y|xv Ry}|
= . (7.136)
+ |{y|xv Ry}|(1 + F(x)) 1 + F(x)2
1+F (x)
Then
|{y|xv Ry}| 1
F (S) π(v)T (u|v) 1+F (x) 1+F (x)2
Φ= ≥ P ≥P P 0
≥
π(S) v∈S π(v) v 0 ∈S u|(u,v 0 )∈E w((u, v ))
|{y|xv Ry}| 1
1+F (x) 1+F (x)2
=
|{y|xv Ry}| + |{y|xv Ry}|2g(x)(1 + F(x))
1
2
. (7.137)
(1 + F(x))(1 + F(x) )(1 + 2g(x)(1 + F(x)))
If the root of S is the root of the whole tree, then the complement of S is the
union of disjoint subtrees. Similar to the previous calculations, the ergodic
flow going out of their root is at most a polynomial factor smaller than the
probability of the trees. Since the Markov chain is reversible, the ergodic flow
of S equals the ergodic flow of the complement. Thus, the ergodic flow of S is
at most a polynomial factor smaller than the probability of the complement
of S, which cannot be smaller than the probability of S. Thus, the inverse of
the conductance in this case is also polynomially bounded.
Since the conductance is polynomially bounded, the Markov chain is
rapidly mixing.
Corollary 20. Let #A be a self-reducible counting problem. Let C be a poly-
nomial time computable function such that for any problem instance x of #A,
C(x) gives an approximation for the number of solutions of x with a polynomial
approximation factor. Then #A is in FPAUS and thus is in FPRAS.
Proof. We know that there is a rapidly mixing Markov chain on the vertices of
Mixing of Markov chains and their applications 311
the tree representing the solution space of the problem instance x, such that
its stationary distribution restricted to the leaves of the tree is the uniform
distribution; furthermore, the inverse of the probabilities of the leaves in the
stationary distribution is polynomially bounded. Then we can sample a vertex
of the tree from a distribution being very close to its stationary distribution
in polynomial time. The probability the sample is a not a leaf is less than
1
1− (7.138)
poly(|x|)
for some polynomial. Then the probability that none of the samples is a leaf
from O(poly(|x|, − log(ε))) number of samples is less than 2ε . In that case we
choose an arbitrary solution of x. The number of steps of the Markov chain
for one sample should be set such that the total variation distance of the
distribution after the given number of steps restricted to the leaves and the
uniform distribution of the leaves should be smaller than 2ε . Then the following
procedure will be an FPAUS. The procedure makes the prescribed number of
steps in the Markov chain, returns the current state, and iterates it till the
first sampled leaf but at most O(poly(|x|, − log(ε))) times; and in case of no
sampled leaves, it returns an arbitrary solution. Since #A is self-reducible, if
it is in FPAUS, it is also in FPRAS.
expected running time grows polynomially with the size of the problem
instance. An example of such a fast perfect sampler was published by
Mark Huber. His method perfectly samples linear extensions of posets
[95].
• Russ Bubley and Martin Dyer introduced the path coupling technique,
where two Markov chains are coupled via a path of intermediate states.
Using this technique, they proved fast mixing of a Markov chain on
linear extensions of a poset. Their method gave a better upper bound
on the mixing time that could be achieved using a geometric approach
[29].
• Cheeger’s inequalities say that a Markov chain is rapidly mixing if and
only if there is no bottleneck in them. However, bottlenecks might not
only be geometric when there is only a few edges between two subsets
of vertices in the Markov graph. They might also be probabilistic bot-
tlenecks where two subsets of vertices are connected with many edges,
however, the average transition probabilities are very small. Examples
of such bottlenecks are presented in [79] and [131]. Goldberg and Jer-
rum showed that the so-called Burnside process converges slowly. It is a
random walk on a bipartite graph whose vertices are the members of a
permutation group and combinatorial objects on which the group acts.
Two vertices are connected if the group element fixes the combinatorial
object. Restricting the walk to the combinatorial objects (that is, con-
sidering only every second step on the bipartite graph) yields a Markov
chain whose Markov graph is fully connected. Indeed, the identity of the
group fixes all combinatorial objects. Still, the Markov chain might be
torpidly mixing for some permutation groups, as proved by Goldberg
and Jerrum. Miklós, Mélykúti and Swenson considered a Markov chain
whose state space is the set of shortest reversal sorting paths of signed
permutations [131]. The corresponding Markov graph is fully connected,
however, the Markov chain is torpidly mixing since the majority of the
transitions have negligible probability.
• The Lazy Markov chain technique is somewhat paradoxical in the sense
that we have to slow down a Markov chain to prove its rapid mixing.
To avoid it, we have to prove that the smallest eigenvalue is sufficiently separated from −1. Diaconis and Stroock proved an inequality on the smallest eigenvalue [54]. Greenhill used a similar inequality to prove
rapid mixing of the non-lazy version of a Markov chain sampling regular
directed graphs [83].
• The logarithmic Sobolev inequality was given by Gross [86]. Diaconis and Saloff-Coste showed its application to the mixing time of Markov chains [53]. They gave an inequality for the mixing time in which $\log\frac{1}{\pi(x_i)}$ in Equation (7.8) is replaced with $\log\log\frac{1}{\pi(x_i)}$, while $\frac{1}{1-\lambda_2}$ is replaced with the inverse of the logarithmic Sobolev constant.
7.5 Exercises
1. Let T be the transition matrix of a Markov chain constructed by the Metropolis-Hastings algorithm. Prove that there are matrices W and Λ for which
$$T = W\Lambda W^{-1}$$
and Λ is diagonal.
2. Let T be the transition matrix of a reversible Markov chain. Prove that $\frac{T+I}{2}$ can be diagonalized.
3. * Let $p_1$, $p_2$ and $p_3$ be three distributions over the same domain, and let α be a real number in [0, 1]. Show that
$$d_{TV}(p_1, \alpha p_2 + (1-\alpha)p_3) \le \alpha\, d_{TV}(p_1, p_2) + (1-\alpha)\, d_{TV}(p_1, p_3).$$
16. ◦ Using the canonical path method, show that the Markov chain in Exercise 15 is rapidly mixing, that is, the mixing time grows only polynomially with the size of the ground graph G.
17. ◦ Let M be a Markov chain whose states are the monotone paths on
the 3D lattice from (0, 0, 0) to (a, b, c) (a, b, c ∈ Z+ ). Each path can be
described as the series of steps in the directions of the 3 axes of the
3D coordinate system. For example, xxzy means 2 steps in the first
dimension, one step in the third dimension, and one step in the second
dimension. There is a transition between two states if they differ in
two consecutive steps. The transition probability between any two such
states is
$$\frac{1}{a+b+c-1}.$$
The chain remains in the same state with the remaining probability.
Show that this Markov chain converges to the uniform distribution. Us-
ing multicommodity flow, show that the mixing time grows only poly-
nomially with a + b + c.
18. * Alice and Bob are playing against the devil. The devil put a hat on
Bob’s head and a hat on Alice’s head. The hat might be white or black
and the two hats might be the same or different colors. Alice and Bob
might discuss a strategy before the game, but after that they cannot
communicate and can see only the hat of the opposite player. From this
information, they have to guess the color of their own hat.
19. Let M be a set of Markov chains that contains a Markov chain Mn for
each positive integer n. The state space of Mn is the set of Dyck words
of length 2n, and the transition probabilities are defined by the following
algorithm.
(a) Draw uniformly a random number i between 1 and 2n − 1.
(b) If the word obtained by swapping the characters at positions i and i + 1 is also a Dyck word, then swap them. Otherwise, do nothing.
Using the coupling argument, prove that M is rapidly mixing. (A sketch of one transition is given below.)
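The following minimal Python sketch (ours, with 1 playing the role of the opening bracket) illustrates one transition of $M_n$; the Dyck test simply checks that no prefix closes more brackets than it opens.

```python
import random

def is_dyck(word):
    """True iff every prefix of the 0/1 word has at least as many 1s as 0s,
    and the total numbers of 1s and 0s are equal."""
    height = 0
    for c in word:
        height += 1 if c == '1' else -1
        if height < 0:
            return False
    return height == 0

def dyck_chain_step(word):
    """One transition of M_n on Dyck words of length 2n."""
    i = random.randint(1, len(word) - 1)          # step (a): i between 1 and 2n-1
    swapped = word[:i-1] + word[i] + word[i-1] + word[i+1:]
    return swapped if is_dyck(swapped) else word  # step (b): swap only if still Dyck
```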
20. ◦ Use the coupling technique to show that the Markov chain in Exam-
ple 21 is rapidly mixing.
21. Consider the Markov chain on the d-dimensional toroidal lattice that
contains k vertices on its circles in each dimension. The state space of
the Markov chain contains the $k^d$ vertices of the toroidal lattice, and the transition probabilities are between neighbor vertices; each transition probability is $\frac{1}{2d}$. Using the coupling technique, prove that the mixing
time grows polynomially with both d and k.
22. * Show that counting the spanning trees of a graph is a self-reducible
counting problem.
23. Show that counting the shortest paths between two vertices of a graph
is a self-reducible counting problem.
7.6 Solutions
Exercise 3. The proof comes directly from the definition of total variation
distance and basic properties of the absolute value function. Indeed,
$$d_{TV}(p_1, \alpha p_2 + (1-\alpha)p_3) = \frac{1}{2}\sum_x |p_1(x) - \alpha p_2(x) - (1-\alpha)p_3(x)| \le$$
$$\frac{1}{2}\sum_x \big(|\alpha p_1(x) - \alpha p_2(x)| + |(1-\alpha)p_1(x) - (1-\alpha)p_3(x)|\big) =$$
$$\alpha\, d_{TV}(p_1, p_2) + (1-\alpha)\, d_{TV}(p_1, p_3).$$
Exercise 4. Let U and V be the two classes of the bipartite Markov graph.
Consider the vector π̃ defined as $\tilde{\pi}(u) := \pi(u)$ for $u \in U$ and $\tilde{\pi}(v) := -\pi(v)$ for $v \in V$. Then
$$T\tilde{\pi} = -\tilde{\pi},$$
$$\pi(S) + \pi(\bar{S}) = 1$$
holds.
Exercise 9. Consider, for example, the caterpillar binary tree, that is, the
binary tree in which the internal vertices form a path.
Exercise 13. A good canonical path can be defined by bubbling down the
appropriate elements of the permutation to the appropriate positions. We
can set an upper bound on the maximum load of an edge similar to the
calculations in the solution of Example 24. The difference is that we do not
have to index the members of the permutations; observe that the indexed
sequences in Example 24 behave like the permutations.
Exercise 15. Consider the Markov chain M′ that with probability $\frac{1}{2}$ deletes a uniformly selected edge from the edges present in the matching, and with probability $\frac{1}{2}$ adds an edge uniformly selected from those edges with which the matching can be extended. Observe that the given transition probabilities are the transition probabilities of the Markov chain that we get by applying the Metropolis-Hastings algorithm to M′ and setting the target distribution to the uniform
one.
Exercise 16. The canonical path can be constructed similarly to that for the Markov chain presented in Section 8.2.5.
Exercise 17. The construction of the multicommodity flow is similar to the
one presented in the solution of Example 24.
Exercise 18.
(a) Alice will say the color of Bob’s hat, Bob will say the opposite color of
Alice’s hat. If they have hats of the same color, then Alice will guess
properly, if they have hats of different colors, then Bob will guess cor-
rectly.
(b) They toss a fair coin. If it is a head, they play the previous strategy, if
it is a tail, they flip the roles.
Exercise 20. Couple the two Markov chains when they are both at the root
of the tree. Show that the waiting time for this event only grows polynomially
with the size of the problem.
Exercise 22. Let G = (V, E) be a simple graph. Fix an arbitrary total order on E. Let $e_i$ be the smallest edge in E. Then for any spanning tree T of G, either $e_i \in T$ or $e_i \notin T$. If $e_i \in T$, then let $T^{e_i}$ be the tree obtained from T by contracting $e_i$, and let $G^{e_i}$ be the graph obtained from G by contracting $e_i$. Then $T^{e_i}$ is a spanning tree of $G^{e_i}$; furthermore, any spanning tree $\tilde{T}^{e_i}$ of $G^{e_i}$ can be obtained by contracting $e_i$ in a spanning tree $\tilde{T}$ of G. Since for any $T \ne \tilde{T}$ it also holds that $T^{e_i} \ne \tilde{T}^{e_i}$, contracting $e_i$ in those spanning trees of G that contain $e_i$ is a bijection between those trees and the spanning trees of $G^{e_i}$.
Similarly, there is a bijection between the spanning trees of G not containing $e_i$ and the spanning trees of $G \setminus \{e_i\}$. This provides a way to encode the spanning trees of G. If the encoding starts
$$e_i \in T,$$
then the extensions are the spanning trees of $G^{e_i}$; if the encoding starts
$$e_i \notin T,$$
then the extensions are the spanning trees of $G \setminus \{e_i\}$. It is easy to see that the granulation function σ has O(log(n)) values: encoding the index i takes log(n)
bits, where n = |V|. It is also easy to see that g, σ and φ are all polynomial time computable functions, the encodings of both $G^{e_i}$ and $G \setminus \{e_i\}$ are shorter than the encoding of G, and property 3d in the definition of self-reducibility also holds due to the above-mentioned bijections.
Exercise 24. Fix an arbitrary total order on the vertices. Let $D = d(v_1), d(v_2), \ldots, d(v_n)$ denote the degree sequence. Let $v_1$ be the smallest vertex, and assume that a star S with center $v_1$ is already given as a forbidden set of edges. S might be empty. Let $v_i$ be the smallest vertex such that $(v_1, v_i) \notin S$. Then the realizations of D avoiding S and not containing the edge $(v_1, v_i)$ are the realizations of D avoiding $S \cup \{(v_1, v_i)\}$, and the realizations of D avoiding S and containing the edge $(v_1, v_i)$ are the realizations of the degree sequence $D' = d(v_1) - 1, d(v_2), \ldots, d(v_i) - 1, \ldots, d(v_n)$ avoiding $S \cup \{(v_1, v_i)\}$. Finally, the realizations of D avoiding the star with n − 1 leaves and center $v_1$ are the realizations of $D'' = d(v_2), \ldots, d(v_n)$.
Chapter 8
Approximable counting and
sampling problems
There are #P-complete counting problems that are very easy to approximate.
We already discussed in Chapter 6 that an FPAUS can be given for sam-
pling satisfying assignments of a disjunctive normal form. Since counting the
satisfying assignments of a DNF is a self-reducible counting problem, we can
conclude that #DNF is also in FPRAS.
The core of the mentioned FPAUS is a rejection sampler for which the inverses of the acceptance probabilities are polynomially bounded. In this chapter, we show two further examples of FPAUS algorithms based on rejection samplers.
For other #P-complete counting problems, only Markov chain approaches are known for generating almost uniform samples of solutions. The second part of this chapter introduces examples of such Markov chains.
contradicting that $I' \notin S$. In particular, $w_k' \ge n$.
To simplify the notation, we define
$$\delta_i := w_i\frac{n^2}{W} - w_i',$$
that is,
$$w_i = (w_i' + \delta_i)\frac{W}{n^2}.$$
We are ready to show that $I' \setminus \{k\}$ is in S. Indeed,
$$\sum_{i \in I' \setminus \{k\}} w_i = \frac{W}{n^2}\sum_{i \in I' \setminus \{k\}} (w_i' + \delta_i) = \frac{W}{n^2}\left(\sum_{i \in I'} w_i' - w_k' + \sum_{i \in I' \setminus \{k\}} \delta_i\right)$$
$$\le \frac{W}{n^2}\left(\sum_{i \in I'} w_i' - w_k' + n\right) \quad \text{(since each } \delta_i \le 1\text{)}$$
$$\le \frac{W}{n^2}\sum_{i \in I'} w_i' \quad \text{(since } w_k' \ge n\text{)}$$
$$\le \frac{W}{n^2}\, n^2 = W.$$
Since any $I \in S$ can be obtained in at most n different ways from an $I' \in S' \setminus S$, it follows that
$$|S' \setminus S| \le n|S|.$$
Therefore
$$|S'| \le (n+1)|S|.$$
Uniformly sampling from S′ can be done in time polynomial in n. The probability of sampling a solution from S is larger than $\frac{1}{n+1}$. Since checking
if a solution is indeed in S can be done in O(n log(W)) time, we can use rejection sampling to obtain an FPAUS algorithm whose running time is polynomial in n and log(W) (that is, polynomial in the size of the input) and also polynomial in −log(ε), where ε is the allowed deviation from the uniform distribution measured in total variation distance.
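A minimal sketch of the resulting rejection sampler, assuming a hypothetical subroutine `sample_uniform_relaxed` that draws uniformly from S′ (which can be organized in polynomial time, for example by dynamic programming over the rounded weights):

```python
import math

def rejection_fpaus(weights, W, sample_uniform_relaxed, eps):
    """Sketch of the rejection sampler described above.

    A draw from S' lies in S with probability > 1/(n+1), so
    ceil((n+1) * ln(1/eps)) independent trials all miss S with
    probability < eps.
    """
    n = len(weights)
    trials = math.ceil((n + 1) * math.log(1.0 / eps))
    for _ in range(trials):
        I = sample_uniform_relaxed()              # uniform over S'
        if sum(weights[i] for i in I) <= W:       # O(n log W) membership test for S
            return I                              # accepted samples are uniform over S
    return None                                   # reached with probability < eps
```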
Input: two degree sequences $D = d_1, d_2, \ldots, d_n$ and $F = f_1, f_2, \ldots, f_n$ such that $\sum_i d_i = \sum_i f_i = 2n - 2$ and for all i, $\min\{d_i, f_i\} = 1$.
Output: the number of edge-disjoint tree realizations of D and F .
We will refer to D and F as tree degree sequences without common internal
nodes. To prove that #MinDegree1-2Trees-DegreePacking is in FPAUS and
FPRAS, we need the following observations. The first is a well-known fact,
and its proof can be found in many textbooks on enumerative combinatorics.
Observation 2. The number of trees with degree sequence $d_1, d_2, \ldots, d_n$ is
$$\frac{(n-2)!}{\prod_{k=1}^{n}(d_k - 1)!}. \qquad (8.5)$$
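For illustration, a direct evaluation of Equation (8.5) (the helper name is ours):

```python
from math import factorial

def count_trees(degrees):
    """Number of labeled trees with the given degree sequence, Eq. (8.5).

    Returns 0 unless the degrees are positive and sum to 2n - 2.  For
    example, count_trees([1, 1, 2, 2]) == 2: the two labeled paths on 4
    vertices whose endpoints are the degree-1 vertices.
    """
    n = len(degrees)
    if n < 2 or min(degrees) < 1 or sum(degrees) != 2 * n - 2:
        return 0
    count = factorial(n - 2)
    for d in degrees:
        count //= factorial(d - 1)   # divides evenly: the result is a multinomial coefficient
    return count
```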
The probability that $v_i$ and $v_j$ are adjacent is the ratio of the values in Equations (8.7) and (8.5), which is indeed
$$\frac{d_i + d_j - 2}{n - 2}. \qquad (8.8)$$
We define the following sets.
Observe that there might be parallel edges of the two tree realizations only
between these two sets. The expected number of edges is
$$\sum_{v_i \in A}\sum_{v_j \in B} \frac{(d_i - 1)(f_j - 1)}{(n-2)^2} = \sum_{v_i \in A}\frac{d_i - 1}{n-2}\sum_{v_j \in B}\frac{f_j - 1}{n-2} = \sum_{i=1}^{n}\frac{d_i - 1}{n-2}\sum_{j=1}^{n}\frac{f_j - 1}{n-2} = 1 \qquad (8.12)$$
since $d_i = 1$ for all $v_i \notin A$ and $f_j = 1$ for all $v_j \notin B$, and the sum of the degrees decreased by 1 is $n - 2$ in any tree degree sequence.
We are ready to prove the following theorem.
Theorem 83. The counting problem #MinDegree1-2Trees-DegreePacking is in FPAUS and FPRAS.
Proof. Let D and F be two degree sequences without common internal nodes. If there is a vertex $v_i$ such that $d_i = n - 1$ or $f_i = n - 1$, then $d_i + f_i = n$, which implies that D and F do not have edge-disjoint tree realizations. So we can assume that there are vertices $v_{i_1}, v_{i_2}, v_{j_1}$ and $v_{j_2}$ such that $d_{i_1}, d_{i_2} > 1$ and $f_{j_1}, f_{j_2} > 1$. The number of tree pairs $(T_D, T_F)$ in which $(v_{i_1}, v_{j_1})$ and $(v_{i_2}, v_{j_2})$ are edges in both $T_D$ and $T_F$ is
$$\frac{(n-4)!}{(d_{i_1}-2)!\,(d_{i_2}-2)!\prod_{k \ne i_1, i_2}(d_k - 1)!} \times \frac{(n-4)!}{(f_{j_1}-2)!\,(f_{j_2}-2)!\prod_{k \ne j_1, j_2}(f_k - 1)!} \qquad (8.13)$$
$$0 \le x_i \le 1 \qquad (8.17)$$
$$x_i \le x_j \qquad \forall\, a_i \le a_j. \qquad (8.18)$$
Since any poset polytope contains the points $(0, 0, \ldots, 0)$ and $(1, 1, \ldots, 1)$, the diameter of a poset polytope of n elements is $\sqrt{n}$. We can define the poset polytope of any total ordering. Since there are n! total orderings of an antichain of size n, and the polytopes of its total orderings have the same volume, the volume of a total ordering polytope of n elements is $\frac{1}{n!}$.
The transitions of the Markov chain correspond to the common facets of two total ordering polytopes, each defined by the inequality system
that is,
$$\frac{1}{1-\lambda_2} \le 16n^3(n-1)^2. \qquad (8.24)$$
The number of linear extensions is clearly at most n!. Therefore, by applying Theorem 68, we get for the relaxation time that
$$\tau_i(\varepsilon) \le 16n^3(n-1)^2\left(\log(n!) + \log\frac{1}{\varepsilon}\right). \qquad (8.25)$$
This mixing time is clearly polynomial in both n and −log(ε). Finding one total ordering of a poset can be done in polynomial time, and one step of the Markov chain can also be performed in polynomial time. It follows that there is an FPAUS for almost uniformly sampling linear extensions of a poset. Since counting the number of linear extensions is a self-reducible counting problem, we get the following theorem.
Theorem 84. The counting problem #LE is in FPRAS.
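In combinatorial terms, crossing a common facet amounts to transposing two adjacent, incomparable elements of the current linear extension. The following Python fragment is a minimal sketch of such a step under this reading, not a verbatim implementation of the polytope-based chain above:

```python
import random

def linear_extension_step(ext, less_than):
    """One adjacent-transposition step on linear extensions of a poset.

    ext is a list of the poset elements in a linear order extending the
    partial order; less_than(a, b) returns True iff a < b in the poset.
    The swap is legal exactly when the two elements are incomparable.
    """
    i = random.randrange(len(ext) - 1)
    a, b = ext[i], ext[i + 1]
    if not less_than(a, b) and not less_than(b, a):   # incomparable: swap keeps a linear extension
        ext = ext[:i] + [b, a] + ext[i + 2:]
    return ext
```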
Consider a Markov chain whose state space is the set of the possible matchings
of G. We will denote this set with M(G). If M is the current state, choose an
edge e = (u, v) uniformly at random, and do the following:
(a) If $e \in M$, then move to $M \setminus \{e\}$ with probability $\frac{1}{1+w(e)}$.
(b) If $e \notin M$ and there is no edge in M which is incident to u or v, then move to $M \cup \{e\}$ with probability $\frac{w(e)}{1+w(e)}$.
$$\pi(M) \propto W(M), \qquad (8.27)$$
where $W(M) = \prod_{e \in M} w(e)$.
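One transition of this chain can be sketched as follows; only moves (a) and (b) are shown, since the swap-type move used in the second case of the reversibility check below is elided in this excerpt:

```python
import random

def matching_chain_step(M, edges, w):
    """One step of the weighted matching chain: moves (a) and (b).

    M is a frozenset of edges (u, v) forming a matching, edges lists all
    edges of G, and w maps each edge to a positive weight.  The chain
    converges to pi(M) proportional to the product of w(e) over e in M.
    """
    e = random.choice(edges)                         # uniform random edge of G
    u, v = e
    if e in M:
        if random.random() < 1.0 / (1.0 + w[e]):     # (a): delete e
            return M - {e}
    elif all(u not in f and v not in f for f in M):  # e can extend the matching
        if random.random() < w[e] / (1.0 + w[e]):    # (b): add e
            return M | {e}
    return M                                         # otherwise stay in place
```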
In the second case, w.l.o.g. we can assume that $M' = M \cup \{e\} \setminus \{e'\}$. Then
$$\pi(M')T(M|M') = \pi(M)\frac{w(e)}{w(e')}\,T(M|M') = \pi(M)\frac{w(e)}{w(e')}\cdot\frac{w(e')}{w(e)+w(e')} = \pi(M)T(M'|M). \qquad (8.29)$$
We will apply the canonical path method to prove rapid mixing (Theo-
rem 76). We fix an arbitrary total ordering of the edges, and construct a path
system Γ, which contains a path for any pair of matchings.
Let X and Y be two matchings. We create a canonical path from X to
Y in the following way, on which Z will denote the current realization. The
symmetric difference X∆Y is a union of disjoint alternating paths and alter-
nating cycles. The total ordering of the edges defines a total ordering of the
components. We transform X to Y by working on the components in increas-
ing order. Let Ci be the ith component. There are two cases: Ci is either an
alternating path or an alternating cycle. If it is a path, then consider its edges starting from the end of the path containing the smaller edge, and denote them by $e_1, e_2, \ldots, e_{l_i}$. If $e_1$ is present in Z, remove it. If it is not present in Z, then add it, and remove $e_2$. Then continue the transformation by removing and adding edges till all the edges in Y are added, and all the edges in X are removed (the last step might be adding the last edge of the path).
If Ci is a cycle, then remove the smallest edge e ∈ Ci ∩ Z. Then Ci \ {e}
is a path, and work with it as described above.
To give an upper bound on $K_\Gamma$, we introduce
$$M := X\Delta Y\Delta Z \qquad (8.30)$$
for each Z. Let $e = (Z, Z')$ be an edge in the Markov graph. How many canonical paths use this edge? We claim that X and Y can be unequivocally reconstructed from the edge $(Z, Z')$ and M. Indeed, observe that
$$M\Delta Z = X\Delta Y, \qquad (8.31)$$
therefore $X\Delta Y$ can be reconstructed from e and M. From this, we can identify
which is the component Ci that is being changed (recall that there is a total
ordering of the components that does not depend on Z). Observe that all
edges in Cj ∩ Z, j < i come from Y and all edges in Cj \ Z come from X.
Similarly, for all k > i, all edges in Ck ∩ Z come from X, and all edges in
Ck \ Z come from Y . Furthermore, from the edge e, we can also tell which
edges in Ci come from X and which edges come from Y . Since X, Y and Z
are identical on Z \ (X∆Y ), we can determine X and Y from e and M .
What does M look like? We claim that M is almost a matching, except that it might contain a pair of adjacent edges. Indeed, M is a matching on $M \setminus C_i$, and it might contain two adjacent edges on $C_i$ if one of them has already been removed
from Z and the other is going to be added to Z′. Let $M^*$ be M if M is a matching (which is the case if the edge $(Z, Z')$ represents the first alteration on the current component), and otherwise let $M^*$ be $M \setminus \{f\}$, where f is the edge of the adjacent pair of edges in M which is going to be added by the operation represented by e.
Also observe that
$$\pi(M^*) = \frac{1}{w(f)}\cdot\frac{\pi(X)\pi(Y)}{\pi(Z)}. \qquad (8.32)$$
It is also clear that any path from any X to Y has length at most $\frac{3n}{5}$, where n is the number of vertices. The smallest transition probability is $\frac{1}{1+w_{\max}}$, where $w_{\max}$ is the largest weight. In this way, we can get an estimation of the load of an edge $e = (Z, Z')$. Let F denote the set containing the edge in G that is added in the operation represented by e, if such an edge exists. If the operation represented by e deletes only an edge, then F is the empty set. Let $w(F) := \prod_{f \in F} w(f)$, where the empty product is defined as 1. Then we have the following upper bound on the load of e:
$$\frac{1}{Q(e)}\sum_{\gamma_{X,Y} \ni e}|\gamma_{X,Y}|\,\pi(X)\pi(Y) \le \frac{3n}{5}\,(1+w_{\max})\sum_{\gamma_{X,Y} \ni e}\frac{\pi(X)\pi(Y)}{\pi(Z)} \le \cdots$$
Therefore, $K_\Gamma$ in Theorem 76 is $O(n\,w_{\max}^2)$. According to the theorem, the Markov chain is rapidly mixing if the maximum weight is upper bounded by a polynomial function of n.
$$e = e_1, e_2, \ldots, e_l$$
where for each $e_k = (u_k, v_k)$, $e_{k+1}$ is defined as $\varphi_{v_k}(e_k)$. Denote the so-defined
circuit by C1 . If H \ C1 is not the empty graph, repeat the same on H \ C1 to
get a circuit C2 . The process is iterated till H \ (C1 ∪ C2 ∪ . . . ∪ Cs ) is the empty
graph. Then each $C_i$ is decomposed into cycles $C_{i,1}, C_{i,2}, \ldots, C_{i,j_i}$. The cycle $C_{i,1}$ is
the cycle between the first and second visit of w, where w is the first revisited
vertex in Ci (note that w might be both in U and V ). Then Ci,2 is defined in
the same way in Ci \ Ci,1 , etc. The path from X to Y is defined by processing
the cycles
$$C_{1,1}, C_{1,2}, \ldots, C_{1,j_1}, C_{2,1}, \ldots, C_{s,j_s}$$
applying the sequence of swap operations as described in the proof of Theo-
rem 85. For the so-obtained path γ, we define
$$f(\gamma) := \frac{\pi(X)\pi(Y)}{\prod_{w \in V(H)}\left(\frac{d(w)}{2}\right)!}.$$
Therefore, if the number of paths in the path system going through any edge
is less than
$$\mathrm{poly}(n, m)\,N!\prod_{w \in V(H)}\left(\frac{d(w)}{2}\right)!$$
for some poly(n, m), then the swap Markov chain is rapidly mixing. Let Z
be a realization on a path γ going from X to Y obtained from the ensemble
of pairings Φ, and let e be the transition from Z to Z′. Let $M_G$ denote the adjacency matrix of a bipartite graph G, and let
$$\hat{M} := M_X + M_Y - M_Z. \qquad (8.34)$$
Miklós, Erdős and Soukup [130] proved that the path γ and thus X and Y can
be unequivocally obtained from M̂ , Φ, e and O(log(nm)) bits of information.
The proof is quite involved and thus omitted here. The corollary is that the
number of paths going through a particular $e = (Z, Z')$ is upper bounded by
$$\mathrm{poly}(n, m)\,|M_Z|!\prod_{v \in V(H)}\left(\frac{d(v)}{2}\right)! \qquad (8.35)$$
First observe that M̂ has the same row and column sums as the adjacency
matrix of any realization of D, and it might contain at most 3 values which
are not 0 or 1. Indeed, if
By Theorem 78, this proves the rapid mixing of the Markov chain.
where k denotes the maximum degree. This implies that a JDM also uniquely determines the degree sequence, since we have obtained the number of nodes of a given degree for all possible degrees. For the sake of uniformity we consider all vertex classes $V_i$ for $i = 1, \ldots, k$; therefore we consider empty classes with $n_i = 0$ vertices as well. A necessary condition for J to be graphical is that all the $n_i$'s are integers. Let n denote the total number of vertices. Naturally, $n = \sum_i n_i$, and it is uniquely determined via Equation (8.37) for a given graphical JDM. The necessary and sufficient conditions for a given JDM to be graphical are provided in the following theorem.
Theorem 86. [50] A k × k matrix J is a graphical JDM if and only if the
following conditions hold:
1. For all $i = 1, \ldots, k$,
$$n_i := \frac{J_{i,i} + \sum_{j=1}^{k} J_{i,j}}{i}$$
is an integer.
2. For all $i = 1, \ldots, k$,
$$J_{i,i} \le \binom{n_i}{2}.$$
3. For all $i = 1, \ldots, k$ and $j = 1, \ldots, k$, $i \ne j$,
$$J_{i,j} \le n_i n_j.$$
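The three conditions can be verified directly; a small sketch (0-based Python indexing, so J[i][j] stores $J_{i+1,j+1}$):

```python
def is_graphical_jdm(J):
    """Checks the conditions of Theorem 86 for a k x k joint degree matrix."""
    k = len(J)
    n = [0] * k
    for i in range(k):
        total = J[i][i] + sum(J[i][j] for j in range(k))
        if total % (i + 1) != 0:                      # condition 1: n_i is an integer
            return False
        n[i] = total // (i + 1)
    for i in range(k):
        if J[i][i] > n[i] * (n[i] - 1) // 2:          # condition 2: J_ii <= C(n_i, 2)
            return False
        for j in range(k):
            if i != j and J[i][j] > n[i] * n[j]:      # condition 3: J_ij <= n_i * n_j
                return False
    return True
```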
Let dj (v) denote the number of edges such that one end-vertex is v and
the other end-vertex belongs to Vj , i.e., dj (v) is the degree of v in Vj . The
vector consisting of the dj (v)-s for all j is called the degree spectrum of vertex
v. We introduce the notation
$$\Theta_{i,j} = \begin{cases} 0 & \text{if } n_i = 0, \\[2pt] \dfrac{J_{i,j}}{n_i} & \text{otherwise,} \end{cases}$$
Definition 63. Let J be a JDM. The state space of the RSO Markov chain
consists of all the balanced realizations of J. It was proved by Czabarka et
al. [50] that this state space is connected under restricted swap operations.
The transitions of the Markov chain are defined in the following way. With
probability 1/2, the chain does nothing, so it remains in the current state (we
consider a lazy Markov chain). With probability 1/2 the chain will choose
four, pairwise disjoint vertices, v1 , v2 , v3 , v4 from the current realization (the
possible choices are order dependent) and check whether v1 and v2 are chosen
from the same vertex class, and furthermore whether the
swap operation is feasible. If this is the case then our Markov chain performs
the swap operation if it leads to another balanced JDM realization. Otherwise
the Markov chain remains in the same state. (Note that exactly two different
orders of the selected vertices will provide the same swap operation, since the
roles of $v_1$ and $v_2$ are symmetric.) Then there is a transition with probability
$$\frac{1}{n(n-1)(n-2)(n-3)}$$
between two realizations iff there is an RSO transforming one into the other.
Here we prove that such a Markov chain is rapidly mixing. The convergence
of a Markov chain is measured as a function of the input data size. Here we
note that the size of the data is the number of vertices (or number of edges,
they are polynomially bounded functions of each other) and not the number of
digits to describe the JDM. This distinction is important as, for example, one
can create a 2×2 JDM with values J2,2 = J3,3 = 0 and J2,3 = J3,2 = 6n, which
has Ω(n) number of vertices (or edges) but it needs only O(log(n)) number
of digits to describe (except in the unary number system). Alternatively, one
might consider the input is given in unary.
Formally, we state the rapid mixing property via the following theorem:
Theorem 88. The RSO Markov chain on balanced JDM realizations is a
rapidly mixing Markov chain, namely, for the second-largest eigenvalue λ2 of
this chain, it holds that
$$\frac{1}{1-\lambda_2} = O(\mathrm{poly}(n))$$
where n is the number of vertices in the realizations of the JDM.
Note that the expression on the LHS is called, with some abuse of notation, the relaxation time: it is the time needed for the Markov chain to reach its stationary distribution. The proof is based on the special structure of the
state space of the balanced JDM realizations. This special structure allows the
following proof strategy: if we can prove that some auxiliary Markov chains
are rapidly mixing on some sub-spaces obtained from decomposing the above-
mentioned specially structured state space, then the Markov chain on the
whole space is also rapidly mixing. We are going to prove the rapid mixing of
these auxiliary Markov chains, as well as give the proof of the general theorem,
that a Markov chain on this special structure is rapidly mixing, hence proving
our main Theorem 88.
In order to describe the structure of the space of balanced JDM realiza-
tions, we first define the almost semi-regular bipartite and almost regular
graphs.
Definition 64. A bipartite graph G(U, V; E) is almost semi-regular if for any $u_1, u_2 \in U$ and $v_1, v_2 \in V$,
$$|d(u_1) - d(u_2)| \le 1$$
and
$$|d(v_1) - d(v_2)| \le 1.$$
Definition 65. A graph G(V, E) is almost regular, if for any v1 , v2 ∈ V
|d(v1 ) − d(v2 )| ≤ 1.
It is easy to see that the restriction of any graphical realization of the JDM
to vertex classes Vi , Vj , i 6= j can be considered as the coexistence of two
almost regular graphs (one on Vi and the other on Vj ), and one almost semi-
regular bipartite graph on the vertex class pair Vi , Vj . More generally, the
collection of these almost semi-regular bipartite graphs and almost regular
graphs completely determines the balanced JDM realization. Formally:
Definition 66 (Labeled union). Any balanced JDM realization can be rep-
resented as a set of almost semi-regular bipartite graphs and almost regular
graphs. The realization can then be constructed from these factor graphs as
their labeled union: the vertices with the same labels are collapsed, and the
edge set of the union is the union of the edge sets of the factor graphs.
It is useful to construct the following auxiliary graphs. For each vertex
class Vi , we create an auxiliary bipartite graph, Gi (Vi , U ; E), where U is a set
of “super-nodes” representing all vertex classes Vj , including Vi . There is an
edge between $v \in V_i$ and super-node $u_j$ representing vertex class $V_j$ iff
$$d_j(v) = \lceil \Theta_{i,j} \rceil,$$
i.e., iff node v carries the ceiling of the average degree of its class i toward
the other class j. (For sake of uniformity, we construct these auxiliary graphs
for all i = 1, . . . , k, even if some of them have no edge at all. Similarly, all
super-nodes are given, even if some of them have no incident edge.) We claim
that these k auxiliary graphs are half-regular, i.e., each vertex in Vi has the
same degree (the degrees in the vertex class U might be arbitrary). Indeed, the
vertices in Vi all have the same degree in the JDM realization, therefore, the
number of times they have the ceiling of the average degree toward a vertex
class is constant in a balanced realization.
Let Y denote the space of all balanced realizations of a JDM and just as
before, let k denote the number of vertex classes (some of them can be empty).
We will represent the elements of Y via a vector y whose k(k+1)/2 components are the k almost regular graphs and the k(k−1)/2 almost semi-regular bipartite graphs from their labeled union decomposition, as described in Definition
66 above. Given an element y ∈ Y (i.e., a balanced graphical realization
of the JDM) it has k associated auxiliary graphs Gi (Vi , U ; E), one for every
vertex class Vi (some of them can be empty graphs). We will consider this
collection of auxiliary graphs for a given y as a k-dimensional vector x, where
x = (G1 , . . . , Gk ).
For any given y we can determine the corresponding x (so no particular
y can correspond to two different xs), however, for a given x there can be
several y’s with that same x. We will denote by Yx the subset of Y containing
all the y’s with the same (given) x and by X the set of all possible induced
x vectors. Clearly, the x vectors can be used to define a disjoint partition on Y: $\mathcal{Y} = \bigcup_{x \in X} \mathcal{Y}_x$. For notational convenience we will consider the space Y as
pairs (x, y), indicating the x-partition to which y belongs. This should not be
confused with the notation for an edge, however, this should be evident from
the context. A restricted swap operation might fix x, in which case it will
make a move only within Yx , but if it does not fix x, then it will change both
x and y. For any x, the RSOs moving only within Yx form a Markov chain.
On the other hand, tracing only the x’s from the pairs (x, y) is not a Markov
chain: the probability that an RSO changes x (and thus also y) depends also
on the current y not only on x. However, the following theorem holds:
Theorem 89. Let (x1 , y1 ) be a balanced realization of a JDM in the above
mentioned representation.
i Assume that (x2 , y2 ) balanced realization is derived from the first one
with one restricted swap operation. Then, either x1 = x2 or they differ
in exactly one coordinate, and the two corresponding auxiliary graphs
differ only in one swap operation.
ii Let x2 be a vector differing only in one coordinate from x1 , and further-
more, only in one swap within this coordinate, namely, one swap within
one coordinate is sufficient to transform x1 into x2 . Then there exists at
least one y2 such that (x2 , y2 ) is a balanced JDM realization and (x1 , y1 )
can be transformed into (x2 , y2 ) with a single RSO.
Proof. (i) This is just the reformulation of the definitions for the (x, y) pairs.
(ii) (See also Fig. 8.1) By definition there is a degree i, 1 ≤ i ≤ k such that
auxiliary graphs x1 (Gi ) and x2 (Gi ) are different and one swap operation trans-
forms the first one into the second one. More precisely there are vertices
v1 , v2 ∈ Vi such that the swap transforming x1 (Gi ) into x2 (Gi ) removes edges
$(v_1, U_j)$ and $(v_2, U_k)$ (with $j \ne k$) and adds edges $(v_1, U_k)$ and $(v_2, U_j)$. (The capital letters show that the second vertices are super-vertices.) Since the edge $(v_1, U_j)$ exists in the graph $x_1(G_i)$ and $(v_2, U_j)$ does not belong to the graph $x_1(G_i)$, therefore $d_j(v_1) > d_j(v_2)$ in the realization $(x_1, y_1)$. This means that
there is at least one vertex w ∈ Vj such that w is connected to v1 but not
to v2 in the realization (x1 , y1 ). Similarly, there is at least one vertex r ∈ Vk
such that r is connected to v2 but not to v1 (again, in realization (x1 , y1 )).
Therefore, we have a required RSO on nodes v1 , v2 , w, r.
[Figure 8.1: The RSO on the vertices $v_1, v_2 \in V_i$, $w \in V_j$ and $r \in V_k$ realizing the swap between the auxiliary graphs $x_1(G_i)$ and $x_2(G_i)$.]
Theorem 91. The swap Markov chain on the realizations of almost half-
regular bipartite degree sequences is rapidly mixing.
We are now ready to prove the main theorem.
Proof. (Theorem 88) We show that the RSO Markov chain on balanced real-
izations fulfills the conditions in Theorem 73. First we show that condition (i)
of Theorem 73 holds. When restricted to the partition Yx (that is with x fixed),
the RSO Markov chain over the balanced realizations walks on the union of
almost semi-regular and almost regular graphs. By restriction here we mean that all probabilities which would (in the original chain) leave $\mathcal{Y}_x$ are put onto the self-loop probabilities. Since an RSO changes only one coordinate at a time, independently of other coordinates, all the conditions in Theorem 80 are
fulfilled. Thus the relaxation time of the RSO Markov chain restricted onto
Yx is bounded from above by the relaxation time of the chain restricted onto
that coordinate (either an almost semi-regular bipartite or an almost regular
graph) on which this restricted chain is the slowest (the smallest gap). How-
ever, based on Theorems 90 and 91, all these restrictions are fast mixing, and
thus by Theorem 80 the polynomial bound in (i) holds. (Here $K = \frac{k(k+1)}{2}$; see Definition 66 and note that an almost semi-regular bipartite graph is also an almost half-regular bipartite graph.)
Next we show that condition (ii) of Theorem 73 also holds. The first coor-
dinate is the union of auxiliary bipartite graphs, all of which are half-regular.
The M′ Markov chain corresponding to Theorem 73 is the swap Markov chain on these auxiliary graphs. Here each possible swap has probability
$$\frac{1}{n(n-1)(n-2)(n-3)},$$
and by Theorem 89 it is guaranteed that condition (7.53) is fulfilled. Since, again,
all conditions of Theorem 80 are fulfilled (mixing is fast within any coordinate
due to Theorems 90 and 91), the M′ Markov chain is also fast mixing. The
condition in Equation (7.53) holds due to Theorem 89. Since all conditions
in Theorem 73 hold, the RSO swap Markov chain on balanced realizations is
also rapidly mixing.
• Take an adjacency (a, b) and a telomere (c), and create a new adjacency
and a new telomere from the 3 extremities: either (a, c) and (b) or (b, c)
and (a).
• Take two telomeres (a) and (b), and create a new adjacency (a, b).
• Take an adjacency (a, b) and create two new telomeres (a) and (b).
Given two genomes G1 and G2 with the same label set, it is always possible
to transform one into the other by a sequence of DCJ operations [185]. Such
a sequence is called a DCJ scenario for G1 and G2 . The minimum length of a
scenario is called the DCJ distance and is denoted by dDCJ (G1 , G2 ).
Definition 69. The Most Parsimonious DCJ (MPDCJ) scenario problem
for two genomes G1 and G2 is to compute dDCJ (G1 , G2 ). The #MPDCJ
problem asks for the number of scenarios of length dDCJ (G1 , G2 ), denoted
by #MPDCJ(G1 , G2 ).
For example, the DCJ distance between the two genomes of Figure 8.2 is
three and there are nine different most parsimonious scenarios.
MPDCJ is an optimization problem, which has a natural corresponding
decision problem asking if there is a scenario with a given number of DCJ op-
erations. So we may write that #MPDCJ ∈ #P, which means that #MPDCJ
asks for the number of witnesses of the decision problem: “Is there a scenario
for G1 and G2 of size dDCJ (G1 , G2 )?”
Before turning to approximating the number of solutions, first we give an
overview of how to find one solution. Here, the following combinatorial object
plays a central role.
Definition 70. The adjacency graph G(V1 ∪ V2 , E) of two genomes G1 and
G2 with the same edge label set is a bipartite multigraph with V1 being the
set of adjacencies and telomeres of G1 , V2 being the set of adjacencies and
telomeres of G2 . The number of edges between u ∈ V1 and v ∈ V2 is the
number of extremities they share.
Observe that the adjacency graph is a bipartite multigraph which falls into
disjoint cycles and paths. The paths might belong to one of three types:
1. an odd path, containing an odd number of edges and an even number
of vertices;
2. an even path with two endpoints in V1 ; we will call them W -shaped
paths; or
3. an even path with two endpoints in V2 ; we will call them M -shaped
paths.
In addition, cycles with two edges and paths with one edge are called trivial
components. We can use the adjacency graph to obtain the DCJ distance
between two genomes.
Theorem 92. [185, 17]
$$d_{DCJ}(G_1, G_2) = N - C + \frac{I}{2} \qquad (8.38)$$
where N is the number of markers, C is the number of cycles in the adjacency
graph of G1 and G2 , and I is the number of odd paths in the adjacency graph
of G1 and G2 .
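Once the adjacency graph is decomposed into components, Equation (8.38) is immediate to evaluate; a trivial sketch:

```python
def dcj_distance(num_markers, num_cycles, num_odd_paths):
    """DCJ distance via Equation (8.38): d = N - C + I/2.

    The number of odd paths in an adjacency graph is always even (each
    genome has an even number of telomeres), so the result is an integer.
    """
    return num_markers - num_cycles + num_odd_paths // 2
```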
Definition 71. The $\#\mathrm{MPDCJ}_{MW}$ problem asks for the number of DCJ scenarios between two genomes when their adjacency graph contains only trivial components and M- and W-shaped paths.
The correspondence between solutions for $\#\mathrm{MPDCJ}_{MW}$ and #MPDCJ is stated by the following lemma.
Lemma 22. It holds that
$$\#\mathrm{MPDCJ}(G_1, G_2) = \frac{d_{DCJ}(G_1, G_2)!}{d_{DCJ}(G_1^*, G_2)!\,\prod_i (c_i-1)!\,\prod_j (l_j-1)!} \times \prod_i c_i^{c_i-2}\prod_j l_j^{l_j-2} \times \#\mathrm{MPDCJ}_{MW}(G_1^*, G_2). \qquad (8.40)$$
Proof. As M - and W -shaped paths and other components are always treated
independently, we have
$$\#\mathrm{MPDCJ}(G_1, G_2) = \binom{d_{DCJ}(G_1, G_2)}{d_{DCJ}(G_1^*, G_2)} \times \#\mathrm{MPDCJ}(G_1, G_1^*) \times \#\mathrm{MPDCJ}_{MW}(G_1^*, G_2).$$
For the genomes $G_1$ and $G_1^*$, whose adjacency graph does not contain M- and W-shaped paths, we have from [24] and [139] that
$$\#\mathrm{MPDCJ}(G_1, G_1^*) = \prod_i c_i^{c_i-2}\prod_j l_j^{l_j-2} \times \frac{d_{DCJ}(G_1, G_1^*)!}{\prod_i (c_i-1)!\,\prod_j (l_j-1)!}.$$
These two equations together with Equation (8.39) give the result.
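Since every factor in Equation (8.40) is an explicitly computable integer quantity, the transformation can be carried out exactly; a hedged sketch (the argument names are ours, and c and l are the lists of the component parameters $c_i$ and $l_j$ from the cited formula, each assumed to be at least 2):

```python
from fractions import Fraction
from math import factorial

def mpdcj_from_mw(d_full, d_mw, c, l, count_mw):
    """Evaluates Equation (8.40): #MPDCJ(G1, G2) from #MPDCJ_MW(G1*, G2).

    d_full = d_DCJ(G1, G2) and d_mw = d_DCJ(G1*, G2).  Exact rational
    arithmetic is used because individual factors need not be integers.
    """
    factor = Fraction(factorial(d_full), factorial(d_mw))
    for ci in c:
        factor *= Fraction(ci ** (ci - 2), factorial(ci - 1))
    for lj in l:
        factor *= Fraction(lj ** (lj - 2), factorial(lj - 1))
    result = factor * count_mw
    assert result.denominator == 1   # the scenario count is an integer
    return int(result)
```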
The following theorem says that the hardness of the #MPDCJ problem is the same as that of the $\#\mathrm{MPDCJ}_{MW}$ problem.
Theorem 93. $\#\mathrm{MPDCJ} \in \mathrm{FPRAS}$ if and only if $\#\mathrm{MPDCJ}_{MW} \in \mathrm{FPRAS}$.
Proof. Both the multinomial factor and the two products in Equation (8.40)
can be calculated in polynomial time. Thus the transformation between the
solutions to the two different counting problems is a single multiplication or
division by an exactly calculated number. This proves that $\#\mathrm{MPDCJ}_{MW}$
• Merge the two paths constructed in the first two steps, according to the drawn sequence of 0s and 1s.
Note that the DCJ scenario obtained transforms $G_1$ into $G_2$. Let us denote the distribution of paths generated by this algorithm by p′, and the uniform distribution over all possible DCJ scenarios between $G_1$ and $G_2$ by U′. Let $X_s$ denote the set of all possible scenarios drawn by the above algorithm using a specific scenario s between $G_1^*$ and $G_2$. Then
$$\sum_{s' \in X_s} |p'(s') - U'(s')| = |p(s) - U(s)|. \qquad (8.45)$$
Theorem 97. Let $T(k_1, k_2)$ denote the number of DCJ scenarios jointly sorting a W- and an M-shaped path with, respectively, $k_1$ and $k_2$ vertices in $G_1$. Let $I(k_1, k_2)$ denote the number of scenarios independently sorting the same paths. We have that
$$\frac{T(k_1, k_2)}{I(k_1, k_2)} = O\left(\frac{k_1^{1.5}\, k_2^{1.5}}{(k_1 + k_2)^{1.5}}\right) \qquad (8.47)$$
$$\frac{I(k_1, k_2)}{T(k_1, k_2)} = O(k_1 + k_2). \qquad (8.48)$$
We first show that sampling DCJ scenarios from the uniform distribution
is equivalent to sampling matchings of Kn,m from the distribution θ.
Theorem 98. Let a distribution q over the scenarios of n W -shaped paths
and m M -shaped paths be defined by the following algorithm.
• Draw a random matching M of Kn,m following a distribution p.
• Draw a random M-compatible DCJ scenario from the uniform distribu-
tion of all M-compatible ones.
Then
$$d_{TV}(p, \theta) = d_{TV}(q, U) \qquad (8.55)$$
where θ is the distribution defined in Equation (8.54), and U denotes the
uniform distribution over all DCJ scenarios.
Proof. It holds that
$$d_{TV}(q, U) = \frac{1}{2}\sum_{x\ \text{scenario}} |q(x) - U(x)|.$$
$$\frac{1}{2nm}\cdot\frac{f(\mathcal{M}_{\mathrm{new}})}{f(\mathcal{M})}.$$
$\mathcal{M}$ and $\mathcal{M}_{\mathrm{new}}$ differ in at most one edge $M_iW_i$, and on this edge, according to Theorem 97, the ratio of the numbers of scenarios jointly and independently sorting $M_i$ and $W_i$ is polynomially bounded. Furthermore, the combinatorial factors appearing in $f(\mathcal{M})$ and $f(\mathcal{M}_{\mathrm{new}})$ due to merging the sorting steps on different components are the same. So $\frac{f(\mathcal{M}_{\mathrm{new}})}{f(\mathcal{M})}$ as well as its inverse are polynomially bounded.
We now prove the rapid convergence of this Markov chain using a mul-
ticommodity flow technique. To prove that the Markov chain we defined on
bipartite matchings has a polynomial relaxation time, we need to construct a
path system Γ on the set of matchings of Kn,m , such that κΓ is bounded by
a polynomial in N , the number of markers in G1 and G2 .
In our case the path system between two matchings X and Y is a unique
path with probability 1. Here is how we construct it.
Fix a total order on the vertex set of Kn,m . Take the symmetric difference
of X and Y, denoted by X ∆Y. It is a set of disjoint paths and cycles. Define an
We then have to show that $\sum \frac{\theta(x)\theta(y)}{\theta(u)}$ can be bounded by a polynomial in N. Let $Z \to Z'$ be an edge on the path from $\mathcal{X}$ to $\mathcal{Y}$. We define
$$\widetilde{\mathcal{M}} := \mathcal{X}\Delta\mathcal{Y}\Delta\mathcal{Z}. \qquad (8.59)$$
$$\frac{f(\mathcal{X})f(\mathcal{Y})}{f(\mathcal{Z})f(\widetilde{\mathcal{M}})} = O(\mathrm{poly}(N)). \qquad (8.62)$$
It proves the lemma, as θ(·) and f(·) differ only in a normalizing constant.
$\widetilde{\mathcal{M}}\Delta\mathcal{Z}$ differs in at most two edges from $\mathcal{X}\Delta\mathcal{Y}$. These edges appear in $\mathcal{X}\Delta\mathcal{Y}$, but not in $\widetilde{\mathcal{M}}\Delta\mathcal{Z}$. The two vertices of any missing edge correspond to components which are independently sorted either in $\mathcal{Z}$ or $\widetilde{\mathcal{M}}$, but jointly in either $\mathcal{X}$ or $\mathcal{Y}$. Amongst these two vertices, one corresponds to a W-shaped component A, the other to an M-shaped component B. Let $k_1$ be the number of adjacencies and telomeres of $G_1$ in A, and $k_2$ the number of adjacencies and telomeres of $G_1$ in B. The ratio on the left-hand side of Equation (8.62) due to such a difference is
$$\frac{T(k_1, k_2)}{(k_1+k_2+1)!} \bigg/ \frac{I(k_1)\,I'(k_2)}{k_1!\,(k_2+1)!}, \qquad (8.63)$$
where
$$I(k_1)\,I'(k_2)\binom{k_1+k_2+1}{k_1} = I(k_1, k_2). \qquad (8.64)$$
and as $\sum_{\widetilde{\mathcal{M}}} \theta(\widetilde{\mathcal{M}}) = 1$, $\kappa_\Gamma$ is bounded by a polynomial in N. This proves the theorem.
Using this result, we can prove the following theorem:
Theorem 100. It holds that #MPDCJM W ∈ FPAUS.
Proof. The above-defined Markov chain on partial matchings is an aperiodic, irreducible and reversible Markov chain, with only positive eigenvalues. Furthermore, a step can be performed in running time that is polynomial in the size of the graph. We claim that for any start state i, log(1/θ(i)) is polynomially bounded in the size of the corresponding genomes $G_1^*$ and $G_2$. Indeed, there are $O(N^2)$ DCJ operations, the length of the DCJ paths is less than N, and thus the number of sorting DCJ paths is $O(N^{2N})$, and the inverse of the probability of any partial matching is less than this. Thus, the relaxation time is polynomial in both N and log(1/ε). This means that in fully polynomial running time (polynomial both in N and −log(ε)) a random partial matching can be generated from a distribution p satisfying
$$d_{TV}(p, \theta) \le \varepsilon. \qquad (8.65)$$
But then a random DCJ path can be generated in fully polynomial running time following a distribution q satisfying
$$d_{TV}(q, U) \le \varepsilon \qquad (8.66)$$
which interchanges the colors in $C_X$ and $C_Y'$, and fixes all other colors.
It is easy to see that the coupling event happens when $|D_t| = 0$. The cardinality of $D_t$ can change by at most one. First we consider the case when the size of $D_t$ increases. This can only happen if the selected vertex, v, is in A. Then the permutation is selected in line (b), and c′ must be in $C_Y$. Since $|C_Y| \le d'(v)$, we get that
$$P(|D_{t+1}| = |D_t| + 1) \le \frac{1}{n}\sum_{v \in A}\frac{d'(v)}{k} = \frac{m'}{nk}, \qquad (8.68)$$
Since the probability that $D_t$ increases is smaller than the probability that $D_t$ decreases, the Markov chain couples rapidly. We can give an estimation of the expectation of $|D_t|$:
$$E(|D_{t+1}|) \le (|D_t|+1)\frac{m'}{nk} + \left(1 - \frac{m'}{nk} - \frac{(k-2\Delta)|D_t| + 2m'}{kn}\right)|D_t| + (|D_t|-1)\frac{(k-2\Delta)|D_t| + 2m'}{kn}$$
$$= \left(1 - \frac{k-2\Delta}{kn}\right)|D_t| - \frac{m'}{kn} \le \left(1 - \frac{k-2\Delta}{kn}\right)|D_t|. \qquad (8.70)$$
That is,
$$E(|D_t|) \le \left(1 - \frac{k-2\Delta}{kn}\right)^t |D_0| \le \left(1 - \frac{k-2\Delta}{kn}\right)^t n. \qquad (8.71)$$
Since $|D_t|$ is a random variable on the non-negative integers, we have that
$$P(D_t \ne 0) \le n\left(1 - \frac{k-2\Delta}{kn}\right)^t \le n\, e^{-\frac{k-2\Delta}{kn}t}. \qquad (8.72)$$
We get that $P(|D_t| \ne 0) \le \varepsilon$ when $t \ge \frac{kn}{k-2\Delta}\log\frac{n}{\varepsilon}$. Since the coupling time upper bounds the relaxation time, we get that the relaxation time is bounded by a polynomial of n and −log(ε); that is, the proposed Markov chain provides an FPAUS sampler. It is also easy to show that the number of k-colorings is in FPRAS if k ≥ 2∆ + 1 [102]; see also Exercises 15 and 16.
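For reference, one step of the Glauber dynamics analyzed by this coupling can be sketched as follows (a minimal version written by us; the chain itself is defined before this excerpt):

```python
import random

def glauber_step(coloring, adj, k):
    """One step of Glauber dynamics on proper k-colorings.

    coloring maps vertex -> color in {0, ..., k-1}; adj maps vertex ->
    iterable of neighbors.  For k >= 2*Delta + 1 the coupling above
    yields a coupling time of O(kn/(k - 2*Delta) * log(n/eps)).
    """
    v = random.choice(list(coloring))            # uniform random vertex
    c = random.randrange(k)                      # uniform random color
    if all(coloring[u] != c for u in adj[v]):    # recolor only if it stays proper
        coloring = dict(coloring)                # do not mutate the input
        coloring[v] = c
    return coloring
```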
• Milena Michail and Peter Winkler gave an FPAUS and FPRAS for ap-
proximately sampling and counting Eulerian orientations in any Eulerian
unoriented graph [129].
• Eric Vigoda improved the bound of the Glauber dynamics Markov chain to $k > \frac{11}{6}\Delta$. Thomas P. Hayes and Eric Vigoda proved that the same Markov chain is rapidly mixing on the k-colorings of graphs for $k > (1+\varepsilon)\Delta$ for all ε > 0, whenever ∆ = Ω(log(n)) and the graph does not contain any cycle shorter than 11 [93]. Martin Dyer and Catherine Greenhill introduced a slightly different Markov chain where an edge is sampled uniformly, and the colors of the vertices incident to the selected edge are changed. They obtained a significantly better mixing time compared to the mixing time of the Glauber dynamics when k = 2∆ [58].
• The number of independent sets in a graph is a #P-complete counting problem, even if the maximum degree is 3 [59]. It is still possible to sample independent sets of such graphs almost uniformly [123, 59].
8.4 Exercises
1. Using a rounding technique similar to that presented in Subsection 8.1.1,
give a deterministic approximation algorithm for the following problem.
The input is a set of weights, $A = \{w_1, w_2, \ldots, w_n\}$, and a weight W; the output is the subset
$$\arg\max_{S \subseteq A}\left\{\sum_{w \in S} w \ \middle|\ \sum_{w \in S} w \le W\right\}.$$
The running time of the algorithm must be polynomial in n and $\frac{1}{\varepsilon}$, where 1 + ε is the approximation rate of the solution.
2. Prove Observation 2.
3. ◦ Prove that the number of trees realizing degree sequence $D = d_1, d_2, \ldots, d_n$ in which $v_i$ is adjacent to a prescribed leaf and $v_j$ is also adjacent to a prescribed leaf is
$$\frac{(n-4)!}{(d_i - 2)!\,(d_j - 2)!\prod_{k \ne i,j}(d_k - 1)!}.$$
give an alternative proof that the volume of the poset polytope of a total ordering of n elements is indeed $\frac{1}{n!}$.
5. * Find a graph G = (V, E) and two matchings X and Y on it such that the canonical path between X and Y as described in Subsection 8.2.2 has length $\frac{3n}{5}$, where n = |V|.
using the fact that there is a rapidly mixing Markov chain converging to the distribution
$$\pi(M) \propto \prod_{e \in M} w(e).$$
14. * Give an example that the Glauber dynamics Markov chain might not
be irreducible when k = ∆ + 1.
15. ◦ Let G = (V, E) be a simple graph, and let $G' = G \setminus \{e\}$ for some $e \in E$. Let $\Omega_k(G)$ denote the set of k-colorings of G. Furthermore, let $k \ge 2\Delta + 1$, where ∆ is the maximum degree in G. Show that
$$\frac{\Delta+1}{\Delta+2} \le \frac{|\Omega_k(G)|}{|\Omega_k(G')|} \le 1.$$
Design an algorithm that estimates this ratio via sampling k-colorings of G′.
16. It is easy to see that $|\Omega_k(G)|$ can be estimated as
$$|\Omega_k(G_0)|\prod_{i=0}^{m-1}\frac{|\Omega_k(G_{i+1})|}{|\Omega_k(G_i)|},$$
where each $G_i$ contains one less edge than $G_{i+1}$, $G_0$ is the empty graph, and $G_m = G$ (that is, m is the number of edges in G). How well should each fraction be estimated to get an FPRAS for $|\Omega_k(G)|$?
8.5 Solutions
Exercise 3. Observe that the number of trees with the prescribed conditions is the number of trees realizing D′, where D′ is obtained from D by subtracting one from each of $d_i$ and $d_j$ and deleting two degree-1 entries.
Exercise 5. Let G be P5 , that is, the path on 5 vertices. Let X be the
matching containing the first and third edges of P5 , and let Y be the matching
containing the second and the fourth edges of P5 . In the canonical path from
X to Y , the first edge is deleted in the first step. Then the second edge is
added and the third is deleted in the second step. Finally, in the third step, the fourth edge of $P_5$ is added.
Exercise 7. Observe that
[1] http://www.claymath.org/sites/default/files/pvsnp.pdf.
[2] T. van Aardenne-Ehrenfest and N. G. de Bruijn. Circuits and trees in oriented linear graphs. Wis- en Natuurkundig Tijdschrift, 28:203–217, 1951.
[3] M. Agrawal, N. Kayal, and N. Saxena. PRIMES is in P. Annals of
Mathematics, 160(2):781–793, 2004.
[4] Y. Ajana, J.F. Lefebvre, E. Tillier, and N. El-Mabrouk. Exploring the
set of all minimal sequences of reversals - An application to test the
replication-directed reversal hypothesis. In R. Guigo and D. Gusfield,
editors, Proceedings of the 2nd International Workshop on Algorithms
in Bioinformatics, volume 2452 of Lecture Notes in Computer Science,
pages 300–315. Springer, 2002.
[5] D.J. Aldous. Some inequalities for reversible Markov chains. Journal of
the London Mathematical Society (2), 25(3):564–576, 1982.
[6] D.J. Aldous. Random walks on finite groups and rapidly mixing Markov
chains. In Séminaire de Probabilites XVII, volume 986 of Lecture Notes
in Mathematics, pages 243–297. Springer, 1983.
[7] N. M. Amato, M. T. Goodrich, and E. A. Ramos. A randomized algo-
rithm for triangulating a simple polygon in linear time. Discrete and
Computational Geometry, 26(2):245–265, 2001.
[8] S. Arora and B. Barak. Computational Complexity: A Modern Approach.
Cambridge University Press, 2009.
[9] L. Babai. Graph isomorphism in quasipolynomial time, 2015. arXiv:
1512.03547.
[10] L. Babai and E.M. Luks. Canonical labeling of graphs. In Proceedings
of the 15th Annual ACM Symposium on Theory of Computing, pages
171–183, 1983.
[11] D.A. Bader, B.M.E. Moret, and M. Yan. A linear-time algorithm
for computing inversion distance between signed permutations with an
experimental study. Journal of Computational Biology, 8(5):483–491,
2001.
[16] L.E. Baum and G.R. Sell. Growth functions for transformations on
manifolds. Pacific Journal of Mathematics, 27(2):211–227, 1968.
[17] A. Bergeron, J. Mixtacki, and J. Stoye. A unifying view of genome
rearrangements. In Proceedings of the 6th Workshop on Algorithms
in Bioinformatics, volume 4175 of Lecture Notes in Computer Science,
pages 163–173. Springer, 2006.
[18] S.J. Berkowitz. On computing the determinant in small parallel time
using a small number of processors. Information Processing Letters,
18:147–150, 1984.
[24] M.D.V. Braga and J. Stoye. The solution space of sorting by DCJ.
Journal of Computational Biology, 17(9):1145–1165, 2010.
[29] R. Bubley and M. Dyer. Path coupling: A technique for proving rapid
mixing in Markov chains. In Proceedings of the 38th Annual Symposium
on Foundations of Computer Science, pages 223–231, 1997.
[30] G. Buffon. Essai d'arithmétique morale. Histoire naturelle, générale et particulière, Supplément 4:46–123, 1777.
[31] J.-Y. Cai. Holographic algorithms: Guest column. SIGACT News,
39(2):51–81, 2008.
[32] J-Y. Cai and X. Chen. Complexity Dichotomies for Counting Problems:
Volume 1, Boolean Domain. Cambridge University Press, 2017.
[33] J-Y. Cai, H. Guo, and T. Williams. The complexity of counting edge
colorings and a dichotomy for some higher domain Holant problems.
Research in the Mathematical Sciences, 3:18, 2016.
[34] J-Y. Cai and P. Lu. Holographic algorithms: The power of dimension-
ality resolved. In L. Arge, C. Cachin, Jurdziński T., and A. Tarlecki,
editors, Proceedings of the 34th International Colloquium on Automata,
Languages and Programming, volume 4596 of Lecture Notes in Computer
Science, pages 631–642, 2007.
[35] J-Y. Cai and P. Lu. On symmetric signatures in holographic algorithms.
In W. Thomas and P. Weil, editors, Proceedings of the 24th Annual
Symposium on Theoretical Aspects of Computer Science, volume 4393
of Lecture Notes in Computer Science, pages 429–440, 2007.
[36] J-Y. Cai and P. Lu. Basis collapse in holographic algorithms. Compu-
tational Complexity, 17(2):254–281, 2008.
[38] J.-Y. Cai, P. Lu, and M. Xia. Holographic algorithms with matchgates
capture precisely tractable planar #csp. SIAM Journal on Computing,
46(3):853–889, 2017.
[39] A. Cauchy. Memoire sur le nombre de valeurs qu’une fonction peut
obtenir. J. de l’Ecole Polytechnique X, pages 51–112, 1815.
[40] B. Chazelle. Triangulating a simple polygon in linear time. Discrete and
Computational Geometry, 6(3):485–524, 1991.
[41] N. Chomsky. Transformational Analysis. PhD thesis, University of
Pennsylvania, 1955.
[42] N. Chomsky. On certain formal properties of grammars. Information
and Control, 2:137–167, 1959.
[43] D. A. Christie. Sorting permutations by block-interchanges. Information
Processing Letters, 60:165–169, 1996.
[44] S. Cook. The complexity of theorem proving procedures. In Proceedings
of the 3rd Annual ACM Symposium on Theory of Computing, pages
151–158, 1971.
[45] C. Cooper, M. Dyer, and C. Greenhill. Sampling regular graphs and
a peer-to-peer network. Combinatorics, Probability and Computing,
16(4):557–593, 2007.
[46] C. Cooper, M. Dyer, C. Greenhill, and A. Handley. The flip Markov
chain for connected regular graphs, 2017. arXiv:1701.03856.
[47] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic progressions. Journal of Symbolic Computation, 9(3):251–280, 1990.
[48] P. Creed. Counting and Sampling Problems on Eulerian Graphs. PhD thesis, University of Edinburgh, 2010.
[49] P. Creed. Sampling Eulerian orientations of triangular lattice graphs.
Journal of Discrete Algorithms, 7(2):168–180, 2009.
[50] É. Czabarka, A. Dutle, P.L. Erdős, and I. Miklós. On realizations of
a joint degree matrix. Discrete Applied Mathematics, 181(30):283–288,
2014.
[51] A.M. Davie and A.J. Stothers. Improved bound for complexity of matrix
multiplication. Proceedings of the Royal Society of Edinburgh Section A,
143(2):351–369, 2013.
[52] P. Diaconis and L. Saloff-Coste. Comparison theorems for reversible
Markov chains. The Annals of Applied Probability, 3(2):696–730, 1993.
[69] G.D. Forney. The Viterbi algorithm. Proceedings of the IEEE, 61:268–
278, 1973.
[70] A. Frank. Paths, flows, and VLSI-Layout, chapter Packing paths, cir-
cuits and cuts: A survey. Springer, Berlin, 1990.
[84] C. Greenhill. The switch Markov chain for sampling irregular graphs. In
Proceedings of the 26th ACM SIAM Symposium on Discrete Algorithms,
New York-Philadelphia, pages 1564–1572, 2015.
[85] C. Greenhill and M. Sfragara. The switch Markov chain for sampling
irregular graphs and digraphs. Theoretical Computer Science, 719:1–20,
2018.
[86] L. Gross. Logarithmic Sobolev inequalities. American Journal of Math-
ematics, 97(4):1061–1083, 1975.
[87] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Sci-
ence and Computational Biology, chapter Maximum parsimony, Steiner
trees, and perfect phylogeny, page 470. Cambridge University Press,
1997.
[88] S.L. Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph I. Journal of the Society for Industrial and Applied Mathematics, 10:496–506, 1962.
[124] M. Mahajan and V. Vinay. Old algorithms, new insights. SIAM Journal
on Discrete Mathematics, 12:474–490, 1999.
[125] R. Martin and D. Randall. Disjoint decomposition of Markov chains
and sampling circuits in Cayley graphs. Combinatorics, Probability and
Computing, 15:411–448, 2006.
[126] J.S. McCaskill. The equilibrium partition function and base pair binding
probabilities for RNA secondary structure. Biopolymers, 29:1105–1119,
1990.
[127] M.D.V. Braga and J. Stoye. Counting all DCJ sorting scenarios. In F.D. Ciccarelli and I. Miklós, editors, Proceedings of the 6th RECOMB Comparative Genomics Workshop, volume 5817 of Lecture Notes in Computer Science, pages 36–47. Springer, Berlin, Heidelberg, 2009.
[128] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21(6):1087–1092, 1953.
[129] M. Michail and P. Winkler. On the number of Eulerian orientations of
a graph. Algorithmica, 16(4/5):402–414, 1996.
[130] I. Miklós, P.L. Erdős, and L. Soukup. Towards random uniform sampling
of bipartite graphs with given degree sequence. Electronic Journal of
Combinatorics, 20(1):P16, 2013.
[131] I. Miklós, B. Mélykúti, and K. Swenson. The Metropolized partial im-
portance sampling MCMC mixes slowly on minimum reversal rearrange-
ment paths. ACM/IEEE Transactions on Computational Biology and
Bioinformatics, 4(7):763–767, 2010.
[132] I. Miklós, I.M. Meyer, and B. Nagy. Moments of the Boltzmann distri-
bution for RNA secondary structures. Bulletin of Mathematical Biology,
67:1031–1047, 2005.
[133] I. Miklós and H. Smith. The computational complexity of calculating
partition functions of optimal medians with hamming distance, 2017.
https://arxiv.org/abs/1506.06107.
[141] L. Pauling. The structure and entropy of ice and of other crystals
with some randomness of atomic arrangement. Journal of the Amer-
ican Chemical Society, 57(12):2680–2684, 1935.
[142] G. Pólya. Aufgabe 424. Arch. Math. Phys., 20:271, 1913.
[143] J.G. Propp and D.B. Wilson. Coupling from the past: A user’s guide. In
D. Aldous and J. Propp, editors, Microsurveys in Discrete Probability,
volume 41 of DIMACS Series in Discrete Mathematics and Theoretical
Computer Science, pages 181–192, 1998.
[144] J.S. Provan and M.O. Ball. The complexity of counting cuts and of
computing the probability that a graph is connected. SIAM Journal on
Computing, 12(4):777–788, 1983.
[145] L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[146] N. Robertson, P.D. Seymour, and R. Thomas. Permanents, Pfaffian
orientations, and even directed circuits. The Annals of Mathematics,
150:929–975, 1999.
[158] A.J. Sinclair. Improved bounds for mixing rates of Markov chains
and multicommodity flow. Combinatorics, Probability and Computing,
1(4):351–370, 1992.
[159] M. Sipser. Introduction to the Theory of Computation, page 99. PWS,
1996.
[160] M. Sipser. Introduction to the Theory of Computation. Cengage Learning, 3rd edition, 2012.
[161] L.G. Stockmeyer. The complexity of approximate counting. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, pages 118–126, 1983.
[162] V. Strassen. Gaussian elimination is not optimal. Numerische Mathe-
matik, 13(4):354–356, 1969.
[163] R.L. Stratonovich. Conditional Markov processes. Theory of Probability
and its Applications, 5(2):156–178, 1960.
[164] S. Straub, T. Thierauf, and F. Wagner. Counting the number of perfect matchings in $K_5$-free graphs. In Electronic Colloquium on Computational Complexity, 2014.
[165] K.M. Swenson, G. Badr, and D. Sankoff. Listing all sorting reversals in
quadratic time. Algorithms for Molecular Biology, 6:11, 2011.
[166] E. Tannier, A. Bergeron, and M.-F. Sagot. Advances on sorting by
reversals. Discrete Applied Mathematics, 155(6–7):881–888, 2007.
[167] H. N. V. Temperley and Michael E. Fisher. Dimer problem in statistical
mechanics: An exact result. Philosophical Magazine, 6(68):1061–1063,
1961.
[173] S.P. Vadhan. The complexity of counting in sparse, regular and planar
graphs. SIAM Journal on Computing, 31(2):398–427, 2001.
[174] L. G. Valiant. Accidental algorithms. In Proceedings of the 47th Annual
IEEE Symposium on Foundations of Computer Science, pages 509–517.
[187] M. Zucker and D. Sankoff. RNA secondary structures and their predic-
tion. Bulletin of Mathematical Biology, 46:591–621, 1984.
Index
A
Adjacency graph, 342
Algebraic dynamic programming, sampling with, 256–260
Algebraic dynamic programming and monotone computations, 35–122
    algebraic dynamic programming, introduction to, 36–51
    ambiguous grammar, 60, 73, 78
    base pairs, 80
    binary search tree, 96, 114
    Boltzmann distribution, 83, 84
    Boolean algebra, 56
    Catalan number, 40, 41, 78, 110
    Chomsky hierarchy, 60
    Chomsky normal form, 74, 76
    co-emission pattern, 72
    coin system, 42–43
    computational problem, 48
    context-free grammars, 73–85
    counting, optimizing, deciding, 51–59
    counting the coin sequences summing up to a given amount, 49
    counting the coin sequences when the order does not count, 51
    counting the total sum of weights, 50–51
    distributive rule, 46, 58
    dual tropical semiring, 60
    Dyck words, 78–80, 93
    edit distance of sequences, 67
    evaluation algebras, 36, 38
    exercises, 91–99
    Felsenstein's algorithm, 89
    Fibonacci numbers, 36
    finding the recursion for optimization, 49–50
    formal definition of algebraic dynamic programming, 45–48
    Forward algorithm, 65
    Gale-Ryser theorem, 99
    gap character, 66
    graphical sequences, 99
    greedy algorithm, 90
    Hamiltonian path, 87
    homomorphism, 54, 57
    Knudsen-Hein grammar, 81
    Kronecker delta function, 105
    left neighbor, 107
    legal sequence, 101
    legitimate code word, 39
    limitations of algebraic dynamic programming approach, 89–91
    longest common subsequence, 67
    matrix multiplication, 90
    maximum matching, 116
    monoid semiring, 52
    non-terminals, 60, 73
    outermost base pairs, 81
    palindromic supersequence, 117
    partition function, 83
    partition polynomial, 44, 49
    Pascal's triangle, 37
    power of algebraic dynamic programming (money change problem), 48–51
    pseudo-knot free secondary structure, 80
C
Canonical path method, 291, 329
Catalan number, 40, 41, 78, 110, 269
Caterpillar binary tree, 317
Chebyshev functions, 189, 191
Cheeger's inequality, 278, 287, 327
Chernoff's inequality, 17, 300, 306
Cherry motif (elementary subtree), 180
Chinese remainder theorem, 192
Chomsky hierarchy, 60
Chromosomes, 341
Clow (closed ordered walk), 124, 130
Co-emission pattern, 72
Coin system, 42–43
Commutative ring, 25, 50, 129, 137, 219
Computational complexity, background on, see Background on computational complexity
Computational problem, definition of, 2
Conjunctive normal form (CNF), 6
Constraint Satisfaction Problems (CSP), 240
Context-free grammar, 73–85
Convex body, computing the volume of, 11–14
Counting, computational complexity of
    algebraic dynamic programming and monotone computations, 35–122
    holographic algorithms, 217–244

D
Decision problem, 2
    computational problem, 48
    deterministic, 4–7
    MPDCJ scenario problem, 342
    NP-complete, 24, 234
    3DNF, 186
    #3SAT, 167
Deterministic counting, 8–11
    bipartite graph, 9
    computational problem, 8
    Euclidean space, 11
    function problem, 8
    partially ordered set, 10
    #P-complete counting problem, 10
    permanent, 9
    poset polytope, 11
    #P problems, 8
Deterministic decision problems, 4–7
    Boolean logic, 6
    computational complexity theory, 4
    Hamiltonian path, 7
    k-clique problem, 5
    NP-complete problems, 6
    polynomial reduction, 5
    satisfying assignment, 6
Directed graphs
    adjacency matrix of, 170
    BEST algorithm and, 139
    closed ordered walk and, 124
    construction, 21
    edge-weighted, 189
    Eulerian, 141, 210
    Hamiltonian path, 7
    Metropolis-Hastings Markov chain, 352
    Most Parsimonious DCJ scenario problem, 342
    pigeonhole rule, 334
    RSO Markov chain, 335
    swap Markov chain, 339
    swap operation, 330
    sweeping process, 331
    telomeres, 340
    transitions of Markov chain, 326
    tree degree sequences, 324
    trivial components, 342
Matchgates, 218
Matching polynomial, 200, 203
Matrix multiplication, 90
Median sequences, 196
Metropolis-Hastings algorithm, 266, 350
Metropolis-Hastings Markov chain, 352
#MinDegree1-2Trees-DegreePacking, 323
Money change problem, 48–51
Monoid semiring, 52
Monotone computations, 25
Most Parsimonious DCJ (MPDCJ) scenario problem, 342

N
Non-terminal characters, 60
Not-all-equal clause, 234
NP-complete problems, 6, 9, 24, 241

O
Optimization problems, 1, 2, 358
    MPDCJ, 342
    NP-hard, 86
    #P-complete, 211
    solvable with algebraic dynamic programming, 59
    SP-Tree problem, 174
Oracle, 12

P
Palindromic supersequence, 117
Papadimitriou's theorem, 16
Parse tree, 259
Parsimonious reduction, 166
Partially ordered set, 10
Partition polynomial, 44, 49
Pascal's triangle, 37
#P-complete counting problems, 165–216
    ambiguous grammar, 187
    approximation-preserving #P-complete proofs, 167–186
    approximation-preserving reductions, 166
    #BIS-complete problems, 211
    blown-up subtree, 181
    Boolean formula, 210
    BPP algorithm, 185
    calculating the permanent of arbitrary matrix, 170–174
    Chebyshev functions, 191
    cherry motif, 180
    Chinese remainder theorem, 192
    computing the permanent of non-negative matrix and counting perfect matchings in bipartite graphs, 188–190
    counting the linear extensions of a poset, 191–195
    counting the most parsimonious substitution histories on evolutionary tree, 174–184
    counting the most parsimonious substitution histories on star tree, 195–199
    counting the (not necessarily perfect) matchings of bipartite graph, 190–191
    counting the (not necessarily perfect) matchings in planar graph, 199–204