
Computational Complexity of Counting and Sampling

István Miklós
Rényi Institute, Budapest, Hungary
CRC Press
Taylor & Francis Group
6000 Broken Sound Parkway NW, Suite 300
Boca Raton, FL 33487-2742

© 2019 by Taylor & Francis Group, LLC


CRC Press is an imprint of Taylor & Francis Group, an Informa business

No claim to original U.S. Government works

Printed on acid-free paper


Version Date: 20190201

International Standard Book Number-13: 978-1-138-03557-7 (Paperback)


International Standard Book Number-13: 978-1-138-07083-7 (Hardback)

This book contains information obtained from authentic and highly regarded sources. Reason-
able efforts have been made to publish reliable data and information, but the author and publisher
cannot assume responsibility for the validity of all materials or the consequences of their use. The
authors and publishers have attempted to trace the copyright holders of all material reproduced in
this publication and apologize to copyright holders if permission to publish in this form has not
been obtained. If any copyright material has not been acknowledged please write and let us know
so we may rectify in any future reprint.

Except as permitted under U.S. Copyright Law, no part of this book may be reprinted, reproduced,
transmitted, or utilized in any form by any electronic, mechanical, or other means, now known or
hereafter invented, including photocopying, microfilming, and recording, or in any information
storage or retrieval system, without written permission from the publishers.

For permission to photocopy or use material electronically from this work, please access www.
copyright.com (http://www.copyright.com/) or contact the Copyright Clearance Center, Inc.
(CCC), 222 Rosewood Drive, Danvers, MA 01923, 978-750-8400. CCC is a not-for-profit organiza-
tion that provides licenses and registration for a variety of users. For organizations that have been
granted a photocopy license by the CCC, a separate system of payment has been arranged.

Trademark Notice: Product or corporate names may be trademarks or registered trademarks, and
are used only for identification and explanation without intent to infringe.

Library of Congress Cataloging-in-Publication Data

Names: Miklos, Istvan (Mathematician), author.


Title: Computational complexity of counting and sampling / Istvan Miklos.
Description: Boca Raton : Taylor & Francis, 2018. | Includes bibliographical
references.
Identifiers: LCCN 2018033716 | ISBN 9781138035577 (pbk.)
Subjects: LCSH: Computational complexity. | Sampling (Statistics)
Classification: LCC QA267.7 .M55 2018 | DDC 511.3/52--dc23
LC record available at https://lccn.loc.gov/2018033716

Visit the Taylor & Francis Web site at


http://www.taylorandfrancis.com

and the CRC Press Web site at


http://www.crcpress.com
To the memory of my beloved wife, Ágnes Nyúl
(1972–2018)
Contents

Preface xi

List of Figures xiii

List of Tables xvii

1 Background on computational complexity 1

1.1 General overview of computational problems . . . . . . . . . 2


1.2 Deterministic decision problems: P, NP, NP-complete . . . . 4
1.3 Deterministic counting: FP, #P, #P-complete . . . . . . . . 8
1.4 Computing the volume of a convex body, deterministic versus
stochastic case . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Random decision algorithms: RP, BPP. Papadimitriou’s theo-
rem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Stochastic counting and sampling: FPRAS and FPAUS . . . 19
1.7 Conclusions and the overview of the book . . . . . . . . . . . 24
1.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.9 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

I Computational Complexity of Counting 33


2 Algebraic dynamic programming and monotone computa-
tions 35

2.1 Introducing algebraic dynamic programming . . . . . . . . . 36


2.1.1 Recursions, dynamic programming . . . . . . . . . . . 36
2.1.2 Formal definition of algebraic dynamic programming . 45
2.1.3 The power of algebraic dynamic programming: Variants
of the money change problem . . . . . . . . . . . . . . 48
2.1.3.1 Counting the coin sequences summing up to a
given amount . . . . . . . . . . . . . . . . . . 49
2.1.3.2 Calculating the partition polynomial . . . . . 49
2.1.3.3 Finding the recursion for optimization with al-
gebraic dynamic programming . . . . . . . . 49
2.1.3.4 Counting the total sum of weights . . . . . . 50


2.1.3.5 Counting the coin sequences when the order


does not count . . . . . . . . . . . . . . . . . 51
2.2 Counting, optimizing, deciding . . . . . . . . . . . . . . . . . 51
2.3 The zoo of counting and optimization problems solvable with
algebraic dynamic programming . . . . . . . . . . . . . . . . 59
2.3.1 Regular grammars, Hidden Markov Models . . . . . . 59
2.3.2 Sequence alignment problems, pair Hidden Markov
Models . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.3 Context-free grammars . . . . . . . . . . . . . . . . . . 73
2.3.4 Walks on directed graphs . . . . . . . . . . . . . . . . 85
2.3.5 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.4 Limitations of the algebraic dynamic programming approach 89
2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

3 Linear algebraic algorithms. The power of subtracting 123

3.1 Division-free algorithms for calculating the determinant and


Pfaffian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.2 Kirchhoff’s matrix-tree theorem . . . . . . . . . . . . . . . . 133
3.3 The BEST (de Bruijn-Ehrenfest-Smith-Tutte) algorithm . . 139
3.4 The FKT (Fisher-Kasteleyn-Temperley) algorithm . . . . . . 145
3.5 The power of subtraction . . . . . . . . . . . . . . . . . . . . 154
3.6 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . 157
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
3.8 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

4 #P-complete counting problems 165

4.1 Approximation-preserving #P-complete proofs . . . . . . . . 167


4.1.1 #3SAT . . . . . . . . . . . . . . . . . . . . . . . . . . 167
4.1.2 Calculating the permanent of an arbitrary matrix . . . 170
4.1.3 Counting the most parsimonious substitution histories
on an evolutionary tree . . . . . . . . . . . . . . . . . 174
4.1.4 #IS and #Mon-2SAT . . . . . . . . . . . . . . . . . 184
4.2 #P-complete proofs not preserving the relative error . . . . . 186
4.2.1 #DNF, #3DNF . . . . . . . . . . . . . . . . . . . . . 186
4.2.2 Counting the sequences of a given length that a regular
grammar can generate . . . . . . . . . . . . . . . . . . 187
4.2.3 Computing the permanent of a non-negative matrix and
counting perfect matchings in bipartite graphs . . . . 188
4.2.4 Counting the (not necessarily perfect) matchings of a
bipartite graph . . . . . . . . . . . . . . . . . . . . . . 190
4.2.5 Counting the linear extensions of a poset . . . . . . . 191

4.2.6 Counting the most parsimonious substitution histories


on a star tree . . . . . . . . . . . . . . . . . . . . . . . 195
4.2.7 Counting the (not necessarily perfect) matchings in a
planar graph . . . . . . . . . . . . . . . . . . . . . . . 199
4.2.8 Counting the subtrees of a graph . . . . . . . . . . . . 204
4.2.9 Number of Eulerian orientations in a Eulerian graph . 207
4.3 Further reading and open problems . . . . . . . . . . . . . . 208
4.3.1 Further results . . . . . . . . . . . . . . . . . . . . . . 208
4.3.2 #BIS-complete problems . . . . . . . . . . . . . . . . 211
4.3.3 Open problems . . . . . . . . . . . . . . . . . . . . . . 211
4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
4.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

5 Holographic algorithms 217

5.1 Holographic reduction . . . . . . . . . . . . . . . . . . . . . . 218


5.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.2.1 #X-matchings . . . . . . . . . . . . . . . . . . . . . . 224
5.2.2 #Pl-3-(1,1)-Cyclechain . . . . . . . . . . . . . . . . . . 228
5.2.3 #Pl-3-NAE-ICE . . . . . . . . . . . . . . . . . . . . . 231
5.2.4 #Pl-3-NAE-SAT . . . . . . . . . . . . . . . . . . . . . 234
5.2.5 #7 Pl-Rtw-Mon-3SAT . . . . . . . . . . . . . . . . . . 236
5.3 Further results and open problems . . . . . . . . . . . . . . . 240
5.3.1 Further results . . . . . . . . . . . . . . . . . . . . . . 240
5.3.2 Open problems . . . . . . . . . . . . . . . . . . . . . . 241
5.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
5.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

II Computational Complexity of Sampling 245


6 Methods of random generations 247

6.1 Generating random numbers . . . . . . . . . . . . . . . . . . 248


6.2 Rejection sampling . . . . . . . . . . . . . . . . . . . . . . . 249
6.3 Importance sampling . . . . . . . . . . . . . . . . . . . . . . 253
6.4 Sampling with algebraic dynamic programming . . . . . . . 256
6.5 Sampling self-reducible objects . . . . . . . . . . . . . . . . . 260
6.6 Markov chain Monte Carlo . . . . . . . . . . . . . . . . . . . 263
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
6.8 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

7 Mixing of Markov chains and their applications in the theory


of counting and sampling 273

7.1 Relaxation time and second-largest eigenvalue . . . . . . . . 274


7.2 Techniques to prove rapid mixing of Markov chains . . . . . 277

7.2.1 Cheeger’s inequalities and the isoperimetric inequality 278


7.2.2 Mixing of Markov chains on factorized state spaces . . 283
7.2.3 Canonical paths and multicommodity flow . . . . . . . 291
7.2.4 Coupling of Markov chains . . . . . . . . . . . . . . . 298
7.2.5 Mixing of Markov chains on direct product spaces . . 300
7.3 Self-reducible counting problems . . . . . . . . . . . . . . . . 301
7.3.1 The Jerrum-Valiant-Vazirani theorem . . . . . . . . . 302
7.3.2 Dichotomy theory on the approximability of self-
reducible counting problems . . . . . . . . . . . . . . . 307
7.4 Further reading and open questions . . . . . . . . . . . . . . 311
7.4.1 Further reading . . . . . . . . . . . . . . . . . . . . . . 311
7.4.2 Open problems . . . . . . . . . . . . . . . . . . . . . . 313
7.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
7.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

8 Approximable counting and sampling problems 321

8.1 Sampling with the rejection method . . . . . . . . . . . . . . 322


8.1.1 #Knapsack . . . . . . . . . . . . . . . . . . . . . . . . 322
8.1.2 Edge-disjoint tree realizations without common internal
vertices . . . . . . . . . . . . . . . . . . . . . . . . . . 323
8.2 Sampling with Markov chains . . . . . . . . . . . . . . . . . 326
8.2.1 Linear extensions of posets . . . . . . . . . . . . . . . 326
8.2.2 Counting the (not necessarily perfect) matchings of a
graph . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
8.2.3 Sampling realizations of bipartite degree sequences . . 330
8.2.4 Balanced realizations of a JDM . . . . . . . . . . . . . 334
8.2.5 Counting the most parsimonious DCJ scenarios . . . . 340
8.2.6 Sampling and counting the k-colorings of a graph . . . 353
8.3 Further results and open problems . . . . . . . . . . . . . . . 356
8.3.1 Further results . . . . . . . . . . . . . . . . . . . . . . 356
8.3.2 Open problems . . . . . . . . . . . . . . . . . . . . . . 357
8.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
8.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360

Bibliography 363

Index 379
Preface

The idea to write a book on the computational complexity of counting and


sampling came to our minds in February 2016, when Miklós Bóna co-organized
a Dagstuhl seminar with Michael Albert, Einar Steingrímsson, and me. We
realized that many enumerative combinatorialists know little about computer
science, and that there is clearly a demand for a book that introduces
the computational aspects of enumerative combinatorics. Similarly, there are
physicists, bioinformaticians, engineers, statisticians, and other applied math-
ematicians who develop and use Markov chain Monte Carlo methods but are
not aware of the theoretical computer science background of sampling.
The aim of this book is to give a broad overview of the computational com-
plexity of counting and sampling, from very simple things like linear recur-
rences, to high level topics like holographic reductions and mixing of Markov
chains. Since the book starts with the basics, eager MSc, PhD students, and
young postdoctoral researchers devoted to computer science, combinatorics,
and/or statistics might start studying this book. The book is also unique in
the way that it focuses equally on computationally easy and hard problems,
and highlights those easy problems that have hard variants. For example, it
is easy to count the generations of a regular grammar that produce sequences
of length n. On the other hand, it is hard to count the number of sequences
of length n that a regular grammar can generate.
There is a special emphasis on bioinformatics-related problems in the hope
of bringing theory and applications closer. A number of open problems are
brought to the attention of theorists, who might find them interesting and
challenging enough to work on them. We also believe that there will be applied
mathematicians who want to deepen their understanding of the theory of
sampling, and will be happy to see that the theory is explained via examples
they already know.
Many of the topics are introduced via worked-out examples, and a long
list of exercises can be found at the end of each chapter. Exercises marked
with * have a detailed solution, while hints can be found on exercises marked
with ◦. Unsolved exercises vary from very simple to challenging. Therefore,
instructors will find appropriate exercises for students at all levels.
Although the book starts with the basics, it still needs prerequisites. Back-
ground in basic combinatorics, graph theory, linear and abstract algebra, and
probability theory is expected. A discussion on computational complexity is


very briefly presented at the beginning of the book. However, Turing Machines
and/or other models of computations are not explained in this book.
We wanted to give a thorough overview of the field. Still, several topics are
omitted or not discussed in detail in this book. As the book focuses on classify-
ing easy and hard computational problems, very little is presented on improved
running times and asymptotic optimality of algorithms. For example, divide
and conquer algorithms, like the celebrated “four Russians speed-up”, can-
not be found in this book. Similarly, the logarithmic Sobolev inequalities are
not discussed in detail in the chapter on the mixing of Markov chains. Many
beautiful topics, like stochastic computing of the volume of convex bodies,
monotone circuit complexity, #BIS-complete counting problems, Fibonacci
gates, path coupling, and coupling from the past are mentioned only very
briefly due to limited space.
Writing this book was great fun. This work could not have been accom-
plished without the help of my colleagues. I would like to thank Miklós Bóna
for suggesting to write this book. Also, the whole team at CRC Press is
thanked for their support. Special thanks should go to Jin-Yi Cai, Cather-
ine Greenhill, and Zoltán Király for drawing my attention to several papers
I had not been aware of. I would like to thank Kálmán Cziszter, Mátyás
Domokos, Péter Erdős, Jotun Hein, Péter Pál Pálfy, Lajos Rónyai, and Miklós
Simonovits for fruitful discussions. András Rácz volunteered to read the
first two chapters of the book and to comment on them, for which I would like to
warmly thank him. Last but not least, I will always remember my beloved
wife, Ágnes Nyúl, who supported the writing of this book till the end of her
last days, and who, unfortunately, passed away before the publication of this
book.
List of Figures

1.1 The gadget graph replacing a directed edge in the proof of


Theorem 16. See text for details. . . . . . . . . . . . . . . . 21

2.1 A stair-step shape of height 5 tiled with 5 rectangles. The


horizontal lines inside the shape are highlighted as dotted.
The four circles indicate the four corners of the rectangle at
the top left corner that cuts the stair-step shape into two
smaller ones. See text for details. . . . . . . . . . . . . . . . 40
2.2 a) Nested, b) separated, c) crossing base pairs. Each base
pair is represented by an arc connecting the index positions
of the base pair. . . . . . . . . . . . . . . . . . . . . . . . . . 80

4.1 The directed acyclic graph representation of the CNF


(x1 ∨ x2 ∨ x3 ∨ x4 ) ∧ (x2 ∨ x3 ∨ x4 ). Logical values are prop-
agated on the edges, an edge crossed with a tilde (∼) means
negation. Each internal node has two incoming edges and one
outgoing edge. The operation performed by a node might
be a logical OR (∨) or a logical AND (∧). The outcome
of the operation is the income at the other end of the
outgoing edge. . . . . . . . . . . . . . . . . . . . . . . . . . . 169
4.2 The track T5 for the variable x5 when x5 is a literal in C2 and
C5 and x̄5 is a literal in C3 . . . . . . . . . . . . . . . . . . 172
4.3 The interchange R3 for the clause C3 = (x1 ∨ x5 ∨ x8 ). Note
that interchanges do not distinguish literals xi and x̄i . Each
edge in and above the line of the junctions goes from left
to right, and each edge below the line of the junctions goes
from right to left. Junctions without labels are the internal
junctions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
4.4 Constructing a subtree Tcj for a clause cj . The subtree is
built in three phases. First, elementary subtrees are connected
with a comb to get a unit subtree. In the second phase the
same unit subtree is repeated several times, “blowing up” the
tree. In the third phase, the blown-up tree is amended with a
constant size, depth 3 fully balanced tree. The smaller subtrees
constructed in the previous phase are denoted with a triangle
in the next phase. See also text for details. . . . . . . . . . . 177


4.5 a) A cherry motif, i.e., two leaves connected with an internal


node. b) A comb, i.e., a fully unbalanced tree. c) A tree with
3 cherry motifs connected with a comb. The assignments for
4 adjacencies, α1 , α2 , α3 and αx are shown at the bottom
for each leaf. αi , i = 1, 2, 3 are the adjacencies related to the
logical variables bi , and αx is an extra adjacency. Note that
Fitch’s algorithm gives ambiguity for all adjacencies αi at the
root of this subtree. . . . . . . . . . . . . . . . . . . . . . . . 180
4.6 The unweighted subgraph replacing the edge (v, w) with a
weight of 3. See text for details. . . . . . . . . . . . . . . . . 190
4.7 The Hasse diagram of a clause poset. See text for details. . . 192
4.8 The poset PΦ,p . Ovals represent an antichain of size p − 1. For
sake of clarity, only the literal and some of the clause vertices
for the clause cj = (xi1 ∨ xi2 ∨ xi3 ) are presented here. See
also the text for details. . . . . . . . . . . . . . . . . . . . . 193
4.9 The gadget component ∆1 for replacing a crossing in a non-
planar graph. See text for details. . . . . . . . . . . . . . . . 201
4.10 The gadget component ∆2 for replacing a crossing in a non-
planar graph. See text for details. . . . . . . . . . . . . . . . 201
4.11 The gadget Γ replacing a crossing in a non-planar graph. See
text for details. . . . . . . . . . . . . . . . . . . . . . . . . . 202
4.12 The gadget ∆ replacing a vertex in a planar, 3-regular graph.
See text for details. . . . . . . . . . . . . . . . . . . . . . . . 204

5.1 An edge-weighted bipartite planar graph as an illustrative ex-


ample of the #X-matchings problem. See text for details. . 224
5.2 The matchgrid solving the #X-matching problem for the
graph in Figure 5.1. The edges labeled by ei belong to the
set of edges C, and are considered to have weights 1. See the
text for details. . . . . . . . . . . . . . . . . . . . . . . . . . 226
5.3 An example problem instance for the problem #Pl-3-NAE-
ICE. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 233
5.4 The matchgrid solving the #Pl-3-NAE-ICE problem for the
problem instance in Figure 5.3. The edges belonging to the
edge set C are dotted. The recognizer matchgates are put
into dashed circles. . . . . . . . . . . . . . . . . . . . . . . . 233

7.1 The structure of Y = Y^l ⊔ Y^u. A non-filled ellipse (with a


simple line boundary) represents the space Yx for a given x.
The solid black ellipses represent the set S with some of them
(the Sl ) belonging to the lower part Y l , and the rest (the Su )
belonging to the upper part (Y u ). . . . . . . . . . . . . . . . 285

7.2 When Sl is not a negligible part of S, there is a considerable


flow going out from Sl to within Y l , implying that the con-
ditional flow going out from S cannot be small. See text for
details and rigorous calculations. . . . . . . . . . . . . . . . 287
7.3 When Sl is a negligible part of S, there is a considerable flow
going out from Su into Y l \Sl . See text for details and rigorous
calculations. . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

8.1 Construction of the auxiliary bipartite graph Gi and a


RSO {(v1 , w), (v2 , r)} ↦ {(v1 , r), (v2 , w)} taking (x1 , y1 ) into
(x2 , y2 ). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 339
8.2 An example of two genomes with 7 markers. . . . . . . . . . 341
List of Tables

4.1 The number of scenarios on different elementary subtrees of


the unit subtree of the subtree Tcj for clause cj = x1 ∨ x2 ∨ x3 .
Columns represent the 14 different elementary subtrees, the
topology of the elementary subtree is indicated on the top.
The black dots mean extra substitutions on the indicated edge
due to the characters in the auxiliary positions; the numbers
represent the presence/absence of adjacencies on the left leaf
of a particular cherry motif, see text for details. The row start-
ing with # indicates the number of repeats of the elementary
subtrees. Further rows represent the logical true/false values
of the literals, for example, 001 means x1 = FALSE, x2 =
FALSE, x3 = TRUE. The values in the table indicate the
number of scenarios, raised to the appropriate power due to
multiplicity of the elementary subtrees. It is easy to check that
the product of the numbers in the first line is 2^136 × 3^76 and
in any other lines is 2^156 × 3^64 . . . . . . . . . . . . . . . 180
4.2 Constructing the 50 sequences for a clause. See text for expla-
nation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

Chapter 1
Background on computational
complexity

1.1 General overview of computational problems . . . . . . . . . . . . . . . . . . . . 2


1.2 Deterministic decision problems: P, NP, NP-complete . . . . . . . . . . 4
1.3 Deterministic counting: FP, #P, #P-complete . . . . . . . . . . . . . . . . . . 8
1.4 Computing the volume of a convex body, deterministic versus
stochastic case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5 Random decision algorithms: RP, BPP. Papadimitriou’s
theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
1.6 Stochastic counting and sampling: FPRAS and FPAUS . . . . . . . . 19
1.7 Conclusions and the overview of the book . . . . . . . . . . . . . . . . . . . . . . . 24
1.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
1.9 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

In computational complexity theory, we distinguish decision, optimization,


counting and sampling problems. Although this book is about the computa-
tional complexity of counting and sampling, counting and sampling problems
are related to decision and optimization problems. Counting problems are al-
ways at least as hard as their decision counterparts. Indeed, if we can tell,
say, the number of perfect matchings in a graph G, then naturally we can tell
if there exists a perfect matching in G: G contains a perfect matching if and
only if the number of perfect matchings in G is at least 1.
Optimization problems are also related to counting and sampling. As we
are going to show in this chapter, it is hard to count the cycles in a directed
graph, as well as to sample them, since it is hard to find the longest cycle in a
graph. This might be surprising in light of the fact that finding a cycle in a graph is
an easy problem. There are numerous other cases when the counting version
of an easy decision problem is hard since finding an optimal solution is hard
in spite of the fact that finding one (arbitrary) solution is easy. Although
we briefly review the main complexity classes of decision and optimization
problems in Sections 1.2 and 1.5, we assume readers have prior knowledge on
them. Possible references on computational complexity are [8, 140, 160].
When we are talking about easy and hard problems, we use the convention
of computational complexity that a problem is defined as an easy computa-
tional problem if there is a polynomial running time algorithm to solve it. Very
rarely we can unconditionally prove that a polynomial running time algorithm


does not exist for a computational problem. However, we can prove that no
polynomial running time algorithm exists for certain counting problems if no
polynomial running time algorithm exists for certain hard decision problems.
This fact also underlines why discussing decision problems is inevitable in a
book about computational complexity of counting and sampling.
When exact counting is hard, approximate counting might be easy or hard.
Surprisingly, hard counting problems might be easy to approximate stochastically;
however, there are counting problems that we cannot approximate well.
We conjecture that they are hard to approximate, and this is a point where
stochastic approximations are also related to random approaches to decision
problems. In particular, if no polynomial running time random algorithm exists
for certain hard decision problems that performs any better than random guessing,
then there is no efficient good approximation for certain counting problems.
In this chapter, we give a brief introduction to computational complexity
and show how computational complexity of counting and sampling is related
to computational complexity of decision and optimization problems.

1.1 General overview of computational problems


A computational problem is a mathematical object representing a collection
of questions that computers might be able to solve. The questions belonging
to a computational problem are also called problem instances. An example of
a decision problem is the triangle problem which asks if there is triangle in a
finite graph. The problem instances are the finite graphs and the answer for
any problem instance is “yes” or “no” depending on whether or not there is a
triangle in the graph. In this computational problem, a triangle in a graph is
called a witness or solution. In general, the witnesses of a problem instance are
the mathematical objects that certify that the answer for the decision problem
is “yes”. An example for an optimization problem is the clique problem which
asks what the largest clique (complete subgraph) is in a finite graph. The
problem instances are again the finite graphs and the solutions are the largest
cliques in the graphs.
Any decision or optimization problem has its natural counting counterpart
problem asking the number of witnesses or solutions. For example, we can ask
how many triangles a graph has, as well as how many largest cliques a graph
has.
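
To make the distinction concrete, here is a short Python sketch (our illustration, not part of the original text; the graph representation and function names are ours) contrasting the decision and counting versions of the triangle problem:

```python
from itertools import combinations

def count_triangles(vertices, edges):
    """Counting version: how many triangles (3-cliques) are in the graph?"""
    adj = {v: set() for v in vertices}
    for u, w in edges:
        adj[u].add(w)
        adj[w].add(u)
    # Enumerate all 3-element vertex subsets: O(n^3), polynomial in the input size.
    return sum(1 for a, b, c in combinations(vertices, 3)
               if b in adj[a] and c in adj[a] and c in adj[b])

def has_triangle(vertices, edges):
    """Decision version: a witness (a triangle) exists iff the count is at least 1."""
    return count_triangles(vertices, edges) >= 1

# A 4-cycle with one chord contains exactly two triangles.
V = [1, 2, 3, 4]
E = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
assert count_triangles(V, E) == 2 and has_triangle(V, E)
```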
Computational problems might be solved with algorithms. We can classify
algorithms based on their properties. Algorithms might be exact or approxi-
mate, might be deterministic or random, and probably their most important
feature is if they are feasible or infeasible. To define feasibility, we have to
define how to measure it. Larger problem instances might need more compu-
tational steps, also called running time. Therefore, it is natural to measure the
complexity of an algorithm with the necessary computational steps as a
function of the input (problem instance) size. The size of the problem instance
is defined as the number of bits necessary to describe it. A computational
problem is defined as tractable if its running time grows with a polynomial
function of the size of the input, and intractable if its running time grows
exponentially or even more with the size of the input. This definition ignores
constant factors, the order of the polynomial and typical input sizes. This
means that theoretically tractable problems might be infeasible in practice,
and vice versa, theoretically intractable problems might be feasible in prac-
tice if the typical input sizes are small. Interested readers can find a series of
exercises exploring this phenomenon at the end of the chapter (Exercises 1–3).
In practice, most of the tractable algorithms run in at most cubic time, and
their constant factor is less than 10. These algorithms are not only theoreti-
cally tractable but also feasible in practice. The given definitions of tractable
and intractable problems do not cover all algorithms as there are functions
that grow faster than any polynomial function but slower than any exponential
function. Such functions are called superpolynomial and subexponential. Al-
though there are remarkable computational problems, most notably the graph
isomorphism problem [9, 10], which is conjectured to have superpolynomial
and subexponential running time algorithms in the best case, such problems
are relatively rare, and not discussed in detail in this book.
Observe that both the size of the problem instance and the number of
computational steps are not precisely defined. Indeed, a graph, for example,
might be encoded by its adjacency matrix or by the list of edges in it. These
encodings might have different numbers of bits. Similarly, on many computers,
different operations might have different running times: the time necessary to
multiply two numbers might be much more than the time needed to add
two numbers. To get rigorous mathematical definitions, theoretical computer
science introduced mathematical models of computations; the best known
are the Turing machines. In this book, we avoid these formal descriptions
of computations. The reason for this is that we are interested in only the
order of the running time, and constant factors are hidden in the O (big O,
ordo) notation. Even if sizes are defined in different ways, different definitions
almost never have exponential (or more precisely, superpolynomial) gaps. For
example, if a graph has n vertices, then it might have $O(n^2)$ edges. However, it
does not make a theoretical difference if an algorithm on graphs runs in $O(n^3)$
time or $O(m^{1.5})$ time, where n is the number of vertices and m is the number
of edges: both functions are polynomials. The only difference when there is an
exponential gap between two concepts of input sizes is when we distinguish the
value of the number and the number of bits necessary to describe a number.
When we would like to emphasize that the input size is the value of the
number, we will say that the input numbers are given in unary. A typical
example is the subset sum problem, where we ask if we can select a subset of
integers whose sum is a prescribed value W . There is a dynamic programming
algorithm to solve this problem whose running time is polynomial in the
value of W. However, it is a hard decision problem if W is not given in unary [108].
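
As an illustration (our sketch, with hypothetical function names, not taken from the book), the standard dynamic programming algorithm mentioned above makes visible why the running time is polynomial in the value of W but exponential in the number of bits of W:

```python
def subset_sum(numbers, W):
    """Pseudo-polynomial dynamic programming for the subset sum problem.

    reachable[s] is True iff some subset of the numbers processed so far
    sums to s. The table has W + 1 entries, so the running time is
    O(n * W): polynomial in the *value* of W (i.e., when W is given in
    unary), but exponential in the number of bits needed to write W down.
    """
    reachable = [False] * (W + 1)
    reachable[0] = True                    # the empty subset sums to 0
    for x in numbers:
        for s in range(W, x - 1, -1):      # backwards, so each number is used once
            if reachable[s - x]:
                reachable[s] = True
    return reachable[W]

assert subset_sum([3, 34, 4, 12, 5, 2], 9)        # 4 + 5 = 9
assert not subset_sum([3, 34, 4, 12, 5, 2], 30)
```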

1.2 Deterministic decision problems: P, NP, NP-complete
Definition 1. In computational complexity theory, P is the class that contains
the decision problems solvable in polynomial time.

Examples for decision problems in P are the following:


• The perfect matching problem asks if a graph has a perfect matching.
A perfect matching is a set of independent edges that covers all vertices
[60].

• The substring problem asks if a sequence A is a substring of sequence


B. A substring is a series of consecutive characters of a sequence; for
example, A = aba is a substring of B = bbabaaab, since the third, fourth
and fifth characters of B indeed form sequence A.
• The primality testing problem asks if a positive integer number is a
prime number. Surprisingly, this problem can be solved in polynomial
time even if the input size is the number of digits necessary to write
down the number [3].
One of the most important and unsolved questions in theoretical computer
science is whether or not P is equal to NP. Formally, the complexity class
NP contains the problems that can be solved in polynomial time with non-
deterministic Turing machines. The name NP stands for “non-deterministic
polynomial”. Since we do not introduce Turing machines in this book, an
alternative, equivalent definition is given here.
Definition 2. The complexity class NP contains the decision problems for
which solutions can be verified in polynomial time.
This definition is more intuitive than the formal definition using Turing
machines. Examples for problems in NP are the following.
• The k-clique problem asks if there is a clique of size k in a graph, that
is, a subgraph isomorphic to the complete graph Kk .

• The two partitioning problem asks if there is a partitioning of a finite set


of integer numbers into two subsets such that the sum of the numbers
in the two subsets is the same.

• The feasibility of an integer programming question asks if there is a
list of integer numbers x1 , x2 , . . . , xn satisfying a set of linear inequalities
having the form
\[ \sum_{i=1}^{n} c_i x_i \le b. \tag{1.1} \]

It is easy to see that these problems are indeed in NP. If somebody selects
vertices v1 , v2 , . . . , vk , it is easy to verify that for all i, j ∈ {1, 2, . . . , k}, there
is an edge between vi and vj . If somebody provides a partitioning of a set of
numbers, it is easy to calculate the sums of the subsets and check if the two
sums are the same. Also, it is easy to verify that assignments to the variables
x1 , x2 , . . . , xn satisfy any inequality under (1.1).
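
A minimal sketch (ours, for illustration only) of such polynomial-time verifiers: each takes a problem instance together with a proposed witness and merely checks it, making no attempt to find a witness.

```python
from itertools import combinations

def verify_clique(adj, selected):
    """k-clique witness check: the selected vertices must be pairwise
    adjacent. With k selected vertices this needs O(k^2) adjacency tests."""
    return all(v in adj[u] for u, v in combinations(selected, 2))

def verify_partition(numbers, subset):
    """Two-partitioning witness check: `subset` (a sub-multiset of
    `numbers`) must sum to exactly half of the total."""
    return 2 * sum(subset) == sum(numbers)

def verify_integer_program(inequalities, x):
    """Integer programming witness check: every inequality of the form
    sum_i c_i * x_i <= b, given as (coefficients, b), must hold."""
    return all(sum(c * xi for c, xi in zip(coeffs, x)) <= b
               for coeffs, b in inequalities)
```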
In many cases, finding a solution seems to be harder than verifying a
solution. There are problems in NP for which no polynomial running time
algorithm is known. We cannot prove that such an algorithm does not exist,
however, we can prove that these hard computational problems are as hard
as any problems in NP. To precisely state this, we first need the following
definitions.

Definition 3. Let A and B be two computational problems. We say that


A has a polynomial reduction to B, if a polynomial running time algorithm
exists that solves any problem instance x ∈ A by generating problem instances
y1 , y2 , . . . , yk all in B and solves x using the solutions for y1 , y2 , . . . yk . The
computational time spent generating the problem instances y1 , y2 , . . . , yk counts toward the
running time of the algorithm, but the computational time spent solving
these problem instances is not considered in the overall running time. We also
say that A is polynomially reducible to B.
Example 1. An independent set in a graph is a subset of the vertices such
that no two vertices in it are adjacent. The k-independent set problem asks if
there is an independent set of size k in a graph.
The k-independent set problem is polynomially reducible to the k-clique
problem. Indeed, a graph contains an independent set of size k if and only if
its complement contains a clique of size k. Taking the complement of a graph
can be done in polynomial time.
Similarly, the k-clique problem is also polynomially reducible to the k-
independent set problem.
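
The reduction in Example 1 amounts to complementing the graph. A short sketch (ours; the function names are hypothetical) makes the polynomial reduction explicit:

```python
from itertools import combinations

def complement(vertices, edges):
    """Return the edge list of the complement graph; O(n^2) time."""
    present = {frozenset(e) for e in edges}
    return [e for e in combinations(vertices, 2) if frozenset(e) not in present]

def has_independent_set(vertices, edges, k, clique_solver):
    """G has an independent set of size k iff its complement has a k-clique,
    so any solver for the k-clique problem also solves k-independent set."""
    return clique_solver(vertices, complement(vertices, edges), k)
```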
Polynomial reduction is an important concept in computational complex-
ity. If a computational problem A is polynomially reducible to B and B can
be solved in polynomial time, then A also can be solved in polynomial time.
Similarly, if B is polynomially reducible to A, and A can be solved in polyno-
mial time, then B can be solved in polynomial time, as well. Therefore, if A
and B are mutually polynomially reducible to each other, then either both of
them or none of them can be solved in polynomial time. These thoughts lead
to the following definitions.

Definition 4. A computational problem is in the complexity class NP-hard if


every problem in NP is polynomially reducible to it. The NP-complete prob-
lems are the intersection of NP and NP-hard.
What follows from the definition is that P is equal to NP if and only
if there is a polynomial running time algorithm that solves an NP-complete
problem. It is widely believed that P is not equal to NP, and thus, there are
no polynomial running time algorithms for NP-complete problems.
It is absolutely not trivial that NP-complete problems exist. Below we
define a decision problem and state that it is NP-complete.
Definition 5. In Boolean logic, a literal is a logical variable or its negation.
A disjunctive clause is a logical expression of literals and OR operators (∨).
A conjunctive normal form or CNF is a conjunction of disjunctive clauses,
that is, disjunctive clauses connected with the logical AND (∧) operator. A
conjunctive normal form Φ is satisfiable if there is an assignment of logical
variables in Φ such that the value of Φ is TRUE. Such an assignment is
called a satisfying assignment. The decision problem if there is a satisfying
assignment of a conjunctive normal form is called the satisfiability problem
and denoted by SAT.
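
For concreteness, here is a brute-force sketch (ours, not the book's) that enumerates the satisfying assignments of a CNF, using the common convention in which the integer literal i stands for variable xi and −i for its negation. Trying all 2^n assignments of course takes exponential time; the code merely illustrates the definitions.

```python
from itertools import product

def satisfying_assignments(num_vars, clauses):
    """Yield every satisfying assignment of a CNF by trying all 2^n of them.

    A clause is a list of non-zero integers: literal i stands for variable
    x_i and literal -i for its negation."""
    for bits in product([False, True], repeat=num_vars):
        # The CNF is TRUE iff every clause contains at least one true literal.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            yield bits

# Phi = (x1 v x2) ^ (not x1 v x2) ^ (not x2 v x3) forces x2 = x3 = TRUE,
# so there are exactly two satisfying assignments (x1 is free).
phi = [[1, 2], [-1, 2], [-2, 3]]
assert sum(1 for _ in satisfying_assignments(3, phi)) == 2
```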

Theorem 1. For any decision problem A in NP and any problem instance


x in A, there exists a conjunctive normal form Φ such that Φ is satisfiable if
and only if the answer for the problem instance x is “yes”. Furthermore, for
any x, such a conjunctive normal form can be constructed in polynomial time
of the size of x. Since verifying that an assignment of the logical variables is a
satisfying assignment can be clearly done in polynomial time, and thus, SAT
is in NP, this also means that SAT is an NP-complete problem.
We do not prove this theorem here; the proof can be found in any standard
textbook on computational complexity; see for example [72]. Satisfiability
is the only problem for which we can directly prove NP-completeness. For
all other decision problems, NP-completeness is proved by polynomial reduc-
tion of the SAT problem or other NP-complete problems to those decision
problems. Indeed, the following theorem holds.
Theorem 2. Let A be an NP-complete problem and let B be a decision prob-
lem in NP. If A is polynomially reducible to B, then B is also NP-complete.

Proof. The proof is based on the fact that the sum as well as the composition
of two polynomials is also a polynomial.
Stephen Cook proved in 1971 that SAT is NP-complete [44], and Richard
Karp demonstrated in 1972 that many natural computational problems are
NP-complete by reducing SAT to them [108]. These famous problems, known as
Karp’s 21 NP-complete problems, drew attention to NP-completeness and initiated the study
of the P versus NP problem. The question whether or not P equals NP has
become the most famous unsolved problem in computational complexity
theory. In 2000, the Clay Institute offered $1 million for a proof or disproof that
P equals NP [1].
Below we give a list of NP-complete problems that we are going to use in
proofs of theorems about computational complexity of counting and sampling.
Definition 6. Let $\vec{G} = (V, E)$ be a directed graph. A Hamiltonian path is
a directed path that visits each vertex exactly once. A Hamiltonian cycle is a
directed cycle that contains each vertex exactly once.
Based on this definition, we can define the following two problems.
Problem 1.
Name: H-Path.
Input: a directed graph, $\vec{G} = (V, E)$.
Output: “yes” if $\vec{G}$ has a Hamiltonian path, “no” otherwise.

Problem 2.
Name: H-Cycle.
Input: a directed graph, $\vec{G} = (V, E)$.
Output: “yes” if $\vec{G}$ has a Hamiltonian cycle, “no” otherwise.

Theorem 3. [108] Both H-Path and H-Cycle are NP-complete.
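
For illustration only (our sketch, with exponential running time), H-Path can be decided by trying all orderings of the vertices; no essentially better method is expected in light of Theorem 3.

```python
from itertools import permutations

def has_hamiltonian_path(vertices, arcs):
    """Brute-force H-Path: try all n! orderings of the vertices.

    `arcs` is a set of directed edges (u, v)."""
    return any(all((order[i], order[i + 1]) in arcs
                   for i in range(len(order) - 1))
               for order in permutations(vertices))

assert has_hamiltonian_path([1, 2, 3], {(1, 2), (2, 3)})       # 1 -> 2 -> 3
assert not has_hamiltonian_path([1, 2, 3], {(1, 2), (1, 3)})
```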


It is also hard to decide if a graph contains a large independent set.
Problem 3.
Name: LargeIS.
Input: a positive integer m and a graph G in which every independent set has
size at most m.
Output: “yes” if G has an independent set of size m, and “no” otherwise.

Theorem 4. [73] The decision problem LargeIS is NP-complete.


The subset sum problem (see below) is an infamous NP-complete problem. It is
polynomially solvable if the weights are given in unary; however, it becomes
hard for large weights.
Problem 4.
Name: SubsetSum.
Input: a set of numbers, S = {x1 , x2 , . . . , xn }, and a number m.
Output: “yes” if there is a subset A ⊆ S such that $\sum_{x \in A} x = m$, otherwise
“no”.

Theorem 5. [108] The decision problem SubsetSum is NP-complete.



1.3 Deterministic counting: FP, #P, #P-complete


Definition 7. The complexity class #P contains the counting problems that
ask for the number of witnesses of the decision problems in NP. If A denotes
a problem in NP, then #A denotes its counting counterpart.
Since the decision versions of #P problems are in NP, there is a witness
that can be verified in polynomial time. This does not automatically imply
that all witnesses can be verified in polynomial time, although it naturally
holds in many cases. When it is questionable that all solutions can be verified
in polynomial time, a polynomial upper bound must be given, and only those
witnesses count that can be verified in that time.
For example, #SAT denotes the counting problem that asks for the number
of satisfying assignments of conjunctive normal forms. Some counting prob-
lems are tractable. Formally, they belong to the class of tractable function
problems.
Definition 8. A function problem is a computational problem where the out-
put is more complex than a simple “yes” or “no” answer. The complexity class
FP (Function Polynomial-Time) is the class of function problems that can be
solved in polynomial time.
We can define the #P-hard and #P-complete classes analogously to the
NP-hard and NP-complete classes.
Definition 9. A computational problem is in #P-hard if any problem in #P
is polynomially reducible to it. The #P-complete class is the intersection of
#P and #P-hard.
As one can naturally guess, #SAT is a #P-complete problem. Indeed, the
following theorem holds.
Theorem 6. For every problem #A in #P, and every problem instance x
in #A, there exists a conjunctive normal form Φ such that the number of
satisfying assignments of Φ is the answer for x. Furthermore, such a Φ can
be constructed in polynomial time of the size of the problem instance x. Since
#SAT is in #P, this means that #SAT is a #P-complete problem.
It is clear that #P ⊆ FP if and only if there exists a polynomial run-
ning time algorithm for a #P-complete problem. It is also trivial to see that
#P ⊆ FP implies P = NP. However, we do not know if the reverse is true,
namely, whether or not P = NP implies that counting the witnesses of any
#P-complete problem is easy. Still, we have the following non-trivial result.
Theorem 7. [161] If P = NP, then for any problem #A in #P and any
polynomial p, there is a polynomial time algorithm for #A such that it ap-
proximates any instance of #A within a multiplicative approximation factor
$1 + \frac{1}{p}$.

Assuming that P is not equal to NP, we cannot expect a polynomial run-


ning time algorithm counting the witnesses of an NP-complete problem. Natu-
rally, the counting versions of many NP-complete problems are #P-complete.
However, we do not know if it is true that for any NP-complete problem A,
its counting version #A is #P-complete. On the other hand, there are easy
decision problems whose counting version is #P-complete. Below we show two
of them.
Definition 10. The permanent of an n × n matrix $M = \{m_{i,j}\}$ is defined as
\[ \mathrm{per}(M) := \sum_{\sigma \in S_n} \prod_{i=1}^{n} m_{i,\sigma(i)} \tag{1.2} \]

where Sn is the set of permutations of numbers 1, 2, . . . , n.
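
Definition (1.2) translates directly into code. The following brute-force sketch (ours) sums over all n! permutations and is therefore usable only for tiny matrices, in line with Theorem 8 below.

```python
from itertools import permutations
from math import prod

def permanent(M):
    """per(M) = sum over all permutations sigma of prod_i M[i][sigma(i)].

    Direct evaluation of Definition (1.2) over n! permutations; unlike the
    determinant, no polynomial-time algorithm is expected (Theorem 8)."""
    n = len(M)
    return sum(prod(M[i][sigma[i]] for i in range(n))
               for sigma in permutations(range(n)))

# For a 0-1 matrix, this is the number of perfect matchings of the
# bipartite graph in which u_i and v_j are adjacent iff a_{i,j} = 1
# (see Theorem 9 below).
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
assert permanent(A) == 2
```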


Theorem 8. Computing the permanent is #P-hard. Computing the perma-
nent of a 0-1 matrix is still #P-hard.
We are going to prove this theorem in Chapter 4. Calculating the perma-
nent is related to counting the perfect matchings in a graph, as stated and
proved below.
Theorem 9. Computing the number of perfect matchings in a bipartite graph
is a #P-complete counting problem.
Proof. We reduce the permanent of a 0-1 matrix to computing the number of
perfect matchings in a bipartite graph.
Let $A = \{a_{i,j}\}$ be an arbitrary n × n matrix containing 0s and 1s. Construct
a bipartite graph G = (U, V, E) such that there is an edge between ui and vj
if and only if ai,j = 1.
Matrix A contains only 0s and 1s, therefore for any permutation σ,
\[ \prod_{i=1}^{n} a_{i,\sigma(i)} \tag{1.3} \]

is 1 if each $a_{i,\sigma(i)}$ is 1, and 0 otherwise. Let S′ denote the subset of permutations


for which the product is 1. Let $\mathcal{M}$ denote the set of perfect matchings in G.
Clearly, there is a bijection between S′ and $\mathcal{M}$: if σ ∈ S′, then assign to σ the
perfect matching that contains the edges $(u_i, v_{\sigma(i)})$. This is indeed a
perfect matching, since each $(u_i, v_{\sigma(i)}) \in E$ due to the definition of S′ and A,
and each vertex is covered by exactly one edge since σ is a permutation. It is
also clear that this mapping is an injection: if $\sigma_1 \ne \sigma_2$, then their images are
also different.
Similarly, if $M \in \mathcal{M}$ is a perfect matching, then for each i, it contains an
edge $(u_i, v_j)$. Assign to M the permutation that maps i to j. It is indeed
a permutation due to the definition of a perfect matching, and the permutation
σ obtained this way is indeed in S′ due to the definition of S′ and A.
Therefore, the number of perfect matchings in G is the permanent of A.


Since constructing G can be clearly done in polynomial time, this is a poly-
nomial reduction, and thus, computing the number of perfect matchings in a
bipartite graph is #P-hard. Since this counting problem is also in #P, it is
#P-complete.
Leslie Valiant defined the classes #P-hard and #P-complete, and proved
that computing the permanent of a matrix is #P-hard, that it remains #P-hard
to compute the permanent of a matrix if the entries are restricted to the
set {0, 1}, and thus, that counting the perfect matchings in a bipartite graph is
#P-complete [175]. This is quite surprising, since deciding if there is a perfect
matching in a bipartite graph is in P [94].
We introduce another #P-complete counting problem, which is even more
surprising in the sense that its decision version is absolutely trivial. The prob-
lem is also related to finding the volume of a convex body. It is also the first
example of the fact that hard counting problems might be easy to approxi-
mate, as we are going to discuss later on in this chapter.
Definition 11. A partially ordered set, or poset for short, is a pair (A, ≤)
where A is a set and ≤ is a reflexive, antisymmetric and transitive relation
on A, that is, for any a, b, c ∈ A
(a) a ≤ a,
(b) if a ≠ b and a ≤ b, then the relation b ≤ a does not hold,
(c) a ≤ b ∧ b ≤ c =⇒ a ≤ c.

The meaning of the name is that there might be elements a, b ∈ A such


that neither a ≤ b nor b ≤ a holds. An example of a partially ordered set is when
A = 2^X for some set X, and the relation is ⊆. Further examples: A is the
set of natural numbers and a ≤ b if a | b; A is the set of subgroups of a group and
a ≤ b if a is a subgroup of b; etc.
It is easy to see that any poset can be extended to a total ordering, such
that for any a, b ∈ A, a ≤ b implies that a ≤t b, where ≤t is the defining
relation of the total ordering. Such a total ordering is called a linear extension
of the poset. We can ask how many linear extensions a poset has.
Problem 5.
Name: #LE.
Input: a partially ordered set (P, ≤).
Output: the number of linear extensions of (P, ≤).
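
A brute-force counter (our sketch) implements this definition directly: it enumerates all orderings of the ground set and keeps those consistent with the partial order. With n! orderings to check, it is hopeless beyond small n, which is consistent with Theorem 10 below.

```python
from itertools import permutations

def count_linear_extensions(elements, relations):
    """Count the total orderings consistent with a partial order.

    `relations` is a set of pairs (a, b) meaning a <= b; an ordering is a
    linear extension iff every related pair appears in the right order."""
    count = 0
    for order in permutations(elements):
        position = {x: i for i, x in enumerate(order)}
        if all(position[a] <= position[b] for a, b in relations):
            count += 1
    return count

# The poset on {1, 2, 3, 4} with 1 <= 3, 1 <= 4 and 2 <= 4 has 5 linear
# extensions, e.g., 1, 2, 3, 4 and 2, 1, 4, 3.
assert count_linear_extensions([1, 2, 3, 4], {(1, 3), (1, 4), (2, 4)}) == 5
```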
Observe that the decision version of #LE is trivial: whatever the poset is,
the answer is always “yes” to the question if the poset has a linear extension.
Therefore, it is very surprising that the following theorem holds.

Theorem 10. #LE is a #P-complete problem.



We are going to prove this theorem in Chapter 4. #LE is related to com-


puting the volume of a convex body. We can define a polytope for each finite
poset in the following way.
Definition 12. Let (A, ≤) be a finite poset of n elements. The poset polytope
is a convex body in the Euclidean space $\mathbb{R}^n$ in which each point (x1 , x2 , . . . , xn )
satisfies the inequalities
1. for all i, 0 ≤ xi ≤ 1,
2. for all ai ≤ aj , xi ≤ xj .
Theorem 11. The volume of the poset polytope of a poset P = (A, ≤) is $\frac{1}{n!}$
times the number of linear extensions of P, where n = |A|.
Proof. Any total ordering is also a partial ordering, so we can define its poly-
tope. The intersection of the polytopes of two total orderings has 0 measure
(the possible common facets of the polytopes). Therefore, it is sufficient to
prove that the poset polytope of any total ordering of a set of size n has
volume $\frac{1}{n!}$. There is a natural bijection between the permutations of length n
and the total orderings of a set of size n: ai ≤ aj in a total ordering (A, ≤) if
and only if σ(i) < σ(j) in permutation σ. The union of the n! poset polytopes
is the n-dimensional unit cube, which has volume 1. Each poset polytope has
the same volume since they can be transformed into each other with linear
transformations preserving the volume. Indeed, the matrices of these linear
transformations in the standard basis are permutation matrices. Therefore,
the determinant of them in absolute value is 1. The intersections of these
polytopes have 0 measure. Then indeed, the volume of the poset polytope of
any total ordering of a set of size n is $\frac{1}{n!}$. The poset polytope of any partial
ordering is the union of the poset polytopes of its linear extensions, therefore
its volume is indeed $\frac{1}{n!}$ times the number of linear extensions of the poset.
Polytopes are related to other counting problems, too; see, for example,
Exercise 5.

1.4 Computing the volume of a convex body, deterministic versus stochastic case
The corollary of Theorem 10 is that it is #P-hard to find the volume of a
convex body defined as the intersection of half spaces given by linear inequal-
ities. Computing the volume of a convex body is intrinsically hard. However,
the computation becomes easy if stochastic computations are allowed. In this
section, we present two results highlighting that there might be exponential
gaps between random and deterministic computations.

Consider a computational model with an oracle. The oracle can be asked


if any point in $\mathbb{R}^d$ is in the convex set $K \subset \mathbb{R}^d$. The oracle answers “yes” if the
point is in K. If the point is not in K, then it gives a hyperplane separating
the point from K. Since $\mathbb{R}^d$ is infinite, we need further information about the
convex set: it is promised that the convex set is contained in the hypersphere RB and
contains the hypersphere rB, where 0 < r < R are real numbers, and B is the
Euclidean unit ball around the center, defined by the inequality
\[ \sum_{i=1}^{d} x_i^2 \le 1. \tag{1.4} \]

The central question here is how many oracle calls are needed to approximate
the volume of K. We are going to show a negative result. It says that there
is no deterministic, polynomial running time algorithm that can reasonably
approximate the volume of a convex body in this computational model. The
extremely surprising fact is that in the same computational model, approxi-
mating the volume with a random algorithm is possible in polynomial time.
Before we state and prove the theorem, we need the following lemma.
Lemma 1. Let $v_1, v_2, \ldots, v_n$ be points on the surface of the d-dimensional
(d ≤ n + 1) Euclidean unit ball B around the center. Let K be the convex hull
of the given points. Then
\[ \mathrm{vol}(K) \le n \left(\frac{1}{2}\right)^{d} \mathrm{vol}(B) \tag{1.5} \]

where vol(K) and vol(B) are the volumes of the convex hull and the unit ball,
respectively.

Proof. Consider the balls $B_1, B_2, \ldots, B_n$ whose radii are $\frac{1}{2}$ and whose centers
are $\frac{v_1}{2}, \frac{v_2}{2}, \ldots, \frac{v_n}{2}$. Here each point $v_i$ is considered as a d-dimensional vector. We
claim that
\[ K \subseteq \cup_{i=1}^{n} B_i. \tag{1.6} \]
Indeed, assume that there is a point $x \in K$, but $x \notin \cup_{i=1}^{n} B_i$. Then for each
i, the angle $Oxv_i$ is strictly smaller than π/2. Now consider the hyperplane P
whose normal vector is Ox and which contains x, and let H be the open halfspace
determined by P, containing O. Since each angle $Oxv_i$ is strictly smaller than
π/2, each $v_i$ is in H. But then K, the convex hull of the $v_i$'s, cannot contain x, a
contradiction.

Now we are ready to prove the following theorem.

Theorem 12. Let $\frac{1}{d}B \subseteq K \subseteq B$ be a d-dimensional convex body. Assume
that an oracle is available which tells for any point in $\mathbb{R}^d$ whether the point
is in K. Then any deterministic algorithm that makes only poly(d) oracle calls
cannot give an estimation f of the volume of K satisfying
\[ \frac{\mathrm{vol}(K)}{1.999^{d/2}} \le f \le 1.999^{d/2}\, \mathrm{vol}(K) \tag{1.7} \]

where vol(K) denotes the volume of K.


Proof. Since K ⊆ B, we can assume that all points submitted to the oracle are
in B. Let $z_1, z_2, \ldots, z_n$ denote the submitted points. Let $v_1, v_2, \ldots, v_n$ be the
normalizations of these points, normalized to length 1. Then v1 , v2 , . . . , vn are
on the surface of B. Let s1 , s2 , . . . sd+1 be the vertices of a regular d-simplex
on the surface of B. Consider two convex bodies, K1 = B, and K2 , which
is the convex hull of the vertices v1 , . . . , vn , s1 , . . . , sd+1 . Both convex bodies
contain $\frac{1}{d}B$: $K_1$ trivially, and $K_2$ because the inscribed hypersphere of the
regular d-simplex is $\frac{1}{d}B$ (see also Exercise 4). From Lemma 1, we know that
\[ \frac{\mathrm{vol}(K_1)}{\mathrm{vol}(K_2)} \ge \frac{2^d}{n+d+1}. \tag{1.8} \]
In both cases, the oracle tells us that all points z1 , z2 , . . . , zn are inside the
convex body. Let f be the value that the algorithm generates, based on the
answers of the oracle. Then either
\[ f > 1.999^{d/2}\, \mathrm{vol}(K_2) \tag{1.9} \]
or
\[ f < \frac{\mathrm{vol}(K_1)}{1.999^{d/2}}. \tag{1.10} \]
If both inequalities failed, then it would be true that
\[ 1.999^d \ge \frac{f}{\mathrm{vol}(K_2)} \cdot \frac{\mathrm{vol}(K_1)}{f} = \frac{\mathrm{vol}(K_1)}{\mathrm{vol}(K_2)} \ge \frac{2^d}{n+d+1}, \tag{1.11} \]

which is a contradiction since n is only a polynomial function of d. Therefore


no algorithm can generate an approximation f based on a polynomial number
of oracle calls that satisfies the inequality in Equation (1.7) for every input
K.
The proof introduced above is based on the work of György Elekes [61].
Later, Imre Bárány and Zoltán Füredi extended this result to a stronger one,
proving that there is no polynomial algorithm that gives lower and upper
bounds, $\underline{\mathrm{vol}}(K)$ and $\overline{\mathrm{vol}}(K)$, for the volume of K satisfying
\[ \frac{\overline{\mathrm{vol}}(K)}{\underline{\mathrm{vol}}(K)} \le \left( c\, \frac{d}{\log(d)} \right)^{d} \tag{1.12} \]

where c does not depend on the dimension d [12].


14 Computational complexity of counting and sampling

The situation completely changes if random computation is allowed. Ran-


dom algorithms sample random points from B and estimate the volume of
K based on the answers of the oracle to the randomly generated points. Al-
though it cannot be guaranteed that the resulting random estimation will be
always in between two multiplicative approximations to the exact answers, still
this happens with high probability. Ravi Kannan, László Lovász and Miklós
Simonovits proved the following theorem [105].
Theorem 13. There is an algorithm with parameters , δ > 0 which returns
a real number f satisfying the inequality
 
vol(K)
P ≤ f ≤ (1 + )vol(K) ≥ 1 − δ. (1.13)
1+
The algorithm uses
d5 3
     
1 1
O ln ln ln5 (d) (1.14)
2  δ
oracle calls.
Later on, László Lovász and Santosh Vempala gave an O∗ (d4 ) running time
algorithm to estimate the volume of a convex body in the same computational
model [122]. Here O∗ is for hiding logarithmic terms. Their algorithm still
runs in polynomial time with 1 and − log(δ). We can set, say,  to d1 and
δ to e1d , and still, the running time will be a polynomial function of d. Let
us emphasize again that this is a very striking result: there is no polynomial
running time deterministic algorithm that could approximate the volume of
a convex body within anything better than an approximation factor growing
exponentially with the dimension. On the other hand, if random computation
is allowed, polynomial running time is sufficient to have an approximation
algorithm whose approximation factor actually can tend to 1 polynomially
fast with the dimension. The price we have to pay is that the algorithm might
be wrong with a small probability, however, this probability might even tend
to 0 exponentially fast with the dimension.
This highlights the importance of random algorithms in counting and sam-
pling problems.

1.5 Random decision algorithms: RP, BPP. Papadim-


itriou’s theorem
In this section, we introduce the two basic complexity classes for random
decision algorithms: the BPP and RP classes, and prove a theorem that is
an exercise in Papadimitriou’s book on computational complexity [140]. The
Background on computational complexity 15

theorem says the following: if we can stochastically solve any NP-complete


problem with anything better than random guessing, then we could solve any
problem in the NP class with probability almost 1. We will use this theo-
rem to prove that certain counting problems cannot be well approximated
stochastically unless RP = NP.
Definition 13. A decision problem is in the BPP (Bounded-error Probabilis-
tic Polynomial) class if a random algorithm exists such that

(a) it runs in polynomial time on all inputs,


(b) if the correct answer is “yes” it answers “yes” with probability at least
2/3,
(c) if the correct answer is “no” it answers “no” with probability at least
2/3.
Any such algorithm is also called a BPP algorithm.
The 2/3 in the definition of BPP is just a convention. The number 2/3 is
the rational number p/q between 1/2 and 1 such that p+q is minimal. Indeed,
any fixed constant number between 1/2 and 1 would suffice or even it would
be enough if one of the probabilities was strictly 1/2 and the probability for
the other answer would converge to 1/2 only polynomially fast, see Exercise 6.
Similarly, if a BPP algorithm exists for a decision problem, then also a random
algorithm exists that runs in polynomial time and gives the wrong answer with
1
very small probability (say, with probability 2100 ), see Exercise 7.

Definition 14. A decision problem is in RP (Randomized Polynomial time)


if a random algorithm exists such that
(a) it runs in polynomial time on all inputs
(b) if the correct answer is “yes” it answers “yes” with probability at least
1/2
(c) if the correct answer is “no” it answers “no” with probability 1.
Any such algorithm is also called RP algorithm.
Again, the 1/2 in the definition of RP is only a convention; it is the rational
number p/q between 0 and 1 such that p+q is minimal. Just like in the case of
the BPP class, any fixed constant probability or a probability that converges
only polynomially fast to 0 would suffice, see Exercise 10. Also, the existence
of an RP algorithm means that the correct answer can be calculated with
1
very high probability (say, with probability 1 − 2100 ) in polynomial time, see
Exercise 11.
We know that
P ⊆ RP ⊆ NP (1.15)
16 Computational complexity of counting and sampling

however, we do not know if any containment is proper. Surprisingly, we do


not know if BPP ⊆ NP or NP ⊆ BPP. Throughout this book, we will assume
the following conjecture.
Conjecture 2. For the decision classes P, RP and NP, the relation

P = RP ⊂ NP (1.16)

holds. The P 6= NP assumption also means that

NP-complete ∩ P = ∅ (1.17)

and
#P-complete ∩ F P = ∅. (1.18)
The conjecture that RP = NP also implies that RP ∩ NP-complete = ∅.
This comes from the following theorem and from the easy observation that
RP ⊆ BPP.
Theorem 14. (Papadimitriou’s theorem) If the intersection of NP-complete
and BPP is not empty, then RP = NP.
Proof. We prove this theorem in three steps.
1. First we prove that a BPP algorithm for any NP-complete problem
would prove that BPP = NP.
2. In particular, SAT would be in BPP. We show that a BPP algorithm
for SAT would provide an RP algorithm for SAT.
3. Finally, we show that an RP algorithm for SAT would mean that RP =
NP.
Concerning the first point, assume that there is an NP-complete problem
A for which a BPP algorithm exists. Let B be an arbitrary problem in NP.
Since A is NP-complete, for any problem instance x in B, there is a series
of problem instances y1 , y2 , . . . yk ∈ A such that these problem instances can
be constructed from x in polynomial time with the size of x. Particularly,
k = O(poly(|x|)) and for each i = 1, 2, . . . , k, |yi | = O(poly(|x|)). Furthermore,
from the solutions of these problems, the solution to x can be achieved in
polynomial time. If only random answers are available for y1 , y2 , . . . , yk , then
only a random answer for x can be generated. One wrong solution for any yi
might result in a wrong answer for x. Therefore, to get a BPP algorithm for
solving x, we need that the probability that all yi are answered correctly must
be at least 2/3. For this, each yi must be answered correctly with probability
1
at least 23 k . We can approximate it with

1
1− 2k
(1.19)
log( 32 )
Background on computational complexity 17

since

 log(2 2 )
3
 k   2k
( )
log 3
2
1 − 1 1
1 −
 = >
 
2k 2k 
log( 32 ) log( 32 )

  log(2 2 )
3
r
1 2 2
= > . (1.20)
e 3 3

To achieve this probability, we repeat the BPP algorithm for each yi m


times, and take the majority answer. The number of correct answers follows a
binomial distribution with parameter p ≥ 23 . We can use Chernoff’s inequality
to give an upper bound on the probability that less than half the times the
BPP algorithm generates the correct answer. Recall that Chernoff’s inequality
is
1 (mp − mp(1 − ))2
 
P (Ym ≤ mp(1 − )) ≤ exp − (1.21)
2p m
where Ym is a binomial variable, p is the parameter of the binomial distribution
and  is an arbitrary number between 0 and 1. In our case, p = 2/3, therefore
 should be set to 1/4 to get that p(1 − ) = 1/2. We want to find m satisfying
 
1 (mp − mp(1 − ))2
 
1 1
exp − ≤ 1 − 1 − 2k  = 2k (1.22)
2p m
log( 32 ) log( 32 )

namely,
 m 1
exp − ≤ 2k
. (1.23)
48
log( 32 )

We get that !
2k
m ≥ 24 log (1.24)
log 32


which is clearly satisfied if


96
m≥  k. (1.25)
log 32
This means that the following is a BPP algorithm for problem instance x in
problem B:

1. Construct problem instances y1 , y2 , . . . yk in problem A.


96
2. Solve each yi log( 32 )
k times with the BPP algorithm available for A, and
take the majority answer as the answer to problem instance yi .

3. Solve x using the answers to each yi .


18 Computational complexity of counting and sampling

The first and the last step can be done in polynomial time due to the definition
of NP-completeness. Since k = O(poly(|x|)) and for each i, |yi | = O(poly(|x|)),
the second step also runs in polynomial time. Therefore, the overall running
time is polynomial with the size of x. It is also a BPP algorithm since the
probability that all yi are answered correctly is at least 2/3.
Next, we prove that a BPP algorithm for SAT provides an RP algorithm
for SAT. Let Φ be a conjunctive normal form. Consider the conjunctive normal
form Φ1 that is obtained from Φ by removing all clauses that contain the literal
x1 and removing all occurrences of the literal x1 . Clearly, Φ1 is satisfiable if
and only if Φ has a satisfying assignment in which x1 is TRUE. Consider
also the conjunctive normal form Φ01 that is obtained from Φ by removing all
clauses that contain the literal x1 and removing all occurrences of the literal
x1 . Φ01 is satisfiable if and only if Φ has a satisfying assignment in which x1 is
FALSE. We can solve the decision problem if Φ1 is satisfiable with a very high
probability by repeating the BPP algorithm an appropriate number of times,
and we can also do the same with Φ01 . If one of them is satisfiable, then we can
continue with the variable x2 , and decide if there is a satisfying assignment
of Φ1 or Φ01 in which x2 is TRUE or FALSE. Iterating this procedure, we can
actually build up a candidate for satisfying assignment. By repeating the BPP
algorithm sufficiently many times, we can achieve that the probability that
all calculations are correct is at least 1/2. After building up the candidate
assignment, we can deterministically check if this candidate is a satisfying
assignment, and answer “yes” to the decision question if Φ is satisfiable only
if we verified that the candidate is indeed a satisfying assignment.
We claim that this procedure is an RP algorithm for the SAT problem. If
there are n logical variables, then the candidate assignment is built up in n
iterations. We can use again Chernoff’s inequality to show that O(n) repeats
of the BPP algorithm in each iterative step is sufficient for having at least
1/2 probability that all computations are correct. (Detailed computations are
skipped here; the computation is very similar to the previous one.) Therefore,
if Φ is satisfiable, then we can construct a candidate assignment which is
indeed a satisfying assignment with probability at least 1/2. We can verify
that the candidate assignment is indeed a satisfying assignment in polynomial
time, and thus, we answer “yes” with probability at least 1/2. If Φ is not
satisfiable, then either we conclude with this somewhere during the iteration
or we construct a candidate assignment which is actually not a satisfying
one. However, we can deterministically verify it, and therefore, if Φ is not
satisfiable, then with probability 1 we answer “no”.
Finally, we claim that an RP algorithm for SAT provides an RP algorithm
for any problem in NP. This is the direct consequence of Theorem 1. Indeed,
let A be in NP, and let x be a problem instance in A. Then construct the
conjunctive normal form Φ which is satisfiable if and only if the answer for x
is “yes”. By solving the satisfiability problem for Φ with an RP algorithm, we
also solve the decision problem for x.
Background on computational complexity 19

1.6 Stochastic counting and sampling: FPRAS and


FPAUS
In a random computation, we cannot expect that the answer be correct
with probability 1, and we even cannot expect that the answer have a given
approximation ratio with probability 1. However, we might require that the
computation have small relative error with high probability. This leads to the
definition of the following complexity class.
Definition 15. A counting problem is in FPRAS (Fully Polynomial Random-
ized Approximation Scheme) if for any problem instance x and parameters
, δ > 0 it has a randomized algorithm generating an approximation fˆ for the
true value f satisfying the inequality
 
f ˆ
P ≤ f ≤ f (1 + ) ≥ 1 − δ (1.26)
1+

and the running time of the algorithm is polynomial in |x|, 1 and − log(δ).
An algorithm itself with these prescribed properties is also called FPRAS.

An example for an FPRAS is the algorithm approximating the volume of


a convex body in Theorem 13.
The following theorem shows that we cannot expect a counting problem
to be in FPRAS if its decision version is NP-complete.
Theorem 15. If there exists an NP-complete decision problem A such that
#A is in FPRAS, then RP = NP.
Proof. By Theorem 14, it is sufficient to show that an FPRAS algorithm for
the number of solutions provides a BPP algorithm for the decision problem.
Clearly, let x be a problem instance in A, then the answer for x is “no” if the
number of solutions is 0 and the answer is “yes” if the number of solutions is
a positive integer. Consider an FPRAS with input x,  = 1/2 and δ = 1/3.
Such an FPRAS runs in polynomial time with the size of x, and provides an
answer larger than 1/2 with probability at least 2/3 if there is at least one
solution for x, namely, if the correct answer for the decision problem is “yes”.
Furthermore, if the correct answer for the decision problem is “no”, then the
FPRAS returns 0 with probability at least 2/3. Therefore, the algorithm that
sends the input x,  = 1/2 and δ = 1/3 to an FPRAS and answers “no” if
FPRAS returns a value smaller that 1/2 and answers “yes” if the FPRAS
returns a value larger than or equal to 1/2 is a BPP algorithm even if the
running time of the FPRAS is included in the running time.
There are also counting problems whose decision versions are easy (they
are in P), however, they cannot be approximated unless RP = NP. Jerrum,
20 Computational complexity of counting and sampling

Valiant and Vazirani already observed in 1986 that it is hard to count the
number of cycles in a directed graph [103]. Below we introduce this hardness
result.
Problem 6.
Name: Cycle.
~
Input: a directed graph G.
~
Output: “yes” if G contains a cycle, “no” otherwise.

Theorem 16. The Cycle problem is in P. On the other hand, the following
also holds. If there is a deterministic algorithm such that
~ = (V, E),
(a) its input is a directed graph G
~
(b) it runs in polynomial time of the size of G,

(c) it gives an estimation fˆ of the number of directed cycles in G


~ such that

f
≤ fˆ ≤ f × poly(n)
poly(n)

~ and n is the number of


where f is the number of directed cycles in G
vertices in G,
then P = NP. With other words, it is NP-hard to approximate the number of
cycles in a directed graph even with a polynomial approximation ratio.
Also, the following holds: if there is an FPRAS algorithm for #Cycle
then RP = NP.
Proof. In Chapter 2, we are going to prove that there is a polynomial running
time algorithm that finds the shortest path from any vertex from vi to vj in
a directed graph. We can exclude the 0 length path from the paths, and then
the shortest path from vi to itself is a cycle. For each vi , we can ask if there
is a cycle starting and ending in vi . Since there are only polynomially many
vertices in a graph, this proves that the Cycle problem is in P.
To prove the hardness part of the theorem, we reduce the Hamiltonian
cycle problem to #Cycle. Consider any directed graph G ~ = (V, E), and let
n = |V |. We are going to “blow up” this graph, using the following gadget,
see also Figure 1.1. Define the directed diamond motif as a graph containing
4 vertices and 4 edges. There are 2 edges going from vertex vs to w and u,
furthermore, there are 2 edges going from w and u to vt . The gadget graph
H contains n2 diamond motifs, a source vertex s and a sink vertex t. There is
an edge from vertex s to the vs vertex of the first diamond motif, and for all
i = 1, 2, . . . , n2 −1, there is an edge from the vt vertex of the ith diamond motif
to the vs vertex of the i + 1st diamond motif. Finally, there is an edge from
the vt vertex of the last diamond motif to vertex t. It is easy to see that there
2
are 2n number of paths from s to t: there are two paths for getting through
Background on computational complexity 21
w
s ··· t
vs vt
u

FIGURE 1.1: The gadget graph replacing a directed edge in the proof of
Theorem 16. See text for details.

on each diamond motif, and any combination of them provides a path from s
to t.
~ 0 by replacing each edge (u, v) with the
Construct the directed graph G
 k
gadget graph H. For any cycle in G ~ with k edges, there are 2n2 corre-
~ 0 . Therefore, if there is no Hamiltonian cycle in G, then
sponding cycles in G
there are at most
n−1
X n  2 k  2 (n−1)  2 (n−1)
k! 2n < n(n − 1)! 2n = n! 2n (1.27)
k
k=2

~ 0 . Indeed, there are at most n k! cycles of length k in G.


~ On the

cycles in G k
other hand, if G ~ has a Hamiltonian cycle, then there are at least
 2 n
2n (1.28)

~ 0 . The ratio of this latter lower bound and the upper bound in case
cycles in G
of no Hamiltonian cycles is
 2 n
2n 2n
2
n2
(n−1) = >2 2 (1.29)
n! 2n 2 n!

which grows faster than any exponential function, and thus, any polynomial
function. Therefore, any polynomial approximation for the number of cycles
~ 0 would provide an answer for the question whether there is a Hamiltonian
in G
~ Furthermore, the size of G
cycle in G. ~ 0 is only a polynomial function of G,
~
thus H-Cycle has a polynomial reduction to the polynomial approximation
for #Cycle. That is, it is NP-hard to have a polynomial approximation for
the number of cycles in a directed graph.
An FPRAS approximation for the number of cycles would provide a BPP
algorithm for the H-Cycle problem. Indeed, an FPRAS approximating the
number of cycles in G ~ 0 with parameters  = 1 and δ = 1/3 would separate
the cases when G ~ has and does not have a Hamiltonian cycle with probability
at least 2/3. Since H-Cycle is NP-complete, the intersection of NP-complete
and BPP could not be empty which would imply that RP = NP, according
to Theorem 14.
22 Computational complexity of counting and sampling

The situation will remain the same if we could generate roughly uniformly
distributed cycles from a directed graph. Below we state this precisely after
defining the necessary ingredients of the statement.
Definition 16. The total variation distance of two discrete distributions p
and π over the same (countable) domain X is defined as
1 X
dT V (p, π) := |p(x) − π(x)|. (1.30)
2
x∈X

It is easy to see that the total variation distance of two distributions is


between 0 and 1, it is 0 if and only if the two distributions are pointwise the
same (for all x, p(x) = π(x)), and it is 1 if and only if the two distributions
have disjoint support (that is, p(x) 6= 0 implies that π(x) = 0 and vice versa).
It is also easy to see that the total variation distance is indeed a metric, see
also Exercise 12.
Definition 17. A sampling problem #X is in FPAUS (Fully Polynomial
Almost Uniform Sampler) if #X is in #P, and there is a sampling algorithm
that for any problem instance in #X and  > 0, it generates a random witness
following a distribution p satisfying the inequality

dT V (p, U ) ≤  (1.31)

where U is the uniform distribution of the witnesses. The algorithm must run
in polynomial time both with the size of the problem instance and − log().
This algorithm itself is also called FPAUS.
There is a strong relationship between the complexity classes FPRAS and
FPAUS. In Chapter 7 we will show that for a large class of counting prob-
lems, called self-reducible counting problems, there is an FPRAS algorithm
for a particular counting problem if and only if there is an FPAUS algorithm
for it. Here we state and prove that an FPAUS for sampling cycles from a di-
rected graph would have the same consequence as the existence of an FPRAS
algorithm for counting the cycles in a directed graph.
Theorem 17. If there is an FPAUS algorithm for #CYCLE, then RP = NP.
Proof. According to Theorem 14, it is sufficient to show that an FPAUS would
provide a BPP algorithm for H-CYCLE. Assume that there is an FPAUS for
sampling cycles from a directed graph. For any directed graph G, ~ construct
~ 0
the same graph G that we constructed in the proof of Theorem 16. Apply the
FPAUS algorithm on G ~ 0 using  = 1/10, and generate one cycle. Draw back
~
this cycle to G. If this is a Hamiltonian cycle, then answer “yes”, otherwise
answer “no”.
We claim that this is a BPP algorithm (actually, an RP algorithm). If
there is no Hamiltonian cycle in G, ~ then the algorithm answers “no” with
Background on computational complexity 23

probability 1. If there are c ≥ 1 Hamiltonian cycles in G, ~ then the cycles in


~ 0 ~
G corresponding to Hamiltonian cycles in G have probability at least
 2 n
c 2n c2n
2
16
n−1 n = n 2 ≥ (1.32)
n! + c2 18
2
 2

n! 2n + c 2n

assuming that G~ has at least 2 vertices. Note that the distribution p has at
~ 0 that correspond to Hamiltonian cycles
least 2/3 probability on the cycles in G
~ otherwise we would have
in G,
1 1 X
≥ dT V (p, U ) ≥ |U (x) − p(x)|
10 2
x∈H
!  
1 X X 1 16 2 1
≥ U (x) − p(x) ≥ − = (1.33)
2 2 18 3 9
x∈H x∈H

where H is the set of cycles in G ~ 0 that corresponds to Hamiltonian cycles in


~ The inequality 1 ≥ 1 is clearly a contradiction. Therefore if there is a
G. 10 9
Hamiltonian cycle in G, ~ then the FPAUS algorithm with  = 1 will generate
10
~ 0 that corresponds to a Hamiltonian cycle in G
a cycle in G ~ with probability
at least 2/3. Generating G~ 0 , running the FPAUS algorithm, drawing back the
~
sampled cycle to G and checking if the so-obtained cycle is a Hamiltonian
one can all be done in polynomial time. Therefore, this procedure is indeed
a BPP for the Hamiltonian cycle problem. This would imply that RP = NP,
according to Theorem 14.
Still, there are #P-complete problems that do have FPRAS algorithms.
The careful reader might observe that we already introduced such a problem.
Indeed, #LE is a #P-complete counting problem on the one hand; on the
other hand, it is in FPRAS since approximating the volume of a convex body
is in FPRAS with an oracle that tells if a point is in the convex body. However,
in the case of the poset polytope, we do not need an oracle, since for any point
we can deterministically decide if it is in the poset polytope: we have to check
the inequalities in the Definition 12. This clearly can be done in polynomial
time for any point. Thus, there is an FPRAS for approximating the volume of
a poset polytope. Dividing with n! keeps the relative error. Therefore, there
is also an FPRAS for counting the linear extensions of a poset.
One might ask if the fact that the intersection of #P-complete and FPRAS
is not empty implies that RP = NP. We could see in the proof of Papadim-
itriou’s theorem that we can nicely handle probabilities in polynomial reduc-
tions. However, we might not be able to handle relative errors. Indeed, there
are operations that do not keep the relative error, most notably, the subtrac-
tion and modulo prime number calculations. We will see that such operations
appear in each #P-completeness proof of a counting problem that is also
in FPRAS. Therefore the polynomial reductions used in such proofs cannot
24 Computational complexity of counting and sampling

be used to propagate FPRAS approximations to other counting problems in


#P: we will lose the small relative error. Thus, it does not follow that any
counting problem in #P has an FPRAS. In particular, we cannot prove that
an FPRAS algorithm exists for a counting problem whose decision version is
NP-complete.
On the other hand, if a #P-completeness proof for a problem #A preserves
the relative error, it also proves that there is no FPRAS for #A unless RP
= NP. The relative error can be preserved with a one-to-one, a one-to-many
or a many-to-one reduction or even in a way as seen in the proof of non-
approximability of the number of cycles in a directed graph.

1.7 Conclusions and the overview of the book


The central question in the computational complexity theory of counting
and sampling is this: Which counting problems are in FP, which are in #P-
complete, and if a problem is in #P-complete, is it in FPRAS and/or FPAUS
or not (assuming that RP is not equal to NP)? Although there is no strict
trichotomy, most of the counting problems fall into one of the following three
categories:
1. The counting problem is in FP. Typically, if the counting problem is in
FP, then there is a polynomial running time algorithm that can sample
witnesses from exactly the uniform distribution.
2. The counting problem is in #P-complete, however, it is also in FPRAS.
Typically, such a counting problem is also in FPAUS. There is a large
class of counting problems for which we can prove that all of them are
in
(FPRAS ∩ FPAUS) ∪ (FPRAS ∪ FPAUS),
namely, we can prove that any counting problem in this class is in
FPRAS if and only if it is also in FPAUS.
3. The counting problem is in #P-complete, and there is no FPRAS al-
gorithm for it, unless RP = NP. Typically, these problems also do not
have an FPAUS, unless RP = NP.
If a decision problem is NP-complete, then there is no FPRAS algorithm
for its counting version, unless RP = NP. We conjecture that the counting
version of any NP-complete problem is also #P-complete.
Easy decision problems might also be hard to count, even approximately.
This book is about the state-of-the-art of our knowledge about which decision
problems have their counting version in the above-mentioned three categories.
The book contains the following chapters.
Background on computational complexity 25

(a) Chapter 2 describes the easiest counting problems. These are the prob-
lems whose decision, optimization and counting versions can be univer-
sally solved with dynamic programming algorithms. The computations
in these dynamic programming algorithms use only additions and multi-
plications, which we call monotone computations. We are going to show
that from an algebraic point of view, the logical OR and AND opera-
tions can be considered as additions and multiplications. Similarly, addi-
tion and multiplication can be replaced with minimization and addition
without changing the algebraic properties of the computations. We are
going to show that a large class of dynamic programming algorithms
have some universal algebraic properties, therefore essentially the same
algorithms can solve the decision, optimization and counting versions of
a given problem. If the universal algorithm has polynomial running time,
then the problem it solves has a decision version in P and a counting
version in FP, and furthermore, optimal solutions can be found also in
polynomial time.
(b) Chapter 3 introduces counting problems solvable in polynomial time us-
ing subtraction. Particularly, the number of spanning trees in a graph
as well as the number of Eulerian circuits in a directed Eulerian graph
are related to the determinant of certain matrices, and the number of
perfect matchings in a planar graph is the Pfaffian of an appropriately
oriented adjacency matrix of the graph. We are also going to show that
both the determinant and the Pfaffian can be computed in polynomial
time using only additions, subtractions and multiplications, and there-
fore computations can be generalized to arbitrary commutative rings.
These algorithms can also be viewed as monotone computations on some
certain combinatorial objects. The signed and weighted sums of these
combinatorial objects coincide with the determinants and Pfaffians of
matrices via cancellations.
(c) We give a comprehensive list of #P-complete problems in Chapter 4.
We highlight those problems for which approximation preserving #P-
completeness proofs exist, and therefore, there is no FPRAS approxi-
mations for these problems unless RP = NP.
(d) Chapter 5 is about a relatively new topic, holographic algorithms. A
holographic reduction is a linear algebraic many-to-many mapping be-
tween two sets. Such a mapping can prove that the sizes of the two
sets are the same without explicitly giving a one-to-one correspondence
between the two sets. Therefore, if the cardinality of one of the sets
can be obtained in polynomial time, it can be done for the other set.
Usually the holographic reduction maps a set to the (weighted) perfect
matchings of planar graphs. Computing the sum of the weighted perfect
matchings provides a tractable way to obtain the cardinality of the set.
Holographic reductions are also used to obtain equalities of the cardinal-
26 Computational complexity of counting and sampling

ities of two sets where finding the cardinality of one of the sets is known
to be #P-hard. This provides a proof that finding the cardinality of the
other set is also #P-hard.
(e) We turn to sampling methods in Chapter 6. We show how to sam-
ple uniformly combinatorial objects that can be counted with algebraic
dynamic programming. This is followed by the introduction of ways of
random generations providing techniques of almost uniform generations.
Here Markov chains are the most useful techniques.
(f) Chapter 7 is devoted to the theory of the mixing of Markov chains. It
turns out that in many cases, rapidly mixing Markov chains provide
almost uniform generations of certain combinatorial objects. Markov
chains are used to build a dichotomy theory for self-reducible problems:
a self-reducible counting problem either has an FPRAS or cannot be
approximated within a polynomial approximation factor. We also show
that any self-reducible counting problem has an FPRAS if and only if
it has an FPAUS. The consequence is that in many cases, we can prove
that a counting problem is in FPRAS by showing that there exists a
Markov chain which converges rapidly to the uniform distribution of
the witnesses.
(g) Finally, Chapter 8 provides a comprehensive list of counting problems
for which FPRAS exists.

1.8 Exercises
1. Algorithm A has running time 100000n, and algorithm B has running
time 0.1n3 , where n is the input size. Which algorithm is faster if the
input size is typically between 30 and 80?
2. Algorithm A has running time n3 , and algorithm B has a running time
1.01n , where n is the input size. Which algorithm is faster if the input
size is 1000?
3. An algorithm has running time n81 , where n is the input size. What is the
largest input size for which the algorithm finishes in a year running on a
supercomputer achieving 100 exaflops? An exaflop means 1018 floating
point operations per second. Assume that one computational step needs
one floating point operation. How many years does it take to run this
algorithm on the mentioned supercomputer if the input size is n = 3?
4. * Let B be the unit Euclidian ball in a d-dimensional space. Let S be
Background on computational complexity 27

a regular d-simplex whose circumscribed hypersphere is B. Prove that


the inscribed hypersphere of S has radius d1 .

5. * A degree sequence is a list of non-negative integers. A graph G is a


realization of a degree sequence D if the degrees of the vertices in G are
exactly the elements of D. Show that for any degree sequence D, the
number of realizations is the number of integer points in a convex body.

6. ◦ Assume that the decision problem A has a random algorithm with the
following properties:
(a) Its running time grows polynomially with the size of the input.
(b) If the correct answer is “yes”, it answers “yes” with probability
1/2.
1
(c) If the correct answer is “no”, it answers “no” with probability 2 +
1
n3 , where n is the size of the problem instance.

Show that A is in BPP.


7. ◦ Assume that problem A is in BPP. Show that there is a random
algorithm, that for any problem instance x in A and  > 0,
(a) its running time grows polynomially with the size of the input.
(b) it runs in polynomial time with − log(),
(c) if the correct answer is “yes”, it answers “yes” with probability at
least 1 − , and
(d) if the correct answer is “no”, it answers “no” with probability at
least 1 − .

8. * An RP algorithm answers “yes” for some input problem instance.


What can we say about the probability that the answer is correct?

9. * An RP algorithm answers “no” for some input problem instance. What


can we say about the probability that the answer is wrong?
10. ◦ Assume that the decision problem A has a random algorithm with the
following properties:

(a) Its running time grows polynomially with the size of the input.
1
(b) If the correct answer is “yes”, it answers “yes” with probability n3 ,
where n is the size of the problem instance.
(c) If the correct answer is “no”, it answers “no” with probability 1.

Show that A is in RP.


28 Computational complexity of counting and sampling

11. ◦ Assume that problem A is in RP. Show that there is a random algo-
rithm, that for any problem instance x in A and  > 0,

(a) it runs in polynomial time with the size of x,


(b) it runs in polynomial time with − log(),
(c) if the correct answer is “yes”, it answers “yes” with probability at
least 1 − , and
(d) if the correct answer is “no”, it answers “no” with probability 1.

12. Prove that the total variation distance is indeed a distance on the space
of distributions over the same countable domain. For any such domain
X, let F be the set of distributions over X. Let p1 , p2 , p3 ∈ F. Then the
following equations hold.

dT V (p1 , p1 ) = 0
dT V (p1 , p2 ) ≥ 0
dT V (p1 , p2 ) = dT V (p2 , p1 )
dT V (p1 , p2 ) + dT V (p2 , p3 ) ≥ dT V (p1 , p3 ).

13. * Let π and p be two arbitrary distributions over the same countable
domain. Prove that

dT V (p, π) = max (p(A) − π(A))


A⊆X

where X
p(A) := p(x).
x∈A

14. Prove that the inequality in Equation (1.32) holds for any n ≥ 2 and
c ≥ 1.
15. ◦ Assume that the correct answer f of a problem instance x in the
counting problem #A is an integer, and naturally bounded by 1 and
cpoly(|x|) for some c > 1. Furthermore, assume that #A is in FPRAS.
Show that a random estimation fˆ to the solution can be given with the
following properties:

(a) For the expected value of fˆ, fˆ, it holds that

|fˆ − f | 1
≤ .
f poly(|x|)
Background on computational complexity 29

(b) For the variance of fˆ, σf2ˆ it holds that

σf2ˆ 1
≤ .
f poly(|x|)

(c) The estimation fˆ can be given in polynomial time.

16. * Let G~ be a directed graph. A cycle cover is a disjoint union of directed


~ Show
cycles that covers all vertices. Let A be the adjacency matrix of G.
~
that the number of cycle covers in G is the permanent of A.
~ there exists a graph H such that
17. Show that for any directed graph G,
~
the number of cycle covers in G is the number of perfect matchings in
H.

1.9 Solutions
Exercise 4. It is sufficient to show that for any regular d-simplex, the ratio
of the radii of the inscribed and circumscribed hyperspheres is d1 . Let the
coordinates of the regular d-simplex S be the unit vectors of the coordinate
system of a d + 1-dimensional space (that is, (1, 0, 0, . . . , 0), (0, 1, 0, 0, . . . , 0),
etc.). The inscribed hypersphere hits the surface of the simplex  in the middle of
the facets. The coordinates of these points are 0, d1 , d1 , . . . , d1 , d1 , 0, d1 , . . . , d1 ,


etc. Observe that these points are also vertices of a regular d-simplex S 0 , whose
circumscribed hypersphere is the inscribed hypersphere of S. Thus the ratio
of the radii of the inscribed and circumscribed hyperspheres √ of S is the ratio
of the edge lengths of S 0 and S. Since the edge length of S 0 is d2 and the edge

length of S is 2, the ratio in question is indeed d1 .
n
Exercise 5. Let D = {d , d , . . . , d }. The convex polytope is in R( 2 ) . Let
1 2 n
us denote the coordinates by the index pairs (i, j), where i < j. The linear
inequalities defining the polytope are

0 ≤ x(i,j) ≤ 1

and for each i = 1, 2, . . . , n


i−1
X n
X
x(j,i) + x(i,j) = di .
j=1 j=i+1

The bijection between the realizations G = (V, E) and the integer points in
30 Computational complexity of counting and sampling

the defined polytope is given by setting each x(i,j) to 1 if (vi , vj ) ∈ E and 0


otherwise.
Exercise 6. Repeat the given algorithm an appropriate number of times
and take the majority answer. Use Chernoff’s inequality to show that the
appropriate number of repeats is indeed a polynomial function of n.
Exercise 7. Repeat the BPP algorithm an appropriate number of times,
and take the majority answer. Use Chernoff’s inequality to show that the
appropriate number of repeats to achieve the prescribed properties is indeed
a polynomial function of − log().
Exercise 8. This is a tricky question. One might think that the probability is
1 1
2 , since an RP algorithm says “yes” with 2 probability if the correct answer
is “yes”. However, it answers “no” with probability 1 if the correct answer
is “no”, therefore a “yes” answer ensures that the correct answer is “yes”.
Therefore the probability that the RP algorithm gave the correct answer is
actually 1.
Exercise 9. This is again a tricky question. We do not have information
about the problem (whether or not the correct answer is “no”), so we cannot
calculate exactly what the probability is that the answer was wrong. However,
it is wrong with probability at most 12 due to the definition of RP.
Exercise 10. Repeat the given algorithm an appropriate number of times,
and answer “yes” if there is at least one “yes” answer, and “no” otherwise.
Use the fact that for any f (n) tending to infinity it holds that
 f (n)
1 1
1− ≈ .
f (n) e

Show that a polynomial number of repeats is sufficient to get an RP algorithm.


Exercise 11. Repeat the RP algorithm an appropriate number of times,
and answer “yes” if there is at least one “yes” answer, and “no” otherwise.
Basic algebraic considerations show that the appropriate number of repeats to
achieve the prescribed properties is indeed a polynomial function of − log().
Exercise 13. Define set B as

B := {x|p(x) ≥ π(x)}.

Observe that X X
|p(x) − π(x)| = |p(x) − π(x)|.
x∈B x∈B

Indeed,
X X X X
(p(x) − π(x)) + (p(x) − π(x)) = p(x) − π(x) = 0,
x∈B x∈B x∈B∪B x∈B∪B

therefore X X
(p(x) − π(x)) = − (p(x) − π(x)).
x∈B x∈B
Background on computational complexity 31

For any x ∈ B, p(x) − π(x) is negative, therefore,

−(p(x) − π(x)) = |p(x) − π(x)|.

What follows is that


X
dT V (p, π) = (p(x) − π(x)).
x∈B

We claim that B is a set on which the supremum

sup (p(A) − π(A))


A⊆X

is realized. Indeed, for any C ⊆ X, the equation

p(C) − π(C) = p(B) − π(B) − (p(B \ C) − π(B \ C)) + (p(C \ B) − π(C \ B))

holds. However, p(B \C)−π(B \C) cannot be negative and p(C \B)−π(C \B)
cannot be positive due to the definition of B. Therefore,

p(C) − π(C) ≤ p(B) − π(B),

thus the supremum is indeed taken on B.


Exercise 15. Use the FPRAS algorithm for an estimation fˆ and replace it
with 1 or cpoly(|x|) if the estimation is out of the boundaries. Set ε and δ in
such a way that the prescribed inequalities for fˆ and σf2ˆ hold. Show that it is
enough to set both 1ε and − log(δ) to some polynomial of |x|, thus, the overall
running time of this procedure is only a polynomial function of |x|.
Exercise 16. Recall that the permanent of A is
n
X Y
per(A) := ai,σ(i) .
σ∈Sn i=1

A product is 1 if and only if for all i, ai,σ(i) = 1, namely, (i, σ(i)) is a directed
edge. We claim that there is a bijection between permutations for which the
product is 1 and the cycle covers in G. ~ The bijection is given by the two
representations of σ, the function representation, namely, for each i, σ(i) is
given, and the cycle representation of σ. Indeed, if a product is 1, then each
ai,σ(i) = 1, and these edges form a cycle cover in G. ~ On the other hand, each
cycle cover indicates a permutation σ, such that each ai,σ(i) = 1.
Part I

Computational Complexity
of Counting

33
Chapter 2
Algebraic dynamic programming and
monotone computations

2.1 Introducing algebraic dynamic programming . . . . . . . . . . . . . . . . . . . . 36


2.1.1 Recursions, dynamic programming . . . . . . . . . . . . . . . . . . . . . . 36
2.1.2 Formal definition of algebraic dynamic programming . . 45
2.1.3 The power of algebraic dynamic programming: Variants
of the money change problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.1.3.1 Counting the coin sequences summing up
to a given amount . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.1.3.2 Calculating the partition polynomial . . . . . . 49
2.1.3.3 Finding the recursion for optimization with
algebraic dynamic programming . . . . . . . . . . 49
2.1.3.4 Counting the total sum of weights . . . . . . . . . 50
2.1.3.5 Counting the coin sequences when the
order does not count . . . . . . . . . . . . . . . . . . . . . . . 51
2.2 Counting, optimizing, deciding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.3 The zoo of counting and optimization problems solvable with
algebraic dynamic programming . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.3.1 Regular grammars, Hidden Markov Models . . . . . . . . . . . . . 59
2.3.2 Sequence alignment problems, pair Hidden Markov
Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
2.3.3 Context-free grammars . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
2.3.4 Walks on directed graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
2.3.5 Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2.4 Limitations of the algebraic dynamic programming approach . . 89
2.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
2.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

Dynamic programming is a powerful algorithmic technique to find the solution


of a larger problem using solutions of smaller problems. It is mainly used in
optimization problems, however, it is also applicable for counting problems.
This chapter introduces a framework called algebraic dynamic programming
which provides a unified framework of counting, sampling and optimizing.
The name of algebraic dynamic programming was coined by Robert
Giegerich and its theory was developed only for context-free grammars [76].
It was extended to further computational problems in bioinformatics [186].

35
36 Computational complexity of counting and sampling

In this book, we extend this theorem basically for any dynamic programming
recursions. The main idea is to separate recursions building up combinatorial
objects from using exact computations. To do this, we are going to intro-
duce yield algebras that build the combinatorial objects (see Definition 19)
and evaluation algebras that do the computations on these combinatorial ob-
jects (see Definition 21). Before giving the formal definitions, we introduce the
concept via several well-known examples.

2.1 Introducing algebraic dynamic programming


2.1.1 Recursions, dynamic programming
Dynamic programming is one of the most fundamental methods in algo-
rithmics. In a dynamic programming algorithm, larger problem instances are
solved recursively using the solutions of smaller problem instances. In this way,
any recursion can be considered as a dynamic programming. The relationship
between algebraic dynamic programming and enumerative combinatorics is
shown via a few examples.
Fibonacci numbers already appeared in Indian mathematics before Fi-
bonacci as the number of k-unit-long sequences of patterns consisting of short
syllables that are 1 unit of duration and long syllables that are 2 units of
duration. Outside of India, the first appearance of the Fibonacci numbers is
in the book “Libre Abaci” by Leonardo Bonacci, also known as Fibonacci. Fi-
bonacci considered the growth of an idealized (biologically unrealistic) rabbit
population. In the first month, a newborn couple of rabbits are put in a field.
Newborn rabbits grow up in a month, never die, and after becoming adults,
each couple produces a new couple of rabbits in each month. Therefore, if Ni
denotes the number of newborn couples in the ith month and Ai denotes the
number of adult couples in the ith month, then

Ai+1 = Ai + Ni (2.1)

and
Ni+1 = Ai . (2.2)
Using these recursions, it is easy to find the number of couples in month i.
Indeed, let Fi denote the total number of couples in the ith month.
Then

Fi = Ai + Ni = (Ai−1 + Ni−1 ) + Ai−1 =


(Ai−1 + Ni−1 ) + (Ai−2 + Ni−2 ) = Fi−1 + Fi−2 (2.3)

which is the well-known recursion for the Fibonacci numbers. Assume also that
we would like to count the children, grandchildren and grand-grandchildren of
Algebraic dynamic programming and monotone computations 37

the founding couple, and in general, the rabbit couples of the k th generation in
month i. Then we can assign the following characteristic polynomial to each
subpopulation:
i
X
gAi := ai,k xk (2.4)
k=1
i
X
gNi := ni,k xk (2.5)
k=1

where ai,k is the number of adult couples of the k th generation in the ith
month and ni,k is the number of newborn couples of the k th generation in the
ith month. The first generation is the founding couple. The recursions for the
characteristic polynomials are:

gAi+1 = gAi + gNi (2.6)


gNi+1 = xgAi . (2.7)

The number of adult (newborn) rabbit couples of the k th generation in the ith
month is the coefficient of xk in gAi (gNi ). Let ri,k denote the total number
of rabbit couples of the k th generation in the ith month. Then

ri,k = ai,k + ni,k = (ai−1,k + ni−1,k ) + ai−1,k−1


(ai−1,k + ni−1,k ) + (ai−2,k−1 + ni−2,k−1 ) = ri−1,k + ri−2,k−1 . (2.8)

From this equation, we get that


 
i−k
ri,k = . (2.9)
k−1

Indeed, ri,k = 0 for any i ≤ 2k − 2, and r2k−1,k = 1, since each couple


of rabbits has to grow up before giving birth, and thus, the k th generation
appears first in the 2k − 1st month. Therefore, r2k−1,k is indeed k−1

k−1 = 1.
For other coefficients, we get that
     
i−k−1 i−k−1 i−k
ri,k = ri−1,k + ri−2,k−1 = + = . (2.10)
k−1 k−2 k−1

From Equation (2.9) we get that

bX2 c
i+1
bX2 c
i+1
 bX2 c
i−1

i−k i−k−1
Fi = ri,k = = , (2.11)
k−1 k
k=1 k=1 k=0

which is the well-known equality saying that the Fibonacci numbers are the
sums of the “shallow” diagonals in Pascal’s triangle.
38 Computational complexity of counting and sampling

There is an obvious similarity between Equations (2.1) and (2.2) and Equa-
tions (2.6) and (2.7). Both couples of recursions calculate sums over sets. In-
deed, let Ai denote the set of adult rabbits in the ith month. Let f1 assign
the constant 1 to each rabbit couple, and let f2 assign xk to a rabbit couple
in the k th generation. Then
X
Ai = f1 (r) (2.12)
r∈Ai

and X
gAi = f2 (r), (2.13)
r∈Ai

and similar equations hold for Ni and gNi . We can also obtain recursions
directly on the sets. Define two operations on rabbits, the o: “get one month
older!” and the b: “give birth!” operators. Also define the action of these
operators on sets of rabbit couples as

o ∗ A := {o ∗ a|a ∈ A}, (2.14)

and similarly for b. Then the recursions for the subsets of populations are

Ai+1 = o ∗ Ai ∪ o ∗ Ni (2.15)
Ni+1 = b ∗ Ai . (2.16)

In the algebraic dynamic programming approach, we will use such recursions


to build up yield algebras, which describe how to build up sets subject to com-
putations. The evaluation algebras describe what kind of computations have
to be performed for each operator in the yield algebra. When we would like to
count the rabbit couples, each operation has to be replaced by multiplication
by 1, and the set union with the addition. When we would like to obtain the
generating function to count the rabbit pairs of a given generation in a given
month, then the o operation has to be replaced by multiplication by 1, and
the b operation has to be replaced by multiplication by x. Such replacements
indeed lead to the recursions in Equations (2.1)–(2.2) and (2.6)–(2.7).
Instead of applying operations on rabbits, we can assign a barcode to each
rabbit couple in each month. The barcode in the ith month is an i-character-
long sequence from the alphabet {0, 1}. It starts with a 1 and each o operation
extends the sequence with a 0 and each b operation extends it with a 1 (so
newborn rabbits get their parents’ barcode extended with a 1). It is easy to
see that each rabbit couple has a unique barcode, and the barcodes are exactly
those sequences from the alphabet {0, 1} which start with a 1 and there are
no consecutive 1s. In such a sequence of length i, replace any 01 substring
with a 2, then every 0 with 1, and then delete the starting 1. The resulting
sequence contains 1s and 2s and the sum of the numbers is i − 1. Since the
number of possible sequences consisting of k 2s and i − 1 − 2k 1s is i−1−k
k , Fi
is indeed the sum of the appropriate shallow diagonal in Pascal’s triangle. The
Algebraic dynamic programming and monotone computations 39

careful reader can also observe that the obtained sequences from the alphabet
{1, 2} decode the possible patterns of short and long syllables of a given total
duration, thus the numbers of such patterns are also indeed the Fibonacci
numbers.
As we can see, the Fibonacci numbers also appear as the number of cer-
tain legitimate code words or regular expressions. We can count any regular
expressions using recursions. Consider the following example.

Example 2. A code word from the alphabet {0, 1, 2, 3} is said to be legitimate


if it contains an odd number of 2s. How many code words of length n are
legitimate? Give an efficient recursion to find the sum of the numbers in the
legitimate code words of length n.
Solution. The first question might be a standard exercise in any introductory
combinatorics course, while the second one is somewhat unusual. Both ques-
tions can be answered if we first set up recursions concerning how to build up
legitimate and illegitimate code words.
Let Li denote the set of the legitimate code words of length i, and let
Ii denote the set of illegitimate code words of length i. Let ◦ denote the
concatenation operator and define this operator on a set as

A ◦ b := {A ◦ b|A ∈ A}. (2.17)

Extending any legitimate code word with 0, 1 or 3 keeps it legitimate, while


extending it with 2 makes it illegitimate. Similarly, extending an illegitimate
code word with 2 makes it legitimate and extending it with 0, 1 or 3 keeps it
illegitimate. Therefore, the recursions for the set of legitimate and illegitimate
code words are

Li+1 = (Li ◦ 0) ∪ (Li ◦ 1) ∪ (Li ◦ 3) ∪ (Ii ◦ 2) (2.18)

and
Ii+1 = (Ii ◦ 0) ∪ (Ii ◦ 1) ∪ (Ii ◦ 3) ∪ (Li ◦ 2) (2.19)
with the initial sets L1 = {2} and I1 = {0, 1, 3}. To count the legitimate and
illegitimate code words, we simply have to find the size of the sets appearing
in Equations 2.18 and 2.19. Observe that each code word appears exactly once
during the recursions, therefore

nLi+1 = 3nLi + nIi (2.20)

and
nIi+1 = 3nIi + nNi (2.21)
where nLi (nIi ) is the number of legitimate (illegitimate) code words of length
i. The initial values are nL1 = 1 and nI1 = 3.
It is possible to find a closed form for such linear recurrences; such solutions
can be found in any standard combinatorics textbook. From a computational
40 Computational complexity of counting and sampling

FIGURE 2.1: A stair-step shape of height 5 tiled with 5 rectangles. The


horizontal lines inside the shape are highlighted as dotted. The four circles
indicate the four corners of the rectangle at the top left corner that cuts the
stair-step shape into two smaller ones. See text for details.

complexity point of view, the given recursions are sufficient to calculate nLi
in a time proportional to some polynomial of i, therefore we are not going to
provide closed forms here.
We also obtain recursions for the sum of the numbers in legitimate
and illegitimate code words of a given length from the recursions in Equa-
tions (2.18) and (2.19). Extending with a number k each code word in a set
of size m increases the sum of the numbers by mk. Therefore the recursions
are
sLi+1 = sLi + 4nLi + sIi + 2nIi (2.22)
and
sIi+1 = sIi + 4nIi + sLi + 2nLi (2.23)
where sLi (sIi ) is the sum of the numbers in the legitimate (illegitimate)
code words of length i. These recursions are sufficient to find the sum of the
numbers in legitimate code words of length n using O(poly(n)) arithmetic
operations. 

In both cases, the recursions were deducted from recursions in Equa-


tions (2.18) and (2.19). Namely, the recursions building the sets of objects
of interest are fixed, and different calculations can be derived from these re-
cursions. This is how to build different evaluation algebras from the same yield
algebra in the algebraic dynamic approach.
Non-linear recursions also appear in enumerative combinatorics. The best
known non-linear recursions are those that generate the Catalan structures,
that is, those combinatorial objects that have Cn number of instances of size
n, where Cn is the nth Catalan number. See the following example.
Algebraic dynamic programming and monotone computations 41

Example 3. A stair-step shape of height n is the shape of the upper diagonal


part of a square matrix of dimensions (n + 1) × (n + 1). Find the number of
ways to tile a stair-step shape of height n with n rectangles. Rotate this stair-
step shape in such a way that its corner is the top left corner (see Figure 2.1).
Observe that different tilings might have a different number of horizontal lines
inside the given shape. Compute the sum of the number of these horizontal
lines in such tilings inside the stair-step shape of height n.
Solution. Since the stair-step shape is tiled with n rectangles, each step must
belong to a different rectangle. Therefore, one of the rectangles spans from
a step to the top left corner, see Figure 2.1 for an example. Consider this
rectangle in the top left corner in a tiling of a stair-step shape of height n.
This rectangle has dimensions i × (n − i + 1) for some i = 1, . . . , n. Removing
this rectangle splits the tiling into tilings of stair-step shapes of height n − i
and i − 1. Here a stair-step shape of 0 height is an empty structure. The
inverse of this operation is the merging of two stair-step shapes of height n − i
and i − 1 by inserting a rectangle with dimensions i × (n − i + 1) between
the stair-step shapes. If ◦ denotes this inverse operation, Ti denotes the set of
tilings of stair-step shapes of height i, and the ◦ operation is defined on sets
as
A ◦ B := {a ◦ b|a ∈ A and b ∈ B}, (2.24)
then
Tn+1 = ∪ni=0 Ti ◦ Tn−i (2.25)
where T0 is the set containing the empty structure of stair-step shape tiling of
height 0. Furthermore, this is the initial case, therefore |T0 | = 1. The careful
reader might already have observed that |Tn | = Cn , since

|Ti ◦ Tj | = |Ti ||Tj | (2.26)

and the Catalan numbers also satisfy the recursion


n
X
Cn+1 = Ci Cn−i (2.27)
i=0

with the initial condition C0 = 1.


Any fixed tiling in Tn−i appears Ci times in the set Ti ◦ Tn−i . Furthermore,
any rectangle joining the tilings in Ti and in Tn−1 has one horizontal line inside
the shape except the rectangle with dimension n × 1. Therefore the recursion
for the number of horizontal lines, hn , is
n
X
hn+1 = hn + (Ci hn−i + hi Cn−i + Ci Cn−i ). (2.28)
i=1


42 Computational complexity of counting and sampling

The careful reader might observe some similarity between Equation (2.28)
and Equations (2.22) and (2.23). In both cases, the recursion finds the sum of
an additive function over combinatorial objects, and the recursions use both
these sums and the sizes of these smaller sets. We are going to introduce a
commutative ring and will show that calculations involving these recursions
can be naturally described in that ring.
In many textbooks, the introductory example for dynamic programming
algorithms is the money change problem. We too introduce this problem to-
gether with its solution and show how it also fits into the algebraic dynamic
programming approach.
Definition 18. Let C = {c1 , c2 , . . . , ck } be a finite set of positive integers
called the coin system, and let x be a non-negative integer. The money change
problem is to find the minimum number of coins necessary to change x. Each
coin type has an infinite supply.
To solve the money change problem, a function m is defined that maps the
natural numbers N to N ∪ ∞; for each x ∈ N, the value m(x) is the minimum
number of coins necessary to change x if changing of x is possible in this coin
set C and ∞ otherwise. The following theorem is true for function m.
Theorem 18. For x = 0, equation

m(0) = 0 (2.29)

holds, and for x > 0, equation

m(x) = min {m(x − ci ) + 1} (2.30)


i∈[1,k]
x−ci ≥0

holds, where the minimum of the empty set is defined as ∞.


Proof. Equation (2.29) is obvious. To prove Equation (2.30), inequalities in
both directions should be proved. If x cannot be changed, then m(x) = ∞
and naturally the inequality

m(x) \ge \min_{i \in [1,k],\ x - c_i \ge 0} \{m(x - c_i) + 1\}    (2.31)

holds. If x is changeable, then let c_{i_1}, c_{i_2}, ..., c_{i_{m(x)-1}}, c′ be a minimal change
of x, that is, one containing a minimum number of coins summing up to x. Then
c_{i_1}, c_{i_2}, ..., c_{i_{m(x)-1}} is a change for x − c′ and thus cannot contain fewer coins
than m(x − c′). Furthermore, a member of a set cannot be smaller than the
minimal value of that set, thus

m(x) \ge m(x - c′) + 1 \ge \min_{i \in [1,k],\ x - c_i \ge 0} \{m(x - c_i) + 1\}.    (2.32)

This means that for any x, the left-hand side of Equation (2.30) is greater
than or equal to the right-hand side. If the set from which the minimum is
taken on the right-hand side is empty for some x, then the minimum is defined as ∞ and
naturally the inequality

m(x) \le \min_{i \in [1,k],\ x - c_i \ge 0} \{m(x - c_i) + 1\}    (2.33)

holds. If the set is not empty, then let c′ be the coin value for which the
minimum is taken, and let c_{i_1}, c_{i_2}, ..., c_{i_{m(x-c′)}} be a minimal change for x − c′.
Then the set of coins c_{i_1}, c_{i_2}, ..., c_{i_{m(x-c′)}}, c′ is a change for x; it contains
m(x − c′) + 1 coins, and this number cannot be less than m(x). Therefore

m(x) \le m(x - c′) + 1 = \min_{i \in [1,k],\ x - c_i \ge 0} \{m(x - c_i) + 1\}.    (2.34)

Since for all x > 0 inequalities in both directions hold, Equation (2.30) also holds.
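A direct implementation of this recursion is straightforward. The following Python sketch (the name min_coins is ours) uses math.inf in the role of ∞:

import math

def min_coins(coins, x_max):
    # m[x] = minimum number of coins changing x, or math.inf
    m = [math.inf] * (x_max + 1)
    m[0] = 0                                  # Equation (2.29)
    for x in range(1, x_max + 1):
        candidates = [m[x - c] + 1 for c in coins if x - c >= 0]
        if candidates:                        # min of the empty set is infinity
            m[x] = min(candidates)            # Equation (2.30)
    return m

print(min_coins([1, 5, 7], 10)[10])   # 2, e.g. 5 + 5
print(min_coins([5, 7], 11)[11])      # inf: 11 cannot be changed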

We can also build recursions on the possible coin sequences. Such recur-
sions hold without stating what we want to calculate. After proving the cor-
rectness of the recursion on the possible coin sequences, we can solve different
optimization and counting problems by replacing the operations in the given
recursions. The construction is stated in the following theorem.

Theorem 19. Let C = {c_1, c_2, ..., c_k} be a coin system, c_i ∈ Z^+, and let
S(x) denote the set of coin sequences (repetitions are possible, order counts)
summing up to x. Then

S(0) = {ε}    (2.35)

and for x > 0

S(x) = \bigsqcup_{i \in [1,k],\ x - c_i \ge 0} \{s ◦ c_i \mid s \in S(x - c_i)\}    (2.36)

where ε is the empty string, ◦ denotes string concatenation, and the use of
disjoint union is to emphasize that each string in S(x) is generated exactly
once. Furthermore, the empty disjoint union (which is the case when x − c_i < 0
for all c_i) is defined as the empty set.

Proof. Equation (2.35) is trivial: only the empty sum of coin values has
value 0. To prove Equation (2.36), it is sufficient to prove that

S(x) \subseteq \bigsqcup_{i \in [1,k],\ x - c_i \ge 0} \{s ◦ c_i \mid s \in S(x - c_i)\}  ∀x > 0    (2.37)

and

S(x) \supseteq \bigsqcup_{i \in [1,k],\ x - c_i \ge 0} \{s ◦ c_i \mid s \in S(x - c_i)\}  ∀x > 0.    (2.38)

To prove Equation (2.37), consider the partitioning of the possible sequences
of coin values summing up to x based on the last coin value. For each partition
class, and for each sequence in that class, if the last coin c_i is removed, the
remaining sequence is in S(x − c_i). Thus any sequence in S(x) is in the
disjoint union on the right-hand side.

To prove Equation (2.38), observe that each sequence in the disjoint union
is in S(x), since the sum of the coin values in each sequence is x. Furthermore,
there is no multiplicity in the disjoint union, since there is no multiplicity in
any of the S(x − c_i) sets, and two sequences cannot be the same if their last
coin values are different.

Theorem 19 immediately provides a recursion to count the number of coin
sequences summing up to a given amount: simply the size of S(x) is to be
calculated. Due to Equation (2.36), the following equality is true:

|S(x)| = \sum_{i \in [1,k],\ x - c_i \ge 0} |S(x - c_i)|.    (2.39)

Furthermore, the initial value

|S(0)| = 1    (2.40)

can be read out from Equation (2.35). This calculation can be formalized in
the following way. Let f(s) := 1 for any coin sequence s, and let

F(S(x)) := \sum_{s \in S(x)} f(s).    (2.41)

Then F(S(x)) is the size of the set S(x), and it holds that

F(S(x)) = \sum_{i \in [1,k],\ x - c_i \ge 0} F(S(x - c_i)).    (2.42)

This is the evaluation algebra, which can be changed without changing the
underlying yield algebra. For example, let g(s) := z^{|s|} (here z is an indeterminate)
and let

G(S(x)) := \sum_{s \in S(x)} g(s).    (2.43)

Then G(S(x)) is a polynomial in which the coefficient of z^k is the number of
those coin sequences of length k in which the coin values sum up to x. This
polynomial is called the partition polynomial. It is easy to see that

g(s ◦ c_i) = g(s)z  ∀s, c_i    (2.44)

thus

G(S(x)) = \sum_{i \in [1,k],\ x - c_i \ge 0} G(S(x - c_i)) z.    (2.45)

In this way, it is possible to count the number of coin sequences of length
k summing up to x for arbitrary k and x. Observe that once G(S(x)) is
calculated, the optimization problem can also be solved. Indeed, the minimum
number of coins necessary to change x is the degree of the minimum-degree
monomial in G(S(x)).
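Both evaluation algebras run on the same recursion; only the carried values differ. In the following Python sketch (function names are ours), the partition polynomial is represented as a dictionary mapping the exponent k of z^k to its coefficient:

def count_sequences(coins, x_max):
    F = [0] * (x_max + 1)
    F[0] = 1                                  # the empty sequence
    for x in range(1, x_max + 1):
        F[x] = sum(F[x - c] for c in coins if x - c >= 0)   # Equation (2.42)
    return F

def partition_polynomial(coins, x_max):
    G = [dict() for _ in range(x_max + 1)]
    G[0] = {0: 1}                             # z^0: the empty sequence
    for x in range(1, x_max + 1):
        for c in coins:
            if x - c >= 0:
                for k, coeff in G[x - c].items():
                    # multiplication by z shifts every exponent by one
                    G[x][k + 1] = G[x].get(k + 1, 0) + coeff  # Equation (2.45)
    return G

coins = [1, 2]
print(count_sequences(coins, 5)[5])   # 8 coin sequences sum up to 5
G5 = partition_polynomial(coins, 5)[5]
print(G5)                             # {3: 3, 4: 4, 5: 1}
print(min(G5))                        # 3 = m(5), the minimum-degree monomial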

2.1.2 Formal definition of algebraic dynamic programming


Below we give the formal description of algebraic dynamic programming.
As mentioned, two algebras, the yield algebra and the evaluation algebra, have
to be constructed. A yield algebra describes how to build up the space subject
to calculations. First, we give its definition.
Definition 19. A yield algebra is a tuple {A, (Θ, ≤), p, O, R}. A is a set of
objects, and (Θ, ≤) is a partially ordered set called the parameters. Let B ⊆ Θ be the
subset of minimal elements. B is required to be non-empty, and each θ ∈ Θ
must be accessible from it in a finite number of steps. Both A and Θ might
be countably infinite. The function p maps A to Θ. O is a finite set of partial
operators, possibly with different arities. Each operator is defined on A. If ◦ is
an m-ary operator, the expression

a_1 ◦ a_2 ◦ ... ◦ a_m

is also written as

◦((a_j)_{j=1}^{m}).

The set of recursions, R, contains a recursion for each θ ∈ Θ \ B in the form

S(θ) = \bigsqcup_{i=1}^{n} ◦_i((S(θ_{i,j}))_{j=1}^{m_i})    (2.46)

where

S(θ) := {a ∈ A | p(a) = θ}    (2.47)

and

◦_i((S(θ_{i,j}))_{j=1}^{m_i}) := {◦_i((a_j)_{j=1}^{m_i}) | a_j ∈ S(θ_{i,j})}    (2.48)

where ◦_i ∈ O is an m_i-ary operator. These operators might not be defined
on some arguments; however, they must be defined on all possible arguments
appearing in the recursions in R. Furthermore, for all i and j, θ_{i,j} < θ.
For example, in the case of the population dynamics of rabbits following
the Fibonacci numbers, A contains the set of adult and newborn rabbits in
given years. Θ contains the (i, l) pairs, where i indicates the year and l is

a label, which is either “adult” or “newborn”. Parameter (i1 , l1 ) ≤ (i2 , l2 )


if i1 < i2 or i1 = i2 and l1 = l2 . The operators are the already introduced
“give birth!” and “get older!” operators, and the recursions have been given
in Equations (2.15) and (2.16).
In the stair-step shape tiling problem, A is the set of possible tilings. The
parameter n tells the height of the tiled shape. The ordering of these param-
eters is the natural ordering of the natural numbers. The binary operator ◦
merges two tilings of shapes of heights k and l via a (k + 1) × (l + 1) rectangle,
thus forming a tiling of a shape of height k + l + 1. The recursion is given in
Equation (2.25).
In the money change problem, A is the set of coin sequences of finite length,
(Θ, ≤) is the natural numbers with the usual ordering, and p assigns the sum
of the coin values to each coin sequence. O contains a unary operation for each
coin value such that ◦ci (s) = s ◦ ci . Then R indeed contains the recursions in
Equation (2.36).
We are going to define the evaluation algebra, which describes what to
calculate in the dynamic programming algorithms. Each evaluation algebra
contains an algebraic structure called semiring. For readers not familiar with
abstract algebra, the definition of semiring is given.

Definition 20. A semiring (R, ⊕, ) is an algebraic structure with two op-


erations. The operation ⊕ is called addition, and (R, ⊕) is a commutative
monoid, namely, a commutative semigroup with a unit, that is, ⊕ is an asso-
ciative and commutative operator on R. The operation is called multiplica-
tion, and (R, ) is a semigroup, that is, is an associative operator on R.
The distributive rule connects the two operations: for any a, b, c ∈ R, equations

a (b ⊕ c) = (a b) ⊕ (a c) (2.49)

and
(b ⊕ c) a = (b a) ⊕ (c a) (2.50)
hold. The additive unit is usually denoted by 0.
We would like to emphasize that any ring is also a semiring in which the
addition can be inverted (that is, there is also a subtraction). Readers not
familiar with abstract algebra might consider the integer ring as an example
of semirings with the additional rule that subtraction is forbidden. Later on,
we will see that subtraction has very high computational power: some computational
problems can be solved in polynomial time when subtraction is
allowed, while the same problems have an exponential lower bound on their
running time when only addition and multiplication are allowed.
In this chapter, we would like to study the computational problems
efficiently solvable using only additions and multiplications. This is why we
restrict evaluation algebras to semiring computations.
Definition 21. An evaluation algebra is a tuple {Y, R, f, T}, where Y is a
yield algebra, and f is a function mapping A (the set of objects in the yield
algebra) to some semiring R. T is a set of functions T_i mapping R^{m_i} × Θ^{m_i}
to R, where ◦_i is an m_i-ary operator, with the property that for any operands
of ◦_i

f(◦_i((a_j)_{j=1}^{m_i})) = T_i(f(a_1), ..., f(a_{m_i}); p(a_1), ..., p(a_{m_i})).    (2.51)

In many cases, the operator T_i does not depend on the parameters (i.e.,
p(a_1), ..., p(a_{m_i})). In those cases, the parameters will be omitted. We also
require that each T_i should be expressed with operations in the algebraic structure
R. When T_i depends on the parameters, the expression is given via a
hidden function h mapping Θ^{m_i} to some R^{n_i}, and then T_i is rendered as an
algebraic expression of m_i + n_i indeterminates (the m_i values of the f(a_j)s and
the n_i values coming from the image of h). Each operation T_i ∈ T must satisfy
the distributive rule with respect to the addition in the semiring, that is, for
any θ_{i,1}, ..., θ_{i,m_i}

T_i(\sum_{a_{i,1} \in S(θ_{i,1})} f(a_{i,1}), ..., \sum_{a_{i,m_i} \in S(θ_{i,m_i})} f(a_{i,m_i}); θ_{i,1}, ..., θ_{i,m_i})
= \sum_{a_{i,1} \in S(θ_{i,1})} ... \sum_{a_{i,m_i} \in S(θ_{i,m_i})} f(◦_i((a_{i,j})_{j=1}^{m_i})).    (2.52)

Our aim in the evaluation algebra is to calculate

F(S(θ)) := \sum_{a \in S(θ)} f(a).    (2.53)

Thus, the evaluation algebra tells us what to calculate in the recursions of the
yield algebra given in Equation (2.46). Due to the properties of the evaluation
algebra, the following theorem holds.
Theorem 20. Let Y = {A, (Θ, ≤), p, O, R} be a yield algebra, and let E =
{Y, R, f, T} be an evaluation algebra. If for some parameter θ, the recursion

S(θ) = \bigsqcup_{i=1}^{n} ◦_i((S(θ_{i,j}))_{j=1}^{m_i})    (2.54)

holds in the yield algebra, then

F(S(θ)) = \sum_{i=1}^{n} T_i(F(S(θ_{i,1})), ..., F(S(θ_{i,m_i})); θ_{i,1}, ..., θ_{i,m_i})    (2.55)

also holds. Namely, it is sufficient to know the values F(S(θ_{i,j})) for each
S(θ_{i,j}) to be able to calculate F(S(θ)).

Proof. Equation (2.55) is the direct consequence of the distributive property
of the T_i functions. Indeed,

F(S(θ)) = \sum_{i=1}^{n} \sum_{a_{i,1} \in S(θ_{i,1})} ... \sum_{a_{i,m_i} \in S(θ_{i,m_i})} f(◦_i((a_{i,j})_{j=1}^{m_i})).    (2.56)

From Equation (2.52), we get that

F(S(θ)) = \sum_{i=1}^{n} T_i(\sum_{a_{i,1} \in S(θ_{i,1})} f(a_{i,1}), ..., \sum_{a_{i,m_i} \in S(θ_{i,m_i})} f(a_{i,m_i}); θ_{i,1}, ..., θ_{i,m_i}).    (2.57)

We can write back the definition of the F function (that is, Equation (2.53))
into the arguments of the T_i function, and so we get Equation (2.55).
The amount of computation necessary to calculate F(S(θ)) depends on
how many θ′ < θ exist in Θ, how big n is in Equation (2.55), and how much
time it takes to calculate each T_i. This is stated in the theorem below.

Theorem 21. A computational problem X (X might be a decision problem,
an optimization problem or a counting problem) can be solved with algebraic dynamic
programming in polynomial running time if a pair of a yield algebra
Y = {A, (Θ, ≤), p, O, R} and an evaluation algebra E = {Y, R, f, T} exists such
that for any problem instance x ∈ X, the following holds:

(a) a polynomial time computable θ exists such that the solution of x is
F(S(θ)),

(b) the size of θ↓ := {θ′ | θ′ ≤ θ} is O(poly(|x|)); furthermore, for any
θ′ ∈ θ↓, the set of parameters covered by θ′ can be calculated in polynomial
time,

(c) for any θ′ ≤ θ, each T_i in Equation (2.55) can be calculated in
O(poly(|x|)) time, and

(d) for any θ′ ∈ B ∩ θ↓, F(S(θ′)) can be calculated in O(poly(|x|)) time.

2.1.3 The power of algebraic dynamic programming: Variants of the money change problem
To show the power of algebraic dynamic programming, we consider several
variants of the money change problem. We fix the following yield algebra.
A is the set of possible coin sequences, Θ is the natural numbers with the

usual ordering, and p maps the sequences to their summed coin values. The
unary operator ◦i extends a coin sequence with the coin ci . The recursions are
given in Equations (2.35) and (2.36). For this yield algebra, we give several
evaluation algebras that solve the following computational problems.

2.1.3.1 Counting the coin sequences summing up to a given amount
When the number of coin sequences summing up to a given value is calcu-
lated, R is the integer number ring and function f is the constant 1 function.
Each Ti is the identity function (recall that each ◦ci operation and thus, each Ti
operation is unary). Then indeed Equation (2.55) turns into Equation (2.42).

2.1.3.2 Calculating the partition polynomial

How do we change the evaluation algebra when f(s) = z^{|s|}? Then R = Z[z].
Each T_i is the multiplication by z. It is easy to see that Equation (2.55) turns
into Equation (2.45).

2.1.3.3 Finding the recursion for optimization with algebraic dynamic programming
To show the power of algebraic dynamic programming, an example is given
for solving optimization problems using algebraic dynamic programming. For
this, first the tropical semiring is defined.
Definition 22. The tropical semiring (R, ⊕, ⊙) is a semiring where
R = ℝ ∪ {∞}, ⊙ is the usual addition of the reals, extended by the symbol of
infinity with the identity relations

a ⊙ ∞ = ∞ ⊙ a = ∞  ∀a ∈ R    (2.58)

and a ⊕ b is the minimum of a and b, extended by the symbol of infinity with
the identity relations

a ⊕ ∞ = ∞ ⊕ a = a  ∀a ∈ R.    (2.59)

It is easy to see that the tropical semiring is indeed a commutative semiring,
that is, both (R, ⊙) and (R, ⊕) are commutative semigroups and the
distributive rule

a ⊙ (b ⊕ c) = (a ⊙ b) ⊕ (a ⊙ c)    (2.60)

holds for all a, b, c ∈ R. Note also that ∞ is the additive unit, therefore the
empty tropical summation is ∞.
The tropical semiring can be utilized to solve the money change problem
using algebraic dynamic programming. In the evaluation algebra, the algebraic
structure R is the tropical semiring. For a coin sequence s, we set f(s) = |s|.
Each T_i operator increases the current value by 1, so it is the tropical multiplication
by 1. The F function takes the minimum of the given values, namely,
it is the tropical addition. Then

F(S(x)) = \bigoplus_{i \in [1,k],\ x - c_i \ge 0} F(S(x - c_i)) ⊙ 1    (2.61)

which is exactly Equation (2.30) written with the operators of the tropical
semiring, with m(x) denoted by F(S(x)).
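The change of evaluation algebra can be made literal in code. The following Python sketch (names are ours) parameterizes the single recursion of the yield algebra by the semiring operations plus and times, their units zero and one, and the score weight(c) of one coin; instantiating it with the tropical semiring reproduces Equation (2.61), while the counting semiring gives Equation (2.42):

import math
from functools import reduce

def adp_money_change(coins, x_max, plus, times, zero, one, weight):
    F = [zero] * (x_max + 1)
    F[0] = one
    for x in range(1, x_max + 1):
        terms = [times(F[x - c], weight(c)) for c in coins if x - c >= 0]
        F[x] = reduce(plus, terms, zero)   # the empty sum is the additive unit
    return F

coins = [1, 5, 7]
# counting semiring (Z, +, *): number of coin sequences summing up to x
print(adp_money_change(coins, 10, lambda a, b: a + b,
                       lambda a, b: a * b, 0, 1, lambda c: 1))
# tropical semiring (min, +): m(x), the minimum number of coins
print(adp_money_change(coins, 10, min,
                       lambda a, b: a + b, math.inf, 0, lambda c: 1))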
We are going to introduce three variants of the tropical semiring which are
also used in optimization problems.

Definition 23. The dual tropical semiring (R, ⊕, ⊙) is a semiring where
R = ℝ ∪ {−∞}, a ⊕ b is the maximum of a and b, and ⊙ is the usual addition.
The dual tropical semiring can be obtained from the tropical semiring by
multiplying each element by −1.

The exponentiated tropical semiring (R, ⊕, ⊙) is a semiring where R =
ℝ^+ ∪ {∞}, a ⊕ b is the minimum of a and b, and ⊙ is the usual multiplication.
The exponentiated tropical semiring can be obtained from the tropical semiring
by taking the exponential of each element.

The dual exponentiated tropical semiring (R, ⊕, ⊙) is a semiring where
R = ℝ^+ ∪ {0}, a ⊕ b is the maximum of a and b, and ⊙ is the usual multiplication.
We will denote ℝ^+ ∪ {0} by ℝ^{≥0}. The dual exponentiated tropical
semiring can be obtained from the tropical semiring by first multiplying each
element by −1 and then taking the exponential of it.

2.1.3.4 Counting the total sum of weights

It is also possible to count the sum of the coin weights in all possible coin
sequences that sum up to a certain value. Assume that there is a weight function
w : C → ℝ^+. The weights might be arbitrary, even irrational numbers,
thus it would be computationally intractable to count the sequences with each
possible total sum. Instead, we can define the commutative ring (ℕ × ℝ, ⊕, ⊙)
where

(n_1, w_1) ⊙ (n_2, w_2) := (n_1 n_2, w_1 n_2 + w_2 n_1)    (2.62)

and

(n_1, w_1) ⊕ (n_2, w_2) := (n_1 + n_2, w_1 + w_2).    (2.63)

It is easy to see that both operations are commutative and associative and
the distributive rule

r_1 ⊙ (r_2 ⊕ r_3) = (r_1 ⊙ r_2) ⊕ (r_1 ⊙ r_3)    (2.64)

holds for any r_1, r_2, r_3 ∈ ℕ × ℝ. Here the first coordinate is the number of
coin sequences and the second coordinate is the total sum of weights. For
any two sets of coin sequences, both the number of sequences and the sum
of the weights are added in the union of the two sets. For the operation ◦_{c_i},
the associated function T_i is the multiplication by (1, w_i), where w_i is the
weight of c_i. Indeed, if A is a set of coin sequences, then the total sum of
weights in ◦_{c_i}(A) is w + n w_i, where w is the total sum of weights in A and
n is the number of sequences in A. Furthermore, the number of sequences in
◦_{c_i}(A) is still n. Therefore the appropriate recursion is

F(S(x)) = \bigoplus_{i \in [1,k],\ x - c_i \ge 0} F(S(x - c_i)) ⊙ (1, w_i).    (2.65)

The empty sum in this ring is (0, 0), since (0, 0) is the additive unit.
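This ring is easy to realize in code. In the following Python sketch (names are ours), a ring element is a pair (n, w) of the number of sequences and the total sum of their weights; the weight function chosen below is an arbitrary illustrative choice:

def ring_mul(a, b):                        # Equation (2.62)
    (n1, w1), (n2, w2) = a, b
    return (n1 * n2, w1 * n2 + w2 * n1)

def ring_add(a, b):                        # Equation (2.63)
    (n1, w1), (n2, w2) = a, b
    return (n1 + n2, w1 + w2)

def total_weights(coins, weights, x_max):
    F = [(0, 0.0)] * (x_max + 1)           # (0, 0) is the additive unit
    F[0] = (1, 0.0)                        # one empty sequence of weight 0
    for x in range(1, x_max + 1):
        for c in coins:
            if x - c >= 0:                 # Equation (2.65)
                F[x] = ring_add(F[x], ring_mul(F[x - c], (1, weights[c])))
    return F

weights = {1: 0.5, 2: 1.25}
print(total_weights([1, 2], weights, 5)[5])   # (8, 22.5)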

2.1.3.5 Counting the coin sequences when the order does not count

When the order of the coins does not count, the base set A must be changed
(and so the yield algebra), since several coin sequences contain the same coins,
just in a different order. In this case, the base set A must contain only the non-decreasing
coin sequences. (Θ, ≤) contains the pairs N × C, where C is the set
of coin values, and the partial ordering is coordinatewise, that is, for any two
members of Θ, (n_1, c_1) ≤ (n_2, c_2) if and only if n_1 ≤ n_2 and c_1 ≤ c_2. The p
function is

p(c_1 c_2 ... c_n) := (\sum_{i=1}^{n} c_i, c_n).    (2.66)

The operator set O contains the unary operators ◦_{c_i}, which still concatenate
c_i to the end of a coin sequence. However, this operator can be applied only to
sequences that end with a coin whose value is at most c_i. This is guaranteed
in the recursion of the yield algebra, since the recursion is

S(x, c_i) = \bigsqcup_{c_j \le c_i,\ x - c_i \ge 0} ◦_{c_i}(S(x - c_i, c_j)).    (2.67)

Once the yield algebra is obtained, several evaluation algebras can be associated
with it. For example, to count the number of sets of coins that sum up to
a given value, the f function is the constant 1 in the evaluation algebra and
each T_i operator is the identity function.
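For instance, counting the coin multisets can be sketched in Python as follows (names are ours); N[x][c] is the number of non-decreasing coin sequences summing up to x whose last, and hence largest, coin is c:

def count_multisets(coins, x_max):
    coins = sorted(coins)
    N = [{c: 0 for c in coins} for _ in range(x_max + 1)]
    for x in range(1, x_max + 1):
        for ci in coins:
            if x == ci:
                N[x][ci] += 1      # the one-coin sequence (c_i)
            elif x - ci > 0:
                # append c_i to sequences ending in some c_j <= c_i
                N[x][ci] += sum(N[x - ci][cj] for cj in coins if cj <= ci)
    return [sum(N[x].values()) for x in range(x_max + 1)]

print(count_multisets([1, 2, 5], 6)[6])   # 5 multisets sum up to 6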

2.2 Counting, optimizing, deciding


In this section, an algebraic description is given explaining the relationship
amongst counting, optimization and decision problems. We saw in previous
examples that it is useful to introduce generating functions in algebraic dy-
namic programming. A generating function is a polynomial coming from a
polynomial ring. We first define an algebraic structure that is a natural gen-
eralization of polynomial rings.

Definition 24. Let G be a monoid, with its operation written as multiplication,
and let R be a semiring. The monoid semiring of G over R, denoted by R[G], is
the set of mappings h : G → R with finite support. The addition of h_1 and h_2
is defined as (h_1 + h_2)(x) := h_1(x) + h_2(x), while the multiplication is defined
as (h_1 h_2)(x) := \sum_{uv=x} h_1(u) h_2(v).

It is easy to see that R[G] is indeed a semiring. If both R and G are commutative,
then R[G] is a commutative one. The additive unit is the constant
0 function, that is, the function that maps each element of the monoid G to
the additive unit of R. The multiplicative unit is the function that maps 1_G,
the unit of the monoid, to 1_R, the multiplicative unit of the semiring, and maps any
other member of G to 0.
An equivalent notation for the mappings is the formal summation

\sum_{g \in G} h(g)g    (2.68)

where only those members of the summation are indicated for which h(g) ≠ 0.
Both notations (mappings and formal summations) are used below.
An example for a monoid semiring is the integer polynomial ring Z[x].
Here the monoid is the one variable free monoid generated by x. The semiring
is the integer numbers. Although the integer numbers with the usual addition
and multiplication form a ring, we do not use the subtractions of this ring in
algebraic dynamic programming algorithms. Another example for a monoid
semiring is the natural number polynomial semiring, N[x], that is, the integer
polynomials with non-negative coefficients. 0 is considered to be a natural
number to get a semiring for the usual addition and multiplication. In fact,
N[x] is a sub-semiring of the Z[x] semiring, and it is the semiring that we actually
use in algebraic dynamic programming algorithms, even if Z[x] is given as the
algebraic structure in the evaluation algebra.
Monoid semirings over the natural number semiring can be used to build
evaluation algebras if the combinatorial objects are scored by a multiplica-
tive function based on some monoid. This is precisely stated in the following
theorem.
Theorem 22. Let Y = {A, (Θ, ≤), p, O, R} be a yield algebra, G a (commutative)
monoid, and let f : A → G be a function such that for any m-ary operator
◦ ∈ O, a function h_◦ : Θ^m → G exists for which the equality

f(◦((a_i)_{i=1}^{m})) = h_◦(θ_1, θ_2, ..., θ_m) \prod_{i=1}^{m} f(a_i)    (2.69)

holds, where θ_i is the parameter of a_i. Then {Y, N[G], f′, T} is an evaluation
algebra, where

f′(a) := 1f(a)    (2.70)

(that is, the formal summation consisting of the single term f(a) with coefficient 1)

and for each operator ◦_i, the corresponding function T_i is

T_i(f′(a_1), ..., f′(a_{m_i}); θ_1, ..., θ_{m_i}) = h_{◦_i}(θ_1, ..., θ_{m_i}) \prod_{j=1}^{m_i} f′(a_j).    (2.71)

Proof. It is the direct consequence of the fact that N[G] is a semiring and the
distributive rule holds. Indeed,

T_i(\sum_{a_{i1} \in S(θ_1)} f′(a_{i1}), ..., \sum_{a_{im_i} \in S(θ_{m_i})} f′(a_{im_i}); θ_1, ..., θ_{m_i})
= h_{◦_i}(θ_1, ..., θ_{m_i}) (\sum_{a_{i1} \in S(θ_1)} f′(a_{i1})) ... (\sum_{a_{im_i} \in S(θ_{m_i})} f′(a_{im_i}))
= \sum_{a_{i1} \in S(θ_1)} ... \sum_{a_{im_i} \in S(θ_{m_i})} h_{◦_i}(θ_1, ..., θ_{m_i}) f′(a_{i1}) ... f′(a_{im_i})
= \sum_{a_{i1} \in S(θ_1)} ... \sum_{a_{im_i} \in S(θ_{m_i})} f′(◦_i((a_{ij})_{j=1}^{m_i})).    (2.72)

Definition 25. The evaluation algebra given in Theorem 22 is called the


statistics evaluation algebra.
In algebraic dynamic programming, two monoid semirings are of central
interest, N[R+ ] and N[R× ], where R+ is the additive group of real numbers
and R× is the multiplicative group of the positive numbers. Computation in
these semirings might be intractable, since the number of terms in the formal
summation in Equation (2.68) representing the semiring elements might grow
exponentially during the recursion. However, some homomorphic images might
be easy to calculate. Indeed, any homomorphic image yields an evaluation
algebra, as stated in the following theorem.
Theorem 23. Let E = {Y, N[G], f, T} be a statistics evaluation algebra with
functions

T_i(f(a_1), ..., f(a_{m_i}); θ_1, ..., θ_{m_i}) = h_{◦_i}(θ_1, ..., θ_{m_i}) \prod_{j=1}^{m_i} f(a_j)

and let ϕ : N[G] → R′ be a semiring homomorphism. Then E′ = {Y, R′, f′, T′}
is an evaluation algebra, where

f′(a) := ϕ(f(a))

and for each operator ◦_i,

T′_i(f′(a_1), ..., f′(a_{m_i}); θ_1, ..., θ_{m_i}) := ϕ(h_{◦_i}(θ_1, ..., θ_{m_i})) \prod_{j=1}^{m_i} f′(a_j).

Furthermore, E′ calculates F′(S(θ)) = ϕ(F(S(θ))).


Proof. It is the direct consequence of the definition of homomorphism. Indeed,
if

a = ◦_i((a_j)_{j=1}^{m_i})

then

f′(a) = ϕ(f(a)) = ϕ(h_{◦_i}(θ_1, ..., θ_{m_i}) \prod_{j=1}^{m_i} f(a_j))
= ϕ(h_{◦_i}(θ_1, ..., θ_{m_i})) \prod_{j=1}^{m_i} ϕ(f(a_j)) = ϕ(h_{◦_i}(θ_1, ..., θ_{m_i})) \prod_{j=1}^{m_i} f′(a_j)    (2.73)

and the distributive rule holds, since

T′_i(\sum_{a_{i1} \in S(θ_1)} f′(a_{i1}), ..., \sum_{a_{im_i} \in S(θ_{m_i})} f′(a_{im_i}); θ_1, ..., θ_{m_i})
= ϕ(h_{◦_i}(θ_1, ..., θ_{m_i})) (\sum_{a_{i1} \in S(θ_1)} f′(a_{i1})) ... (\sum_{a_{im_i} \in S(θ_{m_i})} f′(a_{im_i}))
= \sum_{a_{i1} \in S(θ_1)} ... \sum_{a_{im_i} \in S(θ_{m_i})} ϕ(h_{◦_i}(θ_1, ..., θ_{m_i})) f′(a_{i1}) ... f′(a_{im_i})
= \sum_{a_{i1} \in S(θ_1)} ... \sum_{a_{im_i} \in S(θ_{m_i})} f′(◦_i((a_{ij})_{j=1}^{m_i})).    (2.74)

It is also clear that

F′(S(θ)) = \sum_{a \in S(θ)} f′(a) = \sum_{a \in S(θ)} ϕ(f(a)) = ϕ(\sum_{a \in S(θ)} f(a)) = ϕ(F(S(θ))).    (2.75)

Below we define three homomorphisms. These homomorphisms construct


evaluation algebras, and the constructed evaluation algebras solve the count-
ing, minimizing and decision problems for the same yield algebra.

(a) {Y, N[G], f, T} is a statistics evaluation algebra, ϕ : N[G] → N, and

ϕ(h) = \sum_{g \in G} h(g).

(Recall that h ∈ N[G] is a mapping from G to N.) It is indeed a semiring
homomorphism, since

ϕ(h_1 + h_2) = \sum_{g \in G} (h_1 + h_2)(g) = \sum_{g \in G} h_1(g) + \sum_{g \in G} h_2(g) = ϕ(h_1) + ϕ(h_2)    (2.76)

and

ϕ(h_1 h_2) = \sum_{g \in G} (h_1 h_2)(g) = \sum_{g \in G} \sum_{g_1 g_2 = g} h_1(g_1) h_2(g_2)
= (\sum_{g_1 \in G} h_1(g_1)) (\sum_{g_2 \in G} h_2(g_2)) = ϕ(h_1) ϕ(h_2).    (2.77)

The homomorphic image calculates ϕ(F(S(θ))), that is, the size of S(θ).
(b) {Y, N[ℝ_+], f, T} is a statistics evaluation algebra, ϕ : N[ℝ_+] → R, where
R is the tropical semiring, and

ϕ(h) = min{supp(h)}

where

supp(h) := {g ∈ ℝ | h(g) ≠ 0}.

(Recall that h ∈ N[ℝ_+] is a mapping from ℝ_+ to N.) When the support
of h is the empty set, ϕ(h) = +∞, the additive unit of R. It is indeed a
semiring homomorphism, since

ϕ(h_1 + h_2) = min{supp(h_1 + h_2)} = min{supp(h_1)} ⊕ min{supp(h_2)} = ϕ(h_1) ⊕ ϕ(h_2)    (2.78)

and

ϕ(h_1 h_2) = min{supp(h_1 h_2)} = min{supp(h_1)} ⊙ min{supp(h_2)} = ϕ(h_1) ⊙ ϕ(h_2).    (2.79)

This homomorphic image calculates ϕ(F(S(θ))), that is, the minimal
value in S(θ).

A similar construction exists with the evaluation algebra {Y, N[ℝ_×], f, T}.

(c) {Y, N[G], f, T} is a statistics evaluation algebra, ϕ : N[G] →
({0, 1}, ∨, ∧), where the constant 0 function (the additive unit of N[G])
is mapped to 0 and all other members of N[G] are mapped to 1. Here
({0, 1}, ∨, ∧) is the Boolean algebra with two elements. ϕ is a homomorphism;
in fact, any mapping from a semiring to the two-element Boolean
algebra that sends the additive unit to 0 and all other elements to 1 is a
homomorphism, provided that the semiring is zero-sum-free and has no
zero divisors, as is the case for N[G].

The homomorphic image calculates ϕ(F(S(θ))), which is 0 if S(θ) = ∅
and 1 otherwise. Therefore, with this evaluation algebra, we can decide
whether an object with parameter θ exists.
As can be seen from these examples, if a yield algebra {A, (Θ, ≤), p, O, R}
exists on some objects A, and f is an additive function on the reals or a multiplicative
function on the positive real numbers in the sense of Equation (2.69),
then counting, minimizing and deciding if an object exists with parameter θ
are similar computational problems, in the sense that they can be calculated
with evaluation algebras that are homomorphic images of the same statistics
evaluation algebra.
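The three homomorphisms are simple to state in code. In the following Python sketch (names are ours), an element h of the monoid semiring N[ℝ_+] is a dictionary mapping each monoid element g to its multiplicity h(g), and the same h is evaluated in the three target semirings:

import math

def count_image(h):                 # (a): N[G] -> N, sum of multiplicities
    return sum(h.values())

def tropical_image(h):              # (b): N[R_+] -> tropical, min of the support
    return min(h) if h else math.inf

def boolean_image(h):               # (c): N[G] -> {0, 1}, emptiness test
    return 1 if any(h.values()) else 0

h = {2.0: 3, 5.0: 1}   # three objects of value 2 and one object of value 5
print(count_image(h), tropical_image(h), boolean_image(h))   # 4 2.0 1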
If the function of the objects is a multiplicative function on the real numbers,
then calculating the weighted sum (instead of just counting the objects in some
set S(θ)) can also be obtained with a homomorphism easily. The homomorphism
ϕ : N[ℝ_×] → ℝ is simply

ϕ(h) = \sum_{g \in ℝ} g h(g).

It is indeed a homomorphism, since

ϕ(h_1 + h_2) = \sum_{g \in ℝ} g (h_1 + h_2)(g) = \sum_{g \in ℝ} g h_1(g) + \sum_{g \in ℝ} g h_2(g) = ϕ(h_1) + ϕ(h_2)    (2.80)

and

ϕ(h_1 h_2) = \sum_{g \in ℝ} g (h_1 h_2)(g) = \sum_{g \in ℝ} g \sum_{g_1 g_2 = g} h_1(g_1) h_2(g_2)
= (\sum_{g_1 \in ℝ} g_1 h_1(g_1)) (\sum_{g_2 \in ℝ} g_2 h_2(g_2)) = ϕ(h_1) ϕ(h_2).    (2.81)

The homomorphic image calculates ϕ(F(S(θ))), which is indeed the weighted
sum of the objects.
However, if the monoid semiring is N[ℝ_+], then

ϕ(h) = \sum_{g \in ℝ} g h(g)

is not a homomorphism. Instead, the ring introduced in Subsection 2.1.3.4
can be used. Let R denote the ring defined in Subsection 2.1.3.4, and define
ϕ : N[ℝ_+] → R as

ϕ(h) = (\sum_{g \in ℝ} h(g), \sum_{g \in ℝ} g h(g)).

It is indeed a homomorphism, since

ϕ(h_1 + h_2) = (\sum_{g \in ℝ} (h_1 + h_2)(g), \sum_{g \in ℝ} g (h_1 + h_2)(g))
= (\sum_{g \in ℝ} h_1(g) + \sum_{g \in ℝ} h_2(g), \sum_{g \in ℝ} g h_1(g) + \sum_{g \in ℝ} g h_2(g)) = ϕ(h_1) + ϕ(h_2)    (2.82)

and

ϕ(h_1 h_2) = (\sum_{g \in ℝ} (h_1 h_2)(g), \sum_{g \in ℝ} g (h_1 h_2)(g))
= (\sum_{g \in ℝ} \sum_{g_1 + g_2 = g} h_1(g_1) h_2(g_2), \sum_{g \in ℝ} g \sum_{g_1 + g_2 = g} h_1(g_1) h_2(g_2))
= ((\sum_{g_1 \in ℝ} h_1(g_1)) (\sum_{g_2 \in ℝ} h_2(g_2)), \sum_{g_1 \in ℝ} \sum_{g_2 \in ℝ} (g_1 + g_2) h_1(g_1) h_2(g_2))
= (\sum_{g_1 \in ℝ} h_1(g_1), \sum_{g_1 \in ℝ} g_1 h_1(g_1)) ⊙ (\sum_{g_2 \in ℝ} h_2(g_2), \sum_{g_2 \in ℝ} g_2 h_2(g_2))
= ϕ(h_1) ⊙ ϕ(h_2).    (2.83)

The homomorphic image calculates

ϕ(F(S(θ))) = (|S(θ)|, \sum_{a \in S(θ)} f(a)),

thus weighted sums can also be calculated when the function f is additive on
the objects.
This idea can be extended to an arbitrary commutative ring, and even
higher-order moments can be calculated. A commutative ring (R^m, ⊕, ⊙) is
introduced. R is an arbitrary commutative ring with a multiplicative unit,
⊕ is the coordinatewise addition, and ⊙ is the multiplication defined in the
following way. When two elements (a_0, a_1, ..., a_{m-1}) and (b_0, b_1, ..., b_{m-1})
are multiplied, the product at the kth coordinate is

\sum_{i=0}^{k} \binom{k}{i} a_{k-i} b_i.    (2.84)

This multiplication is associative, since

\sum_{i=0}^{k} \binom{k}{i} a_{k-i} (\sum_{j=0}^{i} \binom{i}{j} b_{i-j} c_j)
= \sum_{0 \le j \le i \le k} \binom{k}{j,\ i-j,\ k-i} a_{k-i} b_{i-j} c_j
= \sum_{i=0}^{k} \binom{k}{i} (\sum_{j=0}^{i} \binom{i}{j} a_{i-j} b_j) c_{k-i}.    (2.85)

It is easy to see that the distributive rule also holds. This commutative ring
can be used to calculate the moments of an additive function over an ensemble
of combinatorial objects A on which a yield algebra exists where all operations
are binary. An additive function is a function g such that for all binary
operations ◦_i,

g(a ◦_i b) = g(a) + g(b)    (2.86)

where the addition is in the ring R. If x = (p^0, p^1, ..., p^{m-1}) and y =
(q^0, q^1, ..., q^{m-1}), then indeed

x ⊙ y = ((p+q)^0, (p+q)^1, ..., (p+q)^{m-1}).    (2.87)

Hence, if f(a) = (g(a)^0, g(a)^1, ..., g(a)^{m-1}), then

f(a ◦_i b) = f(a) ⊙ f(b),    (2.88)

namely, each T_i function is the multiplication ⊙, which is naturally distributive
with respect to the ⊕ addition. The evaluation algebra calculates

F(S(θ)) = (\sum_{a \in S(θ)} g(a)^0, \sum_{a \in S(θ)} g(a)^1, ..., \sum_{a \in S(θ)} g(a)^{m-1}).    (2.89)

An example for computing higher-order moments is given at the end of Subsection 2.3.3.
Optimization problems can also be extended to further cases. The usual
multiplication by non-negative real numbers can be considered as tropical
powering, and tropical powering also satisfies the distributive rule. Indeed, if
for an operator ◦_i, the f function satisfies the equation

f(◦_i((a_j)_{j=1}^{m_i})) = c_{i,0} + \sum_{j=1}^{m_i} c_{i,j} f(a_j)
[= T_i(f(a_1), ..., f(a_{m_i}); p(a_1), ..., p(a_{m_i}))]    (2.90)

where each c_{i,j} is non-negative (and might depend on the parameters p(a_j)), then
it also holds that

T_i(\min_{a_1 \in S(θ_1)} f(a_1), ..., \min_{a_{m_i} \in S(θ_{m_i})} f(a_{m_i}); θ_1, ..., θ_{m_i})
= \min_{a_1 \in S(θ_1)} ... \min_{a_{m_i} \in S(θ_{m_i})} f(◦_i((a_j)_{j=1}^{m_i})).    (2.91)

Another interesting case is when R is a distributive lattice, and for each
operator ◦_i, f(◦_i((a_j)_{j=1}^{m_i})) can be described with operations in R. An evaluation
algebra can be built in this case, since the distributive rules

\bigvee_{k,l} (f(a_k) ∨ f(a_l)) = \bigvee_k f(a_k) ∨ \bigvee_l f(a_l)    (2.92)

and

\bigvee_{k,l} (f(a_k) ∧ f(a_l)) = \bigvee_k f(a_k) ∧ \bigvee_l f(a_l)    (2.93)

as well as the dual equalities hold. A special case is when R is the set of
real numbers, ∨ is the maximum and ∧ is the minimum. The so-obtained
(min, max)-semiring can also be used in optimization problems, for example,
finding the highest vehicle that can travel from some point A to point B
on a network of roads containing several bridges with different heights (see
Example 6).

2.3 The zoo of counting and optimization problems solvable with algebraic dynamic programming
In this section, we give a large ensemble of combinatorial objects on which
evaluation algebras can be constructed. We also define several computational
problems and provide evaluation algebras to solve them.

2.3.1 Regular grammars, Hidden Markov Models

Transformational grammars were invented by Noam Chomsky [41].
Regular grammars are among the simplest transformational grammars, as they
are at the lowest level of the Chomsky hierarchy [42]. Stochastic versions of
regular grammars are related to Hidden Markov Models [163, 15].

Definition 26. A regular grammar is a tuple (T, N, S, R), where T is a finite
set called terminal characters and N is a finite set called non-terminal characters.
The sets T and N are disjoint. S ∈ N is a special non-terminal character,
called the starting non-terminal. R is a finite set of rules, each in one of the
following forms:

W → xW′    (2.94)
W → x    (2.95)
W → ε    (2.96)

where W, W′ ∈ N, x ∈ T, and ε denotes the empty sequence. The shorthand for
the rules (2.94)-(2.96) is

W → xW′ | x | ε.

A generation is a finite series of transformations

S = X_0 → X_1 → ... → X_k ∈ T*    (2.97)

where for each i = 0, ..., k − 1, there exists a rule W → β and a word
X_p ∈ T* such that X_i = X_p W and X_{i+1} = X_p β. (Here β might be any
sequence appearing on the right-hand side of a rewriting rule.) T* denotes the
finite sequences over T. The language L_G ⊆ T* contains those sequences that
can be generated by the grammar. A grammar is said to be unambiguous if
any sequence X ∈ L_G can be generated in exactly one way. An ambiguous
grammar contains at least one sequence in its language that can be generated
in at least two different ways.

A generation is possible while there is a non-terminal in the intermediate
sequence. Once the sequence contains only terminal characters, the generation
is terminated. This is the rationale behind the naming of terminals and non-terminals.
Given a sequence X ∈ T*, the following questions can be asked:

(a) Is X ∈ L_G?

(b) How many generations are there which produce X?

(c) Given a function w : R → ℝ^{≥0}, which generation G maximizes

\prod_{i=0}^{k-1} w(W_i → β_i)

where the rewriting rule W_i → β_i is applied in the ith step of the
generation G generating X in k steps?

(d) Given a function w : R → ℝ^{≥0}, compute

\sum_{G_i} \prod_{j=0}^{k_i - 1} w(W_{i,j} → β_{i,j})

where the rewriting rule W_{i,j} → β_{i,j} is applied in the jth step of the
generation G_i generating X in k_i steps.

These questions can be answered using the same yield algebra and different
evaluation algebras. The yield algebra builds the possible generations
of intermediate sequences. Note that in any generation of any regular grammar,
each intermediate sequence appearing in the series of generations is
of the form Y W, where Y ∈ T* and W ∈ N. The yield algebra is the following.
The set A contains the possible generations of intermediate sequences X_i W,
where X_i denotes the prefix of X of length i. The parameters are pairs (i, W)
denoting the length of the prefix and the current non-terminal character. For
i = |X|, a parameter (i, ε) is also considered. This parameter describes the
set of possible generations of X. In the partial ordering of the parameters,
(i_1, W_1) ≤ (i_2, W_2) if i_1 ≤ i_2. For each rewriting rule W → β, there is a unary
operation ◦_{W→β} extending the generation with a new rewriting W → β. The
recursions are

S((i, W)) = \bigsqcup_{W′ | (W′ → x_i W) \in R} ◦_{W′ → x_i W}(S((i − 1, W′)))    (2.98)

S((i, ε)) = \bigsqcup_{W | (W → x_i) \in R} ◦_{W → x_i}(S((i − 1, W))) ⊔ \bigsqcup_{W | (W → ε) \in R} ◦_{W → ε}(S((i, W)))  for i = |X|    (2.99)

with the initial condition S((0, S)) = {S}, namely, the set containing S as the
rewriting sequence with no rewriting steps, and S((0, W)) = ∅ for all
W ≠ S.
For the given problems, the following evaluation algebras can be constructed.

(a) The semiring is the Boolean semiring ({0, 1}, ∨, ∧). The function f
is the constant 1. Each function T_{α→β} is the identity. The answer
to the decision question is “yes” if F(S((|X|, ε))) = 1, and “no” if
F(S((|X|, ε))) = 0. The latter can happen if S((|X|, ε)) = ∅, since the
empty sum in the Boolean semiring is 0, being the additive unit.

(b) The semiring is Z. The f function is the constant 1. Each function
T_{α→β} is the identity function. The number of possible generations is
F(S((|X|, ε))).

(c) The semiring is the dual exponentiated tropical semiring (ℝ^{≥0}, max, ·).
The f function for a generation G is

\prod_{i=0}^{k-1} w(W_i → β_i).

Each T_{W→β} is the multiplication by w(W → β). The maximum score
is F(S((|X|, ε))).

(d) The semiring is (ℝ^{≥0}, +, ·). The f function for a generation G is

\prod_{i=0}^{k-1} w(W_i → β_i).

Each T_{W→β} is the multiplication by w(W → β). The sum of the
scores over all possible generations is F(S((|X|, ε))).
If the grammar is unambiguous, then the following counting problem can
also be solved: given a regular grammar G = {T, N, S, R} and a series
τ_1, τ_2, ..., τ_n, where each τ_i ⊆ T, how many sequences X = x_1 x_2 ... x_n exist
in the language of G such that x_i ∈ τ_i for all i? The yield algebra builds the possible
generations of intermediate sequences Y W such that y_i ∈ τ_i for all i and W ∈ N.
The same parameters and operations can be used, and the recursions are

S((i, W)) = \bigsqcup_{x \in τ_i} \bigsqcup_{W′ | (W′ → xW) \in R} ◦_{W′ → xW}(S((i − 1, W′)))    (2.100)

S((n, ε)) = \bigsqcup_{x \in τ_n} \bigsqcup_{W | (W → x) \in R} ◦_{W → x}(S((n − 1, W))) ⊔ \bigsqcup_{W | (W → ε) \in R} ◦_{W → ε}(S((n, W)))    (2.101)

with the same initial conditions as above. The evaluation algebra is the standard
one for counting the sizes of sets (case (b) above). F(S((n, ε))) counts
the number of possible generations that produce a sequence X such that
x_i ∈ τ_i for all i. However, since the grammar is unambiguous, this number is also the
number of sequences satisfying the prescribed conditions.
On the other hand, the same counting problem is #P-complete for am-
biguous grammars. This will be proven in Chapter 4.
When each τi = T , the given algebraic dynamic programming algorithm
counts the number of sequences of length n in the language that the grammar
generates. This is demonstrated with the following example.
Example 4. Compute the number of sequences of length n over the alphabet
{a, b} that contain an even number of a's and an odd number of b's.

Solution. The following unambiguous regular grammar generates those sequences.
T = {a, b}, N = {W_ee, W_eo, W_oe, W_oo}, S = W_ee, and the rewriting
rules are

W_ee → aW_oe | bW_eo    (2.102)
W_eo → aW_oo | bW_ee | ε    (2.103)
W_oe → aW_ee | bW_oo    (2.104)
W_oo → aW_eo | bW_oe.    (2.105)

Indeed, here e stands for even and o stands for odd, and the two characters
in the index of the non-terminals tell the parity of the number of a's and of
b's generated so far. For example, W_oe denotes that so far an odd number
of a's and an even number of b's have been generated, etc. W_ee is indeed the
starting non-terminal, since at the beginning, 0 characters have been
generated and 0 is an even number. The generation can be stopped when an
even number of a's and an odd number of b's have been generated, as indicated
by the W_eo → ε rule. □
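The counting evaluation algebra for this grammar is small enough to run in a few lines of Python. In the following sketch (names are ours), the four non-terminals are encoded by the parities (pa, pb) of the numbers of a's and b's generated so far:

def count_even_a_odd_b(n):
    # F[(pa, pb)] = number of generations reaching non-terminal W_{pa,pb}
    F = {(0, 0): 1, (0, 1): 0, (1, 0): 0, (1, 1): 0}   # start in W_ee
    for _ in range(n):
        G = {s: 0 for s in F}
        for (pa, pb), cnt in F.items():
            G[(1 - pa, pb)] += cnt     # generate an 'a'
            G[(pa, 1 - pb)] += cnt     # generate a 'b'
        F = G
    return F[(0, 1)]                   # the W_eo -> epsilon rule may fire here

print([count_even_a_odd_b(n) for n in range(1, 7)])   # [1, 0, 4, 0, 16, 0]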

As can be seen, non-terminals play the role of “memory” in generating
sequences. Apart from the non-terminal at the end of the intermediate sequence,
the generation is memoryless. We can consider the stochastic version of regular
grammars, and the memoryless property ensures that the stochastic versions
of regular grammars are Markov processes. Stochastic regular grammars
can be defined in the following way.
Definition 27. A stochastic regular grammar is a tuple (T, N, S, R, π), where
(T, N, S, R) is a regular grammar and π : R → ℝ^+ is a probability distribution
for each non-terminal, that is, for each W ∈ N, the equality

\sum_{β | (W → β) \in R} π(W → β) = 1    (2.106)

holds. (Recall that β is any of the sequences that might appear on the right-hand
side of a rewriting rule of a regular grammar, including the empty sequence.)

A stochastic regular grammar makes random generations

S = X_1 → X_2 → ...

where in each rewriting step, the rewriting rule W → β is chosen randomly
following the distribution π.

The random generation in a stochastic regular grammar can be viewed as


a random process, in which the states are the intermediate sequences. This
process indeed has the Markovian property. That is, what the intermediate
sequence Xi+1 is depends only on Xi and does not depend on any Xj , j < i.
One can ask for a given sequence X ∈ T ∗ what the most likely generation and
the total probability of the generation are. This latter is the sum of the prob-
abilities of the possible generations. Both questions can be answered by the
algebraic dynamic programming algorithms using the appropriate evaluation
algebras described above for a general w function.
Stochastic regular grammars are closely related to Hidden Markov Models.

Definition 28. A Hidden Markov Model is a tuple (\vec{G}, START, END, Γ, T, e),
where \vec{G} = (V, E) is a directed graph in which loops are allowed but parallel
edges are not, START and END are distinguished vertices in \vec{G}, START
has in-degree 0, END has out-degree 0, Γ is a finite set of symbols, called an
alphabet, T : E → ℝ^+ is the transition probability function satisfying for all
u ≠ END

\sum_{v | (u,v) = e \in E} T(e) = 1,

and e : Γ × (V \ {START, END}) → ℝ^{≥0} is the emission probability function
satisfying for all v ∈ V \ {START, END}

\sum_{x \in Γ} e(x, v) = 1.

The vertices of \vec{G} are called states. A random walk on the states is defined
by \vec{G} and the transition probabilities. The random walk starts in the state
START and ends in the state END. During such a walk, states emit characters
according to the emission distribution e. In case of loops, the random
walk might stay in one state for several steps. A random character is emitted
in each step. The process is hidden in the sense that an observer can see only
the emitted characters and cannot observe the random walk itself. An emission
path is a random walk together with the emitted characters. The probability of
an emission path is the product of its transition and emission probabilities.

If (u, v) is an edge, then the notation T(v|u) is also used, emphasizing
that T is a conditional distribution. Indeed, T(v|u) is the probability that the
Markov process will be in state v in the next step given that it is in state
u in the current step.
One can ask, for a given X ∈ Γ*, what the most likely emission path and the
total emission probability are. The latter is the sum of the probabilities of the
emission paths that emit X. These questions are equivalent to those that can
be asked about the most likely generation and the total generation probability of
a sequence in a stochastic regular grammar, as stated by the following theorem.

Theorem 24. For any Hidden Markov Model H = (\vec{G}, START, END, Γ, T, e),
there exists a stochastic regular grammar G = (T, N, S, R, π) such that Γ = T,
L_G is exactly the set of sequences that H can emit, and for any X ∈ L_G,
the probability of the most likely generation in G is the probability of the most
likely emission path in H, and the total generation probability of X in G is the
total emission probability of X in H. Furthermore, the running time needed
to construct G is a polynomial function of the size of \vec{G}.

Proof. For \vec{G} = (V, E), let the non-terminals of the regular grammar correspond
to V. That is, for any v ∈ V, there is a non-terminal W_v. The starting
non-terminal is S = W_{START}. For each (u, v) ∈ E, v ≠ END, and each x ∈ Γ such
that e(x, v) ≠ 0, construct a rewriting rule

W_u → xW_v

with probability T(v|u)e(x, v), and for each (u, END) ∈ E construct a rewriting
rule

W_u → ε

with probability T(END|u). This is indeed a probability distribution, since
for any u

\sum_{β | W_u → β} π(W_u → β) = \sum_{v ≠ END,\ (u,v) \in E} \sum_{x | e(x,v) ≠ 0} π(W_u → xW_v) + π(W_u → ε)
= \sum_{v ≠ END,\ (u,v) \in E} \sum_{x | e(x,v) ≠ 0} T(v|u)e(x, v) + T(END|u)
= \sum_{v ≠ END,\ (u,v) \in E} T(v|u) \sum_{x | e(x,v) ≠ 0} e(x, v) + T(END|u)
= \sum_{v ≠ END,\ (u,v) \in E} T(v|u) + T(END|u) = 1    (2.107)

where π(W_u → ε) = T(END|u) = 0 if there is no edge going from u to END.

There is a bijection mapping an emission path

START → u_1 → u_2 → ... → u_n → END

emitting the sequence X = x_1 x_2 ... x_n to the generation

W_{START} → x_1 W_{u_1} → x_1 x_2 W_{u_2} → ... → x_1 x_2 ... x_n W_{u_n} → x_1 x_2 ... x_n.

This bijection proves that the set of emittable sequences is indeed L_G. Furthermore,
it is easy to check that the bijection preserves the probabilities; therefore,
the most likely generation of a sequence X corresponds to the most likely emission path of X in
H, and the total probability of generating X is the total emission probability
of X in H.
From this proof, it is also clear that there is a yield algebra building
the possible emission paths of a sequence X; furthermore, for any series
τ_1, τ_2, ..., τ_n (τ_i ⊆ Γ), there is a yield algebra building the emission paths
generating those sequences Y = y_1 y_2 ... y_n for which y_i ∈ τ_i for each i. Therefore,
similar problems are tractable for HMMs as for stochastic regular grammars.
The algorithms solving these problems are well known in the scientific
literature. The algorithm finding the most likely emission path is known as
the Viterbi algorithm [180, 69], and the algorithm summing the probabilities of
the possible emission paths is called the Forward algorithm [14, 16]. It is also well
known in the Hidden Markov Model literature that “the Viterbi algorithm is
similar [...] in implementation to the forward calculation” [145].
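The structural similarity of the two algorithms is visible in code: they differ only in the semiring operation that combines path values. The following Python sketch (function and variable names are ours) assumes transition probabilities trans[u][v] = T(v|u) and emission probabilities emit[v][x] = e(x, v), with 'S' and 'E' standing for START and END:

def forward_viterbi(X, states, trans, emit):
    # fwd: total emission probability (Forward); vit: best path (Viterbi)
    fwd = {v: trans['S'].get(v, 0.0) * emit[v].get(X[0], 0.0) for v in states}
    vit = dict(fwd)
    for x in X[1:]:
        fwd = {v: sum(fwd[u] * trans[u].get(v, 0.0) for u in states)
                  * emit[v].get(x, 0.0) for v in states}
        vit = {v: max(vit[u] * trans[u].get(v, 0.0) for u in states)
                  * emit[v].get(x, 0.0) for v in states}
    total = sum(fwd[u] * trans[u].get('E', 0.0) for u in states)
    best = max(vit[u] * trans[u].get('E', 0.0) for u in states)
    return total, best

trans = {'S': {'A': 1.0}, 'A': {'A': 0.8, 'E': 0.2}}
emit = {'A': {'h': 0.6, 't': 0.4}}
print(forward_viterbi('hth', ['A'], trans, emit))   # one state, so total == best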

2.3.2 Sequence alignment problems, pair Hidden Markov Models

The sequence alignment problem was first considered by two biologists,
Saul B. Needleman and Christian D. Wunsch [137]. They developed a dynamic
programming algorithm to find an optimal alignment of two sequences and
used this algorithm to infer the relationship between two protein sequences.
The mathematically rigorous description of sequence alignment was published
by Peter H. Sellers [153]. Sequence alignment methods have been among the central
procedures of bioinformatics.
Definition 29. A sequence alignment of sequences X, Y ∈ Γ* is a 2 × L table
filled in with characters from Γ ∪ {−}, where − ∉ Γ is called the gap character,
satisfying the following rules:

(a) there is no column containing two gap characters, and

(b) the non-gap characters in the first row form sequence X, and the non-gap
characters in the second row form sequence Y.

The columns of the form \binom{−}{y} are called insertions, the columns \binom{x}{−} are called
deletions, and the columns \binom{x}{y} are called matches if x = y and mismatches or
substitutions if x ≠ y. The joint name of insertions and deletions is gap.

The minimum length of a sequence alignment of sequences X and Y is
max{|X|, |Y|} and the maximum length is |X| + |Y|. It is easy to show that
the number of alignments of two sequences of lengths n and m is

\sum_{i=0}^{\min\{n,m\}} \frac{(n + m - i)!}{i!\,(m - i)!\,(n - i)!},    (2.108)

see also Exercise 13. A subset of the alignments, called substitution-free alignments,
consists of those in which for any alignment column containing characters
x and y, if x is not a gap symbol and y is not a gap symbol, then x =
y. It is easy to build a yield algebra of these alignments. The base set A
contains such alignments for all possible prefixes of the sequences X and Y.
The parameters (i, j) indicate the lengths of the prefixes. In the partial ordering
of the parameters, (i_1, j_1) ≤ (i_2, j_2) if i_1 ≤ i_2 and j_1 ≤ j_2. The unary operators
◦_{x/x}, ◦_{x/−} and ◦_{−/y} extend an alignment with the alignment column \binom{x}{x}, \binom{x}{−}
and \binom{−}{y}, respectively. The recursions are

S((i, j)) = ◦_{x_i/−}(S((i − 1, j))) ⊔ ◦_{−/y_j}(S((i, j − 1)))  if x_i ≠ y_j    (2.109)

S((i, j)) = ◦_{x_i/y_j}(S((i − 1, j − 1))) ⊔ ◦_{x_i/−}(S((i − 1, j))) ⊔ ◦_{−/y_j}(S((i, j − 1)))  if x_i = y_j    (2.110)

with the initial condition S((0, 0)) = {ε}, where ε denotes the empty alignment.

This yield algebra can be used to solve the following optimization problems
with appropriate evaluation algebras:

(a) Longest common subsequence. The subsequences of a sequence X =
x_1 x_2 ... x_n are the sequences x_{i_1} x_{i_2} ... x_{i_m} where for all k = 1, ..., m − 1,
i_k < i_{k+1}. Sequence Z is a common subsequence of X and Y if Z is a subsequence
of X and a subsequence of Y. Any substitution-free alignment
corresponds to a common subsequence: the subsequence containing the
characters of the \binom{x}{x} columns (one character from each column, obviously);
and vice versa, any common subsequence can be represented by
a substitution-free alignment. There is no bijection between the common
subsequences and the substitution-free alignments, since there might be
more than one substitution-free alignment indicating the same common
subsequence. On the other hand, the mapping of substitution-free alignments
onto the common subsequences is a surjection. The length of the
longest common subsequence can be found by finding the substitution-free
alignment with the maximum number of \binom{x}{x} columns. The semiring
in the evaluation algebra must be the dual tropical semiring (with maximum
instead of minimum), and the function f assigns the number of
\binom{x}{x} columns to each substitution-free alignment. The T_{x/x} function for the
operator ◦_{x/x} is the tropical multiplication by 1 (that is, the usual addition
of 1), while the functions T_{x/−} and T_{−/y} are the identity functions. The
length of the longest common subsequence is F(S((n, m))); see the sketch
after this list.

(b) Edit distance. The edit distance of sequences X and Y is the minimum
number of insertion and deletion operations necessary to transform X into
Y. Any substitution-free alignment corresponds to a series of insertion
and deletion operations (although the order of these operations is not
specified by the alignment). The semiring in the evaluation algebra is
the tropical one, and the f function assigns to a substitution-free alignment
its number of insertions and deletions. The function T_{x/x} is the
identity function, while the T_{x/−} and T_{−/y} functions are both the
tropical multiplication by 1. The edit distance is F(S((n, m))).

(c) Shortest common supersequence. Sequence Z is a supersequence
of X if X is a subsequence of Z. A shortest common supersequence
of X and Y is a common supersequence with minimal length. Any
substitution-free alignment corresponds to a common supersequence.
Indeed, just read the non-gap characters in each insertion and deletion
column and one copy of the common character in each match column.
Vice versa, any common supersequence can be represented by a
substitution-free alignment. Similarly to the longest common subsequence
problem, there is no bijection between substitution-free alignments
and common supersequences; however, the mapping of substitution-free
alignments to common supersequences is a surjection. The semiring in
the evaluation algebra is the tropical one, and the f function assigns its
length to a substitution-free alignment. Each T function is the tropical
multiplication by 1. The length of the shortest common supersequence
is F(S((n, m))).
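Here is the promised sketch for item (a) in Python (names are ours): L[i][j] realizes F(S((i, j))) in the dual tropical semiring, that is, the maximum number of x-over-x columns in a substitution-free alignment of the prefixes of lengths i and j:

def lcs_length(X, Y):
    n, m = len(X), len(Y)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # deletion / insertion columns: identity functions
            L[i][j] = max(L[i - 1][j], L[i][j - 1])
            if X[i - 1] == Y[j - 1]:
                # match column: tropical multiplication by 1
                L[i][j] = max(L[i][j], L[i - 1][j - 1] + 1)
    return L[n][m]

print(lcs_length("2143", "1234"))   # 2, e.g. the common subsequence "14"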
There are natural counting problems corresponding to these optimization
problems. However, we can only count the ways that common subsequences or common
supersequences can be represented by substitution-free alignments, and
therefore it is the number of these representations, and not the number of different common
subsequences and supersequences, that can be found with algebraic dynamic
programming. On the other hand, if the sequences are permutations, then any
common subsequence can be represented in exactly one way. Thus, the longest
common subsequences of two permutations can be counted using algebraic dynamic
programming. The algebraic structure in the evaluation algebra should
be Z[z]; f assigns z^k to a substitution-free alignment with k match columns.
The function T_{x/x} is the multiplication by z, and both T_{x/−} and T_{−/y} are the
identity functions. The number of longest common subsequences of two permutations
of length n is the coefficient of the largest monomial of F(S((n, n))).
For permutations, yet another question arises, based on the following definition.

Definition 30. A permutation σ = σ_1 σ_2 ... σ_k is a subpermutation of
π = π_1 π_2 ... π_n if π contains a subsequence π_{i_1} π_{i_2} ... π_{i_k} such that for all l, m,

π_{i_l} < π_{i_m} ⟺ σ_l < σ_m.

It is clear that the longest common subsequences of two permutations
correspond to the longest common subpermutations. However, two longest
common subsequences might be the same subpermutation. For example, there
are 4 longest common subsequences of the permutations 2143 and 1234: 24, 23,
14, 13. However, they are all the same subpermutation, 12. Counting the longest
common subpermutations is known to be #P-complete. Actually, deciding if
a permutation contains a given subpermutation is already NP-complete, and
the corresponding counting problem is #P-complete [21].
Regarding the edit distance, it is natural to count the ways that a sequence
X can be transformed into sequence Y using insertion and deletion
operations. When X is transformed into Y with a total of p insertion
and deletion operations, there are p! ways to perform these operations. A
natural attempt is to count the substitution-free alignments of the two sequences
containing p insertions and deletions, then multiply this number by p!. However,
different alignments might contain the same insertions and deletions,
just in a different order. For example,

AB-C
A-DC

and

A-BC
AD-C

both contain the deletion of B and the insertion of D. The problem can be eliminated
if insertions are not allowed after deletions while deletions are allowed
after insertions. This needs a modification of the yield algebra, since
it must build only these substitution-free alignments. The parameter set should also
be modified. The triplet (i, j, t), with t ∈ {I, D, M}, indicates the lengths
of the prefixes and whether the last alignment column is an insertion, a deletion
or a match. The operators are the same, and the recursions are

S((i, j, M)) = \bigsqcup_{t \in \{I,D,M\}} ◦_{x_i/y_j}(S((i − 1, j − 1, t)))  if x_i = y_j    (2.111)

S((i, j, M)) = ∅  if x_i ≠ y_j    (2.112)

S((i, j, I)) = \bigsqcup_{t \in \{I,M\}} ◦_{−/y_j}(S((i, j − 1, t)))    (2.113)

S((i, j, D)) = \bigsqcup_{t \in \{I,D,M\}} ◦_{x_i/−}(S((i − 1, j, t)))    (2.114)

with initial conditions S((0, 0, M)) = {ε} and S((0, 0, I)) = S((0, 0, D)) = ∅,
where ε is the empty alignment.
Technically, the match here means neither insertion nor deletion.
In the evaluation algebra, the algebraic structure is Z[z], and f assigns z^k to
a substitution-free alignment with k insertions and deletions. The function T_{x/x}
is the identity function, while both the T_{x/−} and T_{−/y} functions are the multiplication
by z. If

F(S((n, m))) = \sum_{i=|n-m|}^{n+m} c_i z^i    (2.115)

then the number of ways that X can be transformed into Y is

\sum_{i=|n-m|}^{n+m} c_i \, i!    (2.116)

and the number of ways that X can be transformed into Y with a minimum
number of insertion and deletion operations is c_k k!, where k is the smallest
index such that c_k ≠ 0.
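As an illustration, the count of Equation (2.116) can be computed directly from the restricted yield algebra. The following Python sketch (names are ours) stores, for each parameter (i, j, t), the polynomial of Equation (2.115) as a dictionary {k: c_k}:

from math import factorial

def transformation_counts(X, Y):
    n, m = len(X), len(Y)
    c = {(0, 0, 'M'): {0: 1}}       # the empty alignment, Equations (2.111)-(2.114)
    for i in range(n + 1):
        for j in range(m + 1):
            for t, preds, di, dj in (('M', 'IDM', 1, 1),
                                     ('I', 'IM', 0, 1),   # no insertion after deletion
                                     ('D', 'IDM', 1, 0)):
                if (i, j) == (0, 0):
                    continue
                if t == 'M' and (i == 0 or j == 0 or X[i - 1] != Y[j - 1]):
                    continue
                if i - di < 0 or j - dj < 0:
                    continue
                acc = c.setdefault((i, j, t), {})
                for p in preds:
                    for k, v in c.get((i - di, j - dj, p), {}).items():
                        kk = k if t == 'M' else k + 1     # indels multiply by z
                        acc[kk] = acc.get(kk, 0) + v
    total = {}
    for t in 'IDM':
        for k, v in c.get((n, m, t), {}).items():
            total[k] = total.get(k, 0) + v
    # Equation (2.116): each alignment with k indels represents k! operation orders
    return sum(v * factorial(k) for k, v in total.items())

print(transformation_counts("AB", "AC"))   # 26 = 1*2! + 1*4!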
When substitutions are allowed, the yield algebra building all possible
alignments might be given. The parameters (i, j) and the operators ◦_{x/y}, ◦_{x/−} and
◦_{−/y} are as above. The recursions are simply

S((i, j)) = ◦_{x_i/y_j}(S((i − 1, j − 1))) ⊔ ◦_{x_i/−}(S((i − 1, j))) ⊔ ◦_{−/y_j}(S((i, j − 1)))    (2.117)

with the initial condition S((0, 0)) = {ε}.

A weight function w mapping from the operators to reals or integers can
be introduced. The following questions can be asked, which are solvable with
the appropriate evaluation algebras:

(a) The alignment minimizing/maximizing the sum/product of the weights. The semiring in the evaluation algebra is the ordinary/dual ordinary/exponentiated tropical semiring. In the exponentiated tropical semiring, the weights must be non-negative. The function $f$ assigns the appropriate score to each alignment, and the operators are the tropical multiplications with the given weights. The score of the optimal alignment is $F(S((n,m)))$.
(b) The number of alignments with a given score. This can be calcu-
lated easily if the weights are integers, small in absolute value, and the
score of the alignments is additive. Then the algebraic structure in the
evaluation algebra is Z[z], and f assigns z s to an alignment with score
s. Each T function is the multiplication with z w , where w is the weight
of the operator. The number of alignments with a prescribed score s is
the coefficient of z s in F (S((n, m))).
Biologists would like to see sequence alignments in which insertions and
deletions are aggregated; therefore they define different gap penalty functions.
Recall that the joint name of insertions and deletions is “gap” and a gap
penalty function is a scoring of gaps such that it depends on only the length
of the gaps, that is, how many insertions and deletions are aggregated. The
most commonly used gap penalty function is the affine gap penalty [81], in which a $k$-long run of insertions or deletions gets a $g_o + (k-1)g_e$ score, where $g_o$ is the gap opening penalty and $g_e$ is the gap extension penalty. Gaps in a run must be of the same type; that is, a run of insertions of length $k$ followed by a run of deletions of length $l$ gets a score $g_o + (k-1)g_e + g_o + (l-1)g_e$ and not $g_o + (k+l-1)g_e$. When the score is to be minimized, $g_o$ is set larger than $g_e$. This causes alignments in which gaps are aggregated to get a smaller score than alignments having the same number of gaps and the same number of matches and mismatches of the same type, just with the gaps scattered. For example, the alignment
AACTAT
ACC--T
has a smaller score than the alignment

AACTAT
A-C-CT
under the affine gap penalty scoring scheme, although the two alignments
are alignments of the same sequences containing the same type of alignment
columns, just in different order.
Under this scoring scheme, the scoring of an alignment is no longer addi-
tive in the strict sense that the score depends on only the individual align-
ment columns. However, the score of an insertion or deletion only depends on
whether or not the previous alignment column is of the same type. If the yield
algebra is built up separating the different types of alignments by extending
the parameters with the different indicator variables, then the scoring can
be done appropriately in the evaluation algebra. That is, the yield algebra is
exactly the same as introduced above with recursions in Equations (2.111)–
(2.114). Recall that in this yield algebra, only those alignments are built that
do not contain insertions after deletions, however, deletions might occur after
insertions.
If the aim is to find the smallest possible score, then the semiring in the evaluation algebra is the tropical one. The $f$ function assigns the score to each alignment. The $T_{\binom{x}{y}}$ function for operator $\circ_{\binom{x}{y}}$ is the tropical multiplication with the weight of the alignment column $\binom{x}{y}$. The functions $T_{\binom{x}{-}}$ and $T_{\binom{-}{y}}$ depend on the parameters. $T_{\binom{x}{-}}$ is the tropical multiplication with $g_o$ if the parameter is $(i,j,M)$ or $(i,j,I)$, and the tropical multiplication with $g_e$ if the parameter is $(i,j,D)$. Similarly, $T_{\binom{-}{y}}$ is the tropical multiplication with $g_o$ if the parameter is $(i,j,M)$ and the tropical multiplication with $g_e$ if the parameter is $(i,j,I)$ (recall that insertions cannot follow deletions).
Counting problems can be solved with the appropriate modification of the
evaluation algebra, see for example, Exercise 20.
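A minimal sketch of the tropical evaluation just described, in Python (the function name and the match-column weight function w are our own assumptions): following the yield algebra above, match columns are only allowed when the two characters are equal, and the three tables correspond to the parameters $(i,j,M)$, $(i,j,I)$ and $(i,j,D)$.

```python
# A minimal sketch of the min-plus (tropical) evaluation under the
# affine gap penalty, over the yield algebra (2.111)-(2.114).
INF = float('inf')

def affine_gap_score(X, Y, w, g_o, g_e):
    n, m = len(X), len(Y)
    M = [[INF] * (m + 1) for _ in range(n + 1)]
    I = [[INF] * (m + 1) for _ in range(n + 1)]
    D = [[INF] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0 and j > 0 and X[i-1] == Y[j-1]:  # match column
                M[i][j] = w(X[i-1], Y[j-1]) + min(M[i-1][j-1],
                                                  I[i-1][j-1],
                                                  D[i-1][j-1])
            if j > 0:  # insertion: g_o after a match, g_e after an insertion
                I[i][j] = min(M[i][j-1] + g_o, I[i][j-1] + g_e)
            if i > 0:  # deletion: g_o after a match or insertion, g_e after a deletion
                D[i][j] = min(M[i-1][j] + g_o, I[i-1][j] + g_o,
                              D[i-1][j] + g_e)
    return min(M[n][m], I[n][m], D[n][m])

# e.g. affine_gap_score("AACTAT", "ACCT",
#                       lambda a, b: 0.0 if a == b else 1.0, 3.0, 1.0)
```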
The parameters $(i, j, x)$, where $i$ and $j$ denote the lengths of the prefixes and $x$ is a member of a finite set, are very similar to the parameters $(i, W)$ used in regular grammars and Hidden Markov Models. It is natural to define
pair Hidden Markov Models and to observe that the introduced yield algebras
for sequence alignments (for example, the ones defined by recursions in Equa-
tions (2.111)–(2.114)) are special cases of the class of yield algebras that can
be constructed based on pair Hidden Markov Models.
Definition 31. A pair Hidden Markov Model is a tuple $(\vec{G}, START, END, \Gamma, T, e)$, where $\vec{G} = (V, E)$ is a directed graph with two distinguished vertices, $START$ and $END$. Vertices are called states. Loops are allowed; however, parallel edges are not. The in-degree of the $START$ state and the out-degree of the $END$ state is 0. $\Gamma$ is a finite set of symbols, called an alphabet, and $T : E \to \mathbb{R}^+$ is the transition probability function satisfying for all $v \neq END$
$$\sum_{u \mid (v,u) = e \in E} T(e) = 1.$$
The function $e : (\Gamma \cup \{-\}) \times (\Gamma \cup \{-\}) \times (V \setminus \{START, END\}) \to \mathbb{R}^{\geq 0}$ is the emission probability function satisfying for all $v \in V \setminus \{START, END\}$
$$\sum_{(x,y) \in (\Gamma \cup \{-\}) \times (\Gamma \cup \{-\})} e((x,y), v) = 1$$
where $- \notin \Gamma$. Furthermore, for all $v \in V$ and $x, y \in \Gamma$, the following implications also hold:
$$e((x,-),v) = 0 \implies \forall x' \in \Gamma \;\; e((x',-),v) = 0$$
$$e((-,y),v) = 0 \implies \forall y' \in \Gamma \;\; e((-,y'),v) = 0$$
$$e((x,y),v) = 0 \implies \forall x', y' \in \Gamma \;\; e((x',y'),v) = 0.$$

Depending on which emission probabilities are not 0, the states are called
insertion, deletion and match states. A random walk on the states is defined by
~ and the transition probabilities. The random walk starts in the state ST ART
G
and ends in the state EN D. During such a walk, states emit characters or
pair of characters according to the emission distribution e. In case of loops, the
random walk might stay in one state for several consecutive steps. In each step,
a random character or a pair of characters is emitted. The emitted characters
generate two strings, X and Y . The process is hidden in the sense that an
observer can see only the emitted sequences and cannot observe the random
walk itself. The observer cannot even observe which characters are emitted
together and which ones individually, that is, the observer cannot see the so-
called co-emission pattern. An emission path is a random walk together with
the two emitted sequences. The probability of an emission path is the product
of its transition and emission probabilities.
Given two sequences, X and Y , and a pair Hidden Markov Model H, a
yield algebra can be constructed whose base set, A, is the partial emission
paths that emit prefixes of X and Y . The parameter set is (i, j, W ), where i
and j are the length of the prefixes and W is the current state of the emission
path, assuming that W has already emitted a character or pair of characters.
In the partial ordering of the parameters, (i, j, W ) ≤ (i0 , j 0 , W 0 ) if i ≤ i0 and
j ≤ j 0 . The operators ◦WM , ◦WI and ◦WD extend the emission path with one
step and the emission of the new state. Here, the indices M , I and D denote
the type of the states. The recursions are

$$S((i,j,W_M)) = \begin{cases} \bigsqcup_{W \mid (W,W_M) \in E} \circ_{W_M}\!\left(S((i-1,j-1,W))\right) & \text{if } e((x_i,y_j),W_M) \neq 0\\ \emptyset & \text{if } e((x_i,y_j),W_M) = 0 \end{cases}$$

$$S((i,j,W_I)) = \begin{cases} \bigsqcup_{W \mid (W,W_I) \in E} \circ_{W_I}\!\left(S((i,j-1,W))\right) & \text{if } e((-,y_j),W_I) \neq 0\\ \emptyset & \text{if } e((-,y_j),W_I) = 0 \end{cases}$$

$$S((i,j,W_D)) = \begin{cases} \bigsqcup_{W \mid (W,W_D) \in E} \circ_{W_D}\!\left(S((i-1,j,W))\right) & \text{if } e((x_i,-),W_D) \neq 0\\ \emptyset & \text{if } e((x_i,-),W_D) = 0. \end{cases}$$
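A minimal sketch of the probability evaluation of these recursions, the so-called forward algorithm for pair Hidden Markov Models (the data layout below is our own assumption, not from the book): F[i][j][v] is the total probability of the partial emission paths that end in state v after emitting $X_i$ and $Y_j$.

```python
# A minimal sketch: states includes 'START'; types maps each emitting
# state to 'M', 'I' or 'D'; trans[(u, v)] is T((u, v)); emit[v] maps an
# emitted column (a pair of characters, '-' for a gap) to its probability.
def pair_hmm_forward(X, Y, states, types, trans, emit):
    n, m = len(X), len(Y)
    F = [[{v: 0.0 for v in states} for _ in range(m + 1)]
         for _ in range(n + 1)]
    F[0][0]['START'] = 1.0
    for i in range(n + 1):
        for j in range(m + 1):
            for v in states:
                if types.get(v) == 'M' and i > 0 and j > 0:
                    e, src = emit[v].get((X[i-1], Y[j-1]), 0.0), F[i-1][j-1]
                elif types.get(v) == 'I' and j > 0:
                    e, src = emit[v].get(('-', Y[j-1]), 0.0), F[i][j-1]
                elif types.get(v) == 'D' and i > 0:
                    e, src = emit[v].get((X[i-1], '-'), 0.0), F[i-1][j]
                else:
                    continue  # 'START' emits nothing
                F[i][j][v] = e * sum(src[u] * trans.get((u, v), 0.0)
                                     for u in states)
    return sum(F[n][m][v] * trans.get((v, 'END'), 0.0) for v in states)
```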

Similar evaluation algebras can be combined with this yield algebra like the
ones prescribed earlier in this section. The method can be extended to triple-
wise and multiple alignments as well as triple and multiple Hidden Markov
Models. However, the size of the parameter set grows exponentially with the
number of sequences. Indeed, the number of parameters below a parameter $(i_1, i_2, \ldots, i_k, W)$ in the partial ordering of the set of the parameters is $\Omega\!\left(\prod_{j=1}^{k} i_j\right)$. Finding the minimum scored multiple alignment is proven to be NP-hard [104, 183].

2.3.3 Context-free grammars


Context-free grammars are on the second level of the Chomsky hierarchy
[42]. They are widely used in computational linguistics, computer program
compilers, and also in bioinformatics.
Definition 32. A context-free grammar is a tuple (T, N, S, R), where T is
a finite set of symbols called terminals, N is a finite set of symbols called
non-terminals, T ∩ N = ∅, S ∈ N is a distinguished non-terminal called the
starting non-terminal, and R is a finite set of rewriting rules in the form

W →β

where W is a non-terminal and β ∈ (T ∪ N )∗ .


A generation is a finite series
$$S = X_1 \to X_2 \to \ldots \to X_k \in T^* \tag{2.118}$$
where for each $i = 1, \ldots, k-1$, there exists a rule $W \to \beta$ and words $X_p, X_s \in (T \cup N)^*$ such that $X_i = X_p W X_s$ and $X_{i+1} = X_p \beta X_s$. Here $T^*$ denotes the set of finite sequences over $T$.
The language LG ⊆ T ∗ contains those sequences that can be generated by
the grammar.
Each generation can be described with a parse tree. A parse tree is a vertex-labeled, rooted tree in which the children of internal nodes are naturally ordered. The root of the tree is labeled with $S$, and for any internal node $v$ labeled with $W$, the number of children of $v$ is $|\beta|$, where the rewriting rule $W \to \beta$ is applied on $W$ during the generation. If the $k$th character of $\beta$ is a terminal character $a$, the $k$th child is a leaf that is labeled with $a$; if the $k$th character is a non-terminal $W'$, then the $k$th child is an internal node labeled with $W'$, and its descendants will be determined by the rewriting rule $W' \to \beta'$ applied on $W'$ later in the generation. A grammar is said to be unambiguous if any sequence $X \in L_G$ has exactly one parse tree. An ambiguous grammar contains at least one sequence in its language that has at least two different parse trees.

Note that several generations might have the same parse tree; however, these generations apply the same rewriting rules on the same non-terminals, just in different order.
What happens with a non-terminal $W$ depends only on $W$ and not on how $W$ has been generated. This is again a Markov property, and indeed, random generations in a context-free grammar constitute a branching process,
which is indeed a Markov process. Therefore, similar to the stochastic regular
grammars, stochastic context-free grammars can be defined.
Definition 33. A stochastic context-free grammar is a tuple (T, N, S, R, π),
where (T, N, S, R) is a context-free grammar and π : R → R+ is a probability
distribution for each non-terminal, that is, for each W ∈ N , the equality
$$\sum_{\beta \mid W \to \beta \in R} \pi(W \to \beta) = 1 \tag{2.119}$$

holds. A stochastic context-free grammar makes random generations. A ran-


dom generation can be described with its parse tree. The probability of a parse
tree is the product of the probabilities of rewriting rules indicated by its inter-
nal nodes. Each rewriting rule has the appropriate multiplicity in the product.
The probability of a sequence in the grammar is the sum of the probabilities of
the parse trees that can generate it.
Remark 1. Unlike stochastic regular grammars, stochastic context-free gram-
mars might not define a distribution over the language they generate. This is
because the branching process described by a context-free grammar might not
end with probability 1. It can happen that with some probability separated from
0, a stochastic context-free grammar generates more and more non-terminals
in the intermediate sequences and the generation never stops. However, it is a
probability theory question in the theory of branching processes whether or not
a branching process halts with probability 1, and not discussed in this book.

An important class of context-free grammars which are in the so-called


Chomsky normal form [42] is defined below.
Definition 34. A context-free grammar is in Chomsky normal form if each
rewriting rule is one of the following

W → W1 W 2 | a

where W, W1 , W2 ∈ N and a ∈ T .
It is a well-known theorem that any context-free grammar G can be rewrit-
ten into a context-free grammar G0 such that G0 is in Chomsky normal form
and LG = LG0 [159]. Here an equivalent theorem is proved for stochastic
context-free grammars.
Theorem 25. For any stochastic context-free grammar $G = (T, N, S, R, \pi)$ there exists a stochastic context-free grammar $G' = (T, N', S', R', \pi')$ such that $G'$ is in Chomsky normal form, and the following holds:

(a) There is a surjective function $g : \mathcal{T}_G \to \mathcal{T}_{G'}$, where $\mathcal{T}_G$ is the set of possible parse trees of $G$, such that $g$ keeps the probabilities, that is, for any $\tau' \in \mathcal{T}_{G'}$,
$$\pi'(\tau') = \pi(g^{-1}(\tau'))$$
where
$$\pi(g^{-1}(\tau')) = \sum_{\tau \mid g(\tau) = \tau'} \pi(\tau).$$
Specifically, $L_G = L_{G'}$, and the total generation probability of a sequence $X$ in $G$ is the total generation probability of $X$ in $G'$.
(b) The set of rewriting rules $R'$ satisfies the inequality
$$|R'| \leq 128|R|^3 + 4|R|^2$$
where $|R|$ is defined as the total sum of the lengths of the $\beta$ sequences in the rewriting rules in $R$.
(c) There is a polynomial running time algorithm that constructs G0 from
G.
Proof. The proof is constructive, and $G'$ is constructed in the following three phases. Each phase consists of several steps, so $G$ is transformed into $G'$ via a series of steps, $G = G_1, G_2, \ldots, G_k = G'$, such that for all $i = 1, \ldots, k-1$, it is proved that there is a surjective function from $\mathcal{T}_{G_i}$ to $\mathcal{T}_{G_{i+1}}$ that keeps the probability. Finally, it will be shown that the entire construction can be done in polynomial time and that indeed $|R'| \leq 128|R|^3 + 4|R|^2$.

In the first phase, those rewriting rules $W \to \beta$ are considered for which $|\beta| > 2$. If the current grammar is $G_i$ in which there is a rewriting rule $W \to \beta$ such that $\beta = b_1 b_2 \ldots b_k$, then the new non-terminals $W_1$ and $W_2$ are added to the non-terminals, and thus we get $N_{i+1} = N_i \cup \{W_1, W_2\}$, together with the following rewriting rules and probabilities:
$$\pi_{i+1}(W \to W_1 W_2) = \pi_i(W \to \beta)$$
$$\pi_{i+1}(W_1 \to b_1 b_2) = 1$$
$$\pi_{i+1}(W_2 \to b_3 \ldots b_k) = 1.$$

The rule $W \to \beta$ is removed from $R_i$, and the above rules with the given probabilities are added to get $R_{i+1}$ and $\pi_{i+1}$. The $g_i$ function replaces each rule $W \to \beta$ with the above three rules. It is easy to see that $g_i$ is a bijection between $\mathcal{T}_{G_i}$ and $\mathcal{T}_{G_{i+1}}$ that keeps the probabilities.
The first phase finishes in $O(|R|)$ time, since the integer value
$$\sum_{\beta \mid \exists W,\, W \to \beta \in R_i \,\wedge\, |\beta| > 2} |\beta|$$
is strictly monotonically decreasing. At the end of the first phase, a grammar



$G_I$ is constructed in which for each rewriting rule $W \to \beta$ in $R_I$, $|\beta| \leq 2$. Furthermore, it is clear that $|R_I| \leq 2|R|$, and the construction runs in polynomial time with $|R|$.
In the second phase, those rewriting rules $W \to \beta$ are considered in which $|\beta| = 2$ but $\beta \neq W_1 W_2$ for any non-terminals $W_1$ and $W_2$. If the current grammar is $G_i$ containing a rewriting rule $W \to a_1 a_2$, where $a_1, a_2 \in T$, then the new non-terminals $W_1, W_2$ are added to the non-terminals $N_i$ to get $N_{i+1}$, together with the following rewriting rules and probabilities:
$$\pi_{i+1}(W \to W_1 W_2) = \pi_i(W \to a_1 a_2)$$
$$\pi_{i+1}(W_1 \to a_1) = 1$$
$$\pi_{i+1}(W_2 \to a_2) = 1.$$

The rule $W \to a_1 a_2$ is removed from $R_i$, and the above rules are added with the prescribed probabilities to get $R_{i+1}$ and $\pi_{i+1}$. The $g_i$ function replaces each rule $W \to a_1 a_2$ with the above three rules. It is easy to see that $g_i$ is a bijection between $\mathcal{T}_{G_i}$ and $\mathcal{T}_{G_{i+1}}$ that keeps the probabilities. Rewriting rules $W \to a_1 W_2$ and $W \to W_1 a_2$ are handled in a similar way. However, observe that for these it is sufficient to introduce only one new non-terminal.
The second phase finishes in $O(|R_I|)$ time, since each rewriting rule is modified at most once, and the new rules are not modified further. At the end of the second phase, a grammar $G_{II}$ is constructed in which each rewriting rule is either in Chomsky normal form or is of the form $W \to W'$ for some $W, W' \in N_{II}$. Furthermore, it is clear that $|R_{II}| \leq 2|R_I|$, thus $|R_{II}| \leq 4|R|$, and the construction runs in polynomial time with $|R|$.
In the third phase, those rules $W \to W'$ are considered, which are the only rules not in Chomsky normal form in $G_{II}$. If $G_i$ is a grammar in which a $W'$ exists such that for some $W$, $W \to W' \in R_i$, then we do the following:

(a) For all $W$ such that $W \to W' \in R_i$, and for each rule $W' \to \beta$, the following rewriting rules with the given probabilities are added:
$$\pi_{i+1}(W \to \beta) = \pi_i(W \to \beta) + \pi_i(W \to W')\,\pi_i(W' \to \beta). \tag{2.120}$$
Simultaneously, the rewriting rule $W \to W'$ is removed.


(b) If a $W \to W$ rule appeared, remove it from the rules and adjust the probabilities of all other rewriting rules $W \to \beta$:
$$\pi_{i+1}(W \to \beta) := \frac{\pi_{i+1}(W \to \beta)}{1 - \pi_{i+1}(W \to W)}. \tag{2.121}$$

(c) If there is no $W \neq W'$ such that $W'$ appears in the sequence $\beta$ of a rewriting rule $W \to \beta \in R_{i+1}$, then remove $W'$ from $N_{i+1}$ together with all of its rewriting rules.

The $g_i$ function is constructed such that any pair of rewriting rules $W \to W'$ and $W' \to \beta$ is replaced by $W \to \beta$. In this way, several parse trees might be mapped onto the same parse tree. However, the definition of the new probabilities in Equation (2.120), together with the distributivity of multiplication over addition, provides that the mapping keeps the probability.

If a $W \to W$ rule appeared, then it might be applied $0, 1, 2, \ldots$ times before any $W \to \beta$ rule. The infinite sum of the geometric series, that is,
$$1 + \pi_{i+1}(W \to W) + \pi_{i+1}(W \to W)^2 + \ldots = \frac{1}{1 - \pi_{i+1}(W \to W)},$$
and the definition of the new probabilities in Equation (2.121) together provide that $g_i$ keeps the probabilities.
The third phase finishes in $O(|N_{II}|)$ steps, since in each step one non-terminal is eliminated from the right-hand sides, and no rule is added in which a non-terminal that was eliminated earlier appears on the right-hand side. At the end of the third phase, the grammar $G'$ is in Chomsky normal form.
In any context-free grammar in Chomsky normal form with non-terminals $N$ and terminals $T$, the number of rewriting rules cannot exceed $2|N|^3 + |N||T|$. Notice that $|N'| \leq |N_{II}|$ and $T' = T$. Furthermore, a very rough upper bound is $|N_{II}| \leq |R_{II}|$. Since the number of terminal characters cannot be more than the sum of the lengths of the rewriting rules, it follows that
$$|R'| \leq 2|N'|^3 + |N'||T| \leq 2|R_{II}|^3 + |R_{II}||R| \leq 128|R|^3 + 4|R|^2. \tag{2.122}$$

The importance of rewriting the grammar in Chomsky normal form is that


there is a polynomially sized yield algebra for those grammars which are in
Chomsky normal form. Given a context-free grammar G and a sequence X, the
yield algebra builds those parse trees that generate a substring of X starting
with an arbitrary non-terminal W . The parameters are (i, j, W ), indicating
the first and the last index of a substring and the non-terminal labeling of the
root of the parse tree. In the partial ordering of the parameters, (i1 , j1 , W1 ) ≤
(i2 , j2 , W2 ) if i1 ≥ i2 and j1 ≤ j2 . The binary operator ◦W →W1 W2 takes two
parse trees whose roots are labeled with W1 and W2 and connects them to a
larger parse tree with a root labeled with W . The recursions are

$$S((i,j,W)) = \bigsqcup_{W_1} \bigsqcup_{W_2} \bigsqcup_{k=i}^{j-1} S((i,k,W_1)) \circ_{W \to W_1 W_2} S((k+1,j,W_2)) \tag{2.123}$$

with the following initial conditions. If $W \to x_i \notin R$, then $S((i,i,W)) = \emptyset$. If $W \to x_i \in R$, then $S((i,i,W))$ is the set containing the parse tree generating $x_i$ from $W$ in one step. Similar yield algebras can be constructed that, for a given series of sets $\tau_1, \tau_2, \ldots, \tau_n$, $\forall \tau_i \subseteq T$, build all parse trees generating all possible sequences $X$ in which each $x_i \in \tau_i$. Note that the corresponding evaluation algebra counting the size of the sets with a given parameter counts all

possible parse trees and not just the possible sequences that can be generated
with the given condition. These two numbers are equal only if the grammar is
unambiguous. For ambiguous grammars it is also #P-complete to count the
number of sequences that can be generated with a given constraint since any
regular grammar is also context-free, and the counting problem in question is
already #P-complete for regular grammars.
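As an illustration, here is a minimal sketch of the counting evaluation of recursion (2.123) for a grammar in Chomsky normal form (the names and data layout are our own): it counts the parse trees deriving each substring from each non-terminal.

```python
# N[(i, j, W)] is the number of parse trees deriving x[i..j] from W.
# binary_rules is a list of triples (W, W1, W2) for rules W -> W1 W2;
# unary_rules is a set of pairs (W, a) for rules W -> a.
def count_parse_trees(x, start, binary_rules, unary_rules):
    n = len(x)
    N = {}
    for i in range(n):
        for (W, a) in unary_rules:
            if a == x[i]:
                N[(i, i, W)] = N.get((i, i, W), 0) + 1
    for length in range(2, n + 1):      # substring length
        for i in range(n - length + 1):
            j = i + length - 1
            for (W, W1, W2) in binary_rules:
                total = 0
                for k in range(i, j):   # split point
                    total += (N.get((i, k, W1), 0) *
                              N.get((k + 1, j, W2), 0))
                if total:
                    N[(i, j, W)] = N.get((i, j, W), 0) + total
    return N.get((0, n - 1, start), 0)

# e.g. for S -> S S | a:
# count_parse_trees("aaa", "S", [("S", "S", "S")], {("S", "a")}) == 2
```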
The reason why a context-free grammar must be rewritten into Chom-
sky normal form is that the binary operation ◦W →W1 W2 indicates only O(n)
disjoint union operations in Equation (2.123), where n is the length of the
generated sequence. If there were a rewriting rule W → W1 W2 W3 , it would
require a ternary operator ◦W →W1 W2 W3 , and the recursion in the yield algebra
would require $\Omega(n^2)$ disjoint unions in the form
$$S((i,j,W)) = \bigsqcup_{k_1=i}^{j-2} \; \bigsqcup_{k_2=k_1+1}^{j-1} \circ_{W \to W_1 W_2 W_3}\!\big(S((i,k_1,W_1)),\, S((k_1+1,k_2,W_2)),\, S((k_2+1,j,W_3))\big) \;\sqcup\; [\text{further disjoint unions}] \tag{2.124}$$

and generally, a rewriting rule with $k$ non-terminals on the right-hand side would require $\Omega(n^{k-1})$ disjoint unions in the yield algebra (and thus, that many operations in the corresponding evaluation algebra). The fact that the recursion in Equation (2.124) can be split into the recursions

$$S((i,j,W)) = \bigsqcup_{k_2=i+1}^{j-1} S((i,k_2,W')) \circ_{W \to W' W_3} S((k_2+1,j,W_3)) \;\sqcup\; [\text{further disjoint unions}]$$
$$S((i,k_2,W')) = \bigsqcup_{k_1=i}^{k_2-1} S((i,k_1,W_1)) \circ_{W' \to W_1 W_2} S((k_1+1,k_2,W_2))$$
clearly highlights that the new non-terminals introduced in the first phase of rewriting a grammar into Chomsky normal form provide additional “memory” with which the calculations might be sped up. It is also clear that efficient algorithms are available for context-free grammars whose rewriting rules are not in Chomsky normal form, provided that each rewriting rule has at most 2 non-terminals on its right-hand side and there are no rewriting rules of the form $W \to W'$.
One of the best-known families of combinatorial structures that can be described with context-free grammars is the family of Catalan structures. Catalan structures with a parameter $k$ are combinatorial structures whose number is the Catalan number $C_k$. An example is the set of Dyck words.
Definition 35. A Dyck word is a finite sequence from the alphabet {x, y},
such that in any prefix, the number of x characters is greater than or equal to
the number of y characters, and the number of x and y characters is the same
in the whole sequence.

Dyck words can be generated by context-free grammars, as stated in the following theorem.
Theorem 26. The following unambiguous context-free grammar $G = (T, N, S, R)$ generates the Dyck words: $T = \{x, y\}$, $N = \{S\}$, and the rewriting rules are
$$S \to xy \mid xSy \mid xSyS \mid xyS. \tag{2.125}$$
Proof. It is clear that G generates Dyck words, since in each rewriting rule,
there is one x and one y, and each x precedes its corresponding y. Therefore
it is sufficient to prove that each Dyck word is generated by this grammar
in exactly one way. The proof is inductive. The only Dyck word with one x
and one y is xy, which can be generated, and there is only one parse tree
generating it.
Assume that D = d1 d2 . . . dn is a Dyck word, and n > 2. Let i be the
smallest index such that in the prefix Di , the number of x characters equals
the number of y characters.
If i = n, then d2 d3 . . . dn−1 is also a Dyck word. The first rewriting rule is
S → xSy, and S can generate d2 d3 . . . dn−1 by induction.
If $i < n$, then $d_{i+1} d_{i+2} \ldots d_n$ is also a Dyck word. If $i = 2$, then the first rewriting rule is $S \to xyS$, and $S$ can generate $d_{i+1} d_{i+2} \ldots d_n$. If $2 < i < n$, then both $d_2 d_3 \ldots d_{i-1}$ and $d_{i+1} d_{i+2} \ldots d_n$ are Dyck words. The first rewriting rule is $S \to xSyS$, and the two $S$'s can generate $d_2 d_3 \ldots d_{i-1}$ and $d_{i+1} d_{i+2} \ldots d_n$.
To prove that the grammar is unambiguous, first observe that the first
character x of the Dyck word is generated in the first rewriting. Namely, the
first x character is a child of the root in the parse tree. Assume that its
corresponding y is at a position j which is not the first index such that in Dj ,
the number of x characters equals the number of y characters. Then j 6= 2, and
the substring d2 d3 . . . dj−1 is not a Dyck word. However, it should be generated
by S, but any sequence generated by S is a Dyck word, a contradiction.
Although the rewriting rules in Equation (2.125) are not in Chomsky normal form, there are at most two non-terminals on the right-hand side of each rule; therefore, a polynomial yield algebra building the Dyck words can be constructed based on the grammar. The parameters $p$ in the yield algebra are the positive integers, indicating the number of $x$ characters in the word. The ordering is the natural ordering of the integers. Each rewriting rule has an appropriate operator; for brevity, it is indexed with the right-hand side of the rewriting rule. The recursions are
$$S(i) = \circ_{xSy}(S(i-1)) \;\sqcup\; \circ_{xyS}(S(i-1)) \;\sqcup\; \bigsqcup_{j=1}^{i-2} S(j) \circ_{xSyS} S(i-j-1) \tag{2.126}$$

with the initial condition S(1) = {xy}.
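A minimal sketch of the counting evaluation of recursion (2.126); since the grammar is unambiguous, the result is the sequence of Catalan numbers.

```python
# d[i] is the number of Dyck words with i x characters.
def dyck_counts(n):
    d = [0] * (n + 1)
    d[1] = 1                       # S(1) = {xy}
    for i in range(2, n + 1):
        d[i] = 2 * d[i - 1]        # the xSy and xyS cases
        for j in range(1, i - 1):  # the xSyS case
            d[i] += d[j] * d[i - j - 1]
    return d

# dyck_counts(5) == [0, 1, 2, 5, 14, 42]
```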


This yield algebra can be combined with several evaluation algebras. One
example is given below.
Example 5. Dyck words can be represented as monotonic lattice paths along
the edges of a grid with n × n square cells, going from the left bottom corner
FIGURE 2.2: a) Nested ($i < i' < j' < j$), b) separated ($i < j < i' < j'$), c) crossing ($i < i' < j < j'$) base pairs. Each base pair is represented by an arc connecting the index positions of the base pair.
to the top right corner not stepping above the diagonal. Each x is represented
by a horizontal step, and each y is represented by a vertical step. The area of
a Dyck word is the area below the lattice path representing it. Give a dynamic
programming algorithm that calculates the average area of a Dyck word of
length n.
Solution. The yield algebra is the same as defined above. Since the area is an additive function, the algebraic structure in the evaluation algebra is the ring $R = (\mathbb{R}^2, \oplus, \odot)$, where the addition is coordinatewise and the multiplication is defined by
$$(x_1, y_1) \odot (x_2, y_2) = (x_1 x_2,\; x_1 y_2 + y_1 x_2).$$
The $f$ function assigns $(1, a)$ to each Dyck word $D$, where $a$ is the area of $D$. The $T$ functions for the operators depend on the parameters. The $T_{xyS}$ function is the multiplication with $(1, k)$ when the unary operator $\circ_{xyS}$ is applied on a Dyck word with parameter $k$. The function $T_{xSy}$ is the identity. Finally, the $T_{xSyS}(a, b, k, l)$ function is $a \odot b \odot (1, (k+1)l)$, where $a, b \in R$ and $k$ and $l$ are the parameters of the two operands. Indeed, adding $xy$ at the beginning of a Dyck word of length $2k$ (thus, with parameter $k$) increases its area by $k$. Adding an $x$ to the beginning and a $y$ to the end of a Dyck word does not change its area. Finally, the area of a Dyck word $xDyD'$ is the area of $D$ plus the area of $D'$ plus $(k+1)l$, where the parameters of the Dyck words are $p(D) = k$ and $p(D') = l$.

If $F(S(n)) = (X, Y)$, then the average area is $Y/X$. $\square$
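A minimal sketch of this evaluation algebra in Python (our own illustration): pairs (count, total area) are combined with the coordinatewise addition and the multiplication defined above.

```python
# F[i] = (number of Dyck words with parameter i, sum of their areas).
def average_dyck_area(n):
    F = [(0, 0)] * (n + 1)
    F[1] = (1, 0)  # xy has area 0

    def mul(p, q):  # the ring multiplication (x1,y1) (x2,y2)
        return (p[0] * q[0], p[0] * q[1] + p[1] * q[0])

    for i in range(2, n + 1):
        cnt, area = 0, 0
        # T_xSy is the identity
        cnt += F[i - 1][0]; area += F[i - 1][1]
        # T_xyS multiplies by (1, i - 1): xy in front adds area i - 1
        c, a = mul((1, i - 1), F[i - 1])
        cnt += c; area += a
        # T_xSyS adds (k + 1) * l for parameters k and l
        for k in range(1, i - 1):
            l = i - k - 1
            c, a = mul(mul(F[k], F[l]), (1, (k + 1) * l))
            cnt += c; area += a
        F[i] = (cnt, area)
    X, Y = F[n]
    return Y / X

# average_dyck_area(3) == 8/5 == 1.6
```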
Context-free grammars are also used in bioinformatics, since the pseudo-knot-free RNA structures can be described with these grammars.

Definition 36. An RNA sequence is a finite string over the alphabet $\{a, u, c, g\}$. A secondary structure is a set of pairs of indexes $(i,j)$, $i + 2 < j$, such that each index is in at most one pair. For each pair of indexes, the pair of characters might only be $(a,u)$, $(c,g)$, $(g,c)$, $(u,a)$, $(g,u)$ or $(u,g)$. These pairs are called base pairs. A secondary structure is pseudo-knot-free if for all pairs of indexes $(i,j)$ and $(i',j')$, $i < i'$, it holds that either $j' < j$ or $j < i'$. Namely, any two pairs of indexes are either nested or separated, and there are no crossing base pairs; see also Figure 2.2.

The four characters represent the possible nucleotides building the RNA molecules. An RNA molecule is a single-stranded polymer; the string can be folded, and the nucleotides can form hydrogen bonds stabilizing the structure of the RNA. The pairs of indexes represent the nucleotides making hydrogen bonds. Due to steric constraints, it is required that $i + 2 < j$. From now on, any RNA secondary structure is considered to be pseudo-knot-free, and the adjective “pseudo-knot-free” will be omitted.
Theorem 27. The following grammar (also known as the Knudsen-Hein grammar [115]) can generate all possible RNA secondary structures: $T = \{a, c, g, u\}$, $N = \{S, L, F\}$, and the rewriting rules are
$$S \to LS \mid L \tag{2.127}$$
$$L \to a \mid c \mid g \mid u \mid aFu \mid cFg \mid gFc \mid uFa \mid gFu \mid uFg \tag{2.128}$$
$$F \to aFu \mid cFg \mid gFc \mid uFa \mid gFu \mid uFg \mid LS \tag{2.129}$$
The base pairs are the indexes of those pairs of characters that are generated in one rewriting step. Furthermore, each pseudo-knot-free RNA structure that a given RNA sequence might have has exactly one parse tree in the grammar.
Proof. It is clear that the grammar generates RNA structures, so it is sufficient
to show that each possible RNA structure that a given RNA sequence might
have can be generated with exactly one parse tree.
In a given RNA secondary structure of a sequence $X$, let those base pairs $(i,j)$ be called outermost for which no base pair $(i',j')$ exists such that $i' < i$ and $j < j'$. If $(i,j)$ is an outermost base pair, then the base pairs inside the substring $X'$ from index $i+1$ till index $j-1$ also form an RNA structure on the substring. Furthermore, either there is an outermost base pair $(i+1, j-1)$ on the substring $X'$ or at least one of the following is true:
(a) there are at least two outermost base pairs or
(b) there is an outermost base pair (i0 , j 0 ) and a character which is not base-
paired and outside of the base pair (i0 , j 0 ), that is, if it has index k, then
k < i0 or k > j 0 or
(c) there are at least two characters in X 0 which are not base-paired and
are outside of any outermost base pair.
This comes from the fact that the length of X 0 is at least 2, according to
the definition of RNA secondary structure, i.e., i + 2 < j. Having said this,
a given RNA secondary structure can be generated in the following way. If
the outermost base pairs have indexes $(i_1, j_1), (i_2, j_2), \ldots, (i_k, j_k)$, then first an intermediate sequence containing
$$i_1 + \sum_{l=2}^{k} (i_l - j_{l-1}) + n - j_k$$
Ls is generated, where $n$ is the length of the sequence, applying the $S \to LS$ rule an appropriate number of times, and then the $S \to L$ rule. If there is
no outermost base pair (and thus, there are no base pairs at all), n number of
Ls are generated. Then each character which is not base-paired is generated
using the appropriate rewriting rule from the possibilities

$$L \to a \mid c \mid g \mid u$$
and the outermost base pairs are generated using the appropriate rewriting rule from the possibilities
$$L \to aFu \mid cFg \mid gFc \mid uFa \mid gFu \mid uFg.$$

Then each intermediate string from the index il + 1 till the index jl − 1 must
be generated. If (il + 1, jl − 1) is an outermost base pair, then it is generated
by the appropriate rule from the possibilities

$$F \to aFu \mid cFg \mid gFc \mid uFa \mid gFu \mid uFg,$$
otherwise the rule $F \to LS$ is applied, and the appropriate number of Ls are generated as if $x_{i_l+1} x_{i_l+2} \ldots x_{j_l-1}$ were the entire sequence.
It is easy to prove that each secondary structure can be generated by
only one parse tree. If there are no base pairs, then no non-terminals F can
be generated, and then the only possibility is to generate n number of Ls
and rewrite them to the appropriate characters. If there are base pairs, and
thus some of them are outermost ones, they can be generated only from a
non-terminal L, which was generated by a starting nonterminal S. Then the
substring from index il + 1 till the index jl − 1 should be generated from a
non-terminal F . This is the same as generating a secondary structure from S,
just at least two characters must be generated. However, this is required by
the definition of RNA secondary structures.
Although the grammar is not in Chomsky normal form, there are at most
2 non-terminals on the right-hand side of each rewriting rule. Furthermore,
although there is a rewriting rule S → L, there is no possibility of a circular
generation
W → W1 , W1 → W2 , . . . Wk → W
in this grammar. Therefore, the yield algebra building possible RNA structures
based on this grammar is computationally efficient. A stochastic version of this
grammar might be used to predict RNA secondary structures: the one which is
generated by a most likely parse tree is a natural prediction for the secondary
structure.
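As an illustration of counting with such structures, the following minimal sketch counts the pseudo-knot-free secondary structures of an RNA sequence. It uses a direct recursion over substrings (a simplification of our own, not the grammar-based yield algebra): position $i$ is either unpaired or paired with an admissible position $k$ with $i + 2 < k$.

```python
# A minimal sketch counting pseudo-knot-free secondary structures.
from functools import lru_cache

PAIRS = {('a','u'), ('u','a'), ('c','g'), ('g','c'), ('g','u'), ('u','g')}

def count_structures(x):
    n = len(x)

    @lru_cache(maxsize=None)
    def C(i, j):                       # structures on x[i..j]
        if j - i < 0:
            return 1                   # the empty substring
        total = C(i + 1, j)            # position i unpaired
        for k in range(i + 3, j + 1):  # i + 2 < k enforced by the offset
            if (x[i], x[k]) in PAIRS:
                total += C(i + 1, k - 1) * C(k + 1, j)
        return total

    return C(0, n - 1)

# e.g. count_structures("gcgcaaauuu")
```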
Better predictions might be available by extending the grammar with fur-
ther non-terminals. Recall that non-terminals play the role of constant size
memory, thus the generation might “remember” what happened in previous
rewriting steps. For example, physical chemists measured that consecutive pairs of base pairs, rather than individual base pairs, stabilize an RNA secondary structure [168, 169]. Two base pairs $(i,j)$ and $(i',j')$ are consecutive if $i' = i+1$ and $j' = j-1$. This can be emphasized via introducing a new non-terminal
$F'$. The non-terminal $F$ represents the first base pair, and $F'$ represents the fact that there was a base pair in the previous rewriting step in the parse tree. That is, the rewriting rules are modified as

$$L \to a \mid c \mid g \mid u \mid aFu \mid cFg \mid gFc \mid uFa \mid gFu \mid uFg \tag{2.130}$$
$$F \to aF'u \mid cF'g \mid gF'c \mid uF'a \mid gF'u \mid uF'g \tag{2.131}$$
$$F' \to aF'u \mid cF'g \mid gF'c \mid uF'a \mid gF'u \mid uF'g \mid LS. \tag{2.132}$$

With this modification, any base pair must be in a consecutive pair of


base pairs. After further thermodynamic considerations, a rather complicated
context-free grammar is constructed with many non-terminals and rewriting
rules. A thermodynamic free energy is assigned to each rewriting rule. The
free energy of the secondary structure is additive, that is, the free energy of
a secondary structure is the sum of free energies assigned to each rewriting
rule. In this model, the following questions might be asked.
(a) The secondary structure with minimum free energy. This can
be obtained by choosing the tropical semiring in the evaluation algebra
together with the appropriate functions not detailed here.

(b) The probability of the minimum free energy secondary structure. In thermodynamics, the probability of a secondary structure is given by the Boltzmann distribution
$$P_T(S) = \frac{1}{Z} e^{-\frac{\Delta G(S)}{RT}}.$$
Here $P_T(S)$ denotes the probability of the structure $S$ in the Boltzmann distribution at temperature $T$, $\Delta G(S)$ denotes the free energy of $S$, and $R$ is the Regnault or universal gas constant making the exchange between temperature measured in Kelvin and free energy measured in J/mol. $Z$ is the so-called partition function defined as
$$Z := \sum_{S \in \mathcal{S}} e^{-\frac{\Delta G(S)}{RT}}$$

where $\mathcal{S}$ is the set of all possible RNA secondary structures that an RNA sequence might have. To be able to calculate the probability of the minimum free energy structure, the partition function must be calculated. Since the free energies are additive in this model, the values
$$e^{-\frac{\Delta G(S)}{RT}}$$
are multiplicative. Therefore the real numbers can be chosen as the algebraic structure in the evaluation algebra, and the $T$ functions for binary operators are multiplications (possibly combined with multiplications by constants, not detailed here), while the $T$ functions for unary operators are multiplications by constants.
(c) Moments of the Boltzmann distribution. It is also important to know the average value and the variance of the free energy in the Boltzmann distribution, since they indicate how extreme the minimum free energy is in the distribution. The average free energy is defined as
$$\frac{1}{Z} \sum_{S \in \mathcal{S}} \Delta G(S)\, e^{-\frac{\Delta G(S)}{RT}}$$
and the variance as
$$\frac{1}{Z} \sum_{S \in \mathcal{S}} \Delta G^2(S)\, e^{-\frac{\Delta G(S)}{RT}} - \left(\frac{1}{Z} \sum_{S \in \mathcal{S}} \Delta G(S)\, e^{-\frac{\Delta G(S)}{RT}}\right)^{\!2}.$$

To calculate the moments of the Boltzmann distribution, it is necessary to calculate
$$Z = \sum_{S \in \mathcal{S}} e^{-\frac{\Delta G(S)}{RT}}, \qquad \sum_{S \in \mathcal{S}} \Delta G(S)\, e^{-\frac{\Delta G(S)}{RT}} \qquad \text{and} \qquad \sum_{S \in \mathcal{S}} \Delta G^2(S)\, e^{-\frac{\Delta G(S)}{RT}}.$$

If $\circ$ is a binary operator and $\mathcal{S}_1$ and $\mathcal{S}_2$ are sets of secondary structures such that for all $S_1 \in \mathcal{S}_1$ and $S_2 \in \mathcal{S}_2$, $\Delta G(S_1 \circ S_2) = \Delta G(S_1) + \Delta G(S_2)$, then the following equalities hold:
$$\begin{aligned}
\sum_{S_1 \in \mathcal{S}_1} \sum_{S_2 \in \mathcal{S}_2} \Delta G(S_1 \circ S_2)\, e^{-\frac{\Delta G(S_1 \circ S_2)}{RT}} &= \sum_{S_1 \in \mathcal{S}_1} \sum_{S_2 \in \mathcal{S}_2} \left(\Delta G(S_1) + \Delta G(S_2)\right) e^{-\frac{\Delta G(S_1)}{RT}} e^{-\frac{\Delta G(S_2)}{RT}}\\
&= \sum_{S_1 \in \mathcal{S}_1} \Delta G(S_1)\, e^{-\frac{\Delta G(S_1)}{RT}} \sum_{S_2 \in \mathcal{S}_2} e^{-\frac{\Delta G(S_2)}{RT}}
+ \sum_{S_1 \in \mathcal{S}_1} e^{-\frac{\Delta G(S_1)}{RT}} \sum_{S_2 \in \mathcal{S}_2} \Delta G(S_2)\, e^{-\frac{\Delta G(S_2)}{RT}}
\end{aligned} \tag{2.133}$$
$$\begin{aligned}
\sum_{S_1 \in \mathcal{S}_1} \sum_{S_2 \in \mathcal{S}_2} \Delta G(S_1 \circ S_2)^2\, e^{-\frac{\Delta G(S_1 \circ S_2)}{RT}} &= \sum_{S_1 \in \mathcal{S}_1} \sum_{S_2 \in \mathcal{S}_2} \left(\Delta G(S_1) + \Delta G(S_2)\right)^2 e^{-\frac{\Delta G(S_1)}{RT}} e^{-\frac{\Delta G(S_2)}{RT}}\\
&= \sum_{S_1 \in \mathcal{S}_1} \Delta G(S_1)^2\, e^{-\frac{\Delta G(S_1)}{RT}} \sum_{S_2 \in \mathcal{S}_2} e^{-\frac{\Delta G(S_2)}{RT}}
+ 2 \sum_{S_1 \in \mathcal{S}_1} \Delta G(S_1)\, e^{-\frac{\Delta G(S_1)}{RT}} \sum_{S_2 \in \mathcal{S}_2} \Delta G(S_2)\, e^{-\frac{\Delta G(S_2)}{RT}}\\
&\quad + \sum_{S_1 \in \mathcal{S}_1} e^{-\frac{\Delta G(S_1)}{RT}} \sum_{S_2 \in \mathcal{S}_2} \Delta G(S_2)^2\, e^{-\frac{\Delta G(S_2)}{RT}}.
\end{aligned} \tag{2.134}$$

Therefore, knowing the values
$$\sum_{S_i \in \mathcal{S}_i} e^{-\frac{\Delta G(S_i)}{RT}}, \qquad \sum_{S_i \in \mathcal{S}_i} \Delta G(S_i)\, e^{-\frac{\Delta G(S_i)}{RT}} \qquad \text{and} \qquad \sum_{S_i \in \mathcal{S}_i} \Delta G^2(S_i)\, e^{-\frac{\Delta G(S_i)}{RT}}$$
for each $\mathcal{S}_i$ is sufficient to calculate the moments of the Boltzmann distribution.
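A minimal sketch of how these three values propagate through a score-additive binary operator, exactly as in Equations (2.133) and (2.134) (the triple layout is our own):

```python
# Each set of structures is summarized by the triple (Z, U, V):
# Z = sum of Boltzmann factors, U = sum of dG * factor,
# V = sum of dG^2 * factor.
def combine(t1, t2):
    z1, u1, v1 = t1
    z2, u2, v2 = t2
    return (z1 * z2,                       # partition function part
            u1 * z2 + z1 * u2,             # Equation (2.133)
            v1 * z2 + 2 * u1 * u2 + z1 * v2)  # Equation (2.134)

def moments(t):
    z, u, v = t
    mean = u / z
    variance = v / z - mean ** 2
    return mean, variance

# A single structure with free energy dG contributes the triple
# (w, dG * w, dG**2 * w) with w = exp(-dG / (R * T)).
```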
These algorithms are also well known in the scientific literature; here we just gave a unified description of them. Michael Zuker and David Sankoff gave the first dynamic programming algorithm for computing the minimum free-energy RNA structure [187]. John S. McCaskill described the dynamic programming algorithm computing the partition function of RNA structures [126]. István Miklós, Irmtraud Meyer and Borbála Nagy gave an algebraic dynamic programming algorithm to compute the moments of the Boltzmann distribution [132].

2.3.4 Walks on directed graphs


Shortest path problems are typical introductory problems in lectures on dynamic programming. From the algebraic dynamic programming point of view, it should be clear that they are shortest walk problems. The difference between a path and a walk is given in the definition below.

Definition 37. Let $\vec{G} = (V, E)$ be a directed graph. A walk is a series $v_0, e_1, v_1, e_2, \ldots, e_k, v_k$ of vertices $v_i \in V$ and edges $e_i \in E$ such that for all $1 \leq i \leq k$, $e_i = (v_{i-1}, v_i)$. A path is a walk with no repeated vertices (and thus, no repeated edges). A closed walk is a walk in which the first and the last vertex are the same.

A yield algebra can be built for walks in the following way. Given a directed graph $\vec{G} = (V, E)$, let $A$ be the set of walks on $\vec{G}$. The parameters are triplets $(i, j, k)$, and the walks from vertex $v_i$ to vertex $v_j$ containing $k$ edges are assigned to the parameter $(i, j, k)$. The partial ordering of the parameters is the natural ordering of their last index. The unary operator $\circ_{(v_l,v_j)}$ concatenates the edge $e = (v_l, v_j)$ and the vertex $v_j$ to the end of a walk. The recursions are
$$S((i,j,k)) = \bigsqcup_{(v_l,v_j) \in E} \circ_{(v_l,v_j)}\!\left(S((i,l,k-1))\right). \tag{2.135}$$
The base cases are $S((i,i,0)) = \{v_i\}$ and $S((i,j,0)) = \emptyset$ for all $i \neq j$.
Given a weight function $w : E \to \mathbb{R}$, define
$$f(v_0, e_1, \ldots, e_k, v_k) := \sum_{i=1}^{k} w(e_i).$$

An evaluation algebra can be built in which $R$ is the tropical semiring, and each $T_{(v_l,v_j)}$ is the tropical multiplication (the usual addition) with $w((v_l,v_j))$. $F(S((i,j,k)))$ calculates the weight of the smallest weight (= shortest) walk from $v_i$ to $v_j$ in $k$ steps. If $\vec{G}$ does not contain a negative cycle (that is, a closed walk $v_0, e_1, \ldots, e_k, v_0$ such that $\sum_{i=1}^{k} w(e_i) < 0$), then
$$\min_{k=1,\ldots,|V|-1} \left\{F(S((i,j,k)))\right\}$$
is the weight of the shortest path from $v_i$ to $v_j$. Indeed, a path cannot contain more than $|V|-1$ edges, and the shortest walk must be a shortest path in case of no negative cycles.
On the other hand, finding the longest path is an NP-hard optimization
problem. If each weight is the constant 1, then the longest path from vi to
vj is exactly |V | − 1 if and only if there is a Hamiltonian path from vi to
vj . Equivalently, it is NP-hard to find the shortest path in a given number of
steps in case of negative cycles. To see this, set all weights to −1, and ask for
the shortest path from vi to vj in |V | − 1 steps.
Since the shortest walks are all shortest paths when there are no negative cycles, it is also possible to count the number of shortest paths in polynomial time on graphs without negative cycles. To do this, first build up the statistics evaluation algebra using the monoid semiring $\mathbb{N}[\mathbb{R}^+]$, then take the homomorphic image
$$\varphi(h) := g_{\min\operatorname{supp}}\, h(g_{\min\operatorname{supp}})$$
where $g_{\min\operatorname{supp}}$ is the smallest element in the support of $h$. It is easy to see that this is indeed a semiring homomorphism. Then $\varphi(F(S((i,j,k))))$ calculates the number of shortest walks from $v_i$ to $v_j$ in $k$ steps. Summing these for all $k = 1, \ldots, |V|-1$ in the homomorphic image (whose addition keeps the minimum weight and adds the counts on ties) gives the number of shortest walks from $v_i$ to $v_j$, which coincides with the number of shortest paths.
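A minimal sketch combining the tropical evaluation with this counting homomorphism (our own illustration, assuming no negative cycles):

```python
# dist[v] and cnt[v] are the weight and the number of minimum-weight
# walks from s to v using exactly k edges; minimizing over
# k = 1, ..., |V| - 1 gives shortest paths when no negative cycle exists.
def shortest_path_count(n, edges, s, t):
    # edges: list of (u, v, w) for directed edges u -> v with weight w
    INF = float('inf')
    dist = [0.0 if v == s else INF for v in range(n)]
    cnt = [1 if v == s else 0 for v in range(n)]
    best, best_cnt = ((0.0, 1) if s == t else (INF, 0))
    for _ in range(n - 1):
        ndist = [INF] * n
        ncnt = [0] * n
        for (u, v, w) in edges:
            cand = dist[u] + w
            if cand < ndist[v]:
                ndist[v], ncnt[v] = cand, cnt[u]
            elif cand == ndist[v]:
                ncnt[v] += cnt[u]
        dist, cnt = ndist, ncnt
        if dist[t] < best:               # a strictly better weight
            best, best_cnt = dist[t], cnt[t]
        elif dist[t] == best and best < INF:
            best_cnt += cnt[t]           # ties across different k
    return best, best_cnt
```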
The (max,min)-semiring can be utilized in the following example.

Example 6. Let $\vec{G} = (V, E)$ be a directed graph, and let $w : E \to \mathbb{R}$ be a weight function. Define
$$f(v_0, e_1, \ldots, e_k, v_k) := \min_{i}\{w(e_i)\}.$$
Find the path $\pi$ from $v_i$ to $v_j$ that maximizes $f(\pi)$.

Solution. Build the yield algebra for the possible walks on $\vec{G}$ in the way described above, and build the following evaluation algebra. Let $R = (\mathbb{R} \cup \{-\infty\}, \max, \min)$ be the semiring in the evaluation algebra, in which the addition is the maximum and the multiplication is the minimum of the two operands. The $f$ function is the one given in the exercise, and each $T_{(v_l,v_j)}$ is the multiplication with $w((v_l,v_j))$ in this semiring, that is, taking the minimum of the argument and $w((v_l,v_j))$. Then $F(S((i,j,k)))$ calculates
$$\max_{\pi \mid p(\pi) = (i,j,k)} f(\pi),$$
namely, the maximum value of the walks from $v_i$ to $v_j$ in $k$ steps. The value of the maximum value path from $v_i$ to $v_j$ is
$$\max_{k=1,\ldots,|V|-1} F(S((i,j,k))).$$

Indeed, any path from $v_i$ to $v_j$ contains at most $|V|-1$ edges. Furthermore, for any walk with a cycle $v_l, e_{l+1}, \ldots, v_l$:
$$\pi = v_i, \ldots, e_l, v_l, e_{l+1}, \ldots, v_l, e_{l'}, \ldots, v_j$$
it holds that $f(\pi) \leq f(\pi')$, where
$$\pi' = v_i, \ldots, e_l, v_l, e_{l'}, \ldots, v_j. \qquad \square$$
It is clear that counting the number of walks that maximize the prescribed $f$ function in Example 6 can be done with algebraic dynamic programming in polynomial time. Some of these walks might not be paths, and counting the number of such paths is a hard computational problem. Indeed, deciding whether there is a Hamiltonian path in a graph is NP-complete. Observe that the same blowing-up technique used in the proof of Theorem 16 can be applied to decide whether or not there is a path with $n-1$ edges between two vertices in a graph, where $n$ is the number of vertices.

However, the number of shortest (= having a minimum number of edges) paths that maximize the $f$ function in Example 6 can be calculated in polynomial time. Indeed, the shortest walks that maximize $f$ are shortest paths.

The set of walks that minimize the function $f$ in Example 6 might not contain any path. Interestingly, this optimization problem is NP-hard, since the edge-disjoint pair of paths problem can be easily reduced to it [70].

To conclude, algebraic dynamic programming algorithms might solve sev-


eral counting and optimization problems on walks of directed graphs. When
the solutions coincide with the solutions of the corresponding problems on
paths of directed graphs, then naturally, these problems can be solved for
paths. However, whenever the optimal walk might not be a path, the path
versions of these problems always seem to be NP-hard.

2.3.5 Trees
Removing an internal node from a tree splits the tree into several subtrees.
This gives the possibility to build a yield algebra on trees and to solve vertex
coloring and related problems on trees. To do this, the trees should be rooted
as defined below.
Definition 38. Let $G = (V, E)$ be a tree, and let $v \in V$ be an arbitrary vertex.
Rooting the tree at vertex v means an orientation of the edges such that the
orientation of an edge (vi , vj ) is from vi to vj if vj is closer to v than vi . In
such a case, vertex vi is a child of vj and vj is the parent of vi . Vertex v
is called the root of the tree. The subtree rooted in vertex vj is the tree that
contains vj and all of its descendants (its children, the children of its children,
etc.).
Every node in a rooted tree except its root has exactly one parent. Any
internal node has at least one child. The central task here is to solve coloring
problems on trees. First, the definition of r-proper coloring is given.
Definition 39. Let $G = (V, E)$ be a rooted tree, let $C$ be a finite set of colors, and let $r \subseteq C \times C$ be an arbitrary relation. A coloring $c : V \to C$ is an $r$-proper coloring if for all oriented edges $(v_i, v_j) \in E$, $(c(v_i), c(v_j)) \in r$.
The $r$-proper coloring is an extension of the usual proper coloring definition. Indeed, if
$$r = (C \times C) \setminus \left(\bigcup_{c \in C} \{(c,c)\}\right),$$
then the $r$-proper coloring coincides with the usual definition of proper coloring.
The yield algebra of the r-proper coloring of a tree rooted in v can be
constructed based on the subtrees of the tree and its proper colorings. Let
G = (V, E) be a tree rooted in vertex v, let C be a finite set of colors, and
let r ⊆ C × C be a relation. Let A be the set of r-proper colorings of all the
subtrees of G. The parameters (u, c) describe the root of the subtree, u, and
its color, c. The partial ordering of the parameters is based on the root of the
subtrees, (u1 , c1 ) ≤ (u2 , c2 ) if the path from u1 to v contains u2 . The arity
of the operator ◦u,c is the number of children of vertex u. It takes a subtree
for each child, and connects them together with their common parent node u
colored by color $c$. The recursions are
$$S((u,c)) = \circ_{u,c}\!\left(\left(\bigsqcup_{c_i \mid (c_i,c) \in r} S((w_i, c_i))\right)_{i=1}^{k}\right) \tag{2.136}$$

where the wi nodes are the children of u. The initial condition for a leaf u
is that S((u, c)) is the set containing the trivial tree consisting only of the
vertex u, colored by color c. There are problems where the colors are given
for the leaves of the tree and the task is to give r-proper coloring of the entire
tree. For those problems, the initial conditions prescribe that S((u, c)) be the
empty set if leaf u is not colored by c.
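A minimal sketch of the counting evaluation of recursion (2.136) (the names are our own): N(u) returns, for each color c, the number of r-proper colorings of the subtree rooted at u in which u is colored c.

```python
# children is an adjacency map of the rooted tree; r(ci, c) tells
# whether a child colored ci may hang below a parent colored c.
def count_colorings(root, children, colors, r):
    def N(u):
        child_tables = [N(w) for w in children.get(u, [])]
        table = {}
        for c in colors:
            prod = 1  # leaves get 1 for every color, as in the base case
            for tab in child_tables:
                prod *= sum(tab[ci] for ci in colors if r(ci, c))
            table[c] = prod
        return table
    return sum(N(root).values())

# Independent sets of a tree: colors {0, 1} (1 = "in the set") with
# r = lambda ci, c: not (ci == 1 and c == 1).
```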
Combining this yield algebra with different evaluation algebras provides
algebraic dynamic programming algorithms to find, for example,
(a) the number of r-proper colorings of a tree with a given number of colors,
(b) the number of independent vertex sets of a tree,
(c) the size of the largest independent set of a tree,
(d) the coloring that minimizes
$$\sum_{(u_1,u_2) \in E} w(c(u_1), c(u_2))$$
for some weight function $w : C \times C \to \mathbb{R}$ (known as the Sankoff-Rousseau algorithm [150]),
(e) the sum
$$\sum_{c} \pi(c(v)) \prod_{(u_1,u_2) \in E} P_{w((u_1,u_2))}(c(u_1), c(u_2)),$$
where the summation runs over all colorings $c : V \to C$, $\pi$ is a function mapping $C$ to the (non-negative) real numbers, $w$ is an edge weight function assigning a non-negative weight to each edge, and $P_x$ is a parameterized family of functions, where the parameter $x$ is a non-negative real number and the function itself maps from $C \times C$ to the (non-negative) real numbers (known as Felsenstein's algorithm [68]).

2.4 Limitations of the algebraic dynamic programming approach
There is an algebraic operation and an algorithmic step that the algebraic
dynamic programming approach does not use. We already emphasized that
subtraction is forbidden in algebraic dynamic programming. There is also no
branching in algebraic dynamic programming: the operations to be performed
later in a recursion do not depend on the result in previous computations.
Subtractions might improve the running time. Consider the problem of

counting the different subsequences of a sequence. Let X be a finite string


from a finite alphabet, and let A be the set of subsequences of X. If Y is a
subsequence of X, then let the parameter of Y be the minimal i such that Y
is a subsequence of Xi , the i-character-long prefix of X.
Let Y be a subsequence of X with parameter i. Let k denote the length
of Y . Let g(i) be the index of the previous occurrence of xi in X, if xi also
appears in Xi−1 , otherwise let g(i) be 0. It is easy to see that the parameter
of Yk−1 is equal to or larger than g(i). Therefore we can set up a yield algebra
on the subsequences. The parameter is as described above, and its ordering is
the natural one of the integers. The unary operator ◦a concatenates character
a to the end of a sequence. The recursion is
$$S(i) = \bigsqcup_{j=g(i)}^{i-1} \circ_{x_i}(S(j)) \tag{2.137}$$

with the initial condition S(0) = {ε}, where ε is the empty string. To count
the number of subsequences, we simply have to replace each disjoint union
with the addition and each ◦a operation with multiplication by 1. Since the
number of indices between g(i) and i − 1 is comparable with the length of X,
this algorithm runs in O(n2 ) time.
However, if subtraction is allowed, the number of different subsequences can be counted in linear time. Let $m(i)$ denote the number of different subsequences of the prefix $X_i$ (including the empty one), and let $g(i)$ be defined as above. Then
$$m(i) = 2m(i-1) - m(g(i)-1), \tag{2.138}$$
where the subtracted term is defined to be 0 when $g(i) = 0$. Indeed, we can extend any subsequence of $X_{i-1}$ with the character $x_i$. This yields some overcounting, as some of the subsequences appear twice in this way: exactly those obtained by extending a subsequence of $X_{g(i)-1}$ with $x_i$, and there are $m(g(i)-1)$ of them. Clearly, the recursion in Equation (2.138) takes constant time for each index, and therefore, the number of different subsequences can be computed in linear time if subtractions are allowed.
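A minimal sketch of this subtraction-based linear-time algorithm:

```python
# m[i] counts the distinct subsequences (including the empty one) of
# the prefix X_i; last[a] stores the previous occurrence of character a.
def count_distinct_subsequences(X):
    n = len(X)
    m = [0] * (n + 1)
    m[0] = 1              # the empty subsequence
    last = {}
    for i in range(1, n + 1):
        m[i] = 2 * m[i - 1]
        a = X[i - 1]
        if a in last:     # subtract the doubly counted subsequences
            m[i] -= m[last[a] - 1]
        last[a] = i
    return m[n]

# count_distinct_subsequences("aba") == 7
# (the empty string, a, b, aa, ab, ba, aba)
```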
Schnorr showed that matrix multiplication of two $n \times n$ matrices requires $\Omega(n^3)$ arithmetic operations if only multiplications and additions are allowed [151]. However, matrix multiplication can be done in $O(n^{\log_2 7})$ time if subtractions are allowed [162], or even faster [47, 51]. Subtractions might have even more computational power. We are going to discuss this further at the end of Chapter 3.
Branching (test and branch instructions in an algorithm) might also speed up calculations. It is well known that a greedy algorithm can find a minimum spanning tree in an edge-weighted graph in polynomial time [116]. On the other hand, Mark Jerrum and Marc Snir proved that computing the spanning tree polynomial needs an exponential number of additions and multiplications [101]. Let $G = (V, E)$ be a graph, and assign a formal variable $x_i$ to each edge $e_i$ of $G$. The spanning tree polynomial is defined as
$$\sum_{T \in \mathcal{T}(G)} \prod_{x_i \in g(T)} x_i \tag{2.139}$$

where $\mathcal{T}(G)$ denotes the set of spanning trees of $G$, and for each spanning tree $T$, $g(T)$ denotes the set of formal variables assigned to the edges of $T$. It is easy to see that the evaluation of the spanning tree polynomial in the tropical semiring is the score (that is, the sum of the weights of the edges) of the minimum spanning tree. This score can be calculated with a simple greedy algorithm but not with algebraic dynamic programming, at least not in polynomial time. Also, the number of spanning trees, as well as the number of minimum spanning trees, can be computed in polynomial time; however, such a computation needs subtraction. See also Chapter 3 for more details. Thus, spanning trees are combinatorial objects for which optimization and counting problems can both be computed in polynomial time; however, these algorithms cannot be described in the algebraic dynamic programming framework, and they are quite different.
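As an illustration of what subtraction buys here, the following minimal sketch counts spanning trees with Kirchhoff's matrix-tree theorem: the number of spanning trees is the determinant of any cofactor of the graph Laplacian, computed below by exact Gaussian elimination over the rationals.

```python
from fractions import Fraction

def count_spanning_trees(n, edges):
    # edges: list of undirected pairs (u, v) on vertices 0..n-1
    L = [[Fraction(0)] * n for _ in range(n)]
    for (u, v) in edges:
        L[u][u] += 1; L[v][v] += 1
        L[u][v] -= 1; L[v][u] -= 1
    A = [row[1:] for row in L[1:]]  # delete row 0 and column 0
    det = Fraction(1)
    for i in range(n - 1):
        p = next((r for r in range(i, n - 1) if A[r][i] != 0), None)
        if p is None:
            return 0                # singular minor: disconnected graph
        if p != i:
            A[i], A[p] = A[p], A[i]
            det = -det              # a row swap flips the sign
        det *= A[i][i]
        for r in range(i + 1, n - 1):
            f = A[r][i] / A[i][i]
            for c in range(i, n - 1):
                A[r][c] -= f * A[i][c]  # subtraction is essential here
    return int(det)

# count_spanning_trees(4, [(0,1),(0,2),(0,3),(1,2),(1,3),(2,3)]) == 16
# (the complete graph K4 has 4^(4-2) = 16 spanning trees)
```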

2.5 Exercises
1. Show that the variants of the tropical semiring given in Definition 23
are all isomorphic to the tropical semiring.
2. Let ai,k and ni,k be the coefficients of the polynomials defined in Equa-
tions (2.4) and (2.5). Show that

ai,k = ai−1,k + ni−1,k


and
ni,k = ai−1,k−1 .

3. * Let $S = a_1, a_2, \ldots, a_n$ be a series of real numbers. Define the score of any subsequence $a_{i_1}, a_{i_2}, \ldots, a_{i_k}$ to be $\prod_{j=1}^{k} (-1)^j a_{i_j}$. Give a dynamic programming algorithm that
(a) calculates the maximum score of the subsequences,
(b) calculates the sum of the scores over all possible subsequences.

4. Let $A = \{a_1, a_2, \ldots, a_n\}$ be a set of real numbers. Give a dynamic programming algorithm which
(a) calculates
$$\max_{S \subseteq A,\, |S| = k} \prod_{a_i \in S} a_i \quad \text{and}$$
(b) calculates
$$\sum_{S \subseteq A,\, |S| = k} \prod_{a_i \in S} a_i.$$

5. A subsequence of a sequence is isolated if it does not contain consecutive characters. Let $a_1, a_2, \ldots, a_n$ be a series of real numbers, and treat each number as a character. Give a dynamic programming algorithm that calculates for this series
(a) the largest sum of the isolated subsequences and
(b) the sum of the products of the isolated subsequences of length $k$.
6. ◦ Give a dynamic programming algorithm which computes the number
of sequences of length n from the alphabet {a, b} that do not contain
the substring aba.
7. ◦ Give a dynamic programming algorithm that computes the number
of sequences of length n from the alphabet {a, b, c, d, e} that contain an
odd number of a’s and the sum of the numbers of c’s and d’s is even.
8. Give a dynamic programming algorithm that computes the number of
sequences of length n from the alphabet {a, b} in which the number of
a’s can be divided by 3 and does not contain 3 consecutive b’s.
9. Give a dynamic programming algorithm that for a given Hidden Markov Model $(\vec{G}, START, END, \Gamma, T, e)$ and a series of sets $\tau_1, \tau_2, \ldots, \tau_n$, $\forall \tau_i \subseteq \Gamma$,
(a) calculates the probability that the HMM emits a sequence $Y = y_1 y_2 \ldots y_n$ such that for all $i$, $y_i \in \tau_i$, and
(b) counts the number of possible emission paths that emit a sequence $Y$ such that for all $i$, $y_i \in \tau_i$.

10. * A fast food chain is considering opening some restaurants along a


highway. There are n possible locations at distances d1 , d2 , . . . dn from a
given starting point. At most one restaurant can be opened at a location,
and the expected profit from opening a restaurant at location i is pi .
Due to some regulations, any two restaurants should be at least d miles
apart from each other. Give a dynamic programming algorithm that
(a) calculates the maximum expected total profit,
(b) calculates the number of legal plans of opening restaurants, and
(c) calculates the variance of the expected total profit of the uniform
distribution of legal opening plans.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.3.
11. ◦ A checkerboard of 4 rows and $n$ columns is given. An integer is written on each square. Also, $2n$ pebbles are given, and some or all of them can be put on squares such that no two pebbles are placed on horizontally or vertically neighboring squares. Furthermore, we require that there be at least one pebble in each column. The score of a placement is the sum of the integers on the occupied squares. Give a dynamic programming algorithm that
(a) calculates the maximum score in O(n) time,
(b) calculates the number of possible placements in O(n) time,
(c) calculates the number of possible placements of k pebbles for each
k in O(n2 ) time, and
(d) calculates the maximum score of placements with k pebbles for
each k in O(n2 ) time.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.5.
12. Give a dynamic programming algorithm that computes the number of
ways to tile a 3 × n table with dominoes.
13. ∗ Prove that the number of alignments of two sequences of length $n$ and $m$ is
$$\sum_{i=0}^{\min\{n,m\}} \frac{(n+m-i)!}{i!\,(m-i)!\,(n-i)!}.$$

14. Prove that two parse trees, both in Chomsky normal form, generating
the same sequence contain the same number of internal nodes, thus the
same number of rewriting rules.
15. Give a dynamic programming algorithm that counts all possible genera-
tions of a sequence by a context-free grammar in Chomsky normal form.
Here two generations

S = A1 → A2 → . . . → An = A

and
S = B1 → B2 → . . . → Bn = A
are considered to be different even if they have the same parse tree.
Note that although all parse trees generating the same sequence contain
the same number of rewriting rules, different parse trees might represent
different numbers of generations.
16. Suppose that in a Dyck word, each $x$ is replaced by $(123)$ and each $y$ is replaced with $(12)$, and the Dyck word is evaluated as the product of the defined cycles in the permutation group $S_3$. For example, $xxyy$ becomes
$$(123)(123)(12)(12) = (123)(123) = (132);$$
on the other hand, $xyxy$ becomes
$$(123)(12)(123)(12) = (23)(23) = \mathrm{id}.$$


Give a dynamic programming algorithm that computes how many Dyck words of length $n$ exist that evaluate to a given member of the permutation group $S_3$.
17. * The following operation on the symbols $a$, $b$, $c$ is defined according to the following table:

        a   b   c
    a   b   b   a
    b   c   b   a
    c   a   c   c

Notice that the operation defined by the table is neither associative nor commutative. Give a dynamic programming algorithm that counts how many parenthesizations of a given sequence of symbols from $\{a, b, c\}$ there are that yield $a$.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.6.
18. Let B be a Boolean expression containing words from V =
{"TRUE", "FALSE"} and from O = {"and", "or", "xor"}. The expression
is legitimate, that is, it starts and ends with a word from V, and words
from V and O alternate. Give a dynamic programming algorithm that
counts how many ways there are to parenthesize the expression such
that it evaluates to "TRUE".
19. An algebraic expression contains positive integers and the operations of
addition and multiplication. Give a dynamic programming algorithm
that
(a) calculates the maximum value that a parenthesization might take
and
(b) calculates the average value of a uniformly random parenthesization.

20. * Give a dynamic programming algorithm that counts how many se-
quence alignments of two sequences, X and Y , there are which contain
s substitutions, i insertions and d deletions.
21. * Assume that breaking a string of length l costs l units of running time
in some string-processing programming language. Notice that different
breaking scenarios using the same breaking positions might have different
running times. For example, if a 30-character string is broken at
positions 5 and 20, then making the first cut at position 5 gives a total
running time of 30 + 25 = 55; on the other hand, making the first break
at position 20 gives a total running time of 30 + 20 = 50.
Give a dynamic programming algorithm that counts
(a) how many ways there are to break a string into m + 1 pieces at
m prescribed points such that the total running time spent on
breaking the substrings into smaller pieces is exactly w units and
(b) how many ways there are to break a string of length n into m +
1 pieces such that the total running time spent on breaking the
substrings into smaller pieces is exactly w units.
Two breaking scenarios are not distinguished if they use the same breaks
of the same substrings just in different order. For example, breaking a
string at position 20 and then at position 10 and then at position 30 is
not distinguished from the scenario which breaks a string at position 20
then at position 30 and then at position 10.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.6.

22. Give a dynamic programming algorithm that takes a sequence of num-


bers and calculates the total sum of sums that can be obtained by in-
serting + symbols into the text. For example, if the input is 123, then
the number to be calculated is

168 = 123 + (1 + 23) + (12 + 3) + (1 + 2 + 3).

23. There are n biased coins, and the coin with index i has probability pi
for a head. Give a dynamic programming algorithm that computes the
probability that there will be exactly k heads when all the coins are
tossed once.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.10.
24. * Given a convex polygon P of n vertices in the Euclidean plane (by
the coordinates of the vertices). A triangulation of P is a collection of
n − 3 diagonals such that no two diagonals cross each other. Notice that
a triangulation breaks a polygon into n − 2 triangles. Give a dynamic
programming algorithm that calculates the maximum and average score
of a triangulation if the score is defined as
(a) the sum of the edge lengths, and
(b) the product of the edge lengths.

cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.12.


25. * There is an n × m chocolate bar that should be broken into 1 × 1
pieces. During a breaking scenario, any rectangular piece can itself be
broken along any of its horizontal or vertical lines. Each 1 × 1 piece has a
unique label, thus breaking two rectangles can be distinguished even if
they have the same dimensions. Give a dynamic programming recursion
that computes the number of possible breaking scenarios if
(a) two breaking scenarios are equivalent if they contain the same
breakings in different order, and
(b) the order of the breakings counts.
Question for fun: What is the minimum number of breakings necessary
to get nm pieces of size 1 × 1?
26. There is a rectangular piece of cloth with dimensions n × m, where n and
m are positive integers. Also given is a list of k products; for each product
a triplet (ai, bi, ci) is given, such that the product needs a rectangle of
cloth of dimensions ai × bi and can be sold at price ci. Assume that
ai, bi and ci are all positive integers. Any rectangular piece of cloth
can be cut either horizontally or vertically into two pieces with integer
dimensions. Give a dynamic programming algorithm that counts how
many ways there are to cut the cloth into smaller pieces maximizing
the total sum of prices. The order of the cuts does not count; on the
other hand, it does count where the smaller pieces are located on the
n × m rectangle.
cf. Dasgupta-Papadimitriou-Vazirani: Algorithms, Exercise 6.14.

27. * A binary search tree is a vertex-labeled, rooted uni-binary tree with


the following properties. The labels are objects with a total ordering,
and for each internal node v, the labels on the left subtree of v are all
smaller than the label of v and the labels on the right subtree of v are
all bigger than the label of v. It is allowed that an internal node does
not have a right or a left child; however, any internal node must have at
least one child.
Let T = (V, E) be a rooted uni-binary tree, let the bijection g : V → L
label the vertices, and let π : L → R+ be a probability distribution of the
labels. Let d : V → Z+ be the distance function of vertices measuring
the number of vertices of the path to the root. That is, d(root) = 1,
the distance of the children of the root is 2, etc. This distance measures
the number of comparisons necessary to find a given object (label) in T .
The score of a binary search tree is the expected number of comparisons
necessary to find a random object following the distribution π, namely,
$$\sum_{v \in V} d(v)\,\pi(g(v)).$$

Give a dynamic programming algorithm that for a given distribution of


objects, π : L → R+ ,
(a) finds a binary search tree with the minimum cost, and
(b) calculates the expected score of a uniformly distributed binary
search tree.
28. * Give a dynamic programming algorithm that in a binary tree T

(a) computes the number of matchings,
(b) calculates the size of a maximum matching,
(c) calculates the number of maximum matchings, and
(d) calculates the average size of a matching.

29. A vertex cover of a graph G = (V, E) is a subset of vertices that in-
cludes at least one endpoint of each edge. Give a dynamic programming
algorithm that for a binary tree
(a) computes the number of vertex covers,
(b) computes the average number of vertices in a vertex cover, and
(c) computes the number of minimum vertex covers.

30. Give a dynamic programming algorithm that calculates for each k how
many increasing subsequences of length k are in a given permutation.
31. A sequence of numbers is called zig-zag if the differences of the
consecutive numbers are alternately positive and negative. Give a dynamic
programming algorithm that computes for each k how many zig-zag
subsequences of length k there are in a given permutation.
32. * Give a dynamic programming algorithm that for an input text, an
upper bound t, and for each k and x, calculates how many ways there
are to break the text into k lines such that each line contains at most t
characters and the sum over all lines of the squared number of spaces
left at the end of each line is x.
33. A sequence is palindromic if it is equal to its reverse. Give a dynamic
programming algorithm that in a given sequence
(a) finds the longest palindromic subsequence,
(b) calculates the number of longest palindromic subsequences, and
(c) calculates the average length of palindromic subsequences.

34. * Give a dynamic programming algorithm that computes the number


of ways to insert a minimum number of characters into a sequence to
make it a palindrome.
35. ◦ There are n types of boxes with dimensions (ai , bi , ci ), and the dimen-
sions might be arbitrary positive real numbers. A stable stack of boxes is
such that for each consecutive pair of boxes, the horizontal dimensions
of the bottom box are strictly larger than those of the top box. The
boxes might be rotated in any direction, and thus, the same type of box
might be used more than once. Give a dynamic programming algorithm
that
(a) finds the height of the largest possible stack,


(b) computes the number of possible stacks, and
(c) calculates the average height of the stacks.

36. * Give a dynamic programming algorithm that for a given rooted binary
tree
(a) finds the number of subsets of vertices not containing any pair of
neighboring vertices,
(b) computes the number of proper colorings of the vertices with k ≥ 3
colors (recall that no neighboring vertices may have the same
color in a proper coloring of the vertices of a graph), and
(c) computes the number of ways that the vertices can be partially
colored with k ≥ 2 colors such that no two neighboring colored
vertices have the same color; however, one or both of two neighbors
might be uncolored.

37. Generalize Exercise 36 for arbitrary trees.


38. * There is an n × m checkerboard, and each square contains some coins.
A tour of the checkerboard consists of two paths, one from the top left
corner to the bottom right one, containing only right and downward
steps; the other path is from the bottom right corner to the top left one,
containing only left and upward steps. The score of the tour is the sum
of coins on the visited squares, and coins on squares visited twice count
only once. Give a dynamic programming algorithm that
(a) computes the number of tours,
(b) calculates the maximum possible score, and
(c) calculates the average score of a uniformly distributed tour.

39. ◦ Generalize Exercise 38 such that diagonal steps are possible on both
paths.
40. * Let A = {a1, a2, . . . , an} be a set of positive integers. Give a dynamic
programming algorithm that calculates
(a) the number of ways A can be split into 3 disjoint subsets, X, Y
and Z, such that
$$\sum_{a_i \in X} a_i = \sum_{a_j \in Y} a_j = \sum_{a_k \in Z} a_k,$$
(b) the average value
$$\left(\sum_{a_i \in X} a_i - \sum_{a_j \in Y} a_j\right)^{\!2} + \left(\sum_{a_i \in X} a_i - \sum_{a_j \in Z} a_j\right)^{\!2} + \left(\sum_{a_i \in Y} a_i - \sum_{a_j \in Z} a_j\right)^{\!2}$$
of a random tripartition X ⊔ Y ⊔ Z = A.
41. Given a directed graph G⃗ = (V, E) and a weight function w : E → R,
for any path π, define
$$f(\pi) := \min_{e \in \pi} w(e).$$
Give a dynamic programming algorithm that computes the number of
shortest paths maximizing f(π).
42. * A pair of decreasing non-negative integer sequences (a bidegree
sequence) D = {(b1, b2, . . . , bn), (c1, c2, . . . , cm)} is called graphical if
there exists a bipartite graph whose degrees are exactly D. The Gale-
Ryser theorem states that a bidegree sequence is graphical iff
$$\sum_{i=1}^{n} b_i = \sum_{j=1}^{m} c_j$$
and for all k = 1, . . . , n,
$$\sum_{i=1}^{k} b_i \le \sum_{j=1}^{k} c_j^*$$
where c∗ is defined as
$$c_j^* := |\{c_i \mid c_i \ge j\}|.$$
Give a dynamic programming algorithm that computes the number of
graphical bidegree sequences.

2.6 Solutions
Exercise 3. Construct the following yield algebra. Let A be the subsequences
of prefixes of the given string. Technically, these subsequences can be repre-
sented as sequences from {0, 1}, where 0 means that the corresponding char-
acter is not part of the subsequence and 1 means that the corresponding char-
acter is part of the subsequence. For example, if A = 1, −5, 2, 4, −3, 2, 1, 1, 3,
then 010110 denotes −5, 4, −3. Any subsequence is parameterized with the
length of the prefix and the parity indicator of the length of the subsequence.
That is, Θ contains the (i, p) pairs, i ∈ [0, . . . n], p ∈ {0, 1}. The parame-
ters are partially ordered based on their first coordinate. The operation ◦1
concatenates 1 to the end of the sequence representing the subsequence, and
the operation ◦0 concatenates 0 to the end of the sequence representing the
subsequence. The recursion of the yield algebra is

S((i, p)) = ◦0(S((i − 1, p))) ⊔ ◦1(S((i − 1, p + 1 (mod 2))))

with initial values S((0, 0)) = {ε} and S((0, 1)) = ∅, where ε denotes the
empty sequence.
When the largest product is to be calculated, the evaluation algebra is the
following. R = ((ℝ≥0 ∪ {−∞}) × (ℝ≥0 ∪ {−∞}), ⊕, ⊙) is a semiring, where ⊕
is the coordinatewise maximum with the rule

max{−∞, a} = a

and

(x1, y1) ⊙ (x2, y2) = (max{x1 x2, y1 y2}, max{x1 y2, y1 x2}),

where the multiplication of −∞ with itself is defined as −∞. The rationale
is that the first coordinate stores the largest available maximum and the
second coordinate stores the absolute value of the largest possible negative
value, if it exists. Here −∞ stands for "not defined". If X is a subsequence
represented by x1 x2 . . . xk, then
$$f(X) = \begin{cases} \left(\prod_{i \mid x_i = 1} (-1)^i a_i,\ -\infty\right) & \text{if } \prod_{i \mid x_i = 1} (-1)^i a_i > 0 \\[4pt] \left(-\infty,\ \left|\prod_{i \mid x_i = 1} (-1)^i a_i\right|\right) & \text{if } \prod_{i \mid x_i = 1} (-1)^i a_i < 0 \\[4pt] (0, 0) & \text{if } \prod_{i \mid x_i = 1} (-1)^i a_i = 0. \end{cases}$$
The T1 function for the operator ◦1 depends on the parameter (i, p). If
(−1)^i ai > 0, then T1 is the multiplication with ((−1)^i ai, −∞), and if
(−1)^i ai < 0, then T1 is the multiplication with (−∞, |(−1)^i ai|). If ai = 0,
then T1 is the multiplication with (0, 0). Finally, the function T0 for the
operator ◦0 is the identity function.
When the sum of the scores of all possible subsequences is to be calculated,
the evaluation algebra is the following. R is the real field; if X is a subsequence,
then
$$f(X) = \prod_{a_i \in X} (-1)^i a_i.$$
The function T1 for operation ◦1 depends on the parameter (i, p); it is the
multiplication with (−1)^i ai. The function T0 for the operation ◦0 is the identity
function.
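As a concrete illustration of the pair semiring (our addition, not part of the
original text), here is a minimal Python sketch of the largest-product task.
It assumes 1-based indexing of a1, . . . , an, represents −∞ by None, and
omits the parity parameter p of the yield algebra, which a plain maximization
over all non-empty subsequences does not need; the function name is ours.

# A minimal sketch of the (max positive product, max |negative product|)
# pair semiring above; None plays the role of the undefined value -infinity.
# Assumption (not from the text): the goal is the largest product of the
# selected values (-1)^i * a_i over all non-empty subsequences.

def max_alternating_product(a):
    def mul(pair, v):
        # Multiply every product summarized by `pair` with the value v.
        pos, neg = pair
        if v > 0:
            return (pos * v if pos is not None else None,
                    neg * v if neg is not None else None)
        if v < 0:
            return (neg * (-v) if neg is not None else None,
                    pos * (-v) if pos is not None else None)
        return (0.0, None)

    def join(p, q):
        # Coordinatewise maximum with the rule max{-infinity, x} = x.
        m = lambda x, y: y if x is None else (x if y is None else max(x, y))
        return (m(p[0], q[0]), m(p[1], q[1]))

    state = (None, None)                     # no subsequence chosen yet
    for i, ai in enumerate(a, start=1):
        v = (-1) ** i * ai
        with_v = join(mul(state, v), mul((1.0, None), v))
        state = join(state, with_v)          # either use a_i or skip it
    return state

print(max_alternating_product([1, -5, 2, 4, -3, 2, 1, 1, 3]))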
Exercise 6. The following unambiguous regular grammar generates the pos-
sible sequences. T = {a, b}, N = {S, A, B, X} and the rewriting rules are

S → aA | bB | a | b
A → aA | bX | a | b
B → aA | bB | a | b
X → bB | b.

The nonterminal A means that the last generated character was a, B means
that the last generated character was b and the next to last character was not
a, and X means that the last generated character was b and the next to last
character was a. The usual yield algebra and the corresponding evaluation
algebra as described in Subsection 2.3.1 gives the recursion on the sets of
possible generations and counts them. Since the grammar is unambiguous,
the number of possible generations is the number of possible sequences.
Exercise 7. The following unambiguous grammar generates the possible se-
quences. T = {a, b, c, d, e}, N = {S, W1 , W2 , W3 }, and the rewriting rules are

S → aW2 | a | bS | cW1 | dW1 | eS


W1 → aW3 | bW1 | cS | dS | eW1
W2 → aS | bW2 | b | cW3 | dW3 | eW2 | e
W3 → aW1 | bW3 | cW2 | c | dW2 | d | eW3 .

The non-terminal S means that in the so-far generated sequence, the number
of generated a’s is even, and the sum of the number of c’s and d’s is even.
Similarly, W1 stands for even-odd, W2 stands for odd-even, and W3 stands for
odd-odd. The usual yield algebra and the corresponding evaluation algebra
as described in Subsection 2.3.1 gives the recursion on the sets of possible
generations and counts them. Since the grammar is unambiguous, the number
of possible generations is the number of possible sequences.
Exercise 10. Let a sequence from the alphabet {0, 1} represent a subset of
indexes of locations, such that the sequence X = x1 x2 . . . xk represents

{i | xi = 1} .

Let such a sequence be called legal if the corresponding set of locations is a


legal opening plan. Define the following yield algebra. A contains the legal 0-
1-sequences. Each legal sequence has a parameter (k, i) where k is the length
of the string, and i is the largest position where there is a 1 in the string.
On this parameter set, (k1 , i1 ) ≤ (k2 , i2 ) if k1 ≤ k2 . Obviously, i ≤ k for
any parameter (k, i). The unary operator ◦1 concatenates a 1 to the end of
a sequence, the unary operator ◦0 concatenates a 0 to the end of a sequence.
The recursions are

S((k, i)) = ◦0 (S((k − 1, i))) ∀i < k (2.140)


S((k, i)) = tj|dk −dj ≥d ◦1 (S(k − 1, j)) if k = i (2.141)

with initial condition S((0, 0)) = {}, where  denotes the empty sequence.
Define d0 to be −∞, thus dk − d0 > d for any k.
If the task is to calculate the maximum expected profit, then the evaluation
algebra can be constructed in the following way. R is the dual tropical
semiring, that is, the tropical addition is taking the maximum instead of the
minimum, and the additive unit is −∞ instead of ∞. The f function is defined
as
$$f(x_1 x_2 \ldots x_k) := \bigodot_{i \mid x_i = 1} p_i.$$
The T1 function for the operator ◦1 depends on the parameter (k, k); it is the
tropical multiplication (that is, the usual addition) with pk. The T0 function
for operator ◦0 is the identity function. The maximum expected profit is
$$\bigoplus_{i=0}^{n} F(S((n, i))).$$


If the task is to calculate the number of legal opening plans, then R = Z,
and f is the constant 1 function. Both T1 and T0 are the identity functions.
The number of legal opening plans is
$$\sum_{i=0}^{n} F(S((n, i))).$$

If the task is to calculate the variance of the expected profit of the uniform
distribution of legal opening plans, then R is set to (ℝ3, ⊕, ⊙), where ⊕ is the
coordinatewise addition, and

(a0, a1, a2) ⊙ (b0, b1, b2) = (a0 b0, a1 b0 + a0 b1, a2 b0 + 2 a1 b1 + a0 b2).

The f function is defined in the following way:
$$f(x_1 x_2 \ldots x_k) := \left(1,\ \sum_{i \mid x_i = 1} p_i,\ \Bigl(\sum_{i \mid x_i = 1} p_i\Bigr)^{\!2}\right).$$
The function T1 for the operator ◦1 depends on the parameter (k, k); it is
the multiplication with (1, pk, pk²) in R. T0 for the operation ◦0 is the identity
function.
Define (N, M, Z) in the following way:
$$(N, M, Z) := \bigoplus_{i=0}^{n} F(S((n, i))).$$
Here N is the number of legal opening plans, M is the sum of the expected
profits in all possible legal opening plans, and Z is the sum of the squared
expected profits in all possible legal opening plans. The variance of the expected
profits in the uniform distribution of possible legal opening plans is
$$\frac{Z}{N} - \left(\frac{M}{N}\right)^{\!2}$$
since for any distribution,
$$V[x] = E[x^2] - E^2[x].$$
That is, the variance is the expectation of the squared values minus the
expectation squared.
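A compact Python sketch of all three evaluation algebras follows; it is an
illustration only (the function name and data layout are ours), assuming the
distances d are given in increasing order and writing the elements of the
moment semiring as triplets (N, M, Z).

def restaurant_stats(d, p, dmin):
    n = len(d)
    maxprof = [0.0] * n     # best total profit of a plan ending at i
    mom = [None] * n        # (N, M, Z): count, sum and sum of squares

    for i in range(n):
        # aggregate over all legal predecessors, or no predecessor at all
        N0, M0, Z0 = 1, 0.0, 0.0
        b0 = 0.0
        for j in range(i):
            if d[i] - d[j] >= dmin:
                Nj, Mj, Zj = mom[j]
                N0, M0, Z0 = N0 + Nj, M0 + Mj, Z0 + Zj
                b0 = max(b0, maxprof[j])
        maxprof[i] = b0 + p[i]
        # multiply with (1, p_i, p_i^2) in the moment semiring
        mom[i] = (N0, M0 + N0 * p[i], Z0 + 2 * M0 * p[i] + N0 * p[i] ** 2)

    N, M, Z = 1, 0.0, 0.0   # start from the empty plan
    for t in mom:
        N, M, Z = N + t[0], M + t[1], Z + t[2]
    best = max([0.0] + maxprof)
    return best, N, Z / N - (M / N) ** 2

# three locations, minimum separation 2: returns (9.0, 6, 8.222...)
print(restaurant_stats([1, 2, 4], [3.0, 5.0, 4.0], 2))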
Exercise 11. First, observe that only the following 7 patterns are possible in
a column (written as column vectors from the top row to the bottom row):

    a) (1, 0, 0, 0)ᵀ   b) (0, 1, 0, 0)ᵀ   c) (0, 0, 1, 0)ᵀ   d) (0, 0, 0, 1)ᵀ
    e) (1, 0, 1, 0)ᵀ   f) (0, 1, 0, 1)ᵀ   g) (1, 0, 0, 1)ᵀ

Define the following regular grammar:

S → aA | bB | cC | dD | eE | fF | gG | a | b | c | d | e | f | g
A → bB | cC | dD | fF | b | c | d | f
B → aA | cC | dD | eE | gG | a | c | d | e | g
C → aA | bB | dD | fF | gG | a | b | d | f | g
D → aA | bB | cC | eE | a | b | c | e
E → bB | dD | fF | b | d | f
F → aA | cC | eE | a | c | e
G → bB | cC | b | c

It is easy to see that this grammar defines the possible sequences of columns
such that no two horizontally or vertically neighboring squares both have
pebbles: two patterns may be adjacent exactly when they do not share a row.
Define the score of a generation as the sum of the numbers on the squares
that have pebbles. The yield algebra is the standard one for generations in a
regular grammar. The semiring in the evaluation algebra should be
appropriately chosen according to the given computational task.
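The following Python sketch (ours, not the book's) regenerates the seven
column patterns as the non-empty independent subsets of the four rows and
runs the maximum-score evaluation; replacing the max over compatible
predecessors by a sum would count the placements instead.

from itertools import combinations

ROWS = range(4)
# non-empty subsets of rows with no two adjacent rows
PATTERNS = [frozenset(s) for k in (1, 2) for s in combinations(ROWS, k)
            if all(abs(a - b) > 1 for a in s for b in s if a != b)]

def max_score(board):
    # board: list of n columns, each a list of 4 integers (top to bottom)
    score = lambda col, pat: sum(col[r] for r in pat)
    best = {pat: score(board[0], pat) for pat in PATTERNS}
    for col in board[1:]:
        best = {pat: score(col, pat) +
                     max(v for q, v in best.items() if q.isdisjoint(pat))
                for pat in PATTERNS}
    return max(best.values())

print(max_score([[1, 2, 3, 4], [4, 3, 2, 1], [5, -1, -1, 5]]))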
Exercise 13. The index i is the number of columns in the alignment without
gap symbols, that is, the alignment columns with a match or a mismatch.
Their number might vary between 0 and min{n, m}. If there are i matches or
mismatches in an alignment, then there are n−i deletions and m−i insertions.
The total length of such an alignment is n + m − i. The number of alignments
with these properties is indeed
$$\binom{n+m-i}{n-i,\ m-i,\ i} = \frac{(n+m-i)!}{(n-i)!\,(m-i)!\,i!}.$$
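A quick numerical check of this identity (our addition, not part of the proof)
compares the closed form with the standard three-way recursion for the
number of alignments.

from math import factorial
from functools import lru_cache

@lru_cache(maxsize=None)
def A(n, m):
    # an alignment ends with a match/mismatch, a deletion or an insertion
    if n == 0 or m == 0:
        return 1
    return A(n - 1, m - 1) + A(n - 1, m) + A(n, m - 1)

def closed_form(n, m):
    return sum(factorial(n + m - i)
               // (factorial(i) * factorial(m - i) * factorial(n - i))
               for i in range(min(n, m) + 1))

assert all(A(n, m) == closed_form(n, m) for n in range(8) for m in range(8))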

Exercise 17. Define the following yield algebra. Let A be the possible paren-
thesizations of the possible substrings. Recall that a substring is a consecutive
part of a sequence. The possible parenthesizations of a sequence X are se-
quences from the alphabet {(, ), a, b, c} satisfying the following rules.
• The subsequence obtained by removing the “(” and “)” symbols is X.
• The subsequence obtained by removing characters a, b and c is a Dyck
word.

• There are at least two characters between an opening bracket and its
corresponding closing bracket.
• If there are two consecutive opening brackets, their corresponding closing


brackets are not neighbors. A similar statement is true for consecutive
closing brackets.
The parameters are triplets in the form (i, j, x) where i and j are the first
and last indexes of the substring, and x is the result of the evaluation of
the parenthesization. The partial ordering of the parameters is defined in the
following way: (i1 , j1 , x1 ) ≤ (i2 , j2 , x2 ) if i2 ≤ i1 and j1 ≤ j2 . The binary
operation ◦ is defined as

X1 ◦ X2 := (X1 X2 ).

The recursions are

S((i, j, a)) = (⊔_{k=i}^{j−1} S((i, k, a)) ◦ S((k + 1, j, c))) ⊔
               (⊔_{k=i}^{j−1} S((i, k, b)) ◦ S((k + 1, j, c))) ⊔
               (⊔_{k=i}^{j−1} S((i, k, c)) ◦ S((k + 1, j, a)))
S((i, j, b)) = (⊔_{k=i}^{j−1} S((i, k, a)) ◦ S((k + 1, j, a))) ⊔
               (⊔_{k=i}^{j−1} S((i, k, a)) ◦ S((k + 1, j, b))) ⊔
               (⊔_{k=i}^{j−1} S((i, k, b)) ◦ S((k + 1, j, b)))
S((i, j, c)) = (⊔_{k=i}^{j−1} S((i, k, b)) ◦ S((k + 1, j, a))) ⊔
               (⊔_{k=i}^{j−1} S((i, k, c)) ◦ S((k + 1, j, b))) ⊔
               (⊔_{k=i}^{j−1} S((i, k, c)) ◦ S((k + 1, j, c)))

with the initial values
$$S((i, i, x)) = \begin{cases} \{x\} & \text{if } x_i = x \\ \emptyset & \text{if } x_i \ne x. \end{cases}$$
In the evaluation algebra, R = Z, f is the constant 1 function, and the T
function for the operation ◦ is the multiplication. The number of
parenthesizations resulting in a is F(S((1, n, a))).
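An equivalent interval dynamic programming can be written directly in
Python; the sketch below (ours) tabulates N[i][j][x], the number of
parenthesizations of s[i..j] that evaluate to symbol x under the table above.

OP = {('a', 'a'): 'b', ('a', 'b'): 'b', ('a', 'c'): 'a',
      ('b', 'a'): 'c', ('b', 'b'): 'b', ('b', 'c'): 'a',
      ('c', 'a'): 'a', ('c', 'b'): 'c', ('c', 'c'): 'c'}

def count_yielding(s, target='a'):
    n = len(s)
    N = [[{x: 0 for x in 'abc'} for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(s):
        N[i][i][ch] = 1                      # the initial values S((i, i, x))
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            for k in range(i, j):            # last operation splits at k
                for (x, y), z in OP.items():
                    N[i][j][z] += N[i][k][x] * N[k + 1][j][y]
    return N[0][n - 1][target]

print(count_yielding('bbbac'))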
Exercise 20. First, observe the following. If sequence X is aligned to sequence
Y resulting in i insertions and d deletions, then |X| + i − d = |Y|. Therefore
the two sequences and the number of deletions already determine the number
of insertions. Having said this, define the following yield algebra. Let A be
the possible alignments of prefixes of X and Y. The parameter set contains
quadruples (i, j, s, d), where i is the length of the prefix of X, j is the length
of the prefix of Y, s is the number of substitutions, and d is the number
of deletions in the alignment. The partial ordering is defined such that a
parameter (i1, j1, s1, d1) is lower than or equal to the parameter (i2, j2, s2, d2)
if i1 ≤ i2, j1 ≤ j2, s1 ≤ s2 and d1 ≤ d2. The operation ◦i,j concatenates the
alignment column $\binom{x_i}{y_j}$ to the end of an alignment. Similarly, ◦−,j
concatenates the alignment column $\binom{-}{y_j}$ to the end of an alignment,
and ◦i,− concatenates the alignment column $\binom{x_i}{-}$ to the end of an
alignment. The recursions are

S((i, j, s, d)) = ◦i,j(S((i − 1, j − 1, s − 1 + δ_{xi,yj}, d))) ⊔
                  ◦−,j(S((i, j − 1, s, d))) ⊔
                  ◦i,−(S((i − 1, j, s, d − 1)))

where δ_{xi,yj} is the Kronecker delta function. The initial condition is
S((0, 0, 0, 0)) = {ε}, where ε denotes the empty alignment. The evaluation
algebra simply counts the size of the sets; that is, R is the integer ring, f is
the constant 1 function, and all T functions corresponding to operators ◦i,j,
◦−,j and ◦i,− are the identity functions.
An alternative solution is also possible. In this solution, the yield algebra
is the following. A is the possible alignments of the possible prefixes. The
parameters are pairs (i, j) denoting the lengths of the prefixes, (i1, j1) ≤ (i2, j2)
if i1 ≤ i2 and j1 ≤ j2. The operators ◦i,j, ◦−,j and ◦i,− are the same as defined
above. The recursions are

S((i, j)) = ◦i,j(S((i − 1, j − 1))) ⊔ ◦−,j(S((i, j − 1))) ⊔ ◦i,−(S((i − 1, j)))

with initial condition S((0, 0)) = {ε}. In the evaluation algebra, R =
Z[z1, z2], the two-variable polynomial ring over Z. The function f assigns
$z_1^s z_2^d$ to an alignment with s substitutions and d deletions.
The Ti,j function for operator ◦i,j is the multiplication with $z_1^{1-\delta_{x_i,y_j}}$, T−,j is
the identity function and Ti,− is the multiplication with z2. The number of
alignments with s substitutions and d deletions is the coefficient of $z_1^s z_2^d$
in the polynomial F(S((n, m))), where n and m are the lengths of X and Y,
respectively.
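The first yield algebra translates into the following Python sketch (ours);
each table cell stores the evaluation of the corresponding S((i, j, ·, ·)) sets as
a dictionary from (s, d) to a count.

from collections import defaultdict

def alignment_counts(X, Y):
    C = {(0, 0): {(0, 0): 1}}               # the empty alignment
    for i in range(len(X) + 1):
        for j in range(len(Y) + 1):
            if (i, j) == (0, 0):
                continue
            cell = defaultdict(int)
            if i > 0 and j > 0:             # match or mismatch column
                ds = 0 if X[i - 1] == Y[j - 1] else 1
                for (s, d), c in C[(i - 1, j - 1)].items():
                    cell[(s + ds, d)] += c
            if j > 0:                        # insertion column (-, y_j)
                for (s, d), c in C[(i, j - 1)].items():
                    cell[(s, d)] += c
            if i > 0:                        # deletion column (x_i, -)
                for (s, d), c in C[(i - 1, j)].items():
                    cell[(s, d + 1)] += c
            C[(i, j)] = dict(cell)
    return C[(len(X), len(Y))]

print(alignment_counts("AC", "A"))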
Exercise 21. To solve the first subexercise, consider a sequence X whose
characters are the m + 1 segments of the original sequence. Each character x
in X has a weight w(x) defined as the number of characters in the corresponding
segment. A breaking scenario is a series of breaks, bi1 bi2 . . . bik, where bi is the
break at the border of segments xi and xi+1. A breaking scenario bi1 bi2 . . . bik
is canonical if for any j < l, either ij < il or the break bij acts on a substring
that includes xil (or both conditions hold).
In the yield algebra, A is the possible canonical breaking scenarios of the
possible substrings of X. The parameter (i, j) denotes the first and the last
index of the substring. In the partial ordering of the parameters, (i1, j1) ≤
(i2, j2) if i2 ≤ i1 and j1 ≤ j2. The operation ◦ operates on breaking scenarios of
two consecutive substrings. If Q and R are breaking scenarios with parameters
p(Q) = (i, k) and p(R) = (k + 1, j), then

Q ◦ R = bk, Q, R.

The recursions are

S((i, j)) = ⊔_{k=i}^{j−1} S((i, k)) ◦ S((k + 1, j))

with the initial conditions S((i, i)) = {ε}, where ε is the empty sequence (of
breaking steps).
In the evaluation algebra, R = Z[z], the polynomial ring over the integers.
The function f assigns z^W to a breaking scenario, where W is the total
running time of the scenario. The function T for the operator ◦ depends on
the parameters; if p(Q) = (i, k) and p(R) = (k + 1, j), then
$$T(Q, R) = z^{\sum_{l=i}^{j} w(x_l)} f(Q) f(R),$$
and similarly, for two values r1, r2 ∈ R,
$$T(r_1, r_2, (i, k), (k + 1, j)) = z^{\sum_{l=i}^{j} w(x_l)} r_1 r_2.$$
The number of breaking scenarios with total running time w is the coefficient
of z^w in F(S((1, m + 1))).
In the second subexercise, A, the base set of the yield algebra, contains the
possible canonical breaking scenarios of the possible substrings of the original
string into smaller substrings (not necessarily into single characters). The
canonical breaking scenarios are defined similarly as above. The parameter
(i, j, l) denotes the beginning of the substring, the end of the substring and
the number of breaks, respectively. In the partial ordering of the parameters,
(i1, j1, l1) ≤ (i2, j2, l2) if i2 ≤ i1, j1 ≤ j2 and l1 ≤ l2. The operation ◦ operates
on a pair of breaking scenarios operating on consecutive substrings; if p(Q) =
(i, k, l1) and p(R) = (k + 1, j, l2), then

Q ◦ R = bk, Q, R.

The recursions are

S((i, j, l)) = ⊔_{k=i}^{j−1} ⊔_{l1=0}^{l−1} S((i, k, l1)) ◦ S((k + 1, j, l − l1 − 1))

with initial conditions S((i, j, 0)) = {ε}, where ε denotes the empty sequence
(of breaking steps).
In the evaluation algebra, R = Z[z], the polynomial ring over the integers.
The function f assigns z^W to a breaking scenario, where W is the total
running time of the scenario. The function T for the operator ◦ depends on
the parameters; if p(Q) = (i, k, l1) and p(W) = (k + 1, j, l2), then
$$T(Q, W) = z^{j-i+1} f(Q) f(W),$$
and similarly, for any two values r1, r2 ∈ R,
$$T(r_1, r_2, (i, k, l_1), (k + 1, j, l_2)) = z^{j-i+1} r_1 r_2.$$
The number of breaking scenarios with total running time w in m steps is
the coefficient of z^w in F(S((1, n, m))).
There is an alternative solution to the second subexercise. In the yield
algebra, A and ◦ are the same. The parameter (i, j) denotes the beginning and
the end of the substring. In the partial ordering, (i1, j1) ≤ (i2, j2) if i1 ≤ i2
and j2 ≤ j1. The recursions are

S((i, j)) = ⊔_{k=i}^{j−1} S((i, k)) ◦ S((k + 1, j))

with initial conditions S((i, i)) = {ε}.
In the evaluation algebra, R = Z[z1, z2], the two-variable polynomial ring
over the integers. The function f assigns $z_1^m z_2^w$ to a breaking scenario
with m breaks and total running time w. The function T depends on the
parameters; if p(Q) = (i, k) and p(W) = (k + 1, j), then
$$T(Q, W) = z_1 z_2^{j-i+1} f(Q) f(W),$$
and similarly, for any r1, r2 ∈ R,
$$T(r_1, r_2, (i, k), (k + 1, j)) = z_1 z_2^{j-i+1} r_1 r_2.$$
The number of breaking scenarios with total running time w in m steps is
the coefficient of $z_1^m z_2^w$ in F(S((1, n))).
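For the first subexercise, the polynomial-valued recursion can be sketched as
follows (our illustration); poly(i, j) returns the generating polynomial of the
canonical breaking scenarios of segments i..j as (cost, count) pairs.

from collections import defaultdict
from functools import lru_cache

def breaking_polynomial(seg):
    n = len(seg)
    # W[i][j]: total number of characters in segments i..j
    W = [[0] * n for _ in range(n)]
    for i in range(n):
        W[i][i] = seg[i]
        for j in range(i + 1, n):
            W[i][j] = W[i][j - 1] + seg[j]

    @lru_cache(maxsize=None)
    def poly(i, j):
        if i == j:
            return ((0, 1),)                 # empty scenario, cost 0
        acc = defaultdict(int)
        for k in range(i, j):                # first break after segment k
            for wl, cl in poly(i, k):
                for wr, cr in poly(k + 1, j):
                    acc[W[i][j] + wl + wr] += cl * cr
        return tuple(sorted(acc.items()))

    return dict(poly(0, n - 1))

# the 30-character example above: segments of lengths 5, 15 and 10
print(breaking_polynomial([5, 15, 10]))   # {50: 1, 55: 1}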
Exercise 24. Notice that any edge e of a polygon is part of a triangle in
a triangulation; therefore, at least one diagonal incident to e is in the
collection of the diagonals describing the triangulation. If e = (vi, vi+1) is an
edge, consider the triangle in which e participates. If it contains a diagonal
incident to vi, that diagonal is called the left neighbor of e. Similarly, if it
contains a diagonal incident to vi+1, it is called the right neighbor. For a set
of diagonals of a triangulation, define the following canonical ordering of the
diagonals. Let v1, v2, . . . , vn be the vertices, let (vn, v1, v2, . . . , vn−1, vn) denote
the polygon, and for any diagonal (vi, vj), let (vi, vj, vj+1, vj+2, . . . , vi−1, vi) and
(vj, vi, vi+1, vi+2, . . . , vj−1, vj) denote the two sub-polygons emerging by cutting
the polygon at the diagonal (vi, vj). The canonical ordering first contains the
left neighbor of (vn, v1), denoted by (vn, vi), if it exists. Then the canonical
ordering is continued with the canonical ordering of the triangulation of the
sub-polygon (vn, vi, vi+1, . . . , vn−1, vn), if the left neighbor of the edge (vn, v1)
exists. Then the canonical ordering is continued with the right neighbor of
(vn, v1), denoted by (vi, v1), if it exists. Then the canonical ordering is continued
with the canonical ordering of the sub-polygon (vi, v1, v2, . . . , vi), if the right
neighbor of (vn, v1) exists. Note that in the sub-polygon (vn, vi, vi+1, . . . , vn−1, vn),
the first edge is the former diagonal (vn, vi), which defines the canonical
ordering of the diagonals in this sub-polygon. Similarly, (vi, v1) is the first edge
in (vi, v1, v2, . . . , vi).
Define the following yield algebra. The base set A contains the triangula-
tions of sub-polygons (vi , vj , vj+1 , . . . vi ) described as the canonical ordering of
the diagonals participating in the triangulations, where i > j. The parameters
are the pairs (i, j) describing the former diagonal (vi , vj ) of the sub-polygon.
In the partial ordering of the parameters, (i1 , j1 ) ≤ (i2 , j2 ) if i1 ≤ i2 and
j2 ≤ j1 (observe that i > j). If p(Q) = (i, k) and p(R) = (k, j) then the
operator ◦ acts on them as

Q ◦ R = (vi , vk ), Q, (vk , vj ), R.

The unary operator ◦l adds the former diagonal (vi , vj ) to a triangulation to


get a triangulation of a larger sub-polygon. If p(Q) = (i, j), then

◦l (Q) = (vi , vj ), Q.

The recursions are

S((i, j)) = ◦l(S((i − 1, j))) ⊔ (⊔_{k=j+2}^{i−2} S((i, k)) ◦ S((k, j))) ⊔ ◦l(S((i, j + 1)))   (2.142)

with the initial condition S((i + 2, i)) = {ε}, where ε is the empty sequence
(of diagonals). If the score of the triangulation is additive, and the task is
to find the triangulation with minimum score, then the evaluation algebra is
the following. R is the tropical semiring. The function f assigns the sum of
the lengths of the diagonals to a triangulation. The function T for operator ◦
depends on the parameters; if p(Q) = (i, k) and p(W) = (k, j), then

T(Q, W) = |(vi, vk)| ⊙ f(Q) ⊙ |(vk, vj)| ⊙ f(W),

and similarly, for any two values r1, r2 ∈ R,

T(r1, r2, (i, k), (k, j)) = |(vi, vk)| ⊙ r1 ⊙ |(vk, vj)| ⊙ r2.

Here ⊙ is the tropical multiplication, that is, the usual addition, and |(vi, vj)|
denotes the length of the edge (vi, vj). The function Tl for the operator ◦l also
depends on the parameter; if p(Q) = (i, j), then

Tl(Q) = |(vi, vj)| ⊙ f(Q),

and similarly, for any r ∈ R,

Tl(r, (i, j)) = |(vi, vj)| ⊙ r.

The minimum score of the triangulations is F (S((n, 1))).


If the score of the triangulation is additive, and the task is to find the
average score, then the evaluation algebra is the following. R = (ℝ2, ⊕, ⊙),
where ⊕ is the coordinatewise addition and the multiplication is defined as

(x1, y1) ⊙ (x2, y2) := (x1 x2, x1 y2 + y1 x2).

The function f is defined as
$$f((e_1, e_2, \ldots, e_k)) := \left(1,\ \sum_{i=1}^{k} |e_i|\right).$$
The function T for operator ◦ depends on the parameters. If p(Q) = (i, k) and
p(W) = (k, j), then

T(Q, W) = (1, |(vi, vk)|) ⊙ f(Q) ⊙ (1, |(vk, vj)|) ⊙ f(W),

and similarly, for any r1, r2 ∈ R,

T(r1, r2, (i, k), (k, j)) = (1, |(vi, vk)|) ⊙ r1 ⊙ (1, |(vk, vj)|) ⊙ r2.

The function Tl for the operator ◦l depends on the parameter; if p(Q) = (i, j),
then

Tl(Q) = (1, |(vi, vj)|) ⊙ f(Q),

and similarly, for any r ∈ R,

Tl(r, (i, j)) = (1, |(vi, vj)|) ⊙ r.

If F(S((n, 1))) = (x, y), then x is the number of possible triangulations (it is
easy to check that x is the (n − 2)nd Catalan number) and y is the total sum
of the scores of the triangulations. The average score is simply y/x.
When the score is multiplicative and the task is to calculate the minimum
score of the triangulations, then R is the exponentiated tropical semiring,
that is, R = (ℝ+ ∪ {∞}, ⊕, ⊙), where ⊕ is the minimum, and ⊙ is the usual
multiplication. The function f is defined as the product of the diagonal lengths
in the triangulation. The function T for operator ◦ depends on the parameters;
if p(Q) = (i, k) and p(W) = (k, j), then

T(Q, W) = |(vi, vk)| ⊙ f(Q) ⊙ |(vk, vj)| ⊙ f(W),

and similarly, for any r1, r2 ∈ R,

T(r1, r2, (i, k), (k, j)) = |(vi, vk)| ⊙ r1 ⊙ |(vk, vj)| ⊙ r2.

The function Tl for operator ◦l depends on the parameters; if p(Q) = (i, j),
then

Tl(Q) = |(vi, vj)| ⊙ f(Q),

and similarly, for any r ∈ R,

Tl(r, (i, j)) = |(vi, vj)| ⊙ r.

The minimum score is F(S((n, 1))).


Finally, if the score is multiplicative and the task is to calculate the average
score of the triangulations, then R = (ℝ2, ⊕, ⊙), where ⊕ is the coordinatewise
addition and ⊙ is the coordinatewise multiplication. The function f is defined
as
$$f((e_1, e_2, \ldots, e_k)) := \left(1,\ \prod_{i=1}^{k} |e_i|\right).$$
The function T for operator ◦ depends on the parameters; if p(Q) = (i, k) and
p(W) = (k, j), then

T(Q, W) = (1, |(vi, vk)|) ⊙ f(Q) ⊙ (1, |(vk, vj)|) ⊙ f(W),

and similarly, for any r1, r2 ∈ R,

T(r1, r2, (i, k), (k, j)) = (1, |(vi, vk)|) ⊙ r1 ⊙ (1, |(vk, vj)|) ⊙ r2.

The function Tl for operator ◦l depends on the parameters; if p(Q) = (i, j),
then

Tl(Q) = (1, |(vi, vj)|) ⊙ f(Q),

and similarly, for any r ∈ R,

Tl(r, (i, j)) = (1, |(vi, vj)|) ⊙ r.

If F(S((n, 1))) = (x, y), then x is the number of triangulations, and y is the
sum of the scores. The average score is y/x.
Remark. If P is a convex polygon, then x, the number of triangulations,
is the (n − 2)nd Catalan number, and thus, in the last subexercise, R could be
chosen as the real number field, with the usual addition and multiplication.
However, if P is concave, then the recursion in Equation (2.142) can be
modified in such a way that only those operations are considered for which
the emerging diagonals lie inside the polygon. Then the appropriate evaluation
algebra still counts the number of triangulations, which will no longer be the
(n − 2)nd Catalan number.
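The counting and averaging tasks can also be checked with a simpler,
equivalent interval recursion over the apex of the triangle sitting on the edge
(vi, vj), rather than the canonical-ordering yield algebra above; the Python
sketch below (ours) computes the number of triangulations and the average
sum of diagonal lengths for a convex polygon.

from functools import lru_cache
from math import dist

def triangulation_stats(pts):
    n = len(pts)
    d = lambda i, j: dist(pts[i], pts[j])

    @lru_cache(maxsize=None)
    def state(i, j):          # sub-polygon closed by the edge (v_i, v_j), i > j
        if i - j < 2:
            return (1, 0.0)   # fewer than three vertices: nothing to cut
        cnt, tot = 0, 0.0
        for k in range(j + 1, i):     # apex of the triangle on (v_i, v_j)
            c1, t1 = state(k, j)
            c2, t2 = state(i, k)
            # (v_i, v_k) and (v_k, v_j) are diagonals unless they are edges
            extra = (d(i, k) if i - k > 1 else 0.0) + \
                    (d(k, j) if k - j > 1 else 0.0)
            cnt += c1 * c2
            tot += t1 * c2 + c1 * t2 + extra * c1 * c2
        return (cnt, tot)

    cnt, tot = state(n - 1, 0)
    return cnt, tot / cnt

# unit square: two triangulations, each using one diagonal of length sqrt(2)
print(triangulation_stats([(0, 0), (1, 0), (1, 1), (0, 1)]))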
Exercise 25. The two sub-exercises need different yield algebras and
evaluation algebras. If the order of the breakings counts, then the yield algebra
is simple; however, the evaluation algebra is tricky. When the order of the
breakings does not count, the yield algebra is complicated and the evaluation
algebra is simple.
When the order of the breaks does not count, a canonical ordering of the
breakings must be defined. For any chocolate bar larger than 1 × 1, there is
either at least one vertical break running through the entire chocolate bar or
there is at least one horizontal break running through the entire chocolate bar,
however it is impossible to have both horizontal and vertical breaks running
through the entire chocolate bar. Call these breaks long breaks. If there are one
or more vertical long breaks, the canonical order starts with the leftmost break
breaking the bar into pieces B1 and B2 (left and right pieces, respectively),
then followed by the canonical order of breaks of B1 which must start with
a horizontal break, then followed by the canonical order of the breaks of B2 ,


which might start with both a vertical and a horizontal break. If there are one
or more horizontal long breaks, the canonical order starts with the top break
breaking the bar into pieces B1 and B2 (top and bottom pieces, respectively),
then followed by the canonical ordering of the breaks of B1 which must start
with a vertical break, then followed by the canonical order of the breaks of
B2 , which might start with both a vertical and a horizontal break.
Construct the following yield algebra. The base set A contains the canon-
ical ordering of chocolate bars with dimensions i × j. The parameters are
triplets (i, j, x), where i and j are the dimensions of the chocolate bar, and
x ∈ {h, v} denotes if the first break is horizontal or vertical. The partial or-
dering on the parameters is such that (i1 , j1 , x1 ) ≤ (i2 , j2 , x2 ) if i1 ≤ i2 and
j1 ≤ j2 , with the following exception. When both i1 = i2 and j1 = j2 , then
the two parameters are not comparable unless x1 = x2 , when naturally the
two parameters are the same.
If p(Q) = (i1 , j, v) and p(R) = (i2 , j, x), then the ◦h operator is defined as

Q ◦h R := bi1 ,h , Q, R

where bi1 ,h denotes the horizontal break breaking the chocolate bar at the
horizontal line after the first i1 rows. If p(Q) = (i, j1 , h) and p(R) = (i, j2 , x),
then the ◦v operator is defined as

Q ◦v R := bj1 ,v , Q, R

where bj1 ,v denotes the vertical break breaking the chocolate bar at the vertical
line after the first j1 columns. The recursions are

S((i, j, h)) = ⊔_{k=1}^{i−1} ⊔_{x∈{h,v}} S((k, j, v)) ◦h S((i − k, j, x))
S((i, j, v)) = ⊔_{k=1}^{j−1} ⊔_{x∈{h,v}} S((i, k, h)) ◦v S((i, j − k, x))

with the initial condition S((1, 1)) = {ε}, where ε is the empty string (of
breaking steps). The evaluation algebra is the standard one for computing the
size of the sets, that is, R is the integer ring, f is the constant 1 function, and
both functions for the operators ◦h and ◦v are the multiplication.
When the order does count, the possible breaking scenarios are clus-
tered based on a canonical ordering defined below. The first breaking of
the canonical ordering of a breaking scenario is its first break b breaking
the bar into pieces B1 and B2 (the left and right or the top and bottom
pieces, respectively), then followed by the canonical ordering of the breaks in
B1 , b1,1 , b1,2 , . . . , b1,k1 and then the canonical ordering of the breaks in B2 ,
b2,1 , b2,2 , . . . , b2,k2 . If there are g1 number of breaking scenarios of B1 whose
canonical ordering is b1,1 , b1,2 , . . . , b1,k1 and there are g2 number of breaking
scenarios of B2 whose canonical ordering is b2,1, b2,2, . . . , b2,k2, then there are
$$\binom{k_1 + k_2}{k_1} g_1 g_2$$
breaking scenarios whose canonical ordering is

b, b1,1, b1,2, . . . , b1,k1, b2,1, b2,2, . . . , b2,k2.

The idea is to define a yield algebra building the canonical ordering of the
breaking scenarios; then cz^k will be assigned to each canonical ordering, where
c is the number of breaking scenarios that the canonical ordering represents,
and k is the number of breaks in it.
Having said this, define the following yield algebra. The base set A contains
the canonical ordering of the possible breaking scenarios of the chocolate bars
with dimensions i×j. The parameters (i, j) denote these dimensions, (i1 , j1 ) ≤
(i2 , j2 ) if i1 ≤ i2 and j1 ≤ j2 . If p(Q) = (i1 , j) and p(R) = (i2 , j), then the
operator ◦h is defined as

Q ◦h R := bi1 ,h , Q, R

where bi1 ,h denotes the horizontal break breaking the chocolate bar at the
horizontal line after the first i1 rows. If p(Q) = (i, j1 ) and p(R) = (i, j2 ), then
the ◦v operator is defined as

Q ◦v R := bj1 ,v , Q, R

where bj1 ,v denotes the vertical break breaking the chocolate bar at the vertical
line after the first j1 columns. The recursions are

S((i, j)) = (⊔_{k=1}^{i−1} S((k, j)) ◦h S((i − k, j))) ⊔
            (⊔_{k=1}^{j−1} S((i, k)) ◦v S((i, j − k)))

In the evaluation algebra, R = Z[z], the polynomial ring over the integers.
The function f assigns cz^k to a canonical ordering, where c is the number of
breaking scenarios that the canonical ordering represents, and k is the number
of breaks in it. Both functions Th and Tv for the operators ◦h and ◦v are the
following convolution of polynomials:
$$\left(\sum_{i=0}^{k_1} a_i z^i\right)\left(\sum_{j=0}^{k_2} b_j z^j\right) = \sum_{l=1}^{k_1+k_2+1} c_l z^l$$
where
$$c_{k+1} = \sum_{i=0}^{k} \binom{k}{i} a_i b_{k-i}.$$
The total number of breaking scenarios is the sum of the coefficients in
F (S((n, m))).
The answer to the question for fun is that any break takes one piece of
the chocolate bar and results in two pieces; therefore, any breaking scenario
of an n × m chocolate bar needs nm − 1 breakings (the number of pieces
should be increased from 1 to nm, and each break increases the number of
pieces by 1). Thus, F(S((n, m))) consists of only one monomial, cz^{nm−1}.
Therefore, the evaluation algebra can be simplified in the following way. R is
the integer ring. The function f maps each canonical ordering to the number
of breaking scenarios it represents. Both Th and Tv depend on the parameters;
if p(Q) = (i1, j) and p(W) = (i2, j), then
$$T_h(Q, W) = \binom{i_1 j + i_2 j - 2}{i_1 j - 1} f(Q) f(W),$$
and similarly, for any two integers, r1 and r2,
$$T_h(r_1, r_2, (i_1, j), (i_2, j)) = \binom{i_1 j + i_2 j - 2}{i_1 j - 1} r_1 r_2.$$
Tv is defined in a similar way.
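Since every scenario of an i × j bar has exactly ij − 1 breaks, the simplified
algebra collapses to a direct recursion; the Python sketch below is our
illustration of it, interleaving the two sub-scenarios with the binomial
coefficient above.

from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def scenarios(i, j):
    # number of ordered breaking scenarios of an i x j chocolate bar
    if i * j == 1:
        return 1
    total = 0
    for k in range(1, i):      # horizontal first break after k rows
        total += comb(i * j - 2, k * j - 1) * scenarios(k, j) * scenarios(i - k, j)
    for k in range(1, j):      # vertical first break after k columns
        total += comb(i * j - 2, i * k - 1) * scenarios(i, k) * scenarios(i, j - k)
    return total

print(scenarios(2, 2))   # 4: two first breaks, then two orders for the rest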


Exercise 27. Assume that the labels l1 , l2 , . . . ln are ordered. Observe that
in any subtree, the indexes of the labels of the vertices cover an interval
[i, j]. Construct the following yield algebra. The base set A is the possible
binary search trees of the substrings. The parameters (i, j) define the first
and last indexes of the substring. In the partial ordering of the parameters,
(i1, j1) ≤ (i2, j2) if i2 ≤ i1 and j1 ≤ j2. The operator ◦ takes two
binary search trees with parameters p(Q) = (i, k − 1) and p(W) = (k + 1, j)
and connects the roots of Q and W via the new root of the larger subtree.
The new root is labeled by lk . The unary operator ◦l takes a sub-tree with
parameter (i, j), and connects its root to the new root of the larger sub-tree
labeled by lj+1 . The child of the new root is a left child. Finally, the unary
operator ◦r takes a sub-tree with parameter (i, j) and connects its root to
the new root labeled by li−1 . The child of the new root is a right child. The
recursions are

S((i, j)) = ◦r(S((i + 1, j))) ⊔ (⊔_{k=i+1}^{j−1} S((i, k − 1)) ◦ S((k + 1, j))) ⊔ ◦l(S((i, j − 1))).

For each i, the initial value S((i, i)) is the set containing the tree with a single
vertex labeled by li .
If the task is to find the minimum possible cost, then the algebraic structure
R in the evaluation algebra is the tropical semiring. The f function assigns
its score to any binary search tree. The function T for operator ◦
depends on the parameters; if p(Q) = (i, k − 1) and p(W) = (k + 1, j), then
$$T(Q, W) = f(Q) \odot f(W) \odot \left(\bigodot_{m=i}^{j} \pi(l_m)\right),$$
where ⊙ is the tropical multiplication, that is, the usual addition. The
explanation for this definition is that each distance in Q and W is increased by
1; furthermore, the new root is labeled with lk, and this new vertex also has
distance 1. Similarly, for any r1, r2 ∈ R,
$$T(r_1, r_2, (i, k - 1), (k + 1, j)) = r_1 \odot r_2 \odot \left(\bigodot_{m=i}^{j} \pi(l_m)\right).$$
The functions Tl and Tr also depend on the parameter. If p(Q) = (i, j), then
$$T_r(Q) = f(Q) \odot \left(\bigodot_{m=i-1}^{j} \pi(l_m)\right),$$
and for any r1 ∈ R,
$$T_r(r_1, (i, j)) = r_1 \odot \left(\bigodot_{m=i-1}^{j} \pi(l_m)\right).$$
Similarly, for p(Q) = (i, j),
$$T_l(Q) = f(Q) \odot \left(\bigodot_{m=i}^{j+1} \pi(l_m)\right),$$
and for any r1 ∈ R,
$$T_l(r_1, (i, j)) = r_1 \odot \left(\bigodot_{m=i}^{j+1} \pi(l_m)\right).$$

If the task is to calculate the average score of the possible binary search
trees, then R = (ℝ2, ⊕, ⊙), where the addition is coordinatewise and the
multiplication is defined as

(x1, y1) ⊙ (x2, y2) := (x1 x2, x1 y2 + y1 x2).

For any binary search tree with score x, the f function assigns the value (1, x).
The function T for operator ◦ depends on the parameters; if p(Q) = (i, k − 1)
and p(W) = (k + 1, j), then
$$T(Q, W) = f(Q) \odot f(W) \odot \left(1,\ \sum_{m=i}^{j} \pi(l_m)\right),$$
and similarly, for any r1, r2 ∈ R,
$$T(r_1, r_2, (i, k - 1), (k + 1, j)) = r_1 \odot r_2 \odot \left(1,\ \sum_{m=i}^{j} \pi(l_m)\right).$$
The functions Tr and Tl for operators ◦r and ◦l also depend on the parameter.
If p(Q) = (i, j), then
$$T_r(Q) = f(Q) \odot \left(1,\ \sum_{m=i-1}^{j} \pi(l_m)\right),$$
$$T_l(Q) = f(Q) \odot \left(1,\ \sum_{m=i}^{j+1} \pi(l_m)\right),$$
and for any r1 ∈ R,
$$T_r(r_1, (i, j)) = r_1 \odot \left(1,\ \sum_{m=i-1}^{j} \pi(l_m)\right),$$
and
$$T_l(r_1, (i, j)) = r_1 \odot \left(1,\ \sum_{m=i}^{j+1} \pi(l_m)\right).$$
If F (S((1, n))) = (x, y), then the average score is y/x.
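For the minimum-cost subcase, the tropical evaluation collapses to the
classical optimal binary search tree recursion; the following Python sketch
(ours) assumes 0-based labels with access probabilities pi[i].

from functools import lru_cache

def optimal_bst_cost(pi):
    n = len(pi)
    prefix = [0.0]
    for x in pi:
        prefix.append(prefix[-1] + x)
    weight = lambda i, j: prefix[j + 1] - prefix[i]

    @lru_cache(maxsize=None)
    def cost(i, j):
        if i > j:
            return 0.0
        # choosing root l_k pushes both subtrees one level deeper, which
        # adds weight(i, j) to the expected number of comparisons
        return weight(i, j) + min(cost(i, k - 1) + cost(k + 1, j)
                                  for k in range(i, j + 1))

    return cost(0, n - 1)

print(optimal_bst_cost([0.5, 0.25, 0.25]))   # 1.75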
Exercise 28. Let v be an arbitrary leaf of T = (V, E) and define a partial
ordering ≤ on V such that v1 ≤ v2 if the unique path from v1 to v contains
v2. Let Tvi denote the subtree which contains the vertices which are smaller
than or equal to vi. A matching of a subtree Tvi = (Vvi, Evi) is a mapping M :
Evi → {0, 1} such that for any two edges e1, e2 ∈ Evi, if M(e1) = M(e2) = 1,
then e1 and e2 are disjoint. The size of a matching is |{e | M(e) = 1}|.
Define the following yield algebra. The base set A contains all matchings
of all subtrees Tvi. The parameter set Θ is V × {0, 1}. On this parameter set,
(vi, xi) ≤ (vj, xj) if vi < vj in the above defined partial ordering of the vertices
of T, or vi = vj and xi = xj. If M ∈ A is a matching on Tvi such that one of
the edges covers vi, then p(M) = (vi, 1); otherwise p(M) = (vi, 0).
The following operators are defined in the yield algebra. Let u be an inter-
nal node, let w1 and w2 be its children (the vertices that are smaller than u in
the partial ordering). Let e1 = (u, w1 ) and e2 = (u, w2 ). If M1 is a matching
of Tw1 and M2 is a matching of Tw2 , then M1 ◦ M2 is a matching M of Tu
such that M (e1 ) = M (e2 ) = 0 and other assignments follow the mappings in
M1 and M2 .
If M1 is a matching with parameter (w1 , 0) and M2 is a matching of Tw2 ,
then M1 ◦l M2 is a matching M of Tu such that M (e1 ) = 1, M (e2 ) = 0 and
other assignments follow the mappings in M1 and M2 .
If M1 is a matching of Tw1 and M2 is a matching with parameter (w2 , 0),
then M1 ◦r M2 is a matching M of Tu such that M (e1 ) = 0, M (e2 ) = 1 and
other assignments follow the mappings in M1 and M2 .
If the child of v is w, e = (v, w), and M is a matching with parameter
(w, 0), then M1 = ◦1 (M ) is a matching such that M1 (e) = 1 and all other
mappings follow M. Finally, if M is a matching of Tw, then M0 = ◦0(M) is
a matching such that M0 (e) = 0 and all other assignments follow M .
The recursions are

S((u, 0)) = ⊔_{x1∈{0,1}, x2∈{0,1}} S((w1, x1)) ◦ S((w2, x2))              (2.143)
S((u, 1)) = (⊔_{x2∈{0,1}} S((w1, 0)) ◦l S((w2, x2))) ⊔
            (⊔_{x1∈{0,1}} S((w1, x1)) ◦r S((w2, 0)))                      (2.144)

for any internal node u and its children w1 and w2, and

S((v, 0)) = ◦0(S((w, 0))) ⊔ ◦0(S((w, 1)))                                 (2.145)
S((v, 1)) = ◦1(S((w, 0)))                                                 (2.146)

for v and its child w.


If the task is to calculate the number of matchings, then the evaluation
algebra can be set in the following way. R is the integer ring, f is the constant
1 function, the T functions for ◦, ◦l and ◦r are multiplications, and the T
functions for ◦0 and ◦1 are the identity. The total number of matchings is
F(S((v, 0))) + F(S((v, 1))).
If the task is to calculate the size of the maximum matching, then R is
the dual tropical semiring (where the addition is the maximum and not the
minimum), and f is the size of the matching. The T function for ◦ is the
tropical multiplication (that is, the usual addition of integers); for ◦l and ◦r,
it is the tropical multiplication further tropically multiplied with 1 (that is,
the usual addition of integers and further adding 1); the T function for ◦0 is
the identity; and the T function for ◦1 is the tropical multiplication with 1,
namely, adding 1 in the usual arithmetic. The size of a maximum matching
is F(S((v, 0))) ⊕ F(S((v, 1))), that is, max{F(S((v, 0))), F(S((v, 1)))} in the
usual arithmetic.
If the task is to count the number of maximum matchings, then R = Z[x],
f(M) = x^{|M|}. The T function for ◦ is the multiplication. For ◦l and ◦r, it is
the multiplication, such that the product is further multiplied with x. It is
the identity function for ◦0, and finally, it is the multiplication with x for ◦1.
The number of maximum matchings is the coefficient of the largest monomial
in F(S((v, 0))) + F(S((v, 1))).
Finally, if the task is to calculate the average number of edges in a matching,
then the evaluation algebra might be set similarly to the one in Subsection
2.1.3.4. Alternatively, since the number of edges in a matching is a non-negative
integer and upper bounded by half the number of vertices, the same evaluation
algebra can be used as the one used to count the number of maximum
matchings. The number of matchings is the value of the polynomial
F(S((v, 0))) + F(S((v, 1))) at x = 1, and the total number of edges in all
matchings is the value of the derivative d/dx [F(S((v, 0))) + F(S((v, 1)))] at
x = 1. The average number of edges in a matching is the total number of
edges in the matchings divided by the number of matchings.
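The matching-counting algebra specializes to a two-value recursion per
vertex; the Python sketch below (ours) assumes the binary tree is given as
nested tuples (left, right) with None for a missing child, so that free and
matched play the roles of S((u, 0)) and S((u, 1)).

def count_matchings(tree):
    def visit(node):
        if node is None:
            return None
        child = [visit(c) for c in node]
        free = 1
        for w in child:
            if w is not None:
                free *= w[0] + w[1]          # child subtree, any matching
        matched = 0
        for k, w in enumerate(child):
            if w is None:
                continue
            other = child[1 - k]
            rest = (other[0] + other[1]) if other is not None else 1
            matched += w[0] * rest           # match the root to child k
        return (free, matched)

    f, m = visit(tree)
    return f + m

# root with two leaf children: the empty matching and the two single edges
print(count_matchings(((None, None), (None, None))))   # 3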
Exercise 32. Let the input text be w1 w2 . . . wn , where each wi is a word.
For any i ≤ j, define

$$g(i, j) := \left(t - \left(\sum_{k=i}^{j} |w_k| + j - i\right)\right)^{\!2}.$$

Define the following yield algebra. The base set A contains the possible
wrappings of the prefix text w1 w2 . . . wi into lines. The parameter i denotes
the number of words in the prefix, and the parameters are naturally ordered.
The operator ◦i,j adds a new line to the wrapped text containing the words
from wi till wj. The recursions are
$$S(j) = \bigsqcup_{i \,\mid\, \sum_{k=i}^{j} |w_k| + j - i \le t} \circ_{i,j}(S(i - 1))$$
with the initial condition S(0) containing an empty text.


In the evaluation algebra, R = Z[z1, z2], the two-variable polynomial ring.
The function f assigns $z_1^k z_2^x$ to each wrapping, where k is the number of
lines, and x is the total score. The function Ti,j for operator ◦i,j is the
multiplication with $z_1 z_2^{g(i,j)}$. The number of possible wrappings of the text
into k lines with score x is the coefficient of $z_1^k z_2^x$ in F(S(n)).
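A direct Python sketch of this algebra (ours) keeps, for every prefix length j,
a dictionary from (k, x) to the number of wrappings with k lines and score x;
wl holds the word lengths and t is the line capacity.

from collections import defaultdict

def wrap_counts(wl, t):
    n = len(wl)
    table = [defaultdict(int) for _ in range(n + 1)]
    table[0][(0, 0)] = 1
    for j in range(1, n + 1):
        for i in range(j, 0, -1):               # words i..j form the last line
            width = sum(wl[i - 1:j]) + (j - i)  # letters plus inner spaces
            if width > t:
                break
            g = (t - width) ** 2
            for (k, x), c in table[i - 1].items():
                table[j][(k + 1, x + g)] += c
    return dict(table[n])

print(wrap_counts([3, 2, 4], 6))   # {(3, 29): 1, (2, 4): 1}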
Exercise 34. First a few notations are introduced. Recall that a sequence Y is
a supersequence of sequence X if X is a subsequence of Y. Y is a palindromic
supersequence of X if the following conditions hold.
• Y is a supersequence of X.
• Y is palindromic.
• X can be injected into Y such that for each pair (yi, ym+1−i), at least
one of the characters is an image of a character from X, where m = |Y|;
furthermore, if m is odd, then y(m+1)/2 is an image of a character of X.

Note that for an X and one of its palindromic supersequences Y, there might
be more than one injection that certifies that Y is indeed a supersequence
of X. However, each such injection indicates a possible (and different!) way
to make X palindromic. Any injection can be represented as an alignment
of X and Y that does not contain mismatches (substitutions) and deletions,
only matches and insertions. Therefore the algebraic dynamic programming
approach counts these alignments.
Based on the above, define the following yield algebra. The base set A is
the possible alignments of substrings of X to their possible palindromic
supersequences. The parameters (i, j) indicate the first and the last indexes of
the substrings. In the partial ordering of the parameters, (i1, j1) ≤ (i2, j2) if
i2 ≤ i1 and j1 ≤ j2. The unary operator ◦ takes an alignment with parameters
(i, j), adds an alignment column $\binom{x_{i-1}}{x_{i-1}}$ at the beginning of the
alignment, and adds an alignment column $\binom{x_{j+1}}{x_{j+1}}$ at the end of the
alignment. The unary operator ◦l takes an alignment with parameters (i, j),
adds an alignment column $\binom{x_{i-1}}{x_{i-1}}$ at the beginning of the alignment,
and adds an alignment column $\binom{-}{x_{i-1}}$ at the end of the alignment. The
unary operator ◦r takes an alignment with parameters (i, j), adds an alignment
column $\binom{-}{x_{j+1}}$ at the beginning of the alignment, and adds an alignment
column $\binom{x_{j+1}}{x_{j+1}}$ at the end of the alignment. The recursions are

S((i, j)) = ◦l(S((i + 1, j))) ⊔ ◦r(S((i, j − 1)))   if xi ≠ xj

and

S((i, j)) = ◦l(S((i + 1, j))) ⊔ ◦r(S((i, j − 1))) ⊔ ◦(S((i + 1, j − 1)))   if xi = xj

with initial conditions $S((i, i)) = \left\{\binom{x_i}{x_i}\right\}$ and S((i + 1, i)) = {ε}, where ε
is the empty alignment.

In the evaluation algebra, the algebraic structure R is the integer polynomial
ring Z[z]. The function f assigns z^k to an alignment, where k is the number
of insertions in it. The function T for the operator ◦ is the identity function;
the functions Tl and Tr for the operators ◦l and ◦r are both the multiplication
with z. The number of ways to insert a minimum number of characters
into X to make it palindromic is the coefficient of the smallest monomial in
F(S((1, n))), where n is the length of X.
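Since only the smallest monomial is needed, the polynomial can be truncated
to its lowest term; the following Python sketch (ours) carries the pair
(minimum number of insertions, number of ways) instead of the full
polynomial.

from functools import lru_cache

def min_insertion_ways(x):
    n = len(x)

    @lru_cache(maxsize=None)
    def best(i, j):                       # (min insertions, #ways) for x[i..j]
        if i >= j:
            return (0, 1)                 # empty or single-character substring
        kl, cl = best(i + 1, j)           # insert a copy of x[i] at the right end
        kr, cr = best(i, j - 1)           # insert a copy of x[j] at the left end
        k, c = min(kl, kr) + 1, 0
        if kl + 1 == k:
            c += cl
        if kr + 1 == k:
            c += cr
        if x[i] == x[j]:
            km, cm = best(i + 1, j - 1)   # match both ends, no insertion
            if km < k:
                k, c = km, cm
            elif km == k:
                c += cm
        return (k, c)

    return best(0, n - 1)

print(min_insertion_ways("ab"))   # (1, 2): "aba" or "bab"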
Exercise 35. Construct a partially ordered set that contains all (ai , bi ),
(bi , ai ), (ai , ci ), (ci , ai ), (bi , ci ) and (ci , bi ) pairs for each box with dimensions
(ai , bi , ci ). Add a 0 and a 1 to this partially ordered set. The ordering is such
that (x1 , y1 ) ≤ (x2 , y2 ) if x1 ≤ x2 and y1 ≤ y2 . The possible stacks of boxes are
the subsequences of the possible paths from 0 to 1. The yield algebra should
build up stacks such that the last box is a given one with a given rotation.
The evaluation algebras are the standard ones with the tropical semiring, the
integer ring and the ring introduced in Subsection 2.1.3.4.
Exercise 36. Recall that the sub-trees of a rooted binary tree are the rooted
binary trees that are rooted in a vertex of the tree. Define the following yield
algebra. For each subexercise, the base set A contains the colored subtrees of
the given tree T colored as follows:
(a) The vertices are colored with black and white, and no two neighboring
vertices are both colored black. The black vertices form an independent set.
(b) The vertices are colored with three colors, and no two neighboring vertices
have the same color.
(c) The vertices are partially colored with k ≥ 2 colors, and no two neighboring
vertices have the same color; however, both might be uncolored.
The parameters of a coloring are pairs (v, c), where v denotes the root of
the subtree and c is its color (being uncolored is considered an additional
color). The partial ordering of the parameters is such that (v1, c1) ≤ (v2, c2) if
v2 is on the path from v1 to the root of T . The operator ◦c takes two subtrees
with sibling nodes, connects them via their common parent, and colors this
parent with color c. In the following recursions, the children of v are u1 and
u2 , w denotes white, b denotes black, r denotes red, n denotes “no color”. The
recursions are
(a)

S((v, w)) = (S((u1, b)) ⊔ S((u1, w))) ◦w (S((u2, b)) ⊔ S((u2, w)))
S((v, b)) = S((u1, w)) ◦b S((u2, w))

(b)

S((v, w)) = (S((u1, b)) ⊔ S((u1, r))) ◦w (S((u2, b)) ⊔ S((u2, r)))
S((v, b)) = (S((u1, w)) ⊔ S((u1, r))) ◦b (S((u2, w)) ⊔ S((u2, r)))
S((v, r)) = (S((u1, w)) ⊔ S((u1, b))) ◦r (S((u2, w)) ⊔ S((u2, b)))

(c)

S((v, ci)) = ((⊔_{cj ≠ ci} S((u1, cj))) ⊔ S((u1, n))) ◦ci
             ((⊔_{cj ≠ ci} S((u2, cj))) ⊔ S((u2, n)))
S((v, n)) = ((⊔_{ci} S((u1, ci))) ⊔ S((u1, n))) ◦n
            ((⊔_{ci} S((u2, ci))) ⊔ S((u2, n))).
                                                            (2.147)

The evaluation algebra is the standard one counting the size of sets, that is,
R is the integer ring, f is the constant 1 function, and each T operator is the
multiplication.
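Recursion (2.147) can be sketched as follows (our illustration) for trees given
as nested tuples (left, right) with None for a missing child; ways[c] is the
number of partial proper colorings in which the root receives color c, with
c = k standing for "uncolored".

def partial_colorings(tree, k):
    def visit(node):
        if node is None:
            return None
        child = [visit(ch) for ch in node]
        ways = [0] * (k + 1)
        for c in range(k + 1):
            total = 1
            for w in child:
                if w is None:
                    continue
                # the child may take any colour except c; an uncoloured
                # root (c = k) puts no restriction on its children
                total *= sum(w) - (w[c] if c != k else 0)
            ways[c] = total
        return ways
    return sum(visit(tree))

# root with a single child and k = 2 colours: 3*3 - 2 = 7 legal options
print(partial_colorings(((None, None), None), 2))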
Exercise 38. Note that inverting the path from the bottom right corner to the
top left makes it also a path from the top left corner to the bottom right. This
inverted path will step to a shared square in the same number of steps as the
top-down path. Having said this, construct the following yield algebra. The
base set A contains the pair of prefixes of the top-down path and the inverted
down-top path with the same length. The parameter quadruple (i1 , j1 , i2 , j2 )
gives the indexes of the last squares of the prefixes, (i1 , j1 ) and (i2 , j2 ). In the
partial ordering of the parameters, (i1 , j1 , i2 , j2 ) ≤ (l1 , m1 , l2 , m2 ) if ik ≤ lk
and (jk ≤ mk ) for both k = 1, 2. The four unary operators, ◦hh , ◦hv , ◦vh and
◦vv extend the prefixes with horizontal or vertical steps. The recursions are

S((i1 , j1 , i2 , j2 )) = ◦hh (S((i1 − 1, j1 , i2 − 1j2 ))) t


t ◦hv (S((i1 − 1, j1 , i2 , j2 − 1))) t
t ◦vh (S((i1 , j1 − 1, i2 − 1j2 ))) t
t ◦vv (S((i1 , j1 − 1, i2 , j2 − 1))) ,

with the initial condition S((1, 1, 1, 1)) containing a pair of paths both con-
taining the top left square.
120 Computational complexity of counting and sampling

If the task is to count the paths, then the evaluation algebra is the usual
one with R = Z, the f function is the constant 1, and each function Txy ,
x, y ∈ {h, v} is the identity. The number of paths is F (S((n, m))).
If the task is to calculate the tour with the maximum score, then the
semiring in the evaluation algebra is the dual tropical semiring (with maxi-
mum instead of minimum). The f function is the sum (tropical product) of
the number of coins along the paths, without multiplicity. The T functions
depend on the parameter. If the parameter of the resulting couple of paths
is (i1 , j1 , i2 , j2 ) and i1 = i2 (and then j1 = j2 !), then the function is the
tropical multiplication (usual addition) with the number of coins on square
mi1 ,j1 . Otherwise, the function is the tropical multiplication (usual addition)
of the sum (tropical product) of the coins on the two different squares. The
maximum possible sum of coin values is F (S((n, m))).
Finally, if the task is to calculate the average score of the tours, then
R = (R2 , ⊕, ), where ⊕ is the coordinatewise addition and
(x1 , y1 ) (x2 , y2 ) = (x1 x2 , x1 y2 + y1 x2 ).
The f function assigns (1, w) of a pair of paths, where w is the sum of the
coins on a pair of paths without multiplicity. The T functions depend on
the parameters. If the resulting pair of paths have parameters (i1 , j1 , i2 , j2 )
and i1 = i2 , then it is the multiplication with (1, wi1 ,j1 ), where wi1 ,j1 is the
number of coins on square mi1 ,j1 . Otherwise, it is the multiplication with
(1, wi1 ,j1 + wi2 ,j2 ). If F (S((n, m))) = (x, y), then the average score is y/x.
Exercise 39. The trick of inverting the down-top path still works. However,
the lengths of the paths might differ, therefore the prefix of the top-down
path and the prefix of the inverted path might step to the same square after a
different number of steps. Therefore the pair of prefixes must be in the same
column to be able to check which squares are shared. An extension of both
prefixes contains a diagonal or horizontal step and possibly a few down steps.
Exercise 40. Construct the following yield algebra. The base set A contains
the possible tripartitions XtY tZ of the possible prefixes of a1 , a2 , . . . , an . The
parameter (l, d1 , d2 ) describesP the lengthP of the prefix, l,P
and the differences
P in
the sums of the sets, d1 := ai ∈X ai − aj ∈Y aj , d2 := ai ∈X ai − ak ∈Z ak .
The partial ordering is based on the length of the prefixes, and a parameter
with shorter prefix length is smaller. The unary operators ◦ai ,X , ◦ai ,Y , and
◦ai ,Z add ai to the sets X, Y and Z, respectively. The recursions are
S(l, d1 , d2 ) = ◦al ,X (S((l − 1, d1 − al , d2 ))) t
◦al ,Y (S((l − 1, d1 + al , d2 ))) t ◦al ,Z (S((l − 1, d1 , d2 + al )))
with the initial condition S((0, 0, 0)) containing an empty set for all X, Y and
Z.
If the task is to count the tripartitions with equal sums, then the evaluation
algebra contains R = Z, f is the constant 1 function, and all T functions
are the identity functions. The number of tripartitions with equal sums is
S((n, 0, 0)).
Algebraic dynamic programming and monotone computations 121

The average of the sum of squared differences of the subset sums can be
calculated with the same evaluation algebra. The solution is
P P 2 2 2

d1 d2 F (S((n, d1 , d2 )))(d1 + d2 + (d1 − d2 ) )
P P .
d1 d2 F (S((n, d1 , d2 )))

Exercise 42. First, observe that



(c∗ ) = c
since the ∗ operation is the mirroring to the y = x axis when c is drawn as a
column diagram. This means that there is a bijection between the possible c
sequences and c∗ sequences. Furthermore,
m
X n
X
ci ≥ c∗j
i=1 j=1

with equality if c1 ≤ n. Therefore the pair of conditions


n
X m
X
bi = ci
i=1 j=1

and
n
X n
X
bi ≤ c∗j
i=1 j=1

is equivalent with
n
X n
X
bi = f cj ∗
i=1 j=1

and c1 ≤ n (and thus c∗n+1 = 0). Therefore the number of graphical bidegree
sequences {b, c} with length |b| = n and |c| = m is the number of decreasing
non-negative sequences {b, c∗ } with length |b| = n and |c∗ | = n such that they
satisfy for all k = 1, . . . , n
k
X Xk
bi ≤ c∗j (2.148)
i=1 j=1

with equality when k = n, and b∗1 ≤ m.


Based on the above, define the following yield algebra. The base set
A contains the equal long (b, c∗ ) pair of sequences satisfying condition in
Equation (2.148) and c∗1 ≤ m. The parameter set contains quadruples
(s, t, δ, k) describing the last number in b, the last number in c∗ , the difference
Pk ∗
Pk
i=1 ci − j=1 bj , and the length of the sequences, respectively. In the par-
tial ordering of the parameters, (s1 , t2 , δ1 , k1 ) ≤ (s2 , t2 , δ2 , k2 ) is k1 ≤ k2 . The
unary operator ◦x,y extends the sequence b with number x and the sequence
c∗ with number y. The recursions are
S((s, t, d, k)) = ts1 ≥s tt1 ≥t ◦s,t (S((s1 , t1 , d + t − s, k − 1)))
122 Computational complexity of counting and sampling

with initial conditions S((s, t, t − s, 1)) = {{s, t}} and S((s, t, δ, 1)) = ∅ for all
δ 6= t − s.
The evaluation algebra is the standard one counting the size of the sets,
that is, R = Z, f is the identity function and each Tx,y function is the identity
function. The number of graphical bidegree sequences with lengths |b| = n
and |c| = m is
Xm X m
F (S((s, t, 0, n))).
s=0 t=0
Chapter 3
Linear algebraic algorithms. The
power of subtracting

3.1 Division-free algorithms for calculating the determinant and


Pfaffian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
3.2 Kirchhoff’s matrix-tree theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
3.3 The BEST (de Bruijn-Ehrenfest-Smith-Tutte) algorithm . . . . . . . 139
3.4 The FKT (Fisher-Kasteleyn-Temperley) algorithm . . . . . . . . . . . . . 145
3.5 The power of subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
3.6 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
3.8 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160

In this chapter, we are going to introduce algorithms that count discrete math-
ematical objects using three operations: addition, subtraction and multiplica-
tion. The determinant and the Pfaffian of certain matrices will be the center
of interest. The minors of Laplacians of graphs are the number of spanning
trees or in case of directed matrices, the number of in-trees. The number of
in-trees appears in counting the Eulerian circuits in directed Eulerian graphs.
The Pfaffians of appropriately oriented adjacency matrices of planar graphs
are the number of perfect matchings in those graphs.
The above-mentioned discrete mathematical objects might have weights,
and these weights might come from arbitrary commutative rings. We might
also want to calculate the sum of these weights. Divisions are not available
in commutative rings, therefore it is necessary to give division-free polyno-
mial running time algorithms for calculating the determinant and Pfaffian of
matrices in an arbitrary commutative ring. Surprisingly, such algorithms are
available, and actually, they are dynamic programming algorithms. These dy-
namic programming algorithms are on some combinatorial objects called clow
sequences. Clow sequences are generalizations of permutations. The sum of
signed weights of clow sequences coincide with the determinants and Pfaffi-
ans, which, by definition, are the sum of signed weights of permutations. The
coincidence is due to cancellation of terms in the summation, which might
not be available without subtractions. We are going to show that subtractions
are inevitable in efficient algorithms: theorems exist that there are no algo-
rithms that can count the above-mentioned discrete mathematical objects in
polynomial time without subtraction.

123
124 Computational complexity of counting and sampling

3.1 Division-free algorithms for calculating the determi-


nant and Pfaffian
It is well known that the determinant of a matrix can be calculated in
polynomial time using Gaussian elimination. However, Gaussian elimination
contains divisions that we might want to avoid for several reasons. One pos-
sible reason is that division might cause numerical instability [77]. Another
reason is that we might want to calculate the determinant of a matrix over a
commutative ring where division is not possible. This is the case, for example,
when the characteristic polynomial of a matrix is to be calculated, which is
formally a determinant of a matrix over the polynomial ring R[λ]. The first
such method was published in the textbook by Faddayev and Faddayeva [66],
referring to a work of Samulelson [149]. Berkowitz published an algorithm
[18] to calculate the determinant with parallel processors and he called it
Samuelson’s method referring to Faddayev and Faddyeva’s textbook. Valiant
gave a combinatorial explanation of the method and called it the Samuelson-
Berkowitz algorithm. In fact, Samuelson’s method was not division-free yet,
so the reason for the erroneous attribution is that the first division-free ap-
proach is in the text book by Faddayev and Faddayeva in the section about
Samuelson’s method. The method was further developed by Mahayan and
Vinay [124]. Here we introduce the combinatorial description of the method.
The combinatorial approach is also a dynamic programming algorithm. It
calculates the determinant also in polynomial time, although in O(n4 ) running
time instead of O(n3 ). On the other hand, it can be applied for a matrix over
an arbitrary commutative ring. By definition, the determinant of a matrix
M = {mi,j } is
X Yn
det(M ) = sign(σ) mi,σ(i) . (3.1)
σ∈Sn i=1

The determinant can be considered as the sum of the weights of permuta-


tions, where the weight of a permutation depends on the matrix M , and it
is a multiplicative function. Since there are efficient dynamic programming
algorithms to calculate the sum of weights of combinatorial objects, a naı̈ve
idea is to build up permutations in a yield algebra, and apply a standard
evaluation algebra that calculates the sum of the weights. However, this is
impossible, since there is no yield algebra that can build all the permutations
in Sn and there are only O(poly(n)) number of parameters below that θ for
which S(θ) = Sn , see Section 3.5. Instead, clow sequences have to be used, as
defined below.
Definition 40. Let G ~ = (V, E) be a directed graph with an arbitrary to-
tal ordering of its vertices. A clow (closed ordered walk) is a walk C =
vi0 , ej1 , vi1 , . . . ejk , vik such that for all l = 1, . . . , k −1, vil > vi0 and vik = vi0 .
For sake of simplicity, the edges and the last vertex might be omitted in the
Linear algebraic algorithms. The power of subtracting 125

description of walks. When the last vertex is omitted, the series of vertices
are put into parenthesis, indicating that the walk is closed. Vertex vi0 is called
the head of the walk, and is denoted by head(C). The length of a walk is the
number of edges in it, and is denoted by l(C).
A clow sequence is a series of clows, C = C1 , C2 , . . . Cm such that for all
i = 1, . . . , m − 1, head(Ci ) < head(Ci+1 ). The length of a clow sequence, l(C),
is the sum of the length of the clows in it.
If G ~ is edge-weighted, where w : E → R denotes the weight func-
tion, the score of a clow sequence is defined. The score of a clow C =
vi0 , ej1 , vi1 , . . . ejk , vik is defined as
k
Y
W (C) := w(ei ). (3.2)
i=1

and the score of a clow sequence C = C1 , C2 . . . Cm is defined as


m
Y
W (C) := (−1)l(C)+m W (Ci ) (3.3)
i=1

When G~ is the complete directed graph with loops, all permutations appear
amongst clow sequences. Indeed, define the canonical cycle representation of
a permutation as

(σ1,1 σ1,2 . . . σ1,l1 )(σ2,1 . . .) . . . (σm,1 , . . . σm,lm )

such that for all i, j, σi,1 < σi,j and for all i = 1, . . . , m − 1, σi,1 < σi+1,1 .
This canonical ordering represents a clow sequence containing m clows, and
the ith clow is
(vσi,1 , vσi,2 , . . . , vσi,li ).
On the other hand, there are clow sequences that are not permutations.
Indeed a clow might visit a vertex several times and two different clows might
visit the same vertex if both heads of the clows are smaller than the visited
vertex. However, these are the only exceptions when a clow sequence is not a
permutation, as stated in the lemma below.
Lemma 3. Let K ~ n be the complete directed graph on n vertices, in which
loops are allowed. Let Pn denote the set of clow sequences on K ~ n for which
each clow is a cycle and each vertex is in at most one cycle. Then the mapping
ϕ that maps each permutation

(σ1,1 σ1,2 . . . σ1,l1 )(σ2,1 . . .) . . . (σm,1 , . . . σm,lm )

to the clow sequence

(vσ1,1 , vσ1,2 , . . . , vσ1,l1 vσ1,1 ) . . . (vσm,1 , . . . , vσm,lm )

is a bijection between Sn and Pn .


126 Computational complexity of counting and sampling

Proof. The bijection is based on the canonical representation of the permu-


tation given above. Each permutation has one canonical representation. Let
ϕ(σ) denote the clow sequence obtained by the canonical representation. It
is clear that ϕ is an injection. It is also clear that ϕ is a surjection, since
any clow sequence C ∈ Pn appears as an image of a permutation. Indeed, if
C = C1 , C2 , . . . , Cm is a clow sequence in P, then it is image of the permu-
tation that contains m cycles and the ith cycle contains the indexes of the
vertices in Ci . That is indeed a cycle representation of a permutation, since
each index appears at most once, however, there are n possible indexes and
exactly n indexes appear, thus each index appears exactly once.
Above this bijection, permutations and clow sequences in P have the same
score when a directed graph is weighted based on a matrix. This is precisely
stated in the following theorem
Theorem 28. Let M be an n × n matrix. Assign weights to the directed
~ n such that w((vi , vj )) := mi,j . Then
complete graph K
X
det(M ) = W (C). (3.4)
C∈P

Proof. Let σ be a permutation in Sn , and define


n
Y
W (σ) := sign(σ) mi,σ(i) .
i=1

It is sufficient to show that for any permutation σ,

W (σ) = W (ϕ(σ))

for the bijection ϕ given in Lemma 3. It is true, since sign(σ) = (−1)n+m ,


where m is the number of cycles in σ, and that is indeed (−1)l(ϕ(σ))+m . The
weights in the products in W (σ) and W (ϕ(σ)) are also the same due to the
~ n.
definition of ϕ and the definition of the edge weights in K
Although there are clow sequences which are not permutations, the fol-
lowing theorem also holds.
Theorem 29. Let M be an n × n matrix. Assign weights to the directed
complete graph K~ n such that w((vi , vj )) := mi,j . Let Cn denote the set of clow
sequences of length n. Then
X
det(M ) = W (C). (3.5)
C∈Cn

Proof. According to Theorem 28, it is sufficient to show that


X
W (C) = 0. (3.6)
C∈Cn \Pn
Linear algebraic algorithms. The power of subtracting 127

To show this, an involution g is given on Cn \Pn such that for any C ∈ Cn \Pn ,
it holds that
W (C) = −W (g(C)).
Let C be a clow sequence in Cn \ Pn containing m clows. Let i be the smallest
index such that Ci+1 , Ci+2 , . . . Cm are vertex disjoint cycles. Since C is not in
P, i cannot be 0. Ci = (vi,1 , vi.2 , . . . , vi,l+i ) is not a cycle or Ci intersects with
some Ck , k > i or both. Let vi,p be called a witness if vi,p ∈ Ck for some k > i
or there exists vi,q ∈ Ci such that q < p and vi,p = vi,q . Take the smallest
index j such that vi,j is a witness. Observe that either vi,j ∈ Ck or there is
a j 0 < j such that vi,j = vi,j 0 , but the two cases cannot hold in the same
time. Indeed, otherwise vi,j 0 was a witness since it is in Ck , contradicting the
minimality of j. If vi,j ∈ Ck , and it has an index pair k, j 0 in Ck , then define

Ci0 := (vi,1 , vi,2 , . . . vi,j , vk,j 0 +1 , vk,j+2 , . . . v,k,l+k , vk,1 , . . . , vk,j 0 , vi,j+1 , . . . , vi,li ).

namely, glue Ck into Ci . Define

g(C) := C1 , C2 , . . . , Ci0 , Ci+1 . . . , Ck−1 , Ck+1 , . . . , Cm

Observe that the smallest index i0 such that Ci0 +1 , Ci0 +2 , . . . are dis-
joint cycles in g(C) is still i, and the smallest index witness in Ci is
still vertex vi,j (just now it has a larger index, j + l(Ck )). Indeed, the
cylces Ci+1 . . . , Ck−1 , Ck+1 , . . . , Cm are still disjoint. The set of vertices
vi,1 , . . . , vi,j−1 were not witnesses in C, so they cannot be witnesses in g(C).
Furthermore, the new vertices in Ci0 coming from Ck cannot be witnesses since
Ck was a cycle disjoint from cycles Ci+1 . . . , Ck−1 , Ck+1 , . . . , Cm , and there
were no witnesses in Ci with a smaller index than j, so Ck does not contain
any vertex from vi,1 , . . . , vi,j−1 .
Now consider a clow sequence C 0 such that its smallest index witness vi,j
is a witness since a vi,j 0 exists such that j 0 < j and vi,j = vi,j 0 . Then define

Ci0 := (vi,1 , vi,2 , . . . , vi,j 0 , vi,j+1 , . . . , vi,li ),

and define a new cycle

C 0 := (vi,j 0 , vi,j 0 +1 , . . . , vi,j−1 ).

C 0 might have to be rotated to start with the smallest vertex, and thus to be
a clow sequence. Define

g(C 0 ) := C1 , . . . , Ci0 , Ci+1 , . . . , C 0 , . . . , Cm .

Observe that the head of C 0 is larger than vi,1 , so it will appear after Ci in the
clow sequence. Furthermore, C 0 is vertex disjoint from cycles Ci+1 , . . . , Cm ,
since its vertices were not witnesses in Ci , except vi,j 0 = vi,j . Then the smallest
index i0 such that Ci0 +1 , Ci0 +2 , . . . are disjoint cycles is still i, and the smallest
index witness in Ci is still vertex vi,j , since it is in C 0 .
128 Computational complexity of counting and sampling

Based on the above, observe that whatever is the reason that vi,j is the
smallest index witness, the g function is defined such that
g(g(C)) = C,
therefore g is indeed an involution. Furthermore the lengths of C and g(C)
are the same, they contain exactly the same edges, however, their number of
cycles differs by 1. Thus
W (C) = −W (g(C)),
and thus X
W (C) = 0.
C∈C\P

But in that case, X X


W (C) = W (C) = det(M ).
C∈C C∈P

It is possible to build a yield algebra for the clow sequences. More precisely,
the base set A contains partial clow sequences on a given directed graph,
~ = (V, E). A partial clow sequence contains some clows and an ordered walk
G
that might not be closed yet. The parameters are (vh , v, k), where vh is the
head of the current clow, v is the actual vertex, and k is the length of the
partial clow sequence, that is, the number of edges in the clow sequence. In
the partial ordering of the parameters, (vh , v, k) ≤ (vh0 , v 0 , k 0 ) if vh ≤ vh0 and
k ≤ k 0 . Obviously, only those parameters are valid, in which vh < v. The
unary operator ◦(u,w) adds the new edge (u, w) to the partial clow sequence.
If (vh , v) is an edge in G,~ then u might be vh and w might be v, and the
edge (u, w) starts a new partial clow. Otherwise u must be v, and the current
(partial) clow is extended with an edge. Therefore, the recursions are
S((vh , v, k)) = tu>vh ∧(u,v)∈E ◦(u,v) (S((vh , u, k − 1)))
if (vh , v) ∈
/E (3.7)

S((vh , v, k)) = tu>vh ∧(u,v)∈E ◦(u,v) (S((vh , u, k − 1))) t
 
tvh0 <vh ◦(vh ,v) (S((vh0 , vh0 , k − 1)))
if (vh , v) ∈ E. (3.8)
The initial conditions are the following. S((vh , v, 1)) is the empty set if
(vh , v) ∈
/ E and contains the partial clow sequence having only the edge (vh , v)
in its first clow.
In the evaluation algebra, the algebraic structure R is the real number
field. The function f is defined as
Y
f (C) := (−1)l(C)+m w(e) (3.9)
e∈C
Linear algebraic algorithms. The power of subtracting 129

where l(C) is the number of edges in the partial clow sequence C with multi-
plicity, m is the number of ordered walks in C including the last, possibly yet
not closed ordered walk, and the product also considers the multiplicity of the
edges in C. It is easy to see that

f (C) = W (C)

for any clow sequence C. The T(u,v) function for the operator ◦(u,v) depends
on the parameter. If in the parameter (vh , v, k), vh = v, then T(u,v) is the mul-
tiplication with w((u, v)), otherwise it is the multiplication with −w((u, v)).
The determinant of M can be calculated as
X
det(M ) = F (S((vh , vh , n))).
vh

Observe that the real numbers can be replaced with any commutative ring in
the evaluation algebra. Therefore the following theorem holds.
Theorem 30. Let M be an n × n matrix over an arbitrary commutative ring
R. Then the determinant of M can be calculated in O(n4 ) time.

Proof. The algebraic dynamic programming approach provides this solution.


With the classical notations, the dynamic programming algorithm fills in a
dynamic programming table w(i, j, k) for each i ≤ j, i, j, k = 1, . . . , n. The
initial conditions are
w(i, j, 1) = mi,j ,
and the recursions are
X X
w(i, j, k) = −mj 0 ,j w(i, j 0 , k − 1) + mi,j w(i0 , i0 , k − 1). (3.10)
j 0 >i i0 <i

Finally,
n
X
det(M ) = w(i, i, n).
i=1

Since the number of entries is O(n3 ) and each entry can be computed in O(n)
time, the overall running time is O(n4 ).
Observe that this running time is an order larger than the standard Gaus-
sian elimination, which runs in O(n3 ) running time. On the other hand, this
algorithm does not need any division, thus it can be used for matrices over
arbitrary commutative rings. Specially, if the matrix is over the integer ring,
then the calculations in the recursion remain in the integer ring.
Also observe that the yield algebra builds the clow sequences and not
the permutations. Therefore, the evaluation algebra will calculate the sum of
scores of clow sequences with some given parameters. For some special scores,
this might coincide with the determinant of a matrix. However, if P 6= N P ,
130 Computational complexity of counting and sampling

no similar algorithm exists for calculating the permanent of a matrix. This is


explained in detail in Chapter 4.
Another important value assigned to skew-symmetric, even-dimensional
matrices is the Pfaffian, which appears in counting perfect matchings of planar
graphs.
Definition 41. A matrix A is skew-symmetric if for all i, j, ai,j = −ai,j .
The Pfaffian of a 2n × 2n matrix A is defined as
n
1 X Y
pf (A) := sign(σ) aσ(2i−1),σ(2i) . (3.11)
2n n! i=1
σ∈S2n

It is easy to see that the normalizing constant 2n1n! in the definition of the
Pfaffian is for cancelling the 2n n! cases of σ ∈ S2n providing the same score,
n
Y
sign(σ) aσ(2i−1),σ(2i) . (3.12)
i=1

Indeed, there are 2n n! ways to form n pairs on 2n indexes, corresponding


to 2n n! permutations. The permutations have different signs, however, the
signs of the permutations and the skew-symmetry of the matrix cancel each
other. Swapping two consecutive elements in a permutation changes its sign,
however, it is also true that aσ(2i−1),σ(2i) = −aσ(2i),σ(2i−1) . Swapping two con-
secutive pairs does not change the sign of the permutation, and it also does
not change the product in Equation (3.12). Any two permutations represent-
ing the same n pairs of the 2n indexes can be transformed into each other
with these elementary transformations (swapping the order of two indexes in
the same pair and swapping two consecutive pairs). Since these elementary
transformations do not change the score of the permutation, we get that for
all the 2n n! permutations, the score of the permutation in Equation (3.12) is
the same.
If we want to extend the definition of the Pfaffian for matrices over an
arbitrary commutative ring, we have to modify the definition since dividing
by 2n n! might not be available. Define the equivalence relation ∼ such that
σ ∼ σ 0 if σ and σ 0 represent the same n pairs on 2n indexes, and let S2n /∼
denote the set containing one permutation for each equivalence class. Then
the Pfaffian of the matrix is
X n
Y
pf (A) := sign(σ) aσ(2i−1),σ(2i) . (3.13)
σ∈S2n / ∼ i=1

Since the signed product is the same for any σ coming from the same equiva-
lence class, this definition is well defined.
Similar to the determinant, the Pfaffian can be calculated in polynomial
time in an arbitrary commutative ring using the so-called alternating clow
sequences. First, we define them.
Linear algebraic algorithms. The power of subtracting 131

Definition 42. A clow sequence is an alternating clow sequence if each even


indexed edge in each clow is either (v2i−1 , v2i ) or (v2i , v2i−1 ) for some i.
We are ready to state and prove the following theorem.
Theorem 31. The Pfaffian of a 2n × 2n skew-symmetric matrix A over any
commutative ring R can be calculated using only O(n4 ) operations (additions,
subtractions and multiplications) in the ring R.
Proof. We define an injective mapping from S2n /∼ to alternating clow
sequences of the complete directed graph K ~ 2n . For a permutation
σ(1), σ(2), . . . , σ(2n), first consider the set Mσ containing the unoriented edges

(vσ(1) , vσ(2) ), (vσ(3) , vσ(4) ), . . . , (vσ(2n−1) , vσ(2n) ),

and the set M containing the unoriented edges

(v1 , v2 ), (v3 , v4 ), . . . , (v2n−1 , v2n ).

Note that both Mσ and M are a perfect matching of the complete graph
on 2n vertices, therefore the multiset union of Mσ ] M contains even long
alternating cycles, where along a walk on each cycle, one edge comes from Mσ
and the other from M . Edges in Mσ ∩M are represented twice in Mσ ]M , and
they are considered as cycles with two edges. The clow sequence assigned to
σ consists of the oriented versions of these cycles, such that each clow starts
with the smallest index vertex, and the first edge in each clow comes from
Mσ . For example, if σ is

1, 5, 2, 12, 3, 4, 6, 11, 7, 9, 8, 10,

then the clow sequence is

(v1 , v5 , v6 , v11 , v12 , v2 ), (v3 , v4 ), (v7 , v9 , v10 , v8 ).

As we can see, each clow has even length, and every second edge is either
(v2i−1 , v2i ) or (v2i , v2i−1 ) for some i, namely, the images are alternating clow
sequences.
We have to define the weights of the edges in the clow sequence to define
the score of the clow sequence. If an edge (vi , vj ) comes from Mσ , then its
weight is ai,j , and if an edge (vi , vj ) comes from M , its weight is j − i (recall
that it is either 1 or −1).
We claim if a clow sequence C is the image of σ, then the score of C is
the score of σ as defined in Equation (3.12) if n is even, and the score of C is
the additive inverse of the score of σ if n is odd. It is clear that the score of
the clow sequence contains the product of the appropriate ai,j terms; we only
have to check the signs. Since the score of σ is well defined on S2n /∼ , without
loss of generality, we might assume that σ is

i1,1 , i1,2 , . . . , i1,k1 , i2,1 , . . . , i2,k2 , . . . , im,1 , . . . , im,km


132 Computational complexity of counting and sampling

when the clow sequence is

(vi1,1 , vi1,2 , . . . , vi1,k1 ), (vi2,1 , . . . , vi2,k2 ), . . . , (vim,1 , . . . , vim,km ).

Observe that the obtained clow sequence corresponds to the permutation

(i1,1 , i1,2 , . . . , i1,k1 ), (i2,1 , . . . , i2,k2 ), . . . , (im,1 , . . . , im,km ),

which is the product of permutation σ and the permutation

2, 1, 4, 3, . . . , 2n − 1, 2n.

Since this latter permutation contains n cycles and its length is 2n, its sign is
(−1)n . Therefore we get that
n
Y
sign(σ) aσ(2i−1),σ(2i) = (−1)n W (C), (3.14)
i=1

since the sign of a clow sequence is the same as the sign of its corresponding
permutation.
The mapping is clearly an injection, since two permutations σ and σ 0 from
different equivalence classes contain different edges in Mσ and Mσ0 , and there-
fore their images will be different, too. We claim that the mapping is actually
a bijection between S2n /∼ and those alternating clow sequences that are per-
mutations and contain only even long cycles. Indeed, if there is a permutation
π corresponding to an alternating clow sequence C, then the permutation

σ = π ∗ (1, 2)(3, 4) . . . (2n − 1, 2n)

is the permutation whose image is C, where ∗ denotes the multiplication in


the group S2n .
Next, we are going to show that there is an involution g on those alternating
clow sequences which are not permutations and which are permutations but
contain an odd cycle, and if the alternating clow sequence C is any of these,
then
W (g(C)) = −W (C). (3.15)
The involution can be decomposed into two involutions on two disjoint sets.
The involution g1 is on alternating clow sequences that contain a clow in which
a vertex is revisited after an odd number of steps. Let C be such an alternating
clow sequence, and let C be the clow in it which contains the smallest revisited
vertex v (if there are more than one such clows or revisited vertices). Then
g1 (C) contains the same clows as C, except in C, the odd length closed path
from v to its next visit is inverted. Due to the skew-symmetry of A and the
definition of edge weights of edges (v2i−1 , v2i ) and (v2i , v2i−1 ),

W (g1 (C)) = −W (C). (3.16)


Linear algebraic algorithms. The power of subtracting 133

The involution g2 is on clow sequences that are not permutations and each
revisited vertex is revisited after an even number of steps. The involution given
in the proof of Theorem 29 suffices for g2 . We already showed that it is an
involution satisfying Equation (3.15); we only have to show that the image of
an alternating clow sequence with the given property also has this property.
The involution operates with cutting and merging cycles. These cycles must
have even length, since each vertex is revisited after an even number of steps,
therefore their images also satisfy the prescribed properties.
We get that X
pf (A) = (−1)n W (C) (3.17)
C∈AC

where AC is the set of alternating clow sequences. It is easy to see that the
dynamic programming algorithm given in the proof of Theorem 30 can calcu-
late this sum if a restriction is added that in each clow, in each even step, the
edge must be (v2i−1 , v2i ) or (v2i , v2i−1 ).

In the following sections, such counting problems are introduced that can
be solved via calculating the determinant or the Pfaffian of a matrix.

3.2 Kirchhoff ’s matrix-tree theorem


Kirchoff’s matrix-tree theorem is about the number of spanning trees in
a graph [113]. It shows that this number can be calculated in polynomial
time via computing the determinant of a matrix derived from the graph. The
theorem uses several lemmas. First, we are going to introduce them below.

Lemma 4. Let G = (V, E) be an arbitrary graph. Orient its edges in an arbi-


trary way, and consider the vertex-edge adjacency matrix of this orientation,
that is, if the edge e is oriented from vertex u to vertex v, then the entry in
the matrix for the {u, e} pair is −1, and the entry for the {v, e} entry is 1.
All other entries in the column of e are 0. Let C denote this incidence matrix,
and let C −v denote the submatrix obtained by deleting the row corresponding
to vertex v. Furthermore for any F ⊆ E, let C −r [F ] denote the submatrix
which contains the columns corresponding to the edges in F in matrix C −r .
Let F be such that |F | = |V | − 1. Then
1. |det(C −v [F ])| = 1 if and only if F are edges of a spanning tree of G,
and
2. det(C −v [F ]) = 0 if and only if F are not the edges of a spanning tree of
G.
Proof. Observe that the two options for F (it is a set of edges of a spanning
134 Computational complexity of counting and sampling

tree or it is not) are complements, so the “only if” parts of both statements
follow from the “if” part of the other statement, and thus it is sufficient to
prove the “if” parts.
If the edges in F form the edges of a spanning tree T , we first set up a
partial ordering of the vertices such that u ≤ w if w is on the way from u to v.
Extend this partial ordering to an arbitrary total ordering and rearrange the
rows of the matrix based on this total ordering. There is a bijection ϕ between
the vertices in V \ {v} and the edges in F : u is mapped to e if e connects u
to its parent in the spanning tree rooted in v. The bijection indicates a total
ordering of the edges in F , e1 < e2 if ϕ−1 (e1 ) < ϕ−1 (e2 ). Let the columns of
C −v [F ] be ordered according to this total ordering.
Observe that the rearranged matrix is a lower diagonal matrix with all 1
and −1 in the diagonals. Therefore its determinant is either 1 or −1. How-
ever, the determinant of this matrix and the original C −v [F ] are the same in
absolute value.
If the edges in F do not form a spanning tree, then it contains a cy-
cle. Indeed, |F | = |V | − 1, and any cycle free graph with |V | − 1 edges is
a tree. Since F is not the set of edges of a tree, it must contain a cycle.
Let F 0 denote the edges in this cycle. The corresponding columns in C −v [F ]
are linearly dependent since an appropriate linear combinations of them is
the 0 vector. Indeed, fix an arbitrary walk on the edges around the cycle,
v0 , e1 , v1 , e2 , . . . , v|F 0 |−1 , e|F 0 | , v0 . Let the linear coefficient of the column rep-
resenting ei be 1 if the matrix entry for the pair vi−1 , ei in C is 1, otherwise let
the coefficient be −1. Then this linear combination of column vectors indeed
the 0 vector, since for each row representing vi , there is a 1 and a −1 in the
appropriately weighted column vectors. Since the columns of C −v [F ] are not
linearly independent, det(C −v [F ]) = 0.

The product of C −v and its transpose is a (|V | − 1) × (|V | − 1) matrix.


The Cauchy-Binet theorem [20, 39] is about the determinant of the product
T
of two matrices, and thus, can tell det(C −v C −v ).
Theorem 32. Let A, B ∈ Rn×m be matrices. Then
X
det(AB T ) = det(A[F ])det(B[F ]) (3.18)
F ⊂{1,2,...,m}∧|F |=n

where A[F ] denotes the n × n submatrix of A whose column’s indexes are the
indexes in F .
Proof. Consider the determinant of the following matrix
 
0 A
D=
BT I
where 0 denotes the all-zero matrix and I denotes the identity matrix. We are
going to calculate the determinant in two different ways.
Linear algebraic algorithms. The power of subtracting 135

The first way is based on the Laplace expansion on the first n rows of D.
Define the set of indexes C = {n + 1, . . . , n + m}, and index the column of A
by these indexes. It is sufficient to consider subsets of these columns since in
other subsets of indexes F , the matrix (0|A)[F ] contains an all-zero column,
and thus, its determinant is 0. Therefore
X Pn P
det(D) = (−1) i=1 i+ f ∈F f det(A[F ])det((B T |I)[F̄ ]). (3.19)
F ∈C∧|F |=n

We also calculate det((B|I)[F ]) by Laplace expansion, but now based on the


first n columns. Then a submatrix containing some rows should be defined.
If F 0 is a subset of indexes, let [F 0 ]B denote the submatrix of B whose rows
indexes are in F 0 . Then

det((B T |I)[F ]) =
X Pn P 0
(−1) i=1 i+ f 0 ∈F 0 (f −n) det([F 0 ]B T )det([F¯0 ]I[F̄ ]).(3.20)
F 0 ∈C∧|F 0 |=n

It is easy to see that all terms in this sum are 0 except the one in which
F 0 = F . Indeed, if F 0 6= F , then F¯0 6= F̄ . In that case [F¯0 ]I[F̄ ] contains an
all-0 row (and also, an all-0 column), and thus, det([F¯0 ]I[F̄ ]) = 0. If F 0 = F ,
then det([F¯0 ]I[F̄ ]) = 1. Therefore,
Pn P
i+ f ∈F (f −n)
det((B T |I)[F ]) = (−1) i=1 det([F ]B T ) =
Pn P
(−1) i=1 i+ f ∈F (f −n) det(B[F ]). (3.21)

We get that
Pn
f −n2
X P
det(D) = (−1)2 i=1 i+2 f ∈F det(A[F ])det(B[F ])
F ⊂{1,2,...,m}∧|F |=n
X
= (−1)n det(A[F ])det(B[F ]), (3.22)
F ⊂{1,2,...,m}∧|F |=n

since
−n2 ≡ n (mod 2). (3.23)
The second way to calculate det(D) is based on Gaussian elimination. For
each i = 1, . . . , n, we add to row i the following linear combination: −ai,1
times the n + 1st row plus −ai,2 times the n + 2nd row, etc., plus −ai,j times
the n + j th row. We get a matrix
 
C 0
D0 =
BT I

where
m
X
ci,j = −ai,k bk,j (3.24)
k=1
136 Computational complexity of counting and sampling

namely, C = −AB T . Note that

det(D) = det(D0 ) = det(C) = (−1)n det(AB T ), (3.25)

therefore
X
det(AB T ) = det(A[F ])det(B[F ]). (3.26)
F ⊂{1,2,...,m}∧|F |=n

Now we are ready to prove Kirchhoff’s theorem on the number of spanning


trees.
Theorem 33. Let G = (V, E) be an arbitrary graph, and let C −v be the
matrix constructed from G in Lemma 4. Then the number of spanning trees
of G is
T
det(C −v C −v ). (3.27)
Proof. From the Cauchy-Binet theorem we get that
T
X
det(C −v C −v ) = det(C[F ])2 (3.28)
F ⊂{1,2,...,m}∧|F |=n−1

where m is the number of edges, and n is the number of vertices in G. However,


from Lemma 4, we know that det(C[F ])2 is 1 if and only if F contains the
edge indexes of a spanning tree.
T
By calculating the matrix C −v C −v , we can give another form of Kirch-
hoff’s matrix-tree theorem.
Theorem 34. Let G = (V, E) be an arbitrary graph. Let D denote the diago-
nal matrix containing the degrees of the graph, and let A denote the adjacency
matrix of G. Let D−v and A−v denote the matrices obtained from D and A
by deleting the row and column for vertex v. The number of spanning trees of
G is
det(D−v − A−v ). (3.29)
Proof. It is sufficient to show that
T
D−v − A−v = C −v C −v . (3.30)
T
The element in position (i, j) in the matrix C −v C −v is the scalar product of
the ith and j th row of matrix C −v . If i 6= j, then it is −1 if there is an edge
between vertices vi and vj , and 0 otherwise. Indeed, ci,k cj,k = −1 if ek is the
edge between vi and vj and 0 otherwise. If i = j, then the scalar product is
the degree of vertex vi (= vj ), since for each ek incident to vi , c2i,k = 1 and
for all k 0 such that ek0 is not incident to vi , c2i,k0 = 0. Therefore the equality
in Equation (3.30), and thus the theorem, hold.
Linear algebraic algorithms. The power of subtracting 137

The Kirchhoff theorem says that the number of spanning trees can be
calculated in polynomial time for any graph. We can use Theorem 34 to count
the leaf-labeled trees.
Example 7. Count the leaf-labeled trees on n vertices.
Solution. The number of leaf-labeled trees on n vertices are the spanning trees
of the complete graph Kn . By Theorem 34, it is the determinant of the matrix
 
n−1 −1 ··· −1 −1
 −1 n − 1 ··· −1 −1 
 
 .. .. .. .. .. 

 . . . . . 

 −1 −1 ··· n − 1 −1 
−1 −1 ··· −1 n−1

where the matrix has n − 1 rows and n − 1 columns. We can add linear combi-
nations of lines to some lines of this matrix without changing the determinant.
First, add all lines starting from the second line to the first line. We get the
following matrix:
 
+1 +1 ··· +1 +1
 −1 n − 1 · · · −1 −1 
 
 .. .. . . .
. ..
.

 .
 . . . . 
 −1 −1 ··· n − 1 −1 
−1 −1 ··· −1 n−1

Adding the first line to all other lines, we get


 
+1 +1 · · · +1 +1
 0
 n ··· 0 0 
 .. .. . . .
. ..  .
 .
 . . . . 

 0 0 ··· n 0 
0 0 ··· 0 n

Since it is an upper triangle matrix, its determinant is the product of the


elements in the diagonal. Therefore, the number of leaf-labeled trees on n
vertices is nn−2 . 

Kirchhoff’s theorem can be extended to weighted spanning trees.


Theorem 35. Let G = (V, E) be a graph, and let w : E → R be a weight
function, where R is an arbitrary commutative ring. Let T be a spanning tree
of G. Define the weight of T as
Y
W (T ) := w(e). (3.31)
e∈T
138 Computational complexity of counting and sampling

Define the diagonal matrix D such that its diagonal entry


X
di,i := w(e), (3.32)
e∈I(vi )

where I(vi ) is the set of edges incident to vi . Define the weighted adjacency
matrix A such that ai,j = w(e) if e is the edge (vi , vj ). Let v be an arbitrary
vertex, and define A−v and D−v by deleting the row and column corresponding
to vertex v from A and D. Then
X
W (T ) = det(D−v − A−v ) (3.33)
T ∈T

where T is the set of spanning trees of G.


Proof. Give an arbitrary orientation of the edges, and let C denote the usual
oriented incidence matrix of G. Construct a weighted version of the oriented
incidence matrix Cw whose rows are the vertices of G, columns are the edges
of G, and if an edge e is oriented from vertex u to vertex v, then the entry for
the pair u, e is −w(e), and the entry for the pair v, e is w(e). All other entries
are 0.
First, we are going to show that
T
C −v Cw
−v
= D−v − A−v . (3.34)
−v T
Indeed, the element in position (i, j) in the matrix C −v Cw is the scalar
product of the ith row in C −v and the j th row in Cw . If i 6= j, then it is
−v

−w(e) if e is an edge
P between vi and vj and 0 otherwise. If i = j, then the
scalar product is e∈I(vi ) w(e). Therefore Equation (3.34) indeed holds. We
have to use the Cauchy-Binet theorem:
−v T
X
det(C −v Cw )= det(C[F ])det(Cw [F ]). (3.35)
F ⊂{1,2,...,m}∧|F |=n−1

Observe that |det(C[F ])| = 1 and det(Cw [F ]) is either W (T ) or −W (T ) if


and only if F is the edge set of spanning tree T . Furthermore det(C[F ]) = 1
exactly when det(Cw [F ]) = W (T ). Therefore, their product is always W (T )
when F is the edge set of spanning tree T .
We can use Theorem 35, for example, to count spanning trees with given
(small integer) weights.
Example 8. Let G = (V, E) be a graph, and let w : E → Z be a weight
function. Let m be the maximum of the absolute values of the edge weights,
and let n = |V |. Define the weight of a spanning tree T as
X
W (T ) := w(e). (3.36)
e∈T

Count the spanning trees with a given weight k. The running time must be a
polynomial function of both n and m.
Linear algebraic algorithms. The power of subtracting 139

Solution. Define a new weight function, w0 (e) := xw(e) . This new weight func-
tion contains monomials from the Laurent polynomial ring, Z[x, −x]. Further-
more, for any spanning tree T it holds that
W 0 (T ) = xW (T ) . (3.37)
Since Z[x, −x] is a commutative ring, the summation over the spanning trees T
of the weights W 0 (T ) can be calculated using only a polynomial number of ring
operations. Furthermore, the degrees of the monomials can vary between −mn
and mn, where n is the number of vertices in G, therefore any ring operation
can be done in polynomial time in mn. The coefficient of the monomial xk in
det(D−v − A−v ) is the number of spanning trees of weight k. 
The number of minimum (or dually, the maximum) spanning trees can be
counted in polynomial time with both the size of the graph and the logarithm
of the weights, see Exercise 13. On the other hand, it is NP-complete to decide
if there is a spanning tree with a given sum of weights, see Exercise 14.

3.3 The BEST (de Bruijn-Ehrenfest-Smith-Tutte) algo-


rithm
Kirchoff’s matrix-tree theorem can be extended to directed graphs, and
the directed trees (also known as arborescences) in a directed graph can be
counted. Such directed trees are related to Eulerian circuits summarized in
the BEST theorem. BEST is the acronym of four mathematicians, de Bru-
jin, Ehrenfest, Smith and Tutte, who developed the theory we are going to
introduce in this section [2, 170]. We first define the directed trees.
Definition 43. A directed tree or out-tree or arborescence is a directed graph
such that wiping out the direction of the edges makes it a tree. Furthermore,
there is a vertex v called the root of the tree such that from any vertex u, there
is a directed path from v to u.
Flipping the direction of all the edges yields an in-tree or anti-
arborescence, so in an in-tree, there is a directed path to the root v from
any other vertex u of the in-tree. Vertex v is still called the root in in-trees.
We are going to count the in-trees having a fixed root. There is a theorem
for counting in-trees similar to Kirchhoff’s matrix-tree theorem. In fact, the
following theorem is a generalization of Kirchhoff’s matrix-tree theorem.
Theorem 36. Let G ~ = (V, E) be a directed graph. Let A denote its adjacency
matrix, and let Dout denote the diagonal matrix containing the out-degrees for
−v
each vertex. Let A−v and Dout denote the matrices we get by deleting the row
and column for vertex v. Then the number of in-trees rooted into v is
−v
det(Dout − A−v ). (3.38)
140 Computational complexity of counting and sampling
−v
Proof. Let M −v denote Dout − A−v . The proof is based on an induction on
the number of edges in G. The base cases are the graphs having at most n − 1
edges, where n is the number of vertices. If the number of edges is less than
n − 1, there are at least two components in the graph, and at least one of
them does not contain v. The sum of the columns in M −v corresponding to
the vertices in such a component is the zero vector. Therefore the determinant
is 0, which is indeed the number of trees in the graph.
If the number of edges is n − 1, then there are three cases.
1. G is not connected in the weak sense, namely, its undirected version is
not connected. Then still the columns of M −v are linearly dependent,
thus, the determinant is 0, which is indeed the number of trees in G.
2. The undirected version of G is a tree, however, it is not an in-tree due
to the wrong direction of some of the edges. If the direction of the edges
were correct, then the out-degree of v would be 0 and the out-degree of
all other edges is 1. Since the sum of the out-degrees is n − 1, it follows
that there is a vertex v 0 6= v whose out-degree is 0. However, in that case
the row of vertex v 0 contains all 0, therefore the determinant of M −v is
0, which is the correct number of in-trees in G.
3. G is an in-tree. Then we first set up a partial ordering of the vertices such
that u ≤ w if w is on the way from u to v. We extend this partial ordering
to an arbitrary total ordering and apply a similarity transformation on
M −v such that the rows and columns follow the order of the vertices in
the total ordering. Such similarity transformation does not change the
value of the determinant. After the transformation, the matrix becomes
an upper triangular matrix with all 1s in its diagonal. Therefore, the
determinant is 1, which is the number of in-trees in G.
We proved the correctness of the base cases, now we do the induction.
Assume that the number of edges in G is m > n − 1. We can delete any out-
going edge from v without changing the number of in-trees in G and without
changing M −v . Therefore, we can assume that the number of outgoing edges
from v is 0, and there are still m > n − 1 edges. Therefore, there must be a
vertex v 0 6= v such that the out-degree of v 0 is greater than 1. Consider any
edge e going out from v 0 . Generate two graphs. G1 contains all the edges of G
except e, G2 contains all the edges of G except it does not contain any edge
going out from v 0 but e. Since in any in-tree, there is exactly one edge going
out from v 0 , any in-tree of G is either an in-tree of G1 or an in-tree of G2 .
Since both G1 and G2 have less edges than G, the determinants of the corre-
sponding matrices M1−v and M2−v calculates correctly the number of in-trees
in G1 and G2 . The sum of these two is the number of in-trees in G. However,
due to the linearity of the determinant,
det(M −v ) = det(M1−v ) + det(M2−v ), (3.39)
therefore, det(M −v ) is indeed the number of in-trees in G.
Linear algebraic algorithms. The power of subtracting 141

It is easy to see that Theorem 34 is a special case of Theorem 36. Indeed, let
G be an arbitrary undirected graph. Create a directed graph G, ~ such that each
edge e in G is replaced with a pair of directed edges going to both directions
between the vertices incident to e. Then, on one hand, the matrix D−v − A−v
−v
constructed from G is the same as the matrix Dout − A−v constructed from
0
G . On the other hand, there is a bijection between the spanning trees in G
and the in-trees rooted into v in G0 . Indeed, take any spanning tree T in G,
and for any edge e ∈ T , select the edge in G0 corresponding to e with the
appropriate direction.
Theorem 36 also holds for directed multigraphs in which parallel edges
are allowed but loops are not allowed. Two in-trees are considered different if
the same pair of vertices are connected in the same direction, however, for at
least for one pair of vertices, the directed edges connecting them are different
parallel edges in the multigraph. For such graphs, the out-degree is defined
as the number of outgoing edges, and there is k in the adjacency matrix in
position (i, j) if there are k parallel edges going from vertex vi to vertex vj .
Indeed, it is easy to see that the base cases hold, and the induction also holds
since the linearity of the determinant holds for any matrices.
The number of in-trees appears in the formula counting the directed Eu-
lerian circuits in directed Eulerian graphs. First, we define them.
Definition 44. A directed Eulerian graph is a directed, connected graph in
which for each vertex v, its in-degree equals to its out-degree. A directed Eule-
rian circuit (or short, a Eulerian circuit) is a directed circuit that travels each
edge exactly once.

It is easy to see that each directed Eulerian graph has at least one directed
Eulerian circuit. Their number can be calculated in polynomial time, stated
by the following theorem.
Theorem 37. Let G ~ = (V, E) be a directed Eulerian graph, and let v ∗ be an
arbitrary vertex in it. Then its number of directed Eulerian circuits is
Y
|Tvin
∗ | (dout (v) − 1)! (3.40)
v∈V


where Tvin
∗ is the set of intrees rooted into v .

Proof. The Eulerian circuit might start with an arbitrary edge, therefore, fix
an outgoing edge e of v ∗ . Start a walk on it, and due to the pigeonhole rule,
this walk can be continued until v ∗ is hit dout (v ∗ ) times. The last hit closes
the walk, thus obtaining a circuit. This circuit might not be a Eulerian circuit,
since there is no guarantee that it used all the edges in G. ~ However, assume
that the circuit is a Eulerian circuit. Let E 0 be the set of last outgoing edges
along this circuit for each vertex in V \ {v ∗ }. We claim that these edges are
the edges of an in-tree rooted into v ∗ .
Indeed, there are n − 1 edges. We prove that from each particular vertex
142 Computational complexity of counting and sampling

u0 , there is a directed path to v ∗ . Let e1 be the last edge going out from u0 in
the Eulerian circuit. It goes to some u1 , and thus from u1 , the last outgoing
edge is later in the Eulerian circuit than e1 . Let it be denoted by e2 . This edge
goes into u2 , etc. We claim that the edges e1 , e2 , . . . eventually go to v ∗ due to
the pigeonhole rule. Indeed, it is impossible that some ei goes to uj for some
j < i since ei is later in the Eulerian circuit than ej . However, if ei goes to uj
then the Eulerian path is continued by going out from uj , and thus, the last
outgoing edge of uj is after ei , a contradiction.
Therefore the n − 1 edges form a connected graph, in the weak sense, and
thus, the undirected version of a graph is a tree. Furthermore, all edges are
directed towards v ∗ , therefore the directed version of the graph is an in-tree
rooted into v ∗ .
Furthermore, for each vertex v, give an arbitrary but fixed ordering of the
outgoing edges. If v 6= v ∗ and the last outgoing edge is the k th in this list, then
decrease the indexes of each larger indexed edge by 1. Thus, the indexes of the
outgoing edges except the last one form a permutation of length dout (v) − 1.
For v ∗ , decrease the indexes of each edge which have a larger index than
the index of e. Therefore, the indexes of the last dout (v ∗ ) − 1 edges form a
permutation of length dout (v ∗ ) − 1.
In this way, we can define a mapping of Eulerian circuits onto

Tvin
∗ × Sdout (v)−1 (3.41)
v∈V

where Sn denotes the set of permutations of length n in the following way. The
direct product of the in-tree of the last outgoing edges and the aforementioned
permutations for each vertex is the image of a Eulerian circuit.
It is clear that this mapping is an injection. Indeed, consider the first
step where two Eulerian circuits, C1 and C2 deviate. If these edges go out
from v ∗ , then their image will be different on the permutation in Sdout (v∗ )−1 .
Otherwise, they have different edges going out from some v 6= v ∗ . If the last
outgoing edges from v are different in the two circuits, then the images of the
two circuits have different in-trees. If the last outgoing edges are the same,
then the permutations in Sdout (v)−1 are different in the images.
in
This mapping is also a surjection. Indeed, take any in-tree T ∈ Tv∗ and for
each v 6= v∗, take a permutation πv ∈ Sdout (v)−1 . We are going to construct a
Eulerian circuit whose image is exactly

T× πv . (3.42)
v∈V \{v ∗ }

For each v, if the index of the edge which is in T and going out from v is k,
then increase by 1 all the indexes in permutation πv which are greater than or
equal to k. This is now a list of indexes. Extend this list by k; this extended
list Lv will be the order of the outgoing edges from v. That is, start a circuit
with the edge e going out from v ∗ , and whenever the circuit arrives at v, go
Linear algebraic algorithms. The power of subtracting 143

out on the next edge in the list Lv . Continue till all the outgoing edges from
v ∗ are used. We claim that the so obtained circuit is a Eulerian circuit whose
image is exactly the one in Equation (3.42).
Indeed, assume that some edges going out from v0 is not used. Then par-
ticularly, the last edge e1 going out from v0 is not used. It goes out to some
v1 , and since it is not used, then there are also outgoing edges in v1 which
is not used. Particularly, its last edge e2 is not used, which goes to some v2 .
However, these last edges e1 , e2 , . . . goes to v ∗ , namely, there are ingoing edges
of v ∗ which are not used in the circuit. However, in that case not all outgoing
edges from v ∗ is used, contradicting that the walk generating the circuit is
finished.
Therefore, the circuit is a Eulerian circuit. Its image is indeed the one in
Equation (3.42), due to construction.
Since the mapping is injective and surjective, it is a bijection. Thus, the
number of Eulerian circuits is

 Y
Tvin
∗ × Sdout (v)−1 = |Tvin
∗ | (dout (v) − 1)! (3.43)
v∈V \{v ∗ } v∈V \{v ∗ }

There is an interesting corollary of this theorem.


~ = (V, E) be a directed Eulerian graph, and v1 , v2 ∈ V .
Corollary 5. Let G
Then
|Tvin
1
| = |Tvin
2
|. (3.44)
It is easy to see that the BEST theorem also holds for directed Eulerian
multigraphs, even with loops. Indeed, if G ~ is a multigraph, then add a vertex
to the middle of each edge. Namely, each directed edge going from u to v
is replaced by a directed edge going from u to w and an edge going from
w to v, where w is a new vertex. This new graph G ~ 0 is a directed Eulerian
graph, and it is not a multigraph. There is a natural bijection between the
Eulerian circuits in G ~ and G
~ 0 , therefore they have the same number of Eulerian
circuits. Furthermore, it is clear that any subpaths of length 2 (u, w1 ), (w1 , v)
and (u, w2 ), (w2 , v) in a Eulerian circuit in G ~ 0 are interchangeable if w1 and
~ 0
w2 are inserted vertices in G , therefore each Eulerian circuit differing only in
the permutation of parallel edges appears
Y
m(u, v)! (3.45)
u,v∈V

times, where m(u, v) is the number of parallel edges going from u to v. (Here
m(v, v) is the number of loops on v.) Having said this, the following theorem
holds.
144 Computational complexity of counting and sampling

Theorem 38. Let G ~ = (V, E) be a directed multigraph, so loops (and even


multiple loops) are possible. Let m(u, v) denote the number of parallel edges
from u to v. Let v∗ be an arbitrary vertex in G. ~ The number of Eulerian
circuits where two circuits are not distinguished if they differ only in the per-
mutations of parallel edges is

|Tvin
Q
∗ | (dout (v) − 1)!
Q v∈V (3.46)
u,v∈V m(u, v)!


where Tvin
∗ is the set of in-trees rooted into v .

We are going to use Theorem 38 to count sequences with prescribed statis-


tics of consecutive character pairs.
Example 9. Let Σ be a finite alphabet, and let s : Σ × Σ → N statistics
function be given. Count the sequences in which for each (σ1 , σ2 ) ∈ Σ × Σ,
σ1 σ2 is a substring s(σ1 , σ2 ) times.

Solution. Construct a directed multigraph G ~ whose vertices are the characters


in Σ, and there are s(σ1 , σ2 ) edges from the vertex representing σ1 to the
vertex representing σ2 , and there are s(σ, σ) loops on the vertex representing
σ. There are 3 cases.
1. There are more than 2 vertices v such that din (v) 6= dout(v) or there
are 2 vertices v such that |din (v) − dout(v) | > 1. Then no prescribed
sequence exists. Indeed, assume that there is a sequence A with the
prescribed statistics of consecutive character pairs. Then each character
in the sequence is part of two substrings of length 2; in one of them it
is the first character and in one of them it is the last character, except
the first and the last character of A. Therefore, if a solution exists, for
each vertex v, dout (v) must be din (v) except for those 2 vertices that
represent the first and last character of A. The vertex representing the
first character has one more outgoing edge than incoming edge, and the
vertex representing the last character has one more incoming edge than
outgoing. However, if the first and the last character in A are the same,
then the constructed graph must be Eulerian.
2. For each vertex v, din (v) = dout (v) except for two vertices vb and ve ,
~
for which dout (vb ) = din (vb ) + 1 and din (ve ) = dout (ve ) + 1. Extend G

with a new vertex v , and add an edge going from v∗ to vb and also an
edge from ve to v ∗ . The so-modified G ~ 0 is a directed Eulerian graph. We
claim that the number of sequences is the number of Eulerian circuits
in G~ 0 factorized by the permutations of parallel edges. Indeed, fix an
arbitrary ordering on each set of parallel edges. Call a Eulerian circuit
canonical if the outgoing edges are used in the prescribed order when
the walk on the circuit starts in v ∗ . It is easy to see that there is a
bijection between sequences a1 a2 . . . an with prescribed statistics and
Linear algebraic algorithms. The power of subtracting 145

the canonical circuits that start in v ∗ , go to the vertex representing a1 ,


then to the vertex representing a2 , etc., finally going from the vertex
representing an to v ∗ using the outgoing edges in canonical order.
3. The constructed graph is Eulerian. Then take σ ∈ Σ, and consider the
sequences which start and end with σ. Extend G ~ with a vertex v ∗ and

add two edges, one going from v to the vertex representing σ and one
going from this vertex back to v ∗ . This modified graph is still Eule-
rian, and the number of canonical Eulerian circuits in it is the number
of sequences starting and ending with σ and satisfying the prescribed
character pair statistics. Doing this for all σ ∈ Σ and summing the
number of canonical Eulerian circuits answers the question.


3.4 The FKT (Fisher-Kasteleyn-Temperley) algorithm


The FKT algorithm is for counting perfect matchings in planar graphs.
The problem is originated from statistical physics where the question was:
How many ways are there to arrange diatomic molecules on a surface? In the
simplest case, the two atoms of each molecule occupy two vertices on a grid,
thus the simplest case can be reduced to count domino tilings on an n × m
grid. In 1961, Kasteleyn [112] and Temperley and Fisher [167] independently
solved this problem. Later, Kasteleyn generalized this result to all planar
graphs [110, 111].
In Chapter 4, we are going to prove that counting the perfect matchings
in a graph is #P-complete. Furthermore, it is also #P-complete to count the
(not necessarily perfect) matchings, even for planar graphs. However, counting
the perfect matchings in planar graphs is in FP. This fact is the base of the
holographic algorithms, which is an exciting, rapidly developing research topic,
and which is briefly introduced in Chapter 5.
The main idea of the FKT algorithm is to find the square of the number of
perfect matchings. As we are going to explain, this is the number of coverings
of the graph with disjoint edges and oriented even cycles. If the edges of a graph
are appropriately oriented, the number of such coverings is the determinant of
the oriented adjacency matrix. Therefore we have to start with the definition
of this appropriate orientation.
Definition 45. Let G = (V, E) be a planar embedding of a planar graph. An
orientation of the edges is a Pfaffian orientation if an odd number of edges
are oriented clockwise on each internal face.
Below we provide a fast (polynomial running time) algorithm to construct
146 Computational complexity of counting and sampling

a Pfaffian orientation of a planar graph, thus also proving that each planar
graph has a Pfaffian orientation.
Given a planar embedding of a planar graph G = (V, E), construct its dual
graph G∗ = (V ∗ , E ∗ ) in the following way. V ∗ is the set of faces of G, also one
vertex for the external face. Two vertices in V ∗ are connected with an edge
if the corresponding faces are neighbors. Take any spanning tree T ∗ of G∗ ,
and root it into the vertex corresponding the outer face of G. Let this vertex
be denoted by v ∗ . The edges of T ∗ correspond to edges of G separating the
neighbor faces. Let E 0 be the subset of these edges in G. Give an arbitrary
orientation of the edges in E \ E 0 . We claim that this orientation can be
extended to a Pfaffian orientation by visiting and removing the edges of T ∗
and giving an appropriate orientation of the corresponding edges in E 0 . While
there is an edge in T ∗ , take any edge e∗ ∈ E ∗ connecting a leaf which is not
v ∗ to the rest of T ∗ . The corresponding edge e0 ∈ E 0 is the last edge of a face
F not having an orientation yet. Give e0 an orientation such that F has an
odd number of clockwise-oriented edges. The edge e0 separates two faces, F
and F 0 . Observe that due to construction, F 0 is either the outer face or a face
which still has unoriented edges. If F 0 is not the outer face, the orientation
of e0 does not violate the property that F 0 will also have an odd number of
clockwise-oriented edges. Remove e∗ from T ∗ , and continue this procedure. In
the last step, T ∗ contains one edge, which connects a leaf to v ∗ . Orient the
corresponding edge in E 0 such that the last face in G also has an odd number
of clockwise-oriented edges.
This procedure indeed generates a Pfaffian orientation, since T ∗ at the
beginning contains a vertex for each face in G, and once the last edge of each
face is oriented in a Pfaffian manner, the orientation of the edges of the face
are not changed.
Pfaffian orientations have a very important property, stated in the follow-
ing lemma.
Lemma 6. Let G = (V, E) be a planar embedding of a planar graph and let its
edges be in a Pfaffian orientation. Let C be a cycle containing an even number
of edges, surrounding an even number of vertices in the planar embedding.
Then C contains an odd number of clockwise-oriented edges.
Proof. Consider the subgraph G0 that contains C and the vertices and edges
surrounded by C. G0 also has Pfaffian orientation since all of its internal faces
are also internal faces of G. Let F 0 denote the number of internal faces in G0 ,
let E 0 denote the number of edges of G0 , and let V 0 denote the number of
vertices in G0 . From Euler’s theorem, we know that
F 0 − E 0 + V 0 = 1. (3.47)
(Note that the outer face is not counted by F 0 .) We know that V 0 is even, since
there are an even number of vertices in the cycle C and the cycle surrounds
an even number of vertices. Therefore we get that
F 0 + E0 ≡ 1 mod 2. (3.48)
Linear algebraic algorithms. The power of subtracting 147

Each internal edge separates two faces, and the orientation of an internal edge
is clockwise in one of the faces and anticlockwise in the other face. Namely, if
we put those edges into Ec which have clockwise orientation in some faces, then
we put each internal edge into Ec and those edges in C which has clockwise
orientation. Since there are an odd number of clockwise-oriented edges in each
face, Ec has the same parity as F 0 . (Note that each internal edge is clockwise
only in one of the faces!)
If F 0 is odd, then there are an even number of edges in G0 , due to Equa-
tion (3.48). Since there are an even number of edges in C, there are an even
number of internal edges. The parity of Ec is odd, and since there are an even
number of internal edges, the number of clockwise-oriented edges in C is odd.
On the other hand, if F 0 is even, there are an odd number of edges in G0 ,
due to Equation (3.48). Since there are an even number of edges in C, the
number of internal edges is odd. The parity of Ec is even, so removing the
odd number of internal edges from Ec , we get that the number of clockwise-
oriented edges in C is still odd.
Even long cycles surrounding an even number of vertices in a planar em-
bedding are important since any cycle appearing in the union of two perfect
matchings are such cycles. More specifically, there is a bijection of oriented
even cycle coverings and ordered pairs of perfect matchings, stated and proved
below. First, we have to define oriented even cycle coverings.
Definition 46. Let G = (V, E) be a graph. An oriented even cycle covering
of G is a set of cycles with the following properties.
1. Each cycle is oriented, and removing the orientations, the edges are all
in E.
2. Each cycle has even length. A cycle length of 2 is allowed; in that case,
the unoriented versions of the edges are the same, however, they are still
in E.
3. Each vertex in V is in exactly one cycle.
Theorem 39. Let G = (V, E) be an arbitrary graph. The number of oriented
even cycle coverings of G is the square of the number of perfect matchings of
G.
Proof. We give a bijection between the oriented even cycle coverings and or-
dered pair of perfect matchings. Since the number of ordered pairs of perfect
matchings is the number of perfect matchings squared, it proves the theorem.
Fix an arbitrary total ordering of the vertices. We define two injective
functions, one from the ordered even cycle coverings to the ordered pair of
perfect matchings, and one in the other way, and prove that they are inverses
of each other.
Let C = C1 , C2 , . . . Ck be a cycle covering. For each cycle Ci , consider
the smallest vertex vi in it. Take a walk on Ci starting at vi in the given
148 Computational complexity of counting and sampling

orientation. Along the walk, put the edges into the sets M1 and M2 in an
alternating manner, the first edge into M1 , the second into M2 , etc. Remove
the orientations of all edges, both in M1 and M2 . The so-constructed sets will
be perfect matchings, since it is a matching and all vertices are covered. In
this way, we constructed a mapping from the oriented even cycle coverings to
the ordered pairs of perfect matchings.
We claim that this mapping is an injection. Indeed, if the unoriented ver-
sions of two coverings, C1 and C2 , differ in some edges, then they have different
images. If the edges are the same, and just some of the cycles have different
orientation, then different edges of that cycle will go to the first and the second
perfect matchings, and thus the images are still different.
The inverse mapping is the following. Let M1 and M2 be two perfect
matchings. First, take the union of them. The union of them consists of disjoint
even cycles and separated edges. Make an oriented cycle of length 2 from each
separated edge. Orient each cycle Ci such that the edge which is incident to
the smallest vertex vi and comes from M1 goes out from vi and the edge which
is from M2 and incident to vi comes into vi . We constructed an oriented even
cycle covering.
We claim that this mapping is an injection. Indeed, consider two ordered
pairs of perfect matchings, (M1 , M2 ) and (M10 , M20 ). If the set of edges of
M1 ∪ M2 is not the set of edges of M10 ∪ M20 , then the images of the two pairs
of perfect matchings are clearly different. If the two sets of edges in the unions
are the same, but M1 6= M10 , then consider an edge e which is in M1 and not
in M10 . This edge is in a cycle Ci in the union of the two perfect matchings.
We claim that Ci is oriented in a different way in the two matchings. Indeed,
e is an edge from M20 , and since the edges are alternating in Ci , any edge in
Ci which comes from M1 is an edge from M20 . Similarly, any edge from M2 is
an edge from M10 . Especially, the two edges incident to the smallest vertex vi
in Ci comes from M1 and M10 , therefore Ci is oriented differently in the two
images, thus, the two images are different. Finally, it is easy to see if the two
sets of edges in the unions are the same and M1 = M10 , then also M2 = M20 ,
thus the two ordered pairs of perfect matchings are the same.
It is also easy to see that the two injections are indeed the inverses of each
other.
We are ready to prove the main theorem.
Theorem 40. Let G = (V, E) be a planar embedding of a planar graph, with
edges having a Pfaffian orientation. Define the oriented adjacency matrix A
in the following way. Let the entry ai,j be 1 if there is an edge going from vi to
vj , and let ai,j be −1 if there is an edge going from vj to vi . All other entries
are 0. Then the number of perfect matchings of G is
p
det(A). (3.49)

Proof. Based on Theorem 39, it is sufficient to prove that det(A) is the number
Linear algebraic algorithms. The power of subtracting 149

of oriented even cycle coverings in G. By definition, the determinant of the


n × n matrix A is
X n
Y
det(A) := sign(π) ai,π(i) . (3.50)
π∈Sn i=1

Let On denote the set of those permutations that contain at least one odd
cycle. We prove that
X n
Y
sign(π) ai,π(i) = 0. (3.51)
π∈On i=1

If a permutation π has a fixed point, then


n
Y
ai,π(i) = 0, (3.52)
i=1

since if j is a fixed point then aj,π(j) = aj,j = 0. For those permutations π


that contain an odd cycle of length at least 3, we set up an involution g and
show that
n
Y n
Y
sign(π) ai,π(i) = −sign(g(π)) ai,g(π)(i) . (3.53)
i=1 i=1

The involution is the following. Let π be a permutation containing at least


one odd cycle of length at least 3. Amongst the odd cycles, let Ci be the
cycle containing the smallest number. Then the image of π, g(π) is the per-
mutation that contains the same cycles as π except Ci is inverted. Since A is
skew-symmetric, namely, for all i, j, ai,j = −aj,i , Ci has an odd length and
sign(π) = sign(g(π)), so Equation (3.53) holds. Therefore Equation (3.51)
also holds, since any permutation with odd cycles has a fixed point or an odd
cycle with length at least 3 (or both, but then Equation (3.53) still holds,
both sides are 0).
We get that
X n
Y
det(A) := sign(π) ai,π(i) . (3.54)
π∈Sn \On i=1

Therefore it is sufficient to show that for any π ∈ Sn \ On ,


n
Y
sign(π) ai,π(i) = 1 (3.55)
i=1

if π is an oriented even cycle covering and 0 otherwise. If π is not an even


cycle covering, then there exists a j such that aj,π(j) = 0, thus the product
is 0. If π is an oriented even cycle covering, observe the following. Since n is
even, sign(π) is 1 if it contains an even number of cycles and −1 if it contains
an odd number of cycles. However, each even cycle appearing in an oriented
150 Computational complexity of counting and sampling

even cycle covering must contain an odd number of clockwise and an odd
number of anti-clockwise edges, since G has a Pfaffian orientation. Therefore,
the contribution of each cycle in the product in Equation (3.55) is −1. Thus
the product is −1 if the number of cycles is odd and it is 1 if the number of
cycles is even. Since the same is true for sign(π), Equation (3.55) holds if π
is an oriented even cycle covering.
Therefore det(A) is indeed the number of oriented even cycle coverings.
Due to Theorem 39, det(A) is the number of perfect matchings in G squared,
so its square root is indeed the number of perfect matchings.
Example 10. Count the 2 × 1 domino tilings on a 3 × 4 square.
Solution. Consider the planar graph whose vertices are the unit squares and
two vertices are connected if the corresponding squares are neighbors. A pos-
sible Pfaffian orientation is

Number the vertices from top to bottom, and from left to right, row by row.
Then the oriented adjacency matrix is
 
0 1 0 0 1 0 0 0 0 0 0 0
 −1 0 1 0 0 1 0 0 0 0 0 0 
 
 0 −1 0 1 0 0 1 0 0 0 0 0 
 
 0
 0 −1 0 0 0 0 1 0 0 0 0 
 −1 0 0 0 0 −1 0 0 1 0 0 0 
 
 0 −1 0 0 1 0 −1 0 0 1 0 0 
A=  
 0 0 −1 0 0 1 0 −1 0 0 1 0 
 0
 0 0 −1 0 0 1 0 0 0 0 1 
 0
 0 0 0 −1 0 0 0 0 1 0 0 
 0
 0 0 0 0 −1 0 0 −1 0 1 0 
 0 0 0 0 0 0 −1 0 0 −1 0 1 
0 0 0 0 0 0 0 −1 0 0 −1 0

Using standard methods to compute the determinant, we get that det(A) =


121. Therefore, the number of possible domino tilings on a 3 × 4 rectangle is
11. 
Linear algebraic algorithms. The power of subtracting 151

Note that there were two places where we used the planarity of G. First,
G can have a Pfaffian orientation, and second, any even cycle appearing in an
oriented even cycle covering has an odd number of clockwise edges. Indeed,
Theorem 39 holds for any graph. This suggests a weaker definition of Pfaffian
orientation.
Definition 47. Let G = (V, E) be a graph. A Pfaffian orientation of the edges
of G in the weak sense is an orientation that determines, for the corresponding
oriented adjacency matrix A of G, that
p
det(A) (3.56)

is the number of perfect matchings in A.


Recall that the Pfaffian of the matrix is
X n
Y
pf (A) := sign(σ) aσ(2i−1),σ(2i) . (3.57)
σ∈S2n /∼ i=1

If A is the oriented adjacency matrix of a graph G, then the score of a per-


mutation from S2n /∼ in Equation (3.57) is 1 or −1 if and only if for all i,
(vσ(2i−1) , vσ(2i) ) is an edge in G. In such a case, these edges form a perfect
matching in G. Therefore, for any orientation of G, we get that

|pf (A)| ≤ P M (G) (3.58)

where A is the oriented adjacency matrix of G, and P M (G) is the number


of perfect matchings in G. Furthermore, equality holds if and only if for any
perfect matching of G, the score of the corresponding permutations in Equa-
tion (3.57) are all 1 or all −1. Since for any skew-symmetric matrix, the square
of the Pfaffian is the determinant (see also Exercise 19), we get that an ori-
entation of the edges of graph G is a Pfaffian orientation (by Definition 47)
if and only if the absolute value of the Pfaffian of the corresponding oriented
adjacency matrix is the number of perfect matchings in G.
We can assign weights to the edges of a planar graph from an arbitrary
commutative ring, and then we might ask the sum of the weights of perfect
matchings in a planar graph, where the weight of a perfect matching is the
product of the weights of the edges. It is easy to see that the square of this
sum is the determinant of the weighted, oriented adjacency matrix, where the
orientation is Pfaffian.
The determinant can be calculated in polynomial time in an arbitrary
commutative ring, thus it is easy to see that the square of the sum of perfect
matching weights can be calculated in polynomial time. However, the Pfaffian
can be directly calculated in polynomial time in an arbitrary commutative
ring. This yields to the following theorem.
152 Computational complexity of counting and sampling

Theorem 41. Let G = (V, E) be a planar graph, let R be an arbitrary com-


mutative ring, and let a weight function w : E → R be given. Define the
partition function of G as
X Y
Z(G) := w(e) (3.59)
M ∈P M (G) e∈M

where P M (G) is the set of perfect matchings of G.


Take a Pfaffian orientation of G in the string sense, namely by Defini-
tion 45, and generate two matrices. A is the usual oriented adjacency matrix,
and Aw is the weighted adjacency matrix, that is, for an edge e = (vi , vj ), the
corresponding matrix entry ai,j is w(e) if edge e is oriented from vi to vj , and
it is −w(e) if edge e is oriented from vj to vi . All other entries are 0 in Aw .
Then Z(G) = pf (Aw ) if pf (A) ≥ 0 and Z(G) = −pf (Aw ) if pf (A) < 0.
Proof. Define the signed weight of a perfect matching M as
n
Y
sign(σ) aσ(2i−1),σ(2i) (3.60)
i=1

where σ is a permutation satisfying that for all i = 1, . . . , n, (vσ(2i−1) , vσ(2i) )


is an edge in M, and ak,j is an entry of the oriented weighted adjacency
matrix Aw . We showed that the weight is well-defined. Indeed, any swapping
of indexes σ(2i − 1) and σ(2i) changes the sign of the permutation, and also
changes the sign of one term in the product of the weights. Swapping a pair
of indexes changes neither the sign of the permutation nor the product of
the weights. We define the sign of the signed weight of a perfect matching as
positive, if it is the product of the edge weights in the perfect matching, and
this sign as negative, if it is the additive inverse of the product of the edge
weights in the perfect matching. Clearly, this sign depends only on the sign
of σ and on how many times aσ(2i−1),σ(2i) is the weight of the corresponding
edge, and how many times it is the additive inverse of the weight of the
corresponding edge.
The next observation is that the sign of the signed weight of each perfect
matching is the same for a fixed Pfaffian orientation. We are going to prove
it via the following steps. Let M be a perfect matching of G. An alternating
cycle C is a cycle in G such that along that cycle, its edges are alternatingly
presented and not presented in M . It is easy to see that M ∆C is also a
perfect matching, where ∆ denotes the symmetric difference. We claim that
C surrounds an even number of vertices in a planar embedding of G. Indeed,
M is a perfect matching, and the vertex set of C and M ∩ C is the same.
Therefore each vertex inside C must be paired in M with another vertex
inside C. Thus, the number of vertices inside C is even. Further, we claim
that the signs of the signed weights of M and M ∆C are the same. To see
this, first fix two permutations σ and σ 0 , with the following properties. The
permutation σ contains the edges of M while σ 0 contains the edges of M ∆C.
Linear algebraic algorithms. The power of subtracting 153

Outside of C, σ and σ 0 are the same, and on C, both permutations contain


the edges in anticlockwise orientation. That is, if (vσ(2i−1) , vσ(2i) ) is an edge
in C, then vσ(2i−1) is before vσ(2i) in anticlockwise direction. The same is true
for σ 0 . For sake of simplicity and without loss of generality, we can assume
that σ starts with i1 , i2 , . . . , i2k , where vi1 , vi2 , . . . , vi2k are the vertices of C in
anticlockwise direction, and σ 0 starts with i2k , i1 , i2 , . . . , i2k−1 . We show that
σ and σ 0 have different signs. Indeed, to obtain σ 0 from σ, we have to bubble
down i2k to the first position. This is an odd number of transpositions, which
changes the sign of the permutation.
Now C contains an odd number of clockwise edges, since G is Pfaffian
oriented. Then M and M ∆C contain different parity of clockwise-oriented
edges along C. What follows is that in the signed weight of M and M ∆C,
different parity of entries of Aw are the additive inverses of the weights of
the corresponding edges when the signed weight is written using σ and σ 0 .
However, σ and σ 0 have different signs, therefore the sign of the signed weight
of M and M ∆C is the same.
To see that the signs of the signed weights of all perfect matchings are
the same, consider two perfect matchings of G, M and M 0 , and take their
symmetric difference. M ∆M 0 is the disjoint union of cycles, take them in an
arbitrary order, C1 , C2 , . . . Ck . Define M1 := M ∆C1 and Mi = Mi−1 ∆Ci , for
all i = 2, . . . k. C1 is an alternating cycle with respect to M , and each Ci is
an alternating cycle with respect to Mi−1 . Then the sign of M ∆C1 is the sign
of M , and the sign of Mi is the sign of Mi−1 . Especially, the sign of M is the
sign of Mk . Observe that Mk = M 0 , therefore the sign of M is the sign of M 0 .
We get that X Y
|pf (Aw )| = w(e), (3.61)
M ∈P M (G) e∈M

since there is a bijection between the non-vanishing products in the definition


of the Pfaffian in Equation (3.13) and the perfect matchings in G. Further-
more, each such product in absolute
Q value is the weight of the corresponding
perfect matching M (that is, e∈M w(e)), and either all of these products are
the weight of the perfect matchings or all of them are the additive inverses.
The sign of pf (A) can tell which case holds, as stated in the theorem.
Example 11. Count how many ways there are to tile a 2 × 4 rectangle with
k horizontal and 4 − k vertical dominos.
Solution. Consider the planar graph whose vertices are the unit squares, and
two vertices are connected if their corresponding squares are neighbors. A
possible Pfaffian orientation is
154 Computational complexity of counting and sampling

Using Z[x], assign a weight x to each horizontal edge, and weight 1 to each
vertical edge. Number the vertices from top to bottom, and from left to right,
row by row. Then the weighted oriented adjacency matrix is
 
0 x 0 0 1 0 0 0
 −x 0 x 0 0 1 0 0 
 
 0 −x 0 x 0 0 1 0 
 
 0 0 −x 0 0 0 0 1 
A= −1

 0 0 0 0 −x 0 0 
 0 −1 0 0 x 0 −x 0 
 
 0 0 −1 0 0 x 0 −x 
0 0 0 −1 0 0 x 0

Using the introduced algorithm, we get that

|pf (A)| = x4 + 3x2 + 1. (3.62)

Namely, there is 1 tiling with 4 horizontal dominos, 3 tilings with 2 horizontal


dominos, and 1 tiling with no horizontal (all vertical) dominos. The careful
reader might observe that these coefficients come from the shallow diagonal
of Pascal’s triangle, and their sum is a Fibonacci number. 

3.5 The power of subtraction


We already mentioned in Chapter 2 that Jerrum and Snir proved that the
spanning tree polynomial needs an exponential number of additions and mul-
tiplications if subtractions are not allowed. On the other hand, its value can be
computed in polynomial time at any point using additions, multiplications and
subtractions, see Theorem 35. It is easy to see that the formal computations
in Theorem 35 build up the spanning tree polynomial, and use only a poly-
nomial number of operations (additions, multiplications and subtractions).
This does not mean that the spanning tree polynomial could be calculated
in polynomial time. Indeed, the size of the spanning tree polynomial might
be exponential. To resolve these facts looking paradoxically, observe that the
computation needed to perform an operation in a formal many-variable poly-
nomial ring might take exponential time. The operations might be easy to
perform in some of the homomorph images, and in those cases, the overall
running time will be a polynomial function of the input graph, not only the
number of operations. However, these algorithms use subtractions. We are go-
ing to prove that there is no polynomial-sized yield algebra building the set
of spanning trees of a graph satisfying mild conditions.
Theorem 42. Define K[n] as the complete graph on the first n positive integer
Linear algebraic algorithms. The power of subtracting 155

number. There is no yield algebra (A, (Θ, ≤), p, O, R) satisfying the following
properties.

1. For each n, there is a parameter θn such that S(θn ) contains the span-
ning trees of the complete graph K[n] .
Z+
2. There is a function g : A → 2( 2 ) with the following properties:
(a) For any ◦i ∈ O, and for any of its operands,
 
mi
) = tm

g(◦i (aj )j=1 j=1 g(aj ) t h◦i (p(a1 ), . . . , p(ami ))
i

+
where h◦i is a function mapping from Θmi to 2( 2 ) .
Z

(b) If a ∈ A is a spanning tree of K[n] , then g(a) is the set of edges in


it.
3. |θn↓ | = O(poly(n)).
Proof. The proof is by contradiction. We show that if such a yield algebra
exists, then there is a corresponding evaluation algebra that could build up
the spanning tree polynomial using a polynomial number of additions and
multiplications.
So assume that a yield algebra Y = (A, (Θ, ≤), p, O, R) with the above
described properties exists. Then construct the following evaluation algebra
(Y, R, f, T ). Let R be the multivariate polynomial ring over Z that contains
a variable xi,j for each unordered pair of positive integers (i, j). Define the f
function as Y
f (a) := xi,j . (3.63)
(i,j)∈g(a)

Then the Ti function for operator ◦i is defined as

Y mi
Y
Ti (r1 , . . . , rmi , θ1 , . . . , θmi ) := xk,l ri . (3.64)
(k,l)∈h◦i (θ1 ,...,θmi ) j=1

This is indeed an evaluation algebra, since each Ti function satisfies Equa-


tion (2.52).
Since for each spanning tree a ∈ A, f (a) is the monomial containing the
variables corresponding to the edges in the spanning tree, F (S(θn )) is indeed
the spanning tree polynomial. Since |θn↓ | = O(poly(n)), the spanning tree
polynomial could be calculated using only a polynomial number of additions
and multiplications. This contradicts the theorem of Jerrum and Snir stating
that computing the spanning tree polynomial needs an exponential number
of additions and multiplications.
156 Computational complexity of counting and sampling

Jerrum and Snir also proved an exponential lower bound on the number of
additions and multiplications to calculate the permanent polynomial defined
as
X Y n
per(M ) := xi,σ(i) (3.65)
σ∈Sn i=1

where M is an n × n matrix containing indeterminants xi,j and Sn is the set


of permutations of the first n positive integers. This result provides a theorem
on the absence of certain yield algebras on permutations.
Theorem 43. Let K ~ [n] denote the complete direct graph containing loops on
the first n positive integers. There is no yield algebra (A, (Θ, ≤), p, O, R) with
the following properties.
1. For each n, there is a parameter θn such that S(θn ) is Sn .
+
×Z+
2. There is a function g : A → 2Z with the following properties:
(a) For any ◦i and for any of its operands
 
mi
g(◦i (aj )j=1 ) = (tm
i=1 g(aj )) t h◦i (p(a1 , . . . , p(ami ))
i

+
×Z+
where h◦i is a function mapping from Θmi to 2Z .
(b) If a ∈ A is a permutation of length n, then g(a) is the set of edges
~ [n] that the permutation indicates.
in the cycle cover of K
3. It holds that |θn↓ | = O(poly(n)).

Proof. The proof is similar to the proof of Theorem 42. If such yield alge-
bra existed, then we could build up a corresponding evaluation algebra that
could compute the permanent polynomial using only a polynomial number of
additions and subtractions.
Contrary to the spanning tree polynomial, no algorithm is known to com-
pute the permanent polynomial using only a polynomial number of arithmetic
operations. In fact, computing the permanent is in #P-hard, so no polyno-
mial algorithm exists to compute the permanent of a matrix, assuming that
P is not NP. From this angle, it looks really accidental that the determinant
can be calculated with a dynamic programming algorithm due to cancellation
of terms. What that dynamic programming algorithm really calculates is the
sum of the weights of clow sequences which coincides with the determinant
of a matrix. A large set of similar “accidental” cases are known where we
can compute the number of combinatorial objects in polynomial time due to
cancellations. These cases are briefly described in Chapter 5.
Leslie Valiant proved that computing the perfect matching polynomial
needs exponentially many arithmetic operations if subtractions are forbidden
[177]. His theorem holds also if the problem is restricted to planar graphs.
Linear algebraic algorithms. The power of subtracting 157

On the other hand, the perfect matching polynomial can be computed with
a polynomial number of arithmetic operations if subtractions are allowed. In-
deed, it is easy to see that the formal computations in Theorem 41 build up
the perfect matching polynomial using only a polynomial number of arith-
metic operations in the multivariate polynomial ring. This again shows the
computational power of subtractions: subtractions might have exponential
computational power.

3.6 Further reading


• Anna Urbańska gave an Õ(n3.03 ) running time algorithm both for cal-
culating the determinant and the Pfaffian using only additions, subtrac-
tions and multiplications [172]. Here Õ is for hiding sub-power (loga-
rithmic) terms.
• Kasteleyn [112], and independently Temperley and Fisher [167], gave a
formula for the number of domino tilings of a rectangle. The number of
ways that an n × m rectangle can be covered with nm 2 dominos is

dY
2ed 2 e
n m
   
Y iπ jπ
4 cos2 + 4 cos2 . (3.66)
i=1 j=1
n+1 m+1

The number of domino tilings of a square with edge length 0, 2, 4, . . . is

1, 2, 36, 6728, 12988816, 258584046368, 53060477521960000, . . .

This is the integer sequence A004003 in OEIS.


• Charles H.C. Little extended Kasteleyn’s algorithm to the case when
a graph does not contain any subgraph homeomorphic to K3,3 [120].
Graphs G1 and G2 are homeomorphic if one can be transformed into
another with a series of transformations of the following two types.
– Subdivision. A vertex w added to the middle of an edge (u, v), so
the transformed graph contains edges (u, w) and (w, v).
– Smoothing. It is the reverse operation of subdivision: the degree 2
vertex w is deleted together with its edges (u, w) and (w, v), and
its neighbors u and v are connected with an edge.
Little’s theorem is that any graph not containing a subgraph homeomor-
phic to K3,3 can be Pfaffian oriented in the weak sense (Definition 47).
Unfortunately, this theorem does not directly provide a polynomial run-
ning time algorithm that decides if a graph can be Pfaffian oriented.
158 Computational complexity of counting and sampling

• György Pólya asked the following question [142]. Let A be a square 0-1
matrix. Is it possible to transform it to a matrix B by changing some
of its 1s to −1, such that the permanent of A is the determinant B?
Neil Robertson, Paul Seymour and Robin Thomas solved this question
[146]. Roughly speaking, their theorem says that a matrix A has the
above-mentioned property if and only if it is the adjacency matrix of a
bipartite graph that can be obtained by piecing together planar graphs
and one sporadic non-planar bipartite graph. Their theorem provides a
polynomial running time algorithm to decide if A has such a property
and if so, it also generates a corresponding matrix B.
• Simon Straub, Thomas Thierauf and Fabian Wagner gave a polynomial
time algorithm to count the perfect matchings in K5 -free graphs [164].
It is interesting to mention that some of the K5 -free graphs cannot
be Pfaffian orieneted in the weak sense; an example of that is K3,3 ,
see also Exercise 23. Their algorithm decomposes a K5 -free graph into
components and applies some matchgate techniques to get a polynomial
time algorithm. Matchgates are introduced in this book in Chapter 5.

3.7 Exercises
1. List all clow sequences of length 3.
2. ◦ Let Cln denote the number of clow sequences of length n on the
numbers {1, 2, . . . , n}. Give a dynamic programming recursion that finds
Cln .

3. * Show that there are exponentially many more clow sequences of a


given length than permutations.
4. Prove that the rank of the adjacency matrix of a directed graph is the
number of vertices minus the number of components of the graph.
5. Prove that any cycle-free graph G = (V, E) for which |E| = |V | − 1 is a
tree.
6. A graph G contains exactly two cycles, the length of the cycles are l1
and l2 . Find the number of spanning trees of G.

7. * How many spanning trees are in an octahedron?


8. How many spanning trees are in a cube?
Linear algebraic algorithms. The power of subtracting 159

9. Let G = (V, E) be a graph, and let w : E → Z be a weight function


assigning a polynomial of an order of at most t to each edge of a graph.
Show that X Y
w(e)
T ∈TG e∈T

can be calculated in O(poly(|V |, t)) time, where TG denotes the set of


spanning trees of G.
10. ◦ Let G = (V, E) be a graph, and let w : E → R be a weight function
assigning a weight to each edge of a graph. Show that
X X
w(e)
T ∈TG e∈T

can be calculated in polynomial time.


11. * Let G = (V, E) be a graph, whose edges are colored with red and blue
and green. Find a polynomial running time algorithm that counts the
spanning trees of G with k1 blue edges, k2 red edges, and |V |−1−k1 −k2
green edges.
12. Count the spanning trees of the complete bipartite graph Kn,m .
13. * Count the number of minimum spanning trees in an edge-weighted
graph. The weights might be large integer numbers, that is, the size of
the input is measured by the number of digits necessary to write the
numbers.
14. * Prove that it is NP-complete to decide if there is a spanning tree in a
graph with a given sum of weights. Hint: reduce the subset sum problem
(Theorem 5) to this problem.
15. * Replace each edge of a tetrahedron with a pair of antiparallel directed
edges. How many Eulerian circuits are there in the so-obtained directed
graph?
16. Orient the edges of the octahedron such that each meridian and the
equator are oriented. How many Eulerian circuits are in this directed
graph?
17. * How many sequences are there that contain the ab substring 6 times,
the ba substring 4 times, the bc substring 4 times, the cb substring 3
times, the ca substring once, and there are no other substrings of length
2?
18. How many sequences are there that contain the ab substring 12 times,
the ba substring also 12 times, the bc substring 5 times, the cb substring
2 times, the ac substring 3 times, the ca substring 6 times, and there
are no other substrings of length 2?
160 Computational complexity of counting and sampling

19. * Prove that for any skew-symmetric 2n × 2n matrix A,

pf 2 (A) = det(A).

20. * How many perfect matchings are in an octahedron?


21. How many perfect matchings are in a cube?

22. Prove that for any matrix A,


 
0 A
det(A) = pf .
−AT 0

23. * The complete bipartite graph K3,3 is not planar. Prove that it does
not have a Pfaffian orientation in a weak sense (Definition 47).
24. ◦ Let G = (V, E) be a planar graph, and let w : E → R be a weight
function. Show that
X X
Z(G) := w(e)
M ∈P M (G) e∈M

can be calculated in polynomial time, where P M (G) is the set of perfect


matchings in G.

3.8 Solutions
Exercise 2. Apply the algebraic dynamic programming approach on Equa-
tions (3.7) and (3.8).
Exercise 3. Those clow sequences that contain only one clow with head 1
are already exponentially more than the permutations of the same length.
Indeed, there are (n − 1)n−1 clows of length n with head n over the indexes
1, 2, . . . , n. Therefore the ratio of the number of clow sequences and the number
of permutations is at least
(n − 1)n−1
.
n!
Applying the Stirling formula, this fraction is
n
(n − 1)n−1

1 n−1
√ n = √ en
2πn ne

n 2πn n
Linear algebraic algorithms. The power of subtracting 161

which clearly tends to infinity exponentially quickly.


Exercise 7. Index the vertices of the octahedron such that the north pole is
the first vertex, the south pole is the last one, and the vertices on the equator
are ordered along the walk on the equator. Let v be the north pole. Then
 
4 −1 0 −1 −1
 −1 4 −1 0 −1 
−v −v
 
D −A =  0 −1 4 −1 −1  .

 −1 0 −1 4 −1 
−1 −1 −1 −1 4

Using standard calculations, we get that det(D−v − A−v = 384, therefore, the
octahedron has 384 spanning trees.
Exercise 10. Let G−e be the graph obtained from G by removing edge e.
Observe that the difference in the number of spanning trees of G and G−e is
the number of spanning trees containing e. If T e is the number of spanning
trees containing e, then the value to be calculated is
X
T e w(e).
e∈E

Alternatively, we can take the commutative ring R defined in Subsec-


tion 2.1.3.4, and assign values (1, w(e)) to each weight. According to The-
orem 35, the partition function of the spanning trees can be calculated in
polynomial time, from which the total sum of weights can be read out.
Exercise 11. Using the bivariate polynomial ring, Z[x1 , x2 ], assign value x1
to each blue edge, x2 to each red edge and 1 to each green edge. With these
weights, calculate
det(D−v − A−v ).
The coefficient of the monomial xk11 xk22 tells the number of spanning trees with
k1 blue edges and k2 red edges.
Exercise 13. We have to make two observations. The first is that every
minimum spanning tree can be obtained by Kruskal’s algorithm if we put the
edges in all possible total orderings that satisfy ei < ej if w(ei ) ≤ w(ej ). The
second observation is that any time during Kruskal’s algorithm, we set up a
spanning forest on the components defined by the edges E 0 = {e|w(e) ≤ w0 },
where the remaining edges in Kruskal’s algorithm all have weights larger than
w0 . Therefore, how to finish the minimum spanning tree does not depend on
the spanning forest so far built by Kruskal’s algorithm. Therefore, the number
of minimum spanning trees can be computed in the following way.
1. Set H to G.
2. While H is not the simple vertex graph, do the following:
(a) Let H 0 be the graph spanned by the minimum weight vertices in H.
Let the minimum weight in H be w, and let f (w) be the number
162 Computational complexity of counting and sampling

of spanning forests of H 0 . The number of spanning forests is the


product of the number of spanning trees on each component, this
can be calculated in polynomial time.
(b) Redefine H as contracting the vertices of each component in H 0 .
Note that multiple edges might appear when a vertex is connected
to different vertices of a component in H 0 , however, the number of
spanning trees can be easily calculated in the presence of parallel
edges, too.
Q
3. The number of minimum spanning trees in G is w∈W f (w), where W
is the set of weights appearing during the iteration in the previous step.
Exercise 14. Let S = {w1 , w2 , . . . , wn } be a set of weights. Create an edge-
weighted graph G in the following way. Define vertices v1 , v2 , . . . , vn+1 and
u1 , u2 , . . . , un+1 . For each i, connect vi to ui . Assign a weight wi to the edge
(vi , ui ) for each i = 1, . . . , n, and assign weight 0 to the edge (vn+1 , un+1 ).
Finally, we create a complete graph Kn+1 on both the vi and ui vertices. Each
edge in both Kn+1 components gets a weight 0.
It is easy to see that for each subset A of S, G contains a spanning tree
whose sum of weights is the sum of weights in A. Therefore if we could answer
in polynomial time if there is a spanning tree in G whose weight is m, we
could also tell in polynomial time if there is a subset A of S such that its sum
of weights is m. Since this latter is in NP-complete (see Theorem 5), it is also
in NP-complete to decide if there is a spanning tree of a graph with a weight
m.
Exercise 15. The obtained graph is the directed complete graph on 4 vertices.
For an arbitrary v, we get that
 
3 −1 −1
−v
Dout − A−v =  −1 3 −1  .
−1 −1 3
The determinant of this matrix is 16. Since for each vertex, the out-degree is
3, the number of Eulerian circuits is
4
16 × (3 − 1)! = 256.
Exercise 17. Since character a is 6 times in the first position of a substring
and only 5 times in the second position of a substring, the sequence must start
with a. Similarly, character b is 9 times in the second position in a substring,
and only 8 times in the first position, the sequence must end with character b.
Therefore, the sequences are the Eulerian circuits in the directed multigraph
with vertices vs , va , vb and vc . Vertex vs sends one edge to va . Vertex va sends
6 edges to vb . Vertex vb sends 4 edges to va , 4 edges to vc and 1 edge to vs .
Vertex vc sends 1 edge to va and 3 edges to vb . Therefore
 
6 −6 0
−vs
Dout − A−vs =  −4 9 −4 
−1 −3 4
Linear algebraic algorithms. The power of subtracting 163
−vs
det(Dout −A−vs ) = 24. Therefore the number of sequences with the prescribed
statistics is
0!5!8!3!
24 = 280.
1!6!4!4!3!1!1!
Exercise 19. Observe that for any 2n × 2n skew-symmetric matrix, Equa-
tion (3.54) holds. Also observe that the permutations containing only even
long cycles are exactly the oriented even cycle coverings. There is a bijection
between oriented even cycle coverings and ordered pairs of perfect matchings.
Using this bijection, for each permutation σ containing only even cycles, we
can assign two permutations π1 and π2 representing the two perfect matchings.
If
σ = (x1,1 , x1,2 , . . . , x1,k1 )(x2,1 , . . . , x2,k2 ) . . . (xm,1 , . . . , xm,km )
then π1 is
 
1 2 ··· k1 k1 + 1 ··· k1 + k2 ··· 2n − km ··· 2n
x1,1 x1,2 ··· x1,k1 x2,1 ··· x2,k2 ··· xm,1 ··· xm,km

and π2 is
 
1 2 ··· k1 k1 + 1 ··· k1 + k2 ··· 2n − km ··· 2n
.
x1,2 x1,3 ··· x1,1 x2,2 ··· x2,1 ··· xm,2 ··· xm,1

Observe that
sign(π1 )sign(π2 ) = sign(σ)
since
π1−1 π2 = σ,
and the sign of any permutation is the sign of its inverse. Therefore we get
that for any so-constructed σ, π1 and π2 ,
n
! n
!
Y Y
sign(π1 ) aπ1 (2i−1),π1 (2i) sign(π2 ) aπ2 (2i−1),π2 (2i) =
i=1 i=1
2n
Y
= sign(σ) ai,σ(i) . (3.67)
i=1

Summing this for all permutation σ which are even cycle coverings, we get
that
 
X n
Y
 sign(π1 ) aπ1 (2i−1),π1 (2i)  ×
π1 ∈S2n /∼ i=1
 
X n
Y
 sign(π2 ) aπ2 (2i−1),π2 (2i)  =
π2 ∈S2n /∼ i=1

X 2n
Y
= sign(σ) ai,σ(i) .
σ∈S2n \O2n i=1
164 Computational complexity of counting and sampling

Observe that on the left-hand side we have pf 2 (A) by definition, and on the
right-hand side, we have det(A) due to Equation (3.54).
Exercise 20. Index the vertices of the octahedron such that the north pole
is the first vertex, the south pole is the last one, and the vertices on the equa-
tor are ordered along the walk on the equator. With an appropriate Pfaffian
orientation, the oriented adjacency matrix is
 
0 1 −1 −1 −1 0
 −1 0 1 0 −1 −1 
 
 1 −1 0 −1 0 −1 
A= .
 1
 0 1 0 −1 1  
 1 1 0 1 0 −1 
0 1 1 −1 1 0

Using standard matrix calculations, we get that det(A) = 64. Therefore, the
number of perfect matchings in an octahedron is 8.
Exercise 23. From Exercise 22, we get that the Pfaffian of the oriented
adjacency graph of the complete bipartite graph K3,3 is the determinant of
the matrix  
x1 x2 x3
A =  x4 x5 x6 
x7 x8 x9
where each xi is either 1 or −1. The number of perfect matchings in K3,3 is
6. However the determinant of matrix A has only 6 terms, so the determinant
can be 6 only if each term is 1. This means that the det(A) could be 6 if all
of the following sets contain an even number of −1s:

{x1 , x5 , x9 }, {x2 , x6 , x7 }, {x3 , x4 , x8 }

and the following sets all contain an odd number of −1s:

{x1 , x6 , x8 }, {x2 , x4 , x9 }, {x3 , x5 , x7 }.

From the first 3 sets, we get that there must be an even number of −1s in A.
However, from the second 3 sets, we get that there must be an odd number
of −1s in A. It is impossible.
Exercise 24. Similar to Exercise 10, for each edge (vi , vj ), we can remove
vertices vi and vj to count the perfect matchings containing edge (vi , vj ).
Chapter 4
#P-complete counting problems

4.1 Approximation-preserving #P-complete proofs . . . . . . . . . . . . . . . . . 167


4.1.1 #3SAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
4.1.2 Calculating the permanent of an arbitrary matrix . . . . . . 170
4.1.3 Counting the most parsimonious substitution histories
on an evolutionary tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
4.1.4 #IS and #Mon-2SAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
4.2 #P-complete proofs not preserving the relative error . . . . . . . . . . . 186
4.2.1 #DNF, #3DNF . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 186
4.2.2 Counting the sequences of a given length that a regular
grammar can generate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
4.2.3 Computing the permanent of a non-negative matrix and
counting perfect matchings in bipartite graphs . . . . . . . . . 188
4.2.4 Counting the (not necessarily perfect) matchings of a
bipartite graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 190
4.2.5 Counting the linear extensions of a poset . . . . . . . . . . . . . . . 191
4.2.6 Counting the most parsimonious substitution histories
on a star tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 195
4.2.7 Counting the (not necessarily perfect) matchings in a
planar graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 199
4.2.8 Counting the subtrees of a graph . . . . . . . . . . . . . . . . . . . . . . . 204
4.2.9 Number of Eulerian orientations in a Eulerian graph . . . 207
4.3 Further reading and open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
4.3.1 Further results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
4.3.2 #BIS-complete problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
4.3.3 Open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
4.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
4.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214

We already learned in Chapter 1 that #SAT is #P-complete. This chap-


ter provides a comprehensive list of #P-complete counting problems and in-
troduces the main proving techniques. We have to distinguish two different
ways to prove #P-completeness. The reason for the distinction is that one
of the proving techniques for #P-completeness actually proves more: it also
proves that the counting problem does not have an FPRAS unless RP =
NP. These proving techniques apply polynomial reductions reducing #SAT
to some counting problem #A that keeps the relative error. That is, if the

165
166 Computational complexity of counting and sampling

computation of problem instances from #A was only approximate, then still


separating the zero and non-zero number of solutions of a problem instance
from #SAT could be done with high probability. That is, it would provide a
BPP algorithm for SAT, which, as we already learned, would imply that RP
= NP. Other proving techniques apply polynomial reduction in which at least
one operation does not keep the relative error. This might be a subtraction
or modulo prime number calculation. Many (but definitely not all) of those
computational problems whose #P-completeness is proved in this way are in
FPRAS and in FPAUS. Below we detail these proving techniques.
1. Polynomial reductions that keep the relative error. These reductions are
also called approximation-preserving reductions.
(a) A polynomial reduction that for a CNF Φ, it constructs a problem
instance x in #A such that the number of solutions of x is exactly
the number of satisfying assignments of Φ. Such reduction actu-
ally also proves the NP-completeness of the corresponding decision
problem A. Thus, #A does not have an FPRAS unless RP = NP.
An example in this chapter is the proof of the #P-completeness of
the #3SAT. This type of reduction is also called a parsimonious
[155] reduction.
(b) A polynomial reduction that for a CNF Φ, it constructs a problem
instance x in some function problem A such that the answer for x
is a number which is k times the number of satisfying assignments
of Φ, where k can be computed in polynomial time. Typically, the
computational problem A is to compute the sum of weights of some
discrete mathematical objects, for example, the weights of cycle
covers in a directed graph. This happens to coincide with the per-
manent of the corresponding weighted adjacency graph, as shown
in this chapter. Such proof proves neither that deciding if the set
of the discrete mathematical objects is not empty is NP-complete
nor that finding the minimum or maximum weighted object is NP-
hard. However, an FPRAS algorithm for #A would prove that RP
= NP.
(c) A polynomial reduction that for a CNF Φ, it constructs a problem
instance x in #A such that the number of solutions of x is a + by,
where 0 ≤ a  b, y is the number of satisfying assignments of
Φ, and b is easy to compute.jSuchka reduction proves the #P-
completeness of #A. Indeed, a+by b is the number of satisfying
assignments of Φ. It is easy to see that an FPRAS for #A would
imply that RP = NP. Examples in this chapter are the proof of
#P-completeness of counting the most parsimonious substitution
histories on a binary tree and the proof of #P-completeness of
counting independent sets in a graph.
#P-complete counting problems 167

2. Polynomial reductions that do not keep the relative error. Such reduc-
tions do not exclude the possibility for an FPRAS approximation; on
the other hand, they do not guarantee the existence of efficient approx-
imations.
(a) Reductions using only one subtraction. For example, this is the way
to reduce #SAT to #DNF.
(b) Modulo prime number calculations. This is applied in reducing the
permanent computation of matrices of −1, 0, 1, 2 and 3 to matrices
containing only small non-negative integers.
(c) Polynomial interpolation. This reduction is based on the following
fact. If the value of a polynomial of order n is computed in n +
1 different points, then the coefficients of the polynomial can be
determined in polynomial time by solving a linear equation system.
In the polynomial reduction, for a problem instance a in the #P-
complete problem A, m + 1 problem instances bj from problem
B are constructed such that theP number of solutions for bj is the
m i
evaluation of some polynomial i=0 ci x at distinct points x =
xj . For some k, ck is the number of solutions of problem instance
a. This is a very powerful approach that is used in many #P-
completeness proofs of counting problems including counting the
not necessarily perfect matchings in planar graphs and counting
the subtrees of a graph.
(d) Reductions using other linear equation systems. An example in this
chapter is the reduction of the number of perfect matchings to the
number of Eulerian orientations of a graph.
Finally, a class of #P-complete computational problems has become a
center of interest recently. These problems are known to be equally hard, and
it is conjectured that they do not have an FPRAS approximation. We will
discuss them in Subsection 4.3.2.

4.1 Approximation-preserving #P-complete proofs


4.1.1 #3SAT
Definition 48. A 3CNF is a conjunctive normal form in which each clause
is a disjunction of exactly 3 literals. The 3SAT is the decision problem if a
3CNF can be satisfied. Accordingly, the #3SAT asks the number of satisfying
assignments of a 3CNF.
It is well known that 3SAT is NP-complete, and the usual reduction for
168 Computational complexity of counting and sampling

proving NP-completeness is to break large clauses into smaller ones introduc-


ing auxiliary variables. For example, the CNF
(x1 ∨ x2 ∨ x3 ∨ x4 ) ∧ (x2 ∨ x3 ∨ x4 ) (4.1)
can be rewritten into the 3CNF
(x1 ∨ x2 ∨ y) ∧ (x3 ∨ x4 ∨ y) ∧ (x2 ∨ x3 ∨ x4 ). (4.2)
It is easy to see that for any satisfying assignment of the CNF in Equa-
tion (4.1), there is a satisfying assignment of the 3CNF in Equation (4.2) and
vice versa. Indeed, if any of the literals satisfies the first clause in the CNF,
then y can be set such that the other clause in the 3CNF will also be satisfied.
Similarly, y on its own cannot satisfy the first two clauses, therefore another
literal must be TRUE, and then it satisfies the first clause in the CNF, too.
However, this reduction cannot keep the number of satisfying assignments.
For example, if both x1 and x3 are TRUE, then y might be arbitrary. On
the other hand, if x1 is FALSE and x2 is TRUE, then y must be TRUE to
satisfy the 3CNF in Equation (4.2). Therefore there is neither one-to-one nor
one-to-many correspondence between the two conjunctive normal forms. It is
easy to see that the CNF in Equation (4.1) has 13 satisfying assignments, and
on the other hand, the 3CNF in Equation (4.2) has 20 satisfying assignments.
Therefore, another reduction is needed if we also want to keep the num-
ber of satisfying assignments. Any CNF Φ can be described with a directed
acyclic graph, G~ = (V, E), see also Figure 4.1. In G,~ the input nodes are the
logical variables and there is one output node, O, representing the result of
the computation. Each internal node has two incoming edges and has exactly
one outgoing edge. The number of outgoing edges of the input node for the
logical variable xi equals the number of clauses in which xi or xi is a literal.
An edge crossed with a tilde (∼) means negation. Each internal node is eval-
uated according to its label (∨ or ∧), and the result of the computation is
propagated on the outgoing edge.
From G,~ we construct a 3CNF Φ0 such that Φ and Φ0 have the same num-
ber of satisfying assignments. First, we describe the computation on G ~ with a
3CNF, then we amalgamate it to get Φ0 such that only those assignments sat-
~ yielding a TRUE value propagated
isfy Φ0 that represent evaluations of Φ on G
to its output node O.
For each internal node, we can write a 3CNF of 3 logical variables. The
logical variables represent the two incoming edges and the one outgoing edge, and
exactly those 4 assignments satisfy the 3CNF that represent the computation performed
by the node. If the two incoming edges of a node performing a logical OR
operation are represented by x1 and x2, and the outgoing edge is represented
by y, then the 3CNF is
(x1 ∨ x2 ∨ ȳ) ∧ (x1 ∨ x̄2 ∨ y) ∧ (x̄1 ∨ x2 ∨ y) ∧ (x̄1 ∨ x̄2 ∨ y).   (4.3)
Similarly, the 3CNF for the node performing a logical AND operation is
(x1 ∨ x2 ∨ ȳ) ∧ (x1 ∨ x̄2 ∨ ȳ) ∧ (x̄1 ∨ x2 ∨ ȳ) ∧ (x̄1 ∨ x̄2 ∨ y).   (4.4)
FIGURE 4.1: The directed acyclic graph representation of the CNF
(x1 ∨ x2 ∨ x3 ∨ x4) ∧ (x̄2 ∨ x̄3 ∨ x̄4). Logical values are propagated on the
edges; an edge crossed with a tilde (∼) means negation. Each internal node
has two incoming edges and one outgoing edge. The operation performed by
a node might be a logical OR (∨) or a logical AND (∧). The outcome of the
operation is the input at the other end of the outgoing edge.

Similar 3CNFs can be constructed when one or both incoming values are
negated. The conjunction of the 3CNFs obtained for the internal nodes describes
the computation on the directed acyclic graph; namely, the satisfying
assignments represent the possible computations on the directed acyclic graph.
To get the satisfying assignments of the initial CNF Φ, we require that the
value propagated to O must be the logical TRUE value. We can represent this
with a 3CNF by adding two auxiliary logical variables, O′ and O″:

(O ∨ O′ ∨ O″) ∧ (O ∨ O′ ∨ Ō″) ∧ (O ∨ Ō′ ∨ O″) ∧ (O ∨ Ō′ ∨ Ō″) ∧
(Ō ∨ O′ ∨ O″) ∧ (Ō ∨ O′ ∨ Ō″) ∧ (Ō ∨ Ō′ ∨ O″).   (4.5)

It is easy to check that only O = O′ = O″ = TRUE satisfies the 3CNF in
Equation (4.5).
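These gadget properties are finite computations, so they can be checked by brute force. The sketch below assumes the same signed-integer clause encoding as in the earlier snippet and verifies that the OR and AND gadgets of Equations (4.3) and (4.4) are satisfied exactly by the assignments consistent with the corresponding gate, and that the 3CNF in Equation (4.5) has a unique satisfying assignment.

    # Sanity check of the reconstructed gadget formulas (4.3), (4.4) and (4.5).
    from itertools import product

    def satisfies(clauses, bits):
        # literal i > 0 means variable i is TRUE, i < 0 means it is FALSE
        return all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)

    or_gadget  = [[1, 2, -3], [1, -2, 3], [-1, 2, 3], [-1, -2, 3]]    # y <-> x1 OR x2
    and_gadget = [[1, 2, -3], [1, -2, -3], [-1, 2, -3], [-1, -2, 3]]  # y <-> x1 AND x2
    out_gadget = [[1, 2, 3], [1, 2, -3], [1, -2, 3], [1, -2, -3],
                  [-1, 2, 3], [-1, 2, -3], [-1, -2, 3]]               # Equation (4.5)

    assert all(satisfies(or_gadget,  b) == ((b[0] or b[1]) == b[2])
               for b in product([False, True], repeat=3))
    assert all(satisfies(and_gadget, b) == ((b[0] and b[1]) == b[2])
               for b in product([False, True], repeat=3))
    assert [b for b in product([False, True], repeat=3)
            if satisfies(out_gadget, b)] == [(True, True, True)]
    print("gadget checks passed")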

In this way, the CNF in Equation (4.1) can be rewritten as

(x1 ∨ x2 ∨ ȳ1) ∧ (x1 ∨ x̄2 ∨ y1) ∧ (x̄1 ∨ x2 ∨ y1) ∧ (x̄1 ∨ x̄2 ∨ y1) ∧
(y1 ∨ x3 ∨ ȳ2) ∧ (y1 ∨ x̄3 ∨ y2) ∧ (ȳ1 ∨ x3 ∨ y2) ∧ (ȳ1 ∨ x̄3 ∨ y2) ∧
(y2 ∨ x4 ∨ ȳ3) ∧ (y2 ∨ x̄4 ∨ y3) ∧ (ȳ2 ∨ x4 ∨ y3) ∧ (ȳ2 ∨ x̄4 ∨ y3) ∧
(x̄2 ∨ x̄3 ∨ ȳ4) ∧ (x̄2 ∨ x3 ∨ y4) ∧ (x2 ∨ x̄3 ∨ y4) ∧ (x2 ∨ x3 ∨ y4) ∧
(y4 ∨ x̄4 ∨ ȳ5) ∧ (y4 ∨ x4 ∨ y5) ∧ (ȳ4 ∨ x̄4 ∨ y5) ∧ (ȳ4 ∨ x4 ∨ y5) ∧
(y3 ∨ y5 ∨ Ō) ∧ (y3 ∨ ȳ5 ∨ Ō) ∧ (ȳ3 ∨ y5 ∨ Ō) ∧ (ȳ3 ∨ ȳ5 ∨ O) ∧
(O ∨ O′ ∨ O″) ∧ (O ∨ O′ ∨ Ō″) ∧ (O ∨ Ō′ ∨ O″) ∧ (O ∨ Ō′ ∨ Ō″) ∧
(Ō ∨ O′ ∨ O″) ∧ (Ō ∨ O′ ∨ Ō″) ∧ (Ō ∨ Ō′ ∨ O″),   (4.6)

where the internal nodes compute y1 = x1 ∨ x2, y2 = y1 ∨ x3, y3 = y2 ∨ x4,
y4 = x̄2 ∨ x̄3, y5 = y4 ∨ x̄4 and O = y3 ∧ y5.

If the CNF Φ has n variables and m logical operations ∨ and ∧, then Φ′ has
n + m + 2 logical variables and 4m + 7 clauses. Since Φ′ can clearly be
constructed in polynomial time, we get the following theorem:
Theorem 44. The counting problem #3SAT is #P-complete.

4.1.2 Calculating the permanent of an arbitrary matrix


Leslie Valiant proved in 1979 that calculating the permanent is #P-hard
[175]. Recall that the permanent of an n × n square matrix A is defined as

per(A) := Σ_{σ∈S_n} ∏_{i=1}^{n} a_{i,σ(i)},   (4.7)

where S_n is the set of permutations of length n. The permanent is similar
to the determinant, except that the products are not weighted by the signs of the
permutations. While the determinant can be calculated in polynomial time,
no polynomial running time algorithm is known for computing the permanent.
If #P is not part of FP, then no such algorithm exists.
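Equation (4.7) can of course be evaluated directly by summing over all n! permutations. The sketch below does exactly that; it is exponential in n, which is consistent with the discussion above.

    # Direct evaluation of Equation (4.7); usable only for small matrices.
    from itertools import permutations

    def permanent(a):
        """Permanent of a square matrix given as a list of rows."""
        n = len(a)
        total = 0
        for sigma in permutations(range(n)):
            prod = 1
            for i in range(n):
                prod *= a[i][sigma[i]]
            total += prod
        return total

    print(permanent([[1, 2], [3, 4]]))  # 1*4 + 2*3 = 10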
Below we introduce the work of Valiant on how to reduce #3SAT to
computing the permanent of a matrix containing only the values −1, 0, 1, 2 and
3. Any n × n square matrix A can be regarded as the adjacency matrix of
an edge-weighted, directed graph G⃗ with n vertices, where a_{i,j} is the weight
of the edge going from vi to vj. G⃗ might contain loops and antiparallel edges
with different weights. It is easy to see that any permutation corresponds to a cycle cover of
the vertices. A loop is considered as a cycle of length one covering its vertex.
We can define the weight of a cycle cover as the product of the edge weights
in it. Having said this, it is clear that the permanent of a matrix A is the sum
of the weights of the cycle covers in G⃗.
For any 3CNF Φ, let t(Φ) be twice the number of occurrences of the literals
in Φ minus the number of clauses in Φ. We are going to construct a directed
graph G⃗ such that 4^{t(Φ)} times the number of satisfying assignments of Φ is
the sum of the weights of the cycle covers in G⃗, that is, the permanent of the
corresponding weighted adjacency matrix. Furthermore, G⃗ can be constructed
from Φ in polynomial time. This construction proves that calculating the
permanent is #P-hard, since calculating t(Φ) as well as dividing an integer
by 4^{t(Φ)} are both easy.
The construction is such that each cycle cover corresponding to a satisfying
assignment has weight 4^{t(Φ)} and all other “spurious” cycle covers cancel each
other out. Let Φ = C1 ∧ C2 ∧ . . . ∧ Cm, where each Ci = (yi,1 ∨ yi,2 ∨ yi,3) with
yi,j ∈ {x1, x̄1, x2, x̄2, . . . , xn, x̄n}. The graph is built up using the following
gadgets:
(a) A track Tk for each variable xk ;
(b) an interchange Ri for each clause Ci ;
(c) for each literal yi,j such that yi,j is either xk or x̄k, a junction Ji,k at
which Ri and Tk meet. Interchanges also have internal junctions with
the same structure.
Each junction is a 4-vertex, weighted, directed graph with the following
weighted adjacency matrix X:

          ⎛ 0   1  −1  −1 ⎞
  X  :=   ⎜ 1  −1   1   1 ⎟ .   (4.8)
          ⎜ 0   1   1   2 ⎟
          ⎝ 0   1   3   0 ⎠

Each junction has external connections via its 1st and 4th vertices, and not
via the other two vertices. Let X[γ; δ] denote the submatrix obtained from X
by deleting the rows in γ and the columns in δ. The following properties are easy to verify:
(a) per(X) = 0,
(b) per(X[1; 1]) = 0,
(c) per(X[4; 4]) = 0,
(d) per(X[1, 4; 1, 4]) = 0,
(e) per(X[1; 4]) = per(X[4; 1]) = non-zero constant (= 4).


Only the junctions have edges with weight other than 1. All other edges in
the construction have weight 1.
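Properties (a)–(e) are finite computations, so they can be verified with the brute-force permanent sketched earlier; the snippet below (a verification sketch, not part of the original construction) checks all five.

    # Verify properties (a)-(e) of the junction matrix X by brute force.
    from itertools import permutations

    def permanent(a):
        n = len(a)
        total = 0
        for sigma in permutations(range(n)):
            p = 1
            for i in range(n):
                p *= a[i][sigma[i]]
            total += p
        return total

    X = [[0, 1, -1, -1],
         [1, -1, 1, 1],
         [0, 1, 1, 2],
         [0, 1, 3, 0]]

    def minor(a, rows, cols):
        # submatrix of a with the given (0-based) rows and columns deleted
        return [[a[i][j] for j in range(len(a)) if j not in cols]
                for i in range(len(a)) if i not in rows]

    print(permanent(X),                        # 0, property (a)
          permanent(minor(X, {0}, {0})),       # 0, property (b)
          permanent(minor(X, {3}, {3})),       # 0, property (c)
          permanent(minor(X, {0, 3}, {0, 3})), # 0, property (d)
          permanent(minor(X, {0}, {3})),       # 4, property (e)
          permanent(minor(X, {3}, {0})))       # 4, property (e)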
A track Tk consists of 2 vertices, vk,1 and vk,2, and rk junctions, where rk is
the number of clauses in which xk or x̄k appears as a literal. There is an edge
going from vk,1 to vk,2, and there are two paths from vk,2 to vk,1, picking up the
first and last edges of the junctions. One of the paths picks up the junctions
of the clauses in which xk is a literal and the other path picks up the junctions of
the clauses in which x̄k is a literal. An example is shown in Figure 4.2.
FIGURE 4.2: The track T5 for the variable x5 when x5 is a literal in C2 and
C5, and x̄5 is a literal in C3.

Interchange Ri contains two vertices, wi,1 and wi,2, as well as 3 junctions for the
3 literals and 2 internal junctions. They are wired as shown in Figure 4.3.
The interchanges do not distinguish literals xi and x̄i; the edges connecting
the junctions are always the same.
We define a route in G⃗ as a set of cycle covers that contain the same edges
outside the junctions. A route is good if every junction and internal junction is
entered exactly once and left exactly once at the opposite end. A route might
not be good for several reasons:
1. some junction and/or internal junction is not entered and left, or
2. it is entered and left on the same end, or
3. it is entered and left twice.
Due to the properties of the permanents of the submatrices of X, any route
which is not good contributes 0 to the permanent. Indeed, if a junction is
not entered and left in a route, then the sum of the weights of the cycle
covers in that route will be 0 due to property (a). Similarly, if a junction is
entered and left on the same end or it is entered and left twice, the sum of the
weights of cycle covers will be 0 due to properties (b)–(d). On the other hand,
property (e) ensures that any good route contributes 4^{t(Φ)} to the permanent:
the number of junctions and internal junctions is indeed t(Φ).
We have to show that the good routes are in a bijection with the satisfying
assignments of Φ. Observe the following:
1. Any good route in any track Tk either “picks up” the junctions corresponding
to the literal xk or picks up the junctions corresponding to the
literal x̄k. Assign the logical TRUE value to xk in the first case and the
logical FALSE value in the second case.
FIGURE 4.3: The interchange R3 for the clause C3 = (x1 ∨ x̄5 ∨ x8). Note
that interchanges do not distinguish literals xi and x̄i. Each edge in and above
the line of the junctions goes from left to right, and each edge below the line of
the junctions goes from right to left. Junctions without labels are the internal
junctions.

2. The interchanges are designed so that a good route can pick up the two
internal junctions and any subset of the junctions except all of them.
Furthermore, it can pick up these subsets in exactly one way. Therefore,
in any good route, each interchange picks up the junctions corresponding
to the literals that do not satisfy the clause corresponding to the interchange.
Since it cannot pick up all the literals, some of them will satisfy
the clause. This is true for each interchange; namely, the assignment we
defined based on the good route is a satisfying assignment.
3. It is clear that different good routes define different satisfying assignments,
thus, the mapping we defined is an injection. On the other hand,
each satisfying assignment defines a good route, and this is just the inverse
of the mapping we defined.
Since there is a bijection between good routes and satisfying assignments,
each good route contributes 4^{t(Φ)} to the permanent, and any route that is
not good contributes 0 to the permanent, we can conclude that the permanent
of the adjacency matrix of G⃗ is indeed 4^{t(Φ)} times the number of satisfying
assignments. This proves that calculating the permanent of an arbitrary matrix
is #P-hard. Actually, finding the permanent of a matrix remains #P-hard
if we restrict the cases to matrices containing only the values −1, 0, 1, 2 and
3. Restricting the values to non-negative numbers is particularly tricky. The
proof that finding the permanent remains #P-hard for non-negative entries
is based on a polynomial reduction that contains some computational steps
that do not preserve the relative error. Later we will see that a polynomial
reduction containing only computational steps that preserve the relative error
does not exist unless RP = NP.

4.1.3 Counting the most parsimonious substitution histories on an evolutionary tree
In this subsection, we are going to introduce a #P-completeness proof
based on the work of Miklós, Tannier and Kiss [135] and Miklós and Smith
[133]. The small parsimony problem is defined in the following way.
Problem 7.
Name: SP-Tree.
Input: a finite alphabet Γ, a rooted binary tree T = (V, E), where L ⊂ V
denotes the leaves of the tree, and f : L → Γ^k, a function assigning a sequence
of length k to each leaf of the tree.
Output: a function g : V → Γ^k such that g(v) = f(v) for all v ∈ L, and the
score

Σ_{(u,v)∈E} H(g(u), g(v))   (4.9)

is minimized, where H denotes the Hamming distance of the sequences, that
is, the number of positions in which the two sequences differ.
We learned in Chapter 2 that the SP-Tree problem is an easy optimization
problem, and the number of functions g minimizing the score in Equation (4.9)
can be found in polynomial time. However, the counting problem
becomes hard if we would like to obtain the number of most parsimonious
scenarios instead of labelings of the internal nodes. There are H(g(u), g(v))!
ways to transform the sequence labeling vertex u into the sequence
labeling vertex v using H(g(u), g(v)) substitutions (a substitution changes one
character in a sequence). Therefore, if G denotes the set of functions minimizing
the score in Equation (4.9), then we would like to compute

Σ_{g∈G} ∏_{(u,v)∈E} H(g(u), g(v))!.   (4.10)
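For a single fixed labeling g, the inner product of factorials in Equation (4.10) is straightforward to evaluate. The sketch below illustrates this on a hypothetical toy tree; the encoding of trees and labelings is our own choice, not taken from the text.

    # Evaluate the product of factorials in Equation (4.10) for one labeling g.
    from math import factorial

    def hamming(u, v):
        return sum(a != b for a, b in zip(u, v))

    def scenarios_of_labeling(edges, g):
        """edges: list of (parent, child) pairs; g: dict vertex -> sequence."""
        count = 1
        for u, v in edges:
            count *= factorial(hamming(g[u], g[v]))
        return count

    # Toy example: a root r with two leaves u and v.
    g = {"r": "0000", "u": "0011", "v": "1100"}
    print(scenarios_of_labeling([("r", "u"), ("r", "v")], g))  # 2! * 2! = 4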

We will denote this counting problem by #SPS-Tree (small parsimony scenarios
on trees). We are going to show that computing this number is #P-complete
even if the problem is restricted to an alphabet of size 2. First we need the
following lemma.
Lemma 7. For any 3CNF Φ, there exists a 3CNF Φ′ such that the following
hold:
1. Φ′ can be constructed in polynomial time; in particular, the size of Φ′ is
a polynomial function of the size of Φ.
2. Φ and Φ′ have the same number of satisfying assignments.
3. Φ′ contains an even number of variables, and in any satisfying assignment
of Φ′, exactly half of the variables have the logical value TRUE.
Proof. Let x1, x2, . . . , xn be the variables in Φ. Φ′ contains the variables
x1, x2, . . . , xn, y1, y2, . . . , yn, and is defined as

Φ′ := Φ ∧ ⋀_{i=1}^{n} ((xi ∨ yi ∨ xi+1) ∧ (x̄i ∨ ȳi ∨ xi+1) ∧
(xi ∨ yi ∨ x̄i+1) ∧ (x̄i ∨ ȳi ∨ x̄i+1)),   (4.11)

where the index of xi+1 is taken modulo n, that is, xn+1 is defined as x1.
The assignment of the xi variables in any satisfying assignment of Φ′ also
satisfies Φ; therefore, it is sufficient to show that any satisfying assignment
of Φ can be extended to exactly one satisfying assignment of Φ′. But this is
obvious: the conjunctive form in Equation (4.11) forces yi to take the value
of x̄i. This also ensures that in any satisfying assignment of Φ′, exactly
half of the variables take the value TRUE.
To prove that computing the quantity in Equation (4.10) is #P-complete,
we give a polynomial running time algorithm which, for any 3CNF formula Φ,
constructs a problem instance p of #SPS-Tree with the following property:
the number of solutions of p can be written as a + by, where y is the number
of satisfying assignments of Φ, b is an easy-to-calculate positive integer, and
0 < a ≪ b. Thus, if s is the number of solutions of p, then ⌊s/b⌋ is the number
of satisfying assignments of Φ.
Let Φ be a 3CNF, and let Φ′ be the 3CNF that has as many satisfying
assignments as Φ and in whose satisfying assignments the numbers of TRUE and
FALSE values are the same. Let n denote the number of logical variables in
Φ′ and let k denote the number of clauses in Φ′. We are going to construct a
tree denoted by TΦ′, and to label its leaves with sequences over the alphabet
{0, 1}. The first n characters of each sequence correspond to the logical variables
xi, and there are further, auxiliary characters. The number of auxiliary
characters is 148k(⌈(k log(n!) + n log(2))/log(2^20/3^12)⌉ + 1). The construction
is such that there will be 2^n most parsimonious labelings of the internal nodes,
one for each possible logical assignment. Each labeling is such that the labeling
of the root completely determines the labelings at the other internal nodes.
The corresponding assignment is such that the value of the logical variable
xi is TRUE if there is a character 1 in the sequence at the root of the tree
in position i. The characters in the auxiliary positions are 0 in all the most
parsimonious labelings.
If an assignment is a satisfying assignment, then the corresponding labeling
has many more scenarios than the labelings corresponding to non-satisfying
assignments. Furthermore, for each satisfying assignment, the corresponding
labelings have the same, easy-to-compute number of scenarios.

For each clause cj, we construct a subtree Tcj. The construction is done
in three phases, illustrated in Figure 4.4. First, we create a constant-size subtree,
called the unit subtree, using building blocks we call elementary subtrees.
Then, in the blowing-up phase, this unit subtree is repeated several times, and
in the third phase it is amended with another constant-size subtree. The reason
for this construction is the following: the unit subtree is constructed in
such a way that if a clause is satisfied, the number of scenarios on this subtree
is large, and it is always the same number, not depending on how many literals
provide satisfaction of the clause. When the clause is not satisfied, the number
of scenarios is a smaller number. The blowing up is necessary for sufficiently
separating the numbers of solutions for satisfying and non-satisfying assignments.
Finally, the amending is necessary for achieving 2^n most parsimonious
labelings on each Tcj and for guaranteeing that the number of most parsimonious
scenarios is the same for each satisfying assignment. The amending is slightly
different for those clauses that come from Φ and those that are in Φ′ \ Φ.
We detail the construction of the subtree for the clause cj = x1 ∨ x2 ∨ x3,
denoted by Tcj. Subtrees for the other kinds of clauses are constructed similarly.
The unit subtree is built from 76 smaller subtrees that we will call
elementary subtrees. On each elementary subtree, the sequences labeling the
leaves contain 0 at almost all positions, except the positions corresponding
to the literals of the clause and the positions of the possible auxiliary characters.
Only 14 different types of elementary subtrees are in a unit subtree, but
several of them have given multiplicities, and the total count of them is 76, see
also Table 4.1. Some of the elementary subtrees are cherry motifs (two leaves
connected via an internal node, see also Fig. 4.5.a)) for which we arbitrarily
identify a left and a right leaf. For some of these cherries, we introduce one or
more auxiliary characters, which are 1 only on the indicated leaf of the cherry
and 0 everywhere else in the tree. So the edges connecting these leaves to the
rest of the entire tree TΦ′ will contain one or more additional substitutions in
all the most parsimonious solutions.
The constructed unit subtree will be such that if the clause is not satisfied,
the number of possible most parsimonious scenarios for the corresponding
labeling on this unit subtree is 2^136 × 3^76, and if the clause is satisfied, then
the number of possible most parsimonious scenarios for each corresponding
labeling is 2^156 × 3^64. The ratio of the two numbers is 2^20/3^12 > 1. We will
denote this number by γ.
Below we detail the construction of the elementary subtrees and also give
the numbers of most parsimonious scenarios on them, since the number of scenarios
on the unit subtree is simply the product of these numbers. This part is
quite technical; however, the careful reader might observe the following. The
number of scenarios for a fixed labeling on a unit tree is the product of the
numbers of scenarios on the elementary trees. These numbers are always of the form
2^x 3^y, and we need a (linear) combination of unit trees such that the sum
of the exponents both on 2 and on 3 is the same for all satisfying assignments
and different for the non-satisfying assignment; furthermore, the number of
FIGURE 4.4: Constructing a subtree Tcj for a clause cj. The subtree is
built in three phases. First, elementary subtrees are connected with a comb
to get a unit subtree. In the second phase the same unit subtree is repeated
several times, “blowing up” the tree. In the third phase, the blown-up tree
is amended with a constant-size, depth-3, fully balanced tree. The smaller
subtrees constructed in the previous phase are denoted by a triangle in the
next phase. See also the text for details.
solutions for the non-satisfying assignment must be smaller than for any of the
satisfying assignments. Such combinations can be found by some linear algebraic
considerations not presented here; below we just show one possible solution.
For each elementary tree, we give the characters at the positions of the three
literals. The elementary trees which are cherries are the following:
• There are four cherries on which the left leaf contains 1 in an extra position,
and the characters in the positions of the three literals on the left
and right leaves are given by

011, 100
101, 010
110, 001
000, 111.

The first column shows the characters in the positions corresponding
to the literals on the left leaf, while the second column shows those
characters on the right leaf. Observe that there are 16 most parsimonious
labelings of the root of the cherry motif and each needs 4 substitutions.
However, 8 of them contain a character 1 in the auxiliary position, which
will not be a most parsimonious labeling on TΦ′, as we discussed. So we
have to consider only the other 8 labelings, where the characters are all
0 except at the positions corresponding to the literals.
The number of scenarios on one cherry is 24 if the sequence at the root
of the cherry is the same as on the right leaf. Indeed, in that case, 4
substitutions are necessary on the left edge, and they can be performed
in any order. If the numbers of substitutions are 3 and 1, respectively,
on the left and right edges, or vice versa, the number of solutions is
6. Finally, if both edges have 2 substitutions, then the number of
solutions is 4.

• There is one cherry motif without any extra auxiliary character, and the
characters in the positions corresponding to the literals are

000, 111.

There are 8 most parsimonious labelings at the root, and each needs 3
substitutions. If the labeling at the root corresponds to a non-satisfying
assignment, the number of scenarios on this cherry is 6; if all logical
values are TRUE, the number of scenarios is still 6; in any other case, the
number of scenarios is 2.
This elementary subtree is repeated 3 times.
• Finally, there are 3 types of cherry motifs with a character 1 at one
auxiliary position on each of the two leaves. These are two different
auxiliary positions, so both leaves need one extra substitution on their
incoming edges. The characters in the positions corresponding to the literals are

011, 100
101, 010
110, 001.

There are 8 possible labelings of the root which are most parsimonious in
TΦ′, and each needs 5 substitutions. If all the substitutions at the positions
corresponding to the 3 literals fall onto one edge, then the number of
scenarios is 24; otherwise the number of solutions is 12.
Each of these elementary subtrees is repeated 15 times.
The remaining elementary subtrees contain 3 cherry motifs connected with
a comb, that is, a completely unbalanced tree; see also Figure 4.5. For the
cherry at the right end of such an elementary subtree, there are one or more
auxiliary positions that have character 1 at one of the leaves and 0 everywhere
else in TΦ′.
There are 3 elementary subtrees of this type which have only one auxiliary
position. On these trees, the sequence at the right leaf of the rightmost cherry
is all 0, and the sequence at the left leaf of the rightmost cherry motif is
all 0 except at the auxiliary position and exactly 2 positions amongst the 3
positions corresponding to the literals.
The remaining leaves of these elementary subtrees are constructed in such a
way that there are 8 most parsimonious labelings, each needing 7 substitutions;
see the example in Figure 4.5. The number of substitutions is 0 or 1 at each
edge except the two edges of the rightmost cherry motif. Here the numbers of
substitutions might be 3 and 0, 2 and 1, or 1 and 2, yielding 6 or 2 scenarios;
see also Table 4.1.
Each of these elementary subtrees is repeated 3 times.
Finally, there are 3 elementary subtrees of this type which have one auxiliary
position for the left leaf of the rightmost cherry motif, and there are
2 auxiliary positions for the right leaf of the rightmost cherry motif. The
sequence at the right leaf of the rightmost cherry is all 0 except at the 2 auxiliary
positions, and the sequence at the left leaf of the rightmost cherry motif is
all 0 except at the auxiliary positions and exactly 2 positions amongst the 3
positions corresponding to the literals.
The remaining leaves of these elementary subtrees are constructed in such a
way that there are 8 most parsimonious labelings, each needing 9 substitutions;
see the example in Figure 4.5. The number of substitutions is 0 or 1 on each
edge except the two edges of the rightmost cherry motif. Here the numbers of
substitutions might be 1 and 4, 2 and 3, or 3 and 2, yielding 24 or 12 scenarios;
see also Table 4.1.
Each of these elementary subtrees is repeated 5 times.
FIGURE 4.5: a) A cherry motif, i.e., two leaves connected with an internal
node. b) A comb, i.e., a fully unbalanced tree. c) A tree with 3 cherry motifs
connected with a comb. The assignments for 4 positions, α1, α2, α3 and
αx, are shown at the bottom for each leaf. αi, i = 1, 2, 3 are the positions
related to the logical variables, and αx is an extra auxiliary position. Note that
Fitch's algorithm gives ambiguity at all positions αi at the root of this
subtree.

       011   101   110   000   011  101  110  000   011   101   110    011    101    110
 #      1     1     1     1     3    3    3    3     5     5     5      15     15     15
000     6     6     6     6    6^3  6^3  6^3  6^3   12^5  12^5  12^5   12^15  12^15  12^15
100    24     4     4     4    6^3  2^3  2^3  2^3   12^5  12^5  12^5   24^15  12^15  12^15
010     4    24     4     4    2^3  6^3  2^3  2^3   12^5  12^5  12^5   12^15  24^15  12^15
110     6     6     6     6    2^3  2^3  2^3  2^3   12^5  12^5  24^5   12^15  12^15  24^15
001     4     4    24     4    2^3  2^3  6^3  2^3   12^5  12^5  12^5   12^15  12^15  24^15
101     6     6     6     6    2^3  2^3  2^3  2^3   12^5  24^5  12^5   12^15  24^15  12^15
011     6     6     6     6    2^3  2^3  2^3  2^3   24^5  12^5  12^5   24^15  12^15  12^15
111     4     4     4    24    2^3  2^3  2^3  6^3   24^5  24^5  24^5   12^15  12^15  12^15

TABLE 4.1: The numbers of scenarios on the different elementary subtrees of the
unit subtree of the subtree Tcj for the clause cj = x1 ∨ x2 ∨ x3. Columns represent
the 14 different elementary subtrees; in the original figure, the topology of each
elementary subtree is indicated at the top, with black dots marking extra substitutions
on the indicated edges due to the characters in the auxiliary positions. The 0-1
triples in the header give the characters at the literal positions on the left leaf of the
particular cherry motif; see the text for details. The row starting with # indicates
the number of repeats of the elementary subtree. Further rows represent the logical
TRUE/FALSE values of the literals; for example, 001 means x1 = FALSE, x2 = FALSE,
x3 = TRUE. The values in the table indicate the numbers of scenarios, raised to the
appropriate power due to the multiplicity of the elementary subtrees. It is easy
to check that the product of the numbers in the first row is 2^136 × 3^76 and in
any other row is 2^156 × 3^64.
In this way, the roots of all 76 elementary subtrees have 8 most parsimonious
labelings corresponding to the 8 possible assignments of the literals in
the clause. We connect the 76 elementary subtrees with a comb, and thus
there are still 8 most parsimonious labelings at the root of the entire subtree,
which is the unit subtree. If the labeling at the root corresponds to a satisfying
assignment of the clause, the number of scenarios is 2^156 × 3^64; if the clause
is not satisfied, the number of scenarios is 2^136 × 3^76, as can be checked in
Table 4.1. The ratio of them is indeed 2^20/3^12 = γ. The number of leaves of
this unit subtree is 248, and 148 auxiliary positions are introduced.
This was the construction of the constant-size unit subtree. In the next
step, we “blow up” the system. A similar blowing up can be found in the seminal
paper by Jerrum, Valiant and Vazirani [103], in the proof of Theorem 5.1. We
repeat the above described unit subtree ⌈(k log(n!) + n log(2))/log(γ)⌉ + 1
times, and connect all of them with a comb (completely unbalanced tree). It
is easy to see that there are still 8 most parsimonious labelings. For a solution
satisfying the clause, the number of scenarios on this blown-up subtree is

X = (2^156 × 3^64)^{⌈(k log(n!)+n log(2))/log(γ)⌉+1},   (4.12)

and the number of scenarios if the clause is not satisfied is

Y = (2^136 × 3^76)^{⌈(k log(n!)+n log(2))/log(γ)⌉+1}.   (4.13)

It is easy to see that

X/Y = γ^{⌈(k log(n!)+n log(2))/log(γ)⌉+1} ≥ γ^{(log(n!^k)+log(2^n))/log(γ)} = n!^k 2^n,   (4.14)

and

X/Y ≤ γ^{(log(n!^k)+log(2^n))/log(γ)+2} = n!^k 2^n γ^2,   (4.15)

since for any positive number x,

x^{1/log(x)} = e.   (4.16)

Let the characters at all positions not corresponding to the clause be 0 on this
blown-up subtree.
We are close to the final subtree Tcj for one clause, cj. In the third phase,
we amend the so-far obtained tree with a constant-size subtree. The amending
is slightly different for clauses coming from Φ and for those that are in Φ′ \ Φ.
We detail the amending for both cases.
If the clause contains only x-type logical variables, say, the clause is x1 ∨ x2 ∨ x3,
then construct two copies of a fully balanced depth-6 binary tree, on which
the root has 64 most parsimonious labelings corresponding to the 64 possible
assignments of the literals participating in the clause and their corresponding
logical variables of the y type (namely, y1, y2 and y3). This can be done with
a construction similar to the left part of the tree in Figure 4.5.c).
In one of the copies, all other characters corresponding to logical variables
not participating in the clause are 1 on all leaves, and thus in each most
parsimonious labeling of the root. In the other copy, those characters should
be all 0.
In the copy where all other characters are 0, the construction should be
done in such a way that, going from the root of the tree, first the y logical
variables are separated, then the x ones. Namely, the characters at the position
corresponding to y1 should be the same (say, 0) on each leaf of the left subtree
of the root and should be the other value on each leaf of the right subtree of
the root. Similarly, for each of the four grandchildren of the root, the leaves
must take the same value at the position corresponding to y2, and these values
must be different for the siblings. The same rule must be applied for the
great-grandchildren of the root. There is an internal node of this subtree such
that on all of its leaves, each character at each position corresponding to the y
variables is 0. Replace the subtree at this position with the blown-up subtree.
Connect the two copies with a common root. The tree so obtained is Tcj.
Observe that there are 2^n possible most parsimonious labelings of Tcj. We
have the following lemma on them.
Lemma 8. For any most parsimonious labeling, if Φ′, and thus, particularly,
the clause cj is satisfied, then the number of scenarios on Tcj is

X × (((n − 6)/2)!)^2 ≥ Y × (n!)^k × 2^n × (((n − 6)/2)!)^2.   (4.17)

If the clause cj is not satisfied, then the number of scenarios is at most
Y × (n − 6)!. If the clause cj is satisfied but Φ′ is not satisfied, then
the number of scenarios is at most X × (n − 6)!.
Proof. There are 3 logical x variables in the clause and there are 3 corresponding
y variables. For the remaining n − 6 variables, there are n − 6 substitutions
on the two edges of the root. If Φ′ is satisfied, then for each i, exactly one in the
couple (xi, yi) has the TRUE value and the other has the FALSE value. Therefore,
there are (n − 6)/2 substitutions on both edges of the root. On all remaining
edges of the amending, there is either 0 or 1 substitution. Finally, the number
of scenarios on the blown-up tree is X. Therefore, the number of scenarios
is indeed X × (((n − 6)/2)!)^2 if Φ′ is satisfied. The inequality in Equation (4.17)
comes from Equation (4.14).
If the clause is not satisfied, then the number of scenarios on the blown-up
tree is Y. The substitutions on the two edges of the root might be arbitrarily
distributed; however, in any case, the number of scenarios is at most (n − 6)!.
This extreme is attained when all the substitutions fall onto the same edge.
If cj is satisfied but Φ′ is not, then the number of scenarios on the
blown-up tree is X, and the number of scenarios on the two edges of the root
is at most (n − 6)!.
If the clause is of the form xi ∨ yi ∨ xi+1 (some of the literals might be
negated), then the amending is the following. Construct two copies of a fully
balanced depth-4 binary tree, on which the root has 16 most parsimonious
labelings corresponding to the 16 possible assignments of the logical variables xi,
yi, xi+1 and yi+1. On one of the copies, all other characters are 0, while
on the other copy, all other characters must be 1. On the copy where all
other characters are 0, the construction should be such that there must be
an internal node such that at all of its leaves, the characters at the position
corresponding to yi+1 are 0 and the subtree has depth 3. Replace this subtree
with the blown-up tree. Connect the two copies with a common root. This is
the final Tcj tree.
For this tree, a lemma similar to Lemma 8 can be proved.
Lemma 9. For any most parsimonious labeling, if Φ′, and thus, particularly,
the clause cj is satisfied, then the number of scenarios on Tcj is

X × (((n − 4)/2)!)^2 ≥ Y × (n!)^k × 2^n × (((n − 4)/2)!)^2.   (4.18)

If the clause cj is not satisfied, then the number of scenarios is at most
Y × (n − 4)!. If the clause cj is satisfied but Φ′ is not satisfied, then
the number of scenarios is at most X × (n − 4)!.

Proof. The proof is similar to the proof of Lemma 8, just now there are n − 4
substitutions that must be distributed on the two edges of the root.
For all k clauses, construct such a subtree and connect all of them with a
comb. This is the final tree TΦ′ for the 3CNF Φ′. It is easy to see that TΦ′ has
2^n most parsimonious labelings corresponding to the 2^n possible assignments
of the logical variables. For these labelings, we have the following theorem.
Theorem 45. If a labeling corresponds to a satisfying assignment, then the
number of scenarios is

X^k × (((n − 4)/2)!)^{2n} × (((n − 6)/2)!)^{k−2n} ≥
Y^k × (n!^k × 2^n)^k × (((n − 4)/2)!)^{2n} × (((n − 6)/2)!)^{k−2n}.   (4.19)

If a labeling corresponds to a non-satisfying assignment, then the number of
scenarios is at most

X^{k−1} × Y × ((n − 4)!)^{2n} × ((n − 6)!)^{k−2n} ≤
Y^k × (n!^k × 2^n × γ^2)^{k−1} × ((n − 4)!)^{2n} × ((n − 6)!)^{k−2n}.   (4.20)

Particularly, the total number of scenarios corresponding to non-satisfying
assignments is at most

Y^k × (n!^k × γ^2)^{k−1} × (2^n)^k × ((n − 4)!)^{2n} × ((n − 6)!)^{k−2n} ≤
Y^k × (n!^k × 2^n)^k × (((n − 4)/2)!)^{2n} × (((n − 6)/2)!)^{k−2n}.   (4.21)
2 2

Proof. If Φ′ contains n logical variables, then there are 2n clauses in Φ′ \ Φ
and k − 2n clauses in Φ. Based on this, the number of scenarios for any
labeling corresponding to a satisfying assignment can be easily calculated
from Lemmas 8 and 9.
If Φ′ is not satisfied, then at least one of the clauses is not satisfied, causing
a smaller number of scenarios on the corresponding subtree. However, the
number of scenarios on other subtrees corresponding to other clauses might
be higher due to the uneven distribution of the substitutions falling onto
the two edges of the roots of the subtrees. The upper bounds are based on
Equation (4.15), considering that γ = 2^20/3^12 < 2 and n ≥ 6.
What follows is that

⌊ s / (X^k × (((n − 4)/2)!)^{2n} × (((n − 6)/2)!)^{k−2n}) ⌋   (4.22)

is the number of satisfying assignments of Φ, where s is the number of most
parsimonious scenarios on TΦ′. Since both the size of the tree TΦ′ and the
length of the sequences labeling the leaves of TΦ′ are polynomial functions of
the size of Φ, and furthermore, TΦ′ together with the sequences labeling its leaves
can be constructed in polynomial time, we get the following theorem.
Theorem 46. The counting problem #SPS-Tree is #P-complete.

4.1.4 #IS and #Mon-2SAT


Martin Dyer, Leslie Ann Goldberg, Catherine Greenhill and Mark Jerrum
proved that counting the number of independent sets is #P-complete and
that it does not have an FPRAS unless RP = NP [57]. More surprisingly, the same
holds for counting the satisfying assignments of monotone 2CNFs, although
deciding whether a monotone 2CNF has a satisfying assignment is trivial. A CNF
is monotone if it does not contain any negated literal. Indeed, any monotone
2CNF is satisfiable.
First, we define the independent set problem.
Problem 8.
Name: #IS.
Input: a simple graph G = (V, E).
Output: the number of independent sets of G, that is, the number of subsets
V′ ⊆ V such that for all v1, v2 ∈ V′, (v1, v2) ∉ E.
The inapproximability is proved by reducing the problem of large independent
sets to it. We learned that finding a large independent set is NP-complete;
see Theorem 4. The reduction is the following. Let m be a positive integer,
and let G = (V, E) be a graph in which any independent set has size at
most m. We construct the following graph G′ = (V′, E′). The vertices are
V′ = V × {1, 2, . . . , n + 2} and the edges are E′ = {((v1, r1), (v2, r2)) | (v1, v2) ∈
E and r1, r2 ∈ {1, 2, . . . , n + 2}}, where n = |V|. Informally, each vertex v ∈ V is
replaced with a set of n + 2 vertices, and each edge (v1, v2) ∈ E is replaced
with the complete bipartite graph K_{n+2,n+2}. There is a natural mapping of
independent sets I′ ⊆ V′ to the independent sets I ⊆ V. Indeed, if I′ is an
independent set, then

I = ϕ(I′) := {v | ∃r (v, r) ∈ I′}   (4.23)

is also an independent set. It is clear that ϕ is a surjection and the inverse
image of any independent set I ⊆ V has size (2^{n+2} − 1)^k, where k = |I|. We
get that the number of independent sets in G′ is

(2^{n+2} − 1)^m |I_m(G)| + b,   (4.24)

where I_m(G) is the set of independent sets in G of size m, and b is the number
of independent sets in G′ whose inverse image has size at most m − 1. Since
there are at most 2^n independent sets in G, we get that

b ≤ (2^{n+2} − 1)^{m−1} 2^n.   (4.25)

Therefore

⌊ |I(G′)| / (2^{n+2} − 1)^m ⌋ = ⌊ ((2^{n+2} − 1)^m |I_m(G)| + b) / (2^{n+2} − 1)^m ⌋ = |I_m(G)|,   (4.26)

where I(G′) is the set of independent sets of G′. It follows that if we could
compute the number of independent sets in a graph, we could also compute
the number of large independent sets in a graph. In particular, we could decide
if there is an independent set of size m, which is an NP-complete decision
problem. Even if we could only approximate the number of independent sets with
an FPRAS, with ε = 1/2 and δ = 2/3, we would have a BPP algorithm
for deciding if there is an independent set of size m. This would imply that
RP = NP.
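The reduction can be checked end to end on a tiny example. The sketch below builds G′ for the single-edge graph (n = 2, m = 1) following the replication construction above, and recovers |I_1(G)| = 2 via Equation (4.26).

    # Brute-force check of the blow-up reduction on the single-edge graph.
    from itertools import combinations

    def independent_sets(vertices, edges):
        edge_set = {frozenset(e) for e in edges}
        count = 0
        for r in range(len(vertices) + 1):
            for s in combinations(vertices, r):
                if all(frozenset(p) not in edge_set for p in combinations(s, 2)):
                    count += 1
        return count

    n, m = 2, 1
    V = [1, 2]
    E = [(1, 2)]
    # Blow-up: each vertex becomes n + 2 copies, each edge a complete bipartite graph.
    V2 = [(v, r) for v in V for r in range(n + 2)]
    E2 = [((u, r1), (w, r2)) for (u, w) in E
          for r1 in range(n + 2) for r2 in range(n + 2)]

    total = independent_sets(V2, E2)
    print(total, total // (2 ** (n + 2) - 1) ** m)  # prints: 31 2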
Dyer and his colleagues proved that #LargeIS is #P-complete by
finding a parsimonious reduction from #SAT to #LargeIS [57]. Thus #IS
is not only NP-hard, but also #P-complete.
To prove that there is no FPRAS for counting the satisfying assignments
of a monotone 2CNF, we find a parsimonious reduction from #IS to #Mon-2SAT.
Let G = (V, E) be a graph. Define

Φ := ⋀_{(vi,vj)∈E} (x̄i ∨ x̄j).   (4.27)

Clearly, Φ is a monotone 2CNF in the variables yi := x̄i, and there is a bijection
between its satisfying assignments and the independent sets of G. The
bijection is the following: if an assignment satisfies Φ,
then I := {vi | xi = TRUE} is an independent set.
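The bijection is easy to test by enumeration. The sketch below, for the path graph on three vertices, counts the satisfying assignments of the monotone 2CNF (y1 ∨ y2) ∧ (y2 ∨ y3) (with yi standing for x̄i) and the independent sets of the graph, and finds the same number.

    # Check the #IS-to-#Mon-2SAT bijection on the path graph 1-2-3.
    from itertools import product, combinations

    edges = [(1, 2), (2, 3)]

    # Satisfying assignments of (y1 OR y2) AND (y2 OR y3).
    sat = sum(all(y[i - 1] or y[j - 1] for i, j in edges)
              for y in product([False, True], repeat=3))

    # Independent sets of the path graph.
    ind = sum(all(not ({i, j} <= set(s)) for i, j in edges)
              for r in range(4) for s in combinations([1, 2, 3], r))

    print(sat, ind)  # prints: 5 5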

4.2 #P-complete proofs not preserving the relative error


4.2.1 #DNF, #3DNF
First, we define disjunctive normal forms.
Definition 49. A conjunctive clause is a logical expression of literals connected
with AND operators (∧). A disjunctive normal form or DNF is a disjunction of
conjunctive clauses, that is, conjunctive clauses connected with the logical OR
(∨) operator. The decision problem whether there is a satisfying assignment of a
disjunctive normal form is denoted by DNF, and the corresponding counting
problem is denoted by #DNF. Similarly to 3CNF and 3SAT, we can define
the 3DNF decision problem and its counting version #3DNF.
Recall that a conjunctive normal form or CNF is a conjunction of disjunctive
clauses. The relationship between CNFs and DNFs is provided by De
Morgan's laws:

¬[(x1,1 ∨ . . . ∨ x1,m1) ∧ (x2,1 ∨ . . . ∨ x2,m2) ∧ . . . ∧ (xk,1 ∨ . . . ∨ xk,mk)] =
¬(x1,1 ∨ . . . ∨ x1,m1) ∨ ¬(x2,1 ∨ . . . ∨ x2,m2) ∨ . . . ∨ ¬(xk,1 ∨ . . . ∨ xk,mk) =
(x̄1,1 ∧ . . . ∧ x̄1,m1) ∨ (x̄2,1 ∧ . . . ∧ x̄2,m2) ∨ . . . ∨ (x̄k,1 ∧ . . . ∧ x̄k,mk).

Namely, for any CNF Φ, we can generate in polynomial time a DNF Φ′ such
that exactly the non-satisfying assignments of Φ are the satisfying assignments of Φ′.
Therefore, if we can tell in polynomial time how many satisfying assignments
a DNF has, we can also tell how many satisfying assignments a CNF has.
Since the latter is a #P-complete counting problem, we have just proved
Theorem 47. The counting problem #DNF is #P-complete.
Observe that the negation of a 3CNF, using De Morgan's laws, is a
3DNF. Since #3SAT is #P-complete, the following theorem also holds.
Theorem 48. The counting problem #3DNF is #P-complete.
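In counting terms, De Morgan's laws give #DNF(Φ′) = 2^n − #SAT(Φ) for the clause-wise negation Φ′ of a CNF Φ with n variables. The sketch below illustrates this on the CNF of Equation (4.1); the signed-integer encoding is again our own.

    # Illustrate the De Morgan reduction: #SAT(cnf) + #DNF(neg(cnf)) = 2^n.
    from itertools import product

    def count_cnf(clauses, n):   # clause = disjunction of literals
        return sum(all(any(b[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
                   for b in product([False, True], repeat=n))

    def count_dnf(terms, n):     # term = conjunction of literals
        return sum(any(all(b[abs(l) - 1] == (l > 0) for l in t) for t in terms)
                   for b in product([False, True], repeat=n))

    cnf = [[1, 2, 3, 4], [-2, -3, -4]]
    dnf = [[-l for l in c] for c in cnf]          # De Morgan: negate clause-wise
    print(count_cnf(cnf, 4) + count_dnf(dnf, 4))  # 13 + 3 = 16 = 2^4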

4.2.2 Counting the sequences of a given length that a regular grammar can generate
In Chapter 2, we learned that counting the sequences of a given length
that an unambiguous regular grammar can generate is in FP. Therefore, it
might be surprising that the same problem for ambiguous grammars is hard
[106].
The proof is based on reducing #DNF to this problem. Let Φ be a disjunctive
normal form containing k clauses and n logical variables. We construct a
regular grammar G = (T, N, S, R) in the following way. T = {0, 1}. N contains
k(n − 1) + 1 non-terminals: the start non-terminal S and further k(n − 1)
non-terminals Wi,j, where i = 1, 2, . . . , k and j = 1, 2, . . . , n − 1. For the start
non-terminal, we add the following rewriting rules. If neither x1 nor x̄1 is a
literal in the ith clause, then add the rewriting rules

S → 0Wi,1 | 1Wi,1.   (4.28)

If x1 is a literal in the ith clause, then add the rewriting rule

S → 1Wi,1.   (4.29)

If x̄1 is a literal in the ith clause, then add the rewriting rule

S → 0Wi,1.   (4.30)

For j = 1 to j = n − 2, and for each i, if neither xj+1 nor x̄j+1 is a literal in
the ith clause, then add the rewriting rules

Wi,j → 0Wi,j+1 | 1Wi,j+1.   (4.31)

If xj+1 is a literal in the ith clause, then add the rewriting rule

Wi,j → 1Wi,j+1.   (4.32)

If x̄j+1 is a literal in the ith clause, then add the rewriting rule

Wi,j → 0Wi,j+1.   (4.33)

Finally, for each i, if neither xn nor x̄n is a literal in the ith clause, then add
the rewriting rules

Wi,n−1 → 0 | 1.   (4.34)

If xn is a literal in the ith clause, then add the rewriting rule

Wi,n−1 → 1.   (4.35)

If x̄n is a literal in the ith clause, then add the rewriting rule

Wi,n−1 → 0.   (4.36)
We claim that there is a bijection between the language that the grammar G
generates and the satisfying assignments of Φ. Specifically, G generates 0-1
sequences A = a1 a2 . . . an of length n, and the image of such a sequence is the
assignment in which the logical variable xi is TRUE if ai = 1 and FALSE otherwise.
If A is part of the language, then there is a generation of it. Take any
generation of A, and consider the first rewriting. It is either

S → 0Wi,1   or   S → 1Wi,1

for some i. For the literals in the ith clause of the DNF, the rewriting rules in
the selected generation of A are such that the corresponding assignment
of the logical variables in the image of A satisfies the DNF. Therefore, any
image is a satisfying assignment. Clearly, different sequences have different
images; thus, the mapping is an injection of the language into the satisfying
assignments.
We can also inject the satisfying assignments into the language. Let X be a
satisfying assignment, and assume that the ith clause satisfies the DNF. Let
aj be 1 if xj is TRUE in the assignment X and let aj be 0 otherwise. It is easy
to see that

S → a1 Wi,1 → a1 a2 Wi,2 → . . . → a1 a2 . . . an

is a possible generation in G, and it is easy to see that the image of the
so-generated sequence is indeed X.
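The language of G can be enumerated without building the grammar explicitly: for each clause, the generated words are exactly the 0-1 sequences that agree with the clause's literals at the constrained positions. The sketch below (with a made-up small DNF) computes the union of these word sets; its size equals the number of satisfying assignments.

    # |L(G)| equals #DNF: one "track" of non-terminals per clause fixes the
    # constrained positions, all other positions are free.
    from itertools import product

    def language(dnf_terms, n):
        words = set()
        for term in dnf_terms:
            fixed = {abs(l): ("1" if l > 0 else "0") for l in term}
            free = [j for j in range(1, n + 1) if j not in fixed]
            for bits in product("01", repeat=len(free)):
                assignment = {**dict(zip(free, bits)), **fixed}
                words.add("".join(assignment[j] for j in range(1, n + 1)))
        return words

    dnf = [[1, -2], [2, 3]]       # (x1 AND NOT x2) OR (x2 AND x3), n = 3
    print(len(language(dnf, 3)))  # 4 = number of satisfying assignments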

4.2.3 Computing the permanent of a non-negative matrix and counting perfect matchings in bipartite graphs
To prove that computing the permanent of a non-negative matrix is #P-hard,
we need the following observation and theorem in number theory.
Observation 1. Let µ be the largest absolute value in an n × n matrix A.
Then

|per(A)| ≤ n! µ^n.   (4.37)

The consequence of this observation is that if we can calculate the permanent of
A modulo pi for each pi in some set {p1, p2, . . . , pm} whose product is at least
2 n! µ^n, then we can compute the permanent of A in polynomial time using the
Chinese remainder theorem. Fortunately, the prime numbers are dense enough
that the product of sufficiently many small primes is large enough.
Lemma 10. There exists a constant d such that for any n and µ,

∏_{p ≤ dn log₂(µn)} p ≥ 2 n! µ^n,   (4.38)

where the product runs over prime numbers p.
This lemma comes from the well-known number theoretical result that the
first Chebyshev function, defined as

ϑ(n) := Σ_{p≤n} log(p),   (4.39)

is asymptotically n.
We can find the list of prime numbers up to dn log₂(3n) in polynomial
time using elementary methods (for example, the sieve of Eratosthenes). Note
that the running time has to be polynomial in the value of n and not in the
number of digits necessary to write down n. Let A be a matrix containing the values
−1, 0, 1, 2 and 3. Let p be a prime, and let A′ be the matrix obtained from
A by replacing each −1 with p − 1. Observe that

per(A) ≡ per(A′) mod p.   (4.40)

Therefore, if we could compute the permanent of a non-negative matrix
in polynomial time, then we could also compute the permanent of a matrix
containing the values −1, 0, 1, 2 and 3. Since the latter is a #P-hard computational
problem, we get that

Theorem 49. Computing the permanent of a non-negative matrix is #P-hard.
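The modular strategy above can be sketched in a few lines, combining the substitution −1 ↦ p − 1 with the Chinese remainder theorem; a brute-force permanent stands in for the (hypothetical) polynomial-time oracle for non-negative matrices.

    # Recover per(A) of a {-1,...,3}-matrix from its values modulo several primes.
    from itertools import permutations
    from math import prod

    def permanent_mod(a, p):
        n = len(a)
        b = [[x % p for x in row] for row in a]   # maps -1 to p - 1
        return sum(prod(b[i][s[i]] for i in range(n)) % p
                   for s in permutations(range(n))) % p

    def crt(residues, moduli):
        m = prod(moduli)
        return sum(r * (m // p) * pow(m // p, -1, p)
                   for r, p in zip(residues, moduli)) % m

    A = [[2, -1, 3], [1, 0, -1], [3, 2, 1]]
    primes = [101, 103, 107]                  # product far exceeds 2 * 3! * 3^3
    value = crt([permanent_mod(A, p) for p in primes], primes)
    m = prod(primes)
    print(value if value <= m // 2 else value - m)  # prints 4, the permanent of A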
Note that the problem remains #P-hard if the entries are integers
bounded by O(n log(n)). This gives us the possibility to prove that computing
the permanent of a 0-1 matrix is still #P-hard. Since any 0-1 matrix is an
adjacency matrix of a bipartite graph, the permanent of a 0-1 matrix is the
number of perfect matchings in a bipartite graph. Hence we can prove that
counting the perfect matchings in a bipartite graph is #P-complete.
To prove that computing the permanent of a 0-1 matrix is #P-hard,
we reduce computing the permanent of a non-negative integer matrix with
O(n log(n)) bounds on its entries to computing the permanent of a 0-1 matrix.
Let A be a non-negative integer matrix. Consider the edge-weighted directed
graph G⃗ whose adjacency matrix is A. Replace each edge e = (v, w) in G⃗ with
weight k > 1 by a subgraph. The subgraph is illustrated in Figure 4.6 for
the case k = 3. We get a new graph G⃗′. We claim that the permanent of the
corresponding matrix A′ is the permanent of A.
If (v, w) is not covered by a cycle in G⃗, then there is only one way to cover
the new vertices: a clockwise cycle. On the other hand, if (v, w) is covered by
a cycle, then so must be the chain of edges from v to w in G⃗′. There are k
ways to cover the remaining vertices, each containing one of the loops. Since
any cycle cover of one of the subgraphs can be combined with any cycle
cover of another subgraph, the number of cycle covers in G⃗′ is the sum
of the weights of the cycle covers in G⃗; that is, the permanent of A′ is the
permanent of A. Therefore, we get the following theorem:

FIGURE 4.6: The unweighted subgraph replacing the edge (v, w) with a
weight of 3. See text for details.

Theorem 50. Computing the permanent of a 0-1 matrix is #P-hard. The
computational problem #PerfectMatching, that is, counting the perfect
matchings in a bipartite graph, is #P-complete. Counting the cycle covers in
a directed graph is also #P-complete.

4.2.4 Counting the (not necessarily perfect) matchings of a bipartite graph
To prove that #Matching, that is, counting the not necessarily
perfect matchings of a bipartite graph, is #P-complete, we need the following
lemma.
Lemma 11. Let f(x) be a polynomial of degree n of the form f(x) = Σ_{i=0}^{n} c_i x^i.
If the value of f(x) is known at n + 1 rational points, x1, x2, . . . , xn+1, then
the coefficients of f(x) can be computed in time polynomial in n, the
size (number of digits) of the largest value, and the sizes of the rational
points. The size of a rational number p/q is the number of digits in p and q,
assuming that p and q are coprime.
Proof. Let A be the (n + 1) × (n + 1) matrix with a_{i,j} = x_i^{j−1}. Then A is a
Vandermonde matrix, and thus it can be inverted. The coefficients of f(x)
satisfy the matrix-vector equation

Ac = v,   (4.41)

where c is the vector of coefficients and v is the vector of the values. Since
A is a Vandermonde matrix, it can be inverted, and thus

c = A⁻¹ v.   (4.42)

We prove the #P-completeness of counting the matchings in a bipartite
graph by reducing counting the perfect matchings to it. Let G = (U, V, E)
be a bipartite graph such that |U| = |V| = n. Given a number k, generate a
bipartite graph Gk in the following way. Add nk new vertices to each of U and
V, indexed by u′_{i,l} and v′_{j,l}, 1 ≤ i, j ≤ n, 1 ≤ l ≤ k. Connect each u′_{i,l} with vi
and connect each v′_{j,l} with uj.
Let mr denote the number of matchings of G containing exactly n − r edges.
Each such matching is contained in exactly (k + 1)^{2r} = (k² + 2k + 1)^r
matchings of Gk. Indeed, each vertex of G not covered by the matching might
remain uncovered in Gk or might be covered by one of the k new edges incident
to it. Thus the number of matchings in Gk is

Σ_{r=0}^{n} m_r (k² + 2k + 1)^r = f(k² + 2k + 1).   (4.43)

Observe that for different values of k, the number of matchings in Gk is the value
of the polynomial f at k² + 2k + 1. Hence, if we could count the matchings in Gk for
k = 0, 1, . . . , n, then we could compute the coefficients of f. In particular, we
could compute m_0, that is, the number of perfect matchings in G. Therefore,
we get
Theorem 51. Computing the number of matchings in a bipartite graph is
#P-complete.
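With exact rational arithmetic, the interpolation step of Lemma 11 is easy to carry out. The sketch below recovers made-up coefficients m_r (including m_0) from the values f((k+1)²), k = 0, ..., n; Lagrange interpolation is used here in place of explicitly inverting the Vandermonde matrix, which is equivalent for this purpose.

    # Recover the coefficients of f from its values at (k+1)^2 = k^2 + 2k + 1.
    from fractions import Fraction

    def poly_mul(p, q):
        r = [Fraction(0)] * (len(p) + len(q) - 1)
        for i, a in enumerate(p):
            for j, b in enumerate(q):
                r[i + j] += a * b
        return r

    def interpolate(xs, ys):
        n = len(xs)
        coeffs = [Fraction(0)] * n
        for i in range(n):
            basis = [Fraction(1)]     # build the i-th Lagrange basis polynomial
            denom = Fraction(1)
            for j in range(n):
                if j != i:
                    basis = poly_mul(basis, [-xs[j], Fraction(1)])
                    denom *= xs[i] - xs[j]
            for d in range(n):
                coeffs[d] += ys[i] * basis[d] / denom
        return coeffs

    m = [3, 10, 6, 1]     # hypothetical matching counts m_r of some G (made up)
    f = lambda x: sum(c * x**r for r, c in enumerate(m))
    xs = [Fraction((k + 1) ** 2) for k in range(4)]
    ys = [f(x) for x in xs]
    print(interpolate(xs, ys))  # recovers [3, 10, 6, 1]; m_0 = 3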

4.2.5 Counting the linear extensions of a poset


Brightwell and Winkler showed that counting the linear extensions of a
partially ordered set is #P-complete [26]. Recall that a linear extension of
a poset (P, ≤_P) is a total ordering ≤ on P such that a ≤_P b implies a ≤ b.
We denote this counting problem by #LE.
The proof is based on reducing the #3SAT problem to #LE using modulo
prime number calculations. We need the following lemma.
Lemma 12. For any n ≥ 4, the product of the prime numbers strictly between n
and n² is at least n! 2^n.
The proof of this lemma can be found in [26] and is based on the properties
of the first and second Chebyshev functions.
FIGURE 4.7: The Hasse diagram of a clause poset. See text for details.

The outline of proving the #P-completeness of #LE is the following. For
any 3CNF Φ of k clauses and n variables, we first construct a
poset PΦ of size 7k + n. Let LΦ denote the number of linear extensions of this
poset. Next, we find a set of primes between 7k + n and (7k + n)²
such that none of them divides LΦ, and their product is at least 2^n + 1. Since
2^n + 1 < 2^{7k+n} and LΦ ≤ (7k + n)!, such a set of primes exists according to
Lemma 12. Then for each prime p in the so established set, we construct a
poset PΦ,p of size about p(n + k) with the property that the number of linear
extensions of PΦ,p is αp + s(Φ)βγLΦ, where α is a positive integer, β and γ
are easily computable positive integers, none of which can be divided by p, and
s(Φ) is the number of satisfying assignments of Φ. Then we can compute s(Φ)
modulo p. Using the Chinese remainder theorem, we can calculate the number
of solutions modulo the product of the prime numbers, which is at least 2^n + 1.
Since the number of solutions may vary between 0 and 2^n, this means that
we can actually compute the exact number of solutions in polynomial time.
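For intuition, the linear extensions of a small poset can be counted by brute force over all permutations. The toy relations below (a diamond-shaped poset, our own example) are not part of the construction.

    # Brute-force linear extension counter, feasible only for tiny posets.
    from itertools import permutations

    def linear_extensions(elements, relations):
        """relations: pairs (a, b) meaning a <_P b; count total orders extending them."""
        count = 0
        for perm in permutations(elements):
            pos = {e: i for i, e in enumerate(perm)}
            if all(pos[a] < pos[b] for a, b in relations):
                count += 1
        return count

    # Diamond poset: b below l and r, both below t.
    print(linear_extensions("blrt",
                            [("b", "l"), ("b", "r"), ("l", "t"), ("r", "t")]))  # 2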
Let Φ be a 3CNF. The poset PΦ is constructed in the following way.
There are two types of vertices: n vertices for the n variables and 7 vertices
for each clause. If xi1, xi2 and xi3 are the three logical variables that are
literals in clause cj, then each of the 7 vertices corresponding to cj is placed
above a different non-empty subset of {xi1, xi2, xi3}; see Figure 4.7. There are no other
comparabilities in PΦ.
Let p be a prime between 7k + n and (7k + n)² such that p does not
divide LΦ. We construct the poset PΦ,p in the following way. There are two
special vertices in the poset, a and b. Below a, there is an antichain of size
(p − 1)(n + 1), divided into n + 1 parts: there is a subset of vertices Ui for
each variable xi, and there is an additional set U0. Between a and b, there is
an antichain of size (k + 1)(p − 1). These are again divided into k + 1 subsets,
each of size p − 1: for each clause cj, there is a subset Vj, and there is an
additional subset V0.
Finally, there are literal and clause vertices. For each variable xi, there are
two vertices, xi and x̄i. Namely, to simplify notation, we abuse notation by
using the same symbols xi and x̄i for the literals and the vertices corresponding
to them. There are 8 vertices for each clause, and the vertices for clause cj
are denoted by cj,l, l ∈ {0, 1, . . . , 7}. Both xi and x̄i are above Ui.
FIGURE 4.8: The poset PΦ,p. Ovals represent antichains of size p − 1. For
the sake of clarity, only the literal vertices and some of the clause vertices for the
clause cj = (xi1 ∨ xi2 ∨ xi3) are presented here. See also the text for details.

If clause cj contains variables xi1, xi2 and xi3, then each clause vertex cj,l
is above a different triple of literal vertices, one from each of the pairs
{xi1, x̄i1}, {xi2, x̄i2} and {xi3, x̄i3}. The clause vertex which is above the
triple that actually constitutes cj is above b, and all other clause vertices
are above Vj. There are no more comparabilities in PΦ,p. The poset contains

2 + (p − 1)(k + n + 2) + 2n + 8k = (p + 7)k + (p + 1)n + 2p   (4.44)

vertices.
To count the linear extensions of PΦ,p, we partition them based on configurations
that we define below. A configuration λ is a partition of the literal
and clause vertices into 3 sets, B^λ, M^λ and T^λ, called the base, middle and
top sets, respectively. We say that a linear extension respects a configuration
λ = {B^λ, M^λ, T^λ} if B^λ ≤ a ≤ M^λ ≤ b ≤ T^λ in the linear extension. The
set of linear extensions respecting a configuration λ is denoted by L^λ. A configuration
λ is consistent if L^λ is not empty. It is easy to see that in such
cases |L^λ| is the product of the numbers of linear extensions of three posets,
obtained by restricting PΦ,p to the base, middle and top sets. We denote these posets by
PΦ,p|B^λ, PΦ,p|M^λ and PΦ,p|T^λ. The number of linear extensions respecting
a configuration can be divided by p if and only if the number of linear extensions
of any of these three posets can be divided by p. Therefore, in the
following, we infer the cases when p does not divide any of these three numbers of
linear extensions.
The poset PΦ,p|B^λ consists of an antichain of size (p − 1)(n + 1) and some
of the literal vertices. The isolated vertices are the vertices of U0 and also those
vertices in Ui for which neither xi nor x̄i is in PΦ,p|B^λ. The isolated vertices
can be put into any positions of the linear extension in any order, and for
any such selection of positions and order, the remaining vertices can have an
arbitrary linear extension. What follows is that the number of linear extensions
of PΦ,p|B^λ can be divided by m(m − 1) · · · (m − r + 1) (that is, by (m choose r) · r!),
where m is the size of the poset and r is the number of isolated vertices. Since r ≥ p − 1,
this cannot be divided by p only if r = p − 1 and m ≡ −1 mod p. Since m is
between (p − 1)(n + 1) and (p − 1)(n + 1) + 2n and p > n, this implies that
there are exactly n literal vertices in PΦ,p|B^λ, one for each logical variable
(since the number of isolated vertices is p − 1).
We claim that in this case, the number of linear extensions of PΦ,p|B^λ is

(p(n + 1) − 1)! / p^n,   (4.45)

which cannot be divided by p. Indeed, the size of the poset is np + (p − 1) =
p(n + 1) − 1. Any linear extension can be obtained by first selecting a set of
positions for each set of vertices Ui together with its literal vertex xi or x̄i, and also
for the set of vertices U0. This can be done in

( p(n + 1) − 1 choose p, p, . . . , p, p − 1 )   (4.46)

ways. Then, for each i, the vertices in Ui can be put into an arbitrary order. This
can be done in (p − 1)! ways for each i; thus the number of linear extensions is
indeed

( p(n + 1) − 1 choose p, p, . . . , p, p − 1 ) (p − 1)!^{n+1} = (p(n + 1) − 1)! / p^n.   (4.47)

It is trivial to see that this number cannot be divided by p since p is greater
than n + 1.
A similar analysis can be done for PΦ,p|M^λ. It contains an antichain of size
(p − 1)(k + 1) and some literal and clause vertices. It is easy to see that its
number of linear extensions cannot be divided by p if and only if it contains
none of the literal vertices and exactly one of the clause vertices for each Vj.
In such a case, the number of linear extensions of PΦ,p|M^λ is

(p(k + 1) − 1)! / p^k,   (4.48)

which cannot be divided by p since p > k.
If the number of linear extensions cannot be divided by p for PΦ,p |B λ or
for PΦ,p |M λ , then PΦ,p |T λ contains one literal vertex for each logical variable
and seven clause vertices for each literal. We are going to show that these
#P-complete counting problems 195

configurations correspond to satisfying assignments. Indeed, in any such con-


figuration, one of the literal vertices is in PΦ,p |B λ and the other is in PΦ,p |T λ .
Let the assignment be such that the literals in PΦ,p |T λ are TRUE. Then for
each clause, the clause vertex which is in PΦ,p |M λ corresponds to a combina-
tion of literals that would build up a clause that the current assignment does
not satisfy. However, this cannot be the clause vertex that corresponds to the
combination of the literals which actually constitutes the clause, since that
clause vertex is above b. Therefore the assignment satisfies each clause.
Also, if an assignment is a satisfying assignment, it determines a configu-
ration for which the number of linear extensions cannot be divided by p for
PΦ,p |B λ or for PΦ,p |M λ .
The last piece of observation is that for any such assignment, PΦ,p |T λ is
isomorphic to PΦ . Indeed, PΦ,p |T λ contains, for each clause, 3 literal vertices
and 7 clause vertices. The missed clause vertex is not comparable with any of
the 3 literal vertices, therefore the presented 7 clause vertices correspond to
the possible non-empty subsets of literal vertices. Then the number of linear
extensions of PΦ,p |T λ is indeed LΦ .
What we get is that the number of linear extensions of PΦ,p is
(p(n + 1) − 1)! (p(k + 1) − 1)!
αp + s(Φ) LΦ . (4.49)
pn pk
From this, s(Φ) can be calculated modulo p. If the number of linear exten-
sions of the given posets are known, then building up the posets, finding the
appropriate set of prime numbers, and computing s(Φ) modulo these prime
numbers can all clearly be done in polynomial time. Therefore, we get the
following theorem.
Theorem 52. The counting problem #LE is in #P-complete.

4.2.6 Counting the most parsimonious substitution histories


on a star tree
We learned in Subsection 4.1.3 that counting the most parsimonious sce-
narios on an evolutionary tree is in #P-complete. Here we show that the
problem remains #P-complete if the binary tree is replaced to a star tree.
Below we define this problem.
Problem 9.
Name: #SPS-Star.
Input: a multiset of sequences of the same length over the same alphabet
S = {A1 , A2 , . . . An }.
Output: the value defined as
n
X Y
H(Ai , M )! (4.50)
M ∈M i=1
196 Computational complexity of counting and sampling

where M is the set of sequences that minimizes the sum of Hamming distances
from the sequences in S. Namely, for any M ∈ M,
n
X
H(Ai , M ) (4.51)
i=1

is minimal. Surprisingly, this problem is #P-complete, even if the size of the


alphabet is 2, although finding the size of M is trivial. It is easy to see that M
consists of the sequences that contain the majority character for each position.
The majority character might not be unique; the size of M is the product of
the number of majority characters in each position. If the size of the alphabet
is 2, say, it is {0, 1}, then the size of M is 2m , where m is the number of
positions where half of the sequences in S contain 0 and the other half of
them contain Q 1. Sequences in M are called median sequences. For each median
n
sequence M , i=1 H(Ai , M )! is the number of corresponding scenarios.
Below we present a proof that #SPS-Star is in #P-complete based on
the work of Miklós and Smith [133]. The proof is based on reducing #3SAT
to #SPS-Star using modulo prime number calculations.
Let Φ be a 3CNF with n variables and k clauses, and let p be a prime
number between min 300, n + 5 and 5 min 300, n + 5. We are going to construct
a multiset S containing 2 + 2n + 50k sequences, each of them of length 2n +
2(q + 4) + 2n(q + 3) + k(75 + 50), where q = p − n + 5. Each sequence is in
the form
a1 b1 a2 b2 . . . an bn e1 e2 . . . et(p) , (4.52)
where t(p) = 2(q + 4) + 2n(q + 3) + k(75 + 50q). The ai and bi characters
correspond to the logical variable xi in Φ, and the ej characters are additional
characters. In these additional positions, all sequences contain character 0
except one of them. We will say that a sequence contains x additional ones,
which means that there are x additional positions where the sequence contains
1s. The sequences come in pairs such that they are the complement of each
other in the first 2n positions. What follows is that there are 22n median
sequences. The sequences are the following.
1. There is a sequence that contains all 0 characters in the first 2n positions
and has q + 4 additional ones. Furthermore, there is a sequence that
contains all 1s in the first 2n positions and contains q + 4 additional
ones. We denote these sequences as A and A.
2. For each index i = 1, . . . n, there are a couple of sequences. For one of
them ai = bi = 1, and for all other j 6= i ai = bi = 0, and the sequence
contains q + 3 additional ones. The other sequence is the complement of
the first one in the first 2n positions and also contains q + 3 additional
ones. We denote these sequences by Ai and Ai .
3. For each clause, there are 50 sequences, see Table 4.2. Each sequence
#P-complete counting problems 197

differs in the characters corresponding the logical variables participating


in the clause, in the characters corresponding to other logical variables
and in the number of additional ones. In Table 4.2, column A gives the
characters ai1 , bi1 , ai2 , bi2 , ai3 and bi3 for each sequence if the clause is
(xi1 ∨ xi2 ∨ xi3 ). If some of the literals are negated, the corresponding
a and b values must be swapped. In each sequence, for all j 6= i1 , i2 , i3 ,
all characters aj and bj are the same. Column B tells if it is 1 or 0.
Each sequence has q additional ones plus the number that can be found
in column C. These 3 columns completely describe the sequences; the
remaining columns in the table are explained later.
It is easy to see that there are 22n median sequences; the first 2n characters
might be arbitrary, and the characters in the additional positions must be 0.
We set up three properties on the medians.

Property 1. Exactly n of the characters are 1 in the median.


Property 2. For each i, ai + b1 = 1.
Property 3. For each i, ai + bi = 1, and the assignment

T RU E if ai = 1
xi = (4.53)
F ALSE if ai = 0

satisfies Φ.
It is easy to see that these properties are nested, namely, if a median
sequence has Property i, then it also has Property j for each j < i. We prove
the following on the median sequences.
If a median sequence M does not have Property 1, then the number of
corresponding scenarios can be divided by p. Indeed, in such a case, either
H(M, A) ≥ p or H(M, A) ≥ p and thus either H(M, A)! or H(M, A)! can be
divided by p.
If a median sequence M has Property 1, but does not have Property 2,
then the number of corresponding scenarios can be divided by p. Indeed, let
i be such that ai + bi = 0 or ai + bi = 2. Then either H(M, Ai ) = p or
H(M, Ai ) = p, making the corresponding factorial dividable by p.
If a median sequence M has Properties 1 and 2, but does not have Property
3, then the number of corresponding scenarios can be divided by p. Assume
that cj = (xi1 ∨ xi2 ∨ xi3 ) is the clause that is not satisfied by the assignment
defined in Equation (4.53). Then ai1 = ai2 = ai3 = 0 and bi1 = bi2 = bi3 = 1.
In that case, the Hamming distance between M and the sequence that is
defined for clause cj in the last row of Table 4.2 is p. It follows that the number
of corresponding scenarios can be divided by p. If some of the literals are
negated in a clause not satisfied by the assignment defined in Equation (4.53),
the same arguing holds, since both in the constructed sequences and in M ,
some of the a and b values are swapped.
198 Computational complexity of counting and sampling

A B C M1 M2 M3 M4 M5 M6 M7 M8
111 110 101 011 100 010 001 000
01 00 00 0 +3 p−1 p−1 p−1 p−3 p−1 p−3 p−3 p−3
00 01 00 0 +3 p−1 p−1 p−3 p−1 p−3 p−1 p−3 p−3
00 00 01 0 +3 p−1 p−3 p−1 p−1 p−3 p−3 p−1 p−3
10 11 11 1 +0 p−6 p−6 p−6 p−4 p−6 p−4 p−4 p−4
11 10 11 1 +0 p−6 p−6 p−4 p−6 p−4 p−6 p−4 p−4
11 11 10 1 +0 p−6 p−4 p−6 p−6 p−4 p−4 p−6 p−4
10 10 00 0 +2 p−5 p−5 p−3 p−3 p−3 p−3 p−1 p−1
10 00 10 0 +2 p−5 p−3 p−5 p−3 p−3 p−1 p−3 p−1
00 10 10 0 +2 p−5 p−3 p−3 p−5 p−1 p−3 p−3 p−1
10 10 00 0 +2 p−5 p−5 p−3 p−3 p−3 p−3 p−1 p−1
10 00 01 0 +2 p−3 p−5 p−3 p−1 p−5 p−3 p−1 p−3
00 10 01 0 +2 p−3 p−5 p−1 p−3 p−3 p−5 p−1 p−3
10 01 00 0 +2 p−3 p−3 p−5 p−1 p−5 p−1 p−3 p−3
10 00 10 0 +2 p−5 p−3 p−5 p−3 p−3 p−1 p−3 p−1
00 01 10 0 +2 p−5 p−1 p−5 p−3 p−3 p−1 p−5 p−3
01 10 00 0 +2 p−3 p−3 p−1 p−5 p−1 p−5 p−3 p−3
01 00 10 0 +2 p−3 p−1 p−3 p−5 p−1 p−3 p−5 p−3
00 10 10 0 +2 p−5 p−3 p−3 p−5 p−1 p−3 p−3 p−1
10 01 00 0 +2 p−3 p−3 p−5 p−1 p−5 p−1 p−3 p−3
10 00 01 0 +2 p−3 p−5 p−3 p−1 p−5 p−3 p−1 p−3
00 01 01 0 +2 p−1 p−1 p−3 p−1 p−5 p−3 p−3 p−5
01 10 00 0 +2 p−3 p−3 p−1 p−5 p−1 p−5 p−3 p−3
01 00 01 0 +2 p−1 p−3 p−1 p−3 p−3 p−5 p−3 p−5
00 10 01 0 +2 p−3 p−5 p−1 p−3 p−3 p−5 p−1 p−3
01 01 00 0 +2 p−1 p−1 p−3 p−3 p−3 p−3 p−5 p−5
01 00 10 0 +2 p−3 p−1 p−3 p−5 p−1 p−3 p−5 p−3
00 01 10 0 +2 p−3 p−1 p−5 p−3 p−3 p−1 p−5 p−3
10 10 11 1 +1 p−6 p−6 p−4 p−4 p−4 p−4 p−2 p−2
10 11 01 1 +1 p−4 p−6 p−4 p−2 p−6 p−4 p−2 p−4
11 10 01 1 +1 p−4 p−6 p−2 p−4 p−4 p−6 p−2 p−4
10 01 11 1 +1 p−4 p−4 p−6 p−2 p−6 p−2 p−4 p−4
10 11 10 1 +1 p−6 p−4 p−6 p−4 p−4 p−2 p−4 p−2
11 01 10 1 +1 p−4 p−2 p−6 p−4 p−4 p−2 p−6 p−4
01 10 11 1 +1 p−4 p−4 p−2 p−6 p−2 p−6 p−4 p−4
01 11 10 1 +1 p−4 p−2 p−4 p−6 p−2 p−4 p−6 p−4
11 10 10 1 +1 p−6 p−4 p−4 p−6 p−2 p−4 p−4 p−2
10 01 11 1 +1 p−4 p−4 p−6 p−2 p−6 p−2 p−4 p−4
10 11 01 1 +1 p−4 p−6 p−4 p−2 p−6 p−4 p−2 p−4
11 01 01 1 +1 p−2 p−4 p−4 p−2 p−6 p−4 p−4 p−6
01 10 11 1 +1 p−4 p−4 p−2 p−6 p−2 p−6 p−4 p−4
01 11 01 1 +1 p−2 p−4 p−2 p−4 p−4 p−6 p−4 p−6
11 10 01 1 +1 p−4 p−6 p−2 p−4 p−4 p−6 p−2 p−4
01 01 11 1 +1 p−2 p−2 p−4 p−4 p−4 p−4 p−6 p−6
01 11 10 1 +1 p−4 p−2 p−4 p−6 p−2 p−4 p−6 p−4
11 01 10 1 +1 p−4 p−2 p−6 p−4 p−4 p−2 p−6 p−4
01 01 11 1 +1 p−2 p−2 p−4 p−4 p−4 p−4 p−6 p−6
01 11 01 1 +1 p−2 p−4 p−2 p−4 p−4 p−6 p−4 p−6
11 01 01 1 +1 p−2 p−4 p−4 p−2 p−6 p−4 p−4 p−6
01 01 01 0 +1 p−2 p−3 p−3 p−1 p−5 p−5 p−5 p−7
10 10 10 1 +2 p−6 p−4 p−4 p−4 p−2 p−2 p−2 p

TABLE 4.2: Constructing the 50 sequences for a clause. See text for expla-
nation.
#P-complete counting problems 199

If a median sequence M satisfies Property 3, then the number of corre-


sponding scenarios are

(p − 6)!7k (p − 5)!6k (p − 4)!12k (p − 3)!12k (p − 2)!6k+2n (p − 1)!7k+2 (4.54)

which cannot be divided by p. Indeed, if Property 3 holds, then H(M, A) =


H(M, A) = p − 1 and for each i, H(M, Ai ) = H(M, Ai ) = p − 2. Since a clause
cj = (xi1 ∨ xi2 ∨ xi3 ) is satisfied, characters ai1 , ai2 and ai3 in M are one of
the combinations that can be found in the 7 columns in Table 4.2 labeled by
M1–M7 and the corresponding b characters are the complements. Then the
Hamming distances between M and the 50 sequences defined for cj are the
values indicated in the appropriate column. It is easy to verify that in each
column from M1 to M7, there are 7 − 7 (p − 6) and (p − 1), 6 − 6 (p − 5) and
(p − 2), and 12 − 12 (p − 4) and (p − 3). If some of the literals are negated in a
clause, then both in the constructed sequences and in the median sequences,
the corresponding a and b values are swapped, and the same reasoning holds.
What follows is that only those median sequences contribute to the number
of scenarios modulo p that have Property 3. Therefore,
X Y
H(M, A)! ≡ s(Φ)(p − 6)!7k (p − 5)!6k (p − 4)!12k ×
M ∈M A∈S
12k
×(p − 3)! (p − 2)!6k+2n (p − 1)!7k+2 mod p (4.55)

where s(Φ) is the number of satisfying assignments of Φ. Since all the calcu-
lations in the reduction can be done in polynomial time, we get the following
theorem.
Theorem 53. The counting problem #SPS-STAR is in #P-complete.

4.2.7 Counting the (not necessarily perfect) matchings in a


planar graph
While the number of perfect matchings in a planar graph can be calculated
in polynomial time, the counting problem becomes hard if all matchings are
to be calculated. This quite surprising result was proved by Mark Jerrum [97].
The reduction is via (vertex) weighted matchings that we define below.
Problem 10.
Name: #Pl-W-Matching.
Input: a simple graph G = (V, E), and a weight function w : V → Z.
Output: the value defined as
X Y
w(v), (4.56)
M ∈M(G) v ∈M
/

where M(G) is the set of the (not necessarily perfect) matchings of G. Later
200 Computational complexity of counting and sampling

on, we are going to introduce weighted graphs where the weight function maps
to the multivariate polynomial ring Z[x1 , x2 , . . . , xk ]. In such cases, the sum
in Equation (4.56) will be called the matching polynomial.
Clearly, #W-Matching is a #P-complete problem, since the problem
reduces to the #PerfectMatching by choosing the weight function to be
the constant 0 function. Indeed, then only the perfect matchings contribute
to the sum in Equation (4.56), and all of them with 1 (recall that the empty
product is defined as 1). It is also clear that the #W-Matching problem
remains in #P-complete if only weights 1 and 0 are used.
First we show that #Pl-W-Matching is also in #P-complete by reducing
#W-Matching to it. Let G = (V, E) be a simple graph with weights w : V →
Z. We can draw G on a plane in polynomial time such that there are no points
where more than 2 edges cross each other. The number of crosses is O(n4 ),
where n = |V |. We are going to replace each crossing to a constant size planar
gadget such that for the so obtained planar graph G0 = (V, E 0 ) with weights
w0 : V 0 → Z, the equality
X Y X Y
w0 (v 0 ) = 8c w(v) (4.57)
M 0 ∈M(G0 ) v 0 ∈M
/ 0 M ∈M(G) v ∈M
/

holds, where c is the number of crossings in the drawing of G. The gadget is


built up using the two units in Figures 4.9 and 4.10. These gadgets contain
indeterminants labeling some vertices. These indeterminants can be consid-
ered as monomials in the multivariate polynomial ring. It is easy to see that
the matching polynomial is
1 + xy + yz + zx (4.58)
for the gadget ∆1 in Figure 4.9, and
2(1 + xyz) (4.59)
for the gadget ∆2 on Figure 4.10.
Call the degree 1 vertices “external” and all other vertices “internal” in
∆1 and ∆2 . Similarly, let an edge be “internal” if both of its vertices are
internal, otherwise external. The matching polynomials can be interpreted by
considering the gadgets in a larger graph H, where only the external vertices
have connections toward the remaining part of H1 . The matching polynomial
of ∆1 tells us that ∆1 contributes to the matching polynomial of H1 only
when 1 or 3 of the external vertices are covered by edges outside of ∆1 , and
thus, 3 or 2 external edges are used in a matching. The contribution in each
case is 1. Namely, if M P denotes the matching polynomial, then
M P (H1 ) = M P ((H1 \ ∆1 ) ∪ {x, y, z}) + yzM P ((H1 \ ∆1 ) ∪ {x}) +
xzM P ((H1 \ ∆1 ) ∪ {y}) + xyM P ((H1 \ ∆1 ) ∪ {z}). (4.60)
Here we slightly abused the notations that some of the vertices in ∆1 are
denoted by their assigned weights.
#P-complete counting problems 201

0 0

y z

FIGURE 4.9: The gadget component ∆1 for replacing a crossing in a non-


planar graph. See text for details.

-1

1 1
-1

-1 1 -1

y z

FIGURE 4.10: The gadget component ∆2 for replacing a crossing in a non-


planar graph. See text for details.
202 Computational complexity of counting and sampling

x w

-1 1 -1 0 0 -1 1 -1

0
-1 -1
1 1 1 1

-1 -1 -1

1 1
-1
0 0

0 0 -1 1 -1 0 0

z y

FIGURE 4.11: The gadget Γ replacing a crossing in a non-planar graph. See


text for details.

It is easy to see that a similar argument holds for ∆2 , when it is a part in


a larger graph H2 , then

M P (H2 ) = 2M P ((H2 \ ∆2 ) ∪ {x, y, z}) + 2xyzM P (H2 \ ∆2 ). (4.61)

The complete gadget Γ is in Figure 4.11. It is easy to observe that Γ


is constructed from three copies of ∆1 and three copies of ∆2 . Only those
matchings will contribute to the matching polynomial in which 3 or 2 external
edges are used in each copy of ∆1 and 3 or 0 external edges are used in each
copy of ∆2 . From this, a straightforward calculation reveals that the matching
polynomial of Γ is
8(1 + xy + wz + wxyz). (4.62)
Namely, Γ has exactly the properties required to substitute a crossover. The
diametrically opposed external edges in Γ are forced to act in the same way
in any matching: either both of them are in the matching or none of them.
What follows is that if a crossover is replaced with Γ and all w, x, y, and z are
replaced with 1, then the matching polynomial becomes 8 times the matching
#P-complete counting problems 203

polynomial of the original graph. By replacing each matching in G with Γ and


substituting each non-determinant by 1, the matching polynomial becomes
8c times the matching polynomial of G, where c is the number of crossings
in G. It is clear that the whole reduction is polynomial computable. Since
#W-Matching is in #P-complete, we get that #Pl-W-Matching is also
in #P-complete. It is clear that #Pl-W-Matching remains in #P-complete
if only weights −1, 0 and 1 are used.
In the last step, we prove that #Pl-Matching, that is, counting the
matchings in a planar graph, is also in #P-complete by showing that #Pl-
W-Matching is polynomial reducible to it. This reduction uses polynomial
interpolation twice. In the first step, we show that #Pl-W-Matching with
weights −1, 0 and 1 is polynomial reducible to the #Pl-W-Matching prob-
lem using only weights −1 and 1. In the second step, we show that the #Pl-
W-Matching problem using only weights −1 and 1 is polynomial reducible
to the #Pl-Matching problem.
Let G be a planar graph with vertex weights from the set {−1, 1, 0}. Re-
place each 0 with x. Then the sum of the weighted matchings is the matching
polynomial
Xk
M P (G) := ai xi (4.63)
i=0

evaluated at x = 0, where k is the number of vertices labeled by 0. Generate


graphs G0 , G1 , . . . , Gk by attaching 0, 1, . . . k number of auxiliary vertices to
each vertex labeled by x. Assign weight 1 to each auxiliary vertex and replace
each x with 1. Then the sum of the weighted matchings in Gj is
k
X
ai (j + 1)i . (4.64)
i=0

To see this, observe that ai is the sum of weighted matchings over those match-
ings that avoid exactly i vertices having weight x in G. Any such matching
can be extended to a matching in Gj by j ways adding an edge to an avoided
vertex and one way by not adding any edge. Thus the sum of the weighted
matchings in Gj is the matching polynomial of G evaluated at x = j + 1.
Evaluating P M (G) at k + 1 different points, we can compute the coefficients
of it in polynomial time. Particularly, we can compute a0 , that is, the sum of
weighted matchings of G.
Similar reduction can be done in the second step. Let G be a planar graph
with vertex weights from the set {−1, 1}. Replace each −1 with x. Then the
sum of the weighted matchings is the matching polynomial
k
X
M P (G) := ai xi (4.65)
i=0

evaluated at x = −1, where k is the number of vertices labeled by −1. We can


204 Computational complexity of counting and sampling

ev3

v3 v2
v0

ev1 v1 ev2

FIGURE 4.12: The gadget ∆ replacing a vertex in a planar, 3-regular graph.


See text for details.

again generate k + 1 graphs in which the number of matchings is equal to the


matching polynomial evaluated at k + 1 different integer values. Therefore,
the coefficients can be computed in polynomial time, and thus, the evaluation
at x = −1 can also be done in polynomial time. This completes the proof of
the following theorem.
Theorem 54. The computational problem #Pl-Matching is in #P-
complete.

4.2.8 Counting the subtrees of a graph


We learned that counting the spanning trees in a graph is easy. Surpris-
ingly, counting all trees of a graph becomes #P-complete, even if the problem
is restricted to planar graphs. The proof is based on reducing #Pl-3Reg-H-
Path, that is, counting the Hamiltonian paths in a planar, 3-regular graph to
this problem. It is known that #Pl-3Reg-H-Path is in #P-complete. We do
not introduce the proof of the #P-completeness of the #Pl-3Reg-H-Path
problem; the proof can be found in [98] based on the work of Garey, Johnson
and Tarjan [74], who proved that Pl-3Reg-H-Path is in NP-complete.
The reduction is done via two intermediate problems. The #Pl-S-k-
Subtree is the problem of counting the subtrees of planar graphs containing
exactly k edges and a subset S of vertices. The #Pl-S-Subtree is the prob-
lem of counting the subtrees of a planar graph containing a subset S of vertices.
The reductions, except the first one, use polynomial interpolations.
First, we show that #Pl-S-k-Subtree is in #P-complete by reducing
the #Pl-3Reg-H-Path to it. Let G = (V, E) be a planar, 3-regular graph;
let n denote the number of vertices. We replace each vertex v of G incident to
edges e1 , e2 and e3 with a gadget ∆v shown in Figure 4.12. In the modified
graph, G0 , the edges are incident to vertices ev1 , ev2 and ev3 . In each gadget, the
evi vertices are called external vertices, and the vj vertices are called internal
#P-complete counting problems 205

vertices. The edges in G0 ⊂ G are called external edges. These edges connect
the external vertices of the gadgets.
We ask for the number of trees in G0 with exactly k = 4n − 3 edges
containing the set S = {v0 |v ∈ V }. We show that this is exactly four times
the number of Hamiltonian paths in G. Let p be a Hamiltonian path in G, and
consider the corresponding external edges in G0 . This set can be extended to
a tree containing k edges and the set S in exactly 4 different ways. For each
adjacent edge in p, there is a unique way to connect ei and ei+1 with 3 edges,
(evi , vi ), (vi , v0 ), and (vi , evi+1 ), involving vertex v0 . (The indexes are modulo
3.) At the end of the Hamiltonian path, which were vertices s and t in G,
there are 2 possible ways to connect s0 and t0 to the tree in G0 using exactly
2 edges. It is easy to see that the number of vertices is indeed 4n − 3. There
are n − 1 external edges, and there are 3 internal edges in each gadget except
at the end of the Hamiltonian path, where the number of internal edges are
2. Thus, the number of edges is n − 1 + 3(n − 2) + 4 = 4n − 3.
We are going to show that these are the only subtrees with k edges and
covering the subset S by proving that any minimal subtree covering S contains
exactly k edges if and only if its external edges form a Hamiltonian path in G.
Let T be a minimal subtree covering S. Then for each gadget, the number of
external vertices in the subtree is either 1, 2 or 3. It is easy to show that the
number of internal edges for these three cases is 2, 3 or 5, respectively. The
external edges form a spanning tree in G, thus, the number of external edges
is n − 1 and the sum of the external vertices in the gadgets are 2n − 2. Then
the number of edges is k only if the external edges form a Hamiltonian path
in G. Indeed, if there are m gadgets with 3 external vertices, then there are
m + 2 gadgets with 1 external vertex, and n − 2m − 2 gadgets with 2 external
vertices. Then the total number of edges is

n − 1 + 5m + 2(m + 2) + 3(n − 2m − 2) = 4n − 3 + m. (4.66)

It is 2n − 3 only if m = 0. That is, the external edges form a Hamiltonian


path.
To show that #Pl-S-Subtree is in #P-complete, consider a planar, graph
G = (V, E), and a subset of vertices S ⊆ V . We introduce the following
polynomial:
Xn
T P (G) := a i xi , (4.67)
i=1

where ai is the number of subtrees of G that contain i vertices and cover


the subset S. Introduce a series of graphs G0 , G1 , . . . , Gn−1 , by attaching
0, 1, . . . , n − 1 auxiliary vertices to each vertex. Then the number of subtrees
that cover the subset S in Gj is
n
X
ai (j + 1)i , (4.68)
i=1
206 Computational complexity of counting and sampling

which is T P (G) evaluated at x = j + 1. Indeed, any subtree of G containing


i vertices can be extended to a subtree of Gj in (j + 1)i different ways by not
attaching or attaching one of the j auxiliary vertices to each of the vertices
in the subtree. Thus, if we can compute the number of subtrees covering S in
each graph G0 , G1 , . . . , Gn−1 , then we can calculate the coefficients of T P (G).
In particular, we can calculate ak+1 , the number of subtrees covering S and
containing k vertices. That is, containing exactly k edges.
Finally, we can prove that #Pl-Subtree is in #P-complete by reducing
#Pl-S-Subtree to it. Polynomial interpolation is applied again in the re-
duction. Let G = (V, E) be a planar graph, let S ⊆ V be a subset of vertices,
and let k denote the size of S. Generate graphs G0 , G1 , . . . Gk by attaching
0, 1, . . . , k additional vertices to each vertex in S. Then the number of subtrees
in Gj is
Xk
bi (j + 1)i (4.69)
i=0

where bi is the number of subtrees in G that cover i vertices from S. By


computing the number of subtrees in each Gj , we can evaluate the polynomial
X
kbi xi (4.70)
i=0

at k + 1 different points. This allows the estimation of the coefficients in


polynomial time. Particularly, we can compute bk , the number of subtrees in
G covering each vertex in S. This completes the proof of the following theorem.
Theorem 55. The counting problem #Pl-Subtree is in #P-complete.
#P-complete counting problems 207

4.2.9 Number of Eulerian orientations in a Eulerian graph

Problem 11.
Name: #EulerianOrientation.
Input: a Eulerian graph G = (V, E), with possible parallel edges.
Output: the number of Eulerian Orientations of G. That is, the number of
orientations of the edges such that for every vertex v ∈ V , the number of
incoming and outgoing edges of v is the same.
Milena Michail and Peter Winkler showed that counting the Eulerian ori-
entation of a graph is #P-complete [129]. They reduce the number of perfect
matchings to the number of Eulerian orientations. The reduction is the fol-
lowing. Let G = (U, V, E) be a bipartite graph. Since we are interested in the
number of perfect matchings in G, we can assume, without loss of generality,
that
P each degree P in G is greater than0 1. Let n denote |U | = |V |, let m denote
u∈U d(u) = v∈V d(v), and let m be m − 2n. We construct the following
graphs. Let G0 be the graph that amends G by adding two vertices s and t.
Each u ∈ U is connected to s with d(u) − 2 parallel edges, and each v ∈ V is
connected to t with d(v) − 2 parallel edges. Finally, let Gk denote the graph
that amends G0 by connecting s and t with k parallel edges.
It is clear that in all Eulerian orientations of Gm0 in which all edges between
s and t are oriented from t to s, all edges connecting U with s must be oriented
toward U , and all edges connecting V to t must be oriented toward t. What
follows is that all edges between U and V are oriented from V to U except
exactly one edge for each u ∈ U and v ∈ V . These edges indicate a perfect
matching between u and v. It is also easy to see that there is a bijection
between those Eulerian orientations of Gm0 and the perfect matchings in G.
Let Rj denote the number of (not necessarily balanced) orientations of G0
in which each vertex in U and V are balanced and there are exactly j edges
from the m0 ones connecting s with U that are oriented toward U and similarly,
exactly j edges from the m0 ones connecting t with V that are oriented toward
t. It is clear that |Rm0 | is the number of perfect matchings in G. Observe that
due to symmetry,
|Rj | = |Rm0 −j | (4.71)
since there is a bijection between Rj and Rm0 −j obtained by changing the
orientation of each edge.
Now, let P (Gk ) denote the number of Eulerian orientations in Gk . Then
we have that the equation
k  
X k
P (Gk ) = R m0 +k −i (4.72)
i=0
i 2

holds. Indeed, if there are i number of edges oriented from s to t, then there are
k − i edges oriented from s to t. To get a Eulerian orientation, there must be
m0 +k
2 − i edges oriented from s to U and also, there must be the same number
208 Computational complexity of counting and sampling

of edges oriented from V to t. Note that Gk is a Eulerian graph if and only


if the parity of k and m are the same. Now observe that Equation (4.72) for
k = 0, 2, . . . , m0 if m0 is even and for k = 0 0
j 1,0 3, k. . . , m if m is odd together with
m −1 0
Equation (4.71) for each j = 0, 1, . . . , 2 define m + 1 linear equations.
These equations contain m0 + 1 indeterminants, |Rj |, with j = 0, 1, . . . , m0 ,
assuming that we can compute P (Gk ) for each k. It is easy to see that this
equation system has a unique solution (the corresponding determinant is non-
zero). Therefore, each |Rj |, particularly, the number of perfect matchings in R,
can be computed in polynomial time if each P (Gk ) is obtained. This completes
the proof of the following theorem.
Theorem 56. The counting problem #EulerianOrientation is in #P-
complete.

4.3 Further reading and open problems


4.3.1 Further results
• Leslie Valiant was the first who showed that many computing prob-
lems are in #P-complete. In his paper [176], he gave a list of 14 prob-
lems which are #P-hard or #P-complete. This list includes many of the
problems we already covered in this chapter: computing the permanent,
the number of perfect matchings, the number of not necessarily per-
fect matchings, and the number of satisfying assignments of a monotone
2CNF. Further important counting problems in his lists are:
– Number of prime indications of monotone 2CNFs. The input is a
2CNF Φ built up from the set of logical variables X, and the output
is the number of subsets X 0 ⊆ X such that the implication
^
x⇒Φ (4.73)
x∈Y

holds for Y = X 0 but does not hold for any Y ⊂ X 0 . The proof is
based on reducing the number of perfect matchings to this problem
using polynomial interpolation.
– Number of minimal vertex covers. The input is a simple graph
G = (V, E), and the output is the number of subsets V 0 ⊆ V
such that V 0 covers the edge set, and it is minimal. That is, the
implication
(u, v) ∈ E ⇒ u ∈ A ∨ v ∈ A (4.74)
holds for A = V 0 but does not hold for any A ⊂ V 0 . The proof is
#P-complete counting problems 209

based on reducing the monotone prime implications to this problem


using bijection between the number of solutions.
– Number of maximal cliques. The input is a simple graph G =
(V, E), and the output is the number of subsets V 0 ⊆ V such that
V 0 is a clique, and it is maximal. That is, the implication

u, v ∈ A ⇒ (u, v) ∈ E (4.75)

holds for A = V 0 but does not hold for any A ⊃ V 0 . The proof
is based on the fact that the set of vertices in any maximal clique
is the complement of a minimal vertex cover in the complement
graph and vice versa.
– Directed trees in a directed graph. The input is a directed graph
~ = (V, E), and the output is the number of subsets E 0 ⊆ E that
G
form a rooted tree. Compare this with the fact that the number of
directed spanning trees rooted into a given vertex is easy to com-
pute! The proof is based on a series of reductions of intermediate
problems using polynomial interpolations.
– Number of s − t paths. The input is a directed graph G ~ = (V, E),
and two vertices s, t ∈ V and the output is the number of paths
from s to t. The proof is similar to proving the #P-completeness
of counting cycles in a directed graph. That is, the number of s − t
paths cannot be approximated with an FPRAS unless RP = NP.
• J. Scott Provan and Michael O. Ball proved [144] that the following
problems are in #P-complete:
– Counting the independent sets in a bipartite graph.
– Counting vertex covers in a bipartite graph.
– Counting antichains in a partial order.
– Counting the minimum cardinality s−t cuts. The input is a directed
~ = (V, E) and two vertices s, t ∈ V , and the output is the
graph G
number of subsets V 0 ⊆ V \{s, t} such that the size of V 0 is minimal,
and there is no path from s to t in G~ \ V 0.
– Computing the probability that a graph is connected.
• Nathan Linial also gave a list of hard counting problems in geometry
and combinatorics [118]. His list is the following:
– Number of vertices of a polytope. The input is a set of linear in-
equalities in the form of Ax ≤ b. Each linear inequality is a half
space, and the intersection of them defines a polytope P ⊂ Rn .
The output is the number of vertices of P . The proof is based on
reducing the number of antichains to this problem using bijection
between the number of solutions.
210 Computational complexity of counting and sampling

– Number of facets of a polytope. The input is a set of linear inequal-


ities in the form of Ax ≤ b defining a polytope P and a fixed d.
The output is the number of d-dimensional facets of P . The proof
is based on reducing the number of vertices of a polytope to this
problem using a linear equation system.
– Number of d − 1-dimensional facets.
– Number of acyclic orientations of a graph. The input is a simple
graph G, and the output is the number of orientations of G not
containing a (directed) cycle. The proof is based on reducing the
computation of the chromatic polynomial to this problem using
polynomial interpolation.
– Components of a slotted space. The input is a set of hyperplanes
Hi of a Euclidian space Rn defined by linear equations, and the
output is the number of components of {Rn \ ∪i Hi }. The proof is
based on reducing the number of acyclic orientations of a graph to
this problem finding a bijection between the solutions.
– Number of 3-colorings of a bipartite graph. The input is a bipartite
graph G, and the output is the number of proper 3-colorings of
the vertices of G. The proof is based on reducing the number of
independent sets in a bipartite graph to this problem finding a
many-to-one mapping between the solutions. See also Exercise 13.
– Number of satisfying assignments of an implicative Boolean for-
mula. The input is a Boolean formula Φ in the form
^
(xi ∨ xj ), (4.76)

and the output is the number of satisfying assignments. The proof


is based on reducing the number of antichains of a poset to this
problem finding a bijection between the number of solutions.

• Leslie Ann Goldberg and Mark Jerrum showed that counting the non-
isomorphic subtrees of a tree is in #P-complete. The proof is based on
reducing the number of matchings in bipartite graphs to this problem
using polynomial interpolation [78].
• We learned that counting the Eulerian circuits in a Eulerian directed
graph is easy. The problem becomes hard for undirected graphs. The
proof was given by Graham Brightwell and Peter Winkler [27]. The
proof is based on reducing the number of Eulerian orientations to the
number of Eulerian circuits using modulo prime number calculations.
Patric John Creed proved #P-completeness for planar graphs [48] and
Qi Ge and Daniel Štefankovič proved that counting Eulerian circuits
remains #P-complete for 4-regular planar graphs [75].
#P-complete counting problems 211

• Salil Vadhan introduced an interpolation technique that preserves sev-


eral properties of graphs like regularity or sparseness [173]. Using this
technique, he was able to prove that many hard enumeration problems
remain in #P-complete when restricted to regular and/or planar graphs,
possibly further constrained to have small constant maximum degree.
Catherine Greenhill also applied this technique to prove that counting
the proper colorings of a graph remains #P-complete even if the maxi-
mum degree is 3 [82].

4.3.2 #BIS-complete problems


The #BIS-complete problems have been introduced by Dyer and his
coworkers [57]. #BIS is the counting problem asking the number of inde-
pendent sets in a bipartite graph. It is known that #BIS is in #P-complete,
as was mentioned in the previous subsection [144]. We say that two counting
problems #A and #B are AP-interreducible if there is an approximation pre-
serving polynomial reduction from #A to #B and there is also such reduction
from #B to #A. The #BIS-complete problems are those counting problems
which are AP-interreducible with #BIS. A large number of relatively diverse
counting problems are known to be #BIS-complete, and none of them are
known to have an FPRAS. Therefore, it is conjectured that these problems
do not have FPRAS, although we cannot prove it, even conditionally, that is,
assuming that RP does not equal NP. Dyer and his coworkers gave a logical
characterization of the #BIS class, based on the wok of Saluja, Subrahmanyam
and Thakur [148], who gave a logical characterization of the #P class similar
to Fagin’s characterization of NP [67]. Dyer and his coworkers proved that
several graph isomorphism problems, the #Antichain and the #1n1p-SAT
problems, are all #BIS-complete. The counting problem #1n1p-SAT asks
the number of satisfying assignments of a conjunctive normal form in which
each clause contains at most one negated literal and at most one non-negated
literal. The relative complexity of counting graph homomorphisms is a current
hot research topic [71].

4.3.3 Open problems


The following problems are suspected to be in #P-complete. The corre-
sponding decision/optimization problems are known to be solvable in polyno-
mial time.

• Number of realizations of degree sequences. A graph G is a realization


of a degree sequence D = d1 , d2 , . . . , dn , di ∈ Z+ , if the degrees of the
vertices of G are exactly D. Polynomial running time algorithms exist
to decide if a degree sequence D has a realization [63, 92, 88].
• Number of most parsimonious DCJ scenarios. In the DCJ (double
212 Computational complexity of counting and sampling

cut and join) model, genomes are represented as edge-labeled directed


graphs consisting of (not necessarily directed) paths and cycles. A DCJ
operation takes at most two vertices and combines the edge ends in it
into at most two new vertices. Finding the minimum number of DCJ
operations necessary to transform a genome into another can be done in
polynomial time [185, 17]. The number of solutions can be computed in
polynomial time in special cases [138, 127], but this counting problem
is conjectured to be in #P-complete in the general case.

• Number of most parsimonious reversal scenarios. A signed permutation


is a permutation of numbers 1, 2, . . . , n together with a sign assigned to
each number. A reversal flips a consecutive part of the permutation and
also changes the signs of the numbers. A single number can be reverted,
and in that case, only its sign is changed. Hannenhalli and Pevzner
gave the first polynomial running time algorithm to find the minimum
number of reversals necessary to transform a signed permutation into
+1, +2, . . . + n [89]. In spite of the enormous work on faster algorithms
to find a solution [107, 11, 166] and on exploring the solution space
[4, 154, 23, 165], there is no polynomial running time algorithm that
could count the most parsimonious reversal scenarios.

• Number of triangulations of a general polygon. We learned that com-


puting the number of triangulations of a convex polygon is easy, and in
fact, the number of triangulations of a convex n-gon is Cn−2 , the n − 2nd
Catalan number. However, there is no known polynomial time algorithm
to compute the number of triangulations of a general simple polygon,
although any simple polygon can be triangulated in linear time [40, 7].
• Number of evolutionary scenarios in perfect phylogenies. Let S be a
set of sequences of the same length from the alphabet {0, 1}. Let Si
be the subset of sequences that contain 1 at position i. We say that S
has a perfect phylogeny [90, 87] if for any Si and Sj , the intersection of
the two subsets is empty or one of the subsets contains the other. An
evolutionary scenario is a series of events, and each event is one of the
following two types:
(a) Substitution. A substitution changes a character in one of the se-
quences in one of the positions.
(b) Coalescent. Two identical sequences are merged into one.
It is easy to see that in case of perfect phylogeny, there is at most one
substitution in each position. Finding an evolutionary scenario with a
minimum number of events is easy. However, it seems to be hard to
compute how many shortest evolutionary scenarios exist.
#P-complete counting problems 213

4.4 Exercises
1. Generate a 3CNF Φ for each of these logical expressions such that Φ has
the same number of satisfying assignments as the logical expressions.

(a) * x1 → ((x2 ∧ x3 ) ∨ (x2 ∧ x4 ))


(b) ◦ (x1 ↔ x2 ) ∧ (x1 → x3 )
(c) (x1 ∧ x2 ) → (x3 ∨ x4 )

2. * Prove that it can be decided in polynomial time if a bipartite graph


has an even number of perfect matchings.
3. Construct the directed graph isomorphic to a junction defined by Valiant
in the proof of #P-completeness of the permanent. Its weighted adja-
cency matrix is in Equation (4.8).
4. Find the cycles picking up 0, 1 and 2 junctions + the 2 internal junctions
in an interchange. See Figure 4.3
5. Prove that there is no 3 × 3 matrix M that satisfies the following prop-
erties:
(a) per(M ) = 0
(b) per(M [1; 1]) = 0,
(c) per(M [3; 3]) = 0,
(d) per(M [1, 3; 1, 3]) = 0,
(e) per(M [1; 3]) = per(M [3; 1]) = non-zero constant.

6. Prove that there is no 4×4 matrix M satisfying the following conditions:


(a) det(M ) = 0
(b) det(M [1; 1]) = 0,
(c) det(M [4; 4]) = 0,
(d) det(M [1, 4; 1, 4]) = 0,
(e) det(M [1; 4]) = det(M [4; 1]) = non-zero constant.

7. ◦ Prove that #DNF remains a hard computational problem if it is


restricted to monotone disjunctive normal forms. Namely, prove that
#Mon-DNF is in #P-complete.
8. * Prove the #P-completeness of #Mon-2SAT by finding a parsimo-
nious reduction from #Matching to it.
214 Computational complexity of counting and sampling

9. Compute the number of linear extensions of the poset in Figure 4.7.

10. ◦ The Steiner tree problem is to find the smallest subtree of a graph
G = (V, E) that covers a subset of vertices S ⊆ V . Show that the
problem is NP-hard even if the problem is restricted to planar, 3-regular
graphs.
11. ◦ Prove that the number of Eulerian orientations remains in #P-
complete even if it is restricted to simple Eulerian graphs.
12. Prove that if V 0 is a minimal vertex cover in G = (V, E), then V \ V 0
forms a maximal clique in G, the complement of G.
13. * Prove the #P-completeness of the number of 3-colorings of a planar
graph by reducing the number of independent sets of a planar graph to
it.
14. The Erdős-Gallai theorem is the following. Let D = d1 ≥ d2 ≥ . . . ≥ dn
be a degree sequence. Then D is graphical (has at least one simple graph
G whose degrees are exactly D) if and only if the sum of the degrees is
even and for all k = 1, 2, . . . , n
k
X n
X
di ≤ k(k − 1) + min{k, dj }. (4.77)
i=1 j=k+1

Prove the necessary direction of the Erdős-Gallai theorem. That is, if


D is graphical, then the sum of the degrees is even and for all k =
1, 2, . . . , n, the Erdős-Gallai inequalities (Equation (4.77)) hold.

4.5 Solutions
Exercise 1.
(a) The Boolean circuit of the function is

∧ ∧

˜
x1 x2 x3 x4
#P-complete counting problems 215

Introduce new variables for the internal nodes of the circuit. The 3CNF
we are looking for is:

(x2 ∨ x3 ∨ y1 ) ∧ (x2 ∨ x3 ∨ y1 ) ∧ (x2 ∨ x3 ∨ y1 ) ∧ (x2 ∨ x3 ∨ y 1 ) ∧


(x2 ∨ x4 ∨ y2 ) ∧ (x2 ∨ x4 ∨ y2 ) ∧ (x2 ∨ x4 ∨ y2 ) ∧ (x2 ∨ x4 ∨ y 2 ) ∧
(y 1 ∨ y 2 ∨ y3 ) ∧ (y1 ∨ y 2 ∨ y 3 ) ∧ (y 1 ∨ y2 ∨ y 3 ) ∧ (y1 ∨ y2 ∨ y 3 ) ∧
(x1 ∨ y 3 ∨ O) ∧ (x1 ∨ y 3 ∨ O) ∧ (x1 ∨ y3 ∨ O) ∧ (x1 ∨ y3 ∨ O) ∧
(O ∨ O0 ∨ O”) ∧ (O ∨ O0 ∨ O”) ∧ (O ∨ O0 ∨ O”) ∧ (O ∨ O0 ∨ O”) ∧
(O ∨ O0 ∨ O”) ∧ (O ∨ O0 ∨ O”) ∧ (O ∨ O0 ∨ O”)

(b) Observe that the logical form x1 ↔ x2 can be described in 3CNF as

(x1 ∨ x2 ∨ y) ∧ (x1 ∨ x2 ∨ y) ∧ (x1 ∨ x2 ∨ y) ∧ (x1 ∨ x2 ∨ y),

where y is the outcome of x1 ↔ x2 .


Exercise 2. Observe that for any square matrix A,

per(A) ≡ det(A) mod (2).

Let A be the adjacency matrix of the bipartite graph G. Then, per(A) is the
number of perfect matchings, however it is det(A) modulo 2. Since the deter-
minant can be calculated in polynomial time, it can be decided in polynomial
time if G contains an even or an odd number of perfect matchings.
Exercise 7. Observe that monotone satisfiability is in #P-complete, and in
fact, restricted to #Mon-2SAT, the counting problem is still in #P-complete.
The complement of a monotone 2CNF can be expressed as a monotone 2DNF.
Exercise 8. Let G = (U, V, E) be a bipartite graph with n vertices on both
vertex classes. The 2CNF Φ that has as many satisfying assignments as the
number of matchings in G contains a logical variable xi,j for each (ui , vj ) ∈ E,
and it is defined as
   
^n ^ n ^
^
Φ :=  (xi,j1 ∨ xi,j2 ) ∧  (xi1 ,j ∨ xi2 ,j ) . (4.78)
i=1 j1 6=j2 j=1 i1 6=i2

Indeed, the 2CNF forces there to be at most one edge on each vertex. It
should be clear that Φ in Equation (4.78) is a monotone function of the logical
variables yi,j = xi,j . Therefore, we get that #Mon-2SAT is in #P-complete.
Exercise 10. Let G be a planar, 3-regular graph. Replace each vertex with
the gadget in Figure 4.12. Let this graph be denoted by G0 , and let S be the
set of vertices containing the v0 vertices of the gadgets. Show that from the
solution of the Steiner tree problem of G0 and S, it could be decided if G
contains a Hamiltonian path.
Exercise 11. Replace each edge with a path of length 3 (containing two
internal vertices).
216 Computational complexity of counting and sampling

Exercise 13. Reduce #BIS, the number of independent sets in bipartite


graphs to this problem. Let G = (U, V, E) be a bipartite graph. Extend it
with two vertices a and b, and connect a to V ∪ {b} and connect b to U ∪ {a}.
The so-obtained G0 is also a bipartite graph. Consider its 3-colorings. Then
without loss of generality, a has color 1 and b has color 2. Then the vertices
having color 3 form an independent set in G. Similarly, every independent
set of G indicates a good coloring of G0 . We obtain that the number of 3-
colorings of G0 is 6 times the number of independent sets of G. Indeed, there
are 6 possible good colorings of vertices a and b.
Chapter 5
Holographic algorithms

5.1 Holographic reduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 218


5.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.2.1 #X-matchings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
5.2.2 #Pl-3-(1,1)-Cyclechain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 228
5.2.3 #Pl-3-NAE-ICE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
5.2.4 #Pl-3-NAE-SAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 234
5.2.5 #7 Pl-Rtw-Mon-3SAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 236
5.3 Further results and open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
5.3.1 Further results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
5.3.2 Open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 241
5.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 242
5.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

The term holographic reduction was coined by Leslie Valiant. A holographic


reduction is a many-to-many mapping between two finite sets, such that the
sum of the fragments of elements in one set is mapped onto the sum of the
fragments of elements in another set. However, the mapping is such that it
proves the equality between the cardinality of the two sets. When the two sets
are the solutions of problem instances in two different counting problems, the
holographic reduction can prove that the two counting problems have the same
computational complexity. If one of the problems can be solved in polynomial
time, so can the other problem be solved. Similarly, the #P-completeness of
one of the problems proves the #P-completeness of the other problem.
In classical computational complexity, the reductions are one-to-one or
possibly many-to-one or one-to-many. An example of one-to-one mapping is
the mapping of the satisfying assignments of a CNF Φ to the satisfying as-
signments of a 3CNF Φ0 , see Subsection 4.1.1. This mapping proves the #P-
completeness of #3SAT (also, the 3CNF Φ0 to the corresponding CNF Φ must
be polynomial time computable). An example of a many-to-one reduction is
the mapping of Eulerian circuits of a directed Eulerian graph G ~ onto the ar-
borescences of the same graph, see Section 3.3. This mapping proves that the
number of Eulerian circuits of a directed Eulerian graph can be computed in
polynomial time since the number of arborescences of a directed graph also
can be computed in polynomial time and each arborescence in the mapping
appears in the same number of times, xG ~ , where xG~ is also a polynomial time

217
218 Computational complexity of counting and sampling

computable number depending only on G ~ and not on the arborescence. The


mapping of satisfying assignments of a 3CNF Φ onto the cycle covers of a
weighted graph is a one-to-many mapping proving the #P-hardness of com-
puting the permanent. Here a satisfying assignment is mapped onto a set of
cycle covers whose summed weight is a constant xΦ , which is polynomial time
computable and depends on only Φ and not on the satisfying assignment.
In holographic reductions, the correspondences between individual solu-
tions are no longer identifiable. This is a dramatically new computational
primitive that has not appeared even implicitly in previous works in computer
science. In his seminal paper [178], Valiant gave a list of computational prob-
lems solvable efficiently via holographic reductions. The introduced problems
might look artificial and might be of little interest, however, all the intro-
duced problems are in proximity of hard computational problems and cannot
be solved efficiently without holographic reductions. The intriguing question
is why the introduced problems are easy to compute and why the look-alike
problems fall into the #P-complete class. Answering this question might be
as hard as resolving the P vs. NP problem. As Valiant concluded, any P 6=
NP proof may need to explain, and not only imply, the absence of holographic
reductions between polynomial time computable and #P-complete counting
problems.
The theory of holographic reductions has been developed considerably in
the recent years. Many counting problems have been proved to be in #P-
complete via holographic reductions. Jin-Yi Cai and his coworkers developed
dichotomy theories for counting problems. They published a comprehensive
book on it [32], and interested readers are referred to that. In this book, we
give only a brief introduction of the theory and a few simple examples.

5.1 Holographic reduction


Holographic reduction is a linear transformation between the solutions of
a problem instance of a counting problem and the weighted perfect matchings
of a planar graph. The planar graph consists of gadgets called matchgates
and connecting edges. There are two types of matchgates, recognizers and
generators. The generator matchgates have some distinguished nodes called
output nodes, while the recognizers have some distinguished nodes called input
nodes. One might consider a third type of matchgate, a transducer matchgate
that has both input and output nodes, however, such mixed matchgates are
not considered here. Any connecting edge connects an output vertex of a
generator to an input vertex of a recognizer. Clearly, input and output vertices
must be outer vertices in the planar embedding of matchgates. The number
of input or output vertices is the arity of the matchgate. The entire graph is
called a matchgrid.
Holographic algorithms 219

Only the edges in the matchgates have non-trivial weights, and the weight
of any edge in C is 1. The weights come from an arbitrary field, or even more
generally, from an arbitrary commutative ring. Let G denote the weighted
graph. We define X Y
Z(G) := w(e) (5.1)
M ∈P M (G) e∈M

where P M (G) denotes the set of perfect matchings, and w is the weight func-
tion on the edges. Recall that computing this number, that is, the sum of
weights of perfect matchings in a planar graph, needs only a polynomial num-
ber of ring operations for any commutative ring, see Theorems 41 and 31.
We define the standard signature of a matchgrid. Let X be a (generator
or recognizer) matchgate. Let ei1 , ei2 , . . . , eil be the edges that are incident to
the input or output vertices of X. For any subset F of these edges, let X \ F
denote the graph obtained from X in which the vertices incident to any edge
in F are omitted. We define the index of F as
X
ind(F ) := 1 + 2j−1 .
eij ∈F

The standard signature of X is a 2l dimensional vector, for which the entry


at index ind(F ) is Z(X \ F ).
We are going to rearrange the summation in Equation (5.1). Let X de-
note the set of generator matchgates, let Y denote the set of the recognizer
matchgates, and let C denote the edges of G that connect the output nodes of
the generator matchgates with the input nodes of the recognizer matchgates.
Furthermore, for any F ⊆ C and Xi ∈ X (respectively, Yj ∈ Y ), let Xi \ F
(respectively, Yj \F ) denote the planar graph in which vertices incident to any
edge in F are omitted. We can partition the perfect matchings in G based on
which edges coming from C are used in the perfect matching. Therefore we
get that X Y Y
Z(G) = Z(Xi \ F ) Z(Yj \ F ). (5.2)
F ∈2C Xi ∈X Yj ∈Y

Observe that we can look at Z(G) as a tensor of the standard signature vectors
of the matchgates. Indeed, it is easy to see that Z(G) is a linear function for
each standard signature vector. The holographic reduction is a base change of
this tensor.
For readers not familiar with tensor algebra, we derive a linear algebraic
description which is more tedious, but does not need background knowledge
of base changes in tensor algebra. Observe that the right-hand side of Equa-
tion (5.2) is the dot product of two vectors of dimension 2|C| . We can explicitly
define these vectors in the following way. We are looking for the vectors in the
forms
g := (g1 , g2 , . . . , g2|C| )
and
rT := (r1 , r2 , . . . , r2|C| ).
220 Computational complexity of counting and sampling

That is, we are looking for a row vector g and a column vector r. We give
an indexing of the edges in C in such a way that first we consider the edges
of the last generator matchgate, then the edges of the next to last generator
matchgate, etc. This implies an indexing of the subsets of C. Any subset F
of C has a membership vector s = (s1 , s2 , . . . s|C| ), in which sm is 1 if the
member of C with index m participates in F and otherwise 0. Then the index
of F is defined as
|C|
X
1+ sm 2m−1 .
m=1

Let F be the subset with index k. Then


Y
gk := Z(Xi \ F ) (5.3)
Xi ∈X

and Y
rk := Z(Yj \ F ). (5.4)
Yj ∈Y

A holographic reduction is simply the following. Let B be a matrix and g 0


be a vector such that
g 0 B = g. (5.5)
Then obviously,
Z(G) = g 0 Br = g 0 (Br). (5.6)
That is, the holographic map of the weighted perfect matchings is the dot
product of g 0 and Br. We are looking for those matrices B for which this dot
product has a combinatorial meaning. After heavy algebraic considerations,
we will arrive at a picture that the recognizer and generators communicate via
the possible configurations (presence or absence of edges) of C and compute
a value which is the value of a solution of a problem instance represented by
the current configuration.
We are seeking for B as tensor products of two vectors of dimension 2,
n = (n1 , n2 ) and p = (p1 , p2 ). Recall that the tensor product of two vectors,
v = (v1 , v2 , . . . , vn ) and w = (w1 , w2 , . . . , wm ) is

v ⊗ w = (v1 w1 , v1 , w2 , . . . v1 wm , v2 w1 , . . . v2 wm , . . . , vn wm ).

Observe that the tensor product is associative but not commutative. If


|C|
X
k =1+ sm 2m−1
m=1

then the k th row of B is the tensor product whose mth factor is p if sm = 1


and n if sm = 0.
The consequence of constructing B in this way is that the vectors g 0 and
Holographic algorithms 221

Br can be directly computed via the matchgates, and the computation we


have to perform on a matchgate is independent from other matchgates.
First, we introduce the computation for recognizer matchgates. Let Yj be a
recognizer matchgate, let {ei1 , ei2 , . . . , eilj } be the edges incident to the input
vertices of Yj and let uj be its standard signature. Let z be a 0-1 vector of
dimension |C|. Let bj be the tensor product of lj vectors whose mth factor is
p if the ith th
m coordinate of z is 1 and n if the im coordinate of z is 0. Then we
define
V alR(Yj , z) := uj bj ,
that is, the inner product of uj and bj . We can now state the main theorem
for recognizer gates.

Theorem 57. Let z be a 0-1 vector of dimension |C|, and let


|C|
X
k := 1 + zm 2m−1 .
m=1

Then the k th coordinate of Br is


Y
V alR(Yj , z). (5.7)
Yj ∈Y

Proof. Observe that the k th coordinate of Br is the inner product of b and r,


where b is the tensor product of |C| vectors in which the mth factor is p if the
mth coordinate of z is 1 and n if the mth coordinate of z is 0. Therefore it is
sufficient to show that Y
br = V alR(Yj , z).
Yj ∈Y

On the left-hand side, we have the sum


|C|
2
X Y
bi Z(Yj \ Fi ) (5.8)
i=1 Yj ∈Y

where Fi is the subset of C with index i, where index i is defined for F as


above. Each factor on the right-hand side is also an inner product, that is, the
right-hand side is
lj
Y X
Z(Yj \ Ftj )bjtj . (5.9)
Yj ∈Y tj =1
P
Observe that tCj = C and thus lj = |C|. That is, if we factorize the product
in Equation (5.9) we get exactly 2|C| terms that correspond to the 2|C| terms
in Equation (5.8). Indeed, for any i, Fi can be unequivocally factorized as
tFtj such that for each Yj , Yj \ Fi = Yj \ Ftj and bi = bjtj .
Q
222 Computational complexity of counting and sampling

Similar computations can be developed for generators. Let Xi be a


generator with outgoing vertices vj1 , vj2 , . . . , vjli corresponding to edges
ej1 , ej2 , . . . , ejli incident to these edges. Let ui be the standard signature of
Xi . Let b1 , b2 , . . . , b2li be vectors such that for
li
X
t=1+ sm 2mi
m=1

bt is the tensor product of li vectors such that the mth factor is p if sm = 1


and n if sm = 0. Let wi = (w1i , w2i , . . . , w2i li ) be the vector satisfying
l
2i
X
i
ui = wm bm . (5.10)
m=1

Then we define
i
V alG(Xi , bm ) := wm . (5.11)
For these values the following theorem holds.
Theorem 58. Let
|C|
X
k =1+ sm 2m−1 .
m=1

For each generator Xi , we define bi,k as a tensor product of li vectors, such


that the tth factor is p if sjt = 1 and p if sjt = 0, where jt is the index of the
edge incident to the output vertex vjt . Then the k th coordinate of g 0 is
Y
V alG(Xi , bi,k ). (5.12)
Xi ∈X

Proof. In an analogous way to constructing B, for each generator Xi , we can


define a matrix B i whose rows are tensor products of p and n. Observe the
following.
(a) The relationship between the standard signature and the vector g is that

g = u1 ⊗ u2 ⊗ . . . ⊗ u|X| .

(b) For each Xi , it holds that

ui = wi B i .

(c) We can construct B as

B = B 1 ⊗ B 2 ⊗ . . . ⊗ B |X| .
Holographic algorithms 223

Then the remainder of the proof is simply applying the basic properties
of the tensor product. Indeed,

g = u1 ⊗ u2 ⊗ . . . ⊗ u|X| = (w1 B 1 ) ⊗ (w2 B 2 ) ⊗ . . . × (w|X| B |X| ) =


(w1 ⊗ w2 ⊗ . . . ⊗ w|X| )(B 1 ⊗ B 2 ⊗ . . . ⊗ B |X| ) =
(w1 ⊗ w2 ⊗ . . . ⊗ w|X| )B. (5.13)

That is, we get that

g 0 = (w1 ⊗ w2 ⊗ . . . ⊗ w|X| ). (5.14)

Observe that this is exactly our claim, that is, the k th coordinate of g 0
i
is the appropriate product of wm i
terms.

For any F ⊆ C, we define z(F ) to be the membership vector of F , and


for any F ⊆ C and Xi ∈ X whose output vertices are incident to edges
ej1 , ej2 , . . . , ejli , we define bi (F ) as the tensor product of li . The k th factor in
the tensor product is p if ejk ∈ F and n if ejk ∈ / F . We get that
X Y Y
Z(G) = g 0 (BrT ) = V alG(Xi , bi (F )) V alR(Yi , z(F )). (5.15)
F ⊆C Xi ∈X Yi ∈Y

Valiant denoted a matchgrid with Ω, and defined the Holant of a matchgrid


as
X Y Y
Hol(Ω) := V alG(Xi , bi (F )) V alR(Yi , z(F )). (5.16)
F ⊆C Xi ∈X Yi ∈Y

We arrived at the key theorem of the holographic reductions.


Theorem 59. Let Ω = (X, Y, C) be a matchgrid, and let G be the edge-
weighted graph building Ω. Then for any base,

Z(G) = Hol(Ω). (5.17)

A holographic algorithm builds a matchgrid whose Holant in some base


is the solution of the problem instance. If the matchgrid is planar, then the
partition function of its underlying weighted graph can be computed in poly-
nomial time, thus, the problem can be solved in polynomial time (given that
the construction of the matchgrid can be done in polynomial time). In the
next section, we give computational problems solvable in polynomial time
using holographic algorithms.
224 Computational complexity of counting and sampling

V2 5 V1

−1 3

V1 −4 V2

FIGURE 5.1: An edge-weighted bipartite planar graph as an illustrative


example of the #X-matchings problem. See text for details.

5.2 Examples
5.2.1 #X-matchings
Recall that counting the not necessarily perfect matchings even in planar
graphs is #P-complete, see Subsection 4.2.4. On the other hand, the following
problem is in FP.
Problem 12.
Name: #X-matchings.
Input: a planar bipartite graph, G = (V1 , V2 , E), where each vertex in V1 have
degree 2, and a weight function w : E → R.
Output: the sum of the weights of matchings, where a weight of a matching
consists of the product of the weights of the edges participating in the match-
ing and also the product of −1 times the sum of the edge weights incident to
each vertex in V2 not covered by any edge of the matching.
We give an illustrative example in Figure 5.1. There are two vertices in the
class V1 and also two vertices in V2 . There are two perfect matchings with the
scores −3 and −20. There are five imperfect matchings, the empty matching
and the matchings containing only one edge. The empty matching has the
score (−(−1 + 5))(−(−4 + 3)) = −4. The other four matchings have weights
5, −1, −12 and +16. Thus the value to be computed for this problem instance
is −3 − 20 − 4 + 5 − 1 − 12 + 16 = −19.
The holographic algorithm transforms any edge-weighted, bipartite, planar
graph G into a matchgrid. In the matchgrid, edges of G are represented with
the edges in C. Each vertex in V1 is replaced with a generator matchgate.
Each vertex in V2 is replaced with a recognizer matchgate. The weights for
the edges in G will appear in the recognizer matchgates. The matchgates are
Holographic algorithms 225

constructed in such a way that for each F ⊆ C,


Y Y
V alG(Xi , bi (F )) V alR(Yi , z(F ))
Xi ∈X Yi ∈Y

is the weight of the corresponding matching in G. We will use the base n =


(−1, 1), p = (1, 0).
The generator matchgate contains two output vertices connected with an
edge with weight −1. Its standard signature is (−1, 0, 0, 1). It is easy to see
that
(−1, 0, 0, 1) = n ⊗ n + n ⊗ p + p ⊗ n. (5.18)
That is, the signature of this matchgate in the given basis is (1, 1, 1, 0). This
is what we would like to get. Indeed, we would like to have a factor 1 for all
cases where at most one edge is incident to a vertex in V1 .
Each vertex in v ∈ V2 is replaced with a recognizer matchgate Y . The
recognizer matchgate is a star tree. The number of leaves is the degree of v
and the weights of the edges of the matchgate are the weights of the edges
incident to v.
The standard signature of such a matchgate is wi for those coordinates
corresponding the cases where all the input vertices are omitted except the
vertex incident to the edges with weight wi . All other coordinates are 0. It is
easy to see that the tensor product

n ⊗ n ⊗ ... ⊗ n

is −1 at those coordinates where the standard signature of the matchgate is


wi . Therefore, when z is the all-0 vector, then
X
V alR(Y, z) = − wi .
i

The following two observations are also easy to see. If z is the all-0 vector,
except its value is 1 in position i, then

V alR(Y, z) = wi ,

and if z contains at least two 1s, then

V alR(Y, z) = 0.

These are exactly the values what we would like to get. Indeed, if a vertex
v is not covered by an edgeP in the matching, then its contribution to the
score of the matching is − i wi . If it is incident to an edge in the matching,
and the edge has weight wi , then its contribution is wi . We do not consider
configurations when v is incident to more than one edge in the matching.
We get that our example problem instance can be solved with the match-
grid in Figure 5.2. As an edge-weighted graph G, it is an even cycle, thus
226 Computational complexity of counting and sampling

5 e4 X2
Y1
−1 −1

e1 e3

−1 3
X1 e2 −4 Y2

FIGURE 5.2: The matchgrid solving the #X-matching problem for the
graph in Figure 5.1. The edges labeled by ei belong to the set of edges C,
and are considered to have weights 1. See the text for details.

it has two perfect matchings with weights −15 and −4. That is, its parti-
tion function Z(G) is −19, just the solution of the #X-matchings problem
instance. Observe that the relationship between the 7 matchings in the #X-
matching problem and the 2 perfect matchings of the underlying matchgrid
can no longer be identified.
To give a detailed computation of the holographic reduction, we also com-
pute g, r, g 0 and Br for this particular problem instance. Both generators
have standard signature (−1, 0, 0, 1). Vector g is their tensor product, that is

g = (−1, 0, 0, 1) ⊗ (−1, 0, 0, 1) =
(1, 0, 0, −1, 0, 0, 0, 0, 0, 0, 0, 0, −1, 0, 0, 1).

The recognizer Y1 has standard signature (0, 3, −4, 0), the recognizer Y2 has
standard signature (0, 5, −1, 0). However, edges e1 and e4 are incident to the
input vertices of Y1 and edges e2 and e3 are incident to the input vertices of
Y2 . Therefore, the coordinates of the tensor product of the standard signatures
have to be permuted to follow the order of different subsets of C. Thus, the
recognizer vector is

rT = (0, 0, 0, 15, 0, −20, 0, 0, 0, 0, −3, 0, 4, 0, 0, 0).

The scalar product gr is indeed −19. Matrix B representing the base trans-
formation is:
Holographic algorithms 227
 
1 −1 −1 1 −1 1 1 −1 −1 1 1 −1 1 −1 −1 1

 −1 1 1 −1 1 −1 −1 1 0 0 0 0 0 0 0 0 


 −1 1 1 −1 0 0 0 0 1 −1 −1 1 0 0 0 0 


 1 −1 −1 1 0 0 0 0 0 0 0 0 0 0 0 0 


 −1 1 0 0 1 −1 0 0 1 −1 0 0 −1 1 0 0 


 1 −1 0 0 −1 1 0 0 0 0 0 0 0 0 0 0 


 1 −1 0 0 0 0 0 0 −1 1 0 0 0 0 0 0 


 1 −1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 


 −1 0 1 0 1 0 −1 0 1 0 −1 0 −1 0 1 0 


 1 0 −1 0 −1 0 1 0 0 0 0 0 0 0 0 0 


 1 0 −1 0 0 0 0 0 −1 0 1 0 0 0 0 0 


 −1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 


 1 0 0 0 −1 0 0 0 −1 0 0 0 1 0 0 0 


 −1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 

 −1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 
1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
It is easy to verify that

(Br)T = (−4, 5, −12, 15, 16, −20, 0, 0, −1, 0, −3, 0, 4, 0, 0, 0).

The signature of the generators in the new base is (1, 1, 1, 0). The vector g 0 is
their tensor product, that is

g 0 = (1, 1, 1, 0) ⊗ (1, 1, 1, 0) =
(1, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0).

It is easy to see that g 0 B is indeed g. Finally, the non-zero terms in the scalar
product of g 0 and Br are

−4, 5, −12, 16, −20, −1, 3,

which indeed correspond to the scores of the 7 matchings of our example


#X-matching problem instance.
Finally, it should be clear that the holographic reduction is used only to
prove that the partition function of the weighted graph corresponding to the
matchgrid is exactly the solution of the computing problem that we solve via
holographic reduction. The signature vectors g and r have dimension 2|C| ,
that is, they clearly grow exponentially with the problem size. Instead of
performing the computations directly via the holographic transformation, the
holographic algorithm instead computes a Pfaffian orientation of the weighted
graph corresponding to the matchgrid, and it computes the Pfaffian of the ori-
ented, weighted adjacency graph using a polynomial running time algorithm,
see Theorem 31.
A possible Pfaffian orientation is when e1 is oriented clockwise, and all
other edges anti-clockwise. If we start numbering the vertices anticlockwise
starting with the vertex of X1 incident to e1 , then the Pfaffian of the oriented,
228 Computational complexity of counting and sampling

weighted adjacency matrix is


 
0 −1 0 0 0 0 0 0 0 1
 1 0 1 0 0 0 0 0 0 0 
 
 0 −1 0 −4 0 0 0 0 0 0 
 
 0 0 4 0 3 0 0 0 0 0 
 
 0 0 0 −3 0 1 0 0 0 0 
 = −19.
Pf 
 0
 0 0 0 −1 0 −1 0 0 0 

 0 0 0 0 0 1 0 1 0 0 
 
 0
 0 0 0 0 0 −1 0 5 0 

 0 0 0 0 0 0 0 −5 0 −1 
−1 0 0 0 0 0 0 0 1 0

5.2.2 #Pl-3-(1,1)-Cyclechain
A cycle-chain cover of a graph G = (V, E) is a subgraph G0 = (V, E 0 ),
E ⊆ E, such that all components in G0 are cycles or paths. Let CC denote
0

a cycle-chain cover, let c(CC) denote the number of cycles in CC, let p(CC)
denote the number of paths in CC, and let C(G) denote the set of cycle covers
of G. Then the (x, y)-cycle-chain sum is defined as
X
xc(CC) y p(CC) .
CC∈C(G)

It is known that counting the number of Hamiltonian cycles in planar 3-regular


graphs is #P-complete [119]. Therefore, computing the (0, k)-cycle-chain sum
for arbitrary k is also #P-hard, see also Exercise 7. On the other hand, the
following cycle-chain sum problem can be solved in polynomial time.
Problem 13.
Name: #Pl-3-(1,1)-Cyclechain.
Input: a planar 3-regular graph G.
Output: the number of cycle-chains in G.
We would like to generate a matchgrid such that for its corresponding weighted
graph G0 , it holds that Z(G0 ) is the number of cycle chains in G. We add a
vertex to the middle of each edge to transform the planar graph into a planar
bipartite graph. We would like to transform these new vertices into generator
matchgates such that in some base, they have signature (1, 0, 0, 1). That is,
the absence of both new edges in set C incident to the output vertices of the
matchgate correspond to the absence of the edge in G, while the presence of
both these new edges in C corresponds to the presence of the edge in G.
We would like to transform any vertex with degree 3 into a matchgate
with signature (0, 1, 1, 1, 1, 1, 1, 0). That is, one or two of its input nodes are
incident to presented edges in C.
We choose the base n = (1, 1), p = (1, −1). A possible generator matchgate
is
Holographic algorithms 229

1 2 2

since its standard signature is (2, 0, 0, 2), and indeed,

n ⊗ n + p ⊗ p = (1, 1, 1, 1) + (1, −1, −1, 1) = (2, 0, 0, 2).

A possible recognizer matchgate is

− 14
−1 −1

− 14
− 14

−1

where the outer vertices are the 3 input vertices. It is easy to see that its
T
standard signature is 34 , 0, 0, − 14 , 0, − 14 , − 14 , 0 and its signature in the given
base is
  3 
n⊗n⊗n 4
 p ⊗ n ⊗ n  0 
  
 n ⊗ p ⊗ n  0 
  1 
 p ⊗ p ⊗ n  − 
 4 
 n ⊗ n ⊗ p  0  =

  
 p ⊗ n ⊗ p   −1 
   41 
 n ⊗ p ⊗ p  − 
4
p⊗p⊗p 0
  3   
1 1 1 1 1 1 1 1 4 0
1 1
 1 1 −1 −1 −1 −1  0   1 
   
 1 1 −1 −1 1 1 −1 −1  0   1 
   

 1  

 1 1 −1 −1 −1 −1 1 1  −4  =  1  .


 1 −1 1 −1 1 −1 1 −1  0   1 
    
 1 −1 1 −1 −1 1 −1 1   − 1   1 
   41   
 1 −1 −1 1 1 −1 −1 1   − 4   1 
1 −1 −1 1 −1 1 1 −1 0 0

To give an example, consider the simplest 3-regular graph, K4 . Its set of


cycle-chain covers consists of

(a) 3 perfect matchings,


230 Computational complexity of counting and sampling

(b) 3 Hamiltonian cycles,

(c) 12 Hamiltonian paths,


that is, altogether 18 cycle-chain covers.
Its matchgrid consists of 4 recognizer matchgates and 6 generator match-
gates, altogether 40 vertices. The recognizers and the generators are connected
with 12 edges belonging to set C. When none of the edges in C participates
in a perfect matching:
Y3

X3

X6 X5
Y4

X2

X1

Y1 X4 Y2

there are 81 perfect matchings: 3 perfect matchings in each recognizer match-


gate and 1-1 perfect matchings in the generator matchgates, making 34 = 81
combinations. Each of the global perfect matchings have a score 14 . There are
perfect matchings when exactly two of the generators have no edges in C in-
cident to both of their output vertices. These 4 missed edges must be in this
configuration
Y3

X3

X6 X5
Y4

X2

X1

Y1 X4 Y2

or the other two of them symmetric to it. (We can say that they are related
to the perfect matchings of K4 .) Each of them has a score 14 .
When exactly 3 of the generators have edges in C incident to both of their
output vertices, they must be in this configuration:
Holographic algorithms 231
Y3

X3

X6 X5
Y4

X2

X1

Y1 X4 Y2

or the other 3 of them symmetric to this to have perfect matchings. (We can
say that they are related to the triangles in K4 .) In each of these 4 cases, there
are 3 perfect matchings, each of them having a score − 14 .
There are no other subsets of the set C for which there are perfect match-
ings, since any generator matchgate must have either 0 or 2 edges in C incident
to its output vertices and any recognizer matchgate must have either 0 or 2
edges in C incident to its input vertices.
To summarize, there are 96 perfect matchings in the matchgrid, 84 having
score 14 and 12 having score − 1
 4 . Thus, the partition function of the matchgrid
1 1
is indeed 84 × 4 + 12 × − 4 = 18, the number of cycle-chain covers of K4 .

5.2.3 #Pl-3-NAE-ICE
The so-called “ice” problems are considered in statistical physics. They are
orientation problems. That is, the input is an unoriented graph G, and the
solutions are assignments of a direction to each of its edges satisfying some
constraint. We are interested in the number of such orientations. In its initial
work, Linus Pauling proposed to count the orientations of a planar squared
lattice, where each vertex has two incoming and two outgoing edges [141].
We know that counting the Eulerian orientations remains #P-complete even
restricted to planar graphs [49]. On the other hand, the following problem can
be solved in polynomial time using holographic reduction.
Problem 14.
Name: #Pl-3-NAE-ICE.
Input: a planar graph G with maximum degree 3.
Output: the number of orientations of the edges of G such that no vertex has
all incoming or all outgoing edges.
Here NAE stands for “not all equal”. Let G be a planar graph with maximum
degree 3. We would like to generate a matchgrid such that for its correspond-
ing weighted graph G0 , it holds that Z(G0 ) is the number of not-all-equal
orientations of G. First, we add a vertex to the middle of each edge of G to
get a planar bipartite graph. We would like to transform these new vertices to
generator matchgates such that in some base, they have signature (0, 1, 1, 0).
232 Computational complexity of counting and sampling

That is, the presence of one of the new edges in the set C will correspond to
an orientation of an edge in G.
We would like to replace each original vertex in G to a recognizer match-
gate. If the original vertex has degree one, then the signature of the recognizer
matchgate in the given base is (1, 1), and the signature must be (0, 1, 1, 0)
(must be (0, 1, 1, 1, 1, 1, 1, 0), respectively) for degree 2 (for degree 3, respec-
tively) vertices.
We construct such matchgates in the base n = (1, 1), p = (1, −1). Then

n ⊗ p + p ⊗ n = (1, −1, 1, −1) + (1, 1, −1, −1) = (2, 0, 0, −2).

A possible generator is

1 −2 2

where the first and the last vertex are the output vertices. Indeed, this graph
has one perfect matching with score 2. If only one of the output vertices are
removed, then the number of vertices is odd, therefore, there is no perfect
matching in it. Finally, if both output vertices are removed, it has one perfect
matching with score −2.
A possible recognizer matchgate representing a degree 1 vertex is

where any of the two vertices is the only input vertex. Its standard signature
is (1, 0)T , and indeed
       
n 1 1 1 1 1
= = .
p 0 1 −1 0 1

A possible recognizer matchgate for degree 2 vertices is

1 −0.5 0.5

where the first and the last vertex are the input vertices. Indeed, it has stan-
dard signature (0.5, 0, 0, −0.5)T and
       
n⊗n 0.5 1 1 1 1 0.5 0
 p ⊗ n   0   1 −1 1 −1   0   1 
  =   =  .
 n ⊗ p   0   1 1 −1 −1   0   1 
p⊗p −0.5 1 −1 −1 1 −0.5 0

The recognizer matchgate constructed in the #Pl-3-(1,1)-Cyclechain problem


works perfectly for the degree 3 vertices, since each degree 3 vertex must have
either one or two edges incident to it.
We again illustrate the problem with a small problem instance. It is in
Figure 5.3. The problem instance has 4 not-all-equal orientations. Indeed, the
Holographic algorithms 233

FIGURE 5.3: An example problem instance for the problem #Pl-3-NAE-


ICE.

1 −2 2

− 14
−1 −1

− 14
− 14

−1

1 −2 2 1 −2 2

1 − 12 1
2 1 −2 2 1 − 12 1
2

FIGURE 5.4: The matchgrid solving the #Pl-3-NAE-ICE problem for the
problem instance in Figure 5.3. The edges belonging to the edge set C are
dotted. The recognizer matchgates are put into dashed circles.
234 Computational complexity of counting and sampling

triangle can be oriented in two different ways, and an edge incident to the
degree 1 vertex can be arbitrary oriented independently from the orientation
of the triangle.
The corresponding matchgate that computes the number of orientations
is in Figure 5.4. There are three perfect matchings of the arity 3 recognizer
matchgate, and each of them can be extended in a single unique way to a
perfect matching of the entire graph. Each of these perfect matchings has
score 1. There is no perfect matching of the graph containing the edge in
C incident to the upper input vertex of the arity 3 matchgates. There is one
perfect matching in which the bottom 2 input vertices of the arity 3 matchgate
are incident to edges in C. This perfect matching also has score 1. Therefore
the partition function of the weighted graph building the matchgate is indeed
4, which is the number of not-all-equal orientations of the graph in Figure 5.3.
Although in this example there are 4 orientations of the input graph and
4 perfect matchings of the matchgate graph, there is no natural one-to-one
correspondence between these solutions.

5.2.4 #Pl-3-NAE-SAT
For any logical formula,

c1 ∧ c2 ∧ . . . ∧ ck

we can assign a bipartite graph G = (U, V, E), where U represents the clauses
c1 , c2 , . . . , ck , V represents the logical variables x1 , x2 , . . . , xn , and there is an
edge connecting ui with vj if xj participates in clause ci . A logical formula is
called planar if G is planar. A clause is a not-all-equal clause if it is TRUE
when there are two literals with different values. A not-all-equal formula is a
logical formula in which all clauses are not-all-equal clauses.
We know that Pl-3SAT (where the problem instances are planar 3CNFs)
is an NP-complete decision problem, and #Pl-3SAT is #P-complete [96]. The
existence problem of Pl-Mon-NAE-SAT (where the problem instances are pla-
nar, monotone not-all-equal formulae) is reducible to the Four Color Theorem,
and therefore, always have a solution [13]. However, counting the 4-colorings
of a planar graph is #P-complete [179]. On the other hand, the following
problem is solvable in polynomial time.
Problem 15.
Name: #Pl-3-NAE-SAT.
Input: a planar, not-all-equal formula Φ in which each clause has 2 or 3 literals.
Output: the number of satisfying assignments of Φ.
We are going to construct a matchgrid using again the base n = (1, 1), p =
(1, −1), which appeared to be a very useful base in designing holographic
reductions.
Let Φ be a planar not-all-equal formula, and let G = (U, V, E) be its corre-
sponding planar graph. Each clause vertex in U will be represented with a rec-
Holographic algorithms 235

ognizer matchgate, each edge will be represented with a generator matchgate,


and each variable vertex in V will be represented with a subgraph possibly
containing several recognizer and generator matchgates.
We already introduced the arity 2 and arity 3 not-all-equal recognizer
matchgates in the #Pl-3-NAE-ICE problem. We give here the arity 2 all-
equal recognizer gate:

1 0.5 0.5

where the first and the last vertices are the input vertices. Indeed, it has
standard signature (0.5, 0, 0, 0.5)T and
       
n⊗n 0.5 1 1 1 1 0.5 1
 p ⊗ n   0   1 −1 1 −1   0   0 
 n ⊗ p   0  =  1 1 −1 −1   0  =  0  .
       

p⊗p 0.5 1 −1 −1 1 0.5 1

A possible all-equal arity 3 recognizer matchgate is

1
1 4 1
3 3

1
4
1
4

1
3

where the outer vertices are the 3 input vertices. It is easy to see that its
T
standard signature is 14 , 0, 0, 14 , 0, 14 , 14 , 0 and its signature in the given base
is
236 Computational complexity of counting and sampling

1
  
n⊗n⊗n 4

 p⊗n⊗n  0 
 

 n⊗p⊗n  0 
 1 

 p⊗p⊗n  
 4  =

 n⊗n⊗p  0 
 1 

 p⊗n⊗p  
  41 
 n⊗p⊗p 
4

p⊗p⊗p 0
1
    
1 1 1 1 1 1 1 1 4 1
1
 1 1 1 −1 −1 −1 −1 
 0  
  0 

1
 1 −1 −1 1 1 −1 −1 
 0   0 
1 
  
1
 1 −1 −1 −1 −1 1 1 


4 = 0 
.
1
 −1 1 −1 1 −1 1 −1 
 0  
  0 

1 
1
 −1 1 −1 −1 1 −1 1 
 4 

 0 

1 
1 −1 −1 1 1 −1 −1 1  4
 0 
1 −1 −1 1 −1 1 1 −1 0 1

We also already introduced the all-equal generator matchgates in the #Pl-3-


(1,1)-Cyclechain problem, and the not-all-equal generator matchgates in the
#Pl-3-NAE-ICE problem. We can use these matchgates to build a matchgrid
in the following way.
(a) Replace all vertices in U with the appropriate arity (2 or 3) not-all-equal
recognizer matchgate.
(b) Replace all degree 2 and degree 3 vertices in V with an arity 2 or arity
3 all-equal recognizer matchgate.
(c) Replace each degree k, k > 3 vertex in V with a chain of k − 2 arity 3
all-equal recognizer matchgates, connected with k − 1 (arity 2) all-equal
generator matchgates. The first and the last recognizer matchgates have
2 free input vertices, and all other recognizer matchgates have 1 free
input vertex. Therefore, this component has k free input vertices.
(d) Replace and edge (ui , vj ) in G with a not-all-equal generator matchgate
if variable xj is a negated literal in clause ci , and replace it with an
all-equal generator matchgate if variable xj is not negated in clause xi .

5.2.5 #7 Pl-Rtw-Mon-3SAT
Our last example is one of the most curious problems solvable in polyno-
mial time using holographic reduction. A logical formula is called read-twice
if each variable appears in exactly 2 clauses. The Pl-Rtw-Mon-3SAT problem
asks the satisfiability of a planar, read-twice, monotone 3CNF. It is trivially
Holographic algorithms 237

in P, since the all TRUE assignments naturally satisfies it. Surprisingly, #Pl-
Rtw-Mon-3SAT is still a #P-complete problem, furthermore, deciding if there
are an even number of satisfying assignments is ⊕P-complete, and thus, NP-
hard [174] (problems in ⊕P ask the parity of the number of witnesses of
problems in NP; the notation ⊕P is pronounced “parity-p” and also denoted
by PP). On the other hand, the following problem can be solved in polynomial
time.
Problem 16.
Name: #7 Pl-Rtw-Mon-3SAT.
Input: a planar, read twice, monotone 3CNF.
Output: the number of satisfying assignments modulo 7.
We would like to design a matchgate in which each clause is replaced with
an arity 3 recognizer matchgate with a signature

Br = (0, 1, 1, 1, 1, 1, 1, 1)T

in some base (modulo 7), and each variable is replaced with an arity 2 gener-
ator matchgate with a signature

g 0 = (1, 0, 0, 1)

in some base, also modulo 7. That is, in the matchgrid, each F ⊆ C having a
value 1 represents a satisfying assignment.
We will work in the base n = (5, 4), p = (1, 1), and all computations are
in the finite field F7 . Each clause is replaced with a recognizer matchgate

1 1
2

where the 3 vertices of the triangle are the input vertices. It is easy to see that
it has standard signature

r = (0, 2, 2, 0, 2, 0, 0, 2)T
238 Computational complexity of counting and sampling

and thus its signature in the given base is


  
n⊗n⊗n 0
 p ⊗ n ⊗ n  2 
  
 n ⊗ p ⊗ n  2 
  
 p ⊗ p ⊗ n  0 
 n ⊗ n ⊗ p  2  =
  
  
 p ⊗ n ⊗ p  0 
  
 n ⊗ p ⊗ p  0 
p⊗p⊗p 2
    
6 2 2 3 2 3 3 1 0 0
 4 6 6 2 4 6 6 2  2   1 
    
 4 6 4 6 6 2 6 2  2   1 
    
 5 4 5 4 5 4 5 4  0   1 
  = .
 4 4 6 6 6 6 2 2  2   1 
    
 5 5 4 4 5 5 4 4  0   1 
    
 5 5 5 5 4 4 4 4  0   1 
1 1 1 1 1 1 1 1 2 1

(Recall that all computations are modulo 7.)


Each variable is replaced with a generator matchgate

5 3 1

where the first and the last vertices are the output vertices. It has standard
signature
g = (5, 0, 0, 3)
therefore, its signature in the given base is indeed g 0 = (1, 0, 0, 1) as

(5, 0, 0, 3) = n ⊗ n + p ⊗ p = (4, 6, 6, 2) + (1, 1, 1, 1).

To give an illustrative example, consider the planar, monotone, read twice


3CNF

Φ := (x1 ∨ x2 ∨ x3 ) ∧ (x1 ∨ x4 ∨ x6 ) ∧ (x2 ∨ x4 ∨ x5 ) ∧ (x3 ∨ x5 ∨ x6 ).

Its planar drawing can be obtained from K4 by adding a vertex in the middle
of each edge. The vertices of K4 are the clause vertices, and the additional
vertices are the variable vertices. The number of satisfying assignments of Φ
are those subgraphs of (the vertex-labeled) K4 that do not contain an isolated
vertex. It is easy to see that there are 42 such subgraphs of K4 , thus there
are 42 satisfying assignments of Φ. Indeed, these subgraphs of K4 are the
following:

1. There are 4 perfect matchings.


Holographic algorithms 239

2. Thereare 16 subgraphs with 3 vertices and without an isolated vertex,


the 63 subgraphs with 3 vertices minus the 4 subgraphs consisting of a
triangle and an isolated vertex.
3. When 2, 1, or 0 edges are omitted from K4 , and there cannot be isolated
vertices; the number of such graphs is 62 = 15, 61 = 6, 60 = 1,
 

respectively.
That is, the number of satisfying assignments of Φ is 0 modulo 7.
The perfect matchings in the matchgrid correspond to those subgraphs of
K4 in which every degree is either 1 or 3. Indeed, each generator matchgate
must have either 0 or 2 edges in C incident to its output edges to have a
perfect matching. The two cases correspond to the presence or absence of the
corresponding edge in K4 . Similarly, each recognizer matchgate must have ei-
ther 1 or 3 edges in C incident to its input edges to have a perfect matching.
These two cases correspond to having degree 1 or degree 3 on the correspond-
ing vertex of K4 . It is easy to see that each subgraph of K4 with the prescribed
constraint corresponds to exactly one prefect matching of the matchgrid, and
the score of that perfect matching is the product of the score of the perfect
matchings of the matchgates. The subgraphs with all degree 1 or 3 are the
following:
1. There are 4 perfect matchings in K4 . It is easy to see that each corre-
sponding perfect matching has score 26 32 54 (mod 7) = 4.
2. There are 4 star trees. Each of the corresponding perfect matchings has
a score 26 33 53 (mod 7) = 1.
3. The K4 itself. Its corresponding perfect matching has score
26 36 (mod 7) = 1.
Therefore, the partition function of the matchgrid is indeed

4 × 4 + 4 × 1 + 1 × 1 ( mod 7) = 0.

Valiant found a holographic reduction for the #7 Pl-Rtw-Mon-3SAT prob-


lem using higher-order tensor algebra (not described in this book) [174]. Jin-Yi
Cai and Pinyan Lu showed such higher-order tensor algebras are not needed
in holographic reductions, any higher-order holographic reduction can be ef-
ficiently transformed into such a holographic reduction that has been in-
troduced in this book [36, 34]. They gave a holographic reduction in base
n = (1, 6), p = (5, 3) [35]. They also proved that characteristic 7 is the unique
characteristic of a field for which there is a common basis in which both an ar-
ity 2 generator matchgate with signature (1, 0, 0, 1) and an arity 3 recognizer
matchgate with signature (0, 1, 1, 1, 1, 1, 1, 1) exist. That is, the #k Pl-Rtw-
Mon-3SAT problem can be solved with a holographic reduction only for k = 7.
In this book, we gave another holographic reduction, see Exercises 9 and 10
for obtaining the holographic reduction introduced here.
240 Computational complexity of counting and sampling

5.3 Further results and open problems


5.3.1 Further results
• Jin-Yi Cai and Pinyan Lu studied the matchgates with symmetric signa-
ture vectors [35]. A signature vector is symmetric if the value of each co-
ordinate depends on how many input or output vertices are removed, and
does not depend on which vertices are removed. Symmetric signatures
are represented in square brackets ([ ]), listing the values for each number
of omitted input or output vertices. For example, (a, b, b, c, b, c, c, d) is
represented as [a, b, c, d]. This research led to a polynomial running time
algorithm whose input is a set of symmetric generator and recognizer
signature vectors, and the output is a base and set of matchgates such
that their signature vectors are exactly the given vectors in the obtained
base or the algorithm reports that no such base and matchgates exist
[31]. Note that it is sufficient to give the k + 1 possible values for an
arity k signature, since the signature vectors are symmetric.

• Constraint Satisfaction Problems (CSP) are generalizations of the sat-


isfiability problems. The input of a #CSP is a set of variables X =
{x1 , x2 , . . . , xn } and a set of functions F = {f1 , f2 , . . . , fm }, and each
function takes a subset of X as its arguments. Each variable xi has a
domain Di . The problem asks to compute
X Y
f (xi1 , xi2 , . . . , xia(f ) )
(x1 ,x2 ,...,xn )∈D1 ×D2 ×...×Dn f ∈F

where a(f ) is the arity of function f . A #CSP problem is planar, if the


graph G = (U, V, E) is planar, where U = X , V = F, and there is an edge
between xi and fj if xi is an argument of fj . Jin-Yi Cai, Pinyan Lu and
Mingji Xia considered the #CSP problems when each domain is {0, 1},
and they proved dichotomy theories. They showed that the tractable
planar #CSP problems are exactly those which can be computed with
holographic algorithms [38].

• Jin-Yi Cai, Pinyan Lu and Mingji Xia introduced the Fibonacci gates
[37]. Fibonacci gates have symmetric signatures [f0 , f1 , . . . , fk ] where
for each fi it holds that fi = fi−1 + fi−2 . The authors showed that the
Holant problems with Fibonacci gates are solvable for arbitrary graphs
in polynomial time not only for planar graphs.

• Mingji Xia, Peng Zhang and Wenbo Zhao used holographic reductions
to prove #P-completeness of several counting problems [184]. First they
used polynomial interpolation (see Subsection 4.2.4) to prove that count-
ing the vertex covers in planar, bipartite 3-regular graphs is in #P-
Holographic algorithms 241

complete. Then they used holographic reduction to prove that count-


ing the (not necessarily perfect) matchings in 2-3 regular, planar, and
bipartite graphs is also in #P-complete. Then they consider ternary
symmetric Boolean functions in the form
(
1 if l1 + l2 + l3 ∈ S
fS (l1 , l2 , l3 ) =
0 otherwise
where li ∈ {0, 1} are the literals and S ⊆ {0, 1, 2, 3}. A literal li might
be xi or xi , where xi is a logical variable. When only positive literals are
allowed, we denote it with the “-Mon-” tag in the description of the prob-
lem. The 3Pl-Rtw-fS -SAT problem asks if a read-twice, planar formula
in which each clause is an f function that is satisfiable. Using holographic
reductions, the authors showed that the #3Pl-Rtw-Mon-f{0,1} -SAT, the
#3Pl-Rtw-Mon-f{0,1,2} -SAT and #3Pl-Rtw-f{0,1,2} -SAT problems are
all in #P-complete.
• Also using holographic reductions, Jin-Yi Cai, Heng Guo and Tyson
Williams proved that edge coloring of an r-regular planar graph with k
colors is in #P-complete for all k ≥ r ≥ 3 [33].

5.3.2 Open problems


The ultimate goal of the research on holographic reduction is to find the
border of P and NP-complete (the border of FP and #P-complete) or find an
“accidental” algorithm that solves an NP-complete (#P-complete) problem in
polynomial time. The following list of open problems highlights that possibly
we are very far from this ultimate goal.
• Is there a combinatorial explanation of holographic reductions? The
holographic reductions are essentially “carefully designed cancellations
in tensor spaces” [37]. We saw that the cancellations appearing in
division-free computations of the determinant have a nice combinatorial
description (clow sequences, see Section 3.1). So far, nobody has been
able to find such a combinatorial explanation/description; on the other
hand, there is no proof that such combinatorial explanation does not
exist. Also, we do not know if any of the introduced problems solvable
with holographic algorithms has a polynomial running time algorithm
without holographic reduction.
• Are there other computational paradigms leading to efficient algorithms
yet to be discovered? There are still many counting problems with un-
known computational complexity, and some of them were mentioned at
the end of Chapter 4. We cannot exclude the possibility that some of
these seemingly #P-complete counting problems are actually in FP, just
the efficient algorithms solving them are not discovered yet.
242 Computational complexity of counting and sampling

5.4 Exercises
1. ◦ Show that for any positive integer k, there exists a planar, 3-regular
graph with 2k + 2 vertices.
2. * Professor F. Lake tells his class that matchgates having weights on the
edges in the set C have more computational power. Should they believe
him?
3. Show that for any row vectors u1 and u2 and matrices A1 and A2 , it
holds that
(u1 A1 ) ⊗ (u2 A2 ) = (u1 ⊗ u2 )(A1 ⊗ A2 ).

4. Show that for n = (−1, 1) and p = (1, 0), it indeed holds that

(1, 1, 1, 0) = n ⊗ n + n ⊗ p + p ⊗ n.

5. Let G be the planar bipartite graph constructed from the octahedron by


adding a vertex to the middle of each edge. Compute the X-matching of
G when each edge has weight 1.
6. ◦ Show that it is possible to compute the matchings of a 2-4 regular,
planar bipartite graph modulo 5 in polynomial time.
7. * Show that computing the (0, k)-cycle-chain sum in planar 3-regular
graphs is #P-hard by reducing the counting of the Hamiltonian cycles
in planar 3-regular graphs to it.
8. Compute the number of cycle-chain covers of the 3-dimensional cube.
9. * Show that there is an arity 3 recognizer matchgate with signature
(0, 1, 1, 1, 1, 1, 1, 1) in base

n = (1 + ω, 1 − ω), p = (1, 1), where ω is the
complex number − 12 + 23 i. Use the fact that

ω 2 = −ω − 1.

10. * Show that ω = 4 satisfies the following equalities in field F7 :

ω3 = 1

ω 2 = −ω − 1,
thus, it is possible to construct an arity 3 recognizer matchgate with
signature (0, 1, 1, 1, 1, 1, 1, 1) in base n = (1 + ω, 1 − ω) = (5, −3) =
(5, 4) (mod 7), p = (1, 1) over field F7 by simply copying the solution of
Exercise 9.
Holographic algorithms 243

11. ◦ Show that it is #P-complete to compute the number of subgraphs of


a 3-regular planar graph that do not contain any isolated vertex.

12. Show that it is #P-complete to compute the number of subgraphs of a


3-regular planar graph that contain at least one isolated vertex.

5.5 Solutions
Exercise 1. Construct recursively an infinite series starting with K4 .
Exercise 2. No, he is not right. Any such matchgrid might be mimicked by
inserting a path of length 2 between the edge in C with weight w and (for
example) the output node of the incident recognizer matchgate. The path will
belong to the recognizer matchgate, and the two edges will have weights 1 and
w.
Exercise 6. Observe that the number to be computed is the score of the
X-matching when each weight is 1.
Exercise 7. The reduction is based on polynomial interpolation. Let G be a
planar, 3-regular graph, and consider the polynomial

3c
bX
n

fG (x) = ai xi
i=1

where n is the number of vertices of G, and ai is the number of cycle covers of


G with exactly i cycles. Clearly, a1 is the number of Hamiltonian cycles of G.
This coefficient can be obtained by evaluating the polynomial fG (x) at n3
different points. Observe that fG (k) is the (0, k)-cycle-chain sum of G.
Exercise 9. The following recognizer matchgate works:

1 1
1
4

since its standard signature is


 T
1 1 1 1
r= 0, , , 0, , 0, 0,
4 4 4 4
244 Computational complexity of counting and sampling

and thus its signature in the given base is


  
n⊗n⊗n 0
 p ⊗ n ⊗ n  1 
  4 
 n ⊗ p ⊗ n  1 
  4 
 p ⊗ p ⊗ n  0 
 n ⊗ n ⊗ p  1  =
  
  4 
 p ⊗ n ⊗ p  0 
  
 n ⊗ p ⊗ p  0 
1
p⊗p⊗p 4
    
−1 1 + 2ω 1 + 2ω 3 1 + 2ω 3 3 −3 − 6ω 0 0
 ω 2+ω 2+ω −3ω ω 2+ω 2+ω −3ω  1   1 
   41   
 ω
 2+ω ω 2+ω 2+ω −3ω 2+ω −3ω   
 4   1 

 1+ω
 1−ω 1+ω 1−ω 1+ω 1−ω 1+ω 1−ω  0  
 1  =  1 
.
 ω
 ω 2+ω 2+ω 2+ω 2+ω −3ω −3ω   
 4   1 

 1+ω
 1+ω 1−ω 1−ω 1+ω 1+ω 1−ω 1−ω  0  
   1 

 1+ω 1+ω 1+ω 1+ω 1−ω 1−ω 1−ω 1−ω  0   1 
1
1 1 1 1 1 1 1 1 4
1

1
Exercise 10. It is trivial to check the inequalities. The weight 4 should be
replaced with 2 since 2 × 4 = 1 (mod 7).
Part II

Computational Complexity
of Sampling

245
Chapter 6
Methods of random generations

6.1 Generating random numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 248


6.2 Rejection sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 249
6.3 Importance sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
6.4 Sampling with algebraic dynamic programming . . . . . . . . . . . . . . . . . 256
6.5 Sampling self-reducible objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 260
6.6 Markov chain Monte Carlo . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 267
6.8 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 269

The idea to use random numbers in scientific computation dates back to


the 18th century. The French mathematician, naturalist. and encyclopédist,
Georges-Louis Leclerc, Comte de Buffon introduced a problem that we know
today as Buffon’s needle problem [30]. It asks to find the probability that a
randomly thrown needle of length l hits one of the parallel lines equally spaced
on the floor a distance t > l apart. We can assume that the distance of the
center of the needle from the closest line is uniformly distributed between 0
and 2t . We can also assume that the angle the needle closes with any of the
parallel lines is also uniformly distributed between 0 and π. The needle hits
one of the lines if
l
x < sin(α) (6.1)
2
where x is the distance of the center from the closest line and α is the angle of
the needle. Using geometric probability, the probability that the needle crosses
a line is Rπ
α=0
sin(α) 2l dα 2l
t = . (6.2)
2 π tπ
From this theoretical result, the value of π can be experimentally estimated
by throwing the needle several times. If the measured frequency of the needle
hits is f , then the estimation for π is
2l
π̂ = . (6.3)
tf

247
248 Computational complexity of counting and sampling

6.1 Generating random numbers


Random algorithms use random number generators. Typical computers
cannot generate truly random numbers, however, they generate pseudorandom
numbers. Pseudorandom number generators generate integer numbers from a
given interval [a, b] mimicking the uniform distribution. Then these numbers
can be used to generate (pseudo)random numbers uniformly distributed on
the [0, 1] interval by projecting [a, b] to [0, 1]. Of course, not all real numbers
can be achieved in this way, only a finite subset of them. However, they well
approximate the uniform distribution on the [0, 1] interval. Once we have
random numbers uniformly distributed on [0, 1], then we can easily generate
random numbers following other distributions.
Assume that π is a distribution on the integer numbers 1, . . . , n. The fol-
lowing procedure will generate a random number following π. Set x to 1, p
to 0, and iterate the following until a value is returned. Draw u uniformly on
[0, 1]. If u ≤ π(x)
1−p , return x. Otherwise add π(x) to p and increment x. The
so-generated random ξ indeed follows distribution π, since
 
π(1) π(x)
P (ξ = x) = (1 − π(0)) 1 − ... Px−1 = π(x). (6.4)
1 − π(0) 1 − i=1 π(i)
Assume that π is a continuous distribution with a cumulative distribution
function F . Furthermore, assume that F −1 (x) can be easily computed for
any x. Then the following method, called the inversion method, generates
a random variable following π. Generate u from the uniform distribution on
[0, 1]. Return F −1 (u). Then the so-generated random variable ξ indeed follows
the distribution π, since
P (ξ ≤ x) = F (x), (6.5)
and the cumulative distribution function unequivocally defines the distribu-
tion. Below we give an example of how to use the inversion method.
Example 12. Generate a random variable following the exponential distribu-
tion with parameter λ.
Solution. The cumulative density function is
y = 1 − e−λx . (6.6)
From this we can easily compute the inverse of the c.d.f.:
ln(1 − y)
x= . (6.7)
−λ
It is easy to see that 1−u is also uniform on [0, 1], if u is uniform in [0, 1]. Thus
the following procedure generates a random variable following the exponen-
tial distribution with parameter λ. Draw a random u following the uniform
distribution on [0, 1]. Return ln(u)
−λ . 
Methods of random generations 249

Sometimes the cumulative density function does not have an analytic, easy-
to-compute form, even its inverse. Interestingly, there is still a simple method
that generates normally distributed random variables first proved by George
Edward Pelham Box and Mervin Edgar Muller [22].
Theorem 60. Let u1 and u2 be uniformly distributed values on [0, 1]. Then
p
z1 := −2 ln(u1 ) cos(2πu2 ) (6.8)
and p
z2 := −2 ln(u1 ) sin(2πu2 ) (6.9)
are independent random variables, both of them following the standard normal
distribution.
In many cases, more sophisticated methods are needed, see also the next
section.

6.2 Rejection sampling


Rejection sampling was introduced by von Neumann [181]. Let π and p be
two distributions over the same domain X. Assume that π is a distribution
we would like to sample from, and p is a distribution we can sample from. We
will call p the sampling distribution and π is called the target distribution. We
are not required to be able to calculate the exact densities (probabilities), but
we have to be able to calculate functions f and g that are proportional to π
and p, respectively. Let the normalizing constants be n1 and n2 , namely, for
any x ∈ X, the equations
π(x) = n1 f (x) (6.10)
and
p(x) = n2 p(x) (6.11)
hold. Furthermore, there is a so-called enveloping constant c, that satisfies
that for any x ∈ X,
cg(x) ≥ f (x). (6.12)
If these conditions are given, we can apply the rejection sampling method. It
consists of two steps.
1. Draw a random x following the distribution p.
2. Draw a random number u following the uniform distribution on the
interval [0, 1]. Accept the sample x if
ucg(x) < f (x). (6.13)
Otherwise, reject sample x, and go back to the first step.
250 Computational complexity of counting and sampling

The following theorem holds for the rejection sampling.

Theorem 61. The accepted samples in the rejection sampling follow the dis-
tribution π.
Proof. We apply the Bayes theorem on conditional probabilities. Let A denote
the event that a sample is generated. First, we calculate P (A). If X is a discrete
space, then the probability that A happens is the sum of the probability that
x is drawn multiplied by the probability that x is accepted, summing over all
members x ∈ X. That is,
X f (x) X n1 π(x) n1 X n1
P (A) = p(x) = p(x) = π(x) = . (6.14)
cg(x) cn2 p(x) cn2 cn2
x∈X x∈X x∈X

(For a continuous space, similar calculation holds, just the summation can
be replaced with integration.) Now we can use the Bayes theorem. For an
arbitrary x, the equation
f (x) n1 π(x)
P (A|x)P (x) cg(x) p(x) cn2 p(x) p(x)
P (x|A) = = n1 = n1 = π(x) (6.15)
P (A) cn2 cn2

holds, namely, the accepted samples indeed follow the distribution π.


We show an example of how to sample satisfying assignments of a dis-
junctive normal form. Surprisingly, we can sample uniformly satisfying as-
signments of a disjunctive normal form in expected polynomial running time
using rejection sampling. This rejection sampling can be easily turned to an
FPAUS.
Example 13. Let a DNF Φ be given with n variables. Sample satisfying
assignments of Φ using the rejection method. Show that #DNF is in FPAUS.

Solution. A very naı̈ve approach for a rejection sampling is to choose p as


the uniform distribution of all possible assignments. That is, generate random
assignments uniformly, and accept those which satisfy Φ. We know that for
any assignment γ, p(γ) = 21n and it is easy to sample from p. Let S denote the
set of satisfying assignments. Then the target distribution π is such that for
1
an x ∈ S, π(x) = |S| and for x ∈ S, π(x) = 0. The function g can be defined
as  1
2n if x ∈ S
g(x) := (6.16)
0 if x ∈ S
and then the enveloping constant might also be 1. Let l denote the number of
literals in the shortest conjunctive clause in Φ. Then we know that

|S| ≥ 2n−l . (6.17)

Therefore, the normalizing constant n2 cannot be larger than 2l . What follows


Methods of random generations 251

is that the acceptance probability cannot be smaller than 2−l , since n1 = c =


1, and the acceptance probability calculated in Equation (6.14) is nn21c = n12 . If
Φ contains a short clause (that is, l = O(log(n))), this naı̈ve method already
provides an efficient sampling of satisfying assignments, since the expected
number of trials to generate one satisfying assignment has an upper bound
polynomial in n. On the other hand, if all clauses in Φ are long, then this
approach might be inefficient.
However, satisfying assignments can be generated efficiently even if Φ con-
tains only long clauses. Let Φ = C1 ∨ C2 ∨ . . . ∨ Ck , where Ci , i = 1, . . . , k are
the clauses. Let Si denote the set of assignments that satisfy clause Ci . We
know that |Si | = 2n−li , where li is the number of literals in Ci . Clearly,
S = ∪ki=1 Si . (6.18)
Define
k
X
M := |Si |. (6.19)
i=1

Clearly, M ≥ |S|. Also observe that any satisfying assignment is counted at


most k times in M , therefore it also holds that
M
|S| ≥ . (6.20)
k
We define distribution p via the following method. Select a random clause Ci
n−li
with probability 2 M , then select a random satisfying assignment γ uniformly
from Si . A random clause γ has probability t(γ)
M , where t(γ) is the number of
clauses that γ satisfies.
What follows is that we can calculate exactly p for any γ, therefore g might
k
be p and then n2 = 1. We can set f to be the constant M . Since for any γ,
1
π(γ) = |S| , we get that the normalizing constant n1 ≥ 1. Since the minimum
1
probability in p is M , k is an appropriate number for the enveloping constant.
We get that the acceptance probability is
n1 n1 1
P (A) = = ≥ (6.21)
cn2 k k
since n1 ≥ 1. Hence the expected number of trials for a satisfying assignment
is upper bounded by k, the number of clauses in Φ, which is less than the
length of Φ.
Next, we show that this rejection sampler can be the core of an FPAUS. Let
the DNF Φ and ε > 0 be the input of the FPAUS, and let k denote the number
of clauses in Φ. Then do the rejection sampling till the first acceptance but at
most 2k log 1ε times. The probability that all proposed satisfying assignments
are rejected is at most
 2k log( 1ε )
1
1− ≈ ε2  ε. (6.22)
k
252 Computational complexity of counting and sampling

If all proposals are rejected, then return with the last proposal. Then the
generated satisfying assignment follows the distribution

π = (1 − α)U + αp, (6.23)

where U is the uniform distribution, p is an unknown distribution, and 0 <


α < ε. It is easy to show that the total variation distance between π and the
uniform one is less than ε, since
1 X
dT V (U, π) = |U (x) − π(x)| =
2
x∈X
1 X 1 X
|U (x) − (1 − α)U (x) − αp(x)| = α|U (x) − p(x)| =
2 2
x∈X x∈X
αdT V (U, p) < α < ε. (6.24)

The inequality αdT V (U, p) < α comes from the fact that the total variation
distance between any two distributions is at most 1.
Since one rejection sampling step can be done in polynomial time, the
running time of the algorithm is clearly polynomial in both the size of the
problem and − log(ε), and thus, the described procedure is indeed an FPAUS.


Generating uniform satisfying assignments might also be used to estimate


the number of satisfying assignments. From a set of samples, we can estimate
the probability that the logical variable x1 is TRUE in satisfying assignments,
and its complement probability, the probability that x1 is FALSE in satisfying
assignments. One of them is greater than or equal to 0.5. Let f (x1 ) be the
frequency of samples (satisfying assignments) in which x1 is TRUE. If f (x1 ) ≥
0.5, then let Φ1 be the DNF obtained from Φ such that all clauses are removed
in which x1 is a literal, and each x1 is removed from all clauses in which x1 is a
literal. The number of satisfying assignments of Φ1 is the number of satisfying
assignments of Φ in which x1 is TRUE. Particularly,

|SΦ1 | = P (x1 = T RU E)|SΦ | (6.25)

where SΦ1 denotes the set of satisfying assignments of Φ1 and SΦ denotes the
set of satisfying assignments of Φ.
If f (x1 ) < 0.5, then let Φ1 be the DNF obtained from Φ such that all
clauses are removed in which x1 is a literal, and each x1 is removed from all
clauses in which x1 is a literal. The number of satisfying assignments of Φ1 is
the number of satisfying assignments of Φ in which x1 is FALSE. Particularly,

|SΦ1 | = P (x1 = F ALSE)|SΦ | (6.26)

where SΦ1 denotes the set of satisfying assignments of Φ1 and SΦ denotes the
set of satisfying assignments of Φ.
Methods of random generations 253

We can generate random satisfying assignments of Φ1 and can estimate


the fraction of satisfying assignments in which x2 is TRUE and the fraction of
satisfying assignments in which x2 is FALSE. Similar to the previous case, we
can generate Φ2 , which is a DNF of random variables x3 , . . . , xn , and whose
satisfying assignments have cardinality as the number of assignments of Φ1
with a prescribed value for x2 .
Eventually, we can generate a DNF Φn−1 that contains only literals xn and
xn and has either 1 or 2 satisfying assignments depending on whether only
one of the literals or both of them appears in it. Therefore we know |SΦn−1 |,
|SΦi−1 |
and we have estimations for all fractions |SΦi | . We know that

|SΦ | |SΦ1 | |SΦn−2 |


|SΦ | = × × ... × × |SΦn−1 |, (6.27)
|SΦ1 | |SΦ2 | |SΦn−1 |

therefore the product of the estimated fractions and |SΦn−1 | is an estimation


for |SΦ |. Although the errors in the estimations are multiplied, it turns out
that very good estimation can be obtained in polynomial running time. This
will be proved in Subsection 7.3.1.
When the aim is to estimate the number of satisfying assignments, we also
can keep each generated sample, and can appropriately weight it instead of
rejecting. This is prescribed in the next section.

6.3 Importance sampling


Let π and p be two distributions over the same domain X. Just like in the
previous section, p is the sampling distribution and π is the target distribution.
Let f : X → R be a function, and assume that we would like to estimate the
expected value of f under the distribution π, that is
X
Eπ [f ] := f (x)π(x). (6.28)
x∈X

Assume that we can sample only from the distribution p. We would like to
find function g satisfying that

Ep [g] = Eπ [f ] (6.29)

namely, the expected value of g under the distribution p is the expected num-
ber of f under the distribution p. It is easy to see that the following theorem
holds.
Theorem 62. If
π(x)
g(x) := f (x) (6.30)
p(x)
254 Computational complexity of counting and sampling

then
Ep [g] = Eπ [f ]. (6.31)
Proof. Indeed,
X X π(x) X
Ep [g] = g(x)p(x) = f (x)p(x) = f (x)π(x) = Eπ [f ]. (6.32)
p(x)
x∈X x∈X x∈X

It is frequently the case that we do not know the probabilities (densities)


π and p, but we can calculate π(x) and p(x) up to an unknown normaliz-
ing constant. Then we can calculate g also up to an unknown normalizing
constant. However, we can select an f for which we know the expectation,
and can estimate the normalizing constant. The trivial choice for such f is
the constant 1 function. The constant 1 function has expectation 1 under any
distribution. Then
Eπ [f ] = 1 = Ep [g] = nEp [g̃] (6.33)
where n is the unknown normalizing constant and g̃ is the function we can
calculate. If we sample from distribution p, the average g̃ value of the samples
is an estimation for n1 . In the following example we show how to use this
method to estimate the number of satisfying assignments of a disjunctive
normal form.
Example 14. Let Φ be a disjunctive normal form containing k clauses. Es-
timate the number of satisfying assignments of Φ using importance sampling.
Solution. Let p be the distribution defined in the solution of Example 13. Recall that we can calculate the probability of any satisfying assignment γ in the distribution p:
$$p(\gamma) = \frac{t(\gamma)}{M}. \quad (6.34)$$
We cannot calculate π(γ), where π is the uniform distribution of satisfying assignments. However, we can calculate π up to an unknown normalizing constant, that is, as the constant 1 function, and then the normalizing constant is $\frac{1}{|S|}$, where S is the set of satisfying assignments. Let f be the constant 1 function; then
$$g(\gamma) = f(\gamma)\,\frac{\pi(\gamma)}{p(\gamma)} = \frac{\frac{1}{|S|}}{\frac{t(\gamma)}{M}} = \frac{1}{|S|}\,\frac{M}{t(\gamma)} = \frac{1}{|S|}\,\tilde{g}(\gamma). \quad (6.35)$$

Namely, if we generate satisfying assignments following the distribution p, and calculate the average g̃ value (that is, $\tilde{g}(\gamma) = \frac{M}{t(\gamma)}$), that will be an estimation for the inverse of the normalizing constant, that is, |S|. Furthermore, we know that the average of the samples is an unbiased estimator for the expectation. Therefore if γ1, γ2, . . . , γN are samples following the distribution p, then
$$\frac{\sum_{i=1}^{N} \frac{M}{t(\gamma_i)}}{N} \quad (6.36)$$

is an unbiased estimator for the number of satisfying assignments of Φ. 
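The following Python sketch implements this estimator (the clause encoding, all names, and the test formula are our own illustration, not from the text; a clause is represented as a dictionary mapping a variable index to the truth value its literal requires):

import random

def estimate_dnf_count(clauses, n_vars, n_samples=10000):
    # A clause fixing k variables is satisfied by 2^(n_vars - k) assignments.
    weights = [2 ** (n_vars - len(c)) for c in clauses]
    M = sum(weights)

    def t(gamma):
        # t(gamma): the number of clauses satisfied by the assignment gamma.
        return sum(all(gamma[v] == val for v, val in c.items()) for c in clauses)

    total = 0.0
    for _ in range(n_samples):
        # Sample gamma with probability p(gamma) = t(gamma)/M: pick a clause
        # proportionally to its weight, then a uniform satisfying assignment.
        c = random.choices(clauses, weights=weights)[0]
        gamma = {v: c[v] if v in c else random.random() < 0.5
                 for v in range(n_vars)}
        total += M / t(gamma)          # one sample of g~(gamma) = M / t(gamma)
    return total / n_samples           # unbiased estimate of |S|

# (x0 AND x1) OR (NOT x1 AND x2): the true number of satisfying assignments is 4
print(estimate_dnf_count([{0: True, 1: True}, {1: False, 2: True}], 3))

With 10,000 samples the printed estimate is typically within a few percent of the true value, in line with the variance bound discussed below.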

There is one more important property of the function g̃ in the previous example. The smallest value of g̃ cannot be smaller than $\frac{M}{k}$, where k is the number of clauses, and the largest value of g̃ cannot be larger than M. What follows is that the variance
$$V_p[\tilde{g}] := \sum_{x\in X} (\tilde{g}(x) - E_p[\tilde{g}])^2 p(x) \quad (6.37)$$
cannot be larger than $M^2$. Indeed, the variance is the expectation of the squared values minus the squared expectation. Then an upper bound on the variance is the expectation of the squared values, and an upper bound for this latter is the maximum squared value, that is, $M^2$. On the other hand, we know that the expectation cannot be smaller than the smallest value, that is, $\frac{M}{k}$. Therefore the standard deviation of the g̃ values under the distribution p is at most k times more than the expected value. What follows is that a small number of samples, say $O(k^3)$ samples, is sufficient for the standard error to be much smaller than the expectation. This means that it is computationally efficient to estimate the number of satisfying assignments of a disjunctive normal form using the introduced importance sampling.
It is quite exceptional that rejection sampling or importance sampling provides a good estimation of the size of the space of interest, although we can construct artificial examples where the importance sampling has a much smaller standard deviation (actually, 0 variance and thus 0 standard deviation) than sampling from the correct distribution.

Example 15. Consider the biased cube C with which throwing value k has
probability proportional to k. Work out the importance sampling method to
estimate the expected value of the unbiased cube using samples from the biased
cube C.
Solution. The sum of the possible values is 21, therefore if ξ denotes the thrown value, then
$$P(\xi = k) = \frac{k}{21}.$$
This is the sampling distribution p. The desired distribution π is the uniform distribution, where each outcome has probability $\frac{1}{6}$. The function f assigns the value k to the event “a value k has been thrown”. Then the g function we are looking for is
$$g(k) = k\,\frac{\pi(k)}{p(k)} = k\,\frac{\frac{1}{6}}{\frac{k}{21}} = \frac{21}{6} = 3.5.$$
Namely, g is the constant 3.5 function, which is the expectation of the thrown
value on an unbiased die. What follows is that this importance sampling has
0 variance! Namely, one sample from the sampling distribution provides an
exact estimation of the expectation in the target distribution. 

Example 15 is artificial since there p is proportional to f and π is the uniform distribution, therefore for any x, p(x) is proportional to f(x)π(x). The normalizing constant is exactly the expectation of f under the distribution π that we are looking for. However, when we do not know this expectation, it is very unlikely that the sampling distribution is close to the product of the function f and the target probability distribution π.
Rather, in many cases, the sampling distribution is built up sequentially, making a random choice in each step. This is especially the case when the space from which we would like to sample contains modular objects like trees, sequences, etc. Then in each step there is some deviation from the target distribution, and these errors are multiplied along the random generations. What follows is that a so-called sequential importance sampling might have some extremely large values in its g function for those x for which
$$p(x) \ll \pi(x). \quad (6.38)$$

Therefore, the variance of the g function might be extremely large, making the method computationally intractable. In many of those cases, Markov chain Monte Carlo methods can help; see Section 6.6.

6.4 Sampling with algebraic dynamic programming


In this section, we prove that if an ensemble of combinatorial objects can
be counted with algebraic dynamic programming in polynomial time, then it
can be uniformly sampled also in polynomial time. We prove slightly more: if
an algebraic dynamic programming computes the sum of the (non-negative)
weights of some combinatorial objects in polynomial time, then it is possible
to sample those combinatorial objects from the distribution proportional to
their weights also in polynomial time.

Theorem 63. Let E = (Y, R, f, T) be an evaluation algebra with the following properties.
(a) R is $\mathbb{R}^+ \cup \{0\}$, the non-negative real numbers semiring.
(b) The evaluation algebra solves a counting problem in polynomial time in the sense of Theorem 21.
(c) For any $T_i \in T$, if $\circ_i$ is an m-ary operation, then
$$T_i(f(a_1), \ldots, f(a_m); p(a_1), \ldots, p(a_m)) := c_{p(a_1),\ldots,p(a_m)} \prod_{j=1}^{m} f(a_j).$$

For any problem instance x, let θ denote the parameter for which the solution of x is F(S(θ)). We further assume that for any θ′ ∈ B ∩ θ↓, sampling from the distribution
$$\pi(a) := \frac{f(a)}{F(S(\theta'))} \quad (6.39)$$
can be done in polynomial time. Then it is possible to sample from S(θ) following the distribution
$$\pi(a) := \frac{f(a)}{F(S(\theta))} \quad (6.40)$$
in polynomial time.
Proof. We exhibit a recursive algorithm that generates samples from the prescribed distribution. Let i denote the indexes in the computation
$$F(S(\theta)) = \sum_i T_i(F(S(\theta_{i,1})), \ldots, F(S(\theta_{i,m_i})); \theta_{i,1}, \ldots, \theta_{i,m_i}). \quad (6.41)$$
We claim that the following algorithm samples from π:


1. Sample a random i following the distribution
$$p(i) := \frac{T_i(F(S(\theta_1)), \ldots, F(S(\theta_{m_i})); \theta_1, \ldots, \theta_{m_i})}{F(S(\theta))}.$$

2. For each $\theta_j \in \{\theta_1, \ldots, \theta_{m_i}\}$, sample a random $a_j$ following the distribution
$$\pi_j(a_j) := \frac{f(a_j)}{F(S(\theta_j))}.$$

3. Return with $\circ_i(a_1, \ldots, a_{m_i})$.

Indeed, the probability for sampling $\circ_i(a_1, \ldots, a_{m_i})$ is
$$p(i) \prod_{j=1}^{m_i} \frac{f(a_j)}{F(S(\theta_j))} = \frac{c_{\theta_1,\ldots,\theta_{m_i}} \prod_{j=1}^{m_i} F(S(\theta_j))}{F(S(\theta))} \prod_{j=1}^{m_i} \frac{f(a_j)}{F(S(\theta_j))} = \frac{c_{\theta_1,\ldots,\theta_{m_i}} \prod_{j=1}^{m_i} f(a_j)}{F(S(\theta))} = \pi(\circ_i(a_1, \ldots, a_{m_i})). \quad (6.42)$$

Sampling from each πj can be done in the same way. Therefore, the following recursive algorithm samples a random a from the prescribed distribution:

sampler(θ)
   if θ ∉ B
      Generate a random i following the distribution
      $$p(i) := \frac{T_i(F(S(\theta_1)), \ldots, F(S(\theta_{m_i})); \theta_1, \ldots, \theta_{m_i})}{F(S(\theta))}.$$
      for each j ∈ {1, 2, . . . , mi}
         aj := sampler(θj)
      return $\circ_i(a_1, \ldots, a_{m_i})$
   else
      return a following the distribution $\frac{f(a)}{F(S(\theta))}$.

We have to prove that the presented recursive algorithm runs in polynomial time. Since F(S(θ)) can be computed in polynomial time, so can F(S(θ′)) for any θ′ ∈ θ↓. Furthermore, the number of parameters covered by a particular parameter is also polynomially upper bounded, as is the range of the indexes in Equation (6.41). Therefore, sampling from the distribution
$$p(i) := \frac{T_i(F(S(\theta_1)), \ldots, F(S(\theta_{m_i})); \theta_1, \ldots, \theta_{m_i})}{F(S(\theta))}$$
can be done in polynomial time. Also, the number of times the recursive function calls itself is polynomially bounded. Finally, the base cases θ ∈ B can also be handled in polynomial time due to the conditions of the theorem. Therefore, the overall running time grows polynomially.
We show two examples of how Theorem 63 can be applied.
Example 16. Let G = (T, N, S, R, π) be a stochastic regular grammar, and let X be a sequence of length n from alphabet T. Give a random sampling method that samples generations of X following the distribution
$$p(g) := \frac{\prod_{r\in g} \pi(r)^{m(g,r)}}{P(X)},$$
where m(g, r) is the multiplicity of the rule r in the generation g generating X, and P(X) is the probability that the regular grammar generates X.
Solution. Recall that P (X), that is, the sum of the probabilities of the gen-
erations of X can be computed with algebraic dynamic programming, see
Subsection 2.3.1. Applying Theorem 63 on that algebraic dynamic program-
ming leads to the following two-phase method.
Phase I. Fill in a dynamic programming table d(i, W) for all i = 0, 1, . . . , n and W ∈ N with the initial conditions
$$d(0, W) = \begin{cases} 1 & \text{if } W = S \\ 0 & \text{otherwise} \end{cases}$$
and recursions
$$d(i, W) = \sum_{W'\in N} d(i-1, W')\,\pi(W' \to x_i W),$$
$$d(i, \varepsilon) = \sum_{W\in N} d(i-1, W)\,\pi(W \to x_i) + \sum_{W\in N} d(i, W)\,\pi(W \to \varepsilon).$$

Phase II. Draw a random rewriting rule r in the form W → xn or W → ε with probability
$$\frac{d(n-1, W)\,\pi(W \to x_n)}{d(n, \varepsilon)} \quad\text{or}\quad \frac{d(n, W)\,\pi(W \to \varepsilon)}{d(n, \varepsilon)},$$
respectively. Set g := r, and set i to n − 1 if the selected rule is in the form W → xn and to n if the selected rule is in the form W → ε. Set W′ to the non-terminal in the selected rule. Then do the following iteration. While i is not 0, select a random rewriting rule W → xi W′ with probability
$$\frac{d(i-1, W)\,\pi(W \to x_i W')}{d(i, W')}.$$
Let g := W → xi W′, g. Set i to i − 1 and set W′ to W. 
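As a rough illustration, the two phases can be coded as follows in Python; we assume for simplicity that every rule has the form W → aW′ or W → ε and that P(X) > 0, and the dictionary-based rule encoding is our own choice, not the book's notation:

import random

def sample_generation(rules, end_rules, S, X):
    # rules[(W, a, W2)] = pi(W -> a W2); end_rules[W] = pi(W -> epsilon).
    nonterms = set(end_rules) | {r[0] for r in rules} | {r[2] for r in rules}
    n = len(X)
    # Phase I: d[i][W] = total probability of emitting X[:i] and sitting in W.
    d = [{W: 0.0 for W in nonterms} for _ in range(n + 1)]
    d[0][S] = 1.0
    for i in range(1, n + 1):
        for (W, a, W2), p in rules.items():
            if a == X[i - 1]:
                d[i][W2] += d[i - 1][W] * p

    def pick(pairs):
        # Draw an item with probability proportional to its weight.
        r = random.uniform(0, sum(w for _, w in pairs))
        for item, w in pairs:
            r -= w
            if r <= 0:
                return item
        return pairs[-1][0]

    # Phase II: stochastic traceback from the end of the sequence.
    W = pick([(V, d[n][V] * end_rules.get(V, 0.0)) for V in nonterms])
    g = [(W, 'epsilon')]
    for i in range(n, 0, -1):
        W1 = pick([(V, d[i - 1][V] * rules.get((V, X[i - 1], W), 0.0))
                   for V in nonterms])
        g.append((W1, X[i - 1], W))
        W = W1
    return list(reversed(g))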


Example 17. Let G = (T, N, S, R, π) be a stochastic context-free grammar in Chomsky Normal Form, and let X be a sequence of length n from alphabet T. Give a random sampling method that generates a random parse tree $\mathcal{T}$ following the distribution
$$p(\mathcal{T}) \propto \prod_{r\in\mathcal{T}} \pi(r)^{m(\mathcal{T},r)},$$
where $m(\mathcal{T}, r)$ is the multiplicity of the rewriting rule r in the parse tree $\mathcal{T}$ generating X, and ∝ stands for “proportional to”.
Solution. Recall that the sum
$$\sum_{\mathcal{T} \,:\, \mathcal{T} \text{ generates } X} \;\prod_{r\in\mathcal{T}} \pi(r)^{m(\mathcal{T},r)}$$

can be computed with algebraic dynamic programming, see Subsection 2.3.3.


Applying Theorem 63 on that algebraic dynamic programming leads to the
following two-phase method.
Phase I. Fill in a dynamic programming table d(i, j, W) for all 1 ≤ i ≤ j ≤ n and W ∈ N with the initial condition
$$d(i, i, W) = \pi(W \to x_i)$$
and recursion
$$d(i, j, W) = \sum_{i\le k<j} \sum_{W_1\in N} \sum_{W_2\in N} d(i, k, W_1)\, d(k+1, j, W_2)\, \pi(W \to W_1 W_2).$$

Phase II. The random tree is obtained by the following recursive function, calling it with parameters (1, n, S).

TreeSampler(i, j, W)
   if i < j
      Generate a random (k, W1, W2) following the distribution
      $$\frac{d(i, k, W_1)\, d(k+1, j, W_2)\, \pi(W \to W_1 W_2)}{d(i, j, W)}$$
      Let T1 := TreeSampler(i, k, W1)
      Let T2 := TreeSampler(k + 1, j, W2)
      Generate a tree T by merging T1 and T2 with rule W → W1 W2
      return T
   else
      return T = W → xi
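A Python sketch of the same two phases for a grammar in Chomsky Normal Form follows; again, the rule encoding and all names are our own illustration, and we assume X is generable by the grammar:

import random
from collections import defaultdict

def sample_parse_tree(bin_rules, term_rules, S, X):
    # bin_rules[(W, W1, W2)] = pi(W -> W1 W2); term_rules[(W, a)] = pi(W -> a).
    n = len(X)
    d = defaultdict(float)
    # Phase I: fill the inside table d[(i, j, W)] (1-based, inclusive indices).
    for i in range(1, n + 1):
        for (W, a), p in term_rules.items():
            if a == X[i - 1]:
                d[(i, i, W)] += p
    for span in range(2, n + 1):
        for i in range(1, n - span + 2):
            j = i + span - 1
            for (W, W1, W2), p in bin_rules.items():
                for k in range(i, j):
                    d[(i, j, W)] += d[(i, k, W1)] * d[(k + 1, j, W2)] * p

    def pick(pairs):
        r = random.uniform(0, sum(w for _, w in pairs))
        for item, w in pairs:
            r -= w
            if r <= 0:
                return item
        return pairs[-1][0]

    # Phase II: recursive stochastic traceback (TreeSampler above).
    def tree_sampler(i, j, W):
        if i == j:
            return (W, X[i - 1])
        k, W1, W2 = pick([((k, W1, W2),
                           d[(i, k, W1)] * d[(k + 1, j, W2)] * p)
                          for (V, W1, W2), p in bin_rules.items() if V == W
                          for k in range(i, j)])
        return (W, tree_sampler(i, k, W1), tree_sampler(k + 1, j, W2))

    return tree_sampler(1, n, S)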


6.5 Sampling self-reducible objects


In this section, we discuss a very important class of counting problems, the self-reducible counting problems. Roughly speaking, a counting problem is self-reducible if the possible extensions of any beginning of any solution of a problem instance are exactly the solutions of another problem instance. This other problem instance must have a size comparable with the original problem instance. Furthermore, the solutions can multifurcate only polynomially at any step.
Many natural counting problems are self-reducible. For example, if we already
selected a few edges participating in a perfect matching in a graph, then the
possible extensions of this subset of edges are the perfect matchings of the
remaining graph. Formally, the self-reducible counting problems are defined
in the following way.
Definition 50. A counting problem is self-reducible if the following holds:

1. There exists a relation R ⊆ Σ∗ × Σ∗ such that whenever xRy, then x


describes a problem instance and y describes a solution.
2. There exists a polynomial time computable function g : Σ∗ → N such
that xRy ⇒ |y| = g(x). Furthermore, g(x) = poly(|x|).

3. There exist polynomial time computable functions φ : Σ∗ × Σ∗ → Σ∗ and


σ : Σ∗ → N satisfying
(a) σ(x) = O(log(|x|)),

(b) g(x) > 0 ⇒ σ(x) > 0,


(c) |φ(x, w)| ≤ |x|,
(d) xRy1 y2 . . . yn ⇔ φ(x, y1 y2 . . . yσ(x) )Ryσ(x)+1 . . . yn .
The solution space of a problem instance x of a self-reducible problem can be represented by a rooted tree called a count tree in the following way. The root of the tree is labeled with x. For any node v of the tree, if the node is labeled by x′ and g(x′) > 0, then v is an internal node, and the number of outgoing edges of v is the number of sequences w of length σ(x′) for which there exists a sequence z such that x′Rwz. The children of v are labeled with φ(x′, w), and the edge (v, u) is labeled by the sequence w if u is the child of v and is labeled by φ(x′, w). The number of solutions of x is the number of leaves of this tree, and for any x′ labeling v, the number of solutions of x′ is the number of leaves of the subtree rooted in v. The function σ is called the granulation function. We cannot expect that we can assign a meaning to an arbitrary suffix of a sequence representing a solution, but we require that we can “granulate” the sequences representing the solutions such that each appearing suffix is meaningful. For example, the sequence

z = “(v3 , v4 ), (v5 , v6 )”

represents a pair of edges, and they form a suffix of the string

y = “(v1 , v2 ), (v3 , v4 ), (v5 , v6 )”

that represents a perfect matching in K6 , and the two edges described in z


form a perfect matching of the complete graph on vertices {v3 , v4 , v5 , v6 }. On
the other hand, the sequence

“4 ), (v5 , v6 )”

is meaningless from the same point of view, although it is still a suffix of y.


It is natural to allow σ to grow as the logarithm of the size of the problem instance. Indeed, in many cases, both the problem instance and the solution contain indexes whose values grow polynomially with the input size, thus the number of characters necessary to write down these indexes grows logarithmically.
functions, since we want self-reducible problems to be locally explorable in
polynomial time. That is, the number of children of any internal node in the
tree representing the solution space can grow only polynomially with the size
of the problem instance.
Self-reducible objects are easy to sample if they are easy to count as the
following theorem states.
Theorem 64. If a self-reducible counting problem is in FP, then there exists
a sampling algorithm that runs in polynomial time and generates solutions
from the uniform distribution.

Proof. Let x be a problem instance of a self-reducible counting problem, and


let l(v) denote the problem instance labeling vertex v in the count tree of x.
Further, let f (x) denote the number of solutions of a problem instance x. A
random solution following the uniform distribution can be generated with the
following recursion.
Set v to the root of its count tree, and set y to the empty sequence. While
g(l(v)) > 0, select a random node u from the children of v following the
distribution
$$\frac{f(l(u))}{f(l(v))}. \quad (6.43)$$
Extend y with the label of the edge (v, u) and set v to u.
It is trivial to see that this recursion generates uniformly a random solu-
tion. It is also easy to see that the recursion runs in polynomial time. Indeed,
a vertex in the count tree has a polynomial number of children that can be ob-
tained in polynomial time. These vertices are labeled with problem instances
that cannot be larger than x, therefore the probabilities in Equation (6.43)
can be computed in polynomial time, and thus, a random child can be drawn
in polynomial time. Since the depth of the count tree is also a polynomial function of |x|, the recursion ends in a polynomial number of steps; thus, the overall running time is also a polynomial function of |x|.

If a self-reducible counting problem is in FP, it does not necessarily mean that it has an algebraic dynamic programming algorithm computing the number of solutions in polynomial time. Indeed, according to the work of Jerrum and Snir, there is no polynomial-size monotone circuit that computes the spanning tree polynomial. On the other hand, it is easy to show that counting the spanning trees is a self-reducible counting problem, and the number of spanning trees of a graph can be computed in polynomial time. What follows is that it is possible to generate uniformly a random spanning tree of a graph in polynomial time, as the following example shows.
Example 18. Let G = (V, E) be an arbitrary graph. Generate uniformly a
random spanning tree of G.
Solution. Fix an arbitrary total ordering of the edges of G. Any spanning
tree is a subset of edges of G that can be described by a 0-1 vector of length
|E|, where 0 at position i means the absence of the ith edge and 1 means the
presence of the ith edge in the spanning tree.
Let e = (u, v) be an edge in G. We will denote by G−e the (possibly
multi)graph that is obtained by contracting the edge e, and having multiple
edges between w and x if both u and v are adjacent to w, where x is the new
vertex appearing in the edge contraction. If G itself is a multigraph, then the
number of edges between w and x is the sum of the number of edges between
w and u and the number of edges between w and v. Observe the following. The
number of spanning trees in G that contain edge e is the number of spanning

trees in G−e in which two spanning trees are distinguished if they contain
different edges between w and x.
Also observe that the number of spanning trees in G that do not contain
edge e is the number of spanning trees in G\{e}. Therefore the spanning trees
of G can be described with the following count tree. Each vertex v is labeled by a (possibly multi)graph G′. If G′ has edges, then v has two children labeled by G′−e′ and G′ \ {e′}, where e′ is the smallest edge in G′. Further, the two edges connecting v to its children are labeled by 1 and 0.
Since the number of spanning trees of multigraphs can be computed in
polynomial time, this count tree can be used to uniformly generate random
spanning trees of G in polynomial time. 
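The following Python sketch walks this count tree; we use the matrix-tree theorem to count the spanning trees of the current (multi)graph, and the floating-point determinants, the adjacency-matrix representation, and all names are our own simplifications:

import random
import numpy as np

def tau(A):
    # Matrix-tree theorem: the number of spanning trees of a multigraph
    # equals any cofactor of its Laplacian.
    if A.shape[0] <= 1:
        return 1.0
    L = np.diag(A.sum(axis=1)) - A
    return float(np.linalg.det(L[1:, 1:]))

def contract(A, i, j):
    # Contract the edge (i, j): merge vertex j into vertex i, dropping loops.
    A = A.copy()
    A[i, :] += A[j, :]
    A[:, i] += A[:, j]
    A[i, i] = 0
    keep = [k for k in range(A.shape[0]) if k != j]
    return A[np.ix_(keep, keep)]

def sample_spanning_tree(A):
    # Walks the count tree: at each edge, choose the "kept" (contract) or
    # "deleted" branch with probability proportional to its number of leaves.
    n = A.shape[0]
    edges = [(u, v) for u in range(n) for v in range(u + 1, n) if A[u, v]]
    comp = list(range(n))        # comp[v]: current index of v's merged vertex
    B = np.array(A, dtype=float)
    tree = []
    for (u, v) in edges:
        cu, cv = comp[u], comp[v]
        if cu > cv:
            cu, cv = cv, cu
        if cu == cv:
            continue             # the edge would close a cycle: it is excluded
        B_del = B.copy()
        B_del[cu, cv] -= 1
        B_del[cv, cu] -= 1
        if random.random() < 1 - tau(B_del) / tau(B):   # P(edge in the tree)
            tree.append((u, v))
            B = contract(B, cu, cv)
            comp = [cu if c == cv else (c - 1 if c > cv else c) for c in comp]
        else:
            B = B_del
    return tree

# A uniform random spanning tree of the 4-cycle (it has 4 spanning trees):
C4 = np.array([[0, 1, 0, 1], [1, 0, 1, 0], [0, 1, 0, 1], [1, 0, 1, 0]])
print(sample_spanning_tree(C4))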

6.6 Markov chain Monte Carlo


Although Buffon already invented random computations in the 18th century, random computations became widespread only after computers were invented. Soon after building the first electronic computer, Metropolis and his coworkers invented Markov chain Monte Carlo methods [128]. Their algorithm was widely used by physicists and chemists. Hastings generalized the Metropolis algorithm in 1970 [91], yielding what we know today as the Metropolis-Hastings algorithm.
Definition 51. A homogeneous, discrete time, finite space Markov chain is
a random process described with the pair (X, T ), where X is a finite set called
state space and T is a function mapping from X × X to the non-negative
real numbers. The members of X are called states. The random process is a
random walk on the states starting at time t = 0. The random walk makes one
step in each unit of time. If x, y ∈ X, then T (y|x) is the conditional probability
that the process jumps to state y given that it is now in state x. The random
walk might jump from a state x back to state x, namely, it is allowed that
T (x|x) is greater than 0. However, T (·|x) must be a probability distribution,
so we require that for any x,
$$\sum_{y\in X} T(y|x) = 1 \quad (6.44)$$
holds. T is also called the transition probabilities. The Markov chain can be described with a directed graph called a Markov graph. $\vec{G} = (V, E)$ is the Markov graph of the Markov chain M = (X, T) if V represents the states of the Markov chain, and there is an edge from vx to vy if the corresponding transition probability T(y|x) is greater than 0.
The Markov chain is irreducible if its Markov graph is strongly connected.

A Markov chain is aperiodic if the greatest common divisor of the directed


cycle lengths of the Markov graph is 1. A Markov chain is reversible with
respect to a distribution π, if the detailed balance is satisfied, that is, for any
x, y ∈ X, equation
π(x)T (y|x) = π(y)T (x|y) (6.45)
holds.
Throughout the book, all introduced Markov chains will be homogeneous,
discrete time and finite space Markov chains, so we simply will refer to them
as Markov chains.
The process might start in a deterministic state or in a random state de-
scribed with a distribution x0 . A deterministic state might also be considered
as a distribution in which the distinguished state has probability 1 and all
other states have probability 0. We can write the transition probabilities into
a matrix T = {ti,j }, where ti,j = T (vi |vj ). Then the distribution of the states
after one step in the Markov chain is

x1 = x0 T (6.46)

and generally, after t steps,


xt = x0 Tt . (6.47)
The central question in the theory of Markov chains is whether the random
process converges and if it converges, what is its limit? There are several
ways to define convergence of distributions; in the theory of Markov chains,
measuring the distance of distributions in total variation distance is the most
common way.
Definition 52. A distribution π is an equilibrium distribution of the Markov
chain M = (X, T ) if for all x ∈ X, equation
$$\pi(x) = \sum_{y\in X} \pi(y)T(x|y) \quad (6.48)$$

holds.
There are two types of convergence. Local convergence means convergence from some given starting distribution. Global convergence means convergence from an arbitrary starting distribution.
Definition 53. A Markov chain converges to distribution π from a distribu-
tion x0 if
lim dT V (x0 Tt , π) = 0. (6.49)
t→∞

A stationary distribution π is globally stable if the Markov chain converges


to π from an arbitrary distribution.
The following theorem is a central theorem providing globally stable sta-
tionary distributions.

Theorem 65. Let M = (X, T ) be an irreducible and aperiodic Markov chain,


and also reversible with respect to the distribution π. Then π is a globally
stable stationary distribution.
We are not going to prove this theorem here, but the proof can be found
in many standard textbooks on Markov chains [25, 121]. Below we provide an
example for a reversible, irreducible, and aperiodic Markov chain.
Example 19. Fix a positive integer n. Let X be the set of Dyck words of
length 2n. Provide an irreducible, aperiodic Markov chain which is reversible
with respect to the uniform distribution over X, thus, the uniform distribution
is the globally stable stationary distribution.
Solution. Consider the following random perturbation. Let D ∈ X be an
arbitrary Dyck word, which is the current state of the Markov chain. Generate
a random i uniformly from [1, 2n − 1]. Let D0 be the word that we get by
swapping the characters in position i and i + 1. If D0 is also a Dyck word,
then the next state in the Markov chain will be D0 , otherwise, it will be D.
It is indeed a Markov chain, since the Markov property holds, that is, what
the next state is depends on only the current state and not on previous states.
We claim that this chain is irreducible, aperiodic, and reversible with respect
to the uniform distribution.
To prove that the Markov chain is irreducible, first observe that if there
is a random perturbation perturbing D1 to D2 , then there is also a random
perturbation from D2 to D1 . Then it is sufficient to show that any Dyck word
can be transformed to a reference Dyck word D0. Let D0 be the Dyck word
$$\underbrace{x \ldots x}_{n}\; \underbrace{y \ldots y}_{n}.$$

Any Dyck word can be easily transformed into D0. Indeed, let D be a Dyck word, and let i be the smallest index such that D contains a y in position i. If i = n + 1, then D = D0, and we are done. Otherwise there is a position i′ such that the character in position i′ is x, and all characters in positions i, i + 1, . . . , i′ − 1 are y. Then we can swap the characters in positions i′ − 1 and i′, then in positions i′ − 2 and i′ − 1, and so on, finally in positions i and i + 1. Now the smallest index where there is a y in the Dyck word is i + 1. Therefore, in a finite number of steps, the smallest index where there is a y will be n + 1, and then we have transformed the Dyck word into D0. Then any Dyck word D1 can be transformed into D2, since both D1 and D2 can be transformed into D0, and the reverses of the transformations from D0 to D2 are also possible transformation steps in the Markov chain. Then we can transform D1 into D0 and then D0 into D2.
To see that the Markov chain is aperiodic, it is sufficient to show that there
are loops in the Markov graph. Indeed, any Dyck word ends with a character
y, therefore whenever the random i is 2n − 1, the Markov chain remains in the
same state. Therefore, in the Markov graph, there is a loop on each vertex,

that is, there are cycles of length 1. Then the greatest common divisor of the
cycle lengths is 1.
We are going to prove that the Markov chain is reversible with respect to
the uniform distribution. We only have to show that for any Dyck words D1
and D2 ,
T (D2 |D1 ) = T (D1 |D2 ) (6.50)
since in the uniform distribution

π(D1 ) = π(D2 ) (6.51)

naturally holds. It is easy to see that Equation (6.50) holds. If D1 cannot be transformed into D2 in a single step (and in this case D2 cannot be transformed into D1 in a single step either), then both transition probabilities are 0. If D1 can be transformed into D2 by swapping the characters in positions i and i + 1 for some i, then D2 can be transformed back to D1 by also swapping the characters in positions i and i + 1. Both transformations have the same probability, $\frac{1}{2n-1}$, therefore the two transition probabilities are equal. 
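A short Python simulation of this chain (our own sketch) illustrates the convergence; for n = 3 the five Dyck words of length 6 are visited with roughly equal frequencies:

import random
from collections import Counter

def dyck_step(D):
    # One transition: swap a uniformly chosen adjacent pair; reject if the
    # result is not a Dyck word (then the chain stays where it is).
    i = random.randrange(len(D) - 1)
    E = D[:i] + D[i + 1] + D[i] + D[i + 2:]
    height = 0
    for ch in E:
        height += 1 if ch == 'x' else -1
        if height < 0:
            return D
    return E

state = 'xxxyyy'                  # n = 3; there are C_3 = 5 Dyck words
counts = Counter()
for _ in range(100000):
    state = dyck_step(state)
    counts[state] += 1
print(counts)                     # all five words appear with frequency ~ 1/5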

It turns out that essentially any Markov chain can be transformed into a
Markov chain that converges to a prescribed distribution π. The technique is
the Metropolis-Hastings algorithm described in the following theorem.
Theorem 66. Let M = (X, T ) be an irreducible and aperiodic Markov chain.
Furthermore, we require that for any x, y ∈ X, the property
$$T(y|x) \ne 0 \implies T(x|y) \ne 0 \quad (6.52)$$
holds. Let π be a non-vanishing distribution on X. We require that for any x, π(x) can be calculated, possibly up to an unknown normalizing constant. In other words, for any x, y ∈ X, the ratio $\frac{\pi(y)}{\pi(x)}$ can be calculated. Then
the following algorithm, called the Metropolis-Hastings algorithm, defines a
Markov chain, which is irreducible, aperiodic, and reversible with respect to
the distribution π.
1. Let the current state be xt. Generate a random y following the conditional distribution T(·|xt). That is, generate a random next state in the Markov chain M assuming that the current state is xt.

2. Generate a random real number u following the uniform distribution on the interval [0, 1]. The next state of the Markov chain, xt+1, is y if
$$u \le \frac{\pi(y)\,T(x_t|y)}{\pi(x_t)\,T(y|x_t)} \quad (6.53)$$
and it is xt otherwise.

Proof. The algorithm generates a Markov chain, since the Markov property holds, that is, the next state depends only on the current state and not on the previous states. Let this defined Markov chain be denoted by M′. The ratio in Equation (6.53) is positive, since the distribution π is non-vanishing and T(x|y) cannot be 0 when y is proposed from x, due to the required property in Equation (6.52). What follows is that M and M′ have the same Markov graph. However, in that case, M′ is irreducible and aperiodic, since M was also irreducible and aperiodic. We have to show that M′ is reversible with respect to the distribution π. First we calculate the transition probabilities in M′. The way to jump from state x to y in M′ is to first propose y when the current state is x. This has probability T(y|x). Then the proposed state y has to be accepted. The acceptance probability is 1 if the ratio in Equation (6.53) is greater than or equal to 1, and the acceptance probability is the ratio itself if it is smaller than 1. Therefore, we get that
$$T'(y|x) = T(y|x)\,\min\left\{1, \frac{\pi(y)T(x|y)}{\pi(x)T(y|x)}\right\}. \quad (6.54)$$

It immediately follows that M′ is reversible with respect to π. Indeed,
$$\pi(x)T'(y|x) = \pi(x)T(y|x)\,\min\left\{1, \frac{\pi(y)T(x|y)}{\pi(x)T(y|x)}\right\} = \min\{\pi(x)T(y|x),\; \pi(y)T(x|y)\}. \quad (6.55)$$
Namely, π(x)T′(y|x) is symmetric in x and y, therefore
$$\pi(x)T'(y|x) = \pi(y)T'(x|y), \quad (6.56)$$
that is, the detailed balance holds.
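A generic sketch of the algorithm in Python might look as follows (the interface and all names are our own illustration; note that only the ratio π(y)/π(x) is ever evaluated, so the normalizing constant of π is not needed):

import random

def metropolis_hastings(x0, propose, T, pi_ratio, n_steps):
    # propose(x): draws y from the proposal distribution T(.|x)
    # T(y, x): evaluates the proposal probability T(y|x)
    # pi_ratio(y, x): evaluates pi(y)/pi(x)
    x = x0
    for _ in range(n_steps):
        y = propose(x)
        r = pi_ratio(y, x) * T(x, y) / T(y, x)   # the ratio of Equation (6.53)
        if random.random() <= r:
            x = y
    return x

# Example: sampling from pi(k) proportional to k on {1,...,6}, using the
# uniform proposal; the proposal is symmetric, so T cancels in the ratio.
x = metropolis_hastings(1,
                        propose=lambda x: random.randint(1, 6),
                        T=lambda y, x: 1 / 6,
                        pi_ratio=lambda y, x: y / x,
                        n_steps=1000)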

6.7 Exercises
1. Generate a random variable following the normal distribution with mean
µ and variance σ 2 .
2. Generate a random variable following the Pareto distribution with parameters xm and α. The support of the Pareto distribution is [xm, ∞) and its cumulative distribution function is
$$1 - \left(\frac{x_m}{x}\right)^{\alpha}.$$

3. * Generate uniformly a random triangulation of a convex n-gon.



4. Generate uniformly a random alignment of two sequences.


5. ◦ Let A, B ∈ Σ∗ be given together with a similarity function s that maps from (Σ ∪ {−}) × (Σ ∪ {−}) \ {(−, −)} to the real numbers. Generate uniformly an alignment from the set of maximum similarity alignments.
6. ◦ Generate uniformly a random perfect matching of a planar graph.
7. Generate uniformly a random Dyck word of length 2n.
8. Let G = (V, E) be a planar graph, and let w : E → R+ be the edge
weights. Generate a random perfect matching of G following the distri-
bution that is proportional to the product of the edge weights.
9. Let Σ be a finite alphabet, and let m : Σ × Σ → Z+ ∪ {0} be an arbitrary
function. Generate uniformly a random sequence that contains the σ1 σ2
substring m(σ1 , σ2 ) times. The running time must be polynomial with
the length of the generated sequence.
10. * Apply the rejection method to generate random variables from the tail of a normal distribution. That is, the target distribution has probability density function
$$\frac{\frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}}{\Phi(-a)}$$
on the domain [a, ∞), a > 0. Use the shifted exponential distribution for the auxiliary distribution and find the best enveloping constant.
11. Show that for any two distributions π and p, the inequality
dT V (π, p) ≤ 1
holds.
12. * Show that for any two distributions π and p, the equality
$$d_{TV}(\pi, p) = \max_{A\subset X} \sum_{x\in A} (\pi(x) - p(x))$$
holds.
13. Show that the total variation distance is indeed a distance.
14. * Develop a Markov chain Monte Carlo method that converges to the
uniform distribution of the possible (not necessarily perfect) matchings
of a graph.
15. Develop a Markov chain Monte Carlo method that converges to the
uniform distribution of the spanning trees of a graph.
16. ◦ Develop a Markov chain Monte Carlo method that converges to the
uniform distribution of permutations of length n.

6.8 Solutions
Exercise 3. Let v1, v2, . . . , vn denote the vertices of the polygon. A triangulation can be expressed with the list of edges participating in the triangulation. We give a recursive function that generates a random triangulation of a polygon defined with a set of edges. For this, we need to define the following family of distributions. The domain of the distribution pi is {0, 1, . . . , i − 1} and
$$p_i(j) := \frac{C_j C_{i-1-j}}{C_i},$$
where Ci is the ith Catalan number. (Since $C_i = \sum_{j=0}^{i-1} C_j C_{i-1-j}$, this is indeed a distribution.) The recursive function is the following:

triangulator(V = {v1, v2, . . . , vn})
   if |V| ≥ 4
      Generate a random j from the distribution p|V|−3
      E0 := ∅
      if j ≠ n
         E0 := E0 ∪ {(v1, vj)}
      if j ≠ 3
         E0 := E0 ∪ {(v2, vj)}
      E1 := triangulator(v1, vj, vj+1, . . . , vn)
      E2 := triangulator(v2, v3, . . . , vj)
      return E0 ∪ E1 ∪ E2
   else
      return ∅.

Exercise 5. Generate a directed acyclic graph whose vertices are the entries
of the dynamic programming table computing the most similar alignment
between the two sequences, and there is an edge from d(i1 , j1 ) to d(i2 , j2 )
if d(i2 , j2 ) sends an optimal value to d(i1 , j1 ) in the dynamic programming
recursion (a similarity value or a gap penalty is added to d(i2 , j2 ) to get
d(i1, j1)). The optimal alignments are the paths from d(n, m) to d(0, 0), where n and m are the lengths of the two sequences. Thus, the task is to sample uniformly a path between two vertices in an acyclic graph.
Exercise 6. Let G = (V, E) be a planar graph. Fix an arbitrary total ordering
on E, and let (vi , vj ) be the smallest edge. Observe that the number of perfect
matchings containing (vi , vj ) is the number of perfect matchings in G\{vi , vj }
(that is, we remove vertices vi and vj from G, together with all the edges
incident to vi or vj ), and the number of perfect matchings not containing
(vi , vj ) is the number of perfect matchings in G \ {(vi , vj )} (that is, we remove
the edge (vi , vj ) from G).

Exercise 10. It is easy to see that if
$$g(x) = ae^{-a(x-a)}$$
and
$$c = \frac{e^{-\frac{a^2}{2}}}{\sqrt{2\pi}\, a\, \Phi(-a)},$$
then
$$cg(x) \ge f(x)$$
for all x ≥ a. Therefore, we can use g and f in a rejection sampling. When the generated random number is x, the acceptance probability is
$$\frac{f(x)}{cg(x)} = \frac{e^{-\frac{x^2}{2}}}{e^{-\frac{a^2}{2}}\, e^{-a(x-a)}} = e^{-\frac{(x-a)^2}{2}}.$$
The expected acceptance probability is
$$\int_{x=a}^{\infty} ae^{-a(x-a)}\, e^{-\frac{(x-a)^2}{2}}\, dx = ae^{\frac{a^2}{2}} \sqrt{\frac{\pi}{2}}\, \mathrm{erfc}\left(\frac{a}{\sqrt{2}}\right),$$
where erfc is the complementary error function of the normal distribution. It can be shown that the expected acceptance probability grows strictly monotonically with a, and it is ≈ 0.65567 when a = 1.
Exercise 12. Let B be the set of points for which π(x) − p(x) ≥ 0. Observe the following.

1. It holds that
$$\sum_{x\in B} (\pi(x) - p(x)) = \max_{A\subset X} \sum_{x\in A} (\pi(x) - p(x)).$$

2. It also holds that
$$\sum_{x\in B} (\pi(x) - p(x)) = -\sum_{x\in \bar{B}} (\pi(x) - p(x)).$$

Since for any $x \in \bar{B}$, −(π(x) − p(x)) = |π(x) − p(x)|, and for any x ∈ B, (π(x) − p(x)) = |π(x) − p(x)|, it holds that
$$\sum_x |\pi(x) - p(x)| = 2\sum_{x\in B} (\pi(x) - p(x)) = 2\max_{A\subset X} \sum_{x\in A} (\pi(x) - p(x)).$$
Dividing both ends of this equality by 2, we get the equality to be proved.


Exercise 14. For example, the following approach works. Let G = (V, E)
be a graph. Set M to the empty set as a starting state of the Markov chain
(observe that the empty matching is also a matching). Then a step in the
Markov chain is the following.

1. Draw uniformly a random edge e from E.

2. If e ∈ M , then remove e from M .


3. If e ∈
/ M and e is not adjacent to any edge in M , then add e to M.
It is easy to see that for any matchings M1 and M2 , the transition probability
from M1 to M2 equals the transition probability from M2 to M1 . Indeed,
if M1 and M2 differ by more than one edge, then the transition probability
is 0. Otherwise the edge in which they differ is chosen from E with the same probability; thus, the transition probabilities are the same.
We also have to show that the Markov chain is irreducible. This is clearly
true since the empty matching can be obtained from any matching by deleting
all the edges, and any matching can be obtained from the empty matching by
adding the edges of the matching in question.
Exercise 16. Applying a random transposition will work, however, it is im-
portant to notice that a transposition changes the parity of the permutation.
That is, if the transitions of a Markov chain are the random transpositions,
then the Markov chain is periodic. Applying the lazy Markov chain technique
solves this problem.
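A sketch of one step of the resulting lazy chain (our own illustration):

import random

def lazy_transposition_step(perm):
    # With probability 1/2 stay put; the laziness removes the periodicity
    # caused by the parity change of a transposition. Otherwise apply a
    # uniform random transposition.
    if random.random() < 0.5:
        return perm
    i, j = random.sample(range(len(perm)), 2)
    perm = perm[:]
    perm[i], perm[j] = perm[j], perm[i]
    return perm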
Chapter 7
Mixing of Markov chains and their
applications in the theory of
counting and sampling

7.1 Relaxation time and second-largest eigenvalue . . . . . . . . . . . . . . . . . . 274


7.2 Techniques to prove rapid mixing of Markov chains . . . . . . . . . . . . . 277
7.2.1 Cheeger’s inequalities and the isoperimetric inequality . 278
7.2.2 Mixing of Markov chains on factorized state spaces . . . . 283
7.2.3 Canonical paths and multicommodity flow . . . . . . . . . . . . . 291
7.2.4 Coupling of Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 298
7.2.5 Mixing of Markov chains on direct product spaces . . . . . 300
7.3 Self-reducible counting problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 301
7.3.1 The Jerrum-Valiant-Vazirani theorem . . . . . . . . . . . . . . . . . . . 302
7.3.2 Dichotomy theory on the approximability of
self-reducible counting problems . . . . . . . . . . . . . . . . . . . . . . . . 307
7.4 Further reading and open questions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
7.4.1 Further reading . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 311
7.4.2 Open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
7.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 313
7.6 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 317

Since it is very easy to design irreducible, aperiodic, and reversible Markov


chains that converge to a prescribed distribution over a finite space, we focus
on them. Such Markov chains have the property that they have a globally
stable stationary distribution. Furthermore, their eigenvalues are all real.

Theorem 67. If M is a reversible Markov chain on a finite state space, then all of its eigenvalues are real, and fall in the interval [−1, 1].
Proof. It comes from the general theory of Markov chains that any eigenvalue of a Markov chain falls within the closed unit disk of the complex plane. Therefore, it is sufficient to show that each eigenvalue is real. Indeed, let T denote the transition matrix of M. Since M is reversible, there exists a distribution π such that it holds for any pair of states (xi, xj) that
$$\pi(x_i)T(x_j|x_i) = \pi(x_j)T(x_i|x_j) \quad (7.1)$$

and therefore it also holds that
$$\sqrt{\frac{\pi(x_i)}{\pi(x_j)}}\; T(x_j|x_i) = \sqrt{\frac{\pi(x_j)}{\pi(x_i)}}\; T(x_i|x_j). \quad (7.2)$$
That is, the matrix
$$\Pi^{-\frac{1}{2}}\, T\, \Pi^{\frac{1}{2}} \quad (7.3)$$
is symmetric, where Π is the diagonal matrix containing the π(xi) values in the diagonal. Any symmetric real matrix can be diagonalized and has only real eigenvalues. That is, it holds that
$$\Pi^{-\frac{1}{2}}\, T\, \Pi^{\frac{1}{2}} = W \Lambda W^{-1} \quad (7.4)$$
for some matrix W and diagonal, real matrix Λ. In that case, T can also be diagonalized and has all real eigenvalues, since
$$T = \Pi^{\frac{1}{2}}\, W \Lambda W^{-1}\, \Pi^{-\frac{1}{2}}. \quad (7.5)$$

In this chapter, any Markov chain is considered to be irreducible, aperiodic,


and reversible, and we will not mention this later on. We also fix the following
notations that will be used throughout the chapter. The state space of the
Markov chain is denoted by X. The transition matrix of a Markov chain is
denoted by T , and the eigenvalues are denoted by 1 = λ1 > λ2 ≥ . . . ≥ λr ,
where r = |X|. We are going to show that λ2, the second-largest eigenvalue, tells us whether a Markov chain provides an FPAUS algorithm. There are techniques to give lower and upper bounds on the second-largest eigenvalue. We use this
theory to prove a dichotomy theory for self-reducible counting problems. Any
self-reducible counting problem is either in FPAUS or essentially cannot be
approximated in polynomial time. To be able to prove it, we also have to
prove another important theorem for self-reducible counting problems: any
self-reducible counting problem is either in both FPAUS and FPRAS, or it is
in neither of these classes.

7.1 Relaxation time and second-largest eigenvalue


Definition 54. The second-largest eigenvalue modulus of a Markov chain is
defined as
max{λ2 , |λr |} (7.6)
and is denoted by ρ. It is also abbreviated as SLEM.

Definition 55. The relaxation time of a Markov chain is defined as

τi (ε) := min{n0 |∀n ≥ n0 , dT V (T n 1i , π) ≤ ε} (7.7)

where the vector 1i contains 0 in each coordinate except in the ith coordinate,
which is 1.
Theorem 68. For a Markov chain, it holds that
$$\tau_i(\varepsilon) \le \frac{1}{1-\rho}\left(\log\left(\frac{1}{\pi(x_i)}\right) + \log\left(\frac{1}{\varepsilon}\right)\right) \quad (7.8)$$
and
$$\max_i\{\tau_i(\varepsilon)\} \ge \frac{\rho}{2(1-\rho)}\, \log\left(\frac{1}{2\varepsilon}\right). \quad (7.9)$$
The proof of the first inequality can be found in [54], while the proof of the
second inequality can be found in [5]. Theorem 68 says that the relaxation time
is proportional to the inverse of the difference between the largest eigenvalue
(that is, 1) and the SLEM. The following theorem says that it is sufficient to
consider the second-largest eigenvalue of a Markov chain.

Theorem 69. Let M be a square matrix with eigenvalues λ1, . . . , λr and eigenvectors v1, . . . , vr. Then the matrix
$$\frac{M+I}{2} \quad (7.10)$$
has eigenvalues $\frac{\lambda_1+1}{2}, \ldots, \frac{\lambda_r+1}{2}$ and eigenvectors v1, . . . , vr, where I is the identity matrix.
Proof. It trivially comes from the basic properties of linear algebraic operations. Indeed,
$$\frac{M+I}{2}\, v_i = \frac{1}{2}(M+I)v_i = \frac{1}{2}(Mv_i + Iv_i) = \frac{1}{2}(\lambda_i v_i + v_i) = \frac{\lambda_i+1}{2}\, v_i. \quad (7.11)$$

Corollary 13. If M is a Markov chain, then the random process defined by the following algorithm (the so-called lazy version of M) is also a Markov chain whose SLEM is $\frac{\lambda_2+1}{2}$ and converges to the same, globally stable stationary distribution.

1. Draw a random number u uniformly from the [0, 1] interval.

2. If $u \le \frac{1}{2}$ then do nothing; the next state of the Markov chain is the current state. Otherwise, the next state is drawn following the Markov chain M.

Indeed, this process is a Markov chain, since the series of random states satisfies the Markov property: where we are going depends on the current state and not on where we came from. Its transition matrix is $\frac{T+I}{2}$, so we can apply Theorem 69. The largest eigenvalue and its corresponding eigenvector do not change, so the Markov chain still converges to the same distribution.
We are ready to state and prove the main theorem on the mixing time of
Markov chains and FPAUS algorithms.

Theorem 70. Let #A be a counting problem in #P, and let x denote a


problem instance in #A with size n. Assume that the following holds:
(a) a solution of x can be constructed in O(poly(n)) time,
(b) there is a Markov chain with transition matrix T that converges to the
uniform distribution of the solutions of x, and for its second-largest
eigenvalue it holds that
$$\frac{1}{1-\lambda_2} = O(\mathrm{poly}(n)), \quad (7.12)$$

(c) there is a random algorithm that for any solution y, draws an entry from
the conditional distribution T (·|y) in polynomial time.

Then #A is in FPAUS.
Proof. Let x be a problem instance of #A, and let ε > 0. Since #A is in #P, there is a constant c > 1 and a polynomial poly1 such that the number of solutions of x is less than or equal to $c^{\mathrm{poly}_1(n)}$. Indeed, any witness can be verified in polynomial time, and the witnesses are described using a fixed alphabet (the alphabet depends only on the problem and not on the problem instance). To verify a solution, it must be read. What follows is that the number of solutions cannot be more than $|\Sigma|^{\mathrm{poly}(n)}$, where Σ is the alphabet used to describe the solutions and poly() is the natural or given polynomial upper bound on the running time to verify a solution.
Having said this, it is easy to show that the following algorithm is an FPAUS:
1. Construct a solution y of x.
2. Using y as the starting point of the Markov chain, do
$$\frac{2}{1-\lambda_2}\left(\mathrm{poly}_1(n)\log(c) + \log\left(\frac{1}{\varepsilon}\right)\right) \quad (7.13)$$
number of steps in the lazy version of the Markov chain.


3. Return with the last state of the Markov chain.

Indeed, the state that the algorithm returns follows a distribution that satisfies Equation (1.31), since
$$\tau_y(\varepsilon) \le \frac{2}{1-\lambda_2}\left(\log\left(\frac{1}{\pi(y)}\right) + \log\left(\frac{1}{\varepsilon}\right)\right) \le \frac{2}{1-\lambda_2}\left(\mathrm{poly}_1(n)\log(c) + \log\left(\frac{1}{\varepsilon}\right)\right). \quad (7.14)$$
The first inequality comes from Theorem 68 and from Corollary 13. The second inequality comes from the observation that $\frac{1}{\pi(y)}$ is the size of the solution space, since π is the uniform distribution, and we showed that $c^{\mathrm{poly}_1(n)}$ is an upper bound on it. The running time of the algorithm is O(poly(n, − log(ε))), since the initial state y can be constructed in polynomial time, there are poly(n, − log(ε)) steps in the Markov chain, and each of them can be performed in O(poly(n)) time.
This theorem justifies the following definition.
Definition 56. Let #A be a counting problem in #P. Let M be a class of
Markov chains, such that for each problem instance x of #A, it contains a
Markov chain converging to the uniform distribution of witnesses of x. Let Mx
denote this Markov chain, and let λ2,x denote its second-largest eigenvalue. We
say that M is rapidly mixing if
$$\frac{1}{1-\lambda_{2,x}} = O(\mathrm{poly}(|x|)). \quad (7.15)$$
Similarly, we can say that a Markov chain is slowly or torpidly mixing if
$$\frac{1}{1-\lambda_{2,x}} = \Omega(c^{|x|}) \quad (7.16)$$

for some c > 1.

7.2 Techniques to prove rapid mixing of Markov chains


In this section, we are going to give bounds on the second-largest eigen-
value. There are three techniques to prove bounds on the second-largest eigen-
value and thus prove rapid mixing of Markov chains. The first one is a ge-
ometric technique. Cheeger’s inequality says that a Markov chain is rapidly
mixing if and only if it does not contain a bottleneck. If we can prove that
a Markov chain walks in a convex body and some mild conditions hold, then
we can prove rapid mixing, since convex bodies do not have a bottleneck.

The second technique is a combinatorial method. It sets up a system of


paths, a distribution of paths between any pair of states. If we can show that
in this path system none of the edges are used extensively, then we can prove
rapid mixing. Indeed, if a Markov chain contains a bottleneck, then there are
only a few edges in it, and in the above-mentioned path system, at least one
of them would be used heavily.
The third technique is a probabilistic method. Consider two Markov chains that depend on each other; however, both of them are copies of the same Markov chain M. The dependence is such that once the two Markov chains reach the same state at a given step n0, then for all n ≥ n0 they remain in the same state. If this coupling happens quickly with high probability, then the Markov chain is rapidly mixing.

7.2.1 Cheeger’s inequalities and the isoperimetric inequality


Cheeger’s inequalities connect the conductance of a Markov chain and its
second-largest eigenvalue. First we define the conductance and then state the
inequalities.
Definition 57. The capacity of a subset S ⊆ X of the state space is defined as
$$\pi(S) := \sum_{x\in S} \pi(x). \quad (7.17)$$
The ergodic flow of a subset S ⊆ X of the state space is defined as
$$F(S) := \sum_{x\in S,\, y\in \bar{S}} \pi(x)T(y|x). \quad (7.18)$$
The conductance of a Markov chain is
$$\Phi := \min\left\{\frac{F(S)}{\pi(S)} \;\middle|\; S \subset X,\; 0 < \pi(S) \le \frac{1}{2}\right\}. \quad (7.19)$$

Theorem 71 (Cheeger's inequality). The second-largest eigenvalue λ2 satisfies the following inequalities:
$$1 - 2\Phi \le \lambda_2 \le 1 - \frac{\Phi^2}{2}. \quad (7.20)$$
The proof can be found in [25], Chapter 6. The two inequalities in Equa-
tion (7.20) will be referred to as the left and right Cheeger’s inequality. This
theorem says that a Markov chain is rapidly mixing if and only if its conduc-
tance is large. We can prove the torpid mixing of a Markov chain by finding
a subset whose ergodic flow is negligible compared to its capacity.

Example 20. A fully balanced, rooted binary tree of depth n contains $2^n$ leaves and altogether $2^{n+1} - 1$ vertices. The vertices at depth k are labeled by $v_{k,1}, v_{k,2}, \ldots, v_{k,2^k}$.
Let M be a set of Markov chains that contains a Markov chain for each fully balanced, rooted binary tree of depth n. The state space of the Markov chain Mn is the set of vertices of the fully balanced rooted binary tree of depth n. The transition probabilities are
$$P(v_{k+1,2j-1}|v_{k,j}) = P(v_{k+1,2j}|v_{k,j}) = P(v_{k-1,\lceil j/2\rceil}|v_{k,j}) = \frac{1}{3}, \quad k = 1, \ldots, n-1, \quad (7.21)$$
$$P(v_{1,1}|v_{0,1}) = P(v_{1,2}|v_{0,1}) = P(v_{0,1}|v_{0,1}) = \frac{1}{3}, \quad (7.22)$$
$$P(v_{n-1,\lceil j/2\rceil}|v_{n,j}) = \frac{1}{3}, \quad (7.23)$$
$$P(v_{n,j}|v_{n,j}) = \frac{2}{3}. \quad (7.24)$$
Show that the Markov chain is torpidly mixing, that is, λ2,n converges to 1 exponentially quickly.
Solution. It is easy to see that the Markov chain is irreducible, aperiodic, and reversible with respect to the uniform distribution of the vertices. Consider the subtree rooted in v1,1, and let S be the set which contains its vertices. This set contains $2^n - 1$ vertices, therefore the capacity is less than half. However, the only way to flow out from this subset is to go from v1,1 to the root, v0,1. Therefore, the ergodic flow divided by the capacity is
$$\frac{\pi(v_{1,1})\,T(v_{0,1}|v_{1,1})}{\pi(S)} = \frac{\frac{1}{2^{n+1}-1}\cdot\frac{1}{3}}{\frac{2^n-1}{2^{n+1}-1}} = \frac{1}{3(2^n-1)}. \quad (7.25)$$
The conductance cannot be larger than this particular value, therefore we get that
$$1 - \frac{2}{3(2^n-1)} \le 1 - 2\Phi \le \lambda_{2,n}. \quad (7.26)$$
That is, λ2,n converges exponentially quickly to 1. In other words, if we rearrange Equation (7.26), then we get that
$$\frac{1}{1-\lambda_{2,n}} \ge \frac{3(2^n-1)}{2} = \Omega(2^n). \quad (7.27)$$
Therefore, the relaxation time grows exponentially quickly. 
In the definition of conductance, there might be double exponentially many subsets to be considered. Indeed, the size of the state space might be exponential in the input size, and the number of subsets with capacity at most one half might be an exponential function of the size of the state space; see also Exercise 8. In spite of this, surprisingly, we can prove rapid mixing of a Markov chain using the right Cheeger's inequality.

Example 21. Let M be a set of Markov chains; the Markov chains in it are defined on the same state spaces as in Example 20. However, in this case, let the jumping probabilities be the following:
$$P(v_{k+1,2j-1}|v_{k,j}) = P(v_{k+1,2j}|v_{k,j}) = \frac{1}{4}, \quad k = 0, \ldots, n-1, \quad (7.28)$$
$$P(v_{k-1,\lceil j/2\rceil}|v_{k,j}) = \frac{1}{2}, \quad k = 1, \ldots, n-1, \quad (7.29)$$
$$P(v_{0,1}|v_{0,1}) = \frac{1}{2}, \quad (7.30)$$
$$P(v_{n-1,\lceil j/2\rceil}|v_{n,j}) = P(v_{n,j}|v_{n,j}) = \frac{1}{2}. \quad (7.31)$$
Show that the Markov chain is rapidly mixing, that is, λ2,n converges to 1 only polynomially quickly.
Solution. Let $S_i := \{v_{i,1}, v_{i,2}, \ldots, v_{i,2^i}\}$. It is easy to see that $\pi(S_i) = \frac{1}{n+1}$, π is the uniform distribution restricted to any Si, and the Markov chain is reversible with respect to π. It follows that for any vi,j,
$$\pi(v_{i,j}) = \frac{1}{(n+1)2^i}. \quad (7.32)$$
We can also observe the following. Let S be an arbitrary subset of vertices. S can be decomposed into connected components, $S = \sqcup_k C_k$. Then
$$\frac{F(S)}{\pi(S)} \ge \min_k\left\{\frac{F(C_k)}{\pi(C_k)}\right\}. \quad (7.33)$$
Indeed,
$$\frac{F(S)}{\pi(S)} = \frac{\sum_k F(C_k)}{\sum_k \pi(C_k)}, \quad (7.34)$$
therefore it is sufficient to show that for any $a_1, a_2, b_1, b_2 > 0$, the inequality
$$\frac{a_1+a_2}{b_1+b_2} \ge \min\left\{\frac{a_1}{b_1}, \frac{a_2}{b_2}\right\} \quad (7.35)$$
holds. Without loss of generality, we can say that $\frac{a_1}{b_1} \le \frac{a_2}{b_2}$. Then we have to show that
$$\frac{a_1+a_2}{b_1+b_2} \ge \frac{a_1}{b_1}. \quad (7.36)$$
Rearranging this, we get that
$$a_1 b_1 + a_2 b_1 \ge a_1 b_1 + a_1 b_2. \quad (7.37)$$
That is,
$$a_2 b_1 \ge a_1 b_2, \quad (7.38)$$
which holds, since $\frac{a_1}{b_1} \le \frac{a_2}{b_2}$.

What follows is that the conductance is taken on a connected subgraph, which has to be a subtree. Let S contain these vertices. We distinguish two cases, based on whether or not S contains v0,1. If S does not contain v0,1, then let vi,j be the root of S. Observe that $\pi(v_{i,j}) \ge \frac{1}{n}\pi(S)$. Indeed, the equality
$$\pi(v_{i,j}) = \pi(v_{i+1,2j-1}) + \pi(v_{i+1,2j}) \quad (7.39)$$
holds for any vertex. Then
$$\Phi = \frac{F(S)}{\pi(S)} \ge \frac{\pi(v_{i,j})\, T\!\left(v_{i-1,\lceil j/2\rceil}\,\middle|\, v_{i,j}\right)}{\pi(S)} \ge \frac{1}{2n}. \quad (7.40)$$
If S contains v0,1, then $\bar{S}$ is a disjoint union of subtrees. These subtrees have roots $v_{i_1,j_1}, v_{i_2,j_2}, \ldots$. For each of these $v_{i_k,j_k}$, $v_{i_k-1,\lceil j_k/2\rceil}$ is in S. Furthermore, it holds that
$$\pi\!\left(v_{i_k-1,\lceil j_k/2\rceil}\right) \ge \frac{2}{n}\,\pi\!\left(G_{v_{i_k,j_k}}\right), \quad (7.41)$$
where $G_{v_{i_k,j_k}}$ denotes the subtree rooted in $v_{i_k,j_k}$. Since $\pi(\bar{S}) \ge \frac{1}{2}$, it follows that
$$\sum_k \pi\!\left(v_{i_k-1,\lceil j_k/2\rceil}\right) \ge \frac{1}{n}. \quad (7.42)$$

Then
$$\Phi = \frac{F(S)}{\pi(S)} \ge \frac{\sum_k \pi\!\left(v_{i_k-1,\lceil j_k/2\rceil}\right)\, T\!\left(v_{i_k,j_k}\,\middle|\, v_{i_k-1,\lceil j_k/2\rceil}\right)}{\frac{1}{2}} \ge \frac{\frac{1}{n}\cdot\frac{1}{4}}{\frac{1}{2}} = \frac{1}{2n}. \quad (7.43)$$
1
Therefore, the conductance is at least $\frac{1}{2n}$. Putting this into the right Cheeger inequality, we get that
$$\lambda_{2,n} \le 1 - \frac{\Phi^2}{2} \le 1 - \frac{1}{8n^2}. \quad (7.44)$$
That is, λ2,n tends to 1 only polynomially quickly with n. 
We are going to use the same example Markov chain to demonstrate other
proving methods. In many cases, the state space of a Markov chain has a
more complex structure than in this example. However, if the states can be
embedded into a convex body, then we might be able to prove rapid mixing,
since a convex body does not have a bottleneck. This might be stated formally
with the following theorem.

Theorem 72. Let X be a convex body in $\mathbb{R}^n$, partitioned into two parts, U and W. Let C denote the (n − 1)-dimensional surface cutting X into U and W. Then the inequality
$$A(C) \ge \frac{\min\{V(U), V(W)\}}{\mathrm{diam}(X)} \quad (7.45)$$
holds, where A denotes the area ((n − 1)-dimensional measure), V denotes the volume, and diam(X) denotes the diameter of X.

The proof can be found in [109]. Below we show an example of how to use
this theorem to prove rapid mixing. In Chapter 8, we will prove that #LE is
in FPAUS applying this theorem.
Example 22. Let M be a set of Markov chains which contains a Markov
chain Mn for each positive integer n. The state space of Mn includes the 0 − 1
vectors of length n. There is a transition between two states if they differ in
exactly one coordinate. Each transition probability is n1 . Prove that the Markov
chain is rapidly mixing.
Solution. It is easy to see that the Markov chain Mn converges to the uniform distribution, and each state has probability $\frac{1}{2^n}$. We will denote this by π.
Let $s = (s_1, s_2, \ldots, s_n)$ be a state in Mn. Assign a convex body to s defined by the following inequalities:
$$\frac{s_i}{2} \le x_i \le \frac{s_i}{2} + \frac{1}{2}. \quad (7.46)$$
It is easy to see that these convex bodies are hypercubes, and their union is the unit hypercube. A transition of the Markov chain can happen between two states whose hypercubes have a common surface which is an (n − 1)-dimensional hypercube. The area of this surface is $\left(\frac{1}{2}\right)^{n-1}$. The volume of each small hypercube is $\left(\frac{1}{2}\right)^{n}$. The diameter of the unit hypercube is $\sqrt{n}$.


Let S be the subset of the state space of the Markov chain that defines the conductance. Let U denote the body corresponding to S and let W denote the body corresponding to $\bar{S}$. Let C denote the surface separating U and W. Observe that C is a union of (n − 1)-dimensional hypercubes that correspond to the possible transitions between U and W. Then on one hand, we know that
$$\Phi = \frac{F(S)}{\pi(S)} = \frac{\sum_{x\in S,\, y\in\bar{S}} \pi(x)T(y|x)}{\sum_{x\in S}\pi(x)} = \frac{\frac{A(C)}{(1/2)^{n-1}}\,\pi\,\frac{1}{n}}{\frac{V(U)}{(1/2)^{n}}\,\pi} = \frac{A(C)}{2nV(U)}. \quad (7.47)$$

On the other hand, from Theorem 72, we know that
$$\frac{A(C)}{V(U)} \ge \frac{1}{\sqrt{n}}, \quad (7.48)$$
therefore we get that
$$\Phi \ge \frac{1}{2n\sqrt{n}}. \quad (7.49)$$
Combining this with the right Cheeger's inequality, we obtain that
$$\lambda_{2,n} \le 1 - \frac{1}{8n^3}. \quad (7.50)$$
That is, the second-largest eigenvalue converges to 1 only polynomially quickly, and the Markov chain is thus rapidly mixing. 
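For small n, the bound in Equation (7.50) can be checked numerically; the sketch below is our own, and for this walk the second-largest eigenvalue is in fact 1 − 2/n, comfortably below the Cheeger bound:

import itertools
import numpy as np

def hypercube_chain(n):
    # Transition matrix of the walk on the 0-1 vectors of length n:
    # each of the n possible coordinate flips has probability 1/n.
    states = list(itertools.product([0, 1], repeat=n))
    index = {s: k for k, s in enumerate(states)}
    T = np.zeros((2 ** n, 2 ** n))
    for s in states:
        for i in range(n):
            t = list(s)
            t[i] ^= 1
            T[index[s], index[tuple(t)]] = 1 / n
    return T

for n in range(2, 7):
    # The matrix is symmetric, so eigvalsh applies; [-2] is lambda_2.
    lam2 = np.sort(np.linalg.eigvalsh(hypercube_chain(n)))[-2]
    print(n, lam2, 1 - 1 / (8 * n ** 3))   # lambda_2 versus the Cheeger bound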

7.2.2 Mixing of Markov chains on factorized state spaces


Intuitively, when there is a partitioning of the state space of a Markov chain such that the Markov chain is rapidly mixing within the partitions and the Markov chain is rapidly mixing among the partitions, too, then the Markov chain is rapidly mixing on the whole space. In this section we provide a mathematically rigorous description of this intuitive observation and prove it. The theorem and its proof were given by Erdős, Miklós and Toroczkai [64].
First, we need a generalization of the Cheeger inequality (the lower bound).
Lemma 14. For any reversible Markov chain, and any subset S of its state space,
$$\frac{1-\lambda_2}{2}\,\min\{\pi(S), \pi(\bar{S})\} \le \sum_{x\in S,\, y\in\bar{S}} \pi(x)T(y|x). \quad (7.51)$$
Proof. The right-hand side of Equation (7.51) is symmetric due to the reversibility of the chain. Thus, if $\pi(S) > \frac{1}{2}$, then S and $\bar{S}$ can be switched. If $\pi(S) \le \frac{1}{2}$, the inequality is simply a rearrangement of the Cheeger inequality (the left inequality in Theorem 71). Indeed,
$$1 - 2\,\frac{\sum_{x\in S,\, y\in\bar{S}} \pi(x)T(y|x)}{\pi(S)} \le 1 - 2\Phi \le \lambda_2. \quad (7.52)$$
Rearranging the two ends of the inequality in Equation (7.52), we get the inequality in Equation (7.51).
Now we are ready to state and prove a general theorem on rapidly mixing
Markov chains on factorized state spaces.
Theorem 73. Let M be a class of reversible, irreducible and aperiodic Markov chains whose state space Y can be partitioned into disjoint classes $Y = \cup_{x\in X} Y_x$ by the elements of some set X. The problem size of a particular chain is denoted by n. For notational convenience we also denote the element y ∈ Yx via the pair (x, y) to indicate the partition it belongs to. Let T be the transition matrix of M ∈ M, and let π denote the stationary distribution of M. Moreover, let πX denote the marginal of π on the first coordinate, that is, πX(x) = π(Yx) for all x. Also, for arbitrary but fixed x let us denote by πYx the stationary probability distribution restricted to Yx, i.e., π(y)/π(Yx), ∀y ∈ Yx. Assume that the following properties hold:

i For all x, the transitions with x fixed form an aperiodic, irreducible and reversible Markov chain denoted by Mx with stationary distribution πYx. This Markov chain Mx has the same transition probabilities as the Markov chain M for all transitions fixing x, except loops, which have increased probabilities such that the transition probabilities sum up to 1. All transitions that would change x have 0 probabilities. Furthermore, this Markov chain is rapidly mixing, i.e., for its second-largest eigenvalue λMx,2 it holds that
$$\frac{1}{1-\lambda_{M_x,2}} \le \mathrm{poly}_1(n).$$

ii There exists a Markov chain M′ with state space X and with transition matrix T′ which is aperiodic, irreducible and reversible w.r.t. πX, and for all x1, y1, x2 it holds that
$$\sum_{y_2\in Y_{x_2}} T((x_2, y_2)|(x_1, y_1)) \ge T'(x_2|x_1). \quad (7.53)$$
Furthermore, this Markov chain is rapidly mixing, namely, for its second-largest eigenvalue λM′,2 it holds that
$$\frac{1}{1-\lambda_{M',2}} \le \mathrm{poly}_2(n).$$

Then M is also rapidly mixing, as its second-largest eigenvalue obeys:
$$\frac{1}{1-\lambda_{M,2}} \le \frac{256\,\mathrm{poly}_1^2(n)\,\mathrm{poly}_2^2(n)}{\left(1-\frac{1}{\sqrt{2}}\right)^4}.$$
S
Proof. For any non-empty subset S of the state space Y = Yx of M we
x
define
X(S) := {x ∈ X | ∃y, (x, y) ∈ S}
and for any given x ∈ X we have

Yx (S) := {(x, y) ∈ Y | (x, y) ∈ S} = Yx ∩ S.

We are going to prove that the ergodic flow F (S) (see Equation (7.18)) from
any S ⊂ Y with 0 < π(S) ≤ 1/2 cannot be too small and therefore, neither
the conductance of the Markov chain will be small. We cut the state space
Mixing of Markov chains and their applications 285

into two parts Y = Y l ∪ Y u , namely the lower and upper parts using the
following definitions (see also Fig. 7.1): the partition X = L t U is defined as

 
π(Yx (S))
L := x∈X ≤ 1/ 2 ,
π(Yx )

 
π(Yx (S))
U := x∈X > 1/ 2 .
π(Yx )
Furthermore, we introduce:
[ [
Y l := Yx and Y u := Yx ,
x∈L x∈U

and finally let


Sl := S ∩ Y l and Su := S ∩ Y u .

Yl Yu

FIGURE 7.1: The structure of Y = Y l t Y u . A non-filled ellipse (with a


simple line boundary) represents the space Yx for a given x. The solid black
ellipses represent the set S with some of them (the Sl ) belonging to the lower
part Y l , and the rest (the Su ) belonging to the upper part (Y u ).

Since M 0 is rapidly mixing we can write (based on Theorem 71):


1
1 − 2ΦM 0 ≤ λM 0 ,2 ≤ 1 − ,
poly2 (n)
or
1
ΦM 0 ≥ .
2poly2 (n)
Without loss of generality, we can assume that poly2 (n) > 1 for all positive
n, a condition that we need later on for technical reasons. We use this lower
bound of conductance to define two cases regarding the lower and upper part
of S.
286 Computational complexity of counting and sampling

1. We say that the lower part Sl is not a negligible part of S when


 
π(Sl ) 1 1
≥ √ 1− √ . (7.54)
π(Su ) 4 2poly2 (n) 2

2. We say that the lower part Sl is a negligible part of S when


 
π(Sl ) 1 1
< √ 1− √ . (7.55)
π(Su ) 4 2poly2 (n) 2

Our plan is the following: the ergodic flow F (S) is positive on any non-empty
subset and it obeys:
π(Sl ) π(Su )
F (S) = F 0 (Sl ) + F 0 (Su ) ,
π(S) π(S)
where
1 X
F 0 (Sl ) := π(x)T (y|x)
π(Sl )
x∈Sl ,y∈S̄

and
1 X
F 0 (Su ) := π(x)T (y|x).
π(Su )
x∈Su ,y∈S̄

In other words, F 0 (Sl ) and F 0 (Su ) are defined as the flow going from Sl and
Su and leaving S.
The value F (S) cannot be too small, if at least one of F 0 (Sl ) or F 0 (Su ) is
big enough (and the associated fraction π(Sl )/π(S) or π(Su )/π(S)). In Case
1 we will show that F 0 (Sl ) itself is big enough. To that end it will be sufficient
to consider the part which leaves Sl but not Y l (this guarantees that it goes
out of S, see also Fig. 7.2). For Case 2 we will consider F 0 (Su ), particularly
that part of it which goes from Su to Y l \ Sl (and then going out of S, not
only Su , see also Fig. 7.3).
In Case 1, the flow going out from Sl within Y l is sufficient to prove that
the conditional flow going out from S is not negligible. We know that for
any particular x, we have a rapidly mixing Markov chain Mx over the second
coordinate y. Let their smallest conductance be denoted by ΦX . Since all these
Markov chains are rapidly mixing, we have that
1
max λMx ,2 ≤ 1 −
x poly1 (n)
or, equivalently:
1
ΦX ≥ .
2poly1 (n)
However, in the lower part, for any particular x one has:
π(Yx (S)) 1
πYx (Yx (S)) = ≤√ ,
π(Yx ) 2
Mixing of Markov chains and their applications 287

Yl Yu

FIGURE 7.2: When Sl is not a negligible part of S, there is a considerable


flow going out from Sl to within Y l , implying that the conditional flow going
out from S cannot be small. See text for details and rigorous calculations.

so for any fixed x belonging to L it holds that


  
1 1
min πYx (Yx (S)), 1 − √ ≤
2poly1 (n) 2
X
≤ πYx ((x, y))T ((x, y 0 )|(x, y))
(x,y)∈S,(x,y 0 )∈S̄

using the modified Cheeger inequality (Lemma 14). Observing that


π((x, y))
πYx ((x, y)) = ,
π(Yx )
we obtain:
 
1 1
π(Yx (S)) 1 − √ ≤
2poly1 (n) 2
  
1 1
≤ min π(Yx (S)), π(Yx ) 1 − √ ≤
2poly1 (n) 2
X
≤ π((x, y))T ((x, y 0 )|(x, y)) .
(x,y)∈S,(x,y 0 )∈S̄

Summing this for all the x’s belonging to L, we deduce that


 
1 1
π(Sl ) 1− √ ≤
2poly1 (n) 2
 
X X
 π((x, y))T ((x, y 0 )|(x, y)) .
x|Yx (S)⊆Sl (x,y)∈S,(x,y 0 )∈S̄
288 Computational complexity of counting and sampling

Note that the flow on the right-hand side of Equation 7.56 is not only going
out from Sl but also from the entire S. Therefore, we have that
 
π(Sl ) 1 1
F (S) ≥ × 1− √ .
π(S) 2poly1 (n) 2
Either π(Sl ) ≤ π(Su ), which then yields
 
π(Sl ) π(Sl ) π(Sl ) 1 1
= ≥ ≥ √ 1− √
π(S) π(Sl ) + π(Su ) 2π(Su ) 8 2poly2 (n) 2

after using Equation 7.54, or π(Sl ) > π(Su ), in which case we have
 
π(Sl ) 1 1 1
> ≥ √ 1− √ .
π(S) 2 8 2poly2 (n) 2

(Note that poly2 (n) > 1.) Thus in both cases the following inequality holds:
   
1 1 1 1
F (S) ≥ √ 1− √ × 1− √ .
8 2poly2 (n) 2 2poly1 (n) 2

In Case 2, the lower part of S is a negligible part of S. We have that

Yl Yu

FIGURE 7.3: When Sl is a negligible part of S, there is a considerable flow


going out from Su into Y l \ Sl . See text for details and rigorous calculations.

1
πX (X(Su )) ≤ √
2
otherwise π(Su ) > 1/2 would happen (due to the definition of the upper part),
and then π(S) > 1/2, a contradiction.
Mixing of Markov chains and their applications 289

Hence in the Markov chain M 0 , based on the Lemma 14, we obtain for
X(Su ) that
  
1 1 X
min πX (X(Su )), 1 − √ ≤ πX (x)T 0 (x0 |x). (7.56)
2poly2 (n) 2 0 x ∈X(Su )
x∈X(Su )

For all y for which (x, y) ∈ Su , due to Equation (7.53), we can write:
X
T 0 (x0 |x) ≤ T ((x0 , y 0 )|(x, y)) .
y0

Multiplying this by π((x, y)), then summing for all suitable y:


X X
π(Yx (S))T 0 (x0 |x) ≤ π((x, y))T ((x0 , y 0 )|(x, y))
y|(x,y)∈Su y 0

(note that x ∈ U and thus Yx (S) = Yx (Su )) and thus


0 0
P P
0 0 y|(x,y)∈Su y 0 π((x, y))T ((x , y )|(x, y))
T (x |x) ≤ .
π(Yx (S))
Inserting this into Equation 7.56, we find that
  
1 1
min πX (X(Su )), 1 − √ ≤
2poly2 (n) 2
X πX (x) X X
≤ π((x, y))T ((x0 , y 0 )|(x, y)).
π(Yx (S)) 0
x∈X(Su ),x0 ∈X(Su ) y|(x,y)∈Su y

πX (x) √
Recall that πX (x) = π(Yx ), and thus π(Y x (S)) ≤ 2 for all x ∈ X(Su ).
Therefore we can write that
  
1 1
min πX (X(Su )), 1 − √ ≤
2poly2 (n) 2
 
√ X X
2  π((x, y))T ((x0 , y 0 )|(x, y)) .
(x,y)∈Su (x0 ,y 0 )|x0 ∈X(Su )

Note that π(Su ) ≤ πX (X(Su )) < 1, and since both items in the minimum
taken in the LHS are smaller than 1, their product will be smaller than any
of them. Therefore we have
 
1 1
√ π(Su ) 1 − √ ≤
2 2poly2 (n) 2
 
X X
≤  π((x, y))T ((x0 , y 0 )|(x, y)) .
(x,y)∈Su (x0 ,y 0 )|x0 ∈X(Su )
290 Computational complexity of counting and sampling

This flow is going out from Su , and it is so large that at most half of it
can be picked up by the lower part of S (due to reversibility and due to
Equation 7.55), and thus the remaining part, i.e., at least half of the flow,
must go out of S. Therefore:
 
π(Su ) 1 1
× √ 1− √ ≤ F (S) .
π(S) 4 2poly2 (n) 2

However, since Su dominates S, namely, π(Su ) > π(S)


2 , we have that
 
1 1
√ 1− √ ≤ F (S).
8 2poly2 (n) 2
Comparing the bounds from Case 1 and Case 2, for all S satisfying 0 < π(S) ≤
1
2 , we can write:
 2
1 1
√ 1− √ ≤ F (S).
16 2poly2 (n)poly1 (n) 2
And thus, for the conductance of the Markov chain M (which is the minimum
over all possible S)
 2
1 1
√ 1− √ ≤ ΦM .
16 2poly2 (n)poly1 (n) 2
Applying this to the Cheeger inequality, one obtains
  2  2
1 √1

16 2poly2 (n)poly1 (n)
1− 2
λM,2 ≤ 1 −
2
and thus
1 256poly21 (n)poly22 (n)
≤ 4
1 − λM,2

1 − √12
which is what we wanted to prove.
Martin and Randall [125] have developed a similar theorem. They assume
a disjoint decomposition of the state space Ω of an irreducible and reversible
Markov chain defined via the transition probabilities P (y|x). They require that
the Markov chain be rapidly mixing when restricted onto each partition Ωi
(Ω = ∪i Ωi ) and furthermore, another Markov chain, the so-called projection
Markov chain P (i|j) defined over the indices of the partitions be also rapidly
mixing. If all these hold, then the original Markov chain is also rapidly mixing.
For the projection Markov chain they use the normalized conditional flow
1 X
P (j|i) = π(x)P (y|x) (7.57)
π(Ωi )
x∈Ωi ,y∈Ωj
Mixing of Markov chains and their applications 291

as transition probabilities. This can be interpreted as a weighted average tran-


sition probability between two partitions, while in our case, Equation (7.53)
requires only that the transition probability of the lower bounding Markov
chain is not more than the minimum of the sum of the transition probabilities
going out from one member of the partition (subset Yx1 ) to the other member
of the partition (subset Yx2 ) with the minimum taken over all the elements
of Yx1 . Obviously, it is a stronger condition that our Markov chain must be
rapidly mixing, since a Markov chain is mixing slower when each transition
probability between any two states is smaller. (The latter statement is based
on a comparison theorem by Diaconis and Saloff-Coste [52].) Therefore, from
that point of view, our theorem is weaker. On the other hand, the average
transition probability (Equation (7.57)) is usually hard to calculate, and in
this sense our theorem is more applicable. Note that Martin and Randall have
also resorted in the end to using chain comparison techniques (Sections 2.2 and
3 in their paper) employing a Metropolis-Hastings chain as a lower bounding
chain instead of the projection chain above. Our theorem, however, provides
a direct proof of a similar statement.

7.2.3 Canonical paths and multicommodity flow


The key idea in this approach is that if we can set up (possibly random)
paths between each pair of states in the Markov graph such that the (expected)
usage of any edge is comparable with the size of the state space (and also some
mild conditions hold), then there is no bottleneck in the Markov chain. That
is, the conductance is high and the Markov chain is rapidly mixing. This
idea was first introduced in the PhD thesis of Alastair Sinclair [157]. Diaconis
and Stroock observed that path arguments can be used to directly bound
the second-largest eigenvalue, and thus the mixing time, independently of the
conductance of the Markov chain [54]. Later Sinclair improved these bounds
extending the method to random paths (multicommodity flow) [158]. Since
then several variants have appeared, from which we show a selection below.
First, we define the maximum load of a path system.
Definition 58. Let G be the Markov graph of a Markov chain. We define the
conductance of an edge e = (vi , vj ) as

Q(e) := π(vi )T (vj |vi ). (7.58)

For each ordered pair of states (vx , vy ), we define a path vx = v1 , v2 , . . . vl = vy


that uses any edge at most once. The path is denoted by γx,y . Let Γ denote
the so obtained path system. The maximum load of Γ is defined as
1 X
ϑΓ := max π(vx )π(vy ). (7.59)
e Q(e) γ 3e
x,y

Sinclair proved the following theorem [157]:


292 Computational complexity of counting and sampling

Theorem 74. Let Γ be a path system of a Markov chain. Then the inequality
1
Φ≥ (7.60)
ϑΓ
holds.
We can combine this theorem with the right Cheeger’s inequality to get
the following corollary:
Corollary 15. Let Γ be a path system of a Markov chain. Then the inequality
1
λ2 ≤ 1 − (7.61)
2ϑ2Γ

holds.

Diaconis and Stroock proved a theorem on mixing rates of Markov chains


that does not use conductance. Their method gives better bound on the mixing
time if the paths are relatively short. They use the Poincaré coefficient that
we define below.
Definition 59. Let the conductance of an edge be defined as above. The Q-
measure of a path is defined as the sum of the resistance of the edges, that is,
the inverse of the conductances:
X 1
|γx,y |Q := . (7.62)
e∈γx,y
Q(e)

Let Γ denote the so obtained path system. The Poincaré coefficient of Γ is


defined as X
κΓ := max |γx,y |Q π(vx )π(vy ). (7.63)
e
γx,y 3e

The following theorem holds for the Poincaré coefficient.


Theorem 75. Let Γ be a path system of a Markov chain. Then the inequality
1
λ2 ≤ 1 − (7.64)
κ
holds.
The proof can be found in [54]. We show an example of how to use these
theorems.

Example 23. Let M be the set of Markov chains as in Example 21. Prove
its rapid mixing using Theorem 74.
Mixing of Markov chains and their applications 293

Solution. Since the Markov graph is a tree, there is only one possible path
system: the path connecting the pair of vertices in the graph. We have to show
that for any edge e going from vi,j to vi−1,d j e ,
2

1 X
π(vx )π(vy ) (7.65)
Q(e) γ 3e
x,y

is polynomially bounded, and the same is true for the antiparallel edge e0
going from vi−1,d j e to vi,j .
2
Let S denote the set of vertices containing vi,j and the vertices below vi,j .
Then
1 X 1 X X
π(vx )π(vy ) = 1 π(vx ) π(vy ). (7.66)
Q(e) γ 3e π(vi,j ) 2 v ∈S
x,y x vy ∈S

We know that
1
π(vi,j ) ≥ π(S), (7.67)
n
therefore
1 X X X
1 π(vx ) π(vy ) ≤ 2n π(vy ) ≤ 2n. (7.68)
π(vi,j ) 2 v ∈S
x vy ∈S vy ∈S

Similarly, we know that


  2
π vi−1,d j e ≥ π(S). (7.69)
2 n
Therefore
1 X 1 X X
π(v x )π(v y ) = π(v x ) π(vy ) ≤
Q(e0 )
 
γx,y 3e0 π vi−1,d j e 14 vx ∈S v y ∈S
2
X
2n π(vy ) ≤ 2n. (7.70)
vy ∈S

Therefore, we get that


ϑΓ ≤ 2n, (7.71)
and thus
1
λ2,n ≤ 1 − . (7.72)
8n2

Sometimes it is hard to handle |γx,y |Q . Sinclair [158] modified the theorem
of Diaconis and Stroock:
294 Computational complexity of counting and sampling

Theorem 76. Let Γ be a path system. Define


1 X
KΓ := max |γx,y |π(vx )π(vy ), (7.73)
e Q(e)
γ 3e x,y

where |γx,y | is the length of the path, that is, the number of edges in it. Then
the inequality
1
λ2 ≤ 1 − (7.74)

holds.
This theorem is related to the maximal load. Indeed, the following corollary
holds.
Corollary 16. For any path system Γ, the inequality
1
λ2 ≤ 1 − (7.75)
l Γ ϑΓ
holds, where lΓ is the length of the longest path in the path system.
When lΓ < ϑΓ , then this inequality provides better bounds. Obviously, if
we would like to prove only polynomial mixing time, any of these theorems
might be applicable. On the other hand, when we try to get sharp estimates,
it is important to carefully select which theorem to use.
When the Markov graph has a complicated structure, or just the opposite,
a very symmetric structure, it might be hard to design a path system such that
none of the edges are overloaded. An overloaded edge might cause weak upper
bound on λ2 and thus fail to prove rapid mixing. In such cases, a random path
system might help. First we define the flow functions.
Definition 60. Let Πi,j denote the set of paths between vi and vj in a Markov
graph, and let Π = ∪i,j Πi,j . A multicommodity flow in G is a function f :
Π → R+ ∪ {0} satisfying
X
f (γ) = π(vi )π(vj ). (7.76)
γ∈Πi,j

We can think of f as having a probability distribution of paths for each


pair of vertices (vi , vj ), and then these probabilities are scaled by π(vi )π(vj ).
We can also define for each edge e
X
f (e) := f (γ), (7.77)
γ3e

and similar to ϑf , the maximal expected load as


f (e)
ϑf := max . (7.78)
e Q(e)
The probabilistic version of Theorem 74 and its corollary can also be stated.
Mixing of Markov chains and their applications 295

Theorem 77. Let f be a multicommodity flow. Then the inequality


1
Φ≥ (7.79)
2ϕf

holds.
Corollary 17. Let f be a multicommodity flow. Then the inequality
1
λ2 ≤ 1 − (7.80)
8ϑ2f

holds.

We give an example of when the canonical path method does not give a
good upper bound, however, the multicommodity flow approach will help find
a good upper bound on the second-largest eigenvalue.
Example 24. Consider the Markov chain whose state space contains the 2n-
long sequences containing n a characters and n b characters in an arbitrary
order. The transitions are defined with the following algorithm. We choose an
index i between 1 and 2n − 1 uniformly. If the characters in position i and
i + 1 are different, then we swap them, otherwise the Markov chain remains
in the current state. Prove that the mixing time grows only polynomially with
n.

Solution. One might think that the following canonical path system gives a
good upper bound on the second-largest eigenvalue.
Let x = x1 x2 . . . x2n and y = y1 y2 . . . y2n . Construct a path γxy in the
following way. If x1 6= y1 , find the smallest index i such that xi = y1 . “Bubble
down” xi , that is, swap xi−1 and xi , then xi−2 with xi−1 (which is the prior
xi ), etc. Then compare x2 and y2 ; if they are different, then find the smallest
i such that xi = y2 , bubble down xi , and so on.
However, this path does not provide a good upper bound. To see this,
consider the edge in the Markov graph going from state

aa . . . a} ba |bb {z
| {z . . . }b aa . . . a}
| {z
n/2-1 n-1 n/2

to state
aa . . . a} bb
| {z . . . }b aa
| {z . . . a} .
| {z
n/2 n n/2

How many γxy paths are going through this edge? The first state x of the path
γxy might be an arbitrary sequence that has a suffix of all a’s of length n/2.
There are n b characters and
 n/2 a characters in its corresponding prefix, that
might be arranged in 1.5nn different ways. Similarly, the last state y of the
296 Computational complexity of counting and sampling

path γxy might be an arbitrary string with a prefix of all a’s of length n/2.
Again, there are 1.5
n possibilities. Therefore, there are
!2
2 1.5n 1.5n
 r 
1.5n 3πn e
≈ =
n πn2πn 0.5n 0.5n n n
 
e e
r  1.5 2n r
3 1.5 3
= 6.75n (7.81)
2πn 0.50.5 2πn

paths that go through this edge. Here we used the Stirling formula to approx-
imate factorials. On the other hand, there are only
  r 2n 2n

2n 4πn 1
≈ n
n n n = √ 4 n
e
(7.82)
n 2πn2πn e e
πn

states in the Markov chain. It is easy to see that the stationary distribution of
the Markov chain is the uniform distribution. In the definition of max load in
Equation (7.59), the stationary distribution probability in Q(e) and one of the
stationary probabilities in the summation cancel each other. There is another
stationary distribution probability in the summation which is (neglecting the
non-exponential terms) in order 41n . However, the summation is on an order
of 6.75n paths (again, neglecting the non-exponential terms). That is, the
maximum load is clearly exponential for this canonical path system.
Consider now a multicommodity flow such that there are at most (n!)2
paths between any pair of states with non-zero measure. First, we construct a
multiset of paths of cardinality (n!)4 . Each path in this multiset has measure

π(vi )π(vj )
f (γ) = . (7.83)
(n!)4

Then the final flow between the two states is obtained by assigning a measure
to each path that is proportional to its multiplicity in the multiset.
The paths with non-zero measures are defined in the following way. Let
x = x1 x2 . . . x2n and y = y1 y2 . . . y2n , and let σx,a , σx,b , σy,a and σy,b be
four permutations. Index both the a characters and the b characters by the
permutations in both x and y. Now transform the indexed x into the index y
in the same way as we did in the canonical path reconstruction. The important
difference is that there is exactly one ai and bj character in both x and y for
each of the indexes i and j. Thus, if x1 is not y1 , then find the unique character
xi in x which is y1 , and bubble it down into the first position. Do this procedure
for each character in x to thus transform x into y. The path assigned to this
transformation can be constructed by removing the indexes and also removing
the void steps (when some neighbors ai and aj are swapped in the indexed
version, but they are two neighbor a’s when indexes are removed, there is
nothing to swap there!). We are going to estimate the number of paths that
can get through a given edge. For this, we are going to count the indexed
Mixing of Markov chains and their applications 297

versions of the paths keeping in mind that each of them has a measure given
in Equation (7.83).
Surprisingly, this multicommodity flow provides an upper bound on the
mixing time which is only polynomial with n. To see this, consider any edge
e swapping characters zi and zi+1 in a sequence z. It is easy to see that any
indexing of the characters in z might appear on indexed paths getting through
e. So fix an arbitrary indexing. We are bubbling down character zi+1 , so there
is some 0 ≤ k < i for which the indexes of the first k characters and the index
of character zi+1 are in order as they will appear in y. The other indexes
appear as they were in x. Then there are at most
 
2n
(k + 1)! (7.84)
k+1

possible indexed x sequences that might be the starting sequence of an indexed


path containing e. Indeed, the first k characters as well as character zi+1 must
be put back to the other 2n − k − 1 characters in arbitrary order with the
exception that the position of zi+1 cannot be smaller than i + 1. The other
2n−k−1 characters can be put into an arbitrary order (zi+1 will be in position
k + 1 in the destination sequence y), thus there are (2n − k − 1)! possible y
sequences. Thus, for a fixed k and a fixed indexing, the flow measure getting
through e is at most
1 
2

(2n
n) 2n 1
(k + 1)!(2n − k − 1)! = . (7.85)
(n!)4 k + 1 (2n)!

There are (n!)2 possible indexes and at most 2n − 1 possible ks. Therefore,
the maximum flow getting through an edge is

(2n − 1)(n!)2
f (e) ≤ . (7.86)
(2n)!

This should be multiplied by the inverse of Q(e), so we get that the maximum
load is
1 (2n − 1)(n!)2
ϑf ≤ 1 1 = (2n − 1)2 . (7.87)
2n 2n−1 (2n)!
(n)
Applying Theorem 77, we can set a lower bound on the conductance that
proves the rapid mixing of the Markov chain. 

Finally, it is also possible to generalize Theorem 76 and its corollary.


Theorem 78. Let f be a multicommodity flow. Define
1 X
Kf := max f (γ)|γ|. (7.88)
e Q(e) γ3e
298 Computational complexity of counting and sampling

Then the inequality


1
λ2 ≤ 1 − (7.89)
Kf
holds.
Corollary 18. Let f be a multicommodity flow. Define
lf := max |γ|. (7.90)
γ∈Π,f (γ)>0

Then the inequality


1
λ2 ≤ 1 − (7.91)
l f ϑf
holds.

7.2.4 Coupling of Markov chains


Coupling is an old idea of Wolfgang Doeblin [55] that was revived in
Markov chain theory thirty years later. The coupling of Markov chains is
defined in the following way.
Definition 61. Let M be a Markov chain on the state space X with transition
probabilities T (·|·). Let M 0 be a Markov chain on X × X, and let T 0 denote
its transition probabilities. We say that M 0 is a coupling of M if the following
equalities hold:
X
T 0 ((x2 , y2 )|(x1 , y1 )) = T (x2 |x1 ) (7.92)
y2
X
T 0 ((x2 , y2 )|(x1 , y1 )) = T (y2 |y1 ). (7.93)
x2
(7.94)
Namely, both coordinates of M 0 behave like a copy of M . One possible
way to achieve this is independency, that is
T 0 ((x2 , y2 |(x1 , y1 ))) = T (x2 |x1 )T (y2 |y1 ), (7.95)
although it is not the only way, and for the application described below is not
desirable. The two coordinates might be forced to get closer without violating
the equalities in Equation (7.92) and (7.92). Once xi = yi is achieved, it is
easy to force the two coordinates to stay together by defining
T 0 ((x2 , x2 )|(x1 , x1 )) := T (x2 |x1 ) (7.96)
and for y 6= x2
T 0 ((x2 , y)|(x1 , x1 )) := T 0 ((y, x2 )|(x1 , x1 )) = 0. (7.97)
Such Markov chains can prove rapid mixing of M as stated by the following
theorem proved by Aldous [6]:
Mixing of Markov chains and their applications 299

Theorem 79. Let M 0 be a coupling of M . Assume that t : [0, 1] → N is a


function such that for any state (x0 , y0 ), it satisfies the inequality:
D E
t(ε)
T 0 1(x0 ,y0 ) 1x6=y ≤ ε (7.98)

where 1(x0 ,y0 ) is the vector containing 0 everywhere except it is 1 on coordinate


(x0 , y0 ), and 1x6=y is the vector containing 1 everywhere except it is 0 on
coordinates (xi , xi ) for all xi ∈ X. In other words, the probability that the two
coordinates of the Markov chain M 0 are not the same after t(ε) steps starting
at an arbitrary state (x0 , y0 ) is less than or equal to ε.
Then t(ε) upper bounds the relaxation time τ (ε) := maxi {τi (ε)} of M .
We demonstrate how to prove rapid mixing with the coupling technique
in the following example. As a warm up, the reader might consider solving
Exercise 18 before reading the example.
Example 25. Use the coupling technique to show that the Markov chain on
the vertices of the d-dimensional unit hypercube {0, 1}d is rapidly mixing. The
transition probabilities are defined in the following way. The Markov chain
remains in the same state with probability 12 , and with probability 12 a random
coordinate is selected and its value x is replaced with 1 − x.
Solution. Let M 0 be a coupling of M whose transition probabilities are
defined in the following way. Let (x, y) denote the current state, where
x = (x1 , x2 , . . . , xd ) and y = (y1 , y2 , . . . , yd )s. In each step, a random coor-
dinate i is selected uniformly and a fair coin is tossed. If xi = yi , then both
or none of them are flipped based on if the tossed coin is a head or tail. If
xi 6= yi , then xi is flipped and yi is not changed if the tossed coin is a head,
otherwise yi is flipped and xi is not changed. Let n(t) denote the number of
coordinates on which x and y are the same in the coupling after t steps of the
Markov chain. Observe the following: if n(t) < d, then the inequality
1
P (n(t + 1) = n(t) + 1) ≥ , (7.99)
d
and
P (n(t + 1) = n(t) + 1 ∨ n(t + 1) = n(t)) = 1. (7.100)
1
Indeed, if n(t) < d, then with at least probability, a coordinate i is selected
d
such that xi 6= yi , and after the step in M 0 , the two coordinates become the
same.
Clearly, n(t) = d indicates that x = y after step t. What follows is that
D E
t(ε)
T 0 1(x0 ,y0 ) 1x6=y (7.101)

is upper bounded by the probability that in a binomial distribution with n = t


and p = d1 , the random variable is less than d. This latter probability is
d−1    i  t−i
X t 1 1
1− . (7.102)
i=0
i d d
300 Computational complexity of counting and sampling

When t ≥ d2 (and thus d ≤ np), we can use Chernoff’s inequality to bound


the sum in Equation (7.102):
d−1    i  t−i
(t d1 − d)2
 
X t 1 1
1− ≤ exp − . (7.103)
i=0
i d d 2t d1
Since this upper bounds the mixing time, we would like to find a t satisfying
(t 1 − d)2
 
exp − d 1 ≤ ε, (7.104)
2t d
that is,
(t − d2 )2
≥ − ln(ε). (7.105)
2td
Solving this inequality, we get that the mixing time is upper bounded by
  s    2
2 1 3
1 2
1
d + d ln + 2d ln + d ln (7.106)
ε ε ε
which is polynomial in both d and − ln(ε).


7.2.5 Mixing of Markov chains on direct product spaces


When a state space is a direct product of smaller spaces, and the Markov
chain changes only one coordinate in one step, then the Markov chain is rapidly
mixing if the number of coordinates is not large and the Markov chains re-
stricted on each coordinate are rapidly mixing. Below we state this theorem
precisely and prove it.
Theorem 80. Let M be a class of Markov chains whose state space is a
K-dimensional direct product of spaces, and the problem size of a particular
chain is denoted by n (where we assume that K = O(poly1 (n))).
Any transition of the Markov chain M ∈ M changes only one coordi-
nate (each coordinate with equal probability), and the transition probabili-
ties do not depend on the other coordinates. The transitions on each coordi-
nate form irreducible, aperiodic Markov chains (denoted by M1 , M2 , . . . MK ),
which are reversible with respect to a distribution πi . Furthermore, each of
1
M1 , . . . MK are rapidly mixing, i.e., with the relaxation time 1−λ 2,i
is bounded
by a O(poly2 (n)) for all i. Then the Markov chain M converges rapidly to the
direct product of the πi distributions, and the second-largest eigenvalue of M
is
K − 1 + maxi {λ2,i }
λ2,M =
K
and thus the relaxation time of M is also polynomially bounded:
1
= O(poly1 (n)poly2 (n)).
1 − λ2,M
Mixing of Markov chains and their applications 301

Proof. The transition matrix of M can be described as


PK hNi−1 i NK
i=1 j=1 Ij ⊗ Mi ⊗ j=i+1 Ij

K
where ⊗ denotes the usual tensor product from linear algebra, Mi denotes the
transition matrix of the Markov chain on the ith coordinate, and Ij denotes
the identical matrix with the same size as Mj . Since all pairs of terms in the
sum above commute, the eigenvalues of M are
( K
)
1 X
λj ,i : 1 ≤ ji ≤ |Ωi |
K i=1 i

where Ωi is the state space of the Markov chain Mi on the ith coordinate. The
second-largest eigenvalue of M is then obtained by combining the maximal
second-largest eigenvalue (maximal among all the second-largest eigenvalues
of the component transition matrices) with the other largest eigenvalues, i.e.,
with all others being 1s:
K − 1 + maxi {λ2,i }
.
K
If g denotes the smallest spectral gap, ie., g = 1 − maxi {λ2,i }, then from the
above, the second-largest eigenvalue of M is
K −g g
=1−
K K
namely, the second-largest eigenvalue of M is only K times closer to 1 than
the maximal second-largest eigenvalue of the individual Markov chains.

7.3 Self-reducible counting problems


Self-reducible counting problems have been introduced in Section 6.5. We
already learned there that it is possible to sample uniformly in polynomial
time those self-reducible objects whose counting problem is in FP. Here we
show that self-reducible counting problems have two additional important
properties. Any self-reducible counting problem has an FPRAS if and only
if it has an FPAUS. They also have a dichotomy theorem: any self-reducible
problem either has an FPRAS or cannot be approximated in polynomial time
with a polynomial approximation factor. We are going to state these precisely
and prove them in the following subsections. Both proofs use the tree structure
of the solution space.
302 Computational complexity of counting and sampling

7.3.1 The Jerrum-Valiant-Vazirani theorem


Mark Jerrum, Leslie Valiant and Vijay Vazirani proved that any self-
reducible problem can be approximated with an FPRAS if and only if it can
be sampled with an FPAUS [103]. The proof in both directions uses the tree
structure of the solution space. The FPAUS ⇒ FPRAS direction is relatively
easy, although technically tedious. The idea is that a small number of sam-
ples of the solutions of a problem instance labeling a vertex v following the
(almost) uniform distribution are sufficient to estimate what fraction of solu-
tions are below the children of v. The FPRAS ⇒ FPAUS direction is trickier.
FPRAS estimations do not lead directly to an FPAUS; a rejection sampling
is needed to get an FPAUS. Below we state the theorem and prove it.
Theorem 81. Let #A be a self-reducible counting problem. Then #A is in
FPRAS if and only if #A is in FPAUS.
Proof. The proof in both directions is constructive. To construct an FPAUS
algorithm from an FPRAS algorithm, we use the rejection sampling technique.
The global picture of the construction is the following. We construct a random
sampler based on FPRAS estimations which generates a random solution from
a non-uniform distribution. This distribution is far from being almost uniform,
however, it has the property that in a rejection sampling algorithm, it has a
fair probability to get accepted. If it is accepted, then the solution is drawn
from exactly the uniform distribution. If it is not accepted, then we start over
till the first acceptance or a given number of trials. If none of the proposals
is accepted, we generate an arbitrary solution. This procedure generates a
distribution which is almost uniform. More precisely, we could do this, if in the
FPRAS estimations it was guaranteed to achieve a given approximation ratio.
However, there is a small probability that the FPRAS estimations are out of
the given boundaries, and in that case the prescribed procedure fails to work.
However, we can handle this small probability, and we have an almost uniform
sampler even if we recognize that with a small probability the procedure fails
and generate a solution from an unknown distribution.
Let x be a problem instance of #A, and let n denote the size of x. Since
there is an ε in the definitons of FPRAS and FPAUS, and we have to use both
of them, we will denote the ε in FPRAS by ε0 in this proof. Let a problem
instance x and ε > 0 be given. We are going to construct a rejection sampling
algorithm with the following properties:
1
1. A solution of x is generated which is accepted with at least 4e2 proba-
bility.

2. Given that a solution is accepted, with less than 2ε probability, the al-
gorithm does not work properly; it generates a solution of x from an
unknown distribution. On the other hand, with at least 1 − 2ε probabil-
ity, the solution is from exactly the uniform distribution of solutions.
Mixing of Markov chains and their applications 303

3. The running time of the algorithm is polynomial with both the size of
x and − log(ε).

We claim that such rejection sampler is sufficient to get an FPAUS algorithm


with parameter ε. Indeed, repeat the rejection sampler till the first accep-
tance but at most 4e2 log 2ε times. Then the probability that all samples are


rejected is
 4e2 log( 2ε )  log( 2ε )
1 1 ε
1− 2 < = . (7.107)
4e e 2
In that case, generate a solution from an arbitrary distribution, for example,
accept the last solution, whatever is it. The running time of this procedure
is polynomial with both the size of x and − log(ε). It generates a solution
following a distribution which is a convex combination of three distributions:
1. With at most 2ε probability, it is an unknown distribution due to reject-
ing all proposals in the rejection sampling.

2. With at most 2ε probability, it is an unknown distribution because the


algorithm does not work properly when accepting a proposal.
3. With at least 1 − ε probability, it is the uniform distribution.
It is easy to see that the total variation distance of the convex combination of
these three distributions and the uniform distribution is at most ε.
So we only have to provide a rejection sampling method with the prescribed
properties. We can do it in the following way. Let v be the root of the tree
describing the solution space of x, and let d = O(poly(n)) denote the depth
of the tree.
While v is not a leaf, iterate the following. Obtain the children of v and
compute the problem instances labeling these vertices. This can be done in
polynomial time due to the definition of self-reducibility. For each of these
problem instances, estimate their number of solutions using the available
FPRAS algorithm with ε0 = 2d 1 ε
relative error and set δ to 2(d+1) . Due to the
definition of FPRAS, this runs in polynomial time with both n and − log(ε).
Then select a random child u from the distribution being proportional to the
estimated number of solutions. Set v to u.
If the selected nodes are v = v0 , v1 , . . . vm , then the probability of gener-
ating the solution y is
Ym
p(y) = P (vi ) (7.108)
i=0

where P (vi ) is the probability of selecting child vi in the ith iteration.


The probability p(y) can be calculated exactly, so in the rejection sampling,
we can set f to p. The target distribution is the uniform one, so the g ≡ 1
function suffices as a function proportional to the target distribution. To set
304 Computational complexity of counting and sampling

up a rejection sampling, we need an enveloping constant such that for any


solution y, the equation
cg(y) = c ≥ f (y) (7.109)
holds. For this, we estimate Y = |{y|xRy}| with the available FPRAS al-
gorithm setting ε0 to 1 and δ to 2(d+1)
ε
. Let this estimation be denoted by
Ŷ . We claim that the inverse of it multiplied by 2e will be a good estima-
tion for the enveloping constant with high probability. Indeed, if all FPRAS
approximations are in the given boundary, then on one hand it holds that
2e e
c := ≥ , (7.110)
Ŷ Y
and on the other hand,
 2d
e 1 1
≥ 1+ ≥ p(y). (7.111)
Y 2d Y

The left-hand side inequality in Equation (7.111) is trivial. To see the right-
hand side inequality, observe the following. If exact calculations were given
instead of FPRAS in all steps during the construction of y, then p would be
the uniform distribution, and thus p(y) was Y1 . However, in case of FPRAS
approximation, it could happen that the number of solutions below the node
that was selected was overestimated by a 1 + d1 factor, while all other nodes
were underestimated by the same factor. Still, the error introduced at that
point cannot be larger than the square of this factor. There are at most n
iterations, and although the errors here are multiplicative, it still cannot be
larger than
 2d
1
1+ . (7.112)
2d
What is the probability of acceptance? With similar considerations, it is easy
to see that
2e 4e
c := ≤ , (7.113)
Ŷ Y
and
1
1 Y
≤ 2d ≤ p(y). (7.114)
eY 1+ 1 2d

Thus the ratio of p(y) and c is smaller than 4e2 , thus the acceptance ratio is
at least 4e12 . All these calculations hold only if all the FPRAS approximations
are in the prescribed boundaries. The probability that any of the FPRAS
approximations fall out of the boundaries is at most
 d+1
ε ε
1− 1− ≤ , (7.115)
2(d + 1) 2
due to the Bernulli ’s inequality.
Mixing of Markov chains and their applications 305

Therefore the prescribed properties hold for the rejection sampler, and
thus, the overall procedure is indeed an FPAUS algorithm.
The global picture on how to construct an FPRAS algorithm from an
FPAUS algorithm is the following. We sample solutions using the FPAUS
algorithm and estimate which fraction of the solutions fall below the different
children of the root. If there are m children and the number of samples falling
below child vi is ni , then the estimated fraction is
ni
fˆi = Pm .
j=1 nj

We select the child with the largest fraction, and iterate this procedure. We
will arrive at a particular solution after k steps. Let fˆ1 , fˆ2 , . . ., fˆk denote the
estimations of the largest fractions during this process. We give an estimation
of the number of solutions as
k
Y 1
.
fˆi
i=1

We claim that this is an FPRAS algorithm when the parameters in this pro-
cedure are chosen appropriately.
Let ε0 and δ be the given parameters of the required FPRAS algorithm.
For each internal node we visit during the procedure, we prescribe a total vari-
ation distance for the FPAUS algorithm and prescribe a number of samples.
Generating the prescribed number of samples with the prescribed FPAUS, we
give an estimation of the largest fraction of solutions that fall below a par-
0
ticular child u with the property that the relative error is at most εd with at
least 1 − dδ probability, where d is still the depth of the tree. We construct the
problem instance x1 labeling u, and sample solutions of x1 to estimate the
largest fraction of the solutions that fall below a child. We iterate this proce-
dure until we arrive at a leaf. Then the solution represented by the leaf is a
fraction of the solution space, and the inverse of this fraction is the number
of solutions. We have an estimation for this fraction. It is the product of the
fractions estimated during the iterations. Since we have at most d estimations
of fractions, this indeed leads to an FPRAS algorithm.
Due to the definition of self-reducibility, there are at most |Σ|σ(x) number
of children of an internal node labeled by the problem instance x. For sake of
simplicity, let the largest number of children be denoted by N . It is easy to
see that N = O(poly(n)). Due to the pigeonhole rule, there is a child below
which at least N1 fraction of the solutions reside. Let u denote this child. If the
samples came from the uniform distribution, the fraction of samples below u
would be an unbiased estimator of the probability that a solution is below u.
However, the samples are only almost uniform, and this causes a systematic
0
bias. Fortunately, we can require that ε in the FPAUS algorithm is 2d|σ|εpoly(n)
and then the running time of the FPAUS algorithm is still polynomial in
ε0
both n and ε10 . Furthermore, the systematic error is smaller than 2d . The
number of samples must be set such that the probability that the measurement
306 Computational complexity of counting and sampling
0
ε
error is larger than 2d is smaller than dδ . Let the probability that a solution
sampled by the FPAUS algorithm is below u be denoted by p. Then the
number of solutions below u follows a binomial distribution with parameter p
and expectation mp, where m is the number of samples. We can use Chernoff’s
inequality saying that for the actual number of samples Y below u, it holds
that
!
ε0
  
mp
P Y < ε0
≤ P Y < mp 1 − ≤
1 + 2d 3d
  2 
ε0
 1 mp − mp 1 − 3d
exp − . (7.116)

2p m

δ
The right-hand side should be bounded by 2d (the other half of the probability
will go to the other tail):
  2 
ε0
 1 mp − mp 1 − 3d δ
exp − ≤ . (7.117)

2p m 2d

Solving Equation (7.117) we get that


δ

−2 log 2d
m≥ ε 0 . (7.118)
p 3d
1
It is easy to see that this bound for m is polynomial in n, ε0 , and − log(δ).
Indeed, p is larger than N1 and d is polynomial in n.
For the other tail, we can also use Chernoff’s inequality:
ε0 2
!
ε0 m(1 − p) − (m − mp(1 +
  
2d ))
P Y > mp 1 + ≤ exp − .
2d 2(1 − p)m
(7.119)
δ
Upper bounding this by 2d and solving the inequality, we get that
δ

−2(1 − p) log 2d
m≥ ε 02 . (7.120)
p2 (2d) 2

It is again polynomial in all n, ε10 , and − log(δ). Therefore, a polynomial


number of samples are sufficient to estimate the fraction of solutions below
vertex u with prescribed relative error and prescribed probability.
Mixing of Markov chains and their applications 307

7.3.2 Dichotomy theory on the approximability of self-


reducible counting problems
Alistair Sinclair and Mark Jerrum observed that very rough estimations
of the number of solutions of self-reducible counting problems is sufficient to
obtain an FPAUS algorithm [156]. What follows is that if we have a very rough
estimation, then we also have an FPRAS, due to Theorem 81. That is, there
is an approximable/non-approximable dichotomy for self-reducible problems:
either they have a rough approximation, and then they also have an FPRAS,
or they do not have even a rough approximation.
The key observation is that a Markov chain on a tree is always reversible,
and it is easy to estimate its stationary distribution.
Lemma 19. Let M be a Markov chain whose states are vertices of a tree,
G = (V, E), and for any two states v, u ∈ V , P (u|v) 6= 0 if and only if u
and v are adjacent. Then there exists a function from w : E → R+ with the
following properties:
1. For any (u, v) ∈ E it holds that

w((u, v))
T (u|v) = P 0
(7.121)
u0 |(u0 ,v)∈E w((u , v))

and
w((u, v))
T (v|u) = P 0
. (7.122)
v 0 |(u,v 0 )∈E w((v , u))

2. The Markov chain is reversible with respect to the distribution


P
u|(u,v)∈E w((u, v))
π(v) := P . (7.123)
2 e∈E w(e)

Proof. We prove that the weight function w can be constructed iteratively.


Let v be an arbitrary leaf, connected to u. Define w((v, u)) := T (u|v)(= 1).
Then consider the other edges of u. For each of u’s neighbors v 0 6= v, define
T (v 0 |u)w((v, u))
w((v 0 , u)) := . (7.124)
T (v|u)

Then indeed for each neighbor v 0 of u, it holds that


w((u, v 0 ))
T (v 0 |u) = P , (7.125)
v”|(u,v”)∈E w(v”|u)

since
P
X v”|(u,v”)∈E T (v”|u)w((v, u)) w((v, u))
w(v”|u) = = , (7.126)
T (v|u) T (v|u)
v”|(u,v”)∈E
308 Computational complexity of counting and sampling

and thus
T (v 0 |u)w((v,u))
0 w((u, v 0 )) T (v|u)
T (v |u) = P = w((v,u))
. (7.127)
v”|(u,v”)∈E w(v”|u) T (v|u)

Now for any neighbor v 0 of u, it holds that either v 0 is a leaf or for one of the
edges of v 0 , the weight of the edge is defined and for all other edges, the weight
is not defined. If v 0 is a leaf, then for its edge weight and transition probability,
Equation (7.121) naturally holds. For any internal nodes, the weights of the
other edges can be defined similarly to u.
We can iterate this procedure until all vertices are reached. Due to the tree
structure, it is impossible that a vertex is visited twice.
It is easy to see that the measure π defined in Equation (7.123) is indeed
a distribution. The detailed balance also holds, since
0
P
u0 |(u0 ,v)∈E w((u , v)) w((u, v))
π(v)T (u|v) = P P 0
=
2 e∈E w(e) u |(u ,v)∈E w((u , v))
0 0

0
P
w((u, v)) v 0 |(v 0 ,u)∈E w((v , u)) w((u, v))
P = P P 0
=
2 e∈E w(e) 2 e∈E w(e) v |(v ,u)∈E w((v , u))
0 0

π(u)T (v|u). (7.128)

The idea of Sinclair and Jerrum was to reverse the construction: we can
define weights for the edges that will define a corresponding Markov chain. If
all weights of the edges connecting leaves to the remaining tree are the same,
then the stationary distribution is the uniform one on the leaves. We need two
further properties: i) the probability of the leaves in the stationary distribution
must be non-negligible, ii) the Markov chain must be rapidly mixing. Both
properties hold if the weights come from a very rough estimation of the number
of solutions, as stated and proved in the following theorem.
Theorem 82. Let #A be a self-reducible counting problem. Let C be a poly-
nomial time computable function such that for any problem instance x of #A,
C(x) gives an approximation for the number of solutions of x with an ap-
proximation factor F(x) = poly(|x|). Let M be a set of Markov chains such
that for each problem instance x in #A, it contains a Markov chain M . The
state space of M is the vertices of the tree representing the solution space of
x, and the transition probabilities are defined in the following way. For each
edge (u, v), where u is a child of v, define w((u, v)) := C(xu ), where xu is the
problem instance labeling u, if u is an internal edge, and let w((u, v)) be 1 if
u is a leaf. The transition probability T (u|v) 6= 0 if and only if u and v are
neighbors. In that case, it is defined as
w((u, v))
T (u|v) := P 0
. (7.129)
u0 |(u0 ,v)∈E w((u , v))
Mixing of Markov chains and their applications 309

Then M is a rapidly mixing Markov chain, its stationary distribution on the


leaves is the uniform distribution, and the inverse of the probabilities of the
leaves in the stationary distribution is polynomially bounded.
Proof. It is easy to see that the stationary distribution is the uniform one on
the leaves, since weights of edges incident to leaves are the same. It is also
easy to show that
X 1
π(v) ≥ , (7.130)
1 + 2g(x)(1 + F(x))
v∈L

where g(x) is the function from the definition of self-reducibility measuring the
length of the solutions, and L is the set of the leaves in the tree. If C measured
exactly the number of solutions, then the following inequality would be true:
X X
C(xu ) ≤ 2|{x|xRy}| (7.131)
v∈Vd u|(u,v)∈E

where Vd is the set of internal vertices at depth d. Indeed, if d is such that


there is no leaf at depth d, then equality would hold. For d such that there
are leaves at depth d, the solutions represented by those leaves are out of the
summation. Since C has an approximation factor F(x), it still holds that
X X
C(xu ) ≤ (1 + F(x))2|{x|xRy}|. (7.132)
v∈Vd u|(u,v)∈E

Thus we have that


g(x)
X X XX X
C(xu ) ≤ C(xu ) ≤
v∈V \L u|(u,v)∈E d=0 v∈Vd u|(u,v)∈E

2g(x)(1 + F(x))|{y|xRy}|. (7.133)

We get that
X |{y|xRy}|
π(v) = P =
2 e∈E w(e)
v∈L
|{y|xRy}|
P P ≥
|{y|xRy}| + v∈V \L u|(u,v)∈E C(xu )
|{y|xRy}| 1
= . (7.134)
|{y|xRy}| + 2g(x)(x)|{y|xRy}| 1 + 2g(x)(1 + F(x))

To prove the rapid mixing, we can use an idea similar to that in Example 21.
The conductance of the Markov chain is taken on a connected subtree S.
There are two cases: the root of S is the root of the whole tree or not. If the
root of S is not the root of the whole tree, then we show that the ergodic flow
310 Computational complexity of counting and sampling

on the root is already comparable with π(S). Let v denote the root of S, let
xv denote the problem instance labeling v, let u denote the parent of v, and
let Vd denote the set of vertices in S in depth d. Observe that
1
T (u|v) ≥ . (7.135)
1 + F(x)2

Indeed, if C measured exactly the number of solutions, then the transition


probability from a vertex to its root was exactly 12 . Since there is an F(x)
approximation factor, in the worst case it could happen that the number of
solutions for the problem instance xv is maximally underestimated, and the
number of solutions for each child of v is maximally overestimated. In that
case,

w((u, v))
T (u|v) = P =
w((u0 , v))
u0 |(u0 ,v)∈E
|{y|xv Ry}|
1+F (x) 1
|{y|xv Ry}|
= . (7.136)
+ |{y|xv Ry}|(1 + F(x)) 1 + F(x)2
1+F (x)

Then
|{y|xv Ry}| 1
F (S) π(v)T (u|v) 1+F (x) 1+F (x)2
Φ= ≥ P ≥P P 0

π(S) v∈S π(v) v 0 ∈S u|(u,v 0 )∈E w((u, v ))
|{y|xv Ry}| 1
1+F (x) 1+F (x)2
=
|{y|xv Ry}| + |{y|xv Ry}|2g(x)(1 + F(x))
1
2
. (7.137)
(1 + F(x))(1 + F(x) )(1 + 2g(x)(1 + F(x)))

If the root of S is the root of the whole tree, then the complement of S is the
union of disjoint subtrees. Similar to the previous calculations, the ergodic
flow going out of their root is at most a polynomial factor smaller than the
probability of the trees. Since the Markov chain is reversible, the ergodic flow
of S equals the ergodic flow of the complement. Thus, the ergodic flow of S is
at most a polynomial factor smaller than the probability of the complement
of S, which cannot be smaller than the probability of S. Thus, the inverse of
the conductance in this case is also polynomially bounded.
Since the conductance is polynomially bounded, the Markov chain is
rapidly mixing.
Corollary 20. Let #A be a self-reducible counting problem. Let C be a poly-
nomial time computable function such that for any problem instance x of #A,
C(x) gives an approximation for the number of solutions of x with a polynomial
approximation factor. Then #A is in FPAUS and thus is in FPRAS.
Proof. We know that there is a rapidly mixing Markov chain on the vertices of
Mixing of Markov chains and their applications 311

the tree representing the solution space of the problem instance x, such that
its stationary distribution restricted to the leaves of the tree is the uniform
distribution; furthermore, the inverse of the probabilities of the leaves in the
stationary distribution is polynomially bounded. Then we can sample a vertex
of the tree from a distribution being very close to its stationary distribution
in polynomial time. The probability the sample is a not a leaf is less than
1
1− (7.138)
poly(|x|)
for some polynomial. Then the probability that none of the samples is a leaf
from O(poly(|x|, − log(ε))) number of samples is less than 2ε . In that case we
choose an arbitrary solution of x. The number of steps of the Markov chain
for one sample should be set such that the total variation distance of the
distribution after the given number of steps restricted to the leaves and the
uniform distribution of the leaves should be smaller than 2ε . Then the following
procedure will be an FPAUS. The procedure makes the prescribed number of
steps in the Markov chain, returns the current state, and iterates it till the
first sampled leaf but at most O(poly(|x|, − log(ε))) times; and in case of no
sampled leaves, it returns an arbitrary solution. Since #A is self-reducible, if
it is in FPAUS, it is also in FPRAS.

7.4 Further reading and open questions


7.4.1 Further reading
• There are several variants of the distinguished path technique. Jason
Schweinsberg introduced a variant where he proved that for any subset
B of the state space V , it holds that
1 4L 1 X
≤ max π(x)π(y). (7.139)
1 − λ2 π(B) e∈E Q(e)
x∈V,y∈B|e3γx,y

The statement also has a probabilistic (multicommodity flow) version.


Schweinsberg used this theorem to give an asymptotically sharp upper
bound on the relaxation time of a Markov chain walking on the leaf-
labeled, unrooted binary tree [152].
• The coupling from the past technique was developed by James Propp
and David Wilson [143]. The technique provides samples following ex-
actly the stationary distribution of a Markov chain. The price we have
to pay for the perfect samples is that the running time of the method
is a random variable. We consider a particular algorithm efficient if the
312 Computational complexity of counting and sampling

expected running time grows polynomially with the size of the problem
instance. An example of such a fast perfect sampler was published by
Mark Huber. His method perfectly samples linear extensions of posets
[95].
• Russ Bubley and Martin Dyer introduced the path coupling technique,
where two Markov chains are coupled via a path of intermediate states.
Using this technique, they proved fast mixing of a Markov chain on
linear extensions of a poset. Their method gave a better upper bound
on the mixing time that could be achieved using a geometric approach
[29].
• Cheeger’s inequalities say that a Markov chain is rapidly mixing if and
only if there is no bottleneck in them. However, bottlenecks might not
only be geometric when there is only a few edges between two subsets
of vertices in the Markov graph. They might also be probabilistic bot-
tlenecks where two subsets of vertices are connected with many edges,
however, the average transition probabilities are very small. Examples
of such bottlenecks are presented in [79] and [131]. Goldberg and Jer-
rum showed that the so-called Burnside process converges slowly. It is a
random walk on a bipartite graph whose vertices are the members of a
permutation group and combinatorial objects on which the group acts.
Two vertices are connected if the group element fixes the combinatorial
object. Restricting the walk to the combinatorial objects (that is, con-
sidering only every second step on the bipartite graph) yields a Markov
chain whose Markov graph is fully connected. Indeed, the identity of the
group fixes all combinatorial objects. Still, the Markov chain might be
torpidly mixing for some permutation groups, as proved by Goldberg
and Jerrum. Miklós, Mélykúti and Swenson considered a Markov chain
whose state space is the set of shortest reversal sorting paths of signed
permutations [131]. The corresponding Markov graph is fully connected,
however, the Markov chain is torpidly mixing since the majority of the
transitions have negligible probability.
• The Lazy Markov chain technique is somewhat paradoxical in the sense
that we have to slow down a Markov chain to prove its rapid mixing.
To avoid it, we have to prove that the smallest eigenvalue is sufficiently
separated from −1. Diaconis and Strock proved an inequality on the
smallest eigenvalue [54]. Greenhill used a similar inequality to prove
rapid mixing of the non-lazy version of a Markov chain sampling regular
directed graphs [83].
• The logarithmic Sobolev inequality was given by Gross [86]. Diaconis
and Saloff-Coste showed its application in mixing time of Markov
 chains 
1
[53]. They gave an inequality for the mixing time where log π(x i)
in
  
1
Equation (7.8) is replaced with log log π(x i)
while 1−λ2 is replaced
Mixing of Markov chains and their applications 313

with the so-called logarithmic Sobolev constant α. Rothaus proved that


α ≥ 2(1 − λ2 ) [147]. What follows is that upper bounds on the mix-
ing time using logarithmic Sobolev inequalities might be several orders
smaller than the upper bound given in Equation (7.8). These upper
bounds might be asymptotically optimal in several cases.

7.4.2 Open problems


• It is easy to see that if a self-reducible counting problem is in FP, then
there exists a polynomial running time algorithm that generates samples
following exactly the uniform distribution. However, we do not know if
the opposite is true. Namely, if a self-reducible counting problem has an
exact uniform sampler in polynomial time, does that imply that it is in
FP?
• We do not know if there exists a counting problem which is in FPRAS
but not in FPAUS, or in FPAUS but not in FPRAS.
• We do not know if there is a problem in #P-complete for which a ran-
dom solution can be generated from exactly the uniform distribution in
deterministic polynomial time.

7.5 Exercises
1. Let T be the transition matrix of a Markov chain constructed by the
Metropolis-Hastings algorithm. Prove that there are matrices W and Λ
for which
T = W ΛW −1
and Λ is diagonal.
2. Let T be the transition matrix of a reversible Markov chain. Prove that
T +I
2
can be diagonalized.
3. * Let p1 , p2 and p3 be three distributions over the same domain, and
let α be a real number in [0, 1]. Show that

dT V (p1 , αp2 + (1 − α)p3 ) ≤ αdT V (p1 , p2 ) + (1 − α)dT V (p1 , p3 ).


314 Computational complexity of counting and sampling

4. * Let M be a Markov chain reversible with respect to the distribution π.


Show that if M is periodic with time period 2, then −1 is an eigenvalue
of M . Construct its corresponding eigenvector, too.
5. Show that a reversible Markov chain cannot be periodic with a time
period larger than 2.
6. Let G = (V, E) be a simple graph, and let d(v) denote the degree of
v. Let M be a Markov graph whose state space is V , and for each
1
(u, v) ∈ E it has a transition from u to v with probability d(u) and a
1
transition from v to u with probability d(v) . Show that G is reversible,
and find its stationary distribution.
7. Let τ (ε) denote maxi {τi (ε)}. Show that
   
1 1
τ (ε) ≤ τ ln .
2e ε

8. ◦ Let X be a finite space and let π be any non-vanishing distribution on


it. Show that the set
 
1
S ⊂ X 0 < π(S) ≤
2
has size at least 2|X|−1 − 1.
9. ◦ Prove that the number of connected subgraphs of a graph might be
an exponential function of the number of vertices of the graph.
10. What is the diameter of the d-dimensional hypercube?
11. What is the diameter of the convex body whose vertices are those ver-
tices of the d-dimensional unit hypercube {0, 1}d which contain an even
number of 1s on their coordinates.
12. Consider the Markov chain on a d-dimensional lattice that contains k
vertices in each dimension. The state space is the k d vertices, and there
is a transition between neighbor vertices, each of them with probability
1
2d . Furthermore, vertices at the border have non-zero transition loop
probabilities, that is, the Markov chain has non-zero probability to stay
in those states. Use the Cheeger inequality and Theorem 72 to show
that the mixing time of this Markov chain grows polynomial with both
d and k.
13. ◦ Consider a set of Markov chains M that contains a Markov chain
Mn for each positive integer n. The state space of Mn contains the
permutations Sn , and there is a transition between any two permutations
1
differing in two neighbor positions. Each transition has probability n−1 .
Use the canonical path method to prove that M is rapidly mixing.
Mixing of Markov chains and their applications 315

14. Consider a set of Markov chains M that contains a Markov chain Mn


for each positive integer n. The state space of Mn contains the per-
mutations Sn , and there is a transition between any two permutations
differing in two (not necessarily neighbor) positions. Each transition has
2
probability n(n−1) . Use the canonical path method to prove that M is
rapidly mixing.
15. * Let G be a simple graph which is a union of paths and cycles. Let
M be a Markov chain whose state space contains the (not necessarily
perfect) matchings of G. There is a transition from a matching x1 to
another matching x2 if they differ in exactly one edge. The transition
probability is  
1 1
min ,
2k 2n
where k is the number of edges in the matching that contains more edges
and n is the number of possible extensions of the other matching with
one more edge. In each state, the Markov chain remains in the state with
the remaining probability. Show that this is indeed a Markov chain, that
is, the sum of the prescribed probabilities never exceeds 1, furthermore,
show that the chain is reversible with respect to the uniform distribution.

16. ◦ Using the canonical path method, show that the Markov chain in Exer-
cise 15 is rapidly mixing, that is, the mixing time only grows polynomial
with the size of the ground graph G.
17. ◦ Let M be a Markov chain whose states are the monotone paths on
the 3D lattice from (0, 0, 0) to (a, b, c) (a, b, c ∈ Z+ ). Each path can be
described as the series of steps in the directions of the 3 axes of the
3D coordinate system. For example, xxzy means 2 steps in the first
dimension, one step in the third dimension, and one step in the second
dimension. There is a transition between two states if they differ in
two consecutive steps. The transition probability between any two such
states is
1
.
a+b+c−1
The chain remains in the same state with the remaining probabilities.
Show that this Markov chain converges to the uniform distribution. Us-
ing multicommodity flow, show that the mixing time grows only poly-
nomially with a + b + c.
18. * Alice and Bob are playing against the devil. The devil put a hat on
Bob’s head and a hat on Alice’s head. The hat might be white or black
and the two hats might be the same or different colors. Alice and Bob
might discuss a strategy before the game, but after that they cannot
communicate and can see only the hat of the opposite player. From this
information, they have to guess the color of their own hat.
316 Computational complexity of counting and sampling

(a) What should their strategy be to guarantee that with probability


1, one of them guesses correctly?
(b) Work out a strategy such that with probability 1, one of them guess
correctly. Furthermore, for both Alice and Bob, it holds that their
guess is “white” with probability 0.5 regardless of the color of the
other person’s hat.

19. Let M be a set of Markov chains that contains a Markov chain Mn for
each positive integer n. The state space of Mn is the set of Dyck words
of length 2n, and the transition probabilities are defined by the following
algorithm.
(a) Draw uniformly a random number between 1 and 2n − 1.
(b) If swapping the characters in positions i and i + 1 is also a Dyck
word, then swap them. Otherwise, do nothing.
Using the coupling argument, prove that M is rapidly mixing.
20. ◦ Use the coupling technique to show that the Markov chain in Exam-
ple 21 is rapidly mixing.

21. Consider the Markov chain on the d-dimensional toroidal lattice that
contains k vertices on its circles in each dimension. The state space of
the Markov chain contains the k d vertices of the toroidal lattice, and the
transition probabilities are between neighbor vertices; each transition
1
probability is 2d . Using the coupling technique, prove that the mixing
time grows polynomially with both d and k.
22. * Show that counting the spanning trees of a graph is a self-reducible
counting problem.
23. Show that counting the shortest paths between two vertices of a graph
is a self-reducible counting problem.

24. * Let D be a degree sequence of length n, and let F be a subset of edges


of the complete graph Kn . We say that G = (V, E) is a realization of D
avoiding F if E ∩ F = ∅ (and obviously, G is a realization of D). Show
that counting the realizations of a degree sequence avoiding a star is a
self-reducible counting problem.

25. Show that counting the alignments of two sequences is a self-reducible


counting problem.
Mixing of Markov chains and their applications 317

7.6 Solutions
Exercise 3. The proof comes directly from the definition of total variation
distance and basic properties of the absolute value function. Indeed,
1X
dT V (p1 , αp2 + (1 − α)p3 ) = |p1 (x) − αp2 (x) − (1 − α)p3 (x)| ≤
2 x
1X
(|αp1 (x) − αp2 (x)| + |(1 − α)p1 (x) − (1 − α)p3 (x)|) =
2 x
αdT V (p1 , p2 ) + (1 − α)dT V (p1 , p3 ).

Exercise 4. Let U and V be the two classes of the bipartite Markov graph.
Consider the vector π̃ defined as

π̃(u) := π(u)

for all u ∈ U and


π̃(v) = −π(v)
for all v ∈ V . It is easy to see that π̃ is an eigenvector with eigenvalue −1.
Indeed, observe that for any u ∈ U
X
π(v)T (u|v) = π(u)
v∈V

since π is an eigenvector with eigenvalue 1, and thus


X X
π̃(v)T (u|v) = −π(v)T (u|v) = −π(u).
v∈V v∈V

Observe that similar equalities hold for all v ∈ V . That is,

T π̃ = −π̃,

which means that π̃ is indeed an eigenvector with eigenvalue −1.


Exercise 8. Observe that for any S ⊆ X, the equation

π(S) + π(S) = 1

holds.
Exercise 9. Consider, for example, the caterpillar binary tree, that is, the
binary tree in which the internal vertices form a path.
Exercise 13. A good canonical path can be defined by bubbling down the
appropriate elements of the permutation to the appropriate positions. We
can set an upper bound on the maximum load of an edge similar to the
calculations in the solution of Example 24. The difference is that we do not
318 Computational complexity of counting and sampling

have to index the member of the permutations; observe that the indexed
sequences in Example 24 behave like the permutations.
Exercise 15. Consider the Markov chain M 0 that with 12 probability, deletes
a uniformly selected edge from the presented edges and with 12 probability,
adds an edge uniformly selected from those edges with which the matching can
be extended. Observe that the given transition probabilities are the transition
probabilities of the Markov chain that we get by applying the Metropolis-
Hastings algorithm on M 0 and setting the target distribution to the uniform
one.
Exercise 16. The canonical path can be constructed similar to the Markov
chain presented in Section 8.2.5.
Exercise 17. The construction of the multicommodity flow is similar to the
one presented in the solution of Example 24.
Exercise 18.
(a) Alice will say the color of Bob’s hat, Bob will say the opposite color of
Alice’s hat. If they have hats of the same color, then Alice will guess
properly, if they have hats of different colors, then Bob will guess cor-
rectly.
(b) They toss a fair coin. If it is a head, they play the previous strategy, if
it is a tail, they flip the roles.
Exercise 20. Couple the two Markov chains when they are both at the root
of the tree. Show that the waiting time for this event only grows polynomially
with the size of the problem.
Exercise 22. Let G = (V, E) be a simple graph. Fix an arbitrary total order
on E. Let ei be the smallest edge in E. Then in any spanning tree T of G,
either e ∈ T or e 6∈ T . If ei ∈ T , then let T ei be the tree obtained from T
by contracting e, and let Gei be the graph obtained from G by contracting
ei . Then T ei is a spanning tree of Gei , furthermore any spanning tree T̃ ei of
Gei can be obtained by contracting e in a spanning tree T̃ of G. Since for any
T 6= T̃ , it also holds that T ei 6= T̃ ei , contracting ei in those spanning trees of
G that contain e is a bijection between those trees and the spanning trees of
Gei .
Similarly, there is a bijection between the spanning trees of G not contain-
ing ei and the spanning trees of G \ {ei }. This provides a way to encode the
spanning trees of G. If the encoding starts

ei ∈ T,

then the extensions are the spanning trees of Gei ; if the encoding starts

ei 6∈ T,

then the extensions are the spanning trees of G \ {ei }. It is easy to see that
the granulation function σ has O(log(n)) values: encoding index i takes log(n)
Mixing of Markov chains and their applications 319

bits, where n = |V |. It is also easy to see that all g, σ and φ are polynomial
time computable functions, the encoding of both Gei and G \ {ei } are shorter
than the encoding of G, and property 3d in the definition of self-reducibility
also holds due to the above-mentioned bijections.
Exercise 24. Fix an arbitrary total order on the vertices. Let D =
d(v1 ), d(v2 ), . . . d(vn ) denote the degree sequence. Let v1 be the smallest vertex
and assume that there is already given a star S with center v0 as a forbid-
den set of edges. S might be empty. Let vi be the smallest vertex such that
(v1 , vi ) ∈
/ S. Then the realizations of D avoiding S and not containing edge
(v1 , vi ) are the realizations of D avoiding S ∪ {(v1 , vi )} and the realizations
of D avoiding S and containing edge (v1 , vi ) are the realizations of the degree
sequence D0 = d(v1 ) − 1, d(v2 ), . . . , d(vi ) − 1, . . . , d(vn ) avoiding S ∪ {(v1 , vi )}.
Finally, the realizations of D avoiding the star with n − 1 leaves and center
v1 are the realizations of D” = d(v2 ), . . . , d(vn ).
Chapter 8
Approximable counting and
sampling problems

8.1 Sampling with the rejection method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 321


8.1.1 #Knapsack . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 322
8.1.2 Edge-disjoint tree realizations without common internal
vertices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 323
8.2 Sampling with Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
8.2.1 Linear extensions of posets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 326
8.2.2 Counting the (not necessarily perfect) matchings of a
graph . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
8.2.3 Sampling realizations of bipartite degree sequences . . . . . 330
8.2.4 Balanced realizations of a JDM . . . . . . . . . . . . . . . . . . . . . . . . . 334
8.2.5 Counting the most parsimonious DCJ scenarios . . . . . . . . 340
8.2.6 Sampling and counting the k-colorings of a graph . . . . . . 353
8.3 Further results and open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
8.3.1 Further results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 356
8.3.2 Open problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 357
8.4 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 358
8.5 Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 360

There are #P-complete counting problems that are very easy to approximate.
We already discussed in Chapter 6 that an FPAUS can be given for sam-
pling satisfying assignments of a disjunctive normal form. Since counting the
satisfying assignments of a DNF is a self-reducible counting problem, we can
conclude that #DNF is also in FPRAS.
The core of the mentioned FPAUS is a rejection sampler for which the
inverses of the acceptance probabilities are polynomial bounded. In this chap-
ter, we show two further examples of FPAUS algorithms based on rejection
samplers.
We know only Markov chain approaches for generating almost uniform
samples of solutions to problem instances in #P-complete counting problems.
The second part of this chapter introduces examples for such Markov chains.

321
322 Computational complexity of counting and sampling

8.1 Sampling with the rejection method


8.1.1 #Knapsack
There are several variants of the Knapsack problem. Here we introduce a
simple variant whose counting version is in FPAUS and FPRAS.
Problem 17.
Name: #Knapsack.
Input: a series of integer numbers w1 ≤ w2 ≤ . . . ≤ wn and an integer W .
Output: the number of subsets of indexes I ⊆ [n] satisfying that
X
wi ≤ W. (8.1)
i∈I

The counting problem #Knapsack is known to be #P-complete [136]. Fol-


lowing the work of Dyer [56], we show that it is in FPAUS and FPRAS. The
problem is very similar to the money change problem, and indeed, it is easy to
see that an (algebraic) dynamic programming algorithm can be constructed
with O(nW ) running time. However, this is exponential in the number of dig-
its of W . If W is a small number, say, less than or equal to n2 , then the
algebraic dynamic programming approach still runs in polynomial time with
n. Therefore, we might assume that W > n2 .
To handle this case, we scale down the weights to get a feasible solution.
We define new weights as  2 
0 n
wi := wi (8.2)
W
and we are interested in the subset of weights whose sum does not exceed n2 .
Let S denote the set of solutions of the original problem, and let S 0 denote
the set of solutions of the modified problem. It is easy to see that S ⊆ S 0 .
Indeed, for any I ∈ S, it holds that
X n2 X n2
wi0 ≤ wi ≤ W = n2 . (8.3)
W W
i∈I i∈I

On the other hand, S 0 \ S might be non-empty. However, we are going to show


that
|S 0 |
≤ |S|. (8.4)
n+1
Let I 0 ∈ S 0 \ S and let I be obtained from I 0 by removing the largest index k.
We show that I is in S. The first observation is that wk > W n . Indeed, if wk
was less than or equal to W n , then it would hold that
X XW W
wi ≤ ≤n = W,
n n
i∈I 0 0
i∈I
Approximable counting and sampling problems 323

contradicting that I 0 ∈
/ S. In particular, wk0 ≥ n.
To simplify the notation, we define

n2
δi := wi − wi0 ,
W
that is,
W
wi =(w0 + δi ) .
n2 i
We are ready to show that I 0 \ {k} is in S. Indeed,
X W X
wi = (wi0 + δi )
n2
i∈I 0 \{k} i∈I 0 \{k}
 
W X 0 X
= wi − wk0 + δi 
n2 0 i∈I i∈I 0 \{k}
!
W X
≤ wi0 − wk0 +n since ∀δi ≤ 1
n2
i∈I 0
W X 0
≤ wi since wk0 ≥ n
n2 0 i∈I
W 2
≤ n = W.
n2
Since any I ∈ S can be obtained in at most n different ways from an I 0 ∈ S 0 \S,
it follows that
|S 0 \ S| ≤ n|S|.
Therefore
|S 0 | ≤ (n + 1)|S|.
Uniformly sampling from S 0 can be done in polynomial time with n. The
1
probability of sampling a solution from S is larger than n+1 . Since checking
if a solution is indeed in S can be done in O(n log(W )), we can use rejection
sampling to obtain an FPAUS algorithm whose running time is polynomial
with n and log(W ) (that is, polynomial with the size of the input) and also
polynomial with − log(ε), where ε is the allowed deviation from the uniform
distribution measured in total variation distance.

8.1.2 Edge-disjoint tree realizations without common inter-


nal vertices
We define the following counting problem.
Problem 18.
Name: #MinDegree1-2Trees-DegreePacking.
324 Computational complexity of counting and sampling

Input:
P two
P degree sequences D = d1 , d2 , . . . , dn , F = f1 , f2 , . . . , fn such that
i di = i fi = 2n − 2 and for all i, min{di , fi } = 1.
Output: the number of edge-disjoint tree realizations of D and F .
We will refer to D and F as tree degree sequences without common internal
nodes. To prove that #MinDegree1-2Trees-DegreePacking is in FPAUS and
FPRAS, we need the following observations. The first is a well-known fact,
and its proof can be found in many textbooks on enumerative combinatorics.
Observation 2. The number of trees with degree sequence d1 , d2 , . . . , dn is
(n − 2)!
Qn . (8.5)
k=1 (dk − 1)!

We can use this observation to prove the following lemma.


Lemma 21. Let D and F be tree degree sequences without common internal
nodes. Let TD and TF be two, independent random tree realizations following
the uniform distribution. Then the expected number of common edges is 1.
Proof. First, we make an observation. Let T be a random tree following the
uniform distribution of the realizations of the degree sequence d1 , d2 , . . . , dn .
Then the probability that vi and vj are adjacent in T is
di + dj − 2
. (8.6)
n−2
To see this, map the trees in which vi and vj are adjacent to the trees of n − 1
vertices with degree sequence

D0 = d1 , d2 , . . . , di−1 , di+1 , . . . , dj−1 , dj+1 , . . . , dn , di + dj − 2

by contracting vi and vj . It is easy to see that each tree realization of D0 is


j −2
an image did+d

i −1
times. Therefore, the number of tree realizations of D in
which vi is adjacent to vj is

(n − 3)! (di + dj − 2)!


Q . (8.7)
(di + dj − 3)! k6=i,j (dk − 1)! (di − 1)!(dj − 1)!

The probability that vi and vj are adjacent is the ratio of the values in Equa-
tions (8.7) and (8.5), which is indeed
di + dj − 2
. (8.8)
n−2
We define the following sets.

A := {vi |di = 1 ∧ fi > 1} (8.9)


B := {vj |dj > 1 ∧ fj = 1} (8.10)
(8.11)
Approximable counting and sampling problems 325

Observe that there might be parallel edges of the two tree realizations only
between these two sets. The expected number of edges is
X X (di − 1)(fj − 1) X di − 1 X fj − 1
2
=
(n − 2) n−2 n−2
vi ∈A vj ∈B vi ∈A vj ∈B
n n
X di − 1 X fj − 1
=1 (8.12)
i=1
n − 2 j=1 n − 2

since di = 1 for all vi ∈ A and fj = 1 for all vj ∈ B, and the sum of the
degrees decreased by 1 is n − 2 in any tree degree sequence.
We are ready to prove the following theorem.
Theorem 83. The counting problem #MinDegree1-2Trees-DegreePacking is
in FPAUS and FPRAS
Proof. Let D and F be two degree sequences without common internal nodes.
If there is a vertex vi such that di = n − 1 or fi = n − 1, then di + fi = n, and
it implies that D and F does not have edge-disjoint tree realizations. So we
can assume that there are vertices vi1 , vi2 , vj1 and vj2 such that di1 , di2 > 1
and fj1 , f (j2 ) > 1. The number of tree pairs (TD , TF ) in which (vi1 , vj1 ) and
(vi1 , vj1 ) are edges in both TD and TF is
(n − 4)!
Q ×
(di1 − 2)!(di2 − 2)! k6=i1 ,i2 (dk − 1)!
(n − 4)!
Q (8.13)
(fj1 − 2)!(fj2 − 2)! k6=j1 ,j2 (fk − 1)!

Since the expected number of common edges is 1, the number of edge-disjoint


realizations is also at least this number. What follows is that the probability
that two random trees are edge disjoint is at least
(di1 − 1)(di2 − 1)(fj1 − 1)(fj2 − 1)
p := . (8.14)
(n − 2)2 (n − 3)2
Sampling a random tree from the uniform distribution of realizations of a
degree sequence can be done in polynomial time. The key observation is that
i −1
the probability that a fixed leaf is adjacent to a vertex vi is dn−2 . Then we
can select a random neighbor of a given leaf, and then generate a random tree
of the remaining degree sequence.
We claim that the following algorithm is an FPAUS. Given a ε > 0, gen-
erate a random pair of tree realizations of D and F till the first edge-disjoint
realization, but at most − log(ε)
p times. If none of the pairs are edge-disjoint,
generate an arbitrary solution. Such a solution can be done in polynomial time
[117]. The probability that none of the trials are edge-disjoint realizations is
− log(ε)
(1 − p) p . (8.15)
326 Computational complexity of counting and sampling

It is easy to show that it is indeed smaller than ε. The generated edge-disjoint


realization follows a distribution π which is a convex combination of two distri-
butions: the uniform distribution and an unknown. Since the coefficient of the
unknown distribution is smaller than ε, the total variation distance between
π and the uniform distribution is less than ε.
Let ξ denote the indicator variable that two random trees are edge disjoint.
Then the number of edge-disjoint tree realizations is
(n − 2)! (n − 2)!
E[ξ] Qn Qn . (8.16)
k=1 (dk − 1)! k=1 (fk − 1)!

Since the inverse of the expectation is polynomial bounded, the necessary


number of samples for an FPRAS estimation is polynomial in all n, 1ε , and
− log(δ).

8.2 Sampling with Markov chains


8.2.1 Linear extensions of posets
Let (A, ≤) be a finite poset, |A| = n, and let M be a Markov chain whose
states are the total orderings of (A, ≤). The transitions are defined by the
following algorithm. Let ai1 , ai2 , . . . , ain be the current total ordering. With
probability 12 do nothing, namely, we are operating a Lazy Markov chain.
With probability 12 , select an index k uniformly between 1 and n − 1. If aik
and aik+1 are uncomparable, swap them, otherwise do nothing. It is easy to see
that the Markov chain is reversible with respect to the uniform distribution.
We are going to prove that the Markov chain is rapidly mixing, that is, the
inverse of the spectral gap is polynomial bounded. We use the right Cheeger
inequality and geometric considerations. Recall that the poset polytope is
given by the inequalities

0 ≤ xi ≤ 1 (8.17)
xi ≤ xj ∀ ai ≤ aj . (8.18)

Since any poset polytope contains the points (0, 0,√. . . , 0) and (1, 1, . . . , 1),
the diameter of a poset polytope of n elements is n. We can define the
poset polytope of any total ordering. Since there are n! total orderings of
an antichain of size n, and the polytopes of its total orderings has the same
1
volume, the volume of a total ordering polytope of n elements is n! .
The transitions of the Markov chain corresponds to the common facet of
the two total ordering polytopes. It is defined by the inequality system

0 ≤ xi1 ≤ xi2 ≤ . . . ≤ xik = xik+1 ≤ . . . ≤ xin . (8.19)


Approximable counting and sampling problems 327

√ of a total ordering of n − 1 elements except it


It is almost a poset polytope
is stretched by a factor of 2 in the direction xik = xik+1 . Hence the surface

2
(n − 1-dimensional volume) of the facet is (n−1)! .
Now consider the subset S of the total orderings that defines the conduc-
tance of M . Let U be the union of the poset polytopes of total orderings in S,
and let W be the union of the poset polytopes of total orderings in S. Finally,
let C denote the surface between U and W . Then we know that
A(C) 1
P P
π(x)T (y|x)

2
π(·) 2(n−1)
ΦS x∈S y∈S (n−1)! A(C) 1
= P = V (U )
= √
πS x∈S π(x) 1 π(·) V (U ) 22n(n − 1)
n!
(8.20)
where π(·) denotes the probability of any element of the state space, V denotes
the volume, and A denotes the area. Indeed, S contains V (U 1
)
total orderings,
n!
A(C)
and the number of transitions between S and S is √
2
. The poset polytope
(n−1)!
is a convex body, since it is defined by a set of linear inequalities. Therefore,
we can apply Theorem 72 saying that
A(C) 1
≥√ . (8.21)
V (U ) n

Indeed, min{V (U ), V (W )} = V (U ), and the diameter is n. Thus we get that
1
Φ≥ √ √ . (8.22)
22n(n − 1) n
Combining it with the right Cheeger inequality, we get that
Φ2 1
λ2 ≤ 1 − ≤1− (8.23)
2 16n (n − 1)2
3

that is,
1
≤ 16n3 (n − 1)2 . (8.24)
1 − λ2
The number of linear extensions is clearly at most n!. Therefore, by applying
Theorem 68, we get for the relaxation time that
  
1
τi (ε) ≤ 16n3 (n − 1)2 log(n!) + log . (8.25)
ε
This mixing time is clearly polynomial in both n and − log(ε). Finding one
total ordering of a poset can be done in polynomial time as well as performing
one step in a Markov chain can be done in polynomial time. What follows
is that there is an FPAUS for almost uniformly sampling linear extensions
of a poset. Since the number of linear extensions is a self-reducible counting
problem, we get the following theorem.
Theorem 84. The counting problem #LE is in FPRAS.
328 Computational complexity of counting and sampling

8.2.2 Counting the (not necessarily perfect) matchings of a


graph
We showed in Subsection 4.2.4 that counting the matchings in a graph is
#P-complete and it remains #P-complete even in planar graphs. Jerrum and
Sinclair [99] showed that the problem is in FPAUS and since the problem is
self-reducible, it is also in FPRAS.
The FPAUS is provided by a rapidly mixing Markov chain. The authors
proved the rapid convergence of a more general Markov chain. Let G = (V, E)
be an arbitrary graph, and let w : E → R+ be edge weights. If M ⊆ E is a
matching, the weight of the matching is
Y
W (M ) := w(e). (8.26)
e∈M

Consider a Markov chain whose state space is the set of the possible matchings
of G. We will denote this set with M(G). If M is the current state, choose an
edge e = (u, v) uniformly at random, and do the following:
1
(a) If e ∈ M , then move to M \ {e} with probability 1+w(e) .

(b) If e ∈
/ M and there is no edge in M which is incident to u or v, then
w(e)
move to M ∪ {e} with probability 1+w(e) .

(c) If e0 = (u, w) ∈ M and there is no edge in M which is incident to v,


w(e)
then move to M ∪ {e} \ {e0 } with probability w(e)+w(e 0) .

(d) Otherwise, do nothing.


The lazy version of this Markov chain is clearly irreducible and aperiodic. It
is easy to show that it is reversible with respect to the distribution

π(M ) ∝ W (M ). (8.27)

Indeed, there is a transition from M to M 0 if M and M 0 differ by an edge


or they differ by two adjacent edges. In the first case, w.l.o.g. we can assume
that M 0 = M ∪ {e}. Then

π(M 0 )T (M |M 0 ) = π(M )w(e)T (M |M 0 ) =


1
π(M )w(e) = π(M )T (M 0 |M ). (8.28)
1 + w(e)

In the second case, w.l.o.g. we can assume that M 0 = M ∪ {e} \ {e0 }. Then

w(e)
π(M 0 )T (M |M 0 ) = π(M ) T (M |M 0 ) =
w(e0 )
w(e) w(e0 )
π(M ) = π(M )T (M 0 |M ). (8.29)
w(e ) w(e) + w(e0 )
0
Approximable counting and sampling problems 329

We will apply the canonical path method to prove rapid mixing (Theo-
rem 76). We fix an arbitrary total ordering of the edges, and construct a path
system Γ, which contains a path for any pair of matchings.
Let X and Y be two matchings. We create a canonical path from X to
Y in the following way, on which Z will denote the current realization. The
symmetric difference X∆Y is a union of disjoint alternating paths and alter-
nating cycles. The total ordering of the edges defines a total ordering of the
components. We transform X to Y by working on the components in increas-
ing order. Let Ci be the ith component. There are two cases: Ci is either an
alternating path or an alternating cycle. If it is a path, then consider its edges
started from the end of the path containing the smaller edge, and denote them
by e1 , e2 , . . . , eli . If e1 is presented in Z, remove it. If it is not presented in Z,
then add it, and remove e2 . Then continue the transformation by removing
and adding edges till all the edges in Y are added, and all the edges in X are
removed (the last step might be adding the last edge of the path).
If Ci is a cycle, then remove the smallest edge e ∈ Ci ∩ Z. Then Ci \ {e}
is a path, and work with it as described above.
To give an upper bound on KΓ , we introduce

M := X∆Y ∆Z (8.30)

for each Z. Let e = (Z, Z 0 ) be an edge in the Markov graph. How many
canonical paths use this edge? We claim that X and Y can be unequivocally
reconstructed from the edge (Z, Z 0 ) and M . Indeed, observe that

M ∆Z = X∆Y, (8.31)

therefore X∆Y can be reconstructed from e and M . From this, we can identify
which is the component Ci that is being changed (recall that there is a total
ordering of the components that does not depend on Z). Observe that all
edges in Cj ∩ Z, j < i come from Y and all edges in Cj \ Z come from X.
Similarly, for all k > i, all edges in Ck ∩ Z come from X, and all edges in
Ck \ Z come from Y . Furthermore, from the edge e, we can also tell which
edges in Ci come from X and which edges come from Y . Since X, Y and Z
are identical on Z \ (X∆Y ), we can determine X and Y from e and M .
What does M look like? We claim that M is almost a matching except it
might contain a couple of adjacent edges. Indeed, M is a matching on M \ Ci ,
and might contain two adjacent edges on Ci if one of them is already removed
from Z and the other is going to be added to Z 0 . Let M ∗ be M if M is a
matching (which is the case if the edge (Z, Z 0 ) represents the first alteration
on the current component), and otherwise let M ∗ be M \ {f }, where f is the
edge of the adjacent pair of edges in M which is going to be added by the
operation represented by e.
Also observe that
1 π(X)π(Y )
π(M ∗ ) = . (8.32)
w(f ) π(Z)
330 Computational complexity of counting and sampling

It is also clear that any path from any X to Y has length at most 3n
5 , where
n is the number of vertices. The smallest transition probability is 1+w1max ,
where wmax is the largest weight. In this way, we can get an estimation of
the load of an edge e = (Z, Z 0 ). Let F denote the set containing the edge in
G that is added in the operation represented by e if such edge exists. If the
operationQrepresented by e deletes only an edge, then F is the empty set. Let
w(F ) := f ∈F w(f ), where the empty product is defined as 1. Then we have
the following upper bound on the load of e.
1 X 3n X π(X)π(Y )
|γX,Y |π(X)π(Y ) ≤ ≤
Q(e) γ 3e 5(1 + wmax ) γ π(Z)
X,Y X,Y

3n(1 + wmax ) X 3n(1 + wmax )


π(M )w(F ) ≤ wmax . (8.33)
5 5
M ∈M(G)

2
Therefore, KΓ in Theorem 76 is O(nwmax ). According to the theorem, the
Markov chain is rapidly mixing if the maximum weight is upper bounded by
a polynomial function of n.

8.2.3 Sampling realizations of bipartite degree sequences


The number of bipartite degree sequence realizations is the following prob-
lem.
Problem 19.
Name: #BDSR.
Input: a degree sequence D = {d1,1 , d1,2 , . . . , d1,n }, {d2,1 , d2,2 , . . . , d2,m }.
Output: the number of vertex-labelled simple bipartite graphs G = (U, V, E),
such that for all i, d(ui ) = d1,i , and for all j, dvj = d2,j .
The decision version problem is the bipartite degree sequence problem.
If there is at least one simple graph whose degrees are D, then D is called
graphical, and such a simple graph G is called a realization of D.
The bipartite degree sequence problem is in P. It is the straight corollary
of the Gale-Ryser theorem (see Exercise 42 in Chapter 2).
A swap operation removes two edges e1 = (u1 , v1 ) and e2 = (u2 , v2 ),
u1 6= u2 , v1 6= v2 from a (bipartite) graph G, and adds two edges (u1 , v2 ) and
(u2 , v1 ) (given that neither of these edges are presented in G). Such a swap
operation is denoted by
(u1 , v1 ) × (u2 , v2 ).
Clearly, a swap operation does not change the degrees of the vertices in G.
These small perturbations are sufficient to transform any realizations of a
bipartite degree sequence to any another one, as the following theorem states.
Theorem 85. Let G and G0 be two realizations of the same bipartite degree
sequence, then there exists a finite series of graphs
G = G1 , G2 , . . . , Gk = G0
Approximable counting and sampling problems 331

such that for all i = 1, 2, . . . , k − 1, Gi can be transformed into Gi+1 with


a single swap operation. Furthermore, for the number of swap operations it
holds that
|E(G)∆E(G0 )|
k−1< .
2
Proof. Take the symmetric difference H = G∆G0 . Since G and G0 are re-
alizations of the same bipartite degree sequence, each component of H is a
Eulerian graph, and can be decomposed into alternating cycles (the edges in
a cycle come from G and G0 Salternating) since the graphs are bipartite. Fix a
m
cycle decomposition of H = j=1 Cj . Clearly,

G0 = G∆C1 ∆C2 ∆ . . . ∆Cm .

Therefore, it is sufficient to show how to transform G into G∆C, where C is


an alternating cycle (the edges in C are alternatingly presented and absent
along the cycle). Take a walk along C starting in the smallest vertex u1 and
start walking on the edge in G0 . Let u1 , v1 , u2 , v2 , . . . ul , vl be the vertices of C
along this walk. Observe that the edge (u1 , v1 ) is not in G, by the definition of
the walk. Find the smallest i such that (u1 , vi ) is an edge in G. Such i exists,
since (u1 , vl ) is an edge in G. Then the swap operations

(u1 , vi ) × (ui , vi−1 ), (u1 , vi−1 ) × (ui−1 , vi−2 ), . . . , (u1 , v2 ) × (u2 , v1 )

can be applied on G (in the given order). If i = l, then G is transformed into


G∆C. Otherwise the vertices u1 , vi , ui+1 , vi+1 , . . . vl represent an alternating
cycle in the modified graph. This cycle is smaller than C, and (u1 , vi ) is not an
edge in the modified graph. We can apply the same procedure on this shorter
cycle, and after a finite number of iterations, i will be l, and thus, G will be
transformed into G∆C.
We showed how to transform G into G∆C1 with a finite series of swap
operations. Then since G∆C1 ∆C2 = (G∆C1 )∆C2 , G can be transformed
into G∆C1 ∆C2 with a finite series of swap operations, and generally into
G0 = G∆C1 ∆C2 ∆ . . . ∆Cm with a finite series of swap operations.
The number of swap operations necessary to transform G into G∆C is
|C|−2
2 . Therefore, the number of necessary swap operations to transform G
|E(G)∆E(G0 )|
into G0 is indeed less than 2 .
We will call the series of swap operations transforming G into G∆C the
sweeping process, and u1 is the cornerstone of the sweeping. Theorem 85
provides a way to design a Markov chain by exploring the realizations of a
bipartite degree sequence. Indeed, it is easy to see that the lazy version of
the following Markov chain is reversible, aperiodic and irreducible. The state
space contains the realizations of a bipartite degree sequence. The transition
probability from G1 to G2 is
1
n m
 
2 2
332 Computational complexity of counting and sampling

if G1 can be transformed into G2 with a single swap operation, and 0 otherwise.


The transition probability from G to itself (the probability that the Markov
chain remains in the same state G) is

c(G)
1− n
 m
2 2

where c(G) denotes the number of possible swap operations on G.


It is conjectured that this Markov chain is rapidly mixing for any degree
sequence. However, it is proved only for some special classes of degree se-
quences. Here we introduce the sketch of the proof for half regular bipartite
degree sequences. First, we define these degree sequences.
Definition 62. A bipartite degree sequence D = {d1,1 , d1,2 , . . . , d1,n },
{d2,1 , d2,2 , . . . , d2,m } is half regular if for all i1 and i2 , d1,i1 = d1,i2 .

The proof is based on the multicommodity flow method (Theorem 78).


The multicommodity flow is defined in the following way. Let X and Y be
the realizations of the same half regular bipartite degree sequence D. The
cardinalities of the two vertex classes
 aredenoted by n and m. Let H = X∆Y .
For each vertex w in H, there are d(w) 2 ! ways to pair the edges of X to the
edges of Y incident to v, where d(w) denotes the degree of w in H. For each
possible pairing on each vertex in H, we define a cycle decomposition on H.
Let Φ denote a fixed ensemble of pairings, and let ϕv (e) denote the pair of e
in Φ on vertex v. Let ui be the smallest index vertex in H that does not Thave
degree 0. Let vj be the smallest index vertex for which (ui , vj ) is in H Y .
Let (ui , vj ) be denoted by e. Then define a circuit that starts with e, ends
with ϕui (e), and contains edges

e = e1 , e2 , . . . , el

where for each ek = (uk , vk ), ek+1 is defined as ϕvk (ek ). Denote the so-defined
circuit by C1 . If H \ C1 is not the empty graph, repeat the same on H \ C1 to
get a circuit C2 . The process is iterated till H \ (C1 ∪ C2 ∪ . . . ∪ Cs ) is the empty
graph. Then each Ci is decomposed into cycles Ci,1 , Ci,2 , Ci,ji . The cycle Ci,1 is
the cycle between the first and second visit of w, where w is the first revisited
vertex in Ci (note that w might be both in U and V ). Then Ci,2 is defined in
the same way in Ci \ Ci,1 , etc. The path from X to Y is defined by processing
the cycles
C1,1 , C1,2 , . . . C1,j1 , C2,1 , . . . Cs,sj
applying the sequence of swap operations as described in the proof of Theo-
rem 85. For the so-obtained path γ, we define

π(X)π(Y )
f (γ) := Q  .
d(w)
w∈V (H) 2 !
Approximable counting and sampling problems 333

Let the number of realizations of D be N . Since π is the uniform distribution,


and the length of any path is less than nm according to Theorem 85, it holds
that
n m
 
1 X 2 nm
X
2
f (γ)|γ| < Q   1.
Q(e) γ3e N d(w)
! γ3e
w∈V (H) 2

Therefore, if the number of paths in the path system going through any edge
is less than
Y  d(w) 
poly(n, m)N !
2
w∈V (H)

for some poly(n, m), then the swap Markov chain is rapidly mixing. Let Z
be a realization on a path γ going from X to Y obtained from the ensemble
of pairings Φ, and let e be the transition from Z to Z 0 . Let MG denote the
adjacency matrix of a bipartite graph G, and let

M̂ := MX + MY − MZ . (8.34)

Miklós, Erdős and Soukup [130] proved that the path γ and thus X and Y can
be unequivocally obtained from M̂ , Φ, e and O(log(nm)) bits of information.
The proof is quite involved and thus omitted here. The corollary is that the
number of paths going through a particular e = (Z, Z 0 ) is upper bounded by
Y  d(v) 
poly(n, m)|MZ | ! (8.35)
2
v∈V (H)

where MZ are the set of possible M̂ matrices defined in Equation (8.34).


Therefore, it is sufficient to show that the number of M̂ matrices is upper
bounded by
poly(n, m)N.
To prove this, we show that any possible M̂ is in Hamming distance 12 from an
adjacency matrix of a realization of the degree sequence D. This is sufficient
to prove that
|MZ | ≤ poly(n, m)N
since M̂ ∈ {−1, 0, 1, 2}n×m , and the number of matrices in {−1, 0, 1, 2}n×m
which are at most Hamming distance 12 from a given adjacency matrix of a
realization of D is
12  
X nm i
3.
i=1
i

First observe that M̂ has the same row and column sums as the adjacency
matrix of any realization of D, and it might contain at most 3 values which
are not 0 or 1. Indeed, if

Z = X∆C1 ∆C2 ∆ . . . ∆Ck


334 Computational complexity of counting and sampling

then M̂ is a 0-1 matrix, and thus is an adjacency matrix of a realization of


D. If Z is a realization during processing a cycle Ci , then there might be 3
chords of Ci whose corresponding values in M̂ are not 0 or 1. Particularly,
there might be two 2 values and one −1. A value 2 might appear when Z does
not contain an edge which is presented both in X and Y , and a −1 might
appear when Z contains an edge which is neither in X nor in Y . Furthermore
these “bad” values are in the same line corresponding to the cornerstone of
the sweeping process.
Assume that m̂i,j = 2. There must be an i0 such that m̂i0 ,j = 0. Since D
is half-regular, each row in the adjacency matrix of a realization has the same
sum. Due to the pigeonhole rule, there is a j 0 , such that m̂i,j 0 < m̂i0 ,j 0 . Since
all bad values are in the same row, it follows that m̂i,j 0 ≤ 0 and m̂i0 ,j 0 ≥ 0. If
m̂i,j 0 = −1 and m̂i0 ,j 0 = 0, then there must be another j” such that m̂i,j” <
m̂i0 ,j” , and since there are at most one −1 in M̂ , it follows that m̂i,j” = 0
and m̂i0 ,j” = 1. Recall j” to j 0 in this case. Changing m̂i,j to 1, m̂i0 ,j 0 to 1,
increasing m̂i,j 0 by 1, and decreasing m̂i0 ,j 0 by 1 does not change the row and
column sums, however, it eliminates at least one “bad” value from M̂ . Let’s
call this modified matrix M̂ 0 . It is easy to see that M̂ 0 has a Hamming distance
4 from M̂ .
If there is another 2 value in M̂ 0 , we can use the same procedure. The
so-obtained M̂ ” has Hamming distance 4 from M̂ 0 , and thus has at most
Hamming distance 8 from M̂ . It is easy to see that a −1 can also be eliminated
by changing 4 appropriately selected entries. The so-obtained matrix is an
adjacency matrix of a realization of D. Since the at most 3 “bad” values
can be eliminated by changing at most 3 × 4 = 12 entries in M̂ , M̂ is at
most Hamming distance 12 from an adjacency matrix of a realization of D.
Therefore, the number of paths going through on a particular edge is upper
bounded by the value in Equation (8.35), and thus
n m
 
1 X 2 nm
X
2
f (γ)|γ| <   1 ≤ poly(nm). (8.36)
Q(e) γ3e N w∈V (H) d(w)
Q
! γ3e
2

By Theorem 78, this proves the rapid mixing of the Markov chain.

8.2.4 Balanced realizations of a JDM


A symmetric matrix J with non-negative integer elements is the joint
degree matrix (JDM) of an undirected simple graph G if the element Ji,j gives
the number of edges between the class Vi of vertices all having degree i and
the class Vj of vertices all with degree j in the graph. In this case we also say
that J is graphical and that G is a graphical realization of J. Note that there
can be many different graphical realizations of the same JDM.
Given a JDM, the number of vertices ni = |Vi | in class i is obtained from:
Pk
Ji,i + j=1 Ji,j
ni = , (8.37)
i
Approximable counting and sampling problems 335

where k denotes the maximum number of degrees. This implies that a JDM
also uniquely determines the degree sequence, since we have obtained the
number of nodes of given degrees for all possible degrees. For sake of uniformity
we consider all vertex classes Vi for i = 1, . . . , k; therefore we consider empty
classes with ni = 0 vertices as well. A necessary condition for J to be graphical
is that all the P
ni -s are integers. Let n denote the total number of vertices.
Naturally, n = i ni and it is uniquely determined via Equation (8.37) for a
given graphical JDM. The necessary and sufficient conditions for a given JDM
to be graphical are provided in the following theorem
Theorem 86. [50] A k × k matrix J is a graphical JDM if and only if the
following conditions hold:
1. For all i = 1, . . . , k
Pk
Ji,i + j=1 Ji,j
ni :=
i
is integer.
2. For all i = 1, . . . , k  
ni
Ji,i ≤ .
2
3. For all i = 1, . . . , k and j = 1, . . . , k, i 6= j,

J ij ≤ n i n j .

Let dj (v) denote the number of edges such that one end-vertex is v and
the other end-vertex belongs to Vj , i.e., dj (v) is the degree of v in Vj . The
vector consisting of the dj (v)-s for all j is called the degree spectrum of vertex
v. We introduce the notation

0, if ni = 0 ,
Θi,j = Ji,j
ni , otherwise,

which gives the average number of neighbors of a degree-i vertex in vertex


class Vj . Then a realization of the JDM is balanced iff for every i and all
v ∈ Vi and all j, we have

|dj (v) − Θi,j | < 1 .

The following theorem is proven in paper [50] as Corollary 5:


Theorem 87. Every graphical JDM admits a balanced realization.
A restricted swap operation (RSO) takes two edges (x, y) and (u, v) with
x and u from the same vertex class and swaps them with two non-edges (x, v)
and (u, y). The RSO preserves the JDM, and in fact forms an irreducible
Markov chain over all its realizations [50]. An RSO Markov chain restricted
to balanced realizations can be defined as follows:
336 Computational complexity of counting and sampling

Definition 63. Let J be a JDM. The state space of the RSO Markov chain
consists of all the balanced realizations of J. It was proved by Czabarka et
al. [50] that this state space is connected under restricted swap operations.
The transitions of the Markov chain are defined in the following way. With
probability 1/2, the chain does nothing, so it remains in the current state (we
consider a lazy Markov chain). With probability 1/2 the chain will choose
four, pairwise disjoint vertices, v1 , v2 , v3 , v4 from the current realization (the
possible choices are order dependent) and check whether v1 and v2 are chosen
from the same vertex class, and furthermore whether the

E \ {(v1 , v3 ), (v2 , v4 )} ∪ {(v1 , v4 ), (v2 , v3 )}

swap operation is feasible. If this is the case then our Markov chain performs
the swap operation if it leads to another balanced JDM realization. Otherwise
the Markov chain remains in the same state. (Note that exactly two different
orders of the selected vertices will provide the same swap operation, since the
roles of v1 and v2 are symmetric.) Then there is a transition with probability
1
n(n − 1)(n − 2)(n − 3)
between two realizations iff there is a RSO transforming one into the other.
Here we prove that such a Markov chain is rapidly mixing. The convergence
of a Markov chain is measured as a function of the input data size. Here we
note that the size of the data is the number of vertices (or number of edges,
they are polynomially bounded functions of each other) and not the number of
digits to describe the JDM. This distinction is important as, for example, one
can create a 2×2 JDM with values J2,2 = J3,3 = 0 and J2,3 = J3,2 = 6n, which
has Ω(n) number of vertices (or edges) but it needs only O(log(n)) number
of digits to describe (except in the unary number system). Alternatively, one
might consider the input is given in unary.
Formally, we state the rapid mixing property via the following theorem:
Theorem 88. The RSO Markov chain on balanced JDM realizations is a
rapidly mixing Markov chain, namely, for the second-largest eigenvalue λ2 of
this chain, it holds that
1
= O(poly(n))
1 − λ2
where n is the number of vertices in the realizations of the JDM.
Note that the expression on the LHS is called, with some abuse of notation,
the relaxation time: it is the time is needed for the Markov chain to reach
its stationary distribution. The proof is based on the special structure of the
state space of the balanced JDM realizations. This special structure allows the
following proof strategy: if we can prove that some auxiliary Markov chains
are rapidly mixing on some sub-spaces obtained from decomposing the above-
mentioned specially structured state space, then the Markov chain on the
Approximable counting and sampling problems 337

whole space is also rapidly mixing. We are going to prove the rapid mixing of
these auxiliary Markov chains, as well as give the proof of the general theorem,
that a Markov chain on this special structure is rapidly mixing, hence proving
our main Theorem 88.
In order to describe the structure of the space of balanced JDM realiza-
tions, we first define the almost semi-regular bipartite and almost regular
graphs.

Definition 64. A bipartite graph G(U, V ; E) is almost semi-regular if for any


u1 , u2 ∈ U and v1 , v2 ∈ V

|d(u1 ) − d(u2 )| ≤ 1

and
|d(v1 ) − d(v2 )| ≤ 1.
Definition 65. A graph G(V, E) is almost regular, if for any v1 , v2 ∈ V

|d(v1 ) − d(v2 )| ≤ 1.
It is easy to see that the restriction of any graphical realization of the JDM
to vertex classes Vi , Vj , i 6= j can be considered as the coexistence of two
almost regular graphs (one on Vi and the other on Vj ), and one almost semi-
regular bipartite graph on the vertex class pair Vi , Vj . More generally, the
collection of these almost semi-regular bipartite graphs and almost regular
graphs completely determines the balanced JDM realization. Formally:
Definition 66 (Labeled union). Any balanced JDM realization can be rep-
resented as a set of almost semi-regular bipartite graphs and almost regular
graphs. The realization can then be constructed from these factor graphs as
their labeled union: the vertices with the same labels are collapsed, and the
edge set of the union is the union of the edge sets of the factor graphs.
It is useful to construct the following auxiliary graphs. For each vertex
class Vi , we create an auxiliary bipartite graph, Gi (Vi , U ; E), where U is a set
of “super-nodes” representing all vertex classes Vj , including Vi . There is an
edge between v ∈ Vi and super-node uj representing vertex class Vj iff

dj (v) = dΘi,j e ,

i.e., iff node v carries the ceiling of the average degree of its class i toward
the other class j. (For sake of uniformity, we construct these auxiliary graphs
for all i = 1, . . . , k, even if some of them have no edge at all. Similarly, all
super-nodes are given, even if some of them have no incident edge.) We claim
that these k auxiliary graphs are half-regular, i.e., each vertex in Vi has the
same degree (the degrees in the vertex class U might be arbitrary). Indeed, the
vertices in Vi all have the same degree in the JDM realization, therefore, the
338 Computational complexity of counting and sampling

number of times they have the ceiling of the average degree toward a vertex
class is constant in a balanced realization.
Let Y denote the space of all balanced realizations of a JDM and just as
before, let k denote the number of vertex classes (some of them can be empty).
We will represent the elements of Y via a vector y whose k(k+1)/2 components
are the k almost regular graphs and the k(k − 1)/2 almost regular bipartite
graphs from their labeled union decomposition, as described in Definition
66 above. Given an element y ∈ Y (i.e., a balanced graphical realization
of the JDM) it has k associated auxiliary graphs Gi (Vi , U ; E), one for every
vertex class Vi (some of them can be empty graphs). We will consider this
collection of auxiliary graphs for a given y as a k-dimensional vector x, where
x = (G1 , . . . , Gk ).
For any given y we can determine the corresponding x (so no particular
y can correspond to two different xs), however, for a given x there can be
several y’s with that same x. We will denote by Yx the subset of Y containing
all the y’s with the same (given) x and by X the set of all possible induced
x vectors.SClearly, the x vectors can be used to define a disjoint partition on
Y: Y = Yx . For notational convenience we will consider the space Y as
x∈X
pairs (x, y), indicating the x-partition to which y belongs. This should not be
confused with the notation for an edge, however, this should be evident from
the context. A restricted swap operation might fix x, in which case it will
make a move only within Yx , but if it does not fix x, then it will change both
x and y. For any x, the RSOs moving only within Yx form a Markov chain.
On the other hand, tracing only the x’s from the pairs (x, y) is not a Markov
chain: the probability that an RSO changes x (and thus also y) depends also
on the current y not only on x. However, the following theorem holds:
Theorem 89. Let (x1 , y1 ) be a balanced realization of a JDM in the above
mentioned representation.
i Assume that (x2 , y2 ) balanced realization is derived from the first one
with one restricted swap operation. Then, either x1 = x2 or they differ
in exactly one coordinate, and the two corresponding auxiliary graphs
differ only in one swap operation.
ii Let x2 be a vector differing only in one coordinate from x1 , and further-
more, only in one swap within this coordinate, namely, one swap within
one coordinate is sufficient to transform x1 into x2 . Then there exists at
least one y2 such that (x2 , y2 ) is a balanced JDM realization and (x1 , y1 )
can be transformed into (x2 , y2 ) with a single RSO.
Proof. (i) This is just the reformulation of the definitions for the (x, y) pairs.
(ii) (See also Fig. 8.1) By definition there is a degree i, 1 ≤ i ≤ k such that
auxiliary graphs x1 (Gi ) and x2 (Gi ) are different and one swap operation trans-
forms the first one into the second one. More precisely there are vertices
v1 , v2 ∈ Vi such that the swap transforming x1 (Gi ) into x2 (Gi ) removes edges
Approximable counting and sampling problems 339

(v1 , Uj ) and (v2 , Uk ) (with j 6= k) and adds edges (v1 , Uk ) and (v2 , Uj ). (The
capital letters show that the second vertices are super-vertices.) Since the
edge (v1 , Uj ) exists in the graph x1 (G1 ) and (v2 , Uj ) does not belong to graph
x1 (Gi ), therefore dj (v1 ) > dj (v2 ) in the realization (x1 , y1 ). This means that
there is at least one vertex w ∈ Vj such that w is connected to v1 but not
to v2 in the realization (x1 , y1 ). Similarly, there is at least one vertex r ∈ Vk
such that r is connected to v2 but not to v1 (again, in realization (x1 , y1 )).
Therefore, we have a required RSO on nodes v1 , v2 , w, r.

Vk r

Uk
w Uj
Vj

Vi v1 v2 Vi v1 v2

FIGURE 8.1: Construction of the auxiliary bipartite graph Gi and a RSO


{(v1 , w), (v2 , r)} 7→ {(v1 , r), (v2 , w)} taking (x1 , y1 ) into (x2 , y2 ).

Thus any RSO on a balanced realization yielding another balanced real-


ization either does not change x or changes x exactly on one coordinate (one
auxiliary graph), and this change can be described with a swap, taking one
auxiliary graph into the other.
We are going to apply Theorem 73 to prove that the RSO Markov chain is
rapidly mixing on the balanced JDM realizations. We partition its state space
according to the vectors x of the auxiliary graph collections (see Definition 66
and its explanations). The following result will be used to prove that all derived
(marginal) Markov chains Mx are rapidly mixing. Next, we announce two
theorems that are direct extensions of statements for fast mixing swap Markov
chains for regular degree sequences (Cooper, Dyer and Greenhill [45]) and for
half-regular bipartite degree sequences (Erdős, Kiss, Miklós and Soukup [62]).
Theorem 90. The swap Markov chain on the realizations of almost regular
degree sequences is rapidly mixing.

Theorem 91. The swap Markov chain on the realizations of almost half-
regular bipartite degree sequences is rapidly mixing.
We are now ready to prove the main theorem.
340 Computational complexity of counting and sampling

Proof. (Theorem 88) We show that the RSO Markov chain on balanced real-
izations fulfills the conditions in Theorem 73. First we show that condition (i)
of Theorem 73 holds. When restricted to the partition Yx (that is with x fixed),
the RSO Markov chain over the balanced realizations walks on the union of
almost semi-regular and almost regular graphs. By restriction here we mean
that all probabilities which would (in the original chain) leave Yx are put onto
the shelf-loop probabilities. Since an RSO changes only one coordinate at a
time, independently of other coordinates, all the conditions in Theorem 80 are
fulfilled. Thus the relaxation time of the RSO Markov chain restricted onto
Yx is bounded from above by the relaxation time of the chain restricted onto
that coordinate (either an almost semi-regular bipartite or an almost regular
graph) on which this restricted chain is the slowest (the smallest gap). How-
ever, based on Theorems 90 and 91, all these restrictions are fast mixing, and
thus by Theorem 80 the polynomial bound in (i) holds. (Here K = k(k+1) 2 ,
see Definition 66 and note that an almost semi-regular bipartite graph is also
an almost half-regular bipartite graph.)
Next we show that condition (ii) of Theorem 73 also holds. The first coor-
dinate is the union of auxiliary bipartite graphs, all of which are half-regular.
The $M'$ Markov chain corresponding to Theorem 73 is the swap Markov chain
on these auxiliary graphs. Here each possible swap has probability
$$\frac{1}{n(n-1)(n-2)(n-3)},$$
and by Theorem 89 it is guaranteed that the condition in Equation (7.53) is
fulfilled. Since, again, all conditions of Theorem 80 are fulfilled (mixing is fast
within any coordinate due to Theorems 90 and 91), the $M'$ Markov chain is
also fast mixing. Since all conditions in Theorem 73 hold, the RSO swap
Markov chain on balanced realizations is also rapidly mixing.

8.2.5 Counting the most parsimonious DCJ scenarios


Here we present a counting and sampling problem from bioinformatics,
introduced by Miklós and Tannier [134]. First we give a mathematical
model of genomes and the operations changing the genomes.
Definition 67. A genome is a directed, edge-labeled graph, in which each
vertex has total degree (in-degree plus out-degree) 1 or 2, and each label is
unique. Each edge is called a marker. The beginning of an edge is called its
tail and the end of an edge is called its head; the joint name of heads and tails
is extremities. Vertices with total degree 2 are called adjacencies, and vertices
with total degree 1 are called telomeres.
It is easy to see that any genome is a set of disjoint paths and cycles, where
neither the paths nor the cycles are necessarily directed. The components of

FIGURE 8.2: An example of two genomes with 7 markers.

the genome are called chromosomes. An example of a genome is drawn in


Figure 8.2. All adjacencies correspond to an unordered set of two marker
extremities. All telomeres correspond to one marker extremity. For example,
the pair (h1, t4) describes the vertex of genome 2 in Figure 8.2 in which the head
of marker 1 and the tail of marker 4 meet, and similarly, (h7) is the telomere
where marker 7 ends. A genome is fully described by a list of such descriptions
of adjacencies and telomeres. Two genomes with the same edge label set are
co-tailed if they have the same telomeres. This is the case for the two genomes
of Figure 8.2.
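Since a genome is fully determined by its list of adjacencies and telomeres, this description is straightforward to encode. The following minimal Python sketch is our own illustration (the encoding is not taken from [134]): an extremity is a (marker, end) pair, and a genome is a set of vertices, each vertex being the set of extremities it joins. Only the two vertices of genome 2 quoted above are shown.

```python
# A minimal sketch of the adjacency/telomere description (an illustrative
# encoding, not from the original paper): an extremity is a (marker, end)
# pair, and a genome is a set of frozensets of extremities.

def adjacency(x, y):
    """An adjacency is an unordered pair of extremities."""
    return frozenset([x, y])

def telomere(x):
    """A telomere consists of a single extremity."""
    return frozenset([x])

# The two vertices of genome 2 of Figure 8.2 mentioned in the text:
v1 = adjacency((1, 'h'), (4, 't'))   # (h1, t4): head of marker 1 meets tail of marker 4
v2 = telomere((7, 'h'))              # (h7): marker 7 ends in a telomere
genome2_fragment = {v1, v2}          # a genome is the set of all such vertices
```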
Definition 68. A DCJ or double cut and join operation transforms one
genome into another by modifying the adjacencies and telomeres in one of
the following 4 ways:
• Take two adjacencies (a, b) and (c, d) and create two new adjacencies
(a, c) and (b, d). The adjacency descriptors are not ordered: namely, the
two new adjacencies might instead be (a, d) and (b, c).

• Take an adjacency (a, b) and a telomere (c), and create a new adjacency
and a new telomere from the 3 extremities: either (a, c) and (b) or (b, c)
and (a).
• Take two telomeres (a) and (b), and create a new adjacency (a, b).
• Take an adjacency (a, b) and create two new telomeres (a) and (b).
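All four cases of Definition 68 only regroup the extremities found in the cut vertices, which allows a compact uniform implementation. Below is a hedged Python sketch continuing the illustrative encoding above; the helper `dcj` is our own, not from the cited papers.

```python
# A sketch of a DCJ operation: it removes one or two vertices (adjacencies
# or telomeres) and regroups exactly the same extremities into new vertices.
# In all four cases of Definition 68, the set of cut extremities is preserved.

def dcj(genome, old_vertices, new_vertices):
    old_ext = set().union(*old_vertices)
    new_ext = set().union(*new_vertices)
    assert old_ext == new_ext, "a DCJ may only regroup the cut extremities"
    return (genome - set(old_vertices)) | set(new_vertices)

# Case 1: adjacencies (a, b) and (c, d) become (a, c) and (b, d).
a, b, c, d = (1, 'h'), (2, 't'), (3, 'h'), (4, 't')
G = {frozenset([a, b]), frozenset([c, d])}
G = dcj(G, [frozenset([a, b]), frozenset([c, d])],
           [frozenset([a, c]), frozenset([b, d])])
```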

Given two genomes G1 and G2 with the same label set, it is always possible
to transform one into the other by a sequence of DCJ operations [185]. Such
a sequence is called a DCJ scenario for G1 and G2 . The minimum length of a
scenario is called the DCJ distance and is denoted by dDCJ (G1 , G2 ).
Definition 69. The Most Parsimonious DCJ (MPDCJ) scenario problem
for two genomes G1 and G2 is to compute dDCJ (G1 , G2 ). The #MPDCJ
problem asks for the number of scenarios of length dDCJ (G1 , G2 ), denoted
by #MPDCJ(G1 , G2 ).
For example, the DCJ distance between the two genomes of Figure 8.2 is
three and there are nine different most parsimonious scenarios.
MPDCJ is an optimization problem, which has a natural corresponding
decision problem asking if there is a scenario with a given number of DCJ op-
erations. So we may write that #MPDCJ ∈ #P, which means that #MPDCJ
asks for the number of witnesses of the decision problem: “Is there a scenario
for G1 and G2 of size dDCJ (G1 , G2 )?”
Before turning to approximating the number of solutions, first we give an
overview of how to find one solution. Here, the following combinatorial object
plays a central role.
Definition 70. The adjacency graph G(V1 ∪ V2 , E) of two genomes G1 and
G2 with the same edge label set is a bipartite multigraph with V1 being the
set of adjacencies and telomeres of G1 , V2 being the set of adjacencies and
telomeres of G2 . The number of edges between u ∈ V1 and v ∈ V2 is the
number of extremities they share.
Observe that the adjacency graph is a bipartite multigraph which falls into
disjoint cycles and paths. The paths might belong to one of three types:
1. an odd path, containing an odd number of edges and an even number
of vertices;
2. an even path with two endpoints in V1 ; we will call them W -shaped
paths; or
3. an even path with two endpoints in V2 ; we will call them M -shaped
paths.
In addition, cycles with two edges and paths with one edge are called trivial
components. We can use the adjacency graph to obtain the DCJ distance
between two genomes.
Theorem 92. [185, 17]
$$d_{DCJ}(G_1, G_2) = N - \left( C + \frac{I}{2} \right) \qquad (8.38)$$
where N is the number of markers, C is the number of cycles in the adjacency
graph of G1 and G2, and I is the number of odd paths in the adjacency graph
of G1 and G2.

Since calculating C and I is easy, MPDCJ is clearly in P and has a linear
running time algorithm.
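For illustration, here is a sketch of this computation in Python, assuming the component types of the adjacency graph have already been extracted (the classification step itself is omitted):

```python
# A sketch of Theorem 92: the DCJ distance from the component statistics
# of the adjacency graph (the component classification is assumed given).

def dcj_distance(n_markers, components):
    """components: iterable of labels in {'cycle', 'odd', 'W', 'M'}."""
    C = sum(1 for t in components if t == 'cycle')
    I = sum(1 for t in components if t == 'odd')   # I is always even
    return n_markers - (C + I // 2)

# For the genomes of Figure 8.2 (7 markers, distance 3), any correct
# component classification must therefore satisfy C + I/2 = 4.
```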
A DCJ operation on a genome G1 which decreases the DCJ distance to a
genome G2 is called a sorting DCJ for G1 and G2 . It is possible to characterize
the effect of a sorting DCJ on the adjacency graph of genomes G1 and G2 . It
acts on the vertex set V1 and has one of the following effects [24]:
• splitting a cycle into two cycles,
• splitting an odd path into a cycle and an odd path,
• splitting an M -shaped path into a cycle and an M -shaped path,
• splitting an M -shaped path into two odd paths,
• splitting a W -shaped path into a cycle and a W -shaped path,
• merging the two ends of a W -shaped path, thus transforming it into a
cycle, or
• combining an M -shaped and a W -shaped path into two odd paths.
Note that trivial components are never affected by these operations, and
all but the last type of DCJ operations act on a single component of the
adjacency graph. The last type of DCJ acts on two components, which are
M - and W -shaped paths.
In this context, sorting a component K means applying a sequence of sort-
ing DCJ operations to vertices of K (in V1 ) so that the resulting adjacency
graph has only trivial components involving the extremities of K. In a min-
imum length DCJ scenario, every component is sorted independently, except
M - and W -shaped paths, which can be sorted together. If in a DCJ scenario
one operation acts on both an M - and a W -shaped path, we say that they
are sorted jointly; otherwise we say that they are sorted independently.
It is conjectured that #MPDCJ is #P-complete, although the problem
is solvable in polynomial time in the special case when the genomes are co-tailed
[24, 139], or more generally in the absence of M- and W-shaped paths. So the
hard part is dealing with M - and W -shaped paths. We show here that for the
general case, we may restrict ourselves to this hard part, and suppose that
there are only M - and W -shaped paths in the adjacency graph.
Given two genomes G1 and G2 with the same label set, let AG be the adjacency graph of G1 and G2. Denote by G1* the genome that we get by sorting
all cycles and odd paths in the adjacency graph of G1 and G2. By definition,
the adjacency graph between G1 and G1* has no M- and W-shaped paths,
while the adjacency graph between G1* and G2 has only trivial components
and M- and W-shaped paths. Furthermore, it is easy to see that
$$d_{DCJ}(G_1, G_2) = d_{DCJ}(G_1, G_1^*) + d_{DCJ}(G_1^*, G_2). \qquad (8.39)$$
We would like to consider the following subproblem.

Definition 71. The #MPDCJ_MW problem asks for the number of DCJ scenarios between two genomes when their adjacency graph contains only trivial
components and M- and W-shaped paths.
The correspondence between solutions for #MPDCJ_MW and #MPDCJ is
stated by the following lemma.
Lemma 22. It holds that
$$\#MPDCJ(G_1,G_2) = \frac{d_{DCJ}(G_1,G_2)!}{d_{DCJ}(G_1^*,G_2)!\,\prod_i (c_i-1)!\,\prod_j (l_j-1)!} \times \prod_i c_i^{c_i-2} \prod_j l_j^{l_j-2} \times \#MPDCJ_{MW}(G_1^*,G_2) \qquad (8.40)$$
where i indexes the cycles of the adjacency graph of G1 and G2, ci denotes
the number of vertices in vertex set V1 belonging to the ith cycle, j indexes the
odd paths of the adjacency graph, and lj is the number of vertices in vertex
set V1 belonging to the jth odd path.

Proof. As M- and W-shaped paths and other components are always treated
independently, we have
$$\#MPDCJ(G_1,G_2) = \binom{d_{DCJ}(G_1,G_2)}{d_{DCJ}(G_1^*,G_2)} \times \#MPDCJ(G_1,G_1^*) \times \#MPDCJ_{MW}(G_1^*,G_2).$$
For the genomes G1 and G1*, whose adjacency graph does not contain M-
and W-shaped paths, we have from [24] and [139] that
$$\#MPDCJ(G_1,G_1^*) = \prod_i c_i^{c_i-2} \prod_j l_j^{l_j-2} \times \frac{d_{DCJ}(G_1,G_1^*)!}{\prod_i (c_i-1)!\,\prod_j (l_j-1)!}.$$

These two equations together with Equation (8.39) give the result.
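The lemma turns any exact or approximate value of #MPDCJ_MW into one for #MPDCJ via a single, exactly computable factor. Below is a Python sketch of that factor; the function name, the input conventions, and the use of exact rationals are our own choices for illustration.

```python
# A sketch of the correction factor in Equation (8.40). Component sizes are
# assumed to be > 1 (trivial components contribute a factor of 1).

from fractions import Fraction
from math import factorial

def correction_factor(d_total, d_mw, cycle_sizes, odd_path_sizes):
    f = Fraction(factorial(d_total), factorial(d_mw))
    for c in cycle_sizes:
        f *= Fraction(c ** (c - 2), factorial(c - 1))
    for l in odd_path_sizes:
        f *= Fraction(l ** (l - 2), factorial(l - 1))
    return f    # an integer-valued rational

# #MPDCJ(G1, G2) = correction_factor(...) * #MPDCJ_MW(G1*, G2)
```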
The following theorem says that the hardness of the #MPDCJ problem is
the same as that of the #MPDCJ_MW problem.
Theorem 93.
$$\#MPDCJ_{MW} \in FP \iff \#MPDCJ \in FP \qquad (8.41)$$
$$\#MPDCJ_{MW} \text{ is } \#P\text{-complete} \iff \#MPDCJ \text{ is } \#P\text{-complete} \qquad (8.42)$$
$$\#MPDCJ_{MW} \in FPRAS \iff \#MPDCJ \in FPRAS \qquad (8.43)$$
$$\#MPDCJ_{MW} \in FPAUS \iff \#MPDCJ \in FPAUS \qquad (8.44)$$

Proof. Both the multinomial factor and the two products in Equation (8.40)
can be calculated in polynomial time. Thus the transformation between the
solutions to the two different counting problems is a single multiplication or
division by an exactly calculated number. This proves that #MPDCJ_MW

is in FP if and only if #MPDCJ is in FP, and that #MPDCJ_MW is
#P-complete if and only if #MPDCJ is #P-complete.
Such a multiplication or division preserves the relative error when the solution of one of the problems is approximated. This proves that #MPDCJ_MW
is in FPRAS if and only if #MPDCJ is in FPRAS.
Concerning the last equivalence, the ⇐ part is trivial because
#MPDCJ_MW is a particular case of #MPDCJ. Now we prove that
#MPDCJ_MW ∈ FPAUS ⇒ #MPDCJ ∈ FPAUS. Suppose an FPAUS exists for #MPDCJ_MW, and let G1 and G2 be two arbitrary genomes. The
following algorithm gives an FPAUS for #MPDCJ.
• Draw a DCJ scenario between G1* and G2 following a distribution p
satisfying
$$d_{TV}(p, U) \leq \varepsilon$$
where U is the uniform distribution over all possible most parsimonious
DCJ scenarios between G1* and G2.
• Generate a DCJ scenario between G1 and G1*, following the uniform distribution. This scenario can be sampled exactly uniformly in polynomial
time: (1) there are only cycles and odd paths in the adjacency graph of
G1 and G1*, so the number of scenarios can be calculated in polynomial
time; (2) there is a polynomial number of sorting DCJ steps on each
component, and a sorting DCJ operation results in an adjacency graph
that also only has cycles and odd paths.
• Draw a sequence of 0s and 1s, containing $d_{DCJ}(G_1^*, G_2)$ 1s and
$d_{DCJ}(G_1, G_1^*)$ 0s, uniformly from all $\binom{d_{DCJ}(G_1,G_2)}{d_{DCJ}(G_1^*,G_2)}$ such sequences.

• Merge the two paths constructed in the first two steps, according to the
drawn sequence of 0s and 1s.
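The last two steps amount to a uniform interleaving of two operation sequences; since the two scenarios act on disjoint components, every interleaving is a valid scenario. A Python sketch of this step (our own illustration):

```python
# A sketch of the merging step: interleave the two scenarios according to
# a uniformly chosen 0/1 sequence with the right numbers of 0s and 1s.

import random

def merge_scenarios(scenario_star, scenario_mw):
    slots = [0] * len(scenario_star) + [1] * len(scenario_mw)
    random.shuffle(slots)                  # uniform over all interleavings
    it0, it1 = iter(scenario_star), iter(scenario_mw)
    return [next(it1) if b else next(it0) for b in slots]
```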
Note that the DCJ scenario obtained transforms G1 into G2. Let us denote
the distribution of paths generated by this algorithm by p′, and the uniform
distribution over all possible DCJ scenarios between G1 and G2 by U′. Let $X_s$
denote the set of all possible scenarios drawn by the above algorithm using a
specific scenario s between G1* and G2. Then
$$\sum_{s' \in X_s} |p'(s') - U'(s')| = |p(s) - U(s)|. \qquad (8.45)$$
Using Equation (8.45), we get that
$$d_{TV}(p', U') = \frac{1}{2}\sum_{s}\sum_{s' \in X_s} |p'(s') - U'(s')| = \frac{1}{2}\sum_{s} |p(s) - U(s)| = d_{TV}(p, U). \qquad (8.46)$$
This proves that the above algorithm is an FPAUS for #MPDCJ, proving the
left-to-right direction in Equation (8.44).

We will show that #MPDCJ_MW is in FPAUS, and thus #MPDCJ is in
FPAUS. As MPDCJ is a self-reducible problem, the FPAUS implies the existence of an FPRAS. The FPAUS algorithm for #MPDCJ_MW will be defined
via a rapidly mixing Markov chain. First, we have to recall and prove some
properties of the number of independent and joint sortings of M- and W-shaped paths.
Our goal is to show that the number of DCJ scenarios in which an M - and
a W -shaped path are sorted independently is a significant fraction of the total
number of scenarios sorting these M - and W -shaped paths (independently or
jointly). We build on the following results by [24].
Theorem 94. [24]
• The number of minimum-length DCJ scenarios sorting a cycle with k > 1 vertices in G1 is $k^{k-2}$.
• The number of minimum-length DCJ scenarios sorting an odd path with k > 1 vertices in G1 is $k^{k-2}$.
• The number of minimum-length DCJ scenarios sorting a W-shaped path with k > 1 vertices in G1 is $k^{k-2}$.
• The number of minimum-length DCJ scenarios sorting an M-shaped path with k > 0 vertices in G1 is $(k+1)^{k-1}$.
Theorem 95. The number of DCJ scenarios that independently sort a W-
and an M-shaped path is $\binom{k_1+k_2-1}{k_1-1} k_1^{k_1-2} (k_2+1)^{k_2-1}$, where k1 and k2 are the
number of vertices of G1 in the W- and M-shaped paths, respectively.

Proof. It is a consequence of the previous theorem. The W-shaped path is
sorted in k1 − 1 operations, and the M-shaped path is sorted in k2 operations.
Thus there are $\binom{k_1+k_2-1}{k_1-1}$ ways to merge two scenarios.
ways to merge two scenarios.
Theorem 96. The number of DCJ scenarios that jointly sort a W- and an
M-shaped path is less than $2(k_1+k_2)^{k_1+k_2-2}$, where k1 and k2 are the number
of vertices of G1 in the W- and M-shaped paths, respectively.
Proof. Let $t_1^W$ and $t_2^W$ be the two telomeres of the W-shaped path, and $t_1^M$ and
$t_2^M$ be the two telomeres of the M-shaped path. Let $G_1'$ and $G_2'$ be constructed
from genomes G1 and G2 by adding a gene g, with extremities $g^h$ and $g^t$,
and replacing the telomeres of G1 by adjacencies $(t_1^W, g^t), (t_2^W, g^h)$ and the
telomeres of G2 by adjacencies $(t_1^M, g^t), (t_2^M, g^h)$. In addition, let $G_1''$ and $G_2''$
be constructed from G1 and G2 also by adding a gene g, and replacing the
telomeres of G1 by adjacencies $(t_1^W, g^t), (t_2^W, g^h)$ and the telomeres of G2 by
adjacencies $(t_1^M, g^h), (t_2^M, g^t)$. In both cases the M- and W-shaped paths in G1
and G2 are transformed into a cycle with k1 + k2 adjacencies in both genomes.
Call the cycles C′ for the first case, and C′′ for the second.
We prove that any scenario that jointly sorts the W- and M-shaped paths
We prove that any scenario that jointly sorts the W - and M -shaped paths

has a corresponding distinct scenario either sorting the cycle C 0 or sorting


the cycle C 00 . This proves the theorem, because there are (k1 + k2 )k1 +k2 −2
scenarios sorting each cycle.
A scenario jointly sorting the M- and W-shaped paths can be cut into
two parts: the first contains DCJ operations which act only on the M- or
only on the W-shaped path; the second part starts with a DCJ operation
transforming an M- and a W-shaped path into two odd paths, and continues
with operations independently sorting the two odd paths.
In the first part, either a DCJ operation acts on two adjacencies of the
M- or W-shaped path, and the corresponding operation acts on the same two
adjacencies on C′ or C′′, or it acts on an adjacency and a telomere of the W-shaped path, and the corresponding operation acts on two adjacencies of C′
or C′′, one of them containing an extremity of g. So there is a correspondence
between being a telomere in the W-shaped path and being adjacent to an
extremity of g in C′ or C′′.
Now the corresponding operation of the DCJ transforming the two paths
into two odd paths has to create two cycles from C′ or C′′. Choose C′ or C′′
so that this is the case. Now sorting an odd path exactly corresponds to sorting
a cycle, by replacing a telomere in the path with an adjacency containing the
extremity of the telomere and an extremity of g in the cycle.
So two different scenarios jointly sorting the M- and W-shaped paths
correspond to two different scenarios sorting either C′ or C′′. Then the number
of scenarios jointly sorting the M- and W-shaped paths is less than $2(k_1+k_2)^{k_1+k_2-2}$.

Theorem 97. Let T(k1, k2) denote the number of DCJ scenarios jointly sorting a W- and an M-shaped path with, respectively, k1 and k2 vertices in G1. Let
I(k1, k2) denote the number of scenarios independently sorting the same paths.
We have that
$$\frac{T(k_1,k_2)}{I(k_1,k_2)} = O\left(\frac{k_1^{1.5}\, k_2^{1.5}}{(k_1+k_2)^{1.5}}\right) \qquad (8.47)$$
$$\frac{I(k_1,k_2)}{T(k_1,k_2)} = O(k_1+k_2). \qquad (8.48)$$

Proof. To prove Equation (8.47), it is sufficient to show that
$$\frac{2(k_1+k_2)^{k_1+k_2-2}}{\binom{k_1+k_2-1}{k_1-1}\, k_1^{k_1-2}\, (k_2+1)^{k_2-1}} = O\left(\frac{k_1^{1.5}\, k_2^{1.5}}{(k_1+k_2)^{1.5}}\right). \qquad (8.49)$$
Using Stirling's formula, we get on the left-hand side of Equation (8.49)
$$\frac{2\sqrt{2\pi(k_1-1)}\left(\frac{k_1-1}{e}\right)^{k_1-1}\sqrt{2\pi k_2}\left(\frac{k_2}{e}\right)^{k_2}(k_1+k_2)^{k_1+k_2-2}}{\sqrt{2\pi(k_1+k_2-1)}\left(\frac{k_1+k_2-1}{e}\right)^{k_1+k_2-1} k_1^{k_1-2}(k_2+1)^{k_2-1}}. \qquad (8.50)$$

After simplifications and algebraic rearrangement, we get
$$2\sqrt{\frac{2\pi(k_1-1)k_2}{k_1+k_2-1}}\left(\frac{k_1+k_2}{k_1+k_2-1}\right)^{k_1+k_2-1}\left(\frac{k_1-1}{k_1}\right)^{k_1-1}\left(\frac{k_2}{k_2+1}\right)^{k_2}\frac{k_1(k_2+1)}{k_1+k_2}, \qquad (8.51)$$
from which Equation (8.49) follows by applying that $(1+1/n)^n$ tends to e and
$(1-1/n)^n$ tends to 1/e.
To prove Equation (8.48), consider the subset of DCJ scenarios jointly
sorting the W- and M-shaped paths, and starting with a DCJ operation which
acts on a telomere of the W-shaped path and on an adjacency which is linked
with a telomere of the M-shaped path. The result is two odd paths with,
respectively, k1 and k2 adjacencies and telomeres in G1. They can be sorted
in, respectively, k1 − 1 and k2 − 1 steps, in $k_1^{k_1-2}$ and $k_2^{k_2-2}$ different ways.
Since we can combine any two particular solutions in $\binom{k_1+k_2-2}{k_1-1}$ ways, $\frac{I(k_1,k_2)}{T(k_1,k_2)}$
is bounded by
$$\frac{\binom{k_1+k_2-1}{k_1-1}\, k_1^{k_1-2}\, (k_2+1)^{k_2-1}}{\binom{k_1+k_2-2}{k_1-1}\, k_1^{k_1-2}\, k_2^{k_2-2}}. \qquad (8.52)$$
After minor algebraic simplification, this expression is equal to
$$\frac{k_1+k_2-1}{k_2}\left(1+\frac{1}{k_2}\right)^{k_2-1} k_2, \qquad (8.53)$$
which is clearly $O(k_1+k_2)$.
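As a quick numeric illustration (not part of the proof), one can evaluate the left-hand side of Equation (8.49) using the exact counts from Theorems 95 and 96; for $k_1 = k_2 = k$ the ratio grows like $k^{1.5}$, in line with Equation (8.47). A Python sketch:

```python
# Numerically illustrating Equation (8.49): the upper bound on T(k1,k2)
# divided by I(k1,k2) behaves like k1^1.5 k2^1.5 / (k1+k2)^1.5.

from math import comb

def ratio(k1, k2):
    T_upper = 2 * (k1 + k2) ** (k1 + k2 - 2)
    I = comb(k1 + k2 - 1, k1 - 1) * k1 ** (k1 - 2) * (k2 + 1) ** (k2 - 1)
    return T_upper / I

for k in (5, 10, 20, 40):
    print(k, ratio(k, k) / k ** 1.5)   # roughly constant in k, as predicted
```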


We are going to sample DCJ scenarios in an indirect way. To do this, we
define a Markov chain on (not necessarily perfect) matchings of a complete
bipartite graph converging to a prescribed distribution. The construction and the proof
that the obtained Markov chain is rapidly mixing are very similar to the case
introduced in Subsection 8.2.2. We cannot directly apply the results in Subsection 8.2.2, since the weights of a matching are defined differently; furthermore,
the individual weights of edges might be exponential functions of the problem
size.
Assume that there are n W -shaped paths and m M -shaped paths, and
consider the complete bipartite graph Kn,m . Let M be a matching of Kn,m ,
which might range from the empty graph up to any maximum matching. A
DCJ scenario is said to be M-compatible when an M -shaped and a W -shaped
path are sorted jointly if and only if they are connected by an edge of M.
We denote by {Pi }i the set of degree 0 vertices in M, and by {Mi Wi }i
the set of edges in M. Let l(Pi ) be the minimum length of a DCJ scenario
independently sorting Pi , and l(Mi Wi ) be the minimum length of a DCJ
scenario jointly sorting Mi and Wi . We can calculate N (Mi , Wi ), the number
of joint sortings of Mi and Wi , in polynomial time [24]. Denote by N (Pi ) the

number of independent sortings of a path Pi . The number of M-compatible


scenarios is
 P P 
( i l(Mi Wi ) + i l(Pi ))!
f (M) = Πi N (Mi , Wi )Πi N (Pi ),
l(Mi , Wi )!, . . . , l(Pi )!
and we can compute it in polynomial time. Define a distribution θ over the
set of all matchings of the complete bipartite graph Kn,m as

θ(M) ∝ f (M). (8.54)
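Computing f(M) is elementary once the component scenario lengths and counts are known. A Python sketch with hypothetical input conventions (the pairing of lengths with counts is our own choice):

```python
# A sketch computing f(M): the multinomial coefficient over the component
# scenario lengths times the per-component scenario counts.

from math import factorial

def f_of_matching(jointly, independently):
    """jointly: list of (l(MiWi), N(Mi,Wi)) pairs;
    independently: list of (l(Pi), N(Pi)) pairs."""
    lengths = [l for l, _ in jointly] + [l for l, _ in independently]
    value = factorial(sum(lengths))
    for l in lengths:
        value //= factorial(l)            # exact at every step
    for _, count in jointly + independently:
        value *= count
    return value
```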

We first show that sampling DCJ scenarios from the uniform distribution
is equivalent to sampling matchings of Kn,m from the distribution θ.
Theorem 98. Let a distribution q over the scenarios of n W -shaped paths
and m M -shaped paths be defined by the following algorithm.
• Draw a random matching M of Kn,m following a distribution p.
• Draw a random M-compatible DCJ scenario from the uniform distribu-
tion of all M-compatible ones.
Then
$$d_{TV}(p, \theta) = d_{TV}(q, U) \qquad (8.55)$$
where θ is the distribution defined in Equation (8.54), and U denotes the
uniform distribution over all DCJ scenarios.
Proof. It holds that
$$d_{TV}(q, U) = \frac{1}{2} \sum_{x \text{ scenario}} |q(x) - U(x)|.$$
We may decompose this sum into
$$\frac{1}{2} \sum_{\mathcal{M} \text{ matching of } K_{n,m}} \; \sum_{x\ \mathcal{M}\text{-compatible scenario}} |q(x) - U(x)|.$$
Here $\sum_{x\ \mathcal{M}\text{-compatible scenario}} q(x)$ is $p(\mathcal{M})$, since x is drawn uniformly among the
scenarios compatible with $\mathcal{M}$, and $\sum_{x\ \mathcal{M}\text{-compatible scenario}} U(x)$ is $\theta(\mathcal{M})$.
Furthermore, both q(x) and U(x) are constant for a particular matching $\mathcal{M}$,
thus
$$\frac{1}{2} \sum_{\mathcal{M} \text{ matching of } K_{n,m}} \; \sum_{x\ \mathcal{M}\text{-compatible scenario}} |q(x) - U(x)| = \frac{1}{2} \sum_{\mathcal{M} \text{ matching of } K_{n,m}} |p(\mathcal{M}) - \theta(\mathcal{M})| = d_{TV}(p, \theta), \qquad (8.56)$$
yielding the result.



So we are going to define an MCMC on matchings of Kn,m converging to
θ. The rapid convergence of this MCMC will imply that #MPDCJ_MW admits
an FPAUS, and hence #MPDCJ ∈ FPAUS, and then #MPDCJ ∈ FPRAS.
The proposal Markov chain walks on the matchings of Kn,m and is defined by
the following steps: suppose the current state is a matching M, and
• with probability 1/2, the next state of the Markov chain is the current
state M;
• with probability 1/2, draw a random i ∼ U[1, n] and j ∼ U[1, m]; if
ij ∈ M, then remove ij from M; else if $\deg_M(i) = 0$ and $\deg_M(j) = 0$,
then add ij to M.
It is easy to see that this Markov chain is irreducible and aperiodic. We
apply the standard Metropolis-Hastings algorithm on this chain, namely, when
we are in state M, we propose the next state Mnew according to the proposal
Markov chain, and accept the proposal with probability
$$\min\left(1, \frac{f(\mathcal{M}_{new})}{f(\mathcal{M})}\right). \qquad (8.57)$$

The obtained Markov chain is reversible and converges to the distribution


θ defined in Equation (8.54). Also observe that the defined chain is a lazy
Markov chain, thus all its eigenvalues are positive real numbers.
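One step of the resulting chain can be sketched as follows; matchings are stored as sets of (i, j) pairs, f is the weight function above, and the names are our own illustrative choices.

```python
# A sketch of one Metropolis-Hastings step on the matchings of K_{n,m}.

import random

def mh_step(matching, n, m, f):
    if random.random() < 0.5:
        return matching                        # lazy half of the chain
    i, j = random.randrange(n), random.randrange(m)
    proposal = set(matching)
    if (i, j) in proposal:
        proposal.remove((i, j))                # propose deleting edge ij
    elif all(i != a and j != b for (a, b) in proposal):
        proposal.add((i, j))                   # propose adding ij if i, j are free
    else:
        return matching                        # proposal leaves the state unchanged
    if random.random() < min(1.0, f(proposal) / f(matching)):
        return proposal                        # accept with the MH probability
    return matching
```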
An important property of this Markov chain is the following.
Observation 3. The inverses of the non-zero transition probabilities are polynomially bounded.
Indeed, the transition probability from M to Mnew, if non-zero, is at least
$$\frac{1}{2nm} \cdot \frac{f(\mathcal{M}_{new})}{f(\mathcal{M})}.$$
M and Mnew differ in at most one edge MiWi, and on this edge, according to
Theorem 97, the ratio of the numbers of scenarios jointly and independently sorting
Mi and Wi is polynomial. Furthermore, the combinatorial factors appearing in
f(M) and f(Mnew) due to merging the sorting steps on different components
are the same. So $\frac{f(\mathcal{M}_{new})}{f(\mathcal{M})}$ as well as its inverse is polynomially bounded.
We now prove the rapid convergence of this Markov chain using a mul-
ticommodity flow technique. To prove that the Markov chain we defined on
bipartite matchings has a polynomial relaxation time, we need to construct a
path system Γ on the set of matchings of Kn,m , such that κΓ is bounded by
a polynomial in N , the number of markers in G1 and G2 .
In our case the path system between two matchings X and Y is a unique
path with probability 1. Here is how we construct it.
Fix a total order on the vertex set of Kn,m . Take the symmetric difference
of X and Y, denoted by X ∆Y. It is a set of disjoint paths and cycles. Define an

order on the components of X ∆Y, such that a component C is smaller than a


component D if the smallest vertex in C is smaller than the smallest vertex in
D. Now we orient each component in the following way: the beginning of each
path is its extremity with the smaller vertex. The starting vertex of a cycle is
its smallest vertex, and the direction is going toward its smaller neighbor.
We transform X to Y by visiting the components of X ∆Y in increasing
order. Let the current component be C, and let the current matching be Z (at
first Z = X ). If C is a path or cycle starting with an edge in X , then the
transformation steps are the following: delete the first edge of C from Z,
delete the third edge of C from Z, add the second edge of C to Z, delete the
5th edge of C from Z, add the 4th edge of C to Z, etc.
If C is a path or cycle starting with an edge in Y, then the transformation
steps are the following: delete the second edge of C from Z, add the first edge
of C to Z, delete the 4th edge of C from Z, add the third edge of C to Z, etc.
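The case analysis above can be condensed into a short routine. The following Python sketch is our own illustration of the construction just described (component extraction and ordering are omitted): it processes one oriented component C, given as its edge list, and yields the successive matchings along the canonical path.

```python
# A sketch of processing one component C of the symmetric difference;
# each yielded set is the matching after one step of the canonical path.

def transform_along_component(Z, C, first_edge_in_X):
    Z = set(Z)
    if first_edge_in_X:
        Z.discard(C[0]); yield set(Z)              # delete the 1st edge
        for k in range(2, len(C), 2):
            Z.discard(C[k]); Z.add(C[k - 1]); yield set(Z)
        if len(C) % 2 == 0:
            Z.add(C[-1]); yield set(Z)             # final Y-edge of a path
    else:
        for k in range(1, len(C), 2):
            Z.discard(C[k]); Z.add(C[k - 1]); yield set(Z)
        if len(C) % 2 == 1:
            Z.add(C[-1]); yield set(Z)             # final Y-edge of a path
```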
This path has length at most nm, and κΓ can be written as:
$$\kappa_\Gamma \leq nm \max_{e=(u,v)\in E} \sum_{(x,y)\in V\times V:\, e\in\Gamma_{x,y}} \frac{\theta(x)\theta(y)}{Q(e)}.$$
By Observation 3, the inverses of the transition probabilities are bounded by a
polynomial in N, so we get
$$\kappa_\Gamma \leq O(poly(N)) \max_{e=(u,v)\in E} \sum_{(x,y)\in V\times V:\, e\in\Gamma_{x,y}} \frac{\theta(x)\theta(y)}{\theta(u)}. \qquad (8.58)$$

We then have to show that $\sum \frac{\theta(x)\theta(y)}{\theta(u)}$ can be bounded by a polynomial in
N. Let $\mathcal{Z} \to \mathcal{Z}'$ be an edge on the path from $\mathcal{X}$ to $\mathcal{Y}$. We define
$$\widehat{\mathcal{M}} := \mathcal{X}\,\Delta\,\mathcal{Y}\,\Delta\,\mathcal{Z}. \qquad (8.59)$$

Lemma 23. The couple $\widehat{\mathcal{M}}$ and $\mathcal{Z} \to \mathcal{Z}'$ determines $\mathcal{X}$ and $\mathcal{Y}$.

Proof. It is obvious that
$$\widehat{\mathcal{M}}\,\Delta\,\mathcal{Z} = \mathcal{X}\,\Delta\,\mathcal{Y}, \qquad (8.60)$$
hence, $\mathcal{Z}$ and $\widehat{\mathcal{M}}$ determine the symmetric difference of $\mathcal{X}$ and $\mathcal{Y}$. From the
transition $\mathcal{Z} \to \mathcal{Z}'$, we can trace back which transition steps have already been
made in the following way. The order of the components of $\mathcal{X}\Delta\mathcal{Y}$ is
determined, and from the transition $\mathcal{Z} \to \mathcal{Z}'$ we know the current component.
We also know the beginning and the direction of the component, be it
either a path or a cycle; hence, we know which edges have been changed in
the component so far, and which ones not yet. From these, we can reconstruct
$\mathcal{X}$ and $\mathcal{Y}$.
Lemma 24. A matching can be obtained from $\widehat{\mathcal{M}}$ by deleting at most two
edges.

Proof. On each component in $\mathcal{X}\Delta\mathcal{Y}$, we delete at most two edges before
putting back one. Hence $\widehat{\mathcal{M}}$ contains at most either 4 consecutive edges along
a path or 2 pairs of edges, and all remaining edges are independent. Therefore
it is sufficient to delete at most two edges from $\widehat{\mathcal{M}}$ to get a matching.

Denote this matching by $\widetilde{\mathcal{M}}$.

Lemma 25. It holds that
$$\frac{\theta(\mathcal{X})\theta(\mathcal{Y})}{\theta(\mathcal{Z})} = O(poly(N))\,\theta(\widetilde{\mathcal{M}}). \qquad (8.61)$$

Proof. We prove that
$$\frac{f(\mathcal{X})f(\mathcal{Y})}{f(\mathcal{Z})f(\widetilde{\mathcal{M}})} = O(poly(N)). \qquad (8.62)$$

It proves the lemma, as θ(·) and f(·) differ only by a normalizing constant.
$\widetilde{\mathcal{M}}\,\Delta\,\mathcal{Z}$ differs in at most two edges from $\mathcal{X}\Delta\mathcal{Y}$. These edges appear in $\mathcal{X}\Delta\mathcal{Y}$,
but not in $\widetilde{\mathcal{M}}\,\Delta\,\mathcal{Z}$. The two vertices of any missing edge correspond to components which are independently sorted either in $\mathcal{Z}$ or $\widetilde{\mathcal{M}}$, but jointly in either
$\mathcal{X}$ or $\mathcal{Y}$. Amongst these two vertices, one of them corresponds to a W-shaped
component A, the other to an M-shaped component B. Let k1 be the number
of adjacencies and telomeres of G1 in A, and k2 the number of adjacencies
and telomeres of G1 in B. The ratio on the left-hand side of Equation (8.62)
due to such a difference is
$$\frac{T(k_1,k_2)}{(k_1+k_2-1)!} \bigg/ \frac{I(k_1)I'(k_2)}{(k_1-1)!\,k_2!} \qquad (8.63)$$

where I(x) denotes the number of independent sortings of a W-shaped component of size
x, and I′(x) denotes the number of independent sortings of an M-shaped component of
size x. However, it is polynomially bounded, since
$$I(k_1)I'(k_2)\binom{k_1+k_2-1}{k_1-1} = I(k_1,k_2) \qquad (8.64)$$
and we can apply Theorem 97.


These results together lead to the following theorem:
Theorem 99. The Metropolis-Hastings Markov chain on the matchings de-
fined above converges rapidly to θ.
Proof. From Lemma 25, Equation (8.58) may be written as
$$\kappa_\Gamma \leq O(poly(N)) \max_{e=(u,v)\in E} \sum_{(x,y)\in V\times V:\, e\in\Gamma_{x,y}} \theta(\widetilde{\mathcal{M}}).$$

By Lemmas 23 and 24, a matching $\widetilde{\mathcal{M}}$ may appear only a polynomial number
of times in this sum. So
$$\kappa_\Gamma \leq O(poly(N)) \sum_{\widetilde{\mathcal{M}}} \theta(\widetilde{\mathcal{M}}),$$
and as $\sum_{\widetilde{\mathcal{M}}} \theta(\widetilde{\mathcal{M}}) = 1$, κΓ is bounded by a polynomial in N. This proves the
theorem.
Using this result, we can prove the following theorem:
Theorem 100. It holds that #MPDCJ_MW ∈ FPAUS.
Proof. The above-defined Markov chain on partial matchings is an aperiodic,
irreducible and reversible Markov chain, with only positive eigenvalues. Furthermore, a step can be performed in running time that is polynomial in the
size of the graph. We claim that for any start state i, log(1/θ(i)) is polynomially bounded in the size of the corresponding genomes G1* and G2.
Indeed, there are $O(N^2)$ DCJ operations, the length of the DCJ paths is less
than N, and thus the number of sorting DCJ paths is $O(N^{2N})$, and the
inverse of the probability of any partial matching is less than this. Thus, the
mixing time is polynomial in both N and log(1/ε). This means that in
fully polynomial running time (polynomial both in N and − log(ε)) a random
partial matching can be generated from a distribution p satisfying
$$d_{TV}(p, \theta) \leq \varepsilon. \qquad (8.65)$$
But then a random DCJ path can be generated in fully polynomial running
time following a distribution q satisfying
$$d_{TV}(q, U) \leq \varepsilon \qquad (8.66)$$
according to Theorem 98. This is what we wanted to prove.


Now we are ready to conclude with our main theorem.
Theorem 101. #MPDCJ ∈ FPRAS.
Proof. #MPDCJ_MW ∈ FPAUS according to Theorem 100. Then #MPDCJ ∈
FPAUS according to Theorem 93. Since #MPDCJ is a self-reducible counting
problem, it is in FPRAS.

8.2.6 Sampling and counting the k-colorings of a graph


A k-coloring of a graph G = (V, E) is a mapping c : V → {1, 2, . . . , k}
such that for all (u, v) ∈ E, c(u) ≠ c(v). The exact counting of the k-colorings of
a d-regular graph is #P-complete when k ≥ d + 1 and d ≥ 3 [33]. However, it
is possible to design an FPAUS algorithm for sampling k-colorings of a graph

G when k ≥ 2∆ + 1, where ∆ is the maximal degree in G [102]. It is easy to
see that the following Markov chain (which we will call the Glauber dynamics
Markov chain) converges to the uniform distribution over the k-colorings of a
graph G. The Markov chain uniformly samples a random vertex v ∈ V and
a random color c′ ∈ {1, 2, . . . , k}. If there is a vertex u ∈ Γ(v) such that
c(u) = c′, then the chain remains in the same state; otherwise, the color of v
is changed to c′.
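One step of this chain takes only a few lines of code. The following Python sketch assumes the graph is given as an adjacency dictionary `nbrs` and the current coloring as a dictionary `col` (both names are our own):

```python
# A sketch of one step of the Glauber dynamics on k-colorings.

import random

def glauber_step(col, nbrs, k):
    v = random.choice(list(col))              # uniform random vertex
    c = random.randrange(1, k + 1)            # uniform random color
    if all(col[u] != c for u in nbrs[v]):     # c absent from v's neighborhood
        col[v] = c                            # recolor v; otherwise stay put
    return col
```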
Jerrum proved the rapid mixing of this chain using coupling (Theorem 79).
The Markov chain walks on the direct product of the k-colorings of G; a pair
of such colorings is denoted by $(X_t, Y_t)$. The coloring in $X_t$ is denoted by
$c_{X_t}(\cdot)$; similarly, the coloring in $Y_t$ is denoted by $c_{Y_t}(\cdot)$. The coupling of the
Markov chain is defined by the following algorithm.
1. Select a random vertex v uniformly.
2. Compute a permutation σ depending on v, G, $X_t$ and $Y_t$, described in
detail below.
3. Choose a random color c′ uniformly.
4. If there is a u ∈ Γ(v) such that $c_{X_t}(u) = c'$, then $X_{t+1}$ will be $X_t$.
Otherwise the coloring of v will be c′ in $X_{t+1}$. Similarly, if there is a
u ∈ Γ(v) such that $c_{Y_t}(u) = \sigma(c')$, then $Y_{t+1}$ will be $Y_t$. Otherwise the
coloring of v will be σ(c′) in $Y_{t+1}$.
It is clear that whatever permutation is used in step 2, σ(c′) follows the
uniform distribution; therefore the given Markov chain is indeed a coupling.
Let $A = A_t \subseteq V$ be such that for all u ∈ A, $c_{X_t}(u) = c_{Y_t}(u)$. Similarly, let
$D = D_t \subseteq V$ be such that for all u ∈ D, $c_{X_t}(u) \neq c_{Y_t}(u)$. Let d′(v) be the
number of edges incident to v that have one endpoint in A and one endpoint
in D. Observe that
$$\sum_{v\in A} d'(v) = \sum_{v\in D} d'(v) = m' \qquad (8.67)$$
where m′ is the number of edges that span A and D. The permutation σ is
defined in the following way.
(a) If v ∈ D, then let σ be the identity permutation.
(b) Otherwise, let $C_X := \left(\bigcup_{u\in\Gamma(v)} \{c_{X_t}(u)\}\right) \setminus \left(\bigcup_{u\in\Gamma(v)} \{c_{Y_t}(u)\}\right)$; similarly, let $C_Y := \left(\bigcup_{u\in\Gamma(v)} \{c_{Y_t}(u)\}\right) \setminus \left(\bigcup_{u\in\Gamma(v)} \{c_{X_t}(u)\}\right)$. Clearly,
$C_X \cap C_Y = \emptyset$. Also observe that $|C_X|, |C_Y| \leq d'(v)$. Without loss of
generality we may assume that $|C_X| \leq |C_Y|$. Then let $C_Y'$ be an arbitrary subset of $C_Y$ with cardinality $|C_X|$. Let $C_X = \{c_1, c_2, \ldots, c_r\}$ and
$C_Y' = \{c_1', c_2', \ldots, c_r'\}$ be an arbitrary enumeration of the sets $C_X$ and
$C_Y'$. The permutation σ is defined via its cyclic decomposition as
$$\sigma := (c_1, c_1')(c_2, c_2') \ldots (c_r, c_r'),$$
which interchanges the colors in $C_X$ and $C_Y'$, and fixes all other colors.
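The construction of σ is mechanical; here is a Python sketch (our own illustration) that returns the permutation as a function:

```python
# A sketch constructing the coupling permutation sigma of step 2.

def make_sigma(v, nbrs, colX, colY):
    if colX[v] != colY[v]:                    # case (a): v is in D
        return lambda c: c                    # identity permutation
    NX = {colX[u] for u in nbrs[v]}           # colors around v in X_t
    NY = {colY[u] for u in nbrs[v]}           # colors around v in Y_t
    CX, CY = NX - NY, NY - NX                 # the sets C_X and C_Y
    if len(CX) > len(CY):                     # w.l.o.g. |C_X| <= |C_Y|
        CX, CY = CY, CX
    pairs = dict(zip(sorted(CX), sorted(CY))) # C_Y' = the first |C_X| colors
    mapping = {**pairs, **{b: a for a, b in pairs.items()}}
    return lambda c: mapping.get(c, c)        # product of transpositions
```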
It is easy to see that the coupling event happens when $|D_t| = 0$. The
cardinality of $D_t$ can change by at most one in a step. First we consider the case
when the size of $D_t$ increases. This can only happen if the selected vertex v is in
A. Then the permutation is selected in line (b), and c′ must be in $C_Y$. Since
$|C_Y| \leq d'(v)$, we get that
$$P(|D_{t+1}| = |D_t| + 1) \leq \frac{1}{n}\sum_{v\in A} \frac{d'(v)}{k} = \frac{m'}{nk}, \qquad (8.68)$$

where n is the number of vertices in G. When $D_t$ decreases, then v is in D.
In this case, the selected permutation σ is the identity permutation. The
selected color c′ is accepted if there is no u ∈ Γ(v) such that $c_{X_t}(u) = c'$ or
$c_{Y_t}(u) = c'$. Observe that there are at least
$$k - 2\Delta + d'(v)$$
such colors, since for d′(v) of the neighbors of v, the color of the neighbor is the
same in $X_t$ and $Y_t$. Thus we get that
$$P(|D_{t+1}| = |D_t| - 1) \geq \frac{1}{n}\sum_{v\in D}\frac{k - 2\Delta + d'(v)}{k} = \frac{k-2\Delta}{kn}|D_t| + \frac{m'}{kn}. \qquad (8.69)$$

Since the probability that $D_t$ increases is smaller than the probability that
$D_t$ decreases, the Markov chain couples rapidly. We can give an estimation of
the expectation of $|D_t|$:
$$E(|D_{t+1}|) \leq (|D_t|+1)\frac{m'}{nk} + (|D_t|-1)\left(\frac{k-2\Delta}{kn}|D_t| + \frac{m'}{kn}\right) + \left(1 - \frac{k-2\Delta}{kn}|D_t| - \frac{2m'}{kn}\right)|D_t| = \left(1-\frac{k-2\Delta}{kn}\right)|D_t|. \qquad (8.70)$$
That is,
$$E(|D_t|) \leq \left(1-\frac{k-2\Delta}{kn}\right)^t |D_0| \leq \left(1-\frac{k-2\Delta}{kn}\right)^t n. \qquad (8.71)$$
Since $|D_t|$ is a random variable on the non-negative integers, we have that
$$P(D_t \neq 0) \leq n\left(1-\frac{k-2\Delta}{kn}\right)^t \leq n e^{-\frac{k-2\Delta}{kn}t}. \qquad (8.72)$$
We get that $P(|D_t| \neq 0) \leq \varepsilon$ when $t \geq \frac{kn}{k-2\Delta}\log\frac{n}{\varepsilon}$. Since the coupling
time upper bounds the mixing time, we get that the mixing time is
bounded by a polynomial of n and − log(ε); that is, the proposed Markov
chain provides an FPAUS sampler. It is also easy to show that the number of
k-colorings is in FPRAS if k ≥ 2∆ + 1 [102]; see also Exercises 15 and 16.

8.3 Further results and open problems


8.3.1 Further results
• Mark Jerrum, Alastair Sinclair and Eric Vigoda showed that computing
the permanent is in FPRAS for an arbitrary n × n matrix with non-
negative weights [100].
• Catherine Greenhill proved that the swap Markov chain is rapidly mixing
if the degree sequence satisfies the condition
$$3 \leq d_{max} \leq \frac{1}{4}\sqrt{M},$$
where dmax is the largest degree and M is the sum of the degrees [84].
Catherine Greenhill and Matteo Sfragara improved this result. They
proved that the Markov chain is rapidly mixing if the degree sequence
satisfies the condition
$$3 \leq d_{max} \leq \frac{1}{3}\sqrt{M}.$$
They also proved that the swap Markov chain is rapidly mixing on the
realizations of directed degree sequences when all in-degrees and out-degrees are positive and bounded above by $\frac{1}{4}\sqrt{m}$, where m is the number
of arcs, and not all in-degrees and out-degrees equal 1 [85].
• Regina Tyshkevich introduced the canonical decomposition of degree
sequences [171]. Péter Erdős, István Miklós and Zoltán Toroczkai ex-
tended it to bipartite graphs, and proved that the swap Markov chain
is rapidly mixing on the realizations of a bipartite degree sequence D,
if the swap Markov chain is rapidly mixing on each degree sequence ap-
pearing in the decomposition of D [65]. Such degree sequences might be
highly irregular and dense enough such that Greenhill’s condition does
not hold.

• Ivona Bezáková, Nayantara Bhatnagar and Eric Vigoda gave an FPAUS


for the realizations of an arbitrary degree sequence [19]. Their method
is based on Simulated Annealing [114].
• Colin Cooper, Martin Dyer, Catherine Greenhill and Andrew Handley
proved that the swap Markov chain remains rapidly mixing when re-
stricted to the connected realizations of regular graphs [46]. Actually,
they proved a bit more: they proved that the swap Markov chain is
rapidly mixing on connected regular graphs, when only those swap op-
erations changing (u1 , v1 ), (u2 , v2 ) to (u1 , v2 ), (u2 , v1 ) are considered for
which (u1 , u2 ) is an edge.

• Milena Mihail and Peter Winkler gave an FPAUS and FPRAS for approximately sampling and counting Eulerian orientations in any Eulerian
unoriented graph [129].
• Eric Vigoda improved the bound of the Glauber dynamics Markov
chain to $k > \frac{11}{6}\Delta$. Thomas P. Hayes and Eric Vigoda proved that the
same Markov chain is rapidly mixing on the k-colorings of graphs for
k > (1 + ε)∆ for all ε > 0, whenever ∆ = Ω(log(n)) and the graph does
not contain any cycle shorter than 11 [93]. Martin Dyer and Catherine
Greenhill introduced a slightly different Markov chain where an edge
is sampled uniformly, and the colors of the vertices incident to the selected edge are changed. They obtained a significantly better mixing time
compared to the mixing time of the Glauber dynamics when k = 2∆
[58].
• The number of independent sets in a graph is a #P-complete counting
problem, even if the maximum degree is 3 [59]. It is still possible to
sample almost uniformly independent sets of a graph [123, 59].

8.3.2 Open problems


• Sukhamay Kundu proved that two tree degree sequences, D =
d1 , d2 , . . . , dn and F = f1 , f2 , . . . , fn , have edge-disjoint tree realizations
if and only if their sum, f1 + d1 , f2 + d2 , . . . , fn + dn , is a graphical
degree sequence. It is unknown how to uniformly sample edge-disjoint
realizations of D and F .
• Brooks’ theorem says that a graph G can be colored with k ≥ ∆ colors
if ∆ ≥ 3 and the graph does not contain a complete graph $K_{k+1}$ [28].
We do not know if FPRAS and FPAUS algorithms exist for approximate
counting and sampling k-colorings of graphs when k = ∆, the maximum
degree of the graph.
• Although it is widely believed, it is unknown if the swap Markov chain
is rapidly mixing on the realizations of arbitrary degree sequences.
• We know that counting perfect matchings in a bipartite graph is in
FPRAS. However, it remains an open problem if counting the per-
fect matchings in simple graphs is in FPRAS. Daniel Štefankovič, Eric
Vigoda and John Wilmes showed that the Markov chain on perfect and
almost perfect matchings of a simple graph might be torpidly mixing
[182].
• Although there are FPRAS and FPAUS algorithms for approximately
counting and sampling Eulerian orientations of an arbitrary unoriented
graph, it is unknown if the same is true for Eulerian circuits of an arbi-
trary unoriented graph.

• There are polynomial running time algorithms to find a shortest rever-


sal scenario between two signed permutations [89, 166]. The number of
shortest reversal scenarios is conjectured to be #P-complete. No effi-
cient algorithms are known for approximately sampling or counting these
scenarios. Sorting permutations by block-interchanges [43] is another ex-
ample where the optimization problem can be solved in polynomial time,
and the complexity of counting the shortest rearrangement scenarios is
unknown.

• Sampath Kannan, Z. Sweedyk and Steve Mahaney gave a quasi-polynomial algorithm for approximately sampling and counting words
of a given length from a regular grammar [106]. (A function of the form
$e^{\log^k(n)}$ is called quasi-polynomial.) Vivek Gore and his co-workers extended this result to context-free grammars [80], and they also presented
an FPAUS for some restricted regular grammars. We do not know if
FPAUS and FPRAS algorithms are available for arbitrary regular and
context-free grammars.

8.4 Exercises
1. Using a rounding technique similar to that presented in Subsection 8.1.1,
give a deterministic approximation algorithm for the following problem.
The input is a set of weights, A = {w1, w2, . . . , wn}, and a weight W;
the output is the subset
$$\arg\max_{S\subset A}\left\{\sum_{w\in S} w \;\middle|\; \sum_{w\in S} w \leq W\right\}.$$
The running time of the algorithm must be polynomial in n and $\frac{1}{\varepsilon}$,
where 1 + ε is the approximation ratio of the solution.
2. Prove Observation 2.
3. ◦ Prove that the number of trees realizing degree sequence D =
d1, d2, . . . , dn and in which vi is adjacent to a prescribed leaf and vj
is also adjacent to a prescribed leaf is
$$\frac{(n-4)!}{(d_i-2)!\,(d_j-2)!\,\prod_{k\neq i,j}(d_k-1)!}.$$

4. By computing the definite integral
$$\int_{x_1=0}^{1}\int_{x_2=x_1}^{1}\cdots\int_{x_n=x_{n-1}}^{1} 1\, dx_n\, dx_{n-1}\ldots dx_1,$$
give an alternative proof that the volume of the poset polytope of a total
ordering of n elements is indeed $\frac{1}{n!}$.
5. * Find a graph G = (V, E) and two matchings X and Y on it such that
the canonical path between X and Y as described in Subsection 8.2.2
has length $\frac{3n}{5}$, where n = |V|.

6. Let G = (V, E) be a simple graph, and let $w : E \to \mathbb{R}^+$ be a weight
function such that the maximum weight is a polynomial function of the
number of vertices in G. Let $\mathcal{M}(G)$ denote the set of (not necessarily
perfect) matchings of G. Develop an FPRAS estimating
$$\sum_{M\in\mathcal{M}(G)}\prod_{e\in M} w(e)$$
using the fact that there is a rapidly mixing Markov chain converging
to the distribution
$$\pi(M) \propto \prod_{e\in M} w(e).$$

7. * Prove that for any bipartite degree sequence
D = {d1,1, d1,2, . . . , d1,n}, {d2,1, d2,2, . . . , d2,m} there exists a simple
degree sequence F = f1, f2, . . . , f_{n+m} such that D and F have the
same realizations; furthermore, a polynomial time computable bijection
between the realizations exists.
8. Give an example of two directed graphs $\vec{G}_1$ and $\vec{G}_2$ having the same
degree sequence D such that $\vec{G}_1$ cannot be transformed into $\vec{G}_2$ using
only swap operations. A swap operation in a directed graph deletes the
existing edges (a, b) and (c, d) and adds the edges (a, d) and (c, b).
9. ◦ Construct two bipartite graphs, G1 and G2, each with n vertices in both
vertex classes, which have the same degree sequence and for which $\Omega(n^2)$
swap operations are needed to transform G1 into G2.
10. Prove that the swap Markov chain is irreducible on the realizations of
regular directed degree sequences.
11. Prove Theorem 92.
12. ◦ Prove that the Glauber dynamics Markov chain introduced in Subsec-
tion 8.2.6 is rapidly mixing when k ≥ 2∆.
13. Prove that the Glauber dynamics Markov chain is irreducible when k ≥
∆ + 2.

14. * Give an example showing that the Glauber dynamics Markov chain might not
be irreducible when k = ∆ + 1.
15. ◦ Let G = (V, E) be a simple graph, and let G′ = G \ {e} for some
e ∈ E. Let $\Omega_k(G)$ denote the set of k-colorings of G. Furthermore, let
k ≥ 2∆ + 1, where ∆ is the maximum degree in G. Show that
$$\frac{\Delta+1}{\Delta+2} \leq \frac{|\Omega_k(G)|}{|\Omega_k(G')|} \leq 1.$$
Design an algorithm that estimates this ratio via sampling k-colorings
of G′.
16. It is easy to see that $|\Omega_k(G)|$ can be estimated as
$$|\Omega_k(G_0)|\prod_{i=0}^{m-1}\frac{|\Omega_k(G_{i+1})|}{|\Omega_k(G_i)|},$$
where each $G_i$ contains one less edge than $G_{i+1}$, $G_0$ is the empty graph,
and $G_m = G$ (that is, m is the number of edges in G). How well should
each fraction be estimated to get an FPRAS for $|\Omega_k(G)|$?

8.5 Solutions
Exercise 3. Observe that the number of trees with the prescribed conditions
is the number of trees realizing D′, where D′ is obtained from D by subtracting
1 from each of di and dj and deleting two degree-1 entries.
Exercise 5. Let G be P5 , that is, the path on 5 vertices. Let X be the
matching containing the first and third edges of P5 , and let Y be the matching
containing the second and the fourth edges of P5 . In the canonical path from
X to Y , the first edge is deleted in the first step. Then the second edge is
added and the third is deleted in the second step. Finally, in the third step,
the fourth edge of P5 is added.
Exercise 7. Observe that
$$F = d_{1,1}+(n-1),\; d_{1,2}+(n-1),\; \ldots,\; d_{1,n}+(n-1),\; d_{2,1},\; d_{2,2},\; \ldots,\; d_{2,m}$$
is an appropriate degree sequence (adding the clique gives each vertex of U
exactly n − 1 new neighbors). To see this, consider any realization G =
(U, V, E) of D, and add a complete graph $K_n$ on the vertex set U. Clearly,
the graph G′ obtained this way is a realization of F. Any realization of F can be
obtained from G′ by swaps. However, these swaps can only swap edges between
U and V, since G′ contains the complete graph on U and the empty graph on V.
That is, any realization of F contains a complete graph on U, and has no edge

inside V. Deleting $K_n$ on U establishes the bijection between the realizations


of F and D.
Exercise 9. Find appropriate n/2-regular graphs.
Exercise 12. When k = 2∆, the probability that |Dt | is increasing is less
than or equal to the probability that |Dt | is decreasing. Model the change of
|Dt | as a random walk on the [0, n] integer line, where n is the number of
vertices in G, and estimate the time hitting 0.
Exercise 14. Let G be K3. Then ∆ = 2, and there are 3! = 6 possible 3-colorings. However, any two of these colorings differ in at least 2 vertices, so there
is no pair of colorings with a transition between them.
Exercise 15. First, observe that $\Omega_k(G) \subseteq \Omega_k(G')$, which proves the right
inequality. Let e be (u, v). The next observation is that in any k-coloring
in $\Omega_k(G') \setminus \Omega_k(G)$, the color of u is the color of v. There are at least
k − ∆ ≥ ∆ + 1 colors to which the color of vertex u can be changed to obtain
a coloring of G. On the other hand, any coloring in $\Omega_k(G)$ can be transformed
into a coloring in $\Omega_k(G') \setminus \Omega_k(G)$ by changing the color of u to the color of v.
Therefore, indeed
$$\frac{\Delta+1}{\Delta+2} \leq \frac{|\Omega_k(G)|}{|\Omega_k(G')|}.$$
Bibliography

[1] http://www.claymath.org/sites/default/files/pvsnp.pdf.
[2] T. van Aardenne-Ehrenfest and N. G. de Bruijn. Circuits and trees in
oriented linear graphs. Wis- en Natuurkundig Tijdschrift, 28:203–217,
1951.
[3] M. Agrawal, N. Kayal, and N. Saxena. PRIMES is in P. Annals of
Mathematics, 160(2):781–793, 2004.
[4] Y. Ajana, J.F. Lefebvre, E. Tillier, and N. El-Mabrouk. Exploring the
set of all minimal sequences of reversals - An application to test the
replication-directed reversal hypothesis. In R. Guigo and D. Gusfield,
editors, Proceedings of the 2nd International Workshop on Algorithms
in Bioinformatics, volume 2452 of Lecture Notes in Computer Science,
pages 300–315. Springer, 2002.
[5] D.J. Aldous. Some inequalities for reversible Markov chains. Journal of
the London Mathematical Society (2), 25(3):564–576, 1982.
[6] D.J. Aldous. Random walks on finite groups and rapidly mixing Markov
chains. In Séminaire de Probabilites XVII, volume 986 of Lecture Notes
in Mathematics, pages 243–297. Springer, 1983.
[7] N. M. Amato, M. T. Goodrich, and E. A. Ramos. A randomized algo-
rithm for triangulating a simple polygon in linear time. Discrete and
Computational Geometry, 26(2):245–265, 2001.
[8] S. Arora and B. Barak. Computational Complexity: A Modern Approach.
Cambridge University Press, 2009.
[9] L. Babai. Graph isomorphism in quasipolynomial time, 2015. arXiv:
1512.03547.
[10] L. Babai and E.M. Luks. Canonical labeling of graphs. In Proceedings
of the 15th Annual ACM Symposium on Theory of Computing, pages
171–183, 1983.
[11] D.A. Bader, B.M.E. Moret, and M. Yan. A linear-time algorithm
for computing inversion distance between signed permutations with an
experimental study. Journal of Computational Biology, 8(5):483–491,
2001.


[12] I. Bárány and Z. Füredi. Computing the volume is difficult. In Pro-


ceedings of the 18th Annual ACM Symposium on Theory of Computing,
pages 442–447, 1986.
[13] R. Barbanchon. On unique graph 3-colorability and parsimonious re-
ductions in the plane. Theoretical Computer Science, 319:455–482, 2004.
[14] L.E. Baum and J.A. Egon. An inequality with applications to statistical
estimation for probabilistic functions of a Markov process and to a model
for ecology. Bulletin of the American Mathematical Society, 73:360–363,
1967.
[15] L.E. Baum and T. Petrie. Statistical inference for probabilistic functions
of finite state Markov chains. The Annals of Mathematical Statistics,
37(6):1554–1563, 1966.

[16] L.E. Baum and G.R. Sell. Growth functions for transformations on
manifolds. Pacific Journal of Mathematics, 27(2):211–227, 1968.
[17] A. Bergeron, J. Mixtacki, and J. Stoye. A unifying view of genome
rearrangements. In Proceedings of the 6th Workshop on Algorithms
in Bioinformatics, volume 4175 of Lecture Notes in Computer Science,
pages 163–173. Springer, 2006.
[18] S.J. Berkowitz. On computing the determinant in small parallel time
using a small number of processors. Information Processing Letters,
18:147–150, 1984.

[19] I. Bezáková, N. Bhatnagar, and E. Vigoda. Sampling binary contingency


tables with a greedy start. Random Structures and Algorithms, 30(1–
2):168–205, 2008.
[20] J.P.M. Binet. Mémoire sur un systeme de formules analytiques, et leur
application á des considerations géométriques. J. de l’Ecole Polytech-
nique IX, Cahier, 16:280–287, 1815.
[21] P. Bose, J.F. Buss, and A. Lubiw. Pattern matching for permutations.
Information Processing Letters, 65(5):277–283, 1998.
[22] G.E.P. Box and M.E. Muller. A note on the generation of random
normal deviates. The Annals of Mathematical Statistics, 29(2):610–611,
1958.
[23] M.D.V. Braga, M. Sagot, C. Scornavacca, and E. Tannier. The solution
space of sorting by reversals. In I. Mândoiu and A. Zelikovsky, editors,
Bioinformatics Research and Applications. ISBRA 2007, volume 4463
of Lecture Notes in Computer Science, pages 293–304. Springer, Berlin,
Heidelberg, 2007.

[24] M.D.V. Braga and J. Stoye. The solution space of sorting by DCJ.
Journal of Computational Biology, 17(9):1145–1165, 2010.

[25] P. Brémaud. Markov Chains: Gibbs Fields, Monte Carlo Simulation,


and Queues. Texts in Applied Mathematics. Springer, New York, 1999.
[26] G. Brightwell and P. Winkler. Counting linear extensions is #P-
complete. In Proceedings of the 23rd Annual ACM Symposium on The-
ory of Computing, pages 175–181, 1991.
[27] G. Brightwell and P. Winkler. Note on counting Eulerian circuits, 2004.
https://arxiv.org/pdf/cs/0405067.pdf.
[28] R. L. Brooks. On coloring the nodes of a network. Proceedings of the
Cambridge Philosophical Society, 37:194–197, 1941.

[29] R. Bubley and M. Dyer. Path coupling: A technique for proving rapid
mixing in Markov chains. In Proceedings of the 38th Annual Symposium
on Foundations of Computer Science, pages 223–231, 1997.
[30] G. Buffon. Essai d’arithmétique morale. Histoire naturelle, générale er
particuliére, Supplément 4:46–123, 1777.
[31] J.-Y. Cai. Holographic algorithms: Guest column. SIGACT News,
39(2):51–81, 2008.
[32] J-Y. Cai and X. Chen. Complexity Dichotomies for Counting Problems:
Volume 1, Boolean Domain. Cambridge University Press, 2017.

[33] J-Y. Cai, H. Guo, and T. Williams. The complexity of counting edge
colorings and a dichotomy for some higher domain Holant problems.
Research in the Mathematical Sciences, 3:18, 2016.
[34] J-Y. Cai and P. Lu. Holographic algorithms: The power of dimension-
ality resolved. In L. Arge, C. Cachin, T. Jurdziński, and A. Tarlecki,
editors, Proceedings of the 34th International Colloquium on Automata,
Languages and Programming, volume 4596 of Lecture Notes in Computer
Science, pages 631–642, 2007.
[35] J-Y. Cai and P. Lu. On symmetric signatures in holographic algorithms.
In W. Thomas and P. Weil, editors, Proceedings of the 24th Annual
Symposium on Theoretical Aspects of Computer Science, volume 4393
of Lecture Notes in Computer Science, pages 429–440, 2007.
[36] J-Y. Cai and P. Lu. Basis collapse in holographic algorithms. Compu-
tational Complexity, 17(2):254–281, 2008.

[37] J.-Y. Cai, P. Lu, and M. Xia. Holographic algorithms by Fibonacci


gates. Linear Algebra and Its Applications, 438(2):690–707, 2013.

[38] J.-Y. Cai, P. Lu, and M. Xia. Holographic algorithms with matchgates
capture precisely tractable planar #csp. SIAM Journal on Computing,
46(3):853–889, 2017.
[39] A. Cauchy. Memoire sur le nombre de valeurs qu’une fonction peut
obtenir. J. de l’Ecole Polytechnique X, pages 51–112, 1815.
[40] B. Chazelle. Triangulating a simple polygon in linear time. Discrete and
Computational Geometry, 6(3):485–524, 1991.
[41] N. Chomsky. Transformational Analysis. PhD thesis, University of
Pennsylvania, 1955.
[42] N. Chomsky. On certain formal properties of grammars. Information
and Control, 2:137–167, 1959.
[43] D. A. Christie. Sorting permutations by block-interchanges. Information
Processing Letters, 60:165–169, 1996.
[44] S. Cook. The complexity of theorem proving procedures. In Proceedings
of the 3rd Annual ACM Symposium on Theory of Computing, pages
151–158, 1971.
[45] C. Cooper, M. Dyer, and C. Greenhill. Sampling regular graphs and
a peer-to-peer network. Combinatorics, Probability and Computing,
16(4):557–593, 2007.
[46] C. Cooper, M. Dyer, C. Greenhill, and A. Handley. The flip Markov
chain for connected regular graphs, 2017. arXiv:1701.03856.
[47] D. Coppersmith and S. Winograd. Matrix multiplication via arithmetic
progressions. Journal of Symbolic Computation, 9(3):251–280, 1990.
[48] P. Creed. Counting and Sampling Problems on Eulerian Graphs. PhD
thesis, University of Edinburgh, 2010.
[49] P. Creed. Sampling Eulerian orientations of triangular lattice graphs.
Journal of Discrete Algorithms, 7(2):168–180, 2009.
[50] É. Czabarka, A. Dutle, P.L. Erdős, and I. Miklós. On realizations of
a joint degree matrix. Discrete Applied Mathematics, 181(30):283–288,
2014.
[51] A.M. Davie and A.J. Stothers. Improved bound for complexity of matrix
multiplication. Proceedings of the Royal Society of Edinburgh Section A,
143(2):351–369, 2013.
[52] P. Diaconis and L. Saloff-Coste. Comparison theorems for reversible
Markov chains. The Annals of Applied Probability, 3(2):696–730, 1993.

[53] P. Diaconis and L. Saloff-Coste. Logarithmic Sobolev inequalities for


finite Markov chains. The Annals of Applied Probability, 6(3):695–750,
1996.
[54] P. Diaconis and D. Stroock. Geometric bounds for eigenvalues of Markov
chains. The Annals of Applied Probability, 1(1):36–61, 1991.
[55] W. Doeblin. Esposé de la théorie des chaı̂nes simple constantes de
Markov á un nombre fini d’états. Rev. Math. Union Interbalkan, 2:77–
105, 1938.
[56] M. Dyer. Approximate counting by dynamic programming. In Pro-
ceedings of the 35th Annual ACM Symposium on Theory of Computing
(STOC), pages 693–699, 2003.
[57] M. Dyer, L.A. Goldberg, C. Greenhill, and M. Jerrum. The relative
complexity of approximate counting problems. Algorithmica, 38(3):471–
500, 2004.
[58] M. Dyer and C. Greenhill. A more rapidly mixing Markov chain for
graph colourings. Random Structures and Algorithms, 13:285–317, 1998.
[59] M. Dyer and C. Greenhill. On Markov chains for independent sets.
Journal of Algorithms, 35(1):17–49, 2000.
[60] J. Edmonds. Paths, trees, and flowers. Canadian Journal of Mathemat-
ics, 17:449–467, 1965.
[61] G. Elekes. A geometric inequality and the complexity of computing
volume. Discrete and Computational Geometry, 1:289–292, 1986.
[62] P.L. Erdős, Z.S. Kiss, I. Miklós, and L. Soukup. Approximate counting
of graphical realizations. PLoS ONE, 10(7):e0131300, 2015.
[63] P. Erdős and T. Gallai. Gráfok előı́rt fokszámú pontokkal, (Graphs
with prescribed degrees of vertices, in Hungarian). Matematikai Lapok,
11:264–274, 1960.
[64] P.L. Erdős, I. Miklós, and Z. Toroczkai. A decomposition based proof
for fast mixing of a Markov chain over balanced realizations of a joint
degree matrix. SIAM Journal on Discrete Mathematics, 29:481–499,
2015.
[65] P.L. Erdős, I. Miklós, and Z. Toroczkai. New classes of degree se-
quences with fast mixing swap Markov chain sampling. Combina-
torics, Probability and Computing, 2017. https://doi.org/10.1017/
S0963548317000499 Published online: 02 November 2017.
[66] D.K. Faddeev and V.N. Faddeeva. Numerical Methods of Linear Algebra.
Freeman, San Francisco, 1963.

[67] R. Fagin. Generalized first-order spectra and polynomial time recognis-


able sets. In R. Karp, editor, Complexity of Computation, volume 7 of
SIAM-AMS Proceedings, pages 43–73. American Mathematical Society,
1974.
[68] J. Felsenstein. Evolutionary trees from DNA sequences: A maximum
likelihood approach. Journal of Molecular Evolution, 17(6):368–376,
1981.

[69] G.D. Forney. The Viterbi algorithm. Proceedings of the IEEE, 61:268–
278, 1973.
[70] A. Frank. Paths, flows, and VLSI-Layout, chapter Packing paths, cir-
cuits and cuts: A survey. Springer, Berlin, 1990.

[71] A. Galanis, L.A. Goldberg, and M. Jerrum. A complexity trichotomy for


approximately counting list H-colorings. ACM Transactions on Com-
putation Theory, 9(2):1–22, 2017.
[72] M.R. Garey and D.S. Johnson. A Guide to the Theory of NP–
Completeness. A Series of Books in the Mathematical Sciences. W.
H. Freeman and Co., 1979.
[73] M.R. Garey, D.S. Johnson, and L. Stockmeyer. Some simplified NP-
complete graph problems. Theoretical Computer Science, 1:237–267,
1976.
[74] M.R. Garey, D.S. Johnson, and R.E. Tarjan. The planar Hamiltonian
circuit problem is NP-complete. SIAM Journal on Computing, 5:704–
714, 1976.
[75] Q. Ge and D. Štefankovič. The complexity of counting Eulerian tours
in 4-regular graphs. Algorithmica, 63:588–601, 2012.

[76] R. Giegerich and C. Meyer. Algebraic dynamic programming. In H. Kirchner and C. Ringeissen, editors, Algebraic Methodology and Software Technology. AMAST 2002, volume 2422 of Lecture Notes in Computer Science, pages 349–364, 2002.
[77] G.H. Golub and C.F. Van Loan. Matrix Computations (3rd edition). Johns Hopkins University Press, 1996.
[78] L.A. Goldberg and M. Jerrum. Counting unlabelled subtrees of a tree is
#P-complete. LMS Journal of Computation and Mathematics, 3:117–
124, 2000.
[79] L.A. Goldberg and M. Jerrum. The “Burnside process” converges slowly.
Combinatorics, Probability and Computing, 11(1):21–34, 2002.
[80] V. Gore, M. Jerrum, S. Kannan, Z. Sweedyk, and S. Mahaney. A quasi-polynomial-time algorithm for sampling words from a context-free language. Information and Computation, 134:59–74, 1997.
[81] O. Gotoh. An improved algorithm for matching biological sequences.
Journal of Molecular Biology, 162:705–708, 1982.
[82] C. Greenhill. The complexity of counting colourings and independent
sets in sparse graphs and hypergraphs. Computational Complexity,
9(1):52–72, 2000.
[83] C. Greenhill. A polynomial bound on the mixing time of a Markov chain
for sampling regular directed graphs. Electronic Journal of Combina-
torics, 18(1):#P234, 2011.

[84] C. Greenhill. The switch Markov chain for sampling irregular graphs. In Proceedings of the 26th ACM-SIAM Symposium on Discrete Algorithms, New York-Philadelphia, pages 1564–1572, 2015.
[85] C. Greenhill and M. Sfragara. The switch Markov chain for sampling
irregular graphs and digraphs. Theoretical Computer Science, 719:1–20,
2018.
[86] L. Gross. Logarithmic Sobolev inequalities. American Journal of Math-
ematics, 97(4):1061–1083, 1975.
[87] D. Gusfield. Algorithms on Strings, Trees, and Sequences: Computer Sci-
ence and Computational Biology, chapter Maximum parsimony, Steiner
trees, and perfect phylogeny, page 470. Cambridge University Press,
1997.
[88] S.L. Hakimi. On realizability of a set of integers as degrees of the vertices of a linear graph. I. Journal of the Society for Industrial and Applied Mathematics, 10:496–506, 1962.

[89] S. Hannenhalli and P. Pevzner. Transforming cabbage into turnip (polynomial algorithm for sorting signed permutations by reversals). In Proceedings of the 27th Annual Symposium on Theory of Computing, pages 178–189, 1995.

[90] J.A. Hartigan. Minimum mutation fits to a given tree. Biometrics, 29:53–65, 1973.
[91] W.K. Hastings. Monte Carlo sampling methods using Markov chains
and their applications. Biometrika, 57(1):97–109, 1970.
[92] V. Havel. A remark on the existence of finite graphs (in Czech). Časopis pro pěstování matematiky, 80:477–480, 1955.
[93] T.P. Hayes and E. Vigoda. A non-Markovian coupling for randomly sampling colorings. In Proceedings of the 44th Annual IEEE Symposium on Foundations of Computer Science, pages 618–627, 2003.
[94] J.E. Hopcroft and R.M. Karp. An $n^{5/2}$ algorithm for maximum matchings in bipartite graphs. SIAM Journal on Computing, 2(4):225–231, 1973.
[95] M. Huber. Fast perfect sampling from linear extensions. Discrete Math-
ematics, 306:420–428, 2006.
[96] H. B. Hunt, M. V. Marathe, V. Radhakrishnan, and R. E. Stearns. The
complexity of planar counting problems. SIAM Journal on Computing,
27(4):1142–1167, 1998.
[97] M. Jerrum. Two-dimensional monomer-dimer systems are computation-
ally intractable. Journal of Statistical Physics, 48(1/2):121–134, 1987.
[98] M. Jerrum. Counting trees in a graph is #P-complete. Information
Processing Letters, 51:111–116, 1994.
[99] M. Jerrum and A. Sinclair. Approximating the permanent. SIAM Jour-
nal on Computing, 18(6):1149–1178, 1989.
[100] M. Jerrum, A. Sinclair, and E. Vigoda. A polynomial-time approxima-
tion algorithm for the permanent of a matrix with nonnegative entries.
Journal of the ACM, 51(4):671–697, 2004.
[101] M. Jerrum and M. Snir. Some exact complexity results for straight-line
computations over semirings. Journal of the Association for Computing
Machinery, 29(3):874–897, 1982.
[102] M.R. Jerrum. A very simple algorithm for estimating the number of
k-colourings of a low-degree graph. Random Structures and Algorithms,
7(2):157–165, 1995.
[103] M.R. Jerrum, L.G. Valiant, and V.V. Vazirani. Random generation
of combinatorial structures from a uniform distribution. Theoretical
Computer Science, 43(2–3):169–188, 1986.
[104] W. Just. Computational complexity of multiple sequence alignment with
SP-score. Journal of Computational Biology, 8(6):615–623, 2001.
[105] R. Kannan, L. Lovász, and M. Simonovits. Random walks and an $O^*(n^5)$ volume algorithm for convex bodies. Random Structures and Algorithms, 11(1):1–50, 1997.
[106] S. Kannan, Z. Sweedyk, and S. R. Mahaney. Counting and random
generation of strings in regular languages. In Proceedings of the 6th
Annual ACM-SIAM Symposium on Discrete Algorithms, pages 551–557,
1995.
[107] H. Kaplan, R. Shamir, and R. Tarjan. A faster and simpler algorithm for sorting signed permutations by reversals. SIAM Journal on Computing, 29(3):880–892, 1999. First appeared in the Proceedings of the 8th Annual Symposium on Discrete Algorithms.
[108] R.M. Karp. Complexity of Computer Computations, chapter Reducibil-
ity among combinatorial problems, pages 85–103. Plenum, New York,
1972.

[109] A. Karzanov and L. Khachiyan. On the conductance of order Markov chains. Order, 8:7–15, 1991.
[110] P. W. Kasteleyn. Dimer statistics and phase transitions. Journal of
Mathematical Physics, 4(2):287–293, 1963.

[111] P.W. Kasteleyn. Graph Theory and Theoretical Physics, chapter Graph theory and crystal physics, pages 43–110. Academic Press, New York, 1967.
[112] P.W. Kasteleyn. The statistics of dimers on a lattice. I. The number of
dimer arrangements on a quadratic lattice. Physica, 27(12):1209–1225,
1961.
[113] G. Kirchhoff. Über die Auflösung der Gleichungen, auf welche man bei der Untersuchung der linearen Verteilung galvanischer Ströme geführt wird. Annalen der Physik und Chemie, 72:497–508, 1847.
[114] S. Kirkpatrick, C.D. Gelatt Jr., and M.P. Vecchi. Optimization by simulated annealing. Science, 220(4598):671–680, 1983.
[115] B. Knudsen and J.J. Hein. Using stochastic context-free grammars and
molecular evolution to predict RNA secondary structure. Bioinformat-
ics, 15:446–454, 1999.

[116] J.B. Kruskal. On the shortest spanning subtree of a graph and the traveling salesman problem. Proceedings of the American Mathematical Society, 7:48–50, 1956.
[117] S. Kundu. Disjoint representation of tree realizable sequences. SIAM
Journal on Applied Mathematics, 26(1):103–107, 1974.

[118] N. Linial. Hard enumeration problems in geometry and combinatorics. SIAM Journal on Algebraic and Discrete Methods, 7(2):331–335, 1986.
[119] M. Liskiewicz, M. Ogihara, and S. Toda. The complexity of counting
self-avoiding walks in subgraphs of two-dimensional grids and hyper-
cubes. Theoretical Computer Science, 304(1–3):129–156, 2003.
[120] C.H.C. Little. An extension of Kasteleyn's method of enumerating the 1-factors of planar graphs. In D.A. Holton, editor, Combinatorial Mathematics, volume 403 of Lecture Notes in Mathematics, pages 63–72. Springer, Berlin, Heidelberg, 1974.
[121] J. S. Liu. Monte Carlo Strategies in Scientific Computing. Springer
Series in Statistics. Springer, New York, 1999.

[122] L. Lovász and S. Vempala. Simulated annealing in convex bodies and an $O^*(n^4)$ volume algorithm. Journal of Computer and System Sciences, 72(2):392–417, 2006.
[123] M. Luby and E. Vigoda. Fast convergence of the Glauber dynamics for sampling independent sets. Random Structures and Algorithms, 15:229–241, 1999.

[124] M. Mahajan and V. Vinay. Determinant: Old algorithms, new insights. SIAM Journal on Discrete Mathematics, 12:474–490, 1999.
[125] R. Martin and D. Randall. Disjoint decomposition of Markov chains
and sampling circuits in Cayley graphs. Combinatorics, Probability and
Computing, 15:411–448, 2006.
[126] J.S. McCaskill. The equilibrium partition function and base pair binding
probabilities for RNA secondary structure. Biopolymers, 29:1105–1119,
1990.
[127] M.D.V. Braga and J. Stoye. Counting all DCJ sorting scenarios. In F.D. Ciccarelli and I. Miklós, editors, Proceedings of the 6th RECOMB Comparative Genomics Workshop, volume 5817 of Lecture Notes in Computer Science, pages 36–47. Springer, Berlin, Heidelberg, 2009.
[128] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. Journal of Chemical Physics, 21(6):1087–1092, 1953.
[129] M. Mihail and P. Winkler. On the number of Eulerian orientations of a graph. Algorithmica, 16(4/5):402–414, 1996.
[130] I. Miklós, P.L. Erdős, and L. Soukup. Towards random uniform sampling
of bipartite graphs with given degree sequence. Electronic Journal of
Combinatorics, 20(1):P16, 2013.
[131] I. Miklós, B. Mélykúti, and K. Swenson. The Metropolized partial importance sampling MCMC mixes slowly on minimum reversal rearrangement paths. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(4):763–767, 2010.
[132] I. Miklós, I.M. Meyer, and B. Nagy. Moments of the Boltzmann distri-
bution for RNA secondary structures. Bulletin of Mathematical Biology,
67:1031–1047, 2005.
[133] I. Miklós and H. Smith. The computational complexity of calculating partition functions of optimal medians with Hamming distance, 2017. https://arxiv.org/abs/1506.06107.

[134] I. Miklós and E. Tannier. Approximating the number of double cut-and-join scenarios. Theoretical Computer Science, 439:30–40, 2012.
[135] I. Miklós, E. Tannier, and Z.S. Kiss. On sampling SCJ rearrangement
scenarios. Theoretical Computer Science, 552:83–98, 2014.
[136] B. Morris and A. Sinclair. Random walks on truncated cubes and sampling 0-1 knapsack solutions. SIAM Journal on Computing, 34(1):195–226, 2004.
[137] S.B. Needleman and C.D. Wunsch. A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology, 48(3):443–453, 1970.

[138] A. Ouangraoua and A. Bergeron. Parking functions, labeled trees and DCJ sorting scenarios. In F.D. Ciccarelli and I. Miklós, editors, Proceedings of the 6th RECOMB Comparative Genomics Workshop, volume 5817 of Lecture Notes in Computer Science, pages 24–35. Springer, Berlin, Heidelberg, 2009.
[139] A. Ouangraoua and A. Bergeron. Combinatorial structure of genome rearrangement scenarios. Journal of Computational Biology, 17(9):1129–1144, 2010.
[140] C.H. Papadimitriou. Computational Complexity. Addison-Wesley, Read-
ing, Mass., 1994.

[141] L. Pauling. The structure and entropy of ice and of other crystals
with some randomness of atomic arrangement. Journal of the Amer-
ican Chemical Society, 57(12):2680–2684, 1935.
[142] G. Pólya. Aufgabe 424. Arch. Math. Phys., 20:271, 1913.
[143] J.G. Propp and D.B. Wilson. Coupling from the past: A user’s guide. In
D. Aldous and J. Propp, editors, Microsurveys in Discrete Probability,
volume 41 of DIMACS Series in Discrete Mathematics and Theoretical
Computer Science, pages 181–192, 1998.
[144] J.S. Provan and M.O. Ball. The complexity of counting cuts and of
computing the probability that a graph is connected. SIAM Journal on
Computing, 12(4):777–788, 1983.
[145] L.R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE, 77(2):257–286, 1989.
[146] N. Robertson, P.D. Seymour, and R. Thomas. Permanents, Pfaffian
orientations, and even directed circuits. The Annals of Mathematics,
150:929–975, 1999.

[147] O.S. Rothaus. Diffusion on compact Riemannian manifolds and logarithmic Sobolev inequalities. Journal of Functional Analysis, 42(1):102–109, 1981.
[148] S. Saluja, K.V. Subrahmanyam, and M.N. Thakur. Descriptive complexity of #P functions. Journal of Computer and System Sciences, 50:493–505, 1995.

[149] P.A. Samuelson. A method of determining explicitly the coefficients of the characteristic equation. The Annals of Mathematical Statistics, 13:424–429, 1942.
[150] D. Sankoff and P. Rousseau. Locating the vertices of a Steiner tree in an
arbitrary metric space. Mathematical Programming, 9:240–246, 1975.
[151] C.P. Schnorr. A lower bound on the number of additions in monotone
computations. Theoretical Computer Science, 2(3):305–315, 1976.
[152] J. Schweinsberg. An $O(n^2)$ upper bound for the relaxation time of a Markov chain on cladograms. Random Structures and Algorithms, 20:59–70, 2001.
[153] P.H. Sellers. On the theory and computation of evolutionary distances. SIAM Journal on Applied Mathematics, 26(4):787–793, 1974.
[154] A. Siepel. An algorithm to enumerate sorting reversals for signed per-
mutations. Journal of Computational Biology, 10(3–4):575–597, 2003.
[155] J. Simon. On the difference between one and many. In Proceedings of
the 4th International Colloquium on Automata, Languages and Program-
ming, volume 52 of Lecture Notes in Computer Science, pages 480–491.
Springer-Verlag, 1977.

[156] A. Sinclair and M. Jerrum. Approximate counting, uniform generation and rapidly mixing Markov chains. Information and Computation, 82(1):93–133, 1989.
[157] A. J. Sinclair. Algorithms for Random Generation and Counting: A
Markov Chain Approach, PhD Thesis, University of Edinburgh. Mono-
graph in the series Progress in Theoretical Computer Science. Springer-
Birkhäuser, Boston, 1993.
[158] A.J. Sinclair. Improved bounds for mixing rates of Markov chains
and multicommodity flow. Combinatorics, Probability and Computing,
1(4):351–370, 1992.
[159] M. Sipser. Introduction to the Theory of Computation, page 99. PWS,
1996.
[160] M. Sipser. Introduction to the Theory of Computation. Cengage Learning, 3rd edition, 2012.
[161] L.G. Stockmeyer. The complexity of approximate counting. In Proceedings of the 15th Annual ACM Symposium on Theory of Computing, pages 118–126, 1983.
[162] V. Strassen. Gaussian elimination is not optimal. Numerische Mathe-
matik, 13(4):354–356, 1969.
[163] R.L. Stratonovich. Conditional Markov processes. Theory of Probability
and its Applications, 5(2):156–178, 1960.
[164] S. Straub, T. Thierauf, and F. Wagner. Counting the number of perfect matchings in $K_5$-free graphs. Electronic Colloquium on Computational Complexity, 2014.
[165] K.M. Swenson, G. Badr, and D. Sankoff. Listing all sorting reversals in
quadratic time. Algorithms for Molecular Biology, 6:11, 2011.
[166] E. Tannier, A. Bergeron, and M.-F. Sagot. Advances on sorting by
reversals. Discrete Applied Mathematics, 155(6–7):881–888, 2007.
[167] H.N.V. Temperley and M.E. Fisher. Dimer problem in statistical mechanics: An exact result. Philosophical Magazine, 6(68):1061–1063, 1961.

[168] I. Tinoco, O.C. Uhlenbeck, and M.D. Levine. Estimation of secondary structure in ribonucleic acids. Nature, 230:362–367, 1971.
[169] I.J. Tinoco, P. Borer, B. Dengler, M. Levine, and O. Uhlenbeck. Im-
proved estimation of secondary structure in ribonucleic acids. Nature:
New Biology, 246:40–41, 1973.

[170] W.T. Tutte and C.A.B. Smith. On unicursal paths in a network of degree 4. American Mathematical Monthly, 48:233–237, 1941.
[171] R. I. Tyshkevich. [The canonical decomposition of a graph] (in Russian).
Doklady Akademii Nauk SSSR, 24:677–679, 1980.
[172] A. Urbańska. Faster combinatorial algorithms for determinant and Pfaffian. Algorithmica, 56:35–50, 2010.

[173] S.P. Vadhan. The complexity of counting in sparse, regular and planar
graphs. SIAM Journal on Computing, 31(2):398–427, 2001.
[174] L.G. Valiant. Accidental algorithms. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 509–517, 2006.
[175] L.G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8(3):189–201, 1979.
[176] L.G. Valiant. The complexity of enumeration and reliability problems.
SIAM Journal on Computing, 8(3):410–421, 1979.

[177] L.G. Valiant. Negation can be exponentially powerful. Theoretical Computer Science, 12:303–314, 1980.
[178] L.G. Valiant. Holographic algorithms. In Proceedings of the 45th Annual
IEEE Symposium on Foundations of Computer Science, pages 306–315,
2004.

[179] D.L. Vertigan and D.J.A. Welsh. The computational complexity of the Tutte plane: The bipartite case. Combinatorics, Probability and Computing, 1(2):181–187, 1992.
[180] A.J. Viterbi. Error bounds for convolutional codes and an asymptoti-
cally optimum decoding algorithm. IEEE Transactions on Information
Theory, 13(2):260–269, 1967.
[181] J. von Neumann. Monte Carlo Method, volume 12 of National Bureau of
Standards Applied Mathematics Series, chapter 13. Various techniques
used in connection with random digits, pages 36–38. Washington, D.C.:
U.S. Government Printing Office, 1951.

[182] D. Štefankovič, E. Vigoda, and J. Wilmes. On counting perfect matchings in general graphs, 2017. https://arxiv.org/pdf/1712.07504.pdf.
[183] L. Wang and T. Jiang. On the complexity of multiple sequence align-
ment. Journal of Computational Biology, 1(4):337–348, 1994.

[184] M. Xia, P. Zhang, and W. Zhao. Computational complexity of counting problems on 3-regular planar graphs. Theoretical Computer Science, 384(1):111–125, 2007.
[185] S. Yancopoulos, O. Attie, and R. Friedberg. Efficient sorting of genomic permutations by translocation, inversion and block interchange. Bioinformatics, 21(16):3340–3346, 2005.
[186] C.H. zu Siederdissen, S.J. Prohaska, and P.F. Stadler. Algebraic dy-
namic programming over general data structures. BMC Bioinformatics,
16(Suppl 19):S2, 2015.

[187] M. Zuker and D. Sankoff. RNA secondary structures and their prediction. Bulletin of Mathematical Biology, 46:591–621, 1984.
Index

A
Adjacency graph, 342
Algebraic dynamic programming, sampling with, 256–260
Algebraic dynamic programming and monotone computations, 35–122
    algebraic dynamic programming, introduction to, 36–51
    ambiguous grammar, 60, 73, 78
    base pairs, 80
    binary search tree, 96, 114
    Boltzmann distribution, 83, 84
    Boolean algebra, 56
    Catalan number, 40, 41, 78, 110
    Chomsky hierarchy, 60
    Chomsky normal form, 74, 76
    co-emission pattern, 72
    coin system, 42–43
    computational problem, 48
    context-free grammars, 73–85
    counting, optimizing, deciding, 51–59
    counting the coin sequences summing up to a given amount, 49
    counting the coin sequences when the order does not count, 51
    counting the total sum of weights, 50–51
    distributive rule, 46, 58
    dual tropical semiring, 60
    Dyck words, 78–80, 93
    edit distance of sequences, 67
    evaluation algebras, 36, 38
    exercises, 91–99
    Felsenstein's algorithm, 89
    Fibonacci numbers, 36
    finding the recursion for optimization, 49–50
    formal definition of algebraic dynamic programming, 45–48
    Forward algorithm, 65
    Gale-Ryser theorem, 99
    gap character, 66
    graphical sequences, 99
    greedy algorithm, 90
    Hamiltonian path, 87
    homomorphism, 54, 57
    Knudsen-Hein grammar, 81
    Kronecker delta function, 105
    left neighbor, 107
    legal sequence, 101
    legitimate code word, 39
    limitations of algebraic dynamic programming approach, 89–91
    longest common subsequence, 67
    matrix multiplication, 90
    maximum matching, 116
    monoid semiring, 52
    non-terminals, 60, 73
    outermost base pairs, 81
    palindromic supersequence, 117
    partition function, 83
    partition polynomial, 44, 49
    Pascal's triangle, 37
    power of algebraic dynamic programming (money change problem), 48–51
    pseudo-knot free secondary structure, 80
    random walk on the states, 64, 72
    recursions, dynamic programming, 36–45
    regular expressions, 39
    regular grammars, hidden Markov models, 59–65
    right neighbor, 107
    RNA sequence, 80–81
    Sankoff-Rousseau algorithm, 89
    scoring scheme, 71
    semiring, 46, 86
    sequence alignment problems, pair hidden Markov models, 65–73
    shortest common supersequence, 67
    solutions to exercises, 99–122
    spanning trees, 91
    stair-step shape, 41
    starting non-terminal, 73
    statistics evaluation algebra, 53
    subpermutations, 68
    substitution-free alignments, 66
    sums over sets, calculation of, 38
    terminals, 60, 73
    trees, 88–89
    triangulation, 108
    tropical semiring, 55
    unambiguous grammar, 60, 101
    universal gas constant, 83
    Viterbi algorithm, 65
    walks on directed graphs, 85–88
    yield algebras, 36, 38, 45, 115
    zig-zag sequence of numbers, 97
    zoo of counting and optimization problems, 59–89
Ambiguous grammar, 60, 73, 78, 187
Approximable counting and sampling problems, 321–361
    adjacency graph, 342
    balanced realizations of a JDM, 334–340
    bipartite degree sequences, sampling realizations of, 330–334
    canonical path method (rapid mixing), 329
    cornerstone of the sweeping, 331
    counting the most parsimonious DCJ scenarios, 340–353
    counting the (not necessarily perfect) matchings of a graph, 328–330
    coupling of Markov chain, 354
    DCJ scenarios, 345–349
    double cut and join operation, 341
    edge-disjoint tree realizations without common internal vertices, 323–326
    exercises, 358–360
    genome, 343
    Glauber dynamics Markov chain, 357, 359
    Hamming distance, 334
    k-colorings of a graph, sampling and counting of, 353–355
    #Knapsack, 322–323
    lazy Markov chain, 350
    linear extensions of posets, 326–327
    Markov chains, sampling with, 326–355
    Most Parsimonious DCJ scenario problem, 342
    open problems, 357–358
    pigeonhole rule, 334
    quasi-polynomial function, 358
    random DCJ path, 353
    rejection method, sampling with, 322–326
    solutions to exercises, 360–361
    sorting DCJ, 343
    swap Markov chain, 339, 357
    sweeping process, 331
    telomeres, 340
    trivial components, 342
Approximation-preserving reductions, 166
Arbitrary matrix, calculating the permanent of, 170–174
Arborescences, 139

B
Background on computational complexity, 1–31
    central question, 24
    Chernoff's inequality, 17
    classification of algorithms, 2
    computational problem, definition of, 2
    convex body, computing the volume of (deterministic versus stochastic case), 11–14
    cycle problem, 20
    decision problem, 2
    deterministic counting, 8–11
    deterministic decision problems, 4–7
    easiest counting problems, 25
    exercises, 26–29
    function problem, 8
    gadget graph, 21
    general overview of computational problems, 2–4
    Hamiltonian cycle problem, 20, 23
    Hamiltonian path, 7
    holographic algorithms, 25
    k-clique problem, 5
    Markov chains, mixing of, 26
    monotone computations, 25
    NP-complete problems, 6, 24
    optimization problems, 1, 2
    oracle, 12
    partially ordered set, 10
    Pfaffian, 25
    polynomial reduction, 5
    polynomial time, counting problems solvable in, 25
    poset polytope, 11
    #P problems, 8
    problem instances, 2
    random decision algorithms (RP, BPP, Papadimitriou's theorem), 14–18
    running time, 2
    sampling methods, 26
    self-reducible counting problems, 22
    solutions to exercises, 29–31
    stochastic counting and sampling (FPRAS and FPAUS), 19–24
    subexponential function, 3
    subset sum problem, 3
    superpolynomial function, 3
    total variation distance, 22
    witness, 2
Base pairs, 80
Bayes theorem on conditional probabilities, 250
BEST (de Bruijn-Ehrenfest-Smith-Tutte) algorithm, 139–145
Biased cube, 255
Binary search tree, 96, 114
Bipartite degree sequences, sampling realizations of, 330–334
Bipartite graphs
    computing permanent of non-negative matrix and counting perfect matchings in, 188–190
    counting (not necessarily perfect) matchings of, 190–191
    perfect matchings in, 9
#BIS-complete problems, 211
Boltzmann distribution, 83, 84
Boolean algebra, 56
Boolean formula, 210
Boolean functions (holographic algorithms), 241
Boolean logic, 6
Boolean semiring, 61
BPP (Bounded-error Probabilistic Polynomial)
    algorithm, 185
    problem class, 15, 18
Burnside process, 312

C
Canonical path method, 291, 329
Catalan number, 40, 41, 78, 110, 269
Caterpillar binary tree, 317
Chebyshev functions, 189, 191
Cheeger's inequality, 278, 287, 327
Chernoff's inequality, 17, 300, 306
Cherry motif (elementary subtree), 180
Chinese remainder theorem, 192
Chomsky hierarchy, 60
Chromosomes, 341
Clow (closed ordered walk), 124, 130
Co-emission pattern, 72
Coin system, 42–43
Commutative ring, 25, 50, 129, 137, 219
Computational complexity, background on, see Background on computational complexity
Computational problem, definition of, 2
Conjunctive normal form (CNF), 6
Constraint Satisfaction Problems (CSP), 240
Context-free grammar, 73–85
Convex body, computing the volume of, 11–14
Counting, computational complexity of
    algebraic dynamic programming and monotone computations, 35–122
    holographic algorithms, 217–244
    linear algebraic algorithms (the power of subtraction), 123–164
    #P-complete counting problems, 165–216
Count tree, 261
Cycle problem (FPRAS), 20

D
Decision problem, 2
    computational problem, 48
    deterministic, 4–7
    MPDCJ scenario problem, 342
    NP-complete, 24, 234
    3DNF, 186
    #3SAT, 167
Deterministic counting, 8–11
    bipartite graph, 9
    computational problem, 8
    Euclidian space, 11
    function problem, 8
    partially ordered set, 10
    #P-complete counting problem, 10
    permanent, 9
    poset polytope, 11
    #P problems, 8
Deterministic decision problems, 4–7
    Boolean logic, 6
    computational complexity theory, 4
    Hamiltonian path, 7
    k-clique problem, 5
    NP-complete problems, 6
    polynomial reduction, 5
    satisfying assignment, 6
Directed graphs
    adjacency matrix of, 170
    BEST algorithm and, 139
    closed ordered walk and, 124
    construction, 21
    edge-weighted, 189
    Eulerian, 141, 210
    Hamiltonian path, 7
    Hidden Markov Model, 63, 71
    Markov chain, 263
    number of arborescences of, 217
    number of cycles in, 20
    walks on, 85–88
Distributive rule, 58
Domino tilings, 150
Double cut and join (DCJ)
    model, 211–212
    operation, 341, 343
Dual tropical semiring, 50
Dyck words, 78–80, 93, 265, 316

E
Edge-disjoint tree realizations without common internal vertices, 323–326
Edit distance of sequences, 67
Elementary subtrees, 176, 179
Erdős-Gallai theorem, 214
Euclidian space, 11
Eulerian circuit, 143
Eulerian graph
    directed graph, 141, 210
    holographic algorithms, 217
    number of Eulerian orientations in, 207–208
Evaluation algebra, 36, 38, see also Algebraic dynamic programming and monotone computations
    linear algebraic algorithms, 128, 155
    random generations and, 256
    statistics, 53
    walks on directed graphs, 86
Evolutionary tree, counting the most parsimonious substitution histories on, 174–184
Exercises
    algebraic dynamic programming and monotone computations, 91–99
    approximable counting and sampling problems, 358–360
    background on computational complexity, 26–29
    holographic algorithms, 242–243
    linear algebraic algorithms (the power of subtraction), 158–160
    Markov chains, mixing of (and their applications in theory of counting and sampling), 313–316
    #P-complete counting problems, 213–214
    random generations, methods of, 267–268

F
Felsenstein's algorithm, 89
Fibonacci gates, 240
Fibonacci numbers, 36, 154
FKT (Fisher-Kasteleyn-Temperley) algorithm, 145–154
Forward algorithm, 65
FPAUS (Fully Polynomial Almost Uniform Sampler), 22, 251, 274, 301
FPRAS (Fully Polynomial Randomized Approximation Scheme), 19–24, 301, 321
Function problem
    definition, 8
    problem instance in, 166

G
Gadget
    components, 201
    Eulerian graph, 217
    graph, 21
    planar graph, 218
    vertices of, 215
Gale-Ryser theorem, 99
Gap character, 66
Gaussian elimination, 124, 135
Genome, 343
Glauber dynamics Markov chain, 357, 359
Grammar, 59–65
    ambiguous, 60, 73, 78, 187
    Chomsky Normal Form, 259
    context-free, 73–85
    counting the sequences of a given length that a regular grammar can generate, 187–188
    Knudsen-Hein, 81
    stochastic regular, 63
    transformational, 59
    unambiguous, 60, 101
Graphical sequences, 99
Graphs
    adjacency, 342
    bipartite graphs, 9, 188–190
    Eulerian, 207, 217
    k-colorings of, 353–355
    subtrees, counting, 204–206
Graphs, directed
    adjacency matrix of, 170
    BEST algorithm and, 139
    closed ordered walk and, 124
    construction, 21
    edge-weighted, 189
    Eulerian, 141, 210
    Hamiltonian path, 7
    Hidden Markov Model, 63, 71
    Markov chain, 263
    number of arborescences of, 217
    number of cycles in, 20
    walks on, 85–88
Greedy algorithm, 90

H
Half regular bipartite degree sequences, 332
Hamiltonian cycle
    cyclechain, 228
    problem, 20, 23
Hamiltonian path, 7, 87, 205
Hamming distance, 173, 197, 334
Hidden Markov models (HMMs), 59–65
    dynamic programming algorithm, 92
    multiple, 73
    pair, 65–73
    triple, 73
Holographic algorithms, 25, 217–244
    arity of the matchgate, 218
    Boolean functions, 241
    Constraint Satisfaction Problems, 240
    Eulerian graph, 217
    examples, 223–239
    exercises, 242–243
    Fibonacci gates, 240
    Hamiltonian cycles, 228
    Holant, 223
    holographic reduction, 218–223
    input nodes, 218
    matchgates, 218
    not-all-equal clause, 234
    open problems, 241
    output nodes, 218
    Pfaffian orientation, 227
    #Pl-3-(1,1)-cyclechain, 228–231
    #Pl-3-NAE-ICE, 231–234
    #Pl-3-NAE-SAT, 234–236
    #7Pl-Rtw-Mon-3SAT, 236–239
    read-twice formula, 236
    recognizer matchgates, 221, 230, 232
    solutions to exercises, 243–244
    standard signature of matchgrid, 219
    transducer matchgate, 218
    #X-matchings, 224–227
Homomorphism, 54, 57, 86
Hypercube, 282

I
Ice problems (holographic algorithms), 231
Importance sampling, 253–256
Inversion method (generating random numbers), 248
Irreducible Markov chain, 263, 266

J
Joint degree matrix (JDM), balanced realizations of, 334–340
    balanced JDM realization, 336
    degree spectrum, 335
    graphical realization, 334
    Hamming distance, 334
    pigeonhole rule, 334
    RSO Markov chain, 335

K
k-clique problem, 5
k-colorings of a graph, sampling and counting of, 353–355
Kirchhoff's matrix-tree theorem, extension of, 139
Knapsack problem, 322–323
Knudsen-Hein grammar, 81
Kronecker delta function, 105

L
Laplace expansion, 135
Laurent polynomial ring, 139
Lazy Markov chain, 271, 312, 328, 336, 350
Leaf-labeled trees, 137
Legal sequence, 101
Linear algebraic algorithms (the power of subtraction), 123–164
    alternating clow sequences, 130
    anti-arborescence, 139
    arborescences, 139
    BEST (de Bruijn-Ehrenfest-Smith-Tutte) algorithm, 139–145
    clow sequences, 123, 125
    commutative ring, 129, 137
    diagonal matrix, 138
    directed Eulerian graph, 141, 144
    directed graph, 124
    directed multigraph, 144
    division-free algorithms for calculating the determinant and Pfaffian, 124–133
    domino tilings, 150
    Eulerian circuit, 143
    exercises, 158–160
    Fibonacci number, 154
    FKT (Fisher-Kasteleyn-Temperley) algorithm, 145–154
    Gaussian elimination, 124, 135
    head of the walk, 125
    homomorph images, 154
    in-tree, 140
    inverse mapping, 148
    Kirchhoff's matrix-tree theorem, 133–139
    Laplace expansion, 135
    Laurent polynomial ring, 139
    leaf-labeled trees, 137
    oriented even cycle covering, 147
    Pfaffian orientation, 146
    pigeonhole rule, 142
    planar graph, 145
    Samuelson-Berkowitz algorithm, 124
    solutions to exercises, 160–164
    spanning tree, 134, 155
    subtraction, power of, 154–157
Longest common subsequence, 67

M
Markov chain Monte Carlo, 263–267
    convergence of distributions, 264
    Dyck words, 265–266
    irreducible Markov chain, 263, 266
    Markov graph, 263
    Metropolis-Hastings algorithm, 266
    random walk on the states, 263
    reversible Markov chain, 264
    state space, 263
    transition probabilities, 263
Markov chains, mixing of (and their applications in theory of counting and sampling), 273–319
    approximation factor, 309
    Burnside process, 312
    canonical paths and multicommodity flow, 291–298
    capacity of subset, 278
    caterpillar binary tree, 317
    Cheeger's inequalities and the isoperimetric inequality, 278–283
    Chernoff's inequality, 300, 306
    children, 303
    conductance of a Markov chain, 278
    coupling of Markov chains, 298–300
    dichotomy theory on the approximability of self-reducible counting problems, 307–311
    direct product spaces, mixing of Markov chains on, 300–301
    distinguished path technique, 311
    ergodic flow of subset, 278
    exercises, 313–316
    factorized state spaces, mixing of Markov chains on, 283–291
    hypercube, 282
    Jerrum-Valiant-Vazirani theorem, 302–306
    lazy Markov chain technique, 312
    logarithmic Sobolev constant, 313
    multicommodity flow, 294–295, 297
    non-empty subset, 284
    open problems, 313
    pigeonhole rule, 305
    Poincaré coefficient, 292
    projection Markov chain, 290
    relaxation time and second-largest eigenvalue, 274–277
    self-reducible counting problems, 301–311
    solutions to exercises, 317–319
    state space, 279, 283, 301
    Stirling formula, 296
    techniques to prove rapid mixing of Markov chains, 277–301
    transition matrix, 301
Markov chains, sampling with, 326–355
    adjacency graph, 342
    balanced JDM realization, 336
    balanced realizations of a JDM, 334–340
    bipartite degree sequences, sampling realizations of, 330–334
    canonical path method (rapid mixing), 329
    Cheeger inequality, 327
    chromosomes, 341
    counting of most parsimonious DCJ scenarios, 340–353
    counting (not necessarily perfect) matchings of a graph, 328–330
    coupling of Markov chain, 354
    double cut and join operation, 341
    genome, 343
    half regular bipartite degree sequences, 332
    Hamming distance, 334
    k-colorings of a graph, sampling and counting of, 353–355
    labeled union, 337
    lazy Markov chain, 326, 328, 350
    linear extensions of posets, 326–327
    Metropolis-Hastings Markov chain, 352
    Most Parsimonious DCJ scenario problem, 342
    pigeonhole rule, 334
    RSO Markov chain, 335
    swap Markov chain, 339
    swap operation, 330
    sweeping process, 331
    telomeres, 340
    transitions of Markov chain, 326
    tree degree sequences, 324
    trivial components, 342
Matchgates, 218
Matching polynomial, 200, 203
Matrix multiplication, 90
Median sequences, 196
Metropolis-Hastings algorithm, 266, 350
Metropolis-Hastings Markov chain, 352
#MinDegree1-2Trees-DegreePacking, 323
Money change problem, 48–51
Monoid semiring, 52
Monotone computations, 25
Most Parsimonious DCJ (MPDCJ) scenario problem, 342

N
Non-terminal characters, 60
Not-all-equal clause, 234
NP-complete problems, 6, 9, 24, 241

O
Optimization problems, 1, 2, 358
    MPDCJ, 342
    NP-hard, 86
    #P-complete, 211
    solvable with algebraic dynamic programming, 59
    SP-Tree problem, 174
Oracle, 12

P
Palindromic supersequence, 117
Papadimitriou's theorem, 16
Parse tree, 259
Parsimonious reduction, 166
Partially ordered set, 10
Partition polynomial, 44, 49
Pascal's triangle, 37
#P-complete counting problems, 165–216
    ambiguous grammar, 187
    approximation-preserving #P-complete proofs, 167–186
    approximation-preserving reductions, 166
    #BIS-complete problems, 211
    blown-up subtree, 181
    Boolean formula, 210
    BPP algorithm, 185
    calculating the permanent of arbitrary matrix, 170–174
    Chebyshev functions, 191
    cherry motif, 180
    Chinese remainder theorem, 192
    computing the permanent of non-negative matrix and counting perfect matchings in bipartite graphs, 188–190
    counting the linear extensions of a poset, 191–195
    counting the most parsimonious substitution histories on evolutionary tree, 174–184
    counting the most parsimonious substitution histories on star tree, 195–199
    counting the (not necessarily perfect) matchings of bipartite graph, 190–191
    counting the (not necessarily perfect) matchings in planar graph, 199–204
    counting the sequences of a given length that a regular grammar can generate, 187–188
    counting the subtrees of a graph, 204–206
    DCJ scenarios, 211–212
    #DNF, #3DNF, 186
    elementary subtrees, 176, 179
    exercises, 213–214
    external edges, 205
    external vertices, 204
    further results, 208–211
    gadget components, 201
    Hamiltonian path, 205
    Hamming distance, 173, 197
    interchanges, 171, 172
    internal vertices, 204–205
    #IS and #Mon-2SAT, 184–186
    matching polynomial, 200, 203
    median sequences, 196
    number of Eulerian orientations in Eulerian graph, 207–208
    open problems, 211–212
    parsimonious reduction, 166
    #P-complete proofs not preserving the relative error, 186–208
    phylogenies (perfect), evolutionary scenarios in, 212
    polynomial reductions, 166, 167
    reversal scenarios, 212
    rewriting rule, 187
    slotted space, 210
    solutions to exercises, 214–216
    #3SAT, 167–170
    unit subtree, 176, 181
Pfaffian, calculation of, 124–133
Pfaffian orientation, 25
    adjacency matrix and, 152
    domino tilings, 151
    holographic algorithms, 227
    orientation, planar graph, 146
Phylogenies (perfect), evolutionary scenarios in, 212
Pigeonhole rule, 142, 305, 334
Planar graph, 145
    counting the (not necessarily perfect) matchings in, 199–204
    edge-weighted bipartite, 224
    Pfaffian orientation, 146, 153
    planar embedding, 148
Poincaré coefficient, 292
Polynomial monotone circuit, 262
Polynomial reductions, 166, 167
Polynomial time, counting problems solvable in, 25
Poset, counting the linear extensions of, 191–195
Projection Markov chain, 290
Pseudo-knot free secondary structure, 80

Q
Quasi-polynomial function, 358

R
Random generations, methods of, 247–271
    acceptance probability, 270
    Bayes theorem on conditional probabilities, 250
    biased cube, 255
    Catalan number, 269
    Chomsky Normal Form grammar, 259
    convergence of distributions, 264
    count tree, 261
    directed acyclic graph, 269
    Dyck words, 265–266
    enveloping constant, 249
    exercises, 267–268
    generating random numbers, 248–249
    granulation function, 261
    importance sampling, 253–256
    inversion method, 248
    irreducible Markov chain, 263, 266
    lazy Markov chain technique, 271
    Markov chain Monte Carlo, 263–267
    Markov graph, 263
    Metropolis-Hastings algorithm, 266
    parse tree, 259
    polynomial monotone circuit, 262
    polynomial time, 258, 260
    random walk on the states, 263
    recursive algorithm, 257
    rejection sampling, 249–253
    reversible Markov chain, 264
    rewriting rule, 259
    sampling with algebraic dynamic programming, 256–260
    sampling distribution, 249
    sampling self-reducible objects, 260–263
    sequential importance sampling, 256
    solutions to exercises, 269–271
    state space, 263
    target distribution, 249
    transition probabilities, 263
Random walk on the states, 64
Read-twice formula, 236
Recognizer matchgates, 221, 230, 232
Rejection sampling, 249–253, 322–326
    Bayes theorem on conditional probabilities, 250
    edge-disjoint tree realizations without common internal vertices, 323–326
    enveloping constant, 249
    #Knapsack, 322–323
    sampling distribution, 249
    target distribution, 249
Restricted swap operation (RSO), 335
RNA sequence, 80–81
RP (Randomized Polynomial time) decision problem, 15

S
Sampling, computational complexity of, 245
    approximable counting and sampling problems, 321–361
    Markov chains, mixing of (and their applications in theory of counting and sampling), 273–319
    random generations, methods of, 247–271
Samuelson-Berkowitz algorithm, 124
Sankoff-Rousseau algorithm, 89
Satisfiability problem, 6
Self-reducible counting problems, 22
Semiring, 46
    Boolean, 61
    dual tropical, 50
    homomorphism, 86
    monoid, 52
    tropical, 55
Sequential importance sampling, 256
Shortest common supersequence, 67
Shortest path problems, 85
Slotted space, 210
Solutions to exercises, 360–361
    algebraic dynamic programming and monotone computations, 99–122
    background on computational complexity, 29–31
    holographic algorithms, 243–244
    linear algebraic algorithms (the power of subtraction), 160–164
    Markov chains, mixing of (and their applications in theory of counting and sampling), 317–319
    #P-complete counting problems, 214–216
    random generations, methods of, 269–271
Sorting DCJ, 343
Spanning tree, 91, 134, 155
Star tree, counting the most parsimonious substitution histories on, 195–199
Statistics evaluation algebra, 53
Stochastic regular grammar, 63
Subexponential function, 3
Subset sum problem, 3
Substitution-free alignments, 66
Subtraction, see Linear algebraic algorithms (the power of subtraction)
Superpolynomial function, 3
Swap Markov chain, 339, 357
Swap operation, 330
Sweeping process, 331

T
Telomeres, 340
Terminal characters, 60
Trees, 88–89
    binary search tree, 96, 114
    caterpillar binary, 317
    count, 261
    degree sequences, 324
    evolutionary, 174–184
    leaf-labeled, 137
    parse, 259
    realizations, edge-disjoint, 323–326
    spanning, 91, 134, 155
    star, 195–199
Triangulation, 108
Tropical semiring, 55

U
Unambiguous grammar, 60, 101
Unit subtree, 176, 181
Universal gas constant, 83

V
Viterbi algorithm, 65

X
#X-matchings (holographic algorithms), 224–227

Y
Yield algebra, 36, 38, see also Algebraic dynamic programming and monotone computations
    for clow sequences, 128
    linear algebraic algorithms, 155
    walks on directed graphs, 86

Z
Zig-zag sequence of numbers, 97
