
TensorFlow Quantum:

A Software Framework for Quantum Machine Learning


Michael Broughton,1,9,∗ Guillaume Verdon,1,2,8,10,† Trevor McCourt,1,11 Antonio J. Martinez,1,8,12
Jae Hyeon Yoo,3 Sergei V. Isakov,4 Philip Massey,5 Murphy Yuezhen Niu,1 Ramin Halavati,6
Evan Peters,8,10,13 Martin Leib,14 Andrea Skolik,14,15,16,17 Michael Streif,14,16,17,18 David Von Dollen,19
Jarrod R. McClean,1 Sergio Boixo,1 Dave Bacon,7 Alan K. Ho,1 Hartmut Neven,1 and Masoud Mohseni1,‡

1 Google Research, Venice, CA 90291
2 X, Mountain View, CA 94043
3 Google Research, Seoul, South Korea
4 Google Research, Zurich, Switzerland
5 Google, New York, NY
6 Google, Munich, Germany
7 Google Research, Seattle, WA 98103
8 Institute for Quantum Computing, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
9 School of Computer Science, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
10 Department of Applied Mathematics, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
11 Department of Mechanical & Mechatronics Engineering, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
12 Department of Physics & Astronomy, University of Waterloo, Waterloo, Ontario, N2L 3G1, Canada
13 Fermi National Accelerator Laboratory, P.O. Box 500, Batavia, IL, 60510
14 Data:Lab, Volkswagen Group, Ungererstr. 69, 80805 München, Germany
15 Ludwig Maximilian University, 80539 München, Germany
16 Quantum Artificial Intelligence Laboratory, NASA Ames Research Center (QuAIL)
17 USRA Research Institute for Advanced Computer Science (RIACS)
18 University Erlangen-Nürnberg (FAU), Institute of Theoretical Physics, Staudtstr. 7, 91058 Erlangen, Germany
19 Volkswagen Group Advanced Technologies, San Francisco, CA 94108

∗ mbbrough@google.com
† gverdon@google.com
‡ mohseni@google.com

(Dated: March 9, 2020)
arXiv:2003.02989v1 [quant-ph] 6 Mar 2020
We introduce TensorFlow Quantum (TFQ), an open source library for the rapid prototyping
of hybrid quantum-classical models for classical or quantum data. This framework offers high-level
abstractions for the design and training of both discriminative and generative quantum models under
TensorFlow and supports high-performance quantum circuit simulators. We provide an overview
of the software architecture and building blocks through several examples and review the theory
of hybrid quantum-classical neural networks. We illustrate TFQ functionalities via several basic
applications including supervised learning for quantum classification, quantum control, and quantum
approximate optimization. Moreover, we demonstrate how one can apply TFQ to tackle advanced
quantum learning tasks including meta-learning, Hamiltonian learning, and sampling thermal states.
We hope this framework provides the necessary tools for the quantum computing and machine
learning research communities to explore models of both natural and artificial quantum systems,
and ultimately discover new quantum algorithms which could potentially yield a quantum advantage.

CONTENTS

I. Introduction
   A. Quantum Machine Learning
   B. Hybrid Quantum-Classical Models
   C. Quantum Data
   D. TensorFlow Quantum

II. Software Architecture & Building Blocks
   A. Cirq
   B. TensorFlow
   C. Technical Hurdles in Combining Cirq with TensorFlow
   D. TFQ architecture
      1. Design Principles and Overview
      2. The Abstract TFQ Pipeline for a specific hybrid discriminator model
      3. Hello Many-Worlds: Binary Classifier for Quantum Data
   E. TFQ Building Blocks
      1. Quantum Computations as Tensors
      2. Composing Quantum Models
      3. Sampling and Expectation Values
      4. Differentiating Quantum Circuits
      5. Simplified Layers
   F. Quantum Circuit Simulation with qsim
      1. Comment on the Simulation of Quantum Circuits
      2. Gate Fusion with qsim
      3. Hardware-Specific Instruction Sets
      4. Benchmarks

III. Theory of Hybrid Quantum-Classical Machine Learning
   A. Quantum Neural Networks
   B. Sampling and Expectations
   C. Gradients of Quantum Neural Networks
      1. Finite difference methods
      2. Parameter shift methods
      3. Stochastic Parameter Shift Gradient Estimation
   D. Hybrid Quantum-Classical Computational Graphs
      1. Hybrid Quantum-Classical Neural Networks
   E. Autodifferentiation through hybrid quantum-classical backpropagation

IV. Basic Quantum Applications
   A. Hybrid Quantum-Classical Convolutional Neural Network Classifier
      1. Background
      2. Implementations
   B. Hybrid Machine Learning for Quantum Control
      1. Time-Constant Hamiltonian Control
      2. Time-dependent Hamiltonian Control
   C. Quantum Approximate Optimization
      1. Background
      2. Implementation

V. Advanced Quantum Applications
   A. Meta-learning for Variational Quantum Optimization
   B. Vanishing Gradients and Adaptive Layerwise Training Strategies
      1. Random Quantum Circuits and Barren Plateaus
      2. Layerwise quantum circuit learning
   C. Hamiltonian Learning with Quantum Graph Recurrent Neural Networks
      1. Motivation: Learning Quantum Dynamics with a Quantum Representation
      2. Implementation
   D. Generative Modelling of Quantum Mixed States with Hybrid Quantum-Probabilistic Models
      1. Background
      2. Variational Quantum Thermalizer
      3. Quantum Generative Learning from Quantum Data

VI. Closing Remarks

VII. Acknowledgements

References


I. INTRODUCTION

A. Quantum Machine Learning

Machine learning (ML) is the construction of algorithms and statistical models which can extract information hidden within a dataset. By learning a model from a dataset, one then has the ability to make predictions on unseen data from the same underlying probability distribution. For several decades, research in machine learning was focused on models that can provide theoretical guarantees for their performance [1–4]. However, in recent years, methods based on heuristics have become dominant, partly due to an abundance of data and computational resources [5].

Deep learning is one such heuristic method which has seen great success [6, 7]. Deep learning methods are based on learning a representation of the dataset in the form of networks of parameterized layers. These parameters are then tuned by minimizing a function of the model outputs, called the loss function. This function quantifies the fit of the model to the dataset.

In parallel to the recent advances in deep learning, there has been a significant growth of interest in quantum computing in both academia and industry [8]. Quantum computing is the use of engineered quantum systems to perform computations. Quantum systems are described by a generalization of probability theory allowing novel behavior such as superposition and entanglement, which are generally difficult to simulate with a classical computer [9]. The main motivation to build a quantum computer is to access efficient simulation of these uniquely quantum mechanical behaviors. Quantum computers could one day accelerate computations for chemical and materials development [10], decryption [11], optimization [12], and many other tasks. Google's recent achievement of quantum supremacy [13] marked the first glimpse of this promised power.

How may one apply quantum computing to practical tasks? One area of research that has attracted considerable interest is the design of machine learning algorithms that inherently rely on quantum properties to accelerate their performance. One key observation that has led to the application of quantum computers to machine learning is their ability to perform fast linear algebra on a state space that grows exponentially with the number of qubits. These quantum accelerated linear-algebra based techniques for machine learning can be considered the first generation of quantum machine learning (QML) algorithms, tackling a wide range of applications in both supervised and unsupervised learning, including principal component analysis [14], support vector machines [15], k-means clustering [16], and recommendation systems [17]. These algorithms often admit exponentially faster solutions compared to their classical counterparts on certain types of quantum data. This has led to a significant surge of interest in the subject [18]. However, to apply these algorithms to classical data, the data must first be embedded into quantum states [19], a process whose scalability is under debate [20]. Additionally, there is a scaling variation when these algorithms are applied to classical data, mostly rendering the quantum advantage polynomial [21]. Continuing debates around speedups and assumptions make it prudent to look beyond classical data for applications of quantum computation to machine learning.

With the availability of Noisy Intermediate-Scale Quantum (NISQ) processors [22], the second generation of QML has emerged [8, 12, 18, 23–43]. In contrast to the first generation, this new trend in QML is based on heuristic methods which can be studied empirically due to the increased computational capability of quantum hardware. This is reminiscent of how machine learning evolved towards deep learning with the advent of new computational capabilities [44]. These new algorithms use parameterized quantum transformations called parameterized quantum circuits (PQCs) or Quantum Neural Networks (QNNs) [23, 42]. In analogy with classical deep learning, the parameters of a QNN are then optimized with respect to a cost function via either black-box optimization heuristics [45] or gradient-based methods [46], in order to learn a representation of the training data. In this paradigm, quantum machine learning is the development of models, training strategies, and inference schemes built on parameterized quantum circuits.

B. Hybrid Quantum-Classical Models

Near-term quantum processors are still fairly small and noisy, and thus quantum models cannot disentangle and generalize quantum data using quantum processors alone. NISQ processors will need to work in concert with classical co-processors to become effective. We anticipate that investigations into various possible hybrid quantum-classical machine learning algorithms will be a productive area of research and that quantum computers will be most useful as hardware accelerators, working in symbiosis with traditional computers. In order to understand the power and limitations of classical deep learning methods, and how they could possibly be improved by incorporating parameterized quantum circuits, it is worth defining key indicators of learning performance:

Representation capacity: the model architecture has the capacity to accurately replicate, or extract useful information from, the underlying correlations in the training data for some value of the model's parameters.

Training efficiency: minimizing the cost function via stochastic optimization heuristics should converge to an approximate minimum of the loss function in a reasonable number of iterations.

Inference tractability: the ability to run inference on a given model in a scalable fashion is needed in order to make predictions in the training or test phase.

Generalization power: the cost function for a given model should yield a landscape where typically initialized and trained networks find approximate solutions which generalize well to unseen data.

In principle, any or all combinations of these attributes could be susceptible to possible improvements by quantum computation. There are many ways to combine classical and quantum computations. One well-known method is to use classical computers as outer-loop optimizers for QNNs. When training a QNN with a classical optimizer in a quantum-classical loop, the overall algorithm is sometimes referred to as a Variational Quantum-Classical Algorithm. Some recently proposed architectures of QNN-based variational quantum-classical algorithms include Variational Quantum Eigensolvers (VQEs) [28, 47], Quantum Approximate Optimization Algorithms (QAOAs) [12, 27, 48, 49], Quantum Neural Networks (QNNs) for classification [50, 51], Quantum Convolutional Neural Networks (QCNNs) [52], and Quantum Generative Models [53]. Generally, the goal is to optimize over a parameterized class of computations to either generate a certain low-energy wavefunction (VQE/QAOA), learn to extract non-local information (QNN classifiers), or learn how to generate a quantum distribution from data (generative models). It is important to note that in the standard model architecture for these applications, the representation typically resides entirely on the quantum processor, with classical heuristics participating only as optimizers for the tunable parameters of the quantum model. Various forms of gradient descent are the most popular optimization heuristics, but an obstacle to the use of gradient descent is the effect of barren plateaus [51], which generally arises when a network lacking structure is randomly initialized. Strategies for overcoming these issues are discussed in detail in section V B.

While the use of classical processors as outer-loop optimizers for quantum neural networks is promising, the reality is that near-term quantum devices are still fairly noisy, thus limiting the depth of quantum circuits achievable with acceptable fidelity. This motivates allowing as much of the model as possible to reside on classical hardware. Several applications of quantum computation have ventured beyond the scope of typical variational quantum algorithms to explore this combination. Instead of training a purely quantum model via a classical optimizer, one then considers scenarios where the model itself is a hybrid between quantum computational building blocks and classical computational building blocks [54–57] and is trained typically via gradient-based methods. Such scenarios leverage a new form of automatic differentiation that allows the backwards propagation of gradients in between parameterized quantum and classical computations. The theory of such hybrid backpropagation will be covered in section III C.

In summary, a hybrid quantum-classical model is a learning heuristic in which both the classical and quantum processors contribute to the indicators of learning performance defined above.

C. Quantum Data

Although it is not yet proven that heuristic QML could provide a speedup on practical classical ML applications, there is some evidence that hybrid quantum-classical machine learning applications on "quantum data" could provide a quantum advantage over classical-only machine learning for reasons described below. Abstractly, any data emerging from an underlying quantum mechanical process can be considered quantum data. This can be the classical data resulting from quantum mechanical experiments, or data which is directly generated by a quantum device and then fed into an algorithm as input. A quantum or hybrid quantum-classical model will be at least partially represented by a quantum device, and therefore have the inherent capacity to capture the characteristics of a quantum mechanical process. Concretely, we list practical examples of classes of quantum data, which can be routinely generated or simulated on existing quantum devices or processors:

Quantum simulations: These can include output states of quantum chemistry simulations used to extract information about chemical structures and chemical reactions. Potential applications include material science, computational chemistry, computational biology, and drug discovery. Another example is data from quantum many-body systems and quantum critical systems in condensed matter physics, which could be used to model and design exotic states of matter which exhibit many-body quantum effects.

Quantum communication networks: Machine learning in this class of systems will be related to distilling small-scale quantum data; e.g., to discriminate among non-orthogonal quantum states [42, 58], with application to design and construction of quantum error correcting codes for quantum repeaters, quantum receivers, and purification units.

Quantum metrology: Quantum-enhanced high precision measurements such as quantum sensing and quantum imaging are inherently done on probes that are small-scale quantum devices and could be designed or improved by variational quantum models.

Quantum control: Variationally learning hybrid quantum-classical algorithms can lead to new optimal open or closed-loop control [59], calibration, and error mitigation, correction, and verification strategies [60] for near-term quantum devices and quantum processors.

Of course, this is not a comprehensive list of quantum data. We hope that, with proper software tooling, researchers will be able to find applications of QML in all of the above areas and other categories of applications beyond what we can currently envision.

D. TensorFlow Quantum

Today, exploring new hybrid quantum-classical models is a difficult and error-prone task. The engineering effort required to manually construct such models, develop quantum datasets, and set up training and validation stages decreases a researcher's ability to iterate and discover. TensorFlow has accelerated the research and understanding of deep learning in part by automating common model building tasks. Development of software tooling for hybrid quantum-classical models should similarly accelerate research and understanding for quantum machine learning.

To develop such tooling, the requirement of accommodating a heterogeneous computational environment involving both classical and quantum processors is key. This computational heterogeneity suggested the need to expand TensorFlow, which is designed to distribute computations across CPUs, GPUs, and TPUs [61], to also encompass quantum processing units (QPUs). This project has evolved into TensorFlow Quantum. TFQ is an integration of Cirq with TensorFlow that allows researchers and students to simulate QPUs while designing, training, and testing hybrid quantum-classical models, and eventually run the quantum portions of these models on actual quantum processors as they come online. A core contribution of TFQ is seamless backpropagation through combinations of classical and quantum layers in hybrid quantum-classical models. This allows QML researchers to directly harness the rich set of tools already available in TF and Keras.

The remainder of this document describes TFQ and a selection of applications demonstrating some of the challenges TFQ can help tackle. In section II, we introduce the software architecture of TFQ. We highlight its main features including batched circuit execution, automated expectation estimation, estimation of quantum gradients, hybrid quantum-classical automatic differentiation, and rapid model construction, all from within TensorFlow. We also present a simple "Hello, World" example for binary quantum data classification on a single qubit. By the end of section II, we expect most readers to have sufficient knowledge to begin development with TFQ. For readers who are interested in a more theoretical understanding of QNNs, we provide in section III an overview of QNN models and hybrid quantum-classical backpropagation. For researchers interested in applying TFQ to their own projects, we provide various applications in sections IV and V. In section IV, we describe hybrid quantum-classical CNNs for binary classification of quantum phases, hybrid quantum-classical ML for quantum control, and MaxCut QAOA. In the advanced applications section V, we describe meta-learning for quantum approximate optimization, discuss issues with vanishing gradients and how we can overcome them by adaptive layer-wise learning schemes, Hamiltonian learning with quantum graph networks, and quantum mixed state generation via classical energy-based models.

We hope that TFQ enables the machine learning and quantum computing communities to work together more closely on important challenges and opportunities in the near term and beyond.

II. SOFTWARE ARCHITECTURE & BUILDING BLOCKS

As stated in the introduction, the goal of TFQ is to bridge the quantum computing and machine learning communities. Google already has well-established products for these communities: Cirq, an open source library for invoking quantum circuits [62], and TensorFlow, an end-to-end open source machine learning platform [61]. However, the emerging community of quantum machine learning researchers requires the capabilities of both products. The prospect of combining Cirq and TensorFlow then naturally arises.

First, we review the capabilities of Cirq and TensorFlow. We confront the challenges that arise when one attempts to combine both products. These challenges inform the design goals relevant when building software specific to quantum machine learning. We provide an overview of the architecture of TFQ and describe a particular abstract pipeline for building a hybrid model for classification of quantum data. Then we illustrate this pipeline via the exposition of a minimal hybrid model which makes use of the core features of TFQ. We conclude with a description of our performant C++ simulator for quantum circuits and provide benchmarks of performance on two complementary classes of random and structured quantum circuits.

A. Cirq

Cirq is an open-source framework for invoking quantum circuits on near-term devices [62]. It contains the basic structures, such as qubits, gates, circuits, and measurement operators, that are required for specifying quantum computations. User-specified quantum computations can then be executed in simulation or on real hardware. Cirq also contains substantial machinery that helps users design efficient algorithms for NISQ machines, such as compilers and schedulers. Below we show example Cirq code for calculating the expectation value of Ẑ1 Ẑ2 for a Bell state:

(q1, q2) = cirq.GridQubit.rect(1, 2)
c = cirq.Circuit(cirq.H(q1), cirq.CNOT(q1, q2))
ZZ = cirq.Z(q1) * cirq.Z(q2)
bell = cirq.Simulator().simulate(c).final_state
expectation = ZZ.expectation_from_wavefunction(
    bell, dict(zip([q1, q2], [0, 1])))

Cirq uses SymPy [63] symbols to represent free parameters in gates and circuits. You replace free parameters in a circuit with specific numbers by passing a Cirq ParamResolver object with your circuit to the simulator. Below we construct a parameterized circuit and simulate the output state for θ = 1:

theta = sympy.Symbol('theta')
c = cirq.Circuit(cirq.Rx(theta).on(q1))
resolver = cirq.ParamResolver({theta: 1})
results = cirq.Simulator().simulate(c, resolver)

B. TensorFlow

TensorFlow is a language for describing computations as stateful dataflow graphs [61]. Describing machine learning models as dataflow graphs is advantageous for performance during training. First, it is easy to obtain gradients of dataflow graphs using backpropagation [64], allowing efficient parameter updates. Second, independent nodes of the computational graph may be distributed across independent machines, including GPUs and TPUs, and run in parallel. These computational advantages established TensorFlow as a powerful tool for machine learning and deep learning.

TensorFlow constructs this dataflow graph using tensors for the directed edges and operations (ops) for the nodes. For our purposes, a rank n tensor is simply an n-dimensional array. In TensorFlow, tensors are additionally associated with a data type, such as integer or string. Tensors are a convenient way of thinking about data; in machine learning, the first index is often reserved for iteration over the members of a dataset. Additional indices can indicate the application of several filters, e.g., in convolutional neural networks with several feature maps.

In general, an op is a function mapping input tensors to output tensors. Ops may act on zero or more input tensors, always producing at least one tensor as output. For example, the addition op ingests two tensors and outputs one tensor representing the elementwise sum of the inputs, while a constant op ingests no tensors, taking the role of a root node in the dataflow graph. The combination of ops and tensors gives the backend of TensorFlow the structure of a directed acyclic graph. A visualization of the backend structure corresponding to a simple computation in TensorFlow is given in Fig. 1.

Figure 1. A simple example of the TensorFlow computational model. Two tensor inputs A and B are added and then multiplied against a third tensor input C, before flowing on to further nodes in the graph. Blue nodes are tensor injections (ops), arrows are tensors flowing through the computational graph, and orange nodes are tensor transformations (ops). Tensor injections are ops in the sense that they are functions which take in zero tensors and output one tensor.
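To make the op-and-tensor picture concrete, here is a minimal sketch of the computation depicted in Fig. 1, written with standard TensorFlow 2 ops (the particular constant values are illustrative only and are not taken from the paper):

import tensorflow as tf

# Constant ops inject tensors into the graph: zero input tensors, one output tensor each.
A = tf.constant(2.0)
B = tf.constant(3.0)
C = tf.constant(4.0)

# The add and multiply ops each map their input tensors to a new output tensor,
# reproducing the (A + B) * C dataflow of Fig. 1.
result = tf.multiply(tf.add(A, B), C)
print(result)  # tf.Tensor(20.0, shape=(), dtype=float32)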

It is worth noting that this tensorial data format is not to be confused with Tensor Networks [65], which are a mathematical tool used in condensed matter physics and quantum information science to efficiently represent many-body quantum states and operations. Recently, libraries for building such Tensor Networks in TensorFlow have become available [66]; we refer the reader to the corresponding blog post for a better understanding of the difference between TensorFlow tensors and the tensor objects in Tensor Networks [67].

The recently announced TensorFlow 2 [68] takes the dataflow graph structure as a foundation and adds high-level abstractions. One new feature is the Python function decorator @tf.function, which automatically converts the decorated function into a graph computation. Also relevant is the native support for Keras [69], which provides the Layer and Model constructs. These abstractions allow the concise definition of machine learning models which ingest and process data, all backed by dataflow graph computation. The increasing levels of abstraction and heterogeneous hardware backing which together constitute the TensorFlow stack can be visualized with the orange and gray boxes in our stack diagram in Fig. 4. The combination of these high-level abstractions and efficient dataflow graph backend makes TensorFlow 2 an ideal platform for data-driven machine learning research.
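As a small, self-contained illustration of these TensorFlow 2 features (the function body and layer sizes below are arbitrary demonstration choices, not taken from TFQ):

import tensorflow as tf

# @tf.function traces the decorated Python function into a reusable dataflow graph computation.
@tf.function
def affine(x):
    return 2.0 * x + 1.0

y = affine(tf.constant(3.0))  # tf.Tensor(7.0, ...)

# The Keras Layer and Model constructs give concise model definitions on the same graph backend.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, activation="relu"),
    tf.keras.layers.Dense(1),
])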

C. Technical Hurdles in Combining Cirq with TensorFlow

There are many ways one could imagine combining the capabilities of Cirq and TensorFlow. One possible approach is to let graph edges represent quantum states and let ops represent transformations of the state, such as applying circuits and taking measurements. This approach can be called the "states-as-edges" architecture. We show in Fig. 2 how to reformulate the Bell state preparation and measurement discussed in section II A within this proposed architecture.

Figure 2. The states-as-edges approach to embedding quantum computation in TensorFlow. Blue nodes are input tensors, arrows are tensors flowing through the graph, and orange nodes are TF Ops transforming the simulated quantum state. Note that the above is not the architecture used in TFQ but rather an alternative which was considered; see Fig. 3 for the equivalent diagram for the true TFQ architecture.

This architecture may at first glance seem like an attractive option as it is a direct formulation of quantum computation as a dataflow graph. However, this approach is suboptimal for several reasons. First, in this architecture, the structure of the circuit being run is static in the computational graph, thus running a different circuit would require the graph to be rebuilt. This is far from ideal for variational quantum algorithms which learn over many iterations with a slightly modified quantum circuit at each iteration. A second problem is the lack of a clear way to embed such a quantum dataflow graph on a real quantum processor: the states would have to remain held in quantum memory on the quantum device itself, and the high latency between classical and quantum processors makes sending transformations one-by-one prohibitive. Lastly, one needs a way to specify gates and measurements within TF. One may be tempted to define these directly; however, Cirq already has the necessary tools and objects defined which are most relevant for the near-term quantum computing era. Duplicating Cirq functionality in TF would lead to several issues, requiring users to re-learn how to interface with quantum computers in TFQ versus Cirq, and adding to the maintenance overhead by needing to keep two separate quantum circuit construction frameworks up-to-date as new compilation techniques arise. These considerations motivate our core design principles.

D. TFQ architecture

1. Design Principles and Overview

To avoid the aforementioned technical hurdles and in order to satisfy the diverse needs of the research community, we have arrived at the following four design principles:

1. Differentiability. As described in the introduction, gradient-based methods leveraging autodifferentiation have become the leading heuristic for optimization of machine learning models. A software framework for QML must support differentiation of quantum circuits so that hybrid quantum-classical models can participate in backpropagation.

2. Circuit Batching. Learning on quantum data requires re-running parameterized model circuits on each quantum data point. A QML software framework must be optimized for running large numbers of such circuits. Ideally, the semantics should match established TensorFlow norms for batching over data.

3. Execution Backend Agnostic. Experimental quantum computing often involves reconciling perfectly simulated algorithms with the outputs of real, noisy devices. Thus, QML software must allow users to easily switch between running models in simulation and running models on real hardware, such that simulated results and experimental results can be directly compared.

4. Minimalism. Cirq provides an extensive set of tools for preparing quantum circuits. TensorFlow provides a very complete machine learning toolkit through its hundreds of ops and Keras high-level API, with a massive community of active users. Existing functionality in Cirq and TensorFlow should be used as much as possible.

TFQ should serve as a bridge between the two that does not require users to re-learn how to interface with quantum computers or re-learn how to solve problems using machine learning.

Figure 3. The TensorFlow graph generated to calculate the expectation value of a parameterized circuit. The symbol values can come from other TensorFlow ops, such as from the outputs of a classical neural network. The output can be passed on to other ops in the graph; here, for illustration, the output is passed to the absolute value op.

Figure 4. The software stack of TFQ, showing its interactions with TensorFlow, Cirq, and computational hardware. At the top of the stack is the data to be processed. Classical data is natively processed by TensorFlow; TFQ adds the ability to process quantum data, consisting of both quantum circuits and quantum operators. The next level down the stack is the Keras API in TensorFlow. Since a core principle of TFQ is native integration with core TensorFlow, in particular with Keras models and optimizers, this level spans the full width of the stack. Underneath the Keras model abstractions are our quantum layers and differentiators, which enable hybrid quantum-classical automatic differentiation when connected with classical TensorFlow layers. Underneath the layers and differentiators, we have TensorFlow ops, which instantiate the dataflow graph. Our custom ops control quantum circuit execution. The circuits can be run in simulation mode, by invoking qsim or Cirq, or eventually will be executed on QPU hardware.

First, we provide a bottom-up overview of TFQ to provide intuition on how the framework functions at a fundamental level. In TFQ, circuits and other quantum computing constructs are tensors, and converting these tensors into classical information via simulation or execution on a quantum device is done by ops. These tensors are created by converting Cirq objects to TensorFlow string tensors, using the tfq.convert_to_tensor function. This takes in a cirq.Circuit or cirq.PauliSum object and creates a string tensor representation. The cirq.Circuit objects may be parameterized by SymPy symbols.

These tensors are then converted to classical information via state simulation, expectation value calculation, or sampling. TFQ provides ops for each of these computations. The following code snippet shows how a simple parameterized circuit may be created using Cirq, and its Ẑ expectation evaluated at different parameter values using the tfq expectation value calculation op. We feed the output into the tf.math.abs op to show that tfq ops integrate natively with tf ops.

qubit = cirq.GridQubit(0, 0)
theta = sympy.Symbol('theta')
c = cirq.Circuit(cirq.X(qubit) ** theta)
c_tensor = tfq.convert_to_tensor([c] * 3)
theta_values = tf.constant([[0], [1], [2]])
m = cirq.Z(qubit)
paulis = tfq.convert_to_tensor([m] * 3)
expectation_op = tfq.get_expectation_op()
output = expectation_op(
    c_tensor, ['theta'], theta_values, paulis)
abs_output = tf.math.abs(output)

We supply the expectation op with a tensor of parameterized circuits, a list of symbols contained in the circuits, a tensor of values to use for those symbols, and tensor operators to measure with respect to. Given this, it outputs a tensor of expectation values. The graph this code generates is given by Fig. 3.

The expectation op is capable of running circuits on a simulated backend, which can be a Cirq simulator or our native TFQ simulator qsim (described in detail in section II F), or on a real device. This is configured on instantiation.

The expectation op is fully differentiable. Given that there are many ways to calculate the gradient of a quantum circuit with respect to its input parameters, TFQ allows expectation ops to be configured with one of many built-in differentiation methods using the tfq.Differentiator interface, such as finite differencing, parameter shift rules, and various stochastic methods. The tfq.Differentiator interface also allows users to define their own gradient calculation methods for their specific problem if they desire.

The tensor representation of circuits and Paulis along with the execution ops are all that are required to solve any problem in QML. However, as a convenience, TFQ provides an additional op for in-graph circuit construction. This was found to be convenient when solving problems where most of the circuit being run is static and only a small part of it is being changed during training or inference. This functionality is provided by the tfq.tfq_append_circuit op. It is expected that all but the most dedicated users will never touch these low-level ops, and instead will interface with TFQ using our tf.keras.layers that provide a simplified interface.

The tools provided by TFQ can interact with both core TensorFlow and, via Cirq, real quantum hardware.

Figure 5. Abstract pipeline for inference and training of a hybrid discriminative model in TFQ. Here, Φ represents the quantum model parameters and θ represents the classical model parameters.

The functionality of all three software products and the interfaces between them can be visualized with the help of a "software-stack" diagram, shown in Fig. 4.

In the next section, we describe an example of an abstract quantum machine learning pipeline for a hybrid discriminator model that TFQ was designed to support. Then we illustrate the TFQ pipeline via a Hello Many-Worlds example, which involves building the simplest possible hybrid quantum-classical model for a binary classification task on a single qubit. More detailed information on the building blocks of TFQ features will be given in section II E.

2. The Abstract TFQ Pipeline for a specific hybrid discriminator model

Here, we provide a high-level abstract overview of the computational steps involved in the end-to-end pipeline for inference and training of a hybrid quantum-classical discriminative model for quantum data in TFQ.

(1) Prepare Quantum Dataset: In general, this might come from a given black-box source. However, as current quantum computers cannot import quantum data from external sources, the user has to specify quantum circuits which generate the data. Quantum datasets are prepared using unparameterized cirq.Circuit objects and are injected into the computational graph using tfq.convert_to_tensor.

(2) Evaluate Quantum Model: Parameterized quantum models can be selected from several categories based on knowledge of the quantum data's structure. The goal of the model is to perform a quantum computation in order to extract information hidden in a quantum subspace or subsystem. In the case of discriminative learning, this information is the hidden label parameters. To extract a quantum non-local subsystem, the quantum model disentangles the input data, leaving the hidden information encoded in classical correlations, thus making it accessible to local measurements and classical post-processing. Quantum models are constructed using cirq.Circuit objects containing SymPy symbols, and can be attached to quantum data sources using the tfq.AddCircuit layer.

(3) Sample or Average: Measurement of quantum states extracts classical information, in the form of samples from a classical random variable. The distribution of values from this random variable generally depends on both the quantum state itself and the measured observable. As many variational algorithms depend on mean values of measurements, TFQ provides methods for averaging over several runs involving steps (1) and (2). Sampling or averaging are performed by feeding quantum data and quantum models to the tfq.Sample or tfq.Expectation layers.

(4) Evaluate Classical Model: Once classical information has been extracted, it is in a format amenable to further classical post-processing. As the extracted information may still be encoded in classical correlations between measured expectations, classical deep neural networks can be applied to distill such correlations. Since TFQ is fully compatible with core TensorFlow, quantum models can be attached directly to classical tf.keras.layers.Layer objects such as tf.keras.layers.Dense.

(5) Evaluate Cost Function: Given the results of classical post-processing, a cost function is calculated. This may be based on the accuracy of classification if the quantum data was labeled, or other criteria if the task is unsupervised. Wrapping the model built in stages (1) through (4) inside a tf.keras.Model gives the user access to all the losses in the tf.keras.losses module.

(6) Evaluate Gradients & Update Parameters: After evaluating the cost function, the free parameters in the pipeline are updated in a direction expected to decrease the cost. This is most commonly performed via gradient descent. To support gradient descent, TFQ exposes derivatives of quantum operations to the TensorFlow backpropagation machinery via the tfq.differentiators.Differentiator interface. This allows both the quantum and classical models' parameters to be optimized against quantum data via hybrid quantum-classical backpropagation. See section III for details on the theory.

In the next section, we illustrate this abstract pipeline by applying it to a specific example. While simple, the example is the minimum instance of a hybrid quantum-classical model operating on quantum data.

3. Hello Many-Worlds: Binary Classifier for Quantum Data

Binary classification is a basic task in machine learning that can be applied to quantum data as well.

Figure 6. Quantum data represented on the Bloch sphere. States in category a are blue, while states in category b are orange. The vectors are the states around which the samples were taken. The parameters used to generate this data are: θa = 1, θb = 4, and N = 200.

Figure 7. (1) Quantum data to be classified. (2) Parameterized rotation gate, whose job is to remove superpositions in the quantum data. (3) Measurement along the Z axis of the Bloch sphere converts the quantum data into classical data. (4) Classical post-processing is a two-output SoftMax layer, which outputs probabilities for the data to come from category a or category b. (5) Categorical cross entropy is computed between the predictions and the labels. The Adam optimizer [70] is used to update both the quantum and classical portions of the hybrid model.

As a minimal example of a hybrid quantum-classical model, we present here a binary classifier for regions on a single qubit. In this task, two random vectors in the X-Z plane of the Bloch sphere are chosen. Around these two vectors, we randomly sample two sets of quantum data points; the task is to learn to distinguish the two sets. An example quantum dataset of this type is shown in Fig. 6. The following can all be run in-browser by navigating to the Colab example notebook at

research/binary classifier/binary classifier.ipynb

Additionally, the code in this example can be copy-pasted into a python script after installing TFQ.

To solve this problem, we use the pipeline shown in Fig. 5, specialized to one-qubit binary classification. This specialization is shown in Fig. 7.

The first step is to generate the quantum data. We can use Cirq for this task. The common imports required for working with TFQ are shown below:

import cirq, random, sympy
import numpy as np
import tensorflow as tf
import tensorflow_quantum as tfq

The function below generates the quantum dataset; labels use a one-hot encoding:

def generate_dataset(
        qubit, theta_a, theta_b, num_samples):
    q_data = []
    labels = []
    blob_size = abs(theta_a - theta_b) / 5
    for _ in range(num_samples):
        coin = random.random()
        spread_x, spread_y = np.random.uniform(
            -blob_size, blob_size, 2)
        if coin < 0.5:
            label = [1, 0]
            angle = theta_a + spread_y
        else:
            label = [0, 1]
            angle = theta_b + spread_y
        labels.append(label)
        q_data.append(cirq.Circuit(
            cirq.Ry(-angle)(qubit),
            cirq.Rx(-spread_x)(qubit)))
    return (tfq.convert_to_tensor(q_data),
            np.array(labels))

We can generate a dataset and the associated labels after picking some parameter values:

qubit = cirq.GridQubit(0, 0)
theta_a = 1
theta_b = 4
num_samples = 200
q_data, labels = generate_dataset(
    qubit, theta_a, theta_b, num_samples)

As our quantum parametric model, we use the simplest case of a universal quantum discriminator [42, 60], a single parameterized rotation (linear) and measurement along the Z axis (non-linear):

theta = sympy.Symbol('theta')
q_model = cirq.Circuit(cirq.Ry(theta)(qubit))
q_data_input = tf.keras.Input(
    shape=(), dtype=tf.dtypes.string)
expectation = tfq.layers.PQC(
    q_model, cirq.Z(qubit))
expectation_output = expectation(q_data_input)

The purpose of the rotation gate is to minimize the superposition from the input quantum data such that we can get maximum useful information from the measurement. This quantum model is then attached to a small classifier NN to complete our hybrid model. Notice in the code below that quantum layers can appear among classical layers inside a standard Keras model:

classifier = tf.keras.layers.Dense(
    2, activation=tf.keras.activations.softmax)
classifier_output = classifier(
    expectation_output)
model = tf.keras.Model(inputs=q_data_input,
                       outputs=classifier_output)

We can train this hybrid model on the quantum data defined earlier. Below we use as our loss function the cross entropy between the labels and the predictions of the classical NN; the ADAM optimizer is chosen for parameter updates.

optimizer = tf.keras.optimizers.Adam(
    learning_rate=0.1)
loss = tf.keras.losses.CategoricalCrossentropy()
model.compile(optimizer=optimizer, loss=loss)
history = model.fit(
    x=q_data, y=labels, epochs=50)

Finally, we can use our trained hybrid model to classify new quantum datapoints:

test_data, _ = generate_dataset(
    qubit, theta_a, theta_b, 1)
p = model.predict(test_data)[0]
print(f"prob(a)={p[0]:.4f}, prob(b)={p[1]:.4f}")

This section provided a rapid introduction to just that code needed to complete the task at hand. The following section reviews the features of TFQ in a more API reference inspired style.

E. TFQ Building Blocks

Having provided a minimum working example in the previous section, we now seek to provide more details about the components of the TFQ framework. First, we describe how quantum computations specified in Cirq are converted to tensors for use inside the TensorFlow graph. Then, we describe how these tensors can be combined in-graph to yield larger models. Next, we show how circuits are simulated and measured in TFQ. The core functionality of the framework, differentiation of quantum circuits, is then explored. Finally, we describe our more abstract layers, which can be used to simplify many QML workflows.

1. Quantum Computations as Tensors

As pointed out in section II A, Cirq already contains the language necessary to express quantum computations, parameterized circuits, and measurements. Guided by principle 4, TFQ should allow direct injection of Cirq expressions into the computational graph of TensorFlow. This is enabled by the tfq.convert_to_tensor function. We saw the use of this function in the quantum binary classifier, where a list of data generation circuits specified in Cirq was wrapped in this function to promote them to tensors. Below we show how a quantum data point, a quantum model, and a quantum measurement can be converted into tensors:

q0 = cirq.GridQubit(0, 0)
q_data_raw = cirq.Circuit(cirq.H(q0))
q_data = tfq.convert_to_tensor([q_data_raw])

theta = sympy.Symbol('theta')
q_model_raw = cirq.Circuit(
    cirq.Ry(theta).on(q0))
q_model = tfq.convert_to_tensor([q_model_raw])

q_measure_raw = 0.5 * cirq.Z(q0)
q_measure = tfq.convert_to_tensor(
    [q_measure_raw])

This conversion is backed by our custom serializers. Once a Circuit or PauliSum is serialized, it becomes a tensor of type tf.string. This is the reason for the use of tf.keras.Input(shape=(), dtype=tf.dtypes.string) when creating inputs to Keras models, as seen in the quantum binary classifier example.

2. Composing Quantum Models

After injecting quantum data and quantum models into the computational graph, a custom TensorFlow operation is required to combine them. In support of guiding principle 2, TFQ implements the tfq.layers.AddCircuit layer for combining tensors of circuits. In the following code, we use this functionality to combine the quantum data point and quantum model defined in subsection II E 1:

add_op = tfq.layers.AddCircuit()
data_and_model = add_op(q_data, append=q_model)

To quantify the performance of a quantum model on a quantum dataset, we need the ability to define loss functions. This requires converting quantum information into classical information. This conversion process is accomplished by either sampling the quantum model, which stochastically produces bitstrings according to the probability amplitudes of the model, or by specifying a measurement and taking expectation values.

3. Sampling and Expectation Values

Sampling from quantum circuits is an important use case in quantum computing. The recently achieved milestone of quantum supremacy [13] is one such application, in which the difficulty of sampling from a quantum model was used to gain a computational edge over classical machines.

TFQ implements tfq.layers.Sample, a Keras layer which enables sampling from batches of circuits in support of design objective 2. The user supplies a tensor of parameterized circuits, a list of symbols contained in the circuits, and a tensor of values to substitute for the symbols in the circuit. Given these, the Sample layer produces a tf.RaggedTensor of shape [batch_size, num_samples, n_qubits], where the n_qubits dimension is ragged to account for the possibly varying circuit size over the input batch of quantum data. For example, the following code takes the combined data and model from section II E 2 and produces a tensor of size [1, 4, 1] containing four single-bit samples:

sample_layer = tfq.layers.Sample()
samples = sample_layer(
    data_and_model, symbol_names=['theta'],
    symbol_values=[[0.5]], repetitions=4)
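To foreshadow the expectation values discussed next, the sampled bitstrings can be post-processed with ordinary TensorFlow ops. A minimal sketch (assuming the samples tensor produced above, with bit values 0 and 1; this is plain TensorFlow post-processing, not a TFQ layer):

# Convert the ragged sample tensor to a dense tensor of bits, map each bit b to the
# Z eigenvalue (-1)^b = 1 - 2b, then average over repetitions to estimate <Z>.
bits = tf.cast(samples.to_tensor(), tf.float32)   # shape [batch, repetitions, n_qubits]
z_values = 1.0 - 2.0 * bits
z_estimate = tf.reduce_mean(z_values, axis=1)     # shape [batch, n_qubits]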

Though sampling is the fundamental interface between quantum and classical information, differentiability of quantum circuits is much more convenient when using expectation values, as gradient information can then be backpropagated (see section III for more details). In the simplest case, expectation values are simply averages over samples. In quantum computing, expectation values are typically taken with respect to a measurement operator M. This involves sampling bitstrings from the quantum circuit as described above, applying M to the list of bitstring samples to produce a list of numbers, then taking the average of the result. TFQ provides two related layers with this capability:

In contrast to sampling (which is by default in the standard computational basis, the Ẑ eigenbasis of all qubits), taking expectation values requires defining a measurement. As discussed in section II A, these are first defined as cirq.PauliSum objects and converted to tensors. TFQ implements tfq.layers.Expectation, a Keras layer which enables the extraction of measurement expectation values from quantum models. The user supplies a tensor of parameterized circuits, a list of symbols contained in the circuits, a tensor of values to substitute for the symbols in the circuit, and a tensor of operators to measure with respect to them. Given these inputs, the layer outputs a tensor of expectation values. Below, we show how to take an expectation value of the measurement defined in section II E 1:

expectation_layer = tfq.layers.Expectation()
expectations = expectation_layer(
    circuit=data_and_model,
    symbol_names=['theta'],
    symbol_values=[[0.5]],
    operators=q_measure)

In Fig. 3, we illustrate the dataflow graph which backs the expectation layer, when the parameter values are supplied by a classical neural network. The expectation layer is capable of using either a simulator or a real device for execution, and this choice is simply specified at run time. While Cirq simulators may be used for the backend, TFQ provides its own native TensorFlow simulator written in performant C++. A description of our quantum circuit simulation code is given in section II F.

Having converted the output of a quantum model into classical information, the results can be fed into subsequent computations. In particular, they can be fed into functions that produce a single number, allowing us to define loss functions over quantum models in the same way we do for classical models.

4. Differentiating Quantum Circuits

We have taken the first steps towards implementation of quantum machine learning, having defined quantum models over quantum data and loss functions over those models. As described in both the introduction and our first guiding principle, differentiability is the critical machinery needed to allow training of these models. As described in section II B, the architecture of TensorFlow is optimized around backpropagation of errors for efficient updates of model parameters; one of the core contributions of TFQ is integration with TensorFlow's backpropagation mechanism. TFQ implements this functionality with our differentiators module. The theory of quantum circuit differentiation will be covered in section III C; here, we overview the software that implements the theory.

Since there are many ways to calculate gradients of quantum circuits, TFQ provides the tfq.differentiators.Differentiator interface. Our Expectation and SampledExpectation layers rely on classes inheriting from this interface to specify how TensorFlow should compute their gradients. While advanced users can implement their own custom differentiators by inheriting from the interface, TFQ comes with several built-in options, two of which we highlight here. These two methods are instances of the two main categories of quantum circuit differentiators: the finite difference methods and the parameter shift methods.

The first class of quantum circuit differentiators is the finite difference methods. This class samples the primary quantum circuit for at least two different parameter settings, then combines them to estimate the derivative. The ForwardDifference differentiator provides the most basic version of this. For each parameter in the circuit, the circuit is sampled at the current setting of the parameter. Then, each parameter is perturbed separately and the circuit resampled.

For the 2-local circuits implementable on near-term hardware, methods more sophisticated than finite differences are possible. These methods involve running an ancillary quantum circuit, from which the gradient of the primary circuit with respect to some parameter can be directly measured. One specific method is gate decomposition and parameter shifting [71], implemented in TFQ as the ParameterShift differentiator. For an in-depth discussion of the theory, see section III C 2.

The differentiation rule used by our layers is specified through an optional keyword argument. Below, we show the expectation layer being called with our parameter shift rule:

diff = tfq.differentiators.ParameterShift()
expectation = tfq.layers.Expectation(
    differentiator=diff)

For further discussion of the trade-offs when choosing between differentiators, see the gradients tutorial on our GitHub website:

docs/tutorials/gradients.ipynb
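To make the two differentiator families concrete, consider a single qubit prepared as Ry(θ)|0⟩ and measured in Z, for which ⟨Ẑ⟩ = cos θ exactly. The sketch below (plain numpy standing in for circuit execution; it is not TFQ code) compares a forward-difference estimate with the parameter-shift estimate of the gradient d⟨Ẑ⟩/dθ = −sin θ:

import numpy as np

def expectation_z(theta):
    # Analytic expectation <Z> of Ry(theta)|0>, used here in place of running a circuit.
    return np.cos(theta)

theta = 0.7
exact = -np.sin(theta)

# Forward difference: evaluate the "circuit" at theta and at a slightly perturbed theta.
eps = 1e-4
finite_diff = (expectation_z(theta + eps) - expectation_z(theta)) / eps

# Parameter shift for a Pauli rotation: evaluate at two macroscopically shifted angles.
param_shift = 0.5 * (expectation_z(theta + np.pi / 2) - expectation_z(theta - np.pi / 2))

print(exact, finite_diff, param_shift)  # all approximately -0.6442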

5. Simplified Layers

Some workflows do not require control as sophisticated as our Expectation, Sample, and SampledExpectation layers allow. For these workflows we provide the PQC and ControlledPQC layers. Both of these layers allow parameterized circuits to be updated by hybrid backpropagation without the user needing to provide the list of symbols associated with the circuit. The PQC layer provides automated Keras management of the variables in a parameterized circuit:

q = cirq.GridQubit(0, 0)
(a, b) = sympy.symbols("a b")
circuit = cirq.Circuit(
    cirq.Rz(a)(q), cirq.Rx(b)(q))
outputs = tfq.layers.PQC(circuit, cirq.Z(q))
quantum_data = tfq.convert_to_tensor([
    cirq.Circuit(), cirq.Circuit(cirq.X(q))])
res = outputs(quantum_data)

When the variables in a parameterized circuit will be controlled completely by other user-specified machinery, for example by a classical neural network, then the user can call our ControlledPQC layer:

outputs = tfq.layers.ControlledPQC(
    circuit, cirq.Z(q))
model_params = tf.convert_to_tensor(
    [[0.5, 0.5], [0.25, 0.75]])
res = outputs([quantum_data, model_params])

Notice that the call is similar to that for PQC, except that we provide parameter values for the symbols in the circuit. These two layers are used extensively in the applications highlighted in the following sections.

F. Quantum Circuit Simulation with qsim


2. Gate Fusion with qsim
Concurrently with TFQ, we are open sourcing qsim,
a software package for simulating quantum circuits on The core idea of the fusion algorithm is to multiply
classical computers. We have adapted its C++ imple- each two-qubit gate in the circuit with all of the one-qubit
mentation to work inside TFQ’s TensorFlow ops. The gates around it. Suppose we are given a circuit with N
performance of qsim derives from two key ideas that can qubits and T timesteps. The circuit can be interpreted
be seen in the literature on classical simulators for quan- as a two dimensional lattice: the row index n labels the
tum circuits [72, 73]. The first idea is the fusion of gates qubit, and the column index t labels the timestep. Fu-
in a quantum circuit with their neighbors to reduce the sion is initialized by setting a counter cn = 0 for each
number of matrix multiplications required when apply- n ∈ {1, . . . , N }. Then, for each timestep, check each row
ing the circuit to a wavefunction. The second idea is for a two-qubit gate A; when such a gate is found, the
to create a set of matrix multiplication functions specifi- search pauses and A becomes the anchor gate for a round
cally optimized for the application of two-qubit gates to of gate fusion. Suppose A is supported on qubits na and
state vectors, to take maximal advantage of gate fusion. nb and resides at timestep t. By construction, on row na
We discuss these points in detail below. To quantify the all gates from timestep cna to t − 1 are single qubit gates;
performance of qsim, we also provide an initial bench- thus we can multiply them all together and into A. We
mark comparing qsim to Cirq. We note that qsim is then continue down row na starting at timestep t + 1,
significantly faster on both random and structured cir- multiplying single qubit gates into A, until we encounter
cuits. We further note that the qsim benchmark times another two-qubit gate, at time ta . The fusing on this
include the full TFQ software stack of serializing a cir- row stops, and we set cna = ta . Then we do the same for
cuit to ProtoBuffs in Python, conversion of ProtoBuffs row nb . After this process, all one-qubit gates that can
to C++ objects inside the dataflow graph for our cus- be reached from A have been absorbed. If at this point
13

cna = cnb , then the gate at timestep ta is a two-qubit Random Quantum Circuits Structured Quantum Circuits
3

Total Simulation Time (Log10(s))

Total Simulation Time (Log10(s))


gate which is also supported on qubits na and nb ; thus 3
this gate can also be multiplied into the anchor gate A.
2
If a two-qubit gate is absorbed in this manner, the pro- 2
cess of absorbing one-qubit gates starts again from ta +1. 1
This continues until either cna 6= cnb , which means a two- 1

qubit gate has been encountered which is not supported 0


0
on both na and nb , or cna = cnb = T , which means the
end of the circuit has been reached on these two qubits.
5 10 15 20 5 10 15 20
Finally, gate A has completed its fusion process, and we Number of Qubits Number of Qubits
continue our search through the lattice for unprocessed
two-qubit anchor gates. The qsim fusion algorithm is
illustrated on a simple three-qubit circuit in Fig. 8. Af- Figure 9. Performance of TFQ (orange) and Cirq (blue) on
simulation tasks. The plots show the base 10 logarithm of
ter this process of gate fusion, we can run simulators
the total time to solution (in seconds) versus the number of
optimized for the application of two-qubit gates. These qubits simulated. Simulation of 500 random circuits, taken in
simulators are described next. batches of 50. Circuits were of depth 40. (a) At the largest
tested size of 20 qubits, we see approximately 7 times savings
in simulation time versus Cirq. (b) Simulation of structured
3. Hardware-Specific Instruction Sets circuits. The smaller scale of entanglement makes these cir-
cuits more amenable to compaction via the qsim Fusion algo-
With the given quantum circuit fused into the minimal rithm. At the largest tested size of 20 qubits, we see TFQ is
up to 100 times faster than parallelized Cirq.
number of two-qubit gates, we need simulators optimized
for applying 4 × 4 matrices to state vectors. TFQ offers
three instruction set tiers, each taking advantage of in-
creasingly modern CPU commands for increasing simu-
lation speed. The simulators are adaptive to the user’s
available hardware. The first tier is a general simulator
designed to work on any CPU architecture. The next cirq.generate_boixo_2018_supremacy_circuits_v2 . These
is the SSE2 instruction set [74], and the fastest uses the circuits are only tractable for benchmarking due to the
modern AVX2 instruction set [75]. These three available small numbers of qubits involved here. These circuits
instruction sets, combined with the fusing algorithm dis- have very little structure, involving many interleaved
cussed above, ensures users have maximal performance two-qubit gates to generate entanglement as quickly as
when running algorithms in simulation. The next sec- possible. In summary, at the largest benchmarked prob-
tion illustrates this power with benchmarks comparing lem size of 20 qubits, qsim achieves an approximately
the performance of TFQ to the performance of paral- 7-fold improvement in simulation time over Cirq. The
lelized Cirq running in simulation mode. In the future, performance curves are shown in Fig. 9. We note that the
we hope to expand the range of custom simulation hard- interleaved structure of these circuits means little gate fu-
ware supported to include GPU and TPU integration. sion is possible, so that this performance boost is mostly
due to the speed of our C++ simulators.

4. Benchmarks
When the simulated circuits have more structure, the
Fusion algorithm allows us to achieve a larger perfor-
Here, we demonstrate the performance of TFQ, backed
mance boost by reducing the number of gates that ulti-
by qsim, relative to Cirq on two benchmark simula-
mately need to be simulated. The circuits for this task
tion tasks. As detailed above, the performance differ-
are a factorized version of the supremacy-style circuits
ence is due to circuit pre-processing via gate fusion com-
which generate entanglement only on small subsets of the
bined with performant C++ simulators. The bench-
qubits. In summary, for these circuits, we find a roughly
marks were performed on a desktop equipped with an
100 times improvement in simulation time in TFQ versus
Intel(R) Xeon(R) W-2135 CPU running at 3.70 GHz.
Cirq. The performance curves are shown in Fig. 9.
This CPU supports the AVX2 instruction set, which at
the time of writing is the fastest hardware tier supported
by TFQ. In the following simulations, TFQ uses a sin- Thus we see that in addition to our core functionality
gle core, while Cirq simulation is allowed to parallelize of implementing native TensorFlow gradients for quan-
over all available cores using the Python multiprocessing tum circuits, TFQ also provides a significant boost in
module and the map function. performance over Cirq when running in simulation mode.
The first benchmark task is simulation of 500 Additionally, as noted before, this performance boost is
supremacy-style circuits batched 50 at a time. These despite the additional overhead of serialization between
circuits were generated using the Cirq function the TensorFlow frontend and qsim proper.
14

III. THEORY OF HYBRID


QUANTUM-CLASSICAL MACHINE LEARNING

In the previous section, we reviewed the building


blocks required for use of TFQ. In this section, we con-
sider the theory behind the software. We define quantum
neural networks as products of parameterized unitary
matrices. Samples and expectation values are defined by
expressing the loss function as an inner product. With
quantum neural networks and expectation values defined,
we can then define gradients. We finally combine quan-
tum and classical neural networks and formalize hybrid
quantum-classical backpropagation, one of core compo- Figure 10. High-level depiction of a multilayer quantum
nents of TFQ. neural network (also known as a parameterized quantum cir-
cuit), at various levels of abstraction. (a) At the most detailed
level we have both fixed and parameterized quantum gates,
A. Quantum Neural Networks any parameterized operation is compiled into a combination
of parameterized single-qubit operations. (b) Many singly-
parameterized gates Wj (θj ) form a multi-parameterized uni-
A Quantum Neural Network ansatz can generally be
tary Vl (θ~l ) which performs a specific function. (c) The prod-
written as a product of layers of unitaries in the form uct of multiple unitaries Vl generates the quantum model
L U (θ) shown in (d).
Y
Û (θ) = V̂ ` Û ` (θ ` ), (1)
`=1
the unitary of a given parameter is decomposable as the
where the `th layer of the QNN consists of the prod- product of exponentials of Pauli terms, one can explicitly
uct of V̂ ` , a non-parametric unitary, and Û ` (θ ` ) a uni- express the layer as
tary with variational parameters (note the superscripts
here represent indices rather than exponents). The multi- Yh i
Ûj` (θj` ) = cos(θj` βkj` )Iˆ − i sin(θj` βkj` )P̂k . (5)
parameter unitary of a given layer can itself be generally
k
comprised of multiple unitaries {Ûj` (θj` )}M
j=1 applied in
`

parallel: The above form will be useful for our discussion of gra-
M dients of expectation values below.

Û ` (θ ` ) ≡ Ûj` (θj` ). (2)
j=1
B. Sampling and Expectations
Finally, each of these unitaries Ûj` can be expressed as
the exponential of some generator ĝj` , which itself can
To optimize the parameters of an ansatz from equation
be any Hermitian operator on n qubits (thus expressible
(1), we need a cost function to optimize. In the case of
as a linear combination of n-qubit Pauli’s),
standard variational quantum algorithms, this cost func-
Kj` tion is most often chosen to be the expectation value of
−iθj` ĝj`
X
Ûj` (θj` ) =e , ĝj` = βkj` P̂k , (3) a cost Hamiltonian,
k=1
f (θ) = hĤiθ ≡ hΨ0 | Û † (θ)Ĥ Û (θ) |Ψ0 i (6)
where P̂k ∈ Pn , here Pn denotes the Paulis on n-qubits
[76], and βkj` ∈ R for all k, j, `. For a given j and `, in the where |Ψ0 i is the input state to the parameterized circuit.
case where all the Pauli terms commute, i.e. [P̂k , P̂m ] = In general, the cost Hamiltonian can be expressed as a
0 for all m, k such that βm j`
, βkj` 6= 0, one can simply linear combination of operators, e.g. in the form
decompose the unitary into a product of exponentials of
N
each term, X
Ĥ = αk ĥk ≡ α · ĥ, (7)
Y ` j`
Ûj` (θj` ) = e−iθj βk P̂k . (4) k=1

k
where we defined a vector of coefficients α ∈ RN and
Otherwise, in instances where the various terms do not a vector of N operators ĥ. Often this decomposition is
commute, one may apply a Trotter-Suzuki decomposi- chosen such that each of these sub-Hamiltonians is in the
tion of this exponential [77], or other quantum simula- n-qubit Pauli group ĥk ∈ Pn . The expectation value of
tion methods [78]. Note that in the above case where this Hamiltonian is then generally evaluated via quantum
15

expectation estimation, i.e. by taking the linear combi- C. Gradients of Quantum Neural Networks
nation of expectation values of each term
Now that we have established how to evaluate the
N
X loss function, let us describe how to obtain gradients of
f (θ) = hĤiθ = αk hĥk iθ ≡ α · hθ , (8) the cost function with respect to the parameters. Why
k=1 should we care about gradients of quantum neural net-
works? In classical deep learning, the most common fam-
where we introduced the vector of expectations hθ ≡ ily of optimization heuristics for the minimization of cost
hĥiθ . In the case of non-commuting terms, the vari- functions are gradient-based techniques [81–83], which
ous expectation values hĥk iθ are estimated over separate include stochastic gradient descent and its variants. To
runs. leverage gradient-based techniques for the learning of
multilayered models, the ability to rapidly differentiate
Note that, in practice, each of these quantum expecta- error functionals is key. For this, the backwards propa-
tions is estimated via sampling of the output of the quan- gation of errors [84] (colloquially known as backprop), is
tum computer [28]. Even assuming a perfect fidelity of a now canonical method to progressively calculate gradi-
quantum computation, sampling measurement outcomes ents of parameters in deep networks. In its most general
of eigenvalues of observables from the output of the quan- form, this technique is known as automatic differentiation
tum computer to estimate an expectation will have some [85], and has become so central to deep learning that this
non-negligible variance for any finite number of samples. feature of differentiability is at the core of several frame-
Assuming each of the Hamiltonian terms of equation (7) works for deep learning, including of course TensorFlow
admit a Pauli operator decomposition as (TF) [86], JAX [87], and several others.
To be able to train hybrid quantum-classical models
Jj (section III D), the ability to take gradients of quantum
X
ĥj = γkj P̂k , (9) neural networks is key. Now that we understand the
k=1 greater context, let us describe a few techniques below
for the estimation of these gradients.
where the γkj ’s are real-valued coefficients and the P̂j ’s
are Paulis that are Pauli operators [76], then to get an
estimate of the expectation value hĥj i within an accu- 1. Finite difference methods
racy , one needs to take a number of measurement sam-
PJj A simple approach is to use simple finite-difference
ples scaling as ∼ O(kĥk k2∗ /2 ), where kĥk k∗ = k=1 |γkj |
is the Pauli coefficient norm of each Hamiltonian term. methods, for example, the central difference method,
Thus, to estimate the expectation value of the fullP Hamil-
tonian (7) accurately within a precision ε =  k |αk |2 , f (θ + ε∆k ) − f (θ − ε∆k )
∂k f (θ) = + O(ε2 ) (10)
we would need on the order of ∼ O( 12 k |αk |2 kĥk k2∗ )
P 2ε
measurement samples in total [26, 79], as we would need which, in the case where there are M continuous param-
to measure each expectation independently if we are fol- eters, involves 2M evaluations of the objective function,
lowing the quantum expectation estimation trick of (8). each evaluation varying the parameters by ε in some di-
This is in sharp contrast to classical methods for gradi- rection, thereby giving us an estimate of the gradient
ents involving backpropagation, where gradients can be of the function with a precision O(ε2 ). Here the ∆k
estimated to numerical precision; i.e. within a precision is a unit-norm perturbation vector in the k th direction
 with ∼ O(PolyLog()) overhead. Although there have of parameter space, (∆k )j = δjk . In general, one may
been attempts to formulate a backpropagation principle use lower-order methods, such as forward difference with
for quantum computations [54], these methods also rely O(ε) error from M + 1 objective queries [23], or higher
on the measurement of a quantum observable, thus also order methods, such as a five-point stencil method, with
requiring ∼ O( 12 ) samples. O(ε4 ) error from 4M queries [88].
As we will see in the following section III C, estimat-
ing gradients of quantum neural networks on quantum
computers involves the estimation of several expectation 2. Parameter shift methods
values of the cost function for various values of the pa-
rameters. One trick that was recently pointed out [46, 80] As recently pointed out in various works [80, 89], given
and has been proven to be successful both theoretically knowledge of the form of the ansatz (e.g. as in (3)), one
and empirically to estimate such gradients is the stochas- can measure the analytic gradients of the expectation
tic selection of various terms in the quantum expectation value of the circuit for Hamiltonians which have a single-
estimation. This can greatly reduce the number of mea- term in their Pauli decomposition (3) (or, alternatively, if
surements needed per gradient update, we will cover this the Hamiltonian has a spectrum {±λ} for some positive
in subsection III C 3. λ). For multi-term Hamiltonians, in [89] a method to
16

obtain the analytic gradients is proposed which uses a 3. Stochastic Parameter Shift Gradient Estimation
linear combination of unitaries. Here, instead, we will
simply use a change of variables and the chain rule to Consider the full analytic gradient from (14) and
obtain analytic gradients of parametric unitaries of the (11), if we have M` parameters and L layers, there are
form (5) without the need for ancilla qubits or additional PL
`=1 M` terms of the following form to estimate:
unitaries.
For a parameter of interest θj` appearing in a layer ` in a Kj` Kj` h
parametric circuit in the form (5), consider the change of ∂f X j` ∂f
X X j` π j`
i
= β k = ±β k f (η ± 4 ∆k ) . (15)
variables ηkj` ≡ θj` βkj` , then from the chain rule of calculus ∂θj` k=1
∂ηk
k=1 ±
[90], we have
These terms come from the components of the gradient
∂f X ∂f ∂η j` X j` ∂f vector itself which has the dimension equal to that of the
k
`
= j` `
= βk . (11) total number of free parameters in the QNN, dim(θ).
∂θj ∂θj ∂ηk
k ∂ηk k For each of these components, for the j th component of
Thus, all we need to compute are the derivatives of the the `th layer, there 2Kj` parameter-shifted expectation
PL
cost function with respect to ηkj` . Due to this change of values to evaluate, thus in total there are `=1 2Kj` M`
variables, we need to reparameterize our unitary Û (θ) parameterized expectation values of the cost Hamiltonian
from (1) as to evaluate.
O Y  For practical implementation of this estimation proce-
j`
Û ` (θ ` ) 7→ Û ` (η ` ) ≡ e−iηk P̂k , (12) dure, we must expand this sum further. Recall that, as
j∈I` k the cost Hamiltonian generally will have many terms, for
each quantum expectation estimation of the cost function
where I` ≡ {1, . . . , M` } is an index set for the QNN for some value of the parameters, we have
layers. One can then expand each exponential in the
above in a similar fashion to (5): N
X X Jm
N X

j`
f (θ) = hĤiθ = αm hĥm iθ = αm γqm hP̂qm iθ ,
e−iηk P̂k
= cos(ηkj` )Iˆ − i sin(ηkj` )P̂k . (13) m=1 m=1 q=1
(16)
As can be shown from this form, the analytic derivative PN
which has j=1 Jj terms. Thus, if we consider that all
of the expectation value f (η) ≡ hΨ0 | Û † (η)Ĥ Û (η) |Ψ0 i
the terms in (15) are of the form of (16), we see that we
with respect to a component ηkj` can be reduced to fol- PN PL
lowing parameter shift rule [46, 80, 91]: have a total number of m=1 `=1 2Jm Kj` M` expecta-
tion values to estimate the gradient. Note that one of

j` f (η)
∂ηk
= f (η + π4 ∆j` π j`
k ) − f (η − 4 ∆k ) (14) these sums comes from the total number of appearances
of parameters in front of Paulis in the generators of the
where ∆j` parameterized quantum circuit, the second sum comes
k is a vector representing unit-norm perturba-
from the various terms in the cost Hamiltonian in the
tion of the variable ηkj` in the positive direction. We thus
Pauli expansion.
see that this shift in parameters can generally be much
As the cost of accurately estimating all these terms
larger than that of the numerical differentiation parame-
one by one and subsequently linearly combining the val-
ter shifts as in equation (10). In some cases this is useful
ues such as to yield an estimate of the total gradient
as one does not have to resolve as fine-grained a differ-
may be prohibitively expensive in terms of numbers of
ence in the cost function as an infinitesimal shift, hence
runs, instead, one can stochastically estimate this sum,
requiring less runs to achieve a sufficiently precise esti-
by randomly picking terms according to their weighting
mate of the value of the gradient.
[46, 80].
Note that in order to compute through the chain rule
One can sample a distribution over the ap-
in (11) for a parametric unitary as in (3), we need to
pearances of a parameter in the QNN, k ∼
evaluate the expectation function 2K` times to obtain PKj` j`
the gradient of the parameter θj` . Thus, in some cases Pr(k|j, `) = |βkj` |/( o=1 |βo |), one then estimates the
where each parameter generates an exponential of many two parameter-shifted terms corresponding to this in-
terms, although the gradient estimate is of higher pre- dex in (15) and averages over samples. We consider
cision, obtaining an analytic gradient can be too costly this case to be simply stochastic gradient estimation for
in terms of required queries to the objective function. the gradient component corresponding to the parame-
To remedy this additional overhead, Harrow et al. [80] ter θj` . One can go even further in this spirit, for each
proposed to stochastically select terms according to a dis- of these sampled expectation values, by also sampling
tribution weighted by the coefficients of each term in the terms from (16) according to a similar distribution de-
generator, and to perform gradient descent from these termined by the magnitude of the Pauli expansion co-
stochastic estimates of the gradient. Let us review this efficients. Sampling the indices {q, m} ∼ Pr(q, m) =
PN PJd
stochastic gradient estimation technique as it is imple- |αm γqm |/( d=1 r=1 |αd γrd |) and estimating the expec-
mented in TFQ. tation hP̂qm iθ for the appropriate parameter-shifted val-
17

D. Hybrid Quantum-Classical Computational


Graphs

1. Hybrid Quantum-Classical Neural Networks

Now, we are ready to formally introduce the no-


tion of Hybrid Quantum-Classical Neural Networks
(HQCNN’s). HQCNN’s are meta-networks comprised
of quantum and classical neural network-based function
blocks composed with one another in the topology of a
Figure 11. High-level depiction of a quantum-classical neural directed graph. We can consider this a rendition of a hy-
network. Blue blocks represent Deep Neural Network (DNN) brid quantum-classical computational graph where the
function blocks and orange boxes represent Quantum Neural inner workings (variables, component functions) of vari-
Network (QNN) function blocks. Arrows represent the flow
ous functions are abstracted into boxes (see Fig. 11 for a
of information during the feedforward (inference) phase. For
an example of the interface between quantum and classical
depiction of such a graph). The edges then simply repre-
neural network blocks, see Fig. 12. sent the flow of classical information through the meta-
network of quantum and classical functions. The key will
be to construct parameterized (differentiable) functions
fθ : RM → RN from expectation values of parameterized
quantum circuits, then creating a meta-graph of quan-
tum and classical computational nodes from these blocks.
Let us first describe how to create these functions from
expectation values of QNN’s.
ues sampled from the terms of (15) according to the pro- As we saw in equations (6) and (7), we get a differ-
cedure outlined above. This is considered doubly stochas- entiable cost function f : RM 7→ R from taking the ex-
tic gradient estimation. In principle, one could go one pectation value of a single Hamiltonian at the end of the
step further, and per iteration of gradient descent, ran- parameterized circuit, f (θ) = hĤiθ . As we saw in equa-
domly sample indices representing subsets of parame- tions (7) and (8), to compute this expectation value, as
ters for which we will estimate the gradient component, the readout Hamiltonian is often decomposed into a lin-
and set the non-sampled indices corresponding gradient ear combination of operators Ĥ = α · ĥ (see (7)), then
components to 0 for the given iteration. The distribu- the function is itself a linear combination of expectation
tion we sample in this case is given by θj` ∼ Pr(j, `) = values of multiple terms (see (8)), hĤiθ = α · hθ where
PKj` j` PL PMu PKiu iu
k=1 |βk |/( u=1 i=1 o=1 |βo |). This is, in a sense, hθ ≡ hĥiθ ∈ RN is a vector of expectation values. Thus,
akin to the SPSA algorithm [92], in the sense that before the scalar value of the cost function is evaluated,
it is a gradient-based method with a stochastic mask. QNN’s naturally are evaluated as a vector of expectation
The above component sampling, combined with doubly values, hθ .
stochastic gradient descent, yields what we consider to be Hence, if we would like the QNN to become more like
triply stochastic gradient descent. This is equivalent to si- a classical neural network block, i.e. mapping vectors to
multaneously sampling {j, `, k, q, m} ∼ Pr(j, `, k, q, m) = vectors f : RM → RN , we can obtain a vector-valued
Pr(k|j, `)Pr(j, `)Pr(q, m) using the probabilities outlined differentiable function from the QNN by considering it
in the paragraph above, where j and ` index the param- as a function of the parameters which outputs a vector
eter and layer, k is the index from the sum in equation of expectation values of different operators,
(15), and q and m are the indices of the sum in equation
(16). f : θ 7→ hθ (17)
where
In TFQ, all three of the stochastic averaging meth- (hθ )k = hĥk iθ ≡ hΨ0 | Û † (θ)ĥk Û (θ) |Ψ0 i . (18)
ods above can be turned on or off independently for
stochastic parameter-shift gradients. See the details in We represent such a QNN-based function in Fig. 12. Note
the Differentiator module of TFQ on GitHub. that, in general, each of these ĥk ’s could be comprised of
multiple terms themselves,
Nk
Now that we have reviewed various ways of obtaining ĥk =
X
γt m̂t (19)
gradients of expectation values, let us consider how to go t=1
beyond basic variational quantum algorithms and con-
sider fully hybrid quantum-classical neural networks. As hence, one can perform Quantum Expectation Estima-
we will see, our general framework of gradients of cost P to estimate the expectation of each term as hĥk i =
tion
Hamiltonians will carry over. t γt hm̂t i.
18

Note that, in some cases, instead of the expectation E. Autodifferentiation through hybrid
values of the set of operators {ĥk }Mk=1 , one may instead
quantum-classical backpropagation
want to relay the histogram of measurement results ob-
tained from multiple measurements of the eigenvalues As described above, hybrid quantum-classical neural
of each of these observables. This case can also be network blocks take as input a set of real parameters θ ∈
phrased as a vector of expectation values, as we will RM , apply a circuit Û (θ) and take a set of expectation
now show. First, note that the histogram of the mea- values of various observables
surement results of some ĥk with eigendecomposition
Prk (hθ )k = hĥk iθ .
ĥk = j=1 λjk |λjk i hλjk | can be considered as a vec-
tor of expectation values where the observables are the The result of this parameter-to-expected value map is
eigenstate projectors |λjk ihλjk |. Instead of obtaining a a function f : RM → RN which maps parameters to a
single real number from the expectation value of ĥk , we real-valued vector,
can obtain a vector hk ∈ Rrk , where rk = rank(ĥk ) and
the components are given by (hk )j ≡ h|λjk ihλjk |iθ . We f : θ 7→ hθ .
are then effectively considering the categorical (empiri-
cal) distribution as our vector. This function can then be composed with other param-
eterized function blocks comprised of either quantum or
Now, if we consider measuring the eigenvalues of mul- classical neural network blocks, as depicted in Fig. 11.
tiple observables {ĥk }M To be able to backpropagate gradients through gen-
k=1 and collecting the measure-
ment result histograms, we get a 2-dimensional tensor eral meta-networks of quantum and classical neural net-
(hθ )jk = h|λjk ihλjk |iθ . Without loss of generality, we work blocks, we simply have to figure out how to back-
propagate gradients through a quantum parameterized
can flatten this array into a vector of dimension RR where
PM block function when it is composed with other parame-
R = k=1 rk . Thus, considering vectors of expectation
terized block functions. Due to the partial ordering of
values is a relatively general way of representing the out-
the quantum-classical computational graph, we can fo-
put of a quantum neural network. In the limit where the
cus on how to backpropagate gradients through a QNN
set of observables considered forms an informationally-
in the scenario where we consider a simplified quantum-
complete set of observables [93], then the array of mea-
classical network where we combine all function blocks
surement outcomes would fully characterize the wave-
that precede and postcede the QNN block into mono-
function, albeit at an overhead exponential in the number
lithic function blocks. Namely, we can consider a sce-
of qubits.
nario where we have fpre : xin 7→ θ (fpre : Rin → RM )
We should mention that in some cases, instead of ex- as the block preceding the QNN, the QNN block as
pectation values or histograms, single samples from the fqnn : θ 7→ hθ , (fqnn : RM → RN ), the post-QNN
output of a measurement can be used for direct feedback- block as fpost : hθ 7→ yout (fpost : RM → RN out ) and
control on the quantum system, e.g. in quantum error finally the loss function for training the entire network
correction [76]. At least in the current implementation of being computed from this output L : RN out → R. The
TFQuantum, since quantum circuits are built in Cirq and composition of functions from the input data to the out-
this feature is not supported in the latter, such scenar- put loss is then the sequentially composited function
ios are currently out-of-scope. Mathematically, in such (L◦fpost ◦fqnn ◦fpre ). This scenario is depicted in Fig. 12.
a scenario, one could then consider the QNN and mea- Now, let us describe the process of backpropagation
surement as map from quantum circuit parameters θ to through this composition of functions. As is standard in
the conditional random variable Λθ valued over RNk cor- backpropagation of gradients through feedforward net-
responding to the measured eigenvalues λjk with a prob- works, we begin with the loss function evaluated at the
ability density Pr[(Λθ )k ≡ λjk ] = p(λjk |θ) which cor- output units and work our way back through the sev-
responds to the measurement statistics distribution in- eral layers of functional composition of the network to
duced by the Born rule, p(λjk |θ) = h|λjk ihλjk |iθ . This get the gradients. The first step is to obtain the gra-
QNN and single measurement map from the parameters dient of the loss function ∂L/∂yout and to use classi-
to the conditional random variable f : θ 7→ Λθ can be cal (regular) backpropagation of gradients to obtain the
considered as a classical stochastic map (classical condi- gradient of the loss with respect to the output of the
tional probability distribution over output variables given QNN, i.e. we get ∂(L ◦ fpost )/∂h via the usual use of
the parameters). In the case where only expectation val- the chain rule for backpropagation, i.e., the contraction
ues are used, this stochastic map reduces to a determinis- of the Jacobian with the gradient of the subsequent layer,
∂fpost
tic node through which we may backpropagate gradients, ∂(L ◦ fpost )/∂h = ∂L
∂y · ∂h .
as we will see in the next subsection. In the case where Now, let us label the evaluated gradient of the loss
this map is used dynamically per-sample, this remains a function with respect to the QNN’s expectation values
stochastic map, and though there exists some algorithms as
for backpropagation through stochastic nodes [94], these
∂(L◦fpost )
are not currently supported natively in TFQ. g≡ ∂h . (20)
h=hθ
19

We can now consider this value as effectively a constant.


Let us then define an effective backpropagated Hamilto-
nian with respect to this gradient as
X
Ĥg ≡ gk ĥk ,
k

where gk are the components of (20). Notice that expec-


tations of this Hamiltonian are given by

hĤg iθ = g · hθ , Figure 12. Example of inference and hybrid backpropaga-


tion at the interface of a quantum and classical part of a hy-
and so, the gradients of the expectation value of this brid computational graph. Here we have classical deep neural
Hamiltonian are given by networks (DNN) both preceding and postceding the quan-
tum neural network (QNN). In this example, the preceding
X ∂hθ,k
DNN outputs a set of parameters θ which are used as then
∂ ∂
∂θj hĤg iθ = ∂θj (g · hθ ) = gk ∂θj used by the QNN as parameters for inference. The QNN
k outputs a vector of expectation values (estimated through
several runs) whose components are (hθ )k = hĥk iθ . This
which is exactly the Jacobian of the QNN function fqnn vector is then fed as input to another (post-ceding) DNN,
contracted with the backpropagated gradient of previous and the loss function L is computed from this output. For
layers. Explicitly, backpropagation through this hybrid graph, one first back-
propagates the gradient of the loss through the post-ceding
∂ DNN to obtain gk = ∂L/∂hk . Then, one takes the gradi-
∂(L ◦ fpost ◦ fqnn )/∂θ = ∂θ hĤg iθ .
ent of the following
P functional
P of the output of the QNN:
fθ = g · hθ = k gk hθ,k = k gk hĥk iθ with respect to the
Thus, by taking gradients of the expectation value of QNN parameters θ (which can be achieved with any of the
the backpropagated effective Hamiltonian, we can get the methods for taking gradients of QNN’s described in previous
gradients of the loss function with respect to QNN pa- subsections of this section). This completes the backprop-
rameters, thereby successfully backpropagating gradients agation of gradients of the loss function through the QNN,
through the QNN. Further backpropagation through the the preceding DNN can use the now computed ∂L/∂θ to fur-
preceding function block fpre can be done using standard ther backpropagate gradients to preceding nodes of the hybrid
computational graph.
classical backpropagation by using this evaluated QNN
gradient.
Note that for a given value of g, the effective back- A. Hybrid Quantum-Classical Convolutional
propagated Hamiltonian is simply a fixed Hamiltonian Neural Network Classifier
operator, as such, taking gradients of the expectation
of a single multi-term operator can be achieved by any
To run this example in the browser through Colab,
choice in a multitude of the methods for taking gradients
follow the link:
of QNN’s described earlier in this section.
Backpropagation through parameterized quantum cir- docs/tutorials/qcnn.ipynb
cuits is enabled by our Differentiator interface.
We offer both finite difference, regular parameter shift,
and stochastic parameter shift gradients, while the gen- 1. Background
eral interface allows users to define custom gradient
methods. Supervised classification is a canonical task in classical
machine learning. Similarly, it is also one of the most
well-studied applications for QNNs [15, 30, 42, 95, 96].
As such, it is a natural starting point for our exploration
IV. BASIC QUANTUM APPLICATIONS of applications for quantum machine learning. Discrimi-
native machine learning with hierarchical models can be
The following examples show how one may use the var- understood as a form of compression to isolate the infor-
ious features of TFQ to reproduce and extend existing mation containing the label [97]. In the case of quantum
results in second generation quantum machine learning. data, the hidden classical parameter (real scalar in the
Each application has an associated Colab notebook which case of regression, discrete label in the case of classifica-
can be run in-browser to reproduce any results shown. tion) can be embedded in a non-local subsystem or sub-
Here, we use snippets of code from those example note- space of the quantum system. One then has to perform
books for illustration; please see the example notebooks some disentangling quantum transformation to extract
for full code details. information from this non-local subspace.
20

To choose an architecture for a neural network model, are several families of filters, or feature maps, applied in
one can draw inspiration from the symmetries in the a translationally-invariant fashion. Here, we apply a sin-
training data. For example, in computer vision, one of- gle QCNN layer followed by several feature maps. The
ten needs to detect corners and edges regardless of their outputs of these feature maps can then be fed into clas-
position in an image; we thus postulate that a neural net- sical convolutional network layers, or in this particular
work to detect these features should be invariant under simplified example directly to fully-connected layers.
translations. In classical deep learning, an example of
such translationally-invariant neural networks are convo- Target problems:
lutional neural networks. These networks tie parameters
across space, learning a shared set of filters which are 1. Learn to extract classical information hidden in cor-
applied equally to all portions of the data. relations of a quantum system
2. Utilize shallow quantum circuits via hybridization
To the best of the authors’ knowledge, there is no with classical neural networks to extract informa-
strong indication that we should expect a quantum ad- tion
vantage for the classification of classical data using QNNs
in the near term. For this reason, we focus on classifying Required TFQ functionalities:
quantum data as defined in section I C. There are many
1. Hybrid quantum-classical network models
kinds of quantum data with translational symmetry. One
2. Batch quantum circuit simulator
example of such quantum data are cluster states. These
3. Quantum expectation based back-propagation
states are important because they are the initial states
4. Fast classical gradient-based optimizer
for measurement-based quantum computation [98, 99].
In this example we will tackle the problem of detect-
ing errors in the preparation of a simple cluster state. 2. Implementations
We can think of this as a supervised classification task:
our training data will consist of a variety of correctly
and incorrectly prepared cluster states, each paired with As discussed in section II D 2, the first step in the QML
their label. This classification task can be generalized pipeline is the preparation of quantum data. In this ex-
to condensed matter physics and beyond, for example to ample, our quantum dataset is a collection of correctly
the classification of phases near quantum critical points, and incorrectly prepared cluster states on 8 qubits, and
where the degree of entanglement is high. the task is to classify theses states. The dataset prepara-
tion proceeds in two stages; in the first stage, we generate
Since our simple cluster states are translationally in-
a correctly prepared cluster state:
variant, we can extend the spatial parameter-tying of
convolutional neural networks to quantum neural net- def c l u s t e r _ s t a t e _ c i r c u i t ( bits ) :
circuit = cirq . Circuit ()
works, using recent work by Cong, et al. [52], which circuit . append ( cirq . H . on_each ( bits ) )
introduced a Quantum Convolutional Neural Network for this_bit , next_bit in zip (
(QCNN) architecture. QCNNs are essentially a quan- bits , bits [1:] + [ bits [0]]) :
tum circuit version of a MERA (Multiscale Entanglement circuit . append (
Renormalization Ansatz) network [100]. MERA has been cirq . CZ ( this_bit , next_bit ) )
return circuit
extensively studied in condensed matter physics. It is a
hierarchical representation of highly entangled wavefunc- Errors in cluster state preparation will be simulated by
tions. The intuition is that as we go higher in the net- applying Rx (θ) gates that rotate a qubit about the X-axis
work, the wavefunction’s entanglement gets renormalized of the Bloch sphere by some amount 0 ≤ θ ≤ 2π. These
(coarse grained) and simultaneously a compressed repre- excitations will be labeled 1 if the rotation is larger than
sentation of the wavefunction is formed. This is akin to some threshold, and -1 otherwise. Since the correctly
the compression effects encountered in deep neural net- prepared cluster state is always the same, we can think
works [101]. of it as the initial state in the pipeline and append the
Here we extend the QCNN architecture to include clas- excitation circuits corresponding to various error states:
sical neural network postprocessing, yielding a Hybrid c l u s t e r _ s t a t e _ b i t s = cirq . GridQubit . rect (1 , 8)
Quantum Convolutional Neural Network (HQCNN). We e x c i t a t i o n _ i n pu t = tf . keras . Input (
perform several low-depth quantum operations in order shape =() , dtype = tf . dtypes . string )
cluster_state = tfq . layers . AddCircuit () (
to begin to extract hidden parameter information from a excitation_input , prepend =
wavefunction, then pass the resulting statistical informa- cluster_state_circuit ( cluster_state_bits ))
tion to a classical neural network for further processing.
Specifically, we will apply one layer of the hierarchy of the Note how excitation_input is a standard Keras data in-
QCNN. This allows us to partially disentangle the input gester. The datatype of the input is string to account
state and obtain statistics about values of multi-local ob- for our circuit serialization mechanics described in sec-
servables. In this strategy, we deviate from the original tion II E 1.
construction of Cong et al. [52]. Indeed, we are more in Having prepared our dataset, we begin construction of
the spirit of classical convolutional networks, where there our model. The quantum portion of all the models we
21

1
-1
1
1
-1
Prepare Qconv 1
Cluster +
Qconv
State QPool
+
QPool MSE Loss

Figure 13. The quantum portion of our classifiers, shown on


4 qubits. The combination of quantum convolution (blue) Figure 14. Architecture of the purely quantum CNN for de-
and quantum pooling (orange) reduce the system size from 4 tecting excited cluster states.
qubits to 2 qubits.

The first model we construct uses only quantum op-


consider in this section will be made of the same oper- erations to decorrelate the inputs. After preparing the
ations: quantum convolution and quantum pooling. A cluster state dataset on N = 8 qubits, we repeatedly ap-
visualization of these operations on 4 qubits is shown in ply the quantum convolution and pooling layers until the
Fig. 13. Quantum convolution layers are enacted by ap- system size is reduced to 1 qubit. We average the output
~ with a stride of 1. In anal-
plying a 2 qubit unitary U (θ) of the quantum model by measuring the expectation of
ogy with classical convolutional layers, the parameters of Pauli-Z on this final qubit. Measurement and parameter
the unitaries are tied, so that the same operation is ap- control are enacted via our tfq.layers.PQC object. The
plied to every nearest-neighbor pair of qubits. Pooling is code for this model is shown below:
achieved using a different 2 qubit unitary G(φ)~ designed
r e a d o u t _ o p e r a t o r s = cirq . Z (
to disentangle, allowing information to be projected from c l u s t e r _ s t a t e _ b i t s [ -1])
2 qubits down to 1. The code below defines the quantum quantum_model = tfq . layers . PQC (
convolution and quantum pooling operations: create_model_circuit ( cluster_state_bits ),
r e a d o u t _ o p e r a t o r s ) ( cluster_state )
def q u a n t u m _ c o n v _ c i r c u i t ( bits , syms ) :
qcnn_model = tf . keras . Model (
circuit = cirq . Circuit ()
inputs =[ e x c it a t i o n _ i n p u t ] ,
for a , b in zip ( bits [0::2] , bits [1::2]) :
outputs =[ quantum_model ])
circuit += two_q_unitary ([ a , b ] , syms )
for a , b in zip ( bits [1::2] , bits [2::2] + [ In the code, create_model_circuit is a function which ap-
bits [0]]) :
circuit += two_q_unitary ([ a , b ] , syms )
plies the successive layers of quantum convolution and
return circuit quantum pooling. A simplified version of the resulting
model on 4 qubits is shown in Fig. 14.
def q u a n t u m _ p o o l _ c i r c u i t ( srcs , snks , syms ) : With the model constructed, we turn to training and
circuit = cirq . Circuit () validation. These steps can be accomplished using stan-
for src , snk in zip ( srcs , snks ) :
circuit += two_q_pool ( src , snk , syms )
dard Keras tools. During training, the model output on
return circuit each quantum datapoint is compared against the label;
the cost function used is the mean squared error between
In the code, two_q_unitary constructs a general param- the model output and the label, where the mean is taken
eterized two qubit unitary [102], while two_q_pool rep- over each batch from the dataset. The training and val-
resents a CNOT with general one qubit unitaries on the idation code is shown below:
control and target qubits, allowing for variational selec- qcnn_model . compile ( optimizer = tf . keras . Adam ,
tion of control and target basis. loss = tf . losses . mse )
With the quantum portion of our model defined, we
move on to the third and fourth stages of the QML ( train_excitations , train_labels ,
test_excitations , test_labels
pipeline, measurement and classical post-processing. We ) = generate_data ( c l u s t e r _ s t a t e _ b i t s )
consider three classifier variants, each containing a dif-
ferent degree of hybridization with classical networks: history = qcnn_model . fit (
x = train_excitations ,
1. Purely quantum CNN y = train_labels ,
2. Hybrid CNN in which the outputs of a truncated batch_size =16 ,
QCNN are fed into a standard densely connected epochs =25 ,
v al id at i on _d at a =(
neural net test_excitations , test_labels ) )
3. Hybrid CNN in which the outputs of multiple trun-
cated QCNNs are fed into a standard densely con- In the code, the generate_data function builds the exci-
nected neural net tation circuits that are applied to the initial cluster state
22

Qconv
+ 1
1 -1
-1
QPool 1
1 1
1 -1
-1 1
1
Prepare Qconv
Cluster + Prepare Qconv
Cluster + MSE Loss
Test QPool State
MSE Loss QPool

Qconv
+
QPool
Figure 15. A simple hybrid architecture in which the outputs
of a truncated QCNN are fed into a classical neural network.

input, along with the associated labels. The loss plots Figure 16. A hybrid architecture in which the outputs of
for both the training and validation datasets can be gen- 3 separate truncated QCNNs are fed into a classical neural
network.
erated by running the associated example notebook.
We now consider a hybrid classifier. Instead of us-
ing quantum layers to pool all the way down to 1 qubit,
we can truncate the QCNN and measure a vector of op- readouts ) ( cluster_state )
erators on the remaining qubits. The resulting vector QCNN_3 = tfq . layers . PQC (
of expectation values is then fed into a classical neural multi_readout_model_circuit (
cluster_state_bits ),
network. This hybrid model is shown schematically in readouts ) ( cluster_state )
Fig. 15. # Feed all QCNNs into a classical NN
This can be achieved in TFQ with a few simple mod- concat_out = tf . keras . layers . concatenate (
ifications to the previous model, implemented with the [ QCNN_1 , QCNN_2 , QCNN_3 ])
code below: dense_1 = tf . keras . layers . Dense (8) ( concat_out )
dense_2 = tf . keras . layers . Dense (1) ( dense_1 )
# Build multi - readout quantum layer m u l t i _ q c o n v _ m o d e l = tf . keras . Model (
readouts = [ cirq . Z ( bit ) for bit in inputs =[ e x c it a t i o n _ i n p u t ] ,
c l u s t e r _ s t a t e _ b i t s [4:]] outputs =[ dense_2 ])
q u a n t u m _ m o d e l _ d u a l = tfq . layers . PQC (
multi_readout_model_circuit (
cluster_state_bits ), We find that that for the same optimization settings, the
readouts ) ( cluster_state ) purely quantum model trains the slowest, while the three-
# Build classical neural network layers
d1_dual = tf . keras . layers . Dense (8) ( quantum-filter hybrid model trains the fastest. This data
quantum_model_dual ) is shown in Fig. 17. This demonstrates the advantage
d2_dual = tf . keras . layers . Dense (1) ( d1_dual ) of exploring hybrid quantum-classical architectures for
hybrid_model = tf . keras . Model ( inputs =[ classifying quantum data.
e x c i t a t i o n _ i n p u t ] , outputs =[ d2_dual ])

In the code, multi_readout_model_circuit applies just one


round of convolution and pooling, reducing the system
size from 8 to 4 qubits. This hybrid model can be trained
using the same Keras tools as the purely quantum model.
Accuracy plots can be seen in the example notebook.
The third architecture we will explore creates three
independent quantum filters, and combines the outputs
from all three with a single classical neural network. This
architecture is shown in Fig. 16. This multi-filter archi-
tecture can be implemented in TFQ as below:
# Build 3 quantum filters
QCNN_1 = tfq . layers . PQC (
multi_readout_model_circuit (
cluster_state_bits ), Figure 17. Mean squared error loss as a function of training
readouts ) ( cluster_state ) epoch for three different hybrid classifiers. We find that the
QCNN_2 = tfq . layers . PQC (
purely quantum classifier trains the slowest, while the hybrid
multi_readout_model_circuit (
cluster_state_bits ), architecture with multiple quantum filters trains the fastest.
23

B. Hybrid Machine Learning for Quantum Control

To run this example in the browser through Colab,


follow the link:
research/control/control.ipynb

Recently, neural networks have been successfully de-


MSE
ployed for solving quantum control problems ranging
from optimizing gate decomposition, error correction
subroutines, to continuous Hamiltonian controls. To fully
leverage the power of neural networks without being hob-
bled by possible computational overhead, it is essential
to obtain a deeper understanding of the connection be- Figure 18. Architecture for hybrid quantum-classical neural
tween various neural network representations and differ- network model for learning a quantum control decomposition.
ent types of quantum control dynamics. We demonstrate
tailoring machine learning architectures to underlying
quantum dynamics using TFQ in [103]. As a summary xj and desired outputs yj , we can then define controller
of how the unique functionalities of TFQ ease quantum error as ej (F ) = |yj − H(F (xj ))|. The optimal
P control
control optimization, we list the problem definition and problem can then be defined as minimizing j ej (F ) for
required TFQ toolboxes as follows. F . This optimal controller will produce the optimal con-
trol vector gj∗ given xj
Target problems: This problem can be solved exactly if H and the re-
lationships between gj∗ and xj are well understood and
1. Learning quantum dynamics. invertible. Alternatively, one can find an approximate
2. Optimizing quantum control signal with regard to solution using a parameterized controller F in the form
a cost objective of a feed-forward neural network. We can calculate a set
3. Error mitigation in realistic quantum device of control pairs of size N : {xi , yi } with i ∈ [N ]. We can
input xi into F which is parameterized by its weight ma-
Required TFQ functionalities: trices {Wi } and biases {bi } of each ith layer. A successful
training will find network parameters given by {Wi } and
1. Hybrid quantum-classical network model {bi } such that for any given input xi the network out-
2. Batch quantum circuit simulator puts gi which leads to a system output H(gi ) u yi . This
3. Quantum expectation-based backpropagation architecture is shown schematically in Fig. 18
4. Fast classical optimizers, both gradient based and There are two important reasons behind the use of
non-gradient based supervised learning for practical control optimization.
We exemplify the importance of appropriately choos- Firstly, not all time-invariant Hamiltonian control prob-
ing the right neural network architecture for the corre- lems permit analytical solutions, so inverting the con-
sponding quantum control problems with two simple but trol error function map can be costly in computation.
realistic control optimizations. The two types of con- Secondly, realistic deployment of even a time-constant
trols we have considered cover the full range of quan- quantum controller faces stochastic fluctuations due to
tum dynamics: constant Hamiltonian evolution vs time noise in the classical electronics and systematic errors
dependent Hamiltonian evolution. In the first problem, which cause the behavior of the system H to deviate
we design a DNN to machine-learn (noise-free) control from ideal. Deploying supervised learning with experi-
of a single qubit. In the second problem, we design an mentally measured control pairs will therefore enable the
RNN with long-term memory to learn a stochastic non- finding of robust quantum control solutions facing sys-
Markovian control noise model. tematic control offset and electronic noise. However, this
necessitates seamlessly connecting the output of a clas-
sical neural network with the execution of a quantum
1. Time-Constant Hamiltonian Control circuit in the laboratory. This functionality is offered by
TFQ through ControlledPQC .
We showcase the use of supervised learning with hy-
If the underlying system Hamiltonian is time-invariant brid quantum-classical feed-forward neural networks in
the task of quantum control can be simplified with open- TFQ for the single-qubit gate decomposition problem. A
loop optimization. Since the optimal solution is indepen- general unitary transform on one qubit can be specified
dent of instance by instance control actualization, control by the exponential
optimization can be done offline. Let x be the input to a
controller, which produces some control vector g = F (x). U (φ, θ1 , θ2 ) = e−iφ(cos θ1 Ẑ+sin θ1 (cos θ2 X̂+sin θ2 Ŷ )) . (21)
This control vector actuates a system which then pro-
duces a vector output H(g). For a set of control inputs However, it is often preferable to enact single qubit uni-
24

using rotations about a single axis at a time. Therefore, given a single qubit gate specified by the vector of three rotations {φ, θ1, θ2}, we want to find the control sequence that implements this gate in the form

    U(β, γ, δ) = e^{iα} e^{−i(β/2)Ẑ} e^{−i(γ/2)Ŷ} e^{−i(δ/2)Ẑ}.   (22)

This is the optimal decomposition, namely the Bloch theorem, of any unknown single-qubit unitary into hardware-friendly gates which only involve rotation along a fixed axis at a time.

The first step in the training involves preparing the training data. Since quantum control optimization only focuses on the performance in hardware deployment, the control inputs and outputs have to be chosen such that they are experimentally observable. We define the vector of expectation values yi = [⟨X̂⟩xi, ⟨Ŷ⟩xi, ⟨Ẑ⟩xi] of all single-qubit Pauli operators given by the quantum state prepared by the associated input xi:

    |ψ0i⟩ = Û0i |0⟩,   (23)
    |ψ⟩x = e^{−i(β/2)Ẑ} e^{−i(γ/2)Ŷ} e^{−i(δ/2)Ẑ} |ψ0i⟩,   (24)
    ⟨X̂⟩x = ⟨ψ|x X̂ |ψ⟩x,   (25)
    ⟨Ŷ⟩x = ⟨ψ|x Ŷ |ψ⟩x,   (26)
    ⟨Ẑ⟩x = ⟨ψ|x Ẑ |ψ⟩x.   (27)

Assuming we have prepared the training dataset, each set consists of input vectors xi = [φ, θ1, θ2] which derive from the randomly drawn g, unitaries that prepare each initial state Û0i, and the associated expectation values yi = [⟨X̂⟩xi, ⟨Ŷ⟩xi, ⟨Ẑ⟩xi].

Now we are ready to define the hybrid quantum-classical neural network model in Keras with the TensorFlow API. To start, we first define the quantum part of the hybrid neural network, which is a simple quantum circuit of three single-qubit gates as follows.

    control_params = sympy.symbols('theta_{1:3}')
    qubit = cirq.GridQubit(0, 0)
    control_circ = cirq.Circuit(
        cirq.Rz(control_params[2])(qubit),
        cirq.Ry(control_params[1])(qubit),
        cirq.Rz(control_params[0])(qubit))

We are now ready to finish off the hybrid network by defining the classical part, which maps the target params to the control vector g = {β, γ, δ}. Assuming we have defined the vector of observables ops, the code to build the model is:

    circ_in = tf.keras.Input(
        shape=(), dtype=tf.dtypes.string)
    x_in = tf.keras.Input((3,))
    d1 = tf.keras.layers.Dense(128)(x_in)
    d2 = tf.keras.layers.Dense(128)(d1)
    d3 = tf.keras.layers.Dense(64)(d2)
    g = tf.keras.layers.Dense(3)(d3)
    exp_out = tfq.layers.ControlledPQC(
        control_circ, ops)([circ_in, g])

Now, we are ready to put everything together to define and train a model in Keras. The two axis control model is defined as follows:

    model = tf.keras.Model(
        inputs=[circ_in, x_in], outputs=exp_out)

To train this hybrid supervised model, we define an optimizer, which in our case is the Adam optimizer, with an appropriately chosen loss function:

    model.compile(optimizer=tf.keras.optimizers.Adam(), loss='mse')

We finish off by training on the prepared supervised data in the standard way:

    history_two_axis = model.fit(...

The training converges after around 100 epochs as seen in Fig. 19, which also shows excellent generalization to validation data.

Figure 19. Mean square error on the training dataset and the validation dataset, each of size 5000, as a function of training epoch.

2. Time-dependent Hamiltonian Control

Now we consider a second kind of quantum control problem, where the actuated system H is allowed to change in time. If the system is changing with time, the optimal control g∗ is also generally time varying. Generalizing the discussion of section IV B 1, we can write the time-varying control error given the time-varying controller F(t) as ej(F(t), t) = |yj − H(F(xj, t), t)|. The optimal control can then be written as g∗(t) = ḡ∗ + δ(t). This task is significantly harder than the problem discussed in section IV B 1, since we need to learn the hidden variable δ(t), which can result in potentially highly complex real-time system dynamics. We showcase how TFQ provides the perfect toolbox for such difficult control optimization with an important and realistic problem of learning and thus compensating for low frequency noise.

One of the main contributions to time-drifting errors in realistic quantum control is 1/f^α-like errors, which encapsulate errors in the Hamiltonian amplitudes whose frequency spectrum has a large component in the low frequency regime. The origin of such low frequency noise remains largely controversial. Mathematically, we can parameterize the low frequency noise in the time domain with the amplitude of the Pauli Z Hamiltonian on each

qubit as:

    Ĥlow(t) = α t^e Ẑ.   (28)

A simple phase control signal is given by ω(t) = ω0 t with the Hamiltonian Ĥ0(t) = ω0 t Ẑ. The time-dependent wavefunction is then given by

    |ψ(ti)⟩ = T[e^{∫_0^{ti} (Ĥlow(t) + Ĥ0(t)) dt}] |+⟩.   (29)

We can attempt to learn the noise parameters α and e by training a recurrent neural network to perform time-series prediction on the noise. In other words, given a record of expectation values {⟨ψ(ti)| X̂ |ψ(ti)⟩} for ti ∈ {0, δt, 2δt, . . . , T} obtained on state |ψ(t)⟩, we want the RNN to predict the future observables {⟨ψ(ti)| X̂ |ψ(ti)⟩} for ti ∈ {T, T + δt, T + 2δt, . . . , 2T}.

There are several possible ways to build such an RNN. One option is recording several timeseries on a device a priori, later training and testing an RNN offline. Another option is an online method, which would allow for real-time controller tuning. The offline method will be briefly explained here, leaving the details of both methods to the notebook associated with this example.

First, we can use TFQ or Cirq to prepare several time-series for testing and validation. The function below performs this task using TFQ:

    def generate_data(end_time, timesteps, omega_0,
                      exponent, alpha):
        t_steps = np.linspace(0, end_time, timesteps)
        q = cirq.GridQubit(0, 0)
        phase = sympy.symbols("phaseshift")
        c = cirq.Circuit(cirq.H(q),
                         cirq.Rz(phase)(q))
        ops = [cirq.X(q)]
        phases = t_steps * omega_0 + \
            alpha * t_steps**(exponent + 1) / (exponent + 1)
        return tfq.layers.Expectation()(
            c,
            symbol_names=[phase],
            symbol_values=np.transpose([phases]),
            operators=ops)

We can use this function to prepare many realizations of the noise process, which can be fed to an LSTM defined using tf.keras as below:

    model = tf.keras.Sequential([
        tf.keras.layers.LSTM(
            rnn_units,
            recurrent_initializer='glorot_uniform',
            batch_input_shape=[batch_size, None, 1]),
        tf.keras.layers.Dense(1)])

We can then train this LSTM on the realizations and evaluate the success of our training using validation data, on which we calculate prediction accuracy. A typical example of this is shown in Fig. 20. The LSTM converges quickly to within the accuracy of the expectation value measurements within 30 epochs.

Figure 20. Mean square error of LSTM predictions on 500 randomly generated inputs.

C. Quantum Approximate Optimization

To run this example in the browser through Colab, follow the link:
research/qaoa/qaoa.ipynb

1. Background

In this section, we introduce the basics of Quantum Approximate Optimization Algorithms and show how to implement a basic case of this class of algorithms in TFQ. In the advanced applications section V, we explore how to apply meta-learning techniques [104] to the optimization of the parameters of the algorithm.

The Quantum Approximate Optimization Algorithm was originally proposed to solve instances of the MaxCut problem [12]. The QAOA framework has since been extended to encompass multiple problem classes related to finding the low-energy states of Ising Hamiltonians, including Hamiltonians of higher order and continuous-variable Hamiltonians [48].

In general, the goal of the QAOA for binary variables is to find approximate minima of a pseudo-Boolean function f on n bits, f(z), z ∈ {−1, 1}^×n. This function is often an mth-order polynomial of binary variables for some positive integer m, e.g., f(z) = Σ_{p∈{0,1}^m} αp z^p, where z^p = Π_{j=1}^{n} zj^{pj}. QAOA has been applied to NP-hard problems such as Max-Cut [12] or Max-3-Lin-2 [105]. The case where this polynomial is quadratic (m = 2) has been extensively explored in the literature. It should be noted that there have also been recent advances using quantum-inspired machine learning techniques, such as deep generative models, to produce approximate solutions to such problems [106, 107]. These 2-local problems will be the main focus in this example. In this tutorial, we first show how to utilize TFQ to solve a MaxCut instance with QAOA with p = 1.

The QAOA approach to optimization first starts in an initial product state |ψ0⟩^⊗n and then a tunable gate sequence produces a wavefunction with a high probability of being measured in a low-energy state (with respect to a cost Hamiltonian).
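To make the pseudo-Boolean objective above concrete before turning to the TFQ implementation, the short classical sketch below is our own toy illustration and is not taken from the QAOA notebook; the three-node triangle graph and the helper name maxcut_value are assumptions made here. It evaluates the MaxCut objective f(z) = Σ_{{j,k}∈E} (1 − zj zk)/2, i.e. the number of cut edges, for candidate assignments z ∈ {−1, 1}^n:

    # Toy illustration (not from the TFQ notebook): the MaxCut objective as a
    # pseudo-Boolean function of z in {-1, +1}^n, counting the edges that are cut.
    def maxcut_value(z, edges):
        return sum((1 - z[j] * z[k]) / 2 for j, k in edges)

    triangle = [(0, 1), (1, 2), (0, 2)]
    print(maxcut_value([1, 1, 1], triangle))    # 0.0: no edge is cut
    print(maxcut_value([1, -1, 1], triangle))   # 2.0: the best possible cut of a triangle

QAOA aims to prepare a quantum state whose measurement outcomes concentrate on assignments z that score well on exactly this kind of classical objective.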

Let us define our parameterized quantum circuit ansatz. The canonical choice is to start with a uniform superposition |ψ0⟩^⊗n = |+⟩^⊗n = (1/√2^n) Σ_{x∈{0,1}^n} |x⟩, hence a fixed state. The QAOA unitary itself then consists of applying

    Û(η, γ) = Π_{j=1}^{P} e^{−iηj ĤM} e^{−iγj ĤC},   (30)

onto the starter state, where ĤM = Σ_{j∈V} X̂j is known as the mixer Hamiltonian, and ĤC ≡ f(Ẑ) is our cost Hamiltonian, which is a function of Pauli operators Ẑ = {Ẑj}_{j=1}^{n}. The resulting state is given by |Ψηγ⟩ = Û(η, γ)|+⟩^⊗n, which is our parameterized output. We define the energy to be minimized as the expectation value of the cost Hamiltonian ĤC ≡ f(Ẑ), where Ẑ = {Ẑj}_{j=1}^{n}, with respect to the output parameterized state.

Target problems:
1. Train a parameterized quantum circuit for a discrete optimization problem (MaxCut)
2. Minimize a cost function of a parameterized quantum circuit

Required TFQ functionalities:
1. Conversion of simple circuits to TFQ tensors
2. Evaluation of gradients for quantum circuits
3. Use of gradient-based optimizers from TF

2. Implementation

For the MaxCut QAOA, the cost Hamiltonian function f is a second order polynomial of the form,

    ĤC = f(Ẑ) = Σ_{{j,k}∈E} (1/2)(Î − Ẑj Ẑk),   (31)

where G = {V, E} is a graph for which we would like to find the MaxCut: the largest size subset of edges (cut set) such that vertices at the ends of these edges belong to different partitions of the vertices into two disjoint subsets [12].

To train the QAOA, we simply optimize the expectation value of our cost Hamiltonian with respect to our parameterized output to find (approximately) optimal parameters; η∗, γ∗ = argmin_{η,γ} L(η, γ), where L(η, γ) = ⟨Ψηγ| ĤC |Ψηγ⟩ is our loss. Once trained, we use the QPU to sample the probability distribution of measurements of the parameterized output state at optimal angles in the standard basis, x ∼ p(x) = |⟨x|Ψη∗γ∗⟩|², and pick the lowest energy bitstring from those samples as our approximate optimum found by the QAOA.

Let us walk through how to implement such a basic QAOA in TFQ. The first step is to generate an instance of the MaxCut problem. For this tutorial we generate a random 3-regular graph with 10 nodes with NetworkX [108].

    # generate a 3-regular graph with 10 nodes
    maxcut_graph = nx.random_regular_graph(n=10, d=3)

The next step is to allocate 10 qubits, to define the Hadamard layer generating the initial superposition state, the mixing Hamiltonian ĤM, and the cost Hamiltonian ĤC.

    # define 10 qubits
    cirq_qubits = cirq.GridQubit.rect(1, 10)
    # create layer of hadamards to initialize the
    # superposition state of all computational states
    hadamard_circuit = cirq.Circuit()
    for node in maxcut_graph.nodes():
        qubit = cirq_qubits[node]
        hadamard_circuit.append(cirq.H.on(qubit))
    # define the two parameters for one block of QAOA
    qaoa_parameters = sympy.symbols('a b')
    # define the mixing and the cost Hamiltonians
    mixing_ham = 0
    for node in maxcut_graph.nodes():
        qubit = cirq_qubits[node]
        mixing_ham += cirq.PauliString(cirq.X(qubit))
    cost_ham = maxcut_graph.number_of_edges() / 2
    for edge in maxcut_graph.edges():
        qubit1 = cirq_qubits[edge[0]]
        qubit2 = cirq_qubits[edge[1]]
        cost_ham += cirq.PauliString(1 / 2 * (cirq.Z(qubit1) * cirq.Z(qubit2)))

With this, we generate the unitaries representing the quantum circuit:

    # generate the qaoa circuit
    qaoa_circuit = tfq.util.exponential(
        operators=[cost_ham, mixing_ham],
        coefficients=qaoa_parameters)

Subsequently, we use these ingredients to build our model. We note here that in this case QAOA has no input data and labels, as we have mapped our graph to the QAOA circuit. To use the TFQ framework we specify the Hadamard circuit as input and convert it to a TFQ tensor. We may then construct a tf.keras model using our QAOA circuit and cost in a TFQ PQC layer, and use a single instance sample for training the variational parameters of the QAOA, with the Hadamard gates as an input layer and a target value of 0 for our loss function, as this is the theoretical minimum of this optimization problem.

This translates into the following code:

    # define the model and training data
    model_circuit, model_readout = qaoa_circuit, cost_ham
    input_ = [hadamard_circuit]
    input_ = tfq.convert_to_tensor(input_)

    optimum = [0]
    optimum = np.array(optimum)

    # Build the Keras model.
    model = tf.keras.Sequential()
    model.add(tf.keras.layers.Input(shape=(), dtype=tf.dtypes.string))
    model.add(tfq.layers.PQC(model_circuit, model_readout))

To optimize the parameters of the ansatz state, we use a classical optimization routine. In general, it would be possible to use pre-calculated parameters [109] or to implement optimization routines tailored to the QAOA [110]. For this tutorial, we choose the Adam optimizer implemented in TensorFlow. We also choose the mean absolute error as our loss function.

    model.compile(loss=tf.keras.losses.mean_absolute_error,
                  optimizer=tf.keras.optimizers.Adam())
    history = model.fit(input_, optimum, epochs=1000, verbose=1)

V. ADVANCED QUANTUM APPLICATIONS

The following applications represent how we have applied TFQ to accelerate the discovery of new quantum algorithms. The examples presented in this section are newer research compared to the previous section; as such they have not had as much time for feedback from the community. We include these here as they are demonstrations of the sort of advanced QML research that can be accomplished by combining several building blocks provided in TFQ. As many of these examples involve the building and training of hybrid quantum-classical models and advanced optimizers, such research would be much more difficult to implement without TFQ. In our researchers' experience, the performance gains and the ease of use of TFQ decreased the time-to-working-prototype from weeks to days or even hours when compared to using alternative tools.

Finally, as we would like to provide users with advanced examples to see TFQ in action for research use-cases beyond basic implementations, along with the examples presented in this section are several notebooks accessible on Github:

github.com/tensorflow/quantum/tree/research

We encourage readers to read the section below for an overview of the theory and use of TFQ functions, and would encourage avid readers who want to experiment with the code to visit the full notebooks.

A. Meta-learning for Variational Quantum Optimization

To run this example in the browser through Colab, follow the link:
research/metalearning qaoa/metalearning qaoa.ipynb

Figure 21. Quantum-classical computational graph for the meta-learning optimization of the recurrent neural network (RNN) optimizer and a quantum neural network (QNN) over several optimization iterations. The hidden state of the RNN is represented by h; we represent the flow of data used to evaluate the meta-learning loss function. This meta loss function L is a functional of the history of expectation value estimate samples y = {yt}_{t=1}^{T}; it is not directly dependent on the RNN parameters ϕ. TFQ's hybrid quantum-classical backpropagation then becomes key to train the RNN to learn to optimize the QNN, which in our particular example was the QAOA. Figure taken from [45], originally inspired from [104].

In section IV C, we have shown how to implement basic QAOA in TFQ and optimize it with a gradient-based optimizer; we can now explore how to leverage classical neural networks to optimize QAOA parameters. To run this example in the browser via Colab, follow the link:
research/metalearning qaoa/metalearning qaoa.ipynb

In recent works, the use of classical recurrent neural networks to learn to optimize the parameters [45] (or gradient descent hyperparameters [111]) was proposed. As the choice of parameters after each iteration of quantum-classical optimization can be seen as the task of generating a sequence of parameters which converges rapidly to an approximate optimum of the landscape, we can use a type of classical neural network that is naturally suited to generate sequential data, namely, recurrent neural networks. This technique was derived from work by DeepMind [104] for optimization of classical neural networks and was extended to be applied to quantum neural networks [45].

The application of such classical learning-to-learn techniques to quantum neural networks was first proposed in [45]. In this work, an RNN (long short term memory; LSTM) gets fed the parameters of the current iteration and the value of the expectation of the cost Hamiltonian of the QAOA, as depicted in Fig. 21. More precisely, the RNN receives as input the previous QNN query's estimated cost function expectation yt ∼ p(y|θt), where yt is the estimate of ⟨Ĥ⟩t, as well as the parameters for which the QNN was evaluated, θt. The RNN at this time step also receives information stored in its internal hidden state from the previous time step, ht. The RNN itself has trainable parameters ϕ, and hence it applies the parameterized mapping

    ht+1, θt+1 = RNNϕ(ht, θt, yt)   (32)

which generates a new suggestion for the QNN parameters as well as a new hidden state. Once this new set of QNN parameters is suggested, the RNN sends it to the QPU for evaluation and the loop continues.

The RNN is trained over random instances of QAOA problems selected from an ensemble of possible QAOA MaxCut problems. See the notebook for full details on the meta-training dataset of sampled problems.

The loss function we chose for our experiments is the observed improvement at each time step, summed over the history of the optimization:

    L(ϕ) = E_{f,y}[ Σ_{t=1}^{T} min{f(θt) − min_{j<t}[f(θj)], 0} ],   (33)

The observed improvement at time step t is given by the difference between the proposed value, f(θt), and the best value obtained over the history of the optimization until that point, min_{j<t}[f(θj)].

In our particular example in this section, we will consider a time horizon of 5 time steps, hence the RNN will have to learn to very rapidly approximately optimize the parameters of the QAOA. Results are featured in Fig. 22. The details of the implementation are available in the Colab. Here is an overview of the problem that was tackled and the TFQ features that facilitated this implementation:

Target problems:
1. Learning to learn with quantum neural networks via classical neural networks
2. Building a neural-network-based optimizer for QAOA
3. Lowering the number of iterations needed to optimize QAOA

Required TFQ functionalities:
1. Hybrid quantum-classical networks and hybrid backpropagation
2. Batching training over quantum data (QAOA problem instances)
3. Integration with TF for the classical RNN

Figure 22. The path chosen by the RNN optimizer on a 12-qubit MaxCut problem after being trained on a set of random 10-qubit MaxCut problems. We see that the neural network learned to generalize its heuristic to larger system sizes, as originally pointed out in [45].

B. Vanishing Gradients and Adaptive Layerwise Training Strategies

1. Random Quantum Circuits and Barren Plateaus

When using parameterized quantum circuits for a learning task, inevitably one must choose an initial configuration and training strategy that is compatible with that initialization. In contrast to problems with more known structure, such as specific quantum simulation tasks [24] or optimizations [112], the structure of circuits used for learning may need to be more adaptive or general to encompass unknown data distributions. In classical machine learning, this problem can be partially addressed by using a network with sufficient expressive power and random initialization of the network parameters.

Unfortunately, it has been proven that due to fundamental limits on quantum readout complexity in combination with the geometry of the quantum space, an exact duplication of this strategy is doomed to fail [113]. In particular, in analogy to the vanishing gradients problem that has plagued deep classical networks and historically slowed their progress [114], an exacerbated version of this problem appears in quantum circuits of sufficient depth that are randomly initialized. This problem, also known as the problem of barren plateaus, refers to the overwhelming volume of quantum space with an exponentially small gradient, making straightforward training impossible if one enters one of these dead regions. The rate of this vanishing increases exponentially with the number of qubits and depends on whether the cost function is global or local [115]. While strategies have been developed to deal with the challenges of vanishing gradients classically [116], the combination of differences in readout complexity and other constraints of unitarity make direct implementation of these fixes challenging. In particular, the readout of information from a quantum system has a complexity of O(1/ε^α), where ε is the desired precision and α is some small integer, while the complexity of the same task classically often scales as O(log 1/ε) [117]. This means that for a vanishingly small gradient (e.g. 10^−7), a classical algorithm can easily obtain at least some signal, while a quantum-classical one may diffuse essentially randomly until ∼10^14 samples have been taken. This has fundamental consequences for the methods one uses to train, as we detail below. The requirement on depth to reach these plateaus is only that a portion of the circuit approximates a unitary 2-design, which can occur at a depth O(n^{1/d}), where n is the number of qubits and d is the dimension of the connectivity of the quantum circuit, possibly requiring as little depth as O(log(n)) in the all-to-all case [118]. One may imagine that a solution to this problem could be

to simply initialize a circuit to the identity to avoid this problem, but this incurs some subtle challenges. First, such a fixed initialization will tend to bias results on general datasets. This challenge has been studied in the context of more general block initialization of identity schemes [119].

Perhaps the more insidious way this problem arises is that training with a method like stochastic gradient descent (or sophisticated variants like Adam) on the entire network can accidentally lead one onto a barren plateau if the learning rate is too high. This is due to the fact that the barren plateaus argument is one of volume of space and quantum-classical information exchange, and random diffusion in parameter space will tend to lead one onto a plateau. This means that even the most clever initialization can be thwarted by the impact of this phenomenon on the training process. In practice this severely limits the learning rate and hence the training efficiency of QNNs.

For this reason, one can consider training on subsets of the network which do not have the ability to completely randomize during a random walk. This layerwise learning strategy allows one to use larger learning rates and improves training efficiency in quantum circuits [120]. We advocate the use of these strategies in combination with appropriately designed local cost functions in order to circumvent the dramatically worse problems with objectives like fidelity [113, 115]. TFQ has been designed to make experimenting with both of these strategies straightforward for the user, as we now document. For an example of barren plateaus, see the notebook at the following link:

docs/tutorials/barren plateaus.ipynb

2. Layerwise quantum circuit learning

So far, the network training methods demonstrated in section IV have focused on simultaneous optimization of all network parameters, or end-to-end training. As alluded to in the section on the barren plateaus effect (V B), this type of strategy, when combined with a network of sufficient depth, can force reduced learning rates, even with clever initialization. While this may not be the optimal strategy for every realization of a quantum network, TFQ is designed to easily facilitate testing of this idea in conjunction with different cost functions to enhance efficiency. An alternative strategy that has been beneficial is layerwise learning (LL) [120], where the number of trained parameters is altered on the fly. In this section, we will learn to alter the architecture of a circuit while it trains, and restrict attention to blocks of size insufficient to randomize onto a plateau. Among other things, this type of learning strategy can help us avoid initializing on, or drifting throughout training onto, a barren plateau [113, 120]. In [121], it is also shown that gradient-based algorithms are more successful in finding global minima with overparameterized circuits, and that shallow circuits approach this limit when increasing in size. It is not necessarily clear when this transition occurs, so LL is a cost-efficient strategy to approach good local minima.

Target problems:
1. Dynamically building circuits for arbitrary learning tasks
2. Manipulating circuit structure and parameters during training
3. Reducing the number of trained parameters and circuit depth
4. Avoiding initialization on, or drifting to, a barren plateau

Required TFQ functionalities:
1. Parameterized circuit layers
2. Keras weight manipulation interface
3. Parameter shift differentiator for exact gradient computation

To run this example in the browser through Colab, follow the link:
research/layerwise learning/layerwise learning.ipynb

As an example to show how this functionality may be explored in TFQ, we will look at randomly generated layers as described in section V B, where one layer consists of a randomly chosen rotation gate around the X, Y, or Z axis on each qubit, followed by a ladder of CZ gates over all qubits.

    def create_layer(qubits, layer_id):
        # create symbols for trainable parameters
        symbols = [
            sympy.Symbol(f'{layer_id}-{str(i)}')
            for i in range(len(qubits))]
        # build layer from random gates
        gates = [
            random.choice([
                cirq.Rx, cirq.Ry, cirq.Rz])(
                    symbols[i])(q)
            for i, q in enumerate(qubits)]
        # add connections between qubits
        for control, target in zip(qubits, qubits[1:]):
            gates.append(cirq.CZ(control, target))
        return gates, symbols

We assume that we don't know the ideal circuit structure to solve our learning problem, so we start with the shallowest circuit possible and let our model grow from there. In this case we start with one initial layer, and add a new layer after it has trained for 10 epochs. First, we need to specify some variables:

    # number of qubits and layers in our circuit
    n_qubits = 6
    n_layers = 8

    # define data and readout qubits
    data_qubits = cirq.GridQubit.rect(1, n_qubits)
    readout = cirq.GridQubit(0, n_qubits - 1)
    readout_op = cirq.Z(readout)

    # symbols to parametrize circuit
    symbols = []
    layers = []
    weights = []

We use the same training data as specified in the TFQ MNIST classifier example notebook available in the TFQ Github repository, which encodes a downsampled version of the digits into binary vectors. Ones in these vectors are encoded as local X gates on the corresponding qubit in the register, as shown in [23]. For this reason, we also use the readout procedure specified in that work, where a sequence of XHX gates is added to the readout qubit at the end of the circuit. Now we train the circuit, layer by layer:

    for layer_id in range(n_layers):
        circuit = cirq.Circuit()
        layer, layer_symbols = create_layer(
            data_qubits, f'layer_{layer_id}')
        layers.append(layer)

        circuit += layers
        symbols += layer_symbols

        # set up the readout qubit
        circuit.append(cirq.X(readout))
        circuit.append(cirq.H(readout))
        circuit.append(cirq.X(readout))
        readout_op = cirq.Z(readout)

        # create the Keras model
        model = tf.keras.Sequential()
        model.add(tf.keras.layers.Input(
            shape=(), dtype=tf.dtypes.string))
        model.add(tfq.layers.PQC(
            model_circuit=circuit,
            operators=readout_op,
            differentiator=ParameterShift(),
            initializer=tf.keras.initializers.Zeros))
        model.compile(
            loss=tf.keras.losses.squared_hinge,
            optimizer=tf.keras.optimizers.Adam(
                learning_rate=0.01))

        # Update model parameters and add
        # new 0 parameters for new layers.
        model.set_weights(
            [np.pad(weights, (n_qubits, 0))])
        model.fit(x_train,
                  y_train,
                  batch_size=128,
                  epochs=10,
                  verbose=1,
                  validation_data=(x_test, y_test))

        qnn_results = model.evaluate(x_test, y_test)

        # store weights after training a layer
        weights = model.get_weights()[0]

In general, one can choose many different configurations of how many layers should be trained in each step. One can also control which layers are trained by manipulating the symbols we feed into the circuit and keeping track of the weights of previous layers. The number of layers, layers trained at a time, epochs spent on a layer, and learning rate are all hyperparameters whose optimal values depend on both the data and the structure of the circuits being used for learning. This example is meant to exemplify how TFQ can be used to easily explore these choices to maximize the efficiency and efficacy of training. See our notebook linked above for the complete implementation of these features. Using TFQ to explore this type of learning strategy relieves us of manually implementing training procedures and optimizers, and autodifferentiation with the parameter shift rule. It also lets us readily use the rich functionality provided by TensorFlow and Keras. Implementing and testing all of the functionality needed in this project by hand could take up to a week, whereas all this effort reduces to a couple of lines of code with TFQ as shown in the notebook. Additionally, it lets us speed up training by using the integrated qsim simulator as shown in section II F 4. Last but not least, TFQ provides a thoroughly tested and maintained QML framework which greatly enhances the reproducibility of our research.

C. Hamiltonian Learning with Quantum Graph Recurrent Neural Networks

1. Motivation: Learning Quantum Dynamics with a Quantum Representation

Quantum simulation of time evolution was one of the originally envisioned applications of quantum computers when they were first proposed by Feynman [9]. Since then, quantum time evolution simulation methods have seen several waves of great progress, from the early days of Trotter-Suzuki methods, to methods of qubitization and randomized compiling [78], and recently to variational methods for approximate time evolution [122].

The reason that quantum simulation has been such a focus of the quantum computing community is that we have some indications that quantum computers can demonstrate a quantum advantage when evolving quantum states through unitary time evolution; the classical simulation overhead scales exponentially with the depth of the time evolution.

As such, it is natural to consider whether such a potential quantum simulation advantage can be extended to the realm of quantum machine learning as an inverse problem, that is, given access to some black-box dynamics, can we learn a Hamiltonian such that time evolution under this Hamiltonian replicates the unknown dynamics. This is known as the problem of Hamiltonian learning, or quantum dynamics learning, which has been studied in the literature [60, 123]. Here, we use a Quantum Neural

Network-based approach to learn the Hamiltonian of a quantum dynamical process, given access to quantum states at various time steps.

As was pointed out in the barren plateaus section V B, attempting to do QML with no prior on the physics of the system or no imposed structure of the ansatz hits the quantum version of the no free lunch theorem; the network has too high a capacity for the problem at hand and is thus hard to train, as evidenced by its vanishing gradients. Here, instead, we use a highly structured ansatz, from work featured in [124]. First of all, given that we know we are trying to replicate quantum dynamics, we can structure our ansatz to be based on Trotter-Suzuki evolution [77] of a learnable parameterized Hamiltonian. This effectively performs a form of parameter tying in our ansatz between several layers representing time evolution. In a previous example on quantum convolutional networks, IV A 1, we performed parameter tying for spatial translation invariance, whereas here we will assume the dynamics remain constant through time and perform parameter tying across time, hence it is akin to a quantum form of recurrent neural networks (RNN). More precisely, as it is a parameterization of a Hamiltonian evolution, it is akin to a quantum form of recently proposed models in classical machine learning called Hamiltonian neural networks [125].

Beyond the quantum RNN form, we can impose further structure. We can consider a scenario where we know we have a one-dimensional quantum many-body system. As Hamiltonians of physical systems have local couplings, we can use our prior assumptions of locality in the Hamiltonian and encode this as a graph-based parameterization of the Hamiltonian. As we will see below, by using a Quantum Graph Recurrent Neural Network [124] implemented in TFQ, we will be able to learn the effective Hamiltonian topology and coupling strengths quite accurately, simply from having access to quantum states at different times and employing mini-batched gradient-based training.

Before we proceed, it is worth mentioning that the approach featured in this section is quite different from the learning of quantum dynamics using a classical RNN featured in the previous example, section IV B. As sampling the output of a quantum simulation at different times can become exponentially hard, we can imagine that for large systems, the quantum RNN dynamics learning approach could have primacy over the classical RNN approach, thus potentially demonstrating a quantum advantage of QML over classical ML for this problem.

Target problems:
1. Preparing low-energy states of a quantum system
2. Learning Quantum Dynamics using a Quantum Neural Network Model

Required TFQ functionalities:
1. Quantum compilation of exponentials of Hamiltonians
2. Training multi-layered quantum neural networks with shared parameters
3. Batching QNN training data (input-output pairs and time steps) for supervised learning of a quantum unitary map

2. Implementation

Please see the tutorial notebook for full code details:
research/qgrnn ising/qgrnn ising.ipynb

Here we provide an overview of our implementation. We can define a general Quantum Graph Neural Network as a repeating sequence of exponentials of a Hamiltonian defined on a graph, Ûqgnn(η, θ) = Π_{p=1}^{P} [ Π_{q=1}^{Q} e^{−iη_{pq} Ĥq(θ)} ], where the Ĥq(θ) are generally 2-local Hamiltonians whose coupling topology is that of an assumed graph structure.

In our Hamiltonian learning problem, we aim to learn a target Ĥtarget, which will be an Ising model Hamiltonian with Jjk as couplings and Bv as the site bias term of each spin, i.e., Ĥtarget = Σ_{j,k} Jjk Ẑj Ẑk + Σ_v Bv Ẑv + Σ_v X̂v, given access to pairs of states at different times that were subjected to the target time evolution operator Û(T) = e^{−iĤtarget T}.

We will use a recurrent form of QGNN, using Hamiltonian generators Ĥ1(θ) = Σ_{v∈V} αv X̂v and Ĥ2(θ) = Σ_{{j,k}∈E} θjk Ẑj Ẑk + Σ_{v∈V} φv Ẑv, with trainable parameters {θjk, φv, αv}¹, for our choice of graph structure prior G = {V, E}. The QGRNN then resembles applying a Trotterized time evolution of a parameterized Ising Hamiltonian Ĥ(θ) = Ĥ1(θ) + Ĥ2(θ), where P is the number of Trotter steps. This is a good parameterization to learn the effective Hamiltonian of the black-box dynamics, as we know from quantum simulation theory that Trotterized time evolution operators can closely approximate true dynamics in the limit of |ηjk| → 0 while P → ∞.

¹ For simplicity we set αv to constant 1's in this example.

For our TFQ software implementation, we can initialize the Ising model & QGRNN model parameters as random values on a graph. It is very easy to construct this kind of graph structure Hamiltonian by using the Python NetworkX library.

    N = 6
    dt = 0.01
    # Target Ising model parameters
    G_ising = nx.cycle_graph(N)
    ising_w = [dt * np.random.random() for _ in G_ising.edges]
    ising_b = [dt * np.random.random() for _ in G_ising.nodes]

Because the target Hamiltonian and its nearest-neighbor graph structure is unknown to the QGRNN, we need to

initialize a new random graph prior for our QGRNN. In this example we will use a random 4-regular graph with a cycle as our prior. Here, params is a list of trainable parameters of the QGRNN.

    # QGRNN model parameters
    G_qgrnn = nx.random_regular_graph(n=N, d=4)
    qgrnn_w = [dt] * len(G_qgrnn.edges)
    qgrnn_b = [dt] * len(G_qgrnn.nodes)
    theta = ['theta{}'.format(e) for e in G_qgrnn.edges]
    phi = ['phi{}'.format(v) for v in G_qgrnn.nodes]
    params = theta + phi

Now that we have the graph structure and the weights of edges & nodes, we can construct a Cirq-based Hamiltonian operator which can be directly calculated in Cirq and TFQ. To create a Hamiltonian by using cirq.PauliSum's or cirq.PauliString's, we need to assign appropriate qubits to them. Let's assume Hamiltonian() is the Hamiltonian preparation function that generates the cost Hamiltonian from interaction weights and the mixer Hamiltonian from bias terms. We can bring in the qubits of the Ising & QGRNN models by using cirq.GridQubit.

    qubits_ising = cirq.GridQubit.rect(1, N)
    qubits_qgrnn = cirq.GridQubit.rect(1, N, 0, N)
    ising_cost, ising_mixer = Hamiltonian(
        G_ising, ising_w, ising_b, qubits_ising)
    qgrnn_cost, qgrnn_mixer = Hamiltonian(
        G_qgrnn, qgrnn_w, qgrnn_b, qubits_qgrnn)

To train the QGRNN, we need to create an ensemble of states which are to be subjected to the unknown dynamics. We chose to prepare low-energy states by first performing a Variational Quantum Eigensolver (VQE) [28] optimization to obtain an approximate ground state. Following this, we can apply different amounts of simulated time evolution onto this state to obtain a varied dataset. This emulates having a physical system in a low-energy state and randomly picking the state at different times. First things first, let us build a VQE model

    def VQE(H_target, q):
        # Parameters
        x = ['x{}'.format(i) for i, _ in enumerate(q)]
        z = ['z{}'.format(i) for i, _ in enumerate(q)]
        symbols = x + z
        circuit = cirq.Circuit()
        circuit.append(cirq.X(q_)**sympy.Symbol(x_)
                       for q_, x_ in zip(q, x))
        circuit.append(cirq.Z(q_)**sympy.Symbol(z_)
                       for q_, z_ in zip(q, z))

Now that we have a parameterized quantum circuit, we can minimize the expectation value of the given Hamiltonian. Again, we can construct a Keras model with Expectation. Because the output expectation values are calculated separately per operator, we need to sum them up at the end.

    circuit_input = tf.keras.Input(
        shape=(), dtype=tf.string)
    output = tfq.layers.Expectation()(
        circuit_input,
        symbol_names=symbols,
        operators=tfq.convert_to_tensor(
            [H_target]))
    output = tf.math.reduce_sum(
        output, axis=-1, keepdims=True)

Finally, we can get approximated lowest energy states of the VQE model by compiling and training the above Keras model.²

    model = tf.keras.Model(
        inputs=circuit_input, outputs=output)
    adam = tf.keras.optimizers.Adam(
        learning_rate=0.05)
    low_bound = -np.sum(np.abs(ising_w + ising_b)) - N
    inputs = tfq.convert_to_tensor([circuit])
    outputs = tf.convert_to_tensor([[low_bound]])
    model.compile(optimizer=adam, loss='mse')
    model.fit(x=inputs, y=outputs,
              batch_size=1, epochs=100)
    params = model.get_weights()[0]
    res = {k: v for k, v in zip(symbols, params)}
    return cirq.resolve_parameters(circuit, res)

² Here is a tip for training. Setting the output true value to the theoretical lower bound, we can minimize our expectation value in the Keras model fit framework. That is, we can use the inequality ⟨Ĥtarget⟩ = Σ_{jk} Jjk ⟨Ẑj Ẑk⟩ + Σ_v Bv ⟨Ẑv⟩ + Σ_v ⟨X̂v⟩ ≥ −Σ_{jk} |Jjk| − Σ_v |Bv| − N.

Now that the VQE function is built, we can generate the initial quantum data input with the low energy states near to the ground state of the target Hamiltonian, for both our data and our input state to our QGRNN.

    H_target = ising_cost + ising_mixer
    low_energy_ising = VQE(H_target, qubits_ising)
    low_energy_qgrnn = VQE(H_target, qubits_qgrnn)

The QGRNN is fed the same input data as the true process. We will use gradient-based training over minibatches of randomized timesteps chosen for our QGRNN and the target quantum evolution. We will thus need to aggregate the results among the different timestep evolutions to train the QGRNN model. To create these time evolution exponentials, we can use the tfq.util.exponential function to exponentiate our target and QGRNN Hamiltonians³

    exp_ising_cost = tfq.util.exponential(
        operators=ising_cost)
    exp_ising_mix = tfq.util.exponential(
        operators=ising_mixer)
    exp_qgrnn_cost = tfq.util.exponential(
        operators=qgrnn_cost, coefficients=params)
    exp_qgrnn_mix = tfq.util.exponential(
        operators=qgrnn_mixer)

³ Here, we use the terminology cost and mixer Hamiltonians as the Trotterization of an Ising model time evolution is very similar to a QAOA, and thus we borrow nomenclature from this analogous QNN.

Here we randomly pick 15 timesteps and apply the Trotterized time evolution operators using our above constructed exponentials. We can have a quantum dataset {(|ψTj⟩, |φTj⟩) | j = 1..M}, where M is the number of
data, or batch size (in our case we chose M = 15), |ψTj⟩ = Û^j_target |ψ0⟩ and |φTj⟩ = Û^j_qgrnn |ψ0⟩.

    def trotterize(inp, depth, cost, mix):
        add = tfq.layers.AddCircuit()
        outp = add(cirq.Circuit(), append=inp)
        for _ in range(depth):
            outp = add(outp, append=cost)
            outp = add(outp, append=mix)
        return outp

    batch_size = 15
    T = np.random.uniform(0, T_max, batch_size)
    depth = [int(t / dt) + 1 for t in T]
    true_states = []
    pred_states = []
    for P in depth:
        true_states.append(
            trotterize(low_energy_ising, P,
                       exp_ising_cost, exp_ising_mix))
        pred_states.append(
            trotterize(low_energy_qgrnn, P,
                       exp_qgrnn_cost, exp_qgrnn_mix))

Now we have quantum data both from (1) the true time evolution of the target Ising model and (2) the predicted data state from the QGRNN. In order to maximize the overlap between these two wavefunctions, we can aim to maximize the fidelity between the true state and the state output by the QGRNN, averaged over random choices of time evolution. To evaluate the fidelity between two quantum states (say |A⟩ and |B⟩) on a quantum computer, a well-known approach is to perform the swap test [126]. In the swap test, an additional observer qubit is used; by putting this qubit in a superposition and using it as the control for a Fredkin gate (controlled-SWAP), followed by a Hadamard on the observer qubit, the observer qubit's expectation value then encodes the fidelity of the two states, |⟨A|B⟩|². Thus, right after the fidelity swap test, we can measure the swap test qubit with the Pauli Ẑ operator using Expectation, ⟨Ẑtest⟩, and then we can calculate the average fidelity (inner product) between a batch of two sets of quantum data states, which can be used as our classical loss function in TensorFlow.

    # Check class SwapTestFidelity in the notebook.
    fidelity = SwapTestFidelity(
        qubits_ising, qubits_qgrnn, batch_size)

    state_true = tf.keras.Input(shape=(),
                                dtype=tf.string)
    state_pred = tf.keras.Input(shape=(),
                                dtype=tf.string)

    fid_output = fidelity(state_true, state_pred)
    fid_output = tfq.layers.Expectation()(
        fid_output,
        symbol_names=symbols,
        operators=fidelity.op)
    model = tf.keras.Model(
        inputs=[state_true, state_pred],
        outputs=fid_output)

Here, we introduce the average fidelity and implement this with a custom Keras loss function.

    L(θ, φ) = 1 − (1/B) Σ_{j=1}^{B} |⟨ψTj|φTj⟩|²
            = 1 − (1/B) Σ_{j=1}^{B} ⟨Ẑtest⟩j

    def average_fidelity(y_true, y_pred):
        return 1 - K.mean(y_pred)

Again, we can use Keras model fit. To feed a batch of quantum data, we can use tf.concat because the quantum circuits are already in tf.Tensor. In this case, we know that the lower bound of the fidelity loss is 0, but y_true is not used in our custom loss function average_fidelity. We set the learning rate of the Adam optimizer to 0.05.

    y_true = tf.concat(true_states, axis=0)
    y_pred = tf.concat(pred_states, axis=0)

    model.compile(
        loss=average_fidelity,
        optimizer=tf.keras.optimizers.Adam(
            learning_rate=0.05))
    model.fit(x=[y_true, y_pred],
              y=tf.zeros([batch_size, 1]),
              batch_size=batch_size,
              epochs=500)

The full results are displayed in the notebook; we see for this example that our time-randomized gradient-based optimization of our parameterized class of quantum Hamiltonian evolution ends up learning the target Hamiltonian and its couplings to a high degree of accuracy.

Figure 23. Left: true (target) Ising Hamiltonian with edges representing couplings and nodes representing biases. Middle: randomly chosen initial graph structure and parameter values for the QGRNN. Right: learned Hamiltonian from the trained QGRNN.

D. Generative Modelling of Quantum Mixed States with Hybrid Quantum-Probabilistic Models

1. Background

Often in quantum mechanical systems, one encounters so-called mixed states, which can be understood as probabilistic mixtures over pure quantum states [127]. Typical cases where such mixed states arise are when looking at finite-temperature quantum systems, open quantum systems, and subsystems of pure quantum mechanical systems. As the ability to model mixed states is thus key to understanding quantum mechanical systems, in this section we focus on models to learn to represent and mimic the statistics of quantum mixed states.

As mixed states are a combination of a classical probability distribution and quantum wavefunctions, their

statistics can exhibit both classical and quantum forms of correlations (e.g., entanglement). As such, if we wish to learn a representation of such a mixed state which can generatively model its statistics, one can expect that a hybrid representation combining classical probabilistic models and quantum neural networks can be ideal. Such a decomposition is ideal for near-term noisy devices, as it reduces the overhead of representation on the quantum device, leading to lower-depth quantum neural networks. Furthermore, the quantum layers provide a valuable addition in representation power to the classical probabilistic model, as they allow the addition of quantum correlations to the model.

Thus, in this section, we cover some examples where one learns to generatively model mixed states using a hybrid quantum-probabilistic model [48]. Such models use a parameterized ansatz of the form

    ρ̂θφ = Û(φ) ρ̂θ Û†(φ),   ρ̂θ = Σ_x pθ(x) |x⟩⟨x|,   (34)

where Û(φ) is a unitary quantum neural network with parameters φ and pθ(x) is a classical probabilistic model with parameters θ. We call ρ̂θφ the visible state and ρ̂θ the latent state. Note the latent state is effectively a classical distribution over the standard basis states, and its only parameters are those of the classical probabilistic model.

As we shall see below, there are methods to train both networks simultaneously. In terms of software implementation, as we have to combine probabilistic models and quantum neural networks, we will use a combination of TensorFlow Probability [128] along with TFQ. A first class of application we will consider is the task of generating a thermal state of a quantum system given its Hamiltonian. A second set of applications is, given several copies of a mixed state, to learn a generative model which replicates the statistics of the state.

Target problems:
1. Incorporating probabilistic and quantum models
2. Variational Quantum Simulation of Quantum Thermal States
3. Learning to generatively model mixed states from data

Required TFQ functionalities:
1. Integration with TF Probability [128]
2. Sample-based simulation of quantum circuits
3. Parameter shift differentiator for gradient computation

2. Variational Quantum Thermalizer

The full notebook of the implementations below is available at:
research/vqt qmhl/vqt qmhl.ipynb

Consider the task of preparing a thermal state: given a Hamiltonian Ĥ and a target inverse temperature β = 1/T, we want to variationally approximate the state

    σ̂β = (1/Zβ) e^{−βĤ},   Zβ = tr(e^{−βĤ}),   (35)

using a state of the form presented in equation (34). That is, we aim to find a value of the hybrid model parameters {θ∗, φ∗} such that ρ̂θ∗φ∗ ≈ σ̂β. In order to converge to this approximation via optimization of the parameters, we need a loss function to optimize which quantifies the statistical distance between these quantum mixed states. If we aim to minimize the discrepancy between states in terms of the quantum relative entropy D(ρ̂θφ ‖ σ̂β) = −S(ρ̂θφ) − tr(ρ̂θφ log σ̂β) (where S(ρ̂θφ) = −tr(ρ̂θφ log ρ̂θφ) is the entropy), then, as described in the full paper [57], we can equivalently minimize the free energy⁴ and hence use it as our loss function:

    Lfe(θ, φ) = β tr(ρ̂θφ Ĥ) − S(ρ̂θφ).   (36)

The first term is simply the expectation value of the energy of our model, while the second term is the entropy. Due to the structure of our quantum-probabilistic model, the entropy of the visible state is equal to the entropy of the latent state, which is simply the classical entropy of the distribution, S(ρ̂θφ) = S(ρ̂θ) = −Σ_x pθ(x) log pθ(x). This comes in quite useful during the optimization of our model.

⁴ More precisely, the loss function here is in fact the inverse temperature multiplied by the free energy, but this detail is of little import to our optimization.

Let us implement a simple example of the VQT model which minimizes free energy to achieve an approximation of the thermal state of a physical system. Let us consider a two-dimensional Heisenberg spin chain

    Ĥheis = Σ_{⟨ij⟩h} Jh Ŝi · Ŝj + Σ_{⟨ij⟩v} Jv Ŝi · Ŝj,   (37)

where h (v) denote horizontal (vertical) bonds, while ⟨·⟩ represents nearest-neighbor pairings. First, we define this Hamiltonian on a grid of qubits:

    def get_bond(q0, q1):
        return cirq.PauliSum.from_pauli_strings([
            cirq.PauliString(cirq.X(q0), cirq.X(q1)),
            cirq.PauliString(cirq.Y(q0), cirq.Y(q1)),
            cirq.PauliString(cirq.Z(q0), cirq.Z(q1))])

    def get_heisenberg_hamiltonian(qubits, jh, jv):
        heisenberg = cirq.PauliSum()
        # Apply horizontal bonds
        for r in qubits:
            for q0, q1 in zip(r, r[1::]):
                heisenberg += jh * get_bond(q0, q1)
        # Apply vertical bonds
        for r0, r1 in zip(qubits, qubits[1::]):
            for q0, q1 in zip(r0, r1):
                heisenberg += jv * get_bond(q0, q1)
        return heisenberg
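As a short usage sketch (our own illustration, not part of the notebook text), the helper above can be applied to a 2x2 grid of GridQubits; the couplings jh = jv = 1.0 used here are arbitrary placeholder values chosen for illustration:

    # Usage sketch (our own illustration): build the Hamiltonian of Eq. (37)
    # on a 2x2 grid using the get_heisenberg_hamiltonian helper defined above.
    import cirq

    grid_qubits = [[cirq.GridQubit(r, c) for c in range(2)] for r in range(2)]
    heisenberg_2x2 = get_heisenberg_hamiltonian(grid_qubits, 1.0, 1.0)
    print(heisenberg_2x2)  # a cirq.PauliSum with three Pauli terms per bond

This is the same 2x2-grid Hamiltonian whose thermal state is targeted in the training run described below.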

For our QNN, we consider a unitary consisting of general estimate via sampling of the classical probabilistic model
single qubit rotations and powers of controlled-not gates. and the output of the QPU.
Our code returns the associated symbols so that these For our classical latent probability distribution pθ (x),
can be fed into the Expectation op: as a first simple case, we can use the productQ of inde-
pendent Bernoulli distributions pθ (x) = j pθj (xj ) =
Q xj 1−xj
def g et _r o ta ti on _ 1q (q , a , b , c ) : j θj (1 − θj ) , where xj ∈ {0, 1} are binary values.
return cirq . Circuit ( We can re-phrase this distribution as an energy based
cirq . X ( q ) ** a , cirq . Y ( q ) ** b , cirq . Z ( q ) ** c ) model to take advantage of equation (V D 2). We move
the parameters into an exponential,Qso that the probabil-
def g et _r o ta ti on _ 2q ( q0 , q1 , a ) :
return cirq . Circuit ( ity of a bitstring becomes pθ (x) = j eθj xj /(eθj + e−θj ).
cirq . CNotPowGate ( exponent = a ) ( q0 , q1 ) ) Since this distribution is a product of independent vari-
ables, it is easy to sample from. We can use the Tensor-
def get_layer_1q ( qubits , layer_num , L_name ) : Flow Probability library [128] to produce samples from
layer_symbols = []
circuit = cirq . Circuit ()
this distribution, using the tfp.distributions.Bernoulli
for n , q in enumerate ( qubits ) : object:
a , b , c = sympy . symbols ( def b e r n o u l l i _ b i t _ p r o b a b i l i t y ( b ) :
" a {2} _ {0} _ {1} b {2} _ {0} _ {1} c {2} _ {0} _ {1} " return np . exp ( b ) /( np . exp ( b ) + np . exp ( - b ) )
. format ( layer_num , n , L_name ) )
layer_symbols += [a , b , c ] def s a m p l e _ b e r n o u l l i ( num_samples , biases ) :
circuit += ge t_ r ot at io n _1 q (q , a , b , c ) prob_list = []
return circuit , layer_symbols for bias in biases . numpy () :
prob_list . append (
def get_layer_2q ( qubits , layer_num , L_name ) : b e r n o u l l i _ b i t _ p r o b a b i l i t y ( bias ) )
layer_symbols = [] latent_dist = tfp . distributions . Bernoulli (
circuit = cirq . Circuit () probs = prob_list , dtype = tf . float32 )
for n , ( q0 , q1 ) in enumerate ( zip ( qubits [::2] , return latent_dist . sample ( num_samples )
qubits [1::2]) ) :
a = sympy . symbols ( " a {2} _ {0} _ {1} " . format ( After getting samples from our classical probabilistic
layer_num , n , L_name ) ) model, we take gradients of our QNN parameters. Be-
layer_symbols += [ a ]
circuit += ge t_ r ot at io n _2 q ( q0 , q1 , a )
cause TFQ implements gradients for its expectation ops,
return circuit , layer_symbols we can use tf.GradientTape to obtain these derivatives.
Note that below we used tf.tile to give our Hamiltonian
It will be convenient to consider a particular class of
operator and visible state circuit the correct dimensions:
probabilistic models where the estimation of the gradient
of the model parameters is straightforward to perform. b i t s t r i n g _ t e n so r = s a m p le _ b e r n o u l l i (
num_samples , vqt_biases )
This class of models are called exponential families or with tf . GradientTape () as tape :
energy-based models (EBMs). If our parameterized prob- t i l e d _ v q t _ m o d e l _ p a r a m s = tf . tile (
abilistic model is an EBM, then it is of the form: [ v q t _ m o d e l _ p a r a m s ] , [ num_samples , 1])
s a m p l e d _ e x p e c t a t i o n s = expectation (
pθ (x) = Z1θ e−Eθ (x) , Zθ ≡ x∈Ω e−Eθ (x) .
P
(38) tiled_visible_state ,
vqt_symbol_names ,
tf . concat ([ bitstring_tensor ,
For gradients of the VQT free energy loss function
t i l e d _ v q t _ m o d e l _ p a r a m s ] , 1) ,
with respect to the QNN parameters, ∂φ Lfe (θ, φ) = tiled_H )
β∂φ tr(ρ̂θφ Ĥ), this is simply the gradient of an expec- energy_losses = beta * s a m p l e d _ e x p e c t a t i o n s
tation value, hence we can use TFQ parameter shift gra- e n e r g y _ l o s s e s _ a v g = tf . reduce_mean (
energy_losses )
dients or any other method for estimating the gradients
v q t _ m o d e l _ g r a d i e n t s = tape . gradient (
of QNN’s outlined in previous sections. energy_losses_avg , [ v q t _ m o d e l _ p a r a m s ])
As for gradients of the classical probabilistic model,
one can readily derive that they are given by the following Putting these pieces together, we train our model to out-
covariance: put thermal states of the 2D Heisenberg model on a 2x2
h i grid. The result after 100 epochs is shown in Fig. 24.
∂θ Lfe = Ex∼pθ (x) (Eθ (x) − βHφ (x))∇θ Eθ (x) A great advantage of this approach to optimization of
    the probabilistic model is that the partition function Zθ
−(Ex∼pθ (x) Eθ (x)−βHφ (x) )(Ey∼pθ (y) ∇θ Eθ (y) ), does not need to be estimated. As such, more general
(39) more expressive models beyond factorized distributions
We can use the TensorFlow Probability library [128] to produce samples from this distribution, using the tfp.distributions.Bernoulli object:

def bernoulli_bit_probability(b):
    # Probability of sampling bit value 1 given the continuous bias b.
    return np.exp(b) / (np.exp(b) + np.exp(-b))

def sample_bernoulli(num_samples, biases):
    prob_list = []
    for bias in biases.numpy():
        prob_list.append(bernoulli_bit_probability(bias))
    latent_dist = tfp.distributions.Bernoulli(
        probs=prob_list, dtype=tf.float32)
    return latent_dist.sample(num_samples)

After getting samples from our classical probabilistic model, we take gradients of our QNN parameters. Because TFQ implements gradients for its expectation ops, we can use tf.GradientTape to obtain these derivatives. Note that below we use tf.tile to give our Hamiltonian operator and visible state circuit the correct dimensions:

bitstring_tensor = sample_bernoulli(num_samples, vqt_biases)
with tf.GradientTape() as tape:
    tiled_vqt_model_params = tf.tile(
        [vqt_model_params], [num_samples, 1])
    sampled_expectations = expectation(
        tiled_visible_state,
        vqt_symbol_names,
        tf.concat([bitstring_tensor, tiled_vqt_model_params], 1),
        tiled_H)
    energy_losses = beta * sampled_expectations
    energy_losses_avg = tf.reduce_mean(energy_losses)
vqt_model_gradients = tape.gradient(
    energy_losses_avg, [vqt_model_params])

Putting these pieces together, we train our model to output thermal states of the 2D Heisenberg model on a 2x2 grid. The result after 100 epochs is shown in Fig. 24. A great advantage of this approach to optimization of the probabilistic model is that the partition function Z_θ does not need to be estimated. As such, more general and more expressive models beyond factorized distributions can be used for the probabilistic modelling of the latent classical distribution. In the advanced section of the notebook, we show how to use a Boltzmann machine as our energy-based model. Boltzmann machines are EBMs where, for a bitstring x ∈ {0,1}^n, the energy is defined as E(x) = -\sum_{i,j} w_{ij} x_i x_j - \sum_i b_i x_i.
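As a simple illustration of this energy function (a sketch only; the weight matrix, bias vector, and bitstring batch below are hypothetical placeholders rather than the notebook's variables), the Boltzmann machine energy can be evaluated for a batch of bitstrings as:

import tensorflow as tf

def boltzmann_energy(bitstrings, weights, biases):
    # bitstrings: [batch, n] tensor with entries in {0, 1}
    # weights:    [n, n] coupling matrix w_ij
    # biases:     [n] bias vector b_i
    # Implements E(x) = -sum_ij w_ij x_i x_j - sum_i b_i x_i for each row x.
    quadratic = tf.einsum('bi,ij,bj->b', bitstrings, weights, bitstrings)
    linear = tf.linalg.matvec(bitstrings, biases)
    return -quadratic - linear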

It is worth noting that our factorized Bernoulli distribution is in fact a special case of the Boltzmann machine, one where only the so-called bias terms in the energy function are present: E(x) = -\sum_i b_i x_i. In the notebook, we start with this simpler Bernoulli example of the VQT; the resulting density matrix converges to the known exact result for this system, as shown in Fig. 24. We also provide a more advanced example with a general Boltzmann machine. In the latter example, we picked a fully visible, fully-connected classical Ising model energy function, and used MCMC with Metropolis-Hastings [129] to sample from the energy function.
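The sketch below shows one way such a sampler could look; it is not the notebook's implementation, and the energy function and all names are illustrative. It runs a single-bit-flip Metropolis-Hastings chain over bitstrings, accepting proposals according to the Boltzmann weights e^{-E(x)}:

import numpy as np

def metropolis_hastings_bitstrings(energy_fn, n_bits, num_samples, burn_in=500):
    # Single-bit-flip Metropolis-Hastings over x in {0,1}^n for p(x) proportional to exp(-E(x)).
    x = np.random.randint(0, 2, size=n_bits)
    samples = []
    for step in range(burn_in + num_samples):
        proposal = x.copy()
        proposal[np.random.randint(n_bits)] ^= 1  # flip one randomly chosen bit
        # Accept with probability min(1, exp(E(x) - E(proposal))).
        if np.random.rand() < np.exp(energy_fn(x) - energy_fn(proposal)):
            x = proposal
        if step >= burn_in:
            samples.append(x.copy())
    return np.array(samples)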

Figure 24. Final density matrix output by the VQT algorithm run with a factorized Bernoulli distribution as classical latent distribution, trained via a gradient-based optimizer. See notebook for details.

3. Quantum Generative Learning from Quantum Data

Now that we have seen how to prepare thermal states from a given Hamiltonian, we can consider how we can learn to generatively model mixed quantum states using quantum-probabilistic models in the case where we are given several copies of a mixed state rather than a Hamiltonian. That is, we are given access to a data mixed state σ̂_D, and we would like to find optimal parameters {θ*, φ*} such that ρ̂_{θ*φ*} ≈ σ̂_D, where the model state is of the form described in (34). Furthermore, for reasons of convenience which will be apparent below, it is useful to posit that our classical probabilistic model is of the form of an energy-based model as in equation (38).

If we aim to minimize the quantum relative entropy between the data and our model (in reverse compared to the VQT), i.e., D(σ̂_D ‖ ρ̂_θφ), then it suffices to minimize the quantum cross entropy as our loss function

    L_{xe}(\theta, \phi) \equiv -\mathrm{tr}(\hat{\sigma}_D \log \hat{\rho}_{\theta\phi}).

By using the energy-based form of our latent classical probability distribution, as can be readily derived (see [57]), the cross entropy is given by

    L_{xe}(\theta, \phi) = E_{x \sim \sigma_\phi(x)}[E_\theta(x)] + \log Z_\theta,

where σ_φ(x) ≡ ⟨x| Û†(φ) σ̂_D Û(φ) |x⟩ is the distribution obtained by feeding the data state σ̂_D through the inverse QNN circuit Û†(φ) and measuring in the standard basis.

As this is simply an expectation value of a state propagated through a QNN, for gradients of the loss with respect to the QNN parameters we can use standard TFQ differentiators, such as the parameter shift rule presented in section III. As for the gradient with respect to the EBM parameters, it is given by

    \partial_\theta L_{xe}(\theta, \phi) = E_{x \sim \sigma_\phi(x)}[\nabla_\theta E_\theta(x)] - E_{y \sim p_\theta(y)}[\nabla_\theta E_\theta(y)].
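In sketch form (illustrative names only, not the notebook's code), both expectation values can be estimated by averaging the per-sample energy gradient over bitstrings measured after the inverse QNN and over bitstrings sampled from the current EBM, respectively, and taking their difference:

import tensorflow as tf

def qmhl_ebm_gradient(grad_energy_fn, data_samples, model_samples):
    # grad_energy_fn: maps a [batch, n] bitstring tensor to the
    #                 [batch, n_params] per-sample gradients of E_theta.
    # data_samples:   bitstrings x ~ sigma_phi(x), measured after the inverse QNN.
    # model_samples:  bitstrings y ~ p_theta(y), sampled from the current EBM.
    data_term = tf.reduce_mean(grad_energy_fn(data_samples), axis=0)
    model_term = tf.reduce_mean(grad_energy_fn(model_samples), axis=0)
    return data_term - model_term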
Let us implement a scenario where we are given the output density matrix from our last VQT example as data, and let us see if we can learn to replicate its statistics from data rather than from the Hamiltonian. For simplicity we focus on the Bernoulli EBM defined above. We can efficiently sample bitstrings from our learned classical distribution and feed them through the learned VQT unitary to produce our data state. These VQT parameters are assumed fixed; they represent a quantum data source for QMHL.

We use the same ansatz for our QMHL unitary as we did for VQT: layers of single-qubit rotations and exponentiated CNOTs. We apply our QMHL model unitary to the output of our VQT to produce the pulled-back data distribution. Then, we take expectation values of our current best estimate of the modular Hamiltonian:

def get_qmhl_weights_grad_and_biases_grad(
        ebm_deriv_expectations, bitstring_list, biases):
    bare_qmhl_biases_grad = tf.reduce_mean(
        ebm_deriv_expectations, 0)
    c_qmhl_biases_grad = ebm_biases_derivative_avg(
        bitstring_list)
    return tf.subtract(bare_qmhl_biases_grad,
                       c_qmhl_biases_grad)

Note that we use the tf.GradientTape functionality to obtain the gradients of the QMHL model unitary. This functionality is enabled by our TFQ differentiators module. The classical model parameters can be updated according to the gradient formula above. See the VQT notebook for the results of this training.
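For instance, once the unitary gradient (from tf.GradientTape) and the EBM gradient (from the formula above) are in hand, both sets of parameters could be updated with a standard TensorFlow optimizer. The snippet below is a sketch only; the variables and their shapes are illustrative placeholders rather than the notebook's objects:

import tensorflow as tf

qmhl_model_params = tf.Variable(tf.zeros([10]))  # QNN symbol values
qmhl_biases = tf.Variable(tf.zeros([4]))         # EBM biases
qmhl_unitary_grad = tf.zeros([10])               # from tf.GradientTape
qmhl_biases_grad = tf.zeros([4])                 # from the EBM gradient formula

optimizer = tf.keras.optimizers.Adam(learning_rate=0.01)
optimizer.apply_gradients([
    (qmhl_unitary_grad, qmhl_model_params),
    (qmhl_biases_grad, qmhl_biases),
])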
VI. CLOSING REMARKS

The rapid development of quantum hardware represents an impetus for the equally rapid development of quantum applications. In October 2017, the Google AI Quantum team and collaborators released its first software library, OpenFermion, to accelerate the development of quantum simulation algorithms for chemistry and materials sciences. Likewise, TensorFlow Quantum is intended to accelerate the development of quantum machine learning algorithms for a wide array of applications. Quantum machine learning is a very new and exciting field, so we expect the framework to change with the needs of the research community, and the availability of new quantum hardware. We have open-sourced the framework under the commercially friendly Apache 2 license, allowing future commercial products to embed TFQ royalty-free. If you would like to participate in our community, visit us at:

https://github.com/tensorflow/quantum/

VII. ACKNOWLEDGEMENTS

The authors would like to thank Google Research for supporting this project. M.B., G.V., T.M., and A.J.M. would like to thank the Google Quantum AI Lab for their hospitality and support during their respective internships, as well as fellow team members for several useful discussions, in particular Matt Harrigan, John Platt, and Nicholas Rubin. The authors would like to also thank Achim Kempf from the University of Waterloo for sponsoring this project. M.B. and J.Y. would like to thank the Google Brain team for supporting this project, in particular Francois Chollet, Yifei Feng, David Rim, Justin Hong, and Megan Kacholia. G.V. would like to thank Stefan Leichenauer, Jack Hidary and the rest of the Quantum@X team for support during his Quantum Residency. G.V. acknowledges support from NSERC. D.B. is an Associate Fellow in the CIFAR program on Quantum Information Science. A.S. and M.S. were supported by the USRA Feynman Quantum Academy funded by the NAMS R&D Student Program at NASA Ames Research Center and by the Air Force Research Laboratory (AFRL), NYSTEC-USRA Contract (FA8750-19-3-6101). X, formerly known as Google[x], is part of the Alphabet family of companies, which includes Google, Verily, Waymo, and others (www.x.company).
[1] K. P. Murphy, Machine learning: a probabilistic perspective (MIT press, 2012).
[2] J. A. Suykens and J. Vandewalle, Neural processing letters 9, 293 (1999).
[3] S. Wold, K. Esbensen, and P. Geladi, Chemometrics and intelligent laboratory systems 2, 37 (1987).
[4] A. K. Jain, Pattern recognition letters 31, 651 (2010).
[5] Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015).
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in Neural Information Processing Systems 25, edited by F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Curran Associates, Inc., 2012) pp. 1097–1105.
[7] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning, Vol. 1 (MIT press Cambridge, 2016).
[8] J. Preskill, arXiv preprint arXiv:1801.00862 (2018).
[9] R. P. Feynman, International Journal of Theoretical Physics 21, 467 (1982).
[10] Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. Johnson, M. Kieferová, I. D. Kivlichan, T. Menke, B. Peropadre, N. P. Sawaya, et al., Chemical reviews 119, 10856 (2019).
[11] P. W. Shor, in Proceedings 35th annual symposium on foundations of computer science (IEEE, 1994) pp. 124–134.
[12] E. Farhi, J. Goldstone, and S. Gutmann, "A quantum approximate optimization algorithm," (2014), arXiv:1411.4028 [quant-ph].
[13] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., Nature 574, 505 (2019).
[14] S. Lloyd, M. Mohseni, and P. Rebentrost, Nature Physics 10, 631–633 (2014).
[15] P. Rebentrost, M. Mohseni, and S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014).
[16] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum algorithms for supervised and unsupervised machine learning," (2013), arXiv:1307.0411 [quant-ph].
[17] I. Kerenidis and A. Prakash, "Quantum recommendation systems," (2016), arXiv:1603.08675.
[18] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195 (2017).
[19] V. Giovannetti, S. Lloyd, and L. Maccone, Physical review letters 100, 160501 (2008).
[20] S. Arunachalam, V. Gheorghiu, T. Jochym-O'Connor, M. Mosca, and P. V. Srinivasan, New Journal of Physics 17, 123010 (2015).
[21] E. Tang, (2018), 10.1145/3313276.3316310, arXiv:1807.04271.
[22] J. Preskill, "Quantum computing in the NISQ era and beyond," (2018), arXiv:1801.00862.
[23] E. Farhi and H. Neven, arXiv preprint arXiv:1802.06002 (2018).
[24] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature communications 5, 4213 (2014).
[25] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd, arXiv preprint arXiv:1806.06871 (2018).
[26] D. Wecker, M. B. Hastings, and M. Troyer, Phys. Rev. A 92, 042303 (2015).
[27] L. Zhou, S.-T. Wang, S. Choi, H. Pichler, and M. D. Lukin, arXiv preprint arXiv:1812.01041 (2018).
[28] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics 18, 023023 (2016).
[29] S. Hadfield, Z. Wang, B. O'Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas, arXiv preprint arXiv:1709.03489 (2017).
[30] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, npj Quantum Information 4, 1 (2018).
[31] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. Sornborger, and P. J. Coles, Quantum 3, 140 (2019).
[32] M. Schuld and N. Killoran, Physical review letters 122, 040504 (2019).
[33] S. McArdle, T. Jones, S. Endo, Y. Li, S. Benjamin, and X. Yuan, arXiv preprint arXiv:1804.03023 (2018).
[34] M. Benedetti, E. Grant, L. Wossnig, and S. Severini, New Journal of Physics 21, 043023 (2019).
[35] B. Nash, V. Gheorghiu, and M. Mosca, arXiv preprint arXiv:1904.01972 (2019).
[36] Z. Jiang, J. McClean, R. Babbush, and H. Neven, arXiv preprint arXiv:1812.08190 (2018).
[37] G. R. Steinbrecher, J. P. Olson, D. Englund, and J. Carolan, arXiv preprint arXiv:1808.10047 (2018).
[38] M. Fingerhuth, T. Babej, et al., arXiv preprint arXiv:1810.13411 (2018).
[39] R. LaRose, A. Tikku, É. O'Neel-Judy, L. Cincio, and P. J. Coles, arXiv preprint arXiv:1810.10506 (2018).
[40] L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, New Journal of Physics 20, 113022 (2018).
[41] H. Situ, Z. Huang, X. Zou, and S. Zheng, Quantum Information Processing 18, 230 (2019).
[42] H. Chen, L. Wossnig, S. Severini, H. Neven, and M. Mohseni, arXiv preprint arXiv:1805.08654 (2018).
[43] G. Verdon, M. Broughton, and J. Biamonte, arXiv preprint arXiv:1712.05304 (2017).
[44] M. Mohseni et al., Nature 543, 171 (2017).
[45] G. Verdon, M. Broughton, J. R. McClean, K. J. Sung, R. Babbush, Z. Jiang, H. Neven, and M. Mohseni, "Learning to learn with quantum neural networks via classical neural networks," (2019), arXiv:1907.05415.
[46] R. Sweke, F. Wilde, J. Meyer, M. Schuld, P. K. Fährmann, B. Meynard-Piganeau, and J. Eisert, arXiv preprint arXiv:1910.01155 (2019).
[47] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics 18, 023023 (2016).
[48] G. Verdon, J. M. Arrazola, K. Brádler, and N. Killoran, arXiv preprint arXiv:1902.00409 (2019).
[49] Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel, arXiv preprint arXiv:1904.09314 (2019).
[50] E. Farhi and H. Neven, "Classification with quantum neural networks on near term processors," (2018), arXiv:1802.06002 [quant-ph].
[51] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature communications 9, 1 (2018).
[52] I. Cong, S. Choi, and M. D. Lukin, Nature Physics 15, 1273–1278 (2019).
[53] S. Lloyd and C. Weedbrook, Physical review letters 121, 040502 (2018).
[54] G. Verdon, J. Pye, and M. Broughton, arXiv preprint arXiv:1806.09729 (2018).
[55] J. Romero and A. Aspuru-Guzik, arXiv preprint arXiv:1901.00848 (2019).
[56] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, C. Blank, K. McKiernan, and N. Killoran, arXiv preprint arXiv:1811.04968 (2018).
[57] G. Verdon, J. Marks, S. Nanda, S. Leichenauer, and J. Hidary, arXiv preprint arXiv:1910.02071 (2019).
[58] M. Mohseni, A. M. Steinberg, and J. A. Bergou, Phys. Rev. Lett. 93, 200403 (2004).
[59] M. Y. Niu, S. Boixo, V. N. Smelyanskiy, and H. Neven, npj Quantum Information 5, 1 (2019).
[60] J. Carolan, M. Mohseni, J. P. Olson, M. Prabhu, C. Chen, D. Bunandar, M. Y. Niu, N. C. Harris, F. N. C. Wong, M. Hochberg, S. Lloyd, and D. Englund, Nature Physics (2020).
[61] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "Tensorflow: Large-scale machine learning on heterogeneous distributed systems," (2016), arXiv:1603.04467 [cs.DC].
[62] Google, "Cirq: A python framework for creating, editing, and invoking noisy intermediate scale quantum circuits," (2018).
[63] A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, Š. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, and A. Scopatz, PeerJ Computer Science 3, e103 (2017).
[64] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Nature 323, 533 (1986).
[65] J. Biamonte and V. Bergholm, "Tensor networks in a nutshell," (2017), arXiv:1708.00006 [quant-ph].
[66] C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, B. Fontaine, Y. Zou, J. Hidary, G. Vidal, and S. Leichenauer, "TensorNetwork: A Library for Physics and Machine Learning," (2019), arXiv:1905.01330 [physics.comp-ph].
[67] C. Roberts and S. Leichenauer, "Introducing TensorNetwork, an Open Source Library for Efficient Tensor Calculations," (2019).
[68] The TensorFlow Authors, "Effective TensorFlow 2," (2019).
[69] F. Chollet et al., "Keras," https://keras.io (2015).
[70] D. P. Kingma and J. Ba, arXiv preprint arXiv:1412.6980 (2014).
[71] G. E. Crooks, arXiv preprint arXiv:1905.13311 (2019).
[72] M. Smelyanskiy, N. P. D. Sawaya, and A. Aspuru-Guzik, ArXiv e-prints (2016), arXiv:1601.07195 [quant-ph].
[73] T. Häner and D. S. Steiger, ArXiv e-prints (2017), arXiv:1704.01127 [quant-ph].
[74] Intel, "Intel Instruction Set Extensions Technology," (2020).
[75] Mark Buxton, "Haswell New Instruction Descriptions Now Available!" (2011).
[76] D. Gottesman, Stabilizer codes and quantum error correction, Ph.D. thesis, California Institute of Technology (1997).
[77] M. Suzuki, Physics Letters A 146, 319 (1990).
[78] E. Campbell, Physical Review Letters 123 (2019), 10.1103/physrevlett.123.070503.
[79] N. C. Rubin, R. Babbush, and J. McClean, New Journal of Physics 20, 053020 (2018).
[80] A. Harrow and J. Napp, arXiv preprint arXiv:1901.05374 (2019).
[81] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Proceedings of the IEEE 86, 2278 (1998).
[82] L. Bottou, in Proceedings of COMPSTAT'2010 (Springer, 2010) pp. 177–186.
[83] S. Ruder, arXiv preprint arXiv:1609.04747 (2016).
[84] Y. LeCun, D. Touresky, G. Hinton, and T. Sejnowski, in Proceedings of the 1988 connectionist models summer school, Vol. 1 (CMU, Pittsburgh, Pa: Morgan Kaufmann, 1988) pp. 21–28.
[85] A. G. Baydin, B. A. Pearlmutter, A. A. Radul, and J. M. Siskind, The Journal of Machine Learning Research 18, 5595 (2017).
[86] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al.
[87] R. Frostig, M. J. Johnson, and C. Leary.
[88] M. Abramowitz and I. Stegun, Appl. Math. Ser 55 (2006).
[89] M. Schuld, V. Bergholm, C. Gogolin, J. Izaac, and N. Killoran, arXiv preprint arXiv:1811.11184 (2018).
[90] I. Newton, The Principia: mathematical principles of natural philosophy (Univ of California Press, 1999).
[91] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, Physical Review A 98, 032309 (2018).
[92] S. Bhatnagar, H. Prasad, and L. Prashanth.
[93] D. Gross, Y.-K. Liu, S. T. Flammia, S. Becker, and J. Eisert, Physical Review Letters 105 (2010), 10.1103/physrevlett.105.150401.
[94] J. Schulman, N. Heess, T. Weber, and P. Abbeel, in Advances in Neural Information Processing Systems (2015) pp. 3528–3536.
[95] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, Nature 567, 209 (2019).
[96] W. Huggins, P. Patil, B. Mitchell, K. B. Whaley, and E. M. Stoudenmire, Quantum Science and Technology 4, 024001 (2019).
[97] N. Tishby and N. Zaslavsky, "Deep learning and the information bottleneck principle," (2015), arXiv:1503.02406 [cs.LG].
[98] P. Walther, K. J. Resch, T. Rudolph, E. Schenck, H. Weinfurter, V. Vedral, M. Aspelmeyer, and A. Zeilinger, Nature 434, 169 (2005).
[99] M. A. Nielsen, Reports on Mathematical Physics 57, 147 (2006).
[100] G. Vidal, Physical Review Letters 101 (2008), 10.1103/physrevlett.101.110501.
[101] R. Shwartz-Ziv and N. Tishby, "Opening the black box of deep neural networks via information," (2017), arXiv:1703.00810 [cs.LG].
[102] P. B. M. Sousa and R. V. Ramos, "Universal quantum circuit for n-qubit quantum gate: A programmable quantum gate," (2006), arXiv:quant-ph/0602174 [quant-ph].
[103] M. Y. Niu, L. Li, M. Mohseni, and I. Chuang, to be published.
[104] Y. Chen, M. W. Hoffman, S. G. Colmenarejo, M. Denil, T. P. Lillicrap, M. Botvinick, and N. de Freitas, arXiv preprint arXiv:1611.03824 (2016).
[105] E. Farhi, J. Goldstone, and S. Gutmann, arXiv preprint arXiv:1412.6062 (2014).
[106] G. S. Hartnett and M. Mohseni, "A probability density theory for spin-glass systems," (2020), arXiv:2001.00927.
[107] G. S. Hartnett and M. Mohseni, "Self-supervised learning of generative spin-glasses with normalizing flows," (2020), arXiv:2001.00585.
[108] A. Hagberg, P. Swart, and D. S Chult, Exploring network structure, dynamics, and function using NetworkX, Tech. Rep. (Los Alamos National Lab. (LANL), Los Alamos, NM (United States), 2008).
[109] M. Streif and M. Leib, arXiv preprint arXiv:1908.08862 (2019).
[110] H. Pichler, S.-T. Wang, L. Zhou, S. Choi, and M. D. Lukin, "Quantum optimization for maximum independent set using Rydberg atom arrays," (2018), arXiv:1808.10816.
[111] M. Wilson, S. Stromswold, F. Wudarski, S. Hadfield, N. M. Tubman, and E. Rieffel, "Optimizing quantum heuristics with meta-learning," (2019), arXiv:1908.03185 [quant-ph].
[112] E. Farhi, J. Goldstone, and S. Gutmann, ArXiv e-prints (2014), arXiv:1411.4028 [quant-ph].
[113] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9 (2018), 10.1038/s41467-018-07090-4.
[114] X. Glorot and Y. Bengio, in AISTATS (2010) pp. 249–256.
[115] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, and P. J. Coles, arXiv preprint arXiv:2001.00550 (2020).
[116] Y. Bengio, in Neural networks: Tricks of the trade (Springer, 2012) pp. 437–478.
[117] E. Knill, G. Ortiz, and R. D. Somma, Physical Review A 75, 012328 (2007).
[118] S. Boixo, S. V. Isakov, V. N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, Nature Physics 14, 595 (2018).
[119] E. Grant, L. Wossnig, M. Ostaszewski, and M. Benedetti, Quantum 3, 214 (2019).
[120] A. Skolik, J. R. McClean, M. Mohseni, P. van der Smagt, and M. Leib, in preparation.
[121] B. T. Kiani, S. Lloyd, and R. Maity, (2020).
[122] C. Cirstoiu, Z. Holmes, J. Iosue, L. Cincio, P. J. Coles, and A. Sornborger, "Variational fast forwarding for quantum simulation beyond the coherence time," (2019), arXiv:1910.04292 [quant-ph].
[123] N. Wiebe, C. Granade, C. Ferrie, and D. Cory, Physical Review Letters 112 (2014), 10.1103/physrevlett.112.190501.
[124] G. Verdon, T. McCourt, E. Luzhnica, V. Singh, S. Leichenauer, and J. Hidary, "Quantum graph neural networks," (2019), arXiv:1909.12264 [quant-ph].
[125] S. Greydanus, M. Dzamba, and J. Yosinski, "Hamiltonian neural networks," (2019), arXiv:1906.01563 [cs.NE].
[126] L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, New Journal of Physics 20, 113022 (2018).
[127] M. A. Nielsen and I. L. Chuang, Quantum Computation and Quantum Information: 10th Anniversary Edition, 10th ed. (Cambridge University Press, USA, 2011).
[128] J. V. Dillon, I. Langmore, D. Tran, E. Brevdo, S. Vasudevan, D. Moore, B. Patton, A. Alemi, M. Hoffman, and R. A. Saurous, "Tensorflow distributions," (2017), arXiv:1711.10604 [cs.LG].
[129] C. P. Robert, "The Metropolis-Hastings algorithm," (2015), arXiv:1504.01896 [stat.CO].
