…be embedded into quantum states [19], a process whose scalability is under debate [20]. Additionally, the scaling changes when these algorithms are applied to classical data, which mostly reduces the quantum advantage to a polynomial one [21]. Continuing debates around speedups and assumptions make it prudent to look beyond classical data for applications of quantum computation to machine learning.

With the availability of Noisy Intermediate-Scale Quantum (NISQ) processors [22], the second generation of QML has emerged [8, 12, 18, 23–43]. In contrast to the first generation, this new trend in QML is based on heuristic methods which can be studied empirically due to the increased computational capability of quantum hardware. This is reminiscent of how machine learning evolved towards deep learning with the advent of new computational capabilities [44]. These new algorithms use parameterized quantum transformations called parameterized quantum circuits (PQCs) or Quantum Neural Networks (QNNs) [23, 42]. In analogy with classical deep learning, the parameters of a QNN are then optimized with respect to a cost function via either black-box optimization heuristics [45] or gradient-based methods [46], in order to learn a representation of the training data. In this paradigm, quantum machine learning is the development of models, training strategies, and inference schemes built on parameterized quantum circuits.

B. Hybrid Quantum-Classical Models

Near-term quantum processors are still fairly small and noisy, so quantum models cannot disentangle and generalize quantum data using quantum processors alone. NISQ processors will need to work in concert with classical co-processors to become effective. We anticipate that investigations into various possible hybrid quantum-classical machine learning algorithms will be a productive area of research and that quantum computers will be most useful as hardware accelerators, working in symbiosis with traditional computers. In order to understand the power and limitations of classical deep learning methods, and how they could possibly be improved by incorporating parameterized quantum circuits, it is worth defining key indicators of learning performance:

Representation capacity: the model architecture has the capacity to accurately replicate, or extract useful information from, the underlying correlations in the training data for some value of the model's parameters.

Training efficiency: minimizing the cost function via stochastic optimization heuristics should converge to an approximate minimum of the loss function in a reasonable number of iterations.

Inference tractability: the ability to run inference on a given model in a scalable fashion is needed in order to make predictions in the training or test phase.

Generalization power: the cost function for a given model should yield a landscape where typically initialized and trained networks find approximate solutions which generalize well to unseen data.

In principle, any or all combinations of these attributes could be susceptible to possible improvements by quantum computation. There are many ways to combine classical and quantum computations. One well-known method is to use classical computers as outer-loop optimizers for QNNs. When training a QNN with a classical optimizer in a quantum-classical loop, the overall algorithm is sometimes referred to as a Variational Quantum-Classical Algorithm. Some recently proposed architectures of QNN-based variational quantum-classical algorithms include Variational Quantum Eigensolvers (VQEs) [28, 47], Quantum Approximate Optimization Algorithms (QAOAs) [12, 27, 48, 49], Quantum Neural Networks (QNNs) for classification [50, 51], Quantum Convolutional Neural Networks (QCNNs) [52], and Quantum Generative Models [53]. Generally, the goal is to optimize over a parameterized class of computations to either generate a certain low-energy wavefunction (VQE/QAOA), learn to extract non-local information (QNN classifiers), or learn how to generate a quantum distribution from data (generative models). It is important to note that in the standard model architecture for these applications, the representation typically resides entirely on the quantum processor, with classical heuristics participating only as optimizers for the tunable parameters of the quantum model. Various forms of gradient descent are the most popular optimization heuristics, but an obstacle to the use of gradient descent is the effect of barren plateaus [51], which generally arises when a network lacking structure is randomly initialized. Strategies for overcoming these issues are discussed in detail in section V B.

While the use of classical processors as outer-loop optimizers for quantum neural networks is promising, the reality is that near-term quantum devices are still fairly noisy, thus limiting the depth of quantum circuit achievable with acceptable fidelity. This motivates allowing as much of the model as possible to reside on classical hardware. Several applications of quantum computation have ventured beyond the scope of typical variational quantum algorithms to explore this combination. Instead of training a purely quantum model via a classical optimizer, one then considers scenarios where the model itself is a hybrid between quantum computational building blocks and classical computational building blocks [54–57], and is typically trained via gradient-based methods. Such scenarios leverage a new form of automatic differentiation that allows the backwards propagation of gradients in between parameterized quantum and classical computations. The theory of such hybrid backpropagation will be covered in section III C.

In summary, a hybrid quantum-classical model is a learning heuristic in which both the classical and quantum processors contribute to the indicators of learning performance defined above.
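To make this definition concrete, the following is a minimal sketch of such a model written against the tfq.layers.PQC Keras layer and the circuit-serialization conventions described later in this paper; the one-qubit ansatz, readout operator, and layer sizes are illustrative placeholders rather than a prescribed architecture.

import cirq, sympy
import tensorflow as tf
import tensorflow_quantum as tfq

# Quantum block: a one-qubit parameterized circuit (placeholder ansatz).
qubit = cirq.GridQubit(0, 0)
theta = sympy.Symbol('theta')
ansatz = cirq.Circuit(cirq.ry(theta)(qubit))

# Quantum data enters the graph as serialized circuits (dtype tf.string).
quantum_input = tf.keras.Input(shape=(), dtype=tf.dtypes.string)

# PQC appends the trainable ansatz to each input circuit and measures <Z>.
measured = tfq.layers.PQC(ansatz, cirq.Z(qubit))(quantum_input)

# Classical block post-processes the measured expectation values.
output = tf.keras.layers.Dense(2, activation='softmax')(measured)

hybrid_model = tf.keras.Model(inputs=quantum_input, outputs=output)

A model of this form can be compiled and fit like any other Keras model, with the PQC weights (the quantum parameters) and the Dense weights (the classical parameters) updated in the same backpropagation pass.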
…difference between TensorFlow tensors and the tensor objects in Tensor Networks [67].

The recently announced TensorFlow 2 [68] takes the dataflow graph structure as a foundation and adds high-level abstractions. One new feature is the Python function decorator @tf.function, which automatically converts the decorated function into a graph computation. Also relevant is the native support for Keras [69], which provides the Layer and Model constructs. These abstractions allow the concise definition of machine learning models which ingest and process data, all backed by dataflow graph computation. The increasing levels of abstraction and heterogeneous hardware backing which together constitute the TensorFlow stack can be visualized with the orange and gray boxes in our stack diagram in Fig. 4. The combination of these high-level abstractions and an efficient dataflow graph backend makes TensorFlow 2 an ideal platform for data-driven machine learning research.

C. Technical Hurdles in Combining Cirq with TensorFlow

There are many ways one could imagine combining the capabilities of Cirq and TensorFlow. One possible approach is to let graph edges represent quantum states and let ops represent transformations of the state, such as applying circuits and taking measurements. This approach can be called the "states-as-edges" architecture. We show in Fig. 2 how to reformulate the Bell state preparation and measurement discussed in section II A within this proposed architecture.

…the quantum device itself, and the high latency between classical and quantum processors makes sending transformations one-by-one prohibitive. Lastly, one needs a way to specify gates and measurements within TF. One may be tempted to define these directly; however, Cirq already has the necessary tools and objects defined which are most relevant for the near-term quantum computing era. Duplicating Cirq functionality in TF would lead to several issues, requiring users to re-learn how to interface with quantum computers in TFQ versus Cirq, and adding to the maintenance overhead by needing to keep two separate quantum circuit construction frameworks up-to-date as new compilation techniques arise. These considerations motivate our core design principles.

D. TFQ architecture

1. Design Principles and Overview

To avoid the aforementioned technical hurdles and in order to satisfy the diverse needs of the research community, we have arrived at the following four design principles:

1. Differentiability. As described in the introduction, gradient-based methods leveraging autodifferentiation have become the leading heuristic for optimization of machine learning models. A software framework for QML must support differentiation of quantum circuits so that hybrid quantum-classical models can participate in backpropagation.
Figure 3. The TensorFlow graph generated to calculate the expectation value of a parameterized circuit. The symbol values can come from other TensorFlow ops, such as from the outputs of a classical neural network. The output can be passed on to other ops in the graph; here, for illustration, the output is passed to the absolute value op.

[Figure 4 (software stack diagram): from top to bottom, TF Keras Models; TF Layers and TFQ Layers with Differentiators; TF Ops and TFQ Ops; the TF Execution Engine, TFQ qsim, and Cirq; executing on classical hardware (TPU, GPU, CPU) and quantum hardware (QPU).]
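The dataflow of Fig. 3 can be reproduced with the layers described in the remainder of this section. The following is a minimal sketch in which the circuit, symbol, and measurement operator are placeholders chosen only for illustration, and the classical network supplying the symbol value is a single Dense layer.

import cirq, sympy
import tensorflow as tf
import tensorflow_quantum as tfq

qubit = cirq.GridQubit(0, 0)
theta = sympy.Symbol('theta')
param_circuit = cirq.Circuit(cirq.rx(theta)(qubit))

# The circuit enters the graph as a serialized string tensor.
circuit_input = tf.keras.Input(shape=(), dtype=tf.dtypes.string)

# A classical network supplies the value substituted for the symbol.
classical_input = tf.keras.Input(shape=(1,))
theta_value = tf.keras.layers.Dense(1)(classical_input)

# Expectation consumes the circuit, symbol name, symbol value and operator.
expectation = tfq.layers.Expectation()(
    circuit_input,
    symbol_names=[theta],
    symbol_values=theta_value,
    operators=cirq.Z(qubit))

# As in Fig. 3, the output can feed any downstream op, e.g. absolute value.
abs_out = tf.abs(expectation)
graph_model = tf.keras.Model(
    inputs=[circuit_input, classical_input], outputs=abs_out)

# Illustrative call on one circuit and one classical feature vector.
result = graph_model(
    [tfq.convert_to_tensor([param_circuit]), tf.constant([[0.5]])])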
Figure 5. Abstract pipeline for inference and training of a hybrid discriminative model in TFQ. Here, Φ represents the quantum model parameters and θ represents the classical model parameters. [The diagram shows the stages Prepare Quantum Dataset, Evaluate Quantum Model, Sample or Average, Evaluate Classical Model, Evaluate Cost Function, and Evaluate Gradients & Update Parameters.]

The functionality of all three software products and the interfaces between them can be visualized with the help of a "software-stack" diagram, shown in Fig. 4.

In the next section, we describe an example of an abstract quantum machine learning pipeline for a hybrid discriminator model that TFQ was designed to support. Then we illustrate the TFQ pipeline via a Hello Many-Worlds example, which involves building the simplest possible hybrid quantum-classical model for a binary classification task on a single qubit. More detailed information on the building blocks of TFQ features will be given in section II E.

2. The Abstract TFQ Pipeline for a specific hybrid discriminator model

Here, we provide a high-level abstract overview of the computational steps involved in the end-to-end pipeline for inference and training of a hybrid quantum-classical discriminative model for quantum data in TFQ.

(1) Prepare Quantum Dataset: In general, this might come from a given black-box source. However, as current quantum computers cannot import quantum data from external sources, the user has to specify quantum circuits which generate the data. Quantum datasets are prepared using unparameterized cirq.Circuit objects and are injected into the computational graph using tfq.convert_to_tensor.

(2) Evaluate Quantum Model: Parameterized quantum models can be selected from several categories based on knowledge of the quantum data's structure. The goal of the model is to perform a quantum computation in order to extract information hidden in a quantum subspace or subsystem. In the case of discriminative learning, this information is the hidden label parameters. To extract a quantum non-local subsystem, the quantum model disentangles the input data, leaving the hidden information encoded in classical correlations, thus making it accessible to local measurements and classical post-processing. Quantum models are constructed using cirq.Circuit objects containing SymPy symbols, and can be attached to quantum data sources using the tfq.AddCircuit layer.

(3) Sample or Average: Measurement of quantum states extracts classical information, in the form of samples from a classical random variable. The distribution of values from this random variable generally depends on both the quantum state itself and the measured observable. As many variational algorithms depend on mean values of measurements, TFQ provides methods for averaging over several runs involving steps (1) and (2). Sampling or averaging are performed by feeding quantum data and quantum models to the tfq.Sample or tfq.Expectation layers.

(4) Evaluate Classical Model: Once classical information has been extracted, it is in a format amenable to further classical post-processing. As the extracted information may still be encoded in classical correlations between measured expectations, classical deep neural networks can be applied to distill such correlations. Since TFQ is fully compatible with core TensorFlow, quantum models can be attached directly to classical tf.keras.layers.Layer objects such as tf.keras.layers.Dense.

(5) Evaluate Cost Function: Given the results of classical post-processing, a cost function is calculated. This may be based on the accuracy of classification if the quantum data was labeled, or other criteria if the task is unsupervised. Wrapping the model built in stages (1) through (4) inside a tf.keras.Model gives the user access to all the losses in the tf.keras.losses module.

(6) Evaluate Gradients & Update Parameters: After evaluating the cost function, the free parameters in the pipeline are updated in a direction expected to decrease the cost. This is most commonly performed via gradient descent. To support gradient descent, TFQ exposes derivatives of quantum operations to the TensorFlow backpropagation machinery via the tfq.differentiators.Differentiator interface. This allows both the quantum and classical models' parameters to be optimized against quantum data via hybrid quantum-classical backpropagation. See section III for details on the theory.

In the next section, we illustrate this abstract pipeline by applying it to a specific example. While simple, the example is the minimum instance of a hybrid quantum-classical model operating on quantum data.

3. Hello Many-Worlds: Binary Classifier for Quantum Data

Binary classification is a basic task in machine learning that can be applied to quantum data as well. As a minimal example of a hybrid quantum-classical model, …
loss = tf.keras.losses.CategoricalCrossentropy()
model.compile(optimizer=optimizer, loss=loss)
history = model.fit(
    x=q_data, y=labels, epochs=50)

Finally, we can use our trained hybrid model to classify new quantum datapoints:

test_data, _ = generate_dataset(
    qubit, theta_a, theta_b, 1)
p = model.predict(test_data)[0]
print(f"prob(a)={p[0]:.4f}, prob(b)={p[1]:.4f}")

This section provided a rapid introduction to just the code needed to complete the task at hand. The following section reviews the features of TFQ in a more API-reference-inspired style.

E. TFQ Building Blocks

Having provided a minimum working example in the previous section, we now seek to provide more details about the components of the TFQ framework. First, we describe how quantum computations specified in Cirq are converted to tensors for use inside the TensorFlow graph. Then, we describe how these tensors can be combined in-graph to yield larger models. Next, we show how circuits are simulated and measured in TFQ. The core functionality of the framework, differentiation of quantum circuits, is then explored. Finally, we describe our more abstract layers, which can be used to simplify many QML workflows.

This conversion is backed by our custom serializers. Once a Circuit or PauliSum is serialized, it becomes a tensor of type tf.string. This is the reason for the use of tf.keras.Input(shape=(), dtype=tf.dtypes.string) when creating inputs to Keras models, as seen in the quantum binary classifier example.

2. Composing Quantum Models

After injecting quantum data and quantum models into the computational graph, a custom TensorFlow operation is required to combine them. In support of guiding principle 2, TFQ implements the tfq.layers.AddCircuit layer for combining tensors of circuits. In the following code, we use this functionality to combine the quantum data point and quantum model defined in subsection II E 1:

add_op = tfq.layers.AddCircuit()
data_and_model = add_op(q_data, append=q_model)

To quantify the performance of a quantum model on a quantum dataset, we need the ability to define loss functions. This requires converting quantum information into classical information. This conversion process is accomplished by either sampling the quantum model, which stochastically produces bitstrings according to the probability amplitudes of the model, or by specifying a measurement and taking expectation values.
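For the sampling route, a minimal sketch is shown below, reusing the data_and_model tensor produced above and assuming a single symbol named 'theta'; the repetition count is illustrative. tfq.layers.Sample returns raw bitstrings for each input circuit rather than averaged values.

sample_layer = tfq.layers.Sample()
bitstrings = sample_layer(
    data_and_model,
    symbol_names=['theta'],
    symbol_values=[[0.5]],
    repetitions=1000)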
3. Sampling and Expectation Values
Though sampling is the fundamental interface between quantum and classical information, differentiability of quantum circuits is much more convenient when using expectation values, as gradient information can then be backpropagated (see section III for more details).

In the simplest case, expectation values are simply averages over samples. In quantum computing, expectation values are typically taken with respect to a measurement operator M. This involves sampling bitstrings from the quantum circuit as described above, applying M to the list of bitstring samples to produce a list of numbers, then taking the average of the result. TFQ provides two related layers with this capability.

In contrast to sampling (which is by default in the standard computational basis, the Ẑ eigenbasis of all qubits), taking expectation values requires defining a measurement. As discussed in section II A, these are first defined as cirq.PauliSum objects and converted to tensors. TFQ implements tfq.layers.Expectation, a Keras layer which enables the extraction of measurement expectation values from quantum models. The user supplies a tensor of parameterized circuits, a list of symbols contained in the circuits, a tensor of values to substitute for the symbols in the circuit, and a tensor of operators to measure with respect to them. Given these inputs, the layer outputs a tensor of expectation values. Below, we show how to take an expectation value of the measurement defined in section II E 1:

expectation_layer = tfq.layers.Expectation()
expectations = expectation_layer(
    circuit=data_and_model,
    symbol_names=['theta'],
    symbol_values=[[0.5]],
    operators=q_measure)

In Fig. 3, we illustrate the dataflow graph which backs the expectation layer when the parameter values are supplied by a classical neural network. The expectation layer is capable of using either a simulator or a real device for execution, and this choice is simply specified at run time. While Cirq simulators may be used for the backend, TFQ provides its own native TensorFlow simulator written in performant C++. A description of our quantum circuit simulation code is given in section II F.

Having converted the output of a quantum model into classical information, the results can be fed into subsequent computations. In particular, they can be fed into functions that produce a single number, allowing us to define loss functions over quantum models in the same way we do for classical models.

…first guiding principle, differentiability is the critical machinery needed to allow training of these models. As described in section II B, the architecture of TensorFlow is optimized around backpropagation of errors for efficient updates of model parameters; one of the core contributions of TFQ is integration with TensorFlow's backpropagation mechanism. TFQ implements this functionality with our differentiators module. The theory of quantum circuit differentiation will be covered in section III C; here, we overview the software that implements the theory.

Since there are many ways to calculate gradients of quantum circuits, TFQ provides the tfq.differentiators.Differentiator interface. Our Expectation and SampledExpectation layers rely on classes inheriting from this interface to specify how TensorFlow should compute their gradients. While advanced users can implement their own custom differentiators by inheriting from the interface, TFQ comes with several built-in options, two of which we highlight here. These two methods are instances of the two main categories of quantum circuit differentiators: the finite difference methods and the parameter shift methods.

The first class of quantum circuit differentiators is the finite difference methods. This class samples the primary quantum circuit for at least two different parameter settings, then combines them to estimate the derivative. The ForwardDifference differentiator provides the most basic version of this. For each parameter in the circuit, the circuit is sampled at the current setting of the parameter. Then, each parameter is perturbed separately and the circuit resampled.

For the 2-local circuits implementable on near-term hardware, methods more sophisticated than finite differences are possible. These methods involve running an ancillary quantum circuit, from which the gradient of the primary circuit with respect to some parameter can be directly measured. One specific method is gate decomposition and parameter shifting [71], implemented in TFQ as the ParameterShift differentiator. For an in-depth discussion of the theory, see section III C 2.

The differentiation rule used by our layers is specified through an optional keyword argument. Below, we show the expectation layer being called with our parameter shift rule:

diff = tfq.differentiators.ParameterShift()
expectation = tfq.layers.Expectation(
    differentiator=diff)
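With a differentiator attached, gradients of expectation values flow through standard TensorFlow machinery. The following minimal sketch reuses the data_and_model tensor and q_measure operator from the earlier snippets and shows a gradient being taken with tf.GradientTape:

values = tf.Variable([[0.5]])

with tf.GradientTape() as tape:
    exp = expectation(
        circuit=data_and_model,
        symbol_names=['theta'],
        symbol_values=values,
        operators=q_measure)
    loss = tf.reduce_sum(exp)

# d(loss)/d(theta), computed via the ParameterShift rule configured above.
grads = tape.gradient(loss, values)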
5. Simplified Layers
[Figure 9: simulation-time benchmarks comparing TFQ (qsim) with Cirq; left panel: random quantum circuits, right panel: structured quantum circuits.]
4. Benchmarks

Here, we demonstrate the performance of TFQ, backed by qsim, relative to Cirq on two benchmark simulation tasks. As detailed above, the performance difference is due to circuit pre-processing via gate fusion combined with performant C++ simulators. The benchmarks were performed on a desktop equipped with an Intel(R) Xeon(R) W-2135 CPU running at 3.70 GHz. This CPU supports the AVX2 instruction set, which at the time of writing is the fastest hardware tier supported by TFQ. In the following simulations, TFQ uses a single core, while Cirq simulation is allowed to parallelize over all available cores using the Python multiprocessing module and the map function.

The first benchmark task is simulation of 500 supremacy-style circuits batched 50 at a time. These circuits were generated using the Cirq function …

When the simulated circuits have more structure, the Fusion algorithm allows us to achieve a larger performance boost by reducing the number of gates that ultimately need to be simulated. The circuits for this task are a factorized version of the supremacy-style circuits which generate entanglement only on small subsets of the qubits. In summary, for these circuits, we find a roughly 100 times improvement in simulation time in TFQ versus Cirq. The performance curves are shown in Fig. 9.

Thus we see that in addition to our core functionality of implementing native TensorFlow gradients for quantum circuits, TFQ also provides a significant boost in performance over Cirq when running in simulation mode. Additionally, as noted before, this performance boost is despite the additional overhead of serialization between the TensorFlow frontend and qsim proper.
parallel:

\[ \hat{U}^{\ell}(\boldsymbol{\theta}^{\ell}) \equiv \bigotimes_{j=1}^{M_\ell} \hat{U}_j^{\ell}(\theta_j^{\ell}). \]  (2)

Finally, each of these unitaries \(\hat{U}_j^{\ell}\) can be expressed as the exponential of some generator \(\hat{g}_j^{\ell}\), which itself can be any Hermitian operator on n qubits (thus expressible as a linear combination of n-qubit Paulis),

\[ \hat{U}_j^{\ell}(\theta_j^{\ell}) = e^{-i\theta_j^{\ell}\hat{g}_j^{\ell}}, \qquad \hat{g}_j^{\ell} = \sum_{k=1}^{K_{j\ell}} \beta_k^{j\ell}\hat{P}_k, \]  (3)

where \(\hat{P}_k \in \mathcal{P}_n\); here \(\mathcal{P}_n\) denotes the Paulis on n qubits [76], and \(\beta_k^{j\ell} \in \mathbb{R}\) for all k, j, ℓ. For a given j and ℓ, in the case where all the Pauli terms commute, i.e. \([\hat{P}_k, \hat{P}_m] = 0\) for all m, k such that \(\beta_m^{j\ell}, \beta_k^{j\ell} \neq 0\), one can simply decompose the unitary into a product of exponentials of each term,

\[ \hat{U}_j^{\ell}(\theta_j^{\ell}) = \prod_k e^{-i\theta_j^{\ell}\beta_k^{j\ell}\hat{P}_k}. \]  (4)

Otherwise, in instances where the various terms do not commute, one may apply a Trotter-Suzuki decomposition of this exponential [77], or other quantum simulation methods [78]. Note that in the above case where …

The above form will be useful for our discussion of gradients of expectation values below.

B. Sampling and Expectations

To optimize the parameters of an ansatz from equation (1), we need a cost function to optimize. In the case of standard variational quantum algorithms, this cost function is most often chosen to be the expectation value of a cost Hamiltonian,

\[ f(\boldsymbol{\theta}) = \langle \hat{H} \rangle_{\boldsymbol{\theta}} \equiv \langle \Psi_0 | \hat{U}^{\dagger}(\boldsymbol{\theta})\hat{H}\hat{U}(\boldsymbol{\theta}) | \Psi_0 \rangle \]  (6)

where \(|\Psi_0\rangle\) is the input state to the parameterized circuit. In general, the cost Hamiltonian can be expressed as a linear combination of operators, e.g. in the form

\[ \hat{H} = \sum_{k=1}^{N} \alpha_k \hat{h}_k \equiv \boldsymbol{\alpha} \cdot \hat{\boldsymbol{h}}, \]  (7)

where we defined a vector of coefficients \(\boldsymbol{\alpha} \in \mathbb{R}^N\) and a vector of N operators \(\hat{\boldsymbol{h}}\). Often this decomposition is chosen such that each of these sub-Hamiltonians is in the n-qubit Pauli group, \(\hat{h}_k \in \mathcal{P}_n\). The expectation value of this Hamiltonian is then generally evaluated via quantum
expectation estimation, i.e. by taking the linear combination of expectation values of each term,

\[ f(\boldsymbol{\theta}) = \langle \hat{H} \rangle_{\boldsymbol{\theta}} = \sum_{k=1}^{N} \alpha_k \langle \hat{h}_k \rangle_{\boldsymbol{\theta}} \equiv \boldsymbol{\alpha} \cdot \boldsymbol{h}_{\boldsymbol{\theta}}, \]  (8)

where we introduced the vector of expectations \(\boldsymbol{h}_{\boldsymbol{\theta}} \equiv \langle \hat{\boldsymbol{h}} \rangle_{\boldsymbol{\theta}}\). In the case of non-commuting terms, the various expectation values \(\langle \hat{h}_k \rangle_{\boldsymbol{\theta}}\) are estimated over separate runs.

Note that, in practice, each of these quantum expectations is estimated via sampling of the output of the quantum computer [28]. Even assuming a perfect fidelity of quantum computation, sampling measurement outcomes of eigenvalues of observables from the output of the quantum computer to estimate an expectation will have some non-negligible variance for any finite number of samples. Assuming each of the Hamiltonian terms of equation (7) admits a Pauli operator decomposition as

\[ \hat{h}_j = \sum_{k=1}^{J_j} \gamma_k^{j}\hat{P}_k, \]  (9)

where the \(\gamma_k^{j}\)'s are real-valued coefficients and the \(\hat{P}_k\)'s are Pauli operators [76], then to get an estimate of the expectation value \(\langle \hat{h}_j \rangle\) within an accuracy \(\epsilon\), one needs to take a number of measurement samples scaling as \(\sim \mathcal{O}(\|\hat{h}_j\|_*^2/\epsilon^2)\), where \(\|\hat{h}_j\|_* = \sum_{k=1}^{J_j} |\gamma_k^{j}|\) is the Pauli coefficient norm of each Hamiltonian term. Thus, to estimate the expectation value of the full Hamiltonian (7) accurately within a precision \(\epsilon\), we would need on the order of \(\sim \mathcal{O}\!\big(\tfrac{1}{\epsilon^2}\sum_k |\alpha_k|^2 \|\hat{h}_k\|_*^2\big)\) measurement samples in total [26, 79], as we would need to measure each expectation independently if we are following the quantum expectation estimation trick of (8). This is in sharp contrast to classical methods for gradients involving backpropagation, where gradients can be estimated to numerical precision, i.e. within a precision \(\epsilon\) with \(\sim \mathcal{O}(\mathrm{PolyLog}(\epsilon))\) overhead. Although there have been attempts to formulate a backpropagation principle for quantum computations [54], these methods also rely on the measurement of a quantum observable, thus also requiring \(\sim \mathcal{O}(1/\epsilon^2)\) samples.

As we will see in the following section III C, estimating gradients of quantum neural networks on quantum computers involves the estimation of several expectation values of the cost function for various values of the parameters. One trick that was recently pointed out [46, 80], and has been proven to be successful both theoretically and empirically to estimate such gradients, is the stochastic selection of various terms in the quantum expectation estimation. This can greatly reduce the number of measurements needed per gradient update; we will cover this in subsection III C 3.

C. Gradients of Quantum Neural Networks

Now that we have established how to evaluate the loss function, let us describe how to obtain gradients of the cost function with respect to the parameters. Why should we care about gradients of quantum neural networks? In classical deep learning, the most common family of optimization heuristics for the minimization of cost functions are gradient-based techniques [81–83], which include stochastic gradient descent and its variants. To leverage gradient-based techniques for the learning of multilayered models, the ability to rapidly differentiate error functionals is key. For this, the backwards propagation of errors [84] (colloquially known as backprop) is a now-canonical method to progressively calculate gradients of parameters in deep networks. In its most general form, this technique is known as automatic differentiation [85], and has become so central to deep learning that this feature of differentiability is at the core of several frameworks for deep learning, including of course TensorFlow (TF) [86], JAX [87], and several others.

To be able to train hybrid quantum-classical models (section III D), the ability to take gradients of quantum neural networks is key. Now that we understand the greater context, let us describe a few techniques below for the estimation of these gradients.

1. Finite difference methods

A simple approach is to use finite-difference methods, for example the central difference method,

\[ \partial_k f(\boldsymbol{\theta}) = \frac{f(\boldsymbol{\theta} + \varepsilon\Delta_k) - f(\boldsymbol{\theta} - \varepsilon\Delta_k)}{2\varepsilon} + \mathcal{O}(\varepsilon^2) \]  (10)

which, in the case where there are M continuous parameters, involves 2M evaluations of the objective function, each evaluation varying the parameters by ε in some direction, thereby giving us an estimate of the gradient of the function with a precision O(ε²). Here Δ_k is a unit-norm perturbation vector in the k-th direction of parameter space, (Δ_k)_j = δ_{jk}. In general, one may use lower-order methods, such as forward difference with O(ε) error from M + 1 objective queries [23], or higher-order methods, such as a five-point stencil method, with O(ε⁴) error from 4M queries [88].

2. Parameter shift methods

As recently pointed out in various works [80, 89], given knowledge of the form of the ansatz (e.g. as in (3)), one can measure the analytic gradients of the expectation value of the circuit for Hamiltonians which have a single term in their Pauli decomposition (3) (or, alternatively, if the Hamiltonian has a spectrum {±λ} for some positive λ). For multi-term Hamiltonians, in [89] a method to
obtain the analytic gradients is proposed which uses a linear combination of unitaries. Here, instead, we will simply use a change of variables and the chain rule to obtain analytic gradients of parametric unitaries of the form (5) without the need for ancilla qubits or additional unitaries.

For a parameter of interest \(\theta_j^{\ell}\) appearing in a layer ℓ in a parametric circuit in the form (5), consider the change of variables \(\eta_k^{j\ell} \equiv \theta_j^{\ell}\beta_k^{j\ell}\); then, from the chain rule of calculus [90], we have

\[ \frac{\partial f}{\partial \theta_j^{\ell}} = \sum_k \frac{\partial f}{\partial \eta_k^{j\ell}} \frac{\partial \eta_k^{j\ell}}{\partial \theta_j^{\ell}} = \sum_k \beta_k^{j\ell} \frac{\partial f}{\partial \eta_k^{j\ell}}. \]  (11)

Thus, all we need to compute are the derivatives of the cost function with respect to \(\eta_k^{j\ell}\). Due to this change of variables, we need to reparameterize our unitary \(\hat{U}(\boldsymbol{\theta})\) from (1) as

\[ \hat{U}^{\ell}(\boldsymbol{\theta}^{\ell}) \mapsto \hat{U}^{\ell}(\boldsymbol{\eta}^{\ell}) \equiv \bigotimes_{j \in \mathcal{I}_\ell} \prod_k e^{-i\eta_k^{j\ell}\hat{P}_k}, \]  (12)

where \(\mathcal{I}_\ell \equiv \{1, \ldots, M_\ell\}\) is an index set for the QNN layers. One can then expand each exponential in the above in a similar fashion to (5):

\[ e^{-i\eta_k^{j\ell}\hat{P}_k} = \cos(\eta_k^{j\ell})\hat{I} - i\sin(\eta_k^{j\ell})\hat{P}_k. \]  (13)

As can be shown from this form, the analytic derivative of the expectation value \(f(\boldsymbol{\eta}) \equiv \langle \Psi_0|\hat{U}^{\dagger}(\boldsymbol{\eta})\hat{H}\hat{U}(\boldsymbol{\eta})|\Psi_0\rangle\) with respect to a component \(\eta_k^{j\ell}\) can be reduced to the following parameter shift rule [46, 80, 91]:

\[ \frac{\partial}{\partial \eta_k^{j\ell}} f(\boldsymbol{\eta}) = f\big(\boldsymbol{\eta} + \tfrac{\pi}{4}\Delta_k^{j\ell}\big) - f\big(\boldsymbol{\eta} - \tfrac{\pi}{4}\Delta_k^{j\ell}\big) \]  (14)

where \(\Delta_k^{j\ell}\) is a vector representing a unit-norm perturbation of the variable \(\eta_k^{j\ell}\) in the positive direction. We thus see that this shift in parameters can generally be much larger than that of the numerical differentiation parameter shifts as in equation (10). In some cases this is useful, as one does not have to resolve as fine-grained a difference in the cost function as an infinitesimal shift, hence requiring fewer runs to achieve a sufficiently precise estimate of the value of the gradient.

Note that in order to compute through the chain rule in (11) for a parametric unitary as in (3), we need to evaluate the expectation function 2K_ℓ times to obtain the gradient of the parameter \(\theta_j^{\ell}\). Thus, in some cases where each parameter generates an exponential of many terms, although the gradient estimate is of higher precision, obtaining an analytic gradient can be too costly in terms of required queries to the objective function. To remedy this additional overhead, Harrow et al. [80] proposed to stochastically select terms according to a distribution weighted by the coefficients of each term in the generator, and to perform gradient descent from these stochastic estimates of the gradient. Let us review this stochastic gradient estimation technique as it is implemented in TFQ.

3. Stochastic Parameter Shift Gradient Estimation

Consider the full analytic gradient from (14) and (11): if we have M_ℓ parameters and L layers, there are \(\sum_{\ell=1}^{L} M_\ell\) terms of the following form to estimate:

\[ \frac{\partial f}{\partial \theta_j^{\ell}} = \sum_{k=1}^{K_{j\ell}} \beta_k^{j\ell} \frac{\partial f}{\partial \eta_k^{j\ell}} = \sum_{k=1}^{K_{j\ell}} \sum_{\pm} \Big[ \pm\beta_k^{j\ell} f\big(\boldsymbol{\eta} \pm \tfrac{\pi}{4}\Delta_k^{j\ell}\big) \Big]. \]  (15)

These terms come from the components of the gradient vector itself, which has dimension equal to the total number of free parameters in the QNN, dim(θ). For each of these components, for the j-th component of the ℓ-th layer, there are \(2K_{j\ell}\) parameter-shifted expectation values to evaluate; thus in total there are \(\sum_{\ell=1}^{L} 2K_{j\ell}M_\ell\) parameterized expectation values of the cost Hamiltonian to evaluate.

For practical implementation of this estimation procedure, we must expand this sum further. Recall that, as the cost Hamiltonian generally will have many terms, for each quantum expectation estimation of the cost function for some value of the parameters, we have

\[ f(\boldsymbol{\theta}) = \langle \hat{H} \rangle_{\boldsymbol{\theta}} = \sum_{m=1}^{N} \alpha_m \langle \hat{h}_m \rangle_{\boldsymbol{\theta}} = \sum_{m=1}^{N}\sum_{q=1}^{J_m} \alpha_m \gamma_q^{m} \langle \hat{P}_q^{m} \rangle_{\boldsymbol{\theta}}, \]  (16)

which has \(\sum_{j=1}^{N} J_j\) terms. Thus, if we consider that all the terms in (15) are of the form of (16), we see that we have a total number of \(\sum_{m=1}^{N}\sum_{\ell=1}^{L} 2J_m K_{j\ell}M_\ell\) expectation values to estimate the gradient. Note that one of these sums comes from the total number of appearances of parameters in front of Paulis in the generators of the parameterized quantum circuit; the second sum comes from the various terms in the cost Hamiltonian in the Pauli expansion.

As the cost of accurately estimating all these terms one by one and subsequently linearly combining the values so as to yield an estimate of the total gradient may be prohibitively expensive in terms of numbers of runs, one can instead stochastically estimate this sum by randomly picking terms according to their weighting [46, 80].

One can sample a distribution over the appearances of a parameter in the QNN, \(k \sim \Pr(k|j,\ell) = |\beta_k^{j\ell}|/\big(\sum_{o=1}^{K_{j\ell}}|\beta_o^{j\ell}|\big)\); one then estimates the two parameter-shifted terms corresponding to this index in (15) and averages over samples. We consider this case to be simply stochastic gradient estimation for the gradient component corresponding to the parameter \(\theta_j^{\ell}\). One can go even further in this spirit: for each of these sampled expectation values, one can also sample terms from (16) according to a similar distribution determined by the magnitude of the Pauli expansion coefficients. Sampling the indices \(\{q, m\} \sim \Pr(q, m) = |\alpha_m\gamma_q^{m}|/\big(\sum_{d=1}^{N}\sum_{r=1}^{J_d}|\alpha_d\gamma_r^{d}|\big)\) and estimating the expectation \(\langle \hat{P}_q^{m} \rangle_{\boldsymbol{\theta}}\) for the appropriate parameter-shifted values …
Note that, in some cases, instead of the expectation values of the set of operators \(\{\hat{h}_k\}_{k=1}^{M}\), one may instead want to relay the histogram of measurement results obtained from multiple measurements of the eigenvalues of each of these observables. This case can also be phrased as a vector of expectation values, as we will now show. First, note that the histogram of the measurement results of some \(\hat{h}_k\) with eigendecomposition \(\hat{h}_k = \sum_{j=1}^{r_k} \lambda_{jk} |\lambda_{jk}\rangle\langle\lambda_{jk}|\) can be considered as a vector of expectation values where the observables are the eigenstate projectors \(|\lambda_{jk}\rangle\langle\lambda_{jk}|\). Instead of obtaining a single real number from the expectation value of \(\hat{h}_k\), we can obtain a vector \(\boldsymbol{h}_k \in \mathbb{R}^{r_k}\), where \(r_k = \mathrm{rank}(\hat{h}_k)\) and the components are given by \((\boldsymbol{h}_k)_j \equiv \langle |\lambda_{jk}\rangle\langle\lambda_{jk}| \rangle_{\boldsymbol{\theta}}\). We are then effectively considering the categorical (empirical) distribution as our vector.

Now, if we consider measuring the eigenvalues of multiple observables \(\{\hat{h}_k\}_{k=1}^{M}\) and collecting the measurement result histograms, we get a 2-dimensional tensor \((\boldsymbol{h}_{\boldsymbol{\theta}})_{jk} = \langle |\lambda_{jk}\rangle\langle\lambda_{jk}| \rangle_{\boldsymbol{\theta}}\). Without loss of generality, we can flatten this array into a vector of dimension \(\mathbb{R}^{R}\) where \(R = \sum_{k=1}^{M} r_k\). Thus, considering vectors of expectation values is a relatively general way of representing the output of a quantum neural network. In the limit where the set of observables considered forms an informationally-complete set of observables [93], the array of measurement outcomes would fully characterize the wavefunction, albeit at an overhead exponential in the number of qubits.

We should mention that in some cases, instead of expectation values or histograms, single samples from the output of a measurement can be used for direct feedback-control on the quantum system, e.g. in quantum error correction [76]. At least in the current implementation of TFQuantum, since quantum circuits are built in Cirq and this feature is not supported in the latter, such scenarios are currently out of scope. Mathematically, in such a scenario, one could then consider the QNN and measurement as a map from quantum circuit parameters θ to the conditional random variable \(\Lambda_{\boldsymbol{\theta}}\) valued over \(\mathbb{R}^{N_k}\), corresponding to the measured eigenvalues \(\lambda_{jk}\) with a probability density \(\Pr[(\Lambda_{\boldsymbol{\theta}})_k \equiv \lambda_{jk}] = p(\lambda_{jk}|\boldsymbol{\theta})\), which corresponds to the measurement statistics distribution induced by the Born rule, \(p(\lambda_{jk}|\boldsymbol{\theta}) = \langle |\lambda_{jk}\rangle\langle\lambda_{jk}| \rangle_{\boldsymbol{\theta}}\). This QNN-and-single-measurement map from the parameters to the conditional random variable, \(f : \boldsymbol{\theta} \mapsto \Lambda_{\boldsymbol{\theta}}\), can be considered as a classical stochastic map (a classical conditional probability distribution over output variables given the parameters). In the case where only expectation values are used, this stochastic map reduces to a deterministic node through which we may backpropagate gradients, as we will see in the next subsection. In the case where this map is used dynamically per-sample, it remains a stochastic map, and though there exist some algorithms for backpropagation through stochastic nodes [94], these are not currently supported natively in TFQ.

E. Autodifferentiation through hybrid quantum-classical backpropagation

As described above, hybrid quantum-classical neural network blocks take as input a set of real parameters \(\boldsymbol{\theta} \in \mathbb{R}^M\), apply a circuit \(\hat{U}(\boldsymbol{\theta})\), and take a set of expectation values of various observables,

\[ (\boldsymbol{h}_{\boldsymbol{\theta}})_k = \langle \hat{h}_k \rangle_{\boldsymbol{\theta}}. \]

The result of this parameter-to-expected-value map is a function \(f : \mathbb{R}^M \to \mathbb{R}^N\) which maps parameters to a real-valued vector,

\[ f : \boldsymbol{\theta} \mapsto \boldsymbol{h}_{\boldsymbol{\theta}}. \]

This function can then be composed with other parameterized function blocks comprised of either quantum or classical neural network blocks, as depicted in Fig. 11.

To be able to backpropagate gradients through general meta-networks of quantum and classical neural network blocks, we simply have to figure out how to backpropagate gradients through a quantum parameterized block function when it is composed with other parameterized block functions. Due to the partial ordering of the quantum-classical computational graph, we can focus on how to backpropagate gradients through a QNN in the scenario where we consider a simplified quantum-classical network where we combine all function blocks that precede and postcede the QNN block into monolithic function blocks. Namely, we can consider a scenario where we have \(f_{\mathrm{pre}} : \boldsymbol{x}_{\mathrm{in}} \mapsto \boldsymbol{\theta}\) (\(f_{\mathrm{pre}} : \mathbb{R}^{\mathrm{in}} \to \mathbb{R}^{M}\)) as the block preceding the QNN, the QNN block as \(f_{\mathrm{qnn}} : \boldsymbol{\theta} \mapsto \boldsymbol{h}_{\boldsymbol{\theta}}\) (\(f_{\mathrm{qnn}} : \mathbb{R}^{M} \to \mathbb{R}^{N}\)), the post-QNN block as \(f_{\mathrm{post}} : \boldsymbol{h}_{\boldsymbol{\theta}} \mapsto \boldsymbol{y}_{\mathrm{out}}\) (\(f_{\mathrm{post}} : \mathbb{R}^{N} \to \mathbb{R}^{N_{\mathrm{out}}}\)), and finally the loss function for training the entire network being computed from this output, \(L : \mathbb{R}^{N_{\mathrm{out}}} \to \mathbb{R}\). The composition of functions from the input data to the output loss is then the sequentially composited function \((L \circ f_{\mathrm{post}} \circ f_{\mathrm{qnn}} \circ f_{\mathrm{pre}})\). This scenario is depicted in Fig. 12.

Now, let us describe the process of backpropagation through this composition of functions. As is standard in backpropagation of gradients through feedforward networks, we begin with the loss function evaluated at the output units and work our way back through the several layers of functional composition of the network to get the gradients. The first step is to obtain the gradient of the loss function \(\partial L/\partial \boldsymbol{y}_{\mathrm{out}}\) and to use classical (regular) backpropagation of gradients to obtain the gradient of the loss with respect to the output of the QNN, i.e. we get \(\partial(L \circ f_{\mathrm{post}})/\partial \boldsymbol{h}\) via the usual use of the chain rule for backpropagation, i.e., the contraction of the Jacobian with the gradient of the subsequent layer, \(\partial(L \circ f_{\mathrm{post}})/\partial \boldsymbol{h} = \frac{\partial L}{\partial \boldsymbol{y}} \cdot \frac{\partial f_{\mathrm{post}}}{\partial \boldsymbol{h}}\).

Now, let us label the evaluated gradient of the loss function with respect to the QNN's expectation values as

\[ \boldsymbol{g} \equiv \left. \frac{\partial(L \circ f_{\mathrm{post}})}{\partial \boldsymbol{h}} \right|_{\boldsymbol{h}=\boldsymbol{h}_{\boldsymbol{\theta}}}. \]  (20)
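To make the role of g concrete, the following is a minimal NumPy sketch of the contraction performed at the QNN block during hybrid backpropagation. It assumes a black-box callable expectations(params) that returns the vector h_θ (for example, evaluated on a simulator), and uses the parameter-shift rule of equation (14) for the Jacobian; the function names and the shift convention are illustrative rather than the exact TFQ implementation.

import numpy as np

def qnn_parameter_gradient(expectations, params, g, shift=np.pi / 4):
    """Contract the upstream gradient g with the QNN Jacobian dh/dtheta.

    expectations: callable mapping a parameter vector to the vector h_theta.
    params:       current parameter vector theta, shape [M].
    g:            upstream gradient dL/dh evaluated at h_theta, shape [N].
    """
    grad = np.zeros_like(params)
    for k in range(len(params)):
        shifted_plus = params.copy()
        shifted_minus = params.copy()
        shifted_plus[k] += shift
        shifted_minus[k] -= shift
        # Parameter-shift estimate of the Jacobian column dh/dtheta_k, cf. Eq. (14).
        jac_col = expectations(shifted_plus) - expectations(shifted_minus)
        # Chain rule: dL/dtheta_k = sum_j g_j * dh_j/dtheta_k.
        grad[k] = np.dot(g, jac_col)
    return grad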
To choose an architecture for a neural network model, one can draw inspiration from the symmetries in the training data. For example, in computer vision, one often needs to detect corners and edges regardless of their position in an image; we thus postulate that a neural network to detect these features should be invariant under translations. In classical deep learning, an example of such translationally-invariant neural networks are convolutional neural networks. These networks tie parameters across space, learning a shared set of filters which are applied equally to all portions of the data.

To the best of the authors' knowledge, there is no strong indication that we should expect a quantum advantage for the classification of classical data using QNNs in the near term. For this reason, we focus on classifying quantum data as defined in section I C. There are many kinds of quantum data with translational symmetry. One example of such quantum data are cluster states. These states are important because they are the initial states for measurement-based quantum computation [98, 99].

In this example we will tackle the problem of detecting errors in the preparation of a simple cluster state. We can think of this as a supervised classification task: our training data will consist of a variety of correctly and incorrectly prepared cluster states, each paired with their label. This classification task can be generalized to condensed matter physics and beyond, for example to the classification of phases near quantum critical points, where the degree of entanglement is high.

Since our simple cluster states are translationally invariant, we can extend the spatial parameter-tying of convolutional neural networks to quantum neural networks, using recent work by Cong, et al. [52], which introduced a Quantum Convolutional Neural Network (QCNN) architecture. QCNNs are essentially a quantum circuit version of a MERA (Multiscale Entanglement Renormalization Ansatz) network [100]. MERA has been extensively studied in condensed matter physics. It is a hierarchical representation of highly entangled wavefunctions. The intuition is that as we go higher in the network, the wavefunction's entanglement gets renormalized (coarse grained) and simultaneously a compressed representation of the wavefunction is formed. This is akin to the compression effects encountered in deep neural networks [101].

Here we extend the QCNN architecture to include classical neural network postprocessing, yielding a Hybrid Quantum Convolutional Neural Network (HQCNN). We perform several low-depth quantum operations in order to begin to extract hidden parameter information from a wavefunction, then pass the resulting statistical information to a classical neural network for further processing. Specifically, we will apply one layer of the hierarchy of the QCNN. This allows us to partially disentangle the input state and obtain statistics about values of multi-local observables. In this strategy, we deviate from the original construction of Cong et al. [52]. Indeed, we are more in the spirit of classical convolutional networks, where there are several families of filters, or feature maps, applied in a translationally-invariant fashion. Here, we apply a single QCNN layer followed by several feature maps. The outputs of these feature maps can then be fed into classical convolutional network layers, or in this particular simplified example directly to fully-connected layers.

Target problems:

1. Learn to extract classical information hidden in correlations of a quantum system
2. Utilize shallow quantum circuits via hybridization with classical neural networks to extract information

Required TFQ functionalities:

1. Hybrid quantum-classical network models
2. Batch quantum circuit simulator
3. Quantum expectation based back-propagation
4. Fast classical gradient-based optimizer

2. Implementations

As discussed in section II D 2, the first step in the QML pipeline is the preparation of quantum data. In this example, our quantum dataset is a collection of correctly and incorrectly prepared cluster states on 8 qubits, and the task is to classify these states. The dataset preparation proceeds in two stages; in the first stage, we generate a correctly prepared cluster state:

def cluster_state_circuit(bits):
    circuit = cirq.Circuit()
    circuit.append(cirq.H.on_each(bits))
    for this_bit, next_bit in zip(
            bits, bits[1:] + [bits[0]]):
        circuit.append(
            cirq.CZ(this_bit, next_bit))
    return circuit

Errors in cluster state preparation will be simulated by applying Rx(θ) gates that rotate a qubit about the X-axis of the Bloch sphere by some amount 0 ≤ θ ≤ 2π. These excitations will be labeled 1 if the rotation is larger than some threshold, and -1 otherwise. Since the correctly prepared cluster state is always the same, we can think of it as the initial state in the pipeline and append the excitation circuits corresponding to various error states:

cluster_state_bits = cirq.GridQubit.rect(1, 8)
excitation_input = tf.keras.Input(
    shape=(), dtype=tf.dtypes.string)
cluster_state = tfq.layers.AddCircuit()(
    excitation_input, prepend=
    cluster_state_circuit(cluster_state_bits))

Note how excitation_input is a standard Keras data ingester. The datatype of the input is string to account for our circuit serialization mechanics described in section II E 1.

Having prepared our dataset, we begin construction of our model. The quantum portion of all the models we …
Figure 15. A simple hybrid architecture in which the outputs of a truncated QCNN are fed into a classical neural network. [The diagram shows a Prepare Cluster State stage followed by a Qconv + QPool block whose measured outputs feed a classical network trained with an MSE loss against the ±1 labels.]

Figure 16. A hybrid architecture in which the outputs of 3 separate truncated QCNNs are fed into a classical neural network.

…input, along with the associated labels. The loss plots for both the training and validation datasets can be generated by running the associated example notebook.

We now consider a hybrid classifier. Instead of using quantum layers to pool all the way down to 1 qubit, we can truncate the QCNN and measure a vector of operators on the remaining qubits. The resulting vector of expectation values is then fed into a classical neural network. This hybrid model is shown schematically in Fig. 15.

This can be achieved in TFQ with a few simple modifications to the previous model, implemented with the code below:

# Build multi-readout quantum layer
readouts = [cirq.Z(bit) for bit in
            cluster_state_bits[4:]]
quantum_model_dual = tfq.layers.PQC(
    multi_readout_model_circuit(
        cluster_state_bits),
    readouts)(cluster_state)
# Build classical neural network layers
d1_dual = tf.keras.layers.Dense(8)(
    quantum_model_dual)
d2_dual = tf.keras.layers.Dense(1)(d1_dual)
hybrid_model = tf.keras.Model(inputs=[
    excitation_input], outputs=[d2_dual])

…
    readouts)(cluster_state)
QCNN_3 = tfq.layers.PQC(
    multi_readout_model_circuit(
        cluster_state_bits),
    readouts)(cluster_state)
# Feed all QCNNs into a classical NN
concat_out = tf.keras.layers.concatenate(
    [QCNN_1, QCNN_2, QCNN_3])
dense_1 = tf.keras.layers.Dense(8)(concat_out)
dense_2 = tf.keras.layers.Dense(1)(dense_1)
multi_qconv_model = tf.keras.Model(
    inputs=[excitation_input],
    outputs=[dense_2])

We find that for the same optimization settings, the purely quantum model trains the slowest, while the three-quantum-filter hybrid model trains the fastest. This data is shown in Fig. 17. This demonstrates the advantage of exploring hybrid quantum-classical architectures for classifying quantum data.
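Training either architecture then follows the usual Keras workflow. The sketch below is illustrative: train_excitations and train_labels stand in for the tensor of excitation circuits and the ±1 labels generated as described above, and the optimizer settings are placeholders.

# Compile and fit the single-filter hybrid model of Fig. 15.
hybrid_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
    loss=tf.keras.losses.MeanSquaredError())
hybrid_history = hybrid_model.fit(
    x=train_excitations, y=train_labels,
    epochs=25, validation_split=0.2)

# The three-filter model of Fig. 16 trains the same way.
multi_qconv_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.02),
    loss=tf.keras.losses.MeanSquaredError())
multi_qconv_history = multi_qconv_model.fit(
    x=train_excitations, y=train_labels,
    epochs=25, validation_split=0.2)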
\[ |\psi_0^i\rangle = \hat{U}_0^i |0\rangle \]  (23)
\[ |\psi\rangle_x = e^{-i\frac{\beta}{2}\hat{Z}} e^{-i\frac{\gamma}{2}\hat{Y}} e^{-i\frac{\delta}{2}\hat{Z}} |\psi_0^i\rangle, \]  (24)
\[ \langle \hat{X} \rangle_x = \langle\psi|_x \hat{X} |\psi\rangle_x, \]  (25)
\[ \langle \hat{Y} \rangle_x = \langle\psi|_x \hat{Y} |\psi\rangle_x, \]  (26)
\[ \langle \hat{Z} \rangle_x = \langle\psi|_x \hat{Z} |\psi\rangle_x. \]  (27)

Assuming we have prepared the training dataset, each set consists of input vectors \(x_i = [\phi, \theta_1, \theta_2]\) which derive from the randomly drawn g, unitaries that prepare each initial state \(\hat{U}_0^i\), and the associated expectation values \(y_i = [\langle \hat{X} \rangle_{x_i}, \langle \hat{Y} \rangle_{x_i}, \langle \hat{Z} \rangle_{x_i}]\).

Now we are ready to define the hybrid quantum-classical neural network model in Keras with the TensorFlow API. To start, we first define the quantum part of the hybrid neural network, which is a simple quantum circuit of three single-qubit gates, as follows:

control_params = sympy.symbols('theta_{1:3}')
qubit = cirq.GridQubit(0, 0)
control_circ = cirq.Circuit(
    cirq.rz(control_params[2])(qubit),
    cirq.ry(control_params[1])(qubit),
    cirq.rz(control_params[0])(qubit))

We are now ready to finish off the hybrid network by defining the classical part, which maps the target params to the control vector g = {β, γ, δ}. Assuming we have defined the vector of observables ops, the code to build the model is:

circ_in = tf.keras.Input(
    shape=(), dtype=tf.dtypes.string)
x_in = tf.keras.Input((3,))
d1 = tf.keras.layers.Dense(128)(x_in)
d2 = tf.keras.layers.Dense(128)(d1)
d3 = tf.keras.layers.Dense(64)(d2)
g = tf.keras.layers.Dense(3)(d3)
exp_out = tfq.layers.ControlledPQC(
    control_circ, ops)([circ_in, g])

Now, we are ready to put everything together to define and train a model in Keras. The two axis control model is defined as follows: …

To train this hybrid supervised model, we define an optimizer, which in our case is the Adam optimizer, with an appropriately chosen loss function:

model.compile(
    optimizer=tf.keras.optimizers.Adam(), loss='mse')

We finish off by training on the prepared supervised data in the standard way:

history_two_axis = model.fit(...

The training converges after around 100 epochs, as seen in Fig. 19, which also shows excellent generalization to validation data.

2. Time-dependent Hamiltonian Control

Now we consider a second kind of quantum control problem, where the actuated system H is allowed to change in time. If the system is changing with time, the optimal control g∗ is also generally time varying. Generalizing the discussion of section IV B 1, we can write the time-varying control error, given the time-varying controller F(t), as \(e_j(F(t), t) = |y_j - H(F(x_j, t), t)|\). The optimal control can then be written as \(g^*(t) = \bar{g}^* + \delta(t)\). This task is significantly harder than the problem discussed in section IV B 1, since we need to learn the hidden variable δ(t), which can result in potentially highly complex real-time system dynamics. We showcase how TFQ provides the perfect toolbox for such difficult control optimization with an important and realistic problem of learning and thus compensating the low frequency noise.

One of the main contributions to time-drifting errors in realistic quantum control is 1/f^α-like errors, which encapsulate errors in the Hamiltonian amplitudes whose frequency spectrum has a large component in the low-frequency regime. The origin of such low frequency noise remains largely controversial. Mathematically, we can parameterize the low frequency noise in the time domain with the amplitude of the Pauli Z Hamiltonian on each …
Let us define our parameterized quantum circuit ansatz. The canonical choice is to start with a uniform superposition \(|\psi_0\rangle = |+\rangle^{\otimes n} = \frac{1}{\sqrt{2^n}}\sum_{x \in \{0,1\}^n} |x\rangle\), hence a fixed state. The QAOA unitary itself then consists of applying […] where G = {V, E} is a graph for which we would like to find the MaxCut: the largest-size subset of edges (cut set) such that vertices at the ends of these edges belong to a different partition of the vertices into two disjoint subsets [12].

To train the QAOA, we simply optimize the expectation value of our cost Hamiltonian with respect to our parameterized output to find (approximately) optimal parameters; \(\eta^*, \gamma^* = \mathrm{argmin}_{\eta,\gamma}\, \mathcal{L}(\eta, \gamma)\), where \(\mathcal{L}(\eta, \gamma) = \langle \Psi_{\eta\gamma} | \hat{H}_C | \Psi_{\eta\gamma} \rangle\) is our loss. Once trained, we use the QPU to sample the probability distribution of measurements of the parameterized output state at optimal angles in the standard basis, \(x \sim p(x) = |\langle x | \Psi_{\eta^*\gamma^*} \rangle|^2\), and pick the lowest energy bitstring from those samples as our approximate optimum found by the QAOA.

Let us walk through how to implement such a basic QAOA in TFQ. The first step is to generate an instance of the MaxCut problem. For this tutorial we generate a random 3-regular graph with 10 nodes with NetworkX [108].

# generate a 3-regular graph with 10 nodes
maxcut_graph = nx.random_regular_graph(n=10, d=3)

Subsequently, we use these ingredients to build our model. We note here that in this case QAOA has no input data and labels, as we have mapped our graph to the QAOA circuit. To use the TFQ framework we specify the Hadamard circuit as input and convert it to a TFQ tensor. We may then construct a tf.keras model using our QAOA circuit and cost in a TFQ PQC layer, and use a single instance sample for training the variational parameters of the QAOA, with the Hadamard gates as an input layer and a target value of 0 for our loss function, as this is the theoretical minimum of this optimization problem.

This translates into the following code:

# define the model and training data
model_circuit, model_readout = qaoa_circuit, cost_ham
input_ = [hadamard_circuit]
input_ = tfq.convert_to_tensor(input_)
optimum = [0]
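The remaining steps described above can be sketched as follows; the optimizer choice and epoch count are illustrative, and qaoa_circuit, cost_ham, and hadamard_circuit are the objects constructed earlier in this example.

# Build a Keras model whose only trainable layer is the QAOA PQC.
circuit_input = tf.keras.Input(shape=(), dtype=tf.dtypes.string)
cost_expectation = tfq.layers.PQC(
    model_circuit, model_readout)(circuit_input)
qaoa_model = tf.keras.Model(
    inputs=circuit_input, outputs=cost_expectation)

# Train the variational parameters against the target value of 0.
qaoa_model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.05),
    loss=tf.keras.losses.MeanSquaredError())
qaoa_model.fit(x=input_, y=optimum, epochs=100)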
The following applications represent how we have applied TFQ to accelerate the discovery of new quantum algorithms. The examples presented in this section are newer research as compared to the previous section; as such, they have not had as much time for feedback from the community. We include these here as they are demonstrations of the sort of advanced QML research that can be accomplished by combining several building blocks provided in TFQ. As many of these examples involve the building and training of hybrid quantum-classical models and advanced optimizers, such research would be much more difficult to implement without TFQ. In our researchers' experience, the performance gains and the ease of use of TFQ decreased the time-to-working-prototype from weeks to days or even hours when compared to using alternative tools.

Finally, as we would like to provide users with advanced examples to see TFQ in action for research use-cases beyond basic implementations, along with the examples presented in this section are several notebooks accessible on Github:

github.com/tensorflow/quantum/tree/research

We encourage readers to read the section below for an overview of the theory and use of TFQ functions and would encourage avid readers who want to experiment with the code to visit the full notebooks.

A. Meta-learning for Variational Quantum Optimization

To run this example in the browser through Colab, follow the link:
research/metalearning qaoa/metalearning qaoa.ipynb

In section IV C, we have shown how to implement basic QAOA in TFQ and optimize it with a gradient-based optimizer; we can now explore how to leverage classical neural networks to optimize QAOA parameters. To run this example in the browser via Colab, follow the link:

In recent works, the use of classical recurrent neural networks to learn to optimize the parameters [45] (or gradient descent hyperparameters [111]) was proposed. As the choice of parameters after each iteration of quantum-classical optimization can be seen as the task of generating a sequence of parameters which converges rapidly to an approximate optimum of the landscape, we can use a type of classical neural network that is naturally suited to generate sequential data, namely, recurrent neural networks. This technique was derived from work by DeepMind [104] for optimization of classical neural networks and was extended to be applied to quantum neural networks [45].

The application of such classical learning-to-learn techniques to quantum neural networks was first proposed in [45]. In this work, an RNN (long short term memory; LSTM) gets fed the parameters of the current iteration and the value of the expectation of the cost Hamiltonian of the QAOA, as depicted in Fig. 21. More precisely, the RNN receives as input the previous QNN query's estimated cost function expectation \(y_t \sim p(y|\boldsymbol{\theta}_t)\), where \(y_t\) is the estimate of \(\langle \hat{H} \rangle_t\), as well as the parameters for which the QNN was evaluated, \(\boldsymbol{\theta}_t\). The RNN at this time step also receives information stored in its internal hidden state from the previous time step, \(\boldsymbol{h}_t\). The RNN itself has trainable parameters ϕ, and hence it applies the parameterized mapping

\[ \boldsymbol{h}_{t+1}, \boldsymbol{\theta}_{t+1} = \mathrm{RNN}_{\phi}(\boldsymbol{h}_t, \boldsymbol{\theta}_t, y_t) \]  (32)
28
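As a concrete illustration of the mapping in Eq. (32), the following is a minimal sketch (not the implementation of [45]; the cell size, the projection layer, and the toy cost function standing in for a TFQ expectation evaluation are all illustrative assumptions) of a recurrent optimizer step in TensorFlow:

import tensorflow as tf

n_params = 2  # e.g. (eta, gamma) for a depth-1 QAOA

# LSTM cell playing the role of RNN_phi, plus a projection to parameter space.
cell = tf.keras.layers.LSTMCell(units=16)
to_params = tf.keras.layers.Dense(n_params)

def qnn_cost_estimate(params):
    # Stand-in for a TFQ expectation evaluation of the QAOA cost; a simple
    # quadratic bowl keeps the sketch self-contained.
    return tf.reduce_sum(params ** 2, axis=-1, keepdims=True)

def rnn_step(hidden_state, params, cost_estimate):
    """One application of (h_t, theta_t, y_t) -> (h_{t+1}, theta_{t+1})."""
    rnn_input = tf.concat([params, cost_estimate], axis=-1)
    output, new_hidden_state = cell(rnn_input, hidden_state)
    return new_hidden_state, to_params(output)

# Unrolled quantum-classical optimization loop.
hidden_state = cell.get_initial_state(batch_size=1, dtype=tf.float32)
params = tf.zeros([1, n_params])
for t in range(10):
    cost_estimate = qnn_cost_estimate(params)
    hidden_state, params = rnn_step(hidden_state, params, cost_estimate)

In practice the LSTM's own parameters ϕ would be trained by backpropagating a loss through this unrolled loop, so that the RNN learns to propose parameter sequences that descend the cost landscape quickly.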
One approach is to simply initialize a circuit to the identity to avoid this problem, but this incurs some subtle challenges. First, such a fixed initialization will tend to bias results on general datasets. This challenge has been studied in the context of more general block-identity initialization schemes [119].

Perhaps the more insidious way this problem arises is that training with a method like stochastic gradient descent (or sophisticated variants like Adam) on the entire network can accidentally lead one onto a barren plateau if the learning rate is too high. This is because the barren plateau argument is one of volume of parameter space and quantum-classical information exchange, and random diffusion in parameter space will tend to lead one onto a plateau. This means that even the most clever initialization can be thwarted by the impact of this phenomenon on the training process. In practice this severely limits the learning rate and hence the training efficiency of QNNs.

For this reason, one can consider training on subsets of the network which do not have the ability to completely randomize during a random walk. This layerwise learning strategy allows one to use larger learning rates and improves training efficiency in quantum circuits [120]. We advocate the use of these strategies in combination with appropriately designed local cost functions in order to circumvent the dramatically worse problems with objectives like fidelity [113, 115]. TFQ has been designed to make experimenting with both of these strategies straightforward for the user, as we now document. For an example of barren plateaus, see the notebook at the following link:

docs/tutorials/barren plateaus.ipynb

2. Layerwise quantum circuit learning

So far, the network training methods demonstrated in section IV have focused on simultaneous optimization of all network parameters, or end-to-end training. As alluded to in the section on the barren plateau effect (V B), this type of strategy, when combined with a network of sufficient depth, can force reduced learning rates, even with clever initialization. While this may not be the optimal strategy for every realization of a quantum network, TFQ is designed to easily facilitate testing of this idea in conjunction with different cost functions to enhance efficiency. An alternative strategy that has proven beneficial is layerwise learning (LL) [120], where the number of trained parameters is altered on the fly. In this section, we will learn to alter the architecture of a circuit while it trains, and restrict attention to blocks of a size insufficient to randomize onto a plateau. Among other things, this type of learning strategy can help us avoid initializing on, or drifting during training of our QNN onto, a barren plateau [113, 120]. In [121], it is also shown that gradient-based algorithms are more successful in finding global minima with overparameterized circuits, and that shallow circuits approach this limit as they increase in size. It is not necessarily clear when this transition occurs, so LL is a cost-efficient strategy for approaching good local minima.

Target problems:

1. Dynamically building circuits for arbitrary learning tasks
2. Manipulating circuit structure and parameters during training
3. Reducing the number of trained parameters and circuit depth
4. Avoiding initialization on, or drifting to, a barren plateau

Required TFQ functionalities:

1. Parameterized circuit layers
2. Keras weight manipulation interface
3. Parameter shift differentiator for exact gradient computation

To run this example in the browser through Colab, follow the link:

research/layerwise learning/layerwise learning.ipynb

As an example of how this functionality may be explored in TFQ, we will look at randomly generated layers as described in section V B, where one layer consists of a randomly chosen rotation gate around the X, Y, or Z axis on each qubit, followed by a ladder of CZ gates over all qubits.

def create_layer(qubits, layer_id):
    # create symbols for trainable parameters
    symbols = [
        sympy.Symbol(f'{layer_id}-{str(i)}')
        for i in range(len(qubits))]
    # build layer from random gates
    gates = [
        random.choice([
            cirq.Rx, cirq.Ry, cirq.Rz])(
            symbols[i])(q)
        for i, q in enumerate(qubits)]
    # add connections between qubits
    for control, target in zip(
            qubits, qubits[1:]):
        gates.append(cirq.CZ(control, target))
    return gates, symbols

We assume that we do not know the ideal circuit structure for our learning problem, so we start with the shallowest circuit possible and let our model grow from there. In this case we start with one initial layer and add a new layer after it has trained for 10 epochs. First, we need to specify some variables:

# number of qubits and layers in our circuit
n_qubits = 6
n_layers = 8
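To sketch how these pieces can come together, the following is a minimal illustration of layerwise training (it is not the research notebook's exact code): each newly appended layer is trained for a fixed number of steps while all previously trained parameters stay frozen. The single-observable ⟨Z_0⟩ objective and the optimizer settings are illustrative assumptions, and a full LL schedule would also revisit and fine-tune earlier layers.

import random
import cirq
import sympy
import numpy as np
import tensorflow as tf
import tensorflow_quantum as tfq

qubits = cirq.GridQubit.rect(1, n_qubits)
readout = cirq.Z(qubits[0])
expectation = tfq.layers.Expectation()
optimizer = tf.keras.optimizers.Adam(learning_rate=0.05)

circuit = cirq.Circuit()
symbol_names, frozen_values = [], []

for layer_id in range(n_layers):
    gates, new_symbols = create_layer(qubits, f'layer{layer_id}')
    circuit += gates
    symbol_names += [str(s) for s in new_symbols]
    circuit_tensor = tfq.convert_to_tensor([circuit])

    # Only the freshly added layer's parameters are trainable this round.
    new_values = tf.Variable(np.zeros(len(new_symbols), dtype=np.float32))
    for step in range(10):
        with tf.GradientTape() as tape:
            values = tf.concat(
                [tf.constant(frozen_values, dtype=tf.float32), new_values],
                axis=0)
            cost = expectation(circuit_tensor,
                               symbol_names=symbol_names,
                               symbol_values=tf.expand_dims(values, 0),
                               operators=readout)
            loss = tf.reduce_sum(cost)  # illustrative objective: minimize <Z_0>
        grads = tape.gradient(loss, [new_values])
        optimizer.apply_gradients(zip(grads, [new_values]))

    # Freeze the layer we just trained before growing the circuit further.
    frozen_values = frozen_values + list(new_values.numpy())

Keeping the trainable block small in this way is exactly what prevents the optimization from performing an unrestricted random walk over the full parameter volume.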
This section covers a Quantum Graph Recurrent Neural Network-based approach to learning the Hamiltonian of a quantum dynamical process, given access to quantum states at various time steps.

As was pointed out in the barren plateaus section V B, attempting to do QML with no prior on the physics of the system and no imposed structure on the ansatz runs into a quantum version of the no-free-lunch theorem: the network has too high a capacity for the problem at hand and is thus hard to train, as evidenced by its vanishing gradients. Here, instead, we use a highly structured ansatz, from work featured in [124]. First of all, given that we know we are trying to replicate quantum dynamics, we can structure our ansatz to be based on Trotter-Suzuki evolution [77] of a learnable parameterized Hamiltonian. This effectively performs a form of parameter tying in our ansatz between the several layers representing time evolution. In a previous example on quantum convolutional networks (IV A 1), we performed parameter tying for spatial translation invariance; here, we assume the dynamics remain constant through time and perform parameter tying across time, so the model is akin to a quantum form of recurrent neural networks (RNNs). More precisely, as it is a parameterization of a Hamiltonian evolution, it is akin to a quantum form of the recently proposed classical machine learning models called Hamiltonian neural networks [125].

Beyond the quantum RNN form, we can impose further structure. We can consider a scenario where we know we have a one-dimensional quantum many-body system. As Hamiltonians of physical systems have local couplings, we can use our prior assumption of locality in the Hamiltonian and encode it as a graph-based parameterization of the Hamiltonian. As we will see below, by using a Quantum Graph Recurrent Neural Network [124] implemented in TFQ, we will be able to learn the effective Hamiltonian topology and coupling strengths quite accurately, simply from having access to quantum states at different times and employing mini-batched gradient-based training.

Before we proceed, it is worth mentioning that the approach featured in this section is quite different from the learning of quantum dynamics using a classical RNN featured in the previous example, section IV B. As sampling the output of a quantum simulation at different times can become exponentially hard, we can imagine that for large systems the quantum RNN dynamics learning approach could have primacy over the classical RNN approach, thus potentially demonstrating a quantum advantage of QML over classical ML for this problem.

Target problems:

1. Preparing low-energy states of a quantum system
2. Learning quantum dynamics using a quantum neural network model

Required TFQ functionalities:

2. Training multi-layered quantum neural networks with shared parameters
3. Batching QNN training data (input-output pairs and time steps) for supervised learning of a quantum unitary map

2. Implementation

Please see the tutorial notebook for full code details:

research/qgrnn ising/qgrnn ising.ipynb

Here we provide an overview of our implementation. We can define a general Quantum Graph Neural Network as a repeating sequence of exponentials of Hamiltonians defined on a graph, Û_qgnn(η, θ) = Π_{p=1}^{P} Π_{q=1}^{Q} e^{−iη_{pq} Ĥ_q(θ)}, where the Ĥ_q(θ) are generally 2-local Hamiltonians whose coupling topology is that of an assumed graph structure.

In our Hamiltonian learning problem, we aim to learn a target Ĥ_target, which will be an Ising model Hamiltonian with couplings J_{jk} and a site bias B_v for each spin, i.e., Ĥ_target = Σ_{j,k} J_{jk} Ẑ_j Ẑ_k + Σ_v B_v Ẑ_v + Σ_v X̂_v, given access to pairs of states at different times that were subjected to the target time evolution operator Û(T) = e^{−iĤ_target T}.

We will use a recurrent form of QGNN, with Hamiltonian generators Ĥ_1(θ) = Σ_{v∈V} α_v X̂_v and Ĥ_2(θ) = Σ_{{j,k}∈E} θ_{jk} Ẑ_j Ẑ_k + Σ_{v∈V} φ_v Ẑ_v, with trainable parameters {θ_{jk}, φ_v, α_v}, for our choice of graph structure prior G = {V, E}. The QGRNN then resembles a Trotterized time evolution of a parameterized Ising Hamiltonian Ĥ(θ) = Ĥ_1(θ) + Ĥ_2(θ), where P is the number of Trotter steps. This is a good parameterization for learning the effective Hamiltonian of the black-box dynamics, as we know from quantum simulation theory that Trotterized time evolution operators can closely approximate the true dynamics in the limit |η_{jk}| → 0 while P → ∞.

For our TFQ software implementation, we can initialize the Ising model and QGRNN model parameters as random values on a graph. It is very easy to construct this kind of graph-structured Hamiltonian using the Python NetworkX library.

import networkx as nx
import numpy as np

N = 6
dt = 0.01
# Target Ising model parameters
G_ising = nx.cycle_graph(N)
ising_w = [dt * np.random.random() for _ in G_ising.edges]
ising_b = [dt * np.random.random() for _ in G_ising.nodes]

Because the target Hamiltonian and its nearest-neighbor graph structure are unknown to the QGRNN, we need to initialize the
parameters of the QGRNN.

# QGRNN model parameters
G_qgrnn = nx.random_regular_graph(n=N, d=4)
qgrnn_w = [dt] * len(G_qgrnn.edges)
qgrnn_b = [dt] * len(G_qgrnn.nodes)
theta = ['theta{}'.format(e) for e in G_qgrnn.edges]
phi = ['phi{}'.format(v) for v in G_qgrnn.nodes]
params = theta + phi

Now that we have the graph structure and the weights of its edges and nodes, we can construct a Cirq-based Hamiltonian operator that can be used directly in Cirq and TFQ. To create a Hamiltonian using cirq.PauliSum's or cirq.PauliString's, we need to assign appropriate qubits to them. Let us assume Hamiltonian() is a Hamiltonian preparation function that generates the cost Hamiltonian from the interaction weights and the mixer Hamiltonian from the bias terms (a sketch of one possible implementation of such a helper is given at the end of this subsection). We can create the qubits of the Ising and QGRNN models using cirq.GridQubit:

qubits_ising = cirq.GridQubit.rect(1, N)
qubits_qgrnn = cirq.GridQubit.rect(1, N, 0, N)
ising_cost, ising_mixer = Hamiltonian(
    G_ising, ising_w, ising_b, qubits_ising)
qgrnn_cost, qgrnn_mixer = Hamiltonian(
    G_qgrnn, qgrnn_w, qgrnn_b, qubits_qgrnn)

To train the QGRNN, we need to create an ensemble of states which are to be subjected to the unknown dynamics. We chose to prepare low-energy states by first performing a Variational Quantum Eigensolver (VQE) [28] optimization to obtain an approximate ground state. Following this, we can apply different amounts of simulated time evolution to this state to obtain a varied dataset. This emulates having a physical system in a low-energy state and randomly picking the state at different times. First things first, let us build a VQE model:

def VQE(H_target, q):
    # Parameters
    x = ['x{}'.format(i) for i, _ in enumerate(q)]
    z = ['z{}'.format(i) for i, _ in enumerate(q)]
    symbols = x + z
    circuit = cirq.Circuit()
    circuit.append(cirq.X(q_) ** sympy.Symbol(x_)
                   for q_, x_ in zip(q, x))
    circuit.append(cirq.Z(q_) ** sympy.Symbol(z_)
                   for q_, z_ in zip(q, z))

Now that we have a parameterized quantum circuit, we can minimize the expectation value of the given Hamiltonian. Again, we can construct a Keras model with Expectation. Because the output expectation values are calculated separately for each operator, we need to sum them at the end:

    circuit_input = tf.keras.Input(
        shape=(), dtype=tf.string)
    output = tfq.layers.Expectation()(
        circuit_input,
        symbol_names=symbols,
        operators=tfq.convert_to_tensor(

Finally, we can get approximate lowest-energy states from the VQE model by compiling and training the above Keras model.²

    model = tf.keras.Model(
        inputs=circuit_input, outputs=output)
    adam = tf.keras.optimizers.Adam(
        learning_rate=0.05)
    low_bound = -np.sum(np.abs(ising_w + ising_b)) - N
    inputs = tfq.convert_to_tensor([circuit])
    outputs = tf.convert_to_tensor([[low_bound]])
    model.compile(optimizer=adam, loss='mse')
    model.fit(x=inputs, y=outputs,
              batch_size=1, epochs=100)
    params = model.get_weights()[0]
    res = {k: v for k, v in zip(symbols, params)}
    return cirq.resolve_parameters(circuit, res)

Now that the VQE function is built, we can generate the initial quantum data input with low-energy states near the ground state of the target Hamiltonian, for both our data and the input state to our QGRNN:

H_target = ising_cost + ising_mixer
low_energy_ising = VQE(H_target, qubits_ising)
low_energy_qgrnn = VQE(H_target, qubits_qgrnn)

The QGRNN is fed the same input data as the true process. We will use gradient-based training over minibatches of randomized timesteps chosen for our QGRNN and the target quantum evolution. We will thus need to aggregate the results among the different timestep evolutions to train the QGRNN model. To create these time evolution exponentials, we can use the tfq.util.exponential function to exponentiate our target and QGRNN Hamiltonians:³

exp_ising_cost = tfq.util.exponential(
    operators=ising_cost)
exp_ising_mix = tfq.util.exponential(
    operators=ising_mixer)
exp_qgrnn_cost = tfq.util.exponential(
    operators=qgrnn_cost, coefficients=params)
exp_qgrnn_mix = tfq.util.exponential(
    operators=qgrnn_mixer)

Here we randomly pick 15 timesteps and apply the Trotterized time evolution operators using the exponentials constructed above. This gives a quantum dataset {(|ψ_{Tj}⟩, |φ_{Tj}⟩) | j = 1..M}, where M is the number of data points, or batch size (in our case we chose M = 15), |ψ_{Tj}⟩ = Û_target^j |ψ_0⟩, and |φ_{Tj}⟩ = Û_qgrnn^j |ψ_0⟩. The average infidelity of the QGRNN outputs with respect to the target evolution serves as the loss,

L(θ, φ) = 1 − (1/B) Σ_{j=1}^{B} |⟨ψ_{Tj}|φ_{Tj}⟩|² = 1 − (1/B) Σ_{j=1}^{B} ⟨Ẑ_test⟩_j

² Here is a tip for training. By setting the target output value to a theoretical lower bound, we can minimize our expectation value within the Keras model-fit framework. That is, we can use the inequality ⟨Ĥ_target⟩ = Σ_{jk} J_{jk}⟨Ẑ_j Ẑ_k⟩ + Σ_v B_v⟨Ẑ_v⟩ + Σ_v⟨X̂_v⟩ ≥ −Σ_{jk}|J_{jk}| − Σ_v|B_v| − N.

³ Here we use the terminology of cost and mixer Hamiltonians because the Trotterization of an Ising model time evolution is very similar to a QAOA, and thus we borrow nomenclature from this analogous QNN.
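The Hamiltonian() helper assumed earlier in this subsection was left unspecified. The following is a minimal sketch of one possible implementation; the split of terms (ZZ and Z terms in the cost, X terms in the mixer, mirroring the generators Ĥ_2 and Ĥ_1 above) is an assumption rather than the notebook's definitive code.

import cirq

def Hamiltonian(graph, weights, biases, qubits):
    """Builds (cost, mixer) PauliSums from a NetworkX graph:
    ZZ terms from edge weights, Z terms from node biases, X terms as mixer."""
    cost = cirq.PauliSum()
    for w, (i, j) in zip(weights, graph.edges):
        cost += w * cirq.Z(qubits[i]) * cirq.Z(qubits[j])
    for b, v in zip(biases, graph.nodes):
        cost += b * cirq.Z(qubits[v])
    mixer = cirq.PauliSum()
    for q in qubits:
        mixer += cirq.X(q)
    return cost, mixer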
The statistics of a mixed quantum state can exhibit both classical and quantum forms of correlations (e.g., entanglement). As such, if we wish to learn a representation of such a mixed state which can generatively model its statistics, one can expect that a hybrid representation combining classical probabilistic models and quantum neural networks is ideal. Such a decomposition is well suited to near-term noisy devices, as it reduces the representational overhead on the quantum device, leading to lower-depth quantum neural networks. Furthermore, the quantum layers provide a valuable addition in representational power over the classical probabilistic model, as they allow the addition of quantum correlations to the model.

Thus, in this section, we cover some examples where one learns to generatively model mixed states using a hybrid quantum-probabilistic model [48]. Such models use a parameterized ansatz of the form

ρ̂_θφ = Û(φ) ρ̂_θ Û†(φ),   ρ̂_θ = Σ_x p_θ(x) |x⟩⟨x|    (34)

where Û(φ) is a unitary quantum neural network with parameters φ and p_θ(x) is a classical probabilistic model with parameters θ. We call ρ̂_θφ the visible state and ρ̂_θ the latent state. Note that the latent state is effectively a classical distribution over the standard basis states, and its only parameters are those of the classical probabilistic model.

As we shall see below, there are methods to train both networks simultaneously. In terms of software implementation, since we have to combine probabilistic models and quantum neural networks, we will use a combination of TensorFlow Probability [128] and TFQ. The first class of application we consider is the task of generating the thermal state of a quantum system given its Hamiltonian. The second is, given several copies of a mixed state, to learn a generative model which replicates the statistics of the state.

Target problems:

1. Incorporating probabilistic and quantum models
2. Variational quantum simulation of quantum thermal states
3. Learning to generatively model mixed states from data

Required TFQ functionalities:

1. Integration with TF Probability [128]
2. Sample-based simulation of quantum circuits
3. Parameter shift differentiator for gradient computation

2. Variational Quantum Thermalizer

A full notebook of the implementations below is available at:

research/vqt qmhl/vqt qmhl.ipynb

Consider the task of preparing a thermal state: given a Hamiltonian Ĥ and a target inverse temperature β = 1/T, we want to variationally approximate the state

σ̂_β = (1/Z_β) e^{−βĤ},   Z_β = tr(e^{−βĤ}),    (35)

using a state of the form presented in equation (34). That is, we aim to find a value of the hybrid model parameters {θ*, φ*} such that ρ̂_{θ*φ*} ≈ σ̂_β. In order to converge to this approximation via optimization of the parameters, we need a loss function which quantifies the statistical distance between these quantum mixed states. If we aim to minimize the discrepancy between the states in terms of the quantum relative entropy D(ρ̂_θφ ‖ σ̂_β) = −S(ρ̂_θφ) − tr(ρ̂_θφ log σ̂_β) (where S(ρ̂_θφ) = −tr(ρ̂_θφ log ρ̂_θφ) is the entropy), then, as described in the full paper [57], we can equivalently minimize the free energy⁴ and hence use it as our loss function:

L_fe(θ, φ) = β tr(ρ̂_θφ Ĥ) − S(ρ̂_θφ).    (36)

The first term is simply the expectation value of the energy of our model, while the second term is the entropy. Due to the structure of our quantum-probabilistic model, the entropy of the visible state is equal to the entropy of the latent state, which is simply the classical entropy of the distribution, S(ρ̂_θφ) = S(ρ̂_θ) = −Σ_x p_θ(x) log p_θ(x). This comes in quite useful during the optimization of our model.

Let us implement a simple example of the VQT model which minimizes the free energy to approximate the thermal state of a physical system. Consider a two-dimensional Heisenberg spin lattice

Ĥ_heis = J_h Σ_{⟨ij⟩_h} Ŝ_i · Ŝ_j + J_v Σ_{⟨ij⟩_v} Ŝ_i · Ŝ_j    (37)

where h (v) denotes horizontal (vertical) bonds and ⟨·⟩ represents nearest-neighbor pairings. First, we define this Hamiltonian on a grid of qubits:

def get_bond(q0, q1):
    return cirq.PauliSum.from_pauli_strings([
        cirq.PauliString(cirq.X(q0), cirq.X(q1)),
        cirq.PauliString(cirq.Y(q0), cirq.Y(q1)),
        cirq.PauliString(cirq.Z(q0), cirq.Z(q1))])

def get_heisenberg_hamiltonian(qubits, jh, jv):
    heisenberg = cirq.PauliSum()
    # Apply horizontal bonds
    for r in qubits:
        for q0, q1 in zip(r, r[1::]):
            heisenberg += jh * get_bond(q0, q1)
    # Apply vertical bonds
    for r0, r1 in zip(qubits, qubits[1::]):
        for q0, q1 in zip(r0, r1):
            heisenberg += jv * get_bond(q0, q1)
    return heisenberg

⁴ More precisely, the loss function here is in fact the inverse temperature multiplied by the free energy, but this detail is of little import to our optimization.
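Returning to the Hamiltonian construction above, here is a quick usage sketch; the 2×2 grid matches the example later in this section, while the unit couplings are an illustrative assumption:

import cirq

# Arrange qubits as rows of a 2x2 grid.
qubits = [[cirq.GridQubit(r, c) for c in range(2)] for r in range(2)]

# Heisenberg Hamiltonian on that grid with unit horizontal/vertical couplings.
heisenberg_ham = get_heisenberg_hamiltonian(qubits, jh=1.0, jv=1.0)
print(heisenberg_ham)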
For our QNN, we consider a unitary consisting of general single-qubit rotations and powers of controlled-NOT gates. Our code returns the associated symbols so that these can be fed into the Expectation op:

def get_rotation_1q(q, a, b, c):
    return cirq.Circuit(
        cirq.X(q)**a, cirq.Y(q)**b, cirq.Z(q)**c)

def get_rotation_2q(q0, q1, a):
    return cirq.Circuit(
        cirq.CNotPowGate(exponent=a)(q0, q1))

def get_layer_1q(qubits, layer_num, L_name):
    layer_symbols = []
    circuit = cirq.Circuit()
    for n, q in enumerate(qubits):
        a, b, c = sympy.symbols(
            "a{2}_{0}_{1} b{2}_{0}_{1} c{2}_{0}_{1}"
            .format(layer_num, n, L_name))
        layer_symbols += [a, b, c]
        circuit += get_rotation_1q(q, a, b, c)
    return circuit, layer_symbols

def get_layer_2q(qubits, layer_num, L_name):
    layer_symbols = []
    circuit = cirq.Circuit()
    for n, (q0, q1) in enumerate(zip(qubits[::2], qubits[1::2])):
        a = sympy.symbols("a{2}_{0}_{1}".format(
            layer_num, n, L_name))
        layer_symbols += [a]
        circuit += get_rotation_2q(q0, q1, a)
    return circuit, layer_symbols

It will be convenient to consider a particular class of probabilistic models for which estimating the gradients of the model parameters is straightforward. This class of models is called exponential families or energy-based models (EBMs). If our parameterized probabilistic model is an EBM, then it is of the form

p_θ(x) = (1/Z_θ) e^{−E_θ(x)},   Z_θ ≡ Σ_{x∈Ω} e^{−E_θ(x)}.    (38)

For gradients of the VQT free energy loss function with respect to the QNN parameters, ∂_φ L_fe(θ, φ) = β ∂_φ tr(ρ̂_θφ Ĥ), this is simply the gradient of an expectation value, hence we can use TFQ parameter shift gradients or any other method for estimating gradients of QNNs outlined in previous sections.

As for the gradients of the classical probabilistic model, one can readily derive that they are given by the following covariance:

∂_θ L_fe = E_{x∼p_θ(x)}[(E_θ(x) − βH_φ(x)) ∇_θ E_θ(x)]
           − (E_{x∼p_θ(x)}[E_θ(x) − βH_φ(x)]) (E_{y∼p_θ(y)}[∇_θ E_θ(y)]),    (39)

where H_φ(x) ≡ ⟨x| Û†(φ) Ĥ Û(φ) |x⟩ is the expectation value of the Hamiltonian at the output of the QNN with the standard basis element |x⟩ as input. Since the energy function and its gradients can easily be evaluated, as it is a neural network, the above gradient is straightforward to estimate via sampling of the classical probabilistic model and the output of the QPU.

For our classical latent probability distribution p_θ(x), as a first simple case, we can use a product of independent Bernoulli distributions, p_θ(x) = Π_j p_{θ_j}(x_j) = Π_j θ_j^{x_j}(1 − θ_j)^{1−x_j}, where x_j ∈ {0, 1} are binary values. We can re-phrase this distribution as an energy-based model to take advantage of the EBM form in equation (38). We move the parameters into an exponential, so that the probability of a bitstring becomes p_θ(x) = Π_j e^{θ_j x_j}/(e^{θ_j} + e^{−θ_j}). Since this distribution is a product of independent variables, it is easy to sample from. We can use the TensorFlow Probability library [128] to produce samples from this distribution, using the tfp.distributions.Bernoulli object:

def bernoulli_bit_probability(b):
    return np.exp(b)/(np.exp(b) + np.exp(-b))

def sample_bernoulli(num_samples, biases):
    prob_list = []
    for bias in biases.numpy():
        prob_list.append(
            bernoulli_bit_probability(bias))
    latent_dist = tfp.distributions.Bernoulli(
        probs=prob_list, dtype=tf.float32)
    return latent_dist.sample(num_samples)

After getting samples from our classical probabilistic model, we take gradients of our QNN parameters. Because TFQ implements gradients for its expectation ops, we can use tf.GradientTape to obtain these derivatives. Note that below we use tf.tile to give our Hamiltonian operator and visible-state circuit the correct dimensions:

bitstring_tensor = sample_bernoulli(
    num_samples, vqt_biases)
with tf.GradientTape() as tape:
    tiled_vqt_model_params = tf.tile(
        [vqt_model_params], [num_samples, 1])
    sampled_expectations = expectation(
        tiled_visible_state,
        vqt_symbol_names,
        tf.concat([bitstring_tensor,
                   tiled_vqt_model_params], 1),
        tiled_H)
    energy_losses = beta * sampled_expectations
    energy_losses_avg = tf.reduce_mean(energy_losses)
vqt_model_gradients = tape.gradient(
    energy_losses_avg, [vqt_model_params])

Putting these pieces together, we train our model to output thermal states of the 2D Heisenberg model on a 2x2 grid. The result after 100 epochs is shown in Fig. 24.

A great advantage of this approach to optimizing the probabilistic model is that the partition function Z_θ does not need to be estimated. As such, more general and more expressive models beyond factorized distributions can be used for the probabilistic modelling of the latent classical distribution. In the advanced section of the notebook, we show how to use a Boltzmann machine as our energy-based model. Boltzmann machines are EBMs where, for a bitstring x ∈ {0, 1}^n, the energy is defined as E(x) = −Σ_{i,j} w_{ij} x_i x_j − Σ_i b_i x_i.
It is worth noting that our factorized Bernoulli distribution is in fact a special case of the Boltzmann machine, one where only the so-called bias terms in the energy function are present: E(x) = −Σ_i b_i x_i. In the notebook, we start with this simpler Bernoulli example of the VQT; the resulting density matrix converges to the known exact result for this system, as shown in Fig. 24. We also provide a more advanced example with a general Boltzmann machine. In the latter example, we picked a fully visible, fully-connected classical Ising model energy function, and used MCMC with Metropolis-Hastings [129] to sample from the energy function.

where σ_φ(x) ≡ ⟨x| Û†(φ) σ̂_D Û(φ) |x⟩ is the distribution obtained by feeding the data state σ̂_D through the inverse QNN circuit Û†(φ) and measuring in the standard basis. As this is simply an expectation value of a state propagated through a QNN, for gradients of the loss with respect to the QNN parameters we can use standard TFQ differentiators, such as the parameter shift rule presented in section III. As for the gradient with respect to the EBM parameters, it is given by

∂_θ L_xe(θ, φ) = E_{x∼σ_φ(x)}[∇_θ E_θ(x)] − E_{y∼p_θ(y)}[∇_θ E_θ(y)].
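This difference of expectations can be estimated directly from samples. The following is a minimal sketch (an illustration, not the notebook's code) for the factorized Bernoulli EBM with energy E_θ(x) = −Σ_j θ_j x_j, for which ∇_θ E_θ(x) = −x; the sample sources are placeholders: in this setting `data_samples` would come from measuring Û†(φ) applied to copies of the data state, and `model_samples` from the EBM itself.

import tensorflow as tf

def ebm_gradient_estimate(data_samples, model_samples):
    """Monte Carlo estimate of the EBM gradient above.

    data_samples:  float tensor [num_data_samples, n_bits], samples ~ sigma_phi(x)
    model_samples: float tensor [num_model_samples, n_bits], samples ~ p_theta(y)
    """
    # grad_theta E_theta(x) = -x for the Bernoulli EBM, so each expectation
    # reduces to a (negated) bit-frequency vector.
    positive_phase = tf.reduce_mean(-data_samples, axis=0)
    negative_phase = tf.reduce_mean(-model_samples, axis=0)
    return positive_phase - negative_phase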
machine learning algorithms for a wide array of applications. Quantum machine learning is a very new and exciting field, so we expect the framework to change with the needs of the research community and the availability of new quantum hardware. We have open-sourced the framework under the commercially friendly Apache 2.0 license, allowing future commercial products to embed TFQ royalty-free. If you would like to participate in our community, visit us at:

https://github.com/tensorflow/quantum/

VII. ACKNOWLEDGEMENTS

The authors would like to thank Google Research for supporting this project. M.B., G.V., T.M., and A.J.M. would like to thank the Google Quantum AI Lab for their hospitality and support during their respective internships, as well as fellow team members for several useful discussions, in particular Matt Harrigan, John Platt, and Nicholas Rubin. The authors would also like to thank Achim Kempf from the University of Waterloo for sponsoring this project. M.B. and J.Y. would like to thank the Google Brain team for supporting this project, in particular Francois Chollet, Yifei Feng, David Rim, Justin Hong, and Megan Kacholia. G.V. would like to thank Stefan Leichenauer, Jack Hidary, and the rest of the Quantum@X team for support during his Quantum Residency. G.V. acknowledges support from NSERC. D.B. is an Associate Fellow in the CIFAR program on Quantum Information Science. A.S. and M.S. were supported by the USRA Feynman Quantum Academy funded by the NAMS R&D Student Program at NASA Ames Research Center and by the Air Force Research Laboratory (AFRL), NYSTEC-USRA Contract (FA8750-19-3-6101). X, formerly known as Google[x], is part of the Alphabet family of companies, which includes Google, Verily, Waymo, and others (www.x.company).
[1] K. P. Murphy, Machine learning: a probabilistic perspective (MIT Press, 2012).
[2] J. A. Suykens and J. Vandewalle, Neural Processing Letters 9, 293 (1999).
[3] S. Wold, K. Esbensen, and P. Geladi, Chemometrics and Intelligent Laboratory Systems 2, 37 (1987).
[4] A. K. Jain, Pattern Recognition Letters 31, 651 (2010).
[5] Y. LeCun, Y. Bengio, and G. Hinton, Nature 521, 436 (2015).
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, in Advances in Neural Information Processing Systems 25, edited by F. Pereira, C. J. C. Burges, L. Bottou, and K. Q. Weinberger (Curran Associates, Inc., 2012) pp. 1097–1105.
[7] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio, Deep learning, Vol. 1 (MIT Press, Cambridge, 2016).
[8] J. Preskill, arXiv preprint arXiv:1801.00862 (2018).
[9] R. P. Feynman, International Journal of Theoretical Physics 21, 467 (1982).
[10] Y. Cao, J. Romero, J. P. Olson, M. Degroote, P. D. Johnson, M. Kieferová, I. D. Kivlichan, T. Menke, B. Peropadre, N. P. Sawaya, et al., Chemical Reviews 119, 10856 (2019).
[11] P. W. Shor, in Proceedings of the 35th Annual Symposium on Foundations of Computer Science (IEEE, 1994) pp. 124–134.
[12] E. Farhi, J. Goldstone, and S. Gutmann, "A quantum approximate optimization algorithm," (2014), arXiv:1411.4028 [quant-ph].
[13] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., Nature 574, 505 (2019).
[14] S. Lloyd, M. Mohseni, and P. Rebentrost, Nature Physics 10, 631–633 (2014).
[15] P. Rebentrost, M. Mohseni, and S. Lloyd, Phys. Rev. Lett. 113, 130503 (2014).
[16] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum algorithms for supervised and unsupervised machine learning," (2013), arXiv:1307.0411 [quant-ph].
[17] I. Kerenidis and A. Prakash, "Quantum recommendation systems," (2016), arXiv:1603.08675.
[18] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, Nature 549, 195 (2017).
[19] V. Giovannetti, S. Lloyd, and L. Maccone, Physical Review Letters 100, 160501 (2008).
[20] S. Arunachalam, V. Gheorghiu, T. Jochym-O'Connor, M. Mosca, and P. V. Srinivasan, New Journal of Physics 17, 123010 (2015).
[21] E. Tang, (2018), 10.1145/3313276.3316310, arXiv:1807.04271.
[22] J. Preskill, "Quantum computing in the NISQ era and beyond," (2018), arXiv:1801.00862.
[23] E. Farhi and H. Neven, arXiv preprint arXiv:1802.06002 (2018).
[24] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, Nature Communications 5, 4213 (2014).
[25] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd, arXiv preprint arXiv:1806.06871 (2018).
[26] D. Wecker, M. B. Hastings, and M. Troyer, Phys. Rev. A 92, 042303 (2015).
[27] L. Zhou, S.-T. Wang, S. Choi, H. Pichler, and M. D. Lukin, arXiv preprint arXiv:1812.01041 (2018).
[28] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics 18, 023023 (2016).
[29] S. Hadfield, Z. Wang, B. O'Gorman, E. G. Rieffel, D. Venturelli, and R. Biswas, arXiv preprint arXiv:1709.03489 (2017).
[30] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, npj Quantum Information 4, 1 (2018).
[31] S. Khatri, R. LaRose, A. Poremba, L. Cincio, A. T. Sornborger, and P. J. Coles, Quantum 3, 140 (2019).
[32] M. Schuld and N. Killoran, Physical Review Letters 122, 040504 (2019).
[33] S. McArdle, T. Jones, S. Endo, Y. Li, S. Benjamin, and X. Yuan, arXiv preprint arXiv:1804.03023 (2018).
[34] M. Benedetti, E. Grant, L. Wossnig, and S. Severini, New Journal of Physics 21, 043023 (2019).
[35] B. Nash, V. Gheorghiu, and M. Mosca, arXiv preprint arXiv:1904.01972 (2019).
[36] Z. Jiang, J. McClean, R. Babbush, and H. Neven, arXiv preprint arXiv:1812.08190 (2018).
[37] G. R. Steinbrecher, J. P. Olson, D. Englund, and J. Carolan, arXiv preprint arXiv:1808.10047 (2018).
[38] M. Fingerhuth, T. Babej, et al., arXiv preprint arXiv:1810.13411 (2018).
[39] R. LaRose, A. Tikku, É. O'Neel-Judy, L. Cincio, and P. J. Coles, arXiv preprint arXiv:1810.10506 (2018).
[40] L. Cincio, Y. Subaşı, A. T. Sornborger, and P. J. Coles, New Journal of Physics 20, 113022 (2018).
[41] H. Situ, Z. Huang, X. Zou, and S. Zheng, Quantum Information Processing 18, 230 (2019).
[42] H. Chen, L. Wossnig, S. Severini, H. Neven, and M. Mohseni, arXiv preprint arXiv:1805.08654 (2018).
[43] G. Verdon, M. Broughton, and J. Biamonte, arXiv preprint arXiv:1712.05304 (2017).
[44] M. Mohseni et al., Nature 543, 171 (2017).
[45] G. Verdon, M. Broughton, J. R. McClean, K. J. Sung, R. Babbush, Z. Jiang, H. Neven, and M. Mohseni, "Learning to learn with quantum neural networks via classical neural networks," (2019), arXiv:1907.05415.
[46] R. Sweke, F. Wilde, J. Meyer, M. Schuld, P. K. Fährmann, B. Meynard-Piganeau, and J. Eisert, arXiv preprint arXiv:1910.01155 (2019).
[47] J. R. McClean, J. Romero, R. Babbush, and A. Aspuru-Guzik, New Journal of Physics 18, 023023 (2016).
[48] G. Verdon, J. M. Arrazola, K. Brádler, and N. Killoran, arXiv preprint arXiv:1902.00409 (2019).
[49] Z. Wang, N. C. Rubin, J. M. Dominy, and E. G. Rieffel, arXiv preprint arXiv:1904.09314 (2019).
[50] E. Farhi and H. Neven, "Classification with quantum neural networks on near term processors," (2018), arXiv:1802.06002 [quant-ph].
[51] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, Nature Communications 9, 1 (2018).
[52] I. Cong, S. Choi, and M. D. Lukin, Nature Physics 15, 1273–1278 (2019).
[53] S. Lloyd and C. Weedbrook, Physical Review Letters 121, 040502 (2018).
[54] G. Verdon, J. Pye, and M. Broughton, arXiv preprint arXiv:1806.09729 (2018).
[55] J. Romero and A. Aspuru-Guzik, arXiv preprint arXiv:1901.00848 (2019).
[56] V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, C. Blank, K. McKiernan, and N. Killoran, arXiv preprint arXiv:1811.04968 (2018).
[57] G. Verdon, J. Marks, S. Nanda, S. Leichenauer, and J. Hidary, arXiv preprint arXiv:1910.02071 (2019).
[58] M. Mohseni, A. M. Steinberg, and J. A. Bergou, Phys. Rev. Lett. 93, 200403 (2004).
[59] M. Y. Niu, S. Boixo, V. N. Smelyanskiy, and H. Neven, npj Quantum Information 5, 1 (2019).
[60] J. Carolan, M. Mohseni, J. P. Olson, M. Prabhu, C. Chen, D. Bunandar, M. Y. Niu, N. C. Harris, F. N. C. Wong, M. Hochberg, S. Lloyd, and D. Englund, Nature Physics (2020).
[61] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mane, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viegas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng, "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," (2016), arXiv:1603.04467 [cs.DC].
[62] Google, "Cirq: A Python framework for creating, editing, and invoking Noisy Intermediate Scale Quantum circuits," (2018).
[63] A. Meurer, C. P. Smith, M. Paprocki, O. Čertík, S. B. Kirpichev, M. Rocklin, A. Kumar, S. Ivanov, J. K. Moore, S. Singh, T. Rathnayake, S. Vig, B. E. Granger, R. P. Muller, F. Bonazzi, H. Gupta, S. Vats, F. Johansson, F. Pedregosa, M. J. Curry, A. R. Terrel, Š. Roučka, A. Saboo, I. Fernando, S. Kulal, R. Cimrman, and A. Scopatz, PeerJ Computer Science 3, e103 (2017).
[64] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Nature 323, 533 (1986).
[65] J. Biamonte and V. Bergholm, "Tensor networks in a nutshell," (2017), arXiv:1708.00006 [quant-ph].
[66] C. Roberts, A. Milsted, M. Ganahl, A. Zalcman, B. Fontaine, Y. Zou, J. Hidary, G. Vidal, and S. Leichenauer, "TensorNetwork: A Library for Physics and Machine Learning," (2019), arXiv:1905.01330 [physics.comp-ph].
[67] C. Roberts and S. Leichenauer, "Introducing TensorNetwork, an Open Source Library for Efficient Tensor Calculations," (2019).
[68] The TensorFlow Authors, "Effective TensorFlow 2," (2019).
[69] F. Chollet et al., "Keras," https://keras.io (2015).
[70] D. P. Kingma and J. Ba, arXiv preprint arXiv:1412.6980 (2014).
[71] G. E. Crooks, arXiv preprint arXiv:1905.13311 (2019).
[72] M. Smelyanskiy, N. P. D. Sawaya, and A. Aspuru-Guzik, arXiv e-prints (2016), arXiv:1601.07195 [quant-ph].
[73] T. Häner and D. S. Steiger, arXiv e-prints (2017), arXiv:1704.01127 [quant-ph].
[74] Intel, "Intel Instruction Set Extensions Technology," (2020).
[75] Mark Buxton, "Haswell New Instruction Descriptions Now Available!" (2011).
[76] D. Gottesman, Stabilizer codes and quantum error correction, Ph.D. thesis, California Institute of Technology (1997).
[77] M. Suzuki, Physics Letters A 146, 319 (1990).
[78] E. Campbell, Physical Review Letters 123 (2019), 10.1103/physrevlett.123.070503.
[79] N. C. Rubin, R. Babbush, and J. McClean, New Journal of Physics 20, 053020 (2018).
[80] A. Harrow and J. Napp, arXiv preprint arXiv:1901.05374 (2019).
[81] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Proceedings of the IEEE 86, 2278 (1998).