arXiv:1204.5242v2 [quant-ph] 3 Jul 2012
Quantum Data-Fitting

Nathan Wiebe,¹ Daniel Braun,²,³ and Seth Lloyd⁴

¹Institute for Quantum Computing, Waterloo, ON, Canada
²Université de Toulouse, UPS, Laboratoire de Physique Théorique (IRSAMC), F-31062 Toulouse, France
³CNRS, LPT (IRSAMC), F-31062 Toulouse, France
⁴MIT - Research Laboratory of Electronics and Department of Mechanical Engineering, Cambridge, MA 02139, USA
We provide a new quantum algorithm that efficiently determines the quality of a least-squares fit over an exponentially large data set by building upon an algorithm for solving systems of linear equations efficiently (Harrow et al., Phys. Rev. Lett. 103, 150502 (2009)). In many cases, our algorithm can also efficiently find a concise function that approximates the data to be fitted and bound the approximation error. In cases where the input data is a pure quantum state, the algorithm can be used to provide an efficient parametric estimation of the quantum state and therefore can be applied as an alternative to full quantum state tomography, given a fault-tolerant quantum computer.
PACS numbers: 03.67.-a, 03.67.Ac, 42.50.Dv
Invented as early as 1794 by Carl Friedrich Gauss, fitting data to theoretical models has become over the centuries one of the most important tools in all of quantitative science [1]. Typically, a theoretical model depends on a number of parameters and leads to functional relations between data that depend on those parameters. Fitting a large amount of experimental data to the functional relations allows one to obtain reliable estimates of the parameters. If the amount of data becomes very large, fitting can become very costly. Examples include inversion problems of X-ray or neutron scattering data for structure analysis, or high-energy physics with gigabytes of data produced per second at the LHC. Typically, structure analysis starts from a first guess of the structure and then iteratively tries to improve the fit to the experimental data by testing variations of the structure. It is therefore often desirable to test many different models and compare the best possible fits they provide before committing to one, from which one then extracts the parameters. Obtaining a good fit with a relatively small number of parameters compared to the amount of data can be considered a form of data compression. The same holds for numerically calculated data, such as many-body wave-functions in molecular engineering, where efficient fitting of the wave-functions to simpler models would be highly desirable.
With the rise of quantum information theory, one might wonder if a quantum algorithm can be found that solves these problems efficiently. The discovery that exploiting quantum mechanical effects might lead to enhanced computational power compared to classical information processing has triggered large-scale research aimed at finding quantum algorithms that are more efficient than the best classical counterparts [2-7]. Although fault-tolerant quantum computation remains out of reach at present, quantum simulation is already now on the verge of providing answers to questions concerning the states of complex systems that are beyond classical computability [8, 9]. Recently, a quantum algorithm (called HHL in the following) was introduced that efficiently solves a linear equation, $Fx = b$, with given vector $b$ of dimension $N$ and sparse Hermitian matrix $F$ [10]. Efficient solution means that the expectation value $\langle x|M|x\rangle$ of an arbitrary poly-size Hermitian operator $M$ can be found in roughly $O(s^4\kappa^2\log(N)/\epsilon)$ steps [11], where $\kappa$ is the condition number of $F$, i.e. the ratio between the largest and smallest eigenvalue of $F$, $s$ denotes the sparseness (i.e. the maximum number of non-zero matrix elements of $F$ in any given row or column), and $\epsilon$ is the maximum allowed distance between the $|x\rangle$ found by the computer and the exact solution. In contrast, it is unlikely that classical computers can efficiently solve similar problems, because an efficient classical solution would imply that quantum computers are no more powerful than classical computers.
While it has so far remained unclear whether expectation values of the form $\langle x|M|x\rangle$ provide answers to computationally important questions, we provide here an adaptation of the algorithm to the problem of data fitting that allows one to efficiently obtain the quality of a fit without having to learn the fit parameters. Our algorithm is particularly useful for fitting data efficiently computed by a quantum computer or quantum simulator, especially if an evolution can be efficiently simulated but no known method exists to efficiently learn the resultant state. For example, our algorithm could be used to efficiently find a concise matrix-product state approximation to a ground state yielded by a quantum many-body simulator and assess the approximation error. More complicated states can be used in the fit if the quantum computer can efficiently prepare them. Fitting quantum states to a set of known functions is an interesting alternative to performing full quantum-state tomography [12].
Least-squares fitting. The goal in least-squares fitting is to find a simple continuous function that well approximates a discrete set of $N$ points $(x_i, y_i)$. The function is constrained to be linear in the fit parameters $\lambda \in \mathbb{C}^M$, but it can be non-linear in $x$. For simplicity we consider $x \in \mathbb{C}$, but the generalization to higher-dimensional $x$ is straightforward.
Our fit function is then of the form
$$f(x, \lambda) := \sum_{j=1}^{M} f_j(x)\,\lambda_j,$$
where $\lambda_j$ is a component of $\lambda$ and $f(x, \lambda) : \mathbb{C}^{M+1} \to \mathbb{C}$. The optimal fit parameters can be found by minimizing
$$E = \sum_{i=1}^{N} |f(x_i, \lambda) - y_i|^2 = \|F\lambda - y\|^2 \qquad (1)$$
over all $\lambda$, where we have defined the $N \times M$ matrix $F$ through $F_{ij} = f_j(x_i)$, $F^t$ is its transpose, and $y$ denotes the column vector $(y_1, \ldots, y_N)^t$. Also, following HHL, we assume without loss of generality that $\frac{1}{\kappa^2} \le \|F^\dagger F\| \le 1$ and $\frac{1}{\kappa^2} \le \|FF^\dagger\| \le 1$. If $F^\dagger F$ is invertible, the fit parameters that give the least-squares error are found by applying the Moore-Penrose pseudoinverse [13] of $F$, $F^+$, to $y$:
$$\lambda = F^+ y = (F^\dagger F)^{-1} F^\dagger y. \qquad (2)$$
A proof that (2) gives an optimal $\lambda$ for a least-squares fit is given in the appendix.
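For reference, Eqs. (1) and (2) can be checked classically. The following NumPy sketch (the monomial fit functions and the synthetic noisy data are illustrative assumptions, not taken from the text) computes $\lambda$ via the Moore-Penrose pseudoinverse and evaluates the fit quality $E$:

import numpy as np

# Illustrative fit functions f_j(x): 1, x, x^2 (an arbitrary choice for this sketch).
fit_funcs = [lambda x: np.ones_like(x), lambda x: x, lambda x: x**2]

rng = np.random.default_rng(1)
x = np.linspace(-1.0, 1.0, 50)                       # N = 50 sample points
y = 0.5 - 2.0 * x + 3.0 * x**2 + 0.05 * rng.standard_normal(x.size)

# F_ij = f_j(x_i): the N x M design matrix appearing in Eq. (1).
F = np.column_stack([f(x) for f in fit_funcs])

# Eq. (2): lambda = F^+ y = (F^dag F)^{-1} F^dag y via the Moore-Penrose pseudoinverse.
lam = np.linalg.pinv(F) @ y

# Eq. (1): fit quality E = ||F lambda - y||^2.
E = np.linalg.norm(F @ lam - y)**2
print("fit parameters:", lam, " E =", E)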
The algorithm consists of three subroutines: a quantum algorithm for performing the pseudoinverse, an algorithm for estimating the fit quality, and an algorithm for learning the fit parameters $\lambda$.
1. Fitting Algorithm. Our algorithm uses a quantum computer and oracles that output quantum states that encode the matrix elements of $F$ to approximately prepare $F^+ y$. The matrix multiplications, and inversions, are implemented using an improved version of the HHL algorithm [10] that utilizes recent developments in quantum simulation algorithms.
Input: A quantum state $|y\rangle = \sum_{p=M+1}^{M+N} y_p |p\rangle / \|y\|$ that stores the data $y$, an upper bound (denoted $\kappa$) for the square roots of the condition numbers of $FF^\dagger$ and $F^\dagger F$, and an error tolerance $\epsilon$.
Output: A quantum state $|\lambda\rangle$ proportional to the least-squares fit parameters $\lambda$.
Query Complexity:
$$\tilde{O}\left(\log(N)(s^3\kappa^6)/\epsilon\right), \qquad (3)$$
where the $\tilde{O}$ notation implies an upper bound on the scaling of a function, suppressing all sub-polynomial functions. Alternatively, the simulation method of [15, 16] can be used to achieve a query complexity of
$$\tilde{O}\left(\log(N)(s\kappa^6)/\epsilon^2\right). \qquad (4)$$
Analysis of Algorithm. The operators $F$ and $F^\dagger$ are not Hermitian, so we embed them in the Hermitian operator
$$I(X) := \begin{pmatrix} 0 & X \\ X^\dagger & 0 \end{pmatrix}. \qquad (5)$$
These choices are convenient because $I(F^\dagger)|y\rangle$ contains the vector $F^\dagger y$, divided by $\|F^\dagger y\|$. Preparing $I(F^\dagger)|y\rangle$ is non-trivial because $I(F^\dagger)$ is a Hermitian, rather than unitary, operator. We implement the Hermitian operator using the same phase-estimation trick that HHL use to enact the inverse of a Hermitian operator, but instead of dividing by the eigenvalues of each eigenstate we multiply each eigenstate by its eigenvalue. We describe the relevant steps below. For more details, see [10].
The algorithm first prepares an ancilla state, for a large integer $T$ that is of order $N$,
$$|\Psi_0\rangle = \sqrt{\frac{2}{T}} \sum_{\tau=0}^{T-1} \sin\!\left(\frac{\pi(\tau + 1/2)}{T}\right) |\tau\rangle \otimes |y\rangle. \qquad (6)$$
It then maps $|\Psi_0\rangle$ to
$$\sqrt{\frac{2}{T}} \sum_{\tau=0}^{T-1} \sin\!\left(\frac{\pi(\tau + 1/2)}{T}\right) |\tau\rangle\, e^{i I(F^\dagger)\tau t_0/T} |y\rangle, \qquad (7)$$
for $t_0 \in O(\kappa/\epsilon)$. We know from work on quantum simulation that $\exp(i I(F^\dagger) t_0/T)$ can be implemented within error $O(\epsilon)$ in the 2-norm using $\tilde{O}(\log(N) s^3 t_0/T)$ quantum operations, if $F$ has sparseness $s$ [17]. Alternatively, the method of [15, 16] gives query complexity $\tilde{O}(\log(N) s\, t_0/(T\epsilon))$. If we write $|y\rangle = \sum_{j=1}^{N} \beta_j |\nu_j\rangle$, where the $|\nu_j\rangle$ are the eigenvectors of $I(F^\dagger)$ with eigenvalues $E_j$, we obtain
$$\sqrt{\frac{2}{T}} \sum_{j=1}^{N} \sum_{\tau=0}^{T-1} \sin\!\left(\frac{\pi(\tau + 1/2)}{T}\right) e^{i E_j \tau t_0/T}\, \beta_j\, |\tau\rangle |\nu_j\rangle. \qquad (8)$$
The quantum Fourier transform is then applied to the first register and, after labeling the Fourier coefficients $\alpha_{k|j}$, the state becomes
$$\sum_{j=1}^{N} \sum_{k=0}^{T-1} \alpha_{k|j}\, \beta_j\, |k\rangle |\nu_j\rangle. \qquad (9)$$
HHL show that the Fourier coefficients are small unless the eigenvalue $E_j \approx \tilde{E}_k := 2\pi k/t_0$, and $t_0 \in O(\kappa/\epsilon)$ is needed to ensure that the error from approximating the eigenvalue is at most $\epsilon$. It can be seen using the analysis in [10] that, after re-labeling $|k\rangle$ as $|\tilde{E}_k\rangle$ and taking $T \in O(N)$, (9) is exponentially close to $\sum_{j=1}^{N} \beta_j |\tilde{E}_j\rangle |\nu_j\rangle$.
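The net effect of steps (6)-(9) is to multiply each eigencomponent $\beta_j$ of $|y\rangle$ by its eigenvalue $E_j$, which is exactly how the Hermitian operator $I(F^\dagger)$ acts. The sketch below emulates this classically by explicit diagonalization (the matrix dimensions and entries are illustrative assumptions); the quantum algorithm achieves the same map via phase estimation, without diagonalizing anything.

import numpy as np

rng = np.random.default_rng(2)
N, M = 4, 2
F = rng.standard_normal((N, M))          # illustrative matrix standing in for F_ij = f_j(x_i)

def embed(X):
    # I(X) = [[0, X], [X^dag, 0]], the Hermitian embedding of Eq. (5).
    a, b = X.shape
    return np.block([[np.zeros((a, a)), X],
                     [X.conj().T, np.zeros((b, b))]])

IFd = embed(F.conj().T)                  # I(F^dag), an (M+N) x (M+N) Hermitian matrix
y = rng.standard_normal(N)
y /= np.linalg.norm(y)
state = np.concatenate([np.zeros(M), y]) # |y> stored in the last N components, as in algorithm 1

# Decompose over eigenvectors |nu_j> of I(F^dag) and multiply each component beta_j
# by its eigenvalue E_j, the net effect of the phase-estimation steps (6)-(9).
E, V = np.linalg.eigh(IFd)
beta = V.conj().T @ state
result = V @ (E * beta)

# The first M components now hold F^dag y (unnormalized), as claimed below Eq. (5).
assert np.allclose(result[:M], F.conj().T @ y)
assert np.allclose(result, IFd @ state)  # identical to applying I(F^dag) directly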
The final step is to introduce an ancilla system and perform a controlled unitary on it that rotates the ancilla state from $|0\rangle$ to $\sqrt{1 - C^2 E_j^2}\,|0\rangle + C E_j |1\rangle$, where $C \in O(\max_j |E_j|)^{-1}$ because the state would not be properly normalized if $C$ were larger. The probability of measuring the ancilla to be 1 is $O(1/\kappa^2)$, since $C E_j$ is at least $O(1/\kappa)$. $O(\kappa^2)$ repetitions are therefore needed to guarantee success with high probability, and amplitude amplification can be used to reduce the number of repetitions to $O(\kappa)$ [10]. HHL show that either $O(\kappa^2)$ or $O(\kappa)$ attempts are likewise needed to successfully perform $I(F)^{-1}$, depending on whether amplitude amplification is used.
The cost of implementing $I(F^\dagger)|y\rangle$ is therefore $\tilde{O}(\log(N) s^3 \kappa^2/\epsilon)$ oracle calls. The cost of performing the inversion using the simulation method of [15, 16] is found by substituting $s \to s^{1/3}/\epsilon^{1/3}$ into this or any of our subsequent results.
Inverting $F^\dagger F$. We implement $(F^\dagger F)^{-1}$ using the method of HHL [10]. Note that the existence of $(F^\dagger F)^{-1}$ is implied by a well-defined fitting problem, in the sense that a zero eigenvalue of $F^\dagger F$ would result in a degenerate direction of the quadratic form (1). The operator $F^\dagger F \in \mathbb{C}^{M \times M}$ is Hermitian and hence amenable to the linear-systems algorithm. We do, however, need to extend the domain of the operator to make it compatible with $|y\rangle$, which is in a Hilbert space of dimension $N + M$. We introduce $A$ to denote the corresponding operator,
$$A := \begin{pmatrix} F^\dagger F & 0 \\ 0 & F F^\dagger \end{pmatrix} = I(F)^2. \qquad (10)$$
If we define $|\lambda\rangle \in \mathbb{C}^{N+M}$ to be a state of the form $|\lambda\rangle = \sum_{j=1}^{M} \lambda_j |j\rangle$ up to a normalizing constant, then $F^\dagger F \lambda$ is proportional to $A|\lambda\rangle$ up to a normalizing constant. This means that we can find a vector that is proportional to the least-squares fit parameters by inversion via
$$|\lambda\rangle = A^{-1} I(F^\dagger) |y\rangle. \qquad (11)$$
This can be further simplified by noting that
$$A^{-1} = I(F)^{-2}. \qquad (12)$$
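Equations (10)-(12) can likewise be verified numerically. The following sketch (again with illustrative dimensions and a random $F$) applies $A^{-1} I(F^\dagger)$ to the embedded data vector and checks that the first $M$ components coincide with the classical solution $(F^\dagger F)^{-1} F^\dagger y$ of Eq. (2), here computed for the normalized data vector:

import numpy as np

rng = np.random.default_rng(3)
N, M = 5, 3
F = rng.standard_normal((N, M))          # illustrative design matrix
y = rng.standard_normal(N)
y /= np.linalg.norm(y)

# In the M (+) N ordering used here, the single Hermitian matrix [[0, F^dag], [F, 0]]
# maps (lambda, y) -> (F^dag y, F lambda), so it serves as both I(F) and I(F^dag).
IOp = np.block([[np.zeros((M, M)), F.conj().T],
                [F, np.zeros((N, N))]])
A = IOp @ IOp                            # A = diag(F^dag F, F F^dag) = I(F)^2, Eq. (10)

state = np.concatenate([np.zeros(M), y])
# |lambda> = A^{-1} I(F^dag)|y>, Eqs. (11)-(12). When N > M the block F F^dag is
# rank-deficient, so classically we apply the pseudoinverse of A instead.
lam_state = np.linalg.pinv(A) @ (IOp @ state)

# The first M components reproduce the least-squares parameters of Eq. (2).
assert np.allclose(lam_state[:M], np.linalg.solve(F.conj().T @ F, F.conj().T @ y))
assert np.allclose(lam_state[M:], 0.0)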
Amplitude amplification does not decrease the number of attempts needed to implement $A^{-1}$ in (11) because the algorithm requires reflections about $I(F^\dagger)|y\rangle$. This is a consequence of the fact that the probability of successfully performing each $I(F)^{-1}$ is $O(1/\kappa^2)$, and the probability of performing $I(F^\dagger)$ is also $O(1/\kappa^2)$. The query complexity of preparing $|\lambda\rangle$ is therefore
$$\tilde{O}\left(\log(N)(s^3\kappa^6/\epsilon)\right). \qquad (13)$$
Although the algorithm yields $|\lambda\rangle$ efficiently, it may be exponentially expensive to learn $\lambda$ via tomography; however, we show below that a quantum computer can assess the quality of the fit efficiently.

2. Estimating Fit Quality. We will now show that we can efficiently estimate the fit quality $E$ even if $M$ is exponentially large, and without having to determine the fit parameters. For this problem, note that due to the isometry (5), $E = \| |y\rangle - I(F)|\lambda\rangle \|^2$. We assume the prior computational model. We are also provided a desired error tolerance, $\epsilon$, and wish to determine the quality of the fit within error $\epsilon$.
Input: A constant $\epsilon > 0$ and all inputs required by algorithm 1.
Output: An estimate of $|\langle y| I(F) |\lambda\rangle|^2$ accurate within error $\epsilon$.
Query Complexity:
$$\tilde{O}\left(\log(N) s^3 \kappa^6/\epsilon^3\right). \qquad (14)$$
Algorithm. We begin by preparing the state $|y\rangle \otimes |y\rangle$ using the provided state-preparation black box. We then use the prior algorithm to construct the state
$$I(F) A^{-1} I(F^\dagger) |y\rangle \otimes |y\rangle = I(F)^{-1} I(F^\dagger) |y\rangle \otimes |y\rangle. \qquad (15)$$
The query complexity of preparing this state is
$$\tilde{O}\left(\log(N) s^3 \kappa^6/\epsilon\right). \qquad (16)$$
The swap test [18] is then used to determine the accuracy of the fit. The swap test is a method that can be used to distinguish $|y\rangle$ and $I(F)|\lambda\rangle$ by performing a swap operation on the two quantum states, controlled by a qubit in the state $(|0\rangle + |1\rangle)/\sqrt{2}$. The Hadamard operation is then applied to the control qubit, and the control qubit is then measured in the computational basis. The test concludes that the states are different if the outcome is 1. The probability of observing an outcome of 1 is $(1 - |\langle y| I(F) |\lambda\rangle|^2)/2$ for our problem.
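A direct simulation makes the swap test concrete. The sketch below (an illustrative construction on two random single-qubit states, not part of the algorithm itself) builds the controlled-SWAP circuit explicitly and verifies that the probability of outcome 1 is $(1 - |\langle\psi|\phi\rangle|^2)/2$:

import numpy as np

def swap_test_prob_one(psi, phi):
    # Return Pr(outcome 1) of the swap test on two normalized d-dimensional states.
    d = psi.size
    # Registers: control qubit (x) register 1 (x) register 2, control in (|0>+|1>)/sqrt(2).
    state = np.kron(np.array([1.0, 1.0]) / np.sqrt(2.0), np.kron(psi, phi))
    # Controlled-SWAP: exchange the two registers when the control qubit is 1.
    cswap = np.zeros((2 * d * d, 2 * d * d))
    for i in range(d):
        for j in range(d):
            cswap[i * d + j, i * d + j] = 1.0                   # control 0: do nothing
            cswap[d * d + j * d + i, d * d + i * d + j] = 1.0   # control 1: |i>|j> -> |j>|i>
    state = cswap @ state
    # Hadamard on the control; the |1> branch is (top - bottom)/sqrt(2).
    top, bottom = state[:d * d], state[d * d:]
    one_branch = (top - bottom) / np.sqrt(2.0)
    return float(np.vdot(one_branch, one_branch).real)

rng = np.random.default_rng(4)
psi = rng.standard_normal(2) + 1j * rng.standard_normal(2)
phi = rng.standard_normal(2) + 1j * rng.standard_normal(2)
psi /= np.linalg.norm(psi)
phi /= np.linalg.norm(phi)

# Pr(1) = (1 - |<psi|phi>|^2)/2, exactly as stated above.
assert np.isclose(swap_test_prob_one(psi, phi), (1 - abs(np.vdot(psi, phi))**2) / 2)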
The overlap between the two quantum states can be learned by statistically sampling the outcomes from many instances of the swap test. The value of $|\langle y| I(F) |\lambda\rangle|^2$ can be approximated using the sample mean of this distribution. It follows from estimates of the standard deviation of the mean that $O(1/\epsilon^2)$ samples are required to estimate the mean within error $O(\epsilon)$. The cost of algorithm 2 is then found by multiplying (16) by $1/\epsilon^2$.
The quantity $E$ can be estimated from the output of algorithm 2 by $E \le 2(1 - |\langle y| I(F) |\lambda\rangle|)$. Taylor-series analysis shows that the error in the upper bound for $E$ is also $O(\epsilon)$.
There are several important limitations to this technique. First, if $F$ is not sparse (meaning $s \in O(\mathrm{poly}(N))$) then the algorithm may not be efficient, because the quantum simulation step used in the algorithm may not be efficient. As noted in previous results [14-16], we can generalize our results to systems where $F$ is non-sparse if there exists a set of efficient unitary transformations $U_j$ such that $I(F) = \sum_j U_j H_j U_j^\dagger$, where each $H_j$ is sparse and Hermitian. Also, in many important cases (such as fitting to experimental data) it may not be possible to prepare the initial state $|y\rangle$ efficiently. For this reason, our algorithm is better suited for approximating the output of quantum devices than the classical outputs of experiments. Finally, algorithm 2 only provides an efficient estimate of the fit quality and does not provide $\lambda$; however, we can use it to determine whether a quantum state has a concise representation within a family of states. If algorithm 2 can be used to find such a representation, then the parameters $|\lambda\rangle$ can be learned via state tomography. We discuss this approach below.
3. Learning $\lambda$. This method can also be used to find a concise fit function that approximates $y$. Specifically, we use statistical sampling and quantum state tomography to find a concise representation for the quantum state using $M' \in O(\mathrm{polylog}(N))$ fit functions.
Input: As algorithm 2, but in addition with an integer $M'$ specifying the number of retained fit functions.
Query Complexity:
$$\tilde{O}\left(\log(N) s^3 \kappa^6/\epsilon^2 + M'^2 \log(N) s^3 \kappa^6/\epsilon^3\right).$$
Algorithm. The first step of the algorithm is to prepare the state $|\lambda\rangle$ using algorithm 1. The state is then measured $O(M')$ times and a histogram of the measurement outcomes is constructed. Since the probability of measuring each of these outcomes is proportional to its relevance to the fit, we are likely to find the $M'$ most significant fit functions after measuring the state $O(M')$ times. After choosing the $M'$ most significant fit functions, we remove all other fit functions from the fit and prepare the state $|\lambda\rangle$ using the reduced set of fit functions. Compressed sensing [19-21] is then used to reconstruct $|\lambda\rangle$ within $O(\epsilon)$ error. The idea of compressed sensing is that a low-rank density matrix can be uniquely determined (with high probability) by a small number of randomly chosen measurements. A convex optimization routine is then used to reconstruct the density matrix from the expectation values found for each of the measurements.
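The histogram step can be pictured classically as follows: measuring $|\lambda\rangle$ in the computational basis returns index $j$ with probability $|\lambda_j|^2/\|\lambda\|^2$, so the most significant fit functions appear most often. In the sketch below, the sparse parameter vector and the number of samples are illustrative assumptions:

import numpy as np

rng = np.random.default_rng(5)
M, M_prime = 16, 3

# Illustrative fit-parameter vector dominated by a few components.
lam = np.zeros(M)
lam[[2, 7, 11]] = [3.0, -2.0, 1.5]
lam += 0.05 * rng.standard_normal(M)

# Measuring |lambda> yields outcome j with probability |lambda_j|^2 / ||lambda||^2.
probs = np.abs(lam)**2 / np.sum(np.abs(lam)**2)
samples = rng.choice(M, size=10 * M_prime, p=probs)

# Histogram the outcomes and keep the M' most frequently observed fit functions.
counts = np.bincount(samples, minlength=M)
significant = np.argsort(counts)[-M_prime:]
print("retained fit functions:", sorted(significant.tolist()))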
Compressed sensing requires $O(M' \log(M')^2)$ measurement settings to reconstruct pure states, and observation 1 of [19] implies that $O(M'/\epsilon^2)$ measurements are needed for each setting to ensure that the reconstruction error is $O(\epsilon)$; therefore, $O(M'^2 \log(M')^2/\epsilon^2)$ measurements are needed to approximate the state within error $O(\epsilon)$. The total cost of learning $|\lambda\rangle$ is the number of measurements needed for tomography multiplied by the cost of preparing the state, and thus scales as
$$\tilde{O}\left(\log(N) s^3 \kappa^6 M'^2/\epsilon^3\right), \qquad (17)$$
which subsumes the cost of measuring $|\lambda\rangle$ to find the most significant $M'$ fit functions.
Finally, we measure the quality of the fit using algorithm 2. The total cost of estimating $\lambda$ and the fit quality is thus the sum of (17) and (16), as claimed.
Remark: The quality of the resulting fit that is yielded by this algorithm depends strongly on the set of fit functions that are used. If the fit functions are chosen well, fewer than $M'$ fit functions may be needed to achieve a given fit quality.

Appendix: Optimality of the least-squares fit. We first need the property that
$$(FF^+)^\dagger = FF^+. \qquad (A1)$$
The proof of this property is
$$(FF^+)^\dagger = \left(F (F^\dagger F)^{-1} F^\dagger\right)^\dagger = F \left((F^\dagger F)^{-1}\right)^\dagger F^\dagger. \qquad (A2)$$
The result of (A1) then follows by noting that $F^\dagger F$ is self-adjoint.
Next, we need the property that
$$FF^+ F = F. \qquad (A3)$$
This property follows directly from substituting the definition of $F^+$ into the expression.
The final property we need is
$$F^\dagger (FF^+ y - y) = 0. \qquad (A4)$$
Using property (A1) we find that
$$F^\dagger (FF^+ y - y) = (FF^+ F)^\dagger y - F^\dagger y. \qquad (A5)$$
Property (A3) then implies that
$$(FF^+ F)^\dagger y - F^\dagger y = F^\dagger y - F^\dagger y = 0. \qquad (A6)$$
For simplicity, we write $z = F^+ y$ and then find
$$\|F\lambda - y\|^2 = \|Fz - y + F(\lambda - z)\|^2. \qquad (A7)$$
Expanding this relation yields
$$\|F\lambda - y\|^2 = \|Fz - y\|^2 + \|F(\lambda - z)\|^2 + (Fz - y)^\dagger F(\lambda - z) + (\lambda - z)^\dagger F^\dagger (Fz - y).$$
Both cross terms vanish by property (A4), since $Fz = FF^+ y$. Hence $\|F\lambda - y\|^2 = \|Fz - y\|^2 + \|F(\lambda - z)\|^2 \ge \|Fz - y\|^2$, so the error is minimized by $\lambda = z = F^+ y$, which proves that (2) is optimal.
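The pseudoinverse identities (A1), (A3), and (A4) are easy to spot-check numerically; the sketch below does so for a random full-column-rank $F$ (an illustrative choice, since $F^+ = (F^\dagger F)^{-1} F^\dagger$ requires $F^\dagger F$ to be invertible):

import numpy as np

rng = np.random.default_rng(6)
N, M = 6, 3
F = rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))
y = rng.standard_normal(N) + 1j * rng.standard_normal(N)

Fp = np.linalg.pinv(F)                 # F^+ = (F^dag F)^{-1} F^dag for full column rank

assert np.allclose((F @ Fp).conj().T, F @ Fp)            # (A1): FF^+ is self-adjoint
assert np.allclose(F @ Fp @ F, F)                        # (A3): FF^+F = F
assert np.allclose(F.conj().T @ (F @ Fp @ y - y), 0)     # (A4): F^dag(FF^+y - y) = 0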