
Quantum Conditional Random Field

Yusen Wu1,2, Chao-Hua Yu1, Binbin Cai1, Sujuan Qin1,∗, Fei Gao1,†, and Qiaoyan Wen1,‡

1 State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
2 State Key Laboratory of Cryptology, P.O. Box 5159, Beijing, 100878, China

∗ qsujuan@bupt.edu.cn   † gaofei_bupt@hotmail.com   ‡ wqy@bupt.edu.cn

(Dated: January 7, 2019)
arXiv:1901.01027v1 [quant-ph] 4 Jan 2019

Conditional random field (CRF) is an important probabilistic machine learning model for labeling sequential data, which is widely utilized in natural language processing, bioinformatics and computer vision. However, training the CRF model is computationally intractable when large scale training samples are processed. Since little work has been done for labeling sequential data in the quantum settings, we in this paper construct a quantum CRF (QCRF) model by introducing well-defined Hamiltonians and measurements, and present a quantum algorithm to train this model. It is shown that the algorithm achieves an exponential speed-up over its classical counterpart. Furthermore, we also demonstrate that the QCRF model possesses a higher Vapnik-Chervonenkis dimension than the classical CRF model, which means QCRF is equipped with a higher learning ability.

PACS numbers: 03.67.Dd, 03.67.Hk


I. INTRODUCTION

Quantum computing makes use of quantum mechanical phenomena, such as quantum superposition and quantum entanglement, to perform computing tasks on quantum systems, providing a computing model fundamentally different from classical computing [15]. The most exciting aspect of quantum computing is its ability to achieve significant speed-ups over classical computing for certain problems, such as simulating quantum systems [4, 6, 11], factoring large integers [23], and unstructured database searching [12]. In the past decade, this excitement has been brought into a newly emerging branch of quantum computing, quantum machine learning (QML), which is an interdisciplinary research field combining quantum computing and machine learning [22]. Machine learning studies algorithms that assign a label (output) to each observation (input) by learning a model describing the relationship between the observation and the label. It mainly falls into two categories, supervised learning and unsupervised learning, depending on whether example observations and corresponding labels are provided or not. Since the pioneering quantum algorithm for linear systems of equations was proposed by Harrow et al. (HHL) [29], a variety of quantum algorithms have been put forward to tackle well-known machine learning problems, such as linear regression [16], data classification [19], clustering analysis [25], principal component analysis [26], ridge regression [5], Toeplitz systems [14], polynomial approximation by gradient descent and Newton's method [20], and so on. These quantum algorithms exhibit substantial speed-ups over their classical counterparts. Recently, the study of QML has been extended to construct quantum neural network models, such as quantum deep learning [17], quantum Boltzmann machines [1], and quantum Hopfield neural networks [21]. Unfortunately, all of these algorithms and models only work for nonsequential data, where observations (and labels) have no relationship with each other. However, sequential data arise in various fields including bioinformatics, speech recognition, and machine translation [3].

Conditional random field (CRF) is a probabilistic framework for labeling and segmenting sequential data, such as text sequences and gene sequences, where labels depend on other labels and observations [13]. It plays an important role in machine learning, and has wide applications [2, 13, 24, 30] in the fields of natural language processing (NLP), bioinformatics, and computer vision. Given n sequential observations x = (x_1, ..., x_n) and their corresponding labels y = (y_1, ..., y_n), CRF aims to model the conditional probability distribution P(y|x). In the model, eigen-functions {f_k}_{k=1}^{K} are introduced to describe the inner relationships within the observation sequence x and the label sequence y, as well as the relationship between them. The CRF model is parameterized by the coefficients w_k of the eigen-functions f_k. A simple but motivating example from NLP is part-of-speech tagging, in which x_s denotes the word of the sentence x at position s, and y_s is its corresponding part-of-speech tag. Training the CRF model generally uses the gradient descent method to obtain the parameters w_k in an iterative way, and the time complexity grows exponentially with n [13]. This means that training a CRF model becomes computationally intractable when n is large. After the parameters w_k are obtained, the model can be used to predict the label for any new observation via the efficient Viterbi algorithm [9].

In this paper, we explore how to model and train CRF in the quantum setting. Specifically, by introducing two well-defined Hamiltonians, both encoding the parameters w_k, as well as two well-defined measurement operators, both encoding the eigen-functions f_k, we construct a quantum CRF (QCRF) model in which the conditional probability distribution P(y|x) is derived by simple linear algebra operations on the exponentials of the Hamiltonians and the measurement operators. We also present a quantum algorithm for training the model, which uses state-of-the-art Hamiltonian simulation [11] to obtain the classical information of the parameters w_k. It is shown that the time complexity of our algorithm grows polynomially with n (i.e., the number of example observation-label couples for training), exponentially improving the dependence on n compared with the classical training algorithm mentioned above. Furthermore, we compare the QCRF and the classical CRF from the perspective of computational learning theory. We show that our QCRF model has a much higher Vapnik-Chervonenkis (VC) dimension [28] than the classical CRF, demonstrating that QCRF significantly improves the data learning ability over the classical CRF.

II. REVIEW OF CLASSICAL CRF

In this section, we first review the definition of the CRF model and the methodology to train it.

A. Definition of CRF model

Suppose X and Y are random variables, and G(V, E) is an undirected graph such that Y = (Y_v), v ∈ V. The vertex set V represents random variables and the edge set E stands for the dependency relationships between random variables. Then (X, Y) is a CRF when the random variables Y_v, conditioned on X, obey the Markov property with respect to the graph G, i.e., P(Y_{v1}|X, Y_{v2}, v1 ≠ v2) = P(Y_{v1}|X, Y_{v2}, v1 ∼ v2) is satisfied for every node v1, where v1 ∼ v2 means that v1 and v2 are neighbors in G, and Y_{v1}, Y_{v2} are the random variables corresponding to the vertices v1, v2. Theoretically, we can construct a structure that fully models the graph G in arbitrary complexity, and the most commonly used chain-structured CRF is defined as

    P(y|x) = (1/Z) exp{ \sum_{i=1}^{n} \sum_{k=1}^{K} w_k f_k(x_i, y_i) }.   (1)

Here x_i and y_i respectively denote the i-th observation and its corresponding label, f_k(x_i, y_i) takes values in {−1, 1} with Boltzmann weight w_k, and Z is the normalization factor. If y_i is assigned to x_i with a high probability, f_k(x_i, y_i) returns the value 1; otherwise it returns −1.

Similar to most supervised learning models, CRF also involves two phases, namely the training phase and the predicting phase. Specifically, the training phase analyzes the training data to construct the most appropriate mapping P(y|x), in which x = (x_i)_{i=1}^{n} and y = (y_i)_{i=1}^{n} denote the observation sequence and label sequence, respectively. The predicting phase aims at computing the most probable label sequence y′ for a new observation sequence x′ with the help of the P(y|x) obtained in the training phase.

B. Training CRF

Given a large-scale observation sequence x and the corresponding label sequence y, the CRF training phase suffices to construct the inference model P(y|x) on the graph, i.e., finding the Boltzmann weights w = (w_1, ..., w_K). One approach to finding the appropriate w is the maximum-likelihood method based upon the observations x and labels y. If the training process is successful, the model joint distribution P_{x,y} has enough resemblance to the prior data joint distribution P^{data}_{x,y}. To describe this, we introduce the log-likelihood function L, whose minimum point corresponds to the appropriate w, i.e., the appropriate relationship. The average negative log-likelihood function L is defined as [13]

    L = − \sum_{x} \sum_{y} P^{data}_{x} P^{data}(y|x) \log P(y|x)   (2)

      = − \sum_{x,y} P^{data}_{x,y} \log \frac{e^{E(x,y)}}{\sum_{y^*} e^{E(x,y^*)}}.   (3)

The summation over y^* indicates traversing all the possible label sequences for the fixed observation data x, and the potential function E(x, y) is of the form

    E(x, y) = \sum_{i=1}^{n} \sum_{k=1}^{K} w_k f_k(x_i, y_i).   (4)

To determine the Boltzmann weights w, we need to minimize the function L with the help of optimization methods, e.g., Newton's method, the BFGS method and the gradient descent method. It is interesting to note that these methods all depend on calculating the gradient of L. In each iteration, the parameter w is updated by a selected step in the direction opposite to the gradient: Δw = −η ∂L/∂w, where η is the step length and the gradient ∂L/∂w_k is expressed as

    ∂L/∂w_k = − \sum_{x,y} P^{data}_{x,y} \left( \frac{\frac{∂}{∂w_k} e^{E(x,y)}}{e^{E(x,y)}} − \frac{\sum_{y^*} \frac{∂}{∂w_k} e^{E(x,y^*)}}{\sum_{y^*} e^{E(x,y^*)}} \right)   (5)

            = − \sum_{x,y} P^{data}_{x,y} \left( \langle e^{E} \rangle^{C,k}_{X,Y} − \langle e^{E} \rangle^{C,k}_{X} \right).   (6)

The gradient function ∂L/∂w_k has two terms: the first term ⟨e^E⟩^{C,k}_{X,Y} is clamped by the training data {x, y}, and the second term ⟨e^E⟩^{C,k}_{X} is only clamped by the observation set x.

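To make Eqs. (1)-(6) concrete, the following minimal Python sketch evaluates the chain-structured CRF of Eq. (1) and the gradient of Eq. (6) by brute-force enumeration over all label sequences. It is illustrative only: the label alphabet Q, the binary feature functions, the weights and the single training pair are invented for the example and are not part of the original model.

    import itertools
    import numpy as np

    # Toy setting: label alphabet Q, K feature functions f_k(x_i, y_i) in {-1, +1}.
    Q = [0, 1]                      # hypothetical label alphabet
    K, n = 3, 4                     # number of features, sequence length
    rng = np.random.default_rng(0)
    w = rng.normal(size=K)          # Boltzmann weights w_k

    def f(k, x_i, y_i):
        # Hypothetical binary eigen-function returning +1 or -1 (cf. Eq. (1)).
        return 1 if (x_i + y_i + k) % 2 == 0 else -1

    def E(x, y):
        # Potential function E(x, y) of Eq. (4).
        return sum(w[k] * f(k, x[i], y[i]) for i in range(len(x)) for k in range(K))

    def P_y_given_x(x, y):
        # Chain CRF conditional probability of Eq. (1); Z enumerates all |Q|^n label sequences.
        Z = sum(np.exp(E(x, ys)) for ys in itertools.product(Q, repeat=len(x)))
        return np.exp(E(x, y)) / Z

    def gradient(data):
        # Gradient of the average negative log-likelihood, Eqs. (5)-(6).
        # data is a list of (x, y) pairs, each with empirical weight 1/len(data).
        grad = np.zeros(K)
        for x, y in data:
            clamped = np.array([sum(f(k, x[i], y[i]) for i in range(len(x))) for k in range(K)])
            Z = sum(np.exp(E(x, ys)) for ys in itertools.product(Q, repeat=len(x)))
            free = np.zeros(K)
            for ys in itertools.product(Q, repeat=len(x)):      # O(|Q|^n) possibilities
                p = np.exp(E(x, ys)) / Z
                free += p * np.array([sum(f(k, x[i], ys[i]) for i in range(len(x))) for k in range(K)])
            grad += -(clamped - free) / len(data)
        return grad

    data = [((0, 1, 1, 0), (1, 1, 0, 0))]   # one toy observation/label pair
    print(P_y_given_x(*data[0]), gradient(data))

The explicit enumeration over all |Q|^n candidate sequences in the free term is exactly what makes this computation expensive, as discussed next.
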

Updating the gradient in this way takes exponential time on a classical computer, since the second term traverses all the possible y^*, leading to O(‖Q‖^n) possibilities, and each possibility ∂/∂w_k e^{E(x,y^*)} has O(n) terms, which is intractable for large n. In the following, we will first give the theory of the QCRF model and then present a quantum algorithm for training the model that is exponentially faster than the classical CRF training algorithm.

III. QCRF MODEL

A. Fundamental theory of QCRF

Since the classical CRF model is formulated according to the classical conditional entropy, we propose the fundamental theory of the QCRF model based on the principle of quantum conditional entropy [15]. For the QCRF model, the quantum conditional entropy S(y|x) is applied, which is defined as

    S(y|x) = S(ρ_{X,Y}) − S(ρ_X).   (7)

The density operator ρ_{X,Y} encodes the joint distribution P(x, y), and can be decomposed on its spectrum with the probabilities P(x, y), i.e., ρ_{X,Y} = \sum_{x,y} P(x, y)|x, y⟩⟨x, y|. The density operator ρ_X encodes the marginal distribution P(x) of the random variable X; similarly, we can decompose ρ_X into the form ρ_X = \sum_{x} P(x)|x⟩⟨x|. Thus the quantum conditional entropy S(y|x) can be expressed as

    S(y|x) = − \sum_{x,y} P(x, y) \log P(y|x).   (8)

The kernel idea behind the QCRF model is to find the model P^*(y|x). Applying the quantum conditional entropy, we obtain the objective function

    P^*(y|x) = \arg\max_{P(y|x)} S(y|x).   (9)

The model P(y|x) has the largest possible quantum conditional entropy while still being consistent with the information from the training material. Finding the quantum conditional probabilistic model under some constraints can be formulated as an optimization problem. We mainly take two constraints into consideration. The first constraint depends on the training material, which requires the model distribution to be close to the prior empirical distribution. That means, for each eigen-function f_k, its expected value on the empirical distribution must equal its expected value on the model distribution. The empirical distribution of f_k is obtained by simply counting how often the different values of the variable occur in the training data. Introducing the Lagrange multiplier w_k, we obtain the prior (empirical) expectation

    E_C(f_1, ..., f_K) = \frac{1}{N} \sum_{k=1}^{K} \sum_{(x,y)} w_k f_k(x, y).   (10)

In contrast to the prior empirical distribution, the quantum model distribution of f_k is formulated as

    E_Q(f_1, ..., f_K) = Tr(Λ_{X,Y}(ρ_{X,Y} H^{(0)})).   (11)

The notation Tr(·) means the trace of a matrix. The Hamiltonian H^{(0)} is composed from all the basis states of the Hilbert space H, where H = {(\sum_i \sum_k w_k f_{k,i})|u_j⟩ | f_{k,i} ∈ {−1, 1}} and |u_j⟩ denotes a set of basis states spanning the Hilbert space H. The density matrix ρ_{X,Y} encodes the joint distribution P(x, y) corresponding to every combination of the features f_{k,i}. And Λ_{X,Y} = \sum_{(x,y)} Λ(x, y)|x, y⟩⟨x, y| is the measurement operator limiting the Hamiltonian only to the clamped X, Y provided by the training data set. The parameter Λ(x, y) = 1 if and only if x = X, y = Y; otherwise Λ(x, y) = 0.

The second constraint is the normalization condition \sum_{y^*} P(y^*|x) = 1. After introducing another Lagrange parameter λ for the normalization condition, the Lagrangian can be written as

    G(P(y|x)) = S(y|x) + (E_Q(f_1, ..., f_K) − E_C(f_1, ..., f_K)) + λ(\sum_{y^*} P(y^*|x) − 1).   (12)

Equating the partial derivative ∂G(P(y|x))/∂P(y|x) to 0 and solving for P(y|x), we obtain

    P(y|x) = Tr(Λ_{X,Y} e^{H^{(0)}}) \exp(λ/P(x) − 1).   (13)

It is worth noting that \sum_{y^*} P(y^*|x) = 1, so we have

    \sum_{y^*} Tr(Λ_{X,Y} e^{H^{(0)}}) \exp(λ/P(x) − 1) = 1.   (14)

To simplify Eq. (14), we introduce another Hamiltonian H^{(n)} = I_Q^{\otimes n} \otimes H^{(0)} and the corresponding measurement Λ_X = \sum_{x} Λ(x)|x, y⟩⟨x, y|, which traverses all the possible y^*; then we have

    \sum_{y^*} Tr(Λ_{X,Y} e^{H^{(0)}}) = Tr(Λ_X e^{H^{(n)}}).   (15)

The parameter Λ(x) = 1 if and only if x = X; otherwise Λ(x) equals 0. Combining Eqs. (13), (14) and (15), the QCRF model can be formulated as

    P(y|x) = \frac{Tr(Λ_{X,Y} e^{H^{(0)}})}{Tr(Λ_X e^{H^{(n)}})}.   (16)

This is the kernel expression of the QCRF model. In the following, we illustrate a simple method to construct the concrete Hamiltonians H^{(0)}, H^{(n)} and the corresponding measurements Λ_{X,Y}, Λ_X.

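Before turning to that construction, the entropy objective of Eqs. (7)-(8) can be sanity-checked with a short, self-contained numpy sketch. The joint distribution below is an arbitrary toy example (not data from this work); the check confirms that, for classically correlated diagonal density operators, S(ρ_{X,Y}) − S(ρ_X) coincides with −Σ_{x,y} P(x, y) log P(y|x).

    import numpy as np

    # Toy joint distribution P(x, y) over 2 observation values and 2 label values (hypothetical numbers).
    P = np.array([[0.1, 0.3],
                  [0.4, 0.2]])          # rows: x, columns: y

    def entropy(diag_probs):
        # von Neumann entropy of a diagonal density operator, in nats.
        p = diag_probs[diag_probs > 0]
        return -np.sum(p * np.log(p))

    # rho_{X,Y} and rho_X are diagonal in the computational basis (Sec. III A).
    rho_xy = np.diag(P.flatten())        # rho_{X,Y} = sum_{x,y} P(x,y)|x,y><x,y|
    rho_x  = np.diag(P.sum(axis=1))      # rho_X = sum_x P(x)|x><x|

    S_cond_quantum   = entropy(np.diag(rho_xy)) - entropy(np.diag(rho_x))             # Eq. (7)
    S_cond_classical = -np.sum(P * np.log(P / P.sum(axis=1, keepdims=True)))          # Eq. (8)

    print(S_cond_quantum, S_cond_classical)   # the two values agree
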
B. The construction of Hamiltonian and measurement

One of the kernel targets of the QCRF model is to construct a specially designed Hamiltonian that represents the potential function E(x, y), replacing classical bits with quantum bits. The potential function E(x, y) not only reflects the statistical property, but also describes the matching degree between the observation variable x and the label sequence y via the eigen-functions f_k(x_i, y_i). To simulate the statistical property and the matching degree of the variables, we utilize the Pauli-Z analogue operator σ^{z,l}_{k,i} to reflect the empirical characteristics of the data. Define the Pauli-Z analogue operator σ^{z,l}_{k,i} as follows:

    σ^{z,l}_{k,i} = I_Q^{\otimes l} \otimes I_2^{\otimes(ki−1)} \otimes σ^z \otimes I_2^{\otimes(Kn−ki)}.   (17)

The tag parameter l on the shoulder of σ^{z,l}_{k,i} controls the number of identity gates I_Q. The notation ⊗ means the Kronecker product, and the operator σ^z is the Pauli Z operator [15]. Every factor in the expression σ^{z,l}_{k,i} is an identity operator except the (ki+l)-th one, which is σ^z. This construction guarantees that σ^{z,l}_{k,i} is a diagonal operator, which encodes the equivalent statistical distribution on its diagonal. Specifically, the Hamiltonians H^{(0)} and H^{(n)} can be expressed as

    H^{(0)} = \sum_{i=1}^{n} \sum_{k=1}^{K} w_k σ^{z,0}_{k,i}   (18)

and

    H^{(n)} = \sum_{i=1}^{n} \sum_{k=1}^{K} w_k σ^{z,n}_{k,i},   (19)

respectively. The parameter w_k is the Boltzmann weight appearing in the potential function E(x, y). Different from the classical CRF model, which describes the data patterns via the binary eigen-functions f_k(x, y), the QCRF model relies on the natural physical mechanism of particle spin-up and spin-down.

To construct the measurement operators, we first define the subspaces S_0 and S_n as follows: S_0 = e^{E(x,y)}|ϕ_i⟩, where |ϕ_i⟩ is one of the basis states according to the index i ∈ {1, 2, ..., 2^{nK}}, and S_n = span{e^{E(x,y^{(i)})}|ψ_i⟩ | i = 1, 2, ..., ‖Q‖^n}. Thus S_l ⊂ e^{H^{(l)}}, l = 0 or n, whose trace norms satisfy ‖S_0‖_tr = e^{E(x,y)} and ‖S_n‖_tr = \sum_{y^*} e^{E(x,y^*)}. Since the Hilbert space e^{H^{(l)}} is separable, we can decompose the space e^{H^{(l)}} into the form e^{H^{(0)}} = S_0 ⊕ S_0^⊥ and e^{H^{(n)}} = S_n ⊕ S_n^⊥, in which S_l^⊥ is the orthogonal complement of S_l defined on e^{H^{(l)}}, l = 0 or n. As a subspace, S_l is a diagonal matrix with its elements clamped by the variables x, y. To extract this subspace, we should design specific projection operators Λ, which project the whole space e^{H^{(l)}} onto the subspace S_l only.

Extracting the subspace S_0 from the whole space e^{H^{(0)}} depends on the information of the training data x, y. The properties of the training data are reflected by the eigen-function series f_k(x_i, y_i); therefore, the eigen-functions can provide all the information of S_0. Specifically, the corresponding projection operator Λ_{X,Y} is expressed as

    Λ_{X,Y} = \prod_{i=1}^{n} \prod_{k=1}^{K} \frac{1}{2}(I + f_k(x_i, y_i) σ^{z,0}_{k,i}).   (20)

The construction \frac{1}{2}(I + f_k(x_i, y_i) σ^{z,0}_{k,i}) guarantees that its entries lie only on the diagonal, with eigenvalues 1 or 0. The operator Λ_{X,Y} is designed to act on Kn qubits. Given a quantum system consisting of Kn qubits, Λ_{X,Y} can be utilized to test whether the quantum system collapses onto the state ⊗|(1 − f_k(x_i, y_i))/2⟩ or not. With the help of the projection operator Λ_{X,Y}, we obtain the subspace S_0:

    S_0 = Λ_{X,Y}(e^{H^{(0)}}).   (21)

The subspace S_n can be extracted in a similar way. The fundamental difference lies in the fact that S_n traverses all the possible y^*. We need to pick out every y^* with the help of the eigen-functions f_k(x_i, y_i^*). As a result, the projection operator Λ_X can be expressed in the form

    Λ_X = \prod_{i=1}^{n} \prod_{k=1}^{K} \frac{1}{2}\left(I + \sum_{j=1}^{‖Q‖} f_k(x_i, y_i^{(j)})|j⟩⟨j| \otimes σ^{z,n−1}_{k,i}\right).   (22)

The notation y_i^{(j)} indicates all the possible assignments of y_i. The measurement operator Λ_X is established on a (log‖Q‖ + K)n-qubit quantum system, and can be utilized to test whether the quantum system collapses onto the state |j⟩ ⊗ |(1 − f_1(x_i, y_i^{(j)}))/2⟩ ⊗ ... ⊗ |(1 − f_K(x_i, y_i^{(j)}))/2⟩ or not. We can extract the subspace S_n as follows:

    S_n = Λ_X(e^{H^{(n)}}).   (23)

Noting that the trace norms of the subspaces S_0 and S_n equal e^{E(x,y)} and the marginal quantity \sum_{y^*} e^{E(x,y^*)}, respectively, we obtain the relationships

    Tr(Λ_{X,Y} e^{H^{(0)}}) = e^{E(x,y)}   (24)

and

    Tr(Λ_X e^{H^{(n)}}) = \sum_{y^*} e^{E(x,y^*)},   (25)

which build a bridge between the quantum model and the classical information.

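The diagonal structure of Eqs. (17)-(25) can be checked directly on a toy instance. The following numpy sketch builds σ^{z,0}_{k,i}, H^{(0)} and Λ_{X,Y} for small n and K and verifies Eq. (24), i.e. that Tr(Λ_{X,Y} e^{H^{(0)}}) reproduces e^{E(x,y)}. The feature values f_k(x_i, y_i) and the weights are arbitrary toy choices, not data from the paper.

    import numpy as np
    from functools import reduce
    from scipy.linalg import expm

    n, K = 2, 2                              # toy sizes: Kn = 4 qubits
    rng = np.random.default_rng(1)
    w = rng.normal(size=K)                   # Boltzmann weights w_k
    f = rng.choice([-1, 1], size=(n, K))     # toy feature values f_k(x_i, y_i) for the clamped (x, y)

    I2 = np.eye(2)
    sz = np.diag([1.0, -1.0])                # Pauli Z

    def kron_all(ops):
        return reduce(np.kron, ops)

    def sigma_z(k, i):
        # sigma^{z,0}_{k,i}: identity on every qubit except position i*K + k (Eq. (17) with l = 0).
        ops = [I2] * (n * K)
        ops[i * K + k] = sz
        return kron_all(ops)

    # H^(0) of Eq. (18) and the projector Lambda_{X,Y} of Eq. (20).
    H0  = sum(w[k] * sigma_z(k, i) for i in range(n) for k in range(K))
    Lam = kron_all([0.5 * (I2 + f[i, k] * sz) for i in range(n) for k in range(K)])

    E_xy = float(np.sum(w[None, :] * f))     # potential function E(x, y), Eq. (4)
    lhs = np.trace(Lam @ expm(H0))           # Tr(Lambda_{X,Y} e^{H^(0)}), Eq. (24)
    print(lhs, np.exp(E_xy))                 # the two numbers coincide

Eq. (25) can be checked in the same way by appending the |j⟩ register of Eq. (22) and summing the feature values over every candidate label y_i^{(j)}.
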

Up to now, we have utilized the Hamiltonians H^{(0)} and H^{(n)} to represent the gradient function ∂L/∂w_k under the quantum model:

    ∂L/∂w_k = − \sum_{x,y} P^{data}_{x,y} \left( \langle e^{H^{(0)}} \rangle^{Q,k}_{X,Y} − \langle e^{H^{(n)}} \rangle^{Q,k}_{X} \right),   (26)

where

    \langle e^{H^{(0)}} \rangle^{Q,k}_{X,Y} = \frac{Tr(Λ_{X,Y} \frac{∂}{∂w_k} e^{H^{(0)}})}{Tr(Λ_{X,Y} e^{H^{(0)}})}   (27)

and

    \langle e^{H^{(n)}} \rangle^{Q,k}_{X} = \frac{Tr(Λ_X \frac{∂}{∂w_k} e^{H^{(n)}})}{Tr(Λ_X e^{H^{(n)}})}.   (28)

Eq. (25) is a pivotal component of the QCRF training process, encapsulating both the static property of the Hamiltonians H^{(l)} and the information provided by the training data. Besides, the joint probabilities P^{data}_{x,y} are given a priori. Thus we only need to concentrate on computing the terms ⟨e^{H^{(0)}}⟩^{Q,k}_{X,Y} and ⟨e^{H^{(n)}}⟩^{Q,k}_{X}. Further training of the QCRF model, i.e., finding the Boltzmann weights w, also relies on this expression.

IV. QUANTUM ALGORITHM FOR TRAINING QCRF

The above section proposes the fundamental theory of the QCRF model. In this section, we present a quantum algorithm for training the QCRF model and then analyze its time complexity. Finally, numerical simulations are performed on both the CRF and QCRF models. Surprisingly, the results show that, in contrast to the CRF model, our QCRF model requires significantly fewer iterations to achieve the same error rate.

A. Algorithm

The kernel target of our quantum algorithm is to estimate the average terms ⟨e^{H^{(0)}}⟩^{Q,k}_{X,Y} and ⟨e^{H^{(n)}}⟩^{Q,k}_{X}, as shown in Eqs. (27) and (28), so that the gradient function ∂L/∂w_k can be computed efficiently.

Evidently, estimating ⟨e^{H^{(0)}}⟩^{Q,k}_{X,Y} and ⟨e^{H^{(n)}}⟩^{Q,k}_{X} requires us to estimate the four terms Tr(Λ_X \frac{∂}{∂w_k} e^{H^{(n)}}), Tr(Λ_X e^{H^{(n)}}), Tr(Λ_{X,Y} \frac{∂}{∂w_k} e^{H^{(0)}}), and Tr(Λ_{X,Y} e^{H^{(0)}}). Since these four terms have almost the same form, we concentrate on estimating the relatively complicated term Tr(Λ_X \frac{∂}{∂w_k} e^{H^{(n)}}), which is mathematically equal to Tr(Λ_X e^{H^{(n)}} \frac{∂}{∂w_k} H^{(n)}). From Eqs. (19) and (22), it is evident that Λ_X, e^{H^{(n)}} and \frac{∂}{∂w_k} H^{(n)} have the same eigenvectors, which we denote as {|ψ_k⟩}_{k=1}^{D}, where D = ‖Q‖^n 2^{nK} is the dimension of these three operators. To estimate Tr(Λ_X \frac{∂}{∂w_k} e^{H^{(n)}}), our algorithm first generates m copies of the state

    |φ⟩ = \frac{1}{\sqrt{\sum_k ‖λ_k‖}} \sum_{k=1}^{D} \sqrt{λ_k} |ψ_k⟩,   (29)

which are denoted by |φ_j⟩, j = 1, 2, ..., m. Then we perform the measurement Λ_X on each one to estimate the desired average, i.e.,

    Tr(Λ_X \frac{∂}{∂w_k} e^{H^{(n)}}) = \frac{1}{m} \sum_{j=1}^{m} ⟨φ_j|Λ_X|φ_j⟩.   (30)

The whole quantum algorithm can be summarized as follows.

Algorithm: Training the QCRF model
Input: Boltzmann weights w = (w_1, w_2, ..., w_K), training data set (x, y), Hamiltonians H^{(0)}, H^{(n)} and measurement operators Λ_{X,Y}, Λ_X.
Output: The estimate of the gradient function ∂L/∂w_k.

1. Initial state: |0⟩^{\otimes n(\log‖Q‖+K)}|0⟩^r|0⟩^r. The first system consists of n(\log‖Q‖ + K) qubits to represent the ‖Q‖^n 2^{nK}-dimensional superposition. The second and third systems are encoded with r qubits of precision.

2. Perform the Hadamard operator H on the first system, implementing the superposition

    H|0⟩ = \frac{1}{\sqrt{D}} \sum_{k=1}^{D} |ψ_k⟩|0⟩|0⟩,   (31)

where D is the dimension of the operator \frac{∂}{∂w_k} e^{H^{(n)}}.

3. Note that the Hamiltonian H^{(n)} can be decomposed on the computational basis |ψ_i⟩. Then perform the phase estimation PE(H^{(n)}) on the first and second registers. The system becomes

    \frac{1}{\sqrt{D}} \sum_{k=1}^{D} |ψ_k⟩|E_k⟩|0⟩,   (32)

where E_k indicates the eigenvalue of the Hamiltonian H^{(n)} corresponding to |ψ_k⟩. Implementing the phase estimation PE(H^{(n)}) depends on Hamiltonian simulation, which realizes the controlled operator e^{−iH^{(n)} jt/2^r}.

4. Then perform the phase estimation PE(\frac{∂}{∂w_k} H^{(n)}) on the third register, and the system obtains the phase µ_k of the matrix \frac{∂}{∂w_k} H^{(n)} in the third register:

    \frac{1}{\sqrt{D}} \sum_{k=1}^{D} |ψ_k⟩|E_k⟩|µ_k⟩.   (33)

5. Invoke the exponential gate on the second register, whose quantum circuit is illustrated in Fig. 1. Then execute the multiplication operator on the second and third registers. After that, undo the phase estimation PE(\frac{∂}{∂w_k} H^{(n)}). The system becomes

    \frac{1}{\sqrt{D}} \sum_{k=1}^{D} |ψ_k⟩|µ_k e^{E_k}⟩|0⟩.   (34)

Fig. 2 illustrates the quantum circuit of steps 3-5.

6. Denote µ_k e^{E_k} as λ_k, and apply the controlled rotation onto the third system; the system then becomes

    \frac{1}{\sqrt{D}} \sum_{k=1}^{D} |ψ_k⟩|λ_k⟩(q_k|0⟩ + \sqrt{1 − q_k^2}|1⟩),   (35)

where q_k = Cλ_k with C a normalization factor.

7. Measure the last register, postselect on the outcome |0⟩ and discard it. Then undo the phase estimation on the second system; we obtain the state

    |φ⟩ = \frac{1}{\sqrt{\sum_k ‖λ_k‖}} \sum_{k=1}^{D} \sqrt{λ_k} |ψ_k⟩,   (36)

with probability P(0) = C^2 \sum_k ‖λ_k‖/D. The lower bound of the probability P(0) can be estimated as Ω(C^2(1 − nK \max‖w‖/D)). Choosing the parameter C = \sqrt{D(1 − ε)/(D − nK \max‖w‖)} ensures that the system measures |0⟩ with a relatively high probability 1 − ε. Furthermore, we can also utilize the amplitude amplification method [8] to enhance P(0). The state |φ⟩ is then moved over and stored in a quantum memory. Then, reinitializing the quantum computer and repeating steps 1-7 m times, we obtain m states |φ_1⟩, ..., |φ_m⟩ stored in the quantum memory.

8. Finally, we can estimate the term Tr(Λ_X \frac{∂}{∂w_k} e^{H^{(n)}}) by measuring the observable Λ_X on the states |φ_1⟩, ..., |φ_m⟩:

    Tr(Λ_X \frac{∂ e^{H^{(n)}}}{∂w_k}) = \frac{P(0)D}{mC^2} \sum_{i=1}^{m} ⟨φ_i|Λ_X|φ_i⟩ + ε_m,   (37)

where ε_m is the measurement error. Note that the measurement operator Λ_X is constructed as a product of simple operators, and this construction implies that our measurement can be easily realized. The probability P_{Λ_X} reflects the measurement results on |j⟩ ⊗ |(1 − f_1(x_i, y_i^{(j)}))/2⟩ ⊗ ... ⊗ |(1 − f_K(x_i, y_i^{(j)}))/2⟩. Repeating the measurement O(m) times, we can acquire the statistical value of ⟨φ|Λ_X|φ⟩. Furthermore, we can utilize the same method to estimate the numerical values of Tr(Λ_X e^{H^{(n)}}), Tr(Λ_{X,Y} \frac{∂}{∂w_k} e^{H^{(0)}}), and Tr(Λ_{X,Y} e^{H^{(0)}}). Then the terms ⟨e^{H^{(0)}}⟩^{Q,k}_{X,Y} and ⟨e^{H^{(n)}}⟩^{Q,k}_{X} can be computed efficiently. Finally, the gradient function ∂L/∂w_k is obtained.

B. Complexity analysis

We continue with a discussion of the run time of our quantum algorithm. First, the Hadamard operator H is performed on (\log‖Q‖ + K)n qubits, thus the Hadamard operator takes O(n) time to generate the superposition. Next, Hamiltonian simulation is one of the preliminary components of our quantum algorithm. The cost of simulating the time evolution operator e^{−iHt} depends on several factors: the number of system gates, the evolution time t, the target error ε_si, and how information on the Hamiltonian H is made available. In detail, the qubitization method [10], which achieves the optimal simulation bound in theory, can be made fully constructive with an approach for implementing the signal state |G⟩ = \sum_{k,i} \sqrt{w_k/α} |k⟩|i⟩ and the signal operator U = \sum_{k,i} |i, k⟩⟨i, k| \otimes σ^{z,l}_{k,i} that encode the Hamiltonian H^{(l)}. It is evident that the signal state |G⟩ and the signal operator U can be prepared efficiently. After that, we can construct the qubitization intermediate operator W utilizing the signal state |G⟩ and the signal oracle U with O(1) primitive gates. Finally we achieve the Hamiltonian simulation ⟨G|W|G⟩, i.e., ‖⟨G|W|G⟩ − e^{−iH^{(l)}t}‖ < ε_si for time t and error ε_si. Using this technique, the Hamiltonian H^{(l)} can be efficiently simulated in time O(αt + \log(1/ε_si)/\log\log(1/ε_si)) [10], where α = \sum_{k=1}^{K} ‖w_k‖ depends on the linear-combination structure of the Hamiltonian. For the phase estimation, the propagator e^{−iH^{(l)}t} is enacted with error O(1/t). The Hamiltonian simulation time t thus determines the error ε_PE of the phase estimation, and the runtime of the phase estimation step is O(1/ε_PE). Considering the basic computing gates, the exp gate and the multiplying gate, each basic gate can be achieved in O(1) time with the help of the quantum Fourier transform in the computational basis. Taking into account the measurement of Λ_X, Λ_{X,Y} on each of the states |φ_1⟩, ..., |φ_m⟩, the relative error ε_m obeys the binomial relation mε_m = \sqrt{m p_λ(1 − p_λ)}, where p_λ is the probability of collapsing to the corresponding state. Thus the number of measurements m is O(1/ε_m^2).

Suppose the training data are denoted as {(X^{(1)}, Y^{(1)}), ..., (X^{(N)}, Y^{(N)})}, where (X^{(i)}, Y^{(i)}) is the i-th observation sequence and the corresponding label sequence. The scale of each data block is |X^{(i)}| = |Y^{(i)}| = n, and the dimension of the weight parameter w is K. The construction of the potential function E(x, y), defined on the probabilistic undirected graph, ensures K ≪ n. Each component of the Boltzmann weight w requires mN invocations of the quantum algorithm. Therefore, the overall running time for computing the gradient is

    O\left( \frac{KN}{ε_m^2} \left( \frac{α}{ε_{PE}} + \frac{\log(1/ε_si)}{\log\log(1/ε_si)} \right)(K + \log‖Q‖)n \right).   (38)

The parameter ε_PE is the error of the phase estimation, α = \sum_{k=1}^{K} ‖w_k‖, and ε_si is the error of the Hamiltonian simulation. Compared to the classical case, which takes O(KNn‖Q‖^n) computational overhead, our quantum algorithm achieves an exponential acceleration in the training process.

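For orientation, the overall training procedure this cost analysis refers to is the standard gradient-descent loop of Sec. II B, with the two averages of Eqs. (27)-(28) supplied by the quantum estimation routine of Sec. IV A. The sketch below shows that outer loop in Python; estimate_clamped and estimate_free are hypothetical placeholders standing in for the quantum subroutine (steps 1-8), and the learning rate and iteration count are arbitrary.

    import numpy as np

    def train_qcrf(data, K, estimate_clamped, estimate_free, eta=0.1, iterations=100):
        """Outer gradient-descent loop for the QCRF weights (Eq. (26)).

        data:             list of (x, y) training pairs, each with weight P^data_{x,y} = 1/len(data)
        estimate_clamped: callable (w, k, x, y) -> estimate of <e^{H^(0)}>^{Q,k}_{X,Y}  (Eq. (27))
        estimate_free:    callable (w, k, x)    -> estimate of <e^{H^(n)}>^{Q,k}_{X}    (Eq. (28))
        Both callables are placeholders for the quantum estimation routine.
        """
        w = np.zeros(K)                          # initial Boltzmann weights
        for _ in range(iterations):
            grad = np.zeros(K)
            for x, y in data:                    # mN invocations per weight component overall
                for k in range(K):
                    grad[k] += -(estimate_clamped(w, k, x, y) - estimate_free(w, k, x)) / len(data)
            w -= eta * grad                      # update step: Delta w = -eta * dL/dw
        return w

Each call to the placeholder estimators corresponds to one run of the quantum routine, which is where the per-invocation cost counted in Eq. (38) enters.
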

[Figure 1: quantum circuit of the EXP gate, built from QFT blocks and the intermediate multiply adder; the diagram itself is not reproduced here.]

FIG. 1. EXP gate. We introduce the quantum multiply gate for real inputs and outputs, which realizes the transformation Π^+_{m,n}|a⟩|b⟩|c⟩ = |a⟩|b⟩|c + ab⟩, where m and n denote the number of digits of a and b, respectively [27]. This quantum multiply gate can be decomposed into the form Π^+_{m,n} = (I ⊗ I ⊗ QFT) π^+_{m,n} (I ⊗ I ⊗ QFT), where π^+_{m,n} is the intermediate multiply adder, which achieves the transformation π^+_{m,n}|a⟩|b⟩|ϕ(c)⟩ = |a⟩|b⟩|ϕ(c + ab)⟩, with |ϕ(c)⟩ := QFT|c⟩. Utilizing the gates π^+_{m,n} and Π^+_{m,n}, we achieve the EXP gate.

[Figure 2: quantum circuit for steps 3-5, showing the phase estimations PE(H^{(n)}) and PE(∂H^{(n)}/∂w_k), the controlled evolutions of H^{(n)} and ∂H^{(n)}/∂w_k, the exp gate and the multiplying gate; the diagram itself is not reproduced here.]

FIG. 2. The quantum circuit of steps 3-5 in our quantum algorithm. In detail, the exp gate implements exp|a⟩ = |e^a⟩, and the multiplying gate performs Multi|a⟩|b⟩|0⟩ = |a⟩|b⟩|ab⟩.

C. Numerical simulation

Since the quantum model cannot be implemented physically with current technology, to evaluate our model and the corresponding algorithm we conduct a representative numerical experiment on a Hamiltonian matrix.

Datasets Description. The Hamiltonian matrix H is constructed as in the above section, encoding the 1024 possible values of \sum_i \sum_k w_k f_{k,i} on its diagonal. The initial Boltzmann weight is set as w = (0.17, 0.35, 0.41, 0.52, 0.37).

Implementation Details. We first utilize fundamental gates to simulate the Hamiltonian matrix H. Then we apply the proposed quantum algorithm to the Hamiltonian H. To be more specific, we freeze the initial Boltzmann weight w and train our quantum model for 340 epochs.

Evaluations on Performance. We conduct experiments to study the convergence of the error of our quantum algorithm. The experimental results are shown in Fig. 3. Regarding each iteration as generating one state |φ_i⟩, the results show a downward trend of the estimation error ε as the number of iterations increases. The blue discrete points are the measured error rates, and the corresponding fitted error-rate curve (the fitting curve) shows a dramatic decline from iteration 0 to 340. It is evident that the estimation error ε becomes stable after about 50 iterations, and ε finally becomes arbitrarily small as the number of iterations grows. The dotted line indicates the error of the classical Gibbs-sampling-based algorithm. Given an error ε, our quantum algorithm takes fewer steps to reach ε than the classical method. Thereby, our quantum algorithm converges faster than the classical technique.

[Figure 3: error rate of computing the gradient function versus iteration number (0-350), comparing the QCRF measurements, the fitted curve, and the classical training error in theory.]

FIG. 3. The error rate of computing the gradient function.

V. DISCUSSION: VC DIMENSION

According to the classical Bernoulli theorem, the relative frequency of an event in a sequence of independent trials converges to the probability of that event.

Furthermore, in learning theory, one prefers to obtain criteria on the basis of which one can judge whether such convergence takes place or not. The VC dimension, proposed by Vapnik et al. [28], formulates the conditions for such uniform convergence which do not depend on the distribution properties, and furnishes an estimate of the speed of convergence. In brief, the VC dimension evaluates the generalization ability of a learning model. The kernel theory of QCRF lies in P_Q(y|x) = Tr(Λ_{X,Y} e^{H^{(0)}})/Tr(Λ_X e^{H^{(n)}}), which uses the diagonal Hamiltonians H^{(l)} to represent the potential function; nevertheless, an off-diagonal Hamiltonian is also permitted. If the instance problem requires it, we can add a transverse field onto the system; as a result, the Hamiltonian H = \sum_{i,k} (ϑ_k σ^x_{k,i} + w_k σ^z_{k,i}) becomes an off-diagonal matrix [7]. The construction of the extended Pauli X operator σ^x_{k,i} is similar to σ^z_{k,i}, substituting σ^x for σ^z in the ki-th position.

Compared to the classical CRF model, which scatters on the order of Ω(Kn) data scales in the whole space, the QCRF model's VC dimension can be calculated as follows. Suppose w = (w_1, w_2, ..., w_K) is the Boltzmann weight, and for the input training data X_t = X^{(i)}_1, X^{(i)}_2, ..., X^{(i)}_t ∈ R^l, each X^{(i)}_j is a single word. Now we analyze the procedure that trains the data from X_t to X_{t+1}. Approximating the exponential potential function as a degree-d polynomial e^x = \sum_{s=0}^{d} x^s/s!, the data training procedure from X_t to X_{t+1} will increase the model's degree. In detail, the potential function changes from \exp(\sum_{i=1}^{T} \sum_k w_k σ^z_{k,i}) to \exp(\sum_{i=1}^{T+1} \sum_k w_k σ^z_{k,i}) = \exp(\sum_{i=1}^{T} \sum_k w_k σ^z_{k,i}) \exp(\sum_k w_k σ^z_{k,T+1}), whose degree is 2d. After reading the whole input X(T) = X_1, X_2, ..., X_T, the state of any unit in the model can be expressed as a polynomial P_t, t = 1, 2, ..., T. The degree of P_t can be estimated as P_t = 2d^t + \sum_{j=1}^{t−1} d^j. Considering that the VC dimension of a recurrent structure is bounded by Ω(2K \log(8eP_T)) [18], we obtain the QCRF's VC dimension Ω(K2^n) when T = 2^n. It is evident that the QCRF model extends this lower bound, which means QCRF can recognize many more samples and features than CRF.

VI. CONCLUSION

In this paper, we have constructed a general QCRF model by introducing four well-defined Hamiltonians and measurements. Meanwhile, in order to train this model, we have also presented an efficient hybrid quantum algorithm to obtain the parameters w in classical form. Compared to its classical counterpart, the quantum algorithm achieves an exponential speed-up. In addition, numerical simulation results have shown that our QCRF model requires significantly fewer iterations than the classical CRF model to achieve the same error rate. Furthermore, we have also demonstrated that the QCRF model possesses a higher VC dimension than the classical CRF model, which means QCRF is equipped with a higher learning ability. We expect our work to inspire more quantum machine learning models and algorithms for handling sequential data.

[1] B. Kulchytskyy, E. Andriyash, M. Amin, and R. Melko. Quantum Boltzmann machine. Physical Review X, 33(2):489-493, 2016.
[2] B. Settles. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of COLING, International Joint Workshop on NLPBA, pages 104-107, 2004.
[3] C. Ghosh, C. Cordeiro, and D. Agrawal. Markov chain existence and hidden Markov models in spectrum sensing. In IEEE International Conference on Pervasive Computing and Communications, pages 1-6, 2009.
[4] C. H. Bennett, J. I. Cirac, M. S. Leifer, D. W. Leung, N. Linden, S. Popescu, and G. Vidal. Optimal simulation of two-qubit Hamiltonians using general local operations. Physical Review A, 66(1):144-144, 2001.
[5] C. H. Yu, F. Gao, and Q. Y. Wen. Quantum algorithms for ridge regression. arXiv.
[6] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma. Simulating Hamiltonian dynamics with a truncated Taylor series. Physical Review Letters, 114(9):090502, 2014.
[7] F. Wilczek. Particle physics: Mass by numbers. Nature, 456(7221):449, 2008.
[8] G. Brassard, P. Hoyer, M. Mosca, and A. Tapp. Quantum amplitude amplification and estimation. Quantum Computation and Information, 5494:53-74, 2012.
[9] G. D. Forney, Jr. The Viterbi algorithm. In Proceedings of the IEEE, volume 61, pages 268-278, 2003.
[10] G. H. Low and I. Chuang. Hamiltonian simulation by qubitization. arXiv, 2016.
[11] G. H. Low and I. Chuang. Optimal Hamiltonian simulation by quantum signal processing. Physical Review Letters, 118(1):010501, 2017.
[12] L. K. Grover. A fast quantum mechanical algorithm for database search. Physical Review Letters, 78:212-219, 1996.
[13] J. D. Lafferty, A. P. McCallum, and C. N. Fernando. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Eighteenth International Conference on Machine Learning, pages 282-289, 2001.
[14] L. C. Wan, C. H. Yu, S. J. Pan, F. Gao, Q. Y. Wen, and S. J. Qin. Asymptotic quantum algorithm for the Toeplitz systems. Physical Review A, 2018.
[15] M. Nielsen and I. Chuang. Quantum computation and quantum information. Mathematical Structures in Computer Science, 21(1):1-59, 2002.
[16] M. Schuld, I. Sinayskiy, and F. Petruccione. Prediction by linear regression on a quantum computer. Physical Review A, 94(2), 2016.
[17] N. Wiebe, A. Kapoor, and K. Svore. Quantum deep learning. Computer Science, 2014.
[18] P. Koiran and E. D. Sontag. Vapnik-Chervonenkis dimension of recurrent neural networks. In European Conference on Computational Learning Theory, pages 223-237, 1997.
[19] P. Rebentrost, M. Mohseni, and S. Lloyd. Quantum support vector machine for big data classification. Physical Review Letters, 113(13):130503, 2014.
[20] P. Rebentrost and S. Lloyd. Quantum gradient descent and Newton's method for constrained polynomial optimization. arXiv, 2016.
[21] P. Rebentrost, T. R. Bromley, C. Weedbrook, and S. Lloyd. A quantum Hopfield neural network. arXiv, 2018.
[22] P. Wittek and S. Lloyd. Quantum machine learning. Nature, 549(7671):195, 2017.
[23] P. W. Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Review, 41(2):303-332, 1999.
[24] S. Fei and F. Pereira. Shallow parsing with conditional random fields. In Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 134-141, 2003.
[25] S. Lloyd, M. Mohseni, and P. Rebentrost. Quantum algorithms for supervised and unsupervised machine learning. arXiv, 2013.
[26] S. Lloyd, M. Mohseni, and P. Rebentrost. Quantum principal component analysis. Nature Physics, 10(9):108-113, 2013.
[27] S. S. Zhou, T. Loke, J. A. Izaac, and J. B. Wang. Quantum Fourier transform in computational basis. Quantum Information Processing, 16(3):82, 2017.
[28] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2):264-280, 1971.
[29] W. A. Harrow, A. Hassidim, and S. Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters, 103(15):150502, 2009.
[30] X. He, R. S. Zemel, and M. A. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In CVPR, 2004.
