Quantum Conditional Random Field
Yusen Wu^{1,2}, Chao-Hua Yu^{1}, Binbin Cai^{1}, Sujuan Qin^{1,*}, Fei Gao^{1,†}, and Qiaoyan Wen^{1,‡}
^{1} State Key Laboratory of Networking and Switching Technology, Beijing University of Posts and Telecommunications, Beijing, 100876, China
^{2} State Key Laboratory of Cryptology, P.O. Box 5159, Beijing, 100878, China
(Dated: January 7, 2019)
Conditional random field (CRF) is an important probabilistic machine learning model for labeling sequential data, and it is widely utilized in natural language processing, bioinformatics, and computer vision. However, training the CRF model is computationally intractable when large-scale training samples are processed. Since little work has been done on labeling sequential data in the quantum setting, in this paper we construct a quantum CRF (QCRF) model by introducing well-defined Hamiltonians and measurements, and present a quantum algorithm to train this model. It is shown that the algorithm achieves an exponential speed-up over its classical counterpart. Furthermore, we also demonstrate that the QCRF model possesses a higher Vapnik-Chervonenkis dimension than the classical CRF model, which means QCRF is equipped with a higher learning ability.
PACS numbers: 03.67.Dd, 03.67.Hk
…ators both encoding the eigen-functions $f_k$, we construct a quantum CRF (QCRF) model where the conditional probability distribution $P(y|x)$ is derived by simple linear algebra operations on the exponents of the Hamiltonians and the measurement operators. We also present a quantum algorithm for training the model, which uses state-of-the-art Hamiltonian simulation [11] to obtain the classical information of the parameters $w_k$. It is shown that the time complexity of our algorithm grows polynomially with $n$ (i.e., the number of observation-label example pairs for training), exponentially improving the dependence on $n$ compared with the classical training algorithm mentioned above. Furthermore, we also compare QCRF with the classical CRF from the perspective of computational learning theory. We show that our QCRF model has a much higher Vapnik-Chervonenkis (VC) dimension [28] than the classical CRF, demonstrating that QCRF significantly improves the data learning ability over the classical CRF.

II. REVIEW OF CLASSICAL CRF

In this section, we first review the definition of CRF … $x$ and $y$ denote the observation sequence and label sequence, respectively. The predicting phase aims at computing the most probable label sequence $y'$ for a new observation sequence $x'$ with the help of $P(y|x)$ obtained in the training phase.

B. Training CRF

Given a large-scale observation sequence $x$ and the corresponding label sequence $y$, the CRF training phase suffices to construct the inference model $P(y|x)$ on the graph, i.e., to find the Boltzmann weights $w = (w_1, \ldots, w_K)$. One approach to finding the appropriate $w$ is the maximum-likelihood method based upon the observations $x$ and labels $y$: if the training process is successful, the model joint distribution $P_{x,y}$ has enough resemblance to the prior data joint distribution $P^{\mathrm{data}}_{x,y}$. To describe this approach, we introduce the log-likelihood function $L$, whose minimum point corresponds to the appropriate $w$. The average negative log-likelihood function $L$ is defined as [13]

$L = -\sum_{x} P^{\mathrm{data}}_x \sum_{y} P^{\mathrm{data}}(y|x) \log P(y|x).$   (2)
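To make Eq. (2) concrete, here is a minimal classical sketch in Python (the feature functions, weights, and data are invented for illustration; none of these names come from the paper). It evaluates the average negative log-likelihood of a toy CRF with $P(y|x) = e^{E(x,y)}/Z(x)$ by brute-force enumeration of all label sequences; the enumeration inside the partition function is precisely the exponential bottleneck discussed next.

import itertools
import numpy as np

def potential(w, feats, x, y):
    """Toy potential E(x, y) = sum_{i,k} w_k f_k(x_i, y_i)."""
    return sum(w[k] * f(x[i], y[i])
               for i in range(len(x)) for k, f in enumerate(feats))

def neg_log_likelihood(w, feats, data, labels):
    """Average negative log-likelihood, Eq. (2), with P(y|x) = e^E / Z(x).
    Enumerating all |Q|^n label sequences is the exponential bottleneck."""
    nll = 0.0
    for x, y in data:  # empirical distribution: uniform over samples
        zs = [np.exp(potential(w, feats, x, yp))
              for yp in itertools.product(labels, repeat=len(x))]
        p = np.exp(potential(w, feats, x, y)) / sum(zs)
        nll -= np.log(p) / len(data)
    return nll

# Hypothetical binary features f_k(x_i, y_i) taking values in {-1, +1}.
feats = [lambda xi, yi: 1 if xi == yi else -1,
         lambda xi, yi: 1 if yi == 1 else -1]
data = [((0, 1, 1), (0, 1, 1)), ((1, 0, 1), (1, 0, 1))]
print(neg_log_likelihood(np.array([0.5, 0.1]), feats, data, labels=(0, 1)))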
…achieves exponential complexity in classical computing, since the second term traverses all the possible $y^*$, leading to $O(\|Q\|^n)$ possibilities, and each possibility $\frac{\partial}{\partial w_k} e^{E(x,y^*)}$ has $O(n)$ terms, which is intractable for large-scale $n$. In the following, we will first give the theory of the QCRF model and then present a quantum algorithm for training the model that is exponentially faster than the classical CRF training algorithm.
III. QCRF MODEL

A. Fundamental theory of QCRF
Since the classical CRF model is proposed according to the classical conditional entropy, we propose the fundamental theory of the QCRF model based on the principle of quantum conditional entropy [15]. For the QCRF model, the quantum conditional entropy $S(y|x)$ is applied, which is defined as

$S(y|x) = S(\rho_{X,Y}) - S(\rho_X).$   (7)

The density operator $\rho_{X,Y}$ encodes the joint distribution $P(x,y)$ in its amplitudes, and it can be decomposed on its spectrum with the probabilities $P(x,y)$, i.e., $\rho_{X,Y} = \sum_{x,y} P(x,y)|x,y\rangle\langle x,y|$. The density operator $\rho_X$ indicates the marginal distribution $P(x)$ of the random variable $X$; similarly, we can decompose $\rho_X$ into the form $\rho_X = \sum_{x} P(x)|x\rangle\langle x|$. Thus the quantum conditional entropy $S(y|x)$ can be expressed as …

In contrast to the prior empirical distribution, the quantum model distribution of $f_k$ is formulated as

$E_Q(f_1, \ldots, f_K) = \mathrm{Tr}(\Lambda_{X,Y}(\rho_{X,Y} H^{(0)})).$   (11)

The notation $\mathrm{Tr}(\cdot)$ means the trace of a matrix. The Hamiltonian $H^{(0)}$ is composed of all the basis states of the Hilbert space $\mathcal{H}$, where $\mathcal{H} = \{(\sum_i \sum_k w_k f_{k,i})|u_j\rangle \mid f_{k,i} \in \{-1, 1\}\}$ and $|u_j\rangle$ denotes a set of basis states spanning the Hilbert space $\mathcal{H}$. The density matrix $\rho_{X,Y}$ encodes the joint distribution $P(x,y)$ corresponding to every combination of the features $f_{k,i}$, and $\Lambda_{X,Y} = \sum_{(x,y)} \Lambda(x,y)|x,y\rangle\langle x,y|$ is the measurement operator limiting the Hamiltonian only to the clamped $X, Y$ provided by the training data set. The parameter $\Lambda(x,y) = 1$ if and only if $x = X, y = Y$; otherwise $\Lambda(x,y) = 0$.

The second constraint claims the normalization condition $\sum_{y^*} P(y^*|x) = 1$. After introducing another Lagrange parameter $\lambda$ on the normalization condition, the Lagrange function can be written as

$G(P(y|x)) = S(y|x) + (E_Q(f_1, \ldots, f_K) - E_C(f_1, \ldots, f_K)) + \lambda(\sum_{y^*} P(y^*|x) - 1).$   (12)

Equating the partial derivative $\frac{\partial G(P(y|x))}{\partial P(y|x)}$ to 0 and solving for $P(y|x)$, we obtain …
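As a classical cross-check of these definitions, the following numpy sketch (the joint distribution, feature sums, and clamped pair are made up for illustration) builds the diagonal operators $\rho_{X,Y}$ and $\rho_X$, evaluates the quantum conditional entropy of Eq. (7), and computes the model expectation of Eq. (11) as a trace.

import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log2 rho); rho is diagonal here, so use its spectrum."""
    p = np.real(np.diag(rho))
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

# Toy joint distribution P(x, y) over 2 observations x 2 labels (made up).
P_xy = np.array([[0.4, 0.1],
                 [0.2, 0.3]])
rho_XY = np.diag(P_xy.flatten())     # rho_XY = sum P(x,y)|x,y><x,y|
rho_X = np.diag(P_xy.sum(axis=1))    # rho_X  = sum P(x)|x><x|

S_cond = von_neumann_entropy(rho_XY) - von_neumann_entropy(rho_X)  # Eq. (7)

# Diagonal H^(0): each diagonal entry holds sum_k w_k f_k for that (x, y).
H0 = np.diag([0.6, -0.6, -0.6, 0.6])   # hypothetical feature sums
# Lambda_{X,Y} clamps to the training pair, here (x, y) = (0, 0).
Lam_XY = np.diag([1.0, 0.0, 0.0, 0.0])
E_Q = np.trace(Lam_XY @ rho_XY @ H0)   # Eq. (11)
print(S_cond, E_Q)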
…number of identity gates $I_Q$. The notation $\otimes$ means the Kronecker product, and the operator $\sigma^z$ indicates the Pauli $Z$ operator [15]. Every element in the expression $\sigma^{z,l}_{k,i}$ is an identity operator except the $(ki+l)$-th, which is a $\sigma^z$ operator. This construction promises that $\sigma^{z,l}_{k,i}$ emerges in the form of a diagonal operator, which encodes the equivalent statistical distribution on its diagonal. Specifically, the Hamiltonians $H^{(0)}$ and $H^{(n)}$ can be expressed as

$H^{(0)} = \sum_{i=1}^{n} \sum_{k=1}^{K} w_k \sigma^{z,0}_{k,i}$   (18)

and

$H^{(n)} = \sum_{i=1}^{n} \sum_{k=1}^{K} w_k \sigma^{z,n}_{k,i},$   (19)

respectively. The parameter $w_k$ also indicates the Boltzmann weight expressed in the potential function $E(x,y)$.
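The operators $\sigma^{z,l}_{k,i}$ can be written out explicitly on a few qubits. The sketch below (weights and sizes are illustrative, and the index arithmetic is one plausible reading of the paper's $(ki+l)$ convention) places a Pauli $Z$ in an identity string via Kronecker products and assembles the diagonal Hamiltonian of Eq. (18).

import numpy as np
from functools import reduce

I2 = np.eye(2)
Z = np.diag([1.0, -1.0])  # Pauli Z

def sigma_z_at(pos, n_qubits):
    """Identity on every qubit except a Pauli Z at `pos` (0-indexed),
    standing in for the (ki+l)-th slot of the paper's notation."""
    ops = [Z if q == pos else I2 for q in range(n_qubits)]
    return reduce(np.kron, ops)

def H0(w, n, K):
    """H^(0) = sum_{i,k} w_k sigma^{z,0}_{k,i} on n*K qubits (Eq. 18)."""
    nq = n * K
    return sum(w[k] * sigma_z_at(i * K + k, nq)
               for i in range(n) for k in range(K))

H = H0(w=[0.5, -0.2], n=2, K=2)              # 4 qubits -> 16x16 matrix
assert np.allclose(H, np.diag(np.diag(H)))   # diagonal, as claimed
print(np.diag(H)[:4])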
Different from the classical CRF model, which describes the data patterns via the binary eigen-functions $f_k(x,y)$, QCRF …

$S_0 = \Lambda_{X,Y}(e^{H^{(0)}}).$   (21)

The subspace $S_n$ can also be extracted by a similar method. The fundamental difference lies in the fact that $S_n$ traverses all the possible $y^*$; we need to find every $y^*$ with the help of the eigen-function $f_k(x_i, y_i^*)$. As a result, the projection operator $\Lambda_X$ can be expressed in the form

$\Lambda_X = \prod_{i=1}^{n} \prod_{k=1}^{K} \frac{1}{2} \sum_{j=1}^{\|Q\|} \left(I + f_k(x_i, y_i^{(j)})\right)|j\rangle\langle j| \otimes \sigma^{z,n-1}_{k,i}.$   (22)

The notation $y_i^{(j)}$ indicates all the circumstances for $y_i$. The measurement operator $\Lambda_X$ is established on a $(\log\|Q\| + K)n$-qubit quantum system, which can be utilized to test whether the quantum system collapses onto the state $|j\rangle \otimes |\frac{1-f_1(x_i,y_i^{(j)})}{2}\rangle \otimes \cdots \otimes |\frac{1-f_K(x_i,y_i^{(j)})}{2}\rangle$ or not. We can extract the subspace $S_n$ as follows: …
…the phase estimation $PE(\frac{\partial}{\partial w_k} H^{(n)})$. The system becomes

$\frac{1}{\sqrt{D}} \sum_{k=1}^{D} |\psi_k\rangle |\mu_k e^{E_k}\rangle |0\rangle.$   (34)

We utilize the similar method to estimate the numerical values of $\mathrm{Tr}(\Lambda_X e^{H^{(n)}})$, $\mathrm{Tr}(\Lambda_{X,Y} \frac{\partial}{\partial w_k} e^{H^{(0)}})$, and $\mathrm{Tr}(\Lambda_{X,Y} e^{H^{(0)}})$. Then the terms $\langle e^{H^{(0)}} \rangle^{Q,k}_{X,Y}$ and $\langle e^{H^{(n)}} \rangle^{Q,k}_{X}$ can be computed efficiently. Finally, the gradient $\frac{\partial L}{\partial w_k}$ is obtained.
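On small instances, the traces that the quantum routine estimates can be verified with dense linear algebra. Below is a toy sketch (the dimensions, operators, and the normalization of the bracketed expectation terms are assumptions, not the paper's exact definitions) computing $\mathrm{Tr}(\Lambda_{X,Y} e^{H^{(0)}})$, $\mathrm{Tr}(\Lambda_{X,Y}\,\partial e^{H^{(0)}}/\partial w_k)$, and $\mathrm{Tr}(\Lambda_X e^{H^{(n)}})$.

import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)
d = 8                                       # toy Hilbert-space dimension

h0 = rng.normal(size=d)                     # diagonal of H^(0)
dh0_dwk = rng.choice([-1.0, 1.0], size=d)   # diagonal of dH^(0)/dw_k
Hn = np.diag(rng.normal(size=d))            # H^(n), also diagonal

Lam_XY = np.diag([1.0] + [0.0] * (d - 1))   # clamped projector (made up)
Lam_X = np.diag([1.0, 1.0] + [0.0] * (d - 2))

# For diagonal H, d(e^H)/dw_k = diag(dh/dw_k * e^h); expm is the general form.
t_deriv = np.trace(Lam_XY @ np.diag(dh0_dwk * np.exp(h0)))
t_clamped = np.trace(Lam_XY @ expm(np.diag(h0)))
t_free = np.trace(Lam_X @ expm(Hn))

# One plausible reading of <e^H>^{Q,k} is the normalized trace ratio.
print(t_deriv / t_clamped, t_clamped, t_free)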
FIG. 1. The EXP gate. We introduce the quantum multiplication gate for real inputs and outputs, which realizes the transformation $\Pi^+_{m,n}|a\rangle|b\rangle|c\rangle = |a\rangle|b\rangle|c + ab\rangle$, where $m, n$ denote the numbers of digits of $a$ and $b$, respectively [27]. This quantum multiplication gate can be decomposed into the form $\Pi^+_{m,n} = (I \otimes I \otimes QFT^{\dagger})\,\pi^+_{m,n}\,(I \otimes I \otimes QFT)$, where $\pi^+_{m,n}$ is the intermediate multiply-adder, which achieves the transformation $\pi^+_{m,n}|a\rangle|b\rangle|\varphi(c)\rangle = |a\rangle|b\rangle|\varphi(c + ab)\rangle$, with $|\varphi(c)\rangle := QFT|c\rangle$. Utilizing the gates $\pi^+_{m,n}$ and $\Pi^+_{m,n}$, we achieve the EXP gate.
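On computational basis states the multiply-adder acts as a classical permutation, so its bookkeeping can be sanity-checked without simulating the QFT itself. A minimal sketch (the register sizes are arbitrary, and the modular wrap-around on the target register is an assumption):

def multiply_adder(a, b, c, m, n, p):
    """Basis-state action of the EXP building block:
    |a>|b>|c> -> |a>|b>|c + a*b mod 2^p>, with a on m qubits, b on n qubits.
    The QFT conjugation in Pi+ = (I@I@QFT') pi+ (I@I@QFT) is implicit here,
    since on basis states the net effect is exactly this permutation."""
    assert a < 2 ** m and b < 2 ** n and c < 2 ** p
    return a, b, (c + a * b) % (2 ** p)

print(multiply_adder(3, 5, 2, m=2, n=3, p=5))   # -> (3, 5, 17)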
C. Numerical simulation

Since the quantum model cannot be implemented physically by current technology, to evaluate our model …

[FIG. 3. The error rate of computing the gradient function (vertical axis, 0 to 0.5) versus iteration times (horizontal axis, 0 to 350).]

V. DISCUSSION: VC DIMENSION

According to the classical Bernoulli theorem, the relative frequency of an event in a sequence of independent trials converges to the probability of that event. Furthermore, in learning theory, one prefers to obtain criteria on the basis of which one could judge whether there is such convergence or not. The VC dimension, proposed by Vapnik et al. [28], formulates the conditions for such uniform convergence which do not depend on the distribution properties, and it furnishes an estimate of the speed of convergence. In brief, the VC dimension evaluates the generalization ability of a learning model.
The kernel theory of QCRF lies in $P_Q(y|x) = \frac{\mathrm{Tr}(\Lambda_{X,Y} e^{H^{(0)}})}{\mathrm{Tr}(\Lambda_X e^{H^{(n)}})}$, which utilizes the diagonal Hamiltonians $H^{(l)}$ to represent the potential function; nevertheless, an off-diagonal Hamiltonian is also permitted. If the instance problem requires it, we can add a transverse field onto the system; as a result, the Hamiltonian $H = \sum_{i,k} \vartheta_k \sigma^{x}_{k,i} + w_k \sigma^{z}_{k,i}$ turns into an off-diagonal matrix [7]. The construction of the extended Pauli $X$ operator $\sigma^{x}_{k,i}$ is similar to that of $\sigma^{z}_{k,i}$, which just substitutes $\sigma^x$ for $\sigma^z$ in the $ki$-th position.
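To illustrate the kernel formula and the off-diagonal extension in one place, the following numpy sketch (the qubit count, weights, projectors, and uniform $\vartheta$ are all invented) evaluates the ratio $P_Q(y|x)$ for small diagonal Hamiltonians and then adds a transverse-field term that makes $H$ off-diagonal.

import numpy as np
from scipy.linalg import expm
from functools import reduce

I2, X, Z = np.eye(2), np.array([[0., 1.], [1., 0.]]), np.diag([1., -1.])

def op_at(op, pos, nq):
    """Single-site operator `op` at qubit `pos`, identity elsewhere."""
    return reduce(np.kron, [op if q == pos else I2 for q in range(nq)])

nq, w, theta = 3, [0.4, -0.3, 0.2], 0.1
H_diag = sum(w[q] * op_at(Z, q, nq) for q in range(nq))     # diagonal H
Lam_XY = np.diag([1.] + [0.] * (2 ** nq - 1))               # clamped
Lam_X = np.diag([1., 1.] + [0.] * (2 ** nq - 2))            # free

# Kernel ratio, here with H^(0) = H^(n) = H_diag in this toy case.
P_Q = np.trace(Lam_XY @ expm(H_diag)) / np.trace(Lam_X @ expm(H_diag))
print(P_Q)

# Adding the transverse field theta * sigma^x makes H off-diagonal.
H_off = H_diag + sum(theta * op_at(X, q, nq) for q in range(nq))
print(np.count_nonzero(H_off - np.diag(np.diag(H_off))))    # > 0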
Compared to the classical CRF model, which scatters $\Omega(Kn)$ data scales in the whole space, the QCRF model's VC dimension can … then the data training procedure from $X_t$ to $X_{t+1}$ will increase the model's degree. In detail, the potential function changes from $\exp(\sum_{i=1}^{T} \sum_k w_k \sigma^z_{k,i})$ to $\exp(\sum_{i=1}^{T+1} \sum_k w_k \sigma^z_{k,i}) = \exp(\sum_{i=1}^{T} \sum_k w_k \sigma^z_{k,i}) \exp(\sum_k w_k \sigma^z_{k,T+1})$, whose degree is $2d$. After reading the whole input $X(T) = X_1, X_2, \ldots, X_T$, the state of any unit in the model can be expressed as a polynomial $P_t$, $t = 1, 2, \ldots, T$. The degree of $P_t$ can be estimated as $P_t = 2d^t + \sum_{j=1}^{t-1} d^j$. Considering that the VC dimension of a recurrent structure is bounded by $\Omega(2K \log(8eP_T))$ [18], we obtain the QCRF VC dimension $\Omega(K 2^n)$ when $T = 2^n$. It is evident that the QCRF model extends this lower bound, which means QCRF can recognize many more samples and features than CRF.
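The degree bookkeeping behind this bound takes only a few lines to reproduce; in the sketch below the values of $d$, $K$, and $n$ are illustrative, not taken from the paper.

import math

def degree(t, d):
    """Degree of the unit polynomial P_t: 2*d**t + sum_{j=1}^{t-1} d**j."""
    return 2 * d ** t + sum(d ** j for j in range(1, t))

K, d, n = 4, 2, 5
T = 2 ** n                      # sequence length T = 2^n
P_T = degree(T, d)
vc_lower = 2 * K * math.log(8 * math.e * P_T)   # Omega(2K log(8e P_T)) [18]
print(P_T, vc_lower, K * 2 ** n)  # compare with the stated Omega(K 2^n)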
VI. CONCLUSION

In this paper, we have constructed a general QCRF model by introducing four well-defined Hamiltonians and measurements. Meanwhile, in order to train this model, we have also presented an efficient hybrid quantum algorithm to obtain the parameters $w$ in classical form. Compared to its classical counterpart, the quantum algorithm achieves an exponential speed-up. In addition, numerical simulation results have shown that our QCRF model requires significantly fewer iterations than the classical CRF model to achieve the same error rate. Furthermore, we have also demonstrated that the QCRF model possesses a higher VC dimension than the classical CRF model, which means QCRF is equipped with a higher learning ability. We expect our work can inspire more quantum machine learning models and algorithms for handling sequential data.
[1] B. Kulchytskyy, E. Andriyash, M. Amin, and R. Melko. Quantum Boltzmann machine. Physical Review X, 33(2):489–493, 2016.
[2] B. Settles. Biomedical named entity recognition using conditional random fields and rich feature sets. In Proceedings of COLING, International Joint Workshop on NLPBA, pages 104–107, 2004.
[3] C. Ghosh, C. Cordeiro, and D. Agrawal. Markov chain existence and hidden Markov models in spectrum sensing. In IEEE International Conference on Pervasive Computing and Communications, pages 1–6, 2009.
[4] C. H. Bennett, J. I. Cirac, M. S. Leifer, D. W. Leung, N. Linden, S. Popescu, and G. Vidal. Optimal simulation of two-qubit Hamiltonians using general local operations. Physical Review A, 66(1):144–144, 2001.
[5] C. H. Yu, F. Gao, and Q. Y. Wen. Quantum algorithms for ridge regression. arXiv.
[6] D. W. Berry, A. M. Childs, R. Cleve, R. Kothari, and R. D. Somma. Simulating Hamiltonian dynamics with a truncated Taylor series. Physical Review Letters, 114(9):090502, 2014.
[7] F. Wilczek. Particle physics: Mass by numbers. Nature, 456(7221):449, 2008.
[8] G. Brassard, P. Hoyer, M. Mosca, and A. Tapp. Quantum amplitude amplification and estimation. Quantum Computation and Information, 5494:53–74, 2012.
[9] G. D. Forney, Jr. The Viterbi algorithm. In Proceedings of the IEEE, volume 61, pages 268–278, 2003.
[10] G. H. Low and I. Chuang. Hamiltonian simulation by qubitization. arXiv, 2016.
[11] G. H. Low and I. Chuang. Optimal Hamiltonian simulation by quantum signal processing. Physical Review Letters, 118(1):010501, 2017.
[12] L. K. Grover. A fast quantum mechanical algorithm for database search. Physical Review Letters, 78:212–219, 1996.
[13] J. D. Lafferty, A. McCallum, and F. C. N. Pereira. Conditional random fields: Probabilistic models for segmenting and labeling sequence data. In Eighteenth International Conference on Machine Learning, pages 282–289, 2001.
[14] L. C. Wan, C. H. Yu, S. J. Pan, F. Gao, Q. Y. Wen, and S. J. Qin. Asymptotic quantum algorithm for the Toeplitz systems. Physical Review A, 2018.
[15] M. Nielsen and I. Chuang. Quantum computation and quantum information. Mathematical Structures in Computer Science, 21(1):1–59, 2002.
[16] M. Schuld, I. Sinayskiy, and F. Petruccione. Prediction by linear regression on a quantum computer. Physical Review A, 94(2), 2016.
[17] N. Wiebe, A. Kapoor, and K. Svore. Quantum deep learning. Computer Science, 2014.
[18] P. Koiran and E. D. Sontag. Vapnik-Chervonenkis dimension of recurrent neural networks. In European Conference on Computational Learning Theory, pages 223–237, 1997.
[19] P. Rebentrost, M. Mohseni, and S. Lloyd. Quantum support vector machine for big data classification. Physical Review Letters, 113(13):130503, 2014.
[20] P. Rebentrost and S. Lloyd. Quantum gradient descent and Newton's method for constrained polynomial optimization. arXiv, 2016.
[21] P. Rebentrost, T. R. Bromley, C. Weedbrook, and S. Lloyd. A quantum Hopfield neural network. arXiv, 2018.
[22] P. Wittek and S. Lloyd. Quantum machine learning. Nature, 549(7671):195, 2017.
[23] P. W. Shor. Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer. SIAM Review, 41(2):303–332, 1999.
[24] F. Sha and F. Pereira. Shallow parsing with conditional random fields. In Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology, pages 134–141, 2003.
[25] S. Lloyd, M. Mohseni, and P. Rebentrost. Quantum algorithms for supervised and unsupervised machine learning. arXiv, 2013.
[26] S. Lloyd, M. Mohseni, and P. Rebentrost. Quantum principal component analysis. Nature Physics, 10(9):108–113, 2013.
[27] S. S. Zhou, T. Loke, J. A. Izaac, and J. B. Wang. Quantum Fourier transform in computational basis. Quantum Information Processing, 16(3):82, 2017.
[28] V. N. Vapnik and A. Y. Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities. Theory of Probability and Its Applications, 16(2):264–280, 1971.
[29] A. W. Harrow, A. Hassidim, and S. Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters, 103(15):150502, 2009.
[30] X. He, R. S. Zemel, and M. A. Carreira-Perpiñán. Multiscale conditional random fields for image labeling. In CVPR, 2004.