Factorisation Memoire
doi:10.1103/PhysRevLett.127.140503
Factoring 2 048-bit RSA Integers in 177 Days with 13 436 Qubits and a Multimode Memory
Élie Gouzien and Nicolas Sangouard
Université Paris–Saclay, CEA, CNRS, Institut de Physique Théorique, 91 191 Gif-sur-Yvette, France
(Dated: September 29, 2021)
We analyze the performance of a quantum computer architecture combining a small processor
and a storage unit. By focusing on integer factorization, we show a reduction by several orders
of magnitude of the number of processing qubits compared with a standard architecture using a
planar grid of qubits with nearest-neighbor connectivity. This is achieved by taking advantage
of a temporally and spatially multiplexed memory to store the qubit states between processing
steps. Concretely, for a characteristic physical gate error rate of $10^{-3}$, a processor cycle time of
1 microsecond, factoring a 2 048-bit RSA integer is shown to be possible in 177 days with 3D gauge
color codes assuming a threshold of 0.75 % with a processor made with 13 436 physical qubits and a
memory that can store 28 million spatial modes and 45 temporal modes with 2 hours’ storage time.
By inserting additional error-correction steps, storage times of 1 second are shown to be sufficient
at the cost of increasing the run-time by about 23 %. Shorter run-times (and storage times) are
achievable by increasing the number of qubits in the processing unit. We suggest realizing such
an architecture using a microwave interface between a processor made with superconducting qubits
and a multiplexed memory using the principle of photon echo in solids doped with rare-earth ions.
Introduction — Superconducting qubits form the building blocks of one of the most advanced platforms for realizing quantum computers [1, 2]. The standard architecture consists of laying superconducting qubits in a 2D grid and computing using only nearest-neighbor interactions. Recent estimations showed, however, that fault-tolerant realizations of various quantum algorithms with this architecture would require millions of physical qubits [3–5]. These performance analyses naturally raise the question of an architecture that better exploits the potential of superconducting qubits.

In developing a quantum architecture, we have much to learn from classical architectures. Realizations using trapped ions, for example, combine processing with storage units [6]. The authors of Ref. [7] realized that key quantum algorithms are mostly sequential, meaning that a small computing block may suffice for all the qubits held in the storage unit of this architecture. Ongoing experimental efforts aim at exploiting this idea to reduce the number of superconducting qubits in the standard approach to quantum computing by adding a quantum memory implemented with spins or atoms [8–10]. A detailed analysis of the performance of this hybrid architecture is, however, missing.

We here report on such an analysis by considering a quantum memory that can store multiple spatial transverse and temporal modes. The memory can be thought of as a qubit register in which the address of each qubit is identified by a temporal and a spatial index. When a given qubit needs to be processed, its state is released and mapped into the processor by means of a microwave field in the temporal and spatial mode corresponding to the qubit address. When the processing is done, the qubit state is mapped back to the memory and stored until another processing operation is needed.

More precisely, we use 3D error-correction codes [11] in which the address of each (dressed) logical qubit is encoded into a 3D structure of physical addresses, two dimensions being encoded in space and one in time (see Figure 1). Error correction and logical gates are applied by sequentially releasing the physical qubits corresponding to different "horizontal" slices (with different temporal indexes) and by processing each slice (whose qubits share the same temporal index) simultaneously.

We assess the performance of this architecture through a version of Shor's algorithm [12] proposed by Ekerå and Håstad [13]. The algorithm is a threat to widely used cryptosystems based either on factorization [14] or on the discrete logarithm problem [15, 16]. It can also be considered as a certification tool to check the proper functioning of an actual quantum computer, as its outcome can be verified efficiently. Last but not least, the cost of its implementation has been evaluated using plausible physical assumptions for a large-scale processor with a standard 2D grid of superconducting qubits (a characteristic physical gate error rate of $10^{-3}$, a surface code cycle time of 1 µs, and a reaction time of 10 µs): it was estimated that it should be possible to factor a 2 048-bit integer, typically used in the Rivest–Shamir–Adleman (RSA) cryptosystem, in 8 hours with 20 million qubits [3].

By taking this estimation as a reference, we estimate the cost of implementing the same version of Shor's algorithm in terms of physical processing qubit number,
multimode capacity, memory storage time, and run-time. Our evaluation is given in the case where the processor is made with two (dressed) logical qubit slices. Under the assumptions used in Ref. [3] for the gate error rate and the cycle time, we show that it should be possible to factor a 2 048-bit RSA integer in 177 days using a multimode memory with a storage time of about 2 hours and a processor including 13 436 physical qubits. This is a reduction of the number of physical qubits by more than 3 orders of magnitude compared to the standard architecture without memory [3], at the cost of a run-time about 500 times longer. By inserting additional error-correction steps, we show that the storage time can be significantly reduced at the cost of a slight increase of the run-time. We also explain how shorter run-times and storage times are achievable by increasing the number of qubits in the processing unit. We propose a realization of such an architecture using a microwave interface between a processor made with superconducting qubits and a multiplexed memory using the principle of photon echo in solids doped with rare-earth ions embedded in cavities.

[Figure 1: the processor and the memory, whose two axes are labeled "spatial modes" and "temporal modes".]
Figure 1. Quantum computer architecture using a processor made with a 2D grid of qubits and a memory operating as a qubit register where the address of each qubit is specified by a temporal and a spatial index. Only (dressed) logical qubits are represented; additional ancillary qubits are used for measuring the operators for error correction.

Principles of (a variant of) Shor's algorithm — Consider the factorization of $N = p \times q$, the product of two prime numbers of similar sizes, p and q. We denote by n the number of bits in the binary representation of N, that is, $2^{n-1} \le N < 2^n$. While no efficient classical factorization algorithm is known, Shor's algorithm and its variants factor N with a complexity polynomial in n [12, 13, 17–20].

The version of Shor's factorization algorithm proposed by Ekerå and Håstad [13] starts by randomly selecting an integer g in the multiplicative group of integers modulo N and computing $h = g^{(N-1)/2} \bmod N$. Since $N - 1 = (pq - p - q + 1) + (p + q - 2)$,
$$h = g^{(pq-p-q+1)/2}\, g^{(p+q-2)/2} \equiv g^{(p+q-2)/2} \bmod N,$$
where the last equivalence is the result of the Chinese remainder theorem: $pq - p - q + 1 = (p-1)(q-1)$, so that, by Fermat's little theorem, $g^{(p-1)(q-1)/2} \equiv 1$ modulo both p and q. Under the assumption that the order r of g (the smallest positive integer such that $g^r \equiv 1 \bmod N$) satisfies $r > (p + q - 2)/2$, computing the discrete logarithm of h modulo N, as detailed below, yields $l = (p + q - 2)/2$. For large N, the assumption is verified with high probability [13]. Using $N = pq$ and $l = (p + q - 2)/2$, where N and l are both known, p and q are recovered by choosing one solution of the equation $N = p(2l + 2 - p)$ and then using $q = 2l + 2 - p$.

The discrete logarithm is computed in three steps. First, the exponentiation $(e_1, e_2) \mapsto g^{e_1} h^{-e_2}$ is applied once on two quantum registers prepared in a superposition of every possible value of $e_1$ and $e_2$, respectively. Two quantum Fourier transforms are then applied independently to the two registers before they are measured. Finally, a classical postprocessing extracts the discrete logarithm l of h modulo N from the measurement results. Because the measurements are performed directly after the Fourier transform, the cost of the exponentiation largely dominates the cost of Ekerå and Håstad's algorithm (see Appendix A).

Number of gates — The modular exponentiation needed in Ekerå and Håstad's algorithm, i.e., the operation $|e\rangle|1\rangle \mapsto |e\rangle|g^e \bmod N\rangle$, with the input e and the output $g^e \bmod N$ encoded on $n_e$ and n bits, respectively, can be decomposed into $n_e$ multiplications. Each multiplication is decomposed into 2n controlled additions of integers of typical size n and one controlled swap between two registers of size n, giving a total of $2 n_e n$ controlled additions and $n_e$ controlled swaps between registers (see Appendix B for details). Each modular addition is obtained with a standard adder circuit at the cost of a specific representation, the coset representation (see Appendix C), which adds m qubits to the register. A controlled swap between two qubits can be performed using two controlled NOTs (CNOTs) and one Toffoli gate. Hence, a controlled swap between two registers of n + m qubits costs 2(n + m) CNOTs and n + m Toffoli gates (see Appendix B). For the controlled addition, we can use a semi-classical adder whose mean cost for integers of size n + m is 5.5(n + m) − 9 CNOTs and 2(n + m) − 1 Toffoli gates (see Appendix B). Given the number of gates in the controlled addition and swap operations, the number of additions and swaps in each multiplication, and the number of multiplications in the modular exponentiation, the cost of factorization can easily be estimated (see Appendix B). This cost can, however, be reduced using windowed arithmetic circuits [21]. The basic idea consists of grouping the bits of e in blocks of $w_e$ bits, each block controlling one multiplication, hence reducing the number of multiplications. Similarly, for each multiplication, the input bits are grouped in blocks of $w_m$ bits to reduce the number of additions composing it. As detailed in Appendix D, the cost of the exponentiation is then dominated by $2 n_e n (n+m)/(w_e w_m)$ single-qubit gates, $[2^{w_e + w_m} n + 12(n+m)]\, n_e (n+m)/(w_e w_m)$ CNOTs, and $4 n_e (n+m)^2/(w_e w_m)$ Toffoli gates. We emphasize that this is a first-order estimation; the code used to compute the required resources and find optimal parameters relies on the complete formulae [22].

Error correction — The error correction is achieved using 3D gauge color codes, a family of subsystem codes [11]. A first code admits a transversal implementation of CNOT and Hadamard gates, while a second code admits a transversal implementation of the non-Clifford T gate. Switching between the two codes gives a universal set of gates without the need for state distillation [23], contrary to standard ways of operating the surface code [24].

The two codes are based on a shared geometrical structure: a large tetrahedron constructed from elementary tetrahedra (see Appendix E for details). A physical qubit is attributed to each elementary tetrahedron. As in any subsystem code, the stabilized subspace is split into a tensor product of the (bare) logical and gauge qubits (the dressed logical qubit includes the bare logical qubit and the gauge qubits). A set of operators, the generators of the gauge group, is measured, each being the product of (up to six) X (or Z) operators associated with the qubits corresponding to tetrahedra sharing the same edge. From these measurements, the values of the stabilizers of the two codes are deduced. In the code used for implementing H and CNOT gates, the stabilizers are defined from the vertices, i.e., each is the product of X (or Z) operators associated with the qubits corresponding to tetrahedra sharing the same vertex. In the code used for implementing T gates, the stabilizers are defined from the vertices for X operators and from the edges for Z operators. The value of an operator represented by a vertex is classically recovered by multiplying the measurement results of combinations of specific edges ending at the given vertex. Several combinations are possible, giving redundancies that can be exploited to achieve fault-tolerant error correction with only one round of measurements [25]. The structure of these codes, in which the stabilized subsystem is the tensor product of the gauge and (bare) logical subsystems, guarantees that measurements of gauge operators do not reveal the value of the (bare) logical qubit (see Appendix E).

To account for the additional resources needed to implement these codes, we use an estimation of the residual error probability on one logical qubit given in [26, eq. (4)]:
$$p_{\text{logical}} = A \exp\left[\alpha \log\left(\frac{p}{p_{\text{th}}}\right) d^{\beta}\right], \tag{1}$$
where $A \approx 0.033$, $\alpha \approx 0.516$, $\beta \approx 0.822$, p is the error probability per physical qubit, d the code distance, which is related to the number of physical qubits per logical qubit (see below), and $p_{\text{th}}$ the fault-tolerance threshold. While the circuit-level threshold is unknown, we choose $p_{\text{th}} = 0.75\,\%$ as a working hypothesis and give in Appendix E 4 the run-time and the resources as a function of the code threshold.

Architecture — For simplicity, the tetrahedral structure of the error correction (see Appendix E) can be embedded in a large cube in which physical qubits are now represented by elementary cubes (see Figure 1). The large cubes are stored in the memory and loaded slice by slice into the processor when they need to be processed. We size the processor such that one slice of two large cubes can be loaded simultaneously, which is convenient to perform 2-qubit gates efficiently. Each gate is immediately followed by an error-correction round on the processed qubits. This is done by reloading each slice sequentially into the processor and measuring the gauge generators (before classically recovering the code stabilizers), each measurement using up to six 2-qubit gates, one auxiliary qubit, and one measurement of this auxiliary qubit [23, 27]. Note that the codes of interest are 3D local, and the auxiliary qubits only need to keep coherence for the time of loading and measuring two successive slices to successfully perform a stabilizer measurement. Once the syndromes are obtained and the errors are detected, the correction of these errors is delayed and merged with the next operation applied on the qubit to be corrected. Further note that all-to-all connectivity between the logical qubits is achieved if each physical address in the memory can be mapped to three physical qubits in the processor: two for the 2-qubit gates (depending on whether the physical qubit belongs to the logical control or target qubit) and one for the error correction. For a code distance d, the number of physical qubits in the processor is
$$n_{\text{qubits}} = 2 \times 2 \times \frac{3d^2 + 2d - 3}{2},$$
corresponding to two logical qubit slices (see Appendix E) and including the ancillary qubits (essentially one per physical qubit) needed for stabilizer measurements; for instance, d = 47 gives $n_{\text{qubits}} = 13\,436$ and d = 7 gives 316. For a code distance d, we approximate the time it takes to perform one (1-qubit or 2-qubit) logical gate by $2(d - 2)t_c$, where $t_c$ is the cycle time of the 2D processor (the time to load one qubit slice, to measure the stabilizers, which takes longer than the gate operation, and to reload the slice into the memory), and the factor 2 comes from the fact that the gate is immediately followed by an error-correction round.

Cost evaluation — To evaluate the resources required for integer factorization, we consider the total number
of gates involved in the logical circuit. The total run-time for one attempt is obtained by multiplying the gate number by the time it takes to perform one gate, while the success probability is deduced from the logical error probability [Eq. (1)]. Following Ref. [3], we consider a cycle time of $t_c = 1$ µs and a mean error per physical qubit and per gate of $p = 10^{-3}$. Note that the mean error per gate now includes errors during reading from and writing into the memory.

The cost evaluation is finally obtained by optimizing the two window parameters $w_m$ and $w_e$, the coset representation padding m, and the code distance d in order to minimize the volume $t_{\text{exp}} \times n_{\text{qubits}}$. Here $t_{\text{exp}} = t/p_s$ is the average time to obtain the result (several attempts might be necessary), with t the computation time per attempt and $p_s$ the success probability.

[Figure 2: number of processor qubits and expected run-time (on a scale from minutes to weeks) as a function of n, for n from 32 to 2048.]
Figure 2. Number of qubits in the processor and run-time to factor n-bit RSA integers with a computer architecture using a multimode memory.

Results — The resources required to factor an n-bit RSA integer are presented in Figure 2 and discussed in Appendix F. Our estimation suggests that the factorization of a 2 048-bit integer, corresponding to the most common RSA key size, would be possible in about 177 days with a processor having only 13 436 qubits. Concerning the memory, we assumed an error per cycle of $p = 10^{-3}$, including the reading and writing errors. As previously discussed, we need a memory for which each mode can be mapped to three different qubits of the processor. We estimated the maximum time between storage and readout of the same qubit to be less than 2 hours. A memory with a storage time of at least 2 hours is, however, not necessary, as error-correction steps can be implemented periodically at the cost of increasing the run-time. Error correction of all the qubits stored in the memory is estimated to take 186 ms with a processor having 13 436 qubits, meaning that the storage time simply needs to be longer than 186 ms. Applying a correction […] processor (see Appendix F). We also estimated that 28 million spatial modes and 45 temporal modes need to be stored. Note that the number of stored modes does not enter the volume and is thus not optimized (see Appendix G). Note also that qubit addresses in the memory can be identified by temporal indexes only, at the cost of a longer run-time when photon-echo type protocols are used; see below for a concrete example.

Implementation — Our proposal provides a viable solution to get rid of the individual control of millions of qubits, but the challenge now lies in the realization of an efficient multimode quantum memory. As shown in Ref. [28], such a memory could be implemented using a solid-state spin ensemble ($\bar{N}$ spins with an inhomogeneous spectral broadening $\Gamma$), resonantly coupled (with single-spin coupling rate g) to a frequency-tunable single-mode microwave resonator (of length L and with damping rate $\kappa$ to an external transmission line). The resonator serves to enhance microwave absorption and re-emission by the spins. In particular, unit-efficiency absorption of a microwave field can be realized if the finesse F of the resonator matches the single-pass absorption $\alpha L$ of the spins, $F = (\alpha L)^{-1}$, i.e., if the cooperativity $C = g^2 \bar{N}/(\kappa \Gamma) = \alpha L \times F = 1$ [29]. Once absorbed, the microwave field can be re-emitted by time-reversing the inhomogeneous dephasing using a spin-echo technique [30]. By detuning the resonator off and on resonance at the right time, the spin coherence is recovered, leading to a noise-free, unit re-emission probability of the stored photon if C = 1 [28]. In the regime $\kappa \gg g\sqrt{\bar{N}} \gg \Gamma$, the memory bandwidth is given by $4\Gamma$ [28], meaning that any input with a spectrum, say, ten times narrower, i.e., $4\Gamma/10$, can be stored with close to unit efficiency. Furthermore, the time during which an optical coherence can be preserved is limited by the inverse of the homogeneous linewidth $\gamma_h$ [28]. Assuming that the storage efficiency is unchanged if the storage time is a hundred times shorter than $\gamma_h^{-1}$, the number of temporal modes that can be stored with almost unit efficiency is roughly $\Gamma/(250\gamma_h)$, obtained by dividing the usable storage time $1/(100\gamma_h)$ by the duration, roughly $10/(4\Gamma)$, of a temporal mode of bandwidth $4\Gamma/10$. Interestingly, a well-identified temporal mode can be released while keeping all the other modes in the memory by appropriately detuning the resonator off and on resonance with the spins, at the cost of introducing a dead time between two readouts equal, on average, to half the duration of the stored pulse train.

To give an idea of what could be realized in the near future, we estimate that it should be possible to factor 35 in about 1 min using the exact algorithm presented here (with windowed arithmetic and 3D color codes) and a setup combining a memory storing 38 logical qubits (3 002 spatial modes and 5 temporal modes) and a processor with 316 physical qubits (we estimate that more than 60 000 qubits would be needed with a standard 2D grid and surface code). If, instead of using a spatially and temporally multiplexed memory, the qubits are stored in the same spatial mode and are identified by (6 650) temporal addresses only, we evaluate the same factorization to be possible in about 1 day using a memory bandwidth $4\Gamma = 2\pi \times 48$ MHz and taking into account the corresponding dead time between two memory readouts. In this case, error correction of all the qubits stored in the memory is estimated to take 132 ms, meaning that the storage time needs to be longer than 132 ms. For a memory bandwidth $4\Gamma = 2\pi \times 120$ MHz, the same factorization would take 9 hours, and error correction is estimated to take 53 ms. As discussed in Appendix H, these requirements can realistically be met with a realization of the memory protocol described above, combining a solid doped with rare-earth ions and a superconducting microwave resonator [31–33].

Conclusion — We have shown that the use of a quantum memory for quantum computing is appealing, as unprocessed qubits can be loaded into the memory, which significantly reduces the size of the processor compared with standard architectures where all qubits are kept in the processor. All-to-all connectivity between logical qubits is reached if each address in the memory can be mapped to only 3 qubits in the processor. The use of a memory allows one to exploit a 3D code on a 2D processor. If we allow each memory mode to be mapped to any qubit in the processor, all-to-all connectivity between physical qubits can be obtained, hence offering many opportunities for error correction and for implementing algorithms with gates operating between non-neighboring qubits.

We acknowledge M. Afzelius, J.-D. Bancal, P. Bertet, E. Flurin, P. Sekatski, X. Valcarce and J. Zivy for stimulating discussions and/or for critically reviewing the manuscript. We acknowledge funding by the Institut de Physique Théorique (IPhT), Commissariat à l'Énergie Atomique et aux Energies Alternatives (CEA) and the Region Île-de-France in the framework of DIM SIRTEQ.

Appendix A: Semi-classical Fourier transform

We here discuss the semi-classical Fourier transform presented in [34] and show that its cost is negligible. The standard way to perform the Fourier transform on $n_e$ qubits is shown in Figure 3a: for each qubit, it requires one Hadamard gate and a sequence of controlled phase gates. In Shor's algorithm, as well as in Ekerå and Håstad's version of it, the qubits are measured right after the Fourier transform, which explains the measurement of each qubit at the end of the gate sequences in Figure 3a. The simple rearrangement presented in Figure 3b shows that each measurement can be performed right after the corresponding Hadamard gate, provided that the subsequent phase gates are classically controlled by the result of this measurement; see Figure 3c. In this case, the successive classically controlled phase gates operating on the same qubit can be merged, leading to a circuit with one phase gate, one Hadamard gate, and one measurement per qubit. When this semi-classical Fourier transform operates on a register of $n_e$ qubits (the number of bits of the exponent), its cost is linear in $n_e$ and is thus negligible compared to the cubic complexity of the exponentiation.

Appendix B: Decomposition of the exponentiation into elementary gates

In this appendix, we aim to give a clear view of how to decompose the modular exponentiation into elementary gates. The presented method is intended to be simple to understand, but it is not optimal. A more efficient one is presented in Appendix D.

1. Decomposition of a modular exponentiation into additions

The modular exponentiation needed in Ekerå and Håstad's algorithm, i.e., the operation $|e\rangle|1\rangle \mapsto |e\rangle|g^e \bmod N\rangle$, with the input e and the output $g^e \bmod N$ encoded on $n_e$ and n bits, respectively, can be implemented from controlled modular additions, as we show now. For simplicity, we omit the modulo in this paragraph.

Let $e_{n_e-1} \ldots e_i \ldots e_0$ be the binary form of e. The exponentiation can first be seen as a sequence of multiplications
$$g^e = \prod_{i=0}^{n_e-1} g^{2^i e_i} = \prod_{i=0}^{n_e-1} \left[g^{2^i}\right]^{e_i}, \tag{B1}$$
where each multiplication is controlled by the bit value $e_i$. Figure 4 shows an implementation of such a multiplication, in which a quantum register encoding the integer x ends up encoding $x \times g^{2^i e_i}$. It uses two controlled product-additions, i.e., the operation leaving $|y\rangle|z\rangle$ unchanged if $|e_i\rangle = |0\rangle$ and mapping $|y\rangle|z\rangle$ to $|y\rangle|z + y \times \gamma\rangle$ ($(y, z, \gamma) \to (x, 0, g^{2^i})$ for […]
[Figure 3: circuit diagrams. (a) Standard Fourier transform followed by measurements; (b) rearranged version; (c) Semi-classical Fourier transform, in which the successive classically controlled rotation gates can be merged.]
Figure 3. Different versions of the Fourier transform followed by measurements. They are used to show that the number of gates in the Fourier transform is negligible with respect to the cost of the exponentiation. These three versions are based on the phase gates $\phi_k$, defined as the 2 × 2 diagonal matrix with diagonal elements 1 and $e^{2\pi i/2^k}$. Note that the control and target qubits can be exchanged in the representation of each controlled phase gate without changing the result.
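The merging step of Figure 3c can be checked numerically. Because every $\phi_k$ is diagonal, a product of classically controlled phase gates collapses into a single gate $\mathrm{diag}(1, e^{2\pi i\theta})$ whose angle is the sum of the individual angles. The Python sketch below is only an illustration, not the authors' code: the function names and the convention that `bits[0]` controls $\phi_2$, `bits[1]` controls $\phi_3$, and so on, are assumptions of the sketch. It also counts gates for both versions, assuming $n_e(n_e-1)/2$ controlled phase gates in the standard circuit of Figure 3a.

```python
import cmath


def phi(k):
    """Lower-right entry of phi_k = diag(1, exp(2*pi*i / 2**k)); the other diagonal entry is 1."""
    return cmath.exp(2j * cmath.pi / 2 ** k)


def sequential_phase(bits):
    """Apply the classically controlled gates phi_2, phi_3, ... one after the other.

    bits[0] is the earlier measurement outcome controlling phi_2, bits[1] controls
    phi_3, and so on. All gates are diagonal, so they commute and only the product
    of their lower-right entries has to be tracked.
    """
    entry = 1 + 0j
    for offset, bit in enumerate(bits):
        if bit:
            entry *= phi(offset + 2)
    return entry


def merged_phase(bits):
    """Single gate diag(1, exp(2*pi*i*theta)) with theta = sum_k bits[k] / 2**(k + 2)."""
    theta = sum(bit / 2 ** (offset + 2) for offset, bit in enumerate(bits))
    return cmath.exp(2j * cmath.pi * theta)


def qft_gate_counts(n_e):
    """Gate counts of the standard (Figure 3a) and semi-classical (Figure 3c) versions."""
    standard = {"hadamard": n_e, "controlled_phase": n_e * (n_e - 1) // 2, "measurement": n_e}
    # the first qubit to be measured needs no phase correction
    semi_classical = {"hadamard": n_e, "merged_phase": n_e - 1, "measurement": n_e}
    return standard, semi_classical


if __name__ == "__main__":
    outcomes = [1, 0, 1, 1]
    print(abs(sequential_phase(outcomes) - merged_phase(outcomes)) < 1e-12)  # True
    standard, semi = qft_gate_counts(2048)
    print(standard["controlled_phase"], semi["merged_phase"])  # 2096128 2047
```

For $n_e = 2048$, the standard circuit needs about two million controlled phase gates, whereas the semi-classical one needs at most one merged phase gate per qubit, in line with the linear cost stated in Appendix A.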
[Figure 4: circuit diagram; the controlled multiplication by $g^{2^i}$ acting on $|x\rangle$ is decomposed into a product-addition $+x g^{2^i}$ on an auxiliary register initially in $|0\rangle$, a product-addition $+\bar{x}(-g^{-2^i})$ on the input register, and a controlled swap.]
Figure 4. Principle of a modular multiplication circuit transforming a quantum register encoding the integer x into a state encoding $x \times g^{2^i e_i}$. A first product-addition transforms auxiliary qubits in $|0\rangle$ into $|x \times g^{2^i}\rangle$ if $|e_i\rangle = |1\rangle$. Then, a product-addition applies $+\bar{x}(-g^{-2^i})$, with $\bar{x} = x \times g^{2^i}$, to the register encoding x if $|e_i\rangle = |1\rangle$. A final swap is applied if $|e_i\rangle = |1\rangle$ to put the quantum register into $|x \times g^{2^i e_i}\rangle$ and reset the auxiliary qubits to $|0\rangle$. Note that all the operations are performed modulo N.

[…] $y_{n-1} \ldots y_0$ of y and by rewriting the product as
$$y \times \gamma = \sum_{j=0}^{n-1} \gamma\, 2^j y_j = \sum_{j=0}^{n-1} \left(2^j \gamma\right) y_j. \tag{B2}$$
As $y_j$ is either 0 or 1, the controlled product-addition can be implemented by a sequence of additions, each of them controlled both by the bit $|y_j\rangle$ and by $|e_i\rangle$. Figure 5 shows explicitly the decomposition of the first product-addition appearing in each multiplication of the exponentiation.

[Figure 5: circuit diagram; the controlled product-addition $+x g^{2^i}$ is decomposed into additions $+2^0 g^{2^i}, \ldots, +2^j g^{2^i}, \ldots$, each controlled by $|e_i\rangle$ and by one bit $|x_j\rangle$ of the input x.]
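The arithmetic behind Eqs. (B1) and (B2) and Figure 4 can be checked on classical basis states. The Python sketch below is a hypothetical illustration (the names `product_add`, `controlled_multiply`, and `modular_exponentiation` are ours, not from the paper): it builds each product-addition from additions of the classically known constants $(2^j\gamma) \bmod N$, chains two product-additions and a swap as in Figure 4, and compares the resulting exponentiation with Python's built-in `pow`. It assumes that g is coprime with N, as required for g to lie in the multiplicative group, so that $g^{-2^i} \bmod N$ exists; it tracks values only, not superpositions or gate costs.

```python
def product_add(z, y, gamma, N):
    """Return (z + y * gamma) mod N using one addition per set bit of y, as in Eq. (B2).

    Each added constant (2**j * gamma) mod N is computed classically, so only
    additions controlled by the bit y_j remain.
    """
    for j in range(N.bit_length()):  # y < N, so y has at most n = N.bit_length() bits
        if (y >> j) & 1:
            z = (z + (2 ** j * gamma) % N) % N
    return z


def controlled_multiply(x, e_i, c, N):
    """Figure 4 on basis states: x -> x * c mod N if e_i == 1, unchanged otherwise."""
    if e_i == 0:
        return x
    aux = product_add(0, x, c, N)               # auxiliary register: |0> -> |x*c>
    c_inv = pow(c, -1, N)                       # requires gcd(c, N) == 1 (Python >= 3.8)
    x = product_add(x, aux, (-c_inv) % N, N)    # x -> x - (x*c)*c^{-1} = 0
    assert x == 0, "input register should be cleared"
    x, aux = aux, x                             # controlled swap; aux is back to 0
    return x


def modular_exponentiation(g, e, N, n_e):
    """g**e mod N as n_e controlled multiplications by g**(2**i), Eq. (B1)."""
    x = 1
    for i in range(n_e):
        e_i = (e >> i) & 1
        x = controlled_multiply(x, e_i, pow(g, 2 ** i, N), N)
    return x


if __name__ == "__main__":
    print(modular_exponentiation(2, 13, 35, 4))  # 2, i.e. 2**13 mod 35
```

The second product-addition works because $\bar{x} \times (-g^{-2^i}) = -x \bmod N$, which clears the input register exactly; this is why the swap leaves the auxiliary qubits back in $|0\rangle$.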
[…] additions (swaps between registers, respectively). Each addition needs to be modular, which can be obtained with a specific representation and a standard adder circuit.

[…] two controlled qubits $|e_i\rangle$ and $|x_j\rangle$ are both in state $|1\rangle$. When such an addition is applied on a quantum register encoding $z'$ using n + m qubits, the block in the dashed box of Figure 6a is repeated n + m − 2 times, giving a mean cost of 5.5(n + m) − 9 CNOTs and 2(n + m) − 1 Toffoli gates.
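The per-operation costs quoted above can be combined into a back-of-the-envelope count for the non-windowed construction of Appendix B. The Python sketch below assumes $2 n_e n$ controlled additions and $n_e$ controlled register swaps, with the mean adder cost of $5.5(n+m) - 9$ CNOTs and $2(n+m) - 1$ Toffoli gates and the swap cost of $2(n+m)$ CNOTs and $n+m$ Toffoli gates. It deliberately ignores the coset initialization, the Fourier transform, and the windowed optimization of Appendix D, so it is an illustration rather than the script used for the estimates in the main text; the function name is hypothetical.

```python
def appendix_b_cost(n, m, n_e):
    """Mean CNOT and Toffoli counts of the non-windowed exponentiation (Appendix B sketch).

    n   : number of bits of N
    m   : padding qubits of the coset representation
    n_e : number of bits of the exponent
    Ignores the coset initialization and the Fourier transform.
    """
    size = n + m                     # register size used by each addition and swap
    additions = 2 * n_e * n          # 2n controlled additions per multiplication
    swaps = n_e                      # one controlled register swap per multiplication
    cnot = additions * (5.5 * size - 9) + swaps * 2 * size
    toffoli = additions * (2 * size - 1) + swaps * size
    return cnot, toffoli


if __name__ == "__main__":
    print(appendix_b_cost(4, 2, 4))  # (816.0, 376)
```

The Toffoli count is dominated by roughly $4 n_e n (n+m)$, which is cubic in n when $n_e$ and m grow at most linearly with n; this is the scaling that the windowed circuits of Appendix D reduce by the factor $w_e w_m$.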
Figure 7. Preparation, proposed in [35, Fig. 1], of a quantum register with n + m qubits in the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|z + kN\rangle$, as required for the initialization of the coset representation. The first controlled operation adds the integer N to the register of n + m qubits provided that the ancillary qubit is in state $|1\rangle$. The first classically controlled operation changes the phase of the input state encoded in the n + m qubits if and only if the measurement result is 1 and the number encoded in the n + m qubits is larger than or equal to N. If either condition is not met, the input state is unchanged.
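A value-level simulation makes the preparation of Figure 7 concrete. The Python sketch below is a hypothetical illustration that tracks the amplitude of every basis state. It assumes, consistently with the bound $y \le 2^{m-1}N$ quoted for the last uncomputation in this appendix, that ancilla t controls the addition of $2^t N$; that each ancilla is measured in the X basis; and that an outcome 1 is corrected by a phase −1 on every component larger than or equal to $2^t N$. The function name and the `outcomes` argument are ours.

```python
import math


def prepare_coset(z, N, m, outcomes):
    """Amplitudes after a Figure 7 style coset preparation, starting from |z>, 0 <= z < N.

    Assumptions of this sketch: ancilla t (prepared in (|0> + |1>)/sqrt(2)) controls
    the addition of 2**t * N, it is then measured in the X basis with result
    outcomes[t], and result 1 is corrected by a -1 phase on every component whose
    value is >= 2**t * N.
    """
    state = {z: 1.0}
    for t in range(m):
        shift = 2 ** t * N
        sign = -1.0 if outcomes[t] else 1.0
        new_state = {}
        for value, amp in state.items():
            # controlled addition followed by the X-basis measurement of the ancilla
            new_state[value] = new_state.get(value, 0.0) + amp / math.sqrt(2)
            new_state[value + shift] = new_state.get(value + shift, 0.0) + sign * amp / math.sqrt(2)
        if outcomes[t]:
            # conditional correction: compare with y = 2**t * N (cf. Figure 8a)
            new_state = {v: (-a if v >= shift else a) for v, a in new_state.items()}
        state = new_state
    return state


if __name__ == "__main__":
    amplitudes = prepare_coset(5, 21, 3, [1, 0, 1])
    print(sorted(amplitudes))  # [5, 26, 47, 68, 89, 110, 131, 152]
```

Whatever the measurement outcomes, every basis state $|z + kN\rangle$ ends with the same positive amplitude $1/\sqrt{2^m}$, because before step t all values are below $2^t N$ and after the controlled addition the added branch lies entirely above it, so a single comparison suffices to undo the sign.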
The basic idea of the coset representation for adding $2^j\gamma$ modulo N to a quantum register encoding the integer z is to extend the register for z with m additional qubits and to encode it into the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|z + kN\rangle$. Except at the bounds, this state is invariant under the addition of N. This implies
$$\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}\left|(z + 2^j\gamma + kN) \bmod 2^{n+m}\right\rangle \approx \frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}\left|(z + 2^j\gamma) \bmod N + kN\right\rangle, \tag{C1}$$
i.e., the modular addition of $2^j\gamma$ to the register of z can be performed with a standard adder (modulo $2^{n+m}$), at the cost of a small error, which is exponentially suppressed when increasing m [35]. Note also that the precision is improved if, instead of adding $2^j\gamma$, one adds $2^j\gamma \bmod N$ (which does not change the result of the sum, since we consider the sum modulo N). This is possible whenever the quantity to add is known classically.

The goal of the first subsection is to show that the resources needed to extend the register encoding $|z\rangle$ into the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|z + kN\rangle$, as required in this representation, are negligible with respect to the resources needed to implement the modular exponentiation. In the second subsection, we show that the coset representation is compatible with the modular multiplication circuit presented in the main text.

In the next two subsections, the coset representation is considered for additions modulo N; n is the number of bits encoding N, and m the number of qubits added to the register for the coset representation.

[…] preparing the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|z + kN\rangle$ in an extended register of size n + m. This is done by performing successive additions, each controlled by an ancillary qubit prepared in the state $(|0\rangle + |1\rangle)/\sqrt{2}$ (m ancillary qubits in total), which is then uncomputed; see Figure 7. The controlled addition is performed using the circuit presented in Figure 6a with only one control qubit. The uncomputation of the ancillary qubit is based on a measurement and, depending on the result, a conditional correction; see Figure 7. Let us detail the uncomputation of the first ancillary qubit presented in Figure 7. When the result of the measurement is 0, the register of n + m qubits is projected onto $(|z\rangle + |z + N\rangle)/\sqrt{2}$. When the result is 1, the register state is $(|z\rangle - |z + N\rangle)/\sqrt{2}$, and a phase −1 needs to be applied to the component $|z + N\rangle$, i.e., when the register encodes an integer larger than or equal to N. In order to implement the conditional operations for uncomputing the m ancillary qubits, we need to compare the value x encoded in the quantum register of size n + m with an integer y known classically and satisfying $0 < y \le 2^{m-1}N < 2^{n+m}$ (see the last uncomputation in Figure 7), i.e., an integer that can be written with n + m bits. This comparison is implemented using the circuit presented in Figure 8a. First, the value $y' = 2^{n+m} - y$ is computed classically. Then the last carry of the sum of x and $y'$ is computed with a circuit derived from the addition. If the value of this carry is 1, we conclude that $x \ge y$; otherwise $x < y$. A Z gate is thus applied on the qubit encoding the last carry, before uncomputing the carries. The register ends up in state $\pm|x\rangle$ depending on the relative value of x and y, as desired.

Each controlled addition and correction costs O(n + m) gates. This operation is repeated m times, giving a total cost of the coset representation initialization of order O(m(n + m)).
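The two ingredients of this appendix can be checked on basis states. In the Python sketch below (hypothetical names, not the authors' code), `coset_addition_overlap` compares both sides of Eq. (C1): since all amplitudes equal $1/\sqrt{2^m}$, the inner product is the number of shared basis states divided by $2^m$, which for odd N and $0 \le z, c < N$ is either 1 or $1 - 2^{-m}$, illustrating the exponential suppression of the error with m. `is_geq_via_carry` reproduces the logic of Figure 8a: the last carry of $x + (2^{n+m} - y)$ equals 1 exactly when $x \ge y$.

```python
def coset_values(z, N, m):
    """Basis states of the coset state (1/sqrt(2**m)) * sum_k |z + k*N>, k < 2**m."""
    return {z + k * N for k in range(2 ** m)}


def coset_addition_overlap(z, c, N, n, m):
    """Inner product between the two sides of Eq. (C1) for a classical constant c.

    Left side: a plain adder modulo 2**(n + m) applied to the coset state of z.
    Right side: the ideal coset state of (z + c) mod N.
    All amplitudes equal 1/sqrt(2**m), so the overlap counts common basis states.
    """
    width = 2 ** (n + m)
    after_adder = {(v + c) % width for v in coset_values(z, N, m)}
    ideal = coset_values((z + c) % N, N, m)
    return len(after_adder & ideal) / 2 ** m


def is_geq_via_carry(x, y, n, m):
    """Figure 8a logic on a basis state, assuming 0 <= x < 2**(n+m) and 0 < y < 2**(n+m)."""
    width = n + m
    y_prime = 2 ** width - y              # computed classically
    last_carry = (x + y_prime) >> width   # carry out of the (n+m)-bit sum x + y'
    return last_carry == 1                # 1 exactly when x >= y


if __name__ == "__main__":
    for m in (1, 2, 4, 8):
        print(m, coset_addition_overlap(15, 10, 21, 5, m))
    # 1 0.5 / 2 0.75 / 4 0.9375 / 8 0.99609375
```

With N = 21 (n = 5), z = 15, and c = 10, the sum exceeds N, so exactly one basis state falls outside the ideal coset and the overlap is $1 - 2^{-m}$; when $z + c < N$ the two sides of Eq. (C1) coincide.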
[Figure 8 (circuit diagrams): (a) comparison circuit; (b) AND computation; (c) AND uncomputation.]

Figure 8. (a): Circuit inspired from [4, Fig. 17] which compares the integer x encoded in n + m qubits and the integer $y < 2^{n+m}$ known classically, and returns $-|x\rangle$ if and only if $x \ge y$. This is done in three steps: i) compute the carries of $y' + x$ with $y' = 2^{n+m} - y$, ii) apply a Z operation on the last carry and iii) uncompute the carries. (b) and (c): circuits defining the notations used to compute and uncompute an AND operation, as introduced in [37, 40] where the authors give efficient implementations in terms of T (or $\pi/4$) gates. When only one quantum control appears, it uses a CNOT instead of a Toffoli gate, and it can be removed by directly using the control bit instead of the ancillary qubit.

Preparing the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|kN\rangle$ takes O(m(n + m)) gates, which is negligible compared to the cubic cost of the full exponentiation. Note, however, that the cost of this initialization is taken into account in our script for the evaluation of the whole algorithm cost.

2. Compatibility with the multiplication

When computing the multiplications from sequences of two product-additions (see Figure 4 of the main text), the input register encoding x and the ancillary register are used both as control and target of the product-additions. We here check that having the control register encoded in the coset representation is not a problem for performing the multiplication.

Let us consider the first product-addition used to implement the multiplication shown in the bottom part of Figure 4. In the coset representation, the input x and ancillary registers are in the state $\frac{1}{2^m}\sum_{k=0}^{2^m-1}\sum_{k'=0}^{2^m-1}|x+kN\rangle|0+k'N\rangle$, meaning that after the product-addition, their state ends up in
$$\frac{1}{2^m}\sum_{k=0}^{2^m-1}\sum_{k'=0}^{2^m-1}|x+kN\rangle\left|(x+kN)g^{2^i}+k'N \bmod 2^{n+m}\right\rangle.$$

Appendix D: Windowed arithmetic

In order to reduce the number of multiplications and additions in the exponentiation algorithm, we use windowed arithmetic circuits [21]. They consist in grouping the bits of e for controlling each multiplication, hence reducing the number of multiplications. Similarly, for each multiplication, the input bits are grouped to reduce the number of additions composing each multiplication.

The next subsection shows the details of the decomposition of the exponentiation into elementary additions of the form $+T_k \bmod N$, where the quantity $T_k$ depends on the value of an integer k. These specific additions are implemented in three steps, which are presented in separate subsequent subsections.

1. Windowed exponentiation and multiplication

Let us start by specifying the notations. We label the binary form of e as
$$e_{n_e-1}\dots e_{i+w_e}\overbrace{e_{i+w_e-1}\dots e_i}^{e_{i:i+w_e}}\dots e_2e_1e_0, \tag{D1}$$
i.e. $e_j$ is the jth bit of e. Let also $e_{i:i+w_e}$ be defined as
$$e_{i:i+w_e} = \sum_{j=i}^{i+w_e-1} 2^{j-i}e_j, \tag{D2}$$
i.e. $e_{i:i+w_e}$ is the number whose bit decomposition is given by the bits of e starting at index i and taking $w_e$ bits. The strategy for computing the exponentiation using windowed arithmetic consists in decomposing the exponent e in terms of the numbers $e_{i:i+w_e}$,
$$e = \sum_{\substack{0\le i<n_e\\ i\equiv 0 \bmod w_e}} 2^i e_{i:i+w_e}, \tag{D3}$$
such that
$$g^e = \prod_{\substack{0\le i<n_e\\ i\equiv 0 \bmod w_e}} g^{2^i e_{i:i+w_e}}. \tag{D4}$$
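Equations (D2) to (D4) can be checked classically. The sketch below uses plain Python integers and hypothetical helper names. It splits e into windows of $w_e$ bits and multiplies the classically known factors $g^{2^i e_{i:i+w_e}} \bmod N$. In the quantum circuit, each factor instead drives one controlled multiplication, so the number of multiplications equals the number of windows, $\lceil n_e/w_e \rceil$.

```python
def exponent_windows(e: int, n_e: int, w_e: int) -> list[tuple[int, int]]:
    """Return the pairs (i, e_{i:i+w_e}) for i = 0, w_e, 2*w_e, ... < n_e (Eq. D2)."""
    mask = (1 << w_e) - 1
    return [(i, (e >> i) & mask) for i in range(0, n_e, w_e)]


def windowed_modexp(g: int, e: int, N: int, n_e: int, w_e: int) -> tuple[int, int]:
    """Evaluate g^e mod N through the product of Eq. (D4).

    Returns the result and the number of multiplications performed: one per
    window, whatever the window value (as for the quantum circuit).
    """
    result, multiplications = 1, 0
    for i, e_window in exponent_windows(e, n_e, w_e):
        factor = pow(g, (1 << i) * e_window, N)  # known classically
        result = (result * factor) % N
        multiplications += 1
    return result, multiplications


# Example: 91 = 0b1011011 on n_e = 7 bits, windows of w_e = 3 bits.
value, count = windowed_modexp(7, 91, 187, n_e=7, w_e=3)
assert value == pow(7, 91, 187)
assert count == 3  # instead of n_e = 7 multiplications without windowing
```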
[Figure 9 (circuit diagrams): (a) Multiplication for the exponentiation, with i = 2 and $w_e = 2$, decomposed into product-additions. (b) Product-addition $+x g^{2^i e_{i:i+w_e}} \bmod N$ split into windowed additions over $x_{0:3}$ and $x_{3:6}$. (c) Modular addition of a number read with a table lookup, as needed in (b): load $T_k$, add it, unload $T_k$.]

Figure 9. Windowed arithmetic subcircuits for the modular exponentiation. When not specified, the register size is n + m qubits (register encoded into the coset representation of integers).
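Panel (b) of Figure 9 splits the product-addition into additions of classically precomputed numbers. The sketch below assumes plain integers rather than a coset-encoded register, and its helper names are hypothetical. For the window of x starting at bit j and the exponent window $e_{i:i+w_e}$, the number added is $T_k = 2^j\,x_{j:j+w_m}\,g^{2^i e_{i:i+w_e}} \bmod N$. It is read from a table indexed by the pair of window values.

```python
def window_table(i: int, j: int, g: int, N: int, w_e: int, w_m: int) -> list[list[int]]:
    """T[k1][k2] = 2^j * k2 * g^(2^i * k1) mod N for every pair of window values."""
    return [
        [((1 << j) * k2 * pow(g, (1 << i) * k1, N)) % N for k2 in range(1 << w_m)]
        for k1 in range(1 << w_e)
    ]


def windowed_product_addition(acc: int, x: int, e_window: int, i: int, g: int,
                              N: int, n_x: int, w_e: int, w_m: int) -> int:
    """Return acc + x * g^(2^i * e_window) mod N with one table addition per x window."""
    mask = (1 << w_m) - 1
    for j in range(0, n_x, w_m):
        table = window_table(i, j, g, N, w_e, w_m)  # classical precomputation
        k2 = (x >> j) & mask                          # window x_{j:j+w_m}
        acc = (acc + table[e_window][k2]) % N         # "+T_k mod N"
    return acc


N, g = 187, 7
x, e_window, i = 45, 5, 3
expected = (x * pow(g, (1 << i) * e_window, N)) % N
assert windowed_product_addition(0, x, e_window, i, g, N, n_x=6, w_e=3, w_m=3) == expected
# Every entry is reduced mod N, so N.bit_length() bits suffice to store T_k.
assert max(max(row) for row in window_table(i, 3, g, N, 3, 3)) < 1 << N.bit_length()
```

The loop performs $\lceil n_x/w_m \rceil$ additions instead of one per bit of x, which is the reduction windowing aims for.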
The comparison with the decomposition of $g^e$ presented in Eq. (B1) clearly shows that windowed exponentiation divides the number of multiplications by $w_e$.

As for the standard algorithm, the multiplications of the product (D4) are implemented successively and each multiplication is decomposed into a sequence of two product-additions, as shown in Figure 9a. The difference is that the added number now depends on the number $e_{i:i+w_e}$.

The product-addition is also performed in a windowed way [21]. Figure 9b shows in particular how the first product-addition needed for each multiplication is performed using windows for input x of size $w_m = 3$.

$k_2 = x_{i:i+w_m}$, k being the concatenation of $k_1$ and $k_2$) to be added is known classically. Since its addition is realized modulo N, its value can be computed modulo N before being loaded. n bits are thus sufficient to encode $T_k$.

Loading a value $T_k$ into a quantum register is done using a quantum table lookup circuit, which we discuss right after. The subsequent subsection is dedicated to unloading the value $T_k$ and resetting the register to state $|0\rangle$. The last subsection is dedicated to the requested addition.

2. Table lookup
in $|k\rangle|T_k\rangle$. The circuit presented in Figure 10 shows the principle of this operation with registers for k and $T_k$ composed respectively of 3 and 5 qubits.

[Figure 10 (circuit diagram): address qubits $k_2, k_1, k_0$, five output qubits initialized in $|0\rangle$, table entries $T_7, \dots, T_0$.]

Figure 10. Example of a quantum table lookup. For a basis state $|k\rangle$ specifying the address of the number $T_k$ from a classical table, the quantum table lookup maps basis states $|k\rangle|0\rangle$ into $|k\rangle|T_k\rangle$. Here k and the output are composed respectively of 3 and 5 qubits. The notations for the AND computation and uncomputation are presented in Figure 8. Black and white circles are controls on the $|1\rangle$ and $|0\rangle$ states respectively. The question mark on the controlled NOT means that a controlled NOT is applied on qubit i only when the ith bit of $T_k$ takes the value 1.

3. Table unlookup

The purpose of the table unlookup operation (last step in Figure 9c) is to map the state $\sum_k \alpha_k|k\rangle|T_k\rangle$ into $\sum_k \alpha_k|k\rangle$, where $\alpha_k$ are some complex coefficients. A natural way to do this mapping is to apply again the lookup operation described in the previous subsection. The lookup acts on the computational basis as $|k\rangle|x\rangle \mapsto |k\rangle|x\oplus T_k\rangle$, where $\oplus$ stands for the bitwise XOR operator. By linearity it therefore maps $\sum_k\alpha_k|k\rangle|T_k\rangle \mapsto \sum_k\alpha_k|k\rangle|0\rangle$, the latter corresponding to the desired state when simply discarding the qubits previously encoding the numbers $T_k$.

However, a more efficient measurement-based technique is possible, as shown in Ref. [41, Appendix C] and improved in Ref. [21]. The principle consists in first measuring the register encoding $T_k$ in the X basis and then applying a phase shift conditioned on the measurement results. For a more detailed explanation, let us start by expanding the qubits encoding the numbers $T_k$ in bits indexed by j, $(T_k)_j$ being the jth bit of $T_k$. The state before the uncomputation can be written as
$$\sum_k \alpha_k|k\rangle\bigotimes_j\left|(T_k)_j\right\rangle. \tag{D5}$$
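A toy state-vector model makes the two unlookup strategies concrete. It is illustrative only: amplitudes are stored in a Python dict keyed by basis labels, and the helper names and table values are hypothetical. Applying the lookup twice returns the target register to $|0\rangle$. The measurement-based version applies a Hadamard to every target qubit and samples an outcome s. It then cancels the phase $(-1)^{s\cdot T_k}$ on each $|k\rangle$, following the principle stated in the text: an X-basis measurement, then a phase shift conditioned on the outcomes.

```python
import math
import random


def parity(v: int) -> int:
    """Number of 1 bits of v, modulo 2."""
    return bin(v).count("1") & 1


def lookup(state: dict, table: list[int]) -> dict:
    """|k>|t> -> |k>|t XOR T_k>, applied by linearity to {(k, t): amplitude}."""
    return {(k, t ^ table[k]): a for (k, t), a in state.items()}


def measurement_based_unlookup(state: dict, table: list[int], n_bits: int,
                               rng: random.Random) -> dict:
    """X-basis measurement of the target register followed by a phase fix-up.

    Returns the post-measurement amplitudes {k: alpha_k} of the address register.
    """
    scale = 1 / math.sqrt(2 ** n_bits)
    # Hadamard on every target qubit: |t> -> scale * sum_s (-1)^(s.t) |s>
    amps: dict = {}
    for (k, t), a in state.items():
        for s in range(2 ** n_bits):
            amps[(k, s)] = amps.get((k, s), 0) + a * (-1) ** parity(s & t) * scale
    # Sample the outcome s with Born-rule probabilities.
    probs = [sum(abs(amps.get((k, s), 0)) ** 2 for k in range(len(table)))
             for s in range(2 ** n_bits)]
    s = rng.choices(range(2 ** n_bits), weights=probs)[0]
    norm = math.sqrt(probs[s])
    # Phase shift conditioned on the outcome: multiply |k> by (-1)^(s.T_k).
    return {k: amps.get((k, s), 0) / norm * (-1) ** parity(s & table[k])
            for k in range(len(table))}


table = [3, 5, 0, 7, 1, 6, 2, 4]  # hypothetical 3-bit entries T_0 .. T_7
alphas = [complex(k + 1, k) for k in range(8)]
norm0 = math.sqrt(sum(abs(a) ** 2 for a in alphas))
alphas = [a / norm0 for a in alphas]

loaded = lookup({(k, 0): a for k, a in enumerate(alphas)}, table)
assert lookup(loaded, table) == {(k, 0): a for k, a in enumerate(alphas)}

restored = measurement_based_unlookup(loaded, table, 3, random.Random(0))
assert all(abs(restored[k] - alphas[k]) < 1e-12 for k in range(8))
```

Whatever outcome s is sampled, the phase fix-up restores every $\alpha_k$. In this model the X-basis readout leaves no residual phase.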
described in [37] and optimized from [36] for use with T gates, is presented in Figure 13. It is thrifty in gate number and ancillary qubits, at the cost of being deeper than other circuits [42, 43], which is not a disadvantage for our architecture.

[Figure 13 (circuit diagram): adder acting on $x_0,\dots,x_3$ and $y_0,\dots,y_3$, outputting $x_0,\dots,x_3$ and $(y+x)_0,\dots,(y+x)_3$.]

Figure 13. Adder modulo $2^4$ from [37], using the same notations as in Figure 8. The building block (boxed) is repeated two times, for the qubits numbers 1 and 2, while the first and last qubits use a simplified subcircuit.

As presented in Figure 9c, the adder needs to add a number $T_k$ taking n qubits into a register with n + m qubits. To achieve this, either the first register for $T_k$ is extended with qubits in the $|0\rangle$ state, or we use carry propagation blocks for the last qubits. Such blocks are identical to the ones of the semi-classical adder with classical input 0; see [4, Fig. 17] for an example of such a circuit. For gate counting, the first solution is taken into account.

The cost of the addition circuit (Figure 13) is 6(n + m) − 9 CNOT gates and n + m − 1 AND computations and uncomputations.

5. Cost estimation

In summary, the parameters of the logical circuit for computing the modular exponentiation are
$n$: number of bits of the exponentiated number g
$n_e$: number of bits of the exponent e
$w_e$: window size for the exponentiation
$w_m$: window size for the multiplication
$m$: number of qubits added by the coset representation

The aim of this subsection is to give an estimate of the number of gates needed to implement this circuit. In order to keep the evaluation independent of the error correction choice, we express the cost in terms of the number of 1-qubit gates, 2-qubit gates, and AND computations and uncomputations [44].

The modular exponentiation consists of $n_e/w_e$ multiplications. Each multiplication uses 2 product-additions and a swap, and each product-addition is implemented with $(n+m)/w_m$ lookups, additions and unlookups. Note that the swap operation is realized by simply relabeling the register, hence is free. According to the counts obtained in the previous subsections, the cost of the exponentiation is dominated, in the limit $n\to\infty$ with $n_e = O(n)$ and $w_e$ and $w_m$ constant, by: $2\frac{n_e}{w_e}\frac{(n+m)n}{w_m}$ 1-qubit gates, $\left(2^{w_e+w_m}n + 12(n+m)\right)\frac{n_e}{w_e}\frac{n+m}{w_m}$ CNOTs, and $2\frac{n_e}{w_e}\frac{(n+m)^2}{w_m}$ AND computations and uncomputations (translatable into $4\frac{n_e}{w_e}\frac{(n+m)^2}{w_m}$ Toffoli gates). Note that when considering the universal gate set T, S, H, X, Y, Z, CNOT, controlled-Z and their conjugates, according to Fig. 4 of Ref. [40] the AND computation and uncomputation cost on average 8 1-qubit gates and 3.5 2-qubit gates. The total cost of the exponentiation is hence given at leading order by $2\frac{n_e}{w_e}\frac{n+m}{w_m}(9n+8m)$ 1-qubit gates and $\left(2^{w_e+w_m}n + 19(n+m)\right)\frac{n_e}{w_e}\frac{n+m}{w_m}$ 2-qubit gates. In the code used to compute the required resources and find the optimal parameters, the complete formula has been used [22].

Appendix E: Error correction

This appendix is dedicated to 3D gauge color codes. The first subsection is dedicated to the principle of subsystem codes. The second subsection describes the geometrical structure of 3D gauge color codes. The last subsection provides a detailed description of the cut of the code structure that is used to process and correct the logical qubits.

1. Subsystem codes

Subsystem stabilizer codes [45] are defined by three subgroups of the Pauli group: the stabilizer, gauge and logical (also designated as bare logical in [23]) operator groups. They are such that the stabilizer group is the center of the gauge group up to phases, $i\mathbb{1}$ is included in the gauge group, the operators from the gauge and logical groups commute, and the normalizer of the stabilizer group is the product of the gauge and logical groups. We invite the reader to look at Refs. [23, 45] for an explicit construction
of those groups from canonical generators of the Pauli group. The stabilizer group plays the standard role of stabilizers, i.e. it divides the total Hilbert space $\mathcal{H}$ into a direct sum of orthogonal subspaces $\mathcal{C}\oplus\mathcal{C}^\perp$, where $\mathcal{C}$, the stabilized subspace, corresponds to the +1 eigenspace of all stabilizers. The gauge and logical groups decompose the stabilized subspace $\mathcal{C}$ into a tensor product of the logical qubits space $\mathcal{A}$ and the gauge qubits space $\mathcal{B}$ [46], that is, the Hilbert space is decomposed as
$$\mathcal{H} = \underbrace{(\mathcal{A}\otimes\mathcal{B})}_{\mathcal{C}}\oplus\mathcal{C}^\perp.$$
The gauge group acts trivially on the logical qubits and is the Pauli group of the gauge qubits, while the logical group acts trivially on the gauge qubits and is the Pauli group of the logical qubits (up to phases). This ensures that gauge operator measurements don't modify the logical qubits.

A gauge fixing operation consists in switching from a code to another one such that the new stabilizer group includes the original one while being included in the original gauge group, the logical group being kept unmodified. The decomposition associated with the original code,
$$\mathcal{H} = \underbrace{(\mathcal{A}\otimes\mathcal{B})}_{\mathcal{C}}\oplus\mathcal{C}^\perp,$$
then becomes of the form
$$\mathcal{H} = (\mathcal{A}\otimes\mathcal{B}')\oplus\underbrace{(\mathcal{A}\otimes\mathcal{B}'')\oplus\mathcal{C}^\perp}_{\mathcal{C}'^\perp},$$
where $\mathcal{B}'$ is the new gauge qubit space. As a consequence, a valid code-word for this new code is also valid for the initial one. The passage from the latter to the new code is done by measuring the generators of the gauge group, the results of these measurements giving the correction to apply on $\mathcal{B}'\oplus\mathcal{B}''$ to remove the components on $\mathcal{B}''$.

For 3D gauge color codes, code switching allows a transversal error-corrected implementation of a universal set of gates [23].

2. Code geometrical structure

In order to clarify the context, let us first recall that there are three main definitions of error-

of the large tetrahedron, see Fig. 4b of [47] for an illustration. The vertices of elementary tetrahedrons are colored with 4 different colors such that adjacent vertices get a different color. Each elementary tetrahedron represents a physical qubit.

The measured operators are the gauge generators for the code used to implement the H and CNOT gates, the (1, 1) code (see [23]). These generators are described by the edges: each operator is the product of X or Z operators of the elementary tetrahedrons adjacent to a given edge (each operator involves up to 6 physical qubits).

The stabilizer generators of the (1, 1) and (1, 2) codes (the (1, 2) code refers to the code used to implement the T gate [23]), which are described by the vertices and edges, are deduced from the values of the measured operators. More precisely, the operator corresponding to a vertex can be written as the product of the operators corresponding to edges starting at the given vertex and ending on vertices of a common color. Three choices of color are possible, allowing one to recover in three different ways an operator corresponding to a vertex. This redundancy can be used for achieving fault-tolerant error correction in only one measurement of the (gauge) operators related to the edges [23].

Let $n_{\text{code}}$ be the index of the code, which is the number of vertices of the same color on one edge of the large tetrahedron (denoted as n in [23]). The code distance is given by $d = 2n_{\text{code}}+1$, and the number of physical qubits is $1 + 4n_{\text{code}} + 6n_{\text{code}}^2 + 4n_{\text{code}}^3 = \frac{d^3+d}{2}$ [23, 47].

3. Slicing of the code structure

To process the information, the code structure is decomposed into slices, each slice being mapped successively into the 2D processor. While several cuts in slices are possible, we choose slices orthogonal to two faces (see Figure 14). The processor needs to be sized to fit the largest slice, the one joining the edge not included in either of the two faces to the middle of the opposing edge (the magenta slice in Figure 14).

Since we believe that the determination of the circuit-level threshold goes beyond the scope of this work, the
run-time and resource needed to factor a 2 048-bit RSA integer are given in the main text under the assumption of a threshold of 0.75 %. Since this choice is somewhat arbitrary, we give the evolution of run-time and resource as a function of the threshold in Figure 16. More precisely, they are given as a function of the ratio $p/p_{th}$ between the physical error probability per cycle p and the fault-tolerant threshold $p_{th}$, which is the only relevant quantity at first order. For $p_{th}$ = 0.75 % and an error probability per cycle and per physical qubit of $10^{-3}$, this ratio is $p/p_{th} \approx 0.13$.

[Figure 16 (plot): processor qubit number (kiloqubit) and run-time as a function of $p/p_{th}$.]

the number of physical qubits in the processor and the run-time.

1. Optimal parameters to factor n-bit RSA integers

The resources and parameters needed to factor RSA integers encoded in n bits are specified in Table I. In particular, we consider the factorization of RSA integers with n = 6 bits, the number of bits needed to factor 35. We also consider n = 829, which corresponds to the largest RSA integer factorized so far [51].
| n | n_e | m | w_e | w_m | d | n_qubits | t_exp | logical qubits | total modes | spatial modes | temporal modes | all memory correction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 6 | 6 | 4 | 3 | 2 | 7 | 316 | 1 min | 38 | 6 650 | 3 002 | 5 | 95 µs |
| 8 | 9 | 8 | 3 | 2 | 13 | 1 060 | 2 s | 58 | 64 090 | 15 370 | 11 | 319 µs |
| 16 | 21 | 11 | 3 | 2 | 17 | 1 796 | 10 s | 99 | 244 035 | 44 451 | 15 | 742 µs |
| 128 | 189 | 19 | 3 | 3 | 29 | 5 156 | 50 min | 571 | 6 971 339 | 736 019 | 27 | 8 ms |
| 256 | 381 | 21 | 3 | 3 | 33 | 6 660 | 7 hours | 1 089 | 19 585 665 | 1 813 185 | 31 | 17 ms |
| 512 | 765 | 24 | 3 | 3 | 37 | 8 356 | 2 days | 2 122 | 53 782 090 | 4 432 858 | 35 | 37 ms |
| 829 | 1 242 | 26 | 3 | 3 | 41 | 10 244 | 11 days | 3 396 | 117 097 476 | 8 697 156 | 39 | 66 ms |
| 2 048 | 3 029 | 30 | 3 | 3 | 47 | 13 436 | 177 days | 8 284 | 430 229 540 | 27 825 956 | 45 | 186 ms |

Table I. For different integer sizes n and corresponding exponent size $n_e$ (∼ 1.5n), the table presents the optimal set of parameters, processor size and computation run-time, and the memory requirements.
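Two closed forms from the appendices can be evaluated for a row of Table I: the leading-order gate counts of the cost-estimation subsection of Appendix D, and the physical-qubit count $(d^3+d)/2$ of a 3D gauge color code block from Appendix E. The sketch below does this with hypothetical function names. It encodes only the leading terms quoted in the text; Appendix D notes that the optimal parameters were found with the complete formula in the authors' script [22].

```python
from fractions import Fraction


def leading_order_gates(n: int, n_e: int, m: int, w_e: int, w_m: int) -> dict:
    """Leading-order gate counts of the modular exponentiation, as stated in Appendix D."""
    f = Fraction(n_e, w_e) * Fraction(n + m, w_m)  # common factor (n_e/w_e)(n+m)/w_m
    counts = {
        "1-qubit": 2 * f * n,
        "CNOT": (2 ** (w_e + w_m) * n + 12 * (n + m)) * f,
        "AND": 2 * f * (n + m),  # AND computations and uncomputations
        "total 1-qubit": 2 * f * (9 * n + 8 * m),
        "total 2-qubit": (2 ** (w_e + w_m) * n + 19 * (n + m)) * f,
    }
    # With 8 one-qubit and 3.5 two-qubit gates per AND computation/uncomputation [40],
    # the totals follow from the separate counts.
    assert counts["total 1-qubit"] == counts["1-qubit"] + 8 * counts["AND"]
    assert counts["total 2-qubit"] == counts["CNOT"] + Fraction(7, 2) * counts["AND"]
    return counts


def color_code_qubits(d: int) -> int:
    """Physical qubits of one 3D gauge color code block of distance d = 2 n_code + 1."""
    n_code = (d - 1) // 2
    count = 1 + 4 * n_code + 6 * n_code**2 + 4 * n_code**3
    assert count == (d**3 + d) // 2
    return count


# Parameters of the 2 048-bit row of Table I.
gates = leading_order_gates(n=2048, n_e=3029, m=30, w_e=3, w_m=3)
print({name: float(value) for name, value in gates.items()})
print(color_code_qubits(47))
```

Exact `Fraction` arithmetic keeps the two internal consistency checks free of rounding error.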
– During a product-addition operation, the different additions can be parallelized by computing partial sums separately.
– During the exponentiation, the different multiplications can be parallelized by computing partial products separately.

• The qubit number can be reduced using another slicing of the code structure, at the cost of a longer computation time. For example, if one chooses to cut the tetrahedron into slices parallel to a facet of this tetrahedron, we estimate that a 2 048-bit RSA integer could be factorized with 6 628 qubits in the processor in 354 days.

3. Decoupling the gain from 3D gauge color code and multimode memory

Two new design elements have been proposed in this manuscript: the use of 3D gauge color codes and an architecture using a multimode memory. We here separate them to get insight into the improvement brought by each.

The main motivation to use 3D gauge color codes is to get rid of the magic state factory needed for implementing non-Clifford gates in the surface code. However, the transversality of the T gate on 3D gauge color codes is strongly linked with the dimensionality, and 2D color codes cannot directly achieve it [23, 52]. There is no direct way to make use of a 3D color code on a 2D grid.

The main advantage brought by the memory is to unload qubits from the processing unit to the memory. Using a memory in the standard approach (2D grid and surface code), for example, we estimate that an RSA-2 048 integer can be factorized in about 68 days using a memory that can store up to 5 million modes and a processor with 184 thousand qubits, 180 thousand being dedicated to the magic state factory and 4 thousand to the logical qubits on the processor. The additional reduction in the processor size in our approach comes from the fact that there is no need for magic state distillation when using 3D gauge color codes. The number of qubits in the processor is kept small because the qubits are released from the memory and processed slice by slice.

Appendix G: Memory requirements to factor RSA-2 048 integers

We would like to first emphasize that the main objective of our project was to evaluate accurately the performance of an architecture in which unprocessed qubits are stored in a quantum memory. The standard approach suffers from the need for millions of individually controlled qubits, and several research entities are dedicating large teams of engineers to tackle this challenge. We have shown through Shor's algorithm that the use of a quantum memory significantly reduces the number of qubits in the processor, through a significant change in the way the information is processed and protected against errors. Our results hence provide a solution to an engineering problem and turn it into a physics problem: the implementation of a faithful and multimode memory. Before discussing the requirements on the memory in detail, let us clearly define the notion of multimode memory [53].

From an algorithm point of view, "spatial modes" are stored modes that can be accessed in constant time, while "temporal modes" can only be recovered sequentially (first stored, first released). In the proposed implementation based on spin-echo, temporal modes correspond to different time slots: photons arriving in different time bins are re-emitted sequentially after spin refocusing. Spatial modes correspond either to different spatial (transverse) modes of a cavity or to different cavities (with the possibility to combine both). As discussed in the main text, it is possible to use temporal multiplexing only, at the cost of increasing the run-time.

We now estimate that the factorization of RSA-2 048 integers with the proposed architecture would require a memory with the following characteristics:

• Error probability for a transfer to memory, storage, and retrieval less than 0.1 %. Note that this requirement for a complete write/read cycle from memory is likely very conservative. Indeed, the threshold value of error correction is mainly determined by the errors happening during the stabilizer measurements. We thus conjecture that the error correction could handle a higher error rate for those specific operations. The effect of this strongly asymmetric noise between the memory and processor operations is still under investigation, and we choose to stick to the conservative hypothesis for this article.

• The information stored in a given memory mode can be mapped to 3 qubits of the processor: two for the 2-qubit gates (depending on whether the physical qubit belongs to the logical control or target qubit) and one for the error correction and 1-qubit gates. There is no need for all-to-all connectivity.

Appendix H: Realization combining a rare-earth doped solid and a superconducting resonator

∗ elie.gouzien@cea.fr
† https://quantum.paris

[1] M. Kjaergaard, M. E. Schwartz, J. Braumüller, P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D. Oliver, Superconducting Qubits: Current State of Play, Annual Review of Condensed Matter Physics 11, 369 (2020), 1905.13641.
[2] L. Lamata, A. Parra-Rodriguez, M. Sanz, and E. Solano, Digital-analog quantum simulations with superconducting circuits, Advances in Physics: X 3, 1457981 (2018), 1711.09810.
[3] C. Gidney and M. Ekerå, How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits, Quantum 5, 433 (2021), 1905.09749.
[4] Y. R. Sanders, D. W. Berry, P. C. S. Costa, L. W. Tessler, N. Wiebe, C. Gidney, H. Neven, and R. Babbush, Compilation of Fault-Tolerant Quantum Heuristics for Combinatorial Optimization, PRX Quantum 1, 020312 (2020), 2007.07391.
[5] J. Lee, D. W. Berry, C. Gidney, W. J. Huggins, J. R. McClean, N. Wiebe, and R. Babbush, Even More Efficient Quantum Computations of Chemistry Through Tensor Hypercontraction, PRX Quantum 2, 030305 (2021), 2011.03494.
[6] D. Kielpinski, C. Monroe, and D. J. Wineland, Architecture for a large-scale ion-trap quantum computer, Nature 417, 709 (2002).
[7] D. D. Thaker, T. S. Metodi, A. W. Cross, I. L. Chuang, and F. T. Chong, Quantum Memory Hierarchies: Efficient Designs to Match Available Parallelism in Quantum Computing, in 33rd International Symposium on Computer Architecture (ISCA'06) (IEEE, 2006) pp. 378–390, quant-ph/0604070.
[8] Z.-L. Xiang, S. Ashhab, J.-Q. You, and F. Nori, Hybrid quantum circuits: Superconducting circuits interacting with other quantum systems, Reviews of Modern Physics 85, 623 (2013), 1204.2137.
[9] G. Kurizki, P. Bertet, Y. Kubo, K. Mølmer, D. Petrosyan, P. Rabl, and J. Schmiedmayer, Quantum technologies with hybrid systems, Proceedings of the National Academy of Sciences 112, 3866 (2015), 1504.00158.
[10] C. Grezes, Y. Kubo, B. Julsgaard, T. Umeda, J. Isoya, H. Sumiya, H. Abe, S. Onoda, T. Ohshima, K. Nakamura, I. Diniz, A. Auffeves, V. Jacques, J.-F. Roch, D. Vion, D. Esteve, K. Mølmer, and P. Bertet, Towards a spin-ensemble quantum memory for superconducting qubits, Comptes Rendus Physique 17, 693 (2016), 1510.06565.
[11] H. Bombín and M. A. Martin-Delgado, Exact topological quantum order in D = 3 and beyond: Branyons and brane-net condensates, Physical Review B 75, 075103 (2007), cond-mat/0607736.
[12] P. W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, in Proceedings 35th Annual Symposium on Foundations of Computer Science (IEEE Comput. Soc. Press, 1994) pp. 124–134.
[13] M. Ekerå and J. Håstad, Quantum Algorithms for Computing Short Discrete Logarithms and Factoring RSA Integers, in Post-Quantum Cryptography, Lecture Notes in Computer Science, Vol. 10346, edited by T. Lange and T. Takagi (Springer International Publishing, 2017) pp. 347–363, 1702.00249.
[14] R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Communications of the ACM 21, 120 (1978).
[15] W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Transactions on Information Theory 22, 644 (1976).
[16] Information Technology Laboratory, Digital Signature Standard (DSS) (2013).
[17] P. W. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, SIAM Journal on Computing 26, 1484 (1997), quant-ph/9508027.
[18] M. Ekerå, Modifying Shor's algorithm to compute short discrete logarithms (2016), https://eprint.iacr.org/2016/1128.
[19] M. Ekerå, On post-processing in the quantum algorithm for computing short discrete logarithms (2017), https://eprint.iacr.org/2017/1122.
[20] M. Ekerå, Quantum algorithms for computing general discrete logarithms and orders with tradeoffs (2018), https://eprint.iacr.org/2018/797.
[21] C. Gidney, Windowed quantum arithmetic, 1905.07682 (2019).
[22] Code is available at https://github.com/ElieGouzien/factoring_with_memory.
[23] H. Bombín, Gauge color codes: optimal transversal gates and gauge fixing in topological stabilizer codes, New Journal of Physics 17, 083002 (2015), 1311.0879.
[24] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads towards fault-tolerant universal quantum computation, Nature 549, 172 (2017), 1612.07330.
[25] H. Bombín, Single-Shot Fault-Tolerant Quantum Error Correction, Physical Review X 5, 031043 (2015), 1404.5504.
[26] B. J. Brown, N. H. Nickerson, and D. E. Browne, Fault-tolerant error correction with the gauge color code, Nature Communications 7, 12302 (2016), 1503.08217.
[27] S. J. Devitt, W. J. Munro, and K. Nemoto, Quantum error correction for beginners, Reports on Progress in Physics 76, 076001 (2013), 0905.2794.
[28] M. Afzelius, N. Sangouard, G. Johansson, M. U. Staudt, and C. M. Wilson, Proposal for a coherent quantum memory for propagating microwave photons, New Journal of Physics 15, 065008 (2013), 1301.1858.
[29] M. Afzelius and C. Simon, Impedance-matched cavity quantum memory, Physical Review A 82, 022310 (2010), 1004.2469.
[30] T. Chanelière, G. Hétet, and N. Sangouard, Quantum Optical Memory Protocols in Atomic Ensembles, in Advances In Atomic, Molecular, and Optical Physics, Vol. 67 (Elsevier, 2018) Chap. 2, pp. 77–150, 1801.10023.
[31] P. A. Bushev, A. K. Feofanov, H. Rotzinger, I. Protopopov, J. H. Cole, C. M. Wilson, G. Fischer, A. V. Lukashenko, and A. V. Ustinov, Ultralow-power spectroscopy of a rare-earth spin ensemble using a superconducting resonator, Physical Review B 84, 060501 (2011), 1102.3841.
[32] M. U. Staudt, I.-C. Hoi, P. Krantz, M. Sandberg, M. Simoen, P. A. Bushev, N. Sangouard, M. Afzelius, V. S. Shumeiko, G. Johansson, P. Delsing, and C. M. Wilson, Coupling of an erbium spin ensemble to a superconducting resonator, Journal of Physics B: Atomic, Molecular and Optical Physics 45, 124019 (2012), 1201.1718.
[33] S. Probst, H. Rotzinger, S. Wünsch, P. Jung, M. Jerger, M. Siegel, A. V. Ustinov, and P. A. Bushev, Anisotropic Rare-Earth Spin Ensemble Strongly Coupled to a Superconducting Resonator, Physical Review Letters 110, 157001 (2013), 1212.2856.
[34] R. B. Griffiths and C.-S. Niu, Semiclassical Fourier Transform for Quantum Computation, Physical Review Letters 76, 3228 (1996), quant-ph/9511007.
[35] C. Gidney, Approximate encoded permutations and piecewise quantum adders, 1905.08488 (2019).
[36] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P. Moulton, A new quantum ripple-carry addition circuit, quant-ph/0410184 (2004).
[37] C. Gidney, Halving the cost of quantum addition, Quantum 2, 74 (2018), 1709.06648.
[38] V. Vedral, A. Barenco, and A. Ekert, Quantum networks for elementary arithmetic operations, Physical Review A 54, 147 (1996), quant-ph/9511018.
[39] C. Zalka, Shor's algorithm with fewer (pure) qubits, quant-ph/0601097 (2006).
[40] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven, Encoding Electronic Spectra in Quantum Circuits with Linear T Complexity, Physical Review X 8, 041015 (2018), 1805.03662.
[41] D. W. Berry, C. Gidney, M. Motta, J. R. McClean, and R. Babbush, Qubitization of Arbitrary Basis Quantum Chemistry Leveraging Sparsity and Low Rank Factorization, Quantum 3, 208 (2019), 1902.02134.
[42] T. G. Draper, Addition on a Quantum Computer, quant-ph/0008033 (2000).
[43] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore, A logarithmic-depth quantum carry-lookahead adder, Quantum Information and Computation 6, 351 (2006), quant-ph/0406142.
[44] Due to the measurement-based uncomputation, it is often more efficient to implement the non-Clifford operations through AND computation. In case of direct implementation of Toffoli gates, the circuit cost could be slightly reduced.
[45] D. Poulin, Stabilizer Formalism for Operator Quantum Error Correction, Physical Review Letters 95, 230504 (2005), quant-ph/0508131.
[46] P. Zanardi, D. A. Lidar, and S. Lloyd, Quantum Tensor Product Structures are Observable Induced, Physical Review Letters 92, 060402 (2004), quant-ph/0308043.
[47] M. E. Beverland, A. Kubica, and K. M. Svore, Cost of Universality: A Comparative Study of the Overhead of State Distillation and Code Switching with Color Codes, PRX Quantum 2, 020341 (2021), 2101.02211.
[48] A. M. Kubica, The ABCs of the color code: A study of topological quantum codes as toy models for fault-tolerant quantum computation and quantum phases of matter, Ph.D. thesis (2018).
[49] A. Kubica and N. Delfosse, Efficient color code decoders in d ≥ 2 dimensions from toric code decoders, 1905.07393 (2019).
[50] A. Kubica, M. E. Beverland, F. Brandão, J. Preskill, and K. M. Svore, Three-Dimensional Color Code Thresholds via Statistical-Mechanical Mapping, Physical Review Letters 120, 180501 (2018), 1708.07131.
[51] F. Boudot, P. Gaudry, A. Guillevic, N. Heninger, E. Thomé, and P. Zimmermann, Factorization of RSA-250 (2020).
[52] S. Bravyi and R. König, Classification of Topologically Protected Gates for Local Stabilizer Codes, Physical Review Letters 110, 170503 (2013), 1206.1609.
[53] C. Simon, H. de Riedmatten, M. Afzelius, N. Sangouard, H. Zbinden, and N. Gisin, Quantum Repeaters with Photon Pair Sources and Multimode Memories, Physical Review Letters 98, 190503 (2007), quant-ph/0701239.
[54] M. Le Dantec, M. Rancic, E. Flurin, D. Vion, D. Esteve, P. Bertet, P. Goldner, T. Chanelière, B. Sylvain, S. Lin, and R. B. Liu, Twenty millisecond electron-spin coherence in an erbium doped crystal, in Bulletin of the American Physical Society (American Physical Society, 2021).
[55] Y. Kubo, C. Grezes, A. Dewes, T. Umeda, J. Isoya, H. Sumiya, N. Morishita, H. Abe, S. Onoda, T. Ohshima, V. Jacques, A. Dréau, J.-F. Roch, I. Diniz, A. Auffeves, D. Vion, D. Esteve, and P. Bertet, Hybrid Quantum Circuit with a Superconducting Qubit Coupled to a Spin Ensemble, Physical Review Letters 107, 220501 (2011), 1110.2978.
[56] V. Ranjan, J. O'Sullivan, E. Albertinale, B. Albanese, T. Chanelière, T. Schenkel, D. Vion, D. Esteve, E. Flurin, J. J. L. Morton, and P. Bertet, Multimode Storage of Quantum Microwave Fields in Electron Spins over 100 ms, Physical Review Letters 125, 210505 (2020), 2005.09275.