
Factoring 2048-bit RSA Integers in 177 Days with 13436 Qubits and a Multimode Memory


Élie Gouzien, Nicolas Sangouard

To cite this version:


Élie Gouzien, Nicolas Sangouard. Factoring 2048-bit RSA Integers in 177 Days with 13436 Qubits and a Multimode Memory. Physical Review Letters, 2021, 127 (14), doi:10.1103/PhysRevLett.127.140503. hal-03358148

HAL Id: hal-03358148


https://hal.science/hal-03358148
Submitted on 29 Sep 2021

arXiv:2103.06159

doi:10.1103/PhysRevLett.127.140503
Factoring 2 048-bit RSA Integers in 177 Days with 13 436 Qubits
and a Multimode Memory
Élie Gouzien and Nicolas Sangouard
Université Paris–Saclay, CEA, CNRS, Institut de Physique Théorique, 91 191 Gif-sur-Yvette, France
(Dated: September 29, 2021)
We analyze the performance of a quantum computer architecture combining a small processor
and a storage unit. By focusing on integer factorization, we show a reduction by several orders
of magnitude of the number of processing qubits compared with a standard architecture using a
planar grid of qubits with nearest-neighbor connectivity. This is achieved by taking advantage
of a temporally and spatially multiplexed memory to store the qubit states between processing
steps. Concretely, for a characteristic physical gate error rate of $10^{-3}$, a processor cycle time of
1 microsecond, factoring a 2 048-bit RSA integer is shown to be possible in 177 days with 3D gauge
color codes assuming a threshold of 0.75 % with a processor made with 13 436 physical qubits and a
memory that can store 28 million spatial modes and 45 temporal modes with 2 hours’ storage time.
By inserting additional error-correction steps, storage times of 1 second are shown to be sufficient
at the cost of increasing the run-time by about 23 %. Shorter run-times (and storage times) are
achievable by increasing the number of qubits in the processing unit. We suggest realizing such
an architecture using a microwave interface between a processor made with superconducting qubits
and a multiplexed memory using the principle of photon echo in solids doped with rare-earth ions.

Introduction. Superconducting qubits form building blocks of one of the most advanced platforms for realizing quantum computers [1, 2]. The standard architecture consists of laying superconducting qubits in a 2D grid and computing using only neighboring interactions. Recent estimations showed however that fault-tolerant realizations of various quantum algorithms with this architecture would require millions of physical qubits [3–5]. These performance analyses naturally raise the question of an architecture better exploiting the potential of superconducting qubits.

In developing a quantum architecture we have much to learn from classical architectures. Realizations using trapped ions for example combine processing with storage units [6]. The authors of Ref. [7] realized that key quantum algorithms are mostly sequential, meaning that we may only need a small computing block for all the qubits in the storage unit in this architecture. Ongoing experimental efforts aim at exploiting this idea to reduce the number of superconducting qubits in the standard approach to quantum computing by adding a quantum memory implemented with spins or atoms [8–10]. A detailed analysis of the performance of this hybrid architecture is however missing.

We here report on such an analysis by considering a quantum memory that can store multiple spatial transverse and temporal modes. The memory can be thought of as a qubit register in which the address of each qubit is identified by a temporal and a spatial index. When a given qubit needs to be processed, its state is released and mapped into the processor by means of a microwave field in a temporal and spatial mode corresponding to the qubit address. When the processing is done, the qubit state is mapped back to the memory and stored until another processing operation is needed.

More precisely, we use 3D error-correction codes [11] in which the address of each (dressed) logical qubit is encoded into a 3D structure of physical addresses, two dimensions being encoded in space and one in time (see Figure 1). Error-correction and logical gates are applied by sequentially releasing physical qubits corresponding to different "horizontal" slices (with different temporal indexes) and by processing each slice (with the same temporal indexes) simultaneously.

We assess the performance of this architecture through a version of Shor's algorithm [12] proposed by Ekerå and Håstad [13]. The algorithm is a threat for widely used cryptosystems based either on the factorization [14] or the discrete logarithm problem [15, 16]. It can also be considered as a certification tool to check the proper functioning of an actual quantum computer as its outcome can be verified efficiently. Last but not least, the cost of its implementation has been evaluated using plausible physical assumptions for a large scale processor with a standard 2D grid of superconducting qubits (a characteristic physical gate error rate of $10^{-3}$, a surface code cycle time of 1 µs, and a reaction time of 10 µs): it was estimated that it should be possible to factor a 2 048-bit integer, typically used in the Rivest–Shamir–Adleman (RSA) cryptosystem, in 8 hours with 20 million qubits [3].

By taking this estimation as a reference, we estimate the cost of implementing the same version of Shor's algorithm in terms of physical processing qubit number,
multimode capacity, memory storage time, and run-time. Our evaluation is given in the case where the processor is made with two (dressed) logical qubit slices. Under the assumptions used in Ref. [3] for the gate error rate and the cycle time, we show that it should be possible to factor a 2 048-bit RSA integer in 177 days using a multimode memory with a storage time of about 2 hours and a processor including 13 436 physical qubits. This is a reduction by more than 3 orders of magnitude of the number of physical qubits, as compared to the standard architecture without memory [3], at the cost of a ≈ 500 times longer run-time. By inserting additional error-correction steps, we show that the storage time can be significantly reduced at the cost of a slight increase of run-time. We also explain how shorter run-times and storage times are achievable at the cost of increasing the number of qubits in the processing unit. We propose a realization of such an architecture using a microwave interface between a processor made with superconducting qubits and a multiplexed memory using the principle of photon echo in solids doped with rare-earth ions embedded in cavities.

[Figure 1: schematic of the memory, with axes "temporal modes" and "spatial modes" and one logical qubit highlighted, placed above the processor.]

Figure 1. Quantum computer architecture using a processor made with a 2D grid of qubits and a memory operating as a qubit register where the address of each qubit is specified by a temporal and spatial index. Only (dressed) logical qubits are represented; additional ancillary qubits are used for measuring the operators for error correction.

Principles of (a variant of) Shor's algorithm. Consider the factorization of $N = p \times q$, the product of two prime numbers of similar sizes, $p$ and $q$. We denote by $n$ the number of bits involved in the binary representation of $N$, that is $2^{n-1} \le N < 2^n$. While no efficient classical factorization algorithm is known, Shor's algorithm and its variants factor $N$ with a complexity polynomial in $n$ [12, 13, 17–20].

The version of Shor's factorization algorithm proposed by Ekerå and Håstad [13] starts by randomly selecting an integer $g$ in the multiplicative group of integers modulo $N$, $\mathbb{Z}_N^*$, and defining $h = g^{(N-1)/2}$. As the order of $\mathbb{Z}_N^*$ is $\varphi(N) = (p-1)(q-1)$, we have $h = g^{(pq-p-q+1)/2}\, g^{(p+q-2)/2} \equiv g^{(p+q-2)/2} \bmod N$, where the last equivalence is the result of the Chinese remainder theorem. Under the assumption that the order $r$ of $g$ (the smallest positive integer such that $g^r \equiv 1 \bmod N$) satisfies $r > (p+q-2)/2$, computing the discrete logarithm of $h$ modulo $N$, as detailed later, yields $l = (p+q-2)/2$. For large $N$, the assumption is verified with a high probability [13]. Using $N = pq$ and $l = (p+q-2)/2$, where $N$ and $l$ are both known, $p$ and $q$ are recovered by choosing one solution of the equation $N = p(2l + 2 - p)$, and then exploiting $q = 2l + 2 - p$.

The discrete logarithm is computed in three steps. First, the exponentiation $(e_1, e_2) \to g^{e_1} h^{-e_2}$ is applied once on two quantum registers prepared in a superposition of every possible value of $e_1$ and $e_2$, respectively. Two quantum Fourier transforms are then applied independently to the two registers before being measured. Finally, a classical postprocessing extracts the discrete logarithm $l$ of $h$ modulo $N$ from the measurement results. Because the measurements are performed directly after the Fourier transform, the cost of exponentiation largely dominates the cost of Ekerå and Håstad's algorithm (see Appendix A).

Number of gates. The modular exponentiation needed in Ekerå and Håstad's algorithm, i.e., the operation $|e\rangle|1\rangle \mapsto |e\rangle|g^e \bmod N\rangle$, with the input $e$ and the output $g^e \bmod N$ encoded on $n_e$ and $n$ bits, respectively, can be decomposed into $n_e$ multiplications, each being decomposed into $2n$ controlled additions of integers of typical size $n$ and one controlled swap between two registers of size $n$, giving a total number of $2 n_e n$ ($n_e$) controlled additions (swaps between registers, respectively) (see Appendix B for details). Each modular addition is obtained with a standard adder circuit at the cost of a specific representation, the coset representation (see Appendix C), which adds $m$ additional qubits to the register. A controlled swap operation between two qubits can be performed using two controlled NOTs (CNOTs) and one Toffoli gate. Hence, the total cost for controlled swaps operating on two registers using $n + m$ qubits is $2(n+m)$ CNOTs and $n+m$ Toffoli gates (see Appendix B). For the controlled addition, we can use a semi-classical adder whose mean cost for integers of size $n+m$ is $5.5(n+m) - 9$ CNOTs and $2(n+m) - 1$ Toffoli gates (see Appendix B). Given the number of gates in controlled addition and swap operations, the number of additions and swaps in the multiplication, and the number of multiplications in the modular exponentiation, the cost of factorization can easily be estimated (see Appendix B). This cost can however be reduced using windowed arithmetic circuits [21]. The basic idea consists of grouping the bits of $e$ by blocks (each including $w_e$ bits) for controlling each multiplication, hence reducing the number of these multiplications. Similarly, for each multiplication, input bits are grouped (in blocks including $w_m$ bits) to reduce the number of additions composing it. As detailed in Appendix D, the cost of exponentiation is dominated in this case by $2\,\frac{n_e (n+m)\, n}{w_e w_m}$ 1-qubit gates, $\left[2^{w_e + w_m} n + 12(n+m)\right] \frac{n_e (n+m)}{w_e w_m}$ CNOTs, and $4\,\frac{n_e (n+m)^2}{w_e w_m}$ Toffoli gates. We emphasize that this is a first order estimation. In the code used to compute the required resources and find optimal parameters, the complete formulae have been used [22].

Error correction. The error correction is achieved using 3D gauge color codes, a family of subsystem codes [11]. A first code admits a transversal implementation of CNOT and Hadamard gates while a second code accepts a transversal implementation of the non-Clifford T gate. Switching between the two codes gives a universal set of gates without the need for state distillation [23], contrary to standard ways of operating the surface code [24].

The two codes are based on a shared geometrical structure: a large tetrahedron constructed from elementary tetrahedrons (see Appendix E for details). A physical qubit is attributed to each elementary tetrahedron. As in any subsystem code, the stabilized subspace is split into a tensor product of the (bare) logical and gauge qubits (the dressed logical qubit includes the bare logical qubit and gauge qubits). A set of operators, the generators of gauge operators, are measured, each being the product of (up to six) X (or Z) operators associated with qubits corresponding to tetrahedrons sharing the same edge. From these measurements, the values of stabilizers of the two codes are deduced. In the code used for implementing H and CNOT gates, the stabilizers are defined from the vertices, i.e., the product of X (or Z) operators associated with qubits corresponding to tetrahedrons sharing the same vertex. In the code used for implementing T gates, the stabilizers are defined from the vertices for X operators and from the edges for Z operators. The value of an operator represented by a vertex is classically recovered by multiplying the measurement results of combinations of specific edges ending at the given vertex. Several combinations are possible, giving redundancies that can be exploited to achieve fault-tolerant error correction with only one run of measurements [25]. The structure of codes in which the stabilized subsystem is the tensor product of the gauge and (bare) logical subsystems guarantees that measurements of gauge operators do not reveal the value of the (bare) logical qubit (see Appendix E).

To account for the additional resource needed to implement these codes, we use an estimation of the residual error probability on one logical qubit given in [26, eq. (4)]

$$p_{\text{logical}} = A \exp\!\left(\alpha \log\!\left(\frac{p}{p_{\text{th}}}\right) d^{\beta}\right) \qquad (1)$$

where $A \approx 0.033$, $\alpha \approx 0.516$, $\beta \approx 0.822$, $p$ is the error probability per physical qubit, $d$ the code distance, which is related to the number of physical qubits per logical qubit (see below), and $p_{\text{th}}$ the fault-tolerance threshold. While the circuit-level threshold is unknown, we choose $p_{\text{th}} = 0.75\,\%$ as a working hypothesis and give in Appendix E 4 the run-time and the resource as a function of the code threshold.

Architecture. For simplicity, the tetrahedral structure of the error correction (see Appendix E) can be included into a large cube in which physical qubits are now represented by elementary cubes (see Figure 1). The large cubes are stored into the memory and loaded by slices into the processor when they need to be processed. We size the processor such that one slice of two large cubes can be loaded simultaneously, which is convenient to perform 2-qubit gates efficiently. Each gate is immediately followed by an error-correction round on the processed qubits. This is done by reloading again each slice sequentially in the processor and by measuring the gauge generators (before recovering classically the code stabilizers), each of them using up to six 2-qubit gates, one auxiliary qubit and one measurement of this auxiliary [23, 27]. Note that the codes of interest are 3D local and the auxiliary qubits only need to keep coherence for the time of loading and measuring two successive slices for successfully performing a stabilizer measurement. Once the syndromes are obtained and the errors are detected, the correction of these errors is delayed and merged with the next operation applied on the qubit to be corrected. Further note that all-to-all connectivity between the logical qubits is achieved if each physical address in the memory can be mapped to three physical qubits in the processor: two for the 2-qubit gates (depending on whether the physical qubit is the logical control or target qubit) and one for the error correction. For achieving a code distance $d$, the number of physical qubits in the processor is $n_{\text{qubits}} = 2 \times 2 \times \frac{3d^2 + 2d - 3}{2}$, corresponding to two logical qubit slices (see Appendix E) and including the ancillary qubits (essentially one per physical qubit) needed for stabilizer measurements. For a code distance $d$, we approximate the time it takes to perform one (1-qubit or 2-qubit) logical gate by $2(d-2)t_c$, where $t_c$ is the cycle time of the 2D processor (time to load one qubit slice; to measure the stabilizers, which is longer than the gate operation; and to reload the slice into the memory) and the factor 2 comes from the fact that the gate is immediately followed by an error-correction round.

Cost evaluation. To evaluate the resources required for integer factorization, we consider the total number
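The recovery of $p$ and $q$ from $N$ and $l$ described above is purely classical and can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `recover_factors` and the toy values $N = 35$, $g = 2$ are assumptions chosen for readability, and $l$ is supplied by hand instead of being produced by the quantum discrete-logarithm step.

```python
from math import isqrt


def recover_factors(N: int, l: int) -> tuple[int, int]:
    """Solve N = p(2l + 2 - p) for p, then set q = 2l + 2 - p.

    Hypothetical helper: the quadratic p^2 - (2l + 2) p + N = 0 has
    the two factors of N as roots when l = (p + q - 2) / 2.
    """
    s = 2 * l + 2              # s = p + q
    disc = s * s - 4 * N       # equals (q - p)^2 for a valid l
    root = isqrt(disc)
    if disc < 0 or root * root != disc:
        raise ValueError("l is not (p + q - 2)/2 for this N")
    p = (s - root) // 2        # smaller root of the quadratic
    q = s - p
    return p, q


# Toy check (assumed values): N = 5 * 7, g = 2, l = (5 + 7 - 2) / 2 = 5.
N, g, l = 35, 2, 5
h = pow(g, (N - 1) // 2, N)    # h = g^((N-1)/2) mod N
assert h == pow(g, l, N)       # h is congruent to g^((p+q-2)/2) mod N
p, q = recover_factors(N, l)
assert p * q == N
```

In this toy case the order of $g = 2$ modulo 35 is 12, which is larger than $l = 5$, so the assumption $r > (p+q-2)/2$ stated in the text holds.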
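Eq. (1) can be evaluated directly to get a feeling for how the residual logical error scales with the code distance. The sketch below only transcribes the formula with the constants quoted in the text; the function name `logical_error` is an assumption, and the distances looped over are illustrative inputs, not values taken from the paper.

```python
import math

# Fit constants quoted with Eq. (1)
A, ALPHA, BETA = 0.033, 0.516, 0.822


def logical_error(p: float, p_th: float, d: int) -> float:
    """Residual error per logical qubit, p_logical = A exp(alpha log(p/p_th) d^beta)."""
    return A * math.exp(ALPHA * math.log(p / p_th) * d ** BETA)


# Working hypothesis of the text: p = 1e-3 and p_th = 0.75 %.
for d in (7, 15, 31, 47):      # illustrative distances (assumed)
    print(d, logical_error(1e-3, 0.75e-2, d))
```

Below threshold ($p < p_{\text{th}}$) the logarithm is negative, so the logical error decreases as $d$ grows; at $p = p_{\text{th}}$ the expression reduces to the prefactor $A$.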
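The processor size and logical gate time quoted in the Architecture paragraph are simple functions of the code distance $d$, and can be tabulated as in the sketch below. The function names are assumptions. The distances $d = 47$ and $d = 7$ are not quoted in the text: they are inferred here by inverting the qubit-count formula against the 13 436-qubit and 316-qubit processors mentioned in the paper.

```python
def processor_qubits(d: int) -> int:
    """n_qubits = 2 * 2 * (3 d^2 + 2 d - 3) / 2: two logical slices plus ancillas."""
    return 2 * (3 * d * d + 2 * d - 3)


def logical_gate_time(d: int, t_c: float = 1e-6) -> float:
    """Approximate time 2 (d - 2) t_c of one logical gate plus its correction round."""
    return 2 * (d - 2) * t_c


# Inferred distances (assumption): d = 47 and d = 7.
assert processor_qubits(47) == 13436
assert processor_qubits(7) == 316
print(logical_gate_time(47))   # seconds, for a cycle time t_c of 1 microsecond
```

With these inferred distances, $d - 2$ equals 45 and 5, which matches the numbers of temporal modes quoted for the two setups.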
of gates involved in the logical circuit. The total run-time for one attempt is obtained by multiplying the gate number by the time it takes to perform one gate, while the success probability is deduced from the logical error probability [Eq. (1)]. Following Ref. [3], we consider a cycle time of $t_c = 1$ µs and a mean error per physical qubit and per gate of $p = 10^{-3}$. Note that the mean error per gate now includes errors during reading of and writing into the memory.

The cost evaluation is finally obtained by optimizing the two window parameters $w_m$ and $w_e$, the coset representation padding $m$ and the code distance $d$ in order to minimize the volume $t_{\text{exp}} \times n_{\text{qubits}}$. Here $t_{\text{exp}} = t / p_s$ is the average time to obtain the result (several attempts might be necessary), with $t$ the computation time per attempt and $p_s$ the success probability.

[Figure 2: plot versus $n$ (32 to 2048) of the qubit number in kiloqubits (left axis) and of the expected time, from minutes to months (right axis).]

Figure 2. Number of qubits in the processor and run-time to factor n-bit RSA integers with a computer architecture using a multimode memory.

Results. The required resources to factor an n-bit RSA integer are presented in Figure 2 and discussed in Appendix F. Our estimation suggests that the factorization of a 2 048-bit integer, corresponding to the most common RSA key size, would be possible in about 177 days with a processor having only 13 436 qubits. Concerning the memory, we made the hypothesis of an error per cycle of $p = 10^{-3}$, including the reading and writing error. As previously discussed, we need a memory for which each mode can be mapped to three different qubits of the processor. We estimated the maximum time between storage and readout of the same qubit to be less than 2 hours. A memory with a storage time of at least 2 hours is however not necessary, as error-correction steps can be implemented periodically at the cost of increasing the run-time. Error correction of all the qubits stored in the memory is estimated to take 186 ms with a processor having 13 436 qubits, meaning that the storage time simply needs to be longer than 186 ms. Applying a correction every second, for example, would increase the run-time by about 23 %. Note also that both the run-time and storage time can be reduced by increasing the size of the processor (see Appendix F). We also estimated that 28 million spatial modes and 45 temporal modes need to be stored. Note that the number of stored modes does not enter in the volume and is thus not optimized (see Appendix G). Note also that qubit addresses in the memory can be identified by temporal indexes only, at the cost of longer run-time when photon-echo type protocols are used; see below for a concrete example.

Implementation. Our proposal provides a viable solution to get rid of the individual control of millions of qubits, but the challenge now lies in the realization of an efficient multimode quantum memory. As shown in Ref. [28], such a memory could be implemented using a solid-state spin ensemble ($\bar{N}$ spins with an inhomogeneous spectral broadening $\Gamma$), resonantly coupled (with single spin coupling rate $g$) to a frequency tunable single-mode microwave resonator (of length $L$ and with damping rate $\kappa$ to an external transmission line). The resonator serves to enhance microwave absorption and re-emission by the spins. In particular, unit efficiency absorption of a microwave field can be realized if the finesse $F$ of the resonator matches the single-path absorption $\alpha L$ of spins, $F = (\alpha L)^{-1}$, i.e., if the cooperativity $C = \frac{g^2 \bar{N}}{\kappa \Gamma} = \alpha L \times F = 1$ [29]. Once absorbed, the microwave field can be re-emitted by time reversing the inhomogeneous dephasing using a spin echo technique [30]. Detuning the resonator off and on resonance at the right time, the spin coherence is recovered, leading to a noise-free, unit re-emission probability of the stored photon if $C = 1$ [28]. In the regime $\kappa \gg g\sqrt{\bar{N}} \gg \Gamma$, the memory bandwidth is given by $4\Gamma$ [28], meaning that any input with a spectrum, say, ten times thinner, i.e., $4\Gamma/10$, can be stored with close to unit efficiency. Furthermore, the time duration during which an optical coherence can be preserved is limited by the inverse of the homogeneous linewidth $\gamma_h$ [28]. Assuming that the storage efficiency is unchanged if the storage time is a hundred times shorter than $\gamma_h^{-1}$, this means that the number of temporal modes that can be stored with almost unit efficiencies is roughly given by $\Gamma/(250\gamma_h)$. Interestingly, a well-identified temporal mode can be released while keeping all the other modes in the memory by appropriately detuning the resonator off and on resonance with the spins, at the cost of introducing a dead time between two readouts of half the duration of the stored train of pulses on average.

To give an idea of what could be realized in a near future, we estimate that it should be possible to factor 35 in about 1 min using the exact algorithm presented here (with windowed arithmetic and 3D color codes) and a setup combining a memory for storing 38 logical qubits (3 002 spatial modes and 5 temporal modes) and a processor with 316 physical qubits (we estimate that more than 60 000 qubits would be needed with a standard 2D grid and surface code). If instead of using a spatially and temporally multiplexed memory, the qubits are stored in the same spatial mode and are identified by (6 650) temporal addresses only, we evaluate the same factorization to be possible in about 1 day using a memory bandwidth $4\Gamma = 2\pi \times 48$ MHz and taking into account the corresponding dead time between two memory readouts. In this case, error correction of all the qubits stored in the memory is estimated to take 132 ms, meaning the storage time needs to be longer than 132 ms. For a memory bandwidth $4\Gamma = 2\pi \times 120$ MHz, the same factorization would take 9 hours, and error correction is estimated to take 53 ms. As discussed in Appendix H, these requirements can realistically be met with a realization of the memory protocol described before, combining a solid doped with rare-earth ions and a superconducting microwave resonator [31–33].

Conclusion. We have shown that the use of a quantum memory for quantum computing is appealing, as unprocessed qubits can be loaded into the memory, which significantly reduces the size of the processor compared with standard architectures where all qubits are kept in the processor. All-to-all connectivity between logical qubits is reached if each address in the memory can be mapped to only 3 qubits in the processor. The use of a memory allows one to exploit a 3D code on a 2D processor. If we allow each memory mode to be mapped to any qubit in the processor, all-to-all connectivity between physical qubits can be obtained, hence offering many opportunities for error correction and for implementing algorithms with gates operating between non-neighboring qubits.

We acknowledge M. Afzelius, J.-D. Bancal, P. Bertet, E. Flurin, P. Sekatski, X. Valcarce and J. Zivy for stimulating discussions and/or for critically reviewing the manuscript. We acknowledge funding by the Institut de Physique Théorique (IPhT), Commissariat à l'Énergie Atomique et aux Energies Alternatives (CEA) and the Region Île-de-France in the framework of DIM SIRTEQ.

Appendix A: Semi-classical Fourier transform

We here discuss the semi-classical Fourier transform presented in [34] and show that its cost is negligible. The standard way to perform the Fourier transform on $n_e$ qubits is shown in Figure 3a: it requires a sequence of one Hadamard and controlled phase gates for each qubit. In Shor's algorithm as well as in Ekerå and Håstad's version of Shor's algorithm, the qubits are measured right after the Fourier transform, hence explaining the measurements of each qubit at the end of gate sequences in Figure 3a. The simple rearrangement presented in Figure 3b shows that the measurement can be performed right after the Hadamard provided that the following phase gates are classically controlled by the result of this measurement, see Figure 3c. In this case, the successive classically controlled phase gates operating on the same qubit can be merged together, leading to a circuit with one phase gate, one Hadamard gate and one measurement per qubit. When this semi-classical Fourier transform operates on a register made with $n_e$ qubits (the number of bits of the exponent), its cost is linear in $n_e$ and is thus negligible compared to the cubic complexity of the exponentiation.

Appendix B: Decomposition of the exponentiation into elementary gates

In this appendix, we aim to give a clear view of how to decompose the modular exponentiation into elementary gates. The presented method is intended to be simple to understand, but not optimal. A more efficient one is presented in Appendix D.

1. Decomposition of a modular exponentiation into additions

The modular exponentiation needed in Ekerå and Håstad's algorithm, i.e., the operation $|e\rangle|1\rangle \mapsto |e\rangle|g^e \bmod N\rangle$, with the input $e$ and the output $g^e \bmod N$ encoded on $n_e$ and $n$ bits respectively, can be implemented from controlled modular additions as we show now. For simplicity, we omit the modulo in this paragraph.

Let $e_{n_e-1} \ldots e_i \ldots e_0$ be the binary form of $e$. The exponentiation can first be seen as a sequence of multiplications

$$g^e = \prod_{i=0}^{n_e-1} g^{2^i e_i} = \prod_{i=0}^{n_e-1} \left[g^{2^i}\right]^{e_i} \qquad (B1)$$

where each multiplication is controlled by the bit value $e_i$. Figure 4 shows an implementation of such a multiplication in which a quantum register encoding the integer $x$ ends up into an encoding of $x \times g^{2^i e_i}$. It uses two controlled product-additions, i.e., the operation letting $|y\rangle|z\rangle$ unchanged if $|e_i\rangle = |0\rangle$ and mapping $|y\rangle|z\rangle$ into $|y\rangle|z + y \times \gamma\rangle$ ($(y, z, \gamma) \to (x, 0, g^{2^i})$ for
the first product-addition appearing in Figure 4 and $(y, z, \gamma) \to (\bar{x}, x, -g^{-2^i})$ for the second one, where the negative power stands for multiplicative inverse modulo $N$) when $|e_i\rangle = |1\rangle$. In case $|e_i\rangle = |1\rangle$, the mapping is performed by considering the binary representation $y_{n-1} \ldots y_0$ of $y$ and by rewriting the product as

$$y \times \gamma = \sum_{j=0}^{n-1} \gamma\, 2^j y_j = \sum_{j=0}^{n-1} \left(\gamma 2^j\right) y_j. \qquad (B2)$$

As $y_j$ is either 0 or 1, the controlled product-addition can be implemented by a sequence of additions, each of them controlled both by the values of bits $|y_j\rangle$ and $|e_i\rangle$. Figure 5 shows explicitly the decomposition of the first product-addition appearing into each multiplication of the exponentiation.

[Figure 3: circuit diagrams. Panels: (a) Standard quantum Fourier transform. (b) Rearranged Fourier transform. (c) Semi-classical Fourier transform. The successive classically controlled rotation gates can be merged.]

Figure 3. Different versions of the Fourier transform followed by measurements. They are used to convince the reader that the number of gates in the Fourier transform is negligible with respect to the cost of the exponentiation. These three versions are based on the phase gates $\varphi_k$ defined as a $2 \times 2$ matrix with diagonal elements $1$, $e^{2\pi i/2^k}$ and zeros off diagonal. Note that the control and target qubits can be reversed in the representation of each controlled phase gate without changing the result.

[Figure 4: circuit diagram of the controlled modular multiplication.]

Figure 4. Principle of a modular multiplication circuit transforming a quantum register encoding the integer $x$ into a state encoding $x \times g^{2^i e_i}$. A first product-addition operation transforms auxiliary qubits in $|0\rangle$ into $|x \times g^{2^i}\rangle$ if $|e_i\rangle = |1\rangle$. Then, a product-addition applies $+\bar{x}(-g^{-2^i})$ with $\bar{x} = x \times g^{2^i}$ into the register encoding $x$ if $|e_i\rangle = |1\rangle$. A final swapping is applied if $|e_i\rangle = |1\rangle$ to put the quantum register into $|x \times g^{2^i e_i}\rangle$ and resets the auxiliary qubits to $|0\rangle$. Note that all the operations are performed modulo $N$.

[Figure 5: circuit diagram of the product-addition $+x g^{2^i}$ decomposed into additions $+2^j g^{2^i}$, each controlled by $|e_i\rangle$ and $|x_j\rangle$.]

Figure 5. Decomposition of the first product-addition appearing in each element of the decomposition of the exponentiation into multiplications, see Figure 4.

We deduce that the modular exponentiation requires $n_e$ multiplications, each being decomposed into $2n$ controlled additions and 1 controlled swap between two registers, giving a total number of $2 n_e n$ ($n_e$) controlled
additions (swaps between registers, respectively). Each addition needs to be modular, which can be obtained with a specific representation and a standard adder circuit.

2. Coset representation

The basic idea of the coset representation for adding $2^j\gamma$ to a quantum register encoding the integer $z$ is to extend the register for $z$ with $m$ additional qubits and to encode it into the state $\frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m-1} |z + kN\rangle$. Except at the bounds, this state is invariant under the addition of $N$. This implies:

$$\frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m-1} \left|z + 2^j\gamma + kN \bmod 2^{n+m}\right\rangle \approx \frac{1}{\sqrt{2^m}} \sum_{k=0}^{2^m-1} \left|\left(z + 2^j\gamma \bmod N\right) + kN\right\rangle$$

i.e., the modular addition of $2^j\gamma$ in the register of $z$ can be performed with a standard adder, at the cost of a small error which is exponentially suppressed when increasing $m$ [35]. Note that the resource needed to initialize the register is negligible with respect to the resource taken to implement the adder, see Appendix C. Taking into account the increase in register size, this means that $2 n_e (n+m)$ controlled additions and $n_e$ controlled register swaps are needed for realizing Ekerå and Håstad's algorithm.

3. Controlled operations

A controlled swap operation between two qubits (Fredkin gate) can be performed using two CNOTs and one Toffoli gate, see Figure 6b. Hence the total cost for controlled swaps operating on two registers prepared in the coset representation of integers (encoded each with $n+m$ qubits) is $2(n+m)$ CNOTs and $n+m$ Toffoli gates.

For the controlled addition, note first that since we use the coset representation of integers, a circuit for con-

[…] two controlled qubits $|e_i\rangle$ and $|x_j\rangle$ are both in state $|1\rangle$. When such an addition is applied on a quantum register encoding $z'$ using $n+m$ qubits, the block in the dashed box of Figure 6a is repeated $n+m-2$ times, giving a mean cost of $5.5(n+m) - 9$ CNOTs and $2(n+m) - 1$ Toffoli gates.

[Figure 6: circuit diagrams. Panels: (a) Doubly controlled semi-classical addition. (b) Controlled swap.]

Figure 6. Controlled operations. (a): semi-classical adder taking each bit of the classical value $2^j\gamma = \sum_{k=0}^{2} 2^k (2^j\gamma)_k$ and the three qubits register encoding $z'$ as inputs and returning $(2^j\gamma)_k$ and $|s_k\rangle = |(z' + 2^j\gamma \cdot e_i \cdot x_j)_k\rangle$. The block in the dashed box uses on average 5.5 CNOTs and 2 Toffoli gates. (b): Fredkin gate implemented with a Toffoli and two CNOT gates. The controlled swap between registers (as required in Figure 4) is obtained by applying it to each pair of qubits.

4. Number of gates

Given the number of gates in the controlled addition and swap operations, the number of additions and swaps in the multiplication and the number of multiplications in the modular exponentiation, we estimate that factorization takes at leading order $11 n_e (n+m)^2$ CNOTs and $4 n_e (n+m)^2$ Toffoli gates.

Appendix C: Coset representation

Modular addition is typically implemented with vari-
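The per-operation costs quoted in this appendix can be combined into a total gate count for the non-windowed exponentiation, as in the sketch below. It multiplies $2 n_e (n+m)$ controlled additions and $n_e$ register swaps by their CNOT and Toffoli costs and compares the result with the leading-order terms $11 n_e (n+m)^2$ and $4 n_e (n+m)^2$. The function names and the parameter values $n_e = 3072$ and $m = 30$ are illustrative assumptions, not values taken from the paper.

```python
def exact_counts(n: int, n_e: int, m: int) -> tuple[float, float]:
    """CNOT and Toffoli totals from the per-operation costs of Appendix B."""
    w = n + m                              # register width in the coset representation
    additions = 2 * n_e * w                # controlled additions
    swaps = n_e                            # controlled register swaps
    cnot = additions * (5.5 * w - 9) + swaps * 2 * w
    toffoli = additions * (2 * w - 1) + swaps * w
    return cnot, toffoli


def leading_order(n: int, n_e: int, m: int) -> tuple[int, int]:
    """Leading-order terms 11 n_e (n+m)^2 CNOTs and 4 n_e (n+m)^2 Toffolis."""
    w = n + m
    return 11 * n_e * w ** 2, 4 * n_e * w ** 2


# Hypothetical parameters: n = 2048, n_e = 3072, m = 30.
cnot, toffoli = exact_counts(2048, 3072, 30)
cnot_lo, toffoli_lo = leading_order(2048, 3072, 30)
print(cnot / cnot_lo, toffoli / toffoli_lo)   # both ratios are close to 1
```

Expanding the products gives $11 n_e (n+m)^2 - 16 n_e (n+m)$ CNOTs and $4 n_e (n+m)^2 - n_e (n+m)$ Toffoli gates, so the corrections to the leading order are suppressed by a factor of order $1/(n+m)$.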
trolled addition modulo a power of two is sufficient to im- ants of the addition: an addition, a comparison,
plement a controlled modular addition. Such an addition a controlled correction and the clean-up of ancillary
can be implemented with the semi-classical adder pre- qubits [38]. As exposed in main text, coset representa-
sented in Figure 6a, which is inspired by Refs. [4, 36, 37]. tion of integers, introduced by Zalka [39] and formalized
It shows the basic circuit taking a classical value 2j γ and by Gidney [35], can be used to approximate the modular
a register encoding z 0 and returning 2j γ and z 0 +2j γ if the addition with a single standard adder circuit.
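The approximation behind the coset representation can be checked on small numbers. The following is a minimal sketch, not the error analysis of Ref. [35]: it only compares the sets of basis states involved, and the function names `coset` and `coset_addition_overlap` are hypothetical helpers introduced for this illustration. Since both states are uniform superpositions of $2^m$ basis states, their inner product is the fraction of basis states they share.

```python
def coset(z, N, m):
    """Basis states z + k*N (k = 0, ..., 2^m - 1) of the coset state of z."""
    return {z + k * N for k in range(2 ** m)}


def coset_addition_overlap(z, c, N, n, m):
    """Overlap between the state produced by a plain adder modulo 2^(n+m)
    and the ideal coset state of (z + c) mod N.

    Assumes 0 <= z, c < N < 2^n and N odd (illustrative assumptions).
    """
    modulus = 2 ** (n + m)
    # A standard adder acts on every basis state of the superposition.
    after_adder = {(v + c) % modulus for v in coset(z, N, m)}
    ideal = coset((z + c) % N, N, m)
    # Uniform superpositions: the inner product is the shared fraction.
    return len(after_adder & ideal) / 2 ** m


if __name__ == "__main__":
    N, n = 13, 4
    for m in range(1, 7):
        # z + c = 16 exceeds N, so one basis state falls out of the coset.
        print(m, coset_addition_overlap(9, 7, N, n, m))
```

In this toy case the overlap is $1 - 2^{-m}$ whenever $z + c \geq N$ and 1 otherwise, which illustrates why the deviation shrinks exponentially with $m$.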

[Circuit diagram: the register $|z\rangle$ ($n$ qubits) is extended with $m$ qubits in $|0\rangle$. For each of the $m$ ancillary qubits, prepared in $|0\rangle$: a Hadamard gate, a controlled addition ($+N$, $+2N$, ..., $+2^{m-1}N$), a Hadamard gate, then a classically controlled operation ($-1$ if $x \geq N$, $-1$ if $x \geq 2N$, ..., $-1$ if $x \geq 2^{m-1}N$).]

Figure 7. Preparation proposed in [35, Fig. 1] of a quantum register with $n+m$ qubits in the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|z+kN\rangle$ as requested in the initialization of the coset representation. The first controlled operation adds the integer $N$ to the register made with $n+m$ qubits provided that the ancillary qubit is in state $|1\rangle$. The first classically controlled operation aims to change the phase of the input state encoded in $n+m$ qubits if and only if the result of the measurement is 1 and the number encoded in the $n+m$ qubits is larger than or equal to $N$. In case one of the two conditions is not met, the input state is unchanged.
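The preparation described in the caption of Figure 7 can be mimicked with a small classical simulation of the amplitudes. This is a sketch under stated assumptions: the register state is stored as a dictionary mapping integers to real amplitudes, the measurement outcomes are drawn uniformly (both outcomes have probability 1/2 here), and the name `prepare_coset_state` is hypothetical.

```python
import math
import random


def prepare_coset_state(z, N, m, rng):
    """Simulate the measurement-based preparation of the coset state of z.

    Returns a dict {value: amplitude}. Assumes 0 <= z < N.
    """
    assert 0 <= z < N
    state = {z: 1.0}
    for i in range(m):
        shift = (2 ** i) * N
        # Ancilla in (|0> + |1>)/sqrt(2) controls "+ 2^i N"; after the second
        # Hadamard and a measurement with outcome r, each component x becomes
        # (|x> + (-1)^r |x + shift>)/sqrt(2).
        r = rng.randrange(2)
        new_state = {}
        for x, amp in state.items():
            new_state[x] = amp / math.sqrt(2)
            new_state[x + shift] = (-1) ** r * amp / math.sqrt(2)
        # Classically controlled correction: "-1 if x >= 2^i N" when r == 1.
        if r == 1:
            for x in new_state:
                if x >= shift:
                    new_state[x] = -new_state[x]
        state = new_state
    return state


if __name__ == "__main__":
    final = prepare_coset_state(3, 7, 4, random.Random(0))
    print(sorted(final.items()))
```

Whatever the measurement record, the final state is the uniform superposition over $z + kN$ with $k = 0, \dots, 2^m - 1$, because before step $i$ all populated values are smaller than $2^i N$ and the comparison therefore singles out exactly the shifted components.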

The basic idea of the coset representation for adding $2^j\gamma$ modulo $N$ to a quantum register encoding the integer $z$ is to extend the register for $z$ with $m$ additional qubits and to encode it into the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|z+kN\rangle$. Except at the bounds, this state is invariant under the addition of $N$. This implies

$$\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}\left|\left(z+2^j\gamma+kN\right) \bmod 2^{n+m}\right\rangle \approx \frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}\left|\left(z+2^j\gamma\right) \bmod N + kN\right\rangle \tag{C1}$$

i.e. the modular addition of $2^j\gamma$ in the register of $z$ can be performed with a standard adder (modulo $2^{n+m}$), at the cost of a small error which is exponentially suppressed when increasing $m$ [35]. Note also that the precision is improved if instead of adding $2^j\gamma$, one adds $2^j\gamma \bmod N$ (which does not change the result of the sum since we consider the sum modulo $N$). This is possible each time the quantity to add is known classically.

The goal of the first subsection is to show that the resource needed to extend the register encoding $|z\rangle$ into the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|z+kN\rangle$, as requested in this representation, is negligible with respect to the resource taken to implement the modular exponentiation. In the second subsection, we show that the coset representation is compatible with the modular multiplication circuit presented in the main text.

In the next two subsections, the coset representation is considered for additions modulo $N$; $n$ is the number of bits encoding $N$, and $m$ the number of qubits added to the register for the coset representation.

1. Initialization

Starting from a register with $n$ qubits in state $|z\rangle$, the initialization of the coset representation consists in preparing the state $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|z+kN\rangle$ in an extended register of size $n+m$. This is done by performing successive additions, each controlled by an ancillary qubit prepared in the state $\frac{|0\rangle+|1\rangle}{\sqrt{2}}$ ($m$ ancillary qubits in total) which is then uncomputed, see Figure 7. The controlled addition is performed using the circuit presented in Figure 6a with only one control qubit. The uncomputation of the ancillary qubit is based on a measurement and, depending on the result, a conditioned correction is applied, see Figure 7. Let us detail the uncomputation of the first ancillary qubit presented in Figure 7. When the result of the measurement is 0, the register made with $n+m$ qubits is projected into $\frac{1}{\sqrt{2}}(|z\rangle + |z+N\rangle)$. When the result is 1, the register state is $\frac{1}{\sqrt{2}}(|z\rangle - |z+N\rangle)$ and the operation $-1$ needs to be applied to the component $|z+N\rangle$, i.e., when the state of the register encodes an integer larger than or equal to $N$. In order to implement the conditioned operations for uncomputing the $m$ ancillary qubits, we need to compare the value $x$ encoded in the quantum register of size $n+m$ and an integer $y$ known classically satisfying $0 < y \leq 2^{m-1}N < 2^{n+m}$ (see the last uncomputation in Figure 7), i.e. that can be written with $n+m$ bits. This comparison is implemented using the circuit presented in Figure 8a. First, the value $y' = 2^{n+m} - y$ is computed classically. Then the last carry of the sum of $x$ and $y'$ is computed with a circuit derived from the addition. If the value of this carry is 1, we conclude that $x \geq y$, otherwise $x < y$. A Z gate is thus applied on the qubit encoding the last carry, before uncomputing the carries. The register ends up in state $\pm|x\rangle$ depending on the relative value between $x$ and $y$, as desired.

Each controlled addition and correction costs $O(n+m)$ gates. This operation is repeated $m$ times, giving a total cost of the coset representation initialization of the order $O(m(n+m))$.

In the modular exponentiation algorithm, the two registers at the bottom of Figure 4 need to be prepared initially in $x = 1$ and 0 respectively. Initializing them in the coset representation $\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|1+kN\rangle$ and
$\frac{1}{\sqrt{2^m}}\sum_{k=0}^{2^m-1}|kN\rangle$ takes $O(m(n+m))$ gates which is negligible compared to the cubic cost of the full exponentiation. Note however, that the cost of this initialization is taken into account in our script for the evaluation of the whole algorithm cost.

[Circuit diagrams: (a) Semi-classical comparison and correction, acting on qubits $x_0, x_1, x_2$ with classical bits $y'_0, y'_1, y'_2$ and a Z gate on the last carry. (b) AND computation. (c) AND uncomputation.]

Figure 8. (a): Circuit inspired from [4, Fig. 17] which compares the integer $x$ encoded in $n+m$ qubits and the integer $y < 2^{n+m}$ known classically, and returns $-|x\rangle$ if and only if $x \geq y$. This is done in three steps: i) compute the carries of $y' + x$ with $y' = 2^{n+m} - y$, ii) apply a Z operation on the last carry and iii) uncompute the carries. (b) and (c): circuits defining the notations used to compute and uncompute an AND operation, as introduced in [37, 40] where the authors give efficient implementations in terms of T (or $\frac{\pi}{4}$) gates. When only one quantum control appears, it uses a CNOT instead of a Toffoli gate, and it can be removed by directly using the control bit instead of the ancillary.

2. Compatibility with the multiplication

When computing the multiplications from sequences of two product-additions (see Figure 4 of main text), the input register encoding $x$ and the ancillary register are used both as control and target of the product-additions. We here check that having the control register encoded in the coset representation is not a problem for performing the multiplication.

Let us consider the first product-addition used to implement the multiplication shown in the bottom part of Figure 4. In the coset representation, the input $x$ and ancillary registers are in the state $\frac{1}{2^m}\sum_{k=0}^{2^m-1}\sum_{k'=0}^{2^m-1}|x+kN\rangle|0+k'N\rangle$, meaning that after the product-addition, their state ends up in $\frac{1}{2^m}\sum_{k=0}^{2^m-1}\sum_{k'=0}^{2^m-1}|x+kN\rangle\left|\left((x+kN)g^{2^i}+k'N\right) \bmod 2^{n+m}\right\rangle$.

As $kNg^{2^i}+k'N$ is a multiple of $N$, the obtained state is very close to $\frac{1}{2^m}\sum_{k=0}^{2^m-1}\sum_{k'=0}^{2^m-1}|x+kN\rangle\left|xg^{2^i}+k'N\right\rangle$ thanks to the coset representation itself, cf. Eq. (C1). The latter corresponds to the desired state.

Appendix D: Windowed arithmetic

In order to reduce the number of multiplications and additions in the exponentiation algorithm, we use windowed arithmetic circuits [21]. They consist in grouping the bits of $e$ for controlling each multiplication, hence reducing the number of multiplications. Similarly, for each multiplication, the input bits are grouped to reduce the number of additions composing each multiplication.

The next subsection shows the details of the decomposition of the exponentiation into elementary additions of the form $+T_k \bmod N$ where the quantity $T_k$ depends on the value of an integer $k$. These specific additions are implemented in three steps, which are presented in separate subsequent subsections.

1. Windowed exponentiation and multiplication

Let us start by specifying the notations. We label the binary form of $e$ as

$$e_{n_e-1} \dots e_{i+w_e} \overbrace{e_{i+w_e-1} \dots e_i}^{e_{i:i+w_e}} \dots e_2 e_1 e_0 \tag{D1}$$

i.e. $e_j$ is the $j$th bit of $e$. Let also $e_{i:i+w_e}$ be defined as

$$e_{i:i+w_e} = \sum_{j=i}^{i+w_e-1} 2^{j-i} e_j \tag{D2}$$

i.e. $e_{i:i+w_e}$ is the number whose bit decomposition is given by the bits of $e$ starting at index $i$ and taking $w_e$ bits. The strategy for computing the exponentiation using windowed arithmetic consists in decomposing the exponent $e$ in terms of numbers $e_{i:i+w_e}$

$$e = \sum_{\substack{0 \leq i < n_e \\ i \equiv 0 \bmod w_e}} 2^i e_{i:i+w_e}, \tag{D3}$$

such that

$$g^e = \prod_{\substack{0 \leq i < n_e \\ i \equiv 0 \bmod w_e}} g^{2^i e_{i:i+w_e}}. \tag{D4}$$
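The decomposition of Eqs. (D2) to (D4) can be checked classically. The following sketch uses hypothetical helper names (`exponent_windows`, `windowed_modexp`) and assumes $e < 2^{n_e}$; it is only meant to illustrate how the exponent bits are grouped, not how the quantum circuit is built.

```python
def exponent_windows(e, n_e, w_e):
    """Yield (i, e_{i:i+w_e}) for i = 0, w_e, 2*w_e, ... < n_e, as in Eq. (D2)."""
    mask = (1 << w_e) - 1
    for i in range(0, n_e, w_e):
        yield i, (e >> i) & mask


def windowed_modexp(g, e, N, n_e, w_e):
    """Compute g^e mod N as the product of Eq. (D4), one factor per window."""
    result = 1
    for i, e_win in exponent_windows(e, n_e, w_e):
        # One multiplication by the classically known constant g^(2^i e_win).
        result = (result * pow(g, (1 << i) * e_win, N)) % N
    return result


if __name__ == "__main__":
    g, e, N, n_e, w_e = 7, 0b1011011101, 221, 10, 3
    wins = list(exponent_windows(e, n_e, w_e))
    print(len(wins), "multiplications instead of", n_e)
    print(windowed_modexp(g, e, N, n_e, w_e) == pow(g, e, N))
```

The number of factors is $\lceil n_e / w_e \rceil$, which is the reduction in the number of multiplications brought by windowing the exponent.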

[Circuit diagrams:
(a) Multiplication for the exponentiation, with $i = 2$ and $w_e = 2$, decomposed into product-additions: the operation $\times g^{2^i e_{i:i+w_e}} \bmod N$ on $|x\rangle$ with an ancillary register $|0\rangle$, taking $e_{i:i+w_e}$ from $|e\rangle$ as input, equals the product-addition $+x\,g^{2^i e_{i:i+w_e}} \bmod N$ followed by the product-addition $+\bar{x}\,(-g^{-2^i e_{i:i+w_e}}) \bmod N$.
(b) Windowed product-addition, as needed in (a), with the window size $w_m = 3$: $+x\,g^{2^i e_{i:i+w_e}} \bmod N$ equals $+2^0 x_{0:3}\,g^{2^i e_{i:i+w_e}} \bmod N$ followed by $+2^3 x_{3:6}\,g^{2^i e_{i:i+w_e}} \bmod N$.
(c) Modular addition of a number read with a table lookup, as needed in (b): with input $k$ on $w_e + w_m$ qubits, $+T_k \bmod N$ is obtained by loading $T_k$ into an $n$-qubit register initially in $|0\rangle$, adding $T_k \bmod N$, and unloading $T_k$.]

Figure 9. Windowed arithmetic subcircuits for the modular exponentiation. When not specified, the register size is $n+m$ qubits (register encoded into the coset representation of integers).
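Panels (b) and (c) of Figure 9 can be emulated classically to see how a product-addition reduces to a few additions of looked-up constants. This is a sketch under stated assumptions: the table layout (indexed by an exponent window value and a multiplicand window value at bit offset `j`) is read off Figure 9b, the coset representation is ignored (plain arithmetic modulo $N$), and the names `lookup_table` and `windowed_product_addition` are hypothetical.

```python
def lookup_table(g, N, i, j, w_e, w_m):
    """Classical table for the window of x starting at bit j.

    Entry (e_win, x_win) holds 2^j * x_win * g^(2^i * e_win) mod N,
    so the table has 2^(w_e + w_m) entries.
    """
    table = {}
    for e_win in range(1 << w_e):
        factor = pow(g, (1 << i) * e_win, N)
        for x_win in range(1 << w_m):
            table[(e_win, x_win)] = ((1 << j) * x_win * factor) % N
    return table


def windowed_product_addition(target, x, e_win, g, N, i, n, w_e, w_m):
    """Return target + x * g^(2^i * e_win) mod N, one lookup per window of x."""
    mask = (1 << w_m) - 1
    for j in range(0, n, w_m):
        table = lookup_table(g, N, i, j, w_e, w_m)
        address = (e_win, (x >> j) & mask)      # "Input k" in Figure 9c
        target = (target + table[address]) % N  # "+T_k mod N"
    return target


if __name__ == "__main__":
    g, N, i, n, w_e, w_m = 5, 221, 2, 8, 2, 3
    x, e_win = 123, 3
    got = windowed_product_addition(0, x, e_win, g, N, i, n, w_e, w_m)
    print(got == (x * pow(g, (1 << i) * e_win, N)) % N)
```

Each window of $x$ costs one lookup and one addition, so a product-addition on an $n$-bit input takes $\lceil n / w_m \rceil$ of them instead of $n$ controlled additions.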

The comparison with the decomposition of $g^e$ presented in Eq. (B1) clearly shows that windowed exponentiation divides the number of multiplications by $w_e$.

As for the standard algorithm, the multiplications of the product (D4) are implemented successively and each multiplication is decomposed into a sequence of two product-additions, as shown in Figure 9a. The difference is that the added number now depends on the number $e_{i:i+w_e}$.

The product-addition is also performed in a windowed way [21]. Figure 9b shows in particular how the first product-addition needed for each multiplication is performed using windows for input $x$ of size $w_m = 3$.

Figure 9c finally shows the implementation of an addition $+T_k \bmod N$ of a quantity $T_k$ that depends on the value $k$. It requires three steps. First, the number $T_k$ is loaded into an ancillary register. Second, this number is unconditionally added to the desired register and finally the ancillary register is cleaned up. Note that the value of $T_k$ (given by $T_{k_1,k_2} = 2^i k_1 g^{2^i k_2}$, with $k_1 = e_{i:i+w_e}$ and $k_2 = x_{i:i+w_m}$, $k$ being the concatenation of $k_1$ and $k_2$) to be added is known classically. Since its addition is performed modulo $N$, its value can be computed modulo $N$ before being loaded. $n$ bits are thus sufficient to encode $T_k$.

Loading a value $T_k$ into a quantum register is done using a quantum table lookup circuit which we discuss right after. The subsequent subsection is dedicated to unloading the value $T_k$ and resetting the register in state $|0\rangle$. The last subsection is dedicated to the requested addition.

2. Table lookup

The quantum table lookup proposed in [40] produces the following operation on basis states: $|k\rangle|x\rangle \mapsto |k\rangle|x \oplus T_k\rangle$ with $\oplus$ the bitwise XOR operator. For state preparation, as required in the first step of the operation presented in Figure 9c, the target register starts in the state $|0\rangle$ such that control and target registers end

in $|k\rangle|T_k\rangle$. The circuit presented in Figure 10 shows the principle of this operation with registers for $k$ and $T_k$ composed respectively of 3 and 5 qubits.

[Circuit diagram: address qubits $k_2, k_1, k_0$ control, through a cascade of AND computations and uncomputations, eight multi-target controlled NOT gates labelled $T_7, T_6, \dots, T_0$ acting on five target qubits initially in $|0\rangle$.]

Figure 10. Example of a quantum table lookup. For a basis state $|k\rangle$ specifying the address of the number $T_k$ from a classical table, the quantum table lookup maps basis states $|k\rangle|0\rangle$ into $|k\rangle|T_k\rangle$. Here $k$ and the output are composed respectively of 3 and 5 qubits. The notations for the AND computation and uncomputation are presented in Figure 8. Black and white circles are controls on the $|1\rangle$ and $|0\rangle$ states respectively. The question mark on the controlled NOT means that a controlled NOT is applied on qubit $i$ only when the $i$th bit of $T_k$ takes the value 1.

Concretely, the numbers $T_k$ specify the set of controlled NOT gates to be used (the question mark on the controlled NOT means that a controlled NOT is applied on qubit $i$ only when the $i$th bit of $T_k$ takes the value 1). The circuit operating on the bits $k_i$ of $k$ prepares the last ancillary qubit (line 5 from the top) in the state $|1\rangle$ at the time (specified by $k$) where the gates corresponding to $T_k$ are applied, and $|0\rangle$ otherwise. The building block of the circuit is boxed in Figure 10. It uses 1 CNOT, 1 AND computation and uncomputation. Given that $k$ is encoded into $w_e + w_m$ bits and can thus take $2^{w_e+w_m}$ different values, the number of blocks in the upper part of Figure 10 is given by $\sum_{j=1}^{w_e+w_m-1} 2^j = 2^{w_e+w_m} - 2$. This means that $2^{w_e+w_m} - 2$ CNOT gates and $2^{w_e+w_m} - 2$ AND computations and uncomputations are needed to implement these blocks. Moreover, the number of controlled multi-NOT gates to load the value $T_k$ is given by $2^{w_e+w_m}$, each gate being decomposed into $n/2$ CNOTs on average since $T_k$ takes $n$ bits. When including the (two) NOT gates operating on the highest bit of $k$, we conclude that the table lookup uses 2 NOT gates, $2^{w_e+w_m} - 2 + 2^{w_e+w_m-1} n$ CNOT gates, and $2^{w_e+w_m} - 2$ AND computations and uncomputations (corresponding to $2 \times (2^{w_e+w_m} - 2)$ Toffoli gates).

3. Table unlookup

The purpose of the table unlookup operation (last step in Figure 9c) is to map the state $\sum_k \alpha_k |k\rangle|T_k\rangle$ into $\sum_k \alpha_k |k\rangle$, where $\alpha_k$ are some complex coefficients. A natural way to do this mapping is to apply again the lookup operation described in the previous subsection. Since the lookup operates on the computational basis following $|k\rangle|x\rangle \mapsto |k\rangle|x \oplus T_k\rangle$ where $\oplus$ stands for the bitwise XOR operator, by linearity it maps $\sum_k \alpha_k |k\rangle|T_k\rangle \mapsto \sum_k \alpha_k |k\rangle|0\rangle$, the latter corresponding to the desired state when simply discarding the qubits previously encoding the numbers $T_k$.

However, a more efficient measurement-based technique is possible, as shown in Ref. [41, Appendix C] and improved in Ref. [21]. The principle consists in starting by measuring the register encoding $T_k$ in the X basis before applying a phase shift conditioned on the result of the measurements. For a more detailed explanation, let us start by expanding the qubits encoding the numbers $T_k$ in bits indexed by $j$ ($(T_k)_j$ being the $j$th bit of $T_k$). The state before the uncomputation can be written as

$$\sum_k \alpha_k |k\rangle \bigotimes_j \left|(T_k)_j\right\rangle_j. \tag{D5}$$

Let us now focus on a specific qubit indexed by $j^*$. We label $K_0 = \{k \mid (T_k)_{j^*} = 0\}$ and $K_1 = \{k \mid (T_k)_{j^*} = 1\}$. The state before the uncomputation can be rewritten as

$$\left(\sum_{k \in K_0} \alpha_k |k\rangle \bigotimes_{j \neq j^*} \left|(T_k)_j\right\rangle_j\right)|0\rangle_{j^*} + \left(\sum_{k \in K_1} \alpha_k |k\rangle \bigotimes_{j \neq j^*} \left|(T_k)_j\right\rangle_j\right)|1\rangle_{j^*}. \tag{D6}$$

By applying a Hadamard gate on the $j^*$th qubit, we obtain

$$\frac{1}{\sqrt{2}}\left(\sum_{k \in K_0} \alpha_k |k\rangle \bigotimes_{j \neq j^*} \left|(T_k)_j\right\rangle_j + \sum_{k \in K_1} \alpha_k |k\rangle \bigotimes_{j \neq j^*} \left|(T_k)_j\right\rangle_j\right)|0\rangle_{j^*} + \frac{1}{\sqrt{2}}\left(\sum_{k \in K_0} \alpha_k |k\rangle \bigotimes_{j \neq j^*} \left|(T_k)_j\right\rangle_j - \sum_{k \in K_1} \alpha_k |k\rangle \bigotimes_{j \neq j^*} \left|(T_k)_j\right\rangle_j\right)|1\rangle_{j^*}. \tag{D7}$$

Hence, if the measurement of the $j^*$th qubit yields 0, the qubit is properly uncomputed. If the result is 1, a phase

shift needs to be applied on states corresponding to the indexes $k \in K_1$.

This uncomputation is successively applied to all the qubits encoding the numbers $T_k$. Let $t_j$ be the measurement result of the $j$th qubit. The state after all the measurements is given by

$$\sum_k \alpha_k \sigma_k |k\rangle, \tag{D8}$$

with $\sigma_k = \prod_j (-1)^{t_j (T_k)_j}$. We now label

$$K = \{k \mid \sigma_k = -1\}. \tag{D9}$$

In order to recover the desired state, we need to correct selectively the phase of terms $|k\rangle$ for which $k \in K$.

[Circuit diagram: the register $|k_{:s}\rangle$ ($s$ qubits) is the input of "Init unary" and "Deinit unary" acting on an ancillary register of $2^s$ qubits; the register $|k_{s:}\rangle$ ($w_e + w_m - s$ qubits) is the input of a lookup $\oplus F_{k_{s:}}$ applied to the ancillary register between two layers of Hadamard gates.]

Figure 11. Representation of the four steps proposed in Ref. [21] to selectively change the phase of components $|k\rangle$ in the state given in (D8) when the index $k$ belongs to $K$ (D9). The central operation is a table lookup with the values $F_{k_{s:}} = \sum_{j=0}^{2^s-1} 2^j \delta(j + 2^s k_{s:})$ where $\delta()$ is the indicator function of $K$.

The selective phase correction is done in four steps [21], as shown in Figure 11. First, the control register which uses $w_e + w_m$ qubits in state $|k\rangle$ is split in two groups. The first group is made with $s$ qubits in state $|k_{:s}\rangle$. The second group takes the remaining $w_e + w_m - s$ qubits in state $|k_{s:}\rangle$, such that $|k\rangle = |k_{s:}\rangle \otimes |k_{:s}\rangle$ and $k = k_{:s} + 2^s k_{s:}$. The second step consists in writing the integer $k_{:s}$ in an ancillary register in the unary representation: a register with $2^s$ qubits representing a number $k_{:s}$ with the state of the qubit number $k_{:s}$ being $|1\rangle$ and all the other qubits in the state $|0\rangle$. The qubits in state $|k_{s:}\rangle$ and the ancillary qubits are then used as control and target qubits for a lookup circuit where the controlled multi-NOT gates are replaced by controlled multi-Z gates. Finally, the ancillary register is uncomputed. The circuit used to initialize the ancillary register is shown in Figure 12a. The one used to put it back in its initial state is given in Figure 12b.

[Circuit diagrams: (a) Binary to unary conversion. (b) Unary to binary conversion.]

Figure 12. (a): representation of the circuit proposed in Ref. [21] for preparing a copy in an ancillary register of an integer $x$ in a unary representation starting from an encoding of $x$ in a control register in the binary representation. The first NOT operation prepares the first qubit in the ancillary register in state $|1\rangle$. The first AND computation writes the result of an AND operation between the first bit of $x$ and the bit 1 encoded in the first qubit of the ancillary register into the second qubit of the ancillary register. In case the state of the latter is $|1\rangle$, the state of the first qubit of the ancillary register is changed to $|0\rangle$. The combination of AND and CNOT operations is successively repeated until the desired qubit of the ancillary register is in state $|1\rangle$. (b): representation of the circuit proposed in Ref. [21] to erase the value in the ancillary register while keeping the integer $x$ in the control register. The circuits (a) and (b) correspond to the first and third operations needed for the selective phase correction operation presented in Figure 11.

Starting from $x$ encoded in $s$ qubits, the conversion to the unary representation takes 1 NOT gate, $2^s - 1$ CNOT gates and $2^s - 1$ AND computations. The conversion back to the binary representation takes 1 NOT gate, $2^s - 1$ CNOT gates and $2^s - 1$ AND uncomputations. Given that $k$ is encoded in $w_e + w_m$ bits, and that a choice $s = \left\lfloor \frac{w_e+w_m}{2} \right\rfloor$ is judicious to minimize the number of gates, the change of phase of components $|k\rangle$ takes $2^{\lfloor (w_e+w_m)/2 \rfloor + 1} + 4$ 1-qubit gates, $2^{w_e+w_m-1} + 2^{\lfloor (w_e+w_m)/2 \rfloor + 1} + 2^{\lceil (w_e+w_m)/2 \rceil} - 4$ CNOTs and $2^{\lfloor (w_e+w_m)/2 \rfloor} + 2^{\lceil (w_e+w_m)/2 \rceil} - 3$ ANDs (1 NOT gate, $2^{\lfloor (w_e+w_m)/2 \rfloor} - 1$ CNOT gates and $2^{\lfloor (w_e+w_m)/2 \rfloor} - 1$ AND computations for the unary conversion, $2 \times 2^{\lfloor (w_e+w_m)/2 \rfloor}$ Hadamard gates around the table lookup, 2 NOT gates, $2^{\lceil (w_e+w_m)/2 \rceil} - 2 + 2^{w_e+w_m-1}$ CNOT gates and $2^{\lceil (w_e+w_m)/2 \rceil} - 2$ AND computations and uncomputations for the lookup circuit, and 1 NOT gate, $2^{\lfloor (w_e+w_m)/2 \rfloor} - 1$ CNOT gates and $2^{\lfloor (w_e+w_m)/2 \rfloor} - 1$ AND uncomputations for the binary conversion). Including the additional $n$ Hadamard gates and $n$ measurements on $T_k$, we conclude that the table unlookup takes $2^{\lfloor (w_e+w_m)/2 \rfloor + 1} + n + 4$ 1-qubit gates, $2^{w_e+w_m-1} + 2^{\lfloor (w_e+w_m)/2 \rfloor + 1} + 2^{\lceil (w_e+w_m)/2 \rceil} - 4$ CNOTs and $2^{\lfloor (w_e+w_m)/2 \rfloor} + 2^{\lceil (w_e+w_m)/2 \rceil} - 3$ ANDs.

4. Standard adder

As we use the coset representation of integers with windowed arithmetic operations, a circuit for unconditional addition modulo a power of two is sufficient to implement a modular addition. The adder we use, which is

described in [37] and optimized from [36] for use with T gates, is presented in Figure 13. It is thrifty in gate number and ancillary qubits, at the cost of being deeper than other circuits [42, 43], which is not a disadvantage for our architecture.

[Circuit diagram: inputs $x_0, y_0, x_1, y_1, x_2, y_2, x_3, y_3$; outputs $x_0, (y+x)_0, x_1, (y+x)_1, x_2, (y+x)_2, x_3, (y+x)_3$.]

Figure 13. Adder modulo $2^4$ from [37], using the same notations as in Figure 8. The building block (boxed) is repeated two times, for the qubits numbers 1 and 2, while the first and last use a simplified subcircuit.

As presented in Figure 9c, the adder needs to add a number $T_k$ taking $n$ qubits into a register with $n+m$ qubits. To achieve this, either the first register for $T_k$ is extended with qubits in the $|0\rangle$ state, or we use carry propagation blocks for the last qubits. Such blocks are identical to the ones of the semi-classical adder with classical input 0; see [4, Fig. 17] for an example of such a circuit. For gate counting, the first solution is taken into account.

The cost of the addition circuit (Figure 13) is $6(n+m) - 9$ CNOT gates and $n+m-1$ AND computations and uncomputations.

5. Cost estimation

In summary, the parameters of the logical circuit for computing the modular exponentiation are

$n$: number of bits of the exponentiated number $g$
$n_e$: number of bits of the exponent $e$
$w_e$: window size for the exponentiation
$w_m$: window size for the multiplication
$m$: number of qubits added by the coset representation

The aim of this subsection is to give an estimate of the number of gates needed to implement this circuit. In order to keep the evaluation independent of the error correction choice, we express the cost in terms of the number of 1-qubit gates, 2-qubit gates and AND computations and uncomputations [44].

The modular exponentiation consists of $n_e/w_e$ multiplications, each multiplication using 2 product-additions and a swap, and each product-addition is implemented with $(n+m)/w_m$ lookups, additions and unlookups. Note that the swap operation is realized by simply relabeling the registers, hence is for free. According to the counts obtained from previous subsections, the cost of the exponentiation is dominated (in the limit $n \to \infty$, $n_e = O(n)$, $w_e$ and $w_m$ constant) by: $2\frac{n_e(n+m)n}{w_e w_m}$ 1-qubit gates, $\left(2^{w_e+w_m} n + 12(n+m)\right)\frac{n_e(n+m)}{w_e w_m}$ CNOTs, and $2\frac{n_e(n+m)^2}{w_e w_m}$ AND computations and uncomputations (translatable into $4\frac{n_e(n+m)^2}{w_e w_m}$ Toffoli gates). Note that when considering the universal gate set T, S, H, X, Y, Z, CNOT, controlled-Z and their conjugates, according to Fig. 4 of Ref. [40] the AND computation and uncomputation costs on average 8 1-qubit gates and 3.5 2-qubit gates. The total cost of the exponentiation is hence given at the leading order by $2\frac{n_e(n+m)}{w_e w_m}(9n + 8m)$ 1-qubit gates and $\left(2^{w_e+w_m} n + 19(n+m)\right)\frac{n_e(n+m)}{w_e w_m}$ 2-qubit gates. In the code used to compute the required resources and find the optimal parameters, the complete formulas have been used [22].

Appendix E: Error correction

This appendix is dedicated to 3D gauge color codes. The first subsection is dedicated to the principle of subsystem codes. The second subsection describes the geometrical structure of 3D gauge color codes. The last subsection provides a detailed description of the cut of the code structure that is used to process and correct the logical qubits.

1. Subsystem codes

Subsystem stabilizer codes [45] are defined by three subgroups of the Pauli group: the stabilizer, gauge and logical (also designated as bare logical in [23]) operator groups, such that the stabilizer group is the center of the gauge group up to phases, $i\mathbb{1}$ is included in the gauge group, the operators from the gauge and logical groups commute, and the normalizer of the stabilizer group is the product of gauge and logical groups. We invite the reader to look at Refs. [23, 45] for an explicit construction

of those groups from canonical generators of the Pauli group. The stabilizer group plays the standard role of stabilizers, i.e. divides the total Hilbert space $\mathcal{H}$ into a direct sum of orthogonal subspaces $C \oplus C^\perp$ where $C$, the stabilized subspace, corresponds to the eigenspace $+1$ of all stabilizers. The gauge and logical groups decompose the stabilized subspace $C$ into a tensor product of the logical qubits space $A$ and the gauge qubits space $B$ [46], that is, the Hilbert space is decomposed as

$$\mathcal{H} = \underbrace{(A \otimes B)}_{C} \oplus\, C^\perp.$$

The gauge group acts trivially on the logical qubits and is the Pauli group of the gauge qubits while the logical group acts trivially on the gauge qubits and is the Pauli group of the logical qubits (up to phases). This ensures that gauge operator measurements don't modify the logical qubits.

A gauge fixing operation consists in switching from a code to another one such that the new stabilizer group includes the original one while being included into the original gauge group, while keeping the logical group unmodified. The decomposition associated to the original code

$$\mathcal{H} = \underbrace{(A \otimes B)}_{C} \oplus\, C^\perp$$

then becomes of the form

$$\mathcal{H} = (A \otimes B') \oplus \underbrace{(A \otimes B'') \oplus C^\perp}_{C'^\perp}$$

where $B'$ is the new gauge qubit space. As a consequence, a valid code-word for this new code is also valid for the initial one. The passage from the latter to the new code is done by measuring the generators of the gauge group, the results of these measurements giving the correction to apply on $B' \oplus B''$ to remove the components on $B''$.

For 3D gauge color codes, code switching allows a transversal error-corrected implementation of a universal set of gates [23].

2. Code geometrical structure

The geometrical structure of the 3D gauge color codes is described in detail in Section 3.1 of [23]. It takes a large tetrahedron, itself decomposed into elementary tetrahedrons, see Figure 14 for an example. Four extra points ($v_i$, $i \in \{1, 2, 3, 4\}$) are then added outside the large tetrahedron, one point in front of each facet of the large tetrahedron. Elementary tetrahedrons are finally added between those extra points and the vertices at the surface of the large tetrahedron, see Fig. 4b of [47] for an illustration. The vertices of elementary tetrahedrons are colored with 4 different colors such that adjacent vertices get a different color. Each elementary tetrahedron represents a physical qubit.

The measured operators are the gauge generators for the code used to implement the H and CNOT gates, the $(1, 1)$ code (see [23]). These generators are described by the edges: each operator is the product of X or Z operators of the elementary tetrahedrons adjacent to a given edge (each operator involves up to 6 physical qubits).

The stabilizer generators of the $(1, 1)$ and $(1, 2)$ codes (the $(1, 2)$ code refers to the code used to implement the T gate [23]), which are described by the vertices and edges, are deduced from the values of measured operators. More precisely, the operator corresponding to a vertex can be written as the product of the operators corresponding to edges starting at the given vertex and ending on vertices of a common color. Three choices of color are possible, allowing one to recover in three different ways an operator corresponding to a vertex. This redundancy can be used for achieving fault-tolerant error correction in only one measurement of the (gauge) operators related to the edges [23].

Let $n_{\text{code}}$ be the index of the code which is the number of vertices of the same color on one edge of the large tetrahedron (denoted as $n$ in [23]). The code distance is given by $d = 2n_{\text{code}} + 1$, and the number of physical qubits is $1 + 4n_{\text{code}} + 6n_{\text{code}}^2 + 4n_{\text{code}}^3 = \frac{d^3+d}{2}$ [23, 47].

3. Slicing of the code structure

To process the information, the code structure is decomposed into slices, each slice being mapped successively into the 2D processor. While several cuts in slices are possible, we choose slices orthogonal to two faces (see Figure 14). The processor needs to be sized to fit the larger slice, which joins the edge not included in any of the two faces to the middle of the opposing edge (the magenta slice in Figure 14).

With the lattice described in Ref. [23], the central slice corresponds to the elementary tetrahedrons for which all vertices coordinates satisfy $x + z = n_{\text{code}} - 2$ or $x + z = n_{\text{code}} - 1$ (the elementary tetrahedrons between the two planes defined by the previous equations). Note that the number of slices is given by $d - 2$.

The number of elementary tetrahedrons included in this slice is counted by considering three tetrahedron sets,

see Figure 14 and Figure 15. Two sets correspond to the elementary tetrahedrons having a facet at the interface between two slices (Figure 15a and Figure 15b are associated with the elementary tetrahedrons having one facet at the interface between the magenta slice and the green and cyan slices, respectively). The last set is associated with the elementary tetrahedrons having no facet at the interface between two slices (Figure 15c). One can check that the first two sets each include $\sum_{k=1}^{2n_{\mathrm{code}}-2} k = 2n_{\mathrm{code}}^2 - 3n_{\mathrm{code}} + 1$ elementary tetrahedrons, while the last set has $(2n_{\mathrm{code}} - 1) + \sum_{k=0}^{n_{\mathrm{code}}-2} 2(2k + 1) = 2n_{\mathrm{code}}^2 - 2n_{\mathrm{code}} + 1$ elementary tetrahedrons. There are $16n_{\mathrm{code}} - 2$ additional elementary tetrahedrons resulting from the 4 added points in the construction of the code. In total, the maximum number of elementary tetrahedrons for one slice of the code structure is $6n_{\mathrm{code}}^2 + 8n_{\mathrm{code}} + 1$. Since we consider a processor that can process up to two slices (associated with two different logical qubits), and accounting for the ancillary subsystems needed to measure the gauge generators by a simple factor of two, we obtain the number of physical qubits in the processor specified in the main text. For more details, see the ancillary file tetrahedron_3_bis.scad [22], where each tetrahedron color corresponds to a given slice, the largest being the magenta one.

Figure 14. Code geometrical structure for $n_{\mathrm{code}} = 3$ (without the extra points $v_i$, $i \in \{1, 2, 3, 4\}$). Each slice has been represented with a specific color. The largest slice is the one with the magenta elementary tetrahedrons. The figure shows that the maximum number of slices involved in an operator corresponding to an edge is 2.

Figure 15. Decomposition of the central slice for $n_{\mathrm{code}} = 3$ (magenta slice in the tetrahedron presented in Figure 14): (a) first set, (b) second set, (c) third set. Each subfigure corresponds to a set of elementary tetrahedrons of the central slice, seen from different points of view. In (a) and (b), each triangle corresponds to an elementary tetrahedron. In (c), each small rectangle corresponds to an elementary tetrahedron.

4. Threshold of 3D gauge color codes

The value of the threshold for 3D gauge color codes has been evaluated in a few references that we now discuss.

To clarify the context, let us first recall that there are three main definitions of the error-rate threshold used for stabilizer codes in the literature: code-capacity, phenomenological and circuit-level. Code-capacity thresholds assume perfect measurements of stabilizers. Phenomenological thresholds model faulty measurements as bit-flip errors on stabilizer measurement outcomes. Circuit-level thresholds model errors occurring at any stage of the stabilizer measurement circuits.

In Ref. [26] a clustering decoding scheme is presented and, by including phenomenological noise on the measurement outputs, the authors estimate a code-capacity threshold of 0.46 % and a phenomenological threshold of about 0.31 %, suggesting an upper bound on the circuit-level threshold. Note however that the underlying lattice considered in Ref. [26] (cubic lattice) is different from the one we have considered (body-centered cubic (bcc) lattice).

A more recent decoding algorithm using the bcc lattice is presented in [48, 49], but the authors only give an estimate of the code-capacity threshold, namely 0.77 %. A slightly better code-capacity threshold of 0.80 % has been estimated in Ref. [47] under the same assumptions.

Finally, statistical arguments have been used in Ref. [50] to estimate the code-capacity threshold of 3D gauge color codes with ideal decoding at around 1.9 %. This suggests that an appropriate decoder could significantly improve the value of the code-capacity threshold and hence of the phenomenological and circuit-level thresholds.

Since we believe that the determination of the circuit-level threshold goes beyond the scope of this work, the
run-time and resources needed to factor a 2 048-bit RSA integer are given in the main text under the assumption of a threshold of 0.75 %. Since this choice is somewhat arbitrary, we give the evolution of the run-time and resources as a function of the threshold in Figure 16. More precisely, they are given as a function of the ratio $p/p_{\mathrm{th}}$ between the physical error probability per cycle $p$ and the fault-tolerant threshold $p_{\mathrm{th}}$, which is the only relevant quantity at first order. For $p_{\mathrm{th}} = 0.75\,\%$ and an error probability per cycle and per physical qubit of $10^{-3}$, this ratio is $p/p_{\mathrm{th}} \approx 0.13$.

Figure 16. Number of qubits in the processor (in kiloqubits) and expected run-time (in months) to factor 2 048-bit RSA integers, as a function of the ratio $p/p_{\mathrm{th}}$ between the physical qubit error and the fault-tolerant code threshold.

We emphasize that the value of the threshold for 3D gauge color codes does not change the take-home message of the whole paper, namely that the use of a quantum memory in quantum computing strongly reduces the number of qubits in the processing unit. Even when considering, for example, a circuit-level threshold of 0.2 % and an error probability per operation of $10^{-3}$, the use of a quantum memory reduces the number of qubits in the processor by two orders of magnitude compared to an architecture without memory for factoring 2 048-bit RSA integers (the same conclusion holds when considering the standard approach using the surface code, see Appendix F 3).

Appendix F: Results and possible improvements

We presented in the main text the resources needed to factor 2 048-bit RSA integers, which corresponds to the most common RSA key size. In the first subsection of this appendix, we discuss the factorization of RSA integers of various sizes. The second subsection is dedicated to a discussion on ways to reduce the run-time to factor RSA integers and, in particular, on the trade-off between the number of physical qubits in the processor and the run-time.

1. Optimal parameters to factor n-bit RSA integers

The resources and parameters needed to factor RSA integers encoded in $n$ bits are specified in Table I. In particular, we consider the factorization of RSA integers with $n = 6$ bits, the number of bits needed to encode the integer 35. We also consider $n = 829$, which corresponds to the largest RSA integer factorized so far [51].

2. Trade-off between qubits and run-time

We have estimated that an average run-time of 177 days is needed to factor a 2 048-bit RSA number. There are several ways to reduce this number, most of them coming at the cost of using more qubits in the processor. The items below present several of these ways separately.

• Due to the tetrahedral geometry of the code structure, only one third of the processor qubits are used during the error-correction steps on average. A factor of 3 in time could thus be saved by making use of the idle qubits.

• The logical circuit can be parallelized in several ways, giving a speed-up roughly proportional to the increase in the number of qubits in the processor. More precisely:
  – Some operations in the adder can be parallelized (see Figure 13). The controlled-NOT operations aligned vertically can be applied at the same time.
  – The run-time is dominated by the time spent implementing the CNOT gates of the quantum lookup circuit (see Figure 10), and they are easily parallelizable. A full parallelization would reduce the factorization of 2 048-bit RSA integers to about 27 days, at the cost of using about 12 million qubits in the processor.
  – Oblivious carry runways allow parallelization of the adders [35].
  – Other types of adders could exploit further parallelization, for instance lookahead adders [43].

n      n_e    m   w_e  w_m  d   n_qubits  t_exp     logical qubits  total modes  spatial modes  temporal modes  all-memory correction
6      6      4   3    2    7   316       1 min     38              6 650        3 002          5               95 µs
8      9      8   3    2    13  1 060     2 s       58              64 090       15 370         11              319 µs
16     21     11  3    2    17  1 796     10 s      99              244 035      44 451         15              742 µs
128    189    19  3    3    29  5 156     50 min    571             6 971 339    736 019        27              8 ms
256    381    21  3    3    33  6 660     7 hours   1 089           19 585 665   1 813 185      31              17 ms
512    765    24  3    3    37  8 356     2 days    2 122           53 782 090   4 432 858      35              37 ms
829    1 242  26  3    3    41  10 244    11 days   3 396           117 097 476  8 697 156      39              66 ms
2 048  3 029  30  3    3    47  13 436    177 days  8 284           430 229 540  27 825 956     45              186 ms

Table I. For different integer sizes $n$ and corresponding exponent sizes $n_e$ ($\sim 1.5n$), the table presents the optimal set of parameters, the processor size and computation run-time, and the memory requirements.
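As a consistency check on the processor sizes listed in Table I, the per-slice count of elementary tetrahedrons given in the appendix on the code geometry, $6n_{\mathrm{code}}^2 + 8n_{\mathrm{code}} + 1$, can be multiplied by two (two slices in the processor) and by two again (ancillary subsystems) and compared with the n_qubits column. The following Python sketch does this under two assumptions that are not stated explicitly in this appendix: that each elementary tetrahedron carries one physical qubit, and that the code distance and $n_{\mathrm{code}}$ are related by $d = 2n_{\mathrm{code}} + 1$ (the relation that reproduces every row of the table). The function names are illustrative only.

```python
def tetrahedrons_per_slice(n_code: int) -> int:
    """Count elementary tetrahedrons in the central (largest) slice."""
    # First and second sets: each contains sum_{k=1}^{2*n_code-2} k tetrahedrons.
    first_two_sets = 2 * sum(range(1, 2 * n_code - 1))
    # Third set: (2*n_code - 1) + sum_{k=0}^{n_code-2} 2*(2k + 1).
    third_set = (2 * n_code - 1) + sum(2 * (2 * k + 1) for k in range(n_code - 1))
    # Tetrahedrons attached to the four extra points v_1..v_4.
    extra = 16 * n_code - 2
    return first_two_sets + third_set + extra


def processor_qubits(d: int) -> int:
    """Processor size for code distance d (assumes d = 2*n_code + 1)."""
    n_code = (d - 1) // 2  # assumption inferred from Table I, not from the text
    slices_in_processor = 2  # two slices, belonging to two logical qubits
    ancilla_factor = 2       # ancillary subsystems for gauge measurements
    return slices_in_processor * ancilla_factor * tetrahedrons_per_slice(n_code)


# (d, n_qubits) pairs copied from Table I.
TABLE_I = {7: 316, 13: 1060, 17: 1796, 29: 5156,
           33: 6660, 37: 8356, 41: 10244, 47: 13436}

for d, expected in TABLE_I.items():
    print(f"d = {d:2d}: computed {processor_qubits(d):6d}, Table I {expected:6d}")
```

Writing the three sets as explicit sums, rather than using the closed form directly, keeps the sketch close to the counting argument, so a mismatch in any single term would show up in the comparison.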

  – During a product-addition operation, the different additions can be parallelized by computing partial sums separately.
  – During the exponentiation, the different multiplications can be parallelized by computing partial products separately.

• The qubit number can be reduced using another slicing of the code structure, at the cost of a longer computation time. For example, if one chooses to cut the tetrahedron into slices parallel to a facet of this tetrahedron, we estimate that a 2 048-bit RSA integer could be factorized in 354 days with 6 628 qubits in the processor.

3. Decoupling the gain from 3D gauge color code and multimode memory

Two new design elements have been proposed in this manuscript: the use of 3D gauge color codes and an architecture using a multimode memory. We here separate them out to gain insight into the improvements from each.

The main motivation for using 3D gauge color codes is to get rid of the magic state factory needed for implementing non-Clifford gates in the surface code. However, the transversality of the T gate in 3D gauge color codes is strongly linked to the dimensionality, and 2D color codes cannot directly achieve it [23, 52]. There is no direct way to make use of a 3D color code on a 2D grid.

The main advantage brought by the memory is to unload qubits from the processing unit to the memory. Using a memory in the standard approach (2D grid and surface code), for example, we estimate that an RSA-2 048 integer can be factorized in about 68 days using a memory that can store up to 5 million modes and a processor with 184 thousand qubits, 180 thousand being dedicated to the magic state factory and 4 thousand to the logical qubits on the processor. The additional reduction in the processor size in our approach comes from the fact that there is no need for magic state distillation when using 3D gauge color codes. The number of qubits in the processor is kept small because the qubits are released from the memory and processed slice by slice.

Appendix G: Memory requirements to factor RSA-2 048 integers

We would like to first emphasize that the main objective of our project was to evaluate accurately the performance of an architecture in which unprocessed qubits are stored in a quantum memory. The standard approach suffers from the need for millions of individually controlled qubits, and several research entities are dedicating large teams of engineers to tackle this challenge. We have shown through Shor’s algorithm that the use of a quantum memory significantly reduces the number of qubits in the processor through a significant change in the way the information is processed and protected against errors. Our results hence provide a solution to an engineering problem and turn it into a physics problem: the implementation of a faithful and multimode memory. Before discussing the requirements on the memory in detail, let us clearly define the notion of multimode memory [53].

From an algorithmic point of view, “spatial modes” are stored modes that can be accessed in constant time, while “temporal modes” can only be recovered sequentially (first stored, first released). In the proposed implementation based on spin echo, temporal modes correspond to different time slots, photons arriving in different time bins being re-emitted sequentially after spin refocusing. Spatial modes correspond either to different spatial (transverse) modes of a cavity or to different cavities (with the possibility of combining both). As discussed in the main text, it is possible to use temporal multiplexing only, at the cost of increasing the run-time.

We now estimate that the factorization of RSA-2 048
integers with the proposed architecture would take a memory with the following characteristics:

• A large multimode capacity to store 28 million spatial modes, each spatial mode being used to store 45 temporal modes. We stress that the number of modes in the memory has not been optimized (only the number of qubits in the processor and the run-time are optimized). Note also that different choices of processing and error-correction protocols may lead to compromises between the number of processing qubits and the multimode capacity, if needed. For example, we estimate that RSA-2 048 integers can be factorized in 68 days with a 2D surface code using a memory that can store up to 5 million modes and a processor with 184 thousand qubits.

• Storage time greater than 186 ms. More precisely, we estimate that the maximum storage time between two readouts of the same qubits is less than 2 hours. A memory with a storage time of at least two hours is however not necessary, as error-correction steps can be implemented periodically at the cost of increasing the run-time. Error correction of all the qubits stored in the memory is estimated to take 186 ms with a processor having 13 436 qubits, meaning that the storage time simply needs to be longer than 186 ms. Applying a correction every second, for example, would increase the run-time by about 23 %.

• Error probability for a transfer to memory, storage, and retrieval less than 0.1 %. Note that this requirement for a complete write/read cycle of the memory is likely very conservative. Indeed, the threshold value of error correction is mainly determined by the errors happening during the stabilizer measurements. We thus conjecture that the error correction could handle a higher error rate for those specific operations. The effect of this strongly asymmetric noise between the memory and processor operations is still under investigation, and we choose to stick to the conservative hypothesis for this article.

• The information stored in a given memory mode can be mapped to 3 qubits of the processor: two for the 2-qubit gates (depending on whether the physical qubit belongs to the logical control or target qubit) and one for the error correction and 1-qubit gates. There is no need for all-to-all connectivity.

Appendix H: Realization combining a rare-earth doped solid and a superconducting resonator

For implementing a multimode memory with a spin-echo technique, materials doped with rare-earth ions, such as erbium (Er3+), provide an appealing example since these ions have doubly degenerate Zeeman states which split when an external magnetic field is applied. Several manuscripts have reported on the successful coupling between an Er3+:Y2SiO5 crystal and a superconducting microwave resonator [31–33]. Ref. [33] in particular reported strong coupling with a collective coupling rate $g\sqrt{\bar{N}} = 2\pi \times 34$ MHz and an inhomogeneous linewidth $\Gamma = 2\pi \times 12$ MHz. This results in a very high absorption coefficient $\alpha = 4.0\ \mathrm{m}^{-1}$. If we assume an $L = \lambda/2$ cavity, unit absorption and re-emission efficiencies are obtained if the quality factor is $Q = F = 2\pi/(\alpha\lambda) \approx 26$ for a 5 GHz cavity. In this low-Q regime, $\kappa \gg \Gamma$ and a coherence time of a few hundreds of microseconds would translate into a multimode capacity of a few tens of modes. By working with crystals having lower doping concentrations, the coherence time can be significantly increased [54], while still reaching the impedance matching point with low-Q resonators. In this case, a few thousand modes might realistically be stored very efficiently. Rare-earth doped materials are not the only option, and other candidates such as negatively charged nitrogen-vacancy color centers in diamond [55] or bismuth donors in silicon [56] may be even more promising.

elie.gouzien@cea.fr
† https://quantum.paris

[1] M. Kjaergaard, M. E. Schwartz, J. Braumüller, P. Krantz, J. I.-J. Wang, S. Gustavsson, and W. D. Oliver, Superconducting Qubits: Current State of Play, Annual Review of Condensed Matter Physics 11, 369 (2020), 1905.13641.
[2] L. Lamata, A. Parra-Rodriguez, M. Sanz, and E. Solano, Digital-analog quantum simulations with superconducting circuits, Advances in Physics: X 3, 1457981 (2018), 1711.09810.
[3] C. Gidney and M. Ekerå, How to factor 2048 bit RSA integers in 8 hours using 20 million noisy qubits, Quantum 5, 433 (2021), 1905.09749.
[4] Y. R. Sanders, D. W. Berry, P. C. S. Costa, L. W. Tessler, N. Wiebe, C. Gidney, H. Neven, and R. Babbush, Compilation of Fault-Tolerant Quantum Heuristics for Combinatorial Optimization, PRX Quantum 1, 020312 (2020), 2007.07391.
[5] J. Lee, D. W. Berry, C. Gidney, W. J. Huggins, J. R. McClean, N. Wiebe, and R. Babbush, Even More Efficient Quantum Computations of Chemistry Through Tensor Hypercontraction, PRX Quantum 2, 030305 (2021), 2011.03494.
[6] D. Kielpinski, C. Monroe, and D. J. Wineland, Architecture for a large-scale ion-trap quantum computer, Nature 417, 709 (2002).
[7] D. D. Thaker, T. S. Metodi, A. W. Cross, I. L. Chuang, and F. T. Chong, Quantum Memory Hierarchies: Efficient Designs to Match Available Parallelism in Quantum Computing, in 33rd International Symposium on Computer Architecture (ISCA’06) (IEEE, 2006) pp. 378–390, quant-ph/0604070.
[8] Z.-L. Xiang, S. Ashhab, J.-Q. You, and F. Nori, Hybrid quantum circuits: Superconducting circuits interacting with other quantum systems, Reviews of Modern Physics 85, 623 (2013), 1204.2137.
[9] G. Kurizki, P. Bertet, Y. Kubo, K. Mølmer, D. Petrosyan, P. Rabl, and J. Schmiedmayer, Quantum technologies with hybrid systems, Proceedings of the National Academy of Sciences 112, 3866 (2015), 1504.00158.
[10] C. Grezes, Y. Kubo, B. Julsgaard, T. Umeda, J. Isoya, H. Sumiya, H. Abe, S. Onoda, T. Ohshima, K. Nakamura, I. Diniz, A. Auffeves, V. Jacques, J.-F. Roch, D. Vion, D. Esteve, K. Mølmer, and P. Bertet, Towards a spin-ensemble quantum memory for superconducting qubits, Comptes Rendus Physique 17, 693 (2016), 1510.06565.
[11] H. Bombín and M. A. Martin-Delgado, Exact topological quantum order in D = 3 and beyond: Branyons and brane-net condensates, Physical Review B 75, 075103 (2007), cond-mat/0607736.
[12] P. W. Shor, Algorithms for quantum computation: discrete logarithms and factoring, in Proceedings 35th Annual Symposium on Foundations of Computer Science (IEEE Comput. Soc. Press, 1994) pp. 124–134.
[13] M. Ekerå and J. Håstad, Quantum Algorithms for Computing Short Discrete Logarithms and Factoring RSA Integers, in Post-Quantum Cryptography, Lecture Notes in Computer Science, Vol. 10346, edited by T. Lange and T. Takagi (Springer International Publishing, 2017) pp. 347–363, 1702.00249.
[14] R. L. Rivest, A. Shamir, and L. Adleman, A method for obtaining digital signatures and public-key cryptosystems, Communications of the ACM 21, 120 (1978).
[15] W. Diffie and M. E. Hellman, New directions in cryptography, IEEE Transactions on Information Theory 22, 644 (1976).
[16] Information Technology Laboratory, Digital Signature Standard (DSS) (2013).
[17] P. W. Shor, Polynomial-Time Algorithms for Prime Factorization and Discrete Logarithms on a Quantum Computer, SIAM Journal on Computing 26, 1484 (1997), quant-ph/9508027.
[18] M. Ekerå, Modifying Shor’s algorithm to compute short discrete logarithms (2016), https://eprint.iacr.org/2016/1128.
[19] M. Ekerå, On post-processing in the quantum algorithm for computing short discrete logarithms (2017), https://eprint.iacr.org/2017/1122.
[20] M. Ekerå, Quantum algorithms for computing general discrete logarithms and orders with tradeoffs (2018), https://eprint.iacr.org/2018/797.
[21] C. Gidney, Windowed quantum arithmetic, 1905.07682 (2019).
[22] Code is available at https://github.com/ElieGouzien/factoring_with_memory.
[23] H. Bombín, Gauge color codes: optimal transversal gates and gauge fixing in topological stabilizer codes, New Journal of Physics 17, 083002 (2015), 1311.0879.
[24] E. T. Campbell, B. M. Terhal, and C. Vuillot, Roads towards fault-tolerant universal quantum computation, Nature 549, 172 (2017), 1612.07330.
[25] H. Bombín, Single-Shot Fault-Tolerant Quantum Error Correction, Physical Review X 5, 031043 (2015), 1404.5504.
[26] B. J. Brown, N. H. Nickerson, and D. E. Browne, Fault-tolerant error correction with the gauge color code, Nature Communications 7, 12302 (2016), 1503.08217.
[27] S. J. Devitt, W. J. Munro, and K. Nemoto, Quantum error correction for beginners, Reports on Progress in Physics 76, 076001 (2013), 0905.2794.
[28] M. Afzelius, N. Sangouard, G. Johansson, M. U. Staudt, and C. M. Wilson, Proposal for a coherent quantum memory for propagating microwave photons, New Journal of Physics 15, 065008 (2013), 1301.1858.
[29] M. Afzelius and C. Simon, Impedance-matched cavity quantum memory, Physical Review A 82, 022310 (2010), 1004.2469.
[30] T. Chanelière, G. Hétet, and N. Sangouard, Quantum Optical Memory Protocols in Atomic Ensembles, in Advances In Atomic, Molecular, and Optical Physics, Vol. 67 (Elsevier, 2018) Chap. 2, pp. 77–150, 1801.10023.
[31] P. A. Bushev, A. K. Feofanov, H. Rotzinger, I. Protopopov, J. H. Cole, C. M. Wilson, G. Fischer, A. V. Lukashenko, and A. V. Ustinov, Ultralow-power spectroscopy of a rare-earth spin ensemble using a superconducting resonator, Physical Review B 84, 060501 (2011), 1102.3841.
[32] M. U. Staudt, I.-C. Hoi, P. Krantz, M. Sandberg, M. Simoen, P. A. Bushev, N. Sangouard, M. Afzelius, V. S. Shumeiko, G. Johansson, P. Delsing, and C. M. Wilson, Coupling of an erbium spin ensemble to a superconducting resonator, Journal of Physics B: Atomic, Molecular and Optical Physics 45, 124019 (2012), 1201.1718.
[33] S. Probst, H. Rotzinger, S. Wünsch, P. Jung, M. Jerger, M. Siegel, A. V. Ustinov, and P. A. Bushev, Anisotropic Rare-Earth Spin Ensemble Strongly Coupled to a Superconducting Resonator, Physical Review Letters 110, 157001 (2013), 1212.2856.
[34] R. B. Griffiths and C.-S. Niu, Semiclassical Fourier Transform for Quantum Computation, Physical Review Letters 76, 3228 (1996), quant-ph/9511007.
[35] C. Gidney, Approximate encoded permutations and piecewise quantum adders, 1905.08488 (2019).
[36] S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P. Moulton, A new quantum ripple-carry addition circuit, quant-ph/0410184 (2004).
[37] C. Gidney, Halving the cost of quantum addition, Quantum 2, 74 (2018), 1709.06648.
[38] V. Vedral, A. Barenco, and A. Ekert, Quantum networks for elementary arithmetic operations, Physical Review A 54, 147 (1996), quant-ph/9511018.
[39] C. Zalka, Shor’s algorithm with fewer (pure) qubits, quant-ph/0601097 (2006).
[40] R. Babbush, C. Gidney, D. W. Berry, N. Wiebe, J. McClean, A. Paler, A. Fowler, and H. Neven, Encoding Electronic Spectra in Quantum Circuits with Linear T Complexity, Physical Review X 8, 041015 (2018), 1805.03662.
[41] D. W. Berry, C. Gidney, M. Motta, J. R. McClean, and R. Babbush, Qubitization of Arbitrary Basis Quantum Chemistry Leveraging Sparsity and Low Rank Factorization, Quantum 3, 208 (2019), 1902.02134.
[42] T. G. Draper, Addition on a Quantum Computer, quant-ph/0008033 (2000).
[43] T. G. Draper, S. A. Kutin, E. M. Rains, and K. M. Svore, A logarithmic-depth quantum carry-lookahead adder, Quantum Information and Computation 6, 351 (2006), quant-ph/0406142.
[44] Due to the measurement-based uncomputation, it is often more efficient to implement the non-Clifford operations through AND computation. In case of direct implementation of Toffoli gates, the circuit cost could be slightly reduced.
[45] D. Poulin, Stabilizer Formalism for Operator Quantum Error Correction, Physical Review Letters 95, 230504 (2005), quant-ph/0508131.
[46] P. Zanardi, D. A. Lidar, and S. Lloyd, Quantum Tensor Product Structures are Observable Induced, Physical Review Letters 92, 060402 (2004), quant-ph/0308043.
[47] M. E. Beverland, A. Kubica, and K. M. Svore, Cost of Universality: A Comparative Study of the Overhead of State Distillation and Code Switching with Color Codes, PRX Quantum 2, 020341 (2021), 2101.02211.
[48] A. M. Kubica, The ABCs of the color code: A study of topological quantum codes as toy models for fault-tolerant quantum computation and quantum phases of matter, Ph.D. thesis (2018).
[49] A. Kubica and N. Delfosse, Efficient color code decoders in d ≥ 2 dimensions from toric code decoders, 1905.07393 (2019).
[50] A. Kubica, M. E. Beverland, F. Brandão, J. Preskill, and K. M. Svore, Three-Dimensional Color Code Thresholds via Statistical-Mechanical Mapping, Physical Review Letters 120, 180501 (2018), 1708.07131.
[51] F. Boudot, P. Gaudry, A. Guillevic, N. Heninger, E. Thomé, and P. Zimmermann, Factorization of RSA-250 (2020).
[52] S. Bravyi and R. König, Classification of Topologically Protected Gates for Local Stabilizer Codes, Physical Review Letters 110, 170503 (2013), 1206.1609.
[53] C. Simon, H. de Riedmatten, M. Afzelius, N. Sangouard, H. Zbinden, and N. Gisin, Quantum Repeaters with Photon Pair Sources and Multimode Memories, Physical Review Letters 98, 190503 (2007), quant-ph/0701239.
[54] M. Le Dantec, M. Rancic, E. Flurin, D. Vion, D. Esteve, P. Bertet, P. Goldner, T. Chanelière, B. Sylvain, S. Lin, and R. B. Liu, Twenty millisecond electron-spin coherence in an erbium doped crystal, in Bulletin of the American Physical Society (American Physical Society, 2021).
[55] Y. Kubo, C. Grezes, A. Dewes, T. Umeda, J. Isoya, H. Sumiya, N. Morishita, H. Abe, S. Onoda, T. Ohshima, V. Jacques, A. Dréau, J.-F. Roch, I. Diniz, A. Auffeves, D. Vion, D. Esteve, and P. Bertet, Hybrid Quantum Circuit with a Superconducting Qubit Coupled to a Spin Ensemble, Physical Review Letters 107, 220501 (2011), 1110.2978.
[56] V. Ranjan, J. O’Sullivan, E. Albertinale, B. Albanese, T. Chanelière, T. Schenkel, D. Vion, D. Esteve, E. Flurin, J. J. L. Morton, and P. Bertet, Multimode Storage of Quantum Microwave Fields in Electron Spins over 100 ms, Physical Review Letters 125, 210505 (2020), 2005.09275.
