Daniel Gottesman Book
Daniel Gottesman
May 7, 2024
Contents
5 Combining The Old And The New: Making Quantum Codes From Classical Codes 77
5.1 CSS Codes 77
5.2 GF(4) Codes and Stabilizer Codes 82
7 Tighter, Please: Upper And Lower Bounds On Quantum Codes 109
7.1 The Quantum Gilbert-Varshamov Bound 109
7.2 The Quantum Hamming Bound 111
7.3 The Quantum Singleton Bound 112
7.4 Linear Programming Bounds 114
9 Now, What Did I Leave Out?: Other Things You Should Know About Quantum Error Correction 141
9.1 Concatenated Codes 141
9.2 Convolutional Codes 145
9.3 Information-Theoretic Approach to QECCs 152
12 Who Corrects The Correctors?: Fault-Tolerant Error Correction And Measurement 193
12.1 Fault Tolerant Pauli Measurement for Stabilizer Codes 193
12.2 Shor Error Correction 202
12.3 Steane Error Correction and Measurement 207
12.4 Knill Error Correction and Measurement 216
12.5 Efficiency of FTEC Protocols 221
14 If It's Worth Doing, It's Worth Overdoing: The Threshold Theorem 245
14.1 Adversarial Errors 245
14.2 Good and Bad Extended Rectangles 247
14.3 Correctness 249
14.4 Incorrectness: Simulations With a Bad Extended Rectangle 254
14.5 Probability of Having a Bad Rectangle 258
14.6 Level Reduction 265
14.7 Concatenation and the Threshold Theorem 269
16 Now, What Did I Leave Out, Part Two?: Other Things You Should Know About Fault Tolerance 321
16.1 Ancilla Factories 321
16.2 Fault Tolerance for Polynomial Codes 321
16.3 More General Notions of Fault Tolerance 321
16.4 Upper Bounds on Fault Tolerance 321
20 It Certainly Helps If You Look At Things The Right Way: Graph States 355
IV Appendices 359
A Quantum Computation 361
Part I
Chapter 1
In this book, you will learn how to seek out and destroy errors on quantum states. Quantum errors are nasty,
unforgiving things. If you don’t know what you are doing, a single misstep can result in the destruction of
irreplaceable quantum information. Of course, nobody’s perfect, and in part II, you will learn how to handle
your own fallibility. (Short answer: very carefully.) For now, we will assume you don’t make any mistakes,
but that doesn’t mean things will be easy. The quantum errors are still out there, hungering to consume
quantum information. You need to get the errors before they get you.
The key to most error hunts is preparation. There are two forms of preparation. One is dressing your
quantum information properly: that is, encoding it in an appropriate quantum error-correcting code. You
will learn about that in the other chapters in part I. This chapter will focus on the other aspect of preparation:
studying the habits of the errors you are hunting.
Figure 1.1: Alice, who has a perfect quantum computer, wants to send qubits to Bob, who also has a perfect quantum computer. However, the communications channel between them is not perfect.
a quantum state from Alice to Bob with classical communication and some sort of entangled state. This is
not an exhaustive list. Indeed, any sort of communication, even a classical telephone call, can be considered
as a quantum channel. If you try to send a qubit through a regular telephone connection, no amount of
quantum error correction will allow you to recover the full quantum state afterwards, but that doesn't affect the telephone line's status as a quantum channel — it is simply a very noisy quantum channel.
Another common situation is when Bob is replaced by Alice’s future self. In this case, we really want
a "quantum memory": Alice wants to prepare some quantum information, go off and do other things, and
then return and manipulate her stored quantum information again. If we assume that Alice’s manipulations
at the beginning and the end of the process are perfect, we can consider the “memory” portion, when the
qubit is stored but subject to noise, as a quantum channel.
It is worth noting that the notion of a quantum channel only applies when we can look at a single
communications link in isolation. That is, to have a quantum channel, the quantum state that exits the
channel should only depend on the quantum state that goes into the channel. That may seem like a tautology,
but it is not. Imagine that Alice has a quantum memory, and prepares a qubit to store in the memory at
time 0. At time 1, she returns and fiddles some more with the stored system, then goes away again and
comes back at time 2. The storage from time 0 to time 1 is a quantum channel (assuming Alice’s initial
preparation of the qubit is perfect), but the storage from time 1 to time 2 might not be. The problem is
that the environment might remember what happened between time 0 and 1 (and more importantly, might
remember something about the state that was stored between time 0 and 1) and use that to influence what
it does to the state during the second time interval. The error now no longer depends only on what state
is stored at time 1; it also depends on what state was stored at time 0. Of course, some environments have
a very short memory, and in that case, it is a very good approximation to consider the time interval 1 to
2 to be independent of the time interval 0 to 1, and with that approximation, the second time interval is
a quantum channel. When the environment has no memory, and every time interval is independent of any
other non-overlapping time interval, the environment (or the error source) is called Markovian. When the
environment does remember over time scales long enough to matter, it is a non-Markovian environment.
In part I, we only consider the case depicted in figure 1.1. There the environment has only one opportunity
to attack the quantum information. While that opportunity may last for an extended period of time,
because Alice and Bob do not do anything with the quantum information during that time, we can lump
together everything the environment does to the state into a single transformation, and consider the whole
communications line as a single quantum channel. The question of whether the environment is Markovian
or non-Markovian then becomes moot. If we were to generalize this picture and allow noise during Alice and
Bob’s processing of the qubit, then the question arises again, since the environment then gets more than one
shot at the quantum information, but don’t worry about that situation until part II.
Now it is time to formally define a quantum channel:

Definition 1.1. A quantum channel is a completely positive, trace-preserving (CPTP) map.

Wasn't that easy? At least, it is if you know what a CPTP map is. If not, you should refer to appendix A, where you will learn that a CPTP map is the most general physically possible transformation for an operation where the output depends only on the input, so this is the right definition. It is frequently convenient to consider the Kraus form of a CPTP map:
$$\mathcal{E}(\rho) = \sum_k A_k \rho A_k^\dagger. \qquad (1.1)$$
We can think of this channel as a collection of possible errors $A_k$, where error $A_k$ occurs with probability $\operatorname{tr}(A_k \rho A_k^\dagger)$. However, note that the probability of $A_k$ occurring is not a single value but actually depends on the state $\rho$. Furthermore, remember that the decomposition into $A_k$ operators is not unique and that $\sum_k A_k^\dagger A_k = I$.
Frequently, instead of dealing explicitly with a quantum channel, I will instead refer to the set of possible errors. One way to write this set is $\mathcal{E} = \{A_k\}$, but it is frequently convenient to rescale the errors, so more generally $\mathcal{E} = \{E_k\}$, where each $A_k = \sqrt{p_k}\, E_k$ for some scalar $p_k$.
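To make this bookkeeping concrete, here is a minimal numerical sketch (in Python with numpy; the channel, state, and $p = 0.1$ are arbitrary choices for illustration, not anything fixed by the text). It builds the Kraus operators of a dephasing-type channel, checks the completeness condition, and computes the probability $\operatorname{tr}(A_k \rho A_k^\dagger)$ of each error:

    import numpy as np

    # Kraus operators sqrt(1-p) I and sqrt(p) Z for an illustrative channel
    p = 0.1
    I = np.eye(2)
    Z = np.diag([1.0, -1.0])
    kraus = [np.sqrt(1 - p) * I, np.sqrt(p) * Z]

    # Completeness condition: sum_k A_k^dag A_k = I
    assert np.allclose(sum(A.conj().T @ A for A in kraus), I)

    # Apply E(rho) = sum_k A_k rho A_k^dag to a pure state alpha|0> + beta|1>
    psi = np.array([np.sqrt(0.3), np.sqrt(0.7)])
    rho = np.outer(psi, psi.conj())
    out = sum(A @ rho @ A.conj().T for A in kraus)

    # Probability of "error" A_k is tr(A_k rho A_k^dag)
    probs = [np.trace(A @ rho @ A.conj().T).real for A in kraus]
    print(probs)  # [0.9, 0.1] -- independent of rho here, since both Kraus
                  # operators are proportional to unitaries

For a channel whose Kraus operators are not proportional to unitaries, the printed probabilities would change with the input state, which is the state-dependence noted above.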
1.2 Single-Qubit Example Channels
1.2.1 Unitary Channels, Pauli Errors
Let us start by discussing some examples of channels acting on a single-qubit input. The simplest case is when there is only a single value of $k$ in the Kraus decomposition:
$$\mathcal{E}(\rho) = A\rho A^\dagger. \qquad (1.2)$$
Since $\sum_k A_k^\dagger A_k = I$, it follows that $A$ is unitary. Typographically, I will usually represent a unitary channel the same way as a unitary matrix, e.g. $A(\rho)$ versus $A(|\psi\rangle)$, even though they formally act on different kinds of objects (density matrices versus state vectors). The one exception is that I will usually write the identity channel as $\mathcal{I}$, as opposed to the identity unitary $I$.
There are of course infinitely many unitary maps, but some are more interesting than others. One set that you will be particularly sick of by the end of this book are the Pauli matrices:
$$I = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}, \quad X = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad Y = \begin{pmatrix} 0 & -i \\ i & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 \\ 0 & -1 \end{pmatrix}. \qquad (1.3)$$
$I$ is, of course, the identity matrix — no error. $X$ is probably the first thing you think of when you think of an error, a classical bit flip:
$$X|0\rangle = |1\rangle \qquad (1.4)$$
$$X|1\rangle = |0\rangle. \qquad (1.5)$$
However, since it is a quantum operator, it also acts sensibly on superpositions:
$$X(\alpha|0\rangle + \beta|1\rangle) = \alpha|1\rangle + \beta|0\rangle. \qquad (1.6)$$
$Z$ is the most basic kind of truly quantum error, a phase flip:
$$Z(\alpha|0\rangle + \beta|1\rangle) = \alpha|0\rangle - \beta|1\rangle. \qquad (1.7)$$
$Y$ is then just a combined bit flip and phase flip error:
$$Y = iXZ \qquad (1.8)$$
$$Y(\alpha|0\rangle + \beta|1\rangle) = i\alpha|1\rangle - i\beta|0\rangle. \qquad (1.9)$$
Recall that an overall phase — one that affects all states uniformly — has no physical significance, so $Y$ and $XZ$ are really the same channel. In the form presented above, all the Pauli matrices are Hermitian as well as unitary, which is sometimes useful.
In other contexts, the Pauli matrices are often written as $\sigma_x$, $\sigma_y$, and $\sigma_z$ or $\sigma_1$, $\sigma_2$, and $\sigma_3$, but those are too much writing and less easy to read. I'll be using the Pauli matrices a lot in this book, so I'll use the more straightforward notation $X$, $Y$, and $Z$. In some of the earlier quantum error-correction literature, $Y = XZ$ instead of $iXZ$. There is not a huge difference, but I think this convention is somewhat nicer overall.
There are many more single qubit unitary errors, and some are even interesting. For instance, we can have phase rotation by an arbitrary angle:
$$R_\theta = \begin{pmatrix} e^{-i\theta} & 0 \\ 0 & e^{i\theta} \end{pmatrix} = e^{-i\theta} \begin{pmatrix} 1 & 0 \\ 0 & e^{2i\theta} \end{pmatrix}. \qquad (1.10)$$
Again, we can ignore an overall phase, so $R_\theta$ is the same channel as $\operatorname{diag}(1, e^{2i\theta})$. The full set of physically distinct one-qubit unitary channels is the group SU(2).
In the Bloch sphere picture of the state space for a qubit, a unitary map is just a rotation of the sphere. $X$, $Y$, and $Z$ are $\pi$ rotations around the $X$, $Y$, and $Z$ axes, as one might expect. $R_\theta$ is a rotation by angle $2\theta$ around the $Z$ axis. (I have defined $R_\theta$ this way in order to agree with the prevailing terminology for phase rotations, which results from talking about spin-1/2 particles.) Note that a reflection of the Bloch sphere is not a completely positive map. For instance, the transpose map is a reflection in the $XZ$ plane, and when applied to part of an entangled state, the transpose gives something non-positive.
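As a quick numerical sanity check on these conventions (a sketch, not part of the text), the following verifies $Y = iXZ$ and that $R_\theta$ rotates the Bloch vector by $2\theta$ about the $Z$ axis:

    import numpy as np

    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1.0 + 0j, -1.0])

    assert np.allclose(Y, 1j * X @ Z)  # Y = iXZ in this book's convention

    def bloch(rho):
        # Bloch vector (x, y, z) of a one-qubit density matrix
        return np.real([np.trace(rho @ P) for P in (X, Y, Z)])

    theta = 0.3
    R = np.diag([np.exp(-1j * theta), np.exp(1j * theta)])
    psi = np.array([np.cos(0.7), np.sin(0.7) * np.exp(0.2j)])
    rho = np.outer(psi, psi.conj())

    x, y, z = bloch(rho)
    x2, y2, z2 = bloch(R @ rho @ R.conj().T)
    # Rotation by angle 2*theta around the Z axis
    assert np.isclose(x2, x * np.cos(2 * theta) - y * np.sin(2 * theta))
    assert np.isclose(y2, x * np.sin(2 * theta) + y * np.cos(2 * theta))
    assert np.isclose(z2, z)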
[Figure 1.2: Bloch sphere picture of a dephasing channel, which shrinks the sphere toward the Z axis.]

1.2.2 Dephasing Channels

Definition 1.2. A channel of the form
$$R_p(\rho) = (1-p)\rho + p Z\rho Z \qquad (1.11)$$
is a dephasing channel. When $p = 1/2$, we have the completely dephasing channel. We usually restrict attention to $p \le 1/2$, since channels with $p > 1/2$ are related to channels with $p < 1/2$ by a $Z$ operation.
The Kraus operators of a dephasing channel in this form are $\sqrt{1-p}\, I$ and $\sqrt{p}\, Z$. Since both are proportional to unitary maps, the probabilities of these two errors occurring do not depend on the input state. Thus, this channel corresponds to no error with probability $1-p$ and a phase flip with probability $p$. We can calculate how it acts on the density matrix in component form:
$$R_p: \begin{pmatrix} a & b \\ c & d \end{pmatrix} \mapsto \begin{pmatrix} a & (1-2p)b \\ (1-2p)c & d \end{pmatrix}. \qquad (1.12)$$
In other words, a dephasing channel shrinks the off-diagonal components of the density matrix. In the completely dephasing channel, the off-diagonal terms disappear completely.
An alternate Kraus decomposition is also edifying:
$$R_p(\rho) = (1-2p)\rho + \frac{2p}{\pi} \int_0^\pi R_\theta\, \rho\, R_\theta^\dagger\, d\theta. \qquad (1.13)$$
The dephasing channel is a channel for which, with probability $1-2p$, nothing happens to the state, and with probability $2p$, the phase between $|0\rangle$ and $|1\rangle$ is completely randomized. This seems to conflict with the decomposition into $I$ and $Z$, where the state is left unchanged with probability $1-p$, not with probability $1-2p$. However, when the dephasing angle $\theta$ is chosen uniformly at random, there is a reasonable chance that $\theta$ is small and the state does not change very much. Of course, the probability that the angle $\theta$ is exactly zero is just $1-2p$, but using the magic of quantum mechanics, the small-$\theta$ cases cancel out just right so that if we break the channel up into $I$ and $Z$, we find the probability of error is only $p$.
This example illustrates a rather disturbing principle: in quantum mechanics, the notion of "probability of error" is inherently somewhat subjective. Where error correction is concerned, the decomposition into $I$ and $Z$ is the better choice for the dephasing channel, for reasons that will become clearer in chapter 2. However, there are many channels for which no decomposition is particularly favored, even for the specific application of quantum error correction.
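To see numerically that the two decompositions really are the same channel (a sketch; the state and $p$ are arbitrary), one can compare the $I$/$Z$ form with a discretized version of the integral in equation (1.13):

    import numpy as np

    p = 0.23
    Z = np.diag([1.0 + 0j, -1.0])
    rho = np.array([[0.6, 0.2 + 0.1j], [0.2 - 0.1j, 0.4]])

    # Decomposition 1: identity with probability 1-p, phase flip with probability p
    out1 = (1 - p) * rho + p * Z @ rho @ Z

    # Decomposition 2: identity with probability 1-2p, random R_theta with
    # probability 2p (Riemann sum over theta in [0, pi))
    thetas = np.linspace(0, np.pi, 20000, endpoint=False)
    Rs = [np.diag([np.exp(-1j * t), np.exp(1j * t)]) for t in thetas]
    avg = np.mean([R @ rho @ R.conj().T for R in Rs], axis=0)
    out2 = (1 - 2 * p) * rho + 2 * p * avg

    assert np.allclose(out1, out2)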
In the Bloch sphere picture, a dephasing channel shrinks the sphere into an ellipsoid, leaving the Z axis
unchanged, as in figure 1.2. A completely dephasing channel shrinks the Bloch sphere down to just the
segment on the Z axis.
The dephasing channel is a physically very interesting channel. A dephasing channel occurs in any
system where the environment learns about the qubit in the standard basis but does not otherwise interfere
with the state. Even in a more realistic system where there are more complicated interactions between the
system and the environment, there is frequently a large dephasing component to the noise. The prevalence
of approximate dephasing channels is one of the reasons that the macroscopic world appears classical to us
— a completely dephasing channel converts a qubit into a probabilistic classical bit.
We can make a simple model of a dephasing channel by having one environment qubit interact with the system qubit via a Hamiltonian $H = \omega Z \otimes Z$. The environment qubit starts in the pure state $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, and the system qubit starts in state $|\psi_0\rangle = \alpha|0\rangle + \beta|1\rangle$. Then, after a time $t$, the state of the two qubits is
$$e^{-i\omega t Z \otimes Z}\, |\psi_0\rangle \otimes |+\rangle = \frac{1}{\sqrt{2}} \left( \alpha e^{-i\omega t}|00\rangle + \alpha e^{+i\omega t}|01\rangle + \beta e^{+i\omega t}|10\rangle + \beta e^{-i\omega t}|11\rangle \right). \qquad (1.14)$$
(Taking $\hbar = 1$.) Tracing over the second (environment) qubit, we find that the system qubit at time $t$ is in an equal mixture of $R_{-\omega t}|\psi_0\rangle$ and $R_{+\omega t}|\psi_0\rangle$, giving it density matrix
$$\begin{pmatrix} |\alpha|^2 & \alpha\beta^* \cos(2\omega t) \\ \alpha^*\beta \cos(2\omega t) & |\beta|^2 \end{pmatrix}. \qquad (1.15)$$
In other words, at time $t$, we have the dephasing channel $R_{(1-\cos(2\omega t))/2}$. In this simple model, the system dephases at short times, but after a longer time $t = \pi/\omega$, it returns to its starting state.
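The following sketch simulates this two-qubit model directly (numpy; $\alpha$, $\beta$, $\omega$, $t$ are arbitrary values) and checks the reduced density matrix against equation (1.15):

    import numpy as np

    alpha, beta = np.sqrt(0.3), np.sqrt(0.7)
    omega, t = 1.7, 0.4

    Z = np.diag([1.0 + 0j, -1.0])
    H = omega * np.kron(Z, Z)                   # H = omega Z (x) Z, system first
    U = np.diag(np.exp(-1j * t * np.diag(H)))   # H is diagonal, so e^{-iHt} is too

    psi0 = np.array([alpha, beta])              # system qubit
    plus = np.array([1.0, 1.0]) / np.sqrt(2)    # environment qubit
    state = U @ np.kron(psi0, plus)

    # Trace out the environment (the second tensor factor)
    full = np.outer(state, state.conj()).reshape(2, 2, 2, 2)
    rho_sys = np.einsum('ikjk->ij', full)

    expected = np.array(
        [[abs(alpha)**2, alpha * np.conj(beta) * np.cos(2 * omega * t)],
         [np.conj(alpha) * beta * np.cos(2 * omega * t), abs(beta)**2]])
    assert np.allclose(rho_sys, expected)

At $t = \pi/\omega$, the same code gives back exactly $|\psi_0\rangle\langle\psi_0|$, illustrating the revival noted above.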
In a more realistic system, there are many different environment qubits interacting with the system, and/or there are additional qubits interacting with the environment qubits. If we take a Markovian form of our simple model, and assume that the environment's internal interaction effectively resets the environment qubit after a very short time, phase coherence instead decays steadily:
$$\begin{pmatrix} |\alpha|^2 & \alpha\beta^* e^{-t/T_2} \\ \alpha^*\beta e^{-t/T_2} & |\beta|^2 \end{pmatrix}, \qquad (1.16)$$
corresponding to a dephasing channel $R_{(1 - e^{-t/T_2})/2}$. The time constant $T_2$ of the exponential decay is known as the $T_2$ time. (Otherwise I probably would have used a different symbol.) One way to think about this behavior is that there is a constant probability $1/T_2$ per unit time that the phase is completely randomized as an instantaneous "quantum jump".
Another common source of dephasing is a varying energy difference between the $|0\rangle$ and $|1\rangle$ states. If $|a\rangle$ has energy $E_a$, after time $t$, $|a\rangle$ has evolved into $e^{-iE_a t}|a\rangle$. We can ignore the global phase, but there is still a relative phase difference $e^{-i(E_1 - E_0)t}$ between $|0\rangle$ and $|1\rangle$. However, when $E_0$ and $E_1$ are known constants, we can generally ignore the relative phase too: if we keep track of the time $t$, the relative phase is known, and we can take it into account in any operation we want to perform. In some systems, there is a phase reference (such as the phase of a laser) which automatically compensates for the phase difference. However, when the energy difference varies unpredictably, we can no longer keep precise track of the relative phase, resulting in an uncompensated relative phase shift accumulating randomly over time. This results in dephasing. In some cases, the relative phase shift is not random, but is simply unknown, and more sophisticated techniques may be able to compensate.
Experimentally, the signature of dephasing is often a decay of coherent interference effects. In a Rabi oscillation experiment, the system cycles through $\cos(\Omega_R t)|0\rangle + \sin(\Omega_R t)|1\rangle$. After time $t$, the system is measured to test if it is $|0\rangle$ or $|1\rangle$. If the system were perfect, plotting the probability of getting 1 against the time would give a perfect oscillation $\sin^2(\Omega_R t)$ for all times. Instead, we get something more like figure 1.3, with oscillations decreasing in amplitude.
Let us imagine a Markovian environment, and assume that the $T_2$ time is much longer than the Rabi period $1/\Omega_R$, so that we can assume the dephasing and oscillation are independent. If there is a quantum jump causing full dephasing at time $t_0$, the pure state $\cos(\Omega_R t_0)|0\rangle + \sin(\Omega_R t_0)|1\rangle$ becomes the mixed state with probability $\cos^2(\Omega_R t_0)$ of $|0\rangle$ and probability $\sin^2(\Omega_R t_0)$ of $|1\rangle$. Then $|0\rangle$ and $|1\rangle$ continue with their
[Figure 1.3: Probability of measuring 1 versus time; the Rabi oscillations decay in amplitude toward 1/2.]
own oscillations, but out of phase with each other. This mixture has density matrix
$$\begin{pmatrix} \cos^2(\Omega_R t_0)\cos^2(\Omega_R t') + \sin^2(\Omega_R t_0)\sin^2(\Omega_R t') & \left(\cos^2(\Omega_R t_0) - \sin^2(\Omega_R t_0)\right)\sin(\Omega_R t')\cos(\Omega_R t') \\ \left(\cos^2(\Omega_R t_0) - \sin^2(\Omega_R t_0)\right)\sin(\Omega_R t')\cos(\Omega_R t') & \cos^2(\Omega_R t_0)\sin^2(\Omega_R t') + \sin^2(\Omega_R t_0)\cos^2(\Omega_R t') \end{pmatrix} \qquad (1.17)$$
$$= \begin{pmatrix} \sin^2(\Omega_R t_0) + \cos(2\Omega_R t_0)\cos^2(\Omega_R t') & \cos(2\Omega_R t_0)\sin(\Omega_R t')\cos(\Omega_R t') \\ \cos(2\Omega_R t_0)\sin(\Omega_R t')\cos(\Omega_R t') & \sin^2(\Omega_R t_0) + \cos(2\Omega_R t_0)\sin^2(\Omega_R t') \end{pmatrix} \qquad (1.18)$$
$$= \sin^2(\Omega_R t_0)\, I + \cos(2\Omega_R t_0)\, \rho(t'), \qquad (1.19)$$
where $t' = t - t_0$ and $\rho(t')$ is the density matrix of a Rabi oscillation running for time $t'$. We will still see Rabi oscillation, but with a reduced amplitude.
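A quick numerical check of this algebra (a sketch; $\Omega_R$, $t_0$, $t'$ are arbitrary, and I assume the $|1\rangle$ branch continues as $-\sin(\Omega_R t')|0\rangle + \cos(\Omega_R t')|1\rangle$, i.e., the same rotation applied to $|1\rangle$):

    import numpy as np

    Om, t0, tp = 1.3, 0.7, 2.1   # Rabi frequency, jump time t0, elapsed time t'

    def rabi_rho(t):
        # Density matrix of an ideal Rabi oscillation at time t
        psi = np.array([np.cos(Om * t), np.sin(Om * t)])
        return np.outer(psi, psi)

    # After the dephasing jump at t0, the |0> and |1> branches oscillate
    # independently for a further time t'
    branch0 = rabi_rho(tp)
    psi1 = np.array([-np.sin(Om * tp), np.cos(Om * tp)])
    branch1 = np.outer(psi1, psi1)
    mixed = np.cos(Om * t0)**2 * branch0 + np.sin(Om * t0)**2 * branch1

    # Equation (1.19)
    expected = np.sin(Om * t0)**2 * np.eye(2) + np.cos(2 * Om * t0) * rabi_rho(tp)
    assert np.allclose(mixed, expected)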
This is not particularly enlightening until you remember that $\operatorname{tr}\rho = 1$, in which case you can see that
$$D_p(\rho) = \left(1 - \frac{4p}{3}\right)\rho + \frac{4p}{3} \cdot \frac{I}{2}. \qquad (1.22)$$
In other words, with probability $1 - 4p/3$, the qubit is left alone, but with probability $4p/3$, it is replaced with the completely mixed state. It also should now be clear why I defined $p = 3/4$ as the completely depolarizing channel: in that case, the input state is always replaced with the maximally mixed state. As with the dephasing channel, there is some ambiguity as to what the "true" error rate is for the depolarizing channel, but the two representations can be reconciled by recognizing that the completely mixed state does contain a component of the original input state.
The other thing to notice about the second representation is that it is much more symmetric than the first one. The decomposition into Paulis has a certain amount of symmetry, treating $X$, $Y$, and $Z$ on the
same footing, but the second decomposition has even more: there are no preferred bases or unitary operators at all appearing in it. This is the true beauty of the depolarizing channel — it can at once be considered as a simple mixture of the very basic Pauli errors, but also is invariant under any kind of unitary rotation. This symmetry property suggests a decomposition for the depolarizing channel akin to equation (1.13), and indeed there is one:
$$D_p(\rho) = (1 - 4p/3)\rho + \frac{2p}{3\pi^2} \int_{SU(2)} U\rho U^\dagger\, dU, \qquad (1.23)$$
where the integral uses the Haar measure, the unitarily-invariant measure over SU(2).
We can generalize the depolarizing channel by giving up its symmetry but keeping its decomposition into Paulis:

Definition 1.4. A channel of the form
$$\mathcal{E}(\rho) = p_I \rho + p_X X\rho X^\dagger + p_Y Y\rho Y^\dagger + p_Z Z\rho Z^\dagger \qquad (1.24)$$
is a Pauli channel.

A Pauli channel has potentially different probabilities for the four Pauli matrices $I$, $X$, $Y$, and $Z$. They don't have to be different — dephasing channels and depolarizing channels are both examples of Pauli channels — but once you've generalized to a Pauli channel, you might as well take advantage of the opportunity to have some variety among the Paulis. Naturally, $p_I + p_X + p_Y + p_Z = 1$ so that the total probability adds up to 1.
The depolarizing channel uniformly shrinks the Bloch sphere into a smaller sphere still centered on the
origin, or to a single point (the maximally mixed state) if we have the completely depolarizing channel. A
more general Pauli map shrinks the Bloch sphere into an ellipsoid centered on the origin.
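The ellipsoid can be read off componentwise: the Bloch component along each axis shrinks by a factor of one minus twice the total probability of the two Paulis that anticommute with that axis. A numerical sketch (the probabilities are arbitrary):

    import numpy as np

    pI, pX, pY, pZ = 0.85, 0.05, 0.04, 0.06   # must sum to 1
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1.0 + 0j, -1.0])

    def channel(rho):
        return pI * rho + pX * X @ rho @ X + pY * Y @ rho @ Y + pZ * Z @ rho @ Z

    def bloch(rho):
        return np.real([np.trace(rho @ P) for P in (X, Y, Z)])

    rng = np.random.default_rng(1)
    v = rng.normal(size=2) + 1j * rng.normal(size=2)
    rho = np.outer(v, v.conj()) / np.linalg.norm(v)**2    # random pure state

    bx, by, bz = bloch(rho)
    cx, cy, cz = bloch(channel(rho))
    assert np.isclose(cx, (1 - 2 * (pY + pZ)) * bx)   # X axis shrinks via Y, Z
    assert np.isclose(cy, (1 - 2 * (pX + pZ)) * by)
    assert np.isclose(cz, (1 - 2 * (pX + pY)) * bz)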
Probably you were able to get the right form for $A_1$ (perhaps without the square root, which is a matter of convention depending on how we choose to parametrize amplitude damping channels). It represents the possibility that $|1\rangle$ can become $|0\rangle$. However, $|0\rangle$ never spontaneously gains energy in this idealized channel; even in the real world, it is fairly rare, since it requires that a stray photon of about the right energy be wandering by. Therefore the lower left corner of $A_1$ is 0.
$A_0$ might surprise you. (If not, then good work.) The most obvious guess is that since there is no decay, nothing should happen, and $A_0$ should be proportional to the identity. However, you won't be able to satisfy the constraint that $A_0^\dagger A_0 + A_1^\dagger A_1 = I$ if you choose $A_1$ as above and $A_0 \propto I$. Conceptually, the reason for this is that $A_1$ can only occur if the initial state was $|1\rangle$. Therefore, if $A_1$ doesn't happen, it means that the initial state was more likely to be $|0\rangle$, and $A_0$ reflects that, reducing the amplitude of $|1\rangle$ in the initial state. The same phenomenon can be found in classical probability theory, for instance in the "Monty Hall problem."
In the Bloch sphere picture, amplitude damping shrinks the sphere into a smaller ellipsoid that touches it at the south pole. It differs from the depolarizing channel in that the center of the sphere is no longer fixed. Instead, the south pole (the $|0\rangle$ state) is fixed. In the limit $p = 1$, the whole sphere shrinks down to the south pole.
The characteristic decay time for a system undergoing continuous amplitude damping is the “T1 time.”
(After all, there had to be a T1 to go with T2 for the dephasing time scale.)
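For concreteness, here is a sketch of a common parametrization of the amplitude damping Kraus operators (consistent with the discussion above, though the square-root convention is a choice); the code checks completeness and the behavior at $p = 1$:

    import numpy as np

    p = 0.3
    A0 = np.array([[1, 0], [0, np.sqrt(1 - p)]])   # no decay observed
    A1 = np.array([[0, np.sqrt(p)], [0, 0]])       # |1> decays to |0>

    # Completeness: A0^dag A0 + A1^dag A1 = I
    assert np.allclose(A0.T @ A0 + A1.T @ A1, np.eye(2))

    # A0 is not proportional to I: seeing no decay makes |0> more likely
    rho = 0.5 * np.ones((2, 2))                    # the state |+><+|
    out = A0 @ rho @ A0.T + A1 @ rho @ A1.T
    print(out[1, 1])   # 0.5 * (1 - p) < 0.5: population leaks toward |0>

    # In the limit p = 1, every state is sent to |0><0| (the fixed pole)
    B0, B1 = np.array([[1.0, 0], [0, 0]]), np.array([[0, 1.0], [0, 0]])
    assert np.allclose(B0 @ rho @ B0.T + B1 @ rho @ B1.T,
                       np.array([[1.0, 0], [0, 0]]))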
1.2.6 Erasure Errors
The example of the photon loss qubit channel suggests another interesting kind of error. When a photon
is not lost, the state is not changed at all. When a photon is lost, by monitoring the photon number
(but not the polarization), we can, in principle, tell that a photon has been lost. In that case, we do not
know what the original state of the system was, but at least we know that something has happened to it.
Measuring the photon number without destroying the polarization state is technologically difficult, so for
this particular example, this is more an issue of principle than of practice. There are, however, other systems
where monitoring for loss of the qubit is easier.
Definition 1.6. A qubit erasure error is an error $A$ acting on a single qubit which maps all qubit states to a third state $|{\perp}\rangle$ orthogonal to the qubit Hilbert space.
Erasure channels may seem to be difficult channels to correct, since the information is completely lost, but that's not the case. By measuring whether the state $|{\perp}\rangle$ is present (without collapsing superpositions of $|0\rangle$ and $|1\rangle$) — for instance, by measuring the number of photons — we can determine if an erasure error has occurred. Because erasure channels provide some classical information about which qubits underwent errors, erasure channels are actually easier to correct than more general channels.
Definition 1.7. A channel of the form
$$\mathcal{E}(\rho) = \sum_P p_P\, P\rho P^\dagger,$$
where the sum is taken over operators $P$ which are tensor products of $I$, $X$, $Y$, and $Z$, is a Pauli channel.

In a Pauli channel, the operator $P$ occurs with probability $p_P$. Pauli channels are a reasonable quantum analogue of classical channels. There is a definite probability of error, and the errors that occur are large and discrete. However, since we can have a mix of bit flip and phase errors, a Pauli channel does have enough quantum features to be interesting.
Definition 1.8. An independent channel on $n$ qudits (each of dimension $q$) has the form $\bigotimes_{i=1}^n \mathcal{E}_i$, where each $\mathcal{E}_i$ is a single-qudit channel.
A dimension-$q$ qudit is a single system whose state space is a Hilbert space of dimension $q$, so for instance, $q = 2$ gives us a qubit, and for the moment, we will restrict attention to qubits. Often, we set all $\mathcal{E}_i$'s to be equal to $\mathcal{E}$, so that all qubits are treated equally.
As a simple example, we can consider $\mathcal{E} = R_p$, the dephasing channel. Let us stick to $n = 3$ in order to be more explicit. There are a total of 8 possible Kraus operators for this channel, with the following probabilities:

Probability      Errors
$(1-p)^3$        $I \otimes I \otimes I$
$p(1-p)^2$       $Z \otimes I \otimes I$, $I \otimes Z \otimes I$, $I \otimes I \otimes Z$
$p^2(1-p)$       $Z \otimes Z \otimes I$, $Z \otimes I \otimes Z$, $I \otimes Z \otimes Z$
$p^3$            $Z \otimes Z \otimes Z$
The total probability of having a $Z$ error on exactly one qubit is then $3p(1-p)^2$, and the probability of having two $Z$ errors is $3p^2(1-p)$.
The chance of having at least one qubit with an error on it is therefore larger than the chance of having
a single qubit by itself go wrong under the same dephasing channel Rp . This makes sense, since there are
more places for errors to occur. However, when p is small, most of the time, there will only be 0 or 1 Z
error, and the other qubits will have I acting on them. This is the case we want to address through quantum
error correction — errors are rare, but not negligibly so. In the case of this 3-qubit dephasing channel, we
can make a good approximation by considering only the no-error or one-error possibilities, and ignoring the
two- and three-qubit error cases.
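A short sketch tallying these probabilities for the 3-qubit dephasing channel (and checking that weights 0 and 1 dominate when $p$ is small):

    from itertools import product

    n, p = 3, 0.01
    prob = {}
    for pattern in product([0, 1], repeat=n):      # 1 = Z error on that qubit
        w = sum(pattern)
        prob[pattern] = p**w * (1 - p)**(n - w)

    weight = lambda w: sum(q for pat, q in prob.items() if sum(pat) == w)
    print(weight(1))   # 3 p (1-p)^2, exactly one Z error
    print(weight(2))   # 3 p^2 (1-p), two Z errors
    print(1 - weight(0) - weight(1))   # O(p^2): the part we choose to ignore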
Definition 1.9. A t-qubit error is a linear operator which can be written as a sum $\sum_i B_i$, where each $B_i$ has weight at most $t$, not necessarily with support on the same set of qubits for different $i$. An $n$-qubit quantum channel is a t-qubit error channel if it has a Kraus decomposition for which all Kraus operators are $t$-qubit errors. We can similarly define a t-qubit error map acting linearly on $n$-qubit density matrices but which may not be trace preserving.
Technical point number one is that even though a weight-$t$ operator $A$ acts non-trivially on only $t$ qubits, it can act arbitrarily on those $t$ qubits. In particular, it does not need to be a tensor product of errors on the separate qubits — it can entangle them however it likes. For instance, $\mathrm{CNOT} \otimes I$ is a weight 2 operator acting on 3 qubits.
Technical point number two is that a $t$-qubit error $B$ can be the sum of terms acting on different sets of $t$ qubits. This naturally means that more than $t$ qubits will be altered under the action of this error, but it turns out that they are altered in a way that is no more harmful than if we had a channel with a possibility of altering each set of $t$ qubits as separate Kraus operators. This point will be discussed at greater length in section 2.4.3, once we have a real quantum error-correcting code to examine.
We can also define t-qubit erasure errors. For instance, imagine we have many qubits each stored as the
polarization of a photon, and each one undergoes a slight amount of photon loss.
Definition 1.10. A t-qubit erasure error is a weight-$t$ operator which is the tensor product of erasure errors on the qubits in its support. A t-qubit erasure channel is a quantum channel with a Kraus representation where all Kraus operators are $s$-qubit erasure errors, with $s \le t$. $s$ can be different for different Kraus operators.
Note that here I am requiring that each Kraus operator has a specific set of qubits which can be erased,
not a superposition of di↵erent sets of qubits. This reflects the idea that an erasure is in some sense a
classical event, because the set of qubits that were erased can be measured.
Theorem 1.1. Let $\mathcal{I}$ be the 1-qubit identity channel and $\mathcal{E} = \bigotimes_{i=1}^n \mathcal{E}_i$ be an $n$-qubit independent channel, with $\|\mathcal{E}_i - \mathcal{I}\|_\diamond < \epsilon \le \frac{t+1}{n-t-1}$, and $\epsilon \le 1/3$ as well. Then there is a $t$-qubit error channel $\tilde{\mathcal{E}}$ such that
$$\|\mathcal{E} - \tilde{\mathcal{E}}\|_\diamond < 5\binom{n}{t+1}\left[(4e+2)\epsilon\right]^{t+1}. \qquad (1.30)$$
The significance of the theorem is that if we have a quantum error-correcting code which is designed to correct $t$-qubit error channels, it will automatically also correct independent channels where the single-qubit tensor factors are sufficiently close to the identity. Actually, that's not completely true: While the theorem does show that, this is not really the significance of the theorem, since there is a much easier proof (which we will see in chapter 2) that a quantum error-correcting code that corrects $t$-qubit errors also corrects small independent error channels. Rather, this theorem provides a motivation for thinking about $t$-qubit error channels, which might otherwise seem rather bizarre.
When $\|\mathcal{E}_i - \mathcal{I}\|_\diamond < \epsilon$, we are getting about an $\epsilon$ chance of error per qubit, so with $n$ qubits, we expect around $\epsilon n$ errors. Therefore, we would not expect to be able to approximate $\mathcal{E}$ well by a $t$-qubit error map unless $t \gtrsim \epsilon n$, which is reflected in the bound on $\epsilon$. The other detailed constants in the theorem shouldn't be taken too seriously, as the proof could likely be tightened considerably to get better constants. But if you do insist on taking those constants seriously, you might find, for example, that when $n = 5$ and $t = 1$, you would get $\|\mathcal{E} - \tilde{\mathcal{E}}\|_\diamond \lesssim 8286\, \epsilon^2$, which only gives a non-trivial bound when $\epsilon$ is less than about 0.01.
Two elements of equation (1.30) that are significant are the combinatorial factor $\binom{n}{t+1}$, which reflects the number of $(t+1)$-qubit subsets of the $n$ qubits, and the exponent $t+1$ for $\epsilon$, which tells us that the closeness of the approximation improves exponentially as we allow more qubits to have errors. In the limit of large $n$ and any constant ratio $t/n$, the combination of the combinatorial factor and the exponent means that there will be a threshold value of $\epsilon$ below which we get a good approximation for all large $n$ (and indeed, a better one as $n$ gets larger).
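To see the threshold behavior numerically (a sketch; the constants come straight from equation (1.30), and the ratio $t/n = 1/4$ is an arbitrary choice):

    from math import comb, e

    def bound(n, t, eps):
        # Right-hand side of equation (1.30)
        return 5 * comb(n, t + 1) * ((4 * e + 2) * eps)**(t + 1)

    for n in [20, 40, 80, 160]:
        t = n // 4
        print(n, bound(n, t, 1e-3), bound(n, t, 1e-1))
    # At eps = 1e-3 the bound shrinks rapidly as n grows; at eps = 1e-1 it
    # blows up, so the threshold value of eps for this ratio lies in between.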
Proof. We begin with a lemma on sums of error probabilities or amplitudes that will also be helpful later.

Lemma 1.2. If $0 < t < n$, then

a) For any $0 \le \epsilon \le 1$, $\sum_{j=t+1}^n \binom{n}{j}\epsilon^j(1-\epsilon)^{n-j} \le \binom{n}{t+1}\epsilon^{t+1}$.

b) When $0 \le \epsilon \le \frac{t+1}{n-t-1}$, then $\sum_{j=t+1}^n \binom{n}{j}\epsilon^j \le \binom{n}{t+1}(e\epsilon)^{t+1}$.
Proof of lemma. There are a number of ways to prove part a. One straightforward method is to interpret $\epsilon$ as a probability of some event (which is the main application we will have for this lemma). Then the sum is the probability that the event occurs at least $t+1$ times in $n$ independent trials. We can upper bound
this probability by considering each subset of $t+1$ trials. The total probability of having the event occur in all trials in the subset, without regard to what happens on the other $n-t-1$ trials, is $\epsilon^{t+1}$. There are $\binom{n}{t+1}$ subsets of size $t+1$, so by the union bound, the total probability of having some set of $t+1$ trials with the event is at most $\binom{n}{t+1}\epsilon^{t+1}$. Whenever the event occurs $j > t+1$ times, we have over-counted, since we included that probability as part of all $\binom{j}{t+1}$ sets of size $t+1$ which had the event.
For part b, note that
$$(1-\epsilon)^{n-t-1} \ge \left(1 - \frac{t+1}{n-t-1}\right)^{n-t-1} \ge e^{-(t+1)}. \qquad (1.31)$$
Then
$$\sum_{j=t+1}^n \binom{n}{j}\epsilon^j = \sum_{j=t+1}^n \binom{n}{j}\epsilon^j (1-\epsilon)^{n-j}\, \frac{1}{(1-\epsilon)^{n-j}} \qquad (1.32)$$
$$\le \sum_{j=t+1}^n \binom{n}{j}\epsilon^j (1-\epsilon)^{n-j}\, \frac{1}{(1-\epsilon)^{n-t-1}} \qquad (1.33)$$
$$\le \binom{n}{t+1}\epsilon^{t+1} e^{t+1}. \qquad (1.34)$$
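A numerical spot check of the lemma (a sketch; $n$, $t$, $\epsilon$ are arbitrary values satisfying the hypotheses):

    from math import comb, e

    n, t, eps = 30, 5, 0.05
    assert eps <= (t + 1) / (n - t - 1)    # hypothesis of part b

    lhs_a = sum(comb(n, j) * eps**j * (1 - eps)**(n - j)
                for j in range(t + 1, n + 1))
    assert lhs_a <= comb(n, t + 1) * eps**(t + 1)          # part a

    lhs_b = sum(comb(n, j) * eps**j for j in range(t + 1, n + 1))
    assert lhs_b <= comb(n, t + 1) * (e * eps)**(t + 1)    # part b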
As a warm-up to prove the theorem, let us consider the case when for all $i$, $\mathcal{E}_i$ has a Kraus operator $\sqrt{1-\epsilon}\, I$. In this case, we can say that $\mathcal{E}_i$ has probability $\epsilon$ of having an error (one of the other Kraus operators), and a probability of $1-\epsilon$ of having no error. Part a of lemma 1.2 applies, so the probability of having at least $t+1$ errors is at most $\binom{n}{t+1}\epsilon^{t+1}$. In this case, the $t$-qubit error map $\mathcal{F}$ has all combinations of up to $t$ "error" Kraus operators with the "good" Kraus operator $\sqrt{1-\epsilon}\, I$ on the other qubits. The map $\mathcal{F}$ is completely positive, but is not trace preserving, since we have discarded the Kraus operators with more than $t$ errors.
For the general case, first we need a better characterization of single-qubit channels which are close to the identity.

Lemma 1.3. If $\mathcal{E}$ is a quantum channel from $\mathcal{H}_D$ to $\mathcal{H}_D$ satisfying $\|\mathcal{E} - \mathcal{I}\|_\diamond < \epsilon \le 1/3$, then $\mathcal{E}$ has a Kraus representation $\mathcal{E}(\rho) = \sum_k A_k \rho A_k^\dagger$ such that $\|A_0 - I\|_\infty < \sqrt{2D}\,(\epsilon + \epsilon^2) + (\epsilon/2 + \epsilon^2)$ and $\sum_{k \ne 0} \|A_k\|_\infty^2 < D\epsilon(1/2 + \epsilon)$.
Proof of lemma. Consider the channel in the Choi-Jamiolkowski representation: let $|\Phi^+\rangle = \frac{1}{\sqrt{D}}\sum_a |a\rangle|a\rangle$ be the maximally entangled state, and let $E = (\mathcal{I} \otimes \mathcal{E})(|\Phi^+\rangle\langle\Phi^+|)$. Since $\|\mathcal{E} - \mathcal{I}\|_\diamond < \epsilon$, we have $\|E - |\Phi^+\rangle\langle\Phi^+|\|_1 < \epsilon$.

Now, $|\Phi^+\rangle\langle\Phi^+|$ has one eigenvalue $+1$ and the remaining eigenvalues 0. Let us also write $E$ in terms of an eigenbasis,
$$E = \sum_{i=0}^{D^2-1} \lambda_i |\phi_i\rangle\langle\phi_i|. \qquad (1.37)$$
$E$ is positive and has trace 1, so $\lambda_i \ge 0$ and $\sum_i \lambda_i = 1$. Now consider
$$|\langle\phi_i|\Phi^+\rangle\langle\Phi^+|\phi_j\rangle| = |\langle\phi_i|\, \bigl(E - |\Phi^+\rangle\langle\Phi^+|\bigr)\, |\phi_j\rangle| < \epsilon \qquad (1.38)$$
for $i \ne j$. In addition,
$$\langle\Phi^+|\, E\, |\Phi^+\rangle = \sum_{i=0}^{D^2-1} \lambda_i |\langle\Phi^+|\phi_i\rangle|^2 > 1 - \epsilon/2. \qquad (1.39)$$
We can choose the phases of the eigenstates $|\phi_i\rangle$ to ensure that $\langle\Phi^+|\phi_i\rangle$ is real and non-negative. Letting $a_i = \langle\Phi^+|\phi_i\rangle$, we have $a_i \ge 0$, $\sum_i \lambda_i a_i^2 > 1 - \epsilon/2$, and $a_i a_j < \epsilon$ for $i \ne j$. Assume without loss of generality that $a_0$ is the largest of the $a_i$'s. Then $\sum_i \lambda_i a_i^2 \le \sum_i \lambda_i a_0^2 = a_0^2$, so
$$a_0 > \sqrt{1 - \epsilon/2} \ge 1 - \epsilon/2. \qquad (1.40)$$
Thus
$$a_i < \epsilon/a_0 < \frac{\epsilon}{1 - \epsilon/2} \qquad (1.41)$$
for $i \ne 0$. Then
$$\sum_{i=0}^{D^2-1} \lambda_i a_i^2 \le \lambda_0 + \frac{\epsilon}{1 - \epsilon/2} \sum_{i=1}^{D^2-1} \lambda_i \qquad (1.42)$$
$$= \lambda_0 + (1 - \lambda_0)\, \frac{\epsilon}{1 - \epsilon/2} \qquad (1.43)$$
$$= \frac{\epsilon + (1 - 3\epsilon/2)\, \lambda_0}{1 - \epsilon/2}, \qquad (1.44)$$
since $\sum_i \lambda_i = 1$. Therefore,
$$\lambda_0 \ge \frac{(1 - \epsilon/2)^2 - \epsilon}{1 - 3\epsilon/2} \ge 1 - \epsilon/2 - \epsilon^2 \qquad (1.45)$$
(assuming $\epsilon \le 1/3$), which means
$$\sum_{i \ne 0} \lambda_i = 1 - \lambda_0 \le \epsilon/2 + \epsilon^2, \qquad (1.46)$$
so in particular $\lambda_i \le \epsilon/2 + \epsilon^2$ for each $i \ne 0$.
We have now bounded all the terms we need, but we'd like a tighter bound on $a_0$. We can do that by going back and plugging in the bound on $\sum_{i \ne 0} \lambda_i$:
$$\bigl\| |\phi_0\rangle\langle\phi_0| - |\Phi^+\rangle\langle\Phi^+| \bigr\|_1 \le (1 - \lambda_0) + \bigl\| \lambda_0 |\phi_0\rangle\langle\phi_0| - |\Phi^+\rangle\langle\Phi^+| \bigr\|_1 \qquad (1.47)$$
$$\le (1 - \lambda_0) + \Bigl\| \sum_{i \ne 0} \lambda_i |\phi_i\rangle\langle\phi_i| \Bigr\|_1 + \bigl\| E - |\Phi^+\rangle\langle\Phi^+| \bigr\|_1 \qquad (1.48)$$
$$\le 2\epsilon + 2\epsilon^2. \qquad (1.49)$$
But the 1-norm distance between two pure states is just given by
$$\bigl\| |\phi_0\rangle\langle\phi_0| - |\Phi^+\rangle\langle\Phi^+| \bigr\|_1 = 2\sqrt{1 - |\langle\phi_0|\Phi^+\rangle|^2}, \qquad (1.50)$$
meaning
$$\sqrt{1 - |\langle\phi_0|\Phi^+\rangle|} \le \frac{\epsilon + \epsilon^2}{\sqrt{1 + |\langle\phi_0|\Phi^+\rangle|}} \le \epsilon + \epsilon^2. \qquad (1.51)$$
Thus, in the Choi-Jamiolkowski isomorphism form of the channel, the state has one large eigenvalue, for which the eigenstate is close to the maximally-entangled state, and the other eigenvalues are all small, with the eigenstates far from the maximally entangled state $|\Phi^+\rangle$. (Of course, they could be close to other maximally entangled states.) We can recover a set of Kraus operators for the channel by letting
$$A_k |a\rangle = \sqrt{D\lambda_k}\, (\langle a| \otimes I)\, |\phi_k\rangle \qquad (1.52)$$
for basis states $|a\rangle$, extended linearly to the full Hilbert space $\mathcal{H}_D$. (Recall that $|\phi_i\rangle$ is a state in $\mathcal{H}_D \otimes \mathcal{H}_D$, so the right-hand side of equation (1.52) is in $\mathcal{H}_D$.) Then $\|A_k\|_\infty^2 \le D\lambda_k$, so $\sum_{k \ne 0} \|A_k\|_\infty^2 \le D\epsilon(1/2 + \epsilon)$, as desired.
To get the bound on $A_0$, note that $A_0 |\psi^*\rangle = \sqrt{D\lambda_0}\, (\langle\psi| \otimes I)\, |\phi_0\rangle$, where $|\psi^*\rangle$ is the complex conjugate of $|\psi\rangle$ in the basis $\{|a\rangle\}$. Furthermore, $|\psi^*\rangle = \sqrt{D}\, (\langle\psi| \otimes I)\, |\Phi^+\rangle$. Then
$$\|A_0 - \sqrt{\lambda_0}\, I\|_\infty = \max_{|\psi\rangle} \bigl| A_0 |\psi\rangle - \sqrt{\lambda_0}\, |\psi\rangle \bigr| \qquad (1.53)$$
$$= \max_{|\psi\rangle} \sqrt{D\lambda_0}\, \bigl| (\langle\psi^*| \otimes I)\bigl( |\phi_0\rangle - |\Phi^+\rangle \bigr) \bigr| \qquad (1.54)$$
$$\le \sqrt{D\lambda_0}\, \bigl| |\phi_0\rangle - |\Phi^+\rangle \bigr| \qquad (1.55)$$
$$= \sqrt{D\lambda_0}\, \sqrt{2 - 2\,\mathrm{Re}\langle\Phi^+|\phi_0\rangle} \qquad (1.56)$$
$$\le \sqrt{2D}\, (\epsilon + \epsilon^2), \qquad (1.57)$$
applying equation (1.51) in the last line, and recalling we have chosen $\langle\Phi^+|\phi_0\rangle$ to be real. Thus,
$$\|A_0 - I\|_\infty \le \sqrt{2D}\, (\epsilon + \epsilon^2) + (\epsilon/2 + \epsilon^2). \qquad (1.58)$$
For the case of qubits, applying $\epsilon \le 1/3$, the lemma gives $\|A_0 - I\|_\infty \le 7\epsilon/2$ and $\sum_{k \ne 0} \|A_k\|_\infty^2 \le 5\epsilon/3$. We will round these to $\|A_0 - I\|_\infty < 4\epsilon$ and $\sum_{k \ne 0} \|A_k\|_\infty^2 \le 2\epsilon$.
Given lemma 1.3, we can use a similar approach for the general case as we did for the warm-up, which was essentially classical. Channel $\mathcal{E}_i$ has Kraus operators $A^i_k$, with $A^i_0$ close to the identity and $A^i_k$ small for $k \ne 0$. The $n$-qubit independent channel $\mathcal{E}$ has Kraus operators which are all possible tensor products $\bigotimes_i A^i_{k_i}$. Let $\mathcal{F}$ be the map whose Kraus operators are all tensor products $\bigotimes_i A^i_{k_i}$ with at most $t$ values of $i$ for which $k_i \ne 0$. Then
$$\|\mathcal{F} - \mathcal{E}\|_\diamond \le \sum_{r=t+1}^n \sum_{|S|=r} \prod_{i \in S} \sum_{k_i \ne 0} \|A^i_{k_i}\|_\infty^2. \qquad (1.59)$$
The sum over $r$ represents the number of values of $i$ for which $k_i \ne 0$, and $S$ is the set of indices for which $k_i \ne 0$. Since $\sum_{k_i \ne 0} \|A^i_{k_i}\|_\infty^2 \le 2\epsilon$, we get
$$\|\mathcal{F} - \mathcal{E}\|_\diamond \le \sum_{r=t+1}^n \binom{n}{r} (2\epsilon)^r \le \binom{n}{t+1} (2e\epsilon)^{t+1} \qquad (1.60)$$
by lemma 1.2.
Now $\mathcal{F}$ is not yet a $t$-qubit error map because the $A^i_0$ terms include some errors. However, the $A^i_0$ terms are all near $I$, so we can expand them as $A^i_0 = I + A^i$, with $\|A^i\|_\infty < 4\epsilon$. Given a Kraus operator for $\mathcal{F}$, $A_{\{k_i\}} = \bigotimes_i A^i_{k_i}$ with $r$ indices $k_i \ne 0$, expand all the $A^i_0$'s in this way and form $A'_{\{k_i\}}$ by discarding all terms in the expansion with more than $t - r$ $A^i$ factors. The resulting Kraus operators are composed of sums of terms with weight at most $t$, of which $r$ non-identity factors come from $k_i \ne 0$ terms, and the remainder come from $A^i$ components of $k_i = 0$ factors.
$$\|A'_{\{k_i\}} - A_{\{k_i\}}\|_\infty \le \left( \prod_{j | k_j \ne 0} \|A_{k_j}\|_\infty \right) \sum_{s=t+1-r}^{n-r} \sum_{|S|=s} \prod_{i \in S} \|A^i\|_\infty \qquad (1.61)$$
$$\le \left( \prod_{j | k_j \ne 0} \|A_{k_j}\|_\infty \right) \sum_{s=t+1-r}^{n-r} \binom{n-r}{s} (4\epsilon)^s \qquad (1.62)$$
$$\le \binom{n-r}{t+1-r} (4e\epsilon)^{t+1-r} \left( \prod_{j | k_j \ne 0} \|A_{k_j}\|_\infty \right). \qquad (1.63)$$
Let $\rho$ be an arbitrary pure state, possibly entangled between $\mathcal{H}_D$ and a reference system. Then
$$\|A'_{\{k_i\}} \rho (A'_{\{k_i\}})^\dagger - A_{\{k_i\}} \rho A_{\{k_i\}}^\dagger\|_1 \le \|(A'_{\{k_i\}} - A_{\{k_i\}})\, \rho\, (A'_{\{k_i\}})^\dagger\|_1 + \|A_{\{k_i\}}\, \rho\, ((A'_{\{k_i\}})^\dagger - A_{\{k_i\}}^\dagger)\|_1 \qquad (1.64)$$
$$\le 2 \left( \prod_{j | k_j \ne 0} \|A_{k_j}\|_\infty \right) \|A'_{\{k_i\}} - A_{\{k_i\}}\|_\infty \qquad (1.65)$$
$$\le 2 \binom{n-r}{t+1-r} (4e\epsilon)^{t+1-r} \left( \prod_{j | k_j \ne 0} \|A_{k_j}\|_\infty^2 \right). \qquad (1.66)$$
I have omitted the $\otimes I$ terms affecting the reference system in the first line; the equation is complicated enough as is. In the second line, we have used the property that $\|A|\psi\rangle\| \le \|A\|_\infty$ and bounded $\|A_{\{k_i\}}\|_\infty$ and $\|A'_{\{k_i\}}\|_\infty$ by the norm of just the factors with $k_j \ne 0$. If we sum over all $\{k_i\}$ with the same locations for which $k_j \ne 0$, we get
$$\sum \|A'_{\{k_i\}} \rho (A'_{\{k_i\}})^\dagger - A_{\{k_i\}} \rho A_{\{k_i\}}^\dagger\|_1 \le 2 \binom{n-r}{t+1-r} (4e\epsilon)^{t+1-r} (2\epsilon)^r. \qquad (1.67)$$
Finally, let $\mathcal{G}$ be the linear map with Kraus operators $A'_{\{k_i\}}$. In $\mathcal{G}$, we have eliminated all Kraus operators with more than $t$ errors. Then we get a bound on $\|\mathcal{G} - \mathcal{F}\|_\diamond$ by applying both maps to $\rho$ and considering the 1-norm of the resulting states, which involves summing equation (1.66) over all possible values of $\{k_i\}$ with at most $t$ indices $k_j \ne 0$ (those with more have already been excluded from $\mathcal{F}$). We get
$$\|\mathcal{G} - \mathcal{F}\|_\diamond \le 2 \sum_{r=0}^t \binom{n}{r} \binom{n-r}{t+1-r} (4e\epsilon)^{t+1-r} (2\epsilon)^r \qquad (1.68)$$
$$= 2 \sum_{r=0}^t \binom{n}{t+1} \binom{t+1}{r} (4e\epsilon)^{t+1-r} (2\epsilon)^r \qquad (1.69)$$
$$\le 2 \binom{n}{t+1} \left[(4e+2)\epsilon\right]^{t+1}. \qquad (1.70)$$
There is one final step needed. It is possible that $\mathcal{G}$ is no longer trace non-increasing, since it could be that $\|A'_{\{k_i\}} \rho (A'_{\{k_i\}})^\dagger\|_1 > \|A_{\{k_i\}} \rho (A_{\{k_i\}})^\dagger\|_1$, which could result in $\operatorname{tr} \mathcal{G}(\rho) > 1$. We should therefore scale $\mathcal{G}$ down to $C\mathcal{G}$, for an appropriately chosen constant $C$, to get a map that is guaranteed to be a legitimate piece of a quantum channel. $\mathcal{F}$ is trace non-increasing, though, so
$$\operatorname{tr} \mathcal{G}(\rho) \le 1 + \|\mathcal{G} - \mathcal{F}\|_\diamond \le 1 + 2\binom{n}{t+1}\left[(4e+2)\epsilon\right]^{t+1} = 1/C. \qquad (1.72)$$
Then
$$\|C\mathcal{G} - \mathcal{E}\|_\diamond \le 3\binom{n}{t+1}\left[(4e+2)\epsilon\right]^{t+1} + 1 - C \qquad (1.73)$$
$$\le 5\binom{n}{t+1}\left[(4e+2)\epsilon\right]^{t+1}. \qquad (1.74)$$
1.4 A Peek Ahead: Errors During Computation
In part II, we’ll consider more general types of errors. In particular, quantum gates will be able to go wrong
in various ways, and errors will occur multiple times during a computation. As noted above, the quantum
channel picture is no longer completely general then, since the noise might be non-Markovian. Mostly we’ll
stick to Markovian noise, but even then matters are much more complicated. Since our control is no longer
reliable, we’ll have to deal with errors occurring even while we’re trying to fix them. However, there are
various subtler difficulties to contend with as well.
For one thing, we don’t know when an error occurs, so we can’t assume we do error correction immediately
after each error. In particular, an error might occur right before a gate that we had intended for some other
purpose. Then even if the gate itself works perfectly, it can cause the error to propagate, infecting an
additional qubit with the same error. In addition, the effect of the gate can change the type of error. For instance, a Z error that occurs before a Hadamard gate will change into an X error after the gate. This
phenomenon makes it much more difficult to take advantage of information we know about the errors. For
instance, suppose the noise source is largely dephasing noise. We might want to use a code that is particularly
good at correcting dephasing noise, but if we use Hadamard gates in our circuit, some of the Z errors that
occur will have become X errors by the time we get around to correcting them, and our code won’t work
anywhere near as well as expected. A particularly insidious form of this phenomenon occurs when errors
happen during the implementation of a gate, which, after all, should take a non-zero time. Even if the
completed gate does not change the type of error, depending on how the gate is being implemented, the
partial gate may in fact alter the structure of the noise.
We’ll return to all these issues in part II, and discuss how to design fault-tolerant quantum circuits that
allow reliable error correction and computation on encoded states despite the complications.
Chapter 2
Now we are ready to start designing quantum error-correcting codes. A natural place to start for inspiration
is to look at the theory of classical error-correcting codes, and indeed it can give us some guidance. However,
we'll rapidly see that there are some major differences between classical and quantum error correction.
The simplest classical error-correcting code is the repetition code:
$$0 \mapsto 000 \qquad (2.1)$$
$$1 \mapsto 111. \qquad (2.2)$$
If we send this 3-bit encoding through a 1-bit error classical channel, it is clear that we will be able to correct
for any error that occurs on a single bit. If all three bits are the same, we know there hasn't been an error, while if one of the three is different, for instance 010, we know that the one that's different is the one that's wrong. By enforcing a boring conformity among the bits, we can recover the original state. Recall that if we use an independent channel instead of a 1-bit error channel, then there is some chance of 2 or 3 errors, which would fool us. If there are two errors, the one bit that we think is wrong is actually the only bit that's correct, and our well-meaning attempt to fix it will actually complete the error, making all three bits wrong. Luckily, the chance of two errors occurring is only $O(p^2)$ when the probability of an error on a single bit is $p$ (see section 1.3.2). When $p$ is small, the chance of the encoded state ending up wrong is less than the chance that an unencoded bit would be corrupted passing through the channel.
Throughout this chapter, we'll assume we have a $t$-qubit error channel. This is justified by theorem 1.1, which tells us that if we actually have an independent channel, it is very close to a $t$-qubit error channel, at least when the error rate per qubit is low. We will actually reprove a version of theorem 1.1 with an easier proof and better constants specifically applicable to quantum error-correcting codes.
To make a quantum error-correcting code, we might want to imitate the classical repetition code, but
that instantly runs into a few problems. We'll need to find a somewhat different way to protect our quantum information, one that adds redundancy without repeating the state.
$$|\psi\rangle \mapsto |\psi\rangle|\psi\rangle|\psi\rangle. \qquad (2.3)$$
If you’ve had much quantum information experience, you’ll immediately see the problem with this encoding:
it is forbidden by the No-Cloning Theorem.
Table 2.1: Difficulties in designing a quantum error-correcting code:

1. The no-cloning theorem prohibits repeating a quantum state.
2. Measuring the data while determining the error will destroy superpositions.
3. We must correct Y and Z errors in addition to X errors.
4. We must correct an infinite set of unitaries and also channels which decohere the state.
Theorem 2.1 (No-Cloning Theorem). There is no quantum operation which maps an arbitrary state $|\psi\rangle$ to $|\psi\rangle|\psi\rangle$.
$$|\psi\rangle|\psi\rangle = \alpha^2|00\rangle + \beta^2|11\rangle + \alpha\beta(|01\rangle + |10\rangle). \qquad (2.7)$$
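The failure of linearity can be seen numerically (a sketch): define a "cloner" by its action on basis states (for qubits this is just a CNOT with a fresh $|0\rangle$ target) and check that it fails on superpositions.

    import numpy as np

    # |a>|0> -> |a>|a> on basis states: this is CNOT
    CNOT = np.array([[1, 0, 0, 0],
                     [0, 1, 0, 0],
                     [0, 0, 0, 1],
                     [0, 0, 1, 0]], dtype=float)

    alpha, beta = np.sqrt(0.3), np.sqrt(0.7)
    psi = np.array([alpha, beta])
    zero = np.array([1.0, 0.0])

    got = CNOT @ np.kron(psi, zero)   # alpha|00> + beta|11>
    want = np.kron(psi, psi)          # what a true cloner would have to output

    print(np.allclose(got, want))     # False: cloning fails on superpositions
    print(got @ want)                 # overlap alpha^3 + beta^3 < 1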
Another problem has to do with how we correct errors for the repetition code. Given a 3-bit state coming
out of the channel, we’d like to look at it and determine which of the three bits is di↵erent from the other
two. However, when dealing with quantum states, looks really can kill. Or if not kill, at least decohere,
destroying any superposition we have. And without the ability to be in a superposition, a qubit is no better
than a classical bit. Somehow, to make a quantum error-correcting code, we’ll need some way to correct
errors without looking too closely at what we’re doing.
Even if we can handle these problems, it’s clear that quantum error correction will be more complicated
than classical error correction. For a classical channel, basically all that can happen is a bit flip error, but
quantum codes are going to need to handle phase flip errors as well, not to mention Y errors. While the
3-qubit repetition code is good at correcting bit flip errors, it doesn’t do anything to help correct phase flips.
Actually, it makes phase flip errors more common since there are now three qubits which could have phase
flip errors instead of only one.
In addition to X, Y, and Z, we will also have to handle an infinite set of unitaries, such as the $R_\theta$ errors.
Then there are more general channels, such as the dephasing channel or depolarizing channel, which turn
pure states into mixed states. How can we come up with a correction operation that will turn the state from
a mixed state back into a pure state?
At this point, the prospects for a quantum error-correcting code look bleak. The difficulties we’ve
identified so far are listed in table 2.1. But don’t worry — there are solutions to all of these problems. If
there weren’t, this book would be a lot thinner.
Figure 2.1: Encoding and syndrome measurement circuit for the 3-qubit bit flip correction code.
(In this book, I shall use the notation $X_2$ to indicate that the error $X$ is acting on qubit 2. The same notation applies for gates. However, sometimes I shall omit the subscripts and instead write out a tensor product explicitly; in this case, that would be $I \otimes X \otimes I$.)
As with the classical repetition code, the error can be identified by noticing that the middle qubit is different from the first and third qubits. The critical point is that this is true for both branches of the superposition. It is therefore possible to measure the fact that the second qubit is different without measuring whether we have an encoded zero or an encoded one. If we don't measure the data that's encoded in the code, we don't destroy an encoded superposition. This is how we resolve the second barrier: we can measure the error without measuring the information we're trying to preserve.
of the parity measurements, the output we get is equal to the parity: 0 for same (even parity), and 1 for
opposite (odd parity). Notice that the result does not depend at all on the encoded state, simply whether
there is a bit flip error on the qubits we are measuring. That means the measurement can be done without
disturbing a superposition of the encoded data.
Together, the measurement results for the two ancillas form a bit string known as the error syndrome. The error syndrome encapsulates all the information we have about the error. For instance, when the bit flip error is $X_2$, the error syndrome is 11 because both the first pair and the second pair of qubits are different. Similarly, error syndromes 10 and 01 correspond to the errors $X_1$ and $X_3$, respectively, and error syndrome 00 tells us that there is no error. Thus, every possible error for a one-qubit bit flip channel is accounted for. Once we know what the error is, we can simply correct it by performing another bit flip on the appropriate qubit.
Notice that the choice of error correction circuit and corresponding error syndrome is not unique. For instance, we could have measured the parity of the first and third qubits instead of measuring the parity of the second and third qubits. This would have given us the same information, but the correspondence between error syndromes and errors would have been shuffled. We could have measured the parities of all
three pairs of qubits, but in that case the error syndrome (which would be 3 bits long) would be redundant:
from any two error syndrome bits, we could deduce the third. It is not a coincidence that the number of
syndrome bits we need is equal to the number of extra qubits we added to the data in order to perform the
original encoding. This is a general property of quantum error-correcting codes, as discussed in chapter 3,
although there are certain cases where we don’t need all of the information encoded in the syndrome. (See
section 17.2 for a description of that case.)
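A sketch of this syndrome logic for the 3-qubit bit flip code (a classical simulation of the parity measurements; qubits are numbered 1 to 3 as in the text):

    import numpy as np

    def encode(alpha, beta):
        # alpha|000> + beta|111> as a length-8 state vector
        state = np.zeros(8, dtype=complex)
        state[0b000], state[0b111] = alpha, beta
        return state

    def apply_X(state, qubit):          # qubit in {1, 2, 3}, leftmost bit first
        out = np.zeros_like(state)
        for idx in range(8):
            out[idx ^ (1 << (3 - qubit))] = state[idx]
        return out

    def syndrome(state):
        # Parities of (qubit 1, qubit 2) and (qubit 2, qubit 3). These are
        # the same for every basis state with nonzero amplitude, so measuring
        # them does not disturb the encoded superposition.
        idx = next(i for i in range(8) if abs(state[i]) > 1e-12)
        b = [(idx >> 2) & 1, (idx >> 1) & 1, idx & 1]
        return (b[0] ^ b[1], b[1] ^ b[2])

    psi = encode(np.sqrt(0.3), np.sqrt(0.7))
    print(syndrome(psi))                        # (0, 0): no error
    for q in (1, 2, 3):
        print(q, syndrome(apply_X(psi, q)))     # X1 -> (1, 0), X2 -> (1, 1),
                                                # X3 -> (0, 1)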
Figure 2.2: Encoding and syndrome measurement circuit for the 3-qubit phase correction code.
A little consideration will show you that this argument applies to Y errors as well. The X and Z error
correction procedures are essentially independent, and the argument that we can identify the block of three
qubits with a Z error applies equally well if there is an X error on one of the nine qubits. Thus, the nine-
qubit code can correct for both an X and a Z error. In particular, it can correct for a single-qubit Y error,
which corresponds to having an X and a Z on the same qubit. In the set of possible errors, we only allowed
single-qubit errors, but actually the code will still work if you extend the set of possible errors to allow one
X error and one Z error on any pair of qubits.
Now suppose the usual nine-qubit code has experienced an error $(R_\theta)_i$ instead of the usual $X$, $Y$, or $Z$ error, but imagine that we do not know that and use the usual error correction circuit for the nine-qubit code. We can figure out what happens to the state by again applying the linearity of quantum mechanics. We know what happens to $I|\psi\rangle$ and $Z_i|\psi\rangle$ under the error correction circuit: First, we interact with some
ancilla qubits and determine the error syndrome corresponding to these errors, then we measure the ancilla
qubits. If $|I\rangle_{\rm syn}$ and $|Z_i\rangle_{\rm syn}$ are the error syndromes corresponding to the errors $I$ and $Z_i$, then writing $R_\theta = \cos(\theta)\, I - i\sin(\theta)\, Z$, when we interact with the ancillas, we get the following:
$$\bigl(\cos(\theta)\, I|\psi\rangle - i\sin(\theta)\, Z_i|\psi\rangle\bigr) \otimes |0\rangle_{\rm syn} \mapsto \cos(\theta)\, I|\psi\rangle \otimes |I\rangle_{\rm syn} - i\sin(\theta)\, Z_i|\psi\rangle \otimes |Z_i\rangle_{\rm syn}.$$
Right now, we have an entangled state between the code qubits and the ancilla qubits. When we measure
the ancilla register, we'll collapse the superposition, getting one of two results: with probability $\cos^2\theta$, the state $I|\psi\rangle \otimes |I\rangle_{\rm syn}$, or with probability $\sin^2\theta$, the state $Z_i|\psi\rangle \otimes |Z_i\rangle_{\rm syn}$.
Something miraculous has happened: now we have either an I error (i.e., no error at all) or a Z error, and
the outcome of the error syndrome tells us which and where the error is. We can then correct it in the usual
way. Somehow, by pretending that we had an I, X, Y , or Z error, we made it actually true that the error
was one of those. It’s a triumph of wishful thinking.
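Here is a sketch of the same discretization effect in miniature, using the 3-qubit bit flip code (not the nine-qubit code) with a partial rotation error $\exp(-i\theta X)$ on the middle qubit; measuring the parities collapses the state onto either no error or a full $X_2$ error:

    import numpy as np

    theta = 0.4
    state = np.zeros(8, dtype=complex)
    state[0b000], state[0b111] = np.sqrt(0.3), np.sqrt(0.7)

    # exp(-i theta X) on the middle qubit = cos(theta) I - i sin(theta) X_2
    flipped = np.zeros(8, dtype=complex)
    for idx in range(8):
        flipped[idx ^ 0b010] = state[idx]
    state = np.cos(theta) * state - 1j * np.sin(theta) * flipped

    def parities(idx):
        b = [(idx >> 2) & 1, (idx >> 1) & 1, idx & 1]
        return (b[0] ^ b[1], b[1] ^ b[2])

    # Group amplitudes by syndrome: measuring projects onto one group
    outcomes = {}
    for idx in range(8):
        outcomes.setdefault(parities(idx), np.zeros(8, dtype=complex))
        outcomes[parities(idx)][idx] = state[idx]

    for synd, proj in outcomes.items():
        prob = float(np.linalg.norm(proj)**2)
        if prob > 1e-12:
            print(synd, round(prob, 4))
    # Only (0, 0) with probability cos^2(theta) (no error) and (1, 1) with
    # probability sin^2(theta) (an X on the middle qubit) occur. Each
    # projected state, renormalized, is exactly a codeword or X_2 times a
    # codeword: the measurement has discretized the continuous error.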
For the nine-qubit code, the same procedure works with arbitrary single-qubit unitaries, and actually
works for any Kraus operator Ak . Indeed, this is not a property of just the nine-qubit code, but of quantum
error-correcting codes in general: correcting I, X, Y , and Z errors is sufficient to correct general errors. At
this point, we could easily prove this for the nine-qubit code, but it is worthwhile to prove the generalization.
To do that, we will want to first define precisely what a quantum error-correcting code is.
2.4.2 Definition of a Quantum Error Correcting Code
Generally speaking, a quantum error-correcting code is a subspace of a larger Hilbert space. Typically, that
subspace has been chosen in some complicated entangled way in order to have special properties when errors
occur on states from the code space. Therefore, it is most sensible to define a quantum error-correcting code
together with the set of errors it corrects. Also, we’ll frequently want to consider the subspace as a Hilbert
space encoding some quantum data, so we’ll define the subspace as a map from a smaller Hilbert space into
the big Hilbert space.
Definition 2.1. A quantum error-correcting code $(U, \mathcal{E})$ (or QECC for short) is a partial isometry $U : \mathcal{H}_K \to \mathcal{H}_N$ together with a set of correctable errors $\mathcal{E}$ (consisting of linear maps $E : \mathcal{H}_N \to \mathcal{H}_M$) with the following property: there exists a quantum operation $\mathcal{D} : \mathcal{H}_M \to \mathcal{H}_K$ such that for all $E \in \mathcal{E}$ and all $|\psi\rangle \in \mathcal{H}_K$,
$$\mathcal{D}(E U |\psi\rangle\langle\psi| U^\dagger E^\dagger) = c(E, |\psi\rangle)\, |\psi\rangle\langle\psi|. \qquad (2.25)$$
$U$ is known as the encoding operation for the code (or the encoder), and $\mathcal{D}$ is the decoding map (or decoder). $\mathcal{D}$ is also known as the recovery operation. We say that the QECC corrects $E$ if $E \in \mathcal{E}$.
Sometimes, we don't consider a specific encoding map, and refer to $\mathrm{Image}(U)$ (more precisely known as the code space) as the QECC. A codeword is any state in the code space. Often, a QECC is mentioned without an explicit set of correctable errors, which should be determined by context. Sometimes the error set simply does not matter. When it does matter but is not specified, the set of correctable errors is often the set of all $t$-qubit errors for the largest possible $t$; except sometimes it is instead the set of the most likely errors in the system. $\mathcal{H}_K$ is sometimes referred to as the logical Hilbert space and $\mathcal{H}_N$ as the physical Hilbert space. If $\mathcal{E}$ is the set of all $t$-qubit errors, we say that $U$ (or $\mathrm{Image}(U)$) is a t-error correcting code, or that it corrects $t$ errors.
In the above definition (and elsewhere in the book), $\mathcal{H}_D$ is a Hilbert space of dimension $D$. A partial isometry is like a unitary in that it preserves inner products, but might not be invertible because it may map a Hilbert space to a Hilbert space of strictly larger dimension. Of course, $\mathcal{D}$ actually maps matrices on $\mathcal{H}_M$ to matrices on $\mathcal{H}_K$. Since $E$ can be a general linear map, not necessarily unitary, $EU|\psi\rangle$ might not be normalized properly, which is why we need the $c(E, |\psi\rangle)$ factor in equation (2.25). The definition explicitly allows $c(E, |\psi\rangle)$ to depend on $|\psi\rangle$, but it turns out that it does not (see section 2.5). It does depend on $E$, however.
To prove this, and in many other contexts, it is helpful to treat the decoding map as a unitary. This can be done via the Stinespring dilation, by purifying $\mathcal{D}$. To do so, we must add an ancilla register. Let the input ancilla to $\mathcal{D}$ be of dimension $D$ and let the output ancilla have dimension $D'$. Then $\mathcal{H}_M \otimes \mathcal{H}_D \cong \mathcal{H}_K \otimes \mathcal{H}_{D'}$. $\mathcal{H}_{D'}$ plays the role of the error syndrome. Call the purification $V : \mathcal{H}_M \otimes \mathcal{H}_D \to \mathcal{H}_K \otimes \mathcal{H}_{D'}$. Now, writing $U|\psi\rangle = |\overline{\psi}\rangle$, when the QECC corrects $E \in \mathcal{E}$,
$$V(E|\overline{\psi}\rangle_N \otimes |0\rangle_D) = \sqrt{c(E, |\psi\rangle)}\, |\psi\rangle_K \otimes |A(E, |\psi\rangle)\rangle_{D'}. \qquad (2.26)$$
I have put subscripts on the kets to help you keep track of which Hilbert spaces they belong to. There might be a phase on the RHS, but it can be absorbed into the ancilla state $|A(E, |\psi\rangle)\rangle$.
Proposition 2.2. In definition 2.1, $c(E, |\psi\rangle)$ is independent of $|\psi\rangle$, as is $|A(E, |\psi\rangle)\rangle$ when we purify the decoder.

Proof. Consider two codewords $|\overline{\psi}\rangle$ and $|\overline{\phi}\rangle$. Imagine we purify the decoder to the unitary $V$, so
$$V(E|\overline{\psi}\rangle \otimes |0\rangle) = \sqrt{c(E, |\psi\rangle)}\, |\psi\rangle \otimes |A(E, |\psi\rangle)\rangle \qquad (2.27)$$
$$V(E|\overline{\phi}\rangle \otimes |0\rangle) = \sqrt{c(E, |\phi\rangle)}\, |\phi\rangle \otimes |A(E, |\phi\rangle)\rangle. \qquad (2.28)$$
Superpositions of $|\overline{\psi}\rangle$ and $|\overline{\phi}\rangle$ are also codewords, and $V$ acts on them linearly, so decoding can only produce the correctly decoded superposition if $|A(E, |\psi\rangle)\rangle = |A(E, |\phi\rangle)\rangle = |A(E)\rangle$ (or the discarded ancilla will be entangled with the output state) and $c(E, |\psi\rangle) = c(E, |\phi\rangle) = c(E)$ (or the decoded state for the superposition will be wrong). The conceptual point here is that in order to preserve the coherent superposition in the decoder's output, there cannot be any information about the encoded state left in the ancilla, and the amplitudes (and thus norms) of different encoded states have to match, or the Hilbert space will get distorted.
Note that the set of correctable errors is not a unique invariant property of an encoding map or code space, which is why I included it as part of the definition. The same code space could be used to correct different sets of errors. For instance, the 3-qubit phase correction code can correct the set $\mathcal{E} = \{I, Z_1, Z_2, Z_3\}$, but it also corrects the set $\mathcal{E} = \{I, Z_1 Z_2, Z_1 Z_3, Z_2 Z_3\}$. In the former case, we interpret the non-zero error syndromes as caused by a single phase error, whereas in the latter case, we only consider two-qubit errors. Since $Z_1$ has the same error syndrome as $Z_2 Z_3$ but they correspond to different logical states, we have to make a choice between them and cannot include both in the set of correctable errors at the same time. That is, $\mathcal{E} = \{I, Z_1, Z_2 Z_3\}$ is not a possible set of correctable errors for the 3-qubit phase correction code.
There are many variations of the terms defined above. For instance, code space is sometimes coding space, code subspace, etc., and sometimes it is the encoded subspace. However, the encoded subspace also sometimes refers to $\mathcal{H}_K$, which can also be called the data or encoded data. As defined above, $\mathcal{D}$ incorporates both the error correction procedure and the "unencoding" process of returning the data to $\mathcal{H}_K$, but sometimes they are considered separately.
The decoder D incorporates whatever processing is necessary to correct and decode the state. For
instance, it may incorporate a measurement of the error syndrome, and application of a correction operation
conditional on the classical bits resulting from the syndrome measurement. The decoder D may not, however,
be unique. Its behavior is completely determined on the subspace spanned by E|ψ⟩, where E ∈ E and |ψ⟩
is a codeword, but outside of that subspace, D can act arbitrarily, and the distinctions between different
decoders can be important in a number of contexts, such as when studying the efficiency of the decoder, or
its behavior on errors outside E, or when considering fault tolerance.
Note that if we define a QECC just as a subspace instead of an encoding map, we haven’t lost much
information. In particular, the set of correctable errors is the same for all encoders:
Proposition 2.3. Suppose we have two different encoders U1, U2 : HK → HN with Image(U1) = Image(U2).
Then (U1, E) is a QECC iff (U2, E) is a QECC.
Proof. Because U1 and U2 are partial isometries with the same image, they can only differ by a unitary
V : HK → HK:

    U2 = U1 V.    (2.31)

If we use decoder D1 for encoder U1, then V† ∘ D1 will serve as the decoding map for U2: a corrupted encoding
of |ψ⟩ under U2 is also a corrupted encoding of V|ψ⟩ under U1, so D1 returns V|ψ⟩ (up to the factor c(E)),
and applying V† then recovers |ψ⟩.
In the definition of a QECC, we have let the dimensions K, M, and N of the various Hilbert spaces
involved be arbitrary, although N ≥ K or no QECC is possible. Nevertheless, there are some common
restrictions. Frequently we assume K = 2^k and N = 2^n, so there are k logical qubits (or encoded qubits)
and n physical qubits. Sometimes we work with q-dimensional qudits instead of qubits, so K = q^k and
N = q^n. Occasionally we have even more general situations, where K and N might use different bases or
not be powers at all. When there is a tensor factorization of the overall Hilbert space, but I don't want to
be too specific about whether the individual factors are qubits, qudits, or maybe even different sizes from
each other, I will refer to the registers, with each register being one tensor factor. A single error then affects
a single register, whatever its size.
Most often M = N , so errors map the physical Hilbert space to itself. There is little lost by assuming
this, since we can generally ignore any unused states in HN or HM without negative consequence. The main
time when it matters is when errors occur repeatedly on the state before we perform error correction, as
happens, for instance, in fault-tolerant quantum computation. In that case, you really need M = N , so that
you can sensibly apply the same error over and over again.
One special case worth mentioning is that of erasure errors. Recall that an erasure error formally maps
a qubit into a qutrit, so to treat erasure errors in the most precise way, we would take N = 2^n and M = 3^n.
However, most often we imagine that a “stop-leak” gate of some sort is performed before the decoder. The
stop-leak gate will map the third |?⟩ state of each qutrit to some state, perhaps a random one, of the
corresponding qubit. Then we can consider an erasure error as mapping a qubit to a qubit, but with some
additional classical information indicating that the qubit has undergone an error.
This is maybe a good place for an extremely important digression on the terminology of error correction.
Specifically, I want to discuss the use of hyphens. Wait, did I say “important?” I meant “unimportant.”
But I am going to discuss hyphens anyway. In English, hyphens are occasionally used in compound nouns
but not that frequently. Where they are used is to make compound adjectives. Thus, we have a hyphen in
“error-correcting code” and “fault-tolerant computation” but not in “error correction” and “fault tolerance,”
unless one of the component words happens to get split between lines. (Notably, though, there is no hyphen
for the “quantum” part, so “quantum error-correcting code.”) Certainly, many scientists are not native
speakers of English, so they have a good excuse for making this mistake. But you no longer have an excuse.
One difference from section 2.4.1 is that |sE⟩ and |sF⟩ don't need to be orthogonal. Also note that the c's
from definition 2.1 would be the squares of cE and cF.
By linearity,
Tracing out HD′, we find that the decoder map is of the desired form, with
Proof. The set of all errors with support on a given set of t qubits is the set of 2^t × 2^t matrices acting on
those qubits. Tensor products of I, X, Y, and Z acting on those t qubits form a basis for this set of matrices.
Therefore, an arbitrary weight t error can be written as the sum (with appropriate coefficients) of weight ≤ t
tensor products of I, X, Y, and Z, and an arbitrary t-qubit error can be written as a sum of weight ≤ t errors.
The corollary then follows from theorem 2.4.
As a consequence, we find that the nine-qubit code is a 1-error correcting code. Applying the definitions,
we find that it encodes one logical qubit into nine physical qubits.
which is a mixture of the pure states |ψ⟩ and Zi|ψ⟩. Now, we know that the full error correction procedure
corrects these two pure states:

    |ψ⟩ ↦ |ψ⟩|I⟩syn    (2.41)
    Zi|ψ⟩ ↦ |ψ⟩|Zi⟩syn    (2.42)

By the linearity of density matrices, that means that the final state after the dephasing channel followed by
error correction is:

    (1 − p)|ψ⟩⟨ψ| ⊗ |I⟩⟨I|syn + p|ψ⟩⟨ψ| ⊗ |Zi⟩⟨Zi|syn = |ψ⟩⟨ψ| ⊗ [(1 − p)|I⟩⟨I|syn + p|Zi⟩⟨Zi|syn].    (2.43)
That is, the decoded logical qubit ends up as a pure state, but the overall state is still mixed: the randomness
introduced by the error ends up in the error syndrome, or in the ancilla Hilbert space HD′ in the proof of
theorem 2.4.
Recall that a t-qubit error channel is one for which there exists a Kraus decomposition where each Kraus
operator is a sum of weight ≤ t linear operators. By theorem 2.4 and the argument above for the dephasing
channel, any code that corrects arbitrary t-qubit linear errors will therefore also correct arbitrary t-qubit
error channels.
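As a concrete illustration of this point, here is a small numerical sketch (Python with numpy; the code and all helper names are mine, purely for illustration). It encodes a logical qubit in the three-qubit phase correction code, applies the dephasing channel to qubit 1, and performs syndrome projection and correction; the decoded logical state comes out exactly pure, just as in equation (2.43).

    import numpy as np
    from functools import reduce

    I2 = np.eye(2)
    X = np.array([[0., 1.], [1., 0.]])
    Z = np.diag([1., -1.])
    kron = lambda *ops: reduce(np.kron, ops)

    # Code space: |0_L> = |+++>, |1_L> = |--->
    plus = np.array([1., 1.]) / np.sqrt(2)
    minus = np.array([1., -1.]) / np.sqrt(2)
    psi = 0.6 * kron(plus, plus, plus) + 0.8 * kron(minus, minus, minus)

    # Dephasing channel on qubit 1: rho -> (1-p) rho + p Z1 rho Z1
    p = 0.3
    Z1, Z2, Z3 = kron(Z, I2, I2), kron(I2, Z, I2), kron(I2, I2, Z)
    rho = (1 - p) * np.outer(psi, psi) + p * Z1 @ np.outer(psi, psi) @ Z1

    # Correction: project onto the syndrome subspaces of X(x)X(x)I and X(x)I(x)X,
    # then undo the Z error that syndrome points to.
    M1, M2 = kron(X, X, I2), kron(X, I2, X)
    correction = {(1, 1): np.eye(8), (-1, -1): Z1, (-1, 1): Z2, (1, -1): Z3}
    out = np.zeros((8, 8))
    for (s1, s2), C in correction.items():
        P = (np.eye(8) + s1 * M1) @ (np.eye(8) + s2 * M2) / 4   # syndrome projector
        out += C @ P @ rho @ P @ C

    print(np.real(psi @ out @ psi))   # 1.0: the decoded logical state is pure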
What about independent channels?
Theorem 2.6. Let I be the 1-qubit identity channel and E = ⊗_{i=1}^n Ei be an n-qubit independent channel,
with ‖Ei − I‖Σ < ε ≤ (t + 1)/n, and let U and D be the encoder and decoder for a QECC with n physical qubits
that corrects t-qubit errors. Then

    ‖D ∘ E ∘ U − I‖Σ < 2 \binom{n}{t+1} (eε)^{t+1}.    (2.44)
In other words, a QECC that can correct t-qubit errors also corrects small independent error channels
up to a very good approximation. A similar statement immediately follows from theorem 1.1. This should
thus be viewed as a specialization of theorem 1.1 with better constant factors and a much easier proof.
Proof. The strategy is straightforward: Expand E as a sum of a t-qubit error map (not normalized) and an
additional small term. The t-qubit error map is corrected by the code due to linearity, and then we just have
to bound the size of the additional term to prove the theorem.
Since ‖Ei − I‖Σ < ε, we can write Ei = I + ΔEi, with ‖ΔEi‖Σ < ε. Now,

    E = F + G,    (2.45)

where

    F = Σ_{r=0}^{t} Σ_{|S|=r} ⊗_{i∈S} ΔEi    (2.46)

    G = Σ_{r=t+1}^{n} Σ_{|S|=r} ⊗_{i∈S} ΔEi    (2.47)

are the sums over all tensor products of ≤ t and > t of the ΔEi's, respectively. (The tensor with I on other
qubits is implicit.)
The first term F is a sum of terms acting on at most t qubits. Each term in the sum is not a completely
positive map. In fact, ΔEi is not even positive. However, ΔEi is a difference of two completely positive maps,
and therefore

    F(ρ) = Σ_k αk Ak ρ Ak†,    (2.48)

where the αk coefficients can be either positive or negative real numbers and each Ak acts on at most t
qubits. By theorem 2.4 (and the linearity of density matrices, as for the dephasing channel example), the
QECC corrects F.
By proposition 2.2, the scaling factor c(E, |ψ⟩) in definition 2.1, the definition of a QECC, does not
depend on |ψ⟩, which implies that

    D ∘ F ∘ U = cI    (2.49)

for some constant c.
Meanwhile,

    ‖G‖Σ ≤ Σ_{r=t+1}^{n} Σ_{|S|=r} ∏_{i∈S} ‖ΔEi‖Σ    (2.50)
        ≤ Σ_{r=t+1}^{n} \binom{n}{r} ε^r    (2.51)
        ≤ \binom{n}{t+1} (eε)^{t+1} = δ    (2.52)

by lemma 1.2.
Now, E, D, and U are CPTP maps, so ‖D ∘ E ∘ U‖Σ = 1. Thus,

    1 − δ ≤ ‖D ∘ E ∘ U‖Σ − ‖D ∘ G ∘ U‖Σ ≤ ‖D ∘ F ∘ U‖Σ = ‖cI‖Σ = c ≤ ‖D ∘ E ∘ U‖Σ + ‖D ∘ G ∘ U‖Σ ≤ 1 + δ.    (2.53)

That is, |1 − c| ≤ δ. Therefore,

    ‖D ∘ E ∘ U − I‖Σ = ‖D ∘ F ∘ U + D ∘ G ∘ U − I‖Σ    (2.54)
                     = ‖(c − 1)I + D ∘ G ∘ U‖Σ    (2.55)
                     ≤ |c − 1| + ‖G‖Σ    (2.56)
                     ≤ 2δ.    (2.57)
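If you want to convince yourself numerically that the tail sum in equation (2.51) really is bounded by δ = \binom{n}{t+1}(eε)^{t+1}, a few lines of Python suffice (a quick check of my own, for suitably small ε):

    import math

    def tail(n, t, eps):     # the sum in equation (2.51)
        return sum(math.comb(n, r) * eps**r for r in range(t + 1, n + 1))

    def delta(n, t, eps):    # the bound in equation (2.52)
        return math.comb(n, t + 1) * (math.e * eps)**(t + 1)

    for n, t, eps in [(9, 1, 0.01), (25, 3, 0.02), (49, 5, 0.05)]:
        print(n, t, eps, tail(n, t, eps) <= delta(n, t, eps))   # True in each case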
them. We can use this same argument to give a general sufficient condition to have a QECC: Suppose that
Q ⊆ HN is the code space, and let E(Q) be the subspace formed by acting on Q with E. Then we want
that for any pair of distinct errors E1 ≠ E2 ∈ E, the subspace E1(Q) is orthogonal to E2(Q). Then we will be able to
make a measurement that identifies which subspace we are in and therefore identifies the error.
To be sure we have a QECC, we need a little bit more. After identifying the error, we need to be able to
reverse it. That is only possible if E|Q is a unitary map from Q to E(Q). E|Q is unitary iff ⟨ψ|E†E|φ⟩ = ⟨ψ|φ⟩
for all |ψ⟩, |φ⟩ ∈ Q. Thus, we get the sufficient condition that (Q, E) is a QECC if ∀|ψ⟩, |φ⟩ ∈ Q, ∀Ea, Eb ∈ E,

    ⟨ψ|Ea†Eb|φ⟩ = δab⟨ψ|φ⟩.    (2.58)

A code that satisfies this condition could be called a non-degenerate orthogonal code, although the term
“orthogonal code” is not widely used.
Theorem 2.7. (U, E) is a QECC iff, for all codewords |ψ⟩ and |φ⟩ and all Ea, Eb ∈ E,

    ⟨ψ|Ea†Eb|φ⟩ = Cab⟨ψ|φ⟩,    (2.59)

where the constant Cab may depend on Ea and Eb but not on |ψ⟩ or |φ⟩.
Proof. ⇐: Recalling theorem 2.4, we might as well pick a useful spanning set for E and restrict attention to
that spanning set. Taking the adjoint of equation (2.59) and putting in |φ⟩ = |ψ⟩, we find that Cab = C*ba,
i.e., the matrix C is Hermitian. Therefore Cab is diagonalizable, and by choosing an appropriate spanning
set {Fa} for E we can actually diagonalize Cab. (Note that it is not necessarily true that Fa ∈ E, but that
is not particularly relevant, it just means that the original set E is smaller than the maximal set of possible
errors the code can correct.)
We have

    ⟨ψ|Fa†Fb|φ⟩ = da δab ⟨ψ|φ⟩.    (2.60)
This is almost equation (2.58), but the coefficient da can be different from 1. (da is an eigenvalue of Cab,
so it must be real.) However, it is still true that the different subspaces Fa(Q) are orthogonal to each other,
so we can make a measurement that determines which error Fa we have. It is not true that Fa|Q must be
unitary, but if da is nonzero, Fa|Q can be written as a unitary followed by a uniform rescaling. The decoding
then gives the original state rescaled by da. This is allowed by definition 2.1.
If da is zero, then Fa|Q cannot be inverted, since there is no state left. Formally, we are still OK, since in
definition 2.1, we would just get that c(Fa, |ψ⟩) = 0. Physically, this means that the error Fa never occurs.
It has probability proportional to ⟨ψ|Fa†Fa|ψ⟩, which is 0.
Therefore, equation (2.59) gives sufficient conditions to have a QECC.
⇒: Recall that by proposition 2.2, c(E, |ψ⟩) and |A(E, |ψ⟩)⟩ don't depend on |ψ⟩. More generally, if we
purify the decoder to V, we get a unitary transformation, which preserves inner products:
2.5.3 Degenerate Codes
The di↵erence between equation (2.58) and equation (2.59) consists in the fact that Cab might not be ab .
When Cab has maximum rank, this di↵erence has no deep meaning, since we have just made a bad choice
of basis errors. As in the proof of theorem 2.7, we can choose a di↵erent set of errors with the same span to
diagonalize Cab , and we can even rescale to make Cab = ab .
When Cab has non-maximal rank, we cannot do this. If the error set E is not linearly independent, Cab
cannot have maximal rank, since a linearly dependent set of errors will produce a linearly dependent set of
rows in Cab . The interesting case is when E is linearly independent. It is possible then to still have a QECC
with non-maximal rank for Cab . This is known as a degenerate code.
Definition 2.2. Suppose (U, E) is a QECC and E is linearly independent. Then the code is degenerate if
rank(Cab) < |E|. A code (U, E) (for linearly dependent E) is degenerate if (U, E′) is degenerate, where E′ is
a minimal spanning set for E. If a code is not degenerate, it is non-degenerate.
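Degeneracy, like the QECC conditions themselves, is easy to test numerically on small codes. The following sketch (mine; Python with numpy) computes Cab for the three-qubit phase correction code with E = {I, Z1, Z2, Z3}, checking equation (2.59) on a basis of the code space (which suffices by linearity) and verifying that Cab has full rank, so this particular code is non-degenerate.

    import numpy as np
    from functools import reduce

    I2, Z = np.eye(2), np.diag([1., -1.])
    kron = lambda *ops: reduce(np.kron, ops)

    plus = np.array([1., 1.]) / np.sqrt(2)
    minus = np.array([1., -1.]) / np.sqrt(2)
    basis = [kron(plus, plus, plus), kron(minus, minus, minus)]   # |0_L>, |1_L>
    errors = [np.eye(8), kron(Z, I2, I2), kron(I2, Z, I2), kron(I2, I2, Z)]

    C = np.zeros((4, 4))
    ok = True
    for a, Ea in enumerate(errors):
        for b, Eb in enumerate(errors):
            # <psi_i| Ea^dag Eb |psi_j> should equal C_ab <psi_i|psi_j> = C_ab delta_ij
            M = np.array([[bi @ Ea.T @ Eb @ bj for bj in basis] for bi in basis])
            C[a, b] = M[0, 0]
            ok = ok and np.allclose(M, M[0, 0] * np.eye(2))
    print(ok)                            # True: equation (2.59) holds
    print(np.linalg.matrix_rank(C))      # 4 = |E|, so the code is non-degenerate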
We have already seen an example of a degenerate code: the nine-qubit code. The essence of degeneracy
is that different errors will produce the same result (or at least linearly dependent results) when acting on
a codeword. In the nine-qubit code, any two Z errors acting on the same set of 3 qubits will produce the
same result. For instance,

    Z1|0̄⟩ = Z2|0̄⟩ = (1/(2√2)) (|000⟩ − |111⟩)(|000⟩ + |111⟩)(|000⟩ + |111⟩)    (2.64)
    Z1|1̄⟩ = Z2|1̄⟩ = (1/(2√2)) (|000⟩ + |111⟩)(|000⟩ − |111⟩)(|000⟩ − |111⟩).    (2.65)
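You can check equations (2.64) and (2.65) directly (a small numpy sketch of my own): Z1 and Z2 produce identical states on the nine-qubit codewords, while a Z acting in a different block does not.

    import numpy as np
    from functools import reduce

    I2, Z = np.eye(2), np.diag([1., -1.])
    kron = lambda *ops: reduce(np.kron, ops)
    ket0, ket1 = np.array([1., 0.]), np.array([0., 1.])
    ghz_p = (kron(ket0, ket0, ket0) + kron(ket1, ket1, ket1)) / np.sqrt(2)   # |000>+|111>
    ghz_m = (kron(ket0, ket0, ket0) - kron(ket1, ket1, ket1)) / np.sqrt(2)   # |000>-|111>
    zero_L, one_L = kron(ghz_p, ghz_p, ghz_p), kron(ghz_m, ghz_m, ghz_m)

    def Z_on(i):                      # Z on physical qubit i (0-based) of nine
        ops = [I2] * 9
        ops[i] = Z
        return kron(*ops)

    print(np.allclose(Z_on(0) @ zero_L, Z_on(1) @ zero_L))   # True: Z1, Z2 act identically
    print(np.allclose(Z_on(0) @ one_L, Z_on(1) @ one_L))     # True
    print(np.allclose(Z_on(0) @ zero_L, Z_on(3) @ zero_L))   # False: different block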
2.5.4 Distance
The most common situation when designing a QECC is to let the set of possible errors be all t-qubit errors.
It is therefore worth examining theorem 2.7 more explicitly for a t-error correcting code. When Ea and Eb
are both weight ≤ t errors, then Ea†Eb is a weight ≤ 2t error.
Definition 2.3. Let C ⊆ HN be a QECC. The distance of C is the minimum weight d of an error F such
that

    ⟨ψ|F|φ⟩ ≠ c(F)⟨ψ|φ⟩,    (2.66)

where |ψ⟩ and |φ⟩ run over all possible pairs of codewords of C.
Note that the notion of distance implies that the Hilbert space is broken up into qubits because weight is
defined in terms of qubits. Naturally, we can easily generalize the distance to work with a code over qudits
of dimension q by defining weight in terms of the number of qudits in the support of an operator.
We get the following corollary of theorem 2.7:
Corollary 2.8. A distance d code corrects ⌊(d − 1)/2⌋ errors.
Inverting this formula, we find that to correct t errors, a code needs distance 2t + 1. If the distance is
even, the extra point is “wasted” for this application, but as we shall see in a moment, the extra point of
distance can be helpful in alternative applications.
The three central properties of a QECC are the size of the logical subspace, the number of physical
qubits, and the distance, so there is a notation encapsulating those properties.
Notation 2.4. A QECC which encodes HK into H_{2^n} and has distance d is denoted as an ((n, K, d)) code.
If the physical Hilbert space is n qudits each of dimension q, it is an ((n, K, d))q code. If the distance is
unknown or irrelevant, it is an ((n, K)) code or an ((n, K))q code.
This notation is derived from an analogous one for classical error-correcting codes, which will be intro-
duced in chapter 4. The classical notation uses single parentheses, and the double parenthesis indicates that
we have a quantum error-correcting code.
Sometimes when we discuss a code we consider it to have a distance less than its true distance or to
correct fewer errors than the maximum number it can correct. This is just a convenience indicating that we
are ignoring some of the error-correcting capability of the code. However, the true distance of a QECC does
give us the tightest estimate of the number of errors it can correct — that is, the converse of the corollary
also holds. In order to get the definition of distance, we need F to run over all errors of weight < d, but
the QECC conditions applied to a t-error correcting code only give us equation (2.66) for errors of the form
Ea†Eb, which does not include all possible weight ≤ 2t errors. However, equation (2.66) is linear in F, so it is
sufficient to check the formula for a basis of the set of weight ≤ 2t errors, and the set of errors of the form
Ea†Eb does include such a basis.
By similar reasoning, you get the same notion of distance if you alter the definition of distance to use all
d-qubit errors or just weight d tensor products of I, X, Y , and Z.
When describing a code without an explicit set of correctable errors, I said in the definition that you are
supposed to determine the set of correctable errors by context. Typically, we will choose the error set to be
the set of t-qubit errors, which makes the distance of a code one of its most critical properties. When the
error set is given implicitly, we can define degeneracy in terms of the code’s capability as a t-error correcting
code.
If the distance is even, 2t+2, and there is no more structure, it makes the most sense to define degeneracy
by also considering the set of correctable errors to be t-qubit errors, but for certain families of codes (such
as stabilizer codes, discussed in chapter 3), we can do better.
Definition 2.6. An encoder U (defined as for a QECC) and a set of detectable errors E form a quantum
error-detecting code if they have the following property: Let Π be the projector on the code space. Then
ΠE|ψ⟩ = c(E, |ψ⟩)|ψ⟩, for all E ∈ E and all codewords |ψ⟩. The code space is defined as the image of U, as
for a QECC, and we frequently refer to the code space as defining the error-detecting code instead of the
encoder.
Based on the definition of an error-detecting code, the measurement (Π, I − Π) will either project us
back on the original state (with some probability |c(E, |ψ⟩)|²) or will identify that an error occurred. This
condition can be easily rewritten in similar terms to the QECC conditions:
Theorem 2.9. (U, E) is a quantum error-detecting code iff

    ⟨ψ|E|φ⟩ = c(E)⟨ψ|φ⟩    (2.67)

for all codewords |ψ⟩ and |φ⟩ and all E ∈ E. A code with distance d detects arbitrary (d − 1)-qubit errors.
Based on this theorem, we can understand the distance of the code as the minimum number of qubits
on which we can act to produce an undetectable error, i.e., an error with a component taking a codeword to
a different codeword. (An error taking a codeword to itself is considered “detectable” by the definition.)
Proof. ⇐: We can write Π = Σ_i |ψi⟩⟨ψi| where the sum runs over a basis {|ψi⟩} for the code space Q. We can
then calculate ΠE|ψ⟩ using equation (2.67) to see that we have a quantum error-detecting code.
⇒: Equation (2.67) follows immediately from the definition of a quantum error-detecting code if we can
prove that c(E, |ψ⟩) does not depend on |ψ⟩. This can be done by considering a superposition α|ψ1⟩ + β|ψ2⟩,
as in proposition 2.2.
The definition of distance then shows that a distance d code detects d − 1 errors.
While I have presented quantum error-correcting codes and quantum error-detecting codes as different
things, clearly there is a very close connection. A code able to correct t errors will also be able to detect
2t errors, and vice-versa. Detecting errors and correcting errors are really just two different applications for
the same code, and henceforth, I won’t make a distinction between a code designed to correct errors and one
designed to detect errors. Both will be referred to as QECCs.
There is no unique maximal set of correctable errors for a QECC, but there is a unique maximal set of
detectable errors. The quantum error-correction conditions involve a product of two errors, and there may
be more than one set of errors that will run over all possibilities for the product. However, equation (2.67)
is only linear in the error, so we can define a unique set of detectable errors.
If EC is the set of correctable errors and ED is the set of detectable errors for a code, then the QECC
conditions can be rephrased as EC² ⊆ ED.
A code’s ability to correct erasure errors can be understood just by applying the regular QECC conditions.
The interesting twist in this case is that we can apply the side classical information we have about the location
of the errors to correct twice as many errors:
Theorem 2.10. A QECC with distance d can correct t erasure errors whenever t ≤ d − 1.
Proof. The best way to think about an erasure-correcting code is as a set of QECCs which all have the same
encoder. Each code is associated with a different error set, depending on where the erasure errors took place.
Since we don't know when doing the encoding where the errors are, we have to use the same encoder in all
cases. When decoding, however, we know where the errors are (although not what kind of errors they are),
so we can choose a decoder based on the actual error set — all possible errors on the actual set of qubits
erased.
Therefore we need a single code space that satisfies the QECC conditions for any error set of the form ES,
which is the set of all possible errors with support on the set S of at most t qubits. But ES² = ES since the
product of two errors with support on S still has support on S. For a distance d code, the set of detectable
errors includes all (d − 1)-qubit errors, so when t ≤ d − 1, all sets ES are subsets of the set of detectable
errors.
Proposition 2.11. The following are equivalent to the QECC conditions given in theorem 2.7:
1. For any codeword |ψ⟩ and any pair of errors Ea, Eb ∈ E,

    ⟨ψ|Ea†Eb|ψ⟩ = Cab⟨ψ|ψ⟩.    (2.69)

2. For any codeword |ψ⟩ and any E ∈ span(E),

    ⟨ψ|E†E|ψ⟩ = C(E)⟨ψ|ψ⟩.    (2.70)

3. If span(E) = E: For any codeword |ψ⟩, any error E ∈ E,

    ⟨ψ|E†E|ψ⟩ = C(E)⟨ψ|ψ⟩.    (2.71)

4. Let {|ψi⟩} be a basis for the code space. For any i and j and for any pair Ea, Eb ∈ E,

    ⟨ψi|Ea†Eb|ψj⟩ = Cab δij.    (2.72)
Proof. The QECC conditions from equation (2.59) immediately imply all four of these variant conditions,
so we only need to show the reverse direction.
To show that the standard QECC conditions follow from the first variant above, pick arbitrary |ψ1⟩ and
|ψ2⟩ in the code space and consider |ψ⟩ = α|ψ1⟩ + β|ψ2⟩ for different values of α and β. |ψ1⟩ and |ψ2⟩ may
not be orthogonal, so we have

    1 = |α|² + |β|² + 2 Re(α*β⟨ψ1|ψ2⟩).    (2.73)

Now, |ψ⟩ is also in the code space, and expanding ⟨ψ|Ea†Eb|ψ⟩ = Cab⟨ψ|ψ⟩ in α and β and comparing the
terms obtained for different choices of α and β shows that ⟨ψ1|Ea†Eb|ψ2⟩ = Cab⟨ψ1|ψ2⟩, as desired.
The second and third variants use a similar argument for Ea and Eb. The proof of the fourth version is
left as an exercise.
Applying a similar argument to the definition of the set of detectable errors, we find that an operator E is
detectable iff tr(ρE) does not depend on the codeword ρ. This has an interesting interpretation: it says
that an operator is detectable if and only if measurement of that operator reveals no information about the
logical state of the code. Certainly, if measuring an operator does reveal information about the logical state,
it will create errors in the encoded state. Two aspects of the condition are perhaps surprising: first, that
some element of that error cannot be detected, and second, that revealing encoded information is the only
thing that prevents an error from being detectable.
Another version of this insight appears when we apply the QECC conditions specifically to erasure errors.
Proposition 2.12. Let E be a set of erasure errors, each one corresponding to a set of erased qubits (or
qudits). Then Q corrects E iff ρS is the same for all logical states |ψ⟩ whenever S ∈ E. Here S is a subset
of qubits that can be erased and ρS is the encoded state restricted to S.
Note that the proposition just says that Q can correct for erasures on the set S iff the codeword on S
has no information about the encoded state. Another interesting feature of this is that the ability to correct
for erasures only refers to single sets, not pairs of sets, whereas more generally the QECC conditions refer to
pairs of errors. This means that any QECC has a unique maximal set of erasure errors that it can correct,
whereas, as I mentioned before, there is not a unique set of general errors that can be corrected — there is
some trade-off between correcting different kinds of errors.
Proof. As in the proof of theorem 2.10, we think of the code as a set of QECCs labelled after-the-fact by the
set of erased qubits. We specialize to a single set S ∈ E and want to show that Q corrects erasures on S iff
ρS is independent of the encoded state. Once we fix S, correcting erasures on S is equivalent to correcting
arbitrary linear operators supported on S.
Let Ea = |k⟩⟨j| and Eb = |k⟩⟨i| where i, j, and k are bitstrings labelling basis vectors for the qubits
just in S. We use variant 1 of the QECC conditions from proposition 2.11. Since Ea and Eb are always
supported on S,

    ⟨ψ|Ea†Eb|ψ⟩ = ⟨ψ| (|j⟩⟨i| ⊗ I) |ψ⟩ = ⟨i|ρS|j⟩.

As we let i and j vary over all possible basis states for S, equation (2.69) says that ρS does not depend on
which codeword |ψ⟩ we have, proving the forward direction of the proposition.
The reverse direction follows easily from variant 1 or variant 3 of proposition 2.11: If ρS is independent
of |ψ⟩, then taking the trace of ρ with operators on S will also give something independent of |ψ⟩.
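Proposition 2.12 is also easy to test numerically. The sketch below (my own illustration, with an assumed helper rho_S for the partial trace) computes the reduced state of the nine-qubit code on two qubits — the most erasures its distance of 3 allows — for several logical states and confirms that it carries no information about the encoded state.

    import numpy as np
    from functools import reduce

    kron = lambda *vs: reduce(np.kron, vs)
    ket0, ket1 = np.array([1., 0.]), np.array([0., 1.])
    ghz_p = (kron(ket0, ket0, ket0) + kron(ket1, ket1, ket1)) / np.sqrt(2)
    ghz_m = (kron(ket0, ket0, ket0) - kron(ket1, ket1, ket1)) / np.sqrt(2)
    zero_L, one_L = kron(ghz_p, ghz_p, ghz_p), kron(ghz_m, ghz_m, ghz_m)

    def rho_S(psi, S, n=9):
        """Reduced density matrix of |psi> on the qubits in S (trace out the rest)."""
        keep = sorted(S)
        rest = [q for q in range(n) if q not in keep]
        T = psi.reshape([2] * n).transpose(keep + rest).reshape(2 ** len(keep), -1)
        return T @ T.conj().T

    S = [0, 3]    # any two erased qubits: t = 2 <= d - 1 for the distance-3 nine-qubit code
    states = [zero_L, one_L, (zero_L + one_L) / np.sqrt(2)]
    rhos = [rho_S(s, S) for s in states]
    print(all(np.allclose(rhos[0], r) for r in rhos[1:]))   # True: rho_S has no logical info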
The syndrome decoding problem is NP-hard for most broad classes of codes. (This is true even for classical
error-correcting codes.) When we come up with new codes, we’d like ones for which the decoding problem
is not nearly as hard. Again, the ideal solution would be a code for which the syndrome decoding problem
can be solved in linear time. Such codes exist, but they are somewhat rare.
This is a rather long wish list of properties we’d like out of a code. It’s actually possible to satisfy many
of them at the same time, but we don’t currently know of any QECCs that are perfect in all respects. In
other words, choosing a code is a matter of trade-offs. We'll want to use one code for some purposes and a
different code for other purposes, depending on which property is most desirable for our current goal.
One thing to bear in mind when looking for a new code is that codes which look different may have pretty
much the same properties. For instance, if you switch the first and second qubits of a code, you probably
have a di↵erent subspace than you did before, but it doesn’t really deserve the name of a new code. We
capture this intuition with the notion of equivalent codes:
Definition 2.8. Two QECCs C and C′ on n physical qubits are equivalent if C′ can be produced from C
by performing some set of single-qubit unitaries and by permuting the qubits.
The notion of equivalence can of course be extended to codes on a q^n-dimensional Hilbert space as well.
Two equivalent codes have the same basic properties.
Proposition 2.13. If C and C′ are equivalent, then C and C′ have the same number of logical qubits and
the same distance.
It's not necessarily true that C and C′ correct exactly the same set of errors, since the local unitaries
and permutations of the qubits can scramble up the errors. However, it's always true that an equivalence
will map errors into errors of the same weight. It's also true that a fast decoding procedure for C will give
a fast decoding procedure for C′, so the codes don't differ in computational complexity either.
When you’re looking for new codes, if you find a code that’s equivalent to a known code, you probably
shouldn’t consider it to be new. It’s not always straightforward to tell if two codes are equivalent, however.
The safest thing is to find codes with new (hopefully better) parameters ((n, K, d)) than preexisting codes.
Chapter 3
The treatment of QECCs in the last chapter was very general, but a bit unwieldy, particularly when it comes
to finding new codes. We’d like a better method of discussing, manipulating, and finding QECCs, even if
it comes at the cost of restricting somewhat the codes we can talk about. This chapter introduces the class
of stabilizer codes, which have some nice properties and are much more tractable for most purposes than
a general QECC. Stabilizer codes are built around a group of symmetries of the code that can distinguish
between correct codewords and states with errors: The incorrect states will change under the action of the
symmetry, while the correct ones will stay put.
Eigenvalue +1 corresponds to even parity (syndrome bit 0) and eigenvalue −1 corresponds to odd parity
(syndrome bit 1). Thus, within each set of three, to figure out if one qubit is different in the standard basis,
we should measure Z ⊗ Z ⊗ I and Z ⊗ I ⊗ Z.
We also need to determine whether two sets of three have the same phase or opposite phase. We'd like to
phrase this as measurement of an eigenvalue. It should probably involve X, since the eigenvalues of X tell
us the phase in |0⟩ ± |1⟩, but we actually have an entangled state. To determine the phase of |000⟩ ± |111⟩
we should instead measure the eigenvalue of X ⊗ X ⊗ X. To determine whether two sets of three have the
same phase or opposite phase, we should thus measure the eigenvalue of the tensor product of six X's on
[Figure 3.1: a circuit for measuring the eigenvalue of a multi-qubit Pauli operator, using an ancilla qubit prepared in |+⟩.]
Z Z
Z   Z
      Z Z
      Z   Z
            Z Z
            Z   Z
X X X X X X
      X X X X X X

Table 3.1: The generators of the stabilizer for the nine-qubit code. Blank spaces represent I operators.
the six relevant qubits. Remember, we want to learn whether the phases are the same or different without
learning the actual value of each phase.
In case you’re curious as to how one would actually measure this eigenvalue, you can look at figure 3.1.
The question of how to measure the eigenvalue of a tensor product of arbitrary Paulis will be discussed in
detail in section 12.1. Most of the focus of that section is on how to do the measurement fault-tolerantly,
but it starts by discussing how to make a non-fault-tolerant measurement.
the nine-qubit code. The eight operators listed in table 3.1 are the generators of the stabilizer. Choosing a
different way of defining the error syndrome of the code corresponds to choosing a different set of generators
of the stabilizer.
Notice that we use tensor products of Zs to identify X (bit flip) errors, and we use tensor products of
Xs to identify Z (phase flip) errors. The pattern here is that the errors anticommute with the stabilizer
elements used to find them, and it is this property that makes stabilizers useful for discussing quantum
error-correcting codes.
Definition 3.1. The Pauli group Pn is composed of tensor products of I, X, Y , and Z on n qubits, with
an overall phase of ±1 or ±i.
I will refer to elements of the n-qubit Pauli group as “Pauli operators,” “Pauli errors,” or just “Paulis”
in the future; if I need to make a distinction between P1 and Pn , I will do so explicitly.
The Pauli group, as its name suggests, is a group: It is closed under multiplication since Y = iXZ,
and similarly, the product of any two of X, Y, and Z is equal to the third with an overall factor of ±i. In
addition, X² = Y² = Z² = I. Any tensor product of Paulis is its own inverse, i.e., it squares to I, and
if there is an overall phase factor of ±i, it instead squares to −I. (I use the one-qubit I and the n-qubit
identity I interchangeably.)
The one-qubit Paulis anticommute with each other:

    XZ = −ZX    (3.2)
    YZ = −ZY    (3.3)
    XY = −YX.    (3.4)
Of course, I commutes with X, Y, and Z, and any Pauli commutes with itself. More general pairs (P, Q) of
operators from Pn always either commute (PQ = QP) or anticommute (PQ = −QP). The difference from
the single qubit case is that for P1, only trivial pairs commute (when one Pauli is I or both are the same
Pauli), but for n > 1, there are non-trivial commuting pairs. For instance, X ⊗ X commutes with Z ⊗ Z.
I will sometimes use the notation [P, Q] = PQ − QP for the commutator and {P, Q} = PQ + QP for the
anticommutator.
Pn has 4^{n+1} elements since there are 4^n n-fold tensor products of I, X, Y, and Z, and 4 overall phases
they could have. However, for many purposes, we can ignore the phase, giving us effectively 4^n distinct
Paulis. If I wish to ignore the phase, I will refer to P̂n. Thus, P̂n ≅ Pn/{I, iI, −I, −iI}. The elements of
P̂n are sets of 4 operators of the form {±P, ±iP}, with P some n-qubit tensor product of I, X, Y, and Z.
Each element of P̂n can thus be associated with an unsigned Pauli operator P, and it is more convenient by
far to refer to elements of P̂n in terms of the associated Pauli rather than as a set of Paulis with signs.
Definition 3.2. Let P ∈ Pn. Then P̂ ∈ P̂n is the element of P̂n corresponding to P. Similarly, if S is a
subset of Pn, then Ŝ is the subset of P̂n consisting of P̂ for all P ∈ S.
I have introduced these conventions separating Pn and P̂n in order to be mathematically precise. In
some cases, we need to work with elements or subsets of Pn , and in some cases, to get the details completely
correct, it is better to work with P̂n. However, almost always, the distinction between the two is a small
technical detail. My advice is to ignore the difference between Pn and P̂n unless you get confused.
Any Pauli operator (other than the identity and its phases ±I, ±iI) has eigenvalues +1 and −1. Each
eigenspace is exactly half the Hilbert space. That is, the +1 and −1 eigenspaces have dimension 2^{n−1}.
One feature of the Pauli group already mentioned is that Pn spans the space of all n-qubit linear opera-
tions. The weight ≤ t Paulis span the set of t-qubit errors. As discussed in corollary 2.5, this property means
that to have a t-error correcting QECC, it suffices to correct all weight ≤ t Pauli operators.
This section has only discussed the most basic properties of the Pauli group. It has many more useful
properties, some of which we will encounter in future sections of the book. Don’t be surprised, however, if
someday you are trying to solve a problem and discover a new property of the Paulis that is not mentioned
in this book. In group theory, the Pauli group is an extraspecial group, and it truly deserves that name.
Definition 3.3. The stabilizer of a QECC T is

    S(T) = {M ∈ Pn | M|ψ⟩ = |ψ⟩ ∀|ψ⟩ ∈ T}.    (3.5)
In other words, the stabilizer is composed of the Pauli operators for which all codewords are +1 eigen-
states. In principle, one could define a stabilizer consisting of all unitaries which fix every codeword, but it
turns out to be most useful to concentrate on the Pauli operators.
Proposition 3.1. The stabilizer S(T) of a nonempty QECC T has three basic properties:
a) −I ∉ S(T)
b) S(T) is a group
c) S(T) is Abelian
The property of being Abelian is the least obvious, but it makes sense since we want a set of operators
which have the codewords as simultaneous eigenstates. Normally, for this to be possible, the operators should
commute; in general, they only need to commute on a subspace, but for Pauli operators this is only possible
if they truly commute.
Proof.
a) −I has no +1 eigenstates, so it cannot be in the stabilizer.
b) If M, N ∈ S(T), it is clear that their product is also in S(T): For any |ψ⟩ ∈ T,

    MN|ψ⟩ = M|ψ⟩ = |ψ⟩.    (3.6)

c) By the argument for property b), MN|ψ⟩ = NM|ψ⟩ for any |ψ⟩ ∈ T, M, N ∈ S(T). Thus, [M, N] annihilates
the codewords. Paulis either commute or anticommute. If M and N anticommute, then [M, N] = 2MN,
and MN is again an element of Pn, and therefore has no 0 eigenvalues. Therefore, the only option is that M
and N commute.
It turns out to be most useful, when dealing with stabilizers, to work the other way around. We are given
(or invent) a stabilizer with the properties we desire, and then use it to deduce the code.
Definition 3.4. Let S ⊆ Pn be an Abelian group, with −I ∉ S. Then define the code space corresponding
to S by

    T(S) = {|ψ⟩ s.t. M|ψ⟩ = |ψ⟩ ∀M ∈ S}.    (3.7)
X Z Z X I
I X Z Z X
X I X Z Z
Z X I X Z

Table 3.2: The generators of the stabilizer for the five-qubit code.
In general, T ⊆ T(S(T)), but when they are actually equal, the code has special properties.
Definition 3.5. If T is a QECC with T = T (S(T )), it is a stabilizer code.
Stabilizer codes are also sometimes known as symplectic codes, additive codes, or GF(4) additive codes
for reasons that will be discussed in section 3.5 and section 5.2.
If we define a code from its stabilizer, then it is always a stabilizer code. In other words, a stabilizer code
is one that could be defined just by giving its stabilizer. This is a consequence of the following proposition:
Proposition 3.2. S = S(T (S)).
I delay the proof until section 3.5 to keep you in suspense. Also, the proof will use a tool I won’t introduce
until then.
When dealing with stabilizer codes, I will frequently refer to the stabilizer S as the code instead of using
the rather unwieldy phrase “stabilizer of the code T (S).” When M 2 S, I will call M a “stabilizer element”
or some variant of that phrase. However, many people seem to find that terminology unwieldy too, and just
call M a “stabilizer.” I do not approve of that usage, but I can’t stop you if you want to say it.
Frequently we will want to pick a particular generating set {M1, . . . , Mr} for the stabilizer, as we did for
the nine-qubit code. In a minimal generating set no generator is a product of other generators. When we
take a product of generators, the order does not matter since the generators commute. Also, note that since
−I ∉ S, all elements of S must square to I, and any power of a generator greater than 1 can be reduced.
Therefore, the elements of the stabilizer are uniquely determined by taking a product

    M_{i1,...,ir} = ∏_j Mj^{ij},    (3.8)

with each ij ∈ {0, 1}.
Even though it’s not a generator, it still has the same status in the code as the four operators that are listed.
Indeed, we could have chosen any four of these five operators as a set of generators without changing the
code.
We can build from the generators a projector onto the code space,

    ΠS = ∏_{i=1}^{r} (I + Mi)/2,    (3.9)

when the stabilizer has r generators. Only states that are codewords — +1 eigenvectors of all generators
— will avoid being annihilated by ΠS, and any codeword will be left alone by ΠS, as desired. Note that the
order of the generators in the product does not matter since they all commute.
We can rewrite the projector ΠS in an interesting way by multiplying out the product. We get a sum
of terms, each of which consists of a distinct product of the generators {Mi}. Based on equation (3.8), the
products of the generators give us all elements of the stabilizer:

    ΠS = (1/2^r) Σ_{M∈S} M.    (3.10)
We can come up with actual codewords for the code by applying ΠS to states in the physical Hilbert
space and renormalizing. We might get 0, if the state we started with is orthogonal to the code space, but
frequently we'll get a real codeword. For instance, for the five-qubit code, we can apply the projector to
|00000⟩ and |11111⟩ to get two orthogonal codewords which can serve as a basis for the code space:
I got these codewords by working my way through all 16 elements of the stabilizer for the five-qubit code
(products of the generators given in table 3.2) and applying them to the starting states. You need to be
careful of the signs involved, but otherwise the process is straightforward if tedious.
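The sign-tracking can also be delegated to a computer. Here is a numpy sketch (mine) that builds ΠS for the five-qubit code directly from the generators in table 3.2, confirms tr ΠS = 2 (so one encoded qubit, per proposition 3.3), and produces the two basis codewords by projecting |00000⟩ and |11111⟩:

    import numpy as np
    from functools import reduce

    I2 = np.eye(2)
    X = np.array([[0., 1.], [1., 0.]])
    Z = np.diag([1., -1.])
    Y = 1j * X @ Z
    kron = lambda *ops: reduce(np.kron, ops)
    P = {'I': I2, 'X': X, 'Y': Y, 'Z': Z}
    pauli = lambda s: kron(*[P[c] for c in s])

    gens = ['XZZXI', 'IXZZX', 'XIXZZ', 'ZXIXZ']            # table 3.2
    Pi = reduce(lambda A, B: A @ B, [(np.eye(32) + pauli(g)) / 2 for g in gens])

    print(np.real(np.trace(Pi)))                            # 2.0: one encoded qubit

    e0, e1 = np.zeros(32), np.zeros(32)
    e0[0], e1[-1] = 1, 1                                    # |00000> and |11111>
    c0 = Pi @ e0; c0 /= np.linalg.norm(c0)                  # the two basis codewords
    c1 = Pi @ e1; c1 /= np.linalg.norm(c1)
    print(abs(c0.conj() @ c1))                              # 0.0: they are orthogonal
    print(all(np.allclose(pauli(g) @ c0, c0) for g in gens))  # True: c0 is a codeword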
at the eigenvalues of the generators. This argument is not a proof, since we would need to show that the
additional generators divide not just the full Hilbert space in half, but also all the smaller subspaces defined
by the earlier generators. The actual proof is straightforward, but less intuitive:
Proof. The dimension of T(S) is equal to the trace of ΠS, the projection operator onto T(S). Now,

    tr ΠS = (1/2^r) Σ_{M∈S} tr M.    (3.13)

Every element of S other than the identity is a non-trivial Pauli, and all non-trivial Paulis are traceless,
so only the identity term contributes:

    tr ΠS = tr I / 2^r = 2^n / 2^r.    (3.14)
You can check the nine-qubit code in this formula: 9 physical qubits, 8 stabilizer generators, and 1
encoded qubit. For the five-qubit code, we have four generators, so again there should be 1 encoded qubit,
with the basis codewords we saw above.
There’s one special case which is not, strictly speaking, a QECC, but is interesting nonetheless. Yes, it’s
true, there are some things which are interesting other than quantum error correction. When the stabilizer
has n generators on n qubits, proposition 3.3 would tell us we have 0 encoded qubits, which is a Hilbert
space of dimension 1. That is, it is a single state, up to normalization.
Definition 3.6. A stabilizer state is the code space of a stabilizer with n generators on n qubits.
Since r = n is the maximum number of generators you can have, a stabilizer state is an extreme limit
of a QECC. Some constructions of QECCs will alter the number of encoded qubits from another code, and
sometimes a stabilizer state can be the starting or ending point of such a construction. Also, stabilizer states
are fairly common in the theory of quantum information, even discounting states arising from quantum error
correction. For instance, the GHZ state |000⟩ + |111⟩ and the Bell states |00⟩ ± |11⟩, |01⟩ ± |10⟩ are stabilizer
states.
To understand how to deduce the distance of a stabilizer code, let us go back to the nine-qubit code and
recall how it handled errors. The generators of the stabilizer were used to give us bits of the error syndrome.
In particular, some generators were able to signal the presence of certain errors while missing other errors,
but by looking at the full set of generators, we were able to identify all of the single-qubit errors.
I mentioned that the property responsible for determining whether a generator M is useful for an error
E is anticommutation. Let us see how this works. Suppose M ∈ S and E ∈ Pn anticommutes with M. Then
for any |ψ⟩ ∈ T(S),

    M(E|ψ⟩) = −EM|ψ⟩ = −E|ψ⟩.    (3.15)

|ψ⟩ was a +1 eigenvector of M — that is the definition of the code space — but E|ψ⟩ is a −1 eigenvector.
Conversely, if E commutes with M then

    M(E|ψ⟩) = EM|ψ⟩ = E|ψ⟩.    (3.16)

One advantage to dealing with the Pauli group is that these are the only choices. Either M and E commute
or they anticommute. If we have an error that commutes with M, M retains a +1 eigenvalue, whereas if the
error anticommutes with M, M's eigenvalue becomes −1, signaling that an error has occurred.
If you know some group theory, you might recognize this as the definition of the centralizer of S (the set
of things that commute with all elements of S) rather than the normalizer (the set of things that preserve S
under conjugation), but because Paulis either commute or anticommute and −I ∉ S, they are the same thing
for a stabilizer. I am choosing to call it the normalizer rather than the centralizer because the normalizer
relates to logical operations, and this is an important function of N(S), as we shall see shortly in section 3.4.2.
Since the stabilizer is Abelian, S ⊆ N(S) always. Indeed, N(S) also contains −S and ±iS. Since we worry
about eigenvectors of S, changing the sign of an operator in the stabilizer is important, but N(S) is about
errors, and global phase no longer matters. Therefore, we will usually want to work with N̂(S).
The normalizer tells us which errors can be detected by the stabilizer code. If a Pauli E is outside the
normalizer N(S), then E|ψ⟩ has an eigenvalue −1 for some M ∈ S, and thus is detected by measuring the
eigenvalues of the stabilizer elements. Note that this is true for any codeword |ψ⟩. Also note that it is
sufficient to measure the generators of the stabilizer: If N commutes with M1 and M2, it also commutes
with M1M2. Thus, if N commutes with all generators of S, then N ∈ N(S).
When E ∈ N(S), then E|ψ⟩ has eigenvalue +1 for all M ∈ S, and therefore E|ψ⟩ ∈ T(S) for codewords
|ψ⟩. You might think that that means it is an undetectable error, but there is actually another class of Paulis
that is “detectable” by definition 2.7. If E ∈ S, then, while it's true that E|ψ⟩ ∈ T(S) for any codeword |ψ⟩,
it's also true that E|ψ⟩ = |ψ⟩, so E is not actually an “error”: it acts like the identity on the code space,
leaving codewords unchanged. Ignoring global phases, we can say the same if Ê ∈ Ŝ.
Putting this together, we get a characterization of the detectable errors for a stabilizer code.
Theorem 3.4. The set of undetectable errors for a stabilizer code S is N̂(S) \ Ŝ. The distance of S is
min{wt E | Ê ∈ N̂(S) \ Ŝ}.
The slanty line is a “set minus” operation. That is, N̂(S) \ Ŝ consists of those elements of N̂(S) that are
not in Ŝ.
Proof. We can prove the first statement directly from the definition of the set of detectable errors (defini-
tion 2.7). The second statement will then follow immediately from the definition of distance. We need to
consider three cases for E. In cases 1 and 2, |ψ⟩ and |φ⟩ are arbitrary codewords.
1. Case 1: Ê ∈ Ŝ. Then, choosing the phase of E so that E ∈ S, E|ψ⟩ = |ψ⟩ and

    ⟨φ|E|ψ⟩ = ⟨φ|ψ⟩,    (3.18)

so E is detectable, with c(E) = 1.
2. Case 2: Ê ∉ N̂(S). Then ∃M ∈ S such that {M, E} = 0 (for any choice of phase of E), so M(E|ψ⟩) =
−E|ψ⟩, as per equation (3.15). Then

    ⟨φ|E|ψ⟩ = ⟨φ|ME|ψ⟩ = −⟨φ|EM|ψ⟩ = −⟨φ|E|ψ⟩ = 0,    (3.19)

so E is detectable, with c(E) = 0.
3. Case 3: Ê ∈ N̂(S) \ Ŝ. Then E|ψ⟩ is again a codeword, which we can call |ψE⟩. We have

    ⟨φ|E|ψ⟩ = ⟨φ|ψE⟩,    (3.20)

whereas

    ⟨ψ|E|ψ⟩ = ⟨ψ|ψE⟩ = (⟨ψ|ψE⟩)⟨ψ|ψ⟩.    (3.21)

Comparing equations (3.20) and (3.21) to definition 2.7 tells us that Ê is not detectable unless |⟨ψ|ψE⟩| =
1, meaning E|ψ⟩ = e^{iθ}|ψ⟩. Furthermore, by equation (3.21), Ê is undetectable unless θ is the same for
all |ψ⟩. But that is not possible, since that would imply Ê ∈ Ŝ.
X X X X
Z Z Z Z

Table 3.3: The generators of the stabilizer for the four-qubit code.
The key point here is that when Ê ∈ N̂(S) \ Ŝ, then E maps some codewords to different codewords.
That’s the essence of an undetectable error, because the code has no way of knowing if a codeword is the
original encoding of the state or the result of the action of the error on a di↵erent codeword.
Moving now to correcting errors, we find
Theorem 3.5. The stabilizer code S corrects a set of errors E ⊆ Pn iff Ê†F̂ ∉ N̂(S) \ Ŝ for all E, F ∈ E.
The theorem follows immediately from theorem 3.4 by comparing definition 2.7 with theorem 2.7. Recall
that by the linearity of QECCs, and particularly corollary 2.5, it suffices to consider Pauli errors to understand
the error-correcting capabilities of the code, at least when we are interested in correcting t-qubit errors.
For stabilizer codes, we have a slightly different notation than the more general ((n, K, d)) notation that
applies to arbitrary QECCs. Since the encoded subspace has a dimension that's always a power of 2, we
write the code in terms of the number of encoded qubits k rather than the dimension K = 2^k of the logical
Hilbert space. We also use square brackets to indicate that we are dealing with a stabilizer code.
Notation 3.8. A stabilizer code with n physical qubits, k logical qubits, and distance d is denoted as an
[[n, k, d]] code. If the distance is unknown or irrelevant, it is an [[n, k]] code.
We know that the nine-qubit code corrects a single-qubit error, so it must have distance at least 3. In
fact, there are some 3-qubit Paulis (such as X1 X2 X3 ) in N̂(S) \ Ŝ for the code, so the distance is exactly 3.
Thus, the nine-qubit code is a [[9, 1, 3]] code. The five-qubit code also turns out to have distance 3, so it is a
[[5, 1, 3]] code. Note that these codes are also ((9, 2, 3)) and ((5, 2, 3)) codes, but the more specific notation
for a stabilizer code is generally used for them. As it happens, the five-qubit code is the smallest distance 3
code. You’ll see the proof of that, as well as other techniques for proving limits on QECCs, in chapter 7.
You might wonder about distance 2 codes, since all the QECCs we have seen so far have distance 3.
Distance 2 codes tend to be much simpler, but they can still be interesting. After all, a distance 2 code can
detect one error or correct one erasure. The smallest distance 2 code is a [[4, 2, 2]] code given in table 3.3.
It is not hard to see that any one-qubit Pauli will anticommute with one or both of the two generators, but
there are some two-qubit Paulis (e.g., X ⊗ X ⊗ I ⊗ I) that commute with both. Thus, the code has distance
2.
Definition 3.9. A stabilizer code S with distance d is degenerate if ∃M ∈ S, M ≠ I, with wt M < d.
Note that this is slightly more general than the generic notion of a degenerate code. A stabilizer code
with distance d = 2t + 2 can be degenerate if S contains any elements of weight 2t + 1, but degeneracy wasn't
defined for general non-stabilizer QECCs of even distance.
Looking once more at the stabilizer of the nine-qubit code (table 3.1), it is immediately obvious that it
is a degenerate code. There are many generators of weight 2 and the code has distance 3. However, it is
not always obvious from looking at the generators whether a stabilizer code is degenerate or not. It may be
that the set of generators given to you all have high weight, but there is some product of the generators that
does not. The five-qubit code is non-degenerate because all of the operators in the stabilizer have weight 4,
but to see that you need to do some work.
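Here is that work done by machine (a short sketch of my own): multiply out all 16 elements of the five-qubit code's stabilizer as Pauli strings — phases can be ignored, since they don't affect weights — and list the weights.

    from itertools import product

    gens = ['XZZXI', 'IXZZX', 'XIXZZ', 'ZXIXZ']   # table 3.2

    def mul(p, q):
        """Multiply two Pauli strings, ignoring the overall phase."""
        third = {frozenset('XY'): 'Z', frozenset('XZ'): 'Y', frozenset('YZ'): 'X'}
        return ''.join(b if a == 'I' else a if b == 'I' else 'I' if a == b
                       else third[frozenset((a, b))] for a, b in zip(p, q))

    elements = []
    for bits in product([0, 1], repeat=4):        # all products of generators, eq. (3.8)
        M = 'IIIII'
        for bit, g in zip(bits, gens):
            if bit:
                M = mul(M, g)
        elements.append(M)

    print(sorted(sum(c != 'I' for c in M) for M in elements))
    # [0, 4, 4, ..., 4]: every non-identity stabilizer element has weight 4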
Proof.
⇒: If E and F are in the same coset, then F = EN, with N ∈ N(S). Let M ∈ S. Then c(F, M) =
c(E, M) + c(N, M). But N ∈ N(S), so c(N, M) = 0. Therefore the error syndromes of E and F are the
same.
⇐: The argument works the other way too: Let N = E†F. If the error syndromes of E and F are the
same, then c(N, M) = 0 for all M ∈ S. Therefore, N ∈ N(S).
In much the same way as the full Hilbert space can be written as a direct sum of subspaces associated
to different syndromes, the whole Pauli group can be partitioned into cosets of N(S), each associated with a
different syndrome. All the cosets are the same size and there are 2^r of them. As a consequence, we know
the size of the normalizer:
Proposition 3.9. |N(S)| = 4 · 2^{n+k} when S has n physical qubits and k logical qubits.
Proposition 3.10. Let N1, N2 ∈ N(S). Then N1 and N2 are in the same coset of S iff N1|ψ⟩ = N2|ψ⟩ for
all codewords |ψ⟩.
Recall that when N ∈ N(S), then N|ψ⟩ is a codeword for any input codeword |ψ⟩. Thus, N is a logical
operation: it always maps codewords to (possibly different) codewords. Since the different representatives of
a coset of S act the same way on codewords, they are all different realizations of the same logical operation.
Thus, the set N(S)/S is of particular interest, since it consists of the distinct logical operations performed
by the normalizer. (S is a normal subgroup of N(S) by definition, so we can take this quotient without any
difficulties.)
Theorem 3.11. Let S be a stabilizer with n physical qubits and k logical qubits. Then N(S)/S ≅ Pk.
The quotient group N(S)/S can be taken to be the logical Pauli group, performing Paulis on the encoded
qubits. The proof of theorem 3.11 will be in section 3.5.
We can also look at cosets of S in the full Pauli group Pn, as in figure 3.2. Each coset of N(S) breaks
up into cosets of S, so we can associate each coset of S in Pn with an error syndrome, inherited from the
coset of N(S) which it sits within. If we pick a particular representative E of the coset of N(S), then we can
again identify the coset of S with a logical operation P ∈ N(S)/S; the interpretation of the coset is then a
combination of the logical operation P and the error E.
When we perform decoding on a stabilizer code, we do just this. For each error syndrome s, we assign
a particular error Es with σ(Es) = s. If our syndrome measurement gives s, we assume the error actually
was Es and correct that. If the error was actually some E′ with the same syndrome, we hope that Es and
E′ are in the same coset of S. If not, there has been a logical Pauli error, the one associated with the actual
coset of S in which E′ lies.
Theorem 3.12. If S is a non-degenerate stabilizer code, the error syndromes of all errors in the correctable
error set E are distinct. If S is a degenerate code, E and F have the same error syndrome iff Ê†F̂ ∈ Ŝ.
Proof. By proposition 3.8, two errors E ≠ F ∈ E have the same error syndrome iff they are in the same
coset of N(S), meaning E†F ∈ N(S). But both errors are correctable, and by theorem 3.5, Ê†F̂ ∉ N̂(S) \ Ŝ,
so E†F ∈ N(S) iff Ê†F̂ ∈ Ŝ. Thus, the error syndromes of two correctable errors E and F are the same iff
Ê†F̂ ∈ Ŝ. If this ever occurs, the code is degenerate.
S     X̄     |   F S    F X̄
Z̄     Ȳ     |   F Z̄    F Ȳ
syndrome 00      syndrome 10
no correction    correct as F

E S   E X̄   |   G S    G X̄
E Z̄   E Ȳ   |   G Z̄    G Ȳ
syndrome 01      syndrome 11
correct as E     correct as G

Figure 3.2: Cosets of the normalizer and stabilizer in the Pauli group. The errors E, F, and G are chosen
as canonical errors for the syndromes 01, 10, and 11.
X Z Z X I
I X Z Z X
X I X Z Z
Z X I X Z
X̄: X X X X X
Z̄: Z Z Z Z Z

Table 3.4: The generators for the five-qubit code supplemented by representatives for logical Paulis.
Applying this theorem is one way to see that the five-qubit code has distance 3 and is non-degenerate.
There are 15 possible one-qubit errors (X, Y , Z on each of five qubits), plus the identity. There are 4
generators, so there are 16 error syndromes. If you list the syndromes of the 16 errors, you’ll find that each
one is unique. There are no syndromes left over, so the five-qubit code is known as a perfect code.
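Listing the syndromes is easy to automate (another sketch of mine). An error's syndrome bit for a given generator is 1 exactly when the two anticommute:

    gens = ['XZZXI', 'IXZZX', 'XIXZZ', 'ZXIXZ']   # table 3.2

    def anticommute(p, q):
        # parity of the positions where p and q carry different non-identity Paulis
        return sum(a != 'I' and b != 'I' and a != b for a, b in zip(p, q)) % 2

    syndrome = lambda e: tuple(anticommute(e, g) for g in gens)

    errors = ['IIIII'] + ['I' * i + P + 'I' * (4 - i) for i in range(5) for P in 'XYZ']
    print(len(errors), len({syndrome(e) for e in errors}))   # 16 16: all syndromes distinct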
It is frequently helpful to also pick representatives of the cosets of S in the normalizer. These repre-
sentatives don't have an interpretation in error correction, but they do give us concrete realizations of the
logical Pauli group, which is helpful in fault-tolerant computation. For now, they are simply convenient
computational tools. We don't actually need to pick representatives for every coset of S. Since N(S)/S has
a group structure itself, it is sufficient to pick representatives of generators of N(S)/S and let the representatives
of other cosets be determined by multiplication. In other words, we can pick representatives for X̄i and Z̄i
for i = 1, . . . , k. (The subscript i here means the operator acts on the ith logical qubit, not the ith physical
qubit.) I have done this for the five-qubit code and the four-qubit code in tables 3.4 and 3.5. Then, for
instance, you can get Ȳ for the five-qubit code to be iX̄Z̄ = Y ⊗ Y ⊗ Y ⊗ Y ⊗ Y.
You need to be careful when assigning cosets to elements of the logical Pauli group. Remember, N(S)/S ≅
Pk, and we'd like to realize this through a choice of representatives for the generating cosets. This means
that the commutation and anticommutation relationships of the logical Paulis must be realized through the
coset representatives. For instance, the coset representatives for X̄i and Z̄i must anticommute, while the
coset representative of X̄i must commute with the representatives for X̄j or Z̄j (j ≠ i).
We could in principle do the same thing for Pn /N(S), choosing representative errors only for the basis
error syndromes and deducing the other errors by multiplication. However, in most cases, this is not a good
thing to do. If our goal is to correct t errors for the maximum possible t, what we’d like to do instead is
choose the lowest weight error in each coset of N(S) as its representative. If we rely on multiplication to tell
us the coset representatives, we’ll frequently get errors of higher weight than necessary, and then there will
be some lower weight errors that won’t get corrected properly.
X X X X
Z Z Z Z
X̄1: X X I I
X̄2: X I X I
Z̄1: I Z I Z
Z̄2: I I Z Z

Table 3.5: The stabilizer for the four-qubit code supplemented by representatives for logical Paulis.
Proposition 3.13. Let Cs be the coset of N̂(S) with error syndrome s. Q̂sŜ is the coset of Ŝ containing the
canonical error Q̂s with syndrome s, so Q̂sŜ ⊆ Cs.
a) The probability of error syndrome s (regardless of whether we correctly identify the error) is

    ps = Σ_{P̂∈Cs} pP.    (3.22)

b) The probability of having syndrome s and no logical error (i.e., error correction is successful) is

    ps,OK = Σ_{P̂∈Q̂sŜ} pP.    (3.23)
There is no interaction between the different error syndromes. If we change Qs for one value of s, the
only change to pOK comes through the change to ps,OK. We can therefore treat each syndrome separately
to maximize ps,OK. It doesn't matter which element of the coset QsS we choose, only which coset we choose
within Cs.
In order to maximize the probability of successfully decoding, for each syndrome s, we should choose the
coset QsS which maximizes ps,OK. As given by equation (3.23), that just means we look at each coset and
add up the probability of all Paulis in the coset. Then we choose the coset with the highest value to be the
representative coset for s.
The maximum likelihood decoder should be contrasted with the decoder that optimizes the distance, for
which we choose the coset containing the lowest-weight Pauli. If we have an independent Pauli channel, the
two procedures frequently, but not always, give the same answer: When the probability of error per qubit is
small, a specific low-weight error will have higher probability than a specific high-weight error. However, a
coset which contains one low-weight error and a bunch of high-weight errors may be less likely than a coset
which contains a large number of medium-weight errors.
Getting the exactly optimal decoder, in the sense of always picking the most likely coset, is usually a
computationally challenging task. Often, therefore, we are willing to settle for a good approximate decoder
that gets pOK to be close to the optimal value but with a much smaller computational cost.
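To make the maximum-likelihood coset decoder concrete, here is a toy implementation (mine, not anything standard) for the three-qubit bit flip code under independent bit flip noise. It groups all 64 unsigned Paulis by syndrome, sums the probability within each coset of Ŝ as in equation (3.23), and takes the best coset for each syndrome; the resulting pOK matches the hand count (1 − p)³ + 3p(1 − p)².

    from itertools import product
    from collections import defaultdict

    gens = ['ZZI', 'IZZ']                            # three-qubit bit flip code
    stab = ['III', 'ZZI', 'IZZ', 'ZIZ']              # all of S (up to phase)

    def mul(p, q):                                   # Pauli-string product, phases ignored
        third = {frozenset('XY'): 'Z', frozenset('XZ'): 'Y', frozenset('YZ'): 'X'}
        return ''.join(b if a == 'I' else a if b == 'I' else 'I' if a == b
                       else third[frozenset((a, b))] for a, b in zip(p, q))

    def anticommute(p, q):
        return sum(a != 'I' and b != 'I' and a != b for a, b in zip(p, q)) % 2

    p = 0.1                                          # each qubit suffers X with probability p
    def prob(e):
        out = 1.0
        for c in e:
            out *= p if c == 'X' else (1 - p) if c == 'I' else 0.0
        return out

    # Group the probability of every Pauli by (syndrome, coset of S-hat), as in prop. 3.13.
    cosets = defaultdict(lambda: defaultdict(float))
    for e in (''.join(s) for s in product('IXYZ', repeat=3)):
        syn = tuple(anticommute(e, g) for g in gens)
        rep = min(mul(e, m) for m in stab)           # canonical label for the coset
        cosets[syn][rep] += prob(e)

    p_ok = sum(max(c.values()) for c in cosets.values())   # best coset per syndrome
    print(p_ok, (1 - p)**3 + 3 * p * (1 - p)**2)           # 0.972 0.972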
we choose to represent elements of P̂n as two n-bit binary vectors, one representing the “X” component and
one representing the “Z” component.
Definition 3.11. The binary symplectic representation of P̂n is an isomorphism between P̂n and Z_2^n × Z_2^n:
P ↔ vP = (xP|zP). The ith component of xP is 0 if P acts on qubit i as I or Z and 1 if P acts on qubit i
as X or Y. The ith component of zP is 0 if P acts on qubit i as I or X and 1 if P acts on qubit i as Y or
Z. That is,

    I ↔ (0|0)
    X ↔ (1|0)
    Y ↔ (1|1)
    Z ↔ (0|1)    (3.25)
For instance, X ⊗ I ⊗ Y ↔ (1 0 1|0 0 1). The “symplectic” refers to the substitute for commutativ-
ity/anticommutativity:
Definition 3.12. The symplectic form (or symplectic product) on Z_2^n × Z_2^n is (x1|z1) ⊙ (x2|z2) = x1 · z2 + x2 · z1,
where the dot product is the usual scalar product in the vector space Z_2^n and addition is modulo 2.
Proposition 3.14. For any P, Q ∈ Pn, c(P, Q) = vP ⊙ vQ.
Proof. This can be checked exhaustively for a single qubit. For more than one qubit, note that both ⊙ and
c(·, ·) are equal to the parity of their single-qubit results. Therefore, since they are equal for single qubits,
they are also equal for n qubits.
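Definitions 3.11 and 3.12 translate directly into code, and the single-qubit check in the proof can just as easily be made exhaustive for two qubits (a sketch of my own):

    import numpy as np
    from itertools import product

    def to_xz(pauli):
        """Binary symplectic representation (definition 3.11)."""
        x = np.array([c in 'XY' for c in pauli], dtype=int)
        z = np.array([c in 'YZ' for c in pauli], dtype=int)
        return x, z

    def symplectic(p, q):
        """The symplectic form of definition 3.12."""
        x1, z1 = to_xz(p)
        x2, z2 = to_xz(q)
        return (x1 @ z2 + x2 @ z1) % 2

    def anticommute(p, q):   # direct check on the Pauli strings
        return sum(a != 'I' and b != 'I' and a != b for a, b in zip(p, q)) % 2

    paulis = [''.join(s) for s in product('IXYZ', repeat=2)]
    print(all(symplectic(p, q) == anticommute(p, q)
              for p in paulis for q in paulis))   # True: c(P,Q) equals the symplectic product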
Combining the symplectic form with the binary symplectic representation recovers the primary structure
of the Pauli group. We can switch back and forth between the two representations to use whichever one is
the most convenient at the moment. The conversion process is summarized in table 3.6. The one thing that
is lost in the transition is the phase of a Pauli, so you must be careful whenever dealing with something that
depends on that.
Some of the conversions need a little more explanation. There were three conditions for S to be a stabilizer: that −I ∉ S, that S is a group, and that S is Abelian. Since phase is lost when converting to the binary symplectic representation, the first condition is irrelevant. The second condition means that S becomes a linear subspace in Z_2^n × Z_2^n, and the third means that v ⊙ w = 0 ∀ v, w ∈ S. This analysis and the conversion of the normalizer prompts the following definition:
In the Pauli group                        Binary symplectic representation
P ∈ P̂_n                                   v_P = (x_P|z_P) ∈ Z_2^n × Z_2^n
Multiplication                            Addition
c(P, Q)                                   v_P ⊙ v_Q
Phase                                     No equivalent
Stabilizer S                              Weakly self-dual subspace S
Normalizer N(S)                           Dual subspace S^⊥ (under ⊙)
Minimal set of generators for S           Basis for S
Table 3.6: Equivalence between the Pauli group and its binary symplectic representation.
        ⎛ 1 0 0 1 0 | 0 1 1 0 0 ⎞
        ⎜ 0 1 0 0 1 | 0 0 1 1 0 ⎟
        ⎜ 1 0 1 0 0 | 0 0 0 1 1 ⎟
        ⎜ 0 1 0 1 0 | 1 0 0 0 1 ⎟
    X̄   ⎜ 1 1 1 1 1 | 0 0 0 0 0 ⎟
    Z̄   ⎝ 0 0 0 0 0 | 1 1 1 1 1 ⎠
Table 3.7: The binary symplectic representation of the stabilizer and logical Pauli operators for the five-qubit code.
Definition 3.13. Let V be a linear subspace of Z_2^n × Z_2^n. The dual of V (with respect to ⊙) is V^⊥ = {w ∈ Z_2^n × Z_2^n | w ⊙ v = 0 ∀ v ∈ V}. A subspace is self-dual if V^⊥ = V. A subspace is weakly self-dual if V ⊆ V^⊥.
Table 3.7 gives the five-qubit code in binary symplectic representation, including the logical X and Z
operators.
Conversion can go the other way too. For instance, it is convenient to define a notion of a set of Paulis
being independent based on the standard definition for binary vector spaces:
Definition 3.14. A set of Paulis {P_1, . . . , P_m} is independent iff the vectors {v_{P_1}, . . . , v_{P_m}} are linearly independent.
A set of Paulis is independent unless one of them is a product of others; this is equivalent to saying that
a set of 2n-bit binary vectors is independent unless one of them is a sum of others. Since the binary vector
space is 2n-dimensional, the maximum number of independent Paulis is 2n, which is equal to the number of
generators of the Pauli group.
Lemma 3.15. Let {P_1, . . . , P_m} be an independent set of Paulis on n qubits, and let s be an m-bit vector with components s_i. Then ∃Q ∈ P_n with c(P_i, Q) = s_i. In fact, there are 2^{2n−m} such Paulis.

Proof. Converting to the binary symplectic representation gives us m linearly independent vectors v_i, and we wish to find a vector w such that v_i ⊙ w = s_i. Each of these conditions is a linear equation, and since the vectors v_i are independent, the system of linear equations is non-singular. There are m equations in a 2n-dimensional vector space, so the space of solutions has dimension 2n − m. It is a binary vector space, so that corresponds to 2^{2n−m} solutions.
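Here is a brute-force Python check of this counting for n = 2 qubits; everything in it is my own illustration rather than part of the text.

    # Count vectors w with v_i (symplectic product) w = s_i, lemma 3.15.
    from itertools import product

    n = 2
    def sform(v, w):
        xv, zv = v[:n], v[n:]
        xw, zw = w[:n], w[n:]
        return (sum(a * b for a, b in zip(xv, zw))
                + sum(a * b for a, b in zip(xw, zv))) % 2

    # Two independent Paulis: X on qubit 1 and Z on qubit 1.
    vs = [(1, 0, 0, 0), (0, 0, 1, 0)]
    s = (1, 0)   # want: anticommute with the first, commute with the second
    solutions = [w for w in product((0, 1), repeat=2 * n)
                 if all(sform(v, w) == si for v, si in zip(vs, s))]
    print(len(solutions))   # -> 4 = 2^(2n-m) with n = 2, m = 2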
An important consequence of the lemma is that it tells us about the decomposition of the full physical Hilbert space into subspaces associated with the different error syndromes. I've already discussed this decomposition, but the missing piece is given by the following corollary of lemma 3.15:

Proposition 3.16. Let S be a stabilizer with generators {M_i}. Then for any error syndrome s, ∃P ∈ P_n with error syndrome s.
That is, not only is every Pauli outside N(S) associated with an error syndrome, but every possible error
syndrome is associated with some Pauli.
We now want to choose logical Paulis X̄_i and Z̄_i (i = 1, . . . , k) satisfying the commutation relations

    c(X̄_i, X̄_j) = 0    (3.26)
    c(Z̄_i, Z̄_j) = 0    (3.27)
    c(X̄_i, Z̄_j) = δ_ij.    (3.28)
It is sufficient to consider only the generating cosets, since we can let the representatives of a product of cosets be the product of the representatives, as discussed in section 3.4.2. Provided we can choose X̄_i and Z̄_i with the correct commutation relations, that gives us an injective map of P_k into N(S)/S. We know that |N(S)| = 4 · 2^{n+k} = |S||P_k|, so an injection has to be an isomorphism.
Suppose we've chosen some independent set of X̄_i's and Z̄_i's with the correct commutation relations, and we wish to choose one more. The new logical Pauli must be in N(S), so in particular, it must commute with all generators of the stabilizer. Second, it has a defined commutation relation with the already chosen logical Paulis. By lemma 3.15, there exists a Pauli that satisfies all of these constraints.
The only remaining thing to check is that the new logical Pauli can be chosen to be independent of the prior ones. To simplify the analysis, let us first pick all the X̄_i's, then the Z̄_i's. When we are picking Z̄_j, it satisfies different commutation relations than all previously selected logical Paulis. In particular, Z̄_j anticommutes with X̄_j, unlike all the previous logical Paulis and all the stabilizer generators. Thus, Z̄_j must be independent.
When we pick X̄_j, this argument doesn't apply, since the new one will commute with all stabilizer generators and all previous logical Paulis. However, there are n − k stabilizer generators and at most k − 1 logical Paulis already chosen, for a total of at most n − 1 constraints. By lemma 3.15, there are thus at least 2^{n+1} possible solutions. The group generated by the stabilizer and previous logical Paulis contains only 2^{n−1} operators, so there are possible choices for X̄_j that are independent of the previous choices.
of S ⊂ S′. Thus, |ψ⟩ ∈ T(S). However, N anticommutes with M, which is in the stabilizer of |ψ⟩. Therefore, N|ψ⟩ has eigenvalue −1 for M, and in particular is orthogonal to |ψ⟩. It follows that N ∉ S(T(S)), which proves the proposition.
Chapter 4
In chapter 3, we saw the formalism of stabilizer codes, but we didn’t see how to find new stabilizer codes.
Indeed, finding new codes is a tricky topic. Already in the theory of classical error correction, it is quite
difficult to find good new codes. Luckily, there are a few ways of taking already-discovered classical codes
and turning them into quantum codes. That is not the subject of this chapter.
This chapter is instead about the theory of classical error-correcting codes. Naturally, I won’t have time
to go into complete detail on classical error correction, so I will focus on the particular points of the theory of
classical error correction which have most relevance to QECCs. One purpose is to give you the background
for chapter 5, which is about converting classical codes into quantum codes. The other reason is that it
can give you a deeper understanding of the theory of QECCs and of stabilizer codes, since there are many
parallels — and some critical differences — between the theories of classical and quantum error correction.
For those who are familiar with the theory of classical error correction, this chapter is likely to be
somewhat boring, but I hope it won’t be a complete waste of time. In particular, I will try to point out the
parallels between classical coding theory and those aspects of quantum coding theory that we’ve seen so far.
Some you may have already noticed yourself, but perhaps there are some parallels you missed.
A classical error-correcting code encoding K possible states into N physical states consists of an encoder e : [1 . . . K] → [1 . . . N] together with a decoder d such that, for every error E in the set E of correctable errors,

    d(E(e(x))) = x.    (4.1)

The map d is the decoder for the code and e is the encoder for the code. Frequently, the code is just referred to as C, the image of the encoder e in [1 . . . N].
I have written this to be completely analogous to the definition I gave for a QECC. The only real difference
is that quantum states live in a Hilbert space, and maps between them should be quantum operations, or at
least linear maps, whereas classically, we can consider states from any set and arbitrary functions between
the sets as the encoder, decoder, and errors. But remember that classical codes are actually just a special
case of quantum codes. The apparent extra constraint of linearity is really an extra freedom to include
superpositions of basis states, whereas classical codes can only contain basis states.
[Figure: the eight 3-bit strings, with arrows labeled "1st", "2nd", "3rd" (which bit was flipped) or "none", showing the strings reachable from the codewords 000 and 111 by errors on a single bit.]
Figure 4.1: Codewords of the repetition code with errors on a single bit never get confused.
Ultimately, a classical error-correcting code is just a set of objects (frequently bit strings), and the errors move those objects around. The goal of the decoder is to get back to the original object. We have an uncorrectable error when two different logical objects x and y get confused: E(e(x)) = F(e(y)). If that never happens, a decoder exists: the sets S_x = {E(e(x)) | E ∈ E} are pairwise disjoint, so we can define a function mapping all elements of S_x to x unambiguously.
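As a quick illustration (all names are my own), the following Python snippet verifies this disjointness condition for the 3-bit repetition code against single bit flip errors.

    # A decoder exists iff the sets S_x = {E(e(x))} are pairwise disjoint.
    from itertools import combinations

    def encode(x):                       # repetition encoder
        return (x, x, x)

    def apply_error(word, i):            # flip bit i; i = None means no error
        if i is None:
            return word
        return tuple(b ^ (1 if j == i else 0) for j, b in enumerate(word))

    errors = [None, 0, 1, 2]
    S = {x: {apply_error(encode(x), i) for i in errors} for x in (0, 1)}
    disjoint = all(S[x].isdisjoint(S[y]) for x, y in combinations(S, 2))
    print(disjoint)                      # -> True: a decoder exists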
This condition has no precise analogy for quantum codes. In some sense equation (2.58) is similar, if we
interpret the correct quantum analogue of "different" as "orthogonal." Equation (2.58) says that orthogonal quantum states and different errors must produce orthogonal states, but for QECCs, that is only a sufficient condition, not a necessary one. From that point of view, QECCs seem somewhat more generous than classical error-correcting codes, since they don't require that it is possible to exactly determine the error. On the other hand, if you did exercise ??, you saw a condition which can be broken up into two parts. When i ≠ j, it says that we should never confuse different quantum states under an error; that seems analogous to the
condition for classical error correction. But then there is an additional non-trivial condition for a QECC
when i = j, which makes the quantum code seem more stringent than the classical one. The extra condition
can be interpreted as saying that the environment should not learn anything about the logical subspace for
a QECC. Classically, of course, it is harmless if the environment simply learns the encoded data, provided
it doesn’t change it. In the quantum case, learning about the data necessarily involves changing it.
The classical definitions of weight and support are just the same as the quantum ones: the error E acts trivially on all but the coordinates in the support of E, and the weight of E is the size of the support. When we have a classical channel which causes errors independently on each coordinate with a small probability p, then the probability of having errors of weight t + 1 or greater is O(p^{t+1}), much as in the quantum case. If we can find a code that simply corrects all errors of weight t or less, then with high probability, we'll get the correct state when we decode.
A classical code with distance d encoding K possible states in n registers of size q is an (n, K, d)_q code. When q = 2, we just write (n, K, d).
In other words, the distance is the minimum number of coordinates that have to be changed to get from
one codeword to another. The quantum distance can be understood in the same way, but there are more
subtleties in the quantum case.
As with quantum codes, the distance tells us how many errors we can correct: a code with distance d = 2t + 1 can correct any t errors.

A binary linear code is a code whose codewords are closed under addition; a linear code is thus a linear subspace of Z_2^n. We can choose a basis x_1, . . . , x_k for the code, and all other vectors in the code will be linear combinations of the x_i's. Since x + x = 0 ∀ x ∈ Z_2^n, the only question is which subset of the basis vectors are added together to make a codeword. There are thus 2^k codewords in total and k encoded bits.
Definition 4.4. The generator matrix G_C of a linear code C is a matrix with row i equal to x_i.

Proposition 4.2. Let C ⊆ Z_2^n be a linear code with k encoded bits and generator matrix G. Then the linear map v ∈ Z_2^k ↦ G^T v ∈ Z_2^n is the encoder for C. In other words, x is a codeword of C iff x = G^T v for some v ∈ Z_2^k.
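As a minimal illustration (with my own names), here is proposition 4.2's encoder applied to the 3-bit repetition code.

    # Encoding is v -> G^T v over Z_2.
    G = [(1, 1, 1)]                      # generator matrix, one row

    def encode(v):
        # Physical bit j is sum_i G[i][j] * v[i] mod 2.
        return tuple(sum(G[i][j] * v[i] for i in range(len(G))) % 2
                     for j in range(len(G[0])))

    print(encode((0,)), encode((1,)))    # -> (0, 0, 0) (1, 1, 1)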
In order to check for errors, we measure the parities of bits in the codeword.
Definition 4.5. Suppose the linear code C has generator G. A parity check matrix H_C for C is a matrix with row i equal to y_i ∈ Z_2^n, where G y_i = 0 ∀ i, and the set {y_i} is a maximal linearly independent set with this property.
The parity check matrix of a code is not unique, but it is still frequently referred to as “the” parity check
matrix. You can take any linear combinations of rows of the parity check matrix, and provided the new set
of rows remains linearly independent, it will still function as a parity check matrix.
Theorem 4.3. If C has k encoded bits and n physical bits, then G_C is a k × n matrix and H_C is an (n − k) × n matrix. G_C H_C^T = 0 and H_C G_C^T = 0.

Proof. The only property which is not completely trivial is that H_C has n − k rows. The constraints G_C y_i = 0 form a set of k non-singular linear equations on n bits, so the solution space has dimension n − k. Thus, a maximal set of {y_i} will consist of n − k of them.
Using the linear structure of Zn2 , we can represent a set of bit flip errors as a vector e 2 Zn2 , with 1 for
those bits which have been flipped and 0 for those bits which have not been flipped. Under this convention,
wt e is the weight of the error. If we start with bit string x and the error e occurs, we now have the bit string
x + e.
The virtue of the parity check matrix is that it filters out the codewords and just tells us about the errors.
Suppose x 2 C undergoes error e. Then, using linearity and proposition 4.2,
    H_C (x + e) = H_C x + H_C e = H_C G_C^T v + H_C e = H_C e.    (4.3)
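A quick numerical check of equation (4.3), again with illustrative names:

    # The parity check matrix annihilates codewords, so H(x + e) = He.
    H = [(1, 1, 0), (0, 1, 1)]           # checks of the 3-bit repetition code

    def matvec(H, v):
        return tuple(sum(h[i] * v[i] for i in range(len(v))) % 2 for h in H)

    x = (1, 1, 1)                        # a codeword
    e = (0, 1, 0)                        # a single bit flip
    y = tuple(a ^ b for a, b in zip(x, e))
    print(matvec(H, y), matvec(H, e))    # -> (1, 1) (1, 1): syndrome sees only e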
All of this probably sounds a bit familiar, since stabilizer codes and classical linear codes are closely
related. The stabilizer of a stabilizer code is very analogous to the parity check matrix for a linear code,
although the stabilizer also has some vague similarity to the generator matrix (in that the projector on the
code space is formed from the stabilizer). In fact, linear codes are a special case of stabilizer codes, and the
parity check matrix can be realized as the stabilizer:
Theorem 4.4. Let C be a linear code with parity check matrix H, and let C correct the set of errors E ⊆ Z_2^n. Define a stabilizer code S to have stabilizer with binary symplectic representation (0|H). Then S corrects the set of errors {(e|0) | e ∈ E} and for any x ∈ Z_2^n, x ∈ C iff |x⟩ ∈ T(S).
In other words, we form a stabilizer by replacing the 1s in H with Zs, with each row of the parity check
matrix becoming a generator of the stabilizer. The resulting stabilizer code corrects the same errors as C
and has the same basis codewords. The one difference is that the stabilizer code encodes a quantum state, so superpositions such as |x_1⟩ + |x_2⟩ are also codewords of S, even though the combination is meaningless
when considering a classical code. The proof of the theorem is straightforward, and is left as an exercise.
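The conversion itself is a one-liner; here is an illustrative Python helper in the spirit of theorem 4.4.

    # Each row of a parity check matrix becomes a Z-type stabilizer generator.
    def row_to_z_generator(row):
        return "".join("Z" if bit else "I" for bit in row)

    H = [(1, 1, 0), (0, 1, 1)]                 # repetition-code parity checks
    print([row_to_z_generator(r) for r in H])  # -> ['ZZI', 'IZZ']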
The non-uniqueness of the parity check matrix is just the same as the non-uniqueness of the generators
of a stabilizer. Replacing a row of the parity check matrix with a linear combination of rows is equivalent to
replacing a generator of the stabilizer with a product of generators.
The best analogy for the rows of the generator matrix is the logical Paulis in N(S)/S. For a stabilizer code, the coset representatives are non-unique, which is not an issue for classical codes. It is true that the generator matrix is not completely unique — we can take a different encoder, which corresponds to a different generator matrix. However, this is a different phenomenon, and corresponds to labeling the cosets in N(S)/S with different logical Paulis rather than choosing different representatives of them.
Theorem 4.5. The distance of a linear code C is the minimum weight of a non-zero codeword: d = min{wt e | e ∈ C, e ≠ 0}.

Proof. The general definition of distance is min{wt e | x ≠ y ∈ C, x + e = y} (with the error rewritten from definition 4.2 to take advantage of linearity). But when x, y ∈ C, e = x + y is also in C since C is linear. Any e ∈ C can be realized this way by simply choosing any x ∈ C and letting y = x + e, which will automatically be in C, again since C is linear. x ≠ y is equivalent to e ≠ 0.
For a linear code, the codewords themselves are the undetectable errors: adding a non-trivial codeword
to another codeword results in a new codeword, and if you add a non-codeword to a codeword, you always
get a non-codeword. For a set of errors to be correctable, any pair of errors must not add up to a codeword:
Theorem 4.6. A linear code C corrects the error set E iff e + f ∉ C for all e ≠ f ∈ E. Equivalently, H_C e ≠ H_C f ∀ e ≠ f ∈ E (i.e., all errors in E have different error syndromes).
Proof. If all errors in E have different error syndromes, then the code certainly can correct E using the following procedure: Given an erroneous codeword x + e, apply the parity check matrix to get H_C e. Since the error syndromes are unique, we can identify e and then recover x = (x + e) + e.

If e + f ∈ C then H_C(e + f) = H_C e + H_C f = 0. Conversely, if e + f ∉ C, then H_C(e + f) ≠ 0. This is true because the parity check matrix is formed from a maximal set of vectors annihilated by the generator matrix. When e + f ∉ C, then the set of solutions of G_C y_i = 0 and of (e + f) · y_i = 0 is only (n − k − 1)-dimensional, whereas the parity check matrix has n − k rows. Thus, e + f ∉ C iff H_C e ≠ H_C f.

Finally, if e + f ∈ C for some pair e ≠ f ∈ E, then the code cannot correct E. For instance, given x ∈ C, let y = x + (e + f), which will also be in C. Then x + e = y + f, so if this string shows up, there is no way to tell whether it should be decoded to x (with error e) or y (with error f).
The combination e + f is reminiscent of the combination E † F that appears in theorem 3.5 giving the set
of correctable errors for a stabilizer code. Comparing further, you can see that for a classical code, C \ {0}
plays the role of N̂(S) \ Ŝ for a stabilizer code. Indeed, if you apply the transformation theorem 4.4 to code
C to get stabilizer S, and then calculate N̂(S) \ Ŝ, you find that it includes the conversion of C \ {0}. That’s
not all it contains, but it is a good exercise to work this out yourself.
Importantly, there is no good classical analogue to the concept of a degenerate quantum code. All
classical codes are non-degenerate, and the presence of degenerate quantum codes complicates quantum
coding theory.
For instance, the 3-bit repetition code has generator matrix

    G = ( 1 1 1 ),    (4.4)
Definition 4.8. The [2^r − 1, 2^r − r − 1, 3] Hamming code is the code whose r × (2^r − 1) parity check matrix has as columns all possible non-zero r-bit strings.

Notice that here we have defined the Hamming code by choosing a parity check matrix, much as for a stabilizer code, we choose the stabilizer to define the code. In this case, you can derive the set of actual codewords as {x | Hx = 0}. There are r linear constraints on 2^r − 1 bits, so the linear space of solutions is (2^r − r − 1)-dimensional, giving the number of encoded bits in the definition.
As a concrete example, when r = 3, we get a [7, 4, 3] code. It has parity check matrix

    ⎛ 1 1 1 1 0 0 0 ⎞
    ⎜ 1 1 0 0 1 1 0 ⎟ .    (4.6)
    ⎝ 1 0 1 0 1 0 1 ⎠
If we extend the code by adding an eighth bit equal to the parity of the other seven, every codeword has weight 0, 4, or 8. This is an example of a Reed-Muller code, specifically R(1, 3). It is an [8, 4, 4] code.
Definition 4.9. The 1st order Reed-Muller code R(1, m) is a linear code with n = 2^m physical bits. It has a generator matrix whose rows are the all-1s vector and the vectors v_i (i = 1, . . . , m), where the jth coordinate of v_i is equal to the ith bit of j, when j is expanded in binary (and assuming j runs from 0 to n − 1).
For instance, for n = 4, v_1 = (0011) and v_2 = (0101). The generator matrix is

    ⎛ 1 1 1 1 ⎞
    ⎜ 0 0 1 1 ⎟ .    (4.9)
    ⎝ 0 1 0 1 ⎠

It is a [4, 3, 2] code. Note that in equation (4.8), the order of the bits is reversed relative to this convention.
Theorem 4.7. R(1, m) is a [2^m, m + 1, 2^{m−1}] code.
Proof. Each of the vectors v_i has weight exactly n/2 (n = 2^m), as exactly half the numbers 0, . . . , n − 1 will be 1 in any given bit location. We can also easily show that any sum of s > 0 of the v_i's will have weight exactly n/2: The sum will be 0 or 1 in the jth location iff the XOR of the corresponding bits of j is 0 or 1. For instance, for v_1 + v_3, take the XOR of the first and third bits of j. We then let j run over 0, . . . , n − 1. As we do this, the set of bits we are looking at can take on every possible set of values, and furthermore, each set of values appears the same number of times, corresponding to every set of values for the other bits. In particular, any particular assignment of the s bits we are interested in shows up 2^{m−s} times. For half of these assignments, the XOR will be 0, and for the other half, the XOR will be 1. Thus the weight of the sum we are looking at is exactly n/2.
The all-1s vector has weight n, and the all-1s vector added to any vector of weight n/2 is again a vector of weight n/2. Thus, the code has distance n/2 = 2^{m−1}.

This argument also shows that all the rows of the generator matrix are independent, since no non-trivial linear combination gives 0. Therefore, the code has m + 1 encoded bits, the same as the number of rows.
Definition 4.10. The rth order Reed-Muller code R(r, m) has as rows of its generator matrix all products of up to r of the vectors v_i given in definition 4.9, where product means the bitwise product (1 in a coordinate iff all vectors in the product are 1 at that coordinate). The all-1s generator is also included. (It can be considered as the product of 0 of the v_i's.)
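Here is a short Python sketch (names mine, following the bit convention of definition 4.9) that builds the generator matrix of R(r, m).

    # Build the Reed-Muller generator matrix: all products of up to r of the
    # v_i vectors, plus the all-1s row.
    from itertools import combinations

    def reed_muller_generator(r, m):
        n = 2**m
        # v[i][j] = ith bit of j (most significant first), i = 1, ..., m
        v = {i: [(j >> (m - i)) & 1 for j in range(n)]
             for i in range(1, m + 1)}
        rows = [[1] * n]                      # the all-1s vector
        for size in range(1, r + 1):
            for subset in combinations(range(1, m + 1), size):
                row = [1] * n
                for i in subset:              # bitwise product of the v_i's
                    row = [a & b for a, b in zip(row, v[i])]
                rows.append(row)
        return rows

    for row in reed_muller_generator(1, 2):
        print(row)   # -> [1,1,1,1], [0,0,1,1], [0,1,0,1], matching (4.9)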
For instance, R(2, 2) has the additional generator v_1 v_2 = (0001), giving the generator matrix

    ⎛ 1 1 1 1 ⎞
    ⎜ 0 0 1 1 ⎟ .    (4.10)
    ⎜ 0 1 0 1 ⎟
    ⎝ 0 0 0 1 ⎠
R(2, 2) is a [4, 4, 1] code, which actually means it contains all 4-bit vectors. R(2, 3) is more interesting, with generator matrix

    ⎛ 1 1 1 1 1 1 1 1 ⎞
    ⎜ 0 0 0 0 1 1 1 1 ⎟
    ⎜ 0 0 1 1 0 0 1 1 ⎟
    ⎜ 0 1 0 1 0 1 0 1 ⎟ .    (4.11)
    ⎜ 0 0 0 0 0 0 1 1 ⎟
    ⎜ 0 0 0 0 0 1 0 1 ⎟
    ⎝ 0 0 0 1 0 0 0 1 ⎠

R(2, 3) is an [8, 7, 2] code.
Theorem 4.8. R(r, m) is a [2^m, N(r, m), 2^{m−r}] code, where N(r, m) = (m choose 0) + (m choose 1) + · · · + (m choose r).

Proof. We must show that the code has N(r, m) encoded bits and has distance 2^{m−r}.
Number of encoded bits: R(r, m) is spanned by a product of vectors for each subset of up to r
elements of the numbers 1, . . . , m. Counting them gives the formula for N (r, m). However, we do need to
show that these are linearly independent to see that this is also the number of encoded bits.
Consider the extreme case of R(m, m). The generator matrix for R(r, m) is contained in the top rows of that for R(m, m), so it is enough to show that the generator matrix for R(m, m) has full rank. N(m, m) = 2^m = n, so the rows are linearly independent iff they span the whole vector space.
We can imagine arbitrary vectors on 2^m bits as functions from m bits to one bit, as follows: for the jth bit of the vector, interpret j as the input of the function, with the output of the function given by the jth bit. (For instance, the vector 0110 is the function f(00) = 0, f(01) = 1, f(10) = 1, f(11) = 0.) In this interpretation, v_i is the function "take the ith bit of the input."
The product of multiple vectors v_{i_1}, . . . , v_{i_s} is the function "take the AND of bits i_1, . . . , i_s of the input," and the sum of products is the XOR of these ANDs. We can write an arbitrary function from m bits to 1 bit as the XOR of ANDs of up to all m bits, so all possible vectors are in the code R(m, m). Therefore all products of the v_i's are linearly independent.
Distance: It is easy to see that the product of any r vectors v_i has weight n/2^r = 2^{m−r}, as the product is 1 in the jth location iff the AND of the corresponding bits of j is 1, which happens for only a fraction 1/2^r of the possible values of j. We also have to check the distance, however, for the sums of these products, and it is not clear that taking the sum cannot cause the weight to decrease.
To show that the distance of R(r, m) is indeed 2^{m−r}, we perform induction on r and m. We have already calculated the distance for R(1, m) in theorem 4.7. Since m ≥ r, we also want as a base case to show it for a given r for the smallest possible value of m, namely R(r, r). In this case, there is nothing really to show, as the distance from the formula is 1, which is indeed the weight of the product of all r v_i vectors, and there is no possibility of having a smaller distance.
Claim 4.9. Assume the distance of R(r − 1, m) is 2^{m−r+1} and the distance of R(r, m) is 2^{m−r}. Then the distance of R(r, m + 1) is 2^{m+1−r}.
Proof of claim. We can consider any given sum of basis vectors, and break it up into a term where none of the products includes the vector v_1 and a term where all of the products include v_1. Now, if we restrict attention to the first 2^m coordinates, the second term is uniformly 0 and the first term is a vector from R(r, m), which we already know has weight at least 2^{m−r} unless it is the 0 vector.

If we restrict attention to the last 2^m coordinates, v_1 is always 1, so it can be ignored, and the second term is the sum of products of at most r − 1 vectors, and is thus a vector from R(r − 1, m). The first term is the same on the last 2^m coordinates as it was on the first 2^m coordinates, and is again a vector from R(r, m). But R(r − 1, m) ⊆ R(r, m), so the sum of a term from R(r − 1, m) and a term from R(r, m) is in R(r, m), and therefore the last 2^m coordinates have weight at least 2^{m−r} by the inductive hypothesis unless the last 2^m coordinates are all 0.

If the first term is the 0 vector, we actually have a vector from R(r − 1, m) on the last 2^m coordinates, which therefore has weight at least 2^{m−r+1} unless it is 0. To get 0 on the last 2^m coordinates, either both terms are 0 (in which case the whole vector is 0), or the first and second terms must cancel on the last 2^m coordinates. In order to cancel the second term, the first term must actually be a vector from R(r − 1, m) on each half of the coordinates, which again means the vector on each half of the coordinates has weight at least 2^{m−r+1}.
That is, we have three cases (assuming the overall vector is not 0): In case one, the first term is 0, in which case the last 2^m coordinates have weight at least 2^{m−r+1}. In case two, the first and last 2^m coordinates will each have weight at least 2^{m−r}. In case three, the last 2^m coordinates have weight 0, but the first 2^m coordinates have weight at least 2^{m−r+1}. In all of these cases, we know that the overall vector has weight at least 2^{m+1−r}, completing the induction for the distance.
If we have proven the formula for distance for r − 1 and all m, then we can use induction on m. The base case is R(r, r) and we use the claim to prove the distance formula for this specific value of r and all m. This then allows us to use induction on r and the base case of R(1, m) to prove the distance formula for all values of r and m.
4.3.2 Dual Codes for the Examples
For the 7-bit Hamming code, we’ve already worked out the generator and parity check matrices. Looking
at the 8 vectors in the span of the rows of the parity check matrix of the 7-bit Hamming code, we see that
all the nonzero vectors in the dual code have weight 4. The dual code of the [7, 4, 3] code is thus a [7, 3, 4] code. We've already seen that the 7-bit Hamming code is related to a Reed-Muller code. The dual is too — if we take R(1, 3) and drop the all-1s vector, one bit is always zero. If we then discard that bit, we get the dual of the 7-bit Hamming code.
This is true in general:
Theorem 4.10. Take R(1, m), remove the all-1s vector, and then puncture it: drop the bit that is always 0. The resulting [2^m − 1, m, 2^{m−1}] code is the dual of the [2^m − 1, 2^m − m − 1, 3] Hamming code.
Proof. First, let us check the parameters of the punctured Reed-Muller code. Dropping the all-1s vector removes one encoded bit, leaving m logical bits. Removing codewords does not decrease the distance, and since the remaining vectors all have weight 2^{m−1}, it does not increase it in this case either. Dropping a bit that is always 0 also does not change the distance, giving the parameters [2^m − 1, m, 2^{m−1}].
We are left with a generator matrix which has m rows v_1, . . . , v_m. The jth bit of v_i is the ith bit of the binary representation of j, so the jth column of the generator matrix is exactly the binary representation of j. This was how we constructed the parity check matrix of the [2^m − 1, 2^m − m − 1, 3] code, which is the generator of the dual.
Naturally, this also means that the duals of the Reed-Muller codes R(1, m) are related to the Hamming
codes. In the special case of R(1, 3), it was related both to the 7-bit Hamming code and to its dual. That
is because R(1, 3) is a self-dual code. Most Reed-Muller codes are not self-dual, but R(1, 3) is not the only
one that is. More importantly, the dual of every Reed-Muller code is another Reed-Muller code.
Theorem 4.11. The dual of R(r, m) is R(m − r − 1, m) (r ≤ m − 1).
Proof. Let us take the dot product of two basis vectors w ∈ R(r, m) and w′ ∈ R(r′, m). The dot product is the parity of the pointwise product ww′. But w is the pointwise product of up to r of the v_i vectors, and w′ is the pointwise product of up to r′ of the v_i vectors, so ww′ is the pointwise product of up to r + r′ of the v_i vectors. Suppose we eliminate redundant v_i's that appear twice in the product, leaving us with s ≤ r + r′ vectors v_i that appear in at least one of the two products w and w′. We already know from theorem 4.8 that such a product has weight exactly 2^{m−s}. Therefore the dot product of w and w′ (the parity of the pointwise product) is 0 unless s = m, which is only possible if r + r′ ≥ m. Therefore, R(m − r − 1, m) is orthogonal to R(r, m), and is contained in its dual.
Now, R(r, m) encodes N(r, m) = (m choose 0) + (m choose 1) + · · · + (m choose r) bits by theorem 4.8, so its dual encodes 2^m − N(r, m) bits. But R(m − r − 1, m) encodes

    (m choose 0) + (m choose 1) + · · · + (m choose m−r−1) = (m choose m) + (m choose m−1) + · · · + (m choose r+1)    (4.14)

bits. Therefore, N(r, m) + N(m − r − 1, m) = 2^m, and R(m − r − 1, m) is not only contained in the dual of R(r, m), it is the same size as the dual, and therefore equals the dual.
Corollary 4.12. When m = 2r + 1, the Reed-Muller code R(r, m) is self-dual. When m ≥ 2r + 1, R(r, m) is weakly self-dual.
4.4.1 Linear Codes Over Finite Fields
When considering the non-binary generalization of linear codes, we usually assume that N = q^n, where q = p^m is a prime power. We want q to be a prime power because then there is a finite field GF(q) of size q, and we can consider [1 . . . N] to be a vector space of dimension n over GF(q). (See appendix C for an introduction to finite fields.)
For binary linear codes, we only needed to worry about adding together codewords, but for non-binary
linear codes, multiplication by scalars from the field GF(q) must also keep us within the code. If adding
codewords gives a codeword but multiplication by scalars does not necessarily do so, then the code is merely
additive. For bits, additive implies linear because the only scalars are 0 and 1, but for larger fields, there are
more options.
We can again consider the possible errors to be vectors in GF(q)^n. If an error e acts on a string x, the resulting state is x + e, as before. Bear in mind, however, that there are q − 1 different errors that can affect each register.
Non-binary linear codes have generator and parity check matrices defined in the same way as for binary codes, and the distance is also defined in the same way. We use the notation [n, k, d]_q for a non-binary linear code with n physical registers, each of size q. The only difference in the basic properties of the code is that we now need to be careful of the distinction between addition and subtraction, which are the same for a binary code. In particular,
Theorem 4.13. A non-binary linear code C corrects the error set E iff e − f ∉ C for all e ≠ f ∈ E. Equivalently, H_C e ≠ H_C f ∀ e ≠ f ∈ E (i.e., all errors in E have different error syndromes).
We can define the dual code for a non-binary code in just the same way as for a binary code using the
dot product for a vector space over GF(q).
Notice that we don't include columns (0, ω) or (0, ω²) since they are scalar multiples of (0, 1), but that we include all columns of the form (1, α), since no two of them are scalar multiples. We don't, however, then need to include any columns (ω, α) or (ω², α).
Following this logic, we find the non-binary Hamming codes:
Theorem 4.14. There exists a [(q^r − 1)/(q − 1), (q^r − 1)/(q − 1) − r, 3]_q code for any prime power q and any r > 1. These codes are known as Hamming codes.
Proof. There are r rows in the parity check matrix. We wish to choose (q^r − 1)/(q − 1) columns so that none is a scalar multiple of another. Note that

    (q^r − 1)/(q − 1) = 1 + q + · · · + q^{r−1}.    (4.16)
We will proceed by induction on r, the number of rows in the parity check matrix. If r = 1, we just have
the parity check matrix (1), which gives a [1, 0, 1]q code. It is a special case.
Next, we assume that we have a parity check matrix for r − 1. For the r-row matrix, we can choose the first column to be all 0s except for the last row, which is 1. Then any column for which the first r − 1 entries are 0 will be a scalar multiple of this column. The remaining columns are of the form (v, α), where v is a column of the parity check matrix of the Hamming code for q and r − 1, and α is any element of GF(q). When v ≠ v′ are two different columns of the (r − 1)-row Hamming parity check matrix, they are not scalar multiples of each other, so it's certainly true that (v, α) and (v′, α′) are not scalar multiples. Thus, we don't get columns which are scalar multiples when they have different entries in the first r − 1 rows. We should also compare (v, α) with (v, α′), but since the first r − 1 entries are not all 0, the only possible scalar factor between them is 1, which implies that α = α′. Thus, this scheme gives independent columns, as desired. We can pick (q^{r−1} − 1)/(q − 1) columns from the (r − 1)-row Hamming code, and q entries for the last row in the column, plus we have the first column (0, 0, . . . , 0, 1). The total number of columns is thus

    1 + q[(q^{r−1} − 1)/(q − 1)] = 1 + q(1 + q + · · · + q^{r−2}) = 1 + q + · · · + q^{r−1} = (q^r − 1)/(q − 1),    (4.17)

as desired.
4.5 Hamming, Gilbert-Varshamov, and Singleton Bounds, MDS Codes
To conclude the discussion of classical codes, we’ll discuss some basic limits on the existence of classical codes.
I won’t get to the quantum analogue of these bounds until chapter 7, but the bounds help to understand the
example codes that we’ve discussed. As you’ll see, the Hamming codes and Reed-Solomon codes are optimal
codes, giving a maximal k and d for minimum n.
An (n, K, 2t + 1)_q code must satisfy the Hamming bound,

    K Σ_{j=0}^{t} (q − 1)^j (n choose j) ≤ q^n.    (4.20)

For large n,

    (log_q K)/n ≤ 1 − (t/n) log_q(q − 1) − h_q(t/n),    (4.21)

where

    h_q(x) = −x log_q x − (1 − x) log_q(1 − x).    (4.22)
Proof. Let x be a vector in the code C, and let S_x be the set of all vectors at distance at most t from x. (I.e., all strings that can be reached from x by altering up to t registers.) An error can take x to any string in S_x, so we need that S_x ∩ S_y = ∅ when x ≠ y. Otherwise, z ∈ S_x ∩ S_y can't be reliably decoded since it could have come from either x or y before the error.

Let us count the size of S_x. We break S_x into subsets S_{x,j} which disagree with x on exactly j ≤ t registers. S_{x,j} breaks down further into subsets which depend on which j registers disagree. There are (n choose j) possible sets of registers which disagree, and for a fixed set of registers, the strings in S_{x,j} can take on any value except for the values in x. There are q − 1 remaining choices for each of the j registers which disagree. Thus, the total size of S_{x,j} is (q − 1)^j (n choose j) and the total size of S_x is

    Σ_{j=0}^{t} (q − 1)^j (n choose j).    (4.23)

This is true for all x, and since S_x ∩ S_y = ∅, the total number of strings in the union of all sets S_x is K times equation (4.23). This must be at most the total number of possible n-register strings, which is q^n, giving us equation (4.20).
To find the large n version of this equation, just take the logarithm base q. The hq term comes from the
log of the binomial coefficient (lemma 4.17), and only the largest value of j = t contributes to the logarithm
for large n.
The formula for h_q(x) is a fairly standard information-theoretic term, but it is sufficiently useful that I'll give a proof of it:

Lemma 4.17. For large n, log_q (n choose j) ≈ n h_q(j/n), up to terms sublinear in n.
Proof.

    log (n choose j) = log(n!) − log(j!) − log[(n − j)!].    (4.25)

(Assume everywhere that the base of the logarithm is q.) Taking the logarithm of Stirling's formula and keeping only the terms at least linear in n, we have

    log(n!) ≈ n log n − n log e.    (4.26)

Then

    log (n choose j) = (n log n − n log e) − (j log j − j log e) − [(n − j) log(n − j) − (n − j) log e]    (4.27)
        = j log n + (n − j) log n − j log j − (n − j) log(n − j) − [n − j − (n − j)] log e    (4.28)
        = −j log(j/n) − (n − j) log[(n − j)/n].    (4.29)
The Hamming bound is also known as the sphere-packing bound. The name comes from the sets S_x, which can be viewed as "spheres" using the distance measure which counts the number of registers in which two strings differ. That metric is known as the Hamming distance. The spheres S_x are rather blocky for spheres, but that's what you get when you use a discrete distance.
The case when the Hamming bound is met exactly is somewhat interesting. First, it represents the best
possible code you can have for a given n and d = 2t + 1. Second, when it is a linear code, it means that
every error syndrome is used by an error of weight t or less.
Definition 4.14. A code which saturates the Hamming bound is perfect.
When t = 1, the condition for a perfect code is K[1 + (q − 1)n] = q^n. Let us specialize to K = q^k. Then an [n, k, 3]_q code is perfect if (q − 1)n = q^{n−k} − 1. Letting r = n − k, we find that n = (q^r − 1)/(q − 1), k = n − r. These are exactly the parameters of the Hamming codes. The Hamming codes were designed to use up every error syndrome, so it's not surprising they are perfect codes, but now you can see that these parameters are the only ones possible for distance 3 perfect codes.
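As a sanity check (illustrative code, not from the text), we can verify numerically that the binary Hamming codes meet the bound exactly.

    # The Hamming codes are perfect: 2^k times the volume of a radius-1
    # Hamming sphere exactly fills 2^n.
    from math import comb

    def sphere(n, t, q=2):
        return sum((q - 1)**j * comb(n, j) for j in range(t + 1))

    for r in (2, 3, 4):
        n = 2**r - 1
        k = n - r
        print(n, k, 2**k * sphere(n, 1) == 2**n)   # -> True for every r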
[Figure: rate k/n (vertical axis, 0 to 1) versus relative distance d/n (horizontal axis, 0 to 1).]
Figure 4.2: Classical Hamming bound (solid), Gilbert-Varshamov bound (dashed), and Singleton bound (dotted) for large n, q = 2.
The Gilbert-Varshamov bound says that an (n, K, d)_q code exists provided

    (K − 1) Σ_{j=0}^{d−1} (q − 1)^j (n choose j) < q^n.    (4.30)

Proof. We will pick the codewords sequentially, thus showing the theorem by induction on K.

For the first codeword, pick any string x_1. Then exclude the Hamming sphere of radius d − 1 around x_1, that is, the set S_{x_1} of all strings that are distance d − 1 or less from x_1. Choose the second codeword x_2 to be any string outside S_{x_1}. x_2 has distance at least d from x_1, so the code {x_1, x_2} has distance at least d.

In general, given any code {x_1, . . . , x_{K−1}} with distance at least d, exclude the radius d − 1 Hamming spheres S_{x_1}, . . . , S_{x_{K−1}}. Provided equation (4.30) is satisfied, there is at least one string not excluded. Choose it (at random if there is more than one) as the Kth codeword x_K. Then {x_1, . . . , x_K} also has distance at least d, since x_K is no closer than a Hamming distance d to any of the previous codewords.
The proof can be strengthened to say not just that there exist codes with the parameters (n, K, d)_q, but actually that a randomly chosen (n, K) code has distance at least d (with high probability for large n). The Gilbert-Varshamov bound can also be improved to show the existence of codes with certain properties. For instance, when K = 2^k, there exists a linear code provided n, K, and d satisfy equation (4.30).
The proof of the Gilbert-Varshamov bound is sometimes referred to as non-constructive. Of course, in a literal sense it is constructive, since if we try all codes with the given parameters, we will eventually find one that works. In this context, the term means that it is not efficiently constructive in at least one of two ways. It can mean that picking a code according to the algorithm implicitly given by the proof produces a code whose description is exponentially large in n. In this case, that is true, since K = exp(O(n)) and each codeword has to be listed separately, but if we use the version of the Gilbert-Varshamov bound applying to linear codes, there is at least an efficient description of the resulting code in terms of the generator matrix. It could also mean that the algorithm for finding the code takes exponentially long in n. That is true both for the general version and the linear code version of the Gilbert-Varshamov bound, since checking that a particular code has distance d takes exponentially long.
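The greedy construction in the proof is easy to simulate for small n; here is an illustrative Python sketch.

    # Gilbert-Varshamov-style greedy construction: accept any string at
    # distance >= d from all codewords chosen so far.
    from itertools import product

    def greedy_code(n, d):
        code = []
        for x in product((0, 1), repeat=n):
            if all(sum(a != b for a, b in zip(x, y)) >= d for y in code):
                code.append(x)
        return code

    C = greedy_code(5, 3)
    print(len(C))   # -> 4 codewords at pairwise distance >= 3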
Any (n, q^k, d)_q code satisfies the Singleton bound,

    n − k ≥ d − 1.    (4.32)

Proof. An (n, q^k, d)_q code corrects d − 1 erasure errors. For instance, if we discard the last d − 1 registers, the remaining n − (d − 1) registers can still be used to reconstruct all k logical registers. This is only possible if n − (d − 1) ≥ k since otherwise two different logical strings will be represented by the same string with n − (d − 1) components.
Codes which saturate the Singleton bound have interesting properties, and therefore get their own name.

Definition 4.15. If an (n, q^k, d)_q code satisfies n = k + d − 1, it is an MDS code.
“MDS” stands for “maximum distance separable.” You can probably figure out the “maximum distance”
part of the name. The “separable” part means that k coordinates of the codewords can be taken to be the
logical registers when no errors are present. This is not a property that is possible for a quantum code,
since it would violate the no-cloning theorem (if k qubits contain the logical state, then the remaining qubits
contain no information about it, and would not be useful for error correction). I will still use the term
“quantum MDS codes” for codes satisfying the analogous quantum Singleton bound.
Checking the parameters, you'll find that the Reed-Solomon codes are MDS codes. The repetition codes [n, 1, n] are too. The Hamming codes with r ≥ 3 are not — even though they are optimal for the Hamming bound, they do not saturate the Singleton bound.
Chapter 5
Now that we’ve covered the basics of both classical and quantum error-correcting codes, we’re ready to
try combining them. By borrowing some codes from the old theory of classical error-correcting codes,
we’ll be able to make brand new quantum error-correcting codes, thus marrying quantum and classical
error correction. But don’t be blue, I’m sure there are still plenty of interesting quantum codes left to be
discovered.
Definition 5.1. A stabilizer code is a CSS code if there is a choice of generators for which the stabilizer's binary symplectic representation is of the form

    ⎛ 0 | A ⎞
    ⎝ B | 0 ⎠ ,    (5.1)
where A is an r_1 × n matrix and B is an r_2 × n matrix for some r_1, r_2. The generators of the form (b|0) are X generators and the generators of the form (0|a) are Z generators.
In other words, some generators for a CSS code are tensor products of only X and I and some are tensor
products of only Z and I. This statement is of course dependent on the exact choice of generators. If you
pick a di↵erent set of generators by multiplying some of the X generators with some of the Z generators,
you will usually get generators which involve more than one of X, Y , and Z in the same operator. That
doesn’t mean the code is not a CSS code, only that you’ve picked a strange set of generators.
Z Z Z Z I I I
Z Z I I Z Z I
Z I Z I Z I Z
X X X X I I I
X X I I X X I
X I X I X I X
X X X X X X X
Z Z Z Z Z Z Z
Table 5.1: The stabilizer and logical Paulis for the 7-qubit code.
The CSS construction allows us to take two classical codes and make a quantum code. As an example,
let’s form a 7-qubit code from the 7-bit Hamming code discussed in section 4.2.3. The second code will also
be the 7-bit Hamming code. Take the three rows in its parity check matrix and convert the 1’s to Z’s, getting
three generators of the stabilizer. Then take the rows a second time and convert the 1’s to X’s, getting three
more generators. The resulting stabilizer is shown in table 5.1.
What is the distance of the 7-qubit code? Using the Z generators of the stabilizer, we can detect and
identify any single-qubit bit flip error, since those generators are derived from a classical code which can do
so. The first three bits of the error syndrome tell us where a bit flip error is. Using the same logic, the last
three bits tell us where any single-qubit phase error has occurred. If there is a Y error, or indeed an X error
on one qubit and a Z error on another, then the error will show up in both the first three bits and the last
three bits of the error syndrome, identifying it as an error combining X and Z. Thus, the code can correct
any single-qubit X, Y , or Z error, and has distance 3.
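Here is a short Python check (illustrative names) that builds the 7-qubit code's generators from the Hamming parity check matrix and verifies that the X and Z generators all commute.

    # Build the Steane code stabilizer from the [7,4,3] parity check matrix.
    H = [(1, 1, 1, 1, 0, 0, 0),
         (1, 1, 0, 0, 1, 1, 0),
         (1, 0, 1, 0, 1, 0, 1)]

    z_gens = ["".join("Z" if b else "I" for b in row) for row in H]
    x_gens = ["".join("X" if b else "I" for b in row) for row in H]

    def commute(p, q):
        # Pauli strings commute iff they differ on an even number of
        # positions where both are non-identity.
        return sum(a != b and a != "I" and b != "I"
                   for a, b in zip(p, q)) % 2 == 0

    print(all(commute(p, q) for p in z_gens for q in x_gens))   # -> True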
How many encoded qubits do we have? Using proposition 3.3, we have n = 7 physical qubits and 6
generators, so there should be 1 encoded qubit. But hold on a minute. In order to have any encoded qubits,
we can’t just take an arbitrary set of Paulis and call them the generators of a stabilizer. In particular, to
have a stabilizer, we need to check that the generators we’ve written down commute with each other. The
Z generators automatically commute with each other and the X generators commute with each other, so
all we need to check is that every Z generator commutes with each X generator. In the case of the 7-qubit
code, they do. Therefore, the 7-qubit code is well-defined as a [[7, 1, 3]] code. The 7-qubit code is also known
as the Steane code, since Steane first proposed it.
In the case of the 7-qubit code, we derived both the X and Z generators from the same code, the [7, 4, 3]
Hamming code. For a more general CSS code, we don’t need to do that. We can use C1 to correct bit flip
errors and C2 to correct phase errors.
Theorem 5.1. Let C_1 be an [n, k_1, d_1] classical linear code with parity check matrix H_1 and C_2 be an [n, k_2, d_2] classical linear code with parity check matrix H_2. Suppose C_1^⊥ ⊆ C_2. Let S be the CSS code with stabilizer

    ⎛ 0   | H_1 ⎞
    ⎝ H_2 | 0   ⎠ .    (5.2)

Then S is an [[n, k, d]] quantum code with k = k_1 + k_2 − n and d ≥ min{d_1, d_2}.
There are a couple of things to notice about the statement of the theorem. You might think that the condition C_1^⊥ ⊆ C_2 implies an asymmetry between C_1 and C_2, but that is not the case. Actually, the theorem treats them on an equal basis because C_1^⊥ ⊆ C_2 ⟺ C_2^⊥ ⊆ C_1. Also, observe that if n > k_1 + k_2, the theorem would predict a stabilizer code with a negative number of encoded qubits. That can't be right, and of course, it isn't. When n > k_1 + k_2, it is not possible that C_1^⊥ ⊆ C_2.
While the theorem is phrased as just one way to make a CSS code, by comparing the theorem and the definition of a CSS code, you can see that it is actually the only way to make a CSS code. In the future, I'll refer to the codes C_1 and C_2 of a general CSS code, indicating the classical codes that produce the Z generators and X generators, respectively. The choice of whether C_1 has the X generators or the Z generators is an arbitrary convention, and you may find the other choice in the literature. Also, sometimes people will use the name "C_1" when they mean what I am calling "C_1^⊥." Again, this is somewhat an arbitrary convention.
Proof. The main thing we need to check is that the X generators commute with the Z generators. Any X generator is of the form (x|0), where x ∈ C_2^⊥. It is in the dual since it is derived from a row of the parity check matrix of C_2. Any Z generator is derived from the parity check matrix of C_1, so it has the form (0|z), with z ∈ C_1^⊥. Then

    (x|0) ⊙ (0|z) = x · z,    (5.3)

with the usual binary inner product on the right. Thus, the stabilizer is Abelian iff x · z = 0 for all x ∈ C_2^⊥, z ∈ C_1^⊥. Equivalently, we could say that if z ∈ C_1^⊥, then z ∈ (C_2^⊥)^⊥. Since (C_2^⊥)^⊥ = C_2, that produces the condition C_1^⊥ ⊆ C_2.
Now let us determine the parameters of S. H_1 has n − k_1 rows and H_2 has n − k_2 rows, so the stabilizer has 2n − k_1 − k_2 generators. Therefore, it has n − (2n − k_1 − k_2) = k_1 + k_2 − n logical qubits. The code can detect up to d_1 − 1 bit flip errors using the Z generators, and it can detect up to d_2 − 1 phase errors using the X generators, and detecting bit flip errors does not in any way interfere with detecting phase errors. Any Pauli of weight less than min{d_1, d_2} can be written as a product PQ, with P a tensor product of X and I with weight < d_1 and Q a tensor product of Z and I with weight < d_2. The Pauli is non-trivial if at least one of P and Q is non-trivial, in which case it can be detected by looking at the appropriate bits of the error syndrome. Thus, the distance is at least min{d_1, d_2}.
From theorem 5.1, you can see why I made such a big deal about determining the duals of codes in chapter 4. The [7, 4, 3] Hamming code contains its own dual, which is why we can use two copies of it to make the 7-qubit code. We can get other CSS codes by using the other example classical codes. For instance, let r > m − r − 1. Then R(r, m)^⊥ = R(m − r − 1, m) ⊂ R(r, m), and we can make a CSS code with C_1 = C_2 = R(r, m). For instance, R(2, 4) is a [16, 11, 4] code, and using it for C_1 and C_2, we get a [[16, 6, 4]] CSS code.
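To check the condition C_1^⊥ ⊆ C_2 for this example numerically, here is an illustrative Python sketch; it repeats the Reed-Muller generator construction from chapter 4 and verifies that every generator of R(1, 4) = R(2, 4)^⊥ is orthogonal to every generator of R(2, 4), with the dimensions adding up (11 + 5 = 16).

    # CSS condition for the [[16, 6, 4]] code built from R(2, 4).
    from itertools import combinations

    def reed_muller_generator(r, m):
        n = 2**m
        v = {i: [(j >> (m - i)) & 1 for j in range(n)]
             for i in range(1, m + 1)}
        rows = [[1] * n]
        for size in range(1, r + 1):
            for subset in combinations(range(1, m + 1), size):
                row = [1] * n
                for i in subset:
                    row = [a & b for a, b in zip(row, v[i])]
                rows.append(row)
        return rows

    G2 = reed_muller_generator(2, 4)      # 11 generators of C = R(2, 4)
    G1 = reed_muller_generator(1, 4)      # 5 generators of R(1, 4) = C^perp
    ok = all(sum(a * b for a, b in zip(x, y)) % 2 == 0
             for x in G1 for y in G2)
    print(ok, len(G2), len(G1))           # -> True 11 5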
We don't have to use the same code twice. For instance, we can let C_1 = R(2, 4) and C_2 = R(3, 4), which is a [16, 15, 2] code. C_2^⊥ = R(0, 4), which is just the 16-bit repetition code. R(0, 4) ⊂ C_1, so we can make a CSS code, getting a [[16, 10, 2]] code. Of course, this particular construction is not ideal if we're interested in a distance 2 code, since by taking C_1 = C_2 = R(3, 4), we get a [[16, 14, 2]] code, which has the same distance but more encoded qubits. Still, it might be useful if we want a code that detects any single-qubit phase error but can actually correct a bit flip error.
C_2 contains strings such as (1, 1, 0, 0, 0, 0, 0, 0, 0) which have weight 2. Thus, C_2 has distance 2. (It is easy to check that any single-bit error will have a nonzero error syndrome.) If we just took the distance to be min{d_1, d_2}, we would think that the nine-qubit code had distance 2. However, the phase error formed from the short codeword above is Z_1 Z_2, which is in S. Similarly for all other weight-2 codewords of C_2. This is why the distance of the nine-qubit code can be 3.
Here Π_1 is the projector fixing the Z generators and Π_2 = (1/|C_2^⊥|) Σ_{x ∈ C_2^⊥} P_x is the corresponding projector built from the X generators, where P_x is the Pauli with binary symplectic representation (x|0). The Paulis of this form (for x ∈ C_2^⊥) are the stabilizer elements formed from products of the generators derived from the parity check matrix of C_2. Therefore, we can find the codewords of S by taking some codeword of C_1 (which is all that can pass Π_1)
and applying Π_2. In general, we get something of the form

    |u + C_2^⊥⟩ = Σ_{v ∈ C_2^⊥} |u + v⟩.    (5.6)
When do two strings u and u′ give the same state? Setting |u + C_2^⊥⟩ − |u′ + C_2^⊥⟩ = 0 is only possible if every term in the first sum is cancelled by a term in the second sum. That is, when v ∈ C_2^⊥, u + v = u′ + v′ for some v′ ∈ C_2^⊥. Thus, u − u′ = v′ − v ∈ C_2^⊥ since C_2^⊥ is linear. The states |u + C_2^⊥⟩ only depend on cosets of C_2^⊥ within C_1, thus explaining the notation I chose to represent the basis states.
C_1 has 2^{k_1} codewords and C_2^⊥ has 2^{n−k_2} codewords, so C_1/C_2^⊥ contains 2^{k_1+k_2−n} cosets, which is the same as the dimension of S. The codewords |u + C_2^⊥⟩ form a basis for the code space of the CSS code.
I claimed that in the CSS construction, the two classical codes used were treated equally, but in this
expansion, they certainly seem unequal. The solution lies in looking in the Hadamard rotated basis. The
Hadamard transform switches the role of X and Z, so you might expect that it switches the role of C1 and
C2 . If you expected that, you are correct. Let’s calculate it explicitly:
    H^{⊗n} |u + C_2^⊥⟩ = Σ_{v ∈ C_2^⊥} H^{⊗n} |u + v⟩    (5.8)
        = Σ_{v ∈ C_2^⊥} Σ_w (−1)^{(u+v)·w} |w⟩    (5.9)
        = Σ_w (−1)^{u·w} ( Σ_{v ∈ C_2^⊥} (−1)^{v·w} ) |w⟩    (5.10)
        = Σ_{w ∈ C_2} (−1)^{u·w} |w⟩    (5.11)
        = Σ_{x ∈ C_2/C_1^⊥} Σ_{y ∈ C_1^⊥} (−1)^{u·x} |x + y⟩    (5.12)
        = Σ_{x ∈ C_2/C_1^⊥} (−1)^{u·x} |x + C_1^⊥⟩    (5.13)
To get the fourth line, observe that if w ∈ C_2, then v · w = 0 always, but if w ∉ C_2, then v · w = 0 for half of the values of v and v · w = 1 for the other half of the values of v. (It must be nonzero for some v since w ∉ C_2. Call that v_0. Then if v ∈ C_2^⊥, so is v + v_0, but exactly one of v · w = 1 and (v + v_0) · w = 1. Thus we can pair the elements of C_2^⊥ such that within each pair one is orthogonal to w and one is not.)
In the next-to-last line, I have broken the sum over w into a sum over cosets of C_1^⊥ and a sum over elements of the cosets. This is a sensible thing to do because u · w only depends on which coset of C_1^⊥ in C_2 w lies in: If w′ = w + y, y ∈ C_1^⊥, then u · w′ = u · w + u · y, but u ∈ C_1, so u · y = 0.
In particular, if we have associated the basis codewords |u + C_2^⊥⟩ with logical basis codewords, then we recognize equation (5.13) as a Hadamard transform of the logical codewords, with the new basis codewords |x + C_1^⊥⟩. In the standard basis, the CSS code consists of superpositions over cosets of C_2^⊥ in C_1, while in the Hadamard-rotated basis, it consists of superpositions over cosets of C_1^⊥ in C_2.
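As a concrete check of this coset structure (illustrative code, not from the text), we can count the basis codewords of the 7-qubit code: one encoded qubit means exactly two cosets of C_2^⊥ inside C_1.

    # Count cosets of C2^perp (the span of H) inside C1 (the Hamming code).
    from itertools import product

    H = [(1, 1, 1, 1, 0, 0, 0),
         (1, 1, 0, 0, 1, 1, 0),
         (1, 0, 1, 0, 1, 0, 1)]

    def span(rows):
        out = set()
        for coeffs in product((0, 1), repeat=len(rows)):
            out.add(tuple(sum(c * r[i] for c, r in zip(coeffs, rows)) % 2
                          for i in range(7)))
        return out

    C2perp = span(H)
    C1 = {v for v in product((0, 1), repeat=7)
          if all(sum(a * b for a, b in zip(row, v)) % 2 == 0 for row in H)}
    cosets = {frozenset(tuple((u[i] + w[i]) % 2 for i in range(7))
                        for w in C2perp)
              for u in C1}
    print(len(C2perp), len(C1), len(cosets))   # -> 8 16 2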
 +  | 0   1   ω   ω²           ×  | 0   1   ω   ω²
 0  | 0   1   ω   ω²           0  | 0   0   0   0
 1  | 1   0   ω²  ω            1  | 0   1   ω   ω²
 ω  | ω   ω²  0   1            ω  | 0   ω   ω²  1
 ω² | ω²  ω   1   0            ω² | 0   ω²  1   ω
Table 5.2: Addition and multiplication in GF(4).
    I ↔ 0    X ↔ 1
    Z ↔ ω    Y ↔ ω²    (5.14)

For n qubits, the ith tensor factor of P ∈ P̂_n becomes the ith component of a vector over GF(4) using the rules in equation (5.14).
The conversion works much the same way as the conversion between P̂n and the binary symplectic
representation of the Pauli group. Multiplication in P̂n becomes addition in GF(4). The full correspondence
is summarized in table 5.3. It’s also worth recalling the addition and multiplication tables for GF(4), which
are given in table 5.2.
On a single register, the inner product that corresponds to c(P, Q) is a ∗ b = tr(a b̄), using the conjugation and trace functions on GF(4):

    0̄ = 0     1̄ = 1     tr 0 = 0    tr 1 = 0
    ω̄ = ω²    ω̄² = ω    tr ω = 1    tr ω² = 1    (5.15)

You can check by trying all combinations that this formula works. The generalization to n-dimensional vectors is then straightforward:

    a ∗ b = tr(a · b̄),    (5.16)

where · is just the usual dot product. In the classical coding literature, this inner product is sometimes known as the trace-Hermitian inner product.
In the Pauli group                  In GF(4)
I                                   0
X                                   1
Z                                   ω
Y                                   ω²
Multiplication                      Addition
c(P, Q)                             a ∗ b = tr(a · b̄)
Phase                               No equivalent
No equivalent                       Multiplication
Unitary T (see chapter 6)           Multiplication by ω
Stabilizer S                        Weakly self-dual additive code S
Normalizer N(S)                     Dual S^⊥ (under ∗)
Table 5.3: Equivalence between the Pauli group and the GF(4) representation.
1  ω  ω  1  0
0  1  ω  ω  1
1  0  1  ω  ω
ω  1  0  1  ω
Table 5.4: The stabilizer of the five-qubit code in the GF(4) representation.
This simplifies the procedure of checking for weakly self-dual codes because we can use the dual under the standard inner product and then take the conjugate rather than having to compute the dual under the unusual symplectic inner product ∗.
Theorem 5.4. The duals (with respect to the standard inner product) of the Hamming codes over GF(4) can be converted to stabilizer codes. The dual of the [(4^r − 1)/3, (4^r − 1)/3 − r, 3]_4 Hamming code becomes a [[(4^r − 1)/3, (4^r − 1)/3 − 2r, 3]] qubit stabilizer code.
Notice that the resulting stabilizer code encodes (4^r − 1)/3 − 2r qubits whereas the Hamming code encodes (4^r − 1)/3 − r GF(4) registers. This is a consequence of using a base of 2 instead of 4 to count registers, as mentioned in the discussion of theorem 5.2.
Proof. The Hamming codes are linear, so we need to show that a Hamming code contains its dual relative
to x · y. Looking at the construction of the non-binary Hamming codes in the proof of theorem 4.14, we see
that if we discard the last row of the parity check matrix, the first column is all 0, and then every other
column is repeated four times. That means that the inner product of any two of these rows will be zero,
since we end up adding the same thing four times.
Next, we should show that we also get 0 if we take the inner product of the last row with another row i. Since the first column of row i is 0, we can ignore the first column. The other columns break up into sets of four, within which row i has some value a repeated four times, while the last row runs over all the values in GF(4): 0, 1, ω, and ω². Whatever the value of a,

    a·0 + a·1 + a·ω + a·ω² = a(0 + 1 + ω + ω²) = 0.    (5.17)
That proves that the parity check matrix of the Hamming code can be converted into a stabilizer. The
number of encoded qubits follows from theorem 5.2.
The dual (with respect to x · ȳ) of the dual (with respect to the standard inner product) of a Hamming code is just the conjugate of the Hamming code, produced by replacing x with x̄ everywhere in the code. The conjugate of a code has the same distance as the code since taking the conjugate does not change the weight, and therefore, the stabilizer codes we have derived have distance at least 3. In fact, these codes are non-degenerate, so the distance is exactly 3.
We get codes with parameters [[5, 1, 3]], [[21, 15, 3]], [[85, 77, 3]], etc. The [[5, 1, 3]] code produced this way (shown in table 5.5) is equivalent to the [[5, 1, 3]] code we discussed before. Note that while the classical [5, 3, 3]_4 Hamming code has two rows in its parity check matrix, the quantum [[5, 1, 3]] code has four generators of its stabilizer. The vectors v and ωv are considered linearly dependent for a code over GF(4), so we only need
0 1 1 1 1        I X X X X
1 0 1 ω ω²       I Z Z Z Z
                 X I X Z Y
                 Z I Z Y X
Table 5.5: The parity check matrix of the five-bit GF(4) Hamming code and the five-qubit code derived from it.
to include one of them in the parity check matrix, but they convert to Paulis which are independent when
considered as operators on qubits, so we need to list them both in the stabilizer.
This family of quantum codes is interesting because, like the classical Hamming codes they are derived from, the codes in this family use up all of the error syndromes. The [[(4^r − 1)/3, (4^r − 1)/3 − 2r, 3]] code has 2r stabilizer generators, so 4^r error syndromes. There are 3n + 1 = 4^r zero- and one-qubit errors, and the code is non-degenerate, so each syndrome is used exactly once.
It is unclear what the correct definition of a perfect degenerate QECC should be, but a non-degenerate
quantum code is perfect if the number of correctable errors is exactly equal to the number of error syndromes.
Thus, the QECCs derived from the GF(4) Hamming codes are perfect qubit codes.
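A one-line numeric check (my own illustration) of this syndrome counting:

    # The number of zero- and one-qubit errors, 3n + 1, equals the number
    # of error syndromes, 4^r, for the GF(4)-Hamming-derived codes.
    for r in (2, 3, 4):
        n = (4**r - 1) // 3
        print(n, 3 * n + 1 == 4**r)   # -> 5 True, 21 True, 85 True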
Chapter 6
When dealing with stabilizer codes, it is helpful to restrict attention to a set of quantum gates that is guaranteed to treat the code nicely. There exists a unitary operation that will take any subspace (for instance, a quantum error-correcting code) into any other subspace of the same dimension. Sometimes that's exactly what you want. However, if you've taken the effort to work with a code with a nice tractable description in terms of its stabilizer, you don't want your work ruined by using a poorly-thought-out unitary. In general, quantum gates will map the code space of a stabilizer code into some other code, which might not be a stabilizer code.

The Clifford group is a group of unitary gates that is specifically chosen so that it does not do this. If you start with a stabilizer code and perform a Clifford group gate, you will always have another stabilizer code. The key to this is using only unitary operations which can be thought of as permutations of the Pauli group. A Clifford group operation then just switches one stabilizer into another. If $M|\psi\rangle = |\psi\rangle$ and $M' = UMU^\dagger$, then

$M' U|\psi\rangle = UMU^\dagger U|\psi\rangle = UM|\psi\rangle = U|\psi\rangle$, (6.1)

since $U$ is unitary. Running over all $M$ in the stabilizer, we find that $S' = \{UMU^\dagger \mid M \in S\}$ is a set of operators for which all states $U|\psi\rangle$ are $+1$ eigenstates (where $|\psi\rangle$ is any element of $T(S)$).
We'd like to say that $S'$ is the new stabilizer of the code, with code space $U(T(S))$. However, the catch is that without any additional constraint on $U$, $S'$ might contain many non-Paulis, and the stabilizer is supposed to be a subset of $\mathcal{P}_n$. In addition, the true stabilizer of the subspace $U(T(S))$ might contain additional Paulis that are not in $S'$. Furthermore, $U(T(S))$ might not be a stabilizer code at all: the Paulis in $S(U(T(S)))$ might be insufficient to specify the subspace. (Recall definition 3.4.)

The Clifford group is a set of unitaries that does not have these complications. It maps stabilizers to stabilizers. Fortunately, the Clifford group contains many interesting quantum gates; unfortunately, it is not enough for a universal quantum computer. The Clifford group is sufficient for encoding stabilizer codes, and gives a good start on fault-tolerant operations, but eventually we will need to go beyond it.
6.1.2 Definition of the Clifford Group and Variants

Definition 6.1. The Clifford group $C_n$ on $n$ qubits is the normalizer of $\mathcal{P}_n$ in the unitary group $U(2^n)$. That is,

$C_n = \{U \in U(2^n) \mid UPU^\dagger \in \mathcal{P}_n \ \forall P \in \mathcal{P}_n\}$. (6.2)
The name "Clifford group" is not particularly illuminating, perhaps. It is motivated by the idea that there might be a connection of some sort to Clifford algebras, but the connection is not very close. To add to the confusion, you might encounter the term "Clifford group" in the mathematics literature referring to a different group. In quantum information papers, "Clifford group" refers to definition 6.1 or one of its variants defined below. You might also encounter the terms "normalizer group," "symplectic group" (or operations), or sometimes "stabilizer operations" for $C_n$. None of these terms is completely satisfactory, and "Clifford group" is the most widespread, so I will use that.
The Clifford group contains all gates of the form $e^{i\theta} I$, and if $U \in C_n$, then $e^{i\theta} U \in C_n$. As with the Paulis, global phase is frequently not significant for Clifford group elements. Indeed, it is less likely to matter for the Clifford group. Anticommutation is less important in the Clifford group than it is in the Pauli group, so we will usually consider not the full Clifford group, but the Clifford group with phases removed.

By definition, the Pauli group $\mathcal{P}_n$ is a normal subgroup of $C_n$. Sometimes it is better to consider the Clifford group with the Pauli subgroup modded out. The Pauli group only contains the phases $\pm 1, \pm i$, so even once the Pauli group is gone, there are still additional phases to worry about. Typically, we will want to remove those as well.
Definition 6.2. $\hat{C}_n$ is the Clifford group $C_n$ with global phases modded out, and $\check{C}_n = \hat{C}_n / \hat{\mathcal{P}}_n$ is the Clifford group with both the phases and the Pauli subgroup modded out.
I will refer to these variants as the "Clifford group," sometimes without specifying which one I mean. Usually it doesn't much matter, or can be deduced from context (or both).
Conjugating a Pauli $Q$ by another Pauli $P$ gives $PQP^\dagger = (-1)^{c(P,Q)} Q$. Thus, $Q \mapsto \pm Q$, with the sign determined by whether $P$ and $Q$ commute or anticommute. The effect of conjugating by a Pauli is to rearrange the signs of other Paulis without changing their identity. This is easy to square with what we know about stabilizers from chapter 3: Applying the error $P$ will move us from the $+1$ eigenspace of $Q$ to the $-1$ eigenspace of $Q$ iff $P$ and $Q$ anticommute. Moving to the $-1$ eigenspace is equivalent to modifying the stabilizer to contain $-Q$ instead of $Q$, which is the way we understand the action of Clifford group elements.
Another gate in the Clifford group is the Hadamard transform $H$. As we've discussed, the Hadamard transform switches the roles of $X$ and $Z$. This can be made concrete by looking at the conjugation action of $H$. You can work it out by multiplying together the matrices, but I'll just tell you the answer:

$HXH = Z$ (6.6)
$HYH = -Y$ (6.7)
$HZH = X$. (6.8)

$H^\dagger = H$, so I've skipped the adjoints in the above equations. As you can see, the action of $H$ is indeed to switch $X$ and $Z$. The effect on states is to switch $Z$ eigenstates with $X$ eigenstates:

$|0\rangle$ (stabilizer $Z$) $\leftrightarrow$ $|+\rangle = \frac{1}{\sqrt{2}}(|0\rangle + |1\rangle)$ (stabilizer $X$) (6.9)
$|1\rangle$ (stabilizer $-Z$) $\leftrightarrow$ $|-\rangle = \frac{1}{\sqrt{2}}(|0\rangle - |1\rangle)$ (stabilizer $-X$) (6.10)

While its action on $Y$ is not as dramatic as its action on $X$ and $Z$, $H$ does not leave $Y$ completely alone. Instead, it changes the sign of $Y$, so $+1$ eigenstates of $Y$ get switched with $-1$ eigenstates of $Y$:

$\frac{1}{\sqrt{2}}(|0\rangle + i|1\rangle) \leftrightarrow \frac{1}{2}[(1+i)|0\rangle + (1-i)|1\rangle] = \frac{e^{i\pi/4}}{\sqrt{2}}(|0\rangle - i|1\rangle)$. (6.11)

The conjugation action doesn't tell us about the overall phase $e^{i\pi/4}$ introduced by $H$, but that has no physical significance anyway.
The other important thing about $H$'s action on $Y$ is that I didn't need to specify it. The action of $H$ on $Y$ can be deduced from its action on $X$ and $Z$. Conjugation is a group homomorphism: the conjugate of the product is the product of the conjugates:

$UPQU^\dagger = (UPU^\dagger)(UQU^\dagger)$. (6.12)

In this case,

$HYH = H(iXZ)H = i(HXH)(HZH) = iZX = -Y$. (6.13)

Equivalently, we could deduce the action of $H$ on $X$ from the action on $Y$ and $Z$, or the action on $Z$ from the action on $X$ and $Y$. The action on $I$, of course, will always be trivial ($UIU^\dagger = I$).
The Hadamard switches $X$ and $Z$, but only changes the phase of $Y$. There are also Clifford group elements that switch other pairs of Paulis. For instance, $R_{\pi/4} \in C_n$:

$R_{\pi/4}\, X\, R_{\pi/4}^\dagger = Y$ (6.14)
$R_{\pi/4}\, Y\, R_{\pi/4}^\dagger = -X$ (6.15)
$R_{\pi/4}\, Z\, R_{\pi/4}^\dagger = Z$. (6.16)

This is not 100% analogous to $H$, since $Z$ really is left alone by $R_{\pi/4}$, but $Y$ picks up a minus sign under $H$. However, by multiplying with a Pauli, we can take care of that sign issue:

$(Y R_{\pi/4})\, X\, (Y R_{\pi/4})^\dagger = Y$ (6.17)
$(Y R_{\pi/4})\, Y\, (Y R_{\pi/4})^\dagger = X$ (6.18)
$(Y R_{\pi/4})\, Z\, (Y R_{\pi/4})^\dagger = -Z$. (6.19)
You can deduce these equations by performing the conjugation action of $R_{\pi/4}$ and then following it with the conjugation action of $Y$, which switches the signs around.

You might wonder if we can completely get rid of the minus sign, rather than simply switching it to another location, by choosing an appropriate Pauli. The answer is no: any non-identity Pauli will commute with one of $X$, $Y$, and $Z$ and anticommute with the other two. Thus, it switches the sign of two of the three one-qubit Paulis. We can go from one minus sign to three minus signs, but never to zero or two. What we can do is change the signs of the generators of $\mathcal{P}_1$ (or $\mathcal{P}_n$ when dealing with $n$ qubits) in any way we like. For instance, to change $X \mapsto -X$, $Z \mapsto Z$, conjugate by $Z$. The change in the sign of $Y$ is then determined by the change in the signs of $X$ and $Z$.
Among two-qubit Clifford group elements, the most famous gate is the CNOT gate, which has the following conjugation action on the Paulis:

$X \otimes I \mapsto X \otimes X$ (6.20)
$Z \otimes I \mapsto Z \otimes I$ (6.21)
$I \otimes X \mapsto I \otimes X$ (6.22)
$I \otimes Z \mapsto Z \otimes Z$. (6.23)
We are dealing with the two-qubit Pauli group, and I have taken advantage of the homomorphism property of conjugation to list only the action of CNOT on a generating set of four Paulis for $\mathcal{P}_2$. For any other Pauli, you can deduce the conjugation action of CNOT by multiplying, as described above for the Hadamard.
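Here is a short numerical check (my own, using numpy; it is not part of the text's argument) of the homomorphism property for CNOT: the image of any Pauli, such as $Y \otimes I = i X_1 Z_1$, follows from the images of the generators:

```python
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)

# Images of the generators of P_2 under CNOT, from equations (6.20)-(6.23).
gen   = {'X1': np.kron(X, I), 'Z1': np.kron(Z, I)}
image = {'X1': np.kron(X, X), 'Z1': np.kron(Z, I)}

# Y (x) I = i X1 Z1, so its image must be i * image(X1) image(Z1) = Y (x) X.
lhs = CNOT @ (1j * gen['X1'] @ gen['Z1']) @ CNOT.conj().T
rhs = 1j * image['X1'] @ image['Z1']
assert np.allclose(lhs, rhs)   # CNOT maps Y (x) I to Y (x) X
```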
The conjugation action cannot distinguish unitaries that differ only by a global phase: if $U' = e^{i\theta} U$, then

$U' P U'^\dagger = e^{i\theta}\, U P U^\dagger\, e^{-i\theta} = U P U^\dagger$, (6.24)

so the conjugation action tells us nothing about the global phase of the unitary. Of course, the global phase doesn't have any physical significance, so this is not a great loss. But perhaps there are more important properties not captured by the conjugation action? The next theorem tells us that there are not:
Theorem 6.1. Suppose $U$ and $V$ are unitaries which have the same action by conjugation on $\mathcal{P}_n$:

$UPU^\dagger = VPV^\dagger$ (6.25)

for all $P \in \mathcal{P}_n$. Then $U = e^{i\theta} V$ for some angle $\theta$.

Proof. Let $W = V^\dagger U$, so that $WPW^\dagger = P$ for all $P \in \mathcal{P}_n$. Since $W$ commutes with every $Z_i$, it is diagonal in the standard basis, $W|j\rangle = e^{i\theta_j}|j\rangle$. Since $W$ also commutes with each $X_i$,

$e^{i\theta_j} = e^{i\theta_{2^{n-i}+j}}$, (6.28)

for any $j$ such that the $i$th bit is 0. Applying all these equalities, we find that all $e^{i\theta_j}$ terms are equal, proving the theorem.
Which permutations of $\mathcal{P}_n$ are allowed for a conjugation action by a Clifford group element? We know that conjugation is a group homomorphism. That means that there is no hope that we can freely choose images for any elements of $\mathcal{P}_n$ except for a generating set. In addition, conjugation will always take $-I$ to $-I$. That means that conjugation must preserve commutation and anticommutation: since $QP = (-1)^{c(P,Q)} PQ$,

$(UQU^\dagger)(UPU^\dagger) = UQPU^\dagger = (-1)^{c(P,Q)}\, UPQU^\dagger = (-1)^{c(P,Q)} (UPU^\dagger)(UQU^\dagger)$.

Thus, $c(UPU^\dagger, UQU^\dagger) = c(P, Q)$.
The usual set of generators for $\hat{\mathcal{P}}_n$ is $\{X_i, Z_i\}$. To keep the right commutation relations, the images of $X_i$ and $Z_i$ must commute with the images of $X_j$ and $Z_j$ for $j \neq i$. However, the images of $X_i$ and $Z_i$ must anticommute with each other.

There is one additional constraint: Conjugation by $U$ is an isomorphism, since it can be inverted by conjugating by $U^\dagger$. Therefore, the generators must map to an independent set of Paulis. However, this property actually follows from the commutation relations, so we don't need to impose it separately.
If these conditions are satisfied, a similar approach to the proof of theorem 6.1 gives a constructive method for finding the unitary corresponding to a given conjugation map. The same procedure applies even for conjugation by a non-Clifford unitary.

Procedure 6.1. Suppose the map $M: \mathcal{P}_n \to U(2^n)$ is a group homomorphism, with $M(X_i) = \overline{X}_i$, $M(Z_i) = \overline{Z}_i$, such that

$c(\overline{X}_i, \overline{X}_j) = c(\overline{Z}_i, \overline{Z}_j) = 0$, $\quad c(\overline{X}_i, \overline{Z}_j) = \delta_{ij}$. (6.31)

(When $\overline{X}_i$ and $\overline{Z}_j$ are outside the Pauli group, the definition of $c(\cdot, \cdot)$ is the same, and $c(P, Q)$ is undefined here if $P$ and $Q$ neither commute nor anticommute.)

The following procedure finds the matrix representation in the standard basis of a $U$ for which conjugation by $U$ performs $M$:

1. Find the state $|\psi_0\rangle$ which is a $+1$ eigenstate of $\overline{Z}_i$ for all $i = 1, \ldots, n$.

2. Let $b$ be any number from 0 to $2^n - 1$, and let $b_i$ be the $i$th bit of $b$. Let

$\overline{X}(b) = \prod_i (\overline{X}_i)^{b_i}$. (6.32)

3. Let $|\psi_b\rangle = \overline{X}(b)|\psi_0\rangle$.

4. Let

$U_{ab} = \langle a|\psi_b\rangle$. (6.33)
†
Proof of the validity of procedure 6.1. M is a group homomorphism
Q and Zi = Zi† , so it follows that Z i = Z i
1
as well. All of the Z i commute with each other, so ⇧ = 2n i (I + Z i ) is a projector with trace 1. Thus,
there is a unique state | 0 i (up to global phase) which is a +1 eigenstate of all Z i , as needed for step 1. The
remaining steps of the procedure give a linear map U |bi = | b i.
I claim that | b i is an orthonormal basis, so U is unitary:
†
h b | b0 i =h 0 |(X(b)) X(b0 )| 0i (6.34)
Y
b0i bi
=h 0| (X i ) | 0i (6.35)
i
†
using the fact that the X i commute with each other. As with the Z i s, X i = X i . Then since | 0i is a +1
eigenstate of Z i ,
Y 0
h b | b0 i = h 0 | (X i )bi +bi | 0 i (6.36)
i
Y 0
=h 0 |Z j (X i )bi +bi | 0i (6.37)
i
Y 0 Y 0
= ( 1)(bi +bi )c(Z j ,X i ) h 0| (X i )bi +bi Z j | 0i (6.38)
i i
0 Y 0
= ( 1)bi +bi h 0| (X i )bi +bi | 0i (6.39)
i
0
= ( 1)bi +bi h b | b0 i. (6.40)
91
It follows that if, for any j, bj 6= b0j , then h b | b0 i = 0 as desired. The states | b i we get from procedure 6.1
are normalized, so we have an orthonormal basis.
The last step is to prove that $U$ acts by conjugation on the Pauli group according to $M$. Let us compute $U Z_i U^\dagger$ and $U X_i U^\dagger$:

$U Z_i U^\dagger |\psi_b\rangle = U Z_i |b\rangle$ (6.41)
$= (-1)^{b_i} |\psi_b\rangle$. (6.42)

Now,

$\overline{Z}_i |\psi_b\rangle = \overline{Z}_i \overline{X}(b) |\psi_0\rangle$ (6.43)
$= (-1)^{b_i} \overline{X}(b) \overline{Z}_i |\psi_0\rangle$ (6.44)
$= (-1)^{b_i} |\psi_b\rangle$, (6.45)

so $U Z_i U^\dagger = \overline{Z}_i$. Similarly,

$U X_i U^\dagger |\psi_b\rangle = U X_i |b\rangle$ (6.46)
$= |\psi_{b'}\rangle$, (6.47)

where $b'$ is $b$ with the $i$th bit flipped ($b'_i = b_i + 1$, $b'_j = b_j$ for $j \neq i$). Also,

$\overline{X}_i |\psi_b\rangle = \overline{X}_i \overline{X}(b) |\psi_0\rangle$ (6.48)
$= \overline{X}(b') |\psi_0\rangle$ (6.49)
$= |\psi_{b'}\rangle$, (6.50)

so $U X_i U^\dagger = \overline{X}_i$.
If you apply procedure 6.1 to a mapping which does not preserve commutation or anticommutation, you
will still get a linear map, but it won’t be unitary. Furthermore, it may not realize the conjugation action
you wanted, so there is no real reason to apply procedure 6.1 unless the map you are working with satisfies
equation (6.31).
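Procedure 6.1 is easy to carry out numerically for small $n$. The sketch below (my own illustration, not part of the text) runs it for one qubit and confirms that the map $X \to Y$, $Z \to Z$ of equation (6.14) is performed by $R_{\pi/4}$, up to global phase:

```python
import numpy as np

def procedure_6_1(Xbar, Zbar):
    """Given images Xbar, Zbar satisfying (6.31), build U with U P U^+ = M(P)."""
    # Step 1: |psi_0> is the +1 eigenstate of Zbar.
    vals, vecs = np.linalg.eig(Zbar)
    psi0 = vecs[:, np.argmin(np.abs(vals - 1))]
    # Steps 2-4: |psi_b> = Xbar^b |psi_0>, and U_{ab} = <a|psi_b>.
    return np.column_stack([psi0, Xbar @ psi0])

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

U = procedure_6_1(Y, Z)          # the map X -> Y, Z -> Z of equation (6.14)
assert np.allclose(U @ X @ U.conj().T, Y)
assert np.allclose(U @ Z @ U.conj().T, Z)   # U is R_{pi/4} up to global phase
```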
In the binary symplectic representation, a Clifford group element acts on the $2n$-dimensional binary vectors $(x|z)$ as a linear map $M$. Because conjugation preserves commutation and anticommutation, $M$ must preserve the symplectic inner product:

$v \ast w = (Mv) \ast (Mw)$ (6.51)

for all $v$ and $w$. Writing the symplectic inner product as $v \ast w = v^T J w$, where $J = \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix}$ is the matrix of the symplectic form, this becomes

$v^T J w = v^T M^T J M w$, (6.52)

and since this holds for all $v$ and $w$,

$J = M^T J M$. (6.54)

$M$ is a member of the symplectic group $Sp(2n, \mathbb{Z}_2)$.
As usual, by going to the binary symplectic representation, we lose all information about the phases of Paulis in the stabilizer. An operation which only changes the phases of Paulis is always performed by conjugation by a Pauli:

Proposition 6.2. If $UPU^\dagger = \pm P$ for all $P \in \mathcal{P}_n$, then $U = e^{i\theta} Q$ for some $Q \in \mathcal{P}_n$ and some value of $\theta$.
Proof. I will show that any mapping $P \mapsto (-1)^{c_P} P$ which could possibly be performed by a unitary can also be performed by conjugation by some $Q \in \mathcal{P}_n$. Then the proposition follows from theorem 6.1.

For any unitary $U$, conjugation performs a group homomorphism, so

$c_{PQ} = c_P + c_Q$. (6.55)

In particular, the mapping is completely determined by its action on $X_i$ and $Z_i$ for $i = 1, \ldots, n$. For any $c_P$ consistent with equation (6.55), I claim there exists $Q \in \mathcal{P}_n$ that implements it. By equation (6.5), we need to find $Q$ such that $c(Q, X_i) = c_{X_i}$ and $c(Q, Z_i) = c_{Z_i}$. By lemma 3.15, such a $Q$ exists, though there is only one. Specifically,

$Q = \bigotimes_{i=1}^{n} Q_i$ (6.56)

with

$Q_i = \begin{cases} I & \text{if } c_{X_i} = c_{Z_i} = 0 \\ X & \text{if } c_{X_i} = 0 \text{ and } c_{Z_i} = 1 \\ Y & \text{if } c_{X_i} = 1 \text{ and } c_{Z_i} = 1 \\ Z & \text{if } c_{X_i} = 1 \text{ and } c_{Z_i} = 0 \end{cases}$ (6.57)
It follows from proposition 6.2 that elements of $\check{C}_n = \hat{C}_n / \hat{\mathcal{P}}_n$ correspond uniquely to symplectic operations. Indeed, $\check{C}_n$ is exactly the symplectic group.

Theorem 6.3. $\check{C}_n \cong Sp(2n, \mathbb{Z}_2)$. Equivalently, for every map $X_i \mapsto \overline{X}_i$, $Z_i \mapsto \overline{Z}_i$ with $\overline{X}_i, \overline{Z}_i \in \mathcal{P}_n$ and satisfying the correct commutation relations, there exists $U \in C_n$ which performs that map under conjugation, and $U$ is unique up to overall phase.
Proof. Most of the pieces of this theorem are derived from theorem 6.1, equation (6.54), and proposition 6.2. The only remaining piece needed to complete the characterization is to show that every symplectic operation can be realized by a Clifford group operation. This is straightforward: Given any linear map from $\hat{\mathcal{P}}_n$ to $\hat{\mathcal{P}}_n$, we can lift it to a group homomorphism $M: \mathcal{P}_n \to \mathcal{P}_n$ by choosing arbitrary signs for the images of $X_i$ and $Z_i$. When the original linear map is symplectic, $M$ satisfies equation (6.31), so using procedure 6.1, we find a unitary $U$ realizing $M$ and thus realizing the symplectic map on the binary symplectic representation. $U$ maps Paulis to Paulis under conjugation, so $U \in C_n$.
Figure 6.1: A simple circuit of Clifford group gates on two qubits: a CNOT from qubit 1 to qubit 2, then a Hadamard on qubit 1, then a C-Z gate.

Any stabilizer code can be encoded using a circuit composed of Clifford group gates. The ability to easily describe a stabilizer code via its stabilizer is one aspect of the efficient simulatability of the Clifford group gates which form its encoder. We'll also see in part II that this property plays a big role in helping us find fault-tolerant implementations of Clifford group gates.
However, there is a price to be paid. Because Clifford group circuits can be simulated classically, a circuit composed only of Clifford group gates cannot access the full power of quantum computation. (Presumably; as with most statements about computational power, this one relies on some unproven complexity-theoretic assumptions, in this case the assumption that quantum computers are computationally more powerful than classical computers.) When we get to fault tolerance, that means that we'll need to venture outside the Clifford group in order to get a universal set of fault-tolerant gates.
Procedure 6.2. You are given a circuit consisting of a product $\prod_{i=m}^{1} U_i$ of Clifford group gates, each specified by its action on the generators of the Pauli group. The action of the overall circuit can be determined as follows:

1. Initialize $\overline{X}_j = X_j$ and $\overline{Z}_j = Z_j$.

2. Starting with $i = 1$, and stepping through $i$ up to the last gate $m$, repeat the following steps:

(a) Calculate $U_i(\overline{X}_j)$. This can be done by writing $\overline{X}_j$ as a product of single-qubit $X$s and $Z$s and applying equation (6.12).

(b) Let the new value of $\overline{X}_j$ be $U_i(\overline{X}_j)$.

(c) Similarly, replace $\overline{Z}_j$ by $U_i(\overline{Z}_j)$.

3. The overall circuit $U$ has the action $X_j \mapsto \overline{X}_j$, $Z_j \mapsto \overline{Z}_j$, using the final values of $\overline{X}_j$ and $\overline{Z}_j$.
When the circuit consists of only two-qubit gates, updating an $X$ or $Z$ operator only requires updating two qubits in the decomposition, since all qubits not affected by the gate retain the same Pauli values. Therefore, each gate can be updated in a total time $O(n)$ (since we need to update $2n$ $X$ and $Z$ operators). The simulation of the full circuit thus takes a time $O(nm)$.
As an example, consider the simple circuit given in figure 6.1. Following the above procedure, we find that this circuit has the following action on Paulis:

$X_1: X \otimes I \to X \otimes X \to Z \otimes X \to I \otimes X$
$Z_1: Z \otimes I \to Z \otimes I \to X \otimes I \to X \otimes Z$
$X_2: I \otimes X \to I \otimes X \to I \otimes X \to Z \otimes X$
$Z_2: I \otimes Z \to Z \otimes Z \to X \otimes Z \to X \otimes I$. (6.58)
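A table like (6.58) can be checked by brute force. The following Python sketch (my own; it uses explicit matrices rather than the efficient bookkeeping of procedure 6.2, and assumes the figure 6.1 circuit is CNOT, then $H$ on qubit 1, then C-Z) reproduces the final column:

```python
import numpy as np
from itertools import product

I = np.eye(2); X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]]); Z = np.array([[1, 0], [0, -1]], dtype=complex)
P1 = {'I': I, 'X': X, 'Y': Y, 'Z': Z}

CNOT = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]], dtype=complex)
H1 = np.kron(np.array([[1, 1], [1, -1]]) / np.sqrt(2), I)
CZ = np.diag([1, 1, 1, -1]).astype(complex)

def name(M):
    """Identify a two-qubit Pauli (with sign) by comparison with the basis."""
    for (a, A), (b, B) in product(P1.items(), repeat=2):
        for s, pre in ((1, '+'), (-1, '-')):
            if np.allclose(M, s * np.kron(A, B)):
                return pre + a + b
    return '?'

for label, P in [('X1', np.kron(X, I)), ('Z1', np.kron(Z, I)),
                 ('X2', np.kron(I, X)), ('Z2', np.kron(I, Z))]:
    for U in (CNOT, H1, CZ):
        P = U @ P @ U.conj().T      # conjugate through each gate in turn
    print(label, '->', name(P))
# Prints X1 -> +IX, Z1 -> +XZ, X2 -> +ZX, Z2 -> +XI, matching equation (6.58).
```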
6.2.2 Simulation of a Unitary Circuit on a Stabilizer Subspace

Another interesting case is when we have some constraints on the input state of the circuit. For instance, the circuit may involve some ancillas, or it may be intended to be performed on a state encoded in a QECC. It may even be that the full input state of the circuit is specified, and we wish to determine the exact output state of the circuit. When the input state is a stabilizer state or lies within a stabilizer code, there still exists an efficient simulation procedure.

If the initial state is completely specified as a stabilizer state, we can keep track of the behavior of the generators $M_1, \ldots, M_n$ of the stabilizer. As before, we step through the gates, updating the stabilizer generators at each step just as we updated $\overline{X}$ and $\overline{Z}$ before. When $M_j$ is a generator of the stabilizer before we perform gate $U_i$, $U_i(M_j)$ is a generator of the stabilizer after the gate, so this procedure allows us to learn the stabilizer of the output state of the circuit. There are only two small differences from the case without a constraint. First of all, we need to keep track of only $n$ generators instead of $2n$ Paulis. The other difference is that the generators of the stabilizer are not unique, so after performing any gate, if it is convenient to do so, we may choose a new set of generators of the current stabilizer. There is no requirement to do so; only do it if it is clearly going to simplify the computation.

When the input state is only partially specified by a stabilizer code, we keep track both of the stabilizer, which now has $n - k$ generators, and of $2k$ logical $\overline{X}$ and $\overline{Z}$ operators. The case with $k = n$ is the "no constraint" case of the previous subsection, and the case with $k = 0$ is the stabilizer state case just discussed. Again, the stabilizer generators are non-unique, so at any step, we may choose a new set of generators. The logical $\overline{X}$ and $\overline{Z}$ operators are also now non-unique, so at any step we may multiply them by elements of the stabilizer, which means we are choosing a different coset representative.
Procedure 6.3. You are given a circuit consisting of a product $\prod_{i=m}^{1} U_i$, with $U_i \in C_n$, which is to be performed on an arbitrary input state from stabilizer code $S$. $S$ encodes $k$ qubits ($0 \le k \le n$), has generators $M_1, \ldots, M_{n-k}$, and has logical Pauli operators $\overline{X}_j, \overline{Z}_j$, $j = 1, \ldots, k$. Each $U_i$ can be specified by its action on the generators of the Pauli group, $U_i: X_j \mapsto U_i(X_j)$, $U_i: Z_j \mapsto U_i(Z_j)$. Then the action of the overall circuit on the stabilizer code can be determined as follows:

1. Initialize variables $N_j = M_j$ ($j = 1, \ldots, n-k$), $\overline{X}'_j = \overline{X}_j$ and $\overline{Z}'_j = \overline{Z}_j$ ($j = 1, \ldots, k$) to be the values given by the stabilizer code input.

2. Starting with $i = 1$, and stepping through $i$ up to the last gate $m$, repeat the following steps:

(a) Calculate $U_i(\overline{X}'_j)$ for $j = 1, \ldots, k$. This can be done by writing $\overline{X}'_j$ as a product of single-qubit $X$s and $Z$s and applying equation (6.12).

(b) Let the new value of $\overline{X}'_j$ be $U_i(\overline{X}'_j)$ for $j = 1, \ldots, k$.

(c) Similarly, replace $\overline{Z}'_j$ by $U_i(\overline{Z}'_j)$ for $j = 1, \ldots, k$.

(d) Replace $N_j$ by $U_i(N_j)$ for $j = 1, \ldots, n-k$.

(e) The current stabilizer $T$ is generated by $\langle N_1, \ldots, N_{n-k} \rangle$.

3. The output state lies in a stabilizer code with generators equal to the final values of $N_j$, $j = 1, \ldots, n-k$. The encoded state has undergone the Clifford group operation given by the transformation $\overline{X}_j \mapsto \overline{X}'_j$, $\overline{Z}_j \mapsto \overline{Z}'_j$.

An important special case is when some qubits are completely specified (e.g., to be $|0\rangle$), and the remaining qubits are completely unconstrained. In that case, the stabilizer is the stabilizer of the ancilla qubits, and the initial logical operators $\overline{X}$ and $\overline{Z}$ are the Paulis on the unconstrained input qubits.
As an example of the expanded procedure, let us consider the circuit in figure 6.1 again, but this time with the second qubit initially in the state $|0\rangle$. We now begin with $M_1 = I \otimes Z$, $\overline{X}_1 = X \otimes I$, and $\overline{Z}_1 = Z \otimes I$. Using the same analysis as before, we find that the final values of the stabilizer and logical operators are:

$N_1 = X \otimes I$ (6.59)
$\overline{X}'_1 = I \otimes X$ (6.60)
$\overline{Z}'_1 = X \otimes Z$. (6.61)

We can choose a new coset representative for $\overline{Z}'_1$: $I \otimes Z = (X \otimes I)(X \otimes Z)$. Then we can see that the overall transformation performed is to move the input qubit from the first qubit to the second qubit. In the output state, the first qubit is fixed to be $|0\rangle + |1\rangle$ (up to normalization).
Next, consider measuring the eigenvalue of a Pauli $P$ on a state $|\psi\rangle$ in a stabilizer code $T$, when there is some $M \in T$ that anticommutes with $P$. Then

$\langle\psi|P|\psi\rangle = \langle\psi|PM|\psi\rangle$ (6.62)
$= \langle\psi|M(-P)|\psi\rangle$ (6.63)
$= -\langle\psi|P|\psi\rangle$ (6.64)
$= 0$ (6.65)

(since $M^2 = I$, meaning $M = M^\dagger$). Thus, the probability of a $+1$ outcome and a $-1$ outcome for the measurement are both $1/2$.
We can also calculate the residual state. If the measurement has outcome $(-1)^b$, the state afterwards is $|\psi'\rangle = \frac{1}{\sqrt{2}}(I + (-1)^b P)|\psi\rangle$ (taking into account normalization). If $N \in T$ commutes with $P$, then $|\psi'\rangle$ is still a $+1$ eigenstate of $N$:

$N|\psi'\rangle = \frac{1}{\sqrt{2}}\, N (I + (-1)^b P)|\psi\rangle$ (6.66)
$= \frac{1}{\sqrt{2}} (I + (-1)^b P)\, N|\psi\rangle$ (6.67)
$= \frac{1}{\sqrt{2}} (I + (-1)^b P)|\psi\rangle = |\psi'\rangle$. (6.68)
$|\psi'\rangle$ is also a $+1$ eigenstate of $(-1)^b P$. That means that it is not an eigenstate of any $M$ with $\{M, P\} = 0$, even if $M \in T$. Define a stabilizer $T'$ generated by $(-1)^b P$ plus all $N \in T$ s.t. $[N, P] = 0$. The stabilizer of the remaining state after the measurement certainly contains $T'$. As we will see in a moment, that is all it contains.

How big is $T'$? To answer this, we need to know how many elements of $T$ commute with $P$.
Proposition 6.4. Let $S$ be a stabilizer, and let $P \notin N(S)$ be a Pauli that does not commute with every element of $S$. Then exactly half of the elements of $S$ commute with $P$. Furthermore, for any coset $Q \in N(S)/S$, exactly half the elements of $Q$ commute with $P$.

Proof. If $P \notin N(S)$, let $M \in S$ be an element of the stabilizer that anticommutes with $P$, $\{M, P\} = 0$. Then we can pair all elements of $S$:

$N \leftrightarrow MN$. (6.69)

Note that $M(MN) = N$ since $M^2 = I$, so each element of $S$ is a member of only one pair. The reason for doing this pairing is that $N$ commutes with $P$ iff $MN$ anticommutes with $P$:

$c(MN, P) = c(M, P) + c(N, P) = 1 + c(N, P)$. (6.70)

Thus, the number of elements of $S$ that commute with $P$ is equal to the number of elements that anticommute with $P$.

Similarly, we can pair elements of the cosets representing logical Paulis, $N \leftrightarrow MN$, using the same $M \in S$. Both $N$ and $MN$ are in the same coset $Q$, and again, exactly one of them commutes with $P$ and one anticommutes with $P$. Thus, half of the coset $Q$ commutes with $P$.
The subspace of post-measurement states for a given measurement outcome can be no larger than the space of possible pre-measurement states, but it could potentially be smaller if multiple states collapse in the measurement to the same final state. However, as another corollary of proposition 6.4, we find that this cannot happen. If the code space of $T$ contains $k$ logical qubits, one possible basis for the code space is the set of $2^k$ codewords which are eigenstates of $\overline{Z}_1, \ldots, \overline{Z}_k$. Each of these codewords is a stabilizer state with stabilizer generated by $T, \pm\overline{Z}_1, \ldots, \pm\overline{Z}_k$. By proposition 6.4 and the analysis preceding it, the stabilizer of the post-measurement state is generated by $T'$ and by $\pm\overline{Z}_1, \ldots, \pm\overline{Z}_k$, with the same eigenvalues. However, we must make certain that each coset is represented by an element that commutes with $P$; such an element always exists by proposition 6.4. Since each of the $2^k$ basis states gets mapped to a distinct stabilizer state, the code space after the measurement also has dimension $2^k$. This implies that $T'$ is the actual stabilizer post-measurement, as claimed.

Furthermore, this logic also tells us that the $T$-coset of the logical $\overline{Z}_i$ operator gets replaced by the $T'$-coset which contains an element of the original $\overline{Z}_i$ coset that commutes with $P$. The same reasoning tells us how to find the new versions of the other logical Paulis. In short, we now know how to update both the stabilizer and the logical Paulis after measurement of a Pauli operator.
Theorem 6.6. Suppose our system is a codeword of the stabilizer code $T$, with generators $N_1, \ldots, N_{n-k}$ and logical Paulis $\overline{X}_i$ and $\overline{Z}_i$ ($i = 1, \ldots, k$), and we measure the eigenvalue of Pauli $P \notin N(T)$. Then, conditioned on outcome $(-1)^b$, the stabilizer and logical Paulis of the system after the measurement can be determined as follows:

1. Find a generator $M \in T$ such that $\{M, P\} = 0$. If necessary, reorder the generators so that $M = N_1$.

2. For each generator $N_i$, $i > 1$, if $[N_i, P] = 0$, let $N'_i = N_i$. If $\{N_i, P\} = 0$, let $N'_i = N_i M$.

3. Let $N'_1 = (-1)^b P$.

4. The new stabilizer $T'$ is generated by $N'_1, \ldots, N'_{n-k}$.

5. Pick a representative $\overline{Z}_i$ for each of the logical $Z$ operators for $T$. If $[\overline{Z}_i, P] = 0$, let $\overline{Z}'_i = \overline{Z}_i$; otherwise let $\overline{Z}'_i = \overline{Z}_i M$. $\overline{Z}'_i$ is a representative for the new logical $Z_i$ operator.

6. Similarly, pick a representative $\overline{X}_i$ for each of the logical $X$ operators for $T$. If $[\overline{X}_i, P] = 0$, let $\overline{X}'_i = \overline{X}_i$; otherwise let $\overline{X}'_i = \overline{X}_i M$. $\overline{X}'_i$ is a representative for the new logical $X_i$ operator.
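Ignoring signs, the update rule of theorem 6.6 acts very simply on binary symplectic vectors. The following sketch (my own illustration; the sign bookkeeping and the random outcome bit are omitted) implements steps 1 to 4:

```python
import numpy as np

def c(p, q):
    """Commutation function: 0 if the Paulis commute, 1 if they anticommute."""
    n = len(p) // 2
    return (np.dot(p[:n], q[n:]) + np.dot(p[n:], q[:n])) % 2

def measure_update(gens, p):
    """Return the new generator list after measuring Pauli p (signs omitted)."""
    anti = [i for i, g in enumerate(gens) if c(g, p) == 1]
    if not anti:
        return gens                  # P commutes with T: stabilizer unchanged
    m = gens[anti[0]]                # step 1: a generator M anticommuting with P
    new = [g if c(g, p) == 0 else (g + m) % 2 for g in gens]   # step 2
    new[anti[0]] = np.array(p)       # steps 3-4: P replaces M ((-1)^b dropped)
    return new

# Example: one qubit in |0> (stabilizer Z), measure X.  Vectors are (x|z).
Zv, Xv = np.array([0, 1]), np.array([1, 0])
print(measure_update([Zv], Xv))      # new stabilizer: [X], the state |+> or |->
```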
One interesting twist on this system is that we can essentially control the measurement outcome. In particular, suppose we measure $P \notin N(T)$ and get outcome $-1$. In theorem 6.6, we identify an element $M$ from the old stabilizer $T$ which anticommutes with $P$. However, $M$ commutes with generators 2 through $n-k$ of $T'$, since they are also elements of $T$. $M$ also commutes with the standard coset representatives for $\overline{X}'_i$ and $\overline{Z}'_i$. Thus, if we perform $M$ on the post-measurement state, we transform the stabilizer (according to procedure 6.3) only by changing $N'_1 = -P$ to $+P$. The logical operators are unchanged. This is exactly the same state as we would have gotten (according to theorem 6.6) if the original measurement outcome had been $+1$.

Putting everything together, we get the following procedure for simulating Clifford group gates and Pauli measurements on either a fixed initial stabilizer state or a partially specified state with some free (logical) qubits:
Procedure 6.4. You are given a circuit consisting of an $m$-element sequence of unitary Clifford group gates and Pauli measurements. At step number $i$, the circuit calls for either Clifford group gate $U_i \in C_n$ (specified by its action on Paulis $U_i: Q \mapsto U_i(Q)$) or measurement of the eigenvalue of $P_i \in \mathcal{P}_n$ (assuming $P_i$ has eigenvalues $\pm 1$, not $\pm i$). The initial state of the circuit is given by stabilizer $S$, which encodes $k$ qubits ($0 \le k \le n$), has generators $M_1, \ldots, M_{n-k}$, and has logical Pauli operators $\overline{X}_j, \overline{Z}_j$, $j = 1, \ldots, k$.

1. Initialize variables $k' = k$, $N_j = M_j$ ($j = 1, \ldots, n-k$), $\overline{X}'_j = \overline{X}_j$ and $\overline{Z}'_j = \overline{Z}_j$ ($j = 1, \ldots, k$) to be the values given by the stabilizer code input. Let the variable state $|\psi\rangle$ be the initial logical state of the system, corresponding to the encoded input state.

2. Perform the following procedure for each step $i$ in order, for $i = 1, \ldots, m$:
(a) If at step $i$, the circuit calls for Clifford group gate $U_i$: replace each $N_j$, $\overline{X}'_j$, and $\overline{Z}'_j$ by its image under $U_i$, as in procedure 6.3.

(b) If at step $i$, the circuit calls for measurement of $P_i$: determine, using the binary symplectic representation, whether $P_i \in T$ (up to sign), $P_i \in N(T) \setminus T$, or $P_i \notin N(T)$.

(c) If at step $i$, the circuit calls for measurement of $P_i$ with $\pm P_i \in T$:

i. The state is already an eigenstate of $P_i$, so the measurement result is deterministic and the state is unchanged.

ii. If $+P_i \in T$, return measurement result $+1$. If $-P_i \in T$, return measurement result $-1$.

(d) If at step $i$, the circuit calls for measurement of $P_i$ with $P_i \in N(T) \setminus T$:

i. Write $P_i = \prod_{j=1}^{n-k'} N_j^{s_j} \prod_{j'=1}^{k'} (\overline{X}'_{j'})^{t_{j'}} (\overline{Z}'_{j'})^{u_{j'}}$, with $s_j, t_{j'}, u_{j'} \in \{0, 1\}$. This can be done using linear algebra and the binary symplectic representations of the Paulis.

ii. Let $Q = \prod_{j'=1}^{k'} X_{j'}^{t_{j'}} Z_{j'}^{u_{j'}}$.

iii. Perform a measurement of $Q$ on the current logical state $|\psi\rangle$. Return the measurement result $(-1)^b$, and update $|\psi\rangle$ accordingly. Note that this step may not be efficient, depending on the current state $|\psi\rangle$.

iv. Reduce $k'$ by 1, and add $(-1)^b P_i$ to the stabilizer as a new generator $N_{n-k'+1}$. Update $T$ accordingly, and if desired, choose a new set of generators for $T$.

v. Update the logical operators $\overline{X}'_j$ and $\overline{Z}'_j$ to relate to the new $|\psi\rangle$. Again, this step may not be efficient.

(e) If at step $i$, the circuit calls for measurement of $P_i$ with $P_i \notin N(T)$:

i. Choose a uniformly random bit $b$ and return measurement result $(-1)^b$.

ii. Find a generator $M \in T$ such that $\{M, P_i\} = 0$. If necessary, reorder the generators so that $M = N_1$.

iii. For each generator $N_j$, $j > 1$, if $[N_j, P_i] = 0$, leave $N_j$ unchanged. If $\{N_j, P_i\} = 0$, replace $N_j$ by $N_j M$.

iv. Replace $N_1$ by $(-1)^b P_i$.

v. Update the stabilizer $T$ with the revised generators. If desired, choose new generators for the revised $T$.

vi. Pick a representative $\overline{Z}'_j$ for each of the logical $Z$ operators for $T$. If $[\overline{Z}'_j, P_i] = 0$, leave $\overline{Z}'_j$ unchanged; otherwise replace $\overline{Z}'_j$ by $\overline{Z}'_j M$. The updated $\overline{Z}'_j$ is a representative for the new logical $Z_j$ operator; choose a new representative using the updated $T$ if desired.

vii. Similarly, pick a representative $\overline{X}'_j$ for each of the logical $X$ operators for $T$. If $[\overline{X}'_j, P_i] = 0$, leave $\overline{X}'_j$ unchanged; otherwise replace $\overline{X}'_j$ by $\overline{X}'_j M$. The updated $\overline{X}'_j$ is a representative for the new logical $X_j$ operator; choose a new representative using the updated $T$ if desired.

3. The output state lies in a stabilizer code with generators equal to the final values of $N_j$, $j = 1, \ldots, n-k'$. The encoded state has had $k - k'$ logical qubits measured, and has undergone the transformation $\overline{X}_j \mapsto \overline{X}'_j$, $\overline{Z}_j \mapsto \overline{Z}'_j$ on the remaining qubits.
Generally, for this sort of simulation, we only consider circuits where measurement of $P_i \in N(T) \setminus T$ does not occur, so that the simulation is an efficient one. For instance, if the circuit contains no measurements, this is not an issue, or equally if it contains no logical qubits (so $N(T) = T$). In part II, we will see circuits that have measurements and some logical qubits, but they have been specially designed to avoid the troublesome case.
Theorem 6.7 (Gottesman-Knill). Given a circuit starting from an initial stabilizer state, followed by a sequence of Clifford group operations and Pauli measurements, which may depend on classical computations performed on previous measurement results, there is an efficient classical simulation of the circuit.

If you follow procedure 6.4, you will find a time of $O(n^3 m)$ is needed to do the simulation, taking into account all the linear algebra manipulations needed to process measurements. However, by keeping track of slightly more information, this can be reduced to $O(n^2 m)$.
To illustrate the measurement procedure, let us consider a variation of the example circuit from figure 6.1 which has some measurements in it. The revised circuit is given in figure 6.2. The first few steps are as before. After the Hadamard, we have the following stabilizer and logical Paulis:

$N_1: I \otimes Z \to X \otimes Z$
$\overline{X}: X \otimes I \to Z \otimes X$
$\overline{Z}: Z \otimes I \to X \otimes I$. (6.71)
Figure 6.2: One-qubit teleportation: an example of a 2-qubit circuit made of Clifford group gates and measurement.
Then we measure $P = Z \otimes I$. In this case, there is only one generator of the stabilizer, and it anticommutes with $P$. Suppose we get measurement outcome $+1$. By theorem 6.6, we replace $N_1$ with $P$. $\overline{X}$ commutes with $P$, so its representative need not change. However, we can choose a new coset representative to take advantage of the new stabilizer: $(Z \otimes X)(Z \otimes I) = I \otimes X$. $\overline{Z}$ anticommutes with $P$, so we must choose a new coset representative

$\overline{Z} N_1 = (X \otimes I)(X \otimes Z) = I \otimes Z$. (6.72)

After the measurement, the stabilizer and logical Paulis are thus

$N_1: Z \otimes I$ (6.73)
$\overline{X}: I \otimes X$ (6.74)
$\overline{Z}: I \otimes Z$. (6.75)

The input qubit has been moved to the second qubit for the output, and the output value of the first qubit is $|0\rangle$.
In the case where the measurement result is $-1$, the stabilizer is $-P$ instead of $P$. The new representative for $\overline{X}$ also acquires this minus sign, so the final state is as follows:

$N_1: -Z \otimes I$ (6.76)
$\overline{X}: -I \otimes X$ (6.77)
$\overline{Z}: I \otimes Z$. (6.78)

The output state of the first qubit is $|1\rangle$, and the output state of the second qubit is the input data qubit, but with a phase flip ($Z$) performed on it.
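The teleportation example can be checked against a brute-force statevector simulation. The following sketch (mine; it is the inefficient statevector method, not the stabilizer simulation) verifies the $+1$-outcome case:

```python
import numpy as np

a, b = 0.6, 0.8                            # arbitrary real input amplitudes
psi = np.kron([a, b], [1, 0])              # |input> (x) |0>
CNOT = np.array([[1,0,0,0],[0,1,0,0],[0,0,0,1],[0,0,1,0]])
H1 = np.kron(np.array([[1, 1], [1, -1]]) / np.sqrt(2), np.eye(2))
psi = H1 @ (CNOT @ psi)                    # run the figure 6.2 circuit
proj0 = np.kron(np.diag([1, 0]), np.eye(2))   # outcome +1 of Z on qubit 1
out = proj0 @ psi
out = out / np.linalg.norm(out)
assert np.allclose(out, np.kron([1, 0], [a, b]))   # |0> (x) |input>, as claimed
```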
Theorem 6.8. Any gate in the $n$-qubit Clifford group $C_n$ can be written as a product of $e^{i\theta} I$, $H_i$, $R_{\pi/4,i}$, and $\mathrm{CNOT}_{i,j}$, with $i, j = 1, \ldots, n$ and $\theta \in [0, 2\pi)$.

$\mathrm{CNOT}_{i,j}$ means CNOT with qubit $i$ as the control and qubit $j$ as the target.

Proof. First, note that the Pauli group is inside the group generated by $H$ and $R_{\pi/4}$: $Z = (R_{\pi/4})^2$, and $X = HZH$. Second, global phases are covered by the gates $e^{i\theta} I$. To finish the proof, we thus need only to prove that the symplectic representations of $H$, $R_{\pi/4}$, and CNOT generate $\check{C}_n$.

It will be helpful to also use two additional gates: SWAP and C-Z. Both are in the Clifford group, and can easily be performed as a product of $H$, $R_{\pi/4}$, and CNOT, so we can use them freely without explicitly adding them to the generating set. In particular, SWAP is the product of 3 CNOT gates with alternating directions, and C-Z $= (I \otimes H)\,\mathrm{CNOT}\,(I \otimes H)$.
Figure 6.3: a) Clifford group circuit for the SWAP gate. b) Clifford group circuit for the C-Z gate.
Additional qubits will modify these matrices by adding a direct sum with the identity matrix. For instance, when we have 3 qubits, $\mathrm{CNOT}_{2,3}$ is

$\begin{pmatrix} 1 & 0 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 1 \\ 0 & 0 & 0 & 0 & 0 & 1 \end{pmatrix}$. (6.82)
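As a quick sanity check (my own, not part of the proof), the matrix of equation (6.82) does satisfy the symplectic condition (6.54) over $\mathbb{Z}_2$:

```python
import numpy as np

n = 3
M = np.array([[1,0,0,0,0,0],
              [0,1,0,0,0,0],
              [0,1,1,0,0,0],
              [0,0,0,1,0,0],
              [0,0,0,0,1,1],
              [0,0,0,0,0,1]])
# J is the symplectic form with off-diagonal identity blocks.
J = np.block([[np.zeros((n, n), int), np.eye(n, dtype=int)],
              [np.eye(n, dtype=int), np.zeros((n, n), int)]])
assert np.array_equal((M.T @ J @ M) % 2, J)   # M is in Sp(2n, Z_2)
```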
In general, if we have a symplectic matrix

$U = \begin{pmatrix} A & B \\ C & D \end{pmatrix}$, (6.83)
representing the Clifford group gate $U$, then left multiplication by another symplectic matrix representing the gate $V$ gives us the matrix for $VU$, whereas right multiplication gives us $UV$. The effects of left and right multiplication by $H$, $R_{\pi/4}$, CNOT, C-Z, and SWAP are summarized in table 6.1. As you can see, left multiplication gives us a set of row operations and right multiplication gives us a set of column operations. We can't exactly do standard Gaussian elimination, but we can do something very similar. In particular, given an arbitrary symplectic matrix $U$, we will find a sequence of $H$, $R_{\pi/4}$, and CNOT to multiply by on the left and right to transform $U$ into the identity:

$\left( \prod_{k=1}^{g_L} V_{L,k} \right) U \left( \prod_{k=1}^{g_R} V_{R,k} \right) = I$, (6.84)

with $V_{L,k}$ and $V_{R,k}$ drawn from $\{H_i, R_{\pi/4,i}, \mathrm{CNOT}_{i,j}\}$. Then

$U = \left( \prod_{k=g_L}^{1} V_{L,k}^\dagger \right) \left( \prod_{k=g_R}^{1} V_{R,k}^\dagger \right)$, (6.85)
$H_i$: Left multiplication switches the $i$th rows of $(A|B)$ and $(C|D)$. Right multiplication switches the $i$th columns of $\binom{A}{C}$ and $\binom{B}{D}$.

$R_{\pi/4,i}$: Left multiplication adds the $i$th row of $(A|B)$ to the $i$th row of $(C|D)$. Right multiplication adds the $i$th column of $\binom{B}{D}$ to the $i$th column of $\binom{A}{C}$.

$\mathrm{CNOT}_{i,j}$: Left multiplication adds the $i$th row of $(A|B)$ to the $j$th row of $(A|B)$, and adds the $j$th row of $(C|D)$ to the $i$th row of $(C|D)$. Right multiplication adds the $j$th column of $\binom{A}{C}$ to the $i$th column of $\binom{A}{C}$, and adds the $i$th column of $\binom{B}{D}$ to the $j$th column of $\binom{B}{D}$.

C-Z$_{i,j}$: Left multiplication adds the $i$th row of $(A|B)$ to the $j$th row of $(C|D)$, and adds the $j$th row of $(A|B)$ to the $i$th row of $(C|D)$. Right multiplication adds the $j$th column of $\binom{B}{D}$ to the $i$th column of $\binom{A}{C}$, and adds the $i$th column of $\binom{B}{D}$ to the $j$th column of $\binom{A}{C}$.

$\mathrm{SWAP}_{i,j}$: Left multiplication swaps the $i$th rows of $A$, $B$, $C$, and $D$ with the $j$th rows of $A$, $B$, $C$, and $D$. Right multiplication swaps the $i$th columns of $A$, $B$, $C$, and $D$ with the $j$th columns of $A$, $B$, $C$, and $D$.

Table 6.1: The effects of left and right multiplication by $H$, $R_{\pi/4}$, CNOT, C-Z, and SWAP on a symplectic matrix.
proving the theorem.

In equation (6.83), let $a_i$, $b_i$, $c_i$, and $d_i$ be the $i$th columns of $A$, $B$, $C$, and $D$, respectively, and $a_{ij}$, $b_{ij}$, $c_{ij}$, and $d_{ij}$ be the $i,j$th entries of $A$, $B$, $C$, and $D$. Because $U$ is symplectic, we can apply equation (6.54) to get the following properties:

1. $U$ has full rank.

2. $C^T A + A^T C = 0$; i.e., $(a_i|c_i) \ast (a_j|c_j) = 0$.

3. $C^T B + A^T D = I$; i.e., $(a_i|c_i) \ast (b_j|d_j) = \delta_{ij}$.
6. Repeat steps 1 to 5 on the second row and column of $A$ and $C$ to make them all 0 except for $a_{22}$, which is 1. Continue with all other rows and columns in sequence, until we have $A = I$ and $C = 0$.

7. At this point it follows that $D = I$:

$\delta_{ij} = (a_i|c_i) \ast (b_j|d_j) = a_i \cdot d_j + c_i \cdot b_j = d_{ij}$. (6.87)

Also, $B$ is symmetric:

$0 = D^T B + B^T D = B + B^T$. (6.88)

8. Right multiply by $H$ for every qubit, switching $A$ and $B$, and switching $C$ and $D$, so now $B = C = I$, $D = 0$, and $A$ is symmetric.

9. Right multiply by $R_{\pi/4}$ as needed to eliminate the diagonal of $A$. Right multiply by C-Z to eliminate all other elements of $A$, leaving $A = 0$. Since $D = 0$, these operations do not change $C$.

10. Right multiply by $H$ for every qubit, switching the left and right halves back again. We are left with $A = D = I$, $B = C = 0$.
It is worth counting just how many gates from the generating set are needed in this procedure. To eliminate all the 1s in a single row and column of $A$ and $C$ takes $O(n)$ gates. Doing this for all $n$ rows and columns thus requires $O(n^2)$ gates. Steps 8 and 10 only require $O(n)$ gates, but step 9 could require one gate for each element of $A$, which is again $O(n^2)$. Thus, the overall number of gates used in the procedure is $O(n^2)$. It turns out that this can be slightly improved, allowing an arbitrary element of the Clifford group to be written as a product of $O(n^2/\log n)$ generators.
6.4 Encoding Circuits for Stabilizer Codes

Any stabilizer code can be encoded with a Clifford group gate. Therefore, techniques useful for decomposing a Clifford group unitary into a detailed quantum circuit are also useful for deriving encoding circuits for stabilizer codes. The main difference between encoding a stabilizer code and finding a circuit for a Clifford group operation is that there is more freedom in choosing the encoder for a stabilizer code. When you are given a full Clifford group operation, it tells you exactly what unitary you must perform (up to global phase, perhaps), whereas for a stabilizer code, you have an additional freedom to perform unitaries which leave the code space unchanged.

The procedures discussed in this section are boring but necessary. However, it is sometimes possible to come up with clever codes with more exciting encoding circuits. The most general stabilizer code uses $O(n^2)$ gates in its encoding circuit, which is not too bad, but for a big code, we'd really like an encoding circuit with only $O(n)$ gates. Some such codes exist. It's not possible to do better than that for any sensible code, since with $o(n)$ gates, you can't even touch every qubit in the code with a gate. Some qubits will end up either unprotected or unused, maybe even unloved.
The true bottleneck, however, is the decoding circuit, and in particular, the syndrome decoding problem. Recall that the error syndromes correspond to cosets of $N(S)$ in $\mathcal{P}_n$, and that we assign a coset representative $Q_s$ to each error syndrome $s$ to serve as the "correct" way to correct a state with syndrome $s$. However, in general, it is quite difficult (NP-hard) to compute $Q_s$ as a function of $s$. Efficient codes for which syndrome decoding can be performed efficiently are precious things, and one of the main points of coding theory is to find them. Even better is an efficient code for which encoding and syndrome decoding can both be done in time $O(n)$. In general, syndrome decoding, efficient or not, will use gates outside of the Clifford group, but it is most often done on a classical computer since it is basically a classical problem.
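As a small illustration of syndrome decoding (my own sketch, using a brute-force lookup table; for the five-qubit code the table is tiny, but in general a full table has $2^{n-k}$ entries and cleverer decoders are needed):

```python
from itertools import product

n = 5
def vec(pauli):                      # 'XZZXI' -> 10-bit (x|z) vector
    x = [1 if p in 'XY' else 0 for p in pauli]
    z = [1 if p in 'ZY' else 0 for p in pauli]
    return tuple(x + z)

gens = [vec('XZZXI'), vec('IXZZX'), vec('XIXZZ'), vec('ZXIXZ')]

def syndrome(e):
    """One bit per generator: does the error anticommute with it?"""
    return tuple(sum(g[i] * e[(i + n) % (2 * n)] for i in range(2 * n)) % 2
                 for g in gens)

# Build the table mapping syndromes to minimum-weight corrections.
table = {syndrome(vec('IIIII')): 'IIIII'}
for q, p in product(range(n), 'XYZ'):
    err = ['I'] * n; err[q] = p
    table.setdefault(syndrome(vec(''.join(err))), ''.join(err))
print(len(table))   # 16 = 4^2 syndromes, each used once: the code is perfect
```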
Procedure 6.6. Generate a list of gates using the steps given below.

1. Write the generators $M_1, \ldots, M_{n-k}$ of $S$ as binary symplectic vectors. Produce the first $n-k$ columns of a symplectic matrix as follows: The $i$th column of $A$ is $a_i = x_{M_i}$. The $i$th column of $C$ is $c_i = z_{M_i}$.

2. $(a_1|c_1)$ is a non-zero vector, so there is a 1 somewhere in the first column. Using left multiplication by $H$ and SWAP, we can put this 1 in the upper left corner.

3. Do column reduction on the first column of $A$: Use left multiplication by CNOT to add the 1 in the upper left corner of $A$ to any other row of $A$ with a 1 in the first column.

4. Using left multiplication by $R_{\pi/4}$ and/or C-Z, we can similarly make the first column of $C$ all 0s.

5. Do row reduction on the first row of $A$: Replace generators of the stabilizer to add the 1 in the upper left corner of $A$ to any other column of $A$ with a 1 in the first row. (It does not matter much if this step is performed before or after the previous step.)

6. At this point, we find that the first row of $C$ has also become all 0s. This must be the case since the generators commute: $0 = (a_1|c_1) \ast (a_j|c_j) = e_1 \cdot c_j + 0 \cdot a_j = c_{1j}$.

7. Repeat the previous steps on the second row and column of $A$ and $C$ to make them all 0 except for $a_{22}$, which is 1. Continue with all other rows and columns in sequence up to column number $n-k$.
Take the inverse of this product of gates. The resulting Clifford group circuit will encode in some coset of $S$; that is, the stabilizer will be almost correct, except that we may have some error syndrome different from the correct trivial syndrome. (Also note that the choice of stabilizer generators may be different from that which was originally given to us.) Calculate the action of the encoding circuit on $Z_i$ for $i = 1, \ldots, n-k$. If for any $i$, the resulting generator $M_i$ has the wrong sign, add $X_i$ to the beginning of the encoding circuit.
As an example, let us find an encoding circuit for the five-qubit code as given in table 3.2. We begin (step 1) with the matrix

$\left( \begin{array}{cccc} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \\ \hline 0 & 0 & 0 & 1 \\ 1 & 0 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{array} \right)$. (6.90)
Step 2 is unnecessary, as there is already a 1 for $a_{11}$. We can perform steps 3 and 4 using $\mathrm{CNOT}_{1,4} \cdot$ C-Z$_{1,2} \cdot$ C-Z$_{1,3}$. Be careful: you must perform the correct transformations on the full rows, not just the first column. Then, in step 5, we can clear out the first row of $A$ by adding the first column to the third column; this corresponds to replacing the third generator $M_3$ with $M_1 M_3$. We now have the following matrix:

$\left( \begin{array}{cccc} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \\ 0 & 1 & 0 & 0 \\ \hline 0 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{array} \right)$. (6.91)
We can clear out the second column using $\mathrm{CNOT}_{2,5} \cdot$ C-Z$_{2,3} \cdot$ C-Z$_{2,4}$, then reduce the second row of $A$ by replacing $M_4$ with $M_2 M_4$. The third column requires slightly more care. First column reduce the third column of $A$ using $\mathrm{CNOT}_{3,4}$. Then use C-Z$_{3,4} \cdot$ C-Z$_{3,5}$ to eliminate the third column of $C$. The caution is necessary because $\mathrm{CNOT}_{3,4}$ and C-Z$_{3,4}$ do not commute; in this case, however, the difference is only a $Z$, which will not affect the binary symplectic representation. Then we finish with the fourth column, using
Figure 6.4: Encoding circuit for the five-qubit code derived in the text.
$\mathrm{CNOT}_{4,5}$ followed by C-Z$_{4,5}$. We then conclude by performing $H$ on qubits 1 through 4. These gates give us the following sequence of matrices:

$\left( \begin{array}{cccc} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&1&1 \\ 0&0&0&1 \\ \hline 0&0&0&0 \\ 0&0&0&0 \\ 0&0&1&1 \\ 0&0&1&1 \\ 0&0&1&1 \end{array} \right) \to \left( \begin{array}{cccc} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&1 \\ \hline 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&1 \\ 0&0&0&1 \end{array} \right) \to \left( \begin{array}{cccc} 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \\ \hline 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \end{array} \right) \to \left( \begin{array}{cccc} 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ 0&0&0&0 \\ \hline 1&0&0&0 \\ 0&1&0&0 \\ 0&0&1&0 \\ 0&0&0&1 \\ 0&0&0&0 \end{array} \right)$ (6.92)
Taking the inverses of these gates in reverse order, we get the following sequence of gates for the encoding circuit:

$H_1 H_2 H_3 H_4$, C-Z$_{4,5}$, $\mathrm{CNOT}_{4,5}$, C-Z$_{3,5}$, C-Z$_{3,4}$, $\mathrm{CNOT}_{3,4}$, C-Z$_{2,4}$, C-Z$_{2,3}$, $\mathrm{CNOT}_{2,5}$, C-Z$_{1,3}$, C-Z$_{1,2}$, $\mathrm{CNOT}_{1,4}$.

Calculating the action of this circuit on $Z_i$ for $i = 1, \ldots, 4$, we find

$Z_1 \to X \otimes Z \otimes Z \otimes X \otimes I$ (6.94)
$Z_2 \to I \otimes X \otimes Z \otimes Z \otimes X$ (6.95)
$Z_3 \to -\,I \otimes Z \otimes Y \otimes Y \otimes Z$ (6.96)
$Z_4 \to -\,Z \otimes I \otimes Z \otimes Y \otimes Y$. (6.97)

Since the images of $Z_3$ and $Z_4$ come out with the wrong sign, the signs of $Z_3$ and $Z_4$ need to be corrected, so we start the encoding circuit with $X_3 X_4$. The final encoding circuit is given in figure 6.4.
it is more convenient to perform the Hadamard on every qubit so that $A$ and $C$ of the desired symplectic matrix are full $n \times n$ matrices. Column operations which add the first $n-k$ columns of $A$ and $C$ to something can be done just by changing generators of the stabilizer or representatives of the logical Pauli cosets, but column operations involving the last $k$ columns in either the right or left must be done with real gates.

Option 2 is to take advantage of the simplified procedure 6.6 to find an encoding circuit for a stabilizer without specified logical Paulis and see what actual logical Paulis $\overline{P}'$ it produces. Determine the logical Clifford group operation that maps $\overline{P}'$ to the desired logical Pauli $\overline{P}$. Use procedure 6.5 on $k$ qubits to find a circuit performing that Clifford group operation. Perform this circuit on qubits $n-k+1, \ldots, n$, followed by the encoding circuit given by procedure 6.6. The resulting circuit will then encode the code with the correct logical Paulis.
For instance, in the previous subsection, we found an encoding circuit for the five-qubit code. Let us determine what logical Paulis it gives:

$X_5 \to Z \otimes I \otimes I \otimes Z \otimes X$ (6.100)
$Z_5 \to Z \otimes Z \otimes Z \otimes Z \otimes Z$. (6.101)

Therefore, we need the Clifford $X \to -X$, $Z \to Z$. This is just the gate $Z$, so we can correct the circuit of figure 6.4 by beginning with a $Z_5$ gate.
Chapter 7
As I've told you, the three most important properties of a quantum code are the number $n$ of physical qubits (or qudits), the dimension $K$ of the encoded subspace, and the distance $d$ of the code. Together, the parameters $((n, K, d))$ tell us about the trade-off between the rate at which we can send quantum information and the tolerance we gain against errors. Ideally, we'd like to know for any given set $((n, K, d))$ whether a QECC exists with those parameters.

Unfortunately, we can't answer that question. It seems to be extremely hard. However, we can set some bounds, which set limits on where we can hope to find interesting QECCs. On the one hand, there are lower bounds, saying that codes definitely exist with certain parameters. They can be constructive (specifying a particular code) or non-constructive (proving that codes with the parameters exist without giving an efficient method of specifying one). Then we have the upper bounds. There are many different methods used for proving upper bounds, but they are uniformly destructive: they tell us there are no codes with certain parameters.

The most attractive bounds are the tightest, with the upper bounds hugging as closely as possible to the lower bounds. Occasionally we can actually make the upper and lower bounds touch, but more often there is some space between. Unfortunately, in many instances, our bounds are rather baggy and shapeless, leaving a lot of room, which may or may not contain actual QECCs, between the upper and lower bounds.
Theorem 7.1 (Quantum Gilbert-Varshamov bound). Let $B = \sum_{j=0}^{d-1} 3^j \binom{n}{j}$, and suppose that $B \le 2^{n-k}$. Then there exists a $((n, 2^k, d))$ QECC. For large $n$, a code exists if

$\frac{k}{n} \le 1 - \frac{d}{n} \log_2 3 - H\!\left(\frac{d}{n}\right)$,

where $H(x) = -x \log_2 x - (1-x)\log_2(1-x)$ is the binary entropy function.
Actually, the theorem will show that a stabilizer code with these parameters $[[n, k, d]]$ exists. It can even be made a little bit tighter than the statement of the theorem indicates, but it is usually phrased this way to be the closest analogue of the classical Gilbert-Varshamov bound. The proof is non-constructive. In this case, as with the classical Gilbert-Varshamov bound, that means that, while the proof specifies a procedure to find a code with these parameters, the procedure is exponentially long as a function of $n$, so it's not useful in practice. Since we get a stabilizer code, the code we produce can be described efficiently, but that is small consolation if we can't get all the way through the procedure to find it.

One can also prove a version of the quantum Gilbert-Varshamov bound for qudits, at least for prime power dimensions. The proof is analogous, using the idea of a qudit stabilizer code, which I'll discuss in chapter 8.
Proof. Imagine making a long list of all $[[n, k]]$ stabilizer codes. We are going to run through all possible errors of weight up to $d-1$. For each error $E$, we can check which codes can detect that error and which cannot, and cross off the list any code that can't detect $E$. Once we finish running through all the errors, we know that any code that remains must have distance at least $d$. We'll prove that there must be at least one code that survives, giving us the desired $[[n, k, d]]$ code.

For any given error $E$, we need to count how many codes cannot detect it. For any two errors $E, F \in \mathcal{P}_n \setminus \{I\}$, there exists some Clifford group element $U$ with $U(E) = F$. Furthermore, if stabilizer code $S$ is a code that doesn't detect $E$, then $U(S)$ is a stabilizer code that doesn't detect $F$. That is, $U$ will permute the list of stabilizer codes and their sets of undetectable errors. Thus, the number of codes that cannot detect $E$ must be the same as the number of codes that cannot detect $F$. That is, all non-identity errors in the Pauli group are undetectable for the same number $C$ of $[[n, k]]$ stabilizer codes.

Suppose there are $N$ stabilizer codes with parameters $[[n, k]]$. (We can evaluate $N$ exactly if we like, but it is not necessary for this argument.) The code $S$ detects the errors outside $\hat{N}(S) \setminus \hat{S}$, which contains $2^{n+k} - 2^{n-k}$ elements. If we consider a bigger list of pairs $(S, E)$ for any pair for which $E \in \hat{N}(S) \setminus \hat{S}$, we can count the number of elements on the list two ways: as the number of errors times the number of codes that cannot detect each error, or as the number of codes times the number of errors that each code cannot detect. There are $4^n - 1$ non-identity errors, and $I$ never appears in $\hat{N}(S) \setminus \hat{S}$, so we have that

$(4^n - 1)\, C = N\, (2^{n+k} - 2^{n-k})$. (7.3)
Now let us go back to crossing codes off our list of stabilizer codes. We work our way through the set of errors of weight up to $d-1$. For any non-identity error $E$ with weight less than $d$, there are $C$ codes which do not detect it, so we can cross those codes off the list. Some of them may have already been eliminated because they fail to detect a previous error, but certainly we cannot eliminate more than $C$ new codes from the list of $[[n, k]]$ QECCs. There are a total of $B - 1$ non-identity errors of weight less than $d$, with

$B = \sum_{j=0}^{d-1} 3^j \binom{n}{j}$ (7.4)

(not including the identity since all codes detect it). By the time we finish going through all errors of weight less than $d$, we have therefore crossed off at most $(B-1)C$ codes. If $(B-1)C < N$, then at least one code remains.
Plugging in equation (7.3) to eliminate $C$, the condition we get is

$(B-1)\, N\, \frac{2^{n+k} - 2^{n-k}}{4^n - 1} < N$, (7.5)

or

$(B-1)\, 2^{n-k} < \frac{4^n - 1}{4^k - 1}$. (7.6)

This is the tightest version of the quantum Gilbert-Varshamov bound, but notice that

$\frac{4^n - 1}{4^k - 1} > \frac{4^n}{4^k} = 4^{n-k}$, (7.7)

so if

$2^{n-k} B \le 4^{n-k}$, (7.8)

then equation (7.6) is satisfied too. Equation (7.8) is just what we wanted to prove.
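The bound is easy to evaluate numerically. The following sketch (my own, not from the text) computes the largest $k$ guaranteed by equation (7.8), i.e., by $B \le 2^{n-k}$, for a given $n$ and $d$:

```python
from math import comb

def gv_k(n, d):
    """Largest k with B = sum_{j<d} 3^j C(n,j) <= 2^(n-k), per equation (7.8)."""
    B = sum(3**j * comb(n, j) for j in range(d))
    k = n
    while k >= 0 and B > 2**(n - k):
        k -= 1
    return k

for n in (11, 16, 23):
    print(n, gv_k(n, 3))
# Prints 11 1, 16 5, 23 11: distance-3 codes with these k are guaranteed to exist.
```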
Theorem 7.2 (Quantum Hamming bound). If a non-degenerate $((n, K, 2t+1))_q$ code exists, then

$K \sum_{j=0}^{t} \binom{n}{j} (q^2 - 1)^j \le q^n$. (7.9)
The big catch in the statement of the quantum Hamming bound is that while it provides an upper bound, the upper bound only holds for non-degenerate codes. Here is where we start to see very concretely how the existence of degenerate codes makes the quantum case more complicated than the classical case. Insisting that a code be non-degenerate is potentially a big limitation, yet it is easier to prove bounds on the existence of non-degenerate codes. Indeed, while we don't know how to prove that the quantum Hamming bound applies to degenerate codes as well as non-degenerate codes, we also don't know any $((n, K, 2t+1))$ codes that violate it.
Proof. The proof is a straightforward exercise in counting dimensions, and is very analogous to the proof of the classical Hamming bound. Because the code is non-degenerate, we know that in the QECC conditions, the matrix

$C_{ab} = \langle\psi|E_a^\dagger E_b|\psi\rangle$ (7.10)

(for any codeword $|\psi\rangle$) has maximum rank when $E_a$ and $E_b$ run over any basis for the space of $t$-qudit errors. In particular, this means that the states $E_a|\psi\rangle$ are all linearly independent. Furthermore, if $|\psi\rangle$ and $|\phi\rangle$ are two orthogonal codewords, then the QECC conditions tell us that

$\langle\psi|E_a^\dagger E_b|\phi\rangle = 0$, (7.11)

so the states arising from different basis codewords are linearly independent of each other too. Take a basis of $K$ orthogonal codewords, and note that there are

$B = \sum_{j=0}^{t} \binom{n}{j} (q^2 - 1)^j$ (7.12)

linearly independent errors of weight up to $t$. Thus, we have a set of $KB$ linearly independent vectors, which we must fit into a Hilbert space of $n$ qudits. Therefore,

$KB \le q^n$, (7.13)

which is just equation (7.9).

Looking at this proof, you'll see that the quantum Hamming bound does not make essential use of the qudit structure of the space, or the fact that we are dealing with $t$-qudit errors. In general, if we have a non-degenerate QECC with a $K$-dimensional code space living in an $N$-dimensional physical Hilbert space, and the code can correct $E$, a set of linearly independent errors, then $K|E| \le N$.
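Here is a quick numerical reading of the bound (my own sketch) for qubit codes ($q = 2$):

```python
from math import comb

def hamming_ok(n, K, t):
    """Quantum Hamming bound (7.9) for q = 2: K * sum_{j<=t} 3^j C(n,j) <= 2^n."""
    return K * sum(3**j * comb(n, j) for j in range(t + 1)) <= 2**n

print(hamming_ok(5, 2, 1))   # True, with equality: the five-qubit code is perfect
print(hamming_ok(4, 2, 1))   # False: no non-degenerate ((4, 2, 3)) code exists
```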
Figure 7.1: a) Dividing $n$ registers into one set $A$ of size $d-1$ and one set $B$ of size $n-(d-1)$ for the proof of lemma 7.4. If $n - (d-1) \le d-1$, we can decode each set separately to clone the encoded qubit. b) System $Q$ is maximally entangled with system $R$. $Q$ is then encoded into $n$ registers, which are split into three sets for the proof of the quantum Singleton bound: sets $A$ and $B$ of size $d-1$ and set $C$ of size $n - 2(d-1)$.
Theorem 7.3 (Quantum Singleton bound). If an $((n, q^k, d))_q$ QECC exists with $k \ge 1$, then $n - k \ge 2(d-1)$.

As you can see by comparing with theorem 4.19, the quantum bound is more restrictive on the distance by a factor of 2. The reason for this factor of 2 turns out to be the No-Cloning Theorem! You may recall that our very first objection in section 2.1 to the possibility of a quantum error-correcting code was that classical codes seemed to use repetition of information as a key component. We circumvented that difficulty by using entanglement to spread out quantum information without repeating it. The quantum Singleton bound can be viewed as a way of codifying just how well information can be spread around without accidentally repeating anything. For instance, the bound immediately tells us that there can be no complete quantum analogue of the classical $(3, 2, 3)$ repetition code, since a $((3, 2, 3))$ QECC would violate the quantum Singleton bound.
Proof. We'll start by studying the $k = 1$ case, which is an immediate consequence of the No-Cloning Theorem:

Lemma 7.4. If an $((n, q, d))_q$ QECC exists, then $n - 1 \ge 2(d-1)$.

Proof of lemma. If a code has distance $d$, then it can correct $d-1$ erasure errors. We can imagine dividing the $n$ registers of the code into two sets, a set $A$ with $d-1$ registers and a set $B$ with $n-(d-1)$ registers, as pictured in figure 7.1a. Set $B$ is missing only the $d-1$ erased registers of set $A$, so using just the registers in set $B$, it is possible to reconstruct the original encoded state $|\psi\rangle$.

Now suppose that $n - 1 < 2(d-1)$. Then $n - (d-1) \le d-1$, and therefore set $A$ has also experienced at most $d-1$ erasure errors. We would then be able to reconstruct a second copy of $|\psi\rangle$ using just the registers in set $A$. We could then use this code to clone the state $|\psi\rangle$ via the following procedure: Encode $|\psi\rangle$ using the QECC, split up the registers into the sets $A$ and $B$, and then reconstruct $|\psi\rangle$ independently for each set. We know this isn't possible, leading to a contradiction and proving the lemma.
Now we can move on to the general case. When $k > 1$, we split the $n$ registers into three sets, as pictured in figure 7.1b. Sets $A$ and $B$ will each have $d-1$ registers, and set $C$ will have $n - 2(d-1)$ registers. If an $((n, q^k, d))_q$ code exists for $k \ge 1$, then a $((n, q, d))_q$ code also exists, as we can just ignore the extra logical codewords. We therefore know from the lemma that $n > 2(d-1)$, and set $C$ is non-empty.
Now imagine taking a $2k$-qudit maximally entangled state between two registers $R$ and $Q$, each with dimension $q^k$, and encode $Q$ in the QECC. Then divide up the registers of the QECC into the sets $A$, $B$, and $C$, giving us a total of 4 sets ($R$, $A$, $B$, $C$). Globally, we now have a pure state, so $S(RABC) = 0$. If we split our sets into two groups, the entropy of the two groups is equal, e.g.,

$S(RA) = S(BC)$, $\quad S(RB) = S(AC)$.

We also know that the code has distance $d$, so we can detect any erasure error restricted to just set $A$ or to just set $B$ (but not necessarily errors involving both sets). Applying the alternate QECC conditions of section 2.5.6, we see that this means that any operator we measure on set $A$ will give us the same result for all possible logical codewords. Thus, the density matrix $\rho_A$ of set $A$ is the same for all logical codewords, and we find that the density matrix for $R$ and $A$ combined is of the form

$\frac{1}{q^k} \sum_{i=0}^{q^k - 1} |i\rangle\langle i|_R \otimes \rho_A$. (7.18)

$R$ and $A$ are in a tensor product state, so $S(RA) = S(R) + S(A)$. Similarly, $S(RB) = S(R) + S(B)$. We also note that $S(R) = k \log q$.

Now, $S(RA) = S(BC)$, and by subadditivity of the entropy, $S(BC) \le S(B) + S(C)$. Therefore,

$k \log q + S(A) \le S(B) + S(C)$, (7.19)

or

$k \log q \le S(C) + [S(B) - S(A)]$. (7.20)

Running the same argument with $A$ and $B$ interchanged gives $k \log q \le S(C) + [S(A) - S(B)]$, so in fact $k \log q \le S(C) \le [n - 2(d-1)] \log q$, and therefore $k \le n - 2(d-1)$, proving the theorem.
As an immediate application of the quantum Singleton bound, we find that the five-qubit code is optimal. Not only can there be no $((3, 2, 3))$ code, but there can also be no $((4, 2, 3))$ QECC. The five-qubit code exactly saturates the quantum Singleton bound; it is a quantum MDS code. We can also saturate the quantum Singleton bound with $d = 2$. There are $[[n, n-2, 2]]$ stabilizer codes for any even $n$ (but not for odd $n$). As we go to more encoded qubits or to greater distances, it is no longer possible to exactly saturate the quantum Singleton bound with qubit codes. A similar phenomenon occurs classically, and as in the classical case, we get more MDS codes as we let the dimension of the registers increase. In section 8.3, we'll see a large family of such codes.
Figure 7.2: Quantum Hamming bound (solid), quantum Gilbert-Varshamov bound (dashed), and quantum Singleton bound (dotted) for large n, q = 2, plotting the rate k/n against the relative distance d/n.
Definition 7.1. Let Π be the projector onto the code subspace of some ((n, K)) QECC Q. Let

A_j = (1/K²) Σ_{P∈P̂_n | wt(P)=j} |tr(P Π)|².   (7.24)

The weight enumerator of Q is the degree n polynomial

A(x) = Σ_{j=0}^n A_j x^j.   (7.25)
The motivation for defining the weight enumerator as a polynomial instead of just working with the
individual coefficients will become clearer in the next subsection, where I’ll discuss the quantum MacWilliams
identity, which deals with the polynomial as a whole.
Proposition 7.5. When Q is a stabilizer code with stabilizer S, A_j is the number of elements of Ŝ with weight j.

Proof. For a stabilizer code, K = 2^k, and we have an explicit description of the projector onto the code:

Π = (1/2^{n−k}) Σ_{M∈S} M.   (7.27)

We will consider Paulis in P̂_n, so depending on what representatives we pick, it could actually be that tr(P M) = −2^n or ±i 2^n. However, since we immediately take the absolute value, all of these give the same result.
Applying the general definition of A_j, we find

A_j = (1/2^{2k}) Σ_{P∈P̂_n | wt(P)=j} | (1/2^{n−k}) Σ_{M∈S} tr(P M) |²   (7.29)
    = (1/2^{2k}) Σ_{P∈P̂_n | wt(P)=j} | (1/2^{n−k}) Σ_{M∈S} 2^n δ_{P,M} |²   (7.30)
    = (1/2^{2k}) Σ_{P∈P̂_n | wt(P)=j} 2^{2k} χ_S(P) = Σ_{P∈P̂_n | wt(P)=j} χ_S(P).   (7.31)

In the third line, χ_S(P) is the indicator function, which is 1 when P ∈ S and 0 when P ∉ S. Thus, A_j counts the elements of Ŝ with weight j, as claimed.
Similarly, we can define a dual weight enumerator. For a stabilizer code, it is built out of B_j's which give the weight distribution of the normalizer of S. In general, we have the definition:
Definition 7.2. Let Π be the projector onto the code subspace of some ((n, K)) QECC Q. Let

B_j = (1/K) Σ_{P∈P̂_n | wt(P)=j} tr(P Π P† Π).   (7.33)

The dual weight enumerator of Q is the degree n polynomial

B(x) = Σ_{j=0}^n B_j x^j.   (7.34)
Again, the general definition reduces to the description given for stabilizer codes:

Proposition 7.6. When Q is a stabilizer code with stabilizer S, B_j is the number of elements of N̂(S) with weight j.

Proof.
Note that when P ∈ N(S), then PΠP† = Π. However, when P ∉ N(S), then PΠP† will be the projector on the subspace given by a different error syndrome of S. That subspace is orthogonal to the code space, so we have that PΠP†Π = 0 when P ∉ N(S). This is also straightforward to prove through direct computation:

PΠP†Π = (1/2^{2(n−k)}) Σ_{M,N∈S} (−1)^{c(P,M)} M N   (7.38)
       = (1/2^{2(n−k)}) Σ_{N′∈S} ( Σ_{M∈S} (−1)^{c(P,M)} ) N′   (7.39)
       = (1/2^{2(n−k)}) Σ_{N′∈S} 2^{n−k} χ_{N(S)}(P) N′   (7.40)
       = χ_{N(S)}(P) Π,   (7.41)

where χ_{N(S)}(P) is the indicator function for N(S) and line three follows because if P ∉ N(S), then P anticommutes with exactly half of S.
Thus, applying the general definition of B_j, we find

B_j = (1/2^k) Σ_{P∈P̂_n | wt(P)=j} χ_{N(S)}(P) tr(Π) = Σ_{P∈P̂_n | wt(P)=j} χ_{N(S)}(P),   (7.42)

since tr(Π) = 2^k. That is, B_j counts the elements of N̂(S) with weight j.
The coefficients A_j and B_j of the weight enumerator and dual weight enumerator are always integers for a stabilizer code, but can be non-integer for general QECCs.

For a stabilizer code, the dual weight enumerator tells us about the structure of the normalizer, which in turn tells us something about which errors we can detect. If B_j is non-zero, it means there are some Paulis of weight j in the normalizer, which worries us if j is small. Actually, though, we are interested in the number of elements of N̂(S) \ Ŝ, which is given by B_j − A_j. If that is non-zero, then we actually have some errors of weight j that are undetectable, giving us a bound on the distance of the code. While this argument does not hold for general QECCs, the intuition and result about A_j and B_j does carry over:
Theorem 7.7. Let Q be a QECC with weight enumerator A(x) = Σ_j A_j x^j and dual weight enumerator B(x) = Σ_j B_j x^j. Then

a) A_0 = B_0 = 1

b) B_j ≥ A_j ≥ 0

c) Q has distance d iff A_j = B_j for j < d.
Proof. Write Π = Σ_a |a⟩⟨a|, where {|a⟩} is an orthonormal basis for the code space, so

A_j = (1/K²) Σ_{P | wt(P)=j} | Σ_a ⟨a|P|a⟩ |²,   (7.47)

B_j = (1/K) Σ_{P | wt(P)=j} Σ_{a,b} |⟨a|P|b⟩|².   (7.48)

Part a follows immediately: the only Pauli in P̂_n with weight 0 is the identity, which gives A_0 = B_0 = 1.
Our next step is to apply the Cauchy-Schwarz inequality, which says that for two complex vectors x⃗ and y⃗, |x⃗ · y⃗|² ≤ |x⃗|² |y⃗|². Let x⃗ be the K²-dimensional complex vector with entries ⟨a|P|b⟩ (running over pairs (a, b)), and let y⃗ be the K²-dimensional vector with entries equal to (1/K) δ_{ab} (again running over pairs (a, b)). Then we have

|x⃗ · y⃗|² = | (1/K) Σ_{a,b} ⟨a|P|b⟩ δ_{ab} |²   (7.50)
        = (1/K²) | Σ_a ⟨a|P|a⟩ |²   (7.51)
        ≤ |x⃗|² |y⃗|²   (7.52)
        = ( Σ_{a,b} |⟨a|P|b⟩|² ) ( Σ_{a,b} (1/K²) δ_{ab} )   (7.53)
        = (1/K) Σ_{a,b} |⟨a|P|b⟩|².   (7.54)

Summing over all P of weight j, the left-hand side of this chain gives A_j and the right-hand side gives B_j, so B_j ≥ A_j; A_j ≥ 0 is clear from equation (7.47). This proves part b.
For part c, first suppose that Q has distance d. Then for any Pauli E with wt E = j < d, the error detection conditions say that

⟨ψ|E|φ⟩ = c(E) ⟨ψ|φ⟩   (7.55)

for any codewords |ψ⟩ and |φ⟩. Applying equations (7.47) and (7.48) for j < d, we get

A_j = (1/K²) Σ_{E | wt E=j} K² |c(E)|²   (7.56)
B_j = (1/K) Σ_{E | wt E=j} K |c(E)|²,   (7.57)

and A_j = B_j.

For the converse, looking at the proof of part b, the equality condition for the Cauchy-Schwarz inequality is that x⃗ is proportional to y⃗. Thus, A_j = B_j implies that ⟨a|P|b⟩ = c(P) δ_{ab} for every Pauli P of weight j. If this holds for all j < d, the error detection conditions hold for all Paulis of weight less than d, so Q has distance at least d.
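To make definitions 7.1 and 7.2 and theorem 7.7 concrete, we can compute A_j and B_j by brute force for a small example: the 3-qubit bit-flip code, with projector Π = |000⟩⟨000| + |111⟩⟨111|. The following is a minimal Python sketch (assuming numpy) that applies the definitions directly:

```python
import numpy as np
from itertools import product
from functools import reduce

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
paulis = {"I": I, "X": X, "Y": Y, "Z": Z}

n, K = 3, 2
# Projector onto the 3-qubit bit-flip code, span{|000>, |111>}
Pi = np.zeros((8, 8), dtype=complex)
Pi[0, 0] = Pi[7, 7] = 1

A = [0.0] * (n + 1)
B = [0.0] * (n + 1)
for labels in product("IXYZ", repeat=n):
    P = reduce(np.kron, [paulis[c] for c in labels])
    j = sum(c != "I" for c in labels)
    A[j] += abs(np.trace(P @ Pi))**2 / K**2              # definition 7.1
    B[j] += np.trace(P @ Pi @ P.conj().T @ Pi).real / K  # definition 7.2
print(A)  # [1, 0, 3, 0]: counts S-hat = {III, ZZI, ZIZ, IZZ}
print(B)  # [1, 3, 3, 9]: counts the 16 elements of N-hat(S)
```

The smallest j with B_j > A_j is j = 1 (coming from the weight-1 logical operators like Z ⊗ I ⊗ I), consistent with theorem 7.7c and the fact that the bit-flip code has distance 1 as a quantum code.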
The classical theory of weight enumerators is similar, except that for classical codes, A_j = B_j = 0 for j < d. The difference is essentially a manifestation of the phenomenon of degeneracy. Let's think about the special case of stabilizer codes. In a non-degenerate stabilizer code S, there are no elements of N(S) with weight less than d. Therefore B_j = 0 for j < d. In a degenerate stabilizer code, N(S) can have elements with weight less than d, but those elements must also be in S. Thus, B_j = A_j > 0 for some j < d.
This intuition almost carries over to general QECCs, but it turns out to be a slightly different concept.

Definition 7.3. An ((n, K, d)) QECC with weight enumerators A(x) and B(x) is pure if A_j = B_j = 0 for j < d. A code that is not pure is impure.

The argument above shows that for stabilizer codes, pure is the same as non-degenerate. However, for more general codes, the property of being pure is more stringent than the property of being non-degenerate. In other words, a code can be impure without being degenerate, but if it's degenerate, it is definitely impure. The possibility of impure codes complicates the application of the linear programming bounds to quantum codes. Some results which are known for classical codes have not yet been adapted for quantum codes because of the additional difficulty created by dealing with impure codes.
Theorem 7.8 (quantum MacWilliams identity). Let Q be an ((n, K)) QECC with weight enumerator A(x) and dual weight enumerator B(x). Then

B(x) = (K/2^n) (1 + 3x)^n A((1 − x)/(1 + 3x)).

Proof. Let's eliminate the projector Π that appears in the definitions of A_j and B_j in favor of Paulis. We can do this by using the fact that P̂_n gives a basis for the space of 2^n × 2^n matrices, so

Π = Σ_{Q∈P̂_n} c_Q Q.   (7.60)

Substituting into the definition of A_j, and using tr(P Q) = ±2^n δ_{P,Q} for P, Q ∈ P̂_n, we find

A_j = (1/K²) Σ_{P∈P̂_n | wt(P)=j} | Σ_{Q∈P̂_n} c_Q tr(P Q) |²   (7.61)
    = (2^{2n}/K²) Σ_{Q∈P̂_n | wt(Q)=j} |c_Q|²,   (7.62)
and

B_j = (1/K) Σ_{P∈P̂_n | wt(P)=j} Σ_{Q,R∈P̂_n} c_Q c_R* tr(P Q P† R†)   (7.63)
    = (1/K) Σ_{P∈P̂_n | wt(P)=j} Σ_{Q,R∈P̂_n} c_Q c_R* (−1)^{c(P,Q)} δ_{Q,R} 2^n   (7.64)
    = (2^n/K) Σ_{Q∈P̂_n} [ Σ_{P∈P̂_n | wt(P)=j} (−1)^{c(P,Q)} ] |c_Q|².   (7.65)
In the first line for B_j, I've used Π† in one place instead of Π (they are the same, after all) to finesse issues about using Paulis with the overall sign modded out. We can see that A_j and B_j both involve a sum over |c_Q|², but with two differences: In A_j, we only sum over Paulis of weight j whereas for B_j we sum over all Paulis, and in B_j we have an additional sum with alternating signs which depend on the commutation relations.
Let's take some Q with weight i and try to count how many Paulis of weight j commute with it, which we'll call C_{ij}, and how many anticommute, A_{ij}. Then we will have

B_j = (2^n/K) Σ_{Q∈P̂_n} (C_{wt(Q),j} − A_{wt(Q),j}) |c_Q|².   (7.66)
It is convenient to break these counts down by overlap: let C^m_{ij} (resp. A^m_{ij}) be the number of Paulis of weight j which act non-trivially on exactly m of the qudits in the support of Q and which commute (resp. anticommute) with Q, so that

C_{ij} = Σ_{m=0}^{min(i,j)} C^m_{ij}   (7.67)
A_{ij} = Σ_{m=0}^{min(i,j)} A^m_{ij}.   (7.68)

We now wish to find how many weight-m Paulis commute and anticommute with a fixed Pauli of weight m, which we can assume without loss of generality to be Z^{⊗m}. That is, it suffices to find C^m_{mm} and A^m_{mm}, and then we know that

C^m_{ij} = 3^{j−m} \binom{n-i}{j-m} \binom{i}{m} C^m_{mm}   (7.69)
A^m_{ij} = 3^{j−m} \binom{n-i}{j-m} \binom{i}{m} A^m_{mm}.   (7.70)

Now, C^1_{11} = 1 and A^1_{11} = 2. We can use induction to determine C^m_{mm} and A^m_{mm} for all m:

C^m_{mm} = C^{m−1}_{(m−1)(m−1)} + 2 A^{m−1}_{(m−1)(m−1)}   (7.71)
A^m_{mm} = 2 C^{m−1}_{(m−1)(m−1)} + A^{m−1}_{(m−1)(m−1)}.   (7.72)

Actually, what we're really interested in is C^m_{ij} − A^m_{ij}, and we see that

C^m_{mm} − A^m_{mm} = −[C^{m−1}_{(m−1)(m−1)} − A^{m−1}_{(m−1)(m−1)}] = (−1)^{m−1} [C^1_{11} − A^1_{11}] = (−1)^m.   (7.73)
Substituting this into equation (7.66) and collecting Paulis by weight, we get

B_j = (2^n/K) Σ_{Q∈P̂_n} (C_{wt(Q),j} − A_{wt(Q),j}) |c_Q|²   (7.74)
    = (2^n/K) Σ_{i=0}^n Σ_{Q∈P̂_n | wt(Q)=i} [ Σ_{m=0}^{min(i,j)} (C^m_{ij} − A^m_{ij}) ] |c_Q|²   (7.75)
    = (2^n/K) Σ_{i=0}^n Σ_{Q∈P̂_n | wt(Q)=i} [ Σ_{m=0}^{min(i,j)} (−1)^m 3^{j−m} \binom{n-i}{j-m} \binom{i}{m} ] |c_Q|²   (7.76)
    = (K/2^n) Σ_{i=0}^n [ Σ_{m=0}^{min(i,j)} (−1)^m 3^{j−m} \binom{n-i}{j-m} \binom{i}{m} ] A_i.   (7.77)

In the last step, I used equation (7.62) to trade Σ_{wt(Q)=i} |c_Q|² for (K²/2^{2n}) A_i.
The function

P_j(z; n) = Σ_{m=0}^{min(z,j)} (−1)^m 3^{j−m} \binom{n-z}{j-m} \binom{z}{m}   (7.79)

is known as a Krawtchouk polynomial. It has the generating function

(1 + 3x)^{n−z} (1 − x)^z = Σ_j P_j(z; n) x^j,   (7.80)
as can be verified by expanding the left-hand side. That tells us

B(x) = Σ_{j=0}^n B_j x^j   (7.81)
     = (K/2^n) Σ_{i=0}^n Σ_{j=0}^n P_j(i; n) A_i x^j   (7.82)
     = (K/2^n) Σ_{i=0}^n (1 + 3x)^{n−i} (1 − x)^i A_i   (7.83)
     = (K/2^n) (1 + 3x)^n Σ_{i=0}^n ((1 − x)/(1 + 3x))^i A_i   (7.84)
     = (K/2^n) (1 + 3x)^n A((1 − x)/(1 + 3x)).   (7.85)
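It's easy to check the MacWilliams identity numerically. The sketch below (Python, nothing beyond the standard library) enumerates the 16 stabilizer elements of the 5-qubit code in symplectic form to get the A_j, then applies equation (7.85) in the form B(x) = (K/2^n) Σ_i A_i (1 − x)^i (1 + 3x)^{n−i}:

```python
from itertools import product

n, K = 5, 2
# Cyclic generators of the 5-qubit code in string form.
gens = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]

def to_xz(s):
    x = [1 if c in "XY" else 0 for c in s]
    z = [1 if c in "ZY" else 0 for c in s]
    return x, z

def weight(x, z):
    return sum(1 for xi, zi in zip(x, z) if xi or zi)

# All 2^4 = 16 stabilizer elements: XOR symplectic vectors of generator subsets.
A = [0] * (n + 1)
gxz = [to_xz(g) for g in gens]
for bits in product([0, 1], repeat=len(gens)):
    x, z = [0] * n, [0] * n
    for b, (gx, gz) in zip(bits, gxz):
        if b:
            x = [a ^ c for a, c in zip(x, gx)]
            z = [a ^ c for a, c in zip(z, gz)]
    A[weight(x, z)] += 1
print("A_j:", A)  # [1, 0, 0, 0, 15, 0]

def polymul(p, q):
    r = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def polypow(p, k):
    r = [1]
    for _ in range(k):
        r = polymul(r, p)
    return r

# MacWilliams transform (7.85): B(x) = (K/2^n) sum_i A_i (1-x)^i (1+3x)^(n-i)
B = [0.0] * (n + 1)
for i, Ai in enumerate(A):
    term = polymul(polypow([1, -1], i), polypow([1, 3], n - i))
    for j in range(n + 1):
        B[j] += K / 2**n * Ai * term[j]
print("B_j:", B)  # [1, 0, 0, 30, 15, 18]

# Theorem 7.7c: the distance is the smallest j with B_j > A_j.
print("distance:", next(j for j in range(1, n + 1) if B[j] - A[j] > 0.5))  # 3
```

The output is A(x) = 1 + 15x⁴ and B(x) = 1 + 30x³ + 15x⁴ + 18x⁵; the smallest j with B_j > A_j is 3, recovering the distance of the 5-qubit code via theorem 7.7c.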
The quantum MacWilliams identity combined with theorem 7.7 puts a number of restrictions on the possible values of ((n, K, d)) for a code, enough to rule out many more possibilities than the quantum Singleton bound. However, we can set even tighter bounds by adding another weight enumerator known as the quantum shadow enumerator.

Definition 7.4. Let Π be the projector onto the code subspace of some ((n, K)) QECC Q. Let

Sh_j = (1/K) Σ_{P∈P̂_n | wt(P)=j} tr(P Π P† Y^{⊗n} Π* Y^{⊗n}).   (7.86)

The shadow enumerator of Q is the degree n polynomial Sh(x) = Σ_{j=0}^n Sh_j x^j.   (7.87)

Here Π* is the complex conjugate of the projector Π. The definition of the shadow enumerator is heavily basis-dependent, both for defining the Paulis and for defining the complex conjugate of an operator.
Theorem 7.9. Suppose we have an ((n, K)) QECC with weight enumerator A(x) and shadow enumerator Sh(x) = Σ_j Sh_j x^j. Then

a) Sh_j ≥ 0 for all j

b) Sh(x) can be determined from A(x):

Sh(x) = (K/2^n) (1 + 3x)^n A((x − 1)/(1 + 3x)).   (7.88)

Notice that the relation between A(x) and Sh(x) differs from the quantum MacWilliams identity only in that there is an x − 1 in the numerator of the argument of A instead of 1 − x.
Proof.
a) We can write Π = Σ_a |a⟩⟨a|, for |a⟩ a basis of the code space. Note that this basis might be different from the basis in which the shadow enumerator is defined, so we let |a*⟩ be the complex conjugate of |a⟩ in the defining basis. Then

Sh_j = (1/K) Σ_{P∈P̂_n | wt(P)=j} Σ_{a,b} tr(P |a⟩⟨a| P† Y^{⊗n} |b*⟩⟨b*| Y^{⊗n})   (7.89)
     = (1/K) Σ_{P∈P̂_n | wt(P)=j} Σ_{a,b} ⟨b*| Y^{⊗n} P |a⟩ ⟨a| P† Y^{⊗n} |b*⟩   (7.90)
     ≥ 0,   (7.91)

since Y = Y†, so each term is |⟨b*| Y^{⊗n} P |a⟩|² ≥ 0.
b) The proof proceeds similarly to that for the quantum MacWilliams identity. We again write

Π = Σ_{Q∈P̂_n} c_Q Q,   (7.92)

and find

Sh_j = (1/K) Σ_{P∈P̂_n | wt(P)=j} Σ_{Q,R∈P̂_n} c_Q c_R* tr(P Q P† Y^{⊗n} R* Y^{⊗n})   (7.93)
     = (1/K) Σ_{P∈P̂_n | wt(P)=j} Σ_{Q,R∈P̂_n} c_Q c_R* (−1)^{c(P,Q)} tr(Q Y^{⊗n} R* Y^{⊗n}).   (7.94)

Now, X* = X and Z* = Z, but Y* = −Y, so R* = ±R, with the sign determined by whether there are an even or odd number of Y's in the tensor product decomposition of R. Conjugating by Y^{⊗n} negates the X and Z factors and leaves the Y factors alone, so altogether every non-identity factor of R picks up a sign:

Y^{⊗n} R* Y^{⊗n} = (−1)^{wt(R)} R.

Thus tr(Q Y^{⊗n} R* Y^{⊗n}) = (−1)^{wt(R)} 2^n δ_{Q,R}, and

Sh_j = (2^n/K) Σ_{Q∈P̂_n} (−1)^{wt(Q)} [ Σ_{P∈P̂_n | wt(P)=j} (−1)^{c(P,Q)} ] |c_Q|².
The evaluation of the sum over P is just the same as before. We again get a Krawtchouk polynomial, to find

Sh(x) = (K/2^n) Σ_{j=0}^n Σ_{i=0}^n (−1)^i P_j(i; n) A_i x^j   (7.99)
      = (K/2^n) Σ_{i=0}^n (1 + 3x)^{n−i} (1 − x)^i (−1)^i A_i   (7.100)
      = (K/2^n) (1 + 3x)^n Σ_{i=0}^n ((x − 1)/(1 + 3x))^i A_i   (7.101)
      = (K/2^n) (1 + 3x)^n A((x − 1)/(1 + 3x)).   (7.102)
7.4.4 Example: There Is No ((3, 2, 2)) Code
OK. We have these three weight enumerators, but what do we do with them? How can we put together
the constraints we have to find out if certain parameters of codes are possible or not? One approach is to
make approximations and use properties of the Krawtchouk polynomials to prove non-existence of certain
solutions. This approach gives another method of proving the quantum Singleton bound, and can also give
tighter bounds on the existence of QECCs.
Another approach is to apply the linear programming method. The procedure is to treat the parameters A_j, B_j, and Sh_j as unknown variables, and write down all the equations we have above:

A_0 = 1   (7.103)
B_0 = 1   (7.104)
B_j = A_j (for j < d)   (7.105)
B_j ≥ A_j (for j ≥ d)   (7.106)
A_j ≥ 0   (7.107)
Sh_j ≥ 0   (7.108)
B(x) = (K/2^n) (1 + 3x)^n A((1 − x)/(1 + 3x))   (7.109)
Sh(x) = (K/2^n) (1 + 3x)^n A((x − 1)/(1 + 3x))   (7.110)
Of course, many of the lines actually represent groups of equations, running over values of j. Notice that
all the equations, even the last two groups, are linear in Aj , Bj , and Shj . Some are equalities and some
are inequalities. The question of whether a solution exists or not is a linear programming problem, which
is a common task. There are standard computer packages for solving linear programming problems, and so
the usual method is to put the equations corresponding to the desired parameters ((n, K, d)) of a code into
such a package, and see whether it says a solution is possible. If not, we know no code can exist with those
parameters. If so, we learn the solution(s) of the linear programming problem, which doesn’t tell us whether
a code exists or not, but does at least tell us possible values for the code’s weight enumerator if it does exist,
and therefore something about the structure of the code.
For large parameters, this is certainly best done by a computer, but to show you how the procedure
works, I’ll do a small example here explicitly, showing that there is no ((3, 2, 2)) QECC. It is easy to show
that there is no [[3, 1, 2]] stabilizer code, but for more general codes, we need a more powerful method, and
the linear programming bounds will suffice. The code is allowed by the quantum Singleton bound, and
indeed, you’ll see a ((3, 3, 2))3 code over qutrits in chapter 8.
Let us start by writing down the equations produced by the quantum MacWilliams identity:

B(x) = (2/2³) (1 + 3x)³ A((1 − x)/(1 + 3x))   (7.111)
     = ¼ [ (1 + 3x)³ A_0 + (1 + 3x)²(1 − x) A_1 + (1 + 3x)(1 − x)² A_2 + (1 − x)³ A_3 ]   (7.112)

Collecting the coefficients of x^j for j = 0, . . . , 3, we find

4B_0 = A_0 + A_1 + A_2 + A_3   (7.113)
4B_1 = 9A_0 + 5A_1 + A_2 − 3A_3   (7.114)
4B_2 = 27A_0 + 3A_1 − 5A_2 + 3A_3   (7.115)
4B_3 = 27A_0 − 9A_1 + 3A_2 − A_3.   (7.116)
We also know that A_0 = B_0 = 1, B_1 = A_1 ≥ 0, B_2 − A_2 ≥ 0, and B_3 − A_3 ≥ 0. Combining these equations, we find

A_1 + A_2 + A_3 = 3   (7.117)
3A_3 − A_1 − A_2 = 9   (7.118)
3A_1 − 9A_2 + 3A_3 ≥ −27   (7.119)
−9A_1 + 3A_2 − 5A_3 ≥ −27.   (7.120)

Adding (7.117) and (7.118) gives 4A_3 = 12, so A_3 = 3 and A_1 + A_2 = 0; since the A_j are non-negative, A_1 = A_2 = 0. The inequalities (7.119) and (7.120) are then satisfied, so the MacWilliams identity alone does not rule the code out. The shadow enumerator does: collecting the constant term of Sh(x) = ¼[(1 + 3x)³A_0 + (1 + 3x)²(x − 1)A_1 + (1 + 3x)(x − 1)²A_2 + (x − 1)³A_3] gives

4Sh_0 = A_0 − A_1 + A_2 − A_3 = 1 − 0 + 0 − 3 = −2 < 0,

violating the constraint Sh_j ≥ 0. There is therefore no ((3, 2, 2)) QECC.
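For comparison, here is how the same conclusion looks when handed to an off-the-shelf LP solver. This is a minimal sketch using scipy.optimize.linprog; the reduction to the three unknowns (A_1, A_2, A_3) with A_0 = 1 eliminated is a choice made here for illustration:

```python
import numpy as np
from scipy.optimize import linprog

# Feasibility LP for a putative ((3, 2, 2)) code; variables x = (A1, A2, A3).
A_eq = np.array([[1, 1, 1],      # A1 + A2 + A3 = 3          (7.117)
                 [-1, -1, 3]])   # 3A3 - A1 - A2 = 9          (7.118)
b_eq = np.array([3, 9])

# Inequalities written as A_ub @ x <= b_ub:
A_ub = np.array([[-3, 9, -3],    # 4(B2 - A2) = 27 + 3A1 - 9A2 + 3A3 >= 0
                 [9, -3, 5],     # 4(B3 - A3) = 27 - 9A1 + 3A2 - 5A3 >= 0
                 [1, -1, 1],     # 4 Sh_0 = 1 - A1 + A2 - A3 >= 0
                 [5, -1, -3],    # 4 Sh_1 = 9 - 5A1 + A2 + 3A3 >= 0
                 [3, 5, 3],      # 4 Sh_2 = 27 - 3A1 - 5A2 - 3A3 >= 0
                 [-9, -3, -1]])  # 4 Sh_3 = 27 + 9A1 + 3A2 + A3 >= 0
b_ub = np.array([27, 27, 1, 9, 27, 27])

res = linprog(c=np.zeros(3), A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 3, method="highs")
print(res.status)  # 2 = infeasible: no ((3, 2, 2)) QECC can exist
```

The solver reports the problem infeasible, matching the hand calculation: the equalities force A_3 = 3 and A_1 = A_2 = 0, and then Sh_0 = −1/2 < 0.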
Chapter 8
Computers seem to like base 2, even when they are quantum computers. Consequently, it may seem natural to stick to bits and qubits when dealing with error correcting codes, classical or quantum. Indeed, I've gone to a lot of effort in the previous chapters to build the mathematical structure of stabilizer codes and the Clifford group, all of which depends on the basic registers of our quantum computer being qubits. The problem is that there are lots of interesting codes that don't quite fit into this structure. Instead, they fit into similar structures associated to higher-dimensional registers — qudits.

The phenomenon also occurs in the theory of classical error correction, where it's helpful to work with linear codes over arbitrary finite fields rather than just binary linear codes. That's why I introduced non-binary codes in chapter 4. Now we'll see how to do the same for quantum codes. Quantum codes with bigger registers can generally correct more errors and send quantum data with a higher rate than qubit codes. Based on the last sentence, this may appear a clear-cut case of bigger being better, but it's not quite that straightforward. You can't compare an error on a single qubit with an error on a 32-dimensional qudit on an even basis. I wouldn't say you are comparing apples to oranges; rather, you are comparing apples to boxes of apples. A 32-dimensional qudit could be written using 5 qubits, so a single error on a 32-dimensional qudit could be 5 single-qubit errors, and a single-qudit gate in 32 dimensions might need many one- and two-qubit gates to achieve the same transformation. Nevertheless, some qudit codes are sufficiently interesting, in terms of efficiency or other properties, to be worthwhile even once you take the size difference of the registers into account.
The first step in building quantum error-correcting codes for qudits, by whatever name, is to come up
with the correct generalization of the Pauli group. As for qubits, the qudit Pauli group will play a role both
in describing the types of errors we face and as a structural element for the qudit generalization of stabilizer
codes. We’re going to make heavy use of the machinery of finite fields in this chapter, so if you’re not already
familiar with it, you may want to go through appendix C first.
Definition 8.1. The single-qudit Pauli group P_1(p) for prime p > 2 consists of elements {ω^a X^b Z^c}, where

ω = e^{2πi/p}   (8.1)
X|j⟩ = |(j + 1) mod p⟩   (8.2)
Z|j⟩ = ω^j |j⟩,   (8.3)

and a, b, c can be anywhere from 0 to p − 1. The n-qudit Pauli group P_n(p) = P_1(p)^{⊗n}; it consists of tensor products of n terms, each of the form X^b Z^c, with an overall phase factor of ω^a. P̂_n(p) = P_n(p)/{ω^a I} is the qudit Pauli group without phases.
That is, ω is a pth root of unity, X is "add one mod p", and Z is a phase shift by a pth root of unity. We can write X and Z as matrices, of course:

X = \begin{pmatrix} 0 & 0 & \cdots & 0 & 1 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}, \quad Z = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & ω & \cdots & 0 & 0 \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & ω^{p−2} & 0 \\ 0 & 0 & \cdots & 0 & ω^{p−1} \end{pmatrix}   (8.4)

You can see that X and Z are still unitary but are no longer Hermitian. I'd still like to talk about measuring X or Z; for any traditionalists reading this who prefer to measure only Hermitian operators, just imagine measuring a Hermitian operator that has the same eigenbasis as X or Z. For instance, Σ_j j |j⟩⟨j| works instead of Z, and outcome j corresponds to the ω^j eigenvalue of Z.
We can calculate the commutation relations between elements of the Pauli group. We find

ZX = ωXZ.   (8.5)

It follows that

(X^a Z^b)(X^c Z^d) = ω^{bc−ad} (X^c Z^d)(X^a Z^b).   (8.6)

Since ω is a pth root of unity, we might as well calculate bc − ad using arithmetic modulo p. We can define a new function c(P, Q) : P_n(p) × P_n(p) → Z_p by

P Q = ω^{c(P,Q)} Q P.   (8.7)

We no longer are choosing between commuting and anti-commuting Paulis; now commutation is measured by an integer modulo p, which describes the phase factor that appears when we move P past Q as a power of ω.
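These relations are easy to verify numerically. A minimal numpy sketch, with p = 5 and the exponents (a, b, c, d) chosen arbitrarily:

```python
import numpy as np

p = 5
omega = np.exp(2j * np.pi / p)
X = np.roll(np.eye(p), 1, axis=0)    # X|j> = |j+1 mod p>
Z = np.diag(omega ** np.arange(p))   # Z|j> = w^j |j>

# Basic commutation relation (8.5), and order p:
assert np.allclose(Z @ X, omega * X @ Z)
assert np.allclose(np.linalg.matrix_power(X, p), np.eye(p))
assert np.allclose(np.linalg.matrix_power(Z, p), np.eye(p))

# Check equation (8.6): c(P, Q) = bc - ad mod p for P = X^a Z^b, Q = X^c Z^d.
def pauli(a, b):
    return np.linalg.matrix_power(X, a) @ np.linalg.matrix_power(Z, b)

a, b, c, d = 2, 1, 1, 1
P, Q = pauli(a, b), pauli(c, d)
assert np.allclose(P @ Q, omega ** ((b * c - a * d) % p) * Q @ P)
print("qudit Pauli relations verified for p =", p)
```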
A very important difference when p is an odd prime rather than 2 is that we only need the phases to be powers of ω; we don't get the annoying factor of i that shows up for qubits. This is because all elements of P_n(p) have order p, regardless of overall phase:

(ω^a X^b Z^c)^p = ω^{ap} (X^b Z^c)(X^b Z^c)^{p−1}   (8.8)
              = ω^{bc} X^{2b} Z^{2c} (X^b Z^c)^{p−2}   (8.9)
              = ω^{bc+2bc} X^{3b} Z^{3c} (X^b Z^c)^{p−3}   (8.10)
              ⋯ = ω^{bc+2bc+···+(p−1)bc} X^{pb} Z^{pc}   (8.11)
              = ω^{bc p(p−1)/2}   (8.12)
              = 1,   (8.13)

since p is odd. This also means that the eigenvalues of all elements of the qudit Pauli group are of the form ω^j for integer j. While the qudit Pauli group is more complicated than the qubit Pauli group in a variety of ways, in this one way, at least, it is simpler.
There are alternate representations of the qudit Pauli group, just as for the qubit Pauli group. For instance, we can define a Z_p symplectic representation of the qudit Pauli group by

P = ⊗_{j=1}^n X^{a_j} Z^{b_j} ↦ v_P = (x_P | z_P),   (8.14)

with x_P and z_P being n-component vectors over Z_p. x_P has entries a_j, and z_P has entries b_j. As with qubits, the overall phase of P is lost when moving to the symplectic representation.
Suppose we have v_P as defined in equation (8.14) and v_Q, with

Q = ⊗_{j=1}^n X^{c_j} Z^{d_j}.   (8.15)

We define the symplectic inner product

v_P ⊙ v_Q = Σ_j (b_j c_j − a_j d_j),   (8.16)

with arithmetic taken mod p. With this definition, we have the following result:

Proposition 8.1. c(P, Q) = v_P ⊙ v_Q.

In other words, commutation in the qudit Pauli group corresponds to the symplectic inner product.
We can also map the qudit Pauli group to GF(p²). However, the procedure is a bit more complicated than for qubits. We should think of GF(p²) as a 2-dimensional vector space over GF(p) with basis {1, α}. α can be any element of GF(p²) that is not in the GF(p) subfield. We can write an arbitrary β ∈ GF(p²) as

β = a + bα.   (8.17)

A Pauli P ∈ P̂_n(p) with symplectic representation (x_P | z_P) then corresponds to the n-dimensional vector over GF(p²)

P ↦ x_P + α z_P.   (8.18)

To recover the commutation relations, we need an appropriate inner product on these vectors:
Definition 8.2. Suppose a, b are n-dimensional vectors over GF(p²). Let

a ∗ b = (a · b^p − b · a^p) / (α − α^p),   (8.19)

where a^p and b^p are the n-dimensional vectors whose entries are the pth power of the entries of a and b, and · represents the usual dot product between vectors.
Notice that the symplectic product we define (also called a trace-alternating inner product) depends explicitly on the specific element α ∈ GF(p²) that we used to give the mapping from P̂_1(p) to GF(p²). That wasn't the case for qubits because there were only two elements of GF(4) which are outside GF(2), namely ω and ω², but in GF(p²) there are more choices for the pair (α, α^p).

I don't know of a better way to motivate this formula than by just calculating, but it turns out that it does work:

Theorem 8.2. Suppose P, Q ∈ P̂_n(p) correspond to vectors a, b over GF(p²) according to equation (8.18). Then

a ∗ b = c(P, Q).   (8.20)
Proof. Let us calculate. We have

P ↦ a = x + αz   (8.21)
Q ↦ b = x′ + αz′.   (8.22)

Now,

(x + αz)^p = x^p + α^p z^p = x + α^p z,   (8.23)

since the field has characteristic p and x, z are vectors over GF(p). Expanding the dot products a · b^p = (x + αz) · (x′ + α^p z′) and b · a^p = (x′ + αz′) · (x + α^p z), the x · x′ and α^{p+1} z · z′ terms cancel in the difference. Thus,

a · b^p − b · a^p = α(z · x′ − x · z′) + α^p (x · z′ − z · x′)   (8.27)
                 = (α − α^p) [(x|z) ⊙ (x′|z′)]   (8.28)
                 = (α − α^p) c(P, Q),   (8.29)

and dividing by α − α^p gives the theorem.
The next step is to generalize from prime p to prime power q = p^m. Rather than indexing the Pauli operators by integers mod q, we index them by elements of GF(q):

Definition 8.3. For γ ∈ GF(q), let X_γ and Z_γ be the unitary operators acting on a q-dimensional register (with basis states |δ⟩ labeled by δ ∈ GF(q)) defined by

X_γ |δ⟩ = |δ + γ⟩   (8.30)
Z_γ |δ⟩ = ω^{tr(γδ)} |δ⟩,   (8.31)

where δ ∈ GF(q), ω = exp(2πi/p) is a pth root of unity and tr is the GF(q) trace function which maps elements of GF(q) to elements of GF(p). Then P_n(q) consists of elements of the form

η ω^a ⊗_{j=1}^n X_{γ_j} Z_{ζ_j},   (8.32)

where a ∈ Z_p and γ_j, ζ_j ∈ GF(q). For odd q, η is always 1. For even q, η can be 1 or i. As usual, P̂_n(q) = P_n(q)/{ω^a I} (for odd p) or P̂_n(q) = P_n(q)/{i^a I} (for p = 2).
Note that

Z_ζ X_γ = ω^{tr(γζ)} X_γ Z_ζ.   (8.33)
Now let us examine how this definition plays out given a particular way of breaking up each q-dimensional register into m p-dimensional ones. Suppose we consider GF(q) as an m-dimensional vector space over GF(p), so

γ = Σ_{i=0}^{m−1} a_i α_i,   (8.34)

with γ ∈ GF(q), a_i ∈ GF(p), and the α_i's elements of GF(q) which are linearly independent vectors when considered over GF(p). It is most common to take α_i = α^i, with α some primitive element of GF(q). Note that with this choice, α_0 = 1.
We can immediately see how equation (8.34) corresponds to breaking the qudit up into pieces: |γ⟩ ↔ |a_0⟩ ⊗ |a_1⟩ ⊗ ⋯ ⊗ |a_{m−1}⟩. X_{α_i} then has a natural interpretation as the p-dimensional X applied to the ith tensor factor:

X_{α_i} |γ⟩ = |γ + α_i⟩ = | Σ_j (a_j + δ_{ij}) α_j ⟩.   (8.35)

If we apply X_γ with γ = Σ_i b_i α_i instead, that corresponds to performing X^{b_i} in the ith tensor factor. That is, to understand the action of X_γ, we expand γ in the same basis used for the standard basis decomposition, and apply the appropriate power of X on each factor.
The interpretation of Z_γ is a little trickier. It hinges on the notion of a dual basis (sometimes called a complementary basis): The set {α_i} forms a basis for GF(q) considered as an m-dimensional vector space over GF(p), and for any basis, there exists a dual basis {β_j} with the property

tr(α_i β_j) = δ_{ij}.   (8.36)

Then

Z_{β_j} |γ⟩ = ω^{tr(β_j Σ_i a_i α_i)} ⊗_{i=0}^{m−1} |a_i⟩ = ω^{a_j} ⊗_{i=0}^{m−1} |a_i⟩,   (8.37)

since trace is linear over GF(p). That is, Z_{β_j} corresponds to performing the p-dimensional Z on the jth tensor factor. To understand the action of Z_γ in general, we simply then expand γ in the dual basis to the one used for the standard basis. The choice of the dual bases {α_i} and {β_j} thus specifies an isomorphism P_n(q) ≅ P_{mn}(p).
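As a concrete example, take q = 4, so p = 2 and m = 2. A small Python sketch finds the dual basis of {1, α} by brute force; the encoding of GF(4) elements as pairs (c_0, c_1) meaning c_0 + c_1 α, with α² = α + 1, is a choice made here:

```python
# GF(4) = {0, 1, a, 1+a}, elements encoded as (c0, c1) = c0 + c1*a, a^2 = a + 1.
def mul(x, y):
    (a0, a1), (b0, b1) = x, y
    # a1*b1 * a^2 = a1*b1 * (a + 1), reduced over GF(2)
    return ((a0 & b0) ^ (a1 & b1), (a0 & b1) ^ (a1 & b0) ^ (a1 & b1))

def tr(x):
    # tr(b) = b + b^2, which lands in GF(2); return its constant coefficient.
    x2 = mul(x, x)
    return x[0] ^ x2[0]

elements = [(0, 0), (1, 0), (0, 1), (1, 1)]
basis = [(1, 0), (0, 1)]   # the basis {1, a}
# Search for the dual basis {b_0, b_1} with tr(alpha_i * b_j) = delta_ij.
for j in range(2):
    for b in elements:
        if all(tr(mul(basis[i], b)) == (1 if i == j else 0) for i in range(2)):
            print(f"b_{j} =", b)   # b_0 = (1, 1) = 1 + a,  b_1 = (1, 0) = 1
```

The dual of {1, α} turns out to be {1 + α, 1}, so {1, α} is not self-dual; the basis {α, α²} = {α, 1 + α}, on the other hand, is self-dual for GF(4).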
For some fields (and in particular for q = 2^m, which is the most interesting case), it is possible to simplify the decomposition by choosing the basis {α_i} to be self-dual, i.e., β_i = α_i. In that case, X_{α_i} and Z_{α_i} simply represent the Pauli matrices acting on the ith tensor factor of the q-dimensional register. Not all finite fields have self-dual bases, unfortunately, so for some values of q, we have to pick different decompositions for the exponents of X and Z. Alternatively, we could abandon definition 8.3, but the advantages of that notation greatly outweigh the inconvenience of having a slightly more complicated decomposition into p-dimensional qudits.
For either odd or even characteristic, when we drop phases from the Pauli group, we get vectors on a symplectic space:

P = ⊗_{j=1}^n X_{η_j} Z_{ζ_j} ↦ v_P = (x_P | z_P),   (8.38)

with x_P an n-dimensional vector over GF(q) with entries η_j and z_P an n-dimensional vector with entries ζ_j. The symplectic inner product between v_P and v_Q (with Q = ⊗_j X_{η′_j} Z_{ζ′_j}) is

v_P ⊙ v_Q = Σ_j tr(ζ_j η′_j − η_j ζ′_j).   (8.39)

Multiplication now is the GF(q) multiplication rule, and we take the trace to end up with an element of GF(p). Once more proposition 8.1 applies.
You probably can guess the next step: We wish to map the q-dimensional Pauli group into GF(q²). The procedure is similar to that for GF(p); we pick an element α ∈ GF(q²) \ GF(q), and map

P ↦ a = x_P + α z_P.   (8.40)

The appropriate inner product is now

a ∗ b = tr( (a · b^q − b · a^q) / (α − α^q) ).   (8.41)

There are two differences from equation (8.19): We use the exponent q instead of p, and we use the trace function to give us an element of GF(p) for the answer. Note that we are using the trace of GF(q) over GF(p), not the trace of GF(q²). This is because the term in parentheses already gives an element of GF(q).

Theorem 8.3. Suppose P, Q ∈ P̂_n(q) correspond to vectors a, b over GF(q²) according to equation (8.40). Then

a ∗ b = c(P, Q).   (8.42)

The proof is essentially the same as theorem 8.2. We just need to add one final step where we take the trace to get the GF(q) symplectic product.
8.1.3 The Heisenberg-Weyl Group

What if the register dimension q is not prime? We can still define "add one" and phase operators directly in any dimension:

ω = e^{2πi/q}   (8.43)
X|j⟩ = |(j + 1) mod q⟩   (8.44)
Z|j⟩ = ω^j |j⟩.   (8.45)

As usual, the n-qudit Pauli group consists of tensor products of the form ω^a ⊗ X^b Z^c for odd q. For even q, we must include a possible overall factor of i as well. This version of the Pauli group is also known as the Heisenberg-Weyl group.
It is also possible to use the Heisenberg-Weyl group in place of the usual Pauli group for prime power dimensions. These groups are different: For instance, in the q = 9 Heisenberg-Weyl group, X has order 9, whereas all elements of the usual q = 9 Pauli group have order 3. There are some applications where this is a sensible thing to do, but the cost of using the Heisenberg-Weyl group is that we lose the field structure we normally have in prime power dimensions.

Because the mathematical structure of the Heisenberg-Weyl group is more complicated than the Pauli groups for prime dimension or prime-power dimension, the standard techniques of coding theory don't work as well. The basic structure of a stabilizer code still exists, but codes based on the Heisenberg-Weyl group lack some of the usual properties of stabilizer codes.
8.1.4 Nice Error Bases
Indeed, we can generalize even further and still have a stabilizer code structure. The most important features are that the elements of our generalized Pauli group form a group, are independent, and span the set of possible errors. This is codified by the definition of a nice error basis:

Definition 8.4. In a q-dimensional Hilbert space, let E = {E_1, . . . , E_{q²}} be a set of unitary operators satisfying E_1 = I and tr(E_i† E_j) = q δ_{ij}. The set E is a nice error basis if E_i E_j = ω_{ij} E_{f(i,j)} for all i, j and some phases ω_{ij}.

For instance, for a single qubit, the usual Pauli operators {E_1 = I, E_2 = X, E_3 = Y, E_4 = Z} form a nice error basis, with ω_{ij} ∈ {±1, ±i}. To get a generalized Pauli group from a nice error basis, we take elements of the form ωE_i, where ω is a product of the ω_{ij}'s.
Because there are exactly q² independent elements in a nice error basis, they form a basis for the space of q × q matrices. In this sense, they can act like the Pauli matrices — we can take any error on the q-dimensional register and expand it as a sum of elements from the nice error basis. It is thus sufficient for a QECC to correct all errors in a nice error basis to correct arbitrary errors on the register. This justifies the "error basis" part of the name.
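For instance, we can verify numerically that the qubit Pauli operators satisfy definition 8.4, and recover the multiplication table f(i, j) that anticipates the propositions below. A minimal numpy sketch:

```python
import numpy as np

I = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
E = [I, X, Y, Z]
q = 2

# Orthogonality: tr(Ei^dag Ej) = q * delta_ij
for i, Ei in enumerate(E):
    for j, Ej in enumerate(E):
        assert np.isclose(np.trace(Ei.conj().T @ Ej), q * (i == j))

# Closure: Ei Ej = w_ij E_f(i,j); recover f by expanding each product in E.
f = np.zeros((4, 4), dtype=int)
for i, Ei in enumerate(E):
    for j, Ej in enumerate(E):
        prod = Ei @ Ej
        for k, Ek in enumerate(E):
            if not np.isclose(np.trace(Ek.conj().T @ prod) / q, 0):
                f[i, j] = k
print(f)   # the multiplication table of the index group Z2 x Z2
```

The printed table is the multiplication table of Z_2 × Z_2 (e.g., f sends the pair (X, Y) to the index of Z, since XY = iZ).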
Proposition 8.4. The indices i of the errors in a nice error basis form a group under multiplication given by the binary operation f(i, j).

Proof. By the definition of a nice error basis, the set of indices is closed under the group operation. Associativity follows from the associativity of operator multiplication: (E_i E_j) E_k = E_i (E_j E_k) implies that E_{f(f(i,j),k)} and E_{f(i,f(j,k))} are equal up to phase. Since the E_i's are orthogonal, it follows that f(f(i, j), k) = f(i, f(j, k)), the statement of associativity.

The group identity is i = 1 since E_1 = I: E_1 E_j = E_j, so f(1, j) = j, and similarly f(j, 1) = j.

To establish the existence of inverses, first note that if f(i, j) = f(i, j′), then E_i E_j and E_i E_{j′} are proportional, implying E_{j′} = ωE_j for some phase ω. Since the E_i's are orthogonal, it follows that j = j′. Therefore, the function j ↦ f(i, j) is one-to-one, and since the domain and range both have size q², it must be onto as well. In particular, for all i, there must exist i^{−1} such that f(i, i^{−1}) = 1, and i^{−1} is the inverse of i.
Definition 8.5. The group of indices {i} under group operation f(i, j) is called the index group of the nice error basis.

The index group can also be obtained by taking the Pauli group generated by the nice error basis and modding out overall phase. For the usual qubit Pauli group, the index group is therefore isomorphic to Z_2 × Z_2. For the Heisenberg-Weyl group in dimension q, the index group is Z_q × Z_q, and for the prime power Pauli group, the index group is GF(q) × GF(q) under addition. All of these are fairly straightforward Abelian groups, but for large q, there exist error groups with more exotic index groups, including non-Abelian ones.
8.2.1 Definition and Properties of Qudit Stabilizer Codes
The basic definition of a stabilizer and a stabilizer code is the same as for qubits. When q is not a prime power, or when we are using a nice error basis more exotic than the standard P_n(q), it is still possible to define a stabilizer code, but there are some complications to the theory. I'll just stick to the simpler cases. The definition of a stabilizer is the same as for qubits:
Definition 8.6. Let q = p^m be a prime power, P = P_n(q), and let Q be a subspace of the Hilbert space H_q^{⊗n} (i.e., consisting of n qudits). The GF(q) stabilizer of Q is the set

S(Q) = {M ∈ P s.t. M|ψ⟩ = |ψ⟩ ∀|ψ⟩ ∈ Q}.   (8.50)

Let S be a subgroup of P. We say S is a GF(q) stabilizer, or just stabilizer when the GF(q) is clear from context, if it is Abelian and if e^{iφ} I ∉ S for any phase φ ≠ 0. The code space of a GF(q) stabilizer S is the subspace

T(S) = {|ψ⟩ s.t. M|ψ⟩ = |ψ⟩ ∀M ∈ S}.   (8.51)

The code Q is a GF(q) stabilizer code iff Q = T(S(Q)). The normalizer N(S) of the stabilizer S is

N(S) = {P ∈ P s.t. P M P† ∈ S ∀M ∈ S}.   (8.52)

The only real differences from the qubit definition are the use of P_n(q) instead of P_n and forbidding e^{iφ} I for all non-zero phases φ, which is needed because there are more phases in P_n(q) than just ±i, ±1.
Note that a GF(q) stabilizer automatically gives a GF(p) stabilizer when q = p^m when we reinterpret P_n(q) as P_{mn}(p) as discussed in section 8.1.2. Indeed, using the same isomorphism, we can interpret a GF(p) stabilizer on mn qudits as a GF(p^m) stabilizer on n qudits. The difference between them is the definition of weight (and thus distance of a code), which counts the number of non-trivial p-dimensional qudits in an operator for a GF(p) stabilizer but the number of non-trivial q-dimensional qudits for a GF(q) stabilizer. Thus, a Pauli which has weight t for GF(q) might have weight up to mt for GF(p), but there are also Paulis which have the same weight for GF(p) and GF(q).
Usually, although this is not required by definition 8.6, we deal with GF(q) stabilizer codes that have an additional property:

Definition 8.7. Let P ∈ P_n(q) have GF(q) symplectic representation (x_P | z_P). Let S be a GF(q) stabilizer, with symplectic representation Ŝ. Then we say S is a true GF(q) stabilizer if (x_P | z_P) ∈ Ŝ implies (γ x_P | γ z_P) ∈ Ŝ as well for all γ ∈ GF(q).

That is, for a true GF(q) stabilizer code, the symplectic representation of the stabilizer is a GF(q)-linear space. Note that if q is prime, any GF(q) stabilizer code is a true GF(q) stabilizer code since P ∈ S implies P^i ∈ S as well, and

(x_{P^i} | z_{P^i}) = i (x_P | z_P).   (8.53)

However, when q = p^m with m > 1, then there are some elements of GF(q) which are not integers, so it is possible to have stabilizer codes which are not true GF(q) stabilizer codes.
GF(q) stabilizer codes have the same properties we are familiar with from qubit stabilizer codes.

Proposition 8.5. If Q is a non-trivial subspace of the Hilbert space, then S(Q) is a GF(q) stabilizer (not necessarily a true GF(q) stabilizer). If S is a GF(q) stabilizer, then S(T(S)) = S.

The proof is almost identical to the qubit case. For prime qudit dimension p, we also get analogues of proposition 3.3 and theorem 3.4.

Theorem 8.6. Let p be prime and let S be a GF(p) stabilizer for the code T(S), which has n physical qudits. If |S| = p^r (i.e., S has r generators), then dim T(S) = p^{n−r}, so T(S) encodes k = n − r logical qudits. The set of undetectable errors for S is N̂(S) \ Ŝ. The distance of S is min{wt E | E ∈ N̂(S) \ Ŝ}.
The proofs are closely analogous to the qubit case, so I omit the details. The main notable difference is that ½(I + M) is not the projection operator on the +1 eigenspace of M. Instead, the projector on the +1 eigenspace of M is (1/p) Σ_{j=0}^{p−1} M^j. The projector on the codespace can be written as

Π_S = (1/p^r) Σ_{M∈S} M.   (8.54)

(Note that the normalization is p^{−r} instead of 2^{−r}.) The other difference is that in the case where the error E ∉ N(S), there is a phase ω^a instead of −1, but that does not really alter the proof.
For prime power dimensions, there is a complication due to the fact that all Paulis have order p rather than q. It is most helpful to think of a GF(q) stabilizer of n qudits as a GF(p) stabilizer on mn qudits, in which case we can apply theorem 8.6. Considered as a GF(p) stabilizer, a stabilizer with r generators has p^r elements and encodes a p^{mn−r}-dimensional Hilbert space, the straightforward analog of the qubit result. If you insist on thinking of it as a GF(q) stabilizer, the same stabilizer still has r generators and p^r elements and encodes a Hilbert space of dimension p^{mn−r} = q^{n−r/m} (it is, after all, the same code). In particular, a GF(q) stabilizer does not need to encode an integer number of q-dimensional qudits. However, a true GF(q) stabilizer code always has a number of generators which is a multiple of m, so does encode an integer number of q-dimensional qudits.
The other subtlety in prime power dimensions is the definition of distance, and indeed, this is the only property for which it really makes a difference whether we think of the code as a GF(p) code or a GF(q) code. It is still the case, of course, that the set of undetectable errors is N̂(S) \ Ŝ. The distance is again the minimum weight of a Pauli in N̂(S) \ Ŝ. However, when we think of it as a GF(q) code, we should use weight as defined by the decomposition into q-dimensional qudits (i.e., the number of q-dimensional registers with non-trivial Paulis). If we instead want to think of the code as a GF(p) code, the weight would be the number of non-trivial Paulis in the decomposition into p-dimensional qudits. The GF(p) weight of a Pauli could be equal to the GF(q) weight, but it could also be as high as m times the GF(q) weight. Also note that the lowest-weight operator in N̂(S) \ Ŝ using the GF(q) weight might even be a different operator than the lowest-weight operator using the GF(p) weight.
Notation 8.8. A stabilizer code with n physical qudits of dimension q, k logical qudits (also of dimension
q), and distance d is denoted as an [[n, k, d]]q code.
Compare the notation ((n, K, d))q for a qudit code that is not necessarily a stabilizer code and [[n, k, d]]
for a qubit stabilizer code. This way of listing the properties of a qudit stabilizer code is an obvious hybrid.
Table 8.1: The stabilizer of the 5-qudit code:

M1 = X ⊗ Z ⊗ Z^{−1} ⊗ X^{−1} ⊗ I
M2 = I ⊗ X ⊗ Z ⊗ Z^{−1} ⊗ X^{−1}
M3 = X^{−1} ⊗ I ⊗ X ⊗ Z ⊗ Z^{−1}
M4 = Z^{−1} ⊗ X^{−1} ⊗ I ⊗ X ⊗ Z

For prime dimension p, the simplest distance 2 codes come from a stabilizer with two generators, M1 = ⊗_i X^{a_i} and M2 = ⊗_i Z^{b_i}, where the vectors (a_i) and (b_i) have all entries non-zero and satisfy Σ_i a_i b_i = 0 mod p, so that M1 and M2 commute. Any single-qudit error X^c Z^d with (c, d) ≠ (0, 0) then fails to commute with M1 (when d ≠ 0) or with M2 (when c ≠ 0), so the code detects any single-qudit error.
On the other hand, X_1^{b_2} ⊗ X_2^{−b_1} commutes with both generators, so we can see that the code is only distance 2. Notice also that, whereas for qubits, the smallest distance 2 code has 4 physical qubits, for larger qudits, it is possible to do it with just 3 qudits. For instance, for p = 3 (qutrits), there is the straightforward [[3, 1, 2]]_3 code with generators X ⊗ X ⊗ X and Z ⊗ Z ⊗ Z.
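It takes only a few lines to confirm this qutrit example numerically, a sketch assuming numpy:

```python
import numpy as np
from functools import reduce

p = 3
w = np.exp(2j * np.pi / p)
X = np.roll(np.eye(p), 1, axis=0)
Z = np.diag(w ** np.arange(p))

kron = lambda ops: reduce(np.kron, ops)
M1 = kron([X, X, X])
M2 = kron([Z, Z, Z])
# The generators commute: the symplectic product is 3 * 1 = 0 mod 3.
assert np.allclose(M1 @ M2, M2 @ M1)
# But a single-qudit Z^b error fails to commute with M1, so it is detected.
for b in (1, 2):
    E = kron([np.linalg.matrix_power(Z, b), np.eye(p), np.eye(p)])
    assert not np.allclose(M1 @ E, E @ M1)
print("[[3,1,2]]_3 generators verified")
```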
For prime power qudits, the above construction does not give a distance 2 code. For instance, taking M1 = ⊗_i X_{α_i} with α_i ∈ GF(q), a single-qudit error Z_γ on qudit i has

c(Z_{i,γ}, M1) = tr(γ α_i),   (8.59)

and for any α_i, there exists γ ≠ 0 for which tr(γ α_i) = 0. The problem here is that X_{α_i} on a single dimension-q qudit does not detect Z_γ for all γ, basically because a single qudit is really m separate p-dimensional qudits. No matter what power X_α we choose, we will have the same problem, since there will be some decomposition (not necessarily the standard one) which will lead X_α to act only on a single p-dimensional tensor factor of the full q-dimensional qudit.
Luckily, we can easily fix this by making the minimal modification needed to get a true GF(q) stabilizer code, repeating the same operators on all the tensor factors of a qudit. Let S be the smallest stabilizer containing

M1(γ) = ⊗_{i=1}^n X_{γ α_i}   (8.60)
M2(γ) = ⊗_{i=1}^n Z_{γ β_i},   (8.61)

for all γ and any particular choice of (α_i, β_i) such that Σ_i α_i β_i = 0 in GF(q). Note that M1(1) and M2(1) commute if Σ_i tr(α_i β_i) = 0, but that is not sufficient to make sure that all M1(γ) commute with all M2(δ).

With these extra elements added to the stabilizer, the code is now distance 2. For instance, consider again the example Z_{i,δ}:

c(Z_{i,δ}, M1(γ)) = tr(δ γ α_i).   (8.62)

While this will certainly be 0 for some specific γ, tr(δ γ α_i) can only be 0 for all γ if δ α_i = 0 in GF(q).

The operators M1(γ) are not independent for all γ. Since the number of generators for S is best determined by thinking of it as a GF(p) code, we should represent each γ as an m-dimensional vector over GF(p). (It is m-dimensional since q = p^m.) It's not hard to see that M1(γ)M1(η) = M1(γ + η) and [M1(γ)]^a = M1(aγ) for a ∈ GF(p). A set {M1(γ)} is thus independent for a subset of possible γ's iff the corresponding GF(p) vectors are linearly independent, which means that S has a total of m generators of the form M1(γ). Similarly, it has m generators of the form M2(γ). When we go back to thinking of it as a GF(q) code, we get an [[n, n − 2, 2]]_q code. As a GF(p) code, this would be a [[mn, m(n − 2), 2]]_p code; this is a case where the distance is the same over GF(p) and GF(q).
For the next example, I will look at a 5-qudit code which is the qudit generalization of the 5-qubit code. Again beginning with prime dimension, consider the stabilizer given in table 8.1. You can check directly that it is Abelian. It is a bit harder (though still not that hard) to see that it has distance 3, but trust me, it does. Thus, this is a [[5, 1, 3]]_p code. We could also have used the same stabilizer as for the qubit version of the code (table 3.2), but this version has the minor advantage that it is cyclic, just like the 5-qubit code, whereas table 3.2 would give us a non-cyclic [[5, 1, 3]]_p code. While the 5-qubit code is unique up to a tensor product of single-qubit unitary rotations, there are multiple inequivalent 5-qudit codes for qudit dimension p ≥ 3. Another difference is that the 5-qubit code is perfect (number of errors equals number of syndromes), whereas the 5-qudit codes are not (more syndromes than single-qudit errors).
Moving to prime power qudits, this 5-qudit code has the same problem as the distance 2 codes — it does not detect or correct errors on the additional tensor factors of a qudit. We can solve it in the same way. Add to the stabilizer all operators of the form M_i(γ), with each Pauli raised to the power ±γ instead of ±1. For instance, M1(γ) = X_γ ⊗ Z_γ ⊗ Z_{−γ} ⊗ X_{−γ} ⊗ I. The result is a true GF(q) stabilizer code with parameters [[5, 1, 3]]_q.
The GF(q) CSS construction works much as it does for qubits: given classical codes C1 and C2, we build a stabilizer with a Z-type element ⊗_j Z_{ζ_j} for each vector ζ = (ζ_j) ∈ C1⊥ and an X-type element ⊗_j X_{η_j} for each η = (η_j) ∈ C2⊥. These elements generate S, and since S must be closed under multiplication, it must also be closed under integer exponentiation. Exponentiation by ξ ∈ GF(q) doesn't make any sense, and this is responsible for the difference between GF(p) and GF(q) stabilizer codes (including CSS codes).
Theorem 8.7. Let C1 and C2 be two classical linear codes over GF(q) with parameters [n, k1, d1]_q and [n, k2, d2]_q and satisfying C1⊥ ⊆ C2. Then there exists a true GF(q) stabilizer code with stabilizer given as above with parameters [[n, k1 + k2 − n, d]]_q, d ≥ min(d1, d2).
Proof. The first consideration is whether the stabilizer given above is well-defined. We need to check that M = ⊗_j Z_{ζ_j} and N = ⊗_j X_{η_j} commute when ζ ∈ C1⊥ and η ∈ C2⊥. By equation (8.39),

c(M, N) = Σ_j tr(ζ_j η_j) = tr(ζ · η)   (8.65)

(using the GF(q) dot product). Since C1⊥ ⊆ C2, ζ ∈ C2, so ζ · η = 0, and c(M, N) = 0 as desired.

Therefore, we have a well-defined GF(q) stabilizer code. The elements of the stabilizer have symplectic representations of the form (η | ζ) for η ∈ C2⊥ and ζ ∈ C1⊥. Since C1 and C2 are GF(q) linear, so are C1⊥ and C2⊥. Thus, the code is a true GF(q) stabilizer code.
The next question is to determine how many logical qudits there are in this code. Each basis vector of C1⊥ or C2⊥ gives us m independent elements of S (q = p^m as usual) for the reasons noted above (exponentiation by ξ ∈ GF(q) does not make sense). Thus, there are m(n − k1) generators of S derived from C1 and m(n − k2) generators derived from C2. Thought of as a GF(p) code, the number of logical qudits is thus mn − m(n − k1) − m(n − k2) = m(k1 + k2 − n). Thought of as a GF(q) code again, we have k1 + k2 − n logical qudits.

Finally, the distance can be determined just as for a usual binary CSS code.
It might appear at first sight that it is possible to have GF(q) CSS codes that don't satisfy the condition C1⊥ ⊆ C2 since we actually only need tr(ζ · η) = 0 for all ζ and η. However, because C1 and C2 are linear GF(q) codes, tr(ζ · η) = 0 for all ζ and η iff ζ · η = 0.

As with binary CSS codes, the basis codewords have a straightforward form when written out in the standard basis:

|ζ + C2⊥⟩ = Σ_{η∈C2⊥} |ζ + η⟩,   (8.66)

for ζ ∈ C1. This works for both prime dimension and prime power dimension. Indeed, we could just take it as the definition of a CSS code in any dimension.
Theorem 8.8. The GF(q) polynomial code with parameters (n, k1, k2) is a non-degenerate true [[n, k2, d]]_q stabilizer code with d = min(n − k1 + 1, k1 − k2 + 1).
Proof. The code C1 is used to correct bit flip errors. By theorem 4.15, the distance of C1 is n − k1 + 1. To prove the formula for distance, we thus need to show that C2 has distance k1 − k2 + 1 and that the code is non-degenerate. The fact that it is a true GF(q) code follows from theorem 8.7.

The dual code C2⊥ has basis vectors (α_1^j, . . . , α_n^j) for j = k2, . . . , k1 − 1. These form the rows of the parity check matrix H2. A vector orthogonal to all rows of H2 (i.e., a vector in C2) corresponds to a linear dependence among the columns of H2, so the distance of C2 is the minimum number of columns of H2 that are linearly dependent. There are k1 − k2 rows, so certainly no more than k1 − k2 columns can be linearly independent.

Let us look at the matrix formed by any set of k = k1 − k2 columns, say the first k. The matrix entries are V_ij = α_i^{k2+j−1}. This is not a Vandermonde matrix, but is clearly closely related. The columns of the matrix are linearly independent iff the rows are, and we can think of a linear combination of the rows as a polynomial of degree at most k1 − 1 with all coefficients below degree k2 being 0; that is, an element of C2⊥. The ith entry of the linear combination is the polynomial evaluated at α_i. A linear dependence of the rows is thus a polynomial that evaluates to 0 on all the points α_1, . . . , α_k. Because the lowest degree term of the polynomial is x^{k2}, the polynomial also has a 0 at x = 0 with multiplicity k2. That gives a total of k2 + k = k1 zeros for the polynomial, which is too many for a degree k1 − 1 polynomial; thus, for this to be true, the polynomial must be uniformly 0 everywhere (not just on α_1, . . . , α_k). In other words, there is no linear dependence of the rows and the matrix V_ij is non-singular. Thus, any k columns are independent. Since any k + 1 columns are linearly dependent, the code C2 has distance k + 1.
The only remaining thing to show is that the code is non-degenerate, from which it follows that it has distance exactly min(n − k1 + 1, k1 − k2 + 1) and not a greater distance. We wish to show that there exists some non-trivial logical Pauli that has this weight. I will show that there is a logical X with weight exactly n − k1 + 1 and a logical Z with weight exactly k1 − k2 + 1.

For a CSS code (over either qubits or qudits), the logical X operators can be taken to be products of physical X's, X̄ = ⊗_i X_{η_i}. Moreover, the vectors (η_i) must be in C1 to commute with the Z stabilizer generators. Since the distance of C1 is exactly n − k1 + 1, there is a vector of this weight. That is, there is a non-trivial polynomial f of degree k1 − 1 or less such that η_i = f(α_i) is 0 for exactly k1 − 1 values of α_i. Thus wt X̄ = n − k1 + 1. But since f has degree k1 − 1, if it had more than k1 − 1 zeros, then it would be uniformly zero. In particular, f(0) ≠ 0. Now,

X̄ |0̄⟩ = Σ_{g∈C2⊥} ⊗_{i=1}^n |(f + g)(α_i)⟩.   (8.68)

But the x^0 coefficient of g is 0 (since g ∈ C2⊥) and the x^0 coefficient of f + g is not zero, so X̄|0̄⟩ ≠ |0̄⟩ and X̄ ∉ S.
Similarly, the logical Z operators are products of physical Z's, Z̄ = ⊗_i Z_{ζ_i}. The vectors (ζ_i) must be in C2, which means there is such a vector with weight exactly k1 − k2 + 1. This gives us Z̄ with wt Z̄ = k1 − k2 + 1. Now,

Z̄ |f̄⟩ = Z̄ Σ_{g∈C2⊥} ⊗_{i=1}^n |(f + g)(α_i)⟩   (8.69)
      = Σ_{g∈C2⊥} ω^{Σ_i tr(ζ_i (f+g)(α_i))} ⊗_i |(f + g)(α_i)⟩   (8.70)
      = Σ_{g∈C2⊥} ω^{tr Σ_i ζ_i f(α_i) + tr Σ_i ζ_i g(α_i)} ⊗_i |(f + g)(α_i)⟩.   (8.71)

Since (ζ_i) ∈ C2 and g ∈ C2⊥, Σ_i ζ_i g(α_i) = 0.
But I claim there exists some f ∈ C1 such that tr(Σ_i ζ_i f(α_i)) ≠ 0. We can assume without loss of generality that ζ_i is non-zero only for i = 1, . . . , k1 − k2 + 1. Consider the polynomials f_j(x) = x^j for j = 0, . . . , k1 − k2. The matrix V_ij = α_i^j = f_j(α_i) is a Vandermonde matrix, so the vectors (f_j(α_1), . . . , f_j(α_{k1−k2+1})) are linearly independent. They are vectors in a (k1 − k2 + 1)-dimensional vector space, so there is no non-zero vector that is orthogonal to all of them. In particular, the vector (ζ_i) must have non-trivial overlap with at least one of the vectors (f_j(α_i)): Σ_i ζ_i f_j(α_i) = ξ ≠ 0. It is possible that tr ξ = 0, but if we instead use the polynomial f′(x) = η f_j(x), then Σ_i ζ_i f′(α_i) = ξη. η is arbitrary, so we just need to pick some η such that tr(ξη) ≠ 0 to prove the claim.

Consequently, Z̄|0̄⟩ = |0̄⟩ but Z̄|f̄′⟩ ≠ |f̄′⟩. Thus, Z̄, which has weight k1 − k2 + 1, is a non-trivial logical operation. This proves the polynomial code is non-degenerate and thus the distance is exactly min(n − k1 + 1, k1 − k2 + 1).
If we choose the distances of the bit flip code and the phase code to be equal, then n − k1 = k1 − k2, or 2k1 = n + k2. In this case, 2d = (n − k1 + 1) + (k1 − k2 + 1) = n − k2 + 2. Since k2 is the number of encoded qudits, a polynomial code with these parameters saturates the quantum Singleton bound and is a quantum MDS code. For instance, we can have a [[5, 1, 3]]_q polynomial code for q > 5. (There is also a variant with q = 5.) Note that this code is not equivalent to the [[5, 1, 3]]_q stabilizer code in table 8.1. This code is a CSS code, unlike the previous one, but it does not work for q = 2, 3, or 4, whereas the code of table 8.1 works for any prime or prime power dimension.
The Fourier transform gate F, defined by F|j⟩ = p^{−1/2} Σ_k ω^{jk} |k⟩, plays the role of the qubit Hadamard gate. Working out the conjugation action on the Paulis, we find that Fourier does not precisely swap X and Z, but does so adding an inverse:

X ↦ Z   (8.76)
Z ↦ X^{−1}.   (8.77)

There is also a two-qudit SUM gate which is the direct analogue of the qubit CNOT gate:

SUM |i⟩|j⟩ = |i⟩|i + j⟩,

with conjugation action X ⊗ I ↦ X ⊗ X and I ⊗ Z ↦ Z^{−1} ⊗ Z (while Z ⊗ I and I ⊗ X are unchanged). Again there is an extra inverse in one of the images in the conjugation action. The inverse powers are needed to ensure that the Clifford group gate preserves commutation relations among the Paulis. For instance, Z ⊗ Z does not commute with X ⊗ X, but Z^{−1} ⊗ Z does.

From here, though, we start to see a bigger divergence. There are operators in C_n(p) which don't have any qubit analog, such as scalar multiplication by c ∈ Z_p \ {0}:

S_c |j⟩ = |cj⟩.
Theorem 8.9. The gates F, B, and SUM along with global phase e^{iθ} I generate C_n(p).

Proof. First, we can show that F, B, SUM, S_c (for all c ≠ 0), P_n(p) and e^{iθ} I generate C_n(p) using essentially procedure 6.5. In the algorithm, we can replace H with F, CNOT with SUM, and R_{π/4} with B. SWAP can be realized as

SWAP_{i,j} = (S_{−1})_i SUM_{i,j} SUM_{j,i}^{−1} SUM_{i,j}.   (8.89)

The analog of C−Z is (I ⊗ F) SUM (I ⊗ F^{−1}). The symplectic matrices of these gates are quite similar to the qubit versions, but there are some additional minus signs that appear (when the power is an inverse). In the procedure, we will have to use some of them multiple times in order to cancel out terms. For instance, in step 2, we wish to eliminate entries in the first column of A. To do so, use left multiplication by SUM^{p−a_{1j}} to eliminate entry a_{1j}. We also add in S_c, which has the effect of multiplying rows or columns by c or c^{−1}. This is useful in step 1: we can use SWAP to move a non-zero entry into the upper left corner, but we then need S_c to make that entry 1.
The next step is to show that we don't need S_c. I claim that gates of this form can be generated using only F, B, and P_1(p). Let Q = FBF^{−1}. Then

Q : X ↦ X   (8.90)
    Z ↦ ZX^{−1}.   (8.91)
The action of B^r Q^s B^m Q^n is then

X ↦ ω^a X^{1−ms} Z^{m+r−rms}   (8.92)
Z ↦ ω^b X^{−n−s+mns} Z^{1−(m+r)n+(mn−1)rs}.   (8.93)

The powers a and b don't matter because we can get rid of the powers of ω using an appropriate element of P_1(p). Let r = −c^{−1}, s = 1 − c, m = 1, and n = 1 − c^{−1}, so rs = rms = n = m + r. Then

X ↦ X^c   (8.94)
Z ↦ Z^{c^{−1}},   (8.95)

which is the conjugation action of S_c.
Finally, we need the Pauli group itself. Using B|a⟩ = ω^{a(a−1)/2} |a⟩ and F²|a⟩ = |−a⟩, we find

F² B F² B^{−1} |a⟩ = F² B ω^{−a(a−1)/2} |−a⟩ = ω^{a(a+1)/2 − a(a−1)/2} |a⟩ = ω^a |a⟩.   (8.96)

Thus, Z = F² B F² B^{−1}, and X = F^{−1} Z F, which then can be used to generate the full Pauli group.
Chapter 9
By now, you know (or should, if you've been paying attention) quite a lot about quantum error-correcting codes. However, there's still a lot more to learn. Some topics, such as topological codes or channel capacity, deserve a chapter or more of their own, and those will be covered in part III. However, there are plenty of other things about QECCs which I don't want to get into at length, but are still interesting or important to know. Since many of them don't fit well into the structure of the previous chapters, I've put them here. Consequently, this chapter is a grab-bag of stuff about QECCs left out of the first 8 chapters. These topics are not really dependent on each other, and only a few of them are essential to later parts of the book. Concatenated codes (section 9.1) will play an important role in the discussion of fault tolerance in part II, and the coherent information (section 9.3) will be needed in chapter 18. The rest of this chapter is optional, consisting of various stand-alone topics.
Definition 9.1. Let Q be an ((n1, K, d1))_{q1} code with encoder E1 and R be an ((n2, q1, d2))_{q2} code with encoder E2. The concatenated code formed from Q and R is an ((n1 n2, K))_{q2} code with encoder E2^{⊗n1} E1.
Q is known as the inner code and R is the outer code.
Notice that a concatenated code gets the size of its logical Hilbert space from Q and the size of the
physical registers from R, whereas the number of registers n is the product of n1 and n2 . The register size
from Q must match the logical Hilbert space size for R so that E2 can be applied to each register of Q.
It is easy to compute a bound on the distance of a concatenated code:
Figure 9.1: In a concatenated code, two codes are combined to produce a larger code. k1 logical qubits (triangles) are encoded in the n1 physical qubits of the inner code (squares), which are in turn individually encoded into the outer code, each using n2 physical qubits (circles).
Proposition 9.1. The concatenated code formed from a distance d1 inner code and a distance d2 outer code has distance d ≥ d1 d2.

Proof. Consider detecting errors with the code. If there are fewer than d2 errors on any block of R involved in the concatenated code, we will detect the error using that code alone. With at least d2 errors on a single block, however, it may be possible to have an error that is undetectable by R alone. Formally, if wt E < d2, then Π_R E|ψ⟩ = c(E)|ψ⟩ for all codewords |ψ⟩, where Π_R is the projector on R. On the other hand, there must exist some E with wt E = d2 and some codeword |ψ⟩, such that Π_R E|ψ⟩ ≠ c(E)|ψ⟩.

But that process only changes a single register of Q, which by itself would be detected using the properties of Q (provided d1 > 1). In particular, let F = D2 Π_R E be an error acting on a register of Q, where D2 is the decoder for R. From above, if wt E < d2, then F is proportional to I, whereas if wt E ≥ d2, it is possible that F is not proportional to I.

Suppose now that we have some error of weight w acting on the concatenated code as a whole. Break it up into a tensor product of errors, with E_i acting on the ith block of R. (We may assume the overall error is a tensor product or even a Pauli, by corollary 2.5.) Let F_i = D2 Π_R E_i. No matter how we divide the errors between blocks of R, there can be at most ⌊w/d2⌋ blocks with d2 or more errors per block, so at most ⌊w/d2⌋ of the F_i are not proportional to the identity. That is, the error F = ⊗_{i=1}^{n1} F_i has weight at most ⌊w/d2⌋.

F is an error acting on Q when it is considered without concatenation. If wt F < d1, then Q by itself can detect F. In order to fool Q, therefore, we need an error acting on the concatenated code with w/d2 ≥ d1. Therefore, the concatenated code can detect any error of weight d1 d2 − 1 or less, so by theorem 2.9, the distance of the concatenated code is at least d1 d2.
Note that it is possible for the concatenated code to have distance greater than one would expect from proposition 9.1. For instance, the 9-qubit code can be thought of as the concatenation of an inner 3-qubit code correcting phase errors with an outer 3-qubit code correcting bit flip errors. Each of the 3-qubit codes by itself has distance 1 since it can only detect or correct one kind of error, but together they form a 9-qubit code with distance 3.

A common situation is when the inner code and the outer code are the same, which requires that the register size equal the encoded Hilbert space size. Therefore, we can concatenate an ((n, q, d))_q code with itself to get an ((n², q, d²))_q code. We can repeat this process to get bigger and bigger codes.
Definition 9.2. Let Q_1 = Q be an ((n, q, d))_q code, and let Q_k be the ((n^k, q, d^k))_q code obtained by concatenating Q_{k−1} as the inner code with Q as the outer code. We say that Q_k is a code involving k levels of concatenation. The physical qudits form level 0 qudits, the outer code for Q_k is level 1 of concatenation, and its logical qudits are level 1 qudits. The logical qudits for the outer code of the Q_ℓ that appears in the recursive definition of Q_k are level k − ℓ + 1 qudits, and the logical qudits for the overall code are level k qudits.
Procedure 9.1. Let the inner code be an [[n1, 1, d1]] code with stabilizer S1, and let the outer code be an [[n2, k, d2]] code with stabilizer S2 and logical Paulis P̄. Then the stabilizer S of the concatenated code formed from these two codes is given as follows:

1. For each generator M ∈ S2, include in S the Pauli M_i consisting of M acting on the ith block of n2 qubits in the code tensored with the identity on all other blocks.

2. For each generator M ∈ S1, replace each single-qubit Pauli P_i in its tensor product decomposition (i.e., acting on the ith physical qubit) with P̄_i, the corresponding logical Pauli from the outer code acting on the ith block of n2 qubits in the concatenated code. Take the tensor product of all the P̄_i operators for a single M ∈ S1 and include that in S.
X Z Z X I I I I I I I I I I I I I I I I I I I I I
I X Z Z X I I I I I I I I I I I I I I I I I I I I
X I X Z Z I I I I I I I I I I I I I I I I I I I I
Z X I X Z I I I I I I I I I I I I I I I I I I I I
I I I I I X Z Z X I I I I I I I I I I I I I I I I
I I I I I I X Z Z X I I I I I I I I I I I I I I I
I I I I I X I X Z Z I I I I I I I I I I I I I I I
I I I I I Z X I X Z I I I I I I I I I I I I I I I
I I I I I I I I I I X Z Z X I I I I I I I I I I I
I I I I I I I I I I I X Z Z X I I I I I I I I I I
I I I I I I I I I I X I X Z Z I I I I I I I I I I
I I I I I I I I I I Z X I X Z I I I I I I I I I I
I I I I I I I I I I I I I I I X Z Z X I I I I I I
I I I I I I I I I I I I I I I I X Z Z X I I I I I
I I I I I I I I I I I I I I I X I X Z Z I I I I I
I I I I I I I I I I I I I I I Z X I X Z I I I I I
I I I I I I I I I I I I I I I I I I I I X Z Z X I
I I I I I I I I I I I I I I I I I I I I I X Z Z X
I I I I I I I I I I I I I I I I I I I I X I X Z Z
I I I I I I I I I I I I I I I I I I I I Z X I X Z
X X X X X Z Z Z Z Z Z Z Z Z Z X X X X X I I I I I
I I I I I X X X X X Z Z Z Z Z Z Z Z Z Z X X X X X
X X X X X I I I I I X X X X X Z Z Z Z Z Z Z Z Z Z
Z Z Z Z Z X X X X X I I I I I X X X X X Z Z Z Z Z
X X X X X X X X X X X X X X X X X X X X X X X X X
Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z Z
Table 9.1: The stabilizer for the [[25, 1, 9]] code formed by concatenating the 5-qubit code with itself. The last two rows are the logical X and Z operators.
3. Form the group generated by the elements described in the previous steps.
One can also determine the logical Paulis for the concatenated code by the same process as in step 2: take a
logical Pauli for S1 and replace each Pauli in its tensor product decomposition with the corresponding logical
Pauli for S2 .
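Procedure 9.1 is easy to automate. The following Python sketch regenerates table 9.1 for the 5-qubit code concatenated with itself; it works with Pauli strings and ignores overall phases, which is harmless here because the generators contain no Y's, and it uses the transversal logical operators X̄ = X^{⊗5}, Z̄ = Z^{⊗5} of the 5-qubit code:

```python
inner = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]   # 5-qubit code generators (S1 = S2)
logical = {"I": "IIIII", "X": "XXXXX", "Z": "ZZZZZ"}  # outer-code logical Paulis

gens = []
# Step 1: a copy of the outer stabilizer S2 on each block of 5 qubits.
for block in range(5):
    for M in inner:
        gens.append("IIIII" * block + M + "IIIII" * (4 - block))
# Step 2: inner generators (S1) with each single-qubit Pauli replaced by the
# corresponding logical Pauli of the outer code.
for M in inner:
    gens.append("".join(logical[c] for c in M))

for g in gens:
    print(g)
print(len(gens), "generators")   # 24 generators for a [[25, 1]] code
```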
An example of a concatenated stabilizer code is the [[25, 1, 9]] code formed by concatenating the 5-qubit
code with itself. The resulting stabilizer is given in table 9.1. Note that the 25-qubit code is degenerate.
Indeed, a quick consideration of procedure 9.1 should convince you that concatenated stabilizer codes will
frequently be degenerate: The stabilizer of a concatenated code contains many elements that are drawn from
a single copy of the stabilizer of the outer code. These elements of S have weight bounded by $n_2$, which can easily be lower than the distance $d_1 d_2$ of the concatenated code.
Another interesting case is when the outer code is a qubit code, because then the resulting concatenated
code is a qubit code as well. For instance, suppose the outer code is an $[[n_2, k_2, d_2]]_2$ code. Then the inner code should work with registers of size $q_1 = 2^{k_2}$. This is a prime power, so we are dealing with a code over $GF(2^{k_2})$. For instance, we can use an $[[n_1, k_1, d_1]]_{2^{k_2}}$ code to get a $[[n_1 n_2, k_1 k_2, d_1 d_2]]_2$ code. (It has $k_1$ logical qudits of size $2^{k_2}$, which corresponds to $k_1 k_2$ logical qubits.) This can be a useful way to find large qubit stabilizer codes with reasonable parameters.
algorithm is appropriate to that code. Now each block of n2 qudits is a valid codeword for R, but the logical
state of each block may have been changed by the errors. If we decode each block, we are left with Q with
some errors on it, and can perform the usual decoding procedure for it.
The advantage of this procedure is that it is efficient if the decoding algorithms used for Q and R are.
It’s also quite straightforward. The disadvantage is that it doesn’t take advantage of the full error correction
capability of the concatenated code. For instance, consider the 125-qubit code formed by concatenating the
25-qubit code of table 9.1 with the 5-qubit code. If a block of the outer 5-qubit code has 2 errors on it,
the correction procedure on that block will fail, leaving an error on the logical qubit. However, the inner
code can only correct four errors, so if five blocks of the outer code are wrong, which can happen with 10 errors in total, then this decoding procedure will make a mistake. The 125-qubit code has distance 27, so it should be able to correct 13 errors. The difference arises because when we correct the inner code and outer codes separately, we throw away some information that would be useful. For instance, if a single block of the five-qubit outer code has a non-trivial syndrome, it is more likely that the corresponding qubit of the inner code has an error on it. This means that if there are only two errors in any given 5-qubit block, the errors
on the inner code act like erasure errors, allowing us to correct more of them. Of course, it’s also possible
that there are 3 errors on a 5-qubit code block, which could produce a logical error with no error syndrome
advertising it, so the decoding process is not necessarily straightforward.
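As a toy classical illustration of the level-by-level idea (a sketch of the analogous procedure for a concatenated repetition code, not of the quantum decoder itself), each level is decoded by majority vote before moving outward:

```python
def decode(bits):
    """Majority-decode a block of 3**k bits down to a single bit,
    one level of concatenation at a time."""
    if len(bits) == 1:
        return bits[0]
    third = len(bits) // 3
    votes = [decode(bits[i * third:(i + 1) * third]) for i in range(3)]
    return 1 if sum(votes) >= 2 else 0

# Two flips inside one inner block corrupt that block's vote,
# but the outer level outvotes the bad block:
print(decode([0, 0, 0,  0, 0, 0,  1, 1, 0]))   # -> 0
```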
Notice that as we concatenate the same code many times, the fractional distance d/n decreases — with k levels of concatenation of an $[[n_0, 1, d_0]]$ code, $n = n_0^k$ and $d = d_0^k$. Thus, in a very meaningful sense, the
code gets worse at correcting errors as we concatenate many times. However, this is only true if we insist on
correcting the worst case error. For typical errors, the concatenated code does well. For instance, suppose
we consider a depolarizing channel with total error probability p for each qubit (i.e., p/3 chance of each of
X, Y , and Z errors). The probability that a single block of the [[n0 , 1, d0 ]] code fails and has a logical error
is roughly
$$p_1 \approx \binom{n_0}{t+1} p^{t+1} = p_T\, (p/p_T)^{t+1}, \qquad (9.1)$$
where $t = \lfloor (d_0 - 1)/2 \rfloor$, $p_T = \binom{n_0}{t+1}^{-1/t}$, and I have assumed that p is small, so that we can neglect terms of order $p^{t+2}$ and higher, and that the code always fails when there are t + 1 errors. If the code can correct some weight t + 1 errors, the logical error rate will be lower.
Now, if we concatenate the $[[n_0, 1, d_0]]$ code with itself, and decode using the simple level-by-level scheme (which, as noted above, is sub-optimal), the probability of the concatenated code having a logical error is
$$p_2 \approx p_T\, (p/p_T)^{(t+1)^2}, \qquad (9.2)$$
and we can show by induction that with k levels of concatenation, the probability of having a logical error is
$$p_k \approx p_T\, (p/p_T)^{(t+1)^k}. \qquad (9.3)$$
When $p < p_T$, the logical error rate rapidly converges to 0 with k. Note that the typical case now has $p n_0^k \gg t^k$ errors occurring in the code. There will be cases where a small number of errors can cause the code to fail, but those are very rare.
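For a concrete feel for equation (9.3), here is a quick numerical sketch for the 5-qubit code ($n_0 = 5$, $t = 1$, so $p_T = 1/10$), comparing a physical error rate below and above $p_T$:

```python
from math import comb

n0, t = 5, 1
pT = comb(n0, t + 1) ** (-1.0 / t)          # threshold rate, here 0.1

def p_logical(p, k):
    """Logical error rate after k levels of concatenation, eq. (9.3)."""
    return pT * (p / pT) ** ((t + 1) ** k)

for p in (0.01, 0.2):
    print(p, [p_logical(p, k) for k in (1, 2, 3)])
# p = 0.01 < pT: the rates fall doubly exponentially in k
# p = 0.2  > pT: the rates blow up instead
```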
other substantial advantages. In particular, the size of the encoding circuit for a convolutional code is usually
linear in the number of logical bits, and there is a natural linear-time decoding algorithm as well for many
convolutional codes.
In return, convolutional codes generally give up a bit of error protection, as logical bits are effectively localized, meaning an error affecting a small number of bits can change a logical bit. However, a well-designed
convolutional code should also have the property that a local error only changes a few logical bits. This is
in contrast to block codes, where all the logical bits are typically spread out over all the physical bits. This
means a large physical error is necessary to cause a logical error, but when a logical error does occur, there
is no protection against the error changing all of the logical bits.
The situation for quantum convolutional codes is more complicated. Unfortunately, it turns out not to
be possible to make quantum convolutional codes with all the desirable properties of classical convolutional
codes. Nevertheless, we can achieve some of the nice properties described above.
[Figure: the encoding circuit for a convolutional code. Input states $|\psi_1\rangle, |\psi_2\rangle, |\psi_3\rangle, \ldots$ are interleaved with fresh $|0\rangle$ ancillas and acted on by an initial unitary W, a repeating sequence of unitaries $U_1, U_2, U_3, \ldots$, and a terminating unitary $V_3$.]
$M_{i,j} = M_j$ except perhaps for the largest and smallest values of i (due to the terminating unitaries $V_i$ and W). Moreover, in many examples of convolutional codes (though not all), the weight of $M_j$ is finite. Thus, the stabilizer of a large member of the convolutional code family can be described simply via $r - s$ finite-size Paulis which are then shifted systematically to give the full stabilizer of the code. For the remainder of this subsection, I will specialize to codes that have all of these properties (shift-invariant stabilizer convolutional codes with finite-weight stabilizer generators). I will also largely ignore the termination complication on the last few qubits, focusing instead on the main part of the code far from the end.
Since the shift transformation is so important in the description of convolutional codes, it is useful to have some specific notation for it. I will write D for the operation “shift by r qubits”. Then, for instance, $M_{i,j} = D^{i-1} M_{1,j}$. We can also form polynomials out of D, of the form $\sum_i \alpha_i D^i$ with $\alpha_i \in \mathbb{Z}_2$. We can then define
$$\Bigl(\sum_i \alpha_i D^i\Bigr)(M_{1,j}) = \prod_i M_{i+1,j}^{\alpha_i}. \qquad (9.4)$$
This is also an element of the stabilizer, since it is a product of generators. Indeed, we don’t have to restrict
to just finite-degree polynomials. Even an infinite-degree polynomial in D, also known as a formal Laurent
series, gives us an element of the stabilizer in the limiting case where the code continues forever and never
terminates.
Two examples of convolutional codes are given in tables 9.2 and 9.3. The first is based on the 5-qubit
code, and has r = 5 and s = 1. Its rate is 1/5. The second example is not related to any standard block
code, and has r = 3 and s = 1, with rate 1/3. By looking at the logical Paulis, you can see that both codes
have distance 3. However, note that if there is only one error in each block of 5 (for the code of table 9.2) or
6 (for the code of table 9.3), then the syndrome will uniquely identify the error since it uniquely identifies
that part of the error on each block. The generators $M_0$ and (for the second code) $M_0'$ are produced by W and are not part of the repeated generators $S_i$.
Since both of these codes have a finite extent to both the stabilizer generators and logical Paulis, we can
deduce t by looking at the number of qubits involved in each block. For instance, in the case of the code in
table 9.2, if we pick t = 7, we have enough room to make the 4 new generators and logical Paulis on the 1
new logical qubit have the correct commutation relations with each other and the old generators and logical
[Table 9.2: A convolutional code based on the 5-qubit code, with r = 5 and s = 1. The repeated generators $M_1, \ldots, M_4$ are the patterns $X \otimes Z \otimes Z \otimes X$ on qubits 1–4, 2–5, 3–6, and 4–7, with $DM_1, \ldots, DM_4$ the same patterns shifted by 5 qubits, and so on down the block; $M_0 = Z \otimes X$ is produced by the terminating unitary W. The logical Paulis are $\bar{X}_i = X \otimes X \otimes X \otimes X \otimes X$ and $\bar{Z}_i = X \otimes Z \otimes X$, shifted by 5 qubits for each successive logical qubit.]
[Table 9.3: A convolutional code with r = 3 and s = 1, unrelated to any standard block code. The repeated generators are $M_1 = X\,X\,X\,X\,Z\,Y$ and $M_2 = Z\,Z\,Z\,Z\,Y\,X$, each supported on six consecutive qubits, with $DM_1$ and $DM_2$ the same patterns shifted by 3 qubits, and so on; $M_0 = X\,X\,Z\,Y$ and $M_0' = Z\,Z\,Y\,X$ come from the terminating unitary W. The logical Paulis are $\bar{X}_i = X \otimes Y \otimes Z$ and $\bar{Z}_i = Z \otimes X \otimes Y$, shifted by 3 qubits for each successive logical qubit.]
Paulis. In particular, we can let Ui transform Paulis as follows:
X ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ↦ X ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I    (9.5)
Z ⊗ X ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ↦ Z ⊗ X ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I    (9.6)
I ⊗ I ⊗ I ⊗ X ⊗ I ⊗ I ⊗ I ↦ I ⊗ X ⊗ X ⊗ X ⊗ X ⊗ X ⊗ I    (9.7)
I ⊗ I ⊗ I ⊗ Z ⊗ I ⊗ I ⊗ I ↦ I ⊗ X ⊗ I ⊗ Z ⊗ I ⊗ X ⊗ I    (9.8)
I ⊗ I ⊗ Z ⊗ I ⊗ I ⊗ I ⊗ I ↦ X ⊗ Z ⊗ Z ⊗ X ⊗ I ⊗ I ⊗ I    (9.9)
I ⊗ I ⊗ I ⊗ I ⊗ Z ⊗ I ⊗ I ↦ I ⊗ X ⊗ Z ⊗ Z ⊗ X ⊗ I ⊗ I    (9.10)
I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ Z ⊗ I ↦ I ⊗ I ⊗ X ⊗ Z ⊗ Z ⊗ X ⊗ I    (9.11)
I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ I ⊗ Z ↦ I ⊗ I ⊗ I ⊗ X ⊗ Z ⊗ Z ⊗ X    (9.12)
The action on the other independent Paulis can be anything with the right commutation relations. This Clifford will then leave the generators and logical Paulis created by the previous $U_{i-1}$ unchanged while encoding the new $\bar{X}$, $\bar{Z}$, and 4 new stabilizer generators into their final form. Similarly, we can pick t = 6 for the code of table 9.3.
As you can see from the examples, the stabilizer generators typically act on a stretch of qubits longer
than r, but we can still describe each stabilizer element with a shorthand over r “qubits” by using D. In
particular, a generator can be expressed as a tensor product of r polynomials in D with coefficients from
{I, X, Y, Z}. I will denote the tensor product of polynomials corresponding to P by P(D). Thus, in the
code of table 9.2, the polynomials for the generators of S1 are
$$M_1(D) = X \otimes Z \otimes Z \otimes X \otimes I \qquad (9.13)$$
$$M_2(D) = I \otimes X \otimes Z \otimes Z \otimes X \qquad (9.14)$$
$$M_3(D) = XD \otimes I \otimes X \otimes Z \otimes Z \qquad (9.15)$$
$$M_4(D) = ZD \otimes XD \otimes I \otimes X \otimes Z \qquad (9.16)$$
The polynomials for the generators of $S_1$ for the code in table 9.3 are
$$M_1(D) = (X + XD) \otimes (X + ZD) \otimes (X + YD) \qquad (9.17)$$
$$M_2(D) = (Z + ZD) \otimes (Z + YD) \otimes (Z + XD). \qquad (9.18)$$
To find the versions of these generators shifted by r, just multiply by D.
When the set of stabilizer generators acts on more than r qubits, it will overlap with the same set of
generators shifted by D. Thus, in order to determine if a purported stabilizer for a quantum convolutional
code actually commutes, it is necessary to compare not just generators within the same set Si , but also ones
in different sets. Using the shorthand provides a natural criterion:
Proposition 9.2. $D^i P$ and $D^j Q$ commute for all i, j iff
$$v_P(D) \odot v_Q(D^{-1}) = x_P(D) \cdot z_Q(D^{-1}) + z_P(D) \cdot x_Q(D^{-1}) = 0. \qquad (9.19)$$
Proof. Suppose we write $v_P(D) = \sum_i v_{P_i} D^i$ and $v_Q(D^{-1}) = \sum_j v_{Q_j} D^{-j}$. Then
$$v_P(D) \odot v_Q(D^{-1}) = \sum_{i,j} v_{P_i} \odot v_{Q_j}\, D^{i-j} \qquad (9.20)$$
$$= \sum_{k=-\infty}^{+\infty} \sum_i v_{P_i} \odot v_{Q_{i-k}}\, D^k. \qquad (9.21)$$
The coefficient of $D^0$, $\sum_i v_{P_i} \odot v_{Q_i}$, we can recognize immediately as $v_P \odot v_Q = c(P, Q)$. Similarly, the coefficient of $D^k$ is
$$\sum_i v_{P_i} \odot v_{Q_{i-k}} = c(P, D^k Q). \qquad (9.22)$$
Thus, $v_P(D) \odot v_Q(D^{-1}) = 0$ as polynomials iff P and $D^k Q$ commute for all k. Since $c(D^i P, D^j Q) = c(P, D^{j-i} Q)$, this proves the proposition.
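For finite-weight generators, proposition 9.2 can be checked directly by testing the commutation of P with every shift $D^k Q$ over the finite range of overlapping shifts. A minimal Python sketch (the two generators below are $M_1$ and $M_4$ of the code of table 9.2, written out as qubit-index-to-Pauli maps derived from eqs. (9.13) and (9.16)):

```python
def commutes(P, Q):
    """Two Paulis commute iff they differ on an even number of
    overlapping qubits (each is a dict {qubit index: 'X'|'Y'|'Z'})."""
    return sum(P[i] != Q[i] for i in P.keys() & Q.keys()) % 2 == 0

def shift(Q, r, k):
    """Apply D^k: shift the Pauli by k*r qubits."""
    return {i + k * r: p for i, p in Q.items()}

def check(P, Q, r, kmax=10):
    """Check that P commutes with D^k Q for every relevant shift k."""
    return all(commutes(P, shift(Q, r, k)) for k in range(-kmax, kmax + 1))

M1 = {0: 'X', 1: 'Z', 2: 'Z', 3: 'X'}           # from eq. (9.13)
M4 = {3: 'X', 4: 'Z', 5: 'Z', 6: 'X'}           # from eq. (9.16)
print(check(M1, M4, r=5), check(M1, M1, r=5))   # True True
```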
9.2.2 Decoding of Convolutional Codes
One advantage of quantum convolutional codes is that by definition they have linear-time encoding algorithms. Note that this is true even for non-stabilizer codes, since each $U_j$, $V_i$, and W acts on a constant number of qubits. Even for a stabilizer convolutional code, an O(n) encoder is a significant improvement over the $O(n^2/\log n)$ size of the encoding circuit for a generic stabilizer code.
Even more dramatic, and not as obvious, is that convolutional codes often have a linear-time decoding
algorithm as well. Since general stabilizer codes could take exponential time to decode, this is a big deal.
The algorithm is known as the quantum Viterbi algorithm. The Viterbi algorithm (classical or quantum) is
an example of dynamic programming. The trick to it is to work in chunks of size r and try to figure out
the most likely error up to qubit ni . What we’d like to do then is to move to the next block of r qubits
and try to figure out the most likely error up to qubit ni+1 = ni + r. Ideally, we would just say, “Oh, the
most likely error on ni+1 qubits is just the most likely error on the first ni qubits followed by the most
likely error on the next r qubits.” Then we could just systematically work our way through each set of r
qubits picking the most likely errors and get a decoding algorithm running in O(n) time. Well, we could
say that, but sometimes we’d be wrong. The problem is that (for a stabilizer code), there may be stabilizer
generators that act on the first ni qubits but also act on later qubits. These generators are not as helpful as
they might be when determining the most likely error on the first ni qubits because there will be multiple
different choices of error on the next r qubits that get the error syndrome right. The problem is that some
choices of error on the next r qubits might be extremely unlikely, but to know that, we would need to look
at the error syndrome on more generators, which means involving even more qubits, and so on.
The solution is not to pick the single most likely error, but instead the most likely error ending in each possible way. That is, we need to consider all the different ways the error on the first $n_i$ qubits might affect any stabilizer generators that continue into the next set of r qubits. Here's where we need to make an assumption: specifically, we will assume we have a convolutional code with finite-weight stabilizer generators, all bounded by a constant w. (Note that w might or might not be the same as t in general.) That means that the stabilizer generators in $S_{i+1}$ only touch the last $w - r$ qubits out of the first $n_i$ qubits. That's important, because it means that if we want to find an error that satisfies generators only in sets $S_1$ through $S_i$, there are only a constant number of ways, specifically $4^{w-r}$, that the error can end. To extend to the most likely error satisfying the syndrome for $S_{i+1}$ and ending in a particular way, we try out all the different ways an error on $n_i$ qubits can end matched with all the compatible errors on the next r qubits and pick the most likely combination.
Putting this together, we get the following algorithm to find the most likely or lowest-weight error
consistent with the measured error syndrome for all generators:
Algorithm 9.2 (quantum Viterbi algorithm). Let $\{Q_i\}$ be a stabilizer quantum convolutional code with stabilizer generators $I^{\otimes(n_i - w)} \otimes M_{i,j}$, with $j = 1, \ldots, r - s$, $\operatorname{wt} M_{i,j} \le w$, $r \le w$. Given an error syndrome for some particular instance of the code with n total physical qubits, use the following algorithm to determine the most likely (or lowest-weight) error:
1. Create a table of all possible Pauli errors on $w - r$ qubits, with two entries $(N_P, p_P)$ for each Pauli P. Initialize by simply letting $N_P = P$ and putting the probability (or weight) of P as $p_P$. If there are any stabilizer generators with support on just the first $w - r$ qubits, set the probability of any Pauli incompatible with the error syndrome to 0, or set the weight to $\infty$.
2. Set $n_0 = w - r$ and $i = 0$.
3. Repeat the following two steps until all n qubits have been covered:
(a) For each Pauli P on qubits $n_{i+1} - (w - r) + 1$ to $n_{i+1}$, run through all possibilities for Q on qubits $n_i - (w - r) + 1$ to $n_i$ and R on qubits $n_i + 1$ through $n_{i+1} - (w - r)$. (If $w - r \ge r$, we don't need to run over R and we only consider Q which is consistent with P on qubits $n_{i+1} - (w - r) + 1$ through $n_i$.) If $N_Q \otimes R \otimes P$ has the correct error syndrome for all generators $M_{i+1,j}$ ($j = 1, \ldots, r - s$), calculate the probability (or weight) of $N_Q \otimes R \otimes P$ by multiplying the probability $p_Q$ times the probability of $R \otimes P$ (or by adding the weights $p_Q$ and $\operatorname{wt} R + \operatorname{wt} P$). Choose Q and R such that the probability is highest or the weight is the lowest. Create an updated table with $N'_P = N_Q \otimes R \otimes P$ and $p'_P$ the probability or weight of $N'_P$. If no Q and R gives the correct error syndrome, set $p'_P$ to probability 0 or weight $\infty$.
(b) Switch the updated table $(N'_P, p'_P)$ into the main table $(N_P, p_P)$, increase i by 1, and let $n_{i+1} = n_i + r$.
4. Look through the table, running over all P, to find the value for which the probability $p_P$ is a maximum or the weight is a minimum. Output that corresponding entry $N_P$ as the most likely (or lowest-weight) error for this error syndrome.
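To make the dynamic programming concrete, here is a minimal Python sketch of the idea, specialized to r = 1 (one new qubit per step) and lowest-weight decoding; the three-qubit example at the end is a made-up toy input, not a code from the text:

```python
PAULIS = "IXYZ"

def anticommute(p, q):
    return p != "I" and q != "I" and p != q

def syndrome_bit(err, gen):
    """Syndrome bit of err against gen (strings over {I,X,Y,Z})."""
    return sum(anticommute(e, g) for e, g in zip(err, gen)) % 2

def viterbi_decode(n, gens, target, w):
    """Lowest-weight n-qubit error matching the target syndrome bits.

    Each generator is supported on a window of at most w consecutive
    qubits (assume w >= 2).  The table is keyed by the error on the
    last w-1 qubits, the only part later generators can still see."""
    ends = [max(i for i, c in enumerate(g) if c != "I") for g in gens]
    table = {"I" * (w - 1): ("", 0)}    # tail -> (error so far, weight)
    for q in range(n):
        new = {}
        for tail, (err, wt) in table.items():
            for p in PAULIS:
                e = err + p
                # enforce every generator whose support ends at qubit q
                if any(syndrome_bit(e, g) != b for g, b, end
                       in zip(gens, target, ends) if end == q):
                    continue
                t, nwt = (tail + p)[-(w - 1):], wt + (p != "I")
                if t not in new or nwt < new[t][1]:
                    new[t] = (e, nwt)
        table = new
    return min(table.values(), key=lambda v: v[1])[0]

print(viterbi_decode(3, ["ZZI", "IZZ"], [1, 1], w=2))   # -> IXI
```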
Convolutional codes are looking pretty good right now: they have a non-zero asymptotic rate, a linear-
time encoder, and can also have a linear-time decoder. Unfortunately, there is one big problem: they cannot
have more than a constant distance. This is true for any quantum convolutional code, regardless of whether
the Ui repeat, or whether it is a stabilizer code, or if the stabilizer generators have finite extent. This issue
arises because qubits are not spread out enough by the encoding circuit. In particular, there is a small
bottleneck, the finite memory of the encoder, which prevents too much information about old logical qubits
from being involved in the later physical qubits.
Proposition 9.3. Let $\{Q_j\}$ be a quantum convolutional code (not necessarily a stabilizer code). Then the distance of $Q_j$ is less than c for all j, where c is the constant $c = r\lceil (3t + r)/s \rceil$.
Proof. When the unitaries $U_j$ and $V_j$ are t-qubit unitaries, choose i to be an arbitrary value at least $\lceil t/r \rceil$, so $U_{i+1}$ does not act on the first $n_1$ qubits. Also, $U_i$ does not act on qubit $n_i + 1$ or any later qubit. Let $i' > i$ be such that $k_{i'} - k_i > 3t + r$, and let $i'' = i' + \lceil t/r \rceil$, so $U_{i''+1}$ does not act on the first $n_{i'}$ qubits. Finally, pick any $i''' \ge i''$. We can write the encoder as $U_C U_B U_A$, where $U_A = (U_i \cdots U_1) W$, $U_B = U_{i''} \cdots U_{i+1}$, and $U_C = V_{i'''} (U_{i'''} \cdots U_{i''+1})$.
Consider code $Q_{i'''}$. It has encoding circuit $U_C U_B U_A$. We will imagine fixing all logical qubits less than or equal to $k_i$ and greater than $k_{i'}$ while varying the logical qubits $k_i + 1$ to $k_{i'}$. For instance, we can let the fixed qubits be all $|0\rangle$. Since $U_A$ does not act on physical qubits $n_i + 1$ or later, it does not involve the varying logical qubits. Let $|\psi\rangle = U_A |0 \ldots 0\rangle$. Now let us consider erasure errors E acting only on the physical qubits from $n_i + 1$ to $n_{i'}$. For the QECC conditions to hold, we must have, for $x \neq y$ basis states for logical qubits $k_i + 1$ through $k_{i'}$,
$$\bigl(\langle\psi| \otimes \langle x|\bigr)\, U_B^\dagger E\, U_B \bigl(|\psi\rangle \otimes |y\rangle\bigr) = 0, \qquad (9.28)$$
since $U_C$ acts only on qubits not involved in E and $U_A$ acts only on logical qubits before $k_i$. Now, $|\psi\rangle$ could be an entangled state between those qubits which are acted on by $U_B$ and those which are not, but even if it is maximally entangled, the Schmidt rank is at most $2^t$, since t is the largest number of qubits of $|\psi\rangle$ that $U_B$ could act on. In particular, there exists a unitary V acting only on qubits not acted on by $U_B$ such that $V|\psi\rangle = |0 \ldots 0\rangle \otimes |\psi'\rangle$, with $|\psi'\rangle$ a 2t-qubit state, and with $U_B$ not acting on the first tensor factor. We thus have
$$\bigl(\langle\psi'| \otimes \langle x|\bigr)\, U_B^\dagger E\, U_B \bigl(|\psi'\rangle \otimes |y\rangle\bigr) = 0. \qquad (9.29)$$
Moreover, only qubits up to $n_{i''}$ can be involved here too, since $U_B$ cannot act on qubits $n_{i''} + 1$ or greater. Thus, equation (9.29) only involves $2t + r(i'' - i)$ qubits.
Finally, consider the case where $E = |0 \ldots 0\rangle\langle 0 \ldots 0|$ on qubits $n_i + 1$ through $n_{i'}$, a total of $n_{i'} - n_i = r(i' - i)$ physical qubits. Equation (9.29) then becomes an inner product
$$\langle \phi_x | \phi_y \rangle = 0, \qquad (9.30)$$
where $|\phi_x\rangle$ is defined by $|\phi_x\rangle \otimes |0 \ldots 0\rangle = |0 \ldots 0\rangle\langle 0 \ldots 0|\, U_B (|\psi'\rangle \otimes |x 0 \ldots 0\rangle)$.
$|\phi_x\rangle$ involves $2t + r(i'' - i') = 2t + r\lceil t/r \rceil < 3t + r$ qubits, and all the different $|\phi_x\rangle$ must be orthogonal. But the number of different such states is $2^{k_{i'} - k_i}$ with $k_{i'} - k_i > 3t + r$. This is a contradiction, so we have found an erasure error of weight $r(i' - i)$ for which the QECC conditions are not satisfied. Since $k_{i'} - k_i = s(i' - i)$, we have shown that the distance of the convolutional code is at most $r\lceil (3t + r)/s \rceil$.
Basically, convolutional codes can only correct errors that do not cluster too much. Even randomly
placed errors will have occasional clusters of size larger than c, so convolutional codes also are not good
against them. Nevertheless, convolutional codes can still be useful if we’re willing to settle for something less
than correcting all errors. Typically, we restrict attention to codes which are non-catastrophic, meaning that an uncorrectable error confined to a region only affects logical qubits near that region, while logical qubits far away remain OK. Non-catastrophic convolutional codes do allow errors to happen, but they successfully
quarantine them, limiting the damage.
Definition 9.4. Suppose the decoder for a stabilizer convolutional code $\{Q_i\}$ assigns the Pauli $P_\sigma$ to syndrome $\sigma$. Then for any error Q, $R_Q = P_{\sigma(Q)}^\dagger Q$ is a logical Pauli operator. The decoder is catastrophic if there exists some finite-weight Pauli Q such that $R_Q$ has unbounded weight as $i \to \infty$. A stabilizer convolutional code is catastrophic if all decoders for it are catastrophic.
information quantum, we should allow for the possibility of entanglement between Q and R. However, the
joint system of Q and R might not be pure, so we could also consider an environment E, as in the final state
of figure 9.3. We might as well consider the whole universe outside of Q and R to be the environment; that
is, we can let the joint state of RQE be a pure state. We imagine that R (being an idealized system) has
remained isolated so that all its entanglement is with Q, but Q has not, so it may be entangled with E.
If Q and R are maximally entangled, it makes sense to say that the amount of quantum information is n
when each system has n qubits, so Q has n qubits of quantum information “about” R. If $\rho_{QR}$ is an arbitrary pure state, the amount of quantum information in Q about R should then just be the entanglement entropy between the two, S(Q) (which I am using as shorthand for $S(\rho_Q)$).
Another case where the answer is obvious is when Q can be decomposed into a tensor product between
a subsystem Q1 which is entangled only with R and a second subsystem Q2 which is entangled only with E.
In that case, the amount of quantum information in Q about R is just
$$S(Q_1) = S(Q) - S(Q_2) = S(Q) - S(E). \qquad (9.31)$$
This will turn out to be the correct answer more generally, and it makes sense. S(E) is the amount of
information that E “knows” about Q and if the environment knows about the state of a system, it can’t be
quantum. (For instance, because a measurement on the environment could collapse the logical state.)
We’d like to phrase our formula without talking about E so that it is just a property of the state of Q
and R. Since the global state is pure, S(E) = S(RQ). Then we have the following definition to quantify the
amount of quantum information in a system.
Definition 9.5. Given a quantum system Q entangled with a reference system R in the state $\rho_{RQ}$, the coherent information in Q is
$$I_c(\rho_{RQ}) = S(\rho_Q) - S(\rho_{RQ}). \qquad (9.32)$$
Let $\mathcal{E}$ be a quantum channel taking system L into Q. The initial state of L and R is a pure state $\rho$. Then the coherent information of $\mathcal{E}$ for $\rho$ is
$$I_c(\mathcal{E}, \rho) = I_c\bigl((I \otimes \mathcal{E})(\rho)\bigr). \qquad (9.33)$$
The channel set-up realizes the situation we discussed above: R remains isolated and does not go through the channel $\mathcal{E}$ or interact with the environment E, but Q does and therefore may lose some coherence to E. The coherent information of $\mathcal{E}$ on $\rho$ then quantifies (as we will see shortly) how much quantum information about R remains in Q.
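As a quick numerical illustration of definition 9.5 (a sketch using the maximally entangled input state and a single-qubit depolarizing channel; the specific example is mine, not from the text):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0]).astype(complex)

def entropy(rho):
    """Von Neumann entropy in bits."""
    vals = np.linalg.eigvalsh(rho)
    vals = vals[vals > 1e-12]
    return float(-np.sum(vals * np.log2(vals)))

phi = np.zeros(4, dtype=complex)
phi[0] = phi[3] = 1 / np.sqrt(2)               # maximally entangled RL state
rho_RL = np.outer(phi, phi.conj())

def coherent_info(p):
    """I_c(E, rho) = S(rho_Q) - S(rho_RQ) for the depolarizing channel."""
    kraus = [np.sqrt(1 - p) * I2] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]
    rho_RQ = sum(np.kron(I2, K) @ rho_RL @ np.kron(I2, K).conj().T
                 for K in kraus)
    rho_Q = np.trace(rho_RQ.reshape(2, 2, 2, 2), axis1=0, axis2=2)
    return entropy(rho_Q) - entropy(rho_RQ)

print(coherent_info(0.0))    # 1.0: one full qubit of quantum information
print(coherent_info(0.19))   # close to 0: little quantum information left
print(coherent_info(0.30))   # negative for strong depolarizing noise
```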
This suggests that it is helpful to consider not just the state of Q after the channel, but also the state of E. The channel that maps L to E is one that tells us a lot about the error correction properties of $\mathcal{E}$:
Definition 9.6. Let $\mathcal{E}$ be a quantum channel taking system L into Q which can be purified into U using an environment E. Then the channel produced by performing U on L and discarding Q maps L to E. It is written $\hat{\mathcal{E}}$ and is called the complementary channel.
If you’re familiar with classical information-theoretic quantities, you may recognize the formula for coherent information as the negative of a conditional entropy, $I_c(\rho_{RQ}) = -S(R|Q)$. The conditional entropy is a quantification of the amount of uncertainty remaining in R once Q is specified. From a classical point
of view, it is very peculiar to consider a negative conditional entropy: The coherent information can never
be positive for a classical system! But from a quantum perspective, negative uncertainty actually makes a
certain kind of perverse sense — negative uncertainty means that you are more certain about the state than
would be allowed classically, which doesn’t seem so dissimilar to entanglement providing stronger correlations
than are possible classically, as happens with Bell inequalities.
One property of coherent information is that putting the state through a channel cannot increase it. This
makes it suitable for the various applications we have in mind for it, since quantum information, once lost,
should be gone for good. (Error correction does not replace lost quantum information, it protects it and
spreads it out so that it is not destroyed in the first place.)
Figure 9.3: System L is entangled with reference system R. L goes through a noisy channel to become
system Q after interacting with environment E.
Theorem 9.4 (Quantum data processing inequality). Let $\rho$ be a mixed state on reference system R and quantum system L, and let $\mathcal{E}$ be a quantum channel mapping L to Q. Then
$$I_c(\rho) \ge I_c(\mathcal{E}, \rho),$$
and for a second channel $\mathcal{F}$ mapping Q to Q',
$$I_c(\mathcal{E}, \rho) \ge I_c(\mathcal{F} \circ \mathcal{E}, \rho).$$
Proof. I will only prove the first version, since the second is essentially identical. Let E be the environment used to purify the initial state $\rho$. $\mathcal{E}$ does not act on E; instead, it uses a new environment E′. $\mathcal{E}$ also does not act on R. Then
Theorem 9.5. Let $\mathcal{F}: L \to Q$ be a quantum channel. Let R be a reference system, which must have a dimension at least as large as L. There exists a decoding CPTP map $\mathcal{G}$ with the property $\mathcal{G} \circ \mathcal{F} = I$ iff $I_c(\mathcal{F}, \rho) = S(\rho_L)$ for all pure states $\rho$ on $R \otimes L$.
When we let $\mathcal{F} = \mathcal{E} \circ U$, where U is a partial isometry from the logical Hilbert space L to a larger physical Hilbert space, then the condition that there exists $\mathcal{G}$ such that $\mathcal{G} \circ \mathcal{F} = I$ is immediately equivalent to the definition of a QECC for the case where the set of possible errors is just the set of Kraus operators of the channel $\mathcal{E}$. The normalization gets taken care of automatically because all of the channels U, $\mathcal{E}$, and $\mathcal{G}$ are trace preserving. Therefore, theorem 9.5 gives us the desired information-theoretic version of the QECC conditions.
Proof. Note that $I_c(\rho) = S(\rho_L)$ since $\rho$ is a pure state, so there is no initial environment. Since $I_c(I, \rho) = I_c(\rho)$, the forward direction follows immediately from the data processing inequality, as noted above.
Now suppose $I_c(\mathcal{F}, \rho) = S(\rho_L)$ for all $\rho$. Let us consider the purification of $\mathcal{F}$ to V, mapping L to $Q \otimes E$, and let $\sigma = (I \otimes V)(\rho)$. Since the global state of RQE is a pure state, that means that $S(\sigma_{RQ}) = S(\sigma_E)$ and $S(\sigma_Q) = S(\sigma_{RE})$, and since the initial state $\rho$ is a pure state, $S(L) = S(R)$. Therefore,
$$S(\sigma_{RE}) = S(\sigma_R) + S(\sigma_E),$$
which is true only when $\sigma_{RE} = \sigma_R \otimes \sigma_E$. This is true for all $\rho$, and in particular it is true when $\rho$ is a maximally entangled state for RL.
When $\rho$ is a maximally entangled state, we can prepare arbitrary pure states $|\psi\rangle_L$ on L by performing different projections on R. Since V doesn’t act on R, projections on R commute with the channel. If we perform a projection on R for the final state $\sigma$, the state $\sigma_E$ is unaffected, since $\sigma_{RE} = \sigma_R \otimes \sigma_E$. Thus the state of E is $\sigma_E$ regardless of which pure state $|\psi\rangle_L$ was initially prepared for L.
Now, the effect of $\mathcal{F}$ is to do V, getting state $\sigma$, and then to discard E. Let us think of V as the encoding map of a QECC and discarding E as an erasure error that happens to this code. One version of the QECC conditions (proposition 2.12) says that we can correct for the erasure of E iff the state $\sigma_E$ is independent of $|\psi\rangle_L$. That is true in this case, so there exists a recovery operation $\mathcal{G}$ for which $\mathcal{G} \circ \mathcal{F} = I$. (Here, $\mathcal{F}$ combines both the encoding V and the error, an erasure of E, and recall that normalization automatically follows because $\mathcal{F}$ is trace preserving.)
It’s worth noting that in the proof, all we really need is for $I_c(\mathcal{F}, \rho) = S(\rho_L)$ for a maximally entangled state $\rho$. We don’t actually need to check this condition for all $\rho$, but it follows from the proof that if it’s true for a maximally entangled state, then it is true for all $\rho$.
Part II
Chapter 10
We’ve now discussed extensively the notion of quantum error correction. However, there’s a flaw in the basic
model we’ve been considering. Or rather, there is a lack of flaws: We’ve been assuming that the encoding
and decoding circuits needed for error correction can be performed perfectly, and that errors only occur
between these two steps. This might not be a bad idealization if you’re mainly concerned about transmission
over a noisy communications channel or storage for long amounts of time. These are cases where the errors
introduced by mistakes in the encoding and decoding steps may be much rarer than errors occurring in the
period in between. However, in practice, errors in gates are far from negligible and are likely to remain so for the foreseeable future, so neglecting errors during the encoding and decoding circuits is a poor approximation.
If we want to put together millions or more gates to perform a large quantum computation, the prospect of
getting through the whole circuit without errors is vanishingly unlikely.
To protect quantum computations against errors, we need something more than a quantum error-
correcting code, we need a fault-tolerant protocol, which allows us to perform unitary gates on encoded
states despite errors occurring during the computation. Part II is devoted to developing the theory of
fault-tolerant quantum computation, culminating in the threshold theorem, which says that arbitrarily long
quantum computations are possible provided the error rate per physical gate and time step is below some
constant threshold value. In this chapter, I’ll introduce the basic concepts of fault tolerance. We’ll learn
what it means to compute in a world where no component can be taken for granted, where everyone — every
qubit, every gate — can make a mistake.
As discussed in chapter 1, there are many possible errors that can afflict a quantum state. In part I, the
errors only had one shot at the quantum state: we encode, the errors do their thing (although we might wish
they didn’t), and then we decode and correct the state. For fault tolerance, we will need to deal with more
pervasive errors. Errors can occur between gates or during gates, and even worse, the errors keep happening.
Every time we do another gate, or even if we simply leave a qubit by itself for a little while, a new error can
occur. Dealing with the full range of possible errors is a daunting task, so for the purposes of developing the
theory and analyzing fault-tolerant protocols, we normally work with a simplified model, which I’ll discuss
in this section. We’ll actually need to add another complication to this model before proving the threshold
theorem in chapter 14, and we’ll discuss fault tolerance for even more general error models in chapter 15.
Figure 10.1: A sample circuit broken down into locations. Note that a 2-qubit gate location is a single
location on 2 qubits. Thus, this sample circuit has a total of 8 locations.
Definition 10.1. Consider a quantum circuit, arranged into time steps. Each qubit can be involved in at most one action per time step, but multiple actions can be performed in a single time step, provided they involve different qubits. A location is a single indivisible action in the circuit. We may consider the following different types of locations:
1. A preparation location: The preparation of a single qubit in a particular state, frequently |0i. Some-
times we distinguish between di↵erent types of preparation locations based on the state prepared.
Occasionally we may want a preparation location to encompass the creation of a multiple-qubit entan-
gled state.
2. A gate location: A single gate from the universal gate set used to describe the circuit. Sometimes we may want to distinguish different types of gate location based on the kind of gate appearing (e.g., a
CNOT location or a H location). If the gate is a multiple-qubit gate, the location encompasses all the
qubits involved in the gate, but still just counts as a single location. Usually, we only consider unitary
gates to produce gate locations, but one could also consider gate locations for more general quantum
channels.
3. A wait location or storage location: A single qubit is stored for a unit time with no gate performed on
it.
4. A measurement location: A single qubit is measured with a von Neumann measurement, usually in the
standard basis |0i, |1i. The measurement outcome is captured as classical information. Occasionally we
may want locations corresponding to other kinds of measurements, including possibly multiple-qubit
entangled measurements.
5. A classical computation location: In the basic model for fault tolerance, we omit locations correspond-
ing to classical computations. We assume that classical computation can be performed instantaneously
(and flawlessly) on the outcomes of measurements, including combining the results of multiple mea-
surement locations. The outcomes of the classical computations may later be used to control any other
locations later in the circuit.
Remember, we are just developing the basic model for fault tolerance most often used in analyzing
protocols. Because we’re currently concerned with the task of manipulating quantum states, even giving
ourselves unlimited classical computational power doesn’t make this task trivial. Of course, we shouldn’t
abuse the privilege. For instance, it is definitely cheating to scrap the quantum circuit altogether, and solve
the problem we are interested in using an inefficient classical algorithm — but note that even such a blatant
cheat doesn’t let us store an unknown quantum state indefinitely in the presence of noise. We’ll discuss the
assumption of perfect classical computation along with other deviations from the basic model in chapter 15.
Next, we want to add errors. In the basic model for fault tolerance, we consider that errors at different locations are independent, and furthermore consider a simplified model for the errors. We still want to
include the possibility that any location can go bad.
Definition 10.2. Consider a quantum circuit C divided into locations, as above. An independent error model on the quantum circuit C is a circuit $\tilde{C}$ where every preparation location, gate location, storage location, and measurement location is replaced by a separate quantum channel, arranged in the same way as the original circuit. In an independent stochastic error model, the quantum channel corresponding to a particular location L of C performs the correct action for L with probability $1 - p_L$ and does something else with the same input and output Hilbert space dimensions as L with probability $p_L$. I.e., the channel for a $|0\rangle$ preparation location L will prepare $|0\rangle$ with probability $1 - p_L$ and prepare some other single-qubit state (possibly mixed) with probability $p_L$. The quantum channel corresponding to L is still called a location and may be referred to as $\tilde{L}$, or just L if the distinction between the two is not needed. $p_L$ is called the error probability or error rate of location L.
The circuit $\tilde{C}$ is called a noisy implementation of C. In any particular realization of a noisy circuit $\tilde{C}$ with an independent stochastic error model, for some locations the correct action L is performed, while for
other locations the wrong action is performed. The locations where the wrong thing happens are called
faulty, or they are said to have a fault. We frequently say of a faulty location that a preparation error, gate
error, storage error, or measurement error has occurred, depending on the type of location.
In an independent Pauli error model, a faulty location always performs the correct action followed by a
Pauli channel. In a Pauli error model, a faulty measurement location will flip the measurement outcome bit
with some probability.
A fault path is a set of locations in C. A Pauli fault path is a fault path with a Pauli error associated
with each location in the fault path.
There are a number of things to point out about the independent stochastic error model. The error
model is “independent” because the quantum channels for different locations are separate. In the stochastic version, we can consider that errors occur with a certain probability $p_L$. Then the probability of having faults at two specific locations L and M is $p_L p_M$.
Note, however, that (unless we are dealing with a Pauli error model) we make no prescription for what
happens at faulty locations. In particular, multiple-qubit locations, such as a CNOT gate location, can have
errors which involve both qubits, including errors which entangle the qubits in the wrong way. This still is
considered to have probability pL . The independence only applies to separate locations. This is an essential
part of the error model, since errors that occur during the performance of a gate will generally affect all the
qubits involved in the gate.
Another perhaps surprising observation is that, according to the definition, a faulty location might behave
correctly! The identity is a possible matrix, even for a Pauli channel, so that “error” might turn out not to
be one. In this case, we are essentially overcounting the number of faults, which generally means we are just
being more conservative than we need to, and our conclusions will be correct. (There are occasional cases
where having an identity “error” is actually worse than a real error, for instance if the error would cancel
another previous error, but that just means that we are correct in thinking of the identity “error” location
as a fault.)
The notion of fault path is useful when discussing and computing error rates, particularly (though not
exclusively) for independent stochastic channels. The fault path represents the locations where errors occur
in a particular realization of the circuit. A Pauli fault path adds in additional information about the type
of error occurring at each location, under the assumption that the errors are Pauli errors.
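Here is a minimal sketch of what sampling a realization in this model looks like (the circuit, the rates, and the uniform choice of non-identity Pauli are all made-up illustrations, not prescriptions from the text):

```python
import random

def sample_pauli_fault_path(locations, p):
    """locations: list of (kind, number_of_qubits).  Each location
    independently goes bad with probability p; a faulty location gets
    a uniformly random non-identity Pauli on all of its qubits, so a
    two-qubit location can suffer a correlated two-qubit error."""
    path = {}
    for idx, (kind, nq) in enumerate(locations):
        if random.random() < p:
            pauli = "I" * nq
            while pauli == "I" * nq:
                pauli = "".join(random.choice("IXYZ") for _ in range(nq))
            path[idx] = (kind, pauli)
    return path

circuit = [("prep", 1), ("prep", 1), ("H", 1), ("CNOT", 2),
           ("wait", 1), ("meas", 1), ("meas", 1)]
print(sample_pauli_fault_path(circuit, p=0.3))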
Definition 10.3. A basic model for fault tolerance is as follows: An ideal circuit C is physically realized as a noisy circuit $\tilde{C}$, according to an independent stochastic error model. The error model has the property
that all locations of a given type have the same error probability. In other respects, the quantum channels
used in the error model are arbitrary, and may be incompletely specified. A basic error model can therefore
be summarized in terms of four error probabilities: pP (the error probability for a preparation location),
pG (the error probability for a gate location), pS (the error probability for a storage location), and pM (the
error probability for a measurement location).
Frequently the basic model is simplified even further, either by using an independent Pauli error model
or by making some or all of the four error probabilities equal, or both. Sometimes the basic model is slightly
elaborated by considering different error rates for different types of gate, for instance by letting the error
rate for two-qubit gates be higher than the error rate for one-qubit gates.
Basic models for fault tolerance capture the main effects that are important in developing the theory of
fault tolerance. While a particular physical realization for quantum computation may not fulfill precisely
the assumptions of a basic model, provided it does not violate the assumptions too much, we expect fault-
tolerant protocols to work more or less according to the analysis done for the basic model. We’ll discuss in
detail in chapter 15 what happens when we change the assumptions behind the basic model.
Conversely, if we attempt to simplify the basic model by leaving out parts of it, the analysis can fail badly
for realistic systems. For instance, if we assume the storage error rate is 0, so wait locations are perfect, we
might come up with a circuit that is very sensitive to storage errors and thus fails badly when the storage
error rate becomes non-zero. Nevertheless, it is sometimes interesting to consider models that do include
such simplifications, either because they represent the qualities of a particular physical system of interest or
because they may give us insight into particular aspects of the theory of fault tolerance.
The correct final state is $\text{CNOT}|\psi\rangle$, so the true final state has an error $\text{CNOT}\, E\, \text{CNOT}$ on it, much as we saw for the analysis of the Clifford group in chapter 6. For example, suppose $E = X \otimes I$. Then $\text{CNOT}\, E\, \text{CNOT} = X \otimes X$, so a bit flip error spreads from the control qubit to the target qubit of the CNOT. Whereas only one qubit had an error before the CNOT, now two qubits have errors, even though the gate itself functioned perfectly. This is clearly a problem for a QECC which is designed to correct only a small number of errors — an error in one qubit, initially something the code could correct, might spread quickly to affect more qubits than the code is designed to handle.
Now consider $E = I \otimes Z$. We have $\text{CNOT}\, E\, \text{CNOT} = Z \otimes Z$, meaning the phase flip error has spread
from the target qubit to the control qubit. While a classical CNOT can spread errors from control to target,
the quantum CNOT can also spread errors the other way. This means that error propagation, already a
challenge for classical fault-tolerant computers, is even more pervasive in a quantum computer. In general,
any two-qubit entangling gate will cause errors to propagate in both directions through the gate. (The
SWAP gate is an interesting exception in that it causes errors to switch between the two qubits without
directly causing them to spread. But then, the SWAP gate doesn’t really entangle the two qubits involved
in the gate either.)
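These two conjugation rules are easy to verify numerically; a quick check (using numpy, with the control qubit as the first tensor factor):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.diag([1, -1])
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

# CNOT (X (x) I) CNOT = X (x) X   and   CNOT (I (x) Z) CNOT = Z (x) Z
assert np.allclose(CNOT @ np.kron(X, I2) @ CNOT, np.kron(X, X))
assert np.allclose(CNOT @ np.kron(I2, Z) @ CNOT, np.kron(Z, Z))
```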
10.1.3 Example Transversal Gate
Single-qubit gates don’t cause error propagation, so circuits composed of single-qubit gates will behave much
better than general circuits involving multiple-qubit gates. For instance, consider the seven-qubit code. It
turns out that the Hadamard transform applied to all 7 qubits of the code will perform the logical Hadamard
transform:
$$H^{\otimes 7} |\bar{0}\rangle = \frac{1}{\sqrt{2}} \bigl(|\bar{0}\rangle + |\bar{1}\rangle\bigr) \qquad (10.2)$$
$$H^{\otimes 7} |\bar{1}\rangle = \frac{1}{\sqrt{2}} \bigl(|\bar{0}\rangle - |\bar{1}\rangle\bigr) \qquad (10.3)$$
(You can check this yourself, if you like.) Imagine we have a codeword $|\bar{\psi}\rangle = \alpha|\bar{0}\rangle + \beta|\bar{1}\rangle$, and consider what happens if there is a single-qubit error on the codeword, say a bit flip on qubit 3. Then, as above, we find that the state post-Hadamard is
$$(H^{\otimes 7} X_3 H^{\otimes 7}) H^{\otimes 7} |\bar{\psi}\rangle = Z_3 H^{\otimes 7} |\bar{\psi}\rangle. \qquad (10.4)$$
The single-qubit Hadamard transforms have changed the type of error from a bit flip to a phase flip, but
crucially, it is still a single-qubit error. Before the Hadamard transforms, the code could correct the error,
and the code can still correct the error after the Hadamard transforms.
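One piece of this claim is quick to check by machine: transversal Hadamard exchanges X and Z on every qubit, and (assuming the standard generators of the seven-qubit code) this just permutes the stabilizer generators among themselves, so codewords are mapped to codewords:

```python
# Transversal H maps X <-> Z on each qubit; for the seven-qubit code
# this permutes the (standard) stabilizer generators among themselves.
steane = ["IIIXXXX", "IXXIIXX", "XIXIXIX",
          "IIIZZZZ", "IZZIIZZ", "ZIZIZIZ"]
swap_xz = str.maketrans("XZ", "ZX")
assert {g.translate(swap_xz) for g in steane} == set(steane)
```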
This is the key idea that takes us a long way towards fault tolerance. The tensor product of 7 Hadamard
transforms is an example of a transversal gate, a concept that we’ll explore in much more detail in chapter 11.
Of course, single-qubit gates alone won’t get us very far, so we’ll also see how to design multiple-qubit
transversal gates. These gates will cause some error propagation, but the error propagation will be carefully
controlled to be of a form we can handle. It turns out that even multiple-qubit transversal gates are still
not enough to provide a universal set of gates on the encoded qubits, so we’ll need to add some additional
techniques discussed in chapter 13 to have a full range of control.
If we have a quantum computation which takes qubits as input and also outputs qubits, this won’t work.
If possible, we should demand to be given qubits which are already encoded, and insist on outputting encoded
qubits as well. Possibly we’ll want to switch codes to do our fault-tolerant computation using a di↵erent
QECC than the one provided to us, but fault-tolerant techniques exist for switching between codes. If the
code used for input and output qubits is not a very good code (e.g., small distance), there may still be some
unavoidable error in the process of switching into and out of the better code used for fault tolerance, but
the error will likely be less than if the input and output qubits were completely unencoded.
Some years ago, I was at a conference where a skeptic about quantum error correction tried to explain
what he thought was a problem with QECCs by making an analogy. He pointed out the unavoidable error
due to encoding and decoding, as described above, and said it was like getting out of a car in the rain —
there is not enough space to open your umbrella in the car, so you must get at least a little bit wet while
simultaneously getting out and opening the umbrella. The solution of starting and ending encoded was
already known (the skeptic was behind the times), so I pointed out that there is an analogous solution to
avoid getting wet: If you always make sure to park in a covered lot, you can get out of the car and open
the umbrella where there is more space while still being protected from the rain. Once the umbrella is open,
you can safely leave the covered area and walk around under the protection of the umbrella. Fault-tolerant
protocols work the same way. Of course, you might have to walk a bit further if the covered lot is not right
where you want to park, but there is indeed a cost to fault tolerance.
10.1.5 Fault Tolerant Simulations and the Components of a Fault Tolerant Cir-
cuit
So, putting this together, what do we want out of a fault-tolerant protocol? The basic idea is this: We are
given a circuit, broken down into a universal set of gates. We’d like to replace the qubits of the circuit with
the logical qubits of a QECC. Then we’d like to perform a sequence of gates on the qubits, without ever
leaving the protection of the code, that transforms the logical qubits with the same unitary transformation
as the original ideal circuit we were given. And we’d like the sequence of gates we use to be fault tolerant, in the sense that errors on a small but constant fraction of the gates do not interfere with getting the correct outcome for the logical qubits.
We can describe what we want as a fault-tolerant simulation of the circuit. We replace each location in
the original circuit with a fault-tolerant gadget that does the same thing to logical qubits as the location is
supposed to do in the ideal circuit.
Definition 10.4. Given a particular type L of location, specified precisely (e.g., not just that it is a gate
location, but also the type of gate), a gadget for L is a QECC Q coupled with a circuit GL , such that if the
qubits involved in L are encoded into Q, then subjected to the circuit GL , then decoded, all without errors,
the e↵ect is the same as if L were performed directly. The circuit GL may involve adding and/or discarding
physical qubits, which may or may not also be encoded in Q.
In other words, a gadget for a specific function is supposed to perform that function on the encoded
state. If the encoder for Q is E and the decoder is D, then
$$D \circ G_L \circ E = L. \qquad (10.5)$$
Definition 10.4 does not guarantee that the gadget is in any sense fault-tolerant; I’ll discuss in section 10.2
what is needed for a gadget to be fault-tolerant. In the simplest examples, the QECC used in a fault-tolerant
protocol has only one encoded qubit, and in the definitions below, I will assume that. If the location involves
multiple qubits but each code block only encodes one logical qubit, each logical qubit should be encoded in a
separate block of the QECC, and the gadget circuitry will interact the different blocks as required. Another option for dealing with multiple qubits per block is to distinguish between locations which interact qubits encoded in the same block and locations which interact qubits encoded in different blocks. Then we’ll need different kinds of gadgets to deal with these two different kinds of locations, even when the gate involved is
otherwise the same.
Figure 10.2: The fault-tolerant simulation of a circuit replaces each location with the corresponding gadget
and puts error correction gadgets between each pair of other gadgets.
Gadgets can also be defined for other tasks, not directly corresponding to a location. In that case, the
gadget is simply a QECC with a circuit. The proper functioning of the gadget has to be defined separately.
The most useful sort of gadget that doesn’t correspond to a location is an error-correction gadget. Obviously,
an error-correction gadget is supposed to use the QECC to correct any errors pre-existing in the code, and a
fault-tolerant error-correction gadget is supposed to do so even while new errors are occurring. The formal
definition of a fault tolerant error-correction gadget is given in section 10.2.
Definition 10.5. A fault-tolerant protocol consists of a QECC Q along with the following types of gadgets:
• Preparation gadget
• Gate gadget for each gate in a universal gate set
• Storage gadget
• Measurement gadget
• Error correction gadget
The protocol may contain more than one gadget for a given type of location, and may contain some additional
gadgets as well. All gadgets should satisfy some fault-tolerance criteria such as those given in section 10.2.
Normally all gadgets associated with a given protocol use the same QECC; in cases where they do not, the
protocol should also contain a code switching gadget that changes the QECC in which one or more logical
qubits are encoded.
We don’t usually bother explicitly describing a storage gadget, since a storage gadget can always be
implemented by just putting a wait location for all physical qubits in the code. In principle, though, a
fault-tolerant protocol could contain a different storage gadget involving non-trivial gates.
How do we put together gadgets in order to make a fault-tolerant circuit? Since each gadget for a location
is supposed to perform the function of the location on the encoded state, obviously what we want to do is
to replace each location in the original circuit with a corresponding gadget. However, that’s not sufficient.
In a noisy circuit, errors will occur, and in a large circuit, those errors will build up over time unless we are
constantly performing error correction to eliminate them. The most common approach is to do a round of
error correction as often as possible, namely between every adjacent pair of locations.
Definition 10.6. Let C be a circuit, broken down into locations as in definition 10.1. Then F T (C), the
ideal fault-tolerant simulation of C associated with a given fault-tolerant protocol, is the circuit created as
follows: Take each location in C and replace it with the corresponding gadget of the fault-tolerant protocol,
in the process replacing each qubit of C with a block of the QECC Q. After each state preparation gadget,
gate gadget, or storage gadget, place an error correction gadget on each of the blocks of Q involved in the
gadget. The fault-tolerant simulation of C is then $\widetilde{FT(C)}$. The circuit size overhead of FT(C) is the number
of locations of F T (C) divided by the number of locations in C. The space overhead or qubit overhead is the
ratio of the number of physical qubits involved in F T (C) to the number of qubits involved in C, and the
depth overhead or time overhead is the ratio of the depth of F T (C) to the depth of C.
Note that we place error correction gadgets after the storage gadgets as well as after gate gadgets, since
storage errors can occur even in wait locations. We don’t need error correction gadgets after measurement
gadgets, because the output of a measurement gadget is just classical information, and we are assuming that
classical circuits don’t suffer errors.
If the QECC used in the fault-tolerant protocol encodes k > 1 qubits per block, the fault-tolerant
simulation must divide up the qubits of C into groups of k and encode them that way. The fault-tolerant
protocol will probably then contain multiple different gadgets for a single two-qubit gate U, depending on whether U acts between qubits in the same block or between qubits in different blocks.
[Diagram (10.6): the r-filter, drawn as a box labeled r on a code block.]
The doubled horizontal lines in this picture represent a block of the QECC Q rather than single physical
qubits.
When we apply an r-filter to some state, we get a state which contains only errors of weight less than or
equal to r. Thus, any codeword passes through the r-filter unchanged. Components of the state with more
errors on them are dropped. The r-filter is thus a test we can apply to see if there are many errors on the
state (relative to any codeword). If
[Diagram equation (10.7), involving the r-filter.]
the state we end up with might not be the same logical state as we had before we tried to decode. However,
we’re only talking about the logical state; we don’t actually have to produce it. Therefore, we can imagine
decoding using an ideal decoder that does not have any faults.
Definition 10.8. The ideal decoder for the QECC Q is the quantum channel that takes a state (possibly
with errors) encoded in Q, corrects the errors, and then decodes the logical state, discarding all ancilla
qubits. I will draw an ideal decoder as follows:
[Diagram (10.8): the ideal decoder symbol.]
The doubled horizontal line on the left of the picture represents a block of Q. The single horizontal line on
the right of the picture is the logical qubit after decoding.
For now, it is sufficient for the decoder to keep only the logical qubit. Later on, we’ll also want to consider
decoders that keep the syndrome information they extracted during error correction.
[Diagram (10.9): the pictures for a noisy gate gadget containing s faults,]
for one- and two-qubit gates, respectively. Each doubled horizontal line represents a single block of the
QECC.
The picture to represent an ideal gate performed on unencoded qubits without noise is
[Diagram (10.10): the ideal gate pictures,]
for one- and two-qubit gates, respectively. Each single horizontal line represents a single unencoded qubit.
In an independent stochastic noise model, these pictures can be interpreted as completely positive maps.
For some particular realization of the noisy gate gadget circuit, there are some number s of faults specified
by a fault path S. Replacing the locations in S with the CP maps corresponding to faults in the noise
model (usually we will want to rescale to make the maps trace-preserving, if possible) gives a linear map
corresponding to the picture. The picture for a gadget with s faults thus represents many di↵erent maps,
depending on the exact locations which are faulty and on the precise error model.
The ideal decoder and r-filter are also CP maps; the decoder is trace preserving, but the r-filter is not.
We can compose them with the CP map corresponding to a realization of a gate gadget to get more CP
maps, and the properties for fault tolerance will be defined as equations relating CP maps derived this way.
We can thus represent such equations simply as pictures.
Definition 10.10 (FT Gate Error Propagation Property). Suppose we have a gate gadget associated with a QECC with distance 2t + 1 (i.e., it corrects t errors). If it is a single-qubit gate gadget, it satisfies the FT Gate Error Propagation Property, abbreviated GPP, if, whenever $r + s \le t$,
[Diagram (10.11): an r-filter followed by a gate gadget with s faults equals the same circuit followed by an (r + s)-filter.]
A two-qubit gate gadget satisfies the GPP if, whenever $r_1 + r_2 + s \le t$,
[Diagram (10.12): $r_1$- and $r_2$-filters on the two input blocks followed by a two-qubit gate gadget with s faults equal the same circuit followed by $(r_1 + r_2 + s)$-filters on each output block.]
For both equations, the faulty locations on both sides of the equation have the same fault path and error
maps substituted in.
These equations use filters on the left to ensure that the input state to the CP maps can be considered
to be at most r errors away from a codeword. The equations then demand that, given such an input state,
the state exiting the gadget be at most r + s or r1 + r2 + s errors away from a codeword. In the case of the
two-qubit gate gadget, we allow errors to propagate between the two blocks of the code. Thus, even if s = 0,
so the gadget itself is faulty, the number of errors in a block may increase as a result of error propagation.
However, we insist that the number of errors ending up in a single block is limited by the total number
of errors r1 + r2 input in the two blocks plus the number s of new faults. This is the sense in which this
property limits the error propagation.
Note that for the two-qubit gate gadget, the condition uses separate filters on the two blocks of the
code involved in the gadget. Thus, the final state of the two blocks taken together might have a total of
2(r1 + r2 + s) errors on it.
For the Gate Error Propagation Property, we only impose the equations when the total number of errors
on input states plus faults in the gadget is at most t. When there are more than t total errors floating
around, we don’t in general insist that our gadgets have to work correctly. In many cases, fault-tolerant
gate constructions do continue to satisfy these equations even when there are more errors, but that is an
additional property not required by the basic definition.
Definition 10.11 (FT Gate Correctness Property). Suppose we have a gate gadget associated with a
QECC with distance 2t + 1. If it is a single-qubit gate gadget, it satisfies the FT Gate Correctness Property,
abbreviated GCP, if, whenever r + s ≤ t,
[Diagrammatic equation (10.13): an r-filter, the gate gadget with s faults, and an ideal decoder equal an r-filter, an ideal decoder, and the ideal gate.]
A two-qubit gate gadget satisfies the Gate Correctness Property if, whenever r1 + r2 + s ≤ t,
[Diagrammatic equation (10.14): the two-block version of (10.13), with r1- and r2-filters on the input blocks and ideal decoders on both blocks.]
These equations must hold for any choice of CP maps substituted in for the faulty locations on the left-hand
side.
What do these conditions mean? They say that if we perform the noisy gate gadget and then decode,
we get the same thing as if we decode and then perform the gate gadget. The combination
[Diagram (10.15): an r-filter followed by an ideal decoder.]
extracts the logical state of the input after limiting the number of errors. The RHS of each equation thus
outputs the state we should get by doing an ideal gate on the logical state. The LHS is the state we actually
get. The Gate Correctness Property thus ensures that a noisy gate gadget performs the correct operation
on the encoded state, generalizing the definition of a gadget to the case in which there are errors.
Again, we only insist that the condition hold when the total number of errors floating around is at most
t. In this case, that is usually all we can hope for. That is because if there is any arrangement of errors
that allows all r1 + r2 + s errors to concentrate in a single block, on the LHS, the concentration will happen,
possibly giving a logical error which is not corrected by the ideal decoder, whereas on the RHS, all the errors
are corrected immediately before there is any opportunity for propagation.
[Diagrams (10.18): the ideal preparation and ideal measurement pictures.]
The horizontal lines represent single unencoded qubits, and the ideal measurement outputs a classical bit.
Again, these pictures represent CP maps. A noisy preparation gadget is a CP map that takes no input
and outputs a state in the Hilbert space of the QECC. A noisy measurement gadget is a CP map that takes
a noisy codeword as input and outputs a classical bit.
For the fault-tolerant preparation gadgets, there is no input, so we don’t need to precede them by filters.
Definition 10.13 (FT Preparation Error Propagation Property). Suppose we have a preparation gadget
associated with a QECC with distance 2t + 1. The gadget satisfies the FT Preparation Error Propagation
Property, abbreviated PPP, if, whenever s ≤ t,
[Diagrammatic equation (10.19): the preparation gadget with s faults equals the same gadget followed by an s-filter.]
Definition 10.14 (Preparation Correctness Property). Suppose we have a preparation gadget associated
with a QECC with distance 2t+1. The gadget satisfies the FT Preparation Correctness Property, abbreviated
PCP, if, whenever s ≤ t,
[Diagrammatic equation (10.20): the preparation gadget with s faults followed by an ideal decoder equals the ideal preparation.]
In other words, when s ≤ t, a preparation gadget with s faults in it should output a state with no more
than s errors in it, and the state should be an encoding of the correct state.
For the measurement gadget, we only have one property. We don’t need to worry about error propagation
through a measurement gadget because the output only involves classical information, which we are assuming
is error-free.
Definition 10.15 (FT Measurement Correctness Property). Suppose we have a measurement gadget asso-
ciated with a QECC with distance 2t + 1. The gadget satisfies the FT Measurement Correctness Property,
abbreviated MCP, if, whenever r + s ≤ t,
[Diagrammatic equation (10.21): an r-filter followed by the measurement gadget with s faults equals an r-filter, an ideal decoder, and the ideal measurement.]
That is, if the initial state has no more than r errors, and there are not too many errors in the measurement
gadget, the classical outcome of the measurement should be the same as if we did an ideal measurement on
the decoded state.
Note that the ECRP does not involve a filter on the input state. This is important: We want our FTEC
gadgets to return us to a codeword (or a codeword with s errors when there are s faults in the FTEC gadget)
no matter how many errors there are to start with. Otherwise, if we ever accumulate too many errors in
a block, the rest of the circuit is a loss. When there are more than t errors in a block, we may not be
able to decode correctly, but at least we’d like the remaining gadgets after the FTEC to do the right thing,
even if it’s on the wrong logical operator. That localizes the logical error to a small number of gadgets.
In particular, when we prove the threshold theorem, we’ll use concatenated codes to get more and more
accuracy, but that won’t work if sub-blocks of the code aren’t brought back to some codeword, even if it is
not the correct one.
Definition 10.18 (FT Error Correction Correctness Property). Suppose we have an error correction gadget
associated with a QECC with distance 2t + 1. The gadget satisfies the FT Error Correction Correctness
Property, abbreviated ECCP, if, whenever r + s ≤ t,
[Diagrammatic equation (10.24): an r-filter, the FTEC gadget with s faults, and an ideal decoder equal an r-filter followed by an ideal decoder.]
This property says that when the total number of errors (input errors plus faults in the gadget) is at
most t, the encoded state does not change during an FTEC gadget.
Figure 10.3: An example of a gadget interacting with a persistent environment on a fault path of size 3.
Theorem 10.1. Consider a realization C1 of a noisy gadget circuit with a fault path of size s, and suppose
that the location of each fault is replaced by a linear Hilbert space operator (e.g., specific Kraus operators of
the CP maps). Let C2 be C1 with one particular faulty location L replaced by a different linear Hilbert space
operator. Suppose the operator for L in C1 is E and the operator for L in C2 is F, and let C3 be the circuit
realization C1 with L replaced by αE + βF, for any complex numbers α and β. If the gadget satisfies the
relevant fault tolerance properties for realizations C1 and C2, then it also satisfies them for C3.
As a consequence of this, we need not check the fault tolerance properties for arbitrary CP maps inserted
for faults. It is sufficient to check them for a basis of errors. In particular, Pauli errors suffice.
Corollary 10.2. If a gadget satisfies the relevant fault tolerance properties when arbitrary Pauli errors are
inserted for all faults, then it satisfies the fault tolerance properties when arbitrary CP maps are inserted for
faults.
Recall that a Pauli error model means that the location acts correctly and then afterwards there is a
Pauli error, so when I say a “Pauli error is inserted” at a location, I mean that it is inserted after the correct
action of the location (unless it is a measurement location, in which case the error flips the outcome of the
measurement). This corollary is very convenient, since it is much easier to check the properties for just Pauli
errors than for general CP maps.
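To see the linear-algebra fact underlying this corollary concretely, here is a minimal Python sketch (an illustration of mine, not part of the book's formal development): any single-qubit Kraus operator expands in the Pauli basis, so by linearity, checking the fault tolerance properties for Pauli faults covers everything.

# A minimal sketch (not from the text): any single-qubit linear operator,
# e.g. a Kraus operator of a noise channel, is a complex combination of
# the Paulis I, X, Y, Z.  This is why checking fault tolerance for Pauli
# faults suffices, by linearity (theorem 10.1 / corollary 10.2).
import numpy as np

I = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = {"I": I, "X": X, "Y": Y, "Z": Z}

def pauli_coefficients(E):
    """Coefficients c_P in E = sum_P c_P P, via c_P = Tr(P^dag E)/2."""
    return {name: np.trace(P.conj().T @ E) / 2 for name, P in paulis.items()}

# Example: a Kraus operator of amplitude damping, a distinctly non-Pauli fault.
gamma = 0.1
E0 = np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex)

coeffs = pauli_coefficients(E0)
print(coeffs)  # nonzero weight on I and Z only

# Check the expansion reproduces E0.
recon = sum(c * paulis[name] for name, c in coeffs.items())
assert np.allclose(recon, E0)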
The logic behind theorem 10.1 has a more far-reaching consequence than just convenience. We could
also take the superposition when we change the faults in multiple locations, and the result would still work.
This is not necessary in a basic model for fault tolerance, but it allows us to talk about certain correlations
between errors, and will actually be needed for proving the threshold theorem. Indeed, once we have checked
the fault tolerance properties for Pauli errors, we know they hold under the most general error that can
happen to a circuit with faults in those s specific locations.
Theorem 10.3. Consider a realization of a gadget with a particular fault path S, |S| = s. Suppose a gadget
satisfies the relevant fault tolerance properties when arbitrary Pauli errors are inserted at the locations of S.
Then it also satisfies the fault tolerance properties when subjected to a persistent environment which starts
in an arbitrary state (possibly even one entangled with the qubits involved in the gadget), and interacts with
the locations of S through arbitrary unitaries, as illustrated in figure 10.3.
This picture allows for arbitrary dynamics within the environment, since the unitaries implementing the
self-interaction can be absorbed into the unitaries interacting the system and environment.
We haven’t even defined the fault tolerance properties in this case, but it is straightforward to do so by
extending the pictures to include the environment and then tracing over it at the end of each picture. We
don’t put a restriction on the initial state of the environment, but consider it as extra input qubits for the
circuit. For instance, see figure 10.4. The resulting equations are again equalities of CP maps; indeed, since
Figure 10.4: The one-block GPP with an environment. The s locations of the fault path within the gate
gadget are arbitrary interactions with the environment register (bottom line), and the left-hand side and
right-hand side have the same interactions.
the interactions with the environment are unitary, we can consider the picture to represent a linear operation
on the Hilbert space. The gadget may involve measurement, so to make a linear Hilbert space operator,
we purify each measurement, replacing the classical computations afterwards with quantum gates (with no
additional faults). The filters that appear in the pictures are projections, so the operators we get will not
be unitary.
Because we don’t care what happens to the environment, we need not compare the environment on the
LHS of each equation with the environment on the RHS of each equation. Indeed, in the various correctness
properties, the environment on the LHS interacts with the faults in the gadget, whereas there are no faults
on the RHS. The final state of the environment therefore differs for the LHS and RHS of the correctness
properties.
Proof. Let us label the interactions between the environment and the faulty locations in the circuit as
U1, ..., Us, and say that the ith faulty location Li involves gi qubits. We can write
Ui = Σ_{P∈P̂gi} P ⊗ Vi,P,    (10.25)
which means
Σ_{P∈P̂gi} Vi,P† Vi,P = I    (10.27)
Σ_{P∈P̂gi} Vi,P† Vi,PQ = 0  for all Q ∈ P̂gi \ {I}.    (10.28)
Let M be the operator corresponding to a picture involving the environment, with interactions Ui in place. Let M{Pi} be the operator corresponding to the same picture, but where Ui is replaced by Pi ⊗ Vi,Pi. Then
M = Σ_{{Pi}} M{Pi}.    (10.29)
That is, the operator corresponding to the picture with unitary interactions with an environment can be
written as the sum of operators corresponding to different Pauli fault paths.
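As a concrete aside (my own illustration, not part of the proof), the decomposition (10.25) and the condition (10.27) can be checked numerically for a random interaction between one system qubit and one environment qubit:

# An illustrative sketch (not from the text): decompose a system-environment
# unitary U as in (10.25), U = sum_P  P (x) V_P, for one system qubit coupled
# to one environment qubit, and check the completeness condition (10.27).
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)
paulis = [I2, X, Y, Z]

def random_unitary(n, rng):
    """Haar-ish random unitary from the QR decomposition of a Gaussian matrix."""
    M = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
    Q, R = np.linalg.qr(M)
    return Q * (np.diag(R) / np.abs(np.diag(R)))

rng = np.random.default_rng(0)
U = random_unitary(4, rng)  # ordering: system qubit (x) environment qubit

def env_component(U, P):
    """V_P = Tr_sys[(P^dag (x) I) U] / 2."""
    A = np.kron(P.conj().T, I2) @ U
    A = A.reshape(2, 2, 2, 2)          # (sys_out, env_out, sys_in, env_in)
    return np.trace(A, axis1=0, axis2=2) / 2

Vs = [env_component(U, P) for P in paulis]

# U really is the sum of P (x) V_P ...
assert np.allclose(U, sum(np.kron(P, V) for P, V in zip(paulis, Vs)))
# ... and unitarity of U gives sum_P V_P^dag V_P = I, which is (10.27).
assert np.allclose(sum(V.conj().T @ V for V in Vs), I2)
print("decomposition and completeness check passed")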
In the FT error propagation properties and the FT error correction recovery property, both sides of the
equation involve the same set of faults in the system. By the hypothesis, the LHS and RHS match when
we substitute Pauli errors, resulting in pictures M{Pi}. Therefore, the sums M match term by term, so the
totals match as well.
For the various FT correctness properties, let M = Σ_{{Pi}} M{Pi} be the LHS of the equation. The LHS with Pauli errors in the faulty locations and Vi,Pi in the corresponding environment locations matches the RHS picture, which is a tensor product between the system qubits and the environment (still with Vi,Pi acting in appropriate places). The total LHS M is therefore equal to the RHS picture tensored with a sum of the Vi,Pi s acting on the environment.
The only question is whether the trace over the environment gives a different value for the LHS and RHS. We can calculate this by looking at what happens to the environment's trace for each faulty location Li. The environment undergoes
Σ_{P∈P̂gi} Vi,P    (10.30)
to the circuit does not require much more effort than it does to add fault-tolerance to a merely big quantum
computer.
Unfortunately, while the asymptotic scaling for large T/ε is good, the actual amount of overhead needed is
still pretty large. The problem is that the constants hidden by big-O notation are significant. Even relatively
efficient fault-tolerant protocols have a space overhead of hundreds, thousands, or even tens of thousands of
physical qubits per logical qubit, and many are much worse than that. While it’s not mentioned explicitly
in the theorem, the overhead is also dependent on p/pT , so it helps to get the physical error rates down as
much as possible, but even then the costs are high. One approach is to abandon the notion of a family of
protocols with a threshold, and just pick a specific FT protocol that works for the size of the circuit you are
interested in. By carefully choosing a protocol, you may be able to get the overhead down, although with
existing protocols, it is still likely to be at least a few dozen physical qubits per logical qubit. And of course,
there may exist fault-tolerant protocols with much lower overhead than any we know of today.
The threshold theorem is proven using concatenated codes, introduced in section 9.1. The basic idea is
straightforward: Given a fault-tolerant protocol F, we can take any circuit and produce the fault-tolerant
simulation FT(C) of that circuit. Provided the error rate p per location is small enough, the logical error
rate for the FT simulation should be less than the physical error rate, and we have improved matters by
using the fault-tolerant simulation. Then we do it again, and take the FT simulation of the simulation, i.e.,
FT(FT(C)). That improves the error rate some more. With multiple iterations, we can drive the error rate
down very rapidly. The family of fault-tolerant protocols Fl that we need for the threshold theorem is then
the fault-tolerant protocol F concatenated l times. The result is really an FT protocol that uses an l-level
concatenated code. I will explain how to make this intuitive argument precise in chapter 14. The main
challenge is to properly define and determine the “logical error rate” for a fault-tolerant simulation.
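To get a feel for the numbers, here is a back-of-the-envelope sketch (mine; the parameter values are purely hypothetical) using the standard heuristic that one level of concatenation maps an error rate p to pT(p/pT)², so l levels give pT(p/pT)^(2^l):

# A back-of-the-envelope sketch (not from the text).  Under the standard
# heuristic that one level of concatenation maps error rate p to
# p_T * (p/p_T)^2, l levels give p_T * (p/p_T)^(2^l).  We ask how many
# levels are needed for a circuit of T locations to reach total error eps,
# assuming (hypothetically) p_T = 1e-3 and a code with n = 7 qubits/block.
p_T = 1e-3      # assumed threshold error rate
p   = 1e-4      # assumed physical error rate (below threshold)
n   = 7         # physical qubits per logical qubit at each level
T   = 1e12      # number of locations in the target circuit
eps = 1e-2      # target total error for the whole computation

l = 0
while T * p_T * (p / p_T) ** (2 ** l) > eps:
    l += 1

print(f"levels of concatenation needed: l = {l}")
print(f"logical error rate per location: {p_T * (p/p_T)**(2**l):.3e}")
print(f"qubit overhead per logical qubit: {n**l}")

With these hypothetical numbers, four levels suffice, at a cost of 7⁴ = 2401 physical qubits per logical qubit, which is consistent with the "hundreds, thousands, or even tens of thousands" quoted above.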
Topological codes, discussed in chapter 19, can also be used to get a threshold, as can families of low-
density parity check codes (LDPC codes) with a constant rate. It may be that other families of codes exist that
can give a threshold. We sometimes talk about the threshold for a specific code family, and sometimes just
about “the threshold” for fault tolerance. In the latter case, we are talking about the best (i.e., highest)
threshold pT optimized over all possible families of codes. In addition, different fault-tolerant protocols for
the same code family can lead to substantially different "threshold" values, but the real threshold is defined
using the best possible FT protocol.
The threshold theorem can be extended in various ways. One obvious way, still sticking to a basic model
for fault tolerance, is to let the error rates for di↵erent types of locations be di↵erent. In that case, the
threshold is no longer a single number, but a surface in the 4-dimensional space of error rates (pP, pG, pS, pM).
The threshold surface separates the noiseless point (0, 0, 0, 0) from the completely noisy point (1, 1, 1, 1), and
the extended theorem then says that we can achieve error rate ε for the outcome distribution for any point
on the noiseless side of the surface with polylogarithmic overhead. It is also possible to go beyond a basic
model by loosening or removing some of the assumptions. Not all of the assumptions can be relaxed; which
ones can and which ones can’t is discussed in chapter 15. Even when it is possible to derive a threshold with
relaxed conditions, the actual numerical value of the threshold may decrease significantly, or the overhead
may increase, or both. Building a fault-tolerant quantum computer will therefore likely involve making a
number of trade-offs in order to come up with an implementation for which the error rate is below the relevant
threshold value.
fair, you should only compare estimates derived using comparable techniques, since different techniques have
different sources of inaccuracy.
An estimate of the threshold and overhead will generally involve some combination of rigorous proof,
simplifying assumptions, and statistical sampling. Pretty much every approach uses some simplifying as-
sumptions — indeed, by working with a basic model of fault tolerance, you are already making a number
of simplifying assumptions — but some approaches go further. It is perhaps worth distinguishing between
assumptions about the quantum computer (e.g., assuming an independent stochastic error model) and as-
sumptions about the calculation (e.g., assuming that a particular type of state preparation gadget can
be performed reliably under conditions of interest). The former type of assumption gives you a complete
threshold estimate, albeit one that only applies under the given conditions.
Assumptions about the calculation are a bit more tricky to evaluate: Sometimes the assumption is a
conservative one, in which an effect which may lower the threshold value is simplified to have the worst
possible effect it can have. An approach which only uses conservative assumptions will give a threshold
value that is definitely lower than the true value and an overhead that is higher than the true value by an
unknown amount. The advantage of this is that we can be sure we have a lower bound on the threshold and
an upper bound on the overhead. Sometimes the assumption is a reasonable approximation, in which case
we may expect that the calculated threshold and overhead are roughly the same as they would be without the
assumption, maybe slightly higher or lower. However, as long as the approximation hasn't been fully checked,
there is a chance it is wrong, so we cannot have full confidence in any value derived with the assumption.
Occasionally, the assumption is obviously an over-optimistic one, giving a significant overestimate of the
threshold value. Optimistic assumptions are sometimes used to tease out how much of the threshold error
rate comes from di↵erent e↵ects, or to estimate upper bounds on thresholds for a code family. It is important
not to think of thresholds calculated using optimistic assumptions as actual threshold values, since they don’t
provide any kind of guarantee for an experimentalist achieving that error rate.
One particularly widespread example of an optimistic assumption is a phenomenological error model. In
a phenomenological error model, there is an error rate per physical qubit per time step (regardless of gates)
and an error rate on each bit of the error syndrome that is measured. The phenomenological error model
is a greatly simplified model which still includes constantly occurring errors and imperfect error syndrome
measurements. It is used to get a quick handle on the general performance of a QECC, syndrome extraction,
and syndrome decoding method without having to design and analyze detailed fault-tolerant circuits. Usually
this is done primarily to learn about the threshold and not for overhead. A threshold in a phenomenological
model should not be confused with the threshold derived from a full circuit model; they are not at all
comparable. It makes some sense to compare phenomenological thresholds for di↵erent FT protocols with
each other, but even there, caution is needed, as the relationship between the phenomenological threshold
and the circuit threshold can vary between codes and even between di↵erent FT protocols for the same code.
For instance, the number of gates needed to extract a bit of the syndrome in a full circuit analysis can be
greatly different between different protocols, but a phenomenological noise model completely ignores this
important contribution to the threshold.
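To make the phenomenological model concrete, here is a deliberately tiny Monte Carlo sketch (mine; real threshold studies use far larger codes and far better decoders) for a 3-bit repetition code with noisy syndrome bits:

# A toy phenomenological-model simulation (my own sketch; real threshold
# studies use much larger codes and much better decoders).  A 3-qubit
# repetition code protects one bit against flips.  Each round, every qubit
# flips with probability p, and each of the two parity-check bits is
# reported incorrectly with probability q.  The decoder majority-votes the
# syndrome history, applies the indicated correction, and we check whether
# a logical flip survived.
import random

CORRECTION = {(0, 0): None, (1, 0): 0, (1, 1): 1, (0, 1): 2}

def trial(p, q, rounds, rng):
    e = [0, 0, 0]                      # current bit-flip error on the data
    history = []
    for _ in range(rounds):
        for i in range(3):             # fresh data errors this round
            if rng.random() < p:
                e[i] ^= 1
        s0 = (e[0] ^ e[1]) ^ (rng.random() < q)   # noisy syndrome bits
        s1 = (e[1] ^ e[2]) ^ (rng.random() < q)
        history.append((s0, s1))
    # Majority-vote each syndrome bit over time, then correct.
    maj = tuple(int(sum(s[i] for s in history) > rounds / 2) for i in (0, 1))
    if CORRECTION[maj] is not None:
        e[CORRECTION[maj]] ^= 1
    return sum(e) >= 2                 # logical failure: majority flipped

rng = random.Random(1)
p = q = 0.01
fails = sum(trial(p, q, rounds=5, rng=rng) for _ in range(20000))
print(f"logical failure rate ~ {fails / 20000:.4f}")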
The next big distinction is between rigorous proofs, simulations of FT protocols using statistical sampling,
and analytical estimates. Rigorous proofs use only conservative assumptions about the calculations and
carefully justify any assumptions needed to be sure they actually are conservative. Consequently, a rigorous
proof provides a lower bound on the threshold and upper bound on the overhead given the stated assumptions
about the computer.
Simulations, in contrast, simulate a big chunk of an FT circuit on a classical computer and run it many
times, randomly generating errors according to the appropriate error model and calculating the logical error
rate for the run. Classical simulations of general quantum circuits are unlikely to be possible, but as you’ll
see, the main parts of many FT protocols are composed just of Clifford group gates, and simulating Clifford
circuits with Pauli errors can be done efficiently using procedure 6.4. In some cases, additional properties of
the FT circuit and QECCs used can lead to additional simplifications of the simulation. Simulations invariably
require some assumptions which may lead to good approximations, but are not necessarily conservative
ones. Consequently, a well-done simulation can give more accurate threshold and overhead estimates than
a rigorous proof, but we can never be quite certain if the true values are higher or lower than the simulated
values.
An analytical estimate focuses on formulas for the performance of a fault-tolerant protocol. They may be
derived in part from simplifying assumptions and may use empirical results from simulations to determine
constants in the formulas. The purpose of an analytical estimate is to clarify the role of certain parameters and
to develop some insight into what changes when they are varied. For instance, you might approximate the
logical error rate or overhead of a protocol as a function of p/pT in order to understand what can be gained
by improving the physical error rate p in the quantum computer.
Note that some proofs also use statistical methods as part of the threshold calculation. In this case,
there is some statistical error, but the size of the error is controlled and known, so the technique comes with
a guarantee that the derived value cannot be much above the true threshold value, in contrast with simulations,
which cannot offer such a guarantee.
Another distinction between proofs and simulations comes in the error models used. Often, rigorous proofs
use an adversarial error model within the basic model for fault tolerance. That is, they work with the worst
possible error model consistent with a given value of the error rate p. Indeed, as we’ll see in chapter 14, the
threshold proof actually uses an even broader error model which involves some potential correlation between
qubits. Simulations usually have to work with some particular Pauli error model. Frequently X, Y , and Z
errors are given the same probability, so the error model is related to the depolarizing channel. Thus, proofs
generally apply to a broader range of quantum computers than simulations.
There is often a difference of an order of magnitude or more between the rigorously proven value of
the threshold for a given family of codes and that derived using simulations. For instance, the best known
threshold value comes from Knill’s technique of using concatenated error-detecting codes to prepare ancilla
states (see section 16.1). Simulations of the technique suggest a threshold error rate of a few percent, say 3%,
depending on the error model, whereas the best existing proof for this approach shows that the threshold is
at least 10⁻³, thirty times smaller. Most likely, the simulations are closer to the correct value, but as noted
above, we cannot have full confidence that they are underestimates and not overestimates.
• There is no "best": It's rare to have an FT technique which, in isolation, is categorically better than
another. As I have already noted, the different components of an FT protocol interact, and so they
need to be chosen to be well-suited to each other. If you go through a list and try to pick the “best”
QECC, the “best” state preparation method, and so on, without seeing how they work together, you
will not even end up with a functional protocol, and certainly not the “best” protocol that could be
devised. Some techniques might be usually worse than others, but perhaps in the right context, they
can be the optimal solution to some problem. I tend to view the array of FT techniques in the literature
as a toolbox, and you want to pick a tool that is right for the job you are actually trying to do.
• Check the assumptions: Make sure to read carefully what assumptions and simplifications are made
for the simulations (or proofs or analytic estimates) you are relying on for the comparison. Certainly,
you should be careful about comparing a threshold estimated via simulation to one bounded below by
a proof. Also, pay close attention to things like use of a phenomenological error model. In general,
you should be aware of what error model is being used in a simulation; a comparison between two
simulations with di↵erent error models may be misleading.
• Check which aspects of the protocol are included: It has become common to cite threshold values
based on simulations of a pure storage scenario, that is, one for which only an FTEC gadget is run
repeatedly, and not any gate gadgets or even preparation or measurement gadgets. While this makes
comparison between two such threshold values relatively straightforward, it can sometimes overestimate
the threshold by an amount that may differ between protocols. Generally, simulating only storage
is reasonable because FTEC consumes a large fraction of the error budget of an FT protocol, but
sometimes there are additional reductions to the threshold that come from other gadgets. Overhead
computations are sometimes for storage only and sometimes for full protocols. Be sure of which you
are looking at, and only compare two overhead calculations if they are calculating the same thing.
Also, be aware that different protocols for FTEC might constrain the rest of the protocol in ways that
produce a very different overhead for the full protocol even when overhead for pure storage is similar.
• Consider the tradeoffs: Sometimes the tradeoffs between FT protocols are reasonably straightforward,
such as one protocol with a higher threshold vs. a second with a lower overhead. But there are other
properties which can also be important, including connectivity of the physical qubits (for instance, 2-
dimensional nearest-neighbor gates vs. long-range gates), speed of the syndrome decoding algorithms,
behavior under different error models, and more. In any given implementation, some of these properties
may be more salient than others. A good understanding of the full range of properties of the
FT protocols under consideration will help you choose one that is most appropriate for the specific
implementation you have in mind (if any).
• Compare the degree of optimization: Finally, be aware that some protocols have been intensely studied
and have undergone a high degree of optimization to improve performance in all aspects. By com-
parison, a newly proposed protocol may look inefficient or seem to have too many requirements. But
that doesn’t mean it is useless — it may be that with further study, or in conjunction with other new
fault-tolerant tricks, the new protocol can become competitive with the old one. Again, keeping a
large toolbox of FT techniques allows us to study different combinations which may allow apparently
inferior approaches to shine.
Chapter 11
This chapter and the next two chapters (chapters 12 and 13) will explain how to create gadgets that satisfy
the fault-tolerance properties we defined in chapter 10. We’ll start with the simplest case, that of transversal
gates. You already saw an example of a transversal gate in section 10.1.3, so you know how straightforward
they are. There’s not much to be said about performing transversal gates, so in this chapter, I’ll mostly
focus on the question of figuring out which gates can be performed transversally for a given QECC.
Since transversal gates are so nice, we generally like to build FT protocols using codes for which many
gadgets can be performed transversally. The 7-qubit code is particularly nice, as measurement and all
Clifford group gates can be performed transversally. Sadly, to get a full universal set of gates, we need to
add something non-transversal. This is not just a property of the 7-qubit code, but is true for any QECC.
If it were not true, the theory of fault tolerance would be a lot simpler.
Figure 11.1: The transversal CNOT gate acting on two blocks of the 7-qubit code.
Note that Vi Eij Vi† always acts only on Hi. If Eij = I, then Vi Eij Vi† = I, so the term Fi (Vi Eij Vi†) is not
the identity only if either Eij or Fi is not the identity.
Now, for any j, at most Σ_{a=1}^m ra of the Eij s are not the identity, and the number of non-identity Fi
terms is at most s, the size of the fault path. Thus, the maximal possible number of non-identity tensor
factors in Πi Fi (Vi Eij Vi†) is s + Σ_{a=1}^m ra. That is,
Ũ|ψ⟩ = Σj Gj U|ψj⟩,    (11.3)
where Gj is not the identity on at most s + Σa ra tensor factors Hi. Since U is a gate gadget for the code
and |ψj⟩ is a valid codeword for each block, U|ψj⟩ also consists of valid codewords for each block. Therefore,
this state will pass a (Σa ra + s)-filter on each n-qubit block, so the gadget satisfies the GPP.
For the GCP, we take equation (11.3) and apply an ideal decoder for the QECC to each n-qubit code
block. Gj has weight s + Σa ra ≤ t for each n-qubit code block, with t the number of errors the code
corrects, so if D is the ideal decoder,
D(Gj U|ψj⟩) = U|ψj⟩.    (11.4)
Now, Ej also has weight ≤ t for each n-qubit code block, so
D(Ej|ψj⟩) = |ψj⟩,    (11.5)
and
D(Ũ Ej|ψj⟩) = U|ψj⟩ = U D(Ej|ψj⟩).    (11.6)
Thus, by linearity,
D(Ũ|ψ⟩) = U D(|ψ⟩),    (11.7)
which is the GCP.
Of course, this proof is a long-winded way of saying something that is fairly obvious, which is that
transversal gates can spread errors between blocks, but they cannot cause errors to propagate within a block
of the code. Thus, the total number of errors in any single block of the output of a transversal gate is at
most the total number of errors summed over all input blocks plus the total number of faulty locations in
the circuit.
Note that the proof of the GPP does not make use of the constraint Σa ra + s ≤ t, so transversal gates
actually satisfy a stronger version of the GPP.
One nice property of transversal gates is that a series of them can be strung together, and the result is
another transversal gate.
Proposition 11.2. The product of transversal gate gadgets is a transversal gate gadget.
Proof. From the definition of transversal, it immediately follows that the product of two transversal gates is
a transversal gate. The product of gadgets for locations L1 and L2 is also a gadget, for the circuit consisting
of L1 followed by L2 .
In the standard fault-tolerant simulation, we put an error correction gadget after each gate gadget.
However, the notion of what a fundamental gadget is can be relative. As a consequence of proposition 11.2,
the composite gadget consisting of a product of some number of consecutive transversal gate gadgets also
satisfies the GPP and GCP, so is also fault-tolerant. Therefore, we still get a valid fault-tolerant simulation
if we only put a single error correction gadget after the whole composite gadget rather than after each of the
gate gadgets making it up.
Of course, if we combine two m-block transversal gadgets that act on some different blocks, the result
can be a gadget that acts on as many as 2m − 1 blocks (assuming the gadgets share at least one block). Even
though transversal gates satisfy the GPP, each output block of the combined gadget may contain the errors
combined from all the input blocks. This sets a real limit to how many transversal gate gadgets one can
safely do in sequence before doing quantum error correction. Single-block transversal gadgets (composed of
single-qubit gates) don’t have this problem, although it’s still true that faults in the physical gates making
up a composite single-block transversal gadget will eventually add up.
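The bookkeeping here is simple enough to automate. The following sketch (an illustration of mine, tracking worst-case propagation only) shows how the error sets of the blocks merge under successive transversal two-block gates without errors ever moving to a different position within a block:

# A small bookkeeping sketch (not from the text): track which qubit
# positions carry errors in each block.  A transversal two-block gate can
# copy an error at position i of one block to position i of the other, but
# never moves an error to a different position within a block.
def transversal_cnot(block_a, block_b):
    """Worst-case propagation: X errors copy control->target, Z errors
    copy target->control; here we conservatively merge both ways."""
    merged = block_a | block_b
    return set(merged), set(merged)

blocks = {"A": {0, 3}, "B": {5}, "C": set()}   # initial errored positions
blocks["A"], blocks["B"] = transversal_cnot(blocks["A"], blocks["B"])
blocks["B"], blocks["C"] = transversal_cnot(blocks["B"], blocks["C"])

# Each block now holds at most the union of all input errors (3 positions),
# matching the GPP bound r1 + r2 + s with s = 0 faults; but note the same
# 3 errors now appear in all three blocks.
print({k: sorted(v) for k, v in blocks.items()})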
11.2 Transversal Gates for Stabilizer Codes
So, transversal gate gadgets are automatically fault-tolerant. How do we determine, for a given code, what
transversal gates are available? In general, this problem doesn’t have a satisfactory known solution. A good
first step is to ask, for a given QECC and a given transversal circuit, is it a valid gadget for some logical
gate, or does it take some codewords to non-codewords? If it is a gate gadget, what is the logical gate? For a
general circuit and QECC, we can answer the latter questions by applying the circuit to a set of basis states
for the code and determining if we always get codewords. If we do, we can then compute the overall logical
operation. This works, but takes exponential time in the number of qubits involved. For the special case of
stabilizer codes and Cli↵ord group circuits, we can do much better.
Theorem 11.3. Clifford group circuit U maps codewords of S to codewords of S iff U M U† ∈ S for all
M ∈ S.
In other words, U is in the normalizer in the Cli↵ord group of S. This generalizes the fact that Paulis
give logical operations of the code space if they are in the Pauli group normalizer of S.
Checking all 2^(n−k) elements of the stabilizer could take a long time, but as usual, it is sufficient to check
only generators, which is much quicker.
Corollary 11.4. Clifford group circuit U maps codewords of S to codewords of S iff U M U† ∈ S for all
generators M of S.
This is true because conjugation is a group homomorphism, so the image of a product of generators is
the product of the images, which is again in S.
Theorem 11.3 and corollary 11.4 hold for arbitrary Clifford group circuits, but for the purposes of fault-
tolerance, we are most interested in transversal gates. Theorem 11.3 combined with proposition 11.1 tells
us that if we have a transversal Clifford gate in the normalizer of S then we have a fault-tolerant gadget.
For single-block transversal gates, that is straightforward enough, but you might wonder how exactly to
apply these ideas to multiple-block transversal gates. In this case, we should apply theorem 11.3 to the
stabilizer for multiple blocks of the code. For instance, a two-block transversal Clifford gate should preserve
the stabilizer S^⊗2 = {M ⊗ N | M, N ∈ S}.
Let us illustrate by considering the 4-qubit code, as given in table 3.5. We will start with single-block
transversal gates, implemented by a tensor product of single-qubit operations. The stabilizer of the 4-qubit
code contains four elements, {I⊗I⊗I⊗I, X⊗X⊗X⊗X, Y⊗Y⊗Y⊗Y, Z⊗Z⊗Z⊗Z}. Conjugation
can never mix I up with the other three Paulis, but we should consider permutations of the other elements
of S.
Indeed, we can implement any permutation of the three non-identity Paulis in S. The trivial permutation
can be achieved by the logical Paulis, and indeed, these are the only gates that will leave the stabilizer
elements unchanged. Otherwise, we want to permute X, Y, and Z the same way on all four qubits. This can
be achieved by performing the same single-qubit Clifford gate on all four qubits. For instance, transversal
H (i.e., H⊗H⊗H⊗H) will swap X⊗X⊗X⊗X with Z⊗Z⊗Z⊗Z, and map Y⊗Y⊗Y⊗Y to itself.
Note that H maps Y to −Y on a single qubit, but when applied to all four qubits, the minus signs cancel
out.
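This sign cancellation is easy to confirm numerically; a small sketch of mine:

# A small numerical check (not from the text): conjugating the stabilizer
# of the 4-qubit code by transversal Hadamard permutes X^(x)4 and Z^(x)4
# and fixes Y^(x)4 -- the single-qubit minus sign in H Y H = -Y cancels
# over four qubits.
import numpy as np
from functools import reduce

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def tens(*ops):
    return reduce(np.kron, ops)

H4 = tens(H, H, H, H)
generators = [("XXXX", tens(X, X, X, X)),
              ("YYYY", tens(Y, Y, Y, Y)),
              ("ZZZZ", tens(Z, Z, Z, Z))]
for name, P in generators:
    conj = H4 @ P @ H4.conj().T
    for mname, M in generators:
        if np.allclose(conj, M):
            print(f"H^4 {name} H^4 = +{mname}")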
Since single-qubit Clifford group elements can perform any permutation of the single-qubit Paulis, we can
achieve any possible permutation of the stabilizer by picking an appropriate single-qubit Clifford U and then
performing the transversal U. We need to do the same permutation of X, Y, and Z on each qubit, but what
happens to the phases of X, Y, and Z can be different on different qubits. The single-qubit Cliffords which
change the phase of Paulis without otherwise permuting them are exactly the Pauli subgroup of C1. Thus,
the most general single-block transversal Clifford gadget for the four-qubit code can be written uniquely as
P U, where P ∈ N(S)/S and U = V^⊗4, with V ∈ Č1 an arbitrary single-qubit Clifford group operation.
Now let us consider the two-block transversal Clifford gadgets. The stabilizer of two blocks of the code
consists of 16 elements. It will perhaps be clearest if we group them in pairs consisting of the ith qubit of
both blocks. Thus, for instance, two elements of S^⊗2 would be XI⊗XI⊗XI⊗XI and ZY⊗ZY⊗ZY⊗ZY.
We can take the generators of S^⊗2 to be
{(XI)^⊗4, (ZI)^⊗4, (IX)^⊗4, (IZ)^⊗4}.    (11.8)
Imagine we take a two-qubit Clifford gate and perform it transversally. For instance, what happens when
we perform CNOT^⊗4? The four generators become
{(XX)^⊗4, (ZI)^⊗4, (IX)^⊗4, (ZZ)^⊗4}.    (11.9)
But (XX)^⊗4 = (XI)^⊗4 (IX)^⊗4 and (ZZ)^⊗4 = (ZI)^⊗4 (IZ)^⊗4, so CNOT^⊗4 is a valid transversal gadget.
Indeed, any transversal two-qubit Clifford group gate will give a valid gadget. If U ∈ C2, then U^⊗4 maps
any Pauli of the form P^⊗4 to Q^⊗4, with P, Q ∈ P2. But Q^⊗4 ∈ S^⊗2, since Q can be written as a product of
XI, ZI, IX, and IZ, and any phase will disappear when taken four times. The same argument applies to
m-block Cliffords for arbitrary m: If U ∈ Cm, then U^⊗4 is a valid transversal Clifford gate gadget. Indeed,
by the same argument as in the single-block case, up to logical Paulis, these are the only valid transversal
Clifford gadgets for the 4-qubit code.
In chapter 6, you learned to characterize Clifford group gates by their action on Paulis. We can use the
same concept to characterize logical Clifford gates. Since N(S)/S is the logical Pauli group, the transformation
performed on it by a Clifford gate will tell us what Clifford operation has been performed on the encoded
state. (Of course, this only applies to Clifford circuits which are gadgets, since a circuit which is not will
change S and therefore also not preserve N(S)/S.)
Let us illustrate the procedure by again considering the 4-qubit code. We know that any gate of the form
U^⊗4 for U ∈ Cm is a transversal gadget. Suppose we do H^⊗4. It changes the logical Paulis as follows:
X̄1 = X⊗X⊗I⊗I → Z⊗Z⊗I⊗I    (11.10)
X̄2 = X⊗I⊗X⊗I → Z⊗I⊗Z⊗I    (11.11)
Z̄1 = I⊗Z⊗I⊗Z → I⊗X⊗I⊗X    (11.12)
Z̄2 = I⊗I⊗Z⊗Z → I⊗I⊗X⊗X    (11.13)
What we'd like to do is to write everything on the right-hand side as a product of logical Paulis in order to
figure out what transformation is being performed. But, hold on a second: Z⊗Z⊗I⊗I can't possibly be
written as a product of I⊗Z⊗I⊗Z and I⊗I⊗Z⊗Z! What's wrong?
Don’t panic. Remember that the logical Paulis are not unique. We’ve chosen particular representatives
of cosets in N(S)/S, but other representatives are equally valid. What's happened here is that the transversal
gate has changed, by itself, to a different representative of the coset. Yes, it's rude to make the change without
telling us, but what can you do? Fault-tolerant gates do what they want and don’t listen to complaints.
In this case, it's not too difficult to recover and figure out what is going on. We can immediately recognize
Z⊗Z⊗I⊗I = (Z⊗Z⊗Z⊗Z)(I⊗I⊗Z⊗Z) = Z̄2, and can similarly identify the other logical Paulis by
multiplying them by stabilizer elements. Overall, we find the following transformation of the logical Paulis:
X̄1 → Z̄2    (11.14)
X̄2 → Z̄1    (11.15)
Z̄1 → X̄2    (11.16)
Z̄2 → X̄1    (11.17)
We can recognize this as a gate which swaps the two logical qubits and applies the Hadamard to both.
Notice that when the QECC encodes multiple qubits, as in the case of the 4-qubit code, even single-block
transversal gates can do multiple-qubit logical gates. However, in the case of the 4-qubit code, not every
logical two-qubit Clifford gate can be performed by transversal operations — there simply aren't enough of
them.
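The "multiply by stabilizer elements until you recognize a logical Pauli" step is just linear algebra over GF(2). A sketch (mine) for the Z-type operators of the example above:

# A sketch (not from the text): decide whether two Z-type Paulis are equal
# modulo the stabilizer by checking membership of their difference in the
# GF(2) row space of the Z-type stabilizer generators.
import numpy as np

def in_rowspace_gf2(rows, v):
    """Gaussian elimination over GF(2): is v in the span of rows?"""
    M = np.array(rows, dtype=np.uint8) % 2
    v = np.array(v, dtype=np.uint8) % 2
    for col in range(M.shape[1] if M.size else 0):
        pivots = [r for r in range(M.shape[0]) if M[r, col]]
        if not pivots:
            continue
        p = pivots[0]
        for r in pivots[1:]:
            M[r] ^= M[p]
        if v[col]:
            v ^= M[p]
        M = np.delete(M, p, axis=0)
    return not v.any()

z_stabilizer = [[1, 1, 1, 1]]          # Z(x)Z(x)Z(x)Z for the 4-qubit code
zz_ii = [1, 1, 0, 0]                   # the H^(x)4 image of logical X_1
logical_z2 = [0, 0, 1, 1]              # our chosen representative of Z_2

diff = [a ^ b for a, b in zip(zz_ii, logical_z2)]
print(in_rowspace_gf2(z_stabilizer, diff))   # True: same logical operator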
Thus, transversal Hadamard implements the logical Hadamard.
Next comes Rπ/4:
X̄ = X⊗X⊗X⊗X⊗X⊗X⊗X → Y⊗Y⊗Y⊗Y⊗Y⊗Y⊗Y    (11.20)
Z̄ = Z⊗Z⊗Z⊗Z⊗Z⊗Z⊗Z → Z⊗Z⊗Z⊗Z⊗Z⊗Z⊗Z = Z̄    (11.21)
Y^⊗7 = i⁷ X^⊗7 Z^⊗7 = −i X̄ Z̄ = −Ȳ.    (11.22)
Thus, the transversal Rπ/4 does not do the logical Rπ/4. Instead, we can identify the gate implemented as
Rπ/4† = R−π/4. This is an important example of a place where you need to be careful with phases or you
get the wrong answer. If we want to do Rπ/4, we can instead do it using the transversal R−π/4.
Next, let's try a two-block gate, the transversal CNOT. Acting by conjugation, it takes X̄⊗Ī → X̄⊗X̄ and
Ī⊗Z̄ → Z̄⊗Z̄, while leaving Ī⊗X̄ and Z̄⊗Ī unchanged.
We can identify this as the logical CNOT between the qubits encoded in the two blocks of the code.
Now, H, Rπ/4, and CNOT are generators of the Clifford group, and we have transversal implementations
of all of them. That means we can implement the whole logical Clifford group on m blocks of the 7-qubit
code just by doing transversal operations. The transversal U does not always give us the logical U — instead
it gives us the complex conjugate U* — but nevertheless, it is straightforward to perform any logical Clifford
group operation for this code. This gives us high hopes for finding a fault-tolerant protocol for the 7-qubit
code, and indeed this is a good start, but unfortunately, it gets harder from here. This is the full list of
transversal gate gadgets for the 7-qubit code. To get a gate outside the Clifford group, we will need another
construction, discussed in chapter 13.
Theorem 11.5. A stabilizer code S has a gadget consisting of transversal CNOT iff S is a CSS code. If so,
S has a choice of logical operators for which the gadget simulates a location consisting of the tensor product
of CNOTs from logical qubit i in the first block to logical qubit i in the second block, for all i.
Theorem 11.5 makes CSS codes very attractive for fault tolerance. Just by choosing to work with a CSS
code, we know for sure we have logical CNOT gates available, albeit performed together on all logical qubits
in the block. It turns out that CSS codes always have another transversal gadget that is just as useful:
measurement.
Recall from section 5.1.3 that we can write the basis codewords of a CSS code as
|u + C2⊥⟩ = Σ_{v∈C2⊥} |u + v⟩.    (11.27)
The logical state encoded by this basis state is given by the coset of C1/C2⊥ which u lies in. Thus, to
measure the encoded data in the standard basis, it suffices to measure u up to an element of C2⊥. But we
can achieve this by simply measuring transversally in the standard basis: When we do so, we get u + v, with v
a random element of C2⊥. Thus, transversal standard basis measurement followed by classical decoding is a
gadget for measurement of logical qubits in the standard basis. The classical decoding part to determine the
coset of u is not implemented transversally, but in a basic error model, we assume that classical processing
is completely reliable.
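For the 7-qubit code, the classical decoding step is especially simple; here is an illustrative sketch (mine): correct the measured string with the Hamming syndrome, then take the overall parity, since the C2⊥ coset consists of even-weight codewords and the other coset of odd-weight ones.

# A sketch (not from the text): the classical decoding step for transversal
# standard-basis measurement of the 7-qubit code.  The measured string is a
# Hamming [7,4] codeword plus a few bit flips; we correct a single flip via
# the Hamming syndrome, then read the logical outcome off as the overall
# parity (the C2-perp coset has even weight, the other coset odd weight).
import numpy as np

# Parity-check matrix of the [7,4] Hamming code; column j is binary(j+1).
Hmat = np.array([[(j + 1) >> k & 1 for j in range(7)] for k in range(3)])

def decode_logical(bits):
    bits = np.array(bits) % 2
    syndrome = (Hmat @ bits) % 2
    pos = int(syndrome[0] + 2 * syndrome[1] + 4 * syndrome[2])  # 0 = no error
    if pos:
        bits[pos - 1] ^= 1          # correct the single indicated flip
    return int(bits.sum() % 2)      # logical measurement outcome

# Example: logical |0> measured with one faulty bit.  The all-zeros string
# is a C2-perp codeword; flipping bit 5 should still decode to 0.
noisy = [0, 0, 0, 0, 1, 0, 0]
print(decode_logical(noisy))        # 0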
Proposition 11.1 only tells us transversal gate gadgets are fault tolerant, so we should also check that
transversal measurement satisfies the MCP. In fact, for a general stabilizer code, transversal measurement
is not fault-tolerant, although it does serve as a gadget for some codes. For instance, for the five-qubit code,
transversal measurement of a perfect codeword will tell you the encoded state — consulting equations (3.11)
and (3.12), we see that |0̄⟩ always gives an even parity string and |1̄⟩ always gives an odd parity string —
but even a single bit flip error due to a faulty measurement location will cause us to get the wrong answer.
CSS codes, however, have the special property that transversal measurement is fault tolerant.
Theorem 11.6. For any CSS code, transversal measurement followed by classical decoding is a fault-tolerant
gadget implementing logical measurement of all encoded qubits.
Proof. The theorem is true because the codewords are superpositions of states from a classical error-correcting
code. That means a small number of errors in the classical output cannot confuse us between di↵erent logical
codewords.
To prove the MCP formally, let us consider the output state of an r-filter
|ψ⟩ = Σj Ej|ψj⟩  (wt Ej ≤ r).    (11.28)
Invoking corollary 10.2 again, we can assume the errors Ej are Pauli errors. Furthermore, we can assume
that all Ej in the sum have distinct error syndromes: If the code is non-degenerate, this is automatically
true. If the code is degenerate, since wt Ej ≤ r ≤ t, degenerate errors will act the same on the codewords
|ψj⟩, and we can re-write the decomposition over j to have only one error of each syndrome.
Let us write |ψj⟩ = Σu cju |ū⟩, with |ū⟩ = |u + C2⊥⟩ the basis codeword of equation (11.27). Then the
ideal decoder applied to equation (11.28) gives
Σ_{j,u} cju |u⟩|s(Ej)⟩,    (11.29)
with s(Ej) the error syndrome of Ej.
j,u
The s faults in the transversal measurement have the effect of up to s additional bit-flip errors, which we
can encapsulate as F, with wt F ≤ s. That gives us the state
F|ψ⟩ = Σu (Σj cju F Ej) |ū⟩,    (11.31)
on which we perform perfect single-qubit measurement. Notice that wt(F Ej) ≤ r + s ≤ t. Let's go even
further and expand |ū⟩ = Σ_{v∈C2⊥} |u + v⟩. Then
F|ψ⟩ = Σ_{u∈C1/C2⊥} Σ_{v∈C2⊥} (Σj cju F Ej) |u + v⟩.    (11.32)
Now, C1 has distance d1. If the CSS code is non-degenerate, d1 ≥ 2t + 1, and F Ej|w⟩ is orthogonal
to F Ej′|w′⟩ for any w ≠ w′ ∈ C1. When the code is degenerate, it is possible that F Ej|w⟩ = ±F Ej′|w′⟩
for some w ≠ w′ ∈ C1. However, since the quantum code does have distance 2t + 1, the only way this can
happen is if w + w′ ∈ C2⊥. Thus, if we want to calculate the probability of some output string x, at most
one coset u can contribute. This means the classical decoding of x will be unambiguous, giving a unique
coset u, which we can interpret as the measurement outcome.
It just remains for us to check that the probability of getting the outcome u is the same as in the ideal
case. The probability of the measured string being x for that particular measurement error F is
|⟨x|F|ψ⟩|² = |⟨x| (Σj cju F Ej) |ū⟩|²,    (11.33)
where u is the unique u consistent with x. The probability of outcome u is (11.33) summed over x:
Pr(u) = Σx Σ_{j,j′} cju c*j′u ⟨ū| Ej′† F† |x⟩⟨x| F Ej |ū⟩.    (11.34)
The sum over x runs over all xs that have non-zero probability for u, so we can replace Σx |x⟩⟨x|
with the identity. Then
Pr(u) = Σ_{j,j′} cju c*j′u ⟨ū| Ej′† Ej |ū⟩.    (11.35)
But we have restricted the sum over j to have just a single Ej with each error syndrome. Therefore, the
inner product gives 0 unless j = j′. Thus, Pr(u) = Σj |cju|², as desired.
It's also worth discussing briefly when a CSS code has additional transversal operations beyond CNOT
(or product of CNOTs) and measurements. Thinking briefly about the structure of a CSS code, it's clear
that transversal Hadamard is a gadget iff C1 = C2. The 7-qubit code has this property, as does the 4-qubit
code. In general, we can form a CSS code with C1 = C2 whenever C1⊥ ⊆ C1. However, as in the case of
the 4-qubit code, the action of the transversal Hadamard can be more than just Hadamard applied to all
encoded qubits. Since transversal Hadamard changes Xs into Zs and vice-versa, it does perform the logical
Hadamard on all qubits, but there may be some additional classical operation (such as SWAP or CNOT)
interacting the encoded qubits within a single block.
To have transversal Rπ/4 as a valid gadget for the code, we need further additional structure. Rπ/4 maps
X to Y, so the X generators of the code will get mapped to tensor products of Y and I, which must also be
in the stabilizer. Furthermore, the product of an X generator M and N = Rπ/4^⊗n M Rπ/4†^⊗n must also be in the
stabilizer. The product will be a tensor product of Z and I, in fact what would be obtained from M via a
transversal Hadamard, but with an additional phase which depends on the weight of M. The phase is i^{wt M}.
Since the stabilizer must be Abelian, wt M must be even, or else M and N would not commute, but we can
still get a phase of −1. In other words, to have transversal Rπ/4 as a valid gadget, we must have C1 = C2
and the weight of every generator must be a multiple of 4 (the classical code C1⊥ is doubly even). However,
even if C1⊥ is not doubly even, there is still a corresponding transversal gate, consisting of transversal Rπ/4
followed by some phase flips to fix up the phase. The logical operation performed by the transversal Rπ/4
(with or without phase flips) does not necessarily involve logical Rπ/4s, however.
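The doubly even condition is mechanical to check. For the 7-qubit code, C1⊥ is spanned by the three weight-4 Hamming parity checks; a quick sketch (mine):

# A sketch (not from the text): check that C1-perp for the 7-qubit code is
# doubly even, i.e., every codeword weight is a multiple of 4 -- the
# condition in the text for transversal R_{pi/4} (up to phase fix-ups).
from itertools import product

gens = [0b1010101,  # the three Hamming parity checks, as 7-bit masks
        0b0110011,
        0b0001111]

for coeffs in product([0, 1], repeat=3):
    word = 0
    for c, g in zip(coeffs, gens):
        if c:
            word ^= g
    assert bin(word).count("1") % 4 == 0
print("C1-perp is doubly even")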
11.5 Other Topics Relating to Transversal Gates
11.5.1 Codes With Transversal π/8 Gates
All the examples we've seen so far of transversal gates are Clifford group gates, but naturally it's also
possible to have codes with non-Clifford group transversal gates. Either the physical operations involved in
the gate or the logical effect of the gadget — or most often, both — can be a non-Clifford unitary. However,
stabilizer codes have a particular affinity for Clifford group gates because both are so closely connected with
the Pauli group. For many stabilizer codes, all the transversal gates are Clifford group gates. Nevertheless,
a few examples are known of stabilizer codes with non-Clifford transversal gates. We don't currently have a
systematic understanding of when this is possible and which gates can be done with which codes, but in this
section I'll present one particular family of codes with a non-Clifford transversal gate. This will illustrate
the principle, and the smallest code from this family will be of use in section 13.5.
The gate we'll be using is the transversal π/8 gate Rπ/8^⊗n. The Rπ/8 gate on a single qubit produces a
relative phase e^{iπ/4} between |0⟩ and |1⟩. Therefore, n of them applied to a basis state |x⟩ with wt x = w
produce a phase e^{iπw/4} relative to the |00...0⟩ state. In particular, basis states whose weight is a multiple
of 8 acquire no overall phase relative to the all-0's state.
Recall that the codewords of CSS codes are superpositions over cosets of some classical code. Suppose,
therefore, that we consider a classical code C2⊥ whose codewords all have weights ≡ 0 mod 8. (Such a code is
quadruply even.) The logical 0 of the corresponding CSS code is thus
|0̄⟩ = Σ_{v∈C2⊥} |v⟩,    (11.36)
which is a superposition of basis states with weight 0 mod 8. Therefore, Rπ/8^⊗n |0̄⟩ = e^{iθ}|0̄⟩. There is a global
phase e^{iθ} with θ = −nπ/8 because of our definition of Rπ/8, but the point is that |0̄⟩ is an eigenstate of the
transversal π/8 gate. Furthermore, any other logical basis codeword
|u + C2⊥⟩ = Σ_{v∈C2⊥} |u + v⟩    (11.37)
with w = wt u satisfies
Rπ/8^⊗n |u + C2⊥⟩ = e^{i(θ + πw/4)} |u + C2⊥⟩.    (11.38)
Z Z Z Z I I I I I I I I I I I
Z Z I I Z Z I I I I I I I I I
Z I Z I Z I Z I I I I I I I I
Z Z I I I I I I Z Z I I I I I
Z I Z I I I I I Z I Z I I I I
Z I I I Z I I I Z I I I Z I I
Z Z Z Z Z Z Z Z I I I I I I I
Z Z Z Z I I I I Z Z Z Z I I I
Z Z I I Z Z I I Z Z I I Z Z I
Z I Z I Z I Z I Z I Z I Z I Z
X X X X X X X X I I I I I I I
X X X X I I I I X X X X I I I
X X I I X X I I X X I I X X I
X I X I X I X I X I X I X I X
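The X-type rows in the table above (this appears to be the stabilizer of the 15-qubit code from the family under discussion) span a classical code we can check is quadruply even with a few lines (a sketch of mine):

# A sketch (not from the text): the four X-type rows of the table above span
# a classical code in which every nonzero codeword has weight 8, hence
# weight = 0 (mod 8) -- the quadruply even condition for transversal R_{pi/8}.
from itertools import product

x_rows = [0b111111110000000,
          0b111100001111000,
          0b110011001100110,
          0b101010101010101]

for coeffs in product([0, 1], repeat=4):
    word = 0
    for c, g in zip(coeffs, x_rows):
        if c:
            word ^= g
    assert bin(word).count("1") % 8 == 0
print("the X-generator span is quadruply even")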
[Diagrammatic equations (11.39) and (11.40): with r input errors and no faults the output passes an r-filter (11.39), but with s faults, a SWAP built from gates that directly interact the two data qubits can leave as many as r + 2s errors (11.40), since a single faulty gate touches two qubits.]
In order to get around this, we need to be careful how to implement the SWAP. The circuit given in
figure 11.2 is one solution. To swap two data qubits, add a third ancilla qubit to act as an intermediary. By
performing three SWAP gates as in the figure, the overall e↵ect is to do a SWAP on the data qubits, but
none of the three gates directly interacts the two data qubits. Thus, a single faulty gate can only cause an
error on one of the data qubits. After the circuit is finished, the ancilla qubit can be discarded. Also note
that the initial state of the ancilla qubit is irrelevant — the circuit works just as well no matter what it is.
It is also OK to reuse the ancilla qubit many times in SWAPs like this.
The upshot is that a circuit of SWAP gates each implemented via the method of figure 11.2 is fault-
tolerant, satisfying both the GPP and GCP. For some codes, this can significantly expand the set of available
fault-tolerant gadgets. Another possibility is to consider gadgets built of a transversal gate followed by a
Figure 11.2: A circuit for swapping two physical qubits for which one fault only causes one error in the final
state. The ancilla can be in any state.
fault-tolerant implementation of a permutation gate. The combination will be fault-tolerant as well. However,
whereas any product of transversal gates remains fault tolerant, as does any product of permutation gates,
the combination (transversal gate – permutation gate – transversal gate) is not in general fault tolerant. In
particular, consider a case for which the transversal gates interact two di↵erent blocks of the code. A fault
in the first transversal gate can create an error in qubit i of both blocks of the code. If the permutation gate
treats the two blocks di↵erently, it could move one of those errors to qubit j 6= i on one block, and then
the second transversal gate could propagate the error to qubit j on the other block, which would then have
errors on the two qubits i and j even though there was only a single fault in the circuit. Therefore, if we
want to combine transversal gates and permutations, we will need to periodically do error correction.
is the intersection of the set of transversal gates U(q)^⊗n and the set U(K) ⊕ U(q^n − K) of unitaries which
preserve the code space, both of which are closed sets.
Since G̃ is a Lie group, it has a Lie algebra g which is a subalgebra of the Lie algebra t of U(q)^⊗n. t is
spanned by elements of the form iH, where H is a weight-1 Hermitian operator. Therefore, an arbitrary
element iD of g can be written as a sum of weight-1 operators. However, the code has distance at least 2,
so it detects any single-register error. By theorem 2.4, D must therefore be a detectable error. That is, for
any codeword |ψ⟩, D|ψ⟩ = α|ψ⟩ + |φ⟩, where |φ⟩ is orthogonal to the code space Q.
Now, a neighborhood of the identity in G̃ is generated from the Lie algebra by the exponential function,
U = e^{itD}, D ∈ g.
U|ψ⟩ = e^{itD}|ψ⟩ = Σr (itD)^r/r! |ψ⟩ = |ψ⟩ + (itD)|ψ⟩ + O(t²).    (11.41)
For any t, U is a logical operation on the code, so this sum is a codeword for all t and all |ψ⟩ ∈ Q. The only
way this can be is if D|ψ⟩ ∈ Q. Combining with the error detection property, we find
D|ψ⟩ = αD |ψ⟩    (11.42)
when |ψ⟩ ∈ Q and D ∈ g. αD can depend on D, but by the error correction conditions, it cannot depend on
|ψ⟩.
Therefore, given any element U = e^{iD} in a neighborhood of the identity in G̃, we have U|ψ⟩ = e^{iαD}|ψ⟩
for all codewords |ψ⟩. Thus, U performs the trivial logical gate, and the neighborhood of the identity lies within K.
The finite-dimensional Lie group G̃ is a union of a discrete set of connected components. The connected
component containing the identity is generated by a neighborhood of the identity, and therefore the whole
connected component of the identity lies within K. The other connected components are cosets of the
identity component, so G = G̃/K is a discrete group.
Chapter 12
In this chapter, I’ll mostly discuss fault-tolerant error correction gadgets. An FTEC gadget invariably
involves some ancilla qubits, which are used to measure the error syndrome without removing the state
from its protective QECC. Since measurement is a big part of an FTEC gadget, the major methods of
fault-tolerant error correction can also be modified to produce fault-tolerant measurement gadgets, so I’ll
talk about those too.
FTEC gadgets are much more complicated than transversal gate gadgets. Designing one is a special
challenge. For other FT gadgets, you must simply be careful not to cause errors to propagate too badly
and not cause too many new errors. An FTEC gadget, operating correctly, must be able to eliminate errors
despite being imperfect itself. It’s too much to hope for (and, indeed, impossible to achieve) that the FTEC
gadget will necessarily output a perfect codeword, since there’s always the possibility of faults at the very
end of the procedure, but in a well-designed FTEC gadget, any errors dating from before the beginning of
the gadget will be corrected. Thus, an FTEC gadget may not be able to correct itself, but it at least corrects
for the previous EC gadget as well as any gate or other gadgets that may have occurred in between.
[Figure 12.1: (a) measurement of a Pauli P = P0 ⊗ P1 ⊗ P2 ⊗ P3 ⊗ P4 using a single ancilla qubit, prepared with a Hadamard, controlling each Pi on the data in turn, then Hadamarded and measured; (b) the same circuit with an X fault on the ancilla partway through the sequence of controlled gates.]
worry about fault tolerance. In any case, the construction serves as a good starting point for building a
fault-tolerant measurement procedure.
The solution is illustrated in figure 12.1a. To make the procedure useful for error correction, we'd like
to do a non-destructive measurement, leaving the data unchanged except for projecting on the eigenstate of
the Pauli $P$; assume the phase of $P$ is such that the eigenvalues are $\pm 1$. Therefore we'll need an ancilla, and
since the classical output is just one bit, a single ancilla qubit will suffice. Suppose we perform a controlled-$P$
with the ancilla as control and the data qubits as target. If the data began in a $+1$ eigenstate of $P$, then
whether the ancilla is $|0\rangle$ or $|1\rangle$, nothing happens. If the data began in a $-1$ eigenstate of $P$, then when the
ancilla is $|0\rangle$, again nothing happens, but if the ancilla is $|1\rangle$, the state acquires an overall sign of $-1$. When
the ancilla starts as $|+\rangle = |0\rangle + |1\rangle$, the state of the ancilla after the controlled-$P$ gate is $|0\rangle + (-1)^b|1\rangle$, where
$(-1)^b$ is the eigenvalue of $P$ for the data. Then a Hadamard transform on the ancilla maps $|0\rangle + (-1)^b|1\rangle \mapsto |b\rangle$,
and measuring the ancilla qubit gives us the answer. If the data began in a superposition of eigenvalues, it
collapses to an eigenstate with eigenvalue $(-1)^b$.
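To see the phase kickback concretely, here is a minimal numerical sketch of this non-FT measurement (a toy of my own, not part of the formalism here, taking the simplest case $P = Z$ on a single data qubit):

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def measure_pauli_eigenvalue(data):
    """Non-FT measurement of P = Z on one data qubit via one ancilla qubit.

    Circuit: ancilla |0> -> H -> controlled-Z -> H -> measure ancilla.
    Returns the probability of each ancilla outcome b (eigenvalue (-1)^b)."""
    state = np.kron(np.array([1.0, 0.0]), data)   # ancilla (first) tensor data
    HI = np.kron(H, np.eye(2))
    CZ = np.diag([1.0, 1.0, 1.0, -1.0])           # controlled-P for P = Z
    state = HI @ CZ @ HI @ state                  # rightmost acts first
    amps = state.reshape(2, 2)                    # rows indexed by ancilla bit
    return [float(np.sum(np.abs(amps[b])**2)) for b in (0, 1)]

print(measure_pauli_eigenvalue(np.array([1.0, 0.0])))  # |0>, +1 eigenstate: [1, 0]
print(measure_pauli_eigenvalue(np.array([0.0, 1.0])))  # |1>, -1 eigenstate: [0, 1]
```

For a general weight-$m$ Pauli $P$, the controlled-$Z$ would be replaced by a controlled-$P$ touching every qubit in the support of $P$, which is exactly the propagation hazard discussed next.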
In the non-fault-tolerant scenario, this works well, but it fails miserably as part of an FT gadget. For
one thing, it clearly fails the MCP. For instance, if there is an error in the ancilla qubit during the final
measurement, the classical outcome can be wrong, no matter how good the QECC used for the data is.
Furthermore, if this Pauli measurement is part of an EC gadget, it will likely also cause the gadget to fail
the ECRP. The problem is that we are using a single qubit to control multiple qubits within the same block
of the code. Thus, a single fault can cause multiple errors in the outgoing state. There’s no measurement
propagation property, since the standard measurement gadget is a destructive measurement, but if we made a
gadget for non-destructive measurement, and tried to do it via the method discussed above, the measurement
propagation property would fail as well.
Let’s see precisely what goes wrong. An example is shown in figure 12.1b. If the initial state of the
ancilla su↵ers a bit flip, that does nothing — X|+i = |+i. However, if the ancilla undergoes a bit flip
halfway through the controlled-P gate, we have a problem. The |0i branch and the |1i branch switch, so the
data qubits involved in gates after the bit flip get their Pauli operations when they’re not supposed to, and
don’t get Paulis when they are supposed to. Thus, a single faulty gate can cause multiple data errors, even
if the fault doesn’t directly a↵ect the data block. If an event like this happens near the end of an EC gadget,
the state won’t pass a 1-filter at this point. Even if it’s in the beginning or middle of an EC gadget, if the
code only corrects 1 error, it isn’t strong enough to function properly after getting so many errors a↵ecting
the same code block.
[Figure 12.2: Fault-tolerant measurement of $P = P_0 \otimes \cdots \otimes P_4$ using a cat state $|\mathrm{cat}\rangle$: each cat qubit controls one $P_i$, then each cat qubit is Hadamarded and measured; even parity of the outcome string gives $b = 0$, odd parity gives $b = 1$.]
To solve this problem, let's take inspiration from transversal gates. The problem of having a single error
in the ancilla propagate to multiple qubits is precisely the same error propagation issue we worried about
before, so we should somehow make the controlled-$P$ gate into a transversal gate. This means increasing the
size of the ancilla to involve $m = \operatorname{wt} P$ qubits. Then we'll have one ancilla qubit for each single-qubit Pauli
gate making up $P$, and each ancilla qubit interacts with only one qubit in the data block. That limits error
propagation, just as for transversal gate gadgets.
What state should the ancilla be in? We want a coherent superposition between two cases: one where
we do nothing to any of the data qubits, and one where we perform $P$ on all $m$ of the relevant data qubits.
Thus, the ancilla should be $|00\ldots0\rangle + |11\ldots1\rangle$. In the general quantum information literature, this state
is sometimes called the $m$-qubit GHZ state, but in the literature on fault tolerance, it is usually known as a
cat state. Like Schrödinger's Cat, the cat state is in a macroscopic superposition, in this case between the
all-0s and all-1s states.
After performing the transversal controlled-$P$, we have $|0\ldots0\rangle + (-1)^b|1\ldots1\rangle$ when the eigenvalue of $P$ is $(-1)^b$. We
could determine $b$ by disentangling the cat state and measuring $|0\ldots0\rangle + |1\ldots1\rangle$ vs. $|0\ldots0\rangle - |1\ldots1\rangle$ vs.
everything else, but it turns out there is an easier way. Perform the Hadamard transform on each qubit of
the cat state. Then
$$H^{\otimes m}(|0\ldots0\rangle + (-1)^b|1\ldots1\rangle) = \sum_x \left[1 + (-1)^{b + x\cdot(1,\ldots,1)}\right]|x\rangle \tag{12.1}$$
$$= \sum_x \left[1 + (-1)^{b + \operatorname{wt} x}\right]|x\rangle \tag{12.2}$$
$$= \begin{cases} \sum_{\operatorname{wt} x\ \mathrm{even}} |x\rangle & b = 0 \\ \sum_{\operatorname{wt} x\ \mathrm{odd}} |x\rangle & b = 1 \end{cases} \tag{12.3}$$
After the Hadamard transform, we can then measure every qubit and look at the weight of the resulting
string. The parity of the weight is equal to $b$. The procedure is summarized in figure 12.2.
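Equation (12.3) is easy to check numerically; here is a small sketch (mine, assuming numpy) that builds $|0\ldots0\rangle + (-1)^b|1\ldots1\rangle$, applies $H^{\otimes m}$, and lists the parities of the surviving strings:

```python
import numpy as np

H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)

def surviving_parities(b, m=4):
    """Parities of the basis strings left after H^{(x)m} on the signed cat state."""
    state = np.zeros(2**m)
    state[0], state[-1] = 1, (-1)**b              # |0...0> + (-1)^b |1...1>
    Hm = H
    for _ in range(m - 1):
        Hm = np.kron(Hm, H)
    state = Hm @ state
    support = np.nonzero(np.abs(state) > 1e-12)[0]
    return {bin(x).count("1") % 2 for x in support}

print(surviving_parities(0))  # {0}: only even-weight strings remain
print(surviving_parities(1))  # {1}: only odd-weight strings remain
```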
If we assume the cat state is provided to us with no errors, this circuit solves the issue of error propagation.
A single fault in it can certainly only produce one error in the data block. If we define a picture for the
Figure 12.3: a) A non-fault-tolerant circuit to construct a 4-qubit cat state. b) The effect of an error on this construction.
[Diagram (12.4): the picture for a single cat state measurement with $s$ faults,]

which gives a classical bit as output and has the data block persist, then we have shown that

[Diagram (12.5): an $r$-filter into an $s$-fault cat state measurement equals the same circuit followed by an $(r+s)$-filter.]
Figure 12.4: Checking a cat state to see if the first and last qubits are the same or different.
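In each branch of the cat state's superposition, the two CNOTs of figure 12.4 act classically, so the check simply computes the XOR of two cat-state qubits onto a fresh $|0\rangle$ check qubit. A sketch (the function is my own illustration):

```python
def check_outcome(cat_bits, i, j):
    """Measured value of the check qubit comparing cat qubits i and j."""
    check = 0
    check ^= cat_bits[i]   # first CNOT: cat qubit i controls the check qubit
    check ^= cat_bits[j]   # second CNOT: cat qubit j controls the check qubit
    return check

# Both branches of a good 4-qubit cat state pass the check:
print(check_outcome([0, 0, 0, 0], 0, 3), check_outcome([1, 1, 1, 1], 0, 3))  # 0 0
# An X error that spread to the last two qubits during construction is flagged:
print(check_outcome([0, 0, 1, 1], 0, 3))  # 1
```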
But hold on! The circuit of figure 12.4 is itself non-transversal. Couldn't a single error in it cause two errors in
the final cat state?
Surprisingly, no. To understand this, we have to look carefully at the types of errors and how they
propagate. As usual, it is sufficient to consider Pauli errors. Now, on the cat state, a single $Z$ error on any
qubit gives us $|0\ldots0\rangle - |1\ldots1\rangle$. Two $Z$ errors on different qubits give us $|0\ldots0\rangle + |1\ldots1\rangle$, which is the
correct state again. In general, an odd number of $Z$ errors gives us the same state as just one $Z$ error, and
an even number of $Z$ errors is the same as no $Z$ errors. That is, for the cat state, it's only possible to have
a single $Z$ error! (A more pessimistic point of view is that a single $Z$ error is already enough to completely
ruin the cat state, so having more can't make things worse.) A $Y$ error we can think of as an $X$ error plus
a $Z$ error, so we don't have to worry about it separately here.
Now, multiple $X$ errors are definitely a possibility, and something we need to avoid. But the circuit from
figure 12.4 can only cause one $X$ error in the cat state if there is a single faulty location. We'll detect a single
pre-existing $X$ error with the check, so we're not so worried about propagation of pre-existing $X$ errors.
Looking at figure 12.4, we don't have anything to worry about anyway, since $X$ can propagate in the first
CNOT from the cat state into the check qubit, but then in the second CNOT, it doesn't propagate back
into a second cat state qubit. ($X$ only propagates from control to target in a CNOT gate.) A fault in the
second CNOT can never cause more than one error in the cat state. So that leaves a fault in the $|0\rangle$ check
qubit preparation location or a fault in the first CNOT. But since CNOT only propagates phase errors from
the check qubit into the cat state, either of these faults can only produce a phase error in the second cat
state qubit. The worst case then seems to be a fault in the first CNOT that produces a bit flip error in the
first cat state qubit and a phase error in the check qubit, which then propagates into a phase error in the
second cat state qubit. This would seem to lead to a bit flip error in one cat state qubit and a phase error
in a second cat state qubit, for a total of two errors from one fault. However, $Z$ errors on different qubits
act the same way on the cat state (they are degenerate), so $X \otimes Z$ on the cat state can be rewritten as
$iY \otimes I$, which is a one-qubit error.
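These degeneracy claims are easy to verify directly. Here is a small numerical check (my own, with an `on_qubit` helper) that $Z$ errors on different cat-state qubits act identically, and that $X$ on one qubit together with $Z$ on another equals a single-qubit $Y$ up to a phase (the exact phase, $\pm i$, depends on the ordering convention):

```python
import numpy as np

I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])

def on_qubit(op, k, m):
    """op acting on qubit k (0-indexed) of an m-qubit register."""
    out = np.array([[1.0 + 0j]])
    for j in range(m):
        out = np.kron(out, op if j == k else I2)
    return out

m = 4
cat = np.zeros(2**m, dtype=complex)
cat[0] = cat[-1] = 1 / np.sqrt(2)          # |0000> + |1111>

# Z errors on different qubits are degenerate on the cat state:
print(np.allclose(on_qubit(Z, 0, m) @ cat, on_qubit(Z, 2, m) @ cat))   # True

# X on qubit 0 and Z on qubit 1 act like Y on qubit 0 alone, up to a phase:
lhs = on_qubit(X, 0, m) @ on_qubit(Z, 1, m) @ cat
rhs = on_qubit(Y, 0, m) @ cat
print(np.allclose(lhs, -1j * rhs))                                     # True
```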
There is one other issue we need to be careful about with the check qubits. A single bit flip error on the
check qubit, whether coming from the preparation of the check qubit or due to a fault on one of the CNOT
gates, can lead to the wrong outcome of the check. This doesn't directly lead to a problem in the cat
state, but as part of a gadget where multiple errors are a possibility, an incorrect check qubit could combine
with other faults to lead us to the wrong decision as to whether to keep the cat state or not. This can be
avoided by repeating the checks extra times, meaning more erroneous check qubits are required to trick us.
Taking all this into account, we can now say
[Diagram (12.6): as in (12.5), an $r$-filter into an $s$-fault cat state measurement equals the same circuit followed by an $(r+s)$-filter,]
without assuming a perfect cat state is provided. We’ll build our own.
Figure 12.5: Fault-tolerant cat state measurement, including repetition ($2t+1$ times) and interspersed error correction steps.
gadget as a whole rather than as separate error correction gadgets. They are sub-gadgets, if you like, of the
bigger FT measurement gadget.
Theorem 12.1. For a code that corrects $t$ errors, the cat state measurement gadget consisting of $2t+1$
repetitions of single-cat state measurements interspersed by FTEC steps satisfies the MCP. The classical
measurement outcome is taken by choosing the majority measurement outcome among the repetitions.
Proof. We have constructed the single-copy cat state measurement procedure so that
[Diagram (12.7): an $r$-filter into an $s$-fault single-copy cat state measurement equals the same circuit followed by an $(r+s)$-filter.]
Furthermore, if there are no errors in the incoming state and the single-copy cat state measurement circuit
has no faults (including during the construction of the cat state), then the outcome is correct:
[Diagram (12.8): a 0-filter into a 0-fault cat state measurement equals a 0-filter into an ideal (non-destructive) measurement.]
The picture assumes a non-destructive ideal measurement, since the cat state measurement procedure is
non-destructive, and equality includes having the same classical bits for the measurement outcome.
We need one more property of the single-copy cat state measurement. It is a non-destructive procedure,
and the output state does not have too many new errors (maximum equal to the number of faults). We’d like
to say that the encoded state doesn’t change under this procedure, except for being decohered in the basis
being measured. If the whole procedure has no incoming errors and no faults, we’ve already included this
effect in equation (12.8), but when there are errors or faults, we want a weaker property where the encoded
state is correct, but the measurement result may not be. We can codify this using the pictures
[Diagrams (12.9): the cat state measurement gadget (with $s$ faults) and the ideal measurement, each with the classical outcome discarded,]
to represent measuring non-destructively (using the cat-state measurement gadget and ideal measurement,
respectively) and discarding the classical outcome but keeping the quantum state.
Lemma 12.2. Single-copy cat state measurement satisfies

[Diagram (12.10): an $r$-filter into the $s$-fault cat state measurement with discarded outcome, followed by an ideal decoder, equals an $r$-filter followed by an ideal decoder and the ideal measurement with discarded outcome]

if $r + s \le t$.

where $|a\rangle$ are encoded basis states for an eigenbasis of the Pauli $P$ being measured, and we may assume the $E_j$
are Paulis of weight at most $r$ because of the $r$-filter. As in the proof of theorem 11.6, we can assume there is
only one $E_j$ for each error syndrome of the code.
Because of the procedure checking the cat state, if there are at most $s_c$ faults in the circuit constructing
and checking the cat state, the cat state used in the main part of the procedure can be written $F(|0\ldots0\rangle + |1\ldots1\rangle)$,
with $\operatorname{wt} F \le s_c$. The part of the procedure involving controlled-Paulis from the cat state to the
data has $s_g$ faults, with $s_c + s_g \le s$, and assume we can write the new errors as $G_c \otimes G_d \in \mathcal{P}_{n+m}$, occurring
after the gate. Subscripts $c$ and $d$ indicate the parts acting on cat state and data, respectively. We have
$\operatorname{wt} G_c \le s_g$ and $\operatorname{wt} G_d \le s_g$.
Now, $E_j$ can propagate into the cat state and $F$ propagates into the data block. Therefore, after the
interaction, the data has error $H_d^j = G_d F' E_j$ and the cat state has error $H_c^j = G_c E_j' F$. Here $F'$ is the
effect of propagating $F$ into the data and $E_j'$ is the effect of propagating $E_j$ into the cat state. Note that the
interaction gates are controlled-Pauli gates, which leave a single-qubit error the same on the original qubit,
but may add a Pauli on the other qubit involved in the gate. The only really important points are that
$$\operatorname{wt} H_d^j \le \operatorname{wt} G_d + \operatorname{wt} F' + \operatorname{wt} E_j \le s_g + s_c + r \le r + s \le t \tag{12.12}$$
[Diagrams (12.14)–(12.16): the 0-filter, 0-fault cat state measurement, and 0-fault FTEC step are rewritten in three steps: a 0-filter is inserted after the cat state measurement (12.14); the cat state measurement is replaced by an ideal measurement (12.15); and the extra 0-filter is removed (12.16).]
We use the ECRP in the first line to add a 0-filter and again in the last line to remove the 0-filter, and use
equation (12.8) to get the second line.
Now suppose we have a state which passes an $r$-filter initially and subject it to a circuit with $s_i$ faults in
cat state measurement repetition $i$ ($i = 0, \ldots, 2t$) and $s_i'$ faults in EC step $i$ ($i = 1, \ldots, 2t$). EC step $i$ occurs
between measurement repetitions $i-1$ and $i$. The total number of errors plus faults is $r + \sum_i s_i + \sum_j s_j' \le t$.
200
I claim that the state after cat state measurement repetition $i$ passes an $r_i$-filter, with $r_i = s_i + s_i'$: Let
$s_0' = r$. The case $i = 0$ follows directly from equation (12.7). For repetition $i$, we have

[Diagram (12.18): an $r_{i-1}$-filter into EC step $i$ ($s_i'$ faults) and cat state measurement repetition $i$ ($s_i$ faults) equals the same circuit followed by an $(s_i + s_i')$-filter,]

using the ECRP in the first line and equation (12.7) in the second.
After all repetitions of the cat state measurement gadget are completed, we discard the residual quantum
state and keep only the majority classical result. Suppose, however, we didn’t throw away the state, but
instead performed an ideal decoder on it. Then the LHS of the MCP would look like
[Diagram (12.19): the $r$-filter, the full sequence of cat state measurement repetitions and EC steps with their fault counts, and an ideal decoder at the end.]
Let us look at the last repetition of the measurement. By equation (12.18), we can insert an $r_{2t-1}$-filter before
the last EC step, and by the ECRP, we can insert an $s_{2t}'$-filter after the last EC step. We have for the LHS
of the MCP (focusing on the last repetition)

[Diagram (12.20): the last EC step and cat state measurement, with the inserted $r_{2t-1}$- and $s_{2t}'$-filters, followed by the ideal decoder.]

If $r_{2t} = s_{2t}' + s_{2t} = 0$, we can apply equation (12.8) to move the ideal decoder left past the cat state
measurement, getting an ideal measurement instead:

[Diagram (12.21): the same circuit with the ideal decoder before an ideal measurement.]

If $r_{2t} > 0$, we cannot do this, but it is still true that $r_{2t} \le t$, so we can apply equation (12.10) to move the
ideal decoder left. Instead of getting an ideal measurement, we get an ideal decoherence, and must discard
the classical measurement result to get equality:

[Diagram (12.22): the same circuit with the ideal decoder before an ideal measurement whose classical outcome is discarded.]

Of course, in the real system, we don't know which case we have; this is just a formal approach to help us
understand the effect of the faulty measurement gadget.
The point is, in either case, we are left with the combination

[Diagram (12.23): the circuit up through the last EC step, with the inserted filters, followed by the ideal decoder]

followed by an ideal measurement. If $r_{2t} = 0$, the classical measurement outcome is correct, whereas if
$r_{2t} > 0$, the classical measurement outcome might be incorrect. We can further process (12.23) using the
FTEC properties. First use the ECRP to remove the $s_{2t}'$-filter, then use the ECCP, with the condition that
$r_{2t-1} + s_{2t}' \le t$, to move the ideal decoder further left:

[Diagrams (12.24)–(12.25): the ideal decoder moved left past the last EC step.]

Now we can remove the $r_{2t-1}$-filter using equation (12.18) again, and we now have the $(2t-1)$st repetition
of the single-copy cat state measurement followed by an ideal decoder. We can follow the same procedure
to move the ideal decoder left again, past the EC step and cat state measurement, leaving the ideal decoder
after the $(2t-2)$th repetition. We keep on doing this until the ideal decoder is just after the first repetition:

[Diagram (12.26): the $r$-filter and the first cat state measurement repetition, followed by the ideal decoder and the remaining (now ideal) measurement steps.]
We then apply equation (12.8) or equation (12.10) one last time to move the ideal decoder to just after the
r-filter.
We are left with an r-filter, followed by an ideal decoder, followed by a sequence of measurement steps,
in some of which we keep the measurement result and in others of which we discard the measurement result.
In particular, we had to discard all the measurement results corresponding to repetitions with faults, and
keep the measurement results corresponding to repetitions with no fault in either the EC step or cat state
measurement step. In the real measurement gadget, we don’t know which ones to keep and which ones to
throw away, so instead we just get a sequence of ideal measurement steps, but with some of the classical
results being possibly wrong. Notice that, since each cat state measurement is for the same Pauli $P$, at least
all the correct measurement results are the same.
Since $\sum_{i=0}^{2t} r_i \le t$, at least $t+1$ of the $r_i$ must be 0. Thus, out of the $2t+1$ classical measurement results,
at least t + 1 of them are correct. Therefore, taking the majority gives the correct measurement outcome,
and we can conclude
[Diagram (12.27): the $r$-filter and $s$-fault measurement gadget followed by an ideal decoder equals the $r$-filter followed by an ideal decoder and an ideal measurement,]

leaving the same residual quantum state on the left and the right. Therefore, the LHS and RHS are also the
same if we throw away the quantum state, which means we no longer need the ideal decoder on the LHS.
That gives us the MCP.
We’ve constructed the cat state measurement so that we don’t get too many extra errors appearing in
the data during the syndrome measurement — no more than the number of faults. However, that’s not good
enough for the ECRP; we need to get rid of all the pre-existing errors too. Furthermore, we still have the
problem that a single error during a cat state measurement can give us the wrong result. While we don’t
care about the value of the syndrome per se — we discard it after the EC gadget is done — we are going to try
to correct a qubit based on the syndrome, and if we have the wrong syndrome, we will incorrect the state
rather than correcting it.
For instance, consider the five-qubit code, table 3.2. Suppose the true error is $Y_3$, which has syndrome
1110. A single fault during the cat state measurement for the first syndrome bit could instead give us
syndrome 0110, which corresponds to the error $X_4$. Instead of removing the error, we add one.
This may not worry you, since the ECCP only insists that the EC gadget behave correctly when $r + s \le 1$,
and a pre-existing $Y$ error plus a fault during the measurement gives $r + s = 2$. However, imagine that there
are no errors on the incoming state ($r = 0$), but a single fault occurs on the controlled-$Z$ gate between qubit
3 of the cat state and qubit 3 of the data block during the cat state measurement for the first syndrome
bit. Since the location is a two-qubit gate, a fault can produce errors on both qubits involved in the block.
For instance, it could produce a $Y$ error on the data qubit and a $Z$ error on the cat state qubit. With
just a single fault, this will cause both the error on the state to be $Y_3$, as above, and the first bit of the
syndrome measurement to be wrong. Thus, we have a mistaken notion that the error is $X_4$ and perform the
appropriate correction, giving us an output state with two errors on it. Even an ideal decoder won't get it
right at this point.
Actually, the situation is even worse than that. Imagine that instead of having the fault at the beginning
of the syndrome measurement process, we had it in the middle. Initially, there is no error, but then, perhaps
after successfully measuring the first two bits of the error syndrome (and getting 00, since there is as yet no
error), the error $Z_2$ occurs. Suppose we then measure the next two syndrome bits correctly too, getting 01.
We then put this together to find the syndrome 0001, which corresponds to error $X_1$. The error syndrome
changes in the middle of error correction, causing us to get a hybrid syndrome, which corresponds to a
totally different error. Again, a single fault causes us to end up with two errors in the code block.
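Both of these examples are quick to reproduce. Here is a sketch assuming the usual cyclic generators $XZZXI$, $IXZZX$, $XIXZZ$, $ZXIXZ$ for the five-qubit code (with this choice, the syndromes come out exactly as quoted above):

```python
generators = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]  # five-qubit code stabilizer

def anticommute(p, q):
    """1 if the single-qubit Paulis p and q anticommute, else 0."""
    return int(p != "I" and q != "I" and p != q)

def syndrome(error):
    """Syndrome bits of a 5-qubit Pauli error, written as a string like 'IIYII'."""
    return [sum(anticommute(e, g) for e, g in zip(error, gen)) % 2
            for gen in generators]

print(syndrome("IIYII"))  # Y3: [1, 1, 1, 0], i.e. 1110
print(syndrome("IIIXI"))  # X4: [0, 1, 1, 0], one flipped bit away from 1110
print(syndrome("IZIII"))  # Z2: [0, 1, 0, 1]; its last two bits are the 01 above
print(syndrome("XIIII"))  # X1: [0, 0, 0, 1], the hybrid syndrome 0001
```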
Figure 12.6: Repeating the measurement of syndrome bits one-by-one for the five-qubit code. Because $Y_3$ occurs only after all repetitions of the first syndrome bit measurement are complete, the observed syndrome is 0110, which corresponds to $X_4$ instead of the correct syndrome 1110.
Figure 12.7: Measuring the full syndrome and then repeating, for the five-qubit code. When the $Y_3$ error occurs after the first syndrome measurement, the second and subsequent syndrome measurements give the correct syndrome 1110.
in a row that are all the same, we thus know they must be correct, since at least one of those measurements
occurred with no faults. Indeed, even if not all the syndrome measurements that agree are consecutive,
we can be sure one of them was correct. It may not correspond to the current syndrome, but at least it
corresponded to the correct syndrome at some point, and that will actually be enough to give us both the
ECCP and the ECRP. The ECRP, for instance, insists that we correct all the errors that were in the state
going into the EC gadget, but doesn’t insist that we correct errors due to faults during the gadget.
Unfortunately, since there are $2^{n-k}$ possible values of the syndrome, it is not enough to repeat $2t+1$ times
and take the majority, as we did for the measurement gadget. There might not be a majority. In particular,
the syndrome can change over the course of the process, so even though a majority of the measurements
have to be correct, the syndromes produced by the correct measurements could differ.
For simplicity, I'll use the rule "repeat many times and then look for a sequence of $t+1$ consecutive
syndrome measurements that all agree." If there is more than one such sequence, take the last one (since it
will give the most up-to-date value of the syndrome). How many repetitions do we need to ensure that we
get a sequence all the same? Well, we could get a sequence of $t$ that agree (without faults), followed by one
that disagrees (due to a fault). Then we could get another sequence of $t$ that agree, followed by one that
disagrees, and so on. To get each one that disagrees, we need a fault, and there are a maximum of $t$ faults
in the circuit. Therefore, we could use up $t(t+1)$ repetitions without ever getting a sequence of $t+1$ in a row
that all agree. However, if we then do $t+1$ further repetitions, there can be no more faults in the circuit, and
all of the last $t+1$ repetitions will agree. Therefore, $(t+1)^2$ repetitions is sufficient.
In the case where we don’t have any string of t + 1 consecutive syndrome measurements that agree, take
the longest string (and latest, if there is a tie), and use that. This can only occur if there are more than t
faults in the circuit, but it makes sure we have a well-defined protocol in all cases.
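As a sketch of this rule in code (names mine; each extracted syndrome is a tuple of bits):

```python
def runs(syndromes):
    """Maximal runs of consecutive identical syndromes, as (start, length)."""
    out, start = [], 0
    for i in range(1, len(syndromes) + 1):
        if i == len(syndromes) or syndromes[i] != syndromes[start]:
            out.append((start, i - start))
            start = i
    return out

def deduce_syndrome(syndromes, t):
    """Take the last run of >= t+1 identical consecutive syndromes;
    failing that, the longest run (and the latest such run on a tie)."""
    all_runs = runs(syndromes)
    good = [r for r in all_runs if r[1] >= t + 1]
    start, _ = good[-1] if good else max(all_runs, key=lambda r: (r[1], r[0]))
    return syndromes[start]

# t = 1, so (t+1)^2 = 4 repetitions suffice; one fault corrupts a middle reading:
reps = [(1, 1, 1, 0), (0, 1, 1, 0), (1, 1, 1, 0), (1, 1, 1, 0)]
print(deduce_syndrome(reps, t=1))  # (1, 1, 1, 0)
```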
Certainly, using different rules to determine the syndrome, we could significantly reduce the number of
repetitions needed. Indeed, we might even hope to track changes to the syndrome as new faults occur,
which could allow us to deduce the current syndrome rather rapidly. However, it is difficult to analyze such
strategies in the completely general case, so I’ll stick with the simple rule given above.
Proof. As usual, we can assume all faults cause Pauli errors and the errors on the input state are Pauli
errors. By linearity, we don’t need to consider a superposition of input states with di↵erent errors; it will be
sufficient to just look at a single Pauli error and show that we can correct that.
From equation (12.7), we know that each single cat state measurement can add only as many errors as
the total number of faults in the circuit. Thus, if the input state passes an r-filter, and the circuit has s
faults, then at all times, the data state will pass an (r + s)-filter. However, this is not quite sufficient, since
we’d also like to know the error on the state doesn’t radically change, which would invalidate any earlier
attempt we made at learning the error syndrome. We’d also like to be able to say something about the case
when the measurement has no errors. Equation (12.8) is also insufficient, since to do error correction, we
need to consider the case where the state being measured has errors on it.
Claim 12.4. Suppose $|\psi\rangle$ is a codeword of $S$, $E \in \mathcal{P}_n$, and $E|\psi\rangle$ is given as input to a single cat state
measurement of $M \in S$. If the measurement circuit has 0 faults, then the output state is $E|\psi\rangle$ and the
classical measurement outcome is $c(E, M)$. If the measurement circuit has $s \le t$ faults, then the output state
is $FE|\psi\rangle$, with $F \in \mathcal{P}_n$, $\operatorname{wt} F \le s$.
Proof of claim. The 0-fault part of the claim can be easily checked directly from the description of cat state
measurement. For the $s$-fault part of the claim, consider applying equations (12.7) and (12.10) to the code
obtained from $S$ by applying the Pauli $E$. (The code $ESE^\dagger$ has the same distance as $S$.)
First, we'll prove the ECRP. As discussed above, if we have an $s$-fault Shor EC gadget with $s \le t$, then
there must be a sequence of $t+1$ consecutive syndrome extraction steps (each consisting of $n-k$ cat state
measurements of the syndrome bits) for which all extracted syndromes agree, and furthermore, at least one
of the syndrome extraction steps has no faults.
Consider the last sequence of $t+1$ identical syndromes and the last faultless syndrome extraction step
in the sequence, and suppose the error entering that step is $E$. Then, according to claim 12.4, the syndrome
deduced by the measurements is indeed $\sigma(E)$. According to the rule used above for determining the syndrome
in Shor EC, $\sigma(E)$ is thus chosen as the "correct" error syndrome deduced by the gadget. The state exiting
the last cat state measurement in the faultless syndrome extraction is $E|\psi\rangle$. Then we can apply claim 12.4
repeatedly to deduce that the state after the last cat state measurement in the whole gadget is $FE|\psi\rangle$. The
additional error $F$ has weight at most the total number of faults in the circuit after the faultless
syndrome extraction we identified, so $\operatorname{wt} F \le s$.
If the final Pauli correction step has no faults, the final output state of the gadget will be $E'FE|\psi\rangle =
\pm F(E'E)|\psi\rangle$, where $E'$ is an error with the same error syndrome as $E$. We have not put any constraint
on $\operatorname{wt} E$, so it is certainly quite possible that $E' \ne E$. However, we do know that $E'E \in N(S)$, since they
have the same error syndrome. Thus, $(E'E)|\psi\rangle$ is also a valid codeword, and the state passes an $s$-filter, as
desired for the ECRP.
When the final Pauli correction step does have faults, the output state of the gadget will have additional
errors. However, the Pauli correction is done transversally (though it is not a gadget!), so any faults in the
correction can only cause errors on the individual qubits affected by the locations with the faults. That is,
the output state will instead be $GE'FE|\psi\rangle = \pm(GF)(E'E)|\psi\rangle$, with $\operatorname{wt} G$ at most the number of faults in
the Pauli correction step. Since $\operatorname{wt}(GF)$ is at most the total number of faults in the gadget, the final state
does still pass an $s$-filter.
For the ECCP, a slightly modified version of the above argument works. Now we restrict the input state
to have at most $r$ errors. Assume the input state is $E_0|\psi\rangle$, with $|\psi\rangle$ a codeword and $\operatorname{wt} E_0 \le r$. Again we
identify a faultless syndrome extraction step in the sequence of $t+1$ steps used to deduce the error syndrome.
Applying claim 12.4 to all the cat state measurements before the syndrome extraction step of interest, we
find the error entering that syndrome extraction step is $E_1 E_0|\psi\rangle$, with $\operatorname{wt} E_1$ at most equal to the number of
faults before the chosen step. Let $E = E_1 E_0$. Applying the 0-fault part of claim 12.4, the syndrome deduced
by the faultless syndrome extraction step is $\sigma(E)$ and the state exiting the step is $E|\psi\rangle$. Then we apply
claim 12.4 to all the cat state measurement steps after the chosen syndrome extraction, ending up with the
state $FE|\psi\rangle$. As before, the final Pauli correction leaves us with the state $GE'FE|\psi\rangle = \pm(GF)(E'E)|\psi\rangle$.
From above, we know that $\operatorname{wt}(GF) \le s \le t$.
In this case, we also have a bound on $\operatorname{wt} E$: We know $\operatorname{wt} E_0 \le r$, and $\operatorname{wt} E_1 \le s$, since $E_1$ comes from
faults in cat state measurements during the gadget. Thus, $\operatorname{wt} E \le r + s \le t$. Since the code corrects $t$ errors
and $\sigma(E)$ is a syndrome which includes errors of weight at most $t$, it must be that $E'E \in S$. (It's still possible
that $E' \ne E$ if the code is degenerate.) Therefore, the output of the gadget is the state $\pm(GF)|\psi\rangle$. Since
$\operatorname{wt}(FG) \le t$, the ideal decoder can correct this error, so the LHS of the ECCP gives output $\pm|\psi\rangle$.
The RHS of the ECCP has the ideal decoder acting on the state $E_0|\psi\rangle$, with $\operatorname{wt} E_0 \le r \le t$. The ideal
decoder can correct $E_0$, giving the output $|\psi\rangle$. The LHS and the RHS could differ by a global phase which
depends on $E$ and $F$ but not on $|\psi\rangle$. The pictures represent equality as CP maps, so the global phase does
not matter. Therefore, we have proven the ECCP for Shor EC.
You might be a bit worried by the global phase appearing at the end of the proof. If the errors are
indeed pure Pauli errors, the phase is indeed a global phase, and everything is fine. But what if we have a
superposition of Pauli errors instead? Since the phase depends on the exact errors and fault path, it seems
like this might produce a relative phase rather than a global phase. However, it turns out everything is still
OK.
Let’s carefully think through the proof of the ECCP with superpositions of errors. We have a syndrome
measurement procedure that determines the syndrome is (E). That means that, even if the error at the
point of syndrome extraction is actually a superposition of di↵erent errors, the syndrome extraction itself
will decohere the errors (as we saw in section 2.4) to produce a superposition over only Pauli errors that
share the same syndrome. Since r + s t, all the Paulis in the superposition will have weight less
P than t
and those with the same syndrome can di↵er only by elements of the stabilizer. That is, E = E 00 ↵ c↵ M↵ ,
206
where E 00 2 Pn and M↵ 2 S. Then
X
E| i = E 00 c↵ M↵ | i = E 00 | i (12.28)
↵
up to normalization.
The correction $E'$ is always a Pauli (for a stabilizer code, which is assumed for Shor EC), since we are
choosing it based on the measured syndrome. However, $F$, caused by faults after the successful syndrome
measurement, and $G$, caused by faults in the final Pauli correction, could be superpositions of Paulis. The
state of the LHS of the ECCP just before the ideal decoder is
$$GE'FE|\psi\rangle = GF'|\psi\rangle, \tag{12.29}$$
where $F'$ is the same superposition of Paulis as $F$, but with some signs changed due to commuting $E'$
through. $GF'$ might be a different error than $GF$ in more significant ways than a global phase, but both
are errors of weight at most $t$, and so the ideal decoder will correct either just as well. Thus, the final state of the LHS
of the ECCP is indeed $|\psi\rangle$ up to global phase and perhaps normalization, so the ECCP holds even for
superpositions of Pauli errors.
12.3.1 Steane EC
Steane error correction is available for CSS codes. The basic idea is to take advantage of the fact that
transversal CNOT performs the logical CNOT for any CSS code.
How can that help us do error correction? Consider the circuit of figure 12.8a. The ancilla is in the state
$|+\rangle = |0\rangle + |1\rangle$, encoded in the same CSS code as the data. The transversal CNOT from the data block to
the ancilla block performs CNOT on the logical state, but $\mathrm{CNOT}\,|\psi\rangle|+\rangle = |\psi\rangle|+\rangle$, regardless of $|\psi\rangle$. The
operation is therefore trivial if we only focus on the encoded states. Still, the transversal CNOT does do
something: It propagates errors. In particular, if there are bit flip errors in the data block, they propagate
to the same locations in the ancilla block.
Figure 12.8: a) The Steane method of bit flip error correction for the 7-qubit code. b) The Steane method of phase error correction for the 7-qubit code.
We saw in section 11.4 that transversal measurement followed by classical decoding performs logical
measurement for a CSS code. Since the ancilla encodes the state $|+\rangle$, logical measurement just gives a
random outcome. In this case, however, we're interested in another aspect of the decoding, namely the
classical error correction. Recall that transversally measuring a codeword of a CSS code gives a classical
codeword, chosen randomly from the appropriate coset in $C_1/C_2^\perp$. In this case, the coset is also random, so
in the absence of errors, we would get a uniformly random codeword of $C_1$.
When the state being measured has bit flip errors in it, instead of a codeword of $C_1$, we get a codeword
of $C_1$ with some errors in it. If the measurement itself has no faults and the original quantum state has
the bit flip error $(e|0)$ (written in binary symplectic notation), then the resulting classical codeword has the
error $e$. Thus, calculating the classical error syndrome will tell us the bit flip part of the error syndrome for
the CSS code. If there are faults in the measurement, a few additional classical bits may end up flipped,
specifically those corresponding to the qubits affected by the faults. The classical error will thus include the
quantum bit flip error plus a small number of additional bit flips.
Of course, what we are actually measuring is the bit flip error on the ancilla. Any bit flips present in the
data have propagated to the ancilla, but there might also have been bit flips already present in the ancilla
block. The circuit of figure 12.8a taken as a whole thus measures the error syndrome of an error $e + f + g$,
where $(e|0)$ is the bit flip part of the error initially on the data block, $(f|0)$ is the bit flip error on the ancilla
block entering the circuit, and $(g|0)$ are errors due to faults in the measurement step. Still, this tells us
something about bit flips on the data block, so we can use it for error correction.
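Concretely, the classical post-processing here is an ordinary parity check computation. A sketch for the 7-qubit code, whose $C_1$ is the $[7,4]$ Hamming code (the specific matrix is the standard one; treat the details as illustrative):

```python
import numpy as np

# Parity check matrix of the [7,4] Hamming code: column j is j in binary.
Hmat = np.array([[0, 0, 0, 1, 1, 1, 1],
                 [0, 1, 1, 0, 0, 1, 1],
                 [1, 0, 1, 0, 1, 0, 1]])

def bit_flip_syndrome(measured_bits):
    """Classical syndrome of the transversally measured 7-bit string; with a
    clean ancilla and clean measurements this is the syndrome of the error e."""
    return Hmat @ np.array(measured_bits) % 2

codeword = np.array([1, 1, 1, 0, 0, 0, 0])     # a Hamming codeword
e = np.array([0, 0, 0, 0, 1, 0, 0])            # bit flip on qubit 5
print(bit_flip_syndrome((codeword + e) % 2))   # [1 0 1] = 5 in binary
```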
Of course, as we discussed in section 10.1.2, error propagation works both ways. While the bit flip errors
from the data block are getting copied into the ancilla block, where they can be measured, any phase errors
in the ancilla block simultaneously jump onto the data block. This is analogous to what happens in cat state
measurement, where errors in the cat state construction can propagate into the data.
For these two reasons, we’ll have to be careful how we build the ancilla state. If it has too many bit flip
errors, the syndrome we deduce will be badly wrong, and if it has too many phase errors, the data block will
inherit them and may end up wrecked, with too many phase errors to be fixed by error correction.
We also need a way to measure the phase errors in the data block. We can do that as in figure 12.8b,
using the ancilla $|0\rangle$ (encoded in a block of the same code) plus transversal CNOT with the ancilla as
control and the data as target. Then phase errors flow from the data into the ancilla; of course, bit flip
errors now propagate from ancilla into data. Since we want to identify the phase error syndrome, we should
perform a transversal Hadamard, switching $C_1$ with $C_2$ and $X$ errors with $Z$ errors. Now when we measure
transversally, we get a random codeword of $C_2$ with errors in the locations of the phase errors in the ancilla.
Thus, we learn $e' + f' + g'$, where $(0|e')$ is the phase error in the data block, $(0|f')$ is the phase error in the
ancilla, and $(0|g')$ are phase errors in the measurements. And of course, while we do this, any bit flip errors
in the ancilla block will propagate into the data block.
A full Steane EC gadget consists of both bit flip syndrome measurement and phase error syndrome
measurement, followed by a Pauli correction step. The process is summarized in figure 12.9. The figure has
bit flip error correction before phase error correction, but the procedure is also fault-tolerant when they are
done in the reverse order. In theorem 12.5 I’ll prove the ECRP, which says that the state at the end of the
gadget has no more than s errors when there are s faults in the gadget. The order of error correction steps
doesn’t a↵ect this, but it does mean that the type of errors at the end of the gadget is more likely to be
the type corresponding to the first type of error correction you choose to do, simply because there’s more
time (i.e., more locations) since the error correction was performed, giving more opportunity for new errors
of that type. On the other hand, if there are more errors of one type than another in the state entering the
FTEC gadget, doing that type of error correction first makes it more likely that we’ll be able to correct the
state even if r + s > t: we may get lucky and be able to correct the pre-existing error before new errors occur
due to faults in the gadget.
There’s something clearly missing in my description of Steane EC: How do we make the ancilla states
and ensure they do not have too many errors? That is the complicated part of Steane EC. However, it is a
complication that we would have to face anyway: In order to have a full FT protocol, we need preparation
gadgets that create at least one type of encoded state, such as a $|0\rangle$. This is precisely the ancilla state we
need to do Steane phase error correction. For Steane bit flip error correction, we also need a $|+\rangle$ state. That
Figure 12.9: Steane error correction as a whole, excluding the final Pauli correction.
will require a slightly different protocol, but one that uses the same ideas. We'll discuss how to make both
of these states in chapter 13.
[Diagram (12.31): an $r$-filter into an $s$-fault gadget equals the same circuit followed by an $(r+s)$-filter.]
[Figure 12.10: Steane measurement, using an encoded $|0\rangle$ ancilla block.]
produced by decoding the classical codeword. But that information is there, and is useful just as in Steane
EC. In the right context, it is helpful to both measure the state and correct any bit flip errors we might find.
Figure 12.11: Steane error correction with a fault in one of the CNOT gates. The gate fault causes a $Z$ error on the first ancilla qubit, which produces an error syndrome corresponding to a $Z$ error on the first data qubit. The corresponding "correction" combines with an $X$ error on the first data qubit caused by the same gate fault to produce an overall $Y$ error on the first qubit; but it is still a single-qubit error.
error in the ancilla which changed the measured syndrome. For Steane EC, this can still happen, but only
in a very restricted way: The ancilla error caused by such a fault can only change the syndrome in such a
way as to add or remove an error on the same qubit involved in the two-qubit location, as in figure 12.11.
(This is presuming the total number of faults in the circuit is less than t.)
The upshot is that we don’t have the same problems that made it mandatory to repeat syndrome
measurements in Shor EC. Of course, by not repeating, we do make it easier for errors in the ancilla to
end up in the data. No matter what we do, in Steane EC, phase errors from the ancilla will propagate into
the data during bit flip correction, but if we don’t repeat the syndrome measurement, bit flip errors in the
ancilla will also end up in the data, as in figure 12.12. Without repeating the measurement, there is no way
to distinguish them from bit flip errors in the data, so our attempt at correcting them will actually add them
as errors to the data block. That’s not a huge problem; they will be corrected in the next EC gadget.
If you prefer to repeat the syndrome measurement for Steane EC, that's fine. It's a trade-off, however.
By repeating, you will have more confidence in the syndrome you deduce, and ancilla errors will be less
likely to end up in the data block. However, you also increase the overhead, and in the process add more
opportunity for errors to build up and overwhelm your error correction capability.
Figure 12.12: If Steane error correction is not repeated, an ancilla error will be incorrectly interpreted as a data error that needs to be corrected.
12.3.4 Steane EC and Measurement Satisfy the FT Properties
Theorem 12.5. Steane EC satisfies the ECRP and ECCP. Steane non-destructive measurement satisfies
the non-destructive version of the MCP and the non-destructive measurement propagation property.
Proof. Proving Steane measurement satisfies the two non-destructive measurement properties is not hard:
Steane measurement is made of three steps: $|0\rangle$ FT state preparation, transversal CNOT, and transversal
(destructive) measurement. All three of these steps are FT gadgets in their own right, so we can apply the
FT properties of the sub-gadgets in order to prove the FT properties for Steane measurement.
To be more explicit: First, apply the PPP and the GPP to put filters after the first two subgadgets:
[Diagram (12.32): the data block passes an $r$-filter; the ancilla preparation has $s_1$ faults, the transversal CNOT has $s_2$ faults, and the measurement has $s_3$ faults. On the RHS, an $s_1$-filter is inserted after the ancilla preparation and an $(r + s_1 + s_2)$-filter after the CNOT.]
(Note that we can use the GPP to put just a filter on the first block and not on the second, see exercise ??.)
Then by using the PPP again to remove the $s_1$-filter, we have already proven the measurement propagation
property, since $s = s_1 + s_2 + s_3 \ge s_1 + s_2$.
For the MCP, we do the same thing to a circuit with an ideal decoder and then use the destructive
measurement MCP to replace the measurement subgadget with an ideal decoder. We can do this since
$(r + s_1 + s_2) + s_3 = r + s \le t$ by hypothesis. We find

[Diagram (12.33): the same circuit with the measurement subgadget replaced by an ideal decoder, keeping the $(r + s_1 + s_2)$-filters after the CNOT.]
We can then remove the $(r + s_1 + s_2)$-filters using the GPP and apply the GCP to move the ideal decoders
to before the gate gadget:

[Diagram (12.34): the ideal decoders moved back before the transversal CNOT, which becomes an ideal logical CNOT.]
Then use the PPP to remove the $s_1$-filter after the ancilla preparation and the PCP to turn it into an ideal
preparation:

[Diagram (12.35): the $r$-filter and ideal decoder on the data, an ideal $|0\rangle$ preparation, an ideal CNOT, and an ideal measurement of the ancilla.]
The RHS takes the state that passes through the $r$-filter, decodes it, does a perfect CNOT to a perfect
$|0\rangle$, and measures the ancilla. This implements an ideal non-destructive measurement, proving the MCP.
Steane EC requires a more careful treatment. We’ll prove fault-tolerance for the version of Steane EC
with phase error correction first, but the proof is essentially identical when bit flip error correction is first.
As usual, we’ll assume the input state has a Pauli error and that all faults in the gadget also cause Pauli
errors.
It will be most convenient to represent the errors in binary symplectic form. We’ll name the errors from
the various parts of the circuit as described in table 12.1. Depending on the detailed implementation, there
Part             New data      New ancilla     Total data                               Total ancilla
Input error      (e_x|e_z)                     (e_x|e_z)
Phase prep.                    (a_x|a_z)       (e_x|e_z)                                (a_x|a_z)
Ph. CNOT         (c_x|c_z)     (c'_x|c'_z)     (e_x+a_x+c_x | e_z+c_z) = (D_x|D_z)      (a_x+c'_x | a_z+e_z+c'_z)
Phase meas.                    (f_x|f_z)       (D_x|D_z)                                (a_z+e_z+c'_z+f_x | a_x+c'_x+f_z)
Bit flip prep.                 (b_x|b_z)       (D_x|D_z)                                (b_x|b_z)
B.f. CNOT        (d_x|d_z)     (d'_x|d'_z)     (D_x+d_x | D_z+b_z+d_z)                  (b_x+D_x+d'_x | b_z+d'_z)
B.f. meas.                     (g_x|g_z)       (D_x+d_x | D_z+b_z+d_z)                  (b_x+D_x+d'_x+g_x | b_z+d'_z+g_z)

Table 12.1: Symbols for errors in the various parts of Steane EC in the data block and ancilla blocks. The ancilla column is for the ancilla block used for the type of errors being corrected at the moment. "New" indicates new errors caused at the end of that part, and "total" indicates the overall error at the end of that part due to error propagation. Error means error relative to the ideal starting codeword.
may be wait steps on the data block after the CNOT steps; for the purposes of this calculation, errors due
to the wait locations can be absorbed into the errors due to the CNOTs. Similarly, the errors due to the
Hadamards needed for the phase measurement can be absorbed into the measurement locations.
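The error-propagation bookkeeping in table 12.1 is mechanical and easy to script. In binary symplectic form, a transversal CNOT copies $X$ components from the control block to the target block and $Z$ components from the target block to the control block; here is a sketch (layout and names mine):

```python
import numpy as np

def propagate_cnot(ctrl, targ):
    """Propagate (x|z) errors through a transversal CNOT, ctrl -> targ.

    ctrl, targ: dicts {'x': vec, 'z': vec} of binary vectors over the block."""
    targ['x'] = (targ['x'] + ctrl['x']) % 2   # X errors copy control -> target
    ctrl['z'] = (ctrl['z'] + targ['z']) % 2   # Z errors copy target -> control
    return ctrl, targ

n = 7
data = {'x': np.zeros(n, dtype=int), 'z': np.eye(n, dtype=int)[0]}  # Z on data qubit 1
anc  = {'x': np.eye(n, dtype=int)[1], 'z': np.zeros(n, dtype=int)}  # X on ancilla qubit 2

# Phase-correction CNOT of Steane EC: the ancilla block is the control.
anc, data = propagate_cnot(anc, data)
print(data['x'])  # the ancilla X has propagated into the data (the a_x term)
print(anc['z'])   # the data Z has propagated into the ancilla (the e_z term)
```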
Based on this calculation, we see that the phase error syndrome we will see in this circuit is $a_z + e_z + c_z' + f_x$,
and the bit flip error syndrome we see is $b_x + e_x + a_x + c_x + d_x' + g_x$. However, as discussed before, we may
not be able to exactly deduce these errors. Instead, we'll decide the error is $(s_x|s_z)$, a Pauli with the same
error syndrome as the actual measured error.
Now, when we make the correction in the final step of the gadget, there may be an additional error $(h_x|h_z)$.
Therefore, at the end of the gadget, the overall error, taking into account the correction $(s_x|s_z)$ performed,
is
$$(e_x + a_x + c_x + d_x + s_x + h_x \mid e_z + c_z + b_z + d_z + s_z + h_z). \tag{12.39}$$
Now, multiplying by an error in the normalizer leaves us with a codeword, so we can conclude that the final
state is a quantum codeword with error
This seems more complicated. However, we have designed each part (state preparation, CNOTs, measurement)
of the Steane EC gadget to itself be fault-tolerant. Therefore, the number of new errors in a block
due to any part cannot be greater than the number of faults in that part. Furthermore, for the CNOT steps,
the errors in the data block and ancilla block must be on corresponding qubits; i.e., $\operatorname{wt}(0|c_z + c_z')$ must be
at most the number of faults in the phase CNOT step, and $\operatorname{wt}(d_x + d_x'|d_z)$ must be at most the number of
faults in the bit flip CNOT step. In all, we can conclude that if there are $s$ faults in the circuit, any sum of
errors excluding $e_x$ and $e_z$ must have weight at most $s$. In particular, $\operatorname{wt} E \le s \le t$. Therefore, the ECRP
is satisfied.
For the ECCP, we assume the input state passes an $r$-filter, so $\operatorname{wt}(e_x|e_z) \le r$. Then, using the same
logic as the previous paragraph, we conclude from equation (12.36) that $\operatorname{wt} P \le r + s \le t$. If the code is
non-degenerate, then there is just one error of weight at most $t$ with a given error syndrome, so $(s_x|s_z) = P$. If
the code is degenerate, there might be more than one error with the same syndrome as $P$, but regardless, it
will be the case that $M \in S$ (whereas before we could only say that $M$ is in the normalizer). Therefore, we
conclude that if the initial state is $(e_x|e_z)|\psi\rangle$, then the final state is $E|\psi\rangle$, with $E$ given by equation (12.40).
Since the error has weight at most $t$, the ideal decoder will correctly decode it to $|\psi\rangle$, which is the same as
what the ideal decoder gets from the initial state.
Notice that in the final error equation (12.40), the phase error has more contributions to it
than the bit flip error does. This is because we did phase error correction first, and as a result the phase
error rate leaving the gadget is higher than the bit flip error rate.
12.4.1 Knill EC
Knill error correction for an arbitrary stabilizer code is shown in figure 12.13. The ancilla used for Knill EC
is a logical Bell state, encoded in two blocks of the same code as the data.¹ Then we perform a transversal
Bell measurement between the data block and the first ancilla block.
Note, in particular, that the encoded Bell state $|\overline{00}\rangle + |\overline{11}\rangle$ is different from $n$ two-qubit Bell states
$(|00\rangle + |11\rangle)^{\otimes n}$. Thus, even though Knill EC involves a transversal Bell measurement between the data
block and the first ancilla block, it is not the same as teleporting each individual qubit. Instead, the Bell
measurements have the effect of gathering information which lets us deduce the error syndrome.
As with Steane EC, I won’t discuss yet how to create the needed ancilla state. That will be deferred
until chapter 13. For now, just assume that the ancilla preparation procedure somehow satisfies the PPP
and PCP.
What happens when we perform transversal Bell measurement using half of a logical Bell state? We can
best understand this by breaking up the transversal Bell measurement into two steps: a transversal CNOT
followed by transversal Pauli measurement (in the $X$ basis for the data block and the $Z$ basis for the ancilla
block). Assume for the moment the CNOT and measurement are done without any faults. We'll allow errors
in the ancilla state and data state before the CNOT, but only up to a total of $t$ errors between them.
Consider an arbitrary Pauli $P$ acting on the data block, and the same Pauli, but with possibly a different
sign, acting on the first block of the ancilla. We are mostly interested in cases where $P \in N(S)$ or $P \in S$.
We can write
$$(-1)^{b_i} P = i^{P_X \cdot P_Z} (-1)^{b_i} P_X P_Z, \tag{12.41}$$
where $P_X$ is a tensor product of $X$'s and $I$'s, $P_Z$ is a tensor product of $Z$'s and $I$'s, and $P_X \cdot P_Z$ is shorthand for
"the number of qubits where both $P_X$ and $P_Z$ are non-trivial" (the power of $i$ comes from the $Y$ operators
in $P$). The factor of $(-1)^{b_i}$ indicates the sign of the Pauli on the data block ($i = 0$) or ancilla block

¹Actually, the second block doesn't need to be in the same code. If it is encoded in a different code, then Knill EC combines error correction with a code conversion gadget changing the way the data is encoded.
Figure 12.13: Knill error correction for the 5-qubit code; it works the same way for any stabilizer code, just with a different number of qubits. The transversal Bell measurement between the data block and first ancilla block is not the same as a logical Bell measurement, although it can be used to deduce the outcome of the logical Bell measurement.
217
(i = 1). Unfortunately, in this argument, we care a lot about the phases, so working in the binary symplectic
formalism is out.
The effect of the CNOT in figure 12.13 is to map

Notice that
$$(-1)^{b_0+b_1}\, P \otimes P \to (-1)^{b_0+b_1+c(P_X,P_Z)}\, P_X \otimes P_Z \tag{12.44}$$
because $c(P_X, P_Z) = P_X \cdot P_Z \bmod 2$. In other words, the tensor product $P \otimes P$ becomes a tensor product of
$X$'s on the data block and $Z$'s on the ancilla block. Therefore, the subsequent measurement of $X$ on each
physical data qubit and $Z$ on each physical qubit in the first ancilla block lets us deduce the eigenvalue of
this operator. Since $c(P_X, P_Z)$ is known, we then learn $b_0 + b_1$.
This observation is very helpful, since we can apply it to any Pauli $P$. When $P$ is a generator of the
stabilizer, we can deduce the corresponding bit of the error syndrome, only we don't learn the syndrome of
just the data block, but rather the sum $s_0 + s_1$, where $s_0$ is the syndrome of the data block and $s_1$ is the error
syndrome of the first ancilla block. This is no different than what happened for Shor or Steane EC: In Shor
EC, errors in the cat state could cause errors in the syndrome we deduce, and in Steane EC, as in Knill EC,
the syndrome we measure is also the sum of the syndrome in the data block with a syndrome derived from
errors of the same type in the ancilla.
We can also apply the observation to measure logical Paulis. Indeed, from our measurement results we
can deduce the eigenvalues of both $\overline{X}_i \otimes \overline{X}_i$ and $\overline{Z}_i \otimes \overline{Z}_i$ for every logical qubit $i$. That is, in addition to
error correction, we can perform the logical Bell measurement. With that in hand, since the ancilla is part
of an encoded Bell state, we can perform teleportation of the logical state.
However, we need to be careful about interpreting the outcome of the logical Bell measurement. First
of all, I'm still just considering the case where there are no faults in the CNOT or measurement. Even so,
remember that pre-existing errors in the data block or ancilla block can change the eigenvalue of $Q \otimes Q$
without actually changing the codeword one would deduce using an ideal decoder. Suppose we have $E$ on
the data block and $F$ on the first ancilla block, both Paulis, satisfying $\operatorname{wt} E + \operatorname{wt} F \le t$. Then if $(-1)^b$ is
the correct eigenvalue of $Q \otimes Q$, meaning the value we'd get after doing ideal error correction, we instead
measure $(-1)^{b'}$, where $b' = b + c(E, Q) + c(F, Q)$.
Luckily, the same set of measurements that let us deduce $b'$ also told us the error syndrome. The only
problem is that they didn't tell us the error syndromes of $E$ and $F$ separately, but the sum of the error
syndromes $\sigma(E) + \sigma(F) = \sigma(EF)$ (by proposition 3.7). Since $\operatorname{wt}(EF) \le \operatorname{wt} E + \operatorname{wt} F \le t$, from $\sigma(EF)$ we
deduce an error $G = EFM$, with $M \in S$.
Now, again using proposition 3.7, we find $c(G, Q) = c(E, Q) + c(F, Q) + c(M, Q)$.
But $Q$ is a logical operator, in $N(S)$, so $c(M, Q) = 0$. Thus, once we deduce $G$, we can deduce the correct
eigenvalue $(-1)^b$ of $Q \otimes Q$, and therefore get the correct outcome for the logical Bell measurement.
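The classical fix-up at the end is a small symplectic calculation: the raw bit is corrected by $b = b' + c(G, Q)$. A toy sketch (two unencoded qubits standing in for blocks; all names mine):

```python
import numpy as np

def c(p, q):
    """Commutation function: 0 if the Paulis commute, 1 if they anticommute.
    Each Pauli is a binary symplectic pair (x-vector, z-vector)."""
    return int(p[0] @ q[1] + p[1] @ q[0]) % 2

Q = (np.array([1, 1]), np.array([0, 0]))   # Q = X(x)X, a stand-in logical operator
E = (np.array([0, 0]), np.array([1, 0]))   # pre-existing Z error on the data
F = (np.array([0, 0]), np.array([0, 0]))   # no error on the ancilla block

b = 0                                       # true Bell-measurement bit
b_raw = (b + c(E, Q) + c(F, Q)) % 2         # what the raw measurements report
G = (np.array([0, 0]), np.array([1, 0]))    # error deduced from syndrome: G = EF.M
print(b_raw, (b_raw + c(G, Q)) % 2)         # 1 0 -- the correction recovers b
```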
Now let’s consider what happens when there are faults in the CNOTs and measurements. As usual, we
can assume the faults cause Pauli errors. The basic model for fault tolerance says that a fault on a CNOT
location causes errors after the end of the CNOT, though of course we must allow the possibility that one
faulty CNOT location could produce a one-qubit error in both the data and ancilla blocks. We can assume
the errors in the measurement locations occur before the measurements, and therefore we can combine the
errors from the CNOTs and measurements into a Pauli error E 0 on the data block and an error F 0 on the
ancilla block. It may be that wt E 0 + wt F 0 is greater than the number of faults, since a fault in a single
CNOT location contributes to both, but it is true that wt(E 0 F 0 ) is at most equal to the number of faults in
CNOT and measurement locations.
Since the data qubits are about to be measured in the $X$ basis, a bit flip error has no effect. A phase or
$Y$ error on a data qubit will reverse the measurement outcome on that qubit. Therefore, we might as well
assume $E'$ is composed of only $Z$ errors. Similarly, we can assume $F'$ consists only of $X$ errors, since the
ancilla qubits are measured in the $Z$ basis.
Now let’s look at the e↵ect of the additional errors on the measurement outcomes. When we measure
the error syndrome of M ⌦ M 2 S ⌦ S, the outcome is only a↵ected if E 0 acts on qubits where M has an
X or Y , or if F 0 acts on qubits where M has a Y or Z. In fact, the overall e↵ect is to change the parity of
the outcome by the parity of the number of qubits that meet this condition. In other words, the outcome
of the measurement of M ⌦ M is changed by exactly c(E 0 F 0 , M ). Running this over all generators in the
stabilizer, we find that instead of the syndrome (EF ), we end up measuring the syndrome (EF E 0 F 0 ).
Now, if the input data state passes an $r$-filter, the ancilla preparation has $s_1$ faults, and the CNOT and
measurements combined have $s_2$ faults, then $\operatorname{wt}(EFE'F') \le r + s_1 + s_2$. When this is at most $t$, we can
deduce an error $G'$ with the syndrome $\sigma(EFE'F')$. $G'$ will only differ from $EFE'F'$ by an element of the
stabilizer. Since $E'$ and $F'$ have the same effect on the logical operators $P \otimes P$ that they did on the stabilizer
elements, that means we again end up deducing the correct value for the Bell measurement outcome.
Note that the physical qubits that exit Knill EC are brand new qubits, produced for the ancilla. There
may be errors in them, but they are fresh errors arising from the ancilla preparation, and the errors are
unrelated to whatever errors were in the data before. Furthermore, the only information that is directly
used to complete the teleportation is the result of the logical Bell measurement. The error syndrome of the
data is discarded. Still, we can’t simply ignore the error syndrome: As discussed above, it is instrumental
to determining correctly the outcome of the logical Bell measurement.
Proof. We already did most of the work proving that the Knill EC gadget is fault-tolerant in section 12.4.1.
As usual, we can assume faults cause Pauli errors, and we will assume the input state also has a Pauli error
on it.
As in section 12.4.1, we assume the ancilla preparation has $s_1$ faults, leading to error $F$ on the first
ancilla block and $H$ on the second ancilla block, with $\operatorname{wt} F, \operatorname{wt} H \le s_1$ by the PPP. We assume there are
a total of $s_2$ faults on the CNOT and measurement steps (split up into $s_2'$ on the CNOT, and $s_2''$ and
$s_2'''$ on the measurements), producing errors $E'$ on the data block and $F'$ on the first ancilla block, with
$\operatorname{wt}(E'F') \le s_2$. There are also $s_3$ faults on the logical Pauli correction step to end the teleportation and
Figure 12.14: Knill measurement for the 5-qubit code. It may look almost the same as Steane measurement (figure 12.10), but the classical decoding (not shown) is different.
on any wait steps used for the second ancilla block, producing error $H'$ on the second ancilla block, with
$\operatorname{wt} H' \le s_3$. Here $s_1 + s_2 + s_3 = s$, the total number of faults in the gadget.
For the ECCP, the input state Pauli error $E$ has $\operatorname{wt} E \le r$, so it passes the $r$-filter. We assume $r + s \le t$.
The argument in section 12.4.1 shows that, with this bound on the number of errors, the CNOT plus
measurement plus classical processing steps give the same outcome as would an ideal Bell measurement
applied to the state produced by the ideal decoder. In terms of pictures,

[Diagram (12.47): the $r$-filtered data block and the $s_1$-filtered first ancilla block enter the transversal CNOT ($s_2'$ faults) and transversal $X$ and $Z$ measurements ($s_2''$ and $s_2'''$ faults); this equals ideal decoders on the two blocks followed by an ideal logical Bell measurement.]
In particular, we can use the PPP to put an $s_1$-filter on each ancilla block after the ancilla preparation
step, then use the above equation to replace the transversal Bell measurement with an ideal decoder followed
by ideal Bell measurement. For the second ancilla block, the only operation after ancilla preparation is the
logical Pauli (plus waiting), which is done transversally. We can thus use the GCP to push an ideal decoder
on the output block back before the logical Pauli and remove the $s_1$-filters with the PPP. We find

[Diagram (12.48): the full Knill EC circuit, with the transversal Bell measurement replaced as in (12.47) by ideal decoders and an ideal logical Bell measurement, and with the ideal decoder on the output block pushed back before the logical Pauli correction ($s_3$ faults).]
Then using the PCP, we can replace the state preparation gadget and ideal decoders with an ideal Bell state
preparation. The RHS then consists of decoding the data block, followed by putting it through an ideal
teleportation procedure. In other words, we have proven the ECCP.
[Figure 12.15: a spectrum of error correction gadgets, running from non-FT EC and Shor EC (many gates on the data block, simple ancillas) through Steane EC to Knill EC (few gates on the data block, complicated ancillas).]
Figure 12.15: The spectrum of different error correction gadgets. Non-fault-tolerant EC has the same number of gates touching the data block as Shor EC but is more vulnerable to error propagation.
For the ECRP, we don’t put any constraint on the input error, but the argument is nonetheless much
simpler. The PPP tells us that we can put an s1 -filter after the ancilla preparation, and then the GCP
applied to transversal gates tells us we can put an (s1 + s3 )-filter after the logical Pauli step. Thus, the
output state passes an s-filter, as demanded by the ECRP.
For Knill measurement, the MCP uses the same argument as the ECCP. The ancilla preparation is now
of a $|0\rangle$ state instead of a logical Bell state, but if the preparation procedure has at most $s_1$ faults,
we can still use the PPP to put an $s_1$-filter after it. Then, using equation (12.47), we replace the transversal
Bell measurement by ideal decoders and a logical Bell measurement, and use the PCP to replace the faulty
ancilla preparation procedure with an ideal $|0\rangle$ state preparation. The RHS then consists of decoding the
data, then doing an ideal Bell measurement between the data qubit(s) and a $|0\rangle$ state, which lets us deduce
the outcome of a $Z$ measurement on the decoded qubit. This is precisely what we want for the MCP.
An m-qubit cat state whose preparation has error rate p per qubit has a probability (1 − p)^m of having no phase error on any qubit. When mp ≈ 1, the measurement outcome is nearly uniformly random, unrelated to the actual syndrome bit being measured. Of course, we repeat the measurement, which exponentially reduces the chance of reaching a wrong final conclusion about the syndrome, but if we are in the regime mp ≳ 1, many repetitions are needed to overcome the high noise rate. This makes Shor EC extremely unappealing for most large codes.
This issue is inherently less of a problem for Steane and Knill EC, where the ancillas are states encoded
in blocks of the same QECC used for the main protocol, which is presumably capable of correcting a certain
number of errors. A cat state is relatively robust against single bit flip errors, which may propagate to single-qubit errors in the data but otherwise are not too harmful; it can, however, be completely destroyed by a single phase error. In contrast, a real QECC can correct single bit flip or phase errors; only if the total number
of errors is large is there a serious problem. When the block size is big, creating the requisite encoded
ancilla states can be challenging nonetheless; see section 13.1 for details. To deal with faults in a large
quantum computation, we will need big codes to suppress the logical error rates sufficiently to complete the
computation reliably. Thus, even Steane and Knill EC can be an issue unless we have a family of codes that
allows us to make larger and larger ancilla blocks. Indeed, this is important in any case, since we need to be
able to make |0⟩ states for preparation gadgets.
One thing to notice is that the size of the cat states for Shor EC depends on the weight of the stabilizer generators rather than the total number of qubits in the block. Therefore, a code with many physical qubits, but for which all stabilizer generators have low weight (an "LDPC", or low-density parity check, code), still has very simple cat states. For most large codes, the problems of large cat states make Steane and Knill EC definitely better than Shor EC, but for LDPC codes, Shor EC can be better. For instance, in chapter 19, we'll see a family of LDPC codes known as surface codes. Frequently, people using surface codes don't even bother with Shor EC, using the non-FT measurement technique of section 12.1.1 instead. They're willing to put up with the fact that one fault can cause multiple errors in order to have the even simpler one-qubit ancilla used in non-FT measurement instead of the four-qubit cat states they would need for Shor EC. Other LDPC codes can be appealing candidates for real systems as well, so small-ancilla techniques like Shor EC remain useful.
For non-LDPC code families, though, Steane and Knill EC seem better than Shor EC. But which of
Steane and Knill EC should we choose? Their threshold performance is probably pretty similar, though as
of this writing that hasn’t been studied very systematically. The advantage of Steane EC is that the ancillas
needed for it are exactly the same as those needed for a preparation gadget, whereas making the two-block
entangled states for Knill EC can sometimes create an extra challenge. On the other hand, we’ll see in
chapter 13 that many gates can be performed via teleportation through an appropriate ancilla. Knill EC
can be easily combined with this construction, producing a gadget that does error correction and a gate all
at once. This is a useful way to save resources.
Another nice advantage of Knill EC is that the data ends up on brand new qubits. Many systems suffer erasure errors that take the physical qubits out of the standard two-dimensional Hilbert space, for instance through loss of photons or transitions to additional energy levels in the system. Knill EC automatically returns the qubits to the computational Hilbert space every time it is performed. Knill EC can also be useful in shunting the data away from physical qubits which have an unusually high error rate, perhaps due to a manufacturing defect or some nearby noise source.
If we skip the final Pauli correction of an EC gadget while working with a stabilizer code S, the subspace E T(S) is also a QECC. It is even a stabilizer code, and its stabilizer is ESE†, which is just S again, but with different signs for the generators. It's an error-correcting code that has essentially all the same properties as S did. We can think of the EC gadget without correction as a combined gadget for error correction and code conversion. The final code, however, is chosen randomly. By making note of the error E, we can keep track of what the code is. The side classical information about E is known as the Pauli frame. Note that each block of the code has its own Pauli frame.
One difference between S and T = ESE† is that they may have slightly different fault-tolerant gates. However, since the two codes differ only by a Pauli, the gates will be quite similar. We could simply modify any gates we want to perform to take account of the new code. As long as we are doing Clifford group gates, we don't even need to do that: A Clifford group gate will change the Pauli error E into a different Pauli error F in a known way. All we have to do is update the Pauli frame, replacing E with F.
We can keep doing Clifford group gates like this indefinitely. Of course, new errors are occurring, but so are new error correction steps. The EC gadgets contain further syndrome measurements, which let us update the Pauli frame to take the new errors into account. At each EC step, we choose a low-weight error relative to the Pauli frame entering the gadget rather than the Pauli frame at the beginning of the computation: When errors are rare, it is unlikely the Pauli frame changes too quickly, but given enough time, the accumulated error can be arbitrarily large. This means that after a while, the Pauli frame may be an element of N(S), a logical operator. That's OK — what it means is that we've gotten back to the original code, but in a twisted way. The basis states have changed, but since we know how they've changed, we can adapt without much difficulty.
Non-Clifford group gates present more of a quandary. A transversal implementation of a non-Clifford group gate could modify the pre-existing error E to be some non-Pauli error (and always will for at least some Paulis E). The only way to deal with that is by using a different FT implementation of the gate gadget that's modified for the current Pauli frame. A lot of the time, though, we are implementing non-Clifford group gates via some circuit construction involving a logical measurement, as discussed in chapter 13. In that case, it's important to implement the logical measurement step relative to the current Pauli frame. For instance, if we are using the five-qubit code and the Pauli frame is I ⊗ Y ⊗ Y ⊗ I ⊗ X, the current code has a logical bit flip relative to the original code, so we should interpret a measurement outcome of 0 as 1 and 1 as 0. After the non-Clifford group gate gadget is complete, we resume tracking the Pauli frame as usual.
The benefit of keeping track of the Pauli frame classically is small. It saves us one single-block transversal
Pauli gate per block per FTEC gadget. Two-qubit transversal gates, measurements, state preparation, and
particularly repetitions of the syndrome measurement consume many more resources, so the fraction of
locations saved by calculating the Pauli frame classically is small. Nevertheless, every little bit helps, so we
might as well do it.
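To make the bookkeeping concrete, here is a minimal sketch of classical Pauli frame tracking through Clifford gates, using the standard symplectic representation of a Pauli as a pair of bit vectors. The class and method names are mine, and sign tracking is omitted for brevity.

    import numpy as np

    class PauliFrame:
        """Track a Pauli error E = X^x Z^z on n qubits (signs omitted for brevity).

        A Clifford gate maps E to another Pauli F in a known way, so the frame
        can be updated classically instead of applying corrections on hardware.
        """

        def __init__(self, n):
            self.x = np.zeros(n, dtype=bool)  # X part of the frame
            self.z = np.zeros(n, dtype=bool)  # Z part of the frame

        def apply_h(self, q):
            # H swaps X and Z on qubit q.
            self.x[q], self.z[q] = self.z[q], self.x[q]

        def apply_s(self, q):
            # S maps X -> Y = XZ and leaves Z alone.
            self.z[q] ^= self.x[q]

        def apply_cnot(self, c, t):
            # CNOT copies X from control to target and Z from target to control.
            self.x[t] ^= self.x[c]
            self.z[c] ^= self.z[t]

        def absorb_correction(self, x_corr, z_corr):
            # Fold a newly inferred low-weight correction into the frame
            # (chosen relative to the current frame, as described above).
            self.x ^= x_corr
            self.z ^= z_corr

    # Example: the frame I x Y x Y x I x X on 5 qubits, pushed through a CNOT.
    frame = PauliFrame(5)
    frame.x[[1, 2, 4]] = True   # X parts of Y, Y, X
    frame.z[[1, 2]] = True      # Z parts of Y, Y
    frame.apply_cnot(0, 1)
    print(frame.x.astype(int), frame.z.astype(int))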
Chapter 13
There are some big loose ends left over from previous chapters. For one thing, I still haven’t told you how to
make the state preparation gadgets at the beginning of a fault-tolerant computation. Chapter 12 introduced
the Steane and Knill EC methods but only if you already know how to do state preparation. Clearly it’s
time to discuss that.
At some point, we also need to discuss how to create a universal set of gates, since we saw in chapter 11
that transversal gates are insufficient. A chapter on state preparation might seem a strange place to do
that, but it turns out that, for certain gates, the difficult part of the implementation can be encapsulated
in particular quantum states. It’s a bit like magic: Drink the potion and poof! You’re a frog. (Of course,
we’re considering unitary transformations, so there must also be an inverse potion that turns a frog into a
human.) Like a magic potion, a magic state is consumed when used. Unlike a magic potion, there’s actually a
scientific explanation for how the protocol works, which of course I’ll discuss. That makes it technology, and
an advanced technology at that, because it’s the last component needed to give us a complete fault-tolerant
protocol.
it is helpful to lump the stabilizers together.) If there are no faults in the encoding circuit, we get the state
we’d like to have. However, because the encoding circuit inevitably contains many non-transversal gates,
a single fault during the encoding circuit can cause multiple errors in the state produced. Therefore, we’ll
need to figure out a way to verify and/or correct the states created this way.
Theorem 13.1. State preparation via Shor EC verification satisfies the PPP and the PCP.
Proof. Both the PPP and the PCP follow from the ECRP for Shor EC. Measuring the generators of the
QECC ensures that s faults in the EC circuit lead to an output state which passes an s-filter for the code, no
matter how many faults there were in the non-fault-tolerant encoding circuit at the beginning of the gadget.
The one thing to check here is that performing a few additional cat state measurements for the stabilizer of
the logical state does not mess up the ECRP for the QECC. It doesn’t.
That takes care of the PPP. Furthermore, the ECRP for the combined stabilizer says that when there are s faults in the EC step, the output state passes an s-filter for the combined stabilizer, meaning the output state is in the space spanned by the ideal output state with s-qubit errors on it. An ideal decoder applied to such a state will give the ideal output state, since s ≤ t. That gives us the PCP.
A variety of simplifications are possible, some of them mutually incompatible. One possibility is to skip the original
non-fault-tolerant encoding circuit. We still need to perform the Shor EC step (after all, we need to do
something), but we can do it on an arbitrary input state, perhaps the |00 . . . 0⟩ state or a randomly chosen
state. Remember, measuring the error syndrome collapses the state to a subspace compatible with that
syndrome, and since we’re measuring a stabilizer with n generators, there is only a 1-dimensional subspace
compatible with a given syndrome — a single state. Once we know the error syndrome (determined via
the usual Shor method of repeating the measurement), we know the state, and can correct it to the right
encoded state by a Pauli operation. Yes, fault-tolerant error correction is so powerful that it can literally
make an encoded state out of random junk.
A di↵erent simplification is to simply verify that the original preparation circuit is correct rather than
trying to correct the state produced. If the preparation fails, we simply discard the whole state and try again.
We are thus using the Shor EC method for error detection rather than error correction. One advantage of
doing so is that we need not repeat the syndrome measurement as often: If we simply repeat the syndrome
measurement t times, discarding the state if any measured syndrome is non-trivial, that is sufficient. This
will often lead to discarding perfectly good states when a small number of syndrome measurements fail, but
that’s an acceptable price, provided we do not falsely accept an incorrect state. There is only a single state
that has trivial syndrome, so it is sufficient to verify that the syndrome truly is 0.
In order for an erroneous state to slip past our procedure, there needs to be a fault in the original
encoding circuit, plus a fault in each of the t syndrome extractions, for a total of t + 1 faults, which is more
than we allow. Any fewer faults means that either the original encoding circuit is correct or at least one of the
syndrome measurements is correct. When the true syndrome is nonzero, a correct syndrome measurement
will always detect that and cause us to discard the state. If the original circuit is correct, the state going into
the error detection step is perfect, but of course faults during the error detection step can introduce errors.
However, by equation (12.7), we know that each fault during a cat state measurement can introduce at most
one additional single-qubit error into the system being measured. Therefore, s faults during the verification
procedure result in a state that passes an s-filter relative to the true state, as desired.
Figure 13.1: Using Steane error correction and measurement to verify an encoded |0⟩ state.

Besides needing fewer repetitions of the syndrome measurement, this simplification is more sensitive to errors and is thus likely to produce a state that has fewer errors in it than one produced via a full error correction step. This is because error detection can detect up to 2t errors when the code has distance
d = 2t + 1 whereas error correction can only reliably handle t. Of course, there are still combinations of
t + 1 faults that can fool us, as discussed above, but they are rare since they require the false syndromes to
exactly cancel the true errors present.
The disadvantage of error detection is straightforward: Whenever the procedure detects an error, we have
to start over. This may require many repetitions before we succeed in preparing a state we trust enough to
use. For maximum efficiency, we probably want to try many preparations in parallel, throwing out the ones
that fail and switching the ones that succeed to where they are needed. That way, we don’t lose any time
waiting for the preparation, but we do need extra qubits. If there are at most t faults in the gadget as a
whole, we may need to prepare as many as t + 1 copies of the state in order to get one that passes the check.
Note also that these two simplifications are not completely compatible, but with some adaptation of the
error detection, they can be made to work together. If we skip the original encoding circuit and only do error
detection, then the chance of getting a zero syndrome (no error) is exponentially small. Instead, we should
loosen the error detection to allow a non-zero syndrome, provided the syndrome is repeated consistently.
However, if we only repeat the syndrome measurement t times, it is certainly possible that t faults could
make us think we got the same syndrome each time even though all of them are wrong. If we want to
combine the two simplifications, we should repeat the syndrome measurement t + 1 times and keep the state
only if all t + 1 syndromes are in agreement. That guarantees at least one of them is correct and ensures
that we are no more than t errors away from the right state. Another way of thinking about this particular
combination is that the first error syndrome extraction is a non-fault-tolerant encoding protocol, and then
the t subsequent repetitions are error detection steps.
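The accept/reject rule for the combined simplification is simple enough to state as code. This is a sketch of just the classical decision logic, under the assumption that each list entry is one full syndrome extraction.

    def accept_state(syndromes, t):
        """Combined-simplification rule: keep the state only if all t+1
        repeated syndrome measurements agree.

        With at most t faults total, at least one of the t+1 extractions is
        correct, so agreement pins down the true syndrome (not necessarily 0:
        we started from junk, so a consistent nonzero syndrome is fine).
        """
        assert len(syndromes) == t + 1
        return all(s == syndromes[0] for s in syndromes)

    print(accept_state([(0, 1, 1), (0, 1, 1), (0, 1, 1)], t=2))  # True
    print(accept_state([(0, 1, 1), (0, 0, 1), (0, 1, 1)], t=2))  # False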
Figure 13.2: Hierarchical verification of CSS codewords when t = 2 and the circuit has 1 fault.
However, a straightforward verification procedure like this may not be sufficient. If we produce the three |0⟩ states in figure 13.1 using a non-fault-tolerant encoding circuit, then a single fault in one of the encoding circuits can cause multiple errors in the state entering the verification. Depending on what kind of errors they are and which encoding circuit has the fault, that may be OK — the comparison might catch the errors.
However, multiple phase errors in the last ancilla state can propagate into the state being prepared without
being checked. Furthermore, if two of the encoding circuits each have a single fault, multiple errors of either
type could slip through the verification unnoticed.
In order to solve this problem, we will need to perform more extensive verification. We can arrange a
two-level verification of the states. First, prepare many copies of the desired ancilla state with non-fault-
tolerant circuits and divide them up into groups of t + 1 states. Compare the states in each group for
phase errors. If there are at most t faults in the encoding circuits for a group, then any phase errors will
be detected; in this case, discard the states in that group. Keep one ancilla state from each group which
does not detect phase errors. Next, compare t + 1 of the surviving states for bit flip errors, as depicted
in figure 13.2. An error during encoding of a single state in the first phase could cause multiple bit flip
errors, which would then propagate into the state being kept. However, in order for the multiple bit flip
errors to survive the second round of checking, all t + 1 states which survive the first round would need
errors, which would require t + 1 faults in the overall circuit. Therefore, when there are at most t faults in
the entire preparation and verification circuit, the only cause of errors in the output state is faults during the verification procedure, which only affect single qubits as per theorem 12.5. This hierarchical procedure can be viewed as a distillation process through which a number of low-quality (i.e., noisy) ancilla states are distilled down into one or a small number of high-quality, purer ancillas.
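As a crude sanity check on the counting argument, here is a toy Monte Carlo sketch. It models each non-FT preparation only by boolean "multi-qubit X error" and "multi-qubit Z error" flags and assumes the comparisons catch any flagged phase error in a group, a deliberately simplified accounting of the real circuit; all names and probabilities are illustrative.

    import random

    P_FAULT = 0.05   # chance a single non-FT encoding circuit is faulty
    T = 1            # code corrects T errors; groups have T+1 states

    def prepare():
        # A faulty circuit may cause multi-qubit X errors, Z errors, or both.
        if random.random() >= P_FAULT:
            return {"x": False, "z": False}
        return {"x": random.random() < 0.75, "z": random.random() < 0.75}

    def phase_check(group):
        # The group comparison catches any phase (Z) error in the group
        # (valid when the group contains at most T faults, as in the text);
        # if none is flagged, one representative survives.
        if any(s["z"] for s in group):
            return None
        return group[0]

    def trial():
        # First level: keep one Z-clean state from each of T+1 groups.
        survivors = []
        while len(survivors) < T + 1:
            s = phase_check([prepare() for _ in range(T + 1)])
            if s is not None:
                survivors.append(s)
        # Second level: multi-qubit X errors reach the output only if every
        # survivor is X-faulty, which needs >= T+1 faults in the whole circuit.
        return all(s["x"] for s in survivors)

    trials = 200_000
    bad = sum(trial() for _ in range(trials))
    print("output error rate ~", bad / trials)  # scales like P_FAULT**(T+1)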
By taking advantage of the structure of the code and perhaps using other tricks, the complexity of this procedure may in some cases be reduced considerably. Without any optimization, though, this two-level Steane EC verification procedure, which uses at least (t + 1)² ancillas, is worse than the general Shor EC verification procedure, which needs only t syndrome repetitions.
An example of an optimized |0⟩ state preparation gadget for the 7-qubit code appears in figure 13.3. Notice that in this optimized gadget, we don't bother to check the phase. We can get away with this because
Figure 13.3: An optimized verification procedure for an encoded |0⟩ state of the 7-qubit code.
the 7-qubit code is built from the classical 7-bit Hamming code, which is perfect. Therefore, every phase error syndrome corresponds to a single-qubit phase error, and because we are preparing the encoded |0⟩ state, there is no logical phase error. No matter what happens in the circuit, the output state will pass a 1-filter for phase errors, since every state does. If there is a fault causing bit flip errors in one of the two non-FT preparation circuits in figure 13.3, the check will catch it. Therefore, the only way to get bit flip errors in the output state when there is one fault in the circuit is to have a fault during the verification procedure. This part is transversal, so any fault will cause the final state to pass a 1-filter relative to both the seven-qubit code and the |0⟩ state we are trying to create. Thus, figure 13.3 gives an FT preparation gadget.
As with the Shor EC verification procedure, the Steane EC method can be adapted to correct errors on
the state being prepared rather than simply post-selecting for the case where no errors are detected. This
requires additional ancillas in order to repeat the syndrome measurement enough to have confidence in the
answer. The considerations are similar to those for Shor EC, discussed in section 12.2.2.
Figure 13.4: Gate teleportation: teleporting |ψ⟩ and then applying U (left) is equivalent to applying U to half of the entangled pair |00⟩ + |11⟩ in advance and replacing the Pauli correction P by U P U† (right).
U . You might think this is perverse, and of course you’d be right: We’ve reduced the task of performing U
to teleporting and then performing U . That is not a simplification. Just bear with me for a moment.
Now suppose Bob is an impatient sort, and he wants to perform U right away, before he’s heard from
Alice. He can do that, of course, but now when he does get the classical bits from Alice, he should undo U ,
do the appropriate Pauli, and then re-do U . In other words, if Alice’s message indicates that he is supposed
to perform P , he should instead do U P U † , as shown on the right side of figure 13.4.
Believe it or not, this is actually progress: Bob's impatience means that he performs the gate U on a fixed state |00⟩ + |11⟩, regardless of Alice's state |ψ⟩. That means we've converted the task of performing U into three steps:
Protocol 13.1 (Gate Teleportation).
1. Create (I ⊗ U)(|00⟩ + |11⟩) shared between Alice and Bob.
2. Alice performs a Bell measurement and sends the results to Bob.
3. Bob performs U P U† when Alice's classical bits correspond to Pauli P.
Step 2 is no problem — it only involves CNOT, H, and Z measurements. Step 1 could be very difficult,
but it is clearly a state preparation task. Let’s assume for the moment that this state is given to us. We’ll
discuss how to create such states in sections 13.4 and 13.5.
Then there's step 3. For general U, the gate U P U† could be a pretty complicated thing. There's no reason to assume it will be any easier to do than U. But for certain special U, the conjugated Pauli always is simpler than U.
For instance, if U ∈ C1, then U P U† ∈ P1, and Bob only needs to perform a Pauli at step 3. It turns out that there are a number of gates U with the property that whenever P ∈ Pn, then U P U† ∈ Cn: These gates conjugate Pauli operations into Clifford group gates.
This gate teleportation construction therefore is a way to bootstrap from simpler operations to more complicated ones. If Alice and Bob can requisition whatever ancilla states they need, then they can use the gate teleportation construction to build up more and more complicated gates. Given Pauli gates, they can perform Clifford gates, and given Clifford group gates, they can perform any gate with the property U P U† ∈ Cn.
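Protocol 13.1 is easy to verify numerically. The numpy sketch below (variable names are mine; the convention R_θ = diag(e^{−iθ/2}, e^{iθ/2}) is assumed from equation (13.12) later in this chapter) teleports a random |ψ⟩ through (I ⊗ U)(|00⟩ + |11⟩) with U = R_{π/8} and checks that the correction U P U† leaves Bob holding U|ψ⟩ for every Bell outcome.

    import numpy as np

    I2 = np.eye(2)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1.0 + 0j, -1.0])
    U = np.diag([np.exp(-1j*np.pi/8), np.exp(1j*np.pi/8)])  # R_{pi/8}

    phi_plus = np.array([1, 0, 0, 1], dtype=complex) / np.sqrt(2)

    psi = np.random.randn(2) + 1j*np.random.randn(2)
    psi /= np.linalg.norm(psi)

    # Qubits: 0 = data, 1 = Alice's half, 2 = Bob's half (with U applied).
    resource = np.kron(I2, U) @ phi_plus
    state = np.kron(psi, resource).reshape(2, 2, 2)

    for P in (I2.astype(complex), X, Y, Z):
        # Bell outcome corresponding to the state (P (x) I)(|00> + |11>).
        bell = (np.kron(P, I2) @ phi_plus).reshape(2, 2)
        # Project qubits 0,1 onto this Bell state; qubit 2 keeps U P |psi>.
        bob = np.tensordot(bell.conj(), state, axes=([0, 1], [0, 1]))
        bob = (U @ P @ U.conj().T) @ bob          # Bob's correction U P U†
        bob /= np.linalg.norm(bob)
        overlap = abs(np.vdot(U @ psi, bob))
        assert np.isclose(overlap, 1.0), overlap  # equal up to global phase

    print("all four Bell outcomes give U|psi> after correction")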
Of course, the use of Alice and Bob is only for illustrative purposes. Teleportation is generally thought of
as a communication protocol, so normally Alice and Bob are far apart. In this case, we’re using teleportation
to perform gates, which is an interesting thing to do even if Alice and Bob are right next to each other, for
instance adjacent qubits in a quantum computer. (Or, given the main point of this book, adjacent logical
qubits in a quantum computer.)
Also worth pointing out is that, while protocol 13.1 is described explicitly for a single-qubit gate U, that is not at all necessary. An n-qubit gate can be done in the same way just using the state (I ⊗ U)(|00⟩ + |11⟩)^⊗n instead of the 1-qubit version. In step 2 Alice teleports all her qubits, and in step 3, the Paulis run over n-qubit Paulis. It all works just the same as the single-qubit case.
The state (I ⊗ U)(|00⟩ + |11⟩) is known as a magic state. Protocol 13.1 is an example of a class of protocols called magic state injection, whereby a magic state is used to perform a non-Clifford group unitary on some data qubits using only Clifford group gates and Pauli measurements. The term "magic state" is used somewhat loosely to mean one of three things: It could be a state like this that can be used directly for magic state injection. It could be a state (pure or mixed) that can be further processed by Clifford group operations, at which point it could then be used to perform a non-Clifford group gate. The term is also sometimes used to refer to any non-stabilizer pure state.
One somewhat peculiar feature of this construction is that the qubit that comes out of it is in a different location from the qubit that goes in. There's nothing really wrong with that, but you might find it a bit surprising that to perform a unitary gate we end up transferring the data to a different qubit.
13.2.2 Ck Gates
So, what gates have the property we need to make protocol 13.1 work?

Ck = { U ∈ U(2^n) | U P U† ∈ Ck−1 ∀ P ∈ Pn }. (13.1)

The number of qubits n is implicit in this notation, but Ck can also refer to the union of Ck over all values of n. The sets Ck are defined recursively. C1 is the Pauli group, C2 is the Clifford group Cn, and C3 is the set of gates we're now interested in. The higher values of Ck are potentially interesting too: If we can perform arbitrary C3 gates, then we can use gate teleportation to directly perform C4 gates, which then lets us perform C5 gates, and so on.
Let's look at some examples. The first example is R_{π/8}:

R_{π/8} X R_{π/8}^† = X R_{π/4}^† (13.2)
R_{π/8} Z R_{π/8}^† = Z. (13.3)

R_{π/4} is a Clifford group gate, so equations (13.2) and (13.3) imply that R_{π/8} ∈ C3. Note that for C3, we don't need to check Y, since the image of Y under conjugation is the product of the images of X and Z, and C2 is a group. For higher Ck, you may also need to check the image of Y, since Ck is not closed under multiplication for k ≥ 3.
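A quick numerical way to verify C3 membership (a sketch; the convention R_θ = diag(e^{−iθ/2}, e^{iθ/2}) is again assumed): conjugate each Pauli by R_{π/8} and check that the result is a Clifford gate, i.e., that it conjugates Paulis back into Paulis up to phase.

    import numpy as np

    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1.0 + 0j, -1.0])
    PAULIS = [I2, X, Y, Z]

    def R(theta):
        return np.diag([np.exp(-1j*theta/2), np.exp(1j*theta/2)])

    def is_pauli(M):
        # True if M is a Pauli up to a phase (|tr(P† M)| = 2 for exactly one P).
        return any(abs(abs(np.trace(P.conj().T @ M)) - 2) < 1e-9 for P in PAULIS)

    def is_clifford(M):
        # M is Clifford iff M P M† is a (phased) Pauli for P = X, Z.
        return all(is_pauli(M @ P @ M.conj().T) for P in (X, Z))

    T_gate = R(np.pi/8)
    for P in (X, Z):
        V = T_gate @ P @ T_gate.conj().T
        print(is_pauli(V), is_clifford(V))
    # X conjugates to X R†_{pi/4}: not a Pauli, but a Clifford gate.
    # Z is unchanged. So R_{pi/8} is in C3 but not in C2.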
Another gate in C3 is the Toffoli gate Tof: conjugation by Tof maps X on a control qubit to X together with a CNOT from the other control to the target, maps Z on the target to Z together with a controlled-Z on the two controls, and leaves Z on the controls and X on the target unchanged — all Clifford group gates.
This is why the universal gate sets ⟨Cn, R_{π/8}⟩ and ⟨Cn, Tof⟩ are particularly interesting for fault tolerance: They each consist of the Clifford group plus a single C3 gate, which can be implemented via gate teleportation.
What about phase rotation by other angles?

R_θ X R_θ^† = X R_{2θ}^† (13.10)
R_θ Z R_θ^† = Z. (13.11)

Now, R_{2θ} ∈ C2 iff 2θ = mπ/4 for some integer m, so R_θ ∈ C3 iff θ = mπ/8 for integer m. By induction, we immediately find that R_{mπ/2^k} ∈ Ck \ Ck−1 when m is an odd integer, and that R_θ ∉ Ck for any k if θ is not of the form mπ/2^k for integers k and m. Thus, we see that every Ck contains some elements that are not in
Figure 13.5: Gadget for implementing a two-block logical gate through gate teleportation.
Ck−1, and that there are some unitaries that are not in any Ck. This means that there are some gates we won't be able to produce exactly through a chained teleportation circuit.
But maybe Ck is dense in U(2^n) for large enough k? No. In fact, for fixed k and n, Ck/{e^{iφ}I} (which is Ck with global phases removed) must be finite. This can be easily seen by induction: Ck−1/{e^{iφ}I} is finite, and a unitary can be uniquely determined by the image of the Paulis under conjugation. If U P U† = e^{iφ}V for Pauli P, there are only two possible values for φ because (e^{iφ}V)² = I. (P² = I, and group identities are preserved under conjugation.) Therefore, there are only a finite number of possible images of a Pauli P in Ck−1, and thus only a finite number of maps from Pn to Ck−1.
This means that direct constructions via teleportation are not enough. In order to get a universal set of gates, we can construct a C3 gate such as R_{π/8} via teleportation, but then we will need to multiply gates together to get good approximations of arbitrary unitary transformations.
Figure 13.6: One-bit teleportation circuits: in (a), the ancilla is |+⟩ and the correction is X; in (b), the data qubit gets a Hadamard, the ancilla is |0⟩, and the correction is Z.
Each of the three steps is performed via fault-tolerant gadgets, so it is straightforward to check that the overall procedure satisfies the GPP and GCP.
But the Clifford group is not universal, so we'll need more gates. By theorem 6.9, any gate added to the Clifford group will give us a dense subgroup of SU(2^n). In particular, we can pick a C3 gate such as R_{π/8} or Tof. Then we can use the teleportation construction to perform the C3 gate fault-tolerantly.
Theorem 13.3. There exists a fault-tolerant protocol for any stabilizer code.
Proof. By theorem 13.1, we have a FT |0⟩ preparation gadget. By theorem 12.1 or theorem 12.6, we have FT measurement gadgets. By theorem 12.3 or theorem 12.6, we have FT error correction gadgets. The FT storage gadget consists of transversal wait locations. By theorem 13.2, we have FT gate gadgets for all Clifford group gates. Therefore, all we need to construct a full fault-tolerant protocol is a FT gate gadget for a non-Clifford group gate.
We will use gate teleportation, protocol 13.1, for some gate U in C3 \ C2. For step 1, we invoke lemma 13.5, which will be discussed in section 13.4. The upshot is that we have a construction that creates the desired state (I ⊗ U)(|00⟩ + |11⟩) and satisfies the PCP and the PPP. For step 2, we perform the logical Bell measurement via cat state measurement or Knill measurement. The procedure satisfies the MCP. For step 3, we must perform U P U† for some Pauli P. Since U ∈ C3, U P U† ∈ Cn, which means we can implement it fault-tolerantly as per theorem 13.2. Putting together three fault-tolerant gadgets once again implies that the full gadget satisfies the GPP and the GCP.
Figure 13.7: A smaller circuit for gate teleportation with the π/8 phase rotation. Here, V = R_{π/8} X R_{−π/8}.
measurement, since the output qubit is unaffected by the Bell measurement. That is no longer true. Instead, we should restrict our attention to performing gates that commute with the CNOT (or at any rate, have some simple commutation relationship with it). For instance, if we use the circuit of figure 13.6a, where the control qubit of the CNOT is the ancilla qubit, then we can commute past it any diagonal gate, such as the R_{π/8}. The resulting circuit is pictured in figure 13.7. The magic state in this case is

R_{π/8}|+⟩ = e^{−iπ/8}|0⟩ + e^{iπ/8}|1⟩. (13.12)

Since R_{π/8} X R_{π/8}^† = X R_{π/4}^†, whenever Alice gets 1 as her measurement outcome, to complete the construction, Bob must perform R_{−π/4} followed by X on his qubit.
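Here is a numpy sketch of the circuit of figure 13.7 (the qubit ordering and the convention R_θ = diag(e^{−iθ/2}, e^{iθ/2}) are my assumptions): the ancilla R_{π/8}|+⟩ is the control of the CNOT, the data qubit is measured in the computational basis, and on outcome 1 the ancilla gets R_{−π/4} followed by X.

    import numpy as np

    def R(theta):
        return np.diag([np.exp(-1j*theta/2), np.exp(1j*theta/2)])

    X = np.array([[0, 1], [1, 0]], dtype=complex)
    T = R(np.pi/8)

    psi = np.random.randn(2) + 1j*np.random.randn(2)
    psi /= np.linalg.norm(psi)
    magic = T @ np.array([1, 1], dtype=complex) / np.sqrt(2)  # R_{pi/8}|+>

    # Qubit 0 = data |psi>, qubit 1 = ancilla (magic state).
    state = np.kron(psi, magic).reshape(2, 2)

    # CNOT with the ancilla (qubit 1) as control and the data (qubit 0)
    # as target: |x>|c> -> |x xor c>|c>.
    cnot = state.copy()
    cnot[:, 1] = state[::-1, 1]

    for outcome in (0, 1):
        anc = cnot[outcome, :].copy()          # measure the data qubit
        if outcome == 1:
            anc = X @ (R(-np.pi/4) @ anc)      # correction: R_{-pi/4}, then X
        anc /= np.linalg.norm(anc)
        assert np.isclose(abs(np.vdot(T @ psi, anc)), 1.0)

    print("both outcomes yield R_{pi/8}|psi> on the ancilla")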
Proposition 13.4. Let |ψ⟩ = U|φ⟩, where U ∈ Ck and |φ⟩ is a stabilizer state with stabilizer S. Then |ψ⟩ is the +1 eigenstate of V = U M U† when M ∈ S. V ∈ Ck−1. Furthermore, |ψ⟩ is the only state (up to global phase) that is a +1 eigenstate of all operators of this form.
Proof. Since M|φ⟩ = |φ⟩,

V|ψ⟩ = V U|φ⟩ = U M|φ⟩ = U|φ⟩ = |ψ⟩. (13.13)

Consider the projector Π built as a product of factors (I + V)/2 over the operators V = U M U†. If V and V′ both appear, the factor for V V′ is redundant:

(1/8)(I + V)(I + V′)(I + V V′) = (1/4)(I + V)(I + V′). (13.15)

In other words, we can eliminate the redundant V's from the expression for Π. Furthermore, (U M U†)(U N U†) = U M N U†, so the independent V's can be considered to come from a generating set {M1, . . . , Mn} for S. Thus,

Π = (1/2^n) ∏_{i=1}^{n} (I + U Mi U†) (13.16)
  = (1/2^n) U [∏_{i=1}^{n} (I + Mi)] U† (13.17)
  = U ΠS U†, (13.18)

where ΠS is the projector on the code subspace of S, i.e., |φ⟩. The unitary conjugation of a rank-one projector is again a rank-one projector, so |ψ⟩ is unique.
As a result of proposition 13.4, the magic states we need to create C3 gates can be uniquely defined as +1 eigenstates of a list of Clifford group operators. For instance, the magic state R_{π/8}|+⟩ is the unique +1 eigenstate of X R_{π/4}^†. To verify that we have the magic state, it therefore suffices to measure these Clifford group operators. This can be done fault-tolerantly using techniques we already know. Actually, we can go further and deterministically prepare the states. By "deterministically," I mean that we will always get a state output; no post-selection is needed. Naturally, if there are too many faults in the circuit, we might prepare the wrong state.
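A short numpy check of this characterization (same R_θ convention as above, assumed): the operator X R_{π/4}^† has eigenvalues ±1, and its +1 eigenvector matches R_{π/8}|+⟩ up to phase.

    import numpy as np

    def R(theta):
        return np.diag([np.exp(-1j*theta/2), np.exp(1j*theta/2)])

    X = np.array([[0, 1], [1, 0]], dtype=complex)
    V = X @ R(np.pi/4).conj().T                 # X R†_{pi/4}
    vals, vecs = np.linalg.eig(V)
    print(np.round(vals, 6))                    # [ 1, -1 ]: V has order 2

    magic = R(np.pi/8) @ np.array([1, 1]) / np.sqrt(2)
    plus_one = vecs[:, np.argmax(vals.real)]    # the +1 eigenvector
    print(np.isclose(abs(np.vdot(plus_one, magic)), 1.0))  # unique up to phase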
Lemma 13.5. Let |ψ⟩ be any state which is uniquely defined by being an eigenstate of Clifford group operators U1, . . . , Un. Let Q be a code for which there exists a fault-tolerant implementation of the full Clifford group. Then there exists a fault-tolerant state preparation gadget for |ψ⟩ for the code Q. If Ui = V Pi V†, where the Pi's generate a stabilizer and V ∈ C3, then the gadget can be done without post-selection.
Proof. Let us for the moment assume that the fault-tolerant implementation of Cliffords is transversal. The lemma still holds for more general FT gate gadgets, but the procedure is more complicated.
In section 12.1, I presented the cat state procedure for measuring Paulis on arbitrary stabilizer codes. There is little about this procedure that is specific to Pauli operators. It can also be used to measure Clifford group operators, or really any other operation for which we have a transversal implementation on the code. If the Clifford group operation U has eigenvalues ±1, then no modification of the procedure is needed — simply substitute the transversal implementation of U for the transversal implementation of the Pauli. If U has other eigenvalues, a little more work is necessary.
First note that any U ∈ Cn has finite order r, since Cn is a finite group. (This also means that the eigenvalues of U are of the form e^{2πic/r} for integer c.) Next, observe that if the transversal implementation of U is ⊗_a Wa, then ⊗_a Wa^b is a transversal implementation of U^b. We wish to perform gates interacting an r-dimensional unencoded qudit with a single physical qubit. The action of the gate is

|b⟩ ⊗ |ψ⟩ ↦ |b⟩ ⊗ Wa^b |ψ⟩. (13.19)

In other words, it is a controlled-Wa gate. In a basic model for fault tolerance, we are assuming a universal set of physical gates is available, so this gate is allowed (or at any rate, can be approximated with arbitrarily high precision). Therefore, we can modify the cat state measurement procedure in three ways:
1. Replace the qubit cat state with a qudit cat state Σ_{b=1}^{r} |bb . . . b⟩.
2. Instead of the controlled-Pauli gates, do the gate C-Wa between the ath qudit of the qudit cat state and the ath qubit of the code.
3. Instead of measuring in the Hadamard basis, perform the transversal qudit inverse Fourier transform and measure.
The inverse Fourier transform gives us instead an ω^c eigenstate of Z ⊗ Z ⊗ · · · ⊗ Z, where ω = e^{2πi/r}, so if we measure the individual qudits and take the sum modulo r, we get out c. This is in the absence of faults, of course, but just as with cat state measurement, if we repeat the procedure enough, do all the necessary checking when building the cat state, and intersperse the measurements with error correction steps, we can indeed deduce c. This procedure constitutes a fault-tolerant measurement gadget for U, with the same proof as the one showing cat state measurement of Paulis is fault tolerant.
Therefore, given a code with transversal implementation of all Clifford group elements, such as the 7-qubit code, we can fault-tolerantly measure the eigenvalue of any Clifford group operator. To create an encoded Clifford eigenstate, we can thus create it with a non-fault-tolerant circuit, then sequentially perform fault-tolerant cat state measurements of the Ui. If any of the eigenvalues is wrong, discard the state. Otherwise keep it. Since the |ψ⟩ state is the only codeword with all the correct eigenvalues, a state that is kept must be correct, up to a possible t-qubit error, unless there are more than t faults during the measurement.
When |ψ⟩ = V|ξ⟩ for some stabilizer state |ξ⟩, V ∈ C3 (an equivalent property to the condition in the last sentence of the lemma's statement), we can improve this procedure by fixing up the state even if the eigenvalues are wrong. Think about the original stabilizer state |ξ⟩, which has stabilizer S generated by {Pi}. Suppose we were to take some other state and measure the eigenvalues of all Pi. Depending on the state's overlap with |ξ⟩, there is a chance we would get all the right eigenvalues, in which case the state would actually be projected on |ξ⟩. If we don't get the right eigenvalues, we could form an error syndrome s by noting which generators have the wrong eigenvalue, and we would instead have the state |ξ(s)⟩. Choose some Pauli Q(s) which has error syndrome s for S. Then Q(s)|ξ(s)⟩ = |ξ⟩, since Q(s) changes all the eigenvalues to the correct ones, and |ξ⟩ is the only state with those eigenvalues.
Let |ψ(s)⟩ = V|ξ(s)⟩. Then |ψ(s)⟩ and |ψ⟩ have the same eigenvalue for Ui = V Pi V† iff the ith bit of s is 0. (Note that when Ui = V Pi V†, Ui has the same eigenvalues as Pi.) Therefore, if we measure the eigenvalues of all the Ui's, we can again form an error syndrome s, which implies that the post-measurement state is projected onto |ψ(s)⟩. Furthermore, U(s) = V Q(s)V† has the effect of changing |ψ(s)⟩ to |ψ⟩. Since V ∈ C3, U(s) ∈ Cn. The following procedure will thus fault-tolerantly create the desired state |ψ⟩:
1. Encode |ψ⟩ non-fault-tolerantly.
2. Fault-tolerantly measure the eigenvalues of the operators Ui, forming the syndrome s.
3. Fault-tolerantly apply the Clifford group gate U(s) = V Q(s)V†, mapping |ψ(s)⟩ to |ψ⟩.
Finally, let me say a few words about the case when we have a fault-tolerant but not transversal procedure to implement arbitrary Clifford group gates. The difficulty in the above procedure comes in how to do the Clifford measurement. If the gadget involves only unitary gates and perhaps some scratch ancilla qubits, it is not hard to modify the procedure, at least for Clifford gates with eigenvalues ±1 (which is the most interesting case). We can assume without loss of generality that all ancilla qubits are reset to their initial values at the end of the gadget. To do the measurement, simply use a cat state with one qubit per gate in the implementation of the Clifford U being measured, and perform the fault-tolerant implementation of U with each gate being controlled by a different cat state qubit. Verifying the cat state first ensures that bit flip errors in it are effectively uncorrelated, so a single fault in the cat state preparation can only propagate errors into the data block via a single gate. This is the same effect as having a fault in that gate. Therefore the procedure is fault tolerant.
When we wish to measure more general Clifford gates with other eigenvalues, a similar procedure will work, but we need to perform up to U^{r−1}. This means a circuit potentially r − 1 times larger, so we should pick the cat state to be correspondingly large. Once more, let us make it a qudit cat state using dimension-r qudits. If the circuit for U involves m gates, we use a cat state with rm qudits. The first m qudits in the cat state will perform the first power of U, the second m perform the second power, and so on. In particular, for a qudit in the jth block of m qudits in the cat state, perform a controlled gate operation (for the appropriate gate in the FT gadget for U) which activates the gate only if the cat state qudit is j. This ensures that if the cat state qudits are all a, the circuit for U is performed a times, leading to a conditional logical U^a gate, just as for transversal gates.
If the procedure for U involves measurements and classical processing, we first modify the circuit to be purely unitary. This can be done by replacing each measurement with a CNOT to an ancilla qubit. We would also like to replace the classical circuitry with corresponding quantum gates, but there is a complication. In a basic model for fault tolerance, classical gates are assumed to be perfect but quantum gates are not, so blindly replacing classical gates with quantum ones leads to new opportunities for errors. The solution is to turn the classical circuit into a classical fault-tolerant circuit, and then replace those gates with quantum ones. The details are discussed in section 15.3.
The procedure described above for magic state preparation starts with a non-fault-tolerant encoding circuit for the magic state, but that is not really necessary. Any encoded state could be the starting point. The fault-tolerant Clifford measurements will project onto the state |ψ(s)⟩, which can then be corrected to |ψ⟩ as discussed above. The advantage of starting with a different encoded state, such as a stabilizer state, is that it can often be prepared fault-tolerantly via a simpler procedure. This segments the whole gadget into smaller gadgets, each of which is fault tolerant, and can thus potentially reduce the vulnerability to errors.
If you look closely at the proof of lemma 13.5, you'll see that it doesn't depend very much on having Clifford group operators. We can fault-tolerantly measure the eigenvalue of any unitary operator U with finite order provided we have a fault-tolerant implementation of U, and can thus create any state which can be uniquely characterized by its eigenvalues for such a set of unitaries.
For the π/8 gate, the relevant magic state is R_{π/8}|+⟩. One magic state of this form lets us perform one R_{π/8} gate using the circuit in figure 13.7, which involves only Clifford group gates and Pauli measurements.
Now suppose we wanted to create a magic state encoded in the 15-qubit code. We can create the |+⟩ state for the 15-qubit code using only Clifford group gates. Then we would have to perform a logical R_{π/8} gate. A transversal π/8 applied to the code does the logical −π/8 rotation, which is close. If we then follow it by a π/4 rotation R_{π/4}, we will get the right result. The logical π/4 gate can be performed on the 15-qubit code using transversal −π/4 gates. (You can check this.) Therefore, the only non-Clifford component we need is the transversal π/8 gate. By the rules of magic state distillation, we don't have the ability to do such a thing directly.
That is where the noisy magic states come in. Suppose we take 15 low-quality R_{π/8}|+⟩ states, and use the compressed gate teleportation technique to perform 15 single-qubit R_{π/8} gates, comprising the transversal π/8 rotation we need. If all the magic states we used were perfect, the result would be the perfect logical R_{π/8}|+⟩ state. If r of the magic states are wrong, however, we instead get a state with errors on it. An error in a magic state causes the gate performed to be something other than the correct π/8 rotation. However, since the teleportation circuit for a single magic state only touches a single qubit of the 15-qubit code, r incorrect magic states can only produce errors on r qubits of the 15-qubit code. The 15-qubit code has distance 3, so if r = 1, we can perform error correction — which only uses Clifford group gates and Pauli measurements — and end up with a perfect encoded magic state.
Actually, since this is a state preparation procedure, we're happy to use post-selection if it lets us tolerate more error. Instead of error correction, we could use error detection, since the 15-qubit code can detect up to 2 errors. If we detect an error in the code, throw away the whole thing and start again using new magic states. If r ≤ 2, we end up with a perfect encoded magic state. Admittedly, if r ≥ 3, it's possible an error will slip by and give us a bad magic state, but if the probability of a single magic state being bad is p and errors on different magic states are independent, the probability of having r ≥ 3 is O(p³), which for small p is much better.
Of course, we don't really want a magic state encoded in the 15-qubit code; we want an unencoded magic state (or rather, one encoded in the QECC we're using for fault tolerance, but leave that be for now). The 15-qubit code is a stabilizer code, though, so we can decode it using a Clifford group circuit, leaving an unencoded magic state with the same error rate as the encoded one produced after error detection. In summary, we have the following procedure:
Protocol 13.2.
1. Begin with 15 R_{π/8}|+⟩ magic states, possibly with some errors.
2. Create |+⟩ for the 15-qubit code using a Clifford group encoding circuit.
3. Use compressed gate teleportation with the magic states to perform a transversal R_{π/8} gate on the 15-qubit code block.
4. Perform a transversal R_{−π/4} gate on the code.
5. Perform error detection for the 15-qubit code. If any errors are detected, throw it out and restart the protocol.
6. Decode the 15-qubit code.
When the input of the protocol is 15 magic states which have independent errors with probability p per state, the output of the protocol is an improved magic state with error probability O(p³).
In the literature, you will often see magic state distillation protocols with a more compact form:
Protocol 13.3.
1. Begin with 15 R_{π/8}|+⟩ magic states, possibly with some errors.
2. On the 15 magic states, measure the error syndrome of the 15-qubit code. If the syndrome is non-zero, throw out all the qubits and restart the protocol.
3. Decode the 15-qubit code, keeping only the decoded qubit.
4. Perform R_{π/4} on the remaining qubit.
This is a closely related procedure, and works for much the same reason. Since the transversal R_{π/8} is a valid gadget for the 15-qubit code, it commutes with the projector Π15 on the code:

R_{π/8}^{⊗15} Π15 = Π15 R_{π/8}^{⊗15}. (13.20)

The effect of step 2, measuring the syndrome and then discarding if it is non-trivial, is to perform Π15. Therefore, after step 2, in the absence of errors, we have the state

Π15 R_{π/8}^{⊗15} |+⟩^{⊗15} = R_{π/8}^{⊗15} Π15 |+⟩^{⊗15}. (13.21)

But Π15 |+⟩^{⊗15} = |+̄⟩, the encoded |+⟩ state (up to normalization). This need not be true for a general stabilizer code, but it is true for a CSS code using the standard basis codewords. Therefore, after step 2 of the protocol, we have the state

R_{π/8}^{⊗15} |+̄⟩ = R̄_{−π/8} |+̄⟩, (13.22)

the encoded version of R_{−π/8}|+⟩, and the remaining two steps leave us with the desired magic state.
When some of the initial magic states have errors in them, we rely on the QECC projection to alert us to their presence. The magic state lives in a two-dimensional Hilbert space, which can be spanned by R_{π/8}|+⟩ and R_{π/8}|−⟩ = R_{π/8} Z|+⟩, so let's assume that the error Ze on the initial set of 15 magic states is a tensor product of Z's and I's. Then

Π15 R_{π/8}^{⊗15} Ze |+⟩^{⊗15} = R_{π/8}^{⊗15} Π15 Ze |+⟩^{⊗15}. (13.23)

We can break up Π15 into a product Π15 = Π15,X Π15,Z, with Π15,X consisting of the sum over X generators and Π15,Z the sum over Z generators. Now, Π15,Z commutes with Ze, so we have

Π15 R_{π/8}^{⊗15} Ze |+⟩^{⊗15} = R_{π/8}^{⊗15} Π15,X Ze Π15,Z |+⟩^{⊗15}. (13.24)

What is Π15,Z |+⟩^{⊗15}? Well, |+⟩ is a +1 eigenstate of X, so the tensor product of |+⟩ states will be an eigenstate of any X generator of the code as well. Thus, Π15,X |+⟩^{⊗15} = |+⟩^{⊗15}, so

Π15,Z |+⟩^{⊗15} = Π15,Z Π15,X |+⟩^{⊗15} = Π15 |+⟩^{⊗15} = |+̄⟩ (13.25)

(up to normalization), and

Π15 R_{π/8}^{⊗15} Ze |+⟩^{⊗15} = R_{π/8}^{⊗15} Π15,X Ze |+̄⟩. (13.26)

On the RHS, we have a situation where the phase errors Ze afflict a codeword of the 15-qubit code, and then we detect phase errors. Since the code has distance 3, it can detect two errors, so if there are phase errors on one or two qubits, the projection Π15,X annihilates the state. Once again, the probability of magic state errors surviving the distillation procedure is reduced from p to O(p³).
One notable difference between the protocols as written is that in the second version, there is a good chance that the state will be rejected for failing the Π15,Z projection even if all the magic states are perfect. This drawback can be easily eliminated: Since the ability of the code to correct bit flip errors is never actually used to eliminate errors — it was sufficient to detect phase errors — we can keep the state regardless of the result of measuring the Z generators. However, we will in general need to do some additional Clifford group gates in order to end up with the right state.
You remember, I hope, that the purpose of all this was to provide a fault-tolerant magic state preparation gadget. So, how do these protocols look when we make them fault tolerant?
Protocol 13.4.
1. Create 15 R_{π/8}|+⟩ encoded magic states using non-fault-tolerant circuits. The states are encoded in the QECC Q used for fault tolerance.
2. Perform fault-tolerant error correction on the magic states.
3. Create the state |+⟩ encoded in a concatenated code with the 15-qubit code as the inner code, and Q as the outer code. (I.e., each qubit of the 15-qubit code is encoded in Q.) Since the |+⟩ state encoded in the 15-qubit code is a stabilizer state, the desired state is a stabilizer state encoded in Q, and should thus be created using one of the methods for fault-tolerantly preparing encoded stabilizer states discussed in section 13.1.
4. Use compressed gate teleportation with the encoded magic states to perform an R_{π/8} gate on each of the blocks of Q comprising the 15-qubit code. This step involves Clifford group gates and Pauli measurements, which should be implemented using the appropriate fault-tolerant gadgets for Q.
5. Perform an R_{−π/4} gate on each of the blocks of Q comprising the 15-qubit code. This is a Clifford group gate, and should be implemented using the relevant fault-tolerant gadget for Q.
6. Measure the generators of the 15-qubit code on the concatenated code block. This should be done using procedures that are fault-tolerant for the code Q, but need not be fault-tolerant for the 15-qubit code. If any errors are detected, throw the whole block out and restart the protocol.
7. Decode the 15-qubit code.
8. If more error tolerance is needed, take the surviving magic states (still encoded in Q) and repeat the protocol starting from step 3, as many times as is needed.
All the steps except for step 1 are built out of fault-tolerant gadgets for Q. Therefore, one fault (or two, if the code can correct 2 errors) anywhere in step 2 or later still lets the protocol satisfy the PPP and PCP. If there are one or two faults in step 1, only one or two magic states are bad, and the distillation procedure will detect the error. With one fault in the magic state preparation and one fault later in the circuit, the PPP and PCP are also satisfied (see exercise ??).
If the code Q has larger distance, we may want to repeat the magic state distillation procedure more than once to tolerate more errors in the initial magic states. Magic state distillation can be used as an iterative procedure: Use the outputs of one round of magic state distillation as inputs to a second round. If the initial states have error rate p, the first round gives us states with error rate O(p³), and two rounds give us states with error rate O(p⁹), detecting up to eight erroneous magic states in the initial pool of 15² magic states. (The two-round procedure only fails if there are at least three first-round states that are wrong, and each of those can only happen if there are three faults in the corresponding part of the circuit.) If that's not good enough, you can feed the states surviving the second round into a third round, then a fourth round, for however many rounds you need. Assuming the Clifford group operations really are perfect, we will rapidly converge to pure 200-proof magic states.
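The iteration is easy to budget. Here is a sketch of the arithmetic; the text only gives the scaling O(p³) per round, so the leading coefficient 35 used below (the value commonly quoted in the distillation literature for this 15-to-1 protocol) should be treated as an assumption.

    def distillation_budget(p_in, p_target, n_in=15, c=35, k=3):
        """Rounds of 15-to-1 distillation (output error ~ c * p**k per round)
        and the raw magic states consumed per final state, ignoring rejections."""
        p, rounds = p_in, 0
        while p > p_target:
            p = c * p**k
            rounds += 1
        return rounds, n_in**rounds, p

    rounds, raw_states, p_out = distillation_budget(1e-2, 1e-15)
    print(rounds, raw_states, p_out)
    # 3 rounds: 1e-2 -> 3.5e-5 -> 1.5e-12 -> ~1.2e-34, costing 15**3 = 3375 raw states.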
stochastic error model, since some of the “errors” could be represented by Kraus operators which are close
to the identity but not equal to it. (This could happen before the distillation, but is much more likely to
happen afterwards.) Treating such a Kraus operator as a big error on a par with X or Z won’t really do the
protocol justice.
One solution is to simply use fidelity to the correct state as the measure of success. This is well-defined
always and correctly quantifies the power of the protocol. However, within the magic state model of faulty
magic states plus perfect Cli↵ord operations, there is another way to ensure that all errors are handled
clearly, and it also simplifies the analysis of the protocol.
Theorem 13.6. Let {|ψi⟩} be a basis for a Hilbert space H, and let {Ua} be a set of Clifford group operations on H. Suppose Ua|ψi⟩ = λai|ψi⟩, and let vi = (λai)a be the vector of eigenvalues for |ψi⟩. Assume vi ≠ vj for i ≠ j, so each |ψi⟩ is uniquely determined by its eigenvalues for {Ua}. Let the order of Ua be ma (i.e., Ua^{ma} = I).
Suppose we take some (possibly mixed) state ρ in H, and for each a, perform Ua^{ra}, where ra is a random integer from [0, ma − 1]. This procedure converts ρ into a mixture ρ′ = Σi pi |ψi⟩⟨ψi|, with pi = ⟨ψi|ρ|ψi⟩.
This procedure is an example of twirling. In general, twirling involves averaging over some group of operations in order to simplify the form of the noise. This particular example says that we can twirl noisy Clifford eigenstates to produce a stochastic noise model where the true state is a mixture of Clifford eigenstates with different eigenvalues. For instance, the magic state R_{π/8}|+⟩ is a +1 eigenstate of U = X R_{π/4}^†, so by performing U with probability 1/2, we produce a mixture of the ±1 eigenstates of U, namely R_{π/8}|±⟩.
It's worth noting that implicit in theorem 13.6 is that the operations Ua all commute with each other, since they share an eigenbasis. Therefore, the order that we perform them in doesn't matter, and the procedure described is equivalent to choosing a random element of the group generated by the {Ua}.
Now, Ua has order ma, so λai and λaj are ma-th roots of unity. Thus, if λai ≠ λaj, then the sum

(1/ma) Σ_{ra=0}^{ma−1} (λai* λaj)^{ra} = 0. (13.29)

When λai = λaj,

(1/ma) Σ_{ra=0}^{ma−1} (λai* λaj)^{ra} = 1, (13.30)

since |λai|² = 1. The effect of averaging over powers of one Ua is thus to make a block-diagonal matrix with blocks corresponding to the eigenspaces of Ua. Within an eigenspace, the entries of the matrix do not change.
Since |ψi⟩ and |ψj⟩ have different eigenvalues for at least one Clifford Ua, averaging over powers of all of the Ua eliminates the off-diagonal elements, leaving ρ′ of the form claimed. The remaining diagonal elements are ρ′ii = ρii = ⟨ψi|ρ|ψi⟩.
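The statement is easy to check numerically. A sketch (numpy; the same R_θ convention as earlier is assumed): twirl a noisy approximation to R_{π/8}|+⟩ by the order-2 Clifford U = X R_{π/4}^† and confirm the result is diagonal in the {R_{π/8}|±⟩} basis.

    import numpy as np

    def R(theta):
        return np.diag([np.exp(-1j*theta/2), np.exp(1j*theta/2)])

    X = np.array([[0, 1], [1, 0]], dtype=complex)
    U = X @ R(np.pi/4).conj().T     # +1 eigenstate is R_{pi/8}|+>; U^2 = I

    plus = np.array([1, 1], dtype=complex) / np.sqrt(2)
    minus = np.array([1, -1], dtype=complex) / np.sqrt(2)
    r_plus, r_minus = R(np.pi/8) @ plus, R(np.pi/8) @ minus

    # A noisy magic state with a coherent (non-stochastic) error.
    err = R(0.1) @ r_plus
    rho = np.outer(err, err.conj())

    # Twirl: apply U with probability 1/2 (average over {I, U}).
    rho_t = (rho + U @ rho @ U.conj().T) / 2

    off_diag = r_plus.conj() @ rho_t @ r_minus
    print(abs(off_diag) < 1e-12)                 # True: mixture of R|+>, R|->
    print((r_plus.conj() @ rho_t @ r_plus).real) # p_+1 = <psi_+|rho|psi_+>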
Figure 13.8: The Bloch sphere, with the eigenstates |R_{+1}⟩ and |R_{−1}⟩ of R at opposite poles.
Protocol 13.5.
1. Begin with 5 |R_{+1}⟩ magic states, possibly with some errors.
2. Twirl each magic state as per theorem 13.6.
3. On the 5 magic states, measure the error syndrome of the 5-qubit code. If the syndrome is non-zero, throw out all the qubits and restart the protocol.
4. Decode the 5-qubit code, keeping only the decoded qubit.
5. Apply the correction Y H to the remaining qubit.
In order to understand protocol 13.5, we should first consider the eigenstates and eigenvalues of R. |R_{+1}⟩ ≡ |R⟩ is one eigenstate. The other eigenstate is the state |R_{−1}⟩ on the opposite pole of the Bloch sphere (see figure 13.8). The eigenvalues of |R_{+1}⟩ and |R_{−1}⟩ for R are . . . well, actually there is an ambiguity in the eigenvalues. In section 6.1.4, we saw how to find the matrix representation of a Clifford group operation given its conjugation action on Paulis. However, the matrix is only determined up to a global phase. Normally, that's not a problem, but of course the global phase affects the actual eigenvalues.
For the purposes of the analysis of protocol 13.5, we will choose a global phase for R such that the eigenvalues are

R|R_a⟩ = e^{iπa/3} |R_a⟩. (13.31)

The matrix corresponding to this choice of global phase is

(e^{iπ/4}/√2) [[1, 1], [i, −i]]. (13.32)

The transversal R applied to the 5-qubit code performs a logical R, but it might be a realization of R with a different global phase. Normally, again, this doesn't matter, but we need to be careful since we want to reason about eigenvalues.
Let's put that off for a moment, and plow ahead to figure out the effect of the projection on the code space on some possible input states. The twirling step and theorem 13.6 ensure that we can consider the noisy magic state to be a mixture of |R_{+1}⟩ and |R_{−1}⟩. Via a simple calculation or simply looking at figure 13.8, you can see that

|R_{±1}⟩⟨R_{±1}| = (1/2)[I ± (1/√3)(X + Y + Z)], (13.33)

so the tensor product of five copies is

(|R_{±1}⟩⟨R_{±1}|)^{⊗5} = (1/2⁵) Σ_{P ∈ P̂5} (±1/√3)^{wt P} P. (13.34)
Taking the trace against the code projector Π5 = (1/16) Σ_{M∈S} M then picks out the elements of the stabilizer S:

tr[(|R_{±1}⟩⟨R_{±1}|)^{⊗5} Π5] = (1/16) Σ_{M∈S} phase(M) (±1/√3)^{wt M}.

Here, phase(M) means the phase which appears for M in S (i.e., if M = −X ⊗ X ⊗ X ⊗ X ∈ S, then phase(M) = −1). However, for the 5-qubit code, all elements of the stabilizer appear with overall phase +1. We can therefore identify the projection of magic states on the code in terms of the weight enumerator, introduced in section 7.4.1:

tr[(|R_{±1}⟩⟨R_{±1}|)^{⊗5} Π5] = (1/16) A(±1/√3). (13.37)

For the five-qubit code, A(x) = 1 + 15x⁴, giving us 1/6 for both tensor products.
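Both the weight enumerator value and the claim about single errors can be checked directly. This sketch builds Π5 from the standard cyclic generators XZZXI of the 5-qubit code (assumed here; any presentation of the code works) and verifies tr[(|R_{±1}⟩⟨R_{±1}|)^{⊗5} Π5] = (1/16)A(±1/√3) = 1/6.

    import numpy as np
    from functools import reduce

    I2 = np.eye(2, dtype=complex)
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1.0 + 0j, -1.0])
    P = {"I": I2, "X": X, "Y": Y, "Z": Z}

    def op(s):
        return reduce(np.kron, [P[c] for c in s])

    # Cyclic generators of the 5-qubit code.
    gens = ["XZZXI", "IXZZX", "XIXZZ", "ZXIXZ"]
    proj = reduce(lambda A, g: A @ (np.eye(32) + op(g)) / 2, gens, np.eye(32))

    rho = {s: (I2 + s * (X + Y + Z) / np.sqrt(3)) / 2 for s in (+1, -1)}
    for s in (+1, -1):
        rho5 = reduce(np.kron, [rho[s]] * 5)
        print(np.trace(rho5 @ proj).real)   # 1/6 = A(1/sqrt(3))/16, A(x)=1+15x^4

    # One wrong magic state: the projection annihilates the state.
    rho_bad = reduce(np.kron, [rho[+1]] * 4 + [rho[-1]])
    print(np.trace(rho_bad @ proj).real)    # ~0, as claimed for a single error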
In particular, both Π5|R_{+1}⟩^{⊗5} and Π5|R_{−1}⟩^{⊗5} are non-zero, and that will let us identify the global phase for the logical R gate. I claim Π5|R_{±1}⟩^{⊗5} are the eigenstates of the logical gate R̄. Let R̄ = e^{iφ} R^{⊗5}. Since R^{⊗5} is a valid gadget for the 5-qubit code, R^{⊗5} maps the code subspace into itself, meaning Π5 and R^{⊗5} commute. Thus,

R̄ Π5 |R_{±1}⟩^{⊗5} = e^{iφ} Π5 R^{⊗5} |R_{±1}⟩^{⊗5} = e^{iφ} e^{±5iπ/3} Π5 |R_{±1}⟩^{⊗5} = e^{i(φ∓π/3)} Π5 |R_{±1}⟩^{⊗5}.

Therefore, Π5|R_{±1}⟩^{⊗5} is an eigenvector of R̄ with eigenvalue e^{i(φ∓π/3)}. The only way for this to match the correct eigenvalues for R̄ is to let φ = 0. We thus find that

|R̄_{∓1}⟩ = √6 Π5 |R_{±1}⟩^{⊗5}, (13.41)

where |R̄_{∓1}⟩ denotes the encoded |R_{∓1}⟩ state.
This tells us that if we put in 5 perfect magic states |R_{+1}⟩, then with probability 1/6 the state is accepted, in which case after decoding the 5-qubit code we have the state |R_{−1}⟩. Y H will then map this back to the desired state |R_{+1}⟩. On the other hand, suppose one of the five magic states is wrong, so we have |R′⟩, a tensor product of 4 |R_{+1}⟩ states and one |R_{−1}⟩ state (in some order). We can repeat the above calculation, and find

R̄ Π5 |R′⟩ = e^{iπ} Π5 |R′⟩. (13.42)

But −1 is not an eigenvalue of R̄, so the only way this can be true is if Π5|R′⟩ = 0. If we put in 3 |R_{+1}⟩ states and 2 |R_{−1}⟩ states, the projection is non-zero again.
This means that in order for the decoded magic state to be wrong, there must have been at least 2 errors on the magic states fed into the protocol. When there is only one error, the state is always rejected by the projection onto the code space. In particular, when the error rate per magic state is p, the error rate after the protocol will be O(p²). If this is not good enough, the procedure can be applied iteratively, as with the 15-qubit code distillation protocol, to decrease the error rate as much as desired.
Chapter 14
It’s taken a while, but finally we have all the pieces for a complete fault-tolerant protocol. But there’s
something important missing — we haven’t shown that a fault-tolerant protocol gives us a more reliable
answer than a non-fault-tolerant protocol. That’s what I’ll cover in this chapter: The proof that the logical
error rate for a fault-tolerant simulation is lower than the error rate for an unencoded circuit. The proof
requires some more formal stuff to make precise what that statement means, but it uses the same basic tools
of gadgets and ideal decoders that you should hopefully be familiar with by now.
And since doing a fault-tolerant simulation of a noisy circuit gives an improvement, doing an FT simulation of a simulation is even better. If two levels of simulation are good, three levels of simulation must be even better, and four levels better yet. Binging on fault tolerance¹ leads us to the idea of concatenated
fault-tolerant protocols and the threshold theorem: If the physical error rate per location is low enough,
arbitrarily long quantum computation is possible.
Definition 14.1. Given a circuit $C$, an adversarial local stochastic error model on $C$ is a circuit $\tilde{C}$ which is a mixture over realizations of $C$ with different fault paths. In a given realization of $C$ with fault path $\Gamma$, locations outside $\Gamma$ perform the correct action for the given location, and locations in $\Gamma$ are replaced by arbitrary interactions (not necessarily a tensor product between faulty locations) with a persistent environment which begins the computation in a standard state such as $|0\ldots0\rangle$. Fault path $\Gamma$ occurs with probability $p_\Gamma$, such that the following property holds: For each location $L$, there exists an error rate (or error probability) $p_L$ such that, if $S$ is any set of locations, then
$$\sum_{\Gamma \,:\, S \subseteq \Gamma} p_\Gamma \le \prod_{L \in S} p_L.$$
In other words, there is some error probability associated with each location, and the total probability of error on some set of locations is at most the product over the error rates of those locations. In the case where all $p_L$ are equal to $p$, the probability that the fault path includes the set $S$ is at most $p^{|S|}$, with $|S|$ the number of locations in $S$. The noise model doesn't make any restriction on whether there are other faulty locations outside $S$ or not. There could be all sorts of horrible correlations between errors at different locations, provided the probability of having many errors decays in the right way as a function of the number of errors. The “local” in the name doesn't refer to any geometric property of where qubits or errors are located; it refers to errors acting on only a few qubits at a time.
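As a quick illustration (my own, not from the book), independent faults at rate $p$ per location saturate this bound: the probability that a fixed set $S$ lies entirely inside the fault path is exactly $p^{|S|}$:

```python
# Monte Carlo check: with independent faults at rate p, the probability that
# a fixed set S of locations is entirely faulty is p^{|S|}, the extreme case
# permitted by the adversarial local stochastic bound.
import random

def pr_set_faulty(p, set_size, n_locations=20, trials=200_000):
    hits = 0
    for _ in range(trials):
        faulty = [random.random() < p for _ in range(n_locations)]
        if all(faulty[:set_size]):   # take S = the first set_size locations
            hits += 1
    return hits / trials

p = 0.1
print(pr_set_faulty(p, 3), p ** 3)   # both ~1e-3
```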
One way to think about an adversarial local stochastic error model is to imagine laying out the ideal circuit
C, and then for each location L, choose it to be possibly faulty with probability pL , choosing independently
for each location. Then an enemy (the eponymous adversary of the error model), determined to foil your
computation whenever possible, gets to control the possibly faulty locations. The adversary can choose to
do whatever she likes on those faulty locations, interacting them with each other or an environment as she
wishes. She can even let a faulty location act correctly if that is in her best interest — and sometimes it
is! The only thing she cannot do is change the locations that were not selected as potentially faulty. Those
locations do the right thing, as specified by C.
In practice, we use the adversarial local stochastic error model to represent situations where there is no
actual adversary, but where we simply do not know — or don’t want to keep track of — precisely what
happens when there is an error. You could think of this as an error model being run by an adversary who
has some additional limitations (besides the restriction on which locations she can attack), or one that is
simply not intelligent enough to pick the worst possible choice for us.
Frequently, either the word local or the word adversarial is omitted from the name of the error model: adversarial stochastic errors or local stochastic errors. The latter choice is more common in the literature, but is a little confusing since one might incorrectly think “local” means the errors are physically close to each other. I will use “adversarial stochastic errors” instead. I have no intention of using the full name of the error model: It is long enough even shortened.
failing is $O(p^r)$, which is the desired exponential decay. Therefore, if we pick $p_L$ to be about $\sqrt{2p}$, provided that is still less than 1, we see that the error model qualifies: The probability of a single location failing is $Dp \le \sqrt{2p}$ (for small enough $p$), the probability of two locations failing is about $p + (Dp)^2 \le (\sqrt{2p})^2 = 2p$, and the probability of $s$ locations failing is about $p^{s/2} \le (\sqrt{2p})^s$. Similarly, an error model which has correlated errors affecting sets of $t$ errors at a time with probability $p$ is an adversarial stochastic error model with $p_L = O(p^{1/t})$. Naturally, this is a bit inefficient as a way of bounding the errors, since frequently the probability of a specific set of locations having errors is much less than the bound, but the adversarial error model has the advantage of including a wide variety of different types of correlated models using a single argument.
However, don’t get the idea that the adversarial error model applies to everything. For instance, an error
model that has some very small (but constant) probability that all the qubits in the computer experience
an X error is not an adversarial stochastic error model. We shouldn’t expect to be able to correct an error
a↵ecting all the qubits in the computer and indeed we can’t. Another more significant defect of the error
model is that it only deals with stochastic noise: The non-faulty locations are perfect even though the faulty
locations can be arbitrary. For instance, this doesn’t include a systematic error in which every unitary gate
in C is over-rotated by some small angle ✓. It also doesn’t include even perfectly independent noise where
each gate location is replaced by a quantum channel that is very close to the correct action U of the location,
unless every channel is of the form ⇢ 7! (1 p)U ⇢U † + pE(⇢). These last two non-stochastic error models
are very natural. Luckily, the threshold theorem does apply to them. We won’t cover them in this chapter,
but we’ll return to the topic in section 15.8.
Of course, when I say that a particular error model is not an adversarial stochastic error model, that is not
strictly mathematically true. You could write these examples — or any other error model — as adversarial
stochastic ones by taking pL = 1 for all locations. What I really mean is that it would be foolish to think of
them as adversarial stochastic error models, because doing so would require us to take pL to be large even
though the actual error is small, causing us to vastly overestimate the prevalence of errors in the circuit.
Definition 14.2. An extended rectangle consists of:
• A gate or wait gadget and all error correction gadgets immediately before and immediately after it
• A preparation gadget and all error correction gadgets immediately after it
• A measurement gadget and all error correction gadgets immediately before it
The phrase “extended rectangle” can be abbreviated exRec. An extended rectangle can be labeled by a
location L in C, the location for which the corresponding gadget in F T (C) appears in the exRec. We thus
may speak of gate exRecs, wait exRecs, preparation exRecs, or measurement exRecs. Any EC gadget in the
exRec before the gadget for L is known as a leading EC gadget, and any EC gadget after the gadget for L
is a trailing EC gadget.
A rectangle is defined similarly, but omits the leading EC gadget(s) in the exRec.
An example of extended rectangles is shown in figure 14.1. An extended rectangle for a two-qubit gate
location must include both EC gadgets before the gate gadget and both EC gadgets after the gate. Note
that extended rectangles overlap: Two consecutive gates in C give rise to a pair of exRecs that overlap,
247
[Figure 14.1: 1-qubit gate, 2-qubit gate, and wait exRecs, each consisting of the gadget together with the FTEC gadgets immediately before and after it.]
sharing one or more EC gadgets. The trailing EC gadget for the earlier location is the leading EC gadget
for the later location.
ExRecs are “extended” because they include EC gadgets both before and after the gate. Rectangles are
perhaps a more natural notion than exRecs, since F T (C) can be broken up uniquely into non-overlapping
rectangles, but it turns out that analyzing individual rectangles in isolation doesn’t suffice to understand
the circuit’s fault tolerance properties, for reasons I will explain in section 14.3. Therefore, the proofs in this
chapter will use extended rectangles and not rectangles.
The last combination one might consider is an EC gadget followed by a gate, wait, or measurement
gadget. This combination is needed in the proof:
Definition 14.3. A truncated extended rectangle is an extended rectangle with one or more trailing EC
gadgets removed.
Note that a truncated exRec for a 2-qubit gate location could have a single trailing EC gadget or none
at all.
Definition 14.4. Let $Q$ be a QECC correcting $t$ errors, let $C$ be a circuit, and let $FT(C)$ be a fault-tolerant simulation of $C$ using a fault-tolerant protocol for the code $Q$. Fix a particular realization of $FT(C)$ with a particular fault path $\Gamma$. An exRec in $FT(C)$ is bad for $\Gamma$ if the exRec contains $t+1$ or more faults in $\Gamma$. The exRec is good for $\Gamma$ if it contains $t$ or fewer faults in $\Gamma$. A truncated exRec can be classified as good or bad in the same way, counting only those locations actually present in the truncated exRec, and not locations in the omitted EC gadgets.
Figure 14.3: Examples of good and bad exRecs and truncated exRecs for a code with t = 2.
Thus, it is possible for a truncated exRec to be good even though the exRec it is contained in is bad.
This can happen in a number of ways. For instance, there could be t faults in the truncated exRec and one
more error in a trailing EC gadget. When the EC gadget is included in the exRec, there are t + 1 faults in
total, making the exRec bad, but when it is not included, there are only t faults.
Note that when we say an exRec is good or bad, we are really talking about a property of a particular fault path. A different fault path has a different set of good and bad exRecs. Frequently, though, I will imagine a situation where the fault path is fixed and therefore does not need to be explicitly mentioned.
14.3 Correctness
The next step is to show that good exRecs actually behave correctly. But what does it mean for an exRec
to be correct?
The rectangle corresponding to a location $L$ is correct if:
• If $L$ is a gate or wait location, [graphical equations (14.1)–(14.2): the noisy rectangle followed by ideal decoders on each output block equals ideal decoders followed by the corresponding ideal gate or wait location.]
• If $L$ is a preparation location, [graphical equation (14.3): the noisy preparation rectangle followed by an ideal decoder equals the ideal unencoded preparation.]
• If $L$ is a measurement location, [graphical equation (14.4): the noisy fault-tolerant measurement gadget equals an ideal decoder followed by the ideal measurement.]
We say an exRec is correct if the rectangle contained in it is correct. If a rectangle or exRec is not correct,
it is incorrect.
For a measurement location, note that a rectangle consists just of the measurement gadget, since there
is no EC step after measurement.
The correctness property for gate gadgets lets us move ideal decoders to the left through an exRec to
produce ideal gates. For a preparation location, moving the ideal decoder to the left gets us to the start of
the circuit, so correctness for a preparation location lets us get rid of an ideal decoder. Similarly, correctness
for a measurement location lets us create an ideal decoder.
Notice that the definition of correctness improves on the various fault-tolerant properties in that it uses
only ideal decoders and not filters. This lets us consider the correctness of a rectangle or an exRec in isolation
without having to worry too much about how many errors are carried over from the previous part of the
circuit. As with “good” and “bad”, however, the terms “correct” and “incorrect” only apply to an exRec
for a particular fault path , but whereas “good” and “bad” only depend on the locations of the faults in
, “correct” and “incorrect” may also depend on the types of errors produced by the specific faults.
For truncated rectangles or exRecs, the definition is essentially the same. A truncated rectangle may
just be a single gate gadget, although a truncated multi-block rectangle could have some trailing EC steps
remaining. There is no truncated measurement exRec, since the measurement exRec does not have a trailing
EC to remove.
[Graphical equations (14.5)–(14.7): the corresponding correctness conditions for truncated gate and wait rectangles, with the trailing FT EC gadget(s) removed before the ideal decoder is applied; for a truncated multi-block rectangle, one trailing EC gadget may remain.]
• If $L$ is a preparation location, [graphical equation (14.8): the truncated preparation rectangle followed by an ideal decoder equals the ideal unencoded preparation.]
A truncated exRec is correct if the truncated rectangle contained in it is correct. A truncated rectangle or
exRec is incorrect if it is not correct.
14.3.2 Good Extended Rectangles Are Correct
Comparing the notions of “good” and “correct”, we see that they are somewhat complementary: A rectangle
is good if it doesn’t have too many faults, which is easy to check by looking at the fault path, and a rectangle
is correct if it behaves in the way it is supposed to.
We’d like to say that a good rectangle is correct, so that when the density of faults is low, the circuit
behaves correctly. However, if we look at just a single rectangle and see that it is good, we still don’t have
enough information to say that it is correct. For instance, suppose that we have a code with distance 3, such
as the 7-qubit code, and the fault path before the rectangle implies that the state at that point has one error
(which could be formalized by saying it passes a 1-filter but not a 0-filter). Further suppose there is one fault
in the rectangle. The rectangle is good, but depending on the exact nature of the fault, it could produce a
new single-qubit error which combines with the pre-existing single-qubit error to give a state which will be
decoded to the incorrect state.
To be even more concrete, imagine we have a wait rectangle for the 7-qubit code and the pre-existing error is $X_5$, a bit flip on qubit 5. Before the rectangle, an ideal decoder applied to the state will correct the bit flip error, producing some logical state, say $|\overline{0}\rangle$. Then suppose during the wait step, which occurs before the EC gadget in the rectangle, there is a fault which produces an additional $X_7$ error on the state. The state of the system at that point is then $X_5 X_7 |\overline{0}\rangle$. But for the 7-qubit code, $X_5 X_7 = X_6 \bar{X}$, so the state could also be written as $X_6 |\overline{1}\rangle$. The EC step has no faults, so it will correct any errors present in the system, but it will do so acting under the assumption that there is at most one error. Therefore, it will “correct” the state to $|\overline{1}\rangle$. The ideal decoder applied after the rectangle therefore gives $|\overline{1}\rangle$, meaning correctness fails for this fault path. On the other hand, if there were no pre-existing error, the state after the wait step would be $X_7|\overline{0}\rangle$, and the EC step would correct that to $|\overline{0}\rangle$. Whether or not this rectangle is correct depends on what errors are in the fault path outside the rectangle.
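The misidentification in this example is easy to reproduce numerically. Here is a minimal sketch (mine), using the common convention that qubit $i$'s parity-check column is the binary expansion of $i$; under that labeling the colliding single-qubit error is $X_2$ rather than $X_6$ (the book's qubit labeling evidently differs), but the phenomenon is identical:

```python
# Two single-qubit X errors get "corrected" into a logical flip on the
# 7-qubit code. Convention (an assumption): syndrome of qubit i = binary of i.
import numpy as np

# Parity check of the [7,4] Hamming code: column i (1-indexed) = binary of i.
H = np.array([[(i >> b) & 1 for i in range(1, 8)] for b in range(3)])

def syndrome(err):                    # err: length-7 0/1 vector of X errors
    return tuple(H @ err % 2)

e57 = np.zeros(7, dtype=int); e57[[4, 6]] = 1    # X5 X7 (0-indexed 4 and 6)
s = syndrome(e57)

# The decoder assumes a single error: find the qubit whose column matches s.
correction = np.zeros(7, dtype=int)
correction[[tuple(H[:, j]) for j in range(7)].index(s)] = 1

residual = (e57 + correction) % 2
print("correction applied to qubit", correction.argmax() + 1)   # qubit 2
print("residual weight:", residual.sum())   # 3: a logical X representative
```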
This problem can be solved by working with extended rectangles instead. The leading EC in an extended
rectangle ensures that there cannot be too many errors in the state entering the main gadget of the exRec.
We have the following result:
Theorem 14.1. If an extended rectangle is good, then it is correct. If a truncated extended rectangle is good,
it is correct.
Proof. I will prove this for a single-block gate exRec. The cases of multi-block gate exRecs, measurement
exRecs, and preparation exRecs are all extremely similar.
The exRec is good, so there are at most $t$ faults in the whole exRec. We break that down into $s_1$ faults in the leading EC, $s_2$ faults in the gate gadget, and $s_3$ faults in the trailing EC, with $s_1 + s_2 + s_3 \le t$.
[Graphical derivation, equations (14.9)–(14.14): insert an $s_1$-filter after the leading EC, propagate it through the gate gadget to an $(s_1+s_2)$-filter, absorb the trailing EC and the ideal decoder, and then reverse the steps to leave an ideal decoder followed by the ideal gate.]
In the first line, we use the ECRP, which we can do because $s_1 \le t$. To get the second, we use the GPP, which is allowed because $s_1 + s_2 \le t$. The third uses the ECCP and the fact that $s_1 + s_2 + s_3 \le t$. To get the fourth line, we use the GPP in reverse. For line 5, use the GCP, and then use the ECRP in reverse to get rid of the remaining filter for line 6. This shows that the exRec is correct.
For truncated exRecs, you could go through a similar argument to directly show that truncated good
exRecs are correct. However, we can save time by observing that
[Graphical equation (14.15): an ideal decoder equals a fault-free FT EC gadget followed by an ideal decoder,]
so we can transform a truncated exRec followed by an ideal decoder into a regular untruncated exRec with
0 faults in the trailing EC(s). Then correctness for a good truncated exRec follows from correctness for a
good untruncated exRec.
Note that by theorem 10.3, theorem 14.1 holds not only when faults produce Pauli errors, but for arbitrary
faults, including those chosen by an adversary as in the adversarial stochastic model. Only the fault path
needs to be known in order to determine whether an exRec is good or bad.
Figure 14.4: Applying the correctness properties to prove that a fault-tolerant circuit with good exRecs is
equivalent to an ideal circuit. In this example, t = 1, so all exRecs are good. At each step, the next exRec
to apply correctness to is indicated by a box.
Theorem 14.2. Let C be a circuit starting with state preparation locations and ending with measurement
locations, and let $FT(C)$ be the fault-tolerant simulation of $C$. Suppose for a particular fault path $\Gamma$ that all the exRecs in $FT(C)$ are good (or that the fault path restricted to each exRec gives a benign set of locations). Then the output distribution of $FT(C)$ for any adversarial choice of noise for $\Gamma$ is identical to the output distribution of $C$ with ideal gates.
Proof. The proof can be done using our graphical notation. All exRecs are good, so by theorem 14.1, all exRecs are correct. Consider first the measurement exRecs throughout $C$. By correctness for measurement exRecs, [graphical equation: each noisy FT measurement gadget equals an ideal decoder followed by an ideal measurement.] We can therefore replace each noisy FT measurement gadget in $FT(C)$ by an ideal decoder followed by ideal measurement.
Next, we can use induction to replace the gate gadgets by ideal gates. Suppose $C$ has depth $d$. The measurement gadgets in the last temporal layer of $FT(C)$ have been replaced by ideal decoders followed by ideal measurements, so the gate exRecs in layer $d-1$ are all followed by ideal decoders. Therefore, we can apply correctness for a gate exRec to all gate exRecs in layer $d-1$ to replace each of them by an ideal decoder followed by the corresponding ideal gate. Proceeding backwards through the layers of the circuit, we eventually reach the preparation exRecs, each followed by an ideal decoder. Using correctness for preparation exRecs, [graphical equation (14.18)], we replace each preparation exRec with the corresponding ideal preparation location. The result of all these manipulations is the desired circuit $C$.
An example of this procedure for a small circuit is given in figure 14.4.
Figure 14.5: An example where 4 faults cause 4 bad exRecs when t = 1. Bad exRecs are shaded.
Figure 14.6: An example of how different errors outside an exRec can combine with faults inside a bad exRec to produce different logical errors.
by a faulty wait location. Suppose that the state we get from applying an ideal decoder to the input state is $|\psi\rangle$, meaning that the actual state entering the exRec is $E|\psi\rangle$. Further suppose that we consider a fault path with two faults in the exRec, making it a bad exRec. One is in the CNOT interacting the first data qubit with the first ancilla qubit in the bit flip error correction step of the leading EC; that fault produces an error $Y$ on the first data qubit after the CNOT, and a corresponding error on the ancilla qubit so that the bit flip part of the new error shows up in the error syndrome. The second fault causes a $Z_3$ error during the transversal wait gadget.
Let’s consider di↵erent possibilities for the pre-existing error E and determine what the state is if we
perform ideal decoding after the exRec. If E is an X2 error, then the pre-existing error and the X1 error
due to the fault in the EC step combine, and the bit flip correction step mistakenly believes the syndrome
to be due to a X7 error. In addition, we have the new Z error due to the fault in the wait step. Therefore,
after the initial EC step, the state is Y1 X2 X7 | i = iZ1 X| i. The Z3 error in the wait step then combines
with the Z1 error so that the phase error correction step of the trailing EC step believes the phase error to
be Z6 . Therefore the state of the block at the end of the exRec is iZ1 Z3 Z6 X| i = Y | i. To simulate this
properly, if we replace the full exRec by an ideal decoder followed by a faulty wait step, the logical wait step
should therefore have a Y error.
On the other hand, imagine $E = Z_5$. Then the phase error correction step of the leading EC step eliminates it, and the bit flip error correction step eliminates the $X$ part of the fault $Y_1$. Therefore the state exiting the leading EC step is $i Z_1 |\psi\rangle$. The phase error during the wait gadget combines with $Z_1$ to make a two-qubit phase error which, as before, is mistaken for a $Z_6$ error in the trailing EC. The final state of the exRec is thus $\bar{Z}|\psi\rangle$. To simulate this situation properly, the decoded faulty wait step should have a $Z$ error.
As you can see, the error being simulated here depends on the error coming into the exRec. We’d like to
be able to analyze the exRec by only considering what is going on inside it, but apparently we also need to
consider faults outside the exRec.
Note that if we only push the ideal decoder through the rectangle part of the exRec, then this particular
example doesn’t cause trouble, since in both cases, we can accurately simulate the circuit by having a logical
wait step with a Z error relative to the state that comes out of the leading EC step, and we can deduce
that by only looking at faults within the exRec (although not just the faults within the rectangle). However,
in other situations, possibly involving some extra faults, we can still have the problem (although not, as it
happens, for Steane EC; see exercise ??). Moreover, as I noted above, this strategy runs into the problem that pushing the ideal decoder back only to the leading EC allows overly strong correlations between bad exRecs.
form $E|\psi\rangle$, where $E$ is a Pauli error and $|\psi\rangle$ is a valid codeword. The ideal decoder applied to this state extracts $|\psi\rangle$, but discards information about the error. As we saw above, this is a problem because we need to know what $E$ is in order to determine the correct fault for the logical location simulated by the exRec.
Therefore, we introduce a new object, the ∗-decoder. It works just like an ideal decoder, but instead of throwing away the syndrome information, it keeps it. I will draw the ∗-decoder this way: [graphical notation (14.19): a decoder triangle with a second, lower output line for the syndrome.]
When applied to a state $E|\psi\rangle$, the upper output line holds the decoded state $|\psi\rangle$, and the lower output line holds the syndrome information $\sigma(E)$. The ∗-decoder can be chosen to be unitary, which ensures that no information is lost. It is just the purification of the regular ideal decoder, which can be easily recovered from the ∗-decoder by simply discarding the second output register: [graphical equation (14.20).]
We can define correctness for the ∗-decoder similarly to correctness for the regular ideal decoder. The only complication is that we must specify what happens to the syndrome information when we shift the ∗-decoder forward. However, for correctness we don't need a strong requirement; anything that happens to the syndrome register is acceptable, as long as it doesn't interact with the decoded state:
Definition 14.8. The rectangle corresponding to a single-block gate location $L$ is correct for ∗-decoding if [graphical equation (14.21): the noisy rectangle followed by a ∗-decoder equals a ∗-decoder followed by the ideal gate on the data output and a map $\mathcal{E}$ on the syndrome output,] where $\mathcal{E}$ can be any map applied to the syndrome register, including an interaction with a persistent memory held by the adversary. $\mathcal{E}$ will generally depend on the precise faults occurring in the noisy gate gadget. Correctness for ∗-decoding is defined similarly for a multi-block gate rectangle. Preparation and measurement rectangles are correct for ∗-decoding if: [graphical equations (14.22) and (14.23): the noisy preparation rectangle followed by a ∗-decoder equals the ideal unencoded preparation, with the syndrome register produced by $\mathcal{E}$ acting on $|0\rangle$; the noisy measurement gadget equals a ∗-decoder followed by the ideal measurement, with the syndrome register discarded.]
An exRec is correct for ∗-decoding if the rectangle contained within it is correct. To get the definition of correctness for ∗-decoding for a truncated extended rectangle, simply remove the FT EC step(s) from the definition for an untruncated exRec.
The syndrome register does change when we shift the ∗-decoder leftwards in the definition of correctness. The type of error changes due to conjugation by any gates, some errors are corrected in the EC steps, and new errors may be added due to faults in the rectangle. Precisely how the syndrome is altered is not important for the proof of the threshold theorem, but if you're interested, it can be determined exactly by
Figure 14.7: A combination of gadgets which produces the correct change in the syndrome information during a correct rectangle. The left-pointing triangle is a ∗-encoder, the inverse of the ∗-decoder, encoding a logical qubit into a codeword with errors determined by the syndrome (which enters and exits on the bottom lines of the ∗-encoder and ∗-decoder).
encoding some dummy data using the syndrome information and subjecting it to the same noise as in the
original gate gadget. The procedure is illustrated in figure 14.7.
If an exRec is correct for ∗-decoding, we can simply discard the syndrome information on both sides of the equality, which means that it is also correct for regular ideal decoders. However, at first glance, it may not be clear if correctness for ∗-decoding is a strictly stronger property than regular correctness. In fact, though, they are completely equivalent:
Proposition 14.3. If a rectangle, exRec, or truncated exRec is correct, then it is correct for ∗-decoding.
Proof. As a starting point, use the relationship between ideal decoders and ∗-decoders to replace the decoders in the definition of correctness with ∗-decoders. For instance, for a single-block rectangle, we get [graphical equation (14.24): the noisy rectangle followed by a ∗-decoder with discarded syndrome register equals a ∗-decoder followed by the ideal gate, again with the syndrome register discarded.]
This statement is an equality of completely positive maps (for a particular Pauli fault path), and holds for all input data states, including states that are entangled with some reference system. Let us take a maximally entangled input state between the code block and a reference system. The terms in this entangled state run over all possible syndromes and all possible logical states of the code block.
With this input, we can write the output state of the LHS as $\operatorname{tr}_S \rho_L$ and the output state of the RHS as $\operatorname{tr}_S \rho_R$, where $S$ is the syndrome register and $\rho_L$ and $\rho_R$ are joint states of the reference system, the data register, and the syndrome register. In fact, $\operatorname{tr}_S \rho_L$ and $\operatorname{tr}_S \rho_R$ are the states to which the LHS and RHS maps correspond under the Choi-Jamiolkowski isomorphism. The equation tells us that $\operatorname{tr}_S \rho_L = \operatorname{tr}_S \rho_R$.
Now purify $\rho_L$ and $\rho_R$ to $|\psi_L\rangle$ and $|\psi_R\rangle$. We may assume that the extra register $A$ needed is held by the adversary. Since $\operatorname{tr}_{AS} |\psi_L\rangle\langle\psi_L| = \operatorname{tr}_{AS} |\psi_R\rangle\langle\psi_R|$, there is a unitary map $U$ acting on $AS$ such that
$$U|\psi_R\rangle = |\psi_L\rangle. \qquad (14.25)$$
But $|\psi_L\rangle$ and $U|\psi_R\rangle$ are the Choi-Jamiolkowski states corresponding to purified versions of the LHS and RHS of the definition of correctness for ∗-decoding. Therefore the maps are equal as well.
Corollary 14.4. If an extended rectangle or truncated extended rectangle is good, it is correct for ∗-decoding.
14.4.3 Simulation of a Single Bad Extended Rectangle
Using the ∗-decoder, we can then replace a bad extended rectangle with a noisy decoded state, provided we let the noise possibly depend on the syndrome information from the input state: [graphical equation: the bad exRec followed by a ∗-decoder equals a ∗-decoder followed by a noisy gate $\mathcal{E}$ acting jointly on the decoded state and the syndrome register.]
The noisy gate on the right hand side can be determined explicitly. Let $U$ be the ∗-decoder, which is unitary, and $E$ be the overall action of the full bad exRec (-[FT EC]-[FT Gate]-[FT EC]-), with a particular fault path and choice of errors for that fault path. $U^\dagger$ is a ∗-encoder, which takes a logical state $|\psi\rangle$ plus an error syndrome $s$ and creates an encoded state $F|\psi\rangle$ such that $F$ is a correctable error with $\sigma(F) = s$. Since $UU^\dagger = I$, we have
$$EU = UU^\dagger EU. \qquad (14.27)$$
We thus recognize the noisy decoded gate location above as $U^\dagger EU$. As desired, this depends only on the fault path inside the exRec.
We can replace a bad truncated exRec in the same way. Naturally, E now includes the overall action of
the truncated exRec rather than the full exRec. The procedure is otherwise identical.
The procedures for bad preparation and measurement exRecs are similar, just lacking the leading or
trailing EC gadgets, respectively.
Note that we don’t need to sum over sets of t + 2 or more locations — any such set certainly includes a
set S of t + 1 locations, and the bound on the probability of faults at S includes all fault paths that have
faults at S, including those that also have other faults in the same exRec. Of course, using the union bound
means we overcount, since a set of size t + 2, for instance, includes t + 2 di↵erent sets of size t + 1, all of
which are separately included in the sum. Compensating to subtract o↵ the overcounting would be difficult,
however, as the definition of adversarial stochastic noise doesn’t give us a lower bound on the probability of
the sets of size t + 2.
In the special case where $p_L = p$ for all locations $L$, the formula simplifies even more. We simply get
$$\text{Prob(bad)} \le \binom{A}{t+1} p^{t+1}, \qquad (14.29)$$
where $A$ is the total number of locations in the exRec.
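As a quick illustration (my own), this bound is simple to tabulate; the exRec sizes used below are the ones computed later in this section for the 7-qubit code protocol:

```python
# Evaluate Prob(bad) <= binom(A, t+1) * p^(t+1) for a code with t = 1.
from math import comb

def prob_bad_bound(A, t, p):
    return comb(A, t + 1) * p ** (t + 1)

for A in (189, 371, 735):   # measurement, gate, and CNOT exRec sizes below
    print(A, prob_bad_bound(A, t=1, p=1e-4))
```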
It’s also worth considering how to modify the formula if you are willing to count malignant sets rather
than just considering bad exRecs. In that case, we should sum over sets S of locations, restricting attention
to malignant $S$. We need only consider minimal malignant sets, i.e., those malignant sets $S$ such that no proper subset of $S$ is malignant. Note that if $S$ is malignant then any set containing $S$ is also malignant, since the adversary can, if necessary, just choose to turn off faults outside $S$ by choosing the identity “error” for those locations. Thus we necessarily have overcounting, as we did for bad exRecs, but once again we can safely set an upper bound by simply summing over minimal sets. We find
$$\text{Prob(malignant)} \le \sum_{\text{minimal malignant } S}\ \prod_{L\in S} p_L, \qquad (14.30)$$
or
$$\text{Prob(malignant)} \le \sum_{r=t+1}^{A} M_r\, p^r \qquad (14.31)$$
in the case where the error probabilities for all locations are the same. Here, $M_r$ is the number of minimal malignant sets in the exRec containing exactly $r$ locations.
Frequently it is inconvenient to count all malignant sets of arbitrary size, so we may want to cut off the sum at some value of $r$. To do this, we must make a conservative estimate of the number of minimal malignant sets of larger size. One straightforward way to do that would be to simply count all sets of size greater than the cutoff as malignant sets. For instance, we could count malignant sets of size $t+1$ and assume all sets of size $t+2$ are malignant. Then we would get
$$\text{Prob(malignant)} \le M_{t+1}\, p^{t+1} + \binom{A}{t+2} p^{t+2}. \qquad (14.32)$$
This is a serious overcounting of the minimal malignant sets of size $t+2$; for instance, it includes many sets which have a malignant subset of size $t+1$ and therefore are already counted. However, threshold error rates tend to be low enough that the probability of a bad exRec is often dominated by the lowest-order one or two terms ($t+1$ and maybe $t+2$ errors), so overcounting the higher-order malignant sets does not matter very much.
One solution is to count all the malignant sets properly. Imagine a fault path in which the first ancilla preparation attempt has no faults, but there are $t+1$ or more errors scattered throughout the other preparation attempts. Such a path is certainly benign, since we use the ancilla created in the first attempt, making irrelevant what happens in the other attempts. Still, with so many locations altogether, an exact counting of the malignant sets might be difficult.
We can make a good approximation, however, by considering the structure of the overall ancilla preparation protocol. Each attempt at creating an ancilla is fault-tolerant. In particular, if there are $t$ or fewer faults in the circuit for one attempt, then either the state produced by that try is rejected or it is correct (i.e., satisfies the PCP and PPP). Furthermore, all of the attempts to produce the ancilla use the same circuit, so the same patterns of faults cause the same problems in the different tries.
Imagine a simplified procedure where the ancilla created in attempt j is used in the main part of the
exRec regardless of which attempts succeed or fail. We shall say that a fault path is malignant for ancilla j
if there is a choice of errors by the adversary consistent with this fault path such that the simplified exRec
using ancilla j is incorrect and ancilla j is accepted for that choice of errors. (Note that the ancilla might
or might not be responsible for the exRec’s failure to be correct.) Let P be the total probability of having a
fault path which is malignant for ancilla 1; since all ancilla attempts are the same, P is also the probability
of having a fault path which is malignant for ancilla j for any j.
In the real ancilla preparation protocol, only the first accepted ancilla is used. The adversary can choose
the nature of the errors, but can only cause an attempt to fail if there is at least one error in the attempt.
Therefore, in order for a fault path to be malignant for the exRec, it needs to be malignant for ancilla j (for
some j) and there needs to be at least one fault in each attempt up to j. (It is possible, though, for a fault
path to have this form and yet not be malignant, as there might be some locations in the ancilla preparation step that cannot lead to the ancilla being rejected.)
The probability of having at least one fault in a single attempt is at most $Q = \sum_L p_L$, where the sum is taken over the locations in a single try at creating an ancilla. The total probability of a malignant fault path is then at most
$$\text{Prob(malignant)} \le P + QP + Q^2 P + \cdots \le \frac{P}{1-Q}. \qquad (14.33)$$
(This is assuming $Q < 1$.) Therefore, we can bound the probability of incorrectness by calculating the probability of getting a malignant set using just a single attempt at creating the ancilla and then correcting by dividing by a bound $1-Q$ on the probability of acceptance. In other words, we need to compute the conditional probability of being malignant, conditioned on accepting the ancilla.
In the special case where all locations have the same error rate $p$, $Q = Bp$ (with $B$ the number of locations in a single ancilla creation attempt) and $P \le \sum_{r=t+1}^{A} M_r p^r$, as above. If there are multiple ancillas created via post-selection in the same exRec, we need to divide by $1-Q$ for each separate ancilla used. There could be multiple different values of $Q$ involved if the ancillas used are different or are created in different ways.
Figure 14.8: Non-fault-tolerant circuits for creating a) a $|0\rangle$ state and b) an $R_{\pi/8}|+\rangle$ state for the 7-qubit code, designed to minimize waiting times.
two $|0\rangle$ states and check them against each other, post-selecting on detecting no errors. The $|+\rangle$ ancillas are created by making a $|0\rangle$ and performing the transversal Hadamard gate.
In a single EC gadget, we can count a total of 4 non-FT $|0\rangle$ encoding circuits, 2 transversal Hadamards (containing 14 single-qubit Clifford group gates in total), 3 transversal wait steps (consisting of 21 wait locations), 1 Pauli correction (7 locations, including zero to two single-qubit Pauli operations, the remainder being wait locations), 4 transversal measurements (28 measurement locations), and 4 transversal CNOTs (28 CNOT locations). There are two ancilla post-selections. Each non-FT $|0\rangle$ encoding circuit consists of 9 CNOTs, 3 Hadamards, 2 waits, and 7 $|0\rangle$ preparation locations. The error rate for a gate location should be at least as bad as the error rate for a wait location, so we can take the worst-case number of Pauli operations as two. Therefore, the EC gadget consists of 182 locations, broken down as:
• 28 single-qubit Clifford group gates
• 34 wait locations
• 28 $|0\rangle$ preparations
• 28 measurements
• 64 CNOTs
The post-selected ancilla preparation consists of two non-FT $|0\rangle$ preparations plus a transversal CNOT (7 locations) and one transversal measurement (7 measurement locations and 7 wait locations). The total probability of having a fault in one ancilla preparation step is at most
$$Q \le 6p_G + 11p_S + 14p_P + 7p_M + 25p_{\mathrm{CNOT}}, \qquad (14.34)$$
or $Q = 63p$ when all error rates are the same. We need to divide by $(1-Q)^2$ for each EC gadget in the exRec to correct for post-selection.
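Numerically, this correction is a small effect as long as $Q \ll 1$. A brief sketch (mine), using the $Q = 63p$ just computed; the quadratic bound chosen for $P$ is a placeholder:

```python
# Post-selection correction: divide a single-attempt bound P by (1 - Q) once
# per post-selected ancilla. The quadratic bound for P is a placeholder.
from math import comb

def corrected_bound(P, Q, n_ancillas):
    assert Q < 1                       # the geometric series must converge
    return P / (1 - Q) ** n_ancillas

p = 1e-4
P = comb(189, 2) * p ** 2              # placeholder exRec size A = 189
print(P, corrected_bound(P, Q=63 * p, n_ancillas=2))
```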
A single-qubit Clifford gate exRec just consists of two FT EC gadgets and a transversal Clifford gate, so there are a total of 371 locations:
• 63 single-qubit Clifford group gates
• 68 wait locations
• 56 $|0\rangle$ preparations
• 56 measurements
• 128 CNOTs
The probability of a single-qubit Clifford group gate exRec being bad is thus
$$\text{Prob(single-qubit Clifford gate exRec bad)} \le \frac{68635\,p^2}{(1-63p)^4}, \qquad (14.35)$$
or
$$\begin{aligned}
\text{Prob(bad)} \le \frac{1}{(1-Q)^4}\Big(&1953\,p_G^2 + 4284\,p_G p_S + 3528\,p_G p_P + 3528\,p_G p_M + 8064\,p_G p_{\mathrm{CNOT}} + 2278\,p_S^2 \\
&+ 3808\,p_S p_P + 3808\,p_S p_M + 8704\,p_S p_{\mathrm{CNOT}} + 1540\,p_P^2 + 3136\,p_P p_M + 7168\,p_P p_{\mathrm{CNOT}} \\
&+ 1540\,p_M^2 + 7168\,p_M p_{\mathrm{CNOT}} + 8128\,p_{\mathrm{CNOT}}^2\Big). \qquad (14.36)
\end{aligned}$$
As you can see, keeping track of separate error rates for all these di↵erent types of locations becomes rather
tedious. For the rest of this calculation, I’ll therefore assume the simpler case where all error rates are equal
to p.
The wait exRec is very similar. Instead of a transversal gate gadget, it uses a transversal wait gadget. Therefore, the total number of locations is the same, but there are 56 single-qubit Clifford group gates and 75 wait locations instead. The total probability of a wait exRec being bad is also
$$\text{Prob(wait exRec bad)} \le \frac{68635\,p^2}{(1-63p)^4}. \qquad (14.37)$$
The measurement exRec is actually simpler. It consists of just one EC step followed by a transversal measurement. It therefore consists of just 189 locations, comprised of
• 28 single-qubit Clifford group gates
• 34 wait locations
• 28 $|0\rangle$ preparations
• 35 measurements
• 64 CNOTs
$$\text{Prob(measurement exRec bad)} \le \frac{17766\,p^2}{(1-63p)^2}. \qquad (14.38)$$
A preparation exRec also involves only one EC step. The preparation part itself can use the same procedure from figure 13.3 as used in the EC steps, including the same post-selection. The whole exRec thus consists of 245 locations:
• 34 single-qubit Clifford group gates
• 45 wait locations
• 42 $|0\rangle$ preparations
• 35 measurements
• 89 CNOTs
There is one additional post-selection, so we must divide by $(1-Q)^3$ rather than $(1-Q)^2$. The overall probability of being bad is
$$\text{Prob(preparation exRec bad)} \le \frac{29890\,p^2}{(1-63p)^3}. \qquad (14.39)$$
The CNOT exRec is bigger, but not significantly more complicated. There are 4 EC steps, two for each block, plus a transversal CNOT with 7 locations. The total number of locations is 735:
• 112 single-qubit Clifford group gates
• 136 wait locations
• 112 $|0\rangle$ preparations
• 112 measurements
• 263 CNOTs
$$\text{Prob(CNOT exRec bad)} \le \frac{269745\,p^2}{(1-63p)^8}. \qquad (14.40)$$
The last kind of exRec is the most complicated: the $\pi/8$ gate exRec. First we have to make the magic state, test it, and then inject the $\pi/8$ gate into the computation. To test it, we will use the cat state measurement procedure described in section 13.4. Then we inject it with the circuit of figure 13.7.
The first thing to do is make a non-fault-tolerant $R_{\pi/8}|+\rangle$ state, as in figure 14.8. This involves 11 CNOTs, 4 Hadamards, 1 $R_{\pi/8}$, 3 waits, and 7 $|0\rangle$ preparation locations. Next, we do the cat state measurement. The first step of cat state measurement is to perform a full EC step: The cat state measurement will check if the logical state is correct, but recall that it can give the wrong answer if there is an error on even a single physical qubit in the incoming state. In particular, we are concerned here that a single fault in the $R_{\pi/8}|+\rangle$ encoding circuit will produce both the wrong logical state and a single-qubit error that causes the cat state measurement to give the wrong answer, incorrectly allowing the state to pass.
The next step is to create and verify a cat state and use it to measure the eigenvalue of $U = R_{\pi/8} X R_{\pi/8}^\dagger$ on the logical state. We can do this via physical controlled-$U$ gates from the cat state qubits to the ancilla block physical qubits. These are gates which are not in our gate set, so we break them down using $R_{\pi/8}^\dagger = R_{\pi/8} R_{-\pi/4}$, followed by CNOT, followed by $R_{\pi/8}$. The cat state can be checked using a single extra ancilla qubit to make sure that a single fault doesn't produce two bit flip errors in it. The full cat state creation and $U$ measurement is shown in figure 14.9. Overall, the cat state measurement has 15 CNOT gates, 15 single-qubit Clifford group gates (8 Hadamards and 7 $R_{-\pi/4}$ gates), 14 $R_{\pi/8}$ gates, 12 wait steps, 8 $|0\rangle$ state preparations, and 8 measurements.
Altogether, the $R_{\pi/8}|+\rangle$ preparation and checking (including the EC step) involves a total of
• 15 $R_{\pi/8}$ gates
• 49 wait locations
• 43 $|0\rangle$ preparations
• 36 measurements
• 90 CNOT gates
Figure 14.9: Testing an $R_{\pi/8}|+\rangle$ state via cat state measurement. An EC step (not shown) precedes this circuit.
This is a total of 282 locations. We will post-select on the cat state passing verification and the $R_{\pi/8}|+\rangle$ passing the cat state measurement, as well as the ancillas in the included EC step. The magic state injection contains a transversal CNOT, a transversal measurement, a wait step for one block, and potentially a transversal $U$ (which is a Clifford group gate). Each of these involves 7 locations, and the whole $\pi/8$ exRec has an additional 2 EC steps.
The $\pi/8$ gate exRec therefore has a total of
• 15 $R_{\pi/8}$ gates
• 99 $|0\rangle$ preparations
• 99 measurements
This is a total of 672 locations, and we are post-selecting on the $R_{\pi/8}|+\rangle$ preparation as well as the preparation of the ancillas in the two main EC steps. Thus, the probability that a $\pi/8$ exRec is bad is
$$\text{Prob}(\pi/8 \text{ exRec bad}) \le \frac{225456\,p^2}{(1-63p)^4\,(1-282p)}. \qquad (14.41)$$
14.6.1 Truncation
Given a particular fault path for a fault-tolerant simulation of circuit $C$, we can look at each exRec to decide whether it is good or bad. At first sight, this seems straightforward, but on closer examination, there is a complication. The naïve approach would be to look at each exRec in isolation and determine whether it is good or bad based on that. However, before deciding if an exRec is good or bad, we should first figure out
whether it is truncated or not, and that depends on the other exRecs around it. Remember, we want to
avoid a situation where a small number of failures in a single EC step can make both of the exRecs sharing
that EC step fail. We’d like to assign the EC step to just one of these two exRecs — we’ll choose the later
one — so that the other exRec has a chance to be good. Or at least, if it’s bad, it’s bad for completely
di↵erent reasons that have nothing to do with the shared FTEC gadget.
If the exRecs immediately after the exRec under consideration are good, correctness will let us move ideal
decoders (or ⇤-decoders) from the end of the subsequent exRecs to just after the trailing FTEC gadgets for
the current exRec. Therefore, we can leave the exRec untruncated. However, if one of the exRecs following
the current exRec is bad, we want to truncate the EC step shared with the bad exRec.
Therefore, we can use the following procedure to determine truncation and decide which exRecs are good
and bad:
Procedure 14.1.
1. Initialize s = T , where T is the number of layers (time steps) in the circuit C we are simulating.
2. Given a fault-tolerant simulation $FT(C)$ of $C$ and a fault path $\Gamma$ for $FT(C)$, look at the exRecs simulating locations in time step $s$ of $C$. Determine if each such exRec is good or bad by counting the number of faults in it. (Alternatively: determine whether the fault path in it is benign or malignant.)
Figure 14.10: Three exRecs for a code where t = 1. Each of the three exRecs has 2 faults, but only two
are bad due to the truncation rules: T is bad, so S is truncated, which means S is good. That means R is
untruncated, so R is bad.
3. $s \mapsto s - 1$. If $s = 0$, end.
4. For each exRec simulating a location in time step $s$ of $C$, look at the exRec or exRecs immediately after it in the circuit (namely, those at time $s+1$ and sharing a FTEC gadget with it). If one or more of these exRecs is bad (has a malignant fault path), truncate the current time-$s$ exRec by removing the trailing FTEC gadget or gadgets shared with the bad (malignant) exRecs. Then return to step 2.
In other words, we start at the end and work our way backward, determining truncation and badness
together. As an example, consider figure 14.10 for an FT protocol involving a code correcting 1 error.
Because T is the last exRec, it is bad. However, that means that exRec S is truncated. Now it only has 1
fault in it, so it is not bad, even though S untruncated would be bad. But since S is good, that means that
R is not truncated and is therefore bad.
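Procedure 14.1 is easy to implement for a linear chain of exRecs like the one in the figure. A compact sketch (my own simplification: gadgets $G_1,\dots,G_n$ in a line, with EC gadgets $EC_0,\dots,EC_n$ between them, so exRec $i$ is $EC_{i-1} + G_i + EC_i$):

```python
# Backward pass of procedure 14.1 on a linear chain: truncate the trailing EC
# of exRec i exactly when exRec i+1 is bad, then test badness against t.
def classify(ec_faults, gadget_faults, t):
    n = len(gadget_faults)
    bad = [False] * n
    next_is_bad = False                 # the final exRec is never truncated
    for i in reversed(range(n)):
        faults = ec_faults[i] + gadget_faults[i]
        if not next_is_bad:             # untruncated: include the trailing EC
            faults += ec_faults[i + 1]
        bad[i] = faults > t
        next_is_bad = bad[i]
    return bad

# Fault counts reproducing the R, S, T example of figure 14.10 with t = 1:
print(classify(ec_faults=[0, 1, 1, 0], gadget_faults=[1, 0, 1], t=1))
# -> [True, False, True]: T is bad, S is truncated and good, R is bad.
```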
Lemma 14.5. Let C be a circuit starting with state preparation locations and ending with measurements,
and let $FT(C)$ be a fault-tolerant simulation of $C$. Suppose $FT(C)$ experiences a fault path $\Gamma$ with particular errors assigned at fault locations. Use procedure 14.1 to truncate the extended rectangles and determine
whether each is good or bad. Then the output of F T (C) has the same distribution as the output of C, where
each location L of C which corresponds to a bad exRec of F T (C) is replaced by a noisy location, as described
in section 14.4.3.
Proof. The proof follows that of theorem 14.2. The differences are that we use ∗-decoders everywhere, that some exRecs are truncated, and that some exRecs are bad. Only the last issue causes a meaningful change in the proof, and it is only a small change.
As before, we begin by replacing good measurement exRecs according to the rule for correctness with ∗-decoders. A bad measurement exRec is replaced with a ∗-decoder followed by a noisy measurement on the decoded state, as per section 14.4.3. Then we work our way backwards through the circuit, replacing good exRecs using the rules for correctness and bad exRecs as in section 14.4.3. Every time we replace a bad exRec, the preceding exRec(s) are truncated. The resulting truncation is consistent with procedure 14.1.
Once we have replaced all the gate and wait exRecs, the only remaining exRecs are preparation exRecs, each of which is followed by a ∗-decoder. Using correctness for preparation exRecs, we replace the good preparation exRecs by unencoded ideal preparation locations. The bad preparation exRecs are replaced by
Figure 14.11: An example of translating a fault path on a noisy fault-tolerant simulation into a noisy circuit.
In this example, t = 1. Because of truncation, only two exRecs (shaded) are bad. They are replaced by
noisy locations. The other exRecs are replaced by ideal locations.
noisy preparation locations. Either way, the ∗-decoders are eliminated, leaving us with the circuit $C$, but with locations corresponding to bad exRecs replaced with noisy locations.
This lemma tells us what to do for a fixed fault path. An example of the procedure is given in figure 14.11. To understand the behavior given a distribution over fault paths, we simply need to consider the probability of having different kinds of fault paths.
Up until now, the analysis has not depended at all on the precise code used in the FT protocol. However, when we move a ∗-decoder through a bad exRec, if the QECC encodes multiple qubits per block, the noisy location that results affects all of the qubits encoded in the block. If those qubits were part of separate locations in the original unencoded circuit, then those locations have potentially correlated errors in them. There are a few possible ways to handle this, but the most straightforward is to simply rule it out as an issue for now by restricting attention to QECCs with only one logical qubit per block.
Theorem 14.6 (Level Reduction). Let $C$ be a circuit starting with state preparation locations and ending with measurements, and let $FT(C)$ be a fault-tolerant simulation of $C$ using a QECC that encodes one qubit per block. Suppose $FT(C)$ is subjected to an adversarial stochastic error model with error probabilities $p_M$ for locations $M \in FT(C)$. Then the noisy realization $\widetilde{FT(C)}$ of $FT(C)$ is equivalent to a noisy realization $\tilde{C}$ of $C$ undergoing an adversarial stochastic error model, with error probabilities $p'_L$ for $L \in C$ bounded by
$$p'_L \le \sum_{\text{minimal malignant } S}\ \prod_{M \in S} p_M, \qquad (14.42)$$
where the sum is taken over minimal malignant sets $S$ for the exRec corresponding to $L$.
Proof. Consider a set $T$ of locations of the circuit $C$. We need to bound the probability of having a fault path for $FT(C)$ which is malignant for all of the exRecs corresponding to locations of $T$. For our purposes, we can equally well consider $T$ to be a set of exRecs in $FT(C)$.
Let $\Gamma$ be a fault path which is malignant for all exRecs in $T$. Then any exRec immediately before an exRec in $T$ is truncated for $\Gamma$, in accordance with procedure 14.1. There may be additional exRecs truncated for $\Gamma$ as well, since $\Gamma$ might be malignant for additional exRecs. I will refer to the locations before the ones in $T$ as “known to be truncated.”
In order for $\Gamma$ to qualify as malignant for all exRecs in $T$, the locations $\Gamma \cap R_L$ must form a malignant set for each $L \in T$, where $R_L$ is the set of locations for the exRec corresponding to $L$. If $L$ is known to be truncated, then $R_L$ contains the locations in the appropriately truncated version of the exRec; otherwise, $R_L$ contains all locations in the exRec.
For each $L \in T$, fix a fault path $\Omega_L$ which is contained in $R_L$ and is malignant for the exRec corresponding to $L$. Let
$$\Omega = \bigcup_{L\in T} \Omega_L. \qquad (14.43)$$
Note that because of the truncation rules, $\Omega_L \cap \Omega_{L'} = \emptyset$ if $L \ne L'$. Since the noise model for $\widetilde{FT(C)}$ is an adversarial stochastic one, the total probability of having a fault path $\Gamma'$ such that $\Omega \subseteq \Gamma'$ is at most
$$P_\Omega = \sum_{\Gamma' \supseteq \Omega} p_{\Gamma'} \le \prod_{L\in T}\ \prod_{M\in\Omega_L} p_M. \qquad (14.44)$$
Note that for some of the fault paths $\Gamma'$ that appear in the sum, the exRec corresponding to $L \in T$ might not actually be malignant. This can occur if the fault path is malignant for an exRec for a location $L'$ immediately after $L$ and $L' \notin T$. Then the exRec for $L$ is actually truncated for $\Gamma'$ even though it is not truncated for $\Gamma$, which means that the fault path restricted to the correctly truncated exRec for $\Gamma'$ might be smaller and potentially benign. That's OK, though. That just means that we will overestimate the probability of error by including some fault paths that don't cause a problem.
Now let us upper bound the total probability that every exRec in $T$ is malignant. Any such fault path $\Gamma'$ must contain a fault path $\Omega$ constructed as in equation (14.43). The sum over $p_{\Gamma'}$ can therefore be bounded above by a sum $\sum_\Omega P_\Omega$, with $\Omega$ as in equation (14.43). We can sum over $\Omega$ by summing over all possible malignant fault paths $\Omega_L$ for all $L \in T$, but actually it is sufficient to just sum over minimal malignant $\Omega_L$, since taking $\Omega_L \subseteq \Omega'_L$ implies that the resulting $\Omega \subseteq \Omega'$, and therefore $\Gamma'$ is already included in the sum for $P_\Omega$. Then
$$\sum_{\Gamma'} p_{\Gamma'} \le \sum_\Omega P_\Omega \le \sum_{\{\text{minimal malignant } \Omega_L,\, L \in T\}}\ \prod_{L\in T}\ \prod_{M\in\Omega_L} p_M \qquad (14.45)$$
$$= \prod_{L\in T}\ \sum_{\text{minimal malignant } \Omega_L}\ \prod_{M\in\Omega_L} p_M \qquad (14.46)$$
$$= \prod_{L\in T} p'_L, \qquad (14.47)$$
where
$$p'_L = \sum_{\text{minimal malignant } \Omega_L}\ \prod_{M\in\Omega_L} p_M. \qquad (14.48)$$
This differs from the desired result in just one respect, although the notation does not make the difference completely evident. The desired result sums over fault paths which are malignant for the untruncated exRec corresponding to $L$, whereas equation (14.48) sums over a truncated exRec if $L$ is known to be truncated. This is not a problem, however, since any set which is malignant for the truncated exRec is also malignant for the untruncated exRec, so equation (14.48) is a tighter bound than is needed for equation (14.42).
The level reduction theorem tells us that a noisy fault-tolerant circuit is equivalent to a noisy unencoded circuit with a new, hopefully reduced error rate. We still have adversarial stochastic noise, and the simulated circuit has an error rate $p'_L$ for location $L$.
By approximating and assuming that the error rates $p$ for all locations are the same, and that any set of size $t+1$ or greater is malignant (meaning we count simply bad exRecs rather than malignant sets), and assuming for simplicity that there is no post-selection, we find
$$p'_L \le \binom{A_L}{t+1} p^{t+1}, \qquad (14.49)$$
as in section 14.5, where $A_L$ is the number of locations in the exRec for $L$. We see that if
$$\binom{A}{t+1} p^{t+1} \le p, \qquad (14.50)$$
where $A$ is the size of the largest exRec, then the error rate per location has decreased in the simulated noisy circuit compared to the error rate per location in the original error model.
14.6.3 Why We Consider Adversarial Errors
At this point, it is worth pausing briefly to understand why we needed to consider adversarial errors.
Certainly, the level reduction theorem would have worked just as well if $\widetilde{FT(C)}$ underwent independent stochastic noise rather than adversarial stochastic noise, since independent stochastic noise is a special case of adversarial stochastic noise. However, that would not change the conclusion: the noise model for the noisy simulated circuit $\tilde{C}$ would still have been an adversarial stochastic one.
There are two reasons for this. The first is because the presence of errors in adjacent exRecs is not
independent. If the second exRec is bad, then the first exRec must get truncated, which reduces the
probability of the first exRec also being bad, since a truncated exRec is smaller and has fewer locations. That
is, error probability is anti-correlated between adjacent exRecs. This is a failure of independent stochastic
noise, though perhaps not one that bothers you very much. A second failure might be more worrying: the
type of errors that occur on adjacent bad exRecs can be correlated as well. This is because the type of error
given by the procedure in section 14.4.3 is conditioned on the error syndrome register, which persists between
adjacent bad exRecs. Therefore errors on them can be entangled or otherwise potentially problematic, and
certainly are not independent. This means that, like it or not, once we deal with fault-tolerant simulations,
independent stochastic errors are not sufficient.
Figure 14.12: A portion of a 2-level concatenated fault-tolerant protocol for the concatenated 7-qubit code.
Each level 1 qubit in the top part of the figure consists of 7 level 0 (physical) qubits, each level 1 transversal
gadget consists of 7 level 0 locations, and each level 1 FT EC gadget consists of another Steane EC step
made up of level 0 locations.
Definition 14.9. Let $C_0 \equiv C$ be a circuit, and let $C_k = FT(C_{k-1})$. Consider $C_l$, the level $l$ fault-tolerant simulation of $C$. The physical locations of $C_l$ are the level 0 locations of the circuit. The gadgets simulating the physical locations of the circuit $C_k$ in the recursive definition of $C_l$ are the level $l-k$ locations and the level $l-k$ fault-tolerant error correction gadgets for $C_l$.
The logic behind these definitions is that the gadgets for $C_k$, when realized in $C_l$, are simulated via $l-k$ levels of encoding, and perform operations which act on level $l-k$ qubits.
The logical error rate decreases as the double exponential of the number of levels if $p_0 < p_T$.
Our goal for the fault-tolerant computation is for the outcome distribution to have statistical distance at most $\epsilon$ from the correct distribution for a circuit with $T$ locations. Let us therefore choose $l$ sufficiently large so that $p_l \le \epsilon/T$. That is, we should choose
$$l = \left\lceil \log\!\left(\frac{\log(T p_T/\epsilon)}{\log(p_T/p)}\right)\right\rceil. \qquad (14.58)$$
Given this, the total probability of having a faulty level $l$ location in the circuit is $P \le T p_l \le \epsilon$. If the correct outcome distribution is $D$, the true distribution of outcomes is thus $(1-P)D + PD'$ for some $D'$. The statistical distance between this and $D$ is
$$\frac{1}{2}\sum_x \big|D_x - [(1-P)D_x + PD'_x]\big| = \frac{1}{2}\sum_x P\,|D_x - D'_x| \le P \le \epsilon, \qquad (14.59)$$
as desired.
Now let us bound the circuit size overhead needed for this simulation. $C_k = FT(C_{k-1})$, so each location in $C_{k-1}$ is replaced by a fault-tolerant gadget, and between each pair of gadgets is an FT EC step. If we choose $B$ to be the total number of locations in the largest rectangle (not exRec), then we can safely bound the number of locations in $C_k$ by $B$ times the number of locations in $C_{k-1}$. (Note that $B \le A$, since every rectangle is contained in an exRec.) Therefore, the size of $C_l$ is at most $B^l T$, where $T$ is the number of locations in $C_0$.
This is an exponential cost with the number of levels of concatenation. That sounds bad, but remember, the error decreases doubly-exponentially with the number of levels. Compared to a double exponential, a single exponential is nothing. In particular, we find that the total number of locations in $C_l$ is at most
$$T B^l \le B\,T\,B^{\log[\log(T p_T/\epsilon)/\log(p_T/p)]} = O\!\left(T \left(\frac{\log(T p_T/\epsilon)}{\log(p_T/p)}\right)^{\log B}\right). \qquad (14.60)$$
$p_T$ and $B$ are constants depending on the fault-tolerant protocol. If we consider $p$ to be a constant as well, we recover the desired form for the overhead $O(\mathrm{polylog}(T/\epsilon))$.
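The bookkeeping in equations (14.58)–(14.60) is easy to mechanize. A small sketch (mine; the values of $p$, $p_T$, $B$, $T$, and $\epsilon$ are sample placeholders, and the logarithm in (14.58) is taken base 2 to match the $2^l$ error-reduction exponent):

```python
# Choose the number of concatenation levels l and bound the circuit size,
# using p_l = p_T * (p / p_T) ** (2 ** l) for the level-l error rate.
from math import ceil, log

def levels_and_overhead(p, p_T, B, T, eps):
    l = ceil(log(log(T * p_T / eps) / log(p_T / p), 2))
    p_l = p_T * (p / p_T) ** (2 ** l)
    return l, p_l, B ** l * T           # size bound B^l * T

l, p_l, size = levels_and_overhead(p=1e-6, p_T=1e-5, B=400, T=10 ** 9, eps=0.01)
print(l, p_l, size)                     # check: T * p_l <= eps
```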
If desired, you can separately count the space overhead needed by letting $B'$ be the number of qubits needed for a single gate gadget plus EC step. The formula is otherwise the same.
When $p > p_T$, the proof above suggests that the logical error rate actually increases as a double exponential as we add levels of concatenation. This happens because when the error rate is too high, the extra locations introduced by the fault-tolerant simulation offer too many new opportunities for errors, and errors occur in the fault-tolerant circuit faster than they can be corrected by the EC steps. Adding more levels of concatenation rapidly makes the problem worse.
We can think of the error rate in a concatenated fault-tolerant protocol as a flow diagram, with a vector at each point $p$ pointing towards the logical error rate of an FT circuit with physical error rate $p$. The threshold is an unstable fixed point of this flow diagram. Points below the threshold will flow towards the stable fixed point at $p = 0$, while points above the threshold flow towards the stable fixed point at $p = 1$. Essentially, the threshold marks a phase transition between a phase where long quantum computations are possible and a computationally weaker phase where quantum information can last only a short time before being destroyed by noise. The threshold is not precisely a phase transition in the conventional condensed matter sense, since this is a dynamical system with active error correction occurring frequently, but it does have many of the qualitative behaviors normally seen in a phase transition.
The proof of theorem 10.4 suggests that the threshold value is $p_T = 1/\binom{A}{2}$, but of course, that was based on the crudest method of bounding the error rate from section 14.5. We can potentially do much better by doing a careful counting of malignant sets:
Theorem 14.7. In theorem 10.4, let the family of protocols $F_l$ be $l$ levels of concatenation of the fault-tolerant protocol $F$. Then the threshold value $p_T$ can be taken to be the supremum over $p < 1$ such that
$$ p > \sum_{r=t+1}^{A} M_r p^r \tag{14.61} $$
for all exRecs, where $A$ is the number of locations in an exRec and $M_r$ is the number of minimal malignant sets of $r$ locations contained in the exRec. The values $A$ and $M_r$ should be taken for the type of exRec that gives the smallest solution $p_T$.
The proof just duplicates the proof above for theorem 10.4, but uses the formula for the probability of an incorrect exRec in terms of malignant sets instead of counting all sets of $t+1$ locations. Solutions to equation (14.61) are fixed points in the flow diagram for a particular type of exRec, and the smallest positive solution is an unstable fixed point, as discussed above. We take the type of exRec that gives the smallest solution because then for all other exRecs, the error rate is definitely below the unstable fixed point, so the errors will flow towards 0. We know there always is at least one unstable fixed point between 0 and 1 provided we use a QECC that corrects at least one error, because if we count all sets of size $t + 1$ as malignant, as in the simplified formula, we are overestimating the error rate, and yet there always is a fixed point solution for the simplified formula.
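To illustrate, here is a small Python sketch that finds the unstable fixed point of equation (14.61) by bisection; the malignant-set counts $M_r$ used below are invented for illustration, not actual counts for any real exRec.

```python
def unstable_fixed_point(M, lo=1e-12, hi=0.5, iters=100):
    """Bisect for the smallest p > 0 solving p = sum_r M_r p^r
    (equation 14.61); assumes f changes sign exactly once on (lo, hi)."""
    f = lambda p: sum(Mr * p**r for r, Mr in M.items()) - p
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:     # error rate still shrinking: fixed point is above
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical minimal malignant-set counts for a t = 1 protocol.
M = {2: 269745, 3: 5.0e7}
print(unstable_fixed_point(M))   # roughly 3.7e-6
```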
the like are frequently referred to as macroscopic quantum states, but a fault-tolerant quantum computer is
even more quantum.
When first presented with the idea of a large quantum computer, many physicists and computer scientists
have the reaction “there must be something wrong with that.” Most often the first candidate for “something
wrong” is that errors in the implementation will get in the way. The threshold theorem answers that
objection. When confronted with it, some quantum computation skeptics reluctantly accept that quantum
computers are a theoretically viable possibility. Others turn to the hope that quantum mechanics will
somehow turn out to be wrong in a way that causes quantum computation to fail, or to attempts to find
holes in the theory of fault tolerance. In any case, it is not possible to sensibly discuss the viability of
quantum computation without understanding the threshold theorem.
Definition 14.10. Suppose you have a family Fl of fault-tolerant protocols, such that, if p < pT , arbitrarily
long quantum computations are possible using an appropriate member of Fl with only polylogarithmic
overhead, as in theorem 10.4. The largest value of pT for which this statement is true is the threshold for Fl .
Given a QECC Q, consider all possible fault-tolerant protocols F using Q, and the families Fl formed by
concatenating F l times. The best achievable threshold for such a family Fl is also known as the threshold
for Q. The threshold for fault-tolerant quantum computation, or just the threshold for short, is the supremum
over all codes Q of the threshold for Q.
Thus, the threshold for fault-tolerant quantum computation is independent of a specific choice of code or
fault-tolerant protocol. By considering specific protocols or specific codes and finding a value pT such that
the threshold condition holds, all we can ever achieve is to find a lower bound on the threshold. No matter
how clever you are in coming up with a code or fault-tolerant protocol, it may be that there is a better code
or protocol out there yet to be discovered. By the same token, when we perform a calculation for a specific
code with a specific fault-tolerant protocol, that only sets a lower bound on the threshold for that code. It
may be that a more clever protocol will achieve a higher threshold. Furthermore, the techniques discussed
in this chapter for proving the existence of thresholds also only give you lower bounds. It may be that you
can improve (i.e. raise) the lower bound using a more careful analysis of the same family of protocols, for
instance by counting malignant sets instead of just all sets of t + 1 locations in an exRec.
It is also possible to set upper bounds on the threshold, but probably the existing upper bounds are not
very tight. I’ll discuss upper bounds on the threshold in section 16.4.
Strictly speaking, the threshold also depends on the nature of the error model and allowed circuit prop-
erties. By default, when I speak of a threshold without any additional qualification, I will usually mean the
threshold for a basic model of fault tolerance, using the worst possible error model consistent with a single
error rate p = pP = pG = pS = pM . We usually set lower bounds on this by using the adversarial stochastic
noise model. When I want to talk about the threshold for another error model or using other assumptions
about the circuits allowed for the protocol, I will qualify the statement and talk about, for instance, the
threshold for depolarizing noise, or the threshold in two dimensions. In the midst of an extended discussion
of a threshold with some such qualification, I may talk about just the “threshold,” but hopefully it will be
clear from context that I haven’t changed topics to discuss the threshold for a basic model of fault tolerance
instead.
Based on the calculations in section 14.5, we can immediately calculate a lower bound on the threshold
Levels   Max. Length    Overhead (Loc.)
0        3.7 × 10^3     1
1        3.7 × 10^4     492
2        3.7 × 10^6     242,064
3        3.7 × 10^10    1.2 × 10^8
4        3.7 × 10^18    5.9 × 10^10
5        3.7 × 10^34    2.9 × 10^13

Table 14.1: The circuit size overhead and maximum length of a computation given a certain number of levels of concatenation of the 7-qubit code. These calculations use $p_T = 2.7 \times 10^{-5}$, $p/p_T = 0.1$, $\epsilon = 0.01$.
for the 7-qubit code. The CNOT exRec is the biggest exRec for the protocol we were using, and we saw that
$$ \mathrm{Prob(CNOT\ exRec\ bad)} \le \frac{269745\, p^2}{(1 - 63p)^8}. \tag{14.62} $$
(The $\pi/8$ exRec has a different, possibly worse, post-selection formula for the denominator, but a lower constant in the numerator, so comes out to be a less stringent bound.) We want to know the value of $p$ such that the RHS is equal to $p$. Since $Cp^2/(1 - 63p)^8 \ge Cp^2$, we know the fixed point occurs for $p < 1/C < 10^{-5}$. Therefore, we can bound the fixed point:
$$ p = \frac{269745\, p^2}{(1 - 63p)^8} < 272000\, p^2, \tag{14.63} $$
so $p_T > 3.6 \times 10^{-6}$. Of course, this is just a very rough bound. By doing some small additional optimizations on the circuits and more importantly by counting malignant sets carefully, we find the threshold for the 7-qubit code is at least $p_T \ge 2.7 \times 10^{-5}$, almost an order of magnitude better.
However, the 7-qubit code doesn’t have the best threshold. The best known lower bound on the threshold
as of this writing comes from a family of protocols due to Knill which uses concatenated error-detecting codes
to do extensive verification of ancillas before they are used in the main part of the computation. I will discuss
this protocol a bit more in section 16.1. The proven threshold for Knill's scheme is above $10^{-3}$, one error per one thousand locations, but simulations suggest the actual threshold for Knill's scheme is much higher: in a basic model of fault tolerance with depolarizing noise and CNOT gate error rate $p_{\mathrm{CNOT}}$ slightly higher than the error rates $p_G$, $p_P$, $p_M$ for one-qubit gates, state preparation, and measurement, the threshold is at
least 3% and probably higher, perhaps as high as 5%. Knill’s protocol is not very practical, as the overhead
is ridiculously large (a factor of billions or more at error rates above 1%). A much more practical choice is
the surface code, which we will discuss in chapter 19, and simulations suggest that the threshold for surface
codes is about 0.7%.
thousand locations) without error correction. However, remember that this is the number of locations in the circuit, not necessarily gates. A calculation lasting 100 time steps on 40 qubits will be too many locations (although the failure rate $\epsilon$ is still only about 0.011). With 2 levels of concatenation, we can do much longer computations, containing over a million locations. Three levels allow more than 30 billion locations, which should handle all but the most demanding computations, and four levels of concatenation should be enough for any computation we can seriously imagine doing. However, by the time we've reached 3 levels of concatenation, the number of physical locations needed for each logical location is over 100 million, so a long computation with $10^9$ logical locations requires a daunting $10^{17}$ physical locations.
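The numbers in Table 14.1 can be reproduced with a few lines of Python, assuming $p_l = p_T (p/p_T)^{2^l}$, maximum length $\epsilon/p_l$, and per-location overhead $B^l$ with $B = 492$ read off from the table's level-1 row:

```python
# Reproducing Table 14.1 under the assumptions stated in its caption.
p_T, ratio, eps, B = 2.7e-5, 0.1, 0.01, 492

for l in range(6):
    p_l = p_T * ratio ** (2 ** l)          # logical error rate at level l
    print(l, f"{eps / p_l:9.1e}", f"{B ** l:9.1e}")
```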
However, bear in mind the threshold theorem is primarily a feasibility result. It says that long reliable computations are possible in principle, and only provides a lower bound on the performance of fault-tolerant protocols. In practice, we now know fault-tolerant protocols (for instance based on the surface code, chapter 19) which have much lower overheads in addition to much higher thresholds.
$$\begin{aligned} p_{G,l} &= \frac{1}{(1 - Q_l)^4} \left( 29403\, p_{G,l-1}^2 + 31104\, p_{G,l-1}\, p_{\mathrm{CNOT},l-1} + 8128\, p_{\mathrm{CNOT},l-1}^2 \right) \\ p_{\mathrm{CNOT},l} &= \frac{1}{(1 - Q_l)^8} \left( 111156\, p_{G,l-1}^2 + 124136\, p_{G,l-1}\, p_{\mathrm{CNOT},l-1} + 34453\, p_{\mathrm{CNOT},l-1}^2 \right) \\ Q_l &= 38\, p_{G,l-1} + 25\, p_{\mathrm{CNOT},l-1}. \end{aligned} \tag{14.64}$$
If the physical error rates are $(p_{G,0}, p_{\mathrm{CNOT},0})$, using these recurrence relations, we can track the pair of error rates as a function of $l$. For some starting values, the error rates will eventually decrease and both head towards 0. For other starting values, they will both head towards 1. The boundary between these two regions is the threshold surface, and is plotted in figure 14.13.
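Here is a short Python sketch that iterates the recurrences (14.64) to classify a starting point as inside or outside the threshold surface; the cutoffs used to declare convergence or divergence are arbitrary choices for illustration.

```python
def step(pG, pC):
    """One level of concatenation via the recurrences in equation (14.64)."""
    Q = 38 * pG + 25 * pC
    pG_next = (29403 * pG**2 + 31104 * pG * pC + 8128 * pC**2) / (1 - Q) ** 4
    pC_next = (111156 * pG**2 + 124136 * pG * pC + 34453 * pC**2) / (1 - Q) ** 8
    return pG_next, pC_next

def inside_surface(pG, pC, levels=30):
    """Heuristically classify a starting point by iterating the map."""
    for _ in range(levels):
        if 38 * pG + 25 * pC >= 1:   # formulas no longer meaningful; diverged
            return False
        pG, pC = step(pG, pC)
    return max(pG, pC) < 1e-30       # collapsed towards the fixed point at 0

print(inside_surface(5e-6, 5e-6))     # True: flows to 0
print(inside_surface(0.0, 2.88e-5))   # False: the pseudothreshold point
```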
You might think that, since the CNOT exRec is the largest, given the physical error rates $(p_{G,0}, p_{\mathrm{CNOT},0})$, it is sufficient to check that $p_{\mathrm{CNOT},1} < p_{\mathrm{CNOT},0}$ to verify that we are inside the threshold surface. That is not the case, however.
To see what can go wrong, imagine an extreme case where $p_{G,0} = 0$. Then we have that
$$ p_{\mathrm{CNOT},1} = \frac{34453\, p_{\mathrm{CNOT},0}^2}{(1 - 25\, p_{\mathrm{CNOT},0})^8}, \tag{14.65} $$
[Figure: flow diagram in the $(p_{\mathrm{CNOT}}, p_G)$ plane; the $p_G$ axis runs to about $2 \times 10^{-5}$ and the $p_{\mathrm{CNOT}}$ axis to about $3 \times 10^{-5}$.]
Figure 14.13: Plot of the threshold surface for Clifford gates and CNOT gates for the 7-qubit code derived using equation (14.64). The arrows indicate the effect of the recursion relations at various points inside and outside the threshold surface.
so $p_{\mathrm{CNOT},1} < p_{\mathrm{CNOT},0}$ when $p_{\mathrm{CNOT},0} < 2.88 \times 10^{-5}$. But $(0, 2.88 \times 10^{-5})$ is actually outside the threshold surface! $2.88 \times 10^{-5}$ is a CNOT pseudothreshold, not an actual threshold.
The problem is that, while the CNOT error rate goes down from level 0 to level 1, the single-qubit error rate increases from 0 to something significant:
$$ p_{G,1} = \frac{8128\, p_{\mathrm{CNOT},0}^2}{(1 - 25\, p_{\mathrm{CNOT},0})^4} \approx 6.76 \times 10^{-6}. \tag{14.66} $$
When we calculate the level 2 error rates, we must take this into account. We find
$$ p_{G,2} = 1.41 \times 10^{-5} \tag{14.67} $$
$$ p_{\mathrm{CNOT},2} = 5.77 \times 10^{-5}. \tag{14.68} $$
As you see, both error rates increase going from level 1 to level 2. The recurrence relations are monotonic in the error rates, so once all error rates increase when adding a level, they will continue to get worse and worse as we add more and more levels. In particular, we know that $(0, 2.88 \times 10^{-5})$ is outside the threshold surface. Or at least, it is outside the surface calculated using these particular formulas. It is quite likely that a more careful calculation counting malignant sets would give a different result.
The point, however, is this: When dealing separately with error rates for different types of locations, it is possible that some error rates will go up and some will go down when you add a level of concatenation. When this happens, you need to look at more levels to figure out whether you are inside or outside the threshold surface. Only once all error rates are moving in the same direction can you reach a firm conclusion. Even for fault-tolerant protocols not based on concatenation, it is quite common for the protocol to improve the logical CNOT error rate but make single-qubit error rates worse, for example, and it is not immediately clear whether this constitutes an actual improvement without a more careful analysis.
Because it is difficult to plot and completely study threshold surfaces, frequently a further simplification of error rates is made. For instance, a common choice is to assume that storage errors are less likely than other kinds of errors, so perhaps one assumes that $p_P = p_G = p_M = p$, and that $p_S = p/10$. In this case, the threshold once again becomes a single number, expressed as a bound on $p$. The other considerations of this section apply in full. In particular, you cannot assume that the level 1 error rates satisfy the same relationship as the level 0 physical error rates!
Chapter 15
OK, we’ve proven the threshold theorem. This shows that arbitrarily large quantum computers can in
principle be built. There is obviously a lot of theoretical and experimental work left to do, optimizing the
value of the threshold, reducing the overhead needed, and of course actually building a device with low
enough error rates and sufficiently many qubits to run a fault-tolerant protocol. But at least we can rest
easy that the main question of principle has been resolved, and that if society is willing to devote enough
time and money to the task, someday we will be able to build large quantum computers.
Not so fast! The threshold theorem we proved was for a basic model of fault tolerance, which is a theoretically tractable construct that is simplified in various ways to make it easier to analyze than a real system. Many of the assumptions built into the basic model will be violated in a real system. (Although exactly which assumptions are violated may vary considerably depending on the physics of the implementation.)
In this chapter, I’ll discuss the assumptions which may be violated, and examine which of them are
truly necessary for a threshold theorem to hold. As you’ll see, we can substantially relax the assumptions
of the basic model and still have a threshold. We do need a few assumptions, some of which are violated
in some specific systems, but the prerequisites for a threshold are loose enough that it seems likely that a
fault-tolerant quantum computer really ought to be possible, provided we use the right type of system to
build it.
Suppose the physical gates available in the hardware are $\{U_i\}$, while the gates needed for the fault-tolerant protocol are $\{V_j\}$. Let us first consider the case where we can exactly realize the $V_j$s in terms of the $U_i$s. We can then write
$$ V_j = \prod_{i=1}^{m_j} U_{a_{ij}} \tag{15.1} $$
for some sequence $\{a_{ij} \mid i = 1, \dots, m_j\}$ of length $m_j$. The gate $U_i$ has an error rate $p_{U_i}$. By the union bound, the error rate for $V_j$ is therefore
$$ p_{V_j} \le \sum_{i=1}^{m_j} p_{U_{a_{ij}}}. \tag{15.2} $$
This is true even for adversarial stochastic errors. In the special case where $p_{U_i} = p$, $p_{V_j} \le m_j p$. Thus, there is a blow-up in the error rate for the gates used in the fault-tolerant protocol. The value of the threshold could therefore decrease by a factor of as much as $\max_j m_j$. That is undesirable, particularly if $m_j$ can be large, but at least it leaves the threshold theorem in place.
When the $V_j$s cannot be written exactly as a product of $U_i$s, or even if it is just very costly to do so, we instead should approximate the $V_j$s. We again get equations like equation (15.1), but now instead of having $V_j$ on the left, we have $V'_j$, with $\|V'_j - V_j\| < \delta$. $\delta$ then acts like an additional error rate on top of the $p_{V_j}$ that we calculated above. If $\delta$ is a constant as a function of the size of the computation, which can be achieved with a constant value for each $m_j$, the total error rate remains constant. This is important, because we don't want the per-gate error rates to increase as the computation gets bigger. If the error rate stays the same, we can bundle the gate approximation error in with all the other sources of error and correct them together with the same QECC.
There is a catch, however: we have left the basic model of fault tolerance. The difference between $V'_j$ and $V_j$ cannot be written in terms of a stochastic error model, not even an adversarial one, unless $p = 1$: Every time we try to do $V_j$, we will instead do the unitary $V'_j$ or have an error in one of the $U_i$ used to produce $V'_j$. However, since $V_j$ and $V'_j$ are close to each other, the operation is not very wrong, so a 100% error rate would be a horrible overestimate. The correct way to handle this type of error is as a coherent error, which we will treat in section 15.8. The overall result is that, as in the exact case, the threshold still exists, but is lower.
Now for the logical gates. A fault-tolerant protocol includes gadgets for a universal set of logical gates,
but frequently, as in the example of the 7-qubit code or other stabilizer codes, it is a finite set. The basic
model assumes that the logical circuit we are trying to simulate has already been written in terms of the
logical gates used. If it’s not, we need to convert the logical circuit to use the same universal set that
is produced by the fault-tolerant protocol, and again there is an extra overhead to doing so. Does that
undermine the threshold?
If we can exactly realize the gates in the simulated circuit C using the gates with fault-tolerant gadgets,
there is no problem. If it takes up to m fault-tolerant gates to write each circuit gate, the total number of
locations T could increase to mT . Since the number of levels l of concatenation needed scales as log log T ,
this means that l stays the same or at most increases by 1 unless m is particularly large. This increases the
overhead slightly, but doesn’t fundamentally alter the theorem.
If the gates in $C$ can only be approximated using the gates for which we have fault-tolerant gadgets, more care is needed. Each time we approximate a gate, that is another source of error. Since this is an approximation of logical gates, the error is a logical error which cannot easily be dealt with via more error correction. Instead, we need to make sure that when we approximate gates, the approximation is good enough that over the course of the whole computation, the total error is still manageable.
To be precise, the original circuit $C$ has $T$ locations, and our goal is to get an output state which has statistical distance at most $\epsilon$ from the correct distribution $D$. We approximate each gate in $C$ to accuracy $\delta$ (in 1-norm) using at most $m_\delta$ fault-tolerant gadgets, so we have a fault-tolerant circuit with $m_\delta T$ logical locations. Greater accuracy might require a larger number $m_\delta$ of fault-tolerant gadgets, which in turn contributes to a greater logical error rate, so we must choose $\delta$ to balance these concerns. (See appendix ?? for a discussion of distance measures. Note that the choices of statistical distance and 1-norm don't really matter here — the same type of arguments apply to any reasonable choice of distances assuming all the gate sets involved act on a constant number of logical qubits.)
Each of the $m_\delta T$ locations has a logical error rate $p_l$, as in the proof of theorem 10.4, and there is an additional source of error for each of the $T$ locations in $C$ because our approximation is imperfect. With probability at most $p_l m_\delta T$, there is a logical error in one or more of the fault-tolerant gadgets in the simulation, giving us an incorrect distribution $D'$. Otherwise, we get the distribution $D''$ produced by performing $C$ but with the approximate gates instead of the true gates. Each approximate gate is at most $\delta$ away from the true gate, so the total distance of the final state of the approximate circuit is at most $\delta T$. Therefore, the output distribution $p_l m_\delta T D' + (1 - p_l m_\delta T) D''$ has statistical distance at most $p_l m_\delta T + \delta T$ from the ideal distribution $D$. In order for this to be below the target error rate $\epsilon$, it is sufficient to pick $l$ and $\delta$ such that $\delta < \epsilon/(2T)$ and $p_l < \epsilon/(2 m_\delta T)$.
In particular, we need an approximation that improves linearly with the length of the computation we are doing. A better approximation generally needs more gates, so $m_\delta$ must increase with $T$. That increases the overhead in two ways: First, directly, since the number of fault-tolerant gadgets used is proportional to $m_\delta$, and second, indirectly, by setting a more stringent requirement on $p_l$. Referring to theorem 10.4, we see that the overhead acquires an extra factor of $m_\delta\, \mathrm{polylog}(m_\delta)$. We therefore need to be careful of how $m_\delta$ scales with $T$, since a poor scaling rate could completely undermine the threshold theorem.
and $m = O(\mathrm{polylog}(1/\delta))$. Furthermore, a sequence $a_1, \dots, a_m$ satisfying these conditions can be found via a classical algorithm running for time $O(\mathrm{polylog}(1/\delta))$.
The Solovay-Kitaev theorem says that all universal sets of gates are equivalent up to polylogarithmic overhead. In it, $D$, the Hilbert space dimension, is fixed. Most of the time, we are interested in $D = 2^2$ or $D = 2^3$ (when the universal set of gates includes two- or three-qubit gates, respectively), so taking $D$ constant is perfectly reasonable. If $D$ starts to get larger, $D = 2^n$, the scaling is not as good: $m$ increases exponentially with $n$. This is inevitable: For fixed $\epsilon$, the number of unitaries in $SU(2^n)$ which are distance $\epsilon$ from each other is exponential in $D$, so $\mathrm{poly}(D)$ gates ($= \exp O(n)$ gates) from a finite set will certainly be needed to cover all of them.
Of course, for many specific $S$ and $U$, even better approximations are possible. For instance, for the gate set that shows up most often in fault-tolerant protocols, a tighter result is possible:
Theorem 15.2. Let $S = \hat{C}_1 \cup \{R_{\pi/8}\}$ and $D = 2$. Then for any $U \in SU(2)$, $\delta > 0$, there exists a circuit of size $O(\log 1/\delta)$ realizing $U'$ such that $\|U' - U\| < \delta$. Furthermore, the circuit can be found classically in time $O(\mathrm{polylog}(1/\delta))$.
That is, any single-qubit gate can be approximated to accuracy $\delta$ using just $O(\log 1/\delta)$ Clifford gates and $R_{\pi/8}$ gates. This is the optimal asymptotic scaling possible, which can be shown by a counting argument: You need at least this many gates so that every point in $SU(D)$ is in a ball of radius $\delta$ around the exactly achievable gates. The gate set $S$ was chosen because of its convenience for fault tolerance, but the fact that this optimal asymptotic approximation is possible is an extremely nice additional property of $S$ that is not shared by all approximately universal gate sets.
In order to approximate a 2- or 3-qubit gate $U$ using Clifford gates and $R_{\pi/8}$ gates, we can first write $U$ exactly as a product of a constant number $m$ of CNOT gates and single-qubit gates, since the set {CNOT, single-qubit unitaries} is an exactly universal gate set. Then we can approximate the single-qubit gates to accuracy $\delta/m$ using theorem 15.2, giving us an $O(\log 1/\delta)$-size circuit approximating $U$ to accuracy $\delta$. For multi-qubit gates, this strategy doesn't give the optimal approximation, but the scaling in $\delta$ is still better than that provided by the Solovay-Kitaev theorem.
As a consequence of theorems 15.1 and 15.2, the threshold theorem does not need to be qualitatively changed when the circuit $C$ involves gates outside of the universal set used for fault tolerance. Since $m_\delta = O(\mathrm{polylog}(1/\delta)) = O(\mathrm{polylog}(T/\epsilon))$, the overhead due to the Solovay-Kitaev theorem is comparable to the overhead from the threshold theorem. Combining them, we simply get a higher degree polynomial in the logarithm. This means that once $T$ is very large, making it even larger does not substantially increase the overhead. As before, though, nice asymptotic behavior does not mean it isn't painful dealing with the constant factors. Even mediocre approximations using the Solovay-Kitaev theorem can take many gates, so it is very helpful to develop shortcuts for the specific gates used in $C$.
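As a rough illustration of the cost, the following sketch counts logical locations when each gate is approximated to accuracy $\delta = \epsilon/(2T)$ using $m_\delta = c \log_2(1/\delta)$ gates, per the scaling of theorem 15.2; the constant $c$ is invented for illustration.

```python
import math

def logical_locations(T, eps, c=4.0):
    """Total logical locations after approximating each of T gates to
    accuracy delta = eps/(2T) with m_delta = c * log2(1/delta) gates."""
    delta = eps / (2 * T)
    m_delta = c * math.log2(1 / delta)
    return m_delta * T

for T in (1e3, 1e6, 1e9):
    print(f"T = {T:.0e}: about {logical_locations(T, eps=0.01):.2e} locations")
```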
Proposition 15.3. For any circuit $C$, there exists a circuit $C'$ with the same output as $C$ such that $C'$ uses gates only between nearest-neighbor qubits when the qubits are arranged on some sort of regular $D$-dimensional structure (most often a square grid). When $C$ has $n$ qubits and $T$ locations, the number of locations in $C'$ is at most $O(n^3 T)$.
Proof. We can build $C'$ by replacing each gate in $C$ by a subroutine. Whenever $C$ calls for a two-qubit gate $U$ between qubits $i$ and $j$, the subroutine begins with a series of SWAP gates that move qubit $j$ to a location next to $i$ and then finishes by performing $U$ between $i$ and the new location of $j$. If there are $n$ qubits in the computer, $C'$ has at most $n - 1$ times as many gates as $C$ (even in 1 dimension, $n - 2$ SWAPs suffice to move a qubit from one end of the line to a spot next to the far end, plus the gate $U$). However, the procedure could potentially spoil the parallelism of $C$. It may not be possible to simultaneously move all of the qubits that need to interact in a given time step to where they need to go. Therefore, the depth of $C$ could blow up by a factor of up to $(n/2)(n - 1) = O(n^2)$. That would cause the total number of locations in $C$ to increase by a factor of $O(n^3)$.
When we want to make a fault-tolerant circuit using only nearest-neighbor gates, the first thing to do is make sure the logical circuit uses only nearest-neighbor gates, as in proposition 15.3. SWAP is a Clifford group gate and has an obvious transversal implementation for any code (not only a stabilizer code) encoding one qubit per block: Swap all the qubits of the first block with all the qubits of the second block. That is, the transversal SWAP always performs the logical SWAP as well.
However, this is not sufficient. The fault-tolerant gadget constructions from prior chapters might (and generally do) involve gates between qubits which are not nearest neighbors. That's not obviously a problem. Perhaps we can take the circuit for a fault-tolerant gadget and simply convert it into one that uses only nearest-neighbor gates with proposition 15.3. Indeed, this gives us a gadget with the same purpose as the original one.
Figure 15.1: Two possible arrangements of qubits that use nearest-neighbor gates and ancillas (A) to perform
a SWAP which is safe for fault tolerance between data qubits i and j. The arrows indicate a physical SWAP
between the two indicated positions in numerical order.
However, it is not generally a fault-tolerant gadget. The problem is that the SWAP gates used to move qubits around are not transversal ones. If a SWAP location has a fault, it could potentially cause errors in two qubits in the same block, meaning the gadget will probably violate the relevant FT conditions.
Normally, there is a second problem with non-transversal gates, namely error propagation, but it is not an issue for SWAP gates. An ideal SWAP gate interacting qubits $i$ and $j$ does indeed cause errors which begin in qubit $i$ to propagate to qubit $j$, but in the process, those errors depart qubit $i$. That is, SWAP not only swaps the data stored in two qubits, it also swaps the errors:
[Circuit identity (15.4): an error entering one input of a SWAP gate exits on the other output, so the SWAP simply relocates it.]
This holds for SWAP, whereas the corresponding statement is not true of the CNOT gate.
Using this fact, we can come up with a non-transversal but "safe" implementation of the SWAP gate that can be used to move qubits in a fault-tolerant gadget in such a way that the resulting gadget is also fault-tolerant. We discussed this before, in chapter 11, and the circuit is pictured in figure 11.2. It uses one ancilla qubit $A$ and proceeds via 3 physical SWAP gates. To swap data qubits $i$ and $j$, first SWAP $i$ with $A$. Then SWAP $A$ (in its new location) with $j$. Finally SWAP $A$ with $i$ again. Then $j$ ends up in the location originally containing $i$ and $i$ ends up in the location where $j$ began. The ancilla qubit $A$ ends up back where it started. Note that the value of the ancilla is irrelevant; the circuit works the same way regardless of the state of the ancilla. Each of the three SWAP gates interacts $A$ with one of the data qubits, so a single fault during this circuit can create correlated errors in $A$ and one of the two data qubits. But since ideal SWAPs only move errors around and cannot change the weight of a pre-existing error, the data error caused by a single fault cannot spread to the other data qubit by the end of the SWAP circuit. Of course, if there are two faults during the circuit, that could cause both qubits to fail, but two faults can always cause two qubits to fail, so that won't violate the definition of fault tolerance.
In order to do this SWAP circuit using only nearest-neighbor gates, we need the two data qubits and
the ancilla qubit to be arranged in a triangle, as in figure 15.1a. If the geometry of qubits in the computer
is a square lattice, or some other arrangement that lacks triangles, we can still do it by adding one or more
extra ancilla qubits, as in figure 15.1b. This necessitates one or more extra SWAPs to move the qubits where
they’re needed, but doesn’t significantly change the analysis.
Another option for moving qubits around is to generate entangled ancilla pairs $|00\rangle + |11\rangle$ between the two locations and then teleport qubits around or do remote CNOT gates. Doing this requires a sequence of entanglement swapping operations in between, and over longer distances, entanglement distillation (section 17.1) will be needed as well to keep the error rate in the entangled pairs under control. Thus, there will be significant extra space overhead from this option, but most of the work can be done in parallel, meaning a minimal time overhead. Qubit routing via entangled pairs is a good option when the cost of extra qubits is not too high and fast measurement and classical communication are available.
15.2.2 Architecture of a Local Fault-Tolerant Quantum Computer
How do we use this idea to create a fault-tolerant quantum computer using only local gates? Since the
number of locations in the local version of a circuit can depend on the number of qubits involved in the
circuit, we need to be a bit careful to make sure that the overhead and error rates of a local fault-tolerant
circuit don’t blow up as the ideal circuit gets very large.
The main variable to keep in mind is the spatial dimension of the computer. In order to take advantage
of the fault-tolerance-safe SWAP circuit in figure 11.2, we need to have qubits arranged in at least two
dimensions. This can certainly be satisfied in a computer with qubits arrayed in a two-dimensional lattice,
but it is also enough to have a one-dimensional system with two parallel lines of qubits. Ideally the two lines would be offset by half a lattice site in order to get the triangles as in figure 15.1a.
The next consideration is the QECC used. In the case of a concatenated code, we can consider just a single level of the code and find a local fault-tolerant protocol for that. First we convert the logical circuit $C$ which we wish to implement to one which involves only local gates arranged in the correct number of dimensions. Then we store each block of the QECC with all the ancillas needed to perform a complete set of fault-tolerant gadgets. This includes all the qubits needed to prepare any ancillas. If we attempt ancilla preparation many times in parallel and post-select to keep only one attempt, we also store locally all the qubits needed for all the simultaneous preparation attempts. The total number of ancilla qubits depends on the protocol, but importantly, it does not depend on the size of the logical circuit $C$. Therefore, the total number of qubits needed for all possible gadgets involving two adjacent logical qubits is a constant $N$. In order to perform one of these gadgets, the maximum distance any qubit needs to travel is $N$ (in one dimension) or about $\sqrt{N}$ (in two dimensions). We can use safe SWAP gates to convert the usual set of fault-tolerant gadgets into fault-tolerant gadgets using only nearest-neighbor gates. The maximum size of any gadget is polynomial in $N$, independent of the size of $C$.
By concatenating the local protocol for a single level of the code, we get another local protocol for multiple levels. The overall effect in two dimensions is pictured in figure 15.2. The architecture of the computer becomes self-similar. Local blocks of some size $R$ store a single block of the QECC with one level of encoding and all the ancillas needed for level one fault-tolerant gadgets. Within a larger region of size $R^2$, some of the local blocks combine to form a qubit encoded with two levels of concatenation, while others combine to form the ancillas needed for level two fault-tolerant gadgets. Then a region of size $R^3$ contains everything necessary for gadgets with three levels of concatenation, and so on.
When applied to a two-dimensional computer, the concatenated protocol remains two-dimensional. Since the one-dimensional protocol required two lines of qubits, straightforward concatenation with $l$ levels produces a linear computer which is $2^l$ qubits thick. With a little thought, we can reduce this to a line two qubits thick. One way to do so is to imagine that instead of nearest-neighbor gates, we have next-to-nearest-neighbor gates. Then we could make a protocol which works on a single line of qubits by alternating data qubits with ancilla qubits and using the safe SWAP (figure 11.2) to move qubits around. With a series of safe SWAPs, we can add a fault-tolerant gadget for next-to-nearest-neighbor SWAP at one level of encoding. Concatenating the next-to-nearest-neighbor protocol results in a one-dimensional computer, still using just a single line of qubits. Going back to nearest-neighbor gates, we can implement the physical next-to-nearest-neighbor SWAP via SWAPs involving a second line of qubits. The resulting architecture is shown in figure 15.3.
In fact, it is possible to do even better and make a fault-tolerant quantum computer with nearest-neighbor
gates and a single line of qubits. To do so, we give up on the safe SWAP concept and simply accept that
a single faulty SWAP gate can create two errors in a block. We’ll need to be able to correct both of those
errors, which means we need to concatenate a QECC that corrects two errors per block (i.e., a distance 5
code). Other than that, there is no further complication. Concatenating such a protocol again gives you a
threshold. However, rigorously proving that this works requires the concept of gadgets with a spread, which
I’ll discuss in section 16.3.
Figure 15.2: Cartoon of the arrangement of a two-dimensional fault-tolerant quantum computer using con-
catenated codes. The small shaded rectangles mark out level-1 blocks, including the level-1 encoding and all
ancillas needed for fault-tolerant error correction, logical state preparation, logical measurement, and logical
gates. (This is a cartoon because the real number of qubits needed is much larger.) The medium-sized,
dashed, lightly-shaded rectangles indicate level-2 blocks, including the level-2 code and all necessary level-2
ancillas. The level-2 code and ancillas are made up of level-1 blocks. The large rectangle indicates a level-3
block.
Figure 15.3: A cartoon of the arrangement of a one-dimensional fault-tolerant protocol using 2 lines of qubits. The top line consists of ancillas used only for level-1 SWAP gates. In the bottom line, the small shaded rectangles indicate level-1 blocks and all relevant ancillas. The larger dashed, lightly shaded rectangle is a level-2 block.
15.2.3 Effect of Short-Range Gates on the Threshold
Because the constructions above all use concatenation to make families of fault-tolerant protocols with
nearest-neighbor gates in one or two dimensions, they all have thresholds. This shows that thresholds exist
even when only local gates are allowed. Unfortunately, the conversion to nearest-neighbor gates necessarily
adds overhead. More locations means more opportunities for faults and lower thresholds.
The concatenated 7-qubit code has been a favorite subject for threshold studies in various settings, and in particular, thresholds have been proven for optimized concatenated 7-qubit codes in both one and two dimensions. Recall that with no geometric constraint, the threshold for the 7-qubit code is provably at least $2.7 \times 10^{-5}$ when the physical error rates for all locations are equal. For the same context in two dimensions, the threshold is at least $1.1 \times 10^{-5}$. When the storage error rate is 1/10 the error rate for other locations, the threshold in two dimensions is at least $1.8 \times 10^{-5}$. In one dimension, the threshold hasn't been calculated when all error rates are equal, but when the storage error rate is 1/10 of the error rate for other locations, the threshold is at least $2 \times 10^{-6}$. These numbers are all proven lower bounds, so the actual thresholds are higher.
Naturally, other fault-tolerant protocols which use only nearest-neighbor gates are possible. Some codes
are more naturally suited to being implemented in two dimensions. (Sadly, few codes are well-suited for one
dimension.) In chapter 19, I’ll discuss surface codes, which are defined using a two-dimensional geometry and
give the best known thresholds in that context. The threshold for fault-tolerant computation with surface
codes has been largely determined via simulations, some of which report threshold estimates close to 1%.
In short, while working with nearest-neighbor gates does come with a cost to the threshold, it is a
manageable cost. In two dimensions, the threshold only seems to decrease by a factor of 2 or 3 at worst.
Working in one dimension is more costly, incurring at least an order of magnitude reduction in the threshold
even when two lines of qubits are available.
measurement at certain points along the way. A quantum computer can do anything a classical reversible
computer can do, which in turn can do anything a classical irreversible computer can do, with some slight
overhead. If the algorithm calls for a measurement followed by a classical computation, with the output of
the classical computation determining what happens next in the quantum circuit, all of that can be replicated
exactly using a quantum circuit. Any measurements can be safely delayed until the end of the computation,
with minimal change in the resources (space and time) required. Note that some alternatives to the circuit
model, most notably measurement-based quantum computation (see chapter 20), do require measurements
during the computation in order to make up for a lack of universal unitary quantum control in the system.
Once we have universal gate sets, however, measurement is only needed at the end of the computation.
Even then, it is really only needed because we insist the outcome of the computation be transformed into a
classical value so that we can tell our friends, write papers about it, or simply store it in our limited classical
brains.
We'd like to follow the same strategy for a fault-tolerant circuit: replace classical computations with quantum ones. However, there's a complication. In the basic model of fault tolerance, classical gates are perfect whereas quantum ones are not. We need to make important decisions that can affect the logical state of the computer based on the outcome of the classical computations, and a single mistake can ruin the data. Therefore, to be safe, we must replace classical circuits with fault-tolerant quantum circuits.
But that seems circular. In order to have a fault-tolerant protocol without measurements, we must replace
the measurement locations and subsequent classical processing in all gadgets with fault-tolerant quantum
circuits, but general fault-tolerant quantum circuits require FT EC gadgets, state preparation gadgets, and
magic state injection protocols, which in turn use measurements, which must be replaced . . . . The way out
of the loop is to notice that we don’t replace classical processing with a completely general quantum circuit,
only with a quantum circuit to manipulate classical data. In particular, we don’t care about phase errors on
the classical data, so it’s enough to use a classical error-correcting code and fault-tolerant protocol for this
part of the circuit.
The natural fault-tolerant classical protocol uses the repetition code. It is simple yet effective: Perform three (or more) copies of the computation and periodically compare the current states of the copies, taking the majority for any bit where they disagree. All classical gates can be performed transversally on the repetition code, so we have a classically universal gate set. Fault-tolerant classical error correction is a bit more complicated, but still much simpler than fault-tolerant quantum error correction. We don't have to worry about phase errors, and bit flip errors can only propagate from control to target in a CNOT, so we don't have to use cat states or any other special structured ancillas. We do need to be a little bit careful about Toffoli gates or the like; while errors can't propagate from the target into the two control bits, a fault in the gate itself could produce errors in both control bits, so we should make sure that either the two control bits are in separate blocks of a repetition code or copy one of the control bits to an additional ancilla qubit. We don't even need to worry about uncomputing any scratch qubits, since the usual quantum concern is that unerased scratch qubits will decohere the quantum computer, and we don't care about decohering the classical bits. The bottom line is we only need to bring in a few ancilla bits, calculate the error syndrome in them, and then correct the state. You may have worked out the details as exercise ??.
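A toy Python version of this classical scheme, with three copies and a majority-vote correction step, might look as follows (the data and fault are arbitrary examples):

```python
def majority_correct(copies):
    """Restore every bit of every copy to the majority across the copies."""
    for b in range(len(copies[0])):
        maj = int(sum(c[b] for c in copies) >= 2)
        for c in copies:
            c[b] = maj

copies = [[0, 1, 1, 0] for _ in range(3)]  # 3-fold repetition of the data 0110
for c in copies:                            # a transversal NOT on bit 0
    c[0] ^= 1
copies[1][2] ^= 1                           # a single fault in one copy
majority_correct(copies)
print(copies)   # all copies agree again: three copies of [1, 1, 1, 0]
```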
Depending on the fault-tolerant measurement gadget we are replacing, there may be a delicate moment
converting from one classical code encoding the measurement outcomes to the repetition code used for
classical fault tolerance. The other potentially tricky step is when we want to apply the classical result to
determine something in the quantum circuit.
In Shor EC or Shor measurement, we repeat each one-bit measurement multiple times to make it reliable.
This is already a repetition code, which is a good start. We can perform these protocols as normal, except
that whenever the standard circuit calls for a measurement location, we do nothing except to start considering
that bit as a classical bit. For Shor measurement, we then get a repetition code encoding the measurement outcome and can proceed using a regular classical fault-tolerant circuit. In Shor EC, however, the standard procedure (described in section 12.2.2) determines the consensus syndrome in a way that depends on the order of the measurement outcomes rather than the simple majority. Computing the consensus syndrome requires non-transversal gates. The way to do this safely is to introduce $rs$ extra bits (plus any needed scratch bits) and use them to compute the $r$-bit syndrome $s$ times, each time starting from the bits
[Figure: three full syndrome measurement outcomes are fed through five copies of a function $f$, leaving the consensus syndrome encoded in a 5-bit repetition code.]
Figure 15.4: Converting outcomes of repeated syndrome measurement into a repetition code. Each doubled
line represents a full r-bit syndrome value and the function f determines which syndrome to use, for instance
using the procedure discussed in section 12.2.2. The computation of f may use additional scratch bits not
shown.
produced by the measurement locations, or rather those locations that would have been measurements if we
allowed them. Once we’ve done this, we are left with a code using s repetitions, and we can consider the
majority value of those repetitions to be the canonical one. A single fault in one calculation of the consensus
syndrome can result in a wrong value, but to get many wrong values requires either faults in many of the
syndrome calculations or multiple faults in the stored “measurement outcome” bits, enough to cause Shor
EC to fail. From this point, we can treat the rest of the syndrome extraction procedure using standard
repetition code fault tolerance techniques.
For Steane or Knill EC, the measurement outcomes form (possibly noisy) codewords of some classical
error-correcting code, generally not the repetition code. We wish to convert this to a repetition code using
s copies. As with Shor EC or measurement, we can simply remove the measurement locations and instead
begin treating the qubits classically. Then to convert to a repetition code, we can perform the full syndrome
decoding procedure s times, each time starting from the same set of “measurement outcome” bits but storing
the result in a different ancilla register. Again, a fault during a single syndrome decoding circuit can cause
one of the outcomes to be wrong, but many faults are needed to get many wrong outcomes. An alternative
procedure is to copy all “measurement outcome” bits s times and then perform a classical fault-tolerant
implementation of the decoding circuit. Steane or Knill measurement work the same way: The classical
decoding procedure first corrects errors on the classical codeword, then deduces the logical bit value. To
make this fault-tolerant, just repeat the whole procedure s times.
Finally, we must understand how to use bits stored in a classical repetition code to safely control quantum
gates. If we’re in the middle of an error correction gadget, we may want to use the classical syndrome
information to perform a Pauli operation, correcting the error. If we’re preparing an ancilla, we may wish to
use the testing data we’ve gathered to reject the ancilla and pick another one in its place, which involves a
series of SWAP gates on ancilla blocks. If we're doing a magic state injection gadget for a $C_3$ gate, we need to perform some Clifford group operation conditioned on the classical measurement outcome. In all of these
cases, depending on the code, we may only be doing a transversal quantum gate, but we want the decision
of which gate to do to depend on the classical register.
To do this safely, it’s best if the classical repetition code uses as many copies as there are qubits in the
QECC block we are controlling. Hopefully, we planned ahead so that this is true. Otherwise, we can increase
the number of copies by calculating the majority value of the classical repetition code enough times to create
the extra bits. (It’s not safe to simply duplicate one of the bits in the repetition code, since if that bit
happens to be wrong, we’ll get multiple wrong bits in the enlarged code.) Once we have the right number
of classical bits, we can use each one of them to control one qubit in the QECC. Since the classical bits are
in a repetition code, they are supposed to be the same, so in the absence of errors, we perform the correct
transversal gate on the data block(s). If there is one fault in the classical circuit, there might be one wrong control bit, but that means only one qubit in each block suffers by having the wrong physical gate applied. This is acceptable, since a single fault in that gate location could produce a similar result.
For many of the applications described above, we can combine the “convert to repetition code” and
“control qubits” steps to simplify the overall procedure. For instance, when we are doing error correction,
we don’t need to calculate the full description of the error n times for a QECC using n qubits. To fix the first
qubit of the code, all we need to know is if the error is on the first qubit, and if so, whether it is an X, Y , or
Z error. This comes to only 2 bits of information. Similarly for the other qubits. Therefore, we need only $2n$ ancilla bits, and we can make a slightly different calculation for each pair. Since the $n$ corresponding bits are no longer supposed to be identical, taking the majority value for classical error correction is no longer
possible. That’s not a serious problem: the syndrome decoding circuit is of a fixed size, and once we correct
the quantum error, we no longer care about the value of the syndrome. We can just skip error correction of
the classical register. The resulting streamlined circuit is shown in figure 15.5 for the 7-qubit code.
Of course, as you can see, replacing the classical syndrome computations with quantum circuits entails
a significant increase in overhead. For a single gadget, the increase is just a constant factor, so there is
still a threshold, but it might be substantially lower. As it happens, however, it appears to make minimal difference for the 7-qubit code. The threshold for classical computation is quite high, much higher than
the threshold for the 7-qubit code. Also, the classical computations needed for the 7-qubit code are fairly
simple. The combination means that the classical subroutines in gadgets for the 7-qubit code are a rather
robust part of those gadgets, and most small malignant sets of locations don’t involve them. For other more
sophisticated codes, however, the threshold is higher and the amount of classical computation needed for
syndrome decoding can be substantially larger, so there might be a great cost to performing a fault-tolerant
protocol for one of these codes when measurement is not available. As of this writing, the question has not
been carefully studied, however.
Figure 15.5: Steane error correction for the 7-qubit code, with the syndrome decoding performed using classical repetition codes. The function $g_i$ is the syndrome decoding algorithm for the 7-bit classical Hamming code to determine if there is an error on the $i$th bit. The computation of $g_i$ may use additional scratch bits not shown.
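For concreteness, here is a minimal sketch of such a decoding function $g_i$ for the classical [7,4] Hamming code, using one common convention for the parity-check matrix, in which the syndrome read as a binary number gives the position of a single bit error (0 meaning no error detected):

```python
H = [
    [0, 0, 0, 1, 1, 1, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [1, 0, 1, 0, 1, 0, 1],
]

def g(i, word):
    """Return 1 if a single-bit error is deduced on bit i (1-indexed)."""
    syndrome = 0
    for row in H:
        syndrome = 2 * syndrome + sum(r * w for r, w in zip(row, word)) % 2
    return int(syndrome == i)

word = [1, 0, 1, 0, 1, 0, 1]   # a valid Hamming codeword
word[4] ^= 1                   # flip bit 5 (1-indexed)
print([g(i, word) for i in range(1, 8)])   # only g(5) fires
```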
knowing the result of the measurement as well as the current Pauli frame, which influences the interpretation of the measurement. That means that to complete a magic state injection protocol, we need to wait for the time needed to do a measurement.
In the strictly self-similar concatenation protocol discussed in chapter 14, we did logical non-Clifford group gates at level $l$ by making magic states encoded at level $l$, which was done using level $l-1$ non-Clifford group gates. Those used level $l-1$ magic states and level $l-2$ non-Clifford group gates, and so on down to level 1 magic states. Waiting a long time for a level $l$ measurement is not a big deal, since the $l$-level code has a very low logical error rate. But if we have to ultimately build this up out of level 1 non-Clifford group gates, we have a problem. At level 1, we can't afford to wait very long, since the level 1 error rate is relatively large.
The solution is to skip all of the intermediate levels and go right ahead with level $l$ magic states and non-Clifford gates. The main observation is that it is perfectly possible to create a noisy level $l$ magic state using just a single physical non-Clifford group gate and a whole bunch of gates from the Clifford group. One way to do this is completely straightforward: Make an unencoded magic state, then encode it (in a non-fault-tolerant way) using one level of the code. We now have a slightly noisy level 1 magic state. Then take the level 1 magic state and encode each of its physical qubits with another level of the code, again using non-fault-tolerant circuits. Stop and do error correction for this level 2 magic state. The second level of encoding could introduce errors, but a single fault will only affect one level 1 block, and will be fixed by the level 2 error correction. (Well, we don't actually fix the errors but instead keep track of them in a Pauli frame.) Therefore, the logical error rate of the level 2 magic state, while higher than the level 1 magic state we had earlier, is not too much higher, and certainly not twice as large. Continue adding more levels of encoding, stopping to do error correction after each one. At each level, the additional logical error rate decreases, and the accumulated error through the whole encoding converges. Therefore, we end up with a level $l$ magic state with a constant total logical error rate.
We can make many such level $l$ magic states. The error rate for each one is much higher than we would tolerate for a level $l$ gate on the data, but is below the threshold for magic state distillation. Indeed, this is just what magic state distillation is good at. The level $l$ error rate for Clifford group gates is negligible, so we can rapidly reduce the logical error rate for the magic states. This does require measurement, but it is level $l$ measurement, so we are willing to wait for the outcome. Then we do the magic state injection gadget. That again requires measurement, but once again, with $l$ levels of encoding, we are willing to wait.
Actually, this description is not quite right. The problem is that if we wait to do magic state injection
until we have all the measurement outcomes for the error corrections during magic state preparation and the
measurements during magic state distillation, there will be new errors occurring while we wait. Of course,
they are correctable errors since we have a high level of encoding. To deal with the new errors, we continue
doing error correction while we are waiting . . . but if we wait for the outcomes of those error corrections,
then there will be more errors, and so on. We could end up waiting forever.
We might need to wait for the magic state distillation to conclude (depending on how useful a magic
state that fails distillation is), but luckily, we don’t need to wait for all the error correction measurements
to come in to start the magic state injection procedure. We do need to know their outcomes to finish the
conditional operation in magic state injection, since the Pauli frame will contribute to the calculation of
the logical measurement outcome, but critically, we only need to know the error syndromes of EC steps
up to the time of the measurement for magic state injection. Any new errors that occur after that point
don’t contribute to the logical measurement outcome. We still need to do error correction and collect error
syndromes to add to the Pauli frame, but they won't be relevant until the next magic state injection. So if we do the measurement part of the magic state injection as soon as we have a good magic state to use, we can then afford to wait for the measurement outcome before performing the conditional operation. We thus have a fault-tolerant gadget with slow measurements.
15.4 Fresh Ancilla Qubits
Just as it may be the case that measurement can only be performed at the end of the computation, it may
be that state preparation can only be performed at the beginning of the computation. For instance, in some
physical implementations, preparation takes a long time or can only be done globally, on the whole computer
at once. In that case, it is not valid to assume that we can prepare a new ancilla qubit whenever we need to.
If we have the ability to perform non-destructive measurement and to condition operations on classical information during a computation, we can also reset qubits by measuring and flipping a $|1\rangle$ qubit into a $|0\rangle$ qubit. This resetting can be used to refresh any needed ancillas. But we've seen that measurement is not critical to a threshold, so what if we don't have either measurement or state preparation in the middle of the computation?
If there is no way to generate fresh qubits while a computation is ongoing, then instead, at the beginning
of the computation, we must determine how many ancilla qubits we will need to complete the computation
and prepare all of them together. If sometime in the middle of the computation, we need an ancilla qubit for
error correction or another purpose, we take an ancilla from the stockpile and use it. Unfortunately, these
ancillas are no longer fresh; they have been sitting around for a long time, potentially accumulating errors
during that time. Is there something we can do to keep ancillas from going stale before they are used? If
not, we won’t have a threshold.
[Figure: heat (errors) flows into and out of the computer.]
However, error correction is more than just another method of cooling. The logical state we want to preserve is not in
general a thermal state. Even if the system Hamiltonian is degenerate, there may be errors such as dephasing
which don't change the energy (and thus don't heat the system) but still need to be avoided.
Note also that very little about this situation is uniquely quantum. Most of the same considerations
apply to classical computers. Standard classical computers generate a lot of their own heat due to the
irreversibility of the computation; but this is actually a form of error correction, with bits encoded in many
redundant electrons which are forced into “0” and “1” states. Keeping the processors cool is a significant
architectural concern with the design of classical computers as well.
and
  S(1′, 2′, 3′, . . . , n′) = Σ_{i=1}^{n} S(i′ | (i+1)′, (i+2)′, . . . , n′).   (15.6)
S(A|B) = S(AB) − S(B) is the conditional entropy of two subsystems A and B. In this case, we sum over
the conditional entropies of qubit i or i′, conditioned on the qubits later in the computer. But
  S(i′ | (i+1)′, (i+2)′, . . . , n′) ≥ S(i′ | i+1, i+2, . . . , n).   (15.7)
By S(i′ | i+1, i+2, . . . , n), I mean the conditional entropy taken for the density matrix in which the unitary
at time t + 1 has been done and qubit i has undergone the depolarizing noise, but qubits i + 1 through n have
not been depolarized. This equation can be proven by purifying the action of the depolarizing channel on
qubits i + 1, . . . , n and then tracing out the purifying system — the unitary interaction with the environment
does not change the conditional entropy, and dropping the purifying system can only increase the conditional
entropy.
Therefore,
  S(1′, 2′, 3′, . . . , n′) ≥ Σ_{i=1}^{n} S(i′ | i+1, i+2, . . . , n)   (15.8)
    ≥ Σ_{i=1}^{n} [ (1 − 4p/3) S(i | i+1, i+2, . . . , n) + (4p/3) S(½I | i+1, i+2, . . . , n) ]   (15.9)
    = (1 − 4p/3) S(1, 2, 3, . . . , n) + 4pn/3.   (15.10)
To get the second line, I used the decomposition of the depolarizing channel D_p(ρ) = (1 − 4p/3)ρ + (4p/3)·½I
and the concavity of the entropy (which also applies to the conditional entropy). Consequently,
  n − S(t) ≤ (1 − 4p/3)^t [ n − S(0) ],   (15.11)
where S(t) denotes the entropy of the computer after t time steps.
The entropy of the system approaches the maximal value n exponentially with time. The only state with
maximal entropy is the completely mixed state, so we have the desired result.
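As a quick numerical illustration (my own sketch, with an arbitrary choice of n and p), we can iterate the bound (15.10) and watch the entropy deficit n − S shrink by a factor (1 − 4p/3) per time step, as in equation (15.11):

# Iterate S' >= (1 - 4p/3) S + 4pn/3 for a few time steps.
n, p = 10, 0.01                 # 10 qubits, depolarizing strength p
S = 0.0                         # start from a pure state
for t in range(5):
    print(f"t={t}: S >= {S:.4f}, deficit n - S <= {n - S:.4f}")
    S = (1 - 4 * p / 3) * S + (4 * p / 3) * n
# The deficit after t steps is exactly (1 - 4p/3)**t * n here.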
not become completely randomized, so they still have some value. Furthermore, used ancillas also relax to
the fixed point; if the fixed point has less entropy than the ancilla does just after error correction, the channel
will cool down used ancillas, just as did the amplitude damping channel. If the fixed point is sufficiently
close to pure, the entropy in a qubit at the fixed point just acts like a small preparation error rate p_P. If this
error rate is below the threshold, we can simply treat it as a source of preparation errors and go through the
usual threshold argument.
Even if the fixed point has a significant amount of entropy (but not maximal), there is a trick which
enables us to squeeze out some useful ancillas. Our control of the system is limited to unitary operations,
which leave the total entropy of a system unchanged. However, nothing says that unitaries can’t move
entropy within the system. There is a technique known as algorithmic cooling which uses unitary operators
to rearrange the entropy in a system, shuffling most of it into a smaller number of qubits and producing a
few nearly pure qubits in the process. A few qubits are cooled down, and in the process, the remaining ones
are heated up. Algorithmic cooling only works if we start with a non-maximally mixed state; if the entropy
of the system is already maximal (infinite temperature), there is nowhere to push the entropy. Algorithmic
cooling is a form of data compression, which also takes entropy (in the form of many possible messages) and
rearranges it into a smaller number of qubits. However, in data compression, we are mostly interested in
the qubits in the final state which contain lots of entropy, whereas in algorithmic cooling, we are primarily
interested in the remaining qubits from which entropy has been removed.
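A toy classical example (a sketch of mine, not a complete algorithmic cooling protocol) shows the basic compression step: three bits each have bias ε toward 0, and any reversible permutation that routes the majority value into the first bit raises that bit's bias to roughly 3ε/2, at the cost of heating the other two.

from itertools import product

eps = 0.1
q = (1 + eps) / 2                         # P(bit = 0) for each input bit

def prob(bits):                           # independent input distribution
    p = 1.0
    for b in bits:
        p *= q if b == 0 else 1 - q
    return p

# Any bijection sending majority-0 states to 0xy and majority-1 states
# to 1xy is a valid reversible compression step.
maj0 = [s for s in product((0, 1), repeat=3) if sum(s) <= 1]
maj1 = [s for s in product((0, 1), repeat=3) if sum(s) >= 2]
perm = {s: (0, i >> 1, i & 1) for i, s in enumerate(maj0)}
perm.update({s: (1, i >> 1, i & 1) for i, s in enumerate(maj1)})

p0 = sum(prob(s) for s in perm if perm[s][0] == 0)
print(f"bias of bit 1: {eps} -> {2 * p0 - 1:.4f}")  # (3*eps - eps**3)/2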
Thus, to reset ancillas after they are used in error correction, we can let them sit for a while, until they
reach the fixed point of the channel. Then we perform algorithmic cooling and extract out a few cold ancillas
to use again in error correction. The remaining ancilla qubits heat up in the process, but we can once more
let them sit and they will again approach the channel’s fixed point. This procedure allows us to prove that
for an independent error model with any non-unital channel on each qubit, a threshold theorem holds; I
will omit the full analysis. Notably, this means that even if qubits equilibrate to a bath which is very hot
(but finite temperature), a threshold still exists provided that the coupling between the system and bath
is sufficiently weak so that the channel in one time step is below the threshold. The temperature of the
bath will surely affect the threshold, though: the hotter the bath, the more work we need to do (using more
ancilla qubits) to perform algorithmic cooling, and new errors during the algorithmic cooling procedure will
ruin it. Precisely how much lower the threshold must be is at present unknown.
Finally, there is an interesting intermediate case with channels such as the dephasing channel. The
dephasing channel will leave unchanged a classical basis state |0i or |1i, or a mixture of such states, but will
cause superpositions to decohere. If we prepare ancillas at the beginning of the computation in the state |0⟩,
they will stay there unchanged until we need them. Therefore, in this case it is acceptable to have no fresh
ancillas. However, with dephasing noise, there is no way to reset used ancillas. For a computation of length
T using n qubits, we will need a total of O(nT ) ancillas, and we must have them all from the start. We get a
weakened threshold theorem, which says that for weak dephasing noise below the threshold, arbitrarily long
quantum computation is possible, but with an overhead that is polynomial in the size of the logical circuit
rather than polylogarithmic. It is possible to prove that polynomial overhead is the best possible for this
situation (dephasing-type channel with no fresh ancillas).
Pure perfect dephasing noise is not at all generic, any more than perfectly unitary noise is. While
dephasing noise is an important phenomenon in real systems, and is often the dominant error process, there
is always some other source of noise, even if it is much smaller. When dealing with a large quantum computer,
we may want to do millions or billions of logical gates, so even small error processes can be important. Thus,
realistically, we are either likely to be in a situation like the depolarizing channel, where ancilla qubits will
heat up over time, or like a non-unital channel, where we can arrange that ancilla qubits cool down over
time. Depending on which sort of noise is present, a threshold may or may not be possible without fresh
ancillas.
15.5 Parallelism and Timing
Now let’s move from worrying about the timing of measurements to worrying about the timing of gates. We
explicitly assumed, as part of the basic model of fault tolerance, that gates could be performed in parallel,
but there was the additional implicit assumption that all gates take the same time, so there can only be
one gate per qubit per time step. In this section, I’ll examine these assumptions and a few other questions
relating to gate timing.
one slow gate, this is a minor problem, but if some qubits experience many more slow gates than others, the
time di↵erence could get to be quite large. A long wait time allows storage errors to accumulate, so we may
have to do error correction on the qubits that are waiting in order to keep them safe. All in all, it is best to
avoid this by keeping the gates synchronized. If all qubits are doing short gates at the same time, we can
safely move on to the next step quickly, but otherwise, we should move at the pace of the slowest gates.
Another potential cause of synchronization problems comes from fault-tolerant gadgets that take a vari-
able amount of time. For instance, in Shor error correction, one might decide that the number of repetitions
of the syndrome measurement should depend on the outcomes. If the initial few measurements show a
zero syndrome, you might decide to stop early and not repeat the measurement any more. If the first few
measurements disagree on the error, you might want to measure a few extra times to be sure. When looking
at the fault-tolerant gadget in isolation, that is OK; the gadget may still satisfy the ECCP and ECRP, but
when you look at the circuit as a whole, you’ll see that such a procedure is likely to cause synchronization
issues. Some blocks of the code will experience fault paths which lead to a short EC gadget, but other blocks
will take longer to complete the gadget. In order to properly account for errors in this situation, you must
include the wait locations that must be inserted for the quick instantiations of the gadget as they wait for
the slower gadgets to catch up. For this reason, the procedure I gave in section 12.2.2 uses a fixed number
of repetitions, regardless of the outcomes.
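The accounting is easy to automate. In this little sketch (illustrative numbers of my own), each block finishes its EC gadget at a different time, and the wait locations that must be charged to the faster blocks are just the difference from the slowest block:

# Variable-length EC gadgets force wait locations on the faster blocks.
durations = [4, 4, 7, 4, 6]             # EC duration per block this round
step = max(durations)                   # all blocks advance together
waits = [step - d for d in durations]   # storage locations to account for
print(f"step takes {step} units; wait locations per block: {waits}")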
Another example of variable-length gadgets is state preparation. When you test ancilla states and use
post-selection to decide which ones to keep, you might want to make different attempts at creating the ancilla
state sequential rather than in parallel, in order to reduce the total number of qubits needed at any given
time. The difficulty, once again, is that other states being prepared at the same time will take different
amounts of time, meaning some of them will have to wait a while before the next step of the computation.
If the storage error rate is low and the cost of qubits or the gate error rate is high, it might still be worth
using tricks like these. Just remember to properly account for the cost of waiting when analyzing the system.
This discussion brings up another issue that I’ve alluded to before: when to prepare ancillas. Fault-tolerant
protocols call for a lot of multiple-qubit entangled ancilla states, frequently some state encoded in a block of
the QECC being used. If we prepare ancillas too early, they will have to wait around until they are needed,
and will accumulate errors while waiting. If we wait until the last minute to prepare ancillas, just before
they are needed, then the data qubits have to wait around for the ancillas, leading to more errors on the
data qubits.
This is another advantage of standardizing the duration of fault-tolerant gadgets. Then we know exactly
when each ancilla will be needed, and we know exactly how long it takes to create the ancilla. We can
therefore begin preparing it at precisely the right time so that it will be ready just as it is needed.
Note that preparing ancillas usually takes significantly longer than the parts of the gadget that directly
interact with the data block. For instance, Steane EC needs two time steps to perform CNOTs to and from
the data block, plus time for a measurement location and Pauli correction step if we don’t want to separately
track the Pauli frame, a maximum of 4 time steps. Preparing the ancillas, however, requires 5 time steps
for the initial encoding, plus a few more to test the ancilla before it is used. Therefore, if we want it to be
ready just in time to be used, we need to start making it before the beginning of the previous EC gadget.
At that time, we'll also be finishing off the preparation of the ancillas for that earlier EC gadget. If there's a
π/8 logical gate sometime soon, we may also be preparing a magic state. In other words, ancilla preparation
tends to overlap. Since the same types of ancillas are needed over and over again, and the ancillas are needed
at staggered times, it is natural to design the computer to pipeline ancilla preparation. Perhaps one location
in the computer specializes in encoding the QECC, and then it passes the ancilla blocks to the next location
over, which tests ancillas against each other.
Figure 15.7: A segment of a circuit with ancilla preparation pipelined for an H, R_{π/8}, and two EC steps.
Each of these preparations is done in parallel with 2 copies (not shown) in case one fails post-selection. The
EC during magic state preparation requires its own ancillas (not shown). The size of the rectangles for state
preparations represents the actual time needed for the non-fault-tolerant preparation circuit.
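The start times for a just-in-time pipeline follow from simple arithmetic. Here is a sketch with illustrative numbers based on the discussion above (8 total steps of preparation, 4 steps per EC gadget; both figures are assumptions for the example):

# When must ancilla preparation begin for each EC gadget on one block?
PREP = 8     # 5 steps to encode plus (say) 3 more to verify -- illustrative
EC = 4       # steps an EC gadget spends interacting with the data block
for k in range(3):
    t_needed = k * EC                   # gadget k touches the data at t
    print(f"EC gadget {k}: begin ancilla prep at t = {t_needed - PREP}")
# Successive preparations overlap, which is why pipelining is natural.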
photon. Since the qubit is gone, attempting a gate after a leakage error of this type often produces no result
at all (the identity on any other qubits involved in the gate), depending on the details of the implementation.
In any case, there is usually a unique state resulting from the leakage, so fault-free gates always produce the
same effect, which will be different from the effect of the gate when the qubit has not leaked. Therefore, a
small circuit can usually detect the leakage. Once we know the qubit has leaked, we must replace it with a
new qubit. This qubit will not necessarily be in the same state as the old qubit, but at this point we can
treat it as a conventional erasure error: The data has undergone an error, but we know which qubit was
a↵ected.
A second category of leakage occurs when the physical system ends up in a di↵erent internal state than
those usually used for computation. As an example, consider a qubit stored in the energy level of an electron
of a trapped ion. The electron has many possible energy levels beyond the two acting as |0i and |1i, so a
faulty location in the circuit may cause the electron to end up in one of those other levels. Gates will usually
do something when presented with a leakage of this type; however, since there are many possible erroneous
internal states, it may not be possible to definitively predict how the leaked qubit will behave. Thus, it is
more difficult to identify that the leakage occurred. However, it may be more straightforward to correct a
leakage error of this type than it would be for outright loss of the qubit, since all that is needed is to move
it back to the intended subspace.
Sometimes, leakage comes with an obvious signature. For instance, an electron in the wrong state may
spontaneously emit a photon of a particular wavelength. By monitoring the system for photons of that
frequency, we can detect the leakage and counter it, either by rotating the system back into the correct
Hilbert space or by simply replacing the qubit with a new one. Either way, we know the qubit that suffered
the leakage error, and returning it to the computational Hilbert space turns it into an erasure error.
Occasionally, a leaked qubit might return to the computational Hilbert space by itself. For instance, the
electronic energy level in which the qubit ends up as a result of the noise might be very unstable, so the
system might rapidly decay down to the ground state, which is used as |0i. In that case, the leakage error
acts like a regular error (amplitude damping in this example), and doesn’t need to be treated separately.
We still can treat it as an erasure error if we manage to identify the location of the error, for instance by
detecting the photon emitted when the electron decays.
still possible to make use of the basic idea. By frequently teleporting qubits, you replace the original qubits
with new ones, which restores them to the computational Hilbert space.
Also note that replacing qubits frequently is not incompatible with attempting other forms of monitoring
for leakage errors. We can keep watch for tell-tale emitted photons at the same time as using Knill EC.
The monitoring then gives us a chance to treat an error as an erasure error, allowing more efficient error
correction.
after the CNOT is a phase error as well. But what about a phase error during the CNOT? Gates cannot be
performed instantaneously; instead they result from a process which takes some finite time. For instance,
CNOT could be performed by turning on the Hamiltonian H = ½(I + Z) ⊗ I + ½(I − Z) ⊗ X = CNOT for a
time π/2. (Actually this does −iCNOT, which is the same up to a global phase.) Notice that an X appears
in the Hamiltonian. Suppose a Z error occurs on the second qubit at time π/4, and assume the error itself
takes negligible time. Then the actual unitary which is performed is
  e^{−iπH/4} (I ⊗ Z) e^{+iπH/4} CNOT = ½ (I − iCNOT)(I ⊗ Z)(I + iCNOT) CNOT   (15.12)
    = ½ [ I ⊗ Z − (I − Z) ⊗ Y + Z ⊗ Z ] CNOT.   (15.13)
Thus, the second qubit could have either a Y or a Z error on it relative to the correct value when the
faulty gate is completed. If we do CNOTs on a code that only corrects phase errors in conditions with only
dephasing noise, we will be unpleasantly surprised to find that other sorts of error appear during the process.
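The algebra in equations (15.12) and (15.13) is easy to verify numerically. The following sketch (numpy; my own check, not part of any protocol) uses the fact that CNOT squares to the identity, so exp(−iθ CNOT) = cos θ I − i sin θ CNOT:

import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Y = np.array([[0., -1j], [1j, 0.]])
Z = np.diag([1., -1.])
CNOT = np.kron(np.diag([1., 0.]), I2) + np.kron(np.diag([0., 1.]), X)

half = (np.eye(4) - 1j * CNOT) / np.sqrt(2)     # exp(-i pi/4 CNOT)
IZ = np.kron(I2, Z)                             # Z error on the target
faulty = half @ IZ @ half.conj().T @ CNOT       # left side of (15.12)
rhs = 0.5 * (np.kron(I2, Z) - np.kron(I2 - Z, Y) + np.kron(Z, Z)) @ CNOT
print(np.allclose(faulty, rhs))                 # True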
For qubits, this is an unavoidable problem with CNOT gates. Interestingly, there exist bias-preserving
implementations of the CNOT gate if your full physical state space is an infinite-dimensional Hilbert space,
such as an electromagnetic field mode, out of which you use an appropriately-encoded two-dimensional
computational subspace to act as the “physical” qubit of your quantum computer. (The scare quotes around
“physical” are there because this encoding of a qubit in a continuous-variable system is itself a QECC, and
the overall encoding is the continuous-variable code concatenated with a qubit code correcting phase errors.)
The idea is to implement CNOT with a rotation through the extra degrees of freedom in a continuous-variable
system in such a way that the dominant source of errors in the system results in qubit phase errors, whether
they occur before, during, or after the CNOT. Of course, this approach doesn't solve the problem with
Hadamard or other gates that alter the error bias under conjugation.
When a bias-preserving CNOT gate is not available (or the analog for other error models), we should
restrict attention to gates which can be implemented using Hamiltonians that commute with the error model.
When the Hamiltonian commutes with the error, it doesn’t matter when the error occurs relative to the gate,
so this condition guarantees the gate will preserve the bias. In the case of pure phase errors, that means
using diagonal Hamiltonians, such as two-qubit Z ⌦ Z interactions.
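For contrast, the same kind of check (again my own numpy sketch, with an arbitrary gate angle) confirms that a Z error commutes through an interaction generated by Z ⊗ Z, so the error is the same whether it strikes before, during, or after the gate:

import numpy as np

Z = np.diag([1., -1.])
ZZ = np.kron(Z, Z)
def U(t):                                   # exp(-i t ZZ), since ZZ^2 = I
    return np.cos(t) * np.eye(4) - 1j * np.sin(t) * ZZ

theta = 0.37                                # arbitrary gate angle
IZ = np.kron(np.eye(2), Z)
mid_error = U(theta / 2) @ IZ @ U(theta / 2)
print(np.allclose(mid_error, IZ @ U(theta)))  # True: still a pure Z error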
Another concern is what happens to errors of the type your code does not specialize in. Because any fault-
tolerant protocol involves more locations than the circuit being simulated, there are also more opportunities
for something to go wrong. While one type of noise may be much larger than the others, there will always
be some contribution from a wide variety of noise sources, and using a fault-tolerant protocol will effectively
amplify the error rate for any type of error it does not correct.
Figure 15.8: Logical X measurement for the 3-qubit phase repetition code
The gadgets for logical Z basis preparation and measurement are straightforward: simply perform
transversal |+⟩ preparation or transversal X measurement, respectively. In this context, we can consider a
gadget to be fault-tolerant if it uses only phase-error-friendly locations and satisfies the usual fault-tolerant
properties, but for numbers of phase errors instead of numbers of general Pauli errors. See section 16.3 for
a more general discussion of what it means to be fault-tolerant beyond the usual definitions.
To measure in the logical X basis, we need ancillas. The logical X operator for this code is Z ⊗ Z ⊗ Z.
We can measure this with a single ancilla qubit by performing three CZ gates between the three qubits
of the code and the ancilla, which should start in the state |+⟩. Then measure the ancilla in the X basis
for the answer. Of course, using one ancilla qubit is not fault tolerant, but we can repeat the procedure for
greater certainty. The whole circuit is shown in figure 15.8. Note that we don't need to worry about error
propagation from the ancilla into multiple qubits of the code: There are only phase errors, which cannot
propagate through a CZ gate. We can similarly measure tensor products of logical X on multiple blocks,
e.g., X̄ ⊗ X̄ or X̄ ⊗ X̄ ⊗ X̄.
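The phase kickback at work here is easy to check directly. In this numpy sketch (mine, using explicit state vectors), coupling a |+⟩ ancilla to the three code qubits with CZ gates and reading the ancilla's X value returns the Z ⊗ Z ⊗ Z eigenvalue, i.e., the logical X of the code:

import numpy as np

plus = np.array([1., 1.]) / np.sqrt(2)
minus = np.array([1., -1.]) / np.sqrt(2)
I2, Z = np.eye(2), np.diag([1., -1.])
X = np.array([[0., 1.], [1., 0.]])

def kron(*ops):
    out = np.array([1. + 0j])
    for o in ops:
        out = np.kron(out, o)
    return out

def cz(k):            # CZ between the ancilla (qubit 0) and code qubit k
    zs = [I2, I2, I2]
    zs[k] = Z
    return kron(np.diag([1., 0.]), I2, I2, I2) + kron(np.diag([0., 1.]), *zs)

s0, s1 = kron(plus, plus, plus), kron(minus, minus, minus)
Xanc = kron(X, I2, I2, I2)
for sign, name in [(1, "logical |+>"), (-1, "logical |->")]:
    psi = np.kron(plus, (s0 + sign * s1) / np.sqrt(2))  # ancilla + code
    for k in range(3):
        psi = cz(k) @ psi
    print(name, np.real(psi.conj() @ Xanc @ psi))       # +1.0, then -1.0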
The measurement gadget is non-destructive, so we can make a logical |+⟩ state by preparing a logical |0⟩
and making a logical X measurement. If we get the +1 outcome, we have the desired logical |+⟩ state;
otherwise, we are left with a logical |−⟩ state. We could just discard a logical |−⟩ outcome and try again, but
the procedure is a bit more efficient if instead we keep track of the fact that the state has a Z̄ relative to the
correct state. That is, update the Pauli frame (as in section 12.5.2) of the logical state to account for the
preparation. Note that directly performing an actual Z̄ gate on the state would be a bit tricky, since
Z̄ = X ⊗ I ⊗ I, which is not diagonal, and therefore runs the risk of producing non-phase errors if a fault
occurs during the gate. However, updating the Pauli frame is just a conceptual trick. It happens
instantaneously and perfectly, so there is no need to worry about faults during the update. In principle, we
do need to worry about error propagation, but a Pauli update simply adds a global phase to existing Pauli errors.
The error correction gadget uses a variant of Knill EC. Normally Knill EC is based on regular teleportation,
using two ancilla blocks and a Bell measurement. For a phase-error-correcting code, we can instead
use one of the one-bit teleportation circuits from figure 13.6. One ancilla block is sufficient because the
syndrome only needs to specify the phase errors in the code, whereas a regular QECC needs a big enough
ancilla for both bit and phase flip errors. If we don't have a bias-preserving CNOT gate, we can substitute
measurements. If the ancilla starts as a logical |0⟩, then we non-destructively measure X̄ ⊗ X̄, and finish by
measuring the original data block with a transversal X measurement, which includes the Z̄ measurement but
also determines the error syndrome. The CZ gates in the X̄ ⊗ X̄ measurement commute with any Z errors in
the state, so the transversal X measurement will pick up any pre-existing or new Z errors, but only ones in
the data block. You can check that a logical X̄ ⊗ X̄ measurement followed by a Z̄ measurement does in fact
perform a one-bit teleportation in the absence of faults. Each of the two measurements has two outcomes,
and each outcome can result in a different logical state. The four possible results correspond to the original
logical state with one of the four Paulis I, X, Y, Z on it. Once again, absorb the logical Pauli into the Pauli frame.
A logical CNOT can be performed by a transversal CNOT in the opposite direction. If we don’t have
bias-preserving physical CNOT gates, we have to work harder, but a CNOT gadget can also be performed
using an appropriate sequence of measurements, as given in figure 15.9. Let us analyze the unencoded
version of this circuit using the techniques from section 6.2. (This is precisely the type of application those
techniques are meant for.) We start with two data qubits A and B and two ancilla qubits 1 and 2. The
Figure 15.9: CNOT gadget for a phase-correcting code. The ovals represent non-destructive measurements.
stabilizer initially is ⟨Z₁, Z₂⟩, and the generators of the logical Pauli group are
  X̄_A = X_A   (15.14)
  Z̄_A = Z_A   (15.15)
  X̄_B = X_B   (15.16)
  Z̄_B = Z_B.   (15.17)
If the first measurement, of X_B ⊗ X₂, has outcome −1, we can restore it to the same state as the +1 outcome
by performing Z_B. (This can be done by absorbing the Z_B into the Pauli frame.) Afterwards, we have a
stabilizer ⟨Z₁, X_B ⊗ X₂⟩ and logical Paulis
  X̄_A = X_A   (15.18)
  Z̄_A = Z_A   (15.19)
  X̄_B = X_B   (15.20)
  Z̄_B = Z_B ⊗ Z₂.   (15.21)
The next measurement is of X_A ⊗ X_B ⊗ X₁, and we can ensure the outcome of +1 by absorbing Z₁ into the
Pauli frame if needed. The stabilizer after the measurement is ⟨X_A ⊗ X_B ⊗ X₁, X_B ⊗ X₂⟩ and the logical
Paulis are
  X̄_A = X_A   (15.22)
  Z̄_A = Z_A ⊗ Z₁   (15.23)
  X̄_B = X_B   (15.24)
  Z̄_B = Z_B ⊗ Z₁ ⊗ Z₂.   (15.25)
Finally, we measure Z_A and Z_B. In case of a −1 outcome of Z_A, we add X₁ to the Pauli frame. If exactly
one of the two outcomes is −1, add (in addition to X₁, if appropriate) X₂. Once we have done this, the
stabilizer of the final state is ⟨Z_A, Z_B⟩, indicating the data has been moved to the ancilla qubits. The logical
operators are
  X̄_A = X₁ ⊗ X₂   (15.26)
  Z̄_A = Z₁ ⊗ I₂   (15.27)
  X̄_B = I₁ ⊗ X₂   (15.28)
  Z̄_B = Z₁ ⊗ Z₂.   (15.29)
That is, we have done the CNOT on qubits A and B while transferring them to qubits 1 and 2.
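The whole gadget can be checked with a brute-force state-vector simulation. In the sketch below (numpy, my own verification code), I post-select the +1 outcome of every measurement for brevity; as described above, the −1 outcomes would instead be absorbed into the Pauli frame. Qubit order is data A, data B, ancilla 1, ancilla 2.

import numpy as np

I2 = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.diag([1., -1.])

def op(paulis):                        # 4-qubit tensor product
    out = np.array([[1. + 0j]])
    for p in paulis:
        out = np.kron(out, p)
    return out

def project_plus(psi, M):              # post-select the +1 outcome of M
    psi = (np.eye(16) + M) @ psi / 2
    return psi / np.linalg.norm(psi)

rng = np.random.default_rng(1)
data = rng.normal(size=4) + 1j * rng.normal(size=4)
data /= np.linalg.norm(data)           # random state of qubits A, B
zeros = np.zeros(4); zeros[0] = 1.     # ancillas 1, 2 start in |00>
psi = np.kron(data, zeros)

psi = project_plus(psi, op([I2, X, I2, X]))    # measure X_B X_2
psi = project_plus(psi, op([X, X, X, I2]))     # measure X_A X_B X_1
psi = project_plus(psi, op([Z, I2, I2, I2]))   # measure Z_A
psi = project_plus(psi, op([I2, Z, I2, I2]))   # measure Z_B

out = psi.reshape(4, 4)[0]             # A, B end in |00>; keep that block
CNOT = np.kron(np.diag([1., 0.]), I2) + np.kron(np.diag([0., 1.]), X)
print(abs(np.vdot(CNOT @ data, out)))  # ~1.0: qubits 1, 2 hold CNOT|data>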
[Figure 15.10: circuit omitted. Inputs |ψ⟩ and |0⟩ + i|1⟩ enter; after an X measurement on |ψ⟩ and a conditional Y correction, the ancilla emerges as Q|ψ⟩.]
The magic states for the R_{π/4} and R_{π/8} gates are |0⟩ + i|1⟩ and |0⟩ + e^{iπ/4}|1⟩, respectively. They can be
distilled using the 7-qubit code and 15-qubit code, respectively. See section 13.5 for an analysis of the 15-
qubit code. Distillation of |0⟩ + i|1⟩ using the 7-qubit code is very similar, using the fact that the transversal
R_{π/4} does a logical R_{−π/4}. Both of these codes are CSS codes and can be encoded using only CNOTs plus
|0⟩ and |+⟩ preparation. They can be decoded using only CNOTs; the syndrome measurement then requires
both X and Z measurements. The R_{π/4} and R_{π/8} gates can be performed using the appropriate compressed
gate teleportation circuits, as discussed in section 13.3.
This is not yet a universal set of gates, since we don't have the full Clifford group. It can be completed with
one more Clifford group gate, such as Q, a single-qubit gate which maps Y ↔ Z. Q can be performed using
|0⟩ + i|1⟩ preparation, CNOT, Y (which can be absorbed into the Pauli frame), and X measurement using
the circuit in figure 15.10.
That gives us a complete fault-tolerant protocol when the only errors are phase errors. In a more realistic
situation where phase errors are dominant, but other types of errors exist too, we can concatenate a phase-
error-correcting code with another more general QECC or use a code like the XZZX code which is capable
of correcting all kinds of Pauli errors to some degree. Because we are giving most of our attention to phase
error correction and adding extra locations to compensate for the restriction to bias-preserving gates, such
a protocol will not be nearly as good at correcting bit flip errors as a protocol which treats phase and bit flip
errors more symmetrically. Therefore, this protocol is only helpful if the physical phase error rate is much
higher than the rate of other sorts of errors.
Since there are two different error rates we care about, instead of a threshold value, there is a threshold
surface in the space of phase and non-phase error rates. We can get an idea of its shape by studying specific
points. For instance, suppose we concatenate the phase-correcting code with Knill's high-threshold post-
selection based protocol (see section 16.1) and we don't have bias-preserving CNOT gates. Then, when the
phase error rate is 2.5 × 10^{−3} and the non-phase error rate is 2.5 × 10^{−7}, the system is provably inside the
threshold surface. A better decoding algorithm for the concatenated code allows this to be increased to a
3.5 × 10^{−3} rate of phase errors and 3.5 × 10^{−7} for non-phase errors. Compare this to a proven threshold of at
least 10^{−3} for stochastic noise with an unspecified bias. By taking advantage of the bias in the error rates,
we can get an improvement in the threshold, albeit a modest one.
Based on simulations (and using a slightly different error model), when phase errors are 100 times more
likely than bit flip errors, the XZZX code can achieve an improvement of more than a factor of 1.5 over the
standard surface code approach, or a factor of about 2 improvement using the bias-preserving CNOT gate.
This puts the threshold for the XZZX code at about 2% in this model.
computer at a previous time.
If D ≥ α, there is still a problem, since there are too many close qubits: the probability of having an error
that affects any single qubit diverges as the computer gets large (which really means it approaches 1). This
means that for a big quantum computer with such errors, even one time step is enough to completely destroy
any stored information.
If D < α, however, the error rate per qubit converges. In this case, unlike in section 15.2, being in a lower
dimension is good. Assuming the errors on distinct pairs of qubits are uncorrelated, we can then consider
the error model to be an adversarial stochastic error model as follows: Consider any set of s locations. If the
total probability of having qubit i fail is at most P (including the probability that it is part of a two-qubit
fault), then you can show (exercise ??) that the probability of having faults on all s locations in the set is
at most (1 + P)(2P)^{s/2} when s is even and P(2P)^{(s−1)/2} when s is odd. We can thus consider this as an
adversarial stochastic error model with error probability p = √(2P(1 + P)). Unfortunately, that could lead
to a rather low bound on the threshold. For instance, if we need p < 10^{−3} to be below the threshold, then
we need roughly P < 5 × 10^{−7}.
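The arithmetic at the end is worth a one-line check (a sketch; the numbers are the ones quoted above):

from math import sqrt
P = 5e-7                          # per-qubit total failure probability
p = sqrt(2 * P * (1 + P))
print(f"effective adversarial error rate p = {p:.3e}")  # about 1e-3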
Figure 15.11: An example architecture where qubits in a single block of the QECC are stored far from each
other. Qubits labelled by the same letter are part of the same block of the code.
Other options for deriving a better value for the threshold under these conditions haven’t been studied
carefully. It may make sense in this case to concatenate a code that corrects multiple errors rather than a
distance 3 code. There is some evidence that the surface codes discussed in chapter 19 don’t have too much
more trouble with correlated error pairs that are close together than they do with independent errors. And
even for a concatenated distance 3 code, the bound derived in this way from theorem 10.4 might be very
poor. Only certain pairs of locations will fail in a correlated way, and a clever design for the fault-tolerant
circuit can take advantage of that. For instance, it is probably a good idea to store the qubits in a given
block far apart so that it is unlikely they will fail simultaneously. We need to create encoded blocks for
preparation gadgets or to use as ancillas, and those will probably need to be made when the qubits are
close together, but once the block is encoded, we can separate the qubits and then check the block as in
section 13.1. When measurement and classical computation are freely available, the checking can probably be
done with only transversal quantum gates.
The recommendation that qubits in the same block should be far apart may seem to contradict the
architecture recommendations from section 15.2, where I suggested that the qubits in a block be stored
locally along with all the ancilla qubits needed. That perception is correct: there really is a tension between
the two suggestions. How it should be resolved depends on the relative importance of communication costs
vs. correlated local errors. If communication between distant qubits is cheap, but correlated errors between
nearby qubits are common, the qubits in a block should be far apart. If correlated errors are not very
important, but communication is expensive, the qubits in the block should be close together. When both
are important, it is best to find an appropriate middle ground, or to find a di↵erent solution to one or both
problems.
we try to perform a gate e^{−itH}, we instead perform e^{−i(t+δ)H} for some fixed small δ, or even a δ that varies
in some deterministic way from gate to gate. The actual effect of a location with over- or under-rotation is
a unitary transformation, it is just not quite the right one. Therefore, the error model is not stochastic, but
it is independent, and the error rate is low:
  ‖e^{−itH} − e^{−i(t+δ)H}‖_⋄ = ‖I − e^{−iδH}‖_⋄ ≤ O(δ‖H‖).   (15.32)
(However, if δ is a random variable equally likely to be positive and negative, then the average over δ turns
this quantum channel back into a stochastic error.) An error model performing coherent rotations like this is
potentially worrisome, because there is the possibility of constructive interference between the errors acting
on different locations.
Nevertheless, the threshold theorem can be extended to apply to all independent error models. Theo-
rem 1.1 doesn’t quite apply, since it is designed for single qubits and some locations involve 2 or more qubits.
More importantly, we need a bit more information than is provided by theorem 1.1 about which sets of
qubits the errors act on. Still, the philosophy behind theorem 1.1 is very much in force. The independent
error model with small coherent errors can be well approximated by an error model with large errors on a
limited number of qubits and perfect gates elsewhere. I won’t go through the whole argument and will be a
bit sloppy about the details, since the result will be subsumed by theorem 15.5, but I hope you get the idea
nevertheless.
In this case, one straightforward thing to do is to purify each quantum channel by adding environment
qubits. Since the error model is independent, each location has a separate set of environment qubits. Let UL
be a purification of the ideal location and VL be a purification of the actual noisy location. We can always
choose the purifications so that
  ‖U_L − V_L‖ < ε,   (15.33)
where ε is some straightforward function of max_L ε_L. To be consistent, we purify all measurement locations
as well and perform the classical processing of measurement results using idealized noiseless quantum gates.
The overall effect of the noisy circuit C̃ on the computer qubits plus environment qubits is ∏_L V_L, where
the product is taken in the correct order for the locations in the circuit. This can be expanded in terms of
Pauli fault paths on the computer qubits, since the set of all Pauli fault paths spans the full matrix algebra.
Actually, what we want to do is sum over Pauli fault paths relative to the ideal circuit C:
  ∏_L V_L = Σ_{γ=(Ω,{P_L})} ε_γ ∏_L P_L U_L,   (15.34)
where P_L is the Pauli associated with L if L ∈ Ω, and P_L = I otherwise. The state at the end of the
computation is therefore a superposition over results obtained from various possible Pauli fault paths. We
know from theorem 14.2 that a fault path Ω such that all exRecs are good leads to exactly the correct output
state (once we have traced over ancillas and environment qubits). ε_γ is the amplitude for Pauli fault path
γ, so the question we must answer is the value of ε_γ for Pauli fault paths that result in bad exRecs. In the
context of a concatenated code, we are worried about bad exRecs at the top level of concatenation.
Since U_L is close to V_L, we can set a bound on ε_γ. For a 1-qubit gate, we write
  V_L = (ε_I I ⊗ A_I + ε_X X ⊗ A_X + ε_Y Y ⊗ A_Y + ε_Z Z ⊗ A_Z) U_L,   (15.35)
with ‖(ε_I − 1)I + ε_X X + ε_Y Y + ε_Z Z‖ < ε. X, Y, Z act on the computational qubit involved in the gate and
A_P acts on the environment qubit(s) for the gate. We have chosen the ε's to be non-negative and so that
‖A_P‖ = 1. It then follows that |ε_I − 1| < cε and
  ε_X + ε_Y + ε_Z < cε   (15.36)
for some constant c. We have a similar equation for multiple-qubit gates U_L. By unitarity of V_L,
|ε_I|² A_I A_I† + |ε_X|² A_X A_X† + |ε_Y|² A_Y A_Y† + |ε_Z|² A_Z A_Z† = I, so ε_I ≤ 1. Then for the Pauli fault
path γ = (Ω, {P_L}),
  ε_γ = ∏_L ε_{P_L}.   (15.37)
To find the total amplitude for a fault path Ω, sum over all Pauli fault paths consistent with it:
  ε_Ω = Σ_{γ=(Ω,{P_L})} ε_γ = ( ∏_{L∉Ω} ε_I ) [ ∏_{L∈Ω} (ε_X + ε_Y + ε_Z) ] < (cε)^{|Ω|}.   (15.38)
If instead we are interested in calculating the total amplitude for a fault path within a given part of the
circuit, we can simply ignore the locations outside that region. Since the purified faulty operation is unitary
on the locations we ignore, it won't contribute to the total amplitude one way or another. In particular, the
amplitude to have faults on some particular set of r locations, regardless of what happens at other locations,
is at most (cε)^r. This is very similar to the condition for an adversarial stochastic model, but instead of a
probability of error, we have a bound on the total amplitude of fault paths with this property.
Definition 15.1. Given a particular Pauli fault path γ, by lemma 14.5, the behavior of the noisy fault-
tolerant simulation F̃T(C) of the circuit with the fault path is equivalent to a noisy version of C with
faults on locations that correspond to the bad exRecs. That is, given a fault path Ω, we can define a logical
fault path or level 1 fault path Ω₁, which is the set of locations corresponding to bad exRecs for the level 1
fault-tolerant simulation with the fault path Ω. The level l fault path Ω_l can be defined similarly as the set
of locations corresponding to the bad exRecs for the level l fault-tolerant simulation.
We can bound the total amplitude for a particular level l fault path by adding up the amplitudes for
all the level l − 1 fault paths that would result in that level l fault path. The calculation is more subtle
than that for the adversarial stochastic model because now we are bounding the sum of the amplitudes of
a set of fault paths rather than the probability of the set (and recall that there are inevitable correlations
at levels 1 and higher because of the overlapping exRecs). I will discuss how to do the sum in the proof of
theorem 15.5. The upshot is that, just as before, there is a threshold value ε_T, and if ε < ε_T, then the total
amplitude of "bad" fault paths (i.e., those for which the level l fault path Ω_l is non-trivial) goes to zero as a
double exponential in the level. Thus, the final state of the circuit has very high fidelity to one which results
from a circuit with only good level l exRecs, which in turn gives the correct answer.
qubits. The noise strength of a t-local non-Markovian error model is max_T ‖H_T‖_∞. All Hamiltonian terms
may be time-varying.
For simplicity, I will use units of time so that one computational time step takes one unit of time and
units of energy so that ħ = 1.
The terms H_{SB} are supposed to contain all the error sources for the computation. This includes both
interactions with the environment causing decoherence and errors in the implementations of gates. In the
remainder of this section I will consider only 1-local non-Markovian error models. Note, though, that this
doesn't quite subsume earlier error models, even the independent stochastic error model. In an independent
stochastic error model, a faulty gate can produce errors on all the qubits involved in a gate. When H_{SB}
involves only single-qubit terms, it can still generate two-qubit errors via interaction with terms in H_S used
to perform two-qubit gates. However, not all possible errors on two qubits can be generated this way. In
order to fully incorporate gate errors, we need to consider 2-local or 3-local non-Markovian error models
when the circuit involves 2- or 3-qubit gates. Provided the multi-qubit terms in H_{SB} are restricted to just
locations where gates are being performed, or if the strength of correlated errors on distant qubits drops
sufficiently fast, as discussed in section 15.8.1, a version of theorem 15.5 still holds.
Theorem 15.5. There exists a threshold value p_T such that if the noise strength of a 1-local non-Markovian
error model is below p_T, then for arbitrary T, ε > 0, there exists a fault-tolerant simulation of any circuit of
size T which gives an output within statistical distance ε of the correct output distribution. The overhead for
the fault-tolerant simulation is polylogarithmic in T/ε.
Proof. First, we need to show that we can break the computation up into a sum over fault paths.
Lemma 15.6. The overall time evolution of the system from initialization at time 0 to the end of the
computation at time T can be written as
  U = Σ_Ω W_Ω,   (15.39)
where the sum is taken over fault paths Ω and W_Ω is an operator acting on the system and environment that
performs the correct ideal operation at locations L ∉ Ω. If the noise strength is at most p, and the circuit
uses gates acting on at most m qubits, then
  ‖W_Ω‖_∞ ≤ (mp)^{|Ω|}.   (15.40)
Furthermore, let
  E_Ω = Σ_{Ξ⊇Ω} W_Ξ   (15.41)
be the sum over all fault paths containing a given fault path Ω. Then
  ‖E_Ω‖_∞ ≤ (mp)^{|Ω|}   (15.42)
as well.
W_Ω is the non-unitary, non-CPTP operator performing the action on the system when the fault path
is exactly Ω, whereas E_Ω is the operator performing the action on the system when there are faults in the
locations of Ω and arbitrary behavior elsewhere.
Proof of Lemma 15.6. The first step is sometimes referred to as "trotterization": We use the Trotter formula
to break the time evolution up into very small time steps of size δ.
  U = lim_{δ→0} ∏_{j=0}^{⌈T/δ⌉−1} e^{−iδH_S(jδ)} e^{−iδH_B(jδ)} e^{−iδH_{SB}(jδ)}.   (15.43)
(If the Hamiltonian terms do not vary smoothly, this formula may need to be slightly adjusted, but it won't
have a big impact on the argument except for complicating the notation.) U is the overall unitary evolution
of system plus bath over the course of the computation, which lasts for a time T.
There are 1/δ small time steps for each computational time step, but not very much happens in each
small time step. Let us expand
  e^{−iδH_{SB}(jδ)} = I − iδH_{SB}(jδ) + O(δ²) = I − iδ Σ_a H_a(jδ) + O(δ²).   (15.44)
Since we are taking the limit δ → 0, the O(δ²) terms are negligible. The sum over a is taken over physical
locations in the computer (not the bath), and the terms H_a are the terms in the expansion of H_{SB}, which
is 1-local. We get
  U = lim_{δ→0} ∏_{j=0}^{⌈T/δ⌉−1} e^{−iδH_S(jδ)} e^{−iδH_B(jδ)} [ I − iδ Σ_a H_a(jδ) ]   (15.45)
    = lim_{δ→0} Σ_{r=0}^{N⌈T/δ⌉} Σ_{λ=((j₀,a₀),(j₁,a₁),...,(j_{r−1},a_{r−1}))} V_{λ,δ}.   (15.46)
We get the second line by distributing the product over the difference in brackets in the first line. In the
inner sum, λ runs over sets of pairs (j, a), with a a physical qubit and j a precise time (as a multiple of δ)
within the time step of the location. (j, a) is thus a more fine-grained location. We can assume the qubits
are ordered in some way, that a_k ≤ a_{k+1}, and if a_k = a_{k+1}, then j_k < j_{k+1}. N is the total number of physical
computation qubits and V_{λ,δ} is
  V_{λ,δ} = ∏_{j=0}^{⌈T/δ⌉−1} e^{−iδH_S(jδ)} e^{−iδH_B(jδ)} O_{j,λ,δ},   (15.47)
  O_{j,λ,δ} = ∏_{a | (j,a)∈λ} (−iδ H_a(jδ)).   (15.48)
λ is a sort of detailed fault path, indicating where H_{SB} acts in the small time steps. r is the number of errors
— insertions of H_a for some a — that appear in λ. Note that V_{∅,δ}, for trivial λ, is the tensor product of
the correct time evolution for ideal gates with some bath evolution given by H_B. However, V_{λ,δ} is probably
not unitary when λ is non-trivial.
Let us bundle together all the detailed fault paths that have errors in the same locations, i.e., all the λ
that give rise to the same fault path Ω, in the usual sense of fault path. Let
  W_{Ω,δ} = Σ_{λ producing Ω} V_{λ,δ}.   (15.49)
For each location L ∈ Ω, we must include λ which have one of the qubits involved in L appearing one or
more times. We can organize this sum as follows: For each location L ∈ Ω, choose a single time step j_L and
qubit a_L such that (j_L, a_L) ∈ L. Let λ₀ = {(j_L, a_L) | L ∈ Ω} be the detailed fault path consisting of just
these pairs. Let us say that λ > λ₀ if λ₀ ⊆ λ and if (j, a) ∈ λ ∩ L implies (j_L, a_L) ∈ λ₀ and either j > j_L
or j = j_L and a ≥ a_L; that is, (j_L, a_L) is the earliest fine-grained location in λ within the location L and
λ contains no locations other than those in Ω. If the location is a multiple-qubit gate, we order the
fine-grained locations first by time and then by qubit within the location. Then
  W_{Ω,δ} = Σ_{λ₀} Σ_{λ>λ₀} V_{λ,δ}.   (15.50)
But the sum over λ > λ₀ means that within any location, we are summing over both I and −iδH_{SB} for
every time step within the location after (j_L, a_L), and thus get e^{−iδH_{SB}}. Thus,
  W_{Ω,δ} = Σ_{λ₀} ∏_{j=0}^{⌈T/δ⌉−1} e^{−iδH_S(jδ)} e^{−iδH_B(jδ)} O_{j,λ₀,δ} ∏_a e^{−iδH_a(jδ)},   (15.51)
where the product over a is over qubits for which (j, a) ∈ L, (j_L, a_L) ∈ λ₀, and j_L < j. The point of this
expansion is that we are identifying the first faulty fine-grained location within every location in Ω and then
ignoring all later faults within the same location, since the location has already failed.
The exponentials in equation (15.51) are all unitary, and therefore have norm 1. Thus,
  ‖W_{Ω,δ}‖_∞ ≤ Σ_{λ₀} ∏_{j=0}^{⌈T/δ⌉−1} ‖O_{j,λ₀,δ}‖_∞   (15.52)
    = Σ_{λ₀} δ^{|λ₀|} ∏_{(j,a)∈λ₀} ‖H_a(jδ)‖_∞.   (15.53)
Now, ‖H_a‖_∞ ≤ p for all a, and |λ₀| = |Ω| since each location in Ω has exactly one fine-grained location in
λ₀. For an m-qubit location L, there are at most m⌈1/δ⌉ possible values of (j_L, a_L), so the total number of
terms in the sum over λ₀ is at most (m⌈1/δ⌉)^{|Ω|} = m^{|Ω|} δ^{−|Ω|} (1 + O(δ)). Thus,
  ‖W_{Ω,δ}‖_∞ ≤ (mp)^{|Ω|} (1 + O(δ)),   (15.54)
and taking the limit δ → 0 gives
  ‖W_Ω‖_∞ ≤ (mp)^{|Ω|},   (15.55)
which is equation (15.40). The bound on E_Ω follows from a similar calculation:
  E_{Ω,δ} = Σ_{Ξ⊇Ω} W_{Ξ,δ}   (15.56)
    = Σ_{λ₀} Σ_{λ″⊵λ₀} V_{λ″,δ}.   (15.57)
For the inner sum over λ″ in the second line, we say λ″ ⊵ λ₀ if λ₀ ⊆ λ″ and if, whenever (j, a) ∈ λ″ ∩ L
and (j_L, a_L) ∈ λ₀, then either j > j_L or j = j_L and a ≥ a_L. That is, λ″ may contain locations other than
those in λ₀, but if a fine-grained location of λ″ falls within the location of an element of λ₀, then the
element of λ₀ is earlier.
This means we end up summing over I and −iδH_a not just for the time steps after a fine-grained location
in λ₀, but also for all locations L ∉ Ω. Thus,
  Σ_{λ″⊵λ₀} V_{λ″,δ} = ∏_{j=0}^{⌈T/δ⌉−1} e^{−iδH_S(jδ)} e^{−iδH_B(jδ)} O_{j,λ₀,δ} ∏_a e^{−iδH_a(jδ)},   (15.58)
where the product over a is now over qubits for which either the location L containing (j, a) is not in Ω, or
L ∈ Ω with (j_L, a_L) ∈ λ₀ and j_L < j. The remainder of the calculation for E_Ω is identical to that for W_Ω,
so we get the same bound.
For any fault path Ω, use procedure 14.1 to assign truncation and determine whether each exRec is good
or bad. We'd like to bound ‖Σ W_Ω‖_∞, summed over any fault paths Ω that lead to bad exRecs at the top
level of concatenation.
A natural way to derive such a bound would be to bound ‖W_Ω‖_∞ over individual fault paths and then sum
over fault paths. The problem with this plan is that it is doomed to failure: ‖Σ W_Ω‖_∞ can potentially
be quite large as the system gets big because the amplitude to not have an error in a single location can be
greater than 1. Amplitudes are not probabilities and the different fault paths need not be orthogonal, so
there is potentially coherence between fault paths with and without errors, which is why the amplitude can
be greater than 1. Another way of thinking about this is that because errors can add coherently, adding more
fault paths to the sum can reduce the norm of the sum. With stochastic noise, we could safely overestimate
the set of malignant fault paths, since extra fault paths only increased the probability, but doing that with
coherent noise could lead to a false sense of security.
The solution is to work not with W_Ω but E_Ω, which is much better behaved. In particular, we can break
the computation down level-by-level using a level reduction theorem:
Lemma 15.7. Suppose we have a purely unitary computation U simulated by a unitary fault-tolerant protocol
FT(U) using a QECC correcting t errors, and the simulation undergoes a non-Markovian noise model which,
when purified, can be written as
  F̃T(U) = Σ_Ω W_Ω,   (15.59)
where the sum is taken over fault paths Ω and W_Ω is an operator acting on the system and environment that
performs the correct ideal operation for FT(U) at locations L ∉ Ω. Let
  E_Ω = Σ_{Ξ⊇Ω} W_Ξ,   (15.60)
and suppose
  ‖E_Ω‖_∞ ≤ p^{|Ω|}.   (15.61)
Then there exists Ũ (a faulty realization of U) and a classical map f taking basis states of the output qubits
of FT(U) to basis states of the output qubits of U such that
  |⟨y|Ũ|0⟩|² = Σ_{x | f(x)=y} |⟨x|F̃T(U)|0⟩|²   (15.62)
and
  Ũ = Σ_Ω W′_Ω,   (15.63)
where the sum is taken over fault paths Ω and W′_Ω is an operator acting on the system and environment that
performs the correct ideal operation for U at locations L ∉ Ω.
Moreover, if
  E′_Ω = Σ_{Ξ⊇Ω} W′_Ξ,   (15.64)
then
  ‖E′_Ω‖_∞ ≤ (p′)^{|Ω|}   (15.65)
with
  p′ ≤ \binom{A}{t+1} (2ep)^{t+1}.   (15.66)
Let me clarify a bit what the unitary circuit and purification are doing here. An arbitrary quantum
circuit can be turned into a unitary one by replacing any mid-circuit measurement and subsequent classical
computation with quantum processing. We do that here, using CNOTs to ancilla qubits which are never
reused to decohere measured qubits and quantum gates to perform any classical circuit. Because we are
only studying the unitary circuit part, we assume the computation starts with the state immediately after
preparation of |0⟩ states and ends immediately before measurement of the output qubits of the circuit. This
is a fairly standard quantum computation trick. The part that needs clarification here is what happens with
the purification when we have a noisy realization of the circuit. The assumption is that any imperfections in
the state preparation are absorbed into an initial error just after the |0⟩ preparations, which are themselves
still perfect. Any qubits prepared after the start of the circuit are instead prepared at the beginning and
wait, error-free, until the point at which they are actually invoked in the original non-unitary circuit.
Measurements, mid-circuit or at the end, have their errors absorbed into the CNOT gate used to decohere
the qubits, and the terminal measurements used to get classical output bits are assumed to be perfect. Any
classical processing in the middle is assumed to be perfect as well even though it is being implemented using qubits.
The reason we can get away with assuming all of these modifications have no faults is that they are
not real modifications. They are conceptual modifications to enable us to analyze the system as a unitary
process rather than a more complicated CPTP map. Thus, we don’t have to worry about the issues arising
in sections 15.3 and 15.4.
Proof. We take
  W′_Ω = D Σ_{γ | γ↦Ω} W_γ,   (15.67)
where D is the ideal ∗-decoder and the notation γ ↦ Ω means γ is malignant for exactly the exRecs (no more,
no less) corresponding to Ω in FT(U). The map f is given by the fact that we have a simulation of U, so
there must be a way of classically decoding the outputs of FT(U) to give the outputs of U. Lemma 14.5 (at
least the proof of that lemma) shows that W′_Ω has the desired property that it acts perfectly on locations
not in Ω and that the decoded output distribution is the same. So all we need to do is to bound ‖E′_Ω‖_∞.
Let C be a subcircuit (i.e., a set of locations in the circuit) and
  V_{C,λ} = Σ_{Ω | Ω∩C=λ} W_Ω.   (15.68)
Note that a fault path is also a set of locations, but we are thinking of C as a set of locations that might or
might not have faults, whereas we generally think of a fault path as a set of locations that do have faults.
Then V_{C,λ} is the sum over fault paths that have faults within C exactly at the locations of λ, but outside
C can have faults anywhere.
We can write E′_Ω in terms of V_{C,λ} operators. Let C_Ω be the union of the exRecs in the simulation
corresponding to the locations of Ω. Then
  E′_Ω = Σ_{Ξ⊇Ω} W′_Ξ   (15.69)
    = D Σ_{Ξ⊇Ω} Σ_{γ | γ↦Ξ} W_γ   (15.70)
    = D Σ_{γ | ∃Ξ⊇Ω, γ↦Ξ} W_γ   (15.71)
    = D Σ_{γ | γ∩C_Ω ↦ Ω} W_γ   (15.72)
    = D Σ_{Ξ⊆C_Ω | Ξ is malignant for the exRecs in C_Ω} V_{C_Ω,Ξ}.   (15.73)
There are fewer terms in this sum than in the sum over W_Ξ. If we can bound ‖V_{C_Ω,Ξ}‖_∞, then we can
bound E′_Ω. The only assumption we have to help bound ‖V_{C_Ω,Ξ}‖_∞ is that ‖E_Ω‖_∞ ≤ p^{|Ω|}. We'd therefore
like to write V_{C,Ξ} as a sum over the operators E_{Γ∪Ξ}.
Claim 15.8. Consider a subcircuit C and Ξ ⊆ C. Then
  V_{C,Ξ} = Σ_{s=0}^{|C|−|Ξ|} Σ_{Γ⊆C | Γ∩Ξ=∅, |Γ|=s} (−1)^s E_{Γ∪Ξ}.   (15.74)
Proof of Claim. This is an inclusion-exclusion argument, meaning we overcount and then subtract off the
extras. But actually we overcount the things to subtract off and so have to add some back, and so on.
We can imagine expanding E_{Γ∪Ξ} on the RHS of equation (15.74) into V's. We have that, for Υ ⊆ C,
  E_Υ = Σ_{Π⊇Υ} W_Π   (15.75)
    = Σ_{Λ₁ | Υ⊆Λ₁⊆C} Σ_{Λ₂ | Λ₂∩C=∅} W_{Λ₁∪Λ₂}   (15.76)
    = Σ_{Λ₁ | Υ⊆Λ₁⊆C} V_{C,Λ₁}.   (15.77)
That is, we sum over all V_{C,Λ₁} for fault paths Λ₁ contained in C and containing Υ.
Let us consider the coefficient of V_{C,Γ∪Ξ} on the RHS of equation (15.74) for a particular fault path Γ ∪ Ξ.
If Γ = ∅, the only contribution is from E_Ξ, which matches V_{C,Ξ} on the LHS. Otherwise, if |Γ| = s > 0, then
V_{C,Γ∪Ξ} shows up in E_{Γ′∪Ξ} for any Γ′ ⊆ Γ. There are \binom{s}{j} subsets of Γ of size j, and those terms
appear with a coefficient (−1)^j in the sum. The total coefficient of V_{C,Γ∪Ξ} on the RHS is thus
  Σ_{j=0}^{s} (−1)^j \binom{s}{j} = 0.   (15.78)
Applying claim 15.8 to equation (15.73), we get
  E′_Ω = D Σ_{Ξ⊆C_Ω | Ξ malignant} Σ_{s=0}^{|C_Ω|−|Ξ|} Σ_{Γ⊆C_Ω | Γ∩Ξ=∅, |Γ|=s} (−1)^s E_{Γ∪Ξ}   (15.79)
    = D Σ_{Ξ⊆C_Ω | Ξ malignant} Σ_{Υ | Ξ⊆Υ⊆C_Ω} (−1)^{|Υ|−|Ξ|} E_Υ   (15.80)
    = D Σ_{Υ⊆C_Ω | Υ is malignant for the exRecs in C_Ω} c_Υ E_Υ.   (15.81)
In the last line, I have collected together all the times E_Υ appears in the sum. In the previous line, Υ = Γ ∪ Ξ,
so it is broken up into a malignant fault path Ξ on C_Ω and an additional fault path Γ of s extra locations.
Of course, adding one or more faults to a malignant fault path produces another malignant fault path, so
we get E_Υ once for each fault path Ξ ⊆ Υ which is malignant, and it appears with a coefficient (−1)^{|Υ|−|Ξ|}.
We therefore have
  c_Υ = Σ_{s′=0}^{|Υ|} (−1)^{|Υ|−s′} N_{s′,Υ},   (15.82)
where N_{s′,Υ} is the number of malignant fault paths of size s′ which are subsets of Υ. Thus,
  ‖E′_Ω‖_∞ ≤ Σ_{Υ⊆C_Ω | Υ is malignant for the exRecs in C_Ω} |c_Υ| ‖E_Υ‖_∞   (15.83)
    ≤ Σ_{Υ⊆C_Ω | Υ is malignant for the exRecs in C_Ω} Σ_{s′=0}^{|Υ|} N_{s′,Υ} p^{|Υ|}.   (15.84)
Here, we have applied the lemma's assumption on a bound to ‖E_Υ‖_∞.
This is substantial progress: Now we have a bound in terms of a sum of a positive quantity over malignant
fault paths. That means it is safe to overestimate the number of malignant fault paths to get a bound.
Let M_s be the number of malignant fault paths containing s physical locations for a single exRec,
considering only the fault path within the exRec. If there are different kinds of exRecs, let M_s be the largest
value over the possible kinds of exRecs. Then the number of fault paths contained in C_Ω with a total of s
physical locations which are malignant for all r of the exRecs of C_Ω is bounded as
  N_s ≤ Σ_{s₁+s₂+...+s_r=s} ∏_{i=1}^{r} M_{s_i}.   (15.85)
It could be fewer, however, since some of the exRecs may be truncated and since not all exRecs may have the
maximal value of M_{s_i}. Because we are considering only fault paths within C_Ω, we only truncate an exRec
if one of the subsequent exRecs is also in C_Ω.
We can upper bound M_s by simply reverting to the simplest notion of bad exRecs. Then M_s = 0 for
s ≤ t and M_s ≤ \binom{A}{s} for s ≥ t + 1, where A is the size of the largest exRec. (And M_s = 0 for s > A, since
there are no fault paths of that size, malignant or benign.)
We can now finish bounding ‖E′_Ω‖_∞ in the case where Ω has r locations (so C_Ω has r exRecs):
  ‖E′_Ω‖_∞ ≤ Σ_{s=0}^{∞} Σ_{s′=t+1}^{s} \binom{s}{s′} N_s p^s   (15.86)
    ≤ Σ_{s=0}^{∞} N_s p^s 2^s   (15.87)
    ≤ Σ_{s=0}^{∞} Σ_{s₁+s₂+...+s_r=s} ∏_{i=1}^{r} M_{s_i} (2p)^{s_i}   (15.88)
    ≤ [ Σ_{s=t+1}^{A} \binom{A}{s} (2p)^s ]^r   (15.89)
    ≤ [ \binom{A}{t+1} (2ep)^{t+1} ]^r.   (15.90)
The last line uses lemma 1.2; when we are below the threshold, 2p will always be small enough for it to
apply. Thus, we can identify p′ = \binom{A}{t+1} (2ep)^{t+1}.
Given lemmas 15.6 and 15.7, the threshold theorem for non-Markovian noise follows using the usual
argument for the threshold. In particular, lemma 15.6 tells us that ‖E_Ω‖_∞ ≤ (mp)^{|Ω|} when
max_T ‖H_T‖_∞ ≤ p. If we have a computation encoded in an l-level concatenated fault-tolerant protocol,
then lemma 15.7 tells us that it simulates an (l − 1)-level concatenated fault-tolerant protocol with
  ‖E_Ω‖_∞ ≤ p₁^{|Ω|},   (15.91)
where
  p₁/p_T = (mp/p_T)^{t+1}.   (15.92)
Here,
  p_T = \frac{1}{2e [ 2e \binom{A}{t+1} ]^{1/t}}.   (15.93)
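To get a feeling for the numbers, here is a small sketch evaluating the bound (15.93) for an illustrative exRec size A and a distance-3 code (t = 1), along with the corresponding stochastic-noise comparison from equation (15.108) below (A = 100 is an assumption chosen only for the example):

from math import comb, e, sqrt

A, t = 100, 1                     # illustrative exRec size, t = 1
p_nonmarkov = 1 / (2 * e * (2 * e * comb(A, t + 1)) ** (1 / t))
theta_stoch = 1 / sqrt(2 * comb(A, 2))
print(f"non-Markovian bound on noise strength: {p_nonmarkov:.2e}")
print(f"stochastic bound on theta:             {theta_stoch:.2e}")
print(f"ratio: {theta_stoch / p_nonmarkov:.0f}")  # = 2 e^2 sqrt(2 C(A,2))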
Lemma 15.7 also tells us that the level-(l − j + 1) fault-tolerant protocol simulates a level-(l − j) fault-tolerant
protocol with
  ‖E_Ω‖_∞ ≤ p_j^{|Ω|},   (15.94)
where
  p_j/p_T = (p_{j−1}/p_T)^{t+1}.   (15.95)
We thus have that the level-l protocol simulates an unencoded circuit with
  ‖E_Ω‖_∞ ≤ [ p_T (mp/p_T)^{(t+1)^l} ]^{|Ω|}.   (15.96)
In this simulated circuit, we can write U = G + B, with G the good fault path with no faults, and B a
sum over fault paths with one or more faulty locations,
  B = Σ_{Ω | |Ω|≥1} W_Ω.   (15.97)
We want to bound ‖B‖_∞. Applying claim 15.8 to the case where C is the full circuit, which has T locations,
we have
  ‖B‖_∞ = ‖ Σ_{r=1}^{T} Σ_{Ω | |Ω|=r} Σ_{s=0}^{T−r} Σ_{Ξ | Ω∩Ξ=∅, |Ξ|=s} (−1)^s E_{Ω∪Ξ} ‖_∞   (15.98)
    = ‖ Σ_{s′=1}^{T} Σ_{Γ | |Γ|=s′} Σ_{r=1}^{s′} (−1)^{s′−r} \binom{s′}{r} E_Γ ‖_∞   (15.99)
    = ‖ Σ_{s′=1}^{T} Σ_{Γ | |Γ|=s′} (−1)^{s′+1} E_Γ ‖_∞   (15.100)
    ≤ Σ_{s′=1}^{T} \binom{T}{s′} [ p_T (mp/p_T)^{(t+1)^l} ]^{s′}   (15.101)
    ≤ T e p_T (mp/p_T)^{(t+1)^l},   (15.102)
again applying lemma 1.2 in the last line. As with the threshold theorem for stochastic noise, if p < p_T/m,
we can choose l = O(log log(T/ε)) to ensure that ‖B‖_∞ < ε/2.
The trivial fault path $G$ gives the correct logical output. If $|0 \ldots 0\rangle$ is the initial state, $U|0 \ldots 0\rangle = |\psi_G\rangle + |\psi_B\rangle$ is the final state, with $|\psi_G\rangle = G|0 \ldots 0\rangle$, $|\psi_B\rangle = B|0 \ldots 0\rangle$. (Note that assuming the initial state $|0 \ldots 0\rangle$ does not exclude errors in preparation. We can model a potentially faulty preparation location as a one-time-step interaction with the bath acting on a perfect $|0\rangle$.)
We know that $B$, the sum of the potentially bad fault paths, has norm at most $\epsilon/2$. The statistical distance between the actual and ideal output distributions is at most the trace distance between $|\psi_G\rangle + |\psi_B\rangle$ and $(1/N)|\psi_G\rangle$, where $N = \| |\psi_G\rangle \|$ is a normalization factor. The trace distance is
$$D = \left\| \frac{1}{N^2} |\psi_G\rangle\langle\psi_G| - (|\psi_G\rangle + |\psi_B\rangle)(\langle\psi_G| + \langle\psi_B|) \right\|_1 = 2\sqrt{1 - \left| \frac{1}{N} \langle\psi_G| (|\psi_G\rangle + |\psi_B\rangle) \right|^2}. \qquad (15.103)$$
Since $\| |\psi_G\rangle + |\psi_B\rangle \| = 1$, the trace distance is maximized when $|\psi_G\rangle$ and $|\psi_B\rangle$ are orthogonal, which gives $N^2 \ge 1 - (\epsilon/2)^2$, so
$$D \le 2\sqrt{1 - N^2} \le \epsilon. \qquad (15.104)$$
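A small numerical check (mine, with arbitrary dimension and $\epsilon$) of (15.103)–(15.104): taking $|\psi_B\rangle$ orthogonal to $|\psi_G\rangle$ with norm $\epsilon/2$, the worst case, the trace distance computed directly saturates the bound $D \le \epsilon$.
```python
# Verify the trace-distance bound (15.104) in the orthogonal worst case.
import numpy as np

rng = np.random.default_rng(0)
eps, dim = 1e-2, 8

psi_G = np.zeros(dim, dtype=complex); psi_G[0] = 1.0
psi_B = rng.normal(size=dim) + 1j * rng.normal(size=dim)
psi_B -= psi_G * (psi_G.conj() @ psi_B)           # orthogonal component only
psi_B *= (eps / 2) / np.linalg.norm(psi_B)        # norm eps/2
psi_G *= np.sqrt(1 - np.linalg.norm(psi_B) ** 2)  # total state is normalized
N = np.linalg.norm(psi_G)

out = psi_G + psi_B                               # actual final state
ideal = psi_G / N                                 # renormalized good component
diff = np.outer(out, out.conj()) - np.outer(ideal, ideal.conj())
D = np.sum(np.abs(np.linalg.eigvalsh(diff)))      # trace norm, as in (15.103)
print(f"D = {D:.6f}, eps = {eps}")                # D saturates D <= eps
```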
Comparing to the threshold for stochastic noise from theorem 10.4, this is smaller by a factor of $4e^2$. One factor of 2 is due to having only single-qubit noise in the non-Markovian error model we considered; if the error rate for stochastic noise is $p$ per qubit, a two-qubit gate has error rate about $2p$. It thus seems like the threshold for non-Markovian noise is smaller than that for stochastic noise by a factor of $2e^2 \approx 15$.
However, that perception is deceptive for two reasons. First, remember that these formulas just give lower bounds on the threshold. The true threshold might be much higher. Since the non-Markovian proof is more complex, it may well need more conservative assumptions, leading to a poorer threshold bound. Second, and more importantly, a direct numerical comparison is comparing apples and oranges: the two theorems bound two different quantities.
In order to make a fair comparison, let us write a stochastic model using a system-bath Hamiltonian. There is not a unique way to do this, of course, but I'll write down a simple example in order to see how things work out. Let's assume that $H_S$ and $H_B$ act as delta functions at integral time steps, and at other times, $H_{SB}$ is the only time evolution. Let $H_{SB} = \theta \sum_a Z_{E_a} \otimes P_a$. $Z_{E_a}$ is $Z$ acting on environment qubit $a$, while $P_a$ acts on the corresponding computational qubit $a$. There is also an additional environment qutrit for $a$, which determines whether $P$ is $X$, $Y$, or $Z$. Then at integer time values, $H_B$ acts to randomize all the environment registers. The effect of this Hamiltonian is to select a random Pauli $P \in \{X, Y, Z\}$ for each time step and qubit, and do $e^{\pm i\theta P}$ on that qubit, with $+$ and $-$ equally likely.
Averaging over the $+$ and $-$ yields an equal mixture of $e^{+i\theta P}$ and $e^{-i\theta P}$, which is equivalent to performing $P$ with probability $\sin^2(\theta) \approx \theta^2$ (when $\theta$ is small) and $I$ otherwise. Thus, with probability $1 - \theta^2$, the qubit stays the same, and with probability $\theta^2$, it undergoes a random Pauli. If the qutrit that determines the Pauli is uniformly random, the channel is a depolarizing channel, but by selecting other distributions for the qutrit, we could get any Pauli channel. For a location involving a two-qubit gate like the CNOT gate, the probability that either qubit has an error is about $p = 2\theta^2$. By comparison, the noise strength for $H_{SB}$ is $|\theta|$.
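We can verify this equivalence directly: averaging the two coherent evolutions reproduces exactly the Pauli channel that applies $P$ with probability $\sin^2\theta$. In the sketch below, $P = X$ and $\theta = 0.1$ are arbitrary choices of mine.
```python
# Check: (1/2)(U rho U^dag + U^dag rho U) with U = e^{+i theta X} equals
# applying X with probability sin^2(theta) and I otherwise.
import numpy as np

theta = 0.1
X = np.array([[0, 1], [1, 0]], dtype=complex)
U = np.cos(theta) * np.eye(2) + 1j * np.sin(theta) * X   # e^{+i theta X}

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])   # arbitrary test state
avg = 0.5 * (U @ rho @ U.conj().T + U.conj().T @ rho @ U)
pauli = np.cos(theta) ** 2 * rho + np.sin(theta) ** 2 * (X @ rho @ X)
print(np.allclose(avg, pauli))                           # True
```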
Now let us compare the thresholds we derive via the two versions of the threshold theorem. Theorem 15.5 says that we are below the threshold when
$$|\theta| < \frac{1}{4e^2 \binom{A}{2}}, \qquad (15.106)$$
while theorem 10.4 says that we are below the threshold when
$$2\theta^2 < \frac{1}{\binom{A}{2}}, \qquad (15.107)$$
or
$$\theta < \frac{1}{\sqrt{2\binom{A}{2}}}. \qquad (15.108)$$
Thus, the non-Markovian threshold bound is worse not by a factor of $2e^2$, but by a factor of $2e^2 \sqrt{2\binom{A}{2}}$ (or by the square of this, depending how you do the comparison). That's a very big difference. The underlying intuition here is that the threshold theorem for the basic model sets a bound on the probability of error, whereas the non-Markovian threshold theorem sets a bound on the amplitude of error, and the error probability is the square of the amplitude.
On the one hand, this is beneficial in that it means that the bound on $\|H_{SB}\|$ is weak when the noise is stochastic. For instance, an error probability of $10^{-4}$, comfortably below the threshold, corresponds to an error amplitude of around 1%, which seems very achievable. When the size of $H_{SB}$ is the relevant experimental parameter, that makes things easier.
On the other hand, however, theorem 15.5 seems to be saying that we need a very low error probability when the noise is non-Markovian, or even coherent but non-stochastic. Instead of an error probability below $10^{-3}$, we now need to achieve an error probability of about $10^{-6}$, which is much, much harder.
But is it really true that the threshold for non-stochastic noise is so much worse than for stochastic noise, or is that just an artifact of the proof? Intuitively, we can understand why the difference might occur. A series of $T$ coherent errors that each alter the amplitude of the state by $p$ could sum up to a total amplitude error of about $Tp$. Thus, in order to avoid a problem, we should set a bound on the amplitude error per location.
However, notice that in order for the different errors to produce a total amplitude error of $Tp$, they need to be coherent with each other. Each error has to alter the state in the same way and with the same phase, or the overall error will be much smaller. If the errors each have a random phase, the amplitude will accumulate as only $O(\sqrt{T})$. This is the same as the behavior of stochastic errors, so in this case, we expect the threshold should be a bound on the error probability. That means that errors, even coherent non-Markovian ones, caused by random factors in the environment probably have a threshold similar to that for stochastic noise.
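A toy random-walk illustration (mine, with arbitrary $T$ and $\theta$) of the two accumulation behaviors: $T$ small $Z$-rotations by angle $\theta$ compose to a net rotation $T\theta$ when every sign is the same, but only about $\sqrt{T}\,\theta$ on average when each sign is random.
```python
# Coherent (aligned) vs random-phase accumulation of T small rotations.
import numpy as np

rng = np.random.default_rng(1)
T, theta = 10000, 1e-3

coherent = T * theta                       # aligned errors add linearly
samples = [abs(np.sum(rng.choice([-1, 1], size=T))) * theta
           for _ in range(200)]            # random-sign net rotation angle
print(f"coherent net angle  : {coherent:.3f}")
print(f"random-phase average: {np.mean(samples):.3f} "
      f"(~ sqrt(T)*theta = {np.sqrt(T) * theta:.3f})")
```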
It’s much more plausible that systematic errors would produce the same e↵ect every time. But note that
it is not actually enough that each error has the same e↵ect at the time it occurs. In order to add coherently,
the error must have the same phase and be of the same type as the current error, which has been changed
over time by the action of the fault-tolerant circuit. For instance, suppose there is an erroneous coherent
rotation about the Z axis at time 0, then a perfect Hadamard gate, then another incorrect Z-axis rotation.
The Hadamard converts the original Z-axis rotation into an X-axis rotation, so the overall error at this
point is an X-axis rotation followed by a Z-axis rotation. Two Z-axis rotations have perfect constructive
interference, but an X-axis rotation followed by one about the Z axis instead act at partial cross-purposes,
resulting in a smaller overall error. Since the whole purpose of a fault-tolerant circuit is to do fault-tolerant
gates, we are constantly going to be mixing up the types of errors, making it much harder for them to add
coherently. An adversary could look at the circuit and perhaps figure out a way to change the type of error
in just the right way to get them to add coherently, but it’s unlikely that a blind natural environment could
do this.
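This partial cancellation can be seen in a short matrix calculation (mine, with $\theta = 0.1$ assumed): two $Z$ errors compose to a rotation by $2\theta$, while a $Z$ error followed by what the Hadamard has turned into an $X$ error composes to a net rotation of only about $\sqrt{2}\,\theta$.
```python
# Compare the net rotation angle of Z-then-Z against Z-then-X error pairs.
import numpy as np

theta = 0.1
I2 = np.eye(2)
X = np.array([[0, 1], [1, 0]])
Z = np.diag([1, -1])
Rz = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * Z
Rx = np.cos(theta / 2) * I2 - 1j * np.sin(theta / 2) * X

def angle(U):
    # rotation angle of a single-qubit unitary (up to global phase)
    return 2 * np.arccos(min(1.0, abs(np.trace(U)) / 2))

print(f"Z then Z: {angle(Rz @ Rz):.4f}  (= 2*theta = {2 * theta})")
print(f"Z then X: {angle(Rx @ Rz):.4f}  (~ sqrt(2)*theta = {np.sqrt(2) * theta:.4f})")
```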
Indeed, it’s actually impossible to get all of the errors to add coherently. Suppose we have two fault
paths and ⌦ that produce di↵erent error syndromes for some FTEC gadget. The syndrome measurement
irrevocably decoheres these two fault paths, so we can never get interference between and ⌦. In order
for and ⌦ to add up coherently, they need to produce the same error syndrome for every FTEC gadget.
This is hard to arrange. Typically only a few di↵erent fault paths will be able to constructively interfere, at
best. Consequently, I expect that the threshold in practice to be maybe slightly smaller than the stochastic
threshold, but not dramatically smaller, even when the true error model is systematic and coherent.
It’s very difficult to take these considerations into account rigorously to develop an alternative proof with
a better threshold for coherent or non-Markovian errors. Even a simulation is difficult, because the whole
point is to properly keep track of quantum interference among the errors, which is hard to track classically.
One solution that has been suggested is to simplify the analysis by physically performing a random Pauli
operator on the state at each time step, and absorbing the operation into the Pauli frame to compensate.
This is another example of twirling, which we saw in section 13.5 when talking about magic state distillation.
If the Paulis are performed perfectly, the e↵ect is to decohere the errors in the Pauli basis, modifying the
channel to become a Pauli channel, which is covered by theorem 10.4. Realistically, we would need to take
into account faults in the Pauli twirling, and those faults might not be stochastic, but it is plausible that
the behavior will still be quite close to that of the stochastic channel resulting from perfect twirling.
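Here is a sketch of the twirling idea (not the book's construction; the coherent error axis and angle are arbitrary choices of mine): conjugating a coherent error channel by uniformly random Paulis leaves a Pauli channel, visible as a Pauli transfer matrix that becomes diagonal.
```python
# Pauli-twirl a single-qubit coherent error and inspect its transfer matrix.
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0 + 0j, -1.0])
paulis = [I2, X, Y, Z]

theta = 0.3
U = np.cos(theta) * I2 + 1j * np.sin(theta) * (X + Z) / np.sqrt(2)  # coherent error

def channel(rho):            # the untwirled coherent error
    return U @ rho @ U.conj().T

def twirled(rho):            # average over Pauli conjugations (P = P^dag)
    return sum(P @ channel(P @ rho @ P) @ P / 4 for P in paulis)

def ptm(E):                  # Pauli transfer matrix R_ij = tr(P_i E(P_j))/2
    return np.array([[np.trace(Pi @ E(Pj)).real / 2 for Pj in paulis]
                     for Pi in paulis])

print(np.round(ptm(channel), 3))   # off-diagonal (coherent) entries present
print(np.round(ptm(twirled), 3))   # diagonal: a Pauli channel
```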
Theorem 15.5 also does not cover error models where $\|H_{SB}\|_1$ is large, even if errors are rare in practice. One class is systems for which the interaction between the system and some state of the environment is strong, but the environment is rarely in the state that causes a problem. The somewhat silly example of a meteor striking the computer illustrates the principle. The large number of correlated errors caused by the meteor are not due to a fundamental many-body interaction in the meteor-qubit Hamiltonian, but rather are due to a strong interaction between the meteor and each individual qubit in the computer. When, as is almost always the case, the environment is in the $|\text{no meteor}\rangle$ state, the computer is fine, but on rare occasions, the environment enters the $|\text{meteor}\rangle$ state, and many qubits are affected. There are other more benign versions of this model, where a rare environmental excitation causes problems with a few qubits. Some such models actually fit naturally into theorem 10.4, but others do not. An example of one that might be realistic and potentially problematic is a drift in the calibration of control parameters. Because the control must be strongly coupled to the system in order to do gates, a change in the control parameters will have a big effect, and once the control parameters are far off the correct values, there will be frequent errors until the system is recalibrated. This particular problem can be dealt with by recalibrating sufficiently often to avoid the drift, or at least monitoring the system so that we are aware of the drift, but that requires additional action beyond the fault-tolerant protocol. Another error source of this class might be more troublesome if we don't know to look for it.
Another class of models with large $\|H_{SB}\|_1$ includes systems where the part of the environment that interacts with the computer is infinite-dimensional. For instance, the spin-boson model is a standard model of decoherence where a single spin interacts with one or more harmonic oscillators. The spin here represents the computational qubit, and the oscillators form the environment. The qubit interacts more strongly with the higher-frequency modes of the oscillators, so the interaction Hamiltonian $H_{SB}$ is unbounded. There are two issues with these systems. First, there is the possibility that some high-frequency modes get excited. Then there is a strong interaction with the computer, and we are potentially back in the situation discussed in the previous paragraph. However, it is physically sensible to consider a cold environment, in which case high-frequency modes are not excited, and even low-frequency modes are only weakly excited. In that case, we think the noise should be weak. Nevertheless, the Hamiltonian remains unbounded; the vacuum of the high-frequency modes still has an interaction with the system, and that messes up the argument. Presumably this is just a technical issue, but so far it has only been dealt with under some additional assumptions about the nature of the bath.
It will be difficult to find any sort of general proof appropriate for systems with the potential for strong interactions induced by rare states of the environment. While it's natural to assume an initial low-temperature state of the bath, entropy introduced into the system by faulty gates will find its way into the environment, heating it up. Furthermore, interaction between the environment and the computer may drive the bath out of thermal equilibrium. It's conceivable that energy could end up concentrated in a few modes, causing any qubits they interact with to fail at a much higher rate than we might expect based on the bath's initial state. In order to make a rigorous statement about the presence or absence of a threshold for such a system, we need to know a lot about the internal dynamics of the bath, and even then, given the interaction with the complicated time-varying system Hamiltonian that implements gates, it will be difficult to prove anything.
To be clear, there is no evidence that fault tolerance is impossible in such systems. In some cases, the bath states that cause problems won't be that unusual, and then errors will be relatively common. That, of course, is a problem. Since we are postulating a strong system-bath interaction, the errors, when they occur, might affect many qubits. That also is a potential problem, even if the errors are rare. However, both of these issues can be diagnosed without too much difficulty by putting the system through a battery of error testing.
Some people think that there are other scenarios that could produce low error rates in tests but nevertheless cause fault tolerance to fail. Frankly, I find those scenarios to be a bit far-fetched. They require a conspiracy on the part of nature to organize the environment in just such a way as to foil our computation without being obvious about it. It's logically possible, but that's more a statement about the limitations of what we can prove than a limitation on the ability of the system to perform quantum computation. And of course, there's also the logical possibility of fundamental many-particle interactions or a failure of quantum mechanics, either of which could potentially foil fault tolerance (though they don't necessarily do so). Either would be a big departure from physics as we know it, but the two scenarios can't be completely ruled out.
Assumption                  | Required? | What happens to the threshold?
Specific universal gate set | No        | Decreased by an O(1) factor
Long-range gates            | No        | Decreased by an O(1) factor
Fast measurements           | No        | Little change
Fresh ancilla qubits        | Partially | Hot bath decreases threshold
Parallelism                 | Mostly    | Tolerance of storage errors decreased with less parallelism
No leakage errors           | No        | Little change
Adversarial errors          | No        | No change or increased by an O(1) factor
Uncorrelated errors         | Partially | Decreased by one or more orders of magnitude
Stochastic errors           | No        | Little change or decreased by an O(1) factor

Table 15.1: Summary of various assumptions going into the threshold theorem, whether they are necessary, and some speculation as to what happens to the threshold if the assumption is removed or relaxed.
Ultimately, the question of whether a given system can be used to build a fault-tolerant quantum computer can only be definitively answered by experiment. And even experiment can't definitively answer the question of whether we can build a quantum computer that is bigger than the ones we currently have, whatever that may be when you read this.
Chapter 16